1. Polymorphism in a relational database

    so we’ve settled out in san jose and since it’s been a few weeks of last.fm top artists and not much else, I figured I would check in.

    On the WePay front, we’re having a lot of fun and have been hard at work.  The change of scenery has been really helpful in letting us really refocus and dig in on what matters.  The consistent 75-80 degrees isn’t too shabby, either.

    I’ve spent the past week wrestling with a particularly challenging technical issue - basically the recreation of a more advanced facebook newsfeed.  The polymorphism required makes working in a relational database environment not so much fun.

    Being as i’m undergoing my 4th rewrite I’m considering this my opportunity to vent. Perhaps in articulation the solution will become clear… and so the issues as I see them are as follows:

    • Different activities….
      • Invitations to join groups, changing profile information, sending money, receiving money, etc.
    • …with different objects
      • Photos, transactions, comments, etc
    • Multiple users & groups involved in varying roles (i.e. Bill invited Sally to join the Ultimate Frisbee group)
    • Multiple perspectives of each event 
      • in Bill’s feed: “you invited Sally to Ultimate Frisbee”
      • in Sally’s feed: “Bill invited you to Ultimate Frisbee”
      • in the Ultimate Frisbee feed: “Bill invited Sally to join this group”

    My worry in all of this is that normalizing the data in a strict fashion will require queries to become extremely complex and unwieldy very quickly (if even possible at all).  Based on the type of the actors and the type of action and the type of object, we are querying different tables… which makes me uncomfortable.

    And so I believe there are two solutions: queuing or denormalization.  

    In a queueing scheme, I’d build a listening queue for every group or user who would then be notified of events as they occur, pre-formatted for their appropriate display, and probably hosted by some 3rd party queuing provider like Amazon.  Sally would have a queue of events that read as “Bill invited you to Ultimate Frisbee”.  In Bill’s queue it would be formatted appropriate, as in the group.  The load here occurs at the time the event happens, as it must be added to all appropriate queues.  Generating the actual feed for display would be trivial, requiring little to no database queries.

    The downsides are obvious - requires 3rd party software, complexity in implementation, etc etc.  However, I think the prospects for long-term scalability are much higher.

    Denormalization is a little bit more “hackish”, but might be easier to implement and require less moving parts.  Basically, my strategy would be to store 2 group ids (an actor and an acted-upon) and a group id, an action_type and a serialized data object.  So if action_type was “message_sent”, the data object might hold the message body, if it was “photo_post” it would hold the location of the new and old photos.  When rendering the event in a feed, I could format the data appropriately.  However, this could still require a significant amount of queries in addition to parsing my data object for each feed item.

    Would love to hear any thoughts or war stories on feeds, polymorphism in a relational database, queueing or the like - either here or on twitter @billclerico