Saturday, July 31, 2010

Adding an Author index to our Bloggy App

There’s one more point about Arin’s design (WTF is a SuperColumn? An Intro to the Cassandra Data Model): it lets you search by tag, or fetch all posts by searching on a default tag (“__notag__”). But what if, as is likely, we want to get all posts by one author? The answer is to add a new ColumnFamily to our keyspace that looks the same as the TaggedPosts ColumnFamily but uses the author’s name as the row key in place of the tag. It will look like this:
AuthorPosts : { // CF
    // blog entries created by “Andy”
    Andy: {  // Row key is the author name
        // column names are TimeUUIDType, value is the row key into BlogEntries
        timeuuid_1 : i-got-a-new-guitar,
        timeuuid_2 : another-cool-guitar,
    },
    // blog entries created by any author
    _AllAuthors_: {  // Made-up row key for the row that indexes every post
        // column names are TimeUUIDType, value is the row key into BlogEntries
        timeuuid_1 : i-got-a-new-guitar,
        timeuuid_2 : another-cool-guitar,
    }
}
We’ve used a made-up row key, _AllAuthors_, for a row that stores all posts from all authors. In the conf file we add a column family definition like this:
<ColumnFamily CompareWith="TimeUUIDType" Name="AuthorPosts"/> 
We can add the post indexes to our ColumnFamily like this:
ColumnPath authorsColumnPath = new ColumnPath("AuthorPosts");

// The column name is the post's TimeUUID; the value is the post's slug
authorsColumnPath.setColumn(asByteArray(timeUUID));
ks.insert(authorValue, authorsColumnPath, slugValue.getBytes());
// And do the same for the row that indexes all authors
ks.insert("_AllAuthors_", authorsColumnPath, slugValue.getBytes());
Here authorValue is a string containing the author’s name that we used earlier in the code. timeUUID was created earlier in the code, when we added the TaggedPosts columns. See the previous post for details of creating this value.
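To make the double write easier to picture, here is a small in-memory sketch of the same idea in plain Java. It is not Cassandra or Hector code: the class AuthorPostsIndex, its methods, the author “Bob” and the slug hello-world are all made up for illustration, and a plain long stands in for the TimeUUID column name. A TreeMap keeps each row's columns sorted by name, roughly as CompareWith="TimeUUIDType" does for the real ColumnFamily.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for the AuthorPosts column family (illustration only,
// not Cassandra client code).
class AuthorPostsIndex {
    // Made-up row key for the row that indexes posts from every author.
    static final String ALL_AUTHORS = "_AllAuthors_";

    // row key -> (column name -> column value). The TreeMap keeps columns
    // sorted by name. Assumption: a long timestamp stands in for a TimeUUID,
    // so two posts with the same timestamp would overwrite each other here.
    private final Map<String, TreeMap<Long, String>> rows = new HashMap<>();

    // Write the slug under the author's row and again under the shared row.
    void indexPost(String author, long timestamp, String slug) {
        insert(author, timestamp, slug);
        insert(ALL_AUTHORS, timestamp, slug);
    }

    private void insert(String rowKey, long timestamp, String slug) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(timestamp, slug);
    }

    // Slugs stored in one row, oldest first; empty if the row does not exist.
    List<String> slugsFor(String rowKey) {
        TreeMap<Long, String> columns = rows.get(rowKey);
        return columns == null ? new ArrayList<>() : new ArrayList<>(columns.values());
    }

    public static void main(String[] args) {
        AuthorPostsIndex index = new AuthorPostsIndex();
        index.indexPost("Andy", 2L, "another-cool-guitar");
        index.indexPost("Bob", 3L, "hello-world");
        index.indexPost("Andy", 1L, "i-got-a-new-guitar");

        // prints [i-got-a-new-guitar, another-cool-guitar]
        System.out.println(index.slugsFor("Andy"));
        // prints [i-got-a-new-guitar, another-cool-guitar, hello-world]
        System.out.println(index.slugsFor(ALL_AUTHORS));
    }
}
```

Reading one author's posts is then a single row lookup, and the posts come back already in time order because the ordering was fixed when the columns were written.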

The interesting thing about this is that we are using ColumnFamilies as indexes. In traditional SQL we would simply have written something like “Select * from Posts where Author like ‘Andy’ order by postdate”. Here in Cassandra we build the indexes ourselves as Column Families, which predetermines how we can search the data. Careful design is needed, I think!
