Brendan's Braindump: January 2009

I mean that literally. This is nothing like a timely post. Still, better than no post at all, right?

Day Three was my volunteering exercise -- hey I was a starving student at the time, so I got a free student rego in exchange for helping out with conference organisation, collecting feedback forms, making sure the speakers were actually in the rooms speaking to people etc -- not a bad gig really if you choose the right room. I missed out on the "architectures you've always wondered about" track but I heard most of those guys in 2007 anyway, so I chose the Data Storage Rethinking: Document Oriented Distributed Databases track which I was very happy about -- it was fascinating and very useful for me.

Some notes only barely converted from my rough typing in between pressing the little clicker to count people going in and out of the room:

Hypertable

A column-based bigtable clone
GPLed
Stores history of everytyhing– even deletes are just stored as new entries with a flag
Splits tables automatically across machines if you need to
Instrumentation for monitoring etc not there yet for 1.0 (jan/feb next year) (note as of blog posting date: it's at 0.9.2 right now, getting there...)
In 1.1 master-slave communication will work much better, including intelligent resource allocation
Keeps a write-ahead commit log as well as the data store, so can recover from failure if written to a distributed FS
Has "Hyperspace" distributed lock manager – equivalent to “chubby” at google (whatever that is?! presumably somebody reading this knows...)
-> currently a SPOF but will have “some form of replication” by release
Can run on any distributed FS: hadoop HDFS, KFS (Kosmos FS) etc
All communication is asynchronous
Languages: C++ plus Thrift bindings which will expose java, python, PHP etc... Release containing this stuff will come out in a few weeks
Concurrency: “it uses MVCC”, he skipped it.. What does this mean?? (Wikipedia tells me it's "multi-version concurrency control" which is used by CouchDB, BerkeleyDB, MySQL/InnoDB etc)
Achieved over 1m inserts/sec on AOL test data (1TB of 30-byte query log rows -- ie almost pathological but good for certain use cases)
Google has “megatable”, abstraction layer on top of bigtable — hypertable will have an equivalent eventually
Have their own communication protocol

Atomserver

From the guys behind homeaway.com – Bryon Jacob and chris Berry
Took Abdera from apache to build their own framework
Added Atom Publishing Protocol extensions, eg
open search - google
paging – mark nottingham - rfc5005
“atom store” - get, put, edit, search via APP – canonical example is gdata
Uses Abdera which graduated from the Apache incubator this week and will go 1.0 very soon
Provides a solid, scalable, etc implementation on top of Abdera
APP Spec doesn’t force you to make services and workspaces first-class objects with own RESTful interfaces and URIs, but they do anyway
POST for new content where you let the server assign the ID, or PUT if you know the URI you want
Uses model of starting at the beginning and following next links to get everything (a la GData, I think..?)
Incrementing index numbers for all changes, so you can see things twice, as each changes gives the item a higher inde number, good for syncing eg queues
Has APP categories (aka tags)
Can create tags specifically for items using category docs
Can create hooks for auto-categorisers
xpath one built in, can use to extract standard tags from custom XML into category tags for querying later
view feeds by category, atomserver specific but based on gdata implementation
can do boolean ANDs and ORs of tags, to do a vague equyivalent of SQL SELECT queries
Concurrency for edits: each edit must have the revision number appended to the URI for optimistic locking – if that’s not the correct revision, it is rejected (409 CONFLICT)
link rel=”edit” URI has the revision number built in
Example of “atom-based service architecture” using objects with states — eg could do a moderation service by querying for a feed of objects with “UNMODERATED” tags
“etags are the preferred way to pass query parameters to atom” -- I wrote it down but I don't really know what they mean?! I thought etags were about caching?!
Aggregate feeds ability (in AtomServer only) - “we join on categories in the same way that SQL would join tables based on a column”
Batch updates via one feed doc — (I thought mime multipart was supposed to be used for that??)
Custom (pluggable) content storage – only supports RDBMS now but planning to support key-value stores (eg couchdb) later (we at the BBC are very keen to see this happen!)
Scales with multiple front-ends using one database – can replicate etc but still requires one sql database (for now)
I think I heard them say at the end that they don’t support mysql because they use transactions!?!? would be good to know more about that...

neo4j

it's a graph database
started off talking about growth in connected data, eg Facebook’s MySQL store: “facebook has hundreds of machines with 1TB RAM to keep their entire database in memory”
At first it sounded silly and just a replacement for RDF but when I could see that they can do depth-X pathExists() searches eg (friends of friends of people), 2ms for 1m people with average 50 connections, ie 25m connections! eg haven't you always wondered how LinkedIn could always say how many degrees away from each person you are when you do searches? that's hard! (I'm not saying that they use neo4j at LinkedIn, but they must use some similar algorithms -- I think they keep most of their social graph state in memory as well, from what I remember hearing at the QCon architectures track in 2007)
Has a NeoMock in-memory implementation for testing, but you can just put lots of RAM to your JVM and it uses memory for you
neo4j now has sparql support! v interesting
they are working on NeoRDF – have two customers but haven’t released as a product yet, it's coming in 1.1
They use the OSGi architecture for plugins -- it seems to be becoming a real standard now
They are thinking about releasing a standalone server, REST API etc
but they say that exposing domain-oriented services is better than exposing the database over the wire – as Ian yesterday was describing, Eran calls it “terrorist-oriented architecture” -- ie independent cells all capable of surviving on their own -- the extreme case of "small pieces loosely joined"
"it's hard to think of a good REST API for something as chatty as we are"
what's coming in v2.0? they are thinking of sharding (aka partitioning) on top of newton (infiniflow) from paremus
based on CAP theorem, BASE rather than ACID -- ie everything is synchronised eventually -- if you don't get this then google it, there are loads of presos about it
licensed under AGPLv3 – if you develop software with it it’s free, but if you use it to store more than 1m primitives, you have to pay

Couchdb
jan h, jan@apache.org

has the same optimistic locking approach as atomserver, keeps all revisions – ie nothing is locked, ever
uses mapreduce (with javascript as the scripting language!) for views, aggregations etc rather than inventing a new query language
is slow the first time, as it parses the javascript etc, builds a btree index
next time you query, it checks if anything updated and if so, gives the diff to the view server and builds a new diff with just the new data
therefore inserts are cheap, you only rebuild views when they are queried again (and even then only incrementally)
does syncing between DBs (based on lotus notes?!), one direction or bi-directional
books.couchdb.org/relax – drafts coming out in jan (looking now, they might be running a bit behind... but some intro chapters are up at least)

Couchdb in the real world
Jan again (who is available for consulting BTW, apparently we at the BBC have already employed him at least once!)

lots of standard storage design patterns change in a couchdb world...
eg views (sort of like stored procedures, but using map/reduce)
views are saved in “design documents” where your javascript goes
you can have multiple views in the same design document, but realise that they’re all updated each time any data changes
also put validation rules, authorisation, and more into a design document
CouchDB has no such thing as sequences, but you wouldn’t want to use sequences in a distributed env anyway – you have a system field called _id but that’s not a guaranteed sequence
if you do want to order your results, do it by a natural key such as time rather than some sequence id
CouchDB provides no transactions, no roundtripping, no multi-node transactions (they would be too expensive) - "use an http proxy if you need redundancy"!!? (syncing helps with that I guess? but how resilient is it really?)
Can have master-slave or master-master replication setups, eventually consistent but BASE, not ACID (see notes above)
You can add as many masters as you like, unlike mysql
Replication communication also happens over HTTP so you can use caches, proxies etc
because it’s all asynchronous, you can actually call couchdb directly via ajax

Phew, long and disjointed, sorry about that, but it's enough to get my notes down, I might expand on these topics later as we explore these technologies some more!

Hope it's useful to somebody.

Brendan's Braindump

Tuesday, 13 January 2009

QCon SF 2008 Day Three - nothing like a timely post

Interview with Vinod Khosla!

Search this Blog

Archive

Labels