Monday, December 15, 2008

CouchDB for Enterprise Applications?

I am a little late in noticing this: CouchDB has become a first-class project at Apache [http://couchdb.apache.org/].

This is an interesting development. For us, enterprise application folks, it takes a while to get rid of the relational paradigm and bring ourselves to consider a document-oriented database such as this one. But I have to say that years of squeezing documents into the relational model have slowly driven me to appreciate the virtue of something different.
Full-blown document management systems have always felt too "document publisher" oriented to me. Their preoccupations seem to center on managing documents that humans read and, as a consequence, some of the enterprise considerations are not really there.
CouchDB may be the beginning of something different.

Here is what I like about CouchDB
- It's schema-free. The documents it manages are not tied to a predefined format and support both attachments and versioning.
- Views are written in a "scripting" language
- Indexing, incremental replication support
- Views leverage a scalable map-reduce approach
- Full REST api, client language independence
- Architected from the ground up for massive scalability and high availability deployments
- Architected from the ground up for smart replication
- Integration with Lucene for full-text search (upcoming, I think)
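
To make the schema-free and map-reduce points above concrete, here is a minimal in-memory sketch of how a view can be derived from documents that do not share a format. It is only an illustration in Python, not CouchDB's actual view engine (CouchDB views are written in JavaScript by default); the documents, field names, and the build_view helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical schema-free documents: plain JSON-like data,
# and no two of them need to share the same fields.
documents = [
    {"_id": "inv-1", "type": "invoice", "customer": "acme", "total": 120.0},
    {"_id": "inv-2", "type": "invoice", "customer": "acme", "total": 80.0},
    {"_id": "inv-3", "type": "invoice", "customer": "globex", "total": 45.5},
    {"_id": "note-1", "type": "note", "text": "call back on Friday"},
]


def map_invoice_totals(doc):
    """Map step: emit (key, value) pairs only for documents that qualify."""
    if doc.get("type") == "invoice":
        yield doc["customer"], doc["total"]


def reduce_sum(values):
    """Reduce step: collapse all values emitted for one key."""
    return sum(values)


def build_view(docs, map_fn, reduce_fn):
    """Hypothetical helper: run map over every document, group by key, reduce."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


view = build_view(documents, map_invoice_totals, reduce_sum)
print(view)  # {'acme': 200.0, 'globex': 45.5}
```

The note document is simply skipped by the map function, which is the practical meaning of "schema-free" here: the view decides which documents matter, not a table definition.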

Here is what I do not like about CouchDB
- JavaScript everywhere as the default. JavaScript has evolved quite a lot since I first dealt with it, but I still have difficulty accepting its permissiveness.
Of course, the language can be changed, in particular for the construction of views, which alleviates my concern.

I am ambivalent about the use of JSON as the storage format. While I like the principles of JSON, it lacks some of the higher-level features that enterprise systems in general care about.
I also do not yet have a clear view of how transactional behavior will work. ACID compliance on single nodes seems to be the focus.
Nor do I have a clear view yet of how security will be managed, but I trust those issues are either sorted out or will be sorted out.

That being said, the REST/HTTP - JSON - JavaScript combination will be appealing to most of the Web 2.0 / Enterprise 2.0 community. Adobe and others have done a lot to popularize this combination and it does offer a high degree of flexibility.
This popularization, combined with the up-front massive scalability built into much of the architecture, leads me to believe that performance for document-based interactions will meet the requirements of most enterprise applications. I have, however, not seen independent data on this.
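
As a rough illustration of why the REST/HTTP and JSON combination is client-language independent, the sketch below builds (but does not send) the HTTP request that would store a document. It assumes a CouchDB server on its default port 5984; the "contracts" database, the document id, the document fields, and the helper name are hypothetical.

```python
import json
import urllib.request


def build_put_document_request(base_url, database, doc_id, document):
    """Build a PUT request that would store `document` under `doc_id`.

    The request is only constructed here, never sent, so no server is needed.
    """
    body = json.dumps(document).encode("utf-8")
    url = f"{base_url}/{database}/{doc_id}"
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed values: a local server and a hypothetical "contracts" database.
request = build_put_document_request(
    "http://localhost:5984",
    "contracts",
    "contract-42",
    {"customer": "acme", "status": "draft"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Nothing in the request is specific to Python: any client that can issue an HTTP PUT with a JSON body could do the same, which is the appeal of the approach.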

There have been other efforts along the same lines. But I do like the use of Erlang [http://www.erlang.org/] as the implementation language. Erlang is an elegant language for developing scalable server code, and it satisfies most of the constraints I have seen to be mandatory in high-performance distributed systems.

I am curious to know what the Enterprise Software community thinks of this. We all have a number of cases in which we would trade some relational features for the flexibility this document-centric approach brings. I can see a few for the products I am responsible for...

Are there other efforts similar to this you would recommend looking at? I have looked (a little) at Hypertable on Hadoop. But it's not the same thing...
