Tuesday, August 18, 2009

Map-Reduce Database (MRDBMS)

With data growing faster than Moore's Law, the different flavours of massively parallel processing (MPP) database architectures are in the limelight.
After Google revealed the power of MapReduce applications for processing large-scale data, most of the new-generation databases have followed that path. Hence I call them Map-Reduce Databases.
With high scalability and performance being the two most important considerations for a successful web 2.0+ application architecture, the relational DBMS is being either supplemented or replaced by MapReduce database systems (MRDBMS).


The important picks in my list follow:

BigTable : From Google
I would call this the father of MapReduce databases. The only way I know to use it is via Google App Engine :(
http://en.wikipedia.org/wiki/BigTable
If you use Google App Engine, you can use BigTable via the native Datastore API, or through JPA or JDO.

HBase : Based on HDFS from Yahoo
It is implemented mostly in Java and has a C++ competitor called Hypertable.

AsterData - nCluster : From Aster Data
In-Database MapReduce
You implement the Map-Reduce functions in a language of your choice (Java, C#, Python, Perl, C++, and C are among the supported languages) and deploy them onto the Queen Node.
These functions can then be called from SQL queries.
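
To give a feel for what such a pair of functions looks like, here is a minimal word-count sketch in plain Python. It is not the nCluster API: the names map_fn, reduce_fn and run_map_reduce are made up for illustration, and the small driver only imitates, on one machine, the grouping step that a real system would spread across its nodes.

```python
from collections import defaultdict


def map_fn(row):
    """Map step: emit a (word, 1) pair for every word in one input row."""
    for word in row.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce step: collapse all values emitted for one key into a total."""
    return key, sum(values)


def run_map_reduce(rows, map_fn, reduce_fn):
    """Hypothetical single-machine driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in map_fn(row):
            groups[key].append(value)
    # Sort the keys only to make the result order predictable.
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    rows = ["map reduce database", "map reduce"]
    print(run_map_reduce(rows, map_fn, reduce_fn))
```

In an in-database setup, the rows would come from a table and the call would appear inside a SQL query instead of a Python driver, but the division of work between the map function and the reduce function stays the same.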

Greenplum Database : From Greenplum