Tuesday, August 18, 2009

Map-Reduce Database (MRDBMS)

With data growing faster than Moore's Law, the different flavours of massively parallel processing (MPP) database architectures are in the limelight.
After Google revealed the power of MapReduce applications for processing large-scale data, most of the new-generation databases have followed that path. Hence I call them Map-Reduce databases.
With high scalability and performance being the two most important considerations for a successful web 2.0+ application architecture, the relational DBMS is being either supplemented or replaced by MapReduce database systems (MRDBMS).
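
For readers new to the idea, here is a minimal single-machine sketch of the MapReduce pattern using the classic word count. The function names (map_phase, shuffle, reduce_phase, word_count) are my own, chosen for illustration; a real system would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values emitted for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data grows fast"]))
```

Because each map call looks at one document only and each reduce call looks at one key only, both phases can be spread across machines without coordination, which is where the scalability comes from.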


The important picks in my list follow:

BigTable : From Google
I would call this the father of MapReduce databases. The only way I know to use it is via Google App Engine :(
http://en.wikipedia.org/wiki/BigTable
If you use Google App Engine, you can use BigTable via the native Datastore API, or through JPA or JDO.

HBase : Based on HDFS from Yahoo
It is implemented mostly in Java and has a C++ competitor called Hypertable.

AsterData - nCluster : From Aster Data
In-Database MapReduce
You need to implement Map-Reduce functions in a language of your choice (a variety of languages are supported: Java, C#, Python, Perl, C++, and C) and deploy them onto the Queen Node. These functions can then be called in SQL queries.

Greenplum Database : From Greenplum

Wednesday, July 29, 2009

Microsoft-Yahoo Search Deal

Yahoo has signed an important search deal with Microsoft: an agreement for the next 10 years. Yahoo will continue to offer search across its existing content and properties, but the technology behind the search will be powered by Microsoft's Bing. Yahoo will be the exclusive sales force for both companies' premium search advertisers, with Microsoft investing in the search technology.

I don't need to say that technically this deal was profitable for Microsoft. Who wouldn't love to move from 3rd position in the world to 2nd, while still keeping their technology as the foundation for all future development?

With this deal Yahoo is betting on the future. Yahoo finds this necessary for both companies to scale and compete against the online search and advertising giant Google. The companies will invest a couple of hundred million for a successful transformation. Yahoo expects that users, advertisers and publishers will benefit from this deal.

For anyone who is eager to know the impact of this deal on search, as well as on the related services offered by both companies, the following are the expectations from the CEOs of both companies:
  1. Users are expected to have the same privacy and capabilities as before on all the existing services.
  2. The developers from both companies are expected to have the same ability to innovate.
  3. Decisions will be made on which APIs will be equal and open for sharing technology etc. in the web and mobile space.
  4. The companies expect to see a lot of innovation in technology, for example better algorithms for content relevance, and in the area of UI.
  5. They expect to see innovation in areas other than technology. For example, with combined marketing of ads, both will get to see more information for business intelligence: what customers need, what customers use most, etc.

As per the stated plan, an execution team will be formed with members from Yahoo and Microsoft. The team is expected to consist mostly of sales team members from Yahoo and technology team members from Microsoft.

Microsoft says the foundation platform for the joint search engine (search and advertising) will be Bing. Obviously this is a big win for Bing. What is more interesting is that, as part of the agreement announced, Microsoft will acquire rights to Yahoo's search engine technology for the next 10 years. This also means:

  • As part of all this, some engineers from Yahoo may move to Microsoft. We might see the YUI and Hadoop technologies powering the decision engine along with the AdCenter advertising sales technology.
  • Some developers from Yahoo's search department might get moved to other areas of development.
  • Since there is some redundancy as part of this, some developers might be asked to go :(

Since Ballmer stated that the introduction of Bing in Yahoo will be more of an integration than a replacement of Yahoo's search technology, the chances of the third option (Yahoo layoffs) may be minimal, as a lot of new opportunities always come into the limelight during an integration process, at least for an initial period. This is especially so as all these processes will run into 2010.

It is expected that, with the vast amount of content that Yahoo has accumulated over the last decade, Bing is going to benefit in terms of the relevance of its search results. How much improvement there will be, and how much of the advertising market share it will take from Google for Microsoft/Yahoo, will really depend on the technology capability of Bing as well as on the sales capabilities of Y! :)

Tuesday, July 21, 2009

Google App Engine is Supporting JVM

Almost a year after its release, Google App Engine (GAE) is supporting Java. GAE supported only Python as the runtime when it was released. There had been stories of Java support coming to GAE for a long time.
Till now, the only options I knew for hosting Java applications in the cloud were Amazon Web Services and the Salesforce app engine. One of the important points that attracts me to GAE is the support for multiple languages like Java, Scala, JRuby, Jython and more. Google literally says that it supports more than Java: it supports anything that can run on a JVM.

Google says it supports the Sun JVM, but with some restrictions. According to Google, these restrictions are there to ensure security in their cloud.

Google sees GAE as a Platform as a Service (PaaS) solution, and Google services like BigTable are also available to JVM-based applications (apart from the world-famous Google infrastructure).

ThoughtWorks, as always, have done their cut-throat analysis of JVM support for GAE.
They have come up with some suggestions/recommendations in the areas of testing, persistence and concurrency.
As per their analysis:

  • Testing: There are things that Google needs to improve on the testing side. The service stub approach may not always be practical. Developers need to be aware that testing will take more time than usual. If a Google-specific API like the datastore service is used in the code to access Google BigTable, then testing that code outside GAE may be a herculean task today.
  • Persistence:
    Coding for JPA or JDO will only guarantee 80% abstraction. Developers still need to understand that they are dealing with nested HashMaps and not with an RDBMS.
    For people like me who have been using RDBMSs in the enterprise world for so many years, this is going to be a big shift. Though I won't hesitate when I think about the scalability that BigTable offers :)
  • Concurrency:
    Google App Engine has a single-threaded model. Third-party libraries may need to change to work in this environment.
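
To make the "nested HashMaps" remark concrete, here is a rough sketch of that mental model in plain Python dictionaries: row key, then column family, then column, then timestamped versions. This only illustrates the shape of the data and is not the App Engine datastore API; the helper names (make_table, put, get_latest) and the sample row are made up for the example.

```python
from collections import defaultdict


def make_table():
    """Nested maps: row key -> column family -> column -> {timestamp: value}."""
    return defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))


def put(table, row, family, column, value, timestamp):
    """Store one versioned cell; older versions are kept alongside it."""
    table[row][family][column][timestamp] = value


def get_latest(table, row, family, column):
    """Return the newest version of a cell, or None if the cell is absent."""
    versions = table.get(row, {}).get(family, {}).get(column, {})
    if not versions:
        return None
    return versions[max(versions)]


if __name__ == "__main__":
    table = make_table()
    put(table, "user1", "profile", "name", "Ann", timestamp=1)
    put(table, "user1", "profile", "name", "Anna", timestamp=2)
    print(get_latest(table, "user1", "profile", "name"))
```

Notice what is missing compared with an RDBMS: there is no fixed schema and there are no joins, and every lookup starts from a row key. That is the part of the shift that JPA or JDO cannot hide.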

So altogether I think this is a good move from Google to support the JVM on their PaaS solution, and I hope they or someone else will soon come up with a development/debugging environment for JVM-based applications in GAE, which will make our lives easier.

Monday, July 20, 2009

Developing Java Apps for Google App Engine


Testing:

There are things that Google needs to improve on the testing side. The service stub approach may not always be practical.

Developers need to be aware that testing will take more time than otherwise.


Persistence:

Coding for JPA or JDO will only guarantee 80% abstraction. Developers still need to understand that they are dealing with nested HashMaps and not with an RDBMS.

This is a big shift in the enterprise development world.


Concurrency:

Google App Engine has a single-threaded model.

Third-party libraries may need to change to work in this environment.

Saturday, July 18, 2009

Hadoop is getting bigger!

This July has been a Hadoop / MapReduce month so far.

• Hadoop Core is renamed Hadoop Common.
With the growth of Hadoop and the addition of specialized subprojects, Hadoop Core was no longer the core of Hadoop, but rather just the common libraries shared by Hadoop and the Hadoop abstraction projects.

• MapReduce and the Hadoop Distributed File System (HDFS) are now separate subprojects.

• Avro and Chukwa are new Hadoop subprojects
The MapReduce programming model and the Hadoop filesystem seem to be generating more buzz in the world, and more related projects are getting released. The latest subproject releases on Apache were Avro and Chukwa.

Avro is an optimized cross-language data serialization system for RPC and persistent storage. It is backed by a schema defined in JSON. The first release supports APIs in Java, C and Python.
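
As a rough idea of what "a schema defined in JSON" looks like, the sketch below builds an Avro-style record schema as a Python dictionary and serializes it with the standard json module. The record name PageView and its fields are hypothetical, invented for this example, and the code deliberately does not use the Avro library itself.

```python
import json

# A hypothetical Avro-style record schema: a named record with typed fields.
# The ["null", "string"] union marks the referrer field as optional.
PAGE_VIEW_SCHEMA = {
    "type": "record",
    "name": "PageView",
    "fields": [
        {"name": "url", "type": "string"},
        {"name": "timestamp", "type": "long"},
        {"name": "referrer", "type": ["null", "string"]},
    ],
}


def field_names(schema):
    """List the field names declared by a record schema, in order."""
    return [field["name"] for field in schema["fields"]]


def schema_json(schema):
    """Render the schema as the JSON text that would be shared between systems."""
    return json.dumps(schema, indent=2)


if __name__ == "__main__":
    print(schema_json(PAGE_VIEW_SCHEMA))
    print(field_names(PAGE_VIEW_SCHEMA))
```

Since the schema is plain JSON, programs written in different languages can read the same definition, which is what makes the cross-language part work.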

Chukwa is a data collection system for managing large distributed systems.

Tuesday, June 23, 2009

HadoopDB

HadoopDB is a project released by researchers at Yale University.

Sunday, June 7, 2009

Virtual Power Plants

Are virtual power plants going to be a revolution?

I never thought.. I would write a blog!!!

I never thought.. I would write a blog...
Yet.. here goes my first blog, and I am still finding it difficult, and am hesitant, to scribble my thoughts in a rich text editor...

I always loved to go through others' blogs and always found it an opportunity to learn, relax and sometimes leave one-line comments... I was also asked by some of my friends why I was hesitant about blogging... Well, I had so many reasons of my own and I used to be convinced by those.. I don't know whether those reasons will ever convince others.. maybe never...

Dedicating this first blog to all my friends!!!