
Yahoo! Search Running Apache Hadoop at Large Scale
Over on the Yahoo! Hadoop blog, you can read about how the webmap team in Yahoo! Search is using the Apache Hadoop distributed computing framework. They're using over 10,000 CPU cores to build the map and processing a ton of data to do so. They end up using over 5 petabytes of raw disk storage, eventually outputting over 300 terabytes of compressed data that's used to power every single search.
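For readers new to Hadoop, the core idea behind a job like the webmap build is the MapReduce model: a mapper emits key/value pairs from each input record, the framework shuffles them so all values for a key land together, and a reducer aggregates each group. Real Hadoop jobs implement this in Java against the Hadoop API; the sketch below is just a self-contained illustration of the model on a toy webgraph (the function names and edge data are mine, not anything from the webmap itself).

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to each input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group intermediate values by key (Hadoop does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}

# Toy webgraph: each record is a (source_page, linked_page) edge.
edges = [
    ("a.html", "b.html"),
    ("a.html", "c.html"),
    ("b.html", "c.html"),
]

# Count inbound links per page: the mapper emits (target, 1) for each
# edge, and the reducer sums the counts for each target page.
mapper = lambda edge: [(edge[1], 1)]
reducer = lambda key, values: sum(values)

inlink_counts = reduce_phase(shuffle(map_phase(edges, mapper)), reducer)
print(inlink_counts)  # {'b.html': 1, 'c.html': 2}
```

The point of the model is that the mapper and reducer never see the whole dataset, so the framework is free to spread the work across thousands of machines, which is what lets a cluster of 10,000+ cores chew through petabytes of input.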

As part of that post, I got to interview Sameer and Arnab to learn more about the history of the webmap and why they moved from our proprietary infrastructure to using Hadoop.

One of the points I try to make during the interview is that this is a huge milestone for Hadoop. Yahoo! is using Hadoop in a very large (and growing) production deployment. It's not just an experiment or research project. There's real money on the line. (It's too bad we had a technical glitch in the video right as we were discussing a Really Big Number.)

As Eric says in that post:

The Webmap launch demonstrates the power of Hadoop to solve truly Internet-sized problems and to function reliably in a large scale production setting. We can now say that the results generated by the billions of Web search queries run at Yahoo! every month depend to a large degree on data produced by Hadoop clusters.

It looks to me like 2008 and 2009 are going to be big growth years for the Hadoop project--and not just at Yahoo!

Stay tuned...