Friday, July 23, 2010

Introducing the Mapper API

At Google I/O we announced the Mapper API. Built entirely on top of the public App Engine APIs available today, it is only the first component of App Engine’s MapReduce toolkit, but it can be extremely useful on its own.

The Mapper API can already be of use to many developers who would otherwise need to build their own tool for large-scale data manipulation. In addition to distributing these jobs over task queues, it can store state and batch datastore writes via mutation pools, and it ships with an easy-to-use administrative interface for job management. All of this is optimized for the constraints of App Engine’s dynamic serving environment. Some examples of operations that work with minimal configuration:

  • Report Generation
  • Large scale migration of entity properties
  • Datastore cleanup
  • Computing statistics and metrics
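
To make the idea of batching writes through a mutation pool more concrete, here is a minimal sketch in plain Python. It does not use the Mapper API or any App Engine module: `MutationPool`, `migrate_entity`, `run_demo`, and the dictionary-based "entities" are all hypothetical names chosen for illustration, and the real library's interface may look quite different.

```python
# Illustrative sketch only: plain Python, no App Engine imports.
# All names here (MutationPool, migrate_entity, run_demo) are hypothetical.


class MutationPool:
    """Buffers entity writes and hands them off in batches."""

    def __init__(self, flush_fn, max_size=3):
        self.flush_fn = flush_fn  # called with a list of buffered entities
        self.max_size = max_size  # flush once this many writes are queued
        self.pending = []

    def put(self, entity):
        """Queue one write, flushing automatically when the pool is full."""
        self.pending.append(entity)
        if len(self.pending) >= self.max_size:
            self.flush()

    def flush(self):
        """Send all queued writes as a single batch."""
        if self.pending:
            self.flush_fn(list(self.pending))
            self.pending = []


def migrate_entity(entity, pool):
    """Map function: rename the 'fullname' property to 'name'."""
    if "fullname" in entity:
        entity["name"] = entity.pop("fullname")
        pool.put(entity)


def run_demo():
    """Map over seven fake entities and return the batches that were written."""
    batches = []  # stands in for the datastore; records each batched write
    entities = [{"id": i, "fullname": "user%d" % i} for i in range(7)]
    pool = MutationPool(batches.append, max_size=3)
    for entity in entities:
        migrate_entity(entity, pool)
    pool.flush()  # write whatever is left at the end of the job
    return batches
```

In this sketch, seven entities with a pool size of three result in three batched writes instead of seven individual ones. The map function only decides what to change for a single entity; when the writes actually happen is left to the pool.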

For an introduction to the Mapper API, watch Mike Aizatsky’s video from Google I/O, where he demonstrates building a source code indexer. The slides can be downloaded here, and the video is below:

The App Engine team has also written a few great articles on how to use the Mapper API.
  • For Python developers, take a look at the Python Mapper API post on Nick Johnson’s blog.
  • For Java developers, Ikai Lan has written a post about the Java Mapper API, which takes some design cues from Hadoop’s API and includes several examples of common operations such as large-scale modification of properties or batch deletes.

When you’re ready to jump in and start using the tool, head over to the project homepage on Google Code. You’ll want to check out the “Getting Started” page for the language you’re using.

Happy Mapping!

- Fred, Mike, Ikai, Nick + the App Engine team


Cyrille said...

I'm trying to implement the mapper for massive entity creation. As far as you know, is the Mapper compatible with the use of transactions?

Mani Doraisamy said...

Great! Burst computing finally meets app engine.

Sheth Raxit said...

I think this is a bad combo, though I am trying some things out.

Map-Reduce is for processing large data sets by dividing up the work, but App Engine inherently does not let a cron job, Task Queue task, or HTTP request run for more than a few seconds or minutes.

I would love to see a real-time, practical use of the MapReduce and App Engine combo.

PS: I am trying to solve an algorithmic problem, and App Engine seems to have serious limitations for the kind of problem I am solving (it may be better suited to web-based apps!).