At Google I/O we announced the Mapper API. Built completely on top of public App Engine APIs today, this API is only the first component of App Engine’s MapReduce toolkit, but can be extremely useful on its own.
The Mapper API can already be of use to many developers who would otherwise need to build their own tool for doing large scale data manipulation. In addition to taking care of the distribution of these jobs over task queues, it provides the ability to store state, batch datastore writes via mutation pools, and ships with an easy to use administrative interface for job management, all optimized for the constraints of App Engine’s dynamic serving environment. Some examples of the types of operations that work with minimal configuration with this tool:
- Report Generation
- Large scale migration of entity properties
- Datastore cleanup
- Computing statistics and metrics
- For Python developers, take a look at the Python Mapper API post on Nick Johnson’s blog.
- For Java developers, Ikai Lan has written a great post about the Java Mapper API, which takes some design cues from Hadoop’s API and includes several examples of common operations such as large scale modification of properties or batch delete.
When you’re ready to jump in and start using the tool, head over to the project homepage on Google Code. You’ll want to check out the “Getting Started” page for the language you’re using:
- Fred, Mike, Ikai, Nick + the App Engine team