Thursday, September 24, 2009

App Engine talks at a conference near you

It's September and the fall conference calendar is starting to fill up. Members of the App Engine team will present at these conferences: join us if you are in the area, and feel free to tweet @app_engine if you want us to participate in local developer community events around these dates!

We look forward to meeting you during these trips, if you can make it.

Tuesday, September 22, 2009

Agile paddling with App Engine: lessons learned building the Canoe '09 website

I work for Norex, a web development company in Halifax, Nova Scotia. As part of our sponsorship of the 2009 ICF Canoe Sprint World Championships in Halifax (Canoe '09), we developed an application to deliver real-time race results to standard and mobile web browsers. Thanks to the ability to rapidly develop and deploy a scalable application on Google App Engine, and to do so live during the event, what began as a small experiment became a huge success for Norex.

Our goal was to import instant results (directly from the timing system, FinishLynx), upcoming race information, athlete bios, teams, and past race results from a system used by the race organizers, and to reformat that data for distribution to web browsers. We chose to build on Google App Engine (with Django and app-engine-patch) and the iUI (iPhone User Interface) framework. In fact, this was our first time deploying an app using any of these technologies! While we had a few weeks to build the prototype, we also had to deal with the added challenges of a varying data format and special cases that could occur during live races. As with many software projects, these specifications were not provided on a regular or consistent basis by the race organizers, so we had to anticipate the unknowns and adjust to conditions as they arose.

The Canoe 09 Web Interface

To give you a taste of just how agile we were able to be on App Engine, here are some of the events during the races that we were able to cope with:

  • After the first day, the closed-circuit TV system that the race organizers had planned to use for the event was down. Before it was back up and running, we had deployed a new view designed for display on the TVs.
  • By the end of the first day, the application had handled 840,000 requests at a peak of 60 requests/second. This was already several orders of magnitude higher than we had expected. At the start of the second day, we reached 10% of our CPU quota before 7:45am and were processing 100 requests/second. While we had originally cached only the main pages, with a few lines of code we extended our caching strategy to cache every possible page request, with a separate version for each web platform, and re-deployed during the day's races.
  • The rules of the events changed at 2pm on day 3 of the event to allow crew changes. By 4pm, I had reworked the models and views to support the change.
  • At 11am on the final day of races, someone in the control booth manually deleted inaccurate records, but this left inconsistencies in the model that caused errors. By 11:05am, we had deployed a patch to stop the errors, and by 1pm we had deployed a version that failed gracefully when referenced objects had been inadvertently deleted.
  • Every night we deployed new views of the data as requested by race organizers, participants, and spectators.
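The caching change in the second bullet comes down to keying each cached page on both the requested path and the client's platform. The sketch below illustrates that idea under stated assumptions; it is not Norex's code. The `detect_platform` rules and the `PageCache` class are hypothetical, and a plain dictionary with expiry times stands in for App Engine's memcache service (on App Engine itself you would use `memcache.get` and `memcache.set`).

```python
import time
from typing import Callable, Dict, List, Tuple


def detect_platform(user_agent: str) -> str:
    """Hypothetical rules mapping a User-Agent header to a page variant."""
    ua = user_agent.lower()
    if "iphone" in ua or "ipod" in ua:
        return "iphone"
    if "mobile" in ua or "blackberry" in ua:
        return "mobile"
    return "desktop"


class PageCache:
    """Dict-backed stand-in for memcache, with a per-entry expiry time."""

    def __init__(self, ttl_seconds: float = 30.0) -> None:
        self.ttl_seconds = ttl_seconds
        self._store: Dict[str, Tuple[float, str]] = {}

    def get_or_render(self, path: str, user_agent: str,
                      render: Callable[[str, str], str]) -> str:
        platform = detect_platform(user_agent)
        key = f"page:{platform}:{path}"  # one entry per (platform, page) pair
        now = time.time()
        cached = self._store.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]  # cache hit: skip the datastore and templates
        html = render(path, platform)
        self._store[key] = (now + self.ttl_seconds, html)
        return html


# Usage: record how often the expensive render step actually runs.
renders: List[Tuple[str, str]] = []


def render(path: str, platform: str) -> str:
    renders.append((path, platform))
    return f"<{platform}>{path}</{platform}>"


cache = PageCache(ttl_seconds=30)
first = cache.get_or_render("/results/42", "Mozilla/5.0 (iPhone; U)", render)
second = cache.get_or_render("/results/42", "Mozilla/5.0 (iPhone; U)", render)
desktop = cache.get_or_render("/results/42", "Mozilla/5.0 (Windows NT 6.1)", render)
print(len(renders))  # 2: the second iPhone request was a cache hit
```

A short expiry keeps live results reasonably fresh while still absorbing traffic spikes; the right value would depend on how often results change.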

We found that new deployments were possible (and quick) because we could stage versions of the application on the App Engine servers and test these staged versions against live data. Switching from the live version to the staged version and back again takes seconds, so we could paddle as confidently and rapidly as the racers.

The International Canoe Federation was so impressed by the dependability and versatility of the solution that they elected to replace their usual channels with our application, using it to serve the official results to news organizations and sport federation sites all over the world, and internally to calculate medal counts.

All in all, we saw over 1,000,000 page views from 93 countries around the world, and experienced incredible stability and scalability from Google App Engine, even when traffic spiked to 350 requests per second during the finals. It was a big win for Norex, and App Engine has proven itself to be a serious contender for developing scalable web applications. Thanks so much to the Google App Engine team for providing such an outstanding product!

Link to: ZAP Results Application

Monday, September 14, 2009

Migration to a Better Datastore

At Google, we've learned through experience to treat everything with healthy skepticism. We expect that servers, racks, shared GFS cells, and even entire datacenters will occasionally go down, sometimes with little or no warning. This has led us to try as hard as possible to design our products to run on multiple servers, multiple cells, and even multiple datacenters simultaneously, so that they keep running even if one or more of the redundant underlying parts go down. We call this multihoming. The term usually applies narrowly, to networking alone, but we use it much more broadly in our internal language.

Multihoming is straightforward for read-only products like web search, but it's more difficult for products that allow users to read and write data in real time, like Gmail, Google Calendar, and App Engine. I've personally spent a while thinking about how multihoming applies to the App Engine datastore. I even gave a talk about it at this year's Google I/O.

While I've got you captive, I'll describe how multihoming currently works in App Engine, and how we're going to improve it with a release next week. I'll wrap things up with more detail about App Engine's maintenance schedule.

Bigtable replication and planned datacenter moves

When we launched App Engine, the datastore served each application's data out of one datacenter at a time. Data was replicated to other datacenters in the background, using Bigtable's built-in replication facility. For the most part, this was a big win. It gave us mature, robust, real-time replication for all datastore data and metadata.

For example, if the datastore was serving data for some apps from datacenter A, and we needed to switch to serving their data from datacenter B, we simply flipped the datastore to read-only mode, waited for Bigtable replication to flush any remaining writes from A to B, then flipped the switch back and started serving in read/write mode from B. This generally works well, but it depends on the Bigtable cells in both A and B being healthy. Of course, we wouldn't want to move to B if it was unhealthy, but we definitely would if B was healthy but A wasn't.
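The planned switch described above can be modeled in a few lines. This is a toy simulation with hypothetical class names, not App Engine's or Bigtable's actual code: a queue of pending writes stands in for background replication, and the second case shows why an unhealthy A blocks the move.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class Datacenter:
    def __init__(self, name: str) -> None:
        self.name = name
        self.healthy = True
        self.data: Dict[str, str] = {}


class ReplicatedStore:
    """Toy model: writes land in the primary and reach the secondary later,
    through a pending queue that stands in for background replication."""

    def __init__(self, primary: Datacenter, secondary: Datacenter) -> None:
        self.primary = primary
        self.secondary = secondary
        self.read_only = False
        self.pending: Deque[Tuple[str, str]] = deque()

    def put(self, key: str, value: str) -> None:
        if self.read_only:
            raise RuntimeError("datastore is in read-only mode")
        self.primary.data[key] = value
        self.pending.append((key, value))  # not yet visible in the secondary

    def flush(self) -> None:
        """Drain pending replication; this needs a readable primary."""
        if not self.primary.healthy:
            raise RuntimeError(f"cannot flush unreplicated writes from {self.primary.name}")
        while self.pending:
            key, value = self.pending.popleft()
            self.secondary.data[key] = value

    def planned_switch(self) -> None:
        self.read_only = True  # 1. stop accepting writes
        try:
            self.flush()  # 2. wait for replication to catch up
            self.primary, self.secondary = self.secondary, self.primary  # 3. serve from B
        finally:
            self.read_only = False  # 4. resume read/write (still from A if the flush failed)


# Healthy A and B: the switch succeeds and no writes are lost.
a, b = Datacenter("A"), Datacenter("B")
store = ReplicatedStore(a, b)
store.put("entity:1", "v1")
store.planned_switch()
print(store.primary.name, b.data)  # B {'entity:1': 'v1'}

# Unhealthy A: the flush cannot finish, so serving is stuck in A.
a2, b2 = Datacenter("A"), Datacenter("B")
stuck = ReplicatedStore(a2, b2)
stuck.put("entity:2", "v2")
a2.healthy = False
try:
    stuck.planned_switch()
except RuntimeError as err:
    print(err)  # cannot flush unreplicated writes from A
```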

Planning for trouble

Google continuously monitors the overall health of App Engine's underlying services, like GFS and Bigtable, in all of our datacenters. However, unexpected problems can crop up from time to time. When that happens, having backup options available is crucial.

You may remember the unplanned outage we had a few months ago. We published a detailed postmortem; in a nutshell, the shared GFS cell we use went down hard, which took us down as well, and it took a while to get the GFS cell back up. The GFS cell is just one example of the extent to which we use shared infrastructure at Google. It's one of our greatest strengths, in my opinion, but it has its drawbacks. One of the most noticeable drawbacks is loss of isolation. When a piece of shared infrastructure has problems or goes down, it affects everything that uses it.

In the example above, if the Bigtable cell in A is unhealthy, we're in trouble. Bigtable replication is fast, but it runs in the background, so it's usually at least a little behind, which is why we wait for that final flush before switching to B. If A is unhealthy, some of its data may be unavailable for extended periods of time. We can't get to it, so we can't flush it, we can't switch to B, and we're stuck in A until its Bigtable cell recovers enough to let us finish the flush. In extreme cases like this, we might not know how soon the data in A will become available. Rather than waiting indefinitely for A to recover, we'd like to have the option to cut our losses and serve out of B instead of A, even if it means a small, bounded amount of disruption to application data. Following our example, that extreme recovery scenario would go something like this:

We give up on flushing the most recent writes in A that haven't replicated to B, and switch to serving the data that is in B. Thankfully, there isn't much data in A that hasn't replicated to B, because replication is usually quite fast. It depends on the nature of the failure, but the window of unreplicated data usually only includes a small fraction of apps, and is often as small as a few thousand recent puts, deletes, and transaction commits, across all affected apps.

Naturally, when A comes back online, we can recover that unreplicated data, but if we've already started serving from B, we can't automatically copy it over from A, since there may have been conflicting writes in B to the same entities. If your app had unreplicated writes, we can at least provide you with a full dump of those writes from A, so that your data isn't lost forever. We can also provide you with tools to relatively easily apply those unreplicated writes to your current datastore serving out of B.
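As a purely hypothetical illustration (not the actual tooling), the key step in replaying such a dump is a conflict check. The sketch assumes each dumped record carries the entity version it overwrote; a write is replayed only if the entity in B is still at that version, and any entity B has changed since the switch is reported rather than overwritten. The record format and `apply_unreplicated` are invented for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical dump record: (entity key, version it overwrote, new value; None = delete).
DumpRecord = Tuple[str, int, Optional[str]]
# Datastore stand-in: entity key -> (version, value; None = deleted).
Store = Dict[str, Tuple[int, Optional[str]]]


def apply_unreplicated(dump: List[DumpRecord], current: Store) -> List[str]:
    """Replay writes stranded in A onto B, skipping entities B has changed since."""
    conflicts: List[str] = []
    for key, base_version, new_value in dump:
        current_version = current.get(key, (0, None))[0]
        if current_version != base_version:
            conflicts.append(key)  # written in B after the switch: leave for a human
            continue
        current[key] = (base_version + 1, new_value)
    return conflicts


store_b: Store = {"entity:1": (3, "old"), "entity:2": (5, "edited in B")}
dump_from_a: List[DumpRecord] = [("entity:1", 3, "newer"), ("entity:2", 4, "written in A")]
print(apply_unreplicated(dump_from_a, store_b))  # ['entity:2']
print(store_b["entity:1"])  # (4, 'newer')
```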

Unfortunately, Bigtable replication on its own isn't quite enough for us to implement the extreme recovery scenario above. We use Bigtable single-row transactions, which let us do read/modify/write operations on multiple columns in a row, to make our datastore writes transactional and consistent. However, Bigtable replication operates at the column value level, not the row level. This means that after a Bigtable transaction in A that updates two columns, one of the new column values could be replicated to B but not the other.

If this happened, and we switched to B without flushing the other column value, the datastore would be internally inconsistent and difficult to recover to a consistent state without the data in A. In our July 2nd outage, it was partly this expectation of internal inconsistency that prevented us from switching to datacenter B when A became unhealthy.
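The hazard is easy to see in a toy model. The sketch below uses hypothetical names and plain dictionaries, not Bigtable's API: a single-row transaction updates two columns atomically in A, but replication ships each column value separately, so a failure partway through leaves B holding half of the transaction.

```python
from typing import Dict, List, Tuple

Table = Dict[str, Dict[str, str]]
# (row key, column, value): the unit that column-level replication ships.
ColumnWrite = Tuple[str, str, str]


def transact(table: Table, row_key: str, updates: Dict[str, str],
             repl_queue: List[ColumnWrite]) -> None:
    """Atomic single-row update in A; replication is queued per column value."""
    row = table.setdefault(row_key, {})
    row.update(updates)  # both columns change together in A
    for column, value in updates.items():
        repl_queue.append((row_key, column, value))


def replicate(table: Table, queue: List[ColumnWrite], n: int) -> None:
    """Apply only the first n queued column writes to B."""
    for row_key, column, value in queue[:n]:
        table.setdefault(row_key, {})[column] = value
    del queue[:n]


dc_a: Table = {"heat:12": {"status": "pending", "winner": ""}}
dc_b: Table = {"heat:12": {"status": "pending", "winner": ""}}
queue: List[ColumnWrite] = []

transact(dc_a, "heat:12", {"status": "final", "winner": "lane 4"}, queue)
replicate(dc_b, queue, 1)  # A fails after one column value has replicated

print(dc_a["heat:12"])  # {'status': 'final', 'winner': 'lane 4'}
print(dc_b["heat:12"])  # {'status': 'final', 'winner': ''}  <- half a transaction
```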

Megastore replication saves the day!

Thankfully, there's a solution to our consistency problem: Megastore replication. Megastore is an internal library on top of Bigtable that supports declarative schemas, multi-row transactions, secondary indices, and, recently, consistent replication across datacenters. The App Engine datastore uses Megastore liberally. We don't need all of its features (declarative schemas, for example), but we've been following the consistent replication feature closely during its development.

Megastore replication is similar to Bigtable replication in that it replicates data across multiple datacenters, but it replicates at the level of entire entity group transactions, not individual Bigtable column values. Furthermore, transactions on a given entity group are always replicated in order. This means that if Bigtable in datacenter A becomes unhealthy, and we must take the extreme option to switch to B before all of the data in A has flushed, B will be consistent and usable. Some writes may be stuck in A and unavailable in B, but B will always be a consistent recent snapshot of the data in A. Some scattered entity groups may be stale, i.e., they may not reflect the most recent updates, but we'd at least be able to start serving from B immediately, as opposed to waiting for A to recover.
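A toy model of the difference, again with hypothetical names rather than Megastore's real interfaces: each entity group keeps an ordered log of committed transactions, and replication ships one whole transaction at a time, in commit order. Wherever replication stops, every entity group in B reflects a complete, committed prefix of its history.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List

Txn = Dict[str, str]  # entity key -> new value, committed atomically


class EntityGroupLog:
    """Ordered, per-entity-group transaction log with a replica cursor."""

    def __init__(self) -> None:
        self.log: DefaultDict[str, List[Txn]] = defaultdict(list)
        self.applied: DefaultDict[str, int] = defaultdict(int)

    def commit(self, group: str, txn: Txn) -> None:
        self.log[group].append(dict(txn))

    def replicate_next(self, group: str, replica: Dict[str, str]) -> bool:
        """Ship the next whole transaction for `group`; False if caught up."""
        position = self.applied[group]
        if position >= len(self.log[group]):
            return False
        replica.update(self.log[group][position])  # all of the transaction, never part
        self.applied[group] = position + 1
        return True


log = EntityGroupLog()
replica_b: Dict[str, str] = {}
log.commit("g1", {"g1/status": "provisional", "g1/winner": "lane 4"})
log.commit("g1", {"g1/status": "final", "g1/winner": "lane 5"})
log.commit("g2", {"g2/status": "final", "g2/winner": "lane 2"})
log.replicate_next("g1", replica_b)
# A fails here: g1 is one transaction behind and g2 is missing, but nothing is half-applied.
print(replica_b)  # {'g1/status': 'provisional', 'g1/winner': 'lane 4'}
```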

To Paxos or not to Paxos

Megastore replication was originally intended to replicate across multiple datacenters synchronously and atomically, using Paxos. Unfortunately, as I described in my Google I/O talk, the latency of Paxos across datacenters is simply too high for a low-level, developer-facing storage system like the App Engine datastore.

Due to that, we've been working with the Megastore team on an alternative: asynchronous, background replication similar to Bigtable's. This system maintains the write latency our developers expect, since it doesn't replicate synchronously (with Paxos or otherwise), but it's still consistent and fast enough that we can switch datacenters at a moment's notice with a minimum of unreplicated data.

Onward and upward

We've had a fully functional version of asynchronous Megastore replication for a while. We've been testing it heavily, working out the kinks, and stressing it to make sure it's as robust as possible. We've also been using it in our internal version of App Engine for a couple of months. I'm excited to announce that we'll be migrating the public App Engine datastore to use it next week, on September 22nd.

This migration does require some datastore downtime. First, we'll switch the datastore to read only mode for a short period, probably around 20-30 minutes, while we do our normal data replication flush, and roll forward any transactions that have been committed but not fully applied. Then, since Megastore replication uses a new transaction log format, we need to take the entire datastore down while we drop and recreate our transaction log columns in Bigtable. We expect this to only take a few minutes. After that, we'll be back up and running on Megastore replication!

As described, Megastore replication will make App Engine much more resilient to hiccoughs and outages in individual datacenters and significantly reduce the likelihood of extended outages. It also opens the door to two new options which will give developers more control over how their data is read and written. First, we're exploring allowing reads from the non-primary datastore if the primary datastore is taking too long to respond, which could decrease the likelihood of timeouts on read operations. Second, we're exploring full Paxos for write operations on an opt-in basis, guaranteeing data is always synchronously replicated across datacenters, which would increase availability at the cost of additional write latency.
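The first of these ideas resembles a familiar hedged-read pattern: give the primary a deadline and, if it misses it, answer from a replica that may be slightly stale. The sketch below is a generic Python illustration under that assumption; `read_with_fallback` and the read functions are hypothetical and not part of any App Engine API.

```python
import concurrent.futures as cf
import time
from typing import Callable, Tuple

_pool = cf.ThreadPoolExecutor(max_workers=4)


def read_with_fallback(primary: Callable[[], str], secondary: Callable[[], str],
                       timeout_s: float) -> Tuple[str, str]:
    """Return (source, value), preferring the primary if it answers in time."""
    future = _pool.submit(primary)
    try:
        return "primary", future.result(timeout=timeout_s)
    except cf.TimeoutError:
        # The primary is slow: accept a possibly stale answer from the replica.
        return "secondary", secondary()


def slow_primary() -> str:
    time.sleep(0.5)
    return "fresh"


def replica() -> str:
    return "maybe stale"


slow_result = read_with_fallback(slow_primary, replica, timeout_s=0.1)
fast_result = read_with_fallback(lambda: "fresh", replica, timeout_s=1.0)
print(slow_result)  # ('secondary', 'maybe stale')
print(fast_result)  # ('primary', 'fresh')
```

The trade-off is the one described above: fewer read timeouts in exchange for occasionally reading data that has not yet replicated.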

Both of these features are speculative right now, but we're looking forward to allowing developers to make the decisions that fit their applications best!

Planning for scheduled maintenance

Finally, a word about our maintenance schedule. App Engine's scheduled maintenance periods usually correspond to shifts in primary application serving between datacenters. Our maintenance periods usually last for about an hour, during which application serving is continuous, but access to the Datastore and memcache may be read-only or completely unavailable.
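An app can keep serving during these windows if it treats failed writes as an expected condition. In the Python SDK, a write attempted while the datastore is read-only raises `CapabilityDisabledError` (from `google.appengine.runtime.apiproxy_errors`). The sketch below defines a local stand-in for that exception so it runs anywhere; `handle_comment_post` and the 503 response text are hypothetical.

```python
from typing import Callable, List, Tuple


class CapabilityDisabledError(Exception):
    """Local stand-in for the SDK exception raised while writes are disabled."""


def handle_comment_post(comment: str, save: Callable[[str], None]) -> Tuple[int, str]:
    """Attempt a write; degrade to a friendly 503 during maintenance."""
    try:
        save(comment)
    except CapabilityDisabledError:
        # Reads may still work, so the rest of the page can render normally.
        return 503, "Posting is paused for scheduled maintenance. Please try again shortly."
    return 200, "Thanks for your comment!"


saved: List[str] = []


def save_normally(comment: str) -> None:
    saved.append(comment)


def save_during_maintenance(comment: str) -> None:
    raise CapabilityDisabledError("datastore is read-only")


ok = handle_comment_post("hello", save_normally)
paused = handle_comment_post("hello again", save_during_maintenance)
print(ok[0], paused[0])  # 200 503
```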

We've recently developed better visibility into when we expect to shift datacenters. This information isn't perfect, but we've heard from many developers that they'd like more advance notice from App Engine about when these maintenance periods will occur. Therefore, we're happy to announce below the preliminary maintenance schedule for the rest of 2009.

  • Tuesday, September 22nd, 5:00 PM Pacific Time (migration to Megastore)
  • Tuesday, November 3rd, 5:00 PM Pacific Time
  • Tuesday, December 1st, 5:00 PM Pacific Time

We don't expect this information to change, but if it does, we'll notify you (via the App Engine Downtime Notify Google Group) as soon as possible. The App Engine team members are personally dedicated to keeping your applications serving without interruption, and we realize that weekday maintenance periods aren't ideal for many. However, we've selected the day of the week and time of day for maintenance to balance disruption to App Engine developers with availability of the full engineering teams of the services App Engine relies upon, like GFS and Bigtable. In the coming months, we expect features like Megastore replication to help reduce the length of our maintenance periods.

Friday, September 4, 2009

App Engine Launcher for Windows

As recently announced on the Google App Engine Blog, the 1.2.5 SDK for Python now includes a GUI for creating, running, and deploying App Engine applications when developing on Windows. We call this the Google App Engine Launcher.

About a year ago, a few of us recognized a need for a client tool to help with App Engine development. In our 20% time, we wrote a launcher for the Mac. Of course, not all App Engine developers have Macs, so more work was needed. Thus, a new crew of 20%ers set off to write a launcher for our App Engine developers on Windows. Although Google is spread out across many offices around the world, it is surprisingly easy to connect with passionate engineers. For example, this new launcher for Windows has contributions from Dave Symonds in Australia, Mark Dalrymple on the east coast, and more engineers here in Mountain View.

The Windows launcher is written in Python and uses wxPython for its GUI. This means (with a little care) the launcher should work on Linux, and we'd like Linux developers to have the option of using it. Although we ship a binary of the Launcher for Windows (thanks to py2exe), shipping binaries for Linux is a bit more challenging. Fortunately, Google has a well-traveled path for solving this problem. For example, Google O3D provides binaries for Windows/Mac; it also provides source code and instructions for building on Linux. Thus inspired, we've open sourced the Windows launcher so that developers can use it on other platforms.

The goal of the launcher is to help make App Engine development quick and easy. There may be other tasks you'd like to integrate (e.g., running tests or re-encoding images before deploying), and with the launcher now open sourced, you can add them! We look forward to seeing contributions from the community.

We have also started the process of open sourcing the Mac version of the launcher. The source code is now available; however, it references some missing Google libraries, so it won't yet compile in its current state. Fortunately, those libraries have also been open sourced, so it will be possible to get things up and running using entirely open source code. I'll be using more of my 20% time to clean up the Mac launcher project in the coming weeks.

We hope the launcher will improve the workflow for App Engine developers. We also hope the source code will enable developers to adapt it to their needs, just as we do on Chrome, my main project. Finally, I am proud to continue a tradition of openness which began with my very first project at Google.

-- John Grabowski, Software Engineer

Let us know how the launcher works for you.

Open Source Code for the App Engine Launcher: for Windows and Linux, and for Mac OS X.

Screenshot of Google App Engine Launcher for Windows

Thursday, September 3, 2009

App Engine SDK 1.2.5 released for Python and Java, now with XMPP support

Today we are releasing version 1.2.5 of the App Engine SDK for both Python and Java, our first simultaneous release across both runtimes. We're excited about the great new functionality in this release ... including XMPP!

XMPP Support

XMPP (or Jabber, as it is sometimes known) is an open standard for communicating in real time (instant messaging). One of the most popular API requests in the App Engine issue tracker has been support for XMPP, so today we are excited to mark that issue closed with the release of our new XMPP API for both Python and Java SDKs!

Like the other APIs that App Engine provides for developers, XMPP is built on the same powerful infrastructure that serves other Google products. In this case, we take advantage of the servers that run Google Talk. This new API allows your app to exchange messages with users on any XMPP-based network, including (but not limited to!) Google Talk. If you're currently participating in the Google Wave developer preview, you can also use the XMPP API to build bots that interact with your waves.

We've tried to make the XMPP API as simple as possible to incorporate into your existing Python or Java applications. We use the same webhook pattern that Cron and Task Queue already use: you send outgoing messages with an API call; you receive incoming messages as an HTTP POST. You can read more about the features of the XMPP API in our documentation (Python, Java).
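To make the webhook pattern concrete, here is a framework-free Python simulation of it. On App Engine the real pieces are `xmpp.send_message()` for outgoing messages and a request handler mapped to `/_ah/xmpp/message/chat/`, to which App Engine POSTs incoming chat messages with `from`, `to`, and `body` form fields once the app enables the `xmpp_message` inbound service. Everything else here (`route`, `deliver`, the outbox list, the JIDs) is a hypothetical stand-in so the sketch runs without the SDK.

```python
from typing import Callable, Dict, List, Tuple
from urllib.parse import parse_qs, urlencode

Outbox = List[Tuple[str, str]]
Handler = Callable[[Dict[str, str], Outbox], None]
ROUTES: Dict[str, Handler] = {}


def route(path: str) -> Callable[[Handler], Handler]:
    def register(handler: Handler) -> Handler:
        ROUTES[path] = handler
        return handler
    return register


def send_message(outbox: Outbox, jid: str, body: str) -> None:
    """Stand-in for the outgoing API call (xmpp.send_message on App Engine)."""
    outbox.append((jid, body))


@route("/_ah/xmpp/message/chat/")
def on_chat(form: Dict[str, str], outbox: Outbox) -> None:
    # Incoming messages arrive as ordinary POST form fields.
    sender = form["from"].split("/")[0]  # drop the XMPP resource part
    send_message(outbox, sender, "You said: " + form["body"])


def deliver(path: str, post_body: str, outbox: Outbox) -> None:
    """Simulates App Engine POSTing an incoming chat message to the app."""
    form = {key: values[0] for key, values in parse_qs(post_body).items()}
    ROUTES[path](form, outbox)


outbox: Outbox = []
incoming = urlencode({"from": "friend@gmail.com/Talk", "to": "myapp@appspot.com", "body": "hi"})
deliver("/_ah/xmpp/message/chat/", incoming, outbox)
print(outbox)  # [('friend@gmail.com', 'You said: hi')]
```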

We're very proud of our first XMPP release, but there's still more work to do. In the future we hope to provide even more functionality to apps, such as user status (presence) and info on new subscriptions. If you have particular requests or feedback, please let us know.

Task Queue API for Java

Python developers have been processing tasks offline using App Engine Task Queues since mid-June, but until now the feature was not available in the App Engine for Java SDK. The 1.2.5 SDK now includes support for creating Tasks and Queues in our Java runtime.

If you're familiar with the Python Task Queue API, the Java version will look very familiar. We use the same webhook pattern as with Cron (and now XMPP). The API provides a simple pattern for creating tasks, assigning them a payload and a worker, and inserting them into queues for scheduling and processing. There's lots of potential with the Task Queue API, so make sure to check out the Java Task Queue Documentation for more details.
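The create-task, attach-payload, enqueue, and deliver-to-worker pattern can be sketched in a few lines. The sketch is in Python rather than Java and only simulates a queue in process: `Task`, `Queue`, `run`, and the `/tasks/tally` URL are hypothetical names. In the Python SDK of this release, the real call is `taskqueue.add(url=..., params=...)` from `google.appengine.api.labs.taskqueue`, and App Engine delivers each task as an HTTP POST to its worker URL.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict

Params = Dict[str, str]


@dataclass
class Task:
    url: str  # the worker webhook that will receive the task
    params: Params = field(default_factory=dict)  # the payload


class Queue:
    def __init__(self, name: str = "default") -> None:
        self.name = name
        self._tasks: Deque[Task] = deque()

    def add(self, task: Task) -> None:
        self._tasks.append(task)

    def run(self, workers: Dict[str, Callable[[Params], None]]) -> int:
        """Deliver queued tasks, in order, to the worker mapped to each URL.
        Unlike a real queue, this sketch does not retry failed tasks."""
        delivered = 0
        while self._tasks:
            task = self._tasks.popleft()
            workers[task.url](task.params)  # stands in for the HTTP POST
            delivered += 1
        return delivered


totals: Dict[str, int] = {}


def tally_worker(params: Params) -> None:
    totals[params["user"]] = totals.get(params["user"], 0) + int(params["count"])


queue = Queue()
queue.add(Task("/tasks/tally", {"user": "alice", "count": "2"}))
queue.add(Task("/tasks/tally", {"user": "alice", "count": "1"}))
print(queue.run({"/tasks/tally": tally_worker}), totals)  # 2 {'alice': 3}
```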

Raising Limits for the Task Queue API

With the 1.2.5 release, we are increasing the daily quota for Task Queue insertions to 100K for billing-enabled apps. Ultimately, we will raise the quota for both free and billing-enabled apps, but we hope this intermediate step opens up new scenarios for our developers using Task Queues.

New App Engine Launcher for Windows

Last but not least, we're very excited that 1.2.5 for Python now includes a Windows-based version of a useful tool that Mac OS X users have been enjoying for some time: the Google App Engine Launcher!

Screenshot of Google App Engine Launcher for Windows

This tool simplifies the process of creating new Python projects, testing them locally, and uploading them to the App Engine servers. In addition, we're releasing the source code for both Mac and Windows App Engine Launchers as open source projects. Watch this space for more details on where you can find the source, and how Linux developers can use the Launcher as well.

1.2.5 also includes the usual set of bug fixes, tweaks, and API polish. For a more detailed look at all the things that have changed this release, take a look at our release notes and, as always, let us know what you think!