Friday, July 15, 2011

Java App Engine outage, July 14, 2011

On July 14, 2011, beginning at 7 PM US/Pacific time (PDT/GMT-7), a subset of Java App Engine applications was affected by a service outage that gradually increased in magnitude. At 9:30 PM US/Pacific, repair work began and started to reduce the effect of the outage; by 11:30 PM US/Pacific, the repair work was complete, restoring normal service to all Java App Engine applications.
During this period, affected applications would have experienced high latency and error rates. This outage occurred shortly after a scheduled maintenance period; however, the outage was not related to the maintenance work.
Overall reliability, quick return to service, and fast, accurate communication to our customers are some of the core goals of Google App Engine's service offering. While we restored service relatively quickly, it's clear to us that we fell short in prompt communication of status updates. We apologize for this, and we'll look at our procedures to improve our performance in this area.
In the meantime, we have a preliminary understanding of the outage, and we are continuing our investigation to ensure that we have fully repaired the root cause. We will publish a detailed postmortem once we have concluded our research. Thanks again for your patience and understanding.

[Edit] Clarification: no HR datastore apps were affected. Overall, the outage resulted in a 1.9% error rate, affecting approximately 0.005% of all App Engine traffic at peak.

Posted by Wesley Chun, Google App Engine team


Eric Gonzalez said...

Were any of the clients affected by the outage using the High Replication Datastore?

The App Engine Team said...

No, but it was not a datastore issue.