Google App Engine Blog: Developer Insights: Teaching thousands of students to program on Udacity with App Engine (part 2)

This post is the second of our two-part series discussing how Udacity uses Google App Engine.

Today’s guest blogger is Chris Chew, senior software engineer at Udacity, which offers free online courses in programming and other subjects. Chris shares how Udacity itself is built using App Engine.

Steve Huffman blogged yesterday about how App Engine enables the project-based learning that makes his web development course so powerful. People are often surprised to learn that Udacity itself is built on App Engine.

The choice to use App Engine originally came from Mike Sokolsky, our CTO and cofounder, after his experience keeping the original version of our extremely popular AI course running on a series of virtual machines. Mike found App Engine’s operational simplicity extremely compelling after weeks of endlessly spinning up additional servers and administering MySQL replication in order to meet the crazy scale patterns we experience.

Close to a year later, with ten months of live traffic on App Engine, we continue to be satisfied customers. While there are a few things we do outside App Engine, our choice to continue using App Engine for our core application is clear: We prefer to spend our time figuring out how to scale personalized education, not memcached. App Engine’s infrastructure is better than what we could build ourselves, and it frees us to focus on behavior rather than operations.

How Udacity Uses App Engine

The App Engine features we use most include a pretty broad swath of the platform:

High Replication Datastore with NDB
Memcache
Task Queues - Deferred execution, MapReduce, batch jobs
App Engine Search API -- Indexing both course content and student résumés
Blobstore API -- Lecture videos, résumés, data exportation
Image API - Thumbnail generation
MapReduce API - Daily usage analytics, data migrations, data maintenance

A high-level representation of our “stack” looks something like this:

Trails and Trove are two libraries developed in-house mainly by Piotr Kaminski. Trails supplies very clean semantics for creating families of RESTful endpoints on top of a webapp2.RequestHandler with automagic marshalling. Trove is a wrapper around NDB that adds common property types (e.g. efficient dereferencing of key properties), yet another layer of caching for entities with relations (both in-process and memcache), and an event “watcher” framework for reliably triggering out-of-band processing when data changes.

Something notable that is not represented in the drawing above is a specific set of monkey patches from Trove we apply to NDB to create better hooks similar to the existing pre/post-put/delete hooks. These custom hooks power a “watcher” abstraction that provides targeted pieces of code the opportunity to react to changes in the data layer. Execution of each watcher is deferred and runs outside the scope of the request so as to not increase response times.

Latency

During our first year of scaling on App Engine we learned its performance is a complex thing to understand. Response time is a function of several factors both inside and outside our control. App Engine’s ability to “scale-out” is undeniable, but we have observed high variance in response times for a given request, even during periods with low load on the system. As a consequence we have learned to do a number of things to minimize the impact of latency variance:

Converting usage of the old datastore API to the new NDB API
Using NDB.tasklet coroutines as much as possible to enable parallelism during blocking RPC operations
Not indexing fields by default and adding an index only when we need it for a query
Carefully avoiding index hotspots by indexing fields with predictable values only when necessary (i.e. auto-now DateTime and enumerated “choices” String properties).
Materializing data views very aggressively so we can limit each request to the fewest datastore queries possible

This last point is obvious in the sense that naturally you get faster responses when you do less work. But we have taken pre-materializing views to an extreme level by denormalizing several aspects of our domain into read-optimized records. For example, the read-optimized version of a user’s profile record might contain standard profile information, plus privacy configuration, course enrollment information, course progress, and permissions -- all things a data modeler would normally want to store separately. We pile it together into the equivalent of a materialized view so we can fetch it all in one query.

Conclusion

App Engine is an amazingly complete and reliable platform that works astonishingly well for a huge number of use cases. It is very apparent the services and APIs have been designed by people who know how to scale web applications, and we feel lucky to have the opportunity to ride on the shoulders of such giants. It is trivial to whip together a proof-of-concept for almost any idea, and the subsequent work to scale your app is significantly less than if you had rolled your own infrastructure.

As with any platform, there are tradeoffs. The tradeoff with App Engine is that you get an amazing suite of scale-ready services at the cost of relentlessly optimizing to minimize latency spikes. This is an easy tradeoff for us because App Engine has served us well through several exciting usage spikes and there is no question the progress we have already made towards our mission is significantly more than if we were also building our own infrastructure. Like most choices in life, this choice can be boiled down to a bumper sticker:

Editor’s note: Chris Chew and Steve Huffman will be participating in a Google Developers Live Hangout tomorrow, Thursday, November 1st, check it out here and submit your questions for them to answer live on air.

-Contributed by Chris Chew, Senior Software Engineer, Udacity

Posted by Zafir Khan, Product Marketing Manager, Google App Engine