Google App Engine Blog: October 2012

Wednesday, October 31, 2012

Developer Insights: Teaching thousands of students to program on Udacity with App Engine (part 2)

This post is the second of our two-part series discussing how Udacity uses Google App Engine.

Today’s guest blogger is Chris Chew, senior software engineer at Udacity, which offers free online courses in programming and other subjects. Chris shares how Udacity itself is built using App Engine.

Steve Huffman blogged yesterday about how App Engine enables the project-based learning that makes his web development course so powerful. People are often surprised to learn that Udacity itself is built on App Engine.

The choice to use App Engine originally came from Mike Sokolsky, our CTO and cofounder, after his experience keeping the original version of our extremely popular AI course running on a series of virtual machines. Mike found App Engine’s operational simplicity extremely compelling after weeks of endlessly spinning up additional servers and administering MySQL replication in order to meet the crazy scale patterns we experience.

Close to a year later, with ten months of live traffic on App Engine, we continue to be satisfied customers. While there are a few things we do outside App Engine, our choice to continue using App Engine for our core application is clear: We prefer to spend our time figuring out how to scale personalized education, not memcached. App Engine’s infrastructure is better than what we could build ourselves, and it frees us to focus on behavior rather than operations.

How Udacity Uses App Engine

The App Engine features we use most include a pretty broad swath of the platform:

High Replication Datastore with NDB
Memcache
Task Queues - Deferred execution, MapReduce, batch jobs
App Engine Search API - Indexing both course content and student résumés
Blobstore API - Lecture videos, résumés, data exportation
Image API - Thumbnail generation
MapReduce API - Daily usage analytics, data migrations, data maintenance

A high-level representation of our “stack” looks something like this:

Trails and Trove are two libraries developed in-house mainly by Piotr Kaminski. Trails supplies very clean semantics for creating families of RESTful endpoints on top of a webapp2.RequestHandler with automagic marshalling. Trove is a wrapper around NDB that adds common property types (e.g. efficient dereferencing of key properties), yet another layer of caching for entities with relations (both in-process and memcache), and an event “watcher” framework for reliably triggering out-of-band processing when data changes.

Something notable that is not represented in the drawing above is a specific set of monkey patches from Trove we apply to NDB to create better hooks similar to the existing pre/post-put/delete hooks. These custom hooks power a “watcher” abstraction that provides targeted pieces of code the opportunity to react to changes in the data layer. Execution of each watcher is deferred and runs outside the scope of the request so as to not increase response times.

Latency

During our first year of scaling on App Engine we learned its performance is a complex thing to understand. Response time is a function of several factors both inside and outside our control. App Engine’s ability to “scale-out” is undeniable, but we have observed high variance in response times for a given request, even during periods with low load on the system. As a consequence we have learned to do a number of things to minimize the impact of latency variance:

Converting usage of the old datastore API to the new NDB API
Using ndb.tasklet coroutines as much as possible to enable parallelism during blocking RPC operations
Not indexing fields by default and adding an index only when we need it for a query
Carefully avoiding index hotspots by indexing fields with predictable values only when necessary (e.g. auto-now DateTime and enumerated “choices” String properties).
Materializing data views very aggressively so we can limit each request to the fewest datastore queries possible

This last point is obvious in the sense that naturally you get faster responses when you do less work. But we have taken pre-materializing views to an extreme level by denormalizing several aspects of our domain into read-optimized records. For example, the read-optimized version of a user’s profile record might contain standard profile information, plus privacy configuration, course enrollment information, course progress, and permissions -- all things a data modeler would normally want to store separately. We pile it together into the equivalent of a materialized view so we can fetch it all in one query.
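A minimal sketch of that kind of read-optimized record, using plain dictionaries rather than the datastore. The "tables", field names, and the `materialize_profile` helper are all assumptions made for illustration; they are not Udacity's schema.

```python
# Hypothetical normalized "tables", keyed by user id.
profiles = {"u1": {"name": "Ada", "email": "ada@example.com"}}
privacy = {"u1": {"show_email": False}}
enrollments = {"u1": ["cs253", "cs373"]}
progress = {"u1": {"cs253": 0.8, "cs373": 0.1}}

# The read-optimized store: one record per user, one lookup per request.
profile_views = {}


def materialize_profile(user_id):
    """Rebuild the denormalized view; call this whenever a source record changes."""
    view = {
        "profile": dict(profiles[user_id]),
        "privacy": dict(privacy[user_id]),
        "courses": [
            {"id": c, "progress": progress[user_id].get(c, 0.0)}
            for c in enrollments[user_id]
        ],
    }
    profile_views[user_id] = view
    return view


def get_profile_page(user_id):
    """Serve a request from the single pre-built record."""
    return profile_views[user_id]


materialize_profile("u1")
page = get_profile_page("u1")

# A write to any source must refresh the view, or readers see stale data.
progress["u1"]["cs253"] = 1.0
stale = get_profile_page("u1")["courses"][0]["progress"]   # still the old value
materialize_profile("u1")
fresh = get_profile_page("u1")["courses"][0]["progress"]
```

The cost of the single-read path is visible in the last few lines: every write to a source record has to trigger a rebuild of the view, which is the kind of out-of-band work a watcher can do.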

Conclusion

App Engine is an amazingly complete and reliable platform that works astonishingly well for a huge number of use cases. It is very apparent the services and APIs have been designed by people who know how to scale web applications, and we feel lucky to have the opportunity to ride on the shoulders of such giants. It is trivial to whip together a proof-of-concept for almost any idea, and the subsequent work to scale your app is significantly less than if you had rolled your own infrastructure.

As with any platform, there are tradeoffs. The tradeoff with App Engine is that you get an amazing suite of scale-ready services at the cost of relentlessly optimizing to minimize latency spikes. This is an easy tradeoff for us because App Engine has served us well through several exciting usage spikes and there is no question the progress we have already made towards our mission is significantly more than if we were also building our own infrastructure. Like most choices in life, this choice can be boiled down to a bumper sticker:

Editor’s note: Chris Chew and Steve Huffman will be participating in a Google Developers Live Hangout tomorrow, Thursday, November 1st. Check it out here and submit your questions for them to answer live on air.

-Contributed by Chris Chew, Senior Software Engineer, Udacity

Posted by Zafir Khan, Product Marketing Manager, Google App Engine

Tuesday, October 30, 2012

Developer Insights: Teaching thousands of students to program on Udacity with App Engine, Steve Huffman (part 1)

This post is the first of our two-part series discussing how Udacity uses Google App Engine.

Today’s guest blogger is Steve Huffman, founder of Reddit and Hipmunk. Steve recently taught a web development course at Udacity, which offers free online courses in programming and other subjects. Steve shares his experience using Google App Engine to teach the course.

This past spring I had the pleasure of teaching a course for Udacity, an online education company putting high quality college level courses online for free. I was recruited to Udacity by a former college professor and friend of mine, Dave Evans, Udacity's VP of Education.

When I was a Computer Science student at the University of Virginia, I was fortunate to take a cryptology course taught by Professor Evans. He presented us two ways to get an A in this course. We could either do it the old-fashioned way--do well on tests and homework and complete a course-long project of our choosing; or, we could break into his computer and set our grade to an A. Naturally, we pretended to do the former, while spending our evenings huddled outside Professor Evans' house working on the latter. My team received A's.

It was one of the first times I felt I was not just completing course objectives as a student, but thinking about real-world problems as a computer scientist. When Professor Evans emailed me early this year inquiring whether I’d be interested in teaching a course on Web Development, I said, “Yes!” long before my brain had a chance to remind me that I already had a full-time job.

The course I taught was CS 253: Web Development, which aimed to teach students the fundamentals of building web applications. I’ve always wanted to teach-- it’s one of my favorite aspects of my job at Hipmunk. Web Development in particular is appealing because not only is it, in my opinion, the world’s most valuable profession, but even starting from scratch it doesn’t take much time to acquire the skills to build a site that can change the world. I wanted my course to leave students with such skills.

Choosing a platform for CS253

The course would be divided into seven one-hour lectures. After completing the seven lessons, students would have the skills to build a fully-functional blog platform from the ground-up, user accounts and all. I knew from experience that there is a dark side to web development: system administration. You can write all the fancy software you want that works on your own machine, but actually getting it online can be quite the pain. Where do you host it? Which database will you use? Do you know how to install such a database? Do you know how to configure a web server?

Learning the basics of web development in seven lessons was going to be challenging enough; I didn’t want students to have to deal with learning how to be system administrators at the same time. Fortunately, we decided in our first meeting that Google App Engine was the right tool for this course. Although I had never used it myself, the idea of it seemed to fit perfectly. Students would be able to write fully-functional web applications without having to deal with the tedium of installing web servers and databases; at least that was the plan. To be honest, I was a little skeptical at first, but I also didn’t have much of a choice--I wasn’t about to waste any time explaining how to get PostgreSQL running in Windows.

Reflections on App Engine

App Engine turned out to be one of the best decisions we made. Here are a couple of reasons why:

Write locally, deploy globally.
With App Engine, you can develop and run your application on your own machine, database and all, and with a simple command, deploy your application to Google’s servers and have it run identically on the Internet. When this worked for the first time for me, I was blown away. I’ve spent a significant, perhaps embarrassing, amount of time deploying code over the years. To see it happen in just a few seconds was astonishing.

Students being able to get their code running on the Internet with almost no hassle was one of the most important aspects of my course. First, it gave the students an immediate sense of power. After the first lesson, they would have their own code running live on the Internet! Second, it enabled a really nice mechanic of the course--each lesson would end with an assignment to add a feature to their live website. We could then grade these assignments in real-time. All the students had to do was submit a URL.

Excellent documentation.
App Engine’s documentation is superb. I tried to focus the majority of the course on high-level concepts common to all web development platforms; however, it was unavoidable that many parts of the course are specific to App Engine itself. For better or for worse, many of the App Engine concepts I taught I had learned only moments before. I got to know and appreciate that documentation very well. Even some of the more subtle concepts, like how the Datastore deals with replication lag, were all there and clearly explained.

The perfect level of abstraction.
A trap many beginner web developers fall into is starting with a very abstract web framework like Rails. While Rails enables a beginner to write simple apps quickly, and allows pros to appear to be wizards, it masks a lot of really important web concepts. App Engine sits at just the right level of abstraction for beginners and pros alike. I think it’s critically important to understand the difference between a GET and a POST, what HTTP headers look like, and how cookies work, for example. These aren’t difficult concepts, and having a deep understanding of them early will carry prospective developers far.

Auto-scaling.
While most of the students’ work in my course was probably used only by themselves and our grading scripts, we did spend a fair amount of time discussing how to design web applications that can support millions of users. Once you understand how to design an application that can run across many machines, App Engine will take care of the challenge of actually launching those machines when required and deploying your code seamlessly. Should the website I built during the course lessons, asciichan.com, ever hit the big time, I can rest assured App Engine will scale it with no effort from me.

Conclusion

Teaching CS 253 was a tremendous experience. To date, over 57,000 students have enrolled in the course! Check out some of the cool sites they built on App Engine after just seven lessons:

This is a project I’m incredibly proud of, and I’m deeply thankful to the folks at Udacity for giving me the opportunity. Furthermore, I’m grateful to Google and the App Engine team for building such a strong product. CS 253 could not have worked without it.

Editor’s note: Stay tuned tomorrow for the second post in this series by Chris Chew, one of the developers at Udacity. He’s going to explain how the Udacity team uses App Engine to power the courses themselves. Also, Steve and Chris will be participating in a Google Developers Live Hangout this Thursday, November 1st. Check it out here and submit your questions for them to answer live on air.

-Contributed by Steve Huffman, Founder, Reddit and Hipmunk

Posted by Zafir Khan, Product Marketing Manager, Google App Engine

Friday, October 26, 2012

About today's App Engine outage

This morning we failed to live up to our promise, and Google App Engine applications experienced increased latencies and time-out errors.

We know you rely on App Engine to create applications that are easy to develop and manage without having to worry about downtime. App Engine is not supposed to go down, and our engineers work diligently to ensure that it doesn’t. However, from approximately 7:30 to 11:30 AM US/Pacific, about 50% of requests to App Engine applications failed.

Here’s what happened, from what we know today:

Summary

4:00 am - Load begins increasing on traffic routers in one of the App Engine datacenters.
6:10 am - The load on traffic routers in the affected datacenter passes our paging threshold.
6:30 am - We begin a global restart of the traffic routers to address the load in the affected datacenter.
7:30 am - The global restart plus additional load unexpectedly reduces the count of healthy traffic routers below the minimum required for reliable operation. This causes overload in the remaining traffic routers, spreading to all App Engine datacenters. Applications begin consistently experiencing elevated error rates and latencies.
8:28 am - google-appengine-downtime-notify@googlegroups.com is updated with notification that we are aware of the incident and working to repair it.
11:10 am - We determine that App Engine’s traffic routers are trapped in a cascading failure, and that we have no option other than to perform a full restart with gradual traffic ramp-up to return to service.
11:45 am - Traffic ramp-up completes, and App Engine returns to normal operation.

In response to this incident, we have increased our traffic routing capacity and adjusted our configuration to reduce the possibility of another cascading failure. Multiple projects have been in progress to allow us to further scale our traffic routers, reducing the likelihood of cascading failures in the future.

During this incident, no application data was lost and application behavior was restored without any manual intervention by developers. There is no need to make any code or configuration changes to your applications.

We will proactively issue credits to all paid applications for ten percent of their usage for the month of October to cover any SLA violations. This will appear on applications’ November bills. There is no need to take any action to receive this credit.

We apologize for this outage, and in particular for its duration and severity. Since launching the High Replication Datastore in January 2011, App Engine has not experienced a widespread system outage. We know that hundreds of thousands of developers rely on App Engine to provide a stable, scalable infrastructure for their applications, and we will continue to improve our systems and processes to live up to this expectation.

- Posted by Peter S. Magnusson, Engineering Director, Google App Engine

Tuesday, October 23, 2012

App Engine 1.7.3 Released

For our October release we have a number of offerings, fixes, and small refinements as colorful as the fall season.

General Enhancements

Django 1.4 is now fully supported for Python 2.7

Java classloading priority can now be granted to specific JAR files. This is an experimental feature. More information can be found here.

App Engine SDK support for Java 7

With Java 7, many new language enhancements have been added, including:

The ability to use the String class in switch statements

Expression of binary literals using simple prefixes 0b or 0B

Single catch blocks that can handle multiple exceptions

Improved type inference for generic instance creation

Auto closing of resources when enclosed within a try-with-resources statement

Simplified varargs method invocation

We’re happy to announce that as of this release the App Engine Java SDK has support for running and testing your applications using a local Java 7 installation. To get started now, developers should download the latest App Engine Java SDK and a Java SE 7 or JDK 7 distribution. From there they can follow the existing documentation for running and testing applications locally.

In an upcoming release, we will be including some of the new Java 7 functionality as well as formal Java 7 support within the App Engine Java runtime. Before this is available, we strongly encourage developers to start testing their applications using Java 7 and the latest App Engine Java SDK.

And while Java 7 support is not yet available within the App Engine Java runtime, developers interested in an early preview can sign up for our trusted tester program.

Want more information?

The complete list of features and bug fixes for 1.7.3 can be found in our release notes. For App Engine coding questions and answers check us out on Stack Overflow, and for general discussion and feedback, find us on our Google Group.

- Posted by the Google App Engine Team

Thursday, October 18, 2012

Developer Insights: Building scalable social games on App Engine

Today’s guest blogger is Hernan Liendo, CTO of Zupcat, developer of social games played by millions of people worldwide. Hernan shares his team’s experience using App Engine to build RaceTown, a multiplayer racing game.

Choosing a cloud service provider

RaceTown is one of Zupcat’s most popular games; it has almost 900,000 monthly unique users, opens more than 40,000 connections via the Channel API per day, processes more than 15,000 queries per second and delivers terabytes of content every day. When deciding our architecture, we took into account several unique requirements of social games:

High uptime

Short loading time

Flexibility to deal with social network API changes

Ability to manage thousands of players, concurrently, from all over the world

Adjustment to capabilities and performance issues on different users’ computers

Ability to measure user actions to constantly improve the user experience

Hosting and delivering quality, beautiful game art

Complex game domains and algorithms: such as enemy adaptable performance, path finding, and 2D and 3D rendering

App Engine addresses these complicated issues. It provides few tracerouting hops from almost anywhere in the world, great uptime, automatic scalability, no need for infrastructure monitoring and a reasonable price for content delivery.

Implementing App Engine

The diagram above shows a simplified view of our game architecture. We’ve discovered that App Engine is good to use not only as a game backend server, but also as a metrics server and content delivery network. In addition, we periodically synchronize game state and retrieve data to and from the server.

The App Engine Datastore is great because it has high availability and easily handles hundreds of millions of rows of data, which is important for social games. For example, we can easily scan the Datastore to present high score information and gamer stats to the user. Additionally, because gamers tend to spend a lot of time in a game session, we’ve found it’s helpful to cache game data. Using Memcache, we have significantly reduced Datastore API calls and lowered users’ waiting time.
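The caching approach can be illustrated with a small read-through cache. This is a sketch under stated assumptions, not Zupcat's code: a dict stands in for Memcache, another dict stands in for the Datastore, and the `ReadThroughCache` class and the player record are hypothetical.

```python
class ReadThroughCache:
    """Toy read-through cache; a dict stands in for Memcache."""

    def __init__(self, load):
        self._load = load          # function that reads from the slow store
        self._cache = {}
        self.store_reads = 0       # how many reads reached the slow store

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.store_reads += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached entry after the underlying record is written."""
        self._cache.pop(key, None)


# Hypothetical game data keyed by player id.
datastore = {"player:42": {"car": "roadster", "high_score": 9100}}

cache = ReadThroughCache(lambda key: datastore[key])

for _ in range(5):                 # five reads during one game session
    state = cache.get("player:42")

reads_before_write = cache.store_reads   # only the first read hit the store

datastore["player:42"] = {"car": "roadster", "high_score": 9500}
cache.invalidate("player:42")
updated = cache.get("player:42")
```

A real cache can also evict or expire entries at any time, so the loader must always be able to rebuild a value from the store.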

Another tip for App Engine developers - although App Engine API failures are uncommon, you must be sure to write proper retrying code to minimize the possibility of exposing users to an application crash. RaceTown performs almost a hundred million operations daily, and proper client side retrying algorithms have enabled us to reduce failure rates to very low levels.
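One common way to write such retrying code is exponential backoff with jitter. The post does not say which algorithm RaceTown uses, so the sketch below is only an example of the general technique; `TransientError`, `call_with_retries`, and `flaky_save` are hypothetical names.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def call_with_retries(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call `operation`, retrying on TransientError with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise                      # out of attempts: surface the error
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))


# Demo: an operation that fails twice, then succeeds.
calls = {"count": 0}


def flaky_save():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary failure")
    return "saved"


delays = []                                # record delays instead of sleeping
result = call_with_retries(flaky_save, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the demo instant; in production the default `time.sleep` applies. Retries are only safe for operations that can be repeated without side effects, or that are made idempotent some other way.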

Final thoughts

I believe that today there is no technology that matches App Engine. You can run your code and store your data in the very same servers that Google uses. Migrating your applications to this technology means you have to start thinking in a cloud-centric way and reinvent your architecture to stop working inside a relational database and classic clustered web server.

If you can achieve this, your products will be delivered using the same infrastructure that Google uses, without a huge corporate budget!

- Contributed by Hernan Liendo, @hernanliendo of Zupcat, @zupcat

Tuesday, October 9, 2012

Developer Insights: Streak brings CRM to the inbox with Google Cloud Platform

Cross-posted with the Google Developers Blog

Today’s guest blogger is Aleem Mawani, co-founder of Streak, a startup alum of Y Combinator, a Silicon Valley incubator. Streak is a CRM tool built into Gmail. Aleem shares his experience building and scaling their product using Google Cloud Platform.

Everyone relies on email to get work done – yet most people use separate applications from their email to help them with various business processes. Streak fixes this problem by letting you do sales, hiring, fundraising, bug tracking, product development, deal flow, project management and almost any other business process right inside Gmail. In this post, I want to illustrate how we have used Google Cloud Platform to build Streak quickly, scalably and with the ability to deeply analyze our data.

We use several Google technologies on the backend of Streak:

App Engine to serve our app

App Engine Datastore to persist user data

Memcache to make operations fast

BigQuery to analyze our logs and power dashboards

App Engine Search API to let users sift through their data

Prediction API to apply machine learning to user data

Google Translate API to translate our app to over 40 languages.

Our core learning is that you should use the best tool for the job. No one technology will be able to solve all your data storage and access needs. Instead, for each type of functionality, you should use a different service. In our case, we aggressively mirror our data in all the services mentioned above. For example, although the source of truth for our user data is in the App Engine Datastore, we mirror that data in the App Engine Search API so that we can provide full text search, Gmail style, to our users. We also mirror that same data in BigQuery so that we can power internal dashboards.

System Architecture

App Engine - We use App Engine for Java primarily to serve our application to the browser and mobile clients in addition to serving our API. App Engine is the source of truth for all our data, so we aggressively cache using Memcache. We also use Objectify to simplify access to the Datastore, which I highly recommend.

Google Cloud Storage - We mirror all of our Datastore data as well as all our log data in Cloud Storage, which acts as a conduit to other Google cloud services. It lets us archive the data as well as push it to BigQuery and the Prediction API.

BigQuery - Pushing the data into BigQuery allows us to run non-realtime queries that can help generate useful business metrics and slice user data to better understand how our product is getting used. Not only can we run complex queries over our Datastore data but also over all of our log data. This is incredibly powerful for analyzing the request patterns to App Engine. We can answer questions like:

Which requests cost us the most money?
What is the average response time for every URL on our site over the last 3 days?

BigQuery helps us monitor error rates in our application. We process all of our log data with debug statements, as well as something called an “error type” for any request that fails. If it’s a known error, we'll log something sensible, and we log the exception type if we haven’t seen it before. This is beneficial because we built a dashboard that queries BigQuery for the most recent errors in the last hour grouped by error type. Whenever we do a release, we can monitor error rates in the application really easily.
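The grouping behind such a dashboard can be sketched locally in Python. This is not Streak's query and it does not call BigQuery: the log row layout, the `KNOWN_ERRORS` mapping, and both helper functions are assumptions made to show the "error type, last hour, grouped and counted" logic.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical mapping from exception class names to readable error types.
KNOWN_ERRORS = {"DeadlineExceededError": "datastore_timeout"}


def error_type(exception_name):
    """Use the readable label for known errors, else fall back to the exception type."""
    return KNOWN_ERRORS.get(exception_name, exception_name)


def recent_errors_by_type(log_rows, now, window=timedelta(hours=1)):
    """Count failed requests inside the window, grouped by error type, most common first."""
    cutoff = now - window
    counts = Counter(
        row["error_type"]
        for row in log_rows
        if row["error_type"] is not None and row["time"] >= cutoff
    )
    return counts.most_common()


now = datetime(2012, 10, 9, 12, 0)
log_rows = [
    {"time": now - timedelta(minutes=5), "error_type": error_type("DeadlineExceededError")},
    {"time": now - timedelta(minutes=20), "error_type": error_type("KeyError")},
    {"time": now - timedelta(minutes=30), "error_type": error_type("DeadlineExceededError")},
    {"time": now - timedelta(minutes=45), "error_type": None},  # successful request
    {"time": now - timedelta(hours=3), "error_type": error_type("KeyError")},  # outside window
]

dashboard = recent_errors_by_type(log_rows, now)
```

Comparing this table just before and just after a release makes a jump in any one error type easy to spot.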

A Streak dashboard powered by BigQuery showing current usage statistics

In order to move the data into Cloud Storage from the Datastore and LogService, we developed an open source library called Mache. It’s a drop-in library that can be configured to automatically push data into BigQuery via Cloud Storage. The data can come from the Datastore or from LogService and is very configurable - feel free to contribute and give us feedback on it!

Google Cloud Platform also makes our application better for our users. We take advantage of the App Engine Search API and again mirror our data there. Users can then query their Streak data using the familiar Gmail full text search syntax, for example, “before:yesterday name:Foo”. Since we also push our data to the Prediction API, we can help users throughout our app by making smart suggestions. In Streak, we train models based on which emails users have categorized into different projects. Then, when users get a new email, we can suggest the most likely box that the email belongs to.

One issue that arises is how to keep all these mirrored data sets in sync. It works differently for each service based on the architecture of the service. Here’s a simple breakdown:

Having these technologies easily available to us has been a huge help for Streak. It makes our products better and helps us understand our users. Streak’s user base grew 30% every week for 4 consecutive months after launch, and we couldn’t have scaled this easily without Google Cloud Platform. To read more details on why Cloud Platform makes sense for our business, check out our case study and our post on the Google Enterprise blog.

-Contributed by Aleem Mawani, co-founder of Streak

Tuesday, October 2, 2012

Jenkins, meet Google App Engine

Today’s guest post comes from Ryan Campbell and Stephen Connolly, developers at CloudBees. CloudBees is a major supporter of Jenkins, the popular open source continuous integration server, and the creator of DEV@Cloud, a hosted version of Jenkins.

As development teams grow, it becomes increasingly hard to ensure that their work is in sync. Jenkins is one of the leading tools to combat this issue. Jenkins provides a process where work across your team is automated, so that building, testing and deployment are all centralized in one location. As a major supporter of Jenkins, CloudBees has helped companies streamline their build, test and deployment processes both on premise, and in the cloud via CloudBees DEV@Cloud.

Google App Engine users can now run Jenkins continuous integration in the cloud by signing up at appengine.cloudbees.com. Jenkins will monitor your projects’ source code for any changes, run the necessary builds and tests, and notify your team of any problems - or automatically deploy the application to Google App Engine if everything looks good. This process helps to prevent the deployment of broken code, and gives everyone a central record of what changes went into each deployment. If you’re new to continuous integration and Jenkins, the Jenkins wiki is a great place to get started.

The video below shows you how to set up a Jenkins Maven job that checks out the source code, builds the application, runs any tests, and then deploys to Google App Engine. Note that you can use virtually any source code service you like, including GitHub or CloudBees’ own Git and SVN servers.

Once you have a basic build working, you can integrate additional online services into your Jenkins workflow, like Sauce Labs for browser-based tests, Sonar for code analysis, or JFrog Artifactory as an artifact repository manager. These and several other CloudBees services can be automatically subscribed to using the Services link in your toolbar.

In summary, CloudBees Jenkins for Google App Engine is unique in several ways:

It's fully managed, which means you don't have to set up Jenkins or the build machines you need.

You always have enough build capacity -- we dynamically add more build machines as you need them.

CloudBees Jenkins is free to get started.

Sign up for CloudBees DEV@Cloud at appengine.cloudbees.com to have Jenkins monitor the health of your projects, and automatically deploy your applications to Google App Engine. You only need your Google App Engine account; no credit cards or command lines are required. Now, you can focus on delivering features, and rely on CloudBees to manage the development infrastructure.

- Contributed by Ryan Campbell and Stephen Connolly, developers at CloudBees.

Monday, October 1, 2012

New Google BigQuery Launch includes Datastore Import for Trusted Testers

Google BigQuery, a service for performing real-time analytics on your data, launched several exciting new features this morning. Part of the release is a new feature which enables you to import data from our experimental Datastore backup tool directly into BigQuery for analysis. We are opening up this capability to a small group of trusted testers - please sign up here if you’re interested.

The BigQuery launch also includes support for JSON and its nested/repeated structure as well as significant improvements to the data loading pipeline. Check out their post on the Google Developers Blog for more details on their latest release.

- Posted by the Google App Engine Team
