Showing posts with label developer-insights. Show all posts

Monday, February 4, 2013

Scaling SongPop to 60 million users with App Engine and Google Cloud Storage


Continuing our Developer Insights series, today’s guest bloggers are Olivier Michon, CTO, and Alexis Hanicotte, software engineer, from FreshPlanet, maker of the popular mobile application SongPop.  SongPop is a social application where players compete to be the fastest to guess the name of a song or artist.

SongPop is a social mobile app where players compete to be the fastest to guess a song’s artist or title. It has been a huge success for our very small team of just 6 engineers. We now have more than 60 million users, were the number 5 most downloaded iOS game of 2012, and went from 0 to more than 10k queries/second on our servers in less than 6 months. This has been possible in large part because we run on Google App Engine.

App Engine allows us to quickly prototype, iterate and release our games.  We’ve been accumulating experience with the platform since 2009, but really saw the power of autoscaling once SongPop became a hit.

Our experience scaling with App Engine

SongPop was released in May 2012.  During our growth to our first 100,000 daily active users (DAU), our App Engine backend scaled smoothly.  This allowed us to spend our time making actual improvements to the game experience, while our user growth continued at a rapid pace.

We opened a Premier Account with App Engine around the time we reached 100,000 DAU, which gave us access to live customer support.  It came in handy when we encountered two periods of downtime, one lasting just 10 minutes and the other an hour.  We were able to promptly reach Premier Support and have Google engineers investigate these issues with us.

Once we reached 500,000 DAU, we applied a variety of optimization ideas to reduce latency.  For example, we used to have user data spread over many models, but we combined them into a single entity in the Datastore to reduce read operations.  We also often needed the list of a user’s opponents, so instead of querying every time, we cached the result in Memcache.  It took just one engineer and 4 days of work to reduce latency with these optimizations.

As we reached the milestone of 1 million DAU, some Datastore queries (used to find random opponents in the game) showed high latencies and a high rate of timeouts.  We enforced deadlines, implemented better fallbacks, and identified, with the help of Premier Support, that the degraded performance came from queries relying on many different indexed properties.  The solution was simple: either add a composite index covering all the properties we needed, or combine them into a single property.
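A composite index of this kind is declared in the app’s index.yaml. The kind and property names below are hypothetical, just to show the shape:

```yaml
# index.yaml -- illustrative only; not SongPop's actual schema.
indexes:
- kind: Player
  properties:
  - name: language
  - name: skill_level
  - name: last_active
    direction: desc
```

With one composite index covering the query, the Datastore no longer has to merge results from several single-property indexes at query time.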

Using App Engine + Google Cloud Storage

For each game session, our users need to download song samples in order to play. It is critical that this data is delivered quickly and reliably wherever the user is located. We chose Google Cloud Storage for this use case.  It has provided high-performance content delivery, allowing us to serve 17 terabytes/day of songs and images worldwide.

In addition to its reliability, Cloud Storage is great because of its integration with App Engine. We can easily read and write files from our application to Cloud Storage using the same syntax as we would use to write local files (using Python). We found it intuitive and convenient because you do not have to manage opaque keys to retrieve your files (just use the path you specified), and you can browse your files through a directory-like structure. Cloud Storage also allows you to manage access rights, can be used with Google BigQuery, and it is priced affordably compared to other solutions we considered.
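The two properties called out above, no opaque keys (the path is the key) and directory-like browsing, are easy to picture with a small self-contained stand-in; the real code uses the Cloud Storage API from App Engine, and all names here are illustrative:

```python
# Stand-in modeling Cloud Storage's path-as-key access pattern:
# objects are written and read by path, and "directories" are just
# shared path prefixes you can list.

class PathStore(object):
    def __init__(self):
        self._objects = {}

    def write(self, path, data):
        self._objects[path] = data

    def read(self, path):
        return self._objects[path]

    def listdir(self, prefix):
        # Browse files through a directory-like structure.
        return sorted(p for p in self._objects if p.startswith(prefix))

store = PathStore()
store.write("/songpop/samples/track-001.mp3", b"...")
store.write("/songpop/samples/track-002.mp3", b"...")
store.write("/songpop/covers/album-9.png", b"...")
```

Because the path you chose at write time is the retrieval key, there is no separate key-management layer to maintain.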



Architecture diagram of SongPop

Long live App Engine!


When we speak to other game developers, we always recommend that they use App Engine.  We’ve used other services such as EC2 from Amazon Web Services for other games before, but we’ve found App Engine to be a better service for our needs.  We don’t want to spend time setting up servers and load balancing, when we could instead use that time to build great games and let our service provider handle the infrastructure for us.

When we compare the development of SongPop to stories of other apps, we’re thankful that App Engine allowed us to have only one engineer working full-time on the backend portion of our app.  Even better, he was able to do additional work on adding new features to the game instead of solely focusing on infrastructure issues.  With App Engine, scaling our game was easy.

Other things we want to share

  • Do not worry if the documented resource limits and rates look too small; they exist to ensure that no single app abuses a resource, and they can be raised. Most of our limits have been increased 18-fold! We had days where we made 230,000,000 UrlFetch API calls, for instance.
  • The location headers are a really great feature, easily accessible for a wide variety of use cases, such as selecting users’ opponents or building their game profiles.
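App Engine attaches geolocation headers such as X-AppEngine-Country and X-AppEngine-City to each incoming request, so a handler can read them directly. This sketch uses a plain dict standing in for the request headers; the fallback values are our own choice, not SongPop’s code:

```python
# Reading App Engine's geolocation headers to build a profile entry.
# A plain dict stands in for the incoming request's headers.

def build_profile_location(headers):
    return {
        "country": headers.get("X-AppEngine-Country", "ZZ"),
        "city": headers.get("X-AppEngine-City", "unknown"),
    }

profile = build_profile_location({
    "X-AppEngine-Country": "FR",
    "X-AppEngine-City": "paris",
})
```

The same lookup can drive opponent matchmaking, e.g. preferring opponents from the same country.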

Contributed by Olivier Michon and Alexis Hanicotte, FreshPlanet

Posted by Zafir Khan, Product Marketing Manager, Google App Engine

Thursday, December 20, 2012

Mobile voucher sales terminal in Africa powered by App Engine


Today’s guest blogger is Dale Humby, CTO of Nomanini.  Nomanini is a startup based in South Africa which provides a platform to sell prepaid products such as airtime, electricity and insurance electronically.  In this post, Dale explains how the backend systems of their flagship device, Lula, run on App Engine.

Introduction


In rural markets, it is often difficult to distribute physical vouchers, which can be used to provide access to services such as electricity, insurance, and airtime for mobile phones.  Nomanini enables entrepreneurs in Africa to earn income selling prepaid vouchers in their local communities.  We do this by distributing a portable, user-friendly voucher sales terminal, known as the Lula, which can be used on-the-go by people ranging from taxi drivers to street vendors.



The Lula, Nomanini's portable voucher sales terminal

Running Lula on App Engine



Nomanini uses Google App Engine to support the backend system for our network of point of sale terminals.

Terminals in the field connect to our App Engine application through the GSM mobile network.  The terminals synchronize when a connection becomes available, allowing sales to be processed even when devices are offline.

The devices make an HTTPS POST to a URL endpoint. Any data uploaded by the device is queued for processing as multiple tasks, and the App Engine application sends information back to the device in the response body.  Often the only responsibility a URL endpoint has is to create a task. If work can be broken into discrete pieces, we fan out to other tasks which run in parallel on separate App Engine instances. By controlling how many concurrent tasks run in each queue, we can prioritize specific parts of our application, ensuring the best quality of service for our customers.
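The fan-out shape is: the endpoint enqueues, the work is split into discrete tasks, and a per-queue concurrency limit bounds how much runs at once. App Engine’s Task Queue API does this across instances; in this self-contained sketch a thread pool stands in for the queue, with max_workers playing the role of the queue’s concurrency setting, and the task itself is hypothetical:

```python
# Fan-out pattern sketch: split an upload into discrete tasks and
# run them with bounded concurrency. A ThreadPoolExecutor stands in
# for App Engine's task queues.

from concurrent.futures import ThreadPoolExecutor

def process_sale(record):
    # One discrete unit of work (e.g. recording a voucher sale).
    return {"id": record["id"], "status": "processed"}

def handle_device_upload(records, max_workers=4):
    # Fan out: one task per record, bounded by max_workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_sale, records))

results = handle_device_upload([{"id": 1}, {"id": 2}, {"id": 3}])
```

Giving each queue its own concurrency limit is what lets the high-priority queues (e.g. sales) stay responsive while batch work trickles through.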

By utilizing cross-entity-group transactions, transactional tasks and appropriate key names for entities in the Datastore, we have been able to build an extremely resilient application for processing data.

Services we use



  • ProdEagle (another application built on top of Google App Engine) for real-time metrics visualization

Benefits of building on App Engine


Saves time
App Engine’s High Replication Datastore gives us the peace of mind that our data will be available and accurate to a degree that we couldn’t easily replicate ourselves. As a start up, capital and time are in short supply. With App Engine, we can focus on building our unique application rather than worrying about infrastructure.  App Engine Task Queues allow parallel data execution and retrying on failure, with little code overhead for developers.

Scalability
Nomanini has very cyclical traffic patterns: our peak traffic occurs in the early morning and late afternoon, with a monthly peak around payday in South Africa. Google App Engine automatically scales our application so that we don’t have to pay for excess server capacity during off-peak times, but have capacity available when we need it.

Ease of operation
Deployment is a breeze on App Engine.  With just a few scripts tied in to our continuous integration server we can:

  • write the version number and app name into app.yaml
  • deploy automatically
  • run data migration scripts
  • change the default version once all indexes are serving
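The first step, stamping the version number and app name into app.yaml, is a few lines of templating in the CI script. This sketch rewrites the relevant keys in an app.yaml string; the app id and version values are illustrative, not Nomanini’s actual configuration:

```python
# Stamp the application id and version into app.yaml text, as a
# continuous-integration script might do before deploying.

import re

def stamp_app_yaml(text, app_id, version):
    text = re.sub(r"(?m)^application:.*$", "application: %s" % app_id, text)
    text = re.sub(r"(?m)^version:.*$", "version: %s" % version, text)
    return text

original = "application: placeholder\nversion: placeholder\nruntime: python27\n"
stamped = stamp_app_yaml(original, "lula-backend", "build-142")
```

Deploying each build under its own version string is what makes the last step possible: traffic only moves when the default version is switched, after all indexes are serving.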

Monitoring is simplified by using built in dashboards. We can also export logs to Google Cloud Storage and run ad-hoc queries on analytics using Google BigQuery.

Conclusion

To reduce our time to market we used as many off-the-shelf components as possible, including Google App Engine.

We chose App Engine because it offers a platform with a consistent, well documented, ready-to-use set of services and allows our developers to test within an environment that is identical to our production environment - a practice that used to be prohibitively expensive.

Using the Google App Engine platform has saved our small development team the time we would have had to spend designing, building and testing a highly reliable backend to support our network of vending terminals in far-flung places.  Instead, we can focus our time on building a device that we hope will impact the way local commerce is done in rural economies.


To read more about the story of Nomanini, check out the post about us on the Official Google Blog.  


-Contributed by Dale Humby, CTO, Nomanini

Posted by Zafir Khan, Product Marketing Manager, Google App Engine

Wednesday, October 31, 2012

Developer Insights: Teaching thousands of students to program on Udacity with App Engine (part 2)


This post is the second of our two-part series discussing how Udacity uses Google App Engine.

Today’s guest blogger is Chris Chew, senior software engineer at Udacity, which offers free online courses in programming and other subjects.  Chris shares how Udacity itself is built using App Engine.

Steve Huffman blogged yesterday about how App Engine enables the project-based learning that makes his web development course so powerful.  People are often surprised to learn that Udacity itself is built on App Engine.

The choice to use App Engine originally came from Mike Sokolsky, our CTO and cofounder, after his experience keeping the original version of our extremely popular AI course running on a series of virtual machines.  Mike found App Engine’s operational simplicity extremely compelling after weeks of endlessly spinning up additional servers and administering MySQL replication in order to meet the crazy scale patterns we experience.

Close to a year later, with ten months of live traffic on App Engine, we continue to be satisfied customers.  While there are a few things we do outside App Engine, our choice to continue using App Engine for our core application is clear:  We prefer to spend our time figuring out how to scale personalized education, not memcached.  App Engine’s infrastructure is better than what we could build ourselves, and it frees us to focus on behavior rather than operations.

How Udacity Uses App Engine

The App Engine features we use most include a pretty broad swath of the platform:


A high-level representation of our “stack” looks something like this:




Trails and Trove are two libraries developed in-house mainly by Piotr Kaminski.  Trails supplies very clean semantics for creating families of RESTful endpoints on top of a webapp2.RequestHandler with automagic marshalling.  Trove is a wrapper around NDB that adds common property types (e.g. efficient dereferencing of key properties), yet another layer of caching for entities with relations (both in-process and memcache), and an event “watcher” framework for reliably triggering out-of-band processing when data changes.

Something notable that is not represented in the drawing above is a specific set of monkey patches from Trove we apply to NDB to create better hooks similar to the existing pre/post-put/delete hooks.  These custom hooks power a “watcher” abstraction that provides targeted pieces of code the opportunity to react to changes in the data layer.  Execution of each watcher is deferred and runs outside the scope of the request so as to not increase response times.

Latency

During our first year of scaling on App Engine we learned its performance is a complex thing to understand.  Response time is a function of several factors both inside and outside our control.  App Engine’s ability to “scale-out” is undeniable, but we have observed high variance in response times for a given request, even during periods with low load on the system.  As a consequence we have learned to do a number of things to minimize the impact of latency variance:

  • Converting usage of the old datastore API to the new NDB API
  • Using ndb.tasklet coroutines as much as possible to enable parallelism during blocking RPC operations
  • Not indexing fields by default and adding an index only when we need it for a query
  • Carefully avoiding index hotspots by indexing fields with predictable values (e.g. auto-now DateTime and enumerated “choices” String properties) only when necessary
  • Materializing data views very aggressively so we can limit each request to the fewest datastore queries possible

This last point is obvious in the sense that naturally you get faster responses when you do less work.  But we have taken pre-materializing views to an extreme level by denormalizing several aspects of our domain into read-optimized records.  For example, the read-optimized version of a user’s profile record might contain standard profile information, plus privacy configuration, course enrollment information, course progress, and permissions -- all things a data modeler would normally want to store separately.  We pile it together into the equivalent of a materialized view so we can fetch it all in one query.
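The read-optimized record above can be pictured as one function that folds the normally-separate records into a single document fetched with one query. Field names here are illustrative, not Udacity’s actual schema:

```python
# Sketch of materializing a denormalized, read-optimized profile
# view from records a data modeler would normally store separately.

def materialize_profile(profile, privacy, enrollments, progress):
    view = dict(profile)            # standard profile fields
    view["privacy"] = privacy       # privacy configuration
    view["courses"] = [             # enrollment + progress, merged
        {"course": c, "progress": progress.get(c, 0.0)}
        for c in enrollments
    ]
    return view

view = materialize_profile(
    {"user_id": "u1", "name": "Ada"},
    {"show_email": False},
    ["cs101", "cs212"],
    {"cs101": 0.8},
)
```

The view is rebuilt whenever any of its sources change (the watcher framework is a natural trigger), trading extra writes for a single-query read path.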


Conclusion

App Engine is an amazingly complete and reliable platform that works astonishingly well for a huge number of use cases.  It is very apparent the services and APIs have been designed by people who know how to scale web applications, and we feel lucky to have the opportunity to ride on the shoulders of such giants.  It is trivial to whip together a proof-of-concept for almost any idea, and the subsequent work to scale your app is significantly less than if you had rolled your own infrastructure.

As with any platform, there are tradeoffs.  The tradeoff with App Engine is that you get an amazing suite of scale-ready services at the cost of relentlessly optimizing to minimize latency spikes.  This is an easy tradeoff for us because App Engine has served us well through several exciting usage spikes and there is no question the progress we have already made towards our mission is significantly more than if we were also building our own infrastructure.  Like most choices in life, this choice can be boiled down to a bumper sticker:



Editor’s note: Chris Chew and Steve Huffman will be participating in a Google Developers Live Hangout tomorrow, Thursday, November 1st.  Check it out here and submit your questions for them to answer live on air.

-Contributed by Chris Chew, Senior Software Engineer, Udacity


Posted by Zafir Khan, Product Marketing Manager, Google App Engine

Thursday, October 18, 2012

Developer Insights: Building scalable social games on App Engine


Today’s guest blogger is Hernan Liendo, CTO of Zupcat, developer of social games played by millions of people worldwide.  Hernan shares his team’s experience using App Engine to build RaceTown, a multiplayer racing game.  


Choosing a cloud service provider


RaceTown is one of Zupcat’s most popular games; it has almost 900,000 monthly unique users, opens more than 40,000 connections via the Channel API per day, processes more than 15,000 queries per second and delivers terabytes of content every day.  When deciding our architecture, we took into account several unique requirements of social games:


  • High uptime
  • Short loading time
  • Flexibility to deal with social network API changes
  • Ability to manage thousands of players, concurrently, from all over the world
  • Adjustment to capabilities and performance issues on different users’ computers
  • Ability to measure user actions to constantly improve the user experience
  • Hosting and delivering quality, beautiful game art
  • Complex game domains and algorithms, such as adaptive enemy behavior, pathfinding, and 2D and 3D rendering

App Engine addresses these complicated issues.  It is reachable within a few network hops from almost anywhere in the world, and offers great uptime, automatic scalability, freedom from infrastructure monitoring and a reasonable price for content delivery.

Implementing App Engine




The diagram above shows a simplified view of our game architecture. We’ve discovered that App Engine is good to use not only as a game backend server, but also as a metrics server and content delivery network.  In addition, we periodically synchronize game state and retrieve data to and from the server.  


The App Engine Datastore is great because it has high availability and easily handles hundreds of millions of rows of data, which is important for social games.  For example, we can easily scan the Datastore to present high-score information and gamer stats to the user.  Additionally, because gamers tend to spend a lot of time in a game session, we’ve found it helpful to cache game data. Using Memcache, we have significantly reduced Datastore API calls and lowered users’ waiting time.

Another tip for App Engine developers: although App Engine API failures are uncommon, be sure to write proper retry code to minimize the chance of exposing users to an application crash. RaceTown performs almost a hundred million operations daily, and proper client-side retry algorithms have enabled us to keep failure rates very low.

Final thoughts

I believe that today there is no technology that matches App Engine.  You can run your code and store your data on the very same servers that Google uses.  Migrating your applications to this technology means you have to start thinking in a cloud-centric way and reinvent your architecture, moving away from a relational database and a classic clustered web server.

If you can achieve this, your products will be delivered using the same infrastructure that Google uses, without a huge corporate budget!


- Contributed by Hernan Liendo, @hernanliendo of Zupcat, @zupcat

Tuesday, October 9, 2012

Developer Insights: Streak brings CRM to the inbox with Google Cloud Platform


Cross-posted with the Google Developers Blog

Today’s guest blogger is Aleem Mawani, co-founder of Streak, a startup alum of Y Combinator, a Silicon Valley incubator.  Streak is a CRM tool built into Gmail.  Aleem shares his experience building and scaling their product using Google Cloud Platform.

Everyone relies on email to get work done – yet most people use separate applications from their email to help them with various business processes. Streak fixes this problem by letting you do sales, hiring, fundraising, bug tracking, product development, deal flow, project management and almost any other business process right inside Gmail.  In this post, I want to illustrate how we have used Google Cloud Platform to build Streak quickly, scalably and with the ability to deeply analyze our data.



We use several Google technologies on the backend of Streak:

  • BigQuery to analyze our logs and power dashboards

Our core learning is that you should use the best tool for the job. No one technology will be able to solve all your data storage and access needs. Instead, for each type of functionality, you should use a different service. In our case, we aggressively mirror our data in all the services mentioned above. For example, although the source of truth for our user data is in the App Engine Datastore, we mirror that data in the App Engine Search API so that we can provide full text search, Gmail style, to our users. We also mirror that same data in BigQuery so that we can power internal dashboards.

System Architecture



App Engine - We use App Engine for Java primarily to serve our application to the browser and mobile clients in addition to serving our API. App Engine is the source of truth for all our data, so we aggressively cache using Memcache. We also use Objectify to simplify access to the Datastore, which I highly recommend.

Google Cloud Storage - We mirror all of our Datastore data as well as all our log data in Cloud Storage, which acts as a conduit to other Google cloud services. It lets us archive the data as well as push it to BigQuery and the Prediction API.

BigQuery - Pushing the data into BigQuery allows us to run non-realtime queries that can help generate useful business metrics and slice user data to better understand how our product is getting used. Not only can we run complex queries over our Datastore data but also over all of our log data. This is incredibly powerful for analyzing the request patterns to App Engine. We can answer questions like:

  • Which requests cost us the most money?
  • What is the average response time for every URL on our site over the last 3 days?

BigQuery helps us monitor error rates in our application.  We annotate all of our log data with debug statements, as well as an “error type” for any request that fails.  If it’s a known error, we log something sensible; if we haven’t seen it before, we log the exception type.  This is beneficial because we built a dashboard that queries BigQuery for the most recent errors in the last hour, grouped by error type. Whenever we do a release, we can monitor error rates in the application really easily.
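The dashboard query behind this is roughly "recent failed requests, grouped by error type." This sketch just builds that query text; the table and field names are hypothetical, not Streak’s actual log schema:

```python
# Sketch of the kind of query the error dashboard might run against
# the mirrored log data in BigQuery. Table/field names are invented.

def error_rate_query(hours=1):
    return (
        "SELECT error_type, COUNT(*) AS occurrences\n"
        "FROM logs.requests\n"
        "WHERE error_type IS NOT NULL\n"
        "  AND hours_ago <= %d\n"
        "GROUP BY error_type\n"
        "ORDER BY occurrences DESC" % hours
    )

query = error_rate_query(hours=1)
```

Comparing the top error types before and after a release is then a single refresh of the dashboard.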


A Streak dashboard powered by BigQuery showing current usage statistics
In order to move the data into Cloud Storage from the Datastore and LogService, we developed an open source library called Mache. It’s a drop-in library that can be configured to automatically push data into BigQuery via Cloud Storage. The data can come from the Datastore or from LogService and is very configurable - feel free to contribute and give us feedback on it!

Google Cloud Platform also makes our application better for our users. We take advantage of the App Engine Search API and again mirror our data there. Users can then query their Streak data using the familiar Gmail full text search syntax, for example, “before:yesterday name:Foo”. Since we also push our data to the Prediction API, we can help users throughout our app by making smart suggestions. In Streak, we train models based on which emails users have categorized into different projects. Then, when users get a new email, we can suggest the most likely box that the email belongs to.

One issue that arises is how to keep all these mirrored data sets in sync. It works differently for each service based on the architecture of the service. Here’s a simple breakdown:



Having these technologies easily available to us has been a huge help for Streak. It makes our products better and helps us understand our users. Streak’s user base grew 30% every week for 4 consecutive months after launch, and we couldn’t have scaled this easily without Google Cloud Platform.  To read more details on why Cloud Platform makes sense for our business, check out our case study and our post on the Google Enterprise blog.

-Contributed by Aleem Mawani, co-founder of Streak