Wednesday, March 31, 2010

Easy Performance Profiling with Appstats

Since App Engine debuted 2 years ago, we’ve written extensively about best practices for writing scalable apps on App Engine. We make writing scalable apps as easy as possible, but even so, getting everything right can be a challenge - and when you don’t, it can be tough to figure out what’s going wrong, or what’s slowing your app down.

Thanks to a new library, however, you no longer need to guess. Appstats is a library for App Engine that enables you to profile your App Engine app’s performance, and see exactly what API calls your app is making for a given request, and how long they take. With the ability to get a quick overview of the performance of an entire page request, and to drill down to the details of an individual RPC call, it’s now easy to detect and eliminate redundancies and inefficiencies in the way your App Engine app works. Appstats is available for both the Python and Java runtimes.

Enabling Appstats is remarkably straightforward. In Python, it can be made to work with any Python webapp framework. For the full lowdown, see the docs, but here’s the quickstart if you’re using a supported framework. First, create or open a file called appengine_config.py in your app’s root directory, and add the following to it:

def webapp_add_wsgi_middleware(app):
   from google.appengine.ext.appstats import recording
   app = recording.appstats_wsgi_middleware(app)
   return app
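
If WSGI middleware is new to you, the hook above simply wraps your application in another callable. Here is a minimal sketch of that wrapping pattern. It is not how Appstats itself is implemented, and `timing_middleware`, `hello_app` and the `timings` list are hypothetical names made up for illustration:

```python
import time


def timing_middleware(app, timings):
    """Wrap a WSGI app and record the wall-clock time of each call.

    Illustrative only: Appstats records per-RPC detail, this does not.
    """
    def wrapped(environ, start_response):
        start = time.time()
        try:
            # Delegate to the wrapped application unchanged.
            return app(environ, start_response)
        finally:
            # Record (path, elapsed seconds) for the call itself.
            timings.append((environ.get("PATH_INFO", ""), time.time() - start))
    return wrapped


def hello_app(environ, start_response):
    """A trivial WSGI app used only to exercise the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


timings = []
app = timing_middleware(hello_app, timings)
body = app({"PATH_INFO": "/"}, lambda status, headers: None)
```

The wrapped app returns the same response as before; the only difference is the extra bookkeeping done around each call.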

In Java, Appstats works by installing a filter. Again, the lowdown is here, but for a quickstart, add this to your web.xml:

   <filter>
       <filter-name>appstats</filter-name>
       <filter-class>com.google.appengine.tools.appstats.AppstatsFilter</filter-class>
       <init-param>
           <param-name>logMessage</param-name>
           <param-value>Appstats available: /appstats/details?time={ID}</param-value>
       </init-param>
   </filter>
   <filter-mapping>
       <filter-name>appstats</filter-name>
       <url-pattern>/*</url-pattern>
   </filter-mapping>

This installs the Appstats event recorder, which is responsible for recording API calls made by your app so you can review them later. The recorder is pretty low overhead, so you can even use it on a live site, if you wish - though you may want to disable it once you no longer require it.

The other component of Appstats is the administrative interface. To install this on the Python runtime, you need to make a change to your app.yaml file. Add the following block inside the ‘handlers’ section of app.yaml:

- url: /stats.*
  script: $PYTHON_LIB/google/appengine/ext/appstats/ui.py

And for Java, add this to your web.xml:

   <servlet>
       <servlet-name>appstats</servlet-name>
       <servlet-class>com.google.appengine.tools.appstats.AppstatsServlet</servlet-class>
   </servlet>
   <servlet-mapping>
       <servlet-name>appstats</servlet-name>
       <url-pattern>/appstats/*</url-pattern>
   </servlet-mapping>

   <security-constraint>
       <web-resource-collection>
           <url-pattern>/appstats/*</url-pattern>
       </web-resource-collection>
       <auth-constraint>
           <role-name>admin</role-name>
       </auth-constraint>
   </security-constraint>

The URL in the Python handler above - ‘/stats’ - can be anything you like, as long as it ends with ‘/stats’; it is the URL at which you access the Appstats admin console. In the Java configuration, the console is served under ‘/appstats/’ instead.

For additional ease-of-use, we can add the admin interface as a custom admin console page. In Python, add the following block to the end of app.yaml:

admin_console:
 pages:
 - name: Appstats
   url: /stats

Similarly, you can do this in Java by adding this to appengine-web.xml:

<admin-console>
 <page name="Appstats" url="/appstats" />
</admin-console>

After redeploying your app, you should now see a new option in your app’s admin console labelled ‘Appstats’. Clicking on this will show you the Appstats admin console.

The main page of the admin console provides some high level information on RPC calls made and URL paths requested, and, down the bottom, a history of recent requests. Clicking the plus button on any of these will expand the entry to show more details about it, but for the really juicy details, click on an individual request in the Requests History section.

This page goes into detail on what happened in an individual request to our app. The timeline is of particular interest: Each row represents an individual RPC call that was made, with the start time, end time, and CPU time consumed all noted by means of a chart on the right. As you can see, it’s quite possible for the CPU time consumed by an RPC call to exceed the wall clock time - this typically occurs when multiple machines are involved in assembling a reply to your RPC request.

Let’s take a look at the code in question. Clicking on the datastore_v3.RunQuery RPC call to expand, we can get a complete stacktrace for the RPC call, and by clicking on a stack frame, Appstats shows us the source code in question! We see that the culprit looks something like this:

def get_questions(self):
    return models.Question.all()

This is a really frequent query - it’s executed every time anyone loads the front page - so it’s a prime candidate for the memcache API. Modifying the function to take advantage of it is straightforward:

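from google.appengine.api import memcache

def get_questions(self):
    # Try the cache first; get() returns None on a miss.
    questions = memcache.get("latest_questions")
    if questions is None:
        # Cache miss: query the datastore and keep the result for 60 seconds.
        questions = models.Question.all().fetch(50)
        memcache.set("latest_questions", questions, time=60)
    return questions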

All we’re doing here is first checking whether the results are already available in memcache. If they’re not, we fetch them the regular way, by doing a datastore query, and store them in memcache for future reference, telling it to keep them around for 60 seconds.
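
The same check-then-fill pattern can be tried outside App Engine. The sketch below swaps in a small in-process dictionary with expiry times as a stand-in for memcache. `TTLCache`, `load_questions`, the `ttl` argument and the injected `now` clock are hypothetical names for illustration only, not part of any App Engine API:

```python
import time


class TTLCache(object):
    """In-process stand-in for memcache: get/set with an expiry in seconds."""

    def __init__(self, now=time.time):
        self._now = now      # injectable clock, handy for testing
        self._items = {}     # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None or entry[0] <= self._now():
            return None      # missing or expired
        return entry[1]

    def set(self, key, value, ttl):
        self._items[key] = (self._now() + ttl, value)


def get_questions(cache, load_questions):
    """Return cached questions, calling the slow loader only on a miss."""
    questions = cache.get("latest_questions")
    if questions is None:
        questions = load_questions()
        cache.set("latest_questions", questions, ttl=60)
    return questions


# Usage with a fake clock and a loader that counts its calls.
clock = [0.0]
calls = []


def load_questions():
    calls.append(1)
    return ["q1", "q2"]


cache = TTLCache(now=lambda: clock[0])
first = get_questions(cache, load_questions)   # miss: loader runs
second = get_questions(cache, load_questions)  # hit: served from the cache
clock[0] = 61.0                                # move past the 60 second expiry
third = get_questions(cache, load_questions)   # expired: loader runs again
```

Three requests cost only two loads here, and the second load happens only because the cached entry had expired.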

60 seconds is relatively high for data that may change often, but we figure users won’t be bothered by this on our site. Much shorter timeouts - as low as just a few seconds - can still save huge amounts of resources, especially on popular sites. Few users will worry about a 5 second cache timeout, but if you’re getting 100 queries a second, you’ve just eliminated 99.8% of your query overhead!
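
The 99.8% figure follows from simple arithmetic: with a 5 second timeout and 100 queries a second, only one query in every 500 reaches the datastore. A quick sketch, assuming a steady request rate and exactly one datastore query per cache window (`fraction_saved` is a hypothetical helper name):

```python
def fraction_saved(queries_per_second, cache_seconds):
    """Fraction of datastore queries avoided by caching the result.

    Assumes a steady request rate and one datastore query per cache window.
    """
    queries_per_window = queries_per_second * cache_seconds
    return (queries_per_window - 1) / float(queries_per_window)


saved = fraction_saved(100, 5)
print("%.1f%%" % (saved * 100))  # 499 of every 500 queries avoided: 99.8%
```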

If we repeat our request with the new code, we can take a look at the updated statistics and note the improvement on a ‘warm’ request that fetches data from memcache:

Much better! Faster by every metric: We’ve cut a chunk off the wallclock time, so our users get pages faster, and we’ve reduced CPU time and API CPU time as well! As you can imagine, even more dramatic improvements are possible for more complex applications.

Appstats can also help you with your wardrobe: participate in the Appstats contest for the best before/after screenshots of Application Stats. Post your screenshots online and link to them on Twitter copying @app_engine and using the hashtag #coolappstats, before May 2nd 2010. The coolest pair of screenshots will be used to create a Google App Engine T-shirt, and we will send that T-shirt, autographed by the App Engine team, to the winner.

Monday, March 29, 2010

Read Consistency & Deadlines: More control of your Datastore

Last week we announced the 1.3.2 release of the App Engine SDK. We’re particularly excited about two new datastore features: eventually consistent reads, and datastore deadlines.

Read Consistency Settings

You now have the option to specify eventually consistent reads on your datastore queries and fetches. By default, the datastore updates and fetches data in a primary storage location, so reading an entity always returns exactly up-to-date data, a read policy known as “strong consistency.” When a machine at the primary storage location becomes unavailable, a strongly consistent read waits for the machine to become available again, possibly not returning before your request handler deadline expires.

But not every use of the datastore needs guaranteed, up-to-the-millisecond freshness. In these cases, you can tell the datastore (on a per-call basis) that it’s OK to read a copy of the data from another location when the primary is unavailable. This read policy is known as “eventual consistency.” The secondary location may not have all of the changes made to the primary location at the time the data is read, but it should be very close. In the most common case, it will have all of the changes, and for a small percentage of requests, it may be a few hundred milliseconds to a few seconds behind. By using eventually consistent reads, you trade consistency for availability; in most cases you should see a reduction in datastore timeouts and error responses for your queries and fetches.

Prior to this new feature, all datastore reads used strong consistency, and this is still the default. However, eventual consistency is useful in many cases, and we encourage using it liberally throughout most applications. For example, a social networking site that displays your friends’ status messages may not need to display the freshest updates immediately, and might prefer to show older messages when a primary datastore machine becomes unavailable, rather than wait for the machine to become available again, or show no messages at all with an error.

(Note that eventual consistency is never used during a transaction: transactions are always completely consistent.)

Datastore Deadlines

The datastore now also allows you to specify a deadline for your datastore calls, which is the maximum amount of time a datastore call can take before responding. If the datastore call is not completed by the deadline, it is aborted with an error and app execution can continue. This is especially useful since the datastore now retries most calls automatically, for up to 30 seconds. By setting a deadline that is smaller than that, you allow the datastore to retry up to the amount of time that you specify, while always returning control to your app within the deadline window. If your application is latency sensitive, or if you’d prefer to take an alternate action when a request takes too long (such as displaying less data or consulting a cache), deadlines are very useful: they give your application more control.

Setting the Read Policy and Datastore Deadline

To enable deadlines and eventual consistency with Python, you create an RPC object with the function create_rpc() and set the deadline and read_policy on the object. You then pass the RPC object to the call as an argument. Here’s an example of how you would do this on a datastore fetch:

rpc = db.create_rpc(deadline=5, read_policy=db.EVENTUAL_CONSISTENCY)
results = Employee.all().fetch(10, rpc=rpc)
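
One way to act on a missed deadline is to catch the error and fall back to older data. The sketch below shows only that control flow and runs without the App Engine SDK: `DeadlineError`, `fetch_with_fallback` and `slow_fetch` are hypothetical stand-ins. In a real app you would catch whatever exception your datastore call raises on timeout, so check the documentation for the exact class.

```python
class DeadlineError(Exception):
    """Hypothetical stand-in for the datastore's timeout error."""


def fetch_with_fallback(fetch, fallback):
    """Run fetch(); if it misses its deadline, return fallback() instead."""
    try:
        return fetch()
    except DeadlineError:
        # Take the alternate action, e.g. show less data or consult a cache.
        return fallback()


def slow_fetch():
    # Simulates a datastore call that was aborted at its deadline.
    raise DeadlineError("datastore call exceeded its deadline")


results = fetch_with_fallback(slow_fetch, lambda: ["cached employee"])
```

Keeping the fallback as a separate callable means it is only evaluated when the primary fetch actually fails.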

To set a datastore read policy and a deadline in Java, call the methods addExtension() and setTimeoutMillis(), respectively, on a single Query object:

Query q = pm.newQuery(Employee.class);
q.addExtension("datanucleus.appengine.datastoreReadConsistency", "EVENTUAL");
q.setTimeoutMillis(3000);

You can also enable these features in JDO and JPA through configuration, or use them directly with the low-level Java datastore API. See the documentation for these features in Python and Java for more information.

Thursday, March 25, 2010

App Engine SDK 1.3.2 Released

Today we are excited to announce the release of version 1.3.2 of the App Engine SDK for both the Java and Python runtimes. 1.3.2 includes a number of changes and bug fixes.

For this release, we have concentrated on removing a number of limitations that have been affecting developers:

  • Blobstore API - A new method (fetch_data for Python, fetchData for Java) allows your application to request the contents of a Blob from within your application’s code.
  • URLFetch API - We’ve expanded the number of ports you can access with the URLFetch API. You can now access ports 80-90, 440-450, and 1024-65535.
  • Mail API - We’ve expanded the allowed mail attachments to include common document extensions including .doc, .ppt, and .xls.
  • Task Queue API - We’ve increased the maximum total Task Queue refill rate to 50 per second.
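
If your app builds fetch URLs from user input, it can be handy to check the port before calling URLFetch. A minimal sketch using only the ranges listed above; `ALLOWED_PORT_RANGES` and `is_allowed_port` are hypothetical names, not part of the URLFetch API:

```python
# Inclusive port ranges taken from the 1.3.2 announcement above.
ALLOWED_PORT_RANGES = [(80, 90), (440, 450), (1024, 65535)]


def is_allowed_port(port):
    """Return True if port falls inside one of the announced ranges."""
    return any(low <= port <= high for low, high in ALLOWED_PORT_RANGES)


on_8080 = is_allowed_port(8080)  # inside 1024-65535
on_22 = is_allowed_port(22)      # outside every listed range
```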

We’re also happy to announce, based on your feedback, a new Denial of Service (DoS) blocking system in App Engine. This system allows you to blacklist specific IP addresses from accessing your application, and to prevent them from costing your application money or eating up your quota. You can also view the top IPs that have accessed your application in the Admin Console, to help you figure out what IPs you may want to block. More information on this feature is available for Python and Java.

There are a lot of other changes and fixes in this release, including a new Java version of the Appstats profiling tool, so read the release notes (Python, Java) for a complete list of changes and download the new versions of the SDK.

Friday, March 12, 2010

App Engine Community Update

It's been a while since the last community update post, and you've probably been wondering what the App Engine community has been up to over the holiday period. There's been a lot of activity, so without further ado, here's your community update.

Open Source Projects

A lot of new App Engine Open Source projects have been released since the last update, for both Java and Python, including everything from libraries to complete working apps.

gdispatch is an extension to the Python webapp framework that allows you to embed routes alongside your request handlers. SUAS is a Python library that provides straightforward authentication and session management for App Engine apps. If you're looking for a complete, lightweight framework, tipfy might be what you need - it bills itself as "a cute little Python framework for App Engine which follows the basic concepts of web.py and webapp".

The Java community has been even busier, with members releasing simpleds and Objectify, both of which provide alternative interfaces to the App Engine datastore for those of you who prefer systems designed specifically with App Engine's datastore in mind over JPA or JDO. For your unit testing needs, there's Kotori Web JUnit Runner, which allows you to run JUnit unit tests in production. Finally, appleguice provides a sample application demonstrating how to integrate Guice dependency injection with GWT and App Engine.

Interesting Apps

The last few months have also seen a plethora of App Engine based apps released, many of which provide source code so you can deploy your own, or contribute to their development. We've picked out a few that we think you'll find of interest.

JobTracker is a Python app that provides an interface to allow tracking working hours and tasks in your team, and is licensed under the APL 2.

Nimbits provides a time series service on App Engine, allowing you to record regular measurements - be they server latencies or fishtank temperatures - and process and export them via APIs or a web interface.

nxdom is an interface designed to make picking the domain name for your next project easier; it operates on lists of recently expired domains, and its Python source is freely available.

OpenShare is a new banner exchange service targeted at open source projects; it's written in Python and is available under the GPL 2.

If you're feeling nostalgic, z-machine lets you play interactive text games through a browser interface, or over XMPP. You can even save a game in one interface, and continue playing in the other. If you're inspired by this, the Parchment project implements a z-machine interpreter in pure Javascript.

Worried about your app when out and about? App Engine Watch is an Android app that lets you monitor your app's quotas from your phone.

A lot of projects have centered around Twitter integration, with Pulse of the Planet letting you visualize tweets in real time on a map, while Lazytweet provides a friendly interface for convincing other people to do your work for you (or doing their work for them, if you're feeling generous). Tweet Engine makes tweeting using a shared Twitter account easier by allowing you to share an account with your group without needing to give them all the account password. It's also open source - you can get the source here and deploy your own, or just use the live instance at http://www.tweetengine.net/.

If that's still not enough microblogging for you, TypePad has released TypePad Motion, their open source community micro-blogging app. It's available here.

Runtimes

If your favorite language isn't Java or Python, that doesn't mean you're out of luck. As long as there's an implementation on the JVM, you can use it on App Engine! Several projects are improving integration between App Engine and other JVM languages.

This post discusses how to use Clojure on App Engine, while this post covers using Quercus - a PHP implementation for the JRE - with App Engine task queues. Finally, if Groovy is your language of choice, definitely give the gaelyk project a look, and if you prefer JavaScript, take a look at appenginejs.

Blogging

Many bloggers continue to turn out insightful and useful posts relating to App Engine. Here are a few that caught our eye:

If all the Twitter related projects interested you, and you're wondering about implementing your own, this article on building feedertweeter and this article on using Twitter's OAuth API on App Engine are probably of interest.

There's an excellent blog post on hitching.net about practical use of geohashing for geospatial apps on App Engine.

Just about everything on the gaejexperiments blog is worth a read if you're a Java coder. Recent posts cover topics such as using reCAPTCHA, using the Java blobstore API and using the task queue.

The Django-nonrel folks are showing some impressive progress - see their post on how to deal with noSQL databases, and read their blog for more information.

The wolfire blog has a great post on how AppScale helps prevent App Engine 'lock-in'. If you want to learn more, this post on the IBM developerworks site goes into more detail about AppScale.

If you want to keep up with blogs, articles, and news about App Engine, the App Engine reddit should be your first port of call. And if you want to see your own post or project up here, submit it to the reddit, or mention it in the groups!

Monday, March 8, 2010

App Engine joins the Google over IPv6 Program

The Google over IPv6 program allows ISPs with good connectivity to request IPv6 access for most Google services. In about a week, we'll be adding Google App Engine and the appspot.com domain to this program. This means that all App Engine apps will become accessible over IPv6 to anyone participating in the program!

For most people, this won't require any changes to your code at all. If your App Engine code reads os.environ["REMOTE_ADDR"] in Python, or HttpServletRequest.getRemoteAddr() in Java, be aware that this value may be an IPv4 address, like "192.0.2.1", or an IPv6 address, like "2001:db8::1". Now is the time to verify that your code doesn't make any IPv4-specific assumptions, so that your IPv6-ready users will have a seamless transition.

Some libraries for manipulating IP addresses, should your app require one, include ipaddr and java.net.InetAddress.
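
As a rough illustration of address handling that works for both families, the sketch below uses the `ipaddress` module from the Python 3 standard library. That choice is an assumption made for this sketch: on the App Engine Python runtime, the ipaddr library mentioned above plays the same role. `describe_remote_addr` is a hypothetical helper name.

```python
import ipaddress


def describe_remote_addr(remote_addr):
    """Parse an address string and report whether it is IPv4 or IPv6."""
    # ip_address() accepts both families and raises ValueError on bad input.
    address = ipaddress.ip_address(remote_addr)
    return address.version, address.compressed


v4 = describe_remote_addr("192.0.2.1")
v6 = describe_remote_addr("2001:db8:0:0:0:0:0:1")
```

Parsing the value rather than matching it against a dotted-quad pattern is what keeps the code free of IPv4-specific assumptions. It also normalizes different spellings of the same IPv6 address to one form.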

Q: My app doesn't handle IP addresses at all. Can I ignore this announcement?
A: Yes.

Q: I have a third-party domain hosted through Google Apps and the ghs.google.com CNAME. Will this affect me?
A: This particular change only affects appspot.com domains, but you should still make sure your code is IPv6-safe, as we're always working to add IPv6 to more services.

Q: Will everyone see an IPv6 AAAA record for my app?
A: No, the AAAA record is visible to less than 1% of users, whose ISPs are participating in the Google over IPv6 program.

Q: Does this mean each app will get its own unique IPv6 address?
A: No, we try to keep our IPv4 and IPv6 services as similar as possible.

Q: My ISP isn't a Google over IPv6 participant. How can I verify that my code doesn't choke on IPv6 addresses?
A: You should include IPv6 addresses in your unit tests. After the launch, you'll be able to use the IPv4-IPv6 Website Gateway provided by the SixXS project. It's accessed by appending .ipv4.sixxs.org to any IPv6-enabled hostname. For example, the "Shoutout" sample app will be visible at http://shoutout.appspot.com.ipv4.sixxs.org/.

Q: Why does the previous link report a "does not have an IPv6 address" error?
A: That means we haven't launched yet. The delay is meant to give developers time to review their address-handling code.

If you experience any problems with IPv6 serving and App Engine, please report them in the App Engine issue tracker or post about them in the App Engine discussion group.