Monday, March 29, 2010

Read Consistency & Deadlines: More control of your Datastore

Last week we announced the 1.3.2 release of the App Engine SDK. We’re particularly excited about two new datastore features: eventually consistent reads, and datastore deadlines.

Read Consistency Settings

You can now request eventually consistent reads on your datastore queries and fetches. By default, the datastore updates and fetches data in a primary storage location, so a read always returns up-to-date data, a read policy known as “strong consistency.” When a machine at the primary storage location becomes unavailable, a strongly consistent read waits for the machine to become available again, and may not return before your request handler deadline expires.

But not every use of the datastore needs guaranteed, up-to-the-millisecond freshness. In these cases, you can tell the datastore (on a per-call basis) that it’s OK to read a copy of the data from another location when the primary is unavailable. This read policy is known as “eventual consistency.” The secondary location may not have all of the changes made to the primary location at the time the data is read, but it should be very close: in the most common case it will have all of the changes, and for a small percentage of requests it may be a few hundred milliseconds to a few seconds behind. By using eventually consistent reads, you trade consistency for availability; in most cases you should see fewer datastore timeouts and error responses for your queries and fetches.

Prior to this new feature, all datastore reads used strong consistency, and this is still the default. However, eventual consistency is useful in many cases, and we encourage using it liberally throughout most applications. For example, a social networking site that displays your friends’ status messages may not need to display the freshest updates immediately, and might prefer to show older messages when a primary datastore machine becomes unavailable, rather than wait for the machine to become available again, or show no messages at all with an error.

(Note that eventual consistency is never used during a transaction: transactions are always completely consistent.)
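
To make the trade-off concrete, here is a minimal toy model in plain Python. It is only a sketch of the idea and not how the datastore is implemented: the ToyDatastore class, its replicate() method, and the PrimaryUnavailable exception are all hypothetical names invented for this illustration. One simplification to keep in mind is that the toy raises an error immediately when the primary is down, whereas a real strongly consistent read waits for the primary to come back.

```python
STRONG_CONSISTENCY = "strong"
EVENTUAL_CONSISTENCY = "eventual"


class PrimaryUnavailable(Exception):
    """Hypothetical error: a strongly consistent read could not reach the primary."""


class ToyDatastore(object):
    """Toy model with one primary copy and one replica that may lag behind."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.primary_available = True

    def put(self, key, value):
        # Writes always go to the primary; the replica only catches up later.
        self.primary[key] = value

    def replicate(self):
        # Stand-in for asynchronous replication to the secondary location.
        self.replica = dict(self.primary)

    def get(self, key, read_policy=STRONG_CONSISTENCY):
        if self.primary_available:
            # Both policies read the primary while it is reachable.
            return self.primary.get(key)
        if read_policy == EVENTUAL_CONSISTENCY:
            # Primary is down: serve a possibly stale copy instead of failing.
            return self.replica.get(key)
        # Simplification: fail right away instead of waiting for the primary.
        raise PrimaryUnavailable(key)
```

If a status message is written, replicated, and then updated just before the primary becomes unavailable, an eventually consistent get() in this model returns the older replicated message, while a strongly consistent get() fails. That mirrors the social networking example above: slightly stale data instead of an error.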

Datastore Deadlines

The datastore now also allows you to specify a deadline for your datastore calls, which is the maximum amount of time a datastore call can take before responding. If the datastore call is not completed by the deadline, it is aborted with an error and app execution can continue. This is especially useful since the datastore now retries most calls automatically, for up to 30 seconds. By setting a deadline shorter than that, you allow the datastore to keep retrying for as long as you specify, while always returning control to your app within the deadline window. If your application is latency sensitive, or if you’d prefer to take an alternate action when a request takes too long (such as displaying less data or consulting a cache), deadlines are very useful: they give your application more control.
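
The interaction between automatic retries and a deadline can be sketched in plain Python. This is an illustration of the concept only, not the datastore's retry logic: call_with_deadline, DeadlineExceeded, and the retry_delay parameter are hypothetical names, IOError stands in for a transient failure, and the sketch only stops retrying once the deadline is reached (it does not interrupt a call that is already running).

```python
import time


class DeadlineExceeded(Exception):
    """Hypothetical error: no successful result before the deadline."""


def call_with_deadline(operation, deadline, retry_delay=0.01):
    """Retry operation() until it succeeds or `deadline` seconds have passed."""
    give_up_at = time.time() + deadline
    while True:
        try:
            return operation()
        except IOError:
            # Assumption: IOError marks a transient failure worth retrying.
            if time.time() + retry_delay >= give_up_at:
                # Another attempt would overrun the deadline, so hand
                # control back to the caller with an error.
                raise DeadlineExceeded("no result within %s seconds" % deadline)
            time.sleep(retry_delay)
```

A caller can catch DeadlineExceeded and take the alternate action described above, such as showing less data or consulting a cache, instead of waiting out the full retry window.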

Setting the Read Policy and Datastore Deadline

To enable deadlines and eventual consistency with Python, you create an RPC object with the function create_rpc() and set the deadline and read_policy on the object. You then pass the RPC object to the call as an argument. Here’s an example of how you would do this on a datastore fetch:

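from google.appengine.ext import db
# Deadline of 5 seconds; allow a read from another location if the primary is unavailable.
rpc = db.create_rpc(deadline=5, read_policy=db.EVENTUAL_CONSISTENCY)
results = Employee.all().fetch(10, rpc=rpc)  # Employee is a model class defined elsewhere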

To set a datastore read policy and a deadline in Java, you can call the methods addExtension() and setTimeoutMillis(), respectively, on a single Query object:

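Query q = pm.newQuery(Employee.class);
// Read policy: allow eventually consistent reads for this query.
q.addExtension("datanucleus.appengine.datastoreReadConsistency", "EVENTUAL");
q.setTimeoutMillis(5000); // Deadline in milliseconds; 5000 is an illustrative value.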

You can also use these features in JDO and JPA through configuration, or directly with the low-level Java datastore API. See the documentation for these features in Python and Java for more information.
