My Thoughts on Idempotency
What is Idempotency?
Idempotency is a property of certain operations whereby they can be applied an arbitrary number of times without changing the state beyond the initial application. In software, the term is most often applied to APIs. An idempotent API is one that, when invoked an arbitrary number of times with a fixed request, produces the same observable side effects as if it had been called a single time.
A key observation about idempotent APIs is that they are evaluated for idempotency based on the side effects they produce, not the result they return! Let's look at two examples of idempotent APIs to clarify this point:
// EXAMPLE 1:
// UploadBlob will upload the provided blob to the provided path.
// If a blob already exists at the provided path an ALREADY_EXISTS
// error will be returned. If the blob was successfully uploaded
// then no error will be returned.
UploadBlob(path, body) error

// EXAMPLE 2:
// CreateDatabase will create a database with given name and config.
// If the database already exists no error will be returned.
// Instead the current status of the database will be returned.
CreateDatabase(name, config) (state, error)
Both of these APIs can produce different results when they are called multiple times with the same input. The first API may first return success and then later return an ALREADY_EXISTS error, while the second API may first return status=pending and on subsequent calls return status=running. Despite the different return values, these APIs are both considered idempotent because retries do not result in any side effects not caused by the initial invocation.
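To make the point about side effects concrete, here is a minimal sketch of Example 1 backed by an in-memory map. The BlobStore type, the NewBlobStore constructor, and the ErrAlreadyExists value are assumptions made up for illustration; they are not part of any real storage API.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAlreadyExists stands in for the ALREADY_EXISTS error from Example 1.
var ErrAlreadyExists = errors.New("ALREADY_EXISTS")

// BlobStore is a hypothetical in-memory store used only for illustration.
type BlobStore struct {
	blobs map[string]string
}

// NewBlobStore returns an empty store.
func NewBlobStore() *BlobStore {
	return &BlobStore{blobs: make(map[string]string)}
}

// UploadBlob writes body at path only if nothing is stored there yet.
func (s *BlobStore) UploadBlob(path, body string) error {
	if _, ok := s.blobs[path]; ok {
		return ErrAlreadyExists // the repeat call changes no state
	}
	s.blobs[path] = body
	return nil
}

func main() {
	store := NewBlobStore()
	fmt.Println(store.UploadBlob("/a", "hello")) // <nil>
	fmt.Println(store.UploadBlob("/a", "hello")) // ALREADY_EXISTS
	fmt.Println(len(store.blobs))                // 1
}
```

The two calls return different values, but the store holds exactly one blob after either one call or two, which is what makes the operation idempotent.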
Why does idempotency matter?
Idempotency is a key building block in distributed systems. It is needed because exactly-once delivery in a distributed system is impossible to achieve. If we had a solution to the exactly-once delivery problem, we would also have a solution to the classic Two Generals' Problem, which is provably unsolvable.
Given that exactly-once delivery is impossible, communication between components in distributed systems is described in terms of at-least-once delivery (a message may arrive more than once) or at-most-once delivery (a message may be lost).
For applications that can tolerate message loss or duplicate events, at-most-once or at-least-once delivery works just fine. But there are many applications that appear to require exactly-once delivery.
Instead of attempting to achieve exactly-once delivery, we achieve exactly-once execution. We do this by combining at-least-once delivery with an idempotent API: the caller keeps retrying until it gets a successful status back from the API, and the API is written to be idempotent. Together, these two have the effect of producing exactly-once execution.
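Here is a rough sketch of that combination: a caller that retries until it sees success, talking to an idempotent API whose first response gets lost. The flakyServer type, the errTimeout value, and the retry helper are all hypothetical names invented for this example, and a real client would also want backoff between attempts.

```go
package main

import (
	"errors"
	"fmt"
)

var errTimeout = errors.New("timeout")

// flakyServer is a hypothetical server whose first response is lost:
// the work is done, but the caller sees a timeout.
type flakyServer struct {
	databases map[string]bool
	calls     int
}

// CreateDatabase is idempotent: creating an existing name changes nothing.
func (s *flakyServer) CreateDatabase(name string) error {
	s.calls++
	s.databases[name] = true
	if s.calls == 1 {
		return errTimeout // response lost after the side effect happened
	}
	return nil
}

// retry calls op until it succeeds or maxAttempts is exhausted.
func retry(maxAttempts int, op func() error) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		if err = op(); err == nil {
			return nil
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, err)
}

func main() {
	srv := &flakyServer{databases: map[string]bool{}}
	err := retry(3, func() error { return srv.CreateDatabase("orders") })
	fmt.Println(err, srv.calls, len(srv.databases)) // <nil> 2 1
}
```

The request was delivered twice, but the database was created once, which is the exactly-once execution described above.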
How do you make an API idempotent?
APIs are made idempotent by using some key to deduplicate requests. Many APIs have a natural idempotency key built in. For example, our earlier examples of UploadBlob and CreateDatabase both had naturally built-in keys that could be used to enforce idempotency. In the case of UploadBlob the key was the blob path, and in the case of CreateDatabase the key was the database name.
APIs that have a naturally built-in key to deduplicate requests will often be implemented in an idempotent way without the code author giving any special attention to making the API idempotent.
However, there are other APIs that do not have a naturally built-in key. For example, consider an API to charge a credit card. There is no naturally built-in key here. Said another way, there is no obvious way to distinguish between a retry and an identical but distinct request. To make an API like this idempotent, an explicit idempotency key needs to be added to the API request. The idempotency key is generated client side and stored server side to enable server-side deduplication of retries. The Stripe APIs are an excellent illustration of this.
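As a minimal sketch of server-side deduplication with an explicit key, consider the following. Everything here (PaymentService, ChargeCard, the Charge struct, the key strings) is a hypothetical stand-in for illustration and is not how Stripe or any real payment API is implemented.

```go
package main

import (
	"fmt"
	"sync"
)

// Charge is a hypothetical record of one completed charge.
type Charge struct {
	ID     int
	Card   string
	Amount int // in cents
}

// PaymentService deduplicates requests by a client-generated key.
type PaymentService struct {
	mu     sync.Mutex
	byKey  map[string]Charge
	nextID int
}

// NewPaymentService returns a service with no stored keys.
func NewPaymentService() *PaymentService {
	return &PaymentService{byKey: make(map[string]Charge)}
}

// ChargeCard charges the card once per idempotency key. A repeat of
// the same key returns the stored charge instead of charging again.
func (p *PaymentService) ChargeCard(key, card string, amount int) Charge {
	p.mu.Lock()
	defer p.mu.Unlock()
	if c, ok := p.byKey[key]; ok {
		return c
	}
	p.nextID++
	c := Charge{ID: p.nextID, Card: card, Amount: amount}
	p.byKey[key] = c
	return c
}

func main() {
	svc := NewPaymentService()
	first := svc.ChargeCard("key-1", "card-42", 500)
	again := svc.ChargeCard("key-1", "card-42", 500) // retry, same key
	other := svc.ChargeCard("key-2", "card-42", 500) // distinct request
	fmt.Println(first.ID, again.ID, other.ID)        // 1 1 2
}
```

The key is what lets the server tell the retry apart from the identical but distinct second charge. A fuller version would probably also check that a repeated key arrives with the same card and amount as the original request.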
While it is true that many APIs do not need explicit idempotency keys because they have some natural key that can be used, it can be helpful to define explicit idempotency keys for all APIs anyway. This can improve the consistency and flexibility of the APIs.
Does the server store idempotency keys forever?
Idempotency keys must be stored on the server so that they can be used to deduplicate requests. But if keys are stored on the server, there must be some method for expiring them; otherwise they will consume an unbounded amount of space.
If the idempotency key is simply a resource identifier rather than an explicit idempotency key, then the server does not need to do anything special to expire the keys. The lifecycle of the resource will simply be the lifecycle of the idempotency key. However, if the API supports passing an explicit idempotency key, then the server must explicitly expire those keys.
There are two common policies for idempotency key expiration. The first is based on time: the keys are stored for a fixed time window and then deleted. This is the approach the Stripe APIs take. The second is to expire a key after the resource it operated on has been deleted. This is the approach the AWS APIs take.
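The time-based policy can be sketched roughly as below. The KeyStore type, its methods, and the 24-hour window are assumptions chosen for illustration; the window is an arbitrary value, not a statement about what any particular provider uses. Times are passed in explicitly so the behavior is easy to follow.

```go
package main

import (
	"fmt"
	"time"
)

// demoTTL and demoStart are arbitrary values used only by this example.
const demoTTL = 24 * time.Hour

var demoStart = time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)

type entry struct {
	result   string
	storedAt time.Time
}

// KeyStore keeps idempotency keys for a fixed window (ttl).
type KeyStore struct {
	ttl     time.Duration
	entries map[string]entry
}

// NewKeyStore returns an empty store with the given window.
func NewKeyStore(ttl time.Duration) *KeyStore {
	return &KeyStore{ttl: ttl, entries: make(map[string]entry)}
}

// Put records the result of the request made with key at time now.
func (s *KeyStore) Put(key, result string, now time.Time) {
	s.entries[key] = entry{result: result, storedAt: now}
}

// Get returns the stored result if the key exists and has not expired.
func (s *KeyStore) Get(key string, now time.Time) (string, bool) {
	e, ok := s.entries[key]
	if !ok || now.Sub(e.storedAt) >= s.ttl {
		return "", false
	}
	return e.result, true
}

// Sweep deletes expired keys and reports how many were removed.
func (s *KeyStore) Sweep(now time.Time) int {
	removed := 0
	for k, e := range s.entries {
		if now.Sub(e.storedAt) >= s.ttl {
			delete(s.entries, k)
			removed++
		}
	}
	return removed
}

func main() {
	store := NewKeyStore(demoTTL)
	store.Put("key-1", "charge-1", demoStart)
	_, ok := store.Get("key-1", demoStart.Add(time.Hour))
	fmt.Println(ok)                                         // true
	fmt.Println(store.Sweep(demoStart.Add(25 * time.Hour))) // 1
	_, ok = store.Get("key-1", demoStart.Add(25*time.Hour))
	fmt.Println(ok) // false
}
```

One consequence worth noticing: once a key has expired, a late retry with that key is treated as a brand new request, so the window needs to be comfortably longer than any client's retry horizon.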
What issues can you encounter with client side retries?
While idempotent APIs make client-side retries safe, there are still some common issues that you should be aware of when you introduce client-side retries.
Issue 1: The Already Existing Record
Suppose you implemented the UploadBlob API from earlier in this post. Also suppose you implemented some client-side retries to help mask transient failures for your users. It seems like you did everything right here, but there is actually an unfortunate experience your customers will encounter. Consider the following sequence of events:
- Client uses client SDK to issue UploadBlob request
- On the first attempt the SDK gets a timeout error, but the server actually did successfully upload the blob.
- The SDK wants to be helpful so it retries the transient timeout error, but this time it gets an ALREADY_EXISTS error, which it returns to the user.
- The user gets an error indicating that the blob already exists. But this is misleading because the blob actually did not exist; the SDK had just uploaded it on an earlier request that timed out.
This is a very common issue that arises when implementing idempotent APIs plus client side retries. A common solution to this problem is to simply have the API return the existing record on idempotency key conflict rather than returning an error. It is also possible to make the API behavior configurable by the client in the case of an idempotency key conflict.
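Here is a small sketch of the "return the existing record" approach. The Blob and BlobStore types and the created flag are hypothetical names invented for this example, not a real storage API.

```go
package main

import "fmt"

// Blob is a hypothetical stored record.
type Blob struct {
	Path string
	Body string
}

// BlobStore is a hypothetical in-memory store keyed by blob path.
type BlobStore struct {
	blobs map[string]Blob
}

// UploadBlob stores the blob if the path is free. On a key conflict it
// returns the existing record with created=false instead of an error,
// so a retry after a lost response looks like a success to the caller.
func (s *BlobStore) UploadBlob(path, body string) (blob Blob, created bool) {
	if existing, ok := s.blobs[path]; ok {
		return existing, false
	}
	b := Blob{Path: path, Body: body}
	s.blobs[path] = b
	return b, true
}

func main() {
	store := &BlobStore{blobs: map[string]Blob{}}

	// First attempt: the server stores the blob, but imagine the
	// response is lost and the SDK sees a timeout.
	_, created := store.UploadBlob("/a", "hello")
	fmt.Println(created) // true

	// SDK retry: no error, just the record that already exists.
	blob, created := store.UploadBlob("/a", "hello")
	fmt.Println(blob.Body, created) // hello false
}
```

The trade-off is that a caller who genuinely collides with someone else's blob no longer gets an error, so it may be worth comparing the returned record against what was sent, or exposing the created flag to the user.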
Issue 2: The Recreated Resource
Suppose we implemented idempotent CreateDatabase and DeleteDatabase APIs. Also suppose we implemented a client-side SDK to help our users with automatic retries. Now consider the following sequence of events:
- Bob uses the SDK to issue a CreateDatabase request
- The SDK issues the first attempt and gets a timeout. On the server side the call was successful and the database was actually created.
- Alice uses the SDK to issue a DeleteDatabase request
- The SDK issues the first attempt to delete the database and is successful.
- Alice gets an indication that the database was deleted.
- The SDK retries Bob’s earlier attempt to create the database because the previous call encountered a transient timeout error. This second call is successful and the database is created for a second time.
This behavior is actually pretty odd. Bob ended up successfully creating the same database twice. There is not really an obviously correct behavior in this case. The AWS APIs simply allow this double creation, as this example illustrates. Other applications can make other reasonable decisions about how to handle this case.
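The sequence above can be replayed against a toy server to see the double creation happen. The Server type and its creations counter are hypothetical and exist only to make the effect visible; the timeout itself is simulated by simply calling CreateDatabase a second time.

```go
package main

import "fmt"

// Server is a hypothetical database service with idempotent
// create and delete operations keyed by database name.
type Server struct {
	databases map[string]bool
	creations int // how many times a database was actually created
}

// CreateDatabase creates the database if absent; otherwise it is a no-op.
func (s *Server) CreateDatabase(name string) {
	if !s.databases[name] {
		s.databases[name] = true
		s.creations++
	}
}

// DeleteDatabase removes the database if present; otherwise it is a no-op.
func (s *Server) DeleteDatabase(name string) {
	delete(s.databases, name)
}

func main() {
	srv := &Server{databases: map[string]bool{}}
	srv.CreateDatabase("orders") // Bob, attempt 1: succeeds, response times out
	srv.DeleteDatabase("orders") // Alice: deletes successfully
	srv.CreateDatabase("orders") // Bob's SDK retry: creates it again
	fmt.Println(srv.creations, srv.databases["orders"]) // 2 true
}
```

Each individual call is idempotent, yet the interleaving leaves Alice's deleted database alive again, which is why there is no single obviously correct answer here.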
Well, that is all of my thoughts on idempotency. Thanks for reading.