Monday, January 23, 2012

Idempotent services

An important service concept which is often neglected in (web) service design is idempotency.

If web services use HTTP as the communication protocol, it must be realized that HTTP cannot guarantee a quality of service (QoS) of exactly once. You can achieve either best effort or at least once. The former is what you get when you do not retry after a communication error; the latter is what you get when you do retry.

Since best effort may mean that you lose a message every now and then, this QoS is rarely preferred. Usually some retry mechanism is implemented or configured. Hence, the QoS is most often at least once.
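The difference between the two can be sketched in a few lines of Python. This is only an illustration: CommunicationError, FlakyService and both send functions are made-up names, and the "service" is a local object that processes every message but loses its first reply.

```python
import time


class CommunicationError(Exception):
    """Hypothetical stand-in for a timeout or a dropped connection."""


class FlakyService:
    """Hypothetical service: processes every message, loses the first reply."""

    def __init__(self):
        self.received = []

    def __call__(self, message):
        self.received.append(message)  # the message IS processed
        if len(self.received) == 1:
            raise CommunicationError("timed out waiting for the reply")
        return "ok"


def send_best_effort(send, message):
    """Best effort: one attempt, no retry. The outcome may remain unknown."""
    try:
        return send(message)
    except CommunicationError:
        return None


def send_at_least_once(send, message, max_attempts=3, delay_seconds=0.0):
    """At least once: retry after a communication error.

    The service may therefore see the same message more than once.
    The attempt limit is an assumption of this sketch.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send(message)
        except CommunicationError:
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)


service = FlakyService()
reply = send_at_least_once(service, "m1")
# The sender got its reply, but the service received the message twice.
```

Note that the retry is exactly what causes the duplicate delivery: the sender cannot tell a lost request from a lost reply.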

Take a look at this. Suppose you send a message over HTTP to a web service, but you receive a timeout. What has happened? Was the message accepted by the web service? Was it even processed? Or was it never sent? We can't really know.

When the sender of the message is an interactive application, the person using the application generally pushes the submit button again. The message is resent. This is typical behaviour. (Resending may occur at many points along the communication line: proxies, load balancers, etc. Don't think only humans do this!)

But what if the message was received by the web service? The service may have processed it. Suppose the processing results in the insertion of data in a database. What happens when the second message arrives?

Surely, the second message may lead to a second insertion of the same data. There may thus be multiple data records for exactly the same data. This is not really what the average database owner wants. Database designers identify data with unique keys. If you try to insert the same data, with the same unique key, into the database, the result is a uniqueness constraint violation, which is reported as an error. In Java, we are all familiar with the DuplicateKeyException.
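To make the uniqueness violation concrete, here is a small sketch using Python's built-in sqlite3 module and an in-memory database. The orders table and the create_order function are invented for the example; sqlite3.IntegrityError plays the role that DuplicateKeyException plays in Java.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id TEXT PRIMARY KEY, item TEXT NOT NULL)"
)


def create_order(order_id, item):
    """Naive, non-idempotent create: a plain INSERT (hypothetical service code)."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO orders (order_id, item) VALUES (?, ?)",
            (order_id, item),
        )
    return {"status": "created", "order_id": order_id}


# First message: processed normally (imagine its reply got lost).
first = create_order("A-1", "book")

# The resent message hits the primary key constraint.
try:
    second = create_order("A-1", "book")
except sqlite3.IntegrityError as error:
    second = {"status": "error", "detail": str(error)}
```

The data were stored exactly once, which is fine, but the sender of the second message only sees an error.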

So the sender of the message will first insert data into the service's database, but he doesn't receive an acknowledgement of this, due to the HTTP timeout. He sends it again, but now he receives a DuplicateKeyException. Hmm, that's unexpected. Now what?

The sender needs to perform at least one read operation against the service's database in order to verify whether the data were correctly inserted or not. A human operating an application may do this naturally, but implementing this in an automated way can be very complex indeed. And who needs to implement this complexity? It is the service consumer, not the service provider. From a business point of view, this is not very customer friendly.
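A rough idea of what the consumer ends up writing is sketched below. Everything here is hypothetical: the in-memory NonIdempotentService, the Timeout and DuplicateKey exceptions, and the rule that a stored record equal to what we sent counts as "our insert succeeded". A real consumer would also have to deal with partial matches, concurrent writers, and a read that itself times out.

```python
class Timeout(Exception):
    """Hypothetical: no reply was received in time."""


class DuplicateKey(Exception):
    """Hypothetical: the service reports a uniqueness violation."""


class NonIdempotentService:
    """Hypothetical in-memory service that loses the reply to its first create."""

    def __init__(self):
        self.rows = {}
        self.lose_next_reply = True

    def create(self, key, data):
        if key in self.rows:
            raise DuplicateKey(key)
        self.rows[key] = data
        if self.lose_next_reply:
            self.lose_next_reply = False
            raise Timeout("no reply")  # data stored, acknowledgement lost
        return key

    def read(self, key):
        return self.rows.get(key)


def create_with_recovery(service, key, data, max_attempts=3):
    """Consumer-side workaround: retry, and read back on a duplicate key."""
    for _ in range(max_attempts):
        try:
            return service.create(key, data)
        except Timeout:
            continue  # outcome unknown: try again
        except DuplicateKey:
            # Either our earlier attempt succeeded, or the key belongs to
            # somebody else's data. Only a read can tell the difference.
            if service.read(key) == data:
                return key
            raise
    raise Timeout(f"gave up after {max_attempts} attempts")


service = NonIdempotentService()
key = create_with_recovery(service, "A-1", {"item": "book"})
```

Every consumer of the service would need its own version of this logic.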

All this can be avoided if the service were implemented as an idempotent service. Idempotency means that no matter how often I send the same message, I always get the same response.

Read operations are idempotent. No matter how often I read the same data, I will always get the same answer. (Yes, of course, until somebody changes those data, but that is not the point here.)

It is the create, update, and delete operations where idempotency becomes important. Suppose I want to create some data in a database. In normal operation, the service accepts my data, inserts it into a database, and responds with a success message, very likely including the unique key which identifies the inserted data.

If I were to send the same data again for creation, an idempotent service would not respond with a DuplicateKeyException, but with the same answer I would have received had this been the first message to insert these data into the database. Thus, I should receive the same success message again, and if the unique key is included in that message, it should be the same unique key.
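One possible way to get this behaviour is sketched below, again with Python's sqlite3 module. The customers table, the create_customer function, and the choice of the e-mail address as the natural key that identifies "the same data" are all assumptions made for the example. The "conflict" answer for the same e-mail with a different name is also my own addition; the text above only covers identical messages.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    " customer_key TEXT PRIMARY KEY,"
    " email TEXT NOT NULL UNIQUE,"
    " name TEXT NOT NULL)"
)


def create_customer(email, name):
    """Idempotent create: a repeated message gets the original answer.

    Single-connection sketch; a real service must also cope with two
    identical requests arriving at the same moment.
    """
    with conn:  # commits when the block exits normally
        row = conn.execute(
            "SELECT customer_key, name FROM customers WHERE email = ?",
            (email,),
        ).fetchone()
        if row is not None:
            existing_key, existing_name = row
            if existing_name == name:
                # Same message as before: same success response, same key.
                return {"status": "created", "customer_key": existing_key}
            # Assumption: same natural key with different data is a conflict.
            return {"status": "conflict", "customer_key": existing_key}
        customer_key = str(uuid.uuid4())
        conn.execute(
            "INSERT INTO customers (customer_key, email, name) VALUES (?, ?, ?)",
            (customer_key, email, name),
        )
        return {"status": "created", "customer_key": customer_key}


first = create_customer("ann@example.org", "Ann")
second = create_customer("ann@example.org", "Ann")  # the resent message
```

The sender can now retry as often as needed without ever having to inspect the database itself.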

For update and delete operations, idempotency essentially works the same as for the create operation.
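As a minimal sketch, assuming an in-memory dictionary called records and two invented functions, an update can be made repeatable by setting an absolute value, and a delete by answering "deleted" whether or not the record was still there. That last choice is an assumption of this sketch, not the only possible design.

```python
records = {"A-1": {"status": "open"}}


def set_status(key, status):
    """Idempotent update: sets an absolute value instead of applying a change."""
    if key not in records:
        return {"status": "not_found", "key": key}
    records[key]["status"] = status
    return {"status": "ok", "key": key, "record": dict(records[key])}


def delete_record(key):
    """Idempotent delete: 'already gone' gets the same answer as 'deleted now'."""
    records.pop(key, None)  # no error if the key is absent
    return {"status": "deleted", "key": key}


update_1 = set_status("A-1", "closed")
update_2 = set_status("A-1", "closed")  # resent message, same response

delete_1 = delete_record("A-1")
delete_2 = delete_record("A-1")  # resent message, same response
```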

Of course, implementing idempotency can be complex for the service provider. And complexity costs money. That's probably the real reason why idempotent services are so rare. (At least in the government environments I tend to work in.)

But once a service is made idempotent, it is foolproof, and can guarantee data consistency across integration boundaries. Services should be made idempotent. It should be an architectural and service design principle, especially for services which are used by many service consumers. You can't really push the complexity of keeping data consistent out to your consumers, can you? The consumers are your customers; you should treat them as such. You gain more customers if you make it easy for them.

3 comments:

  1. Anonymous, 21:35

    hi ignazw

    If idempotency means that no matter how often I send the same message, I always get the same response, it seems that it is important to be able to recognize a service as an idempotent service.

    Because if I am not aware that I am using an idempotent service, I might interpret multiple same responses as multiple successes instead of just one success (depending on the functionality). Or did I miss something?

    So, how can idempotency of a service be known or communicated?

    regards
    Jan Vervecken

  2. The responses must be identical, if and only if the requests were identical. Normally there will be unique message IDs associated with the messages. Idempotency must include these message IDs. If the same data is to be inserted, but is sent with a different message ID in the request, the service should interpret this as a new message, and hence it should (try to) insert it twice.

    If you include unique message IDs in the idempotency solution, there should be no confusion. You will then also recognize multiple identical responses as being the response to a single request.

  3. I fully agree with your views. Thanks for sharing your points. These are really helpful to all.
    IT Support Glasgow
