Act 1

------------------------ The curtain opens ------------------------

- Jill µ service is already sitting in the bar. The setting is typical of all bars. It is crowded with a lot of µ services having a good time with drinks and conversations. It is a chatty place. Jack µ service enters, looking haggard. -

A Services Bar

Jack µ: What a day! Shall I sit here? ... I am going to sit here. Please don't mind.

Jill µ: Go ahead. No problems.

Jack µ: Thank you. I am Jack. Who are you?

Jill µ: I am Jill. Pleased to meet you in person!

Jack µ: Meet you in person? What does that mean? Does that mean that we know each other?

Jill µ: You talk to me all the time professionally, by calling my URI - Ring any bells?

Jack µ: Oh yeah. I know you. I send requests to that URL, I mean your URL, all the time. Yeah, I remember.

Jill µ: Told you.

- Jill is smiling. Then Jack's demeanor changes. Looks up agitated & angry -

Jack µ: You gave me so much PAIN last week. Why did you do that?

- Jill looks at Jack shocked for a moment, but calms herself down -

Jill µ: What did I do?

Jack µ: Oh! You don't remember last Wednesday. Oh boy! I remember it all.

- A look of understanding comes on Jill's face -

Jill µ: Ah! I do remember. It was a bad day. My development guys were all looking at my problem...

- Jack lifts his hand and interrupts Jill -

Jack µ: No, no, no! It was a BAD day for me. I got sc***ed because of you. It is all because of you.

Jill µ: Hold on. Let us take it slow. Here, have a drink.

Jack µ: Don't patronize me. You did this to me.

Jill µ: What really happened? You need to give me details so that I can understand. Calm down and have this.

- Jill picks up the CPU credits from the counter and hands them to Jack, who takes a swig. Jack looks a bit satiated -

Jack µ: Okay. This is what happened. I called you, as usual, to get the answers I need from you to do my job. I waited for an answer but it never came. And I kept waiting. Then more people asked for the information and I called you again. And you just did not respond. I got all stuck - all blocked threads; all my servers froze up. I couldn't serve any of my customers. Everyone was shouting & pinging me again & again, but I just couldn't do anything. I lost face because of you.

Jill µ: Ok. Let us talk through this. We are both µ services. We take care of different aspects of the business domain - my developers call them bounded contexts. We work together to ensure that our customers' needs are met. Your customers are my customers too. For a given use case, many times, we have to talk to each other to get things done. And we do that all the time and mostly it is successful.

Jack µ: Yeah, I know that. I am not d**b. But on that day, you....

- Jill interrupts Jack -

Jill µ: Hold on Jack. Let me finish. We both want the same thing & you need to understand that. But given the nature of the world we live in, it is possible that one of us doesn't work as well at times. Things go wrong with the Infra guys sometimes. Or sometimes our developers make a mistake. And sometimes the Network guy acts up - rare - but that does happen. We are a distributed system. We have to live with the problems that come with it.

Cascading Failures

Jack µ: Yeah, yeah. I understand all that. But I really don't want to go down when you go down. I don't like this CASCADING of failures. There are many things that I can do to help my customers even when you are down. But since you bring me down with you, I can't serve anyone. I don't like it.

Jill µ: I understand that Jack. I appreciate you wanting to be available for our customers. Please believe me. But as I said, things go wrong with us at times. Last week, I heard your DB guy was acting up and you couldn't help your customers. Pitcher µ was complaining about it to me.

Jack µ: Yeah, that did happen. I know it happens to all of us. But I don't want to feel helpless like this.

Jill µ: Let us dig deeper Jack. How do you call me to get your answers?

Jack µ: I call you through your REST endpoint. I make a call with the right parameters and you return a JSON response almost immediately. That is the way I like it. But that time everything went wrong.


Jill µ: I got what you are doing now. By the way, do you use TIMEOUTS on your side? I mean on your calls to me as a client.

Jack µ: What timeouts? I don't follow you.

Jill µ: Ok. A timeout is the concept of time-boxing an operation so that, if things don't work out within that time, you give up and move on to something else. It is employed in many places. Not just in service-to-service requests, but when getting any resource that is used by µ services like us. We could use timeouts for acquiring a lock, or for connecting to a database or file system. In all such actions, a timeout is a good idea.
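
- An aside to the audience: a minimal Python sketch of time-boxing a lock acquisition, one of the cases Jill mentions. The helper name try_with_lock and the half-second default are made up for illustration. -

```python
import threading


def try_with_lock(lock, work, timeout_seconds=0.5):
    """Run work() while holding lock; return None if the lock is not free in time."""
    # acquire() waits at most timeout_seconds and reports whether it succeeded.
    acquired = lock.acquire(timeout=timeout_seconds)
    if not acquired:
        return None  # give up on this attempt and let the caller move on
    try:
        return work()
    finally:
        lock.release()
```

If the lock stays busy for longer than the time box, the helper returns None instead of waiting forever, and the caller decides what to do next.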

Jack µ: I don't follow that fully. Can you explain it a bit more?

Jill µ: Sure. Let me take an example of what I do. Whenever I try to connect to a database, I try it for a period of 6 seconds. If something is wrong - I mean the DB could be having some issues or the network could be acting up, I give up on that task temporarily and throw an error for everybody to see and move on to other things. I might try it again, but I don't keep hoping and waiting forever for things to happen in the first call. Am I making sense?
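
- An aside to the audience: a rough Python sketch of Jill's 6-second rule. A plain TCP connection stands in for a real database driver here, and the names connect_with_timeout and DependencyUnavailable are hypothetical. Real drivers usually expose their own connect timeout setting. -

```python
import socket


class DependencyUnavailable(Exception):
    """Raised when a dependency cannot be reached within the time box."""


def connect_with_timeout(host, port, timeout_seconds=6.0):
    """Open a TCP connection; each connection attempt waits at most timeout_seconds."""
    try:
        return socket.create_connection((host, port), timeout=timeout_seconds)
    except OSError as exc:
        # OSError covers timeouts, refused connections and unreachable hosts.
        # Raise a visible error instead of waiting any longer.
        raise DependencyUnavailable(f"could not reach {host}:{port}") from exc
```

The caller can catch DependencyUnavailable, report it, and carry on with other work, retrying later if that makes sense.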

Jack µ: Kind of.... But... But if you give up, then you can't serve the customer's needs. I mean you don't have the data to serve the request. Then, what is the point?

Jill µ: You are right. For that request, I can't serve the customer for sure. But I also don't get bogged down by a slow or failed resource or dependency. I can drop that request and serve other requests. This way, I don't have a set of blocked threads and won't become completely unresponsive. Isn't that better?

Jack µ: Yeah. It sort of makes sense at a high level. Not sure how it applies to me though.

Jill µ: You said you call my REST endpoint. That is an HTTP call and you must be using some kind of HTTP client. Almost all good HTTP client libraries, whether in Python, Ruby, Java or any other language, provide a way to configure timeouts. So go ahead and configure them with some sane values and you are done.

Jack µ: Wow! That is interesting. So what kind of timeouts are involved in HTTP calls?

Jill µ: Typically there are two timeouts that are configured. One is the connection timeout - the maximum time allowed for establishing the TCP connection. The other is the read timeout - the maximum time spent waiting for data to arrive during a read. All these libraries provide the ability to configure both these values. Once you configure them to values appropriate for you, you are done.
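
- An aside to the audience: a minimal sketch of the two timeouts using only Python's standard library http.client. The function name get_with_timeouts and the default values are assumptions chosen for illustration, not recommendations. Many third-party HTTP clients accept both values directly. -

```python
import http.client


def get_with_timeouts(host, port, path, connect_timeout=1.0, read_timeout=2.0):
    """GET http://host:port/path; return (status, body), or None on timeout or failure."""
    # The timeout given here bounds the TCP connection attempt.
    conn = http.client.HTTPConnection(host, port, timeout=connect_timeout)
    try:
        conn.connect()
        # Once connected, switch the socket to the read timeout so that
        # waiting for the response is time-boxed separately.
        conn.sock.settimeout(read_timeout)
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read()
    except OSError:
        # Timeouts and connection errors are OSError subclasses: give up on
        # this call instead of holding the calling thread hostage.
        return None
    finally:
        conn.close()
```

A caller that gets None back can answer with a fallback or an error right away, rather than leaving a thread blocked on a service that is not responding.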

Jack µ: That sounds pretty neat. So you are saying that once I configure myself with these timeouts for my interactions with you, I will be hale, healthy and happy!

Jill µ: Sure. Doing this will be a great step towards your own stability and avoiding Cascading Failures.

Jack µ: Okay! This is good. I am finishing this drink and going to talk to my Dev folks right after. I am going to be the most stable µ service in the world. Cheers!!

Jill µ: Cheers!! Have fun buddy! Have a timeout!

Music grows louder and Jack and Jill go up to the dance floor.

------------------------ The curtain falls ------------------------

