Nebulous: The Protocol

(It was going to be “The Nebulous Protocol”, but that sounds like something with Matt Damon in it.)

Introduction

This document defines some specific rules of engagement for processes to communicate using STOMP. It so happens that NebulousStomp is currently the only open-source thing to use these rules – I made them up – but technically speaking, that's a “protocol”, and I'm going to call it such because you can't stop me.

The basic idea is this: one system sends a message, a request, and then waits for an answer to that message, a response, from another system. STOMP doesn't explicitly support that; The Protocol adds it.

(Note that a request and a response are just notional terms; you might in theory have a request which provokes a response which provokes another response in return, in which case the second message is both a response and a request, and the terms get rather less useful.)

The Protocol

Let's start with the actual protocol, the rules. They won't make sense without reading the rest of the document, but:

  1. Every valid request (that is, every message with an identifiable verb other than "success" or "error") will always generate a response. If the handling routine cannot generate a valid response, or if there is no routine to handle the verb in question, then the response should be the error verb. If no response is required, you should still receive the success verb.

  2. You should expect that requests that do not have a verb will not be responded to or even understood. (However, there is no general requirement for message bodies to follow the verb / parameters / description format. If you have a program that sends a request and waits for a response, the format of that response is outside the remit of The Protocol.)

  3. If a request has the neb-reply-to header, the response should use that as a destination; otherwise it should be sent to the same destination that the request was sent to.

  4. If the request has a neb-reply-id header, then the response should set the neb-in-reply-to header to that.

  5. The request may specify the content type of a response (JSON or text) by its own content type. That is, if a response can be in either form, it should take the form of the request as an indication as to which to use.

  6. A given queue that is the target of requests within The Protocol will be for only one system or common group of systems. They may consume (ACK) messages sent to it even if they do not have the facility to process them. If multiple systems share a queue, they should understand that messages will be consumed from it at random. (So, don't do that.)
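Rules 3 and 4 are mechanical enough to sketch in Ruby. This is not lifted from NebulousStomp: response_route is a name I've made up for illustration, and it works on plain Hashes rather than any STOMP library's frame objects.

```ruby
# Hypothetical helper: decide where a response goes and which headers it
# carries, given the request's headers and the queue the request arrived on.
def response_route(request_headers, request_destination)
  headers = {}

  # Rule 4: echo the request's neb-reply-id back as neb-in-reply-to.
  reply_id = request_headers["neb-reply-id"]
  headers["neb-in-reply-to"] = reply_id if reply_id

  # Rule 3: prefer neb-reply-to; otherwise reply on the request's own queue.
  destination = request_headers["neb-reply-to"] || request_destination

  [destination, headers]
end

# Queue names here are invented for the example.
destination, headers = response_route(
  { "neb-reply-to" => "/queue/replies", "neb-reply-id" => "abc123" },
  "/queue/requests"
)
```

With neither header set, the response simply goes back to the request's own queue with no extra headers, which is exactly the case Rule 6 makes risky.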

Components

By way of an explanation of the above section.

Headers

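STOMP allows you to define custom headers. We have three: neb-reply-to (the destination the response should be sent to), neb-reply-id (an identifier chosen by the sender of the request), and neb-in-reply-to (set on the response, to the value of the request's neb-reply-id).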

Message Body

The Protocol specifies a format for the message body. It consists of three fields: the verb, the parameters, and the description.

Nebulous supports message bodies in either JSON or plain text (in which case it expects to find the fields formatted in the same way as STOMP headers: separated by line breaks, where each line consists of the field name, a colon, and the value).
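As a rough illustration of the two body formats, here is a minimal Ruby sketch that reads either one into a Hash. parse_body is a hypothetical name, and the field names used in the example ("verb", "parameters", "description") are my assumption about the spelling on the wire; this is not NebulousStomp's own parser.

```ruby
require "json"

# Hypothetical parser: turn a message body into a Hash of fields.
# Assumes the content type string contains "json" for JSON bodies;
# anything else is treated as the plain-text, header-style format.
def parse_body(body, content_type)
  if content_type.to_s.include?("json")
    JSON.parse(body)
  else
    body.each_line.with_object({}) do |line, fields|
      # Split on the first colon only, so values may contain colons.
      name, value = line.chomp.split(":", 2)
      next if name.nil? || value.nil? # skip blank or malformed lines
      fields[name.strip] = value.strip
    end
  end
end

# Field names below are assumed for illustration.
text_fields = parse_body("verb:ping\ndescription:are you there?\n", "text/plain")
json_fields = parse_body('{"verb":"ping","description":"are you there?"}', "application/json")
```

Both calls should produce the same Hash, which is the point: the handling code downstream shouldn't need to care which form the request took, only which form to answer in (Rule 5).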

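There are a couple of special verbs: success and error. These are the only verbs that do not themselves require a response (see Rule 1).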

Notes on usage

There are two use cases here. The first (let's call it Q & A) is when a process needs information and sends a request, then waits for a response. The second (let's call it Responder) is at the other end of that process; it camps onto one or more queues waiting for requests, and arranges for responses to be sent.

Responder

Let's talk about the Responder use case first, since it's simpler. On any given system, you'll need to designate a queue for incoming requests. You might want more than one. (In ABL, where traffic jams are likely because I can't just spawn a thread to handle each incoming message, my current thinking is to have two incoming queues, one for requests that take a few seconds, and another for requests that take longer.)

Remember that Rule 6 says that any messages that go to a queue like this will be consumed without concern for whether the message will make sense to the system in question; this is basically there so that the ABL code can separate the process of grabbing new messages from the process of working out what they are and how to answer them.

But the simplicity is appealing, regardless: post to this queue and the Responder that looks after this system will process it. Rule 1 guarantees you an answer so long as the system is up, so if you don't get one then either the target system is down or it's too busy to respond.

The expectation is that the Responder system should use the verb of the message to control how it is dealt with. The parameters field is verb-specific; the combination of verb, expected parameters, and the nature of the returned message forms a sort of contract, i.e. “verb x always expects parameters like this, and always behaves like that in response”. This is partly implied by Rule 2, I think.
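To make the verb-as-contract idea concrete, here is a minimal Ruby sketch of a Responder's dispatch step. Everything named here is hypothetical: answer_request, the handlers table, and the use of a "description" field to carry the error text are my assumptions, not anything The Protocol or NebulousStomp mandates.

```ruby
# Hypothetical dispatcher. `request` is a Hash of body fields;
# `handlers` maps a verb to a callable that takes the parameters field
# and returns a response Hash (or nil if it has nothing to say).
def answer_request(request, handlers)
  verb = request["verb"]

  # Rule 2: a message without a verb gets no response.
  return nil if verb.nil? || verb.empty?
  # Rule 1 only covers verbs other than success and error.
  return nil if %w[success error].include?(verb)

  handler = handlers[verb]
  # Rule 1: no routine for this verb means the error verb.
  return { "verb" => "error", "description" => "unknown verb: #{verb}" } unless handler

  # Rule 1: if no particular response is required, send success anyway.
  handler.call(request["parameters"]) || { "verb" => "success" }
rescue StandardError => e
  # Rule 1: a handler that cannot produce a valid response yields error.
  { "verb" => "error", "description" => e.message }
end

# Verbs invented for the example.
handlers = {
  "ping" => ->(_params) { nil },
  "echo" => ->(params) { { "verb" => "echoed", "parameters" => params } }
}
reply = answer_request({ "verb" => "ping" }, handlers)
```

Keeping the table as data means the contract for each verb lives in one place, which is about as much enforcement as The Protocol can hope for.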

Q & A

The Q & A use case is more interesting since it's what the whole thing is for, but some of it really falls outside of The Protocol.

In theory, Rule 3 says you can leave out neb-reply-to and pick up your response from the same queue you posted the message to. But Rule 6 says that if you do that you don't have any guarantee at all of getting your message; the Responder will take it. So for practical purposes you have to set a reply-to queue in your request.

Likewise, Rule 4 says that neb-reply-id is optional. But in practice you should almost certainly set it. Yes, you can specify a brand new queue that is unique to your request – or probably unique, anyway – but it turns out that some message servers, our RabbitMQ included, don't let you subscribe to a queue that doesn't exist. It's easy enough to create a queue: you just send a message to it. But now there are two messages in that queue (or more if it turns out that you've got some queue namespace collision after all) and you have to pick your response from it, and the easiest way is to set a reply id.

So let's assume a median worst case: you're posting a request to a Responder queue with a reply-id set to something hopefully unique, and reply-to set to a common queue that many processes use to pick up replies. There are two challenges here: first, working out which message is yours; second, avoiding consuming those messages that are not.

For a unique reply-id you could do worse than starting with the session ID that STOMP returns when you send a CONNECT frame; clearly the message server thinks that is unique, and it should know. In the Ruby Stomp gem, you can get it with client.connection_frame().headers["session"] where client is your Stomp::Client instance; in my ABL jhstomp.p library, call get_session_id(). (This is how NebulousStomp does it.)

Now that you have a good reply ID you can tell which is yours by Rule 4; just test the neb-in-reply-to header of each message.

The second problem, of avoiding consuming messages that don't match your reply-id, is handled by careful use of STOMP. If when subscribing you set the header ack:client-individual, then you must manually acknowledge each message you want to consume with an ACK frame. (Again, how NebulousStomp does it.)
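The two steps above (spot your reply by neb-in-reply-to, and ACK only that one) can be sketched in Ruby without a live broker. take_my_reply is a hypothetical name; it assumes each message object has a headers method returning a Hash, and that the actual sending of the ACK frame is passed in as a callable. It is a sketch of the idea, not NebulousStomp's code.

```ruby
# Hypothetical helper: scan incoming messages for the one that answers
# reply_id. Only that message is acknowledged; the rest are left alone
# for whoever they belong to.
#
# messages - any Enumerable of objects responding to #headers (a Hash)
# reply_id - the neb-reply-id we put on our request
# ack      - a callable that sends the ACK frame for one message
def take_my_reply(messages, reply_id, ack)
  messages.each do |msg|
    # Rule 4: our response carries our reply id in neb-in-reply-to.
    next unless msg.headers["neb-in-reply-to"] == reply_id

    ack.call(msg)
    return msg
  end
  nil # nothing for us yet
end

# Stand-in message objects for the example; a STOMP library would
# supply its own.
fake_message = Struct.new(:headers, :body)
inbox = [
  fake_message.new({ "neb-in-reply-to" => "someone-else" }, "verb:success"),
  fake_message.new({ "neb-in-reply-to" => "session-123" }, "verb:success")
]
acked = []
mine = take_my_reply(inbox, "session-123", ->(msg) { acked << msg })
```

In a real client the ack callable would wrap whatever your STOMP library uses to send an ACK frame, and the subscription would need the ack:client-individual header for any of this to matter.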

Finally, you get to handle the response. Rule 1 says that it will either be an error verb, a success verb … or something else specific to the verb. The nature of messages in responses is really outside of The Protocol; you are free to use the verb / parameters / description format if you wish. Again, I'm assuming that a given verb will always require the same parameters and return the same message. I think that to do otherwise would be very confusing. But The Protocol can't enforce that.

Note also that while The Protocol says that a request should always result in a response, there is nothing to say that the sender of the request should care – say, in the example of a request that results in a report being emailed, which takes 20 minutes. In practice where that is the case I've been sending the success verb early, to say “yes, got your message, I see that it is valid – don't wait up”; but the party that sends the message doesn't have to check it, since it's not very helpful.