An interesting scenario that I keep bumping into is this: there is a device, typically "headless" (no significant UI), that performs some specific, useful function (an "IoT" device; note the quotes). There is a web app, with a rich UI. Your customers want them to "talk".
I smell trouble at the "talk" part.
- Data collection + display? Sure, IoT was born to do exactly this.
- Control the device, issuing commands? You have to be careful with this one, because naive solutions bring a lot of trouble.
- Getting interactive feedback? (Command - response - Web UI update) Let's talk about this, because it can be done, but it is not so straightforward.
The trouble with this scenario is that it looks so simple to non-technical people (and to less technical people alike... I once heard an IT manager ask how he could wire up his scanner to our web application, but that's another story). However, it is so easy to come up with bad or ugly solutions!
Fortunately, with a couple of new-ish technology bits and some good patterns, it is possible to come up with a good solution.
The good
Well... Yeah, if you follow The Good, the Bad and the Ugly trinity, you have to start with "The Good". But I don't want to talk about the good yet, it would spoil all the fun!
Let's come back to the good later.
The bad
<do not do this!>
Sending commands to a device is quite easy. You open a listening socket on the device. You somehow know where the device is and which address it has (which is an interesting problem on its own, but I digress), so from your web server you just connect to that socket (address/port) and send your commands. You probably don't have a firewall, but if you have one, just punch a hole through it to let the messages pass through.
</do not do this!>
Even if you go through all the effort of making it secure (using an SSH tunnel, for example; I have seen plain-text sockets with ASCII protocols, open to the Internet), you are exposing a single port, probably on a low-power device (like an embedded ARM device), possibly over a low-bandwidth channel (like GPRS). How much does it take to DoS it? You probably don't even need the first D (as in DDoS): you could do it from a single machine.
But let's say you somehow try to cover this hole, maybe with a VPN: you insert a field gateway in front of your "IoT" device(s), or put a VPN client inside the devices themselves, if they are powerful enough (and if such a client exists for your platform/architecture. Good luck with some of the ARM v4 devices with 1 MB of disk space I have seen, but I digress again).
Great, you are probably relieved because now you can have interactive feedback!
<do not do this!>
You see, it is easy. Your user clicks on the page. On the web server, inside the click handler (whatever this is: a controller, a handler, a servlet...) you open a socket, send a command through TCP, and wait for a response. The device receives the command, processes it, answers back through the socket and closes the connection. The web server receives the answer, prepares an HTTP response and returns it to the web browser. Convenient!
</do not do this!>
Now you have thread affinity all the way: the same thread of execution spans servers, programs, devices. Blocking threads is a performance bottleneck in any case, but it is a big issue on a server.
If the network is slow (and it will be), you may end up keeping the web server thread hanging for seconds. Let's forget about hanging the web browser UI (you can put up a nice animation, or use Ajax), but keeping a web server thread hung for seconds doing nothing is BAD. Like in "2 minutes, and your web server will crash for resource exhaustion" bad.
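To put rough numbers on this, here is a back-of-the-envelope sketch in Python. The pool size and the timings are invented for illustration (they are not measurements of any real server), and `max_requests_per_second` is just a helper name I made up: it only expresses that a thread pinned to one request cannot serve another one.

```python
def max_requests_per_second(worker_threads: int, seconds_per_request: float) -> float:
    """Upper bound on throughput when every request pins one thread until it finishes."""
    return worker_threads / seconds_per_request


# Hypothetical numbers, for illustration only.
POOL_SIZE = 100

# Handler that only does local work: about a quarter of a second per request.
fast = max_requests_per_second(POOL_SIZE, 0.25)

# Handler that blocks waiting for a device behind a slow link: about 5 s per request.
blocked = max_requests_per_second(POOL_SIZE, 5.0)

print(f"local work only:   {fast:.0f} req/s")
print(f"waiting on device: {blocked:.0f} req/s")
```

Same hardware, same pool, but the blocked handler caps the server at a small fraction of the traffic. Anything above that cap queues up behind threads that are doing nothing but waiting.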
The ugly
IoT is difficult, real-time web applications are difficult, so let's ditch them.
We go back one or two decades, and write "desktop" applications. The functionality provided by the former web application is exposed as web services, and we consume them from our desktop app. Which is connected directly to the device. Which is not an "Internet of Things" device anymore, but maybe an "Intranet of Things" device (I should probably register that term! :) ), if it is not connected by USB.
It makes sense in a lot of cases, if the device and the PC/tablet/whatever are co-located. But it imposes a lot of physical constraints (there must be a direct connection between the device and the PCs/tablets that control it). Also, if the app was a web app to begin with, there are probably good reasons for that: ease of deployment, developers familiar with the framework, runs on any modern web browser, ...
Especially if you discover that half your clients are using Windows PCs, the other half Linux, and a third half Android tablets. Now you need to build and maintain three different desktop applications. Which is an ugly mess.
Besides, how do you reach your "IoT" device now, if it is on a private intranet? How do you update it, or collect diagnostics and logs in a central location? You cannot, or you have to set up complicated firewall rules, policies, local update servers. Again, feasible, but ugly.
The good (finally)
The solution I came up with is to use WebSockets (or better, a multi-transport library like SignalR) + AMQP "response" queues to make it good.
AMQP is a messaging protocol. It is a rising standard, and it is implemented by many (most) queuing servers and event hubs (see my previous post). An interesting use of AMQP is to create "response queues". A hint on how this might work is given, for example, in the RabbitMQ tutorials: the last tutorial in the series describes how to code an RPC mechanism using AMQP.
The interesting part of the tutorial is in the client code:
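var consumer = new QueueingBasicConsumer(channel);
// Server-named queue: private to this connection, removed when it is no longer used
var replyQueueName = channel.QueueDeclare(exclusive: true, autoDelete: true).QueueName;
// Do "Something special" here with replyQueueName
// Block until the next message arrives on the reply queue
var ea = (BasicDeliverEventArgs)consumer.Queue.Dequeue();
if (ea.BasicProperties.CorrelationId == corrId)
{
    // This message answers our request: handle it here
}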
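The client declares a nameless queue (the name is auto-generated by the server). Plus, the queue should be deleted on disconnection. There are two flags for that: exclusive (the queue can be used only by the connection that declared it, and is deleted when that connection closes) and autoDelete (the queue is deleted when its last consumer unsubscribes).
Then it does something with the queue name, and then continuously waits for and reads messages from this queue.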
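The check on the correlation ID is the part that is easy to overlook, so here is a minimal sketch of it in Python, with no broker involved. A plain `queue.Queue` stands in for the reply queue, and `Message` and `wait_for_reply` are hypothetical names of mine, not part of any RabbitMQ client.

```python
import queue
import uuid
from dataclasses import dataclass


@dataclass
class Message:
    correlation_id: str
    body: str


def wait_for_reply(reply_queue: queue.Queue, correlation_id: str, timeout: float = 1.0) -> str:
    """Read from the reply queue until the message matching our request shows up."""
    while True:
        message = reply_queue.get(timeout=timeout)  # raises queue.Empty on timeout
        if message.correlation_id == correlation_id:
            return message.body
        # Anything else is a stale or unrelated reply: ignore it and keep waiting.


# In-memory stand-in for the exclusive, auto-delete reply queue.
reply_queue = queue.Queue()
corr_id = str(uuid.uuid4())

# Simulate the broker delivering a stale reply first, then the one we want.
reply_queue.put(Message(correlation_id="some-older-request", body="stale"))
reply_queue.put(Message(correlation_id=corr_id, body="pong"))

result = wait_for_reply(reply_queue, corr_id)
print(result)
```

The timeout matters: without it, a reply that never arrives would block the reader forever.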
The "something special" is: communicate the device's availability to the server, and specify the name of the queue. This queue will be like an inbox for our device: a place on a server where whoever wants to communicate with us will place a message. The device (or better, a thread/task on the device) will wait for any incoming message, read it, and then dispatch it. The device will act accordingly.
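The inbox idea can be sketched in a few lines of Python. Everything here is a stand-in: `InMemoryBroker` is a toy replacement for a real AMQP broker, the `registry` dictionary plays the part of the server that receives the announcement, and the command names (including the `shutdown` command used to end the demo loop) are invented.

```python
import queue
import uuid


class InMemoryBroker:
    """Toy stand-in for an AMQP broker: named queues living in one process."""

    def __init__(self):
        self._queues = {}

    def declare_queue(self):
        name = "inbox-" + uuid.uuid4().hex  # broker-generated name
        self._queues[name] = queue.Queue()
        return name

    def publish(self, queue_name, message):
        self._queues[queue_name].put(message)

    def next_message(self, queue_name, timeout=1.0):
        return self._queues[queue_name].get(timeout=timeout)


class Device:
    def __init__(self, device_id, broker, registry):
        self.broker = broker
        self.handled = []
        # The device declares its own inbox ...
        self.inbox = broker.declare_queue()
        # ... and announces it to the server: this is the "something special".
        registry[device_id] = self.inbox

    def dispatch(self, message):
        # A real device would act on the command; here we only record it.
        self.handled.append(message["command"])

    def run(self):
        while True:
            message = self.broker.next_message(self.inbox)
            if message["command"] == "shutdown":  # hypothetical command, ends the demo
                break
            self.dispatch(message)


broker = InMemoryBroker()
registry = {}
device = Device("scanner-1", broker, registry)

# Whoever wants to talk to the device looks up its inbox and drops messages there.
inbox = registry["scanner-1"]
broker.publish(inbox, {"command": "start_scan"})
broker.publish(inbox, {"command": "eject"})
broker.publish(inbox, {"command": "shutdown"})

device.run()
print(device.handled)
```

Note that nobody ever connects to the device: it only reads from a queue that it declared itself.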
It is important to note that the client is establishing the connection to a server (the queueing system server), not the other way around. This prevents the problem highlighted in the "Bad" section.
WebSockets (and related transport mechanisms, like Server-Sent Events, long polling, etc.) allow code on the server side to push content to the connected clients as it happens, in real time.
So the device communicates with the web server in some standard way (a direct HTTP POST to the server or, even better, posting to a queue and having a worker read from the queue and POST to the web server, so you get queue-based load levelling), and the server pushes the update to the connected browser.
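Queue-based load levelling is easier to see with a toy example. In this Python sketch a `queue.Queue` stands in for the broker queue, and `push_to_browsers` is a hypothetical placeholder for the POST to the web server plus the WebSocket/SignalR push; the batch sizes are arbitrary.

```python
import queue

updates = queue.Queue()  # stand-in for the broker queue the device posts to
browser_log = []         # stand-in for what the connected browsers receive


def push_to_browsers(update):
    """Hypothetical push step; a real server would send this over WebSocket/SignalR."""
    browser_log.append(update)


def drain(max_items):
    """Worker: forward at most max_items updates, then return how many were handled."""
    handled = 0
    while handled < max_items:
        try:
            update = updates.get_nowait()
        except queue.Empty:
            break
        push_to_browsers(update)
        handled += 1
    return handled


# The device bursts five readings at once ...
for n in range(5):
    updates.put({"device": "scanner-1", "reading": n})

# ... but the worker forwards them at a pace the web server can sustain.
first_batch = drain(max_items=2)
second_batch = drain(max_items=10)
print(first_batch, second_batch, len(browser_log))
```

The burst is absorbed by the queue, not by the web server, and nothing is lost or reordered along the way.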
Note that the server knows which devices are "on" and can be controlled by the client (because the first thing a device does is announce its availability to the server, and where it can be contacted). It also knows which client is talking to which device, because all traffic passes through the server.
Put the two together, and you have a working system for real-time command and control of "IoT"-like devices, with real-time feedback and response, from a standard web application.
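The bookkeeping on the server can be as small as two dictionaries. The Python sketch below is an assumption about how one might organize it, not a description of a specific framework: `CommandRouter`, its method names, and the device and session identifiers are all invented, and the two lists stand in for the broker publishes and the WebSocket pushes.

```python
class CommandRouter:
    """Server-side bookkeeping: who is online, and who is talking to whom."""

    def __init__(self):
        self.inbox_of = {}   # device id -> inbox queue name the device announced
        self.device_of = {}  # browser session id -> device id it is controlling
        self.published = []  # (queue name, message) pairs, stand-in for broker publishes
        self.pushed = []     # (session id, payload) pairs, stand-in for WebSocket pushes

    def device_announced(self, device_id, inbox_queue):
        self.inbox_of[device_id] = inbox_queue

    def online_devices(self):
        return sorted(self.inbox_of)

    def attach(self, session_id, device_id):
        if device_id not in self.inbox_of:
            raise LookupError(f"device {device_id!r} is not online")
        self.device_of[session_id] = device_id

    def command_from_browser(self, session_id, command):
        device_id = self.device_of[session_id]
        self.published.append((self.inbox_of[device_id], {"command": command}))

    def response_from_device(self, device_id, payload):
        # Push only to the sessions that are attached to this device.
        for session_id, attached in self.device_of.items():
            if attached == device_id:
                self.pushed.append((session_id, payload))


router = CommandRouter()
router.device_announced("scanner-1", "inbox-abc")
router.device_announced("scanner-2", "inbox-def")
router.attach("session-A", "scanner-1")
router.attach("session-B", "scanner-2")

router.command_from_browser("session-A", "start_scan")
router.response_from_device("scanner-1", {"status": "scanning"})
print(router.published, router.pushed)
```

The command ends up in the inbox of the right device, and the response reaches only the browser session that is controlling it.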