# Monday, 29 August 2016

Cloud, at last!

After some time experimenting, studying, designing (but mostly: presenting possible scenarios to management), we are preparing to move a central part of our systems to the cloud!
Cost savings (especially OPEX - especially linked to sourcing: finding and hiring a good DBA is very hard!), increased availability and resistance to HW failures/catastrophes are the key points I presented to management to help them decide.

On the downside, to be ready to move will require a good engineering effort; our systems are very old, but the general architecture built during the years is sound. It was good (surprising and pleasant) to discover how we already used  many of the patterns listed in the Azure Cloud Design Patterns Architecture Guidance in our systems.



The legacy components of the system have been extensively extended during the years, and the new parts and paths developed since I joined the company in 2012 always followed a classic pattern which you may recognize from several IoT designs:
  •   Field devices -> Queue (Inbox/Outbox)
  •   Queue -> Processing -> SQL
  •   Commands -> Queue (Inbox) <- Device
  
More precisely:
  • Field devices communicate to a "central" server, which just collects the data, buffers them on a durable (temporary) store. Little or no processing here (basics validation only)
  • On different machines, "consume" the items in the temporary store: pull things from there, persist each event in an "append-only" data store (Event Sourcing)
  • Process the events: generate domain objects through a series of steps (3), from the append-only store events to the final objects persisted in SQL tables (Pipes and Filters)
  • Generate "synthesized" data for reporting and statistics queries (Materialized View)
The back-end is already decomposed in several "medium" services: not really "micro" services, but several HTTP-based services talking through a REST API.
These services are already quite robust: they have to, since they are already exposed to the Internet. In particular, they implement Cache-aside for performance, Circuit Breaker/Retry with exp. backoff when they talk to external services (and, in most cases, even when they talk internally to each other), sharding for big data, throttling for some of the public-facing APIs.

Technically, the challenge is so interesting. The architecture is really apt to be ported to the cloud, but to make it really competitive (and to minimize running costs), some pieces will have to be rewritten.
To make the transition as smooth as possible, initially most of the pieces will be less than optimal (mostly IaaS - VMs, SQL storage where NoSQL/Cloud storage would suffice, Compute instances, ..) but will be slowly rewritten to be more efficient, more "cloudy" (App Fabric, Tables, Functions, ...).

Really excited to have begun this journey!