Friday, May 03, 2013

The Benefits of a Reverse Proxy

A typical ASP.NET public website hosted on IIS is usually configured in such a way that the server that IIS is installed on is visible to the public internet. HTTP requests from a browser or web service client are routed directly to IIS which also hosts the ASP.NET worker process.  All the functionality needed to produce the web site is embodied in a single server. This includes caching, SSL termination, authentication, serving static files and compression. This approach is simple and straightforward for small sites, but is hard to scale, both in terms of performance, and in terms of managing the complexity of a large complex application. This is especially true if you have a distributed service oriented architecture with multiple HTTP endpoints that appear and disappear frequently.

A reverse proxy is server component that sits between the internet and your web servers. It accepts HTTP requests, provides various services, and forwards the requests to one or many servers.

font-side-proxy

Having a point at which you can inspect, transform and route HTTP requests before they reach your web servers provides a whole host of benefits. Here are some:

Load Balancing

This is the reverse proxy function that people are most familiar with. Here the proxy routes incoming HTTP requests to a number of identical web servers. This can work on a simple round-robin basis, or if you have statefull web servers (it’s better not to) there are session-aware load balancers available. It’s such a common function that load balancing reverse proxies are usually just referred to as ‘load balancers’. There are specialized load balancing products available, but many general purpose reverse proxies also provide load balancing functionality.

Security

A reverse proxy can hide the topology and characteristics of your back-end servers by removing the need for direct internet access to them. You can place your reverse proxy in an internet facing DMZ, but hide your web servers inside a non-public subnet.

Authentication

You can use your reverse proxy to provide a single point of authentication for all HTTP requests.

SSL Termination

Here the reverse proxy handles incoming HTTPS connections, decrypting the requests and passing unencrypted requests on to the web servers. This has several benefits:

  • Removes the need to install certificates on many back end web servers.
  • Provides a single point of configuration and management for SSL/TLS
  • Takes the processing load of encrypting/decrypting HTTPS traffic away from web servers.
  • Makes testing and intercepting HTTP requests to individual web servers easier.

Serving Static Content

Not strictly speaking ‘reverse proxying’ as such. Some reverse proxy servers can also act as web servers serving static content. The average web page can often consist of megabytes of static content such as images, CSS files and JavaScript files. By serving these separately you can take considerable load from back end web servers, leaving them free to render dynamic content.

Caching

The reverse proxy can also act as a cache. You can either have a dumb cache that simply expires after a set period, or better still a cache that respects Cache-Control and Expires headers. This can considerably reduce the load on the back-end servers.

Compression

In order to reduce the bandwidth needed for individual requests, the reverse proxy can decompress incoming requests and compress outgoing ones. This reduces the load on the back-end servers that would otherwise have to do the compression, and makes debugging requests to, and responses from, the back-end servers easier.

Centralised Logging and Auditing

Because all HTTP requests are routed through the reverse proxy, it makes an excellent point for logging and auditing.

URL Rewriting

Sometimes the URL scheme that a legacy application presents is not ideal for discovery or search engine optimisation. A reverse proxy can rewrite URLs before passing them on to your back-end servers. For example, a legacy ASP.NET application might have a URL for a product that looks like this:

http://www.myexampleshop.com/products.aspx?productid=1234

You can use a reverse proxy to present a search engine optimised URL instead:

http://www.myexampleshop.com/products/1234/lunar-module

Aggregating Multiple Websites Into the Same URL Space

In a distributed architecture it’s desirable to have different pieces of functionality served by isolated components. A reverse proxy can route different branches of a single URL address space to different internal web servers.

For example, say I’ve got three internal web servers:

http://products.internal.net/
http://orders.internal.net/
http://stock-control.internal.net/

I can route these from a single external domain using my reverse proxy:

http://www.example.com/products/    -> http://products.internal.net/
http://www.example.com/orders/ -> http://orders.internal.net/
http://www.example.com/stock/ -> http://stock-control.internal.net/

To an external customer it appears that they are simply navigating a single website, but internally the organisation is maintaining three entirely separate sites. This approach can work extremely well for web service APIs where the reverse proxy provides a consistent single public facade to an internal distributed component oriented architecture.

So …

So, a reverse proxy can off load much of the infrastructure concerns of a high-volume distributed web application.

We’re currently looking at Nginx for this role. Expect some practical Nginx related posts about how to do some of this stuff in the very near future.

Happy proxying!

Monday, April 22, 2013

Stop Your Console App The Nice Way

When you write a console application, do you simply put

Console.WriteLine("Hit <enter> to end");
Console.ReadLine();

at the end of Main and block on Console.ReadLine()?

It’s much nicer to to make your console application work with Windows command line conventions and exit when the user types Control-C.

The Console class has a static event CancelKeyPress that fires when Ctrl-C is pressed. You can create an AutoResetEvent that blocks until you call Set in the CancelKeyPress handler.

It’s also nice to stop the application from being arbitrarily killed at the point where Ctrl-C is pressed by setting the EventArgs.Cancel property to true. This gives you a chance to complete what you are doing and exit the application cleanly.

Here’s an example. My little console application kicks off a worker thread and then blocks waiting for Ctrl-C as described above. When Ctrl-C is pressed I send a signal to the worker thread telling it to finish what it’s doing and exit.

using System;
using System.Threading;

namespace Mike.Spikes.ConsoleShutdown
{
class Program
{
private static bool cancel = false;

static void Main(string[] args)
{
Console.WriteLine("Application has started. Ctrl-C to end");

// do some cool stuff here
var myThread = new Thread(Worker);
myThread.Start();

var autoResetEvent = new AutoResetEvent(false);
Console.CancelKeyPress += (sender, eventArgs) =>
{
// cancel the cancellation to allow the program to shutdown cleanly
eventArgs.Cancel = true;
autoResetEvent.Set();
};

// main blocks here waiting for ctrl-C
autoResetEvent.WaitOne();
cancel = true;
Console.WriteLine("Now shutting down");
}

private static void Worker()
{
while (!cancel)
{
Console.WriteLine("Worker is working");
Thread.Sleep(1000);
}
Console.WriteLine("Worker thread ending");
}
}
}

Much nicer I think.

Thursday, April 18, 2013

Serializing Lua Coroutines With Pluto

In my last post I showed how it’s possible to use Lua as a distributed workflow scripting language because of its build in support for coroutines. But in order to create viable long-running workflows we have to have some way of saving the script’s state at the point that it pauses; we need a way to serialize the coroutine.

Enter Pluto:

Pluto is a heavy duty persistence library for Lua. Pluto is a library which allows users to write arbitrarily large portions of the "Lua universe" into a flat file, and later read them back into the same or a different Lua universe.

I downloaded the Win32 build of Tamed Pluto from here and placed it in a directory alongside my Pluto.dll. You have to tell Lua where to find C libraries by setting the Lua package.cpath:

using (var lua = new Lua())
{
lua["package.cpath"] = @"D:\Source\Mike.DistributedLua\Lua\?.dll";

...
}

Lua can now find the Pluto library and we can import it into our script with:

require('pluto')

Here’s a script which defines a function foo which calls a function bar. Foo is used to create a coroutine. Inside bar the coroutine yields. The script uses Pluto to serialize the yielded coroutine and saves it to a file.

-- import pluto, print out the version number 
-- and set non-human binary serialization scheme.
require('pluto')
print('pluto version '..pluto.version())
pluto.human(false)

-- perms are items to be substituted at serialization
perms = { [coroutine.yield] = 1 }

-- the functions that we want to execute as a coroutine
function foo()
local someMessage = 'And hello from a long dead variable!'
local i = 4
bar(someMessage)
print(i)
end

function bar(msg)
print('entered bar')
-- bar runs to here then yields
coroutine.yield()
print(msg)
end

-- create and start the coroutine
co = coroutine.create(foo)
coroutine.resume(co)

-- the coroutine has now stopped at yield. so we can
-- persist its state with pluto
buf = pluto.persist(perms, co)

-- save the serialized state to a file
outfile = io.open(persistCRPath, 'wb')
outfile:write(buf)
outfile:close()

This next script loads the serialized coroutine from disk, deserializes it, and runs it from the point that it yielded:

-- import pluto, print out the version number 
-- and set non-human binary serialization scheme.
require('pluto')
print('pluto version '..pluto.version())
pluto.human(false)

-- perms are items to be substituted at serialization
-- (reverse the key/value pair that you used to serialize)
perms = { [1] = coroutine.yield }

-- get the serialized coroutine from disk
infile, err = io.open(persistCRPath, 'rb')
if infile == nil then
error('While opening: ' .. (err or 'no error'))
end

buf, err = infile:read('*a')
if buf == nil then
error('While reading: ' .. (err or 'no error'))
end

infile:close()

-- deserialize it
co = pluto.unpersist(perms, buf)

-- and run it
coroutine.resume(co)

When we run the scripts, the first prints out ‘entered bar’ and then yields:

LUA> pluto version Tamed Pluto 1.0
LUA> entered bar

The second script loads the paused ‘foo’ and ‘bar’ and continues from the yield:

LUA> pluto version Tamed Pluto 1.0
LUA> And hello from a long dead variable!
LUA> 4

Having the ability to run a script to the point at which it starts a long running call, serialize its state and store it somewhere, then pick up that serialized state and resume it at another place and time is a very powerful capability. It means we can write simple procedural scripts for our workflows, but have them execute over a distributed service oriented architecture.

The complete code for this experiment is on GitHub here.

Thursday, April 04, 2013

Lua as a Distributed Workflow Scripting Language

I’ve been spending a lot of time recently thinking about ways of orchestrating long running workflows in a service oriented architecture. I was talking this over at last Tuesday’s Brighton ALT NET when Jay Kannan, who’s a game developer amongst other things, mentioned that Lua is a popular choice for scripting game platforms. Maybe I should check it out. So I did. And it turns out to be very interesting.

If you haven’t heard of Lua, it’s a “powerful, fast, lightweight, embeddable scripting language” originally conceived by a team at the Catholic University of Rio de Janeiro in Brazil. It’s the leading scripting language for game platforms and also pops up in other interesting locations including Photoshop and Wikipedia. It’s got a straightforward C API that makes it relatively simple to p-invoke from .NET, and indeed there’s a LuaInterface library that provides a managed API.

I got the source code from the Google code svn repository and built it in VS 2012, but there are NuGet packages available as well.

It turned out to be very simple to use Lua to script a distributed workflow. Lua has first class coroutines which means that you can pause and continue a Lua script at will. The LuaInterface library allows you inject C# functions and call them as Lua functions, so it’s simply a case of calling an asynchronous C# ‘begin’ function, suspending the script by yielding the coroutine, waiting for the asynchronous function to return, setting the return value, and starting up the script again.

Let me show you how.

First here’s a little Lua script:

a = 5
b = 6

print('doing remote add ...')

r1 = remoteAdd(a, b)

print('doing remote multiply ...')

r2 = remoteMultiply(r1, 4)

print('doing remote divide ...')

r3 = remoteDivide(r2, 2)

print(r3)

The three functions ‘remoteAdd’, ‘remoteMultiply’ and ‘remoteDivide’ are all asynchronous. Behind the scenes a message is sent via RabbitMQ to a remote OperationServer where the calculation is carried out and a message is returned.

The script runs in my LuaRuntime class. This creates and sets up the Lua environment that the script runs in:

public class LuaRuntime : IDisposable
{
private readonly Lua lua = new Lua();
private readonly Functions functions = new Functions();

public LuaRuntime()
{
lua.RegisterFunction("print", functions, typeof(Functions).GetMethod("Print"));
lua.RegisterFunction("startOperation", this, GetType().GetMethod("StartOperation"));

lua.DoString(
@"
function remoteAdd(a, b) return remoteOperation(a, b, '+'); end
function remoteMultiply(a, b) return remoteOperation(a, b, '*'); end
function remoteDivide(a, b) return remoteOperation(a, b, '/'); end

function remoteOperation(a, b, op)
startOperation(a, b, op)
local cor = coroutine.running()
coroutine.yield(cor)

return LUA_RUNTIME_OPERATION_RESULT
end
"
);
}

public void StartOperation(int a, int b, string operation)
{
functions.RunOperation(a, b, operation, result =>
{
lua["LUA_RUNTIME_OPERATION_RESULT"] = result;
lua.DoString("coroutine.resume(co)");
});
}

public void Execute(string script)
{
const string coroutineWrapper =
@"co = coroutine.create(function()
{0}
end)"
;
lua.DoString(string.Format(coroutineWrapper, script));
lua.DoString("coroutine.resume(co)");
}

public void Dispose()
{
lua.Dispose();
functions.Dispose();
}
}

When this class is instantiated it creates a new LuaInterface environment (the Lua class) and a new instance of a Functions class that I’ll explain below.

The constructor is where most of the interesting setup happens. First we register two C# functions that we want to call from inside Lua: ‘print’ which simply prints from the console, and ‘startOperation’ which starts an asynchronous math operation.

Next we define our three functions: ‘remoteAdd’, ‘remoteMultiply’ and ‘remoteDivide’ which all in turn invoke a common function ‘remoteOperation’. RemoteOperation calls the registered C# function ‘startOperation’ then yields the currently running coroutine. Effectively the script will stop here until it’s started again. After it starts, the result of the asynchronous operation is accessed from the  LUA_RUNTIME_OPERATION_RESULT variable and returned to the caller.

The C# function StartOperation calls RunOperation on our Functions class which has an asynchronous callback. In the callback we set the result value in the Lua environment and execute ‘coroutine.resume’ which restarts the Lua script at the point where it yielded.

The Execute function actually runs the script. First it embeds it in a ‘coroutine.create’ call so that the entire script is created as a coroutine, then it simply starts the coroutine by calling ‘coroutine.resume’.

The Functions class is just a wrapper around a function that maintains an EasyNetQ connection to RabbitMQ and makes an EasyNetQ request to a remote server somewhere else on the network.

public class Functions : IDisposable
{
private readonly IBus bus;

public Functions()
{
bus = RabbitHutch.CreateBus("host=localhost");
}

public void Dispose()
{
bus.Dispose();
}

public void RunOperation(int a, int b, string operation, Action<int> resultCallback)
{
using (var channel = bus.OpenPublishChannel())
{
var request = new OperationRequest()
{
A = a,
B = b,
Operation = operation
};
channel.Request<OperationRequest, OperationResponse>(request, response =>
{
Console.WriteLine("Got response {0}", response.Result);
resultCallback(response.Result);
});
}
}

public void Print(string msg)
{
Console.WriteLine("LUA> {0}", msg);
}
}
 
Here’s a sample run of the script:
 
DEBUG: Trying to connect
DEBUG: OnConnected event fired
INFO: Connected to RabbitMQ. Broker: 'localhost', Port: 5672, VHost: '/'
LUA> doing remote add ...
DEBUG: Declared Consumer. queue='easynetq.response.143441ff-3635-4d5d-8e42-6b379b3f8356', prefetchcount=50
DEBUG: Published to exchange: 'easy_net_q_rpc', routing key: 'Mike_DistributedLua_Messages_OperationRequest:Mike_DistributedLua_Messages', correlationId: '50560dd9-2be1-49a1-96f6-9c62641080ae'
DEBUG: Recieved
RoutingKey: 'easynetq.response.143441ff-3635-4d5d-8e42-6b379b3f8356'
CorrelationId: '50560dd9-2be1-49a1-96f6-9c62641080ae'
ConsumerTag: '101343d9-9497-4893-88e6-b89cc1de29a4'
Got response 11
LUA> doing remote multiply ...
DEBUG: Declared Consumer. queue='easynetq.response.f571f6d7-b963-4a88-bf62-f05785009e39', prefetchcount=50
DEBUG: Published to exchange: 'easy_net_q_rpc', routing key: 'Mike_DistributedLua_Messages_OperationRequest:Mike_DistributedLua_Messages', correlationId: '0ea7e1c3-6f12-4cb9-a861-2f5de8f2600d'
DEBUG: Model Shutdown for queue: 'easynetq.response.143441ff-3635-4d5d-8e42-6b379b3f8356'
DEBUG: Recieved
RoutingKey: 'easynetq.response.f571f6d7-b963-4a88-bf62-f05785009e39'
CorrelationId: '0ea7e1c3-6f12-4cb9-a861-2f5de8f2600d'
ConsumerTag: '2c35f24e-7745-4475-885a-d214a1446a70'
Got response 44
LUA> doing remote divide ...
DEBUG: Declared Consumer. queue='easynetq.response.060f7882-685c-4b00-a930-aa4f20f7c057', prefetchcount=50
DEBUG: Published to exchange: 'easy_net_q_rpc', routing key: 'Mike_DistributedLua_Messages_OperationRequest:Mike_DistributedLua_Messages', correlationId: 'ea9a90cc-cd7d-4f05-b171-c6849026ac4a'
DEBUG: Model Shutdown for queue: 'easynetq.response.f571f6d7-b963-4a88-bf62-f05785009e39'
DEBUG: Recieved
RoutingKey: 'easynetq.response.060f7882-685c-4b00-a930-aa4f20f7c057'
CorrelationId: 'ea9a90cc-cd7d-4f05-b171-c6849026ac4a'
ConsumerTag: '90e6b024-c5c4-440a-abdf-cb9a000c131c'
Got response 22
LUA> 22
DEBUG: Model Shutdown for queue: 'easynetq.response.060f7882-685c-4b00-a930-aa4f20f7c057'
Completed
DEBUG: Connection disposed

You can see the Lua print statements interleaved with EasyNetQ DEBUG statements showing the messages being published and consumed.
 
So there you go, a distributed workflow scripting engine in under 100 lines of code. All I’ve got to do now is serialize the Lua environment at each yield and then restart it again from its serialized state. This is possible according to a bit of googling yesterday afternoon. Watch this space.
 
You can find the code for all this on GitHub here:
 

Tuesday, March 05, 2013

Coders, Musicians and Cooks

In a light hearted Twitter exchange yesterday I asked why so many coders also played guitar. Mark Seemann suggested that there was also a high correlation with cooking.

code_play_cook_tweet

How about a survey? Mark and I have over 4000 twitter followers between us so I was sure we could get a few of them to click a few radio buttons, with no promise of reward, but with the warm feeling that they were helping to progress the anthropological understanding of the coding tribe. So I created a survey and waited for the results to stream in. Of course, as Rob Pickering pointed out, this kind of self selecting survey is very unscientific, but fun nonetheless.

What have we learnt?

  1. It is ridiculously easy to create an online survey using Google Drive. Just click ‘new form’, type in a few questions, and Bob’s yer uncle. It automatically builds a nice summary graphic on the fly as the survey progresses (see below).
  2. Most of Mark and I’s followers are C# coders. 81%. That’s not really surprising.
  3. More coders cook than play musical instruments: 85% verses 53%. That’s not really surprising either. I expect if you took a survey of the general population you’d find quite a high percentage who cook verses those who play.
  4. Guitar is by far the most popular instrument among coders. 30% play guitar. This confirms my hunch, but doesn’t answer my question – why it’s such a high percentage.

Here is the Google Drive graphic of the results.

code_play_cook_survey

Wednesday, February 27, 2013

Fractured Product Syndrome

How do you sell your software product? Do you fork the source code of your previous customer, modify it a little and deploy it as its own instance with its own database? Maybe each customer even has their own server? Their own set of servers?

You are suffering from the Fractured Product Syndrome.

Let me tell you a story.

Colin and Alex have a little web development consultancy, called Coaltech (get it?) they are pretty handy with ASP.NET and SQL Server and already had a respectable list of satisfied clients when a local car hire company, Easy Wheels, asked them if they could build them a car booking system. It would keep a list of Easy Wheels’ cars and allow customers to view them and book them online.

Colin and Alex signed a contract with Easy Wheels and got busy. Their “just do it” attitude to getting features done meant that the code wasn’t particularly beautiful, but it did the job and after a couple of months work the system went live. Easy Wheels were very happy with their new system, so happy in fact that they started to tell other independent car rental companies about it. A few months later Colin and Alex got a call from Hire My Ride. They loved what Coaltech had done for Easy Wheels and wanted a similar system. Of course it would have to be redesigned to fit in with Hire My Ride’s branding, but the basic functionality was pretty much identical. Colin and Alex did the obvious thing, they took a copy of the Easy Wheels code and tweaked it for Hire My Ride. Of course they charged the same bespoke software price for Hire My Ride’s system.

Soon a third and fourth car hire company had asked Coaltech for their ‘system’. Each time Colin and Alex took a copy of the last customer’s code – it made sense to take the most recent code because they inevitably added a few refinements with each customer – and then they altered it to match the new customer’s requirements. Before long they had arranged a deal with another web agency to take over all their non-car-hire work and decided to concentrate full time on the ‘system’. It needed a name too, they couldn’t carry on calling it ‘Easy Wheels’ and they certainly couldn’t market it like that. Soon it became ‘Coaltech Cars’ with a new marketing website. They also found that they couldn’t keep up with demand with just the two of them doing customer implementations. Each new customer took around six weeks of development work with the inevitable to-and-fro of slightly different requirements and design changes. To help meet demand they started to hire developers. First Jez, then Dave, then Neville. They all modified the software in slightly different ways, but it didn’t seem to matter at the time.

Fast forward five years. Coaltech now has 50 employees and around 50 customers. It’s fair to say that they are the leading vendor of independent car rental management systems. The majority of their staff are developers, although they also have a small sales, HR and customer relationship management team. You would have thought that Colin and Alex would be happy to have turned their little web development company into such a success, but instead life just seemed to be one long headache. Although the company had grown, it always seemed hard to turn a profit. As it was customers often balked at the price of the system. The same bug would turn up again and again for different customers. There never seemed to be enough time to fix them all. Even though they might have delivered a new feature for one client, it always seemed to take just as long to implement it for another; sometimes longer depending on what code they’d been forked from. The small team that looked after the servers were in constant fire fighting mode and got very upset when anyone suggested ‘upgrading’ any of the clients -  it always meant bugs, downtime and screaming customers. And then the government changed the Value Added Tax rate. Colin had to cancel his holiday and they lost two of their best developers after several weeks of late nights and no weekends while they updated and redeployed 50 separate web applications.

The end for Coaltech cars came slowly and painfully. Alex had the first hint of it when he was made aware of a little company called RentBoy. They had a little software-as-a-service product for small car hire companies. To use it you entered your credit card number, a logo, a few other details, and you were good to go. They weren’t any immediate competition for Coaltech Cars, having only a small subset of the features, but they soon captured the low end of the market, the one or two man band car companies that had never been able to afford Coaltech’s sign up or licensing fees.

But then the features started coming thick and fast and soon Coaltech found they were loosing out to RentBoy when bidding for clients. Colin found an article on the High Scalability blog about RentBoy’s architecture. They had a single scalable application that served all their customers, one place to fix bugs, one point of scalability and maintenance. They practiced continuous deployment allowing them to quickly build and release new features for all their customers. The company had four employees and already more customers than Coaltech. They charged about a tenth of Coaltech’s fees.

Coaltech’s new client revenues soon dried up. They’d always made a certain amount of money from sign-up fees. Too late they realised that they had to start shedding staff, always a painful thing for a small closely knit company. The debts mounted until the bank would no longer support them, and before long they had to declare bankruptcy. Luckily Colin and Alex managed to get jobs as project managers in enterprise development teams.

The moral of the story? Try to avoid Fractured Product Syndrome if you possibly can. Although simply forking the source code for each new customer appears by far the easiest thing to do at the start, it simply doesn’t scale. Start thinking about how to build multi-tenanted software-as-a-service long before you get to where Coaltech got to. Learn to say ‘no’ to customers if you have to. It’s far better to have a high number of low value customers than smaller numbers of higher value ones on a differentiated platform. It’s much easier for a low-value volume software provider to move into the high-value space than for a high-value provider to move down.

Learn to recognise Fractured Product Syndrome and address it before it gets serious!

Thursday, February 21, 2013

EasyNetQ on .NET Rocks!

dotnetrocks

Last week I had the pleasure of being interviewed by Carl Franklin and Richard Campbell for a .NET Rocks episode on RabbitMQ and EasyNetQ. It was terrific fun and a real honour to be invited on the show. I’ve been listening to .NET Rocks since it started in 2002 so you can imagine how excited I was. Carl and Richard are seasoned pros when it comes to podcasting, and the awesome ninja editing skills they posses turned my rather hesitant and rambling answers into something that almost sounded coherent.

You can listen to the show on the link below:

http://www.dotnetrocks.com/default.aspx?ShowNum=848

Now Richard, about that Tesla …

Wednesday, February 06, 2013

EasyNetQ in Africa

Anthony Moloney got in touch with me recently. He’s using EasyNetQ, my simple .NET API for RabbitMQ, with his team in Kenya. Here’s an email he sent me:

Hi Mike,

Further to the brief twitter exchange today about using EasyNetQ on our Kenyan project. We started using EasyNetQ back in early November and I kept meaning to drop you a line to thank you for all your good work.

Virtual City are based in Nairobi and supply mobile solutions to the supply chain and agribusiness industry in Africa. African solutions for African problems. I got involved with them about 2 years ago to help them improve the quality of their products and I have been working on and off with them since then. Its been a bit of a journey we are getting there.

We have a number of client applications including android and wpf working in an online/offline mode over mobile networks. We need to process large amounts of incoming commands from these applications. These commands are also routed via the server to other client apps.

The application had originally used MVC and SQL server to synchronously process and store the commands but we were running into severe performance problems. We looked at various MQ solutions and decided to use RabbitMQ, WebApi & Mongo to improve processing throughput. While researching a .Net API for RabbitMQ I noticed that you had created the EasyNetQ API.

EasyNetQ greatly simplifies interacting with RabbitMQ and providing your needs are not too complicated you really don't need to know too much about the guts of RabbitMQ. We replaced the existing server setup in about a week. The use of RabbitMQ has greatly increased the scalability of the product and allows us to either scale up or scale out.

We are also using the EasyNetQ management API for monitoring queue activity on our customer services dashboard.

Kind Regards

Anthony Moloney

One of the great rewards of running an open source project is hearing about the fascinating ways that it’s used around the world. I really like that it’s an ‘African solution for African problems’ and built by a Kenyan development team. It’s also interesting that they’ve used OSS projects like RabbitMQ and Mongo alongside .NET. It reminds me of the Stack Overflow architecture, a .NET core surrounded by OSS infrastructure.

Friday, February 01, 2013

How to Write Scalable Services

SDO_arch

I’ve spent the last five years implementing and thinking about service oriented architectures. One of the core benefits of a service oriented approach is the promise of greatly enhanced scalability and redundancy. But to realise these benefits we have to write our services to be ‘scalable’. What does this mean?

There are two fundamental ways we can scale software: 'Vertically' or 'horizontally'.

  • Vertical Scaling addresses the scalability of a single instance of the service. A simple way to scale most software is simply to run it on a more powerful machine; one with a faster processor or more memory. We can also look for performance improvements in the way we write the code itself. An excellent example of company using this approach is LMAX. However, there are many drawbacks to the vertical scaling approach. Firstly the costs are rarely linear; ever more powerful hardware tends to be exponentially more expensive and the costs (and constraints) of building sophisticated performance optimised software are also considerable. Indeed premature performance optimisation often leads to overly complex software that's hard to reason about and therefore more prone to defects and high maintenance costs. Most importantly, vertical scaling does not address redundantcy; vertically scaling an application just turns a small single point of failure into a large single point of failure.

  • Horizontal Scaling. Here we run multiple instances of the application rather than focussing on the performance of a single instance. This has the advantage of being linearly scalable; rather than buying a bigger, more expensive box, we just buy more copies of the same cheap box. With the right architectural design, this approach can scale massively. Indeed it's the approach taken by almost all of largest internet scale companies: Facebook, Google, Twitter etc.. Horizontal Scaling also introduces redundancy; the loss of a single node need not impact the system as a whole. For these reasons, horizontal scaling is the preferred approach to building scalable, redundant systems.

So, the fundamental approach to building scalable systems is to compose them of horizontally scaled services. In order to do this we need to follow a few basic principles:

  • Stateless. Any services that stores state across an interaction with another service is hard to scale. For example, a web service that stores in-memory session state between requests requires a sophisticated session-aware load balancer. A stateless service, by contrast, only requires simple round-robin load balancing. For a web application (or service) you should avoid using session state or any static or application level variables.

  • Coarse Grained API. To be stateless, a service should expose an API that exposes operations as a single interaction. A chatty API, where one sets up some data, asks for some transition, and then reads off some results, implies statefulness by its design. The service would need to identify a session and then maintain information about that session between successive calls. Instead a single call, or message, to the service should encapsulate all the information that the service requires to complete the operation.

  • Idempotent. Much scalable infrastructure is a trade-off between competing constraints. Delivery guarantees are one of these. For various reasons it's is far simpler to guarantee 'at least once' delivery than 'exactly once'. If you can make your software tolerant of multiple deliveries of the same message it will be easier to scale.

  • Embrace Failure. Arrays of services are redundant if the system as a whole can survive the loss of a single node. You should design your services and infrastructure to expect and survive failure. Consider implementing a Chaos Monkey that randomly kills processes. If you start by expecting your services to fail, you'll be prepared when they inevitably do.

  • Avoid instance specific configuration. A scalable service should be designed in such a way that it doesn't need to know about other instances of itself, or have to identify itself as a specific instance. I shouldn't need to have to configure one instance any differently than another. This would include communication mechanisms that require messages to be addressed to a specific instance of the service, or some non-convention based way that the service was required to identify itself. Instead we should rely on infrastructure (load-balancers, pub-sub messaging etc.) to manage the communication between arrays of services.

  • Simple automated deployment. Have a service that can scale is no advantage if we can't deploy it when we are close to capacity. A scalable system must have automated processes to deploy new instances of services as the need arises.

  • Monitoring. We need to know when services are close to capacity so that we can add additional service instances. Monitoring is usually an infrastructure concern; we should be monitoring CPU, network, and memory usage and have alerts in place to warn us when these pass certain trigger points. Sometimes it's worth introducing application specific alerts when some internal trigger is reached, such as the number of items in an in-memory queue, for example.

  • KISS - Keep It Small and Simple. This is good advice for any software project, but is especially pertinent to building scalable resilient systems. Large monolithic codebases are hard to reason about, hard to monitor, and hard to scale. Building your system out of many small pieces makes it easy to address those pieces independently. Design your system so that each service has only one purpose and is decoupled from the operations of other services. Have your services communicate using non-proprietary open standards to avoid vendor lock-in and allow for a heterogeneous platform. JSON over HTTP, for example, is an excellent choice for intra-service communication. Every platform has HTTP and JSON libraries and there is abundant off-the-shelf infrastructure (proxies, load-balancers, caches) that can be used to help your system scale.

This post just gives a few pointers to building scalable systems, for far more detailed examples and case studies I can't recommend the High Scalability Blog enough. The Dodgy Coder blog has a very nice summary of some of the High Scalability case studies here.

Thursday, January 31, 2013

Visual Studio: Multiple Startup Projects

I’ve been using Visual Studio for over 10 years, but still keep on learning new tricks. That’s because I learn things very slowly dear reader! Today’s ‘wow, I didn’t know you could do that moment’, was finding out that it’s possible to launch multiple startup projects when one hits F5 – or Control-F5 in this case.

My scenario is that I’m writing a little spike for a distributed application. Each of my services is implemented as a console app. During development, I want to run integration tests where the services all talk to each other, so before running the tests I want to run all the services. Now I’m used to right-clicking on a project and choosing ‘Set as startup project’, but you can’t select multiple projects this way and it’s very tedious to launch multiple projects by going to each one in turn, setting it as the startup project, and then hitting ctrl-F5. What I didn’t know is that you can right click on the Solution node, select ‘Set Startup Projects’ and you get this dialogue:

Select_startup_project

You can then select multiple startup projects and choose any number of them to ‘Start without debugging’.

Now I can hit ctrl-F5 and all my little services start up. Wonderful.