Thursday 27 June 2013

A fourth dimension of scalability

The AKF Partners' book The Art of Scalability places a justified emphasis on their concept of enabling scalability in three dimensions, as an alternative to "scaling up". Rather than buying a shinier server, one designs systems to do one or more of:
  1. Split traffic between multiple servers, probably with a load balancer in front; each server has the same code and the same data, and the work is shared round-robin or according to load.
  2. Split by function, with different servers for different classes of request. So, for example, logins go to one pool, search to another, and so on. The idea is that specialised servers can be more efficient, if only in what code or data they need to cache and in the number of relationships with other servers they must maintain. Also, upgrades to one portion of the code base can more easily be isolated from impacting the other services.
  3. Split by customer. This is another specialisation of data, but in this case to do with user data rather than function or functional data.
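As a rough sketch, the three splits can be expressed as routing decisions; the pool names, paths and the customer-hash scheme below are all hypothetical, purely for illustration:

```python
import itertools

# Hypothetical server pools, one per style of split.
CLONE_POOL = ["web1", "web2", "web3"]            # split 1: identical clones
FUNCTION_POOLS = {"/login": ["auth1", "auth2"],  # split 2: by function
                  "/search": ["search1"]}
CUSTOMER_SHARDS = ["shard-a", "shard-b"]         # split 3: by customer

_round_robin = itertools.cycle(CLONE_POOL)

def _bucket(key, n):
    # Deterministic hash so the same customer always lands on the same server.
    return sum(key.encode()) % n

def route(path, customer_id):
    if path in FUNCTION_POOLS:                   # split 2: by request class
        pool = FUNCTION_POOLS[path]
        return pool[_bucket(customer_id, len(pool))]
    if path.startswith("/account"):              # split 3: by customer data
        return CUSTOMER_SHARDS[_bucket(customer_id, len(CUSTOMER_SHARDS))]
    return next(_round_robin)                    # split 1: round-robin clones

print(route("/login", "alice"))   # one of auth1/auth2
print(route("/home", "bob"))      # web1 (first round-robin pick)
```

In practice the routing would live in a load balancer or reverse proxy rather than application code, but the decision structure is the same.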

Thinking about a web request: the first case will take any URL; the second might look at the path and file name; the third might look at the origin IP, or at a customer ID in the query part of the URL or in a cookie. In a database, a pool of slaves might handle (read) requests in the first case, different servers handle different databases in the second (so preferring data that is join-free), and different servers handle splits of tables by ranges of the key in the third. The ease of such splits varies between front-end systems, in particular web servers, and back-end systems, such as databases. However, in order to stop one fault from affecting all users it helps to align the splits somewhat across tiers - which is where the idea of swim-lanes comes in. So far these are ideas that anyone who has spent time worrying about reliable and scalable systems design is probably familiar with - although Abbott and Fisher codify them neatly.
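The key-range style of database split might look like the following sketch; the table, range boundaries and server names are made up for illustration:

```python
import bisect

# Hypothetical split of a customers table by first letter of surname.
BOUNDARIES = ["H", "Q"]          # exclusive upper bounds of each range
SERVERS = ["db1", "db2", "db3"]  # A-G -> db1, H-P -> db2, Q-Z -> db3

def shard_for(surname):
    # bisect finds which key range the surname's first letter falls into.
    return SERVERS[bisect.bisect_right(BOUNDARIES, surname[0].upper())]

print(shard_for("Abbott"))   # db1
print(shard_for("Fisher"))   # db1
print(shard_for("Smith"))    # db3
```

Range boundaries would normally be chosen (and periodically rebalanced) so each server carries a similar share of rows and traffic.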
So, can n+1 be applied to this - is there a fourth dimension?
My offering: "split by lumpiness".
Some traffic comes in bursts: can you identify and separate such traffic and route it to a dedicated server group? With cloud hosting this maps neatly to changing the size and/or specification of the servers; a similar effect might be obtained with a load-balanced split. However, bursts of traffic will often have some commonality - a group of users, a topic, or access to a suddenly popular page due to external references. See Barabási's book Bursts for more on this. If the burst can be identified, e.g. by referrer page, then caching may help service those requests. This may be an emergent split rather than a planned one - what is planned is the ability to make such a split.
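A rough sketch of identifying a burst by referrer and diverting it to a dedicated pool; the threshold, pool names and counting scheme are invented, and a real system would decay the counts over a time window rather than accumulate them forever:

```python
from collections import Counter

# Hypothetical burst detection: count requests per referrer and divert any
# referrer that exceeds a threshold to a dedicated, cache-friendly pool.
BURST_THRESHOLD = 100
NORMAL_POOL = "web-pool"
BURST_POOL = "burst-pool"    # elastically sized, or fronted by a cache

referrer_counts = Counter()

def route_by_referrer(referrer):
    referrer_counts[referrer] += 1
    if referrer_counts[referrer] > BURST_THRESHOLD:
        return BURST_POOL    # the emergent split: traffic with a common source
    return NORMAL_POOL

# The 101st hit from the same referrer tips it into the burst pool.
for _ in range(100):
    route_by_referrer("popular.example/story")
print(route_by_referrer("popular.example/story"))   # burst-pool
print(route_by_referrer("quiet.example"))           # web-pool
```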
Another lumpy split comes down to the work involved in processing a request. Is there an advantage in provisioning a mix of traffic to one server, or in having tasks of similar complexity competing with each other for resources? If the ratio between different classes of traffic varies, then provisioning for a consistent response time gets harder. If the more complex and non-time-critical tasks can be handed off, then the workload of a customer-facing server won't be affected by other tasks with a variable impact. Note that this is slightly more than simply offloading asynchronous tasks: if every request requires the same effort and causes an asynchronous task of consistent effort, then the customer experience won't vary - so adding a server to handle the load is the first sort of split above, and splitting out the tasks is the second sort (by function). If task complexity varies, so that some tasks are naturally slow or might affect the usually quick, then a split by that variation allows better resource planning and consistent response times.
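A minimal sketch of handing off variable-cost work, assuming hypothetical paths and an in-process queue standing in for a real task queue drained by a separate worker pool:

```python
import queue

# Hypothetical split by expected cost: slow or variable work is handed off
# so the customer-facing servers keep a predictable response time.
HEAVY_PATHS = {"/report", "/export", "/reindex"}
heavy_queue = queue.Queue()    # drained later by a separate worker pool

def handle(path, payload=None):
    if path in HEAVY_PATHS:
        heavy_queue.put((path, payload))   # defer the variable-cost work
        return "202 Accepted"              # respond immediately
    return "200 OK"                        # quick requests served inline

print(handle("/export", {"user": 42}))   # 202 Accepted
print(handle("/home"))                   # 200 OK
```

The point is the classification step: once slow work is recognisable up front, it can be provisioned and scheduled separately from the latency-sensitive traffic.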
