
In the paper “Above the Clouds: A Berkeley View of Cloud Computing,” cloud computing is characterized by three aspects: the illusion, for the cloud user, of infinite computing resources available on demand; the elimination of up-front cost commitments by users; and the ability to pay for computing resources on a short-term basis.

Cloud evangelist Ben Kepes, in his article “Want an Irrefutable Example of the Value of Cloud? Here Goes,” gives an example that demonstrates the potential behind the “illusion of infinite computing”:

One of CycleComputing’s science clients was running a massive scale run against a cancer problem – something to do with simulating the effects of different compounds on a protein associated with cancer. The run was estimated to take 341700 hours (39 years). CycleComputing built a utility supercomputer with some 10600 cloud instances, each of which were an individual multi-core machine – apparently this is the largest cloud HPC (High Performance Computing) environment ever built. If it had been built physically it would have required 12000sq feet of data center space and cost $44M. Instead, over a two hour build time, and a nine hour run time, the total cost of the job run was $4362. 39 compute years, spun up in only a couple of hours, and completely run in half a day. Compelling story huh?

As this story demonstrates, the cloud’s ability to scale holds real promise for tackling some of humanity’s toughest problems.
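The figures in the quoted story can be sanity-checked with a little arithmetic. The short sketch below uses only the numbers from the quote; the variable names are ours, and the cost ratio compares a one-off job with a capital purchase, so it is indicative rather than a like-for-like comparison.

```python
# Figures taken from the quoted CycleComputing story.
ESTIMATED_HOURS = 341_700        # estimated compute time for the run
PHYSICAL_COST_USD = 44_000_000   # quoted cost of building it physically
CLOUD_COST_USD = 4_362           # quoted cost of the cloud job run

HOURS_PER_YEAR = 24 * 365

# Convert the estimated run time to years of continuous computing.
compute_years = ESTIMATED_HOURS / HOURS_PER_YEAR

# How many times more the physical build would have cost than the cloud run.
cost_ratio = PHYSICAL_COST_USD / CLOUD_COST_USD

print(f"{compute_years:.1f} compute years")
print(f"physical build is roughly {cost_ratio:,.0f}x the cost of the cloud run")
```

The first line confirms the "39 years" in the quote, and the second shows a cost gap of roughly four orders of magnitude.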

The feature most responsible for this illusion of infinite computing is the cloud’s ability to scale elastically. In this post we take a deeper look at how the cloud scales.

There are two types of scaling:

  • horizontal and
  • vertical

Horizontal scaling works by linking together identical virtual machines (VMs) so that they appear to the user as one bigger VM. It is referred to as “scaling out” when a VM is added and “scaling in” when a VM is released.


In the picture above, a customer starts with one VM. When that customer’s demand jumps significantly (1), two new VMs are cloned (2) and linked with the first one (3) to create the illusion of a single VM that is three times as powerful as the original (4).

Horizontal scaling is done on the fly without any interruption of service to the customers and is entirely transparent to them.
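To make the idea concrete, here is a minimal sketch of a pool of identical VMs presented as one logical machine. It is a toy model, not any cloud provider's API: the `VM` and `HorizontalPool` classes and their methods are hypothetical names introduced for illustration, and "capacity" is reduced to a CPU count.

```python
from dataclasses import dataclass, field, replace
from itertools import count

_ids = count(1)  # simple source of unique VM ids (illustrative only)


@dataclass(frozen=True)
class VM:
    vm_id: int
    cpus: int


@dataclass
class HorizontalPool:
    """Hypothetical pool of identical VMs that looks like one bigger VM."""
    template: VM
    members: list = field(default_factory=list)

    def __post_init__(self):
        # The pool starts with the original VM only.
        self.members.append(self.template)

    def scale_out(self) -> VM:
        """Clone the template VM and link the clone into the pool."""
        clone = replace(self.template, vm_id=next(_ids))
        self.members.append(clone)
        return clone

    def scale_in(self) -> VM:
        """Release the most recently added VM."""
        if len(self.members) == 1:
            raise ValueError("cannot release the last VM")
        return self.members.pop()

    @property
    def total_cpus(self) -> int:
        # What the user perceives: the combined capacity of all members.
        return sum(vm.cpus for vm in self.members)


pool = HorizontalPool(VM(vm_id=next(_ids), cpus=1))
pool.scale_out()
pool.scale_out()
print(len(pool.members), pool.total_cpus)  # 3 3
pool.scale_in()
print(len(pool.members), pool.total_cpus)  # 2 2
```

Because clones are added to and removed from the pool while the existing members keep running, nothing in this model has to stop serving during a scaling event.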

Unlike horizontal scaling, vertical scaling replaces an existing VM with a larger or smaller one. It is referred to as scaling up or scaling down.


In the picture above, a customer starts with one VM that has one CPU. When demand grows (1), a new VM with two CPUs is created and booted up (2). The customer then switches to the new, scaled-up VM (3).

Vertical scaling involves an interruption of service to the customer because it requires reconfiguring and rebooting the VM. For this reason, horizontal scaling is the more common form of scaling in cloud environments.
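The numbered steps for vertical scaling can be sketched as follows. This is an illustrative model only: `VM` and `scale_vertically` are hypothetical names, and the handover is reduced to flipping a `running` flag, so the actual downtime during the switch is not simulated.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    cpus: int
    running: bool = False

    def boot(self) -> None:
        self.running = True

    def shut_down(self) -> None:
        self.running = False


def scale_vertically(current: VM, new_cpus: int) -> VM:
    """Replace `current` with a VM of a different size and return the new VM."""
    # (2) Create and boot a VM with the requested number of CPUs.
    replacement = VM(name=f"{current.name}-{new_cpus}cpu", cpus=new_cpus)
    replacement.boot()
    # (3) Switch over: the old VM is retired. In a real system this handover
    # is where the customer experiences the service interruption.
    current.shut_down()
    return replacement


small = VM("web", cpus=1)
small.boot()
big = scale_vertically(small, new_cpus=2)
print(big.name, big.cpus, big.running, small.running)  # web-2cpu 2 True False
```

The same function covers scaling down: passing a smaller `new_cpus` replaces the VM with a smaller one.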

In horizontal scaling, the two software components that facilitate scaling are an Automated Scaling Listener and a Load Balancer. Suppose a user has a website with one VM behind an Automated Scaling Listener and a Load Balancer (see picture below).


If the load suddenly grows, the Automated Scaling Listener detects it (1). It then issues API calls to create a second and a third VM, both cloned from the original one (2) (3).


The new VMs are added to the Load Balancer (4). Now there are three VMs behind the load balancer, and together they can handle the increased demand (5). Later, after the load has stayed lower for some period of time, the Automated Scaling Listener issues a delete-VM API call, which removes a VM from the load balancer, powers it down, and deletes it.

Dynamic horizontal scaling of this kind can scale out to dozens or even hundreds of VMs.
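The interplay between the Automated Scaling Listener and the Load Balancer can be simulated in a few lines. The sketch below is a toy under stated assumptions, not a real cloud API: the class and method names are hypothetical, VMs are represented by name strings, each VM is assumed to handle 100 requests per second, and "some period of time" is modeled as three consecutive low-load observations.

```python
import math


class LoadBalancer:
    """Round-robin balancer over a list of VM names (illustrative)."""

    def __init__(self, first_vm: str):
        self.vms = [first_vm]
        self._cursor = 0

    def route(self) -> str:
        """Return the VM that should serve the next request."""
        vm = self.vms[self._cursor % len(self.vms)]
        self._cursor += 1
        return vm


class AutomatedScalingListener:
    """Watches the load and adds or removes VMs behind the balancer."""

    def __init__(self, balancer: LoadBalancer, capacity_per_vm=100, cooldown=3):
        self.balancer = balancer
        self.capacity_per_vm = capacity_per_vm  # requests/s per VM (assumed)
        self.cooldown = cooldown  # low-load observations before scaling in
        self._low_streak = 0
        self._created = 1  # the balancer starts with one VM

    def observe(self, requests_per_second: float) -> int:
        """Process one load sample and return the resulting number of VMs."""
        needed = max(1, math.ceil(requests_per_second / self.capacity_per_vm))
        current = len(self.balancer.vms)
        if needed > current:
            # Scale out immediately: stand-in for the "create VM" API calls.
            for _ in range(needed - current):
                self._created += 1
                self.balancer.vms.append(f"vm-{self._created}")
            self._low_streak = 0
        elif needed < current:
            # Scale in one VM only after the load has stayed low for a while.
            self._low_streak += 1
            if self._low_streak >= self.cooldown:
                self.balancer.vms.pop()  # stand-in for the "delete VM" call
                self._low_streak = 0
        else:
            self._low_streak = 0
        return len(self.balancer.vms)


balancer = LoadBalancer("vm-1")
listener = AutomatedScalingListener(balancer)
sizes = [listener.observe(load) for load in [80, 290, 290, 90, 90, 90, 90]]
print(sizes)         # [1, 3, 3, 3, 3, 2, 2]
print(balancer.vms)  # ['vm-1', 'vm-2']
```

Scaling out is immediate while scaling in is delayed and gradual. That asymmetry is a design choice in this sketch, made to mirror the description above, where VMs are deleted only after the load has decreased for some period of time.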