Let's say our current application is hosted on 2 2XL servers (8 CPU, 32 GB memory) to handle a peak capacity of 100k users. In the traditional world, since there is no way to increase server size on the fly, the server capacity would include built-in headroom of around 30-40%. That means you are paying for that extra headroom on top of paying for unused capacity during non-peak hours (~20k users).
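To make the waste concrete, here is a rough back-of-the-envelope calculation using the numbers above (the 35% headroom is an assumed midpoint of the 30-40% range, and we treat capacity as proportional to user count):

```python
# Back-of-the-envelope: how much of the provisioned capacity is actually
# used during non-peak hours? (35% headroom is an assumed midpoint.)
PEAK_USERS = 100_000
NON_PEAK_USERS = 20_000
HEADROOM = 0.35

provisioned = PEAK_USERS * (1 + HEADROOM)        # capacity you pay for 24x7
off_peak_utilization = NON_PEAK_USERS / provisioned

print(f"Provisioned for: {provisioned:,.0f} users")
print(f"Off-peak utilization: {off_peak_utilization:.0%}")  # roughly 15%
```

In other words, for a large part of the day you are paying for capacity of which only about 15% is in use.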
To use the auto-scaling feature, the application needs to be prepared for it.
Identify the non-peak traffic and the capacity needed to handle it, then split that capacity across two smaller instances to ensure availability: one in one AZ/Region and the other in another. To manage the peak capacity, you might then have 8 L servers (2 CPU, 8 GB memory) handling it, instead of only 2 2XL servers (8 CPU, 32 GB memory).
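A quick sanity check, using the instance sizes from the text, that the smaller fleet matches the aggregate capacity of the original one while letting you scale in much finer units:

```python
# Instance sizes from the text (CPU count, memory in GB).
L = {"cpu": 2, "mem_gb": 8}
XXL = {"cpu": 8, "mem_gb": 32}

# 8 L instances at peak vs 2 2XL instances: same totals either way.
assert 8 * L["cpu"] == 2 * XXL["cpu"]        # 16 vCPUs
assert 8 * L["mem_gb"] == 2 * XXL["mem_gb"]  # 64 GB

# Off-peak you can run just 2 L instances (one per AZ), i.e. a quarter
# of the peak fleet, which was impossible with only two big servers.
off_peak_fraction = 2 / 8
print(f"Off-peak fleet: {off_peak_fraction:.0%} of peak")
```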
Move Rocks into Cloud
In the extreme case where the application's hard-coded nodes cannot be changed (let's say A, B, C, D), you can still have only A and B running during non-peak hours and set up auto-scaling in the following way:
- Launch all the nodes (A, B, C, D) on appropriate EC2 instances (E1, E2, E3, E4)
- Take an independent image of each node (Ai, Bi, Ci, Di)
- Store the images in S3
- Terminate the E3 and E4 instances that host nodes C and D
- Configure auto-scaling to bring up additional EC2 instances when needed
- Configure the launch order so that Ci is launched first, then Di
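The launch-ordering step above can be sketched as a small helper. This is a hypothetical illustration, not an AWS API: `next_image_to_launch` is an invented name, and the image naming (Ai, Bi, Ci, Di) follows the text.

```python
# Hypothetical helper: given the set of currently running nodes, pick the
# image for the next node to launch, so Ci always comes up before Di.
LAUNCH_ORDER = ["A", "B", "C", "D"]  # fixed node order from the text


def next_image_to_launch(running: set) -> "str | None":
    """Return the image name for the first not-yet-running node, or None."""
    for node in LAUNCH_ORDER:
        if node not in running:
            return f"{node}i"  # image naming from the text: Ai, Bi, Ci, Di
    return None  # all four nodes are already up


print(next_image_to_launch({"A", "B"}))       # -> Ci (C's image first)
print(next_image_to_launch({"A", "B", "C"}))  # -> Di (then D's)
```

In a real setup this ordering would live in the scaling policy or launch configuration rather than in application code, but the logic is the same: scale-out always restores nodes in a fixed sequence.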
Of course, this takes care of only half of the picture (ramping down when capacity is not needed) and does not address an unplanned peak (ramping up beyond the 4 nodes). Moving rocks into the cloud comes with its own limitations, so it is always better to make the application cloud-ready from the ground up.
Further reading on auto-scaling, in case you have not looked at it already: