When the application exceeds the capacity of a single server, we can scale up (upgrade the server) or scale out (add more servers). In the case of microservices, scaling out horizontally to two or more machines makes more sense, since we get improved availability as a bonus. And, once we have a distributed setup, we can always scale up by upgrading the servers.
Figure: Two machines sharing the load of the microservices.
The load balancer itself is still a single point of failure. To avoid this, multiple load balancers can run in parallel.
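To make the idea concrete, here is a minimal round-robin balancer sketched with Python’s standard library. The backend addresses and ports are placeholders, and a real deployment would use a dedicated balancer such as NGINX, HAProxy, or a cloud load balancer instead.

```python
# A minimal round-robin balancer sketch using only Python's standard library.
# The backend addresses below are hypothetical placeholders; errors from a
# backend are simplified to a 502 response for brevity.
import itertools
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.error import URLError
from urllib.request import urlopen

BACKENDS = ["http://10.0.0.11:8000", "http://10.0.0.12:8000"]  # placeholder servers
next_backend = itertools.cycle(BACKENDS)  # rotate through the backends


class RoundRobinProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        backend = next(next_backend)  # pick the next server in the rotation
        try:
            with urlopen(backend + self.path, timeout=5) as upstream:
                body = upstream.read()
                status = upstream.status
            self.send_response(status)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except URLError:
            self.send_error(502, "Backend unavailable")


if __name__ == "__main__":
    # Each balancer process is itself a single point of failure, which is why
    # two of these would normally run in parallel behind a shared address.
    ThreadingHTTPServer(("0.0.0.0", 8080), RoundRobinProxy).serve_forever()
```

Running two copies of a balancer like this behind a shared virtual IP or DNS record is, in essence, what “multiple balancers in parallel” means in practice.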
Horizontal scaling is not without its problems, however. Going past one machine introduces a few critical issues that make troubleshooting much more complex, and the typical problems that come with the microservice architecture begin to emerge:
How do we correlate log files distributed among many servers?
How do we collect sensible metrics?
How do we handle upgrades and downtime?
How do we handle spikes and drops in traffic?
These are all problems inherent to distributed computing, and you will experience them (and have to deal with them) as soon as more than one machine is involved.
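Take the first question as an example. A common approach is to attach a correlation ID to every request and include it in every log line, so that entries written on different servers can be matched up later. The sketch below uses Python’s standard logging module; the header name X-Request-ID and the service name are illustrative assumptions, not part of any particular framework.

```python
# Minimal sketch: tag every log line with a correlation ID so that entries
# written on different servers can be matched up later. The header name
# "X-Request-ID" and the service name "billing" are illustrative assumptions.
import logging
import uuid

logging.basicConfig(
    level=logging.INFO,
    # The correlation_id field is supplied via the `extra` dict passed below.
    format="%(asctime)s %(levelname)s service=billing "
           "correlation_id=%(correlation_id)s %(message)s",
)
log = logging.getLogger("billing")


def handle_request(headers: dict) -> None:
    # Reuse the ID from an upstream service if present, otherwise create one.
    correlation_id = headers.get("X-Request-ID", str(uuid.uuid4()))
    extra = {"correlation_id": correlation_id}
    log.info("request received", extra=extra)
    # ... do the work, forwarding the same X-Request-ID header downstream ...
    log.info("request completed", extra=extra)


if __name__ == "__main__":
    handle_request({})                            # new request, fresh ID
    handle_request({"X-Request-ID": "abc-123"})   # ID propagated from upstream
```

Shipping these lines to a central location then lets you filter by correlation ID and reconstruct a request’s path across machines.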
This option is excellent if you have a few spare machines and want to improve your application’s availability. As long as you keep things simple, with services that are more or less uniform (same language, similar frameworks), you will be fine. Once you pass a certain complexity threshold, you’ll need containers to provide more flexibility.