(Reading time: approx. 6 minutes)
What exactly is scalability?
Scalability and performance are overloaded terms that are often used incorrectly.
Scalability is a measure of how much work a system can handle at once without malfunctioning. And by system, I mean both the software and the team coding it.
Performance is purely a measure of speed. A high-performance system does its work really, really fast. And although scalability is related to performance, they are not the same thing.
So when your CEO asks you to build a scalable system, where do you start? Start by setting goals.
Before choosing technology, hiring a team or writing a single-line of code, set goals for both your system and your team that are measurable and meaningful to the success of your organization.
For your system, set a number for maximum concurrent users as your goal. This will change at different stages of your business so revisit this very frequently. Don’t bother stating response-time as a goal because every commercial web application’s response-time should be the same; 1 second or less.
Response-time is the point at which the browser can begin rendering the requested web page. It is a handy performance metric to monitor because it is down-stream from all the others (including page-load speed) and is therefore a good indicator of overall system health. It also has the most direct impact on customer experience. Response times greater than 1 second makes a system feel sluggish.
Its ok to keep track of other metrics but only because they help you maintain or improve your response-time. No one cares about the IOPS of your NAS or the clock speed and number of CPU cores you are running if the end result is slow response-times. Your customers’ perception of your system’s quality is most affected by how fast it responds to their requests.
A discussion of measurement tools is outside the scope of this post but here is a good reference to a list, organized by type.
Good architecture scales, not ‘good’ programming languages
For a long time I thought that there were good programming languages and bad programming languages. I used to think that the good ones could scale while the bad ones could not.
Programming languages don’t guarantee good or bad performance. Programmers do.
Good code has little to do with the language it is written in and everything to do with the programmer writing it.
Don’t get me wrong. Choosing the right language is important. The programming language dictates the type and number of developers you will need on your team. Language also dictates the tools and community support you will have for developing, building, testing and debugging. Different languages require different coding standards to avoid their respective pitfalls. Your build and deployment process is also determined by language choice. This is a big deal. Depending on the programming language, the code-test-debug cycles will vary greatly in time and complexity as will your ability to hot-deploy new versions of your application to production.
So yes, programming language choice is important but not really for performance and scalability reasons.
Good architecture scales, not programming languages.
Good architecture is simple
Figure 1 depicts a generic web application architecture. It is simple, maintainable and most importantly, it will scale. I recommend using it as a starting point for designing yours.
Simplicity is the most important design principle for your deployment architecture and application as well. Choose the simplest design that gets the job done.
Minimize the use of design patterns and frameworks. Design patterns add unnecessary code, cost and risk because patterns are by definition generic and seldom solve real-world problems without adaptation. If you find yourself writing a lot of extra code to follow a pattern, stop and ask yourself if it would be simpler to code your own solution from scratch.
Frameworks are tempting to use because they speed development significantly, usually with high-quality. However, the dangers of using frameworks is the ‘auto-magic’ that happens when the application is running. For example, the Spring Framework executes its own code at run time to interact with the application code. In the case of Spring, dozens of objects are created to run operations before and after application code executes. Frameworks that make heavy use of AOP and ORMs are especially dangerous because in addition to unexpected ‘auto-magic’ operations, scalability issues are more common. Imagine trying to debug a production issue at 2am when you have no idea what the framework is doing behind the scenes to alter your code’s behaviour?
If you decide to use frameworks, do so sparingly and understand everything it is doing behind the scenes.
Sessions and caching are 2 other features that should be used very judiciously because they add complexity and bottlenecks.
Sessions bind objects in memory to a specific user. Caching binds objects in memory to a specific server node. In a clustered server environment this poses 2 problems. First, each server node must use some of its resources to host these objects and maps. Second, to provide fail-over redundancy, these objects and maps must be replicated across all server nodes. This is a major scalability issue as network bandwidth and CPU capacity are quickly eaten up by replication operations.
If you absolutely must have sessions, configure your load balancer to use sticky sessions. You are trading scalability for session-level fail-over redundancy. So instead of your entire cluster slowing down and failing due to replication load, only the users attached to the single failing node will be affected. Trust me, its a good trade.
Limiting the use of patterns, frameworks, sessions and caching has benefits beyond system scalability. The simpler a system, the easier it is to maintain and extend. A simple system allows your team to be scalable. By this I mean, that an increase in time or personnel results in an increase in output (more new features or fixed defects).
The opposite is also true. An overly complicated system does not allow the team working with it to scale. At some point, adding extra personnel or time makes little or no difference to the output.
Good architecture is elegantly simple. Good architecture allows your system and your team to scale.
How to get there on a budget? Apply the Pareto Principle.
The Pareto principle or 80/20 rule as it relates to system architecture, is a generalism that states, 80% of a system’s use in production spans only 20% of a systems feature set. It is not meant to be a precise measure of the ratio it’s just meant to reflect the disparity between what the system can do and what it is actually used for.
I’ll admit, this is based purely on my observations over the years working on enterprise systems but I stand behind it. Think about the software you use daily, be it an email client, word-processor or image editing tool. I think you would agree that most of the time you only use a small subset of the software’s features.
In a perfect world this would not have an impact on system design and implementation. However, in real-life, companies do not have limitless funds to buy more hardware or hire massive teams of developers to optimize every feature. In real-life, you need to understand the performance characteristics of the most important execution paths in your system and put focus on them.
In the case of Hubba, our most important user stories can be grouped into 2 categories. The first category constitutes less than 10% of server load and involves complicated business logic for our retail and brand customers who use Hubba to curate their product data and run analytics on their customers’ engagement. The second category constitutes over 90% of our server load and comes from mobile devices requesting mobile web pages that display views of that very same curated product data. Both use cases are extremely important but have very different performance characteristics.
Looking at Figure 2 you can see that Hubba’s deployment architecture reflects this.
At hubba.com brands, retailers and other data owners curate their content and run analytics on extremely secure, stateful connections and interact with dynamic screens delivered to desktop browsers over highly reliable network connections. Google scale traffic is not ever expected in this part of the system, whereas a highly secure and interactive experience is. For this reason, Apache Tomcat clusters are employed in this part of the system.
Apache Tomcat is a process-based web-server combined with a JSP-Servlet container. It is slower than high-performance web servers at delivering web content but provides superior security and a server-side application runtime environment. How good is Apache Tomcat as an application server? Over half the Fortune 500 companies worldwide use it.
At hubb.ly on the other hand, customers use their mobile devices to access content that brands and retailers have curated at hubba.com. This part of the system needs to handle Google scale traffic but with the added challenge of a network connection that is slower, less secure and less reliable. For this reason, almost all of Hubba’s mobile content is stored as static files that are optimized for mobile devices and served from an Nginx cluster.
Nginx is an asynchronous, high-performance web-server that is efficient at serving static content. It is several orders of magnitude faster at serving static content than Apache. And most importantly, it scales better than Apache.
Note how steady Nginx response-time and memory consumption remains under heavy load compared to Apache (see Figure 3 and Figure 4).
Even though both hubba.com and hubb.ly servers share the same data sources, hubb.ly servers do not make calls to the database and only perform write operations to log files. Changes to web content, curated on the hubba.com side triggers a set of background processes that updates the static files hosted on Nginx so that data and web content are synchronized in near real-time.
So by clearly understanding the performance characteristics of the most important execution paths, we designed the application and deployment architecture to economically provision server resources to 2 very different categories of use cases.
The real goal
Designing and building a scalable system that performs well is difficult. And although its rewarding to see our system perform well under load, fast response-times is not the real goal for me.
The real goal is to help people. Help them learn. Help them play. Help them connect. Help them do their jobs better.
What’s your real goal for making your system scale? Keep that goal in sight at all times and you will create a better product.