Another article in our series addressing real questions asked by the CTOs of tech companies from around the world. This time, a question from Fabian Sipp from Morphean:
How to build scalable web products?
There is a moment of glory and pride when your startup is front page news. It’s a great feeling and you just have to celebrate. No surprise that Stanley’s Corp employees planned to do just that! As they raised 10 million dollars in funding and pushed ahead with their marketing, their efforts paid off quickly with newspaper coverage.
The CEO and CTO scheduled a company-wide meeting on Wednesday to share the great news and celebrate… or so everyone thought. When the time came, the meeting was quick and painful. The CEO announced that the traffic coming from marketing was killing their service and that, unfortunately, they had lost two key clients responsible for 10% of the company’s revenue. Then he passed the microphone to the CTO, who explained the root cause. It turned out that the application was hard to scale and the massive growth in traffic was causing horrible delays for users. It took half a year to fix that.
Sounds terrible, right? This is just an imaginary example but this scenario happens in real life way too often. How do you prevent it? Read below.
What is a scalable web product?
Scalability is the measure that determines whether your product is ready for rapid growth or not. In other words, if your fanbase were to increase by a million people in one day, a truly scalable web product could cope, or at worst run just slightly slower than usual. Non-scalable products will experience delays and the user experience inevitably suffers.
How to scale a web product?
That’s the question we are constantly hearing from our clients. In this article we will focus on the basic high-level knowledge around the subject. In general, you have two choices: vertical scaling and horizontal scaling.
Vertical scaling means extending the capabilities of an existing machine, for instance by increasing its memory, CPU power or data storage.
Horizontal scaling means growing by increasing the number of machines that are responsible for handling operations.
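To make the horizontal option more concrete, here is a minimal sketch of a round-robin balancer that spreads requests over a pool of identical machines. The class name and the server names (`app-1` and so on) are made up for illustration; in practice this job is usually done by a dedicated load balancer rather than by your own code.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Spreads requests evenly over a pool of identical machines."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self.servers = list(servers)
        self._next = cycle(self.servers)

    def route(self, request):
        # Each call hands the request to the next machine in the pool.
        return next(self._next), request


# Horizontal scaling: capacity grows by adding machines to the pool.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.route(f"request-{i}")[0] for i in range(6)]
print(assignments)
```

Six requests end up evenly split, two per machine. Vertical scaling would instead keep a single entry in the pool and make that one machine stronger.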
Is scaling important in the first place?
First of all, scaling does not cut costs. It increases the cost of running an application. A scalable application uses more resources: more hardware overall, more distributed hardware (possibly around the globe), and the power required to run that extra hardware. Even if you do not actually use these resources, you may still be paying for your employees’ valuable time to maintain capabilities you do not need at the moment. To avoid that situation you need to:
Predict when you expect growth and then be ready with a scalable application.
Master other solutions that help you increase the performance of your application. Consider optimizing your code with better algorithms that are capable of handling high data volumes seamlessly, for example by reducing big-O complexity from O(n^2) to O(n log n) (if possible).
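As an illustration of that kind of optimization, the sketch below checks a list for duplicates in two ways. The function names and the sample data are invented for this example; the point is only the difference in how the work grows with the size of the input.

```python
def has_duplicates_quadratic(items):
    """Compare every pair: roughly n^2 / 2 comparisons."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_sorted(items):
    """Sort first (O(n log n)), then compare neighbours only (O(n))."""
    ordered = sorted(items)
    return any(a == b for a, b in zip(ordered, ordered[1:]))


user_ids = [42, 7, 19, 7, 3]
print(has_duplicates_quadratic(user_ids), has_duplicates_sorted(user_ids))
```

Both functions give the same answer, but with a million items the first one needs on the order of a trillion comparisons in the worst case, while the second needs on the order of tens of millions of steps.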
1. Continuous Integration and Continuous Delivery
Organize your development flow to enable both Continuous Integration and Continuous Delivery. This keeps your application ready for quick fixes and allows automatic bootstrapping. There are a few tools you should consider using:
Version control system: A must-have. This is how developers communicate. Nowadays, the industry standard is Git rather than the previously popular Subversion (SVN).
Docker: Enables developers to compose reusable containers that work identically on their own computers and on servers in the cloud. Docker has received a lot of attention lately because containers are more lightweight than virtual machines and start faster.
Automatic task runners (e.g. Travis CI, TeamCity or Jenkins): These are responsible for running prepared automatic tasks such as automatic tests. Most of the time, developers use them to automatically verify that a new change works and builds correctly before it reaches the production environment. (Usually, code moves through at least three different environments: development, test and production.)
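To show what such an automatic task might look like, here is a small sketch of a function with the tests a task runner could execute on every commit. The `apply_discount` function is hypothetical and stands in for any piece of your product; the tests use Python's standard `unittest` module.

```python
import unittest


def apply_discount(price_cents: int, percent: int) -> int:
    """Return the price after a percentage discount, in whole cents."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents * (100 - percent) // 100


class ApplyDiscountTest(unittest.TestCase):
    def test_regular_discount(self):
        self.assertEqual(apply_discount(1000, 25), 750)

    def test_no_discount(self):
        self.assertEqual(apply_discount(1000, 0), 1000)

    def test_invalid_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(1000, 150)
```

A runner would typically execute something like `python -m unittest` after every push and block the deployment if any test fails.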
2. Microservices
Microservices are a design pattern that lets you divide your application into smaller parts that depend on each other via contracts.
Each microservice can be built by a different team and deployed in isolation, as long as it does not violate the contract.
Microservices have become popular for yet another reason: small parts can be scaled horizontally much more easily. You can simply add more instances of a microservice to handle the job, as defined by the contract.
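The idea of a contract can be sketched in a few lines. In the example below, `PricingService`, `SimplePricing` and `checkout_total` are hypothetical names, and the contract is expressed as a Python `Protocol`; in a real system the contract would more likely be an HTTP or messaging API between separately deployed services.

```python
from typing import Protocol


class PricingService(Protocol):
    """The contract: any implementation must offer this method."""

    def price_in_cents(self, product_id: str, quantity: int) -> int: ...


class SimplePricing:
    """One team's implementation; it can be replaced or redeployed freely."""

    def __init__(self, unit_prices: dict[str, int]) -> None:
        self._unit_prices = unit_prices

    def price_in_cents(self, product_id: str, quantity: int) -> int:
        return self._unit_prices[product_id] * quantity


def checkout_total(pricing: PricingService, basket: dict[str, int]) -> int:
    # The checkout code depends only on the contract, not on the class behind it.
    return sum(pricing.price_in_cents(pid, qty) for pid, qty in basket.items())


# Several identical instances can sit behind the same contract.
instances = [SimplePricing({"book": 1500, "pen": 200}) for _ in range(3)]
basket = {"book": 2, "pen": 5}
totals = [checkout_total(instance, basket) for instance in instances]
print(totals)
```

Because every instance honours the same contract, the caller gets the same answer no matter which instance handles the request, which is what makes adding more instances safe.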
3. Go into the cloud – don’t reinvent the wheel
Once you are ready to deploy and bootstrap your application automatically, consider using cloud solutions. You don’t need to invest in your own servers; just rent them from vendors who pay for the maintenance and provide 99.9% availability. Vendors to consider are:
Amazon Web Services
Google Cloud Platform
4. Internationalization
If you plan to expand, your code should most likely support multiple languages. However, the main problem is accessibility around the globe, which means quick access to your product in every key country where it is launched. Consider using Content Delivery Networks with servers located close to your key users.
5. Database scalability
Depending on the type of data you store and the type of actions your users will be performing, you will need to take database scalability into account.
The infrastructure behind Twitter, adapted from the pie chart on the official Twitter blog. Note that the database is a relatively small element here. The rest of the mentioned keywords are mostly cache systems. Check below to learn more.
Don’t be misled by the chart above. Your database choices are very important. Consider these two multipurpose databases, as you are likely to need them.
MongoDB is a scalable document-oriented database used by many large companies. To give you a quick glimpse, the official MongoDB webpage mentions eBay, Adobe, Forbes, Bosch, Cisco, MetLife and many more. If you expect to distribute your data across different locations, or expect your collections (MongoDB’s equivalent of tables) to grow beyond 5GB, MongoDB is likely a good choice for you.
Elasticsearch is a document-based full-text search engine. It is a scalable solution for the search functionality that nearly every product provides, either for customers or for employees. Elasticsearch is used by Facebook, Netflix, Microsoft, the Wikimedia Foundation, Uber and more. You should also consider Elasticsearch when your application logs show significant growth and you need a high-performing solution for analytics and monitoring.
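One common way a database scales horizontally is by splitting data across several servers, often called sharding. The sketch below shows the basic idea of routing a record to a server by hashing its key. The shard names and the `shard_for` helper are invented for illustration, and scalable databases normally do this routing for you, so treat it as an explanation of the principle rather than something you would write yourself.

```python
import hashlib

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2"]  # hypothetical server names


def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# The same key always lands on the same shard, so reads find what writes stored.
print(shard_for("user:1001") == shard_for("user:1001"))
```

Note that with this simple modulo scheme, changing the number of shards moves most keys to a different server, which is one reason production systems use more careful partitioning strategies.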
6. Cache
A cache is a vital pattern for any application. It basically remembers the results of computations that have already been requested. If one user computes a value, a second user looking for the same value (or the first user carrying out the same task a second time) gets the answer immediately thanks to the cache. This mechanism is implemented in many layers of your application. You may not even realise that some of your solutions do this behind the scenes.
Redis is an example of a reliable cache you should consider. For example, Twitter uses its own modified version of Redis with over 10,000 instances.
Memcached is another example of a cache system. Depending on your specific use and needs, it may also be a great choice for your applications. It works especially well as a cache layer for websites.
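The caching mechanism itself can be shown with a minimal in-process sketch using `functools.lru_cache` from Python's standard library. The `monthly_report` function is a made-up stand-in for an expensive computation or database query. Systems like Redis and Memcached apply the same idea, but as a separate service shared by many application servers.

```python
from functools import lru_cache

calls = 0  # counts how often the expensive work really runs


@lru_cache(maxsize=1024)
def monthly_report(customer_id: int) -> str:
    """Stands in for an expensive computation or database query."""
    global calls
    calls += 1
    return f"report for customer {customer_id}"


monthly_report(1)  # computed
monthly_report(1)  # served from the cache
monthly_report(2)  # computed
print(calls)
```

Three requests arrive, but the expensive work runs only twice, because the repeated request for customer 1 is answered from the cache.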
The overwhelming and costly list above may give you a headache. Don’t just invest blindly. Consult your developers and independent experts about which solutions will work best for you. If you don’t know anybody, we will be more than happy to help.