Many organizations and developers are moving to the public cloud to take advantage of infrastructure that scales quickly. The cloud has made scaling infrastructure up and down easier, more agile, and less expensive. However, as adopters building more complex applications quickly realize, this model is broken in many ways. We believe there are better ways to build and maintain your infrastructure on the cloud.
Companies spend 3-4x more to achieve industry-standard reliability. Cloud providers offer many tools for building cloud-based services, but the reliability of the software remains the developer's responsibility. To reach the industry standard of 99.9% availability for even a simple web application, developers need to spend three to four times more money: on redundant virtual machines, and on managing the complexity this design introduces, such as monitoring, logging, and upgrades. As a result, companies allocate 10-15% of their R&D workforce to stabilizing software, and they lose even more whenever their software goes down and is consequently perceived as unreliable.
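The redundancy math behind that cost can be sketched. A minimal example, assuming independent failures and an illustrative 99% per-VM availability (both hypothetical numbers, not figures from any provider):

```python
def combined_availability(single: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent
    instances is up, given each is up with probability `single`."""
    return 1 - (1 - single) ** replicas

# A single VM at 99% uptime misses the 99.9% ("three nines") target.
print(combined_availability(0.99, 1))  # 0.99

# Two redundant VMs: 1 - 0.01^2 = 0.9999 -- the target is met, but you
# now pay for double the capacity, plus the monitoring, logging, and
# upgrade machinery needed to keep both instances healthy.
print(combined_availability(0.99, 2))
```

This is why the spend multiplier lands well above 2x: the second VM doubles the compute bill, and the operational overhead around it adds the rest.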
Initially, flexible infrastructure on the cloud appears very simple. In reality, managing the size and number of virtual machines quickly creates overhead that is hard to control, given how tightly coupled the machines are with the services running on top of them. Because dynamically managing capacity is so complex, many companies plan and provision their infrastructure for peak usage. As a result, it is no secret that average CPU utilization on the cloud sits between 10% and 15%, and many companies end up paying two to four times more than they should. More than $23B was spent on IaaS in 2016, and around $10B of it was wasted, mostly on idle virtual machines. That wasted capacity unfortunately grows with overall spending.
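The "two to four times" figure follows from the utilization numbers. A rough sketch, where the ~40% "healthy" target utilization is an illustrative assumption rather than a quoted benchmark:

```python
def overpayment_factor(actual_util: float, healthy_util: float) -> float:
    """How many times more you pay than needed, assuming cost scales
    with provisioned capacity and you could rightsize to run at
    `healthy_util` (with headroom for spikes) instead of `actual_util`."""
    return healthy_util / actual_util

# At the observed 10-15% utilization, versus a hypothetical ~40%
# rightsized target, the overpayment lands in the 2-4x range:
print(round(overpayment_factor(0.15, 0.40), 1))  # ~2.7x at 15% util
print(round(overpayment_factor(0.10, 0.40), 1))  # ~4.0x at 10% util
```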
The emerging breed of applications in areas such as IoT, big data, machine learning, and bioinformatics has spiky workloads and processes unprecedented amounts of data. Reliability and high availability are critical for these applications, as the majority of their workloads are tied to mission-critical operations. More developers are now running them on the public cloud without the time or tools required to achieve their reliability targets at a reasonable price. Cloud technologies are moving fast, and the majority of developers still struggle to manage, maintain, and optimize this infrastructure. Developers' time would be better spent on innovation and on their core, differentiating software components.
What if developers had…
Serverless cloud with liberating programming models. Given the complexity of managing IaaS, function services such as AWS Lambda and higher-level platforms such as Google App Engine take a stab at a solution. However, developers are either constrained to certain programming models or highly dependent on other services offered by the cloud provider. With AWS Lambda, for example, developers cannot run their preferred open source packages or the services they already use. Ideally, developers should not have to deal with the unnecessary complexities of IaaS, be constrained to a particular programming model, or be locked into a single provider.
Smart infrastructure with its own DevOps. Smart infrastructure should be able to detect an application's state and its current and future needs, and it should be operated with the application's SLA and KPIs (key performance indicators) in mind. This is what DevOps engineers do to keep applications at their expected performance. Such tasks are repeatable, and machine intelligence could perform them more frequently and with higher precision. Not only that, smart infrastructure could apply its learnings across the board: it is much like employing a DevOps engineer with hundreds of thousands of hours of experience.
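The "DevOps in a loop" idea can be sketched as a toy reconciliation step. Everything here is hypothetical: the KPI (p99 latency), the SLA threshold, and the scale-in band are illustrative choices, not a real autoscaling policy:

```python
def reconcile(p99_latency_ms: float, sla_ms: float, replicas: int,
              min_replicas: int = 1, max_replicas: int = 100) -> int:
    """One iteration of a toy control loop: scale out when the KPI
    breaches the SLA, scale in when there is comfortable headroom."""
    if p99_latency_ms > sla_ms and replicas < max_replicas:
        return replicas + 1   # breaching the SLA: add capacity
    if p99_latency_ms < 0.5 * sla_ms and replicas > min_replicas:
        return replicas - 1   # ample headroom: shed capacity
    return replicas           # within band: hold steady

print(reconcile(p99_latency_ms=450, sla_ms=300, replicas=4))  # 5
print(reconcile(p99_latency_ms=120, sla_ms=300, replicas=4))  # 3
```

A human runs this loop a few times a day at best; smart infrastructure can run it continuously, and carry what it learns from one application to the next.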
High-precision billing. There is no point in paying for unutilized capacity. Raw compute (CPU, memory, and I/O) should be billed like electricity: you pay only for what you consume, down to the nanosecond. If your application is actively using CPU, memory, or I/O, you pay for that actual consumption only, with no games and no capacity tricks.
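In the electricity metaphor, the bill is just rate times metered usage per resource. A minimal sketch with made-up rates (the unit prices below are hypothetical, not any provider's pricing):

```python
def consumption_bill(cpu_seconds: float, gb_seconds: float, io_gb: float,
                     cpu_rate: float = 0.00002,   # $ per CPU-second (hypothetical)
                     mem_rate: float = 0.000004,  # $ per GB-second (hypothetical)
                     io_rate: float = 0.05) -> float:  # $ per GB of I/O (hypothetical)
    """Charge for actual consumption only: an idle process that uses
    no CPU, holds no memory, and moves no data is charged nothing."""
    return cpu_seconds * cpu_rate + gb_seconds * mem_rate + io_gb * io_rate

# An app that is busy for 1,000 CPU-seconds with 512 MB resident during
# that time and 2 GB of I/O pays cents, no matter how long it exists.
print(round(consumption_bill(1000, 1000 * 0.5, 2.0), 4))

# The same app sitting idle costs nothing at all.
print(consumption_bill(0, 0, 0))  # 0.0
```

Contrast this with VM-hour billing, where the idle case costs exactly as much as the busy one.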