Case Study - Optimizing a Development Pipeline for a Multinational Insurer

Introduction

In 2016, a multinational insurer contracted Levvel to solve case-specific issues: building and supporting a deployment pipeline and infrastructure for several Java-based applications. The primary goal of the engagement was to improve the insurance company’s legacy development and deployment processes, which were labor-intensive and completely manual, leading to a lot of human error. For example, a developer would build a WAR file on their local computer, copy it to all servers, back up the app directory, move the WAR file into the app directory, and manually edit the property files.

Another example involved building out and supporting the company’s infrastructure for applications. Multiple teams of people were needed to bring up a single virtual machine (VM), and every step of the process was manual. One server alone could take up to a month before it was operational. In this environment, changing business requirements often called for several additional servers, and scaling up and down was impractical. There were also multiple single points of failure within the provisioning process and in the operation of the infrastructure that only manual intervention could correct.

Leveraging several different technologies and best practices, Levvel automated the vast majority of the company’s development pipeline. Levvel also added manual steps where the business required them or where automation was not feasible. By combining containerization technologies, Docker and Nomad specifically, Levvel repurposed the client’s existing infrastructure in a way that allowed the applications to easily scale up and down without the need to request additional VMs. The system had the ability to identify failures on its own and self-correct without missing requests.

Challenge

The client was building several different applications simultaneously and experiencing poor visibility: team members struggled to identify what versions of code were deployed on which servers, when the code was deployed, and by whom. As the company scaled, manually deploying the different applications to different environments took more and more time, and frequent human error made the process unreliable.

The company was also running over 150 VMs to support a single application. Most of the resources on these servers were heavily underutilized. Each component of the application was given its own VM to run on, and it took engineers a significant amount of time to make additional VMs operational.

Landscape

Large enterprises are notorious for being slow to adopt change, and for good reason. At that size, small changes can have a domino effect and lead to huge, unwanted repercussions. Assuming a company validated its proposed changes, implementing them across such a large network takes time and money that could be spent building things that generate revenue.

These large enterprises are learning that they can save a lot of time and resources by automating away a majority of the tedious work, freeing resources to build the money-making applications.

Solution

Initially, Levvel set up GitHub Enterprise and Jenkins, creating a Git workflow that triggered different Jenkins jobs based on events generated by normal development activities. For example, if a user created a pull request (PR) in GitHub, Jenkins would build and/or test the code base. After the PR was merged, Jenkins would deploy it to the development environment.
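
The event-to-job routing described above can be sketched roughly as follows. This is a minimal illustration in Python, not the client's actual configuration: the Jenkins job names are hypothetical, and the sketch assumes the standard GitHub `pull_request` webhook payload, where a merge arrives as the `closed` action with `merged` set to true.

```python
from typing import Any

# Hypothetical Jenkins job names; the case study does not name the real jobs.
BUILD_AND_TEST_JOB = "build-and-test"
DEPLOY_DEV_JOB = "deploy-to-development"


def jobs_for_event(event_type: str, payload: dict[str, Any]) -> list[str]:
    """Return the Jenkins jobs to trigger for one GitHub webhook delivery."""
    if event_type != "pull_request":
        return []

    action = payload.get("action")
    pull_request = payload.get("pull_request", {})

    # A new PR, or new commits pushed to an open PR, gets built and tested.
    if action in ("opened", "synchronize", "reopened"):
        return [BUILD_AND_TEST_JOB]

    # GitHub reports a merge as action "closed" with merged == True.
    if action == "closed" and pull_request.get("merged"):
        return [DEPLOY_DEV_JOB]

    # Closed without merging, labels, comments, etc. trigger nothing.
    return []
```

Keeping the routing in one small function makes the workflow rules easy to review and test independently of Jenkins itself.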

There were some instances in the client’s internal processes and workflows that could not be fully or appropriately automated. For example, if the legal department needed to sign off on a new change before it could be deployed, there was no easy way for the system to detect that and keep it moving down the pipeline. To address this, Levvel added a chatbot (Hubot) to the client’s team communications that could read simple, human statements and react to those messages. Now, legal can tell Hubot, “I approve the model environment”, and then the automation can take back over. Additionally, QA can say, “I’m testing”, and Hubot knows to lock the QA environment so that no new deployments will occur. QA can then say, “QA passed/failed”, and Hubot can start deployments again.
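
The approval and locking behavior can be modeled as a small state machine. The sketch below is written in Python purely for illustration (real Hubot scripts are written in JavaScript or CoffeeScript), and the class name, method names, and reply strings are all hypothetical. Only the chat phrases come from the examples above.

```python
class PipelineGate:
    """Tracks which environments are locked or approved via chat messages."""

    def __init__(self) -> None:
        self.locked: set[str] = set()
        self.approved: set[str] = set()

    def handle(self, message: str) -> str:
        """React to one chat message and return the bot's reply."""
        # Normalize case, curly apostrophes, and trailing punctuation.
        text = message.strip().lower().replace("\u2019", "'").rstrip(".!")

        if text == "i'm testing":
            self.locked.add("qa")
            return "QA locked; deployments paused."
        if text in ("qa passed", "qa failed"):
            self.locked.discard("qa")
            return "QA unlocked; deployments resumed."
        if text.startswith("i approve the ") and text.endswith(" environment"):
            env = text[len("i approve the "):-len(" environment")]
            self.approved.add(env)
            return f"Approval recorded for {env}."
        return "Sorry, I did not understand that."

    def can_deploy(self, env: str, needs_approval: bool = False) -> bool:
        """A deployment may proceed if the environment is unlocked and,
        where the business requires it, explicitly approved."""
        if env in self.locked:
            return False
        return env in self.approved if needs_approval else True
```

In this model the automation calls `can_deploy` before each deployment step, so a manual sign-off becomes just another rule the pipeline checks.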

A beneficial side effect of this solution is that everything is now easily traceable. If GitHub Enterprise is wired to Hubot and Hubot triggers the Jenkins jobs, Hubot knows everything that happens in the pipeline. Hubot can trace who triggered changes, when they were triggered, and what changes occurred. Logging these events creates complete accountability from end to end. It also becomes easy for users to ask Hubot about the current state. For example, if the user asks, “What version is in QA?”, Hubot will respond with a version number, SHA1, or whichever version identifier the client prefers.
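
One way to picture this traceability is an append-only log that records who deployed what, where, and when, and that can answer the "What version is in QA?" question from its latest entry. The following Python sketch is an assumption about how such a log could look; the class and field names are hypothetical and do not describe Hubot's actual storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DeployEvent:
    environment: str
    version: str        # version number, SHA1, or other identifier
    triggered_by: str
    at: datetime


class DeployLog:
    """Append-only record of deployments, oldest first."""

    def __init__(self) -> None:
        self._events: list[DeployEvent] = []

    def record(self, environment: str, version: str, triggered_by: str) -> DeployEvent:
        event = DeployEvent(environment, version, triggered_by,
                            datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def current_version(self, environment: str) -> Optional[str]:
        """Answer 'What version is in <environment>?' from the latest entry."""
        for event in reversed(self._events):
            if event.environment == environment:
                return event.version
        return None

    def history(self, environment: str) -> list[DeployEvent]:
        """Every deployment to one environment, for end-to-end accountability."""
        return [e for e in self._events if e.environment == environment]
```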

Levvel used Docker to consolidate the unused resources on the 150+ VMs into a couple of different clusters. Although this consolidation reduced the number of needed VMs to nine, Levvel decided to keep 15 of the VMs for room to scale up and to run any new services or applications that may be needed in the future.

Levvel chose Nomad (backed by Consul) for Docker orchestration and scheduling. This allowed the team to use the built-in service discovery and health check mechanisms. Nomad automatically replaces containers that fail their health checks after a certain period of time, and service discovery allows the applications to find the services they need without anyone manually tracking where things are running. Levvel used NGINX as a reverse proxy to the applications to help with high availability and to communicate across different security zones in the network.

Using Consul-Template, Levvel modified and reloaded the backend configuration for NGINX as the location of services changed, once again leveraging the health checks and service discovery mechanisms.
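
Conceptually, each re-render turns the current list of healthy service instances into an NGINX `upstream` block. Consul-Template itself uses Go templates; the Python function below only illustrates the transformation, and the function name and the input format are assumptions made for the example.

```python
def render_upstream(name: str, instances: list[dict]) -> str:
    """Render an NGINX upstream block from the instances passing health checks.

    Each instance is a dict with "address", "port", and "healthy" keys
    (a hypothetical shape chosen for this sketch).
    """
    healthy = [i for i in instances if i["healthy"]]
    lines = [f"upstream {name} {{"]
    if healthy:
        # Sort for stable output so the config only changes when membership does.
        for inst in sorted(healthy, key=lambda i: (i["address"], i["port"])):
            lines.append(f"    server {inst['address']}:{inst['port']};")
    else:
        # NGINX rejects an upstream with no servers, so point at a dead port
        # and let requests fail fast until a healthy instance returns.
        lines.append("    server 127.0.0.1:65535;")
    lines.append("}")
    return "\n".join(lines)
```

After writing the rendered file, the proxy would be reloaded so that traffic follows the services as they move.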

There was enough redundancy that if one or two containers supporting a service went down, there were still enough running to handle any request. Once the system noticed that there was a failure, it immediately reconfigured the proxy. Requests no longer went to those missing containers, and new containers were rescheduled to take their place. Once the new containers were ready, the proxy was once again reconfigured to send traffic to them.
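
The recovery sequence above amounts to a reconciliation loop: route only to healthy containers, and schedule replacements until the desired count is met again. The Python sketch below is a simplified model of that loop, not Nomad's scheduler; the function name, container IDs, and data shapes are hypothetical.

```python
from typing import Iterator


def reconcile(
    containers: dict[str, bool],
    desired: int,
    new_ids: Iterator[str],
) -> tuple[list[str], list[str]]:
    """Run one reconciliation pass.

    containers maps a container ID to its latest health check result.
    Returns (IDs the proxy should route to, IDs of newly scheduled
    replacements). Replacements receive traffic only on a later pass,
    once they report healthy.
    """
    routable = sorted(cid for cid, healthy in containers.items() if healthy)
    missing = max(0, desired - len(routable))
    scheduled = [next(new_ids) for _ in range(missing)]
    return routable, scheduled
```

For example, with three desired containers and one failing its health check, the first pass routes traffic to the two healthy ones and schedules one replacement; once the replacement passes its checks, the next pass routes to all three and schedules nothing.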

Approach

Actually building out a solution using a containerization approach isn’t very time consuming. The biggest time consumers are identifying how the pipeline should work to best support the business, and then turning those parameters into programmatic rules.

In order to ensure the success of this deployment, it is also very important to educate the groups that will be interacting with the pipeline. It is relatively easy to implement a help system via Hubot, but the teams also need education on why this is being done and how it helps them be more productive. A failure to secure team buy-in can lead to a lot of pushback and harm.

Levvel wanted to ensure continuity while transitioning between the legacy infrastructure and the containerized solution. Levvel identified small areas where some applications and services could run in parallel with their containerized counterparts. Levvel then gradually transitioned the remaining instances of those applications and services to containers as confidence in the test group grew. Eventually, all the applications and services were transitioned to containers, and from an external view, nothing had changed.

Value and Benefits

After the engagement with Levvel, the insurance company’s error-prone, time-consuming deployments are now automated and reliable. What previously took several hours now takes just several minutes, and resources are used more efficiently. Business rules are enforced on the deployment pipeline, ensuring changes don’t skip a step or miss an approval. The company has gained end-to-end visibility into, and accountability for, its entire process.

Levvel’s efforts significantly reduced resource costs, going from over 150 servers to just 15. This solution also limited losses from downtime via the HA (high availability) nature and self-healing properties enabled through Nomad with Docker. Even more savings were realized by freeing up resources to do work that carried a higher value.

Closing

The containerization approach is agnostic to the languages on which the applications are built. It can be used with practically any source code repository, CI (continuous integration), team communication platform, or other existing infrastructure. Once these tools are integrated and automation processes are created, one just needs to write the rules that apply to their current system’s specific needs.

While containerization may not be the right choice for every business, the ability to automatically recover from failures and the benefits of having HA built into existing infrastructure can be realized by any organization. This same pattern can be used with or without containers, or in a hybrid environment. It greatly simplifies management of a company’s infrastructure, and it can also be used as an easy way to transition from a non-container solution to a fully containerized one.

More holistically, the transformation achieved through the selection and integration of appropriate DevOps tooling, coupled with a thorough understanding of the challenges facing development and operations resources at the beginning of the project, improved quality and efficiency in ways that benefited both IT and business teams. Developers were able to make product improvements faster and with fewer errors. Operations teams had manual, error-prone processes completely automated, allowing them to focus on more valuable initiatives. Legal and compliance teams simplified their workflow and communication with technology. And, most importantly, these changes drove expense reductions through more efficient use of computing resources.
