[ko] Remove untranslated files in ko/case-studies/

pull/28212/head
Jihoon Seo 2021-05-31 18:23:51 +09:00
parent f007e221ba
commit 7ebb227122
76 changed files with 0 additions and 2341 deletions

Binary file not shown. (deleted image, 7.6 KiB)


@@ -1,86 +0,0 @@
---
title: Adform Case Study
linkTitle: Adform
case_study_styles: true
cid: caseStudies
logo: adform_featured_logo.png
draft: false
featured: true
weight: 47
quote: >
Kubernetes enabled the self-healing and immutable infrastructure. We can do faster releases, so our developers are really happy. They can ship our features faster than before, and that makes our clients happier.
new_case_study_styles: true
heading_background: /images/case-studies/adform/banner1.jpg
heading_title_logo: /images/adform_logo.png
subheading: >
Improving Performance and Morale with Cloud Native
case_study_details:
- Company: Adform
- Location: Copenhagen, Denmark
- Industry: Adtech
---
<h2>Challenge</h2>
<p><a href="https://site.adform.com/">Adform's</a> mission is to provide a secure and transparent full stack of advertising technology to enable digital ads across devices. The company has a large infrastructure: <a href="https://www.openstack.org/">OpenStack</a>-based private clouds running on 1,100 physical servers in 7 data centers around the world, 3 of which were opened in the past year. With the company's growth, the infrastructure team felt that "our private cloud was not really flexible enough," says IT System Engineer Edgaras Apšega. "The biggest pain point is that our developers need to maintain their virtual machines, so rolling out technology and new software takes time. We were really struggling with our releases, and we didn't have self-healing infrastructure."</p>
<h2>Solution</h2>
<p>The team, which had already been using <a href="https://prometheus.io/">Prometheus</a> for monitoring, embraced <a href="https://kubernetes.io/">Kubernetes</a> and cloud native practices in 2017. "To start our Kubernetes journey, we had to adapt all our software, so we had to choose newer frameworks," says Apšega. "We also adopted the microservices way, so observability is much better because you can inspect the bug or the services separately."</p>
<h2>Impact</h2>
<p>"Kubernetes helps our business a lot because our features are coming to market faster," says Apšega. The release process went from several hours to several minutes. Autoscaling has been at least 6 times faster than the semi-manual VM bootstrapping and application deployment required before. The team estimates that the company has experienced cost savings of 4-5x due to less hardware and fewer man hours needed to set up the hardware and virtual machines, metrics, and logging. Utilization of the hardware resources has been reduced as well, with containers notching 2-3 times more efficiency over virtual machines. "The deployments are very easy because developers just push the code and it automatically appears on Kubernetes," says Apšega. Prometheus has also had a positive impact: "It provides high availability for metrics and alerting. We monitor everything starting from hardware to applications. Having all the metrics in <a href="https://grafana.com/">Grafana</a> dashboards provides great insight on your systems."</p>
{{< case-studies/quote author="Edgaras Apšega, IT Systems Engineer, Adform" >}}
"Kubernetes enabled the self-healing and immutable infrastructure. We can do faster releases, so our developers are really happy. They can ship our features faster than before, and that makes our clients happier."
{{< /case-studies/quote >}}
{{< case-studies/lead >}}
Adform made <a href="https://www.wsj.com/articles/fake-ad-operation-used-to-steal-from-publishers-is-uncovered-1511290981">headlines</a> last year when it detected the HyphBot ad fraud network that was costing some businesses hundreds of thousands of dollars a day.
{{< /case-studies/lead >}}
<p>With its mission to provide a secure and transparent full stack of advertising technology to enable an open internet, Adform published a <a href="https://site.adform.com/media/85132/hyphbot_whitepaper_.pdf">white paper</a> revealing what it did—and others could too—to limit customers' exposure to the scam.</p>
<p>In that same spirit, Adform is sharing its cloud native journey. "When you see that everyone shares their best practices, it inspires you to contribute back to the project," says IT Systems Engineer Edgaras Apšega.</p>
<p>The company has a large infrastructure: <a href="https://www.openstack.org/">OpenStack</a>-based private clouds running on 1,100 physical servers in its own seven data centers around the world, three of which were opened in the past year. With the company's growth, the infrastructure team felt that "our private cloud was not really flexible enough," says Apšega. "The biggest pain point is that our developers need to maintain their virtual machines, so rolling out technology and new software really takes time. We were really struggling with our releases, and we didn't have self-healing infrastructure."</p>
{{< case-studies/quote
image="/images/case-studies/adform/banner3.jpg"
author="Edgaras Apšega, IT Systems Engineer, Adform"
>}}
"The fact that Cloud Native Computing Foundation incubated Kubernetes was a really big point for us because it was vendor neutral. And we can see that a community really gathers around it. Everyone shares their experiences, their knowledge, and the fact that it's open source, you can contribute."
{{< /case-studies/quote >}}
<p>The team, which had already been using Prometheus for monitoring, embraced Kubernetes, microservices, and cloud native practices. "The fact that Cloud Native Computing Foundation incubated Kubernetes was a really big point for us because it was vendor neutral," says Apšega. "And we can see that a community really gathers around it."</p>
<p>A proof of concept project was started, with a Kubernetes cluster running on bare metal in the data center. When developers saw how quickly containers could be spun up compared to the virtual machine process, "they wanted to ship their containers in production right away, and we were still doing proof of concept," says IT Systems Engineer Andrius Cibulskis.</p>
<p>Of course, a lot of work still had to be done. "First of all, we had to learn Kubernetes, see all of the moving parts, how they glue together," says Apšega. "Second of all, the whole CI/CD part had to be redone, and our DevOps team had to invest more man hours to implement it. And third is that developers had to rewrite the code, and they're still doing it."</p>
<p>The first production cluster was launched in the spring of 2018, and is now up to 20 physical machines dedicated to pods across three data centers, with plans for separate clusters in the other four data centers. The user-facing Adform application platform, data distribution platform, and back ends are now all running on Kubernetes. "Many APIs for critical applications are being developed for Kubernetes," says Apšega. "Teams are rewriting their applications to .NET core, because it supports containers, and preparing to move to Kubernetes. And new applications, by default, go in containers."</p>
{{< case-studies/quote
image="/images/case-studies/adform/banner4.jpg"
author="Andrius Cibulskis, IT Systems Engineer, Adform"
>}}
"Releases are really nice for them, because they just push their code to Git and that's it. They don't have to worry about their virtual machines anymore."
{{< /case-studies/quote >}}
<p>This big push has been driven by the real impact that these new practices have had. "Kubernetes helps our business a lot because our features are coming to market faster," says Apšega. "The deployments are very easy because developers just push the code and it automatically appears on Kubernetes." The release process went from several hours to several minutes. Autoscaling is at least six times faster than the semi-manual VM bootstrapping and application deployment required before.</p>
<p>The team estimates that the company has experienced cost savings of 4-5x due to less hardware and fewer man-hours needed to set up the hardware and virtual machines, metrics, and logging. The amount of hardware needed has been reduced as well, with containers proving two to three times more efficient than virtual machines.</p>
<p>Prometheus has also had a positive impact: "It provides high availability for metrics and alerting," says Apšega. "We monitor everything starting from hardware to applications. Having all the metrics in Grafana dashboards provides great insight on our systems."</p>
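<p>As a generic illustration of that hardware-to-application monitoring span (a minimal sketch only, not Adform's actual configuration; the job and metric names are hypothetical), a Prometheus alerting-rule file covering both layers might look like this:</p>

```yaml
# Hypothetical Prometheus alerting rules spanning the two layers described
# above: hardware (a node stops reporting metrics) and application (error rate).
groups:
- name: hardware
  rules:
  - alert: NodeExporterDown
    # The node-exporter target has stopped answering scrapes for 5 minutes.
    expr: up{job="node-exporter"} == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "A node has stopped reporting hardware metrics for 5 minutes"
- name: application
  rules:
  - alert: HighErrorRate
    # More than 5% of a service's requests are returning HTTP 5xx.
    expr: sum by (service) (rate(http_requests_total{status=~"5.."}[5m])) / sum by (service) (rate(http_requests_total[5m])) > 0.05
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "A service has been serving more than 5% errors for 10 minutes"
```

<p>The same Prometheus metrics that drive these alerts also back the Grafana dashboards the team describes, which is what gives one consistent view from hardware up to applications.</p>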
{{< case-studies/quote author="Edgaras Apšega, IT Systems Engineer, Adform" >}}
"I think that our company just started our cloud native journey. It seems like a huge road ahead, but we're really happy that we joined it."
{{< /case-studies/quote >}}
<p>All of these benefits have trickled down to individual team members, whose working lives have been changed for the better. "They used to have to get up at night to re-start some services, and now Kubernetes handles all of that," says Apšega. Adds Cibulskis: "Releases are really nice for them, because they just push their code to Git and that's it. They don't have to worry about their virtual machines anymore." Even the security teams have been impacted. "Security teams are always not happy," says Apšega, "and now they're happy because they can easily inspect the containers."</p>
<p>The company plans to remain in the data centers for now, "mostly because we want to keep all the data, to not share it in any way," says Cibulskis, "and it's cheaper at our scale." But, Apšega says, the possibility of using a hybrid cloud for computing is intriguing: "One of the projects we're interested in is the <a href="https://github.com/virtual-kubelet/virtual-kubelet">Virtual Kubelet</a> that lets you spin up the working nodes on different clouds to do some computing."</p>
<p>Apšega, Cibulskis and their colleagues are keeping tabs on how the cloud native ecosystem develops, and are excited to contribute where they can. "I think that our company just started our cloud native journey," says Apšega. "It seems like a huge road ahead, but we're really happy that we joined it."</p>

Binary file not shown. (deleted image, 6.8 KiB)

Binary file not shown. (deleted image, 19 KiB)


@@ -1,84 +0,0 @@
---
title: Amadeus Case Study
case_study_styles: true
cid: caseStudies
new_case_study_styles: true
heading_background: /images/case-studies/amadeus/banner1.jpg
heading_title_logo: /images/amadeus_logo.png
subheading: >
Another Technical Evolution for a 30-Year-Old Company
case_study_details:
- Company: Amadeus IT Group
- Location: Madrid, Spain
- Industry: Travel Technology
---
<h2>Challenge</h2>
<p>In the past few years, Amadeus, which provides IT solutions to the travel industry around the world, found itself in need of a new platform for the 5,000 services supported by its service-oriented architecture. The 30-year-old company operates its own data center in Germany, and there were growing demands internally and externally for solutions that needed to be geographically dispersed. And more generally, "we had objectives of being even more highly available," says Eric Mountain, Senior Expert, Distributed Systems at Amadeus. Among the company's goals: to increase automation in managing its infrastructure, optimize the distribution of workloads, use data center resources more efficiently, and adopt new technologies more easily.</p>
<h2>Solution</h2>
<p>Mountain has been overseeing the company's migration to <a href="http://kubernetes.io/">Kubernetes</a>, using <a href="https://www.openshift.org/">OpenShift</a> Container Platform, <a href="https://www.redhat.com/en">Red Hat</a>'s enterprise container platform.</p>
<h2>Impact</h2>
<p>One of the first projects the team deployed in Kubernetes was the Amadeus Airline Cloud Availability solution, which helps manage ever-increasing flight-search volume. "It's now handling in production several thousand transactions per second, and it's deployed in multiple data centers throughout the world," says Mountain. "It's not a migration of an existing workload; it's a whole new workload that we couldn't have done otherwise. [This platform] gives us access to market opportunities that we didn't have before."</p>
{{< case-studies/quote author="Eric Mountain, Senior Expert, Distributed Systems at Amadeus IT Group" >}}
"We want multi-data center capabilities, and we want them for our mainstream system as well. We didn't think that we could achieve them with our existing system. We need new automation, things that Kubernetes and OpenShift bring."
{{< /case-studies/quote >}}
{{< case-studies/lead >}}
In his two decades at Amadeus, Eric Mountain has been the migrations guy.
{{< /case-studies/lead >}}
<p>Back in the day, he worked on the company's move from Unix to Linux, and now he's overseeing the journey to cloud native. "Technology just keeps changing, and we embrace it," he says. "We are celebrating our 30 years this year, and we continue evolving and innovating to stay cost-efficient and enhance everyone's travel experience, without interrupting workflows for the customers who depend on our technology."</p>
<p>That was the challenge that Amadeus—which provides IT solutions to the travel industry around the world, from flight searches to hotel bookings to customer feedback—faced in 2014. The technology team realized it was in need of a new platform for the 5,000 services supported by its service-oriented architecture.</p>
<p>The tipping point occurred when they began receiving many requests, internally and externally, for solutions that needed to be geographically outside the company's main data center in Germany. "Some requests were for running our applications on customer premises," Mountain says. "There were also new services we were looking to offer that required response time to the order of a few hundred milliseconds, which we couldn't achieve with transatlantic traffic. Or at least, not without eating into a considerable portion of the time available to our applications for them to process individual queries."</p>
<p>More generally, the company was interested in leveling up on high availability, increasing automation in managing infrastructure, optimizing the distribution of workloads and using data center resources more efficiently. "We have thousands and thousands of servers," says Mountain. "These servers are assigned roles, so even if the setup is highly automated, the machine still has a given role. It's wasteful on many levels. For instance, an application doesn't necessarily use the machine very optimally. Virtualization can help a bit, but it's not a silver bullet. If that machine breaks, you still want to repair it because it has that role and you can't simply say, 'Well, I'll bring in another machine and give it that role.' It's not fast. It's not efficient. So we wanted the next level of automation."</p>
{{< case-studies/quote image="/images/case-studies/amadeus/banner3.jpg" >}}
"We hope that if we build on what others have built, what we do might actually be upstream-able. As Kubernetes and OpenShift progress, we see that we are indeed able to remove some of the additional layers we implemented to compensate for gaps we perceived earlier."
{{< /case-studies/quote >}}
<p>While mainly a C++ and Java shop, Amadeus also wanted to be able to adopt new technologies more easily. Some of its developers had started using languages like <a href="https://www.python.org/">Python</a> and databases like <a href="https://www.couchbase.com/">Couchbase</a>, but Mountain wanted still more options, he says, "in order to better adapt our technical solutions to the products we offer, and open up entirely new possibilities to our developers." Working with recent technologies and cool new things would also make it easier to attract new talent.</p>
<p>All of those needs led Mountain and his team on a search for a new platform. "We did a set of studies and proofs of concept over a fairly short period, and we considered many technologies," he says. "In the end, we were left with three choices: build everything on premise, build on top of <a href="http://kubernetes.io/">Kubernetes</a> whatever happens to be missing from our point of view, or go with <a href="https://www.openshift.com/">OpenShift</a> and build whatever remains there."</p>
<p>The team decided against building everything themselves—though they'd done that sort of thing in the past—because "people were already inventing things that looked good," says Mountain.</p>
<p>Ultimately, they went with OpenShift Container Platform, <a href="https://www.redhat.com/en">Red Hat</a>'s Kubernetes-based enterprise offering, instead of building on top of Kubernetes because "there was a lot of synergy between what we wanted and the way Red Hat was anticipating going with OpenShift," says Mountain. "They were clearly developing Kubernetes, and developing certain things ahead of time in OpenShift, which were important to us, such as more security."</p>
<p>The hope was that those particular features would eventually be built into Kubernetes, and, in the case of security, Mountain feels that has happened. "We realize that there's always a certain amount of automation that we will probably have to develop ourselves to compensate for certain gaps," says Mountain. "The less we do that, the better for us. We hope that if we build on what others have built, what we do might actually be upstream-able. As Kubernetes and OpenShift progress, we see that we are indeed able to remove some of the additional layers we implemented to compensate for gaps we perceived earlier."</p>
{{< case-studies/quote image="/images/case-studies/amadeus/banner4.jpg" >}}
"It's not a migration of an existing workload; it's a whole new workload that we couldn't have done otherwise. [This platform] gives us access to market opportunities that we didn't have before."
{{< /case-studies/quote >}}
<p>The first project the team tackled was one that they knew had to run outside the data center in Germany. Because of the project's needs, "We couldn't rely only on the built-in Kubernetes service discovery; we had to layer on top of that an extra service discovery level that allows us to load balance at the operation level within our system," says Mountain. They also built a stream dedicated to monitoring, which at the time wasn't offered in the Kubernetes or OpenShift ecosystem. Now that <a href="https://www.prometheus.io/">Prometheus</a> and other products are available, Mountain says the company will likely re-evaluate their monitoring system: "We obviously always like to leverage what Kubernetes and OpenShift can offer."</p>
<p>The second project ended up going into production first: the Amadeus Airline Cloud Availability solution, which helps manage ever-increasing flight-search volume and was deployed in public cloud. Launched in early 2016, it is "now handling in production several thousand transactions per second, and it's deployed in multiple data centers throughout the world," says Mountain. "It's not a migration of an existing workload; it's a whole new workload that we couldn't have done otherwise. [This platform] gives us access to market opportunities that we didn't have before."</p>
<p>Having been through this kind of technical evolution more than once, Mountain has advice on how to handle the cultural changes. "That's one aspect that we can tackle progressively," he says. "We have to go on supplying our customers with new features on our pre-existing products, and we have to keep existing products working. So we can't simply do absolutely everything from one day to the next. And we mustn't sell it that way."</p>
<p>The first order of business, then, is to pick one or two applications to demonstrate that the technology works. Rather than choosing a high-impact, high-risk project, Mountain's team selected a smaller application that was representative of all the company's other applications in its complexity: "We just made sure we picked something that's complex enough, and we showed that it can be done."</p>
{{< case-studies/quote >}}
"The bottom line is we want these multi-data center capabilities, and we want them as well for our mainstream system," he says. "And we don't think that we can implement them with our previous system. We need the new automation, homogeneity, and scale that Kubernetes and OpenShift bring."
{{< /case-studies/quote >}}
<p>Next comes convincing people. "On the operations side and on the R&D side, there will be people who say quite rightly, 'There is a system, and it works, so why change?'" Mountain says. "The only thing that really convinces people is showing them the value." For Amadeus, people realized that the Airline Cloud Availability product could not have been made available on the public cloud with the company's existing system. The question then became, he says, "Do we go into a full-blown migration? Is that something that is justified?"</p>
<p>"The bottom line is we want these multi-data center capabilities, and we want them as well for our mainstream system," he says. "And we don't think that we can implement them with our previous system. We need the new automation, homogeneity, and scale that Kubernetes and OpenShift bring."</p>
<p>So how do you get everyone on board? "Make sure you have good links between your R&D and your operations," he says. "Also make sure you're going to talk early on to the investors and stakeholders. Figure out what it is that they will be expecting from you, that will convince them or not, that this is the right way for your company."</p>
<p>His other advice is simply to make the technology available for people to try it. "Kubernetes and OpenShift Origin are open source software, so there's no complicated license key for the evaluation period and you're not limited to 30 days," he points out. "Just go and get it running." Along with that, he adds, "You've got to be prepared to rethink how you do things. Of course making your applications as cloud native as possible is how you'll reap the most benefits: 12 factors, CI/CD, which is continuous integration, continuous delivery, but also continuous deployment."</p>
<p>And while they explore that aspect of the technology, Mountain and his team will likely be practicing what he preaches to others taking the cloud native journey. "See what happens when you break it, because it's important to understand the limits of the system," he says. Or rather, he notes, the advantages of it. "Breaking things on Kube is actually one of the nice things about it—it recovers. It's the only real way that you'll see that you might be able to do things."</p>

Binary file not shown. (deleted image, 20 KiB)

Binary file not shown. (deleted image, 21 KiB)


@@ -1,92 +0,0 @@
---
title: Ancestry Case Study
case_study_styles: true
cid: caseStudies
new_case_study_styles: true
heading_background: /images/case-studies/ancestry/banner1.jpg
heading_title_logo: /images/ancestry_logo.png
subheading: >
Digging Into the Past With New Technology
case_study_details:
- Company: Ancestry
- Location: Lehi, Utah
- Industry: Internet Company, Online Services
---
<h2>Challenge</h2>
<p>Ancestry, the global leader in family history and consumer genomics, uses sophisticated engineering and technology to help everyone, everywhere discover the story of what led to them. The company has spent more than 30 years innovating and building products and technologies that, at their core, result in real and emotional human responses. <a href="https://www.ancestry.com">Ancestry</a> currently serves more than 2.6 million paying subscribers and holds 20 billion historical records and 90 million family trees, and more than four million people are in its AncestryDNA network, making it the largest consumer genomics DNA network in the world. The company's popular website, <a href="https://www.ancestry.com">ancestry.com</a>, was working with big data long before the term was popularized. The site was built on hundreds of services, technologies and a traditional deployment methodology. "It's worked well for us in the past," says Paul MacKay, software engineer and architect at Ancestry, "but had become quite cumbersome in its processing and is time-consuming. As a primarily online service, we are constantly looking for ways to accelerate to be more agile in delivering our solutions and our&nbsp;products."</p>
<h2>Solution</h2>
<p>The company is transitioning to cloud native infrastructure, using <a href="https://www.docker.com">Docker</a> containerization, <a href="https://kubernetes.io">Kubernetes</a> orchestration and <a href="https://prometheus.io">Prometheus</a> for cluster monitoring.</p>
<h2>Impact</h2>
<p>"Every single product, every decision we make at Ancestry, focuses on delighting our customers with intimate, sometimes life-changing discoveries about themselves and their families," says MacKay. "As the company continues to grow, the increased productivity gains from using Kubernetes has helped Ancestry make customer discoveries faster. With the move to Dockerization for example, instead of taking between 20 to 50 minutes to deploy a new piece of code, we can now deploy in under a minute for much of our code. We've truly experienced significant time savings in addition to the various features and benefits from cloud native and Kubernetes-type technologies."</p>
{{< case-studies/quote author="PAUL MACKAY, SOFTWARE ENGINEER AND ARCHITECT AT ANCESTRY" >}}
"At a certain point, you have to step back if you're going to push a new technology and get key thought leaders with engineers within the organization to become your champions for new technology adoption. At training sessions, the development teams were always the ones that were saying, 'Kubernetes saved our time tremendously; it's an enabler. It really is incredible.'"
{{< /case-studies/quote >}}
{{< case-studies/lead >}}
It started with a Shaky Leaf.
{{< /case-studies/lead >}}
<p>Since its introduction a decade ago, the Shaky Leaf icon has become one of Ancestry's signature features, signaling to users that there's a helpful hint they can use to find out more about their family tree.</p>
<p>So when the company decided to begin moving its infrastructure to cloud native technology, the first service that was launched on <a href="https://kubernetes.io">Kubernetes</a>, the open source platform for managing application containers across clusters of hosts, was this hint system. Think of it as Amazon's recommended products, but instead of recommending products the company recommends records, stories, or familial connections. "It was a very important part of the site," says Ancestry software engineer and architect Paul MacKay, "but also small enough for a pilot project that we knew we could handle in a very appropriate, secure way."</p>
<p>And when it went live smoothly in early 2016, "our deployment time for this service literally was cut down from 50 minutes to 2 or 5 minutes," MacKay adds. "The development team was just thrilled because we're focused on supplying a great experience for our customers. And that means features, it means stability, it means all those things that we need for a first-in-class type operation."</p>
<p>The stability of that Shaky Leaf was a signal for MacKay and his team that their decision to embrace cloud native technologies was the right one for the company. With a private data center, Ancestry built its website (which launched in 1996) on hundreds of services and technologies and a traditional deployment methodology. "It worked well for us in the past, but the sum of the legacy systems became quite cumbersome in its processing and was time-consuming," says MacKay. "We were looking for other ways to accelerate, to be more agile in delivering our solutions and our products."</p>
{{< case-studies/quote image="/images/case-studies/ancestry/banner3.jpg" >}}
"And when it [Kubernetes] went live smoothly in early 2016, 'our deployment time for this service literally was cut down from 50 minutes to 2 or 5 minutes,' MacKay adds. 'The development team was just thrilled because we're focused on supplying a great experience for our customers. And that means features, it means stability, it means all those things that we need for a first-in-class type operation.'"
{{< /case-studies/quote >}}
<p>That need led them in 2015 to explore containerization. Ancestry engineers had already been using technology like <a href="https://www.java.com/en/">Java</a> and <a href="https://www.python.org">Python</a> on Linux, so part of the decision was about making the infrastructure more Linux-friendly. They quickly decided that they wanted to go with Docker for containerization, "but it always comes down to the orchestration part of it to make it really work," says MacKay.</p>
<p>His team looked at orchestration platforms offered by <a href="https://docs.docker.com/compose/">Docker Compose</a>, <a href="http://mesos.apache.org">Mesos</a> and <a href="https://www.openstack.org/software/">OpenStack</a>, and even started to prototype some homegrown solutions. And then they started hearing rumblings of the imminent release of Kubernetes v1.0. "At the forefront, we were looking at the secret store, so we didn't have to manage that all ourselves, the config maps, the methodology of seamless deployment strategy," he says. "We found that how Kubernetes had done their resources, their types, their labels and just their interface was so much further advanced than the other things we had seen. It was a feature fit."</p>
{{< case-studies/lead >}}
Plus, MacKay says, "I just believed in the confidence that comes with the history that Google has with containerization. So we started out right on the leading edge of it. And we haven't looked back since."
{{< /case-studies/lead >}}
<p>Which is not to say that adopting a new technology hasn't come with some challenges. "Change is hard," says MacKay. "Not because the technology is hard or that the technology is not good. It's just that people like to do things like they had done [before]. You have the early adopters and you have those who are coming in later. It was a learning experience on both sides."</p>
<p>Figuring out the best deployment operations for Ancestry was a big part of the work it took to adopt cloud native infrastructure. "We want to make sure the process is easy and also controlled in the manner that allows us the highest degree of security that we demand and our customers demand," says MacKay. "With Kubernetes and other products, there are some good solutions, but a little bit of glue is needed to bring it into corporate processes and governances. It's like having a set of gloves that are generic, but when you really do want to grab something you have to make it so it's customized to you. That's what we had to do."</p>
<p>Their best practices include allowing their developers to deploy into both development and production, while controlling the aspects that need governance and auditing, such as secrets. They found that having one namespace per service is useful for achieving that containment of secrets and config maps. And for their needs, having one container per pod makes it easier to manage and provides a smaller unit of deployment.</p>
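<p>As a rough sketch of that layout (all names here are hypothetical, not Ancestry's actual manifests), one namespace per service, with the service's secrets contained inside it and single-container pods, might look like this:</p>

```yaml
# Hypothetical manifests illustrating the namespace-per-service layout:
# the service's Secret is scoped to its own namespace, and each pod
# runs exactly one container.
apiVersion: v1
kind: Namespace
metadata:
  name: hint-service            # one namespace per service
---
apiVersion: v1
kind: Secret
metadata:
  name: hint-service-secrets
  namespace: hint-service       # secrets stay contained in the service's namespace
type: Opaque
stringData:
  db-password: example-only     # placeholder value
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hint-service
  namespace: hint-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hint-service
  template:
    metadata:
      labels:
        app: hint-service
    spec:
      containers:               # one container per pod: a smaller unit of deployment
      - name: hint-service
        image: registry.example.com/hint-service:1.0
        envFrom:
        - secretRef:
            name: hint-service-secrets
```

<p>Scoping secrets and config maps to a per-service namespace provides the containment described above, since Kubernetes resolves Secret references only within the pod's own namespace.</p>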
{{< case-studies/quote image="/images/case-studies/ancestry/banner4.jpg" >}}
"The success of Ancestry's first deployment of the hint system on Kubernetes helped create momentum for greater adoption of the technology."
{{< /case-studies/quote >}}
<p>With that process established, the time spent on deployment was cut down to under a minute for some services. "As programmers, we have what's called REPL: read, evaluate, print, and loop, but with Kubernetes, we have CDEL: compile, deploy, execute, and loop," says MacKay. "It's a very quick loop back and a great benefit to understand that when our services are deployed in production, they're the same as what we tested in the pre-production environments. The approach of cloud native for Ancestry provides us a better ability to scale and to accommodate the business needs as workloads occur."</p>
<p>The success of Ancestry's first deployment of the hint system on Kubernetes helped create momentum for greater adoption of the technology. "Engineers like to code, they like to do features, they don't like to sit around waiting for things to be deployed and worrying about scaling up and out and down," says MacKay. "After a while the engineers became our champions. At training sessions, the development teams were always the ones saying, 'Kubernetes saved our time tremendously; it's an enabler; it really is incredible.' Over time, we were able to convince our management that this was a transition that the industry is making and that we needed to be a part of it."</p>
<p>A year later, Ancestry has transitioned a good number of applications to Kubernetes. "We have many different services that make up the rich environment that [the website] has from both the DNA side and the family history side," says MacKay. "We have front-end stacks, back-end stacks and back-end processing type stacks that are in the cluster."</p>
<p>The company continues to weigh which services it will move forward to Kubernetes, which ones will be kept as is, and which will be replaced in the future and thus don't have to be moved over. MacKay estimates that the company is "approaching halfway on those features that are going forward. We don't have to do a lot of convincing anymore. It's more of an issue of timing with getting product management and engineering staff the knowledge and information that they need."</p>
{{< case-studies/quote >}}
"... 'I believe in Kubernetes. I believe in containerization. I think if we can get there and establish ourselves in that world, we will be further along and far better off being agile and all the things we talk about, and it'll&nbsp;go&nbsp;forward.'"
{{< /case-studies/quote >}}
<p>Looking ahead, MacKay sees Ancestry maximizing the benefits of Kubernetes in 2017. "We're very close to having everything that should be or could be in a Linux-friendly world in Kubernetes by the end of the year," he says, adding that he's looking forward to features such as federation and horizontal pod autoscaling that are currently in the works. "Kubernetes has been very wonderful for us and we continue to ride the wave."</p>
<p>That wave, he points out, has everything to do with the vibrant Kubernetes community, which has grown by leaps and bounds since Ancestry joined it as an early adopter. "This is just a very rough way of judging it, but on Slack in June 2015, there were maybe 500 on there," MacKay says. "The last time I looked there were maybe 8,500 just on the Slack channel. There are so many major companies and different kinds of companies involved now. It's the variety of contributors, the number of contributors, the incredibly competent and friendly community."</p>
<p>As much as he and his team at Ancestry have benefited from what he calls "the goodness and the technical abilities of many" in the community, they've also contributed information about best practices, logged bug issues and participated in the open source conversation. And they've been active in attending <a href="https://www.meetup.com/Utah-Kubernetes-Meetup/">meetups</a> to help educate and give back to the local tech community in Utah. Says MacKay: "We're trying to give back as far as our experience goes, rather than just code."</p>
<p>When he meets with companies considering adopting cloud native infrastructure, the best advice he has to give from Ancestry's Kubernetes journey is this: "Start small, but with hard problems," he says. And "you need a patron who understands the vision of containerization, to help you tackle the political as well as other technical roadblocks that can occur when change is needed."</p>
<p>With the changes that MacKay's team has led over the past year and a half, cloud native will be part of Ancestry's technological genealogy for years to come. MacKay has been such a champion of the technology that he says people have jokingly accused him of having a Kubernetes tattoo.</p>
<p>"I really don't," he says with a laugh. "But I'm passionate. I'm not exclusive to any technology; I use whatever I need that's out there that makes us great. If it's something else, I'll use it. But right now I believe in Kubernetes. I believe in containerization. I think if we can get there and establish ourselves in that world, we will be further along and far better off being agile and all the things we talk about, and it'll go forward."</p>
<p>He pauses. "So, yeah, I guess you can say I'm an evangelist for Kubernetes," he says. "But I'm not getting a tattoo!"</p>

Binary file not shown. (deleted image, 7.7 KiB)

Binary file not shown. (deleted image, 7.7 KiB)


@@ -1,85 +0,0 @@
---
title: BlaBlaCar Case Study
case_study_styles: true
cid: caseStudies
new_case_study_styles: true
heading_background: /images/case-studies/blablacar/banner1.jpg
heading_title_logo: /images/blablacar_logo.png
subheading: >
Turning to Containerization to Support Millions of Rideshares
case_study_details:
- Company: BlaBlaCar
- Location: Paris, France
- Industry: Ridesharing Company
---
<h2>Challenge</h2>
<p>The world's largest long-distance carpooling community, <a href="https://www.blablacar.com/">BlaBlaCar</a>, connects 40 million members across 22 countries. The company has been experiencing exponential growth since 2012 and needed its infrastructure to keep up. "When you're thinking about doubling the number of servers, you start thinking, 'What should I do to be more efficient?'" says Simon Lallemand, Infrastructure Engineer at BlaBlaCar. "The answer is not to hire more and more people just to deal with the servers and installation." The team knew they had to scale the platform, but wanted to stay on their own bare metal servers.</p>
<h2>Solution</h2>
<p>Opting not to shift to cloud virtualization or use a private cloud on their own servers, the BlaBlaCar team became early adopters of containerization, using the CoreOS runtime <a href="https://coreos.com/rkt">rkt</a>, initially deployed with the <a href="https://coreos.com/fleet/docs/latest/launching-containers-fleet.html">fleet</a> cluster manager. Last year, the company switched to <a href="http://kubernetes.io/">Kubernetes</a> orchestration, and now also uses <a href="https://prometheus.io/">Prometheus</a> for monitoring.</p>
<h2>Impact</h2>
<p>"Before using containers, it would take sometimes a day, sometimes two, just to create a new service," says Lallemand. "With all the tooling that we made around the containers, copying a new service now is a matter of minutes. It's really a huge gain. We are better at capacity planning in our data center because we have fewer constraints due to this abstraction between the services and the hardware we run on. For the developers, it also means they can focus only on the features that they're developing, and not on the infrastructure."</p>
{{< case-studies/quote author="Simon Lallemand, Infrastructure Engineer at BlaBlaCar" >}}
"When you're switching to this cloud-native model and running everything in containers, you have to make sure that at any moment you can reboot without any downtime and without losing traffic. [With Kubernetes] our infrastructure is much more resilient and we have better availability than before."
{{< /case-studies/quote >}}
{{< case-studies/lead >}}
For the 40 million users of <a href="https://www.blablacar.com/">BlaBlaCar</a>, it's easy to find strangers headed in the same direction to share rides and costs. You can even choose how much "bla bla" chatter you want from a long-distance ride mate.
{{< /case-studies/lead >}}
<p>Behind the scenes, though, the infrastructure was falling woefully behind the rider community's exponential growth. Founded in 2006, the company hit its current stride around 2012. "Our infrastructure was very traditional," says Infrastructure Engineer Simon Lallemand, who began working at the company in 2014. "In the beginning, it was a bit chaotic because we had to [grow] fast. But then comes the time when you have to design things to make it manageable."</p>
<p>By 2015, the company had about 50 bare metal servers. The team was using a <a href="https://www.mysql.com/">MySQL</a> database and <a href="http://php.net/">PHP</a>, but, Lallemand says, "it was a very static way." They also utilized the configuration management system <a href="https://www.chef.io/chef/">Chef</a>, but had little automation in the process. "When you're thinking about doubling the number of servers, you start thinking, 'What should I do to be more efficient?'" says Lallemand. "The answer is not to hire more and more people just to deal with the servers and installation."</p>
<p>Instead, BlaBlaCar began its cloud-native journey but wasn't sure which route to take. "We could either decide to go into cloud virtualization or even use a private cloud on our own servers," says Lallemand. "But going into the cloud meant we had to make a lot of changes in our application work, and we were just not ready to make the switch from on premise to the cloud." They wanted to keep the great performance they got on bare metal, so they didn't want to go to virtualization on premise.</p>
<p>The solution: containerization. This was early 2015 and containers were still relatively new. "It was a bold move at the time," says Lallemand. "We decided that the next servers that we would buy in the new data center would all be the same model, so we could outsource the maintenance of the servers. And we decided to go with containers and with <a href="https://coreos.com/">CoreOS</a> Container Linux as an abstraction for this hardware. It seemed future-proof to go with containers because we could see what companies were already doing with containers."</p>
{{< case-studies/quote image="/images/case-studies/blablacar/banner3.jpg">}}
"With all the tooling that we made around the containers, copying a new service is a matter of minutes. It's a huge gain. For the developers, it means they can focus only on the features that they're developing and not on the infrastructure or the hour they would test their code, or the hour that it would get deployed."
{{< /case-studies/quote >}}
<p>Next, they needed to choose a runtime for the containers, but "there were very few deployments in production at that time," says Lallemand. They experimented with <a href="https://www.docker.com/">Docker</a> but decided to go with <a href="https://coreos.com/rkt">rkt</a>. Lallemand explains that for BlaBlaCar, it was "much simpler to integrate things that are on rkt." At the time, the project was still pre-v1.0, so "we could speak with the developers of rkt and give them feedback. It was an advantage." Plus, he notes, rkt was very stable, even at this early stage.</p>
<p>Once those decisions were made that summer, the company came up with a plan for implementation. First, they formed a task force to create a workflow that would be tested by three of the 10 members on Lallemand's team. But they took care to run regular workshops with all 10 members to make sure everyone was on board. "When you're focused on your product sometimes you forget if it's really user friendly, whether other people can manage to create containers too," Lallemand says. "So we did a lot of iterations to find a good workflow."</p>
<p>After establishing the workflow, Lallemand says with a smile that "we had this strange idea that we should try the most difficult thing first. Because if it works, it will work for everything." So the first project the team decided to containerize was the database. "Nobody did that at the time, and there were really no existing tools for what we wanted to do, including building container images," he says. So the team created their own tools, such as <a href="https://github.com/blablacar/dgr">dgr</a>, which builds container images so that the whole team has a common framework to build on the same images with the same standards. They also revamped the service-discovery tools <a href="https://github.com/airbnb/nerve">Nerve</a> and <a href="http://airbnb.io/projects/synapse/">Synapse</a>; their versions, <a href="https://github.com/blablacar/go-nerve">Go-Nerve</a> and <a href="https://github.com/blablacar/go-synapse">Go-Synapse</a>, were written in Go and built to be more efficient and include new features. All of these tools were open-sourced.</p>
<p>At the same time, the company was working to migrate its entire platform to containers with a deadline set for Christmas 2015. With all the work being done in parallel, BlaBlaCar was able to get about 80 percent of its production into containers by its deadline with live traffic running on containers during December. (It's now at 100 percent.) "It's a really busy time for traffic," says Lallemand. "We knew that by using those new servers with containers, it would help us handle the traffic."</p>
<p>In the middle of that peak season for carpooling, everything worked well. "The biggest impact that we had was for the deployment of new services," says Lallemand. "Before using containers, we had to first deploy a new server and create configurations with Chef. It would take sometimes a day, sometimes two, just to create a new service. And with all the tooling that we made around the containers, copying a new service is a matter of minutes. So it's really a huge gain. For the developers, it means they can focus only on the features that they're developing and not on the infrastructure or the hour they would test their code, or the hour that it would get deployed."</p>
{{< case-studies/quote image="/images/case-studies/blablacar/banner4.jpg" >}}
"We realized that there was a really strong community around it [Kubernetes], which meant we would not have to maintain a lot of tools of our own," says Lallemand. "It was better if we could contribute to some bigger project like Kubernetes."
{{< /case-studies/quote >}}
<p>In order to meet their self-imposed deadline, one of the decisions they made was to not do any "orchestration magic" for containers in the first production alignment. Instead, they used the basic <a href="https://coreos.com/fleet/docs/latest/launching-containers-fleet.html">fleet</a> tool from CoreOS to deploy their containers. (They did build a tool called <a href="https://github.com/blablacar/ggn">GGN</a>, which they've open-sourced, to make it more manageable for their system engineers to use.)</p>
<p>Still, the team knew that they'd want more orchestration. "Our tool was doing a pretty good job, but at some point you want to give more autonomy to the developer team," Lallemand says. "We also realized that we don't want to be the single point of contact for developers when they want to launch new services." By the summer of 2016, they found their answer in <a href="http://kubernetes.io/">Kubernetes</a>, which had just begun supporting rkt implementation.</p>
<p>After discussing their needs with their contacts at CoreOS and Google, they were convinced that Kubernetes would work for BlaBlaCar. "We realized that there was a really strong community around it, which meant we would not have to maintain a lot of tools of our own," says Lallemand. "It was better if we could contribute to some bigger project like Kubernetes." They also started using <a href="https://prometheus.io/">Prometheus</a>, as they were looking for "service-oriented monitoring that could be updated nightly." Production on Kubernetes began in December 2016. "We like to do crazy stuff around Christmas," he adds with a laugh.</p>
<p>BlaBlaCar now has about 3,000 pods, with 1,200 of them running on Kubernetes. Lallemand leads a "foundations team" of 25 members who take care of the networks, databases and systems for about 100 developers. There have been some challenges getting to this point. "The rkt implementation is still not 100 percent finished," Lallemand points out. "It's really good, but there are some features still missing. We have questions about how we do things with stateful services, like databases. We know how we will be migrating some of the services; some of the others are a bit more complicated to deal with. But the Kubernetes community is making a lot of progress on that part."</p>
<p>The team is particularly happy that they're now able to plan capacity better in the company's data center. "We have fewer constraints since we have this abstraction between the services and the hardware we run on," says Lallemand. "If we lose a server because there's a hardware problem on it, we just move the containers onto another server. It's much more efficient. We do that by just changing a line in the configuration file. And with Kubernetes, it should be automatic, so we would have nothing to do."</p>
{{< case-studies/quote >}}
"If we lose a server because there's a hardware problem on it, we just move the containers onto another server. It's much more efficient. We do that by just changing a line in the configuration file. With Kubernetes, it should be automatic, so we would have nothing to do."
{{< /case-studies/quote >}}
<p>And these advances ultimately trickle down to BlaBlaCar's users. "We have improved availability overall on our website," says Lallemand. "When you're switching to this cloud-native model with running everything in containers, you have to make sure that you can at any moment reboot a server or a data container without any downtime, without losing traffic. So now our infrastructure is much more resilient and we have better availability than before."</p>
<p>Within BlaBlaCar's technology department, the cloud-native journey has created some profound changes. Lallemand thinks that the regular meetings during the conception stage and the training sessions during implementation helped. "After that everybody took part in the migration process," he says. "Then we split the organization into different 'tribes'—teams that gather developers, product managers, data analysts, all the different jobs, to work on a specific part of the product. Before, they were organized by function. The idea is to give all these tribes access to the infrastructure directly in a self-service way without having to ask. These people are really autonomous. They have responsibility of that part of the product, and they can make decisions faster."</p>
<p>This DevOps transformation turned out to be a positive one for the company's staffers. "The team was very excited about the DevOps transformation because it was new, and we were working to make things more reliable, more future-proof," says Lallemand. "We like doing things that very few people are doing, other than the internet giants."</p>
<p>With these changes already making an impact, BlaBlaCar is looking to split up more and more of its application into services. "I don't say microservices because they're not so micro," Lallemand says. "If we can split the responsibilities between the development teams, it would be easier to manage and more reliable, because we can easily add and remove services if one fails. You can handle it easily, instead of adding a big monolith that we still have."</p>
<p>When Lallemand speaks to other European companies curious about what BlaBlaCar has done with its infrastructure, he tells them to come along for the ride. "I tell them that it's such a pleasure to deal with the infrastructure that we have today compared to what we had before," he says. "They just need to keep in mind their real motive, whether it's flexibility in development or reliability or so on, and then go step by step towards reaching those objectives. That's what we've done. It's important not to do technology for the sake of technology. Do it for a purpose. Our focus was on helping the developers."</p>

Binary file not shown. (deleted image, 6.2 KiB)

Binary file not shown. (deleted image, 6.3 KiB)


@@ -1,83 +0,0 @@
---
title: BlackRock Case Study
case_study_styles: true
cid: caseStudies
new_case_study_styles: true
heading_background: /images/case-studies/blackrock/banner1.jpg
heading_title_logo: /images/blackrock_logo.png
subheading: >
Rolling Out Kubernetes in Production in 100 Days
case_study_details:
- Company: BlackRock
- Location: New York, NY
- Industry: Financial Services
---
<h2>Challenge</h2>
<p>The world's largest asset manager, <a href="https://www.blackrock.com/investing">BlackRock</a> operates a very controlled static deployment scheme, which has allowed for scalability over the years. But in their data science division, there was a need for more dynamic access to resources. "We want to be able to give every investor access to data science, meaning <a href="https://www.python.org">Python</a> notebooks, or even something much more advanced, like a MapReduce engine based on <a href="https://spark.apache.org">Spark</a>," says Michael Francis, a Managing Director in BlackRock's Product Group, which runs the company's investment management platform. "Managing complex Python installations on users' desktops is really hard because everyone ends up with slightly different environments. We have existing environments that do these things, but we needed to make it real, expansive and scalable. Being able to spin that up on demand, tear it down, make that much more dynamic, became a critical thought process for us. It's not so much that we had to solve our main core production problem, it's how do we extend that? How do we evolve?"</p>
<h2>Solution</h2>
<p>Drawing from what they learned during a pilot done last year using <a href="https://www.docker.com">Docker</a> environments, Francis put together a cross-sectional team of 20 to build an investor research web app using <a href="https://kubernetes.io">Kubernetes</a> with the goal of getting it into production within one quarter.</p>
<h2>Impact</h2>
<p>"Our goal was: How do you give people tools rapidly without having to install them on their desktop?" says Francis. And the team hit the goal within 100 days. Francis is pleased with the results and says, "We're going to use this infrastructure for lots of other application workloads as time goes on. It's not just data science; it's this style of application that needs the dynamism. But I think we're 6-12 months away from making a [large scale] decision. We need to gain experience of running the system in production, we need to understand failure modes and how best to manage operational issues. What's interesting is that just having this technology there is changing the way our developers are starting to think about their future development."</p>
{{< case-studies/quote author="Michael Francis, Managing Director, BlackRock">}}
"My message to other enterprises like us is you can actually integrate Kubernetes into an existing, well-orchestrated machinery. You don't have to throw out everything you do. And using Kubernetes made a complex problem significantly easier."
{{< /case-studies/quote >}}
<p>One of the management objectives for BlackRock's Product Group employees in 2017 was to "build cool stuff." Led by Managing Director Michael Francis, a cross-sectional group of 20 did just that: They rolled out a full production Kubernetes environment and released a new investor research web app on it. In 100 days.</p>
<p>For a company that's the world's largest asset manager, "just equipment procurement can take 100 days sometimes, let alone from inception to delivery," says Karl Wieman, a Senior System Administrator. "It was an aggressive schedule. But it moved the dial." In fact, the project achieved two goals: It solved a business problem (creating the needed web app) as well as provided real-world, in-production experience with Kubernetes, a cloud-native technology that the company was eager to explore. "It's not so much that we had to solve our main core production problem, it's how do we extend that? How do we evolve?" says Francis. The ultimate success of this project, beyond delivering the app, lies in the fact that "we've managed to integrate a radically new thought process into a controlled infrastructure that we didn't want to change."</p>
<p>After all, in its three decades of existence, BlackRock has "a very well-established environment for managing our compute resources," says Francis. "We manage large cluster processes on machines, so we do a lot of orchestration and management for our main production processes in a way that's very cloudish in concept. We're able to manage them in a very controlled, static deployment scheme, and that has given us a huge amount of scalability."</p>
<p>Though that works well for the core production, the company has found that some data science workloads require more dynamic access to resources. "It's a very bursty process," says Francis, who is head of data for the company's Aladdin investment management platform division.</p>
<p>Aladdin, which connects the people, information and technology needed for money management in real time, is used internally and is also sold as a platform to other asset managers and insurance companies. "We want to be able to give every investor access to data science, meaning <a href="https://www.python.org">Python</a> notebooks, or even something much more advanced, like a MapReduce engine based on <a href="https://spark.apache.org">Spark</a>," says Francis. But "managing complex Python installations on users' desktops is really hard because everyone ends up with slightly different environments. Docker allows us to flatten that environment."</p>
{{< case-studies/quote image="/images/case-studies/blackrock/banner3.jpg">}}
"We manage large cluster processes on machines, so we do a lot of orchestration and management for our main production processes in a way that's very cloudish in concept. We're able to manage them in a very controlled, static deployment scheme, and that has given us a huge amount of scalability."
{{< /case-studies/quote >}}
<p>Still, challenges remain. "If you have a shared cluster, you get this storming herd problem where everyone wants to do the same thing at the same time," says Francis. "You could put limits on it, but you'd have to build an infrastructure to define limits for our processes, and the Python notebooks weren't really designed for that. We have existing environments that do these things, but we needed to make it real, expansive, and scalable. Being able to spin that up on demand, tear it down, and make that much more dynamic, became a critical thought process for us."</p>
<p>Made up of managers from technology, infrastructure, production operations, development and information security, Francis's team was able to look at the problem holistically and come up with a solution that made sense for BlackRock. "Our initial straw man was that we were going to build everything using <a href="https://www.ansible.com">Ansible</a> and run it all using some completely different distributed environment," says Francis. "That would have been absolutely the wrong thing to do. Had we gone off on our own as the dev team and developed this solution, it would have been a very different product. And it would have been very expensive. We would not have gone down the route of running under our existing orchestration system. Because we don't understand it. These guys [in operations and infrastructure] understand it. Having the multidisciplinary team allowed us to get to the right solutions and that actually meant we didn't build anywhere near the amount we thought we were going to end up building."</p>
<p>In search of a solution in which they could manage usage on a user-by-user level, Francis's team gravitated to Red Hat's <a href="https://www.openshift.com">OpenShift</a> Kubernetes offering. The company had already experimented with other cloud-native environments, but the team liked that Kubernetes was open source, and "we felt the winds were blowing in the direction of Kubernetes long term," says Francis. "Typically we make technology choices that we believe are going to be here in 5-10 years' time, in some form. And right now, in this space, Kubernetes feels like the one that's going to be there." Adds Uri Morris, Vice President of Production Operations: "When you see that the non-Google committers to Kubernetes overtook the Google committers, that's an indicator of the momentum."</p>
<p>Once that decision was made, the major challenge was figuring out how to make Kubernetes work within BlackRock's existing framework. "It's about understanding how we can operate, manage and support a platform like this, in addition to tacking it onto our existing technology platform," says Project Manager Michael Maskallis. "All the controls we have in place, the change management process, the software development lifecycle, onboarding processes we go through—how can we do all these things?"</p>
<p>The first (anticipated) speed bump was working around issues behind BlackRock's corporate firewalls. "One of our challenges is there are no firewalls in most open source software," says Francis. "So almost all install scripts fail in some bizarre way, and pulling down packages doesn't necessarily work." The team ran into these types of problems using <a href="/docs/getting-started-guides/minikube/">Minikube</a> and did a few small pushes back to the open source project.</p>
{{< case-studies/quote image="/images/case-studies/blackrock/banner4.jpg">}}
"Typically we make technology choices that we believe are going to be here in 5-10 years' time, in some form. And right now, in this space, Kubernetes feels like the one that's going to be there."
{{< /case-studies/quote >}}
<p>There were also questions about service discovery. "You can think of Aladdin as a cloud of services with APIs between them that allows us to build applications rapidly," says Francis. "It's all on a proprietary message bus, which gives us all sorts of advantages but at the same time, how does that play in a third party [platform]?"</p>
<p>Another issue they had to navigate was that in BlackRock's existing system, the messaging protocol has different instances in the development, test and production environments. While Kubernetes enables a more DevOps-style model, that model didn't make sense for BlackRock. "I think what we are very proud of is that the ability for us to push into production is still incredibly rapid in this [new] infrastructure, but we have the control points in place, and we didn't have to disrupt everything," says Francis. "A lot of the cost of this development was thinking how best to leverage our internal tools. So it was less costly than we actually thought it was going to be."</p>
<p>The project leveraged tools associated with the messaging bus, for example. "The way that the Kubernetes cluster will talk to our internal messaging platform is through a gateway program, and this gateway program already has built-in checks and throttles," says Morris. "We can use them to control and potentially throttle the requests coming in from Kubernetes's very elastic infrastructure to the production infrastructure. We'll continue to go in that direction. It enables us to scale as we need to from the operational perspective."</p>
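<p>BlackRock hasn't published the gateway's internals, but a token bucket is one common way such a gateway can smooth bursts from an elastic cluster into fixed-capacity infrastructure. The Python sketch below is purely illustrative; the class name, rates, and usage are assumptions for exposition, not BlackRock's code.</p>

```python
import time

class TokenBucket:
    """Illustrative token-bucket throttle: smooths bursts from an elastic
    source (e.g., pods scaling out) into a fixed-capacity backend."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec       # steady-state requests/second allowed
        self.capacity = burst          # maximum burst size
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the elapsed interval, capped at the burst size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # caller queues, sheds, or retries the request

# Hypothetical usage at the gateway boundary:
bucket = TokenBucket(rate_per_sec=100, burst=20)
if not bucket.allow():
    pass  # reject or delay the request instead of forwarding it
```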
<p>The solution also had to be complementary with BlackRock's centralized operational support team structure. "The core infrastructure components of Kubernetes are hooked into our existing orchestration framework, which means that anyone in our support team has both control and visibility to the cluster using the existing operational tools," Morris explains. "That means that I don't need to hire more people."</p>
<p>With those points established, the team created a procedure for the project: "We rolled this out first to a development environment, then moved on to a testing environment and then eventually to two production environments, in that sequential order," says Maskallis. "That drove a lot of our learning curve. We have all these moving parts, the software components on the infrastructure side, the software components with Kubernetes directly, the interconnectivity with the rest of the environment that we operate here at BlackRock, and how we connect all these pieces. If we came across issues, we fixed them, and then moved on to the different environments to replicate that until we eventually ended up in our production environment where this particular cluster is supposed to live."</p>
<p>The team had weekly one-hour working sessions with all the members (who are located around the world) participating, and smaller breakout or deep-dive meetings focusing on specific technical details. Possible solutions would be reported back to the group and debated the following week. "I think what made it a successful experiment was people had to work to learn, and they shared their experiences with others," says Vice President and Software Developer Fouad Semaan. Then, Francis says, "We gave our engineers the space to do what they're good at. This hasn't been top-down."</p>
{{< case-studies/quote >}}
"The core infrastructure components of Kubernetes are hooked into our existing orchestration framework, which means that anyone in our support team has both control and visibility to the cluster using the existing operational tools. That means that I don't need to hire more people."
{{< /case-studies/quote >}}
<p>They were led by one key axiom: to stay focused and avoid scope creep. This meant that they wouldn't use features that weren't in the core of Kubernetes and Docker. But if there was a real need, they'd build the features themselves. Luckily, Francis says, "Because of the rapidity of the development, a lot of things we thought we would have to build ourselves have been rolled into the core product. [The package manager <a href="https://helm.sh">Helm</a> is one example.] People have similar problems."</p>
<p>By the end of the 100 days, the app was up and running for internal BlackRock users. The initial capacity of 30 users was hit within hours, and quickly increased to 150. "People were immediately all over it," says Francis. In the next phase of this project, they are planning to scale up the cluster to have more capacity.</p>
<p>Even more importantly, they now have in-production experience with Kubernetes that they can continue to build on—and a complete framework for rolling out new applications. "We're going to use this infrastructure for lots of other application workloads as time goes on. It's not just data science; it's this style of application that needs the dynamism," says Francis. "Is it the right place to move our core production processes onto? It might be. We're not at a point where we can say yes or no, but we felt that having real production experience with something like Kubernetes at some form and scale would allow us to understand that. I think we're 6-12 months away from making a [large scale] decision. We need to gain experience of running the system in production, we need to understand failure modes and how best to manage operational issues."</p>
<p>For other big companies considering a project like this, Francis says commitment and dedication are key: "We got the signoff from [senior management] from day one, with the commitment that we were able to get the right people. If I had to isolate what makes something complex like this succeed, I would say senior hands-on people who can actually drive it make a huge difference." With that in place, he adds, "My message to other enterprises like us is you can actually integrate Kubernetes into an existing, well-orchestrated machinery. You don't have to throw out everything you do. And using Kubernetes made a complex problem significantly easier."</p>

@ -1,83 +0,0 @@
---
title: Buffer Case Study
case_study_styles: true
cid: caseStudies
new_case_study_styles: true
heading_background: /images/case-studies/buffer/banner3.jpg
heading_title_logo: /images/buffer.png
subheading: >
Making Deployments Easy for a Small, Distributed Team
case_study_details:
- Company: Buffer
- Location: Around the World
- Industry: Social Media Technology
---
<h2>Challenge</h2>
<p>With a small but fully distributed team of 80 working across almost a dozen time zones, Buffer—which offers social media management to agencies and marketers—was looking to solve its "classic monolithic code base problem," says Architect Dan Farrelly. "We wanted to have the kind of liquid infrastructure where a developer could create an app and deploy it and scale it horizontally as necessary."</p>
<h2>Solution</h2>
<p>Embracing containerization, Buffer moved its infrastructure from Amazon Web Services' Elastic Beanstalk to Docker on AWS, orchestrated with Kubernetes.</p>
<h2>Impact</h2>
<p>The new system "leveled up our ability with deployment and rolling out new changes," says Farrelly. "Building something on your computer and knowing that it's going to work has shortened things up a lot. Our feedback cycles are a lot faster now too."</p>
{{< case-studies/quote author="DAN FARRELLY, BUFFER ARCHITECT" >}}
"It's amazing that we can use the Kubernetes solution off the shelf with our team. And it just keeps getting better. Before we even know that we need something, it's there in the next release or it's coming in the next few months."
{{< /case-studies/quote >}}
{{< case-studies/lead >}}
Dan Farrelly uses a carpentry analogy to explain the problem his company, <a href="https://buffer.com">Buffer</a>, began having as its team of developers grew over the past few years.
{{< /case-studies/lead >}}
<p>"If you're building a table by yourself, it's fine," the company's architect says. "If you bring in a second person to work on the table, maybe that person can start sanding the legs while you're sanding the top. But when you bring a third or fourth person in, someone should probably work on a different table." Needing to work on more and more different tables led Buffer on a path toward microservices and containerization made possible by Kubernetes.</p>
<p>Since around 2012, Buffer had already been using <a href="https://aws.amazon.com/elasticbeanstalk/">Elastic Beanstalk</a>, the orchestration service for deploying infrastructure offered by <a href="https://aws.amazon.com">Amazon Web Services</a>. "We were deploying a single monolithic <a href="http://php.net/manual/en/intro-whatis.php">PHP</a> application, and it was the same application across five or six environments," says Farrelly. "We were very much a product-driven company. It was all about shipping new features quickly and getting things out the door, and if something was not broken, we didn't spend too much time on it. If things were getting a little bit slow, we'd maybe use a faster server or just scale up one instance, and it would be good enough. We'd move on."</p>
<p>But things came to a head in 2016. With the growing number of committers on staff, Farrelly and Buffer's then-CTO, Sunil Sadasivan, decided it was time to re-architect and rethink their infrastructure. "It was a classic monolithic code base problem," says Farrelly.</p>
<p>Some of the company's team was already successfully using <a href="https://www.docker.com">Docker</a> in their development environment, but the only application running on Docker in production was a marketing website that didn't see real user traffic. They wanted to go further with Docker, and the next step was looking at options for orchestration.</p>
{{< case-studies/quote image="/images/case-studies/buffer/banner1.jpg" >}}
And all the things Kubernetes did well suited Buffer's needs. "We wanted to have the kind of liquid infrastructure where a developer could create an app and deploy it and scale it horizontally as necessary," says Farrelly. "We quickly used some scripts to set up a couple of test clusters, we built some small proof-of-concept applications in containers, and we deployed things within an hour. We had very little experience in running containers in production. It was amazing how quickly we could get a handle on it [Kubernetes]."
{{< /case-studies/quote >}}
<p>First they considered <a href="https://mesosphere.com">Mesosphere</a>, <a href="https://dcos.io">DC/OS</a> and <a href="https://aws.amazon.com/ecs/">Amazon Elastic Container Service</a> (which their data systems team was already using for some data pipeline jobs). While they were impressed by these offerings, they ultimately went with Kubernetes. "We run on AWS still, so spinning up, creating services and creating load balancers on demand for us without having to configure them manually was a great way for our team to get into this," says Farrelly. "We didn't need to figure out how to configure this or that, especially coming from a former Elastic Beanstalk environment that gave us an automatically-configured load balancer. I really liked Kubernetes' controls of the command line. It just took care of ports. It was a lot more flexible. Kubernetes was designed for doing what it does, so it does it very well."</p>
<p>And all the things Kubernetes did well suited Buffer's needs. "We wanted to have the kind of liquid infrastructure where a developer could create an app and deploy it and scale it horizontally as necessary," says Farrelly. "We quickly used some scripts to set up a couple of test clusters, we built some small proof-of-concept applications in containers, and we deployed things within an hour. We had very little experience in running containers in production. It was amazing how quickly we could get a handle on it [Kubernetes]."</p>
<p>Above all, it provided a powerful solution for one of the company's most distinguishing characteristics: their remote team that's spread across a dozen different time zones. "The people with deep knowledge of our infrastructure live in time zones different from our peak traffic time zones, and most of our product engineers live in other places," says Farrelly. "So we really wanted something where anybody could get a grasp of the system early on and utilize it, and not have to worry that the deploy engineer is asleep. Otherwise people would sit around for 12 to 24 hours for something. It's been really cool to see people moving much faster."</p>
<p>With a relatively small engineering team—just 25 people, and only a handful working on infrastructure, with the majority front-end developers—Buffer needed "something robust for them to deploy whatever they wanted," says Farrelly. Before, "it was only a couple of people who knew how to set up everything in the old way. With this system, it was easy to review documentation and get something out extremely quickly. It lowers the bar for us to get everything in production. We don't have the big team to build all these tools or manage the infrastructure like other larger companies might."</p>
{{< case-studies/quote image="/images/case-studies/buffer/banner4.jpg" >}}
"In our old way of working, the feedback loop was a lot longer, and it was delicate because if you deployed something, the risk was high to potentially break something else," Farrelly says. "With the kind of deploys that we built around Kubernetes, we were able to detect bugs and fix them, and get them deployed super fast. The second someone is fixing [a bug], it's out the door."
{{< /case-studies/quote >}}
<p>To help with this, Buffer developers wrote a deploy bot that wraps the Kubernetes deploy process and can be used by every team. "Before, our data analysts would update, say, a <a href="https://www.python.org">Python</a> analysis script and have to wait for the lead on that team to click the button and deploy it," Farrelly explains. "Now our data analysts can make a change, enter a <a href="https://slack.com">Slack</a> command, '/deploy,' and it goes out instantly. They don't need to wait on these slow turnaround times. They don't even know where it's running; it doesn't matter."</p>
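<p>Buffer hasn't open-sourced this particular bot, but the flow Farrelly describes can be sketched in a few lines: a Slack slash command posts to a small web service, which patches the matching Kubernetes Deployment. Everything below (the route, the registry URL, the namespace, the names) is a hypothetical illustration using Flask and the official Kubernetes Python client, not Buffer's implementation.</p>

```python
# Hypothetical sketch of a "/deploy" bot: a Slack slash command posts here,
# and the handler updates the matching Kubernetes Deployment's image tag.
from flask import Flask, request
from kubernetes import client, config

app = Flask(__name__)
config.load_incluster_config()  # assumes the bot itself runs inside the cluster
apps = client.AppsV1Api()

@app.route("/deploy", methods=["POST"])
def deploy():
    # Slack sends slash-command arguments form-encoded, e.g. "/deploy analysis v42".
    service, tag = request.form["text"].split()
    body = {"spec": {"template": {"spec": {"containers": [
        {"name": service, "image": f"registry.example.com/{service}:{tag}"}
    ]}}}}
    # Patching the pod template triggers a rolling update of the Deployment.
    apps.patch_namespaced_deployment(name=service, namespace="default", body=body)
    return f"Rolling out {service}:{tag}"

if __name__ == "__main__":
    app.run(port=8080)
```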
<p>One of the first applications the team built from scratch using Kubernetes was a new image resizing service. As a social media management tool that allows marketing teams to collaborate on posts and send updates across multiple social media profiles and networks, Buffer has to be able to resize photographs as needed to meet the varying limitations of size and format posed by different social networks. "We always had these hacked together solutions," says Farrelly.</p>
<p>To create this new service, one of the senior product engineers was assigned to learn Docker and Kubernetes, then build the service, test it, deploy it and monitor it—which he was able to do relatively quickly. "In our old way of working, the feedback loop was a lot longer, and it was delicate because if you deployed something, the risk was high to potentially break something else," Farrelly says. "With the kind of deploys that we built around Kubernetes, we were able to detect bugs and fix them, and get them deployed super fast. The second someone is fixing [a bug], it's out the door."</p>
<p>Plus, unlike with their old system, they could scale things horizontally with one command. "As we rolled it out," Farrelly says, "we could anticipate and just click a button. This allowed us to deal with the demand that our users were placing on the system and easily scale it to handle it."</p>
<p>Another thing they weren't able to do before was a canary deploy. This new capability "made us so much more confident in deploying big changes," says Farrelly. "Before, it took a lot of testing, which is still good, but it was also a lot of 'fingers crossed.' And this is something that gets run 800,000 times a day, the core of our business. If it doesn't work, our business doesn't work. In a Kubernetes world, I can do a canary deploy to test it for 1 percent and I can shut it down very quickly if it isn't working. This has leveled up our ability to deploy and roll out new changes quickly while reducing risk."</p>
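<p>In plain Kubernetes, a canary like the one Farrelly describes can be approximated with two Deployments whose Pods share the labels selected by a single Service: traffic then splits roughly in proportion to replica counts, and rollback is just scaling the canary to zero. The sketch below uses the official Python client; the Deployment names ("resizer") and replica numbers are assumptions for illustration, and the same one-line replica patch is also how a service scales horizontally "with one command."</p>

```python
# Hypothetical sketch: approximate a 1 percent canary with plain Kubernetes
# primitives. "resizer" and "resizer-canary" are two Deployments whose Pods
# share the labels selected by one Service, so traffic splits by replica count.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

def set_canary(stable: int, canary: int, namespace: str = "default"):
    # ~1% of traffic: stable=99, canary=1 (the Service balances across all Pods).
    apps.patch_namespaced_deployment("resizer", namespace,
                                     {"spec": {"replicas": stable}})
    apps.patch_namespaced_deployment("resizer-canary", namespace,
                                     {"spec": {"replicas": canary}})

set_canary(99, 1)     # start the canary
# set_canary(100, 0)  # shut it down very quickly if the new version misbehaves
```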
{{< case-studies/quote >}}
"If you want to run containers in production, with nearly the power that Google uses internally, this [Kubernetes] is a great way to do that," Farrelly says. "We're a relatively small team that's actually running Kubernetes, and we've never run anything like it before. So it's more approachable than you might think. That's the one big thing that I tell people who are experimenting with it. Pick a couple of things, roll it out, kick the tires on this for a couple of months and see how much it can handle. You start learning a lot this way."
{{< /case-studies/quote >}}
<p>By October 2016, 54 percent of Buffer's traffic was going through their Kubernetes cluster. "There's a lot of our legacy functionality that still runs alright, and those parts might move to Kubernetes or stay in our old setup forever," says Farrelly. But the company made the commitment at that time that going forward, "all new development, all new features, will be running on Kubernetes."</p>
<p>The plan for 2017 is to move all the legacy applications to a new Kubernetes cluster, and run everything they've pulled out of their old infrastructure, plus the new services they're developing in Kubernetes, on another cluster. "I want to bring all the benefits that we've seen on our early services to everyone on the team," says Farrelly.</p>
{{< case-studies/lead >}}
For Buffer's engineers, it's an exciting process. "Every time we're deploying a new service, we need to figure out: OK, what's the architecture? How do these services communicate? What's the best way to build this service?" Farrelly says. "And then we use the different features that Kubernetes has to glue all the pieces together. It's enabling us to experiment as we're learning how to design a service-oriented architecture. Before, we just wouldn't have been able to do it. This is actually giving us a blank white board so we can do whatever we want on it."
{{< /case-studies/lead >}}
<p>Part of that blank slate is the flexibility that Kubernetes offers should the time come when Buffer may want or need to change its cloud. "It's cloud agnostic so maybe one day we could switch to Google or somewhere else," Farrelly says. "We're very deep in Amazon but it's nice to know we could move away if we need to."</p>
<p>At this point, the team at Buffer can't imagine running their infrastructure any other way—and they're happy to spread the word. "If you want to run containers in production, with nearly the power that Google uses internally, this [Kubernetes] is a great way to do that," Farrelly says. "We're a relatively small team that's actually running Kubernetes, and we've never run anything like it before. So it's more approachable than you might think. That's the one big thing that I tell people who are experimenting with it. Pick a couple of things, roll it out, kick the tires on this for a couple of months and see how much it can handle. You start learning a lot this way."</p>

@ -1,61 +0,0 @@
---
title: Capital One Case Study
case_study_styles: true
cid: caseStudies
new_case_study_styles: true
heading_background: /images/case-studies/capitalone/banner1.jpg
heading_title_logo: /images/capitalone-logo.png
subheading: >
Supporting Fast Decisioning Applications with Kubernetes
case_study_details:
- Company: Capital One
- Location: McLean, Virginia
- Industry: Retail banking
---
<h2>Challenge</h2>
<p>The team set out to build a provisioning platform for <a href="https://www.capitalone.com/">Capital One</a> applications deployed on AWS that use streaming, big-data decisioning, and machine learning. One of these applications handles millions of transactions a day; some deal with critical functions like fraud detection and credit decisioning. The key considerations: resilience and speed—as well as full rehydration of the cluster from base AMIs.</p>
<h2>Solution</h2>
<p>The decision to run <a href="https://kubernetes.io/">Kubernetes</a> "is very strategic for us," says John Swift, Senior Director Software Engineering. "We use Kubernetes as a substrate or an operating system, if you will. There's a degree of affinity in our product development."</p>
<h2>Impact</h2>
<p>"Kubernetes is a significant productivity multiplier," says Lead Software Engineer Keith Gasser, adding that to run the platform without Kubernetes would "easily see our costs triple, quadruple what they are now for the amount of pure AWS expense." Time to market has been improved as well: "Now, a team can come to us and we can have them up and running with a basic decisioning app in a fortnight, which before would have taken a whole quarter, if not longer." Deployments increased by several orders of magnitude. Plus, the rehydration/cluster-rebuild process, which took a significant part of a day to do manually, now takes a couple hours with Kubernetes automation and declarative configuration.</p>
{{< case-studies/quote author="Jamil Jadallah, Scrum Master" >}}
<iframe width="560" height="315" src="https://www.youtube.com/embed/UHVW01ksg-s" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>
<br>
"With the scalability, the management, the coordination, Kubernetes really empowers us and gives us more time back than we had before."
{{< /case-studies/quote >}}
<p>As a top 10 U.S. retail bank, Capital One has applications that handle millions of transactions a day. Big-data decisioning—for fraud detection, credit approvals and beyond—is core to the business. To support the teams that build applications with those functions for the bank, the cloud team led by Senior Director Software Engineering John Swift embraced Kubernetes for its provisioning platform. "Kubernetes and its entire ecosystem are very strategic for us," says Swift. "We use Kubernetes as a substrate or an operating system, if you will. There's a degree of affinity in our product development."</p>
<p>Almost two years ago, the team embarked on this journey by first working with Docker. Then came Kubernetes. "We wanted to put streaming services into Kubernetes as one feature of the workloads for fast decisioning, and to be able to do batch alongside it," says Lead Software Engineer Keith Gasser. "Once the data is streamed and batched, there are so many tool sets in <a href="https://flink.apache.org/">Flink</a> that we use for decisioning. We want to provide the tools in the same ecosystem, in a consistent way, rather than have a large custom snowflake ecosystem where every tool needs its own custom deployment. Kubernetes gives us the ability to bring all of these together, so the richness of the open source and even the licensed community dealing with big data can be corralled."</p>
{{< case-studies/quote image="/images/case-studies/capitalone/banner3.jpg" >}}
"We want to provide the tools in the same ecosystem, in a consistent way, rather than have a large custom snowflake ecosystem where every tool needs its own custom deployment. Kubernetes gives us the ability to bring all of these together, so the richness of the open source and even the license community dealing with big data can be corralled."
{{< /case-studies/quote >}}
<p>In this first year, the impact has already been great. "Time to market is really huge for us," says Gasser. "Especially with fraud, you have to be very nimble in the way you respond to threats in the marketplace—being able to add and push new rules, detect new patterns of behavior, detect anomalies in account and transaction flows." With Kubernetes, "a team can come to us and we can have them up and running with a basic decisioning app in a fortnight, which before would have taken a whole quarter, if not longer. Kubernetes is a manifold productivity multiplier."</p>
<p>Teams now have the tools to be autonomous in their deployments, and as a result, deployments have increased by two orders of magnitude. "And that was with just seven dedicated resources, without needing a whole group sitting there watching everything," says Scrum Master Jamil Jadallah. "That's a huge cost savings. With the scalability, the management, the coordination, Kubernetes really empowers us and gives us more time back than we had before."</p>
{{< case-studies/quote image="/images/case-studies/capitalone/banner4.jpg" >}}
With Kubernetes, "a team can come to us and we can have them up and running with a basic decisioning app in a fortnight, which before would have taken a whole quarter, if not longer. Kubernetes is a manifold productivity multiplier."
{{< /case-studies/quote >}}
<p>Kubernetes has also been a great time-saver for Capital One's required periodic "rehydration" of clusters from base AMIs. To minimize the attack vulnerability profile for applications in the cloud, "Our entire clusters get rebuilt from scratch periodically, with new fresh instances and virtual server images that are patched with the latest and greatest security patches," says Gasser. This process used to take the better part of a day, and personnel, to do manually. It's now a quick Kubernetes job.</p>
<p>Savings extend to both capital and operating expenses. "It takes very little to get into Kubernetes because it's all open source," Gasser points out. "We went the DIY route for building our cluster, and we definitely like the flexibility of being able to embrace the latest from the community immediately without waiting for a downstream company to do it. There's capex related to those licenses that we don't have to pay for. Moreover, there's capex savings for us from some of the proprietary software that we get to sunset in our particular domain. So that goes onto our ledger in a positive way as well." (Some of those open source technologies include Prometheus, Fluentd, gRPC, Istio, CNI, and Envoy.)</p>
{{< case-studies/quote >}}
"If we had to do all of this without Kubernetes, on underlying cloud services, I could easily see our costs triple, quadruple what they are now for the amount of pure AWS expense. That doesn't account for personnel to deploy and maintain all the additional infrastructure."
{{< /case-studies/quote >}}
<p>And on the opex side, Gasser says, the savings are high. "We run dozens of services, we have scores of pods, many daemon sets, and since we're data-driven, we take advantage of EBS-backed volume claims for all of our stateful services. If we had to do all of this without Kubernetes, on underlying cloud services, I could easily see our costs triple, quadruple what they are now for the amount of pure AWS expense. That doesn't account for personnel to deploy and maintain all the additional infrastructure."</p>
<p>The team is confident that the benefits will continue to multiply—without a steep learning curve for the engineers being exposed to the new technology. "As we onboard additional tenants in this ecosystem, I think the need for folks to understand Kubernetes may not necessarily go up. In fact, I think it goes down, and that's good," says Gasser. "Because that really demonstrates the scalability of the technology. You start to reap the benefits, and they can concentrate on all the features they need to build for great decisioning in the business—fraud decisions, credit decisions—and not have to worry about, 'Is my AWS server broken? Is my pod not running?'"</p>

@ -1,85 +0,0 @@
---
title: Crowdfire Case Study
case_study_styles: true
cid: caseStudies
new_case_study_styles: true
heading_background: /images/case-studies/crowdfire/banner1.jpg
heading_title_logo: /images/crowdfire_logo.png
subheading: >
How to Keep Iterating a Fast-Growing App With a Cloud-Native Approach
case_study_details:
- Company: Crowdfire
- Location: Mumbai, India
- Industry: Social Media Software
---
<h2>Challenge</h2>
<p><a href="https://www.crowdfireapp.com/">Crowdfire</a> helps content creators create their content anywhere on the Internet and publish it everywhere else in the right format. Since its launch in 2010, it has grown to 16 million users. The product began as a monolith app running on <a href="https://cloud.google.com/appengine/">Google App Engine</a>, and in 2015, the company began a transformation to microservices running on Amazon Web Services <a href="https://aws.amazon.com/elasticbeanstalk/">Elastic Beanstalk</a>. "It was okay for our use cases initially, but as the number of services, development teams and scale increased, the deploy times, self-healing capabilities and resource utilization started to become problems for us," says Software Engineer Amanpreet Singh, who leads the infrastructure team for Crowdfire.</p>
<h2>Solution</h2>
<p>"We realized that we needed a more cloud-native approach to deal with these issues," says Singh. The team decided to implement a custom setup of Kubernetes based on <a href="https://www.terraform.io/">Terraform</a> and <a href="https://www.ansible.com/">Ansible</a>.</p>
<h2>Impact</h2>
<p>"Kubernetes has helped us reduce the deployment time from 15 minutes to less than a minute," says Singh. "Due to Kubernetes's self-healing nature, the operations team doesn't need to do any manual intervention in case of a node or pod failure." Plus, he says, "Dev-Prod parity has improved since developers can experiment with options in dev/staging clusters, and when it's finalized, they just commit the config changes in the respective code repositories. These changes automatically get replicated on the production cluster via CI/CD pipelines."</p>
{{< case-studies/quote author="Amanpreet Singh, Software Engineer at Crowdfire" >}}
"In the 15 months that we've been using Kubernetes, it has been amazing for us. It enabled us to iterate quickly, increase development speed, and continuously deliver new features and bug fixes to our users, while keeping our operational costs and infrastructure management overhead under control."
{{< /case-studies/quote >}}
{{< case-studies/lead >}}
"If you build it, they will come."
{{< /case-studies/lead >}}
<p>For most content creators, only half of that movie quote may ring true. Sure, platforms like WordPress, YouTube and Shopify have made it simple for almost anyone to start publishing new content online, but attracting an audience isn't as easy. Crowdfire "helps users publish their content to all possible places where their audience exists," says Amanpreet Singh, a Software Engineer at the company based in Mumbai, India. Crowdfire has gained more than 16 million users—from bloggers and artists to makers and small businesses—since its launch in 2010.</p>
<p>With that kind of growth—and a high demand from users for new features and continuous improvements—the Crowdfire team struggled to keep up behind the scenes. In 2015, they moved their monolith Java application to Amazon Web Services <a href="https://aws.amazon.com/elasticbeanstalk/">Elastic Beanstalk</a> and started breaking it down into microservices.</p>
<p>It was a good first step, but the team soon realized they needed to go further down the cloud-native path, which would lead them to Kubernetes. "It was okay for our use cases initially, but as the number of services and development teams increased and we scaled further, deploy times, self-healing capabilities and resource utilization started to become problematic," says Singh, who leads the infrastructure team at Crowdfire. "We realized that we needed a more cloud-native approach to deal with these issues."</p>
<p>As he looked around for solutions, Singh had a checklist of what Crowdfire needed. "We wanted to keep some things separate so they could be shipped independent of other things; this would help remove blockers and let different teams work at their own pace," he says. "We also make a lot of data-driven decisions, so shipping a feature and its iterations quickly was a must."</p>
<p>Kubernetes checked all the boxes and then some. "One of the best things was the built-in service discovery," he says. "When you have a bunch of microservices that need to call each other, having internal DNS readily available and service IPs and ports automatically set as environment variables help a lot." Plus, he adds, "Kubernetes's opinionated approach made it easier to get started."</p>
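<p>Concretely, a client Pod can locate a peer Service either through the environment variables Kubernetes injects at Pod start or through cluster DNS. The sketch below assumes a hypothetical Service named "user-analytics" in the "default" namespace; the naming conventions are standard Kubernetes behavior, while the service and URL are illustrative.</p>

```python
# Sketch of the two discovery mechanisms Singh describes, from a client Pod's
# point of view. Assumes a Service named "user-analytics" in "default".
import os

# 1. Environment variables injected by Kubernetes at Pod start (derived from
#    the Service name, uppercased with dashes turned into underscores):
host = os.environ.get("USER_ANALYTICS_SERVICE_HOST")   # the Service's cluster IP
port = os.environ.get("USER_ANALYTICS_SERVICE_PORT")

# 2. Cluster DNS, which also works for Services created after this Pod started:
dns_name = "user-analytics.default.svc.cluster.local"

url = f"http://{host or dns_name}:{port or 80}/metrics"
print(url)  # a microservice can now call its peer without any hard-coded IPs
```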
{{< case-studies/quote image="/images/case-studies/crowdfire/banner3.jpg" >}}
"We realized that we needed a more cloud-native approach to deal with these issues," says Singh. The team decided to implement a custom setup of Kubernetes based on Terraform and Ansible."
{{< /case-studies/quote >}}
<p>There was another compelling business reason for the cloud-native approach. "In today's world of ever-changing business requirements, using cloud native technology provides a variety of options to choose from—even the ability to run services in a hybrid cloud environment," says Singh. "Businesses can keep services in a region closest to the users, and thus benefit from high-availability and resiliency."</p>
<p>So in February 2016, Singh set up a test Kubernetes cluster using the kube-up scripts provided. "I explored the features and was able to deploy an application pretty easily," he says. "However, it seemed like a black box since I didn't understand the components completely, and had no idea what the kube-up script did under the hood. So when it broke, it was hard to find the issue and fix it."</p>
<p>To get a better understanding, Singh dove into the internals of Kubernetes, reading the docs and even some of the code. And he looked to the Kubernetes community for more insight. "I used to stay up a little late every night (a lot of users were active only when it's night here in India) and would try to answer questions on the Kubernetes community Slack from users who were getting started," he says. "I would also follow other conversations closely. I must admit I was able to avoid a lot of issues in our setup because I knew others had faced the same issues."</p>
<p>Based on the knowledge he gained, Singh decided to implement a custom setup of Kubernetes based on <a href="https://www.terraform.io/">Terraform</a> and <a href="https://www.ansible.com/">Ansible</a>. "I wrote Terraform to launch Kubernetes master and nodes (Auto Scaling Groups) and an Ansible playbook to install the required components," he says. (The company recently switched to using prebaked <a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html">AMIs</a> to make the node bringup faster, and is planning to change its networking layer.)</p>
{{< case-studies/quote image="/images/case-studies/crowdfire/banner4.jpg" >}}
"Kubernetes helped us reduce the deployment time from 15 minutes to less than a minute. Due to Kubernetes's self-healing nature, the operations team doesn't need to do any manual intervention in case of a node or pod failure."
{{< /case-studies/quote >}}
<p>First, the team migrated a few staging services from Elastic Beanstalk to the new Kubernetes staging cluster, and then set up a production cluster a month later to deploy some services. The results were convincing. "By the end of March 2016, we established that all the new services must be deployed on Kubernetes," says Singh. "Kubernetes helped us reduce the deployment time from 15 minutes to less than a minute. Due to Kubernetes's self-healing nature, the operations team doesn't need to do any manual intervention in case of a node or pod failure." On top of that, he says, "Dev-Prod parity has improved since developers can experiment with options in dev/staging clusters, and when it's finalized, they just commit the config changes in the respective code repositories. These changes automatically get replicated on the production cluster via CI/CD pipelines. This brings more visibility into the changes being made, and keeps an audit trail."</p>
<p>Over the next six months, the team worked on migrating all the services from Elastic Beanstalk to Kubernetes, except for the few that were deprecated and would soon be terminated anyway. The services were moved one at a time, and their performance was monitored for two to three days each. Today, "We're completely migrated and we run all new services on Kubernetes," says Singh.</p>
<p>The impact has been considerable: With Kubernetes, the company has experienced a 90% cost savings on Elastic Load Balancer, which is now only used for their public, user-facing services. Their EC2 operating expenses have been decreased by as much as 50%.</p>
<p>All 30 engineers at Crowdfire were onboarded at once. "I gave an internal talk where I shared the basic components and demoed the usage of kubectl," says Singh. "Everyone was excited and happy about using Kubernetes. Developers have more control and visibility into their applications running in production now. Most of all, they're happy with the low deploy times and self-healing services."</p>
<p>And they're much more productive, too. "Where we used to do about 5 deployments per day," says Singh, "now we're doing 30+ production and 50+ staging deployments almost every day."</p>
{{< case-studies/quote >}}
The impact has been considerable: With Kubernetes, the company has experienced a 90% cost savings on Elastic Load Balancer, which is now only used for their public, user-facing services. Their EC2 operating expenses have been decreased by as much as 50%.
{{< /case-studies/quote >}}
<p>Singh notes that almost all of the engineers interact with the staging cluster on a daily basis, and that has created a cultural change at Crowdfire. "Developers are more aware of the cloud infrastructure now," he says. "They've started following cloud best practices like better health checks, structured logs to stdout [standard output], and config via files or environment variables."</p>
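<p>A minimal "Kubernetes-ready" service following those practices (plus the graceful termination handling Singh mentions below) might look like the following standard-library-only Python sketch. The port, endpoint path, and log fields are illustrative assumptions, not Crowdfire's code.</p>

```python
# Illustrative sketch of the practices above: a /healthz endpoint for probes,
# structured JSON logs on stdout, config via environment variables, and a
# graceful exit on the SIGTERM Kubernetes sends before deleting a Pod.
import json
import os
import signal
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

PORT = int(os.environ.get("PORT", "8080"))  # config from the environment

def log(event, **fields):
    # Structured logs to stdout, ready for a cluster-level log collector.
    print(json.dumps({"ts": time.time(), "event": event, **fields}), flush=True)

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status = 200 if self.path == "/healthz" else 404  # probe target
        self.send_response(status)
        self.end_headers()
        self.wfile.write(b"ok" if status == 200 else b"not found")
        log("request", path=self.path, status=status)

    def log_message(self, fmt, *args):
        pass  # suppress the default unstructured stderr logging

server = HTTPServer(("", PORT), Handler)

def handle_sigterm(signum, frame):
    log("sigterm", msg="shutting down gracefully")
    # shutdown() blocks until serve_forever() exits, so call it off-thread.
    threading.Thread(target=server.shutdown).start()

signal.signal(signal.SIGTERM, handle_sigterm)
log("startup", port=PORT)
server.serve_forever()  # returns once shutdown() completes
log("exit")
```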
<p>With Crowdfire's commitment to Kubernetes, Singh is looking to expand the company's cloud-native stack. The team already uses <a href="https://prometheus.io/">Prometheus</a> for monitoring, and he says he is evaluating <a href="https://linkerd.io/">Linkerd</a> and <a href="https://envoyproxy.github.io/">Envoy Proxy</a> as a way to "get more metrics about request latencies and failures, and handle them better." Other CNCF projects, including <a href="http://opentracing.io/">OpenTracing</a> and <a href="https://grpc.io/">gRPC</a> are also on his radar.</p>
<p>Singh has found that the cloud-native community is growing in India, too, particularly in Bangalore. "A lot of startups and new companies are starting to run their infrastructure on Kubernetes," he says.</p>
<p>And when people ask him about Crowdfire's experience, he has this advice to offer: "Kubernetes is a great piece of technology, but it might not be right for you, especially if you have just one or two services or your app isn't easy to run in a containerized environment," he says. "Assess your situation and the value that Kubernetes provides before going all in. If you do decide to use Kubernetes, make sure you understand the components that run under the hood and what role they play in smoothly running the cluster. Another thing to consider is if your apps are 'Kubernetes-ready,' meaning if they have proper health checks and handle termination signals to shut down gracefully."</p>
<p>And if your company fits that profile, go for it. Crowdfire clearly did—and is now reaping the benefits. "In the 15 months that we've been using Kubernetes, it has been amazing for us," says Singh. "It enabled us to iterate quickly, increase development speed and continuously deliver new features and bug fixes to our users, while keeping our operational costs and infrastructure management overhead under control."</p>

@ -1,89 +0,0 @@
---
title: GolfNow Case Study
case_study_styles: true
cid: caseStudies
new_case_study_styles: true
heading_background: /images/case-studies/golfnow/banner1.jpg
heading_title_logo: /images/golfnow_logo.png
subheading: >
Saving Time and Money with Cloud Native Infrastructure
case_study_details:
- Company: GolfNow
- Location: Orlando, Florida
- Industry: Golf Industry Technology and Services Provider
---
<h2>Challenge</h2>
<p>A member of the <a href="http://www.nbcunicareers.com/our-businesses/nbc-sports-group">NBC Sports Group</a>, <a href="https://www.golfnow.com/">GolfNow</a> is the golf industry's technology and services leader, managing 10 different products, as well as the largest e-commerce tee time marketplace in the world. As its business began expanding rapidly and globally, GolfNow's monolithic application became problematic. "We kept growing our infrastructure vertically rather than horizontally, and the cost of doing business became problematic," says Sheriff Mohamed, GolfNow's Director, Architecture. "We wanted the ability to more easily expand globally."</p>
<h2>Solution</h2>
<p>Turning to microservices and containerization, GolfNow began moving its applications and databases from third-party services to its own clusters running on <a href="https://www.docker.com/">Docker</a> and <a href="http://kubernetes.io/">Kubernetes</a>.</p>
<h2>Impact</h2>
<p>The results were immediate. While maintaining the same capacity—and beyond, during peak periods—GolfNow saw its infrastructure costs for the first application virtually cut in half.</p>
{{< case-studies/quote author="SHERIFF MOHAMED, DIRECTOR, ARCHITECTURE AT GOLFNOW" >}}
"With our growth we obviously needed to expand our infrastructure, and we kept growing vertically rather than horizontally. We were basically wasting money and doubling the cost of our infrastructure."
{{< /case-studies/quote >}}
{{< case-studies/lead >}}
It's not every day that you can say you've slashed an operating expense by half.
{{< /case-studies/lead >}}
<p>But Sheriff Mohamed and Josh Chandler did just that when they helped lead their company, <a href="https://www.golfnow.com/">GolfNow</a>, on a journey from a monolithic to a containerized, cloud native infrastructure managed by Kubernetes.</p>
<p>A top-performing business within the NBC Sports Group, GolfNow is a technology and services company with the largest tee time marketplace in the world. GolfNow serves 5 million active golfers across 10 different products. In recent years, the business had grown so fast that the infrastructure supporting their giant monolithic application (written in C#.NET and backed by SQL Server database management system) could not keep up. "With our growth we obviously needed to expand our infrastructure, and we kept growing vertically rather than horizontally," says Sheriff, GolfNow's Director, Architecture. "Our costs were growing exponentially. And on top of that, we had to build a Disaster Recovery (DR) environment, which then meant we'd have to copy exactly what we had in our original data center to another data center that was just the standby. We were basically wasting money and doubling the cost of our infrastructure."</p>
<p>In moving just the first of GolfNow's important applications—a booking engine for golf courses and B2B marketing platform—from third-party services to their own Kubernetes environment, "our bill went down drastically," says Sheriff.</p>
<p>The path to those stellar results began in late 2014. In order to support GolfNow's global growth, the team decided that the company needed to have multiple data centers and the ability to quickly and easily re-route traffic as needed. "From there we knew that we needed to go in a direction of breaking things apart, microservices, and containerization," says Sheriff. "At the time we were trying to get away from <a href="https://www.microsoft.com/net">C#.NET</a> and <a href="https://www.microsoft.com/en-cy/sql-server/sql-server-2016">SQL Server</a> since it didn't run very well on Linux, where everything container-related was running smoothly."</p>
<p>To that end, the team shifted to working with <a href="https://nodejs.org/">Node.js</a>, the open-source, cross-platform JavaScript runtime environment for developing tools and applications, and <a href="https://www.mongodb.com/">MongoDB</a>, the open-source database program. At the time, <a href="https://www.docker.com/">Docker</a>, the platform for deploying applications in containers, was still new. But once the team began experimenting with it, Sheriff says, "we realized that was the way we wanted to go, especially since that's the way the industry is heading."</p>
{{< case-studies/quote image="/images/case-studies/golfnow/banner3.jpg" >}}
"The team migrated the rest of the application into their Kubernetes cluster. And the impact was immediate: On top of cutting monthly costs by a large percentage, says Sheriff, 'Running at the same capacity and during our peak time, we were able to horizontally grow. Since we were using our VMs more efficiently with containers, we didn't have to pay extra money at all.'"
{{< /case-studies/quote >}}
<p>GolfNow's dev team ran an "internal, low-key" proof of concept and were won over. "We really liked how easy it was to be able to pass containers around to each other and have them up and running in no time, exactly the way it was running on my machine," says Sheriff. "Because that is always the biggest gripe that Ops has with developers, right? 'It worked on my machine!' But then we started getting to the point of, 'How do we make sure that these things stay up and running?'"</p>
<p>That led the team on a quest to find the right orchestration system for the company's needs. Sheriff says the first few options they tried were either too heavy or "didn't feel quite right." In late summer 2015, they discovered the just-released <a href="http://kubernetes.io/">Kubernetes</a>, which Sheriff immediately liked for its ease of use. "We did another proof of concept," he says, "and Kubernetes won because of the fact that the community backing was there, built on top of what Google had already done."</p>
<p>But before they could go with Kubernetes, <a href="http://www.nbc.com/">NBC</a>, GolfNow's parent company, also asked them to comparison shop with another company. Sheriff and his team liked the competing company's platform user interface, but didn't like that its platform would not allow containers to run natively on Docker. With no clear decision in sight, Sheriff's VP at GolfNow, Steve McElwee, set up a three-month trial during which a GolfNow team (consisting of Sheriff and Josh, who's now Lead Architect, Open Platforms) would build out a Kubernetes environment, and a large NBC team would build out one with the other company's platform.</p>
<p>"We spun up the cluster and we tried to get everything to run the way we wanted it to run," Sheriff says. "The biggest thing that we took away from it is that not only did we want our applications to run within Kubernetes and Docker, we also wanted our databases to run there. We literally wanted our entire infrastructure to run within Kubernetes."</p>
<p>At the time there was nothing in the community to help them get Kafka and MongoDB clusters running within a Kubernetes and Docker environment, so Sheriff and Josh figured it out on their own, taking a full month to get it right. "Everything started rolling from there," Sheriff says. "We were able to get all our applications connected, and we finished our side of the proof of concept a month in advance. My VP was like, 'Alright, it's over. Kubernetes wins.'"</p>
<p>The next step, beginning in January 2016, was getting everything working in production. The team focused first on one application that was already written in Node.js and MongoDB. A booking engine for golf courses and B2B marketing platform, the application was already going in the microservice direction but wasn't quite finished yet. At the time, it was running in <a href="https://devcenter.heroku.com/articles/mongohq">Heroku and Compose</a> and other third-party services—resulting in a large monthly bill.</p>
{{< case-studies/quote image="/images/case-studies/golfnow/banner4.jpg" >}}
"'The time I spent actually moving the applications was under 30 seconds! We can move data centers in just incredible amounts of time. If you haven't come from the Kubernetes world you wouldn't believe me.' Sheriff puts it in these terms: 'Before Kubernetes I wasn't sleeping at night, literally. I was woken up all the time, because things were down. After Kubernetes, I've been sleeping at night.'"
{{< /case-studies/quote >}}
<p>"The goal was to take all of that out and put it within this new platform we've created with Kubernetes on <a href="https://cloud.google.com/compute/">Google Compute Engine (GCE)</a>," says Sheriff. "So we ended up building piece by piece, in parallel, what was out in Heroku and Compose, in our Kubernetes cluster. Then, literally, just switched configs in the background. So in Heroku we had the app running hitting a Compose database. We'd take the config, change it and make it hit the database that was running in our cluster."</p>
<p>Using this procedure, they were able to migrate piecemeal, without any downtime. The first migration was done during off hours, but to test the limits, the team migrated the second database in the middle of the day, when lots of users were running the application. "We did it," Sheriff says, "and again it was successful. Nobody noticed."</p>
<p>After three weeks of monitoring to make sure everything was running stable, the team migrated the rest of the application into their Kubernetes cluster. And the impact was immediate: On top of cutting monthly costs by a large percentage, says Sheriff, "Running at the same capacity and during our peak time, we were able to horizontally grow. Since we were using our VMs more efficiently with containers, we didn't have to pay extra money at all."</p>
<p>Not only were they saving money, but they were also saving time. "I had a meeting this morning about migrating some applications from one cluster to another," says Josh. "I spent about 2 hours explaining the process. The time I spent actually moving the applications was under 30 seconds! We can move data centers in just incredible amounts of time. If you haven't come from the Kubernetes world you wouldn't believe me." Sheriff puts it in these terms: "Before Kubernetes I wasn't sleeping at night, literally. I was woken up all the time, because things were down. After Kubernetes, I've been sleeping at night."</p>
<p>A small percentage of the applications on GolfNow have been migrated over to the Kubernetes environment. "Our Core Team is rewriting a lot of the .NET applications into <a href="https://www.microsoft.com/net/core">.NET Core</a> [which is compatible with Linux and Docker] so that we can run them within containers," says Sheriff.</p>
<p>Looking ahead, Sheriff and his team want to spend 2017 continuing to build a whole platform around Kubernetes with <a href="https://github.com/drone/drone">Drone</a>, an open-source continuous delivery platform, to make it more developer-centric. "Now they're able to manage configuration, they're able to manage their deployments and things like that, making all these subteams that are now creating all these microservices, be self sufficient," he says. "So it can pull us away from applications and allow us to just make sure the cluster is running and healthy, and then actually migrate that over to our Ops team."</p>
{{< case-studies/quote >}}
"Having gone from complete newbies to production-ready in three months, the GolfNow team is eager to encourage other companies to follow their lead. 'This is The Six Million Dollar Man of the cloud right now,' adds Josh. 'Just try it out, watch it happen. I feel like the proof is in the pudding when you look at these kinds of application stacks. They're faster, they're more resilient.'"
{{< /case-studies/quote >}}
<p>And long-term, Sheriff has an even bigger goal for getting more people into the Kubernetes fold. "We're actually trying to make this platform generic enough so that any of our sister companies can use it if they wish," he says. "Most definitely I think it can be used as a model. I think the way we migrated into it, the way we built it out, are all ways that I think other companies can learn from, and should not be afraid of."</p>
<p>The GolfNow team is also giving back to the Kubernetes community by open-sourcing a bot framework that Josh built. "We noticed that the dashboard user interface is actually moving a lot faster than when we started," says Sheriff. "However, we realized what we needed was something that's more of a bot that really helps us administer Kubernetes as a whole through Slack." Josh explains: "With the Kubernetes-Slack integration, you can essentially hook into a cluster and then issue commands and edit configurations. We've tried to simplify the security configuration as much as possible. We hope this will be our major thank you to Kubernetes, for everything you've given us."</p>
<p>Having gone from complete newbies to production-ready in three months, the GolfNow team is eager to encourage other companies to follow their lead. The lessons they've learned: "You've got to have buy-in from your boss," says Sheriff. "Another big deal is having two to three people dedicated to this type of endeavor. You can't have people who are half in, half out." And if you don't have buy-in from the get-go, proving it out will get you there.</p>
<p>"This is The Six Million Dollar Man of the cloud right now," adds Josh. "Just try it out, watch it happen. I feel like the proof is in the pudding when you look at these kinds of application stacks. They're faster, they're more resilient."</p>


@ -1,85 +0,0 @@
---
title: Haufe Group Case Study
case_study_styles: true
cid: caseStudies
new_case_study_styles: true
heading_background: /images/case-studies/haufegroup/banner1.jpg
heading_title_logo: /images/haufegroup_logo.png
subheading: >
Paving the Way for Cloud Native for Midsize Companies
case_study_details:
- Company: Haufe Group
- Location: Freiburg, Germany
- Industry: Media and Software
---
<h2>Challenge</h2>
<p>Founded in 1930 as a traditional publisher, Haufe Group has grown into a media and software company with 95 percent of its sales from digital products. Over the years, the company has gone from having "hardware in the basement" to outsourcing its infrastructure operations and IT. More recently, the development of new products, from Internet portals for tax experts to personnel training software, has created demands for increased speed, reliability and scalability. "We need to be able to move faster," says Solution Architect Martin Danielsson. "Adapting workloads is something that we really want to be able to do."</p>
<h2>Solution</h2>
<p>Haufe Group began its cloud-native journey when <a href="https://azure.microsoft.com/">Microsoft Azure</a> became available in Europe; the company needed cloud deployments for its desktop apps with bandwidth-heavy download services. "After that, it has been different projects trying out different things," says Danielsson. Two years ago, Holger Reinhardt joined Haufe Group as CTO and rapidly re-oriented the traditional host provider-based approach toward a cloud and API-first strategy.</p>
<p>A core part of this strategy was a strong mandate to embrace infrastructure-as-code across the entire software deployment lifecycle via Docker. The company is now getting ready to go live with two services in production using <a href="https://kubernetes.io/">Kubernetes</a> orchestration on <a href="https://azure.microsoft.com/">Microsoft Azure</a> and <a href="https://aws.amazon.com/">Amazon Web Services</a>. The team is also working on breaking up one of their core Java Enterprise desktop products into microservices to allow for better evolvability and dynamic scaling in the cloud.</p>
<h2>Impact</h2>
<p>With the ability to adapt workloads, Danielsson says, teams "will be able to scale down to around half the capacity at night, saving 30 percent of the hardware cost." Plus, shorter release times have had a major impact. "Before, we had to announce at least a week in advance when we wanted to do a release because there was a huge checklist of things that you had to do," he says. "By going cloud native, we have the infrastructure in place to be able to automate all of these things. Now we can get a new release done in half an hour instead of days."</p>
{{< case-studies/quote author="Martin Danielsson, Solution Architect, Haufe Group" >}}
"Over the next couple of years, people won't even think that much about it when they want to run containers. Kubernetes is going to be the go-to solution."
{{< /case-studies/quote >}}
{{< case-studies/lead >}}
More than 80 years ago, Haufe Group was founded as a traditional publishing company, printing books and commentary on paper.
{{< /case-studies/lead >}}
<p>By the 1990s, though, the company's leaders recognized that the future was digital, and to their credit, were able to transform Haufe Group into a media and software business that now gets 95 percent of its sales from digital products. "Among the German companies doing this, we were one of the early adopters," says Martin Danielsson, Solution Architect for Haufe Group.</p>
<p>And now they're leading the way for midsize companies embracing cloud-native technology like Kubernetes. "The really big companies like Ticketmaster and Google get it right, and the startups get it right because they're faster," says Danielsson. "We're in this big lump of companies in the middle with a lot of legacy, a lot of structure, a lot of culture that does not easily fit the cloud technologies. We're just 1,500 people, but we have hundreds of customer-facing applications. So we're doing things that will be relevant for many companies of our size or even smaller."</p>
<p>Many of those legacy challenges stemmed from simply following the technology trends of the times. "We used to do full DevOps," he says. In the 1990s and 2000s, "that meant that you had your hardware in the basement. And then 10 years ago, the hype of the moment was to outsource application operations, outsource everything, and strip down your IT department to take away the distraction of all these hardware things. That's not our area of expertise. We didn't want to be an infrastructure provider. And now comes the backlash of that."</p>
<p>Haufe Group began feeling the pain as it developed more new products, from Internet portals for tax experts to personnel training software, that created demands for increased speed, reliability and scalability. "Right now, we have this break in workflows, where we go from writing concepts to developing, handing it over to production and then handing that over to your host provider," he says. "And then when things go bad we have no clue what went wrong. We definitely want to take back control, and we want to move a lot faster. Adapting workloads is something that we really want to be able to do."</p>
<p>Those needs led them to explore cloud-native technology. Their first foray into the cloud was doing deployments in <a href="https://azure.microsoft.com/">Microsoft Azure</a>, once it became available in Europe, for desktop products that had built-in download services. Hosting expenses for such bandwidth-heavy services were too high, so the company turned to the cloud. "After that, it has been different projects trying out different things," says Danielsson.</p>
{{< case-studies/quote image="/images/case-studies/haufegroup/banner3.jpg" >}}
"We have been doing containers for the last two years, and we really got the hang of how they work," says Danielsson. "But it was always for development and test, never in production, because we didn't fully understand how that would work. And to me, Kubernetes was definitely the technology that solved that."
{{< /case-studies/quote >}}
<p>Two years ago, Holger Reinhardt joined Haufe Group as CTO and rapidly re-oriented the traditional host provider-based approach toward a cloud and API-first strategy. A core part of this strategy was a strong mandate to embrace infrastructure-as-code across the entire software deployment lifecycle via Docker. Some experiments went further than others; German regulations about sensitive data proved to be a road block in moving some workloads to Azure and Amazon Web Services. "Due to our history, Germany is really strict with things like personally identifiable data," Danielsson says.</p>
<p>These experiments took on new life with the arrival of the Azure Sovereign Cloud for Germany (an Azure clone run by the German T-Systems provider). With the availability of Azure.de—which conforms to Germany's privacy regulations—teams started to seriously consider deploying production loads in Docker into the cloud. "We have been doing containers for the last two years, and we really got the hang of how they work," says Danielsson. "But it was always for development and test, never in production, because we didn't fully understand how that would work. And to me, Kubernetes was definitely the technology that solved that."</p>
<p>In parallel, Danielsson had built an API management system with the aim of supporting CI/CD scenarios, aspects of which were missing in off-the-shelf API management products. With a foundation based on <a href="https://getkong.org/">Mashape's Kong</a> gateway, it is open-sourced as <a href="http://wicked.haufe.io/">wicked.haufe.io</a>. He put wicked.haufe.io to use with his product team.<br><br> Otherwise, Danielsson says his philosophy was "don't try to reinvent the wheel all the time. Go for what's there and 99 percent of the time it will be enough. And if you think you really need something custom or additional, think perhaps once or twice again. One of the things that I find so amazing with this cloud-native framework is that everything ties in."</p>
<p>Currently, Haufe Group is working on two projects using Kubernetes in production. One is a new mobile application for researching legislation and tax laws. "We needed a way to take out functionality from a legacy core and put an application on top of that with an API gateway—a lot of moving parts that screams containers," says Danielsson. So the team moved the build pipeline away from "deploying to some old, huge machine that you could deploy anything to" and onto a Kubernetes cluster where there would be automatic CI/CD "with feature branches and all these things that were a bit tedious in the past."</p>
{{< case-studies/quote image="/images/case-studies/haufegroup/banner4.jpg" >}}
"Before, we had to announce at least a week in advance when we wanted to do a release because there was a huge checklist of things that you had to do," says Danielsson. "By going cloud native, we have the infrastructure in place to be able to automate all of these things. Now we can get a new release done in half an hour instead of days."
{{< /case-studies/quote >}}
<p>It was a proof of concept effort, and the proof was in the pudding. "Everyone was really impressed at what we accomplished in a week," says Danielsson. "We did these kinds of integrations just to make sure that we got a handle on how Kubernetes works. If you can create optimism and buzz around something, it's half won. And if the developers and project managers know this is working, you're more or less done." Adds Reinhardt: "You need to create some very visible, quick wins in order to overcome the status quo."</p>
<p>The impact on the speed of deployment was clear: "Before, we had to announce at least a week in advance when we wanted to do a release because there was a huge checklist of things that you had to do," says Danielsson. "By going cloud native, we have the infrastructure in place to be able to automate all of these things. Now we can get a new release done in half an hour instead of days."</p>
<p>The potential impact on cost was another bonus. "Hosting applications is quite expensive, so moving to the cloud is something that we really want to be able to do," says Danielsson. With the ability to adapt workloads, teams "will be able to scale down to around half the capacity at night, saving 30 percent of the hardware cost."</p>
<p>Just as importantly, Danielsson says, there's added flexibility: "When we try to move or rework applications that are really crucial, it's often tricky to validate whether the path we want to take is going to work out well. In order to validate that, we would need to reproduce the environment and really do testing, and that's prohibitively expensive and simply not doable with traditional host providers. Cloud native gives us the ability to do risky changes and validate them in a cost-effective way."</p>
<p>As word of the two successful test projects spread throughout the company, interest in Kubernetes has grown. "We want to be able to support our developers in running Kubernetes clusters but we're not there yet, so we allow them to do it as long as they're aware that they are on their own," says Danielsson. "So that's why we are also looking at things like [the managed Kubernetes platform] <a href="https://coreos.com/tectonic/">CoreOS Tectonic</a>, <a href="https://azure.microsoft.com/en-us/services/container-service/">Azure Container Service</a>, <a href="https://aws.amazon.com/ecs/">ECS</a>, etc. These kinds of services will be a lot more relevant to midsize companies that want to leverage cloud native but don't have the IT departments or the structure around that."</p>
<p>In the next year and a half, Danielsson says the company will be working on moving one of their legacy desktop products, a web app for researching legislation and tax laws originally built in Java Enterprise, onto cloud-native technology. "We're doing a microservice split out right now so that we can independently deploy the different parts," he says. The main website, which provides free content for customers, is also moving to cloud native.</p>
{{< case-studies/quote >}}
"the execution of a strategy requires alignment of culture, structure and technology. Only if those three dimensions are aligned can you successfully execute a transformation into microservices and cloud-native architectures. And it is only then that the Cloud will pay the dividends in much faster speeds in product innovation and much lower operational costs."
{{< /case-studies/quote >}}
<p>But with these goals, Danielsson believes there are bigger cultural challenges that need to be constantly addressed. The move to new technology, not to mention a shift toward DevOps, means a lot of change for employees. "The roles were rather fixed in the past," he says. "You had developers, you had project leads, you had testers. And now you get into these really, really important things like test automation. Testers aren't actually doing click testing anymore, and they have to write automated testing. And if you really want to go full-blown CI/CD, all these little pieces have to work together so that you get the confidence to do a check in, and know this check in is going to land in production, because if I messed up, some test is going to break. This is a really powerful thing because whatever you do, whenever you merge something into the trunk or to the master, this is going live. And that's where you either get the people or they run away screaming." Danielsson understands that it may take some people much longer to get used to the new ways.</p>
<p>"Culture is nothing that you can force on people," he says. "You have to live it for yourself. You have to evangelize. You have to show the advantages time and time again: This is how you can do it, this is what you get from it." To that end, his team has scheduled daylong workshops for the staff, bringing in outside experts to talk about everything from API to Devops to cloud.</p>
<p>For every person who runs away screaming, many others get drawn in. "Get that foot in the door and make them really interested in this stuff," says Danielsson. "Usually it catches on. We have people you never would have expected chanting, 'Docker Docker Docker' now. It's cool to see them realize that there is a world outside of their Python libraries. It's awesome to see them really work with Kubernetes."</p>
<p>Ultimately, Reinhardt says, "the execution of a strategy requires alignment of culture, structure and technology. Only if those three dimensions are aligned can you successfully execute a transformation into microservices and cloud-native architectures. And it is only then that the Cloud will pay the dividends in much faster speeds in product innovation and much lower operational costs."</p>


@ -1,73 +0,0 @@
---
title: Huawei Case Study
case_study_styles: true
cid: caseStudies
new_case_study_styles: true
heading_background: /images/case-studies/huawei/banner1.jpg
heading_title_logo: /images/huawei_logo.png
subheading: >
Embracing Cloud Native as a User and a Vendor
case_study_details:
- Company: Huawei
- Location: Shenzhen, China
- Industry: Telecommunications Equipment
---
<h2>Challenge</h2>
<p>A multinational company that's the largest telecommunications equipment manufacturer in the world, Huawei has more than 180,000 employees. In order to support its fast business development around the globe, <a href="http://www.huawei.com/">Huawei</a> has eight data centers for its internal I.T. department, which have been running 800+ applications in 100K+ VMs to serve these 180,000 users. With the rapid increase of new applications, the cost and efficiency of management and deployment of VM-based apps all became critical challenges for business agility. "It's very much a distributed system so we found that managing all of the tasks in a more consistent way is always a challenge," says Peixin Hou, the company's Chief Software Architect and Community Director for Open Source. "We wanted to move into a more agile and decent practice."</p>
<h2>Solution</h2>
<p>After deciding to use container technology, Huawei began moving the internal I.T. department's applications to run on <a href="http://kubernetes.io/">Kubernetes</a>. So far, about 30 percent of these applications have been transferred to cloud native.</p>
<h2>Impact</h2>
<p>"By the end of 2016, Huawei's internal I.T. department managed more than 4,000 nodes with tens of thousands containers using a Kubernetes-based Platform as a Service (PaaS) solution," says Hou. "The global deployment cycles decreased from a week to minutes, and the efficiency of application delivery has been improved 10 fold." For the bottom line, he says, "We also see significant operating expense spending cut, in some circumstances 20-30 percent, which we think is very helpful for our business." Given the results Huawei has had internally and the demand it is seeing externally the company has also built the technologies into <a href="http://developer.huawei.com/ict/en/site-paas">FusionStage™</a>, the PaaS solution it offers its customers.</p>
{{< case-studies/quote author="Peixin Hou, chief software architect and community director for open source" >}}
"If you're a vendor, in order to convince your customer, you should use it yourself. Luckily because Huawei has a lot of employees, we can demonstrate the scale of cloud we can build using this technology."
{{< /case-studies/quote >}}
<p>Huawei's Kubernetes journey began with one developer. Over two years ago, one of the engineers employed by the networking and telecommunications giant became interested in <a href="http://kubernetes.io/">Kubernetes</a>, the technology for managing application containers across clusters of hosts, and started contributing to its open source community. As the technology developed and the community grew, he kept telling his managers about it.</p>
<p>And as fate would have it, at the same time, Huawei was looking for a better orchestration system for its internal enterprise I.T. department, which supports every business flow processing. "We have more than 180,000 employees worldwide, and a complicated internal procedure, so probably every week this department needs to develop some new applications," says Peixin Hou, Huawei's Chief Software Architect and Community Director for Open Source. "Very often our I.T. departments need to launch tens of thousands of containers, with tasks running across thousands of nodes across the world. It's very much a distributed system, so we found that managing all of the tasks in a more consistent way is always a challenge."</p>
<p>In the past, Huawei had used virtual machines to encapsulate applications, but "every time when we start a VM," Hou says, "whether because it's a new service or because it was a service that was shut down because of some abnormal node functioning, it takes a lot of time." Huawei turned to containerization, so the timing was right to try Kubernetes. It took a year to adopt that engineer's suggestion (the process "is not overnight," says Hou), but once in use, he says, "Kubernetes basically solved most of our problems. Before, the time of deployment took about a week, now it only takes minutes. The developers are happy. That department is also quite happy."</p>
<p>Hou sees great benefits to the company that come with using this technology: "Kubernetes brings agility, scale-out capability, and DevOps practice to the cloud-based applications," he says. "It provides us with the ability to customize the scheduling architecture, which makes possible the affinity between container tasks that gives greater efficiency. It supports multiple container formats. It has extensive support for various container networking solutions and container storage."</p>
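<p>The "affinity between container tasks" Hou mentions is expressed in Kubernetes through pod affinity rules. Below is a minimal, hypothetical sketch (not Huawei's configuration): it asks the scheduler to place a worker Pod on the same node as Pods labeled app=cache, so tightly coupled tasks run side by side. The names and image are placeholders.</p>

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: worker                           # hypothetical name
spec:
  affinity:
    podAffinity:
      # Only schedule this Pod onto a node already running a cache Pod.
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: cache
        topologyKey: kubernetes.io/hostname
  containers:
  - name: worker
    image: example.com/worker:1.0        # placeholder image
```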
{{< case-studies/quote image="/images/case-studies/huawei/banner3.jpg" >}}
"Kubernetes basically solved most of our problems. Before, the time of deployment took about a week, now it only takes minutes. The developers are happy. That department is also quite happy."
{{< /case-studies/quote >}}
<p>And not least of all, there's an impact on the bottom line. Says Hou: "We also see significant operating expense spending cut, in some circumstances 20-30 percent, which is very helpful for our business."</p>
<p>Pleased with those initial results, and seeing a demand for cloud native technologies from its customers, Huawei doubled down on Kubernetes. In the spring of 2016, the company became not only a user but also a vendor.</p>
<p>"We built the Kubernetes technologies into our solutions," says Hou, referring to Huawei's <a href="http://developer.huawei.com/ict/en/site-paas">FusionStage™</a> PaaS offering. "Our customers, from very big telecommunications operators to banks, love the idea of cloud native. They like Kubernetes technology. But they need to spend a lot of time to decompose their applications to turn them into microservice architecture, and as a solution provider, we help them. We've started to work with some Chinese banks, and we see a lot of interest from our customers like <a href="http://www.chinamobileltd.com/">China Mobile</a> and <a href="https://www.telekom.com/en">Deutsche Telekom</a>."</p>
<p>"If you're just a user, you're just a user," adds Hou. "But if you're a vendor, in order to even convince your customers, you should use it yourself. Luckily because Huawei has a lot of employees, we can demonstrate the scale of cloud we can build using this technology. We provide customer wisdom." While Huawei has its own private cloud, many of its customers run cross-cloud applications using Huawei's solutions. It's a big selling point that most of the public cloud providers now support Kubernetes. "This makes the cross-cloud transition much easier than with other solutions," says Hou.</p>
{{< case-studies/quote image="/images/case-studies/huawei/banner4.jpg" >}}
"Our customers, from very big telecommunications operators to banks, love the idea of cloud native. They like Kubernetes technology. But they need to spend a lot of time to decompose their applications to turn them into microservice architecture, and as a solution provider, we help them."
{{< /case-studies/quote >}}
<p>Within Huawei itself, once his team completes the transition of the internal business procedure department to Kubernetes, Hou is looking to convince more departments to move over to the cloud native development cycle and practice. "We have a lot of software developers, so we will provide them with our platform as a service solution, our own product," he says. "We would like to see significant cuts in their iteration cycle."</p>
<p>Having overseen the initial move to Kubernetes at Huawei, Hou has advice for other companies considering the technology: "When you start to design the architecture of your application, think about cloud native, think about microservice architecture from the beginning," he says. "I think you will benefit from that."</p>
<p>But if you already have legacy applications, "start from some microservice-friendly part of those applications first, parts that are relatively easy to be decomposed into simpler pieces and are relatively lightweight," Hou says. "Don't think from day one that within how many days I want to move the whole architecture, or move everything into microservices. Don't put that as a kind of target. You should do it in a gradual manner. And I would say for legacy applications, not every piece would be suitable for microservice architecture. No need to force it."</p>
<p>After all, as enthusiastic as Hou is about Kubernetes at Huawei, he estimates that "in the next 10 years, maybe 80 percent of the workload can be distributed, can be run on the cloud native environments. There's still 20 percent that's not, but it's fine. If we can make 80 percent of our workload really be cloud native, to have agility, it's a much better world at the end of the day."</p>
{{< case-studies/quote >}}
"In the next 10 years, maybe 80 percent of the workload can be distributed, can be run on the cloud native environments. There's still 20 percent that's not, but it's fine. If we can make 80 percent of our workload really be cloud native, to have agility, it's a much better world at the end of the day."
{{< /case-studies/quote >}}
<p>In the nearer future, Hou is looking forward to new features that are being developed around Kubernetes, not least of all the ones that Huawei is contributing to. Huawei engineers have worked on the federation feature (which puts multiple Kubernetes clusters in a single framework to be managed seamlessly), scheduling, container networking and storage, and a just-announced technology called <a href="http://containerops.org/">Container Ops</a>, which is a DevOps pipeline engine. "This will put every DevOps job into a container," he explains. "And then this container mechanism is running using Kubernetes, but is also used to test Kubernetes. With that mechanism, we can make the containerized DevOps jobs be created, shared and managed much more easily than before."</p>
<p>Still, Hou sees this technology as only halfway to its full potential. First and foremost, he'd like to expand the scale it can orchestrate, which is important for supersized companies like Huawei as well as some of its customers.</p>
<p>Hou proudly notes that two years after that first Huawei engineer became a contributor to and evangelist for Kubernetes, Huawei is now a top contributor to the community. "We've learned that the more you contribute to the community," he says, "the more you get back."</p>


@ -1,80 +0,0 @@
---
title: IBM Case Study
linkTitle: IBM
case_study_styles: true
cid: caseStudies
logo: ibm_featured_logo.svg
featured: false
new_case_study_styles: true
heading_background: /images/case-studies/ibm/banner1.jpg
heading_title_logo: /images/ibm_logo.png
subheading: >
Building an Image Trust Service on Kubernetes with Notary and TUF
case_study_details:
- Company: IBM
- Location: Armonk, New York
- Industry: Cloud Computing
---
<h2>Challenge</h2>
<p><a href="https://www.ibm.com/cloud/">IBM Cloud</a> offers public, private, and hybrid cloud functionality across a diverse set of runtimes from its OpenWhisk-based function as a service (FaaS) offering, managed <a href="https://kubernetes.io">Kubernetes</a> and containers, to <a href="https://www.cloudfoundry.org">Cloud Foundry</a> platform as a service (PaaS). These runtimes are combined with the power of the company's enterprise technologies, such as MQ and DB2, its modern artificial intelligence (AI) Watson, and data analytics services. Users of IBM Cloud can exploit capabilities from more than 170 different cloud native services in its catalog, including capabilities such as IBM's Weather Company API and data services. In the later part of 2017, the IBM Cloud Container Registry team wanted to build out an image trust service.</p>
<h2>Solution</h2>
<p>The work on this new service culminated with its public availability in the IBM Cloud in February 2018. The image trust service, called Portieris, is fully based on the <a href="https://www.cncf.io">Cloud Native Computing Foundation (CNCF)</a> open source project <a href="https://github.com/theupdateframework/notary">Notary</a>, according to Michael Hough, a software developer with the IBM Cloud Container Registry team. Portieris is a Kubernetes admission controller for enforcing content trust. Users can create image security policies for each Kubernetes namespace, or at the cluster level, and enforce different levels of trust for different images. Portieris is a key part of IBM's trust story, since it makes it possible for users to consume the company's Notary offering from within their IBM Cloud Kubernetes Service (IKS) clusters. The Notary server runs in IBM's cloud, while Portieris runs inside the IKS cluster, letting the cluster verify that the images it loads containers from contain exactly what users expect them to; Portieris is what allows an IKS cluster to apply that verification.</p>
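<p>In practice, such a policy is a custom resource that Portieris enforces at admission time. The sketch below shows the general shape of a namespace-scoped ImagePolicy requiring Notary-signed images from one registry path; exact field names and API versions vary across Portieris releases, and the namespace, registry path, and trust server here are placeholders.</p>

```yaml
apiVersion: portieris.cloud.ibm.com/v1
kind: ImagePolicy
metadata:
  name: signed-images-only          # hypothetical name
  namespace: production             # the policy applies to this namespace
spec:
  repositories:
  - name: "icr.io/example/*"        # placeholder registry namespace
    policy:
      trust:
        enabled: true               # only admit images with valid signatures
        trustServer: "https://icr.io:4443"   # placeholder Notary endpoint
```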
<h2>Impact</h2>
<p>IBM's intention in offering a managed Kubernetes container service and image registry is to provide a fully secure end-to-end platform for its enterprise customers. "Image signing is one key part of that offering, and our container registry team saw Notary as the de facto way to implement that capability in the current Docker and container ecosystem," Hough says. The company had not been offering image signing before, and Notary is the tool it used to implement that capability. "We had a multi-tenant Docker Registry with private image hosting," Hough says. "The Docker Registry uses hashes to ensure that image content is correct, and data is encrypted both in flight and at rest. But it does not provide any guarantees of who pushed an image. We used Notary to enable users to sign images in their private registry namespaces if they so choose."</p>
{{< case-studies/quote author="Michael Hough, a software developer with the IBM Container Registry team" >}}
"We see CNCF as a safe haven for cloud native open source, providing stability, longevity, and expected maintenance for member projects—no matter the originating vendor or project."
{{< /case-studies/quote >}}
{{< case-studies/lead >}}
Docker had already created the Notary project as an implementation of <a href="https://github.com/theupdateframework/specification">The Update Framework (TUF)</a>, and this implementation of TUF provided the capabilities for Docker Content Trust.
{{< /case-studies/lead >}}
<p>"After contribution to CNCF of both TUF and Notary, we perceived that it was becoming the de facto standard for image signing in the container ecosystem", says Michael Hough, a software developer with the IBM Cloud Container Registry team.</p>
<p>The key reason for selecting Notary was that it was already compatible with the existing authentication stack IBM's container registry was using. So was the design of TUF, which does not require the registry team to enter the business of key management. Both of these were "attractive design decisions that confirmed our choice of Notary," he says.</p>
<p>The introduction of Notary to implement image signing capability in IBM Cloud encourages increased security across IBM's cloud platform, "where we expect it will include both the signing of official IBM images as well as expected use by security-conscious enterprise customers," Hough says. "When combined with security policy implementations, we expect an increased use of deployment policies in CI/CD pipelines that allow for fine-grained control of service deployment based on image signers."</p>
<p>The availability of image signing "is a huge benefit to security-conscious customers who require this level of image provenance and security," Hough says. "With our IBM Cloud Kubernetes as-a-service offering and the admission controller we have made available, it allows both IBM services as well as customers of the IBM public cloud to use security policies to control service deployment."</p>
{{< case-studies/quote
image="/images/case-studies/ibm/banner3.jpg"
author="Michael Hough, a software developer with the IBM Cloud Container Registry team"
>}}
"Image signing is one key part of our Kubernetes container service offering, and our container registry team saw Notary as the de facto way to implement that capability in the current Docker and container ecosystem"
{{< /case-studies/quote >}}
<p>Now that the Notary-implemented service is generally available in IBM's public cloud as a component of its existing IBM Cloud Container Registry, it is deployed as a highly available service across five IBM Cloud regions. This high-availability deployment has three instances across two zones in each of the five regions, load balanced with failover support. "We have also deployed it with end-to-end TLS support through to our back-end IBM Cloudant persistence storage service," Hough says.</p>
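<p>As an illustrative sketch of that per-region topology (not IBM's actual manifest), a Deployment can run three replicas with pod anti-affinity on the zone label, so the scheduler prefers to spread them across zones and a single zone outage leaves the service available. The names and image are placeholders.</p>

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: notary-server               # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: notary-server
  template:
    metadata:
      labels:
        app: notary-server
    spec:
      affinity:
        podAntiAffinity:
          # Prefer to place each replica in a different availability zone.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: notary-server
              topologyKey: topology.kubernetes.io/zone
      containers:
      - name: notary-server
        image: example.com/notary-server:0.6   # placeholder image
```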
<p>The IBM team has created and open sourced a Kubernetes admission controller called Portieris, which uses Notary signing information combined with customer-defined security policies to control image deployment into their cluster. "We are hoping to drive adoption of Portieris through its use of our Notary offering," Hough says.</p>
<p>IBM has been a key player in the creation and support of open source foundations, including CNCF. Todd Moore, IBM's vice president of Open Technology, is the current CNCF governing board chair and a number of IBMers are active across many of the CNCF member projects.</p>
{{< case-studies/quote
image="/images/case-studies/ibm/banner4.jpg"
author="Michael Hough, a software developer with the IBM Cloud Container Registry team"
>}}
"With our IBM Cloud Kubernetes as-a-service offering and the admission controller we have made available, it allows both IBM services as well as customers of the IBM public cloud to use security policies to control service deployment."
{{< /case-studies/quote >}}
<p>"Given that, we see CNCF as a safe haven for cloud native open source, providing stability, longevity, and expected maintenance for member projects—no matter the originating vendor or project," Hough says. Because the entire cloud native world is a fast-moving area with many competing vendors and solutions, "we see the CNCF model as an arbiter of openness and fair play across the ecosystem," he says.</p>
<p>With both TUF and Notary as part of CNCF, IBM expects there to be standardization around these capabilities beyond just de facto standards for signing and provenance. IBM has determined not simply to consume Notary, but also to contribute to the open source project where applicable. "IBMers have contributed a CouchDB backend to support our use of IBM Cloudant as the persistent store, and are working on generalization of the pkcs11 provider, allowing support of other security hardware devices beyond Yubikey," Hough says.</p>
{{< case-studies/quote author="Michael Hough, a software developer with the IBM Cloud Container Registry team" >}}
"There are new projects addressing these challenges, including within CNCF. We will definitely be following these advancements with interest. We found the Notary community to be an active and friendly community open to changes, such as our addition of a CouchDB backend for persistent storage."
{{< /case-studies/quote >}}
<p>The company has used other CNCF projects, including <a href="https://containerd.io">containerd</a>, <a href="https://www.envoyproxy.io">Envoy</a>, <a href="https://prometheus.io">Prometheus</a>, <a href="https://grpc.io">gRPC</a>, and <a href="https://github.com/containernetworking">CNI</a>, and is looking into <a href="https://github.com/spiffe">SPIFFE</a> and <a href="https://github.com/spiffe/spire">SPIRE</a> as well for potential future use.</p>
<p>What advice does Hough have for other companies that are looking to deploy Notary or a cloud native infrastructure?</p>
<p>"While this is true for many areas of cloud native infrastructure software, we found that a high-availability, multi-region deployment of Notary requires a solid implementation to handle certificate management and rotation," he says. "There are new projects addressing these challenges, including within CNCF. We will definitely be following these advancements with interest. We found the Notary community to be an active and friendly community open to changes, such as our addition of a CouchDB backend for persistent storage."</p>


@ -1,78 +0,0 @@
---
title: ING Case Study
linkTitle: ING
case_study_styles: true
cid: caseStudies
weight: 50
featured: true
quote: >
The big cloud native promise to our business is the ability to go from idea to production within 48 hours. We are some years away from this, but that's quite feasible to us.
new_case_study_styles: true
heading_background: /images/case-studies/ing/banner1.jpg
heading_title_logo: /images/ing_logo.png
subheading: >
Driving Banking Innovation with Cloud Native
case_study_details:
- Company: ING
- Location: Amsterdam, Netherlands
- Industry: Finance
---
<h2>Challenge</h2>
<p>After undergoing an agile transformation, <a href="https://www.ing.com/">ING</a> realized it needed a standardized platform to support the work their developers were doing. "Our DevOps teams got empowered to be autonomous," says Infrastructure Architect Thijs Ebbers. "It has benefits; you get all kinds of ideas. But a lot of teams are going to devise the same wheel. Teams started tinkering with <a href="https://www.docker.com/">Docker</a>, Docker Swarm, <a href="https://kubernetes.io/">Kubernetes</a>, <a href="https://mesosphere.com/">Mesos</a>. Well, it's not really useful for a company to have one hundred wheels, instead of one good wheel."</p>
<h2>Solution</h2>
<p>Using Kubernetes for container orchestration and Docker for containerization, the ING team began building an internal public cloud for its CI/CD pipeline and greenfield applications. The pipeline, which has been built on Mesos Marathon, will be migrated onto Kubernetes. The bank-account management app <a href="https://www.yolt.com/">Yolt</a>, serving the U.K. market (and soon France and Italy), is already live, hosted on a Kubernetes framework. At least two greenfield projects currently on the Kubernetes framework will be going into production later this year. By the end of 2018, the company plans to have converted a number of APIs used in the banking customer experience to cloud native APIs and host these on the Kubernetes-based platform.</p>
<h2>Impact</h2>
<p>"Cloud native technologies are helping our speed, from getting an application to test to acceptance to production," says Infrastructure Architect Onno Van der Voort. "If you walk around ING now, you see all these DevOps teams, doing stand-ups, demoing. They try to get new functionality out there really fast. We held a hackathon for one of our existing components and basically converted it to cloud native within 2.5 days, though of course the tail takes more time before code is fully production ready."</p>
{{< case-studies/quote author="Thijs Ebbers, Infrastructure Architect, ING">}}
"The big cloud native promise to our business is the ability to go from idea to production within 48 hours. We are some years away from this, but that's quite feasible to us."
{{< /case-studies/quote >}}
{{< case-studies/lead >}}
ING has long embraced innovation in banking, launching the internet-based ING Direct in 1997.
{{< /case-studies/lead >}}
<p>In that same spirit, the company underwent an agile transformation a few years ago. "Our DevOps teams got empowered to be autonomous," says Infrastructure Architect Thijs Ebbers. "It has benefits; you get all kinds of ideas. But a lot of teams are going to devise the same wheel. Teams started tinkering with Docker, Docker Swarm, Kubernetes, Mesos. Well, it's not really useful for a company to have one hundred wheels, instead of one good wheel."</p>
<p>Looking to standardize the deployment process within the company's strict security guidelines, the team looked at several solutions and found that in the past year, "Kubernetes won the container management framework wars," says Ebbers. "We decided to standardize ING on a Kubernetes framework." Everything is run on premise due to banking regulations, he adds, but "we will be building an internal public cloud. We are trying to get on par with what public clouds are doing. That's one of the reasons we got Kubernetes."</p>
<p>They also embraced Docker to address a major pain point in ING's CI/CD pipeline. Before containerization, "Every development team had to order a VM, and it was quite a heavy delivery model for them," says Infrastructure Architect Onno Van der Voort. "Another use case for containerization is when the application travels through the pipeline, they fire up Docker containers to do test work against the applications and after they've done the work, the containers get killed again."</p>
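<p>On a Kubernetes-based pipeline, this throwaway-container pattern maps naturally onto a Job: a test container starts, runs its checks against the application, and the finished Job is cleaned up automatically. The sketch below is hypothetical (the image and command are placeholders), not ING's actual pipeline definition.</p>

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pipeline-tests                  # hypothetical name
spec:
  ttlSecondsAfterFinished: 300          # garbage-collect the Job after it finishes
  template:
    spec:
      restartPolicy: Never              # run the test suite exactly once
      containers:
      - name: tests
        image: example.com/app-tests:1.0   # placeholder test image
        command: ["./run-tests.sh"]        # placeholder test entrypoint
```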
{{< case-studies/quote
image="/images/case-studies/ing/banner3.jpg"
author="Thijs Ebbers, Infrastructure Architect, ING"
>}}
"We decided to standardize ING on a Kubernetes framework." Everything is run on premise due to banking regulations, he adds, but "we will be building an internal public cloud. We are trying to get on par with what public clouds are doing. That's one of the reasons we got Kubernetes."
{{< /case-studies/quote >}}
<p>Because of industry regulations, applications are only allowed to go through the pipeline, where compliance is enforced, rather than be deployed directly into a container. "We have to run the complete platform of services we need, many routing from different places," says Van der Voort. "We need this Kubernetes framework for deploying the containers, with all those components, monitoring, logging. It's complex." For that reason, ING has chosen to start on the <a href="https://www.openshift.org/">OpenShift Origin</a> Kubernetes distribution.</p>
<p>Already, "cloud native technologies are helping our speed, from getting an application to test to acceptance to production," says Van der Voort. "If you walk around ING now, you see all these DevOps teams, doing stand-ups, demoing. They try to get new functionality out there really fast. We held a hackathon for one of our existing components and basically converted it to cloud native within 2.5 days, though of course the tail takes more time before code is fully production ready."</p>
<p>The pipeline, which has been built on Mesos Marathon, will be migrated onto Kubernetes. Some legacy applications are also being rewritten as cloud native in order to run on the framework. At least two smaller greenfield projects built on Kubernetes will go into production this year. By the end of 2018, the company plans to have converted a number of APIs used in the banking customer experience to cloud native APIs and host these on the Kubernetes-based platform.</p>
{{< case-studies/quote
image="/images/case-studies/ing/banner4.jpg"
author="Onno Van der Voort, Infrastructure Architect, ING"
>}}
"We have to run the complete platform of services we need, many routing from different places. We need this Kubernetes framework for deploying the containers, with all those components, monitoring, logging. It's complex."
{{< /case-studies/quote >}}
<p>The team, however, doesn't see the bank's back-end systems going onto the Kubernetes platform. "Our philosophy is it only makes sense to move things to cloud if they are cloud native," says Van der Voort. "If you have traditional architecture, build traditional patterns, it doesn't hold any value to go to the cloud." Adds Cloud Platform Architect Alfonso Fernandez-Barandiaran: "ING has a strategy about where we will go, in order to improve our agility. So it's not about how cool this technology is, it's about finding the right technology and the right approach."</p>
<p>The Kubernetes framework will be hosting some greenfield projects that are high priority for ING: applications the company is developing in response to <a href="https://ec.europa.eu/info/law/payment-services-psd-2-directive-eu-2015-2366_en">PSD2</a>, the European Commission directive requiring more innovative online and mobile payments that went into effect at the beginning of 2018. For example, a bank-account management app called <a href="https://www.yolt.com/">Yolt</a>, serving the U.K. market (and soon France and Italy), was built on a Kubernetes platform and has gone into production. ING is also developing blockchain-enabled applications that will live on the Kubernetes platform. "We've been contacted by a lot of development teams that have ideas with what they want to do with containers," says Ebbers.</p>
{{< case-studies/quote author="Alfonso Fernandez-Barandiaran, Cloud Platform Architect, ING" >}}
Even with the particular requirements that come in banking, ING has managed to take a lead in technology and innovation. "Every time we have constraints, we look for maybe a better way that we can use this technology."
{{< /case-studies/quote >}}
<p>Even with the particular requirements that come in banking, ING has managed to take a lead in technology and innovation. "Every time we have constraints, we look for maybe a better way that we can use this technology," says Fernandez-Barandiaran.</p>
<p>The results, after all, are worth the effort. "The big cloud native promise to our business is the ability to go from idea to production within 48 hours," says Ebbers. "That would require all these projects to be mature. We are some years away from this, but that's quite feasible to us."</p>


@ -1,87 +0,0 @@
---
title: NAIC Case Study
linkTitle: NAIC
case_study_styles: true
cid: caseStudies
logo: naic_featured_logo.png
featured: false
new_case_study_styles: true
heading_background: /images/case-studies/naic/banner1.jpg
heading_title_logo: /images/naic_logo.png
subheading: >
A Culture and Technology Transition Enabled by Kubernetes
case_study_details:
- Company: NAIC
- Location: Washington, DC
- Industry: Regulatory
---
<h2>Challenge</h2>
<p>The <a href="http://www.naic.org/">National Association of Insurance Commissioners (NAIC)</a>, the U.S. standard-setting and regulatory support organization, was looking for a way to deliver new services faster to provide more value for members and staff. It also needed greater agility to improve productivity internally.</p>
<h2>Solution</h2>
<p>Beginning in 2016, they started using <a href="https://www.cncf.io/">Cloud Native Computing Foundation (CNCF)</a> tools such as <a href="https://prometheus.io/">Prometheus</a>. NAIC began hosting internal systems and development systems on <a href="https://kubernetes.io/">Kubernetes</a> at the beginning of 2018, as part of a broad move toward the public cloud. "Our culture and technology transition is a strategy embraced by our top leaders," says Dan Barker, Chief Enterprise Architect. "It has already proven successful by allowing us to accelerate our value pipeline by more than double while decreasing our costs by more than half. We are also seeing customer satisfaction increase as we add more and more applications to these new technologies."</p>
<h2>Impact</h2>
<p>Leveraging Kubernetes, "our development teams can create rapid prototypes far faster than they used to," says Barker. Applications running on Kubernetes are more resilient than those running in other environments. The deployment of open source solutions is helping influence company culture, as NAIC becomes a more open and transparent organization.</p>
<p>"We completed a small prototype in two days that would have previously taken at least a month," Barker says. Resiliency is currently measured in how much downtime systems have. "They've basically had none, and the occasional issue is remedied in minutes," he says.</p>
{{< case-studies/quote author="Dan Barker, Chief Enterprise Architect, NAIC" >}}
"Our culture and technology transition is a strategy embraced by our top leaders. It has already proven successful by allowing us to accelerate our value pipeline by more than double while decreasing our costs by more than half. We are also seeing customer satisfaction increase as we add more and more applications to these new technologies."
{{< /case-studies/quote >}}
<p>NAIC—which was created and overseen by the chief insurance regulators from the 50 states, the District of Columbia and five U.S. territories—provides a means through which state insurance regulators establish standards and best practices, conduct peer reviews, and coordinate their regulatory oversight. Their staff supports these efforts and represents the collective views of regulators in the United States and internationally. NAIC members, together with the organization's central resources, form the national system of state-based insurance regulation in the United States.</p>
<p>The organization has been using the cloud for years, and wanted to find more ways to quickly deliver new services that provide more value for members and staff. They looked to Kubernetes for a solution. Within NAIC, several groups are leveraging Kubernetes, one being the Platform Engineering Team. "The team building out these tools are not only deploying and operating Kubernetes, but they're also using them," Barker says. "In fact, we're using GitLab to deploy Kubernetes with a pipeline using <a href="https://github.com/kubernetes/kops">kops</a>. This team was created from developers, operators, and quality engineers from across the company, so their jobs have changed quite a bit."</p>
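<p>A pipeline along those lines might look like the hypothetical GitLab CI job below, which applies a version-controlled kops cluster spec to AWS. The stage name, image, and variables are placeholders rather than NAIC's actual configuration.</p>

```yaml
# .gitlab-ci.yml (illustrative sketch)
stages:
  - cluster

apply-cluster:
  stage: cluster
  image: example.com/tooling/kops:1.0   # placeholder image with kops installed
  script:
    # Load the desired cluster state from the spec kept in this repository...
    - kops replace -f cluster.yaml --state "$KOPS_STATE_STORE" --force
    # ...then apply it to AWS.
    - kops update cluster --name "$CLUSTER_NAME" --state "$KOPS_STATE_STORE" --yes
  only:
    - master
```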
<p>In addition, NAIC is onboarding teams to the new platform, and those teams have seen a lot of change in how they work and what they can do. "They now have more power in creating their own infrastructure and deploying their own applications," Barker says. They also use pipelines to facilitate their currently manual processes. NAIC has consumers who are using GitLab heavily, and they're starting to use Kubernetes to deploy simple applications that help their internal processes.</p>
{{< case-studies/quote
image="/images/case-studies/naic/banner3.jpg"
author="Dan Barker, Chief Enterprise Architect, NAIC"
>}}
"In our experience, vendor lock-in and tooling that is highly specific results in less resilient technology with fewer minds working to solve problems and grow the community."
{{< /case-studies/quote >}}
<p>"We needed greater agility to enable our own productivity internally," he says. "We decided it was right for us to move everything to the public cloud [Amazon Web Services] to help with that process and be able to access many of the native tools that allows us to move faster by not needing to build everything."
The NAIC also wanted to be cloud-agnostic, "and Kubernetes helps with this for our compute layer," Barker says. "Compute is pretty standard across the clouds, and now we can take advantage of any of them while getting all of the other features Kubernetes offers."</p>
<p>The NAIC currently hosts internal systems and development systems on Kubernetes, and has already seen how impactful it can be. "Our development teams can create rapid prototypes in minutes instead of weeks," Barker says. "This recently happened with an internal tool that had no measurable wait time on the infrastructure. It was solely development bound. There is now a central shared resource that lives in AWS, which means it can grow as needed."</p>
<p>The native integrations into Kubernetes at NAIC have made it easy to write code and have it running in minutes instead of weeks. Applications running on Kubernetes have also proven to be more resilient than those running in other environments. "We even have teams using this to create more internal tools to help with communication or automating some of their current tasks," Barker says.</p>
<p>"We knew that Kubernetes had become the de facto standard for container orchestration," he says. "Two major factors for selecting this were the three major cloud vendors hosting their own versions and having it hosted in a neutral party as fully open source."</p>
<p>As for other CNCF projects, NAIC is using Prometheus on a small scale and hopes to continue using it moving forward because of the seamless integration with Kubernetes. The Association also is considering <a href="https://grpc.io/">gRPC</a> as its internal communications standard, <a href="https://www.envoyproxy.io/">Envoy</a> in conjunction with Istio for service mesh, <a href="http://opentracing.io/">OpenTracing</a> and <a href="https://www.jaegertracing.io">Jaeger</a> for tracing aggregation, and <a href="https://www.fluentd.org/">Fluentd</a> with its Elasticsearch cluster.</p>
{{< case-studies/quote
image="/images/case-studies/naic/banner4.jpg"
author="Dan Barker, Chief Enterprise Architect, NAIC"
>}}
"We knew that Kubernetes had become the de facto standard for container orchestration. Two major factors for selecting this were the three major cloud vendors hosting their own versions and having it hosted in a neutral party as fully open source."
{{< /case-studies/quote >}}
<p>The open governance and broad industry participation in CNCF provided a comfort level with the technology, Barker says. "We also see it as helping to influence our own company culture," he says. "We're moving to be a more open and transparent company, and we are encouraging our staff to get involved with the different working groups and codebases. We recently became CNCF members to help further our commitment to community contribution and transparency."</p>
<p>Factors such as vendor-neutrality and cross-industry investment were important in the selection. "In our experience, vendor lock-in and tooling that is highly specific results in less resilient technology with fewer minds working to solve problems and grow the community," Barker says.</p>
<p>NAIC is a largely Oracle shop, Barker says, and has been running mostly Java on JBoss. "However, we have years of history with other applications," he says. "Some of these have been migrated by completely rewriting the application, while others are just being modified slightly to fit into this new paradigm."</p>
<p>Running on AWS cloud, the Association has not specifically taken a microservices approach. "We are moving to microservices where practical, but we haven't found that it's a necessity to operate them within Kubernetes," Barker says.</p>
<p>All of its databases currently run within public cloud services, but the team has explored eventually running them in Kubernetes where it makes sense. "We're doing this to get more reuse from common components and to limit our failure domains to something more manageable and observable," Barker says.</p>
{{< case-studies/quote author="Dan Barker, Chief Enterprise Architect, NAIC" >}}
"We have been able to move much faster at lower cost than we were able to in the past," Barker says. "We were able to complete one of our projects in a year, when the previous version took over two years. And the new project cost $500,000 while the original required $3 million, and with fewer defects. We are also able to push out new features much faster."
{{< /case-studies/quote >}}
<p>NAIC has seen a significant business impact from its efforts. "We have been able to move much faster at lower cost than we were able to in the past," Barker says. "We were able to complete one of our projects in a year, when the previous version took over two years. And the new project cost $500,000 while the original required $3 million, and with fewer defects. We are also able to push out new features much faster."</p>
<p>He says the organization is moving toward continuous deployment "because the business case makes sense. The research is becoming very hard to argue with. We want to reduce our batch sizes and optimize on delivering value to customers and not feature count. This is requiring a larger cultural shift than just a technology shift."</p>
<p>NAIC is "becoming more open and transparent, as well as more resilient to failure," Barker says. "Even our customers are wanting more and more of this and trying to figure out how they can work with us to accomplish our mutual goals faster. Members of the insurance industry have reached out so that we can better learn together and grow as an industry."</p>

---
title: New York Times Case Study
case_study_styles: true
cid: caseStudies
new_case_study_styles: true
heading_background: /images/case-studies/newyorktimes/banner1.jpg
heading_title_logo: /images/newyorktimes_logo.png
subheading: >
The New York Times: From Print to the Web to Cloud Native
case_study_details:
- Company: New York Times
- Location: New York, N.Y.
- Industry: News Media
---
<h2>Challenge</h2>
<p>When the company decided a few years ago to move out of its data centers, its first deployments on the public cloud were smaller, less critical applications managed on virtual machines. "We started building more and more tools, and at some point we realized that we were doing a disservice by treating Amazon as another data center," says Deep Kapadia, Executive Director, Engineering at The New York Times. Kapadia was tapped to lead a Delivery Engineering Team that would "design for the abstractions that cloud providers offer us."</p>
<h2>Solution</h2>
<p>The team decided to use <a href="https://cloud.google.com/">Google Cloud Platform</a> and its Kubernetes-as-a-service offering, <a href="https://cloud.google.com/kubernetes-engine/">GKE</a>.</p>
<h2>Impact</h2>
<p>Speed of delivery increased. Some of the legacy VM-based deployments took 45 minutes; with Kubernetes, that time was "just a few seconds to a couple of minutes," says Engineering Manager Brian Balser. Adds Site Reliability Engineer Tony Li: "Teams that used to deploy on weekly schedules or had to coordinate schedules with the infrastructure team now deploy their updates independently, and can do it daily when necessary." Adopting Cloud Native Computing Foundation technologies allows for a more unified approach to deployment across the engineering staff, and portability for the company.</p>
{{< case-studies/quote author="Deep Kapadia, Executive Director, Engineering at The New York Times" >}}
{{< youtube DqS_IPw-c6o youtube-quote-sm >}}
{{< youtube Tm4VfJtOHt8 youtube-quote-sm >}}
<br>
"I think once you get over the initial hump, things get a lot easier and actually a lot faster."
{{< /case-studies/quote >}}
<p>Founded in 1851 and known as the newspaper of record, The New York Times is a digital pioneer: Its first website launched in 1996, before Google even existed. After deciding a few years ago to move out of its private data centers—including one located in the pricey real estate of Manhattan—the company recently took another step into the future by going cloud native.</p>
<p>At first, the infrastructure team "managed the virtual machines in the Amazon cloud, and they deployed more critical applications in our data centers and the less critical ones on <a href="https://aws.amazon.com/">AWS</a> as an experiment," says Deep Kapadia, Executive Director, Engineering at The New York Times. "We started building more and more tools, and at some point we realized that we were doing a disservice by treating Amazon as another data center."</p>
<p>To get the most out of the cloud, Kapadia was tapped to lead a new Delivery Engineering Team that would "design for the abstractions that cloud providers offer us." In mid-2016, they began looking at the <a href="https://cloud.google.com/">Google Cloud Platform</a> and its Kubernetes-as-a-service offering, <a href="https://cloud.google.com/kubernetes-engine/">GKE</a>.</p>
<p>At the time, says team member Tony Li, a Site Reliability Engineer, "We had some internal tooling that attempted to do what Kubernetes does for containers, but for VMs. We asked why are we building and maintaining these tools ourselves?"</p>
<p>In early 2017, the first production application—the nytimes.com mobile homepage—began running on Kubernetes, serving just 1% of the traffic. Today, almost 100% of the nytimes.com site's end-user facing applications run on GCP, with the majority on Kubernetes.</p>
{{< case-studies/quote image="/images/case-studies/newyorktimes/banner3.jpg" >}}
"We had some internal tooling that attempted to do what Kubernetes does for containers, but for VMs. We asked why are we building and maintaining these tools ourselves?"
{{< /case-studies/quote >}}
<p>The team found that the speed of delivery was immediately impacted. "Deploying Docker images versus spinning up VMs was quite a lot faster," says Engineering Manager Brian Balser. Some of the legacy VM-based deployments took 45 minutes; with Kubernetes, that time was "just a few seconds to a couple of minutes."</p>
<p>The plan is to get as much as possible, not just the website, running on Kubernetes, and beyond that, moving toward serverless deployments. For instance, The New York Times crossword app was built on <a href="https://cloud.google.com/appengine/">Google App Engine</a>, which has been the main platform for the company's experimentation with serverless. "The hardest part was getting the engineers over the hurdle of how little they had to do," Chief Technology Officer Nick Rockwell recently told The CTO Advisor. "Our experience has been very, very good. We have invested a lot of work into deploying apps on container services, and I'm really excited about experimenting with deploying those on App Engine Flex and <a href="https://aws.amazon.com/fargate/">AWS Fargate</a> and seeing how that feels, because that's a great migration path."</p>
<p>There are some exceptions to the move to cloud native, of course. "We have the print publishing business as well," says Kapadia. "A lot of that is definitely not going down the cloud-native path because they're using vendor software and even special machinery that prints the physical paper. But even those teams are looking at things like App Engine and Kubernetes if they can."</p>
<p>Kapadia acknowledges that there was a steep learning curve for some engineers, but "I think once you get over the initial hump, things get a lot easier and actually a lot faster."</p>
{{< case-studies/quote image="/images/case-studies/newyorktimes/banner4.jpg" >}}
"Right now, every team is running a small Kubernetes cluster, but it would be nice if we could all live in a larger ecosystem," says Kapadia. "Then we can harness the power of things like service mesh proxies that can actually do a lot of instrumentation between microservices, or service-to-service orchestration. Those are the new things that we want to experiment with as we go forward."
{{< /case-studies/quote >}}
<p>At The New York Times, they did get over that hump. As teams started sharing their own best practices with each other, "We're no longer the bottleneck for figuring out certain things," Kapadia says. "Most of the infrastructure and systems were managed by a centralized function. We've sort of blown that up, partly because Google and Amazon have tools that allow us to do that. We provide teams with complete ownership of their Google Cloud Platform projects, and give them a set of sensible defaults or standards. We let them know, 'If this works for you as is, great! If not, come talk to us and we'll figure out how to make it work for you.'"</p>
<p>As a result, "It's really allowed teams to move at a much more rapid pace than they were able to in the past," says Kapadia. Adds Li: "The use of GKE means each team can get their own compute cluster, reducing the number of individual instances they have to care about since developers can treat the cluster as a whole. Because the ticket-based workflow was removed from requesting resources and connections, developers can just call an API to get what they want. Teams that used to deploy on weekly schedules or had to coordinate schedules with the infrastructure team now deploy their updates independently, and can do it daily when necessary."</p>
<p>Another benefit to adopting Kubernetes: allowing for a more unified approach to deployment across the engineering staff. "Before, many teams were building their own tools for deployment," says Balser. With Kubernetes—as well as the other CNCF projects The New York Times uses, including Fluentd to collect logs for all of its AWS servers, gRPC for its <a href="https://open.nytimes.com/publishing-with-apache-kafka-at-the-new-york-times-7f0e3b7d2077">Publishing Pipeline</a>, Prometheus, and Envoy—"we can benefit from the advances that each of these technologies make, instead of trying to catch up."</p>
{{< case-studies/quote >}}
Li calls the Cloud Native Computing Foundation's projects "a northern star that we can all look at and follow."
{{< /case-studies/quote >}}
<p>These open-source technologies have given the company more portability. "CNCF has enabled us to follow an industry standard," says Kapadia. "It allows us to think about whether we want to move away from our current service providers. Most of our applications are connected to Fluentd. If we wish to switch our logging provider from provider A to provider B we can do that. We're running Kubernetes in GCP today, but if we want to run it in Amazon or Azure, we could potentially look into that as well."</p>
<p>Li calls the Cloud Native Computing Foundation's projects "a northern star that we can all look at and follow." Led by that star, the team is looking ahead to a year of onboarding the remaining half of the 40 or so product engineering teams to extract even more value out of the technology. "Right now, every team is running a small Kubernetes cluster, but it would be nice if we could all live in a larger ecosystem," says Kapadia. "Then we can harness the power of things like service mesh proxies that can actually do a lot of instrumentation between microservices, or service-to-service orchestration. Those are the new things that we want to experiment with as we go forward."</p>

---
title: Nordstrom Case Study
case_study_styles: true
cid: caseStudies
new_case_study_styles: true
heading_background: /images/case-studies/nordstrom/banner1.jpg
heading_title_logo: /images/nordstrom_logo.png
subheading: >
Finding Millions in Potential Savings in a Tough Retail Climate
case_study_details:
- Company: Nordstrom
- Location: Seattle, Washington
- Industry: Retail
---
<h2>Challenge</h2>
<p>Nordstrom wanted to increase the efficiency and speed of its technology operations, which includes the Nordstrom.com e-commerce site. At the same time, Nordstrom Technology was looking for ways to tighten its technology operational costs.</p>
<h2>Solution</h2>
<p>After embracing a DevOps transformation and launching a continuous integration/continuous deployment (CI/CD) project four years ago, the company reduced its deployment time from three months to 30 minutes. But they wanted to go even faster across environments, so they began their cloud native journey, adopting Docker containers orchestrated with <a href="http://kubernetes.io/">Kubernetes</a>.</p>
<h2>Impact</h2>
<p>Nordstrom Technology developers using Kubernetes now deploy faster and can "just focus on writing applications," says Dhawal Patel, a senior engineer on the team building a Kubernetes enterprise platform for Nordstrom. Furthermore, the team has increased Ops efficiency, improving CPU utilization by 5x to 12x depending on the workload. "We run thousands of virtual machines (VMs), but aren't effectively using all those resources," says Patel. "With Kubernetes, without even trying to make our cluster efficient, we are currently at a 10x increase."</p>
{{< case-studies/quote author="Dhawal Patel, senior engineer at Nordstrom" >}}
"We are always looking for ways to optimize and provide more value through technology. With Kubernetes we are showcasing two types of efficiency that we can bring: Dev efficiency and Ops efficiency. It's a win-win."
{{< /case-studies/quote >}}
<p>When Dhawal Patel joined <a href="http://shop.nordstrom.com/">Nordstrom</a> five years ago as an application developer for the retailer's website, he realized there was an opportunity to help speed up development cycles.</p>
<p>In those early DevOps days, Nordstrom Technology still followed a traditional model of silo teams and functions. "As a developer, I was spending more time fixing environments than writing code and adding value to business," Patel says. "I was passionate about that—so I was given the opportunity to help fix it."</p>
<p>The company was eager to move faster, too, and in 2013 launched the first continuous integration/continuous deployment (CI/CD) project. That project was the first step in Nordstrom's cloud native journey.</p>
<p>Dev and Ops team members built a CI/CD pipeline, working with the company's servers on premise. The team chose <a href="https://www.chef.io/chef/">Chef</a>, and wrote cookbooks that automated virtual IP creation, servers, and load balancing. "After we completed the project, deployment went from three months to 30 minutes," says Patel. "We still had multiple environments—dev, test, staging, then production—so with each environment running the Chef cookbooks, it took 30 minutes. It was a huge achievement at that point."</p>
<p>But new environments still took too long to turn up, so the next step was working in the cloud. Today, Nordstrom Technology has built an enterprise platform that allows the company's 1,500 developers to deploy applications running as Docker containers in the cloud, orchestrated with Kubernetes.</p>
{{< case-studies/quote image="/images/case-studies/nordstrom/banner3.jpg" >}}
"We made a bet that Kubernetes was going to take off, informed by early indicators of community support and project velocity, so we rebuilt our system with Kubernetes at the core,"
{{< /case-studies/quote >}}
<p>"The cloud provided faster access to resources, because it took weeks for us to get a virtual machine (VM) on premises," says Patel. "But now we can do the same thing in only five minutes."</p>
<p>Nordstrom's first foray into scheduling containers on a cluster was a homegrown system based on CoreOS fleet. They ran a few proof-of-concept projects on that system until Kubernetes 1.0 was released, at which point they made the switch. "We made a bet that Kubernetes was going to take off, informed by early indicators of community support and project velocity, so we rebuilt our system with Kubernetes at the core," says Marius Grigoriu, Sr. Manager of the Kubernetes team at Nordstrom.</p>
<p>While Kubernetes is often thought of as a platform for microservices, the first application to launch on Kubernetes in a critical production role at Nordstrom was Jira. "It was not the ideal microservice we were hoping to get as our first application," Patel admits, "but the team that was working on it was really passionate about Docker and Kubernetes, and they wanted to try it out. They had their application running on premises, and wanted to move it to Kubernetes."</p>
<p>The benefits were immediate for the teams that came on board. "Teams running on our Kubernetes cluster loved the fact that they had fewer issues to worry about. They didn't need to manage infrastructure or operating systems," says Grigoriu. "Early adopters loved the declarative nature of Kubernetes. They loved the reduced surface area they had to deal with."</p>
{{< case-studies/quote image="/images/case-studies/nordstrom/banner4.jpg">}}
"Teams running on our Kubernetes cluster loved the fact that they had fewer issues to worry about. They didn't need to manage infrastructure or operating systems," says Grigoriu. "Early adopters loved the declarative nature of Kubernetes. They loved the reduced surface area they had to deal with."
{{< /case-studies/quote >}}
<p>To support these early adopters, Patel's team began growing the cluster and building production-grade services. "We integrated with <a href="https://prometheus.io/">Prometheus</a> for monitoring, with a <a href="https://grafana.com/">Grafana</a> front end; we used <a href="http://www.fluentd.org/">Fluentd</a> to push logs to <a href="https://www.elastic.co/">Elasticsearch</a>, so that gives us log aggregation," says Patel. The team also added dozens of open-source components, including CNCF projects, and has made contributions to Kubernetes, Terraform, and kube2iam.</p>
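<p>Nordstrom's actual monitoring configuration isn't published. As a rough sketch, a Prometheus scrape job that discovers pods through the Kubernetes API might look like this; the job name and the opt-in annotation convention are illustrative assumptions:</p>

```yaml
# prometheus.yml fragment (sketch) -- not Nordstrom's actual configuration
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod   # discover scrape targets from the Kubernetes API
    relabel_configs:
      # scrape only pods that opt in via the prometheus.io/scrape annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # carry namespace and pod name through as query labels
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```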
<p>There are now more than 60 development teams running Kubernetes in Nordstrom Technology, and as success stories have popped up, more teams have gotten on board. "Our initial customer base, the ones who were willing to try this out, are now going and evangelizing to the next set of users," says Patel. "One early adopter had Docker containers and was not sure how to run them in production. We sat with him and within 15 minutes we deployed it in production. He thought it was amazing, and more people in his org started coming in."</p>
<p>For Nordstrom Technology, going cloud-native has vastly improved development and operational efficiency. The developers using Kubernetes now deploy faster and can focus on building value in their applications. One such team's merge-to-deploy cycle took 25 minutes when it launched virtual machines in the cloud; switching to Kubernetes cut that time to 5 minutes, a 5x speedup.</p>
{{< case-studies/quote >}}
"With Kubernetes, without even trying to make our cluster efficient, we are currently at 40 percent CPU utilization—a 10x increase. we are running 2600+ customer pods that would have been 2600+ VMs if they had gone directly to the cloud. We are running them on 40 VMs now, so that's a huge reduction in operational overhead."
{{< /case-studies/quote >}}
<p>Speed is great, and easily demonstrated, but perhaps the bigger impact lies in the operational efficiency. "We run thousands of VMs on AWS, and their overall average CPU utilization is about four percent," says Patel. "With Kubernetes, without even trying to make our cluster efficient, we are currently at 40 percent CPU utilization—a 10x increase. We are running 2600+ customer pods that would have been 2600+ VMs if they had gone directly to the cloud. We are running them on 40 VMs now, so that's a huge reduction in operational overhead."</p>
<p>Nordstrom Technology is also exploring running Kubernetes on bare metal on premises. "If we can build an on-premises Kubernetes cluster," says Patel, "we could bring the power of cloud to provision resources fast on-premises. Then for the developer, their interface is Kubernetes; they might not even realize or care that their services are now deployed on premises because they're only working with Kubernetes."</p>
<p>For that reason, Patel is eagerly following Kubernetes' development of multi-cluster capabilities. "With cluster federation, we can have our on-premise as the primary cluster and the cloud as a secondary burstable cluster," he says. "So, when there is an anniversary sale or Black Friday sale and we need more containers, we can go to the cloud."</p>
<p>That kind of possibility—as well as the impact that Grigoriu and Patel's team has already delivered using Kubernetes—is what led Nordstrom on its cloud native journey in the first place. "The way the retail environment is today, we are trying to build responsiveness and flexibility where we can," says Grigoriu. "Kubernetes makes it easy to bring efficiency to both the Dev and Ops side of the equation. It's a win-win."</p>

---
title: Northwestern Mutual Case Study
case_study_styles: true
cid: caseStudies
new_case_study_styles: true
heading_background: /images/case-studies/northwestern/banner1.jpg
heading_title_logo: /images/northwestern_logo.png
subheading: >
Cloud Native at Northwestern Mutual
case_study_details:
- Company: Northwestern Mutual
- Location: Milwaukee, WI
- Industry: Insurance and Financial Services
---
<h2>Challenge</h2>
<p>In the spring of 2015, Northwestern Mutual acquired a fintech startup, LearnVest, and decided to take "Northwestern Mutual's leading products and services and meld it with LearnVest's digital experience and innovative financial planning platform," says Brad Williams, Director of Engineering for Client Experience, Northwestern Mutual. The company's existing infrastructure had been optimized for batch workflows hosted on on-prem networks; deployments were very traditional, focused on following a process instead of providing deployment agility. "We had to build a platform that was elastically scalable, but also much more responsive, so we could quickly get data to the client website so our end-customers have the experience they expect," says Williams.</p>
<h2>Solution</h2>
<p>The platform team came up with a plan for using the public cloud (AWS), Docker containers, and Kubernetes for orchestration. "Kubernetes gave us that base framework so teams can be very autonomous in what they're building and deliver very quickly and frequently," says Northwestern Mutual Cloud Native Engineer Frank Greco Jr. The team also built and open-sourced <a href="https://github.com/northwesternmutual/kanali">Kanali</a>, a Kubernetes-native API management tool that uses OpenTracing, Jaeger, and gRPC.</p>
<h2>Impact</h2>
<p>Before, infrastructure deployments could take weeks; now, they are done in a matter of minutes. The number of deployments has increased dramatically, from about 24 a year to over 500 in just the first 10 months of 2017. Availability has also increased: There used to be a six-hour control window for commits every Sunday morning, as well as other periods of general maintenance, during which outages could happen. "Now we have eliminated the planned outage windows," says Bryan Pfremmer, App Platform Teams Manager, Northwestern Mutual. Kanali has had an impact on the bottom line. The vendor API management product that the company previously used required 23 servers, "dedicated to only API management," says Pfremmer. "Now it's all integrated in the existing stack and running as another deployment on Kubernetes. And that's just one environment. Between the three that we had plus the test, that's hard dollar savings."</p>
{{< case-studies/quote author="Frank Greco Jr., Cloud Native Engineer at Northwestern Mutual">}}
"In a large enterprise, you're going to have people using Kubernetes, but then you're also going to have people using WAS and .NET. You may not be at a point where your whole stack can be cloud native. What if you can take your API management tool and make it cloud native, but still proxy to legacy systems? Using different pieces that are cloud native, open source and Kubernetes native, you can do pretty innovative stuff."
{{< /case-studies/quote >}}
{{< case-studies/lead >}}
For more than 160 years, Northwestern Mutual has maintained its industry leadership in part by keeping a strong focus on risk management.
{{< /case-studies/lead >}}
<p>For many years, the company took a similar approach to managing its technology, and it has recently undergone a digital transformation to advance its digital strategy, including making a lot of noise in the cloud-native world.</p>
<p>In the spring of 2015, this insurance and financial services company acquired a fintech startup, LearnVest, and decided to take "Northwestern Mutual's leading products and services and meld it with LearnVest's digital experience and innovative financial planning platform," says Brad Williams, Director of Engineering for Client Experience, Northwestern Mutual. The company's existing infrastructure had been optimized for batch workflows hosted in an on-premises data center; deployments were very traditional and had too many manual steps that were error-prone.</p>
<p>In order to give the company's 4.5 million clients the digital experience they'd come to expect, says Williams, "We had to build a platform that was elastically scalable, but also much more responsive, so we could quickly get data to the client website. We essentially said, 'You build the system that you think is necessary to support a new, modern-facing one.' That's why we departed from anything legacy."</p>
{{< case-studies/quote image="/images/case-studies/northwestern/banner3.jpg" >}}
"Kubernetes has definitely been the right choice for us. It gave us that base framework so teams can be autonomous in what they're building and deliver very quickly and frequently."
{{< /case-studies/quote >}}
<p>Williams and the rest of the platform team decided that the first step would be to start moving from private data centers to AWS. With a new microservice architecture in mind—and the freedom to implement what was best for the organization—they began using Docker containers. After looking into the various container orchestration options, they went with Kubernetes, even though it was still in beta at the time. "There was some debate whether we should build something ourselves, or just leverage that product and evolve with it," says Northwestern Mutual Cloud Native Engineer Frank Greco Jr. "Kubernetes has definitely been the right choice for us. It gave us that base framework so teams can be autonomous in what they're building and deliver very quickly and frequently."</p>
<p>As early adopters, the team had to do a lot of work with Ansible scripts to stand up the cluster. "We had a lot of hard security requirements given the nature of our business," explains Bryan Pfremmer, App Platform Teams Manager, Northwestern Mutual. "We found ourselves running a configuration that very few other people ever tried." The client experience group was the first to use the new platform; today, a few hundred of the company's 1,500 engineers are using it and more are eager to get on board.</p>
<p>The results have been dramatic. Before, infrastructure deployments could take two weeks; now, they are done in a matter of minutes. With a focus on infrastructure automation and self-service, "you can take an app to production in that same day if you want to," says Pfremmer.</p>
{{< case-studies/quote image="/images/case-studies/northwestern/banner4.jpg" >}}
"Now, developers have autonomy, they can use this whenever they want, however they want. It becomes more valuable the more instrumentation downstream that happens, as we mature in it."
{{< /case-studies/quote >}}
<p>The process used to be so cumbersome that minor bug releases would be bundled with feature releases. With the new streamlined system enabled by Kubernetes, the number of deployments has increased from about 24 a year to more than 500 in just the first 10 months of 2017. Availability has also been improved: There used to be a six-hour control window for commits every early Sunday morning, as well as other periods of general maintenance, during which outages could happen. "Now there's no planned outage window," notes Pfremmer.</p>
<p>Northwestern Mutual built that API management tool—called <a href="https://github.com/northwesternmutual/kanali">Kanali</a>—and open sourced it in the summer of 2017. The team took on the project because it was a key capability for what they were building, and the prior solution worked in an "anti-cloud native way that was different than everything else we were doing," says Greco. Now API management is just another container deployed to Kubernetes, along with a separate Jaeger deployment.</p>
<p>Now the engineers using the Kubernetes deployment platform have the added benefit of visibility in production—and autonomy. Before, a centralized team would have to run a trace. "Now, developers have autonomy; they can use this whenever they want, however they want. It becomes more valuable the more instrumentation downstream that happens, as we mature in it," says Greco.</p>
{{< case-studies/quote >}}
"We're trying to make what we're doing known so that we can find people who are like, 'Yeah, that's interesting. I want to come do it!'"
{{< /case-studies/quote >}}
<p>But the team didn't stop there. "In a large enterprise, you're going to have people using Kubernetes, but then you're also going to have people using WAS and .NET," says Greco. "You may not be at a point where your whole stack can be cloud native. What if you can take your API management tool and make it cloud native, but still proxy to legacy systems? Using different pieces that are cloud native, open source and Kubernetes native, you can do pretty innovative stuff."</p>
<p>As the team continues to improve its stack and share its Kubernetes best practices, it feels that Northwestern Mutual's reputation as a technology-first company is evolving too. "No one would think a company that's 160-plus years old is foraying this deep into the cloud and infrastructure stack," says Pfremmer. And they're hoping that means they'll be able to attract new talent. "We're trying to make what we're doing known so that we can find people who are like, 'Yeah, that's interesting. I want to come do it!'"</p>

---
title: Ocado Case Study
linkTitle: Ocado
case_study_styles: true
cid: caseStudies
logo: ocado_featured_logo.png
featured: true
weight: 4
quote: >
People at Ocado Technology have been quite amazed. They ask, 'Can we do this on a Dev cluster?' and 10 minutes later we have rolled out something that is deployed across the cluster. The speed from idea to implementation to deployment is amazing.
new_case_study_styles: true
heading_background: /images/case-studies/ocado/banner1.jpg
heading_title_logo: /images/ocado_logo.png
subheading: >
Ocado: Running Grocery Warehouses with a Cloud Native Platform
case_study_details:
- Company: Ocado Technology
- Location: Hatfield, England
- Industry: Grocery retail technology and platforms
---
<h2>Challenge</h2>
<p>The world's largest online-only grocery retailer, <a href="http://www.ocadogroup.com/">Ocado</a> developed the Ocado Smart Platform to manage its own operations, from websites to warehouses, and is now licensing the technology to other retailers such as <a href="http://fortune.com/2018/05/17/ocado-kroger-warehouse-automation-amazon-walmart/">Kroger</a>. To set up the first warehouses for the platform, Ocado shifted from virtual machines and <a href="https://puppet.com/">Puppet</a> infrastructure to <a href="https://www.docker.com/">Docker</a> containers, using CoreOS's <a href="https://github.com/coreos/fleet">fleet</a> scheduler to provision all the services on its <a href="https://www.openstack.org/">OpenStack</a>-based private cloud on bare metal. As the Smart Platform grew and "fleet was going end-of-life," says Platform Engineer Mike Bryant, "we started looking for a more complete platform, with all of these disparate infrastructure services being brought together in one unified API."</p>
<h2>Solution</h2>
<p>The team decided to migrate from fleet to <a href="https://www.kubernetes.io">Kubernetes</a> on Ocado's private cloud. The Kubernetes stack currently uses <a href="https://github.com/kubernetes/kubeadm/">kubeadm</a> for bootstrapping, <a href="https://github.com/containernetworking">CNI</a> with <a href="https://www.weave.works/oss/net/">Weave Net</a> for networking, <a href="https://coreos.com/operators/prometheus/docs/latest/user-guides/getting-started.html">Prometheus Operator</a> for monitoring, <a href="https://www.fluentd.org/">Fluentd</a> for logging, and <a href="http://opentracing.io/">OpenTracing</a> for distributed tracing. The first app on Kubernetes, a business-critical service in the warehouses, went into production in the summer of 2017, with a mass migration continuing into 2018. Hundreds of Ocado engineers working on the Smart Platform are now deploying on Kubernetes.</p>
<h2>Impact</h2>
<p>With Kubernetes, "the speed from idea to implementation to deployment is amazing," says Bryant. "I've seen features go from development to production inside of a week now. In the old world, a new application deployment could easily take over a month." And because there are no longer restrictive deployment windows in the warehouses, the rate of deployments has gone from as few as two per week to dozens per week. Ocado has also achieved cost savings because Kubernetes gives the team the ability to have more fine-grained resource allocation. Says DevOps Team Leader Kevin McCormack: "We have more confidence in the resource allocation/separation features of Kubernetes, so we have been able to migrate from around 10 fleet clusters to one Kubernetes cluster." The team also uses <a href="https://prometheus.io/">Prometheus</a> and <a href="https://grafana.com/">Grafana</a> to visualize resource allocation, and makes the data available to developers. "The increased visibility offered by Prometheus means developers are more aware of what they are using and how their use impacts others, especially since we now have one shared cluster," says McCormack. "I'd estimate that we use about 15-25% less hardware resources to host the same applications in Kubernetes in our test environments."</p>
{{< case-studies/quote author="Mike Bryant, Platform Engineer, Ocado" >}}
"People at Ocado Technology have been quite amazed. They ask, 'Can we do this on a Dev cluster?' and 10 minutes later we have rolled out something that is deployed across the cluster. The speed from idea to implementation to deployment is amazing."
{{< /case-studies/quote >}}
{{< case-studies/lead >}}
When it was founded in 2000, Ocado was an online-only grocery retailer in the U.K. In the years since, it has expanded from delivering produce to families to providing technology to other grocery retailers.
{{< /case-studies/lead >}}
<p>The company began developing its Ocado Smart Platform to manage its own operations, from websites to warehouses, and is now licensing the technology to other grocery chains around the world, such as <a href="http://fortune.com/2018/05/17/ocado-kroger-warehouse-automation-amazon-walmart/">Kroger</a>. To set up the first warehouses on the platform, Ocado shifted from virtual machines and Puppet infrastructure to Docker containers, using CoreOS's fleet scheduler to provision all the services on its OpenStack-based private cloud on bare metal. As the Smart Platform grew, and "fleet was going end-of-life," says Platform Engineer Mike Bryant, "we started looking for a more complete platform, with all of these disparate infrastructure services being brought together in one unified API."</p>
<p>Bryant had already been using Kubernetes with <a href="https://www.codeforlife.education/">Code for Life</a>, a children's education project that's part of Ocado's charity arm. "We really liked it, so we started looking at it seriously for our production workloads," says Bryant. The team that managed fleet had researched orchestration solutions and landed on Kubernetes as well. "We were looking for a platform with wide adoption, and that was where the momentum was," says DevOps Team Leader Kevin McCormack. The two paths converged, and "We didn't even go through any proof-of-concept stage. The Code for Life work served that purpose," says Bryant.</p>
{{< case-studies/quote
image="/images/case-studies/ocado/banner3.jpg"
author="Kevin McCormack, DevOps Team Leader, Ocado"
>}}
"We were looking for a platform with wide adoption, and that was where the momentum was, the two paths converged, and we didn't even go through any proof-of-concept stage. The Code for Life work served that purpose,"
{{< /case-studies/quote >}}
<p>In the summer of 2016, the team began migrating from fleet to <a href="https://kubernetes.io/">Kubernetes</a> on Ocado's private cloud. The Kubernetes stack currently uses <a href="https://github.com/kubernetes/kubeadm">kubeadm</a> for bootstrapping, <a href="https://github.com/containernetworking">CNI</a> with <a href="https://www.weave.works/oss/net/">Weave Net</a> for networking, <a href="https://coreos.com/operators/prometheus/docs/latest/user-guides/getting-started.html">Prometheus Operator</a> for monitoring, <a href="https://www.fluentd.org/">Fluentd</a> for logging, and <a href="http://opentracing.io/">OpenTracing</a> for distributed tracing.</p>
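<p>Ocado's exact bootstrap settings aren't given in the case study. As an illustrative sketch, a kubeadm configuration that reserves Weave Net's default allocation range for the pod network could look like this; the API version shown is a current one, not necessarily what Ocado used:</p>

```yaml
# kubeadm ClusterConfiguration (sketch) -- values are assumptions, not Ocado's
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: 10.32.0.0/12   # Weave Net's default IP allocation range
```

<p>After <code>kubeadm init --config cluster.yaml</code>, the Weave Net manifest would be applied as the CNI plugin and additional nodes would join with <code>kubeadm join</code>.</p>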
<p>The first app on Kubernetes, a business-critical service in the warehouses, went into production a year later. Once that app was running smoothly, a mass migration continued into 2018. Hundreds of Ocado engineers working on the Smart Platform are now deploying on Kubernetes, and the platform is live in Ocado's warehouses, managing tens of thousands of orders a week. At full capacity, Ocado's latest warehouse in Erith, southeast London, will deliver more than 200,000 orders per week, making it the world's largest facility for online grocery.</p>
<p>There are about 150 microservices now running on Kubernetes, with multiple instances of many of them. "We're not just deploying all these microservices at once. We're deploying them all for one warehouse, and then they're all being deployed again for the next warehouse, and again and again," says Bryant.</p>
<p>The move to Kubernetes was eye-opening for many people at Ocado Technology. "In the early days of putting the platform into our test infrastructure, the technical architect asked what network performance was like on <a href="https://www.weave.works/oss/net/">Weave Net</a> with encryption turned on," recalls Bryant. "So we found a Docker container for <a href="https://iperf.fr/">iPerf</a>, wrote a daemon set, deployed it. A few moments later, we've deployed the entire thing across this cluster. He was pretty blown away by that."</p>
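<p>The manifest itself isn't shown in the case study, but a DaemonSet along these lines would put an iPerf server on every node in the cluster; the names and image here are stand-ins, not Ocado's actual definition:</p>

```yaml
# DaemonSet sketch: one iPerf3 server per node (names and image are assumptions)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: iperf
spec:
  selector:
    matchLabels:
      app: iperf
  template:
    metadata:
      labels:
        app: iperf
    spec:
      containers:
        - name: iperf
          image: networkstatic/iperf3   # a public iPerf3 image
          args: ["-s"]                  # server mode; clients measure against it
          ports:
            - containerPort: 5201       # iPerf3's default port
```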
{{< case-studies/quote
image="/images/case-studies/ocado/banner4.jpg"
author="Mike Bryant, Platform Engineer, Ocado"
>}}
"The unified API of Kubernetes means this is all in one place, and it's one flow for approval and rollout. I've seen features go from development to production inside of a week now. In the old world, a new application deployment could easily take over a month."
{{< /case-studies/quote >}}
<p>Indeed, the impact has been profound. "Prior to containerization, we had quite restrictive deployment windows in our warehouses," says Bryant. "Moving to microservices, we've been able to deploy much more frequently. We've been able to move towards continuous delivery in a number of areas. In our older warehouse, new application deployments involve talking to a bunch of different teams for different levels of the stack: from VM provisioning, to storage, to load balancers, and so on. The unified API of Kubernetes means this is all in one place, and it's one flow for approval and rollout. I've seen features go from development to production inside of a week now. In the old world, a new application deployment could easily take over a month."</p>
<p>The rate of deployment has gone from as few as two per week to dozens per week. "With Kubernetes, some of our development teams have been able to deploy their application to production on the new platform without us noticing," says Bryant, "which means they're faster at doing what they need to do and we have less work."</p>
<p>Ocado has also achieved cost savings because Kubernetes gives the team the ability to have more fine-grained resource allocation. "That lets us shrink quite a lot of our deployments from being per-core VM deployments to having fractions of the core," says Bryant. Adds McCormack: "We have more confidence in the resource allocation/separation features of Kubernetes, so we have been able to migrate from around 10 fleet clusters to one Kubernetes cluster. This means we use our hardware better, since if we always have to keep two nodes of excess capacity available in case of node failures, we only need two extra instead of 20."</p>
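<p>In Kubernetes terms, that fine-grained allocation is expressed as container resource requests and limits. A sketch, with made-up values, of a workload that once claimed a whole VM core but now requests a fraction of one:</p>

```yaml
# Pod sketch showing fractional-core requests (all names and values illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: fractional-core-demo   # hypothetical name
spec:
  containers:
    - name: app
      image: nginx              # stand-in image
      resources:
        requests:
          cpu: 250m             # a quarter of a core instead of a dedicated VM core
          memory: 256Mi
        limits:
          cpu: 500m             # may burst to half a core
          memory: 512Mi
```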
{{< case-studies/quote author="Mike Bryant, Platform Engineer, Ocado" >}}
"CNCF have provided us with support of different technologies. We've been able to adopt those in a very easy fashion. We do like that CNCF is vendor agnostic. We're not being asked to commit to this one way of doing things. The vast diversity of viewpoints in CNCF lead to better technology."
{{< /case-studies/quote >}}
<p>The team also uses <a href="https://prometheus.io/">Prometheus</a> and <a href="https://grafana.com/">Grafana</a> to visualize resource allocation, and makes the data available to developers. "The increased visibility offered by Prometheus means developers are more aware of what they are using and how their use impacts others, especially since we now have one shared cluster," says McCormack. "I'd estimate that we use about 15-25% less hardware resources to host the same applications in Kubernetes in our test environments."</p>
<p>One of the broader benefits of cloud native, says Bryant, is the unified API. "We have one method of doing our deployments that covers the wide range of things we need to do, and we can extend the API," he says. In addition to using Prometheus Operator, the Ocado team has started writing its own operators, some of which have been <a href="https://github.com/ocadotechnology">open sourced</a>. Plus, "CNCF has provided us with support for these different technologies. We've been able to adopt those in a very easy fashion. We do like that CNCF is vendor agnostic. We're not being asked to commit to this one way of doing things. The vast diversity of viewpoints in the CNCF leads to better technology."</p>
<p>Ocado's own technology, in the form of its Smart Platform, will soon be used <a href="http://fortune.com/2018/05/17/ocado-kroger-warehouse-automation-amazon-walmart/">around</a> the world. And cloud native plays a crucial role in this global expansion. "I wouldn't have wanted to try it without Kubernetes," says Bryant. "Kubernetes has made it so much nicer, especially to have that consistent way of deploying all of the applications, then taking the same thing and being able to replicate it. It's very valuable."</p>

---
title: OpenAI Case Study
case_study_styles: true
cid: caseStudies
new_case_study_styles: true
heading_background: /images/case-studies/openAI/banner1.jpg
heading_title_logo: /images/openAI_logo.png
subheading: >
Launching and Scaling Up Experiments, Made Simple
case_study_details:
- Company: OpenAI
- Location: San Francisco, California
- Industry: Artificial Intelligence Research
---
<h2>Challenge</h2>
<p>An artificial intelligence research lab, OpenAI needed infrastructure for deep learning that would allow experiments to be run either in the cloud or in its own data center, and to easily scale. Portability, speed, and cost were the main drivers.</p>
<h2>Solution</h2>
<p>OpenAI began running Kubernetes on top of AWS in 2016, and in early 2017 migrated to Azure. OpenAI runs key experiments in fields including robotics and gaming both in Azure and in its own data centers, depending on which cluster has free capacity. "We use Kubernetes mainly as a batch scheduling system and rely on our <a href="https://github.com/openai/kubernetes-ec2-autoscaler">autoscaler</a> to dynamically scale up and down our cluster," says Christopher Berner, Head of Infrastructure. "This lets us significantly reduce costs for idle nodes, while still providing low latency and rapid iteration."</p>
<h2>Impact</h2>
<p>The company has benefited from greater portability: "Because Kubernetes provides a consistent API, we can move our research experiments very easily between clusters," says Berner. Being able to use its own data centers when appropriate is "lowering costs and providing us access to hardware that we wouldn't necessarily have access to in the cloud," he adds. "As long as the utilization is high, the costs are much lower there." Launching experiments also takes far less time: "One of our researchers who is working on a new distributed training system has been able to get his experiment running in two or three days. In a week or two he scaled it out to hundreds of GPUs. Previously, that would have easily been a couple of months of work."</p>
{{< case-studies/quote >}}
<iframe width="560" height="315" src="https://www.youtube.com/embed/v4N3Krzb8Eg" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>
<br>
Check out "Building the Infrastructure that Powers the Future of AI" presented by Vicki Cheung, Member of Technical Staff & Jonas Schneider, Member of Technical Staff at OpenAI from KubeCon/CloudNativeCon Europe 2017.
{{< /case-studies/quote >}}
{{< case-studies/lead >}}
From experiments in robotics to old-school video game play research, OpenAI's work in artificial intelligence technology is meant to be shared.
{{< /case-studies/lead >}}
<p>With a mission to ensure powerful AI systems are safe, OpenAI cares deeply about open source—both benefiting from it and contributing safety technology into it. "The research that we do, we want to spread it as widely as possible so everyone can benefit," says OpenAI's Head of Infrastructure Christopher Berner. The lab's philosophy—as well as its particular needs—lent itself to embracing an open source, cloud native strategy for its deep learning infrastructure.</p>
<p>OpenAI started running Kubernetes on top of AWS in 2016, and a year later, migrated the Kubernetes clusters to Azure. "We probably use Kubernetes differently from a lot of people," says Berner. "We use it for batch scheduling and as a workload manager for the cluster. It's a way of coordinating a large number of containers that are all connected together. We rely on our <a href="https://github.com/openai/kubernetes-ec2-autoscaler">autoscaler</a> to dynamically scale up and down our cluster. This lets us significantly reduce costs for idle nodes, while still providing low latency and rapid iteration."</p>
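<p>OpenAI's scheduling frameworks are internal, but the batch pattern described here maps naturally onto a Kubernetes Job that requests GPUs. In this sketch the name, image, and GPU count are all illustrative assumptions, and GPU scheduling presumes the NVIDIA device plugin is installed on the nodes:</p>

```yaml
# Batch Job sketch (not OpenAI's actual configuration)
apiVersion: batch/v1
kind: Job
metadata:
  name: train-experiment   # hypothetical experiment name
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never   # batch semantics: a failed pod is not restarted in place
      containers:
        - name: trainer
          image: registry.example.com/research/trainer:latest   # hypothetical image
          resources:
            limits:
              nvidia.com/gpu: 8   # lands the pod on a node with eight free GPUs
```

<p>A cluster autoscaler can then add nodes while such Jobs are pending and remove them when the queue drains, which is the cost behavior Berner describes.</p>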
<p>In the past year, Berner has overseen the launch of several Kubernetes clusters in OpenAI's own data centers. "We run them in a hybrid model where the control planes—the Kubernetes API servers, <a href="https://github.com/coreos/etcd">etcd</a> and everything—are all in Azure, and then all of the Kubernetes nodes are in our own data center," says Berner. "The cloud is really convenient for managing etcd and all of the masters, and having backups and spinning up new nodes if anything breaks. This model allows us to take advantage of lower costs and have the availability of more specialized hardware in our own data center."</p>
{{< case-studies/quote image="/images/case-studies/openAI/banner3.jpg" >}}
OpenAI's experiments take advantage of Kubernetes' benefits, including portability. "Because Kubernetes provides a consistent API, we can move our research experiments very easily between clusters..."
{{< /case-studies/quote >}}
<p>Different teams at OpenAI currently run a couple dozen projects. While the largest-scale workloads manage bare cloud VMs directly, most of OpenAI's experiments take advantage of Kubernetes' benefits, including portability. "Because Kubernetes provides a consistent API, we can move our research experiments very easily between clusters," says Berner. The on-prem clusters are generally "used for workloads where you need lots of GPUs, something like training an ImageNet model. Anything that's CPU heavy, that's run in the cloud. But we also have a number of teams that run their experiments both in Azure and in our own data centers, just depending on which cluster has free capacity, and that's hugely valuable."</p>
<p>Berner has made the Kubernetes clusters available to all OpenAI teams to use if it's a good fit. "I've worked a lot with our games team, which at the moment is doing research on classic console games," he says. "They had been running a bunch of their experiments on our dev servers, and they had been trying out Google cloud, managing their own VMs. We got them to try out our first on-prem Kubernetes cluster, and that was really successful. They've now moved over completely to it, and it has allowed them to scale up their experiments by 10x, and do that without needing to invest significant engineering time to figure out how to manage more machines. A lot of people are now following the same path."</p>
{{< case-studies/quote image="/images/case-studies/openAI/banner4.jpg" >}}
"One of our researchers who is working on a new distributed training system has been able to get his experiment running in two or three days," says Berner. "In a week or two he scaled it out to hundreds of GPUs. Previously, that would have easily been a couple of months of work."
{{< /case-studies/quote >}}
<p>That path has been simplified by frameworks and tools that two of OpenAI's teams have developed to handle interaction with Kubernetes. "You can just write some Python code, fill out a bit of configuration with exactly how many machines you need and which types, and then it will prepare all of those specifications and send it to the Kube cluster so that it gets launched there," says Berner. "And it also provides a bit of extra monitoring and better tooling that's designed specifically for these machine learning projects."</p>
<p>The impact that Kubernetes has had at OpenAI is impressive. With Kubernetes, the frameworks and tooling, including the autoscaler, in place, launching experiments takes far less time. "One of our researchers who is working on a new distributed training system has been able to get his experiment running in two or three days," says Berner. "In a week or two he scaled it out to hundreds of GPUs. Previously, that would have easily been a couple of months of work."</p>
<p>Plus, the flexibility they now have to use their on-prem Kubernetes cluster when appropriate is "lowering costs and providing us access to hardware that we wouldn't necessarily have access to in the cloud," he says. "As long as the utilization is high, the costs are much lower in our data center. To an extent, you can also customize your hardware to exactly what you need."</p>
{{< case-studies/quote author="CHRISTOPHER BERNER, HEAD OF INFRASTRUCTURE FOR OPENAI" >}}
"Research teams can now take advantage of the frameworks we've built on top of Kubernetes, which make it easy to launch experiments, scale them by 10x or 50x, and take little effort to manage."
{{< /case-studies/quote >}}
<p>OpenAI is also benefiting from other technologies in the CNCF cloud-native ecosystem. <a href="https://grpc.io/">gRPC</a> is used by many of its systems for communications between different services, and <a href="https://prometheus.io/">Prometheus</a> is in place "as a debugging tool if things go wrong," says Berner. "We actually haven't had any real problems in our Kubernetes clusters recently, so I don't think anyone has looked at our Prometheus monitoring in a while. If something breaks, it will be there."</p>
<p>One of the things Berner continues to focus on is Kubernetes' ability to scale, which is essential to deep learning experiments. OpenAI has been able to push one of its Kubernetes clusters on Azure up to more than <a href="https://blog.openai.com/scaling-kubernetes-to-2500-nodes/">2,500 nodes</a>. "I think we'll probably hit the 5,000-machine number that Kubernetes has been tested at before too long," says Berner, adding, "We're definitely <a href="https://jobs.lever.co/openai">hiring</a> if you're excited about working on these things!"</p>

---
title: Pear Deck Case Study
case_study_styles: true
cid: caseStudies
new_case_study_styles: true
heading_background: /images/case-studies/peardeck/banner3.jpg
heading_title_logo: /images/peardeck_logo.png
subheading: >
Infrastructure for a Growing EdTech Startup
case_study_details:
- Company: Pear Deck
- Location: Iowa City, Iowa
- Industry: Educational Software
---
<h2>Challenge</h2>
<p>The three-year-old startup provides a web app for teachers to interact with their students in the classroom. The JavaScript app was built on Google's web app development platform <a href="https://firebase.google.com/">Firebase</a>, using <a href="https://www.heroku.com/">Heroku</a>. As the user base steadily grew, so did the development team. "We outgrew Heroku when we started wanting to have multiple services, and the deploying story got pretty horrendous. We were frustrated that we couldn't have the developers quickly stage a version," says CEO Riley Eynon-Lynch. "Tracing and monitoring became basically impossible." On top of that, many of Pear Deck's customers are behind government firewalls and connect through Firebase, not Pear Deck's servers, making troubleshooting even more difficult.</p>
<h2>Solution</h2>
<p>In 2016, the company began moving their code from Heroku to <a href="https://www.docker.com/">Docker</a> containers running on <a href="https://cloud.google.com/kubernetes-engine/">Google Kubernetes Engine</a>, orchestrated by <a href="http://kubernetes.io/">Kubernetes</a> and monitored with <a href="https://prometheus.io/">Prometheus</a>.</p>
<h2>Impact</h2>
<p>The new cloud native stack immediately improved the development workflow, speeding up deployments. Prometheus gave Pear Deck "a lot of confidence, knowing that people are still logging into the app and using it all the time," says Eynon-Lynch. "The biggest impact is being able to work as a team on the configuration in git in a pull request, and the biggest confidence comes from the solidity of the abstractions and the trust that we have in Kubernetes actually making our yaml files a reality."</p>
{{< case-studies/quote author="RILEY EYNON-LYNCH, CEO OF PEAR DECK" >}}
"We didn't even realize how stressed out we were about our lack of insight into what was happening with the app. I'm really excited and have more and more confidence in the actual state of our application for our actual users, and not just what the CPU graphs are saying, because of Prometheus and Kubernetes."
{{< /case-studies/quote >}}
{{< case-studies/lead >}}
With the speed befitting a startup, Pear Deck delivered its first prototype to customers within three months of incorporating.
{{< /case-studies/lead >}}
<p>As a former high school math teacher, CEO Riley Eynon-Lynch felt an urgency to provide a tech solution to classes where instructors struggle to interact with every student in a short amount of time. "Pear Deck is an app that students can use to interact with the teacher all at once," he says. "When the teacher asks a question, instead of just the kid at the front of the room answering again, everybody can answer every single question. It's a huge fundamental shift in the messaging to the students about how much we care about them and how much they are a part of the classroom."</p>
<p>Eynon-Lynch and his partners quickly built a JavaScript web app on Google's web app development platform <a href="https://firebase.google.com/">Firebase</a>, and launched the minimum viable product [MVP] on <a href="https://www.heroku.com/">Heroku</a> "because it was fast and easy," he says. "We made everything as easy as we could."</p>
<p>But once it launched, the user base began growing steadily at a rate of 30 percent a month. "Our Heroku bill was getting totally insane," Eynon-Lynch says. But even more crucially, as the company hired more developers to keep pace, "we outgrew Heroku. We wanted to have multiple services and the deploying story got pretty horrendous. We were frustrated that we couldn't have the developers quickly stage a version. Tracing and monitoring became basically impossible."</p>
<p>On top of that, many of Pear Deck's customers are behind government firewalls and connect through Firebase, not Pear Deck's servers, making troubleshooting even more difficult.</p>
<p>The team began looking around for another solution, and finally decided in early 2016 to start moving the app from Heroku to <a href="https://www.docker.com/">Docker</a> containers running on <a href="https://cloud.google.com/kubernetes-engine/">Google Kubernetes Engine</a>, orchestrated by <a href="http://kubernetes.io/">Kubernetes</a> and monitored with <a href="https://prometheus.io/">Prometheus</a>.</p>
{{< case-studies/quote image="/images/case-studies/peardeck/banner1.jpg" >}}
"When it became clear that Google Kubernetes Engine was going to have a lot of support from Google and be a fully-managed Kubernetes platform, it seemed very obvious to us that was the way to go," says Eynon-Lynch.
{{< /case-studies/quote >}}
<p>They had considered other options like Google's App Engine (which they were already using for one service) and Amazon's <a href="https://aws.amazon.com/ec2/">Elastic Compute Cloud</a> (EC2), while experimenting with running one small service that wasn't accessible to the Internet in Kubernetes. "When it became clear that Google Kubernetes Engine was going to have a lot of support from Google and be a fully-managed Kubernetes platform, it seemed very obvious to us that was the way to go," says Eynon-Lynch. "We didn't really consider Terraform and the other competitors because the abstractions offered by Kubernetes just jumped off the page to us."</p>
<p>Once the team started porting its Heroku apps into Kubernetes, which was "super easy," he says, the impact was immediate. "Before, to make a new version of the app meant going to Heroku and reconfiguring 10 new services, so basically no one was willing to do it, and we never staged things," he says. "Now we can deploy our exact same configuration in lots of different clusters in 30 seconds. We have a full set up that's always running, and then any of our developers or designers can stage new versions with one command, including their recent changes. We stage all the time now, and everyone stopped talking about how cool it is because it's become invisible how great it is."</p>
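<p>A minimal sketch of that "same configuration, many clusters" workflow, assuming one kubeconfig context per cluster and a manifest file on disk (the context names and path are illustrative):</p>

```python
# Apply one manifest to several clusters; a sketch, not Pear Deck's tooling.
from kubernetes import config, utils

CONTEXTS = ["staging-us", "staging-eu", "production"]  # hypothetical kubeconfig contexts

for ctx in CONTEXTS:
    api_client = config.new_client_from_config(context=ctx)
    # Creates every object defined in the manifest on this cluster.
    utils.create_from_yaml(api_client, "app-deployment.yaml", namespace="default")
    print(f"deployed to {ctx}")
```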
<p>Along with Kubernetes came Prometheus. "Until pretty recently we didn't have any kind of visibility into aggregate server metrics or performance," says Eynon-Lynch. The team had tried to use Google Kubernetes Engine's <a href="https://cloud.google.com/stackdriver/">Stackdriver</a> monitoring, but had problems making it work, and considered <a href="https://newrelic.com/">New Relic</a>. When they started looking at Prometheus in the fall of 2016, "the fit between the abstractions in Prometheus and the way we think about how our system works was so clear and obvious," he says.</p>
<p>The integration with Kubernetes made set-up easy. Once Helm installed Prometheus, "We started getting a graph of the health of all our Kubernetes nodes and pods immediately. I think we were pretty hooked at that point," Eynon-Lynch says. "Then we got our own custom instrumentation working in 15 minutes, and had an actively updated count of requests that we could do, rate on and get a sense of how many users are connected at a given point. And then it was another hour before we had alarms automatically showing up in our Slack channel. All that was in one afternoon. And it was an afternoon of gasping with delight, basically!"</p>
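<p>The kind of 15-minute custom instrumentation he describes usually amounts to a few lines with the prometheus_client library; the metric names here are illustrative:</p>

```python
# Expose a request counter and a connected-user gauge for Prometheus to scrape.
import random
import time

from prometheus_client import Counter, Gauge, start_http_server

REQUESTS = Counter("app_requests_total", "Requests handled by the app")
CONNECTED_USERS = Gauge("app_connected_users", "Users currently connected")

start_http_server(8000)  # serves /metrics on port 8000

while True:
    REQUESTS.inc()  # in PromQL: rate(app_requests_total[5m])
    CONNECTED_USERS.set(random.randint(50, 100))  # stand-in for a real count
    time.sleep(1)
```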
{{< case-studies/quote image="/images/case-studies/peardeck/banner2.jpg" >}}
"We started getting a graph of the health of all our Kubernetes nodes and pods immediately. I think we were pretty hooked at that point," Eynon-Lynch says. "Then we got our own custom instrumentation working in 15 minutes, and had an actively updated count of requests that we could do, rate on and get a sense of how many users are connected at a given point. And then it was another hour before we had alarms automatically showing up in our Slack channel. All that was in one afternoon. And it was an afternoon of gasping with delight, basically!"
{{< /case-studies/quote >}}
<p>With Pear Deck's specific challenges—traffic through Firebase as well as government firewalls—Prometheus was a game-changer. "We didn't even realize how stressed out we were about our lack of insight into what was happening with the app," Eynon-Lynch says. Before, when a customer would report that the app wasn't working, the team had to manually investigate the problem without knowing whether customers were affected all over the world, or whether Firebase was down, and where.</p>
<p>To help solve that problem, the team wrote a script that pings Firebase from several different geographical locations, and then reports the responses to Prometheus in a histogram. "A huge impact that Prometheus had on us was just an amazing sigh of relief, of feeling like we knew what was happening," he says. "It took 45 minutes to implement [the Firebase alarm] because we knew that we had this trustworthy metrics platform in Prometheus. We weren't going to have to figure out, 'Where do we send these metrics? How do we aggregate the metrics? How do we understand them?'"</p>
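<p>A probe like the one described might look like the sketch below: time a request to Firebase from a given region and record it in a Prometheus histogram via the Pushgateway (the URL, region label, and gateway address are assumptions):</p>

```python
# Ping Firebase and report latency to Prometheus; a sketch of the script above.
import time

import requests
from prometheus_client import CollectorRegistry, Histogram, push_to_gateway

registry = CollectorRegistry()
LATENCY = Histogram(
    "firebase_ping_seconds", "Firebase round-trip time", ["region"],
    registry=registry,
)

def probe(region: str, url: str = "https://example.firebaseio.com/.json"):
    start = time.monotonic()
    requests.get(url, timeout=10)
    LATENCY.labels(region=region).observe(time.monotonic() - start)

probe("us-east")
push_to_gateway("pushgateway.example.com:9091", job="firebase_probe", registry=registry)
```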
<p>Plus, Prometheus has allowed Pear Deck to build alarms for business goals. One measures the rate of successful app loads and goes off if the day's loads are less than 90 percent of the loads from seven days before. "We run a JavaScript app behind ridiculous firewalls and all kinds of crazy browser extensions messing with it—Chrome will push a feature that breaks some CSS that we're using," Eynon-Lynch says. "So that gives us a lot of confidence, and we at least know that people are still logging into the app and using it all the time."</p>
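<p>The "compare today to the same day last week" logic maps naturally onto PromQL's <code>offset</code> modifier; the sketch below runs it through the Prometheus HTTP API (the metric name and server address are assumptions, and it assumes the query returns a result):</p>

```python
# Alarm if today's successful app loads drop below 90% of seven days ago.
import requests

PROM = "http://prometheus.example.com:9090/api/v1/query"
QUERY = (
    "sum(increase(app_loads_successful_total[1d])) "
    "/ sum(increase(app_loads_successful_total[1d] offset 7d))"
)

result = requests.get(PROM, params={"query": QUERY}).json()["data"]["result"]
ratio = float(result[0]["value"][1])  # assumes a non-empty instant vector
if ratio < 0.9:
    print(f"ALARM: app loads at {ratio:.0%} of the same day last week")
```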
<p>Now, when a customer complains, and none of the alarms have gone off, the team can feel confident that it's not a widespread problem. "Just to be sure, we can go and double check the graphs and say, 'Yep, there's currently 10,000 people connected to that Firebase node. It's definitely working. Let's investigate your network settings, customer,'" he says. "And we can pass that back off to our support reps instead of the whole development team freaking out that Firebase is down."</p>
<p>Pear Deck is also giving back to the community, building and open-sourcing a <a href="https://github.com/peardeck/prometheus-user-metrics">metrics aggregator</a> that enables end-user monitoring in Prometheus. "We can measure, for example, the time to interactive-dom on the web clients," he says. "The users all report that to our aggregator, then the aggregator reports to Prometheus. So we can set an alarm for some client side errors."</p>
<p>Most of Pear Deck's services have now been moved onto Kubernetes. And all of the team's new code is going on Kubernetes. "Kubernetes lets us experiment with service configurations and stage them on a staging cluster all at once, and test different scenarios and talk about them as a development team looking at code, not just talking about the steps we would eventually take as humans," says Eynon-Lynch.</p>
{{< case-studies/quote >}}
"A huge impact that Prometheus had on us was just an amazing sigh of relief, of feeling like we knew what was happening. It took 45 minutes to implement [the Firebase alarm] because we knew that we had this trustworthy metrics platform in Prometheus...in terms of the cloud, Kubernetes and Prometheus have so much to offer," he says.
{{< /case-studies/quote >}}
<p>Looking ahead, the team is planning to explore autoscaling on Kubernetes. With users all over the world but mostly in the United States, there are peaks and valleys in the traffic. One service that's still on App Engine can get as many as 10,000 requests a second during the day but far fewer at night. "We pay for the same servers at night, so I understand there's autoscaling that we can be taking advantage of," he says. "Implementing it is a big worry, exposing the rest of our Kubernetes cluster to us and maybe messing that up. But it's definitely our intention to move everything over, because now none of the developers want to work on that app anymore because it's such a pain to deploy it."</p>
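<p>In Kubernetes terms, that usually means a Horizontal Pod Autoscaler. A minimal sketch via the Python client, with a hypothetical deployment name and scaling bounds:</p>

```python
# Create an HPA that scales a deployment between 2 and 40 replicas on CPU load.
from kubernetes import client, config

config.load_kube_config()
hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="request-service"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="request-service",
        ),
        min_replicas=2,   # quiet overnight traffic
        max_replicas=40,  # daytime peaks
        target_cpu_utilization_percentage=70,
    ),
)
client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa,
)
```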
<p>They're also eager to explore the work that Kubernetes is doing with stateful sets. "Right now all of the services we run in Kubernetes are stateless, and Google basically runs our databases for us and manages backups," Eynon-Lynch says. "But we're interested in building our own web-socket solution that doesn't have to be super stateful but will have maybe an hour's worth of state on it."</p>
<p>That project will also involve Prometheus, for a dark launch of web socket connections. "We don't know how reliable web socket connections behind all these horrible firewalls will be to our servers," he says. "We don't know what work Firebase has done to make them more reliable. So I'm really looking forward to trying to get persistent connections with web sockets to our clients and have optional tools to understand if it's working. That's our next new adventure, into stateful servers."</p>
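<p>In Kubernetes, a service holding an hour's worth of connection state would typically run as a StatefulSet, which gives each pod a stable identity. A sketch via the Python client, with hypothetical names:</p>

```python
# Create a 3-replica StatefulSet for a web-socket gateway; names are illustrative.
from kubernetes import client, config

config.load_kube_config()
sts = client.V1StatefulSet(
    metadata=client.V1ObjectMeta(name="websocket-gateway"),
    spec=client.V1StatefulSetSpec(
        service_name="websocket-gateway",  # headless Service for stable per-pod DNS
        replicas=3,
        selector=client.V1LabelSelector(match_labels={"app": "websocket-gateway"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "websocket-gateway"}),
            spec=client.V1PodSpec(containers=[
                client.V1Container(name="gateway", image="registry.example.com/ws-gateway:1.0"),
            ]),
        ),
    ),
)
client.AppsV1Api().create_namespaced_stateful_set(namespace="default", body=sts)
```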
<p>As for Prometheus, Eynon-Lynch thinks the company has only gotten started. "We haven't instrumented all our important features, especially those that depend on third parties," he says. "We have to wait for those third parties to tell us they're down, which sometimes they don't do for a long time. So I'm really excited and have more and more confidence in the actual state of our application for our actual users, and not just what the CPU graphs are saying, because of Prometheus and Kubernetes."</p>
<p>For a spry startup that's continuing to grow rapidly—and yes, they're <a href="https://www.peardeck.com/careers/">hiring</a>!—Pear Deck is notably satisfied with how its infrastructure has evolved in the cloud native ecosystem. "Usually I have some angsty thing where I want to get to the new, better technology," says Eynon-Lynch, "but in terms of the cloud, Kubernetes and Prometheus have so much to offer."</p>


@ -1,83 +0,0 @@
---
title: Pearson Case Study
linkTitle: Pearson
case_study_styles: true
cid: caseStudies
featured: false
quote: >
We're already seeing tremendous benefits with Kubernetes—improved engineering productivity, faster delivery of applications and a simplified infrastructure. But this is just the beginning. Kubernetes will help transform the way that educational content is delivered online.
new_case_study_styles: true
heading_background: /images/case-studies/pearson/banner1.jpg
heading_title_logo: /images/pearson_logo.png
subheading: >
Reinventing the World's Largest Education Company With Kubernetes
case_study_details:
- Company: Pearson
- Location: Global
- Industry: Education
---
<h2>Challenge</h2>
<p>A global education company serving 75 million learners, Pearson set a goal to more than double that number, to 200 million, by 2025. A key part of this growth is in digital learning experiences, and Pearson was having difficulty in scaling and adapting to its growing online audience. They needed an infrastructure platform that would be able to scale quickly and deliver products to market faster.</p>
<h2>Solution</h2>
<p>"To transform our infrastructure, we had to think beyond simply enabling automated provisioning," says Chris Jackson, Director for Cloud Platforms & SRE at Pearson. "We realized we had to build a platform that would allow Pearson developers to build, manage and deploy applications in a completely different way." The team chose Docker container technology and Kubernetes orchestration "because of its flexibility, ease of management and the way it would improve our engineers' productivity."</p>
<h2>Impact</h2>
<p>With the platform, there have been substantial improvements in productivity and speed of delivery. "In some cases, we've gone from nine months to provision physical assets in a data center to just a few minutes to provision and get a new idea in front of a customer," says John Shirley, Lead Site Reliability Engineer for the Cloud Platform Team. Jackson estimates they've achieved 15-20% developer productivity savings. Before, outages were an issue during their busiest time of year, the back-to-school period. Now, there's high confidence in their ability to meet aggressive customer SLAs.</p>
{{< case-studies/quote author="Chris Jackson, Director for Cloud Platforms & SRE at Pearson" >}}
"We're already seeing tremendous benefits with Kubernetes—improved engineering productivity, faster delivery of applications and a simplified infrastructure. But this is just the beginning. Kubernetes will help transform the way that educational content is delivered online."
{{< /case-studies/quote >}}
<p>In 2015, Pearson was already serving 75 million learners as the world's largest education company, offering curriculum and assessment tools for Pre-K through college and beyond. Understanding that innovating the digital education experience was the key to the future of all forms of education, the company set out to increase its reach to 200 million people by 2025.</p>
<p>That goal would require a transformation of its existing infrastructure, which was in data centers. In some cases, it took nine months to provision physical assets. In order to adapt to the demands of its growing online audience, Pearson needed an infrastructure platform that would be able to scale quickly and deliver business-critical products to market faster. "We had to think beyond simply enabling automated provisioning," says Chris Jackson, Director for Cloud Platforms & SRE at Pearson. "We realized we had to build a platform that would allow Pearson developers to build, manage and deploy applications in a completely different way."</p>
<p>With 400 development groups and diverse brands with varying business and technical needs, Pearson embraced Docker container technology so that each brand could experiment with building new types of content using their preferred technologies, and then deliver it using containers. Jackson chose Kubernetes orchestration "because of its flexibility, ease of management and the way it would improve our engineers' productivity," he says.</p>
<p>The team adopted Kubernetes when it was still version 1.2 and are still going strong now on 1.7; they use Terraform and Ansible to deploy it onto basic AWS primitives. "We were trying to understand how we can create value for Pearson from this technology," says Ben Somogyi, Principal Architect for the Cloud Platforms. "It turned out that Kubernetes' benefits are huge. We're trying to help our applications development teams that use our platform go faster, so we filled that gap with a CI/CD pipeline that builds their images for them, standardizes them, patches everything up, allows them to deploy their different environments onto the cluster, and obfuscates the details of how difficult the work underneath the covers is."</p>
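<p>One step of such a pipeline, reduced to a sketch (the registry, app name, and rolling-update step below are illustrative assumptions, not Pearson's actual tooling): build and push a standardized image, then point the deployment at it so Kubernetes performs a rolling update.</p>

```python
# Build, push, and roll out one service; a hypothetical pipeline step.
import docker
from kubernetes import client, config

def build_and_deploy(app: str, version: str, registry: str = "registry.example.com"):
    tag = f"{registry}/{app}:{version}"
    d = docker.from_env()
    d.images.build(path=f"./{app}", tag=tag)  # Dockerfile provides the standard base
    d.images.push(tag)

    config.load_kube_config()
    # Patching the container image triggers a rolling update of the deployment.
    client.AppsV1Api().patch_namespaced_deployment(
        name=app, namespace="default",
        body={"spec": {"template": {"spec": {"containers": [{"name": app, "image": tag}]}}}},
    )

build_and_deploy("assessment-api", "1.4.2")
```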
{{< case-studies/quote
image="/images/case-studies/pearson/banner3.jpg"
author="Chris Jackson, Director for Cloud Platforms & SRE at Pearson"
>}}
"Your internal customers need to feel like they are choosing the very best option for them. We are experiencing this first hand in the growth of adoption. We are seeing triple-digit, year-on-year growth of the service."
{{< /case-studies/quote >}}
<p>That work resulted in two tools for building and deploying applications in the cluster that Pearson has open sourced. "We're an education company, so we want to share what we can," says Somogyi.</p>
<p>Now that development teams no longer have to worry about infrastructure, there have been substantial improvements in productivity and speed of delivery. "In some cases, we've gone from nine months to provision physical assets in a data center to just a few minutes to provision and to get a new idea in front of a customer," says John Shirley, Lead Site Reliability Engineer for the Cloud Platform Team.</p>
<p>According to Jackson, the Cloud Platforms team can "provision a new proof-of-concept environment for a development team in minutes, and then they can take that to production as quickly as they are able to. This is the value proposition of all major technology services, and we had to compete like one to become our developers' preferred choice. Just because you work for the same company, you do not have the right to force people into a mediocre service. Your internal customers need to feel like they are choosing the very best option for them. We are experiencing this first hand in the growth of adoption. We are seeing triple-digit, year-on-year growth of the service."</p>
<p>Jackson estimates they've achieved a 15-20% boost in productivity for developer teams who adopt the platform. They also see a reduction in the number of customer-impacting incidents. Plus, says Jackson, "Teams who were previously limited to 1-2 releases per academic year can now ship code multiple times per day!"</p>
{{< case-studies/quote
image="/images/case-studies/pearson/banner4.jpg"
author="Chris Jackson, Director for Cloud Platforms & SRE at Pearson"
>}}
"Teams who were previously limited to 1-2 releases per academic year can now ship code multiple times per day!"
{{< /case-studies/quote >}}
<p>Availability has also been positively impacted. The back-to-school period is the company's busiest time of year, and "you have to keep applications up," says Somogyi. Before, this was a pain point for the legacy infrastructure. Now, for the applications that have been migrated to the Kubernetes platform, "We have 100% uptime. We're not worried about 9s. There aren't any. It's 100%, which is pretty astonishing for us, compared to some of the existing platforms that have legacy challenges," says Shirley.</p>
<p>"You can't even begin to put a price on how much that saves the company," Jackson explains. "A reduction in the number of support cases takes load out of our operations. The customer sentiment of having a reliable product drives customer retention and growth. It frees us to think about investing more into our digital transformation and taking a better quality of education to a global scale."</p>
<p>The platform itself is also being broken down, "so we can quickly release smaller pieces of the platform, like upgrading our Kubernetes or all the different modules that make up our platform," says Somogyi. "One of the big focuses in 2018 is this scheme of delivery to update the platform itself."</p>
<p>Guided by Pearson's overarching goal of getting to 200 million users, the team has run internal tests of the platform's scalability. "We had a challenge: 28 million requests within a 10-minute period," says Shirley. "And we demonstrated that we can hit that, with an acceptable latency. We saw that we could actually get that pretty readily, and we scaled up in just a few seconds, using open source tools entirely. Shout out to <a href="https://locust.io/">Locust</a> for that one. So that's amazing."</p>
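<p>Locust load tests are plain Python; a test in the spirit of the one described could be as small as the sketch below (the endpoints are assumptions). Run it with <code>locust -f locustfile.py --host=https://staging.example.com</code>.</p>

```python
# locustfile.py: simulate learners hitting two hypothetical endpoints.
from locust import HttpUser, between, task

class LearnerUser(HttpUser):
    wait_time = between(1, 3)  # seconds each simulated user waits between tasks

    @task(3)
    def load_course(self):
        self.client.get("/courses/123")

    @task(1)
    def submit_answer(self):
        self.client.post("/courses/123/answers", json={"question": 1, "answer": "B"})
```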
{{< case-studies/quote author="Benjamin Somogyi, Principal Systems Architect at Pearson" >}}
"We have 100% uptime. We're not worried about 9s. There aren't any. It's 100%, which is pretty astonishing for us, compared to some of the existing platforms that have legacy challenges. You can't even begin to put a price on how much that saves the company."
{{< /case-studies/quote >}}
<p>In just two years, "We're already seeing tremendous benefits with Kubernetes—improved engineering productivity, faster delivery of applications and a simplified infrastructure," says Jackson. "But this is just the beginning. Kubernetes will help transform the way that educational content is delivered online."</p>
<p>So far, about 15 production products are running on the new platform, including Pearson's new flagship digital education service, the Global Learning Platform. The Cloud Platform team continues to prepare, onboard and support customers that are a good fit for the platform. Some existing products will be refactored into 12-factor apps, while others are being developed so that they can live on the platform from the get-go. "There are challenges with bringing in new customers of course, because we have to help them to see a different way of developing, a different way of building," says Shirley.</p>
<p>But, he adds, "It is our corporate motto: Always Learning. We encourage those teams that haven't started a cloud native journey, to see the future of technology, to learn, to explore. It will pique your interest. Keep learning."</p>


@ -1,84 +0,0 @@
---
title: Pinterest Case Study
linkTitle: Pinterest
case_study_styles: true
cid: caseStudies
featured: false
weight: 30
quote: >
We are in the position to run things at scale, in a public cloud environment, and test things out in way that a lot of people might not be able to do.
new_case_study_styles: true
heading_background: /images/case-studies/pinterest/banner1.jpg
heading_title_logo: /images/pinterest_logo.png
subheading: >
Pinning Its Past, Present, and Future on Cloud Native
case_study_details:
- Company: Pinterest
- Location: San Francisco, California
- Industry: Web and Mobile App
---
<h2>Challenge</h2>
<p>After eight years in existence, Pinterest had grown into 1,000 microservices and multiple layers of infrastructure and diverse set-up tools and platforms. In 2016 the company launched a roadmap towards a new compute platform, led by the vision of creating the fastest path from an idea to production, without making engineers worry about the underlying infrastructure.</p>
<h2>Solution</h2>
<p>The first phase involved moving services to Docker containers. Once these services went into production in early 2017, the team began looking at orchestration to help create efficiencies and manage them in a decentralized way. After an evaluation of various solutions, Pinterest went with Kubernetes.</p>
<h2>Impact</h2>
<p>"By moving to Kubernetes the team was able to build on-demand scaling and new failover policies, in addition to simplifying the overall deployment and management of a complicated piece of infrastructure such as Jenkins," says Micheal Benedict, Product Manager for the Cloud and the Data Infrastructure Group at Pinterest. "We not only saw reduced build times but also huge efficiency wins. For instance, the team reclaimed over 80 percent of capacity during non-peak hours. As a result, the Jenkins Kubernetes cluster now uses 30 percent less instance-hours per-day when compared to the previous static cluster."</p>
{{< case-studies/quote author="Micheal Benedict, Product Manager for the Cloud and the Data Infrastructure Group at Pinterest" >}}
<iframe width="500" height="260" src="https://www.youtube.com/embed/I36NNJH1xQY" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<br>
"So far it's been good, especially the elasticity around how we can configure our Jenkins workloads on that Kubernetes shared cluster. That is the win we were pushing for."
{{< /case-studies/quote >}}
{{< case-studies/lead >}}
Pinterest was born on the cloud—running on <a href="https://aws.amazon.com/">AWS</a> since day one in 2010—but even cloud native companies can experience some growing pains.
{{< /case-studies/lead >}}
<p>Since its launch, Pinterest has become a household name, with more than 200 million active monthly users and 100 billion objects saved. Underneath the hood, there are 1,000 microservices running and hundreds of thousands of data jobs.</p>
<p>With such growth came layers of infrastructure and diverse set-up tools and platforms for the different workloads, resulting in an inconsistent and complex end-to-end developer experience, and ultimately less velocity to get to production. So in 2016, the company launched a roadmap toward a new compute platform, led by the vision of having the fastest path from an idea to production, without making engineers worry about the underlying infrastructure.</p>
<p>The first phase involved moving to Docker. "Pinterest has been heavily running on virtual machines, on EC2 instances directly, for the longest time," says Micheal Benedict, Product Manager for the Cloud and the Data Infrastructure Group. "To solve the problem around packaging software and not make engineers own portions of the fleet and those kinds of challenges, we standardized the packaging mechanism and then moved that to the container on top of the VM. Not many drastic changes. We didn't want to boil the ocean at that point."</p>
{{< case-studies/quote
image="/images/case-studies/pinterest/banner3.jpg"
author="MICHEAL BENEDICT, PRODUCT MANAGER FOR THE CLOUD AND THE DATA INFRASTRUCTURE GROUP AT PINTEREST"
>}}
"Though Kubernetes lacked certain things we wanted, we realized that by the time we get to productionizing many of those things, we'll be able to leverage what the community is doing."
{{< /case-studies/quote >}}
<p>The first service that was migrated was the monolith API fleet that powers most of Pinterest. At the same time, Benedict's infrastructure governance team built chargeback and capacity planning systems to analyze how the company uses its virtual machines on AWS. "It became clear that running on VMs is just not sustainable with what we're doing," says Benedict. "A lot of resources were underutilized. There were efficiency efforts, which worked fine at a certain scale, but now you have to move to a more decentralized way of managing that. So orchestration was something we thought could help solve that piece."</p>
<p>That led to the second phase of the roadmap. In July 2017, after an eight-week evaluation period, the team chose Kubernetes over other orchestration platforms. "Kubernetes lacked certain things at the time—for example, we wanted Spark on Kubernetes," says Benedict. "But we realized that the dev cycles we would put in to even try building that are well worth the outcome, both for Pinterest as well as the community. We've been in those conversations in the Big Data SIG. We realized that by the time we get to productionizing many of those things, we'll be able to leverage what the community is doing."</p>
<p>At the beginning of 2018, the team began onboarding its first use case into the Kubernetes system: Jenkins workloads. "Although we have builds happening during a certain period of the day, we always need to allocate peak capacity," says Benedict. "They don't have any auto-scaling capabilities, so that capacity stays constant. It is difficult to speed up builds because ramping up takes more time. So given those kind of concerns, we thought that would be a perfect use case for us to work on."</p>
{{< case-studies/quote
image="/images/case-studies/pinterest/banner4.jpg"
author="MICHEAL BENEDICT, PRODUCT MANAGER FOR THE CLOUD AND THE DATA INFRASTRUCTURE GROUP AT PINTEREST"
>}}
"So far it's been good, especially the elasticity around how we can configure our Jenkins workloads on Kubernetes shared cluster. That is the win we were pushing for."
{{< /case-studies/quote >}}
<p>They ramped up the cluster, and working with a team of four people, got the Jenkins Kubernetes cluster ready for production. "We still have our static Jenkins cluster," says Benedict, "but on Kubernetes, we are doing similar builds, testing the entire pipeline, getting the artifact ready and just doing the comparison to see, how much time did it take to build over here. Is the SLA okay, is the artifact generated correct, are there issues there?"</p>
<p>"So far it's been good," he adds, "especially the elasticity around how we can configure our Jenkins workloads on Kubernetes shared cluster. That is the win we were pushing for."</p>
<p>By the end of Q1 2018, the team successfully migrated Jenkins Master to run natively on Kubernetes and also collaborated on the <a href="https://github.com/jenkinsci/kubernetes-plugin">Jenkins Kubernetes Plugin</a> to manage the lifecycle of workers. "We're currently building the entire Pinterest JVM stack (one of the larger monorepos at Pinterest which was recently bazelized) on this new cluster," says Benedict. "At peak, we run thousands of pods on a few hundred nodes. Overall, by moving to Kubernetes, the team was able to build on-demand scaling and new failover policies, in addition to simplifying the overall deployment and management of a complicated piece of infrastructure such as Jenkins. We not only saw reduced build times but also huge efficiency wins. For instance, the team reclaimed over 80 percent of capacity during non-peak hours. As a result, the Jenkins Kubernetes cluster now uses 30 percent fewer instance-hours per day compared to the previous static cluster."</p>
{{< case-studies/quote author="MICHEAL BENEDICT, PRODUCT MANAGER FOR THE CLOUD AND THE DATA INFRASTRUCTURE GROUP AT PINTEREST">}}
"We are in the position to run things at scale, in a public cloud environment, and test things out in way that a lot of people might not be able to do."
{{< /case-studies/quote >}}
<p>Benedict points to a "pretty robust roadmap" going forward. In addition to the Pinterest big data team's experiments with Spark on Kubernetes, the company collaborated with Amazon's EKS team on an ENI/CNI plug-in.</p>
<p>Once the Jenkins cluster is up and running out of dark mode, Benedict hopes to establish best practices, including having governance primitives established—including integration with the chargeback system—before moving on to migrating the next service. "We have a healthy pipeline of use-cases to be on-boarded. After Jenkins, we want to enable support for Tensorflow and Apache Spark. At some point, we aim to move the company's monolithic API service. If we move that and understand the complexity around that, it builds our confidence," says Benedict. "It sets us up for migration of all our other services."</p>
<p>After years of being a cloud native pioneer, Pinterest is eager to share its ongoing journey. "We are in the position to run things at scale, in a public cloud environment, and test things out in way that a lot of people might not be able to do," says Benedict. "We're in a great position to contribute back some of those learnings."</p>


@ -1,79 +0,0 @@
---
title: SlingTV Case Study
linkTitle: Sling TV
case_study_styles: true
cid: caseStudies
featured: true
weight: 49
quote: >
I would almost be so bold as to say that most of these applications that we are building now would not have been possible without the cloud native patterns and the flexibility that Kubernetes enables.
new_case_study_styles: true
heading_background: /images/case-studies/slingtv/banner1.jpg
heading_title_logo: /images/slingtv_logo.png
subheading: >
Sling TV: Marrying Kubernetes and AI to Enable Proper Web Scale
case_study_details:
- Company: Sling TV
- Location: Englewood, Colorado
- Industry: Streaming television
---
<h2>Challenge</h2>
<p>Launched by DISH Network in 2015, Sling TV experienced great customer growth from the beginning. After just a year, "we were going through some growing pains of some of the legacy systems and trying to find the right architecture to enable our future," says Brad Linder, Sling TV's Cloud Native & Big Data Evangelist. The company has particular challenges: "We take live TV and distribute it over the internet out to a user's device that we do not control," says Linder. "In a lot of ways, we are working in the Wild West: The internet is what it is going to be, and if a customer's service does not work for whatever reason, they do not care why. They just want things to work. Those are the variables of the equation that we have to try to solve. We really have to try to enable optionality and good customer experience at web scale."</p>
<h2>Solution</h2>
<p>Led by the belief that "the cloud native architectures and patterns really give us a lot of flexibility in meeting the needs of that sort of customer base," Linder partnered with <a href="http://rancher.com">Rancher Labs</a> to build Sling TV's next-generation platform around Kubernetes. "We are going to need to enable a hybrid cloud strategy including multiple public clouds and an on-premise VMware multi data center environment to meet the needs of the business at some point, so getting that sort of abstraction was a real goal," he says. "That is one of the biggest reasons why we picked Kubernetes." The team launched its first applications on Kubernetes in Sling TV's two internal data centers. The push to enable AWS as a data center option is underway and should be available by the end of 2018. The team has added <a href="https://prometheus.io/">Prometheus</a> for monitoring and <a href="https://github.com/jaegertracing/jaeger">Jaeger</a> for tracing, to work alongside the company's existing tool sets: Zenoss, New Relic and ELK.</p>
<h2>Impact</h2>
<p>"We are getting to the place where we can one-click deploy an entire data center the compute, network, Kubernetes, logging, monitoring and all the apps," says Linder. "We have really enabled a platform thinking based approach to allowing applications to consume common tools. A new application can be onboarded in about an hour using common tooling and CI/CD processes. The gains on that side have been huge. Before, it took at least a few days to get things sorted for a new application to deploy. That does not consider the training of our operations staff to manage this new application. It is two or three orders of magnitude of savings in time and cost, and operationally it has given us the opportunity to let a core team of talented operations engineers manage common infrastructure and tooling to make our applications available at web scale."</p>
{{< case-studies/quote author="Brad Linder, Cloud Native & Big Data Evangelist for Sling TV" >}}
"I would almost be so bold as to say that most of these applications that we are building now would not have been possible without the cloud native patterns and the flexibility that Kubernetes enables."
{{< /case-studies/quote >}}
{{< case-studies/lead >}}
The beauty of streaming television, like the service offered by <a href="https://www.sling.com/">Sling TV</a>, is that you can watch it from any device you want, wherever you want.
{{< /case-studies/lead >}}
<p>Of course, from the provider side of things, that creates a particular set of challenges: "We take live TV and distribute it over the internet out to a user's device that we do not control," says Brad Linder, Sling TV's Cloud Native & Big Data Evangelist. "In a lot of ways, we are working in the Wild West: The internet is what it is going to be, and if a customer's service does not work for whatever reason, they do not care why. They just want things to work. Those are the variables of the equation that we have to try to solve. We really have to try to enable optionality and we have to do it at web scale."</p>
<p>Indeed, Sling TV experienced great customer growth from the beginning of its launch by <a href="https://www.dish.com/">DISH Network</a> in 2015. After just a year, "we were going through some growing pains of some of the legacy systems and trying to find the right architecture to enable our future," says Linder. Tasked with building a next-generation web scale platform for the "personalized customer experience," Linder has spent the past year bringing Kubernetes to Sling TV.</p>
<p>Led by the belief that "the cloud native architectures and patterns really give us a lot of flexibility in meeting the needs of our customers," Linder partnered with <a href="http://rancher.com">Rancher Labs</a> to build the platform around Kubernetes. "They have really helped us get our head around how to use Kubernetes," he says. "We needed the flexibility to enable our use case versus just a simple orchestrator. Enabling our future in a way that did not give us vendor lock-in was also a key part of our strategy. I think that is part of the Rancher value proposition."</p>
{{< case-studies/quote
image="/images/case-studies/slingtv/banner3.jpg"
author="Brad Linder, Cloud Native & Big Data Evangelist for Sling TV"
>}}
"We needed the flexibility to enable our use case versus just a simple orchestrater. Enabling our future in a way that did not give us vendor lock-in was also a key part of our strategy. I think that is part of the Rancher value proposition."
{{< /case-studies/quote >}}
<p>One big reason he chose Kubernetes was getting a level of abstraction that would enable the company to "enable a hybrid cloud strategy including multiple public clouds and an on-premise VMware multi data center environment to meet the needs of the business," he says. Another factor was how much the Kubernetes ecosystem has matured over the past couple of years. "We have spent a lot of time and energy around making logging, monitoring and alerting production ready to give us insights into applications' well-being," says Linder. The team has added <a href="https://prometheus.io/">Prometheus</a> for monitoring and <a href="https://github.com/jaegertracing/jaeger">Jaeger</a> for tracing, to work alongside the company's existing tool sets: Zenoss, New Relic and ELK.</p>
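<p>Wiring a service for Jaeger tracing is a few lines with the jaeger-client Python library; the service name, sampler settings, and span below are illustrative assumptions:</p>

```python
# Initialize a Jaeger tracer and record one span; a sketch, not Sling TV's code.
from jaeger_client import Config

def init_tracer(service_name: str):
    cfg = Config(
        config={
            "sampler": {"type": "const", "param": 1},  # sample every request
            "logging": True,
        },
        service_name=service_name,
        validate=True,
    )
    return cfg.initialize_tracer()

tracer = init_tracer("notification-service")
with tracer.start_span("push-notification") as span:
    span.set_tag("channel", "websocket")
```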
<p>With the emphasis on common tooling, "We are getting to the place where we can one-click deploy an entire data center: the compute, network, Kubernetes, logging, monitoring and all the apps," says Linder. "We have really enabled a platform thinking based approach to allowing applications to consume common tools and services. A new application can be onboarded in about an hour using common tooling and CI/CD processes. The gains on that side have been huge. Before, it took at least a few days to get things sorted for a new application to deploy. That does not consider the training of our operations staff to manage this new application. It is two or three orders of magnitude of savings in time and cost, and operationally it has given us the opportunity to let a core team of talented operations engineers manage common infrastructure and tooling to make our applications available at web scale."</p>
{{< case-studies/quote
image="/images/case-studies/slingtv/banner4.jpg"
author="Brad Linder, Cloud Native & Big Data Evangelist for Sling TV"
>}}
"We have to be able to react to changes and hiccups in the matrix. It is the foundation for our ability to deliver a high-quality service for our customers."
{{< /case-studies/quote >}}
<p>The team launched its first applications on Kubernetes in Sling TV's two internal data centers in the early part of Q1 2018 and began to enable AWS as a data center option. The company plans to expand into other public clouds in the future.</p>
<p>The first application that went into production is a web socket-based back-end notification service. "It allows back-end changes to trigger messages to our clients in the field without the polling," says Linder. "We are talking about very high volumes of messages with this application. Without something like Kubernetes to be able to scale up and down, as well as just support that overall workload, that is pretty hard to do. I would almost be so bold as to say that most of these applications that we are building now would not have been possible without the cloud native patterns and the flexibility that Kubernetes enables."</p>
<p>Linder oversees three teams working together on building the next-generation platform: a platform engineering team; an enterprise middleware services team; and a big data and analytics team. "We have really tried to bring everything together to be able to have a client application interact with a cloud native middleware layer. That middleware layer must run on a platform, consume platform services and then have logs and events monitored by an artificial agent to keep things running smoothly," says Linder.</p>
{{< case-studies/quote author="BRAD LINDER, CLOUD NATIVE & BIG DATA EVANGELIST FOR SLING TV">}}
This undertaking is about "trying to marry Kubernetes with AI to enable web scale that just works."
{{< /case-studies/quote >}}
<p>Ultimately, this undertaking is about "trying to marry Kubernetes with AI to enable web scale that just works," he adds. "We want the artificial agents and the big data platform using the actual logs and events coming out of the applications, Kubernetes, the infrastructure, backing services and changes to the environment to make decisions like, 'Hey we need more capacity for this service so please add more nodes.' From a platform perspective, if you are truly doing web scale stuff and you are not using AI and big data, in my opinion, you are going to implode under your own weight. It is not a question of if, it is when. If you are in a 'millions of users' sort of environment, that implosion is going to be catastrophic. We are on our way to this goal and have learned a lot along the way."</p>
<p>For Sling TV, moving to cloud native has been exactly what they needed. "We have to be able to react to changes and hiccups in the matrix," says Linder. "It is the foundation for our ability to deliver a high-quality service for our customers. Building intelligent platforms, tools and clients in the field consuming those services has got to be part of all of this. In my eyes that is a big part of what cloud native is all about. It is taking these distributed, potentially unreliable entities and enabling a robust customer experience they expect."</p>


@ -1,71 +0,0 @@
---
title: Squarespace Case Study
case_study_styles: true
cid: caseStudies
new_case_study_styles: true
heading_background: /images/case-studies/squarespace/banner1.jpg
heading_title_logo: /images/squarespace_logo.png
subheading: >
Squarespace: Gaining Productivity and Resilience with Kubernetes
case_study_details:
- Company: Squarespace
- Location: New York, N.Y.
- Industry: Software as a Service, Website-Building Platform
---
<h2>Challenge</h2>
<p>Moving from a monolith to microservices in 2014 "solved a problem on the development side, but it pushed that problem to the infrastructure team," says Kevin Lynch, Staff Engineer on the Site Reliability team at Squarespace. "The infrastructure deployment process on our 5,000 VM hosts was slowing everyone down."</p>
<h2>Solution</h2>
<p>The team experimented with container orchestration platforms, and found that Kubernetes "answered all the questions that we had," says Lynch. The company began running Kubernetes in its data centers in 2016.</p>
<h2>Impact</h2>
<p>Since Squarespace moved to Kubernetes, in conjunction with modernizing its networking stack, deployment time has been reduced by almost 85%. Before, their VM deployment would take half an hour; now, says Lynch, "someone can generate a templated application, deploy it within five minutes, and have actual instances containerized, running in our staging environment at that point." Because of that, "productivity time is the big cost saver," he adds. "When we started the Kubernetes project, we had probably a dozen microservices. Today there are twice that in the pipeline being actively worked on." Resilience has also been improved with Kubernetes: "If a node goes down, it's rescheduled immediately and there's no performance impact."</p>
{{< case-studies/quote author="Kevin Lynch, Staff Engineer on the Site Reliability team at Squarespace" >}}
<iframe width="560" height="315" src="https://www.youtube.com/embed/feQkzJkW-SA" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>
<br>
"Once you prove that Kubernetes solves one problem, everyone immediately starts solving other problems without you even having to evangelize it."
{{< /case-studies/quote >}}
{{< case-studies/lead >}}
Since it was started in a dorm room in 2003, Squarespace has made it simple for millions of people to create their own websites.
{{< /case-studies/lead >}}
<p>Behind the scenes, though, the company's monolithic Java application was making things not so simple for its developers to keep improving the platform. So in 2014, the company decided to "go down the microservices path," says Kevin Lynch, staff engineer on Squarespace's Site Reliability team. "But we were always deploying our applications in vCenter VMware VMs [in our own data centers]. Microservices solved a problem on the development side, but it pushed that problem to the Infrastructure team. The infrastructure deployment process on our 5,000 VM hosts was slowing everyone down."</p>
<p>After experimenting with another container orchestration platform and "breaking it in very painful ways," Lynch says, the team began experimenting with Kubernetes in mid-2016 and found that it "answered all the questions that we had." Deploying it in the data center rather than the public cloud was their biggest challenge, and at the time, not a lot of other companies were doing that. "We had to figure out how to deploy this in our infrastructure for ourselves, and we had to integrate it with our other applications," says Lynch.</p>
<p>At the same time, Squarespace's Network Engineering team was modernizing its networking stack, switching from a traditional layer-two network to a layer-three spine-and-leaf network. "It mapped beautifully with what we wanted to do with Kubernetes," says Lynch. "It gives us the ability to have our servers communicate directly with the top-of-rack switches. We use Calico for <a href="https://github.com/containernetworking/cni">CNI networking for Kubernetes</a>, so we can announce all these individual Kubernetes pod IP addresses and have them integrate seamlessly with our other services that are still provisioned in the VMs."</p>
{{< case-studies/quote image="/images/case-studies/squarespace/banner3.jpg" >}}
After experimenting with another container orchestration platform and "breaking it in very painful ways," Lynch says, the team began experimenting with Kubernetes in mid-2016 and found that it "answered all the questions that we had."
{{< /case-studies/quote >}}
<p>Within a couple months, they had a stable cluster for their internal use, and began rolling out Kubernetes for production. They also added Zipkin and CNCF projects <a href="https://prometheus.io/">Prometheus</a> and <a href="https://www.fluentd.org/">fluentd</a> to their cloud native stack. "We switched to Kubernetes, a new world, and we revamped all our other tooling as well," says Lynch. "It allowed us to streamline our process, so we can now easily create an entire microservice project from templates, generate the code and deployment pipeline for that, generate the Docker file, and then immediately just ship a workable, deployable project to Kubernetes." Deployments across Dev/QA/Stage/Prod were also "simplified drastically," Lynch adds. "Now there is little configuration variation."</p>
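<p>The templating piece of that workflow can be as simple as stamping files out of string templates; a toy sketch with a hypothetical Dockerfile template and service name:</p>

```python
# Scaffold a service directory with a templated Dockerfile; illustrative only.
from pathlib import Path
from string import Template

DOCKERFILE = Template("""\
FROM eclipse-temurin:17-jre
COPY build/libs/$service.jar /app/$service.jar
ENTRYPOINT ["java", "-jar", "/app/$service.jar"]
""")

def scaffold(service: str):
    root = Path(service)
    root.mkdir(exist_ok=True)
    (root / "Dockerfile").write_text(DOCKERFILE.substitute(service=service))

scaffold("file-storage-service")
```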
<p>And the whole process takes only five minutes, an almost 85% reduction in time compared to their VM deployment. "From end to end that probably took half an hour, and that's not accounting for the fact that an infrastructure engineer would be responsible for doing that, so there's some business delay in there as well."</p>
<p>With faster deployments, "productivity time is the big cost saver," says Lynch. "We had a team that was implementing a new file storage service, and they just started integrating that with our storage back end without our involvement"—which wouldn't have been possible before Kubernetes. He adds: "When we started the Kubernetes project, we had probably a dozen microservices. Today there are twice that in the pipeline being actively worked on."</p>
{{< case-studies/quote image="/images/case-studies/squarespace/banner4.jpg" >}}
"We switched to Kubernetes, a new world....It allowed us to streamline our process, so we can now easily create an entire microservice project from templates," Lynch says. And the whole process takes only five minutes, an almost 85% reduction in time compared to their VM deployment.
{{< /case-studies/quote >}}
<p>There's also been a positive impact on the application's resilience. "When we're deploying VMs, we have to build tooling to ensure that a service is spread across racks appropriately and can withstand failure," he says. "Kubernetes just does it. If a node goes down, it's rescheduled immediately and there's no performance impact."</p>
<p>Another big benefit is autoscaling. "It wasn't really possible with the way we've been using VMware," says Lynch, "but now we can just add the appropriate autoscaling features via Kubernetes directly, and boom, it's scaling up as demand increases. And it worked out of the box."</p>
<p>For others starting out with Kubernetes, Lynch says his best advice is to "fail fast": "Once you've planned things out, just execute. Kubernetes has been really great for trying something out quickly and seeing if it works or not."</p>
{{< case-studies/quote >}}
"When we're deploying VMs, we have to build tooling to ensure that a service is spread across racks appropriately and can withstand failure," he says. "Kubernetes just does it. If a node goes down, it's rescheduled immediately and there's no performance impact."
{{< /case-studies/quote >}}
<p>Lynch and his team are planning to open source some of the tools they've developed to extend Kubernetes and use it as an API itself. The first tool injects dependent applications as containers in a pod. "When you ship an application, usually it comes along with a whole bunch of dependent applications that need to be shipped with that, for example, fluentd for logging," he explains. With this tool, the developer doesn't need to worry about the configurations.</p>
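<p>The core of that injection idea fits in a few lines: take a pod spec and append the dependent container. A sketch with hypothetical names, using fluentd as the example sidecar:</p>

```python
# Append a fluentd logging sidecar to a pod spec; a sketch of the idea above.
import copy

FLUENTD_SIDECAR = {
    "name": "fluentd",
    "image": "fluent/fluentd:v1.16-1",
    "volumeMounts": [{"name": "app-logs", "mountPath": "/var/log/app"}],
}

def inject_logging_sidecar(pod_spec: dict) -> dict:
    """Return a copy of the pod spec with the fluentd sidecar and log volume added."""
    spec = copy.deepcopy(pod_spec)
    spec.setdefault("containers", []).append(FLUENTD_SIDECAR)
    spec.setdefault("volumes", []).append({"name": "app-logs", "emptyDir": {}})
    return spec

pod = {"containers": [{"name": "web", "image": "registry.example.com/web:1.0"}]}
print(inject_logging_sidecar(pod))
```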
<p>Going forward, all new services at Squarespace are going into Kubernetes, and the end goal is to convert everything it can. About a quarter of existing services have been migrated. "Our monolithic application is going to be the last one, just because it's so big and complex," says Lynch. "But now I'm seeing other services get moved over, like the file storage service. Someone just did it and it worked—painlessly. So I believe if we tackle it, it's probably going to be a lot easier than we fear. Maybe I should just take my own advice and fail fast!"</p>


@ -1,66 +0,0 @@
---
title: Wikimedia Case Study
case_study_styles: true
cid: caseStudies
new_case_study_styles: true
heading_title_text: Wikimedia
use_gradient_overlay: true
subheading: >
Using Kubernetes to Build Tools to Improve the World's Wikis
case_study_details:
- Company: Wikimedia
- Location: San Francisco, CA
---
<p>The non-profit Wikimedia Foundation operates some of the largest collaboratively edited reference projects in the world, including Wikipedia. To help users maintain and use wikis, it runs Wikimedia Tool Labs, a hosting environment for community developers working on tools and bots to help editors and other volunteers do their work, including reducing vandalism. The community around Wikimedia Tool Labs began forming nearly 10 years ago.</p>
{{< case-studies/quote author="Yuvi Panda, operations engineer at Wikimedia Foundation and Wikimedia Tool Labs">}}
<img src="/images/wikimedia_logo.png" alt="Wikimedia">
<br>
<br>
"Wikimedia Tool Labs is vital for making sure wikis all around the world work as well as they possibly can. Because it's grown organically for almost 10 years, it has become an extremely challenging environment and difficult to maintain. It's like a big ball of mud — you really can't see through it. With Kubernetes, we're simplifying the environment and making it easier for developers to build the tools that make wikis run better."
{{< /case-studies/quote >}}
<h2>Challenges</h2>
<ul>
<li>Simplify a complex, difficult-to-manage infrastructure</li>
<li>Allow developers to continue writing tools and bots using existing techniques</li>
</ul>
<h2>Why Kubernetes</h2>
<ul>
<li>Wikimedia Tool Labs chose Kubernetes because it can mimic existing workflows, while reducing complexity</li>
</ul>
<h2>Approach</h2>
<ul>
<li>Migrate old systems and a complex infrastructure to Kubernetes</li>
</ul>
<h2>Results</h2>
<ul>
<li>20 percent of web tools that account for more than 40 percent of web traffic now run on Kubernetes</li>
<li>A 25-node cluster that keeps up with each new Kubernetes release</li>
<li>Thousands of lines of old code have been deleted, thanks to Kubernetes</li>
</ul>
<h2>Using Kubernetes to provide tools for maintaining wikis</h2>
<p>Wikimedia Tool Labs is run by a staff of four-and-a-half paid employees and two volunteers. The infrastructure didn't make it easy or intuitive for developers to build bots and other tools to make wikis work more easily. Yuvi says, "It's incredibly chaotic. We have lots of Perl and Bash duct tape on top of it. Everything is super fragile."</p>
<p>To solve the problem, Wikimedia Tool Labs migrated parts of its infrastructure to Kubernetes, in preparation for eventually moving its entire system. Yuvi says Kubernetes greatly simplifies maintenance. The goal is to allow developers creating bots and other tools to use whatever development methods they want, but make it easier for the Wikimedia Tool Labs to maintain the required infrastructure for hosting and sharing them.</p>
<p>"With Kubernetes, I've been able to remove a lot of our custom-made code, which makes everything easier to maintain. Our users' code also runs in a more stable way than previously," says Yuvi.</p>
<h2>Simplifying infrastructure and keeping wikis running better</h2>
<p>Wikimedia Tool Labs has seen great success with the initial Kubernetes deployment. Old code is being simplified and eliminated, contributing developers don't have to change the way they write their tools and bots, and those tools and bots run in a more stable fashion than they have in the past. The paid staff and volunteers are able to better keep up with fixing issues.</p>
<p>In the future, with a more complete migration to Kubernetes, Wikimedia Tool Labs expects to make it even easier to host and maintain the bots and tools that help run wikis across the world. The tool labs already host approximately 1,300 tools and bots from 800 volunteers, with many more being submitted every day. Twenty percent of the tool labs' web tools that account for more than 60 percent of web traffic now run on Kubernetes. The tool labs has a 25-node cluster that keeps up with each new Kubernetes release. Many existing web tools are migrating to Kubernetes.</p>
<p>"Our goal is to make sure that people all over the world can share knowledge as easily as possible. Kubernetes helps with that, by making it easier for wikis everywhere to have the tools they need to thrive," says Yuvi.</p>

@ -1,87 +0,0 @@
---
title: Wink Case Study
case_study_styles: true
cid: caseStudies
new_case_study_styles: true
heading_background: /images/case-studies/wink/banner1.jpg
heading_title_logo: /images/wink_logo.png
subheading: >
Cloud-Native Infrastructure Keeps Your Smart Home Connected
case_study_details:
- Company: Wink
- Location: New York, N.Y.
- Industry: Internet of Things Platform
---
<h2>Challenge</h2>
<p>Building a low-latency, highly reliable infrastructure to serve communications between millions of connected smart-home devices and the company's consumer hubs and mobile app, with an emphasis on horizontal scalability, the ability to encrypt everything quickly and connections that could be easily brought back up if anything went wrong.</p>
<h2>Solution</h2>
<p>Across-the-board use of a Kubernetes-Docker-CoreOS Container Linux stack.</p>
<h2>Impact</h2>
<p>"Two of the biggest American retailers [Home Depot and Walmart] are carrying and promoting the brand and the hardware," Wink Head of Engineering Kit Klein says proudly though he adds that "it really comes with a lot of pressure. It's not a retail situation where you have a lot of tech enthusiasts. These are everyday people who want something that works and have no tolerance for technical excuses." And that's further testament to how much faith Klein has in the infrastructure that the Wink team has built. With 80 percent of Wink's workload running on a unified stack of Kubernetes-Docker-CoreOS, the company has put itself in a position to continually innovate and improve its products and services. Committing to this technology, says Klein, "makes building on top of the infrastructure relatively easy."</p>
{{< case-studies/quote author="KIT KLEIN, HEAD OF ENGINEERING, WINK" >}}
"It's not proprietary, it's totally open, it's really portable. You can run all the workloads across different cloud providers. You can easily run a hybrid AWS or even bring in your own data center. That's the benefit of having everything unified on one open source Kubernetes-Docker-CoreOS Container Linux stack. There are massive security benefits if you only have one Linux distro/machine image to validate. The benefits are enormous because you save money, and you save time."
{{< /case-studies/quote >}}
{{< case-studies/lead >}}
How many people does it take to turn on a light bulb?
{{< /case-studies/lead >}}
<p>Kit Klein whips out his phone to demonstrate. With a few swipes, the head of engineering at Wink pulls up the smart-home app created by the New York City-based company and taps the light button. "Honestly when you're holding the phone and you're hitting the light," he says, "by the time you feel the pressure of your finger on the screen, it's on. It takes as long as the signal to travel to your brain."</p>
<p>Sure, it takes just one finger and less than 200 milliseconds to turn on the light or lock a door or change a thermostat. But what allows Wink to help consumers manage their connected smart-home products with such speed and ease is a sophisticated, cloud native infrastructure that Klein and his team built and continue to develop using a unified stack of CoreOS, the open-source operating system designed for clustered deployments, and Kubernetes, an open-source platform for automating deployment, scaling, and operations of application containers across clusters of hosts, providing container-centric infrastructure. "When you have a big, complex network of interdependent microservices that need to be able to discover each other, and need to be horizontally scalable and tolerant to failure, that's what this is really optimized for," says Klein. "A lot of people end up relying on proprietary services [offered by some big cloud providers] to do some of this stuff, but what you get by adopting CoreOS/Kubernetes is portability, to not be locked in to anyone. You can really make your own fate."</p>
<p>Indeed, Wink did. The company's mission statement is to make the connected home accessible, that is, user-friendly for non-technical owners, affordable, and, perhaps most importantly, reliable. "If you can't trust that when you hit the switch, you know a light is going to go on, or if you're remote and you're checking on your house and that information isn't accurate, then the convenience of the system is lost," says Klein. "So that's where the infrastructure comes in."</p>
<p>Wink was incubated within Quirky, a company that developed crowd-sourced inventions. The Wink app was first introduced in 2013, and at the time, it controlled only a few consumer products such as the PivotPower Strip that Quirky produced in collaboration with GE. As smart-home products proliferated, Wink was launched in 2014 in Home Depot stores nationwide. Its first project: a hub that could integrate with smart products from about a dozen brands like Honeywell and Chamberlain. The biggest challenge would be to build the infrastructure to serve all those communications between the hub and the products, with a focus on maximizing reliability and minimizing latency.</p>
<p>"When we originally started out, we were moving very fast trying to get the first product to market, the minimum viable product," says Klein. "Lots of times you go down a path and end up having to backtrack and try different things. But in this particular case, we did a lot of the work up front, which led to us making a really sound decision to deploy it on CoreOS Container Linux. And that was very early in the life of it."</p>
{{< case-studies/quote image="/images/case-studies/wink/banner3.jpg">}}
"...what you get by adopting CoreOS/Kubernetes is portability, to not be locked in to anyone. You can really make your own fate."
{{< /case-studies/quote >}}
<p>Concern number one: Wink's products need to connect to consumer devices in people's homes, behind a firewall. "You don't have an end point like a URL, and you don't even know what ports are open behind that firewall," Klein explains. "So you essentially need to have this thing wake up and talk to your system and then open real-time, bidirectional communication between the cloud and the device. And it's really, really important that it's persistent, because you want to decrease as much as possible the overhead of sending a message; you never know when someone is going to turn on the lights."</p>
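<p>The connection pattern Klein describes, a device dialing out through the home firewall and holding one encrypted, bidirectional channel open, can be sketched in a few lines of Go. This is a hedged illustration only: the endpoint, port, and heartbeat interval are invented, and Wink's actual protocol is not public.</p>

```go
package main

import (
	"crypto/tls"
	"log"
	"time"
)

func main() {
	// The device dials out, so no inbound firewall ports are needed.
	// "hub-gateway.example.com" is a placeholder, not a real endpoint.
	conn, err := tls.Dial("tcp", "hub-gateway.example.com:8883", &tls.Config{})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Read loop: the cloud can push a command ("lights on") at any
	// moment over the same long-lived connection.
	go func() {
		buf := make([]byte, 4096)
		for {
			n, err := conn.Read(buf)
			if err != nil {
				log.Fatal(err)
			}
			log.Printf("command from cloud: %q", buf[:n])
		}
	}()

	// A periodic heartbeat keeps NAT and firewall state warm, avoiding
	// the overhead of re-establishing the connection per message.
	for range time.Tick(30 * time.Second) {
		if _, err := conn.Write([]byte("ping\n")); err != nil {
			log.Fatal(err)
		}
	}
}
```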
<p>With the earliest version of the Wink Hub, when you decided to turn your lights on or off, the request would be sent to the cloud and then executed. Subsequent updates to Wink's software enabled local control, cutting latency down to about 10 milliseconds for many devices. But with the need for cloud-enabled integrations of an ever-growing ecosystem of smart home products, low-latency internet connectivity is still a critical consideration.</p>
{{< case-studies/lead >}}
"You essentially need to have this thing wake up and talk to your system and then open real-time, bidirectional communication between the cloud and the device. And it's really, really important that it's persistent...you never know when someone is going to turn on the lights."
{{< /case-studies/lead >}}
<p>In addition, Wink had other requirements: horizontal scalability, the ability to encrypt everything quickly, connections that could be easily brought back up if something went wrong. "Looking at this whole structure we started, we decided to make a secure socket-based service," says Klein. "We've always used, I would say, some sort of clustering technology to deploy our services and so the decision we came to was, this thing is going to be containerized, running on Docker."</p>
<p>At the time, just over two years ago, Docker wasn't yet widely used, but as Klein points out, "it was certainly understood by the people who were on the frontier of technology. We started looking at potential technologies that existed. One of the limiting factors was that we needed to deploy multi-port non-http/https services. It wasn't really appropriate for some of the early cluster technology. We liked the project a lot and we ended up using it on other stuff for a while, but initially it was too targeted toward http workloads."</p>
<p>Once Wink's backend engineering team decided on a Dockerized workload, they had to make decisions about the OS and the container orchestration platform. "Obviously you can't just start the containers and hope everything goes well," Klein says with a laugh. "You need to have a system that is helpful [in order] to manage where the workloads are being distributed out to. And when the container inevitably dies or something like that, to restart it, you have a load balancer. All sorts of housekeeping work is needed to have a robust infrastructure."</p>
{{< case-studies/quote image="/images/case-studies/wink/banner4.jpg" >}}
"Obviously you can't just start the containers and hope everything goes well," Klein says with a laugh. "You need to have a system that is helpful [in order] to manage where the workloads are being distributed out to. And when the container inevitably dies or something like that, to restart it, you have a load balancer. All sorts of housekeeping work is needed to have a robust infrastructure."
{{< /case-studies/quote >}}
<p>Wink considered building directly on a general purpose Linux distro like Ubuntu (which would have required installing tools to run a containerized workload) and cluster management systems like Mesos (which was targeted toward enterprises with larger teams/workloads), but ultimately set their sights on CoreOS Container Linux. "A container-optimized Linux distribution system was exactly what we needed," he says. "We didn't have to futz around with trying to take something like a Linux distro and install everything. It's got a built-in container orchestration system, which is Fleet, and an easy-to-use API. It's not as feature-rich as some of the heavier solutions, but we realized that, at that moment, it was exactly what we needed."</p>
<p>Wink's hub (along with a revamped app) was introduced in July 2014 with a short-term deployment, and within the first month, they had moved the service to the Dockerized CoreOS deployment. Since then, they've moved almost every other piece of their infrastructure, from third-party cloud-to-cloud integrations to their customer service and payment portals, onto CoreOS Container Linux clusters.</p>
<p>Using this setup did require some customization. "Fleet is really nice as a basic container orchestration system, but it doesn't take care of routing, sharing configurations, secrets, et cetera, among instances of a service," Klein says. "All of those layers of functionality can be implemented, of course, but if you don't want to spend a lot of time writing unit files manually (which of course nobody does), you need to create a tool to automate some of that, which we did."</p>
<p>Wink quickly embraced the Kubernetes container cluster manager when it was launched in 2015 and integrated with CoreOS core technology, and as promised, it ended up providing the features Wink wanted and had planned to build. "If not for Kubernetes, we likely would have taken the logic and library we implemented for the automation tool that we created, and would have used it in a higher level abstraction and tool that could be used by non-DevOps engineers from the command line to create and manage clusters," Klein says. "But Kubernetes made that totally unnecessary and is written and maintained by people with a lot more experience in cluster management than us, so all the better." Now, an estimated 80 percent of Wink's workload is run on Kubernetes on top of CoreOS Container Linux.</p>
{{< case-studies/quote >}}
"Stay close to the development. Understand why decisions are being made. If you understand the intent behind the project, from the technological intent to a certain philosophical intent, then it helps you understand how to build your system in harmony with those systems as opposed to trying to work against it."
{{< /case-studies/quote >}}
<p>Wink's reasons for going all in are clear: "It's not proprietary, it's totally open, it's really portable," Klein says. "You can run all the workloads across different cloud providers. You can easily run a hybrid AWS or even bring in your own data center. That's the benefit of having everything unified on one Kubernetes-Docker-CoreOS Container Linux stack. There are massive security benefits if you only have one Linux distro to try to validate. The benefits are enormous because you save money, you save time."</p>
<p>Klein concedes that there are tradeoffs in every technology decision. "Cutting-edge technology is going to be scary for some people," he says. "In order to take advantage of this, you really have to keep up with the technology. You can't treat it like it's a black box. Stay close to the development. Understand why decisions are being made. If you understand the intent behind the project, from the technological intent to a certain philosophical intent, then it helps you understand how to build your system in harmony with those systems as opposed to trying to work against it."</p>
<p>Wink, which was acquired by Flex in 2015, now controls 2.3 million connected devices in households all over the country. What's next for the company? A new version of the hub - Wink Hub 2 - hit shelves last November and is being offered for the first time at Walmart stores in addition to Home Depot. "Two of the biggest American retailers are carrying and promoting the brand and the hardware," Klein says proudly, though he adds that "it really comes with a lot of pressure. It's not a retail situation where you have a lot of tech enthusiasts. These are everyday people who want something that works and have no tolerance for technical excuses." And that's further testament to how much faith Klein has in the infrastructure that the Wink team has built.</p>
<p>Wink's engineering team has grown exponentially since its early days, and behind the scenes, Klein is most excited about the machine learning Wink is using. "We built [a system of] containerized small sections of the data pipeline that feed each other and can have multiple outputs," he says. "It's like data pipelines as microservices." Again, Klein points to having a unified stack running on CoreOS Container Linux and Kubernetes as the primary driver for the innovations to come. "You're not reinventing the wheel every time," he says. "You can just get down to work."</p>

@ -1,93 +0,0 @@
---
title: Workiva Case Study
linkTitle: Workiva
case_study_styles: true
cid: caseStudies
draft: true
featured: true
weight: 20
quote: >
With OpenTracing, my team was able to look at a trace and make optimization suggestions to another team without ever looking at their code.
new_case_study_styles: true
heading_background: /images/case-studies/workiva/banner1.jpg
heading_title_logo: /images/workiva_logo.png
subheading: >
Using OpenTracing to Help Pinpoint the Bottlenecks
case_study_details:
- Company: Workiva
- Location: Ames, Iowa
- Industry: Enterprise Software
---
<h2>Challenge</h2>
<p><a href="https://www.workiva.com/">Workiva</a> offers a cloud-based platform for managing and reporting business data. This SaaS product, Wdesk, is used by more than 70 percent of the Fortune 500 companies. As the company made the shift from a monolith to a more distributed, microservice-based system, "We had a number of people working on this, all on different teams, so we needed to identify what the issues were and where the bottlenecks were," says Senior Software Architect MacLeod Broad. With back-end code running on Google App Engine, Google Compute Engine, as well as Amazon Web Services, Workiva needed a tracing system that was agnostic of platform. While preparing one of the company's first products utilizing AWS, which involved a "sync and link" feature that linked data from spreadsheets built in the new application with documents created in the old application on Workiva's existing system, Broad's team found an ideal use case for tracing: There were circular dependencies, and optimizations often turned out to be micro-optimizations that didn't impact overall speed.</p>
<h2>Solution</h2>
<p>Broad's team introduced the platform-agnostic distributed tracing system OpenTracing to help them pinpoint the bottlenecks.</p>
<h2>Impact</h2>
<p>Now used throughout the company, OpenTracing produced immediate results. Software Engineer Michael Davis reports: "Tracing has given us immediate, actionable insight into how to improve our service. Through a combination of seeing where each call spends its time, as well as which calls are most often used, we were able to reduce our average response time by 95 percent (from 600ms to 30ms) in a single fix."</p>
{{< case-studies/quote author="MacLeod Broad, Senior Software Architect at Workiva" >}}
"With OpenTracing, my team was able to look at a trace and make optimization suggestions to another team without ever looking at their code."
{{< /case-studies/quote >}}
{{< case-studies/lead >}}
Last fall, MacLeod Broad's platform team at Workiva was prepping one of the company's first products utilizing <a href="https://aws.amazon.com/">Amazon Web Services</a> when they ran into a roadblock.
{{< /case-studies/lead >}}
<p>Early on, Workiva's backend had run mostly on <a href="https://cloud.google.com/appengine/">Google App Engine</a>. But things changed along the way as Workiva's SaaS offering, <a href="https://www.workiva.com/wdesk">Wdesk</a>, a cloud-based platform for managing and reporting business data, grew its customer base to more than 70 percent of the Fortune 500 companies. "As customer needs grew and the product offering expanded, we started to leverage a wider offering of services such as Amazon Web Services as well as other Google Cloud Platform services, creating a multi-vendor environment."</p>
<p>With this new product, there was a "sync and link" feature by which data "went through a whole host of services starting with the new spreadsheet system [<a href="https://aws.amazon.com/rds/aurora/">Amazon Aurora</a>] into what we called our linking system, and then pushed through http to our existing system, and then a number of calculations would go on, and the results would be transmitted back into the new system," says Broad. "We were trying to optimize that for speed. We thought we had made this great optimization and then it would turn out to be a micro optimization, which didn't really affect the overall speed of things."</p>
<p>The challenges faced by Broad's team may sound familiar to other companies that have also made the shift from monoliths to more distributed, microservice-based systems. "We had a number of people working on this, all on different teams, so it was difficult to get our head around what the issues were and where the bottlenecks were," says Broad.</p>
<p>"Each service team was going through different iterations of their architecture and it was very hard to follow what was actually going on in each teams' system," he adds. "We had circular dependencies where we'd have three or four different service teams unsure of where the issues really were, requiring a lot of back and forth communication. So we wasted a lot of time saying, 'What part of this is slow? Which part of this is sometimes slow depending on the use case? Which part is degrading over time? Which part of this process is asynchronous so it doesn't really matter if it's long-running or not? What are we doing that's redundant, and which part of this is buggy?'"</p>
{{< case-studies/quote
image="/images/case-studies/workiva/banner3.jpg"
author="MACLEOD BROAD, SENIOR SOFTWARE ARCHITECT AT WORKIVA"
>}}
"A tracing system can at a glance explain an architecture, narrow down a performance bottleneck and zero in on it, and generally just help direct an investigation at a high level. Being able to do that at a glance is much faster than at a meeting or with three days of debugging, and it's a lot faster than never figuring out the problem and just moving on."
{{< /case-studies/quote >}}
<p>Simply put, it was an ideal use case for tracing. "A tracing system can at a glance explain an architecture, narrow down a performance bottleneck and zero in on it, and generally just help direct an investigation at a high level," says Broad. "Being able to do that at a glance is much faster than at a meeting or with three days of debugging, and it's a lot faster than never figuring out the problem and just moving on."</p>
<p>With Workiva's back-end code running on <a href="https://cloud.google.com/compute/">Google Compute Engine</a> as well as App Engine and AWS, Broad knew that he needed a tracing system that was platform agnostic. "We were looking at different tracing solutions," he says, "and we decided that because it seemed to be a very evolving market, we didn't want to get stuck with one vendor. So OpenTracing seemed like the cleanest way to avoid vendor lock-in on what backend we actually had to use."</p>
<p>Once they introduced OpenTracing into this first use case, Broad says, "The trace made it super obvious where the bottlenecks were." Even though everyone had assumed it was Workiva's existing code that was slowing things down, that wasn't exactly the case. "It looked like the existing code was slow only because it was reaching out to our next-generation services, and they were taking a very long time to service all those requests," says Broad. "On the waterfall graph you can see the exact same work being done on every request when it was calling back in. So every service request would look the exact same for every response being paged out. And then it was just a no-brainer of, 'Why is it doing all this work again?'"</p>
<p>Using the insight OpenTracing gave them, "My team was able to look at a trace and make optimization suggestions to another team without ever looking at their code," says Broad. "The way we named our traces gave us insight whether it's doing a SQL call or it's making an RPC. And so it was really easy to say, 'OK, we know that it's going to page through all these requests. Do the work once and stuff it in cache.' And we were done basically. All those calls became sub-second calls immediately."</p>
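<p>The API surface that made this possible is small. Below is a hedged sketch in Go using the public opentracing-go package; the operation and tag names are invented for illustration, not taken from Workiva's code.</p>

```go
package tracing

import (
	"context"

	opentracing "github.com/opentracing/opentracing-go"
)

// fetchLinkedSheet wraps one downstream call in a span. Encoding the
// kind of work in the operation name ("rpc" vs. "sql") is what lets a
// reviewer spot repeated work in a waterfall without reading the code.
func fetchLinkedSheet(ctx context.Context, id string) error {
	span, ctx := opentracing.StartSpanFromContext(ctx, "linking.rpc.fetchSheet")
	defer span.Finish()
	span.SetTag("sheet.id", id)

	// ... perform the RPC here, passing ctx along so any spans started
	// downstream nest as children of this one ...
	_ = ctx
	return nil
}
```

<p>A concrete tracer backend is installed once at startup with <code>opentracing.SetGlobalTracer</code>, so swapping vendors is a one-line change, which is the lock-in avoidance Broad describes.</p>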
{{< case-studies/quote
image="/images/case-studies/workiva/banner4.jpg"
author="MACLEOD BROAD, SENIOR SOFTWARE ARCHITECT AT WORKIVA"
>}}
"We were looking at different tracing solutions and we decided that because it seemed to be a very evolving market, we didn't want to get stuck with one vendor. So OpenTracing seemed like the cleanest way to avoid vendor lock-in on what backend we actually had to use."
{{< /case-studies/quote >}}
<p>After the success of the first use case, everyone involved in the trial went back and fully instrumented their products. Tracing was added to a few more use cases. "We wanted to get through the initial implementation pains early without bringing the whole department along for the ride," says Broad. "Now, a lot of teams add it when they're starting up a new service. We're really pushing adoption now more than we were before."</p>
<p>Some teams were won over quickly. "Tracing has given us immediate, actionable insight into how to improve our [Workspaces] service," says Software Engineer Michael Davis. "Through a combination of seeing where each call spends its time, as well as which calls are most often used, we were able to reduce our average response time by 95 percent (from 600ms to 30ms) in a single fix."</p>
<p>Most of Workiva's major products are now traced using OpenTracing, with data pushed into <a href="https://cloud.google.com/stackdriver/">Google StackDriver</a>. Even the products that aren't fully traced have some components and libraries that are.</p>
<p>Broad points out that because some of the engineers were working on App Engine and already had experience with the platform's Appstats library for profiling performance, it didn't take much to get them used to using OpenTracing. But others were a little more reluctant. "The biggest hindrance to adoption I think has been the concern about how much latency is introducing tracing [and StackDriver] going to cost," he says. "People are also very concerned about adding middleware to whatever they're working on. Questions about passing the context around and how that's done were common. A lot of our Go developers were fine with it, because they were already doing that in one form or another. Our Java developers were not super keen on doing that because they'd used other systems that didn't require that." But the benefits clearly outweighed the concerns, and today, Workiva's official policy is to use tracing.</p>
<p>In fact, Broad believes that tracing naturally fits in with Workiva's existing logging and metrics systems. "This was the way we presented it internally, and also the way we designed our use," he says. "Our traces are logged in the exact same mechanism as our app metric and logging data, and they get pushed the exact same way. So we treat all that data exactly the same when it's being created and when it's being recorded. We have one internal library that we use for logging, telemetry, analytics and tracing."</p>
{{< case-studies/quote author="Michael Davis, Software Engineer, Workiva" >}}
"Tracing has given us immediate, actionable insight into how to improve our [Workspaces] service. Through a combination of seeing where each call spends its time, as well as which calls are most often used, we were able to reduce our average response time by 95 percent (from 600ms to 30ms) in a single fix."
{{< /case-studies/quote >}}
<p>For Workiva, OpenTracing has become an essential tool for zeroing in on optimizations and determining what's actually a micro-optimization by observing usage patterns. "On some projects we often assume what the customer is doing, and we optimize for these crazy scale cases that we hit 1 percent of the time," says Broad. "It's been really helpful to be able to say, 'OK, we're adding 100 milliseconds on every request that does X, and we only need to add that 100 milliseconds if it's the worst of the worst case, which only happens one out of a thousand requests or one out of a million requests.'"</p>
<p>Unlike many other companies, Workiva also traces the client side. "For us, the user experience is important—it doesn't matter if the RPC takes 100 milliseconds if it still takes 5 seconds to do the rendering to show it in the browser," says Broad. "So for us, those client times are important. We trace it to see what parts of loading take a long time. We're in the middle of working on a definition of what is 'loaded.' Is it when you have it, or when it's rendered, or when you can interact with it? Those are things we're planning to use tracing for to keep an eye on and to better understand."</p>
<p>That also requires adjusting for differences in external and internal clocks. "Before time correcting, it was horrible; our traces were more misleading than anything," says Broad. "So we decided that we would return a timestamp on the response headers, and then have the client reorient its time based on that—not change its internal clock but just calculate the offset on the response time to when the client got it. And if you end up in an impossible situation where a client RPC spans 210 milliseconds but the time on the response time is outside of that window, then we have to reorient that."</p>
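<p>The arithmetic behind that reorientation can be written down with the standard NTP-style simplification: assume the server stamped the response roughly halfway through the round trip, and treat the difference as the client's clock offset. Workiva hasn't published its exact formula, so the sketch below is an assumption, not their implementation.</p>

```go
package main

import (
	"fmt"
	"time"
)

// clockOffset estimates how far the client's clock sits from the
// server's, assuming the server wrote its timestamp about halfway
// through the round trip. Adding the result to a client timestamp
// re-expresses it on the server's timeline; the client's own clock
// is never changed, matching the approach described above.
func clockOffset(sent, received, serverStamp time.Time) time.Duration {
	midpoint := sent.Add(received.Sub(sent) / 2)
	return serverStamp.Sub(midpoint)
}

func main() {
	sent := time.Now()
	received := sent.Add(210 * time.Millisecond)    // the 210ms RPC from the example
	serverStamp := sent.Add(400 * time.Millisecond) // server clock runs ahead
	fmt.Println(clockOffset(sent, received, serverStamp)) // ~295ms
}
```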
<p>Broad is excited about the impact OpenTracing has already had on the company, and is also looking ahead to what else the technology can enable. One possibility is using tracing to update documentation in real time. "Keeping documentation up to date with reality is a big challenge," he says. "Say, we just ran a trace simulation or we just ran a smoke test on this new deploy, and the architecture doesn't match the documentation. We can find whose responsibility it is and let them know and have them update it. That's one of the places I'd like to get in the future with tracing."</p>

@ -1,4 +0,0 @@
---
title: Yahoo! Japan
content_url: https://kubernetes.io/blog/2016/10/kubernetes-and-openstack-at-yahoo-japan
---

@ -1,82 +0,0 @@
---
title: Ygrene Case Study
linkTitle: Ygrene
case_study_styles: true
cid: caseStudies
logo: ygrene_featured_logo.png
featured: true
weight: 48
quote: >
We had to change some practices and code, and the way things were built, but we were able to get our main systems onto Kubernetes in a month or so, and then into production within two months. That's very fast for a finance company.
new_case_study_styles: true
heading_background: /images/case-studies/ygrene/banner1.jpg
heading_title_logo: /images/ygrene_logo.png
subheading: >
Ygrene: Using Cloud Native to Bring Security and Scalability to the Finance Industry
case_study_details:
- Company: Ygrene
- Location: Petaluma, Calif.
- Industry: Clean energy financing
---
<h2>Challenge</h2>
<p>A PACE (Property Assessed Clean Energy) financing company, Ygrene has funded more than $1 billion in loans since 2010. In order to approve and process those loans, "We have lots of data sources that are being aggregated, and we also have lots of systems that need to churn on that data," says Ygrene Development Manager Austin Adams. The company was utilizing massive servers, and "we just reached the limit of being able to scale them vertically. We had a really unstable system that became overwhelmed with requests just for doing background data processing in real time. The performance the users saw was very poor. We needed a solution that wouldn't require us to make huge refactors to the code base." As a finance company, Ygrene also needed to ensure that they were shipping their applications securely.</p>
<h2>Solution</h2>
<p>Moving from an Engine Yard platform and Amazon Elastic Beanstalk, the Ygrene team embraced cloud native technologies and practices: <a href="https://kubernetes.io/">Kubernetes</a> to help scale out horizontally and distribute workloads, <a href="https://github.com/theupdateframework/notary">Notary</a> to put in build-time controls and get trust on the Docker images being used with third-party dependencies, and <a href="https://www.fluentd.org/">Fluentd</a> for "observing every part of our stack," all running on <a href="https://aws.amazon.com/ec2/spot/">Amazon EC2 Spot</a>.</p>
<h2>Impact</h2>
<p>Before, deployments typically took three to four hours, and two or three months' worth of work would be deployed at low-traffic times every week or two weeks. Now, they take five minutes for Kubernetes, and an hour for the overall deploy with smoke testing. And "we're able to deploy three or four times a week, with just one week's or two days' worth of work," Adams says. "We're deploying during the work week, in the daytime and without any downtime. We had to ask for business approval to take the systems down, even in the middle of the night, because people could be doing loans. Now we can deploy, ship code, and migrate databases, all without taking the system down. The company gets new features without worrying that some business will be lost or delayed." Additionally, by using the kops project, Ygrene can now run its Kubernetes clusters with AWS EC2 Spot, at a tenth of the previous cost. These cloud native technologies have "changed the game for scalability, observability, and security—we're adding new data sources that are very secure," says Adams. "Without Kubernetes, Notary, and Fluentd, we couldn't tell our investors and team members that we knew what was going on."</p>
{{< case-studies/quote author="Austin Adams, Development Manager, Ygrene Energy Fund" >}}
"CNCF projects are helping Ygrene determine the security and observability standards for the entire PACE industry. We're an emerging finance industry, and without these projects, especially Kubernetes, we couldn't be the industry leader that we are today."
{{< /case-studies/quote >}}
{{< case-studies/lead >}}
In less than a decade, <a href="https://ygrene.com/">Ygrene</a> has funded more than $1 billion in loans for renewable energy projects.
{{< /case-studies/lead >}}
<p>A <a href="https://www.energy.gov/eere/slsc/property-assessed-clean-energy-programs">PACE</a> (Property Assessed Clean Energy) financing company, "We take the equity in a home or a commercial building, and use it to finance property improvements for anything that saves electricity, produces electricity, saves water, or reduces carbon emissions," says Development Manager Austin Adams.</p>
<p>In order to approve those loans, the company processes an enormous amount of underwriting data. "We have tons of different points that we have to validate about the property, about the company, or about the person," Adams says. "So we have lots of data sources that are being aggregated, and we also have lots of systems that need to churn on that data in real time."</p>
<p>By 2017, deployments and scalability had become pain points. The company was utilizing massive servers, and "we just reached the limit of being able to scale them vertically," he says. Migrating to AWS Elastic Beanstalk didn't solve the problem: "The Scala services needed a lot of data from the main Ruby on Rails services and from different vendors, so they were asking for information from our Ruby services at a rate that those services couldn't handle. We had lots of configuration misses with Elastic Beanstalk as well. It just came to a head, and we realized we had a really unstable system."</p>
{{< case-studies/quote
image="/images/case-studies/ygrene/banner3.jpg"
author="Austin Adams, Development Manager, Ygrene Energy Fund"
>}}
"CNCF has been an amazing incubator for so many projects. Now we look at its webpage regularly to find out if there are any new, awesome, high-quality projects we can implement into our stack. It's actually become a hub for us for knowing what software we need to be looking at to make our systems more secure or more scalable."
{{< /case-studies/quote >}}
<p>Adams, along with the rest of the team, set out to find a solution that would be transformational, but "wouldn't require us to make huge refactors to the code base," he says. And as a finance company, Ygrene needed security as much as scalability. They found the answer by embracing cloud native technologies: Kubernetes to help scale out horizontally and distribute workloads, Notary to achieve reliable security at every level, and Fluentd for observability. "Kubernetes was where the community was going, and we wanted to be future proof," says Adams.</p>
<p>With Kubernetes, the team was able to quickly containerize the Ygrene application with Docker. "We had to change some practices and code, and the way things were built," Adams says, "but we were able to get our main systems onto Kubernetes in a month or so, and then into production within two months. That's very fast for a finance company."</p>
<p>How? Cloud native has "changed the game for scalability, observability, and security—we're adding new data sources that are very secure," says Adams. "Without Kubernetes, Notary, and Fluentd, we couldn't tell our investors and team members that we knew what was going on."</p>
<p>Notary, in particular, "has been a godsend," says Adams. "We need to know that our attack surface on third-party dependencies is low, or at least managed. We use it as a trust system and we also use it as a separation, so production images are signed by Notary, but some development images we don't sign. That is to ensure that they can't get into the production cluster. We've been using it in the test cluster to feel more secure about our builds."</p>
{{< case-studies/quote image="/images/case-studies/ygrene/banner4.jpg">}}
"We had to change some practices and code, and the way things were built," Adams says, "but we were able to get our main systems onto Kubernetes in a month or so, and then into production within two months. That's very fast for a finance company."
{{< /case-studies/quote >}}
<p>By using the kops project, Ygrene was able to move from Elastic Beanstalk to running its Kubernetes clusters on AWS EC2 Spot, at a tenth of the previous cost. "In order to scale before, we would need to up our instance sizes, incurring high cost for low value," says Adams. "Now with Kubernetes and kops, we are able to scale horizontally on Spot with multiple instance groups."</p>
<p>That also helped them mitigate the risk that comes with running in the public cloud. "We figured out, essentially, that if we're able to select instance classes using EC2 Spot that had an extremely low likelihood of interruption and zero history of interruption, and we're willing to pay a price high enough, that we could virtually get the same guarantee using Kubernetes because we have enough nodes," says Software Engineer Zach Arnold, who led the migration to Kubernetes. "Now that we've re-architected these pieces of the application to not live on the same server, we can push out to many different servers and have a more stable deployment."</p>
<p>As a result, the team can now ship code any time of day. "That was risky because it could bring down your whole loan management software with it," says Arnold. "But we now can deploy safely and securely during the day."</p>
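<p>The selection logic Arnold describes can be approximated against the EC2 API. The sketch below uses the AWS SDK for Go to pull a day of Spot price history for one candidate instance class; the instance type and the $0.10 ceiling are invented parameters, not Ygrene's actual criteria.</p>

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	svc := ec2.New(session.Must(session.NewSession()))

	// Recent Spot prices for one candidate instance class.
	out, err := svc.DescribeSpotPriceHistory(&ec2.DescribeSpotPriceHistoryInput{
		InstanceTypes:       []*string{aws.String("m5.large")},
		ProductDescriptions: []*string{aws.String("Linux/UNIX")},
		StartTime:           aws.Time(time.Now().Add(-24 * time.Hour)),
	})
	if err != nil {
		log.Fatal(err)
	}

	// Illustrative ceiling: if observed prices never approach what we
	// are prepared to pay, interruption is unlikely in practice.
	const ceiling = 0.10 // USD per hour, made up for this sketch
	for _, p := range out.SpotPriceHistory {
		fmt.Printf("%s %s: $%s/hr (our ceiling: $%.2f)\n",
			aws.StringValue(p.AvailabilityZone),
			aws.StringValue(p.InstanceType),
			aws.StringValue(p.SpotPrice), ceiling)
	}
}
```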
{{< case-studies/quote >}}
"In order to scale before, we would need to up our instance sizes, incurring high cost for low value," says Adams. "Now with Kubernetes and kops, we are able to scale horizontally on Spot with multiple instance groups."
{{< /case-studies/quote >}}
<p>Before, deployments typically took three to four hours, and two or three months' worth of work would be deployed at low-traffic times every week or two weeks. Now, they take five minutes for Kubernetes, and an hour for an overall deploy with smoke testing. And "we're able to deploy three or four times a week, with just one week's or two days' worth of work," Adams says. "We're deploying during the work week, in the daytime and without any downtime. We had to ask for business approval to take the systems down for 30 minutes to an hour, even in the middle of the night, because people could be doing loans. Now we can deploy, ship code, and migrate databases, all without taking the system down. The company gets new features without worrying that some business will be lost or delayed."</p>
<p>Cloud native also affected how Ygrene's 50+ developers and contractors work. Adams and Arnold spent considerable time "teaching people to think distributed out of the box," says Arnold. "We ended up picking what we call the Four S's of Shipping: safely, securely, stably, and speedily." (For more on the security piece of it, see their <a href="https://thenewstack.io/beyond-ci-cd-how-continuous-hacking-of-docker-containers-and-pipeline-driven-security-keeps-ygrene-secure/">article</a> on their "continuous hacking" strategy.) As for the engineers, says Adams, "they have been able to advance as their software has advanced. I think that at the end of the day, the developers feel better about what they're doing, and they also feel more connected to the modern software development community."</p>
<p>Looking ahead, Adams is excited to explore more CNCF projects, including SPIFFE and SPIRE. "CNCF has been an amazing incubator for so many projects," he says. "Now we look at its webpage regularly to find out if there are any new, awesome, high-quality projects we can implement into our stack. It's actually become a hub for us for knowing what software we need to be looking at to make our systems more secure or more scalable."</p>

@ -1,83 +0,0 @@
---
title: Zalando Case Study
case_study_styles: true
cid: caseStudies
new_case_study_styles: true
heading_background: /images/case-studies/zalando/banner1.jpg
heading_title_logo: /images/zalando_logo.png
subheading: >
Europe's Leading Online Fashion Platform Gets Radical with Cloud Native
case_study_details:
- Company: Zalando
- Location: Berlin, Germany
- Industry: Online Fashion
---
<h2>Challenge</h2>
<p>Zalando, Europe's leading online fashion platform, has experienced exponential growth since it was founded in 2008. In 2015, with plans to further expand its original e-commerce site to include new services and products, Zalando embarked on a <a href="https://jobs.zalando.com/tech/blog/radical-agility-study-notes/">radical transformation</a> resulting in autonomous self-organizing teams. This change required an infrastructure that could scale with the growth of the engineering organization. Zalando's technology department began rewriting its applications to be cloud-ready and started moving its infrastructure from on-premise data centers to the cloud. Orchestration wasn't immediately considered, but as teams migrated to <a href="https://aws.amazon.com/">Amazon Web Services</a> (AWS): "We saw the pain teams were having with infrastructure and Cloud Formation on AWS," says Henning Jacobs, Head of Developer Productivity. "There's still too much operational overhead for the teams and compliance." To provide better support, cluster management was brought into play.</p>
<h2>Solution</h2>
<p>The company now runs its Docker containers on AWS using Kubernetes orchestration.</p>
<h2>Impact</h2>
<p>With the old infrastructure "it was difficult to properly embrace new technologies, and DevOps teams were considered to be a bottleneck," says Jacobs. "Now, with this cloud infrastructure, they have this packaging format, which can contain anything that runs on the Linux kernel. This makes a lot of people pretty happy. The engineers love autonomy."</p>
{{< case-studies/quote author="Henning Jacobs, Head of Developer Productivity at Zalando" >}}
"We envision all Zalando delivery teams running their containerized applications on a state-of-the-art, reliable and scalable cluster infrastructure provided by Kubernetes."
{{< /case-studies/quote >}}
{{< case-studies/lead >}}
When Henning Jacobs arrived at Zalando in 2010, the company was just two years old with 180 employees running an online store for European shoppers to buy fashion items.
{{< /case-studies/lead >}}
<p>"It started as a PHP e-commerce site which was easy to get started with, but was not scaling with the business' needs" says Jacobs, Head of Developer Productivity at Zalando.</p>
<p>At that time, the company began expanding beyond its German origins into other European markets. Fast-forward to today and Zalando now has more than 14,000 employees, 3.6 billion Euro in revenue for 2016 and operates across 15 countries. "With growth in all dimensions, and constant scaling, it has been a once-in-a-lifetime experience," he says.</p>
<p>Not to mention a unique opportunity for an infrastructure specialist like Jacobs. Just after he joined, the company began rewriting all their applications in-house. "That was generally our strategy," he says. "For example, we started with our own logistics warehouses but at first you don't know how to do logistics software, so you have some vendor software. And then we replaced it with our own because with off-the-shelf software you're not competitive. You need to optimize these processes based on your specific business needs."</p>
<p>In parallel to rewriting their applications, Zalando had set a goal of expanding beyond basic e-commerce to a platform offering multi-tenancy, a dramatic increase in assortments and styles, same-day delivery and even <a href="https://www.zalon.de">your own personal online stylist</a>.</p>
<p>The need to scale ultimately led the company on a cloud-native journey. As did its embrace of a microservices-based software architecture that gives engineering teams more autonomy and ownership of projects. "This move to the cloud was necessary because in the data center you couldn't have autonomous teams. You have the same infrastructure and it was very homogeneous, so you could only run your Java or Python app," Jacobs says.</p>
{{< case-studies/quote image="/images/case-studies/zalando/banner3.jpg" >}}
"This move to the cloud was necessary because in the data center you couldn't have autonomous teams. You have the same infrastructure and it was very homogeneous, so you could only run your Java or Python app."
{{< /case-studies/quote >}}
<p>Zalando began moving its infrastructure from two on-premise data centers to the cloud, requiring the migration of older applications for cloud-readiness. "We decided to have a clean break," says Jacobs. "Our <a href="https://aws.amazon.com/">Amazon Web Services</a> infrastructure was set up like so: Every team has its own AWS account, which is completely isolated, meaning there's no 'lift and shift.' You basically have to rewrite your application to make it cloud-ready even down to the persistence layer. We bravely went back to the drawing board and redid everything, first choosing Docker as a common containerization, then building the infrastructure from there."</p>
<p>The company decided to hold off on orchestration at the beginning, but as teams were migrated to AWS, "we saw the pain teams were having with infrastructure and cloud formation on AWS," says Jacobs.</p>
<p>Zalando's 200+ autonomous engineering teams decide what technologies to use and operate their own applications using their own AWS accounts. This setup proved to be a compliance challenge. Even with strict rules-of-play and automated compliance checks in place, engineering teams and IT-compliance were overburdened addressing compliance issues. "Violations appear for non-compliant behavior, which we detect when scanning the cloud infrastructure," says Jacobs. "Everything is possible and nothing enforced, so you have to live with violations (and resolve them) instead of preventing the error in the first place. This means overhead for teams—and overhead for compliance and operations. It also takes time to spin up new EC2 instances on AWS, which affects our deployment velocity."</p>
<p>The team realized they needed to "leverage the value you get from cluster management," says Jacobs. When they first looked at Platform as a Service (PaaS) options in 2015, the market was fragmented; but "now there seems to be a clear winner. It seemed like a good bet to go with Kubernetes."</p>
<p>The transition to Kubernetes started in 2016 during Zalando's <a href="https://jobs.zalando.com/tech/blog/hack-week-5-is-live/?gh_src=4n3gxh1">Hack Week</a>, where participants deployed their projects to a Kubernetes cluster. From there, 60 members of the tech infrastructure department were onboarded - and then engineering teams were brought on one at a time. "We always start by talking with them and make sure everyone's expectations are clear," says Jacobs. "Then we conduct some Kubernetes training, which is mostly training for our CI/CD setup, because the user interface for our users is primarily through the CI/CD system. But they have to know fundamental Kubernetes concepts and the API. This is followed by a weekly sync with each team to check their progress. Once they have something in production, we want to see if everything is fine on top of what we can improve."</p>
{{< case-studies/quote image="/images/case-studies/zalando/banner4.jpg" >}}
Once Zalando began migrating applications to Kubernetes, the results were immediate. "Kubernetes is a cornerstone for our seamless end-to-end developer experience. We are able to ship ideas to production using a single consistent and declarative API," says Jacobs.
{{< /case-studies/quote >}}
<p>At the moment, Zalando is running an initial 40 Kubernetes clusters with plans to scale for the foreseeable future. Once Zalando began migrating applications to Kubernetes, the results were immediate. "Kubernetes is a cornerstone for our seamless end-to-end developer experience. We are able to ship ideas to production using a single consistent and declarative API," says Jacobs. "The self-healing infrastructure provides a frictionless experience with higher-level abstractions built upon low-level best practices. We envision all Zalando delivery teams will run their containerized applications on a state-of-the-art reliable and scalable cluster infrastructure provided by Kubernetes."</p>
<p>With the old on-premise infrastructure "it was difficult to properly embrace new technologies, and DevOps teams were considered to be a bottleneck," says Jacobs. "Now, with this cloud infrastructure, they have this packaging format, which can contain anything that runs in the Linux kernel. This makes a lot of people pretty happy. The engineers love the autonomy."</p>
<p>There were a few challenges in Zalando's Kubernetes implementation. "We are a team of seven people providing clusters to different engineering teams, and our goal is to provide a rock-solid experience for all of them," says Jacobs. "We don't want pet clusters. We don't want to have to understand what workload they have; it should just work out of the box. With that in mind, cluster autoscaling is important. There are many different ways of doing cluster management, and this is not part of the core. So we created two components to provision clusters, have a registry for clusters, and to manage the whole cluster life cycle."</p>
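<p>To give that tooling a concrete shape, here is a hedged sketch of the kind of record a cluster registry might keep per cluster. Zalando's actual schema isn't reproduced here; every field name below is an assumption for illustration.</p>

```go
package main

import "fmt"

// Cluster is a hypothetical registry entry: enough metadata for
// lifecycle tooling to provision, update, and eventually decommission
// a cluster without anyone hand-tending it as a "pet."
type Cluster struct {
	ID              string         // e.g. "aws:123456789012:eu-central-1:prod-1"
	Provider        string         // "aws"
	Region          string
	Channel         string         // rollout channel for cluster updates
	LifecycleStatus string         // "requested" -> "ready" -> "decommissioned"
	NodePools       map[string]int // pool name -> desired node count
}

func main() {
	registry := map[string]Cluster{
		"prod-1": {
			ID:              "aws:123456789012:eu-central-1:prod-1",
			Provider:        "aws",
			Region:          "eu-central-1",
			Channel:         "stable",
			LifecycleStatus: "ready",
			NodePools:       map[string]int{"default": 12},
		},
	}
	fmt.Printf("%+v\n", registry["prod-1"])
}
```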
<p>Jacobs's team also worked to improve the Kubernetes-AWS integration. Plus, "there are still a lot of best practices missing," says Jacobs. The team, for example, recently solved a pod security policy issue. "There was already a concept in Kubernetes but it wasn't documented, so it was kind of tricky," he says. The large Kubernetes community was a big help to resolve the issue. To help other companies start down the same path, Jacobs compiled his team's learnings in a document called <a href="http://kubernetes-on-aws.readthedocs.io/en/latest/admin-guide/kubernetes-in-production.html">Running Kubernetes in Production</a>.</p>
{{< case-studies/quote >}}
"The Kubernetes API allows us to run applications in a cloud provider-agnostic way, which gives us the freedom to revisit IaaS providers in the coming years... We expect the Kubernetes API to be the global standard for PaaS infrastructure and are excited about the continued journey."
{{< /case-studies/quote >}}
<p>In the end, Kubernetes made it possible for Zalando to introduce and maintain the new products the company envisioned to grow its platform. "<a href="https://www.zalon.de/">The fashion advice</a> product used Scala, and there were struggles to make this possible with our former infrastructure," says Jacobs. "It was a workaround, and that team needed more and more support from the platform team, just because they used different technologies. Now with Kubernetes, it's autonomous. Whatever the workload is, that team can just go their way, and Kubernetes prevents other bottlenecks."</p>
<p>Looking ahead, Jacobs sees Zalando's new infrastructure as a great enabler for other things the company has in the works, from its new logistics software, to a platform feature connecting brands, to products dreamed up by data scientists. "One vision is if you watch the next James Bond movie and see the suit he's wearing, you should be able to automatically order it, and have it delivered to you within an hour," says Jacobs. "It's about connecting the full fashion sphere. This is definitely not possible if you have a bottleneck with everyone running in the same data center and thus very restricted. You need infrastructure to scale each autonomous team's idea."</p>
<p>For other companies considering this technology, Jacobs says he wouldn't necessarily advise doing it exactly the same way Zalando did. "It's okay to do so if you're ready to fail at some things," he says. "You need to set the right expectations. Not everything will work. Rewriting apps and this type of organizational change can be disruptive. The first product we moved was critical. There were a lot of dependencies, and it took longer than expected. Maybe we should have started with something less complicated, less business critical, just to get our toes wet."</p>
<p>But once they got to the other side "it was clear for everyone that there's no big alternative," Jacobs adds. "The Kubernetes API allows us to run applications in a cloud provider-agnostic way, which gives us the freedom to revisit IaaS providers in the coming years. Zalando Technology benefits from migrating to Kubernetes as we are able to leverage our existing knowledge to create an engineering platform offering flexibility and speed to our engineers while significantly reducing the operational overhead. We expect the Kubernetes API to be the global standard for PaaS infrastructure and are excited about the continued journey."</p>
