From 4d9ee76ace29aa281c669a55e28d8abf07f8b4c8 Mon Sep 17 00:00:00 2001 From: "M. Habib Rosyad" Date: Sun, 6 Sep 2020 18:13:40 +0700 Subject: [PATCH] Improve maintainability of Case Studies styling - Add quote and lead shortcode - Add case study metadata in front matter (to generate page) - Allow case study page to inherit similar styles from a centralised CSS file. --- assets/scss/_case-studies.scss | 3 +- content/en/case-studies/adform/index.html | 130 ++++++-------- content/en/case-studies/adidas/index.html | 166 +++++++----------- content/en/case-studies/amadeus/index.html | 159 ++++++++--------- content/en/case-studies/ancestry/index.html | 151 +++++++--------- .../en/case-studies/ant-financial/index.html | 140 +++++++-------- content/en/case-studies/appdirect/index.html | 134 +++++++------- content/en/case-studies/babylon/index.html | 133 ++++++-------- content/en/case-studies/blablacar/index.html | 147 +++++++--------- content/en/case-studies/blackrock/index.html | 139 ++++++--------- .../en/case-studies/booking-com/index.html | 153 +++++++--------- content/en/case-studies/booz-allen/index.html | 147 +++++++--------- content/en/case-studies/bose/index.html | 134 +++++++------- content/en/case-studies/box/index.html | 154 ++++++++-------- content/en/case-studies/buffer/index.html | 153 +++++++--------- .../en/case-studies/capital-one/index.html | 130 +++++--------- content/en/case-studies/cern/index.html | 128 ++++++-------- .../en/case-studies/chinaunicom/index.html | 124 ++++++------- .../case-studies/city-of-montreal/index.html | 140 +++++++-------- content/en/case-studies/crowdfire/index.html | 144 +++++++-------- content/en/case-studies/denso/index.html | 130 ++++++-------- content/en/case-studies/golfnow/index.html | 164 +++++++---------- content/en/case-studies/haufegroup/index.html | 145 +++++++-------- content/en/case-studies/huawei/index.html | 126 ++++++------- content/en/case-studies/ibm/index.html | 136 ++++++-------- content/en/case-studies/ing/index.html | 129 ++++++-------- content/en/case-studies/jd-com/index.html | 136 +++++++------- content/en/case-studies/naic/index.html | 142 ++++++--------- content/en/case-studies/nav/index.html | 132 +++++++------- content/en/case-studies/nerdalize/index.html | 143 +++++++-------- content/en/case-studies/netease/index.html | 127 ++++++-------- .../en/case-studies/newyorktimes/index.html | 138 ++++++--------- content/en/case-studies/nokia/index.html | 124 ++++++------- content/en/case-studies/nordstrom/index.html | 135 ++++++-------- .../northwestern-mutual/index.html | 120 +++++-------- content/en/case-studies/ocado/index.html | 130 ++++++-------- content/en/case-studies/openAI/index.html | 116 +++++------- content/en/case-studies/peardeck/index.html | 158 +++++++---------- content/en/case-studies/pearson/index.html | 154 ++++++++-------- content/en/case-studies/pingcap/index.html | 135 +++++++------- content/en/case-studies/pinterest/index.html | 128 ++++++-------- content/en/case-studies/prowise/index.html | 144 +++++++-------- content/en/case-studies/ricardo-ch/index.html | 135 ++++++-------- content/en/case-studies/slamtec/index.html | 125 ++++++------- content/en/case-studies/slingtv/index.html | 129 ++++++-------- content/en/case-studies/sos/index.html | 134 +++++++------- content/en/case-studies/spotify/index.html | 131 ++++++-------- .../en/case-studies/squarespace/index.html | 122 +++++-------- content/en/case-studies/thredup/index.html | 137 +++++++-------- content/en/case-studies/vsco/index.html | 134 ++++++-------- content/en/case-studies/wikimedia/index.html | 144 ++++++--------- content/en/case-studies/wink/index.html | 142 +++++++-------- content/en/case-studies/woorank/index.html | 135 +++++++------- content/en/case-studies/workiva/index.html | 140 +++++++-------- content/en/case-studies/ygrene/index.html | 123 +++++-------- content/en/case-studies/zalando/index.html | 146 +++++++-------- layouts/case-studies/single.html | 29 ++- layouts/partials/css.html | 4 + layouts/shortcodes/case-studies/lead.html | 1 + layouts/shortcodes/case-studies/quote.html | 12 ++ static/css/new-case-studies.css | 163 +++++++++++++++++ 61 files changed, 3389 insertions(+), 4398 deletions(-) create mode 100644 layouts/shortcodes/case-studies/lead.html create mode 100644 layouts/shortcodes/case-studies/quote.html create mode 100644 static/css/new-case-studies.css diff --git a/assets/scss/_case-studies.scss b/assets/scss/_case-studies.scss index d61d24189c..4f44864127 100644 --- a/assets/scss/_case-studies.scss +++ b/assets/scss/_case-studies.scss @@ -2,6 +2,7 @@ hr { background-color: #999999; + margin-top: 0; } h2 { @@ -10,7 +11,7 @@ h2 { .subhead { padding-bottom: 2% !important; - padding-top: 0% !important; + padding-top: 0.75em !important; } .details { diff --git a/content/en/case-studies/adform/index.html b/content/en/case-studies/adform/index.html index be35a2d837..1de43d0637 100644 --- a/content/en/case-studies/adform/index.html +++ b/content/en/case-studies/adform/index.html @@ -3,116 +3,84 @@ title: Adform Case Study linkTitle: Adform case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css logo: adform_featured_logo.png draft: false featured: true weight: 47 quote: > Kubernetes enabled the self-healing and immutable infrastructure. We can do faster releases, so our developers are really happy. They can ship our features faster than before, and that makes our clients happier. + +new_case_study_styles: true +heading_background: /images/case-studies/adform/banner1.jpg +heading_title_logo: /images/adform_logo.png +subheading: > + Improving Performance and Morale with Cloud Native +case_study_details: + - Company: AdForm + - Location: Copenhagen, Denmark + - Industry: Adtech --- -
-

CASE STUDY:
Improving Performance and Morale with Cloud Native +

Challenge

-

+

Adform's mission is to provide a secure and transparent full stack of advertising technology to enable digital ads across devices. The company has a large infrastructure: OpenStack-based private clouds running on 1,100 physical servers in 7 data centers around the world, 3 of which were opened in the past year. With the company's growth, the infrastructure team felt that "our private cloud was not really flexible enough," says IT System Engineer Edgaras Apšega. "The biggest pain point is that our developers need to maintain their virtual machines, so rolling out technology and new software takes time. We were really struggling with our releases, and we didn't have self-healing infrastructure."

-
+

Solution

-
- Company  AdForm     Location  Copenhagen, Denmark     Industry  Adtech -
- -
-
-
-
-

Challenge

- Adform’s mission is to provide a secure and transparent full stack of advertising technology to enable digital ads across devices. The company has a large infrastructure: OpenStack-based private clouds running on 1,100 physical servers in 7 data centers around the world, 3 of which were opened in the past year. With the company’s growth, the infrastructure team felt that "our private cloud was not really flexible enough," says IT System Engineer Edgaras Apšega. "The biggest pain point is that our developers need to maintain their virtual machines, so rolling out technology and new software takes time. We were really struggling with our releases, and we didn’t have self-healing infrastructure." - - -
- -

Solution

- The team, which had already been using Prometheus for monitoring, embraced Kubernetes and cloud native practices in 2017. "To start our Kubernetes journey, we had to adapt all our software, so we had to choose newer frameworks," says Apšega. "We also adopted the microservices way, so observability is much better because you can inspect the bug or the services separately." - - -
- -
+

The team, which had already been using Prometheus for monitoring, embraced Kubernetes and cloud native practices in 2017. "To start our Kubernetes journey, we had to adapt all our software, so we had to choose newer frameworks," says Apšega. "We also adopted the microservices way, so observability is much better because you can inspect the bug or the services separately."

Impact

- "Kubernetes helps our business a lot because our features are coming to market faster," says Apšega. The release process went from several hours to several minutes. Autoscaling has been at least 6 times faster than the semi-manual VM bootstrapping and application deployment required before. The team estimates that the company has experienced cost savings of 4-5x due to less hardware and fewer man hours needed to set up the hardware and virtual machines, metrics, and logging. Utilization of the hardware resources has been reduced as well, with containers notching 2-3 times more efficiency over virtual machines. "The deployments are very easy because developers just push the code and it automatically appears on Kubernetes," says Apšega. Prometheus has also had a positive impact: "It provides high availability for metrics and alerting. We monitor everything starting from hardware to applications. Having all the metrics in Grafana dashboards provides great insight on your systems." +

"Kubernetes helps our business a lot because our features are coming to market faster," says Apšega. The release process went from several hours to several minutes. Autoscaling has been at least 6 times faster than the semi-manual VM bootstrapping and application deployment required before. The team estimates that the company has experienced cost savings of 4-5x due to less hardware and fewer man hours needed to set up the hardware and virtual machines, metrics, and logging. Utilization of the hardware resources has been reduced as well, with containers notching 2-3 times more efficiency over virtual machines. "The deployments are very easy because developers just push the code and it automatically appears on Kubernetes," says Apšega. Prometheus has also had a positive impact: "It provides high availability for metrics and alerting. We monitor everything starting from hardware to applications. Having all the metrics in Grafana dashboards provides great insight on your systems."

-
+{{< case-studies/quote author="Edgaras Apšega, IT Systems Engineer, Adform" >}} +"Kubernetes enabled the self-healing and immutable infrastructure. We can do faster releases, so our developers are really happy. They can ship our features faster than before, and that makes our clients happier." +{{< /case-studies/quote >}} -
-
-
-
-"Kubernetes enabled the self-healing and immutable infrastructure. We can do faster releases, so our developers are really happy. They can ship our features faster than before, and that makes our clients happier."

— Edgaras Apšega, IT Systems Engineer, Adform
+{{< case-studies/lead >}} +Adform made headlines last year when it detected the HyphBot ad fraud network that was costing some businesses hundreds of thousands of dollars a day. +{{< /case-studies/lead >}} -
-
+

With its mission to provide a secure and transparent full stack of advertising technology to enable an open internet, Adform published a white paper revealing what it did—and others could too—to limit customers' exposure to the scam.

+

In that same spirit, Adform is sharing its cloud native journey. "When you see that everyone shares their best practices, it inspires you to contribute back to the project," says IT Systems Engineer Edgaras Apšega.

-
-
-

Adform made headlines last year when it detected the HyphBot ad fraud network that was costing some businesses hundreds of thousands of dollars a day.

With its mission to provide a secure and transparent full stack of advertising technology to enable an open internet, Adform published a white paper revealing what it did—and others could too—to limit customers’ exposure to the scam.

-In that same spirit, Adform is sharing its cloud native journey. "When you see that everyone shares their best practices, it inspires you to contribute back to the project," says IT Systems Engineer Edgaras Apšega.

-The company has a large infrastructure: OpenStack-based private clouds running on 1,100 physical servers in their own seven data centers around the world, three of which were opened in the past year. With the company’s growth, the infrastructure team felt that "our private cloud was not really flexible enough," says Apšega. "The biggest pain point is that our developers need to maintain their virtual machines, so rolling out technology and new software really takes time. We were really struggling with our releases, and we didn’t have self-healing infrastructure." +

The company has a large infrastructure: OpenStack-based private clouds running on 1,100 physical servers in their own seven data centers around the world, three of which were opened in the past year. With the company's growth, the infrastructure team felt that "our private cloud was not really flexible enough," says Apšega. "The biggest pain point is that our developers need to maintain their virtual machines, so rolling out technology and new software really takes time. We were really struggling with our releases, and we didn't have self-healing infrastructure."

+{{< case-studies/quote + image="/images/case-studies/adform/banner3.jpg" + author="Edgaras Apšega, IT Systems Engineer, Adform" +>}} +"The fact that Cloud Native Computing Foundation incubated Kubernetes was a really big point for us because it was vendor neutral. And we can see that a community really gathers around it. Everyone shares their experiences, their knowledge, and the fact that it's open source, you can contribute." +{{< /case-studies/quote >}} -
-
-
-
- "The fact that Cloud Native Computing Foundation incubated Kubernetes was a really big point for us because it was vendor neutral. And we can see that a community really gathers around it. Everyone shares their experiences, their knowledge, and the fact that it’s open source, you can contribute."

— Edgaras Apšega, IT Systems Engineer, Adform
-
-
-
-
+

The team, which had already been using Prometheus for monitoring, embraced Kubernetes, microservices, and cloud native practices. "The fact that Cloud Native Computing Foundation incubated Kubernetes was a really big point for us because it was vendor neutral," says Apšega. "And we can see that a community really gathers around it."

-The team, which had already been using Prometheus for monitoring, embraced Kubernetes, microservices, and cloud native practices. "The fact that Cloud Native Computing Foundation incubated Kubernetes was a really big point for us because it was vendor neutral," says Apšega. "And we can see that a community really gathers around it."

-A proof of concept project was started, with a Kubernetes cluster running on bare metal in the data center. When developers saw how quickly containers could be spun up compared to the virtual machine process, "they wanted to ship their containers in production right away, and we were still doing proof of concept," says IT Systems Engineer Andrius Cibulskis. -Of course, a lot of work still had to be done. "First of all, we had to learn Kubernetes, see all of the moving parts, how they glue together," says Apšega. "Second of all, the whole CI/CD part had to be redone, and our DevOps team had to invest more man hours to implement it. And third is that developers had to rewrite the code, and they’re still doing it." -

-The first production cluster was launched in the spring of 2018, and is now up to 20 physical machines dedicated for pods throughout three data centers, with plans for separate clusters in the other four data centers. The user-facing Adform application platform, data distribution platform, and back ends are now all running on Kubernetes. "Many APIs for critical applications are being developed for Kubernetes," says Apšega. "Teams are rewriting their applications to .NET core, because it supports containers, and preparing to move to Kubernetes. And new applications, by default, go in containers." +

A proof of concept project was started, with a Kubernetes cluster running on bare metal in the data center. When developers saw how quickly containers could be spun up compared to the virtual machine process, "they wanted to ship their containers in production right away, and we were still doing proof of concept," says IT Systems Engineer Andrius Cibulskis.

+

Of course, a lot of work still had to be done. "First of all, we had to learn Kubernetes, see all of the moving parts, how they glue together," says Apšega. "Second of all, the whole CI/CD part had to be redone, and our DevOps team had to invest more man hours to implement it. And third is that developers had to rewrite the code, and they're still doing it."

-
-
-
-
-"Releases are really nice for them, because they just push their code to Git and that’s it. They don’t have to worry about their virtual machines anymore."

— Andrius Cibulskis, IT Systems Engineer, Adform
-
-
+

The first production cluster was launched in the spring of 2018, and is now up to 20 physical machines dedicated for pods throughout three data centers, with plans for separate clusters in the other four data centers. The user-facing Adform application platform, data distribution platform, and back ends are now all running on Kubernetes. "Many APIs for critical applications are being developed for Kubernetes," says Apšega. "Teams are rewriting their applications to .NET core, because it supports containers, and preparing to move to Kubernetes. And new applications, by default, go in containers."

-
-
-This big push has been driven by the real impact that these new practices have had. "Kubernetes helps our business a lot because our features are coming to market faster," says Apšega. "The deployments are very easy because developers just push the code and it automatically appears on Kubernetes." The release process went from several hours to several minutes. Autoscaling is at least six times faster than the semi-manual VM bootstrapping and application deployment required before.

-The team estimates that the company has experienced cost savings of 4-5x due to less hardware and fewer man hours needed to set up the hardware and virtual machines, metrics, and logging. Utilization of the hardware resources has been reduced as well, with containers notching two to three times more efficiency over virtual machines.

-Prometheus has also had a positive impact: "It provides high availability for metrics and alerting," says Apšega. "We monitor everything starting from hardware to applications. Having all the metrics in Grafana dashboards provides great insight on our systems." +{{< case-studies/quote + image="/images/case-studies/adform/banner4.jpg" + author="Andrius Cibulskis, IT Systems Engineer, Adform" +>}} +"Releases are really nice for them, because they just push their code to Git and that's it. They don't have to worry about their virtual machines anymore." +{{< /case-studies/quote >}} +

This big push has been driven by the real impact that these new practices have had. "Kubernetes helps our business a lot because our features are coming to market faster," says Apšega. "The deployments are very easy because developers just push the code and it automatically appears on Kubernetes." The release process went from several hours to several minutes. Autoscaling is at least six times faster than the semi-manual VM bootstrapping and application deployment required before.

+

The team estimates that the company has experienced cost savings of 4-5x due to less hardware and fewer man hours needed to set up the hardware and virtual machines, metrics, and logging. Utilization of the hardware resources has been reduced as well, with containers notching two to three times more efficiency over virtual machines.

-
+

Prometheus has also had a positive impact: "It provides high availability for metrics and alerting," says Apšega. "We monitor everything starting from hardware to applications. Having all the metrics in Grafana dashboards provides great insight on our systems."

-
-
- "I think that our company just started our cloud native journey. It seems like a huge road ahead, but we’re really happy that we joined it."

— Edgaras Apšega, IT Systems Engineer, Adform
-
-
+{{< case-studies/quote author="Edgaras Apšega, IT Systems Engineer, Adform" >}} +"I think that our company just started our cloud native journey. It seems like a huge road ahead, but we're really happy that we joined it." +{{< /case-studies/quote >}} -
-All of these benefits have trickled down to individual team members, whose working lives have been changed for the better. "They used to have to get up at night to re-start some services, and now Kubernetes handles all of that," says Apšega. Adds Cibulskis: "Releases are really nice for them, because they just push their code to Git and that’s it. They don’t have to worry about their virtual machines anymore." Even the security teams have been impacted. "Security teams are always not happy," says Apšega, "and now they’re happy because they can easily inspect the containers." -The company plans to remain in the data centers for now, "mostly because we want to keep all the data, to not share it in any way," says Cibulskis, "and it’s cheaper at our scale." But, Apšega says, the possibility of using a hybrid cloud for computing is intriguing: "One of the projects we’re interested in is the Virtual Kubelet that lets you spin up the working nodes on different clouds to do some computing." -

-Apšega, Cibulskis and their colleagues are keeping tabs on how the cloud native ecosystem develops, and are excited to contribute where they can. "I think that our company just started our cloud native journey," says Apšega. "It seems like a huge road ahead, but we’re really happy that we joined it." +

All of these benefits have trickled down to individual team members, whose working lives have been changed for the better. "They used to have to get up at night to re-start some services, and now Kubernetes handles all of that," says Apšega. Adds Cibulskis: "Releases are really nice for them, because they just push their code to Git and that's it. They don't have to worry about their virtual machines anymore." Even the security teams have been impacted. "Security teams are always not happy," says Apšega, "and now they're happy because they can easily inspect the containers."

+

The company plans to remain in the data centers for now, "mostly because we want to keep all the data, to not share it in any way," says Cibulskis, "and it's cheaper at our scale." But, Apšega says, the possibility of using a hybrid cloud for computing is intriguing: "One of the projects we're interested in is the Virtual Kubelet that lets you spin up the working nodes on different clouds to do some computing."

- -
- -
+

Apšega, Cibulskis and their colleagues are keeping tabs on how the cloud native ecosystem develops, and are excited to contribute where they can. "I think that our company just started our cloud native journey," says Apšega. "It seems like a huge road ahead, but we're really happy that we joined it."

diff --git a/content/en/case-studies/adidas/index.html b/content/en/case-studies/adidas/index.html index 5f9d0da24a..3a95a5aa1b 100644 --- a/content/en/case-studies/adidas/index.html +++ b/content/en/case-studies/adidas/index.html @@ -3,106 +3,76 @@ title: adidas Case Study linkTitle: adidas case_study_styles: true cid: caseStudies -css: /css/case-studies-gradient.css featured: false + +new_case_study_styles: true +heading_background: /images/case-studies/adidas/banner1.png +heading_title_text: adidas +use_gradient_overlay: true +subheading: > + Staying True to Its Culture, adidas Got 40% of Its Most Impactful Systems Running on Kubernetes in a Year +case_study_details: + - Company: adidas + - Location: Herzogenaurach, Germany + - Industry: Fashion --- -​ +

Challenge

+ +

In recent years, the adidas team was happy with its software choices from a technology perspective—but accessing all of the tools was a problem. For instance, "just to get a developer VM, you had to send a request form, give the purpose, give the title of the project, who's responsible, give the internal cost center a call so that they can do recharges," says Daniel Eichten, Senior Director of Platform Engineering. "The best case is you got your machine in half an hour. Worst case is half a week or sometimes even a week."

+ +

Solution

+ +

To improve the process, "we started from the developer point of view," and looked for ways to shorten the time it took to get a project up and running and into the adidas infrastructure, says Senior Director of Platform Engineering Fernando Cornago. They found the solution with containerization, agile development, continuous delivery, and a cloud native platform that includes Kubernetes and Prometheus.

-
-

CASE STUDY: adidas

-
Staying True to Its Culture, adidas Got 40% of Its Most Impactful Systems Running on Kubernetes in a Year
-
-​ -​ -
- Company  adidas     Location  Herzogenaurach, Germany     Industry  Fashion -
-​ -
-
-
-
-

Challenge

- In recent years, the adidas team was happy with its software choices from a technology perspective—but accessing all of the tools was a problem. For instance, “just to get a developer VM, you had to send a request form, give the purpose, give the title of the project, who’s responsible, give the internal cost center a call so that they can do recharges,” says Daniel Eichten, Senior Director of Platform Engineering. “The best case is you got your machine in half an hour. Worst case is half a week or sometimes even a week.” -

Solution

- To improve the process, “we started from the developer point of view,” and looked for ways to shorten the time it took to get a project up and running and into the adidas infrastructure, says Senior Director of Platform Engineering Fernando Cornago. They found the solution with containerization, agile development, continuous delivery, and a cloud native platform that includes Kubernetes and Prometheus. -​

Impact

- Just six months after the project began, 100% of the adidas e-commerce site was running on Kubernetes. Load time for the e-commerce site was reduced by half. Releases went from every 4-6 weeks to 3-4 times a day. With 4,000 pods, 200 nodes, and 80,000 builds per month, adidas is now running 40% of its most critical, impactful systems on its cloud native platform. -
-
-
-
-
- "For me, Kubernetes is a platform made by engineers for engineers. It’s relieving the development team from tasks that they don’t want to do, but at the same time giving the visibility of what is behind the curtain, so they can also control it." -

- FERNANDO CORNAGO, SENIOR DIRECTOR OF PLATFORM ENGINEERING AT ADIDAS

-
-
-​ -​ -
-
-

In recent years, the adidas team was happy with its software choices from a technology perspective—but accessing all of the tools was a problem.

-

For engineers at adidas, says Daniel Eichten, Senior Director of Platform Engineering, "it felt like being an artist with your hands tied behind your back, and you’re supposed to paint something."

-

- For instance, "just to get a developer VM, you had to send a request form, give the purpose, give the title of the project, who’s responsible, give the internal cost center a call so that they can do recharges," says Eichten. "Eventually, after a ton of approvals, then the provisioning of the machine happened within minutes, and then the best case is you got your machine in half an hour. Worst case is half a week or sometimes even a week." -

-

- To improve the process, "we started from the developer point of view," and looked for ways to shorten the time it took to get a project up and running and into the adidas infrastructure, says Senior Director of Platform Engineering Fernando Cornago.

-
-
-​ -
-
- "I call our cloud native platform the field of dreams. We built it, and we never anticipated that people would come and just love it."

- DANIEL EICHTEN, SENIOR DIRECTOR OF PLATFORM ENGINEERING AT ADIDAS

-
-
-​ -
-
-

- "We were engineers before," adds Eichten. "We know what a typical engineer needs, is craving for, what he or she doesn’t want to take care of. For us it was pretty clear. We filled the gaps that no one wants to take care of, and we make the stuff that is usually painful as painless as possible." The goals: to improve speed, operability, and observability. -

-

- Cornago and Eichten found the solution with containerization, agile development, continuous delivery, and a cloud native platform that includes Kubernetes and Prometheus. "Choosing Kubernetes was pretty clear," says Eichten. "Day zero, deciding, easy. Day one, installing, configuring, easy. Day two, keeping it up and running even with small workloads, if something goes wrong, you don’t know how these things work in detail, you’re lost. For day two problems, we needed a partner who’s helping us." -

-

- In early 2017, adidas chose Giant Swarm to consult, install, configure, and run all of its Kubernetes clusters in AWS and on premise. "There is no competitive edge over our competitors like Puma or Nike in running and operating a Kubernetes cluster," says Eichten. "Our competitive edge is that we teach our internal engineers how to build cool e-comm stores that are fast, that are resilient, that are running perfectly." -

-
-
-​ -​ -
-
- “There is no competitive edge over our competitors like Puma or Nike in running and operating a Kubernetes cluster. Our competitive edge is that we teach our internal engineers how to build cool e-comm stores that are fast, that are resilient, that are running perfectly.”

- DANIEL EICHTEN, SENIOR DIRECTOR OF PLATFORM ENGINEERING AT ADIDAS

-
-
-​ -
-
-

- Adds Cornago: “For me, our Kubernetes platform is made by engineers for engineers. It’s relieving the development team from tasks that they don’t want to do, but at the same time giving the visibility of what is behind the curtain, so they can also control it.” -

-

- Case in point: For Cyber Week, the team has to create a lot of custom metrics. In November 2017, “because we used the same Prometheus that we use for monitoring the cluster, we really filled the Prometheus database, and we were not able to reduce the retention period [enough],” says Cornago. So during the freeze period before the peak shopping week, five engineers from the platform team worked with five engineers from the e-comm team to figure out a federated solution that was implemented in two days. -

-

- In addition to being ready for Cyber Week—100% of the adidas e-commerce site was running on Kubernetes then, just six months after the project began—the cloud native stack has had other impressive results. Load time for the e-commerce site was reduced by half. Releases went from every 4-6 weeks to 3-4 times a day. With 4,000 pods, 200 nodes, and 80,000 builds per month, adidas is now running 40% of its most critical, impactful systems on its cloud native platform. -

-

- And adoption has spread quickly among adidas’s 300-strong engineering corps. “I call our cloud native platform the field of dreams,” says Eichten. “We built it, and we never anticipated that people would come and just love it.” -

-

- For one thing, “everybody who can touch a line of code” has spent one full week onboarding and learning the platform with members of the 35-person platform engineering team, says Cornago. “We try to spend 50% of our time sitting with the teams, because this is the only way to understand how our platform is being used. And this is how the teams will feel safe that there is someone on the other side of the wall, also feeling the pain.” -

-

- Additionally, Cornago and Eichten took advantage of the fact that as a fashion athletic wear brand, adidas has sports and competition in its DNA. “Top-down mandates don’t work at adidas, but gamification works,” says Cornago. “So this year we had a DevOps Cup competition. Every team created new technical capabilities and had a hypothesis of how this affected business value. We announced the winner at a big internal tech summit with more than 600 people. It’s been really, really useful for the teams.” -

-

- So if they had any advice for other companies looking to start a cloud native journey, it would be this: “There is no one-size-fits-all for all companies,” says Cornago. “Apply your company’s culture to everything that you do.” -

-
-
- + +

Just six months after the project began, 100% of the adidas e-commerce site was running on Kubernetes. Load time for the e-commerce site was reduced by half. Releases went from every 4-6 weeks to 3-4 times a day. With 4,000 pods, 200 nodes, and 80,000 builds per month, adidas is now running 40% of its most critical, impactful systems on its cloud native platform.

+ +{{< case-studies/quote + image="/images/case-studies/adidas/banner2.png" + author="FERNANDO CORNAGO, SENIOR DIRECTOR OF PLATFORM ENGINEERING AT ADIDAS" +>}} +"For me, Kubernetes is a platform made by engineers for engineers. It's relieving the development team from tasks that they don't want to do, but at the same time giving the visibility of what is behind the curtain, so they can also control it." +{{< /case-studies/quote >}} + +{{< case-studies/lead >}} +In recent years, the adidas team was happy with its software choices from a technology perspective—but accessing all of the tools was a problem. +{{< /case-studies/lead >}} + +

For engineers at adidas, says Daniel Eichten, Senior Director of Platform Engineering, "it felt like being an artist with your hands tied behind your back, and you're supposed to paint something."

+ +

For instance, "just to get a developer VM, you had to send a request form, give the purpose, give the title of the project, who's responsible, give the internal cost center a call so that they can do recharges," says Eichten. "Eventually, after a ton of approvals, then the provisioning of the machine happened within minutes, and then the best case is you got your machine in half an hour. Worst case is half a week or sometimes even a week."

+ +

To improve the process, "we started from the developer point of view," and looked for ways to shorten the time it took to get a project up and running and into the adidas infrastructure, says Senior Director of Platform Engineering Fernando Cornago.

+ +{{< case-studies/quote author="DANIEL EICHTEN, SENIOR DIRECTOR OF PLATFORM ENGINEERING AT ADIDAS" >}} +"I call our cloud native platform the field of dreams. We built it, and we never anticipated that people would come and just love it." +{{< /case-studies/quote >}} + +

"We were engineers before," adds Eichten. "We know what a typical engineer needs, is craving for, what he or she doesn't want to take care of. For us it was pretty clear. We filled the gaps that no one wants to take care of, and we make the stuff that is usually painful as painless as possible." The goals: to improve speed, operability, and observability.

+ +

Cornago and Eichten found the solution with containerization, agile development, continuous delivery, and a cloud native platform that includes Kubernetes and Prometheus. "Choosing Kubernetes was pretty clear," says Eichten. "Day zero, deciding, easy. Day one, installing, configuring, easy. Day two, keeping it up and running even with small workloads, if something goes wrong, you don't know how these things work in detail, you're lost. For day two problems, we needed a partner who's helping us."

+ +

In early 2017, adidas chose Giant Swarm to consult, install, configure, and run all of its Kubernetes clusters in AWS and on premise. "There is no competitive edge over our competitors like Puma or Nike in running and operating a Kubernetes cluster," says Eichten. "Our competitive edge is that we teach our internal engineers how to build cool e-comm stores that are fast, that are resilient, that are running perfectly."

+ +{{< case-studies/quote + image="/images/case-studies/adidas/banner3.png" + author="DANIEL EICHTEN, SENIOR DIRECTOR OF PLATFORM ENGINEERING AT ADIDAS" +>}} +"There is no competitive edge over our competitors like Puma or Nike in running and operating a Kubernetes cluster. Our competitive edge is that we teach our internal engineers how to build cool e-comm stores that are fast, that are resilient, that are running perfectly." +{{< /case-studies/quote >}} + +

Adds Cornago: "For me, our Kubernetes platform is made by engineers for engineers. It's relieving the development team from tasks that they don't want to do, but at the same time giving the visibility of what is behind the curtain, so they can also control it."

+ +

Case in point: For Cyber Week, the team has to create a lot of custom metrics. In November 2017, "because we used the same Prometheus that we use for monitoring the cluster, we really filled the Prometheus database, and we were not able to reduce the retention period [enough]," says Cornago. So during the freeze period before the peak shopping week, five engineers from the platform team worked with five engineers from the e-comm team to figure out a federated solution that was implemented in two days.

+ +

In addition to being ready for Cyber Week—100% of the adidas e-commerce site was running on Kubernetes then, just six months after the project began—the cloud native stack has had other impressive results. Load time for the e-commerce site was reduced by half. Releases went from every 4-6 weeks to 3-4 times a day. With 4,000 pods, 200 nodes, and 80,000 builds per month, adidas is now running 40% of its most critical, impactful systems on its cloud native platform.

+ +

And adoption has spread quickly among adidas's 300-strong engineering corps. "I call our cloud native platform the field of dreams," says Eichten. "We built it, and we never anticipated that people would come and just love it."

+ +

For one thing, "everybody who can touch a line of code" has spent one full week onboarding and learning the platform with members of the 35-person platform engineering team, says Cornago. "We try to spend 50% of our time sitting with the teams, because this is the only way to understand how our platform is being used. And this is how the teams will feel safe that there is someone on the other side of the wall, also feeling the pain."

+ +

Additionally, Cornago and Eichten took advantage of the fact that as a fashion athletic wear brand, adidas has sports and competition in its DNA. "Top-down mandates don't work at adidas, but gamification works," says Cornago. "So this year we had a DevOps Cup competition. Every team created new technical capabilities and had a hypothesis of how this affected business value. We announced the winner at a big internal tech summit with more than 600 people. It's been really, really useful for the teams."

+ +

So if they had any advice for other companies looking to start a cloud native journey, it would be this: "There is no one-size-fits-all for all companies," says Cornago. "Apply your company's culture to everything that you do."

diff --git a/content/en/case-studies/amadeus/index.html b/content/en/case-studies/amadeus/index.html index 8b6294d4f9..bd647003ec 100644 --- a/content/en/case-studies/amadeus/index.html +++ b/content/en/case-studies/amadeus/index.html @@ -1,105 +1,84 @@ --- title: Amadeus Case Study - case_study_styles: true cid: caseStudies -css: /css/style_amadeus.css + +new_case_study_styles: true +heading_background: /images/case-studies/amadeus/banner1.jpg +heading_title_logo: /images/amadeus_logo.png +subheading: > + Another Technical Evolution for a 30-Year-Old Company +case_study_details: + - Company: Amadeus IT Group + - Location: Madrid, Spain + - Industry: Travel Technology --- -
-

CASE STUDY:
Another Technical Evolution for a 30-Year-Old Company -

-
-
- Company  Amadeus IT Group     Location  Madrid, Spain     Industry  Travel Technology -
-
-
-
-
-

Challenge

- In the past few years, Amadeus, which provides IT solutions to the travel industry around the world, found itself in need of a new platform for the 5,000 services supported by its service-oriented architecture. The 30-year-old company operates its own data center in Germany, and there were growing demands internally and externally for solutions that needed to be geographically dispersed. And more generally, "we had objectives of being even more highly available," says Eric Mountain, Senior Expert, Distributed Systems at Amadeus. Among the company’s goals: to increase automation in managing its infrastructure, optimize the distribution of workloads, use data center resources more efficiently, and adopt new technologies more easily. -
-
-

Solution

- Mountain has been overseeing the company’s migration to Kubernetes, using OpenShift Container Platform, Red Hat’s enterprise container platform. -

+

Challenge

+ +

In the past few years, Amadeus, which provides IT solutions to the travel industry around the world, found itself in need of a new platform for the 5,000 services supported by its service-oriented architecture. The 30-year-old company operates its own data center in Germany, and there were growing demands internally and externally for solutions that needed to be geographically dispersed. And more generally, "we had objectives of being even more highly available," says Eric Mountain, Senior Expert, Distributed Systems at Amadeus. Among the company's goals: to increase automation in managing its infrastructure, optimize the distribution of workloads, use data center resources more efficiently, and adopt new technologies more easily.

+ +

Solution

+ +

Mountain has been overseeing the company's migration to Kubernetes, using OpenShift Container Platform, Red Hat's enterprise container platform.

+

Impact

- One of the first projects the team deployed in Kubernetes was the Amadeus Airline Cloud Availability solution, which helps manage ever-increasing flight-search volume. "It’s now handling in production several thousand transactions per second, and it’s deployed in multiple data centers throughout the world," says Mountain. "It’s not a migration of an existing workload; it’s a whole new workload that we couldn’t have done otherwise. [This platform] gives us access to market opportunities that we didn’t have before." -
-
-
-
-
- "We want multi-data center capabilities, and we want them for our mainstream system as well. We didn’t think that we could achieve them with our existing system. We need new automation, things that Kubernetes and OpenShift bring."

- Eric Mountain, Senior Expert, Distributed Systems at Amadeus IT Group
-
-
-
+

One of the first projects the team deployed in Kubernetes was the Amadeus Airline Cloud Availability solution, which helps manage ever-increasing flight-search volume. "It's now handling in production several thousand transactions per second, and it's deployed in multiple data centers throughout the world," says Mountain. "It's not a migration of an existing workload; it's a whole new workload that we couldn't have done otherwise. [This platform] gives us access to market opportunities that we didn't have before."

-
-

In his two decades at Amadeus, Eric Mountain has been the migrations guy.

- Back in the day, he worked on the company’s move from Unix to Linux, and now he’s overseeing the journey to cloud native. "Technology just keeps changing, and we embrace it," he says. "We are celebrating our 30 years this year, and we continue evolving and innovating to stay cost-efficient and enhance everyone’s travel experience, without interrupting workflows for the customers who depend on our technology."

- That was the challenge that Amadeus—which provides IT solutions to the travel industry around the world, from flight searches to hotel bookings to customer feedback—faced in 2014. The technology team realized it was in need of a new platform for the 5,000 services supported by its service-oriented architecture.

- The tipping point occurred when they began receiving many requests, internally and externally, for solutions that needed to be geographically outside the company’s main data center in Germany. "Some requests were for running our applications on customer premises," Mountain says. "There were also new services we were looking to offer that required response time to the order of a few hundred milliseconds, which we couldn’t achieve with transatlantic traffic. Or at least, not without eating into a considerable portion of the time available to our applications for them to process individual queries."

- More generally, the company was interested in leveling up on high availability, increasing automation in managing infrastructure, optimizing the distribution of workloads and using data center resources more efficiently. "We have thousands and thousands of servers," says Mountain. "These servers are assigned roles, so even if the setup is highly automated, the machine still has a given role. It’s wasteful on many levels. For instance, an application doesn’t necessarily use the machine very optimally. Virtualization can help a bit, but it’s not a silver bullet. If that machine breaks, you still want to repair it because it has that role and you can’t simply say, ‘Well, I’ll bring in another machine and give it that role.’ It’s not fast. It’s not efficient. So we wanted the next level of automation."

-
-
+{{< case-studies/quote author="Eric Mountain, Senior Expert, Distributed Systems at Amadeus IT Group" >}} +"We want multi-data center capabilities, and we want them for our mainstream system as well. We didn't think that we could achieve them with our existing system. We need new automation, things that Kubernetes and OpenShift bring." +{{< /case-studies/quote >}} -
-
- "We hope that if we build on what others have built, what we do might actually be upstream-able. As Kubernetes and OpenShift progress, we see that we are indeed able to remove some of the additional layers we implemented to compensate for gaps we perceived earlier." -
-
+{{< case-studies/lead >}} +In his two decades at Amadeus, Eric Mountain has been the migrations guy. +{{< /case-studies/lead >}} -
-
- While mainly a C++ and Java shop, Amadeus also wanted to be able to adopt new technologies more easily. Some of its developers had started using languages like Python and databases like Couchbase, but Mountain wanted still more options, he says, "in order to better adapt our technical solutions to the products we offer, and open up entirely new possibilities to our developers." Working with recent technologies and cool new things would also make it easier to attract new talent. -

- All of those needs led Mountain and his team on a search for a new platform. "We did a set of studies and proofs of concept over a fairly short period, and we considered many technologies," he says. "In the end, we were left with three choices: build everything on premise, build on top of Kubernetes whatever happens to be missing from our point of view, or go with OpenShift and build whatever remains there." -

- The team decided against building everything themselves—though they’d done that sort of thing in the past—because "people were already inventing things that looked good," says Mountain. -

- Ultimately, they went with OpenShift Container Platform, Red Hat’s Kubernetes-based enterprise offering, instead of building on top of Kubernetes because "there was a lot of synergy between what we wanted and the way Red Hat was anticipating going with OpenShift," says Mountain. "They were clearly developing Kubernetes, and developing certain things ahead of time in OpenShift, which were important to us, such as more security." -

- The hope was that those particular features would eventually be built into Kubernetes, and, in the case of security, Mountain feels that has happened. "We realize that there’s always a certain amount of automation that we will probably have to develop ourselves to compensate for certain gaps," says Mountain. "The less we do that, the better for us. We hope that if we build on what others have built, what we do might actually be upstream-able. As Kubernetes and OpenShift progress, we see that we are indeed able to remove some of the additional layers we implemented to compensate for gaps we perceived earlier." -
-
+

Back in the day, he worked on the company's move from Unix to Linux, and now he's overseeing the journey to cloud native. "Technology just keeps changing, and we embrace it," he says. "We are celebrating our 30 years this year, and we continue evolving and innovating to stay cost-efficient and enhance everyone's travel experience, without interrupting workflows for the customers who depend on our technology."

-
-
- "It’s not a migration of an existing workload; it’s a whole new workload that we couldn’t have done otherwise. [This platform] gives us access to market opportunities that we didn’t have before." -
-
+

That was the challenge that Amadeus—which provides IT solutions to the travel industry around the world, from flight searches to hotel bookings to customer feedback—faced in 2014. The technology team realized it was in need of a new platform for the 5,000 services supported by its service-oriented architecture.

-
-
- The first project the team tackled was one that they knew had to run outside the data center in Germany. Because of the project’s needs, "We couldn’t rely only on the built-in Kubernetes service discovery; we had to layer on top of that an extra service discovery level that allows us to load balance at the operation level within our system," says Mountain. They also built a stream dedicated to monitoring, which at the time wasn’t offered in the Kubernetes or OpenShift ecosystem. Now that Prometheus and other products are available, Mountain says the company will likely re-evaluate their monitoring system: "We obviously always like to leverage what Kubernetes and OpenShift can offer." -

- The second project ended up going into production first: the Amadeus Airline Cloud Availability solution, which helps manage ever-increasing flight-search volume and was deployed in public cloud. Launched in early 2016, it is "now handling in production several thousand transactions per second, and it’s deployed in multiple data centers throughout the world," says Mountain. "It’s not a migration of an existing workload; it’s a whole new workload that we couldn’t have done otherwise. [This platform] gives us access to market opportunities that we didn’t have before." -

- Having been through this kind of technical evolution more than once, Mountain has advice on how to handle the cultural changes. "That’s one aspect that we can tackle progressively," he says. "We have to go on supplying our customers with new features on our pre-existing products, and we have to keep existing products working. So we can’t simply do absolutely everything from one day to the next. And we mustn’t sell it that way." -

- The first order of business, then, is to pick one or two applications to demonstrate that the technology works. Rather than choosing a high-impact, high-risk project, Mountain’s team selected a smaller application that was representative of all the company’s other applications in its complexity: "We just made sure we picked something that’s complex enough, and we showed that it can be done." -
-
+

The tipping point occurred when they began receiving many requests, internally and externally, for solutions that needed to be geographically outside the company's main data center in Germany. "Some requests were for running our applications on customer premises," Mountain says. "There were also new services we were looking to offer that required response time to the order of a few hundred milliseconds, which we couldn't achieve with transatlantic traffic. Or at least, not without eating into a considerable portion of the time available to our applications for them to process individual queries."

-
-
- "The bottom line is we want these multi-data center capabilities, and we want them as well for our mainstream system," he says. "And we don’t think that we can implement them with our previous system. We need the new automation, homogeneity, and scale that Kubernetes and OpenShift bring." -
-
+

More generally, the company was interested in leveling up on high availability, increasing automation in managing infrastructure, optimizing the distribution of workloads and using data center resources more efficiently. "We have thousands and thousands of servers," says Mountain. "These servers are assigned roles, so even if the setup is highly automated, the machine still has a given role. It's wasteful on many levels. For instance, an application doesn't necessarily use the machine very optimally. Virtualization can help a bit, but it's not a silver bullet. If that machine breaks, you still want to repair it because it has that role and you can't simply say, 'Well, I'll bring in another machine and give it that role.' It's not fast. It's not efficient. So we wanted the next level of automation."

-
-
- Next comes convincing people. "On the operations side and on the R&D side, there will be people who say quite rightly, ‘There is a system, and it works, so why change?’" Mountain says. "The only thing that really convinces people is showing them the value." For Amadeus, people realized that the Airline Cloud Availability product could not have been made available on the public cloud with the company’s existing system. The question then became, he says, "Do we go into a full-blown migration? Is that something that is justified?" -

- "The bottom line is we want these multi-data center capabilities, and we want them as well for our mainstream system," he says. "And we don’t think that we can implement them with our previous system. We need the new automation, homogeneity, and scale that Kubernetes and OpenShift bring." -

- So how do you get everyone on board? "Make sure you have good links between your R&D and your operations," he says. "Also make sure you’re going to talk early on to the investors and stakeholders. Figure out what it is that they will be expecting from you, that will convince them or not, that this is the right way for your company." -

- His other advice is simply to make the technology available for people to try it. "Kubernetes and OpenShift Origin are open source software, so there’s no complicated license key for the evaluation period and you’re not limited to 30 days," he points out. "Just go and get it running." Along with that, he adds, "You’ve got to be prepared to rethink how you do things. Of course making your applications as cloud native as possible is how you’ll reap the most benefits: 12 factors, CI/CD, which is continuous integration, continuous delivery, but also continuous deployment." -

- And while they explore that aspect of the technology, Mountain and his team will likely be practicing what he preaches to others taking the cloud native journey. "See what happens when you break it, because it’s important to understand the limits of the system," he says. Or rather, he notes, the advantages of it. "Breaking things on Kube is actually one of the nice things about it—it recovers. It’s the only real way that you’ll see that you might be able to do things." -
-
+{{< case-studies/quote image="/images/case-studies/amadeus/banner3.jpg" >}} +"We hope that if we build on what others have built, what we do might actually be upstream-able. As Kubernetes and OpenShift progress, we see that we are indeed able to remove some of the additional layers we implemented to compensate for gaps we perceived earlier." +{{< /case-studies/quote >}} + +

While mainly a C++ and Java shop, Amadeus also wanted to be able to adopt new technologies more easily. Some of its developers had started using languages like Python and databases like Couchbase, but Mountain wanted still more options, he says, "in order to better adapt our technical solutions to the products we offer, and open up entirely new possibilities to our developers." Working with recent technologies and cool new things would also make it easier to attract new talent.

+ +

All of those needs led Mountain and his team on a search for a new platform. "We did a set of studies and proofs of concept over a fairly short period, and we considered many technologies," he says. "In the end, we were left with three choices: build everything on premise, build on top of Kubernetes whatever happens to be missing from our point of view, or go with OpenShift and build whatever remains there."

+ +

The team decided against building everything themselves—though they'd done that sort of thing in the past—because "people were already inventing things that looked good," says Mountain.

+ +

Ultimately, they went with OpenShift Container Platform, Red Hat's Kubernetes-based enterprise offering, instead of building on top of Kubernetes because "there was a lot of synergy between what we wanted and the way Red Hat was anticipating going with OpenShift," says Mountain. "They were clearly developing Kubernetes, and developing certain things ahead of time in OpenShift, which were important to us, such as more security."

+ +

The hope was that those particular features would eventually be built into Kubernetes, and, in the case of security, Mountain feels that has happened. "We realize that there's always a certain amount of automation that we will probably have to develop ourselves to compensate for certain gaps," says Mountain. "The less we do that, the better for us. We hope that if we build on what others have built, what we do might actually be upstream-able. As Kubernetes and OpenShift progress, we see that we are indeed able to remove some of the additional layers we implemented to compensate for gaps we perceived earlier."

+ +{{< case-studies/quote image="/images/case-studies/amadeus/banner4.jpg" >}} +"It's not a migration of an existing workload; it's a whole new workload that we couldn't have done otherwise. [This platform] gives us access to market opportunities that we didn't have before." +{{< /case-studies/quote >}} + +

The first project the team tackled was one that they knew had to run outside the data center in Germany. Because of the project's needs, "We couldn't rely only on the built-in Kubernetes service discovery; we had to layer on top of that an extra service discovery level that allows us to load balance at the operation level within our system," says Mountain. They also built a stream dedicated to monitoring, which at the time wasn't offered in the Kubernetes or OpenShift ecosystem. Now that Prometheus and other products are available, Mountain says the company will likely re-evaluate their monitoring system: "We obviously always like to leverage what Kubernetes and OpenShift can offer." +

+ +

The second project ended up going into production first: the Amadeus Airline Cloud Availability solution, which helps manage ever-increasing flight-search volume and was deployed in public cloud. Launched in early 2016, it is "now handling in production several thousand transactions per second, and it's deployed in multiple data centers throughout the world," says Mountain. "It's not a migration of an existing workload; it's a whole new workload that we couldn't have done otherwise. [This platform] gives us access to market opportunities that we didn't have before."

+ +

Having been through this kind of technical evolution more than once, Mountain has advice on how to handle the cultural changes. "That's one aspect that we can tackle progressively," he says. "We have to go on supplying our customers with new features on our pre-existing products, and we have to keep existing products working. So we can't simply do absolutely everything from one day to the next. And we mustn't sell it that way."

+ +

The first order of business, then, is to pick one or two applications to demonstrate that the technology works. Rather than choosing a high-impact, high-risk project, Mountain's team selected a smaller application that was representative of all the company's other applications in its complexity: "We just made sure we picked something that's complex enough, and we showed that it can be done."

+ +{{< case-studies/quote >}} +"The bottom line is we want these multi-data center capabilities, and we want them as well for our mainstream system," he says. "And we don't think that we can implement them with our previous system. We need the new automation, homogeneity, and scale that Kubernetes and OpenShift bring." +{{< /case-studies/quote >}} + +

Next comes convincing people. "On the operations side and on the R&D side, there will be people who say quite rightly, 'There is a system, and it works, so why change?'" Mountain says. "The only thing that really convinces people is showing them the value." For Amadeus, people realized that the Airline Cloud Availability product could not have been made available on the public cloud with the company's existing system. The question then became, he says, "Do we go into a full-blown migration? Is that something that is justified?"

+ +

"The bottom line is we want these multi-data center capabilities, and we want them as well for our mainstream system," he says. "And we don't think that we can implement them with our previous system. We need the new automation, homogeneity, and scale that Kubernetes and OpenShift bring."

+ +

So how do you get everyone on board? "Make sure you have good links between your R&D and your operations," he says. "Also make sure you're going to talk early on to the investors and stakeholders. Figure out what it is that they will be expecting from you, that will convince them or not, that this is the right way for your company."

+ +

His other advice is simply to make the technology available for people to try it. "Kubernetes and OpenShift Origin are open source software, so there's no complicated license key for the evaluation period and you're not limited to 30 days," he points out. "Just go and get it running." Along with that, he adds, "You've got to be prepared to rethink how you do things. Of course making your applications as cloud native as possible is how you'll reap the most benefits: 12 factors, CI/CD, which is continuous integration, continuous delivery, but also continuous deployment."

+ +

And while they explore that aspect of the technology, Mountain and his team will likely be practicing what he preaches to others taking the cloud native journey. "See what happens when you break it, because it's important to understand the limits of the system," he says. Or rather, he notes, the advantages of it. "Breaking things on Kube is actually one of the nice things about it—it recovers. It's the only real way that you'll see that you might be able to do things."

diff --git a/content/en/case-studies/ancestry/index.html b/content/en/case-studies/ancestry/index.html index a992a284ac..8cab3c19c6 100644 --- a/content/en/case-studies/ancestry/index.html +++ b/content/en/case-studies/ancestry/index.html @@ -1,117 +1,92 @@ --- title: Ancestry Case Study - case_study_styles: true cid: caseStudies -css: /css/style_ancestry.css + +new_case_study_styles: true +heading_background: /images/case-studies/ancestry/banner1.jpg +heading_title_logo: /images/ancestry_logo.png +subheading: > + Digging Into the Past With New Technology +case_study_details: + - Company: Ancestry + - Location: Lehi, Utah + - Industry: Internet Company, Online Services --- -
-

CASE STUDY:
Digging Into the Past With New Technology

- -
- -
- Company  Ancestry     Location  Lehi, Utah     Industry  Internet Company, Online Services -
- -
- -
-
-
-

Challenge

-Ancestry, the global leader in family history and consumer genomics, uses sophisticated engineering and technology to help everyone, everywhere discover the story of what led to them. The company has spent more than 30 years innovating and building products and technologies that at their core, result in real and emotional human responses. Ancestry currently serves more than 2.6 million paying subscribers, holds 20 billion historical records, 90 million family trees and more than four million people are in its AncestryDNA network, making it the largest consumer genomics DNA network in the world. The company's popular website, ancestry.com, has been working with big data long before the term was popularized. The site was built on hundreds of services, technologies and a traditional deployment methodology. "It's worked well for us in the past," says Paul MacKay, software engineer and architect at Ancestry, "but had become quite cumbersome in its processing and is time-consuming. As a primarily online service, we are constantly looking for ways to accelerate to be more agile in delivering our solutions and our products." -
+

Ancestry, the global leader in family history and consumer genomics, uses sophisticated engineering and technology to help everyone, everywhere discover the story of what led to them. The company has spent more than 30 years innovating and building products and technologies that at their core, result in real and emotional human responses. Ancestry currently serves more than 2.6 million paying subscribers, holds 20 billion historical records, 90 million family trees and more than four million people are in its AncestryDNA network, making it the largest consumer genomics DNA network in the world. The company's popular website, ancestry.com, has been working with big data long before the term was popularized. The site was built on hundreds of services, technologies and a traditional deployment methodology. "It's worked well for us in the past," says Paul MacKay, software engineer and architect at Ancestry, "but had become quite cumbersome in its processing and is time-consuming. As a primarily online service, we are constantly looking for ways to accelerate to be more agile in delivering our solutions and our products."

-
+

Solution

-
-

Solution

+

The company is transitioning to cloud native infrastructure, using Docker containerization, Kubernetes orchestration and Prometheus for cluster monitoring.

- The company is transitioning to cloud native infrastructure, using Docker containerization, Kubernetes orchestration and Prometheus for cluster monitoring.
-

Impact

- "Every single product, every decision we make at Ancestry, focuses on delighting our customers with intimate, sometimes life-changing discoveries about themselves and their families," says MacKay. "As the company continues to grow, the increased productivity gains from using Kubernetes has helped Ancestry make customer discoveries faster. With the move to Dockerization for example, instead of taking between 20 to 50 minutes to deploy a new piece of code, we can now deploy in under a minute for much of our code. We’ve truly experienced significant time savings in addition to the various features and benefits from cloud native and Kubernetes-type technologies." -
-
-
-
-
- "At a certain point, you have to step back if you're going to push a new technology and get key thought leaders with engineers within the organization to become your champions for new technology adoption. At training sessions, the development teams were always the ones that were saying, 'Kubernetes saved our time tremendously; it's an enabler. It really is incredible.'"

- PAUL MACKAY, SOFTWARE ENGINEER AND ARCHITECT AT ANCESTRY -
-
+

"Every single product, every decision we make at Ancestry, focuses on delighting our customers with intimate, sometimes life-changing discoveries about themselves and their families," says MacKay. "As the company continues to grow, the increased productivity gains from using Kubernetes has helped Ancestry make customer discoveries faster. With the move to Dockerization for example, instead of taking between 20 to 50 minutes to deploy a new piece of code, we can now deploy in under a minute for much of our code. We've truly experienced significant time savings in addition to the various features and benefits from cloud native and Kubernetes-type technologies."

-
-
-

It started with a Shaky Leaf.

+{{< case-studies/quote author="PAUL MACKAY, SOFTWARE ENGINEER AND ARCHITECT AT ANCESTRY" >}} +"At a certain point, you have to step back if you're going to push a new technology and get key thought leaders with engineers within the organization to become your champions for new technology adoption. At training sessions, the development teams were always the ones that were saying, 'Kubernetes saved our time tremendously; it's an enabler. It really is incredible.'" +{{< /case-studies/quote >}} - Since its introduction a decade ago, the Shaky Leaf icon has become one of Ancestry's signature features, which signals to users that there's a helpful hint you can use to find out more about your family tree.

- So when the company decided to begin moving its infrastructure to cloud native technology, the first service that was launched on Kubernetes, the open source platform for managing application containers across clusters of hosts, was this hint system. Think of it as Amazon's recommended products, but instead of recommending products the company recommends records, stories, or familial connections. "It was a very important part of the site," says Ancestry software engineer and architect Paul MacKay, "but also small enough for a pilot project that we knew we could handle in a very appropriate, secure way."

- And when it went live smoothly in early 2016, "our deployment time for this service literally was cut down from 50 minutes to 2 or 5 minutes," MacKay adds. "The development team was just thrilled because we're focused on supplying a great experience for our customers. And that means features, it means stability, it means all those things that we need for a first-in-class type operation."

- The stability of that Shaky Leaf was a signal for MacKay and his team that their decision to embrace cloud native technologies was the right one for the company. With a private data center, Ancestry built its website (which launched in 1996) on hundreds of services and technologies and a traditional deployment methodology. "It worked well for us in the past, but the sum of the legacy systems became quite cumbersome in its processing and was time-consuming," says MacKay. "We were looking for other ways to accelerate, to be more agile in delivering our solutions and our products." -
-
+{{< case-studies/lead >}} +It started with a Shaky Leaf. +{{< /case-studies/lead >}} -
-
+

Since its introduction a decade ago, the Shaky Leaf icon has become one of Ancestry's signature features, which signals to users that there's a helpful hint you can use to find out more about your family tree.

+ +

So when the company decided to begin moving its infrastructure to cloud native technology, the first service that was launched on Kubernetes, the open source platform for managing application containers across clusters of hosts, was this hint system. Think of it as Amazon's recommended products, but instead of recommending products the company recommends records, stories, or familial connections. "It was a very important part of the site," says Ancestry software engineer and architect Paul MacKay, "but also small enough for a pilot project that we knew we could handle in a very appropriate, secure way."

+ +

And when it went live smoothly in early 2016, "our deployment time for this service literally was cut down from 50 minutes to 2 or 5 minutes," MacKay adds. "The development team was just thrilled because we're focused on supplying a great experience for our customers. And that means features, it means stability, it means all those things that we need for a first-in-class type operation."

+ +

The stability of that Shaky Leaf was a signal for MacKay and his team that their decision to embrace cloud native technologies was the right one for the company. With a private data center, Ancestry built its website (which launched in 1996) on hundreds of services and technologies and a traditional deployment methodology. "It worked well for us in the past, but the sum of the legacy systems became quite cumbersome in its processing and was time-consuming," says MacKay. "We were looking for other ways to accelerate, to be more agile in delivering our solutions and our products."

+ +{{< case-studies/quote image="/images/case-studies/ancestry/banner3.jpg" >}} "And when it [Kubernetes] went live smoothly in early 2016, 'our deployment time for this service literally was cut down from 50 minutes to 2 or 5 minutes,' MacKay adds. 'The development team was just thrilled because we're focused on supplying a great experience for our customers. And that means features, it means stability, it means all those things that we need for a first-in-class type operation.'" -
-
+{{< /case-studies/quote >}} -
-
- That need led them in 2015 to explore containerization. Ancestry engineers had already been using technology like Java and Python on Linux, so part of the decision was about making the infrastructure more Linux-friendly. They quickly decided that they wanted to go with Docker for containerization, "but it always comes down to the orchestration part of it to make it really work," says MacKay.

- His team looked at orchestration platforms offered by Docker Compose, Mesos and OpenStack, and even started to prototype some homegrown solutions. And then they started hearing rumblings of the imminent release of Kubernetes v1.0. "At the forefront, we were looking at the secret store, so we didn't have to manage that all ourselves, the config maps, the methodology of seamless deployment strategy," he says. "We found that how Kubernetes had done their resources, their types, their labels and just their interface was so much further advanced than the other things we had seen. It was a feature fit."

-
- Plus, MacKay says, "I just believed in the confidence that comes with the history that Google has with containerization. So we started out right on the leading edge of it. And we haven't looked back since."

- Which is not to say that adopting a new technology hasn't come with some challenges. "Change is hard," says MacKay. "Not because the technology is hard or that the technology is not good. It's just that people like to do things like they had done [before]. You have the early adopters and you have those who are coming in later. It was a learning experience on both sides."

- Figuring out the best deployment operations for Ancestry was a big part of the work it took to adopt cloud native infrastructure. "We want to make sure the process is easy and also controlled in the manner that allows us the highest degree of security that we demand and our customers demand," says MacKay. "With Kubernetes and other products, there are some good solutions, but a little bit of glue is needed to bring it into corporate processes and governances. It's like having a set of gloves that are generic, but when you really do want to grab something you have to make it so it's customized to you. That's what we had to do."

- Their best practices include allowing their developers to deploy into development stage and production, but then controlling the aspects that need governance and auditing, such as secrets. They found that having one namespace per service is useful for achieving that containment of secrets and config maps. And for their needs, having one container per pod makes it easier to manage and to have a smaller unit of deployment. -

-
-
+

That need led them in 2015 to explore containerization. Ancestry engineers had already been using technology like Java and Python on Linux, so part of the decision was about making the infrastructure more Linux-friendly. They quickly decided that they wanted to go with Docker for containerization, "but it always comes down to the orchestration part of it to make it really work," says MacKay.

-
-
+

His team looked at orchestration platforms offered by Docker Compose, Mesos and OpenStack, and even started to prototype some homegrown solutions. And then they started hearing rumblings of the imminent release of Kubernetes v1.0. "At the forefront, we were looking at the secret store, so we didn't have to manage that all ourselves, the config maps, the methodology of seamless deployment strategy," he says. "We found that how Kubernetes had done their resources, their types, their labels and just their interface was so much further advanced than the other things we had seen. It was a feature fit."

+{{< case-studies/lead >}} +Plus, MacKay says, "I just believed in the confidence that comes with the history that Google has with containerization. So we started out right on the leading edge of it. And we haven't looked back since." +{{< /case-studies/lead >}} + +

Which is not to say that adopting a new technology hasn't come with some challenges. "Change is hard," says MacKay. "Not because the technology is hard or that the technology is not good. It's just that people like to do things like they had done [before]. You have the early adopters and you have those who are coming in later. It was a learning experience on both sides."

+ +

Figuring out the best deployment operations for Ancestry was a big part of the work it took to adopt cloud native infrastructure. "We want to make sure the process is easy and also controlled in the manner that allows us the highest degree of security that we demand and our customers demand," says MacKay. "With Kubernetes and other products, there are some good solutions, but a little bit of glue is needed to bring it into corporate processes and governances. It's like having a set of gloves that are generic, but when you really do want to grab something you have to make it so it's customized to you. That's what we had to do."

+ +

Their best practices include allowing their developers to deploy into development stage and production, but then controlling the aspects that need governance and auditing, such as secrets. They found that having one namespace per service is useful for achieving that containment of secrets and config maps. And for their needs, having one container per pod makes it easier to manage and to have a smaller unit of deployment. +

+ +{{< case-studies/quote image="/images/case-studies/ancestry/banner4.jpg" >}} "The success of Ancestry's first deployment of the hint system on Kubernetes helped create momentum for greater adoption of the technology." +{{< /case-studies/quote >}} -
-
+

With that process established, the time spent on deployment was cut down to under a minute for some services. "As programmers, we have what's called REPL: read, evaluate, print, and loop, but with Kubernetes, we have CDEL: compile, deploy, execute, and loop," says MacKay. "It's a very quick loop back and a great benefit to understand that when our services are deployed in production, they're the same as what we tested in the pre-production environments. The approach of cloud native for Ancestry provides us a better ability to scale and to accommodate the business needs as work loads occur."

-
-
- With that process established, the time spent on deployment was cut down to under a minute for some services. "As programmers, we have what's called REPL: read, evaluate, print, and loop, but with Kubernetes, we have CDEL: compile, deploy, execute, and loop," says MacKay. "It's a very quick loop back and a great benefit to understand that when our services are deployed in production, they're the same as what we tested in the pre-production environments. The approach of cloud native for Ancestry provides us a better ability to scale and to accommodate the business needs as work loads occur."

- The success of Ancestry's first deployment of the hint system on Kubernetes helped create momentum for greater adoption of the technology. "Engineers like to code, they like to do features, they don't like to sit around waiting for things to be deployed and worrying about scaling up and out and down," says MacKay. "After a while the engineers became our champions. At training sessions, the development teams were always the ones saying, 'Kubernetes saved our time tremendously; it's an enabler; it really is incredible.' Over time, we were able to convince our management that this was a transition that the industry is making and that we needed to be a part of it."

- A year later, Ancestry has transitioned a good number of applications to Kubernetes. "We have many different services that make up the rich environment that [the website] has from both the DNA side and the family history side," says MacKay. "We have front-end stacks, back-end stacks and back-end processing type stacks that are in the cluster."

- The company continues to weigh which services it will move forward to Kubernetes, which ones will be kept as is, and which will be replaced in the future and thus don't have to be moved over. MacKay estimates that the company is "approaching halfway on those features that are going forward. We don't have to do a lot of convincing anymore. It's more of an issue of timing with getting product management and engineering staff the knowledge and information that they need." -
-
+

The success of Ancestry's first deployment of the hint system on Kubernetes helped create momentum for greater adoption of the technology. "Engineers like to code, they like to do features, they don't like to sit around waiting for things to be deployed and worrying about scaling up and out and down," says MacKay. "After a while the engineers became our champions. At training sessions, the development teams were always the ones saying, 'Kubernetes saved our time tremendously; it's an enabler; it really is incredible.' Over time, we were able to convince our management that this was a transition that the industry is making and that we needed to be a part of it."

-
-
- "... 'I believe in Kubernetes. I believe in containerization. I think - if we can get there and establish ourselves in that world, we will be further along and far better off being agile and all the things we talk about, - and it'll go forward.'" -
-
+

A year later, Ancestry has transitioned a good number of applications to Kubernetes. "We have many different services that make up the rich environment that [the website] has from both the DNA side and the family history side," says MacKay. "We have front-end stacks, back-end stacks and back-end processing type stacks that are in the cluster."

-
-
+

The company continues to weigh which services it will move forward to Kubernetes, which ones will be kept as is, and which will be replaced in the future and thus don't have to be moved over. MacKay estimates that the company is "approaching halfway on those features that are going forward. We don't have to do a lot of convincing anymore. It's more of an issue of timing with getting product management and engineering staff the knowledge and information that they need."

+{{< case-studies/quote >}} +"... 'I believe in Kubernetes. I believe in containerization. I think if we can get there and establish ourselves in that world, we will be further along and far better off being agile and all the things we talk about, and it'll go forward.'" +{{< /case-studies/quote >}} -Looking ahead, MacKay sees Ancestry maximizing the benefits of Kubernetes in 2017. "We're very close to having everything that should be or could be in a Linux-friendly world in Kubernetes by the end of the year," he says, adding that he's looking forward to features such as federation and horizontal pod autoscaling that are currently in the works. "Kubernetes has been very wonderful for us and we continue to ride the wave."

-That wave, he points out, has everything to do with the vibrant Kubernetes community, which has grown by leaps and bounds since Ancestry joined it as an early adopter. "This is just a very rough way of judging it, but on Slack in June 2015, there were maybe 500 on there," MacKay says. "The last time I looked there were maybe 8,500 just on the Slack channel. There are so many major companies and different kinds of companies involved now. It's the variety of contributors, the number of contributors, the incredibly competent and friendly community."

-As much as he and his team at Ancestry have benefited from what he calls "the goodness and the technical abilities of many" in the community, they've also contributed information about best practices, logged bug issues and participated in the open source conversation. And they've been active in attending meetups to help educate and give back to the local tech community in Utah. Says MacKay: "We're trying to give back as far as our experience goes, rather than just code." -

When he meets with companies considering adopting cloud native infrastructure, the best advice he has to give from Ancestry's Kubernetes journey is this: "Start small, but with hard problems," he says. And "you need a patron who understands the vision of containerization, to help you tackle the political as well as other technical roadblocks that can occur when change is needed."

-With the changes that MacKay's team has led over the past year and a half, cloud native will be part of Ancestry's technological genealogy for years to come. MacKay has been such a champion of the technology that he says people have jokingly accused him of having a Kubernetes tattoo.

-"I really don't," he says with a laugh. "But I'm passionate. I'm not exclusive to any technology; I use whatever I need that's out there that makes us great. If it's something else, I'll use it. But right now I believe in Kubernetes. I believe in containerization. I think if we can get there and establish ourselves in that world, we will be further along and far better off being agile and all the things we talk about, and it'll go forward."

-He pauses. "So, yeah, I guess you can say I'm an evangelist for Kubernetes," he says. "But I'm not getting a tattoo!" +

Looking ahead, MacKay sees Ancestry maximizing the benefits of Kubernetes in 2017. "We're very close to having everything that should be or could be in a Linux-friendly world in Kubernetes by the end of the year," he says, adding that he's looking forward to features such as federation and horizontal pod autoscaling that are currently in the works. "Kubernetes has been very wonderful for us and we continue to ride the wave."

+

That wave, he points out, has everything to do with the vibrant Kubernetes community, which has grown by leaps and bounds since Ancestry joined it as an early adopter. "This is just a very rough way of judging it, but on Slack in June 2015, there were maybe 500 on there," MacKay says. "The last time I looked there were maybe 8,500 just on the Slack channel. There are so many major companies and different kinds of companies involved now. It's the variety of contributors, the number of contributors, the incredibly competent and friendly community."

-
-
+

As much as he and his team at Ancestry have benefited from what he calls "the goodness and the technical abilities of many" in the community, they've also contributed information about best practices, logged bug issues and participated in the open source conversation. And they've been active in attending meetups to help educate and give back to the local tech community in Utah. Says MacKay: "We're trying to give back as far as our experience goes, rather than just code."

+ +

When he meets with companies considering adopting cloud native infrastructure, the best advice he has to give from Ancestry's Kubernetes journey is this: "Start small, but with hard problems," he says. And "you need a patron who understands the vision of containerization, to help you tackle the political as well as other technical roadblocks that can occur when change is needed."

+ +

With the changes that MacKay's team has led over the past year and a half, cloud native will be part of Ancestry's technological genealogy for years to come. MacKay has been such a champion of the technology that he says people have jokingly accused him of having a Kubernetes tattoo.

+ +

"I really don't," he says with a laugh. "But I'm passionate. I'm not exclusive to any technology; I use whatever I need that's out there that makes us great. If it's something else, I'll use it. But right now I believe in Kubernetes. I believe in containerization. I think if we can get there and establish ourselves in that world, we will be further along and far better off being agile and all the things we talk about, and it'll go forward."

+ +

He pauses. "So, yeah, I guess you can say I'm an evangelist for Kubernetes," he says. "But I'm not getting a tattoo!"

diff --git a/content/en/case-studies/ant-financial/index.html b/content/en/case-studies/ant-financial/index.html index 1711ef97b8..542352902d 100644 --- a/content/en/case-studies/ant-financial/index.html +++ b/content/en/case-studies/ant-financial/index.html @@ -3,94 +3,80 @@ title: Ant Financial Case Study linkTitle: ant-financial case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css featured: false + +new_case_study_styles: true +heading_background: /images/case-studies/antfinancial/banner1.jpg +heading_title_logo: /images/antfinancial_logo.png +subheading: > + Ant Financial's Hypergrowth Strategy Using Kubernetes +case_study_details: + - Company: Ant Financial + - Location: Hangzhou, China + - Industry: Financial Services --- -
-

CASE STUDY:
Ant Financial’s Hypergrowth Strategy Using Kubernetes +

Challenge

-

+

Officially founded in October 2014, Ant Financial originated from Alipay, the world's largest online payment platform that launched in 2004. The company also offers numerous other services leveraging technology innovation. With the volume of transactions Alipay handles for its 900+ million users worldwide (through its local and global partners)—256,000 transactions per second at the peak of Double 11 Singles Day 2017, and total gross merchandise value of $31 billion for Singles Day 2018—not to mention that of its other services, Ant Financial faces "data processing challenge in a whole new way," says Haojie Hang, who is responsible for Product Management for the Storage and Compute Group. "We see three major problems of operating at that scale: how to provide real-time compute, storage, and processing capability, for instance to make real-time recommendations for fraud detection; how to provide intelligence on top of this data, because there's too much data and then we're not getting enough insight; and how to apply security in the application level, in the middleware level, the system level, even the chip level." In order to provide reliable and consistent services to its customers, Ant Financial embraced containers in early 2014, and soon needed an orchestration solution for the tens-of-thousands-of-node clusters in its data centers.

-
+

Solution

-
- Company  Ant Financial     Location  Hangzhou, China     Industry  Financial Services -
+

After investigating several technologies, the team chose Kubernetes for orchestration, as well as a number of other CNCF projects, including Prometheus, OpenTracing, etcd and CoreDNS. "In late 2016, we decided that Kubernetes will be the de facto standard," says Hang. "Looking back, we made the right bet on the right technology. But then we needed to move the production workload from the legacy infrastructure to the latest Kubernetes-enabled platform, and that took some time, because we are very careful in terms of reliability and consistency." All core financial systems were containerized by November 2017, and the migration to Kubernetes is ongoing.

-
-
-
-
-

Challenge

- Officially founded in October 2014, Ant Financial originated from Alipay, the world’s largest online payment platform that launched in 2004. The company also offers numerous other services leveraging technology innovation. With the volume of transactions Alipay handles for its 900+ million users worldwide (through its local and global partners)—256,000 transactions per second at the peak of Double 11 Singles Day 2017, and total gross merchandise value of $31 billion for Singles Day 2018—not to mention that of its other services, Ant Financial faces “data processing challenge in a whole new way,” says Haojie Hang, who is responsible for Product Management for the Storage and Compute Group. “We see three major problems of operating at that scale: how to provide real-time compute, storage, and processing capability, for instance to make real-time recommendations for fraud detection; how to provide intelligence on top of this data, because there’s too much data and then we’re not getting enough insight; and how to apply security in the application level, in the middleware level, the system level, even the chip level.” In order to provide reliable and consistent services to its customers, Ant Financial embraced containers in early 2014, and soon needed an orchestration solution for the tens-of-thousands-of-node clusters in its data centers. - -

Solution

- After investigating several technologies, the team chose Kubernetes for orchestration, as well as a number of other CNCF projects, including Prometheus, OpenTracing, etcd and CoreDNS. “In late 2016, we decided that Kubernetes will be the de facto standard,” says Hang. “Looking back, we made the right bet on the right technology. But then we needed to move the production workload from the legacy infrastructure to the latest Kubernetes-enabled platform, and that took some time, because we are very careful in terms of reliability and consistency.” All core financial systems were containerized by November 2017, and the migration to Kubernetes is ongoing. -

Impact

- “We’ve seen at least tenfold in improvement in terms of the operations with cloud native technology, which means you can have tenfold increase in terms of output,” says Hang. Ant also provides its fully integrated financial cloud platform to business partners around the world, and hopes to power the next generation of digital banking with deep experience in service innovation and technology expertise. Hang says the team hasn’t begun to focus on optimizing the Kubernetes platform, either: “Because we’re still in the hyper growth stage, we’re not in a mode where we do cost saving yet.” -
-
-
-
-
- "In late 2016, we decided that Kubernetes will be the de facto standard. Looking back, we made the right bet on the right technology." -

- HAOJIE HANG, PRODUCT MANAGEMENT, ANT FINANCIAL
-
-
-
-
-

A spinoff of the multinational conglomerate Alibaba, Ant Financial boasts a $150+ billion valuation and the scale to match. The fintech startup, launched in 2014, is comprised of Alipay, the world’s largest online payment platform, and numerous other services leveraging technology innovation.

- And the volume of transactions that Alipay handles for over 900 million users worldwide (through its local and global partners) is staggering: 256,000 per second at the peak of Double 11 Singles Day 2017, and total gross merchandise value of $31 billion for Singles Day 2018. With the mission of “bringing the world equal opportunities,” Ant Financial is dedicated to creating an open, shared credit system and financial services platform through technology innovations. -

- Combine that with the operations of its other properties—such as the Huabei online credit system, Jiebei lending service, and the 350-million-user Ant Forest green energy mobile app—and Ant Financial faces “data processing challenge in a whole new way,” says Haojie Hang, who is responsible for Product Management for the Storage and Compute Group. “We see three major problems of operating at that scale: how to provide real-time compute, storage, and processing capability, for instance to make real-time recommendations for fraud detection; how to provide intelligence on top of this data, because there’s too much data and we’re not getting enough insight; and how to apply security in the application level, in the middleware level, the system level, even the chip level.” -

- To address those challenges and provide reliable and consistent services to its customers, Ant Financial embraced Docker containerization in 2014. But they soon realized that they needed an orchestration solution for some tens-of-thousands-of-node clusters in the company’s data centers. -
-
-
-
- "On Double 11 this year, we had plenty of nodes on Kubernetes, but compared to the whole scale of our infrastructure, this is still in progress."

- RANGER YU, GLOBAL TECHNOLOGY PARTNERSHIP & DEVELOPMENT, ANT FINANCIAL
+

"We've seen at least tenfold in improvement in terms of the operations with cloud native technology, which means you can have tenfold increase in terms of output," says Hang. Ant also provides its fully integrated financial cloud platform to business partners around the world, and hopes to power the next generation of digital banking with deep experience in service innovation and technology expertise. Hang says the team hasn't begun to focus on optimizing the Kubernetes platform, either: "Because we're still in the hyper growth stage, we're not in a mode where we do cost saving yet."

-
-
-
-
- The team investigated several technologies, including Docker Swarm and Mesos. “We did a lot of POCs, but we’re very careful in terms of production systems, because we want to make sure we don’t lose any data,” says Hang. “You cannot afford to have a service downtime for one minute; even one second has a very, very big impact. We operate every day under pressure to provide reliable and consistent services to consumers and businesses in China and globally.” -

- Ultimately, Hang says Ant chose Kubernetes because it checked all the boxes: a strong community, technology that “will be relevant in the next three to five years,” and a good match for the company’s engineering talent. “In late 2016, we decided that Kubernetes will be the de facto standard,” says Hang. “Looking back, we made the right bet on the right technology. But then we needed to move the production workload from the legacy infrastructure to the latest Kubernetes-enabled platform. We spent a lot of time learning and then training our people to build applications on Kubernetes well.” -

- All core financial systems were containerized by November 2017, and the migration to Kubernetes is ongoing. Ant’s platform also leverages a number of other CNCF projects, including Prometheus, OpenTracing, etcd and CoreDNS. “On Double 11 this year, we had plenty of nodes on Kubernetes, but compared to the whole scale of our infrastructure, this is still in progress,” says Ranger Yu, Global Technology Partnership & Development. -
-
-
-
- "We’re very grateful for CNCF and this amazing technology, which we need as we continue to scale globally. We’re definitely embracing the community and open source more in the future."

- HAOJIE HANG, PRODUCT MANAGEMENT, ANT FINANCIAL
-
-
+{{< case-studies/quote author="HAOJIE HANG, PRODUCT MANAGEMENT, ANT FINANCIAL" >}} +"In late 2016, we decided that Kubernetes will be the de facto standard. Looking back, we made the right bet on the right technology." +{{< /case-studies/quote >}} -
-
- Still, there has already been an impact. “Cloud native technology has benefited us greatly in terms of efficiency,” says Hang. “In general, we want to make sure our infrastructure is nimble and flexible enough for the work that could happen tomorrow. That’s the goal. And with cloud native technology, we’ve seen at least tenfold improvement in operations, which means you can have tenfold increase in terms of output. Let’s say you are operating 10 nodes with one person. With cloud native, tomorrow you can have 100 nodes.” -

- Ant also provides its financial cloud platform to partners around the world, and hopes to power the next generation of digital banking with deep experience in service innovation and technology expertise. Hang says the team hasn’t begun to focus on optimizing the Kubernetes platform, either: “Because we’re still in the hyper growth stage, we’re not in a mode where we do cost-saving yet.” -

- The CNCF community has also been a valuable asset during Ant Financial’s move to cloud native. “If you are applying a new technology, it’s very good to have a community to discuss technical problems with other users,” says Hang. “We’re very grateful for CNCF and this amazing technology, which we need as we continue to scale globally. We’re definitely embracing the community and open sourcing more in the future.” -
+{{< case-studies/lead >}} +A spinoff of the multinational conglomerate Alibaba, Ant Financial boasts a $150+ billion valuation and the scale to match. The fintech startup, launched in 2014, is comprised of Alipay, the world's largest online payment platform, and numerous other services leveraging technology innovation. +{{< /case-studies/lead >}} -
-
-"In China, we are the North Star in terms of innovation in financial and other related services,” says Hang. “We definitely want to make sure we’re still leading in the next 5 to 10 years with our investment in technology."

- RANGER YU, GLOBAL TECHNOLOGY PARTNERSHIP & DEVELOPMENT, ANT FINANCIAL
-
+

And the volume of transactions that Alipay handles for over 900 million users worldwide (through its local and global partners) is staggering: 256,000 per second at the peak of Double 11 Singles Day 2017, and total gross merchandise value of $31 billion for Singles Day 2018. With the mission of "bringing the world equal opportunities," Ant Financial is dedicated to creating an open, shared credit system and financial services platform through technology innovations.

-
- In fact, the company has already started to open source some of its cloud native middleware. “We are going to be very proactive about that,” says Yu. “CNCF provided a platform so everyone can plug in or contribute components. This is very good open source governance.” -

- Looking ahead, the Ant team will continue to evaluate many other CNCF projects. Building a service mesh community in China, the team has brought together many China-based companies and developers to discuss the potential of that technology. “Service mesh is very attractive for Chinese developers and end users because we have a lot of legacy systems running now, and it’s an ideal mid-layer to glue everything together, both new and legacy,” says Hang. “For new technologies, we look very closely at whether they will last.” -

- At Ant, Kubernetes passed that test with flying colors, and the team hopes other companies will follow suit. “In China, we are the North Star in terms of innovation in financial and other related services,” says Hang. “We definitely want to make sure we’re still leading in the next 5 to 10 years with our investment in technology.” +

Combine that with the operations of its other properties—such as the Huabei online credit system, Jiebei lending service, and the 350-million-user Ant Forest green energy mobile app—and Ant Financial faces "data processing challenge in a whole new way," says Haojie Hang, who is responsible for Product Management for the Storage and Compute Group. "We see three major problems of operating at that scale: how to provide real-time compute, storage, and processing capability, for instance to make real-time recommendations for fraud detection; how to provide intelligence on top of this data, because there's too much data and we're not getting enough insight; and how to apply security in the application level, in the middleware level, the system level, even the chip level."

-
-
+

To address those challenges and provide reliable and consistent services to its customers, Ant Financial embraced Docker containerization in 2014. But they soon realized that they needed an orchestration solution for some tens-of-thousands-of-node clusters in the company's data centers.

+ +{{< case-studies/quote + image="/images/case-studies/antfinancial/banner3.jpg" + author="RANGER YU, GLOBAL TECHNOLOGY PARTNERSHIP & DEVELOPMENT, ANT FINANCIAL" +>}} +"On Double 11 this year, we had plenty of nodes on Kubernetes, but compared to the whole scale of our infrastructure, this is still in progress." +{{< /case-studies/quote >}} + +

The team investigated several technologies, including Docker Swarm and Mesos. "We did a lot of POCs, but we're very careful in terms of production systems, because we want to make sure we don't lose any data," says Hang. "You cannot afford to have a service downtime for one minute; even one second has a very, very big impact. We operate every day under pressure to provide reliable and consistent services to consumers and businesses in China and globally."

+ +

Ultimately, Hang says Ant chose Kubernetes because it checked all the boxes: a strong community, technology that "will be relevant in the next three to five years," and a good match for the company's engineering talent. "In late 2016, we decided that Kubernetes will be the de facto standard," says Hang. "Looking back, we made the right bet on the right technology. But then we needed to move the production workload from the legacy infrastructure to the latest Kubernetes-enabled platform. We spent a lot of time learning and then training our people to build applications on Kubernetes well."

+ +

All core financial systems were containerized by November 2017, and the migration to Kubernetes is ongoing. Ant's platform also leverages a number of other CNCF projects, including Prometheus, OpenTracing, etcd and CoreDNS. "On Double 11 this year, we had plenty of nodes on Kubernetes, but compared to the whole scale of our infrastructure, this is still in progress," says Ranger Yu, Global Technology Partnership & Development.

+ +{{< case-studies/quote + image="/images/case-studies/antfinancial/banner4.jpg" + author="HAOJIE HANG, PRODUCT MANAGEMENT, ANT FINANCIAL" +>}} +"We're very grateful for CNCF and this amazing technology, which we need as we continue to scale globally. We're definitely embracing the community and open source more in the future." +{{< /case-studies/quote >}} + +

Still, there has already been an impact. "Cloud native technology has benefited us greatly in terms of efficiency," says Hang. "In general, we want to make sure our infrastructure is nimble and flexible enough for the work that could happen tomorrow. That's the goal. And with cloud native technology, we've seen at least tenfold improvement in operations, which means you can have tenfold increase in terms of output. Let's say you are operating 10 nodes with one person. With cloud native, tomorrow you can have 100 nodes."

+ +

Ant also provides its financial cloud platform to partners around the world, and hopes to power the next generation of digital banking with deep experience in service innovation and technology expertise. Hang says the team hasn't begun to focus on optimizing the Kubernetes platform, either: "Because we're still in the hyper growth stage, we're not in a mode where we do cost-saving yet."

+ +

The CNCF community has also been a valuable asset during Ant Financial's move to cloud native. "If you are applying a new technology, it's very good to have a community to discuss technical problems with other users," says Hang. "We're very grateful for CNCF and this amazing technology, which we need as we continue to scale globally. We're definitely embracing the community and open sourcing more in the future."

+ +{{< case-studies/quote + image="/images/case-studies/antfinancial/banner4.jpg" + author="RANGER YU, GLOBAL TECHNOLOGY PARTNERSHIP & DEVELOPMENT, ANT FINANCIAL" +>}} +"In China, we are the North Star in terms of innovation in financial and other related services," says Hang. "We definitely want to make sure we're still leading in the next 5 to 10 years with our investment in technology." +{{< /case-studies/quote >}} + +

In fact, the company has already started to open source some of its cloud native middleware. "We are going to be very proactive about that," says Yu. "CNCF provided a platform so everyone can plug in or contribute components. This is very good open source governance."

+ +

Looking ahead, the Ant team will continue to evaluate many other CNCF projects. Building a service mesh community in China, the team has brought together many China-based companies and developers to discuss the potential of that technology. "Service mesh is very attractive for Chinese developers and end users because we have a lot of legacy systems running now, and it's an ideal mid-layer to glue everything together, both new and legacy," says Hang. "For new technologies, we look very closely at whether they will last."

+ +

At Ant, Kubernetes passed that test with flying colors, and the team hopes other companies will follow suit. "In China, we are the North Star in terms of innovation in financial and other related services," says Hang. "We definitely want to make sure we're still leading in the next 5 to 10 years with our investment in technology."

diff --git a/content/en/case-studies/appdirect/index.html b/content/en/case-studies/appdirect/index.html index ca6b0b8fe9..ffb06c67b9 100644 --- a/content/en/case-studies/appdirect/index.html +++ b/content/en/case-studies/appdirect/index.html @@ -1,99 +1,85 @@ --- title: AppDirect Case Study - linkTitle: AppDirect case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css logo: appdirect_featured_logo.png featured: true weight: 4 quote: > We made the right decisions at the right time. Kubernetes and the cloud native technologies are now seen as the de facto ecosystem. + +new_case_study_styles: true +heading_background: /images/case-studies/appdirect/banner1.jpg +heading_title_logo: /images/appdirect_logo.png +subheading: > + AppDirect: How AppDirect Supported the 10x Growth of Its Engineering Staff with Kubernetess +case_study_details: + - Company: AppDirect + - Location: San Francisco, California + - Industry: Software --- -
-

CASE STUDY:
AppDirect: How AppDirect Supported the 10x Growth of Its Engineering Staff with Kubernetess -

+

Challenge

-
+

AppDirect provides an end-to-end commerce platform for cloud-based products and services. When Director of Software Development Pierre-Alexandre Lacerte began working there in 2014, the company had a monolith application deployed on a "tomcat infrastructure, and the whole release process was complex for what it should be," he says. "There were a lot of manual steps involved, with one engineer building a feature, then another team picking up the change. So you had bottlenecks in the pipeline to ship a feature to production." At the same time, the engineering team was growing, and the company realized it needed a better infrastructure to both support that growth and increase velocity.

-
- Company  AppDirect     Location  San Francisco, California -     Industry  Software -
+

Solution

+ +

"My idea was: Let's create an environment where teams can deploy their services faster, and they will say, 'Okay, I don't want to build in the monolith anymore. I want to build a service,'" says Lacerte. They considered and prototyped several different technologies before deciding to adopt Kubernetes in early 2016. Lacerte's team has also integrated Prometheus monitoring into the platform; tracing is next. Today, AppDirect has more than 50 microservices in production and 15 Kubernetes clusters deployed on AWS and on premise around the world.

-
-
-
-
-

Challenge

- AppDirect provides an end-to-end commerce platform for cloud-based products and services. When Director of Software Development Pierre-Alexandre Lacerte began working there in 2014, the company had a monolith application deployed on a "tomcat infrastructure, and the whole release process was complex for what it should be," he says. "There were a lot of manual steps involved, with one engineer building a feature, then another team picking up the change. So you had bottlenecks in the pipeline to ship a feature to production." At the same time, the engineering team was growing, and the company realized it needed a better infrastructure to both support that growth and increase velocity. -

-

Solution

- "My idea was: Let’s create an environment where teams can deploy their services faster, and they will say, ‘Okay, I don’t want to build in the monolith anymore. I want to build a service,’" says Lacerte. They considered and prototyped several different technologies before deciding to adopt Kubernetes in early 2016. Lacerte’s team has also integrated Prometheus monitoring into the platform; tracing is next. Today, AppDirect has more than 50 microservices in production and 15 Kubernetes clusters deployed on AWS and on premise around the world. -

Impact

- The Kubernetes platform has helped support the engineering team’s 10x growth over the past few years. Coupled with the fact that they were continually adding new features, Lacerte says, "I think our velocity would have slowed down a lot if we didn’t have this new infrastructure." Moving to Kubernetes and services has meant that deployments have become much faster due to less dependency on custom-made, brittle shell scripts with SCP commands. Time to deploy a new version has shrunk from 4 hours to a few minutes. Additionally, the company invested a lot of effort to make things self-service for developers. "Onboarding a new service doesn’t require Jira tickets or meeting with three different teams," says Lacerte. Today, the company sees 1,600 deployments per week, compared to 1-30 before. The company also achieved cost savings by moving its marketplace and billing monoliths to Kubernetes from legacy EC2 hosts as well as by leveraging autoscaling, as traffic is higher during business hours. -
-
-
-
-
- "It was an immense engineering culture shift, but the benefits are undeniable in terms of scale and speed." -

- Alexandre Gervais, Staff Software Developer, AppDirect
-
-
-
-
-

With its end-to-end commerce platform for cloud-based products and services, AppDirect has been helping organizations such as Comcast and GoDaddy simplify the digital supply chain since 2009.

-
- When Director of Software Development Pierre-Alexandre Lacerte started working there in 2014, the company had a monolith application deployed on a "tomcat infrastructure, and the whole release process was complex for what it should be," he says. "There were a lot of manual steps involved, with one engineer building a feature then creating a pull request, and a QA or another engineer validating the feature. Then it gets merged and someone else will take care of the deployment. So we had bottlenecks in the pipeline to ship a feature to production."

- At the same time, the engineering team of 40 was growing, and the company wanted to add an increasing number of features to its products. As a member of the platform team, Lacerte began hearing from multiple teams that wanted to deploy applications using different frameworks and languages, from Node.js to Spring Boot Java. He soon realized that in order to both support growth and increase velocity, the company needed a better infrastructure, and a system in which teams are autonomous, can do their own deploys, and be responsible for their services in production. -
-
-
-
- "We made the right decisions at the right time. Kubernetes and the cloud native technologies are now seen as the de facto ecosystem. We know where to focus our efforts in order to tackle the new wave of challenges we face as we scale out. The community is so active and vibrant, which is a great complement to our awesome internal team."

- Alexandre Gervais, Staff Software Developer, AppDirect -
+

The Kubernetes platform has helped support the engineering team's 10x growth over the past few years. Coupled with the fact that they were continually adding new features, Lacerte says, "I think our velocity would have slowed down a lot if we didn't have this new infrastructure." Moving to Kubernetes and services has meant that deployments have become much faster due to less dependency on custom-made, brittle shell scripts with SCP commands. Time to deploy a new version has shrunk from 4 hours to a few minutes. Additionally, the company invested a lot of effort to make things self-service for developers. "Onboarding a new service doesn't require Jira tickets or meeting with three different teams," says Lacerte. Today, the company sees 1,600 deployments per week, compared to 1-30 before. The company also achieved cost savings by moving its marketplace and billing monoliths to Kubernetes from legacy EC2 hosts as well as by leveraging autoscaling, as traffic is higher during business hours.

-
-
-
-
- From the beginning, Lacerte says, "My idea was: Let’s create an environment where teams can deploy their services faster, and they will say, ‘Okay, I don’t want to build in the monolith anymore. I want to build a service.’" (Lacerte left the company in 2019.)

- Working with the operations team, Lacerte’s group got more control and access to the company’s AWS infrastructure, and started prototyping several orchestration technologies. "Back then, Kubernetes was a little underground, unknown," he says. "But we looked at the community, the number of pull requests, the velocity on GitHub, and we saw it was getting traction. And we found that it was much easier for us to manage than the other technologies." - They spun up the first few services on Kubernetes using Chef and Terraform provisioning, and as more services were added, more automation was, too. "We have clusters around the world—in Korea, in Australia, in Germany, and in the U.S.," says Lacerte. "Automation is critical for us." They’re now largely using Kops, and are looking at managed Kubernetes offerings from several cloud providers.

- Today, though the monolith still exists, there are fewer and fewer commits and features. All teams are deploying on the new infrastructure, and services are the norm. AppDirect now has more than 50 microservices in production and 15 Kubernetes clusters deployed on AWS and on premise around the world.

- Lacerte’s strategy ultimately worked because of the very real impact the Kubernetes platform has had to deployment time. Due to less dependency on custom-made, brittle shell scripts with SCP commands, time to deploy a new version has shrunk from 4 hours to a few minutes. Additionally, the company invested a lot of effort to make things self-service for developers. "Onboarding a new service doesn’t require Jira tickets or meeting with three different teams," says Lacerte. Today, the company sees 1,600 deployments per week, compared to 1-30 before. -
-
-
-
- "I think our velocity would have slowed down a lot if we didn’t have this new infrastructure."

- Pierre-Alexandre Lacerte, Director of Software Development, AppDirect
-
-
+{{< case-studies/quote author="Alexandre Gervais, Staff Software Developer, AppDirect" >}} +"It was an immense engineering culture shift, but the benefits are undeniable in terms of scale and speed." +{{< /case-studies/quote >}} -
+{{< case-studies/lead >}} +With its end-to-end commerce platform for cloud-based products and services, AppDirect has been helping organizations such as Comcast and GoDaddy simplify the digital supply chain since 2009. +{{< /case-studies/lead >}} -
- Additionally, the Kubernetes platform has helped support the engineering team’s 10x growth over the past few years. "Ownership, a core value of AppDirect, reflects in our ability to ship services independently of our monolith code base," says Staff Software Developer Alexandre Gervais, who worked with Lacerte on the initiative. "Small teams now own critical parts of our business domain model, and they operate in their decoupled domain of expertise, with limited knowledge of the entire codebase. This reduces and isolates some of the complexity." Coupled with the fact that they were continually adding new features, Lacerte says, "I think our velocity would have slowed down a lot if we didn’t have this new infrastructure." - The company also achieved cost savings by moving its marketplace and billing monoliths to Kubernetes from legacy EC2 hosts as well as by leveraging autoscaling, as traffic is higher during business hours.

- AppDirect’s cloud native stack also includes gRPC and Fluentd, and the team is currently working on setting up OpenCensus. The platform already has Prometheus integrated, so "when teams deploy their service, they have their notifications, alerts and configurations," says Lacerte. "For example, in the test environment, I want to get a message on Slack, and in production, I want a Slack message and I also want to get paged. We have integration with pager duty. Teams have more ownership on their services." +

When Director of Software Development Pierre-Alexandre Lacerte started working there in 2014, the company had a monolith application deployed on a "tomcat infrastructure, and the whole release process was complex for what it should be," he says. "There were a lot of manual steps involved, with one engineer building a feature then creating a pull request, and a QA or another engineer validating the feature. Then it gets merged and someone else will take care of the deployment. So we had bottlenecks in the pipeline to ship a feature to production."

-
+

At the same time, the engineering team of 40 was growing, and the company wanted to add an increasing number of features to its products. As a member of the platform team, Lacerte began hearing from multiple teams that wanted to deploy applications using different frameworks and languages, from Node.js to Spring Boot Java. He soon realized that in order to both support growth and increase velocity, the company needed a better infrastructure, and a system in which teams are autonomous, can do their own deploys, and be responsible for their services in production.

-
-
-"We moved from a culture limited to ‘pushing code in a branch’ to exciting new responsibilities outside of the code base: deployment of features and configurations; monitoring of application and business metrics; and on-call support in case of outages. It was an immense engineering culture shift, but the benefits are undeniable in terms of scale and speed."

- Pierre-Alexandre Lacerte, Director of Software Development, AppDirect
-
+{{< case-studies/quote + image="/images/case-studies/appdirect/banner3.jpg" + author="Alexandre Gervais, Staff Software Developer, AppDirect" +>}} +"We made the right decisions at the right time. Kubernetes and the cloud native technologies are now seen as the de facto ecosystem. We know where to focus our efforts in order to tackle the new wave of challenges we face as we scale out. The community is so active and vibrant, which is a great complement to our awesome internal team." +{{< /case-studies/quote >}} -
- That of course also means more responsibility. "We asked engineers to expand their horizons," says Gervais. "We moved from a culture limited to ‘pushing code in a branch’ to exciting new responsibilities outside of the code base: deployment of features and configurations; monitoring of application and business metrics; and on-call support in case of outages. It was an immense engineering culture shift, but the benefits are undeniable in terms of scale and speed."

- As the engineering ranks continue to grow, the platform team has a new challenge, of making sure that the Kubernetes platform is accessible and easily utilized by everyone. "How can we make sure that when we add more people to our team that they are efficient, productive, and know how to ramp up on the platform?" Lacerte says. So we have the evangelists, the documentation, some project examples. We do demos, we have AMA sessions. We’re trying different strategies to get everyone’s attention."

- Three and a half years into their Kubernetes journey, Gervais feels AppDirect "made the right decisions at the right time," he says. "Kubernetes and the cloud native technologies are now seen as the de facto ecosystem. We know where to focus our efforts in order to tackle the new wave of challenges we face as we scale out. The community is so active and vibrant, which is a great complement to our awesome internal team. Going forward, our focus will really be geared towards benefiting from the ecosystem by providing added business value in our day-to-day operations." +

From the beginning, Lacerte says, "My idea was: Let's create an environment where teams can deploy their services faster, and they will say, 'Okay, I don't want to build in the monolith anymore. I want to build a service.'" (Lacerte left the company in 2019.)

+

Working with the operations team, Lacerte's group got more control and access to the company's AWS infrastructure, and started prototyping several orchestration technologies. "Back then, Kubernetes was a little underground, unknown," he says. "But we looked at the community, the number of pull requests, the velocity on GitHub, and we saw it was getting traction. And we found that it was much easier for us to manage than the other technologies."

-
-
+

They spun up the first few services on Kubernetes using Chef and Terraform provisioning, and as more services were added, more automation was, too. "We have clusters around the world—in Korea, in Australia, in Germany, and in the U.S.," says Lacerte. "Automation is critical for us." They're now largely using Kops, and are looking at managed Kubernetes offerings from several cloud providers.

+ +

Today, though the monolith still exists, there are fewer and fewer commits and features. All teams are deploying on the new infrastructure, and services are the norm. AppDirect now has more than 50 microservices in production and 15 Kubernetes clusters deployed on AWS and on premise around the world.

+ +

Lacerte's strategy ultimately worked because of the very real impact the Kubernetes platform has had to deployment time. Due to less dependency on custom-made, brittle shell scripts with SCP commands, time to deploy a new version has shrunk from 4 hours to a few minutes. Additionally, the company invested a lot of effort to make things self-service for developers. "Onboarding a new service doesn't require Jira tickets or meeting with three different teams," says Lacerte. Today, the company sees 1,600 deployments per week, compared to 1-30 before.

+ +{{< case-studies/quote + image="/images/case-studies/appdirect/banner4.jpg" + author="Pierre-Alexandre Lacerte, Director of Software Development, AppDirect" +>}} +"I think our velocity would have slowed down a lot if we didn't have this new infrastructure." +{{< /case-studies/quote >}} + +

Additionally, the Kubernetes platform has helped support the engineering team's 10x growth over the past few years. "Ownership, a core value of AppDirect, reflects in our ability to ship services independently of our monolith code base," says Staff Software Developer Alexandre Gervais, who worked with Lacerte on the initiative. "Small teams now own critical parts of our business domain model, and they operate in their decoupled domain of expertise, with limited knowledge of the entire codebase. This reduces and isolates some of the complexity." Coupled with the fact that they were continually adding new features, Lacerte says, "I think our velocity would have slowed down a lot if we didn't have this new infrastructure."

+ +

The company also achieved cost savings by moving its marketplace and billing monoliths to Kubernetes from legacy EC2 hosts as well as by leveraging autoscaling, as traffic is higher during business hours.

+ +

AppDirect's cloud native stack also includes gRPC and Fluentd, and the team is currently working on setting up OpenCensus. The platform already has Prometheus integrated, so "when teams deploy their service, they have their notifications, alerts and configurations," says Lacerte. "For example, in the test environment, I want to get a message on Slack, and in production, I want a Slack message and I also want to get paged. We have integration with pager duty. Teams have more ownership on their services."

+ +{{< case-studies/quote author="Pierre-Alexandre Lacerte, Director of Software Development, AppDirect" >}} +"We moved from a culture limited to 'pushing code in a branch' to exciting new responsibilities outside of the code base: deployment of features and configurations; monitoring of application and business metrics; and on-call support in case of outages. It was an immense engineering culture shift, but the benefits are undeniable in terms of scale and speed." +{{< /case-studies/quote >}} + +

That of course also means more responsibility. "We asked engineers to expand their horizons," says Gervais. "We moved from a culture limited to 'pushing code in a branch' to exciting new responsibilities outside of the code base: deployment of features and configurations; monitoring of application and business metrics; and on-call support in case of outages. It was an immense engineering culture shift, but the benefits are undeniable in terms of scale and speed."

+ +

As the engineering ranks continue to grow, the platform team has a new challenge, of making sure that the Kubernetes platform is accessible and easily utilized by everyone. "How can we make sure that when we add more people to our team that they are efficient, productive, and know how to ramp up on the platform?" Lacerte says. So we have the evangelists, the documentation, some project examples. We do demos, we have AMA sessions. We're trying different strategies to get everyone's attention."

+ +

Three and a half years into their Kubernetes journey, Gervais feels AppDirect "made the right decisions at the right time," he says. "Kubernetes and the cloud native technologies are now seen as the de facto ecosystem. We know where to focus our efforts in order to tackle the new wave of challenges we face as we scale out. The community is so active and vibrant, which is a great complement to our awesome internal team. Going forward, our focus will really be geared towards benefiting from the ecosystem by providing added business value in our day-to-day operations."

diff --git a/content/en/case-studies/babylon/index.html b/content/en/case-studies/babylon/index.html index dce0612175..6b86096b02 100644 --- a/content/en/case-studies/babylon/index.html +++ b/content/en/case-studies/babylon/index.html @@ -3,107 +3,82 @@ title: Babylon Case Study linkTitle: Babylon case_study_styles: true cid: caseStudies -css: /css/case-studies-gradient.css logo: babylon_featured_logo.svg featured: true weight: 1 quote: > Kubernetes is a great platform for machine learning because it comes with all the scheduling and scalability that you need. + +new_case_study_styles: true +heading_background: /images/case-studies/babylon/banner4.jpg +heading_title_text: Babylon +use_gradient_overlay: true +subheading: > + AppDirect: How Cloud Native Is Enabling Babylon's Medical AI Innovations +case_study_details: + - Company: Babylon + - Location: United Kingdom + - Industry: AI, Healthcare --- +

Challenge

-
-

CASE STUDY: Babylon

-
How Cloud Native Is Enabling Babylon’s Medical AI Innovations
-
+

A large number of Babylon's products leverage machine learning and artificial intelligence, and in 2019, there wasn't enough computing power in-house to run a particular experiment. The company was also growing (from 100 to 1,600 in three years) and planning expansion into other countries.

-
- Company  Babylon     Location  United Kingdom     Industry  AI, Healthcare -
+

Solution

-
-
-
-
-

Challenge

- A large number of Babylon’s products leverage machine learning and artificial intelligence, and in 2019, there wasn’t enough computing power in-house to run a particular experiment. The company was also growing (from 100 to 1,600 in three years) and planning expansion into other countries. - - -

Solution

- Babylon had migrated its user-facing applications to a Kubernetes platform in 2018, so the infrastructure team turned to Kubeflow, a toolkit for machine learning on Kubernetes. “We tried to create a Kubernetes core server, we deployed Kubeflow, and we orchestrated the whole experiment, which ended up being a really good success,” says AI Infrastructure Lead Jérémie Vallée. The team began building a self-service AI training platform on top of Kubernetes. +

Babylon had migrated its user-facing applications to a Kubernetes platform in 2018, so the infrastructure team turned to Kubeflow, a toolkit for machine learning on Kubernetes. "We tried to create a Kubernetes core server, we deployed Kubeflow, and we orchestrated the whole experiment, which ended up being a really good success," says AI Infrastructure Lead Jérémie Vallée. The team began building a self-service AI training platform on top of Kubernetes.

Impact

- Instead of waiting hours or days to be able to compute, teams can get access instantaneously. Clinical validations used to take 10 hours; now they are done in under 20 minutes. The portability of the cloud native platform has also enabled Babylon to expand into other countries.
-
-
-
-
- “Kubernetes is a great platform for machine learning because it comes with all the scheduling and scalability that you need.” -

- JÉRÉMIE VALLÉE, AI INFRASTRUCTURE LEAD AT BABYLON

-
-
+

Instead of waiting hours or days to be able to compute, teams can get access instantaneously. Clinical validations used to take 10 hours; now they are done in under 20 minutes. The portability of the cloud native platform has also enabled Babylon to expand into other countries.

-
-
-

Babylon’s mission is to put accessible and affordable healthcare services in the hands of every person on earth.

+{{< case-studies/quote + image="/images/case-studies/babylon/banner1.jpg" + author="JÉRÉMIE VALLÉE, AI INFRASTRUCTURE LEAD AT BABYLON" +>}} +"Kubernetes is a great platform for machine learning because it comes with all the scheduling and scalability that you need." +{{< /case-studies/quote >}} -

Since its launch in the U.K. in 2013, the startup has facilitated millions of digital consultations around the world. In the U.K., patients were typically waiting a week or two for a doctor’s appointment. Through Babylon’s NHS service, GP at Hand—which has more than 75,000 registered patients—39% get an appointment through their phone within 30 minutes, and 89% within 6 hours.

+{{< case-studies/lead >}} +Babylon's mission is to put accessible and affordable healthcare services in the hands of every person on earth. +{{< /case-studies/lead >}} -

That’s just the start. “We try to combine different types of technology with the medical expertise that we have in-house to build products that will help patients manage and understand their health, and also help doctors be more efficient at what they do,” says Jérémie Vallée, AI Infrastructure Lead at Babylon.

+

Since its launch in the U.K. in 2013, the startup has facilitated millions of digital consultations around the world. In the U.K., patients were typically waiting a week or two for a doctor's appointment. Through Babylon's NHS service, GP at Hand—which has more than 75,000 registered patients—39% get an appointment through their phone within 30 minutes, and 89% within 6 hours.

-

A large number of these products leverage machine learning and artificial intelligence, and in 2019, researchers hit a pain point. “We have some servers in-house where our researchers were doing a lot of AI experiments and some training of models, and we came to a point where we didn’t have enough compute in-house to run a particular experiment,” says Vallée.

-

- Babylon had migrated its user-facing applications to a Kubernetes platform in 2018, “and we had a lot of Kubernetes knowledge thanks to the migration,” he adds. To optimize some of the models that had been created, the team turned to Kubeflow, a toolkit for machine learning on Kubernetes. “We tried to create a Kubernetes core server, we deployed Kubeflow, and we orchestrated the whole experiment, which ended up being a really good success,” he says.

-

- Based on that experience, Vallée’s team was tasked with building a self-service platform to help Babylon’s AI teams become more efficient, and by extension help get products to market faster. The main requirements: (1) the ability to give researchers and engineers access to the compute they needed, regardless of the size of the experiments they may need to run; (2) a way to provide teams with the best tools that they needed to do their work, on demand and in a centralized way; and (3) the training platform had to be close to the data that was being managed, because of the company’s expansion into different countries.

+

That's just the start. "We try to combine different types of technology with the medical expertise that we have in-house to build products that will help patients manage and understand their health, and also help doctors be more efficient at what they do," says Jérémie Vallée, AI Infrastructure Lead at Babylon.

+

A large number of these products leverage machine learning and artificial intelligence, and in 2019, researchers hit a pain point. "We have some servers in-house where our researchers were doing a lot of AI experiments and some training of models, and we came to a point where we didn't have enough compute in-house to run a particular experiment," says Vallée.

-
-
+

Babylon had migrated its user-facing applications to a Kubernetes platform in 2018, "and we had a lot of Kubernetes knowledge thanks to the migration," he adds. To optimize some of the models that had been created, the team turned to Kubeflow, a toolkit for machine learning on Kubernetes. "We tried to create a Kubernetes core server, we deployed Kubeflow, and we orchestrated the whole experiment, which ended up being a really good success," he says.

-
-
- “Delivering a self-service platform where users are empowered to run their own workload has enabled our data scientist community to do hyper parameter tuning and general algorithm development without any cloud skill and without the help of platform engineers, thus accelerating our innovation.”

- CAROLINE HARGROVE, CHIEF TECHNOLOGY OFFICER AT BABYLON

-
-
+

Based on that experience, Vallée's team was tasked with building a self-service platform to help Babylon's AI teams become more efficient, and by extension help get products to market faster. The main requirements: (1) the ability to give researchers and engineers access to the compute they needed, regardless of the size of the experiments they may need to run; (2) a way to provide teams with the best tools that they needed to do their work, on demand and in a centralized way; and (3) the training platform had to be close to the data that was being managed, because of the company's expansion into different countries.

-
-
-

- Kubernetes was an enabler on every count. “Kubernetes is a great platform for machine learning because it comes with all the scheduling and scalability that you need,” says Vallée. The need to keep data in every country in which Babylon operates requires a multi-region, multi-cloud strategy, and some countries might not even have a public cloud provider at all. “We wanted to make this platform portable so that we can run training jobs anywhere,” he says. “Kubernetes offered a base layer that allows you to deploy the platform outside of the cloud provider, and then deploy whatever tooling you need. That was a very good selling point for us.”

-

- Once the team decided to build the Babylon AI Research platform on top of Kubernetes, they referred to the Cloud Native Landscape to build out the stack: Prometheus and Grafana for monitoring; an Istio service mesh to control the network on the training platform and control what access all of the workflows would have; Helm to deploy the stack; and Flux to manage the GitOps part of the pipeline.

-

- The cloud native AI platform has had a huge impact at Babylon. The first research projects run on the platform mostly involved machine learning and natural language processing. These experiments required a huge amount of compute—1600 CPU, 3.2 TB RAM—which was much more than Babylon had in-house. Plus, access to compute used to take hours, or sometimes even days, depending on how busy the platform team was. “Now, with Kubernetes and the self-service platform that we provide, it’s pretty much instantaneous,” says Vallée.

-

- Another important type of work that’s done on the platform is clinical validation for new applications such as Babylon’s Symptom Checker, which calculates the probability of a disease given the evidence input by the user. “Being in healthcare, we want all of our models to be safe before they’re going to hit production,” says Vallée. Using Argo for GitOps “enabled us to scale the process massively.”

+{{< case-studies/quote author="CAROLINE HARGROVE, CHIEF TECHNOLOGY OFFICER AT BABYLON" >}} +"Delivering a self-service platform where users are empowered to run their own workload has enabled our data scientist community to do hyper parameter tuning and general algorithm development without any cloud skill and without the help of platform engineers, thus accelerating our innovation." +{{< /case-studies/quote >}} -

-
-
+

Kubernetes was an enabler on every count. "Kubernetes is a great platform for machine learning because it comes with all the scheduling and scalability that you need," says Vallée. The need to keep data in every country in which Babylon operates requires a multi-region, multi-cloud strategy, and some countries might not even have a public cloud provider at all. "We wanted to make this platform portable so that we can run training jobs anywhere," he says. "Kubernetes offered a base layer that allows you to deploy the platform outside of the cloud provider, and then deploy whatever tooling you need. That was a very good selling point for us."

+

Once the team decided to build the Babylon AI Research platform on top of Kubernetes, they referred to the Cloud Native Landscape to build out the stack: Prometheus and Grafana for monitoring; an Istio service mesh to control the network on the training platform and control what access all of the workflows would have; Helm to deploy the stack; and Flux to manage the GitOps part of the pipeline.

-
-
- “Giving a Kubernetes-based platform to our data scientists has meant increased security, increased innovation through empowerment, and a more affordable health service as our cloud engineers are building an experience that is used by hundreds on a daily basis, rather than supporting specific bespoke use cases.”

- JEAN MARIE FERDEGUE, DIRECTOR OF PLATFORM OPERATIONS AT BABYLON

-
-
+

The cloud native AI platform has had a huge impact at Babylon. The first research projects run on the platform mostly involved machine learning and natural language processing. These experiments required a huge amount of compute—1600 CPU, 3.2 TB RAM—which was much more than Babylon had in-house. Plus, access to compute used to take hours, or sometimes even days, depending on how busy the platform team was. "Now, with Kubernetes and the self-service platform that we provide, it's pretty much instantaneous," says Vallée.

-
-
-

- Researchers used to have to wait up to 10 hours to get results on new versions of their models. With Kubernetes, that time is now down to under 20 minutes. Plus, previously they could only run one clinical validation at a time, now they can run many parallel ones if they need to—a huge benefit considering that in the past three years, Babylon has grown from 100 to 1,600 employees.

-

- “Delivering a self-service platform where users are empowered to run their own workload has enabled our data scientist community to do hyper parameter tuning and general algorithm development without any cloud skill and without the help of platform engineers, thus accelerating our innovation,” says Chief Technology Officer Caroline Hargrove.

-

- Adds Director of Platform Operations Jean Marie Ferdegue: “Giving a Kubernetes-based platform to our data scientists has meant increased security, increased innovation through empowerment, and a more affordable health service as our cloud engineers are building an experience that is used by hundreds on a daily basis, rather than supporting specific bespoke use cases.”

-

- Plus, as Babylon continues to expand, “it will be very easy to onboard new countries,” says Vallée. “Fifteen months ago when we deployed this platform, we had one big environment in the U.K., but now we have one in Canada, we have one in Asia, and we have one coming in the U.S. This is one of the things that Kubernetes and the other cloud native projects have enabled for us.” -

-

- Babylon’s road map for cloud native involves onboarding all of the company’s AI efforts to the platform. Increasingly, that includes AI services of care. “I think this is going to be an interesting field where AI and healthcare meet,” Vallée says. “It’s kind of a complex problem and there’s a lot of issues around this. So with our platform, we want to say, ‘What can we do to make this less painful for our developers and machine learning engineers?’” -

-
-
- +

Another important type of work that's done on the platform is clinical validation for new applications such as Babylon's Symptom Checker, which calculates the probability of a disease given the evidence input by the user. "Being in healthcare, we want all of our models to be safe before they're going to hit production," says Vallée. Using Argo for GitOps "enabled us to scale the process massively."

+ +{{< case-studies/quote + image="/images/case-studies/babylon/banner2.jpg" + author="JEAN MARIE FERDEGUE, DIRECTOR OF PLATFORM OPERATIONS AT BABYLON" +>}} +"Giving a Kubernetes-based platform to our data scientists has meant increased security, increased innovation through empowerment, and a more affordable health service as our cloud engineers are building an experience that is used by hundreds on a daily basis, rather than supporting specific bespoke use cases." +{{< /case-studies/quote >}} + +

Researchers used to have to wait up to 10 hours to get results on new versions of their models. With Kubernetes, that time is now down to under 20 minutes. Plus, previously they could only run one clinical validation at a time, now they can run many parallel ones if they need to—a huge benefit considering that in the past three years, Babylon has grown from 100 to 1,600 employees.

+ +

"Delivering a self-service platform where users are empowered to run their own workload has enabled our data scientist community to do hyper parameter tuning and general algorithm development without any cloud skill and without the help of platform engineers, thus accelerating our innovation," says Chief Technology Officer Caroline Hargrove.

+ +

Adds Director of Platform Operations Jean Marie Ferdegue: "Giving a Kubernetes-based platform to our data scientists has meant increased security, increased innovation through empowerment, and a more affordable health service as our cloud engineers are building an experience that is used by hundreds on a daily basis, rather than supporting specific bespoke use cases."

+ +

Plus, as Babylon continues to expand, "it will be very easy to onboard new countries," says Vallée. "Fifteen months ago when we deployed this platform, we had one big environment in the U.K., but now we have one in Canada, we have one in Asia, and we have one coming in the U.S. This is one of the things that Kubernetes and the other cloud native projects have enabled for us."

+ +

Babylon's road map for cloud native involves onboarding all of the company's AI efforts to the platform. Increasingly, that includes AI services of care. "I think this is going to be an interesting field where AI and healthcare meet," Vallée says. "It's kind of a complex problem and there's a lot of issues around this. So with our platform, we want to say, 'What can we do to make this less painful for our developers and machine learning engineers?'"

diff --git a/content/en/case-studies/blablacar/index.html b/content/en/case-studies/blablacar/index.html index 2d55ffb8d0..e6537a672f 100644 --- a/content/en/case-studies/blablacar/index.html +++ b/content/en/case-studies/blablacar/index.html @@ -1,98 +1,85 @@ --- title: BlaBlaCar Case Study - case_study_styles: true cid: caseStudies -css: /css/style_blablacar.css + +new_case_study_styles: true +heading_background: /images/case-studies/blablacar/banner1.jpg +heading_title_logo: /images/blablacar_logo.png +subheading: > + Turning to Containerization to Support Millions of Rideshares +case_study_details: + - Company: BlaBlaCar + - Location: Paris, France + - Industry: Ridesharing Company --- -
-

CASE STUDY:
Turning to Containerization to Support Millions of Rideshares

+

Challenge

-
+

The world's largest long-distance carpooling community, BlaBlaCar, connects 40 million members across 22 countries. The company has been experiencing exponential growth since 2012 and needed its infrastructure to keep up. "When you're thinking about doubling the number of servers, you start thinking, 'What should I do to be more efficient?'" says Simon Lallemand, Infrastructure Engineer at BlaBlaCar. "The answer is not to hire more and more people just to deal with the servers and installation." The team knew they had to scale the platform, but wanted to stay on their own bare metal servers.

-
- Company  BlaBlaCar     Location  Paris, France     Industry  Ridesharing Company -
+

Solution

-
-
-
-
-

Challenge

- The world’s largest long-distance carpooling community, BlaBlaCar, connects 40 million members across 22 countries. The company has been experiencing exponential growth since 2012 and needed its infrastructure to keep up. "When you’re thinking about doubling the number of servers, you start thinking, ‘What should I do to be more efficient?’" says Simon Lallemand, Infrastructure Engineer at BlaBlaCar. "The answer is not to hire more and more people just to deal with the servers and installation." The team knew they had to scale the platform, but wanted to stay on their own bare metal servers. -
-
-

Solution

- Opting not to shift to cloud virtualization or use a private cloud on their own servers, the BlaBlaCar team became early adopters of containerization, using the CoreOs runtime rkt, initially deployed using fleet cluster manager. Last year, the company switched to Kubernetes orchestration, and now also uses Prometheus for monitoring. -
+

Opting not to shift to cloud virtualization or use a private cloud on their own servers, the BlaBlaCar team became early adopters of containerization, using the CoreOs runtime rkt, initially deployed using fleet cluster manager. Last year, the company switched to Kubernetes orchestration, and now also uses Prometheus for monitoring.

-

Impact

- "Before using containers, it would take sometimes a day, sometimes two, just to create a new service," says Lallemand. "With all the tooling that we made around the containers, copying a new service now is a matter of minutes. It’s really a huge gain. We are better at capacity planning in our data center because we have fewer constraints due to this abstraction between the services and the hardware we run on. For the developers, it also means they can focus only on the features that they’re developing, and not on the infrastructure." -
-
-
-
-
- "When you’re switching to this cloud-native model and running everything in containers, you have to make sure that at any moment you can reboot without any downtime and without losing traffic. [With Kubernetes] our infrastructure is much more resilient and we have better availability than before."

- Simon Lallemand, Infrastructure Engineer at BlaBlaCar
-
-
-
-
-

For the 40 million users of BlaBlaCar, it’s easy to find strangers headed in the same direction to share rides and costs. You can even choose how much "bla bla" chatter you want from a long-distance ride mate.

- Behind the scenes, though, the infrastructure was falling woefully behind the rider community’s exponential growth. Founded in 2006, the company hit its current stride around 2012. "Our infrastructure was very traditional," says Infrastructure Engineer Simon Lallemand, who began working at the company in 2014. "In the beginning, it was a bit chaotic because we had to [grow] fast. But then comes the time when you have to design things to make it manageable."

- By 2015, the company had about 50 bare metal servers. The team was using a MySQL database and PHP, but, Lallemand says, "it was a very static way." They also utilized the configuration management system, Chef, but had little automation in its process. "When you’re thinking about doubling the number of servers, you start thinking, ‘What should I do to be more efficient?’" says Lallemand. "The answer is not to hire more and more people just to deal with the servers and installation."

- Instead, BlaBlaCar began its cloud-native journey but wasn’t sure which route to take. "We could either decide to go into cloud virtualization or even use a private cloud on our own servers," says Lallemand. "But going into the cloud meant we had to make a lot of changes in our application work, and we were just not ready to make the switch from on premise to the cloud." They wanted to keep the great performance they got on bare metal, so they didn’t want to go to virtualization on premise.

- The solution: containerization. This was early 2015 and containers were still relatively new. "It was a bold move at the time," says Lallemand. "We decided that the next servers that we would buy in the new data center would all be the same model, so we could outsource the maintenance of the servers. And we decided to go with containers and with CoreOS Container Linux as an abstraction for this hardware. It seemed future-proof to go with containers because we could see what companies were already doing with containers." -
-
+

"Before using containers, it would take sometimes a day, sometimes two, just to create a new service," says Lallemand. "With all the tooling that we made around the containers, copying a new service now is a matter of minutes. It's really a huge gain. We are better at capacity planning in our data center because we have fewer constraints due to this abstraction between the services and the hardware we run on. For the developers, it also means they can focus only on the features that they're developing, and not on the infrastructure."

-
-
- "With all the tooling that we made around the containers, copying a new service is a matter of minutes. It’s a huge gain. For the developers, it means they can focus only on the features that they’re developing and not on the infrastructure or the hour they would test their code, or the hour that it would get deployed." -
-
+{{< case-studies/quote author="Simon Lallemand, Infrastructure Engineer at BlaBlaCar" >}} +"When you're switching to this cloud-native model and running everything in containers, you have to make sure that at any moment you can reboot without any downtime and without losing traffic. [With Kubernetes] our infrastructure is much more resilient and we have better availability than before." +{{< /case-studies/quote >}} -
-
- Next, they needed to choose a runtime for the containers, but "there were very few deployments in production at that time," says Lallemand. They experimented with Docker but decided to go with rkt. Lallemand explains that for BlaBlaCar, it was "much simpler to integrate things that are on rkt." At the time, the project was still pre-v1.0, so "we could speak with the developers of rkt and give them feedback. It was an advantage." Plus, he notes, rkt was very stable, even at this early stage.

- Once those decisions were made that summer, the company came up with a plan for implementation. First, they formed a task force to create a workflow that would be tested by three of the 10 members on Lallemand’s team. But they took care to run regular workshops with all 10 members to make sure everyone was on board. "When you’re focused on your product sometimes you forget if it’s really user friendly, whether other people can manage to create containers too," Lallemand says. "So we did a lot of iterations to find a good workflow."

- After establishing the workflow, Lallemand says with a smile that "we had this strange idea that we should try the most difficult thing first. Because if it works, it will work for everything." So the first project the team decided to containerize was the database. "Nobody did that at the time, and there were really no existing tools for what we wanted to do, including building container images," he says. So the team created their own tools, such as dgr, which builds container images so that the whole team has a common framework to build on the same images with the same standards. They also revamped the service-discovery tools Nerve and Synapse; their versions, Go-Nerve and Go-Synapse, were written in Go and built to be more efficient and include new features. All of these tools were open-sourced.

- At the same time, the company was working to migrate its entire platform to containers with a deadline set for Christmas 2015. With all the work being done in parallel, BlaBlaCar was able to get about 80 percent of its production into containers by its deadline with live traffic running on containers during December. (It’s now at 100 percent.) "It’s a really busy time for traffic," says Lallemand. "We knew that by using those new servers with containers, it would help us handle the traffic."

- In the middle of that peak season for carpooling, everything worked well. "The biggest impact that we had was for the deployment of new services," says Lallemand. "Before using containers, we had to first deploy a new server and create configurations with Chef. It would take sometimes a day, sometimes two, just to create a new service. And with all the tooling that we made around the containers, copying a new service is a matter of minutes. So it’s really a huge gain. For the developers, it means they can focus only on the features that they’re developing and not on the infrastructure or the hour they would test their code, or the hour that it would get deployed." -
-
+{{< case-studies/lead >}} +For the 40 million users of BlaBlaCar, it's easy to find strangers headed in the same direction to share rides and costs. You can even choose how much "bla bla" chatter you want from a long-distance ride mate. +{{< /case-studies/lead >}} -
-
- "We realized that there was a really strong community around it [Kubernetes], which meant we would not have to maintain a lot of tools of our own," says Lallemand. "It was better if we could contribute to some bigger project like Kubernetes." -
-
+

Behind the scenes, though, the infrastructure was falling woefully behind the rider community's exponential growth. Founded in 2006, the company hit its current stride around 2012. "Our infrastructure was very traditional," says Infrastructure Engineer Simon Lallemand, who began working at the company in 2014. "In the beginning, it was a bit chaotic because we had to [grow] fast. But then comes the time when you have to design things to make it manageable."

-
-
- In order to meet their self-imposed deadline, one of the decisions they made was to not do any "orchestration magic" for containers in the first production alignment. Instead, they used the basic fleet tool from CoreOS to deploy their containers. (They did build a tool called GGN, which they’ve open-sourced, to make it more manageable for their system engineers to use.)

- Still, the team knew that they’d want more orchestration. "Our tool was doing a pretty good job, but at some point you want to give more autonomy to the developer team," Lallemand says. "We also realized that we don’t want to be the single point of contact for developers when they want to launch new services." By the summer of 2016, they found their answer in Kubernetes, which had just begun supporting rkt implementation.

- After discussing their needs with their contacts at CoreOS and Google, they were convinced that Kubernetes would work for BlaBlaCar. "We realized that there was a really strong community around it, which meant we would not have to maintain a lot of tools of our own," says Lallemand. "It was better if we could contribute to some bigger project like Kubernetes." They also started using Prometheus, as they were looking for "service-oriented monitoring that could be updated nightly." Production on Kubernetes began in December 2016. "We like to do crazy stuff around Christmas," he adds with a laugh.

- BlaBlaCar now has about 3,000 pods, with 1200 of them running on Kubernetes. Lallemand leads a "foundations team" of 25 members who take care of the networks, databases and systems for about 100 developers. There have been some challenges getting to this point. "The rkt implementation is still not 100 percent finished," Lallemand points out. "It’s really good, but there are some features still missing. We have questions about how we do things with stateful services, like databases. We know how we will be migrating some of the services; some of the others are a bit more complicated to deal with. But the Kubernetes community is making a lot of progress on that part."

- The team is particularly happy that they’re now able to plan capacity better in the company’s data center. "We have fewer constraints since we have this abstraction between the services and the hardware we run on," says Lallemand. "If we lose a server because there’s a hardware problem on it, we just move the containers onto another server. It’s much more efficient. We do that by just changing a line in the configuration file. And with Kubernetes, it should be automatic, so we would have nothing to do." -
-
+

By 2015, the company had about 50 bare metal servers. The team was using a MySQL database and PHP, but, Lallemand says, "it was a very static way." They also utilized the configuration management system, Chef, but had little automation in its process. "When you're thinking about doubling the number of servers, you start thinking, 'What should I do to be more efficient?'" says Lallemand. "The answer is not to hire more and more people just to deal with the servers and installation."

-
-
- "If we lose a server because there’s a hardware problem on it, we just move the containers onto another server. It’s much more efficient. We do that by just changing a line in the configuration file. With Kubernetes, it should be automatic, so we would have nothing to do." -
-
+

Instead, BlaBlaCar began its cloud-native journey but wasn't sure which route to take. "We could either decide to go into cloud virtualization or even use a private cloud on our own servers," says Lallemand. "But going into the cloud meant we had to make a lot of changes in our application work, and we were just not ready to make the switch from on premise to the cloud." They wanted to keep the great performance they got on bare metal, so they didn't want to go to virtualization on premise.

-
-
- And these advances ultimately trickle down to BlaBlaCar’s users. "We have improved availability overall on our website," says Lallemand. "When you’re switching to this cloud-native model with running everything in containers, you have to make sure that you can at any moment reboot a server or a data container without any downtime, without losing traffic. So now our infrastructure is much more resilient and we have better availability than before."

- Within BlaBlaCar’s technology department, the cloud-native journey has created some profound changes. Lallemand thinks that the regular meetings during the conception stage and the training sessions during implementation helped. "After that everybody took part in the migration process," he says. "Then we split the organization into different ‘tribes’—teams that gather developers, product managers, data analysts, all the different jobs, to work on a specific part of the product. Before, they were organized by function. The idea is to give all these tribes access to the infrastructure directly in a self-service way without having to ask. These people are really autonomous. They have responsibility of that part of the product, and they can make decisions faster."

- This DevOps transformation turned out to be a positive one for the company’s staffers. "The team was very excited about the DevOps transformation because it was new, and we were working to make things more reliable, more future-proof," says Lallemand. "We like doing things that very few people are doing, other than the internet giants."

- With these changes already making an impact, BlaBlaCar is looking to split up more and more of its application into services. "I don’t say microservices because they’re not so micro," Lallemand says. "If we can split the responsibilities between the development teams, it would be easier to manage and more reliable, because we can easily add and remove services if one fails. You can handle it easily, instead of adding a big monolith that we still have."

- When Lallemand speaks to other European companies curious about what BlaBlaCar has done with its infrastructure, he tells them to come along for the ride. "I tell them that it’s such a pleasure to deal with the infrastructure that we have today compared to what we had before," he says. "They just need to keep in mind their real motive, whether it’s flexibility in development or reliability or so on, and then go step by step towards reaching those objectives. That’s what we’ve done. It’s important not to do technology for the sake of technology. Do it for a purpose. Our focus was on helping the developers." -
-
+

The solution: containerization. This was early 2015 and containers were still relatively new. "It was a bold move at the time," says Lallemand. "We decided that the next servers that we would buy in the new data center would all be the same model, so we could outsource the maintenance of the servers. And we decided to go with containers and with CoreOS Container Linux as an abstraction for this hardware. It seemed future-proof to go with containers because we could see what companies were already doing with containers."

+ +{{< case-studies/quote image="/images/case-studies/blablacar/banner3.jpg">}} +"With all the tooling that we made around the containers, copying a new service is a matter of minutes. It's a huge gain. For the developers, it means they can focus only on the features that they're developing and not on the infrastructure or the hour they would test their code, or the hour that it would get deployed." +{{< /case-studies/quote >}} + +

Next, they needed to choose a runtime for the containers, but "there were very few deployments in production at that time," says Lallemand. They experimented with Docker but decided to go with rkt. Lallemand explains that for BlaBlaCar, it was "much simpler to integrate things that are on rkt." At the time, the project was still pre-v1.0, so "we could speak with the developers of rkt and give them feedback. It was an advantage." Plus, he notes, rkt was very stable, even at this early stage.

+ +

Once those decisions were made that summer, the company came up with a plan for implementation. First, they formed a task force to create a workflow that would be tested by three of the 10 members on Lallemand's team. But they took care to run regular workshops with all 10 members to make sure everyone was on board. "When you're focused on your product sometimes you forget if it's really user friendly, whether other people can manage to create containers too," Lallemand says. "So we did a lot of iterations to find a good workflow."

+ +

After establishing the workflow, Lallemand says with a smile that "we had this strange idea that we should try the most difficult thing first. Because if it works, it will work for everything." So the first project the team decided to containerize was the database. "Nobody did that at the time, and there were really no existing tools for what we wanted to do, including building container images," he says. So the team created their own tools, such as dgr, which builds container images so that the whole team has a common framework to build on the same images with the same standards. They also revamped the service-discovery tools Nerve and Synapse; their versions, Go-Nerve and Go-Synapse, were written in Go and built to be more efficient and include new features. All of these tools were open-sourced.

+ +

At the same time, the company was working to migrate its entire platform to containers with a deadline set for Christmas 2015. With all the work being done in parallel, BlaBlaCar was able to get about 80 percent of its production into containers by its deadline with live traffic running on containers during December. (It's now at 100 percent.) "It's a really busy time for traffic," says Lallemand. "We knew that by using those new servers with containers, it would help us handle the traffic."

+ +

In the middle of that peak season for carpooling, everything worked well. "The biggest impact that we had was for the deployment of new services," says Lallemand. "Before using containers, we had to first deploy a new server and create configurations with Chef. It would take sometimes a day, sometimes two, just to create a new service. And with all the tooling that we made around the containers, copying a new service is a matter of minutes. So it's really a huge gain. For the developers, it means they can focus only on the features that they're developing and not on the infrastructure or the hour they would test their code, or the hour that it would get deployed."

+ +{{< case-studies/quote image="/images/case-studies/blablacar/banner4.jpg" >}} +"We realized that there was a really strong community around it [Kubernetes], which meant we would not have to maintain a lot of tools of our own," says Lallemand. "It was better if we could contribute to some bigger project like Kubernetes." +{{< /case-studies/quote >}} + +

In order to meet their self-imposed deadline, one of the decisions they made was to not do any "orchestration magic" for containers in the first production alignment. Instead, they used the basic fleet tool from CoreOS to deploy their containers. (They did build a tool called GGN, which they've open-sourced, to make it more manageable for their system engineers to use.)

+ +

Still, the team knew that they'd want more orchestration. "Our tool was doing a pretty good job, but at some point you want to give more autonomy to the developer team," Lallemand says. "We also realized that we don't want to be the single point of contact for developers when they want to launch new services." By the summer of 2016, they found their answer in Kubernetes, which had just begun supporting rkt implementation.

+ +

After discussing their needs with their contacts at CoreOS and Google, they were convinced that Kubernetes would work for BlaBlaCar. "We realized that there was a really strong community around it, which meant we would not have to maintain a lot of tools of our own," says Lallemand. "It was better if we could contribute to some bigger project like Kubernetes." They also started using Prometheus, as they were looking for "service-oriented monitoring that could be updated nightly." Production on Kubernetes began in December 2016. "We like to do crazy stuff around Christmas," he adds with a laugh.

+ +

BlaBlaCar now has about 3,000 pods, with 1200 of them running on Kubernetes. Lallemand leads a "foundations team" of 25 members who take care of the networks, databases and systems for about 100 developers. There have been some challenges getting to this point. "The rkt implementation is still not 100 percent finished," Lallemand points out. "It's really good, but there are some features still missing. We have questions about how we do things with stateful services, like databases. We know how we will be migrating some of the services; some of the others are a bit more complicated to deal with. But the Kubernetes community is making a lot of progress on that part."

+ +

The team is particularly happy that they're now able to plan capacity better in the company's data center. "We have fewer constraints since we have this abstraction between the services and the hardware we run on," says Lallemand. "If we lose a server because there's a hardware problem on it, we just move the containers onto another server. It's much more efficient. We do that by just changing a line in the configuration file. And with Kubernetes, it should be automatic, so we would have nothing to do."

+ +{{< case-studies/quote >}} +"If we lose a server because there's a hardware problem on it, we just move the containers onto another server. It's much more efficient. We do that by just changing a line in the configuration file. With Kubernetes, it should be automatic, so we would have nothing to do." +{{< /case-studies/quote >}} + +

And these advances ultimately trickle down to BlaBlaCar's users. "We have improved availability overall on our website," says Lallemand. "When you're switching to this cloud-native model with running everything in containers, you have to make sure that you can at any moment reboot a server or a data container without any downtime, without losing traffic. So now our infrastructure is much more resilient and we have better availability than before."

+ +

Within BlaBlaCar's technology department, the cloud-native journey has created some profound changes. Lallemand thinks that the regular meetings during the conception stage and the training sessions during implementation helped. "After that everybody took part in the migration process," he says. "Then we split the organization into different 'tribes'—teams that gather developers, product managers, data analysts, all the different jobs, to work on a specific part of the product. Before, they were organized by function. The idea is to give all these tribes access to the infrastructure directly in a self-service way without having to ask. These people are really autonomous. They have responsibility of that part of the product, and they can make decisions faster."

+ +

This DevOps transformation turned out to be a positive one for the company's staffers. "The team was very excited about the DevOps transformation because it was new, and we were working to make things more reliable, more future-proof," says Lallemand. "We like doing things that very few people are doing, other than the internet giants."

+ +

With these changes already making an impact, BlaBlaCar is looking to split up more and more of its application into services. "I don't say microservices because they're not so micro," Lallemand says. "If we can split the responsibilities between the development teams, it would be easier to manage and more reliable, because we can easily add and remove services if one fails. You can handle it easily, instead of adding a big monolith that we still have."

+ +

When Lallemand speaks to other European companies curious about what BlaBlaCar has done with its infrastructure, he tells them to come along for the ride. "I tell them that it's such a pleasure to deal with the infrastructure that we have today compared to what we had before," he says. "They just need to keep in mind their real motive, whether it's flexibility in development or reliability or so on, and then go step by step towards reaching those objectives. That's what we've done. It's important not to do technology for the sake of technology. Do it for a purpose. Our focus was on helping the developers."

diff --git a/content/en/case-studies/blackrock/index.html b/content/en/case-studies/blackrock/index.html index 6bef0ec708..725d6bc057 100644 --- a/content/en/case-studies/blackrock/index.html +++ b/content/en/case-studies/blackrock/index.html @@ -1,112 +1,83 @@ --- title: BlackRock Case Study - case_study_styles: true cid: caseStudies -css: /css/style_blackrock.css + +new_case_study_styles: true +heading_background: /images/case-studies/blackrock/banner1.jpg +heading_title_logo: /images/blackrock_logo.png +subheading: > + Rolling Out Kubernetes in Production in 100 Days +case_study_details: + - Company: BlackRock + - Location: New York, NY + - Industry: Financial Services --- -
-

CASE STUDY:
-
Rolling Out Kubernetes in Production in 100 Days
-

+

Challenge

-
+

The world's largest asset manager, BlackRock operates a very controlled static deployment scheme, which has allowed for scalability over the years. But in their data science division, there was a need for more dynamic access to resources. "We want to be able to give every investor access to data science, meaning Python notebooks, or even something much more advanced, like a MapReduce engine based on Spark," says Michael Francis, a Managing Director in BlackRock's Product Group, which runs the company's investment management platform. "Managing complex Python installations on users' desktops is really hard because everyone ends up with slightly different environments. We have existing environments that do these things, but we needed to make it real, expansive and scalable. Being able to spin that up on demand, tear it down, make that much more dynamic, became a critical thought process for us. It's not so much that we had to solve our main core production problem, it's how do we extend that? How do we evolve?"

-
- Company  BlackRock     Location  New York, NY     Industry  Financial Services -
+

Solution

-
+

Drawing from what they learned during a pilot done last year using Docker environments, Francis put together a cross-sectional team of 20 to build an investor research web app using Kubernetes with the goal of getting it into production within one quarter.

-
- -
-
-

Challenge

- The world’s largest asset manager, BlackRock operates a very controlled static deployment scheme, which has allowed for scalability over the years. But in their data science division, there was a need for more dynamic access to resources. "We want to be able to give every investor access to data science, meaning Python notebooks, or even something much more advanced, like a MapReduce engine based on Spark," says Michael Francis, a Managing Director in BlackRock’s Product Group, which runs the company’s investment management platform. "Managing complex Python installations on users’ desktops is really hard because everyone ends up with slightly different environments. We have existing environments that do these things, but we needed to make it real, expansive and scalable. Being able to spin that up on demand, tear it down, make that much more dynamic, became a critical thought process for us. It’s not so much that we had to solve our main core production problem, it’s how do we extend that? How do we evolve?" -
- -
-

Solution

- Drawing from what they learned during a pilot done last year using Docker environments, Francis put together a cross-sectional team of 20 to build an investor research web app using Kubernetes with the goal of getting it into production within one quarter. -

Impact

- "Our goal was: How do you give people tools rapidly without having to install them on their desktop?" says Francis. And the team hit the goal within 100 days. Francis is pleased with the results and says, "We’re going to use this infrastructure for lots of other application workloads as time goes on. It’s not just data science; it’s this style of application that needs the dynamism. But I think we’re 6-12 months away from making a [large scale] decision. We need to gain experience of running the system in production, we need to understand failure modes and how best to manage operational issues. What’s interesting is that just having this technology there is changing the way our developers are starting to think about their future development." -
-
+

"Our goal was: How do you give people tools rapidly without having to install them on their desktop?" says Francis. And the team hit the goal within 100 days. Francis is pleased with the results and says, "We're going to use this infrastructure for lots of other application workloads as time goes on. It's not just data science; it's this style of application that needs the dynamism. But I think we're 6-12 months away from making a [large scale] decision. We need to gain experience of running the system in production, we need to understand failure modes and how best to manage operational issues. What's interesting is that just having this technology there is changing the way our developers are starting to think about their future development."

-
+{{< case-studies/quote author="Michael Francis, Managing Director, BlackRock">}} +"My message to other enterprises like us is you can actually integrate Kubernetes into an existing, well-orchestrated machinery. You don't have to throw out everything you do. And using Kubernetes made a complex problem significantly easier." +{{< /case-studies/quote >}} -
-
- "My message to other enterprises like us is you can actually integrate Kubernetes into an existing, well-orchestrated machinery. You don’t have to throw out everything you do. And using Kubernetes made a complex problem significantly easier."

- Michael Francis, Managing Director, BlackRock
-
-
+

One of the management objectives for BlackRock's Product Group employees in 2017 was to "build cool stuff." Led by Managing Director Michael Francis, a cross-sectional group of 20 did just that: They rolled out a full production Kubernetes environment and released a new investor research web app on it. In 100 days.

-
+

For a company that's the world's largest asset manager, "just equipment procurement can take 100 days sometimes, let alone from inception to delivery," says Karl Wieman, a Senior System Administrator. "It was an aggressive schedule. But it moved the dial." In fact, the project achieved two goals: It solved a business problem (creating the needed web app) as well as provided real-world, in-production experience with Kubernetes, a cloud-native technology that the company was eager to explore. "It's not so much that we had to solve our main core production problem, it's how do we extend that? How do we evolve?" says Francis. The ultimate success of this project, beyond delivering the app, lies in the fact that "we've managed to integrate a radically new thought process into a controlled infrastructure that we didn't want to change."

-
- One of the management objectives for BlackRock’s Product Group employees in 2017 was to "build cool stuff." Led by Managing Director Michael Francis, a cross-sectional group of 20 did just that: They rolled out a full production Kubernetes environment and released a new investor research web app on it. In 100 days.

- For a company that’s the world’s largest asset manager, "just equipment procurement can take 100 days sometimes, let alone from inception to delivery," says Karl Wieman, a Senior System Administrator. "It was an aggressive schedule. But it moved the dial." - In fact, the project achieved two goals: It solved a business problem (creating the needed web app) as well as provided real-world, in-production experience with Kubernetes, a cloud-native technology that the company was eager to explore. "It’s not so much that we had to solve our main core production problem, it’s how do we extend that? How do we evolve?" says Francis. The ultimate success of this project, beyond delivering the app, lies in the fact that "we’ve managed to integrate a radically new thought process into a controlled infrastructure that we didn’t want to change."

- After all, in its three decades of existence, BlackRock has "a very well-established environment for managing our compute resources," says Francis. "We manage large cluster processes on machines, so we do a lot of orchestration and management for our main production processes in a way that’s very cloudish in concept. We’re able to manage them in a very controlled, static deployment scheme, and that has given us a huge amount of scalability."

- Though that works well for the core production, the company has found that some data science workloads require more dynamic access to resources. "It’s a very bursty process," says Francis, who is head of data for the company’s Aladdin investment management platform division.

- Aladdin, which connects the people, information and technology needed for money management in real time, is used internally and is also sold as a platform to other asset managers and insurance companies. "We want to be able to give every investor access to data science, meaning Python notebooks, or even something much more advanced, like a MapReduce engine based on Spark," says Francis. But "managing complex Python installations on users’ desktops is really hard because everyone ends up with slightly different environments. Docker allows us to flatten that environment." -
-
+

After all, in its three decades of existence, BlackRock has "a very well-established environment for managing our compute resources," says Francis. "We manage large cluster processes on machines, so we do a lot of orchestration and management for our main production processes in a way that's very cloudish in concept. We're able to manage them in a very controlled, static deployment scheme, and that has given us a huge amount of scalability."

-
-
- "We manage large cluster processes on machines, so we do a lot of orchestration and management for our main production processes in a way that’s very cloudish in concept. We’re able to manage them in a very controlled, static deployment scheme, and that has given us a huge amount of scalability." -
-
+

Though that works well for the core production, the company has found that some data science workloads require more dynamic access to resources. "It's a very bursty process," says Francis, who is head of data for the company's Aladdin investment management platform division.

-
-
- Still, challenges remain. "If you have a shared cluster, you get this storming herd problem where everyone wants to do the same thing at the same time," says Francis. "You could put limits on it, but you’d have to build an infrastructure to define limits for our processes, and the Python notebooks weren’t really designed for that. We have existing environments that do these things, but we needed to make it real, expansive, and scalable. Being able to spin that up on demand, tear it down, and make that much more dynamic, became a critical thought process for us."

- Made up of managers from technology, infrastructure, production operations, development and information security, Francis’s team was able to look at the problem holistically and come up with a solution that made sense for BlackRock. "Our initial straw man was that we were going to build everything using Ansible and run it all using some completely different distributed environment," says Francis. "That would have been absolutely the wrong thing to do. Had we gone off on our own as the dev team and developed this solution, it would have been a very different product. And it would have been very expensive. We would not have gone down the route of running under our existing orchestration system. Because we don’t understand it. These guys [in operations and infrastructure] understand it. Having the multidisciplinary team allowed us to get to the right solutions and that actually meant we didn’t build anywhere near the amount we thought we were going to end up building."

- In search of a solution in which they could manage usage on a user-by-user level, Francis’s team gravitated to Red Hat’s OpenShift Kubernetes offering. The company had already experimented with other cloud-native environments, but the team liked that Kubernetes was open source, and "we felt the winds were blowing in the direction of Kubernetes long term," says Francis. "Typically we make technology choices that we believe are going to be here in 5-10 years’ time, in some form. And right now, in this space, Kubernetes feels like the one that’s going to be there." Adds Uri Morris, Vice President of Production Operations: "When you see that the non-Google committers to Kubernetes overtook the Google committers, that’s an indicator of the momentum."

- Once that decision was made, the major challenge was figuring out how to make Kubernetes work within BlackRock’s existing framework. "It’s about understanding how we can operate, manage and support a platform like this, in addition to tacking it onto our existing technology platform," says Project Manager Michael Maskallis. "All the controls we have in place, the change management process, the software development lifecycle, onboarding processes we go through—how can we do all these things?"

- The first (anticipated) speed bump was working around issues behind BlackRock’s corporate firewalls. "One of our challenges is there are no firewalls in most open source software," says Francis. "So almost all install scripts fail in some bizarre way, and pulling down packages doesn’t necessarily work." The team ran into these types of problems using Minikube and did a few small pushes back to the open source project. +

Aladdin, which connects the people, information and technology needed for money management in real time, is used internally and is also sold as a platform to other asset managers and insurance companies. "We want to be able to give every investor access to data science, meaning Python notebooks, or even something much more advanced, like a MapReduce engine based on Spark," says Francis. But "managing complex Python installations on users' desktops is really hard because everyone ends up with slightly different environments. Docker allows us to flatten that environment."

+{{< case-studies/quote image="/images/case-studies/blackrock/banner3.jpg">}} +"We manage large cluster processes on machines, so we do a lot of orchestration and management for our main production processes in a way that's very cloudish in concept. We're able to manage them in a very controlled, static deployment scheme, and that has given us a huge amount of scalability." +{{< /case-studies/quote >}} -
-
+

Still, challenges remain. "If you have a shared cluster, you get this storming herd problem where everyone wants to do the same thing at the same time," says Francis. "You could put limits on it, but you'd have to build an infrastructure to define limits for our processes, and the Python notebooks weren't really designed for that. We have existing environments that do these things, but we needed to make it real, expansive, and scalable. Being able to spin that up on demand, tear it down, and make that much more dynamic, became a critical thought process for us."

-
-
- "Typically we make technology choices that we believe are going to be here in 5-10 years’ time, in some form. And right now, in this space, Kubernetes feels like the one that’s going to be there." -
-
+

Made up of managers from technology, infrastructure, production operations, development and information security, Francis's team was able to look at the problem holistically and come up with a solution that made sense for BlackRock. "Our initial straw man was that we were going to build everything using Ansible and run it all using some completely different distributed environment," says Francis. "That would have been absolutely the wrong thing to do. Had we gone off on our own as the dev team and developed this solution, it would have been a very different product. And it would have been very expensive. We would not have gone down the route of running under our existing orchestration system. Because we don't understand it. These guys [in operations and infrastructure] understand it. Having the multidisciplinary team allowed us to get to the right solutions and that actually meant we didn't build anywhere near the amount we thought we were going to end up building."

-
-
- There were also questions about service discovery. "You can think of Aladdin as a cloud of services with APIs between them that allows us to build applications rapidly," says Francis. "It’s all on a proprietary message bus, which gives us all sorts of advantages but at the same time, how does that play in a third party [platform]?"

- Another issue they had to navigate was that in BlackRock’s existing system, the messaging protocol has different instances in the different development, test and production environments. While Kubernetes enables a more DevOps-style model, it didn’t make sense for BlackRock. "I think what we are very proud of is that the ability for us to push into production is still incredibly rapid in this [new] infrastructure, but we have the control points in place, and we didn’t have to disrupt everything," says Francis. "A lot of the cost of this development was thinking how best to leverage our internal tools. So it was less costly than we actually thought it was going to be."

- The project leveraged tools associated with the messaging bus, for example. "The way that the Kubernetes cluster will talk to our internal messaging platform is through a gateway program, and this gateway program already has built-in checks and throttles," says Morris. "We can use them to control and potentially throttle the requests coming in from Kubernetes’s very elastic infrastructure to the production infrastructure. We’ll continue to go in that direction. It enables us to scale as we need to from the operational perspective."

- The solution also had to be complementary with BlackRock’s centralized operational support team structure. "The core infrastructure components of Kubernetes are hooked into our existing orchestration framework, which means that anyone in our support team has both control and visibility to the cluster using the existing operational tools," Morris explains. "That means that I don’t need to hire more people."

- With those points established, the team created a procedure for the project: "We rolled this out first to a development environment, then moved on to a testing environment and then eventually to two production environments, in that sequential order," says Maskallis. "That drove a lot of our learning curve. We have all these moving parts, the software components on the infrastructure side, the software components with Kubernetes directly, the interconnectivity with the rest of the environment that we operate here at BlackRock, and how we connect all these pieces. If we came across issues, we fixed them, and then moved on to the different environments to replicate that until we eventually ended up in our production environment where this particular cluster is supposed to live."

- The team had weekly one-hour working sessions with all the members (who are located around the world) participating, and smaller breakout or deep-dive meetings focusing on specific technical details. Possible solutions would be reported back to the group and debated the following week. "I think what made it a successful experiment was people had to work to learn, and they shared their experiences with others," says Vice President and Software Developer Fouad Semaan. Then, Francis says, "We gave our engineers the space to do what they’re good at. This hasn’t been top-down." +

In search of a solution in which they could manage usage on a user-by-user level, Francis's team gravitated to Red Hat's OpenShift Kubernetes offering. The company had already experimented with other cloud-native environments, but the team liked that Kubernetes was open source, and "we felt the winds were blowing in the direction of Kubernetes long term," says Francis. "Typically we make technology choices that we believe are going to be here in 5-10 years' time, in some form. And right now, in this space, Kubernetes feels like the one that's going to be there." Adds Uri Morris, Vice President of Production Operations: "When you see that the non-Google committers to Kubernetes overtook the Google committers, that's an indicator of the momentum."

+

Once that decision was made, the major challenge was figuring out how to make Kubernetes work within BlackRock's existing framework. "It's about understanding how we can operate, manage and support a platform like this, in addition to tacking it onto our existing technology platform," says Project Manager Michael Maskallis. "All the controls we have in place, the change management process, the software development lifecycle, onboarding processes we go through—how can we do all these things?"

-
-
+

The first (anticipated) speed bump was working around issues behind BlackRock's corporate firewalls. "One of our challenges is there are no firewalls in most open source software," says Francis. "So almost all install scripts fail in some bizarre way, and pulling down packages doesn't necessarily work." The team ran into these types of problems using Minikube and did a few small pushes back to the open source project.

-
-
- "The core infrastructure components of Kubernetes are hooked into our existing orchestration framework, which means that anyone in our support team has both control and visibility to the cluster using the existing operational tools. That means that I don’t need to hire more people." +{{< case-studies/quote image="/images/case-studies/blackrock/banner4.jpg">}} +"Typically we make technology choices that we believe are going to be here in 5-10 years' time, in some form. And right now, in this space, Kubernetes feels like the one that's going to be there." +{{< /case-studies/quote >}} -
-
+

There were also questions about service discovery. "You can think of Aladdin as a cloud of services with APIs between them that allows us to build applications rapidly," says Francis. "It's all on a proprietary message bus, which gives us all sorts of advantages but at the same time, how does that play in a third party [platform]?"

-
-
- They were led by one key axiom: To stay focused and avoid scope creep. This meant that they wouldn’t use features that weren’t in the core of Kubernetes and Docker. But if there was a real need, they’d build the features themselves. Luckily, Francis says, "Because of the rapidity of the development, a lot of things we thought we would have to build ourselves have been rolled into the core product. [The package manager Helm is one example]. People have similar problems."

- By the end of the 100 days, the app was up and running for internal BlackRock users. The initial capacity of 30 users was hit within hours, and quickly increased to 150. "People were immediately all over it," says Francis. In the next phase of this project, they are planning to scale up the cluster to have more capacity.

- Even more importantly, they now have in-production experience with Kubernetes that they can continue to build on—and a complete framework for rolling out new applications. "We’re going to use this infrastructure for lots of other application workloads as time goes on. It’s not just data science; it’s this style of application that needs the dynamism," says Francis. "Is it the right place to move our core production processes onto? It might be. We’re not at a point where we can say yes or no, but we felt that having real production experience with something like Kubernetes at some form and scale would allow us to understand that. I think we’re 6-12 months away from making a [large scale] decision. We need to gain experience of running the system in production, we need to understand failure modes and how best to manage operational issues."

- For other big companies considering a project like this, Francis says commitment and dedication are key: "We got the signoff from [senior management] from day one, with the commitment that we were able to get the right people. If I had to isolate what makes something complex like this succeed, I would say senior hands-on people who can actually drive it make a huge difference." With that in place, he adds, "My message to other enterprises like us is you can actually integrate Kubernetes into an existing, well-orchestrated machinery. You don’t have to throw out everything you do. And using Kubernetes made a complex problem significantly easier." +

Another issue they had to navigate was that in BlackRock's existing system, the messaging protocol has different instances in the different development, test and production environments. While Kubernetes enables a more DevOps-style model, it didn't make sense for BlackRock. "I think what we are very proud of is that the ability for us to push into production is still incredibly rapid in this [new] infrastructure, but we have the control points in place, and we didn't have to disrupt everything," says Francis. "A lot of the cost of this development was thinking how best to leverage our internal tools. So it was less costly than we actually thought it was going to be."

-
-
+

The project leveraged tools associated with the messaging bus, for example. "The way that the Kubernetes cluster will talk to our internal messaging platform is through a gateway program, and this gateway program already has built-in checks and throttles," says Morris. "We can use them to control and potentially throttle the requests coming in from Kubernetes's very elastic infrastructure to the production infrastructure. We'll continue to go in that direction. It enables us to scale as we need to from the operational perspective."

+ +

The solution also had to be complementary with BlackRock's centralized operational support team structure. "The core infrastructure components of Kubernetes are hooked into our existing orchestration framework, which means that anyone in our support team has both control and visibility to the cluster using the existing operational tools," Morris explains. "That means that I don't need to hire more people."

+ +

With those points established, the team created a procedure for the project: "We rolled this out first to a development environment, then moved on to a testing environment and then eventually to two production environments, in that sequential order," says Maskallis. "That drove a lot of our learning curve. We have all these moving parts, the software components on the infrastructure side, the software components with Kubernetes directly, the interconnectivity with the rest of the environment that we operate here at BlackRock, and how we connect all these pieces. If we came across issues, we fixed them, and then moved on to the different environments to replicate that until we eventually ended up in our production environment where this particular cluster is supposed to live."

+ +

The team had weekly one-hour working sessions with all the members (who are located around the world) participating, and smaller breakout or deep-dive meetings focusing on specific technical details. Possible solutions would be reported back to the group and debated the following week. "I think what made it a successful experiment was people had to work to learn, and they shared their experiences with others," says Vice President and Software Developer Fouad Semaan. Then, Francis says, "We gave our engineers the space to do what they're good at. This hasn't been top-down."

+ +{{< case-studies/quote >}} +"The core infrastructure components of Kubernetes are hooked into our existing orchestration framework, which means that anyone in our support team has both control and visibility to the cluster using the existing operational tools. That means that I don't need to hire more people." +{{< /case-studies/quote >}} + +

They were led by one key axiom: To stay focused and avoid scope creep. This meant that they wouldn't use features that weren't in the core of Kubernetes and Docker. But if there was a real need, they'd build the features themselves. Luckily, Francis says, "Because of the rapidity of the development, a lot of things we thought we would have to build ourselves have been rolled into the core product. [The package manager Helm is one example]. People have similar problems."

+ +

By the end of the 100 days, the app was up and running for internal BlackRock users. The initial capacity of 30 users was hit within hours, and quickly increased to 150. "People were immediately all over it," says Francis. In the next phase of this project, they are planning to scale up the cluster to have more capacity.

+ +

Even more importantly, they now have in-production experience with Kubernetes that they can continue to build on—and a complete framework for rolling out new applications. "We're going to use this infrastructure for lots of other application workloads as time goes on. It's not just data science; it's this style of application that needs the dynamism," says Francis. "Is it the right place to move our core production processes onto? It might be. We're not at a point where we can say yes or no, but we felt that having real production experience with something like Kubernetes at some form and scale would allow us to understand that. I think we're 6-12 months away from making a [large scale] decision. We need to gain experience of running the system in production, we need to understand failure modes and how best to manage operational issues."

+ +

For other big companies considering a project like this, Francis says commitment and dedication are key: "We got the signoff from [senior management] from day one, with the commitment that we were able to get the right people. If I had to isolate what makes something complex like this succeed, I would say senior hands-on people who can actually drive it make a huge difference." With that in place, he adds, "My message to other enterprises like us is you can actually integrate Kubernetes into an existing, well-orchestrated machinery. You don't have to throw out everything you do. And using Kubernetes made a complex problem significantly easier."

diff --git a/content/en/case-studies/booking-com/index.html b/content/en/case-studies/booking-com/index.html index 99369a2bf9..74d5b282b1 100644 --- a/content/en/case-studies/booking-com/index.html +++ b/content/en/case-studies/booking-com/index.html @@ -3,115 +3,84 @@ title: Booking.com Case Study linkTitle: Booking.com case_study_styles: true cid: caseStudies -css: /css/case-studies-gradient.css logo: booking.com_featured_logo.png featured: true weight: 3 quote: > We realized that we needed to learn Kubernetes better in order to fully use the potential of it. At that point, we made the shift to build our own Kubernetes platform. + +new_case_study_styles: true +heading_background: /images/case-studies/booking/banner1.jpg +heading_title_text: Booking.com +use_gradient_overlay: true +subheading: > + After Learning the Ropes with a Kubernetes Distribution, Booking.com Built a Platform of Its Own +case_study_details: + - Company: Booking.com + - Location: Netherlands + - Industry: Travel --- -​ +

Challenge

+

In 2016, Booking.com migrated to an OpenShift platform, which gave product developers faster access to infrastructure. But because Kubernetes was abstracted away from the developers, the infrastructure team became a "knowledge bottleneck" when challenges arose. Trying to scale that support wasn't sustainable.

-
-

CASE STUDY: Booking.com

-
After Learning the Ropes with a Kubernetes Distribution, Booking.com Built a Platform of Its Own
-
-​ -​ -
- Company  Booking.com     Location  Netherlands     Industry  Travel -
-​ -
-
-
-
-

Challenge

- In 2016, Booking.com migrated to an OpenShift platform, which gave product developers faster access to infrastructure. But because Kubernetes was abstracted away from the developers, the infrastructure team became a “knowledge bottleneck” when challenges arose. Trying to scale that support wasn’t sustainable. +

Solution

+

After a year operating OpenShift, the platform team decided to build its own vanilla Kubernetes platform—and ask developers to learn some Kubernetes in order to use it. "This is not a magical platform," says Ben Tyler, Principal Developer, B Platform Track. "We're not claiming that you can just use it with your eyes closed. Developers need to do some learning, and we're going to do everything we can to make sure they have access to that knowledge."

-

Solution

- After a year operating OpenShift, the platform team decided to build its own vanilla Kubernetes platform—and ask developers to learn some Kubernetes in order to use it. “This is not a magical platform,” says Ben Tyler, Principal Developer, B Platform Track. “We’re not claiming that you can just use it with your eyes closed. Developers need to do some learning, and we’re going to do everything we can to make sure they have access to that knowledge.” -​

Impact

- Despite the learning curve, there’s been a great uptick in adoption of the new Kubernetes platform. Before containers, creating a new service could take a couple of days if the developers understood Puppet, or weeks if they didn’t. On the new platform, it can take as few as 10 minutes. About 500 new services were built on the platform in the first 8 months. -
-
-
-
-
- “As our users learn Kubernetes and become more sophisticated Kubernetes users, they put pressure on us to provide a better, more native Kubernetes experience, which is great. It’s a super healthy dynamic.” -

- BEN TYLER, PRINCIPAL DEVELOPER, B PLATFORM TRACK AT BOOKING.COM

-
-
-​ -​ -
-
-

Booking.com has a long history with Kubernetes: In 2015, a team at the travel platform prototyped a container platform based on Mesos and Marathon. -

-

Impressed by what the technology offered, but in need of enterprise features at its scale—the site handles more than 1.5 million room-night reservations a day on average—the team decided to adopt an OpenShift platform.

+

Despite the learning curve, there's been a great uptick in adoption of the new Kubernetes platform. Before containers, creating a new service could take a couple of days if the developers understood Puppet, or weeks if they didn't. On the new platform, it can take as few as 10 minutes. About 500 new services were built on the platform in the first 8 months.

-

This platform, which was wrapped in a Heroku-style, high-level CLI interface, “was definitely popular with our product developers,” says Ben Tyler, Principal Developer, B Platform Track. “We gave them faster access to infrastructure.”

+{{< case-studies/quote + image="/images/case-studies/booking/banner2.jpg" + author="BEN TYLER, PRINCIPAL DEVELOPER, B PLATFORM TRACK AT BOOKING.COM" +>}} +"As our users learn Kubernetes and become more sophisticated Kubernetes users, they put pressure on us to provide a better, more native Kubernetes experience, which is great. It's a super healthy dynamic." +{{< /case-studies/quote >}} -

But, he adds, “anytime something went slightly off the rails, developers didn’t have any of the knowledge required to support themselves.”

+​{{< case-studies/lead >}} +Booking.com has a long history with Kubernetes: In 2015, a team at the travel platform prototyped a container platform based on Mesos and Marathon. +{{< /case-studies/lead >}} -

And after a year of operating this platform, the infrastructure team found that it had become “a knowledge bottleneck,” he says. “Most of the developers who used it did not know it was Kubernetes underneath. An application failure and a platform failure both looked like failures of that Heroku-style tool.”

-

- Scaling the necessary support did not seem feasible or sustainable, so the platform team needed a new solution. The understanding of Kubernetes that they had gained operating the OpenShift platform gave them confidence to build a vanilla Kubernetes platform of their own and customize it to suit the company’s needs.

+

Impressed by what the technology offered, but in need of enterprise features at its scale—the site handles more than 1.5 million room-night reservations a day on average—the team decided to adopt an OpenShift platform.

+

This platform, which was wrapped in a Heroku-style, high-level CLI interface, "was definitely popular with our product developers," says Ben Tyler, Principal Developer, B Platform Track. "We gave them faster access to infrastructure."

-
-
-​ -
-
- “For entering the landscape, OpenShift was definitely very helpful. It shows you what the technology can do, and it makes it easy for you to use it. After we spent some time on it, we realized that we needed to learn Kubernetes better in order to fully use the potential of it. At that point, we made the shift to build our own Kubernetes platform. We definitely benefit in the long term for taking that step and investing the time in gaining that knowledge.”

- EDUARD IACOBOAIA, SENIOR SYSTEM ADMINISTRATOR, B PLATFORM TRACK AT BOOKING.COM

-
-
-​ -
-
-

- “For entering the landscape, OpenShift was definitely very helpful,” says Eduard Iacoboaia, Senior System Administrator, B Platform Track. “It shows you what the technology can do, and it makes it easy for you to use it. After we spent some time on it, we realized that we needed to learn Kubernetes better in order to fully use the potential of it. At that point, we made the shift to build our own Kubernetes platform. We definitely benefit in the long term for taking that step and investing the time in gaining that knowledge.”

-

- Iacoboaia’s team had customized a lot of OpenShift tools to make them work at Booking.com, and “those integrations points were kind of fragile,” he says. “We spent much more time understanding all the components of Kubernetes, how they work, how they interact with each other.” That research led the team to switch from OpenShift’s built-in Ansible playbooks to Puppet deployments, which are used for the rest of Booking’s infrastructure. The control plane was also moved from inside the cluster onto bare metal, as the company runs tens of thousands of bare-metal servers and a large infrastructure for running applications on bare metal. (Booking runs Kubernetes in multiple clusters in multiple data centers across the various regions where it has compute.) “We decided to keep it as simple as possible and to also use the tools that we know best,” says Iacoboaia.

-

- The other big change was that product engineers would have to learn Kubernetes in order to onboard. “This is not a magical platform,” says Tyler. “We’re not claiming that you can just use it with your eyes closed. Developers need to do some learning, and we’re going to do everything we can to make sure they have access to that knowledge.” That includes trainings, blog posts, videos, and Udemy courses.

-

- Despite the learning curve, there’s been a great uptick in adoption of the new Kubernetes platform. “I think the reason we’ve been able to strike this bargain successfully is that we’re not asking them to learn a proprietary app system,” says Tyler. “We’re asking them to learn something that’s open source, where the knowledge is transferable. They’re investing in their own careers by learning Kubernetes.”

-

- One clear sign that this strategy has been a success is that in the support channel, when users have questions, other product engineers are jumping in to respond. “I haven’t seen that kind of community engagement around a particular platform product internally before,” says Tyler. “It helps a lot that it’s visibly an ecosystem standard outside of the company, so people feel value in investing in that knowledge and sharing it with others, which is really, really powerful.”

+

But, he adds, "anytime something went slightly off the rails, developers didn't have any of the knowledge required to support themselves."

+

And after a year of operating this platform, the infrastructure team found that it had become "a knowledge bottleneck," he says. "Most of the developers who used it did not know it was Kubernetes underneath. An application failure and a platform failure both looked like failures of that Heroku-style tool."

-
-
-​ -​ -
-
- “We have a tutorial. You follow the tutorial. Your code is running. Then, it’s business-logic time. The time to gain access to resources is decreased enormously.”

- BEN TYLER, PRINCIPAL DEVELOPER, B PLATFORM TRACK AT BOOKING.COM

-
-
-​ -
-
-

- There’s other quantifiable evidence too: Before containers, creating a new service could take a couple of days if the developers understood Puppet, or weeks if they didn’t. On the new platform, it takes 10 minutes. “We have a tutorial. You follow the tutorial. Your code is running. Then, it’s business-logic time,” says Tyler. “The time to gain access to resources is decreased enormously.” About 500 new services were built in the first 8 months on the platform, with hundreds of releases per day.

-

- The platform offers different “layers of contracts, so to speak,” says Tyler. “At the very base, it’s just Kubernetes. If you’re a pro Kubernetes user, here’s a Kubernetes API, just like you get from GKE or AKS. We’re trying to be a provider on that same level. But our whole job inside the company is to be a bigger value add than just vanilla infrastructure, so we provide a set of base images for our main stacks, Perl and Java.” -

-

- And “as our users learn Kubernetes and become more sophisticated Kubernetes users, they put pressure on us to provide a better more native Kubernetes experience, which is great,” says Tyler. “It’s a super healthy dynamic.”

-

- The platform also includes other CNCF technologies, such as Envoy, Helm, and Prometheus. Most of the critical service traffic for Booking.com is routed through Envoy, and Prometheus is used primarily to monitor infrastructure components. Helm is consumed as a packaging standard. The team also developed and open sourced Shipper, an extension for Kubernetes to add more complex rollout strategies and multi-cluster orchestration. -

-

- To be sure, there have been internal discussions about the wisdom of building a Kubernetes platform from the ground up. “This is not really our core competency—Kubernetes and travel, they’re kind of far apart, right?” says Tyler. “But we’ve made a couple of bets on CNCF components that have worked out really well for us. Envoy and Kubernetes, in particular, have been really beneficial to our organization. We were able to customize them, either because we could look at the source code or because they had extension points, and we were able to get value out of them very quickly without having to change any paradigms internally.” -

-
-
- +

Scaling the necessary support did not seem feasible or sustainable, so the platform team needed a new solution. The understanding of Kubernetes that they had gained operating the OpenShift platform gave them confidence to build a vanilla Kubernetes platform of their own and customize it to suit the company's needs.

+ +{{< case-studies/quote author="EDUARD IACOBOAIA, SENIOR SYSTEM ADMINISTRATOR, B PLATFORM TRACK AT BOOKING.COM" >}} +"For entering the landscape, OpenShift was definitely very helpful. It shows you what the technology can do, and it makes it easy for you to use it. After we spent some time on it, we realized that we needed to learn Kubernetes better in order to fully use the potential of it. At that point, we made the shift to build our own Kubernetes platform. We definitely benefit in the long term for taking that step and investing the time in gaining that knowledge." +{{< /case-studies/quote >}} + +

"For entering the landscape, OpenShift was definitely very helpful," says Eduard Iacoboaia, Senior System Administrator, B Platform Track. "It shows you what the technology can do, and it makes it easy for you to use it. After we spent some time on it, we realized that we needed to learn Kubernetes better in order to fully use the potential of it. At that point, we made the shift to build our own Kubernetes platform. We definitely benefit in the long term for taking that step and investing the time in gaining that knowledge."

+ +

Iacoboaia's team had customized a lot of OpenShift tools to make them work at Booking.com, and "those integrations points were kind of fragile," he says. "We spent much more time understanding all the components of Kubernetes, how they work, how they interact with each other." That research led the team to switch from OpenShift's built-in Ansible playbooks to Puppet deployments, which are used for the rest of Booking's infrastructure. The control plane was also moved from inside the cluster onto bare metal, as the company runs tens of thousands of bare-metal servers and a large infrastructure for running applications on bare metal. (Booking runs Kubernetes in multiple clusters in multiple data centers across the various regions where it has compute.) "We decided to keep it as simple as possible and to also use the tools that we know best," says Iacoboaia.

+ +

The other big change was that product engineers would have to learn Kubernetes in order to onboard. "This is not a magical platform," says Tyler. "We're not claiming that you can just use it with your eyes closed. Developers need to do some learning, and we're going to do everything we can to make sure they have access to that knowledge." That includes trainings, blog posts, videos, and Udemy courses.

+ +

Despite the learning curve, there's been a great uptick in adoption of the new Kubernetes platform. "I think the reason we've been able to strike this bargain successfully is that we're not asking them to learn a proprietary app system," says Tyler. "We're asking them to learn something that's open source, where the knowledge is transferable. They're investing in their own careers by learning Kubernetes."

+ +

One clear sign that this strategy has been a success is that in the support channel, when users have questions, other product engineers are jumping in to respond. "I haven't seen that kind of community engagement around a particular platform product internally before," says Tyler. "It helps a lot that it's visibly an ecosystem standard outside of the company, so people feel value in investing in that knowledge and sharing it with others, which is really, really powerful."

+ +{{< case-studies/quote + image="/images/case-studies/booking/banner3.jpg" + author="BEN TYLER, PRINCIPAL DEVELOPER, B PLATFORM TRACK AT BOOKING.COM" +>}} +"We have a tutorial. You follow the tutorial. Your code is running. Then, it's business-logic time. The time to gain access to resources is decreased enormously." +{{< /case-studies/quote >}} + +

There's other quantifiable evidence too: Before containers, creating a new service could take a couple of days if the developers understood Puppet, or weeks if they didn't. On the new platform, it takes 10 minutes. "We have a tutorial. You follow the tutorial. Your code is running. Then, it's business-logic time," says Tyler. "The time to gain access to resources is decreased enormously." About 500 new services were built in the first 8 months on the platform, with hundreds of releases per day.

+ +

The platform offers different "layers of contracts, so to speak," says Tyler. "At the very base, it's just Kubernetes. If you're a pro Kubernetes user, here's a Kubernetes API, just like you get from GKE or AKS. We're trying to be a provider on that same level. But our whole job inside the company is to be a bigger value add than just vanilla infrastructure, so we provide a set of base images for our main stacks, Perl and Java."

+ +

And "as our users learn Kubernetes and become more sophisticated Kubernetes users, they put pressure on us to provide a better more native Kubernetes experience, which is great," says Tyler. "It's a super healthy dynamic."

+ +

The platform also includes other CNCF technologies, such as Envoy, Helm, and Prometheus. Most of the critical service traffic for Booking.com is routed through Envoy, and Prometheus is used primarily to monitor infrastructure components. Helm is consumed as a packaging standard. The team also developed and open sourced Shipper, an extension for Kubernetes to add more complex rollout strategies and multi-cluster orchestration.

+ +

To be sure, there have been internal discussions about the wisdom of building a Kubernetes platform from the ground up. "This is not really our core competency—Kubernetes and travel, they're kind of far apart, right?" says Tyler. "But we've made a couple of bets on CNCF components that have worked out really well for us. Envoy and Kubernetes, in particular, have been really beneficial to our organization. We were able to customize them, either because we could look at the source code or because they had extension points, and we were able to get value out of them very quickly without having to change any paradigms internally."

diff --git a/content/en/case-studies/booz-allen/index.html b/content/en/case-studies/booz-allen/index.html index fdda5e976a..7b53dc01ae 100644 --- a/content/en/case-studies/booz-allen/index.html +++ b/content/en/case-studies/booz-allen/index.html @@ -3,101 +3,78 @@ title: Booz Allen Case Study linkTitle: Booz Allen Hamilton case_study_styles: true cid: caseStudies -css: /css/case-studies-gradient.css logo: booz-allen-featured-logo.svg featured: true weight: 2 quote: > - Kubernetes is a great solution for us. It allows us to rapidly iterate on our clients’ demands. + Kubernetes is a great solution for us. It allows us to rapidly iterate on our clients' demands. + +new_case_study_styles: true +heading_background: /images/case-studies/booz-allen/banner4.jpg +heading_title_text: Booz Allen Hamilton +use_gradient_overlay: true +subheading: > + How Booz Allen Hamilton Is Helping Modernize the Federal Government with Kubernetes +case_study_details: + - Company: Booz Allen Hamilton + - Location: United States + - Industry: Government --- -​ +

Challenge

-
-

CASE STUDY: Booz Allen Hamilton

-
How Booz Allen Hamilton Is Helping Modernize the Federal Government with Kubernetes
-
-​ -​ -
- Company  Booz Allen Hamilton     Location  United States     Industry  Government -
-​ -
-
-
-
-

Challenge

- In 2017, Booz Allen Hamilton’s Strategic Innovation Group worked with the federal government to relaunch the decade-old recreation.gov website, which provides information and real-time booking for more than 100,000 campsites and facilities on federal lands across the country. The infrastructure needed to be agile, reliable, and scalable—as well as repeatable for the other federal agencies that are among Booz Allen Hamilton’s customers. +

In 2017, Booz Allen Hamilton's Strategic Innovation Group worked with the federal government to relaunch the decade-old recreation.gov website, which provides information and real-time booking for more than 100,000 campsites and facilities on federal lands across the country. The infrastructure needed to be agile, reliable, and scalable—as well as repeatable for the other federal agencies that are among Booz Allen Hamilton's customers.

+ +

Solution

+ +

"The only way that we thought we could be successful with this problem across all the different agencies is to create a microservice architecture and containers, so that we could be very dynamic and very agile to any given agency for whatever requirements that they may have," says Booz Allen Hamilton Senior Lead Technologist Martin Folkoff. To meet those requirements, Folkoff's team looked to Kubernetes for orchestration.

-

Solution

- “The only way that we thought we could be successful with this problem across all the different agencies is to create a microservice architecture and containers, so that we could be very dynamic and very agile to any given agency for whatever requirements that they may have,” says Booz Allen Hamilton Senior Lead Technologist Martin Folkoff. To meet those requirements, Folkoff’s team looked to Kubernetes for orchestration. -​

Impact

- With the recreation.gov Kubernetes platform, changes can be implemented in about 30 minutes, compared to the multiple hours or even days legacy government applications require to review the code, get approval, and deploy the fix. Recreation.gov deploys to production on average 10 times a day. With monitoring, security, and logging built in, developers can create and publish new services to production within a week. Additionally, Folkoff says, “supporting the large, existing monoliths in the government is extremely expensive,” and migrating into a more modern platform has resulted in perhaps 50% cost savings. -
-
-
-
-
- "When there’s a regulatory change in an agency, or a legislative change in Congress, or an executive order that changes the way you do business, how do I deploy that and get that out to the people who need it rapidly? At the end of the day, that’s the problem we’re trying to help the government solve with tools like Kubernetes." -

- JOSH BOYD, CHIEF TECHNOLOGIST AT BOOZ ALLEN HAMILTON

-
-
-​ -​ -
-
-

The White House launched an IT modernization effort in 2017, and in addition to improving cybersecurity and shifting to the public cloud and a consolidated IT model, “the federal government is looking to provide a better experience to citizens in every way that we interact with the government through every channel,” says Booz Allen Hamilton Senior Lead Technologist Martin Folkoff.

-

To that end, Folkoff’s Strategic Innovation Group worked with the federal government last year to relaunch the decade-old recreation.gov website, which provides information and real-time booking for more than 100,000 campsites and facilities on federal lands across the country.

+

With the recreation.gov Kubernetes platform, changes can be implemented in about 30 minutes, compared to the multiple hours or even days legacy government applications require to review the code, get approval, and deploy the fix. Recreation.gov deploys to production on average 10 times a day. With monitoring, security, and logging built in, developers can create and publish new services to production within a week. Additionally, Folkoff says, "supporting the large, existing monoliths in the government is extremely expensive," and migrating into a more modern platform has resulted in perhaps 50% cost savings.

-

The infrastructure needed to be agile, reliable, and scalable—as well as repeatable for the other federal agencies that are among Booz Allen Hamilton’s customers. “The only way that we thought we could be successful with this problem across all the different agencies is to create a microservice architecture, so that we could be very dynamic and very agile to any given agency for whatever requirements that they may have,” says Folkoff.

-
-
+{{< case-studies/quote + image="/images/case-studies/booz-allen/banner2.jpg" + author="JOSH BOYD, CHIEF TECHNOLOGIST AT BOOZ ALLEN HAMILTON" +>}} +"When there's a regulatory change in an agency, or a legislative change in Congress, or an executive order that changes the way you do business, how do I deploy that and get that out to the people who need it rapidly? At the end of the day, that's the problem we're trying to help the government solve with tools like Kubernetes." +{{< /case-studies/quote >}} ​ -
-
- "With CNCF, there’s a lot of focus on scale, and so there’s a lot of comfort knowing that as the project grows, we’re going to be comfortable using that tool set."

- MARTIN FOLKOFF, SENIOR LEAD TECHNOLOGIST AT BOOZ ALLEN HAMILTON

-
-
+​{{< case-studies/lead >}} +The White House launched an IT modernization effort in 2017, and in addition to improving cybersecurity and shifting to the public cloud and a consolidated IT model, "the federal government is looking to provide a better experience to citizens in every way that we interact with the government through every channel," says Booz Allen Hamilton Senior Lead Technologist Martin Folkoff. +{{< /case-studies/lead >}} + +

To that end, Folkoff's Strategic Innovation Group worked with the federal government last year to relaunch the decade-old recreation.gov website, which provides information and real-time booking for more than 100,000 campsites and facilities on federal lands across the country.

+ +

The infrastructure needed to be agile, reliable, and scalable—as well as repeatable for the other federal agencies that are among Booz Allen Hamilton's customers. "The only way that we thought we could be successful with this problem across all the different agencies is to create a microservice architecture, so that we could be very dynamic and very agile to any given agency for whatever requirements that they may have," says Folkoff.

+ +{{< case-studies/quote author="MARTIN FOLKOFF, SENIOR LEAD TECHNOLOGIST AT BOOZ ALLEN HAMILTON" >}} +"With CNCF, there's a lot of focus on scale, and so there's a lot of comfort knowing that as the project grows, we're going to be comfortable using that tool set." +{{< /case-studies/quote >}} + +

Booz Allen Hamilton, which has provided consulting services to the federal government for more than a century, introduced microservices, Docker containers, and AWS to its federal agency clients about five years ago. The next logical step was Kubernetes for orchestration. "Knowing that we had to be really agile and really reliable and scalable, we felt that the only technology that we know that can enable those kinds of things are the ones the CNCF provides," Folkoff says. "One of the things that is always important for the government is to make sure that the things that we build really endure. Using technology that is supported across multiple different companies and has strong governance gives people a lot of confidence."

+ +

Kubernetes was also aligned with the government's open source and IT modernization initiatives, so there has been an uptick in its usage at federal agencies over the past two years. "Now that Kubernetes is becoming offered as a service by the cloud providers like AWS and Microsoft, we're starting to see even more interest," says Chief Technologist Josh Boyd. Adds Folkoff: "With CNCF, there's a lot of focus on scale, and so there's a lot of comfort knowing that as the project grows, we're going to be comfortable using that tool set."

+ +

The greenfield recreation.gov project allowed the team to build a new Kubernetes-enabled site running on AWS, and the migration lasted only a week, when the old site didn't take bookings. "For the actual transition, we just swapped a DNS server, and it only took about 35 seconds between the old site being down and our new site being up and available," Folkoff adds.

​ -
-
-

- Booz Allen Hamilton, which has provided consulting services to the federal government for more than a century, introduced microservices, Docker containers, and AWS to its federal agency clients about five years ago. The next logical step was Kubernetes for orchestration. “Knowing that we had to be really agile and really reliable and scalable, we felt that the only technology that we know that can enable those kinds of things are the ones the CNCF provides,” Folkoff says. “One of the things that is always important for the government is to make sure that the things that we build really endure. Using technology that is supported across multiple different companies and has strong governance gives people a lot of confidence.”

-

- Kubernetes was also aligned with the government’s open source and IT modernization initiatives, so there has been an uptick in its usage at federal agencies over the past two years. “Now that Kubernetes is becoming offered as a service by the cloud providers like AWS and Microsoft, we’re starting to see even more interest,” says Chief Technologist Josh Boyd. Adds Folkoff: “With CNCF, there’s a lot of focus on scale, and so there’s a lot of comfort knowing that as the project grows, we’re going to be comfortable using that tool set.”

-

- The greenfield recreation.gov project allowed the team to build a new Kubernetes-enabled site running on AWS, and the migration lasted only a week, when the old site didn’t take bookings. “For the actual transition, we just swapped a DNS server, and it only took about 35 seconds between the old site being down and our new site being up and available,” Folkoff adds.

-

-
-
-​ -​ -
-
- "Kubernetes alone enables a dramatic reduction in cost as resources are prioritized to the day’s event"

- MARTIN FOLKOFF, SENIOR LEAD TECHNOLOGIST AT BOOZ ALLEN HAMILTON

-
-
-​ -
-
-

- In addition to its work with the Department of Interior for recreation.gov, Booz Allen Hamilton has brought Kubernetes to various Defense, Intelligence, and civilian agencies. Says Boyd: “When there’s a regulatory change in an agency, or a legislative change in Congress, or an executive order that changes the way you do business, how do I deploy that and get that out to the people who need it rapidly? At the end of the day, that’s the problem we’re trying to help the government solve with tools like Kubernetes.”

-

- For recreation.gov, the impact was clear and immediate. With the Kubernetes platform, Folkoff says, “if a new requirement for a permit comes out, we have the ability to design and develop and implement that completely independently of reserving a campsite. It provides a much better experience to users.” Today, changes can be implemented in about 30 minutes, compared to the multiple hours or even days legacy government applications require to review the code, get approval, and deploy the fix. Recreation.gov deploys to production on average 10 times a day.

-

- Developer velocity has been improved. “When I want to do monitoring or security or logging, I don’t have to do anything to my services or my application to enable that anymore,” says Boyd. “I get all of this magic just by being on the Kubernetes platform.” With all of those things built in, developers can create and publish new services to production within one week.

-

- Additionally, Folkoff says, “supporting the large, existing monoliths in the government is extremely expensive,” and migrating into a more modern platform has resulted in perhaps 50% cost savings. “Kubernetes alone enables a dramatic reduction in cost as resources are prioritized to the day’s event,” he says. “For example, during a popular campsite release, camping-related services are scaled out while permit services are scaled down.”

-

- So far, “Kubernetes is a great solution for us,” says Folkoff. “It allows us to rapidly iterate on our clients’ demands.” Looking ahead, the team sees further adoption of the Kubernetes platform across federal agencies. Says Boyd: “You get the ability for the rapid delivery of business value for your customers. You now have observability into everything that you’re doing. You don’t have these onesies and twosies unicorn servers anymore. Now everything that you deploy is deployed in the same way, it’s all instrumented the same way, and it’s all built and deployed the same way through our CI/CD processes.”

-

- They also see a push toward re-platforming. “There’s still a lot of legacy workloads out there,” says Boyd. “We’ve got the new challenges of greenfield development and integration with legacy systems, but also that brown field of ‘Hey, how do I take this legacy monolith and get it onto a platform where now it’s instrumented with all the magic of the Kubernetes platform without having to do a whole lot to my application?’ I think re-platforming is a pretty big use case for the government right now.”

-

- And given the success that they’ve had with Kubernetes so far, Boyd says, “I think at this point that technology is becoming pretty easy to sell.” Adds Folkoff: “People are really excited about being able to deploy, scale, be reliable, and do cheaper maintenance of all of this.” -

-
-
- +{{< case-studies/quote + image="/images/case-studies/booz-allen/banner1.png" + author="MARTIN FOLKOFF, SENIOR LEAD TECHNOLOGIST AT BOOZ ALLEN HAMILTON" +>}} +"Kubernetes alone enables a dramatic reduction in cost as resources are prioritized to the day's event" +{{< /case-studies/quote >}} + +

In addition to its work with the Department of Interior for recreation.gov, Booz Allen Hamilton has brought Kubernetes to various Defense, Intelligence, and civilian agencies. Says Boyd: "When there's a regulatory change in an agency, or a legislative change in Congress, or an executive order that changes the way you do business, how do I deploy that and get that out to the people who need it rapidly? At the end of the day, that's the problem we're trying to help the government solve with tools like Kubernetes."

+ +

For recreation.gov, the impact was clear and immediate. With the Kubernetes platform, Folkoff says, "if a new requirement for a permit comes out, we have the ability to design and develop and implement that completely independently of reserving a campsite. It provides a much better experience to users." Today, changes can be implemented in about 30 minutes, compared to the multiple hours or even days legacy government applications require to review the code, get approval, and deploy the fix. Recreation.gov deploys to production on average 10 times a day.

+ +

Developer velocity has been improved. "When I want to do monitoring or security or logging, I don't have to do anything to my services or my application to enable that anymore," says Boyd. "I get all of this magic just by being on the Kubernetes platform." With all of those things built in, developers can create and publish new services to production within one week.

+ +

Additionally, Folkoff says, "supporting the large, existing monoliths in the government is extremely expensive," and migrating into a more modern platform has resulted in perhaps 50% cost savings. "Kubernetes alone enables a dramatic reduction in cost as resources are prioritized to the day's event," he says. "For example, during a popular campsite release, camping-related services are scaled out while permit services are scaled down."

+ +

So far, "Kubernetes is a great solution for us," says Folkoff. "It allows us to rapidly iterate on our clients' demands." Looking ahead, the team sees further adoption of the Kubernetes platform across federal agencies. Says Boyd: "You get the ability for the rapid delivery of business value for your customers. You now have observability into everything that you're doing. You don't have these onesies and twosies unicorn servers anymore. Now everything that you deploy is deployed in the same way, it's all instrumented the same way, and it's all built and deployed the same way through our CI/CD processes."

+ +

They also see a push toward re-platforming. "There's still a lot of legacy workloads out there," says Boyd. "We've got the new challenges of greenfield development and integration with legacy systems, but also that brown field of 'Hey, how do I take this legacy monolith and get it onto a platform where now it's instrumented with all the magic of the Kubernetes platform without having to do a whole lot to my application?' I think re-platforming is a pretty big use case for the government right now."

+ +

And given the success that they've had with Kubernetes so far, Boyd says, "I think at this point that technology is becoming pretty easy to sell." Adds Folkoff: "People are really excited about being able to deploy, scale, be reliable, and do cheaper maintenance of all of this."

diff --git a/content/en/case-studies/bose/index.html b/content/en/case-studies/bose/index.html index c77f416c13..70deb6ce68 100644 --- a/content/en/case-studies/bose/index.html +++ b/content/en/case-studies/bose/index.html @@ -3,101 +3,85 @@ title: Bose Case Study linkTitle: Bose case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css logo: bose_featured_logo.png featured: false weight: 2 quote: > - The CNCF Landscape quickly explains what’s going on in all the different areas from storage to cloud providers to automation and so forth. This is our shopping cart to build a cloud infrastructure. We can go choose from the different aisles. + The CNCF Landscape quickly explains what's going on in all the different areas from storage to cloud providers to automation and so forth. This is our shopping cart to build a cloud infrastructure. We can go choose from the different aisles. + +new_case_study_styles: true +heading_background: /images/case-studies/bose/banner1.jpg +heading_title_logo: /images/bose_logo.png +subheading: > + Bose: Supporting Rapid Development for Millions of IoT Products With Kubernetes +case_study_details: + - Company: Bose Corporation + - Location: Framingham, Massachusetts + - Industry: Consumer Electronics --- -
-

CASE STUDY:
Bose: Supporting Rapid Development for Millions of IoT Products With Kubernetes +

Challenge

-

+

A household name in high-quality audio equipment, Bose has offered connected products for more than five years, and as that demand grew, the infrastructure had to change to support it. "We needed to provide a mechanism for developers to rapidly prototype and deploy services all the way to production pretty fast," says Lead Cloud Engineer Josh West. In 2016, the company decided to start building a platform from scratch. The primary goal: "To be one to two steps ahead of the different product groups so that we are never scrambling to catch up with their scale," says Cloud Architecture Manager Dylan O'Mahony.

-
+

Solution

-
- Company  Bose Corporation     Location  Framingham, Massachusetts -     Industry  Consumer Electronics -
+

From the beginning, the team knew it wanted a microservices architecture. After evaluating and prototyping a couple of orchestration solutions, the team decided to adopt Kubernetes for its scaled IoT Platform-as-a-Service running on AWS. The platform, which also incorporated Prometheus monitoring, launched in production in 2017, serving over 3 million connected products from the get-go. Bose has since adopted a number of other CNCF technologies, including Fluentd, CoreDNS, Jaeger, and OpenTracing.

-
-
-
-
-

Challenge

- A household name in high-quality audio equipment, Bose has offered connected products for more than five years, and as that demand grew, the infrastructure had to change to support it. "We needed to provide a mechanism for developers to rapidly prototype and deploy services all the way to production pretty fast,” says Lead Cloud Engineer Josh West. In 2016, the company decided to start building a platform from scratch. The primary goal: "To be one to two steps ahead of the different product groups so that we are never scrambling to catch up with their scale,” says Cloud Architecture Manager Dylan O’Mahony. -

-

Solution

- From the beginning, the team knew it wanted a microservices architecture. After evaluating and prototyping a couple of orchestration solutions, the team decided to adopt Kubernetes for its scaled IoT Platform-as-a-Service running on AWS. The platform, which also incorporated Prometheus monitoring, launched in production in 2017, serving over 3 million connected products from the get-go. Bose has since adopted a number of other CNCF technologies, including Fluentd, CoreDNS, Jaeger, and OpenTracing. -

Impact

- With about 100 engineers onboarded, the platform is now enabling 30,000 non-production deployments across dozens of microservices per year. In 2018, there were 1250+ production deployments. Just one production cluster holds 1,800 namespaces and 340 worker nodes. "We had a brand new service taken from concept through coding and deployment all the way to production, including hardening, security testing and so forth, in less than two and a half weeks,” says O’Mahony. -
-
-
-
-
- "At Bose we’re building an IoT platform that has enabled our physical products. If it weren’t for Kubernetes and the rest of the CNCF projects being free open source software with such a strong community, we would never have achieved scale, or even gotten to launch on schedule." -

- Josh West, Lead Cloud Engineer, Bose
-
-
-
-
-

A household name in high-quality audio equipment, Bose has offered connected products for more than five years, and as that demand grew, the infrastructure had to change to support it.

- "We needed to provide a mechanism for developers to rapidly prototype and deploy services all the way to production pretty fast,” says Lead Cloud Engineer Josh West. "There were a lot of cloud capabilities we wanted to provide to support our audio equipment and experiences.”

-In 2016, the company decided to start building an IoT platform from scratch. The primary goal: "To be one to two steps ahead of the different product groups so that we are never scrambling to catch up with their scale,” says Cloud Architecture Manager Dylan O’Mahony. "If they release a new connected product, we want to be already well ahead of being able to handle whatever scale that they’re going to throw at us.”

-From the beginning, the team knew it wanted a microservices architecture and platform as a service. After evaluating and prototyping orchestration solutions, including Mesos and Docker Swarm, the team decided to adopt Kubernetes for its platform running on AWS. Kubernetes was still in 1.5, but already the technology could do much of what the team wanted and needed for the present and the future. For West, that meant having storage and network handled. O’Mahony points to Kubernetes’ portability in case Bose decides to go multi-cloud.

-"Bose is a company that looks out for the long term,” says West. "Going with a quick commercial off-the-shelf solution might’ve worked for that point in time, but it would not have carried us forward, which is what we needed from Kubernetes and the CNCF.” +

With about 100 engineers onboarded, the platform is now enabling 30,000 non-production deployments across dozens of microservices per year. In 2018, there were 1250+ production deployments. Just one production cluster holds 1,800 namespaces and 340 worker nodes. "We had a brand new service taken from concept through coding and deployment all the way to production, including hardening, security testing and so forth, in less than two and a half weeks," says O'Mahony.

+{{< case-studies/quote author="Josh West, Lead Cloud Engineer, Bose" >}} +"At Bose we're building an IoT platform that has enabled our physical products. If it weren't for Kubernetes and the rest of the CNCF projects being free open source software with such a strong community, we would never have achieved scale, or even gotten to launch on schedule." +{{< /case-studies/quote >}} -
-
-
-
- "Everybody on the team thinks in terms of automation, leaning out the processes, getting things done as quickly as possible. When you step back and look at what it means for a 50-plus-year-old speaker company to have that sort of culture, it really is quite incredible, and I think the tools that we use and the foundation that we’ve built with them is a huge piece of that."

- Dylan O’Mahony, Cloud Architecture Manager, Bose
+{{< case-studies/lead >}} +A household name in high-quality audio equipment, Bose has offered connected products for more than five years, and as that demand grew, the infrastructure had to change to support it. +{{< /case-studies/lead >}} -
-
-
-
- The team spent time working on choosing tooling to make the experience easier for developers. "Our developers interact with tools provided by our Ops team, and the Ops team run all of their tooling on top of Kubernetes,” says O’Mahony. "We try not to make direct Kubernetes access the only way. In fact, ideally, our developers wouldn’t even need to know that they’re running on Kubernetes.”

- The platform, which also incorporated Prometheus monitoring from the beginning, backdoored its way into production in 2017, serving over 3 million connected products from the get-go. "Even though the speakers and the products that we were designing this platform for were still quite a ways away from being launched, we did have some connected speakers on the market,” says O’Mahony. "We basically started to point certain features of those speakers and the apps that go with those speakers to this platform.”

- Today, just one of Bose’s production clusters holds 1,800 namespaces/discrete services and 340 nodes. With about 100 engineers now onboarded, the platform infrastructure is now enabling 30,000 non-production deployments across dozens of microservices per year. In 2018, there were 1250+ production deployments.. It’s a staggering improvement over some of Bose’s previous deployment processes, which supported far fewer deployments and services. +

"We needed to provide a mechanism for developers to rapidly prototype and deploy services all the way to production pretty fast," says Lead Cloud Engineer Josh West. "There were a lot of cloud capabilities we wanted to provide to support our audio equipment and experiences."

-
-
-
-
- "The CNCF Landscape quickly explains what’s going on in all the different areas from storage to cloud providers to automation and so forth. This is our shopping cart to build a cloud infrastructure. We can go choose from the different aisles."

- Josh West, Lead Cloud Engineer, Bose
-
-
+

In 2016, the company decided to start building an IoT platform from scratch. The primary goal: "To be one to two steps ahead of the different product groups so that we are never scrambling to catch up with their scale," says Cloud Architecture Manager Dylan O'Mahony. "If they release a new connected product, we want to be already well ahead of being able to handle whatever scale that they're going to throw at us."

-
-
- "We had a brand new service deployed from concept through coding and deployment all the way to production, including hardening, security testing and so forth, in less than two and a half weeks,” says O’Mahony. "Everybody thinks in terms of automation, leaning out the processes, getting things done as quickly as possible. When you step back and look at what it means for a 50-plus-year-old speaker company to have that sort of culture, it really is quite incredible, and I think the tools that we use and the foundation that we’ve built is a huge piece of that.”

- Many of those technologies—such as Fluentd, CoreDNS, Jaeger, and OpenTracing—come from the CNCF Landscape, which West and O’Mahony have relied upon throughout Bose’s cloud native journey. "The CNCF Landscape quickly explains what’s going on in all the different areas from storage to cloud providers to automation and so forth,” says West. "This is our shopping cart to build a cloud infrastructure. We can go choose from the different aisles.”

- And, he adds, "If it weren’t for Kubernetes and the rest of the CNCF projects being free open source software with such a strong community, we would never have achieved scale, or even gotten to launch on schedule.”

- Another benefit of going cloud native: "We are even attracting much more talent into Bose because we’re so involved with the CNCF Landscape,” says West. (Yes, they’re hiring.) "It’s just enabled so many people to do so many great things and really brought Bose into the future of cloud.” +

From the beginning, the team knew it wanted a microservices architecture and platform as a service. After evaluating and prototyping orchestration solutions, including Mesos and Docker Swarm, the team decided to adopt Kubernetes for its platform running on AWS. Kubernetes was still in 1.5, but already the technology could do much of what the team wanted and needed for the present and the future. For West, that meant having storage and network handled. O'Mahony points to Kubernetes' portability in case Bose decides to go multi-cloud.

+

"Bose is a company that looks out for the long term," says West. "Going with a quick commercial off-the-shelf solution might've worked for that point in time, but it would not have carried us forward, which is what we needed from Kubernetes and the CNCF."

-
+{{< case-studies/quote + image="/images/case-studies/bose/banner3.jpg" + author="Dylan O'Mahony, Cloud Architecture Manager, Bose" +>}} +"Everybody on the team thinks in terms of automation, leaning out the processes, getting things done as quickly as possible. When you step back and look at what it means for a 50-plus-year-old speaker company to have that sort of culture, it really is quite incredible, and I think the tools that we use and the foundation that we've built with them is a huge piece of that." +{{< /case-studies/quote >}} -
-
-"We have a lot going on to support many more of our business units at Bose in addition to the consumer electronics division, which we currently do. It’s only because of the cloud native landscape and the tools and the features that are available that we can provide such a fantastic cloud platform for all the developers and divisions that are trying to enable some pretty amazing experiences."

- Dylan O’Mahony, Cloud Architecture Manager, Bose
-
+

The team spent time working on choosing tooling to make the experience easier for developers. "Our developers interact with tools provided by our Ops team, and the Ops team run all of their tooling on top of Kubernetes," says O'Mahony. "We try not to make direct Kubernetes access the only way. In fact, ideally, our developers wouldn't even need to know that they're running on Kubernetes."

-
- In the coming year, the team wants to work on service mesh and serverless, as well as expansion around the world. "Getting our latency down by going multi-region is going to be a big focus for us,” says O’Mahony. "In order to make sure that our customers in Japan, Australia, and everywhere else are having a good experience, we want to have points of presence closer to them. It’s never been done at Bose before.”

- That won’t stop them, because the team is all about lofty goals. "We want to get to billions of connected products!” says West. "We have a lot going on to support many more of our business units at Bose in addition to the consumer electronics division, which we currently do. It’s only because of the cloud native landscape and the tools and the features that are available that we can provide such a fantastic cloud platform for all the developers and divisions that are trying to enable some pretty amazing experiences.”

- In fact, given the scale the platform is already supporting, says O’Mahony, "doing anything other than Kubernetes, I think, would be folly at this point.” +

The platform, which also incorporated Prometheus monitoring from the beginning, backdoored its way into production in 2017, serving over 3 million connected products from the get-go. "Even though the speakers and the products that we were designing this platform for were still quite a ways away from being launched, we did have some connected speakers on the market," says O'Mahony. "We basically started to point certain features of those speakers and the apps that go with those speakers to this platform."

+

Today, just one of Bose's production clusters holds 1,800 namespaces/discrete services and 340 nodes. With about 100 engineers now onboarded, the platform infrastructure is now enabling 30,000 non-production deployments across dozens of microservices per year. In 2018, there were 1250+ production deployments.. It's a staggering improvement over some of Bose's previous deployment processes, which supported far fewer deployments and services.

-
-
- - +{{< case-studies/quote + image="/images/case-studies/bose/banner4.jpg" + author="Josh West, Lead Cloud Engineer, Bose" +>}} +"The CNCF Landscape quickly explains what's going on in all the different areas from storage to cloud providers to automation and so forth. This is our shopping cart to build a cloud infrastructure. We can go choose from the different aisles." +{{< /case-studies/quote >}} + +

"We had a brand new service deployed from concept through coding and deployment all the way to production, including hardening, security testing and so forth, in less than two and a half weeks," says O'Mahony. "Everybody thinks in terms of automation, leaning out the processes, getting things done as quickly as possible. When you step back and look at what it means for a 50-plus-year-old speaker company to have that sort of culture, it really is quite incredible, and I think the tools that we use and the foundation that we've built is a huge piece of that."

+ +

Many of those technologies—such as Fluentd, CoreDNS, Jaeger, and OpenTracing—come from the CNCF Landscape, which West and O'Mahony have relied upon throughout Bose's cloud native journey. "The CNCF Landscape quickly explains what's going on in all the different areas from storage to cloud providers to automation and so forth," says West. "This is our shopping cart to build a cloud infrastructure. We can go choose from the different aisles."

+ +

And, he adds, "If it weren't for Kubernetes and the rest of the CNCF projects being free open source software with such a strong community, we would never have achieved scale, or even gotten to launch on schedule."

+ +

Another benefit of going cloud native: "We are even attracting much more talent into Bose because we're so involved with the CNCF Landscape," says West. (Yes, they're hiring.) "It's just enabled so many people to do so many great things and really brought Bose into the future of cloud."

+ +{{< case-studies/quote author="Dylan O'Mahony, Cloud Architecture Manager, Bose" >}} +"We have a lot going on to support many more of our business units at Bose in addition to the consumer electronics division, which we currently do. It's only because of the cloud native landscape and the tools and the features that are available that we can provide such a fantastic cloud platform for all the developers and divisions that are trying to enable some pretty amazing experiences." +{{< /case-studies/quote >}} + +

In the coming year, the team wants to work on service mesh and serverless, as well as expansion around the world. "Getting our latency down by going multi-region is going to be a big focus for us," says O'Mahony. "In order to make sure that our customers in Japan, Australia, and everywhere else are having a good experience, we want to have points of presence closer to them. It's never been done at Bose before."

+ +

That won't stop them, because the team is all about lofty goals. "We want to get to billions of connected products!" says West. "We have a lot going on to support many more of our business units at Bose in addition to the consumer electronics division, which we currently do. It's only because of the cloud native landscape and the tools and the features that are available that we can provide such a fantastic cloud platform for all the developers and divisions that are trying to enable some pretty amazing experiences."

+ +

In fact, given the scale the platform is already supporting, says O'Mahony, "doing anything other than Kubernetes, I think, would be folly at this point."

diff --git a/content/en/case-studies/box/index.html b/content/en/case-studies/box/index.html index bead8eb01a..058ff7f9a2 100644 --- a/content/en/case-studies/box/index.html +++ b/content/en/case-studies/box/index.html @@ -2,113 +2,99 @@ title: Box Case Study case_study_styles: true cid: caseStudies -css: /css/style_box.css video: https://www.youtube.com/embed/of45hYbkIZs?autoplay=1 quote: > Kubernetes has the opportunity to be the new cloud platform. The amount of innovation that's going to come from being able to standardize on Kubernetes as a platform is incredibly exciting - more exciting than anything I've seen in the last 10 years of working on the cloud. +new_case_study_styles: true +heading_background: /images/case-studies/box/banner1.jpg +heading_title_logo: /images/box_logo.png +subheading: > + An Early Adopter Envisions a New Cloud Platform +case_study_details: + - Company: Box + - Location: Redwood City, California + - Industry: Technology --- -
-

CASE STUDY:
-
An Early Adopter Envisions - a New Cloud Platform
-

-
+

Challenge

+

Founded in 2005, the enterprise content management company allows its more than 50 million users to manage content in the cloud. Box was built primarily with bare metal inside the company's own data centers, with a monolithic PHP code base. As the company was expanding globally, it needed to focus on "how we run our workload across many different cloud infrastructures from bare metal to public cloud," says Sam Ghods, Cofounder and Services Architect of Box. "It's been a huge challenge because of different clouds, especially bare metal, have very different interfaces."

-
- Company  Box     Location  Redwood City, California     Industry  Technology -
+

Solution

-
+

Over the past couple of years, Box has been decomposing its infrastructure into microservices, and became an early adopter of, as well as contributor to, Kubernetes container orchestration. Kubernetes, Ghods says, has allowed Box's developers to "target a universal set of concepts that are portable across all clouds."

-
+

Impact

-
-
+

"Before Kubernetes," Ghods says, "our infrastructure was so antiquated it was taking us more than six months to deploy a new microservice. Today, a new microservice takes less than five days to deploy. And we're working on getting it to an hour."

-

Challenge

- Founded in 2005, the enterprise content management company allows its more than 50 million users to manage content in the cloud. Box was built primarily with bare metal inside the company’s own data centers, with a monolithic PHP code base. As the company was expanding globally, it needed to focus on "how we run our workload across many different cloud infrastructures from bare metal to public cloud," says Sam Ghods, Cofounder and Services Architect of Box. "It’s been a huge challenge because of different clouds, especially bare metal, have very different interfaces." -
-
+{{< case-studies/quote author="SAM GHOUDS, CO-FOUNDER AND SERVICES ARCHITECT OF BOX" >}} +"We looked at a lot of different options, but Kubernetes really stood out....the fact that on day one it was designed to run on bare metal just as well as Google Cloud meant that we could actually migrate to it inside of our data centers, and then use those same tools and concepts to run across public cloud providers as well." +{{< /case-studies/quote >}} -
-

Solution

- Over the past couple of years, Box has been decomposing its infrastructure into microservices, and became an early adopter of, as well as contributor to, Kubernetes container orchestration. Kubernetes, Ghods says, has allowed Box’s developers to "target a universal set of concepts that are portable across all clouds."

+{{< case-studies/lead >}} +In the summer of 2014, Box was feeling the pain of a decade's worth of hardware and software infrastructure that wasn't keeping up with the company's needs. +{{< /case-studies/lead >}} -

Impact

- "Before Kubernetes," Ghods says, "our infrastructure was so antiquated it was taking us more than six months to deploy a new microservice. Today, a new microservice takes less than five days to deploy. And we’re working on getting it to an hour." -
-
+

A platform that allows its more than 50 million users (including governments and big businesses like General Electric) to manage and share content in the cloud, Box was originally a PHP monolith of millions of lines of code built exclusively with bare metal inside of its own data centers. It had already begun to slowly chip away at the monolith, decomposing it into microservices. And "as we've been expanding into regions around the globe, and as the public cloud wars have been heating up, we've been focusing a lot more on figuring out how we run our workload across many different environments and many different cloud infrastructure providers," says Box Cofounder and Services Architect Sam Ghods. "It's been a huge challenge thus far because of all these different providers, especially bare metal, have very different interfaces and ways in which you work with them."

-
+

Box's cloud native journey accelerated that June, when Ghods attended DockerCon. The company had come to the realization that it could no longer run its applications only off bare metal, and was researching containerizing with Docker, virtualizing with OpenStack, and supporting public cloud.

-
-
- "We looked at a lot of different options, but Kubernetes really stood out....the fact that on day one it was designed to run on bare metal just as well as Google Cloud meant that we could actually migrate to it inside of our data centers, and then use those same tools and concepts to run across public cloud providers as well."

- SAM GHOUDS, CO-FOUNDER AND SERVICES ARCHITECT OF BOX -
-
+

At that conference, Google announced the release of its Kubernetes container management system, and Ghods was won over. "We looked at a lot of different options, but Kubernetes really stood out, especially because of the incredibly strong team of Borg veterans and the vision of having a completely infrastructure-agnostic way of being able to run cloud software," he says, referencing Google's internal container orchestrator Borg. "The fact that on day one it was designed to run on bare metal just as well as Google Cloud meant that we could actually migrate to it inside of our data centers, and then use those same tools and concepts to run across public cloud providers as well."

-
+

Another plus: Ghods liked that Kubernetes has a universal set of API objects like pod, service, replica set and deployment object, which created a consistent surface to build tooling against. "Even PaaS layers like OpenShift or Deis that build on top of Kubernetes still treat those objects as first-class principles," he says. "We were excited about having these abstractions shared across the entire ecosystem, which would result in a lot more momentum than we saw in other potential solutions."

-
-

In the summer of 2014, Box was feeling the pain of a decade’s worth of hardware and software infrastructure that wasn’t keeping up with the company’s needs.

+

Box deployed Kubernetes in a cluster in a production data center just six months later. Kubernetes was then still pre-beta, on version 0.11. They started small: The very first thing Ghods's team ran on Kubernetes was a Box API checker that confirms Box is up. "That was just to write and deploy some software to get the whole pipeline functioning," he says. Next came some daemons that process jobs, which was "nice and safe because if they experienced any interruptions, we wouldn't fail synchronous incoming requests from customers."

- A platform that allows its more than 50 million users (including governments and big businesses like General Electric) to manage and share content in the cloud, Box was originally a PHP monolith of millions of lines of code built exclusively with bare metal inside of its own data centers. It had already begun to slowly chip away at the monolith, decomposing it into microservices. And "as we’ve been expanding into regions around the globe, and as the public cloud wars have been heating up, we’ve been focusing a lot more on figuring out how we run our workload across many different environments and many different cloud infrastructure providers," says Box Cofounder and Services Architect Sam Ghods. "It’s been a huge challenge thus far because of all these different providers, especially bare metal, have very different interfaces and ways in which you work with them."

- Box’s cloud native journey accelerated that June, when Ghods attended DockerCon. The company had come to the realization that it could no longer run its applications only off bare metal, and was researching containerizing with Docker, virtualizing with OpenStack, and supporting public cloud.

- At that conference, Google announced the release of its Kubernetes container management system, and Ghods was won over. "We looked at a lot of different options, but Kubernetes really stood out, especially because of the incredibly strong team of Borg veterans and the vision of having a completely infrastructure-agnostic way of being able to run cloud software," he says, referencing Google’s internal container orchestrator Borg. "The fact that on day one it was designed to run on bare metal just as well as Google Cloud meant that we could actually migrate to it inside of our data centers, and then use those same tools and concepts to run across public cloud providers as well."

- Another plus: Ghods liked that Kubernetes has a universal set of API objects like pod, service, replica set and deployment object, which created a consistent surface to build tooling against. "Even PaaS layers like OpenShift or Deis that build on top of Kubernetes still treat those objects as first-class principles," he says. "We were excited about having these abstractions shared across the entire ecosystem, which would result in a lot more momentum than we saw in other potential solutions."

- Box deployed Kubernetes in a cluster in a production data center just six months later. Kubernetes was then still pre-beta, on version 0.11. They started small: The very first thing Ghods’s team ran on Kubernetes was a Box API checker that confirms Box is up. "That was just to write and deploy some software to get the whole pipeline functioning," he says. Next came some daemons that process jobs, which was "nice and safe because if they experienced any interruptions, we wouldn’t fail synchronous incoming requests from customers." +{{< case-studies/quote image="/images/case-studies/box/banner3.jpg">}} +"As we've been expanding into regions around the globe, and as the public cloud wars have been heating up, we've been focusing a lot more on figuring out how we [can have Kubernetes help] run our workload across many different environments and many different cloud infrastructure providers." +{{< /case-studies/quote >}} -
-
+

The first live service, which the team could route to and ask for information, was launched a few months later. At that point, Ghods says, "We were comfortable with the stability of the Kubernetes cluster. We started to port some services over, then we would increase the cluster size and port a few more, and that's ended up to about 100 servers in each data center that are dedicated purely to Kubernetes. And that's going to be expanding a lot over the next 12 months, probably too many hundreds if not thousands."

-
-
- "As we’ve been expanding into regions around the globe, and as the public cloud wars have been heating up, we’ve been focusing a lot more on figuring out how we [can have Kubernetes help] run our workload across many different environments and many different cloud infrastructure providers." -
-
+

While observing teams who began to use Kubernetes for their microservices, "we immediately saw an uptick in the number of microservices being released," Ghods notes. "There was clearly a pent-up demand for a better way of building software through microservices, and the increase in agility helped our developers be more productive and make better architectural choices."

-
-
- The first live service, which the team could route to and ask for information, was launched a few months later. At that point, Ghods says, "We were comfortable with the stability of the Kubernetes cluster. We started to port some services over, then we would increase the cluster size and port a few more, and that’s ended up to about 100 servers in each data center that are dedicated purely to Kubernetes. And that’s going to be expanding a lot over the next 12 months, probably too many hundreds if not thousands."

- While observing teams who began to use Kubernetes for their microservices, "we immediately saw an uptick in the number of microservices being released," Ghods notes. "There was clearly a pent-up demand for a better way of building software through microservices, and the increase in agility helped our developers be more productive and make better architectural choices." -

"There was clearly a pent-up demand for a better way of building software through microservices, and the increase in agility helped our developers be more productive and make better architectural choices."

- Ghods reflects that as early adopters, Box had a different journey from what companies experience now. "We were definitely lock step with waiting for certain things to stabilize or features to get released," he says. "In the early days we were doing a lot of contributions [to components such as kubectl apply] and waiting for Kubernetes to release each of them, and then we’d upgrade, contribute more, and go back and forth several times. The entire project took about 18 months from our first real deployment on Kubernetes to having general availability. If we did that exact same thing today, it would probably be no more than six."

- In any case, Box didn’t have to make too many modifications to Kubernetes for it to work for the company. "The vast majority of the work our team has done to implement Kubernetes at Box has been making it work inside of our existing (and often legacy) infrastructure," says Ghods, "such as upgrading our base operating system from RHEL6 to RHEL7 or integrating it into Nagios, our monitoring infrastructure. But overall Kubernetes has been remarkably flexible with fitting into many of our constraints, and we’ve been running it very successfully on our bare metal infrastructure."

- Perhaps the bigger challenge for Box was a cultural one. "Kubernetes, and cloud native in general, represents a pretty big paradigm shift, and it’s not very incremental," Ghods says. "We’re essentially making this pitch that Kubernetes is going to solve everything because it does things the right way and everything is just suddenly better. But it’s important to keep in mind that it’s not nearly as proven as many other solutions out there. You can’t say how long this or that company took to do it because there just aren’t that many yet. Our team had to really fight for resources because our project was a bit of a moonshot." -
-
+{{< case-studies/lead >}} +"There was clearly a pent-up demand for a better way of building software through microservices, and the increase in agility helped our developers be more productive and make better architectural choices." +{{< /case-studies/lead >}} -
-
- "The vast majority of the work our team has done to implement Kubernetes at Box has been making it work inside of our existing [and often legacy] infrastructure....overall Kubernetes has been remarkably flexible with fitting into many of our constraints, and we’ve been running it very successfully on our bare metal infrastructure." -
-
+

Ghods reflects that as early adopters, Box had a different journey from what companies experience now. "We were definitely lock step with waiting for certain things to stabilize or features to get released," he says. "In the early days we were doing a lot of contributions [to components such as kubectl apply] and waiting for Kubernetes to release each of them, and then we'd upgrade, contribute more, and go back and forth several times. The entire project took about 18 months from our first real deployment on Kubernetes to having general availability. If we did that exact same thing today, it would probably be no more than six."

-
-
- Having learned from experience, Ghods offers these two pieces of advice for companies going through similar challenges: -

1. Deliver early and often.

Service discovery was a huge problem for Box, and the team had to decide whether to build an interim solution or wait for Kubernetes to natively satisfy Box’s unique requirements. After much debate, "we just started focusing on delivering something that works, and then dealing with potentially migrating to a more native solution later," Ghods says. "The above-all-else target for the team should always be to serve real production use cases on the infrastructure, no matter how trivial. This helps keep the momentum going both for the team itself and for the organizational perception of the project."

-

2. Keep an open mind about what your company has to abstract away from developers and what it doesn’t.

Early on, the team built an abstraction on top of Docker files to help ensure that images had the right security updates. - This turned out to be superfluous work, since container images are considered immutable and you can easily scan them post-build to ensure they do not contain vulnerabilities. Because managing infrastructure through containerization is such a discontinuous leap, it’s better to start by interacting directly with the native tools and learning their unique advantages and caveats. An abstraction should be built only after a practical need for it arises.

- In the end, the impact has been powerful. "Before Kubernetes," Ghods says, "our infrastructure was so antiquated it was taking us more than six months to deploy a new microservice. Now a new microservice takes less than five days to deploy. And we’re working on getting it to an hour. Granted, much of that six months was due to how broken our systems were, but bare metal is intrinsically a difficult platform to support unless you have a system like Kubernetes to help manage it."

- By Ghods’s estimate, Box is still several years away from his goal of being a 90-plus percent Kubernetes shop. "We’re very far along on having a mission-critical, stable Kubernetes deployment that provides a lot of value," he says. "Right now about five percent of all of our compute runs on Kubernetes, and I think in the next six months we’ll likely be between 20 to 50 percent. We’re working hard on enabling all stateless service use cases, and shift our focus to stateful services after that." -
-
+

In any case, Box didn't have to make too many modifications to Kubernetes for it to work for the company. "The vast majority of the work our team has done to implement Kubernetes at Box has been making it work inside of our existing (and often legacy) infrastructure," says Ghods, "such as upgrading our base operating system from RHEL6 to RHEL7 or integrating it into Nagios, our monitoring infrastructure. But overall Kubernetes has been remarkably flexible with fitting into many of our constraints, and we've been running it very successfully on our bare metal infrastructure."

-
-
- "Ghods predicts that Kubernetes has the opportunity to be the new cloud platform. '...because it’s a never-before-seen level of automation and intelligence surrounding infrastructure that is portable and agnostic to every way you can run your infrastructure.'" -
-
+

Perhaps the bigger challenge for Box was a cultural one. "Kubernetes, and cloud native in general, represents a pretty big paradigm shift, and it's not very incremental," Ghods says. "We're essentially making this pitch that Kubernetes is going to solve everything because it does things the right way and everything is just suddenly better. But it's important to keep in mind that it's not nearly as proven as many other solutions out there. You can't say how long this or that company took to do it because there just aren't that many yet. Our team had to really fight for resources because our project was a bit of a moonshot."

-
-
- In fact, that’s what he envisions across the industry: Ghods predicts that Kubernetes has the opportunity to be the new cloud platform. Kubernetes provides an API consistent across different cloud platforms including bare metal, and "I don’t think people have seen the full potential of what’s possible when you can program against one single interface," he says. "The same way AWS changed infrastructure so that you don’t have to think about servers or cabinets or networking equipment anymore, Kubernetes enables you to focus exclusively on the containers that you’re running, which is pretty exciting. That’s the vision."

- Ghods points to projects that are already in development or recently released for Kubernetes as a cloud platform: cluster federation, the Dashboard UI, and CoreOS’s etcd operator. "I honestly believe it’s the most exciting thing I’ve seen in cloud infrastructure," he says, "because it’s a never-before-seen level of automation and intelligence surrounding infrastructure that is portable and agnostic to every way you can run your infrastructure."

- Box, with its early decision to use bare metal, embarked on its Kubernetes journey out of necessity. But Ghods says that even if companies don’t have to be agnostic about cloud providers today, Kubernetes may soon become the industry standard, as more and more tooling and extensions are built around the API.

- "The same way it doesn’t make sense to deviate from Linux because it’s such a standard," Ghods says, "I think Kubernetes is going down the same path. It is still early days—the documentation still needs work and the user experience for writing and publishing specs to the Kubernetes clusters is still rough. When you’re on the cutting edge you can expect to bleed a little. But the bottom line is, this is where the industry is going. Three to five years from now it’s really going to be shocking if you run your infrastructure any other way." -
-
+{{< case-studies/quote image="/images/case-studies/box/banner4.jpg">}} +"The vast majority of the work our team has done to implement Kubernetes at Box has been making it work inside of our existing [and often legacy] infrastructure....overall Kubernetes has been remarkably flexible with fitting into many of our constraints, and we've been running it very successfully on our bare metal infrastructure." +{{< /case-studies/quote >}} + +

Having learned from experience, Ghods offers these two pieces of advice for companies going through similar challenges:

+ +{{< case-studies/lead >}} +1. Deliver early and often. +{{< /case-studies/lead >}} + +

Service discovery was a huge problem for Box, and the team had to decide whether to build an interim solution or wait for Kubernetes to natively satisfy Box's unique requirements. After much debate, "we just started focusing on delivering something that works, and then dealing with potentially migrating to a more native solution later," Ghods says. "The above-all-else target for the team should always be to serve real production use cases on the infrastructure, no matter how trivial. This helps keep the momentum going both for the team itself and for the organizational perception of the project."

+ +{{< case-studies/lead >}} +2. Keep an open mind about what your company has to abstract away from developers and what it doesn't. +{{< /case-studies/lead >}} + +

Early on, the team built an abstraction on top of Docker files to help ensure that images had the right security updates. This turned out to be superfluous work, since container images are considered immutable and you can easily scan them post-build to ensure they do not contain vulnerabilities. Because managing infrastructure through containerization is such a discontinuous leap, it's better to start by interacting directly with the native tools and learning their unique advantages and caveats. An abstraction should be built only after a practical need for it arises.

+ +

In the end, the impact has been powerful. "Before Kubernetes," Ghods says, "our infrastructure was so antiquated it was taking us more than six months to deploy a new microservice. Now a new microservice takes less than five days to deploy. And we're working on getting it to an hour. Granted, much of that six months was due to how broken our systems were, but bare metal is intrinsically a difficult platform to support unless you have a system like Kubernetes to help manage it."

+ +

By Ghods's estimate, Box is still several years away from his goal of being a 90-plus percent Kubernetes shop. "We're very far along on having a mission-critical, stable Kubernetes deployment that provides a lot of value," he says. "Right now about five percent of all of our compute runs on Kubernetes, and I think in the next six months we'll likely be between 20 to 50 percent. We're working hard on enabling all stateless service use cases, and shift our focus to stateful services after that."

+ +{{< case-studies/quote >}} +"Ghods predicts that Kubernetes has the opportunity to be the new cloud platform. '...because it's a never-before-seen level of automation and intelligence surrounding infrastructure that is portable and agnostic to every way you can run your infrastructure.'" +{{< /case-studies/quote >}} + +

In fact, that's what he envisions across the industry: Ghods predicts that Kubernetes has the opportunity to be the new cloud platform. Kubernetes provides an API consistent across different cloud platforms including bare metal, and "I don't think people have seen the full potential of what's possible when you can program against one single interface," he says. "The same way AWS changed infrastructure so that you don't have to think about servers or cabinets or networking equipment anymore, Kubernetes enables you to focus exclusively on the containers that you're running, which is pretty exciting. That's the vision."

+ +

Ghods points to projects that are already in development or recently released for Kubernetes as a cloud platform: cluster federation, the Dashboard UI, and CoreOS's etcd operator. "I honestly believe it's the most exciting thing I've seen in cloud infrastructure," he says, "because it's a never-before-seen level of automation and intelligence surrounding infrastructure that is portable and agnostic to every way you can run your infrastructure."

+ +

Box, with its early decision to use bare metal, embarked on its Kubernetes journey out of necessity. But Ghods says that even if companies don't have to be agnostic about cloud providers today, Kubernetes may soon become the industry standard, as more and more tooling and extensions are built around the API.

+ +

"The same way it doesn't make sense to deviate from Linux because it's such a standard," Ghods says, "I think Kubernetes is going down the same path. It is still early days—the documentation still needs work and the user experience for writing and publishing specs to the Kubernetes clusters is still rough. When you're on the cutting edge you can expect to bleed a little. But the bottom line is, this is where the industry is going. Three to five years from now it's really going to be shocking if you run your infrastructure any other way."

diff --git a/content/en/case-studies/buffer/index.html b/content/en/case-studies/buffer/index.html index 333db6a74e..bcf0896445 100644 --- a/content/en/case-studies/buffer/index.html +++ b/content/en/case-studies/buffer/index.html @@ -1,112 +1,83 @@ --- title: Buffer Case Study - case_study_styles: true cid: caseStudies -css: /css/style_buffer.css + +new_case_study_styles: true +heading_background: /images/case-studies/buffer/banner3.jpg +heading_title_logo: /images/buffer.png +subheading: > + Making Deployments Easy for a Small, Distributed Team +case_study_details: + - Company: Buffer + - Location: Around the World + - Industry: Social Media Technology --- -
-

CASE STUDY:
-
Making Deployments Easy for a Small, Distributed Team
-

-
+

Challenge

-
- Company Buffer     Location  Around the World     Industry  Social Media Technology -
+

With a small but fully distributed team of 80 working across almost a dozen time zones, Buffer—which offers social media management to agencies and marketers—was looking to solve its "classic monolithic code base problem," says Architect Dan Farrelly. "We wanted to have the kind of liquid infrastructure where a developer could create an app and deploy it and scale it horizontally as necessary."

-
+

Solution

-
-
-
+

Embracing containerization, Buffer moved its infrastructure from Amazon Web Services' Elastic Beanstalk to Docker on AWS, orchestrated with Kubernetes.

-

Challenge

- With a small but fully distributed team of 80 working across almost a dozen time zones, Buffer—which offers social media management to agencies and marketers—was looking to solve its "classic monolithic code base problem," says Architect Dan Farrelly. "We wanted to have the kind of liquid infrastructure where a developer could create an app and deploy it and scale it horizontally as necessary." -
- -
-

Solution

- Embracing containerization, Buffer moved its infrastructure from Amazon Web Services’ Elastic Beanstalk to Docker on AWS, orchestrated with Kubernetes. -
-

Impact

- The new system "leveled up our ability with deployment and rolling out new changes," says Farrelly. "Building something on your computer and knowing that it’s going to work has shortened things up a lot. Our feedback cycles are a lot faster now too." -
-
-
-
-
- "It’s amazing that we can use the Kubernetes solution off the shelf with our team. And it just keeps getting better. Before we even know that we need something, it’s there in the next release or it’s coming in the next few months."

- DAN FARRELLY, BUFFER ARCHITECT -
-
+

The new system "leveled up our ability with deployment and rolling out new changes," says Farrelly. "Building something on your computer and knowing that it's going to work has shortened things up a lot. Our feedback cycles are a lot faster now too."

-
-
-

Dan Farrelly uses a carpentry analogy to explain the problem his company, Buffer, began having as its team of developers grew over the past few years.

+{{< case-studies/quote author="DAN FARRELLY, BUFFER ARCHITECT" >}} +"It's amazing that we can use the Kubernetes solution off the shelf with our team. And it just keeps getting better. Before we even know that we need something, it's there in the next release or it's coming in the next few months." +{{< /case-studies/quote >}} - "If you’re building a table by yourself, it’s fine," the company’s architect says. "If you bring in a second person to work on the table, maybe that person can start sanding the legs while you’re sanding the top. But when you bring a third or fourth person in, someone should probably work on a different table." Needing to work on more and more different tables led Buffer on a path toward microservices and containerization made possible by Kubernetes.

- Since around 2012, Buffer had already been using Elastic Beanstalk, the orchestration service for deploying infrastructure offered by Amazon Web Services. "We were deploying a single monolithic PHP application, and it was the same application across five or six environments," says Farrelly. "We were very much a product-driven company. It was all about shipping new features quickly and getting things out the door, and if something was not broken, we didn’t spend too much time on it. If things were getting a little bit slow, we’d maybe use a faster server or just scale up one instance, and it would be good enough. We’d move on."

- But things came to a head in 2016. With the growing number of committers on staff, Farrelly and Buffer’s then-CTO, Sunil Sadasivan, decided it was time to re-architect and rethink their infrastructure. "It was a classic monolithic code base problem," says Farrelly.

Some of the company’s team was already successfully using Docker in their development environment, but the only application running on Docker in production was a marketing website that didn’t see real user traffic. They wanted to go further with Docker, and the next step was looking at options for orchestration. -
-
+{{< case-studies/lead >}} +Dan Farrelly uses a carpentry analogy to explain the problem his company, Buffer, began having as its team of developers grew over the past few years. +{{< /case-studies/lead >}} -
-
- And all the things Kubernetes did well suited Buffer’s needs. "We wanted to have the kind of liquid infrastructure where a developer could create an app and deploy it and scale it horizontally as necessary," says Farrelly. "We quickly used some scripts to set up a couple of test clusters, we built some small proof-of-concept applications in containers, and we deployed things within an hour. We had very little experience in running containers in production. It was amazing how quickly we could get a handle on it [Kubernetes]." -
-
+

"If you're building a table by yourself, it's fine," the company's architect says. "If you bring in a second person to work on the table, maybe that person can start sanding the legs while you're sanding the top. But when you bring a third or fourth person in, someone should probably work on a different table." Needing to work on more and more different tables led Buffer on a path toward microservices and containerization made possible by Kubernetes.

-
-
- First they considered Mesosphere, DC/OS and Amazon Elastic Container Service (which their data systems team was already using for some data pipeline jobs). While they were impressed by these offerings, they ultimately went with Kubernetes. "We run on AWS still, so spinning up, creating services and creating load balancers on demand for us without having to configure them manually was a great way for our team to get into this," says Farrelly. "We didn’t need to figure out how to configure this or that, especially coming from a former Elastic Beanstalk environment that gave us an automatically-configured load balancer. I really liked Kubernetes’ controls of the command line. It just took care of ports. It was a lot more flexible. Kubernetes was designed for doing what it does, so it does it very well."

- And all the things Kubernetes did well suited Buffer’s needs. "We wanted to have the kind of liquid infrastructure where a developer could create an app and deploy it and scale it horizontally as necessary," says Farrelly. "We quickly used some scripts to set up a couple of test clusters, we built some small proof-of-concept applications in containers, and we deployed things within an hour. We had very little experience in running containers in production. It was amazing how quickly we could get a handle on it [Kubernetes]."

- Above all, it provided a powerful solution for one of the company’s most distinguishing characteristics: their remote team that’s spread across a dozen different time zones. "The people with deep knowledge of our infrastructure live in time zones different from our peak traffic time zones, and most of our product engineers live in other places," says Farrelly. "So we really wanted something where anybody could get a grasp of the system early on and utilize it, and not have to worry that the deploy engineer is asleep. Otherwise people would sit around for 12 to 24 hours for something. It’s been really cool to see people moving much faster." -

- With a relatively small engineering team—just 25 people, and only a handful working on infrastructure, with the majority front-end developers—Buffer needed "something robust for them to deploy whatever they wanted," says Farrelly. Before, "it was only a couple of people who knew how to set up everything in the old way. With this system, it was easy to review documentation and get something out extremely quickly. It lowers the bar for us to get everything in production. We don't have the big team to build all these tools or manage the infrastructure like other larger companies might." -
-
+

Since around 2012, Buffer had already been using Elastic Beanstalk, the orchestration service for deploying infrastructure offered by Amazon Web Services. "We were deploying a single monolithic PHP application, and it was the same application across five or six environments," says Farrelly. "We were very much a product-driven company. It was all about shipping new features quickly and getting things out the door, and if something was not broken, we didn't spend too much time on it. If things were getting a little bit slow, we'd maybe use a faster server or just scale up one instance, and it would be good enough. We'd move on."

-
-
- "In our old way of working, the feedback loop was a lot longer, and it was delicate because if you deployed something, the risk was high to potentially break something else," Farrelly says. "With the kind of deploys that we built around Kubernetes, we were able to detect bugs and fix them, and get them deployed super fast. The second someone is fixing [a bug], it’s out the door." -
-
+

But things came to a head in 2016. With the growing number of committers on staff, Farrelly and Buffer's then-CTO, Sunil Sadasivan, decided it was time to re-architect and rethink their infrastructure. "It was a classic monolithic code base problem," says Farrelly.

Some of the company's team was already successfully using Docker in their development environment, but the only application running on Docker in production was a marketing website that didn't see real user traffic. They wanted to go further with Docker, and the next step was looking at options for orchestration.

-
-
- To help with this, Buffer developers wrote a deploy bot that wraps the Kubernetes deploy process and can be used by every team. "Before, our data analysts would update, say, a Python analysis script and have to wait for the lead on that team to click the button and deploy it," Farrelly explains. "Now our data analysts can make a change, enter a Slack command, ‘/deploy,’ and it goes out instantly. They don’t need to wait on these slow turnaround times. They don’t even know where it’s running; it doesn’t matter." -

- One of the first applications the team built from scratch using Kubernetes was a new image resizing service. As a social media management tool that allows marketing teams to collaborate on posts and send updates across multiple social media profiles and networks, Buffer has to be able to resize photographs as needed to meet the varying limitations of size and format posed by different social networks. "We always had these hacked together solutions," says Farrelly. -

- To create this new service, one of the senior product engineers was assigned to learn Docker and Kubernetes, then build the service, test it, deploy it and monitor it—which he was able to do relatively quickly. "In our old way of working, the feedback loop was a lot longer, and it was delicate because if you deployed something, the risk was high to potentially break something else," Farrelly says. "With the kind of deploys that we built around Kubernetes, we were able to detect bugs and fix them, and get them deployed super fast. The second someone is fixing [a bug], it’s out the door." -

- Plus, unlike with their old system, they could scale things horizontally with one command. "As we rolled it out," Farrelly says, "we could anticipate and just click a button. This allowed us to deal with the demand that our users were placing on the system and easily scale it to handle it." -

- Another thing they weren’t able to do before was a canary deploy. This new capability "made us so much more confident in deploying big changes," says Farrelly. "Before, it took a lot of testing, which is still good, but it was also a lot of ‘fingers crossed.’ And this is something that gets run 800,000 times a day, the core of our business. If it doesn’t work, our business doesn’t work. In a Kubernetes world, I can do a canary deploy to test it for 1 percent and I can shut it down very quickly if it isn’t working. This has leveled up our ability to deploy and roll out new changes quickly while reducing risk." +{{< case-studies/quote image="/images/case-studies/buffer/banner1.jpg" >}} +And all the things Kubernetes did well suited Buffer's needs. "We wanted to have the kind of liquid infrastructure where a developer could create an app and deploy it and scale it horizontally as necessary," says Farrelly. "We quickly used some scripts to set up a couple of test clusters, we built some small proof-of-concept applications in containers, and we deployed things within an hour. We had very little experience in running containers in production. It was amazing how quickly we could get a handle on it [Kubernetes]." +{{< /case-studies/quote >}} -
-
+

First they considered Mesosphere, DC/OS and Amazon Elastic Container Service (which their data systems team was already using for some data pipeline jobs). While they were impressed by these offerings, they ultimately went with Kubernetes. "We run on AWS still, so spinning up, creating services and creating load balancers on demand for us without having to configure them manually was a great way for our team to get into this," says Farrelly. "We didn't need to figure out how to configure this or that, especially coming from a former Elastic Beanstalk environment that gave us an automatically-configured load balancer. I really liked Kubernetes' controls of the command line. It just took care of ports. It was a lot more flexible. Kubernetes was designed for doing what it does, so it does it very well."

-
-
- "If you want to run containers in production, with nearly the power that Google uses internally, this [Kubernetes] is a great way to do that," Farrelly says. "We’re a relatively small team that’s actually running Kubernetes, and we’ve never run anything like it before. So it’s more approachable than you might think. That’s the one big thing that I tell people who are experimenting with it. Pick a couple of things, roll it out, kick the tires on this for a couple of months and see how much it can handle. You start learning a lot this way." -
-
+

And all the things Kubernetes did well suited Buffer's needs. "We wanted to have the kind of liquid infrastructure where a developer could create an app and deploy it and scale it horizontally as necessary," says Farrelly. "We quickly used some scripts to set up a couple of test clusters, we built some small proof-of-concept applications in containers, and we deployed things within an hour. We had very little experience in running containers in production. It was amazing how quickly we could get a handle on it [Kubernetes]."

-
-
- By October 2016, 54 percent of Buffer’s traffic was going through their Kubernetes cluster. "There’s a lot of our legacy functionality that still runs alright, and those parts might move to Kubernetes or stay in our old setup forever," says Farrelly. But the company made the commitment at that time that going forward, "all new development, all new features, will be running on Kubernetes." -

- The plan for 2017 is to move all the legacy applications to a new Kubernetes cluster, and run everything they’ve pulled out of their old infrastructure, plus the new services they’re developing in Kubernetes, on another cluster. "I want to bring all the benefits that we’ve seen on our early services to everyone on the team," says Farrelly. -

-

For Buffer’s engineers, it’s an exciting process. "Every time we’re deploying a new service, we need to figure out: OK, what’s the architecture? How do these services communicate? What’s the best way to build this service?" Farrelly says. "And then we use the different features that Kubernetes has to glue all the pieces together. It’s enabling us to experiment as we’re learning how to design a service-oriented architecture. Before, we just wouldn’t have been able to do it. This is actually giving us a blank white board so we can do whatever we want on it." -

- Part of that blank slate is the flexibility that Kubernetes offers should the time come when Buffer may want or need to change its cloud. "It’s cloud agnostic so maybe one day we could switch to Google or somewhere else," Farrelly says. "We’re very deep in Amazon but it’s nice to know we could move away if we need to." -

- At this point, the team at Buffer can’t imagine running their infrastructure any other way—and they’re happy to spread the word. "If you want to run containers in production, with nearly the power that Google uses internally, this [Kubernetes] is a great way to do that," Farrelly says. "We’re a relatively small team that’s actually running Kubernetes, and we’ve never run anything like it before. So it’s more approachable than you might think. That’s the one big thing that I tell people who are experimenting with it. Pick a couple of things, roll it out, kick the tires on this for a couple of months and see how much it can handle. You start learning a lot this way." -

-
-
+

Above all, it provided a powerful solution for one of the company's most distinguishing characteristics: their remote team that's spread across a dozen different time zones. "The people with deep knowledge of our infrastructure live in time zones different from our peak traffic time zones, and most of our product engineers live in other places," says Farrelly. "So we really wanted something where anybody could get a grasp of the system early on and utilize it, and not have to worry that the deploy engineer is asleep. Otherwise people would sit around for 12 to 24 hours for something. It's been really cool to see people moving much faster."

+ +

With a relatively small engineering team—just 25 people, and only a handful working on infrastructure, with the majority front-end developers—Buffer needed "something robust for them to deploy whatever they wanted," says Farrelly. Before, "it was only a couple of people who knew how to set up everything in the old way. With this system, it was easy to review documentation and get something out extremely quickly. It lowers the bar for us to get everything in production. We don't have the big team to build all these tools or manage the infrastructure like other larger companies might."

+ +{{< case-studies/quote image="/images/case-studies/buffer/banner4.jpg" >}} +"In our old way of working, the feedback loop was a lot longer, and it was delicate because if you deployed something, the risk was high to potentially break something else," Farrelly says. "With the kind of deploys that we built around Kubernetes, we were able to detect bugs and fix them, and get them deployed super fast. The second someone is fixing [a bug], it's out the door." +{{< /case-studies/quote >}} + +

To help with this, Buffer developers wrote a deploy bot that wraps the Kubernetes deploy process and can be used by every team. "Before, our data analysts would update, say, a Python analysis script and have to wait for the lead on that team to click the button and deploy it," Farrelly explains. "Now our data analysts can make a change, enter a Slack command, '/deploy,' and it goes out instantly. They don't need to wait on these slow turnaround times. They don't even know where it's running; it doesn't matter."

+ +

One of the first applications the team built from scratch using Kubernetes was a new image resizing service. As a social media management tool that allows marketing teams to collaborate on posts and send updates across multiple social media profiles and networks, Buffer has to be able to resize photographs as needed to meet the varying limitations of size and format posed by different social networks. "We always had these hacked together solutions," says Farrelly.

+ +

To create this new service, one of the senior product engineers was assigned to learn Docker and Kubernetes, then build the service, test it, deploy it and monitor it—which he was able to do relatively quickly. "In our old way of working, the feedback loop was a lot longer, and it was delicate because if you deployed something, the risk was high to potentially break something else," Farrelly says. "With the kind of deploys that we built around Kubernetes, we were able to detect bugs and fix them, and get them deployed super fast. The second someone is fixing [a bug], it's out the door."

+ +

Plus, unlike with their old system, they could scale things horizontally with one command. "As we rolled it out," Farrelly says, "we could anticipate and just click a button. This allowed us to deal with the demand that our users were placing on the system and easily scale it to handle it."

+ +

Another thing they weren't able to do before was a canary deploy. This new capability "made us so much more confident in deploying big changes," says Farrelly. "Before, it took a lot of testing, which is still good, but it was also a lot of 'fingers crossed.' And this is something that gets run 800,000 times a day, the core of our business. If it doesn't work, our business doesn't work. In a Kubernetes world, I can do a canary deploy to test it for 1 percent and I can shut it down very quickly if it isn't working. This has leveled up our ability to deploy and roll out new changes quickly while reducing risk."

+ +{{< case-studies/quote >}} +"If you want to run containers in production, with nearly the power that Google uses internally, this [Kubernetes] is a great way to do that," Farrelly says. "We're a relatively small team that's actually running Kubernetes, and we've never run anything like it before. So it's more approachable than you might think. That's the one big thing that I tell people who are experimenting with it. Pick a couple of things, roll it out, kick the tires on this for a couple of months and see how much it can handle. You start learning a lot this way." +{{< /case-studies/quote >}} + +

By October 2016, 54 percent of Buffer's traffic was going through their Kubernetes cluster. "There's a lot of our legacy functionality that still runs alright, and those parts might move to Kubernetes or stay in our old setup forever," says Farrelly. But the company made the commitment at that time that going forward, "all new development, all new features, will be running on Kubernetes."

+ +

The plan for 2017 is to move all the legacy applications to a new Kubernetes cluster, and run everything they've pulled out of their old infrastructure, plus the new services they're developing in Kubernetes, on another cluster. "I want to bring all the benefits that we've seen on our early services to everyone on the team," says Farrelly.

+ +{{< case-studies/lead >}} +For Buffer's engineers, it's an exciting process. "Every time we're deploying a new service, we need to figure out: OK, what's the architecture? How do these services communicate? What's the best way to build this service?" Farrelly says. "And then we use the different features that Kubernetes has to glue all the pieces together. It's enabling us to experiment as we're learning how to design a service-oriented architecture. Before, we just wouldn't have been able to do it. This is actually giving us a blank white board so we can do whatever we want on it." +{{< /case-studies/lead >}} + +

Part of that blank slate is the flexibility that Kubernetes offers should the time come when Buffer may want or need to change its cloud. "It's cloud agnostic so maybe one day we could switch to Google or somewhere else," Farrelly says. "We're very deep in Amazon but it's nice to know we could move away if we need to."

+ +

At this point, the team at Buffer can't imagine running their infrastructure any other way—and they're happy to spread the word. "If you want to run containers in production, with nearly the power that Google uses internally, this [Kubernetes] is a great way to do that," Farrelly says. "We're a relatively small team that's actually running Kubernetes, and we've never run anything like it before. So it's more approachable than you might think. That's the one big thing that I tell people who are experimenting with it. Pick a couple of things, roll it out, kick the tires on this for a couple of months and see how much it can handle. You start learning a lot this way."

diff --git a/content/en/case-studies/capital-one/index.html b/content/en/case-studies/capital-one/index.html index f95fb2acc7..22e7322ea0 100644 --- a/content/en/case-studies/capital-one/index.html +++ b/content/en/case-studies/capital-one/index.html @@ -2,95 +2,61 @@ title: Capital One Case Study case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css + +new_case_study_styles: true +heading_background: /images/case-studies/capitalone/banner1.jpg +heading_title_logo: /images/capitalone-logo.png +subheading: > + Supporting Fast Decisioning Applications with Kubernetes +case_study_details: + - Company: Capital One + - Location: McLean, Virginia + - Industry: Retail banking --- -
-

CASE STUDY:
Supporting Fast Decisioning Applications with Kubernetes +

Challenge

-

+

The team set out to build a provisioning platform for Capital One applications deployed on AWS that use streaming, big-data decisioning, and machine learning. One of these applications handles millions of transactions a day; some deal with critical functions like fraud detection and credit decisioning. The key considerations: resilience and speed—as well as full rehydration of the cluster from base AMIs.

-
+

Solution

-
- Company  Capital One     Location  McLean, Virginia     Industry  Retail banking -
- -
-
-
-
-

Challenge

- The team set out to build a provisioning platform for Capital One applications deployed on AWS that use streaming, big-data decisioning, and machine learning. One of these applications handles millions of transactions a day; some deal with critical functions like fraud detection and credit decisioning. The key considerations: resilience and speed—as well as full rehydration of the cluster from base AMIs. - -
-

Solution

- The decision to run Kubernetes "is very strategic for us," says John Swift, Senior Director Software Engineering. "We use Kubernetes as a substrate or an operating system, if you will. There’s a degree of affinity in our product development." -
- -
+

The decision to run Kubernetes "is very strategic for us," says John Swift, Senior Director Software Engineering. "We use Kubernetes as a substrate or an operating system, if you will. There's a degree of affinity in our product development."

Impact

- "Kubernetes is a significant productivity multiplier," says Lead Software Engineer Keith Gasser, adding that to run the platform without Kubernetes would "easily see our costs triple, quadruple what they are now for the amount of pure AWS expense." Time to market has been improved as well: "Now, a team can come to us and we can have them up and running with a basic decisioning app in a fortnight, which before would have taken a whole quarter, if not longer." Deployments increased by several orders of magnitude. Plus, the rehydration/cluster-rebuild process, which took a significant part of a day to do manually, now takes a couple hours with Kubernetes automation and declarative configuration. + +

"Kubernetes is a significant productivity multiplier," says Lead Software Engineer Keith Gasser, adding that to run the platform without Kubernetes would "easily see our costs triple, quadruple what they are now for the amount of pure AWS expense." Time to market has been improved as well: "Now, a team can come to us and we can have them up and running with a basic decisioning app in a fortnight, which before would have taken a whole quarter, if not longer." Deployments increased by several orders of magnitude. Plus, the rehydration/cluster-rebuild process, which took a significant part of a day to do manually, now takes a couple hours with Kubernetes automation and declarative configuration.

+ +{{< case-studies/quote author="Jamil Jadallah, Scrum Master" >}} + +
+"With the scalability, the management, the coordination, Kubernetes really empowers us and gives us more time back than we had before." +{{< /case-studies/quote >}} + +

As a top 10 U.S. retail bank, Capital One has applications that handle millions of transactions a day. Big-data decisioning—for fraud detection, credit approvals and beyond—is core to the business. To support the teams that build applications with those functions for the bank, the cloud team led by Senior Director Software Engineering John Swift embraced Kubernetes for its provisioning platform. "Kubernetes and its entire ecosystem are very strategic for us," says Swift. "We use Kubernetes as a substrate or an operating system, if you will. There's a degree of affinity in our product development."

+ +

Almost two years ago, the team embarked on this journey by first working with Docker. Then came Kubernetes. "We wanted to put streaming services into Kubernetes as one feature of the workloads for fast decisioning, and to be able to do batch alongside it," says Lead Software Engineer Keith Gasser. "Once the data is streamed and batched, there are so many tool sets in Flink that we use for decisioning. We want to provide the tools in the same ecosystem, in a consistent way, rather than have a large custom snowflake ecosystem where every tool needs its own custom deployment. Kubernetes gives us the ability to bring all of these together, so the richness of the open source and even the license community dealing with big data can be corralled."

+ +{{< case-studies/quote image="/images/case-studies/capitalone/banner3.jpg" >}} +"We want to provide the tools in the same ecosystem, in a consistent way, rather than have a large custom snowflake ecosystem where every tool needs its own custom deployment. Kubernetes gives us the ability to bring all of these together, so the richness of the open source and even the license community dealing with big data can be corralled." +{{< /case-studies/quote >}} + +

In this first year, the impact has already been great. "Time to market is really huge for us," says Gasser. "Especially with fraud, you have to be very nimble in the way you respond to threats in the marketplace—being able to add and push new rules, detect new patterns of behavior, detect anomalies in account and transaction flows." With Kubernetes, "a team can come to us and we can have them up and running with a basic decisioning app in a fortnight, which before would have taken a whole quarter, if not longer. Kubernetes is a manifold productivity multiplier."

+ +

Teams now have the tools to be autonomous in their deployments, and as a result, deployments have increased by two orders of magnitude. "And that was with just seven dedicated resources, without needing a whole group sitting there watching everything," says Scrum Master Jamil Jadallah. "That's a huge cost savings. With the scalability, the management, the coordination, Kubernetes really empowers us and gives us more time back than we had before."

+ +{{< case-studies/quote image="/images/case-studies/capitalone/banner4.jpg" >}} +With Kubernetes, "a team can come to us and we can have them up and running with a basic decisioning app in a fortnight, which before would have taken a whole quarter, if not longer. Kubernetes is a manifold productivity multiplier." +{{< /case-studies/quote >}} + +

Kubernetes has also been a great time-saver for Capital One's required period "rehydration" of clusters from base AMIs. To minimize the attack vulnerability profile for applications in the cloud, "Our entire clusters get rebuilt from scratch periodically, with new fresh instances and virtual server images that are patched with the latest and greatest security patches," says Gasser. This process used to take the better part of a day, and personnel, to do manually. It's now a quick Kubernetes job.

+ +

Savings extend to both capital and operating expenses. "It takes very little to get into Kubernetes because it's all open source," Gasser points out. "We went the DIY route for building our cluster, and we definitely like the flexibility of being able to embrace the latest from the community immediately without waiting for a downstream company to do it. There's capex related to those licenses that we don't have to pay for. Moreover, there's capex savings for us from some of the proprietary software that we get to sunset in our particular domain. So that goes onto our ledger in a positive way as well." (Some of those open source technologies include Prometheus, Fluentd, gRPC, Istio, CNI, and Envoy.)

+ +{{< case-studies/quote >}} +"If we had to do all of this without Kubernetes, on underlying cloud services, I could easily see our costs triple, quadruple what they are now for the amount of pure AWS expense. That doesn't account for personnel to deploy and maintain all the additional infrastructure." +{{< /case-studies/quote >}} -
+

And on the opex side, Gasser says, the savings are high. "We run dozens of services, we have scores of pods, many daemon sets, and since we're data-driven, we take advantage of EBS-backed volume claims for all of our stateful services. If we had to do all of this without Kubernetes, on underlying cloud services, I could easily see our costs triple, quadruple what they are now for the amount of pure AWS expense. That doesn't account for personnel to deploy and maintain all the additional infrastructure."

-
-
-
-
-

-"With the scalability, the management, the coordination, Kubernetes really empowers us and gives us more time back than we had before." — Jamil Jadallah, Scrum Master -
-
- -
-
-

- As a top 10 U.S. retail bank, Capital One has applications that handle millions of transactions a day. Big-data decisioning—for fraud detection, credit approvals and beyond—is core to the business. To support the teams that build applications with those functions for the bank, the cloud team led by Senior Director Software Engineering John Swift embraced Kubernetes for its provisioning platform. "Kubernetes and its entire ecosystem are very strategic for us," says Swift. "We use Kubernetes as a substrate or an operating system, if you will. There’s a degree of affinity in our product development."

- Almost two years ago, the team embarked on this journey by first working with Docker. Then came Kubernetes. "We wanted to put streaming services into Kubernetes as one feature of the workloads for fast decisioning, and to be able to do batch alongside it," says Lead Software Engineer Keith Gasser. "Once the data is streamed and batched, there are so many tool sets in Flink that we use for decisioning. We want to provide the tools in the same ecosystem, in a consistent way, rather than have a large custom snowflake ecosystem where every tool needs its own custom deployment. Kubernetes gives us the ability to bring all of these together, so the richness of the open source and even the license community dealing with big data can be corralled." - - - -
-
-
-
- "We want to provide the tools in the same ecosystem, in a consistent way, rather than have a large custom snowflake ecosystem where every tool needs its own custom deployment. Kubernetes gives us the ability to bring all of these together, so the richness of the open source and even the license community dealing with big data can be corralled." - - -
-
-
-
- In this first year, the impact has already been great. "Time to market is really huge for us," says Gasser. "Especially with fraud, you have to be very nimble in the way you respond to threats in the marketplace—being able to add and push new rules, detect new patterns of behavior, detect anomalies in account and transaction flows." With Kubernetes, "a team can come to us and we can have them up and running with a basic decisioning app in a fortnight, which before would have taken a whole quarter, if not longer. Kubernetes is a manifold productivity multiplier."

- Teams now have the tools to be autonomous in their deployments, and as a result, deployments have increased by two orders of magnitude. "And that was with just seven dedicated resources, without needing a whole group sitting there watching everything," says Scrum Master Jamil Jadallah. "That’s a huge cost savings. With the scalability, the management, the coordination, Kubernetes really empowers us and gives us more time back than we had before." - -
-
-
-
- With Kubernetes, "a team can come to us and we can have them up and running with a basic decisioning app in a fortnight, which before would have taken a whole quarter, if not longer. Kubernetes is a manifold productivity multiplier." -
-
- -
-
- Kubernetes has also been a great time-saver for Capital One’s required period "rehydration" of clusters from base AMIs. To minimize the attack vulnerability profile for applications in the cloud, "Our entire clusters get rebuilt from scratch periodically, with new fresh instances and virtual server images that are patched with the latest and greatest security patches," says Gasser. This process used to take the better part of a day, and personnel, to do manually. It’s now a quick Kubernetes job.

- Savings extend to both capital and operating expenses. "It takes very little to get into Kubernetes because it’s all open source," Gasser points out. "We went the DIY route for building our cluster, and we definitely like the flexibility of being able to embrace the latest from the community immediately without waiting for a downstream company to do it. There’s capex related to those licenses that we don’t have to pay for. Moreover, there’s capex savings for us from some of the proprietary software that we get to sunset in our particular domain. So that goes onto our ledger in a positive way as well." (Some of those open source technologies include Prometheus, Fluentd, gRPC, Istio, CNI, and Envoy.) - -
- -
-
- "If we had to do all of this without Kubernetes, on underlying cloud services, I could easily see our costs triple, quadruple what they are now for the amount of pure AWS expense. That doesn’t account for personnel to deploy and maintain all the additional infrastructure." -
-
- -
- And on the opex side, Gasser says, the savings are high. "We run dozens of services, we have scores of pods, many daemon sets, and since we’re data-driven, we take advantage of EBS-backed volume claims for all of our stateful services. If we had to do all of this without Kubernetes, on underlying cloud services, I could easily see our costs triple, quadruple what they are now for the amount of pure AWS expense. That doesn’t account for personnel to deploy and maintain all the additional infrastructure."

- The team is confident that the benefits will continue to multiply—without a steep learning curve for the engineers being exposed to the new technology. "As we onboard additional tenants in this ecosystem, I think the need for folks to understand Kubernetes may not necessarily go up. In fact, I think it goes down, and that’s good," says Gasser. "Because that really demonstrates the scalability of the technology. You start to reap the benefits, and they can concentrate on all the features they need to build for great decisioning in the business— fraud decisions, credit decisions—and not have to worry about, ‘Is my AWS server broken? Is my pod not running?’" -
- -
+

The team is confident that the benefits will continue to multiply—without a steep learning curve for the engineers being exposed to the new technology. "As we onboard additional tenants in this ecosystem, I think the need for folks to understand Kubernetes may not necessarily go up. In fact, I think it goes down, and that's good," says Gasser. "Because that really demonstrates the scalability of the technology. You start to reap the benefits, and they can concentrate on all the features they need to build for great decisioning in the business— fraud decisions, credit decisions—and not have to worry about, 'Is my AWS server broken? Is my pod not running?'"

diff --git a/content/en/case-studies/cern/index.html b/content/en/case-studies/cern/index.html index 48e965d7fb..461aa8fb84 100644 --- a/content/en/case-studies/cern/index.html +++ b/content/en/case-studies/cern/index.html @@ -3,91 +3,79 @@ title: CERN Case Study linkTitle: cern case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css logo: cern_featured_logo.png + +new_case_study_styles: true +heading_background: /images/case-studies/cern/banner1.jpg +heading_title_text: CERN +subheading: > + CERN: Processing Petabytes of Data More Efficiently with Kubernetes +case_study_details: + - Company: CERN + - Location: Geneva, Switzerland + - Industry: Particle physics research --- -
-

CASE STUDY: CERN
CERN: Processing Petabytes of Data More Efficiently with Kubernetes +

Challenge

-

+

At CERN, the European Organization for Nuclear Research, physicists conduct experiments to learn about fundamental science. In its particle accelerators, "we accelerate protons to very high energy, close to the speed of light, and we make the two beams of protons collide," says CERN Software Engineer Ricardo Rocha. "The end result is a lot of data that we have to process." CERN currently stores 330 petabytes of data in its data centers, and an upgrade of its accelerators expected in the next few years will drive that number up by 10x. Additionally, the organization experiences extreme peaks in its workloads during periods prior to big conferences, and needs its infrastructure to scale to those peaks. "We want to have a more hybrid infrastructure, where we have our on premise infrastructure but can make use of public clouds temporarily when these peaks come up," says Rocha. "We've been looking to new technologies that can help improve our efficiency in our infrastructure so that we can dedicate more of our resources to the actual processing of the data."

-
+

Solution

-
- Company  CERN     Location  Geneva, Switzerland -     Industry  Particle physics research -
+

CERN's technology team embraced containerization and cloud native practices, choosing Kubernetes for orchestration, Helm for deployment, Prometheus for monitoring, and CoreDNS for DNS resolution inside the clusters. Kubernetes federation has allowed the organization to run some production workloads both on premise and in public clouds.

-
-
-
-
-

Challenge

- At CERN, the European Organization for Nuclear Research, physicists conduct experiments to learn about fundamental science. In its particle accelerators, "we accelerate protons to very high energy, close to the speed of light, and we make the two beams of protons collide," says CERN Software Engineer Ricardo Rocha. "The end result is a lot of data that we have to process." CERN currently stores 330 petabytes of data in its data centers, and an upgrade of its accelerators expected in the next few years will drive that number up by 10x. Additionally, the organization experiences extreme peaks in its workloads during periods prior to big conferences, and needs its infrastructure to scale to those peaks. "We want to have a more hybrid infrastructure, where we have our on premise infrastructure but can make use of public clouds temporarily when these peaks come up," says Rocha. "We’ve been looking to new technologies that can help improve our efficiency in our infrastructure so that we can dedicate more of our resources to the actual processing of the data." -

-

Solution

- CERN’s technology team embraced containerization and cloud native practices, choosing Kubernetes for orchestration, Helm for deployment, Prometheus for monitoring, and CoreDNS for DNS resolution inside the clusters. Kubernetes federation has allowed the organization to run some production workloads both on premise and in public clouds. -

Impact

- "Kubernetes gives us the full automation of the application," says Rocha. "It comes with built-in monitoring and logging for all the applications and the workloads that deploy in Kubernetes. This is a massive simplification of our current deployments." The time to deploy a new cluster for a complex distributed storage system has gone from more than 3 hours to less than 15 minutes. Adding new nodes to a cluster used to take more than an hour; now it takes less than 2 minutes. The time it takes to autoscale replicas for system components has decreased from more than an hour to less than 2 minutes. Initially, virtualization gave 20% overhead, but with tuning this was reduced to ~5%. Moving to Kubernetes on bare metal would get this to 0%. Not having to host virtual machines is expected to also get 10% of memory capacity back. -
+

"Kubernetes gives us the full automation of the application," says Rocha. "It comes with built-in monitoring and logging for all the applications and the workloads that deploy in Kubernetes. This is a massive simplification of our current deployments." The time to deploy a new cluster for a complex distributed storage system has gone from more than 3 hours to less than 15 minutes. Adding new nodes to a cluster used to take more than an hour; now it takes less than 2 minutes. The time it takes to autoscale replicas for system components has decreased from more than an hour to less than 2 minutes. Initially, virtualization gave 20% overhead, but with tuning this was reduced to ~5%. Moving to Kubernetes on bare metal would get this to 0%. Not having to host virtual machines is expected to also get 10% of memory capacity back.

-
-
-
-
- "Kubernetes is something we can relate to very much because it’s naturally distributed. What it gives us is a uniform API across heterogeneous resources to define our workloads. This is something we struggled with a lot in the past when we want to expand our resources outside our infrastructure." -

- Ricardo Rocha, Software Engineer, CERN
-
-
-
-
-

With a mission of researching fundamental science, and a stable of extremely large machines, the European Organization for Nuclear Research (CERN) operates at what can only be described as hyperscale.

- Experiments are conducted in particle accelerators, the biggest of which is 27 kilometers in circumference. "We accelerate protons to very high energy, to close to the speed of light, and we make the two beams of protons collide in well-defined places," says CERN Software Engineer Ricardo Rocha. "We build experiments around these places where we do the collisions. The end result is a lot of data that we have to process."

- And he does mean a lot: CERN currently stores and processes 330 petabytes of data—gathered from 4,300 projects and 3,300 users—using 10,000 hypervisors and 320,000 cores in its data centers.

- Over the years, the CERN technology department has built a large computing infrastructure, based on OpenStack private clouds, to help the organization’s physicists analyze and treat all this data. The organization experiences extreme peaks in its workloads. "Very often, just before conferences, physicists want to do an enormous amount of extra analysis to publish their papers, and we have to scale to these peaks, which means overcommitting resources in some cases," says Rocha. "We want to have a more hybrid infrastructure, where we have our on premise infrastructure but can make use of public clouds temporarily when these peaks come up."

- Additionally, few years ago, CERN announced that it would be doing a big upgrade of its accelerators, which will mean a ten-fold increase in the amount of data that can be collected. "So we’ve been looking to new technologies that can help improve our efficiency in our infrastructure, so that we can dedicate more of our resources to the actual processing of the data," says Rocha. +{{< case-studies/quote author="Ricardo Rocha, Software Engineer, CERN" >}} +"Kubernetes is something we can relate to very much because it's naturally distributed. What it gives us is a uniform API across heterogeneous resources to define our workloads. This is something we struggled with a lot in the past when we want to expand our resources outside our infrastructure." +{{< /case-studies/quote >}} -
-
-
-
- "Before, the tendency was always: ‘I need this, I get a couple of developers, and I implement it.’ Right now it’s ‘I need this, I’m sure other people also need this, so I’ll go and ask around.’ The CNCF is a good source because there’s a very large catalog of applications available. It’s very hard right now to justify developing a new product in-house. There is really no real reason to keep doing that. It’s much easier for us to try it out, and if we see it’s a good solution, we try to reach out to the community and start working with that community."

- Ricardo Rocha, Software Engineer, CERN
+{{< case-studies/lead >}} +With a mission of researching fundamental science, and a stable of extremely large machines, the European Organization for Nuclear Research (CERN) operates at what can only be described as hyperscale. +{{< /case-studies/lead >}} -
-
-
-
- Rocha’s team started looking at Kubernetes and containerization in the second half of 2015. "We’ve been using distributed infrastructures for decades now," says Rocha. "Kubernetes is something we can relate to very much because it’s naturally distributed. What it gives us is a uniform API across heterogeneous resources to define our workloads. This is something we struggled with a lot in the past when we want to expand our resources outside our infrastructure."

- The team created a prototype system for users to deploy their own Kubernetes cluster in CERN’s infrastructure, and spent six months validating the use cases and making sure that Kubernetes integrated with CERN’s internal systems. The main use case is batch workloads, which represent more than 80% of resource usage at CERN. (One single project that does most of the physics data processing and analysis alone consumes 250,000 cores.) "This is something where the investment in simplification of the deployment, logging, and monitoring pays off very quickly," says Rocha. Other use cases include Spark-based data analysis and machine learning to improve physics analysis. "The fact that most of these technologies integrate very well with Kubernetes makes our lives easier," he adds.

- The system went into production in October 2016, also using Helm for deployment, Prometheus for monitoring, and CoreDNS for DNS resolution within the cluster. "One thing that Kubernetes gives us is the full automation of the application," says Rocha. "So it comes with built-in monitoring and logging for all the applications and the workloads that deploy in Kubernetes. This is a massive simplification of our current deployments." The time to deploy a new cluster for a complex distributed storage system has gone from more than 3 hours to less than 15 minutes.

Adding new nodes to a cluster used to take more than an hour; now it takes less than 2 minutes. The time it takes to autoscale replicas for system components has decreased from more than an hour to less than 2 minutes. +

Experiments are conducted in particle accelerators, the biggest of which is 27 kilometers in circumference. "We accelerate protons to very high energy, to close to the speed of light, and we make the two beams of protons collide in well-defined places," says CERN Software Engineer Ricardo Rocha. "We build experiments around these places where we do the collisions. The end result is a lot of data that we have to process."

-
-
-
-
- "With Kubernetes, there’s a well-established technology and a big community that we can contribute to. It allows us to do our physics analysis without having to focus so much on the lower level software. This is just exciting. We are looking forward to keep contributing to the community and collaborating with everyone."

- Ricardo Rocha, Software Engineer, CERN
-
-
+

And he does mean a lot: CERN currently stores and processes 330 petabytes of data—gathered from 4,300 projects and 3,300 users—using 10,000 hypervisors and 320,000 cores in its data centers.

-
-
- Rocha points out that the metric used in the particle accelerators may be events per second, but in reality "it’s how fast and how much of the data we can process that actually counts." And efficiency has certainly been improved with Kubernetes. Initially, virtualization gave 20% overhead, but with tuning this was reduced to ~5%. Moving to Kubernetes on bare metal would get this to 0%. Not having to host virtual machines is expected to also get 10% of memory capacity back.

- Kubernetes federation, which CERN has been using for a portion of its production workloads since February 2018, has allowed the organization to adopt a hybrid cloud strategy. And it was remarkably simple to do. "We had a summer intern working on federation," says Rocha. "For many years, I’ve been developing distributed computing software, which took like a decade and a lot of effort from a lot of people to stabilize and make sure it works. And for our intern, in a couple of days he was able to demo to me and my team that we had a cluster at CERN and a few clusters outside in public clouds that were federated together and that we could submit workloads to. This was shocking for us. It really shows the power of using this kind of well-established technologies."

- With such results, adoption of Kubernetes has made rapid gains at CERN, and the team is eager to give back to the community. "If we look back into the ’90s and early 2000s, there were not a lot of companies focusing on systems that have to scale to this kind of size, storing petabytes of data, analyzing petabytes of data," says Rocha. "The fact that Kubernetes is supported by such a wide community and different backgrounds, it motivates us to contribute back." +

Over the years, the CERN technology department has built a large computing infrastructure, based on OpenStack private clouds, to help the organization's physicists analyze and treat all this data. The organization experiences extreme peaks in its workloads. "Very often, just before conferences, physicists want to do an enormous amount of extra analysis to publish their papers, and we have to scale to these peaks, which means overcommitting resources in some cases," says Rocha. "We want to have a more hybrid infrastructure, where we have our on premise infrastructure but can make use of public clouds temporarily when these peaks come up."

-
+

Additionally, few years ago, CERN announced that it would be doing a big upgrade of its accelerators, which will mean a ten-fold increase in the amount of data that can be collected. "So we've been looking to new technologies that can help improve our efficiency in our infrastructure, so that we can dedicate more of our resources to the actual processing of the data," says Rocha.

-
-
-This means that the physicist can build his or her analysis and publish it in a repository, share it with colleagues, and in 10 years redo the same analysis with new data. If we looked back even 10 years, this was just a dream."

- Ricardo Rocha, Software Engineer, CERN
-
+{{< case-studies/quote + image="/images/case-studies/cern/banner3.jpg" + author="Ricardo Rocha, Software Engineer, CERN" +>}} +"Before, the tendency was always: 'I need this, I get a couple of developers, and I implement it.' Right now it's 'I need this, I'm sure other people also need this, so I'll go and ask around.' The CNCF is a good source because there's a very large catalog of applications available. It's very hard right now to justify developing a new product in-house. There is really no real reason to keep doing that. It's much easier for us to try it out, and if we see it's a good solution, we try to reach out to the community and start working with that community." +{{< /case-studies/quote >}} -
- These new technologies aren’t just enabling infrastructure improvements. CERN also uses the Kubernetes-based Reana/Recast platform for reusable analysis, which is "the ability to define physics analysis as a set of workflows that are fully containerized in one single entry point," says Rocha. "This means that the physicist can build his or her analysis and publish it in a repository, share it with colleagues, and in 10 years redo the same analysis with new data. If we looked back even 10 years, this was just a dream."

- All of these things have changed the culture at CERN considerably. A decade ago, "The tendency was always: ‘I need this, I get a couple of developers, and I implement it,’" says Rocha. "Right now it’s ‘I need this, I’m sure other people also need this, so I’ll go and ask around.’ The CNCF is a good source because there’s a very large catalog of applications available. It’s very hard right now to justify developing a new product in-house. There is really no real reason to keep doing that. It’s much easier for us to try it out, and if we see it’s a good solution, we try to reach out to the community and start working with that community." +

Rocha's team started looking at Kubernetes and containerization in the second half of 2015. "We've been using distributed infrastructures for decades now," says Rocha. "Kubernetes is something we can relate to very much because it's naturally distributed. What it gives us is a uniform API across heterogeneous resources to define our workloads. This is something we struggled with a lot in the past when we want to expand our resources outside our infrastructure."

-
-
+

The team created a prototype system for users to deploy their own Kubernetes cluster in CERN's infrastructure, and spent six months validating the use cases and making sure that Kubernetes integrated with CERN's internal systems. The main use case is batch workloads, which represent more than 80% of resource usage at CERN. (One single project that does most of the physics data processing and analysis alone consumes 250,000 cores.) "This is something where the investment in simplification of the deployment, logging, and monitoring pays off very quickly," says Rocha. Other use cases include Spark-based data analysis and machine learning to improve physics analysis. "The fact that most of these technologies integrate very well with Kubernetes makes our lives easier," he adds.

+ +

The system went into production in October 2016, also using Helm for deployment, Prometheus for monitoring, and CoreDNS for DNS resolution within the cluster. "One thing that Kubernetes gives us is the full automation of the application," says Rocha. "So it comes with built-in monitoring and logging for all the applications and the workloads that deploy in Kubernetes. This is a massive simplification of our current deployments." The time to deploy a new cluster for a complex distributed storage system has gone from more than 3 hours to less than 15 minutes.

+ +

Adding new nodes to a cluster used to take more than an hour; now it takes less than 2 minutes. The time it takes to autoscale replicas for system components has decreased from more than an hour to less than 2 minutes.

+ +{{< case-studies/quote + image="/images/case-studies/cern/banner4.jpg" + author="Ricardo Rocha, Software Engineer, CERN" +>}} +"With Kubernetes, there's a well-established technology and a big community that we can contribute to. It allows us to do our physics analysis without having to focus so much on the lower level software. This is just exciting. We are looking forward to keep contributing to the community and collaborating with everyone." +{{< /case-studies/quote >}} + +

Rocha points out that the metric used in the particle accelerators may be events per second, but in reality "it's how fast and how much of the data we can process that actually counts." And efficiency has certainly been improved with Kubernetes. Initially, virtualization gave 20% overhead, but with tuning this was reduced to ~5%. Moving to Kubernetes on bare metal would get this to 0%. Not having to host virtual machines is expected to also get 10% of memory capacity back.

+ +

Kubernetes federation, which CERN has been using for a portion of its production workloads since February 2018, has allowed the organization to adopt a hybrid cloud strategy. And it was remarkably simple to do. "We had a summer intern working on federation," says Rocha. "For many years, I've been developing distributed computing software, which took like a decade and a lot of effort from a lot of people to stabilize and make sure it works. And for our intern, in a couple of days he was able to demo to me and my team that we had a cluster at CERN and a few clusters outside in public clouds that were federated together and that we could submit workloads to. This was shocking for us. It really shows the power of using this kind of well-established technologies."

+ +

With such results, adoption of Kubernetes has made rapid gains at CERN, and the team is eager to give back to the community. "If we look back into the '90s and early 2000s, there were not a lot of companies focusing on systems that have to scale to this kind of size, storing petabytes of data, analyzing petabytes of data," says Rocha. "The fact that Kubernetes is supported by such a wide community and different backgrounds, it motivates us to contribute back."

+ +{{< case-studies/quote author="Ricardo Rocha, Software Engineer, CERN" >}} +This means that the physicist can build his or her analysis and publish it in a repository, share it with colleagues, and in 10 years redo the same analysis with new data. If we looked back even 10 years, this was just a dream." +{{< /case-studies/quote >}} + +

These new technologies aren't just enabling infrastructure improvements. CERN also uses the Kubernetes-based Reana/Recast platform for reusable analysis, which is "the ability to define physics analysis as a set of workflows that are fully containerized in one single entry point," says Rocha. "This means that the physicist can build his or her analysis and publish it in a repository, share it with colleagues, and in 10 years redo the same analysis with new data. If we looked back even 10 years, this was just a dream."

+ +

All of these things have changed the culture at CERN considerably. A decade ago, "The tendency was always: 'I need this, I get a couple of developers, and I implement it,'" says Rocha. "Right now it's 'I need this, I'm sure other people also need this, so I'll go and ask around.' The CNCF is a good source because there's a very large catalog of applications available. It's very hard right now to justify developing a new product in-house. There is really no real reason to keep doing that. It's much easier for us to try it out, and if we see it's a good solution, we try to reach out to the community and start working with that community."

diff --git a/content/en/case-studies/chinaunicom/index.html b/content/en/case-studies/chinaunicom/index.html index 4479d60e67..4327e5c6a3 100644 --- a/content/en/case-studies/chinaunicom/index.html +++ b/content/en/case-studies/chinaunicom/index.html @@ -1,95 +1,77 @@ --- title: China Unicom Case Study - linkTitle: chinaunicom case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css featured: false + +new_case_study_styles: true +heading_background: /images/case-studies/chinaunicom/banner1.jpg +heading_title_logo: /images/chinaunicom_logo.png +subheading: > + China Unicom: How China Unicom Leveraged Kubernetes to Boost Efficiency and Lower IT Costs +case_study_details: + - Company: China Unicom + - Location: Beijing, China + - Industry: Telecom --- -
-

CASE STUDY:
China Unicom: How China Unicom Leveraged Kubernetes to Boost Efficiency
and Lower IT Costs +

Challenge

-

+

China Unicom is one of the top three telecom operators in China, and to serve its 300 million users, the company runs several data centers with thousands of servers in each, using Docker containerization and VMWare and OpenStack infrastructure since 2016. Unfortunately, "the resource utilization rate was relatively low," says Chengyu Zhang, Group Leader of Platform Technology R&D, "and we didn't have a cloud platform to accommodate our hundreds of applications." Formerly an entirely state-owned company, China Unicom has in recent years taken private investment from BAT (Baidu, Alibaba, Tencent) and JD.com, and is now focusing on internal development using open source technology, rather than commercial products. As such, Zhang's China Unicom Lab team began looking for open source orchestration for its cloud infrastructure.

-
+

Solution

-
- Company  China Unicom     Location  Beijing, China     Industry  Telecom -
+

Because of its rapid growth and mature open source community, Kubernetes was a natural choice for China Unicom. The company's Kubernetes-enabled cloud platform now hosts 50 microservices and all new development going forward. "Kubernetes has improved our experience using cloud infrastructure," says Zhang. "There is currently no alternative technology that can replace it." China Unicom also uses Istio for its microservice framework, Envoy, CoreDNS, and Fluentd.

-
-
-
-
-

Challenge

- China Unicom is one of the top three telecom operators in China, and to serve its 300 million users, the company runs several data centers with thousands of servers in each, using Docker containerization and VMWare and OpenStack infrastructure since 2016. Unfortunately, "the resource utilization rate was relatively low," says Chengyu Zhang, Group Leader of Platform Technology R&D, "and we didn’t have a cloud platform to accommodate our hundreds of applications." Formerly an entirely state-owned company, China Unicom has in recent years taken private investment from BAT (Baidu, Alibaba, Tencent) and JD.com, and is now focusing on internal development using open source technology, rather than commercial products. As such, Zhang’s China Unicom Lab team began looking for open source orchestration for its cloud infrastructure. -

-

Solution

- Because of its rapid growth and mature open source community, Kubernetes was a natural choice for China Unicom. The company’s Kubernetes-enabled cloud platform now hosts 50 microservices and all new development going forward. "Kubernetes has improved our experience using cloud infrastructure," says Zhang. "There is currently no alternative technology that can replace it." China Unicom also uses Istio for its microservice framework, Envoy, CoreDNS, and Fluentd. -

Impact

- At China Unicom, Kubernetes has improved both operational and development efficiency. Resource utilization has increased by 20-50%, lowering IT infrastructure costs, and deployment time has gone from a couple of hours to 5-10 minutes. "This is mainly because of the self-healing and scalability, so we can increase our efficiency in operation and maintenance," Zhang says. "For example, we currently have only five people maintaining our multiple systems. We could never imagine we can achieve this scalability in such a short time." -
+

At China Unicom, Kubernetes has improved both operational and development efficiency. Resource utilization has increased by 20-50%, lowering IT infrastructure costs, and deployment time has gone from a couple of hours to 5-10 minutes. "This is mainly because of the self-healing and scalability, so we can increase our efficiency in operation and maintenance," Zhang says. "For example, we currently have only five people maintaining our multiple systems. We could never imagine we can achieve this scalability in such a short time."

-
-
-
-
- "Kubernetes has improved our experience using cloud infrastructure. There is currently no alternative technology that can replace it." -

- Chengyu Zhang, Group Leader of Platform Technology R&D, China Unicom
-
-
-
-
-

With more than 300 million users, China Unicom is one of the country’s top three telecom operators.

+{{< case-studies/quote author="Chengyu Zhang, Group Leader of Platform Technology R&D, China Unicom" >}} +"Kubernetes has improved our experience using cloud infrastructure. There is currently no alternative technology that can replace it." +{{< /case-studies/quote >}} - Behind the scenes, the company runs multiple data centers with thousands of servers in each, using Docker containerization and VMWare and OpenStack infrastructure since 2016. Unfortunately, "the resource utilization rate was relatively low," says Chengyu Zhang, Group Leader of Platform Technology R&D, "and we didn’t have a cloud platform to accommodate our hundreds of applications."

- Zhang’s team, which is responsible for new technology, R&D and platforms, set out to find an IT management solution. Formerly an entirely state-owned company, China Unicom has in recent years taken private investment from BAT (Baidu, Alibaba, Tencent) and JD.com, and is now focusing on homegrown development using open source technology, rather than commercial products. For that reason, the team began looking for open source orchestration for its cloud infrastructure. +{{< case-studies/lead >}} +With more than 300 million users, China Unicom is one of the country's top three telecom operators. +{{< /case-studies/lead >}} -
-
-
-
- "We could never imagine we can achieve this scalability in such a short time."

- Chengyu Zhang, Group Leader of Platform Technology R&D, China Unicom
+

Behind the scenes, the company runs multiple data centers with thousands of servers in each, using Docker containerization and VMWare and OpenStack infrastructure since 2016. Unfortunately, "the resource utilization rate was relatively low," says Chengyu Zhang, Group Leader of Platform Technology R&D, "and we didn't have a cloud platform to accommodate our hundreds of applications."

-
-
-
-
- Though China Unicom was already using Mesos for a core telecom operator system, the team felt that Kubernetes was a natural choice for the new cloud platform. "The main reason was that it has a mature community," says Zhang. "It grows very rapidly, and so we can learn a lot from others’ best practices." China Unicom also uses Istio for its microservice framework, Envoy, CoreDNS, and Fluentd.

- The company’s Kubernetes-enabled cloud platform now hosts 50 microservices and all new development going forward. China Unicom developers can easily leverage the technology through APIs, without doing the development work themselves. The cloud platform provides 20-30 services connected to the company’s data center PaaS platform, as well as supports things such as big data analysis for internal users in the branch offices across the 31 provinces in China.

- "Kubernetes has improved our experience using cloud infrastructure," says Zhang. "There is currently no alternative technology that can replace it." +

Zhang's team, which is responsible for new technology, R&D and platforms, set out to find an IT management solution. Formerly an entirely state-owned company, China Unicom has in recent years taken private investment from BAT (Baidu, Alibaba, Tencent) and JD.com, and is now focusing on homegrown development using open source technology, rather than commercial products. For that reason, the team began looking for open source orchestration for its cloud infrastructure.

-
-
-
-
- "This technology is relatively complicated, but as long as developers get used to it, they can enjoy all the benefits."

- Jie Jia, Member of Platform Technology R&D, China Unicom
-
-
+{{< case-studies/quote + image="/images/case-studies/chinaunicom/banner3.jpg" + author="Chengyu Zhang, Group Leader of Platform Technology R&D, China Unicom" +>}} +"We could never imagine we can achieve this scalability in such a short time." +{{< /case-studies/quote >}} -
-
- In fact, Kubernetes has boosted both operational and development efficiency at China Unicom. Resource utilization has increased by 20-50%, lowering IT infrastructure costs, and deployment time has gone from a couple of hours to 5-10 minutes. "This is mainly because of the self-healing and scalability of Kubernetes, so we can increase our efficiency in operation and maintenance," Zhang says. "For example, we currently have only five people maintaining our multiple systems."

- With the wins China Unicom has experienced with Kubernetes, Zhang and his team are eager to give back to the community. That starts with participating in meetups and conferences, and offering advice to other companies that are considering a similar path. "Especially for those companies who have had traditional cloud computing system, I really recommend them to join the cloud native computing community," says Zhang. +

Though China Unicom was already using Mesos for a core telecom operator system, the team felt that Kubernetes was a natural choice for the new cloud platform. "The main reason was that it has a mature community," says Zhang. "It grows very rapidly, and so we can learn a lot from others' best practices." China Unicom also uses Istio for its microservice framework, Envoy, CoreDNS, and Fluentd.

-
+

The company's Kubernetes-enabled cloud platform now hosts 50 microservices and all new development going forward. China Unicom developers can easily leverage the technology through APIs, without doing the development work themselves. The cloud platform provides 20-30 services connected to the company's data center PaaS platform, as well as supports things such as big data analysis for internal users in the branch offices across the 31 provinces in China.

-
-
-"Companies can use the managed services offered by companies like Rancher, because they have already customized this technology, you can easily leverage this technology."

- Jie Jia, Member of Platform Technology R&D, China Unicom
-
+

"Kubernetes has improved our experience using cloud infrastructure," says Zhang. "There is currently no alternative technology that can replace it."

-
- Platform Technology R&D team member Jie Jia adds that though "this technology is relatively complicated, as long as developers get used to it, they can enjoy all the benefits." And Zhang points out that in his own experience with virtual machine cloud, "Kubernetes and these cloud native technologies are relatively simpler."

- Plus, "companies can use the managed services offered by companies like Rancher, because they have already customized this technology," says Jia. "You can easily leverage this technology."

- Looking ahead, China Unicom plans to develop more applications on Kubernetes, focusing on big data and machine learning. The team is continuing to optimize the cloud platform that it built, and hopes to pass the conformance test to join CNCF’s Certified Kubernetes Conformance Program. They’re also hoping to someday contribute code back to the community.

- If that sounds ambitious, it’s because the results they’ve gotten from adopting Kubernetes have been beyond even their greatest expectations. Says Zhang: "We could never imagine we can achieve this scalability in such a short time." +{{< case-studies/quote + image="/images/case-studies/chinaunicom/banner4.jpg" + author="Jie Jia, Member of Platform Technology R&D, China Unicom" +>}} +"This technology is relatively complicated, but as long as developers get used to it, they can enjoy all the benefits." +{{< /case-studies/quote >}} -
-
- - +

In fact, Kubernetes has boosted both operational and development efficiency at China Unicom. Resource utilization has increased by 20-50%, lowering IT infrastructure costs, and deployment time has gone from a couple of hours to 5-10 minutes. "This is mainly because of the self-healing and scalability of Kubernetes, so we can increase our efficiency in operation and maintenance," Zhang says. "For example, we currently have only five people maintaining our multiple systems."

+ +

With the wins China Unicom has experienced with Kubernetes, Zhang and his team are eager to give back to the community. That starts with participating in meetups and conferences, and offering advice to other companies that are considering a similar path. "Especially for those companies who have had traditional cloud computing system, I really recommend them to join the cloud native computing community," says Zhang.

+ +{{< case-studies/quote author="Jie Jia, Member of Platform Technology R&D, China Unicom" >}} +"Companies can use the managed services offered by companies like Rancher, because they have already customized this technology, you can easily leverage this technology." +{{< /case-studies/quote >}} + +

Platform Technology R&D team member Jie Jia adds that though "this technology is relatively complicated, as long as developers get used to it, they can enjoy all the benefits." And Zhang points out that in his own experience with virtual machine cloud, "Kubernetes and these cloud native technologies are relatively simpler."

+ +

Plus, "companies can use the managed services offered by companies like Rancher, because they have already customized this technology," says Jia. "You can easily leverage this technology."

+ +

Looking ahead, China Unicom plans to develop more applications on Kubernetes, focusing on big data and machine learning. The team is continuing to optimize the cloud platform that it built, and hopes to pass the conformance test to join CNCF's Certified Kubernetes Conformance Program. They're also hoping to someday contribute code back to the community.

+ +

If that sounds ambitious, it's because the results they've gotten from adopting Kubernetes have been beyond even their greatest expectations. Says Zhang: "We could never imagine we can achieve this scalability in such a short time."

diff --git a/content/en/case-studies/city-of-montreal/index.html b/content/en/case-studies/city-of-montreal/index.html index 55378c649e..f639a30bde 100644 --- a/content/en/case-studies/city-of-montreal/index.html +++ b/content/en/case-studies/city-of-montreal/index.html @@ -3,97 +3,79 @@ title: City of Montreal Case Study linkTitle: city-of-montreal case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css featured: false + +new_case_study_styles: true +heading_background: /images/case-studies/montreal/banner1.jpg +heading_title_logo: /images/montreal_logo.png +subheading: > + City of Montréal - How the City of Montréal Is Modernizing Its 30-Year-Old, Siloed Architecture with Kubernetes +case_study_details: + - Company: City of Montréal + - Location: Montréal, Québec, Canada + - Industry: Government --- -
-

CASE STUDY:
City of Montréal - How the City of Montréal Is Modernizing Its 30-Year-Old, Siloed Architecture with Kubernetes +

Challenge

-

+

Like many governments, Montréal has a number of legacy systems, and "we have systems that are older than some developers working here," says the city's CTO, Jean-Martin Thibault. "We have mainframes, all flavors of Windows, various flavors of Linux, old and new Oracle systems, Sun servers, all kinds of databases. Like all big corporations, some of the most important systems, like Budget and Human Resources, were developed on mainframes in-house over the past 30 years." There are over 1,000 applications in all, and most of them were running on different ecosystems. In 2015, a new management team decided to break down those silos, and invest in IT in order to move toward a more integrated governance for the city. They needed to figure out how to modernize the architecture.

-
+

Solution

-
- Company  City of Montréal     Location  Montréal, Québec, Canada     Industry  Government -
+

The first step was containerization. The team started with a small Docker farm with four or five servers, with Rancher for providing access to the Docker containers and their logs and Jenkins to deploy. "We based our effort on the new trends; we understood the benefits of immutability and deployments without downtime and such things," says Solutions Architect Marc Khouzam. They soon realized they needed orchestration as well, and opted for Kubernetes. Says Enterprise Architect Morgan Martinet: "Kubernetes offered concepts on how you would describe an architecture for any kind of application, and based on those concepts, deploy what's required to run the infrastructure. It was becoming a de facto standard."

-
-
-
-
-

Challenge

- Like many governments, Montréal has a number of legacy systems, and “we have systems that are older than some developers working here,” says the city’s CTO, Jean-Martin Thibault. “We have mainframes, all flavors of Windows, various flavors of Linux, old and new Oracle systems, Sun servers, all kinds of databases. Like all big corporations, some of the most important systems, like Budget and Human Resources, were developed on mainframes in-house over the past 30 years.” There are over 1,000 applications in all, and most of them were running on different ecosystems. In 2015, a new management team decided to break down those silos, and invest in IT in order to move toward a more integrated governance for the city. They needed to figure out how to modernize the architecture. - -

Solution

- The first step was containerization. The team started with a small Docker farm with four or five servers, with Rancher for providing access to the Docker containers and their logs and Jenkins to deploy. “We based our effort on the new trends; we understood the benefits of immutability and deployments without downtime and such things,” says Solutions Architect Marc Khouzam. They soon realized they needed orchestration as well, and opted for Kubernetes. Says Enterprise Architect Morgan Martinet: “Kubernetes offered concepts on how you would describe an architecture for any kind of application, and based on those concepts, deploy what’s required to run the infrastructure. It was becoming a de facto standard.” -

Impact

- The time to market has improved drastically, from many months to a few weeks. Deployments went from months to hours. “In the past, you would have to ask for virtual machines, and that alone could take weeks, easily,” says Thibault. “Now you don’t even have to ask for anything. You just create your project and it gets deployed.” Kubernetes has also improved the efficiency of how the city uses its compute resources: “Before, the 200 application components we currently run on Kubernetes would have required hundreds of virtual machines, and now, if we’re talking about a single environment of production, we are able to run them on 8 machines, counting the masters of Kubernetes,” says Martinet. And it’s all done with a small team of just 5 people operating the Kubernetes clusters. -
-
-
-
-
- "We realized the limitations of having a non-orchestrated Docker environment. Kubernetes came to the rescue, bringing in all these features that make it a lot easier to manage and give a lot more benefits to the users." -

- JEAN-MARTIN THIBAULT, CTO, CITY OF MONTRÉAL
-
-
-
-
-

The second biggest municipality in Canada, Montréal has a large number of legacy systems keeping the government running. And while they don’t quite date back to the city’s founding in 1642, “we have systems that are older than some developers working here,” jokes the city’s CTO, Jean-Martin Thibault.

- “We have mainframes, all flavors of Windows, various flavors of Linux, old and new Oracle systems, Sun servers, all kinds of databases. Some of the most important systems, like Budget and Human Resources, were developed on mainframes in-house over the past 30 years.” -

- In recent years, that fact became a big pain point. There are over 1,000 applications in all, running on almost as many different ecosystems. In 2015, a new city management team decided to break down those silos, and invest in IT in order to move toward a more integrated governance. “The organization was siloed, so as a result the architecture was siloed,” says Thibault. “Once we got integrated into one IT team, we decided to redo an overall enterprise architecture.” -

- The first step to modernize the architecture was containerization. “We based our effort on the new trends; we understood the benefits of immutability and deployments without downtime and such things,” says Solutions Architect Marc Khouzam. The team started with a small Docker farm with four or five servers, with Rancher for providing access to the Docker containers and their logs and Jenkins for deployment. -
-
-
-
- "Getting a project running in Kubernetes is entirely dependent on how long you need to program the actual software. It’s no longer dependent on deployment. Deployment is so fast that it’s negligible."

- MARC KHOUZAM, SOLUTIONS ARCHITECT, CITY OF MONTRÉAL
+

The time to market has improved drastically, from many months to a few weeks. Deployments went from months to hours. "In the past, you would have to ask for virtual machines, and that alone could take weeks, easily," says Thibault. "Now you don't even have to ask for anything. You just create your project and it gets deployed." Kubernetes has also improved the efficiency of how the city uses its compute resources: "Before, the 200 application components we currently run on Kubernetes would have required hundreds of virtual machines, and now, if we're talking about a single environment of production, we are able to run them on 8 machines, counting the masters of Kubernetes," says Martinet. And it's all done with a small team of just 5 people operating the Kubernetes clusters.

-
-
-
-
- But this Docker farm setup had some limitations, including the lack of self-healing and dynamic scaling based on traffic, and the effort required to optimize server resources and scale to multiple instances of the same container. The team soon realized they needed orchestration as well. “Kubernetes came to the rescue,” says Thibault, “bringing in all these features that make it a lot easier to manage and give a lot more benefits to the users.” -

- The team had evaluated several orchestration solutions, but Kubernetes stood out because it addressed all of the pain points. (They were also inspired by Yahoo! Japan’s use case, which the team members felt came close to their vision.) “Kubernetes offered concepts on how you would describe an architecture for any kind of application, and based on those concepts, deploy what’s required to run the infrastructure,” says Enterprise Architect Morgan Martinet. “It was becoming a de facto standard. It also promised portability across cloud providers. The choice of Kubernetes now gives us many options such as running clusters in-house or in any IaaS provider, or even using Kubernetes-as-a-service in any of the major cloud providers.” -

- Another important factor in the decision was vendor neutrality. “As a government entity, it is essential for us to be neutral in our selection of products and providers,” says Thibault. “The independence of the Cloud Native Computing Foundation from any company provides this.” -
-
-
-
- "Kubernetes has been great. It’s been stable, and it provides us with elasticity, resilience, and robustness. While re-architecting for Kubernetes, we also benefited from the monitoring and logging aspects, with centralized logging, Prometheus logging, and Grafana dashboards. We have enhanced visibility of what’s being deployed."

- MORGAN MARTINET, ENTERPRISE ARCHITECT, CITY OF MONTRÉAL
-
-
+{{< case-studies/quote author="JEAN-MARTIN THIBAULT, CTO, CITY OF MONTRÉAL" >}} +"We realized the limitations of having a non-orchestrated Docker environment. Kubernetes came to the rescue, bringing in all these features that make it a lot easier to manage and give a lot more benefits to the users." +{{< /case-studies/quote >}} -
-
- The Kubernetes implementation began with the deployment of a small cluster using an internal Ansible playbook, which was soon replaced by the Kismatic distribution. Given the complexity they saw in operating a Kubernetes platform, they decided to provide development groups with an automated CI/CD solution based on Helm. “An integrated CI/CD solution on Kubernetes standardized how the various development teams designed and deployed their solutions, but allowed them to remain independent,” says Khouzam. -

- During the re-architecting process, the team also added Prometheus for monitoring and alerting, Fluentd for logging, and Grafana for visualization. “We have enhanced visibility of what’s being deployed,” says Martinet. Adds Khouzam: “The big benefit is we can track anything, even things that don’t run inside the Kubernetes cluster. It’s our way to unify our monitoring effort.” -

- All together, the cloud native solution has had a positive impact on velocity as well as administrative overhead. With standardization, code generation, automatic deployments into Kubernetes, and standardized monitoring through Prometheus, the time to market has improved drastically, from many months to a few weeks. Deployments went from months and weeks of planning down to hours. “In the past, you would have to ask for virtual machines, and that alone could take weeks to properly provision,” says Thibault. Plus, for dedicated systems, experts often had to be brought in to install them with their own recipes, which could take weeks and months. -

- Now, says Khouzam, “we can deploy pretty much any application that’s been Dockerized without any help from anybody. Getting a project running in Kubernetes is entirely dependent on how long you need to program the actual software. It’s no longer dependent on deployment. Deployment is so fast that it’s negligible.” +{{< case-studies/lead >}} +The second biggest municipality in Canada, Montréal has a large number of legacy systems keeping the government running. And while they don't quite date back to the city's founding in 1642, "we have systems that are older than some developers working here," jokes the city's CTO, Jean-Martin Thibault. +{{< /case-studies/lead >}} -
+

"We have mainframes, all flavors of Windows, various flavors of Linux, old and new Oracle systems, Sun servers, all kinds of databases. Some of the most important systems, like Budget and Human Resources, were developed on mainframes in-house over the past 30 years."

-
-
-"We’re working with the market when possible, to put pressure on our vendors to support Kubernetes, because it’s a much easier solution to manage"

- MORGAN MARTINET, ENTERPRISE ARCHITECT, CITY OF MONTRÉAL
-
+

In recent years, that fact became a big pain point. There are over 1,000 applications in all, running on almost as many different ecosystems. In 2015, a new city management team decided to break down those silos, and invest in IT in order to move toward a more integrated governance. "The organization was siloed, so as a result the architecture was siloed," says Thibault. "Once we got integrated into one IT team, we decided to redo an overall enterprise architecture."

-
- Kubernetes has also improved the efficiency of how the city uses its compute resources: “Before, the 200 application components we currently run in Kubernetes would have required hundreds of virtual machines, and now, if we’re talking about a single environment of production, we are able to run them on 8 machines, counting the masters of Kubernetes,” says Martinet. And it’s all done with a small team of just five people operating the Kubernetes clusters. Adds Martinet: “It’s a dramatic improvement no matter what you measure.” -

- So it should come as no surprise that the team’s strategy going forward is to target Kubernetes as much as they can. “If something can’t run inside Kubernetes, we’ll wait for it,” says Thibault. That means they haven’t moved any of the city’s Windows systems onto Kubernetes, though it’s something they would like to do. “We’re working with the market when possible, to put pressure on our vendors to support Kubernetes, because it’s a much easier solution to manage,” says Martinet. -

- Thibault sees a near future where 60% of the city’s workloads are running on a Kubernetes platform—basically any and all of the use cases that they can get to work there. “It’s so much more efficient than the way we used to do things,” he says. “There’s no looking back.” +

The first step to modernize the architecture was containerization. "We based our effort on the new trends; we understood the benefits of immutability and deployments without downtime and such things," says Solutions Architect Marc Khouzam. The team started with a small Docker farm with four or five servers, with Rancher for providing access to the Docker containers and their logs and Jenkins for deployment.

-
-
+{{< case-studies/quote + image="/images/case-studies/montreal/banner3.jpg" + author="MARC KHOUZAM, SOLUTIONS ARCHITECT, CITY OF MONTRÉAL" +>}} +"Getting a project running in Kubernetes is entirely dependent on how long you need to program the actual software. It's no longer dependent on deployment. Deployment is so fast that it's negligible." +{{< /case-studies/quote >}} + +

But this Docker farm setup had some limitations, including the lack of self-healing and dynamic scaling based on traffic, and the effort required to optimize server resources and scale to multiple instances of the same container. The team soon realized they needed orchestration as well. "Kubernetes came to the rescue," says Thibault, "bringing in all these features that make it a lot easier to manage and give a lot more benefits to the users."

+ +

The team had evaluated several orchestration solutions, but Kubernetes stood out because it addressed all of the pain points. (They were also inspired by Yahoo! Japan's use case, which the team members felt came close to their vision.) "Kubernetes offered concepts on how you would describe an architecture for any kind of application, and based on those concepts, deploy what's required to run the infrastructure," says Enterprise Architect Morgan Martinet. "It was becoming a de facto standard. It also promised portability across cloud providers. The choice of Kubernetes now gives us many options such as running clusters in-house or in any IaaS provider, or even using Kubernetes-as-a-service in any of the major cloud providers."

+ +

Another important factor in the decision was vendor neutrality. "As a government entity, it is essential for us to be neutral in our selection of products and providers," says Thibault. "The independence of the Cloud Native Computing Foundation from any company provides this."

+ +{{< case-studies/quote + image="/images/case-studies/montreal/banner4.jpg" + author="MORGAN MARTINET, ENTERPRISE ARCHITECT, CITY OF MONTRÉAL" +>}} +"Kubernetes has been great. It's been stable, and it provides us with elasticity, resilience, and robustness. While re-architecting for Kubernetes, we also benefited from the monitoring and logging aspects, with centralized logging, Prometheus logging, and Grafana dashboards. We have enhanced visibility of what's being deployed." +{{< /case-studies/quote >}} + +

The Kubernetes implementation began with the deployment of a small cluster using an internal Ansible playbook, which was soon replaced by the Kismatic distribution. Given the complexity they saw in operating a Kubernetes platform, they decided to provide development groups with an automated CI/CD solution based on Helm. "An integrated CI/CD solution on Kubernetes standardized how the various development teams designed and deployed their solutions, but allowed them to remain independent," says Khouzam.

+ +

During the re-architecting process, the team also added Prometheus for monitoring and alerting, Fluentd for logging, and Grafana for visualization. "We have enhanced visibility of what's being deployed," says Martinet. Adds Khouzam: "The big benefit is we can track anything, even things that don't run inside the Kubernetes cluster. It's our way to unify our monitoring effort."

+ +

All together, the cloud native solution has had a positive impact on velocity as well as administrative overhead. With standardization, code generation, automatic deployments into Kubernetes, and standardized monitoring through Prometheus, the time to market has improved drastically, from many months to a few weeks. Deployments went from months and weeks of planning down to hours. "In the past, you would have to ask for virtual machines, and that alone could take weeks to properly provision," says Thibault. Plus, for dedicated systems, experts often had to be brought in to install them with their own recipes, which could take weeks and months.

+ +

Now, says Khouzam, "we can deploy pretty much any application that's been Dockerized without any help from anybody. Getting a project running in Kubernetes is entirely dependent on how long you need to program the actual software. It's no longer dependent on deployment. Deployment is so fast that it's negligible."

+ +{{< case-studies/quote author="MORGAN MARTINET, ENTERPRISE ARCHITECT, CITY OF MONTRÉAL">}} +"We're working with the market when possible, to put pressure on our vendors to support Kubernetes, because it's a much easier solution to manage" +{{< /case-studies/quote >}} + +

Kubernetes has also improved the efficiency of how the city uses its compute resources: "Before, the 200 application components we currently run in Kubernetes would have required hundreds of virtual machines, and now, if we're talking about a single environment of production, we are able to run them on 8 machines, counting the masters of Kubernetes," says Martinet. And it's all done with a small team of just five people operating the Kubernetes clusters. Adds Martinet: "It's a dramatic improvement no matter what you measure."

+ +

So it should come as no surprise that the team's strategy going forward is to target Kubernetes as much as they can. "If something can't run inside Kubernetes, we'll wait for it," says Thibault. That means they haven't moved any of the city's Windows systems onto Kubernetes, though it's something they would like to do. "We're working with the market when possible, to put pressure on our vendors to support Kubernetes, because it's a much easier solution to manage," says Martinet.

+ +

Thibault sees a near future where 60% of the city's workloads are running on a Kubernetes platform—basically any and all of the use cases that they can get to work there. "It's so much more efficient than the way we used to do things," he says. "There's no looking back."

diff --git a/content/en/case-studies/crowdfire/index.html b/content/en/case-studies/crowdfire/index.html index 227a5c0839..98caae2830 100644 --- a/content/en/case-studies/crowdfire/index.html +++ b/content/en/case-studies/crowdfire/index.html @@ -1,101 +1,85 @@ --- title: Crowdfire Case Study - case_study_styles: true cid: caseStudies -css: /css/style_crowdfire.css + +new_case_study_styles: true +heading_background: /images/case-studies/crowdfire/banner1.jpg +heading_title_logo: /images/crowdfire_logo.png +subheading: > + How to Keep Iterating a Fast-Growing App With a Cloud-Native Approach +case_study_details: + - Company: Crowdfire + - Location: Mumbai, India + - Industry: Social Media Software --- -
-

CASE STUDY:
How to Keep Iterating a Fast-Growing App With a Cloud-Native Approach

+

Challenge

-
+

Crowdfire helps content creators create their content anywhere on the Internet and publish it everywhere else in the right format. Since its launch in 2010, it has grown to 16 million users. The product began as a monolith app running on Google App Engine, and in 2015, the company began a transformation to microservices running on Amazon Web Services Elastic Beanstalk. "It was okay for our use cases initially, but as the number of services, development teams and scale increased, the deploy times, self-healing capabilities and resource utilization started to become problems for us," says Software Engineer Amanpreet Singh, who leads the infrastructure team for Crowdfire.

-
- Company  Crowdfire     Location  Mumbai, India     Industry  Social Media Software -
+

Solution

-
-
-
-
-

Challenge

- Crowdfire helps content creators create their content anywhere on the Internet and publish it everywhere else in the right format. Since its launch in 2010, it has grown to 16 million users. The product began as a monolith app running on Google App Engine, and in 2015, the company began a transformation to microservices running on Amazon Web Services Elastic Beanstalk. "It was okay for our use cases initially, but as the number of services, development teams and scale increased, the deploy times, self-healing capabilities and resource utilization started to become problems for us," says Software Engineer Amanpreet Singh, who leads the infrastructure team for Crowdfire.
-

Solution

- "We realized that we needed a more cloud-native approach to deal with these issues," says Singh. The team decided to implement a custom setup of Kubernetes based on Terraform and Ansible. -
-
- -
+

"We realized that we needed a more cloud-native approach to deal with these issues," says Singh. The team decided to implement a custom setup of Kubernetes based on Terraform and Ansible.

Impact

- "Kubernetes has helped us reduce the deployment time from 15 minutes to less than a minute," says Singh. "Due to Kubernetes’s self-healing nature, the operations team doesn’t need to do any manual intervention in case of a node or pod failure." Plus, he says, "Dev-Prod parity has improved since developers can experiment with options in dev/staging clusters, and when it’s finalized, they just commit the config changes in the respective code repositories. These changes automatically get replicated on the production cluster via CI/CD pipelines." -
-
-
-
-
- "In the 15 months that we’ve been using Kubernetes, it has been amazing for us. It enabled us to iterate quickly, increase development speed, and continuously deliver new features and bug fixes to our users, while keeping our operational costs and infrastructure management overhead under control."

- Amanpreet Singh, Software Engineer at Crowdfire
-
-
-
-
-

"If you build it, they will come."

- For most content creators, only half of that movie quote may ring true. Sure, platforms like Wordpress, YouTube and Shopify have made it simple for almost anyone to start publishing new content online, but attracting an audience isn’t as easy. Crowdfire "helps users publish their content to all possible places where their audience exists," says Amanpreet Singh, a Software Engineer at the company based in Mumbai, India. Crowdfire has gained more than 16 million users—from bloggers and artists to makers and small businesses—since its launch in 2010.

- With that kind of growth—and a high demand from users for new features and continuous improvements—the Crowdfire team struggled to keep up behind the scenes. In 2015, they moved their monolith Java application to Amazon Web Services Elastic Beanstalk and started breaking it down into microservices.

- It was a good first step, but the team soon realized they needed to go further down the cloud-native path, which would lead them to Kubernetes. "It was okay for our use cases initially, but as the number of services and development teams increased and we scaled further, deploy times, self-healing capabilities and resource utilization started to become problematic," says Singh, who leads the infrastructure team at Crowdfire. "We realized that we needed a more cloud-native approach to deal with these issues."

- As he looked around for solutions, Singh had a checklist of what Crowdfire needed. "We wanted to keep some things separate so they could be shipped independent of other things; this would help remove blockers and let different teams work at their own pace," he says. "We also make a lot of data-driven decisions, so shipping a feature and its iterations quickly was a must."

- Kubernetes checked all the boxes and then some. "One of the best things was the built-in service discovery," he says. "When you have a bunch of microservices that need to call each other, having internal DNS readily available and service IPs and ports automatically set as environment variables help a lot." Plus, he adds, "Kubernetes’s opinionated approach made it easier to get started." +

"Kubernetes has helped us reduce the deployment time from 15 minutes to less than a minute," says Singh. "Due to Kubernetes's self-healing nature, the operations team doesn't need to do any manual intervention in case of a node or pod failure." Plus, he says, "Dev-Prod parity has improved since developers can experiment with options in dev/staging clusters, and when it's finalized, they just commit the config changes in the respective code repositories. These changes automatically get replicated on the production cluster via CI/CD pipelines."

-
-
-
-
- "We realized that we needed a more cloud-native approach to deal with these issues," says Singh. The team decided to implement a custom setup of Kubernetes based on Terraform and Ansible." -
-
-
-
- There was another compelling business reason for the cloud-native approach. "In today’s world of ever-changing business requirements, using cloud native technology provides a variety of options to choose from—even the ability to run services in a hybrid cloud environment," says Singh. "Businesses can keep services in a region closest to the users, and thus benefit from high-availability and resiliency."

- So in February 2016, Singh set up a test Kubernetes cluster using the kube-up scripts provided. "I explored the features and was able to deploy an application pretty easily," he says. "However, it seemed like a black box since I didn’t understand the components completely, and had no idea what the kube-up script did under the hood. So when it broke, it was hard to find the issue and fix it."

- To get a better understanding, Singh dove into the internals of Kubernetes, reading the docs and even some of the code. And he looked to the Kubernetes community for more insight. "I used to stay up a little late every night (a lot of users were active only when it’s night here in India) and would try to answer questions on the Kubernetes community Slack from users who were getting started," he says. "I would also follow other conversations closely. I must admit I was able to avoid a lot of issues in our setup because I knew others had faced the same issues."

- Based on the knowledge he gained, Singh decided to implement a custom setup of Kubernetes based on Terraform and Ansible. "I wrote Terraform to launch Kubernetes master and nodes (Auto Scaling Groups) and an Ansible playbook to install the required components," he says. (The company recently switched to using prebaked AMIs to make the node bringup faster, and is planning to change its networking layer.)

+{{< case-studies/quote author="Amanpreet Singh, Software Engineer at Crowdfire" >}} +"In the 15 months that we've been using Kubernetes, it has been amazing for us. It enabled us to iterate quickly, increase development speed, and continuously deliver new features and bug fixes to our users, while keeping our operational costs and infrastructure management overhead under control." +{{< /case-studies/quote >}} -
-
-
-
- "Kubernetes helped us reduce the deployment time from 15 minutes to less than a minute. Due to Kubernetes’s self-healing nature, the operations team doesn’t need to do any manual intervention in case of a node or pod failure." -
-
+{{< case-studies/lead >}} +"If you build it, they will come." +{{< /case-studies/lead >}} -
-
- First, the team migrated a few staging services from Elastic Beanstalk to the new Kubernetes staging cluster, and then set up a production cluster a month later to deploy some services. The results were convincing. "By the end of March 2016, we established that all the new services must be deployed on Kubernetes," says Singh. "Kubernetes helped us reduce the deployment time from 15 minutes to less than a minute. Due to Kubernetes’s self-healing nature, the operations team doesn’t need to do any manual intervention in case of a node or pod failure." On top of that, he says, "Dev-Prod parity has improved since developers can experiment with options in dev/staging clusters, and when it’s finalized, they just commit the config changes in the respective code repositories. These changes automatically get replicated on the production cluster via CI/CD pipelines. This brings more visibility into the changes being made, and keeping an audit trail."

- Over the next six months, the team worked on migrating all the services from Elastic Beanstalk to Kubernetes, except for the few that were deprecated and would soon be terminated anyway. The services were moved one at a time, and their performance was monitored for two to three days each. Today, "We’re completely migrated and we run all new services on Kubernetes," says Singh.

- The impact has been considerable: With Kubernetes, the company has experienced a 90% cost savings on Elastic Load Balancer, which is now only used for their public, user-facing services. Their EC2 operating expenses have been decreased by as much as 50%.

- All 30 engineers at Crowdfire were onboarded at once. "I gave an internal talk where I shared the basic components and demoed the usage of kubectl," says Singh. "Everyone was excited and happy about using Kubernetes. Developers have more control and visibility into their applications running in production now. Most of all, they’re happy with the low deploy times and self-healing services."

- And they’re much more productive, too. "Where we used to do about 5 deployments per day," says Singh, "now we’re doing 30+ production and 50+ staging deployments almost every day." +

For most content creators, only half of that movie quote may ring true. Sure, platforms like Wordpress, YouTube and Shopify have made it simple for almost anyone to start publishing new content online, but attracting an audience isn't as easy. Crowdfire "helps users publish their content to all possible places where their audience exists," says Amanpreet Singh, a Software Engineer at the company based in Mumbai, India. Crowdfire has gained more than 16 million users—from bloggers and artists to makers and small businesses—since its launch in 2010.

+

With that kind of growth—and a high demand from users for new features and continuous improvements—the Crowdfire team struggled to keep up behind the scenes. In 2015, they moved their monolith Java application to Amazon Web Services Elastic Beanstalk and started breaking it down into microservices.

-
-
-
-
- The impact has been considerable: With Kubernetes, the company has experienced a 90% cost savings on Elastic Load Balancer, which is now only used for their public, user-facing services. Their EC2 operating expenses have been decreased by as much as 50%. +

It was a good first step, but the team soon realized they needed to go further down the cloud-native path, which would lead them to Kubernetes. "It was okay for our use cases initially, but as the number of services and development teams increased and we scaled further, deploy times, self-healing capabilities and resource utilization started to become problematic," says Singh, who leads the infrastructure team at Crowdfire. "We realized that we needed a more cloud-native approach to deal with these issues."

-
-
-
-
+

As he looked around for solutions, Singh had a checklist of what Crowdfire needed. "We wanted to keep some things separate so they could be shipped independent of other things; this would help remove blockers and let different teams work at their own pace," he says. "We also make a lot of data-driven decisions, so shipping a feature and its iterations quickly was a must."

- Singh notes that almost all of the engineers interact with the staging cluster on a daily basis, and that has created a cultural change at Crowdfire. "Developers are more aware of the cloud infrastructure now," he says. "They’ve started following cloud best practices like better health checks, structured logs to stdout [standard output], and config via files or environment variables."

- With Crowdfire’s commitment to Kubernetes, Singh is looking to expand the company’s cloud-native stack. The team already uses Prometheus for monitoring, and he says he is evaluating Linkerd and Envoy Proxy as a way to "get more metrics about request latencies and failures, and handle them better." Other CNCF projects, including OpenTracing and gRPC are also on his radar.

- Singh has found that the cloud-native community is growing in India, too, particularly in Bangalore. "A lot of startups and new companies are starting to run their infrastructure on Kubernetes," he says.

- And when people ask him about Crowdfire’s experience, he has this advice to offer: "Kubernetes is a great piece of technology, but it might not be right for you, especially if you have just one or two services or your app isn’t easy to run in a containerized environment," he says. "Assess your situation and the value that Kubernetes provides before going all in. If you do decide to use Kubernetes, make sure you understand the components that run under the hood and what role they play in smoothly running the cluster. Another thing to consider is if your apps are ‘Kubernetes-ready,’ meaning if they have proper health checks and handle termination signals to shut down gracefully."

- And if your company fits that profile, go for it. Crowdfire clearly did—and is now reaping the benefits. "In the 15 months that we’ve been using Kubernetes, it has been amazing for us," says Singh. "It enabled us to iterate quickly, increase development speed and continuously deliver new features and bug fixes to our users, while keeping our operational costs and infrastructure management overhead under control." +

Kubernetes checked all the boxes and then some. "One of the best things was the built-in service discovery," he says. "When you have a bunch of microservices that need to call each other, having internal DNS readily available and service IPs and ports automatically set as environment variables help a lot." Plus, he adds, "Kubernetes's opinionated approach made it easier to get started."

+{{< case-studies/quote image="/images/case-studies/crowdfire/banner3.jpg" >}} +"We realized that we needed a more cloud-native approach to deal with these issues," says Singh. The team decided to implement a custom setup of Kubernetes based on Terraform and Ansible." +{{< /case-studies/quote >}} -
-
+

There was another compelling business reason for the cloud-native approach. "In today's world of ever-changing business requirements, using cloud native technology provides a variety of options to choose from—even the ability to run services in a hybrid cloud environment," says Singh. "Businesses can keep services in a region closest to the users, and thus benefit from high-availability and resiliency."

+ +

So in February 2016, Singh set up a test Kubernetes cluster using the kube-up scripts provided. "I explored the features and was able to deploy an application pretty easily," he says. "However, it seemed like a black box since I didn't understand the components completely, and had no idea what the kube-up script did under the hood. So when it broke, it was hard to find the issue and fix it."

+ +

To get a better understanding, Singh dove into the internals of Kubernetes, reading the docs and even some of the code. And he looked to the Kubernetes community for more insight. "I used to stay up a little late every night (a lot of users were active only when it's night here in India) and would try to answer questions on the Kubernetes community Slack from users who were getting started," he says. "I would also follow other conversations closely. I must admit I was able to avoid a lot of issues in our setup because I knew others had faced the same issues."

+ +

Based on the knowledge he gained, Singh decided to implement a custom setup of Kubernetes based on Terraform and Ansible. "I wrote Terraform to launch Kubernetes master and nodes (Auto Scaling Groups) and an Ansible playbook to install the required components," he says. (The company recently switched to using prebaked AMIs to make the node bringup faster, and is planning to change its networking layer.)

+ +{{< case-studies/quote image="/images/case-studies/crowdfire/banner4.jpg" >}} +"Kubernetes helped us reduce the deployment time from 15 minutes to less than a minute. Due to Kubernetes's self-healing nature, the operations team doesn't need to do any manual intervention in case of a node or pod failure." +{{< /case-studies/quote >}} + +

First, the team migrated a few staging services from Elastic Beanstalk to the new Kubernetes staging cluster, and then set up a production cluster a month later to deploy some services. The results were convincing. "By the end of March 2016, we established that all the new services must be deployed on Kubernetes," says Singh. "Kubernetes helped us reduce the deployment time from 15 minutes to less than a minute. Due to Kubernetes's self-healing nature, the operations team doesn't need to do any manual intervention in case of a node or pod failure." On top of that, he says, "Dev-Prod parity has improved since developers can experiment with options in dev/staging clusters, and when it's finalized, they just commit the config changes in the respective code repositories. These changes automatically get replicated on the production cluster via CI/CD pipelines. This brings more visibility into the changes being made, and keeping an audit trail."

+ +

Over the next six months, the team worked on migrating all the services from Elastic Beanstalk to Kubernetes, except for the few that were deprecated and would soon be terminated anyway. The services were moved one at a time, and their performance was monitored for two to three days each. Today, "We're completely migrated and we run all new services on Kubernetes," says Singh.

+ +

The impact has been considerable: With Kubernetes, the company has experienced a 90% cost savings on Elastic Load Balancer, which is now only used for their public, user-facing services. Their EC2 operating expenses have been decreased by as much as 50%.

+ +

All 30 engineers at Crowdfire were onboarded at once. "I gave an internal talk where I shared the basic components and demoed the usage of kubectl," says Singh. "Everyone was excited and happy about using Kubernetes. Developers have more control and visibility into their applications running in production now. Most of all, they're happy with the low deploy times and self-healing services."

+ +

And they're much more productive, too. "Where we used to do about 5 deployments per day," says Singh, "now we're doing 30+ production and 50+ staging deployments almost every day."

+ +{{< case-studies/quote >}} +The impact has been considerable: With Kubernetes, the company has experienced a 90% cost savings on Elastic Load Balancer, which is now only used for their public, user-facing services. Their EC2 operating expenses have been decreased by as much as 50%. +{{< /case-studies/quote >}} + +

Singh notes that almost all of the engineers interact with the staging cluster on a daily basis, and that has created a cultural change at Crowdfire. "Developers are more aware of the cloud infrastructure now," he says. "They've started following cloud best practices like better health checks, structured logs to stdout [standard output], and config via files or environment variables."

+ +

With Crowdfire's commitment to Kubernetes, Singh is looking to expand the company's cloud-native stack. The team already uses Prometheus for monitoring, and he says he is evaluating Linkerd and Envoy Proxy as a way to "get more metrics about request latencies and failures, and handle them better." Other CNCF projects, including OpenTracing and gRPC are also on his radar.

+ +

Singh has found that the cloud-native community is growing in India, too, particularly in Bangalore. "A lot of startups and new companies are starting to run their infrastructure on Kubernetes," he says.

+ +

And when people ask him about Crowdfire's experience, he has this advice to offer: "Kubernetes is a great piece of technology, but it might not be right for you, especially if you have just one or two services or your app isn't easy to run in a containerized environment," he says. "Assess your situation and the value that Kubernetes provides before going all in. If you do decide to use Kubernetes, make sure you understand the components that run under the hood and what role they play in smoothly running the cluster. Another thing to consider is if your apps are 'Kubernetes-ready,' meaning if they have proper health checks and handle termination signals to shut down gracefully."

+ +

And if your company fits that profile, go for it. Crowdfire clearly did—and is now reaping the benefits. "In the 15 months that we've been using Kubernetes, it has been amazing for us," says Singh. "It enabled us to iterate quickly, increase development speed and continuously deliver new features and bug fixes to our users, while keeping our operational costs and infrastructure management overhead under control."

diff --git a/content/en/case-studies/denso/index.html b/content/en/case-studies/denso/index.html index 27ef1c77ed..31fb1279b9 100644 --- a/content/en/case-studies/denso/index.html +++ b/content/en/case-studies/denso/index.html @@ -3,108 +3,80 @@ title: Denso Case Study linkTitle: Denso case_study_styles: true cid: caseStudies -css: /css/case-studies-gradient.css logo: denso_featured_logo.svg featured: true weight: 4 quote: > We got Kubernetes experts involved on our team, and it dramatically accelerated development speed. + +new_case_study_styles: true +heading_background: /images/case-studies/denso/banner2.jpg +heading_title_text: Denso +use_gradient_overlay: true +subheading: > + How DENSO Is Fueling Development on the Vehicle Edge with Kubernetes +case_study_details: + - Company: Denso + - Location: Japan + - Industry: Automotive, Edge --- +

Challenge

-
-

CASE STUDY: Denso

-
How DENSO Is Fueling Development on the Vehicle Edge with Kubernetes
-
+

DENSO Corporation is one of the biggest automotive components suppliers in the world. With the advent of connected cars, the company launched a Digital Innovation Department to expand into software, working on vehicle edge and vehicle cloud products. But there were several technical challenges to creating an integrated vehicle edge/cloud platform: "the amount of computing resources, the occasional lack of mobile signal, and an enormous number of distributed vehicles," says R&D Product Manager Seiichi Koizumi.

-
- Company  Denso     Location  Japan     Industry  Automotive, Edge -
+

Solution

-
-
-
-
-

Challenge

- DENSO Corporation is one of the biggest automotive components suppliers in the world. With the advent of connected cars, the company launched a Digital Innovation Department to expand into software, working on vehicle edge and vehicle cloud products. But there were several technical challenges to creating an integrated vehicle edge/cloud platform: "the amount of computing resources, the occasional lack of mobile signal, and an enormous number of distributed vehicles," says R&D Product Manager Seiichi Koizumi. - - -

Solution

- Koizumi’s team realized that because mobility services evolve every day, they needed the flexibility of the cloud native ecosystem for their platform. After considering other orchestrators, DENSO went with Kubernetes for orchestration and added Prometheus, Fluentd, Envoy, Istio, and Helm to the platform. Today, DENSO is using a vehicle edge computer, a private Kubernetes cloud, and managed Kubernetes (GKE, EKS, AKS). +

Koizumi's team realized that because mobility services evolve every day, they needed the flexibility of the cloud native ecosystem for their platform. After considering other orchestrators, DENSO went with Kubernetes for orchestration and added Prometheus, Fluentd, Envoy, Istio, and Helm to the platform. Today, DENSO is using a vehicle edge computer, a private Kubernetes cloud, and managed Kubernetes (GKE, EKS, AKS).

Impact

- Critical layer features can take 2-3 years to implement in the traditional, waterfall model of development at DENSO. With the Kubernetes platform and agile methods, there’s a 2-month development cycle for non-critical software. Now, ten new applications are released a year, and a new prototype is introduced every week. "By utilizing Kubernetes managed services, such as GKE/EKS/AKS, we can unify the environment and simplify our maintenance operation," says Koizumi. -
-
-
-
- "Another disruptive innovation is coming, so to survive in this situation, we need to change our culture." -

- SEIICHI KOIZUMI, R&D PRODUCT MANAGER, DIGITAL INNOVATION DEPARTMENT AT DENSO

-
-
+

Critical layer features can take 2-3 years to implement in the traditional, waterfall model of development at DENSO. With the Kubernetes platform and agile methods, there's a 2-month development cycle for non-critical software. Now, ten new applications are released a year, and a new prototype is introduced every week. "By utilizing Kubernetes managed services, such as GKE/EKS/AKS, we can unify the environment and simplify our maintenance operation," says Koizumi.

-
-
-

Spun off from Toyota in 1949, DENSO Corporation is one of the top automotive suppliers in the world today, with consolidated net revenue of $48.3 billion. -

+{{< case-studies/quote + image="/images/case-studies/denso/banner1.png" + author="SEIICHI KOIZUMI, R&D PRODUCT MANAGER, DIGITAL INNOVATION DEPARTMENT AT DENSO" +>}} +"Another disruptive innovation is coming, so to survive in this situation, we need to change our culture." +{{< /case-studies/quote >}} -

The company’s mission is "contributing to a better world by creating value together with a vision for the future"—and part of that vision in recent years has been development on the vehicle edge and vehicle cloud.

+{{< case-studies/lead >}} +Spun off from Toyota in 1949, DENSO Corporation is one of the top automotive suppliers in the world today, with consolidated net revenue of $48.3 billion. +{{< /case-studies/lead >}} -

With the advent of connected cars, DENSO established a Digital Innovation Department to expand its business beyond the critical layer of the engine, braking systems, and other automotive parts into the non-critical analytics and entertainment layer. Comparing connected cars to smartphones, R&D Product Manager Seiichi Koizumi says DENSO wants the ability to quickly and easily develop and install apps for the "blank slate" of the car, and iterate them based on the driver’s preferences. Thus "we need a flexible application platform," he says.

+

The company's mission is "contributing to a better world by creating value together with a vision for the future"—and part of that vision in recent years has been development on the vehicle edge and vehicle cloud.

+ +

With the advent of connected cars, DENSO established a Digital Innovation Department to expand its business beyond the critical layer of the engine, braking systems, and other automotive parts into the non-critical analytics and entertainment layer. Comparing connected cars to smartphones, R&D Product Manager Seiichi Koizumi says DENSO wants the ability to quickly and easily develop and install apps for the "blank slate" of the car, and iterate them based on the driver's preferences. Thus "we need a flexible application platform," he says.

But working on vehicle edge and vehicle cloud products meant there were several technical challenges: "the amount of computing resources, the occasional lack of mobile signal, and an enormous number of distributed vehicles," says Koizumi. "We are tackling these challenges to create an integrated vehicle edge/cloud platform."

+{{< case-studies/quote author="SEIICHI KOIZUMI, R&D PRODUCT MANAGER, DIGITAL INNOVATION DEPARTMENT AT DENSO" >}} +"We got Kubernetes experts involved on our team, and it dramatically accelerated development speed." +{{< /case-studies/quote >}} -
-
+

Koizumi's team realized that because mobility services evolve every day, they needed the flexibility of the cloud native ecosystem for their platform. As they evaluated technologies, they were led by these criteria: Because their service-enabler business needed to support multiple cloud and on-premise environments, the solution needed to be cloud agnostic, with no vendor lock-in and open governance. It also had to support an edge-cloud integrated environment.

-
-
- "We got Kubernetes experts involved on our team, and it dramatically accelerated development speed."

— SEIICHI KOIZUMI, R&D PRODUCT MANAGER, DIGITAL INNOVATION DEPARTMENT AT DENSO

-
-
+

After considering other orchestrators, DENSO went with Kubernetes for orchestration and added Prometheus, Fluentd, Envoy, Istio, and Helm to the platform. During implementation, the team used "design thinking to clarify use cases and their value proposition," says Koizumi. Next, an agile development team worked on a POC, then an MVP, in DevOps style. "Even in the development phase, we are keeping a channel to end users," he adds.

-
-
-

- Koizumi’s team realized that because mobility services evolve every day, they needed the flexibility of the cloud native ecosystem for their platform. As they evaluated technologies, they were led by these criteria: Because their service-enabler business needed to support multiple cloud and on-premise environments, the solution needed to be cloud agnostic, with no vendor lock-in and open governance. It also had to support an edge-cloud integrated environment.

-

- After considering other orchestrators, DENSO went with Kubernetes for orchestration and added Prometheus, Fluentd, Envoy, Istio, and Helm to the platform. During implementation, the team used "design thinking to clarify use cases and their value proposition," says Koizumi. Next, an agile development team worked on a POC, then an MVP, in DevOps style. "Even in the development phase, we are keeping a channel to end users," he adds.

-

- One lesson learned during this process was the value of bringing in experts. "We tried to learn Kubernetes and cloud native technologies from scratch, but it took more time than expected," says Koizumi. "We got Kubernetes experts involved on our team, and it dramatically accelerated development speed."

+

One lesson learned during this process was the value of bringing in experts. "We tried to learn Kubernetes and cloud native technologies from scratch, but it took more time than expected," says Koizumi. "We got Kubernetes experts involved on our team, and it dramatically accelerated development speed."

-

-
-
+{{< case-studies/quote + image="/images/case-studies/denso/banner4.jpg" + author="SEIICHI KOIZUMI, R&D PRODUCT MANAGER, DIGITAL INNOVATION DEPARTMENT AT DENSO" +>}} +"By utilizing Kubernetes managed services, such as GKE/EKS/AKS, we can unify the environment and simplify our maintenance operation." +{{< /case-studies/quote >}} +

Today, DENSO is using a vehicle edge computer, a private Kubernetes cloud, and managed Kubernetes on GKE, EKS, and AKS. "We are developing a vehicle edge/cloud integrated platform based on a microservice and service mesh architecture," says Koizumi. "We extend cloud into multiple vehicle edges and manage it as a unified platform."

-
-
- "By utilizing Kubernetes managed services, such as GKE/EKS/AKS, we can unify the environment and simplify our maintenance operation."

- SEIICHI KOIZUMI, R&D PRODUCT MANAGER, DIGITAL INNOVATION DEPARTMENT AT DENSO

-
-
+

Cloud native has enabled DENSO to deliver applications via its new dash cam, which has a secure connection that collects data to the cloud. "It's like a smartphone," he says. "We are installing new applications and getting the data through the cloud, and we can keep updating new applications all through the dash cam."

-
-
-

- Today, DENSO is using a vehicle edge computer, a private Kubernetes cloud, and managed Kubernetes on GKE, EKS, and AKS. "We are developing a vehicle edge/cloud integrated platform based on a microservice and service mesh architecture," says Koizumi. "We extend cloud into multiple vehicle edges and manage it as a unified platform."

-

- Cloud native has enabled DENSO to deliver applications via its new dash cam, which has a secure connection that collects data to the cloud. "It’s like a smartphone," he says. "We are installing new applications and getting the data through the cloud, and we can keep updating new applications all through the dash cam."

-

- The unified cloud native platform, combined with agile development, has had a positive impact on productivity. Critical layer features—those involving engines or braking systems, for example—can take 2-3 years to implement at DENSO, because of the time needed to test safety, but also because of the traditional, waterfall model of development. With the Kubernetes platform and agile methods, there’s a 2-month development cycle for non-critical software. Now, ten new applications are released a year, and with the department’s scrum-style development, a new prototype is introduced every week.

-

- Application portability has also led to greater developer efficiency. "There’s no need to care about differences in the multi-cloud platform anymore," says Koizumi. Now, "we are also trying to have the same portability between vehicle edge and cloud platform." -

-

- Another improvement: Automotive Tier-1 suppliers like DENSO always have multiple Tier-2 suppliers. "To provide automotive-grade high-availability services, we tried to do the same thing on a multi-cloud platform," says Koizumi. Before Kubernetes, maintaining two different systems simultaneously was difficult. "By utilizing Kubernetes managed services, such as GKE/EKS/AKS, we can unify the environment and simplify our maintenance operation," he says. -

-

-Cloud native has also profoundly changed the culture at DENSO. The Digital Innovation Department is known as "Noah’s Ark," and it has grown from 2 members to 70—with plans to more than double in the next year. The way they operate is completely different from the traditional Japanese automotive culture. But just as the company embraced change brought by hybrid cars in the past decade, Koizumi says, they’re doing it again now, as technology companies have moved into the connected car space. "Another disruptive innovation is coming," he says, "so to survive in this situation, we need to change our culture." -

-

-Looking ahead, Koizumi and his team are expecting serverless and zero-trust security architecture to be important enhancements of Kubernetes. They are glad DENSO has come along for the ride. "Mobility service businesses require agility and flexibility," he says. "DENSO is trying to bring cloud native flexibility into the vehicle infrastructure." -

-
-
- +

The unified cloud native platform, combined with agile development, has had a positive impact on productivity. Critical layer features—those involving engines or braking systems, for example—can take 2-3 years to implement at DENSO, because of the time needed to test safety, but also because of the traditional, waterfall model of development. With the Kubernetes platform and agile methods, there's a 2-month development cycle for non-critical software. Now, ten new applications are released a year, and with the department's scrum-style development, a new prototype is introduced every week.

+ +

Application portability has also led to greater developer efficiency. "There's no need to care about differences in the multi-cloud platform anymore," says Koizumi. Now, "we are also trying to have the same portability between vehicle edge and cloud platform."

+ +

Another improvement: Automotive Tier-1 suppliers like DENSO always have multiple Tier-2 suppliers. "To provide automotive-grade high-availability services, we tried to do the same thing on a multi-cloud platform," says Koizumi. Before Kubernetes, maintaining two different systems simultaneously was difficult. "By utilizing Kubernetes managed services, such as GKE/EKS/AKS, we can unify the environment and simplify our maintenance operation," he says.

+ +

Cloud native has also profoundly changed the culture at DENSO. The Digital Innovation Department is known as "Noah's Ark," and it has grown from 2 members to 70—with plans to more than double in the next year. The way they operate is completely different from the traditional Japanese automotive culture. But just as the company embraced change brought by hybrid cars in the past decade, Koizumi says, they're doing it again now, as technology companies have moved into the connected car space. "Another disruptive innovation is coming," he says, "so to survive in this situation, we need to change our culture."

+ +

Looking ahead, Koizumi and his team are expecting serverless and zero-trust security architecture to be important enhancements of Kubernetes. They are glad DENSO has come along for the ride. "Mobility service businesses require agility and flexibility," he says. "DENSO is trying to bring cloud native flexibility into the vehicle infrastructure."

diff --git a/content/en/case-studies/golfnow/index.html b/content/en/case-studies/golfnow/index.html index f4bf4d4f27..9e82d90cd1 100644 --- a/content/en/case-studies/golfnow/index.html +++ b/content/en/case-studies/golfnow/index.html @@ -1,125 +1,89 @@ --- title: GolfNow Case Study - case_study_styles: true cid: caseStudies -css: /css/style_golfnow.css + +new_case_study_styles: true +heading_background: /images/case-studies/golfnow/banner1.jpg +heading_title_logo: /images/golfnow_logo.png +subheading: > + Saving Time and Money with Cloud Native Infrastructure +case_study_details: + - Company: GolfNow + - Location: Orlando, Florida + - Industry: Golf Industry Technology and Services Provider --- -
-

CASE STUDY:
-
Saving Time and Money with Cloud Native Infrastructure
-

-
+

Challenge

-
- Company GolfNow     Location Orlando, Florida     Industry Golf Industry Technology and Services Provider -
+

A member of the NBC Sports Group, GolfNow is the golf industry's technology and services leader, managing 10 different products, as well as the largest e-commerce tee time marketplace in the world. As its business began expanding rapidly and globally, GolfNow's monolithic application became problematic. "We kept growing our infrastructure vertically rather than horizontally, and the cost of doing business became problematic," says Sheriff Mohamed, GolfNow's Director, Architecture. "We wanted the ability to more easily expand globally."

-
+

Solution

-
-
-
+

Turning to microservices and containerization, GolfNow began moving its applications and databases from third-party services to its own clusters running on Docker and Kubernetes.

-

Challenge

- A member of the NBC Sports Group, GolfNow is the golf industry’s technology and services leader, managing 10 different products, as well as the largest e-commerce tee time marketplace in the world. As its business began expanding rapidly and globally, GolfNow’s monolithic application became problematic. "We kept growing our infrastructure vertically rather than horizontally, and the cost of doing business became problematic," says Sheriff Mohamed, GolfNow’s Director, Architecture. "We wanted the ability to more easily expand globally." -
-
+

Impact

-
-

Solution

- Turning to microservices and containerization, GolfNow began moving its applications and databases from third-party services to its own clusters running on Docker and Kubernetes.

+

The results were immediate. While maintaining the same capacity—and beyond, during peak periods—GolfNow saw its infrastructure costs for the first application virtually cut in half.

-

Impact

- The results were immediate. While maintaining the same capacity—and beyond, during peak periods—GolfNow saw its infrastructure costs for the first application virtually cut in half. -
-
-
+{{< case-studies/quote author="SHERIFF MOHAMED, DIRECTOR, ARCHITECTURE AT GOLFNOW" >}} +"With our growth we obviously needed to expand our infrastructure, and we kept growing vertically rather than horizontally. We were basically wasting money and doubling the cost of our infrastructure." +{{< /case-studies/quote >}} +{{< case-studies/lead >}} +It's not every day that you can say you've slashed an operating expense by half. +{{< /case-studies/lead >}} -
-
- "With our growth we obviously needed to expand our infrastructure, and we kept growing vertically rather than horizontally. We were basically wasting money and doubling the cost of our infrastructure."

- SHERIFF MOHAMED, DIRECTOR, ARCHITECTURE AT GOLFNOW -
-
+

But Sheriff Mohamed and Josh Chandler did just that when they helped lead their company, GolfNow, on a journey from a monolithic to a containerized, cloud native infrastructure managed by Kubernetes.

-
-
-

It’s not every day that you can say you’ve slashed an operating expense by half.

+

A top-performing business within the NBC Sports Group, GolfNow is a technology and services company with the largest tee time marketplace in the world. GolfNow serves 5 million active golfers across 10 different products. In recent years, the business had grown so fast that the infrastructure supporting their giant monolithic application (written in C#.NET and backed by SQL Server database management system) could not keep up. "With our growth we obviously needed to expand our infrastructure, and we kept growing vertically rather than horizontally," says Sheriff, GolfNow's Director, Architecture. "Our costs were growing exponentially. And on top of that, we had to build a Disaster Recovery (DR) environment, which then meant we'd have to copy exactly what we had in our original data center to another data center that was just the standby. We were basically wasting money and doubling the cost of our infrastructure."

- But Sheriff Mohamed and Josh Chandler did just that when they helped lead their company, GolfNow, on a journey from a monolithic to a containerized, cloud native infrastructure managed by Kubernetes. -

- A top-performing business within the NBC Sports Group, GolfNow is a technology and services company with the largest tee time marketplace in the world. GolfNow serves 5 million active golfers across 10 different products. In recent years, the business had grown so fast that the infrastructure supporting their giant monolithic application (written in C#.NET and backed by SQL Server database management system) could not keep up. "With our growth we obviously needed to expand our infrastructure, and we kept growing vertically rather than horizontally," says Sheriff, GolfNow’s Director, Architecture. "Our costs were growing exponentially. And on top of that, we had to build a Disaster Recovery (DR) environment, which then meant we’d have to copy exactly what we had in our original data center to another data center that was just the standby. We were basically wasting money and doubling the cost of our infrastructure." -

- In moving just the first of GolfNow’s important applications—a booking engine for golf courses and B2B marketing platform—from third-party services to their own Kubernetes environment, "our bill went down drastically," says Sheriff. -

- The path to those stellar results began in late 2014. In order to support GolfNow’s global growth, the team decided that the company needed to have multiple data centers and the ability to quickly and easily re-route traffic as needed. "From there we knew that we needed to go in a direction of breaking things apart, microservices, and containerization," says Sheriff. "At the time we were trying to get away from C#.NET and SQL Server since it didn’t run very well on Linux, where everything container was running smoothly." -

- To that end, the team shifted to working with Node.js, the open-source, cross-platform JavaScript runtime environment for developing tools and applications, and MongoDB, the open-source database program. At the time, Docker, the platform for deploying applications in containers, was still new. But once the team began experimenting with it, Sheriff says, "we realized that was the way we wanted to go, especially since that’s the way the industry is heading." -
-
+

In moving just the first of GolfNow's important applications—a booking engine for golf courses and B2B marketing platform—from third-party services to their own Kubernetes environment, "our bill went down drastically," says Sheriff.

-
-
- "The team migrated the rest of the application into their Kubernetes cluster. And the impact was immediate: On top of cutting monthly costs by a large percentage, says Sheriff, 'Running at the same capacity and during our peak time, we were able to horizontally grow. Since we were using our VMs more efficiently with containers, we didn’t have to pay extra money at all.'" -
-
+

The path to those stellar results began in late 2014. In order to support GolfNow's global growth, the team decided that the company needed to have multiple data centers and the ability to quickly and easily re-route traffic as needed. "From there we knew that we needed to go in a direction of breaking things apart, microservices, and containerization," says Sheriff. "At the time we were trying to get away from C#.NET and SQL Server since it didn't run very well on Linux, where everything container was running smoothly."

-
-
- GolfNow’s dev team ran an "internal, low-key" proof of concept and were won over. "We really liked how easy it was to be able to pass containers around to each other and have them up and running in no time, exactly the way it was running on my machine," says Sheriff. "Because that is always the biggest gripe that Ops has with developers, right? ‘It worked on my machine!’ But then we started getting to the point of, ‘How do we make sure that these things stay up and running?’"

- That led the team on a quest to find the right orchestration system for the company’s needs. Sheriff says the first few options they tried were either too heavy or "didn’t feel quite right." In late summer 2015, they discovered the just-released Kubernetes, which Sheriff immediately liked for its ease of use. "We did another proof of concept," he says, "and Kubernetes won because of the fact that the community backing was there, built on top of what Google had already done." -

- But before they could go with Kubernetes, NBC, GolfNow’s parent company, also asked them to comparison shop with another company. Sheriff and his team liked the competing company’s platform user interface, but didn’t like that its platform would not allow containers to run natively on Docker. With no clear decision in sight, Sheriff’s VP at GolfNow, Steve McElwee, set up a three-month trial during which a GolfNow team (consisting of Sheriff and Josh, who’s now Lead Architect, Open Platforms) would build out a Kubernetes environment, and a large NBC team would build out one with the other company’s platform. -

- "We spun up the cluster and we tried to get everything to run the way we wanted it to run," Sheriff says. "The biggest thing that we took away from it is that not only did we want our applications to run within Kubernetes and Docker, we also wanted our databases to run there. We literally wanted our entire infrastructure to run within Kubernetes." -

- At the time there was nothing in the community to help them get Kafka and MongoDB clusters running within a Kubernetes and Docker environment, so Sheriff and Josh figured it out on their own, taking a full month to get it right. "Everything started rolling from there," Sheriff says. "We were able to get all our applications connected, and we finished our side of the proof of concept a month in advance. My VP was like, ‘Alright, it’s over. Kubernetes wins.’" -

- The next step, beginning in January 2016, was getting everything working in production. The team focused first on one application that was already written in Node.js and MongoDB. A booking engine for golf courses and B2B marketing platform, the application was already going in the microservice direction but wasn’t quite finished yet. At the time, it was running in Heroku Compose and other third-party services—resulting in a large monthly bill. +

To that end, the team shifted to working with Node.js, the open-source, cross-platform JavaScript runtime environment for developing tools and applications, and MongoDB, the open-source database program. At the time, Docker, the platform for deploying applications in containers, was still new. But once the team began experimenting with it, Sheriff says, "we realized that was the way we wanted to go, especially since that's the way the industry is heading."

-
-
+{{< case-studies/quote image="/images/case-studies/golfnow/banner3.jpg" >}} +"The team migrated the rest of the application into their Kubernetes cluster. And the impact was immediate: On top of cutting monthly costs by a large percentage, says Sheriff, 'Running at the same capacity and during our peak time, we were able to horizontally grow. Since we were using our VMs more efficiently with containers, we didn't have to pay extra money at all.'" +{{< /case-studies/quote >}} -
-
- "'The time I spent actually moving the applications was under 30 seconds! We can move data centers in just incredible amounts of time. If you haven’t come from the Kubernetes world you wouldn’t believe me.' Sheriff puts it in these terms: 'Before Kubernetes I wasn’t sleeping at night, literally. I was woken up all the time, because things were down. After Kubernetes, I’ve been sleeping at night.'" -
-
+

GolfNow's dev team ran an "internal, low-key" proof of concept and were won over. "We really liked how easy it was to be able to pass containers around to each other and have them up and running in no time, exactly the way it was running on my machine," says Sheriff. "Because that is always the biggest gripe that Ops has with developers, right? 'It worked on my machine!' But then we started getting to the point of, 'How do we make sure that these things stay up and running?'"

-
-
- "The goal was to take all of that out and put it within this new platform we’ve created with Kubernetes on Google Compute Engine (GCE)," says Sheriff. "So we ended up building piece by piece, in parallel, what was out in Heroku and Compose, in our Kubernetes cluster. Then, literally, just switched configs in the background. So in Heroku we had the app running hitting a Compose database. We’d take the config, change it and make it hit the database that was running in our cluster." -

- Using this procedure, they were able to migrate piecemeal, without any downtime. The first migration was done during off hours, but to test the limits, the team migrated the second database in the middle of the day, when lots of users were running the application. "We did it," Sheriff says, "and again it was successful. Nobody noticed." -

- After three weeks of monitoring to make sure everything was running stable, the team migrated the rest of the application into their Kubernetes cluster. And the impact was immediate: On top of cutting monthly costs by a large percentage, says Sheriff, "Running at the same capacity and during our peak time, we were able to horizontally grow. Since we were using our VMs more efficiently with containers, we didn’t have to pay extra money at all." -

- Not only were they saving money, but they were also saving time. "I had a meeting this morning about migrating some applications from one cluster to another," says Josh. "I spent about 2 hours explaining the process. The time I spent actually moving the applications was under 30 seconds! We can move data centers in just incredible amounts of time. If you haven’t come from the Kubernetes world you wouldn’t believe me." Sheriff puts it in these terms: "Before Kubernetes I wasn’t sleeping at night, literally. I was woken up all the time, because things were down. After Kubernetes, I’ve been sleeping at night." -

- A small percentage of the applications on GolfNow have been migrated over to the Kubernetes environment. "Our Core Team is rewriting a lot of the .NET applications into .NET Core [which is compatible with Linux and Docker] so that we can run them within containers," says Sheriff. -

- Looking ahead, Sheriff and his team want to spend 2017 continuing to build a whole platform around Kubernetes with Drone, an open-source continuous delivery platform, to make it more developer-centric. "Now they’re able to manage configuration, they’re able to manage their deployments and things like that, making all these subteams that are now creating all these microservices, be self sufficient," he says. "So it can pull us away from applications and allow us to just make sure the cluster is running and healthy, and then actually migrate that over to our Ops team." +

That led the team on a quest to find the right orchestration system for the company's needs. Sheriff says the first few options they tried were either too heavy or "didn't feel quite right." In late summer 2015, they discovered the just-released Kubernetes, which Sheriff immediately liked for its ease of use. "We did another proof of concept," he says, "and Kubernetes won because of the fact that the community backing was there, built on top of what Google had already done."

-
-
+

But before they could go with Kubernetes, NBC, GolfNow's parent company, also asked them to comparison shop with another company. Sheriff and his team liked the competing company's platform user interface, but didn't like that its platform would not allow containers to run natively on Docker. With no clear decision in sight, Sheriff's VP at GolfNow, Steve McElwee, set up a three-month trial during which a GolfNow team (consisting of Sheriff and Josh, who's now Lead Architect, Open Platforms) would build out a Kubernetes environment, and a large NBC team would build out one with the other company's platform.

-
-
- "Having gone from complete newbies to production-ready in three months, the GolfNow team is eager to encourage other companies to follow their lead. 'This is The Six Million Dollar Man of the cloud right now,' adds Josh. 'Just try it out, watch it happen. I feel like the proof is in the pudding when you look at these kinds of application stacks. They’re faster, they’re more resilient.'" -
-
+

"We spun up the cluster and we tried to get everything to run the way we wanted it to run," Sheriff says. "The biggest thing that we took away from it is that not only did we want our applications to run within Kubernetes and Docker, we also wanted our databases to run there. We literally wanted our entire infrastructure to run within Kubernetes."

-
-
- And long-term, Sheriff has an even bigger goal for getting more people into the Kubernetes fold. "We’re actually trying to make this platform generic enough so that any of our sister companies can use it if they wish," he says. "Most definitely I think it can be used as a model. I think the way we migrated into it, the way we built it out, are all ways that I think other companies can learn from, and should not be afraid of." -

- The GolfNow team is also giving back to the Kubernetes community by open-sourcing a bot framework that Josh built. "We noticed that the dashboard user interface is actually moving a lot faster than when we started," says Sheriff. "However we realized what we needed was something that’s more of a bot that really helps us administer Kubernetes as a whole through Slack." Josh explains: "With the Kubernetes-Slack integration, you can essentially hook into a cluster and the issue commands and edit configurations. We’ve tried to simplify the security configuration as much as possible. We hope this will be our major thank you to Kubernetes, for everything you’ve given us." -

- Having gone from complete newbies to production-ready in three months, the GolfNow team is eager to encourage other companies to follow their lead. The lessons they’ve learned: "You’ve got to have buy-in from your boss," says Sheriff. "Another big deal is having two to three people dedicated to this type of endeavor. You can’t have people who are half in, half out." And if you don’t have buy-in from the get go, proving it out will get you there. -

- "This is The Six Million Dollar Man of the cloud right now," adds Josh. "Just try it out, watch it happen. I feel like the proof is in the pudding when you look at these kinds of application stacks. They’re faster, they’re more resilient." +

At the time there was nothing in the community to help them get Kafka and MongoDB clusters running within a Kubernetes and Docker environment, so Sheriff and Josh figured it out on their own, taking a full month to get it right. "Everything started rolling from there," Sheriff says. "We were able to get all our applications connected, and we finished our side of the proof of concept a month in advance. My VP was like, 'Alright, it's over. Kubernetes wins.'"

-
-
+

The next step, beginning in January 2016, was getting everything working in production. The team focused first on one application that was already written in Node.js and MongoDB. A booking engine for golf courses and B2B marketing platform, the application was already going in the microservice direction but wasn't quite finished yet. At the time, it was running in Heroku Compose and other third-party services—resulting in a large monthly bill.

+ +{{< case-studies/quote image="/images/case-studies/golfnow/banner4.jpg" >}} +"'The time I spent actually moving the applications was under 30 seconds! We can move data centers in just incredible amounts of time. If you haven't come from the Kubernetes world you wouldn't believe me.' Sheriff puts it in these terms: 'Before Kubernetes I wasn't sleeping at night, literally. I was woken up all the time, because things were down. After Kubernetes, I've been sleeping at night.'" +{{< /case-studies/quote >}} + +

"The goal was to take all of that out and put it within this new platform we've created with Kubernetes on Google Compute Engine (GCE)," says Sheriff. "So we ended up building piece by piece, in parallel, what was out in Heroku and Compose, in our Kubernetes cluster. Then, literally, just switched configs in the background. So in Heroku we had the app running hitting a Compose database. We'd take the config, change it and make it hit the database that was running in our cluster."

+ +

Using this procedure, they were able to migrate piecemeal, without any downtime. The first migration was done during off hours, but to test the limits, the team migrated the second database in the middle of the day, when lots of users were running the application. "We did it," Sheriff says, "and again it was successful. Nobody noticed."

+ +

After three weeks of monitoring to make sure everything was running stable, the team migrated the rest of the application into their Kubernetes cluster. And the impact was immediate: On top of cutting monthly costs by a large percentage, says Sheriff, "Running at the same capacity and during our peak time, we were able to horizontally grow. Since we were using our VMs more efficiently with containers, we didn't have to pay extra money at all."

+ +

Not only were they saving money, but they were also saving time. "I had a meeting this morning about migrating some applications from one cluster to another," says Josh. "I spent about 2 hours explaining the process. The time I spent actually moving the applications was under 30 seconds! We can move data centers in just incredible amounts of time. If you haven't come from the Kubernetes world you wouldn't believe me." Sheriff puts it in these terms: "Before Kubernetes I wasn't sleeping at night, literally. I was woken up all the time, because things were down. After Kubernetes, I've been sleeping at night."

+ +

A small percentage of the applications on GolfNow have been migrated over to the Kubernetes environment. "Our Core Team is rewriting a lot of the .NET applications into .NET Core [which is compatible with Linux and Docker] so that we can run them within containers," says Sheriff.

+ +

Looking ahead, Sheriff and his team want to spend 2017 continuing to build a whole platform around Kubernetes with Drone, an open-source continuous delivery platform, to make it more developer-centric. "Now they're able to manage configuration, they're able to manage their deployments and things like that, making all these subteams that are now creating all these microservices, be self sufficient," he says. "So it can pull us away from applications and allow us to just make sure the cluster is running and healthy, and then actually migrate that over to our Ops team."

+ +{{< case-studies/quote >}} +"Having gone from complete newbies to production-ready in three months, the GolfNow team is eager to encourage other companies to follow their lead. 'This is The Six Million Dollar Man of the cloud right now,' adds Josh. 'Just try it out, watch it happen. I feel like the proof is in the pudding when you look at these kinds of application stacks. They're faster, they're more resilient.'" +{{< /case-studies/quote >}} + +

And long-term, Sheriff has an even bigger goal for getting more people into the Kubernetes fold. "We're actually trying to make this platform generic enough so that any of our sister companies can use it if they wish," he says. "Most definitely I think it can be used as a model. I think the way we migrated into it, the way we built it out, are all ways that I think other companies can learn from, and should not be afraid of."

+ +

The GolfNow team is also giving back to the Kubernetes community by open-sourcing a bot framework that Josh built. "We noticed that the dashboard user interface is actually moving a lot faster than when we started," says Sheriff. "However we realized what we needed was something that's more of a bot that really helps us administer Kubernetes as a whole through Slack." Josh explains: "With the Kubernetes-Slack integration, you can essentially hook into a cluster and the issue commands and edit configurations. We've tried to simplify the security configuration as much as possible. We hope this will be our major thank you to Kubernetes, for everything you've given us."

+ +

Having gone from complete newbies to production-ready in three months, the GolfNow team is eager to encourage other companies to follow their lead. The lessons they've learned: "You've got to have buy-in from your boss," says Sheriff. "Another big deal is having two to three people dedicated to this type of endeavor. You can't have people who are half in, half out." And if you don't have buy-in from the get go, proving it out will get you there.

+ +

"This is The Six Million Dollar Man of the cloud right now," adds Josh. "Just try it out, watch it happen. I feel like the proof is in the pudding when you look at these kinds of application stacks. They're faster, they're more resilient."

diff --git a/content/en/case-studies/haufegroup/index.html b/content/en/case-studies/haufegroup/index.html index f4256ff569..580a282683 100644 --- a/content/en/case-studies/haufegroup/index.html +++ b/content/en/case-studies/haufegroup/index.html @@ -1,112 +1,85 @@ --- title: Haufe Group Case Study - case_study_styles: true cid: caseStudies -css: /css/style_haufegroup.css + +new_case_study_styles: true +heading_background: /images/case-studies/haufegroup/banner1.jpg +heading_title_logo: /images/haufegroup_logo.png +subheading: > + Paving the Way for Cloud Native for Midsize Companies +case_study_details: + - Company: Haufe Group + - Location: Freiburg, Germany + - Industry: Media and Software --- +

Challenge

-
-

CASE STUDY:
Paving the Way for Cloud Native for Midsize Companies

+

Founded in 1930 as a traditional publisher, Haufe Group has grown into a media and software company with 95 percent of its sales from digital products. Over the years, the company has gone from having "hardware in the basement" to outsourcing its infrastructure operations and IT. More recently, the development of new products, from Internet portals for tax experts to personnel training software, has created demands for increased speed, reliability and scalability. "We need to be able to move faster," says Solution Architect Martin Danielsson. "Adapting workloads is something that we really want to be able to do."

-
+

Solution

+

Haufe Group began its cloud-native journey when Microsoft Azure became available in Europe; the company needed cloud deployments for its desktop apps with bandwidth-heavy download services. "After that, it has been different projects trying out different things," says Danielsson. Two years ago, Holger Reinhardt joined Haufe Group as CTO and rapidly re-oriented the traditional host provider-based approach toward a cloud and API-first strategy.

-
- Company  Haufe Group     Location  Freiburg, Germany     Industry  Media and Software -
+

A core part of this strategy was a strong mandate to embrace infrastructure-as-code across the entire software deployment lifecycle via Docker. The company is now getting ready to go live with two services in production using Kubernetes orchestration on Microsoft Azure and Amazon Web Services. The team is also working on breaking up one of their core Java Enterprise desktop products into microservices to allow for better evolvability and dynamic scaling in the cloud.

-
- -
- -
-
-

Challenge

- Founded in 1930 as a traditional publisher, Haufe Group has grown into a media and software company with 95 percent of its sales from digital products. Over the years, the company has gone from having "hardware in the basement" to outsourcing its infrastructure operations and IT. More recently, the development of new products, from Internet portals for tax experts to personnel training software, has created demands for increased speed, reliability and scalability. "We need to be able to move faster," says Solution Architect Martin Danielsson. "Adapting workloads is something that we really want to be able to do." -
-
-

Solution

- Haufe Group began its cloud-native journey when Microsoft Azure became available in Europe; the company needed cloud deployments for its desktop apps with bandwidth-heavy download services. "After that, it has been different projects trying out different things," says Danielsson. Two years ago, Holger Reinhardt joined Haufe Group as CTO and rapidly re-oriented the traditional host provider-based approach toward a cloud and API-first strategy. -
-
- A core part of this strategy was a strong mandate to embrace infrastructure-as-code across the entire software deployment lifecycle via Docker. The company is now getting ready to go live with two services in production using Kubernetes orchestration on Microsoft Azure and Amazon Web Services. The team is also working on breaking up one of their core Java Enterprise desktop products into microservices to allow for better evolvability and dynamic scaling in the cloud. -
-

Impact

- With the ability to adapt workloads, Danielsson says, teams "will be able to scale down to around half the capacity at night, saving 30 percent of the hardware cost." Plus, shorter release times have had a major impact. "Before, we had to announce at least a week in advance when we wanted to do a release because there was a huge checklist of things that you had to do," he says. "By going cloud native, we have the infrastructure in place to be able to automate all of these things. Now we can get a new release done in half an hour instead of days." +

With the ability to adapt workloads, Danielsson says, teams "will be able to scale down to around half the capacity at night, saving 30 percent of the hardware cost." Plus, shorter release times have had a major impact. "Before, we had to announce at least a week in advance when we wanted to do a release because there was a huge checklist of things that you had to do," he says. "By going cloud native, we have the infrastructure in place to be able to automate all of these things. Now we can get a new release done in half an hour instead of days."

-
-
+{{< case-studies/quote author="Martin Danielsson, Solution Architect, Haufe Group" >}} +"Over the next couple of years, people won't even think that much about it when they want to run containers. Kubernetes is going to be the go-to solution." +{{< /case-studies/quote >}} -
+{{< case-studies/lead >}} +More than 80 years ago, Haufe Group was founded as a traditional publishing company, printing books and commentary on paper. +{{< /case-studies/lead >}} -
-
- "Over the next couple of years, people won’t even think that much about it when they want to run containers. Kubernetes is going to be the go-to solution."

- Martin Danielsson, Solution Architect, Haufe Group
-
-
+

By the 1990s, though, the company's leaders recognized that the future was digital, and to their credit, were able to transform Haufe Group into a media and software business that now gets 95 percent of its sales from digital products. "Among the German companies doing this, we were one of the early adopters," says Martin Danielsson, Solution Architect for Haufe Group.

-
+

And now they're leading the way for midsize companies embracing cloud-native technology like Kubernetes. "The really big companies like Ticketmaster and Google get it right, and the startups get it right because they're faster," says Danielsson. "We're in this big lump of companies in the middle with a lot of legacy, a lot of structure, a lot of culture that does not easily fit the cloud technologies. We're just 1,500 people, but we have hundreds of customer-facing applications. So we're doing things that will be relevant for many companies of our size or even smaller."

-
-

More than 80 years ago, Haufe Group was founded as a traditional publishing company, printing books and commentary on paper.

By the 1990s, though, the company’s leaders recognized that the future was digital, and to their credit, were able to transform Haufe Group into a media and software business that now gets 95 percent of its sales from digital products. "Among the German companies doing this, we were one of the early adopters," says Martin Danielsson, Solution Architect for Haufe Group.

- And now they’re leading the way for midsize companies embracing cloud-native technology like Kubernetes. "The really big companies like Ticketmaster and Google get it right, and the startups get it right because they’re faster," says Danielsson. "We’re in this big lump of companies in the middle with a lot of legacy, a lot of structure, a lot of culture that does not easily fit the cloud technologies. We’re just 1,500 people, but we have hundreds of customer-facing applications. So we’re doing things that will be relevant for many companies of our size or even smaller."

- Many of those legacy challenges stemmed from simply following the technology trends of the times. "We used to do full DevOps," he says. In the 1990s and 2000s, "that meant that you had your hardware in the basement. And then 10 years ago, the hype of the moment was to outsource application operations, outsource everything, and strip down your IT department to take away the distraction of all these hardware things. That’s not our area of expertise. We didn’t want to be an infrastructure provider. And now comes the backlash of that."

- Haufe Group began feeling the pain as they were developing more new products, from Internet portals for tax experts to personnel training software, that have created demands for increased speed, reliability and scalability. "Right now, we have this break in workflows, where we go from writing concepts to developing, handing it over to production and then handing that over to your host provider," he says. "And then when things go bad we have no clue what went wrong. We definitely want to take back control, and we want to move a lot faster. Adapting workloads is something that we really want to be able to do."

- Those needs led them to explore cloud-native technology. Their first foray into the cloud was doing deployments in Microsoft Azure, once it became available in Europe, for desktop products that had built-in download services. Hosting expenses for such bandwidth-heavy services were too high, so the company turned to the cloud. "After that, it has been different projects trying out different things," says Danielsson. -
-
+

Many of those legacy challenges stemmed from simply following the technology trends of the times. "We used to do full DevOps," he says. In the 1990s and 2000s, "that meant that you had your hardware in the basement. And then 10 years ago, the hype of the moment was to outsource application operations, outsource everything, and strip down your IT department to take away the distraction of all these hardware things. That's not our area of expertise. We didn't want to be an infrastructure provider. And now comes the backlash of that."

-
-
- "We have been doing containers for the last two years, and we really got the hang of how they work," says Danielsson. "But it was always for development and test, never in production, because we didn’t fully understand how that would work. And to me, Kubernetes was definitely the technology that solved that." -
-
+

Haufe Group began feeling the pain as they were developing more new products, from Internet portals for tax experts to personnel training software, that have created demands for increased speed, reliability and scalability. "Right now, we have this break in workflows, where we go from writing concepts to developing, handing it over to production and then handing that over to your host provider," he says. "And then when things go bad we have no clue what went wrong. We definitely want to take back control, and we want to move a lot faster. Adapting workloads is something that we really want to be able to do."

-
-
+

Those needs led them to explore cloud-native technology. Their first foray into the cloud was doing deployments in Microsoft Azure, once it became available in Europe, for desktop products that had built-in download services. Hosting expenses for such bandwidth-heavy services were too high, so the company turned to the cloud. "After that, it has been different projects trying out different things," says Danielsson.

- Two years ago, Holger Reinhardt joined Haufe Group as CTO and rapidly re-oriented the traditional host provider-based approach toward a cloud and API-first strategy. A core part of this strategy was a strong mandate to embrace infrastructure-as-code across the entire software deployment lifecycle via Docker. - Some experiments went further than others; German regulations about sensitive data proved to be a road block in moving some workloads to Azure and Amazon Web Services. "Due to our history, Germany is really strict with things like personally identifiable data," Danielsson says.

- These experiments took on new life with the arrival of the Azure Sovereign Cloud for Germany (an Azure clone run by the German T-Systems provider). With the availability of Azure.de—which conforms to Germany’s privacy regulations—teams started to seriously consider deploying production loads in Docker into the cloud. "We have been doing containers for the last two years, and we really got the hang of how they work," says Danielsson. "But it was always for development and test, never in production, because we didn’t fully understand how that would work. And to me, Kubernetes was definitely the technology that solved that."

- In parallel, Danielsson had built an API management system with the aim of supporting CI/CD scenarios, aspects of which were missing in off-the-shelf API management products. With a foundation based on Mashape’s Kong gateway, it is open-sourced as wicked.haufe.io. He put wicked.haufe.io to use with his product team.

Otherwise, Danielsson says his philosophy was "don’t try to reinvent the wheel all the time. Go for what’s there and 99 percent of the time it will be enough. And if you think you really need something custom or additional, think perhaps once or twice again. One of the things that I find so amazing with this cloud-native framework is that everything ties in."

- Currently, Haufe Group is working on two projects using Kubernetes in production. One is a new mobile application for researching legislation and tax laws. "We needed a way to take out functionality from a legacy core and put an application on top of that with an API gateway—a lot of moving parts that screams containers," says Danielsson. So the team moved the build pipeline away from "deploying to some old, huge machine that you could deploy anything to" and onto a Kubernetes cluster where there would be automatic CI/CD "with feature branches and all these things that were a bit tedious in the past." -
-
+{{< case-studies/quote image="/images/case-studies/haufegroup/banner3.jpg" >}} +"We have been doing containers for the last two years, and we really got the hang of how they work," says Danielsson. "But it was always for development and test, never in production, because we didn't fully understand how that would work. And to me, Kubernetes was definitely the technology that solved that." +{{< /case-studies/quote >}} -
-
- "Before, we had to announce at least a week in advance when we wanted to do a release because there was a huge checklist of things that you had to do," says Danielsson. "By going cloud native, we have the infrastructure in place to be able to automate all of these things. Now we can get a new release done in half an hour instead of days." -
-
+

Two years ago, Holger Reinhardt joined Haufe Group as CTO and rapidly re-oriented the traditional host provider-based approach toward a cloud and API-first strategy. A core part of this strategy was a strong mandate to embrace infrastructure-as-code across the entire software deployment lifecycle via Docker. Some experiments went further than others; German regulations about sensitive data proved to be a road block in moving some workloads to Azure and Amazon Web Services. "Due to our history, Germany is really strict with things like personally identifiable data," Danielsson says.

-
-
- It was a proof of concept effort, and the proof was in the pudding. "Everyone was really impressed at what we accomplished in a week," says Danielsson. "We did these kinds of integrations just to make sure that we got a handle on how Kubernetes works. If you can create optimism and buzz around something, it’s half won. And if the developers and project managers know this is working, you’re more or less done." Adds Reinhardt: "You need to create some very visible, quick wins in order to overcome the status quo."

- The impact on the speed of deployment was clear: "Before, we had to announce at least a week in advance when we wanted to do a release because there was a huge checklist of things that you had to do," says Danielsson. "By going cloud native, we have the infrastructure in place to be able to automate all of these things. Now we can get a new release done in half an hour instead of days."

- The potential impact on cost was another bonus. "Hosting applications is quite expensive, so moving to the cloud is something that we really want to be able to do," says Danielsson. With the ability to adapt workloads, teams "will be able to scale down to around half the capacity at night, saving 30 percent of the hardware cost."

- Just as importantly, Danielsson says, there’s added flexibility: "When we try to move or rework applications that are really crucial, it’s often tricky to validate whether the path we want to take is going to work out well. In order to validate that, we would need to reproduce the environment and really do testing, and that’s prohibitively expensive and simply not doable with traditional host providers. Cloud native gives us the ability to do risky changes and validate them in a cost-effective way."

- As word of the two successful test projects spread throughout the company, interest in Kubernetes has grown. "We want to be able to support our developers in running Kubernetes clusters but we’re not there yet, so we allow them to do it as long as they’re aware that they are on their own," says Danielsson. "So that’s why we are also looking at things like [the managed Kubernetes platform] CoreOS Tectonic, Azure Container Service, ECS, etc. These kinds of services will be a lot more relevant to midsize companies that want to leverage cloud native but don’t have the IT departments or the structure around that."

- In the next year and a half, Danielsson says the company will be working on moving one of their legacy desktop products, a web app for researching legislation and tax laws originally built in Java Enterprise, onto cloud-native technology. "We’re doing a microservice split out right now so that we can independently deploy the different parts," he says. The main website, which provides free content for customers, is also moving to cloud native. +

These experiments took on new life with the arrival of the Azure Sovereign Cloud for Germany (an Azure clone run by the German T-Systems provider). With the availability of Azure.de—which conforms to Germany's privacy regulations—teams started to seriously consider deploying production loads in Docker into the cloud. "We have been doing containers for the last two years, and we really got the hang of how they work," says Danielsson. "But it was always for development and test, never in production, because we didn't fully understand how that would work. And to me, Kubernetes was definitely the technology that solved that."

-
-
+

In parallel, Danielsson had built an API management system with the aim of supporting CI/CD scenarios, aspects of which were missing in off-the-shelf API management products. With a foundation based on Mashape's Kong gateway, it is open-sourced as wicked.haufe.io. He put wicked.haufe.io to use with his product team.

Otherwise, Danielsson says his philosophy was "don't try to reinvent the wheel all the time. Go for what's there and 99 percent of the time it will be enough. And if you think you really need something custom or additional, think perhaps once or twice again. One of the things that I find so amazing with this cloud-native framework is that everything ties in."

-
-
- "the execution of a strategy requires alignment of culture, structure and technology. Only if those three dimensions are aligned can you successfully execute a transformation into microservices and cloud-native architectures. And it is only then that the Cloud will pay the dividends in much faster speeds in product innovation and much lower operational costs." +

Currently, Haufe Group is working on two projects using Kubernetes in production. One is a new mobile application for researching legislation and tax laws. "We needed a way to take out functionality from a legacy core and put an application on top of that with an API gateway—a lot of moving parts that screams containers," says Danielsson. So the team moved the build pipeline away from "deploying to some old, huge machine that you could deploy anything to" and onto a Kubernetes cluster where there would be automatic CI/CD "with feature branches and all these things that were a bit tedious in the past."

-
-
+{{< case-studies/quote image="/images/case-studies/haufegroup/banner4.jpg" >}} +"Before, we had to announce at least a week in advance when we wanted to do a release because there was a huge checklist of things that you had to do," says Danielsson. "By going cloud native, we have the infrastructure in place to be able to automate all of these things. Now we can get a new release done in half an hour instead of days." +{{< /case-studies/quote >}} -
-
- But with these goals, Danielsson believes there are bigger cultural challenges that need to be constantly addressed. The move to new technology, not to mention a shift toward DevOps, means a lot of change for employees. "The roles were rather fixed in the past," he says. "You had developers, you had project leads, you had testers. And now you get into these really, really important things like test automation. Testers aren’t actually doing click testing anymore, and they have to write automated testing. And if you really want to go full-blown CI/CD, all these little pieces have to work together so that you get the confidence to do a check in, and know this check in is going to land in production, because if I messed up, some test is going to break. This is a really powerful thing because whatever you do, whenever you merge something into the trunk or to the master, this is going live. And that’s where you either get the people or they run away screaming." - Danielsson understands that it may take some people much longer to get used to the new ways.

- "Culture is nothing that you can force on people," he says. "You have to live it for yourself. You have to evangelize. You have to show the advantages time and time again: This is how you can do it, this is what you get from it." To that end, his team has scheduled daylong workshops for the staff, bringing in outside experts to talk about everything from API to Devops to cloud.

- For every person who runs away screaming, many others get drawn in. "Get that foot in the door and make them really interested in this stuff," says Danielsson. "Usually it catches on. We have people you never would have expected chanting, ‘Docker Docker Docker’ now. It’s cool to see them realize that there is a world outside of their Python libraries. It’s awesome to see them really work with Kubernetes."

- Ultimately, Reinhardt says, "the execution of a strategy requires alignment of culture, structure and technology. Only if those three dimensions are aligned can you successfully execute a transformation into microservices and cloud-native architectures. And it is only then that the Cloud will pay the dividends in much faster speeds in product innovation and much lower operational costs." +

It was a proof of concept effort, and the proof was in the pudding. "Everyone was really impressed at what we accomplished in a week," says Danielsson. "We did these kinds of integrations just to make sure that we got a handle on how Kubernetes works. If you can create optimism and buzz around something, it's half won. And if the developers and project managers know this is working, you're more or less done." Adds Reinhardt: "You need to create some very visible, quick wins in order to overcome the status quo."

-
-
+

The impact on the speed of deployment was clear: "Before, we had to announce at least a week in advance when we wanted to do a release because there was a huge checklist of things that you had to do," says Danielsson. "By going cloud native, we have the infrastructure in place to be able to automate all of these things. Now we can get a new release done in half an hour instead of days."

+ +

The potential impact on cost was another bonus. "Hosting applications is quite expensive, so moving to the cloud is something that we really want to be able to do," says Danielsson. With the ability to adapt workloads, teams "will be able to scale down to around half the capacity at night, saving 30 percent of the hardware cost."

+ +

Just as importantly, Danielsson says, there's added flexibility: "When we try to move or rework applications that are really crucial, it's often tricky to validate whether the path we want to take is going to work out well. In order to validate that, we would need to reproduce the environment and really do testing, and that's prohibitively expensive and simply not doable with traditional host providers. Cloud native gives us the ability to do risky changes and validate them in a cost-effective way."

+ +

As word of the two successful test projects spread throughout the company, interest in Kubernetes has grown. "We want to be able to support our developers in running Kubernetes clusters but we're not there yet, so we allow them to do it as long as they're aware that they are on their own," says Danielsson. "So that's why we are also looking at things like [the managed Kubernetes platform] CoreOS Tectonic, Azure Container Service, ECS, etc. These kinds of services will be a lot more relevant to midsize companies that want to leverage cloud native but don't have the IT departments or the structure around that."

+ +

In the next year and a half, Danielsson says the company will be working on moving one of their legacy desktop products, a web app for researching legislation and tax laws originally built in Java Enterprise, onto cloud-native technology. "We're doing a microservice split out right now so that we can independently deploy the different parts," he says. The main website, which provides free content for customers, is also moving to cloud native.

+ +{{< case-studies/quote >}} +"the execution of a strategy requires alignment of culture, structure and technology. Only if those three dimensions are aligned can you successfully execute a transformation into microservices and cloud-native architectures. And it is only then that the Cloud will pay the dividends in much faster speeds in product innovation and much lower operational costs." +{{< /case-studies/quote >}} + +

But with these goals, Danielsson believes there are bigger cultural challenges that need to be constantly addressed. The move to new technology, not to mention a shift toward DevOps, means a lot of change for employees. "The roles were rather fixed in the past," he says. "You had developers, you had project leads, you had testers. And now you get into these really, really important things like test automation. Testers aren't actually doing click testing anymore, and they have to write automated testing. And if you really want to go full-blown CI/CD, all these little pieces have to work together so that you get the confidence to do a check in, and know this check in is going to land in production, because if I messed up, some test is going to break. This is a really powerful thing because whatever you do, whenever you merge something into the trunk or to the master, this is going live. And that's where you either get the people or they run away screaming." Danielsson understands that it may take some people much longer to get used to the new ways.

+ +

"Culture is nothing that you can force on people," he says. "You have to live it for yourself. You have to evangelize. You have to show the advantages time and time again: This is how you can do it, this is what you get from it." To that end, his team has scheduled daylong workshops for the staff, bringing in outside experts to talk about everything from API to Devops to cloud.

+ +

For every person who runs away screaming, many others get drawn in. "Get that foot in the door and make them really interested in this stuff," says Danielsson. "Usually it catches on. We have people you never would have expected chanting, 'Docker Docker Docker' now. It's cool to see them realize that there is a world outside of their Python libraries. It's awesome to see them really work with Kubernetes."

+ +

Ultimately, Reinhardt says, "the execution of a strategy requires alignment of culture, structure and technology. Only if those three dimensions are aligned can you successfully execute a transformation into microservices and cloud-native architectures. And it is only then that the Cloud will pay the dividends in much faster speeds in product innovation and much lower operational costs."

diff --git a/content/en/case-studies/huawei/index.html b/content/en/case-studies/huawei/index.html index 29de86f5c4..ec1cd212f2 100644 --- a/content/en/case-studies/huawei/index.html +++ b/content/en/case-studies/huawei/index.html @@ -1,101 +1,73 @@ --- title: Huawei Case Study - case_study_styles: true cid: caseStudies -css: /css/style_huawei.css + +new_case_study_styles: true +heading_background: /images/case-studies/huawei/banner1.jpg +heading_title_logo: /images/huawei_logo.png +subheading: > + Embracing Cloud Native as a User – and a Vendor +case_study_details: + - Company: Huawei + - Location: Shenzhen, China + - Industry: Telecommunications Equipment --- -
-

CASE STUDY:
Embracing Cloud Native as a User – and a Vendor

-
+

Challenge

-
- Company  Huawei     Location  Shenzhen, China     Industry  Telecommunications Equipment -
+

A multinational company that's the largest telecommunications equipment manufacturer in the world, Huawei has more than 180,000 employees. In order to support its fast business development around the globe, Huawei has eight data centers for its internal I.T. department, which have been running 800+ applications in 100K+ VMs to serve these 180,000 users. With the rapid increase of new applications, the cost and efficiency of management and deployment of VM-based apps all became critical challenges for business agility. "It's very much a distributed system so we found that managing all of the tasks in a more consistent way is always a challenge," says Peixin Hou, the company's Chief Software Architect and Community Director for Open Source. "We wanted to move into a more agile and decent practice."

-
+

Solution

-
+

After deciding to use container technology, Huawei began moving the internal I.T. department's applications to run on Kubernetes. So far, about 30 percent of these applications have been transferred to cloud native.

-
-
-

Challenge

- A multinational company that’s the largest telecommunications equipment manufacturer in the world, Huawei has more than 180,000 employees. In order to support its fast business development around the globe, Huawei has eight data centers for its internal I.T. department, which have been running 800+ applications in 100K+ VMs to serve these 180,000 users. With the rapid increase of new applications, the cost and efficiency of management and deployment of VM-based apps all became critical challenges for business agility. "It’s very much a distributed system so we found that managing all of the tasks in a more consistent way is always a challenge," says Peixin Hou, the company’s Chief Software Architect and Community Director for Open Source. "We wanted to move into a more agile and decent practice." -
- -
-

Solution

- After deciding to use container technology, Huawei began moving the internal I.T. department’s applications to run on Kubernetes. So far, about 30 percent of these applications have been transferred to cloud native. -
-

Impact

- "By the end of 2016, Huawei’s internal I.T. department managed more than 4,000 nodes with tens of thousands containers using a Kubernetes-based Platform as a Service (PaaS) solution," says Hou. "The global deployment cycles decreased from a week to minutes, and the efficiency of application delivery has been improved 10 fold." For the bottom line, he says, "We also see significant operating expense spending cut, in some circumstances 20-30 percent, which we think is very helpful for our business." Given the results Huawei has had internally – and the demand it is seeing externally – the company has also built the technologies into FusionStage™, the PaaS solution it offers its customers. -
-
+

"By the end of 2016, Huawei's internal I.T. department managed more than 4,000 nodes with tens of thousands containers using a Kubernetes-based Platform as a Service (PaaS) solution," says Hou. "The global deployment cycles decreased from a week to minutes, and the efficiency of application delivery has been improved 10 fold." For the bottom line, he says, "We also see significant operating expense spending cut, in some circumstances 20-30 percent, which we think is very helpful for our business." Given the results Huawei has had internally – and the demand it is seeing externally – the company has also built the technologies into FusionStage™, the PaaS solution it offers its customers.

-
+{{< case-studies/quote author="Peixin Hou, chief software architect and community director for open source" >}} +"If you're a vendor, in order to convince your customer, you should use it yourself. Luckily because Huawei has a lot of employees, we can demonstrate the scale of cloud we can build using this technology." +{{< /case-studies/quote >}} -
-
- "If you’re a vendor, in order to convince your customer, you should use it yourself. Luckily because Huawei has a lot of employees, we can demonstrate the scale of cloud we can build using this technology."

- Peixin Hou, chief software architect and community director for open source
-
-
+

Huawei's Kubernetes journey began with one developer. Over two years ago, one of the engineers employed by the networking and telecommunications giant became interested in Kubernetes, the technology for managing application containers across clusters of hosts, and started contributing to its open source community. As the technology developed and the community grew, he kept telling his managers about it.

-
+

And as fate would have it, at the same time, Huawei was looking for a better orchestration system for its internal enterprise I.T. department, which supports every business flow processing. "We have more than 180,000 employees worldwide, and a complicated internal procedure, so probably every week this department needs to develop some new applications," says Peixin Hou, Huawei's Chief Software Architect and Community Director for Open Source. "Very often our I.T. departments need to launch tens of thousands of containers, with tasks running across thousands of nodes across the world. It's very much a distributed system, so we found that managing all of the tasks in a more consistent way is always a challenge."

-
- Huawei’s Kubernetes journey began with one developer. - Over two years ago, one of the engineers employed by the networking and telecommunications giant became interested in Kubernetes, the technology for managing application containers across clusters of hosts, and started contributing to its open source community. As the technology developed and the community grew, he kept telling his managers about it.

- And as fate would have it, at the same time, Huawei was looking for a better orchestration system for its internal enterprise I.T. department, which supports every business flow processing. "We have more than 180,000 employees worldwide, and a complicated internal procedure, so probably every week this department needs to develop some new applications," says Peixin Hou, Huawei’s Chief Software Architect and Community Director for Open Source. "Very often our I.T. departments need to launch tens of thousands of containers, with tasks running across thousands of nodes across the world. It’s very much a distributed system, so we found that managing all of the tasks in a more consistent way is always a challenge."

- In the past, Huawei had used virtual machines to encapsulate applications, but "every time when we start a VM," Hou says, "whether because it’s a new service or because it was a service that was shut down because of some abnormal node functioning, it takes a lot of time." Huawei turned to containerization, so the timing was right to try Kubernetes. It took a year to adopt that engineer’s suggestion – the process "is not overnight," says Hou – but once in use, he says, "Kubernetes basically solved most of our problems. Before, the time of deployment took about a week, now it only takes minutes. The developers are happy. That department is also quite happy."

- Hou sees great benefits to the company that come with using this technology: "Kubernetes brings agility, scale-out capability, and DevOps practice to the cloud-based applications," he says. "It provides us with the ability to customize the scheduling architecture, which makes possible the affinity between container tasks that gives greater efficiency. It supports multiple container formats. It has extensive support for various container networking solutions and container storage." -
-
+

In the past, Huawei had used virtual machines to encapsulate applications, but "every time when we start a VM," Hou says, "whether because it's a new service or because it was a service that was shut down because of some abnormal node functioning, it takes a lot of time." Huawei turned to containerization, so the timing was right to try Kubernetes. It took a year to adopt that engineer's suggestion – the process "is not overnight," says Hou – but once in use, he says, "Kubernetes basically solved most of our problems. Before, the time of deployment took about a week, now it only takes minutes. The developers are happy. That department is also quite happy."

-
-
- "Kubernetes basically solved most of our problems. Before, the time of deployment took about a week, now it only takes minutes. The developers are happy. That department is also quite happy." -
-
+

Hou sees great benefits to the company that come with using this technology: "Kubernetes brings agility, scale-out capability, and DevOps practice to the cloud-based applications," he says. "It provides us with the ability to customize the scheduling architecture, which makes possible the affinity between container tasks that gives greater efficiency. It supports multiple container formats. It has extensive support for various container networking solutions and container storage."

-
-
- And not least of all, there’s an impact on the bottom line. Says Hou: "We also see significant operating expense spending cut in some circumstances 20-30 percent, which is very helpful for our business."

- Pleased with those initial results, and seeing a demand for cloud native technologies from its customers, Huawei doubled down on Kubernetes. In the spring of 2016, the company became not only a user but also a vendor.

- "We built the Kubernetes technologies into our solutions," says Hou, referring to Huawei’s FusionStage™ PaaS offering. "Our customers, from very big telecommunications operators to banks, love the idea of cloud native. They like Kubernetes technology. But they need to spend a lot of time to decompose their applications to turn them into microservice architecture, and as a solution provider, we help them. We’ve started to work with some Chinese banks, and we see a lot of interest from our customers like China Mobile and Deutsche Telekom."

- "If you’re just a user, you’re just a user," adds Hou. "But if you’re a vendor, in order to even convince your customers, you should use it yourself. Luckily because Huawei has a lot of employees, we can demonstrate the scale of cloud we can build using this technology. We provide customer wisdom." While Huawei has its own private cloud, many of its customers run cross-cloud applications using Huawei’s solutions. It’s a big selling point that most of the public cloud providers now support Kubernetes. "This makes the cross-cloud transition much easier than with other solutions," says Hou.

-
-
+{{< case-studies/quote image="/images/case-studies/huawei/banner3.jpg" >}} +"Kubernetes basically solved most of our problems. Before, the time of deployment took about a week, now it only takes minutes. The developers are happy. That department is also quite happy." +{{< /case-studies/quote >}} -
-
- "Our customers, from very big telecommunications operators to banks, love the idea of cloud native. They like Kubernetes technology. But they need to spend a lot of time to decompose their applications to turn them into microservice architecture, and as a solution provider, we help them." -
-
+

And not least of all, there's an impact on the bottom line. Says Hou: "We also see significant operating expense spending cut in some circumstances 20-30 percent, which is very helpful for our business."

-
-
- Within Huawei itself, once his team completes the transition of the internal business procedure department to Kubernetes, Hou is looking to convince more departments to move over to the cloud native development cycle and practice. "We have a lot of software developers, so we will provide them with our platform as a service solution, our own product," he says. "We would like to see significant cuts in their iteration cycle."

- Having overseen the initial move to Kubernetes at Huawei, Hou has advice for other companies considering the technology: "When you start to design the architecture of your application, think about cloud native, think about microservice architecture from the beginning," he says. "I think you will benefit from that."

- But if you already have legacy applications, "start from some microservice-friendly part of those applications first, parts that are relatively easy to be decomposed into simpler pieces and are relatively lightweight," Hou says. "Don’t think from day one that within how many days I want to move the whole architecture, or move everything into microservices. Don’t put that as a kind of target. You should do it in a gradual manner. And I would say for legacy applications, not every piece would be suitable for microservice architecture. No need to force it."

- After all, as enthusiastic as Hou is about Kubernetes at Huawei, he estimates that "in the next 10 years, maybe 80 percent of the workload can be distributed, can be run on the cloud native environments. There’s still 20 percent that’s not, but it’s fine. If we can make 80 percent of our workload really be cloud native, to have agility, it’s a much better world at the end of the day." +

Pleased with those initial results, and seeing a demand for cloud native technologies from its customers, Huawei doubled down on Kubernetes. In the spring of 2016, the company became not only a user but also a vendor.

-
-
+

"We built the Kubernetes technologies into our solutions," says Hou, referring to Huawei's FusionStage™ PaaS offering. "Our customers, from very big telecommunications operators to banks, love the idea of cloud native. They like Kubernetes technology. But they need to spend a lot of time to decompose their applications to turn them into microservice architecture, and as a solution provider, we help them. We've started to work with some Chinese banks, and we see a lot of interest from our customers like China Mobile and Deutsche Telekom."

-
-
- "In the next 10 years, maybe 80 percent of the workload can be distributed, can be run on the cloud native environments. There’s still 20 percent that’s not, but it’s fine. If we can make 80 percent of our workload really be cloud native, to have agility, it’s a much better world at the end of the day." +

"If you're just a user, you're just a user," adds Hou. "But if you're a vendor, in order to even convince your customers, you should use it yourself. Luckily because Huawei has a lot of employees, we can demonstrate the scale of cloud we can build using this technology. We provide customer wisdom." While Huawei has its own private cloud, many of its customers run cross-cloud applications using Huawei's solutions. It's a big selling point that most of the public cloud providers now support Kubernetes. "This makes the cross-cloud transition much easier than with other solutions," says Hou.

-
-
-
-
- In the nearer future, Hou is looking forward to new features that are being developed around Kubernetes, not least of all the ones that Huawei is contributing to. Huawei engineers have worked on the federation feature (which puts multiple Kubernetes clusters in a single framework to be managed seamlessly), scheduling, container networking and storage, and a just-announced technology called Container Ops, which is a DevOps pipeline engine. "This will put every DevOps job into a container," he explains. "And then this container mechanism is running using Kubernetes, but is also used to test Kubernetes. With that mechanism, we can make the containerized DevOps jobs be created, shared and managed much more easily than before."

- Still, Hou sees this technology as only halfway to its full potential. First and foremost, he’d like to expand the scale it can orchestrate, which is important for supersized companies like Huawei – as well as some of its customers.

- Hou proudly notes that two years after that first Huawei engineer became a contributor to and evangelist for Kubernetes, Huawei is now a top contributor to the community. "We’ve learned that the more you contribute to the community," he says, "the more you get back." +{{< case-studies/quote image="/images/case-studies/huawei/banner4.jpg" >}} +"Our customers, from very big telecommunications operators to banks, love the idea of cloud native. They like Kubernetes technology. But they need to spend a lot of time to decompose their applications to turn them into microservice architecture, and as a solution provider, we help them." +{{< /case-studies/quote >}} -
-
+

Within Huawei itself, once his team completes the transition of the internal business procedure department to Kubernetes, Hou is looking to convince more departments to move over to the cloud native development cycle and practice. "We have a lot of software developers, so we will provide them with our platform as a service solution, our own product," he says. "We would like to see significant cuts in their iteration cycle."

+ +

Having overseen the initial move to Kubernetes at Huawei, Hou has advice for other companies considering the technology: "When you start to design the architecture of your application, think about cloud native, think about microservice architecture from the beginning," he says. "I think you will benefit from that."

+ +

But if you already have legacy applications, "start from some microservice-friendly part of those applications first, parts that are relatively easy to be decomposed into simpler pieces and are relatively lightweight," Hou says. "Don't think from day one that within how many days I want to move the whole architecture, or move everything into microservices. Don't put that as a kind of target. You should do it in a gradual manner. And I would say for legacy applications, not every piece would be suitable for microservice architecture. No need to force it."

+ +

After all, as enthusiastic as Hou is about Kubernetes at Huawei, he estimates that "in the next 10 years, maybe 80 percent of the workload can be distributed, can be run on the cloud native environments. There's still 20 percent that's not, but it's fine. If we can make 80 percent of our workload really be cloud native, to have agility, it's a much better world at the end of the day."

+ +{{< case-studies/quote >}} +"In the next 10 years, maybe 80 percent of the workload can be distributed, can be run on the cloud native environments. There's still 20 percent that's not, but it's fine. If we can make 80 percent of our workload really be cloud native, to have agility, it's a much better world at the end of the day." +{{< /case-studies/quote >}} + +

In the nearer future, Hou is looking forward to new features that are being developed around Kubernetes, not least of all the ones that Huawei is contributing to. Huawei engineers have worked on the federation feature (which puts multiple Kubernetes clusters in a single framework to be managed seamlessly), scheduling, container networking and storage, and a just-announced technology called Container Ops, which is a DevOps pipeline engine. "This will put every DevOps job into a container," he explains. "And then this container mechanism is running using Kubernetes, but is also used to test Kubernetes. With that mechanism, we can make the containerized DevOps jobs be created, shared and managed much more easily than before."

+ +

Still, Hou sees this technology as only halfway to its full potential. First and foremost, he'd like to expand the scale it can orchestrate, which is important for supersized companies like Huawei – as well as some of its customers.

+ +

Hou proudly notes that two years after that first Huawei engineer became a contributor to and evangelist for Kubernetes, Huawei is now a top contributor to the community. "We've learned that the more you contribute to the community," he says, "the more you get back."

diff --git a/content/en/case-studies/ibm/index.html b/content/en/case-studies/ibm/index.html index e9a78a9443..aa3f108e1c 100644 --- a/content/en/case-studies/ibm/index.html +++ b/content/en/case-studies/ibm/index.html @@ -1,108 +1,80 @@ --- title: IBM Case Study - linkTitle: IBM case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css logo: ibm_featured_logo.svg featured: false + +new_case_study_styles: true +heading_background: /images/case-studies/ibm/banner1.jpg +heading_title_logo: /images/ibm_logo.png +subheading: > + Building an Image Trust Service on Kubernetes with Notary and TUF +case_study_details: + - Company: IBM + - Location: Armonk, New York + - Industry: Cloud Computing --- -
-

CASE STUDY:
Building an Image Trust Service on Kubernetes with Notary and TUF

+

Challenge

-
+

IBM Cloud offers public, private, and hybrid cloud functionality across a diverse set of runtimes from its OpenWhisk-based function as a service (FaaS) offering, managed Kubernetes and containers, to Cloud Foundry platform as a service (PaaS). These runtimes are combined with the power of the company's enterprise technologies, such as MQ and DB2, its modern artificial intelligence (AI) Watson, and data analytics services. Users of IBM Cloud can exploit capabilities from more than 170 different cloud native services in its catalog, including capabilities such as IBM's Weather Company API and data services. In the later part of 2017, the IBM Cloud Container Registry team wanted to build out an image trust service.

-
- Company  IBM     Location  Armonk, New York     Industry  Cloud Computing -
- -
-
-
-
-

Challenge

- IBM Cloud offers public, private, and hybrid cloud functionality across a diverse set of runtimes from its OpenWhisk-based function as a service (FaaS) offering, managed Kubernetes and containers, to Cloud Foundry platform as a service (PaaS). These runtimes are combined with the power of the company’s enterprise technologies, such as MQ and DB2, its modern artificial intelligence (AI) Watson, and data analytics services. Users of IBM Cloud can exploit capabilities from more than 170 different cloud native services in its catalog, including capabilities such as IBM’s Weather Company API and data services. In the later part of 2017, the IBM Cloud Container Registry team wanted to build out an image trust service. -

-

Solution

- The work on this new service culminated with its public availability in the IBM Cloud in February 2018. The image trust service, called Portieris, is fully based on the Cloud Native Computing Foundation (CNCF) open source project Notary, according to Michael Hough, a software developer with the IBM Cloud Container Registry team. Portieris is a Kubernetes admission controller for enforcing content trust. Users can create image security policies for each Kubernetes namespace, or at the cluster level, and enforce different levels of trust for different images. Portieris is a key part of IBM’s trust story, since it makes it possible for users to consume the company’s Notary offering from within their IKS clusters. The offering is that Notary server runs in IBM’s cloud, and then Portieris runs inside the IKS cluster. This enables users to be able to have their IKS cluster verify that the image they're loading containers from contains exactly what they expect it to, and Portieris is what allows an IKS cluster to apply that verification. - -
- -
+

Solution

+

The work on this new service culminated with its public availability in the IBM Cloud in February 2018. The image trust service, called Portieris, is fully based on the Cloud Native Computing Foundation (CNCF) open source project Notary, according to Michael Hough, a software developer with the IBM Cloud Container Registry team. Portieris is a Kubernetes admission controller for enforcing content trust. Users can create image security policies for each Kubernetes namespace, or at the cluster level, and enforce different levels of trust for different images. Portieris is a key part of IBM's trust story, since it makes it possible for users to consume the company's Notary offering from within their IKS clusters. The offering is that Notary server runs in IBM's cloud, and then Portieris runs inside the IKS cluster. This enables users to be able to have their IKS cluster verify that the image they're loading containers from contains exactly what they expect it to, and Portieris is what allows an IKS cluster to apply that verification.

Impact

- IBM's intention in offering a managed Kubernetes container service and image registry is to provide a fully secure end-to-end platform for its enterprise customers. "Image signing is one key part of that offering, and our container registry team saw Notary as the de facto way to implement that capability in the current Docker and container ecosystem," Hough says. The company had not been offering image signing before, and Notary is the tool it used to implement that capability. "We had a multi-tenant Docker Registry with private image hosting," Hough says. "The Docker Registry uses hashes to ensure that image content is correct, and data is encrypted both in flight and at rest. But it does not provide any guarantees of who pushed an image. We used Notary to enable users to sign images in their private registry namespaces if they so choose." +

IBM's intention in offering a managed Kubernetes container service and image registry is to provide a fully secure end-to-end platform for its enterprise customers. "Image signing is one key part of that offering, and our container registry team saw Notary as the de facto way to implement that capability in the current Docker and container ecosystem," Hough says. The company had not been offering image signing before, and Notary is the tool it used to implement that capability. "We had a multi-tenant Docker Registry with private image hosting," Hough says. "The Docker Registry uses hashes to ensure that image content is correct, and data is encrypted both in flight and at rest. But it does not provide any guarantees of who pushed an image. We used Notary to enable users to sign images in their private registry namespaces if they so choose."

-
+{{< case-studies/quote author="Michael Hough, a software developer with the IBM Container Registry team" >}} +"We see CNCF as a safe haven for cloud native open source, providing stability, longevity, and expected maintenance for member projects—no matter the originating vendor or project." +{{< /case-studies/quote >}} -
-
-
-
- "We see CNCF as a safe haven for cloud native open source, providing stability, longevity, and expected maintenance for member projects—no matter the originating vendor or project."

- Michael Hough, a software developer with the IBM Container Registry team
-
-
-
-
-

Docker had already created the Notary project as an implementation of The Update Framework (TUF), and this implementation of TUF provided the capabilities for Docker Content Trust.

"After contribution to CNCF of both TUF and Notary, we perceived that it was becoming the de facto standard for image signing in the container ecosystem", says Michael Hough, a software developer with the IBM Cloud Container Registry team. -

-The key reason for selecting Notary was that it was already compatible with the existing authentication stack IBM’s container registry was using. So was the design of TUF, which does not require the registry team to have to enter the business of key management. Both of these were "attractive design decisions that confirmed our choice of Notary," he says. -

-The introduction of Notary to implement image signing capability in IBM Cloud encourages increased security across IBM's cloud platform, "where we expect it will include both the signing of official IBM images as well as expected use by security-conscious enterprise customers," Hough says. "When combined with security policy implementations, we expect an increased use of deployment policies in CI/CD pipelines that allow for fine-grained control of service deployment based on image signers." -The availability of image signing "is a huge benefit to security-conscious customers who require this level of image provenance and security," Hough says. "With our IBM Cloud Kubernetes as-a-service offering and the admission controller we have made available, it allows both IBM services as well as customers of the IBM public cloud to use security policies to control service deployment." +{{< case-studies/lead >}} +Docker had already created the Notary project as an implementation of The Update Framework (TUF), and this implementation of TUF provided the capabilities for Docker Content Trust. +{{< /case-studies/lead >}} +

"After contribution to CNCF of both TUF and Notary, we perceived that it was becoming the de facto standard for image signing in the container ecosystem", says Michael Hough, a software developer with the IBM Cloud Container Registry team.

-
-
-
-
- "Image signing is one key part of our Kubernetes container service offering, and our container registry team saw Notary as the de facto way to implement that capability in the current Docker and container ecosystem"

- Michael Hough, a software developer with the IBM Cloud Container Registry team
-
-
-
-
- Now that the Notary-implemented service is generally available in IBM’s public cloud as a component of its existing IBM Cloud Container Registry, it is deployed as a highly available service across five IBM Cloud regions. This high-availability deployment has three instances across two zones in each of the five regions, load balanced with failover support. "We have also deployed it with end-to-end TLS support through to our back-end IBM Cloudant persistence storage service," Hough says. -

- The IBM team has created and open sourced a Kubernetes admission controller called Portieris, which uses Notary signing information combined with customer-defined security policies to control image deployment into their cluster. "We are hoping to drive adoption of Portieris through its use of our Notary offering," Hough says. -

- IBM has been a key player in the creation and support of open source foundations, including CNCF. Todd Moore, IBM's vice president of Open Technology, is the current CNCF governing board chair and a number of IBMers are active across many of the CNCF member projects. +

The key reason for selecting Notary was that it was already compatible with the existing authentication stack IBM's container registry was using. So was the design of TUF, which does not require the registry team to have to enter the business of key management. Both of these were "attractive design decisions that confirmed our choice of Notary," he says.

+

The introduction of Notary to implement image signing capability in IBM Cloud encourages increased security across IBM's cloud platform, "where we expect it will include both the signing of official IBM images as well as expected use by security-conscious enterprise customers," Hough says. "When combined with security policy implementations, we expect an increased use of deployment policies in CI/CD pipelines that allow for fine-grained control of service deployment based on image signers."

+

The availability of image signing "is a huge benefit to security-conscious customers who require this level of image provenance and security," Hough says. "With our IBM Cloud Kubernetes as-a-service offering and the admission controller we have made available, it allows both IBM services as well as customers of the IBM public cloud to use security policies to control service deployment."

-
-
-
-
- "With our IBM Cloud Kubernetes as-a-service offering and the admission controller we have made available, it allows both IBM services as well as customers of the IBM public cloud to use security policies to control service deployment."

- Michael Hough, a software developer with the IBM Cloud Container Registry team
-
-
- +{{< case-studies/quote + image="/images/case-studies/ibm/banner3.jpg" + author="Michael Hough, a software developer with the IBM Cloud Container Registry team" +>}} +"Image signing is one key part of our Kubernetes container service offering, and our container registry team saw Notary as the de facto way to implement that capability in the current Docker and container ecosystem" +{{< /case-studies/quote >}} -
-
- "Given that, we see CNCF as a safe haven for cloud native open source, providing stability, longevity, and expected maintenance for member projects—no matter the originating vendor or project," Hough says. Because the entire cloud native world is a fast-moving area with many competing vendors and solutions, "we see the CNCF model as an arbiter of openness and fair play across the ecosystem," he says. -

-With both TUF and Notary as part of CNCF, IBM expects there to be standardization around these capabilities beyond just de facto standards for signing and provenance. IBM has determined to not simply consume Notary, but also to contribute to the open source project where applicable. "IBMers have contributed a CouchDB backend to support our use of IBM Cloudant as the persistent store; and are working on generalization of the pkcs11 provider, allowing support of other security hardware devices beyond Yubikey," Hough says. +

Now that the Notary-implemented service is generally available in IBM's public cloud as a component of its existing IBM Cloud Container Registry, it is deployed as a highly available service across five IBM Cloud regions. This high-availability deployment has three instances across two zones in each of the five regions, load balanced with failover support. "We have also deployed it with end-to-end TLS support through to our back-end IBM Cloudant persistence storage service," Hough says.

-
-
-
-
- "There are new projects addressing these challenges, including within CNCF. We will definitely be following these advancements with interest. We found the Notary community to be an active and friendly community open to changes, such as our addition of a CouchDB backend for persistent storage."

- Michael Hough, a software developer with the IBM Cloud Container Registry team
-
-
-
-
-The company has used other CNCF projects containerd, Envoy, Prometheus, gRPC, and CNI, and is looking into SPIFFE and SPIRE as well for potential future use. -

-What advice does Hough have for other companies that are looking to deploy Notary or a cloud native infrastructure? -

-"While this is true for many areas of cloud native infrastructure software, we found that a high-availability, multi-region deployment of Notary requires a solid implementation to handle certificate management and rotation," he says. "There are new projects addressing these challenges, including within CNCF. We will definitely be following these advancements with interest. We found the Notary community to be an active and friendly community open to changes, such as our addition of a CouchDB backend for persistent storage." +

The IBM team has created and open sourced a Kubernetes admission controller called Portieris, which uses Notary signing information combined with customer-defined security policies to control image deployment into their cluster. "We are hoping to drive adoption of Portieris through its use of our Notary offering," Hough says.

-
+

IBM has been a key player in the creation and support of open source foundations, including CNCF. Todd Moore, IBM's vice president of Open Technology, is the current CNCF governing board chair and a number of IBMers are active across many of the CNCF member projects.

-
+{{< case-studies/quote + image="/images/case-studies/ibm/banner4.jpg" + author="Michael Hough, a software developer with the IBM Cloud Container Registry team" +>}} +"With our IBM Cloud Kubernetes as-a-service offering and the admission controller we have made available, it allows both IBM services as well as customers of the IBM public cloud to use security policies to control service deployment." +{{< /case-studies/quote >}} + +

"Given that, we see CNCF as a safe haven for cloud native open source, providing stability, longevity, and expected maintenance for member projects—no matter the originating vendor or project," Hough says. Because the entire cloud native world is a fast-moving area with many competing vendors and solutions, "we see the CNCF model as an arbiter of openness and fair play across the ecosystem," he says.

+ +

With both TUF and Notary as part of CNCF, IBM expects there to be standardization around these capabilities beyond just de facto standards for signing and provenance. IBM has determined to not simply consume Notary, but also to contribute to the open source project where applicable. "IBMers have contributed a CouchDB backend to support our use of IBM Cloudant as the persistent store; and are working on generalization of the pkcs11 provider, allowing support of other security hardware devices beyond Yubikey," Hough says.

+ +{{< case-studies/quote author="Michael Hough, a software developer with the IBM Cloud Container Registry team" >}} +"There are new projects addressing these challenges, including within CNCF. We will definitely be following these advancements with interest. We found the Notary community to be an active and friendly community open to changes, such as our addition of a CouchDB backend for persistent storage." +{{< /case-studies/quote >}} + +

The company has used other CNCF projects containerd, Envoy, Prometheus, gRPC, and CNI, and is looking into SPIFFE and SPIRE as well for potential future use.

+ +

What advice does Hough have for other companies that are looking to deploy Notary or a cloud native infrastructure?

+ +

"While this is true for many areas of cloud native infrastructure software, we found that a high-availability, multi-region deployment of Notary requires a solid implementation to handle certificate management and rotation," he says. "There are new projects addressing these challenges, including within CNCF. We will definitely be following these advancements with interest. We found the Notary community to be an active and friendly community open to changes, such as our addition of a CouchDB backend for persistent storage."

diff --git a/content/en/case-studies/ing/index.html b/content/en/case-studies/ing/index.html index 943daec2de..037ba9775d 100644 --- a/content/en/case-studies/ing/index.html +++ b/content/en/case-studies/ing/index.html @@ -5,95 +5,74 @@ case_study_styles: true cid: caseStudies weight: 50 featured: true -css: /css/style_case_studies.css quote: > - The big cloud native promise to our business is the ability to go from idea to production within 48 hours. We are some years away from this, but that’s quite feasible to us. + The big cloud native promise to our business is the ability to go from idea to production within 48 hours. We are some years away from this, but that's quite feasible to us. + +new_case_study_styles: true +heading_background: /images/case-studies/ing/banner1.jpg +heading_title_logo: /images/ing_logo.png +subheading: > + Driving Banking Innovation with Cloud Native +case_study_details: + - Company: ING + - Location: Amsterdam, Netherlands + - Industry: Finance --- +

Challenge

-
-

CASE STUDY:
Driving Banking Innovation with Cloud Native -

+

After undergoing an agile transformation, ING realized it needed a standardized platform to support the work their developers were doing. "Our DevOps teams got empowered to be autonomous," says Infrastructure Architect Thijs Ebbers. "It has benefits; you get all kinds of ideas. But a lot of teams are going to devise the same wheel. Teams started tinkering with Docker, Docker Swarm, Kubernetes, Mesos. Well, it's not really useful for a company to have one hundred wheels, instead of one good wheel.

-
+

Solution

-
- Company  ING     Location  Amsterdam, Netherlands -     Industry  Finance -
+

Using Kubernetes for container orchestration and Docker for containerization, the ING team began building an internal public cloud for its CI/CD pipeline and green-field applications. The pipeline, which has been built on Mesos Marathon, will be migrated onto Kubernetes. The bank-account management app Yolt in the U.K. (and soon France and Italy) market already is live hosted on a Kubernetes framework. At least two greenfield projects currently on the Kubernetes framework will be going into production later this year. By the end of 2018, the company plans to have converted a number of APIs used in the banking customer experience to cloud native APIs and host these on the Kubernetes-based platform.

-
-
-
-
-

Challenge

- After undergoing an agile transformation, ING realized it needed a standardized platform to support the work their developers were doing. "Our DevOps teams got empowered to be autonomous," says Infrastructure Architect Thijs Ebbers. "It has benefits; you get all kinds of ideas. But a lot of teams are going to devise the same wheel. Teams started tinkering with Docker, Docker Swarm, Kubernetes, Mesos. Well, it’s not really useful for a company to have one hundred wheels, instead of one good wheel. -
-
-

Solution

- Using Kubernetes for container orchestration and Docker for containerization, the ING team began building an internal public cloud for its CI/CD pipeline and green-field applications. The pipeline, which has been built on Mesos Marathon, will be migrated onto Kubernetes. The bank-account management app Yolt in the U.K. (and soon France and Italy) market already is live hosted on a Kubernetes framework. At least two greenfield projects currently on the Kubernetes framework will be going into production later this year. By the end of 2018, the company plans to have converted a number of APIs used in the banking customer experience to cloud native APIs and host these on the Kubernetes-based platform. +

Impact

-
-
+

"Cloud native technologies are helping our speed, from getting an application to test to acceptance to production," says Infrastructure Architect Onno Van der Voort. "If you walk around ING now, you see all these DevOps teams, doing stand-ups, demoing. They try to get new functionality out there really fast. We held a hackathon for one of our existing components and basically converted it to cloud native within 2.5 days, though of course the tail takes more time before code is fully production ready."

-
-
-

Impact

- "Cloud native technologies are helping our speed, from getting an application to test to acceptance to production," says Infrastructure Architect Onno Van der Voort. "If you walk around ING now, you see all these DevOps teams, doing stand-ups, demoing. They try to get new functionality out there really fast. We held a hackathon for one of our existing components and basically converted it to cloud native within 2.5 days, though of course the tail takes more time before code is fully production ready." -
-
-
-
-
-
- "The big cloud native promise to our business is the ability to go from idea to production within 48 hours. We are some years away from this, but that’s quite feasible to us." -

— Thijs Ebbers, Infrastructure Architect, ING
-
-
-
-
-

ING has long embraced innovation in banking, launching the internet-based ING Direct in 1997.

In that same spirit, the company underwent an agile transformation a few years ago. "Our DevOps teams got empowered to be autonomous," says Infrastructure Architect Thijs Ebbers. "It has benefits; you get all kinds of ideas. But a lot of teams are going to devise the same wheel. Teams started tinkering with Docker, Docker Swarm, Kubernetes, Mesos. Well, it’s not really useful for a company to have one hundred wheels, instead of one good wheel."

- Looking to standardize the deployment process within the company’s strict security guidelines, the team looked at several solutions and found that in the past year, "Kubernetes won the container management framework wars," says Ebbers. "We decided to standardize ING on a Kubernetes framework." Everything is run on premise due to banking regulations, he adds, but "we will be building an internal public cloud. We are trying to get on par with what public clouds are doing. That’s one of the reasons we got Kubernetes."

- They also embraced Docker to address a major pain point in ING’s CI/CD pipeline. Before containerization, "Every development team had to order a VM, and it was quite a heavy delivery model for them," says Infrastructure Architect Onno Van der Voort. "Another use case for containerization is when the application travels through the pipeline, they fire up Docker containers to do test work against the applications and after they’ve done the work, the containers get killed again." +{{< case-studies/quote author="Thijs Ebbers, Infrastructure Architect, ING">}} +"The big cloud native promise to our business is the ability to go from idea to production within 48 hours. We are some years away from this, but that's quite feasible to us." +{{< /case-studies/quote >}} -
-
-
-
- "We decided to standardize ING on a Kubernetes framework." Everything is run on premise due to banking regulations, he adds, but "we will be building an internal public cloud. We are trying to get on par with what public clouds are doing. That’s one of the reasons we got Kubernetes." -

— Thijs Ebbers, Infrastructure Architect, ING
-
-
-
-
- Because of industry regulations, applications are only allowed to go through the pipeline, where compliance is enforced, rather than be deployed directly into a container. "We have to run the complete platform of services we need, many routing from different places," says Van der Voort. "We need this Kubernetes framework for deploying the containers, with all those components, monitoring, logging. It’s complex." For that reason, ING has chosen to start on the OpenShift Origin Kubernetes distribution.

- Already, "cloud native technologies are helping our speed, from getting an application to test to acceptance to production," says Van der Voort. "If you walk around ING now, you see all these DevOps teams, doing stand-ups, demoing. They try to get new functionality out there really fast. We held a hackathon for one of our existing components and basically converted it to cloud native within 2.5 days, though of course the tail takes more time before code is fully production ready."

- The pipeline, which has been built on Mesos Marathon, will be migrated onto Kubernetes. Some legacy applications are also being rewritten as cloud native in order to run on the framework. At least two smaller greenfield projects built on Kubernetes will go into production this year. By the end of 2018, the company plans to have converted a number of APIs used in the banking customer experience to cloud native APIs and host these on the Kubernetes-based platform. +{{< case-studies/lead >}} +ING has long embraced innovation in banking, launching the internet-based ING Direct in 1997. +{{< /case-studies/lead >}} -
-
-
-
- "We have to run the complete platform of services we need, many routing from different places. We need this Kubernetes framework for deploying the containers, with all those components, monitoring, logging. It’s complex."

— Onno Van der Voort, Infrastructure Architect, ING
-
-
+

In that same spirit, the company underwent an agile transformation a few years ago. "Our DevOps teams got empowered to be autonomous," says Infrastructure Architect Thijs Ebbers. "It has benefits; you get all kinds of ideas. But a lot of teams are going to devise the same wheel. Teams started tinkering with Docker, Docker Swarm, Kubernetes, Mesos. Well, it's not really useful for a company to have one hundred wheels, instead of one good wheel."

+

Looking to standardize the deployment process within the company's strict security guidelines, the team looked at several solutions and found that in the past year, "Kubernetes won the container management framework wars," says Ebbers. "We decided to standardize ING on a Kubernetes framework." Everything is run on premise due to banking regulations, he adds, but "we will be building an internal public cloud. We are trying to get on par with what public clouds are doing. That's one of the reasons we got Kubernetes."

-
- The team, however, doesn’t see the bank’s back-end systems going onto the Kubernetes platform. "Our philosophy is it only makes sense to move things to cloud if they are cloud native," says Van der Voort. "If you have traditional architecture, build traditional patterns, it doesn’t hold any value to go to the cloud." Adds Cloud Platform Architect Alfonso Fernandez-Barandiaran: "ING has a strategy about where we will go, in order to improve our agility. So it’s not about how cool this technology is, it’s about finding the right technology and the right approach."

- The Kubernetes framework will be hosting some greenfield projects that are high priority for ING: applications the company is developing in response to PSD2, the European Commission directive requiring more innovative online and mobile payments that went into effect at the beginning of 2018. For example, a bank-account management app called Yolt, serving the U.K. market (and soon France and Italy), was built on a Kubernetes platform and has gone into production. ING is also developing blockchain-enabled applications that will live on the Kubernetes platform. "We’ve been contacted by a lot of development teams that have ideas with what they want to do with containers," says Ebbers. +

They also embraced Docker to address a major pain point in ING's CI/CD pipeline. Before containerization, "Every development team had to order a VM, and it was quite a heavy delivery model for them," says Infrastructure Architect Onno Van der Voort. "Another use case for containerization is when the application travels through the pipeline, they fire up Docker containers to do test work against the applications and after they've done the work, the containers get killed again."

-
-
-
-
-Even with the particular requirements that come in banking, ING has managed to take a lead in technology and innovation. "Every time we have constraints, we look for maybe a better way that we can use this technology."

— Alfonso Fernandez-Barandiaran, Cloud Platform Architect, ING
-
+{{< case-studies/quote + image="/images/case-studies/ing/banner3.jpg" + author="Thijs Ebbers, Infrastructure Architect, ING" +>}} +"We decided to standardize ING on a Kubernetes framework." Everything is run on premise due to banking regulations, he adds, but "we will be building an internal public cloud. We are trying to get on par with what public clouds are doing. That's one of the reasons we got Kubernetes." +{{< /case-studies/quote >}} -
- Even with the particular requirements that come in banking, ING has managed to take a lead in technology and innovation. "Every time we have constraints, we look for maybe a better way that we can use this technology," says Fernandez-Barandiaran.

- The results, after all, are worth the effort. "The big cloud native promise to our business is the ability to go from idea to production within 48 hours," says Ebbers. "That would require all these projects to be mature. We are some years away from this, but that’s quite feasible to us." +

Because of industry regulations, applications are only allowed to go through the pipeline, where compliance is enforced, rather than be deployed directly into a container. "We have to run the complete platform of services we need, many routing from different places," says Van der Voort. "We need this Kubernetes framework for deploying the containers, with all those components, monitoring, logging. It's complex." For that reason, ING has chosen to start on the OpenShift Origin Kubernetes distribution.

-
+

Already, "cloud native technologies are helping our speed, from getting an application to test to acceptance to production," says Van der Voort. "If you walk around ING now, you see all these DevOps teams, doing stand-ups, demoing. They try to get new functionality out there really fast. We held a hackathon for one of our existing components and basically converted it to cloud native within 2.5 days, though of course the tail takes more time before code is fully production ready."

-
+

The pipeline, which has been built on Mesos Marathon, will be migrated onto Kubernetes. Some legacy applications are also being rewritten as cloud native in order to run on the framework. At least two smaller greenfield projects built on Kubernetes will go into production this year. By the end of 2018, the company plans to have converted a number of APIs used in the banking customer experience to cloud native APIs and host these on the Kubernetes-based platform.

+ +{{< case-studies/quote + image="/images/case-studies/ing/banner4.jpg" + author="Onno Van der Voort, Infrastructure Architect, ING" +>}} +"We have to run the complete platform of services we need, many routing from different places. We need this Kubernetes framework for deploying the containers, with all those components, monitoring, logging. It's complex." +{{< /case-studies/quote >}} + +

The team, however, doesn't see the bank's back-end systems going onto the Kubernetes platform. "Our philosophy is it only makes sense to move things to cloud if they are cloud native," says Van der Voort. "If you have traditional architecture, build traditional patterns, it doesn't hold any value to go to the cloud." Adds Cloud Platform Architect Alfonso Fernandez-Barandiaran: "ING has a strategy about where we will go, in order to improve our agility. So it's not about how cool this technology is, it's about finding the right technology and the right approach."

+ +

The Kubernetes framework will be hosting some greenfield projects that are high priority for ING: applications the company is developing in response to PSD2, the European Commission directive requiring more innovative online and mobile payments that went into effect at the beginning of 2018. For example, a bank-account management app called Yolt, serving the U.K. market (and soon France and Italy), was built on a Kubernetes platform and has gone into production. ING is also developing blockchain-enabled applications that will live on the Kubernetes platform. "We've been contacted by a lot of development teams that have ideas with what they want to do with containers," says Ebbers.

+ +{{< case-studies/quote author="Alfonso Fernandez-Barandiaran, Cloud Platform Architect, ING" >}} +Even with the particular requirements that come in banking, ING has managed to take a lead in technology and innovation. "Every time we have constraints, we look for maybe a better way that we can use this technology." +{{< /case-studies/quote >}} + +

Even with the particular requirements that come in banking, ING has managed to take a lead in technology and innovation. "Every time we have constraints, we look for maybe a better way that we can use this technology," says Fernandez-Barandiaran.

+ +

The results, after all, are worth the effort. "The big cloud native promise to our business is the ability to go from idea to production within 48 hours," says Ebbers. "That would require all these projects to be mature. We are some years away from this, but that's quite feasible to us."

diff --git a/content/en/case-studies/jd-com/index.html b/content/en/case-studies/jd-com/index.html index aed12fc54b..ae3360216b 100644 --- a/content/en/case-studies/jd-com/index.html +++ b/content/en/case-studies/jd-com/index.html @@ -3,95 +3,77 @@ title: JD.com Case Study linkTitle: jd-com case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css featured: false + +new_case_study_styles: true +heading_background: /images/case-studies/jdcom/banner1.jpg +heading_title_logo: /images/jdcom_logo.png +subheading: > + JD.com: How JD.com Pioneered Kubernetes for E-Commerce at Hyperscale +case_study_details: + - Company: JD.com + - Location: Beijing, China + - Industry: eCommerce --- -
-

CASE STUDY:
JD.com: How JD.com Pioneered Kubernetes for E-Commerce at Hyperscale +

Challenge

-

+

With more than 300 million active users and total 2017 revenue of more than $55 billion, JD.com is China's largest retailer, and its operations are the epitome of hyperscale. For example, there are more than a trillion images in JD.com's product databases—with 100 million being added daily—and this enormous amount of data needs to be instantly accessible. In 2014, JD.com moved its applications to containers running on bare metal machines using OpenStack and Docker to "speed up the delivery of our computing resources and make the operations much simpler," says Haifeng Liu, JD.com's Chief Architect. But by the end of 2015, with tens of thousands of nodes running in multiple data centers, "we encountered a lot of problems because our platform was not strong enough, and we suffered from bottlenecks and scalability issues," says Liu. "We needed infrastructure for the next five years of development, now."

-
+

Solution

-
- Company  JD.com     Location  Beijing, China     Industry  eCommerce -
+

JD.com turned to Kubernetes to accommodate its clusters. At the beginning of 2016, the company began to transition from OpenStack to Kubernetes, and today, JD.com runs the world's largest Kubernetes cluster. "Kubernetes has provided a strong foundation on top of which we have customized the solution to suit our needs as China's largest retailer."

-
-
-
-
-

Challenge

- With more than 300 million active users and total 2017 revenue of more than $55 billion, JD.com is China’s largest retailer, and its operations are the epitome of hyperscale. For example, there are more than a trillion images in JD.com’s product databases—with 100 million being added daily—and this enormous amount of data needs to be instantly accessible. In 2014, JD.com moved its applications to containers running on bare metal machines using OpenStack and Docker to "speed up the delivery of our computing resources and make the operations much simpler," says Haifeng Liu, JD.com’s Chief Architect. But by the end of 2015, with tens of thousands of nodes running in multiple data centers, "we encountered a lot of problems because our platform was not strong enough, and we suffered from bottlenecks and scalability issues," says Liu. "We needed infrastructure for the next five years of development, now." - -

Solution

- JD.com turned to Kubernetes to accommodate its clusters. At the beginning of 2016, the company began to transition from OpenStack to Kubernetes, and today, JD.com runs the world’s largest Kubernetes cluster. "Kubernetes has provided a strong foundation on top of which we have customized the solution to suit our needs as China’s largest retailer." -

Impact

- "We have greater data center efficiency, better managed resources, and smarter deployment with the Kubernetes platform," says Liu. Deployment time went from several hours to tens of seconds. Efficiency has improved by 20-30%, measured in IT costs. With the further optimizations the team is working on, Liu believes there is the potential to save hundreds of millions of dollars a year. But perhaps the best indication of success was the annual Singles Day shopping event, which ran on the Kubernetes platform for the first time in 2018. Over 11 days, transaction volume on JD.com was $23 billion, and "our e-commerce platforms did great," says Liu. "Infrastructure led the way to prep for 11.11. We took the approach of predicting volume, emulating the behavior of customers to prepare beforehand, and drilled for malfunctions. Because of Kubernetes’s scalability, we were able to handle an extremely high level of demand." -
-
-
-
-
- "Kubernetes helped us reduce the complexity of operations to make distributed systems stable and scalable. Most importantly, we can leverage Kubernetes for scheduling resources to reduce hardware costs. That’s the big win." -

- HAIFENG LIU, CHIEF ARCHITECT, JD.com
-
-
-
-
-

With more than 300 million active users and $55.7 billion in annual revenues last year, JD.com is China’s largest retailer, and its operations are the epitome of hyperscale.

- For example, there are more than a trillion images in JD.com’s product databases for customers, with 100 million being added daily. And this enormous amount of data needs to be instantly accessible to enable a smooth online customer experience. -

- In 2014, JD.com moved its applications to containers running on bare metal machines using OpenStack and Docker to "speed up the delivery of our computing resources and make the operations much simpler," says Haifeng Liu, JD.com’s Chief Architect. But by the end of 2015, with hundreds of thousands of nodes in multiple data centers, "we encountered a lot of problems because our platform was not strong enough, and we suffered from bottlenecks and scalability issues," Liu adds. "We needed infrastructure for the next five years of development, now." -

- After considering a number of orchestration technologies, JD.com decided to adopt Kubernetes to accommodate its ever-growing clusters. "The main reason is because Kubernetes can give us more efficient, scalable and much simpler application deployments, plus we can leverage it to do flexible platform scheduling," says Liu. +

"We have greater data center efficiency, better managed resources, and smarter deployment with the Kubernetes platform," says Liu. Deployment time went from several hours to tens of seconds. Efficiency has improved by 20-30%, measured in IT costs. With the further optimizations the team is working on, Liu believes there is the potential to save hundreds of millions of dollars a year. But perhaps the best indication of success was the annual Singles Day shopping event, which ran on the Kubernetes platform for the first time in 2018. Over 11 days, transaction volume on JD.com was $23 billion, and "our e-commerce platforms did great," says Liu. "Infrastructure led the way to prep for 11.11. We took the approach of predicting volume, emulating the behavior of customers to prepare beforehand, and drilled for malfunctions. Because of Kubernetes's scalability, we were able to handle an extremely high level of demand."

-
-
-
-
- "We customized Kubernetes and built a modern system on top of it. This entire ecosystem of Kubernetes plus our own optimizations have helped us save costs and time."

- HAIFENG LIU, CHIEF ARCHITECT, JD.com
+{{< case-studies/quote author="HAIFENG LIU, CHIEF ARCHITECT, JD.com" >}} +"Kubernetes helped us reduce the complexity of operations to make distributed systems stable and scalable. Most importantly, we can leverage Kubernetes for scheduling resources to reduce hardware costs. That's the big win." +{{< /case-studies/quote >}} -
-
-
-
- The fact that Kubernetes is based on Google’s Borg also gave the company confidence. The team liked that Kubernetes has a clear and simple architecture, and that it’s developed mostly in Go, which is a popular language within JD.com. Though he felt that at the time Kubernetes "was not mature enough," Liu says, "we adopted it anyway." -

- The team spent a year developing the new container engine platform based on Kubernetes, and at the end of 2016, began promoting it within the company. "We wanted the cluster to be the default way for creating services, so scalability is easier," says Liu. "We talked to developers, interest grew, and we solved problems together." Some of these problems included networking performance and etcd scalability. "But during the past two years, Kubernetes has become more mature and very stable," he adds. -

- Today, the company runs the world’s largest Kubernetes cluster. "We customized Kubernetes and built a modern system on top of it," says Liu. "This entire ecosystem of Kubernetes plus our own optimizations have helped us save costs and time. We have greater data center efficiency, better managed resources, and smarter deployment with the Kubernetes platform." +{{< case-studies/lead >}} +With more than 300 million active users and $55.7 billion in annual revenues last year, JD.com is China's largest retailer, and its operations are the epitome of hyperscale. +{{< /case-studies/lead >}} -
-
-
-
- "My advice is first you need to combine this technology with your own businesses, and the second is you need clear goals. You cannot just use the technology because others are using it. You need to consider your own objectives."

- HAIFENG LIU, CHIEF ARCHITECT, JD.com
-
-
+

For example, there are more than a trillion images in JD.com's product databases for customers, with 100 million being added daily. And this enormous amount of data needs to be instantly accessible to enable a smooth online customer experience.

-
-
- The results are clear: Deployment time went from several hours to tens of seconds. Efficiency has improved by 20-30%, measured in IT costs. But perhaps the best indication of success was the annual Singles Day shopping event, which ran on the Kubernetes platform for the first time in 2018. Over 11 days, transaction volume on JD.com was $23 billion, and "our e-commerce platforms did great," says Liu. "Infrastructure led the way to prep for 11.11. We took the approach of predicting volume, emulating the behavior of customers to prepare beforehand, and drilled for malfunctions. Because of Kubernetes’s scalability, we were able to handle an extremely high level of demand." -

- JD.com is now in its second stage with Kubernetes: The platform is already stable, scalable, and flexible, so the focus is on how to run things much more efficiently to further reduce costs. With the optimizations the team is working on with resource management, Liu believes there is the potential to save hundreds of millions of dollars a year. -

- "We run Kubernetes and container clusters on roughly tens of thousands of physical bare metal nodes," he says. "Using Kubernetes and leveraging our own machine learning pipeline to predict how many resources we need for each application we use, and our own intelligent scaling algorithm, we can improve our resource usage. If we boost the resource usage, for example, by several percent, that means we can reduce huge hardware costs. Then we don’t need that many servers to get that same amount of workload. That can save us a lot of resources." -
+

In 2014, JD.com moved its applications to containers running on bare metal machines using OpenStack and Docker to "speed up the delivery of our computing resources and make the operations much simpler," says Haifeng Liu, JD.com's Chief Architect. But by the end of 2015, with hundreds of thousands of nodes in multiple data centers, "we encountered a lot of problems because our platform was not strong enough, and we suffered from bottlenecks and scalability issues," Liu adds. "We needed infrastructure for the next five years of development, now."

-
-
-"We can share our successful experience with the community, and we also receive good feedback from others. So it’s mutually beneficial."

- HAIFENG LIU, CHIEF ARCHITECT, JD.com
-
+

After considering a number of orchestration technologies, JD.com decided to adopt Kubernetes to accommodate its ever-growing clusters. "The main reason is because Kubernetes can give us more efficient, scalable and much simpler application deployments, plus we can leverage it to do flexible platform scheduling," says Liu.

-
- JD.com, which won CNCF’s 2018 End User Award, is also using Helm, CNI, Harbor, and Vitess on its platform. JD.com developers have made considerable contributions to Vitess, the CNCF project for scalable MySQL cluster management, and the company hopes to donate its own project to CNCF in the near future. Community participation is a priority for JD.com. "We have a good partnership with this community," says Liu. "We can share our successful experience with the community, and we also receive good feedback from others. So it’s mutually beneficial." -

- To that end, Liu offers this advice for other companies considering adopting cloud native technology. "First you need to combine this technology with your own businesses, and the second is you need clear goals," he says. "You cannot just use the technology because others are using it. You need to consider your own objectives." -

- For JD.com’s objectives, these cloud native technologies have been an ideal fit with the company’s own homegrown innovation. "Kubernetes helped us reduce the complexity of operations to make distributed systems stable and scalable," says Liu. "Most importantly, we can leverage Kubernetes for scheduling resources to reduce hardware costs. That’s the big win." -
-
+{{< case-studies/quote + image="/images/case-studies/jdcom/banner3.jpg" + author="HAIFENG LIU, CHIEF ARCHITECT, JD.com" +>}} +"We customized Kubernetes and built a modern system on top of it. This entire ecosystem of Kubernetes plus our own optimizations have helped us save costs and time." +{{< /case-studies/quote >}} + +

The fact that Kubernetes is based on Google's Borg also gave the company confidence. The team liked that Kubernetes has a clear and simple architecture, and that it's developed mostly in Go, which is a popular language within JD.com. Though he felt that at the time Kubernetes "was not mature enough," Liu says, "we adopted it anyway."

+ +

The team spent a year developing the new container engine platform based on Kubernetes, and at the end of 2016, began promoting it within the company. "We wanted the cluster to be the default way for creating services, so scalability is easier," says Liu. "We talked to developers, interest grew, and we solved problems together." Some of these problems included networking performance and etcd scalability. "But during the past two years, Kubernetes has become more mature and very stable," he adds.

+ +

Today, the company runs the world's largest Kubernetes cluster. "We customized Kubernetes and built a modern system on top of it," says Liu. "This entire ecosystem of Kubernetes plus our own optimizations have helped us save costs and time. We have greater data center efficiency, better managed resources, and smarter deployment with the Kubernetes platform."

+ +{{< case-studies/quote + image="/images/case-studies/jdcom/banner4.jpg" + author="HAIFENG LIU, CHIEF ARCHITECT, JD.com" +>}} +"My advice is first you need to combine this technology with your own businesses, and the second is you need clear goals. You cannot just use the technology because others are using it. You need to consider your own objectives." +{{< /case-studies/quote >}} + +

The results are clear: Deployment time went from several hours to tens of seconds. Efficiency has improved by 20-30%, measured in IT costs. But perhaps the best indication of success was the annual Singles Day shopping event, which ran on the Kubernetes platform for the first time in 2018. Over 11 days, transaction volume on JD.com was $23 billion, and "our e-commerce platforms did great," says Liu. "Infrastructure led the way to prep for 11.11. We took the approach of predicting volume, emulating the behavior of customers to prepare beforehand, and drilled for malfunctions. Because of Kubernetes's scalability, we were able to handle an extremely high level of demand."

+ +

JD.com is now in its second stage with Kubernetes: The platform is already stable, scalable, and flexible, so the focus is on how to run things much more efficiently to further reduce costs. With the optimizations the team is working on with resource management, Liu believes there is the potential to save hundreds of millions of dollars a year.

+ +

"We run Kubernetes and container clusters on roughly tens of thousands of physical bare metal nodes," he says. "Using Kubernetes and leveraging our own machine learning pipeline to predict how many resources we need for each application we use, and our own intelligent scaling algorithm, we can improve our resource usage. If we boost the resource usage, for example, by several percent, that means we can reduce huge hardware costs. Then we don't need that many servers to get that same amount of workload. That can save us a lot of resources."

+ +{{< case-studies/quote author="HAIFENG LIU, CHIEF ARCHITECT, JD.com" >}} +"We can share our successful experience with the community, and we also receive good feedback from others. So it's mutually beneficial." +{{< /case-studies/quote >}} + +

JD.com, which won CNCF's 2018 End User Award, is also using Helm, CNI, Harbor, and Vitess on its platform. JD.com developers have made considerable contributions to Vitess, the CNCF project for scalable MySQL cluster management, and the company hopes to donate its own project to CNCF in the near future. Community participation is a priority for JD.com. "We have a good partnership with this community," says Liu. "We can share our successful experience with the community, and we also receive good feedback from others. So it's mutually beneficial."

+ +

To that end, Liu offers this advice for other companies considering adopting cloud native technology. "First you need to combine this technology with your own businesses, and the second is you need clear goals," he says. "You cannot just use the technology because others are using it. You need to consider your own objectives."

+ +

For JD.com's objectives, these cloud native technologies have been an ideal fit with the company's own homegrown innovation. "Kubernetes helped us reduce the complexity of operations to make distributed systems stable and scalable," says Liu. "Most importantly, we can leverage Kubernetes for scheduling resources to reduce hardware costs. That's the big win."

diff --git a/content/en/case-studies/naic/index.html b/content/en/case-studies/naic/index.html index 3deb91e480..23955bdef8 100644 --- a/content/en/case-studies/naic/index.html +++ b/content/en/case-studies/naic/index.html @@ -1,113 +1,87 @@ --- title: NAIC Case Study - linkTitle: NAIC case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css logo: naic_featured_logo.png featured: false + +new_case_study_styles: true +heading_background: /images/case-studies/naic/banner1.jpg +heading_title_logo: /images/naic_logo.png +subheading: > + A Culture and Technology Transition Enabled by Kubernetes +case_study_details: + - Company: NAIC + - Location: Washington, DC + - Industry: Regulatory --- -
-

CASE STUDY:
A Culture and Technology Transition Enabled by Kubernetes

+

Challenge

-
+

The National Association of Insurance Commissioners (NAIC), the U.S. standard-setting and regulatory support organization, was looking for a way to deliver new services faster to provide more value for members and staff. It also needed greater agility to improve productivity internally.

-
- Company  National Association of Insurance Commissioners (NAIC)     Location  Washington, DC     Industry  Regulatory -
- -
-
-
-
-

Challenge

- The National Association of Insurance Commissioners (NAIC), the U.S. standard-setting and regulatory support organization, was looking for a way to deliver new services faster to provide more value for members and staff. It also needed greater agility to improve productivity internally. -

-

Solution

- Beginning in 2016, they started using Cloud Native Computing Foundation (CNCF) tools such as Prometheus. NAIC began hosting internal systems and development systems on Kubernetes at the beginning of 2018, as part of a broad move toward the public cloud. "Our culture and technology transition is a strategy embraced by our top leaders," says Dan Barker, Chief Enterprise Architect. "It has already proven successful by allowing us to accelerate our value pipeline by more than double while decreasing our costs by more than half. We are also seeing customer satisfaction increase as we add more and more applications to these new technologies." - - - -
- -
+

Solution

+

Beginning in 2016, they started using Cloud Native Computing Foundation (CNCF) tools such as Prometheus. NAIC began hosting internal systems and development systems on Kubernetes at the beginning of 2018, as part of a broad move toward the public cloud. "Our culture and technology transition is a strategy embraced by our top leaders," says Dan Barker, Chief Enterprise Architect. "It has already proven successful by allowing us to accelerate our value pipeline by more than double while decreasing our costs by more than half. We are also seeing customer satisfaction increase as we add more and more applications to these new technologies."

Impact

- Leveraging Kubernetes, "our development teams can create rapid prototypes far faster than they used to," Barker said. Applications running on Kubernetes are more resilient than those running in other environments. The deployment of open source solutions is helping influence company culture, as NAIC becomes a more open and transparent organization. -

- "We completed a small prototype in two days that would have previously taken at least a month," Barker says. Resiliency is currently measured in how much downtime systems have. "They’ve basically had none, and the occasional issue is remedied in minutes," he says. +

Leveraging Kubernetes, "our development teams can create rapid prototypes far faster than they used to," Barker said. Applications running on Kubernetes are more resilient than those running in other environments. The deployment of open source solutions is helping influence company culture, as NAIC becomes a more open and transparent organization.

-
+

"We completed a small prototype in two days that would have previously taken at least a month," Barker says. Resiliency is currently measured in how much downtime systems have. "They've basically had none, and the occasional issue is remedied in minutes," he says.

-
-
-
-
- "Our culture and technology transition is a strategy embraced by our top leaders. It has already proven successful by allowing us to accelerate our value pipeline by more than double while decreasing our costs by more than half. We are also seeing customer satisfaction increase as we add more and more applications to these new technologies."

- Dan Barker, Chief Enterprise Architect, NAIC
-
-
-
-
- NAIC—which was created and overseen by the chief insurance regulators from the 50 states, the District of Columbia and five U.S. territories—provides a means through which state insurance regulators establish standards and best practices, conduct peer reviews, and coordinate their regulatory oversight. Their staff supports these efforts and represents the collective views of regulators in the United States and internationally. NAIC members, together with the organization’s central resources, form the national system of state-based insurance regulation in the United States.

-The organization has been using the cloud for years, and wanted to find more ways to quickly deliver new services that provide more value for members and staff. They looked to Kubernetes for a solution. Within NAIC, several groups are leveraging Kubernetes, one being the Platform Engineering Team. "The team building out these tools are not only deploying and operating Kubernetes, but they’re also using them," Barker says. "In fact, we’re using GitLab to deploy Kubernetes with a pipeline using kops. This team was created from developers, operators, and quality engineers from across the company, so their jobs have changed quite a bit."

-In addition, NAIC is onboarding teams to the new platform, and those teams have seen a lot of change in how they work and what they can do. "They now have more power in creating their own infrastructure and deploying their own applications," Barker says. They also use pipelines to facilitate their currently manual processes. NAIC has consumers who are using GitLab heavily, and they’re starting to use Kubernetes to deploy simple applications that help their internal processes. +{{< case-studies/quote author="Dan Barker, Chief Enterprise Architect, NAIC" >}} +"Our culture and technology transition is a strategy embraced by our top leaders. It has already proven successful by allowing us to accelerate our value pipeline by more than double while decreasing our costs by more than half. We are also seeing customer satisfaction increase as we add more and more applications to these new technologies." +{{< /case-studies/quote >}} +

NAIC—which was created and overseen by the chief insurance regulators from the 50 states, the District of Columbia and five U.S. territories—provides a means through which state insurance regulators establish standards and best practices, conduct peer reviews, and coordinate their regulatory oversight. Their staff supports these efforts and represents the collective views of regulators in the United States and internationally. NAIC members, together with the organization's central resources, form the national system of state-based insurance regulation in the United States.

-
-
-
-
- "In our experience, vendor lock-in and tooling that is highly specific results in less resilient technology with fewer minds working to solve problems and grow the community."

- Dan Barker, Chief Enterprise Architect, NAIC
-
-
-
-
- "We needed greater agility to enable our own productivity internally," he says. "We decided it was right for us to move everything to the public cloud [Amazon Web Services] to help with that process and be able to access many of the native tools that allows us to move faster by not needing to build everything." -The NAIC also wanted to be cloud-agnostic, "and Kubernetes helps with this for our compute layer," Barker says. "Compute is pretty standard across the clouds, and now we can take advantage of any of them while getting all of the other features Kubernetes offers."

-The NAIC currently hosts internal systems and development systems on Kubernetes, and has already seen how impactful it can be. "Our development teams can create rapid prototypes in minutes instead of weeks," Barker says. "This recently happened with an internal tool that had no measurable wait time on the infrastructure. It was solely development bound. There is now a central shared resource that lives in AWS, which means it can grow as needed." -The native integrations into Kubernetes at NAIC has made it easy to write code and have it running in minutes instead of weeks. Applications running on Kubernetes have also proven to be more resilient than those running in other environments. "We even have teams using this to create more internal tools to help with communication or automating some of their current tasks," Barker says. -

-"We knew that Kubernetes had become the de facto standard for container orchestration," he says. "Two major factors for selecting this were the three major cloud vendors hosting their own versions and having it hosted in a neutral party as fully open source." -

-As for other CNCF projects, NAIC is using Prometheus on a small scale and hopes to continue using it moving forward because of the seamless integration with Kubernetes. The Association also is considering gRPC as its internal communications standard, Envoy in conjunction with Istio for service mesh, OpenTracing and Jaeger for tracing aggregation, and Fluentd with its Elasticsearch cluster. +

The organization has been using the cloud for years, and wanted to find more ways to quickly deliver new services that provide more value for members and staff. They looked to Kubernetes for a solution. Within NAIC, several groups are leveraging Kubernetes, one being the Platform Engineering Team. "The team building out these tools are not only deploying and operating Kubernetes, but they're also using them," Barker says. "In fact, we're using GitLab to deploy Kubernetes with a pipeline using kops. This team was created from developers, operators, and quality engineers from across the company, so their jobs have changed quite a bit."

-
-
-
-
- "We knew that Kubernetes had become the de facto standard for container orchestration. Two major factors for selecting this were the three major cloud vendors hosting their own versions and having it hosted in a neutral party as fully open source."

- Dan Barker, Chief Enterprise Architect, NAIC
-
-
- +

In addition, NAIC is onboarding teams to the new platform, and those teams have seen a lot of change in how they work and what they can do. "They now have more power in creating their own infrastructure and deploying their own applications," Barker says. They also use pipelines to facilitate their currently manual processes. NAIC has consumers who are using GitLab heavily, and they're starting to use Kubernetes to deploy simple applications that help their internal processes.

-
-
+{{< case-studies/quote + image="/images/case-studies/naic/banner3.jpg" + author="Dan Barker, Chief Enterprise Architect, NAIC" +>}} +"In our experience, vendor lock-in and tooling that is highly specific results in less resilient technology with fewer minds working to solve problems and grow the community." +{{< /case-studies/quote >}} -The open governance and broad industry participation in CNCF provided a comfort level with the technology, Barker says. "We also see it as helping to influence our own company culture," he says. "We’re moving to be a more open and transparent company, and we are encouraging our staff to get involved with the different working groups and codebases. We recently became CNCF members to help further our commitment to community contribution and transparency."

-Factors such as vendor-neutrality and cross-industry investment were important in the selection. "In our experience, vendor lock-in and tooling that is highly specific results in less resilient technology with fewer minds working to solve problems and grow the community," Barker says.

-NAIC is a largely Oracle shop, Barker says, and has been running mostly Java on JBoss. "However, we have years of history with other applications," he says. "Some of these have been migrated by completely rewriting the application, while others are just being modified slightly to fit into this new paradigm."

-Running on AWS cloud, the Association has not specifically taken a microservices approach. "We are moving to microservices where practical, but we haven’t found that it’s a necessity to operate them within Kubernetes," Barker says

-All of its databases are currently running within public cloud services, but they have explored eventually running those in Kubernetes, as it makes sense. "We’re doing this to get more reuse from common components and to limit our failure domains to something more manageable and observable," Barker says. +

"We needed greater agility to enable our own productivity internally," he says. "We decided it was right for us to move everything to the public cloud [Amazon Web Services] to help with that process and be able to access many of the native tools that allows us to move faster by not needing to build everything." +The NAIC also wanted to be cloud-agnostic, "and Kubernetes helps with this for our compute layer," Barker says. "Compute is pretty standard across the clouds, and now we can take advantage of any of them while getting all of the other features Kubernetes offers."

+

The NAIC currently hosts internal systems and development systems on Kubernetes, and has already seen how impactful it can be. "Our development teams can create rapid prototypes in minutes instead of weeks," Barker says. "This recently happened with an internal tool that had no measurable wait time on the infrastructure. It was solely development bound. There is now a central shared resource that lives in AWS, which means it can grow as needed."

-
+

The native integrations into Kubernetes at NAIC has made it easy to write code and have it running in minutes instead of weeks. Applications running on Kubernetes have also proven to be more resilient than those running in other environments. "We even have teams using this to create more internal tools to help with communication or automating some of their current tasks," Barker says.

-
-
- "We have been able to move much faster at lower cost than we were able to in the past," Barker says. "We were able to complete one of our projects in a year, when the previous version took over two years. And the new project cost $500,000 while the original required $3 million, and with fewer defects. We are also able to push out new features much faster."

- Dan Barker, Chief Enterprise Architect, NAIC
+

"We knew that Kubernetes had become the de facto standard for container orchestration," he says. "Two major factors for selecting this were the three major cloud vendors hosting their own versions and having it hosted in a neutral party as fully open source."

-
-
-
+

As for other CNCF projects, NAIC is using Prometheus on a small scale and hopes to continue using it moving forward because of the seamless integration with Kubernetes. The Association also is considering gRPC as its internal communications standard, Envoy in conjunction with Istio for service mesh, OpenTracing and Jaeger for tracing aggregation, and Fluentd with its Elasticsearch cluster.

-NAIC has seen a significant business impact from its efforts. "We have been able to move much faster at lower cost than we were able to in the past," Barker says. "We were able to complete one of our projects in a year, when the previous version took over two years. And the new project cost $500,000 while the original required $3 million, and with fewer defects. We are also able to push out new features much faster." -He says the organization is moving toward continuous deployment "because the business case makes sense. The research is becoming very hard to argue with. We want to reduce our batch sizes and optimize on delivering value to customers and not feature count. This is requiring a larger cultural shift than just a technology shift." -NAIC is "becoming more open and transparent, as well as more resilient to failure," Barker says. "Even our customers are wanting more and more of this and trying to figure out how they can work with us to accomplish our mutual goals faster. Members of the insurance industry have reached out so that we can better learn together and grow as an industry." +{{< case-studies/quote + image="/images/case-studies/naic/banner4.jpg" + author="Dan Barker, Chief Enterprise Architect, NAIC" +>}} +"We knew that Kubernetes had become the de facto standard for container orchestration. Two major factors for selecting this were the three major cloud vendors hosting their own versions and having it hosted in a neutral party as fully open source." +{{< /case-studies/quote >}} -
+

The open governance and broad industry participation in CNCF provided a comfort level with the technology, Barker says. "We also see it as helping to influence our own company culture," he says. "We're moving to be a more open and transparent company, and we are encouraging our staff to get involved with the different working groups and codebases. We recently became CNCF members to help further our commitment to community contribution and transparency."

-
+

Factors such as vendor-neutrality and cross-industry investment were important in the selection. "In our experience, vendor lock-in and tooling that is highly specific results in less resilient technology with fewer minds working to solve problems and grow the community," Barker says.

+ +

NAIC is a largely Oracle shop, Barker says, and has been running mostly Java on JBoss. "However, we have years of history with other applications," he says. "Some of these have been migrated by completely rewriting the application, while others are just being modified slightly to fit into this new paradigm."

+ +

Running on AWS cloud, the Association has not specifically taken a microservices approach. "We are moving to microservices where practical, but we haven't found that it's a necessity to operate them within Kubernetes," Barker says.

+ +

All of its databases are currently running within public cloud services, but they have explored eventually running those in Kubernetes, as it makes sense. "We're doing this to get more reuse from common components and to limit our failure domains to something more manageable and observable," Barker says.

+ +{{< case-studies/quote author="Dan Barker, Chief Enterprise Architect, NAIC" >}} +"We have been able to move much faster at lower cost than we were able to in the past," Barker says. "We were able to complete one of our projects in a year, when the previous version took over two years. And the new project cost $500,000 while the original required $3 million, and with fewer defects. We are also able to push out new features much faster." +{{< /case-studies/quote >}} + +

NAIC has seen a significant business impact from its efforts. "We have been able to move much faster at lower cost than we were able to in the past," Barker says. "We were able to complete one of our projects in a year, when the previous version took over two years. And the new project cost $500,000 while the original required $3 million, and with fewer defects. We are also able to push out new features much faster."

+ +

He says the organization is moving toward continuous deployment "because the business case makes sense. The research is becoming very hard to argue with. We want to reduce our batch sizes and optimize on delivering value to customers and not feature count. This is requiring a larger cultural shift than just a technology shift."

+ +

NAIC is "becoming more open and transparent, as well as more resilient to failure," Barker says. "Even our customers are wanting more and more of this and trying to figure out how they can work with us to accomplish our mutual goals faster. Members of the insurance industry have reached out so that we can better learn together and grow as an industry."

\ No newline at end of file diff --git a/content/en/case-studies/nav/index.html b/content/en/case-studies/nav/index.html index bd606e7314..6be206b5e3 100644 --- a/content/en/case-studies/nav/index.html +++ b/content/en/case-studies/nav/index.html @@ -1,95 +1,81 @@ --- title: Nav Case Study - linkTitle: Nav case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css featured: false + +new_case_study_styles: true +heading_background: /images/case-studies/nav/banner1.jpg +heading_title_logo: /images/nav_logo.png +subheading: > + How A Startup Reduced Its Infrastructure Costs by 50% With Kubernetes +case_study_details: + - Company: Nav + - Location: Salt Lake City, Utah, and San Mateo, California + - Industry: Financial services for businesses --- -
-

CASE STUDY:
How A Startup Reduced Its Infrastructure Costs by 50% With Kubernetes -

+

Challenge

-
+

Founded in 2012, Nav provides small business owners with access to their business credit scores from all three major commercial credit bureaus—Equifax, Experian and Dun & Bradstreet—and financing options that best fit their needs. Five years in, the startup was growing rapidly, and "our cloud environments were getting very large, and our usage of those environments was extremely low, like under 1%," says Director of Engineering Travis Jeppson. "We wanted our usage of cloud environments to be more tightly coupled with what we actually needed, so we started looking at containerization and orchestration to help us be able to run workloads that were distinct from one another but could share a similar resource pool."

-
- Company  Nav     Location  Salt Lake City, Utah, and San Mateo, California     Industry  Financial services for businesses -
+

Solution

+ +

After evaluating a number of orchestration solutions, the Nav team decided to adopt Kubernetes running on AWS. The strength of the community around Kubernetes was a strong draw, as well as its Google provenance. Plus, "the other solutions tended to be fairly heavy-handed, really complex, really large, and really hard to manage just off the bat," says Jeppson. "Kubernetes gave us a very simple way to be able to step into an orchestration solution that fit our needs at the time, but also the extensibility of it allowed us to be able to grow with it and be able to build in more features and functionality later on."

-
-
-
-
-

-

Challenge

- Founded in 2012, Nav provides small business owners with access to their business credit scores from all three major commercial credit bureaus—Equifax, Experian and Dun & Bradstreet—and financing options that best fit their needs. Five years in, the startup was growing rapidly, and “our cloud environments were getting very large, and our usage of those environments was extremely low, like under 1%,” says Director of Engineering Travis Jeppson. “We wanted our usage of cloud environments to be more tightly coupled with what we actually needed, so we started looking at containerization and orchestration to help us be able to run workloads that were distinct from one another but could share a similar resource pool.” -

-

Solution

- After evaluating a number of orchestration solutions, the Nav team decided to adopt Kubernetes running on AWS. The strength of the community around Kubernetes was a strong draw, as well as its Google provenance. Plus, “the other solutions tended to be fairly heavy-handed, really complex, really large, and really hard to manage just off the bat,” says Jeppson. “Kubernetes gave us a very simple way to be able to step into an orchestration solution that fit our needs at the time, but also the extensibility of it allowed us to be able to grow with it and be able to build in more features and functionality later on.” -

Impact

- The four-person team got Kubernetes up and running in six months, and the full migration of Nav’s 25 microservices was completed in another six months. The results have been impressive: Resource utilization, which led the company on this path in the first place, has increased from 1% to 40%. Launching a new service used to take two developers two weeks; now it takes only one developer less than 10 minutes. Deployments have increased 5x. And the company is saving 50% in infrastructure costs. -
-
-
-
-
- -

- "Kubernetes gave us a very simple way to be able to step into an orchestration solution that fit our needs at the time, but also the extensibility of it allowed us to be able to grow with it and be able to build in more features and functionality later on." -

- Travis Jeppson, Director of Engineering, Nav
-
-
-
-
-

Founded in 2012, Nav provides small business owners with access to their business credit scores from all three major commercial credit bureaus—Equifax, Experian and Dun & Bradstreet—as well as details on their businesses’ financial health and financing options that best fit their needs. Its mission boils down to this, says Director of Engineering Travis Jeppson: “to increase the success rate of small businesses.”

- A couple of years ago, Nav recognized an obstacle in its own path to success. The business was growing rapidly, and “our cloud environments were getting very large, and our usage of those environments was extremely low, like under 1%,” says Jeppson. “Most of the problem was around the ability to scale. We were just throwing money at it. ‘Let’s just spin up more servers. Let’s just do more things in order to handle an increased load.’ And with us being a startup, that could lead to our demise. We don’t have the money to burn on that kind of stuff.”

- Plus, every new service had to go through 10 different people, taking an unacceptably long two weeks to launch. “All of the patch management and the server management was done very manually, and so we all had to watch it and maintain it really well,” adds Jeppson. “It was just a very troublesome system.” +

The four-person team got Kubernetes up and running in six months, and the full migration of Nav's 25 microservices was completed in another six months. The results have been impressive: Resource utilization, which led the company on this path in the first place, has increased from 1% to 40%. Launching a new service used to take two developers two weeks; now it takes only one developer less than 10 minutes. Deployments have increased 5x. And the company is saving 50% in infrastructure costs.

+{{< case-studies/quote author="Travis Jeppson, Director of Engineering, Nav" >}} + +
+"Kubernetes gave us a very simple way to be able to step into an orchestration solution that fit our needs at the time, but also the extensibility of it allowed us to be able to grow with it and be able to build in more features and functionality later on." +{{< /case-studies/quote >}} -
-
-
-
- "The community is absolutely vital: being able to pass ideas around, talk about a lot of the similar challenges that we’re all facing, and just get help. I like that we’re able to tackle the same problems for different reasons but help each other along the way."

- Travis Jeppson, Director of Engineering, Nav
+{{< case-studies/lead >}} +Founded in 2012, Nav provides small business owners with access to their business credit scores from all three major commercial credit bureaus—Equifax, Experian and Dun & Bradstreet—as well as details on their businesses' financial health and financing options that best fit their needs. Its mission boils down to this, says Director of Engineering Travis Jeppson: "to increase the success rate of small businesses." +{{< /case-studies/lead >}} -
-
-
-
- Jeppson had worked with containers at his previous job, and pitched that technology to Nav’s management as a solution to these problems. He got the green light in early 2017. “We wanted our usage of cloud environments to be more tightly coupled with what we actually needed, so we started looking at containerization and orchestration to help us be able to run workloads that were distinct from one another but could share a similar resource pool,” he says.

- After evaluating a number of orchestration solutions, the company decided to adopt Kubernetes running on AWS. The strength of the community around Kubernetes was a strong draw, as was its Google origins. Additionally, “the other solutions tended to be fairly heavy-handed, really complex, really large, and really hard to manage just off the bat,” says Jeppson. “Kubernetes gave us a very simple way to be able to step into an orchestration solution that fit our needs at the time, but the extensibility of it would also allow us to grow with it and build in more features and functionality later on.”

- Jeppson’s four-person Engineering Services team got Kubernetes up and running in six months (they decided to use Kubespray to spin up clusters), and the full migration of Nav’s 25 microservices and one primary monolith was completed in another six months. “We couldn’t rewrite everything; we couldn’t stop,” he says. “We had to stay up, we had to stay available, and we had to have minimal amount of downtime. So we got really comfortable around our building pipeline, our metrics and logging, and then around Kubernetes itself: how to launch it, how to upgrade it, how to service it. And we moved little by little.” -
-
-
-
- “Kubernetes has brought so much value to Nav by allowing all of these new freedoms that we had just never had before.”

- Travis Jeppson, Director of Engineering, Nav
-
-
+

A couple of years ago, Nav recognized an obstacle in its own path to success. The business was growing rapidly, and "our cloud environments were getting very large, and our usage of those environments was extremely low, like under 1%," says Jeppson. "Most of the problem was around the ability to scale. We were just throwing money at it. 'Let's just spin up more servers. Let's just do more things in order to handle an increased load.' And with us being a startup, that could lead to our demise. We don't have the money to burn on that kind of stuff."

-
+

Plus, every new service had to go through 10 different people, taking an unacceptably long two weeks to launch. "All of the patch management and the server management was done very manually, and so we all had to watch it and maintain it really well," adds Jeppson. "It was just a very troublesome system."

-
- A crucial part of the process involved educating Nav’s 50 engineers and being transparent regarding the new workflow as well as the roadmap for the migration. Jeppson did regular presentations along the way, and a week of four-hours-a-day labs for the entire staff of engineers. He then created a repository in GitLab to house all of the information. “We showed all the frontend and backend developers how to go in, create their own namespace using kubectl, all themselves,” he says. “Now, a lot of times, they just come to us and say, ‘This is ready.’ We click a little button in GitLab to allow it to release into production, and they’re off to the races.”

- Since the migration was completed in early 2018, the results have been impressive: Resource utilization, which led the company on this path in the first place, has increased from 1% to 40%. Launching a new service used to take two developers two weeks; now it takes only one developer less than 10 minutes. Deployments have increased 5x, from 10 a day to 50 a day. And the company is saving 50% in infrastructure costs on the computational side. “Next we want to go in to address the database side, and once we do that, then we’re going to continue to drop that cost quite a bit more,” says Jeppson.

- Kubernetes has also helped Nav with its compliance needs. Before, “we had to map one application to one server, mostly due to different compliance regulations around data,” Jeppson says. “With the Kubernetes API, we could add in network policies and segregate that data and restrict it if needed.” The company segregates its cluster into an unrestricted zone and a restricted zone, which has its own set of nodes where data protection happens. The company also uses the Twistlock tool to ensure security, “and that makes it a lot easier to sleep at night,” he adds. +{{< case-studies/quote + image="/images/case-studies/nav/banner3.jpg" + author="Travis Jeppson, Director of Engineering, Nav" +>}} +"The community is absolutely vital: being able to pass ideas around, talk about a lot of the similar challenges that we're all facing, and just get help. I like that we're able to tackle the same problems for different reasons but help each other along the way." +{{< /case-studies/quote >}} -
+

Jeppson had worked with containers at his previous job, and pitched that technology to Nav's management as a solution to these problems. He got the green light in early 2017. "We wanted our usage of cloud environments to be more tightly coupled with what we actually needed, so we started looking at containerization and orchestration to help us be able to run workloads that were distinct from one another but could share a similar resource pool," he says.

-
-
-"We’re talking four to 10 times the amount of traffic that we handle now, and it’s just like, 'Oh, yeah. We’re good. Kubernetes handles this for us.'"

- Travis Jeppson, Director of Engineering, Nav
-
+

After evaluating a number of orchestration solutions, the company decided to adopt Kubernetes running on AWS. The strength of the community around Kubernetes was a strong draw, as was its Google origins. Additionally, "the other solutions tended to be fairly heavy-handed, really complex, really large, and really hard to manage just off the bat," says Jeppson. "Kubernetes gave us a very simple way to be able to step into an orchestration solution that fit our needs at the time, but the extensibility of it would also allow us to grow with it and build in more features and functionality later on."

-
- With Kubernetes in place, the Nav team also started improving the system’s metrics and logging by adopting Prometheus. “Prometheus created a standard around metrics that was really easy for a developer to adopt,” says Jeppson. “They have the freedom to display what they want, to do what they need, and keep their codebase clean, and that to us was absolutely a must.”

- Next up for Nav in the coming year: looking at tracing, storage, and service mesh. They’re currently evaluating Envoy, OpenTracing, and Jaeger after spending much of KubeCon talking to other companies. “The community is absolutely vital: being able to pass ideas around, talk about a lot of the similar challenges that we’re all facing, and just get help. I like that we’re able to tackle the same problems for different reasons but help each other along the way,” says Jeppson. “There’s still so, so much to do around scalability, around being able to really fully adopt a cloud native solution.”

- Of course, it all starts with Kubernetes. With that technology, Jeppson’s team has built a platform that allows Nav to scale, and that “has brought so much value to Nav by allowing all of these new freedoms that we had just never had before,” he says.

- Conversations about new products used to be bogged down by the fact they’d have to wait six months to get an environment set up with isolation and then figure out how to handle spikes of traffic. “But now it’s just nothing to us,” says Jeppson. “We’re talking four to 10 times the amount of traffic that we handle now, and it’s just like, ‘Oh, yeah. We’re good. Kubernetes handles this for us.’” +

Jeppson's four-person Engineering Services team got Kubernetes up and running in six months (they decided to use Kubespray to spin up clusters), and the full migration of Nav's 25 microservices and one primary monolith was completed in another six months. "We couldn't rewrite everything; we couldn't stop," he says. "We had to stay up, we had to stay available, and we had to have minimal amount of downtime. So we got really comfortable around our building pipeline, our metrics and logging, and then around Kubernetes itself: how to launch it, how to upgrade it, how to service it. And we moved little by little."

-
-
+{{< case-studies/quote + image="/images/case-studies/nav/banner4.jpg" + author="Travis Jeppson, Director of Engineering, Nav" +>}} +"Kubernetes has brought so much value to Nav by allowing all of these new freedoms that we had just never had before." +{{< /case-studies/quote >}} + +

A crucial part of the process involved educating Nav's 50 engineers and being transparent regarding the new workflow as well as the roadmap for the migration. Jeppson did regular presentations along the way, and a week of four-hours-a-day labs for the entire staff of engineers. He then created a repository in GitLab to house all of the information. "We showed all the frontend and backend developers how to go in, create their own namespace using kubectl, all themselves," he says. "Now, a lot of times, they just come to us and say, 'This is ready.' We click a little button in GitLab to allow it to release into production, and they're off to the races."

+ +

Since the migration was completed in early 2018, the results have been impressive: Resource utilization, which led the company on this path in the first place, has increased from 1% to 40%. Launching a new service used to take two developers two weeks; now it takes only one developer less than 10 minutes. Deployments have increased 5x, from 10 a day to 50 a day. And the company is saving 50% in infrastructure costs on the computational side. "Next we want to go in to address the database side, and once we do that, then we're going to continue to drop that cost quite a bit more," says Jeppson.

+ +

Kubernetes has also helped Nav with its compliance needs. Before, "we had to map one application to one server, mostly due to different compliance regulations around data," Jeppson says. "With the Kubernetes API, we could add in network policies and segregate that data and restrict it if needed." The company segregates its cluster into an unrestricted zone and a restricted zone, which has its own set of nodes where data protection happens. The company also uses the Twistlock tool to ensure security, "and that makes it a lot easier to sleep at night," he adds.

+ +{{< case-studies/quote author="Travis Jeppson, Director of Engineering, Nav" >}} +"We're talking four to 10 times the amount of traffic that we handle now, and it's just like, 'Oh, yeah. We're good. Kubernetes handles this for us.'" +{{< /case-studies/quote >}} + +

With Kubernetes in place, the Nav team also started improving the system's metrics and logging by adopting Prometheus. "Prometheus created a standard around metrics that was really easy for a developer to adopt," says Jeppson. "They have the freedom to display what they want, to do what they need, and keep their codebase clean, and that to us was absolutely a must."

+ +

Next up for Nav in the coming year: looking at tracing, storage, and service mesh. They're currently evaluating Envoy, OpenTracing, and Jaeger after spending much of KubeCon talking to other companies. "The community is absolutely vital: being able to pass ideas around, talk about a lot of the similar challenges that we're all facing, and just get help. I like that we're able to tackle the same problems for different reasons but help each other along the way," says Jeppson. "There's still so, so much to do around scalability, around being able to really fully adopt a cloud native solution."

+ +

Of course, it all starts with Kubernetes. With that technology, Jeppson's team has built a platform that allows Nav to scale, and that "has brought so much value to Nav by allowing all of these new freedoms that we had just never had before," he says.

+ +

Conversations about new products used to be bogged down by the fact they'd have to wait six months to get an environment set up with isolation and then figure out how to handle spikes of traffic. "But now it's just nothing to us," says Jeppson. "We're talking four to 10 times the amount of traffic that we handle now, and it's just like, 'Oh, yeah. We're good. Kubernetes handles this for us.'"

diff --git a/content/en/case-studies/nerdalize/index.html b/content/en/case-studies/nerdalize/index.html index 2756ce431c..d961c0a289 100644 --- a/content/en/case-studies/nerdalize/index.html +++ b/content/en/case-studies/nerdalize/index.html @@ -3,94 +3,79 @@ title: Prowise Case Study linkTitle: prowise case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css featured: false + +new_case_study_styles: true +heading_background: /images/case-studies/nerdalize/banner1.jpg +heading_title_logo: /images/nerdalize_logo.png +subheading: > + Nerdalize: Providing Affordable and Sustainable Cloud Hosting with Kubernetes +case_study_details: + - Company: Nerdalize + - Location: Delft, Netherlands + - Industry: Cloud Provider --- -
-

CASE STUDY:
Nerdalize: Providing Affordable and Sustainable Cloud Hosting with Kubernetes -

-
+

Challenge

-
- Company  Nerdalize     Location  Delft, Netherlands      Industry  Cloud Provider -
+

Nerdalize offers affordable cloud hosting for customers—and free heat and hot water for people who sign up to house the heating devices that contain the company's servers. The savings Nerdalize realizes by not running data centers are passed on to its customers. When the team began using Docker to make its software more portable, it realized it also needed a container orchestration solution. "As a cloud provider, we have internal services for hosting our backends and billing our customers, but we also need to offer our compute to our end users," says Digital Product Engineer Ad van der Veer. "Since we have these heating devices spread across the Netherlands, we need some way of tying that all together."

-
-
-
-
+

Solution

+ +

After briefly using a basic scheduling setup with another open source tool, Nerdalize switched to Kubernetes. "On top of our heating devices throughout the Netherlands, we have a virtual machine layer, and on top of that we run Kubernetes clusters for our customers," says van der Veer. "As a small company, we have to provide a rock solid story in terms of the technology. Kubernetes allows us to offer a hybrid solution: 'You can run this on our cloud, but you can run it on other clouds as well. It runs in your internal hardware if you like.' And together with the Docker image standard and our multi-cloud dashboard, that allows them peace of mind."

-

Challenge

- Nerdalize offers affordable cloud hosting for customers—and free heat and hot water for people who sign up to house the heating devices that contain the company’s servers. The savings Nerdalize realizes by not running data centers are passed on to its customers. When the team began using Docker to make its software more portable, it realized it also needed a container orchestration solution. “As a cloud provider, we have internal services for hosting our backends and billing our customers, but we also need to offer our compute to our end users,” says Digital Product Engineer Ad van der Veer. “Since we have these heating devices spread across the Netherlands, we need some way of tying that all together.” -

Solution

- After briefly using a basic scheduling setup with another open source tool, Nerdalize switched to Kubernetes. “On top of our heating devices throughout the Netherlands, we have a virtual machine layer, and on top of that we run Kubernetes clusters for our customers,” says van der Veer. “As a small company, we have to provide a rock solid story in terms of the technology. Kubernetes allows us to offer a hybrid solution: ‘You can run this on our cloud, but you can run it on other clouds as well. It runs in your internal hardware if you like.’ And together with the Docker image standard and our multi-cloud dashboard, that allows them peace of mind.”

Impact

- Nerdalize prides itself on being a Kubernetes-native cloud provider that charges its customers prices 40% below that of other cloud providers. “Every euro that we have to invest for licensing of software that’s not open source comes from that 40%,” says van der Veer. If they had used a non-open source orchestration platform instead of Kubernetes, “that would reduce this proposition that we have of 40% less cost to like 30%. Kubernetes directly allows us to have this business model and this strategic advantage.” Nerdalize customers also benefit from time savings: One went from spending a day to set up VMs, network, and software, to spinning up a Kubernetes cluster in minutes. And for households using the heating devices, they save an average of 200 euro a year on their heating bill. The environmental impact? The annual reduction in CO2 emissions comes out to be 2 tons per Nerdalize household, which is equivalent to a car driving 8,000 km. -
-
-
-
-
- “We can walk into a boardroom and put a Kubernetes logo up, and people accept it as an established technology. It becomes this centerpiece where other cloud native projects can tie in, so there’s a network effect that each project empowers each other. This is something that has a lot of value when we have to talk to customers and convince them that our cloud fits their needs.” -

— AD VAN DER VEER, PRODUCT ENGINEER, NERDALIZE
-
-
-
-
-

Nerdalize is a cloud hosting provider that has no data centers. Instead, the four-year-old startup places its servers in homes across the Netherlands, inside heating devices it developed to turn the heat produced by the servers into heating and hot water for the residents. -

- “Households save on their gas bills, and cloud users have a much more sustainable cloud solution,” says Maaike Stoops, Customer Experience Queen at Nerdalize. “And we don’t have the overhead of building a data center, so our cloud is up to 40% more affordable.” -

- That business model has been enabled by the company’s adoption of containerization and Kubernetes. “When we just got started, Docker was just introduced,” says Digital Product Engineer Ad van der Veer. “We began with a very basic bare metal setup, but once we developed the business, we saw that containerization technology was super useful to help our customers. As a cloud provider, we have internal services for hosting our backends and billing our customers, but we also need to offer our compute to our end users. Since we have these heating devices spread across the Netherlands, we need some way of tying that all together.” -

- After trying to develop its own scheduling system using another open source tool, Nerdalize found Kubernetes. “Kubernetes provided us with more functionality out of the gate,” says van der Veer. -
-
-
-
- “We always try to get a working version online first, like minimal viable products, and then move to stabilize that,” says van der Veer. “And I think that these kinds of day-two problems are now immediately solved. The rapid prototyping we saw internally is a very valuable aspect of Kubernetes.”

— AD VAN DER VEER, PRODUCT ENGINEER, NERDALIZE
-
-
-
-
- The team first experimented with a basic use case to run customers’ workloads on Kubernetes. “Getting the data working was kind of difficult, and at the time the installation wasn’t that simple,” says van der Veer. “Then CNCF started, we saw the community grow, these problems got solved, and from there it became a very easy decision.” -

- The first Nerdalize product that was launched in 2017 was “100% containerized and Kubernetes native,” says van der Veer. “On top of our heating devices throughout the Netherlands, we have a virtual machine layer, and on top of that we run Kubernetes clusters for our customers. As a small company, we have to provide a rock solid story in terms of the technology. Kubernetes allows us to offer a hybrid solution: ‘You can run this on our cloud, but you can run it on other clouds as well. It runs in your internal hardware if you like.’ And together with the Docker image standard and our multi-cloud dashboard, that gives them peace of mind.” -

- Not to mention the 40% cost savings. “Every euro that we have to invest for licensing of software that’s not open source comes from that 40%,” says van der Veer. If Nerdalize had used a non-open source orchestration platform instead of Kubernetes, “that would reduce our cost savings proposition to like 30%. Kubernetes directly allows us to have this business model and this strategic advantage.” -
-
-
-
- “One of our customers used to spend up to a day setting up the virtual machines, network and software every time they wanted to run a project in the cloud. On our platform, with Docker and Kubernetes, customers can have their projects running in a couple of minutes.” -

- MAAIKE STOOPS, CUSTOMER EXPERIENCE QUEEN, NERDALIZE
-
-
-
-
- Nerdalize now has customers, from individual engineers to data-intensive startups and established companies, all around the world. (For the time being, though, the heating devices are exclusive to the Netherlands.) One of the most common use cases is batch workloads used by data scientists and researchers, and the time savings for these end users is profound. “One of our customers used to spend up to a day setting up the virtual machines, network and software every time they wanted to run a project in the cloud,” says Stoops. “On our platform, with Docker and Kubernetes, customers can have their projects running in a couple of minutes.” -

- As for households using the heating devices, they save an average of 200 euro a year on their heating bill. The environmental impact? The annual reduction in CO2 emissions comes out to 2 tons per Nerdalize household, which is equivalent to a car driving 8,000 km. -

- For the Nerdalize team, feature development—such as the accessible command line interface called Nerd, which recently went live—has also been sped up by Kubernetes. “We always try to get a working version online first, like minimal viable products, and then move to stabilize that,” says van der Veer. “And I think that these kinds of day-two problems are now immediately solved. The rapid prototyping we saw internally is a very valuable aspect of Kubernetes.” -

- Another unexpected benefit has been the growing influence and reputation of Kubernetes. “We can walk into a boardroom and put a Kubernetes logo up, and people accept it as an established technology,” says van der Veer. “It becomes this centerpiece where other cloud native projects can tie in, so there’s a network effect that each project empowers each other. This is something that has a lot of value when we have to convince customers that our cloud fits their needs.” -
+

Nerdalize prides itself on being a Kubernetes-native cloud provider that charges its customers prices 40% below that of other cloud providers. "Every euro that we have to invest for licensing of software that's not open source comes from that 40%," says van der Veer. If they had used a non-open source orchestration platform instead of Kubernetes, "that would reduce this proposition that we have of 40% less cost to like 30%. Kubernetes directly allows us to have this business model and this strategic advantage." Nerdalize customers also benefit from time savings: One went from spending a day to set up VMs, network, and software, to spinning up a Kubernetes cluster in minutes. And for households using the heating devices, they save an average of 200 euro a year on their heating bill. The environmental impact? The annual reduction in CO2 emissions comes out to be 2 tons per Nerdalize household, which is equivalent to a car driving 8,000 km.

-
-
-“It shouldn’t be too big of a hassle and too large of a commitment. It should be fun and easy for end users. So we really love Kubernetes in that way.”

- MAAIKE STOOPS, CUSTOMER EXPERIENCE QUEEN, NERDALIZE
-
+{{< case-studies/quote author="AD VAN DER VEER, PRODUCT ENGINEER, NERDALIZE" >}} +"We can walk into a boardroom and put a Kubernetes logo up, and people accept it as an established technology. It becomes this centerpiece where other cloud native projects can tie in, so there's a network effect that each project empowers each other. This is something that has a lot of value when we have to talk to customers and convince them that our cloud fits their needs." +{{< /case-studies/quote >}} -
+{{< case-studies/lead >}} +Nerdalize is a cloud hosting provider that has no data centers. Instead, the four-year-old startup places its servers in homes across the Netherlands, inside heating devices it developed to turn the heat produced by the servers into heating and hot water for the residents. +{{< /case-studies/lead >}} - In fact, Nerdalize is currently looking into implementing other CNCF projects, such as Prometheus for monitoring and Rook, “which should help us with some of the data problems that we want to solve for our customers,” says van der Veer. -

- In the coming year, Nerdalize will scale up the number of households running its hardware to 50, or the equivalent of a small scale data center. Geographic redundancy and greater server ability for customers are two main goals. Spreading the word about Kubernetes is also in the game plan. “We offer a free namespace on our sandbox, multi-tenant Kubernetes cluster for anyone to try,” says van der Veer. “What’s more cool than trying your first Kubernetes project on houses, to warm a shower?” -

- Ultimately, this ties into Nerdalize’s mission of supporting affordable and sustainable cloud hosting. “We want to be the disrupter of the cloud space, showing organizations that running in the cloud is easy and affordable,” says Stoops. “It shouldn’t be too big of a hassle and too large of a commitment. It should be fun and easy for end users. So we really love Kubernetes in that way.” -
+

"Households save on their gas bills, and cloud users have a much more sustainable cloud solution," says Maaike Stoops, Customer Experience Queen at Nerdalize. "And we don't have the overhead of building a data center, so our cloud is up to 40% more affordable."

-
+

That business model has been enabled by the company's adoption of containerization and Kubernetes. "When we just got started, Docker was just introduced," says Digital Product Engineer Ad van der Veer. "We began with a very basic bare metal setup, but once we developed the business, we saw that containerization technology was super useful to help our customers. As a cloud provider, we have internal services for hosting our backends and billing our customers, but we also need to offer our compute to our end users. Since we have these heating devices spread across the Netherlands, we need some way of tying that all together."

+ +

After trying to develop its own scheduling system using another open source tool, Nerdalize found Kubernetes. "Kubernetes provided us with more functionality out of the gate," says van der Veer.

+ +{{< case-studies/quote + image="/images/case-studies/nerdalize/banner3.jpg" + author="AD VAN DER VEER, PRODUCT ENGINEER, NERDALIZE" +>}} +"We always try to get a working version online first, like minimal viable products, and then move to stabilize that," says van der Veer. "And I think that these kinds of day-two problems are now immediately solved. The rapid prototyping we saw internally is a very valuable aspect of Kubernetes." +{{< /case-studies/quote >}} + +

The team first experimented with a basic use case to run customers' workloads on Kubernetes. "Getting the data working was kind of difficult, and at the time the installation wasn't that simple," says van der Veer. "Then CNCF started, we saw the community grow, these problems got solved, and from there it became a very easy decision."

+ +

The first Nerdalize product that was launched in 2017 was "100% containerized and Kubernetes native," says van der Veer. "On top of our heating devices throughout the Netherlands, we have a virtual machine layer, and on top of that we run Kubernetes clusters for our customers. As a small company, we have to provide a rock solid story in terms of the technology. Kubernetes allows us to offer a hybrid solution: 'You can run this on our cloud, but you can run it on other clouds as well. It runs in your internal hardware if you like.' And together with the Docker image standard and our multi-cloud dashboard, that gives them peace of mind."

+ +

Not to mention the 40% cost savings. "Every euro that we have to invest for licensing of software that's not open source comes from that 40%," says van der Veer. If Nerdalize had used a non-open source orchestration platform instead of Kubernetes, "that would reduce our cost savings proposition to like 30%. Kubernetes directly allows us to have this business model and this strategic advantage."

+ +{{< case-studies/quote + image="/images/case-studies/nerdalize/banner4.jpg" + author="MAAIKE STOOPS, CUSTOMER EXPERIENCE QUEEN, NERDALIZE" +>}} +"One of our customers used to spend up to a day setting up the virtual machines, network and software every time they wanted to run a project in the cloud. On our platform, with Docker and Kubernetes, customers can have their projects running in a couple of minutes." +{{< /case-studies/quote >}} + +

Nerdalize now has customers, from individual engineers to data-intensive startups and established companies, all around the world. (For the time being, though, the heating devices are exclusive to the Netherlands.) One of the most common use cases is batch workloads used by data scientists and researchers, and the time savings for these end users is profound. "One of our customers used to spend up to a day setting up the virtual machines, network and software every time they wanted to run a project in the cloud," says Stoops. "On our platform, with Docker and Kubernetes, customers can have their projects running in a couple of minutes."

+ +

As for households using the heating devices, they save an average of 200 euro a year on their heating bill. The environmental impact? The annual reduction in CO2 emissions comes out to 2 tons per Nerdalize household, which is equivalent to a car driving 8,000 km.

+ +

For the Nerdalize team, feature development—such as the accessible command line interface called Nerd, which recently went live—has also been sped up by Kubernetes. "We always try to get a working version online first, like minimal viable products, and then move to stabilize that," says van der Veer. "And I think that these kinds of day-two problems are now immediately solved. The rapid prototyping we saw internally is a very valuable aspect of Kubernetes."

+ +

Another unexpected benefit has been the growing influence and reputation of Kubernetes. "We can walk into a boardroom and put a Kubernetes logo up, and people accept it as an established technology," says van der Veer. "It becomes this centerpiece where other cloud native projects can tie in, so there's a network effect that each project empowers each other. This is something that has a lot of value when we have to convince customers that our cloud fits their needs."

+ +{{< case-studies/quote author="MAAIKE STOOPS, CUSTOMER EXPERIENCE QUEEN, NERDALIZE" >}} +"It shouldn't be too big of a hassle and too large of a commitment. It should be fun and easy for end users. So we really love Kubernetes in that way." +{{< /case-studies/quote >}} + +

In fact, Nerdalize is currently looking into implementing other CNCF projects, such as Prometheus for monitoring and Rook, "which should help us with some of the data problems that we want to solve for our customers," says van der Veer.

+ +

In the coming year, Nerdalize will scale up the number of households running its hardware to 50, or the equivalent of a small scale data center. Geographic redundancy and greater server ability for customers are two main goals. Spreading the word about Kubernetes is also in the game plan. "We offer a free namespace on our sandbox, multi-tenant Kubernetes cluster for anyone to try," says van der Veer. "What's more cool than trying your first Kubernetes project on houses, to warm a shower?"

+ +

Ultimately, this ties into Nerdalize's mission of supporting affordable and sustainable cloud hosting. "We want to be the disrupter of the cloud space, showing organizations that running in the cloud is easy and affordable," says Stoops. "It shouldn't be too big of a hassle and too large of a commitment. It should be fun and easy for end users. So we really love Kubernetes in that way."

diff --git a/content/en/case-studies/netease/index.html b/content/en/case-studies/netease/index.html index 6cba5579ab..3a3a8ab3a7 100644 --- a/content/en/case-studies/netease/index.html +++ b/content/en/case-studies/netease/index.html @@ -3,87 +3,74 @@ title: NetEase Case Study linkTitle: NetEase case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css logo: netease_featured_logo.png featured: false + +new_case_study_styles: true +heading_background: /images/case-studies/netease/banner1.jpg +heading_title_logo: /images/netease_logo.png +subheading: > + How NetEase Leverages Kubernetes to Support Internet Business Worldwide +case_study_details: + - Company: NetEase + - Location: Hangzhou, China + - Industry: Internet technology --- +

Challenge

-
-

CASE STUDY:
How NetEase Leverages Kubernetes to Support Internet Business Worldwide

+

Its gaming business is one of the largest in the world, but that's not all that NetEase provides to Chinese consumers. The company also operates e-commerce, advertising, music streaming, online education, and email platforms; the last of which serves almost a billion users with free email services through sites like 163.com. In 2015, the NetEase Cloud team providing the infrastructure for all of these systems realized that their R&D process was slowing down developers. "Our users needed to prepare all of the infrastructure by themselves," says Feng Changjian, Architect for NetEase Cloud and Container Service. "We were eager to provide the infrastructure and tools for our users automatically via serverless container service."

-
+

Solution

-
- Company  NetEase     Location  Hangzhou, China     Industry  Internet technology -
+

After considering building its own orchestration solution, NetEase decided to base its private cloud platform on Kubernetes. The fact that the technology came out of Google gave the team confidence that it could keep up with NetEase's scale. "After our 2-to-3-month evaluation, we believed it could satisfy our needs," says Feng. The team started working with Kubernetes in 2015, before it was even 1.0. Today, the NetEase internal cloud platform—which also leverages the CNCF projects Prometheus, Envoy, Harbor, gRPC, and Helm—runs 10,000 nodes in a production cluster and can support up to 30,000 nodes in a cluster. Based on its learnings from its internal platform, the company introduced a Kubernetes-based cloud and microservices-oriented PaaS product, NetEase Qingzhou Microservice, to outside customers.

-
-
-
-
-

Challenge

- Its gaming business is one of the largest in the world, but that’s not all that NetEase provides to Chinese consumers. The company also operates e-commerce, advertising, music streaming, online education, and email platforms; the last of which serves almost a billion users with free email services through sites like 163.com. In 2015, the NetEase Cloud team providing the infrastructure for all of these systems realized that their R&D process was slowing down developers. “Our users needed to prepare all of the infrastructure by themselves,” says Feng Changjian, Architect for NetEase Cloud and Container Service. “We were eager to provide the infrastructure and tools for our users automatically via serverless container service.” -

-

Solution

- After considering building its own orchestration solution, NetEase decided to base its private cloud platform on Kubernetes. The fact that the technology came out of Google gave the team confidence that it could keep up with NetEase’s scale. “After our 2-to-3-month evaluation, we believed it could satisfy our needs,” says Feng. The team started working with Kubernetes in 2015, before it was even 1.0. Today, the NetEase internal cloud platform—which also leverages the CNCF projects Prometheus, Envoy, Harbor, gRPC, and Helm—runs 10,000 nodes in a production cluster and can support up to 30,000 nodes in a cluster. Based on its learnings from its internal platform, the company introduced a Kubernetes-based cloud and microservices-oriented PaaS product, NetEase Qingzhou Microservice, to outside customers. - - -

Impact

- The NetEase team reports that Kubernetes has increased R&D efficiency by more than 100%. Deployment efficiency has improved by 280%. “In the past, if we wanted to do upgrades, we needed to work with other teams, even in other departments,” says Feng. “We needed special staff to prepare everything, so it took about half an hour. Now we can do it in only 5 minutes.” The new platform also allows for mixed deployments using GPU and CPU resources. “Before, if we put all the resources toward the GPU, we won’t have spare resources for the CPU. But now we have improvements thanks to the mixed deployments,” he says. Those improvements have also brought an increase in resource utilization. -
-
-
-
-
- "The system can support 30,000 nodes in a single cluster. In production, we have gotten the data of 10,000 nodes in a single cluster. The whole internal system is using this system for development, test, and production."

— Zeng Yuxing, Architect, NetEase
-
-
-
-
-

Its gaming business is the fifth-largest in the world, but that’s not all that NetEase provides consumers.

The company also operates e-commerce, advertising, music streaming, online education, and email platforms in China; the last of which serves almost a billion users with free email services through popular sites like 163.com and 126.com. With that kind of scale, the NetEase Cloud team providing the infrastructure for all of these systems realized in 2015 that their R&D process was making it hard for developers to keep up with demand. “Our users needed to prepare all of the infrastructure by themselves,” says Feng Changjian, Architect for NetEase Cloud and Container Service. “We were eager to provide the infrastructure and tools for our users automatically via serverless container service.”

- After considering building its own orchestration solution, NetEase decided to base its private cloud platform on Kubernetes. The fact that the technology came out of Google gave the team confidence that it could keep up with NetEase’s scale. “After our 2-to-3-month evaluation, we believed it could satisfy our needs,” says Feng. -
-
-
-
- "We leveraged the programmability of Kubernetes so that we can build a platform to satisfy the needs of our internal customers for upgrades and deployment." -

- Feng Changjian, Architect for NetEase Cloud and Container Service, NetEase
-
-
-
-
- The team started adopting Kubernetes in 2015, before it was even 1.0, because it was relatively easy to use and enabled DevOps at the company. “We abandoned some of the concepts of Kubernetes; we only wanted to use the standardized framework,” says Feng. “We leveraged the programmability of Kubernetes so that we can build a platform to satisfy the needs of our internal customers for upgrades and deployment.”

- The team first focused on building the container platform to manage resources better, and then turned their attention to improving its support of microservices by adding internal systems such as monitoring. That has meant integrating the CNCF projects Prometheus, Envoy, Harbor, gRPC, and Helm. “We are trying to provide a simplified and standardized process, so our users and customers can leverage our best practices,” says Feng.

- And the team is continuing to make improvements. For example, the e-commerce part of the business needs to leverage mixed deployments, which in the past required using two separate platforms: the infrastructure-as-a-service platform and the Kubernetes platform. More recently, NetEase has created a cross-platform application that enables using both with one-command deployment. -
-
-
-
- "As long as a company has a mature team and enough developers, I think Kubernetes is a very good technology that can help them." -

- Li Lanqing, Kubernetes Developer, NetEase
-
-
- +

The NetEase team reports that Kubernetes has increased R&D efficiency by more than 100%. Deployment efficiency has improved by 280%. "In the past, if we wanted to do upgrades, we needed to work with other teams, even in other departments," says Feng. "We needed special staff to prepare everything, so it took about half an hour. Now we can do it in only 5 minutes." The new platform also allows for mixed deployments using GPU and CPU resources. "Before, if we put all the resources toward the GPU, we won't have spare resources for the CPU. But now we have improvements thanks to the mixed deployments," he says. Those improvements have also brought an increase in resource utilization.

-
-
- Today, the NetEase internal cloud platform “can support 30,000 nodes in a single cluster,” says Architect Zeng Yuxing. “In production, we have gotten the data of 10,000 nodes in a single cluster. The whole internal system is using this system for development, test, and production.”

- The NetEase team reports that Kubernetes has increased R&D efficiency by more than 100%. Deployment efficiency has improved by 280%. “In the past, if we wanted to do upgrades, we needed to work with other teams, even in other departments,” says Feng. “We needed special staff to prepare everything, so it took about half an hour. Now we can do it in only 5 minutes.” The new platform also allows for mixed deployments using GPU and CPU resources. “Before, if we put all the resources toward the GPU, we won’t have spare resources for the CPU. But now we have improvements thanks to the mixed deployments.” Those improvements have also brought an increase in resource utilization. +{{< case-studies/quote author="Zeng Yuxing, Architect, NetEase" >}} +"The system can support 30,000 nodes in a single cluster. In production, we have gotten the data of 10,000 nodes in a single cluster. The whole internal system is using this system for development, test, and production." +{{< /case-studies/quote >}} -
+{{< case-studies/lead >}} +Its gaming business is the fifth-largest in the world, but that's not all that NetEase provides consumers. +{{< /case-studies/lead >}} -
-
- "By engaging with this community, we can gain some experience from it and we can also benefit from it. We can see what are the concerns and the challenges faced by the community, so we can get involved."

- Li Lanqing, Kubernetes Developer, NetEase
+

The company also operates e-commerce, advertising, music streaming, online education, and email platforms in China; the last of which serves almost a billion users with free email services through popular sites like 163.com and 126.com. With that kind of scale, the NetEase Cloud team providing the infrastructure for all of these systems realized in 2015 that their R&D process was making it hard for developers to keep up with demand. "Our users needed to prepare all of the infrastructure by themselves," says Feng Changjian, Architect for NetEase Cloud and Container Service. "We were eager to provide the infrastructure and tools for our users automatically via serverless container service."

-
-
-
- Based on the results and learnings from using its internal platform, the company introduced a Kubernetes-based cloud and microservices-oriented PaaS product, NetEase Qingzhou Microservice, to outside customers. “The idea is that we can find the problems encountered by our game and e-commerce and cloud music providers, so we can integrate their experiences and provide a platform to satisfy the needs of our users,” says Zeng.

- With or without the use of the NetEase product, the team encourages other companies to try Kubernetes. “As long as a company has a mature team and enough developers, I think Kubernetes is a very good technology that can help them,” says Kubernetes developer Li Lanqing.

- As an end user as well as a vendor, NetEase has become more involved in the community, learning from other companies and sharing what they’ve done. The team has been contributing to the Harbor and Envoy projects, providing feedback as the technologies are being tested at NetEase scale. “We are a team focusing on addressing the challenges of microservices architecture,” says Feng. “By engaging with this community, we can gain some experience from it and we can also benefit from it. We can see what are the concerns and the challenges faced by the community, so we can get involved.” -
-
+

After considering building its own orchestration solution, NetEase decided to base its private cloud platform on Kubernetes. The fact that the technology came out of Google gave the team confidence that it could keep up with NetEase's scale. "After our 2-to-3-month evaluation, we believed it could satisfy our needs," says Feng.

+ +{{< case-studies/quote + image="/images/case-studies/netease/banner3.jpg" + author="Feng Changjian, Architect for NetEase Cloud and Container Service, NetEase" +>}} +"We leveraged the programmability of Kubernetes so that we can build a platform to satisfy the needs of our internal customers for upgrades and deployment." +{{< /case-studies/quote >}} + +

The team started adopting Kubernetes in 2015, before it was even 1.0, because it was relatively easy to use and enabled DevOps at the company. "We abandoned some of the concepts of Kubernetes; we only wanted to use the standardized framework," says Feng. "We leveraged the programmability of Kubernetes so that we can build a platform to satisfy the needs of our internal customers for upgrades and deployment."

+ +

The team first focused on building the container platform to manage resources better, and then turned their attention to improving its support of microservices by adding internal systems such as monitoring. That has meant integrating the CNCF projects Prometheus, Envoy, Harbor, gRPC, and Helm. "We are trying to provide a simplified and standardized process, so our users and customers can leverage our best practices," says Feng.

+ +

And the team is continuing to make improvements. For example, the e-commerce part of the business needs to leverage mixed deployments, which in the past required using two separate platforms: the infrastructure-as-a-service platform and the Kubernetes platform. More recently, NetEase has created a cross-platform application that enables using both with one-command deployment.

+ +{{< case-studies/quote + image="/images/case-studies/netease/banner4.jpg" + author="Li Lanqing, Kubernetes Developer, NetEase" +>}} +"As long as a company has a mature team and enough developers, I think Kubernetes is a very good technology that can help them." +{{< /case-studies/quote >}} + +

Today, the NetEase internal cloud platform "can support 30,000 nodes in a single cluster," says Architect Zeng Yuxing. "In production, we have gotten the data of 10,000 nodes in a single cluster. The whole internal system is using this system for development, test, and production."

+ +

The NetEase team reports that Kubernetes has increased R&D efficiency by more than 100%. Deployment efficiency has improved by 280%. "In the past, if we wanted to do upgrades, we needed to work with other teams, even in other departments," says Feng. "We needed special staff to prepare everything, so it took about half an hour. Now we can do it in only 5 minutes." The new platform also allows for mixed deployments using GPU and CPU resources. "Before, if we put all the resources toward the GPU, we won't have spare resources for the CPU. But now we have improvements thanks to the mixed deployments." Those improvements have also brought an increase in resource utilization.

+ +{{< case-studies/quote author="Li Lanqing, Kubernetes Developer, NetEase">}} +"By engaging with this community, we can gain some experience from it and we can also benefit from it. We can see what are the concerns and the challenges faced by the community, so we can get involved." +{{< /case-studies/quote >}} + +

Based on the results and learnings from using its internal platform, the company introduced a Kubernetes-based cloud and microservices-oriented PaaS product, NetEase Qingzhou Microservice, to outside customers. "The idea is that we can find the problems encountered by our game and e-commerce and cloud music providers, so we can integrate their experiences and provide a platform to satisfy the needs of our users," says Zeng.

+ +

With or without the use of the NetEase product, the team encourages other companies to try Kubernetes. "As long as a company has a mature team and enough developers, I think Kubernetes is a very good technology that can help them," says Kubernetes developer Li Lanqing.

+ +

As an end user as well as a vendor, NetEase has become more involved in the community, learning from other companies and sharing what they've done. The team has been contributing to the Harbor and Envoy projects, providing feedback as the technologies are being tested at NetEase scale. "We are a team focusing on addressing the challenges of microservices architecture," says Feng. "By engaging with this community, we can gain some experience from it and we can also benefit from it. We can see what are the concerns and the challenges faced by the community, so we can get involved."

diff --git a/content/en/case-studies/newyorktimes/index.html b/content/en/case-studies/newyorktimes/index.html index 53dbd06a55..ae57a9ec7d 100644 --- a/content/en/case-studies/newyorktimes/index.html +++ b/content/en/case-studies/newyorktimes/index.html @@ -2,107 +2,73 @@ title: New York Times Case Study case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css + +new_case_study_styles: true +heading_background: /images/case-studies/newyorktimes/banner1.jpg +heading_title_logo: /images/newyorktimes_logo.png +subheading: > + The New York Times: From Print to the Web to Cloud Native +case_study_details: + - Company: New York Times + - Location: New York, N.Y. + - Industry: News Media --- -
-

CASE STUDY:
The New York Times: From Print to the Web to Cloud Native +

Challenge

-

+

When the company decided a few years ago to move out of its data centers, its first deployments on the public cloud were smaller, less critical applications managed on virtual machines. "We started building more and more tools, and at some point we realized that we were doing a disservice by treating Amazon as another data center," says Deep Kapadia, Executive Director, Engineering at The New York Times. Kapadia was tapped to lead a Delivery Engineering Team that would "design for the abstractions that cloud providers offer us."

-
+

Solution

-
- Company  New York Times     Location  New York, N.Y. -     Industry  News Media -
+

The team decided to use Google Cloud Platform and its Kubernetes-as-a-service offering, GKE.

-
-
-
-
-

Challenge

- When the company decided a few years ago to move out of its data centers, its first deployments on the public cloud were smaller, less critical applications managed on virtual machines. "We started building more and more tools, and at some point we realized that we were doing a disservice by treating Amazon as another data center," says Deep Kapadia, Executive Director, Engineering at The New York Times. Kapadia was tapped to lead a Delivery Engineering Team that would "design for the abstractions that cloud providers offer us." +

Impact

-

-

Solution

- The team decided to use Google Cloud Platform and its Kubernetes-as-a-service offering, GKE. -
-
+

Speed of delivery increased. Some of the legacy VM-based deployments took 45 minutes; with Kubernetes, that time was "just a few seconds to a couple of minutes," says Engineering Manager Brian Balser. Adds Li: "Teams that used to deploy on weekly schedules or had to coordinate schedules with the infrastructure team now deploy their updates independently, and can do it daily when necessary." Adopting Cloud Native Computing Foundation technologies allows for a more unified approach to deployment across the engineering staff, and portability for the company.

-
- -

Impact

- Speed of delivery increased. Some of the legacy VM-based deployments took 45 minutes; with Kubernetes, that time was "just a few seconds to a couple of minutes," says Engineering Manager Brian Balser. Adds Li: "Teams that used to deploy on weekly schedules or had to coordinate schedules with the infrastructure team now deploy their updates independently, and can do it daily when necessary." Adopting Cloud Native Computing Foundation technologies allows for a more unified approach to deployment across the engineering staff, and portability for the company. -
-
- -
-
-
-
-
- - -
-
-
-

-
-
+{{< case-studies/quote author="Deep Kapadia, Executive Director, Engineering at The New York Times">}} + +
- "I think once you get over the initial hump, things get a lot easier and actually a lot faster." — Deep Kapadia, Executive Director, Engineering at The New York Times
-
- -
-
- Founded in 1851 and known as the newspaper of record, The New York Times is a digital pioneer: Its first website launched in 1996, before Google even existed. After the company decided a few years ago to move out of its private data centers—including one located in the pricy real estate of Manhattan. It recently took another step into the future by going cloud native.

- At first, the infrastructure team "managed the virtual machines in the Amazon cloud, and they deployed more critical applications in our data centers and the less critical ones on AWS as an experiment," says Deep Kapadia, Executive Director, Engineering at The New York Times. "We started building more and more tools, and at some point we realized that we were doing a disservice by treating Amazon as another data center."

- To get the most out of the cloud, Kapadia was tapped to lead a new Delivery Engineering Team that would "design for the abstractions that cloud providers offer us." In mid-2016, they began looking at the Google Cloud Platform and its Kubernetes-as-a-service offering, GKE.

- At the time, says team member Tony Li, a Site Reliability Engineer, "We had some internal tooling that attempted to do what Kubernetes does for containers, but for VMs. We asked why are we building and maintaining these tools ourselves?"

- In early 2017, the first production application—the nytimes.com mobile homepage—began running on Kubernetes, serving just 1% of the traffic. Today, almost 100% of the nytimes.com site’s end-user facing applications run on GCP, with the majority on Kubernetes. +"I think once you get over the initial hump, things get a lot easier and actually a lot faster." +{{< /case-studies/quote >}} -
-
-
-
- "We had some internal tooling that attempted to do what Kubernetes does for containers, but for VMs. We asked why are we building and maintaining these tools ourselves?" -
-
-
-
+

Founded in 1851 and known as the newspaper of record, The New York Times is a digital pioneer: Its first website launched in 1996, before Google even existed. After the company decided a few years ago to move out of its private data centers—including one located in the pricy real estate of Manhattan. It recently took another step into the future by going cloud native.

- The team found that the speed of delivery was immediately impacted. "Deploying Docker images versus spinning up VMs was quite a lot faster," says Engineering Manager Brian Balser. Some of the legacy VM-based deployments took 45 minutes; with Kubernetes, that time was "just a few seconds to a couple of minutes."

- The plan is to get as much as possible, not just the website, running on Kubernetes, and beyond that, moving toward serverless deployments. For instance, The New York Times crossword app was built on Google App Engine, which has been the main platform for the company’s experimentation with serverless. "The hardest part was getting the engineers over the hurdle of how little they had to do," Chief Technology Officer Nick Rockwell recently told The CTO Advisor. "Our experience has been very, very good. We have invested a lot of work into deploying apps on container services, and I’m really excited about experimenting with deploying those on App Engine Flex and AWS Fargate and seeing how that feels, because that’s a great migration path."

- There are some exceptions to the move to cloud native, of course. "We have the print publishing business as well," says Kapadia. "A lot of that is definitely not going down the cloud-native path because they’re using vendor software and even special machinery that prints the physical paper. But even those teams are looking at things like App Engine and Kubernetes if they can."

- Kapadia acknowledges that there was a steep learning curve for some engineers, but "I think once you get over the initial hump, things get a lot easier and actually a lot faster." +

At first, the infrastructure team "managed the virtual machines in the Amazon cloud, and they deployed more critical applications in our data centers and the less critical ones on AWS as an experiment," says Deep Kapadia, Executive Director, Engineering at The New York Times. "We started building more and more tools, and at some point we realized that we were doing a disservice by treating Amazon as another data center."

-
-
-
-
- "Right now, every team is running a small Kubernetes cluster, but it would be nice if we could all live in a larger ecosystem," says Kapadia. "Then we can harness the power of things like service mesh proxies that can actually do a lot of instrumentation between microservices, or service-to-service orchestration. Those are the new things that we want to experiment with as we go forward." +

To get the most out of the cloud, Kapadia was tapped to lead a new Delivery Engineering Team that would "design for the abstractions that cloud providers offer us." In mid-2016, they began looking at the Google Cloud Platform and its Kubernetes-as-a-service offering, GKE.

-
-
-
-
- At The New York Times, they did. As teams started sharing their own best practices with each other, "We’re no longer the bottleneck for figuring out certain things," Kapadia says. "Most of the infrastructure and systems were managed by a centralized function. We’ve sort of blown that up, partly because Google and Amazon have tools that allow us to do that. We provide teams with complete ownership of their Google Cloud Platform projects, and give them a set of sensible defaults or standards. We let them know, ‘If this works for you as is, great! If not, come talk to us and we’ll figure out how to make it work for you.’"

- As a result, "It’s really allowed teams to move at a much more rapid pace than they were able to in the past," says Kapadia. Adds Li: "The use of GKE means each team can get their own compute cluster, reducing the number of individual instances they have to care about since developers can treat the cluster as a whole. Because the ticket-based workflow was removed from requesting resources and connections, developers can just call an API to get what they want. Teams that used to deploy on weekly schedules or had to coordinate schedules with the infrastructure team now deploy their updates independently, and can do it daily when necessary."

- Another benefit to adopting Kubernetes: allowing for a more unified approach to deployment across the engineering staff. "Before, many teams were building their own tools for deployment," says Balser. With Kubernetes—as well as the other CNCF projects The New York Times uses, including Fluentd to collect logs for all of its AWS servers, gRPC for its Publishing Pipeline, Prometheus, and Envoy—"we can benefit from the advances that each of these technologies make, instead of trying to catch up." +

At the time, says team member Tony Li, a Site Reliability Engineer, "We had some internal tooling that attempted to do what Kubernetes does for containers, but for VMs. We asked why are we building and maintaining these tools ourselves?"

-
+

In early 2017, the first production application—the nytimes.com mobile homepage—began running on Kubernetes, serving just 1% of the traffic. Today, almost 100% of the nytimes.com site's end-user facing applications run on GCP, with the majority on Kubernetes.

-
-
+{{< case-studies/quote image="/images/case-studies/newyorktimes/banner3.jpg" >}} +"We had some internal tooling that attempted to do what Kubernetes does for containers, but for VMs. We asked why are we building and maintaining these tools ourselves?" +{{< /case-studies/quote >}} -Li calls the Cloud Native Computing Foundation’s projects "a northern star that we can all look at and follow." -
-
+

The team found that the speed of delivery was immediately impacted. "Deploying Docker images versus spinning up VMs was quite a lot faster," says Engineering Manager Brian Balser. Some of the legacy VM-based deployments took 45 minutes; with Kubernetes, that time was "just a few seconds to a couple of minutes."

-
- These open-source technologies have given the company more portability. "CNCF has enabled us to follow an industry standard," says Kapadia. "It allows us to think about whether we want to move away from our current service providers. Most of our applications are connected to Fluentd. If we wish to switch our logging provider from provider A to provider B we can do that. We’re running Kubernetes in GCP today, but if we want to run it in Amazon or Azure, we could potentially look into that as well."

- Li calls the Cloud Native Computing Foundation’s projects "a northern star that we can all look at and follow." Led by that star, the team is looking ahead to a year of onboarding the remaining half of the 40 or so product engineering teams to extract even more value out of the technology. "Right now, every team is running a small Kubernetes cluster, but it would be nice if we could all live in a larger ecosystem," says Kapadia. "Then we can harness the power of things like service mesh proxies that can actually do a lot of instrumentation between microservices, or service-to-service orchestration. Those are the new things that we want to experiment with as we go forward." +

The plan is to get as much as possible, not just the website, running on Kubernetes, and beyond that, moving toward serverless deployments. For instance, The New York Times crossword app was built on Google App Engine, which has been the main platform for the company's experimentation with serverless. "The hardest part was getting the engineers over the hurdle of how little they had to do," Chief Technology Officer Nick Rockwell recently told The CTO Advisor. "Our experience has been very, very good. We have invested a lot of work into deploying apps on container services, and I'm really excited about experimenting with deploying those on App Engine Flex and AWS Fargate and seeing how that feels, because that's a great migration path."

-
-
+

There are some exceptions to the move to cloud native, of course. "We have the print publishing business as well," says Kapadia. "A lot of that is definitely not going down the cloud-native path because they're using vendor software and even special machinery that prints the physical paper. But even those teams are looking at things like App Engine and Kubernetes if they can."

+ +

Kapadia acknowledges that there was a steep learning curve for some engineers, but "I think once you get over the initial hump, things get a lot easier and actually a lot faster."

+ +{{< case-studies/quote image="/images/case-studies/newyorktimes/banner4.jpg" >}} +"Right now, every team is running a small Kubernetes cluster, but it would be nice if we could all live in a larger ecosystem," says Kapadia. "Then we can harness the power of things like service mesh proxies that can actually do a lot of instrumentation between microservices, or service-to-service orchestration. Those are the new things that we want to experiment with as we go forward." +{{< /case-studies/quote >}} + +

At The New York Times, they did. As teams started sharing their own best practices with each other, "We're no longer the bottleneck for figuring out certain things," Kapadia says. "Most of the infrastructure and systems were managed by a centralized function. We've sort of blown that up, partly because Google and Amazon have tools that allow us to do that. We provide teams with complete ownership of their Google Cloud Platform projects, and give them a set of sensible defaults or standards. We let them know, 'If this works for you as is, great! If not, come talk to us and we'll figure out how to make it work for you.'"

+ +

As a result, "It's really allowed teams to move at a much more rapid pace than they were able to in the past," says Kapadia. Adds Li: "The use of GKE means each team can get their own compute cluster, reducing the number of individual instances they have to care about since developers can treat the cluster as a whole. Because the ticket-based workflow was removed from requesting resources and connections, developers can just call an API to get what they want. Teams that used to deploy on weekly schedules or had to coordinate schedules with the infrastructure team now deploy their updates independently, and can do it daily when necessary."

+ +

Another benefit to adopting Kubernetes: allowing for a more unified approach to deployment across the engineering staff. "Before, many teams were building their own tools for deployment," says Balser. With Kubernetes—as well as the other CNCF projects The New York Times uses, including Fluentd to collect logs for all of its AWS servers, gRPC for its Publishing Pipeline, Prometheus, and Envoy—"we can benefit from the advances that each of these technologies make, instead of trying to catch up."

+ +{{< case-studies/quote >}} +Li calls the Cloud Native Computing Foundation's projects "a northern star that we can all look at and follow." +{{< /case-studies/quote >}} + +

These open-source technologies have given the company more portability. "CNCF has enabled us to follow an industry standard," says Kapadia. "It allows us to think about whether we want to move away from our current service providers. Most of our applications are connected to Fluentd. If we wish to switch our logging provider from provider A to provider B we can do that. We're running Kubernetes in GCP today, but if we want to run it in Amazon or Azure, we could potentially look into that as well."

+ +

Li calls the Cloud Native Computing Foundation's projects "a northern star that we can all look at and follow." Led by that star, the team is looking ahead to a year of onboarding the remaining half of the 40 or so product engineering teams to extract even more value out of the technology. "Right now, every team is running a small Kubernetes cluster, but it would be nice if we could all live in a larger ecosystem," says Kapadia. "Then we can harness the power of things like service mesh proxies that can actually do a lot of instrumentation between microservices, or service-to-service orchestration. Those are the new things that we want to experiment with as we go forward."

diff --git a/content/en/case-studies/nokia/index.html b/content/en/case-studies/nokia/index.html index f824685327..0de1426853 100644 --- a/content/en/case-studies/nokia/index.html +++ b/content/en/case-studies/nokia/index.html @@ -3,91 +3,75 @@ title: Nokia Case Study linkTitle: Nokia case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css logo: nokia_featured_logo.png + +new_case_study_styles: true +heading_background: /images/case-studies/nokia/banner1.jpg +heading_title_logo: /images/nokia_logo.png +subheading: > + Nokia: Enabling 5G and DevOps at a Telecom Company with Kubernetes +case_study_details: + - Company: Nokia + - Location: Espoo, Finland + - Industry: Telecommunications --- +

Challenge

-
-

CASE STUDY:
Nokia: Enabling 5G and DevOps at a Telecom Company with Kubernetes +

Nokia's core business is building telecom networks end-to-end; its main products are related to the infrastructure, such as antennas, switching equipment, and routing equipment. "As telecom vendors, we have to deliver our software to several telecom operators and put the software into their infrastructure, and each of the operators have a bit different infrastructure," says Gergely Csatari, Senior Open Source Engineer. "There are operators who are running on bare metal. There are operators who are running on virtual machines. There are operators who are running on VMware Cloud and OpenStack Cloud. We want to run the same product on all of these different infrastructures without changing the product itself."

-

+

Solution

-
+

The company decided that moving to cloud native technologies would allow teams to have infrastructure-agnostic behavior in their products. Teams at Nokia began experimenting with Kubernetes in pre-1.0 versions. "The simplicity of the label-based scheduling of Kubernetes was a sign that showed us this architecture will scale, will be stable, and will be good for our purposes," says Csatari. The first Kubernetes-based product, the Nokia Telephony Application Server, went live in early 2018. "Now, all the products are doing some kind of re-architecture work, and they're moving to Kubernetes."

-
- Company  Nokia     Location  Espoo, Finland -     Industry  Telecommunications -
- -
-
-
-
-

Challenge

- Nokia’s core business is building telecom networks end-to-end; its main products are related to the infrastructure, such as antennas, switching equipment, and routing equipment. "As telecom vendors, we have to deliver our software to several telecom operators and put the software into their infrastructure, and each of the operators have a bit different infrastructure," says Gergely Csatari, Senior Open Source Engineer. "There are operators who are running on bare metal. There are operators who are running on virtual machines. There are operators who are running on VMware Cloud and OpenStack Cloud. We want to run the same product on all of these different infrastructures without changing the product itself." - -

Solution

- The company decided that moving to cloud native technologies would allow teams to have infrastructure-agnostic behavior in their products. Teams at Nokia began experimenting with Kubernetes in pre-1.0 versions. "The simplicity of the label-based scheduling of Kubernetes was a sign that showed us this architecture will scale, will be stable, and will be good for our purposes," says Csatari. The first Kubernetes-based product, the Nokia Telephony Application Server, went live in early 2018. "Now, all the products are doing some kind of re-architecture work, and they’re moving to Kubernetes." -

Impact

- Kubernetes has enabled Nokia’s foray into 5G. "When you develop something that is part of the operator’s infrastructure, you have to develop it for the future, and Kubernetes and containers are the forward-looking technologies," says Csatari. The teams using Kubernetes are already seeing clear benefits. "By separating the infrastructure and the application layer, we have less dependencies in the system, which means that it’s easier to implement features in the application layer," says Csatari. And because teams can test the exact same binary artifact independently of the target execution environment, "we find more errors in early phases of the testing, and we do not need to run the same tests on different target environments, like VMware, OpenStack, or bare metal," he adds. As a result, "we save several hundred hours in every release." -
+

Kubernetes has enabled Nokia's foray into 5G. "When you develop something that is part of the operator's infrastructure, you have to develop it for the future, and Kubernetes and containers are the forward-looking technologies," says Csatari. The teams using Kubernetes are already seeing clear benefits. "By separating the infrastructure and the application layer, we have less dependencies in the system, which means that it's easier to implement features in the application layer," says Csatari. And because teams can test the exact same binary artifact independently of the target execution environment, "we find more errors in early phases of the testing, and we do not need to run the same tests on different target environments, like VMware, OpenStack, or bare metal," he adds. As a result, "we save several hundred hours in every release."

-
-
-
-
- "When people are picking up their phones and making a call on Nokia networks, they are creating containers in the background with Kubernetes." -

- Gergely Csatari, Senior Open Source Engineer, Nokia
-
-
-
-
-

Nokia was the first name in mobile phones when they were becoming ubiquitous in the late 1990s and early 2000s. But by 2014, the company had sold off its mobile device division and was focusing its core business not on the handhelds used for calls, but on the networks.

- Today, Nokia is building telecom networks end-to-end—from antennas to switching and routing equipment—serving operators in more than 120 countries. "As telecom vendors, we have to deliver our software to several telecom operators and put the software into their infrastructure, and each of the operators have a bit different infrastructure," says Gergely Csatari, Senior Open Source Engineer at Nokia. "There are operators who are running on bare metal. There are operators who are running on virtual machines. There are operators who are running on VMware Cloud and OpenStack Cloud. We want to run the same product on all of these different infrastructures without changing the product itself."

- Looking for a way to allow its teams to build products with infrastructure-agnostic behavior, the company decided to embrace containerization, Kubernetes, and other cloud native technologies, a move that is being made across the telecom industry. Since early 2018, "when people are picking up their phones and making a call on Nokia networks, they are creating containers in the background with Kubernetes," says Csatari. "Now, all the products are doing some kind of re-architecture work, and they’re moving to Kubernetes." +{{< case-studies/quote author="Gergely Csatari, Senior Open Source Engineer, Nokia" >}} +"When people are picking up their phones and making a call on Nokia networks, they are creating containers in the background with Kubernetes." +{{< /case-studies/quote >}} -
-
-
-
- "Having the community and CNCF around Kubernetes is not only important for having a connection to other companies who are using Kubernetes and a forum where you can ask or discuss features of Kubernetes. But as a company who would like to contribute to Kubernetes, it was very important to have a CLA (Contributors License Agreement) which is connected to the CNCF and not to a particular company. That was a critical step for us to start contributing to Kubernetes and Helm."

- Gergely Csatari, Senior Open Source Engineer, Nokia
+{{< case-studies/lead >}} +Nokia was the first name in mobile phones when they were becoming ubiquitous in the late 1990s and early 2000s. But by 2014, the company had sold off its mobile device division and was focusing its core business not on the handhelds used for calls, but on the networks. +{{< /case-studies/lead >}} -
-
-
-
- Nokia’s cloud native journey began about two years ago, when Csatari’s team was building the company’s Telephony Application Server (TAS). "We wanted to have a service execution engine in the product, which was a totally separate function from all other parts," he says. "There, we had the possibility to think about new architectures and new tools that we could use. We created this particular product based on Kubernetes, and we liked the work, so we started to talk about cloud native and containers and all of these things. We did a very extensive research of different container orchestration tools. We knew that we have some, let’s say, strange or different requirements because of the special environment that our software is running on."

- For one thing, Nokia’s software serves millions of people, and is required to have the carrier-grade "five nines" availability: to be up 99.999% of the time. "If you turn it to minutes, this means we’re allowed to have only 10 minutes of downtime in a whole year," says Csatari. "Downtime here means that you are not able to serve the person to full capacity, which means that we cannot fail. This includes software upgrades, everything, because when you call 911, you’re using our software, and you expect that it will work."

- That meant that they needed to be able to set affinity and anti-affinity rules in their orchestration tools. "You cannot put all of the functions to the same physical host because physical hosts are failing," Csatari explains. "If you fail with one physical host, then you lose all of the core processing processes. Then there are no calls going through. So we have to divide them among the different physical hosts. At that time, only Kubernetes was able to provide these features. The simplicity of the label-based scheduling of Kubernetes was a sign that showed us this architecture will scale, will be stable, and will be good for our purposes." +

Today, Nokia is building telecom networks end-to-end—from antennas to switching and routing equipment—serving operators in more than 120 countries. "As telecom vendors, we have to deliver our software to several telecom operators and put the software into their infrastructure, and each of the operators have a bit different infrastructure," says Gergely Csatari, Senior Open Source Engineer at Nokia. "There are operators who are running on bare metal. There are operators who are running on virtual machines. There are operators who are running on VMware Cloud and OpenStack Cloud. We want to run the same product on all of these different infrastructures without changing the product itself."

-
-
-
-
- "Kubernetes opened the window to all of these open source projects instead of implementing everything in house. Our engineers can focus more on the application level, which is actually the thing what we are selling, and not on the infrastructure level. For us, the most important thing about Kubernetes is it allows us to focus on value creation of our business."

- Gergely Csatari, Senior Open Source Engineer, Nokia
-
-
+

Looking for a way to allow its teams to build products with infrastructure-agnostic behavior, the company decided to embrace containerization, Kubernetes, and other cloud native technologies, a move that is being made across the telecom industry. Since early 2018, "when people are picking up their phones and making a call on Nokia networks, they are creating containers in the background with Kubernetes," says Csatari. "Now, all the products are doing some kind of re-architecture work, and they're moving to Kubernetes."

-
-
- The TAS went live in early 2018, and now Kubernetes is also enabling Nokia’s foray into 5G. The company is introducing microservices architecture and Kubernetes while adding 5G features to existing products. And all new 5G product development will be on top of Kubernetes. "When you develop something that is part of the operator’s infrastructure, you have to develop it for the future, and Kubernetes and containers are the forward-looking technologies," says Csatari.

- There have been real time savings thanks to Kubernetes. "By separating the infrastructure and the application layer, we have less dependencies in the system, which means that it’s easier to implement features in the application layer," says Csatari. Because teams can test the exact same binary artifact independently of the target execution environment, "we find more errors in early phases of the testing, and we do not need to run the same tests on different target environments, like VMware, OpenStack or bare metal," he adds. As a result, "we save several hundred hours in every release."

- Moving from Nokia’s legacy cluster management system, which had been built in-house more than thirty years ago, to a Kubernetes platform also meant that "we started using Linux as a base operating system, so we just opened the window to all of these open source projects instead of implementing everything in house," says Csatari. (From CNCF’s ecosystem, the team is already using Helm, gRPC, CNI, Prometheus, and Envoy, and plans to implement CoreDNS.) "Our engineers can focus more on the application level, which is actually the thing what we are selling, and not on the infrastructure level. For us, the most important thing about Kubernetes is it allows us to focus on value creation of our business." +{{< case-studies/quote + image="/images/case-studies/nokia/banner3.jpg" + author="Gergely Csatari, Senior Open Source Engineer, Nokia" +>}} +"Having the community and CNCF around Kubernetes is not only important for having a connection to other companies who are using Kubernetes and a forum where you can ask or discuss features of Kubernetes. But as a company who would like to contribute to Kubernetes, it was very important to have a CLA (Contributors License Agreement) which is connected to the CNCF and not to a particular company. That was a critical step for us to start contributing to Kubernetes and Helm." +{{< /case-studies/quote >}} -
+

Nokia's cloud native journey began about two years ago, when Csatari's team was building the company's Telephony Application Server (TAS). "We wanted to have a service execution engine in the product, which was a totally separate function from all other parts," he says. "There, we had the possibility to think about new architectures and new tools that we could use. We created this particular product based on Kubernetes, and we liked the work, so we started to talk about cloud native and containers and all of these things. We did a very extensive research of different container orchestration tools. We knew that we have some, let's say, strange or different requirements because of the special environment that our software is running on."

-
-
-"I had some discussions at KubeCon with people from the networking SIG and the resource management working group, to work together on our requirements, and that’s very exciting for me and my colleagues,"

- Gergely Csatari, Senior Open Source Engineer, Nokia
-
+

For one thing, Nokia's software serves millions of people, and is required to have the carrier-grade "five nines" availability: to be up 99.999% of the time. "If you turn it to minutes, this means we're allowed to have only 10 minutes of downtime in a whole year," says Csatari. "Downtime here means that you are not able to serve the person to full capacity, which means that we cannot fail. This includes software upgrades, everything, because when you call 911, you're using our software, and you expect that it will work."

-
- The company has a long-term goal of moving the entire product portfolio into the Kubernetes platform. To that end, Nokia teams are working together with other companies to add the features needed to use Kubernetes with the real-time, nanosecond-sensitive applications close to the edge of the radio network.

- And the CNCF community is proving to be a great forum for that collaboration. "I had some discussions at KubeCon with people from the networking SIG and the resource management working group, to work together on our requirements, and that’s very exciting for me and my colleagues," says Csatari. "Previously, everybody had the same problem, but everybody just did it in his own, and now we are trying to solve the same problem together."

- Perhaps the biggest impact that Kubernetes is having on Nokia, Csatari believes, is that people are starting to think about how a telecom company can do DevOps. "We are building a DevOps pipeline, which reaches from the actual developer to the customers, and thinking about new ways how can we digitally deliver our software to our customers and get feedback from the customers right to the engineers," he says. "This is something that will fundamentally change how telecom companies are delivering software, and how quickly can we develop new features. This is because of the usage of containers and, of course, the usage of Kubernetes." +

That meant that they needed to be able to set affinity and anti-affinity rules in their orchestration tools. "You cannot put all of the functions to the same physical host because physical hosts are failing," Csatari explains. "If you fail with one physical host, then you lose all of the core processing processes. Then there are no calls going through. So we have to divide them among the different physical hosts. At that time, only Kubernetes was able to provide these features. The simplicity of the label-based scheduling of Kubernetes was a sign that showed us this architecture will scale, will be stable, and will be good for our purposes."

-
-
+{{< case-studies/quote + image="/images/case-studies/nokia/banner4.jpg" + author="Gergely Csatari, Senior Open Source Engineer, Nokia" +>}} +"Kubernetes opened the window to all of these open source projects instead of implementing everything in house. Our engineers can focus more on the application level, which is actually the thing what we are selling, and not on the infrastructure level. For us, the most important thing about Kubernetes is it allows us to focus on value creation of our business." +{{< /case-studies/quote >}} + +

The TAS went live in early 2018, and now Kubernetes is also enabling Nokia's foray into 5G. The company is introducing microservices architecture and Kubernetes while adding 5G features to existing products. And all new 5G product development will be on top of Kubernetes. "When you develop something that is part of the operator's infrastructure, you have to develop it for the future, and Kubernetes and containers are the forward-looking technologies," says Csatari.

+ +

There have been real time savings thanks to Kubernetes. "By separating the infrastructure and the application layer, we have less dependencies in the system, which means that it's easier to implement features in the application layer," says Csatari. Because teams can test the exact same binary artifact independently of the target execution environment, "we find more errors in early phases of the testing, and we do not need to run the same tests on different target environments, like VMware, OpenStack or bare metal," he adds. As a result, "we save several hundred hours in every release."

+ +

Moving from Nokia's legacy cluster management system, which had been built in-house more than thirty years ago, to a Kubernetes platform also meant that "we started using Linux as a base operating system, so we just opened the window to all of these open source projects instead of implementing everything in house," says Csatari. (From CNCF's ecosystem, the team is already using Helm, gRPC, CNI, Prometheus, and Envoy, and plans to implement CoreDNS.) "Our engineers can focus more on the application level, which is actually the thing what we are selling, and not on the infrastructure level. For us, the most important thing about Kubernetes is it allows us to focus on value creation of our business."

+ +{{< case-studies/quote author="Gergely Csatari, Senior Open Source Engineer, Nokia" >}} +"I had some discussions at KubeCon with people from the networking SIG and the resource management working group, to work together on our requirements, and that's very exciting for me and my colleagues," +{{< /case-studies/quote >}} + +

The company has a long-term goal of moving the entire product portfolio into the Kubernetes platform. To that end, Nokia teams are working together with other companies to add the features needed to use Kubernetes with the real-time, nanosecond-sensitive applications close to the edge of the radio network.

+ +

And the CNCF community is proving to be a great forum for that collaboration. "I had some discussions at KubeCon with people from the networking SIG and the resource management working group, to work together on our requirements, and that's very exciting for me and my colleagues," says Csatari. "Previously, everybody had the same problem, but everybody just did it in his own, and now we are trying to solve the same problem together."

+ +

Perhaps the biggest impact that Kubernetes is having on Nokia, Csatari believes, is that people are starting to think about how a telecom company can do DevOps. "We are building a DevOps pipeline, which reaches from the actual developer to the customers, and thinking about new ways how can we digitally deliver our software to our customers and get feedback from the customers right to the engineers," he says. "This is something that will fundamentally change how telecom companies are delivering software, and how quickly can we develop new features. This is because of the usage of containers and, of course, the usage of Kubernetes."

diff --git a/content/en/case-studies/nordstrom/index.html b/content/en/case-studies/nordstrom/index.html index 788453de35..73bc4e147e 100644 --- a/content/en/case-studies/nordstrom/index.html +++ b/content/en/case-studies/nordstrom/index.html @@ -2,109 +2,74 @@ title: Nordstrom Case Study case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css + +new_case_study_styles: true +heading_background: /images/case-studies/nordstrom/banner1.jpg +heading_title_logo: /images/nordstrom_logo.png +subheading: > + Finding Millions in Potential Savings in a Tough Retail Climate +case_study_details: + - Company: Nordstrom + - Location: Seattle, Washington + - Industry: Retail --- -
-

CASE STUDY:
Finding Millions in Potential Savings in a Tough Retail Climate +

Challenge

+

Nordstrom wanted to increase the efficiency and speed of its technology operations, which includes the Nordstrom.com e-commerce site. At the same time, Nordstrom Technology was looking for ways to tighten its technology operational costs.

-

+

Solution

-
+

After embracing a DevOps transformation and launching a continuous integration/continuous deployment (CI/CD) project four years ago, the company reduced its deployment time from three months to 30 minutes. But they wanted to go even faster across environments, so they began their cloud native journey, adopting Docker containers orchestrated with Kubernetes.

-
- Company  Nordstrom     Location  Seattle, Washington     Industry  Retail -
+

Impact

-
-
-
-
-

Challenge

- Nordstrom wanted to increase the efficiency and speed of its technology operations, which includes the Nordstrom.com e-commerce site. At the same time, Nordstrom Technology was looking for ways to tighten its technology operational costs. +

Nordstrom Technology developers using Kubernetes now deploy faster and can "just focus on writing applications," says Dhawal Patel, a senior engineer on the team building a Kubernetes enterprise platform for Nordstrom. Furthermore, the team has increased Ops efficiency, improving CPU utilization from 5x to 12x depending on the workload. "We run thousands of virtual machines (VMs), but aren't effectively using all those resources," says Patel. "With Kubernetes, without even trying to make our cluster efficient, we are currently at a 10x increase."

-
-

Solution

- After embracing a DevOps transformation and launching a continuous integration/continuous deployment (CI/CD) project four years ago, the company reduced its deployment time from three months to 30 minutes. But they wanted to go even faster across environments, so they began their cloud native journey, adopting Docker containers orchestrated with Kubernetes. +{{< case-studies/quote author="Dhawal Patel, senior engineer at Nordstrom" >}} +"We are always looking for ways to optimize and provide more value through technology. With Kubernetes we are showcasing two types of efficiency that we can bring: Dev efficiency and Ops efficiency. It's a win-win." +{{< /case-studies/quote >}} -
+

When Dhawal Patel joined Nordstrom five years ago as an application developer for the retailer's website, he realized there was an opportunity to help speed up development cycles.

+

In those early DevOps days, Nordstrom Technology still followed a traditional model of silo teams and functions. "As a developer, I was spending more time fixing environments than writing code and adding value to business," Patel says. "I was passionate about that—so I was given the opportunity to help fix it."

-
+

The company was eager to move faster, too, and in 2013 launched the first continuous integration/continuous deployment (CI/CD) project. That project was the first step in Nordstrom's cloud native journey.

-
+

Dev and Ops team members built a CI/CD pipeline, working with the company's servers on premise. The team chose Chef, and wrote cookbooks that automated virtual IP creation, servers, and load balancing. "After we completed the project, deployment went from three months to 30 minutes," says Patel. "We still had multiple environments—dev, test, staging, then production—so with each environment running the Chef cookbooks, it took 30 minutes. It was a huge achievement at that point."

-

Impact

- Nordstrom Technology developers using Kubernetes now deploy faster and can "just focus on writing applications," says Dhawal Patel, a senior engineer on the team building a Kubernetes enterprise platform for Nordstrom. Furthermore, the team has increased Ops efficiency, improving CPU utilization from 5x to 12x depending on the workload. "We run thousands of virtual machines (VMs), but aren’t effectively using all those resources," says Patel. "With Kubernetes, without even trying to make our cluster efficient, we are currently at a 10x increase." +

But new environments still took too long to turn up, so the next step was working in the cloud. Today, Nordstrom Technology has built an enterprise platform that allows the company's 1,500 developers to deploy applications running as Docker containers in the cloud, orchestrated with Kubernetes.

-
-
-
-
-
- "We are always looking for ways to optimize and provide more value through technology. With Kubernetes we are showcasing two types of efficiency that we can bring: Dev efficiency and Ops efficiency. It’s a win-win." -

-— Dhawal Patel, senior engineer at Nordstrom
-
-
-
-
- When Dhawal Patel joined Nordstrom five years ago as an application developer for the retailer’s website, he realized there was an opportunity to help speed up development cycles. -

- In those early DevOps days, Nordstrom Technology still followed a traditional model of silo teams and functions. "As a developer, I was spending more time fixing environments than writing code and adding value to business," Patel says. "I was passionate about that—so I was given the opportunity to help fix it." -

- The company was eager to move faster, too, and in 2013 launched the first continuous integration/continuous deployment (CI/CD) project. That project was the first step in Nordstrom’s cloud native journey. -

- Dev and Ops team members built a CI/CD pipeline, working with the company’s servers on premise. The team chose Chef, and wrote cookbooks that automated virtual IP creation, servers, and load balancing. "After we completed the project, deployment went from three months to 30 minutes," says Patel. "We still had multiple environments—dev, test, staging, then production—so with each environment running the Chef cookbooks, it took 30 minutes. It was a huge achievement at that point." -

But new environments still took too long to turn up, so the next step was working in the cloud. Today, Nordstrom Technology has built an enterprise platform that allows the company’s 1,500 developers to deploy applications running as Docker containers in the cloud, orchestrated with Kubernetes. +{{< case-studies/quote image="/images/case-studies/nordstrom/banner3.jpg" >}} +"We made a bet that Kubernetes was going to take off, informed by early indicators of community support and project velocity, so we rebuilt our system with Kubernetes at the core," +{{< /case-studies/quote >}} -
-
-
-
- "We made a bet that Kubernetes was going to take off, informed by early indicators of community support and project velocity, so we rebuilt our system with Kubernetes at the core," -
-
-
-
+

"The cloud provided faster access to resources, because it took weeks for us to get a virtual machine (VM) on premises," says Patel. "But now we can do the same thing in only five minutes."

-"The cloud provided faster access to resources, because it took weeks for us to get a virtual machine (VM) on premises," says Patel. "But now we can do the same thing in only five minutes." -

-Nordstrom’s first foray into scheduling containers on a cluster was a homegrown system based on CoreOS fleet. They began doing a few proofs of concept projects with that system until Kubernetes 1.0 was released when they made the switch. "We made a bet that Kubernetes was going to take off, informed by early indicators of community support and project velocity, so we rebuilt our system with Kubernetes at the core," says Marius Grigoriu, Sr. Manager of the Kubernetes team at Nordstrom. -While Kubernetes is often thought as a platform for microservices, the first application to launch on Kubernetes in a critical production role at Nordstrom was Jira. "It was not the ideal microservice we were hoping to get as our first application," Patel admits, "but the team that was working on it was really passionate about Docker and Kubernetes, and they wanted to try it out. They had their application running on premises, and wanted to move it to Kubernetes." -

-The benefits were immediate for the teams that came on board. "Teams running on our Kubernetes cluster loved the fact that they had fewer issues to worry about. They didn’t need to manage infrastructure or operating systems," says Grigoriu. "Early adopters loved the declarative nature of Kubernetes. They loved the reduced surface area they had to deal with." +

Nordstrom's first foray into scheduling containers on a cluster was a homegrown system based on CoreOS fleet. They began doing a few proofs of concept projects with that system until Kubernetes 1.0 was released when they made the switch. "We made a bet that Kubernetes was going to take off, informed by early indicators of community support and project velocity, so we rebuilt our system with Kubernetes at the core," says Marius Grigoriu, Sr. Manager of the Kubernetes team at Nordstrom.

-
-
-
-
- "Teams running on our Kubernetes cluster loved the fact that they had fewer issues to worry about. They didn’t need to manage infrastructure or operating systems," says Grigoriu. "Early adopters loved the declarative nature of Kubernetes. They loved the reduced surface area they had to deal with." -
-
+

While Kubernetes is often thought as a platform for microservices, the first application to launch on Kubernetes in a critical production role at Nordstrom was Jira. "It was not the ideal microservice we were hoping to get as our first application," Patel admits, "but the team that was working on it was really passionate about Docker and Kubernetes, and they wanted to try it out. They had their application running on premises, and wanted to move it to Kubernetes."

-
-
- To support these early adopters, Patel’s team began growing the cluster and building production-grade services. "We integrated with Prometheus for monitoring, with a Grafana front end; we used Fluentd to push logs to Elasticsearch, so that gives us log aggregation," says Patel. The team also added dozens of open-source components, including CNCF projects and has made contributions to Kubernetes, Terraform, and kube2iam. -

-There are now more than 60 development teams running Kubernetes in Nordstrom Technology, and as success stories have popped up, more teams have gotten on board. "Our initial customer base, the ones who were willing to try this out, are now going and evangelizing to the next set of users," says Patel. "One early adopter had Docker containers and he was not sure how to run it in production. We sat with him and within 15 minutes we deployed it in production. He thought it was amazing, and more people in his org started coming in." -

-For Nordstrom Technology, going cloud-native has vastly improved development and operational efficiency. The developers using Kubernetes now deploy faster and can focus on building value in their applications. One such team started with a 25-minute merge to deploy by launching virtual machines in the cloud. Switching to Kubernetes was a 5x speedup in their process, improving their merge to deploy time to 5 minutes. -
+

The benefits were immediate for the teams that came on board. "Teams running on our Kubernetes cluster loved the fact that they had fewer issues to worry about. They didn't need to manage infrastructure or operating systems," says Grigoriu. "Early adopters loved the declarative nature of Kubernetes. They loved the reduced surface area they had to deal with."

-
-
- "With Kubernetes, without even trying to make our cluster efficient, we are currently at 40 percent CPU utilization—a 10x increase. we are running 2600+ customer pods that would have been 2600+ VMs if they had gone directly to the cloud. We are running them on 40 VMs now, so that’s a huge reduction in operational overhead." -
-
+{{< case-studies/quote image="/images/case-studies/nordstrom/banner4.jpg">}} +"Teams running on our Kubernetes cluster loved the fact that they had fewer issues to worry about. They didn't need to manage infrastructure or operating systems," says Grigoriu. "Early adopters loved the declarative nature of Kubernetes. They loved the reduced surface area they had to deal with." +{{< /case-studies/quote >}} -
- Speed is great, and easily demonstrated, but perhaps the bigger impact lies in the operational efficiency. "We run thousands of VMs on AWS, and their overall average CPU utilization is about four percent," says Patel. "With Kubernetes, without even trying to make our cluster efficient, we are currently at 40 percent CPU utilization—a 10x increase. We are running 2600+ customer pods that would have been 2600+ VMs if they had gone directly to the cloud. We are running them on 40 VMs now, so that’s a huge reduction in operational overhead." -

- Nordstrom Technology is also exploring running Kubernetes on bare metal on premises. "If we can build an on-premises Kubernetes cluster," says Patel, "we could bring the power of cloud to provision resources fast on-premises. Then for the developer, their interface is Kubernetes; they might not even realize or care that their services are now deployed on premises because they’re only working with Kubernetes." - For that reason, Patel is eagerly following Kubernetes’ development of multi-cluster capabilities. "With cluster federation, we can have our on-premise as the primary cluster and the cloud as a secondary burstable cluster," he says. "So, when there is an anniversary sale or Black Friday sale, and we need more containers - we can go to the cloud." -

- That kind of possibility—as well as the impact that Grigoriu and Patel’s team has already delivered using Kubernetes—is what led Nordstrom on its cloud native journey in the first place. "The way the retail environment is today, we are trying to build responsiveness and flexibility where we can," says Grigoriu. "Kubernetes makes it easy to: bring efficiency to both the Dev and Ops side of the equation. It’s a win-win." +

To support these early adopters, Patel's team began growing the cluster and building production-grade services. "We integrated with Prometheus for monitoring, with a Grafana front end; we used Fluentd to push logs to Elasticsearch, so that gives us log aggregation," says Patel. The team also added dozens of open-source components, including CNCF projects and has made contributions to Kubernetes, Terraform, and kube2iam.

-
-
+

There are now more than 60 development teams running Kubernetes in Nordstrom Technology, and as success stories have popped up, more teams have gotten on board. "Our initial customer base, the ones who were willing to try this out, are now going and evangelizing to the next set of users," says Patel. "One early adopter had Docker containers and he was not sure how to run it in production. We sat with him and within 15 minutes we deployed it in production. He thought it was amazing, and more people in his org started coming in."

+ +

For Nordstrom Technology, going cloud-native has vastly improved development and operational efficiency. The developers using Kubernetes now deploy faster and can focus on building value in their applications. One such team started with a 25-minute merge to deploy by launching virtual machines in the cloud. Switching to Kubernetes was a 5x speedup in their process, improving their merge to deploy time to 5 minutes.

+ +{{< case-studies/quote >}} +"With Kubernetes, without even trying to make our cluster efficient, we are currently at 40 percent CPU utilization—a 10x increase. we are running 2600+ customer pods that would have been 2600+ VMs if they had gone directly to the cloud. We are running them on 40 VMs now, so that's a huge reduction in operational overhead." +{{< /case-studies/quote >}} + +

Speed is great, and easily demonstrated, but perhaps the bigger impact lies in the operational efficiency. "We run thousands of VMs on AWS, and their overall average CPU utilization is about four percent," says Patel. "With Kubernetes, without even trying to make our cluster efficient, we are currently at 40 percent CPU utilization—a 10x increase. We are running 2600+ customer pods that would have been 2600+ VMs if they had gone directly to the cloud. We are running them on 40 VMs now, so that's a huge reduction in operational overhead."

+ +

Nordstrom Technology is also exploring running Kubernetes on bare metal on premises. "If we can build an on-premises Kubernetes cluster," says Patel, "we could bring the power of cloud to provision resources fast on-premises. Then for the developer, their interface is Kubernetes; they might not even realize or care that their services are now deployed on premises because they're only working with Kubernetes."

+ +

For that reason, Patel is eagerly following Kubernetes' development of multi-cluster capabilities. "With cluster federation, we can have our on-premise as the primary cluster and the cloud as a secondary burstable cluster," he says. "So, when there is an anniversary sale or Black Friday sale, and we need more containers - we can go to the cloud."

+ +

That kind of possibility—as well as the impact that Grigoriu and Patel's team has already delivered using Kubernetes—is what led Nordstrom on its cloud native journey in the first place. "The way the retail environment is today, we are trying to build responsiveness and flexibility where we can," says Grigoriu. "Kubernetes makes it easy to: bring efficiency to both the Dev and Ops side of the equation. It's a win-win."

diff --git a/content/en/case-studies/northwestern-mutual/index.html b/content/en/case-studies/northwestern-mutual/index.html index 47b4bbc7be..a25d25484c 100644 --- a/content/en/case-studies/northwestern-mutual/index.html +++ b/content/en/case-studies/northwestern-mutual/index.html @@ -2,98 +2,68 @@ title: Northwestern Mutual Case Study case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css + +new_case_study_styles: true +heading_background: /images/case-studies/northwestern/banner1.jpg +heading_title_logo: /images/northwestern_logo.png +subheading: > + Cloud Native at Northwestern Mutual +case_study_details: + - Company: Northwestern Mutual + - Location: Milwaukee, WI + - Industry: Insurance and Financial Services --- -
-

CASE STUDY:
Cloud Native at Northwestern Mutual +

Challenge

+

In the spring of 2015, Northwestern Mutual acquired a fintech startup, LearnVest, and decided to take "Northwestern Mutual's leading products and services and meld it with LearnVest's digital experience and innovative financial planning platform," says Brad Williams, Director of Engineering for Client Experience, Northwestern Mutual. The company's existing infrastructure had been optimized for batch workflows hosted on on-prem networks; deployments were very traditional, focused on following a process instead of providing deployment agility. "We had to build a platform that was elastically scalable, but also much more responsive, so we could quickly get data to the client website so our end-customers have the experience they expect," says Williams.

-

+

Solution

-
- -
- Company  Northwestern Mutual     Location  Milwaukee, WI     Industry  Insurance and Financial Services -
- -
-
-
-
-

Challenge

- In the spring of 2015, Northwestern Mutual acquired a fintech startup, LearnVest, and decided to take "Northwestern Mutual’s leading products and services and meld it with LearnVest’s digital experience and innovative financial planning platform," says Brad Williams, Director of Engineering for Client Experience, Northwestern Mutual. The company’s existing infrastructure had been optimized for batch workflows hosted on on-prem networks; deployments were very traditional, focused on following a process instead of providing deployment agility. "We had to build a platform that was elastically scalable, but also much more responsive, so we could quickly get data to the client website so our end-customers have the experience they expect," says Williams. -
-

Solution

- The platform team came up with a plan for using the public cloud (AWS), Docker containers, and Kubernetes for orchestration. "Kubernetes gave us that base framework so teams can be very autonomous in what they’re building and deliver very quickly and frequently," says Northwestern Mutual Cloud Native Engineer Frank Greco Jr. The team also built and open-sourced Kanali, a Kubernetes-native API management tool that uses OpenTracing, Jaeger, and gRPC. - - -
- -
+

The platform team came up with a plan for using the public cloud (AWS), Docker containers, and Kubernetes for orchestration. "Kubernetes gave us that base framework so teams can be very autonomous in what they're building and deliver very quickly and frequently," says Northwestern Mutual Cloud Native Engineer Frank Greco Jr. The team also built and open-sourced Kanali, a Kubernetes-native API management tool that uses OpenTracing, Jaeger, and gRPC.

Impact

- Before, infrastructure deployments could take weeks; now, it is done in a matter of minutes. The number of deployments has increased dramatically, from about 24 a year to over 500 in just the first 10 months of 2017. Availability has also increased: There used to be a six-hour control window for commits every Sunday morning, as well as other periods of general maintenance, during which outages could happen. "Now we have eliminated the planned outage windows," says Bryan Pfremmer, App Platform Teams Manager, Northwestern Mutual. Kanali has had an impact on the bottom line. The vendor API management product that the company previously used required 23 servers, "dedicated, to only API management," says Pfremmer. "Now it’s all integrated in the existing stack and running as another deployment on Kubernetes. And that’s just one environment. Between the three that we had plus the test, that’s hard dollar savings." -
-
-
-
-
-
-"In a large enterprise, you’re going to have people using Kubernetes, but then you’re also going to have people using WAS and .NET. You may not be at a point where your whole stack can be cloud native. What if you can take your API management tool and make it cloud native, but still proxy to legacy systems? Using different pieces that are cloud native, open source and Kubernetes native, you can do pretty innovative stuff." — Frank Greco Jr., Cloud Native Engineer at Northwestern Mutual
-
-
-
-

For more than 160 years, Northwestern Mutual has maintained its industry leadership in part by keeping a strong focus on risk management.

- For many years, the company took a similar approach to managing its technology and has recently undergone a digital transformation to advance the company’s digital strategy - including making a lot of noise in the cloud-native world.

-In the spring of 2015, this insurance and financial services company acquired a fintech startup, LearnVest, and decided to take "Northwestern Mutual’s leading products and services and meld it with LearnVest’s digital experience and innovative financial planning platform," says Brad Williams, Director of Engineering for Client Experience, Northwestern Mutual. The company’s existing infrastructure had been optimized for batch workflows hosted on an on-premise datacenter; deployments were very traditional and had to many manual steps that were error prone.

-In order to give the company’s 4.5 million clients the digital experience they’d come to expect, says Williams, "We had to build a platform that was elastically scalable, but also much more responsive, so we could quickly get data to the client website. We essentially said, 'You build the system that you think is necessary to support a new, modern-facing one.’ That’s why we departed from anything legacy." +

Before, infrastructure deployments could take weeks; now, it is done in a matter of minutes. The number of deployments has increased dramatically, from about 24 a year to over 500 in just the first 10 months of 2017. Availability has also increased: There used to be a six-hour control window for commits every Sunday morning, as well as other periods of general maintenance, during which outages could happen. "Now we have eliminated the planned outage windows," says Bryan Pfremmer, App Platform Teams Manager, Northwestern Mutual. Kanali has had an impact on the bottom line. The vendor API management product that the company previously used required 23 servers, "dedicated, to only API management," says Pfremmer. "Now it's all integrated in the existing stack and running as another deployment on Kubernetes. And that's just one environment. Between the three that we had plus the test, that's hard dollar savings."

+{{< case-studies/quote author="Frank Greco Jr., Cloud Native Engineer at Northwestern Mutual">}} +"In a large enterprise, you're going to have people using Kubernetes, but then you're also going to have people using WAS and .NET. You may not be at a point where your whole stack can be cloud native. What if you can take your API management tool and make it cloud native, but still proxy to legacy systems? Using different pieces that are cloud native, open source and Kubernetes native, you can do pretty innovative stuff." +{{< /case-studies/quote >}} -
-
-
-
- "Kubernetes has definitely been the right choice for us. It gave us that base framework so teams can be autonomous in what they’re building and deliver very quickly and frequently." +{{< case-studies/lead >}} +For more than 160 years, Northwestern Mutual has maintained its industry leadership in part by keeping a strong focus on risk management. +{{< /case-studies/lead >}} -
-
-
-
- Williams and the rest of the platform team decided that the first step would be to start moving from private data centers to AWS. With a new microservice architecture in mind—and the freedom to implement what was best for the organization—they began using Docker containers. After looking into the various container orchestration options, they went with Kubernetes, even though it was still in beta at the time. "There was some debate whether we should build something ourselves, or just leverage that product and evolve with it," says Northwestern Mutual Cloud Native Engineer Frank Greco Jr. "Kubernetes has definitely been the right choice for us. It gave us that base framework so teams can be autonomous in what they’re building and deliver very quickly and frequently."

-As early adopters, the team had to do a lot of work with Ansible scripts to stand up the cluster. "We had a lot of hard security requirements given the nature of our business," explains Bryan Pfremmer, App Platform Teams Manager, Northwestern Mutual. "We found ourselves running a configuration that very few other people ever tried." The client experience group was the first to use the new platform; today, a few hundred of the company’s 1,500 engineers are using it and more are eager to get on board. -The results have been dramatic. Before, infrastructure deployments could take two weeks; now, it is done in a matter of minutes. Now with a focus on Infrastructure automation, and self-service, "You can take an app to production in that same day if you want to," says Pfremmer. +

For many years, the company took a similar approach to managing its technology and has recently undergone a digital transformation to advance the company's digital strategy - including making a lot of noise in the cloud-native world.

+

In the spring of 2015, this insurance and financial services company acquired a fintech startup, LearnVest, and decided to take "Northwestern Mutual's leading products and services and meld it with LearnVest's digital experience and innovative financial planning platform," says Brad Williams, Director of Engineering for Client Experience, Northwestern Mutual. The company's existing infrastructure had been optimized for batch workflows hosted on an on-premise datacenter; deployments were very traditional and had to many manual steps that were error prone.

-
-
-
-
+

In order to give the company's 4.5 million clients the digital experience they'd come to expect, says Williams, "We had to build a platform that was elastically scalable, but also much more responsive, so we could quickly get data to the client website. We essentially said, 'You build the system that you think is necessary to support a new, modern-facing one.' That's why we departed from anything legacy."

+ +{{< case-studies/quote image="/images/case-studies/northwestern/banner3.jpg" >}} +"Kubernetes has definitely been the right choice for us. It gave us that base framework so teams can be autonomous in what they're building and deliver very quickly and frequently." +{{< /case-studies/quote >}} + +

Williams and the rest of the platform team decided that the first step would be to start moving from private data centers to AWS. With a new microservice architecture in mind—and the freedom to implement what was best for the organization—they began using Docker containers. After looking into the various container orchestration options, they went with Kubernetes, even though it was still in beta at the time. "There was some debate whether we should build something ourselves, or just leverage that product and evolve with it," says Northwestern Mutual Cloud Native Engineer Frank Greco Jr. "Kubernetes has definitely been the right choice for us. It gave us that base framework so teams can be autonomous in what they're building and deliver very quickly and frequently."

+ +

As early adopters, the team had to do a lot of work with Ansible scripts to stand up the cluster. "We had a lot of hard security requirements given the nature of our business," explains Bryan Pfremmer, App Platform Teams Manager, Northwestern Mutual. "We found ourselves running a configuration that very few other people ever tried." The client experience group was the first to use the new platform; today, a few hundred of the company's 1,500 engineers are using it and more are eager to get on board.

+ +

The results have been dramatic. Before, infrastructure deployments could take two weeks; now, it is done in a matter of minutes. Now with a focus on Infrastructure automation, and self-service, "You can take an app to production in that same day if you want to," says Pfremmer.

+ +{{< case-studies/quote image="/images/case-studies/northwestern/banner4.jpg" >}} "Now, developers have autonomy, they can use this whenever they want, however they want. It becomes more valuable the more instrumentation downstream that happens, as we mature in it." -
-
+{{< /case-studies/quote >}} -
-
- The process used to be so cumbersome that minor bug releases would be bundled with feature releases. With the new streamlined system enabled by Kubernetes, the number of deployments has increased from about 24 a year to more than 500 in just the first 10 months of 2017. Availability has also been improved: There used to be a six-hour control window for commits every early Sunday morning, as well as other periods of general maintenance, during which outages could happen. "Now there’s no planned outage window," notes Pfremmer.

-Northwestern Mutual built that API management tool—called Kanali—and open sourced it in the summer of 2017. The team took on the project because it was a key capability for what they were building and prior the solution worked in an "anti-cloud native way that was different than everything else we were doing," says Greco. Now API management is just another container deployed to Kubernetes along with a separate Jaeger deployment.

-Now the engineers using the Kubernetes deployment platform have the added benefit of visibility in production—and autonomy. Before, a centralized team and would have to run a trace. "Now, developers have autonomy, they can use this whenever they want, however they want. It becomes more valuable the more instrumentation downstream that happens, as we mature in it." says Greco. +

The process used to be so cumbersome that minor bug releases would be bundled with feature releases. With the new streamlined system enabled by Kubernetes, the number of deployments has increased from about 24 a year to more than 500 in just the first 10 months of 2017. Availability has also been improved: There used to be a six-hour control window for commits every early Sunday morning, as well as other periods of general maintenance, during which outages could happen. "Now there's no planned outage window," notes Pfremmer.

+

Northwestern Mutual built that API management tool—called Kanali—and open sourced it in the summer of 2017. The team took on the project because it was a key capability for what they were building and prior the solution worked in an "anti-cloud native way that was different than everything else we were doing," says Greco. Now API management is just another container deployed to Kubernetes along with a separate Jaeger deployment.

-
+

Now the engineers using the Kubernetes deployment platform have the added benefit of visibility in production—and autonomy. Before, a centralized team and would have to run a trace. "Now, developers have autonomy, they can use this whenever they want, however they want. It becomes more valuable the more instrumentation downstream that happens, as we mature in it." says Greco.

-
-
- "We’re trying to make what we’re doing known so that we can find people who are like, 'Yeah, that’s interesting. I want to come do it!’" -
-
+{{< case-studies/quote >}} +"We're trying to make what we're doing known so that we can find people who are like, 'Yeah, that's interesting. I want to come do it!'" +{{< /case-studies/quote >}} -
- But the team didn’t stop there. "In a large enterprise, you’re going to have people using Kubernetes, but then you’re also going to have people using WAS and .NET," says Greco. "You may not be at a point where your whole stack can be cloud native. What if you can take your API management tool and make it cloud native, but still proxy to legacy systems? Using different pieces that are cloud native, open source and Kubernetes native, you can do pretty innovative stuff."

- As the team continues to improve its stack and share its Kubernetes best practices, it feels that Northwestern Mutual’s reputation as a technology-first company is evolving too. "No one would think a company that’s 160-plus years old is foraying this deep into the cloud and infrastructure stack," says Pfremmer. And they’re hoping that means they’ll be able to attract new talent. "We’re trying to make what we’re doing known so that we can find people who are like, 'Yeah, that’s interesting. I want to come do it!’" +

But the team didn't stop there. "In a large enterprise, you're going to have people using Kubernetes, but then you're also going to have people using WAS and .NET," says Greco. "You may not be at a point where your whole stack can be cloud native. What if you can take your API management tool and make it cloud native, but still proxy to legacy systems? Using different pieces that are cloud native, open source and Kubernetes native, you can do pretty innovative stuff."

- -
- -
+

As the team continues to improve its stack and share its Kubernetes best practices, it feels that Northwestern Mutual's reputation as a technology-first company is evolving too. "No one would think a company that's 160-plus years old is foraying this deep into the cloud and infrastructure stack," says Pfremmer. And they're hoping that means they'll be able to attract new talent. "We're trying to make what we're doing known so that we can find people who are like, 'Yeah, that's interesting. I want to come do it!'"

diff --git a/content/en/case-studies/ocado/index.html b/content/en/case-studies/ocado/index.html index 79ac9bf3a8..4c4018009a 100644 --- a/content/en/case-studies/ocado/index.html +++ b/content/en/case-studies/ocado/index.html @@ -1,99 +1,83 @@ --- title: Ocado Case Study - linkTitle: Ocado case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css logo: ocado_featured_logo.png featured: true weight: 4 quote: > - People at Ocado Technology have been quite amazed. They ask, ‘Can we do this on a Dev cluster?’ and 10 minutes later we have rolled out something that is deployed across the cluster. The speed from idea to implementation to deployment is amazing. + People at Ocado Technology have been quite amazed. They ask, 'Can we do this on a Dev cluster?' and 10 minutes later we have rolled out something that is deployed across the cluster. The speed from idea to implementation to deployment is amazing. + +new_case_study_styles: true +heading_background: /images/case-studies/ocado/banner1.jpg +heading_title_logo: /images/ocado_logo.png +subheading: > + Ocado: Running Grocery Warehouses with a Cloud Native Platform +case_study_details: + - Company: Ocado Technology + - Location: Hatfield, England + - Industry: Grocery retail technology and platforms --- -
-

CASE STUDY:
Ocado: Running Grocery Warehouses with a Cloud Native Platform

-
+

Challenge

-
- Company  Ocado Technology     Location  Hatfield, England     Industry  Grocery retail technology and platforms -
+

The world's largest online-only grocery retailer, Ocado developed the Ocado Smart Platform to manage its own operations, from websites to warehouses, and is now licensing the technology to other retailers such as Kroger. To set up the first warehouses for the platform, Ocado shifted from virtual machines and Puppet infrastructure to Docker containers, using CoreOS's fleet scheduler to provision all the services on its OpenStack-based private cloud on bare metal. As the Smart Platform grew and "fleet was going end-of-life," says Platform Engineer Mike Bryant, "we started looking for a more complete platform, with all of these disparate infrastructure services being brought together in one unified API."

-
-
-
-
-

Challenge

- The world’s largest online-only grocery retailer, Ocado developed the Ocado Smart Platform to manage its own operations, from websites to warehouses, and is now licensing the technology to other retailers such as Kroger. To set up the first warehouses for the platform, Ocado shifted from virtual machines and Puppet infrastructure to Docker containers, using CoreOS’s fleet scheduler to provision all the services on its OpenStack-based private cloud on bare metal. As the Smart Platform grew and "fleet was going end-of-life," says Platform Engineer Mike Bryant, "we started looking for a more complete platform, with all of these disparate infrastructure services being brought together in one unified API."

-

Solution

- The team decided to migrate from fleet to Kubernetes on Ocado’s private cloud. The Kubernetes stack currently uses kubeadm for bootstrapping, CNI with Weave Net for networking, Prometheus Operator for monitoring, Fluentd for logging, and OpenTracing for distributed tracing. The first app on Kubernetes, a business-critical service in the warehouses, went into production in the summer of 2017, with a mass migration continuing into 2018. Hundreds of Ocado engineers working on the Smart Platform are now deploying on Kubernetes. - -
- -
+

Solution

+

The team decided to migrate from fleet to Kubernetes on Ocado's private cloud. The Kubernetes stack currently uses kubeadm for bootstrapping, CNI with Weave Net for networking, Prometheus Operator for monitoring, Fluentd for logging, and OpenTracing for distributed tracing. The first app on Kubernetes, a business-critical service in the warehouses, went into production in the summer of 2017, with a mass migration continuing into 2018. Hundreds of Ocado engineers working on the Smart Platform are now deploying on Kubernetes.

Impact

- With Kubernetes, "the speed from idea to implementation to deployment is amazing," says Bryant. "I’ve seen features go from development to production inside of a week now. In the old world, a new application deployment could easily take over a month." And because there are no longer restrictive deployment windows in the warehouses, the rate of deployments has gone from as few as two per week to dozens per week. Ocado has also achieved cost savings because Kubernetes gives the team the ability to have more fine-grained resource allocation. Says DevOps Team Leader Kevin McCormack: "We have more confidence in the resource allocation/separation features of Kubernetes, so we have been able to migrate from around 10 fleet clusters to one Kubernetes cluster." The team also uses Prometheus and Grafana to visualize resource allocation, and makes the data available to developers. "The increased visibility offered by Prometheus means developers are more aware of what they are using and how their use impacts others, especially since we now have one shared cluster," says McCormack. "I’d estimate that we use about 15-25% less hardware resources to host the same applications in Kubernetes in our test environments." -
+

With Kubernetes, "the speed from idea to implementation to deployment is amazing," says Bryant. "I've seen features go from development to production inside of a week now. In the old world, a new application deployment could easily take over a month." And because there are no longer restrictive deployment windows in the warehouses, the rate of deployments has gone from as few as two per week to dozens per week. Ocado has also achieved cost savings because Kubernetes gives the team the ability to have more fine-grained resource allocation. Says DevOps Team Leader Kevin McCormack: "We have more confidence in the resource allocation/separation features of Kubernetes, so we have been able to migrate from around 10 fleet clusters to one Kubernetes cluster." The team also uses Prometheus and Grafana to visualize resource allocation, and makes the data available to developers. "The increased visibility offered by Prometheus means developers are more aware of what they are using and how their use impacts others, especially since we now have one shared cluster," says McCormack. "I'd estimate that we use about 15-25% less hardware resources to host the same applications in Kubernetes in our test environments."

-
-
-
-
- "People at Ocado Technology have been quite amazed. They ask, ‘Can we do this on a Dev cluster?’ and 10 minutes later we have rolled out something that is deployed across the cluster. The speed from idea to implementation to deployment is amazing."

- Mike Bryant, Platform Engineer, Ocado
-
-
-
-
-

When it was founded in 2000, Ocado was an online-only grocery retailer in the U.K. In the years since, it has expanded from delivering produce to families to providing technology to other grocery retailers.

-The company began developing its Ocado Smart Platform to manage its own operations, from websites to warehouses, and is now licensing the technology to other grocery chains around the world, such as Kroger. To set up the first warehouses on the platform, Ocado shifted from virtual machines and Puppet infrastructure to Docker containers, using CoreOS’s fleet scheduler to provision all the services on its OpenStack-based private cloud on bare metal. As the Smart Platform grew, and "fleet was going end-of-life," says Platform Engineer Mike Bryant, "we started looking for a more complete platform, with all of these disparate infrastructure services being brought together in one unified API."

-Bryant had already been using Kubernetes with Code for Life, a children’s education project that’s part of Ocado’s charity arm. "We really liked it, so we started looking at it seriously for our production workloads," says Bryant. The team that managed fleet had researched orchestration solutions and landed on Kubernetes as well. "We were looking for a platform with wide adoption, and that was where the momentum was," says DevOps Team Leader Kevin McCormack. The two paths converged, and "We didn’t even go through any proof-of-concept stage. The Code for Life work served that purpose," says Bryant. +{{< case-studies/quote author="Mike Bryant, Platform Engineer, Ocado" >}} +"People at Ocado Technology have been quite amazed. They ask, 'Can we do this on a Dev cluster?' and 10 minutes later we have rolled out something that is deployed across the cluster. The speed from idea to implementation to deployment is amazing." +{{< /case-studies/quote >}} -
-
-
-
- "We were looking for a platform with wide adoption, and that was where the momentum was, the two paths converged, and we didn’t even go through any proof-of-concept stage. The Code for Life work served that purpose,"

- Kevin McCormack, DevOps Team Leader, Ocado
-
-
-
-
- In the summer of 2016, the team began migrating from fleet to Kubernetes on Ocado’s private cloud. The Kubernetes stack currently uses kubeadm for bootstrapping, CNI with Weave Net for networking, Prometheus Operator for monitoring, Fluentd for logging, and OpenTracing for distributed tracing.

- The first app on Kubernetes, a business-critical service in the warehouses, went into production a year later. Once that app was running smoothly, a mass migration continued into 2018. Hundreds of Ocado engineers working on the Smart Platform are now deploying on Kubernetes, and the platform is live in Ocado’s warehouses, managing tens of thousands of orders a week. At full capacity, Ocado’s latest warehouse in Erith, southeast London, will deliver more than 200,000 orders per week, making it the world’s largest facility for online grocery.

- There are about 150 microservices now running on Kubernetes, with multiple instances of many of them. "We’re not just deploying all these microservices at once. We’re deploying them all for one warehouse, and then they’re all being deployed again for the next warehouse, and again and again," says Bryant.

- The move to Kubernetes was eye-opening for many people at Ocado Technology. "In the early days of putting the platform into our test infrastructure, the technical architect asked what network performance was like on Weave Net with encryption turned on," recalls Bryant. "So we found a Docker container for iPerf, wrote a daemon set, deployed it. A few moments later, we’ve deployed the entire thing across this cluster. He was pretty blown away by that." +{{< case-studies/lead >}} +When it was founded in 2000, Ocado was an online-only grocery retailer in the U.K. In the years since, it has expanded from delivering produce to families to providing technology to other grocery retailers. +{{< /case-studies/lead >}} -
-
-
-
- "The unified API of Kubernetes means this is all in one place, and it’s one flow for approval and rollout. I’ve seen features go from development to production inside of a week now. In the old world, a new application deployment could easily take over a month."

- Mike Bryant, Platform Engineer, Ocado
-
-
- +

The company began developing its Ocado Smart Platform to manage its own operations, from websites to warehouses, and is now licensing the technology to other grocery chains around the world, such as Kroger. To set up the first warehouses on the platform, Ocado shifted from virtual machines and Puppet infrastructure to Docker containers, using CoreOS's fleet scheduler to provision all the services on its OpenStack-based private cloud on bare metal. As the Smart Platform grew, and "fleet was going end-of-life," says Platform Engineer Mike Bryant, "we started looking for a more complete platform, with all of these disparate infrastructure services being brought together in one unified API."

-
-
- Indeed, the impact has been profound. "Prior to containerization, we had quite restrictive deployment windows in our warehouses," says Bryant. "Moving to microservices, we’ve been able to deploy much more frequently. We’ve been able to move towards continuous delivery in a number of areas. In our older warehouse, new application deployments involve talking to a bunch of different teams for different levels of the stack: from VM provisioning, to storage, to load balancers, and so on. The unified API of Kubernetes means this is all in one place, and it’s one flow for approval and rollout. I’ve seen features go from development to production inside of a week now. In the old world, a new application deployment could easily take over a month."

- The rate of deployment has gone from as few as two per week to dozens per week. "With Kubernetes, some of our development teams have been able to deploy their application to production on the new platform without us noticing," says Bryant, "which means they’re faster at doing what they need to do and we have less work."

- Ocado has also achieved cost savings because Kubernetes gives the team the ability to have more fine-grained resource allocation. "That lets us shrink quite a lot of our deployments from being per-core VM deployments to having fractions of the core," says Bryant. Adds McCormack: "We have more confidence in the resource allocation/separation features of Kubernetes, so we have been able to migrate from around 10 fleet clusters to one Kubernetes cluster. This means we use our hardware better since if we have to always have two nodes of excess capacity available in case of node failures then we only need two extra instead of 20." +

Bryant had already been using Kubernetes with Code for Life, a children's education project that's part of Ocado's charity arm. "We really liked it, so we started looking at it seriously for our production workloads," says Bryant. The team that managed fleet had researched orchestration solutions and landed on Kubernetes as well. "We were looking for a platform with wide adoption, and that was where the momentum was," says DevOps Team Leader Kevin McCormack. The two paths converged, and "We didn't even go through any proof-of-concept stage. The Code for Life work served that purpose," says Bryant.

-
+{{< case-studies/quote + image="/images/case-studies/ocado/banner3.jpg" + author="Kevin McCormack, DevOps Team Leader, Ocado" +>}} +"We were looking for a platform with wide adoption, and that was where the momentum was, the two paths converged, and we didn't even go through any proof-of-concept stage. The Code for Life work served that purpose," +{{< /case-studies/quote >}} -
-
- "CNCF have provided us with support of different technologies. We’ve been able to adopt those in a very easy fashion. We do like that CNCF is vendor agnostic. We’re not being asked to commit to this one way of doing things. The vast diversity of viewpoints in CNCF lead to better technology."

- Mike Bryant, Platform Engineer, Ocado
+

In the summer of 2016, the team began migrating from fleet to Kubernetes on Ocado's private cloud. The Kubernetes stack currently uses kubeadm for bootstrapping, CNI with Weave Net for networking, Prometheus Operator for monitoring, Fluentd for logging, and OpenTracing for distributed tracing.

-
-
+

The first app on Kubernetes, a business-critical service in the warehouses, went into production a year later. Once that app was running smoothly, a mass migration continued into 2018. Hundreds of Ocado engineers working on the Smart Platform are now deploying on Kubernetes, and the platform is live in Ocado's warehouses, managing tens of thousands of orders a week. At full capacity, Ocado's latest warehouse in Erith, southeast London, will deliver more than 200,000 orders per week, making it the world's largest facility for online grocery.

-
- The team also uses Prometheus and Grafana to visualize resource allocation, and makes the data available to developers. "The increased visibility offered by Prometheus means developers are more aware of what they are using and how their use impacts others, especially since we now have one shared cluster," says McCormack. "I’d estimate that we use about 15-25% less hardware resource to host the same applications in Kubernetes in our test environments."

- One of the broader benefits of cloud native, says Bryant, is the unified API. "We have one method of doing our deployments that covers the wide range of things we need to do, and we can extend the API," he says. In addition to using Prometheus Operator, the Ocado team has started writing its own operators, some of which have been open sourced. Plus, "CNCF has provided us with support of these different technologies. We’ve been able to adopt those in a very easy fashion. We do like that CNCF is vendor agnostic. We’re not being asked to commit to this one way of doing things. The vast diversity of viewpoints in the CNCF leads to better technology."

- Ocado’s own technology, in the form of its Smart Platform, will soon be used around the world. And cloud native plays a crucial role in this global expansion. "I wouldn’t have wanted to try it without Kubernetes," says Bryant. "Kubernetes has made it so much nicer, especially to have that consistent way of deploying all of the applications, then taking the same thing and being able to replicate it. It’s very valuable." +

There are about 150 microservices now running on Kubernetes, with multiple instances of many of them. "We're not just deploying all these microservices at once. We're deploying them all for one warehouse, and then they're all being deployed again for the next warehouse, and again and again," says Bryant.

-
-
+

The move to Kubernetes was eye-opening for many people at Ocado Technology. "In the early days of putting the platform into our test infrastructure, the technical architect asked what network performance was like on Weave Net with encryption turned on," recalls Bryant. "So we found a Docker container for iPerf, wrote a daemon set, deployed it. A few moments later, we've deployed the entire thing across this cluster. He was pretty blown away by that."

+ +{{< case-studies/quote + image="/images/case-studies/ocado/banner4.jpg" + author="Mike Bryant, Platform Engineer, Ocado" +>}} +"The unified API of Kubernetes means this is all in one place, and it's one flow for approval and rollout. I've seen features go from development to production inside of a week now. In the old world, a new application deployment could easily take over a month." +{{< /case-studies/quote >}} + +

Indeed, the impact has been profound. "Prior to containerization, we had quite restrictive deployment windows in our warehouses," says Bryant. "Moving to microservices, we've been able to deploy much more frequently. We've been able to move towards continuous delivery in a number of areas. In our older warehouse, new application deployments involve talking to a bunch of different teams for different levels of the stack: from VM provisioning, to storage, to load balancers, and so on. The unified API of Kubernetes means this is all in one place, and it's one flow for approval and rollout. I've seen features go from development to production inside of a week now. In the old world, a new application deployment could easily take over a month."

+ +

The rate of deployment has gone from as few as two per week to dozens per week. "With Kubernetes, some of our development teams have been able to deploy their application to production on the new platform without us noticing," says Bryant, "which means they're faster at doing what they need to do and we have less work."

+ +

Ocado has also achieved cost savings because Kubernetes gives the team the ability to have more fine-grained resource allocation. "That lets us shrink quite a lot of our deployments from being per-core VM deployments to having fractions of the core," says Bryant. Adds McCormack: "We have more confidence in the resource allocation/separation features of Kubernetes, so we have been able to migrate from around 10 fleet clusters to one Kubernetes cluster. This means we use our hardware better since if we have to always have two nodes of excess capacity available in case of node failures then we only need two extra instead of 20."

+ +{{< case-studies/quote author="Mike Bryant, Platform Engineer, Ocado" >}} +"CNCF have provided us with support of different technologies. We've been able to adopt those in a very easy fashion. We do like that CNCF is vendor agnostic. We're not being asked to commit to this one way of doing things. The vast diversity of viewpoints in CNCF lead to better technology." +{{< /case-studies/quote >}} + +

The team also uses Prometheus and Grafana to visualize resource allocation, and makes the data available to developers. "The increased visibility offered by Prometheus means developers are more aware of what they are using and how their use impacts others, especially since we now have one shared cluster," says McCormack. "I'd estimate that we use about 15-25% less hardware resource to host the same applications in Kubernetes in our test environments."

+ +

One of the broader benefits of cloud native, says Bryant, is the unified API. "We have one method of doing our deployments that covers the wide range of things we need to do, and we can extend the API," he says. In addition to using Prometheus Operator, the Ocado team has started writing its own operators, some of which have been open sourced. Plus, "CNCF has provided us with support of these different technologies. We've been able to adopt those in a very easy fashion. We do like that CNCF is vendor agnostic. We're not being asked to commit to this one way of doing things. The vast diversity of viewpoints in the CNCF leads to better technology."

+ +

Ocado's own technology, in the form of its Smart Platform, will soon be used around the world. And cloud native plays a crucial role in this global expansion. "I wouldn't have wanted to try it without Kubernetes," says Bryant. "Kubernetes has made it so much nicer, especially to have that consistent way of deploying all of the applications, then taking the same thing and being able to replicate it. It's very valuable."

diff --git a/content/en/case-studies/openAI/index.html b/content/en/case-studies/openAI/index.html index 1b95ec5f35..543c1dee64 100644 --- a/content/en/case-studies/openAI/index.html +++ b/content/en/case-studies/openAI/index.html @@ -2,98 +2,68 @@ title: OpenAI Case Study case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css + +new_case_study_styles: true +heading_background: /images/case-studies/openAI/banner1.jpg +heading_title_logo: /images/openAI_logo.png +subheading: > + Launching and Scaling Up Experiments, Made Simple +case_study_details: + - Company: OpenAI + - Location: San Francisco, California + - Industry: Artificial Intelligence Research --- -
-

CASE STUDY:
Launching and Scaling Up Experiments, Made Simple +

Challenge

-

+

An artificial intelligence research lab, OpenAI needed infrastructure for deep learning that would allow experiments to be run either in the cloud or in its own data center, and to easily scale. Portability, speed, and cost were the main drivers.

-
+

Solution

-
- Company  OpenAI     Location  San Francisco, California     Industry  Artificial Intelligence Research -
- -
-
-
-
-

Challenge

- An artificial intelligence research lab, OpenAI needed infrastructure for deep learning that would allow experiments to be run either in the cloud or in its own data center, and to easily scale. Portability, speed, and cost were the main drivers. -
-

Solution

- OpenAI began running Kubernetes on top of AWS in 2016, and in early 2017 migrated to Azure. OpenAI runs key experiments in fields including robotics and gaming both in Azure and in its own data centers, depending on which cluster has free capacity. "We use Kubernetes mainly as a batch scheduling system and rely on our autoscaler to dynamically scale up and down our cluster," says Christopher Berner, Head of Infrastructure. "This lets us significantly reduce costs for idle nodes, while still providing low latency and rapid iteration." -
- -
+

OpenAI began running Kubernetes on top of AWS in 2016, and in early 2017 migrated to Azure. OpenAI runs key experiments in fields including robotics and gaming both in Azure and in its own data centers, depending on which cluster has free capacity. "We use Kubernetes mainly as a batch scheduling system and rely on our autoscaler to dynamically scale up and down our cluster," says Christopher Berner, Head of Infrastructure. "This lets us significantly reduce costs for idle nodes, while still providing low latency and rapid iteration."

Impact

- The company has benefited from greater portability: "Because Kubernetes provides a consistent API, we can move our research experiments very easily between clusters," says Berner. Being able to use its own data centers when appropriate is "lowering costs and providing us access to hardware that we wouldn’t necessarily have access to in the cloud," he adds. "As long as the utilization is high, the costs are much lower there." Launching experiments also takes far less time: "One of our researchers who is working on a new distributed training system has been able to get his experiment running in two or three days. In a week or two he scaled it out to hundreds of GPUs. Previously, that would have easily been a couple of months of work." +

The company has benefited from greater portability: "Because Kubernetes provides a consistent API, we can move our research experiments very easily between clusters," says Berner. Being able to use its own data centers when appropriate is "lowering costs and providing us access to hardware that we wouldn't necessarily have access to in the cloud," he adds. "As long as the utilization is high, the costs are much lower there." Launching experiments also takes far less time: "One of our researchers who is working on a new distributed training system has been able to get his experiment running in two or three days. In a week or two he scaled it out to hundreds of GPUs. Previously, that would have easily been a couple of months of work."

-
+{{< case-studies/quote >}} + +
+Check out "Building the Infrastructure that Powers the Future of AI" presented by Vicki Cheung, Member of Technical Staff & Jonas Schneider, Member of Technical Staff at OpenAI from KubeCon/CloudNativeCon Europe 2017. +{{< /case-studies/quote >}} -
-
-
-
+{{< case-studies/lead >}} +From experiments in robotics to old-school video game play research, OpenAI's work in artificial intelligence technology is meant to be shared. +{{< /case-studies/lead >}} -
-
-Check out "Building the Infrastructure that Powers the Future of AI" presented by Vicki Cheung, Member of Technical Staff & Jonas Schneider, Member of Technical Staff at OpenAI from KubeCon/CloudNativeCon Europe 2017. -
-
-
-
-
-

From experiments in robotics to old-school video game play research, OpenAI’s work in artificial intelligence technology is meant to be shared.

- With a mission to ensure powerful AI systems are safe, OpenAI cares deeply about open source—both benefiting from it and contributing safety technology into it. "The research that we do, we want to spread it as widely as possible so everyone can benefit," says OpenAI’s Head of Infrastructure Christopher Berner. The lab’s philosophy—as well as its particular needs—lent itself to embracing an open source, cloud native strategy for its deep learning infrastructure.

- OpenAI started running Kubernetes on top of AWS in 2016, and a year later, migrated the Kubernetes clusters to Azure. "We probably use Kubernetes differently from a lot of people," says Berner. "We use it for batch scheduling and as a workload manager for the cluster. It’s a way of coordinating a large number of containers that are all connected together. We rely on our autoscaler to dynamically scale up and down our cluster. This lets us significantly reduce costs for idle nodes, while still providing low latency and rapid iteration."

- In the past year, Berner has overseen the launch of several Kubernetes clusters in OpenAI’s own data centers. "We run them in a hybrid model where the control planes—the Kubernetes API servers, etcd and everything—are all in Azure, and then all of the Kubernetes nodes are in our own data center," says Berner. "The cloud is really convenient for managing etcd and all of the masters, and having backups and spinning up new nodes if anything breaks. This model allows us to take advantage of lower costs and have the availability of more specialized hardware in our own data center." +

With a mission to ensure powerful AI systems are safe, OpenAI cares deeply about open source—both benefiting from it and contributing safety technology into it. "The research that we do, we want to spread it as widely as possible so everyone can benefit," says OpenAI's Head of Infrastructure Christopher Berner. The lab's philosophy—as well as its particular needs—lent itself to embracing an open source, cloud native strategy for its deep learning infrastructure.

+

OpenAI started running Kubernetes on top of AWS in 2016, and a year later, migrated the Kubernetes clusters to Azure. "We probably use Kubernetes differently from a lot of people," says Berner. "We use it for batch scheduling and as a workload manager for the cluster. It's a way of coordinating a large number of containers that are all connected together. We rely on our autoscaler to dynamically scale up and down our cluster. This lets us significantly reduce costs for idle nodes, while still providing low latency and rapid iteration."

-
-
-
-
- OpenAI’s experiments take advantage of Kubernetes’ benefits, including portability. "Because Kubernetes provides a consistent API, we can move our research experiments very easily between clusters..." +

In the past year, Berner has overseen the launch of several Kubernetes clusters in OpenAI's own data centers. "We run them in a hybrid model where the control planes—the Kubernetes API servers, etcd and everything—are all in Azure, and then all of the Kubernetes nodes are in our own data center," says Berner. "The cloud is really convenient for managing etcd and all of the masters, and having backups and spinning up new nodes if anything breaks. This model allows us to take advantage of lower costs and have the availability of more specialized hardware in our own data center."

-
-
-
-
- Different teams at OpenAI currently run a couple dozen projects. While the largest-scale workloads manage bare cloud VMs directly, most of OpenAI’s experiments take advantage of Kubernetes’ benefits, including portability. "Because Kubernetes provides a consistent API, we can move our research experiments very easily between clusters," says Berner. The on-prem clusters are generally "used for workloads where you need lots of GPUs, something like training an ImageNet model. Anything that’s CPU heavy, that’s run in the cloud. But we also have a number of teams that run their experiments both in Azure and in our own data centers, just depending on which cluster has free capacity, and that’s hugely valuable."

- Berner has made the Kubernetes clusters available to all OpenAI teams to use if it’s a good fit. "I’ve worked a lot with our games team, which at the moment is doing research on classic console games," he says. "They had been running a bunch of their experiments on our dev servers, and they had been trying out Google cloud, managing their own VMs. We got them to try out our first on-prem Kubernetes cluster, and that was really successful. They’ve now moved over completely to it, and it has allowed them to scale up their experiments by 10x, and do that without needing to invest significant engineering time to figure out how to manage more machines. A lot of people are now following the same path." +{{< case-studies/quote image="/images/case-studies/openAI/banner3.jpg" >}} +OpenAI's experiments take advantage of Kubernetes' benefits, including portability. "Because Kubernetes provides a consistent API, we can move our research experiments very easily between clusters..." +{{< /case-studies/quote >}} -
-
-
-
+

Different teams at OpenAI currently run a couple dozen projects. While the largest-scale workloads manage bare cloud VMs directly, most of OpenAI's experiments take advantage of Kubernetes' benefits, including portability. "Because Kubernetes provides a consistent API, we can move our research experiments very easily between clusters," says Berner. The on-prem clusters are generally "used for workloads where you need lots of GPUs, something like training an ImageNet model. Anything that's CPU heavy, that's run in the cloud. But we also have a number of teams that run their experiments both in Azure and in our own data centers, just depending on which cluster has free capacity, and that's hugely valuable."

+ +

Berner has made the Kubernetes clusters available to all OpenAI teams to use if it's a good fit. "I've worked a lot with our games team, which at the moment is doing research on classic console games," he says. "They had been running a bunch of their experiments on our dev servers, and they had been trying out Google cloud, managing their own VMs. We got them to try out our first on-prem Kubernetes cluster, and that was really successful. They've now moved over completely to it, and it has allowed them to scale up their experiments by 10x, and do that without needing to invest significant engineering time to figure out how to manage more machines. A lot of people are now following the same path."

+ +{{< case-studies/quote image="/images/case-studies/openAI/banner4.jpg" >}} "One of our researchers who is working on a new distributed training system has been able to get his experiment running in two or three days," says Berner. "In a week or two he scaled it out to hundreds of GPUs. Previously, that would have easily been a couple of months of work." -
-
+{{< /case-studies/quote >}} -
-
- That path has been simplified by frameworks and tools that two of OpenAI’s teams have developed to handle interaction with Kubernetes. "You can just write some Python code, fill out a bit of configuration with exactly how many machines you need and which types, and then it will prepare all of those specifications and send it to the Kube cluster so that it gets launched there," says Berner. "And it also provides a bit of extra monitoring and better tooling that’s designed specifically for these machine learning projects."

- The impact that Kubernetes has had at OpenAI is impressive. With Kubernetes, the frameworks and tooling, including the autoscaler, in place, launching experiments takes far less time. "One of our researchers who is working on a new distributed training system has been able to get his experiment running in two or three days," says Berner. "In a week or two he scaled it out to hundreds of GPUs. Previously, that would have easily been a couple of months of work."

- Plus, the flexibility they now have to use their on-prem Kubernetes cluster when appropriate is "lowering costs and providing us access to hardware that we wouldn’t necessarily have access to in the cloud," he says. "As long as the utilization is high, the costs are much lower in our data center. To an extent, you can also customize your hardware to exactly what you need." +

That path has been simplified by frameworks and tools that two of OpenAI's teams have developed to handle interaction with Kubernetes. "You can just write some Python code, fill out a bit of configuration with exactly how many machines you need and which types, and then it will prepare all of those specifications and send it to the Kube cluster so that it gets launched there," says Berner. "And it also provides a bit of extra monitoring and better tooling that's designed specifically for these machine learning projects."

+

The impact that Kubernetes has had at OpenAI is impressive. With Kubernetes, the frameworks and tooling, including the autoscaler, in place, launching experiments takes far less time. "One of our researchers who is working on a new distributed training system has been able to get his experiment running in two or three days," says Berner. "In a week or two he scaled it out to hundreds of GPUs. Previously, that would have easily been a couple of months of work."

-
+

Plus, the flexibility they now have to use their on-prem Kubernetes cluster when appropriate is "lowering costs and providing us access to hardware that we wouldn't necessarily have access to in the cloud," he says. "As long as the utilization is high, the costs are much lower in our data center. To an extent, you can also customize your hardware to exactly what you need."

-
-
- "Research teams can now take advantage of the frameworks we’ve built on top of Kubernetes, which make it easy to launch experiments, scale them by 10x or 50x, and take little effort to manage."
— CHRISTOPHER BERNER, HEAD OF INFRASTRUCTURE FOR OPENAI -
-
+{{< case-studies/quote author="CHRISTOPHER BERNER, HEAD OF INFRASTRUCTURE FOR OPENAI" >}} +"Research teams can now take advantage of the frameworks we've built on top of Kubernetes, which make it easy to launch experiments, scale them by 10x or 50x, and take little effort to manage." +{{< /case-studies/quote >}} -
+

OpenAI is also benefiting from other technologies in the CNCF cloud-native ecosystem. gRPC is used by many of its systems for communications between different services, and Prometheus is in place "as a debugging tool if things go wrong," says Berner. "We actually haven't had any real problems in our Kubernetes clusters recently, so I don't think anyone has looked at our Prometheus monitoring in a while. If something breaks, it will be there."

- OpenAI is also benefiting from other technologies in the CNCF cloud-native ecosystem. gRPC is used by many of its systems for communications between different services, and Prometheus is in place "as a debugging tool if things go wrong," says Berner. "We actually haven’t had any real problems in our Kubernetes clusters recently, so I don’t think anyone has looked at our Prometheus monitoring in a while. If something breaks, it will be there."

- One of the things Berner continues to focus on is Kubernetes’ ability to scale, which is essential to deep learning experiments. OpenAI has been able to push one of its Kubernetes clusters on Azure up to more than 2,500 nodes. "I think we’ll probably hit the 5,000-machine number that Kubernetes has been tested at before too long," says Berner, adding, "We’re definitely hiring if you’re excited about working on these things!" -
- -
+

One of the things Berner continues to focus on is Kubernetes' ability to scale, which is essential to deep learning experiments. OpenAI has been able to push one of its Kubernetes clusters on Azure up to more than 2,500 nodes. "I think we'll probably hit the 5,000-machine number that Kubernetes has been tested at before too long," says Berner, adding, "We're definitely hiring if you're excited about working on these things!"

diff --git a/content/en/case-studies/peardeck/index.html b/content/en/case-studies/peardeck/index.html index 688754a620..4398277677 100644 --- a/content/en/case-studies/peardeck/index.html +++ b/content/en/case-studies/peardeck/index.html @@ -1,111 +1,87 @@ --- title: Pear Deck Case Study - case_study_styles: true cid: caseStudies -css: /css/style_peardeck.css + +new_case_study_styles: true +heading_background: /images/case-studies/peardeck/banner3.jpg +heading_title_logo: /images/peardeck_logo.png +subheading: > + Infrastructure for a Growing EdTech Startup +case_study_details: + - Company: Pear Deck + - Location: Iowa City, Iowa + - Industry: Educational Software --- -
-

CASE STUDY:
Infrastructure for a Growing EdTech Startup

+

Challenge

-
+

The three-year-old startup provides a web app for teachers to interact with their students in the classroom. The JavaScript app was built on Google's web app development platform Firebase, using Heroku. As the user base steadily grew, so did the development team. "We outgrew Heroku when we started wanting to have multiple services, and the deploying story got pretty horrendous. We were frustrated that we couldn't have the developers quickly stage a version," says CEO Riley Eynon-Lynch. "Tracing and monitoring became basically impossible." On top of that, many of Pear Deck's customers are behind government firewalls and connect through Firebase, not Pear Deck's servers, making troubleshooting even more difficult.

-
- Company  Pear Deck     Location  Iowa City, Iowa     Industry  Educational Software -
+

Solution

-
+

In 2016, the company began moving their code from Heroku to Docker containers running on Google Kubernetes Engine, orchestrated by Kubernetes and monitored with Prometheus.

-
-
-
-

Challenge

- The three-year-old startup provides a web app for teachers to interact with their students in the classroom. The JavaScript app was built on Google’s web app development platform Firebase, using Heroku. As the user base steadily grew, so did the development team. "We outgrew Heroku when we started wanting to have multiple services, and the deploying story got pretty horrendous. We were frustrated that we couldn’t have the developers quickly stage a version," says CEO Riley Eynon-Lynch. "Tracing and monitoring became basically impossible." On top of that, many of Pear Deck’s customers are behind government firewalls and connect through Firebase, not Pear Deck’s servers, making troubleshooting even more difficult. -
- -
-

Solution

- In 2016, the company began moving their code from Heroku to Docker containers running on Google Kubernetes Engine, orchestrated by Kubernetes and monitored with Prometheus. -
-

Impact

- The new cloud native stack immediately improved the development workflow, speeding up deployments. Prometheus gave Pear Deck "a lot of confidence, knowing that people are still logging into the app and using it all the time," says Eynon-Lynch. "The biggest impact is being able to work as a team on the configuration in git in a pull request, and the biggest confidence comes from the solidity of the abstractions and the trust that we have in Kubernetes actually making our yaml files a reality." -
-
-
-
-
- "We didn’t even realize how stressed out we were about our lack of insight into what was happening with the app. I’m really excited and have more and more confidence in the actual state of our application for our actual users, and not just what the CPU graphs are saying, because of Prometheus and Kubernetes."

– RILEY EYNON-LYNCH, CEO OF PEAR DECK -
-
+

The new cloud native stack immediately improved the development workflow, speeding up deployments. Prometheus gave Pear Deck "a lot of confidence, knowing that people are still logging into the app and using it all the time," says Eynon-Lynch. "The biggest impact is being able to work as a team on the configuration in git in a pull request, and the biggest confidence comes from the solidity of the abstractions and the trust that we have in Kubernetes actually making our yaml files a reality."

-
-
-

With the speed befitting a startup, Pear Deck delivered its first prototype to customers within three months of incorporating.

-As a former high school math teacher, CEO Riley Eynon-Lynch felt an urgency to provide a tech solution to classes where instructors struggle to interact with every student in a short amount of time. "Pear Deck is an app that students can use to interact with the teacher all at once," he says. "When the teacher asks a question, instead of just the kid at the front of the room answering again, everybody can answer every single question. It’s a huge fundamental shift in the messaging to the students about how much we care about them and how much they are a part of the classroom."

-Eynon-Lynch and his partners quickly built a JavaScript web app on Google’s web app development platform Firebase, and launched the minimum viable product [MVP] on Heroku "because it was fast and easy," he says. "We made everything as easy as we could." -

-But once it launched, the user base began growing steadily at a rate of 30 percent a month. "Our Heroku bill was getting totally insane," Eynon-Lynch says. But even more crucially, as the company hired more developers to keep pace, "we outgrew Heroku. We wanted to have multiple services and the deploying story got pretty horrendous. We were frustrated that we couldn’t have the developers quickly stage a version. Tracing and monitoring became basically impossible." -

-On top of that, many of Pear Deck’s customers are behind government firewalls and connect through Firebase, not Pear Deck’s servers, making troubleshooting even more difficult. -

-The team began looking around for another solution, and finally decided in early 2016 to start moving the app from Heroku to Docker containers running on Google Kubernetes Engine, orchestrated by Kubernetes and monitored with Prometheus. -
-
+{{< case-studies/quote author="RILEY EYNON-LYNCH, CEO OF PEAR DECK" >}} +"We didn't even realize how stressed out we were about our lack of insight into what was happening with the app. I'm really excited and have more and more confidence in the actual state of our application for our actual users, and not just what the CPU graphs are saying, because of Prometheus and Kubernetes." +{{< /case-studies/quote >}} -
-
- "When it became clear that Google Kubernetes Engine was going to have a lot of support from Google and be a fully-managed Kubernetes platform, it seemed very obvious to us that was the way to go," says Eynon-Lynch. -
-
+{{< case-studies/lead >}} +With the speed befitting a startup, Pear Deck delivered its first prototype to customers within three months of incorporating. +{{< /case-studies/lead >}} -
-
- They had considered other options like Google’s App Engine (which they were already using for one service) and Amazon’s Elastic Compute Cloud (EC2), while experimenting with running one small service that wasn’t accessible to the Internet in Kubernetes. "When it became clear that Google Kubernetes Engine was going to have a lot of support from Google and be a fully-managed Kubernetes platform, it seemed very obvious to us that was the way to go," says Eynon-Lynch. "We didn’t really consider Terraform and the other competitors because the abstractions offered by Kubernetes just jumped off the page to us."

- Once the team started porting its Heroku apps into Kubernetes, which was "super easy," he says, the impact was immediate. "Before, to make a new version of the app meant going to Heroku and reconfiguring 10 new services, so basically no one was willing to do it, and we never staged things," he says. "Now we can deploy our exact same configuration in lots of different clusters in 30 seconds. We have a full set up that’s always running, and then any of our developers or designers can stage new versions with one command, including their recent changes. We stage all the time now, and everyone stopped talking about how cool it is because it’s become invisible how great it is." -

- Along with Kubernetes came Prometheus. "Until pretty recently we didn’t have any kind of visibility into aggregate server metrics or performance," says Eynon-Lynch. The team had tried to use Google Kubernetes Engine’s Stackdriver monitoring, but had problems making it work, and considered New Relic. When they started looking at Prometheus in the fall of 2016, "the fit between the abstractions in Prometheus and the way we think about how our system works, was so clear and obvious," he says.

- The integration with Kubernetes made set-up easy. Once Helm installed Prometheus, "We started getting a graph of the health of all our Kubernetes nodes and pods immediately. I think we were pretty hooked at that point," Eynon-Lynch says. "Then we got our own custom instrumentation working in 15 minutes, and had an actively updated count of requests that we could do, rate on and get a sense of how many users are connected at a given point. And then it was another hour before we had alarms automatically showing up in our Slack channel. All that was in one afternoon. And it was an afternoon of gasping with delight, basically!" -
-
+

As a former high school math teacher, CEO Riley Eynon-Lynch felt an urgency to provide a tech solution to classes where instructors struggle to interact with every student in a short amount of time. "Pear Deck is an app that students can use to interact with the teacher all at once," he says. "When the teacher asks a question, instead of just the kid at the front of the room answering again, everybody can answer every single question. It's a huge fundamental shift in the messaging to the students about how much we care about them and how much they are a part of the classroom."

-
-
- "We started getting a graph of the health of all our Kubernetes nodes and pods immediately. I think we were pretty hooked at that point," Eynon-Lynch says. "Then we got our own custom instrumentation working in 15 minutes, and had an actively updated count of requests that we could do, rate on and get a sense of how many users are connected at a given point. And then it was another hour before we had alarms automatically showing up in our Slack channel. All that was in one afternoon. And it was an afternoon of gasping with delight, basically!" -
-
+

Eynon-Lynch and his partners quickly built a JavaScript web app on Google's web app development platform Firebase, and launched the minimum viable product [MVP] on Heroku "because it was fast and easy," he says. "We made everything as easy as we could."

-
-
- With Pear Deck’s specific challenges—traffic through Firebase as well as government firewalls—Prometheus was a game-changer. "We didn’t even realize how stressed out we were about our lack of insight into what was happening with the app," Eynon-Lynch says. Before, when a customer would report that the app wasn’t working, the team had to manually investigate the problem without knowing whether customers were affected all over the world, or whether Firebase was down, and where.

- To help solve that problem, the team wrote a script that pings Firebase from several different geographical locations, and then reports the responses to Prometheus in a histogram. "A huge impact that Prometheus had on us was just an amazing sigh of relief, of feeling like we knew what was happening," he says. "It took 45 minutes to implement [the Firebase alarm] because we knew that we had this trustworthy metrics platform in Prometheus. We weren’t going to have to figure out, ‘Where do we send these metrics? How do we aggregate the metrics? How do we understand them?’"

- Plus, Prometheus has allowed Pear Deck to build alarms for business goals. One measures the rate of successful app loads and goes off if the day’s loads are less than 90 percent of the loads from seven days before. "We run a JavaScript app behind ridiculous firewalls and all kinds of crazy browser extensions messing with it—Chrome will push a feature that breaks some CSS that we’re using," Eynon-Lynch says. "So that gives us a lot of confidence, and we at least know that people are still logging into the app and using it all the time."

- Now, when a customer complains, and none of the alarms have gone off, the team can feel confident that it’s not a widespread problem. "Just to be sure, we can go and double check the graphs and say, ‘Yep, there’s currently 10,000 people connected to that Firebase node. It’s definitely working. Let’s investigate your network settings, customer,’" he says. "And we can pass that back off to our support reps instead of the whole development team freaking out that Firebase is down."

- Pear Deck is also giving back to the community, building and open-sourcing a metrics aggregator that enables end-user monitoring in Prometheus. "We can measure, for example, the time to interactive-dom on the web clients," he says. "The users all report that to our aggregator, then the aggregator reports to Prometheus. So we can set an alarm for some client side errors."

- Most of Pear Deck’s services have now been moved onto Kubernetes. And all of the team’s new code is going on Kubernetes. "Kubernetes lets us experiment with service configurations and stage them on a staging cluster all at once, and test different scenarios and talk about them as a development team looking at code, not just talking about the steps we would eventually take as humans," says Eynon-Lynch. -
-
+

But once it launched, the user base began growing steadily at a rate of 30 percent a month. "Our Heroku bill was getting totally insane," Eynon-Lynch says. But even more crucially, as the company hired more developers to keep pace, "we outgrew Heroku. We wanted to have multiple services and the deploying story got pretty horrendous. We were frustrated that we couldn't have the developers quickly stage a version. Tracing and monitoring became basically impossible."

-
-
- "A huge impact that Prometheus had on us was just an amazing sigh of relief, of feeling like we knew what was happening. It took 45 minutes to implement [the Firebase alarm] because we knew that we had this trustworthy metrics platform in Prometheus...in terms of the cloud, Kubernetes and Prometheus have so much to offer," he says. -
-
+

On top of that, many of Pear Deck's customers are behind government firewalls and connect through Firebase, not Pear Deck's servers, making troubleshooting even more difficult.

-
-
- Looking ahead, the team is planning to explore autoscaling on Kubernetes. With users all over the world but mostly in the United States, there are peaks and valleys in the traffic. One service that’s still on App Engine can get as many as 10,000 requests a second during the day but far less at night. "We pay for the same servers at night, so I understand there’s autoscaling that we can be taking advantage of," he says. "Implementing it is a big worry, exposing the rest of our Kubernetes cluster to us and maybe messing that up. But it’s definitely our intention to move everything over, because now none of the developers want to work on that app anymore because it’s such a pain to deploy it." -

-They’re also eager to explore the work that Kubernetes is doing with stateful sets. "Right now all of the services we run in Kubernetes are stateless, and Google basically runs our databases for us and manages backups," Eynon-Lynch says. "But we’re interested in building our own web-socket solution that doesn’t have to be super stateful but will have maybe an hour’s worth of state on it." -

-That project will also involve Prometheus, for a dark launch of web socket connections. "We don’t know how reliable web socket connections behind all these horrible firewalls will be to our servers," he says. "We don’t know what work Firebase has done to make them more reliable. So I’m really looking forward to trying to get persistent connections with web sockets to our clients and have optional tools to understand if it’s working. That’s our next new adventure, into stateful servers." -

-As for Prometheus, Eynon-Lynch thinks the company has only gotten started. "We haven’t instrumented all our important features, especially those that depend on third parties," he says. "We have to wait for those third parties to tell us they’re down, which sometimes they don’t do for a long time. So I’m really excited and have more and more confidence in the actual state of our application for our actual users, and not just what the CPU graphs are saying, because of Prometheus and Kubernetes." -

-For a spry startup that’s continuing to grow rapidly—and yes, they’re hiring!—Pear Deck is notably satisfied with how its infrastructure has evolved in the cloud native ecosystem. "Usually I have some angsty thing where I want to get to the new, better technology," says Eynon-Lynch, "but in terms of the cloud, Kubernetes and Prometheus have so much to offer." +

The team began looking around for another solution, and finally decided in early 2016 to start moving the app from Heroku to Docker containers running on Google Kubernetes Engine, orchestrated by Kubernetes and monitored with Prometheus.

-

-
-
+{{< case-studies/quote image="/images/case-studies/peardeck/banner1.jpg" >}} +"When it became clear that Google Kubernetes Engine was going to have a lot of support from Google and be a fully-managed Kubernetes platform, it seemed very obvious to us that was the way to go," says Eynon-Lynch. +{{< /case-studies/quote >}} + +

They had considered other options like Google's App Engine (which they were already using for one service) and Amazon's Elastic Compute Cloud (EC2), while experimenting with running one small service that wasn't accessible to the Internet in Kubernetes. "When it became clear that Google Kubernetes Engine was going to have a lot of support from Google and be a fully-managed Kubernetes platform, it seemed very obvious to us that was the way to go," says Eynon-Lynch. "We didn't really consider Terraform and the other competitors because the abstractions offered by Kubernetes just jumped off the page to us."

+ +

Once the team started porting its Heroku apps into Kubernetes, which was "super easy," he says, the impact was immediate. "Before, to make a new version of the app meant going to Heroku and reconfiguring 10 new services, so basically no one was willing to do it, and we never staged things," he says. "Now we can deploy our exact same configuration in lots of different clusters in 30 seconds. We have a full set up that's always running, and then any of our developers or designers can stage new versions with one command, including their recent changes. We stage all the time now, and everyone stopped talking about how cool it is because it's become invisible how great it is."

+ +

Along with Kubernetes came Prometheus. "Until pretty recently we didn't have any kind of visibility into aggregate server metrics or performance," says Eynon-Lynch. The team had tried to use Google Kubernetes Engine's Stackdriver monitoring, but had problems making it work, and considered New Relic. When they started looking at Prometheus in the fall of 2016, "the fit between the abstractions in Prometheus and the way we think about how our system works, was so clear and obvious," he says.

+ +

The integration with Kubernetes made set-up easy. Once Helm installed Prometheus, "We started getting a graph of the health of all our Kubernetes nodes and pods immediately. I think we were pretty hooked at that point," Eynon-Lynch says. "Then we got our own custom instrumentation working in 15 minutes, and had an actively updated count of requests that we could do, rate on and get a sense of how many users are connected at a given point. And then it was another hour before we had alarms automatically showing up in our Slack channel. All that was in one afternoon. And it was an afternoon of gasping with delight, basically!"

+ +{{< case-studies/quote image="/images/case-studies/peardeck/banner2.jpg" >}} +"We started getting a graph of the health of all our Kubernetes nodes and pods immediately. I think we were pretty hooked at that point," Eynon-Lynch says. "Then we got our own custom instrumentation working in 15 minutes, and had an actively updated count of requests that we could do, rate on and get a sense of how many users are connected at a given point. And then it was another hour before we had alarms automatically showing up in our Slack channel. All that was in one afternoon. And it was an afternoon of gasping with delight, basically!" +{{< /case-studies/quote >}} + +

With Pear Deck's specific challenges—traffic through Firebase as well as government firewalls—Prometheus was a game-changer. "We didn't even realize how stressed out we were about our lack of insight into what was happening with the app," Eynon-Lynch says. Before, when a customer would report that the app wasn't working, the team had to manually investigate the problem without knowing whether customers were affected all over the world, or whether Firebase was down, and where.

+ +

To help solve that problem, the team wrote a script that pings Firebase from several different geographical locations, and then reports the responses to Prometheus in a histogram. "A huge impact that Prometheus had on us was just an amazing sigh of relief, of feeling like we knew what was happening," he says. "It took 45 minutes to implement [the Firebase alarm] because we knew that we had this trustworthy metrics platform in Prometheus. We weren't going to have to figure out, 'Where do we send these metrics? How do we aggregate the metrics? How do we understand them?'"

+ +

Plus, Prometheus has allowed Pear Deck to build alarms for business goals. One measures the rate of successful app loads and goes off if the day's loads are less than 90 percent of the loads from seven days before. "We run a JavaScript app behind ridiculous firewalls and all kinds of crazy browser extensions messing with it—Chrome will push a feature that breaks some CSS that we're using," Eynon-Lynch says. "So that gives us a lot of confidence, and we at least know that people are still logging into the app and using it all the time."

+ +

Now, when a customer complains, and none of the alarms have gone off, the team can feel confident that it's not a widespread problem. "Just to be sure, we can go and double check the graphs and say, 'Yep, there's currently 10,000 people connected to that Firebase node. It's definitely working. Let's investigate your network settings, customer,'" he says. "And we can pass that back off to our support reps instead of the whole development team freaking out that Firebase is down."

+ +

Pear Deck is also giving back to the community, building and open-sourcing a metrics aggregator that enables end-user monitoring in Prometheus. "We can measure, for example, the time to interactive-dom on the web clients," he says. "The users all report that to our aggregator, then the aggregator reports to Prometheus. So we can set an alarm for some client side errors."

+ +

Most of Pear Deck's services have now been moved onto Kubernetes. And all of the team's new code is going on Kubernetes. "Kubernetes lets us experiment with service configurations and stage them on a staging cluster all at once, and test different scenarios and talk about them as a development team looking at code, not just talking about the steps we would eventually take as humans," says Eynon-Lynch.

+ +{{< case-studies/quote >}} +"A huge impact that Prometheus had on us was just an amazing sigh of relief, of feeling like we knew what was happening. It took 45 minutes to implement [the Firebase alarm] because we knew that we had this trustworthy metrics platform in Prometheus...in terms of the cloud, Kubernetes and Prometheus have so much to offer," he says. +{{< /case-studies/quote >}} + +

Looking ahead, the team is planning to explore autoscaling on Kubernetes. With users all over the world but mostly in the United States, there are peaks and valleys in the traffic. One service that's still on App Engine can get as many as 10,000 requests a second during the day but far less at night. "We pay for the same servers at night, so I understand there's autoscaling that we can be taking advantage of," he says. "Implementing it is a big worry, exposing the rest of our Kubernetes cluster to us and maybe messing that up. But it's definitely our intention to move everything over, because now none of the developers want to work on that app anymore because it's such a pain to deploy it."

+ +

They're also eager to explore the work that Kubernetes is doing with stateful sets. "Right now all of the services we run in Kubernetes are stateless, and Google basically runs our databases for us and manages backups," Eynon-Lynch says. "But we're interested in building our own web-socket solution that doesn't have to be super stateful but will have maybe an hour's worth of state on it."

+ +

That project will also involve Prometheus, for a dark launch of web socket connections. "We don't know how reliable web socket connections behind all these horrible firewalls will be to our servers," he says. "We don't know what work Firebase has done to make them more reliable. So I'm really looking forward to trying to get persistent connections with web sockets to our clients and have optional tools to understand if it's working. That's our next new adventure, into stateful servers."

+ +

As for Prometheus, Eynon-Lynch thinks the company has only gotten started. "We haven't instrumented all our important features, especially those that depend on third parties," he says. "We have to wait for those third parties to tell us they're down, which sometimes they don't do for a long time. So I'm really excited and have more and more confidence in the actual state of our application for our actual users, and not just what the CPU graphs are saying, because of Prometheus and Kubernetes."

+ +

For a spry startup that's continuing to grow rapidly—and yes, they're hiring!—Pear Deck is notably satisfied with how its infrastructure has evolved in the cloud native ecosystem. "Usually I have some angsty thing where I want to get to the new, better technology," says Eynon-Lynch, "but in terms of the cloud, Kubernetes and Prometheus have so much to offer."

diff --git a/content/en/case-studies/pearson/index.html b/content/en/case-studies/pearson/index.html index 78f70228e5..501bcea8e7 100644 --- a/content/en/case-studies/pearson/index.html +++ b/content/en/case-studies/pearson/index.html @@ -3,85 +3,81 @@ title: Pearson Case Study linkTitle: Pearson case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css featured: false quote: > - We’re already seeing tremendous benefits with Kubernetes—improved engineering productivity, faster delivery of applications and a simplified infrastructure. But this is just the beginning. Kubernetes will help transform the way that educational content is delivered online. + We're already seeing tremendous benefits with Kubernetes—improved engineering productivity, faster delivery of applications and a simplified infrastructure. But this is just the beginning. Kubernetes will help transform the way that educational content is delivered online. + +new_case_study_styles: true +heading_background: /images/case-studies/pearson/banner1.jpg +heading_title_logo: /images/pearson_logo.png +subheading: > + Reinventing the World's Largest Education Company With Kubernetes +case_study_details: + - Company: Pearson + - Location: Global + - Industry: Education --- -
-

CASE STUDY:
Reinventing the World’s Largest Education Company With Kubernetes -

-
-
- Company  Pearson     Location  Global -      Industry  Education -
-
-
-
-
-

Challenge

- A global education company serving 75 million learners, Pearson set a goal to more than double that number, to 200 million, by 2025. A key part of this growth is in digital learning experiences, and Pearson was having difficulty in scaling and adapting to its growing online audience. They needed an infrastructure platform that would be able to scale quickly and deliver products to market faster. -
-

Solution

- "To transform our infrastructure, we had to think beyond simply enabling automated provisioning," says Chris Jackson, Director for Cloud Platforms & SRE at Pearson. "We realized we had to build a platform that would allow Pearson developers to build, manage and deploy applications in a completely different way." The team chose Docker container technology and Kubernetes orchestration "because of its flexibility, ease of management and the way it would improve our engineers’ productivity." -
-
-
-

Impact

- With the platform, there has been substantial improvements in productivity and speed of delivery. "In some cases, we’ve gone from nine months to provision physical assets in a data center to just a few minutes to provision and get a new idea in front of a customer," says John Shirley, Lead Site Reliability Engineer for the Cloud Platform Team. Jackson estimates they’ve achieved 15-20% developer productivity savings. Before, outages were an issue during their busiest time of year, the back-to-school period. Now, there’s high confidence in their ability to meet aggressive customer SLAs. -
-
-
-
-
-
- "We’re already seeing tremendous benefits with Kubernetes—improved engineering productivity, faster delivery of applications and a simplified infrastructure. But this is just the beginning. Kubernetes will help transform the way that educational content is delivered online."

— Chris Jackson, Director for Cloud Platforms & SRE at Pearson -
-
-
-
- In 2015, Pearson was already serving 75 million learners as the world’s largest education company, offering curriculum and assessment tools for Pre-K through college and beyond. Understanding that innovating the digital education experience was the key to the future of all forms of education, the company set out to increase its reach to 200 million people by 2025.

- That goal would require a transformation of its existing infrastructure, which was in data centers. In some cases, it took nine months to provision physical assets. In order to adapt to the demands of its growing online audience, Pearson needed an infrastructure platform that would be able to scale quickly and deliver business-critical products to market faster. "We had to think beyond simply enabling automated provisioning," says Chris Jackson, Director for Cloud Platforms & SRE at Pearson. "We realized we had to build a platform that would allow Pearson developers to build, manage and deploy applications in a completely different way."

- With 400 development groups and diverse brands with varying business and technical needs, Pearson embraced Docker container technology so that each brand could experiment with building new types of content using their preferred technologies, and then deliver it using containers. Jackson chose Kubernetes orchestration "because of its flexibility, ease of management and the way it would improve our engineers’ productivity," he says.

- The team adopted Kubernetes when it was still version 1.2 and are still going strong now on 1.7; they use Terraform and Ansible to deploy it on to basic AWS primitives. "We were trying to understand how we can create value for Pearson from this technology," says Ben Somogyi, Principal Architect for the Cloud Platforms. "It turned out that Kubernetes’ benefits are huge. We’re trying to help our applications development teams that use our platform go faster, so we filled that gap with a CI/CD pipeline that builds their images for them, standardizes them, patches everything up, allows them to deploy their different environments onto the cluster, and obfuscating the details of how difficult the work underneath the covers is." -
-
-
-
- "Your internal customers need to feel like they are choosing the very best option for them. We are experiencing this first hand in the growth of adoption. We are seeing triple-digit, year-on-year growth of the service."

— Chris Jackson, Director for Cloud Platforms & SRE at Pearson
-
-
-
-
- That work resulted in two tools for building and deploying applications in the cluster that Pearson has open sourced. "We’re an education company, so we want to share what we can," says Somogyi.

- Now that development teams no longer have to worry about infrastructure, there have been substantial improvements in productivity and speed of delivery. "In some cases, we’ve gone from nine months to provision physical assets in a data center to just a few minutes to provision and to get a new idea in front of a customer," says John Shirley, Lead Site Reliability Engineer for the Cloud Platform Team.

- According to Jackson, the Cloud Platforms team can "provision a new proof-of-concept environment for a development team in minutes, and then they can take that to production as quickly as they are able to. This is the value proposition of all major technology services, and we had to compete like one to become our developers’ preferred choice. Just because you work for the same company, you do not have the right to force people into a mediocre service. Your internal customers need to feel like they are choosing the very best option for them. We are experiencing this first hand in the growth of adoption. We are seeing triple-digit, year-on-year growth of the service."

- Jackson estimates they’ve achieved a 15-20% boost in productivity for developer teams who adopt the platform. They also see a reduction in the number of customer-impacting incidents. Plus, says Jackson, "Teams who were previously limited to 1-2 releases per academic year can now ship code multiple times per day!" -
-
-
-
- "Teams who were previously limited to 1-2 releases per academic year can now ship code multiple times per day!"

— Chris Jackson, Director for Cloud Platforms & SRE at Pearson
-
-
-
-
- Availability has also been positively impacted. The back-to-school period is the company’s busiest time of year, and "you have to keep applications up," says Somogyi. Before, this was a pain point for the legacy infrastructure. Now, for the applications that have been migrated to the Kubernetes platform, "We have 100% uptime. We’re not worried about 9s. There aren’t any. It’s 100%, which is pretty astonishing for us, compared to some of the existing platforms that have legacy challenges," says Shirley. -

- "You can’t even begin to put a price on how much that saves the company," Jackson explains. "A reduction in the number of support cases takes load out of our operations. The customer sentiment of having a reliable product drives customer retention and growth. It frees us to think about investing more into our digital transformation and taking a better quality of education to a global scale." -

- The platform itself is also being broken down, "so we can quickly release smaller pieces of the platform, like upgrading our Kubernetes or all the different modules that make up our platform," says Somogyi. "One of the big focuses in 2018 is this scheme of delivery to update the platform itself." -

- Guided by Pearson’s overarching goal of getting to 200 million users, the team has run internal tests of the platform’s scalability. "We had a challenge: 28 million requests within a 10 minute period," says Shirley. "And we demonstrated that we can hit that, with an acceptable latency. We saw that we could actually get that pretty readily, and we scaled up in just a few seconds, using open source tools entirely. Shout out to Locustfor that one. So that’s amazing." -
-
-
- "We have 100% uptime. We’re not worried about 9s. There aren’t any. It’s 100%, which is pretty astonishing for us, compared to some of the existing platforms that have legacy challenges. You can’t even begin to put a price on how much that saves the company."

— Benjamin Somogyi, Principal Systems Architect at Pearson
-
-
- In just two years, "We’re already seeing tremendous benefits with Kubernetes—improved engineering productivity, faster delivery of applications and a simplified infrastructure," says Jackson. "But this is just the beginning. Kubernetes will help transform the way that educational content is delivered online."

- So far, about 15 production products are running on the new platform, including Pearson’s new flagship digital education service, the Global Learning Platform. The Cloud Platform team continues to prepare, onboard and support customers that are a good fit for the platform. Some existing products will be refactored into 12-factor apps, while others are being developed so that they can live on the platform from the get-go. "There are challenges with bringing in new customers of course, because we have to help them to see a different way of developing, a different way of building," says Shirley.

- But, he adds, "It is our corporate motto: Always Learning. We encourage those teams that haven’t started a cloud native journey, to see the future of technology, to learn, to explore. It will pique your interest. Keep learning." -
-
+ +

Challenge

+ +

A global education company serving 75 million learners, Pearson set a goal to more than double that number, to 200 million, by 2025. A key part of this growth is in digital learning experiences, and Pearson was having difficulty in scaling and adapting to its growing online audience. They needed an infrastructure platform that would be able to scale quickly and deliver products to market faster.

+ +

Solution

+ +

"To transform our infrastructure, we had to think beyond simply enabling automated provisioning," says Chris Jackson, Director for Cloud Platforms & SRE at Pearson. "We realized we had to build a platform that would allow Pearson developers to build, manage and deploy applications in a completely different way." The team chose Docker container technology and Kubernetes orchestration "because of its flexibility, ease of management and the way it would improve our engineers' productivity."

+ +

Impact

+ +

With the platform, there has been substantial improvements in productivity and speed of delivery. "In some cases, we've gone from nine months to provision physical assets in a data center to just a few minutes to provision and get a new idea in front of a customer," says John Shirley, Lead Site Reliability Engineer for the Cloud Platform Team. Jackson estimates they've achieved 15-20% developer productivity savings. Before, outages were an issue during their busiest time of year, the back-to-school period. Now, there's high confidence in their ability to meet aggressive customer SLAs.

+ +{{< case-studies/quote author="Chris Jackson, Director for Cloud Platforms & SRE at Pearson" >}} +"We're already seeing tremendous benefits with Kubernetes—improved engineering productivity, faster delivery of applications and a simplified infrastructure. But this is just the beginning. Kubernetes will help transform the way that educational content is delivered online." +{{< /case-studies/quote >}} + +

In 2015, Pearson was already serving 75 million learners as the world's largest education company, offering curriculum and assessment tools for Pre-K through college and beyond. Understanding that innovating the digital education experience was the key to the future of all forms of education, the company set out to increase its reach to 200 million people by 2025.

+ +

That goal would require a transformation of its existing infrastructure, which was in data centers. In some cases, it took nine months to provision physical assets. In order to adapt to the demands of its growing online audience, Pearson needed an infrastructure platform that would be able to scale quickly and deliver business-critical products to market faster. "We had to think beyond simply enabling automated provisioning," says Chris Jackson, Director for Cloud Platforms & SRE at Pearson. "We realized we had to build a platform that would allow Pearson developers to build, manage and deploy applications in a completely different way."

+ +

With 400 development groups and diverse brands with varying business and technical needs, Pearson embraced Docker container technology so that each brand could experiment with building new types of content using their preferred technologies, and then deliver it using containers. Jackson chose Kubernetes orchestration "because of its flexibility, ease of management and the way it would improve our engineers' productivity," he says.

+ +

The team adopted Kubernetes when it was still version 1.2 and are still going strong now on 1.7; they use Terraform and Ansible to deploy it on to basic AWS primitives. "We were trying to understand how we can create value for Pearson from this technology," says Ben Somogyi, Principal Architect for the Cloud Platforms. "It turned out that Kubernetes' benefits are huge. We're trying to help our applications development teams that use our platform go faster, so we filled that gap with a CI/CD pipeline that builds their images for them, standardizes them, patches everything up, allows them to deploy their different environments onto the cluster, and obfuscating the details of how difficult the work underneath the covers is."

+ +{{< case-studies/quote + image="/images/case-studies/pearson/banner3.jpg" + author="Chris Jackson, Director for Cloud Platforms & SRE at Pearson" +>}} +"Your internal customers need to feel like they are choosing the very best option for them. We are experiencing this first hand in the growth of adoption. We are seeing triple-digit, year-on-year growth of the service." +{{< /case-studies/quote >}} + +

That work resulted in two tools for building and deploying applications in the cluster that Pearson has open sourced. "We're an education company, so we want to share what we can," says Somogyi.

+ +

Now that development teams no longer have to worry about infrastructure, there have been substantial improvements in productivity and speed of delivery. "In some cases, we've gone from nine months to provision physical assets in a data center to just a few minutes to provision and to get a new idea in front of a customer," says John Shirley, Lead Site Reliability Engineer for the Cloud Platform Team.

+ +

According to Jackson, the Cloud Platforms team can "provision a new proof-of-concept environment for a development team in minutes, and then they can take that to production as quickly as they are able to. This is the value proposition of all major technology services, and we had to compete like one to become our developers' preferred choice. Just because you work for the same company, you do not have the right to force people into a mediocre service. Your internal customers need to feel like they are choosing the very best option for them. We are experiencing this first hand in the growth of adoption. We are seeing triple-digit, year-on-year growth of the service."

+ +

Jackson estimates they've achieved a 15-20% boost in productivity for developer teams who adopt the platform. They also see a reduction in the number of customer-impacting incidents. Plus, says Jackson, "Teams who were previously limited to 1-2 releases per academic year can now ship code multiple times per day!"

+ +{{< case-studies/quote + image="/images/case-studies/pearson/banner4.jpg" + author="Chris Jackson, Director for Cloud Platforms & SRE at Pearson" +>}} +"Teams who were previously limited to 1-2 releases per academic year can now ship code multiple times per day!" +{{< /case-studies/quote >}} + +

Availability has also been positively impacted. The back-to-school period is the company's busiest time of year, and "you have to keep applications up," says Somogyi. Before, this was a pain point for the legacy infrastructure. Now, for the applications that have been migrated to the Kubernetes platform, "We have 100% uptime. We're not worried about 9s. There aren't any. It's 100%, which is pretty astonishing for us, compared to some of the existing platforms that have legacy challenges," says Shirley.

+ +

"You can't even begin to put a price on how much that saves the company," Jackson explains. "A reduction in the number of support cases takes load out of our operations. The customer sentiment of having a reliable product drives customer retention and growth. It frees us to think about investing more into our digital transformation and taking a better quality of education to a global scale."

+ +

The platform itself is also being broken down, "so we can quickly release smaller pieces of the platform, like upgrading our Kubernetes or all the different modules that make up our platform," says Somogyi. "One of the big focuses in 2018 is this scheme of delivery to update the platform itself."

+ +

Guided by Pearson's overarching goal of getting to 200 million users, the team has run internal tests of the platform's scalability. "We had a challenge: 28 million requests within a 10 minute period," says Shirley. "And we demonstrated that we can hit that, with an acceptable latency. We saw that we could actually get that pretty readily, and we scaled up in just a few seconds, using open source tools entirely. Shout out to Locustfor that one. So that's amazing."

+ +{{< case-studies/quote author="Benjamin Somogyi, Principal Systems Architect at Pearson" >}} +"We have 100% uptime. We're not worried about 9s. There aren't any. It's 100%, which is pretty astonishing for us, compared to some of the existing platforms that have legacy challenges. You can't even begin to put a price on how much that saves the company." +{{< /case-studies/quote >}} + +

In just two years, "We're already seeing tremendous benefits with Kubernetes—improved engineering productivity, faster delivery of applications and a simplified infrastructure," says Jackson. "But this is just the beginning. Kubernetes will help transform the way that educational content is delivered online."

+ +

So far, about 15 production products are running on the new platform, including Pearson's new flagship digital education service, the Global Learning Platform. The Cloud Platform team continues to prepare, onboard and support customers that are a good fit for the platform. Some existing products will be refactored into 12-factor apps, while others are being developed so that they can live on the platform from the get-go. "There are challenges with bringing in new customers of course, because we have to help them to see a different way of developing, a different way of building," says Shirley.

+ +

But, he adds, "It is our corporate motto: Always Learning. We encourage those teams that haven't started a cloud native journey, to see the future of technology, to learn, to explore. It will pique your interest. Keep learning."

diff --git a/content/en/case-studies/pingcap/index.html b/content/en/case-studies/pingcap/index.html index 8d032c7a8b..07c1e3e086 100644 --- a/content/en/case-studies/pingcap/index.html +++ b/content/en/case-studies/pingcap/index.html @@ -3,94 +3,77 @@ title: pingcap Case Study linkTitle: pingcap case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css featured: false + +new_case_study_styles: true +heading_background: /images/case-studies/pingcap/banner1.jpg +heading_title_logo: /images/pingcap_logo.png +subheading: > + PingCAP Bets on Cloud Native for Its TiDB Database Platform +case_study_details: + - Company: PingCAP + - Location: Beijing, China, and San Mateo, CA + - Industry: Software --- -
-

CASE STUDY:
PingCAP Bets on Cloud Native for Its TiDB Database Platform +

Challenge

-

+

PingCAP is the company leading the development of the popular open source NewSQL database TiDB, which is MySQL-compatible, can handle hybrid transactional and analytical processing (HTAP) workloads, and has a cloud native architectural design. "Having a hybrid multi-cloud product is an important part of our global go-to-market strategy," says Kevin Xu, General Manager of Global Strategy and Operations. In order to achieve that, the team had to address two challenges: "how to deploy, run, and manage a distributed stateful application, such as a distributed database like TiDB, in a containerized world," Xu says, and "how to deliver an easy-to-use, consistent, and reliable experience for our customers when they use TiDB in the cloud, any cloud, whether that's one cloud provider or a combination of different cloud environments." Knowing that using a distributed system isn't easy, they began looking for the right orchestration layer to help reduce some of that complexity for end users.

-
+

Solution

-
- Company  PingCAP     Location  Beijing, China, and San Mateo, CA     Industry  Software -
+

The team started looking at Kubernetes for orchestration early on. "We knew Kubernetes had the promise of helping us solve our problems," says Xu. "We were just waiting for it to mature." In early 2018, PingCAP began integrating Kubernetes into its internal development as well as in its TiDB product. At that point, the team has already had experience using other cloud native technologies, having integrated both Prometheus and gRPC as parts of the TiDB platform earlier on.

-
-
-
-
-

Challenge

- PingCAP is the company leading the development of the popular open source NewSQL database TiDB, which is MySQL-compatible, can handle hybrid transactional and analytical processing (HTAP) workloads, and has a cloud native architectural design. "Having a hybrid multi-cloud product is an important part of our global go-to-market strategy," says Kevin Xu, General Manager of Global Strategy and Operations. In order to achieve that, the team had to address two challenges: "how to deploy, run, and manage a distributed stateful application, such as a distributed database like TiDB, in a containerized world," Xu says, and "how to deliver an easy-to-use, consistent, and reliable experience for our customers when they use TiDB in the cloud, any cloud, whether that’s one cloud provider or a combination of different cloud environments." Knowing that using a distributed system isn’t easy, they began looking for the right orchestration layer to help reduce some of that complexity for end users. -

Solution

- The team started looking at Kubernetes for orchestration early on. "We knew Kubernetes had the promise of helping us solve our problems," says Xu. "We were just waiting for it to mature." In early 2018, PingCAP began integrating Kubernetes into its internal development as well as in its TiDB product. At that point, the team has already had experience using other cloud native technologies, having integrated both Prometheus and gRPC as parts of the TiDB platform earlier on. -

Impact

- Xu says that PingCAP customers have had a "very positive" response so far to Kubernetes being the tool to deploy and manage TiDB. Prometheus, with Grafana as the dashboard, is installed by default when customers deploy TiDB, so that they can monitor performance and make any adjustments needed to reach their target before and while deploying TiDB in production. That monitoring layer "makes the evaluation process and communication much smoother," says Xu. -

- With the company’s Kubernetes-based Operator implementation, which is open sourced, customers are now able to deploy, run, manage, upgrade, and maintain their TiDB clusters in the cloud with no downtime, and reduced workload, burden and overhead. And internally, says Xu, "we’ve completely switched to Kubernetes for our own development and testing, including our data center infrastructure and Schrodinger, an automated testing platform for TiDB. With Kubernetes, our resource usage is greatly improved. Our developers can allocate and deploy clusters themselves, and the deploying process has gone from hours to minutes, so we can devote fewer people to manage IDC resources. The productivity improvement is about 15%, and as we gain more Kubernetes knowledge on the debugging and diagnosis front, the productivity should improve to more than 20%." -
+

Xu says that PingCAP customers have had a "very positive" response so far to Kubernetes being the tool to deploy and manage TiDB. Prometheus, with Grafana as the dashboard, is installed by default when customers deploy TiDB, so that they can monitor performance and make any adjustments needed to reach their target before and while deploying TiDB in production. That monitoring layer "makes the evaluation process and communication much smoother," says Xu.

-
-
-
-
- "We knew Kubernetes had the promise of helping us solve our problems. We were just waiting for it to mature, so we can fold it into our own development and product roadmap." -

- KEVIN XU, GENERAL MANAGER OF GLOBAL STRATEGY AND OPERATIONS, PINGCAP
-
-
-
-
-

Since it was introduced in 2015, the open source NewSQL database TiDB has gained a following for its compatibility with MySQL, its ability to handle hybrid transactional and analytical processing (HTAP) workloads—and its cloud native architectural design.

- PingCAP, the company behind TiDB, designed the platform with cloud in mind from day one, says Kevin Xu, General Manager of Global Strategy and Operations, and "having a hybrid multi-cloud product is an important part of our global go-to-market strategy." -

- In order to achieve that, the team had to address two challenges: "how to deploy, run, and manage a distributed stateful application, such as a distributed database like TiDB, in a containerized world," Xu says, and "how to deliver an easy-to-use, consistent, and reliable experience for our customers when they use TiDB in the cloud, any cloud, whether that’s one cloud provider or a combination of different cloud environments." -

- Knowing that using a distributed system isn’t easy, the PingCAP team began looking for the right orchestration layer to help reduce some of that complexity for end users. Kubernetes had been on their radar for quite some time. "We knew Kubernetes had the promise of helping us solve our problems," says Xu. "We were just waiting for it to mature." -
-
-
-
- "With the governance process being so open, it’s not hard to find out what’s the latest development in the technology and community, or figure out who to reach out to if we have problems or issues."

- KEVIN XU, GENERAL MANAGER OF GLOBAL STRATEGY AND OPERATIONS, PINGCAP
+

With the company's Kubernetes-based Operator implementation, which is open sourced, customers are now able to deploy, run, manage, upgrade, and maintain their TiDB clusters in the cloud with no downtime, and reduced workload, burden and overhead. And internally, says Xu, "we've completely switched to Kubernetes for our own development and testing, including our data center infrastructure and Schrodinger, an automated testing platform for TiDB. With Kubernetes, our resource usage is greatly improved. Our developers can allocate and deploy clusters themselves, and the deploying process has gone from hours to minutes, so we can devote fewer people to manage IDC resources. The productivity improvement is about 15%, and as we gain more Kubernetes knowledge on the debugging and diagnosis front, the productivity should improve to more than 20%."

-
-
-
-
- That time came in early 2018, when PingCAP began integrating Kubernetes into its internal development as well as in its TiDB product. "Having Kubernetes be part of the CNCF, as opposed to having only the backing of one individual company, was valuable in having confidence in the longevity of the technology," says Xu. Plus, "with the governance process being so open, it’s not hard to find out what’s the latest development in the technology and community, or figure out who to reach out to if we have problems or issues." -

- TiDB’s cloud native architecture consists of a stateless SQL layer (also called TiDB) and a persistent key-value storage layer that supports distributed transactions (TiKV, which is now in the CNCF Sandbox), which are loosely coupled. "You can scale both out or in depending on your computation and storage needs, and the two scaling processes can happen independent of each other," says Xu. The PingCAP team also built the TiDB Operator based on Kubernetes, which helps bootstrap a TiDB cluster on any cloud environment and simplifies and automates deployment, scaling, scheduling, upgrades, and maintenance. The company also recently previewed its fully-managed TiDB Cloud offering. -
-
-
-
- "A cloud native infrastructure will not only save you money and allow you to be more in control of the infrastructure resources you consume, but also empower new product innovation, new experience for your users, and new business possibilities. It’s both a cost reducer and a money maker."

- KEVIN XU, GENERAL MANAGER OF GLOBAL STRATEGY AND OPERATIONS, PINGCAP
-
-
+{{< case-studies/quote author="KEVIN XU, GENERAL MANAGER OF GLOBAL STRATEGY AND OPERATIONS, PINGCAP" >}} +"We knew Kubernetes had the promise of helping us solve our problems. We were just waiting for it to mature, so we can fold it into our own development and product roadmap." +{{< /case-studies/quote >}} -
-
- The entire TiDB platform leverages Kubernetes and other cloud native technologies, including Prometheus for monitoring and gRPC for interservice communication. -

- So far, the customer response to the Kubernetes-enabled platform has been "very positive." Prometheus, with Grafana as the dashboard, is installed by default when customers deploy TiDB, so that they can monitor and make any adjustments needed to reach their performance requirements before deploying TiDB in production. That monitoring layer "makes the evaluation process and communication much smoother," says Xu. With the company’s Kubernetes-based Operator implementation, customers are now able to deploy, run, manage, upgrade, and maintain their TiDB clusters in the cloud with no downtime, and reduced workload, burden and overhead. -

- These technologies have also had an impact internally. "We’ve completely switched to Kubernetes for our own development and testing, including our data center infrastructure and Schrodinger, an automated testing platform for TiDB," says Xu. "With Kubernetes, our resource usage is greatly improved. Our developers can allocate and deploy clusters themselves, and the deploying process takes less time, so we can devote fewer people to manage IDC resources. -
+{{< case-studies/lead >}} +Since it was introduced in 2015, the open source NewSQL database TiDB has gained a following for its compatibility with MySQL, its ability to handle hybrid transactional and analytical processing (HTAP) workloads—and its cloud native architectural design. +{{< /case-studies/lead >}} -
-
-"The entire cloud native community, whether it’s Kubernetes, CNCF in general, or cloud native vendors like us, have all gained enough experience—and have the battle scars to prove it—and are ready to help you succeed."

- KEVIN XU, GENERAL MANAGER OF GLOBAL STRATEGY AND OPERATIONS, PINGCAP
-
+

PingCAP, the company behind TiDB, designed the platform with cloud in mind from day one, says Kevin Xu, General Manager of Global Strategy and Operations, and "having a hybrid multi-cloud product is an important part of our global go-to-market strategy."

-
- The productivity improvement is about 15%, and as we gain more Kubernetes knowledge on the debugging and diagnosis front, the productivity should improve to more than 20%." -

- Kubernetes is now a crucial part of PingCAP’s product roadmap. For anyone else considering going cloud native, Xu has this advice: "There’s no better time to get started," he says. "The entire cloud native community, whether it’s Kubernetes, CNCF in general, or cloud native vendors like us, have all gained enough experience—and have the battle scars to prove it—and are ready to help you succeed." -

- In fact, the PingCAP team has seen more and more customers moving toward a cloud native approach, and for good reason. "IT infrastructure is quickly evolving from a cost-center and afterthought, to the core competency and competitiveness of any company," says Xu. "A cloud native infrastructure will not only save you money and allow you to be more in control of the infrastructure resources you consume, but also empower new product innovation, new experience for your users, and new business possibilities. It’s both a cost reducer and a money maker." +

In order to achieve that, the team had to address two challenges: "how to deploy, run, and manage a distributed stateful application, such as a distributed database like TiDB, in a containerized world," Xu says, and "how to deliver an easy-to-use, consistent, and reliable experience for our customers when they use TiDB in the cloud, any cloud, whether that's one cloud provider or a combination of different cloud environments."

-
-
+

Knowing that using a distributed system isn't easy, the PingCAP team began looking for the right orchestration layer to help reduce some of that complexity for end users. Kubernetes had been on their radar for quite some time. "We knew Kubernetes had the promise of helping us solve our problems," says Xu. "We were just waiting for it to mature."

+ +{{< case-studies/quote + image="/images/case-studies/pingcap/banner3.jpg" + author="KEVIN XU, GENERAL MANAGER OF GLOBAL STRATEGY AND OPERATIONS, PINGCAP" +>}} +"With the governance process being so open, it's not hard to find out what's the latest development in the technology and community, or figure out who to reach out to if we have problems or issues." +{{< /case-studies/quote >}} + +

That time came in early 2018, when PingCAP began integrating Kubernetes into its internal development as well as in its TiDB product. "Having Kubernetes be part of the CNCF, as opposed to having only the backing of one individual company, was valuable in having confidence in the longevity of the technology," says Xu. Plus, "with the governance process being so open, it's not hard to find out what's the latest development in the technology and community, or figure out who to reach out to if we have problems or issues."

+ +

TiDB's cloud native architecture consists of a stateless SQL layer (also called TiDB) and a persistent key-value storage layer that supports distributed transactions (TiKV, which is now in the CNCF Sandbox), which are loosely coupled. "You can scale both out or in depending on your computation and storage needs, and the two scaling processes can happen independent of each other," says Xu. The PingCAP team also built the TiDB Operator based on Kubernetes, which helps bootstrap a TiDB cluster on any cloud environment and simplifies and automates deployment, scaling, scheduling, upgrades, and maintenance. The company also recently previewed its fully-managed TiDB Cloud offering.

+ +{{< case-studies/quote + image="/images/case-studies/pingcap/banner4.jpg" + author="KEVIN XU, GENERAL MANAGER OF GLOBAL STRATEGY AND OPERATIONS, PINGCAP" +>}} +"A cloud native infrastructure will not only save you money and allow you to be more in control of the infrastructure resources you consume, but also empower new product innovation, new experience for your users, and new business possibilities. It's both a cost reducer and a money maker." +{{< /case-studies/quote >}} + +

The entire TiDB platform leverages Kubernetes and other cloud native technologies, including Prometheus for monitoring and gRPC for interservice communication.

+ +

So far, the customer response to the Kubernetes-enabled platform has been "very positive." Prometheus, with Grafana as the dashboard, is installed by default when customers deploy TiDB, so that they can monitor and make any adjustments needed to reach their performance requirements before deploying TiDB in production. That monitoring layer "makes the evaluation process and communication much smoother," says Xu. With the company's Kubernetes-based Operator implementation, customers are now able to deploy, run, manage, upgrade, and maintain their TiDB clusters in the cloud with no downtime, and reduced workload, burden and overhead.

+ +

These technologies have also had an impact internally. "We've completely switched to Kubernetes for our own development and testing, including our data center infrastructure and Schrodinger, an automated testing platform for TiDB," says Xu. "With Kubernetes, our resource usage is greatly improved. Our developers can allocate and deploy clusters themselves, and the deploying process takes less time, so we can devote fewer people to manage IDC resources.

+ +{{< case-studies/quote author="KEVIN XU, GENERAL MANAGER OF GLOBAL STRATEGY AND OPERATIONS, PINGCAP" >}} +"The entire cloud native community, whether it's Kubernetes, CNCF in general, or cloud native vendors like us, have all gained enough experience—and have the battle scars to prove it—and are ready to help you succeed." +{{< /case-studies/quote >}} + +

The productivity improvement is about 15%, and as we gain more Kubernetes knowledge on the debugging and diagnosis front, the productivity should improve to more than 20%."

+ +

Kubernetes is now a crucial part of PingCAP's product roadmap. For anyone else considering going cloud native, Xu has this advice: "There's no better time to get started," he says. "The entire cloud native community, whether it's Kubernetes, CNCF in general, or cloud native vendors like us, have all gained enough experience—and have the battle scars to prove it—and are ready to help you succeed."

+ +

In fact, the PingCAP team has seen more and more customers moving toward a cloud native approach, and for good reason. "IT infrastructure is quickly evolving from a cost-center and afterthought, to the core competency and competitiveness of any company," says Xu. "A cloud native infrastructure will not only save you money and allow you to be more in control of the infrastructure resources you consume, but also empower new product innovation, new experience for your users, and new business possibilities. It's both a cost reducer and a money maker."

diff --git a/content/en/case-studies/pinterest/index.html b/content/en/case-studies/pinterest/index.html index e4be7031bb..b95fff054a 100644 --- a/content/en/case-studies/pinterest/index.html +++ b/content/en/case-studies/pinterest/index.html @@ -3,106 +3,82 @@ title: Pinterest Case Study linkTitle: Pinterest case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css featured: false weight: 30 quote: > We are in the position to run things at scale, in a public cloud environment, and test things out in way that a lot of people might not be able to do. + +new_case_study_styles: true +heading_background: /images/case-studies/pinterest/banner1.jpg +heading_title_logo: /images/pinterest_logo.png +subheading: > + Pinning Its Past, Present, and Future on Cloud Native +case_study_details: + - Company: Pinterest + - Location: San Francisco, California + - Industry: Web and Mobile App --- +

Challenge

-
-

CASE STUDY:
Pinning Its Past, Present, and Future on Cloud Native +

After eight years in existence, Pinterest had grown into 1,000 microservices and multiple layers of infrastructure and diverse set-up tools and platforms. In 2016 the company launched a roadmap towards a new compute platform, led by the vision of creating the fastest path from an idea to production, without making engineers worry about the underlying infrastructure.

-

+

Solution

-
- -
- Company  Pinterest     Location  San Francisco, California     Industry  Web and Mobile App -
- -
-
-
-
-

-

Challenge

- After eight years in existence, Pinterest had grown into 1,000 microservices and multiple layers of infrastructure and diverse set-up tools and platforms. In 2016 the company launched a roadmap towards a new compute platform, led by the vision of creating the fastest path from an idea to production, without making engineers worry about the underlying infrastructure. - -

- -

Solution

- The first phase involved moving services to Docker containers. Once these services went into production in early 2017, the team began looking at orchestration to help create efficiencies and manage them in a decentralized way. After an evaluation of various solutions, Pinterest went with Kubernetes. - -

+

The first phase involved moving services to Docker containers. Once these services went into production in early 2017, the team began looking at orchestration to help create efficiencies and manage them in a decentralized way. After an evaluation of various solutions, Pinterest went with Kubernetes.

Impact

- "By moving to Kubernetes the team was able to build on-demand scaling and new failover policies, in addition to simplifying the overall deployment and management of a complicated piece of infrastructure such as Jenkins," says Micheal Benedict, Product Manager for the Cloud and the Data Infrastructure Group at Pinterest. "We not only saw reduced build times but also huge efficiency wins. For instance, the team reclaimed over 80 percent of capacity during non-peak hours. As a result, the Jenkins Kubernetes cluster now uses 30 percent less instance-hours per-day when compared to the previous static cluster." -
-
+

"By moving to Kubernetes the team was able to build on-demand scaling and new failover policies, in addition to simplifying the overall deployment and management of a complicated piece of infrastructure such as Jenkins," says Micheal Benedict, Product Manager for the Cloud and the Data Infrastructure Group at Pinterest. "We not only saw reduced build times but also huge efficiency wins. For instance, the team reclaimed over 80 percent of capacity during non-peak hours. As a result, the Jenkins Kubernetes cluster now uses 30 percent less instance-hours per-day when compared to the previous static cluster."

-
-
- -


-"So far it’s been good, especially the elasticity around how we can configure our Jenkins workloads on that Kubernetes shared cluster. That is the win we were pushing for."

— Micheal Benedict, Product Manager for the Cloud and the Data Infrastructure Group at Pinterest
+{{< case-studies/quote author="Micheal Benedict, Product Manager for the Cloud and the Data Infrastructure Group at Pinterest" >}} + +
+"So far it's been good, especially the elasticity around how we can configure our Jenkins workloads on that Kubernetes shared cluster. That is the win we were pushing for." +{{< /case-studies/quote >}} -
-
-
-
-

Pinterest was born on the cloud—running on AWS since day one in 2010—but even cloud native companies can experience some growing pains. Since its launch, Pinterest has become a household name, with more than 200 million active monthly users and 100 billion objects saved. Underneath the hood, there are 1,000 microservices running and hundreds of thousands of data jobs.

-With such growth came layers of infrastructure and diverse set-up tools and platforms for the different workloads, resulting in an inconsistent and complex end-to-end developer experience, and ultimately less velocity to get to production. -So in 2016, the company launched a roadmap toward a new compute platform, led by the vision of having the fastest path from an idea to production, without making engineers worry about the underlying infrastructure.

-The first phase involved moving to Docker. "Pinterest has been heavily running on virtual machines, on EC2 instances directly, for the longest time," says Micheal Benedict, Product Manager for the Cloud and the Data Infrastructure Group. "To solve the problem around packaging software and not make engineers own portions of the fleet and those kinds of challenges, we standardized the packaging mechanism and then moved that to the container on top of the VM. Not many drastic changes. We didn’t want to boil the ocean at that point." +{{< case-studies/lead >}} +Pinterest was born on the cloud—running on AWS since day one in 2010—but even cloud native companies can experience some growing pains. +{{< /case-studies/lead >}} -
-
-
-
- "Though Kubernetes lacked certain things we wanted, we realized that by the time we get to productionizing many of those things, we’ll be able to leverage what the community is doing."

— MICHEAL BENEDICT, PRODUCT MANAGER FOR THE CLOUD AND THE DATA INFRASTRUCTURE GROUP AT PINTEREST
-
-
-
-
+

Since its launch, Pinterest has become a household name, with more than 200 million active monthly users and 100 billion objects saved. Underneath the hood, there are 1,000 microservices running and hundreds of thousands of data jobs.

-The first service that was migrated was the monolith API fleet that powers most of Pinterest. At the same time, Benedict’s infrastructure governance team built chargeback and capacity planning systems to analyze how the company uses its virtual machines on AWS. "It became clear that running on VMs is just not sustainable with what we’re doing," says Benedict. "A lot of resources were underutilized. There were efficiency efforts, which worked fine at a certain scale, but now you have to move to a more decentralized way of managing that. So orchestration was something we thought could help solve that piece."

-That led to the second phase of the roadmap. In July 2017, after an eight-week evaluation period, the team chose Kubernetes over other orchestration platforms. "Kubernetes lacked certain things at the time—for example, we wanted Spark on Kubernetes," says Benedict. "But we realized that the dev cycles we would put in to even try building that is well worth the outcome, both for Pinterest as well as the community. We’ve been in those conversations in the Big Data SIG. We realized that by the time we get to productionizing many of those things, we’ll be able to leverage what the community is doing."

-At the beginning of 2018, the team began onboarding its first use case into the Kubernetes system: Jenkins workloads. "Although we have builds happening during a certain period of the day, we always need to allocate peak capacity," says Benedict. "They don’t have any auto-scaling capabilities, so that capacity stays constant. It is difficult to speed up builds because ramping up takes more time. So given those kind of concerns, we thought that would be a perfect use case for us to work on." +

With such growth came layers of infrastructure and diverse set-up tools and platforms for the different workloads, resulting in an inconsistent and complex end-to-end developer experience, and ultimately less velocity to get to production. So in 2016, the company launched a roadmap toward a new compute platform, led by the vision of having the fastest path from an idea to production, without making engineers worry about the underlying infrastructure.

+

The first phase involved moving to Docker. "Pinterest has been heavily running on virtual machines, on EC2 instances directly, for the longest time," says Micheal Benedict, Product Manager for the Cloud and the Data Infrastructure Group. "To solve the problem around packaging software and not make engineers own portions of the fleet and those kinds of challenges, we standardized the packaging mechanism and then moved that to the container on top of the VM. Not many drastic changes. We didn't want to boil the ocean at that point."

-
-
-
-
-"So far it’s been good, especially the elasticity around how we can configure our Jenkins workloads on Kubernetes shared cluster. That is the win we were pushing for."

— MICHEAL BENEDICT, PRODUCT MANAGER FOR THE CLOUD AND THE DATA INFRASTRUCTURE GROUP AT PINTEREST
-
-
+{{< case-studies/quote + image="/images/case-studies/pinterest/banner3.jpg" + author="MICHEAL BENEDICT, PRODUCT MANAGER FOR THE CLOUD AND THE DATA INFRASTRUCTURE GROUP AT PINTEREST" +>}} +"Though Kubernetes lacked certain things we wanted, we realized that by the time we get to productionizing many of those things, we'll be able to leverage what the community is doing." +{{< /case-studies/quote >}} -
-
- They ramped up the cluster, and working with a team of four people, got the Jenkins Kubernetes cluster ready for production. "We still have our static Jenkins cluster," says Benedict, "but on Kubernetes, we are doing similar builds, testing the entire pipeline, getting the artifact ready and just doing the comparison to see, how much time did it take to build over here. Is the SLA okay, is the artifact generated correct, are there issues there?"

-"So far it’s been good," he adds, "especially the elasticity around how we can configure our Jenkins workloads on Kubernetes shared cluster. That is the win we were pushing for."

-By the end of Q1 2018, the team successfully migrated Jenkins Master to run natively on Kubernetes and also collaborated on the Jenkins Kubernetes Plugin to manage the lifecycle of workers. "We’re currently building the entire Pinterest JVM stack (one of the larger monorepos at Pinterest which was recently bazelized) on this new cluster," says Benedict. "At peak, we run thousands of pods on a few hundred nodes. Overall, by moving to Kubernetes the team was able to build on-demand scaling and new failover policies, in addition to simplifying the overall deployment and management of a complicated piece of infrastructure such as Jenkins. We not only saw reduced build times but also huge efficiency wins. For instance, the team reclaimed over 80 percent of capacity during non-peak hours. As a result, the Jenkins Kubernetes cluster now uses 30 percent less instance-hours per-day when compared to the previous static cluster." +

The first service that was migrated was the monolith API fleet that powers most of Pinterest. At the same time, Benedict's infrastructure governance team built chargeback and capacity planning systems to analyze how the company uses its virtual machines on AWS. "It became clear that running on VMs is just not sustainable with what we're doing," says Benedict. "A lot of resources were underutilized. There were efficiency efforts, which worked fine at a certain scale, but now you have to move to a more decentralized way of managing that. So orchestration was something we thought could help solve that piece."

+

That led to the second phase of the roadmap. In July 2017, after an eight-week evaluation period, the team chose Kubernetes over other orchestration platforms. "Kubernetes lacked certain things at the time—for example, we wanted Spark on Kubernetes," says Benedict. "But we realized that the dev cycles we would put in to even try building that is well worth the outcome, both for Pinterest as well as the community. We've been in those conversations in the Big Data SIG. We realized that by the time we get to productionizing many of those things, we'll be able to leverage what the community is doing."

-
+

At the beginning of 2018, the team began onboarding its first use case into the Kubernetes system: Jenkins workloads. "Although we have builds happening during a certain period of the day, we always need to allocate peak capacity," says Benedict. "They don't have any auto-scaling capabilities, so that capacity stays constant. It is difficult to speed up builds because ramping up takes more time. So given those kind of concerns, we thought that would be a perfect use case for us to work on."

-
-
- "We are in the position to run things at scale, in a public cloud environment, and test things out in way that a lot of people might not be able to do."

— MICHEAL BENEDICT, PRODUCT MANAGER FOR THE CLOUD AND THE DATA INFRASTRUCTURE GROUP AT PINTEREST
-
-
+{{< case-studies/quote + image="/images/case-studies/pinterest/banner4.jpg" + author="MICHEAL BENEDICT, PRODUCT MANAGER FOR THE CLOUD AND THE DATA INFRASTRUCTURE GROUP AT PINTEREST" +>}} +"So far it's been good, especially the elasticity around how we can configure our Jenkins workloads on Kubernetes shared cluster. That is the win we were pushing for." +{{< /case-studies/quote >}} -
- Benedict points to a "pretty robust roadmap" going forward. In addition to the Pinterest big data team’s experiments with Spark on Kubernetes, the company collaborated with Amazon’s EKS team on an ENI/CNI plug in.

-Once the Jenkins cluster is up and running out of dark mode, Benedict hopes to establish best practices, including having governance primitives established—including integration with the chargeback system—before moving on to migrating the next service. "We have a healthy pipeline of use-cases to be on-boarded. After Jenkins, we want to enable support for Tensorflow and Apache Spark. At some point, we aim to move the company’s monolithic API service. If we move that and understand the complexity around that, it builds our confidence," says Benedict. "It sets us up for migration of all our other services."

-After years of being a cloud native pioneer, Pinterest is eager to share its ongoing journey. "We are in the position to run things at scale, in a public cloud environment, and test things out in way that a lot of people might not be able to do," says Benedict. "We’re in a great position to contribute back some of those learnings." +

They ramped up the cluster, and working with a team of four people, got the Jenkins Kubernetes cluster ready for production. "We still have our static Jenkins cluster," says Benedict, "but on Kubernetes, we are doing similar builds, testing the entire pipeline, getting the artifact ready and just doing the comparison to see, how much time did it take to build over here. Is the SLA okay, is the artifact generated correct, are there issues there?"

+

"So far it's been good," he adds, "especially the elasticity around how we can configure our Jenkins workloads on Kubernetes shared cluster. That is the win we were pushing for."

+

By the end of Q1 2018, the team successfully migrated Jenkins Master to run natively on Kubernetes and also collaborated on the Jenkins Kubernetes Plugin to manage the lifecycle of workers. "We're currently building the entire Pinterest JVM stack (one of the larger monorepos at Pinterest which was recently bazelized) on this new cluster," says Benedict. "At peak, we run thousands of pods on a few hundred nodes. Overall, by moving to Kubernetes the team was able to build on-demand scaling and new failover policies, in addition to simplifying the overall deployment and management of a complicated piece of infrastructure such as Jenkins. We not only saw reduced build times but also huge efficiency wins. For instance, the team reclaimed over 80 percent of capacity during non-peak hours. As a result, the Jenkins Kubernetes cluster now uses 30 percent less instance-hours per-day when compared to the previous static cluster."

-
+{{< case-studies/quote author="MICHEAL BENEDICT, PRODUCT MANAGER FOR THE CLOUD AND THE DATA INFRASTRUCTURE GROUP AT PINTEREST">}} +"We are in the position to run things at scale, in a public cloud environment, and test things out in way that a lot of people might not be able to do." +{{< /case-studies/quote >}} -
+

Benedict points to a "pretty robust roadmap" going forward. In addition to the Pinterest big data team's experiments with Spark on Kubernetes, the company collaborated with Amazon's EKS team on an ENI/CNI plug in.

+ +

Once the Jenkins cluster is up and running out of dark mode, Benedict hopes to establish best practices, including having governance primitives established—including integration with the chargeback system—before moving on to migrating the next service. "We have a healthy pipeline of use-cases to be on-boarded. After Jenkins, we want to enable support for Tensorflow and Apache Spark. At some point, we aim to move the company's monolithic API service. If we move that and understand the complexity around that, it builds our confidence," says Benedict. "It sets us up for migration of all our other services."

+ +

After years of being a cloud native pioneer, Pinterest is eager to share its ongoing journey. "We are in the position to run things at scale, in a public cloud environment, and test things out in way that a lot of people might not be able to do," says Benedict. "We're in a great position to contribute back some of those learnings."

diff --git a/content/en/case-studies/prowise/index.html b/content/en/case-studies/prowise/index.html index 2f0beda5ae..b8b584aa41 100644 --- a/content/en/case-studies/prowise/index.html +++ b/content/en/case-studies/prowise/index.html @@ -3,97 +3,81 @@ title: Prowise Case Study linkTitle: prowise case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css featured: false + +new_case_study_styles: true +heading_background: /images/case-studies/prowise/banner1.jpg +heading_title_logo: /images/prowise_logo.png +subheading: > + Prowise: How Kubernetes is Enabling the Edtech Solution's Global Expansion +case_study_details: + - Company: Prowise + - Location: Budel, The Netherlands + - Industry: Edtech --- +

Challenge

-
-

CASE STUDY:
Prowise: How Kubernetes is Enabling the Edtech Solution’s Global Expansion -

+

A Dutch company that produces educational devices and software used around the world, Prowise had an infrastructure based on Linux services with multiple availability zones in Europe, Australia, and the U.S. "We've grown a lot in the past couple of years, and we started to encounter problems with versioning and flexible scaling," says Senior DevOps Engineer Victor van den Bosch, "not only scaling in demands, but also in being able to deploy multiple products which all have their own versions, their own development teams, and their own problems that they're trying to solve. To be able to put that all on the same platform without much resistance is what we were looking for. We wanted to future proof our infrastructure, and also solve some of the problems that are associated with just running a normal Linux service."

-
+

Solution

-
- Company  Prowise     Location  Budel, The Netherlands      Industry  Edtech -
+

The Prowise team adopted containerization, spent time improving its CI/CD pipelines, and chose Microsoft Azure's managed Kubernetes service, AKS, for orchestration. "Kubernetes solves things like networking really well, in a way that fits our business model," says van den Bosch. "We want to focus on our core products, and that's the software that runs on it and not necessarily the infrastructure itself."

-
-
-
-
- -

Challenge

- A Dutch company that produces educational devices and software used around the world, Prowise had an infrastructure based on Linux services with multiple availability zones in Europe, Australia, and the U.S. “We’ve grown a lot in the past couple of years, and we started to encounter problems with versioning and flexible scaling,” says Senior DevOps Engineer Victor van den Bosch, “not only scaling in demands, but also in being able to deploy multiple products which all have their own versions, their own development teams, and their own problems that they’re trying to solve. To be able to put that all on the same platform without much resistance is what we were looking for. We wanted to future proof our infrastructure, and also solve some of the problems that are associated with just running a normal Linux service.” -

Solution

- The Prowise team adopted containerization, spent time improving its CI/CD pipelines, and chose Microsoft Azure’s managed Kubernetes service, AKS, for orchestration. “Kubernetes solves things like networking really well, in a way that fits our business model,” says van den Bosch. “We want to focus on our core products, and that’s the software that runs on it and not necessarily the infrastructure itself.”

Impact

- With its first web-based applications now running in beta on Prowise’s Kubernetes platform, the team is seeing the benefits of rapid and smooth deployments. “The old way of deploying took half an hour of preparations and half an hour deploying it. With Kubernetes, it’s a couple of seconds,” says Senior Developer Bart Haalstra. As a result, adds van den Bosch, “We’ve gone from quarterly releases to a release every month in production. We’re pretty much deploying every hour or just when we find that a feature is ready for production; before, our releases were mostly done on off-hours, where it couldn’t impact our customers, as our confidence in the process was relatively low. Kubernetes has also enabled us to follow up quickly on bugs and implement tweaks to our users with zero downtime between versions. For some bugs we’ve pushed code fixes to production minutes after detection.” Recently, the team launched a new single sign-on solution for use in an internal application. “Due to the resource based architecture of the Kubernetes platform, we were able to bring that application into an entirely new production environment in less than a day, most of that time used for testing after applying the already well-known resource definitions from staging to the new environment,” says van den Bosch. “On a traditional VM this would have likely cost a day or two, and then probably a few weeks to iron out the kinks in our provisioning scripts as we apply updates.” -
-
-
-
-
- "Because of Kubernetes, things have been much easier, our individual applications are better, and we can spend more time on functional implementation. We do not want to go back." -

— VICTOR VAN DEN BOSCH, SENIOR DEVOPS ENGINEER, PROWISE
-
-
-
-
-

If you haven’t set foot in a school in awhile, you might be surprised by what you’d see in a digitally connected classroom these days: touchscreen monitors, laptops, tablets, touch tables, and more.

- One of the leaders in the space, the Dutch company Prowise, offers an integrated solution of hardware and software to help educators create a more engaging learning environment. -

- As the company expanded its offerings beyond the Netherlands in recent years—creating multiple availability zones in Europe, Australia, and the U.S., with as many as nine servers per zone—its Linux service-based infrastructure struggled to keep up. “We’ve grown a lot in the past couple of years, and we started to encounter problems with versioning and flexible scaling,” says Senior DevOps Engineer Victor van den Bosch, who was hired by the company in late 2017 to build a new platform. -

- Prowise’s products support ten languages, so the problem wasn’t just scaling in demands, he adds, “but also in being able to deploy multiple products which all have their own versions, their own development teams, and their own problems that they’re trying to solve. To be able to put that all on the same platform without much resistance is what we were looking for. We wanted to future proof our infrastructure, and also solve some of the problems that are associated with just running a normal Linux service.” -

- The company’s existing infrastructure on Microsoft Azure Cloud was all on virtual machines, “a pretty traditional setup,” van den Bosch says. “We decided that we want some features in our software that requires being able to scale quickly, being able to deploy new applications and versions on different versions of different programming languages quickly. And we didn’t really want the hassle of trying to keep those servers in a particular state.” -
-
-
-
- "You don’t have to go all-in immediately. You can just take a few projects, a service, run it alongside your more traditional stack, and build it up from there. Kubernetes scales, so as you add applications and services to it, it will scale with you. You don’t have to do it all at once, and that’s really a secret to everything, but especially true to Kubernetes."

— VICTOR VAN DEN BOSCH, SENIOR DEVOPS ENGINEER, PROWISE
-
-
-
-
- After researching possible solutions, he opted for containerization and Kubernetes orchestration. “Containerization is the future,” van den Bosch says. “Kubernetes solves things like networking really well, in a way that fits our business model. We want to focus on our core products, and that’s the software that runs on it and not necessarily the infrastructure itself.” Plus, the Prowise team liked that there was no vendor lock-in. “We don’t want to be limited to one platform,” he says. “We try not to touch products that are very proprietary and can’t be ported easily to another vendor.” -

- The time to market with Kubernetes was very short: The first web-based applications on the platform went into beta within a few months. That was largely made possible by van den Bosch’s decision to use Azure’s managed Kubernetes service, AKS. The team then had to figure out which components to keep and which to replace. Monitoring tools like New Relic were taken out “because they tend to become very expensive when you scale it to different availability zones, and it’s just not very maintainable,” he says. -

- A lot of work also went into improving Prowise’s CI/CD pipelines. “We wanted to make sure that the pipelines are automated and easy to use,” he says. “We have a lot of settings and configurations figured out for the pipelines, and it’s just applying those scripts and those configurations to new projects from here on out.” -

- With its first web-based applications now running in beta on Prowise’s Kubernetes platform, the team is seeing the benefits of rapid and smooth deployments. “The old way of deploying took half an hour of preparations and half an hour deploying it. With Kubernetes, it’s a couple of seconds,” says Senior Developer Bart Haalstra. As a result, adds van den Bosch, “We’ve gone from quarterly releases to a release every month in production. We’re pretty much deploying every hour or just when we find that a feature is ready for production. Before, our releases were mostly done on off-hours, where it couldn’t impact our customers, as our confidence the process itself was relatively low. With Kubernetes, we dare to deploy in the middle of a busy day with high confidence the deployment will succeed.” -
-
-
-
- "Kubernetes allows us to really consider the best tools for a problem. Want to have a full-fledged analytics application developed by a third party that is just right for your use case? Run it. Dabbling in machine learning and AI algorithms but getting tired of waiting days for training to complete? It takes only seconds to scale it. Got a stubborn developer that wants to use a programming language no one has heard of? Let him, if it runs in a container, of course. And all of that while your operations team/DevOps get to sleep at night."

- VICTOR VAN DEN BOSCH, SENIOR DEVOPS ENGINEER, PROWISE
-
-
+

With its first web-based applications now running in beta on Prowise's Kubernetes platform, the team is seeing the benefits of rapid and smooth deployments. "The old way of deploying took half an hour of preparations and half an hour deploying it. With Kubernetes, it's a couple of seconds," says Senior Developer Bart Haalstra. As a result, adds van den Bosch, "We've gone from quarterly releases to a release every month in production. We're pretty much deploying every hour or just when we find that a feature is ready for production; before, our releases were mostly done on off-hours, where it couldn't impact our customers, as our confidence in the process was relatively low. Kubernetes has also enabled us to follow up quickly on bugs and implement tweaks to our users with zero downtime between versions. For some bugs we've pushed code fixes to production minutes after detection." Recently, the team launched a new single sign-on solution for use in an internal application. "Due to the resource based architecture of the Kubernetes platform, we were able to bring that application into an entirely new production environment in less than a day, most of that time used for testing after applying the already well-known resource definitions from staging to the new environment," says van den Bosch. "On a traditional VM this would have likely cost a day or two, and then probably a few weeks to iron out the kinks in our provisioning scripts as we apply updates."

-
+{{< case-studies/quote author="VICTOR VAN DEN BOSCH, SENIOR DEVOPS ENGINEER, PROWISE" >}} +"Because of Kubernetes, things have been much easier, our individual applications are better, and we can spend more time on functional implementation. We do not want to go back." +{{< /case-studies/quote >}} -
- Plus, van den Bosch says, “Kubernetes has enabled us to follow up quickly on bugs and implement tweaks to our users with zero downtime between versions. For some bugs we’ve pushed code fixes to production minutes after detection.” -

- Recently, the team launched a new single sign-on solution for use in an internal application. “Due to the resource based architecture of the Kubernetes platform, we were able to bring that application into an entirely new production environment in less than a day, most of that time used for testing after applying the already well-known resource definitions from staging to the new environment,” says van den Bosch. “On a traditional VM this would have likely cost a day or two, and then probably a few weeks to iron out the kinks in our provisioning scripts as we apply updates.” -

- Legacy applications are also being moved to Kubernetes. Not long ago, the team needed to set up a Java-based application for compiling and running a frontend. “On a traditional VM, it would have taken quite a bit of time to set it up and keep it up to date, not to mention maintenance for that setup down the line,” says van den Bosch. Instead, it took less than half a day to Dockerize it and get it running on Kubernetes. “It was much easier, and we were able to save costs too because we didn’t have to spin up new VMs specially for it.” -
+{{< case-studies/lead >}} +If you haven't set foot in a school in awhile, you might be surprised by what you'd see in a digitally connected classroom these days: touchscreen monitors, laptops, tablets, touch tables, and more. +{{< /case-studies/lead >}} -
-
-"We’re really trying to deliver integrated solutions with our hardware and software and making it as easy as possible for users to use and collaborate from different places,” says van den Bosch. And, says Haalstra, “We cannot do it without Kubernetes."

- VICTOR VAN DEN BOSCH, SENIOR DEVOPS ENGINEER, PROWISE
-
+

One of the leaders in the space, the Dutch company Prowise, offers an integrated solution of hardware and software to help educators create a more engaging learning environment.

-
- Perhaps most importantly, van den Bosch says, “Kubernetes allows us to really consider the best tools for a problem and take full advantage of microservices architecture. Got a library in Node.js that excels at solving a certain problem? Use it. Want to have a full-fledged analytics application developed by a third party that is just right for your use case? Run it. Dabbling in machine learning and AI algorithms but getting tired of waiting days for training to complete? It takes only seconds to scale it. Got a stubborn developer that wants to use a programming language no one has heard of? Let him, if it runs in a container, of course. And all of that while your operations team/DevOps get to sleep at night.” -

- Looking ahead, all new web development, platforms, and APIs at Prowise will be on Kubernetes. One of the big greenfield projects is a platform for teachers and students that is launching for back-to-school season in September. Users will be able to log in and access a wide variety of educational applications. With the recent acquisition of the software company Oefenweb, Prowise plans to provide adaptive software that allows teachers to get an accurate view of their students’ progress and weak points, and automatically adjusts the difficulty level of assignments to suit individual students. “We will be leveraging Kubernetes’ power to integrate, supplement, and support our combined application portfolio and bring our solutions to more classrooms,” says van den Bosch. -

- Collaborative software is also a priority. With the single sign-in software, users’ settings and credentials are saved in the cloud and can be used on any screen in the world. “We’re really trying to deliver integrated solutions with our hardware and software and making it as easy as possible for users to use and collaborate from different places,” says van den Bosch. And, says Haalstra, “We cannot do it without Kubernetes.” -
+

As the company expanded its offerings beyond the Netherlands in recent years—creating multiple availability zones in Europe, Australia, and the U.S., with as many as nine servers per zone—its Linux service-based infrastructure struggled to keep up. "We've grown a lot in the past couple of years, and we started to encounter problems with versioning and flexible scaling," says Senior DevOps Engineer Victor van den Bosch, who was hired by the company in late 2017 to build a new platform.

-
+

Prowise's products support ten languages, so the problem wasn't just scaling in demands, he adds, "but also in being able to deploy multiple products which all have their own versions, their own development teams, and their own problems that they're trying to solve. To be able to put that all on the same platform without much resistance is what we were looking for. We wanted to future proof our infrastructure, and also solve some of the problems that are associated with just running a normal Linux service."

+ +

The company's existing infrastructure on Microsoft Azure Cloud was all on virtual machines, "a pretty traditional setup," van den Bosch says. "We decided that we want some features in our software that requires being able to scale quickly, being able to deploy new applications and versions on different versions of different programming languages quickly. And we didn't really want the hassle of trying to keep those servers in a particular state."

+ +{{< case-studies/quote + image="/images/case-studies/prowise/banner3.jpg" + author="VICTOR VAN DEN BOSCH, SENIOR DEVOPS ENGINEER, PROWISE" +>}} +"You don't have to go all-in immediately. You can just take a few projects, a service, run it alongside your more traditional stack, and build it up from there. Kubernetes scales, so as you add applications and services to it, it will scale with you. You don't have to do it all at once, and that's really a secret to everything, but especially true to Kubernetes." +{{< /case-studies/quote >}} + +

After researching possible solutions, he opted for containerization and Kubernetes orchestration. "Containerization is the future," van den Bosch says. "Kubernetes solves things like networking really well, in a way that fits our business model. We want to focus on our core products, and that's the software that runs on it and not necessarily the infrastructure itself." Plus, the Prowise team liked that there was no vendor lock-in. "We don't want to be limited to one platform," he says. "We try not to touch products that are very proprietary and can't be ported easily to another vendor."

+ +

The time to market with Kubernetes was very short: The first web-based applications on the platform went into beta within a few months. That was largely made possible by van den Bosch's decision to use Azure's managed Kubernetes service, AKS. The team then had to figure out which components to keep and which to replace. Monitoring tools like New Relic were taken out "because they tend to become very expensive when you scale it to different availability zones, and it's just not very maintainable," he says.

+ +

A lot of work also went into improving Prowise's CI/CD pipelines. "We wanted to make sure that the pipelines are automated and easy to use," he says. "We have a lot of settings and configurations figured out for the pipelines, and it's just applying those scripts and those configurations to new projects from here on out."

+ +

With its first web-based applications now running in beta on Prowise's Kubernetes platform, the team is seeing the benefits of rapid and smooth deployments. "The old way of deploying took half an hour of preparations and half an hour deploying it. With Kubernetes, it's a couple of seconds," says Senior Developer Bart Haalstra. As a result, adds van den Bosch, "We've gone from quarterly releases to a release every month in production. We're pretty much deploying every hour or just when we find that a feature is ready for production. Before, our releases were mostly done on off-hours, where it couldn't impact our customers, as our confidence the process itself was relatively low. With Kubernetes, we dare to deploy in the middle of a busy day with high confidence the deployment will succeed."

+ +{{< case-studies/quote + image="/images/case-studies/prowise/banner4.jpg" + author="VICTOR VAN DEN BOSCH, SENIOR DEVOPS ENGINEER, PROWISE" +>}} +"Kubernetes allows us to really consider the best tools for a problem. Want to have a full-fledged analytics application developed by a third party that is just right for your use case? Run it. Dabbling in machine learning and AI algorithms but getting tired of waiting days for training to complete? It takes only seconds to scale it. Got a stubborn developer that wants to use a programming language no one has heard of? Let him, if it runs in a container, of course. And all of that while your operations team/DevOps get to sleep at night." +{{< /case-studies/quote >}} + +

Plus, van den Bosch says, "Kubernetes has enabled us to follow up quickly on bugs and implement tweaks to our users with zero downtime between versions. For some bugs we've pushed code fixes to production minutes after detection."

+ +

Recently, the team launched a new single sign-on solution for use in an internal application. "Due to the resource based architecture of the Kubernetes platform, we were able to bring that application into an entirely new production environment in less than a day, most of that time used for testing after applying the already well-known resource definitions from staging to the new environment," says van den Bosch. "On a traditional VM this would have likely cost a day or two, and then probably a few weeks to iron out the kinks in our provisioning scripts as we apply updates."

+ +

Legacy applications are also being moved to Kubernetes. Not long ago, the team needed to set up a Java-based application for compiling and running a frontend. "On a traditional VM, it would have taken quite a bit of time to set it up and keep it up to date, not to mention maintenance for that setup down the line," says van den Bosch. Instead, it took less than half a day to Dockerize it and get it running on Kubernetes. "It was much easier, and we were able to save costs too because we didn't have to spin up new VMs specially for it."

+ +{{< case-studies/quote author="VICTOR VAN DEN BOSCH, SENIOR DEVOPS ENGINEER, PROWISE" >}} +"We're really trying to deliver integrated solutions with our hardware and software and making it as easy as possible for users to use and collaborate from different places," says van den Bosch. And, says Haalstra, "We cannot do it without Kubernetes." +{{< /case-studies/quote >}} + +

Perhaps most importantly, van den Bosch says, "Kubernetes allows us to really consider the best tools for a problem and take full advantage of microservices architecture. Got a library in Node.js that excels at solving a certain problem? Use it. Want to have a full-fledged analytics application developed by a third party that is just right for your use case? Run it. Dabbling in machine learning and AI algorithms but getting tired of waiting days for training to complete? It takes only seconds to scale it. Got a stubborn developer that wants to use a programming language no one has heard of? Let him, if it runs in a container, of course. And all of that while your operations team/DevOps get to sleep at night."

+ +

Looking ahead, all new web development, platforms, and APIs at Prowise will be on Kubernetes. One of the big greenfield projects is a platform for teachers and students that is launching for back-to-school season in September. Users will be able to log in and access a wide variety of educational applications. With the recent acquisition of the software company Oefenweb, Prowise plans to provide adaptive software that allows teachers to get an accurate view of their students' progress and weak points, and automatically adjusts the difficulty level of assignments to suit individual students. "We will be leveraging Kubernetes' power to integrate, supplement, and support our combined application portfolio and bring our solutions to more classrooms," says van den Bosch.

+ +

Collaborative software is also a priority. With the single sign-in software, users' settings and credentials are saved in the cloud and can be used on any screen in the world. "We're really trying to deliver integrated solutions with our hardware and software and making it as easy as possible for users to use and collaborate from different places," says van den Bosch. And, says Haalstra, "We cannot do it without Kubernetes."

diff --git a/content/en/case-studies/ricardo-ch/index.html b/content/en/case-studies/ricardo-ch/index.html index 2863ceac75..26a1975ab4 100644 --- a/content/en/case-studies/ricardo-ch/index.html +++ b/content/en/case-studies/ricardo-ch/index.html @@ -3,96 +3,77 @@ title: ricardo.ch Case Study linkTitle: ricardo-ch case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css featured: false + +new_case_study_styles: true +heading_background: /images/case-studies/ricardoch/banner1.png +heading_title_logo: /images/ricardoch_logo.png +subheading: > + ricardo.ch: How Kubernetes Improved Velocity and DevOps Harmony +case_study_details: + - Company: ricardo.ch + - Location: Zurich, Switzerland + - Industry: E-commerce --- -
-

CASE STUDY:
ricardo.ch: How Kubernetes Improved Velocity and DevOps Harmony -

+

Challenge

-
+

A Swiss online marketplace, ricardo.ch was experiencing problems with velocity, as well as a "classic gap" between Development and Operations, with the two sides unable to work well together. "They wanted to, but they didn't have common ground," says Cedric Meury, Head of Platform Engineering. "This was one of the root causes that slowed us down." The company began breaking down the legacy monolith into microservices, and needed orchestration to support the new architecture in its own data centers—as well as bring together Dev and Ops.

-
- Company  ricardo.ch     Location  Zurich, Switzerland      Industry  E-commerce -
+

Solution

-
-
-
-
+

The company adopted Kubernetes for cluster management, Prometheus for monitoring, and Fluentd for logging. The first cluster was deployed on premise in December 2016, with the first service in production three months later. The migration is about half done, and the company plans to move completely to Google Cloud Platform by the end of 2018.

-

Challenge

- A Swiss online marketplace, ricardo.ch was experiencing problems with velocity, as well as a "classic gap" between Development and Operations, with the two sides unable to work well together. "They wanted to, but they didn’t have common ground," says Cedric Meury, Head of Platform Engineering. "This was one of the root causes that slowed us down." The company began breaking down the legacy monolith into microservices, and needed orchestration to support the new architecture in its own data centers—as well as bring together Dev and Ops. -

Solution

- The company adopted Kubernetes for cluster management, Prometheus for monitoring, and Fluentd for logging. The first cluster was deployed on premise in December 2016, with the first service in production three months later. The migration is about half done, and the company plans to move completely to Google Cloud Platform by the end of 2018.

Impact

- Splitting up the monolith into microservices "allowed higher velocity, and Kubernetes was crucial to support that," says Meury. The number of deployments to production has gone from fewer than 10 a week to 30-60 per day. Before, "when there was a problem with something in production, tickets or complaints would be thrown over the wall to operations, the classical problem. Now, people have the chance to look into operations and troubleshoot for themselves first because everything is deployed in a standardized way," says Meury. He sees the impact in everyday interactions: "A couple of weeks ago, I saw a product manager doing a pull request for a JSON file that contains some variables, and someone else accepted it. And it was deployed after a couple of minutes or seconds even, which was unthinkable before. There used to be quite a chain of things that needed to happen, the whole monolith was difficult to understand, even for engineers. So, previously requests would go into large, inefficient Kanban boards and hopefully someone will have done the change after weeks and months." Before, infrastructure- and platform-related projects took months or years to complete; now developers and operators can work together to deploy infrastructure parts via Kubernetes in a matter of weeks and sometimes days. In the long run, the company also expects to notch 50% cost savings going from custom data center and virtual machines to containerized infrastructure and cloud services. -
-
-
-
-
- "Splitting up the monolith allowed higher velocity, and Kubernetes was crucial to support that. Containerization and orchestration by Kubernetes helped us to drastically reduce the conflict between Dev and Ops and also allowed us to speak the same language on both sides of the aisle." -

— CEDRIC MEURY, HEAD OF PLATFORM ENGINEERING, RICARDO.CH
-
-
-
-
-

When Cedric Meury joined ricardo.ch in 2016, he saw a clear divide between Operations and Development. In fact, there was literal distance between them: The engineering team worked in France, while the rest of the org was based in Switzerland. -



- "It was a classic gap between those departments and even some anger and frustration here and there," says Meury. "They wanted to work together, but they didn’t have common ground. This was one of the root causes that slowed us down." -

- That gap was hurting velocity at ricardo.ch, a Swiss online marketplace. The website processes up to 2.6 million searches on a peak day from both web and mobile apps, serving 3.2 million members with its live auctions. The technology team’s main challenge was to make sure that "the bids for items come in the right order, and before the auction is finished, and that this works in a fair way," says Meury. "We have a real-time requirement. We also provide an automated system to bid, and it needs to be accurate and correct. With a distributed system, you have the challenge of making sure that the ordering is right. And that’s one of the things we’re currently dealing with." -

- To address the velocity issue, ricardo.ch CTO Jeremy Seitz established a new software factory called EPD, which consists of 65 engineers, 7 product managers and 2 designers. "We brought these three departments together so that they can kind of streamline this and talk to each other much more closely," says Meury. -
-
-
-
- "Being in the End User Community demonstrates that we stand behind these technologies. In Switzerland, if all the companies see that ricardo.ch’s using it, I think that will help adoption. I also like that we’re connected to the other end users, so if there is a really heavy problem, I could go to the Slack channel, and say, ‘Hey, you guys…’ Like Reddit, Github and New York Times or whoever can give a recommendation on what to use here or how to solve that. So that’s kind of a superpower."

— CEDRIC MEURY, HEAD OF PLATFORM ENGINEERING, RICARDO.CH
-
-
-
-
+

Splitting up the monolith into microservices "allowed higher velocity, and Kubernetes was crucial to support that," says Meury. The number of deployments to production has gone from fewer than 10 a week to 30-60 per day. Before, "when there was a problem with something in production, tickets or complaints would be thrown over the wall to operations, the classical problem. Now, people have the chance to look into operations and troubleshoot for themselves first because everything is deployed in a standardized way," says Meury. He sees the impact in everyday interactions: "A couple of weeks ago, I saw a product manager doing a pull request for a JSON file that contains some variables, and someone else accepted it. And it was deployed after a couple of minutes or seconds even, which was unthinkable before. There used to be quite a chain of things that needed to happen, the whole monolith was difficult to understand, even for engineers. So, previously requests would go into large, inefficient Kanban boards and hopefully someone will have done the change after weeks and months." Before, infrastructure- and platform-related projects took months or years to complete; now developers and operators can work together to deploy infrastructure parts via Kubernetes in a matter of weeks and sometimes days. In the long run, the company also expects to notch 50% cost savings going from custom data center and virtual machines to containerized infrastructure and cloud services.

- The company also began breaking down the legacy monolith into more than 100 microservices, and needed orchestration to support the new architecture in its own data centers. "Splitting up the monolith allowed higher velocity, and Kubernetes was crucial to support that," says Meury. "Containerization and orchestration by Kubernetes helped us to drastically reduce the conflict between Dev and Ops and also allowed us to speak the same language on both sides of the aisle." -

- Meury put together a platform engineering team to choose the tools—including Fluentd for logging and Prometheus for monitoring, with Grafana visualization—and lay the groundwork for the first Kubernetes cluster, which was installed on premise in December 2016. Within a few weeks, the new platform was available to teams, who were given training sessions and documentation. The platform engineering team then embedded with engineers to help them deploy their applications on the new platform. The first service in production was the ricardo.ch jobs page. "It was an exercise in front-end development, so the developers could experiment with a new stack," says Meury. -

- Meury estimates that half of the application has been migrated to Kubernetes. And the plan is to move everything to the Google Cloud Platform by the end of 2018. "We are still running some servers in our own data centers, but all of the containerization efforts and describing our services as Kubernetes manifests will allow us to quite easily make that shift," says Meury. -
-
-
-
- "One of the core moments was when a front-end developer asked me how to do a port forward from his laptop to a front-end application to debug, and I told him the command. And he was like, ‘Wow, that’s all I need to do?’ He was super excited and happy about it. That showed me that this power in the right hands can just accelerate development." -

- CEDRIC MEURY, HEAD OF PLATFORM ENGINEERING, RICARDO.CH
-
-
+{{< case-studies/quote author="CEDRIC MEURY, HEAD OF PLATFORM ENGINEERING, RICARDO.CH" >}} +"Splitting up the monolith allowed higher velocity, and Kubernetes was crucial to support that. Containerization and orchestration by Kubernetes helped us to drastically reduce the conflict between Dev and Ops and also allowed us to speak the same language on both sides of the aisle." +{{< /case-studies/quote >}} -
-
- The impact has been great. Moving from custom data center and virtual machines to containerized infrastructure and cloud services is expected to result in 50% cost savings for the company. The number of deployments to production has gone from fewer than 10 a week to 30-60 per day. Before, "when there was a problem with something in production, tickets or complaints would be thrown over the wall to operations, the classical problem," says Meury. "Now, people have the chance to look into operations and troubleshoot for themselves first because everything is deployed in a standardized way. That reduces time and uncertainty." -

- Meury also sees the impact in everyday interactions: "A couple of weeks ago, I saw a product manager doing a pull request for a JSON file that contains some variables, and someone else accepted it. And it was deployed after a couple of minutes or seconds even, which was unthinkable before. There used to be quite a chain of things that needed to happen, the whole monolith was difficult to understand, even for engineers. So, previously requests would go into large, inefficient Kanban boards and hopefully someone will have done the change after weeks and months." -

- The divide between Dev and Ops has also diminished. "After a couple of months, I got requests by people saying, ‘Hey, could you help me install the Kubernetes client? I want to actually look at what’s going on,’" says Meury. "People were directly looking at the state of the system, bringing them much, much closer to the operations." Before, infrastructure- and platform-related projects took months or years to complete; now developers and operators can work together to deploy infrastructure parts via Kubernetes in a matter of weeks and sometimes days. -
+{{< case-studies/lead >}} +When Cedric Meury joined ricardo.ch in 2016, he saw a clear divide between Operations and Development. In fact, there was literal distance between them: The engineering team worked in France, while the rest of the org was based in Switzerland. +{{< /case-studies/lead >}} -
-
-"One of my colleagues was listening to all the talks at KubeCon, and he was overwhelmed by all the tools, technologies, frameworks out there that are currently lacking on our platform, but at the same time, he’s very happy to know that in the future there is so much that we can still explore and we can improve and we can work on."

- CEDRIC MEURY, HEAD OF PLATFORM ENGINEERING, RICARDO.CH
-
+

"It was a classic gap between those departments and even some anger and frustration here and there," says Meury. "They wanted to work together, but they didn't have common ground. This was one of the root causes that slowed us down."

-
+

That gap was hurting velocity at ricardo.ch, a Swiss online marketplace. The website processes up to 2.6 million searches on a peak day from both web and mobile apps, serving 3.2 million members with its live auctions. The technology team's main challenge was to make sure that "the bids for items come in the right order, and before the auction is finished, and that this works in a fair way," says Meury. "We have a real-time requirement. We also provide an automated system to bid, and it needs to be accurate and correct. With a distributed system, you have the challenge of making sure that the ordering is right. And that's one of the things we're currently dealing with."

+

To address the velocity issue, ricardo.ch CTO Jeremy Seitz established a new software factory called EPD, which consists of 65 engineers, 7 product managers and 2 designers. "We brought these three departments together so that they can kind of streamline this and talk to each other much more closely," says Meury.

- The ability to have insight into the system has extended to other parts of the company, too. "I found out that one of our customer support representatives looks at Grafana metrics to find out whether the system is running fine, which is fantastic," says Meury. "Prometheus is directly hooked into customer care." -

- The ricardo.ch cloud native journey has perhaps had the most impact on the Ops team. "We have an operations team that comes from a hardware-based background, and right now they are relearning how to operate in a more virtualized and cloud native world, with great success so far," says Meury. "So besides still operating on-site data center firewalls, they learn to code in Go or do some Python scripting at the same time. Former network administrators are writing Go code. It’s just really cool. -

- For Meury, the journey boils down to this. "One of my colleagues was listening to all the talks at KubeCon, and he was overwhelmed by all the tools, technologies, frameworks out there that are currently lacking on our platform," says Meury. "But at the same time, he’s very happy to know that in the future there is so much that we can still explore and we can improve and we can work on. We’re transitioning from seeing problems everywhere—like, ‘This is broken’ or ‘This is down, and we have to fix it’—more to, ‘How can we actually improve and automate more, and make it nicer for developers and ultimately for the end users?’" -
+{{< case-studies/quote + image="/images/case-studies/ricardoch/banner3.png" + author="CEDRIC MEURY, HEAD OF PLATFORM ENGINEERING, RICARDO.CH" +>}} +"Being in the End User Community demonstrates that we stand behind these technologies. In Switzerland, if all the companies see that ricardo.ch's using it, I think that will help adoption. I also like that we're connected to the other end users, so if there is a really heavy problem, I could go to the Slack channel, and say, 'Hey, you guys…' Like Reddit, Github and New York Times or whoever can give a recommendation on what to use here or how to solve that. So that's kind of a superpower." +{{< /case-studies/quote >}} -
+

The company also began breaking down the legacy monolith into more than 100 microservices, and needed orchestration to support the new architecture in its own data centers. "Splitting up the monolith allowed higher velocity, and Kubernetes was crucial to support that," says Meury. "Containerization and orchestration by Kubernetes helped us to drastically reduce the conflict between Dev and Ops and also allowed us to speak the same language on both sides of the aisle."

+ +

Meury put together a platform engineering team to choose the tools—including Fluentd for logging and Prometheus for monitoring, with Grafana visualization—and lay the groundwork for the first Kubernetes cluster, which was installed on premise in December 2016. Within a few weeks, the new platform was available to teams, who were given training sessions and documentation. The platform engineering team then embedded with engineers to help them deploy their applications on the new platform. The first service in production was the ricardo.ch jobs page. "It was an exercise in front-end development, so the developers could experiment with a new stack," says Meury.

+ +

Meury estimates that half of the application has been migrated to Kubernetes. And the plan is to move everything to the Google Cloud Platform by the end of 2018. "We are still running some servers in our own data centers, but all of the containerization efforts and describing our services as Kubernetes manifests will allow us to quite easily make that shift," says Meury.

+ +{{< case-studies/quote + image="/images/case-studies/ricardoch/banner4.png" + author="CEDRIC MEURY, HEAD OF PLATFORM ENGINEERING, RICARDO.CH" +>}} +"One of the core moments was when a front-end developer asked me how to do a port forward from his laptop to a front-end application to debug, and I told him the command. And he was like, 'Wow, that's all I need to do?' He was super excited and happy about it. That showed me that this power in the right hands can just accelerate development." +{{< /case-studies/quote >}} + +

The impact has been great. Moving from custom data center and virtual machines to containerized infrastructure and cloud services is expected to result in 50% cost savings for the company. The number of deployments to production has gone from fewer than 10 a week to 30-60 per day. Before, "when there was a problem with something in production, tickets or complaints would be thrown over the wall to operations, the classical problem," says Meury. "Now, people have the chance to look into operations and troubleshoot for themselves first because everything is deployed in a standardized way. That reduces time and uncertainty."

+ +

Meury also sees the impact in everyday interactions: "A couple of weeks ago, I saw a product manager doing a pull request for a JSON file that contains some variables, and someone else accepted it. And it was deployed after a couple of minutes or seconds even, which was unthinkable before. There used to be quite a chain of things that needed to happen, the whole monolith was difficult to understand, even for engineers. So, previously requests would go into large, inefficient Kanban boards and hopefully someone will have done the change after weeks and months."

+ +

The divide between Dev and Ops has also diminished. "After a couple of months, I got requests by people saying, 'Hey, could you help me install the Kubernetes client? I want to actually look at what's going on,'" says Meury. "People were directly looking at the state of the system, bringing them much, much closer to the operations." Before, infrastructure- and platform-related projects took months or years to complete; now developers and operators can work together to deploy infrastructure parts via Kubernetes in a matter of weeks and sometimes days.

+ +{{< case-studies/quote author="CEDRIC MEURY, HEAD OF PLATFORM ENGINEERING, RICARDO.CH" >}} +"One of my colleagues was listening to all the talks at KubeCon, and he was overwhelmed by all the tools, technologies, frameworks out there that are currently lacking on our platform, but at the same time, he's very happy to know that in the future there is so much that we can still explore and we can improve and we can work on." +{{< /case-studies/quote >}} + +

The ability to have insight into the system has extended to other parts of the company, too. "I found out that one of our customer support representatives looks at Grafana metrics to find out whether the system is running fine, which is fantastic," says Meury. "Prometheus is directly hooked into customer care."

+ +

The ricardo.ch cloud native journey has perhaps had the most impact on the Ops team. "We have an operations team that comes from a hardware-based background, and right now they are relearning how to operate in a more virtualized and cloud native world, with great success so far," says Meury. "So besides still operating on-site data center firewalls, they learn to code in Go or do some Python scripting at the same time. Former network administrators are writing Go code. It's just really cool.

+ +

For Meury, the journey boils down to this. "One of my colleagues was listening to all the talks at KubeCon, and he was overwhelmed by all the tools, technologies, frameworks out there that are currently lacking on our platform," says Meury. "But at the same time, he's very happy to know that in the future there is so much that we can still explore and we can improve and we can work on. We're transitioning from seeing problems everywhere—like, 'This is broken' or 'This is down, and we have to fix it'—more to, 'How can we actually improve and automate more, and make it nicer for developers and ultimately for the end users?'"

diff --git a/content/en/case-studies/slamtec/index.html b/content/en/case-studies/slamtec/index.html index 86ebe15f91..e34bf2bd08 100644 --- a/content/en/case-studies/slamtec/index.html +++ b/content/en/case-studies/slamtec/index.html @@ -3,86 +3,69 @@ title: Slamtec Case Study linkTitle: slamtec case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css featured: false + +new_case_study_styles: true +heading_background: /images/case-studies/slamtec/banner1.jpg +heading_title_logo: /images/slamtec_logo.png +case_study_details: + - Company: Slamtec + - Location: Shanghai, China + - Industry: Robotics --- -
-

CASE STUDY:

-

-
+

Challenge

-
- Company  Slamtec     Location  Shanghai, China     Industry  Robotics -
+

Founded in 2013, SLAMTEC provides service robot autonomous localization and navigation solutions. The company's strength lies in its R&D team's ability to quickly introduce, and continually iterate on, its core products. In the past few years, the company, which had a legacy infrastructure based on Alibaba Cloud and VMware vSphere, began looking to build its own stable and reliable container cloud platform to host its Internet of Things applications. "Our needs for the cloud platform included high availability, scalability and security; multi-granularity monitoring alarm capability; friendliness to containers and microservices; and perfect CI/CD support," says Benniu Ji, Director of Cloud Computing Business Division.

-
-
-
-
-

Challenge

- Founded in 2013, SLAMTEC provides service robot autonomous localization and navigation solutions. The company’s strength lies in its R&D team’s ability to quickly introduce, and continually iterate on, its core products. In the past few years, the company, which had a legacy infrastructure based on Alibaba Cloud and VMware vSphere, began looking to build its own stable and reliable container cloud platform to host its Internet of Things applications. "Our needs for the cloud platform included high availability, scalability and security; multi-granularity monitoring alarm capability; friendliness to containers and microservices; and perfect CI/CD support," says Benniu Ji, Director of Cloud Computing Business Division. +

Solution

+ +

Ji's team chose Kubernetes for orchestration. "CNCF brings quality assurance and a complete ecosystem for Kubernetes, which is very important for the wide application of Kubernetes," says Ji. Thus Slamtec decided to adopt other CNCF projects as well: Prometheus monitoring, Fluentd logging, Harbor registry, and Helm package manager.

-

Solution

- Ji’s team chose Kubernetes for orchestration. "CNCF brings quality assurance and a complete ecosystem for Kubernetes, which is very important for the wide application of Kubernetes," says Ji. Thus Slamtec decided to adopt other CNCF projects as well: Prometheus monitoring, Fluentd logging, Harbor registry, and Helm package manager. -

Impact

- With the new platform, Ji reports that Slamtec has experienced "18+ months of 100% stability!" For users, there is now zero service downtime and seamless upgrades. "Kubernetes with third-party service mesh integration (Istio, along with Jaeger and Envoy) significantly reduced the microservice configuration and maintenance efforts by 50%," he adds. With centralized metrics monitoring and log aggregation provided by Prometheus on Fluentd, teams are saving 50% of time spent on troubleshooting and debugging. Harbor replication has allowed production/staging/testing environments to cross public cloud and the private Kubernetes cluster to share the same container registry, resulting in 30% savings of CI/CD efforts. Plus, Ji says, "Helm has accelerated prototype development and environment setup with its rich sharing charts." -
-
-
-
-
- "Cloud native technology helps us ensure high availability of our business, while improving development and testing efficiency, shortening the research and development cycle and enabling rapid product delivery." -

- BENNIU JI, DIRECTOR OF CLOUD COMPUTING BUSINESS DIVISION
-
-
-
-
-

Founded in 2013, Slamtec provides service robot autonomous localization and navigation solutions. In this fast-moving space, the company built its success on the ability of its R&D team to quickly introduce, and continually iterate on, its core products. -

- To sustain that development velocity, the company over the past few years began looking to build its own stable and reliable container cloud platform to host its Internet of Things applications. With a legacy infrastructure based on Alibaba Cloud and VMware vSphere, Slamtec teams had already adopted microservice architecture and continuous delivery, for "fine granularity on-demand scaling, fault isolation, ease of development, testing, and deployment, and for facilitating high-speed iteration," says Benniu Ji, Director of Cloud Computing Business Division. So "our needs for the cloud platform included high availability, scalability and security; multi-granularity monitoring alarm capability; friendliness to containers and microservices; and perfect CI/CD support." -

- After an evaluation of existing technologies, Ji’s team chose Kubernetes for orchestration. "CNCF brings quality assurance and a complete ecosystem for Kubernetes, which is very important for the wide application of Kubernetes," says Ji. Plus, "avoiding binding to an infrastructure technology or provider can help us ensure that our business is deployed and migrated in cross-regional environments, and can serve users all over the world." -
-
-
-
- "CNCF brings quality assurance and a complete ecosystem for Kubernetes, which is very important for the wide application of Kubernetes."

- BENNIU JI, DIRECTOR OF CLOUD COMPUTING BUSINESS DIVISION
+

With the new platform, Ji reports that Slamtec has experienced "18+ months of 100% stability!" For users, there is now zero service downtime and seamless upgrades. "Kubernetes with third-party service mesh integration (Istio, along with Jaeger and Envoy) significantly reduced the microservice configuration and maintenance efforts by 50%," he adds. With centralized metrics monitoring and log aggregation provided by Prometheus on Fluentd, teams are saving 50% of time spent on troubleshooting and debugging. Harbor replication has allowed production/staging/testing environments to cross public cloud and the private Kubernetes cluster to share the same container registry, resulting in 30% savings of CI/CD efforts. Plus, Ji says, "Helm has accelerated prototype development and environment setup with its rich sharing charts."

-
-
-
-
- Thus Slamtec decided to adopt other CNCF projects as well. "We built a monitoring and logging system based on Prometheus and Fluentd," says Ji. "The integration between Prometheus/Fluentd and Kubernetes is convenient, with multiple dimensions of data monitoring and log collection capabilities." -

- The company uses Harbor as a container image repository. "Harbor’s replication function helps us implement CI/CD on both private and public clouds," says Ji. "In addition, multi-project support, certification and policy configuration, and integration with Kubernetes are also excellent functions." Helm is also being used as a package manager, and the team is evaluating the Istio framework. "We’re very pleased that Kubernetes and these frameworks can be seamlessly integrated," Ji adds. -
-
-
-
- "Cloud native is suitable for microservice architecture, it’s suitable for fast iteration and agile development, and it has a relatively perfect ecosystem and active community."

- BENNIU JI, DIRECTOR OF CLOUD COMPUTING BUSINESS DIVISION
-
-
+{{< case-studies/quote author="BENNIU JI, DIRECTOR OF CLOUD COMPUTING BUSINESS DIVISION" >}} +"Cloud native technology helps us ensure high availability of our business, while improving development and testing efficiency, shortening the research and development cycle and enabling rapid product delivery." +{{< /case-studies/quote >}} -
-
- With the new platform, Ji reports that Slamtec has experienced "18+ months of 100% stability!" For users, there is now zero service downtime and seamless upgrades. "We benefit from the abstraction of Kubernetes from network and storage," says Ji. "The dependence on external services can be decoupled from the service and placed under unified management in the cluster." -

- Using Kubernetes and Istio "significantly reduced the microservice configuration and maintenance efforts by 50%," he adds. With centralized metrics monitoring and log aggregation provided by Prometheus on Fluentd, teams are saving 50% of time spent on troubleshooting and debugging. Harbor replication has allowed production/staging/testing environments to cross public cloud and the private Kubernetes cluster to share the same container registry, resulting in 30% savings of CI/CD efforts. Plus, Ji adds, "Helm has accelerated prototype development and environment setup with its rich sharing charts." -

-In short, Ji says, Slamtec’s new platform is helping it achieve one of its primary goals: the quick and easy release of products. With multiple release models and a centralized control interface, the platform is changing developers’ lives for the better. Slamtec also offers a unified API for the development of automated deployment tools according to users’ specific needs. -
+{{< case-studies/lead >}} +Founded in 2013, Slamtec provides service robot autonomous localization and navigation solutions. In this fast-moving space, the company built its success on the ability of its R&D team to quickly introduce, and continually iterate on, its core products. +{{< /case-studies/lead >}} -
-
-"We benefit from the abstraction of Kubernetes from network and storage, the dependence on external services can be decoupled from the service and placed under unified management in the cluster."

- BENNIU JI, DIRECTOR OF CLOUD COMPUTING BUSINESS DIVISION
-
+

To sustain that development velocity, the company over the past few years began looking to build its own stable and reliable container cloud platform to host its Internet of Things applications. With a legacy infrastructure based on Alibaba Cloud and VMware vSphere, Slamtec teams had already adopted microservice architecture and continuous delivery, for "fine granularity on-demand scaling, fault isolation, ease of development, testing, and deployment, and for facilitating high-speed iteration," says Benniu Ji, Director of Cloud Computing Business Division. So "our needs for the cloud platform included high availability, scalability and security; multi-granularity monitoring alarm capability; friendliness to containers and microservices; and perfect CI/CD support."

-
- Given its own success with cloud native, Slamtec has just one piece of advice for organizations considering making the leap. "For already containerized services, you should migrate them to the cloud native architecture as soon as possible and enjoy the advantages brought by the cloud native ecosystem," Ji says. "To migrate traditional, non-containerized services, in addition to the architecture changes of the service itself, you need to fully consider the operation and maintenance workload required to build the cloud native architecture." -

- That said, the cost-benefit analysis has been simple for Slamtec. "Cloud native technology is suitable for microservice architecture, it’s suitable for fast iteration and agile development, and it has a relatively perfect ecosystem and active community," says Ji. "It helps us ensure high availability of our business, while improving development and testing efficiency, shortening the research and development cycle and enabling rapid product delivery." -
-
+

After an evaluation of existing technologies, Ji's team chose Kubernetes for orchestration. "CNCF brings quality assurance and a complete ecosystem for Kubernetes, which is very important for the wide application of Kubernetes," says Ji. Plus, "avoiding binding to an infrastructure technology or provider can help us ensure that our business is deployed and migrated in cross-regional environments, and can serve users all over the world."

+ +{{< case-studies/quote + image="/images/case-studies/slamtec/banner3.jpg" + author="BENNIU JI, DIRECTOR OF CLOUD COMPUTING BUSINESS DIVISION" +>}} +"CNCF brings quality assurance and a complete ecosystem for Kubernetes, which is very important for the wide application of Kubernetes." +{{< /case-studies/quote >}} + +

Thus Slamtec decided to adopt other CNCF projects as well. "We built a monitoring and logging system based on Prometheus and Fluentd," says Ji. "The integration between Prometheus/Fluentd and Kubernetes is convenient, with multiple dimensions of data monitoring and log collection capabilities."

+ +

The company uses Harbor as a container image repository. "Harbor's replication function helps us implement CI/CD on both private and public clouds," says Ji. "In addition, multi-project support, certification and policy configuration, and integration with Kubernetes are also excellent functions." Helm is also being used as a package manager, and the team is evaluating the Istio framework. "We're very pleased that Kubernetes and these frameworks can be seamlessly integrated," Ji adds.

+ +{{< case-studies/quote + image="/images/case-studies/slamtec/banner4.jpg" + author="BENNIU JI, DIRECTOR OF CLOUD COMPUTING BUSINESS DIVISION" +>}} +"Cloud native is suitable for microservice architecture, it's suitable for fast iteration and agile development, and it has a relatively perfect ecosystem and active community." +{{< /case-studies/quote >}} + +

With the new platform, Ji reports that Slamtec has experienced "18+ months of 100% stability!" For users, there is now zero service downtime and seamless upgrades. "We benefit from the abstraction of Kubernetes from network and storage," says Ji. "The dependence on external services can be decoupled from the service and placed under unified management in the cluster."

+ +

Using Kubernetes and Istio "significantly reduced the microservice configuration and maintenance efforts by 50%," he adds. With centralized metrics monitoring and log aggregation provided by Prometheus on Fluentd, teams are saving 50% of time spent on troubleshooting and debugging. Harbor replication has allowed production/staging/testing environments to cross public cloud and the private Kubernetes cluster to share the same container registry, resulting in 30% savings of CI/CD efforts. Plus, Ji adds, "Helm has accelerated prototype development and environment setup with its rich sharing charts."

+ +

In short, Ji says, Slamtec's new platform is helping it achieve one of its primary goals: the quick and easy release of products. With multiple release models and a centralized control interface, the platform is changing developers' lives for the better. Slamtec also offers a unified API for the development of automated deployment tools according to users' specific needs.

+ +{{< case-studies/quote author="BENNIU JI, DIRECTOR OF CLOUD COMPUTING BUSINESS DIVISION" >}} +"We benefit from the abstraction of Kubernetes from network and storage, the dependence on external services can be decoupled from the service and placed under unified management in the cluster." +{{< /case-studies/quote >}} + +

Given its own success with cloud native, Slamtec has just one piece of advice for organizations considering making the leap. "For already containerized services, you should migrate them to the cloud native architecture as soon as possible and enjoy the advantages brought by the cloud native ecosystem," Ji says. "To migrate traditional, non-containerized services, in addition to the architecture changes of the service itself, you need to fully consider the operation and maintenance workload required to build the cloud native architecture."

+ +

That said, the cost-benefit analysis has been simple for Slamtec. "Cloud native technology is suitable for microservice architecture, it's suitable for fast iteration and agile development, and it has a relatively perfect ecosystem and active community," says Ji. "It helps us ensure high availability of our business, while improving development and testing efficiency, shortening the research and development cycle and enabling rapid product delivery."

diff --git a/content/en/case-studies/slingtv/index.html b/content/en/case-studies/slingtv/index.html index 349ed8c2de..6de46f35ff 100644 --- a/content/en/case-studies/slingtv/index.html +++ b/content/en/case-studies/slingtv/index.html @@ -3,108 +3,77 @@ title: SlingTV Case Study linkTitle: Sling TV case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css featured: true weight: 49 quote: > I would almost be so bold as to say that most of these applications that we are building now would not have been possible without the cloud native patterns and the flexibility that Kubernetes enables. + +new_case_study_styles: true +heading_background: /images/case-studies/slingtv/banner1.jpg +heading_title_logo: /images/slingtv_logo.png +subheading: > + Sling TV: Marrying Kubernetes and AI to Enable Proper Web Scale +case_study_details: + - Company: Sling TV + - Location: Englewood, Colorado + - Industry: Streaming television --- +

Challenge

-
-

CASE STUDY:
Sling TV: Marrying Kubernetes and AI to Enable Proper Web Scale +

Launched by DISH Network in 2015, Sling TV experienced great customer growth from the beginning. After just a year, "we were going through some growing pains of some of the legacy systems and trying to find the right architecture to enable our future," says Brad Linder, Sling TV's Cloud Native & Big Data Evangelist. The company has particular challenges: "We take live TV and distribute it over the internet out to a user's device that we do not control," says Linder. "In a lot of ways, we are working in the Wild West: The internet is what it is going to be, and if a customer's service does not work for whatever reason, they do not care why. They just want things to work. Those are the variables of the equation that we have to try to solve. We really have to try to enable optionality and good customer experience at web scale."

-

+

Solution

-
- -
- Company  Sling TV     Location  Englewood, Colorado     Industry  Streaming television -
- -
-
-
-
-

Challenge

- Launched by DISH Network in 2015, Sling TV experienced great customer growth from the beginning. After just a year, “we were going through some growing pains of some of the legacy systems and trying to find the right architecture to enable our future,” says Brad Linder, Sling TV’s Cloud Native & Big Data Evangelist. The company has particular challenges: “We take live TV and distribute it over the internet out to a user’s device that we do not control,” says Linder. “In a lot of ways, we are working in the Wild West: The internet is what it is going to be, and if a customer’s service does not work for whatever reason, they do not care why. They just want things to work. Those are the variables of the equation that we have to try to solve. We really have to try to enable optionality and good customer experience at web scale.” - -

- -

Solution

- Led by the belief that “the cloud native architectures and patterns really give us a lot of flexibility in meeting the needs of that sort of customer base,” Linder partnered with Rancher Labs to build Sling TV’s next-generation platform around Kubernetes. “We are going to need to enable a hybrid cloud strategy including multiple public clouds and an on-premise VMWare multi data center environment to meet the needs of the business at some point, so getting that sort of abstraction was a real goal,” he says. “That is one of the biggest reasons why we picked Kubernetes.” The team launched its first applications on Kubernetes in Sling TV’s two internal data centers. The push to enable AWS as a data center option is underway and should be available by the end of 2018. The team has added Prometheus for monitoring and Jaeger for tracing, to work alongside the company’s existing tool sets: Zenoss, New Relic and ELK. - -
-

-
+

Led by the belief that "the cloud native architectures and patterns really give us a lot of flexibility in meeting the needs of that sort of customer base," Linder partnered with Rancher Labs to build Sling TV's next-generation platform around Kubernetes. "We are going to need to enable a hybrid cloud strategy including multiple public clouds and an on-premise VMWare multi data center environment to meet the needs of the business at some point, so getting that sort of abstraction was a real goal," he says. "That is one of the biggest reasons why we picked Kubernetes." The team launched its first applications on Kubernetes in Sling TV's two internal data centers. The push to enable AWS as a data center option is underway and should be available by the end of 2018. The team has added Prometheus for monitoring and Jaeger for tracing, to work alongside the company's existing tool sets: Zenoss, New Relic and ELK.

Impact

- “We are getting to the place where we can one-click deploy an entire data center – the compute, network, Kubernetes, logging, monitoring and all the apps,” says Linder. “We have really enabled a platform thinking based approach to allowing applications to consume common tools. A new application can be onboarded in about an hour using common tooling and CI/CD processes. The gains on that side have been huge. Before, it took at least a few days to get things sorted for a new application to deploy. That does not consider the training of our operations staff to manage this new application. It is two or three orders of magnitude of savings in time and cost, and operationally it has given us the opportunity to let a core team of talented operations engineers manage common infrastructure and tooling to make our applications available at web scale.” + +

"We are getting to the place where we can one-click deploy an entire data center – the compute, network, Kubernetes, logging, monitoring and all the apps," says Linder. "We have really enabled a platform thinking based approach to allowing applications to consume common tools. A new application can be onboarded in about an hour using common tooling and CI/CD processes. The gains on that side have been huge. Before, it took at least a few days to get things sorted for a new application to deploy. That does not consider the training of our operations staff to manage this new application. It is two or three orders of magnitude of savings in time and cost, and operationally it has given us the opportunity to let a core team of talented operations engineers manage common infrastructure and tooling to make our applications available at web scale."

+ +{{< case-studies/quote author="Brad Linder, Cloud Native & Big Data Evangelist for Sling TV" >}} +"I would almost be so bold as to say that most of these applications that we are building now would not have been possible without the cloud native patterns and the flexibility that Kubernetes enables." +{{< /case-studies/quote >}} + +{{< case-studies/lead >}} +The beauty of streaming television, like the service offered by Sling TV, is that you can watch it from any device you want, wherever you want. +{{< /case-studies/lead >}} -
+

Of course, from the provider side of things, that creates a particular set of challenges "We take live TV and distribute it over the internet out to a user's device that we do not control," says Brad Linder, Sling TV's Cloud Native & Big Data Evangelist. "In a lot of ways, we are working in the Wild West: The internet is what it is going to be, and if a customer's service does not work for whatever reason, they do not care why. They just want things to work. Those are the variables of the equation that we have to try to solve. We really have to try to enable optionality and we have to do it at web scale."

-
-
-
-
- “I would almost be so bold as to say that most of these applications that we are building now would not have been possible without the cloud native patterns and the flexibility that Kubernetes enables.”

— Brad Linder, Cloud Native & Big Data Evangelist for Sling TV
+

Indeed, Sling TV experienced great customer growth from the beginning of its launch by DISH Network in 2015. After just a year, "we were going through some growing pains of some of the legacy systems and trying to find the right architecture to enable our future," says Linder. Tasked with building a next-generation web scale platform for the "personalized customer experience," Linder has spent the past year bringing Kubernetes to Sling TV.

-
-
-
-
-

The beauty of streaming television, like the service offered by Sling TV, is that you can watch it from any device you want, wherever you want.

Of course, from the provider side of things, that creates a particular set of challenges -“We take live TV and distribute it over the internet out to a user’s device that we do not control,” says Brad Linder, Sling TV’s Cloud Native & Big Data Evangelist. “In a lot of ways, we are working in the Wild West: The internet is what it is going to be, and if a customer’s service does not work for whatever reason, they do not care why. They just want things to work. Those are the variables of the equation that we have to try to solve. We really have to try to enable optionality and we have to do it at web scale.”

-Indeed, Sling TV experienced great customer growth from the beginning of its launch by DISH Network in 2015. After just a year, “we were going through some growing pains of some of the legacy systems and trying to find the right architecture to enable our future,” says Linder. Tasked with building a next-generation web scale platform for the “personalized customer experience,” Linder has spent the past year bringing Kubernetes to Sling TV.

-Led by the belief that “the cloud native architectures and patterns really give us a lot of flexibility in meeting the needs of our customers,” Linder partnered with Rancher Labs to build the platform around Kubernetes. “They have really helped us get our head around how to use Kubernetes,” he says. “We needed the flexibility to enable our use case versus just a simple orchestrater. Enabling our future in a way that did not give us vendor lock-in was also a key part of our strategy. I think that is part of the Rancher value proposition.” +

Led by the belief that "the cloud native architectures and patterns really give us a lot of flexibility in meeting the needs of our customers," Linder partnered with Rancher Labs to build the platform around Kubernetes. "They have really helped us get our head around how to use Kubernetes," he says. "We needed the flexibility to enable our use case versus just a simple orchestrater. Enabling our future in a way that did not give us vendor lock-in was also a key part of our strategy. I think that is part of the Rancher value proposition."

+{{< case-studies/quote + image="/images/case-studies/slingtv/banner3.jpg" + author="Brad Linder, Cloud Native & Big Data Evangelist for Sling TV" +>}} +"We needed the flexibility to enable our use case versus just a simple orchestrater. Enabling our future in a way that did not give us vendor lock-in was also a key part of our strategy. I think that is part of the Rancher value proposition." +{{< /case-studies/quote >}} -
-
-
-
- “We needed the flexibility to enable our use case versus just a simple orchestrater. Enabling our future in a way that did not give us vendor lock-in was also a key part of our strategy. I think that is part of the Rancher value proposition.”

— Brad Linder, Cloud Native & Big Data Evangelist for Sling TV
-
-
-
-
+

One big reason he chose Kubernetes was getting a level of abstraction that would enable the company to "enable a hybrid cloud strategy including multiple public clouds and an on-premise VMWare multi data center environment to meet the needs of the business," he says. Another factor was how much the Kubernetes ecosystem has matured over the past couple of years. "We have spent a lot of time and energy around making logging, monitoring and alerting production ready to give us insights into applications' well-being," says Linder. The team has added Prometheus for monitoring and Jaeger for tracing, to work alongside the company's existing tool sets: Zenoss, New Relic and ELK.

-One big reason he chose Kubernetes was getting a level of abstraction that would enable the company to “enable a hybrid cloud strategy including multiple public clouds and an on-premise VMWare multi data center environment to meet the needs of the business,” he says. Another factor was how much the Kubernetes ecosystem has matured over the past couple of years. “We have spent a lot of time and energy around making logging, monitoring and alerting production ready to give us insights into applications’ well-being,” says Linder. The team has added Prometheus for monitoring and Jaeger for tracing, to work alongside the company’s existing tool sets: Zenoss, New Relic and ELK.

-With the emphasis on common tooling, “We are getting to the place where we can one-click deploy an entire data center – the compute, network, Kubernetes, logging, monitoring and all the apps,” says Linder. “We have really enabled a platform thinking based approach to allowing applications to consume common tools and services. A new application can be onboarded in about an hour using common tooling and CI/CD processes. The gains on that side have been huge. Before, it took at least a few days to get things sorted for a new application to deploy. That does not consider the training of our operations staff to manage this new application. It is two or three orders of magnitude of savings in time and cost, and operationally it has given us the opportunity to let a core team of talented operations engineers manage common infrastructure and tooling to make our applications available at web scale.”

+

With the emphasis on common tooling, "We are getting to the place where we can one-click deploy an entire data center – the compute, network, Kubernetes, logging, monitoring and all the apps," says Linder. "We have really enabled a platform thinking based approach to allowing applications to consume common tools and services. A new application can be onboarded in about an hour using common tooling and CI/CD processes. The gains on that side have been huge. Before, it took at least a few days to get things sorted for a new application to deploy. That does not consider the training of our operations staff to manage this new application. It is two or three orders of magnitude of savings in time and cost, and operationally it has given us the opportunity to let a core team of talented operations engineers manage common infrastructure and tooling to make our applications available at web scale."

-
-
-
-
-“We have to be able to react to changes and hiccups in the matrix. It is the foundation for our ability to deliver a high-quality service for our customers."

— Brad Linder, Cloud Native & Big Data Evangelist for Sling TV
-
-
+{{< case-studies/quote + image="/images/case-studies/slingtv/banner4.jpg" + author="Brad Linder, Cloud Native & Big Data Evangelist for Sling TV" +>}} +"We have to be able to react to changes and hiccups in the matrix. It is the foundation for our ability to deliver a high-quality service for our customers." +{{< /case-studies/quote >}} -
+

The team launched its first applications on Kubernetes in Sling TV's two internal data centers in the early part of Q1 2018 and began to enable AWS as a data center option. The company plans to expand into other public clouds in the future.

-
- The team launched its first applications on Kubernetes in Sling TV’s two internal data centers in the early part of Q1 2018 and began to enable AWS as a data center option. The company plans to expand into other public clouds in the future. -The first application that went into production is a web socket-based back-end notification service. “It allows back-end changes to trigger messages to our clients in the field without the polling,” says Linder. “We are talking about very high volumes of messages with this application. Without something like Kubernetes to be able to scale up and down, as well as just support that overall workload, that is pretty hard to do. I would almost be so bold as to say that most of these applications that we are building now would not have been possible without the cloud native patterns and the flexibility that Kubernetes enables.”

- Linder oversees three teams working together on building the next-generation platform: a platform engineering team; an enterprise middleware services team; and a big data and analytics team. “We have really tried to bring everything together to be able to have a client application interact with a cloud native middleware layer. That middleware layer must run on a platform, consume platform services and then have logs and events monitored by an artificial agent to keep things running smoothly,” says Linder. +

The first application that went into production is a web socket-based back-end notification service. "It allows back-end changes to trigger messages to our clients in the field without the polling," says Linder. "We are talking about very high volumes of messages with this application. Without something like Kubernetes to be able to scale up and down, as well as just support that overall workload, that is pretty hard to do. I would almost be so bold as to say that most of these applications that we are building now would not have been possible without the cloud native patterns and the flexibility that Kubernetes enables."

+

Linder oversees three teams working together on building the next-generation platform: a platform engineering team; an enterprise middleware services team; and a big data and analytics team. "We have really tried to bring everything together to be able to have a client application interact with a cloud native middleware layer. That middleware layer must run on a platform, consume platform services and then have logs and events monitored by an artificial agent to keep things running smoothly," says Linder.

-
+{{< case-studies/quote author="BRAD LINDER, CLOUD NATIVE & BIG DATA EVANGELIST FOR SLING TV">}} +This undertaking is about "trying to marry Kubernetes with AI to enable web scale that just works". +{{< /case-studies/quote >}} -
-
- This undertaking is about “trying to marry Kubernetes with AI to enable web scale that just works".

— BRAD LINDER, CLOUD NATIVE & BIG DATA EVANGELIST FOR SLING TV
-
-
+

Ultimately, this undertaking is about "trying to marry Kubernetes with AI to enable web scale that just works," he adds. "We want the artificial agents and the big data platform using the actual logs and events coming out of the applications, Kubernetes, the infrastructure, backing services and changes to the environment to make decisions like, 'Hey we need more capacity for this service so please add more nodes.' From a platform perspective, if you are truly doing web scale stuff and you are not using AI and big data, in my opinion, you are going to implode under your own weight. It is not a question of if, it is when. If you are in a 'millions of users' sort of environment, that implosion is going to be catastrophic. We are on our way to this goal and have learned a lot along the way."

-
- Ultimately, this undertaking is about “trying to marry Kubernetes with AI to enable web scale that just works,” he adds. “We want the artificial agents and the big data platform using the actual logs and events coming out of the applications, Kubernetes, the infrastructure, backing services and changes to the environment to make decisions like, ‘Hey we need more capacity for this service so please add more nodes.’ From a platform perspective, if you are truly doing web scale stuff and you are not using AI and big data, in my opinion, you are going to implode under your own weight. It is not a question of if, it is when. If you are in a ‘millions of users’ sort of environment, that implosion is going to be catastrophic. We are on our way to this goal and have learned a lot along the way.”

-For Sling TV, moving to cloud native has been exactly what they needed. “We have to be able to react to changes and hiccups in the matrix,” says Linder. “It is the foundation for our ability to deliver a high-quality service for our customers. Building intelligent platforms, tools and clients in the field consuming those services has got to be part of all of this. In my eyes that is a big part of what cloud native is all about. It is taking these distributed, potentially unreliable entities and enabling a robust customer experience they expect.” - - - - - -
- -
+

For Sling TV, moving to cloud native has been exactly what they needed. "We have to be able to react to changes and hiccups in the matrix," says Linder. "It is the foundation for our ability to deliver a high-quality service for our customers. Building intelligent platforms, tools and clients in the field consuming those services has got to be part of all of this. In my eyes that is a big part of what cloud native is all about. It is taking these distributed, potentially unreliable entities and enabling a robust customer experience they expect."

diff --git a/content/en/case-studies/sos/index.html b/content/en/case-studies/sos/index.html index becf486413..ff0e3f2322 100644 --- a/content/en/case-studies/sos/index.html +++ b/content/en/case-studies/sos/index.html @@ -3,97 +3,81 @@ title: SOS International Case Study linkTitle: SOS International case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css logo: sos_featured_logo.png + +new_case_study_styles: true +heading_background: /images/case-studies/sos/banner1.jpg +heading_title_logo: /images/sos_logo.png +subheading: > + SOS International: Using Kubernetes to Provide Emergency Assistance in a Connected World +case_study_details: + - Company: SOS International + - Location: Frederiksberg, Denmark + - Industry: Medical and Travel Assistance --- +

Challenge

-
-

CASE STUDY:
SOS International: Using Kubernetes to Provide Emergency Assistance in a Connected World +

For the past six decades, SOS International has been providing reliable medical and travel assistance in the Nordic region. In recent years, the company's business strategy has required increasingly intense development in the digital space, but when it came to its IT systems, "SOS has a very fragmented legacy," with three traditional monoliths (Java, .NET, and IBM's AS/400) and a waterfall approach, says Martin Ahrentsen, Head of Enterprise Architecture. "We have been forced to institute both new technology and new ways of working, so we could be more efficient with a shorter time to market. It was a much more agile approach, and we needed to have a platform that can help us deliver that to the business."

-

+

Solution

-
+

After an unsuccessful search for a standard system, the company decided to take a platform approach and look for a solution that rolls up Kubernetes and the container technology. RedHat OpenShift proved to be a perfect fit for SOS's fragmented systems. "We have a lot of different technologies that we use, both code languages and others, and all of them could use the resources on the new platform," says Ahrentsen. Of the company's three monoliths, "we can provide this new bleeding edge technology to two of them (.NET and Java)." The platform went live in the spring of 2018; there are now six greenfield projects based on microservices architecture underway, plus all of the company's Java applications are currently going through a "lift and shift" migration.

-
- Company  SOS International     Location  Frederiksberg, Denmark -     Industry  Medical and Travel Assistance -
- -
-
-
-
-

Challenge

- For the past six decades, SOS International has been providing reliable medical and travel assistance in the Nordic region. In recent years, the company’s business strategy has required increasingly intense development in the digital space, but when it came to its IT systems, "SOS has a very fragmented legacy," with three traditional monoliths (Java, .NET, and IBM’s AS/400) and a waterfall approach, says Martin Ahrentsen, Head of Enterprise Architecture. "We have been forced to institute both new technology and new ways of working, so we could be more efficient with a shorter time to market. It was a much more agile approach, and we needed to have a platform that can help us deliver that to the business." - -

-

Solution

- After an unsuccessful search for a standard system, the company decided to take a platform approach and look for a solution that rolls up Kubernetes and the container technology. RedHat OpenShift proved to be a perfect fit for SOS’s fragmented systems. "We have a lot of different technologies that we use, both code languages and others, and all of them could use the resources on the new platform," says Ahrentsen. Of the company’s three monoliths, "we can provide this new bleeding edge technology to two of them (.NET and Java)." The platform went live in the spring of 2018; there are now six greenfield projects based on microservices architecture underway, plus all of the company’s Java applications are currently going through a "lift and shift" migration. - -

Impact

- Kubernetes has delivered "improved time to market, agility, and the ability to adapt to changes and new technologies," says Ahrentsen. "Just the time between when the software is ready for release and when it can be released has dramatically been improved." The way of thinking at SOS International has also changed for the better: "Since we have Kubernetes and easy access to scripts that can help us automate, creating CI/CD pipelines easily, that has spawned a lot of internal interest in how to do this fully automated, all the way. It creates a very good climate in order to start the journey," he says. Moreover, being part of the cloud native community has helped the company attract talent. "They want to work with the cool, new technologies," says Ahrentsen. "During our onboarding, we could see that we were chosen by IT professionals because we provided the new technologies." -
+

Kubernetes has delivered "improved time to market, agility, and the ability to adapt to changes and new technologies," says Ahrentsen. "Just the time between when the software is ready for release and when it can be released has dramatically been improved." The way of thinking at SOS International has also changed for the better: "Since we have Kubernetes and easy access to scripts that can help us automate, creating CI/CD pipelines easily, that has spawned a lot of internal interest in how to do this fully automated, all the way. It creates a very good climate in order to start the journey," he says. Moreover, being part of the cloud native community has helped the company attract talent. "They want to work with the cool, new technologies," says Ahrentsen. "During our onboarding, we could see that we were chosen by IT professionals because we provided the new technologies."

-
-
-
-
- "The speed of the changes that cloud native software and technologies drive right now is amazing, and following and adopting it is very crucial for us. The amazing technology provided by Kubernetes and cloud native has started the change for SOS towards a digital future." -

- Martin Ahrentsen, Head of Enterprise Architecture, SOS International
-
-
-
-
-

For six decades, SOS International has provided reliable emergency medical and travel assistance for customers in the Nordic countries.

- SOS operators handle a million cases and over a million phone calls a year. But in the past four years, the company’s business strategy has required increasingly intense development in the digital space.

- When it comes to its IT systems, "SOS has a very fragmented legacy," with three traditional monoliths running in the company’s own data centers and a waterfall approach, says Martin Ahrentsen, Head of Enterprise Architecture. "We had to institute both new technology and new ways of working so we could be more efficient, with a shorter time to market. It was a much more agile approach, and we needed to have a platform that can help us deliver that to the business."

- For a long time, Ahrentsen and his team searched for a standard solution that could work at SOS. "There aren’t that many assistance companies like us, so you cannot get a standard system that fits for that; there is no perfect match," he says. "We would have to take a standard system and twist it too much so it is not standard anymore. Based on that, we decided to find a technology platform instead, with some common components that we could use to build the new digital systems and core systems." +{{< case-studies/quote author="Martin Ahrentsen, Head of Enterprise Architecture, SOS International" >}} +"The speed of the changes that cloud native software and technologies drive right now is amazing, and following and adopting it is very crucial for us. The amazing technology provided by Kubernetes and cloud native has started the change for SOS towards a digital future." +{{< /case-studies/quote >}} +{{< case-studies/lead >}} +For six decades, SOS International has provided reliable emergency medical and travel assistance for customers in the Nordic countries. +{{< /case-studies/lead >}} +

SOS operators handle a million cases and over a million phone calls a year. But in the past four years, the company's business strategy has required increasingly intense development in the digital space.

-
-
-
-
- "We have to deliver new digital services, but we also have to migrate the old stuff, and we have to transform our core systems into new systems built on top of this platform. One of the reasons why we chose this technology is that we could build new digital services while changing the old one."

- Martin Ahrentsen, Head of Enterprise Architecture, SOS International
+

When it comes to its IT systems, "SOS has a very fragmented legacy," with three traditional monoliths running in the company's own data centers and a waterfall approach, says Martin Ahrentsen, Head of Enterprise Architecture. "We had to institute both new technology and new ways of working so we could be more efficient, with a shorter time to market. It was a much more agile approach, and we needed to have a platform that can help us deliver that to the business."

-
-
-
-
- Sold on what Kubernetes could do, Ahrentsen zeroed in on platforms that could meet the business’s needs right away. The company opted to use RedHat’s OpenShift container platform, which incorporates Docker containers and Kubernetes, as well as a whole stack of technologies, including RedHat Hyperconverged Infrastructure and some midware components, all from the open source community.

- Based on the company’s criteria—technology fit, agility fit, legal requirements, and competencies—the OpenShift solution seemed like a perfect fit for SOS’s fragmented systems. "We have a lot of different technologies that we use, both code languages and others, and all of them could use the resources on the new platform," says Ahrentsen. Of the company’s three monoliths, "we can provide this new bleeding edge technology to two of them (.NET and Java)."

- The platform went live in the spring of 2018; six greenfield projects based on microservices architecture were initially launched, plus all of the company’s Java applications are currently going through a "lift and shift" migration. One of the first Kubernetes-based projects to go live is Remote Medical Treatment, a solution in which customers can contact the SOS alarm center via voice, chat, or video. "We managed to develop it in quite a short timeframe with focus on full CI/CD pipelining and a modern microservice architecture all running in a dual OpenShift cluster setup," says Ahrentsen. Onsite, which is used for dispatching rescue trucks around the Nordic countries, and Follow Your Truck, which allows customers to track tow trucks, are also being rolled out. +

For a long time, Ahrentsen and his team searched for a standard solution that could work at SOS. "There aren't that many assistance companies like us, so you cannot get a standard system that fits for that; there is no perfect match," he says. "We would have to take a standard system and twist it too much so it is not standard anymore. Based on that, we decided to find a technology platform instead, with some common components that we could use to build the new digital systems and core systems."

-
-
-
-
- "During our onboarding, we could see that we were chosen by IT professionals because we provided the new technologies."

- Martin Ahrentsen, Head of Enterprise Architecture, SOS International
-
-
+{{< case-studies/quote + image="/images/case-studies/sos/banner3.jpg" + author="Martin Ahrentsen, Head of Enterprise Architecture, SOS International" +>}} +"We have to deliver new digital services, but we also have to migrate the old stuff, and we have to transform our core systems into new systems built on top of this platform. One of the reasons why we chose this technology is that we could build new digital services while changing the old one." +{{< /case-studies/quote >}} -
-
- The platform is still running on premise, because some of SOS’s customers in the insurance industry, for whom the company handles data, don’t yet have a cloud strategy. Kubernetes is allowing SOS to start in the data center and move to the cloud when the business is ready. "Over the next three to five years, all of them will have a strategy, and we could probably take the data and go to the cloud," says Ahrentsen. There’s also the possibility of moving to a hybrid cloud setup for sensitive and non-sensitive data.

- SOS’s technology is certainly in a state of transition. "We have to deliver new digital services, but we also have to migrate the old stuff, and we have to transform our core systems into new systems built on top of this platform," says Ahrentsen. "One of the reasons why we chose this technology is that we could build new digital services while changing the old one."

- But already, Kubernetes has delivered improved time to market, as evidenced by how quickly the greenfield projects were developed and released. "Just the time between when the software is ready for release and when it can be released has dramatically been improved," says Ahrentsen.

- Moreover, being part of the cloud native community has helped the company attract talent as it pursues a goal of growing the ranks of engineers, operators, and architects from 60 to 100 this year. "They want to work with the cool, new technologies," says Ahrentsen. "During our onboarding, we could see that we were chosen by IT professionals because we provided the new technologies." +

Sold on what Kubernetes could do, Ahrentsen zeroed in on platforms that could meet the business's needs right away. The company opted to use RedHat's OpenShift container platform, which incorporates Docker containers and Kubernetes, as well as a whole stack of technologies, including RedHat Hyperconverged Infrastructure and some midware components, all from the open source community.

-
+

Based on the company's criteria—technology fit, agility fit, legal requirements, and competencies—the OpenShift solution seemed like a perfect fit for SOS's fragmented systems. "We have a lot of different technologies that we use, both code languages and others, and all of them could use the resources on the new platform," says Ahrentsen. Of the company's three monoliths, "we can provide this new bleeding edge technology to two of them (.NET and Java)."

-
-
-"The future world where everything is connected and sends data will create a big potential for us in terms of new market opportunities. But it will also set a big demand on the IT platform and what we need to deliver."

- Martin Ahrentsen, Head of Enterprise Architecture, SOS International
-
+

The platform went live in the spring of 2018; six greenfield projects based on microservices architecture were initially launched, plus all of the company's Java applications are currently going through a "lift and shift" migration. One of the first Kubernetes-based projects to go live is Remote Medical Treatment, a solution in which customers can contact the SOS alarm center via voice, chat, or video. "We managed to develop it in quite a short timeframe with focus on full CI/CD pipelining and a modern microservice architecture all running in a dual OpenShift cluster setup," says Ahrentsen. Onsite, which is used for dispatching rescue trucks around the Nordic countries, and Follow Your Truck, which allows customers to track tow trucks, are also being rolled out.

-
- The way of thinking at SOS International has also changed dramatically: "Since we have Kubernetes and easy access to scripts that can help us automate, creating CI/CD pipelines easily, that has spawned a lot of internal interest in how to do this fully automated, all the way. It creates a very good climate in order to start the journey."

- For this journey at SOS, digitalization and optimization are the key words. "For IT to deliver this, we need to improve, and that is not just on the way of using Kubernetes and the platform," says Ahrentsen. "It’s also a way of building the systems to be ready for automation, and afterwards, machine learning and other interesting technologies that are on the way."

- Case in point: the introduction of the internet of things into automobiles. The European Commission now mandates all new cars to be equipped with eCall, which transmits location and other data in case of a serious traffic accident. SOS provides this service as smart auto assistance. "We receive the call and find out if an emergency response team needs to be sent, or if it’s not heavy impact," says Ahrentsen. "The future world where everything is connected and sends data will create a big potential for us in terms of new market opportunities. But it will also set a big demand on the IT platform and what we need to deliver."

- Ahrentsen feels that SOS is well equipped for the challenge, given the technology choices the company has made. "The speed of the changes that cloud native software and technologies drive right now is amazing, and following it and adopting it is very crucial for us," he says. "The amazing technology provided by Kubernetes and cloud native has started the change for SOS towards a digital future." -
-
+{{< case-studies/quote + image="/images/case-studies/sos/banner4.jpg" + author="Martin Ahrentsen, Head of Enterprise Architecture, SOS International" +>}} +"During our onboarding, we could see that we were chosen by IT professionals because we provided the new technologies." +{{< /case-studies/quote >}} + +

The platform is still running on premise, because some of SOS's customers in the insurance industry, for whom the company handles data, don't yet have a cloud strategy. Kubernetes is allowing SOS to start in the data center and move to the cloud when the business is ready. "Over the next three to five years, all of them will have a strategy, and we could probably take the data and go to the cloud," says Ahrentsen. There's also the possibility of moving to a hybrid cloud setup for sensitive and non-sensitive data.

+ +

SOS's technology is certainly in a state of transition. "We have to deliver new digital services, but we also have to migrate the old stuff, and we have to transform our core systems into new systems built on top of this platform," says Ahrentsen. "One of the reasons why we chose this technology is that we could build new digital services while changing the old one."

+ +

But already, Kubernetes has delivered improved time to market, as evidenced by how quickly the greenfield projects were developed and released. "Just the time between when the software is ready for release and when it can be released has dramatically been improved," says Ahrentsen.

+ +

Moreover, being part of the cloud native community has helped the company attract talent as it pursues a goal of growing the ranks of engineers, operators, and architects from 60 to 100 this year. "They want to work with the cool, new technologies," says Ahrentsen. "During our onboarding, we could see that we were chosen by IT professionals because we provided the new technologies."

+ +{{< case-studies/quote author="Martin Ahrentsen, Head of Enterprise Architecture, SOS International" >}} +"The future world where everything is connected and sends data will create a big potential for us in terms of new market opportunities. But it will also set a big demand on the IT platform and what we need to deliver." +{{< /case-studies/quote >}} + +

The way of thinking at SOS International has also changed dramatically: "Since we have Kubernetes and easy access to scripts that can help us automate, creating CI/CD pipelines easily, that has spawned a lot of internal interest in how to do this fully automated, all the way. It creates a very good climate in order to start the journey."

+ +

For this journey at SOS, digitalization and optimization are the key words. "For IT to deliver this, we need to improve, and that is not just on the way of using Kubernetes and the platform," says Ahrentsen. "It's also a way of building the systems to be ready for automation, and afterwards, machine learning and other interesting technologies that are on the way."

+ +

Case in point: the introduction of the internet of things into automobiles. The European Commission now mandates all new cars to be equipped with eCall, which transmits location and other data in case of a serious traffic accident. SOS provides this service as smart auto assistance. "We receive the call and find out if an emergency response team needs to be sent, or if it's not heavy impact," says Ahrentsen. "The future world where everything is connected and sends data will create a big potential for us in terms of new market opportunities. But it will also set a big demand on the IT platform and what we need to deliver."

+ +

Ahrentsen feels that SOS is well equipped for the challenge, given the technology choices the company has made. "The speed of the changes that cloud native software and technologies drive right now is amazing, and following it and adopting it is very crucial for us," he says. "The amazing technology provided by Kubernetes and cloud native has started the change for SOS towards a digital future."

diff --git a/content/en/case-studies/spotify/index.html b/content/en/case-studies/spotify/index.html index 63243b08f6..7448b40939 100644 --- a/content/en/case-studies/spotify/index.html +++ b/content/en/case-studies/spotify/index.html @@ -3,96 +3,77 @@ title: Spotify Case Study linkTitle: Spotify case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css featured: false + +new_case_study_styles: true +heading_background: /images/case-studies/spotify/banner1.jpg +heading_title_text: Spotify +subheading: > + Spotify: An Early Adopter of Containers, Spotify Is Migrating from Homegrown Orchestration to Kubernetes +case_study_details: + - Company: Spotify + - Location: Global + - Industry: Entertainment --- -
-

CASE STUDY: Spotify
Spotify: An Early Adopter of Containers, Spotify Is Migrating from Homegrown Orchestration to Kubernetes +

Challenge

-

+

Launched in 2008, the audio-streaming platform has grown to over 200 million monthly active users across the world. "Our goal is to empower creators and enable a really immersive listening experience for all of the consumers that we have today—and hopefully the consumers we'll have in the future," says Jai Chakrabarti, Director of Engineering, Infrastructure and Operations. An early adopter of microservices and Docker, Spotify had containerized microservices running across its fleet of VMs with a homegrown container orchestration system called Helios. By late 2017, it became clear that "having a small team working on the features was just not as efficient as adopting something that was supported by a much bigger community," he says.

-
+

Solution

-
- Company  Spotify     Location  Global -     Industry  Entertainment -
+

"We saw the amazing community that had grown up around Kubernetes, and we wanted to be part of that," says Chakrabarti. Kubernetes was more feature-rich than Helios. Plus, "we wanted to benefit from added velocity and reduced cost, and also align with the rest of the industry on best practices and tools." At the same time, the team wanted to contribute its expertise and influence in the flourishing Kubernetes community. The migration, which would happen in parallel with Helios running, could go smoothly because "Kubernetes fit very nicely as a complement and now as a replacement to Helios," says Chakrabarti.

-
-
-
-
-

Challenge

- Launched in 2008, the audio-streaming platform has grown to over 200 million monthly active users across the world. "Our goal is to empower creators and enable a really immersive listening experience for all of the consumers that we have today—and hopefully the consumers we’ll have in the future," says Jai Chakrabarti, Director of Engineering, Infrastructure and Operations. An early adopter of microservices and Docker, Spotify had containerized microservices running across its fleet of VMs with a homegrown container orchestration system called Helios. By late 2017, it became clear that "having a small team working on the features was just not as efficient as adopting something that was supported by a much bigger community," he says. - -

-

Solution

- "We saw the amazing community that had grown up around Kubernetes, and we wanted to be part of that," says Chakrabarti. Kubernetes was more feature-rich than Helios. Plus, "we wanted to benefit from added velocity and reduced cost, and also align with the rest of the industry on best practices and tools." At the same time, the team wanted to contribute its expertise and influence in the flourishing Kubernetes community. The migration, which would happen in parallel with Helios running, could go smoothly because "Kubernetes fit very nicely as a complement and now as a replacement to Helios," says Chakrabarti. -

Impact

- The team spent much of 2018 addressing the core technology issues required for a migration, which started late that year and is a big focus for 2019. "A small percentage of our fleet has been migrated to Kubernetes, and some of the things that we’ve heard from our internal teams are that they have less of a need to focus on manual capacity provisioning and more time to focus on delivering features for Spotify," says Chakrabarti. The biggest service currently running on Kubernetes takes about 10 million requests per second as an aggregate service and benefits greatly from autoscaling, says Site Reliability Engineer James Wen. Plus, he adds, "Before, teams would have to wait for an hour to create a new service and get an operational host to run it in production, but with Kubernetes, they can do that on the order of seconds and minutes." In addition, with Kubernetes’s bin-packing and multi-tenancy capabilities, CPU utilization has improved on average two- to threefold. -
-
-
-
-
- "We saw the amazing community that’s grown up around Kubernetes, and we wanted to be part of that. We wanted to benefit from added velocity and reduced cost, and also align with the rest of the industry on best practices and tools." -

- Jai Chakrabarti, Director of Engineering, Infrastructure and Operations, Spotify
-
-
-
-
-

"Our goal is to empower creators and enable a really immersive listening experience for all of the consumers that we have today—and hopefully the consumers we’ll have in the future," says Jai Chakrabarti, Director of Engineering, Infrastructure and Operations at Spotify. Since the audio-streaming platform launched in 2008, it has already grown to over 200 million monthly active users around the world, and for Chakrabarti’s team, the goal is solidifying Spotify’s infrastructure to support all those future consumers too.

-

- An early adopter of microservices and Docker, Spotify had containerized microservices running across its fleet of VMs since 2014. The company used an open source, homegrown container orchestration system called Helios, and in 2016-17 completed a migration from on premise data centers to Google Cloud. Underpinning these decisions, "We have a culture around autonomous teams, over 200 autonomous engineering squads who are working on different pieces of the pie, and they need to be able to iterate quickly," Chakrabarti says. "So for us to have developer velocity tools that allow squads to move quickly is really important."

- But by late 2017, it became clear that "having a small team working on the Helios features was just not as efficient as adopting something that was supported by a much bigger community," says Chakrabarti. "We saw the amazing community that had grown up around Kubernetes, and we wanted to be part of that. We wanted to benefit from added velocity and reduced cost, and also align with the rest of the industry on best practices and tools." At the same time, the team wanted to contribute its expertise and influence in the flourishing Kubernetes community. +

The team spent much of 2018 addressing the core technology issues required for a migration, which started late that year and is a big focus for 2019. "A small percentage of our fleet has been migrated to Kubernetes, and some of the things that we've heard from our internal teams are that they have less of a need to focus on manual capacity provisioning and more time to focus on delivering features for Spotify," says Chakrabarti. The biggest service currently running on Kubernetes takes about 10 million requests per second as an aggregate service and benefits greatly from autoscaling, says Site Reliability Engineer James Wen. Plus, he adds, "Before, teams would have to wait for an hour to create a new service and get an operational host to run it in production, but with Kubernetes, they can do that on the order of seconds and minutes." In addition, with Kubernetes's bin-packing and multi-tenancy capabilities, CPU utilization has improved on average two- to threefold.

+{{< case-studies/quote author="Jai Chakrabarti, Director of Engineering, Infrastructure and Operations, Spotify" >}} +"We saw the amazing community that's grown up around Kubernetes, and we wanted to be part of that. We wanted to benefit from added velocity and reduced cost, and also align with the rest of the industry on best practices and tools." +{{< /case-studies/quote >}} -
-
-
-
- "The community has been extremely helpful in getting us to work through all the technology much faster and much easier. And it’s helped us validate all the things we’re doing."

- Dave Zolotusky, Software Engineer, Infrastructure and Operations, Spotify
+{{< case-studies/lead >}} +"Our goal is to empower creators and enable a really immersive listening experience for all of the consumers that we have today—and hopefully the consumers we'll have in the future," says Jai Chakrabarti, Director of Engineering, Infrastructure and Operations at Spotify. Since the audio-streaming platform launched in 2008, it has already grown to over 200 million monthly active users around the world, and for Chakrabarti's team, the goal is solidifying Spotify's infrastructure to support all those future consumers too. +{{< /case-studies/lead >}} -
-
-
-
- Another plus: "Kubernetes fit very nicely as a complement and now as a replacement to Helios, so we could have it running alongside Helios to mitigate the risks," says Chakrabarti. "During the migration, the services run on both, so we’re not having to put all of our eggs in one basket until we can validate Kubernetes under a variety of load circumstances and stress circumstances."

- The team spent much of 2018 addressing the core technology issues required for the migration. "We were able to use a lot of the Kubernetes APIs and extensibility features of Kubernetes to support and interface with our legacy infrastructure, so the integration was straightforward and easy," says Site Reliability Engineer James Wen.

- Migration started late that year and has accelerated in 2019. "Our focus is really on stateless services, and once we address our last remaining technology blocker, that’s where we hope that the uptick will come from," says Chakrabarti. "For stateful services there’s more work that we need to do."

- A small percentage of Spotify’s fleet, containing over 150 services, has been migrated to Kubernetes so far. "We’ve heard from our customers that they have less of a need to focus on manual capacity provisioning and more time to focus on delivering features for Spotify," says Chakrabarti. The biggest service currently running on Kubernetes takes over 10 million requests per second as an aggregate service and benefits greatly from autoscaling, says Wen. Plus, Wen adds, "Before, teams would have to wait for an hour to create a new service and get an operational host to run it in production, but with Kubernetes, they can do that on the order of seconds and minutes." In addition, with Kubernetes’s bin-packing and multi-tenancy capabilities, CPU utilization has improved on average two- to threefold. +

An early adopter of microservices and Docker, Spotify had containerized microservices running across its fleet of VMs since 2014. The company used an open source, homegrown container orchestration system called Helios, and in 2016-17 completed a migration from on premise data centers to Google Cloud. Underpinning these decisions, "We have a culture around autonomous teams, over 200 autonomous engineering squads who are working on different pieces of the pie, and they need to be able to iterate quickly," Chakrabarti says. "So for us to have developer velocity tools that allow squads to move quickly is really important."

-
-
-
-
- "We were able to use a lot of the Kubernetes APIs and extensibility features to support and interface with our legacy infrastructure, so the integration was straightforward and easy."

- James Wen, Site Reliability Engineer, Spotify
-
-
+

But by late 2017, it became clear that "having a small team working on the Helios features was just not as efficient as adopting something that was supported by a much bigger community," says Chakrabarti. "We saw the amazing community that had grown up around Kubernetes, and we wanted to be part of that. We wanted to benefit from added velocity and reduced cost, and also align with the rest of the industry on best practices and tools." At the same time, the team wanted to contribute its expertise and influence in the flourishing Kubernetes community.

-
-
- Chakrabarti points out that for all four of the top-level metrics that Spotify looks at—lead time, deployment frequency, time to resolution, and operational load—"there is impact that Kubernetes is having."

- One success story that’s come out of the early days of Kubernetes is a tool called Slingshot that a Spotify team built on Kubernetes. "With a pull request, it creates a temporary staging environment that self destructs after 24 hours," says Chakrabarti. "It’s all facilitated by Kubernetes, so that’s kind of an exciting example of how, once the technology is out there and ready to use, people start to build on top of it and craft their own solutions, even beyond what we might have envisioned as the initial purpose of it."

- Spotify has also started to use gRPC and Envoy, replacing existing homegrown solutions, just as it had with Kubernetes. "We created things because of the scale we were at, and there was no other solution existing," says Dave Zolotusky, Software Engineer, Infrastructure and Operations. "But then the community kind of caught up and surpassed us, even for tools that work at that scale." +{{< case-studies/quote + image="/images/case-studies/spotify/banner3.jpg" + author="Dave Zolotusky, Software Engineer, Infrastructure and Operations, Spotify" +>}} +"The community has been extremely helpful in getting us to work through all the technology much faster and much easier. And it's helped us validate all the things we're doing." +{{< /case-studies/quote >}} -
+

Another plus: "Kubernetes fit very nicely as a complement and now as a replacement to Helios, so we could have it running alongside Helios to mitigate the risks," says Chakrabarti. "During the migration, the services run on both, so we're not having to put all of our eggs in one basket until we can validate Kubernetes under a variety of load circumstances and stress circumstances."

-
-
- "It’s been surprisingly easy to get in touch with anybody we wanted to, to get expertise on any of the things we’re working with. And it’s helped us validate all the things we’re doing." -

- James Wen, Site Reliability Engineer, Spotify
-
+

The team spent much of 2018 addressing the core technology issues required for the migration. "We were able to use a lot of the Kubernetes APIs and extensibility features of Kubernetes to support and interface with our legacy infrastructure, so the integration was straightforward and easy," says Site Reliability Engineer James Wen.

-
- Both of those technologies are in early stages of adoption, but already "we have reason to believe that gRPC will have a more drastic impact during early development by helping with a lot of issues like schema management, API design, weird backward compatibility issues, things like that," says Zolotusky. "So we’re leaning heavily on gRPC to help us in that space."

- As the team continues to fill out Spotify’s cloud native stack—tracing is up next—it is using the CNCF landscape as a helpful guide. "We look at things we need to solve, and if there are a bunch of projects, we evaluate them equivalently, but there is definitely value to the project being a CNCF project," says Zolotusky.

- Spotify’s experiences so far with Kubernetes bears this out. "The community has been extremely helpful in getting us to work through all the technology much faster and much easier," Zolotusky says. "It’s been surprisingly easy to get in touch with anybody we wanted to, to get expertise on any of the things we’re working with. And it’s helped us validate all the things we’re doing." +

Migration started late that year and has accelerated in 2019. "Our focus is really on stateless services, and once we address our last remaining technology blocker, that's where we hope that the uptick will come from," says Chakrabarti. "For stateful services there's more work that we need to do."

-
-
- - +

A small percentage of Spotify's fleet, containing over 150 services, has been migrated to Kubernetes so far. "We've heard from our customers that they have less of a need to focus on manual capacity provisioning and more time to focus on delivering features for Spotify," says Chakrabarti. The biggest service currently running on Kubernetes takes over 10 million requests per second as an aggregate service and benefits greatly from autoscaling, says Wen. Plus, Wen adds, "Before, teams would have to wait for an hour to create a new service and get an operational host to run it in production, but with Kubernetes, they can do that on the order of seconds and minutes." In addition, with Kubernetes's bin-packing and multi-tenancy capabilities, CPU utilization has improved on average two- to threefold.

+ +{{< case-studies/quote + image="/images/case-studies/spotify/banner4.jpg" + author="James Wen, Site Reliability Engineer, Spotify" +>}} +"We were able to use a lot of the Kubernetes APIs and extensibility features to support and interface with our legacy infrastructure, so the integration was straightforward and easy." +{{< /case-studies/quote >}} + +

Chakrabarti points out that for all four of the top-level metrics that Spotify looks at—lead time, deployment frequency, time to resolution, and operational load—"there is impact that Kubernetes is having."

+ +

One success story that's come out of the early days of Kubernetes is a tool called Slingshot that a Spotify team built on Kubernetes. "With a pull request, it creates a temporary staging environment that self destructs after 24 hours," says Chakrabarti. "It's all facilitated by Kubernetes, so that's kind of an exciting example of how, once the technology is out there and ready to use, people start to build on top of it and craft their own solutions, even beyond what we might have envisioned as the initial purpose of it."

+ +

Spotify has also started to use gRPC and Envoy, replacing existing homegrown solutions, just as it had with Kubernetes. "We created things because of the scale we were at, and there was no other solution existing," says Dave Zolotusky, Software Engineer, Infrastructure and Operations. "But then the community kind of caught up and surpassed us, even for tools that work at that scale."

+ +{{< case-studies/quote author="James Wen, Site Reliability Engineer, Spotify" >}} +"It's been surprisingly easy to get in touch with anybody we wanted to, to get expertise on any of the things we're working with. And it's helped us validate all the things we're doing." +{{< /case-studies/quote >}} + +

Both of those technologies are in early stages of adoption, but already "we have reason to believe that gRPC will have a more drastic impact during early development by helping with a lot of issues like schema management, API design, weird backward compatibility issues, things like that," says Zolotusky. "So we're leaning heavily on gRPC to help us in that space."

+ +

As the team continues to fill out Spotify's cloud native stack—tracing is up next—it is using the CNCF landscape as a helpful guide. "We look at things we need to solve, and if there are a bunch of projects, we evaluate them equivalently, but there is definitely value to the project being a CNCF project," says Zolotusky.

+ +

Spotify's experiences so far with Kubernetes bears this out. "The community has been extremely helpful in getting us to work through all the technology much faster and much easier," Zolotusky says. "It's been surprisingly easy to get in touch with anybody we wanted to, to get expertise on any of the things we're working with. And it's helped us validate all the things we're doing."

\ No newline at end of file diff --git a/content/en/case-studies/squarespace/index.html b/content/en/case-studies/squarespace/index.html index 27340835f4..461e466d8c 100644 --- a/content/en/case-studies/squarespace/index.html +++ b/content/en/case-studies/squarespace/index.html @@ -2,100 +2,70 @@ title: Squarespace Case Study case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css + +new_case_study_styles: true +heading_background: /images/case-studies/squarespace/banner1.jpg +heading_title_logo: /images/squarespace_logo.png +subheading: > + Squarespace: Gaining Productivity and Resilience with Kubernetes +case_study_details: + - Company: Squarespace + - Location: New York, N.Y. + - Industry: Software as a Service, Website-Building Platform --- -
-

CASE STUDY:
Squarespace: Gaining Productivity and Resilience with Kubernetes -

+

Challenge

-
+

Moving from a monolith to microservices in 2014 "solved a problem on the development side, but it pushed that problem to the infrastructure team," says Kevin Lynch, Staff Engineer on the Site Reliability team at Squarespace. "The infrastructure deployment process on our 5,000 VM hosts was slowing everyone down."

-
- Company  Squarespace     Location  New York, N.Y.     Industry  Software as a Service, Website-Building Platform -
- -
-
-
-
-

Challenge

- Moving from a monolith to microservices in 2014 "solved a problem on the development side, but it pushed that problem to the infrastructure team," says Kevin Lynch, Staff Engineer on the Site Reliability team at Squarespace. "The infrastructure deployment process on our 5,000 VM hosts was slowing everyone down." -

Solution

-The team experimented with container orchestration platforms, and found that Kubernetes "answered all the questions that we had," says Lynch. The company began running Kubernetes in its data centers in 2016. -
- -
+

The team experimented with container orchestration platforms, and found that Kubernetes "answered all the questions that we had," says Lynch. The company began running Kubernetes in its data centers in 2016.

Impact

-Since Squarespace moved to Kubernetes, in conjunction with modernizing its networking stack, deployment time has been reduced by almost 85%. Before, their VM deployment would take half an hour; now, says Lynch, "someone can generate a templated application, deploy it within five minutes, and have actual instances containerized, running in our staging environment at that point." Because of that, "productivity time is the big cost saver," he adds. "When we started the Kubernetes project, we had probably a dozen microservices. Today there are twice that in the pipeline being actively worked on." Resilience has also been improved with Kubernetes: "If a node goes down, it’s rescheduled immediately and there’s no performance impact." -
+

Since Squarespace moved to Kubernetes, in conjunction with modernizing its networking stack, deployment time has been reduced by almost 85%. Before, their VM deployment would take half an hour; now, says Lynch, "someone can generate a templated application, deploy it within five minutes, and have actual instances containerized, running in our staging environment at that point." Because of that, "productivity time is the big cost saver," he adds. "When we started the Kubernetes project, we had probably a dozen microservices. Today there are twice that in the pipeline being actively worked on." Resilience has also been improved with Kubernetes: "If a node goes down, it's rescheduled immediately and there's no performance impact."

-
-
-
-
- -

"Once you prove that Kubernetes solves one problem, everyone immediately starts solving other problems without you even having to evangelize it." -

— Kevin Lynch, Staff Engineer on the Site Reliability team at Squarespace
-
-
-
-
-

Since it was started in a dorm room in 2003, Squarespace has made it simple for millions of people to create their own websites.

Behind the scenes, though, the company’s monolithic Java application was making things not so simple for its developers to keep improving the platform. So in 2014, the company decided to "go down the microservices path," says Kevin Lynch, staff engineer on Squarespace’s Site Reliability team. "But we were always deploying our applications in vCenter VMware VMs [in our own data centers]. Microservices solved a problem on the development side, but it pushed that problem to the Infrastructure team. The infrastructure deployment process on our 5,000 VM hosts was slowing everyone down."

- After experimenting with another container orchestration platform and "breaking it in very painful ways," Lynch says, the team began experimenting with Kubernetes in mid-2016 and found that it "answered all the questions that we had." Deploying it in the data center rather than the public cloud was their biggest challenge, and at the time, not a lot of other companies were doing that. "We had to figure out how to deploy this in our infrastructure for ourselves, and we had to integrate it with our other applications," says Lynch.

- At the same time, Squarespace’s Network Engineering team was modernizing its networking stack, switching from a traditional layer-two network to a layer-three spine-and-leaf network. "It mapped beautifully with what we wanted to do with Kubernetes," says Lynch. "It gives us the ability to have our servers communicate directly with the top-of-rack switches. We use Calico for CNI networking for Kubernetes, so we can announce all these individual Kubernetes pod IP addresses and have them integrate seamlessly with our other services that are still provisioned in the VMs." +{{< case-studies/quote author="Kevin Lynch, Staff Engineer on the Site Reliability team at Squarespace" >}} + +
+"Once you prove that Kubernetes solves one problem, everyone immediately starts solving other problems without you even having to evangelize it." +{{< /case-studies/quote >}} -
-
-
-
- After experimenting with another container orchestration platform and "breaking it in very painful ways," Lynch says, the team began experimenting with Kubernetes in mid-2016 and found that it "answered all the questions that we had." +{{< case-studies/lead >}} +Since it was started in a dorm room in 2003, Squarespace has made it simple for millions of people to create their own websites. +{{< /case-studies/lead >}} -
-
-
-
- Within a couple months, they had a stable cluster for their internal use, and began rolling out Kubernetes for production. They also added Zipkin and CNCF projects Prometheus and fluentd to their cloud native stack. "We switched to Kubernetes, a new world, and we revamped all our other tooling as well," says Lynch. "It allowed us to streamline our process, so we can now easily create an entire microservice project from templates, generate the code and deployment pipeline for that, generate the Docker file, and then immediately just ship a workable, deployable project to Kubernetes." Deployments across Dev/QA/Stage/Prod were also "simplified drastically," Lynch adds. "Now there is little configuration variation." -

- And the whole process takes only five minutes, an almost 85% reduction in time compared to their VM deployment. "From end to end that probably took half an hour, and that’s not accounting for the fact that an infrastructure engineer would be responsible for doing that, so there’s some business delay in there as well." -

- With faster deployments, "productivity time is the big cost saver," says Lynch. "We had a team that was implementing a new file storage service, and they just started integrating that with our storage back end without our involvement"—which wouldn’t have been possible before Kubernetes. He adds: "When we started the Kubernetes project, we had probably a dozen microservices. Today there are twice that in the pipeline being actively worked on." +

Behind the scenes, though, the company's monolithic Java application was making things not so simple for its developers to keep improving the platform. So in 2014, the company decided to "go down the microservices path," says Kevin Lynch, staff engineer on Squarespace's Site Reliability team. "But we were always deploying our applications in vCenter VMware VMs [in our own data centers]. Microservices solved a problem on the development side, but it pushed that problem to the Infrastructure team. The infrastructure deployment process on our 5,000 VM hosts was slowing everyone down."

+

After experimenting with another container orchestration platform and "breaking it in very painful ways," Lynch says, the team began experimenting with Kubernetes in mid-2016 and found that it "answered all the questions that we had." Deploying it in the data center rather than the public cloud was their biggest challenge, and at the time, not a lot of other companies were doing that. "We had to figure out how to deploy this in our infrastructure for ourselves, and we had to integrate it with our other applications," says Lynch.

-
-
-
-
- "We switched to Kubernetes, a new world....It allowed us to streamline our process, so we can now easily create an entire microservice project from templates," Lynch says. And the whole process takes only five minutes, an almost 85% reduction in time compared to their VM deployment. -
-
+

At the same time, Squarespace's Network Engineering team was modernizing its networking stack, switching from a traditional layer-two network to a layer-three spine-and-leaf network. "It mapped beautifully with what we wanted to do with Kubernetes," says Lynch. "It gives us the ability to have our servers communicate directly with the top-of-rack switches. We use Calico for CNI networking for Kubernetes, so we can announce all these individual Kubernetes pod IP addresses and have them integrate seamlessly with our other services that are still provisioned in the VMs."

-
-
- There’s also been a positive impact on the application’s resilience. "When we’re deploying VMs, we have to build tooling to ensure that a service is spread across racks appropriately and can withstand failure," he says. "Kubernetes just does it. If a node goes down, it’s rescheduled immediately and there’s no performance impact." -

- Another big benefit is autoscaling. "It wasn’t really possible with the way we’ve been using VMware," says Lynch, "but now we can just add the appropriate autoscaling features via Kubernetes directly, and boom, it’s scaling up as demand increases. And it worked out of the box." -

- For others starting out with Kubernetes, Lynch says his best advice is to "fail fast": "Once you’ve planned things out, just execute. Kubernetes has been really great for trying something out quickly and seeing if it works or not." +{{< case-studies/quote image="/images/case-studies/squarespace/banner3.jpg" >}} +After experimenting with another container orchestration platform and "breaking it in very painful ways," Lynch says, the team began experimenting with Kubernetes in mid-2016 and found that it "answered all the questions that we had." +{{< /case-studies/quote >}} +

Within a couple months, they had a stable cluster for their internal use, and began rolling out Kubernetes for production. They also added Zipkin and CNCF projects Prometheus and fluentd to their cloud native stack. "We switched to Kubernetes, a new world, and we revamped all our other tooling as well," says Lynch. "It allowed us to streamline our process, so we can now easily create an entire microservice project from templates, generate the code and deployment pipeline for that, generate the Docker file, and then immediately just ship a workable, deployable project to Kubernetes." Deployments across Dev/QA/Stage/Prod were also "simplified drastically," Lynch adds. "Now there is little configuration variation."

-
+

And the whole process takes only five minutes, an almost 85% reduction in time compared to their VM deployment. "From end to end that probably took half an hour, and that's not accounting for the fact that an infrastructure engineer would be responsible for doing that, so there's some business delay in there as well."

-
-
- "When we’re deploying VMs, we have to build tooling to ensure that a service is spread across racks appropriately and can withstand failure," he says. "Kubernetes just does it. If a node goes down, it’s rescheduled immediately and there’s no performance impact." -
-
+

With faster deployments, "productivity time is the big cost saver," says Lynch. "We had a team that was implementing a new file storage service, and they just started integrating that with our storage back end without our involvement"—which wouldn't have been possible before Kubernetes. He adds: "When we started the Kubernetes project, we had probably a dozen microservices. Today there are twice that in the pipeline being actively worked on."

-
- Lynch and his team are planning to open source some of the tools they’ve developed to extend Kubernetes and use it as an API itself. The first tool injects dependent applications as containers in a pod. "When you ship an application, usually it comes along with a whole bunch of dependent applications that need to be shipped with that, for example, fluentd for logging," he explains. With this tool, the developer doesn’t need to worry about the configurations. -

- Going forward, all new services at Squarespace are going into Kubernetes, and the end goal is to convert everything it can. About a quarter of existing services have been migrated. "Our monolithic application is going to be the last one, just because it’s so big and complex," says Lynch. "But now I’m seeing other services get moved over, like the file storage service. Someone just did it and it worked—painlessly. So I believe if we tackle it, it’s probably going to be a lot easier than we fear. Maybe I should just take my own advice and fail fast!" +{{< case-studies/quote image="/images/case-studies/squarespace/banner4.jpg" >}} +"We switched to Kubernetes, a new world....It allowed us to streamline our process, so we can now easily create an entire microservice project from templates," Lynch says. And the whole process takes only five minutes, an almost 85% reduction in time compared to their VM deployment. +{{< /case-studies/quote >}} -
+

There's also been a positive impact on the application's resilience. "When we're deploying VMs, we have to build tooling to ensure that a service is spread across racks appropriately and can withstand failure," he says. "Kubernetes just does it. If a node goes down, it's rescheduled immediately and there's no performance impact."

-
+

Another big benefit is autoscaling. "It wasn't really possible with the way we've been using VMware," says Lynch, "but now we can just add the appropriate autoscaling features via Kubernetes directly, and boom, it's scaling up as demand increases. And it worked out of the box."

+ +

For others starting out with Kubernetes, Lynch says his best advice is to "fail fast": "Once you've planned things out, just execute. Kubernetes has been really great for trying something out quickly and seeing if it works or not."

+ +{{< case-studies/quote >}} +"When we're deploying VMs, we have to build tooling to ensure that a service is spread across racks appropriately and can withstand failure," he says. "Kubernetes just does it. If a node goes down, it's rescheduled immediately and there's no performance impact." +{{< /case-studies/quote >}} + +

Lynch and his team are planning to open source some of the tools they've developed to extend Kubernetes and use it as an API itself. The first tool injects dependent applications as containers in a pod. "When you ship an application, usually it comes along with a whole bunch of dependent applications that need to be shipped with that, for example, fluentd for logging," he explains. With this tool, the developer doesn't need to worry about the configurations.

+ +

Going forward, all new services at Squarespace are going into Kubernetes, and the end goal is to convert everything it can. About a quarter of existing services have been migrated. "Our monolithic application is going to be the last one, just because it's so big and complex," says Lynch. "But now I'm seeing other services get moved over, like the file storage service. Someone just did it and it worked—painlessly. So I believe if we tackle it, it's probably going to be a lot easier than we fear. Maybe I should just take my own advice and fail fast!"

diff --git a/content/en/case-studies/thredup/index.html b/content/en/case-studies/thredup/index.html index ad990356ff..fb2cae26d5 100644 --- a/content/en/case-studies/thredup/index.html +++ b/content/en/case-studies/thredup/index.html @@ -3,92 +3,75 @@ title: ThredUp Case Study linkTitle: thredup case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css featured: false + +new_case_study_styles: true +heading_background: /images/case-studies/thredup/banner1.jpg +heading_title_logo: /images/thredup_logo.png +case_study_details: + - Company: ThredUp + - Location: San Francisco, CA + - Industry: eCommerce --- -
-

CASE STUDY:

-

-
+

Challenge

-
- Company  ThredUp     Location  San Francisco, CA     Industry  eCommerce -
+

The largest online consignment store for women's and children's clothes, ThredUP launched in 2009 with a monolithic application running on Amazon Web Services. Though the company began breaking up the monolith into microservices a few years ago, the infrastructure team was still dealing with handcrafted servers, which hampered productivity. "We've configured them just to get them out as fast as we could, but there was no standardization, and as we kept growing, that became a bigger and bigger chore to manage," says Cofounder/CTO Chris Homer. The infrastructure, they realized, needed to be modernized to enable the velocity the company needed. "It's really important to a company like us who's disrupting the retail industry to make sure that as we're building software and getting it out in front of our users, we can do it on a fast cycle and learn a ton as we experiment," adds Homer. "We wanted to make sure that our engineers could embrace the DevOps mindset as they built software. It was really important to us that they could own the life cycle from end to end, from conception at design, through shipping it and running it in production, from marketing to ecommerce, the user experience and our internal distribution center operations."

-
-
-
-
-

Challenge

- The largest online consignment store for women’s and children’s clothes, ThredUP launched in 2009 with a monolithic application running on Amazon Web Services. Though the company began breaking up the monolith into microservices a few years ago, the infrastructure team was still dealing with handcrafted servers, which hampered productivity. "We’ve configured them just to get them out as fast as we could, but there was no standardization, and as we kept growing, that became a bigger and bigger chore to manage," says Cofounder/CTO Chris Homer. The infrastructure, they realized, needed to be modernized to enable the velocity the company needed. "It’s really important to a company like us who’s disrupting the retail industry to make sure that as we’re building software and getting it out in front of our users, we can do it on a fast cycle and learn a ton as we experiment," adds Homer. "We wanted to make sure that our engineers could embrace the DevOps mindset as they built software. It was really important to us that they could own the life cycle from end to end, from conception at design, through shipping it and running it in production, from marketing to ecommerce, the user experience and our internal distribution center operations." -

+

Solution

+ +

In early 2017, the company adopted Kubernetes for container orchestration, and in the course of a year, the entire infrastructure was moved to Kubernetes.

-

Solution

- In early 2017, the company adopted Kubernetes for container orchestration, and in the course of a year, the entire infrastructure was moved to Kubernetes. -

Impact

- Before, "even considering that we already have all the infrastructure in the cloud, databases and services, and all these good things," says Infrastructure Engineer Oleksandr Snagovskyi, setting up a new service meant waiting 2-4 weeks just to get the environment. With Kubernetes, new application roll-out time has decreased from several days or weeks to minutes or hours. Now, says Infrastructure Engineer Oleksii Asiutin, "our developers can experiment with existing applications and create new services, and do it all blazingly fast." In fact, deployment time has decreased about 50% on average for key services. "Lead time" for all applications is under 20 minutes, enabling engineers to deploy multiple times a day. Plus, 3200+ ansible scripts have been deprecated in favor of helm charts. And impressively, hardware cost has decreased 56% while the number of services ThredUP runs has doubled. -
-
-
-
-
-

- "Moving towards cloud native technologies like Kubernetes really unlocks our ability to experiment quickly and learn from customers along the way." -

- CHRIS HOMER, COFOUNDER/CTO, THREDUP
-
-
-
-
-

The largest online consignment store for women’s and children’s clothes, ThredUP is focused on getting consumers to think second-hand first. "We’re disrupting the retail industry, and it’s really important to us to make sure that as we’re building software and getting it out in front of our users, we can do it on a fast cycle and learn a ton as we experiment," says Cofounder/CTO Chris Homer. -

- But over the past few years, ThredUP, which was launched in 2009 with a monolithic application running on Amazon Web Services, was feeling growing pains as its user base passed the 20- million mark. Though the company had begun breaking up the monolith into microservices, the infrastructure team was still dealing with handcrafted servers, which hampered productivity. "We’ve configured them just to get them out as fast as we could, but there was no standardization, and as we kept growing, that became a bigger and bigger chore to manage," says Homer. The infrastructure, Homer realized, needed to be modernized to enable the velocity—and the culture—the company wanted. -

- "We wanted to make sure that our engineers could embrace the DevOps mindset as they built software," Homer says. "It was really important to us that they could own the life cycle from end to end, from conception at design, through shipping it and running it in production, from marketing to ecommerce, the user experience and our internal distribution center operations." -
-
-
-
- "Kubernetes enabled auto scaling in a seamless and easily manageable way on days like Black Friday. We no longer have to sit there adding instances, monitoring the traffic, doing a lot of manual work."

- CHRIS HOMER, COFOUNDER/CTO, THREDUP
+

Before, "even considering that we already have all the infrastructure in the cloud, databases and services, and all these good things," says Infrastructure Engineer Oleksandr Snagovskyi, setting up a new service meant waiting 2-4 weeks just to get the environment. With Kubernetes, new application roll-out time has decreased from several days or weeks to minutes or hours. Now, says Infrastructure Engineer Oleksii Asiutin, "our developers can experiment with existing applications and create new services, and do it all blazingly fast." In fact, deployment time has decreased about 50% on average for key services. "Lead time" for all applications is under 20 minutes, enabling engineers to deploy multiple times a day. Plus, 3200+ ansible scripts have been deprecated in favor of helm charts. And impressively, hardware cost has decreased 56% while the number of services ThredUP runs has doubled.

-
-
-
-
- In early 2017, Homer found the solution with Kubernetes container orchestration. In the course of a year, the company migrated its entire infrastructure to Kubernetes, starting with its website applications and concluding with its operations backend. Teams are now also using Fluentd and Helm. "Initially there were skeptics about the value that this move to cloud native technologies would bring, but as we went through the process, people very quickly started to realize the benefit of having seamless upgrades and easy rollbacks without having to worry about what was happening," says Homer. "It unlocks the developers’ confidence in being able to deploy quickly, learn, and if you make a mistake, you can roll it back without any issue." -

- According to the infrastructure team, the key improvement was the consistent experience Kubernetes enabled for developers. "It lets developers work in the same environment that their application will be running in production," says Infrastructure Engineer Oleksandr Snagovskyi. Plus, "It became easier to test, easier to refine, and easier to deploy, because everything’s done automatically," says Infrastructure Engineer Oleksii Asiutin. "One of the main goals of our team is to make developers’ lives more comfortable, and we are achieving this with Kubernetes. They can experiment with existing applications and create new services, and do it all blazingly fast." -
-
-
-
- "One of the main goals of our team is to make developers’ lives more comfortable, and we are achieving this with Kubernetes. They can experiment with existing applications and create new services, and do it all blazingly fast."

- OLEKSII ASIUTIN, INFRASTRUCTURE ENGINEER, THREDUP
-
-
+{{< case-studies/quote author="CHRIS HOMER, COFOUNDER/CTO, THREDUP" >}} + +
+"Moving towards cloud native technologies like Kubernetes really unlocks our ability to experiment quickly and learn from customers along the way." +{{< /case-studies/quote >}} -
-
- Before, "even considering that we already have all the infrastructure in the cloud, databases and services, and all these good things," says Snagovskyi, setting up a new service meant waiting 2-4 weeks just to get the environment. With Kubernetes, because of simple configuration and minimal dependency on the infrastructure team, the roll-out time for new applications has decreased from several days or weeks to minutes or hours. -

- In fact, deployment time has decreased about 50% on average for key services. "Fast deployment and parallel test execution in Kubernetes keep a ‘lead time’ for all applications under 20 minutes," allowing engineers to do multiple releases a day, says Director of Infrastructure Roman Chepurnyi. The infrastructure team’s jobs, he adds, have become less burdensome, too: "We can execute seamless upgrades frequently and keep cluster performance and security up-to-date because OS-level hardening and upgrades of a Kubernetes cluster is a non-blocking activity for production operations and does not involve coordination with multiple engineering teams." -

- More than 3,200 ansible scripts have been deprecated in favor of Helm charts. And impressively, hardware cost has decreased 56% while the number of services ThredUP runs has doubled. -
+{{< case-studies/lead >}} +The largest online consignment store for women's and children's clothes, ThredUP is focused on getting consumers to think second-hand first. "We're disrupting the retail industry, and it's really important to us to make sure that as we're building software and getting it out in front of our users, we can do it on a fast cycle and learn a ton as we experiment," says Cofounder/CTO Chris Homer. +{{< /case-studies/lead >}} -
-
-"Our future’s all about automation, and behind that, cloud native technologies are going to unlock our ability to embrace that and go full force towards the future."

- CHRIS HOMER, COFOUNDER/CTO, THREDUP
-
+

But over the past few years, ThredUP, which was launched in 2009 with a monolithic application running on Amazon Web Services, was feeling growing pains as its user base passed the 20- million mark. Though the company had begun breaking up the monolith into microservices, the infrastructure team was still dealing with handcrafted servers, which hampered productivity. "We've configured them just to get them out as fast as we could, but there was no standardization, and as we kept growing, that became a bigger and bigger chore to manage," says Homer. The infrastructure, Homer realized, needed to be modernized to enable the velocity—and the culture—the company wanted.

-
- Perhaps the impact is most evident on the busiest days in retail. "Kubernetes enabled auto scaling in a seamless and easily manageable way on days like Black Friday," says Homer. "We no longer have to sit there adding instances, monitoring the traffic, doing a lot of manual work. That’s handled for us, and instead we can actually have some turkey, drink some wine and enjoy our families." -

- For ThredUP, Kubernetes fits perfectly with the company’s vision for how it’s changing retail. Some of what ThredUP does is still very manual: "As our customers send bags of items to our distribution centers, they’re photographed, inspected, tagged, and put online today," says Homer. -

- But in every other aspect, "we use different forms of technology to drive everything we do," Homer says. "We have machine learning algorithms to help predict the likelihood of sale for items, which drives our pricing algorithm. We have personalization algorithms that look at the images and try to determine style and match users’ preferences across our systems." -

- Count Kubernetes as one of those drivers. "Our future’s all about automation," says Homer, "and behind that, cloud native technologies are going to unlock our ability to embrace that and go full force towards the future." -
-
+

"We wanted to make sure that our engineers could embrace the DevOps mindset as they built software," Homer says. "It was really important to us that they could own the life cycle from end to end, from conception at design, through shipping it and running it in production, from marketing to ecommerce, the user experience and our internal distribution center operations."

+ +{{< case-studies/quote + image="/images/case-studies/thredup/banner3.jpg" + author="CHRIS HOMER, COFOUNDER/CTO, THREDUP" +>}} +"Kubernetes enabled auto scaling in a seamless and easily manageable way on days like Black Friday. We no longer have to sit there adding instances, monitoring the traffic, doing a lot of manual work." +{{< /case-studies/quote >}} + +

In early 2017, Homer found the solution with Kubernetes container orchestration. In the course of a year, the company migrated its entire infrastructure to Kubernetes, starting with its website applications and concluding with its operations backend. Teams are now also using Fluentd and Helm. "Initially there were skeptics about the value that this move to cloud native technologies would bring, but as we went through the process, people very quickly started to realize the benefit of having seamless upgrades and easy rollbacks without having to worry about what was happening," says Homer. "It unlocks the developers' confidence in being able to deploy quickly, learn, and if you make a mistake, you can roll it back without any issue."

+ +

According to the infrastructure team, the key improvement was the consistent experience Kubernetes enabled for developers. "It lets developers work in the same environment that their application will be running in production," says Infrastructure Engineer Oleksandr Snagovskyi. Plus, "It became easier to test, easier to refine, and easier to deploy, because everything's done automatically," says Infrastructure Engineer Oleksii Asiutin. "One of the main goals of our team is to make developers' lives more comfortable, and we are achieving this with Kubernetes. They can experiment with existing applications and create new services, and do it all blazingly fast."

+ +{{< case-studies/quote + image="/images/case-studies/thredup/banner4.jpg" + author="OLEKSII ASIUTIN, INFRASTRUCTURE ENGINEER, THREDUP" +>}} +"One of the main goals of our team is to make developers' lives more comfortable, and we are achieving this with Kubernetes. They can experiment with existing applications and create new services, and do it all blazingly fast." +{{< /case-studies/quote >}} + +

Before, "even considering that we already have all the infrastructure in the cloud, databases and services, and all these good things," says Snagovskyi, setting up a new service meant waiting 2-4 weeks just to get the environment. With Kubernetes, because of simple configuration and minimal dependency on the infrastructure team, the roll-out time for new applications has decreased from several days or weeks to minutes or hours.

+ +

In fact, deployment time has decreased about 50% on average for key services. "Fast deployment and parallel test execution in Kubernetes keep a 'lead time' for all applications under 20 minutes," allowing engineers to do multiple releases a day, says Director of Infrastructure Roman Chepurnyi. The infrastructure team's jobs, he adds, have become less burdensome, too: "We can execute seamless upgrades frequently and keep cluster performance and security up-to-date because OS-level hardening and upgrades of a Kubernetes cluster is a non-blocking activity for production operations and does not involve coordination with multiple engineering teams."

+ +

More than 3,200 ansible scripts have been deprecated in favor of Helm charts. And impressively, hardware cost has decreased 56% while the number of services ThredUP runs has doubled.

+ +{{< case-studies/quote author="CHRIS HOMER, COFOUNDER/CTO, THREDUP">}} +"Our future's all about automation, and behind that, cloud native technologies are going to unlock our ability to embrace that and go full force towards the future." +{{< /case-studies/quote >}} + +

Perhaps the impact is most evident on the busiest days in retail. "Kubernetes enabled auto scaling in a seamless and easily manageable way on days like Black Friday," says Homer. "We no longer have to sit there adding instances, monitoring the traffic, doing a lot of manual work. That's handled for us, and instead we can actually have some turkey, drink some wine and enjoy our families."

+ +

For ThredUP, Kubernetes fits perfectly with the company's vision for how it's changing retail. Some of what ThredUP does is still very manual: "As our customers send bags of items to our distribution centers, they're photographed, inspected, tagged, and put online today," says Homer.

+ +

But in every other aspect, "we use different forms of technology to drive everything we do," Homer says. "We have machine learning algorithms to help predict the likelihood of sale for items, which drives our pricing algorithm. We have personalization algorithms that look at the images and try to determine style and match users' preferences across our systems."

+ +

Count Kubernetes as one of those drivers. "Our future's all about automation," says Homer, "and behind that, cloud native technologies are going to unlock our ability to embrace that and go full force towards the future."

diff --git a/content/en/case-studies/vsco/index.html b/content/en/case-studies/vsco/index.html index c2ac2a2a72..95d8ae011c 100644 --- a/content/en/case-studies/vsco/index.html +++ b/content/en/case-studies/vsco/index.html @@ -3,95 +3,77 @@ title: vsco Case Study linkTitle: vsco case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css featured: false + +new_case_study_styles: true +heading_background: /images/case-studies/vsco/banner1.jpg +heading_title_logo: /images/vsco_logo.png +subheading: > + VSCO: How a Mobile App Saved 70% on Its EC2 Bill with Cloud Native +case_study_details: + - Company: VSCO + - Location: Oakland, CA + - Industry: Photo Mobile App --- -
-

CASE STUDY:
VSCO: How a Mobile App Saved 70% on Its EC2 Bill with Cloud Native -

+

Challenge

-
+

After moving from Rackspace to AWS in 2015, VSCO began building Node.js and Go microservices in addition to running its PHP monolith. The team containerized the microservices using Docker, but "they were all in separate groups of EC2 instances that were dedicated per service," says Melinda Lu, Engineering Manager for the Machine Learning Team. Adds Naveen Gattu, Senior Software Engineer on the Community Team: "That yielded a lot of wasted resources. We started looking for a way to consolidate and be more efficient in the AWS EC2 instances."

-
- Company  VSCO     Location  Oakland, CA     Industry  Photo Mobile App -
+

Solution

+ +

The team began exploring the idea of a scheduling system, and looked at several solutions including Mesos and Swarm before deciding to go with Kubernetes. VSCO also uses gRPC and Envoy in their cloud native stack.

-
-
-
-
-

Challenge

- After moving from Rackspace to AWS in 2015, VSCO began building Node.js and Go microservices in addition to running its PHP monolith. The team containerized the microservices using Docker, but "they were all in separate groups of EC2 instances that were dedicated per service," says Melinda Lu, Engineering Manager for the Machine Learning Team. Adds Naveen Gattu, Senior Software Engineer on the Community Team: "That yielded a lot of wasted resources. We started looking for a way to consolidate and be more efficient in the AWS EC2 instances." -

Solution

- The team began exploring the idea of a scheduling system, and looked at several solutions including Mesos and Swarm before deciding to go with Kubernetes. VSCO also uses gRPC and Envoy in their cloud native stack. -

Impact

- Before, deployments required "a lot of manual tweaking, in-house scripting that we wrote, and because of our disparate EC2 instances, Operations had to babysit the whole thing from start to finish," says Senior Software Engineer Brendan Ryan. "We didn't really have a story around testing in a methodical way, and using reusable containers or builds in a standardized way." There's a faster onboarding process now. Before, the time to first deploy was two days' hands-on setup time; now it's two hours. By moving to continuous integration, containerization, and Kubernetes, velocity was increased dramatically. The time from code-complete to deployment in production on real infrastructure went from one to two weeks to two to four hours for a typical service. Adds Gattu: "In man hours, that's one person versus a developer and a DevOps individual at the same time." With an 80% decrease in time for a single deployment to happen in production, the number of deployments has increased as well, from 1200/year to 3200/year. There have been real dollar savings too: With Kubernetes, VSCO is running at 2x to 20x greater EC2 efficiency, depending on the service, adding up to about 70% overall savings on the company's EC2 bill. Ryan points to the company's ability to go from managing one large monolithic application to 50+ microservices with "the same size developer team, more or less. And we've only been able to do that because we have increased trust in our tooling and a lot more flexibility, so we don't need to employ a DevOps engineer to tune every service." With Kubernetes, gRPC, and Envoy in place, VSCO has seen an 88% reduction in total minutes of outage time, mainly due to the elimination of JSON-schema errors and service-specific infrastructure provisioning errors, and an increased speed in fixing outages. -
+

Before, deployments required "a lot of manual tweaking, in-house scripting that we wrote, and because of our disparate EC2 instances, Operations had to babysit the whole thing from start to finish," says Senior Software Engineer Brendan Ryan. "We didn't really have a story around testing in a methodical way, and using reusable containers or builds in a standardized way." There's a faster onboarding process now. Before, the time to first deploy was two days' hands-on setup time; now it's two hours. By moving to continuous integration, containerization, and Kubernetes, velocity was increased dramatically. The time from code-complete to deployment in production on real infrastructure went from one to two weeks to two to four hours for a typical service. Adds Gattu: "In man hours, that's one person versus a developer and a DevOps individual at the same time." With an 80% decrease in time for a single deployment to happen in production, the number of deployments has increased as well, from 1200/year to 3200/year. There have been real dollar savings too: With Kubernetes, VSCO is running at 2x to 20x greater EC2 efficiency, depending on the service, adding up to about 70% overall savings on the company's EC2 bill. Ryan points to the company's ability to go from managing one large monolithic application to 50+ microservices with "the same size developer team, more or less. And we've only been able to do that because we have increased trust in our tooling and a lot more flexibility, so we don't need to employ a DevOps engineer to tune every service." With Kubernetes, gRPC, and Envoy in place, VSCO has seen an 88% reduction in total minutes of outage time, mainly due to the elimination of JSON-schema errors and service-specific infrastructure provisioning errors, and an increased speed in fixing outages.

-
-
-
-
- "I've been really impressed seeing how our engineers have come up with creative solutions to things by just combining a lot of Kubernetes primitives. Exposing Kubernetes constructs as a service to our engineers as opposed to exposing higher order constructs has worked well for us. It lets you get familiar with the technology and do more interesting things with it." -

- MELINDA LU, ENGINEERING MANAGER FOR VSCO'S MACHINE LEARNING TEAM
-
-
-
-
-

A photography app for mobile, VSCO was born in the cloud in 2011. In the beginning, "we were using Rackspace and had one PHP monolith application talking to MySQL database, with FTP deployments, no containerization, no orchestration," says Software Engineer Brendan Ryan, "which was sufficient at the time."

- After VSCO moved to AWS in 2015 and its user base passed the 30 million mark, the team quickly realized that set-up wouldn't work anymore. Developers had started building some Node and Go microservices, which the team tried containerizing with Docker. But "they were all in separate groups of EC2 instances that were dedicated per service," says Melinda Lu, Engineering Manager for the Machine Learning Team. Adds Naveen Gattu, Senior Software Engineer on the Community Team: "That yielded a lot of wasted resources. We started looking for a way to consolidate and be more efficient in the EC2 instances." -

- With a checklist that included ease of use and implementation, level of support, and whether it was open source, the team evaluated a few scheduling solutions, including Mesos and Swarm, before deciding to go with Kubernetes. "Kubernetes seemed to have the strongest open source community around it," says Lu. Plus, "We had started to standardize on a lot of the Google stack, with Go as a language, and gRPC for almost all communication between our own services inside the data center. So it seemed pretty natural for us to choose Kubernetes." +{{< case-studies/quote author="MELINDA LU, ENGINEERING MANAGER FOR VSCO'S MACHINE LEARNING TEAM" >}} +"I've been really impressed seeing how our engineers have come up with creative solutions to things by just combining a lot of Kubernetes primitives. Exposing Kubernetes constructs as a service to our engineers as opposed to exposing higher order constructs has worked well for us. It lets you get familiar with the technology and do more interesting things with it." +{{< /case-studies/quote >}} -
-
-
-
- "Kubernetes seemed to have the strongest open source community around it, plus, we had started to standardize on a lot of the Google stack, with Go as a language, and gRPC for almost all communication between our own services inside the data center. So it seemed pretty natural for us to choose Kubernetes."

- MELINDA LU, ENGINEERING MANAGER FOR VSCO'S MACHINE LEARNING TEAM
+{{< case-studies/lead >}} +A photography app for mobile, VSCO was born in the cloud in 2011. In the beginning, "we were using Rackspace and had one PHP monolith application talking to MySQL database, with FTP deployments, no containerization, no orchestration," says Software Engineer Brendan Ryan, "which was sufficient at the time." +{{< /case-studies/lead >}} -
-
-
-
- At the time, there were few managed Kubernetes offerings and less tooling available in the ecosystem, so the team stood up its own cluster and built some custom components for its specific deployment needs, such as an automatic ingress controller and policy constructs for canary deploys. "We had already begun breaking up the monolith, so we moved things one by one, starting with pretty small, low-risk services," says Lu. "Every single new service was deployed there." The first service was migrated at the end of 2016, and after one year, 80% of the entire stack was on Kubernetes, including the rest of the monolith. -

- The impact has been great. Deployments used to require "a lot of manual tweaking, in-house scripting that we wrote, and because of our disparate EC2 instances, Operations had to babysit the whole thing from start to finish," says Ryan. "We didn't really have a story around testing in a methodical way, and using reusable containers or builds in a standardized way." There's a faster onboarding process now. Before, the time to first deploy was two days' hands-on setup time; now it's two hours. -

- By moving to continuous integration, containerization, and Kubernetes, velocity was increased dramatically. The time from code-complete to deployment in production on real infrastructure went from one to two weeks to two to four hours for a typical service. Plus, says Gattu, "In man hours, that's one person versus a developer and a DevOps individual at the same time." With an 80% decrease in time for a single deployment to happen in production, the number of deployments has increased as well, from 1200/year to 3200/year. +

After VSCO moved to AWS in 2015 and its user base passed the 30 million mark, the team quickly realized that set-up wouldn't work anymore. Developers had started building some Node and Go microservices, which the team tried containerizing with Docker. But "they were all in separate groups of EC2 instances that were dedicated per service," says Melinda Lu, Engineering Manager for the Machine Learning Team. Adds Naveen Gattu, Senior Software Engineer on the Community Team: "That yielded a lot of wasted resources. We started looking for a way to consolidate and be more efficient in the EC2 instances."

-
-
-
-
- "I've been really impressed seeing how our engineers have come up with really creative solutions to things by just combining a lot of Kubernetes primitives, exposing Kubernetes constructs as a service to our engineers as opposed to exposing higher order constructs has worked well for us. It lets you get familiar with the technology and do more interesting things with it."

- MELINDA LU, ENGINEERING MANAGER FOR VSCO’S MACHINE LEARNING TEAM
-
-
+

With a checklist that included ease of use and implementation, level of support, and whether it was open source, the team evaluated a few scheduling solutions, including Mesos and Swarm, before deciding to go with Kubernetes. "Kubernetes seemed to have the strongest open source community around it," says Lu. Plus, "We had started to standardize on a lot of the Google stack, with Go as a language, and gRPC for almost all communication between our own services inside the data center. So it seemed pretty natural for us to choose Kubernetes."

-
-
- There have been real dollar savings too: With Kubernetes, VSCO is running at 2x to 20x greater EC2 efficiency, depending on the service, adding up to about 70% overall savings on the company’s EC2 bill. -

- Ryan points to the company’s ability to go from managing one large monolithic application to 50+ microservices with “the same size developer team, more or less. And we’ve only been able to do that because we have increased trust in our tooling and a lot more flexibility when there are stress points in our system. You can increase CPU memory requirements of a service without having to bring up and tear down instances, and read through AWS pages just to be familiar with a lot of jargon, which isn’t really tenable for a company at our scale.” -

- Envoy and gRPC have also had a positive impact at VSCO. “We get many benefits from gRPC out of the box: type safety across multiple languages, ease of defining services with the gRPC IDL, built-in architecture like interceptors, and performance improvements over HTTP/1.1 and JSON,” says Lu. -

- VSCO was one of the first users of Envoy, getting it in production five days after it was open sourced. “We wanted to serve gRPC and HTTP/2 directly to mobile clients through our edge load balancers, and Envoy was our only reasonable solution,” says Lu. “The ability to send consistent and detailed stats by default across all services has made observability and standardization of dashboards much easier.” The metrics that come built in with Envoy have also “greatly helped with debugging,” says DevOps Engineer Ryan Nguyen. -
+{{< case-studies/quote + image="/images/case-studies/vsco/banner2.jpg" + author="MELINDA LU, ENGINEERING MANAGER FOR VSCO'S MACHINE LEARNING TEAM" +>}} +"Kubernetes seemed to have the strongest open source community around it, plus, we had started to standardize on a lot of the Google stack, with Go as a language, and gRPC for almost all communication between our own services inside the data center. So it seemed pretty natural for us to choose Kubernetes." +{{< /case-studies/quote >}} -
-
-"Because there’s now an organization that supports Kubernetes, does that build confidence? The answer is a resounding yes."

- NAVEEN GATTU, SENIOR SOFTWARE ENGINEER ON VSCO’S COMMUNITY TEAM
-
+

At the time, there were few managed Kubernetes offerings and less tooling available in the ecosystem, so the team stood up its own cluster and built some custom components for its specific deployment needs, such as an automatic ingress controller and policy constructs for canary deploys. "We had already begun breaking up the monolith, so we moved things one by one, starting with pretty small, low-risk services," says Lu. "Every single new service was deployed there." The first service was migrated at the end of 2016, and after one year, 80% of the entire stack was on Kubernetes, including the rest of the monolith.

-
- With Kubernetes, gRPC, and Envoy in place, VSCO has seen an 88% reduction in total minutes of outage time, mainly due to the elimination of JSON-schema errors and service-specific infrastructure provisioning errors, and an increased speed in fixing outages. -

- Given its success using CNCF projects, VSCO is starting to experiment with others, including CNI and Prometheus. “To have a large organization backing these technologies, we have a lot more confidence trying this software and deploying to production,” says Nguyen. -

- The team has made contributions to gRPC and Envoy, and is hoping to be even more active in the CNCF community. “I’ve been really impressed seeing how our engineers have come up with really creative solutions to things by just combining a lot of Kubernetes primitives,” says Lu. “Exposing Kubernetes constructs as a service to our engineers as opposed to exposing higher order constructs has worked well for us. It lets you get familiar with the technology and do more interesting things with it.” +

The impact has been great. Deployments used to require "a lot of manual tweaking, in-house scripting that we wrote, and because of our disparate EC2 instances, Operations had to babysit the whole thing from start to finish," says Ryan. "We didn't really have a story around testing in a methodical way, and using reusable containers or builds in a standardized way." There's a faster onboarding process now. Before, the time to first deploy was two days' hands-on setup time; now it's two hours.

-
-
+

By moving to continuous integration, containerization, and Kubernetes, velocity was increased dramatically. The time from code-complete to deployment in production on real infrastructure went from one to two weeks to two to four hours for a typical service. Plus, says Gattu, "In man hours, that's one person versus a developer and a DevOps individual at the same time." With an 80% decrease in time for a single deployment to happen in production, the number of deployments has increased as well, from 1200/year to 3200/year.

+ +{{< case-studies/quote + image="/images/case-studies/vsco/banner4.jpg" + author="MELINDA LU, ENGINEERING MANAGER FOR VSCO'S MACHINE LEARNING TEAM" +>}} +"I've been really impressed seeing how our engineers have come up with really creative solutions to things by just combining a lot of Kubernetes primitives, exposing Kubernetes constructs as a service to our engineers as opposed to exposing higher order constructs has worked well for us. It lets you get familiar with the technology and do more interesting things with it." +{{< /case-studies/quote >}} + +

There have been real dollar savings too: With Kubernetes, VSCO is running at 2x to 20x greater EC2 efficiency, depending on the service, adding up to about 70% overall savings on the company's EC2 bill.

+ +

Ryan points to the company's ability to go from managing one large monolithic application to 50+ microservices with "the same size developer team, more or less. And we've only been able to do that because we have increased trust in our tooling and a lot more flexibility when there are stress points in our system. You can increase CPU memory requirements of a service without having to bring up and tear down instances, and read through AWS pages just to be familiar with a lot of jargon, which isn't really tenable for a company at our scale."

+ +

Envoy and gRPC have also had a positive impact at VSCO. "We get many benefits from gRPC out of the box: type safety across multiple languages, ease of defining services with the gRPC IDL, built-in architecture like interceptors, and performance improvements over HTTP/1.1 and JSON," says Lu.

+ +

VSCO was one of the first users of Envoy, getting it in production five days after it was open sourced. "We wanted to serve gRPC and HTTP/2 directly to mobile clients through our edge load balancers, and Envoy was our only reasonable solution," says Lu. "The ability to send consistent and detailed stats by default across all services has made observability and standardization of dashboards much easier." The metrics that come built in with Envoy have also "greatly helped with debugging," says DevOps Engineer Ryan Nguyen.

+ +{{< case-studies/quote author="NAVEEN GATTU, SENIOR SOFTWARE ENGINEER ON VSCO'S COMMUNITY TEAM" >}} +"Because there's now an organization that supports Kubernetes, does that build confidence? The answer is a resounding yes." +{{< /case-studies/quote >}} + +

With Kubernetes, gRPC, and Envoy in place, VSCO has seen an 88% reduction in total minutes of outage time, mainly due to the elimination of JSON-schema errors and service-specific infrastructure provisioning errors, and an increased speed in fixing outages.

+ +

Given its success using CNCF projects, VSCO is starting to experiment with others, including CNI and Prometheus. "To have a large organization backing these technologies, we have a lot more confidence trying this software and deploying to production," says Nguyen.

+ +

The team has made contributions to gRPC and Envoy, and is hoping to be even more active in the CNCF community. "I've been really impressed seeing how our engineers have come up with really creative solutions to things by just combining a lot of Kubernetes primitives," says Lu. "Exposing Kubernetes constructs as a service to our engineers as opposed to exposing higher order constructs has worked well for us. It lets you get familiar with the technology and do more interesting things with it."

diff --git a/content/en/case-studies/wikimedia/index.html b/content/en/case-studies/wikimedia/index.html index abc74e3ee3..b10002af83 100644 --- a/content/en/case-studies/wikimedia/index.html +++ b/content/en/case-studies/wikimedia/index.html @@ -1,96 +1,66 @@ --- title: Wikimedia Case Study - -class: gridPage +case_study_styles: true cid: caseStudies + +new_case_study_styles: true +heading_title_text: Wikimedia +use_gradient_overlay: true +subheading: > + Using Kubernetes to Build Tools to Improve the World's Wikis +case_study_details: + - Company: Wikimedia + - Location: San Francisco, CA --- -
-

Wikimedia Case Study

-
+

The non-profit Wikimedia Foundation operates some of the largest collaboratively edited reference projects in the world, including Wikipedia. To help users maintain and use wikis, it runs Wikimedia Tool Labs, a hosting environment for community developers working on tools and bots to help editors and other volunteers do their work, including reducing vandalism. The community around Wikimedia Tool Labs began forming nearly 10 years ago.

-
-
-
-

Using Kubernetes to Build Tools to Improve the World's Wikis

-

- The non-profit Wikimedia Foundation operates some of the largest collaboratively edited reference projects in the world, including Wikipedia. To help users maintain and use wikis, it runs Wikimedia Tool Labs, a hosting environment for community developers working on tools and bots to help editors and other volunteers do their work, including reducing vandalism. The community around Wikimedia Tool Labs began forming nearly 10 years ago. -

-
- Wikimedia -

- "Wikimedia Tool Labs is vital for making sure wikis all around the world work as well as they possibly can. Because it's grown organically for almost 10 years, it has become an extremely challenging environment and difficult to maintain. It's like a big ball of mud — you really can't see through it. With Kubernetes, we're simplifying the environment and making it easier for developers to build the tools that make wikis run better." -

-

— Yuvi Panda, operations engineer at Wikimedia Foundation and Wikimedia Tool Labs

-
-
-
-
+{{< case-studies/quote author="Yuvi Panda, operations engineer at Wikimedia Foundation and Wikimedia Tool Labs">}} +Wikimedia +
+
+"Wikimedia Tool Labs is vital for making sure wikis all around the world work as well as they possibly can. Because it's grown organically for almost 10 years, it has become an extremely challenging environment and difficult to maintain. It's like a big ball of mud — you really can't see through it. With Kubernetes, we're simplifying the environment and making it easier for developers to build the tools that make wikis run better." +{{< /case-studies/quote >}} -
-
-
-
-

Challenges:

-
    -
  • Simplify a complex, difficult-to-manage infrastructure
  • -
  • Allow developers to continue writing tools and bots using existing techniques
  • -
-
-
-

Why Kubernetes:

-
    -
  • Wikimedia Tool Labs chose Kubernetes because it can mimic existing workflows, while reducing complexity
  • -
-
-
-

Approach:

-
    -
  • Migrate old systems and a complex infrastructure to Kubernetes
  • -
-
-
-

Results:

-
    -
  • 20 percent of web tools that account for more than 40 percent of web traffic now run on Kubernetes
  • -
  • A 25-node cluster that keeps up with each new Kubernetes release
  • -
  • Thousands of lines of old code have been deleted, thanks to Kubernetes
  • -
-
-
-
-
+

Challenges

-
-
-
-

Using Kubernetes to provide tools for maintaining wikis

-

- Wikimedia Tool Labs is run by a staff of four-and-a-half paid employees and two volunteers. The infrastructure didn't make it easy or intuitive for developers to build bots and other tools to make wikis work more easily. Yuvi says, "It's incredibly chaotic. We have lots of Perl and Bash duct tape on top of it. Everything is super fragile." -

-

- To solve the problem, Wikimedia Tool Labs migrated parts of its infrastructure to Kubernetes, in preparation for eventually moving its entire system. Yuvi said Kubernetes greatly simplifies maintenance. The goal is to allow developers creating bots and other tools to use whatever development methods they want, but make it easier for the Wikimedia Tool Labs to maintain the required infrastructure for hosting and sharing them. -

-

- "With Kubernetes, I've been able to remove a lot of our custom-made code, which makes everything easier to maintain. Our users' code also runs in a more stable way than previously," says Yuvi. -

-
-
-
+ -
-
-
-

Simplifying infrastructure and keeping wikis running better

-

- Wikimedia Tool Labs has seen great success with the initial Kubernetes deployment. Old code is being simplified and eliminated, contributing developers don't have to change the way they write their tools and bots, and those tools and bots run in a more stable fashion than they have in the past. The paid staff and volunteers are able to better keep up with fixing issues. -

-

- In the future, with a more complete migration to Kubernetes, Wikimedia Tool Labs expects to make it even easier to host and maintain the bots and tools that help run wikis across the world. The tool labs already host approximately 1,300 tools and bots from 800 volunteers, with many more being submitted every day. Twenty percent of the tool labs' web tools that account for more than 60 percent of web traffic now run on Kubernetes. The tool labs has a 25-node cluster that keeps up with each new Kubernetes release. Many existing web tools are migrating to Kubernetes. -

-

- "Our goal is to make sure that people all over the world can share knowledge as easily as possible. Kubernetes helps with that, by making it easier for wikis everywhere to have the tools they need to thrive," says Yuvi. -

-
-
-
+

Why Kubernetes

+ + + +

Approach

+ + + +

Results

+ + + +

Using Kubernetes to provide tools for maintaining wikis

+ +

Wikimedia Tool Labs is run by a staff of four-and-a-half paid employees and two volunteers. The infrastructure didn't make it easy or intuitive for developers to build bots and other tools to make wikis work more easily. Yuvi says, "It's incredibly chaotic. We have lots of Perl and Bash duct tape on top of it. Everything is super fragile."

+ +

To solve the problem, Wikimedia Tool Labs migrated parts of its infrastructure to Kubernetes, in preparation for eventually moving its entire system. Yuvi said Kubernetes greatly simplifies maintenance. The goal is to allow developers creating bots and other tools to use whatever development methods they want, but make it easier for the Wikimedia Tool Labs to maintain the required infrastructure for hosting and sharing them.

+ +

"With Kubernetes, I've been able to remove a lot of our custom-made code, which makes everything easier to maintain. Our users' code also runs in a more stable way than previously," says Yuvi.

+ +

Simplifying infrastructure and keeping wikis running better

+ +

Wikimedia Tool Labs has seen great success with the initial Kubernetes deployment. Old code is being simplified and eliminated, contributing developers don't have to change the way they write their tools and bots, and those tools and bots run in a more stable fashion than they have in the past. The paid staff and volunteers are able to better keep up with fixing issues.

+ +

In the future, with a more complete migration to Kubernetes, Wikimedia Tool Labs expects to make it even easier to host and maintain the bots and tools that help run wikis across the world. The tool labs already host approximately 1,300 tools and bots from 800 volunteers, with many more being submitted every day. Twenty percent of the tool labs' web tools that account for more than 60 percent of web traffic now run on Kubernetes. The tool labs has a 25-node cluster that keeps up with each new Kubernetes release. Many existing web tools are migrating to Kubernetes.

+ +

"Our goal is to make sure that people all over the world can share knowledge as easily as possible. Kubernetes helps with that, by making it easier for wikis everywhere to have the tools they need to thrive," says Yuvi.

diff --git a/content/en/case-studies/wink/index.html b/content/en/case-studies/wink/index.html index 3f47f8c779..1f8ea4c89d 100644 --- a/content/en/case-studies/wink/index.html +++ b/content/en/case-studies/wink/index.html @@ -1,109 +1,87 @@ --- title: Wink Case Study - case_study_styles: true cid: caseStudies -css: /css/style_wink.css + +new_case_study_styles: true +heading_background: /images/case-studies/wink/banner1.jpg +heading_title_logo: /images/wink_logo.png +subheading: > + Cloud-Native Infrastructure Keeps Your Smart Home Connected +case_study_details: + - Company: Wink + - Location: New York, N.Y. + - Industry: Internet of Things Platform --- -
-

CASE STUDY:
-
Cloud-Native Infrastructure Keeps Your Smart Home Connected
-

-
+

Challenge

+

Building a low-latency, highly reliable infrastructure to serve communications between millions of connected smart-home devices and the company's consumer hubs and mobile app, with an emphasis on horizontal scalability, the ability to encrypt everything quickly and connections that could be easily brought back up if anything went wrong.

-
- Company  Wink     Location  New York, N.Y.     Industry  Internet of Things Platform -
+

Solution

-
+

Across-the-board use of a Kubernetes-Docker-CoreOS Container Linux stack.

-
-
-
+

Impact

-

Challenge

- Building a low-latency, highly reliable infrastructure to serve communications between millions of connected smart-home devices and the company’s consumer hubs and mobile app, with an emphasis on horizontal scalability, the ability to encrypt everything quickly and connections that could be easily brought back up if anything went wrong. -

-

Solution

- Across-the-board use of a Kubernetes-Docker-CoreOS Container Linux stack.

-
+

"Two of the biggest American retailers [Home Depot and Walmart] are carrying and promoting the brand and the hardware," Wink Head of Engineering Kit Klein says proudly – though he adds that "it really comes with a lot of pressure. It's not a retail situation where you have a lot of tech enthusiasts. These are everyday people who want something that works and have no tolerance for technical excuses." And that's further testament to how much faith Klein has in the infrastructure that the Wink team has built. With 80 percent of Wink's workload running on a unified stack of Kubernetes-Docker-CoreOS, the company has put itself in a position to continually innovate and improve its products and services. Committing to this technology, says Klein, "makes building on top of the infrastructure relatively easy."

-
-

Impact

- "Two of the biggest American retailers [Home Depot and Walmart] are carrying and promoting the brand and the hardware,” Wink Head of Engineering Kit Klein says proudly – though he adds that "it really comes with a lot of pressure. It’s not a retail situation where you have a lot of tech enthusiasts. These are everyday people who want something that works and have no tolerance for technical excuses.” And that’s further testament to how much faith Klein has in the infrastructure that the Wink team has built. With 80 percent of Wink’s workload running on a unified stack of Kubernetes-Docker-CoreOS, the company has put itself in a position to continually innovate and improve its products and services. Committing to this technology, says Klein, "makes building on top of the infrastructure relatively easy.” -
+{{< case-studies/quote author="KIT KLEIN, HEAD OF ENGINEERING, WINK" >}} +"It's not proprietary, it's totally open, it's really portable. You can run all the workloads across different cloud providers. You can easily run a hybrid AWS or even bring in your own data center. That's the benefit of having everything unified on one open source Kubernetes-Docker-CoreOS Container Linux stack. There are massive security benefits if you only have one Linux distro/machine image to validate. The benefits are enormous because you save money, and you save time." +{{< /case-studies/quote >}} -
-
+{{< case-studies/lead >}} +How many people does it take to turn on a light bulb? +{{< /case-studies/lead >}} +

Kit Klein whips out his phone to demonstrate. With a few swipes, the head of engineering at Wink pulls up the smart-home app created by the New York City-based company and taps the light button. "Honestly when you're holding the phone and you're hitting the light," he says, "by the time you feel the pressure of your finger on the screen, it's on. It takes as long as the signal to travel to your brain."

-
-
- "It’s not proprietary, it’s totally open, it’s really portable. You can run all the workloads across different cloud providers. You can easily run a hybrid AWS or even bring in your own data center. That’s the benefit of having everything unified on one open source Kubernetes-Docker-CoreOS Container Linux stack. There are massive security benefits if you only have one Linux distro/machine image to validate. The benefits are enormous because you save money, and you save time.”

- KIT KLEIN, HEAD OF ENGINEERING, WINK -
-
+

Sure, it takes just one finger and less than 200 milliseconds to turn on the light – or lock a door or change a thermostat. But what allows Wink to help consumers manage their connected smart-home products with such speed and ease is a sophisticated, cloud native infrastructure that Klein and his team built and continue to develop using a unified stack of CoreOS, the open-source operating system designed for clustered deployments, and Kubernetes, an open-source platform for automating deployment, scaling, and operations of application containers across clusters of hosts, providing container-centric infrastructure. "When you have a big, complex network of interdependent microservices that need to be able to discover each other, and need to be horizontally scalable and tolerant to failure, that's what this is really optimized for," says Klein. "A lot of people end up relying on proprietary services [offered by some big cloud providers] to do some of this stuff, but what you get by adopting CoreOS/Kubernetes is portability, to not be locked in to anyone. You can really make your own fate."

+

Indeed, Wink did. The company's mission statement is to make the connected home accessible – that is, user-friendly for non-technical owners, affordable and perhaps most importantly, reliable. "If you can't trust that when you hit the switch, you know a light is going to go on, or if you're remote and you're checking on your house and that information isn't accurate, then the convenience of the system is lost," says Klein. "So that's where the infrastructure comes in."

-
-
-

How many people does it take to turn on a light bulb?

+

Wink was incubated within Quirky, a company that developed crowd-sourced inventions. The Wink app was first introduced in 2013, and at the time, it controlled only a few consumer products such as the PivotPower Strip that Quirky produced in collaboration with GE. As smart-home products proliferated, Wink was launched in 2014 in Home Depot stores nationwide. Its first project: a hub that could integrate with smart products from about a dozen brands like Honeywell and Chamberlain. The biggest challenge would be to build the infrastructure to serve all those communications between the hub and the products, with a focus on maximizing reliability and minimizing latency.

- Kit Klein whips out his phone to demonstrate. With a few swipes, the head of engineering at Wink pulls up the smart-home app created by the New York City-based company and taps the light button. "Honestly when you’re holding the phone and you’re hitting the light,” he says, "by the time you feel the pressure of your finger on the screen, it’s on. It takes as long as the signal to travel to your brain.”

- Sure, it takes just one finger and less than 200 milliseconds to turn on the light – or lock a door or change a thermostat. But what allows Wink to help consumers manage their connected smart-home products with such speed and ease is a sophisticated, cloud native infrastructure that Klein and his team built and continue to develop using a unified stack of CoreOS, the open-source operating system designed for clustered deployments, and Kubernetes, an open-source platform for automating deployment, scaling, and operations of application containers across clusters of hosts, providing container-centric infrastructure. "When you have a big, complex network of interdependent microservices that need to be able to discover each other, and need to be horizontally scalable and tolerant to failure, that’s what this is really optimized for,” says Klein. "A lot of people end up relying on proprietary services [offered by some big cloud providers] to do some of this stuff, but what you get by adopting CoreOS/Kubernetes is portability, to not be locked in to anyone. You can really make your own fate.”

- Indeed, Wink did. The company’s mission statement is to make the connected home accessible – that is, user-friendly for non-technical owners, affordable and perhaps most importantly, reliable. "If you can’t trust that when you hit the switch, you know a light is going to go on, or if you’re remote and you’re checking on your house and that information isn’t accurate, then the convenience of the system is lost,” says Klein. "So that’s where the infrastructure comes in.”

- Wink was incubated within Quirky, a company that developed crowd-sourced inventions. The Wink app was first introduced in 2013, and at the time, it controlled only a few consumer products such as the PivotPower Strip that Quirky produced in collaboration with GE. As smart-home products proliferated, Wink was launched in 2014 in Home Depot stores nationwide. Its first project: a hub that could integrate with smart products from about a dozen brands like Honeywell and Chamberlain. The biggest challenge would be to build the infrastructure to serve all those communications between the hub and the products, with a focus on maximizing reliability and minimizing latency.

- "When we originally started out, we were moving very fast trying to get the first product to market, the minimum viable product,” says Klein. "Lots of times you go down a path and end up having to backtrack and try different things. But in this particular case, we did a lot of the work up front, which led to us making a really sound decision to deploy it on CoreOS Container Linux. And that was very early in the life of it.” +

"When we originally started out, we were moving very fast trying to get the first product to market, the minimum viable product," says Klein. "Lots of times you go down a path and end up having to backtrack and try different things. But in this particular case, we did a lot of the work up front, which led to us making a really sound decision to deploy it on CoreOS Container Linux. And that was very early in the life of it."

-
-
+{{< case-studies/quote image="/images/case-studies/wink/banner3.jpg">}} +"...what you get by adopting CoreOS/Kubernetes is portability, to not be locked in to anyone. You can really make your own fate." +{{< /case-studies/quote >}} -
-
- "...what you get by adopting CoreOS/Kubernetes is portability, to not be locked in to anyone. You can really make your own fate.” -
-
+

Concern number one: Wink's products need to connect to consumer devices in people's homes, behind a firewall. "You don't have an end point like a URL, and you don't even know what ports are open behind that firewall," Klein explains. "So you essentially need to have this thing wake up and talk to your system and then open real-time, bidirectional communication between the cloud and the device. And it's really, really important that it's persistent because you want to decrease as much as possible the overhead of sending a message – you never know when someone is going to turn on the lights."

-
-
- Concern number one: Wink’s products need to connect to consumer devices in people’s homes, behind a firewall. "You don’t have an end point like a URL, and you don’t even know what ports are open behind that firewall,” Klein explains. "So you essentially need to have this thing wake up and talk to your system and then open real-time, bidirectional communication between the cloud and the device. And it’s really, really important that it’s persistent because you want to decrease as much as possible the overhead of sending a message – you never know when someone is going to turn on the lights.”

- With the earliest version of the Wink Hub, when you decided to turn your lights on or off, the request would be sent to the cloud and then executed. Subsequent updates to Wink’s software enabled local control, cutting latency down to about 10 milliseconds for many devices. But with the need for cloud-enabled integrations of an ever-growing ecosystem of smart home products, low-latency internet connectivity is still a critical consideration. -

-

"You essentially need to have this thing wake up and talk to your system and then open real-time, bidirectional communication between the cloud and the device. And it’s really, really important that it’s persistent...you never know when someone is going to turn on the lights.”

- In addition, Wink had other requirements: horizontal scalability, the ability to encrypt everything quickly, connections that could be easily brought back up if something went wrong. "Looking at this whole structure we started, we decided to make a secure socket-based service,” says Klein. "We’ve always used, I would say, some sort of clustering technology to deploy our services and so the decision we came to was, this thing is going to be containerized, running on Docker.”

- At the time – just over two years ago – Docker wasn’t yet widely used, but as Klein points out, "it was certainly understood by the people who were on the frontier of technology. We started looking at potential technologies that existed. One of the limiting factors was that we needed to deploy multi-port non-http/https services. It wasn’t really appropriate for some of the early cluster technology. We liked the project a lot and we ended up using it on other stuff for a while, but initially it was too targeted toward http workloads.”

- Once Wink’s backend engineering team decided on a Dockerized workload, they had to make decisions about the OS and the container orchestration platform. "Obviously you can’t just start the containers and hope everything goes well,” Klein says with a laugh. "You need to have a system that is helpful [in order] to manage where the workloads are being distributed out to. And when the container inevitably dies or something like that, to restart it, you have a load balancer. All sorts of housekeeping work is needed to have a robust infrastructure.” +

With the earliest version of the Wink Hub, when you decided to turn your lights on or off, the request would be sent to the cloud and then executed. Subsequent updates to Wink's software enabled local control, cutting latency down to about 10 milliseconds for many devices. But with the need for cloud-enabled integrations of an ever-growing ecosystem of smart home products, low-latency internet connectivity is still a critical consideration.

-
-
+{{< case-studies/lead >}} +"You essentially need to have this thing wake up and talk to your system and then open real-time, bidirectional communication between the cloud and the device. And it's really, really important that it's persistent...you never know when someone is going to turn on the lights." +{{< /case-studies/lead >}} -
-
- "Obviously you can’t just start the containers and hope everything goes well,” Klein says with a laugh. "You need to have a system that is helpful [in order] to manage where the workloads are being distributed out to. And when the container inevitably dies or something like that, to restart it, you have a load balancer. All sorts of housekeeping work is needed to have a robust infrastructure.” -
-
+

In addition, Wink had other requirements: horizontal scalability, the ability to encrypt everything quickly, connections that could be easily brought back up if something went wrong. "Looking at this whole structure we started, we decided to make a secure socket-based service," says Klein. "We've always used, I would say, some sort of clustering technology to deploy our services and so the decision we came to was, this thing is going to be containerized, running on Docker."

-
-
- Wink considered building directly on a general purpose Linux distro like Ubuntu (which would have required installing tools to run a containerized workload) and cluster management systems like Mesos (which was targeted toward enterprises with larger teams/workloads), but ultimately set their sights on CoreOS Container Linux. "A container-optimized Linux distribution system was exactly what we needed,” he says. "We didn’t have to futz around with trying to take something like a Linux distro and install everything. It’s got a built-in container orchestration system, which is Fleet, and an easy-to-use API. It’s not as feature-rich as some of the heavier solutions, but we realized that, at that moment, it was exactly what we needed.”

- Wink’s hub (along with a revamped app) was introduced in July 2014 with a short-term deployment, and within the first month, they had moved the service to the Dockerized CoreOS deployment. Since then, they’ve moved almost every other piece of their infrastructure – from third-party cloud-to-cloud integrations to their customer service and payment portals – onto CoreOS Container Linux clusters.

- Using this setup did require some customization. "Fleet is really nice as a basic container orchestration system, but it doesn’t take care of routing, sharing configurations, secrets, et cetera, among instances of a service,” Klein says. "All of those layers of functionality can be implemented, of course, but if you don’t want to spend a lot of time writing unit files manually – which of course nobody does – you need to create a tool to automate some of that, which we did.”

- Wink quickly embraced the Kubernetes container cluster manager when it was launched in 2015 and integrated with CoreOS core technology, and as promised, it ended up providing the features Wink wanted and had planned to build. "If not for Kubernetes, we likely would have taken the logic and library we implemented for the automation tool that we created, and would have used it in a higher level abstraction and tool that could be used by non-DevOps engineers from the command line to create and manage clusters,” Klein says. "But Kubernetes made that totally unnecessary – and is written and maintained by people with a lot more experience in cluster management than us, so all the better.” Now, an estimated 80 percent of Wink’s workload is run on Kubernetes on top of CoreOS Container Linux. +

At the time – just over two years ago – Docker wasn't yet widely used, but as Klein points out, "it was certainly understood by the people who were on the frontier of technology. We started looking at potential technologies that existed. One of the limiting factors was that we needed to deploy multi-port non-http/https services. It wasn't really appropriate for some of the early cluster technology. We liked the project a lot and we ended up using it on other stuff for a while, but initially it was too targeted toward http workloads."

-
-
+

Once Wink's backend engineering team decided on a Dockerized workload, they had to make decisions about the OS and the container orchestration platform. "Obviously you can't just start the containers and hope everything goes well," Klein says with a laugh. "You need to have a system that is helpful [in order] to manage where the workloads are being distributed out to. And when the container inevitably dies or something like that, to restart it, you have a load balancer. All sorts of housekeeping work is needed to have a robust infrastructure."

-
-
- "Stay close to the development. Understand why decisions are being made. If you understand the intent behind the project, from the technological intent to a certain philosophical intent, then it helps you understand how to build your system in harmony with those systems as opposed to trying to work against it.” -
-
+{{< case-studies/quote image="/images/case-studies/wink/banner4.jpg" >}} +"Obviously you can't just start the containers and hope everything goes well," Klein says with a laugh. "You need to have a system that is helpful [in order] to manage where the workloads are being distributed out to. And when the container inevitably dies or something like that, to restart it, you have a load balancer. All sorts of housekeeping work is needed to have a robust infrastructure." +{{< /case-studies/quote >}} -
-
- Wink’s reasons for going all in are clear: "It’s not proprietary, it’s totally open, it’s really portable,” Klein says. "You can run all the workloads across different cloud providers. You can easily run a hybrid AWS or even bring in your own data center. That’s the benefit of having everything unified on one Kubernetes-Docker-CoreOS Container Linux stack. There are massive security benefits if you only have one Linux distro to try to validate. The benefits are enormous because you save money, you save time.”

- Klein concedes that there are tradeoffs in every technology decision. "Cutting-edge technology is going to be scary for some people,” he says. "In order to take advantage of this, you really have to keep up with the technology. You can’t treat it like it’s a black box. Stay close to the development. Understand why decisions are being made. If you understand the intent behind the project, from the technological intent to a certain philosophical intent, then it helps you understand how to build your system in harmony with those systems as opposed to trying to work against it.”

- Wink, which was acquired by Flex in 2015, now controls 2.3 million connected devices in households all over the country. What’s next for the company? A new version of the hub - Wink Hub 2 - hit shelves last November – and is being offered for the first time at Walmart stores in addition to Home Depot. "Two of the biggest American retailers are carrying and promoting the brand and the hardware,” Klein says proudly – though he adds that "it really comes with a lot of pressure. It’s not a retail situation where you have a lot of tech enthusiasts. These are everyday people who want something that works and have no tolerance for technical excuses.” And that’s further testament to how much faith Klein has in the infrastructure that the Wink team has have built.

- Wink’s engineering team has grown exponentially since its early days, and behind the scenes, Klein is most excited about the machine learning Wink is using. "We built [a system of] containerized small sections of the data pipeline that feed each other and can have multiple outputs,” he says. "It’s like data pipelines as microservices.” Again, Klein points to having a unified stack running on CoreOS Container Linux and Kubernetes as the primary driver for the innovations to come. "You’re not reinventing the wheel every time,” he says. "You can just get down to work.”
-
+

Wink considered building directly on a general purpose Linux distro like Ubuntu (which would have required installing tools to run a containerized workload) and cluster management systems like Mesos (which was targeted toward enterprises with larger teams/workloads), but ultimately set their sights on CoreOS Container Linux. "A container-optimized Linux distribution system was exactly what we needed," he says. "We didn't have to futz around with trying to take something like a Linux distro and install everything. It's got a built-in container orchestration system, which is Fleet, and an easy-to-use API. It's not as feature-rich as some of the heavier solutions, but we realized that, at that moment, it was exactly what we needed."

+ +

Wink's hub (along with a revamped app) was introduced in July 2014 with a short-term deployment, and within the first month, they had moved the service to the Dockerized CoreOS deployment. Since then, they've moved almost every other piece of their infrastructure – from third-party cloud-to-cloud integrations to their customer service and payment portals – onto CoreOS Container Linux clusters.

+ +

Using this setup did require some customization. "Fleet is really nice as a basic container orchestration system, but it doesn't take care of routing, sharing configurations, secrets, et cetera, among instances of a service," Klein says. "All of those layers of functionality can be implemented, of course, but if you don't want to spend a lot of time writing unit files manually – which of course nobody does – you need to create a tool to automate some of that, which we did."

+ +

Wink quickly embraced the Kubernetes container cluster manager when it was launched in 2015 and integrated with CoreOS core technology, and as promised, it ended up providing the features Wink wanted and had planned to build. "If not for Kubernetes, we likely would have taken the logic and library we implemented for the automation tool that we created, and would have used it in a higher level abstraction and tool that could be used by non-DevOps engineers from the command line to create and manage clusters," Klein says. "But Kubernetes made that totally unnecessary – and is written and maintained by people with a lot more experience in cluster management than us, so all the better." Now, an estimated 80 percent of Wink's workload is run on Kubernetes on top of CoreOS Container Linux.

+ +{{< case-studies/quote >}} +"Stay close to the development. Understand why decisions are being made. If you understand the intent behind the project, from the technological intent to a certain philosophical intent, then it helps you understand how to build your system in harmony with those systems as opposed to trying to work against it." +{{< /case-studies/quote >}} + +

Wink's reasons for going all in are clear: "It's not proprietary, it's totally open, it's really portable," Klein says. "You can run all the workloads across different cloud providers. You can easily run a hybrid AWS or even bring in your own data center. That's the benefit of having everything unified on one Kubernetes-Docker-CoreOS Container Linux stack. There are massive security benefits if you only have one Linux distro to try to validate. The benefits are enormous because you save money, you save time."

+ +

Klein concedes that there are tradeoffs in every technology decision. "Cutting-edge technology is going to be scary for some people," he says. "In order to take advantage of this, you really have to keep up with the technology. You can't treat it like it's a black box. Stay close to the development. Understand why decisions are being made. If you understand the intent behind the project, from the technological intent to a certain philosophical intent, then it helps you understand how to build your system in harmony with those systems as opposed to trying to work against it."

+ +

Wink, which was acquired by Flex in 2015, now controls 2.3 million connected devices in households all over the country. What's next for the company? A new version of the hub - Wink Hub 2 - hit shelves last November – and is being offered for the first time at Walmart stores in addition to Home Depot. "Two of the biggest American retailers are carrying and promoting the brand and the hardware," Klein says proudly – though he adds that "it really comes with a lot of pressure. It's not a retail situation where you have a lot of tech enthusiasts. These are everyday people who want something that works and have no tolerance for technical excuses." And that's further testament to how much faith Klein has in the infrastructure that the Wink team has have built.

+ +

Wink's engineering team has grown exponentially since its early days, and behind the scenes, Klein is most excited about the machine learning Wink is using. "We built [a system of] containerized small sections of the data pipeline that feed each other and can have multiple outputs," he says. "It's like data pipelines as microservices." Again, Klein points to having a unified stack running on CoreOS Container Linux and Kubernetes as the primary driver for the innovations to come. "You're not reinventing the wheel every time," he says. "You can just get down to work."

diff --git a/content/en/case-studies/woorank/index.html b/content/en/case-studies/woorank/index.html index fbb86bdd24..d83240d878 100644 --- a/content/en/case-studies/woorank/index.html +++ b/content/en/case-studies/woorank/index.html @@ -3,94 +3,77 @@ title: Woorank Case Study linkTitle: woorank case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css featured: false + +new_case_study_styles: true +heading_background: /images/case-studies/woorank/banner1.jpg +heading_title_logo: /images/woorank_logo.png +subheading: > + Woorank: How Kubernetes Helped a Startup Manage 50 Microservices with 12 Engineers—At 30% Less Cost +case_study_details: + - Company: Woorank + - Location: Brussels, Belgium + - Industry: Digital marketing tool --- +

Challenge

-
-

CASE STUDY:
Woorank: How Kubernetes Helped a Startup Manage 50 Microservices with
12 Engineers—At 30% Less Cost -

+

Founded in 2011, Woorank embraced microservices and containerization early on, so its core product, a tool that helps digital marketers improve their websites' visibility on the internet, consists of 50 applications developed and maintained by a technical team of 12. For two years, the infrastructure ran smoothly on Mesos, but "there were still lots of our own libraries that we had to roll and applications that we had to bring in, so it was very cumbersome for us as a small team to keep those things alive and to update them," says CTO/Cofounder Nils De Moor. So he began looking for a new solution with more automation and self-healing built in, that would better suit the company's human resources.

-
+

Solution

-
- Company  Woorank     Location  Brussels, Belgium     Industry  Digital marketing tool -
+

De Moor decided to switch to Kubernetes running on AWS, which "allows us to just define applications, how they need to run, how scalable they need to be, and it takes pain away from the developers thinking about that," he says. "When things fail and errors pop up, the system tries to heal itself, and that's really, for us, the key reason to work with Kubernetes." The company now also uses Fluentd, Prometheus, and OpenTracing.

-
-
-
-
- -

Challenge

- Founded in 2011, Woorank embraced microservices and containerization early on, so its core product, a tool that helps digital marketers improve their websites’ visibility on the internet, consists of 50 applications developed and maintained by a technical team of 12. For two years, the infrastructure ran smoothly on Mesos, but “there were still lots of our own libraries that we had to roll and applications that we had to bring in, so it was very cumbersome for us as a small team to keep those things alive and to update them,” says CTO/Cofounder Nils De Moor. So he began looking for a new solution with more automation and self-healing built in, that would better suit the company’s human resources. -

Solution

- De Moor decided to switch to Kubernetes running on AWS, which “allows us to just define applications, how they need to run, how scalable they need to be, and it takes pain away from the developers thinking about that,” he says. “When things fail and errors pop up, the system tries to heal itself, and that’s really, for us, the key reason to work with Kubernetes.” The company now also uses Fluentd, Prometheus, and OpenTracing.

Impact

- The company’s number one concern was immediately erased: Maintaining Kubernetes takes just one person on staff, and it’s not a fulltime job. Infrastructure updates used to take two active working days; now it’s just a matter of “a few hours of passively following the process,” says De Moor. Implementing new tools—which once took weeks of planning, installing, and onboarding—now only takes a few days. “We were already pretty flexible in our costs and taking on traffic peaks and higher load in general,” adds De Moor, “but with Kubernetes and the other CNCF tools we use, we have achieved about 30% in cost savings.” Plus, the rate of deployments per day has nearly doubled. -
-
-
-
-
- “It was definitely important for us to have CNCF as an umbrella above everything. We’ve always been working with open source libraries and tools and technologies. It works very well for us, but sometimes things can drift, maintainers drop out, and projects go haywire. For us, it was indeed important to know that whatever project gets taken under this umbrella, it’s taken very seriously. Our way of contributing back is also by joining this community. It’s, for us, a way to show our appreciation for what’s going on in this framework.” -

— NILS DE MOOR, CTO/COFOUNDER, WOORANK
-
-
-
-
-

Woorank’s core product is a tool that enables digital marketers to improve their websites’ visibility on the internet.

- “We help them acquire lots of data and then present it to them in meaningful ways so they can work with it,” says CTO/Cofounder Nils De Moor. In its seven years as a startup, the company followed a familiar technological path to build that product: starting with a monolithic application, breaking it down into microservices, and then embracing containerization. “That’s where our modern infrastructure started out,” says De Moor. -

- As new features have been added to the product, it has grown to consist of 50 applications under the hood. Though Docker had made things easier to deploy, and the team had been using Mesos as an orchestration framework on AWS since 2015, De Moor realized there was still too much overhead to managing the infrastructure, especially with a technical team of just 12. -

- “The pain point was that there were still lots of our own libraries that we had to roll and applications that we had to bring in, so it was very cumbersome for us as a small team to keep those things alive and to update them,” says De Moor. “When things went wrong during deployment, someone manually had to come in and figure it out. It wasn’t necessarily that the technology or anything was wrong with Mesos; it was just not really fitting our model of being a small company, not having the human resources to make sure it all works and can be updated.” +

The company's number one concern was immediately erased: Maintaining Kubernetes takes just one person on staff, and it's not a fulltime job. Infrastructure updates used to take two active working days; now it's just a matter of "a few hours of passively following the process," says De Moor. Implementing new tools—which once took weeks of planning, installing, and onboarding—now only takes a few days. "We were already pretty flexible in our costs and taking on traffic peaks and higher load in general," adds De Moor, "but with Kubernetes and the other CNCF tools we use, we have achieved about 30% in cost savings." Plus, the rate of deployments per day has nearly doubled.

-
-
-
-
- "Cloud native technologies have brought to us a transparency on everything going on in our system, from the code to the server. It has brought huge cost savings and a better way of dealing with those costs and keeping them under control. And performance-wise, it has helped our team understand how we can make our code work better on the cloud native infrastructure."

— NILS DE MOOR, CTO/COFOUNDER, WOORANK
+{{< case-studies/quote author="NILS DE MOOR, CTO/COFOUNDER, WOORANK" >}} +"It was definitely important for us to have CNCF as an umbrella above everything. We've always been working with open source libraries and tools and technologies. It works very well for us, but sometimes things can drift, maintainers drop out, and projects go haywire. For us, it was indeed important to know that whatever project gets taken under this umbrella, it's taken very seriously. Our way of contributing back is also by joining this community. It's, for us, a way to show our appreciation for what's going on in this framework." +{{< /case-studies/quote >}} -
-
+{{< case-studies/lead >}} +Woorank's core product is a tool that enables digital marketers to improve their websites' visibility on the internet. +{{< /case-studies/lead >}} -
-
- Around the time Woorank was grappling with these issues, Kubernetes was emerging as a technology. De Moor knew that he wanted a platform that would be more automated and self-healing, and when he began experimenting with Kubernetes, he found that it checked all those boxes. “Kubernetes allows us to just define applications, how they need to run, how scalable they need to be, and it takes pain away from the developers thinking about that,” he says. “When things fail and errors pop up, the system tries to heal itself, and that’s really, for us, the key reason to work with Kubernetes. It allowed us to set up certain testing frameworks to just be alerted when things go wrong, instead of having to look at whether everything went right. It’s made people’s lives much easier. It’s quite a big mindset change.” -

- Once one small Kubernetes cluster was up and running, the team began moving over a few applications at a time, gradually increasing the load over the course of several months. By early 2017, Woorank was 100% deployed on Kubernetes. -

- The company’s number one concern was immediately erased: Maintaining Kubernetes is the responsibility of just one person on staff, and it’s not his fulltime job. Updating the old infrastructure “was always a pain,” says De Moor: It used to take two active working days, “and it was always a bit scary when we did that.” With Kubernetes, it’s just a matter of “a few hours of passively following the process.” -
-
-
-
- "When things fail and errors pop up, the system tries to heal itself, and that’s really, for us, the key reason to work with Kubernetes. It allowed us to set up certain testing frameworks to just be alerted when things go wrong, instead of having to look at whether everything went right. It’s made people’s lives much easier. It’s quite a big mindset change."

- NILS DE MOOR, CTO/COFOUNDER, WOORANK
-
-
+

"We help them acquire lots of data and then present it to them in meaningful ways so they can work with it," says CTO/Cofounder Nils De Moor. In its seven years as a startup, the company followed a familiar technological path to build that product: starting with a monolithic application, breaking it down into microservices, and then embracing containerization. "That's where our modern infrastructure started out," says De Moor.

-
-
- Transparency on all levels, from the code to the servers, has also been a byproduct of the move to Kubernetes. “It’s easier for the entire team to get a better understanding of the infrastructure, how it’s working, how it looks like, what’s going on,” says De Moor. “It’s not that thing that’s running, and no one really knows how it works except this one person. Now it’s really a team effort of everyone knowing, ‘Okay, when something goes wrong, it’s probably in this area or we need to check this.’” -

- To that end, Woorank has begun implementing other cloud native tools that help with visibility, such as Fluentd for logging, Prometheus for monitoring, and OpenTracing for distributed tracing. Implementing these new tools—which once took weeks of planning, installing, and onboarding—now only takes a few days. “With all the tools and projects under the CNCF umbrella, it’s easier for us to test and play with technology than it used to be,” says De Moor. “With Prometheus, we used it fairly early and couldn’t get it fairly stable. A couple of months ago, the question reappeared, so we set it up in two days, and now everyone is using it.” -

- Deployments, too, have been impacted: The rate has more than doubled, which De Moor partly attributes to the transparency of the new process. “With Kubernetes, you see that these three containers didn’t start for this reason,” he says. Plus, “now we bring deployment messages into Slack. If you see deployments rolling by every day, it does somehow indirectly enforce you, okay, I need to be part of this train, so I also need to deploy.” -
+

As new features have been added to the product, it has grown to consist of 50 applications under the hood. Though Docker had made things easier to deploy, and the team had been using Mesos as an orchestration framework on AWS since 2015, De Moor realized there was still too much overhead to managing the infrastructure, especially with a technical team of just 12.

-
-
-"We can plan those things over a certain timeline, try to fit our resource usage to that, and then bring in spot instances, which will hopefully drive the costs down more."

- NILS DE MOOR, CTO/COFOUNDER, WOORANK
-
+

"The pain point was that there were still lots of our own libraries that we had to roll and applications that we had to bring in, so it was very cumbersome for us as a small team to keep those things alive and to update them," says De Moor. "When things went wrong during deployment, someone manually had to come in and figure it out. It wasn't necessarily that the technology or anything was wrong with Mesos; it was just not really fitting our model of being a small company, not having the human resources to make sure it all works and can be updated."

-
- Perhaps the biggest impact, though, has been on the bottom line. “We were already pretty flexible in our costs and taking on traffic peaks and higher load in general, but with Kubernetes and the other CNCF tools we use, we have achieved about 30% in cost savings,” says De Moor. -

- And there’s room for even greater savings. Currently, most of Woorank’s infrastructure is running on AWS on demand; the company pays a fixed price and makes some reservations for its planned amount of resources needed. De Moor is planning to experiment more with spot instances with certain resource-heavy workloads such as web crawls: “We can plan those things over a certain timeline, try to fit our resource usage to that, and then bring in spot instances, which will hopefully drive the costs down more.” -

- Moving to Kubernetes has been so beneficial to Woorank that the company is doubling down on both cloud native technologies and the community. “It was definitely important for us to have CNCF as an umbrella above everything,” says De Moor. “We’ve always been working with open source libraries and tools and technologies. It works very well for us, but sometimes things can drift, maintainers drop out, and projects go haywire. For us, it was indeed important to know that whatever project gets taken under this umbrella, it’s taken very seriously. Our way of contributing back is also by joining this community. It’s, for us, a way to show our appreciation for what’s going on in this framework.” -
-
+{{< case-studies/quote + image="/images/case-studies/woorank/banner3.jpg" + author="NILS DE MOOR, CTO/COFOUNDER, WOORANK" +>}} +"Cloud native technologies have brought to us a transparency on everything going on in our system, from the code to the server. It has brought huge cost savings and a better way of dealing with those costs and keeping them under control. And performance-wise, it has helped our team understand how we can make our code work better on the cloud native infrastructure." +{{< /case-studies/quote >}} + +

Around the time Woorank was grappling with these issues, Kubernetes was emerging as a technology. De Moor knew that he wanted a platform that would be more automated and self-healing, and when he began experimenting with Kubernetes, he found that it checked all those boxes. "Kubernetes allows us to just define applications, how they need to run, how scalable they need to be, and it takes pain away from the developers thinking about that," he says. "When things fail and errors pop up, the system tries to heal itself, and that's really, for us, the key reason to work with Kubernetes. It allowed us to set up certain testing frameworks to just be alerted when things go wrong, instead of having to look at whether everything went right. It's made people's lives much easier. It's quite a big mindset change."

+ +

Once one small Kubernetes cluster was up and running, the team began moving over a few applications at a time, gradually increasing the load over the course of several months. By early 2017, Woorank was 100% deployed on Kubernetes.

+ +

The company's number one concern was immediately erased: Maintaining Kubernetes is the responsibility of just one person on staff, and it's not his fulltime job. Updating the old infrastructure "was always a pain," says De Moor: It used to take two active working days, "and it was always a bit scary when we did that." With Kubernetes, it's just a matter of "a few hours of passively following the process."

+ +{{< case-studies/quote + image="/images/case-studies/woorank/banner4.jpg" + author="NILS DE MOOR, CTO/COFOUNDER, WOORANK" +>}} +"When things fail and errors pop up, the system tries to heal itself, and that's really, for us, the key reason to work with Kubernetes. It allowed us to set up certain testing frameworks to just be alerted when things go wrong, instead of having to look at whether everything went right. It's made people's lives much easier. It's quite a big mindset change." +{{< /case-studies/quote >}} + +

Transparency on all levels, from the code to the servers, has also been a byproduct of the move to Kubernetes. "It's easier for the entire team to get a better understanding of the infrastructure, how it's working, how it looks like, what's going on," says De Moor. "It's not that thing that's running, and no one really knows how it works except this one person. Now it's really a team effort of everyone knowing, 'Okay, when something goes wrong, it's probably in this area or we need to check this.'"

+ +

To that end, Woorank has begun implementing other cloud native tools that help with visibility, such as Fluentd for logging, Prometheus for monitoring, and OpenTracing for distributed tracing. Implementing these new tools—which once took weeks of planning, installing, and onboarding—now only takes a few days. "With all the tools and projects under the CNCF umbrella, it's easier for us to test and play with technology than it used to be," says De Moor. "With Prometheus, we used it fairly early and couldn't get it fairly stable. A couple of months ago, the question reappeared, so we set it up in two days, and now everyone is using it."

+ +

Deployments, too, have been impacted: The rate has more than doubled, which De Moor partly attributes to the transparency of the new process. "With Kubernetes, you see that these three containers didn't start for this reason," he says. Plus, "now we bring deployment messages into Slack. If you see deployments rolling by every day, it does somehow indirectly enforce you, okay, I need to be part of this train, so I also need to deploy."

+ +{{< case-studies/quote author="NILS DE MOOR, CTO/COFOUNDER, WOORANK" >}} +"We can plan those things over a certain timeline, try to fit our resource usage to that, and then bring in spot instances, which will hopefully drive the costs down more." +{{< /case-studies/quote >}} + +

Perhaps the biggest impact, though, has been on the bottom line. "We were already pretty flexible in our costs and taking on traffic peaks and higher load in general, but with Kubernetes and the other CNCF tools we use, we have achieved about 30% in cost savings," says De Moor.

+ +

And there's room for even greater savings. Currently, most of Woorank's infrastructure is running on AWS on demand; the company pays a fixed price and makes some reservations for its planned amount of resources needed. De Moor is planning to experiment more with spot instances with certain resource-heavy workloads such as web crawls: "We can plan those things over a certain timeline, try to fit our resource usage to that, and then bring in spot instances, which will hopefully drive the costs down more."

+ +

Moving to Kubernetes has been so beneficial to Woorank that the company is doubling down on both cloud native technologies and the community. "It was definitely important for us to have CNCF as an umbrella above everything," says De Moor. "We've always been working with open source libraries and tools and technologies. It works very well for us, but sometimes things can drift, maintainers drop out, and projects go haywire. For us, it was indeed important to know that whatever project gets taken under this umbrella, it's taken very seriously. Our way of contributing back is also by joining this community. It's, for us, a way to show our appreciation for what's going on in this framework."

diff --git a/content/en/case-studies/workiva/index.html b/content/en/case-studies/workiva/index.html index 1c09503bfb..d8945bfd34 100644 --- a/content/en/case-studies/workiva/index.html +++ b/content/en/case-studies/workiva/index.html @@ -3,111 +3,91 @@ title: Workiva Case Study linkTitle: Workiva case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css draft: true featured: true weight: 20 quote: > With OpenTracing, my team was able to look at a trace and make optimization suggestions to another team without ever looking at their code. + +new_case_study_styles: true +heading_background: /images/case-studies/workiva/banner1.jpg +heading_title_logo: /images/workiva_logo.png +subheading: > + Using OpenTracing to Help Pinpoint the Bottlenecks +case_study_details: + - Company: Workiva + - Location: Ames, Iowa + - Industry: Enterprise Software --- -
-

CASE STUDY:
Using OpenTracing to Help Pinpoint the Bottlenecks +

Challenge

-

+

Workiva offers a cloud-based platform for managing and reporting business data. This SaaS product, Wdesk, is used by more than 70 percent of the Fortune 500 companies. As the company made the shift from a monolith to a more distributed, microservice-based system, "We had a number of people working on this, all on different teams, so we needed to identify what the issues were and where the bottlenecks were," says Senior Software Architect MacLeod Broad. With back-end code running on Google App Engine, Google Compute Engine, as well as Amazon Web Services, Workiva needed a tracing system that was agnostic of platform. While preparing one of the company's first products utilizing AWS, which involved a "sync and link" feature that linked data from spreadsheets built in the new application with documents created in the old application on Workiva's existing system, Broad's team found an ideal use case for tracing: There were circular dependencies, and optimizations often turned out to be micro-optimizations that didn't impact overall speed.

-
+

Solution

-
- Company  Workiva     Location  Ames, Iowa     Industry  Enterprise Software -
+

Broad's team introduced the platform-agnostic distributed tracing system OpenTracing to help them pinpoint the bottlenecks.

-
-
-
-
-

Challenge

- Workiva offers a cloud-based platform for managing and reporting business data. This SaaS product, Wdesk, is used by more than 70 percent of the Fortune 500 companies. As the company made the shift from a monolith to a more distributed, microservice-based system, "We had a number of people working on this, all on different teams, so we needed to identify what the issues were and where the bottlenecks were," says Senior Software Architect MacLeod Broad. With back-end code running on Google App Engine, Google Compute Engine, as well as Amazon Web Services, Workiva needed a tracing system that was agnostic of platform. While preparing one of the company’s first products utilizing AWS, which involved a "sync and link" feature that linked data from spreadsheets built in the new application with documents created in the old application on Workiva’s existing system, Broad’s team found an ideal use case for tracing: There were circular dependencies, and optimizations often turned out to be micro-optimizations that didn’t impact overall speed. - -
- -
- -
-

Solution

- Broad’s team introduced the platform-agnostic distributed tracing system OpenTracing to help them pinpoint the bottlenecks. -

Impact

- Now used throughout the company, OpenTracing produced immediate results. Software Engineer Michael Davis reports: "Tracing has given us immediate, actionable insight into how to improve our service. Through a combination of seeing where each call spends its time, as well as which calls are most often used, we were able to reduce our average response time by 95 percent (from 600ms to 30ms) in a single fix." -
+

Now used throughout the company, OpenTracing produced immediate results. Software Engineer Michael Davis reports: "Tracing has given us immediate, actionable insight into how to improve our service. Through a combination of seeing where each call spends its time, as well as which calls are most often used, we were able to reduce our average response time by 95 percent (from 600ms to 30ms) in a single fix."

-
-
-
-
-"With OpenTracing, my team was able to look at a trace and make optimization suggestions to another team without ever looking at their code."
— MacLeod Broad, Senior Software Architect at Workiva
+{{< case-studies/quote author="MacLeod Broad, Senior Software Architect at Workiva" >}} +"With OpenTracing, my team was able to look at a trace and make optimization suggestions to another team without ever looking at their code." +{{< /case-studies/quote >}} -
-
-
-
-

Last fall, MacLeod Broad’s platform team at Workiva was prepping one of the company’s first products utilizing Amazon Web Services when they ran into a roadblock.

- Early on, Workiva’s backend had run mostly on Google App Engine. But things changed along the way as Workiva’s SaaS offering, Wdesk, a cloud-based platform for managing and reporting business data, grew its customer base to more than 70 percent of the Fortune 500 companies. "As customer needs grew and the product offering expanded, we started to leverage a wider offering of services such as Amazon Web Services as well as other Google Cloud Platform services, creating a multi-vendor environment."

-With this new product, there was a "sync and link" feature by which data "went through a whole host of services starting with the new spreadsheet system [Amazon Aurora] into what we called our linking system, and then pushed through http to our existing system, and then a number of calculations would go on, and the results would be transmitted back into the new system," says Broad. "We were trying to optimize that for speed. We thought we had made this great optimization and then it would turn out to be a micro optimization, which didn’t really affect the overall speed of things."

-The challenges faced by Broad’s team may sound familiar to other companies that have also made the shift from monoliths to more distributed, microservice-based systems. "We had a number of people working on this, all on different teams, so it was difficult to get our head around what the issues were and where the bottlenecks were," says Broad.

- "Each service team was going through different iterations of their architecture and it was very hard to follow what was actually going on in each teams’ system," he adds. "We had circular dependencies where we’d have three or four different service teams unsure of where the issues really were, requiring a lot of back and forth communication. So we wasted a lot of time saying, ‘What part of this is slow? Which part of this is sometimes slow depending on the use case? Which part is degrading over time? Which part of this process is asynchronous so it doesn’t really matter if it’s long-running or not? What are we doing that’s redundant, and which part of this is buggy?’" +{{< case-studies/lead >}} +Last fall, MacLeod Broad's platform team at Workiva was prepping one of the company's first products utilizing Amazon Web Services when they ran into a roadblock. +{{< /case-studies/lead >}} +

Early on, Workiva's backend had run mostly on Google App Engine. But things changed along the way as Workiva's SaaS offering, Wdesk, a cloud-based platform for managing and reporting business data, grew its customer base to more than 70 percent of the Fortune 500 companies. "As customer needs grew and the product offering expanded, we started to leverage a wider offering of services such as Amazon Web Services as well as other Google Cloud Platform services, creating a multi-vendor environment."

-
-
-
-
- "A tracing system can at a glance explain an architecture, narrow down a performance bottleneck and zero in on it, and generally just help direct an investigation at a high level. Being able to do that at a glance is much faster than at a meeting or with three days of debugging, and it’s a lot faster than never figuring out the problem and just moving on."
— MACLEOD BROAD, SENIOR SOFTWARE ARCHITECT AT WORKIVA
-
-
-
-
+

With this new product, there was a "sync and link" feature by which data "went through a whole host of services starting with the new spreadsheet system [Amazon Aurora] into what we called our linking system, and then pushed through http to our existing system, and then a number of calculations would go on, and the results would be transmitted back into the new system," says Broad. "We were trying to optimize that for speed. We thought we had made this great optimization and then it would turn out to be a micro optimization, which didn't really affect the overall speed of things."

-Simply put, it was an ideal use case for tracing. "A tracing system can at a glance explain an architecture, narrow down a performance bottleneck and zero in on it, and generally just help direct an investigation at a high level," says Broad. "Being able to do that at a glance is much faster than at a meeting or with three days of debugging, and it’s a lot faster than never figuring out the problem and just moving on."

-With Workiva’s back-end code running on Google Compute Engine as well as App Engine and AWS, Broad knew that he needed a tracing system that was platform agnostic. "We were looking at different tracing solutions," he says, "and we decided that because it seemed to be a very evolving market, we didn’t want to get stuck with one vendor. So OpenTracing seemed like the cleanest way to avoid vendor lock-in on what backend we actually had to use."

-Once they introduced OpenTracing into this first use case, Broad says, "The trace made it super obvious where the bottlenecks were." Even though everyone had assumed it was Workiva’s existing code that was slowing things down, that wasn’t exactly the case. "It looked like the existing code was slow only because it was reaching out to our next-generation services, and they were taking a very long time to service all those requests," says Broad. "On the waterfall graph you can see the exact same work being done on every request when it was calling back in. So every service request would look the exact same for every response being paged out. And then it was just a no-brainer of, ‘Why is it doing all this work again?’"

-Using the insight OpenTracing gave them, "My team was able to look at a trace and make optimization suggestions to another team without ever looking at their code," says Broad. "The way we named our traces gave us insight whether it’s doing a SQL call or it’s making an RPC. And so it was really easy to say, ‘OK, we know that it’s going to page through all these requests. Do the work once and stuff it in cache.’ And we were done basically. All those calls became sub-second calls immediately."

+

The challenges faced by Broad's team may sound familiar to other companies that have also made the shift from monoliths to more distributed, microservice-based systems. "We had a number of people working on this, all on different teams, so it was difficult to get our head around what the issues were and where the bottlenecks were," says Broad.

+

"Each service team was going through different iterations of their architecture and it was very hard to follow what was actually going on in each teams' system," he adds. "We had circular dependencies where we'd have three or four different service teams unsure of where the issues really were, requiring a lot of back and forth communication. So we wasted a lot of time saying, 'What part of this is slow? Which part of this is sometimes slow depending on the use case? Which part is degrading over time? Which part of this process is asynchronous so it doesn't really matter if it's long-running or not? What are we doing that's redundant, and which part of this is buggy?'"

+{{< case-studies/quote + image="/images/case-studies/workiva/banner3.jpg" + author="MACLEOD BROAD, SENIOR SOFTWARE ARCHITECT AT WORKIVA" +>}} +"A tracing system can at a glance explain an architecture, narrow down a performance bottleneck and zero in on it, and generally just help direct an investigation at a high level. Being able to do that at a glance is much faster than at a meeting or with three days of debugging, and it's a lot faster than never figuring out the problem and just moving on." +{{< /case-studies/quote >}} -
-
-
-
-"We were looking at different tracing solutions and we decided that because it seemed to be a very evolving market, we didn’t want to get stuck with one vendor. So OpenTracing seemed like the cleanest way to avoid vendor lock-in on what backend we actually had to use."
— MACLEOD BROAD, SENIOR SOFTWARE ARCHITECT AT WORKIVA
-
-
+

Simply put, it was an ideal use case for tracing. "A tracing system can at a glance explain an architecture, narrow down a performance bottleneck and zero in on it, and generally just help direct an investigation at a high level," says Broad. "Being able to do that at a glance is much faster than at a meeting or with three days of debugging, and it's a lot faster than never figuring out the problem and just moving on."

-
-
- After the success of the first use case, everyone involved in the trial went back and fully instrumented their products. Tracing was added to a few more use cases. "We wanted to get through the initial implementation pains early without bringing the whole department along for the ride," says Broad. "Now, a lot of teams add it when they’re starting up a new service. We’re really pushing adoption now more than we were before."

-Some teams were won over quickly. "Tracing has given us immediate, actionable insight into how to improve our [Workspaces] service," says Software Engineer Michael Davis. "Through a combination of seeing where each call spends its time, as well as which calls are most often used, we were able to reduce our average response time by 95 percent (from 600ms to 30ms) in a single fix."

-Most of Workiva’s major products are now traced using OpenTracing, with data pushed into Google StackDriver. Even the products that aren’t fully traced have some components and libraries that are.

-Broad points out that because some of the engineers were working on App Engine and already had experience with the platform’s Appstats library for profiling performance, it didn’t take much to get them used to using OpenTracing. But others were a little more reluctant. "The biggest hindrance to adoption I think has been the concern about how much latency is introducing tracing [and StackDriver] going to cost," he says. "People are also very concerned about adding middleware to whatever they’re working on. Questions about passing the context around and how that’s done were common. A lot of our Go developers were fine with it, because they were already doing that in one form or another. Our Java developers were not super keen on doing that because they’d used other systems that didn’t require that."

-But the benefits clearly outweighed the concerns, and today, Workiva’s official policy is to use tracing." -In fact, Broad believes that tracing naturally fits in with Workiva’s existing logging and metrics systems. "This was the way we presented it internally, and also the way we designed our use," he says. "Our traces are logged in the exact same mechanism as our app metric and logging data, and they get pushed the exact same way. So we treat all that data exactly the same when it’s being created and when it’s being recorded. We have one internal library that we use for logging, telemetry, analytics and tracing." +

With Workiva's back-end code running on Google Compute Engine as well as App Engine and AWS, Broad knew that he needed a tracing system that was platform agnostic. "We were looking at different tracing solutions," he says, "and we decided that because it seemed to be a very evolving market, we didn't want to get stuck with one vendor. So OpenTracing seemed like the cleanest way to avoid vendor lock-in on what backend we actually had to use."

+

Once they introduced OpenTracing into this first use case, Broad says, "The trace made it super obvious where the bottlenecks were." Even though everyone had assumed it was Workiva's existing code that was slowing things down, that wasn't exactly the case. "It looked like the existing code was slow only because it was reaching out to our next-generation services, and they were taking a very long time to service all those requests," says Broad. "On the waterfall graph you can see the exact same work being done on every request when it was calling back in. So every service request would look the exact same for every response being paged out. And then it was just a no-brainer of, 'Why is it doing all this work again?'"

-
+

Using the insight OpenTracing gave them, "My team was able to look at a trace and make optimization suggestions to another team without ever looking at their code," says Broad. "The way we named our traces gave us insight whether it's doing a SQL call or it's making an RPC. And so it was really easy to say, 'OK, we know that it's going to page through all these requests. Do the work once and stuff it in cache.' And we were done basically. All those calls became sub-second calls immediately."

-
-
- "Tracing has given us immediate, actionable insight into how to improve our [Workspaces] service. Through a combination of seeing where each call spends its time, as well as which calls are most often used, we were able to reduce our average response time by 95 percent (from 600ms to 30ms) in a single fix."
— Michael Davis, Software Engineer, Workiva
-
-
+{{< case-studies/quote + image="/images/case-studies/workiva/banner4.jpg" + author="MACLEOD BROAD, SENIOR SOFTWARE ARCHITECT AT WORKIVA" +>}} +"We were looking at different tracing solutions and we decided that because it seemed to be a very evolving market, we didn't want to get stuck with one vendor. So OpenTracing seemed like the cleanest way to avoid vendor lock-in on what backend we actually had to use." +{{< /case-studies/quote >}} -
- For Workiva, OpenTracing has become an essential tool for zeroing in on optimizations and determining what’s actually a micro-optimization by observing usage patterns. "On some projects we often assume what the customer is doing, and we optimize for these crazy scale cases that we hit 1 percent of the time," says Broad. "It’s been really helpful to be able to say, ‘OK, we’re adding 100 milliseconds on every request that does X, and we only need to add that 100 milliseconds if it’s the worst of the worst case, which only happens one out of a thousand requests or one out of a million requests."

-Unlike many other companies, Workiva also traces the client side. "For us, the user experience is important—it doesn’t matter if the RPC takes 100 milliseconds if it still takes 5 seconds to do the rendering to show it in the browser," says Broad. "So for us, those client times are important. We trace it to see what parts of loading take a long time. We’re in the middle of working on a definition of what is ‘loaded.’ Is it when you have it, or when it’s rendered, or when you can interact with it? Those are things we’re planning to use tracing for to keep an eye on and to better understand."

-That also requires adjusting for differences in external and internal clocks. "Before time correcting, it was horrible; our traces were more misleading than anything," says Broad. "So we decided that we would return a timestamp on the response headers, and then have the client reorient its time based on that—not change its internal clock but just calculate the offset on the response time to when the client got it. And if you end up in an impossible situation where a client RPC spans 210 milliseconds but the time on the response time is outside of that window, then we have to reorient that."

-Broad is excited about the impact OpenTracing has already had on the company, and is also looking ahead to what else the technology can enable. One possibility is using tracing to update documentation in real time. "Keeping documentation up to date with reality is a big challenge," he says. "Say, we just ran a trace simulation or we just ran a smoke test on this new deploy, and the architecture doesn’t match the documentation. We can find whose responsibility it is and let them know and have them update it. That’s one of the places I’d like to get in the future with tracing." +

After the success of the first use case, everyone involved in the trial went back and fully instrumented their products. Tracing was added to a few more use cases. "We wanted to get through the initial implementation pains early without bringing the whole department along for the ride," says Broad. "Now, a lot of teams add it when they're starting up a new service. We're really pushing adoption now more than we were before."

-
+

Some teams were won over quickly. "Tracing has given us immediate, actionable insight into how to improve our [Workspaces] service," says Software Engineer Michael Davis. "Through a combination of seeing where each call spends its time, as well as which calls are most often used, we were able to reduce our average response time by 95 percent (from 600ms to 30ms) in a single fix."

-
+

Most of Workiva's major products are now traced using OpenTracing, with data pushed into Google StackDriver. Even the products that aren't fully traced have some components and libraries that are.

+ +

Broad points out that because some of the engineers were working on App Engine and already had experience with the platform's Appstats library for profiling performance, it didn't take much to get them used to using OpenTracing. But others were a little more reluctant. "The biggest hindrance to adoption I think has been the concern about how much latency is introducing tracing [and StackDriver] going to cost," he says. "People are also very concerned about adding middleware to whatever they're working on. Questions about passing the context around and how that's done were common. A lot of our Go developers were fine with it, because they were already doing that in one form or another. Our Java developers were not super keen on doing that because they'd used other systems that didn't require that."But the benefits clearly outweighed the concerns, and today, Workiva's official policy is to use tracing."

+ +

In fact, Broad believes that tracing naturally fits in with Workiva's existing logging and metrics systems. "This was the way we presented it internally, and also the way we designed our use," he says. "Our traces are logged in the exact same mechanism as our app metric and logging data, and they get pushed the exact same way. So we treat all that data exactly the same when it's being created and when it's being recorded. We have one internal library that we use for logging, telemetry, analytics and tracing."

+ +{{< case-studies/quote author="Michael Davis, Software Engineer, Workiva" >}} +"Tracing has given us immediate, actionable insight into how to improve our [Workspaces] service. Through a combination of seeing where each call spends its time, as well as which calls are most often used, we were able to reduce our average response time by 95 percent (from 600ms to 30ms) in a single fix." +{{< /case-studies/quote >}} + +

For Workiva, OpenTracing has become an essential tool for zeroing in on optimizations and determining what's actually a micro-optimization by observing usage patterns. "On some projects we often assume what the customer is doing, and we optimize for these crazy scale cases that we hit 1 percent of the time," says Broad. "It's been really helpful to be able to say, 'OK, we're adding 100 milliseconds on every request that does X, and we only need to add that 100 milliseconds if it's the worst of the worst case, which only happens one out of a thousand requests or one out of a million requests."

+ +

Unlike many other companies, Workiva also traces the client side. "For us, the user experience is important—it doesn't matter if the RPC takes 100 milliseconds if it still takes 5 seconds to do the rendering to show it in the browser," says Broad. "So for us, those client times are important. We trace it to see what parts of loading take a long time. We're in the middle of working on a definition of what is 'loaded.' Is it when you have it, or when it's rendered, or when you can interact with it? Those are things we're planning to use tracing for to keep an eye on and to better understand."

+ +

That also requires adjusting for differences in external and internal clocks. "Before time correcting, it was horrible; our traces were more misleading than anything," says Broad. "So we decided that we would return a timestamp on the response headers, and then have the client reorient its time based on that—not change its internal clock but just calculate the offset on the response time to when the client got it. And if you end up in an impossible situation where a client RPC spans 210 milliseconds but the time on the response time is outside of that window, then we have to reorient that."

+ +

Broad is excited about the impact OpenTracing has already had on the company, and is also looking ahead to what else the technology can enable. One possibility is using tracing to update documentation in real time. "Keeping documentation up to date with reality is a big challenge," he says. "Say, we just ran a trace simulation or we just ran a smoke test on this new deploy, and the architecture doesn't match the documentation. We can find whose responsibility it is and let them know and have them update it. That's one of the places I'd like to get in the future with tracing."

diff --git a/content/en/case-studies/ygrene/index.html b/content/en/case-studies/ygrene/index.html index c07443249a..9afc1ec45b 100644 --- a/content/en/case-studies/ygrene/index.html +++ b/content/en/case-studies/ygrene/index.html @@ -1,111 +1,82 @@ --- title: Ygrene Case Study - linkTitle: Ygrene case_study_styles: true cid: caseStudies -css: /css/style_case_studies.css logo: ygrene_featured_logo.png featured: true weight: 48 quote: > - We had to change some practices and code, and the way things were built, but we were able to get our main systems onto Kubernetes in a month or so, and then into production within two months. That’s very fast for a finance company. + We had to change some practices and code, and the way things were built, but we were able to get our main systems onto Kubernetes in a month or so, and then into production within two months. That's very fast for a finance company. + +new_case_study_styles: true +heading_background: /images/case-studies/ygrene/banner1.jpg +heading_title_logo: /images/ygrene_logo.png +subheading: > + Ygrene: Using Cloud Native to Bring Security and Scalability to the Finance Industry +case_study_details: + - Company: Ygrene + - Location: Petaluma, Calif. + - Industry: Clean energy financing --- -
-

CASE STUDY:
Ygrene: Using Cloud Native to Bring Security and Scalability to the Finance Industry +

Challenge

-

+

A PACE (Property Assessed Clean Energy) financing company, Ygrene has funded more than $1 billion in loans since 2010. In order to approve and process those loans, "We have lots of data sources that are being aggregated, and we also have lots of systems that need to churn on that data," says Ygrene Development Manager Austin Adams. The company was utilizing massive servers, and "we just reached the limit of being able to scale them vertically. We had a really unstable system that became overwhelmed with requests just for doing background data processing in real time. The performance the users saw was very poor. We needed a solution that wouldn't require us to make huge refactors to the code base." As a finance company, Ygrene also needed to ensure that they were shipping their applications securely.

-
+

Solution

-
- Company  Ygrene     Location  Petaluma, Calif.     Industry  Clean energy financing -
- -
-
-
-
-

Challenge

- A PACE (Property Assessed Clean Energy) financing company, Ygrene has funded more than $1 billion in loans since 2010. In order to approve and process those loans, "We have lots of data sources that are being aggregated, and we also have lots of systems that need to churn on that data," says Ygrene Development Manager Austin Adams. The company was utilizing massive servers, and "we just reached the limit of being able to scale them vertically. We had a really unstable system that became overwhelmed with requests just for doing background data processing in real time. The performance the users saw was very poor. We needed a solution that wouldn’t require us to make huge refactors to the code base." As a finance company, Ygrene also needed to ensure that they were shipping their applications securely. - -
- -

Solution

- Moving from an Engine Yard platform and Amazon Elastic Beanstalk, the Ygrene team embraced cloud native technologies and practices: Kubernetes to help scale out vertically and distribute workloads, Notary to put in build-time controls and get trust on the Docker images being used with third-party dependencies, and Fluentd for "observing every part of our stack," all running on Amazon EC2 Spot. - -
- -
+

Moving from an Engine Yard platform and Amazon Elastic Beanstalk, the Ygrene team embraced cloud native technologies and practices: Kubernetes to help scale out vertically and distribute workloads, Notary to put in build-time controls and get trust on the Docker images being used with third-party dependencies, and Fluentd for "observing every part of our stack," all running on Amazon EC2 Spot.

Impact

- Before, deployments typically took three to four hours, and two or three months’ worth of work would be deployed at low-traffic times every week or two weeks. Now, they take five minutes for Kubernetes, and an hour for the overall deploy with smoke testing. And "we’re able to deploy three or four times a week, with just one week’s or two days’ worth of work," Adams says. "We’re deploying during the work week, in the daytime and without any downtime. We had to ask for business approval to take the systems down, even in the middle of the night, because people could be doing loans. Now we can deploy, ship code, and migrate databases, all without taking the system down. The company gets new features without worrying that some business will be lost or delayed." Additionally, by using the kops project, Ygrene can now run its Kubernetes clusters with AWS EC2 Spot, at a tenth of the previous cost. These cloud native technologies have "changed the game for scalability, observability, and security—we’re adding new data sources that are very secure," says Adams. "Without Kubernetes, Notary, and Fluentd, we couldn’t tell our investors and team members that we knew what was going on." +

Before, deployments typically took three to four hours, and two or three months' worth of work would be deployed at low-traffic times every week or two weeks. Now, they take five minutes for Kubernetes, and an hour for the overall deploy with smoke testing. And "we're able to deploy three or four times a week, with just one week's or two days' worth of work," Adams says. "We're deploying during the work week, in the daytime and without any downtime. We had to ask for business approval to take the systems down, even in the middle of the night, because people could be doing loans. Now we can deploy, ship code, and migrate databases, all without taking the system down. The company gets new features without worrying that some business will be lost or delayed." Additionally, by using the kops project, Ygrene can now run its Kubernetes clusters with AWS EC2 Spot, at a tenth of the previous cost. These cloud native technologies have "changed the game for scalability, observability, and security—we're adding new data sources that are very secure," says Adams. "Without Kubernetes, Notary, and Fluentd, we couldn't tell our investors and team members that we knew what was going on."

-
+{{< case-studies/quote author="Austin Adams, Development Manager, Ygrene Energy Fund" >}} +"CNCF projects are helping Ygrene determine the security and observability standards for the entire PACE industry. We're an emerging finance industry, and without these projects, especially Kubernetes, we couldn't be the industry leader that we are today." +{{< /case-studies/quote >}} -
-
-
-
-"CNCF projects are helping Ygrene determine the security and observability standards for the entire PACE industry. We’re an emerging finance industry, and without these projects, especially Kubernetes, we couldn’t be the industry leader that we are today."

— Austin Adams, Development Manager, Ygrene Energy Fund
+{{< case-studies/lead >}} +In less than a decade, Ygrene has funded more than $1 billion in loans for renewable energy projects. +{{< /case-studies/lead >}} -
-
+

A PACE (Property Assessed Clean Energy) financing company, "We take the equity in a home or a commercial building, and use it to finance property improvements for anything that saves electricity, produces electricity, saves water, or reduces carbon emissions," says Development Manager Austin Adams.

-
-

In less than a decade, Ygrene has funded more than $1 billion in loans for renewable energy projects.

A PACE (Property Assessed Clean Energy) financing company, "We take the equity in a home or a commercial building, and use it to finance property improvements for anything that saves electricity, produces electricity, saves water, or reduces carbon emissions," says Development Manager Austin Adams.

-In order to approve those loans, the company processes an enormous amount of underwriting data. "We have tons of different points that we have to validate about the property, about the company, or about the person," Adams says. "So we have lots of data sources that are being aggregated, and we also have lots of systems that need to churn on that data in real time."

-By 2017, deployments and scalability had become pain points. The company was utilizing massive servers, and "we just reached the limit of being able to scale them vertically," he says. Migrating to AWS Elastic Beanstalk didn’t solve the problem: "The Scala services needed a lot of data from the main Ruby on Rails services and from different vendors, so they were asking for information from our Ruby services at a rate that those services couldn’t handle. We had lots of configuration misses with Elastic Beanstalk as well. It just came to a head, and we realized we had a really unstable system." +

In order to approve those loans, the company processes an enormous amount of underwriting data. "We have tons of different points that we have to validate about the property, about the company, or about the person," Adams says. "So we have lots of data sources that are being aggregated, and we also have lots of systems that need to churn on that data in real time."

-
- -
-
- "CNCF has been an amazing incubator for so many projects. Now we look at its webpage regularly to find out if there are any new, awesome, high-quality projects we can implement into our stack. It’s actually become a hub for us for knowing what software we need to be looking at to make our systems more secure or more scalable."

— Austin Adams, Development Manager, Ygrene Energy Fund
-
-
-
-
+

By 2017, deployments and scalability had become pain points. The company was utilizing massive servers, and "we just reached the limit of being able to scale them vertically," he says. Migrating to AWS Elastic Beanstalk didn't solve the problem: "The Scala services needed a lot of data from the main Ruby on Rails services and from different vendors, so they were asking for information from our Ruby services at a rate that those services couldn't handle. We had lots of configuration misses with Elastic Beanstalk as well. It just came to a head, and we realized we had a really unstable system."

-Adams along with the rest of the team set out to find a solution that would be transformational, but "wouldn’t require us to make huge refactors to the code base," he says. And as a finance company, Ygrene needed security as much as scalability. They found the answer by embracing cloud native technologies: Kubernetes to help scale out vertically and distribute workloads, Notary to achieve reliable security at every level, and Fluentd for observability. "Kubernetes was where the community was going, and we wanted to be future proof," says Adams.

-With Kubernetes, the team was able to quickly containerize the Ygrene application with Docker. "We had to change some practices and code, and the way things were built," Adams says, "but we were able to get our main systems onto Kubernetes in a month or so, and then into production within two months. That’s very fast for a finance company."

-How? Cloud native has "changed the game for scalability, observability, and security—we’re adding new data sources that are very secure," says Adams. "Without Kubernetes, Notary, and Fluentd, we couldn’t tell our investors and team members that we knew what was going on."

-Notary, in particular, "has been a godsend," says Adams. "We need to know that our attack surface on third-party dependencies is low, or at least managed. We use it as a trust system and we also use it as a separation, so production images are signed by Notary, but some development images we don’t sign. That is to ensure that they can’t get into the production cluster. We’ve been using it in the test cluster to feel more secure about our builds." +{{< case-studies/quote + image="/images/case-studies/ygrene/banner3.jpg" + author="Austin Adams, Development Manager, Ygrene Energy Fund" +>}} +"CNCF has been an amazing incubator for so many projects. Now we look at its webpage regularly to find out if there are any new, awesome, high-quality projects we can implement into our stack. It's actually become a hub for us for knowing what software we need to be looking at to make our systems more secure or more scalable." +{{< /case-studies/quote >}} +

Adams along with the rest of the team set out to find a solution that would be transformational, but "wouldn't require us to make huge refactors to the code base," he says. And as a finance company, Ygrene needed security as much as scalability. They found the answer by embracing cloud native technologies: Kubernetes to help scale out vertically and distribute workloads, Notary to achieve reliable security at every level, and Fluentd for observability. "Kubernetes was where the community was going, and we wanted to be future proof," says Adams.

+

With Kubernetes, the team was able to quickly containerize the Ygrene application with Docker. "We had to change some practices and code, and the way things were built," Adams says, "but we were able to get our main systems onto Kubernetes in a month or so, and then into production within two months. That's very fast for a finance company."

-
-
-
-
-"We had to change some practices and code, and the way things were built," Adams says, "but we were able to get our main systems onto Kubernetes in a month or so, and then into production within two months. That’s very fast for a finance company." -
-
+

How? Cloud native has "changed the game for scalability, observability, and security—we're adding new data sources that are very secure," says Adams. "Without Kubernetes, Notary, and Fluentd, we couldn't tell our investors and team members that we knew what was going on."

-
-
- By using the kops project, Ygrene was able to move from Elastic Beanstalk to running its Kubernetes clusters on AWS EC2 Spot, at a tenth of the previous cost. "In order to scale before, we would need to up our instance sizes, incurring high cost for low value," says Adams. "Now with Kubernetes and kops, we are able to scale horizontally on Spot with multiple instance groups."

-That also helped them mitigate the risk that comes with running in the public cloud. "We figured out, essentially, that if we’re able to select instance classes using EC2 Spot that had an extremely low likelihood of interruption and zero history of interruption, and we’re willing to pay a price high enough, that we could virtually get the same guarantee using Kubernetes because we have enough nodes," says Software Engineer Zach Arnold, who led the migration to Kubernetes. "Now that we’ve re-architected these pieces of the application to not live on the same server, we can push out to many different servers and have a more stable deployment."

-As a result, the team can now ship code any time of day. "That was risky because it could bring down your whole loan management software with it," says Arnold. "But we now can deploy safely and securely during the day." +

Notary, in particular, "has been a godsend," says Adams. "We need to know that our attack surface on third-party dependencies is low, or at least managed. We use it as a trust system and we also use it as a separation, so production images are signed by Notary, but some development images we don't sign. That is to ensure that they can't get into the production cluster. We've been using it in the test cluster to feel more secure about our builds."

+{{< case-studies/quote image="/images/case-studies/ygrene/banner4.jpg">}} +"We had to change some practices and code, and the way things were built," Adams says, "but we were able to get our main systems onto Kubernetes in a month or so, and then into production within two months. That's very fast for a finance company." +{{< /case-studies/quote >}} -
+

By using the kops project, Ygrene was able to move from Elastic Beanstalk to running its Kubernetes clusters on AWS EC2 Spot, at a tenth of the previous cost. "In order to scale before, we would need to up our instance sizes, incurring high cost for low value," says Adams. "Now with Kubernetes and kops, we are able to scale horizontally on Spot with multiple instance groups."

-
-
- "In order to scale before, we would need to up our instance sizes, incurring high cost for low value," says Adams. "Now with Kubernetes and kops, we are able to scale horizontally on Spot with multiple instance groups." -
-
+

That also helped them mitigate the risk that comes with running in the public cloud. "We figured out, essentially, that if we're able to select instance classes using EC2 Spot that had an extremely low likelihood of interruption and zero history of interruption, and we're willing to pay a price high enough, that we could virtually get the same guarantee using Kubernetes because we have enough nodes," says Software Engineer Zach Arnold, who led the migration to Kubernetes. "Now that we've re-architected these pieces of the application to not live on the same server, we can push out to many different servers and have a more stable deployment."

-
- Before, deployments typically took three to four hours, and two or three months’ worth of work would be deployed at low-traffic times every week or two weeks. Now, they take five minutes for Kubernetes, and an hour for an overall deploy with smoke testing. And "we’re able to deploy three or four times a week, with just one week’s or two days’ worth of work," Adams says. "We’re deploying during the work week, in the daytime and without any downtime. We had to ask for business approval to take the systems down for 30 minutes to an hour, even in the middle of the night, because people could be doing loans. Now we can deploy, ship code, and migrate databases, all without taking the system down. The company gets new features without worrying that some business will be lost or delayed."

-Cloud native also affected how Ygrene’s 50+ developers and contractors work. Adams and Arnold spent considerable time "teaching people to think distributed out of the box," says Arnold. "We ended up picking what we call the Four S’s of Shipping: safely, securely, stably, and speedily." (For more on the security piece of it, see their article on their "continuous hacking" strategy.) As for the engineers, says Adams, "they have been able to advance as their software has advanced. I think that at the end of the day, the developers feel better about what they’re doing, and they also feel more connected to the modern software development community."

-Looking ahead, Adams is excited to explore more CNCF projects, including SPIFFE and SPIRE. "CNCF has been an amazing incubator for so many projects," he says. "Now we look at its webpage regularly to find out if there are any new, awesome, high-quality projects we can implement into our stack. It’s actually become a hub for us for knowing what software we need to be looking at to make our systems more secure or more scalable." +

As a result, the team can now ship code any time of day. "That was risky because it could bring down your whole loan management software with it," says Arnold. "But we now can deploy safely and securely during the day."

+{{< case-studies/quote >}} +"In order to scale before, we would need to up our instance sizes, incurring high cost for low value," says Adams. "Now with Kubernetes and kops, we are able to scale horizontally on Spot with multiple instance groups." +{{< /case-studies/quote >}} +

Before, deployments typically took three to four hours, and two or three months' worth of work would be deployed at low-traffic times every week or two weeks. Now, they take five minutes for Kubernetes, and an hour for an overall deploy with smoke testing. And "we're able to deploy three or four times a week, with just one week's or two days' worth of work," Adams says. "We're deploying during the work week, in the daytime and without any downtime. We had to ask for business approval to take the systems down for 30 minutes to an hour, even in the middle of the night, because people could be doing loans. Now we can deploy, ship code, and migrate databases, all without taking the system down. The company gets new features without worrying that some business will be lost or delayed."

-
+

Cloud native also affected how Ygrene's 50+ developers and contractors work. Adams and Arnold spent considerable time "teaching people to think distributed out of the box," says Arnold. "We ended up picking what we call the Four S's of Shipping: safely, securely, stably, and speedily." (For more on the security piece of it, see their article on their "continuous hacking" strategy.) As for the engineers, says Adams, "they have been able to advance as their software has advanced. I think that at the end of the day, the developers feel better about what they're doing, and they also feel more connected to the modern software development community."

-
+

Looking ahead, Adams is excited to explore more CNCF projects, including SPIFFE and SPIRE. "CNCF has been an amazing incubator for so many projects," he says. "Now we look at its webpage regularly to find out if there are any new, awesome, high-quality projects we can implement into our stack. It's actually become a hub for us for knowing what software we need to be looking at to make our systems more secure or more scalable."

diff --git a/content/en/case-studies/zalando/index.html b/content/en/case-studies/zalando/index.html index 49bad6ff9e..23363da401 100644 --- a/content/en/case-studies/zalando/index.html +++ b/content/en/case-studies/zalando/index.html @@ -1,101 +1,83 @@ --- title: Zalando Case Study - case_study_styles: true cid: caseStudies -css: /css/style_zalando.css + +new_case_study_styles: true +heading_background: /images/case-studies/zalando/banner1.jpg +heading_title_logo: /images/zalando_logo.png +subheading: > + Europe's Leading Online Fashion Platform Gets Radical with Cloud Native +case_study_details: + - Company: Zalando + - Location: Berlin, Germany + - Industry: Online Fashion --- -
-

CASE STUDY:
Europe’s Leading Online Fashion Platform Gets Radical with Cloud Native -

+

Challenge

-
+

Zalando, Europe's leading online fashion platform, has experienced exponential growth since it was founded in 2008. In 2015, with plans to further expand its original e-commerce site to include new services and products, Zalando embarked on a radical transformation resulting in autonomous self-organizing teams. This change requires an infrastructure that could scale with the growth of the engineering organization. Zalando's technology department began rewriting its applications to be cloud-ready and started moving its infrastructure from on-premise data centers to the cloud. While orchestration wasn't immediately considered, as teams migrated to Amazon Web Services (AWS): "We saw the pain teams were having with infrastructure and Cloud Formation on AWS," says Henning Jacobs, Head of Developer Productivity. "There's still too much operational overhead for the teams and compliance. " To provide better support, cluster management was brought into play.

-
- Company  Zalando     Location  Berlin, Germany     Industry  Online Fashion -
+

Solution

-
-
-
-
-

Challenge

- Zalando, Europe’s leading online fashion platform, has experienced exponential growth since it was founded in 2008. In 2015, with plans to further expand its original e-commerce site to include new services and products, Zalando embarked on a radical transformation resulting in autonomous self-organizing teams. This change requires an infrastructure that could scale with the growth of the engineering organization. Zalando’s technology department began rewriting its applications to be cloud-ready and started moving its infrastructure from on-premise data centers to the cloud. While orchestration wasn’t immediately considered, as teams migrated to Amazon Web Services (AWS): "We saw the pain teams were having with infrastructure and Cloud Formation on AWS," says Henning Jacobs, Head of Developer Productivity. "There’s still too much operational overhead for the teams and compliance. " To provide better support, cluster management was brought into play. +

The company now runs its Docker containers on AWS using Kubernetes orchestration.

-
- -
-

Solution

- The company now runs its Docker containers on AWS using Kubernetes orchestration. -

Impact

- With the old infrastructure "it was difficult to properly embrace new technologies, and DevOps teams were considered to be a bottleneck," says Jacobs. "Now, with this cloud infrastructure, they have this packaging format, which can contain anything that runs on the Linux kernel. This makes a lot of people pretty happy. The engineers love autonomy." -
-
-
-
-
- "We envision all Zalando delivery teams running their containerized applications on a state-of-the-art, reliable and scalable cluster infrastructure provided by Kubernetes."

- Henning Jacobs, Head of Developer Productivity at Zalando
-
-
-
-
-

When Henning Jacobs arrived at Zalando in 2010, the company was just two years old with 180 employees running an online store for European shoppers to buy fashion items.

- "It started as a PHP e-commerce site which was easy to get started with, but was not scaling with the business' needs" says Jacobs, Head of Developer Productivity at Zalando.

- At that time, the company began expanding beyond its German origins into other European markets. Fast-forward to today and Zalando now has more than 14,000 employees, 3.6 billion Euro in revenue for 2016 and operates across 15 countries. "With growth in all dimensions, and constant scaling, it has been a once-in-a-lifetime experience," he says.

- Not to mention a unique opportunity for an infrastructure specialist like Jacobs. Just after he joined, the company began rewriting all their applications in-house. "That was generally our strategy," he says. "For example, we started with our own logistics warehouses but at first you don’t know how to do logistics software, so you have some vendor software. And then we replaced it with our own because with off-the-shelf software you’re not competitive. You need to optimize these processes based on your specific business needs."

- In parallel to rewriting their applications, Zalando had set a goal of expanding beyond basic e-commerce to a platform offering multi-tenancy, a dramatic increase in assortments and styles, same-day delivery and even your own personal online stylist.

- The need to scale ultimately led the company on a cloud-native journey. As did its embrace of a microservices-based software architecture that gives engineering teams more autonomy and ownership of projects. "This move to the cloud was necessary because in the data center you couldn’t have autonomous teams. You have the same infrastructure and it was very homogeneous, so you could only run your Java or Python app," Jacobs says. -
-
-
-
- "This move to the cloud was necessary because in the data center you couldn’t have autonomous teams. You have the same infrastructure and it was very homogeneous, so you could only run your Java or Python app." +

With the old infrastructure "it was difficult to properly embrace new technologies, and DevOps teams were considered to be a bottleneck," says Jacobs. "Now, with this cloud infrastructure, they have this packaging format, which can contain anything that runs on the Linux kernel. This makes a lot of people pretty happy. The engineers love autonomy."

-
-
-
-
- Zalando began moving its infrastructure from two on-premise data centers to the cloud, requiring the migration of older applications for cloud-readiness. "We decided to have a clean break," says Jacobs. "Our Amazon Web Services infrastructure was set up like so: Every team has its own AWS account, which is completely isolated, meaning there’s no ‘lift and shift.’ You basically have to rewrite your application to make it cloud-ready even down to the persistence layer. We bravely went back to the drawing board and redid everything, first choosing Docker as a common containerization, then building the infrastructure from there."

- The company decided to hold off on orchestration at the beginning, but as teams were migrated to AWS, "we saw the pain teams were having with infrastructure and cloud formation on AWS," says Jacobs.

- Zalandos 200+ autonomous engineering teams decide what technologies to use and could operate their own applications using their own AWS accounts. This setup proved to be a compliance challenge. Even with strict rules-of-play and automated compliance checks in place, engineering teams and IT-compliance were overburdened addressing compliance issues. "Violations appear for non-compliant behavior, which we detect when scanning the cloud infrastructure," says Jacobs. "Everything is possible and nothing enforced, so you have to live with violations (and resolve them) instead of preventing the error in the first place. This means overhead for teams—and overhead for compliance and operations. It also takes time to spin up new EC2 instances on AWS, which affects our deployment velocity."

- The team realized they needed to "leverage the value you get from cluster management," says Jacobs. When they first looked at Platform as a Service (PaaS) options in 2015, the market was fragmented; but "now there seems to be a clear winner. It seemed like a good bet to go with Kubernetes."

- The transition to Kubernetes started in 2016 during Zalando’s Hack Week where participants deployed their projects to a Kubernetes cluster. From there 60 members of the tech infrastructure department were on-boarded - and then engineering teams were brought on one at a time. "We always start by talking with them and make sure everyone’s expectations are clear," says Jacobs. "Then we conduct some Kubernetes training, which is mostly training for our CI/CD setup, because the user interface for our users is primarily through the CI/CD system. But they have to know fundamental Kubernetes concepts and the API. This is followed by a weekly sync with each team to check their progress. Once they have something in production, we want to see if everything is fine on top of what we can improve." +{{< case-studies/quote author="Henning Jacobs, Head of Developer Productivity at Zalando" >}} +"We envision all Zalando delivery teams running their containerized applications on a state-of-the-art, reliable and scalable cluster infrastructure provided by Kubernetes." +{{< /case-studies/quote >}} -
-
-
-
- Once Zalando began migrating applications to Kubernetes, the results were immediate. "Kubernetes is a cornerstone for our seamless end-to-end developer experience. We are able to ship ideas to production using a single consistent and declarative API," says Jacobs. -
-
+{{< case-studies/lead >}} +When Henning Jacobs arrived at Zalando in 2010, the company was just two years old with 180 employees running an online store for European shoppers to buy fashion items. +{{< /case-studies/lead >}} -
-
- At the moment, Zalando is running an initial 40 Kubernetes clusters with plans to scale for the foreseeable future. - Once Zalando began migrating applications to Kubernetes, the results were immediate. "Kubernetes is a cornerstone for our seamless end-to-end developer experience. We are able to ship ideas to production using a single consistent and declarative API," says Jacobs. "The self-healing infrastructure provides a frictionless experience with higher-level abstractions built upon low-level best practices. We envision all Zalando delivery teams will run their containerized applications on a state-of-the-art reliable and scalable cluster infrastructure provided by Kubernetes."

- With the old on-premise infrastructure "it was difficult to properly embrace new technologies, and DevOps teams were considered to be a bottleneck," says Jacobs. "Now, with this cloud infrastructure, they have this packaging format, which can contain anything that runs in the Linux kernel. This makes a lot of people pretty happy. The engineers love the autonomy." - There were a few challenges in Zalando’s Kubernetes implementation. "We are a team of seven people providing clusters to different engineering teams, and our goal is to provide a rock-solid experience for all of them," says Jacobs. "We don’t want pet clusters. We don’t want to have to understand what workload they have; it should just work out of the box. With that in mind, cluster autoscaling is important. There are many different ways of doing cluster management, and this is not part of the core. So we created two components to provision clusters, have a registry for clusters, and to manage the whole cluster life cycle."

- Jacobs’s team also worked to improve the Kubernetes-AWS integration. "Thus you're very restricted. You need infrastructure to scale each autonomous team’s idea.""

- Plus, "there are still a lot of best practices missing," says Jacobs. The team, for example, recently solved a pod security policy issue. "There was already a concept in Kubernetes but it wasn’t documented, so it was kind of tricky," he says. The large Kubernetes community was a big help to resolve the issue. To help other companies start down the same path, Jacobs compiled his team’s learnings in a document called Running Kubernetes in Production. +

"It started as a PHP e-commerce site which was easy to get started with, but was not scaling with the business' needs" says Jacobs, Head of Developer Productivity at Zalando.

-
-
+

At that time, the company began expanding beyond its German origins into other European markets. Fast-forward to today and Zalando now has more than 14,000 employees, 3.6 billion Euro in revenue for 2016 and operates across 15 countries. "With growth in all dimensions, and constant scaling, it has been a once-in-a-lifetime experience," he says.

-
-
- "The Kubernetes API allows us to run applications in a cloud provider-agnostic way, which gives us the freedom to revisit IaaS providers in the coming years... We expect the Kubernetes API to be the global standard for PaaS infrastructure and are excited about the continued journey." -
-
-
-
- In the end, Kubernetes made it possible for Zalando to introduce and maintain the new products the company envisioned to grow its platform. "The fashion advice product used Scala, and there were struggles to make this possible with our former infrastructure," says Jacobs. "It was a workaround, and that team needed more and more support from the platform team, just because they used different technologies. Now with Kubernetes, it’s autonomous. Whatever the workload is, that team can just go their way, and Kubernetes prevents other bottlenecks."

- Looking ahead, Jacobs sees Zalando’s new infrastructure as a great enabler for other things the company has in the works, from its new logistics software, to a platform feature connecting brands, to products dreamed up by data scientists. "One vision is if you watch the next James Bond movie and see the suit he’s wearing, you should be able to automatically order it, and have it delivered to you within an hour," says Jacobs. "It’s about connecting the full fashion sphere. This is definitely not possible if you have a bottleneck with everyone running in the same data center and thus very restricted. You need infrastructure to scale each autonomous team’s idea."

- For other companies considering this technology, Jacobs says he wouldn’t necessarily advise doing it exactly the same way Zalando did. "It’s okay to do so if you’re ready to fail at some things," he says. "You need to set the right expectations. Not everything will work. Rewriting apps and this type of organizational change can be disruptive. The first product we moved was critical. There were a lot of dependencies, and it took longer than expected. Maybe we should have started with something less complicated, less business critical, just to get our toes wet."

- But once they got to the other side "it was clear for everyone that there’s no big alternative," Jacobs adds. "The Kubernetes API allows us to run applications in a cloud provider-agnostic way, which gives us the freedom to revisit IaaS providers in the coming years. Zalando Technology benefits from migrating to Kubernetes as we are able to leverage our existing knowledge to create an engineering platform offering flexibility and speed to our engineers while significantly reducing the operational overhead. We expect the Kubernetes API to be the global standard for PaaS infrastructure and are excited about the continued journey." +

Not to mention a unique opportunity for an infrastructure specialist like Jacobs. Just after he joined, the company began rewriting all their applications in-house. "That was generally our strategy," he says. "For example, we started with our own logistics warehouses but at first you don't know how to do logistics software, so you have some vendor software. And then we replaced it with our own because with off-the-shelf software you're not competitive. You need to optimize these processes based on your specific business needs."

-
+

In parallel to rewriting their applications, Zalando had set a goal of expanding beyond basic e-commerce to a platform offering multi-tenancy, a dramatic increase in assortments and styles, same-day delivery and even your own personal online stylist.

-
+

The need to scale ultimately led the company on a cloud-native journey. As did its embrace of a microservices-based software architecture that gives engineering teams more autonomy and ownership of projects. "This move to the cloud was necessary because in the data center you couldn't have autonomous teams. You have the same infrastructure and it was very homogeneous, so you could only run your Java or Python app," Jacobs says.

+ +{{< case-studies/quote image="/images/case-studies/zalando/banner3.jpg" >}} +"This move to the cloud was necessary because in the data center you couldn't have autonomous teams. You have the same infrastructure and it was very homogeneous, so you could only run your Java or Python app." +{{< /case-studies/quote >}} + +

Zalando began moving its infrastructure from two on-premise data centers to the cloud, requiring the migration of older applications for cloud-readiness. "We decided to have a clean break," says Jacobs. "Our Amazon Web Services infrastructure was set up like so: Every team has its own AWS account, which is completely isolated, meaning there's no 'lift and shift.' You basically have to rewrite your application to make it cloud-ready even down to the persistence layer. We bravely went back to the drawing board and redid everything, first choosing Docker as a common containerization, then building the infrastructure from there."

+ +

The company decided to hold off on orchestration at the beginning, but as teams were migrated to AWS, "we saw the pain teams were having with infrastructure and cloud formation on AWS," says Jacobs.

+ +

Zalandos 200+ autonomous engineering teams decide what technologies to use and could operate their own applications using their own AWS accounts. This setup proved to be a compliance challenge. Even with strict rules-of-play and automated compliance checks in place, engineering teams and IT-compliance were overburdened addressing compliance issues. "Violations appear for non-compliant behavior, which we detect when scanning the cloud infrastructure," says Jacobs. "Everything is possible and nothing enforced, so you have to live with violations (and resolve them) instead of preventing the error in the first place. This means overhead for teams—and overhead for compliance and operations. It also takes time to spin up new EC2 instances on AWS, which affects our deployment velocity."

+ +

The team realized they needed to "leverage the value you get from cluster management," says Jacobs. When they first looked at Platform as a Service (PaaS) options in 2015, the market was fragmented; but "now there seems to be a clear winner. It seemed like a good bet to go with Kubernetes."

+ +

The transition to Kubernetes started in 2016 during Zalando's Hack Week where participants deployed their projects to a Kubernetes cluster. From there 60 members of the tech infrastructure department were on-boarded - and then engineering teams were brought on one at a time. "We always start by talking with them and make sure everyone's expectations are clear," says Jacobs. "Then we conduct some Kubernetes training, which is mostly training for our CI/CD setup, because the user interface for our users is primarily through the CI/CD system. But they have to know fundamental Kubernetes concepts and the API. This is followed by a weekly sync with each team to check their progress. Once they have something in production, we want to see if everything is fine on top of what we can improve."

+ +{{< case-studies/quote image="/images/case-studies/zalando/banner4.jpg" >}} +Once Zalando began migrating applications to Kubernetes, the results were immediate. "Kubernetes is a cornerstone for our seamless end-to-end developer experience. We are able to ship ideas to production using a single consistent and declarative API," says Jacobs. +{{< /case-studies/quote >}} + +

At the moment, Zalando is running an initial 40 Kubernetes clusters with plans to scale for the foreseeable future. Once Zalando began migrating applications to Kubernetes, the results were immediate. "Kubernetes is a cornerstone for our seamless end-to-end developer experience. We are able to ship ideas to production using a single consistent and declarative API," says Jacobs. "The self-healing infrastructure provides a frictionless experience with higher-level abstractions built upon low-level best practices. We envision all Zalando delivery teams will run their containerized applications on a state-of-the-art reliable and scalable cluster infrastructure provided by Kubernetes."

+ +

With the old on-premise infrastructure "it was difficult to properly embrace new technologies, and DevOps teams were considered to be a bottleneck," says Jacobs. "Now, with this cloud infrastructure, they have this packaging format, which can contain anything that runs in the Linux kernel. This makes a lot of people pretty happy. The engineers love the autonomy."

+ +

There were a few challenges in Zalando's Kubernetes implementation. "We are a team of seven people providing clusters to different engineering teams, and our goal is to provide a rock-solid experience for all of them," says Jacobs. "We don't want pet clusters. We don't want to have to understand what workload they have; it should just work out of the box. With that in mind, cluster autoscaling is important. There are many different ways of doing cluster management, and this is not part of the core. So we created two components to provision clusters, have a registry for clusters, and to manage the whole cluster life cycle."

+ +

Jacobs's team also worked to improve the Kubernetes-AWS integration. "Thus you're very restricted. You need infrastructure to scale each autonomous team's idea." Plus, "there are still a lot of best practices missing," says Jacobs. The team, for example, recently solved a pod security policy issue. "There was already a concept in Kubernetes but it wasn't documented, so it was kind of tricky," he says. The large Kubernetes community was a big help to resolve the issue. To help other companies start down the same path, Jacobs compiled his team's learnings in a document called Running Kubernetes in Production.

+ +{{< case-studies/quote >}} +"The Kubernetes API allows us to run applications in a cloud provider-agnostic way, which gives us the freedom to revisit IaaS providers in the coming years... We expect the Kubernetes API to be the global standard for PaaS infrastructure and are excited about the continued journey." +{{< /case-studies/quote >}} + +

In the end, Kubernetes made it possible for Zalando to introduce and maintain the new products the company envisioned to grow its platform. "The fashion advice product used Scala, and there were struggles to make this possible with our former infrastructure," says Jacobs. "It was a workaround, and that team needed more and more support from the platform team, just because they used different technologies. Now with Kubernetes, it's autonomous. Whatever the workload is, that team can just go their way, and Kubernetes prevents other bottlenecks."

+ +

Looking ahead, Jacobs sees Zalando's new infrastructure as a great enabler for other things the company has in the works, from its new logistics software, to a platform feature connecting brands, to products dreamed up by data scientists. "One vision is if you watch the next James Bond movie and see the suit he's wearing, you should be able to automatically order it, and have it delivered to you within an hour," says Jacobs. "It's about connecting the full fashion sphere. This is definitely not possible if you have a bottleneck with everyone running in the same data center and thus very restricted. You need infrastructure to scale each autonomous team's idea."

+ +

For other companies considering this technology, Jacobs says he wouldn't necessarily advise doing it exactly the same way Zalando did. "It's okay to do so if you're ready to fail at some things," he says. "You need to set the right expectations. Not everything will work. Rewriting apps and this type of organizational change can be disruptive. The first product we moved was critical. There were a lot of dependencies, and it took longer than expected. Maybe we should have started with something less complicated, less business critical, just to get our toes wet."

+ +

But once they got to the other side "it was clear for everyone that there's no big alternative," Jacobs adds. "The Kubernetes API allows us to run applications in a cloud provider-agnostic way, which gives us the freedom to revisit IaaS providers in the coming years. Zalando Technology benefits from migrating to Kubernetes as we are able to leverage our existing knowledge to create an engineering platform offering flexibility and speed to our engineers while significantly reducing the operational overhead. We expect the Kubernetes API to be the global standard for PaaS infrastructure and are excited about the continued journey."

diff --git a/layouts/case-studies/single.html b/layouts/case-studies/single.html index 2d8bbdad98..ad0335685a 100644 --- a/layouts/case-studies/single.html +++ b/layouts/case-studies/single.html @@ -2,6 +2,33 @@ {{ with .Params.content_url }} {{ else }} - {{ .Content }} + {{ if .Params.new_case_study_styles }} + + + +
+ {{ range $detail := .Params.case_study_details }} + {{ range $key, $value := $detail }} + {{ $key }} {{ $value }} + {{ end }} + {{ end }} +
+ +
+ {{ .Content }} +
+ {{ else }} + {{ .Content }} + {{ end }} {{ end }} {{ end }} diff --git a/layouts/partials/css.html b/layouts/partials/css.html index cb2cef6eb6..7abfb21e72 100644 --- a/layouts/partials/css.html +++ b/layouts/partials/css.html @@ -27,6 +27,10 @@ {{- end }} +{{- if .Params.new_case_study_styles }} + +{{- end}} + {{- with .Params.css }} {{- $extraCss := split . "," }} {{- range $extraCss }} diff --git a/layouts/shortcodes/case-studies/lead.html b/layouts/shortcodes/case-studies/lead.html new file mode 100644 index 0000000000..362224c151 --- /dev/null +++ b/layouts/shortcodes/case-studies/lead.html @@ -0,0 +1 @@ +
{{ .Inner }}
diff --git a/layouts/shortcodes/case-studies/quote.html b/layouts/shortcodes/case-studies/quote.html new file mode 100644 index 0000000000..e0990c8c18 --- /dev/null +++ b/layouts/shortcodes/case-studies/quote.html @@ -0,0 +1,12 @@ +{{ if .Get "image" }} +