add SlingTV and Workiva case studies (#9769)
@@ -121,6 +121,14 @@ cid: home
 <main>
 <h3>Case Studies</h3>
 <div id="caseStudiesWrapper">
+<div>
+<p>Sling TV: Marrying Kubernetes and AI to Enable Proper Web Scale</p>
+<a href="/case-studies/slingtv">Read more</a>
+</div>
+<div>
+<p>Using OpenTracing to Help Pinpoint the Bottlenecks</p>
+<a href="/case-studies/workiva">Read more</a>
+</div>
 <div>
 <p>Pinning Its Past, Present, and Future on Cloud Native</p>
 <a href="/case-studies/pinterest">Read more</a>
@@ -129,15 +137,6 @@ cid: home
 <p>Reinventing the World’s Largest Education Company With Kubernetes</p>
 <a href="/case-studies/pearson">Read more</a>
 </div>
-
-<div>
-<p>Supporting Fast Decisioning Applications with Kubernetes</p>
-<a href="/case-studies/capital-one">Read more</a>
-</div>
-<div>
-<p>Driving Banking Innovation with Cloud Native</p>
-<a href="/case-studies/ing">Read more</a>
-</div>

 </div>
@@ -12,6 +12,20 @@ cid: caseStudies
 <div class="content">
 <div class="case-studies">

+<div class="case-study">
+<img src="/images/case_studies/slingtv_feature.png" alt="Sling TV">
+<p class="quote">"I would almost be so bold as to say that most of these applications that we are building now would not have been possible without the cloud native patterns and the flexibility that Kubernetes enables."</p>
+<!--<p class="attrib">— Brad Linder, Cloud Native & Big Data Evangelist for Sling TV</p>-->
+<a href="/case-studies/slingtv/">Read about Sling TV</a>
+</div>
+
+<div class="case-study">
+<img src="/images/case_studies/workiva_feature.png" alt="Workiva">
+<p class="quote">"With OpenTracing, my team was able to look at a trace and make optimization suggestions to another team without ever looking at their code."</p>
+<!--<p class="attrib">— MacLeod Broad, Senior Software Architect at Workiva</p>-->
+<a href="/case-studies/workiva/">Read about Workiva</a>
+</div>
+
 <div class="case-study">
 <img src="/images/case_studies/pinterest_feature.png" alt="Pinterest">
 <p class="quote">"We are in the position to run things at scale, in a public cloud environment, and test things out in a way that a lot of people might not be able to do."</p>
@@ -25,19 +39,7 @@ cid: caseStudies
 <!--<p class="attrib">— CHRIS JACKSON, DIRECTOR FOR CLOUD PLATFORMS & SRE AT PEARSON</p>-->
 <a href="/case-studies/pearson/">Read about Pearson</a>
 </div>
-<div class="case-study">
-<img src="/images/case_studies/ing_feature.png" alt="ING">
-<p class="quote">"The big cloud native promise to our business is the ability to go from idea to production within 48 hours. We are some years away from this, but that’s quite feasible to us."</p>
-<!--<p class="attrib">— Thijs Ebbers, Infrastructure Architect, ING</p>-->
-<a href="/case-studies/ing/">Read about ING</a>
-</div>

-<div class="case-study">
-<img src="/images/case_studies/capitalone_feature.png" alt="Capital One">
-<p class="quote">"With the scalability, the management, the coordination, Kubernetes really empowers us and gives us more time back than we had before."</p>
-<!--<p class="attrib">— Jamil Jadallah, Scrum Master</p>-->
-<a href="/case-studies/capital-one/">Read about Capital One</a>
-</div>

 </div>
 </div>
@@ -121,6 +123,8 @@ cid: caseStudies
 <a target="_blank" href="https://youtu.be/4gyeixJLabo"><img src="/images/case_studies/sap.png" alt="SAP"></a>

 <a target="_blank" href="http://www.nextplatform.com/2016/05/24/samsung-experts-put-kubernetes-paces/"><img src="/images/case_studies/sds.png" alt="Samsung SDS"></a>

+<a target="_blank" href="/case-studies/slingtv/"><img src="/images/case_studies/slingtv_feature.png" alt="Sling TV"></a>
+
 <a target="_blank" href="/case-studies/squarespace/"><img src="/images/case_studies/squarespace_feature.png" alt="Squarespace"></a>
@@ -133,6 +137,8 @@ cid: caseStudies
 <a target="_blank" href="/case-studies/wink/"><img src="/images/case_studies/wink.png" alt="Wink"></a>

+<a target="_blank" href="/case-studies/workiva/"><img src="/images/case_studies/workiva_feature.png" alt="Workiva"></a>
+
 <a target="_blank" href="https://kubernetes.io/blog/2016/10/kubernetes-and-openstack-at-yahoo-japan"><img src="/images/case_studies/yahooJapan_logo.png" alt="Yahoo! Japan"></a>

 <a target="_blank" href="/case-studies/zalando/"><img src="/images/case_studies/zalando_feature.png" alt="Zalando"></a>
@@ -0,0 +1,105 @@
---
title: SlingTV Case Study
case_study_styles: true
cid: caseStudies
css: /css/style_case_studies.css
---

<div class="banner1 desktop" style="background-image: url('/images/CaseStudy_slingtv_banner1.jpg')">
<h1> CASE STUDY:<img src="/images/slingtv_logo.png" style="margin-bottom:-1.5%" class="header_logo"><br> <div class="subhead">Sling TV: Marrying Kubernetes and AI to Enable Proper Web Scale
</div></h1>
</div>

<div class="details">
Company <b>Sling TV</b> Location <b>Englewood, Colorado</b> Industry <b>Streaming television</b>
</div>

<hr>
<section class="section1">
<div class="cols">
<div class="col1" style="width:95% !important;padding-left:5%">
<h2>Challenge</h2>
Launched by DISH Network in 2015, Sling TV experienced great customer growth from the beginning. After just a year, “we were going through some growing pains of some of the legacy systems and trying to find the right architecture to enable our future,” says Brad Linder, Sling TV’s Cloud Native & Big Data Evangelist. The company has particular challenges: “We take live TV and distribute it over the internet out to a user’s device that we do not control,” says Linder. “In a lot of ways, we are working in the Wild West: The internet is what it is going to be, and if a customer’s service does not work for whatever reason, they do not care why. They just want things to work. Those are the variables of the equation that we have to try to solve. We really have to try to enable optionality and good customer experience at web scale.”
<br><br>
<h2>Solution</h2>
Led by the belief that “the cloud native architectures and patterns really give us a lot of flexibility in meeting the needs of that sort of customer base,” Linder partnered with <a href="http://rancher.com">Rancher Labs</a> to build Sling TV’s next-generation platform around Kubernetes. “We are going to need to enable a hybrid cloud strategy including multiple public clouds and an on-premise VMware multi data center environment to meet the needs of the business at some point, so getting that sort of abstraction was a real goal,” he says. “That is one of the biggest reasons why we picked Kubernetes.” The team launched its first applications on Kubernetes in Sling TV’s two internal data centers. The push to enable AWS as a data center option is underway and should be available by the end of 2018. The team has added <a href="https://prometheus.io/">Prometheus</a> for monitoring and <a href="https://github.com/jaegertracing/jaeger">Jaeger</a> for tracing, to work alongside the company’s existing tool sets: Zenoss, New Relic and ELK.
</div>
<br><br>
<div class="col2" style="width:95% !important;padding-left:5%">
<h2>Impact</h2>
“We are getting to the place where we can one-click deploy an entire data center – the compute, network, Kubernetes, logging, monitoring and all the apps,” says Linder. “We have really enabled a platform thinking based approach to allowing applications to consume common tools. A new application can be onboarded in about an hour using common tooling and CI/CD processes. The gains on that side have been huge. Before, it took at least a few days to get things sorted for a new application to deploy. That does not consider the training of our operations staff to manage this new application. It is two or three orders of magnitude of savings in time and cost, and operationally it has given us the opportunity to let a core team of talented operations engineers manage common infrastructure and tooling to make our applications available at web scale.”
</div>
</div>
</section>
<div class="banner2">
<div class="banner2text">
“I would almost be so bold as to say that most of these applications that we are building now would not have been possible without the cloud native patterns and the flexibility that Kubernetes enables.”<span style="font-size:14px;letter-spacing:0.12em;padding-top:20px;text-transform:uppercase;line-height:14px"><br><br>— Brad Linder, Cloud Native & Big Data Evangelist for Sling TV</span>
</div>
</div>
<section class="section2">
<div class="fullcol">
<h2>The beauty of streaming television, like the service offered by <a href="https://www.sling.com/" style="text-decoration:underline">Sling TV</a>, is that you can watch it from any device you want, wherever you want.</h2> Of course, from the provider side of things, that creates a particular set of challenges.
“We take live TV and distribute it over the internet out to a user’s device that we do not control,” says Brad Linder, Sling TV’s Cloud Native & Big Data Evangelist. “In a lot of ways, we are working in the Wild West: The internet is what it is going to be, and if a customer’s service does not work for whatever reason, they do not care why. They just want things to work. Those are the variables of the equation that we have to try to solve. We really have to try to enable optionality and we have to do it at web scale.”<br><br>
Indeed, Sling TV experienced great customer growth from the beginning of its launch by <a href="https://www.dish.com/">DISH Network</a> in 2015. After just a year, “we were going through some growing pains of some of the legacy systems and trying to find the right architecture to enable our future,” says Linder. Tasked with building a next-generation web scale platform for the “personalized customer experience,” Linder has spent the past year bringing Kubernetes to Sling TV.<br><br>
Led by the belief that “the cloud native architectures and patterns really give us a lot of flexibility in meeting the needs of our customers,” Linder partnered with <a href="http://rancher.com">Rancher Labs</a> to build the platform around Kubernetes. “They have really helped us get our head around how to use Kubernetes,” he says. “We needed the flexibility to enable our use case versus just a simple orchestrator. Enabling our future in a way that did not give us vendor lock-in was also a key part of our strategy. I think that is part of the Rancher value proposition.”
</div>
</section>
<div class="banner3" style="background-image: url('/images/CaseStudy_slingtv_banner3.jpg')">
<div class="banner3text">
“We needed the flexibility to enable our use case versus just a simple orchestrator. Enabling our future in a way that did not give us vendor lock-in was also a key part of our strategy. I think that is part of the Rancher value proposition.” <span style="font-size:14px;letter-spacing:0.12em;padding-top:20px;text-transform:uppercase;line-height:14px"><br><br>— Brad Linder, Cloud Native & Big Data Evangelist for Sling TV</span>
</div>
</div>
<section class="section3">
<div class="fullcol">
One big reason he chose Kubernetes was getting a level of abstraction that would enable the company to “enable a hybrid cloud strategy including multiple public clouds and an on-premise VMware multi data center environment to meet the needs of the business,” he says. Another factor was how much the Kubernetes ecosystem has matured over the past couple of years. “We have spent a lot of time and energy around making logging, monitoring and alerting production ready to give us insights into applications’ well-being,” says Linder. The team has added <a href="https://prometheus.io/">Prometheus</a> for monitoring and <a href="https://github.com/jaegertracing/jaeger">Jaeger</a> for tracing, to work alongside the company’s existing tool sets: Zenoss, New Relic and ELK.<br><br>
With the emphasis on common tooling, “We are getting to the place where we can one-click deploy an entire data center – the compute, network, Kubernetes, logging, monitoring and all the apps,” says Linder. “We have really enabled a platform thinking based approach to allowing applications to consume common tools and services. A new application can be onboarded in about an hour using common tooling and CI/CD processes. The gains on that side have been huge. Before, it took at least a few days to get things sorted for a new application to deploy. That does not consider the training of our operations staff to manage this new application. It is two or three orders of magnitude of savings in time and cost, and operationally it has given us the opportunity to let a core team of talented operations engineers manage common infrastructure and tooling to make our applications available at web scale.”<br><br>
</div>
</section>
<div class="banner4" style="background-image: url('/images/CaseStudy_slingtv_banner4.jpg')">
<div class="banner4text">
“So far it’s been good,” he adds, “especially the elasticity around how we can configure our Jenkins workloads on a shared Kubernetes cluster. That is the win we were pushing for.” <span style="font-size:14px;letter-spacing:0.12em;padding-top:20px;text-transform:uppercase;line-height:14px"><br><br>— Brad Linder, Cloud Native & Big Data Evangelist for Sling TV</span>
</div>
</div>

<section class="section5" style="padding:0px !important">
<div class="fullcol">
The team launched its first applications on Kubernetes in Sling TV’s two internal data centers in the early part of Q1 2018 and began to enable AWS as a data center option. The company plans to expand into other public clouds in the future.<br><br>
The first application that went into production is a WebSocket-based back-end notification service. “It allows back-end changes to trigger messages to our clients in the field without the polling,” says Linder. “We are talking about very high volumes of messages with this application. Without something like Kubernetes to be able to scale up and down, as well as just support that overall workload, that is pretty hard to do. I would almost be so bold as to say that most of these applications that we are building now would not have been possible without the cloud native patterns and the flexibility that Kubernetes enables.”<br><br>
Linder oversees three teams working together on building the next-generation platform: a platform engineering team; an enterprise middleware services team; and a big data and analytics team. “We have really tried to bring everything together to be able to have a client application interact with a cloud native middleware layer. That middleware layer must run on a platform, consume platform services and then have logs and events monitored by an artificial agent to keep things running smoothly,” says Linder.
</div>
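The push-not-poll design Linder describes is worth making concrete. Below is a toy sketch of the pattern in Python, not Sling TV's actual service: connected clients register once, and a back-end change is fanned out to them immediately instead of being discovered by polling. All names are illustrative.

```python
import asyncio

# Each connected client registers a queue; a back-end change is pushed to
# every queue instead of waiting for clients to poll for it.
subscribers = set()

async def subscribe():
    queue = asyncio.Queue()
    subscribers.add(queue)
    return queue

async def publish(event):
    for queue in subscribers:
        await queue.put(event)  # fan the change out to every client

async def client(name):
    queue = await subscribe()
    # Blocks until a message is pushed -- there is no polling loop anywhere.
    print(name, "received:", await queue.get())

async def main():
    clients = [asyncio.create_task(client(f"device-{i}")) for i in range(3)]
    await asyncio.sleep(0)  # let the clients subscribe
    await publish("channel-lineup-changed")
    await asyncio.gather(*clients)

asyncio.run(main())
```

At very high message volumes, the quote's point is that a stateless fan-out tier like this is exactly the kind of workload Kubernetes can scale out and back in as connection counts change.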
<div class="banner5">
<div class="banner5text">
This undertaking is about “trying to marry Kubernetes with AI to enable web scale that just works.” <span style="font-size:14px;letter-spacing:0.12em;padding-top:20px;text-transform:uppercase;line-height:14px"><br><br>— BRAD LINDER, CLOUD NATIVE & BIG DATA EVANGELIST FOR SLING TV </span>
</div>
</div>

<div class="fullcol">
Ultimately, this undertaking is about “trying to marry Kubernetes with AI to enable web scale that just works,” he adds. “We want the artificial agents and the big data platform using the actual logs and events coming out of the applications, Kubernetes, the infrastructure, backing services and changes to the environment to make decisions like, ‘Hey, we need more capacity for this service, so please add more nodes.’ From a platform perspective, if you are truly doing web scale stuff and you are not using AI and big data, in my opinion, you are going to implode under your own weight. It is not a question of if, it is when. If you are in a ‘millions of users’ sort of environment, that implosion is going to be catastrophic. We are on our way to this goal and have learned a lot along the way.”<br><br>
For Sling TV, moving to cloud native has been exactly what they needed. “We have to be able to react to changes and hiccups in the matrix,” says Linder. “It is the foundation for our ability to deliver a high-quality service for our customers. Building intelligent platforms, tools and clients in the field consuming those services has got to be part of all of this. In my eyes that is a big part of what cloud native is all about. It is taking these distributed, potentially unreliable entities and enabling a robust customer experience they expect.”
</div>

</section>
@@ -0,0 +1,107 @@
---
title: Workiva Case Study
case_study_styles: true
cid: caseStudies
css: /css/style_case_studies.css
---

<div class="banner1 desktop" style="background-image: url('/images/CaseStudy_workiva_banner1.jpg')">
<h1> CASE STUDY:<img src="/images/workiva_logo.png" style="margin-bottom:0%" class="header_logo"><br> <div class="subhead">Using OpenTracing to Help Pinpoint the Bottlenecks
</div></h1>
</div>

<div class="details">
Company <b>Workiva</b> Location <b>Ames, Iowa</b> Industry <b>Enterprise Software</b>
</div>

<hr>
<section class="section1">
<div class="cols">
<div class="col1">
<h2>Challenge</h2>
<a href="https://www.workiva.com/">Workiva</a> offers a cloud-based platform for managing and reporting business data. This SaaS product, Wdesk, is used by more than 70 percent of the Fortune 500 companies. As the company made the shift from a monolith to a more distributed, microservice-based system, "We had a number of people working on this, all on different teams, so we needed to identify what the issues were and where the bottlenecks were," says Senior Software Architect MacLeod Broad. With back-end code running on Google App Engine and Google Compute Engine, as well as Amazon Web Services, Workiva needed a tracing system that was agnostic of platform. While preparing one of the company’s first products utilizing AWS, which involved a "sync and link" feature that linked data from spreadsheets built in the new application with documents created in the old application on Workiva’s existing system, Broad’s team found an ideal use case for tracing: There were circular dependencies, and optimizations often turned out to be micro-optimizations that didn’t impact overall speed.
<br>
</div>

<div class="col2">
<h2>Solution</h2>
Broad’s team introduced the platform-agnostic distributed tracing system OpenTracing to help them pinpoint the bottlenecks.
<br>
<h2>Impact</h2>
Now used throughout the company, OpenTracing produced immediate results. Software Engineer Michael Davis reports: "Tracing has given us immediate, actionable insight into how to improve our service. Through a combination of seeing where each call spends its time, as well as which calls are most often used, we were able to reduce our average response time by 95 percent (from 600ms to 30ms) in a single fix."
</div>

</div>
</section>
<div class="banner2">
<div class="banner2text">
"With OpenTracing, my team was able to look at a trace and make optimization suggestions to another team without ever looking at their code." <span style="font-size:14px;letter-spacing:0.12em;padding-top:20px;text-transform:uppercase"><br>— MacLeod Broad, Senior Software Architect at Workiva</span>
</div>
</div>
<section class="section2">
<div class="fullcol">
<h2>Last fall, MacLeod Broad’s platform team at Workiva was prepping one of the company’s first products utilizing <a href="https://aws.amazon.com/">Amazon Web Services</a> when they ran into a roadblock.</h2>
Early on, Workiva’s back end had run mostly on <a href="https://cloud.google.com/appengine/">Google App Engine</a>. But things changed along the way as Workiva’s SaaS offering, <a href="https://www.workiva.com/wdesk">Wdesk</a>, a cloud-based platform for managing and reporting business data, grew its customer base to more than 70 percent of the Fortune 500 companies. "As customer needs grew and the product offering expanded, we started to leverage a wider offering of services such as Amazon Web Services as well as other Google Cloud Platform services, creating a multi-vendor environment."<br><br>
With this new product, there was a "sync and link" feature by which data "went through a whole host of services starting with the new spreadsheet system [<a href="https://aws.amazon.com/rds/aurora/">Amazon Aurora</a>] into what we called our linking system, and then pushed through HTTP to our existing system, and then a number of calculations would go on, and the results would be transmitted back into the new system," says Broad. "We were trying to optimize that for speed. We thought we had made this great optimization and then it would turn out to be a micro-optimization, which didn’t really affect the overall speed of things."<br><br>
The challenges faced by Broad’s team may sound familiar to other companies that have also made the shift from monoliths to more distributed, microservice-based systems. "We had a number of people working on this, all on different teams, so it was difficult to get our head around what the issues were and where the bottlenecks were," says Broad.<br><br>
"Each service team was going through different iterations of their architecture and it was very hard to follow what was actually going on in each team’s system," he adds. "We had circular dependencies where we’d have three or four different service teams unsure of where the issues really were, requiring a lot of back and forth communication. So we wasted a lot of time saying, ‘What part of this is slow? Which part of this is sometimes slow depending on the use case? Which part is degrading over time? Which part of this process is asynchronous so it doesn’t really matter if it’s long-running or not? What are we doing that’s redundant, and which part of this is buggy?’"
</div>
</section>
<div class="banner3" style="background-image: url('/images/CaseStudy_workiva_banner3.jpg')">
<div class="banner3text">
"A tracing system can at a glance explain an architecture, narrow down a performance bottleneck and zero in on it, and generally just help direct an investigation at a high level. Being able to do that at a glance is much faster than at a meeting or with three days of debugging, and it’s a lot faster than never figuring out the problem and just moving on."<span style="font-size:14px;letter-spacing:0.12em;padding-top:20px;text-transform:uppercase"><br>— MACLEOD BROAD, SENIOR SOFTWARE ARCHITECT AT WORKIVA</span>
</div>
</div>
<section class="section3">
<div class="fullcol">
Simply put, it was an ideal use case for tracing. "A tracing system can at a glance explain an architecture, narrow down a performance bottleneck and zero in on it, and generally just help direct an investigation at a high level," says Broad. "Being able to do that at a glance is much faster than at a meeting or with three days of debugging, and it’s a lot faster than never figuring out the problem and just moving on."<br><br>
With Workiva’s back-end code running on <a href="https://cloud.google.com/compute/">Google Compute Engine</a> as well as App Engine and AWS, Broad knew that he needed a tracing system that was platform agnostic. "We were looking at different tracing solutions," he says, "and we decided that because it seemed to be a very evolving market, we didn’t want to get stuck with one vendor. So OpenTracing seemed like the cleanest way to avoid vendor lock-in on what backend we actually had to use."<br><br>
Once they introduced OpenTracing into this first use case, Broad says, "The trace made it super obvious where the bottlenecks were." Even though everyone had assumed it was Workiva’s existing code that was slowing things down, that wasn’t exactly the case. "It looked like the existing code was slow only because it was reaching out to our next-generation services, and they were taking a very long time to service all those requests," says Broad. "On the waterfall graph you can see the exact same work being done on every request when it was calling back in. So every service request would look the exact same for every response being paged out. And then it was just a no-brainer of, ‘Why is it doing all this work again?’"<br><br>
Using the insight OpenTracing gave them, "My team was able to look at a trace and make optimization suggestions to another team without ever looking at their code," says Broad. "The way we named our traces gave us insight whether it’s doing a SQL call or it’s making an RPC. And so it was really easy to say, ‘OK, we know that it’s going to page through all these requests. Do the work once and stuff it in cache.’ And we were done basically. All those calls became sub-second calls immediately."<br><br>
</div>
</section>
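The two ideas in Broad's account, instrumentation that stays vendor neutral and span names that encode whether a step is a SQL call or an RPC, can be sketched with the OpenTracing Python API. This is an illustrative sketch, not Workiva's code; the decorator and operation names are hypothetical.

```python
import functools
import opentracing

# global_tracer() returns whatever concrete tracer was registered at startup
# (Jaeger, LightStep, ...); with none registered it is a no-op tracer. That
# indirection is what keeps the instrumentation free of vendor lock-in.
tracer = opentracing.global_tracer()

def traced(operation_name):
    """Wrap a function in a span; a 'sql.' or 'rpc.' prefix in the name makes
    the kind of work readable straight off the trace waterfall."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            with tracer.start_active_span(operation_name) as scope:
                scope.span.set_tag("kind", operation_name.split(".", 1)[0])
                return fn(*args, **kwargs)
        return wrapper
    return decorator

@traced("rpc.legacy_calculation")
def run_calculation(page):
    ...  # call back into the existing system

@traced("sql.load_links")
def load_links(doc_id):
    ...  # query the linking tables
```

The fix that followed, "do the work once and stuff it in cache," is then ordinary memoization of whichever rpc-named call the waterfall showed repeating on every paged request.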
<div class="banner4" style="background-image: url('/images/CaseStudy_workiva_banner4.jpg')">
<div class="banner4text">
"We were looking at different tracing solutions and we decided that because it seemed to be a very evolving market, we didn’t want to get stuck with one vendor. So OpenTracing seemed like the cleanest way to avoid vendor lock-in on what backend we actually had to use." <span style="font-size:14px;letter-spacing:0.12em;padding-top:20px;text-transform:uppercase"><br>— MACLEOD BROAD, SENIOR SOFTWARE ARCHITECT AT WORKIVA</span>
</div>
</div>

<section class="section5" style="padding:0px !important">
<div class="fullcol">
After the success of the first use case, everyone involved in the trial went back and fully instrumented their products. Tracing was added to a few more use cases. "We wanted to get through the initial implementation pains early without bringing the whole department along for the ride," says Broad. "Now, a lot of teams add it when they’re starting up a new service. We’re really pushing adoption now more than we were before."<br><br>
Some teams were won over quickly. "Tracing has given us immediate, actionable insight into how to improve our [Workspaces] service," says Software Engineer Michael Davis. "Through a combination of seeing where each call spends its time, as well as which calls are most often used, we were able to reduce our average response time by 95 percent (from 600ms to 30ms) in a single fix."<br><br>
Most of Workiva’s major products are now traced using OpenTracing, with data pushed into <a href="https://cloud.google.com/stackdriver/">Google StackDriver</a>. Even the products that aren’t fully traced have some components and libraries that are.<br><br>
Broad points out that because some of the engineers were working on App Engine and already had experience with the platform’s Appstats library for profiling performance, it didn’t take much to get them used to using OpenTracing. But others were a little more reluctant. "The biggest hindrance to adoption I think has been the concern about how much latency is introducing tracing [and StackDriver] going to cost," he says. "People are also very concerned about adding middleware to whatever they’re working on. Questions about passing the context around and how that’s done were common. A lot of our Go developers were fine with it, because they were already doing that in one form or another. Our Java developers were not super keen on doing that because they’d used other systems that didn’t require that."<br><br>
But the benefits clearly outweighed the concerns, and today, Workiva’s official policy is to use tracing.<br><br>
In fact, Broad believes that tracing naturally fits in with Workiva’s existing logging and metrics systems. "This was the way we presented it internally, and also the way we designed our use," he says. "Our traces are logged in the exact same mechanism as our app metric and logging data, and they get pushed the exact same way. So we treat all that data exactly the same when it’s being created and when it’s being recorded. We have one internal library that we use for logging, telemetry, analytics and tracing."
</div>
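A minimal sketch of that single-library idea, assuming the OpenTracing API (the function and names here are hypothetical, not Workiva's internal library): one entry point hands the same record to the active span and to the logger, so every signal is created and shipped through one path.

```python
import logging
import opentracing

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("app")  # hypothetical logger name
tracer = opentracing.global_tracer()

def record(event, **fields):
    # Attach the record to the current trace, if one is active...
    span = tracer.active_span
    if span is not None:
        span.log_kv({"event": event, **fields})
    # ...and emit the identical record through normal logging.
    log.info("%s %s", event, fields)

# Usage: inside a traced operation, one call feeds both systems.
with tracer.start_active_span("rpc.save_document"):
    record("cache.miss", key="doc:42")
```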

<div class="banner5">
<div class="banner5text">
"Tracing has given us immediate, actionable insight into how to improve our [Workspaces] service. Through a combination of seeing where each call spends its time, as well as which calls are most often used, we were able to reduce our average response time by 95 percent (from 600ms to 30ms) in a single fix." <span style="font-size:14px;letter-spacing:0.12em;padding-top:20px;text-transform:uppercase"><br>— Michael Davis, Software Engineer, Workiva </span>
</div>
</div>

<div class="fullcol">
For Workiva, OpenTracing has become an essential tool for zeroing in on optimizations and determining what’s actually a micro-optimization by observing usage patterns. "On some projects we often assume what the customer is doing, and we optimize for these crazy scale cases that we hit 1 percent of the time," says Broad. "It’s been really helpful to be able to say, ‘OK, we’re adding 100 milliseconds on every request that does X, and we only need to add that 100 milliseconds if it’s the worst of the worst case, which only happens one out of a thousand requests or one out of a million requests.’"<br><br>
Unlike many other companies, Workiva also traces the client side. "For us, the user experience is important—it doesn’t matter if the RPC takes 100 milliseconds if it still takes 5 seconds to do the rendering to show it in the browser," says Broad. "So for us, those client times are important. We trace it to see what parts of loading take a long time. We’re in the middle of working on a definition of what is ‘loaded.’ Is it when you have it, or when it’s rendered, or when you can interact with it? Those are things we’re planning to use tracing for to keep an eye on and to better understand."<br><br>
That also requires adjusting for differences in external and internal clocks. "Before time correcting, it was horrible; our traces were more misleading than anything," says Broad. "So we decided that we would return a timestamp on the response headers, and then have the client reorient its time based on that—not change its internal clock but just calculate the offset on the response time to when the client got it. And if you end up in an impossible situation where a client RPC spans 210 milliseconds but the time on the response time is outside of that window, then we have to reorient that."<br><br>
Broad is excited about the impact OpenTracing has already had on the company, and is also looking ahead to what else the technology can enable. One possibility is using tracing to update documentation in real time. "Keeping documentation up to date with reality is a big challenge," he says. "Say, we just ran a trace simulation or we just ran a smoke test on this new deploy, and the architecture doesn’t match the documentation. We can find whose responsibility it is and let them know and have them update it. That’s one of the places I’d like to get in the future with tracing."
</div>
</section>
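The time correction Broad describes reduces to an offset computed from the server timestamp carried on the response headers. A hedged sketch, with illustrative names, assuming the server stamps the response near the midpoint of the RPC:

```python
def reoriented_span(send_time, receive_time, server_timestamp):
    """Re-anchor a client-side span to server time without touching the
    client's internal clock.

    send_time / receive_time: client wall-clock seconds for the RPC.
    server_timestamp: the time the server put on its response headers.
    """
    midpoint = (send_time + receive_time) / 2.0
    offset = server_timestamp - midpoint  # estimated clock skew
    return send_time + offset, receive_time + offset

# A client RPC spanning 210 ms whose server timestamp falls outside that
# window is the "impossible situation" that forces a fresh reorientation.
start, end = reoriented_span(100.000, 100.210, 101.050)
print(start, end)  # the duration is preserved; only the anchor moves
```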
@@ -744,13 +744,13 @@ html.search #docsContent h1 { margin-bottom: 0; border-bottom: 0; padding-bottom
 #home #caseStudiesWrapper div { position: relative; display: inline-block; vertical-align: top; width: 100%; min-height: 230px; padding: 125px 10px 15px; margin-bottom: 30px; background-position: top center; background-repeat: no-repeat; }

-#home #caseStudiesWrapper div:nth-child(1) { background-image: url(/images/community_logos/pearson_logo.png); }
+#home #caseStudiesWrapper div:nth-child(1) { background-image: url(/images/community_logos/slingtv_logo.png); }

-#home #caseStudiesWrapper div:nth-child(2) { background-image: url(/images/community_logos/box_logo.png); }
+#home #caseStudiesWrapper div:nth-child(2) { background-image: url(/images/community_logos/workiva_logo.png); }

-#home #caseStudiesWrapper div:nth-child(3) { background-image: url(/images/community_logos/ebay_logo.png); }
+#home #caseStudiesWrapper div:nth-child(3) { background-image: url(/images/community_logos/pinterest_logo.png); }

-#home #caseStudiesWrapper div:nth-child(4) { background-image: url(/images/community_logos/wikimedia_foundation_logo.png); }
+#home #caseStudiesWrapper div:nth-child(4) { background-image: url(/images/community_logos/pearson_logo.png); }

 #home #caseStudiesWrapper p { font-size: 20px; }
[Binary files: 12 new images added (case study banners and logos), ranging from 4.1 KiB to 318 KiB.]