Bringing it all together: An evaluation of service mesh solutions
Bringing it all together… An evaluation of service mesh solutions @melissajmckay
A presentation at Philly DevOps Meetup in September 2020 by Melissa McKay
Melissa McKay
● Mom
● Developer
● Developer Advocate @ JFrog
● Self-appointed UNConference Advocate and Promoter: JCrete (http://www.jcrete.org/), JOnsen (http://jonsen.jp/), JSpirit (https://jspirit.org), JAlba (https://jalba.scot), LavaOne/UnVoxxed Hawaii (https://voxxeddays.com/hawaii/)
Simple Rules
To maximize what you can get out of the unconference, simple rules apply. Wikipedia summarizes them nicely:
1. Whoever shows up are the right people …reminds participants that they don’t need the CEO and 100 people to get something done, you need people who care. And, absent the direction or control exerted in a traditional meeting, that’s who shows up in the various breakout sessions of an Open Space meeting.
2. Whenever it starts is the right time …reminds participants that spirit and creativity do not run on the clock.
3. Whatever happens is the only thing that could have …reminds participants that once something has happened, it’s done, and no amount of fretting, complaining or otherwise rehashing can change that. Move on.
4. Wherever it happens is the right place …reminds participants that space is opening everywhere all the time. Please be conscious and aware.
5. When it’s over, it’s over …reminds participants that we never know how long it will take to resolve an issue, once raised, but that whenever the issue or work or conversation is finished, move on to the next thing. Don’t keep rehashing just because there’s 30 minutes left in the session. Do the work, not the time.
http://www.jcrete.org/what-is-an-unconference_/
What am I going to get out of this?
● I can talk to someone else about a service mesh
● I can understand someone who talks to me about a service mesh
● I know whether or not a service mesh is something I would get value from
● I know where to get a service mesh
● I am aware of some differences between service mesh offerings
How did I get here? Some history…
[Diagram labels: /endpoint, X 10, X 100, X 1000]
● A misbehaving service
● Missing SLAs
● Replicating the service under a load balancer did NOT solve the problem!
● Analyzed, determined the issue & did some research on the best way to solve it…
Architectural Solution
[Diagram labels: /endpoint, X 1000, ?, Request Type A, Request Type B]
An API Gateway?
● Filter options for routing
● Enabled rolling upgrades
● Enabled traffic shifting & rate limiting
● Ability to canary nodes
● Blue-green deployment
But what’s this?
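To make "traffic shifting" and "canary" a little more concrete, here is a minimal Python sketch of weighted routing. The backend names `stable` and `canary`, the 90/10 split, and the `pick_backend` helper are all assumptions made for illustration; they are not taken from any particular gateway or mesh.

```python
import random
from collections import Counter


def pick_backend(weights, rng):
    """Pick a backend name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical split: most requests stay on the stable version,
# a small share is shifted to the canary.
WEIGHTS = {"stable": 90, "canary": 10}

if __name__ == "__main__":
    rng = random.Random(42)  # seeded so repeated runs give the same counts
    counts = Counter(pick_backend(WEIGHTS, rng) for _ in range(10_000))
    print(counts)
```

Raising the canary weight step by step is one way to picture a gradual rollout; setting it to 0 or 100 resembles a blue-green switch.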
Istio… I’ve heard of that!
How is it different? “They can both handle service discovery, request routing, authentication, rate limiting, and monitoring, but there are differences in architectures and intentions. A service mesh’s primary purpose is to manage internal service-to-service communication, while an API Gateway is primarily meant for external client-to-service communication.” https://dzone.com/articles/api-gateway-vs-service-mesh
What IS this?
A service mesh is a dedicated infrastructure layer that controls service-to-service communication over a network. https://searchitoperations.techtarget.com/definition/service-mesh - Margaret Rouse, Alex Gillis, WhatIs.com (January, 2019)
A service mesh is a configurable, low‑latency infrastructure layer designed to handle a high volume of network‑based interprocess communication among application infrastructure services using application programming interfaces (APIs). https://www.nginx.com/blog/what-is-a-service-mesh/ - Floyd Smith & Owen Garrett, NGINX (April, 2018)
tl;dr: A service mesh is a dedicated infrastructure layer for making service-to-service communication safe, fast, and reliable. https://buoyant.io/2017/04/25/whats-a-service-mesh-and-why-do-i-need-one/ - William Morgan, Buoyant (April, 2017)
A service mesh brings security, resiliency, and visibility to service communications, so developers don’t have to https://www.infoworld.com/article/3402260/what-is-a-service-mesh-service-mesh-explained.html - Josh Fruhlinger, InfoWorld (July, 2019)
The term service mesh is used to describe the network of microservices that make up such applications and the interactions between them. https://istio.io/docs/concepts/what-is-istio/ - Istio (accessed September, 2019)
So what is a service mesh, really?
A service mesh is a separately managed distributed system that handles common functions that services require and would normally implement themselves, but that do not concern the business logic of the service itself.
A real-life metaphor… human circulatory system? https://en.wikipedia.org/wiki/Circulatory_system
Service Mesh Capabilities… and more coming!
● Service Discovery
● Observability
● Rate Limiting
● Circuit Breaking
● Traffic Shifting
● Load Balancing
● Authorization/Authentication
● Distributed Tracing
Service Mesh Capabilities… and more coming!
● Service Discovery
● Observability
● Rate Limiting
● Circuit Breaking
● Traffic Shifting
● Load Balancing
● Authorization/Authentication
● Distributed Tracing
… legitimate solutions to all kinds of problems!
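As one example from the list, circuit breaking can be sketched in a few lines of Python. This is a simplified illustration, not how any particular mesh implements it: the `CircuitBreaker` class, its `max_failures` and `reset_after` parameters, and the `CircuitOpenError` exception are hypothetical names chosen for this sketch.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # failures in a row before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable so it can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, call rejected")
            # Cool-down elapsed: allow one trial call (half-open).
            # A single failure now re-opens the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result
```

The point of the pattern is that a struggling service stops receiving calls for a while instead of being hammered by retries. In a mesh this logic would sit in the proxy rather than in each service's code.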
Why isn’t everyone using a service mesh??? “If you’re wondering about service mesh, you don’t need one. Period. If you’ve reached the scale and microservice maturity level that requires a service mesh, you will be actively — perhaps desperately — searching for a solution and it will be abundantly obvious that a service mesh is necessary.” https://thenewstack.io/primer-the-who-what-and-why-of-service-mesh/ - Emily Omier (May, 2019) “I think we have a tendency to chase the shiny object, in the sense that X company does Y, therefore I must do Y, even though I don’t have any of X company’s problems.” - Matt Klein
Back up… what problem are we trying to solve? Matt Klein - Lyft engineer who started Envoy Proxy Podcast: https://softwareengineeringdaily.com/2017/02/14/service-proxying-with-matt-klein/
Biggest Problem?: I vote for Observability & Reliability
● Large numbers of services
● Diverse/Polyglot
● Different Communication Protocols
● Service/language specific libraries
● No standardization on logging or stats
How is a service mesh implemented?
DATA PLANE
This is the part that touches every request in the system. Sidecar proxies.
● All service communication (ingress & egress) routed through proxies
● The proxy acts as a gateway to the service
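The sidecar idea can be pictured with a small Python sketch: every request passes through a proxy object that records stats, while the service itself stays unaware of it. Everything here is a toy assumption for illustration (the `SidecarProxy` class, the `orders_service` function, and the stats keys); a real sidecar is a separate process handling network traffic.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SidecarProxy:
    """Toy stand-in for a sidecar: every call to the service goes through it."""
    service_name: str
    handler: Callable[[str], str]
    stats: dict = field(
        default_factory=lambda: {"requests": 0, "errors": 0, "seconds": 0.0}
    )

    def handle(self, request: str) -> str:
        start = time.perf_counter()
        self.stats["requests"] += 1
        try:
            return self.handler(request)
        except Exception:
            self.stats["errors"] += 1
            raise
        finally:
            # Runs on success and on failure, so latency is always recorded.
            self.stats["seconds"] += time.perf_counter() - start


def orders_service(request: str) -> str:
    # Hypothetical business logic: knows nothing about stats or proxies.
    if request == "bad":
        raise ValueError("cannot process request")
    return f"order accepted: {request}"


proxy = SidecarProxy("orders", orders_service)
```

Because the counting lives in the proxy, services written in different languages would all report the same stats in the same shape, which speaks to the "no standardization on logging or stats" problem.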
How is a service mesh implemented?
CONTROL PLANE
This is the part that manages the data plane, providing its proxies with the data and configuration needed by the system. It exposes a UI, CLI, or some other interface where an operator can set configuration.
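A toy Python sketch of that relationship: the operator changes a setting in one place and the control plane pushes it to every registered proxy. The `ControlPlane` and `Proxy` classes and the setting names `timeout_seconds` and `retries` are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Proxy:
    """Toy data-plane proxy that only holds the config it was given."""
    name: str
    config: dict = field(default_factory=dict)

    def apply(self, config: dict) -> None:
        self.config = dict(config)  # copy so proxies do not share state


@dataclass
class ControlPlane:
    """Toy control plane: one place to set config, pushed to every proxy."""
    proxies: list = field(default_factory=list)
    config: dict = field(default_factory=dict)

    def register(self, proxy: Proxy) -> None:
        self.proxies.append(proxy)
        proxy.apply(self.config)  # new proxies start with the current config

    def set(self, key: str, value) -> None:
        self.config[key] = value
        for proxy in self.proxies:
            proxy.apply(self.config)


control_plane = ControlPlane()
orders, billing = Proxy("orders"), Proxy("billing")
control_plane.register(orders)
control_plane.register(billing)
control_plane.set("timeout_seconds", 2)  # both proxies now see the new value
```

Note that the control plane never touches a request itself; it only tells the proxies how to behave.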
I’m ready to try it! Where do I start?
ENVOY: https://www.envoyproxy.io/learn/
LINKERD: https://linkerd.io/2/getting-started/
ISTIO: https://istio.io/docs/setup/install/kubernetes/
Prereqs to play
● Docker
● Docker Compose
● Kubernetes basics
● Helm charts/templates
● Get used to YAML if you aren’t already
Pay attention to versioning, of course! Definitely up your CPUs to 4 and memory to 8 GiB if you use Docker Desktop.
https://azure.microsoft.com/en-us/resources/kubernetes-learning-path/
ISTIO: https://istio.io/docs/setup/install/kubernetes/
Istio
● Go
● Apache 2.0 license
● Designed for extensibility, but might come at the cost of complexity
● Modular, pluggable
● Supports HTTP 1.1, HTTP2, gRPC, and TCP
● OVER 40 EXAMPLES available to play with different features!!!
● Backed by Google, Red Hat & IBM
● The quick install was easy, but choosing another type of install or configuration felt a little like choose your own adventure.
● Documentation is good, but you can find yourself in a loop if you follow it blindly
● There are a TON of online tutorials, etc.
Istio https://istio.io/docs/concepts/what-is-istio/
LINKERD: https://linkerd.io/2/getting-started/
Linkerd2
● Go/Rust
● Apache 2.0 license
● Supported by Cloud Native Computing Foundation
● Data & Control Plane tightly integrated (less modular, but smooth)
● VERY easy to install and get it up and running
● Several examples are available to try out key features
● Intended for Kubernetes currently
● Supports HTTP 1.1, HTTP2, gRPC, and TCP
● Not as feature rich as Istio, but is a very active project (weekly edge releases, 6-8 week stable releases)
● Does not have Distributed Tracing YET (this is on their roadmap for 2.6 & 2.7 this year)
● Excellent documentation
UPDATE: Feature adds in 2.6 (Oct) - Distributed tracing, traffic shifting (blue/green, canaries), telemetry, retries, timeouts, proxy auto-injection, mTLS on by default for all HTTP
Linkerd2 https://linkerd.io/2/reference/architecture/index.html
https://kinvolk.io/blog/2019/05/performance-benchmark-analysis-of-istio-and-linkerd/ https://github.com/kinvolk/service-mesh-benchmark/issues/5
Quick Compare - Istio AND Linkerd2
● Supports Kubernetes
● Apache 2.0 license
● Sidecar Pattern Deployment
● Control Plane written in Go
● Supported Protocols - HTTP1.1, HTTP2, gRPC, TCP
● Similar traffic control & monitoring features
● Helm Chart support
Quick Compare
ISTIO
● Data plane: Envoy (C++), or others (Nginx)
● mTLS support
● Higher performance overhead
● Pluggable/Modular
● Generally more Complex Setup
LINKERD2
● Data plane: Native (Rust)
● Full mTLS support… soon!
● Lower performance overhead
● Opinionated/Tightly Coupled
● Generally Simple Setup
What next? EXPLORE OTHERS!
This space is growing fast and getting a lot of attention - one might presume this means there is a definite need in the market, so it’s definitely worth checking out.
● HashiCorp - Consul Service Mesh
● Google - Anthos Service Mesh
Conclusion Choose a solution that addresses REAL problems you need to solve for your system. Consider your developers. Consider your codebase. Consider the performance cost. Evaluate MULTIPLE solutions - don’t simply jump on a bandwagon. The whole idea of a service mesh is pretty cool!
THANK YOU!