Bringing it all Together: An Evaluation of Service Mesh Solutions

A presentation at Philly DevOps Meetup in September 2020 by Melissa McKay

Slide 1

Bringing it all together: An evaluation of service mesh solutions
@melissajmckay

Slide 2

Melissa McKay
● Mom
● Developer
● Developer Advocate @ JFrog
● Self-appointed UNConference Advocate and Promoter:
  JCrete (http://www.jcrete.org/)
  JOnsen (http://jonsen.jp/)
  JSpirit (https://jspirit.org)
  JAlba (https://jalba.scot)
  LavaOne/UnVoxxed Hawaii (https://voxxeddays.com/hawaii/)

Slide 3

Simple Rules
To maximize what you can get out of the unconference, simple rules apply. Wikipedia summarizes them nicely:
1. Whoever shows up are the right people …reminds participants that they don’t need the CEO and 100 people to get something done, you need people who care. And, absent the direction or control exerted in a traditional meeting, that’s who shows up in the various breakout sessions of an Open Space meeting.
2. Whenever it starts is the right time …reminds participants that spirit and creativity do not run on the clock.
3. Whatever happens is the only thing that could have …reminds participants that once something has happened, it’s done—and no amount of fretting, complaining or otherwise rehashing can change that. Move on.
4. Wherever it happens is the right place …reminds participants that space is opening everywhere all the time. Please be conscious and aware.
5. When it’s over, it’s over …reminds participants that we never know how long it will take to resolve an issue, once raised, but that whenever the issue or work or conversation is finished, move on to the next thing. Don’t keep rehashing just because there’s 30 minutes left in the session. Do the work, not the time.
http://www.jcrete.org/what-is-an-unconference_/

Slide 4

What am I going to get out of this?
● I can talk to someone else about a service mesh
● I can understand someone who talks to me about a service mesh
● I know whether or not a service mesh is something I would get value from
● I know where to get a service mesh
● I am aware of some differences between service mesh offerings

Slide 5

How did I get here?
Some history…
● A misbehaving service
● Missing SLAs
[Diagram: /endpoint X 100, /endpoint X 1000]

Slide 6

How did I get here?
Some history…
● A misbehaving service
● Missing SLAs
● Replicating the service under a load balancer did NOT solve the problem!
[Diagram: /endpoint X 1000]

Slide 7

How did I get here?
Some history…
● A misbehaving service
● Missing SLAs
● Replicating the service under a load balancer did NOT solve the problem!
● Analyzed, determined the issue & did some research on the best way to solve it…
[Diagram: /endpoint X 10, /endpoint X 100]

Slide 8

Architectural Solution
[Diagram: /endpoint X 1000; Request Type A; Request Type B]

Slide 9

Architectural Solution
[Diagram: /endpoint X 1000; ?; Request Type A; Request Type B]

Slide 10

An API Gateway?
● Filter options for routing
● Enabled rolling upgrades
● Enabled traffic shifting & rate limiting
● Ability to canary nodes
● Blue-green deployment

Slide 11

An API Gateway?
● Filter options for routing
● Enabled rolling upgrades
● Enabled traffic shifting & rate limiting
● Ability to canary nodes
● Blue-green deployment
But what’s this?

Slide 12

Istio… I’ve heard of that!

Slide 13

How is it different?
“They can both handle service discovery, request routing, authentication, rate limiting, and monitoring, but there are differences in architectures and intentions. A service mesh’s primary purpose is to manage internal service-to-service communication, while an API Gateway is primarily meant for external client-to-service communication.”
https://dzone.com/articles/api-gateway-vs-service-mesh

Slide 14

How is it different? What IS this?
“They can both handle service discovery, request routing, authentication, rate limiting, and monitoring, but there are differences in architectures and intentions. A service mesh’s primary purpose is to manage internal service-to-service communication, while an API Gateway is primarily meant for external client-to-service communication.”
https://dzone.com/articles/api-gateway-vs-service-mesh

Slide 15

A service mesh is a dedicated infrastructure layer that controls service-to-service communication over a network.
https://searchitoperations.techtarget.com/definition/service-mesh - Margaret Rouse, Alex Gillis, WhatIs.com (January, 2019)

Slide 16

A service mesh is a configurable, low‑latency infrastructure layer designed to handle a high volume of network‑based interprocess communication among application infrastructure services using application programming interfaces (APIs).
https://www.nginx.com/blog/what-is-a-service-mesh/ - Floyd Smith & Owen Garrett, NGINX (April, 2018)

Slide 17

tl;dr: A service mesh is a dedicated infrastructure layer for making service-to-service communication safe, fast, and reliable.
https://buoyant.io/2017/04/25/whats-a-service-mesh-and-why-do-i-need-one/ - William Morgan, Buoyant (April, 2017)

Slide 18

A service mesh brings security, resiliency, and visibility to service communications, so developers don’t have to
https://www.infoworld.com/article/3402260/what-is-a-service-mesh-service-mesh-explained.html - Josh Fruhlinger, InfoWorld (July, 2019)

Slide 19

The term service mesh is used to describe the network of microservices that make up such applications and the interactions between them.
https://istio.io/docs/concepts/what-is-istio/ - Istio (accessed September, 2019)

Slide 20

So what is a service mesh, really?
A service mesh is a separately managed distributed system that handles common functions that services require and would normally implement themselves, but that do not concern the business logic of the service itself.
A real-life metaphor… the human circulatory system?
https://en.wikipedia.org/wiki/Circulatory_system

Slide 21

Service Mesh Capabilities… and more coming!
● Service Discovery
● Observability
● Rate Limiting
● Circuit Breaking
● Traffic Shifting
● Load Balancing
● Authorization/Authentication
● Distributed Tracing

Slide 22

Service Mesh Capabilities… and more coming!
● Service Discovery
● Observability
● Rate Limiting
● Circuit Breaking
● Traffic Shifting
● Load Balancing
● Authorization/Authentication
● Distributed Tracing

Slide 23

Service Mesh Capabilities… and more coming!
● Service Discovery
● Observability
● Rate Limiting
● Circuit Breaking
● Traffic Shifting
● Load Balancing
● Authorization/Authentication
● Distributed Tracing

Slide 24

Service Mesh Capabilities… and more coming!
● Service Discovery
● Observability
● Rate Limiting
● Circuit Breaking
● Traffic Shifting
● Load Balancing
● Authorization/Authentication
● Distributed Tracing
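
As a rough illustration of rate limiting from the list above, here is a minimal token-bucket sketch in Python. It is not how any particular mesh implements the feature. The class name `TokenBucket`, its parameters, and the explicit `now` argument (passed in so the example is deterministic) are all assumptions made for this example.

```python
class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to the elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical traffic: three requests at t=0, then one more a second later.
bucket = TokenBucket(rate=1.0, capacity=2)
decisions = [bucket.allow(now=t) for t in (0.0, 0.0, 0.0, 1.0)]
print(decisions)
```

In a mesh, a check like this would live in the proxy rather than in each service, which is the point of the slide: the service code stays unaware of the limit.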

Slide 25

Service Mesh Capabilities… and more coming!
● Service Discovery
● Observability
● Rate Limiting
● Circuit Breaking
● Traffic Shifting
● Load Balancing
● Authorization/Authentication
● Distributed Tracing
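
To make circuit breaking from the list above a little more concrete, here is a minimal sketch, assuming a simple "open after N consecutive failures, retry after a cool-down" policy. The names `CircuitBreaker`, `CircuitOpenError`, `max_failures`, and `reset_after` are hypothetical, and real mesh proxies use their own (often richer) policies.

```python
class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the upstream service."""


class CircuitBreaker:
    def __init__(self, max_failures: int, reset_after: float):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, now: float):
        if self.opened_at is not None:
            if now - self.opened_at < self.reset_after:
                raise CircuitOpenError("upstream marked unhealthy")
            # Cool-down elapsed: allow a trial request (half-open state).
            # One more failure will reopen the circuit immediately.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = now
            raise
        self.failures = 0  # any success resets the failure count
        return result


def flaky():
    raise ConnectionError("upstream timed out")


def healthy():
    return "ok"


breaker = CircuitBreaker(max_failures=2, reset_after=10.0)
events = []
for t, func in [(0.0, flaky), (1.0, flaky), (2.0, healthy), (12.0, healthy)]:
    try:
        events.append(breaker.call(func, now=t))
    except CircuitOpenError:
        events.append("rejected")
    except ConnectionError:
        events.append("failed")
print(events)
```

The request at t=2 is rejected without touching the upstream at all, which is what protects a struggling service from being hammered while it recovers.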

Slide 26

Service Mesh Capabilities… and more coming!
● Service Discovery
● Observability
● Rate Limiting
● Circuit Breaking
● Traffic Shifting
● Load Balancing
● Authorization/Authentication
● Distributed Tracing
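
Traffic shifting from the list above (for example, sending a small share of requests to a canary) can be sketched as a weighted choice between versions. This is only an illustration under assumptions: the function `pick_version`, the version labels, and the 90/10 weights are made up for the example, and hashing the request id is just one way to keep the choice stable.

```python
import hashlib
from collections import Counter
from typing import Mapping


def pick_version(request_id: str, weights: Mapping[str, int]) -> str:
    """Pick a version so each one receives roughly its share of the total weight.

    Weights are assumed to be non-negative integers with a positive sum.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Hash the request id so the same request always maps to the same bucket.
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    for version, weight in weights.items():
        if bucket < weight:
            return version
        bucket -= weight
    raise ValueError("weights must be non-negative")  # not reached for valid input


# Hypothetical canary rollout: about 10% of requests go to v2.
weights = {"v1": 90, "v2": 10}
counts = Counter(pick_version(f"req-{i}", weights) for i in range(10_000))
print(counts)
```

Shifting more traffic to the new version is then a configuration change (say, 50/50, then 0/100) rather than a redeploy.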

Slide 27

Service Mesh Capabilities… and more coming!
● Service Discovery
● Observability
● Rate Limiting
● Circuit Breaking
● Traffic Shifting
● Load Balancing
● Authorization/Authentication
● Distributed Tracing

Slide 28

Service Mesh Capabilities… and more coming!
● Service Discovery
● Observability
● Rate Limiting
● Circuit Breaking
● Traffic Shifting
● Load Balancing
● Authorization/Authentication
● Distributed Tracing

Slide 29

Service Mesh Capabilities… and more coming!
● Service Discovery
● Observability
● Rate Limiting
● Circuit Breaking
● Traffic Shifting
● Load Balancing
● Authorization/Authentication
● Distributed Tracing

Slide 30

Service Mesh Capabilities… and more coming!
● Service Discovery
● Observability
● Rate Limiting
● Circuit Breaking
● Traffic Shifting
● Load Balancing
● Authorization/Authentication
● Distributed Tracing
… legitimate solutions to all kinds of problems!

Slide 31

Why isn’t everyone using a service mesh???
“If you’re wondering about service mesh, you don’t need one. Period. If you’ve reached the scale and microservice maturity level that requires a service mesh, you will be actively — perhaps desperately — searching for a solution and it will be abundantly obvious that a service mesh is necessary.”
https://thenewstack.io/primer-the-who-what-and-why-of-service-mesh/ - Emily Omier (May, 2019)
“I think we have a tendency to chase the shiny object, in the sense that X company does Y, therefore I must do Y, even though I don’t have any of X company’s problems.” - Matt Klein

Slide 32

Service Mesh Capabilities… and more coming!
● Service Discovery
● Observability
● Rate Limiting
● Circuit Breaking
● Traffic Shifting
● Load Balancing
● Authorization/Authentication
● Distributed Tracing
… legitimate solutions to all kinds of problems!

Slide 33

Back up… what problem are we trying to solve?
Matt Klein - Lyft engineer who started Envoy Proxy
Podcast: https://softwareengineeringdaily.com/2017/02/14/service-proxying-with-matt-klein/

Slide 34

Biggest Problem?: I vote for Observability & Reliability
● Large numbers of services
● Diverse/Polyglot
● Different Communication Protocols
● Service/language specific libraries
● No standardization on logging or stats

Slide 35

How is a service mesh implemented?
DATA PLANE
This is the part that touches every request in the system. Sidecar proxies.
● All service communication (ingress & egress) routed through proxies
● The proxy acts as a gateway to the service
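
The data plane idea above can be sketched in a few lines: every request to a service passes through a proxy, so the proxy can collect stats while the service contains only business logic. This is a deliberately simplified, in-process illustration. Real sidecars run as separate processes and intercept network traffic, and the names `SidecarProxy` and `orders_service` are invented for the example.

```python
import time
from collections import Counter


class SidecarProxy:
    """Sits in front of one service and sees every request sent to it."""

    def __init__(self, service_name, handler):
        self.service_name = service_name
        self.handler = handler
        self.stats = Counter()
        self.total_seconds = 0.0

    def handle(self, request):
        start = time.perf_counter()
        try:
            response = self.handler(request)
            self.stats["success"] += 1
            return response
        except Exception:
            self.stats["failure"] += 1
            raise
        finally:
            # Runs for both outcomes, so every request is counted and timed.
            self.stats["requests"] += 1
            self.total_seconds += time.perf_counter() - start


def orders_service(request):
    # Business logic only: no metrics, logging, or retry code in here.
    if request.get("item") is None:
        raise ValueError("missing item")
    return {"status": "created", "item": request["item"]}


proxy = SidecarProxy("orders", orders_service)
proxy.handle({"item": "book"})
try:
    proxy.handle({})
except ValueError:
    pass
print(proxy.service_name, dict(proxy.stats))
```

Because the counting happens in the proxy, services written in different languages would all report the same stats in the same way, which speaks to the "no standardization on logging or stats" problem raised earlier in the talk.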

Slide 36

How is a service mesh implemented?
CONTROL PLANE
This is the part that manages the data plane, providing the proxies with the data and configuration needed by the system.
A UI, CLI, or some other interface lets an operator change configuration settings.
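
The relationship described above, where an operator changes a setting once and the control plane hands it to every proxy, can be sketched as follows. This is a toy model under assumptions: the classes `ControlPlane` and `Proxy`, the setting names, and the push-on-every-change behavior are all invented for illustration and do not mirror any specific mesh's API.

```python
class Proxy:
    """Stand-in for one data plane proxy."""

    def __init__(self, name):
        self.name = name
        self.config = {}

    def apply(self, config):
        # Keep a private copy so the proxy only changes when it is told to.
        self.config = dict(config)


class ControlPlane:
    """Holds the desired configuration and distributes it to registered proxies."""

    def __init__(self):
        self.proxies = []
        self.config = {}

    def register(self, proxy):
        self.proxies.append(proxy)
        proxy.apply(self.config)  # new proxies receive the current config

    def set(self, key, value):
        # What an operator's UI or CLI action would ultimately trigger.
        self.config[key] = value
        for proxy in self.proxies:
            proxy.apply(self.config)


control = ControlPlane()
orders = Proxy("orders-sidecar")
control.register(orders)
control.set("timeout_seconds", 2)

billing = Proxy("billing-sidecar")
control.register(billing)
control.set("max_retries", 3)

print(orders.config)
print(billing.config)
```

Note that the control plane never handles a request itself. It only changes how the proxies behave.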

Slide 37

I’m ready to try it! Where do I start?
ENVOY: https://www.envoyproxy.io/learn/
LINKERD: https://linkerd.io/2/getting-started/
ISTIO: https://istio.io/docs/setup/install/kubernetes/

Slide 38

Prereqs to play
● Docker
● Docker Compose
● Kubernetes basics
● Helm charts/templates
● Get used to YAML if you aren’t already
Pay attention to versioning, of course!
Definitely up your CPUs to 4 and memory to 8 GiB if you use Docker Desktop.

Slide 39

https://azure.microsoft.com/en-us/resources/kubernetes-learning-path/

Slide 40

Slide 41

Slide 42

Slide 43

Slide 44

Slide 45

Slide 46

ISTIO: https://istio.io/docs/setup/install/kubernetes/
Istio
● Go
● Apache 2.0 license
● Designed for extensibility, but might come at the cost of complexity
● Modular, pluggable
● Supports HTTP 1.1, HTTP2, gRPC, and TCP
● OVER 40 EXAMPLES available to play with different features!!!
● Backed by Google, Red Hat & IBM
● The quick install was easy, but choosing another type of install or configuration felt a little like choose your own adventure.
● Documentation is good, but you can find yourself in a loop if you follow it blindly
● There are a TON of online tutorials, etc.

Slide 47

Istio https://istio.io/docs/concepts/what-is-istio/

Slide 48

LINKERD: https://linkerd.io/2/getting-started/
Linkerd2
● Go/Rust
● Apache 2.0 license
● Supported by the Cloud Native Computing Foundation
● Data & Control Plane tightly integrated (less modular, but smooth)
● VERY easy to install and get up and running
● Several examples are available to try out key features
● Intended for Kubernetes currently
● Supports HTTP 1.1, HTTP2, gRPC, and TCP
● Not as feature rich as Istio, but is a very active project (weekly edge releases, 6-8 week stable releases)
● Does not have Distributed Tracing YET (this is on their roadmap for 2.6 & 2.7 this year)
● Excellent documentation
UPDATE: Feature adds in 2.6 (Oct) - Distributed tracing, traffic shifting (blue/green, canaries), telemetry, retries, timeouts, proxy auto-injection, mTLS on by default for all HTTP

Slide 49

Linkerd2 https://linkerd.io/2/reference/architecture/index.html

Slide 50

https://kinvolk.io/blog/2019/05/performance-benchmark-analysis-of-istio-and-linkerd/
https://github.com/kinvolk/service-mesh-benchmark/issues/5

Slide 51

Quick Compare - Istio AND Linkerd2
● Supports Kubernetes
● Apache 2.0 license
● Side Car Pattern Deployment
● Control Plane written in Go
● Supported Protocols - HTTP1.1, HTTP2, gRPC, TCP
● Similar traffic control & monitoring features
● Helm Chart support

Slide 52

Quick Compare
ISTIO
● Data plane: Envoy (C++), or others (Nginx)
● mTLS support
● Higher performance overhead
● Pluggable/Modular
● Generally more Complex Setup
LINKERD2
● Data plane: Native (Rust)
● Full mTLS support… soon!
● Lower performance overhead
● Opinionated/Tightly Coupled
● Generally Simple Setup

Slide 53

What next? EXPLORE OTHERS!
This space is growing fast and getting a lot of attention - one might presume this means there is a definite need in the market, so it’s definitely worth checking out.
● HashiCorp - Consul Service Mesh
● Google - Anthos Service Mesh

Slide 54

Conclusion
Choose a solution that addresses REAL problems you need to solve for your system.
Consider your developers.
Consider your codebase.
Consider the performance cost.
Evaluate MULTIPLE solutions - don’t simply jump on a bandwagon.
The whole idea of a service mesh is pretty cool!

Slide 55

THANK YOU!