A Crash Course in Service Mesh Solutions

A presentation at London Microservices Meetup in December 2020 in London, UK by Melissa McKay

Slide 1

A Crash Course in Service Mesh Solutions
A beginner’s guide
@melissajmckay

Slide 2

http://bit.ly/AirPodsLDNMicroservices slides jfrog.com/shownotes

Slide 3

Melissa McKay
● Mom
● Developer
● Developer Advocate @ JFrog
● Self-appointed UNConference Advocate and Promoter:
  JCrete (http://www.jcrete.org/)
  JOnsen (http://jonsen.jp/)
  JSpirit (https://jspirit.org)
  JAlba (https://jalba.scot)
  LavaOne/UnVoxxed Hawaii (https://voxxeddays.com/hawaii/)

Slide 4

Simple Rules
To maximize what you can get out of the unconference, simple rules apply. Wikipedia summarizes them nicely:
1. Whoever shows up are the right people …reminds participants that they don’t need the CEO and 100 people to get something done, you need people who care. And, absent the direction or control exerted in a traditional meeting, that’s who shows up in the various breakout sessions of an Open Space meeting.
2. Whenever it starts is the right time …reminds participants that spirit and creativity do not run on the clock.
3. Whatever happens is the only thing that could have …reminds participants that once something has happened, it’s done, and no amount of fretting, complaining or otherwise rehashing can change that. Move on.
4. Wherever it happens is the right place …reminds participants that space is opening everywhere all the time. Please be conscious and aware.
5. When it’s over, it’s over …reminds participants that we never know how long it will take to resolve an issue, once raised, but that whenever the issue or work or conversation is finished, move on to the next thing. Don’t keep rehashing just because there’s 30 minutes left in the session. Do the work, not the time.
http://www.jcrete.org/what-is-an-unconference_/

Slide 5

What am I going to get out of this?
● Understand the concepts behind a service mesh
● Learn key differentiators between solutions (Linkerd, Istio)
● Develop an educated opinion

Slide 6

How did I get here?
Some history…
● A misbehaving service
● Missing SLAs
[Diagram labels: /endpoint X 100 requests; /endpoint X 1000 requests]

Slide 7

How did I get here?
Some history…
● A misbehaving service
● Missing SLAs
● Replicating the service under a load balancer did NOT solve the problem!
[Diagram labels: /endpoint X 1000 requests]

Slide 8

How did I get here?
Some history…
● A misbehaving service
● Missing SLAs
● Replicating the service under a load balancer did NOT solve the problem!
● Analyzed, determined the issue & did some research on the best way to solve it…
[Diagram labels: /endpoint X 10 requests; /endpoint X 100 requests]

Slide 9

Architectural Solution
[Diagram labels: /endpoint, Request Type A, Request Type B, X 1000]

Slide 10

Architectural Solution
[Diagram labels: /endpoint, X 1000, ?, Request Type A, Request Type B]

Slide 11

An API Gateway?
● Filter options for routing
● Enabled rolling upgrades
● Enabled traffic shifting & rate limiting
● Ability to canary nodes
● Blue-green deployment
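Traffic shifting and canarying both come down to splitting requests between versions by weight. A minimal sketch of that idea, assuming a hypothetical 90/10 split between two made-up backend names (`service-v1`, `service-v2-canary`); a real gateway or mesh does this in its routing layer rather than in application code:

```python
import random
from collections import Counter


def pick_backend(weights, rng):
    """Pick a backend name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical split: most traffic to the stable version, a slice to the canary.
weights = {"service-v1": 90, "service-v2-canary": 10}

rng = random.Random(42)  # seeded so repeated runs behave the same way
counts = Counter(pick_backend(weights, rng) for _ in range(10_000))
print(counts)
```

Shifting more traffic to the canary is then just a matter of changing the weights, which is why the same mechanism covers canaries, blue-green cutovers, and gradual rollouts.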

Slide 12

An API Gateway?
● Filter options for routing
● Enabled rolling upgrades
● Enabled traffic shifting & rate limiting
● Ability to canary nodes
● Blue-green deployment
But what’s this?

Slide 13

Istio… I’ve heard of that!

Slide 14

How is it different?
“They can both handle service discovery, request routing, authentication, rate limiting, and monitoring, but there are differences in architectures and intentions. A service mesh’s primary purpose is to manage internal service-to-service communication, while an API Gateway is primarily meant for external client-to-service communication.”
https://dzone.com/articles/api-gateway-vs-service-mesh

Slide 15

How is it different? What IS this?
“They can both handle service discovery, request routing, authentication, rate limiting, and monitoring, but there are differences in architectures and intentions. A service mesh’s primary purpose is to manage internal service-to-service communication, while an API Gateway is primarily meant for external client-to-service communication.”
https://dzone.com/articles/api-gateway-vs-service-mesh

Slide 16

A service mesh is a dedicated infrastructure layer that controls service-to-service communication over a network.
https://searchitoperations.techtarget.com/definition/service-mesh - Margaret Rouse, Alex Gillis, WhatIs.com (January, 2019)

Slide 17

A service mesh is a configurable, low‑latency infrastructure layer designed to handle a high volume of network‑based interprocess communication among application infrastructure services using application programming interfaces (APIs).
https://www.nginx.com/blog/what-is-a-service-mesh/ - Floyd Smith & Owen Garrett, NGINX (April, 2018)

Slide 18

tl;dr: A service mesh is a dedicated infrastructure layer for making service-to-service communication safe, fast, and reliable.
https://buoyant.io/2017/04/25/whats-a-service-mesh-and-why-do-i-need-one/ - William Morgan, Buoyant (April, 2017)

Slide 19

A service mesh brings security, resiliency, and visibility to service communications, so developers don’t have to.
https://www.infoworld.com/article/3402260/what-is-a-service-mesh-service-mesh-explained.html - Josh Fruhlinger, InfoWorld (July, 2019)

Slide 20

The term service mesh is used to describe the network of microservices that make up such applications and the interactions between them.
https://istio.io/docs/concepts/what-is-istio/ - Istio (accessed September, 2019)

Slide 21

So what is a service mesh, really?
A service mesh is a separately managed distributed system that handles the common functions services need, and would normally implement themselves, but that do not concern the business logic of the service itself.
A real-life metaphor… the human circulatory system?
https://en.wikipedia.org/wiki/Circulatory_system

Slide 22

Service Mesh Capabilities… and more coming!
● Service Discovery
● Observability
● Rate Limiting
● Circuit Breaking
● Traffic Shifting
● Load Balancing
● Authorization/Authentication
● Distributed Tracing
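To make one of these capabilities concrete, here is a minimal circuit-breaker sketch. It is an illustration only: the class, its parameters (`max_failures`, `reset_after`), and the thresholds are assumptions made for this example, not how any particular mesh implements circuit breaking. A mesh proxy applies the same idea outside the service, so the service code does not have to.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are rejected without trying."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so it can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The point of failing fast is that a struggling downstream service stops receiving traffic it cannot handle, and callers get an immediate error instead of waiting on a timeout.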

Slides 23 to 30

Service Mesh Capabilities… and more coming!
The same list repeats on each of these slides, stepping through one capability at a time: Service Discovery, Observability, Rate Limiting, Circuit Breaking, Traffic Shifting, Load Balancing, Authorization/Authentication, Distributed Tracing.

Slide 31

Service Mesh Capabilities… and more coming!
● Service Discovery
● Observability
● Rate Limiting
● Circuit Breaking
● Traffic Shifting
● Load Balancing
● Authorization/Authentication
● Distributed Tracing
… legitimate solutions to all kinds of problems!
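Rate limiting is another capability that is easy to picture in code. A minimal token-bucket sketch follows; the class name, `rate`, and `capacity` are assumptions chosen for illustration, and token bucket is only one common way to limit rates, not necessarily what a given mesh uses.

```python
import time


class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable so it can be tested
        self.tokens = float(capacity) # start with a full bucket
        self.last = clock()

    def allow(self):
        """Return True if a request may proceed, consuming one token."""
        now = self.clock()
        # Refill for the time elapsed since the last check, up to capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A proxy would call something like `allow()` per request and return an error such as HTTP 429 when it comes back `False`.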

Slide 32

Why isn’t everyone using a service mesh???
“If you’re wondering about service mesh, you don’t need one. Period. If you’ve reached the scale and microservice maturity level that requires a service mesh, you will be actively, perhaps desperately, searching for a solution and it will be abundantly obvious that a service mesh is necessary.”
https://thenewstack.io/primer-the-who-what-and-why-of-service-mesh/ - Emily Omier (May, 2019)
“I think we have a tendency to chase the shiny object, in the sense that X company does Y, therefore I must do Y, even though I don’t have any of X company’s problems.”
- Matt Klein

Slide 33

Service Mesh Capabilities… and more coming!
● Service Discovery
● Observability
● Rate Limiting
● Circuit Breaking
● Traffic Shifting
● Load Balancing
● Authorization/Authentication
● Distributed Tracing
… legitimate solutions to all kinds of problems!

Slide 34

Back up… what problem are we trying to solve?
Matt Klein - Lyft engineer who started Envoy Proxy
Podcast: https://softwareengineeringdaily.com/2017/02/14/service-proxying-with-matt-klein/

Slide 35

Biggest Problems? I vote for Observability & Reliability
● Large numbers of services
● Diverse/Polyglot
● Different Communication Protocols
● Service/language specific libraries
● No standardization on logging or stats

Slide 36

How is a service mesh implemented?
DATA PLANE
This is the part that touches every request in the system. Sidecar proxies.
● All service communication (ingress & egress) is routed through proxies
● The proxy acts as a gateway to the service
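The sidecar idea can be sketched in a few lines: something sits in front of the service, sees every request, and records the same stats no matter what the service is written in. This is a toy, in-process illustration under stated assumptions (the `SidecarProxy` and `ProxyStats` names are invented here); a real sidecar is a separate process such as Envoy that intercepts network traffic.

```python
import time
from dataclasses import dataclass


@dataclass
class ProxyStats:
    requests: int = 0
    errors: int = 0
    total_seconds: float = 0.0


class SidecarProxy:
    """Stands in front of one service and observes every request to it."""

    def __init__(self, service_name, handler):
        self.service_name = service_name
        self.handler = handler      # the service's own request handler
        self.stats = ProxyStats()

    def handle(self, request):
        start = time.perf_counter()
        self.stats.requests += 1
        try:
            return self.handler(request)
        except Exception:
            self.stats.errors += 1
            raise
        finally:
            # Latency is recorded whether the call succeeded or failed.
            self.stats.total_seconds += time.perf_counter() - start


# Hypothetical service: the handler knows nothing about stats or logging.
orders = SidecarProxy("orders", lambda request: request.upper())
print(orders.handle("ping"), orders.stats)
```

Because every service gets the same proxy, request counts, error rates, and latencies come out in one consistent shape, which is the standardization the polyglot problem was missing.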

Slide 37

How is a service mesh implemented?
CONTROL PLANE
This is the part that manages the data planes, providing them with the data and configuration needed by the system.
It is exposed through a UI, CLI, or some other interface where an operator can set configuration.
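The relationship between the two planes can be sketched as one object that holds the desired configuration and pushes it to every registered proxy. Everything here is hypothetical (the `ControlPlane` and `Proxy` classes, and settings such as `timeout_seconds`); real control planes distribute configuration over the network, but the shape of the idea is the same.

```python
class Proxy:
    """A data-plane proxy that simply holds whatever config it was last given."""

    def __init__(self, name):
        self.name = name
        self.config = {}

    def apply(self, config):
        self.config = dict(config)  # copy so proxies never share state


class ControlPlane:
    """Holds the desired configuration and pushes it to every proxy."""

    def __init__(self):
        self.proxies = []
        self.config = {}

    def register(self, proxy):
        self.proxies.append(proxy)
        proxy.apply(self.config)    # new proxies start with current config

    def update(self, **settings):
        self.config.update(settings)
        for proxy in self.proxies:
            proxy.apply(self.config)


control = ControlPlane()
a, b = Proxy("orders-sidecar"), Proxy("billing-sidecar")
control.register(a)
control.register(b)
# An operator changes one setting; every proxy picks it up.
control.update(timeout_seconds=2, retries=3)
print(a.config, b.config)
```

The operator only ever talks to the control plane; no request traffic passes through it.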

Slide 38

I’m ready to try it! Where do I start?
ENVOY: https://www.envoyproxy.io/learn/ https://www.envoyproxy.io/docs/envoy/latest/start/start
LINKERD: https://linkerd.io/2/getting-started/
ISTIO: https://istio.io/docs/setup/install/kubernetes/

Slide 39

Prereqs to play
● Docker
● Docker Compose
● Kubernetes basics
● Helm charts/templates
● Get used to YAML if you aren’t already
Pay attention to versioning, of course!
Definitely up your CPUs to 4 and memory to 8 GiB if you use Docker Desktop.

Slide 40

https://azure.microsoft.com/en-us/resources/kubernetes-learning-path/

Slide 41

Slide 42

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47

Istio
ISTIO: https://istio.io/docs/setup/install/kubernetes/
● Go
● Apache 2.0 license
● Designed for extensibility, but might come at the cost of complexity
● Modular, pluggable
● Supports HTTP 1.1, HTTP2, gRPC, and TCP
● Support for Kubernetes, VMs
OVER 40 EXAMPLES available to play with different features!!!
● Backed by Google, Red Hat & IBM
● The quick install was easy, but choosing another type of install or configuration felt a little like a choose-your-own-adventure.
● Great documentation
● There are a TON of online tutorials, etc.

Slide 48

Istio https://istio.io/docs/concepts/what-is-istio/

Slide 49

Linkerd2
LINKERD: https://linkerd.io/2/getting-started/
● Go/Rust
● Apache 2.0 license
● Supported by the Cloud Native Computing Foundation
● Data & Control Plane tightly integrated (less modular, but smooth)
● VERY easy to install and get up and running
● Several examples are available to try out key features
● Intended for Kubernetes
● Supports HTTP 1.1, HTTP2, gRPC, and TCP
● Not as feature rich as Istio, but is a very active project (weekly edge releases, 6-8 week stable releases)
● Latest updates (2.9): Distributed tracing, traffic shifting (blue/green, canaries), telemetry, retries, timeouts, proxy auto-injection, mTLS on by default for all TCP
● Excellent documentation

Slide 50

Linkerd2 https://linkerd.io/2/reference/architecture/index.html

Slide 51

https://kinvolk.io/blog/2019/05/performance-benchmark-analysis-of-istio-and-linkerd/
https://github.com/kinvolk/service-mesh-benchmark/issues/5

Slide 52

Quick Compare - Istio AND Linkerd2
● Supports Kubernetes
● Apache 2.0 license
● Sidecar Pattern Deployment
● Control Plane written in Go
● Supported Protocols - HTTP1.1, HTTP2, gRPC, TCP
● Similar traffic control & monitoring features
● Helm Chart support
● mTLS support

Slide 53

Quick Compare - Differences
ISTIO
● Data plane: Envoy (C++), or others (Nginx)
● Higher performance overhead
● Pluggable/Modular; generally more complex setup
LINKERD2
● Data plane: Native (Rust)
● Lower performance overhead
● Opinionated/Tightly Coupled; generally simple setup
Comparison Chart: https://dzone.com/articles/service-mesh-comparison-istio-vs-linkerd

Slide 54

What next? EXPLORE OTHERS!
This space is growing fast and getting a lot of attention. One might presume this means there is a real need in the market, so it’s definitely worth checking out.
Service Mesh Interface (SMI): A standard interface for service meshes on Kubernetes. https://smi-spec.io/

Slide 55

Conclusion
Choose a solution that addresses REAL problems you need to solve for your system.
Consider your developers.
Consider your codebase.
Consider the performance cost.
Evaluate MULTIPLE solutions - don’t simply jump on a bandwagon.
The whole idea of a service mesh is pretty cool!

Slide 56

THANK YOU!
Q&A
Don’t forget! http://bit.ly/AirPodsLDNMicroservices