Tradeoffs, Bad Science, and Polar Bears - The World of Java Optimisation

A presentation at Accento in September 2021 by Holly Cummins

Slide 1

tradeoffs, bad science, and polar bears: the world of java optimisation Holly Cummins IBM @holly_cummins

Slide 2

why optimise? #IBM @holly_cummins

Slide 3

why optimise?

Slide 4

why optimise? 0.5s extra search page time

Slide 5

why optimise? 0.5s extra search page time, 20% drop in traffic

Slide 6

why optimise? 0.5s extra search page time, 20% drop in traffic; 100 ms latency on page load

Slide 7

why optimise? 0.5s extra search page time, 20% drop in traffic; 100 ms latency on page load, 7% lower conversion rate

Slide 8

why optimise? 0.5s extra search page time, 20% drop in traffic; 100 ms latency on page load, 7% lower conversion rate

Slide 9

why optimise? 0.5s extra search page time, 20% drop in traffic; 100 ms latency on page load, 7% lower conversion rate; 10 ms delay in trading platform

Slide 10

why optimise? 0.5s extra search page time, 20% drop in traffic; 100 ms latency on page load, 7% lower conversion rate; 10 ms delay in trading platform, 10% drop in revenue

Slide 11

what is optimising?

Slide 12

“make it go faster” for whom? when? doing what?

Slide 13

design thinking

Slide 14

Slide 15

performance can be:

Slide 16

performance can be: throughput

Slide 17

performance can be: throughput, transactions per second

Slide 18

performance can be: throughput, transactions per second, latency

Slide 19

performance can be: throughput, transactions per second, latency, start-up time

Slide 20

performance can be: throughput, transactions per second, latency, start-up time, response time

Slide 21

performance can be: throughput, transactions per second, latency, start-up time, response time, ramp-up time

Slide 22

performance can be: throughput, transactions per second, latency, start-up time, response time, ramp-up time, capacity

Slide 23

performance can be: throughput, transactions per second, latency, start-up time, response time, ramp-up time, capacity, footprint

Slide 24

performance can be: throughput, transactions per second, latency, start-up time, response time, ramp-up time, capacity, footprint, CPU usage

Slide 25

performance can be: throughput, transactions per second, latency, start-up time, response time, ramp-up time, capacity, footprint, CPU usage, utilisation

Slide 26

performance can be: throughput, transactions per second, latency, start-up time, response time, ramp-up time, capacity, footprint, CPU usage, utilisation, …

Slide 27

“Never underestimate the bandwidth [throughput] of a station wagon full of tapes hurtling down the highway.” – Andrew Tanenbaum, 1981

Slide 28

“Never underestimate the bandwidth [throughput] of a station wagon full of tapes hurtling down the highway.” – Andrew Tanenbaum, 1981. but the latency is terrible …
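
The throughput versus latency contrast in the quote can be put into rough numbers. This is a back-of-envelope sketch only: the tape count, tape capacity, and journey time are invented assumptions, and `StationWagon` is a hypothetical name.

```java
/** Back-of-envelope comparison of throughput and latency when shipping data by road. */
final class StationWagon {
    /** Effective throughput in bytes per second: total payload divided by delivery time. */
    static double throughputBytesPerSecond(long tapes, double bytesPerTape, double tripSeconds) {
        return tapes * bytesPerTape / tripSeconds;
    }

    public static void main(String[] args) {
        // All three figures are made-up assumptions for illustration.
        long tapes = 1_000;
        double bytesPerTape = 1e12;          // 1 TB per tape
        double tripSeconds = 10 * 3600.0;    // a ten-hour drive

        double throughput = throughputBytesPerSecond(tapes, bytesPerTape, tripSeconds);
        System.out.printf("throughput: %.1f GB/s%n", throughput / 1e9);
        // Latency: the first byte only arrives when the car does.
        System.out.printf("latency to first byte: %.0f s%n", tripSeconds);
    }
}
```

The same system looks excellent on one metric and dreadful on the other, which is why "faster" has to be pinned to a specific measure.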

Slide 29

requirements change #IBMGarage @holly_cummins

Slide 30

Slide 31

Slide 32

Slide 33

Slide 34

I am not designed for this.

Slide 35

the world changes

Slide 36

Slide 37

-Xmx == $

Slide 38

-Xmx == $ footprint
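
`-Xmx` sets the maximum Java heap size, so it is a direct lever on footprint. A minimal sketch, assuming a standard JVM, of how an application could report the heap limits it is actually running with; `HeapReport` and `toMiB` are hypothetical names.

```java
/** Prints the heap limits the running JVM is working with. */
final class HeapReport {
    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the most memory the JVM will attempt to use (roughly -Xmx).
        System.out.println("max heap (MiB):       " + toMiB(rt.maxMemory()));
        // totalMemory() is what is currently reserved; freeMemory() is the unused part of it.
        System.out.println("committed heap (MiB): " + toMiB(rt.totalMemory()));
        System.out.println("used heap (MiB):      " + toMiB(rt.totalMemory() - rt.freeMemory()));
    }
}
```

Running it with different `-Xmx` values (for example `java -Xmx160m HeapReport`) shows the first figure tracking the flag.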

Slide 39

Slide 40

which performs better?

Slide 41

Quarkus: trading off flexibility against startup speed and footprint

Slide 42

Quarkus: trading off flexibility against startup speed and footprint. uhh … are you supposed to shut down applications after using them?

Slide 43

behaviour at idle: 30% of VMs are zombies (antithesisgroup.com)

Slide 44

how to optimise?

Slide 45

find the bottleneck. fix it.

Slide 46

pitfall 1: intuition

Slide 47

this is not the place for ideas

Slide 48

measure, don’t guess.

Slide 49

measure the right thing

Slide 50

measure the right thing: what do your users care about?

Slide 51

pitfall 2: numbers

Slide 52

Slide 53

leading indicators

Slide 54

leading indicators; lagging indicators

Slide 55

leading indicators; lagging indicators: we care about them

Slide 56

leading indicators; lagging indicators: we care about them, easy to measure

Slide 57

leading indicators; lagging indicators: we care about them, easy to measure, hard to change

Slide 58

leading indicators: easy to change; lagging indicators: we care about them, easy to measure, hard to change

Slide 59

leading indicators: predictive of a thing we care about, easy to change; lagging indicators: we care about them, easy to measure, hard to change

Slide 60

leading indicators: predictive of a thing we care about, hard to identify, easy to change; lagging indicators: we care about them, easy to measure, hard to change

Slide 61

leading indicators: predictive of a thing we care about, hard to identify, easy to change; lagging indicators: we care about them, easy to measure, hard to change

Slide 62

caution: performance experiments for entertainment purposes only. do not try these at home.

Slide 63

2007

Slide 64

bad-ish advice: “reduce time spent in garbage collection”

Slide 65

bad-ish advice: “reduce time spent in garbage collection”. actually, garbage collection can make your application go faster

Slide 66

2007

Slide 67

2007

Slide 68

2021

Slide 69

2021

Slide 70

-verbose:gc -Xverbosegclog:gclog.xml -Xcompactgc

Slide 71

-verbose:gc -Xverbosegclog:gclog.xml -Xgcpolicy:optthruput -Xcompactgc

Slide 72

-verbose:gc -Xverbosegclog:gclog.xml -Xgcpolicy:optthruput -Xmx110m -Xms110m -Xnocompactgc

Slide 73

-verbose:gc -Xverbosegclog:gclog.xml -Xgcpolicy:optthruput -Xmx160m -Xms160m -Xnocompactgc

Slide 74

-verbose:gc -Xverbosegclog:gclog.xml -Xgcpolicy:optthruput -Xmx300m -Xms300m -Xcompactgc. why does the performance stay exactly the same no matter what gc settings I choose?

Slide 75

by the way, this is cheating. (remember the ‘bad science’?)

Slide 76

-verbose:gc
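
Alongside verbose GC logs, the standard `java.lang.management` API exposes accumulated collection time from inside the process. The sketch below is an assumption about how one might sample it, not the method used in the talk; `GcTimeReport` is a hypothetical name, and for concurrent collectors the reported time is not necessarily pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Reports roughly how much of the JVM's uptime has been spent in garbage collection. */
final class GcTimeReport {
    /** Sums accumulated collection time (ms) over all collectors; undefined values (-1) are skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** GC time as a percentage of elapsed time. */
    static double gcPercent(long gcMillis, long elapsedMillis) {
        return elapsedMillis == 0 ? 0.0 : 100.0 * gcMillis / elapsedMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gcMillis = totalGcMillis();
        System.out.printf("GC time: %d ms of %d ms uptime (%.1f%%)%n",
                gcMillis, uptime, gcPercent(gcMillis, uptime));
    }
}
```

This number is easy to collect, which is exactly why it is tempting to optimise it directly instead of the throughput or response time users actually see.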

Slide 77

Slide 78

Slide 79

Slide 80

Slide 81

Slide 82

Slide 83

Slide 84

Slide 85

Slide 86

4.1% of time in GC pause, 23.9 GB garbage collected, 493 transactions/s; total GC time: 12.0s, 3.6% of time in GC pause, 13.0 GB garbage collected, 260 transactions/s

Slide 87

total GC time: 21.6s, 4.1% of time in GC pause, 23.9 GB garbage collected, 493 transactions/s; total GC time: 12.0s, 3.6% of time in GC pause, 13.0 GB garbage collected, 260 transactions/s

Slide 88

leading indicator. total GC time: 21.6s, 4.1% of time in GC pause, 23.9 GB garbage collected, 493 transactions/s; total GC time: 12.0s, 3.6% of time in GC pause, 13.0 GB garbage collected, 260 transactions/s

Slide 89

leading indicator. total GC time: 21.6s, 4.1% of time in GC pause, 23.9 GB garbage collected, 493 transactions/s; total GC time: 12.0s, 3.6% of time in GC pause, 13.0 GB garbage collected, 260 transactions/s

Slide 90

lagging indicator, leading indicator. total GC time: 21.6s, 4.1% of time in GC pause, 23.9 GB garbage collected, 493 transactions/s; total GC time: 12.0s, 3.6% of time in GC pause, 13.0 GB garbage collected, 260 transactions/s

Slide 91

lagging indicator, leading indicator? total GC time: 21.6s, 4.1% of time in GC pause, 23.9 GB garbage collected, 493 transactions/s; total GC time: 12.0s, 3.6% of time in GC pause, 13.0 GB garbage collected, 260 transactions/s

Slide 92

lagging indicator? leading indicator? total GC time: 21.6s, 4.1% of time in GC pause, 23.9 GB garbage collected, 493 transactions/s; total GC time: 12.0s, 3.6% of time in GC pause, 13.0 GB garbage collected, 260 transactions/s

Slide 93

so wait, what changed to make the app faster? running JMeter on the same machine as the app gives a big speedup!

Slide 94

“Any improvements made anywhere besides the bottleneck are an illusion.” – Gene Kim

Slide 95

time kills all performance advice (even mine)

Slide 96

the takeaways: gc can improve performance by rearranging the heap; find the bottleneck; validate advice independently

Slide 97

pitfall 3: advice

Slide 98

I read it on the internet!

Slide 99

noooooo! “make one big method because method dispatching is slow”

Slide 100

noooooo! “re-use your objects to help the garbage collector”

Slide 101

noooooo! “to tune your JVM, use this command-line:” -server -Xms1g -Xmx1g -XX:PermSize=1g -XX:MaxPermSize=256m -Xmn256m -Xss64k -XX:SurvivorRatio=30 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=10 -XX:+ScavengeBeforeFullGC -XX:+CMSScavengeBeforeRemark -XX:+PrintGCDateStamps -verbose:gc -XX:+PrintGCDetails -Dsun.net.inetaddr.ttl=5 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=date.hprof -Dcom.sun.management.jmxremote.port=5616 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -server -Xms2g -Xmx2g -XX:MaxPermSize=256m -XX:NewRatio=1 -XX:+UseConcMarkSweepGC

Slide 102

noooooo! use StringBuilder, never concatenate strings with +=

Slide 103

noooooo! wait, what? yes, right? use StringBuilder, never concatenate strings with +=

Slide 104

2 things ruin advice: • context • time

Slide 105

pitfall 4: micro-optimisation

Slide 106

Slide 107

static String beSlow() { String result = ""; for (int i = 0; i < 314159; i++) { result += getStringData(i); } return result; }

Slide 108

@Override public String toString() { String ret = "\n\tMarket Summary at: " + getSummaryDate() + "\n\t\t TSIA:" + getTSIA() + "\n\t\t openTSIA:" + getOpenTSIA() + "\n\t\t gain:" + getGainPercent() + "\n\t\t volume:" + getVolume(); if ((getTopGainers() == null) || (getTopLosers() == null)) { return ret; } ret += "\n\t\t Current Top Gainers:"; Iterator<QuoteDataBean> it = getTopGainers().iterator(); while (it.hasNext()) { QuoteDataBean quoteData = it.next(); ret += ("\n\t\t\t" + quoteData.toString()); } ret += "\n\t\t Current Top Losers:"; it = getTopLosers().iterator(); while (it.hasNext()) { QuoteDataBean quoteData = it.next(); ret += ("\n\t\t\t" + quoteData.toString()); } return ret; }

Slide 109

@Override public String toString() { String ret = "\n\tMarket Summary at: " + getSummaryDate() + "\n\t\t TSIA:" + getTSIA() + "\n\t\t openTSIA:" + getOpenTSIA() + "\n\t\t gain:" + getGainPercent() + "\n\t\t volume:" + getVolume(); if ((getTopGainers() == null) || (getTopLosers() == null)) { return ret; } ret += "\n\t\t Current Top Gainers:"; Iterator<QuoteDataBean> it = getTopGainers().iterator(); while (it.hasNext()) { QuoteDataBean quoteData = it.next(); ret += ("\n\t\t\t" + quoteData.toString()); } ret += "\n\t\t Current Top Losers:"; it = getTopLosers().iterator(); while (it.hasNext()) { QuoteDataBean quoteData = it.next(); ret += ("\n\t\t\t" + quoteData.toString()); } return ret; }

Slide 110

@Override public String toString() { String ret = "\n\tMarket Summary at: " + getSummaryDate() + "\n\t\t TSIA:" + getTSIA() + "\n\t\t openTSIA:" + getOpenTSIA() + "\n\t\t gain:" + getGainPercent() + "\n\t\t volume:" + getVolume(); if ((getTopGainers() == null) || (getTopLosers() == null)) { return ret; } ret += "\n\t\t Current Top Gainers:"; Iterator<QuoteDataBean> it = getTopGainers().iterator(); while (it.hasNext()) { QuoteDataBean quoteData = it.next(); ret += ("\n\t\t\t" + quoteData.toString()); } ret += "\n\t\t Current Top Losers:"; it = getTopLosers().iterator(); while (it.hasNext()) { QuoteDataBean quoteData = it.next(); ret += ("\n\t\t\t" + quoteData.toString()); } return ret; }

Slide 111

this never gets called: @Override public String toString() { String ret = "\n\tMarket Summary at: " + getSummaryDate() + "\n\t\t TSIA:" + getTSIA() + "\n\t\t openTSIA:" + getOpenTSIA() + "\n\t\t gain:" + getGainPercent() + "\n\t\t volume:" + getVolume(); if ((getTopGainers() == null) || (getTopLosers() == null)) { return ret; } ret += "\n\t\t Current Top Gainers:"; Iterator<QuoteDataBean> it = getTopGainers().iterator(); while (it.hasNext()) { QuoteDataBean quoteData = it.next(); ret += ("\n\t\t\t" + quoteData.toString()); } ret += "\n\t\t Current Top Losers:"; it = getTopLosers().iterator(); while (it.hasNext()) { QuoteDataBean quoteData = it.next(); ret += ("\n\t\t\t" + quoteData.toString()); } return ret; }

Slide 112

let’s make travel energy-efficient?

Slide 113

every little helps?

Slide 114

every little helps? every optimisation is another optimisation you aren’t doing

Slide 115

our platforms help

Slide 116

static String beSlow() { String result = ""; for (int i = 0; i < 314159; i++) { result += getStringData(i); } return result; }

Slide 117

static String beSlow() { String result = ""; result += getStringData(1); result += getStringData(2); result += getStringData(3); return result; }

Slide 118

static String beSlow() { String result = ""; result += getStringData(1); result += getStringData(2); result += getStringData(3); return result; } this is fine
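
The difference between the loop version and the three-statement version can be sketched side by side. `getStringData` is not shown on the slides, so the implementation here is a hypothetical stand-in, as is the class name `Concat`.

```java
/** Contrasts string concatenation in a long loop with a handful of one-off concatenations. */
final class Concat {
    /** Hypothetical stand-in for the slides' getStringData. */
    static String getStringData(int i) {
        return Integer.toString(i);
    }

    /** += in a loop: each iteration typically builds a new String by copying everything so far. */
    static String concatInLoop(int n) {
        String result = "";
        for (int i = 0; i < n; i++) {
            result += getStringData(i);
        }
        return result;
    }

    /** Same output, but appending into one growing buffer. */
    static String buildInLoop(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(getStringData(i));
        }
        return sb.toString();
    }

    /** A few concatenations outside a loop: the case the slide calls fine. */
    static String fewConcats() {
        String result = "";
        result += getStringData(1);
        result += getStringData(2);
        result += getStringData(3);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(concatInLoop(10).equals(buildInLoop(10)));
        System.out.println(fewConcats());
    }
}
```

Whether the loop version matters in a real application still depends on whether it is on a hot path, which is something to measure rather than assume.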

Slide 119

the JVM writers have far more time for optimising than you do. clean, typical code runs best

Slide 120

ok, but how to optimise?

Slide 121

tools

Slide 122

“What you can optimize is limited to what you can observe.” – Susie Xia, Netflix

Slide 123

observability

Slide 124

method profiler; GC analysis; heap analysis; APM; distributed tracing (* not free). this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 125

method profiler: VisualVM; GC analysis; heap analysis; APM; distributed tracing (* not free). this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 126

method profiler: VisualVM, Mission Control; GC analysis; heap analysis; APM; distributed tracing (* not free). this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 127

method profiler: VisualVM, Mission Control, IBM Health Center (for OpenJ9); GC analysis; heap analysis; APM; distributed tracing (* not free). this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 128

method profiler: flame graphs, VisualVM, Mission Control, IBM Health Center (for OpenJ9); GC analysis; heap analysis; APM; distributed tracing (* not free). this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 129

method profiler: flame graphs, VisualVM, Mission Control, IBM Health Center (for OpenJ9); GC analysis: GCMV; heap analysis; APM; distributed tracing (* not free). this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 130

method profiler: flame graphs, VisualVM, Mission Control, IBM Health Center (for OpenJ9); GC analysis: GCMV; heap analysis: Eclipse MAT; APM; distributed tracing (* not free). this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 131

method profiler: flame graphs, VisualVM, Mission Control, IBM Health Center (for OpenJ9); GC analysis: GCMV; heap analysis: Eclipse MAT; APM: GlowRoot; distributed tracing (* not free). this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 132

method profiler: flame graphs, VisualVM, Mission Control, IBM Health Center (for OpenJ9); GC analysis: GCMV; heap analysis: Eclipse MAT; APM: GlowRoot, New Relic*; distributed tracing (* not free). this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 133

method profiler: flame graphs, VisualVM, Mission Control, IBM Health Center (for OpenJ9); GC analysis: GCMV; heap analysis: Eclipse MAT; APM: GlowRoot, New Relic*, AppDynamics*; distributed tracing (* not free). this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 134

method profiler: flame graphs, VisualVM, Mission Control, IBM Health Center (for OpenJ9); GC analysis: GCMV; heap analysis: Eclipse MAT; APM: GlowRoot, New Relic*, AppDynamics*, Dynatrace*; distributed tracing (* not free). this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 135

method profiler: flame graphs, VisualVM, Mission Control, IBM Health Center (for OpenJ9); GC analysis: GCMV; heap analysis: Eclipse MAT; APM: GlowRoot, New Relic*, AppDynamics*, Dynatrace*; distributed tracing: Zipkin (* not free). this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 136

method profiler: flame graphs, VisualVM, Mission Control, IBM Health Center (for OpenJ9); GC analysis: GCMV; heap analysis: Eclipse MAT; APM: GlowRoot, New Relic*, AppDynamics*, Dynatrace*; distributed tracing: Zipkin, Jaeger (* not free). this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 137

optimising a micro-service: is that micro-optimising? Netflix microservice architecture

Slide 138

you may need to know the whole system context to know what to optimise

Slide 139

“Nines don’t matter if your users aren’t happy.” – Charity Majors

Slide 140

don’t forget the edges: queueing theory helps us understand where the disasters happen
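
One standard queueing-theory result makes the "edges" concrete: for a single-server M/M/1 queue, mean response time is service time divided by (1 - utilisation). Real systems rarely match M/M/1 assumptions, so treat this as an illustrative sketch; the 10 ms service time and the class name `QueueEdge` are invented.

```java
/** Shows how mean response time in an M/M/1 queue grows as utilisation approaches 100%. */
final class QueueEdge {
    /** Mean response time R = S / (1 - rho), for utilisation rho in [0, 1). */
    static double meanResponseTime(double serviceTime, double utilisation) {
        if (utilisation < 0.0 || utilisation >= 1.0) {
            throw new IllegalArgumentException("utilisation must be in [0, 1)");
        }
        return serviceTime / (1.0 - utilisation);
    }

    public static void main(String[] args) {
        double serviceTimeMs = 10.0; // assumed service time, for illustration only
        double[] utilisations = {0.50, 0.80, 0.90, 0.95, 0.99};
        for (double rho : utilisations) {
            System.out.printf("utilisation %.0f%% -> mean response %.0f ms%n",
                    rho * 100, meanResponseTime(serviceTimeMs, rho));
        }
    }
}
```

The curve is nearly flat at moderate load and then climbs steeply, which is why a system that looks healthy at average load can fall over near saturation.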

Slide 141

“When it comes to IT performance, amateurs look at averages. Professionals look at distributions.” – Avishai Ish-Shalom
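
A small sketch of why the average hides what the distribution shows. The latency samples are invented for illustration (98 fast requests and 2 very slow ones), `LatencyStats` is a hypothetical name, and the percentile uses the simple nearest-rank definition.

```java
import java.util.Arrays;

/** Compares the mean of a latency sample with its percentiles. */
final class LatencyStats {
    /** Invented sample: 98 requests at 10 ms and 2 requests at 2000 ms. */
    static long[] sampleLatencies() {
        long[] samples = new long[100];
        Arrays.fill(samples, 10L);
        samples[98] = 2000L;
        samples[99] = 2000L;
        return samples;
    }

    static double mean(long[] samples) {
        long sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return (double) sum / samples.length;
    }

    /** Nearest-rank percentile for p in (0, 100]; sorts a copy so the input is untouched. */
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] samples = sampleLatencies();
        System.out.println("mean (ms): " + mean(samples));
        System.out.println("p50 (ms):  " + percentile(samples, 50));
        System.out.println("p99 (ms):  " + percentile(samples, 99));
    }
}
```

In this sample the mean describes no request that actually happened: the median user sees a fast response, while the tail sees one that is far slower.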

Slide 142

slow performance can turn into big cloud bills. make cloud costs visible to engineers

Slide 143

ok, but you promised bears

Slide 144

if you leave the TV on when you’re not using it, you’re a polar bear murderer

Slide 145

there is a moral imperative to avoid waste

Slide 146

there is a moral imperative to avoid waste: electricity, hardware

Slide 147

data centres use 1-2% of the world’s electricity

Slide 148

fewer devices, longer lifetime

Slide 149

higher efficiency, fewer devices, longer lifetime

Slide 150

higher efficiency, fewer devices, lower footprint, longer lifetime

Slide 151

higher efficiency, fewer devices, lower footprint, more multitenancy, longer lifetime

Slide 152

higher efficiency, fewer devices, lower footprint, more multitenancy, longer lifetime. optimise for longevity

Slide 153

higher efficiency, fewer devices, lower footprint, more multitenancy, longer lifetime. optimise for longevity: the end of planned obsolescence?

Slide 154

sooo … you can optimise, and it can be fun; measure, don’t guess; only optimise what matters. now for questions!