Trade-Offs, Bad Science, and Polar Bears—The World of Java Optimization

A presentation at O’Reilly Open Source Software Superstream Series: Java—From Java 17 to the Cloud and Beyond in July 2021 by Holly Cummins

Slide 1

tradeoffs, bad science, and polar bears: the world of java optimisation Holly Cummins IBM @holly_cummins

Slide 2

pulse check hi! have you tried optimising your code? #IBM @holly_cummins

Slide 3

why optimise?

Slide 4

why optimise?

Slide 5

why optimise?
0.5s extra search page time

Slide 6

why optimise?
0.5s extra search page time: 20% drop in traffic

Slide 7

why optimise?
0.5s extra search page time: 20% drop in traffic
100 ms latency on page load

Slide 8

why optimise?
0.5s extra search page time: 20% drop in traffic
100 ms latency on page load: 7% lower conversion rate

Slide 9

why optimise?
0.5s extra search page time: 20% drop in traffic
100 ms latency on page load: 7% lower conversion rate

Slide 10

why optimise?
0.5s extra search page time: 20% drop in traffic
100 ms latency on page load: 7% lower conversion rate
10 ms delay in trading platform

Slide 11

why optimise?
0.5s extra search page time: 20% drop in traffic
100 ms latency on page load: 7% lower conversion rate
10 ms delay in trading platform: 10% drop in revenue

Slide 12

what is optimising?

Slide 13

“make it go faster” for whom? when? doing what?

Slide 14

design thinking

Slide 15

Slide 16

performance can be:

Slide 17

performance can be: throughput

Slide 18

performance can be: throughput, transactions per second

Slide 19

performance can be: throughput, transactions per second, latency

Slide 20

performance can be: throughput, transactions per second, latency, start-up time

Slide 21

performance can be: throughput, transactions per second, latency, response time, start-up time

Slide 22

performance can be: throughput, transactions per second, latency, response time, start-up time, ramp-up time

Slide 23

performance can be: throughput, transactions per second, latency, response time, start-up time, ramp-up time, capacity

Slide 24

performance can be: throughput, transactions per second, latency, response time, start-up time, ramp-up time, capacity, footprint

Slide 25

performance can be: throughput, transactions per second, latency, response time, start-up time, ramp-up time, capacity, footprint, CPU usage

Slide 26

performance can be: throughput, transactions per second, latency, response time, start-up time, ramp-up time, capacity, footprint, CPU usage, utilisation

Slide 27

performance can be: throughput, transactions per second, latency, response time, start-up time, ramp-up time, capacity, footprint, CPU usage, utilisation, …

Slide 28

Never underestimate the bandwidth [throughput] of a station wagon full of tapes hurtling down the highway. –Andrew Tanenbaum, 1981

Slide 29

Never underestimate the bandwidth [throughput] of a station wagon full of tapes hurtling down the highway. –Andrew Tanenbaum, 1981
but the latency is terrible …
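A rough sketch of the arithmetic behind the station wagon joke, to show how throughput and latency can pull in opposite directions. The tape count, tape capacity, and trip time are made-up numbers for illustration, not figures from the talk, and `StationWagon` is a hypothetical name.

```java
final class StationWagon {
    // Throughput in bits per second: total payload divided by the trip time.
    static double bitsPerSecond(long tapes, long bytesPerTape, long tripSeconds) {
        return (double) tapes * bytesPerTape * 8 / tripSeconds;
    }

    public static void main(String[] args) {
        // All three values are assumptions chosen for illustration.
        long tapes = 1_000;
        long bytesPerTape = 1_000_000_000_000L; // 1 TB per tape
        long tripSeconds = 3_600;               // a one-hour drive

        double tbitPerSecond = bitsPerSecond(tapes, bytesPerTape, tripSeconds) / 1e12;
        System.out.printf("throughput: %.2f Tbit/s%n", tbitPerSecond);

        // Nothing arrives until the car does, so the latency is the whole trip.
        System.out.println("latency to first byte: " + tripSeconds + " s");
    }
}
```

With these assumed numbers the wagon moves terabits per second, yet the first byte takes an hour to arrive. Which of the two numbers matters depends on who is waiting.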

Slide 30

requirements change #IBMGarage @holly_cummins

Slide 31

Slide 32

Slide 33

Slide 34

Slide 35

I am not designed for this.

Slide 36

the world changes

Slide 37

Slide 38

-Xmx == $

Slide 39

-Xmx == $
footprint

Slide 40

Slide 41

pulse check who’s tried openj9?

Slide 42

which performs better?

Slide 43

pulse check did it perform “better”?

Slide 44

quarkus: trading off flexibility against startup speed and footprint

Slide 45

quarkus: trading off flexibility against startup speed and footprint
uhh … are you supposed to shut down applications after using them?

Slide 46

pulse check who’s tried quarkus?

Slide 47

behaviour at idle
30% of VMs are zombies (antithesisgroup.com)

Slide 48

how to optimise?

Slide 49

find the bottleneck. fix it.

Slide 50

pitfall 1: intuition

Slide 51

this is not the place for ideas

Slide 52

measure, don’t guess.

Slide 53

measure the right thing

Slide 54

measure the right thing
what do your users care about?

Slide 55

pitfall 2: numbers

Slide 56

Slide 57

leading indicators

Slide 58

leading indicators
lagging indicators

Slide 59

leading indicators
lagging indicators: we care about them

Slide 60

leading indicators
lagging indicators: we care about them, easy to measure

Slide 61

leading indicators
lagging indicators: we care about them, easy to measure, hard to change

Slide 62

leading indicators: easy to change
lagging indicators: we care about them, easy to measure, hard to change

Slide 63

leading indicators: easy to change, predictive of a thing we care about
lagging indicators: we care about them, easy to measure, hard to change

Slide 64

leading indicators: easy to change, predictive of a thing we care about, hard to identify
lagging indicators: we care about them, easy to measure, hard to change

Slide 65

leading indicators: easy to change, predictive of a thing we care about, hard to identify
lagging indicators: we care about them, easy to measure, hard to change

Slide 66

caution: performance experiments for entertainment purposes only. do not try these at home.

Slide 67

2007

Slide 68

bad-ish advice: “reduce time spent in garbage collection”

Slide 69

bad-ish advice: “reduce time spent in garbage collection”
actually, garbage collection can make your application go faster

Slide 70

2007

Slide 71

2007

Slide 72

2021

Slide 73

2021

Slide 74

-verbose:gc -Xverbosegclog:gclog.xml -Xcompactgc

Slide 75

-verbose:gc -Xverbosegclog:gclog.xml -Xgcpolicy:optthruput -Xcompactgc

Slide 76

-verbose:gc -Xverbosegclog:gclog.xml -Xgcpolicy:optthruput -Xmx110m -Xms110m -Xnocompactgc

Slide 77

-verbose:gc -Xverbosegclog:gclog.xml -Xgcpolicy:optthruput -Xmx160m -Xms160m -Xnocompactgc

Slide 78

-verbose:gc -Xverbosegclog:gclog.xml -Xgcpolicy:optthruput -Xmx300m -Xms300m -Xcompactgc
why does the performance stay exactly the same no matter what gc settings I choose?

Slide 79

by the way, this is cheating. (remember the ‘bad science’?)

Slide 80

-verbose:gc

Slide 81

Slide 82

Slide 83

Slide 84

Slide 85

Slide 86

Slide 87

Slide 88

Slide 89

Slide 90

4.1% of time in GC pause; 23.9 GB garbage collected; 493 transactions/s
total GC time: 12.0s; 3.6% of time in GC pause; 13.0 GB garbage collected; 260 transactions/s

Slide 91

total GC time: 21.6s; 4.1% of time in GC pause; 23.9 GB garbage collected; 493 transactions/s
total GC time: 12.0s; 3.6% of time in GC pause; 13.0 GB garbage collected; 260 transactions/s

Slide 92

leading indicator
total GC time: 21.6s; 4.1% of time in GC pause; 23.9 GB garbage collected; 493 transactions/s
total GC time: 12.0s; 3.6% of time in GC pause; 13.0 GB garbage collected; 260 transactions/s

Slide 93

leading indicator
total GC time: 21.6s; 4.1% of time in GC pause; 23.9 GB garbage collected; 493 transactions/s
total GC time: 12.0s; 3.6% of time in GC pause; 13.0 GB garbage collected; 260 transactions/s

Slide 94

lagging indicator
leading indicator
total GC time: 21.6s; 4.1% of time in GC pause; 23.9 GB garbage collected; 493 transactions/s
total GC time: 12.0s; 3.6% of time in GC pause; 13.0 GB garbage collected; 260 transactions/s

Slide 95

lagging indicator
leading indicator ?
total GC time: 21.6s; 4.1% of time in GC pause; 23.9 GB garbage collected; 493 transactions/s
total GC time: 12.0s; 3.6% of time in GC pause; 13.0 GB garbage collected; 260 transactions/s

Slide 96

lagging indicator ?
leading indicator ?
total GC time: 21.6s; 4.1% of time in GC pause; 23.9 GB garbage collected; 493 transactions/s
total GC time: 12.0s; 3.6% of time in GC pause; 13.0 GB garbage collected; 260 transactions/s

Slide 97

“Any improvements made anywhere besides the bottleneck are an illusion.” – Gene Kim

Slide 98

time kills all performance advice (even mine)

Slide 99

gc can improve performance by rearranging the heap
find the bottleneck
validate advice independently

Slide 100

pitfall 3: advice

Slide 101

I read it on the internet!

Slide 102

noooooo! “make one big method because method dispatching is slow”

Slide 103

noooooo! “re-use your objects to help the garbage collector”

Slide 104

noooooo! “to tune your JVM, use this command-line:”
-server -Xms1g -Xmx1g -XX:PermSize=1g -XX:MaxPermSize=256m -Xmn256m -Xss64k -XX:SurvivorRatio=30 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=10 -XX:+ScavengeBeforeFullGC -XX:+CMSScavengeBeforeRemark -XX:+PrintGCDateStamps -verbose:gc -XX:+PrintGCDetails -Dsun.net.inetaddr.ttl=5 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=date.hprof -Dcom.sun.management.jmxremote.port=5616 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
-server -Xms2g -Xmx2g -XX:MaxPermSize=256m -XX:NewRatio=1 -XX:+UseConcMarkSweepGC

Slide 105

noooooo! use StringBuilder, never concatenate strings with +=

Slide 106

noooooo! wait, what? yes, right? use StringBuilder, never concatenate strings with +=

Slide 107

2 things ruin advice: context, time

Slide 108

pitfall 4: micro-optimisation

Slide 109

Slide 110

static String beSlow() {
    String result = "";
    for (int i = 0; i < 314159; i++) {
        result += getStringData(i);
    }
    return result;
}

Slide 111

@Override
public String toString() {
    String ret = "\n\tMarket Summary at: " + getSummaryDate()
            + "\n\t\t TSIA:" + getTSIA()
            + "\n\t\t openTSIA:" + getOpenTSIA()
            + "\n\t\t gain:" + getGainPercent()
            + "\n\t\t volume:" + getVolume();
    if ((getTopGainers() == null) || (getTopLosers() == null)) {
        return ret;
    }
    ret += "\n\t\t Current Top Gainers:";
    Iterator<QuoteDataBean> it = getTopGainers().iterator();
    while (it.hasNext()) {
        QuoteDataBean quoteData = it.next();
        ret += ("\n\t\t\t" + quoteData.toString());
    }
    ret += "\n\t\t Current Top Losers:";
    it = getTopLosers().iterator();
    while (it.hasNext()) {
        QuoteDataBean quoteData = it.next();
        ret += ("\n\t\t\t" + quoteData.toString());
    }
    return ret;
}

Slide 112

@Override
public String toString() {
    String ret = "\n\tMarket Summary at: " + getSummaryDate()
            + "\n\t\t TSIA:" + getTSIA()
            + "\n\t\t openTSIA:" + getOpenTSIA()
            + "\n\t\t gain:" + getGainPercent()
            + "\n\t\t volume:" + getVolume();
    if ((getTopGainers() == null) || (getTopLosers() == null)) {
        return ret;
    }
    ret += "\n\t\t Current Top Gainers:";
    Iterator<QuoteDataBean> it = getTopGainers().iterator();
    while (it.hasNext()) {
        QuoteDataBean quoteData = it.next();
        ret += ("\n\t\t\t" + quoteData.toString());
    }
    ret += "\n\t\t Current Top Losers:";
    it = getTopLosers().iterator();
    while (it.hasNext()) {
        QuoteDataBean quoteData = it.next();
        ret += ("\n\t\t\t" + quoteData.toString());
    }
    return ret;
}

Slide 113

@Override
public String toString() {
    String ret = "\n\tMarket Summary at: " + getSummaryDate()
            + "\n\t\t TSIA:" + getTSIA()
            + "\n\t\t openTSIA:" + getOpenTSIA()
            + "\n\t\t gain:" + getGainPercent()
            + "\n\t\t volume:" + getVolume();
    if ((getTopGainers() == null) || (getTopLosers() == null)) {
        return ret;
    }
    ret += "\n\t\t Current Top Gainers:";
    Iterator<QuoteDataBean> it = getTopGainers().iterator();
    while (it.hasNext()) {
        QuoteDataBean quoteData = it.next();
        ret += ("\n\t\t\t" + quoteData.toString());
    }
    ret += "\n\t\t Current Top Losers:";
    it = getTopLosers().iterator();
    while (it.hasNext()) {
        QuoteDataBean quoteData = it.next();
        ret += ("\n\t\t\t" + quoteData.toString());
    }
    return ret;
}

Slide 114

this never gets called

@Override
public String toString() {
    String ret = "\n\tMarket Summary at: " + getSummaryDate()
            + "\n\t\t TSIA:" + getTSIA()
            + "\n\t\t openTSIA:" + getOpenTSIA()
            + "\n\t\t gain:" + getGainPercent()
            + "\n\t\t volume:" + getVolume();
    if ((getTopGainers() == null) || (getTopLosers() == null)) {
        return ret;
    }
    ret += "\n\t\t Current Top Gainers:";
    Iterator<QuoteDataBean> it = getTopGainers().iterator();
    while (it.hasNext()) {
        QuoteDataBean quoteData = it.next();
        ret += ("\n\t\t\t" + quoteData.toString());
    }
    ret += "\n\t\t Current Top Losers:";
    it = getTopLosers().iterator();
    while (it.hasNext()) {
        QuoteDataBean quoteData = it.next();
        ret += ("\n\t\t\t" + quoteData.toString());
    }
    return ret;
}

Slide 115

let’s make travel energy-efficient?

Slide 116

every little helps?

Slide 117

every little helps?
every optimisation is another optimisation you aren’t doing

Slide 118

our platforms help

Slide 119

static String beSlow() {
    String result = "";
    for (int i = 0; i < 314159; i++) {
        result += getStringData(i);
    }
    return result;
}

Slide 120

static String beSlow() {
    String result = "";
    result += getStringData(1);
    result += getStringData(2);
    result += getStringData(3);
    return result;
}

Slide 121

static String beSlow() {
    String result = "";
    result += getStringData(1);
    result += getStringData(2);
    result += getStringData(3);
    return result;
}

this is fine
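A minimal sketch of the contrast drawn by the beSlow() slides: the same loop written with += and with a StringBuilder. The class name `Concat` and the body of `getStringData` are hypothetical stand-ins, since the slides do not show what that method returns.

```java
final class Concat {
    // Hypothetical stand-in for the slides' getStringData(i).
    static String getStringData(int i) {
        return Integer.toString(i);
    }

    // The loop from the slides: every += builds a new String that
    // copies everything accumulated so far.
    static String withPlusEquals(int n) {
        String result = "";
        for (int i = 0; i < n; i++) {
            result += getStringData(i);
        }
        return result;
    }

    // Same output, but appends into a single growing buffer.
    static String withStringBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(getStringData(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlusEquals(5));    // prints 01234
        System.out.println(withStringBuilder(5)); // prints 01234
    }
}
```

Both methods return the same string. Whether the rewrite is worth doing depends on context: for a handful of straight-line concatenations, as on the slide above, the plain += form is fine, and a loop is only worth changing if measurement shows it is the bottleneck.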

Slide 122

the JVM writers have far more time for optimising than you do
clean, typical code runs best

Slide 123

ok, but how to optimise?

Slide 124

tools

Slide 125

“What you can optimize is limited to what you can observe.” -Susie Xia, Netflix

Slide 126

observability

Slide 127

method profiler
GC analysis
heap analysis
APM
distributed tracing
* not free
this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 128

method profiler: VisualVM
GC analysis
heap analysis
APM
distributed tracing
* not free
this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 129

method profiler: VisualVM, Mission Control
GC analysis
heap analysis
APM
distributed tracing
* not free
this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 130

method profiler: VisualVM, Mission Control, IBM Health Center (for OpenJ9)
GC analysis
heap analysis
APM
distributed tracing
* not free
this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 131

method profiler: flame graphs, VisualVM, Mission Control, IBM Health Center (for OpenJ9)
GC analysis
heap analysis
APM
distributed tracing
* not free
this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 132

method profiler: flame graphs, VisualVM, Mission Control, IBM Health Center (for OpenJ9)
GC analysis: GCMV
heap analysis
APM
distributed tracing
* not free
this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 133

method profiler: flame graphs, VisualVM, Mission Control, IBM Health Center (for OpenJ9)
GC analysis: GCMV
heap analysis: Eclipse MAT
APM
distributed tracing
* not free
this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 134

method profiler: flame graphs, VisualVM, Mission Control, IBM Health Center (for OpenJ9)
GC analysis: GCMV
heap analysis: Eclipse MAT
APM: GlowRoot
distributed tracing
* not free
this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 135

method profiler: flame graphs, VisualVM, Mission Control, IBM Health Center (for OpenJ9)
GC analysis: GCMV
heap analysis: Eclipse MAT
APM: GlowRoot, New Relic*
distributed tracing
* not free
this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 136

method profiler: flame graphs, VisualVM, Mission Control, IBM Health Center (for OpenJ9)
GC analysis: GCMV
heap analysis: Eclipse MAT
APM: GlowRoot, New Relic*, AppDynamics*
distributed tracing
* not free
this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 137

method profiler: flame graphs, VisualVM, Mission Control, IBM Health Center (for OpenJ9)
GC analysis: GCMV
heap analysis: Eclipse MAT
APM: GlowRoot, New Relic*, AppDynamics*, Dynatrace*
distributed tracing
* not free
this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 138

method profiler: flame graphs, VisualVM, Mission Control, IBM Health Center (for OpenJ9)
GC analysis: GCMV
heap analysis: Eclipse MAT
APM: GlowRoot, New Relic*, AppDynamics*, Dynatrace*
distributed tracing: Zipkin
* not free
this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 139

method profiler: flame graphs, VisualVM, Mission Control, IBM Health Center (for OpenJ9)
GC analysis: GCMV
heap analysis: Eclipse MAT
APM: GlowRoot, New Relic*, AppDynamics*, Dynatrace*
distributed tracing: Zipkin, Jaeger
* not free
this is an incomplete list, because there are a lot of tools out there, and many cost money

Slide 140

optimising a micro-service: is that micro-optimising?
Netflix microservice architecture

Slide 141

you may need to know the whole system context to know what to optimise

Slide 142

“Nines don’t matter if your users aren’t happy.” – Charity Majors

Slide 143

don’t forget the edges
queueing theory helps us understand where the disasters happen
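One way to see where the disasters happen is the textbook single-server queue. The sketch below assumes an M/M/1 model (random arrivals, one server), where mean response time is service time divided by (1 - utilisation). The model choice, the class name `QueueEdge`, and the 10 ms service time are assumptions for illustration, not something taken from the talk.

```java
final class QueueEdge {
    // Mean response time for an M/M/1 queue: serviceTime / (1 - utilisation).
    static double responseTimeMs(double serviceTimeMs, double utilisation) {
        if (utilisation < 0.0 || utilisation >= 1.0) {
            throw new IllegalArgumentException("utilisation must be in [0, 1)");
        }
        return serviceTimeMs / (1.0 - utilisation);
    }

    public static void main(String[] args) {
        double serviceTimeMs = 10.0; // assumed time to serve one request
        double[] utilisations = {0.50, 0.80, 0.90, 0.95, 0.99};
        for (double u : utilisations) {
            System.out.printf("utilisation %.2f: mean response %.0f ms%n",
                    u, responseTimeMs(serviceTimeMs, u));
        }
    }
}
```

Under this model, going from 50% to 90% utilisation multiplies the mean response time by five, and the last few percent before saturation cost far more than everything before them. Real systems rarely match the model exactly, but the shape of the curve is the useful part.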

Slide 144

“When it comes to IT performance, amateurs look at averages. Professionals look at distributions.” – Avishai Ish-Shalom
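A small sketch of why the average can mislead. The latency samples are made up, `LatencyStats` is a hypothetical name, and the percentile uses the simple nearest-rank definition (other definitions interpolate and give slightly different values).

```java
import java.util.Arrays;

final class LatencyStats {
    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up latencies in ms: nine fast requests and one very slow one.
        long[] latencies = {10, 11, 9, 10, 12, 10, 11, 9, 10, 908};
        System.out.println("mean = " + mean(latencies));           // prints mean = 100.0
        System.out.println("p50  = " + percentile(latencies, 50)); // prints p50  = 10
        System.out.println("p99  = " + percentile(latencies, 99)); // prints p99  = 908
    }
}
```

The mean of 100 ms describes no request that actually happened: most users saw about 10 ms and one saw nearly a second. The median and the tail percentile together tell that story; the average hides it.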

Slide 145

slow performance can turn into big cloud bills
make cloud costs visible to engineers

Slide 146

ok, but you promised bears

Slide 147

if you leave the TV on when you’re not using it, you’re a polar bear murderer

Slide 148

there is a moral imperative to avoid waste

Slide 149

there is a moral imperative to avoid waste: electricity, hardware

Slide 150

data centres use 1-2% of the world’s electricity

Slide 151

fewer devices; longer lifetime

Slide 152

fewer devices; longer lifetime; higher efficiency

Slide 153

fewer devices; longer lifetime; higher efficiency; lower footprint

Slide 154

fewer devices; longer lifetime; higher efficiency; lower footprint; more multitenancy

Slide 155

fewer devices; longer lifetime; higher efficiency; lower footprint; more multitenancy; optimise for longevity

Slide 156

fewer devices; longer lifetime; higher efficiency; lower footprint; more multitenancy; optimise for longevity; the end of planned obsolescence?

Slide 157

sooo …
you can optimise, and it can be fun
measure, don’t guess
only optimise what matters
now for questions!