Java Performance Tooling

A presentation at TheServerSide Java Symposium in March 2008 in Las Vegas, NV, USA by Holly Cummins

Slide 1

Java Performance Tools
Dr Holly Cummins
Tooling Technical Lead, Java Technology Centre, IBM
cumminsh@uk.ibm.com

Slide 2

Outline
• Introduction
• Identifying performance problems
• Fixing performance problems
  – Performance tools for …
    • Space bound applications
      – IBM Monitoring and Diagnostic Tools for Java™ - GC and Memory Visualizer
      – IBM MDD4J
    • CPU bound applications
      – Method trace
    • I/O bound applications
    • Lock bound applications
      – IBM Lock Analyzer for Java

Slide 3

When would you use a performance tool?
• When you have a performance problem!

Slide 4

What’s a performance problem?
• It doesn’t go as fast as you think it ought to
• It doesn’t go as fast as your users demand
• It starts out fine and then after some period doesn’t go as fast as it used to
• It hangs
  – This is a quite severe example of a performance problem

Slide 5

Outline
• Introduction
• Assessing performance problems
• Fixing performance problems
  – Performance tools for …
    • Space bound applications
      – IBM Monitoring and Diagnostic Tools for Java™ - GC and Memory Visualizer
      – IBM MDD4J
    • CPU bound applications
      – Method trace
    • I/O bound applications
    • Lock bound applications
      – IBM Lock Analyzer for Java

Slide 6

Assessing performance problems
• Performance must be measured before problems can be fixed
  – Otherwise you risk making things worse with a clever fix
• We don’t provide a tool for this!
  – A performance tool cannot do your performance measurement for you
• Performance measurement must be based on your application and your quality of service requirements
  – Throughput
  – Response times
    • Mean response time
    • 90th percentile response time
    • Worst-case response time
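The three response-time measures listed above can be computed from a set of recorded samples. The following is a minimal sketch, not something from the talk: the class name, the sample values, and the choice of the nearest-rank percentile definition are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical helper: summarises recorded response times (non-empty array, in ms).
class ResponseTimeStats {

    /** Arithmetic mean of the samples. */
    static double mean(long[] samplesMs) {
        return Arrays.stream(samplesMs).average().orElse(Double.NaN);
    }

    /** Nearest-rank percentile: the smallest sample with at least `percent` of samples at or below it. */
    static long percentile(long[] samplesMs, double percent) {
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percent / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Worst-case (maximum) response time. */
    static long worst(long[] samplesMs) {
        return Arrays.stream(samplesMs).max().getAsLong();
    }

    public static void main(String[] args) {
        // Made-up samples: nine quick responses and one slow outlier.
        long[] samplesMs = {10, 20, 30, 40, 50, 60, 70, 80, 90, 1000};
        System.out.println("mean            = " + mean(samplesMs) + " ms");
        System.out.println("90th percentile = " + percentile(samplesMs, 90) + " ms");
        System.out.println("worst case      = " + worst(samplesMs) + " ms");
    }
}
```

With these made-up samples a single slow response pulls the mean well above the 90th percentile, which is one reason the three measures are worth tracking separately.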

Slide 7

The perils of benchmarks
• Sometimes measuring the performance of your own application is difficult
• Measuring the performance of a benchmark is not good enough
  – If it’s your application you care about, measure your application

Slide 8

The perils of simulated workloads
• Generating a “real” workload can be hard in a test environment
• Tuning a system against a simulated workload can be misleading
  – Example: garbage collection can be very sensitive to the exact distribution of object sizes and the pattern of connections between objects
  – Example: insufficient variation in data can lead to artificially warm caches and disguise I/O bottlenecks
• Care must be taken to ensure simulated workloads are sufficiently realistic

Slide 9

The perils of inference
• The performance metrics from performance tools cannot tell you how well your application is performing
• Pause times cannot tell you what your application response times are
• Time in GC cannot tell you how fast your application is running
  – Generational garbage collectors often use more of the CPU but give better throughput and shorter maximum response times
• A profiler may show more time is being spent in a method, but that may be because a change prompted the JIT to inline other methods, so total time may be reduced

Slide 10

How well is your application performing?
• The simplest and most effective way to measure performance is to invoke System.currentTimeMillis() in a test harness to time properties of interest
• Performance can be very variable, so measurements must be repeated
• Allow unmeasured warm-up period
  – (If that’s how the application will run)
  – Allows caches to be populated and methods to be compiled
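A harness of the kind described above might look like the sketch below. The class name, the run counts, and the placeholder workload are assumptions for illustration; only the use of System.currentTimeMillis(), the repetition, and the unmeasured warm-up come from the slide.

```java
// Hypothetical timing harness: unmeasured warm-up runs, then repeated timed runs.
class TimingHarness {

    /** Runs `work` warmupRuns times untimed, then measuredRuns times timed; returns elapsed ms per timed run. */
    static long[] time(Runnable work, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            work.run(); // lets caches fill and the JIT compile hot methods
        }
        long[] elapsedMs = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.currentTimeMillis();
            work.run();
            elapsedMs[i] = System.currentTimeMillis() - start;
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Placeholder workload; in practice this would drive the real application.
        Runnable work = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 200_000; i++) {
                sb.append(i % 10);
            }
            if (sb.length() != 200_000) {
                throw new IllegalStateException("unexpected length");
            }
        };
        for (long ms : time(work, 5, 10)) {
            System.out.println(ms + " ms");
        }
    }
}
```

Printing every run rather than a single average makes the run-to-run variability visible.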

Slide 11

Exception: Use GC to measure throughput
  – Rate of garbage collection = rate of garbage generation
  – If the code doesn’t change, generating garbage faster is good, because garbage is a side effect of work
• IBM Monitoring and Diagnostic Tools for Java - GC and Memory Visualizer reports the rate of garbage collection

Slide 12

Outline
• Introduction
• Assessing performance problems
• Fixing performance problems
  – Performance tools for …
    • Space bound applications
      – IBM Monitoring and Diagnostic Tools for Java™ - GC and Memory Visualizer
      – IBM MDD4J
    • CPU bound applications
      – Method trace
    • I/O bound applications
    • Lock bound applications
      – IBM Lock Analyzer for Java

Slide 13

Fixing performance problems
• Performance problems are caused by limited resources
• Which resource is limited?
• Applications may be
  – CPU bound
  – I/O bound
  – Space bound
  – “Lock bound” (contended)

Slide 14

How to decide which it is?
• CPU bound
  – CPU utilisation consistently high
• I/O bound
  – CPU utilisation not consistently high
• Lock bound
  – CPU utilisation not consistently high
• Space bound
  – Any of the above!
• These heuristics aren’t precise enough, so tools are required to guide diagnosis
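One rough way to see the CPU-utilisation heuristic in code is to compare the CPU time a thread consumed with the wall-clock time that elapsed. This is a sketch under assumptions: it uses the standard java.lang.management API rather than any IBM tool, it looks at a single thread only, and the class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: fraction of wall-clock time the calling thread spent on the CPU.
class CpuVersusWall {

    static double cpuFraction(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("thread CPU time not supported on this JVM");
        }
        long cpuStart = threads.getCurrentThreadCpuTime(); // nanoseconds, -1 if disabled
        if (cpuStart < 0) {
            throw new UnsupportedOperationException("thread CPU time measurement is disabled");
        }
        long wallStart = System.nanoTime();
        task.run();
        long wallNanos = System.nanoTime() - wallStart;
        long cpuNanos = threads.getCurrentThreadCpuTime() - cpuStart;
        return wallNanos == 0 ? 0.0 : (double) cpuNanos / wallNanos;
    }

    /** Stand-in for CPU-bound work. */
    static double burn(int iterations) {
        double x = 0;
        for (int i = 1; i <= iterations; i++) {
            x += Math.sqrt(i);
        }
        return x;
    }

    /** Stand-in for waiting on I/O or a lock. */
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("computing: " + cpuFraction(() -> burn(50_000_000)));
        System.out.println("waiting:   " + cpuFraction(() -> sleepQuietly(200)));
    }
}
```

A fraction near 1 suggests the work is CPU bound; a low fraction only says the thread was waiting, not whether it was waiting on I/O, a lock, or paging, which is the slide's point about needing tools.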

Slide 15

IBM Performance Tools
• IBM provides a number of tools to identify and fix performance bottlenecks
• The tools are all freely available
• Most, but not all, are targeted for IBM JVMs only
• Tools available from
  – alphaWorks (alpha tools)
  – IBM Support Assistant (fully supported tools)

Slide 16

IBM Support Assistant (ISA)
• Hosting for Serviceability Tools across product families
• Automatic problem determination data gathering
• Assist with opening PMRs and working with IBM Support
• Documentation:
  – Aggregated search across sources
  – Regular updates to Diagnostics Guide
http://www.ibm.com/software/support/isa

Slide 17

Outline
• Introduction
• Identifying performance problems
• Fixing performance problems
  – Performance tools for …
    • Space bound applications
      – IBM Monitoring and Diagnostic Tools for Java™ - GC and Memory Visualizer
      – IBM MDD4J
    • CPU bound applications
      – Method trace
    • I/O bound applications
    • Lock bound applications
      – IBM Lock Analyzer for Java

Slide 18

Diagnosing space bound applications
• Space bound can be disguised as CPU bound
  – Java has garbage collection
  – If the GC is running excessively it will hog the CPU
• Space bound can also be disguised as I/O bound
  – Excessive “in use” footprint can cause
    • Paging
    • Cache misses
• Enabling verbose garbage collection can quickly identify or rule out space issues
  – On IBM platforms, use -verbose:gc, or -Xverbosegclog:$file to write directly to a file
  – Logs may be analyzed with a verbose GC analysis tool

Slide 19

Outline
• Introduction
• Identifying performance problems
• Fixing performance problems
  – Performance tools for …
    • Space bound applications
      – IBM Monitoring and Diagnostic Tools for Java™ - GC and Memory Visualizer
      – IBM MDD4J
    • CPU bound applications
      – Method trace
    • I/O bound applications
    • Lock bound applications
      – IBM Lock Analyzer for Java

Slide 20

The GC and Memory Visualizer
• IBM Monitoring and Diagnostic Tools for Java - GC and Memory Visualizer (formerly known as EVTK) is a verbose GC analysis tool
• Handles verbose GC from all versions of IBM JVMs
  – 1.4.2 and lower
  – 5.0 and higher
  – zSeries
  – iSeries
  – WebSphere real time
• … and Solaris platforms
• … and HP-UX platforms

Slide 21

GC and Memory Visualizer capabilities
• Analyses heap usage, heap size, pause times, and many other properties
• Provides tuning recommendations
• Compares multiple logs in the same plots and reports
• Many views on data
  – Reports
  – Graphs
  – Tables
• Can save data to
  – HTML reports
  – JPEG pictures
  – CSV files

Slide 22

The GC and Memory Visualizer: Heap Visualization
[Screenshot: plots of heap occupancy and pause times]

Slide 23

The GC and Memory Visualizer - Comparison & Advice
[Screenshots: “Performance advisor…” and “Compare runs…”]

Slide 24

What does garbage collection tell you?
• High heap occupancy indicates an application is likely space bound
  – Increasing heap size or lowering application footprint should improve performance
• If GC is using more than 10% or 20% of the CPU, action may be required
  – Alternate choice of policy
  – GC tuning
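For a quick in-process approximation of GC overhead, accumulated collection time can be compared with JVM uptime. This is only a sketch: it relies on the standard java.lang.management API instead of verbose GC logs, the class name is hypothetical, and elapsed time in collection is an approximation of, not the same thing as, the GC's share of the CPU.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: accumulated GC time as a percentage of JVM uptime.
class GcOverhead {

    static double percentOfUptimeInGc() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // approximate elapsed ms, -1 if undefined
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : 100.0 * gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        double overhead = percentOfUptimeInGc();
        System.out.printf("GC overhead so far: %.1f%%%n", overhead);
        if (overhead > 10.0) {
            System.out.println("Above 10%: a different GC policy or GC tuning may be worth a look");
        }
    }
}
```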

Slide 25

Don’t forget native memory
• Java applications use, and may leak, native memory
• Low occupancy is no guarantee an application is not space bound
• Native memory use is not logged in verbose GC
• Memory pressure and even OutOfMemory errors may occur even though there is lots of room in the heap
• Use platform-specific tools
  – Windows perfmon tool
  – Linux ps
  – AIX vmstat

Slide 26

When should you size the heap?
• If performance is important
  – Fixing the heap size prevents the JVM shrinking the heap when the memory usage drops and then having to re-grow when it increases again
  – Try -Xmaxf=100 option to allow growth but prevent shrinking
• If the application uses a lot of memory
  – Most JVMs will avoid using all the memory on a box!
  – The IBM JVM has an upper limit of half the physical memory
  – If the application needs more than this, intervention is required

Slide 27

Demonstration: Using the GC and Memory Visualizer to size the heap
• Sample application allocates many objects, keeps some, and regularly throws some away

Slide 28

Try out various heap sizes
• Some will be obviously bad
• Most will seem fine

Slide 29

Use the GC and Memory Visualizer to decide
• Consider summary data and plotted data

Slide 30

The trade-off between heap and performance

Heap size   Occupancy   GC overhead   Time taken
100 MB      OutOfMemory crash
110 MB      89%         77%           30s
120 MB      82%         37%           9s
130 MB      75%         20%           9s
140 MB      69%         14%           8s
200 MB      49%         9%            7s
400 MB      24%         4%            7s
800 MB      12%         4%            7s

Slide 31

What’s the right heap size?
• It depends!
• What other demands are there for heap on the system?
• Larger heaps generally give better performance
  – But …
• Very large heaps give diminishing returns
• Pause times will generally be longer with larger heaps and may be very long with enormous heaps
• Some policies are more sensitive than others to heap size
• As a rule of thumb, aim for no more than 70% used heap (occupancy)
• 50% used heap is a good balance between improving performance and avoiding waste
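The occupancy rules of thumb reduce to simple arithmetic. As a worked sketch, assume the demonstration application keeps roughly 98 MB live (about 89% of the 110 MB run in the trade-off table; that figure is an inference, not stated in the talk). The class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: relates live data, heap size, and occupancy.
class HeapSizing {

    /** Occupancy as a percentage of the heap. */
    static double occupancyPercent(double usedMb, double heapMb) {
        return 100.0 * usedMb / heapMb;
    }

    /** Heap size at which `liveMb` of live data gives the target occupancy. */
    static double heapForTargetOccupancy(double liveMb, double targetPercent) {
        return liveMb * 100.0 / targetPercent;
    }

    public static void main(String[] args) {
        double liveMb = 98; // assumption: about 89% of the 110 MB run
        System.out.println("70% occupancy needs " + heapForTargetOccupancy(liveMb, 70) + " MB");
        System.out.println("50% occupancy needs " + heapForTargetOccupancy(liveMb, 50) + " MB");

        // Snapshot of this JVM. It includes garbage not yet collected,
        // so it overstates the after-GC occupancy a verbose GC log would show.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("current JVM: %.0f%% of committed heap in use%n",
                occupancyPercent(heap.getUsed(), heap.getCommitted()));
    }
}
```

Under that assumption the 70% and 50% targets come out at 140 MB and 196 MB, which lines up with the 140 MB (69%) and 200 MB (49%) rows of the table.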

Slide 32

Assessing Footprint
• After you’ve sized the heap, is the footprint what you expect?
• If not, why not?
  – Excessive caching
  – Excessive cloning
  – Bloated object structures
• Solution may be to reduce application’s memory usage rather than increase the heap size
• Sometimes the solution may be to increase application’s memory usage if it’s using less than expected
  – “If my footprint’s that small then I can cache all that stuff and speed up my application”

Slide 33

Diagnosing footprint issues
• Understanding leaks and excessive footprint needs an understanding of what objects are on the heap
  – Take a heap or system dump
  – Heap dumps are triggered automatically on OutOfMemoryErrors
  – Dumps may be triggered with Ctrl-Break (Windows) or kill -3 (Unix)
  – Dumps may also be triggered on method entry and other events
  – Dumps may also be triggered programmatically
• Once you have a dump, the dump can be analysed to discover what’s holding onto memory
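A programmatic trigger could be sketched as below. The talk does not name an API; this assumes the IBM-specific class com.ibm.jvm.Dump with a static HeapDump() method, and calls it through reflection so the code still compiles and runs (returning false) on JVMs that lack that class. The wrapper class and method names are hypothetical.

```java
// Hypothetical wrapper around the IBM-specific dump API (assumed: com.ibm.jvm.Dump.HeapDump()).
class HeapDumpTrigger {

    /** True if the assumed IBM dump class can be loaded on this JVM. */
    static boolean isIbmDumpApiPresent() {
        try {
            Class.forName("com.ibm.jvm.Dump");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Requests a heap dump; returns false if the API is missing or the call fails. */
    static boolean requestHeapDump() {
        try {
            Class<?> dump = Class.forName("com.ibm.jvm.Dump");
            dump.getMethod("HeapDump").invoke(null); // static method, so no receiver
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(requestHeapDump()
                ? "Heap dump requested"
                : "IBM dump API not available on this JVM");
    }
}
```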

Slide 34

Outline
• Introduction
• Identifying performance problems
• Fixing performance problems
  – Performance tools for …
    • Space bound applications
      – IBM Monitoring and Diagnostic Tools for Java™ - GC and Memory Visualizer
      – IBM MDD4J
    • CPU bound applications
      – Method trace
    • I/O bound applications
    • Lock bound applications
      – IBM Lock Analyzer for Java

Slide 35

MDD4J
• Java Memory Analysis tool
  – Help explain / track down OutOfMemoryError
  – Footprint analysis
  – Performance problems when object use
• 2 modes of use
  – Single snapshot: to visualize a given heap
  – Delta mode: to track growth between 2 points in time
• Input data types supported
  – IBM Portable Heap Dump (heapdump.phd)
  – IBM Text heap dump (heapdump.txt)
  – HPROF heap dump format (hprof.txt)
• Available through IBM Support Assistant

Slide 36


Slide 37


Slide 38

Outline
• Introduction
• Identifying performance problems
• Fixing performance problems
  – Performance tools for …
    • Space bound applications
      – IBM Monitoring and Diagnostic Tools for Java™ - GC and Memory Visualizer
      – IBM MDD4J
    • CPU bound applications
      – Method trace
    • I/O bound applications
    • Lock bound applications
      – IBM Lock Analyzer for Java

Slide 39

Diagnosing CPU bound applications
• Code is being invoked more than it needs to be
  – Easily done with event-driven models
• An algorithm is not the most efficient
  – Easily done without algorithms research!
• Fixing CPU bound applications requires knowledge of what code is being run
  – Identify methods which are suitable for optimisation
    • Optimising methods which the application doesn’t spend time in is a waste of your time
  – Identify methods where more time is being spent than you expect
    • “Why is so much of my profile in calls to this trivial little method?”

Slide 40

Method trace and profiling
• There are two ways to work out what code your application is doing
  – Trace
  – Profiling
• Trace
  – Does not require specialist tools (but is better with them)
  – Records every invocation of a subset of methods
  – Gives insight into sequence of events
  – In the simplest case, System.out.println
• Profiling
  – Requires specialist tools
  – Samples all methods and provides statistics
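The simplest case mentioned above, tracing with System.out.println, can be sketched as follows. The class, the traced method, and the line format (timestamp, thread name, `>` for entry, `<` for exit) are assumptions chosen for illustration, not the format of any IBM tool.

```java
// Hypothetical hand-rolled trace: prints a line on entry to and exit from a method.
class PrintlnTrace {

    /** Builds one trace line: timestamp, thread name, direction marker, method name. */
    static String format(long timestampMs, String thread, char direction, String method) {
        return timestampMs + " " + thread + " " + direction + " " + method;
    }

    static void trace(char direction, String method) {
        System.out.println(format(System.currentTimeMillis(),
                Thread.currentThread().getName(), direction, method));
    }

    /** Example traced method: sums the integers 1..n. */
    static int sumTo(int n) {
        trace('>', "PrintlnTrace.sumTo");
        try {
            int sum = 0;
            for (int i = 1; i <= n; i++) {
                sum += i;
            }
            return sum;
        } finally {
            trace('<', "PrintlnTrace.sumTo"); // exit is recorded even if an exception is thrown
        }
    }

    public static void main(String[] args) {
        System.out.println("result = " + sumTo(4));
    }
}
```

Every traced method has to be edited by hand here, which is the cost that instrumentation-free tracing avoids.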

Slide 41

Outline
• Introduction
• Identifying performance problems
• Fixing performance problems
  – Performance tools for …
    • Space bound applications
      – IBM Monitoring and Diagnostic Tools for Java™ - GC and Memory Visualizer
      – IBM MDD4J
    • CPU bound applications
      – Method trace
    • I/O bound applications
    • Lock bound applications
      – IBM Lock Analyzer for Java

Slide 42

IBM Java method trace
• Traces any Java methods
• Instrumentation-free, and no extra code required
• No fancy GUI, but very very powerful
• Detailed information:
  – Entry and Exit points, with thread information and microsecond time stamps
• Not overhead-free, but lower overhead than equivalent function implemented in Java

Slide 43

Controlling what is traced
• Can select methods on package, class or method name:
  – Package: methods={java/lang/*}
  – Class: methods={java/lang/String.*}
  – Method: methods={HelloWorld.main}
• Also ! operator and combination allowed:
  – methods={java/lang/*,!java/lang/String*}
• Possible to create huge volume of output, so use sensible method specifications!

Slide 44

Triggering events
• Can request certain actions occur when chosen methods are entered or exited
• Actions such as coredump, javadump, etc.
• Actions such as enabling trace!
• Can cause action to occur on n’th instance of trigger condition
• Can specify how many times the action occurs
• Multiple trigger types and actions can be specified

Slide 45

Using triggering to trace only some of the time
• Can start trace suspended, and resume / suspend it on matching method conditions
• E.g. use start up option -Xtrace:resumecount=1 to start trace suspended
• Trigger={method{HelloWorld.main*,resumethis,suspendthis}}
• This will cause the requested tracing to take effect only inside HelloWorld.main method
• Less work than stepping through in a debugger and creates a permanent record

Slide 46

Suspend / resume in action

Slide 47

Triggering and Method Trace in Action
• -Xtrace:print=mt,methods={myapp/MyTime*},resumecount=1,trigger=method{myapp/MyTime.main,resume,suspend}

  21:05:47.992*0x806cb00 mt.3   > myapp/MyTime.main([Ljava/lang/String;)V Bytecode static method
  21:05:47.994 0x806cb00 mt.19  - Static method arguments: ([L@55D8CB98)
  21:05:47.994 0x806cb00 mt.0   > myapp/MyTime.<init>()V Bytecode method, This = 809baec
  21:05:47.994 0x806cb00 mt.18  - Instance method receiver: myapp/MyTime@55D8CBA8 arguments: ()
  21:05:47.994 0x806cb00 mt.6   < myapp/MyTime.<init>()V Bytecode method
  21:05:47.994 0x806cb00 mt.0   > myapp/MyTime.test()V Bytecode method, This = 809baf0
  21:05:47.994 0x806cb00 mt.18  - Instance method receiver: myapp/MyTime@55D8CBA8 arguments: ()
  21:05:48.079 0x806cb00 mt.6   < myapp/MyTime.test()V Bytecode method
  21:05:48.079 0x806cb00 mt.9   < myapp/MyTime.main([Ljava/lang/String;)V Bytecode static method

• Only real time (79ms) is in the call to MyTime.test()
• Could now drill down into MyTime.test()

Slide 48

Triggering and Method Trace in Action
• Drill down into MyTime.test():
• Extend scope of methods traced, and reduce scope of tracing into MyTime.test()
• -Xtrace:print=mt,methods={myapp/},resumecount=1,trigger=method{myapp/MyTime.test,resume,suspend}

  21:07:14.968 0x806cb00 mt.0   > myapp/MyTime.test()V Bytecode method, This = 809baf0
  21:07:14.970 0x806cb00 mt.18  - Instance method receiver: myapp/MyTime@55D8CBA8 arguments: ()
  21:07:15.067 0x806cb00 mt.3   > myapp/MyTimer.getTime()V Bytecode static method
  21:07:15.067 0x806cb00 mt.19  - Static method arguments: ()
  21:07:15.067 0x806cb00 mt.9   < myapp/MyTimer.getTime()V Bytecode static method
  21:07:15.069 0x806cb00 mt.6   < myapp/MyTime.test()V Bytecode method

Slide 49

Other uses of trace
• Can count tracepoints using
  – java -Xtrace:count={tracepoint_selection} Class
• This is almost like a sampling profiler

Slide 50

Outline
• Introduction
• Identifying performance problems
• Fixing performance problems
  – Performance tools for …
    • Space bound applications
      – IBM Monitoring and Diagnostic Tools for Java™ - GC and Memory Visualizer
      – IBM MDD4J
    • CPU bound applications
      – Method trace
    • I/O bound applications
    • Lock bound applications
      – IBM Lock Analyzer for Java

Slide 51

Diagnosing I/O-bound applications
• A number of tools may be required to isolate the causes of I/O delays
• Use the GC and Memory Visualizer to check sweep times
  – Sweep times should be very short
  – Long sweep times indicate access to memory is slow
  – This indicates the application is probably paging
• Use method trace to trace calls to network and disk I/O

Slide 52

Outline
• Introduction
• Identifying performance problems
• Fixing performance problems
  – Performance tools for …
    • Space bound applications
      – IBM Monitoring and Diagnostic Tools for Java™ - GC and Memory Visualizer
      – IBM MDD4J
    • CPU bound applications
      – Method trace
    • I/O bound applications
    • Lock bound applications
      – IBM Lock Analyzer for Java

Slide 53

Diagnosing lock bound applications
• Infelicitous synchronization can cause significant application delays
• IBM provides a tool to quickly diagnose and identify contended locks
  – A contended lock is the opposite of a contented lock!
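To make "contended" concrete, the sketch below forces one thread to block on a monitor held by another and then reads that thread's blocked count. It uses the standard java.lang.management API, not the IBM Lock Analyzer, and the class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

// Hypothetical demo: one thread blocks on a monitor the main thread is holding.
class ContendedLockDemo {

    /** Returns how many times the contending thread has blocked to enter a monitor. */
    static long blockedCountOfContender() {
        final Object lock = new Object();
        Thread contender = new Thread(() -> {
            synchronized (lock) {
                // nothing to do; acquiring the monitor is the point
            }
        }, "contender");
        long blocked;
        try {
            synchronized (lock) {
                contender.start();
                // Wait until the contender is actually stuck on our monitor.
                while (contender.getState() != Thread.State.BLOCKED) {
                    Thread.sleep(1);
                }
                ThreadInfo info = ManagementFactory.getThreadMXBean()
                        .getThreadInfo(contender.getId());
                blocked = info.getBlockedCount();
            }
            contender.join(); // the monitor is released above, so the contender can finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for contender", e);
        }
        return blocked;
    }

    public static void main(String[] args) {
        System.out.println("contender blocked " + blockedCountOfContender() + " time(s)");
    }
}
```

In a real application the interesting question is which monitors account for most of the blocking, which is what a lock analysis tool reports per lock.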

Slide 54

Outline
• Introduction
• Identifying performance problems
• Fixing performance problems
  – Performance tools for …
    • Space bound applications
      – IBM Monitoring and Diagnostic Tools for Java™ - GC and Memory Visualizer
      – IBM MDD4J
    • CPU bound applications
      – Method trace
    • I/O bound applications
    • Lock bound applications
      – IBM Lock Analyzer for Java

Slide 55

IBM Lock Analyzer for Java
• Download from http://www.alphaworks.ibm.com/tech/jla
• JLA provides profiling data on monitors used in Java applications and the JVM:
  – Counters associated with contended locks
  – Total number of successful acquires
  – Recursive acquires
  – Frequency with which a thread had to block waiting on the monitor
  – Cumulative time the monitor was held
  – For platforms that support 3 Tier Spin Locking the following are also collected
    • Number of times the requesting thread went through the inner (spin loop) while attempting to acquire the monitor
    • Number of times the requesting thread went through the outer (thread yield loop) while attempting to acquire the monitor

Slide 56

IBM Lock Analyzer for Java

Slide 57

What do the bars mean?
• The Lock Analyzer provides very detailed information on locking and synchronization in the table below the chart
• In most cases the chart will be enough
• The height of the bar indicates how often threads were blocked waiting for the lock
• The colour of the bar indicates what fraction of the attempts were unsuccessful

Slide 58

Conclusions
• Improving application performance starts with identifying limited resources
• Tools can help fix performance bottlenecks
  – Space bound
    • GC and Memory Visualizer
    • MDD4J
  – CPU bound
    • Method tracing
  – Lock bound
    • Lock Analyzer for Java

Slide 59

• The following terms are trademarks of International Business Machines Corporation in the United States, other countries, or both:
  – IBM
  – z/OS
  – PowerPC
  – WebSphere
• Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
• Solaris is a trademark of Sun Microsystems, Inc.
• Intel is a trademark of Intel Corporation or its subsidiaries in the United States, other countries, or both.

Slide 60

Any Questions?