Thursday, May 10, 2012

Java Performance - Cookbook Part I

I have been involved in a lot of performance-related issues, so I decided to read the "Java Performance" book to serve my clients better.
The book is good, but many times I felt it covers topics very superficially; links for further reading would have helped in my situation. Overall, thumbs up: I would recommend that every Java developer keep a copy on his/her bookshelf.

Things which I would like to remember (my notes) from Chapter 2 - Operating System Performance.

It is very important to understand the concepts before going deeper, i.e. what performance monitoring, performance profiling and performance tuning are. We will also look briefly at operating system monitoring tools, both command line and GUI.

Performance Monitoring
Performance monitoring is the act of collecting or observing performance data from a running application. It is typically done when an application has performance issues and there is not enough information or clues to point to a root cause. It can be performed in any environment.

Performance Profiling
Performance profiling is similar to monitoring, but with a narrower focus. Profiling is rarely done in a production environment. It often follows a monitoring activity that indicates some kind of performance issue.

Performance Tuning
Performance tuning is the act of changing tunables, source code or configuration attributes in order to improve application responsiveness or throughput.

To reach the highest performance, we need to understand how CPU resources are utilized. Our goal should be to reduce kernel or system CPU utilization as much as possible, because high kernel or system CPU utilization can indicate shared resource contention or a large number of interactions with I/O devices.

A stall is a scenario where the CPU is reported as being utilized even though it is actually waiting for data to be fetched from memory. Stalls occur any time the CPU executes an instruction and the data the instruction operates on is not readily available in a CPU register or cache. For compute-intensive applications, observe IPC (instructions per clock: the number of CPU instructions per CPU clock cycle) or CPI (cycles per instruction: the number of CPU clock cycles per CPU instruction) to estimate how much time is lost to stalls. The target is to reduce the number of stalls, or to improve the CPU's cache utilization, so that fewer CPU clock cycles are wasted waiting for data to be fetched from memory.
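To make the IPC and CPI definitions concrete, here is a minimal sketch of the arithmetic. The class name and the counter values are made up for illustration; real numbers would come from a hardware performance counter tool, not from the JVM.

```java
// Hypothetical helper: the inputs are hardware counter readings taken
// over the same time interval (instructions retired, clock cycles elapsed).
final class CpuStallMath {

    /** Instructions per clock: instructions executed divided by clock cycles. */
    static double ipc(long instructions, long cycles) {
        return (double) instructions / cycles;
    }

    /** Cycles per instruction: the reciprocal of IPC. */
    static double cpi(long instructions, long cycles) {
        return (double) cycles / instructions;
    }

    public static void main(String[] args) {
        long instructions = 2_000_000_000L; // made-up sample value
        long cycles = 4_000_000_000L;       // made-up sample value
        System.out.println("IPC = " + ipc(instructions, cycles));
        System.out.println("CPI = " + cpi(instructions, cycles));
    }
}
```

A falling IPC (or rising CPI) for the same workload is a hint that more cycles are being spent waiting on memory.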

Let's look at a few monitoring tools for Linux. The book also describes tools for other operating systems, so please refer to it if you need those :).

Monitoring CPU utilization on Linux
The GNOME System Monitor tool can be used to monitor CPU utilization graphically. It can be launched with the "gnome-system-monitor" command. A few Linux distributions may also include the xosview tool. One of the additional features of xosview is that CPU utilization is further broken down into user CPU, kernel or system CPU, and idle CPU.

Linux also provides the vmstat and mpstat command line tools to monitor CPU utilization.

vmstat reports a summary of CPU utilization across all virtual processors. (The number of virtual processors is the number of hardware threads on the system. It is also the value returned by the Java API Runtime.getRuntime().availableProcessors().) The first row of vmstat output summarizes the data collected since the system was last booted.
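For reference, this is how the virtual processor count can be read from inside a Java program. The class name is only for illustration; the API call itself is the standard Runtime one.

```java
final class VirtualProcessors {

    /** Number of processors available to this JVM. */
    static int count() {
        // availableProcessors() is an instance method, so it is called
        // on the singleton returned by Runtime.getRuntime().
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Virtual processors visible to the JVM: " + count());
    }
}
```

Keep in mind that the value is what the JVM is allowed to use, so it can be lower than the hardware thread count when CPU affinity or container limits are in place.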

mpstat offers a tabular view of CPU utilization for each virtual processor. Most Linux distributions require installing the sysstat package to use mpstat. Observing CPU utilization per virtual processor can help identify whether an application has threads that tend to consume a larger percentage of CPU cycles than other threads, or whether its threads tend to use about the same percentage of CPU cycles. The latter behavior usually suggests an application that may scale better.
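The user / system / idle breakdown these tools show can also be derived by hand from the per-processor lines in /proc/stat. The sketch below is my own illustration, not something from the book: it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal) and works on two sample lines taken some time apart. The class name and the sample values are made up.

```java
// Minimal sketch: CPU utilization from two /proc/stat samples of one processor.
final class ProcStatSample {
    final long user;   // user + nice ticks
    final long system; // kernel ticks
    final long idle;   // idle ticks
    final long total;  // sum of the first eight counters

    private ProcStatSample(long user, long system, long idle, long total) {
        this.user = user;
        this.system = system;
        this.idle = idle;
        this.total = total;
    }

    /** Parses one line such as "cpu0 4705 150 1120 16250 520 0 17 0 0 0". */
    static ProcStatSample parse(String line) {
        String[] f = line.trim().split("\\s+");
        long total = 0;
        // Fields 1..8 only; guest and guest_nice are skipped because
        // they are already included in user and nice.
        for (int i = 1; i < f.length && i <= 8; i++) {
            total += Long.parseLong(f[i]);
        }
        long user = Long.parseLong(f[1]) + Long.parseLong(f[2]);
        long system = Long.parseLong(f[3]);
        long idle = Long.parseLong(f[4]);
        return new ProcStatSample(user, system, idle, total);
    }

    /** Returns {user %, system %, idle %} for the interval between two samples. */
    static double[] utilization(ProcStatSample before, ProcStatSample after) {
        double elapsed = after.total - before.total;
        return new double[] {
            100.0 * (after.user - before.user) / elapsed,
            100.0 * (after.system - before.system) / elapsed,
            100.0 * (after.idle - before.idle) / elapsed
        };
    }

    public static void main(String[] args) {
        // Made-up samples; in real use both lines would be read from /proc/stat.
        ProcStatSample t0 = parse("cpu0 100 0 50 850 0 0 0 0 0 0");
        ProcStatSample t1 = parse("cpu0 160 0 70 970 0 0 0 0 0 0");
        double[] pct = utilization(t0, t1);
        System.out.println("user=" + pct[0] + " system=" + pct[1] + " idle=" + pct[2]);
    }
}
```

Running the same calculation for every cpuN line gives a rough per-processor picture of whether one thread is hogging a processor or the load is spread evenly.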

CPU Scheduler Run Queue
The CPU scheduler run queue tells whether the system is being saturated with work. A run queue depth 3 or 4 times greater than the number of virtual processors over an extended time period should be considered an observation that requires immediate attention or action.
The first column of vmstat output reports the run queue depth. The number reported is the actual number of lightweight processes in the run queue. Java programmers can realize better performance by choosing more efficient algorithms or data structures. In other words, explore alternative approaches that need fewer CPU cycles to run the application, such as reducing garbage collection frequency or using algorithms that execute fewer CPU instructions for the same work.
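The run queue guideline above can be turned into a small check. This is only a sketch: the class and method names are hypothetical, the threshold factor of 3 is my choice from the "3 or 4 times" guideline, and a single vmstat row is just one sample, whereas the guideline is about an extended period.

```java
// Minimal sketch: compare the run queue depth from one vmstat row
// against the number of virtual processors.
final class RunQueueCheck {

    // Assumption: the lower end of the "3 or 4 times" guideline.
    static final int SATURATION_FACTOR = 3;

    /** Reads the first column ("r") of a vmstat data row. */
    static int runQueueDepth(String vmstatRow) {
        return Integer.parseInt(vmstatRow.trim().split("\\s+")[0]);
    }

    /** True when the run queue is at least SATURATION_FACTOR times the processor count. */
    static boolean needsAttention(int runQueueDepth, int virtualProcessors) {
        return runQueueDepth >= SATURATION_FACTOR * virtualProcessors;
    }

    public static void main(String[] args) {
        // Made-up vmstat data row; only the first column matters here.
        String row = " 9  0      0 812345  10240 204800    0    0     5     7  120  300 85 10  5  0  0";
        int processors = Runtime.getRuntime().availableProcessors();
        int depth = runQueueDepth(row);
        System.out.println("run queue depth " + depth + " on " + processors
                + " virtual processors, needs attention: " + needsAttention(depth, processors));
    }
}
```

In practice the check should hold across many consecutive samples before anyone is alarmed.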

In addition to CPU utilization, other system attributes should be monitored, such as memory paging or swapping activity, locking, voluntary and involuntary context switching, and thread migration activity. Swapping deserves particular attention, because a JVM-based application shows noticeable performance issues when the system is swapping.

Monitoring Memory Utilization on Linux
Swapping occurs when the applications running on the system consume more memory than there is physical memory available. To deal with this situation, a system is usually configured with an area called swap space. The vmstat tool can be used to monitor the system for swapping activity; the top command and the /proc/meminfo file can be used too. The vmstat columns to watch are "si" and "so", which represent the amount of memory paged in and the amount of memory paged out. In addition, the "free" column reports the amount of available free memory. It is important to notice when free memory is low and paging activity is high at the same time.
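As an illustration of reading the "si", "so" and "free" columns, here is a minimal sketch that looks them up by name in vmstat output. The class name and the sample rows are made up; the column names are the ones vmstat prints in its header.

```java
// Minimal sketch: pick columns out of vmstat output by their header name.
final class VmstatSwap {

    /** Returns the value under the named column for one vmstat data row. */
    static long column(String header, String row, String name) {
        String[] names = header.trim().split("\\s+");
        String[] values = row.trim().split("\\s+");
        for (int i = 0; i < names.length && i < values.length; i++) {
            if (names[i].equals(name)) {
                return Long.parseLong(values[i]);
            }
        }
        throw new IllegalArgumentException("no such column: " + name);
    }

    /** True when the row shows any memory being swapped in or out. */
    static boolean isSwapping(String header, String row) {
        return column(header, row, "si") > 0 || column(header, row, "so") > 0;
    }

    public static void main(String[] args) {
        // Made-up sample; in real use these lines would come from running vmstat.
        String header = " r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st";
        String row    = " 2  0  51200  10240   2048  40960    0  128    12    90  300  700 20  5 70  5  0";
        System.out.println("free=" + column(header, row, "free")
                + " swapping=" + isSwapping(header, row));
    }
}
```

Looking columns up by name rather than by position keeps the sketch working if a vmstat version prints an extra column.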

Monitoring Lock Contention on Linux
The pidstat command from the sysstat package can be used to monitor lock contention. "pidstat -w" reports voluntary context switches in the "cswch/s" column. (The numbers are for all virtual processors. The book has a nice formula to calculate the percentage of clock cycles wasted.) As a general guideline, 3% to 5% of clock cycles spent in voluntary context switches implies a Java application that may be suffering from lock contention.
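These notes do not reproduce the book's formula, so the following is only a sketch of how such a calculation could look. It assumes that a voluntary context switch costs roughly 80,000 clock cycles, which is an estimate and not a measured value; the class name and the sample numbers are made up.

```java
// Minimal sketch: estimate the share of clock cycles spent in
// voluntary context switches, given the "cswch/s" total from pidstat -w.
final class ContextSwitchCost {

    // Assumption: approximate cost of one voluntary context switch, in clock cycles.
    static final long ASSUMED_CYCLES_PER_SWITCH = 80_000L;

    /**
     * @param switchesPerSecond voluntary context switches per second, all virtual processors
     * @param virtualProcessors number of virtual processors
     * @param cpuHz             CPU clock frequency in cycles per second
     * @return estimated percentage of clock cycles wasted per virtual processor
     */
    static double percentCyclesWasted(double switchesPerSecond, int virtualProcessors, double cpuHz) {
        double perProcessor = switchesPerSecond / virtualProcessors;
        return 100.0 * perProcessor * ASSUMED_CYCLES_PER_SWITCH / cpuHz;
    }

    public static void main(String[] args) {
        // Made-up sample: 3500 switches/s on 2 virtual processors at 3.0 GHz.
        double wasted = percentCyclesWasted(3500, 2, 3.0e9);
        System.out.println("Estimated cycles wasted: " + wasted + "%");
    }
}
```

With those sample numbers the estimate falls inside the 3% to 5% band, which is the range the guideline treats as a sign of possible lock contention.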

Monitoring Network I/O Utilization on Linux
Linux has the netstat command line tool (on most distributions it is part of the net-tools package rather than sysstat). The netstat tool does not report network utilization. It reports the number of packets sent and received per second, along with errors and collisions. That is, you can tell that network traffic is occurring, but whether the network is 100% utilized or 1% utilized is difficult to tell from netstat output.
The nicstat tool provides more meaningful data in its %Util column (the network interface utilization). On Linux, its source code has to be downloaded and compiled before the tool can be used.
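To show what a utilization percentage means here, the sketch below computes one from throughput and link speed. It is my own illustration and not necessarily how nicstat computes %Util: it assumes a full-duplex link, so the busier of the two directions is compared against the link speed. All names and numbers are made up.

```java
// Minimal sketch: network interface utilization as a percentage of link speed.
final class NicUtilization {

    /**
     * @param rxBytesPerSec  bytes received per second
     * @param txBytesPerSec  bytes sent per second
     * @param linkBitsPerSec link speed in bits per second
     */
    static double percentUtil(double rxBytesPerSec, double txBytesPerSec, double linkBitsPerSec) {
        // Assumption: full duplex, so only the busier direction limits the link.
        double busiestDirectionBits = Math.max(rxBytesPerSec, txBytesPerSec) * 8.0;
        return 100.0 * busiestDirectionBits / linkBitsPerSec;
    }

    public static void main(String[] args) {
        // Made-up sample: 50 MB/s in, 12.5 MB/s out, on a 1 Gbit/s link.
        System.out.println("Utilization: " + percentUtil(50_000_000, 12_500_000, 1.0e9) + "%");
    }
}
```

This is exactly the number that packet counts alone cannot give, because it needs both byte throughput and the link speed.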

Monitoring Disk I/O Utilization on Linux
Disk utilization is the most useful monitoring statistic for understanding an application's disk usage (writing to logs, accessing a database, etc.). It is a measure of active disk I/O time. The iostat command line tool can be used to monitor disk utilization.

If improved disk utilization is required, several strategies may help.

At hardware and OS level:
1. Using a faster storage device
2. Spreading file systems across multiple disks
3. Tuning the OS to cache larger amounts of file system data structures

At application level:
1. Reducing the number of read and write operations by using buffered input and output streams or by caching data structures in the application. Buffered data structures are available in the JDK and can easily be used.
2. Using non-blocking Java NIO (for example via the Grizzly project) instead of blocking I/O
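As a small illustration of the first application-level strategy, here is a sketch that writes and reads a file through the JDK's buffered streams. The class name, file name prefix and byte counts are made up; the stream classes are the standard java.io ones.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal sketch: many tiny reads and writes, absorbed by in-memory buffers.
final class BufferedCopy {

    /** Writes count single bytes; the buffer batches them into far fewer OS writes. */
    static void writeBytes(Path file, int count) throws IOException {
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(file))) {
            for (int i = 0; i < count; i++) {
                out.write(i & 0xFF);
            }
        } // closing the stream flushes whatever is still in the buffer
    }

    /** Reads the file one byte at a time; the buffer refills in large chunks. */
    static long countBytes(Path file) throws IOException {
        long n = 0;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            while (in.read() != -1) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("buffered-demo", ".bin");
        try {
            writeBytes(tmp, 100_000);
            System.out.println("bytes read back: " + countBytes(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Without the Buffered wrappers, each single-byte read or write could turn into its own request to the operating system, which is the kind of disk activity this strategy tries to cut down.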

SAR Command Line Tool on Linux
With sar, you can select which data to collect, such as user CPU utilization, system or kernel CPU utilization, number of system calls, memory paging and disk I/O statistics. Observing data collected over a longer period of time can help identify trends that may provide early indications of pending performance concerns.

My notes for chapter 2 are over :). I read the man pages for more information. Tomorrow I will share my notes for chapter 3.

Happy Reading