Showing posts with label Java performance. Show all posts

Tuesday, July 24, 2012

JMeter is not a good tool for finding requests per second

I played around with performance tools a lot in my last project, and this was one blog post I never got to finish. My goal was to come up with the best tool for benchmarking an application on Tomcat. While working with JMeter, all the numbers were low compared to the AB tool, so I suspected Tomcat, started reading the Tomcat performance tuning book, and found the lines below.

(Page 136 - Chapter 4 - Tomcat Performance Tuning) - JMeter's HTTP client is slower than that of AB or siege. You can use JMeter to find out whether a change to your webapp, your Tomcat installation, or your JVM accelerates or slows the response times of a web page; however, you cannot use JMeter to determine the server's maximum number of requests per second that it can successfully serve, because JMeter's HTTP client appears to be slower than Tomcat's server code.


http://books.google.com/books?id=vJttHyVF0SUC&pg=PA136&lpg=PA136&dq=%22maximum+number+of+requests+per+second%22++jmeter&source=bl&ots=i-5yv-tLh0&sig=0JnmfPtq2PwKWdEDkfSamUQMBEg&hl=en&sa=X&ei=6L0QT-3DGc6lsQKmio2LBA&ved=0CDIQ6AEwAg#v=onepage&q=%22maximum%20number%20of%20requests%20per%20second%22%20%20jmeter&f=false

I thought it was a really valuable find :)
Manisha 

Friday, June 1, 2012

Java Performance- Cookbook Part II

This is the continuation of my previous blog post. In the last post I shared my Chapter 2 notes, and here we go with Chapter 3.

This chapter provides an overview of the HotSpot Java Virtual Machine (VM) architecture and its major components. There are three major components of the HotSpot VM:
1) VM Runtime
2) JIT Compiler
3) Memory Manager (Garbage Collector)

HotSpot VM High Level Architecture

The HotSpot VM's JIT compiler, Client or Server, is pluggable, as is the choice of garbage collector: Serial, Throughput (Parallel), Concurrent (CMS), or G1. The VM also has a runtime (the HotSpot VM Runtime) which provides services and common APIs to the JIT compilers and the garbage collector. The runtime also provides basic functionality to the VM such as a launcher, thread management, the Java Native Interface, and so on.

The VM yields high performance through dynamic optimization. In other words, it makes optimization decisions while the Java application is running and generates high-performing native machine instructions targeted at the underlying system architecture.

32-bit VMs are limited to 4 GB of memory. It is important to note that the actual Java heap space available to a 32-bit VM may be further limited depending on the underlying OS. For instance, on Windows the maximum Java heap available to the VM is around 1.5 GB; on Linux it is around 2.5 to 3.0 GB. The actual maximums vary due to the memory address space consumed by both a given Java application and the JVM version.

A 64-bit VM allows these systems to utilize more memory. However, with 64-bit VMs comes a performance penalty due to an increase in the size of the internal HotSpot VM representation of Java objects, called ordinary object pointers (oops), whose width increases from 32 bits to 64 bits.

The increase in width results in fewer oops fitting on a CPU cache line. The decrease in CPU cache efficiency results in roughly 8% to 15% performance degradation compared to a 32-bit JVM. A feature called compressed oops, enabled with the -XX:+UseCompressedOops VM command line option, can yield 32-bit-JVM-like performance with the benefit of larger 64-bit Java heaps. Java applications can realize better performance with a 64-bit VM using compressed oops than they achieve with a 32-bit VM, because the application executes with improved CPU cache utilization.
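You can actually check whether compressed oops are on from inside a running VM. A minimal sketch of mine, assuming a HotSpot JVM (the class name is made up; it uses the com.sun.management diagnostic bean, which the book does not show):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class CompressedOopsCheck {
    // Returns the current value of a HotSpot VM option, e.g. "UseCompressedOops".
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        // On 64-bit HotSpot with a heap below ~32 GB this is typically "true".
        System.out.println("UseCompressedOops = " + vmOption("UseCompressedOops"));
    }
}
```

Run it once with the flag and once with -XX:-UseCompressedOops to see the setting change.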

HotSpot VM Runtime

The VM Runtime encompasses many responsibilities, including parsing of command line arguments, the VM life cycle, class loading, the byte code interpreter, exception handling, synchronization, thread management, and so on.

a) Command Line Options
The VM Runtime parses command line options and configures the HotSpot VM based on them. There are three main categories of command line options.

1) Standard Options - These options are expected to be accepted by all JVM implementations and to remain stable between releases.
2) Non-standard Options - These options are not required to be supported by all JVMs and begin with an -X prefix.
3) Developer Options - These begin with an -XX prefix.
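A running VM can report back the options it was launched with through the standard RuntimeMXBean API. A small sketch of mine (the class name is made up):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.List;

public class JvmArgs {
    // Returns the -X / -XX (and other) options the VM was launched with,
    // not including the arguments passed to main().
    static List<String> inputArguments() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        return runtime.getInputArguments();
    }

    public static void main(String[] args) {
        // e.g. [-Xmx512m, -XX:+UseG1GC] when launched with those flags
        System.out.println(inputArguments());
    }
}
```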

b) VM Life Cycle
The HotSpot VM Runtime is responsible for launching and shutting down the HotSpot VM. The component that starts the VM is called the "launcher" (java, javaw, javaws (Java Web Start)). Below are the steps the launcher executes to start the VM.


1) Parse the command line options; options such as -client or -server determine which JIT compiler to load.
2) Establish the Java heap size and the JIT compiler type.
3) Establish environment variables such as LD_LIBRARY_PATH and CLASSPATH.
4) Look for the Main-Class name on the command line or in the JAR's manifest file.
5) Create the VM using the standard JNI method JNI_CreateJavaVM.
6) Once the VM is created and initialized, the Java Main-Class is loaded and the launcher gets the Java main method's attributes from it.
7) The Java main method is invoked in the VM using the JNI method CallStaticVoidMethod, passing it the marshaled arguments from the command line.

Once the Java program completes its execution, both the method's and the program's exit status must be passed back to their caller. The Java main method's thread is detached from the VM using the JNI method DetachCurrentThread, and the thread count decrements so that JNI can safely invoke the DestroyJavaVM method.
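The shutdown end of the life cycle can be observed from plain Java with a shutdown hook; registered hooks run after main returns, during the DestroyJavaVM sequence. A small sketch (class and method names are mine):

```java
public class ShutdownDemo {
    // Registers a hook thread that the VM runs during its shutdown sequence.
    static Thread registerHook() {
        Thread hook = new Thread(() -> System.out.println("VM shutting down"));
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        registerHook();
        // When main returns, the launcher detaches the main thread and the
        // shutdown hooks run before DestroyJavaVM completes.
        System.out.println("main done");
    }
}
```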

Sorry, I could not finish typing my notes. There is a lot more important stuff I would love to remember from this chapter. More to come soon.

Manisha







Thursday, May 10, 2012

Java Performance- Cookbook Part I

I was involved in lots of performance-related issues, so I thought of reading the "Java Performance" book to serve my clients better.
The Java Performance book is nice, but umpteen times I felt it covers lots of things very superficially; a link for further reading would have been nicer in my situation. Overall, thumbs up, and I would recommend every Java developer have a copy on his/her bookshelf.

Things which I would like to remember (my notes) from Chapter 2 - Operating System Performance.

It's very important to understand the concepts before jumping deeper, i.e., what performance monitoring, performance profiling, and performance tuning are. We will also look a little bit into operating system monitoring tools, both command line and GUI.

Performance Monitoring
Performance monitoring is the act of collecting or observing performance data from an application that has performance issues, when there is not yet sufficient information or clues to the potential root cause. It can be performed in any environment.

Performance Profiling
Performance profiling is similar to monitoring but with a narrower focus. Profiling is rarely done in a production environment. It is often an act that follows a monitoring activity that indicated some kind of performance issue.

Performance Tuning
Performance tuning is the act of changing tune-ables, source code, or configuration attribute(s) for the purpose of improving application responsiveness or throughput.

To reach the highest performance we need to understand how CPU resources are utilized. Our goal should be to reduce kernel or system CPU utilization as much as possible; high kernel or system CPU utilization can be an indication of shared resource contention or a large number of interactions between I/O devices.

Stall
A stall is a scenario where the CPU is reported as being utilized even though it may be waiting for data to be fetched from memory. Stalls occur any time the CPU executes an instruction and the data being operated on is not readily available in a CPU register or cache. For applications that are compute intensive, observe IPC (instructions per clock: the number of CPU instructions per CPU clock cycle) or CPI (cycles per instruction: the number of CPU clock cycles per CPU instruction) to estimate how much of the time is spent waiting. The goal is to reduce the number of stalls, or improve the CPU's cache utilization, so fewer CPU clock cycles are wasted waiting for data to be fetched from memory.
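Stalls are easy to provoke on purpose. The sketch below (my own example, not from the book) sums the same 2D array twice; the column-major pass touches memory in a cache-hostile order, so it usually burns noticeably more cycles for the exact same result:

```java
public class StallDemo {
    static final int N = 2048;

    // Row-major traversal: consecutive elements share cache lines, few stalls.
    static long sumRowMajor(int[][] a) {
        long sum = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

    // Column-major traversal: each access jumps to a different row's array,
    // defeating the cache and forcing the CPU to wait on memory.
    static long sumColMajor(int[][] a) {
        long sum = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += a[i][j];
        return sum;
    }

    public static void main(String[] args) {
        int[][] a = new int[N][N];
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                a[i][j] = 1;

        long t0 = System.nanoTime();
        long s1 = sumRowMajor(a);
        long t1 = System.nanoTime();
        long s2 = sumColMajor(a);
        long t2 = System.nanoTime();

        // Same sums, but the column-major pass usually takes longer.
        System.out.printf("row-major: %d ms, col-major: %d ms (sums %d/%d)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, s1, s2);
    }
}
```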

Let's look into a few monitoring tools for Linux. The book describes tools for other operating systems too, so if you need them please reference the book :).

Monitoring CPU utilization on Linux
The GNOME System Monitor tool can be used to monitor CPU utilization graphically; it can be launched with the "gnome-system-monitor" command. A few Linux distributions may also include the xosview tool. One of the additional features of xosview is that CPU utilization is further broken down into user CPU, kernel or system CPU, and idle CPU.

Linux also provides the vmstat and mpstat command line tools to monitor CPU utilization.

VMSTAT
vmstat reports a summary of CPU utilization across all virtual processors (the number of virtual processors is the number of hardware threads on the system; it is also the value returned by the Java API Runtime.availableProcessors()), with data collected since the system was last booted.
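For reference, the virtual processor count that vmstat aggregates over is one line of Java (the class name is mine):

```java
public class VirtualProcessors {
    // The count vmstat aggregates across: hardware threads visible to the OS.
    static int count() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("virtual processors: " + count());
    }
}
```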

MPSTAT
mpstat offers a tabular view of CPU utilization for each virtual processor. Most Linux distributions require installation of the sysstat package to use mpstat. Using mpstat to observe per-virtual-processor CPU utilization can be useful in identifying whether an application has threads that tend to consume larger percentages of CPU cycles than other threads, or whether application threads tend to utilize the same percentage of CPU cycles. The latter behavior usually suggests an application that may scale better.

CPU Scheduler Run Queue
The CPU scheduler run queue tells whether the system is being saturated with work. A run queue depth 3 or 4 times greater than the number of virtual processors over an extended time period should be considered an observation that requires immediate attention or action.
The first column of vmstat output reports the run queue depth; the number reported is the actual number of lightweight processes in the run queue. Java programmers can realize better performance by choosing more efficient algorithms or data structures. In other words, explore alternative approaches that result in fewer CPU cycles necessary to run the application, such as reducing garbage collection frequency or alternative algorithms that require fewer CPU instructions to execute the same work.
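To make the "more efficient data structure" point concrete, here is a toy comparison of mine (not from the book) between a linear-scan lookup and a hash-based one; the hashed version needs far fewer CPU instructions per query:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LookupDemo {
    // Membership test by linear scan: O(n) CPU instructions per query.
    static boolean linearLookup(List<Integer> list, int key) {
        return list.contains(key);
    }

    // Membership test by hashing: roughly O(1) instructions per query.
    static boolean hashedLookup(Set<Integer> set, int key) {
        return set.contains(key);
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) list.add(i);
        Set<Integer> set = new HashSet<>(list);

        long t0 = System.nanoTime();
        for (int i = 0; i < 200; i++) linearLookup(list, 99_999);
        long t1 = System.nanoTime();
        for (int i = 0; i < 200; i++) hashedLookup(set, 99_999);
        long t2 = System.nanoTime();

        // Same answers, far fewer CPU cycles for the hash-based lookups.
        System.out.printf("list: %d us, set: %d us%n",
                (t1 - t0) / 1_000, (t2 - t1) / 1_000);
    }
}
```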

In addition to CPU utilization, there are attributes of the system's memory that should be monitored, such as paging or swapping activity, locking, voluntary and involuntary context switching, and thread migration activity. Monitoring swapping activity, in particular, can reveal performance issues in a JVM-based application.

Monitoring Memory Utilization on Linux
Swapping occurs when more memory is being consumed by the applications running on the system than there is physical memory available. To deal with this situation, a system is usually configured with an area called swap space. The vmstat tool can be used to monitor the system for swapping activity; the top command and the /proc/meminfo file can be used too. The columns to monitor in vmstat are "si" and "so", which represent the amount of memory paged in and the amount of memory paged out. In addition, the "free" column reports the amount of available free memory. It's important to observe free memory when it is low and high paging activity is occurring at the same time.

Monitoring Lock Contention on Linux
The pidstat command from the sysstat package can be used to monitor lock contention. "pidstat -w" reports voluntary context switches in the "cswch/s" column (the numbers are for all virtual processors; the book has a nice formula to calculate the percentage of clock cycles wasted). As a general guideline, 3% to 5% of clock cycles spent in voluntary context switches implies a Java application that may be suffering from lock contention.

Monitoring Network I/O Utilization on Linux
Linux bundles the netstat command line tool. The netstat tool does not report network utilization; it reports the number of packets sent and received per second along with errors and collisions. That is, you can tell that network traffic is occurring, but whether the network is 100% utilized or 1% utilized is difficult to tell from netstat output.
The nicstat tool provides meaningful data in its %Util column (the network interface utilization). We have to download and compile the source code before being able to use it on Linux.

Monitoring Disk I/O Utilization on Linux
Disk utilization is the most useful monitoring statistic for understanding an application's disk usage (writing to logs, accessing a database, etc.). It is a measure of active disk I/O time. The iostat command line tool can be used to monitor disk utilization.

If improved disk utilization is required, several strategies may help.

At the hardware and OS level:
1) A faster storage device
2) Spreading file systems across multiple disks
3) Tuning the OS to cache larger amounts of file system data structures

At the application level:
1) Reduce the number of read and write operations using buffered input and output streams, or cache data structures in the application. Buffered data structures are available in the JDK and can easily be utilized.
2) Use non-blocking Java NIO (e.g., the Grizzly project) instead of blocking java.net.Socket.
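Point 1 above can be sketched with the JDK's buffered streams; the buffering turns a syscall-per-byte copy loop into a handful of large reads and writes (the file names and class name here are mine):

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class BufferedCopy {
    // Copies a file byte by byte, but through 8 KB buffers, so the OS sees
    // a few large read/write system calls instead of one per byte.
    static void copy(Path src, Path dst) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(src));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(dst))) {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b); // buffered: no syscall per byte
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("src", ".txt");
        Path dst = Files.createTempFile("dst", ".txt");
        Files.writeString(src, "hello disk I/O");
        copy(src, dst);
        System.out.println(Files.readString(dst));
    }
}
```

Wrapping the same loop around unbuffered streams would issue one read and one write syscall per byte, which is exactly the disk-utilization pattern iostat would flag.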

SAR Command Line Tool on Linux
With sar, you can select which data to collect, such as user CPU utilization, system or kernel CPU utilization, number of system calls, memory paging, and disk I/O statistics. Observing data collected over a longer period of time can help identify trends that may provide early indications of pending performance concerns.

My notes for Chapter 2 are over :). I read the man pages for more information. Tomorrow I will share notes for Chapter 3.

Happy Reading
Manisha