Investigating CPU usage in a Java Process
1. Determine the process ID (PID) of the affected server process using the following command:
$ jps –v
2. Determine which thread in the process identified in step 1 is consuming the CPU:
$ top -n 1 -H -p [pid]
Output:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 20508 appuser 20 0 21.0g 18.4g 25220 S 56.2 30.8 0:40.99 http-bio-8080-e 23771 appuser 20 0 21.0g 18.4g 25220 S 50.0 30.8 7:35.04 http-bio-8080-e 23774 appuser 20 0 21.0g 18.4g 25220 S 43.8 30.8 8:33.02 http-bio-8080-e 9138 appuser 20 0 21.0g 18.4g 25220 R 37.5 30.8 60:31.64 GCThread#0 9492 appuser 20 0 21.0g 18.4g 25220 R 31.2 30.8 60:25.19 GCThread#1 9494 appuser 20 0 21.0g 18.4g 25220 R 31.2 30.8 60:19.80 GCThread#3 9495 appuser 20 0 21.0g 18.4g 25220 R 31.2 30.8 60:24.35 GCThread#4 9497 appuser 20 0 21.0g 18.4g 25220 R 31.2 30.8 60:20.89 GCThread#6 9498 appuser 20 0 21.0g 18.4g 25220 R 31.2 30.8 60:26.13 GCThread#7 26531 appuser 20 0 21.0g 18.4g 25220 S 31.2 30.8 6:57.19 http-bio-8080-e 9145 appuser 20 0 21.0g 18.4g 25220 S 25.0 30.8 23:00.93 G1Refine#0 12515 appuser 20 0 21.0g 18.4g 25220 S 25.0 30.8 15:02.56 http-bio-8080-e 16481 appuser 20 0 21.0g 18.4g 25220 S 25.0 30.8 9:03.94 http-bio-8080-e 23791 appuser 20 0 21.0g 18.4g 25220 S 25.0 30.8 9:15.38 http-bio-8080-e 25573 appuser 20 0 21.0g 18.4g 25220 S 25.0 30.8 7:25.01 http-bio-8080-e 5121 appuser 20 0 21.0g 18.4g 25220 S 25.0 30.8 1:44.61 http-bio-8080-e 5962 appuser 20 0 21.0g 18.4g 25220 S 25.0 30.8 2:55.04 http-bio-8080-e 6933 appuser 20 0 21.0g 18.4g 25220 S 25.0 30.8 3:07.66 http-bio-8080-e 9496 appuser 20 0 21.0g 18.4g 25220 R 18.8 30.8 60:30.70 GCThread#5 23777 appuser 20 0 21.0g 18.4g 25220 S 18.8 30.8 8:33.61 http-bio-8080-e 5942 appuser 20 0 21.0g 18.4g 25220 S 18.8 30.8 1:48.79 http-bio-8080-e 20515 appuser 20 0 21.0g 18.4g 25220 S 18.8 30.8 0:11.44 http-bio-8080-e 9493 appuser 20 0 21.0g 18.4g 25220 R 12.5 30.8 60:27.59 GCThread#2
3. Produce a stack trace for the PID identified in step 1 using the following command:
$ jstack [PID] > threadstack.txt
4. Convert the process id (corresponding to the thread) identified as problematic in step 2 to a hex value. We picked the process that has spent lot of CPU cycles by using the "Time+" metric. In our example output, the thread ID decimal value 9138 converts to hex value 23B2. Here is a site that can be used to convert the values: https://www.rapidtables.com/convert/number/decimal-to-hex.html
5. Search the stack trace output for this hex value using grep.You are looking for a thread nid that matches this hex value:
$ cat jstack-output.txt | grep -i 2202
This command gives an output similar to this:"Concurrent Mark-Sweep GC Thread" prio=10tid=0x00007f1448029800 nid=0x23B2 runnable
In this example, we can see that the problem is a GarbageCollector (GC) thread, which suggests that GC is not properly tuned. However, the problematic thread could be non-gc thread as well and that denotes an logic with heavy cpu usage that needs to be optimized. Optimization could be to use better algorithm to perform the business logic.