Investigating CPU Contention In A Java Process

Sooner or later, most of us end up having to investigate CPU starvation. While it is easy to find the process consuming the most CPU cycles, it is harder to identify the specific thread within that process that is responsible for the load. Identifying the thread helps in taking corrective action on the logic causing it. In this article, I outline the procedure for finding the specific JVM thread that is using the CPU.
1. Determine the process ID (PID) of the affected server process using the following command:
$ jps -v
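The output lists one line per running JVM: the process ID, the short name of the main class, and the JVM arguments. The line below is purely illustrative (the PID and flags are made up, roughly what a Tomcat instance running G1 might print), not output from the incident described here:
9876 Bootstrap -Xms18g -Xmx18g -XX:+UseG1GC -Dcatalina.base=/opt/tomcat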
2. Determine which thread in the process identified in step 1 is consuming the CPU:
$ top -n 1 -H -p [pid]
Output:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20508 appuser 20 0 21.0g 18.4g 25220 S 56.2 30.8 0:40.99 http-bio-8080-e
23771 appuser 20 0 21.0g 18.4g 25220 S 50.0 30.8 7:35.04 http-bio-8080-e
23774 appuser 20 0 21.0g 18.4g 25220 S 43.8 30.8 8:33.02 http-bio-8080-e
9138 appuser 20 0 21.0g 18.4g 25220 R 37.5 30.8 60:31.64 GCThread#0
9492 appuser 20 0 21.0g 18.4g 25220 R 31.2 30.8 60:25.19 GCThread#1
9494 appuser 20 0 21.0g 18.4g 25220 R 31.2 30.8 60:19.80 GCThread#3
9495 appuser 20 0 21.0g 18.4g 25220 R 31.2 30.8 60:24.35 GCThread#4
9497 appuser 20 0 21.0g 18.4g 25220 R 31.2 30.8 60:20.89 GCThread#6
9498 appuser 20 0 21.0g 18.4g 25220 R 31.2 30.8 60:26.13 GCThread#7
26531 appuser 20 0 21.0g 18.4g 25220 S 31.2 30.8 6:57.19 http-bio-8080-e
9145 appuser 20 0 21.0g 18.4g 25220 S 25.0 30.8 23:00.93 G1Refine#0
12515 appuser 20 0 21.0g 18.4g 25220 S 25.0 30.8 15:02.56 http-bio-8080-e
16481 appuser 20 0 21.0g 18.4g 25220 S 25.0 30.8 9:03.94 http-bio-8080-e
23791 appuser 20 0 21.0g 18.4g 25220 S 25.0 30.8 9:15.38 http-bio-8080-e
25573 appuser 20 0 21.0g 18.4g 25220 S 25.0 30.8 7:25.01 http-bio-8080-e
5121 appuser 20 0 21.0g 18.4g 25220 S 25.0 30.8 1:44.61 http-bio-8080-e
5962 appuser 20 0 21.0g 18.4g 25220 S 25.0 30.8 2:55.04 http-bio-8080-e
6933 appuser 20 0 21.0g 18.4g 25220 S 25.0 30.8 3:07.66 http-bio-8080-e
9496 appuser 20 0 21.0g 18.4g 25220 R 18.8 30.8 60:30.70 GCThread#5
23777 appuser 20 0 21.0g 18.4g 25220 S 18.8 30.8 8:33.61 http-bio-8080-e
5942 appuser 20 0 21.0g 18.4g 25220 S 18.8 30.8 1:48.79 http-bio-8080-e
20515 appuser 20 0 21.0g 18.4g 25220 S 18.8 30.8 0:11.44 http-bio-8080-e
9493 appuser 20 0 21.0g 18.4g 25220 R 12.5 30.8 60:27.59 GCThread#2
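By default top sorts these rows by %CPU. If you want to keep a copy of this snapshot alongside the thread dump taken in the next step, top's batch mode (-b) writes a single iteration to a file instead of the interactive screen; [pid] is the same process ID from step 1:
$ top -b -n 1 -H -p [pid] > top-threads.txt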
3. Produce a stack trace for the PID identified in step 1 using the following command:
$ jstack [PID] > threadstack.txt
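A hot spot can move between threads, so a single dump is sometimes misleading. A simple loop captures a few snapshots to compare; the count of three dumps and the 10-second gap below are arbitrary choices, not requirements:
$ for i in 1 2 3; do jstack [PID] > threadstack.$i.txt; sleep 10; done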
4. Convert the thread ID identified as problematic in step 2 to a hexadecimal value. We picked the thread that has accumulated the most CPU time, using the TIME+ column. In our example output, the thread ID decimal value 9138 converts to hex value 23B2. A site such as https://www.rapidtables.com/convert/number/decimal-to-hex.html can be used for the conversion, or you can do it directly in the shell as shown below.
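For example, the shell's printf can convert the thread ID 9138 from our output (the lowercase result is fine, since the grep in step 5 is case-insensitive):
$ printf '%x\n' 9138
23b2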
5. Search the stack trace output for this hex value using grep. You are looking for a thread whose nid (native thread ID) matches this hex value:
$ grep -i 23b2 threadstack.txt
This command gives an output similar to the following:
"Concurrent Mark-Sweep GC Thread" prio=10 tid=0x00007f1448029800 nid=0x23B2 runnable
In this example, we can see that the problem is a garbage collection (GC) thread, which suggests that GC is not properly tuned. However, the problematic thread could just as well be a non-GC thread, in which case it points to application logic with heavy CPU usage that needs to be optimized, for example by choosing a better algorithm for the business logic.
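Putting it all together, here is a minimal shell sketch of steps 2 through 5, assuming PID holds the process ID found in step 1. It picks the thread with the most accumulated CPU time, mirroring the TIME+ heuristic used in step 2; note that the -L, lwp, and --sort options are from procps-ng ps, so verify them on your platform:
$ TID=$(ps -L -p "$PID" -o lwp= --sort=-time | head -n 1 | tr -d ' ')
$ NID=$(printf '%x' "$TID")
$ jstack "$PID" > threadstack.txt
$ grep -i "nid=0x$NID" threadstack.txt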