I was contacted recently about a 64-core server that, with no workload, had one core (ordinal 34) running at very high CPU utilization. On closer inspection, all of that CPU time was being used by the process ‘System (4)’.
So that’s a fun one. I had them collect an xperf trace using the following command:
xperf -on base+cswitch+disk_io_init+latency+dispatcher -stackwalk threadcreate+readythread+cswitch+profile -f kernel.etl -buffersize 1024 -maxbuffers 1024 -maxfile 1024 -filemode circular
and then stop the trace after capturing a few minutes of the high CPU at idle.
I then opened the kernel.etl file in xperfview.exe and got to work.
I was looking for what System was doing on core 34, so I had some hints to get started.
I first verified in “CPU Sampling by CPU” that core 34 was in fact hitting 100% CPU:
I then drilled down in “CPU Sampling by Thread” to see which thread it was:
Turns out it was thread 372.
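The drill-down above is really just two group-by steps over the profiler samples: first by CPU, then by thread on the busiest CPU. Here is a rough Python sketch of that idea. It is not how xperfview works internally, and the record layout and sample counts are made up for illustration; only core 34, thread 372, and the module names come from this trace.

```python
from collections import Counter

# Hypothetical sampled-profile records: (cpu, thread_id, module).
# Each tuple stands for one profiler sample. The counts are invented
# for illustration and are NOT taken from the real trace.
samples = (
    [(34, 372, "storport")] * 40
    + [(34, 372, "ntoskrnl")] * 35
    + [(34, 372, "wmiprov.dll")] * 20
    + [(34, 8, "ntoskrnl")] * 5
    + [(0, 1204, "ntoskrnl")] * 3
    + [(12, 988, "ntoskrnl")] * 2
)


def busiest_cpu(samples):
    """Return the CPU with the most samples (the 'by CPU' view)."""
    return Counter(cpu for cpu, _, _ in samples).most_common(1)[0][0]


def busiest_thread(samples, cpu):
    """Return the thread with the most samples on one CPU (the 'by Thread' view)."""
    return Counter(tid for c, tid, _ in samples if c == cpu).most_common(1)[0][0]


def module_breakdown(samples, cpu, tid):
    """Count samples per module for one thread on one CPU."""
    return Counter(m for c, t, m in samples if c == cpu and t == tid)


cpu = busiest_cpu(samples)
tid = busiest_thread(samples, cpu)
print(cpu, tid, module_breakdown(samples, cpu, tid).most_common())
```

The last step, breaking the hot thread down by module, is what points at a specific driver rather than just "System".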
This thread was busy in WmiApSrv.exe and wmiprov.dll, but a lot of its work was also being done in ntoskrnl and storport:
I looked at their version of storport.sys; it was the stock version for Windows Server 2008 R2 (not SP1). So I suggested they apply http://support.microsoft.com/kb/981208 to address known storport issues.
When that was applied, the problem went away and they were back in business.
Nice catch! Didn't know about xperf before this post, looks like a powerful tool. Txs. for the insight.