What is Windows Performance Analyzer all about and why should you use it instead of xperfview?

Consider the following trace in xperfview:

image

This should be familiar to you, it’s the DPC storm / storport doorbell that looks like this:

image

So how does this trace look in WPA from the Windows 8 ADK (RP edition)?

image

A little different, but the Dude isn’t closed minded here, so lets give it a whirl:

image

Observe.  With that LSI_SAS selected, we see the time frame on the top middle where the CPU was doing work in that module.  and See the stack?  Which view is more in-depth?  WPA is the frickin future man.  I was a doubting Thomas 100%.  But WPA takes xperfview to a whole ‘nother level of magic.

And the Dude believes!

Hats off to Michael Milirud and the Windows Performance Toolkit team.  Seriously folks.  Job well done.

 

Another example, why does WINLOGON INIT take so long?

image

image

image

image

 

How cool is that?  We can see CCMEXEC took 8 seconds, its simple to just scroll down the list and see exactly why WINLOGON INIT took so long, and triage it.

 

I LOVE IT!

 

I’m going to build some broken scenarios in my lab and do some more examples of how-tos…

When an Exchange Server doesn’t Exchange…

“Well the high sheriff, told his deputy, won’t you go out and bring my Lazarus?”

Why am I quoting the Po Lazarus tune, the opening song of “O Brother Where Art Thou?” when I’m supposed to be talking about the Exchange Server that doesn’t?  It’s the Chewbacca defense!  This Exchange Server is so hosed I can comfortably quote an old folk song instead of talking about the server…

Ok ok, I’ll talk about the server:

This server is a Windows Server 2008 R2 SP1 Server running 10G E cards to talk to storage and it performs like it shouldn’t.

image

 

And Holy Moly!  DPCs consume more CPU than any one thread on the box.  Googly moogly!  We’ve got a problem here.  But why?  What does it mean?

Right Click this graph and select Summary Table:

image

Here we go, our DPCs are in SYSTEM (4), module elx_octeamvlan.sys.  But wait, there’s more, why?

image

Seems this driver in SYSTEM is spending a lot of DPC time on cores 6, 0 and 4.  Odd.  Lets see what else we can find to help them write a better driver:

DPCs are high, way too high:

image

Observe, DPC count is low on 6/4/0 cores, but waits are um, not low:

image

Huh, lets see what it is (symbols didn’t resolve sorry, but its NDIS.  The Driver / Hardware is a 10G E adapter:

image

Same function call in each of the three cores, lots of wait times.  We’re having trouble with the drivers implementation of how they talk on the network via NDIS.  They are aware and I believe have already fixed the problem.  Woot!  Another happy customer.

Ladies and Gentleman (and low lifes), it is with outstanding pleasure tonight that I am able to present to you, the reader, with an xperf trace….

For those who didn’t yet know, the Dude is a fan of old school rap.  So one of these days, not much to do, I was chillin with my old school crew and I ran into a slow boot scenario…

(Yes, I am in fact listening to Mona Lisa by Slick Rick the Ruler!)

So I’m not picky, let me share this trace out…

image

Not a bad looking trace, what’s the problem the reader may ask him/herself?  Lets scroll down a bit…

image

Whoop whoop, that’s the sound of the police!  Why is our storage basically at 100% util through the whole trace?  The disk light on this laptop never turns off through the boot up.  And look at this:

image

So don’t say this, don’t say that, change your lyrics….Post Boot never finishes on this trace, what gives?

image

Doh, Chill Rob G may have the Power, but ReadyBoot doesn’t….

Lets get a summary table on this.

image

Look at that chart.  We miss 602 MB of ReadyBoot data and hit 415 MB of it.  Who here things a less than 50% Readyboot cache hit rate is acceptable?  No one?  Good, you’re right!

This is a classic case for requiring the ReadyBoot hotfix that says if 99 restore points are created you might run into problems.  It fixes this, as Mark pointed out a while back…http://support.microsoft.com/kb/2555428.

Yet another xperf analysis post

Pump up the volume, Pump up the volume…

Slow Boot:

image

Yep, that’s slow.  Clearly not in CPU usage though…what about disk?

image

Nope (the dude knows nothing about this trace except he (when looking at it the first time) named it ROFLCOPTER)….

So basically the system hangs at boot from 30 second mark through to 350 seconds.  UNACCEPTABLE SIRS!

Why?  I’m guessing WMI, but what do I know…

image

Huh.  Appears I know something.  I might be psychic in fact.  I BET you, these 2 WMI calls, well, 3 calls, are from GPClient…

image

And there we have it.  GPClient killing the box.  Sad smile  I’m also listening to the Blues Brothers, so what do you want for nothing?  A Rubber Biscuit!?

The dude does xperf…

So the Dude (that’s me) has gone over how to collect xperf traces a few times now, but what do we do after we’ve collected a solid .etl file you may ask?  Well, that’s what this post is all about.  This is how the Dude does it.  The Dude you see is listening to Simple Man by Lynyrd Skynyrd right now, in a loop while flying over the Atlantic on the way to TechEd Amsterdam, so he’s in a mentoring mood…

Ok, so the Dude opens his traces in xperfview from the Windows Performance Toolkit, from the Windows 8 RP ADK.  With me so far?  Good. 

 

So I open this trace for a slow boot machine in xperfview.exe, and it looks like this:

image

We are, of course, looking for patterns, as Clint Huffman would say.  Patterns of analysis that lead us to reproducible solutions.  That’s the win of course.

So what do we see in these first two graphs?  Any patterns, anything that stands out?  Maybe in the second graph we see that blue line running for a long time?  This trace, by the way, is of the Dude’s own machine, on a Seagate Momentus XT 500 GB drive.  So a boot time of 300 seconds – 120 for post boot delay = 180 seconds to get to the desktop.  Completely unacceptable.  The Dude is a busy man!

So what’s going on here?  Lets overlay the disk activity instead of just showing it to you…

image

Is it natural and expected during boot to see disk idle while CPU grinds on an SVCHOST (I’m having a hard time screen shotting it to show you the highlight of that process, but here’s the thread in question in the Thread Activity:

image

That’s a bit more damning isn’t it?  So what is that thread up to, right click on CPU Sampling by Thread and Select Summary Table and we’ll find out:

image

Ok, there’s our culprit, SVCHOST.EXE 1228, so we expand him:

image

This SVCHOST is spending all his time (and most of the time of CPU Scheduling on the box except that pesky IDLE process) grinding on REPDRVFS.DLL.  Anyone have their bell rung on that one?  It’s one of the classic SBSL hotfixes, http://support.microsoft.com/kb/2617858.

The Dude saw 2 members of Public Enemy at the Airport today, they still rock Europe.  The Dude knows what time it is!

PAL 2.3 with Multi-Threading!

PAL (http://pal.codeplex.com/) is a favorite tool of mine, written by Clint Huffman.  I’ve written about it a few times but maybe you weren’t aware that the newest release has multi-threading capability?

It’s no in your face though really, just an option on the last page, the Execute page:

 

image

 

If you assign more than you have cores, your machine will be pretty unusable during the processing of a BLG, but it cuts time down significantly on complex traces.  Give it a shot!

Introducing Windows Server 2012 Free e-book available

I highly recommend giving this a read so you are ready for Windows Server 2012.  The dude had a hand in proofing the book and even has a sidebar on page 25 on performance monitoring.  Give it a read, rumor has it there may be some floating around at TechEd Orlando and Amsterdam as well…

PDF is homed on this page http://www.microsoft.com/en-us/server-cloud/new.aspx and the direct link is here:  http://go.microsoft.com/FWLink/?Linkid=251464

Enjoy!

Jeff