What does a good boot look like (aka, what should I be happy with)?

It’s a question I wasn’t prepared for in class last week, but one that made sense really.  For the IT Pro that doesn’t eat breath and sleep this stuff, what does a good or ‘fair’ trace look like?

Something like this:

image

What we are looking at here is a boot up that finishes before 45 seconds, with post boot delay quieting in the 60-ish range.  Not bad.  I like getting to a useable desktop in less than a minute, and this boot delivers that.

Do we see any boogeymen here?

image

Or here?

image

Nope, not yet.

image

Nah, not really.  This is a healthy boot.

 

Why?

 

CPU doesn’t spike and/or ride high through boot.  Disk doesn’t ride at 100% through it, so we’re not exceeding storage’s ability to deliver with the boot demand.  DPCs are low, we don’t have a bad driver, etc.

This is a good boot.

How to collect a trace for audio or video problems in Windows 7

Assume the following:  You have a Windows 7 host that you want to collect a trace from.  The user complains of audio issues, stuttering, latency, etc…or video frame rate is low.  Something annoying.

Like my previous post, lets cover a few basic rules here as we get started:

1.  If host = Windows 7 AND bitness = amd64 THEN Set DisablePagingExecutive to 1 and reboot:

http://technet.microsoft.com/en-us/library/cc959492.aspx

2.  Make sure the user account we want to trace is local administrator, even temporarily.

 

After we have that, install the Windows 8 ADK on the target machine, or copy the Windows Performance Toolkit from a machine it has already been installed on onto our target machine.

(We can install by running ADK Setup and deselecting EVERYTHING except Windows Performance Toolkit, by the way.)

installADK-WPT_thumb1

 

So, its there, somewhere.

 

1.  Run WPRUI elevated/as administrator

 

wprui1_thumb1

 

2.  Click More Options on the bottom left, revealing the window that looks like this:

 

image_thumb3

3.  For audio and video glitches that are easy to reproduce, check the scenario you are reproducing in the scenario analysis area.  Change Logging Mode to File based and hit start.

image

 

4.  Click “Start” and the reproduce the issue.  The window will look like this while you do so:

image

5.  When it reproduces, click Save and save the file off, review in xperfview or Windows Performance Analyzer to determine the cause of the glitches, probably DPCs from usbaudio drivers, but what do I know….

“But way Dude!  What if this isn’t easy to reproduce?” you may ask….

Step….

6.  If this is not easy to reproduce, get setup to collect a trace as above, but don’t use WPRUI.

Instead, elevate a command prompt, go to the root of a drive, I’ll use C: for the example, and do the following after you have DisablePagingExecutive set and WPT installed…make a trace directory and cd to it.

xperf -on dispatcher+latency+drivers -stackwalk readythread+threadcreate+cswitch+profile -f C:\trace\xperftrace.etl -minbuffers 1024 -maxbuffers 1024 -maxfile 512 -filemode circular

Then let it run in the background while you dork around trying to reproduce the issue.  Once it hits, simply do the following:

Xperf -d C:\trace\results.etl

Now you can open results.etl in xperfview.exe or Windows Performance Analyzer and look for DPCs and so forth that might be causing the issue….

Enjoy!

How to collect a good boot trace on Windows 7

Assume the following:  You have a Windows 7 host that you want to collect a trace from.  A good trace.  One that you know other people will be able to decipher as well as yourself.  Maybe I’ve asked you to collect a boot trace so I can look at it and pointed you to this blog.  Maybe your Sherpa of IT has decided you should learn this and you are doing it to learn….

(edited 11-2)

[You may also use xperf’s xbootmgr with a syntax similar to this:

xbootmgr -trace boot -traceflags base+latency+dispatcher -stackwalk profile+cswitch+readythread+threadcreate -notraceflagsinfilename -postbootdelay 30

]

 

In any event, you have a Windows 7 host.

Lets cover a few basic rules here as we get started:

1.  If host = Windows 7 AND bitness = amd64 THEN Set DisablePagingExecutive to 1 and reboot:

http://technet.microsoft.com/en-us/library/cc959492.aspx

2.  Make sure the user account we want to trace is local administrator, even temporarily.

3.  Set AutoLogon up in the registry for this user so we don’t flub a password input and invalidate a trace with bogus data:

http://support.microsoft.com/kb/324737

 

After we have that, install the Windows 8 ADK on the target machine, or copy the Windows Performance Toolkit from a machine it has already been installed on onto our target machine. (link http://www.microsoft.com/en-us/download/details.aspx?id=30652)

(We can install by running ADK Setup and deselecting EVERYTHING except Windows Performance Toolkit, by the way.)

installADK-WPT

 

So, its there, somewhere.

 

1.  Run WPRUI elevated/as administrator

 

wprui1

 

2.  For a boot trace, click More Options on the bottom left, revealing the window that looks like this:

 

image

3.  For the boot trace, I would like to see CPU Usage, Disk I/O Activity and File I/O Activity.  I would like you to change the Performance Scenario to “Boot” and number of iterations to “1”, as so:

image

 

4.  Click “Start” and then type something into the box and select a convenient place to store your trace and then hit “Save” which will reboot your machine and collect the trace.

image

 

5.  Let it reboot, let it logon as the user you specified in the auto logon, let it count down the normal boot process and end with the ETL trace in the directory you specified.  Get me that trace, stat!  Or if you are doing this to learn, poke around in it in XperfView.exe and WPA.exe, two entirely different ways to view the data set.

Hope this helps, after I stand up a VM or two I’m going to do some WPA examples….

Windows 8 ADK solves GPO Logon Delay questions, film at 11….

Symptoms:  Logons take forever and you’ve collected an xbootmgr/WPR/xperf123/xperfui trace.  What GPO can it be?  Easy!

 

Open Windows Performance Analyzer from the Windows 8 ADK:

image

Open Said Offending Trace:

 

image

Expand…System Activity and then find the Generic Events, click and drag the windows so its under “Analysis” tab…

image

AR me eyes!  Relax, its ok, we’re going to bring some good order out of this chaos.  Click this button in the top right:

image

And no, I can’t draw straight lines….

This trace should now look like this:

 

image

Slide down on the Provider Name area until you locate “Microsoft-Windows-GroupPolicy”.  Right Click and select “Filter to selection:

image

 

Sanity ensues…Now click the Carrot on Task Name (right of Provider Name in the middle of the screenshot above:

Anyway, this view looks like this (still too busy):

image

Notice the diamonds at the top by the way?  Highlight WinStop on the top and it’ll take you to the bottom area for WinStop.  Observe when each one ends and also, look for the column CSEElapsed Time:

 

image

 

Because it took 360453 ms to complete:

Next expand the CSEExtension Name field

image

Easy as pie.

What is Windows Performance Analyzer all about and why should you use it instead of xperfview?

Consider the following trace in xperfview:

image

This should be familiar to you, it’s the DPC storm / storport doorbell that looks like this:

image

So how does this trace look in WPA from the Windows 8 ADK (RP edition)?

image

A little different, but the Dude isn’t closed minded here, so lets give it a whirl:

image

Observe.  With that LSI_SAS selected, we see the time frame on the top middle where the CPU was doing work in that module.  and See the stack?  Which view is more in-depth?  WPA is the frickin future man.  I was a doubting Thomas 100%.  But WPA takes xperfview to a whole ‘nother level of magic.

And the Dude believes!

Hats off to Michael Milirud and the Windows Performance Toolkit team.  Seriously folks.  Job well done.

 

Another example, why does WINLOGON INIT take so long?

image

image

image

image

 

How cool is that?  We can see CCMEXEC took 8 seconds, its simple to just scroll down the list and see exactly why WINLOGON INIT took so long, and triage it.

 

I LOVE IT!

 

I’m going to build some broken scenarios in my lab and do some more examples of how-tos…

When an Exchange Server doesn’t Exchange…

“Well the high sheriff, told his deputy, won’t you go out and bring my Lazarus?”

Why am I quoting the Po Lazarus tune, the opening song of “O Brother Where Art Thou?” when I’m supposed to be talking about the Exchange Server that doesn’t?  It’s the Chewbacca defense!  This Exchange Server is so hosed I can comfortably quote an old folk song instead of talking about the server…

Ok ok, I’ll talk about the server:

This server is a Windows Server 2008 R2 SP1 Server running 10G E cards to talk to storage and it performs like it shouldn’t.

image

 

And Holy Moly!  DPCs consume more CPU than any one thread on the box.  Googly moogly!  We’ve got a problem here.  But why?  What does it mean?

Right Click this graph and select Summary Table:

image

Here we go, our DPCs are in SYSTEM (4), module elx_octeamvlan.sys.  But wait, there’s more, why?

image

Seems this driver in SYSTEM is spending a lot of DPC time on cores 6, 0 and 4.  Odd.  Lets see what else we can find to help them write a better driver:

DPCs are high, way too high:

image

Observe, DPC count is low on 6/4/0 cores, but waits are um, not low:

image

Huh, lets see what it is (symbols didn’t resolve sorry, but its NDIS.  The Driver / Hardware is a 10G E adapter:

image

Same function call in each of the three cores, lots of wait times.  We’re having trouble with the drivers implementation of how they talk on the network via NDIS.  They are aware and I believe have already fixed the problem.  Woot!  Another happy customer.

Ladies and Gentleman (and low lifes), it is with outstanding pleasure tonight that I am able to present to you, the reader, with an xperf trace….

For those who didn’t yet know, the Dude is a fan of old school rap.  So one of these days, not much to do, I was chillin with my old school crew and I ran into a slow boot scenario…

(Yes, I am in fact listening to Mona Lisa by Slick Rick the Ruler!)

So I’m not picky, let me share this trace out…

image

Not a bad looking trace, what’s the problem the reader may ask him/herself?  Lets scroll down a bit…

image

Whoop whoop, that’s the sound of the police!  Why is our storage basically at 100% util through the whole trace?  The disk light on this laptop never turns off through the boot up.  And look at this:

image

So don’t say this, don’t say that, change your lyrics….Post Boot never finishes on this trace, what gives?

image

Doh, Chill Rob G may have the Power, but ReadyBoot doesn’t….

Lets get a summary table on this.

image

Look at that chart.  We miss 602 MB of ReadyBoot data and hit 415 MB of it.  Who here things a less than 50% Readyboot cache hit rate is acceptable?  No one?  Good, you’re right!

This is a classic case for requiring the ReadyBoot hotfix that says if 99 restore points are created you might run into problems.  It fixes this, as Mark pointed out a while back…http://support.microsoft.com/kb/2555428.

Yet another xperf analysis post

Pump up the volume, Pump up the volume…

Slow Boot:

image

Yep, that’s slow.  Clearly not in CPU usage though…what about disk?

image

Nope (the dude knows nothing about this trace except he (when looking at it the first time) named it ROFLCOPTER)….

So basically the system hangs at boot from 30 second mark through to 350 seconds.  UNACCEPTABLE SIRS!

Why?  I’m guessing WMI, but what do I know…

image

Huh.  Appears I know something.  I might be psychic in fact.  I BET you, these 2 WMI calls, well, 3 calls, are from GPClient…

image

And there we have it.  GPClient killing the box.  Sad smile  I’m also listening to the Blues Brothers, so what do you want for nothing?  A Rubber Biscuit!?

The dude does xperf…

So the Dude (that’s me) has gone over how to collect xperf traces a few times now, but what do we do after we’ve collected a solid .etl file you may ask?  Well, that’s what this post is all about.  This is how the Dude does it.  The Dude you see is listening to Simple Man by Lynyrd Skynyrd right now, in a loop while flying over the Atlantic on the way to TechEd Amsterdam, so he’s in a mentoring mood…

Ok, so the Dude opens his traces in xperfview from the Windows Performance Toolkit, from the Windows 8 RP ADK.  With me so far?  Good. 

 

So I open this trace for a slow boot machine in xperfview.exe, and it looks like this:

image

We are, of course, looking for patterns, as Clint Huffman would say.  Patterns of analysis that lead us to reproducible solutions.  That’s the win of course.

So what do we see in these first two graphs?  Any patterns, anything that stands out?  Maybe in the second graph we see that blue line running for a long time?  This trace, by the way, is of the Dude’s own machine, on a Seagate Momentus XT 500 GB drive.  So a boot time of 300 seconds – 120 for post boot delay = 180 seconds to get to the desktop.  Completely unacceptable.  The Dude is a busy man!

So what’s going on here?  Lets overlay the disk activity instead of just showing it to you…

image

Is it natural and expected during boot to see disk idle while CPU grinds on an SVCHOST (I’m having a hard time screen shotting it to show you the highlight of that process, but here’s the thread in question in the Thread Activity:

image

That’s a bit more damning isn’t it?  So what is that thread up to, right click on CPU Sampling by Thread and Select Summary Table and we’ll find out:

image

Ok, there’s our culprit, SVCHOST.EXE 1228, so we expand him:

image

This SVCHOST is spending all his time (and most of the time of CPU Scheduling on the box except that pesky IDLE process) grinding on REPDRVFS.DLL.  Anyone have their bell rung on that one?  It’s one of the classic SBSL hotfixes, http://support.microsoft.com/kb/2617858.

The Dude saw 2 members of Public Enemy at the Airport today, they still rock Europe.  The Dude knows what time it is!