Troubleshooting: Tracking down what executed updates in Windows 10

“Why did I get an update installed on my Windows 10 machine?”

At first blush, this sounds like an easy-peasy kinda question to answer. But in a managed enterprise, with multiple IT departments that might not always play nice together, this can be a prickly subject. Because the question really isn’t “Why did this update install?” It’s usually something more like the CEO asking, “Why did my PC slow down while I was in a board meeting presentation?”

And again, it’s a fair question. And should be easy. Except when it isn’t. Like today.

“Which management system we use caused an update to occur at this particular time point?”

For this, on Windows 10 I asked for the output of Get-WindowsUpdateLog from an administrative PowerShell prompt. But the output came back formatted weirdly. Here’s a snippet, with some info removed (dates and times made up):

2018/07/XX 5:55:03.0332383 149280 120000 Unknown( 13): GUID=a2b43708-af59-32cd-48bc-7cf111dee98e.
2018/07/XX 5:55:04.9808283 149280 120000 Unknown( 16): GUID=a2b43708-af59-32cd-48bc-7cf111dee98e.
2018/07/XX 5:55:04.9819252 149280 149372 Unknown( 19): GUID=1bce64d0-3b5c-3a28-bd28-0e6a0b1dc374.
2018/07/XX 5:55:08.2263039 4872 148044 Unknown( 12): GUID=26bba210-72d3-3f28-a89e-6bdc4716006d.
2018/07/XX 5:55:08.2263914 4872 148044 Unknown( 12): GUID=26bba210-72d3-3f28-a89e-6bdc4716006d.
2018/07/XX 5:55:08.2264610 4872 148044 Unknown( 18): GUID=8bc93df4-dfb7-3bd8-4da6-97dda6e01d4f.
2018/07/XX 5:55:29.5807238 4872 148180 Unknown( 19): GUID=7d6df39d-a28f-39fb-0934-ccd9f7428391.
2018/07/XX 5:55:29.5807720 4872 148180 Unknown( 63): GUID=d7143c12-c53a-301e-eff4-e0e2b985334a.

Ok….

So I have a PID and, presumably, a thread ID. I don’t know what the process is or what it was doing, and then there’s a GUID, which might be helpful.
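To pull those fields apart without eyeballing, here is a minimal Python sketch that parses undecoded lines like the ones above and groups them by PID. It assumes, as above, that the third column is the process ID and the fourth the thread ID; the regex and the `group_by_pid` helper are illustrative and only match this “Unknown( N): GUID=…” shape of line.

```python
import re
from collections import defaultdict

# Matches undecoded lines like:
# 2018/07/XX 5:55:03.0332383 149280 120000 Unknown( 13): GUID=a2b43708-...
# Assumption: column 3 is the process ID and column 4 the thread ID.
LINE_RE = re.compile(
    r"^(?P<date>\S+)\s+(?P<time>\S+)\s+(?P<pid>\d+)\s+(?P<tid>\d+)\s+"
    r"Unknown\(\s*(?P<code>\d+)\):\s+GUID=(?P<guid>[0-9a-fA-F-]+)\.?$"
)


def group_by_pid(lines):
    """Group undecoded log lines by PID, collecting thread IDs and GUIDs."""
    groups = defaultdict(lambda: {"count": 0, "tids": set(), "guids": set()})
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip decoded lines or anything in a different format
        entry = groups[int(match["pid"])]
        entry["count"] += 1
        entry["tids"].add(int(match["tid"]))
        entry["guids"].add(match["guid"].lower())
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "2018/07/XX 5:55:03.0332383 149280 120000 Unknown( 13): GUID=a2b43708-af59-32cd-48bc-7cf111dee98e.",
        "2018/07/XX 5:55:04.9819252 149280 149372 Unknown( 19): GUID=1bce64d0-3b5c-3a28-bd28-0e6a0b1dc374.",
        "2018/07/XX 5:55:08.2263039 4872 148044 Unknown( 12): GUID=26bba210-72d3-3f28-a89e-6bdc4716006d.",
    ]
    for pid, info in sorted(group_by_pid(sample).items()):
        print(pid, info["count"], sorted(info["tids"]))
```

Run against the full Get-WindowsUpdateLog output, this gives you the short list of PIDs worth hunting for in the trace.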

As it turns out, I had previously given the customer instructions on how to collect ETW traces for me using WPRui, so we had a trace too!

In the trace, my first question was “Did my product cause this?”

In it we found lots of disk activity, but nothing related to updates at all.

So I then tracked down PID 149280 in the ETW trace using WPA.

It’s a PIA to look manually for a PID appended to all these process names… I could Ctrl+F and search the column for my PID, or I could add PID as its own column by right-clicking a column header and checking the box for PID. I searched.

OK, so now we know PID 149280 is MonitoringHost.exe… this doesn’t sound like something that would be pushing updates to a system, or does it?

Let’s pretend we don’t know. How do we find out what this is? A couple of ways: adding the image path to the process view I showed above usually gives a hint. The other is to look up the parent PID and see who owns it. Let’s do both…
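Walking the parent chain by hand gets old. The Python sketch below does it over a process table you’ve pulled out of the trace (PID, parent PID, image name, path). The `Proc` type, the `parent_chain` helper, and everything in the example table except PID 149280 / MonitoringHost.exe are hypothetical. Keep in mind that a parent may have exited before the trace started and its PID may since have been reused, so the chain can end early or point at an unrelated process.

```python
from typing import Dict, List, NamedTuple


class Proc(NamedTuple):
    pid: int
    ppid: int  # parent process ID as recorded in the trace
    name: str
    path: str


def parent_chain(procs: Dict[int, Proc], pid: int) -> List[Proc]:
    """Return the process followed by its ancestors, as far as the table goes."""
    chain: List[Proc] = []
    seen = set()
    while pid in procs and pid not in seen:  # stop at unknown PIDs or loops
        seen.add(pid)
        proc = procs[pid]
        chain.append(proc)
        pid = proc.ppid
    return chain


if __name__ == "__main__":
    # Hypothetical table: only PID 149280 / MonitoringHost.exe comes from the post.
    table = {
        149280: Proc(149280, 5000, "MonitoringHost.exe", r"C:\Example\Agent\MonitoringHost.exe"),
        5000: Proc(5000, 600, "ExampleParent.exe", r"C:\Example\Agent\ExampleParent.exe"),
        600: Proc(600, 4, "services.exe", r"C:\Windows\System32\services.exe"),
    }
    for p in parent_chain(table, 149280):
        print(p.pid, p.name, p.path)
```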

Awesome. It’s the Microsoft Monitoring Agent. I know from reading https://blogs.technet.microsoft.com/msoms/2016/08/17/the-many-faces-of-the-microsoft-monitoring-agent/ that this is part of SCOM, SCCM, OMS, or whatever else uses this common binary…

So what is it doing? Let’s stack walk, shall we?

In WPA, I add a graph from Computation to the Analysis view, specifically ‘CPU Usage (Sampled)’, and change it to Usage by Process, Stack. Then I add the Module and Function columns. I then find my process and right-click > Filter To Selection to remove all other processes. Then I simply right-click the Stack column and select ‘Expand Column’.

I see this is running .NET, but I really only care about stacks from the specific binaries, so I scroll down a bit…
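If you’d rather do that filtering outside the UI, one option is to export the filtered table and tally it. The Python sketch below assumes a CSV with columns named `Process`, `Module`, `Function`, and `Count` (hypothetical names; match them to whatever your export actually contains) plus an ignore list of runtime modules you consider noise. The function names in the example data are made up.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, Tuple


def top_frames(csv_text: str, process: str, ignore_modules: Iterable[str],
               n: int = 10) -> List[Tuple[str, int]]:
    """Sum sample counts per module!function for one process, skipping noisy modules."""
    ignore = {m.lower() for m in ignore_modules}
    totals: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Process"] != process:
            continue  # same idea as "Filter To Selection" on a single process
        if row["Module"].lower() in ignore:
            continue  # drop framework/OS frames to surface the product's own binaries
        totals[f"{row['Module']}!{row['Function']}"] += int(row["Count"])
    return totals.most_common(n)


if __name__ == "__main__":
    # Hypothetical export; function names are made up for illustration.
    export = (
        "Process,Module,Function,Count\n"
        "MonitoringHost.exe (149280),clr.dll,ExampleClrFunction,120\n"
        "MonitoringHost.exe (149280),Hybrid.Registration.Cmdlets.dll,ExampleMethod,40\n"
        "OtherProcess.exe (1234),example.dll,ExampleFunction,500\n"
    )
    print(top_frames(export, "MonitoringHost.exe (149280)", ["clr.dll"]))
```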

And I see this stack

This kinda stood out to me… (I truncated them so they wouldn’t wrap in a browser window… or tried to. Whatever, I don’t know WordPress.)

OMSRunbookWorkerRegistration.dll!AgentService.OmsHybridRegistration.PowerShell.Commandlets.OmsHybridRunbookWorker

and

Hybrid.Registration.Cmdlets.dll!AgentService.HybridRegistration.PowerShell.Registry.HybridWorkerRegistry

OK, so now I know it’s OMS code, running runbooks. Let’s see what that does…

https://docs.microsoft.com/en-us/azure/automation/automation-update-management spells it out pretty well:

Computers that are managed by Update Management use the following configurations to perform assessment and update deployments:

Microsoft Monitoring Agent (MMA) for Windows or Linux
PowerShell Desired State Configuration (DSC) for Linux
Automation Hybrid Runbook Worker
Microsoft Update or Windows Server Update Services (WSUS) for Windows computers

Well, OK. I see Runbook Worker registration happening, so it’s running that. It looks pretty likely that this is what’s causing Windows Update to patch.

 

Again, why do I think so?

Because I saw the PID of this process referenced in the Windows Update log.

I analyzed the code that process is running, and it is clearly running code for components used by an Azure patch automation offering.

How did it get onto the machine? I don’t know; there isn’t enough data. Maybe someone was testing something out, or it was configured this way by accident. Magic 8 Ball says “future is uncertain”.

 

Peace!

Jeff Stokes

Performance Series Part 1 – How to collect an ETW/Xperf trace to capture general performance issues

Applies to: Windows 7+, Windows Server 2008 R2+
Target audience: People I support primarily. Anyone who wants to perf like a pro?

Step 1: Get the Windows Performance Toolkit (WPT) by way of the Windows Assessment and Deployment Kit (ADK). Since every iteration of the WPT happens to be distributed slightly differently from the previous version, I’ve linked the MSFT guide on getting the most recent one. As it stands now, run through the web installer and uncheck everything but “Windows Performance Toolkit”.

It is worth noting that the resulting Windows Kits folder containing the WPT is typically portable, meaning that once you install it, you can usually copy the folder to another host without going through the web installer again. There are also redist executables for installing just the WPT ‘next time/next system’.

Step 2: Open WPRUI (Start > type WPRUI > Enter).

Step 3: Expand the “More Options” caret.

  • Expand Resource Analysis.
    • Select “CPU Usage”
    • Select “Disk I/O Activity”
    • Select “File I/O Activity”
  • Expand Scenario Analysis.
    • Select “Minifilter I/O activity”

Step 3a: Optionally, I may have you skip this, click “Add Profiles…”, and add a custom XML profile instead of checking individual boxes.

Step 4: Validate the Performance Scenario is “General”, Detail Level is “Verbose” and Logging mode is “Memory”.

Step 5: Click “Start” and then reproduce the ‘bad behavior’.

Step 6:  Let the collection run for the amount of time I gave you (or a couple minutes) and then click stop.

Step 7:  Wait.

Zip and upload the resulting ETL file and the same-named NGEN.PDB folder (if present) to me.
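If clicking through WPRUI isn’t an option, the same collection can be scripted with wpr.exe, which ships in the same toolkit. The Python sketch below is one way to do it, assuming the built-in profile names CPU, DiskIO, FileIO, and Minifilter (check `wpr -profiles` on your build) and an elevated prompt; as far as I know, wpr defaults to memory logging and verbose detail, matching Step 4. The zip step assumes the NGEN PDB folder sits next to the ETL and that its name starts with the ETL’s base name and contains “NGENPDB”. The helper names and the output path are hypothetical.

```python
import subprocess
import sys
import zipfile
from pathlib import Path
from typing import List

# Assumed built-in WPR profiles mirroring the WPRUI checkboxes in Step 3.
PROFILES = ["CPU", "DiskIO", "FileIO", "Minifilter"]


def wpr_start_args(profiles: List[str] = PROFILES) -> List[str]:
    args = ["wpr"]
    for profile in profiles:
        args += ["-start", profile]  # memory logging mode unless -filemode is given
    return args


def wpr_stop_args(etl: Path) -> List[str]:
    return ["wpr", "-stop", str(etl)]


def zip_trace(etl: Path, out_zip: Path) -> List[str]:
    """Zip the ETL plus any sibling NGEN PDB folder; return the archive member names."""
    etl = Path(etl)
    extras = [
        d for d in etl.parent.iterdir()
        if d.is_dir()
        and d.name.lower().startswith(etl.stem.lower())
        and "NGENPDB" in d.name.upper().replace(".", "")
    ]
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(etl, arcname=etl.name)
        for folder in extras:
            for f in sorted(folder.rglob("*")):
                if f.is_file():
                    zf.write(f, arcname=f.relative_to(etl.parent).as_posix())
        return sorted(zf.namelist())


if __name__ == "__main__" and sys.platform == "win32":
    trace = Path(r"C:\trace\general.etl")  # hypothetical output location
    subprocess.run(wpr_start_args(), check=True)
    input("Reproduce the bad behavior, then press Enter to stop...")
    subprocess.run(wpr_stop_args(trace), check=True)
    print(zip_trace(trace, trace.with_suffix(".zip")))
```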

How To: Collect ETL/WPT tracing diagnostics when you can never log on to the host.

First, guess who's back?!

Me! I left Microsoft of my own accord last year. I came back. I wrote this about my experience; I hope you enjoy it.

There and back again, an IT tale…

Anyway, I've been asked a few times recently: "Dude, how do you collect an ETW trace for boot/logon if the machine never lets you log on? Is this a chicken/egg scenario?! We need the trace to find out why we never get to the desktop, but we can't get the trace because we can't get to the desktop to stop it?!"

Well, friends, I'm here to say you can in fact collect your hard-won trace!

For your problem node(s), just get a trace started. How, if you can't log on to the desktop? Easy. Here are some options for you:

– Boot into Safe Mode with Networking and copy the Windows Performance Toolkit folder onto the troubled node.

Then run: xbootmgr -trace boot -traceflags dispatcher+latency

If Safe Mode doesn't work:

– Boot up the system. Don't log on. Copy the WPT directory onto the system.

Run the command "xbootmgr -trace boot -traceflags dispatcher+latency" via PsExec, a scheduled task running as SYSTEM, or (I'm guessing here) autoexec.bat.

Now that the system has an xbootmgr trace running and is shutting down and rebooting…

Wait for the logon prompt; when prompted, log on.

Wait 3-4 minutes.

– PsExec to the machine and run: xperf -d C:\directory\merge.etl

If PsExec doesn't work:

Set a scheduled task, remotely or locally in Safe Mode if that works, to run xperf -d C:\directory\merge.etl (in some directory you made).

(Tasks need to run in the SYSTEM context.)
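For the PsExec and scheduled-task routes above, here's a Python sketch that only builds the command lines, so you can check them before running them from a working admin box. It assumes PsExec is on your PATH, that xperf/xbootmgr are reachable on the target (use full paths into your copied WPT folder if not, and avoid spaces, since the sketch does no extra quoting inside the task command), and that your account may create tasks on the remote host (otherwise schtasks also needs /U and /P). The task names and host name are hypothetical.

```python
import subprocess
from typing import List, Optional, Tuple


def psexec_args(host: str, command: List[str]) -> List[str]:
    # -s runs the remote command as LocalSystem, matching the SYSTEM-context advice
    return ["psexec", f"\\\\{host}", "-s"] + command


def schtasks_as_system(command: str, task: str,
                       host: Optional[str] = None) -> Tuple[List[str], List[str]]:
    """Build a create + run pair of schtasks commands that execute `command` as SYSTEM."""
    remote = ["/S", host] if host else []
    create = (["schtasks", "/Create"] + remote +
              ["/TN", task, "/TR", command,
               "/SC", "ONCE", "/ST", "23:59",  # one-time schedule; started on demand below
               "/RU", "SYSTEM", "/F"])         # /F overwrites a task with the same name
    run = ["schtasks", "/Run"] + remote + ["/TN", task]
    return create, run


if __name__ == "__main__":
    start = "xbootmgr -trace boot -traceflags dispatcher+latency"
    stop = r"xperf -d C:\directory\merge.etl"
    print(subprocess.list2cmdline(psexec_args("troubled-node", stop.split())))
    for cmd in schtasks_as_system(stop, "StopBootTrace", host="troubled-node"):
        print(subprocess.list2cmdline(cmd))  # Windows-style quoting for display
    for cmd in schtasks_as_system(start, "StartBootTrace"):
        print(subprocess.list2cmdline(cmd))
```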

Problems with this? Don't get it? Ask questions/comment please. I'm here, for you.

Peace!

Xperf for the layman, performance analysis unchained, Windows Assessment Toolkit revealed.

If you have been following along in performance land the last year or three, you’ve heard about xperf and the WPT (Windows Performance Toolkit).  Mayhap you’ve had some time to practice and you know what you are doing.  Cool.  This tool might still interest you.

If you, on the other hand, haven’t heard of these, or haven’t had the time to spend to get good at them, then this tool will definitely interest you.

It is the Windows Assessment Toolkit.  Unlike the Windows Assessment Server, which I’ll speak to later, the Windows Assessment Toolkit is a standalone, infrastructure-less toolkit designed to help laymen and skilled professionals alike with client performance analysis.

This is an option for both Windows 7 and Windows 8 by the way.

So without further ado, let’s get rolling…

Step 1 – Get the tool

Go here and run the web installer for the ADK.  Cycle through the installer until you get to the checkbox list of tools and pick the two shown below:

image

 

Let it install.

Step 2 – Use the tool (data collection and review)

Launch the Windows Assessment Console for the Toolkit like so:

image 

So here we have the console launching… and then, the console itself:

 

image

 

 

So, browser running slow and you can’t figure out why?  Want to see how long the battery will really last?  Does it take forever to start up and you want to know why?  Those are just some of the test cases at your fingertips.  Note that all of these in the default pane run only on Windows 8 or Windows RT.  But when you select “Run Individual Assessments”:

image

A fair number of them can run on Windows 7 as well as 8.  So if you don’t want to stand up the infrastructure of a Windows Assessment Server, use this to vet the performance of your hardware, your build, and third-party filter drivers like AV, DLP, NAC, etc.

The key to this UI, though, is to click “Configure” down at the bottom next to the Run button, because that’s how you determine which of these ad-hoc test cases can be run against Windows 7 as well.

image

Note this test case can run on Windows 7.  If you wanted to make a test case to give to someone or to place on another machine, just click “Package…”

image

And then you can run it on a machine without having to install any console.

So click Run to execute a test case.

image

 

And then it dumps you into a report view when it is complete.  All the items are clickable, and can take you into the ETW trace files if need be.  For example:

image

Below, we’ve selected one of the issues found, and in the right-hand pane we get an explanation of the problem and a recommendation to remediate it, along with a link to TechNet on what the ‘deal’ is.

image

Take the time to use this on workstations in your environment… Why, you ask?  What does it get me?

 

Well, the driver certification and verification jobs will identify problem drivers in your build that could cause BSODs or other problems…

The File Handling test case will give you a crystal clear idea of DLP or AV’s ‘cost’ to performance in terms of file I/O.

Boot up is a general holistic view of the boot up process and the impact of everything on it.

Internet Explorer browsing experience is a collection of pages the job will hit locally for graphics rendering.  It’s pretty slick.  Run it and see.  How good is your GPU and GPU driver at hardware rendering?  Find out.

Check this out and see how it works.  It’ll even point you to issues in the ETW files, so you can use it as a jump start to real ETW trace analysis on your own.

Hope you liked this post!

The Dude…

What does a good boot look like (aka, what should I be happy with)?

It’s a question I wasn’t prepared for in class last week, but one that made sense, really.  For the IT Pro who doesn’t eat, breathe, and sleep this stuff, what does a good or ‘fair’ trace look like?

Something like this:

image

What we are looking at here is a boot that finishes in under 45 seconds, with the post-boot delay quieting down in the 60-second-ish range.  Not bad.  I like getting to a usable desktop in less than a minute, and this boot delivers that.

Do we see any boogeymen here?

image

Or here?

image

Nope, not yet.

image

Nah, not really.  This is a healthy boot.

 

Why?

 

The CPU doesn’t spike or ride high through boot.  The disk doesn’t ride at 100% through it, so the boot demand isn’t exceeding the storage’s ability to deliver.  DPCs are low, so we don’t have a bad driver, etc.
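If you want to turn that eyeball check into something repeatable across a pile of traces, here’s a rough Python sketch. It assumes you’ve exported per-interval CPU and disk utilization percentages for the boot window. The under-a-minute desktop target comes from above; the `level` and `fraction` cutoffs that define “riding high” are hypothetical placeholders to tune, and DPCs are left out because there’s no number above to encode.

```python
from typing import Dict, Sequence


def rides_at(samples: Sequence[float], level: float, fraction: float) -> bool:
    """True if at least `fraction` of the samples are at or above `level` percent."""
    if not samples:
        return False
    high = sum(1 for s in samples if s >= level)
    return high / len(samples) >= fraction


def boot_report(desktop_seconds: float, cpu_pct: Sequence[float],
                disk_pct: Sequence[float]) -> Dict[str, bool]:
    return {
        "desktop_under_a_minute": desktop_seconds < 60,
        # Hypothetical cutoff: "riding high" = half the samples at 90%+ CPU...
        "cpu_not_riding_high": not rides_at(cpu_pct, level=90, fraction=0.5),
        # ...or half the samples with the disk pegged at 100%.
        "disk_not_pegged": not rides_at(disk_pct, level=100, fraction=0.5),
    }


if __name__ == "__main__":
    report = boot_report(44, cpu_pct=[35, 80, 60, 20, 10], disk_pct=[70, 100, 55, 30, 15])
    print(report, "healthy" if all(report.values()) else "worth a closer look")
```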

This is a good boot.

How to collect a trace for audio or video problems in Windows 7

Assume the following:  You have a Windows 7 host that you want to collect a trace from.  The user complains of audio issues, stuttering, latency, etc… or the video frame rate is low.  Something annoying.

As in my previous post, let’s cover a few basic rules as we get started:

1.  If host = Windows 7 AND bitness = amd64 THEN Set DisablePagingExecutive to 1 and reboot:

http://technet.microsoft.com/en-us/library/cc959492.aspx

2.  Make sure the user account we want to trace is local administrator, even temporarily.
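Rule 1 as a Python sketch: the check mirrors the condition above, and the command is the standard `reg add` form for the DisablePagingExecutive DWORD under the Memory Management key. The helper names are hypothetical; the inputs are whatever you feed it, for example `platform.release()` and `platform.machine()`, which typically report "7" and "AMD64" under 64-bit Python on x64 Windows 7. It needs an elevated prompt, plus the reboot.

```python
import platform
import subprocess
from typing import List

MEMORY_MANAGEMENT_KEY = (
    r"HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"
)


def needs_disable_paging_executive(release: str, machine: str) -> bool:
    """Rule 1: only Windows 7 on amd64 (x64) needs the change."""
    return release == "7" and machine.upper() in ("AMD64", "X64")


def disable_paging_executive_args() -> List[str]:
    return ["reg", "add", MEMORY_MANAGEMENT_KEY,
            "/v", "DisablePagingExecutive", "/t", "REG_DWORD", "/d", "1", "/f"]


if __name__ == "__main__" and platform.system() == "Windows":
    if needs_disable_paging_executive(platform.release(), platform.machine()):
        subprocess.run(disable_paging_executive_args(), check=True)
        print("DisablePagingExecutive set to 1; reboot before tracing.")
```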

 

After we have that, install the Windows 8 ADK on the target machine, or copy the Windows Performance Toolkit from a machine it has already been installed on onto our target machine.

(We can install by running ADK Setup and deselecting EVERYTHING except Windows Performance Toolkit, by the way.)

installADK-WPT_thumb1

 

So, it’s there, somewhere.

 

1.  Run WPRUI elevated/as administrator

 

wprui1_thumb1

 

2.  Click More Options on the bottom left, revealing the window that looks like this:

 

image_thumb3

3.  For audio and video glitches that are easy to reproduce, check the scenario you are reproducing in the Scenario Analysis area, and change Logging Mode to File.

image

 

4.  Click “Start” and then reproduce the issue.  The window will look like this while you do so:

image

5.  When it reproduces, click Save and save the file off.  Then review it in xperfview or Windows Performance Analyzer to determine the cause of the glitches (probably DPCs from usbaudio drivers, but what do I know…).

“But wait, Dude!  What if this isn’t easy to reproduce?” you may ask….

Step….

6.  If this is not easy to reproduce, get set up to collect a trace as above, but don’t use WPRUI.

Instead, once DisablePagingExecutive is set and the WPT is installed, open an elevated command prompt, go to the root of a drive (I’ll use C: for the example), make a trace directory, cd to it, and run the following:

xperf -on dispatcher+latency+drivers -stackwalk readythread+threadcreate+cswitch+profile -f C:\trace\xperftrace.etl -minbuffers 1024 -maxbuffers 1024 -maxfile 512 -filemode circular

Then let it run in the background while you dork around trying to reproduce the issue.  Once it hits, simply do the following:

xperf -d C:\trace\results.etl

Now you can open results.etl in xperfview.exe or Windows Performance Analyzer and look for DPCs and so forth that might be causing the issue….
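The same start/stop pair as a Python sketch, with the flags from the commands above annotated. The function names and default paths are hypothetical; the flags are exactly the ones above. With `-filemode circular` and `-maxfile 512`, the file is capped at 512 MB and the oldest events get overwritten, which is what lets it run in the background until the glitch shows up.

```python
import subprocess
import sys
from typing import List


def xperf_circular_start(etl: str = r"C:\trace\xperftrace.etl") -> List[str]:
    return [
        "xperf",
        "-on", "dispatcher+latency+drivers",  # kernel flags: scheduler, latency group, driver delays
        "-stackwalk", "readythread+threadcreate+cswitch+profile",  # capture stacks on these events
        "-f", etl,                             # log to this file instead of memory
        "-minbuffers", "1024", "-maxbuffers", "1024",
        "-maxfile", "512",                     # maximum file size in MB
        "-filemode", "circular",               # overwrite the oldest events once the cap is hit
    ]


def xperf_stop_and_merge(merged: str = r"C:\trace\results.etl") -> List[str]:
    return ["xperf", "-d", merged]  # stop the session and merge into this file


if __name__ == "__main__" and sys.platform == "win32":
    subprocess.run(xperf_circular_start(), check=True)
    input("Reproduce the audio/video problem, then press Enter...")
    subprocess.run(xperf_stop_and_merge(), check=True)
```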

Enjoy!

How to collect a good boot trace on Windows 7

Assume the following:  You have a Windows 7 host that you want to collect a trace from.  A good trace.  One that you know other people will be able to decipher as well as yourself.  Maybe I’ve asked you to collect a boot trace so I can look at it and pointed you to this blog.  Maybe your Sherpa of IT has decided you should learn this and you are doing it to learn….

(edited 11-2)

[You may also use xperf’s xbootmgr with a syntax similar to this:

xbootmgr -trace boot -traceflags base+latency+dispatcher -stackwalk profile+cswitch+readythread+threadcreate -notraceflagsinfilename -postbootdelay 30

]

 

In any event, you have a Windows 7 host.

Let’s cover a few basic rules as we get started:

1.  If host = Windows 7 AND bitness = amd64 THEN Set DisablePagingExecutive to 1 and reboot:

http://technet.microsoft.com/en-us/library/cc959492.aspx

2.  Make sure the user account we want to trace is local administrator, even temporarily.

3.  Set AutoLogon up in the registry for this user so we don’t flub a password input and invalidate a trace with bogus data:

http://support.microsoft.com/kb/324737
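Rule 3 as a Python sketch that builds `reg add` commands for the Winlogon string values the KB article uses for automatic logon (AutoAdminLogon, DefaultUserName, DefaultPassword, DefaultDomainName). The helper name and the example account are hypothetical. DefaultPassword is stored in plain text, so remove it (and set AutoAdminLogon back to 0) once the trace is done.

```python
import subprocess
from typing import List

WINLOGON_KEY = r"HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon"


def autologon_args(user: str, password: str, domain: str) -> List[List[str]]:
    """One reg add command per Winlogon value; all are REG_SZ strings."""
    values = {
        "AutoAdminLogon": "1",
        "DefaultUserName": user,
        "DefaultPassword": password,  # stored in plain text: clean up after tracing
        "DefaultDomainName": domain,
    }
    return [
        ["reg", "add", WINLOGON_KEY, "/v", name, "/t", "REG_SZ", "/d", data, "/f"]
        for name, data in values.items()
    ]


if __name__ == "__main__":
    # Hypothetical test account; print the commands with Windows-style quoting.
    for cmd in autologon_args("traceuser", "P@ssw0rd", "CONTOSO"):
        print(subprocess.list2cmdline(cmd))
```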

 

After we have that, install the Windows 8 ADK on the target machine, or copy the Windows Performance Toolkit from a machine it has already been installed on onto our target machine. (link http://www.microsoft.com/en-us/download/details.aspx?id=30652)

(We can install by running ADK Setup and deselecting EVERYTHING except Windows Performance Toolkit, by the way.)

installADK-WPT

 

So, it’s there, somewhere.

 

1.  Run WPRUI elevated/as administrator

 

wprui1

 

2.  For a boot trace, click More Options on the bottom left, revealing the window that looks like this:

 

image

3.  For the boot trace, I would like to see CPU Usage, Disk I/O Activity and File I/O Activity.  I would like you to change the Performance Scenario to “Boot” and number of iterations to “1”, like so:

image

 

4.  Click “Start”, then type something into the box, select a convenient place to store your trace, and hit “Save”, which will reboot your machine and collect the trace.

image

 

5.  Let it reboot, let it log on as the user you specified for AutoLogon, let it count down the normal boot process, and end with the ETL trace in the directory you specified.  Get me that trace, stat!  Or if you are doing this to learn, poke around in it in XperfView.exe and WPA.exe, two entirely different ways to view the data set.

Hope this helps!  After I stand up a VM or two, I’m going to do some WPA examples….

Welcome to the Windows Assessment Server from the Windows 8 ADK. Part 1 of X

Fa-La…It’s the magical mystery tour…..

 

Well, not really, but it is the Windows Assessment Server brought to us by the Windows 8 ADK!

 

Let’s fire this bad boy up and play around with it.

Step 0.  Prepare your environment.

You need DHCP, DNS, and a server running Windows Server 2008 R2 SP1 or Windows Server 2012.  By the way, this MAY NOT be installed on a server with the AD DS role…

Step 1.  Install the ADK:

Go here:  http://www.microsoft.com/en-us/download/details.aspx?id=30652 and run ADKSetup.exe

image

Next

image

Next

image

Accept

image

Click Install

image

And then be patient…

image

And then

image

And then

image

And finally…

image

Start the Windows Assessment Service – Client and click to configure the server.  It’ll do a lot of work.

image

Like that and this

image

And finally….

image 

Windows 8 ADK solves GPO Logon Delay questions, film at 11….

Symptoms:  Logons take forever and you’ve collected an xbootmgr/WPR/xperf123/xperfui trace.  What GPO can it be?  Easy!

 

Open Windows Performance Analyzer from the Windows 8 ADK:

image

Open Said Offending Trace:

 

image

Expand System Activity, then find Generic Events, and click and drag the graph so it’s under the “Analysis” tab…

image

AR me eyes!  Relax, it’s OK, we’re going to bring some good order out of this chaos.  Click this button in the top right:

image

And no, I can’t draw straight lines….

This trace should now look like this:

 

image

Slide down in the Provider Name area until you locate “Microsoft-Windows-GroupPolicy”.  Right-click it and select “Filter To Selection”:

image

 

Sanity ensues… Now click the caret on Task Name (to the right of Provider Name, in the middle of the screenshot above):

Anyway, this view looks like this (still too busy):

image

Notice the diamonds at the top, by the way?  Highlight WinStop at the top and it’ll take you to the WinStop entries in the bottom area.  Observe when each one ends, and also look for the CSEElapsed Time column:

 

image

 

This one stands out because it took 360453 ms (just over six minutes) to complete:

Next, expand the CSEExtension Name field:

image
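If you’d rather rank the client-side extensions than scroll, here’s a Python sketch over rows exported from that filtered Generic Events view. It assumes fields named CSEExtensionName and CSEElapsedTime (in milliseconds), corresponding to the columns above; the extension names in the example are hypothetical, and only the 360453 ms figure comes from this trace. For reference, 360453 ms formats as 6:00.453.

```python
from typing import Dict, Iterable, List, Tuple


def fmt_ms(ms: int) -> str:
    """Format milliseconds as m:ss.mmm."""
    minutes, rest = divmod(ms, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{minutes}:{seconds:02d}.{millis:03d}"


def slowest_cses(rows: Iterable[Dict[str, str]], n: int = 5) -> List[Tuple[str, str]]:
    """Return (extension name, elapsed m:ss.mmm) for the n slowest CSE completions."""
    timed = [
        (row["CSEExtensionName"], int(row["CSEElapsedTime"]))
        for row in rows
        if row.get("CSEElapsedTime", "").strip().isdigit()  # skip rows without a duration
    ]
    timed.sort(key=lambda item: item[1], reverse=True)
    return [(name, fmt_ms(ms)) for name, ms in timed[:n]]


if __name__ == "__main__":
    rows = [
        {"CSEExtensionName": "Example Extension A", "CSEElapsedTime": "360453"},
        {"CSEExtensionName": "Example Extension B", "CSEElapsedTime": "1250"},
        {"CSEExtensionName": "Example Extension C", "CSEElapsedTime": ""},
    ]
    for name, elapsed in slowest_cses(rows):
        print(f"{elapsed}  {name}")
```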

Easy as pie.

When an Exchange Server doesn’t Exchange…

“Well the high sheriff, told his deputy, won’t you go out and bring my Lazarus?”

Why am I quoting the Po Lazarus tune, the opening song of “O Brother, Where Art Thou?”, when I’m supposed to be talking about the Exchange Server that doesn’t?  It’s the Chewbacca defense!  This Exchange Server is so hosed I can comfortably quote an old folk song instead of talking about the server…

Ok ok, I’ll talk about the server:

This server is a Windows Server 2008 R2 SP1 box using 10 GbE cards to talk to storage, and it performs like it shouldn’t.

image

 

And Holy Moly!  DPCs consume more CPU than any one thread on the box.  Googly moogly!  We’ve got a problem here.  But why?  What does it mean?

Right-click this graph and select Summary Table:

image

Here we go, our DPCs are in SYSTEM (4), module elx_octeamvlan.sys.  But wait, there’s more, why?

image

It seems this driver in SYSTEM is spending a lot of DPC time on cores 6, 0, and 4.  Odd.  Let’s see what else we can find to help them write a better driver:
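To see that per-core skew without squinting at the graph, here’s a Python sketch that totals DPC time per CPU for one module. It assumes rows exported from the DPC summary table with columns named Module, CPU, and Duration (ms); those column names and the sample numbers are hypothetical, and only the module name comes from this trace.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def dpc_ms_by_cpu(rows: Iterable[Dict[str, str]], module: str) -> List[Tuple[int, float]]:
    """Total DPC duration per CPU for `module`, busiest CPU first."""
    totals: Dict[int, float] = defaultdict(float)
    for row in rows:
        if row["Module"].lower() == module.lower():
            totals[int(row["CPU"])] += float(row["Duration (ms)"])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    rows = [  # hypothetical numbers
        {"Module": "elx_octeamvlan.sys", "CPU": "6", "Duration (ms)": "900.5"},
        {"Module": "elx_octeamvlan.sys", "CPU": "0", "Duration (ms)": "610.0"},
        {"Module": "elx_octeamvlan.sys", "CPU": "4", "Duration (ms)": "450.25"},
        {"Module": "elx_octeamvlan.sys", "CPU": "1", "Duration (ms)": "3.0"},
        {"Module": "ndis.sys", "CPU": "2", "Duration (ms)": "12.0"},
    ]
    for cpu, ms in dpc_ms_by_cpu(rows, "elx_octeamvlan.sys"):
        print(f"CPU {cpu}: {ms:.2f} ms")
```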

DPCs are high, way too high:

image

Observe: the DPC count is low on cores 6, 4, and 0, but the waits are, um, not low:

image

Huh, let’s see what it is (symbols didn’t resolve, sorry, but it’s NDIS).  The driver/hardware is a 10 GbE adapter:

image

Same function call on each of the three cores, with lots of wait time.  We’re having trouble with the driver’s implementation of how it talks on the network via NDIS.  The driver’s authors are aware and, I believe, have already fixed the problem.  Woot!  Another happy customer.