How to troubleshoot application hangs with DebugDiag 1.2 Part 2 (or what am I looking at exactly?)

I forgot, in the original post, you have to enable Symbols…doh.

But all isn’t lost.  You can re-run it and generate a new MHT.

Symbols you want to point to our public symbol server in the “Tools/Options and Settings” area:

image

And you can just run the analysis again…back to our scheduled program here..

So you followed the instructions in “How to troubleshoot application hangs with DebugDiag 1.2” and Internet Explorer popped up a MHT file with the warning that scripts are enabled….

So click the warning!  We’re cooking with gas here!

Alright, now that that’s out of the way, we have a report with a bunch of information in it, mine looks like:

 

image

Ok so what am I drawn to here?  What are we looking at?  An Outlook hang.  We can see right away, in my demo, I forgot to set Outlook to ‘online mode’ instead of ‘cached mode”….hm.  How can we tell?

Detected a possible critical section related problem in OUTLOOK.EXE__PID__2900__Date__02_28_2012__Time_06_13_01PM__639__Manual Dump.dmp
Lock at 0x067513b0 is Unlocked
Impact analysis
2.33% of threads blocked
(Threads 3)
The following functions are involved in the root cause
MSPST32!MSProviderInit+2d9
The following modules are involved in the root cause
c:\Program Files (x86)\Microsoft Office\Office14\MSPST32.DLL from Microsoft Corporation

Thread 3 is doing work in the PST provider, OST uses that provider as well, we’re doing work in a file for Outlook.  But what else?

I click the link for Thread 3 and get this (again highlighting relevant things that stand out):

Thread 3 – System ID 2920

Entry point
0x00000000

Create time
2/28/2012 5:30:03 PM

Time spent in user mode
0 Days 00:00:00.265

Time spent in kernel mode
0 Days 00:00:00.093

This thread is incomplete and also has/have an invalid Thread Environment Block pointer. As a result, the information reported is most likely inaccurate.

Function
Source

ntdll!ZwWaitForSingleObject+15

ntdll!RtlpWaitOnCriticalSection+13e

ntdll!RtlEnterCriticalSection+150

MSPST32!MSProviderInit+2d9

MSPST32!MSProviderInit+1e09

MSPST32!MSProviderInit+1d9c2

OLMAPI32!ScRelocNotifications+8c7

OLMAPI32!ScRelocNotifications+78f

OLMAPI32!HrNotify+194

OLMAPI32!MSProviderInit+9e7

OLMAPI32!MSProviderInit+95f

MSO!Ordinal3329+64c

MSO!Ordinal3329+428

MSO!Ordinal3329+2ba

MSO!Ordinal5376+339e

MSO!Ordinal3194+299

kernel32!BaseThreadInitThunk+e

ntdll!__RtlUserThreadStart+70

ntdll!_RtlUserThreadStart+1b

So, we see we are doing work in the file backing Outlook, and OLMAPI32 is doing “ScRelocNotifications”.  We’re moving data.  In the file.

What else?

This stands out:

The following threads in OUTLOOK.EXE__PID__2900__Date__02_28_2012__Time_06_13_01PM__639__Manual Dump.dmp are waiting on data to be returned from another server via WinSock.
The call to WinSock originated from wininet!ICAsyncThread::SelectThread+381
( 23 )
2.33% of threads blocked

We’re syncing data with the server, because I just moved a ton of data in my OST file that has to be synced with the server, so to verify, I click 23:

Thread 23 – System ID 5388

Entry point
0x00000000

Create time
2/28/2012 5:30:11 PM

Time spent in user mode
0 Days 00:00:00.00

Time spent in kernel mode
0 Days 00:00:00.00

This thread is waiting on data to be returned from another server via WinSock.
The call to WinSock originated from wininet!ICAsyncThread::SelectThread+381

Function
Source

ntdll!ZwWaitForSingleObject+15

mswsock!SockWaitForSingleObject+1ba

mswsock!WSPSelect+3a6

ws2_32!select+2bb

wininet!ICAsyncThread::SelectThread+381

wininet!ICAsyncThread::SelectThreadWrapper+d

kernel32!BaseThreadInitThunk+e

ntdll!__RtlUserThreadStart+70

ntdll!_RtlUserThreadStart+1b

 

Makes sense to me…now some background about the Dude here….no college, no CS degree, no coding experience.  But I can infer we’re doing synchronization of data from Outlook to the mail store.  Need more proof though right, not very compelling here….

So I poke through the report a bit.  Remember, this is a hang analysis for our dumps.  Functioning threads aren’t going to be pointed out by our report, just the threads that we are WAITING on.  So we’re waiting on file system activity for the OST, and syncing with the server.  Hey look, I poked around in the trace and found this:

Thread 40 – System ID 3236

Entry point
0x00000000

Create time
2/28/2012 6:12:56 PM

Time spent in user mode
0 Days 00:00:00.686

Time spent in kernel mode
0 Days 00:00:00.078

This thread is not fully resolved and may or may not be a problem. Further analysis of these threads may be required.

Function
Source

ntdll!NtWaitForMultipleObjects+15

KERNELBASE!WaitForMultipleObjectsEx+100

kernel32!WaitForMultipleObjectsExImplementation+e0

EMSMDB32!ScStatClose+dd3d

EMSMDB32!ScStatClose+d94f

EMSMDB32!XPProviderInit+104a3

EMSMDB32!XPProviderInit+103bd

EMSMDB32!XPProviderInit+102e5

EMSMDB32!XPProviderInit+f82d

EMSMDB32!XPProviderInit+574e3

EMSMDB32!XPProviderInit+5739e

EMSMDB32!XPProviderInit+5713a

EMSMDB32!XPProviderInit+56f8b

EMSMDB32!XPProviderInit+56e13

EMSMDB32!XPProviderInit+56cb0

EMSMDB32!XPProviderInit+35c0e

EMSMDB32!XPProviderInit+4e12c

OLMAPI32!MSProviderInit+95f

MSO!Ordinal3329+64c

MSO!Ordinal3329+428

MSO!Ordinal3329+2ba

MSO!Ordinal5376+339e

MSO!Ordinal3194+299

kernel32!BaseThreadInitThunk+e

ntdll!__RtlUserThreadStart+70

ntdll!_RtlUserThreadStart+1b

 

EMSMDB32, commands going back to the data store for Exchange I presume.  32 because I’m running the 32 bit version of Outlook.

Here’s another thread we aren’t hung on, its doing its job:

Thread 39 – System ID 9376

Entry point
0x00000000

Create time
2/28/2012 6:12:49 PM

Time spent in user mode
0 Days 00:00:04.274

Time spent in kernel mode
0 Days 00:00:00.390

This thread is not fully resolved and may or may not be a problem. Further analysis of these threads may be required.

Function
Source

MSPST32!MSProviderInit+1a7e0

MSPST32!MSProviderInit+f943

MSPST32!MSProviderInit+f834

MSPST32!MSProviderInit+f792

MSPST32!MSProviderInit+f6b1

MSPST32!MSProviderInit+f5b0

MSPST32!MSProviderInit+f55d

MSPST32!MSProviderInit+f53e

MSPST32!MSProviderInit+16f9b

MSPST32!MSProviderInit+171d9

MSPST32!MSProviderInit+199cf

MSPST32!MSProviderInit+1991b

MSPST32!MSProviderInit+486d0

MSPST32!MSProviderInit+5c69b

MSPST32!MSProviderInit+48230

MSPST32!MSProviderInit+48178

MSPST32!MSProviderInit+30a28

MSO!Ordinal803+51b

MSO!Ordinal3329+428

MSO!Ordinal3329+2ba

MSO!Ordinal1704+d4a

MSO!Ordinal3329+428

MSO!Ordinal3329+2ba

MSO!Ordinal5376+339e

MSO!Ordinal3194+299

kernel32!BaseThreadInitThunk+e

ntdll!__RtlUserThreadStart+70

ntdll!_RtlUserThreadStart+1b

Doing a bunch of work in the file backing Outlook.  The numbers after the command are symbols that aren’t resolved by the public symbol server by the way…

 

So this is a hang analysis of a normal functioning application that waited for a few seconds (dumps collected 3 seconds apart).  If we had Outlook or Word hung for a long time though, we could do the same thing and the threads that would stand out, would be threads of interest because other threads are waiting on them.  Does this make sense?  Help a little?

Play with it.  Poke around in it.  See if it doesn’t start to make some sense.

How to troubleshoot application hangs with DebugDiag 1.2

Or, DebugDiag from the client perspective…

1.  Install DebugDiag 1.2 (http://support.microsoft.com/kb/2580960)

(Basically, I’m going to expound upon the following (somewhat obscure) instructions:)

        “Create a manual memory dump series during the slow or hang state by right-clicking the process name in the processes view and choosing the “Create Dump Series” option.”

2.  Run the application (in this case, Microsoft Outlook 2010)

image

3.  Run DebugDiag 1.2

image

4.  Attach DebugDiag 1.2 to the application.

Click Cancel on the Wizard

image

Then click the 3rd Tab, Process

image

And then Right Click your Process (in my example, Outlook.exe) and Select “Create Userdump Series”

image

Then Change the “10” to “5” in the number of dumps to create and check the box to create a full before creating minidumps (this will run a sequence to collect 1 full dump, then 3 minidumps and then a full dump of the process).

image

5.  Reproduce application hang (in this example I’m going to copy 5507 emails to a new folder while on an aircard in Online Mode instead of Cached Mode).

Once the problem starts to occur, click “Save and Close”.  Dumps are being created!

image

6.  Analyze the dumps.  Click the middle tab (Advanced Analysis) and then select the default option for Crash/Hang Analysis and then (Add Data Files).

image

Add the dumps stored in the directory (default was C:\Program Files\DebugDiag\Logs\Misc)

image

Now enable Symbols

For Symbols you want to point to our public symbol server in the “Tools/Options and Settings” area:

image

 

And click start analysis and let er rip

image

 

Coming soon, how to read the report, or, what comes next (I’ll finish this when I get onto the airplane in a couple hours).

Windows 7 Slow Boot / Slow Logon (Why I do what I do)

Lately, the last year or two, I’ve been slowly drawn (moth to flame almost) into the remediation of the problem state of Windows slow boot or slow logon conditions.  Interestingly enough, Microsoft Premier Field Engineering management allows (and encourages in many cases) folks to focus on things of interest to them.  Follow our passions as it were.  Mine happens to be the end user.  I care about the end user experience as it relates to our products and systems.  If the end user experience fails, the cloud fails, the back office fails, the new application rollout fails, the migration to a new image fails.  Etc..

How long does your enterprise system take to boot up to a usable desktop?  2 minutes?  3?  5?  10?  Is this a problem for your business productivity?  Has your organization had a WDRAP?  Why not?

Is it all about boot times and logon times though?  Does it go beyond that?  Is the system sluggish all the time?  Just some of the time, randomly?  Do applications launch slowly?  Does the UI hang for the end user randomly?  Do they complain?

Are you doing something about it?

Help me help you…

If you haven’t already, apply these hotfixes we’ve agreed can help client performance post Service Pack 1 for Windows 7:

http://blogs.technet.com/b/yongrhee/archive/2012/02/19/list-of-performance-hotfixes-post-sp1-for-windows-7-sp1.aspx

Are you doing VDI and having slow performance?  Have you done any of these steps (at a minimum):

http://social.technet.microsoft.com/wiki/contents/articles/list-of-resources-on-windows-7-optimization-for-vdi.aspx?wa=wsignin1.0

Do you have slow boot up times?  Have you looked at the logon times of your services like this:

http://blogs.technet.com/b/jeff_stokes/archive/2009/12/18/why-do-i-have-long-boot-times.aspx

Do you need to dig deeper, like this:

http://blogs.technet.com/b/jeff_stokes/archive/2011/12/05/xperf-o-rama.aspx

Have a scenario I didn’t cover?  Post a comment and I’ll think up a data collection and analysis path if I can.

Thanks,

Jeff Stokes

Group Policy Sample Script Issue, solved

This is an interesting one-off that we felt would be worth posting.  Might help a few folks not pull their hair out one late evening…

Issue: Running the ‘getreportsforallgpos.wsf’ script from the GPMC sample scripts, from a command prompt (even as admin) – I get the following error message:

Exact syntax is: 

C:\Program Files\Microsoft Group Policy\Sample Scripts>cscript getreportsforallgpos.wsf c:\users\bob\desktop\gpoexport /domain:Contoso

Error:  (47, 18) (null) : A referral was returned from the server

Resolution:  Use the domain’s FQDN instead of short name (NetBIOS name)

Example:   C:\Program Files\Microsoft Group Policy\Sample Scripts>cscript getreportsforallgpos.wsf c:\users\bob\desktop\gpoexport /domain:Contoso.com

Hope that helps someone out there in the great big world someday.  Shout out to Roger Southgate, fellow PFE, for finding this gem.

 

Cheers!