TL;DR: At the time of writing, Windows 10 20H2 has a bug where the default buffer allocations for boot tracing are too small to capture a boot trace without losing data. The fix is simple: use good old xbootmgr instead. It is a binary from the older ADK, and it still gets installed when you install the current ADK.
What am I talking about? How did I find this?
I hit a scenario where I needed a boot trace, so I set one up in Windows Performance Recorder (WPR) with a pretty typical set of boot-trace options: collect First level triage, CPU, Disk I/O and File I/O events; log to file (the only option for a boot trace); and change the number of iterations from 3 to 1.
But when the trace rebooted the VM and the VM came back up, the trace had dropped events. Dropped events mean that at some point during the recording, data was lost. Windows knows that it lost data, but not what kind, which makes interpreting the trace extremely unreliable.
Typically this is caused by poor storage performance, so I tested the storage with CrystalDiskMark. Since the VM is hosted on an NVMe drive, it did pretty well.
These numbers are more than adequate for our needs. So what gives? Trace collection relies on a mechanism called ETW buffers, which capture the data coming from ETW providers.
Think of this as radio. Each ETW provider in Windows is a radio station, and each one is broadcasting all the time. When you collect an ETW trace, you are telling Windows to listen to a station or set of stations and record that data into memory or, in the case of a boot trace, a pair of files. Windows usually does this without issues by allocating non-paged kernel memory to trace buffers. In xbootmgr and its cousin xperf.exe, you can tweak the buffers allocated to the trace: both the number of buffers and the memory size of each buffer. The default values typically work just fine, but on a very busy system or with terrible storage performance, you can sometimes drop events.
Back in the radio analogy, dropped events are like stretches of the broadcast that never made it onto the recording; static is perhaps another way to think of it.
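To make the buffer mechanics a little more concrete, here is a toy Python model. It is not how ETW is actually implemented, and every parameter name and number in it is hypothetical. It assumes a fixed pool of buffers (so total buffer memory is simply buffer count × buffer size), producers filling the current buffer, and a writer flushing full buffers to disk at some fixed rate. When the writer keeps up, nothing is lost; when storage is slow, full buffers pile up until no empty buffer is left, and new events are counted as lost instead of recorded.

```python
"""Toy model of a trace session with a fixed pool of buffers.

Hypothetical sketch only: real ETW buffer management is more involved.
Total buffer memory in this model is buffer_count * buffer size.
"""


def simulate(event_rate, events_per_buffer, buffer_count, flush_rate, seconds):
    """Return (events_flushed_to_disk, events_lost) after `seconds` steps.

    event_rate        -- events logged per second by all providers combined
    events_per_buffer -- how many events fit in one buffer
    buffer_count      -- buffers in the pool (must be at least 1)
    flush_rate        -- full buffers the writer can flush to disk per second
    """
    free = buffer_count - 1  # one buffer is always held open for logging
    full = 0                 # full buffers waiting for the writer
    fill = 0                 # events in the buffer currently being filled
    flushed_events = lost = 0
    for _second in range(seconds):
        # Writer: flush up to flush_rate full buffers, returning them to the pool.
        flushed = min(full, flush_rate)
        full -= flushed
        free += flushed
        flushed_events += flushed * events_per_buffer
        # Producers: log this second's events into the current buffer.
        for _ in range(event_rate):
            if fill == events_per_buffer:
                if free == 0:
                    lost += 1  # no empty buffer to switch to: the event is dropped
                    continue
                full += 1      # queue the full buffer for the writer
                free -= 1      # and switch to an empty one
                fill = 0
            fill += 1
    return flushed_events, lost


if __name__ == "__main__":
    # Fast storage: the writer drains buffers faster than they fill.
    print("fast disk:", simulate(1000, 100, 32, flush_rate=20, seconds=10))
    # Slow storage: full buffers pile up until the pool runs dry.
    print("slow disk:", simulate(1000, 100, 32, flush_rate=5, seconds=10))
```

In the slow-disk run the pool absorbs the backlog for a few seconds before events start dropping. In this toy model, more buffers only postpone that point when the writer cannot keep up, which is why storage is the first thing to check and buffer count and size are the next knobs to turn.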
If you want to learn more, “About Event Tracing” is a great starting point, as is ETW Central.
Back to the scenario: I had dropped events, and I had confirmed the storage was great. So what next?
I iteratively thinned the trace down to just First level triage, set to “Light” instead of “Verbose”, and still had dropped events.
I also tried the “GeneralProfileForLargeServers.wprp” profile located in the “C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit” directory, because this file sets static values for the buffers. Still no dice: dropped events.
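Since the reason for trying that profile was its static buffer values, it can help to see exactly what a .wprp file declares. The Python sketch below lists buffer settings per collector. It assumes the profile uses collector elements (such as SystemCollector or EventCollector) with BufferSize and Buffers children that carry a Value attribute, which is how WPR profiles are typically laid out; if a file differs, adjust the element names. The script name in the comment is hypothetical.

```python
"""List the buffer settings declared in a WPR profile (.wprp).

Sketch under an assumption: collectors (e.g. SystemCollector, EventCollector)
have <BufferSize Value="..."/> and <Buffers Value="..."/> children.
"""
import sys
import xml.etree.ElementTree as ET


def local(tag):
    # Drop any "{namespace}" prefix so lookups work with or without xmlns.
    return tag.rsplit("}", 1)[-1]


def buffer_settings(root):
    """Yield (collector_tag, collector_name, buffer_size, buffers) per collector."""
    for elem in root.iter():
        if not local(elem.tag).endswith("Collector"):
            continue  # skips e.g. <Collectors> and <SystemCollectorId>
        size = count = None
        for child in elem:
            if local(child.tag) == "BufferSize":
                size = child.get("Value")
            elif local(child.tag) == "Buffers":
                count = child.get("Value")
        yield local(elem.tag), elem.get("Name"), size, count


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python wprp_buffers.py GeneralProfileForLargeServers.wprp
    for tag, name, size, count in buffer_settings(ET.parse(sys.argv[1]).getroot()):
        print(f"{tag:<16} {name!s:<30} BufferSize={size} Buffers={count}")
```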
What finally fixed it was calling xbootmgr directly, and then I had no dropped events. Curious. I can only surmise that Windows 10 20H2 uses a different ETW collection configuration than previous Windows versions.
The command I used is xbootmgr -trace boot -traceflags dispatcher+latency. This rebooted my machine as expected and collected a trace. When I opened it, it had no errors. Success!
xbootmgr -trace boot -traceflags dispatcher+latency
Opening it was as simple as double-clicking the resulting .etl file.
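If you want to script this workaround, a minimal Python sketch could build the same command line and hand it to subprocess. It assumes xbootmgr.exe is either on PATH or in the default Windows Performance Toolkit directory mentioned earlier; the helper names are hypothetical. Running it for real reboots the machine, so by default it only prints the command.

```python
import shutil
import subprocess
from pathlib import PureWindowsPath

# Assumed default install location of the Windows Performance Toolkit.
WPT_DIR = PureWindowsPath(r"C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit")


def xbootmgr_command(trace_flags=("dispatcher", "latency")):
    """Build the argument list for `xbootmgr -trace boot -traceflags ...`."""
    exe = shutil.which("xbootmgr") or str(WPT_DIR / "xbootmgr.exe")
    # Multiple trace flags are joined with "+", as in dispatcher+latency.
    return [exe, "-trace", "boot", "-traceflags", "+".join(trace_flags)]


def run_boot_trace(dry_run=True):
    """Print the command; only execute it (and reboot!) when dry_run is False."""
    cmd = xbootmgr_command()
    print(subprocess.list2cmdline(cmd))
    if not dry_run:
        # Expect to need an elevated prompt; save your work before running this.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    run_boot_trace()  # pass dry_run=False to actually start the boot trace
```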
I’ll be filing this through the Feedback Hub app and will place a link here shortly. If this affects you and you’d like to see it fixed, please upvote it there. I hope this has helped you understand what is going on and how to work around the current issue. Happy Tracing!