Windows, Zombie Processes, and bullshit code

Jeff Stokes | Performance, Debug, Windows Internals

Hi,

In my work at Tanium I do a bit of debugging and performance analytics. Over the last two to three years, a LOT of that work has centered on why Windows systems get slower and slower over time. In fact, this has been a common complaint (and a source of ridicule and FUD) since I started my career in IT 26 years ago.

Windows is not the issue here. But this happens a lot on Windows, it isn't exposed well, and that needs to be fixed. And people writing tools for Windows systems need to learn to fucking code. (And maybe, when you write a kernel driver, do more than download the sample from MSDN, pop in your code, and leave the default "Made by WINDDK" in your driver's file properties. Oh yeah, include a version number too.)

I like to eat my own dogfood, so I run Tanium on my own systems. If our stuff is off the chain, I want to know it, preferably before a customer and Microsoft are pointing fingers at us on a support call. Call me crazy, I like to get ahead of the 8 ball, you know? But if it is us, and MSFT support happens to be right (which seems to happen less and less over the last 4-5 years), I'll take that on the nose as well. In the end I've educated a customer, a Microsoft support tier 1 SE, or myself and my dev team. It's a win all around, no matter who is 'at fault'.

Tonight I'm going to show you how fucked up Windows can get when code sucks. And it's not Tanium. And it's not MSFT. And (fucking surprise!) it's not even antivirus!!!!! Never thought I'd give AV a clean bill of health on this, 'cause usually it is AV…

OK, system specs: AMD 5900X, 64GB of 3200MHz RAM, Nvidia 3090 FE, and the boot drive is a 2TB PCIe 4.0 NVMe drive. I have a total of 9 SSDs and 3 NVMe drives, plus spinners for cold storage and larger USB 3.1-attached spinners. This is not a poorly performing system, or RATHER, it has no real excuse to be one. So when I opened Process Explorer tonight to figure out why I had process ID values above 100k (a zombie process symptom) and it HUNG, over and over, trying to render screen updates, I knew something was bad.

So here’s how I solved this.

  1. Task Manager: sort by PID. Over 100k? Yes? Bad.
  2. Open Process Explorer as Administrator, add the Handles column to the view, and sort by handle count.
  3. Show the lower pane in Process Explorer and switch it to the Handles view.
  4. SMH.
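Steps 1 and 2 boil down to two checks you could script. Here is a minimal Python sketch, assuming you already have a process snapshot as a list of dicts with `pid`, `name`, and `handles` keys (a made-up format for illustration). The 100,000 threshold is just the rule of thumb from this post, not an official limit, and `triage` is a hypothetical name.

```python
from typing import TypedDict


class ProcInfo(TypedDict):
    # Hypothetical snapshot row: roughly the PID and Handles columns you
    # would see in Task Manager or Process Explorer.
    pid: int
    name: str
    handles: int


PID_ALARM = 100_000  # rule of thumb from this post, not an OS limit


def triage(snapshot: list[ProcInfo], top: int = 3) -> tuple[bool, list[ProcInfo]]:
    """Return (PIDs look suspicious, the `top` biggest handle holders)."""
    # Step 1: any PID above the threshold hints that PIDs are not being recycled.
    pids_look_bad = any(p["pid"] > PID_ALARM for p in snapshot)
    # Step 2: sort by handle count, descending, like clicking the Handles column.
    worst = sorted(snapshot, key=lambda p: p["handles"], reverse=True)[:top]
    return pids_look_bad, worst


# Made-up sample data, purely illustrative.
snapshot: list[ProcInfo] = [
    {"pid": 4, "name": "app_a.exe", "handles": 5_000},
    {"pid": 104_512, "name": "app_b.exe", "handles": 200},
    {"pid": 9_876, "name": "leaky.exe", "handles": 40_000},
]
bad, worst = triage(snapshot, top=2)
print(bad, [p["name"] for p in worst])  # True ['leaky.exe', 'app_a.exe']
```

On a live Windows box, the snapshot could plausibly come from the third-party psutil package via `psutil.process_iter(["pid", "name", "num_handles"])`, renaming `num_handles` to `handles`; note that `num_handles` is only available on Windows.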

So what does this look like in pictures?

Taskman High PID
PIDs are over 100k. Fuck that noise; Windows recycles PIDs. If your prod server has PID values over a MILLION, your environment is hosed. Seek help.
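Why do PIDs climb when something leaks handles? Here is a toy model in Python to make the mechanism concrete. It is NOT how the Windows kernel actually allocates PIDs; it only assumes the behaviour described in this post: freed PIDs get reused, and an exited process's object (and therefore its PID) hangs around as a zombie for as long as anyone still holds a handle to it. `ToyKernel`, `run`, and every other name here are hypothetical.

```python
class ToyKernel:
    """Toy model: a PID is freed only once its process has exited AND
    no handles to it remain open. Not the real Windows allocator."""

    def __init__(self) -> None:
        self.free_pids: list[int] = []         # released PIDs, available for reuse
        self.next_pid = 4                      # Windows PIDs are multiples of 4
        self.exited: set[int] = set()          # exited, but object not yet freed
        self.open_handles: dict[int, int] = {}  # pid -> open handle count

    def spawn(self) -> int:
        # Reuse a released PID if there is one, else mint a fresh, higher one.
        if self.free_pids:
            pid = self.free_pids.pop()
        else:
            pid = self.next_pid
            self.next_pid += 4
        self.open_handles[pid] = 0
        return pid

    def open_handle(self, pid: int) -> None:
        self.open_handles[pid] += 1

    def close_handle(self, pid: int) -> None:
        self.open_handles[pid] -= 1
        self._maybe_free(pid)

    def exit(self, pid: int) -> None:
        self.exited.add(pid)
        self._maybe_free(pid)

    def _maybe_free(self, pid: int) -> None:
        # An exited process with handles still open is a zombie: keep it.
        if pid in self.exited and self.open_handles[pid] == 0:
            self.exited.discard(pid)
            del self.open_handles[pid]
            self.free_pids.append(pid)

    def zombies(self) -> int:
        return sum(1 for pid in self.exited if self.open_handles[pid] > 0)


def run(leaky: bool, launches: int = 1000) -> tuple[int, int]:
    """Launch short-lived processes while a (hypothetical) enumerator
    opens a handle to each one. Returns (zombies, highest PID used)."""
    k = ToyKernel()
    for _ in range(launches):
        pid = k.spawn()          # a short-lived process starts
        k.open_handle(pid)       # some tool opens it to inspect it
        if not leaky:
            k.close_handle(pid)  # well-behaved: close right after querying
        k.exit(pid)              # the process exits
    return k.zombies(), k.next_pid - 4


print(run(leaky=False))  # (0, 4)
print(run(leaky=True))   # (1000, 4000)
```

With the well-behaved tool, the same PID gets recycled on every launch; with the leaky one, every exited process stays pinned, so each new process gets a fresh, higher PID. The same logic applies to a parent process that keeps a handle to every child it launches.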
Process Explorer - Dead Thread
As you can see, ASUS Armoury Crate is a well-coded app that knows to release handles after enumerating all the processes on my system. (<-- snark) Why is it enumerating every process on my system? Because it is searching for games… Here we see it holding a handle to a thread that has terminated, and that thread's process is dead.

OK, so is Armoury Crate causing all the zombied processes? Hahahaha, no.

Why? Because the plot thickens.

A couple of years ago, Bruce Dawson wrote a blog post on handle-based process zombies, updated some other fellow's code for finding them, and posted it on GitHub. So I ran that as well.
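A zombie report like that one boils down to grouping the handles that point at dead processes by the process holding them. Here is a minimal Python sketch of just that grouping step. It assumes you already have a snapshot of process handles as (owner name, target PID) pairs plus the set of PIDs still running; collecting that data on Windows needs native APIs and is not shown. This is not Dawson's code, and `zombies_by_owner` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from collections.abc import Iterable


def zombies_by_owner(handles: Iterable[tuple[str, int]],
                     live_pids: set[int]) -> Counter:
    """Map each owner process name to the number of distinct dead
    processes it is keeping alive through open handles."""
    pinned: dict[str, set[int]] = defaultdict(set)
    for owner, target_pid in handles:
        if target_pid not in live_pids:   # target exited, object still around
            pinned[owner].add(target_pid) # count each zombie once per owner
    return Counter({owner: len(pids) for owner, pids in pinned.items()})


# Made-up snapshot: vendor_a.exe holds two handles to the same dead process.
handles = [
    ("vendor_a.exe", 1200), ("vendor_a.exe", 1200), ("vendor_a.exe", 1204),
    ("vendor_b.exe", 1208), ("explorer.exe", 500),
]
live = {500, 600}
print(zombies_by_owner(handles, live).most_common())
# [('vendor_a.exe', 2), ('vendor_b.exe', 1)]
```

Checking targets against the live PID set is safe here because a zombie's PID cannot be handed to a new process while the zombie still exists, so a dead target cannot be mistaken for a running one.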

Razer is holding open a bunch too
Razer is holding open a lot of zombies, yet that app is not running on my system… or is it?

So check this out: all these zombie processes are here because Razer opens a handle to its child GUI process and never closes it either.

Razer holding open razer, why

So how do you fix all this mess? Like some gamer guy on the internet is gonna get Razer and ASUS to fix their shit, right? lolz. Uninstall that shit and don't look back, I guess. Here's how things look once I kill these offenders.

From 9300 zombies, to 68

Now my system is responsive. Eventually, as running processes close, PID values will come back down as well. Hope this helps you understand how to look for zombies.

For more reading, see the OpenProcess documentation: https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-openprocess

Thanks for reading my rant.

Comments

  1. Great post, Jeff. I would beg to differ about it being MS's fault just because it usually is. However, this is a very enlightening, concise article, and it helps confirm my suspicions about the Razer software my kids seem to hold in such high esteem.
    Thank you
