Applies to Exchange 2003; the concepts apply to 2007 as well.
I’ve bumped into a few cases recently where the customer had unexpected transaction log file growth that caused the server to dismount a storage group due to lack of disk space. In this post I’ll attempt to explain why this occurs and how to troubleshoot it.
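To gauge how urgent things are, it can help to put a rough number on how long the log drive will last. The following is a minimal sketch, not a supported tool: the function names are made up for illustration, it assumes the transaction logs are the `.log` files in a single directory, and it assumes growth stays linear between two readings.

```python
import os


def log_bytes(log_dir, suffix=".log"):
    """Total size in bytes of the files in log_dir whose names end with suffix."""
    total = 0
    for name in os.listdir(log_dir):
        path = os.path.join(log_dir, name)
        if name.lower().endswith(suffix) and os.path.isfile(path):
            total += os.path.getsize(path)
    return total


def seconds_until_full(bytes_before, bytes_after, interval_seconds, free_bytes):
    """Estimate seconds until the disk fills, assuming growth stays linear.

    Returns None if the logs did not grow during the interval.
    """
    growth = bytes_after - bytes_before
    if growth <= 0 or interval_seconds <= 0:
        return None
    rate = growth / interval_seconds  # bytes per second
    return free_bytes / rate
```

In practice you would call `log_bytes` twice a known number of seconds apart and pass in the free space reported for the log drive (for example from `shutil.disk_usage`). The estimate is only as good as the linear-growth assumption, so treat it as a rough guide.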
The short of it is that transaction log file growth usually occurs because of a repeating transaction. It can be a looping message, a misbehaving client, or a corrupt message. The looping messages I’ve seen were caused by users setting up special things, such as rules, on their Outlook clients. Consider the following example:
A user leaves for the weekend. They are expecting an important email, so they set up a rule to forward all email to their mobile phone’s email address. Either 1) they mistype that address, or 2) their phone’s mailbox doesn’t accept messages above a certain size. In case 1), every message sent to the user hits the phone provider’s mail servers and bounces with an invalid address. The NDR comes back to the user’s mailbox, where the forward rule forwards the NDR to the phone, which bounces and comes back to the inbox, where the rule forwards it to the phone again, and so on. In case 2), any message above the size limit triggers the same loop (unless the ISP’s mail server knows not to append the offending email as an attachment to the NDR).
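To see why this never settles down on its own, here is a toy model of the loop. It is purely illustrative: the function and its parameters are hypothetical, and it assumes every forwarded item bounces and every NDR is delivered back to the same mailbox.

```python
def simulate_forward_loop(bad_address=True, max_rounds=5):
    """Count mailbox deliveries caused by one inbound message.

    Toy model: each round the rule forwards one pending item. If the
    forward bounces, the NDR lands back in the mailbox and becomes the
    next item for the rule to forward.
    """
    deliveries = 1            # the original message lands in the inbox
    pending = 1               # items waiting to be forwarded by the rule
    for _ in range(max_rounds):
        if pending == 0:
            break
        pending -= 1          # rule forwards the item to the phone address
        if bad_address:       # remote server rejects it and sends an NDR
            deliveries += 1   # NDR is written to the user's mailbox
            pending += 1      # ...and the rule will forward the NDR too
    return deliveries
```

With a working address the count stays at one delivery. With a bad address the count grows by one every round for as long as the simulation is allowed to run, and each of those deliveries is a transaction the store has to log.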
This is a real world example I’ve personally run into. Users can and will do all kinds of bizarre things that under the light of day seem obtuse, but in the heat of the moment make sense.
So how do you track this down?
The normal troubleshooting path I take for this type of problem is:
1. Run Exmon and see whether a single user is taking something silly like 50% of the server’s resources. If you’re spooling out transaction logs like it’s nobody’s business, Exmon shows a user at 50%+, and that user is in the same Storage Group as the spooling transaction logs, then chances are you’ve found your man. If Exmon doesn’t show anything out of the ordinary, proceed to step 2:
2. Go to your Exchange System Manager and drill down to the Storage Group where you’re seeing the transaction log growth. Expand each database and visit the Logons node. Add columns for MSG Ops, Folder Ops, and Total Ops, sort from high to low, and see if you have one user towering above the rest. Do this for each database. If you’ve got a single user standing out, again, this is very likely your culprit. Log into their mailbox and see if there is something stuck in the Outbox, or check their active client for any client-side rule that may be at fault. If worse comes to worst, disable the user’s mail.
3. Use Scott Oseychik’s guide on Transaction Log analysis to figure out what the offending message might be:
This is an excellent guide and needs no further clarification.
4. If none of this works out for you, call into support. It could be a problem with a mobile device syncing or an OWA session trying to process a corrupt message (I’ve seen both scenarios). Only a series of store dumps collected with ADPlus will tell us that.
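The "one user towering above the rest" check from steps 1 and 2 can be sketched in a few lines if you copy the numbers out by hand. This is only an illustration: the function name, the input shape, and the 50% threshold (borrowed from the Exmon rule of thumb above) are assumptions, and the user names in the usage note are made up.

```python
def find_outliers(logons, share_threshold=0.5):
    """Return (user, share) pairs whose share of Total Ops meets the threshold.

    logons: list of (user, total_ops) tuples, e.g. copied by hand from the
    logons view. A user may appear more than once (several sessions), so
    the counts are summed per user first.
    """
    totals = {}
    for user, ops in logons:
        totals[user] = totals.get(user, 0) + ops
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    # Busiest users first, so the worst offender is at the top of the result.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(user, ops / grand_total) for user, ops in ranked
            if ops / grand_total >= share_threshold]
```

For example, `find_outliers([("alice", 120), ("bob", 90), ("carol", 9000), ("carol", 790)])` flags only `carol`, who accounts for almost all of the operations. An empty result means nobody crossed the threshold, which is your cue to move on to the next step.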
I hope this helps in your troubleshooting efforts.
Glad it helped you out, Paul. Thanks for posting!
Not to necro this old thread, but I still deal with Ex2007 in many production environments, and have used ExMon to track down this issue in the past with a high success rate, sometimes narrowly escaping system outages as a result of storage over-consumption. Recently, I came across a CRM Outlook plug-in that was causing tremendous log file growth due to looping activity. If your environment has MS Dynamics CRM 4, check this article.
Hopefully this helps someone who feels like they are 'chasing ghosts'.
Thanks for stopping by and for the comment, Drew!
Many thanks for this blog entry. We narrowly escaped a storage group outage when transaction logs were being created at around 100MB every 10 seconds.
I suspect point 4 is what we were experiencing. We had no mail queues, and message tracking wasn’t showing a large number of messages from or to a specific person. Exmon showed a single mailbox consuming 50% CPU. Ultimately we moved this mailbox, as it was the quickest way to kill a single mailbox session and re-create all messages. The growth of logs then returned to normal.
Checking the server out further shows our STM file has grown from 1GB to 32GB over the same period.