Wednesday, November 01, 2006
Imagine the following typical conversation with a Microsoft Escalation Engineer. For those of you who don't know, a Microsoft Escalation Engineer is at the top of the food chain when it comes to getting to the root of a problem quickly, and reliably, on the Windows platform. They have something akin to super powers when it comes to debugging and I have the utmost respect for anyone in this position! Which makes it that much more bizarre...
"So, you know how to take a dump?", says the Microsoft Escalation Engineer.
"Sure! I've been taking dumps as long as I can remember. You want me to take one now?", I reply.
"Yes. Go ahead and take a dump for me, then upload it so I can review it. Once I get a chance to sift through the results I'll get back to you", explains the Microsoft Escalation Engineer.
We carry on for a few more minutes on the subject of dumping. Seriously. It's only in retrospect that I cringe at how someone might take the conversation out of context.
Now, I'm not above a little snickering (even as I write this), but we've got to draw the line somewhere, don't we?
It was after one such conversation that I began referring to them as 'snaps'. I probably even read it somewhere, but it sounds so much more, er...professional, than 'dump'.
The focus of this post is to help educate our operators and administrators on what to do when they witness behavior in a production system that needs to be reviewed.
In this particular case, memory utilization of a process grows until the process stops responding, which affects our user base, which is when the phone starts ringing! To keep things 'moving along' the process is typically recycled.
Under the pressure of trying to right a system that's gone belly up, we want it to become second nature to add one step to the current process: taking a snap before the recycle, which will aid in diagnosing the problem.
Installing Debugging Tools for Windows
The latest version can be found here. Check your servers to make sure they have a recent version! Recent being defined as a version that was released this year.
I’ve installed this on production servers without problems. Note that there are two ways to get it ‘installed’ on a server in our environments:
- Run the installation by clicking the setup.exe - or -
- Run the installation on one server and then xcopy the directory to the production server. The only thing you lose is access to the tools from the Start menu. This is how the server team typically installs it, to c:\tools\debugging tools for windows. You’ll note that the version on the servers is likely out of date unless you’ve installed/upgraded it yourself.
To take a memory snap (sometimes referred to as a ‘dump’; ‘snap’ just sounds better):
- Open a command prompt
- Change to the Debugging Tools for Windows directory
- Execute adplus -hang -p xxxx (where xxxx is the process id of the process you want to snap). One option you may need is ‘-o’, which redirects the dump to a directory other than the current working directory.
In the default configuration, meaning you’ve installed per the procedure above, you will receive a warning dialog about missing symbol path environment variables. It’s safe to ignore this dialog.
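For anyone who would rather script the steps above than type them under pressure, here is a minimal Python sketch. It only assembles the same adplus command line and hands it to the operating system. The install path, the function names, and the choice to launch adplus.vbs through cscript are my assumptions, not part of the procedure, so adjust them to your server.

```python
import subprocess
from pathlib import PureWindowsPath

# Assumed install location (the xcopy target mentioned earlier); adjust as needed.
DEBUG_TOOLS_DIR = r"c:\tools\debugging tools for windows"


def build_adplus_command(pid, output_dir=None, tools_dir=DEBUG_TOOLS_DIR):
    """Build the argument list for a hang-mode snap of a single process."""
    script = str(PureWindowsPath(tools_dir) / "adplus.vbs")
    # Assumption: run the script under cscript so it stays in the console.
    cmd = ["cscript", "//nologo", script, "-hang", "-p", str(pid)]
    if output_dir is not None:
        # -o redirects the dump away from the current working directory.
        cmd += ["-o", str(output_dir)]
    return cmd


def take_snap(pid, output_dir=None):
    """Run adplus against the process; raises if the command exits non-zero."""
    return subprocess.run(build_adplus_command(pid, output_dir), check=True)
```

Calling `take_snap(1234, r"d:\snaps")` would run the equivalent of `adplus -hang -p 1234 -o d:\snaps`; the pid and the output directory here are made-up values.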
If you suspect that a process is ‘hung’ or ‘locked’, take additional snaps of the process, at least a minute apart. Usually 2-3 will suffice. This allows the reviewer to confirm whether the process is actually hung.
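The spacing is easy to get wrong when the phone is ringing, so a small loop can handle it. This is a rough sketch, not a finished tool: `take_spaced_snaps` is a hypothetical helper, and `take_snap` stands for whatever function or wrapper you use to run adplus against a process id.

```python
import time


def take_spaced_snaps(take_snap, pid, count=3, interval_seconds=60, sleep=time.sleep):
    """Take `count` snaps of one process, waiting between consecutive snaps."""
    results = []
    for i in range(count):
        if i > 0:
            # Wait at least a minute so the reviewer can compare thread states over time.
            sleep(interval_seconds)
        results.append(take_snap(pid))
    return results
```

The `sleep` parameter is there only so the wait can be swapped out when trying the loop without a real process.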
A snap will take up roughly as much space on disk as the process consumes in memory, so it can fill a disk quickly. Because the threads of the process are temporarily suspended during a snap, I do not recommend pointing the output (-o) to a file share when you actually take the snap. This adds significant latency to the operation. It’s much faster, and less intrusive, to snap it to a local disk and then xcopy it to a network share or a workstation for review.
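Both points above, checking disk space first and snapping locally before copying, can be sketched in a few lines of Python. The helper names, the 1.5x headroom factor, and the idea of passing in the process memory size by hand (say, read from Task Manager) are all my assumptions.

```python
import shutil


def has_room_for_snaps(output_dir, process_bytes, snap_count=1, headroom=1.5):
    """Return True if the disk holding output_dir can take the planned snaps.

    Each snap is assumed to need about as much disk as the process uses in
    memory; headroom is an arbitrary safety factor, not an adplus setting.
    """
    free_bytes = shutil.disk_usage(output_dir).free
    return free_bytes >= process_bytes * snap_count * headroom


def copy_snap_to_share(local_snap_dir, share_dir):
    """Copy a finished local snap directory to a share (share_dir must not exist yet)."""
    return shutil.copytree(local_snap_dir, share_dir)
```

The copy only happens after the snap is finished, so the process is not held suspended while data crosses the network.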
There are additional options, all of which are available from adplus.vbs; just open it up in your favorite text editor for review. There is also a significant amount of information online, just try Googling ‘adplus’, ‘debugging windows’, etc.
The prototype will focus on the sales order life cycle and is expected to take 1-2 weeks.
Our team is mainly focused on providing consulting and best practices to the vendor to help enable both parties to reach a mutual goal of successful integration. At the end of the engagement, if the vendor has a solid understanding of the process involved, it will be up to them to implement the remaining interfaces.
Today begins the downhill slide into Friday and the checking and rechecking to make sure we aren't forgetting anything that we might need while on site.