By Michael Parshin
The goal of this post is to give a short overview of common mistakes we, .NET devs, tend to make over and over again while debugging our beloved products, day in, day out.
Debugging the whole system instead of using unit tests
Reproduce, debug, fix, debug, fix… You start the whole system and try to reproduce the buggy behavior, wait until it happens, and then the debugger crashes exactly when you try to check the value of the third element in the array. Sound familiar? It is a tough and time-consuming process, and unfortunately we all go through this vicious cycle. So what could be a solution?
I believe most of us are familiar with unit tests and Test-Driven Development, yet there are lots of projects out there without even a single test! If your team culture does not allow you to write tests before you write any production code, or even right after you have finished writing it, then start writing tests for debugging purposes! When it’s time to fix the next bug, do not step into the debugger and do not prepare some side console app; just open a new test project and start writing unit tests that isolate the issue. You can then debug those tests with much less effort and, most importantly, the tests you wrote will be yours to keep and will protect you from regressions. When you or a coworker makes some unrelated change that reintroduces the same nasty bug all over again, you’ll know about it!
As you write more tests, you’ll notice that tests are actually valuable in many other ways – they help you express what your code does (or at least, what it ought to do), they serve as documentation for others, showing them how to use your code, and they help verify the SOLIDness of your design.
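As a minimal sketch of such a debugging test, suppose a bug report says that averaging an empty list of orders crashes. The `OrderStatistics` class, its `AverageValue` method, and the choice of xUnit are all assumptions made for this illustration; any test framework would serve the same purpose.

```csharp
using System;
using System.Linq;
using Xunit; // assumption: the test project references the xUnit framework

// Hypothetical production class, shown here after the fix.
public static class OrderStatistics
{
    public static decimal AverageValue(decimal[] orderValues)
    {
        if (orderValues == null) throw new ArgumentNullException(nameof(orderValues));

        // The buggy version called Average() unconditionally, which throws
        // InvalidOperationException when the sequence is empty.
        return orderValues.Length == 0 ? 0m : orderValues.Average();
    }
}

public class OrderStatisticsTests
{
    // Written while debugging: isolates the reported crash in a few lines.
    [Fact]
    public void AverageValue_ReturnsZero_ForEmptyInput()
    {
        Assert.Equal(0m, OrderStatistics.AverageValue(new decimal[0]));
    }

    // Guards the normal path so the fix does not break it.
    [Fact]
    public void AverageValue_ReturnsMean_ForTypicalInput()
    {
        Assert.Equal(20m, OrderStatistics.AverageValue(new[] { 10m, 30m }));
    }
}
```

The first test fails against the buggy code and passes after the fix, and from then on it runs with every build as a regression guard.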
Jumping right into the debugger before checking logs
Concurrency issues are hard to reproduce and even harder to debug. While most of the work should be done upfront, by making sure that our architecture and design do not lend themselves to concurrency mayhem, we may still find ourselves struggling with concurrency-related errors. Stepping through a multi-threaded app is like juggling 10 balls at once. What’s worse, having a debugger attached changes the timing of each thread’s execution. One of the best solutions for debugging concurrency-related issues is logging. Nothing beats reading through a linear, step-by-step account of the events that led up to the software failure. There are plenty of logging frameworks out there (like NLog and log4net) that can be easily incorporated into any .NET application. However, adding lots of logging statements just to make the application easier to debug often becomes a maintenance nightmare, and many programs end up overflowing with context-less and confusing logs.
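To make "a linear account with context" concrete, here is a minimal sketch that uses only the base class library. The `ThreadEventLog` helper is a made-up name for this example; a logging framework such as NLog or log4net would normally add the timestamp and thread id for you through its layout configuration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: records what happened, when, and on which thread.
public static class ThreadEventLog
{
    private static readonly ConcurrentQueue<string> Entries = new ConcurrentQueue<string>();
    private static readonly Stopwatch Clock = Stopwatch.StartNew();

    public static void Write(string message)
    {
        // Elapsed time and thread id are the context needed to
        // reconstruct how the threads interleaved.
        Entries.Enqueue(string.Format("{0,8:F3} ms | thread {1,3} | {2}",
            Clock.Elapsed.TotalMilliseconds,
            Thread.CurrentThread.ManagedThreadId,
            message));
    }

    public static string[] Snapshot()
    {
        return Entries.ToArray();
    }
}

class Program
{
    static void Main()
    {
        Parallel.For(0, 5, i =>
        {
            ThreadEventLog.Write("start item " + i);
            Thread.Sleep(5); // stand-in for real work
            ThreadEventLog.Write("end item " + i);
        });

        // Entries come out in the order they were recorded.
        foreach (string line in ThreadEventLog.Snapshot())
            Console.WriteLine(line);
    }
}
```

Every entry carries its own context, which is what keeps a log readable once several threads write to it at the same time.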
In some cases it is handy to use additional tracing mechanisms like Trace Points in OzCode.
Let’s consider a typical race condition scenario:
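```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    private static int _counter;

    static void Main(string[] args)
    {
        // 300 iterations: indices 0 through 299 (the upper bound is exclusive)
        Parallel.For(0, 300, IncreaseCounter);
        Console.WriteLine(_counter);
        Console.ReadKey(true);
    }

    private static void IncreaseCounter(int i)
    {
        Thread.Sleep(5);
        _counter++;
    }
}
```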
In some cases the output of this program will be a number that is less than 300. Since we’re not really sure why this might be the case, Trace Points can help us investigate! So let’s put one at the beginning of the method, by using the “Trace every entry…” Quick Action:
…and then, once the program reaches the call to the WriteLine method, we can check the “Trace Points Viewer”, which is essentially a fully-featured log viewer that’s built into Visual Studio:
Here we can see how many times the function was actually executed and on which thread, and look at the call stack, the execution timestamp, and the trace message. You could also export the data as a .CSV file and open it in Excel, for instance. Double-clicking on one of the messages takes us right to the line of code from which it was written!
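One likely explanation for the missing increments is that `_counter++` is not atomic: it reads the value, adds one, and writes it back, so two threads can read the same value and one of the updates is lost. A minimal sketch of one possible fix, assuming an atomic increment is all that is needed, replaces the increment with `Interlocked.Increment`. The `CounterDemo` class and its `Run` method are names made up for this example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CounterDemo
{
    private static int _counter;

    // Runs the given number of parallel iterations and returns the final count.
    public static int Run(int iterations)
    {
        _counter = 0;
        Parallel.For(0, iterations, IncreaseCounter);
        return _counter;
    }

    private static void IncreaseCounter(int i)
    {
        Thread.Sleep(5);
        // Atomic read-modify-write, so no increment is lost between threads.
        Interlocked.Increment(ref _counter);
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(CounterDemo.Run(300)); // prints 300
    }
}
```

A `lock` around the increment would work as well; `Interlocked` is simply the lightest option when a single integer is the only shared state.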
Trying to fix performance related issues without metrics
Another area that is hard to debug is performance. Some programs have specific performance requirements, such as: “The application needs to process 100 orders per second”. OK, so we write a program that can process orders, we test it (hopefully with unit tests) and make sure that the core logic works, and then… then we need to run proper stress tests and demonstrate that it can indeed sustain a hundred orders per second over time. In most cases we’ll also end up needing to set up monitoring in production, to see that there are no other activities on the OS that steal our app’s resources.
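As a rough sketch of what the simplest such stress check could look like, the snippet below runs an operation in a loop for a fixed duration and reports the sustained rate. The `ThroughputCheck` class is a hypothetical helper, and the sleep merely stands in for the real order-processing code; a serious stress test would also need realistic data, concurrency, and a much longer run.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class ThroughputCheck
{
    // Runs the action repeatedly for the given duration
    // and returns the measured operations per second.
    public static double Measure(Action processOrder, TimeSpan duration)
    {
        int processed = 0;
        Stopwatch watch = Stopwatch.StartNew();
        while (watch.Elapsed < duration)
        {
            processOrder();
            processed++;
        }
        watch.Stop();
        return processed / watch.Elapsed.TotalSeconds;
    }
}

class Program
{
    static void Main()
    {
        // A short sleep stands in for the real order pipeline.
        double rate = ThroughputCheck.Measure(() => Thread.Sleep(2), TimeSpan.FromSeconds(5));

        Console.WriteLine("Orders per second: {0:F1}", rate);
        Console.WriteLine(rate >= 100 ? "Requirement met" : "Requirement NOT met");
    }
}
```

A number like this is only a snapshot from one machine at one moment, which is why it does not replace monitoring in production.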
Logs or traces may not be a good diagnostic tool in this case, since there is nothing (out of the box) which can take a log file and tell us what the performance rates and metrics are right now. In addition, adding excessive logging to the app can in itself have a non-negligible overhead on performance.
So what’s the solution? Performance counters can be a big help. I would recommend you use them during development, testing, and production monitoring. Add custom performance counters to your application that describe what your application does and how well it performs.
Oftentimes, it is not trivial to define a comprehensive set of performance counters upfront. Continuous investigation and improvement are the way to go. It is a good idea to leave these counters on after development is finished, because your application will not live in a vacuum; there are tons of things which may impact performance, and thus your custom counters will do a great job for maintenance and ongoing monitoring.
The nicest thing about performance counters is that they enable you to easily monitor your own application-specific and business-centric counters against the built-in counters, such as those that relate to GC, exceptions, threading, etc., and they give you all this rich information with only a minimal impact on your application.
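For illustration, here is a minimal sketch of a custom “orders per second” counter built on the `System.Diagnostics.PerformanceCounter` API. The category name, counter name, and `ProcessOrder` method are made up for this example. The sketch assumes Windows and a process that is allowed to create counter categories, which requires administrative rights; in practice the category is usually created once at install time rather than at startup.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class Program
{
    // Hypothetical category and counter names, chosen for this example only.
    private const string CategoryName = "Order Processing";
    private const string CounterName = "Orders processed / sec";

    static void Main()
    {
        // Creating a category requires administrative rights.
        if (!PerformanceCounterCategory.Exists(CategoryName))
        {
            var counters = new CounterCreationDataCollection();
            counters.Add(new CounterCreationData(
                CounterName,
                "Number of orders processed per second",
                PerformanceCounterType.RateOfCountsPerSecond32));

            PerformanceCounterCategory.Create(
                CategoryName,
                "Business counters for the order processing service",
                PerformanceCounterCategoryType.SingleInstance,
                counters);
        }

        // Open the counter for writing (readOnly: false).
        using (var ordersPerSecond = new PerformanceCounter(CategoryName, CounterName, false))
        {
            for (int orderId = 0; orderId < 1000; orderId++)
            {
                ProcessOrder(orderId);
                ordersPerSecond.Increment(); // one tick per processed order
            }
        }
    }

    // Stand-in for the real order-processing logic.
    private static void ProcessOrder(int orderId)
    {
        Thread.Sleep(10);
    }
}
```

While the program runs, the counter can be added in Performance Monitor (perfmon) next to the built-in .NET CLR counters, so the business rate and the GC or exception rates can be read on the same graph.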
Losing valuable information whenever the application crashes
I guess you have seen this message box come up once or twice. If it happens on a developer machine, we have the option of debugging the app to find out what happened. But what if the crash happens in a QA environment or, worse, in production? The most important thing is not to lose the data, so that we can find out the reason for the crash. Post-mortem debugging can help us investigate the crash after it has happened. The only thing you need is a dump file. Unfortunately, dump files are not created automatically, and some configuration needs to be done in order to make this happen. There are a number of tools that can produce dump files; my favorite is ProcDump, which is small and easy to use. To register ProcDump and start gathering dumps, run this command:
C:\>procdump -mp -i c:\dumps
The next time any program crashes, a mini dump will be created under C:\dumps (make sure the folder exists before you begin).
Now that we have a dump file, what next?
If WinDbg is not yet installed on your computer, this is the time to install it. It is an outstanding tool which gives you a lot of power. I am not going to cover WinDbg in this post (which would be impossible even if I tried), but I really suggest you go and learn about this tool. There are many good guides and plenty of technical information around.
Let’s write a simple program that will generate a second-chance exception:
```csharp
using System;

class Program
{
    static void Main()
    {
        int a = Compute(100, 10);
        a = Compute(a, 0); // divides by zero and crashes the process
        Console.WriteLine(a);
    }

    static int Compute(int a, int b)
    {
        return a / b;
    }
}
```

Once the program executes, a dump file will be created under c:\dumps. Let’s open it and investigate it in WinDbg.
First we need to load the SOS extension, as it helps a lot in debugging managed applications. Use .loadby sos clr if your app runs on .NET 4.0 or later, or .loadby sos mscorwks if it runs on .NET 2.0 through 3.5.
With WinDbg, we can do a million and one things, but to get started, we can print out the exception details:
```
0:000> !pe
Exception object: 0000000002ea2e78
Exception type:   System.DivideByZeroException
Message:          Attempted to divide by zero.
InnerException:   <none>
StackTrace (generated):
    SP               IP               Function
    0000000000CCEED0 000007FE087C0150 Crash!Crash.Program.Compute(Int32, Int32)+0x40
    0000000000CCEF10 000007FE087C00D8 Crash!Crash.Program.Main()+0x48

StackTraceString: <none>
HResult: 80020012
```

See the call stack with function parameter and local values:

```
0:000> !CLRStack -a
OS Thread Id: 0x720 (0)
        Child SP               IP Call Site
0000000000cce980 000007fe7886319b [FaultingExceptionFrame: 0000000000cce980]
0000000000cceed0 000007fe087c0150 *** WARNING: Unable to verify checksum for Crash.exe
Crash.Program.Compute(Int32, Int32) [C:\Users\MichaelP\Projects\ConcurencyErrors\Crash\Program.cs @ 18]
    PARAMETERS:
        a (0x0000000000ccef10) = 0x000000000000000a
        b (0x0000000000ccef18) = 0x0000000000000000
    LOCALS:
        0x0000000000cceef0 = 0x0000000000000000

0000000000ccef10 000007fe087c00d8 Crash.Program.Main() [C:\Users\MichaelP\Projects\ConcurencyErrors\Crash\Program.cs @ 11]
    LOCALS:
        0x0000000000ccef30 = 0x000000000000000a

0000000000ccf230 000007fe67e77d93 [GCFrame: 0000000000ccf230]
```
I highly suggest you configure your QA lab’s computers with ProcDump to collect dumps, so that the next time there is a crash you will have data to work with and will be able to solve the problem much faster. In addition, crashes are often not easily reproducible, so it will save QA effort as well. Dumps can also be opened in Visual Studio, which can be very useful, but mastering WinDbg gives you unprecedented powers of introspection.
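As a lightweight complement to dumps (not a replacement for them), an application can also write the exception details to a file just before it dies. The sketch below uses the `AppDomain.CurrentDomain.UnhandledException` event; the `CrashReporter` class and the `crash.log` path are made up for this example.

```csharp
using System;
using System.IO;

public static class CrashReporter
{
    // Formats the details worth keeping; kept separate so it can be unit tested.
    public static string Describe(Exception ex, DateTime utcNow)
    {
        return string.Format("[{0:O}] Unhandled {1}: {2}{3}{4}",
            utcNow, ex.GetType().FullName, ex.Message, Environment.NewLine, ex.StackTrace);
    }

    public static void Install(string logPath)
    {
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
        {
            var ex = e.ExceptionObject as Exception;
            if (ex != null)
                File.AppendAllText(logPath, Describe(ex, DateTime.UtcNow) + Environment.NewLine);
            // The handler does not stop the crash; the process still terminates afterwards.
        };
    }
}

class Program
{
    static void Main()
    {
        CrashReporter.Install("crash.log"); // hypothetical log path

        int zero = 0;
        Console.WriteLine(100 / zero); // throws DivideByZeroException
    }
}
```

The log entry answers “what and where” at a glance, while the dump remains the place to look for the full state of the process.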
Conclusion
While having unit tests may save you some time in debugging, there are still plenty of issues we may encounter during an application’s lifecycle that can hardly be covered by unit tests. Performance issues, memory and resource leaks, crashes and hangs: all of these are the sort of bugs where the plain old debugger is not the best way to go. Do not step into the debugger as a knee-jerk reaction, just because you can; choose the right tool for each job.
Happy debugging!
Michael Parshin is a Senior Consultant at CodeValue.
Michael specializes in enterprise system architecture, software design and development methodologies. He has in-depth knowledge in the .NET framework, C#, WCF and other technologies and development environments. Michael holds an MSc degree in Applied Mathematics from BIU (Bar Ilan University, Israel).
Blog: http://michaelparshin.blogspot.co.il/
LinkedIn: http://www.linkedin.com/in/michaelparshin