Debugging: Difference between revisions
Wikiwikitavi (talk | contribs) m http://en.wikipedia.org/wiki/Regression_test |
Revision as of 02:25, 11 April 2006
Debugging is a methodical process of finding and reducing the number of bugs, or defects, in a computer program or a piece of electronic hardware, thus making it behave as expected. Debugging tends to be harder when various subsystems are tightly coupled, as changes in one may cause bugs to emerge in another.
Origin
There is some controversy over who first used the term "bug" (see the Computer bug article for that discussion). Some claim that the term "debugging" was first defined by Glenford J Myers in his 1976 book Software Reliability: Principles and Practices as "diagnosing the precise nature of a known error and then correcting the error".
The story goes that when one of the early computers malfunctioned, Admiral Grace Hopper discovered that a moth had gotten into a relay (relays were used for the logic of computers at that time) and caused a short circuit. This is often cited as the origin of the term "bug" for problems that keep computer programs from running correctly, and the process of removing errors from computer programs has become known as debugging.
Tools
Debugging is, in general, a cumbersome and tiring task. The debugging skill of the programmer is probably the biggest factor in the ability to debug a problem, but the difficulty of software debugging varies greatly with the programming language used and the available tools, such as debuggers. Debuggers are software tools which enable the programmer to monitor the execution of a program, stop it, re-start it, run it in slow motion, change values in memory and even, in some cases, go back in time. The term debugger can also refer to the person who is doing the debugging.
Generally, high-level programming languages, such as Java, make debugging easier, because they have features such as exception handling that make real sources of erratic behaviour easier to spot. In lower-level programming languages such as C or assembly, bugs may cause silent problems such as memory corruption, and it is often difficult to see where the initial problem happened; in those cases, sophisticated debugging tools may be needed.
For debugging electronic hardware (e.g., computer hardware) as well as low-level software (e.g., BIOSes, device drivers) and firmware, instruments such as oscilloscopes, logic analyzers or in-circuit emulators (ICEs) are often used, alone or in combination. An ICE may perform many of the typical software debugger's tasks on low-level software and firmware.
Basic steps
Although each debugging experience is unique, certain general principles can be applied in debugging. This section particularly addresses debugging software, although many of these principles can also be applied to debugging hardware.
The basic steps in debugging are:
- Recognize that a bug exists
- Isolate the source of the bug
- Identify the cause of the bug
- Determine a fix for the bug
- Apply the fix and test it
Recognize a bug exists
Detection of bugs can be done proactively or passively.
An experienced programmer often knows where errors are more likely to occur, based on the complexity of sections of the program as well as possible data corruption. For example, any data obtained from a user should be treated suspiciously. Great care should be taken to verify that the format and content of the data are correct. Data obtained from transmissions should be checked to make sure the entire message (data) was received. Complex data that must be parsed and/or processed may contain unexpected combinations of values that were not anticipated, and not handled correctly. By inserting checks for likely error symptoms, the program can detect when data has been corrupted or not handled correctly.
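As an illustration of checking that an entire message was received, here is a minimal Python sketch. The framing format (a 4-byte length and a 4-byte CRC-32 placed in front of the payload) and the names `frame` and `unframe` are assumptions made for this example, not part of any particular protocol.

```python
import zlib

HEADER_SIZE = 8  # 4 bytes of length + 4 bytes of CRC-32 (hypothetical format)


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length and CRC-32 before sending."""
    length = len(payload).to_bytes(4, "big")
    checksum = zlib.crc32(payload).to_bytes(4, "big")
    return length + checksum + payload


def unframe(message: bytes) -> bytes:
    """Return the payload, or raise ValueError if the message is damaged."""
    if len(message) < HEADER_SIZE:
        raise ValueError("message is shorter than its header")
    expected_length = int.from_bytes(message[:4], "big")
    expected_checksum = int.from_bytes(message[4:8], "big")
    payload = message[HEADER_SIZE:]
    if len(payload) != expected_length:
        raise ValueError(
            f"expected {expected_length} payload bytes, got {len(payload)}"
        )
    if zlib.crc32(payload) != expected_checksum:
        raise ValueError("checksum mismatch: payload was corrupted")
    return payload
```

A receiver that calls `unframe` on every incoming message reports truncation or corruption at the point of entry, rather than failing later on damaged data.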
If an error is severe enough to cause the program to terminate abnormally, the existence of a bug becomes obvious. If the program detects a less serious problem, the bug can be recognized, provided error and/or log messages are monitored. However, if the error is minor and only causes wrong results, it becomes much more difficult to detect that a bug exists; this is especially true if it is difficult or impossible to verify the results of the program.
The goal of this step is to identify the symptoms of the bug. Observing the symptoms of the problem, under what conditions the problem is detected, and what work-arounds, if any, have been found, will greatly help the remaining steps to debugging the problem.
Isolate source of bug
This step is often the most difficult (and therefore rewarding) step in debugging. The idea is to identify what portion of the system is causing the error. Unfortunately, the source of the problem isn't always the same as the source of the symptoms. For example, if an input record is corrupted, an error may not occur until the program is processing a different record, or performing some action based on the erroneous information, which could happen long after the record was read.
This step often involves iterative testing. The programmer might first verify that the input is correct, then that it was read correctly, then that it was processed correctly, and so on. For modular systems, this step can be a little easier, because the validity of data passed across the interfaces between modules can be checked. If a module's input was correct but its output was not, then the source of the error is within that module. By iteratively testing inputs and outputs, the debugger can narrow the error down to within a few lines of code.
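The idea of checking data at module boundaries can be sketched in Python as follows. The three stages (`parse`, `total`, `report`), the deliberately planted bug, and the helper `first_faulty_stage` are all hypothetical, chosen only to show the technique.

```python
def parse(text):
    """Stage 1: turn '1,2,3' into a list of integers."""
    return [int(item) for item in text.split(",")]


def total(values):
    """Stage 2: sum the values (deliberate bug: skips the first one)."""
    return sum(values[1:])


def report(amount):
    """Stage 3: format the result for display."""
    return f"total={amount}"


def first_faulty_stage(data, stages):
    """Run each stage in order and check its output at the boundary.

    Returns the name of the first stage whose output fails its check,
    or None if every check passes.
    """
    for name, run, output_ok in stages:
        data = run(data)
        if not output_ok(data):
            return name
    return None


# Each entry: (stage name, stage function, check on that stage's output).
stages = [
    ("parse", parse, lambda values: values == [1, 2, 3]),
    ("total", total, lambda amount: amount == 6),
    ("report", report, lambda line: line == "total=6"),
]

faulty = first_faulty_stage("1,2,3", stages)
print(faulty)
```

Because the output of `parse` passes its check and the output of `total` does not, the search is narrowed to the body of `total`.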
Skilled debuggers are often able to hypothesize where the problem might be (based on analogies to previous similar situations), and test the inputs and outputs of the suspected areas of the program. This form of debugging is an instance of the scientific method. Less skilled debuggers often step sequentially through the program, looking for a place where the behavior of the program differs from what is expected. Note that this is still a form of scientific method, as the programmer must decide what variables to examine when looking for unusual behavior. Another approach is to use a "binary search" type of isolation process. By testing sections near the middle of the data / processing flow, the programmer can determine whether the error happens during earlier or later sections of the program. If no data problems are detected at the midpoint, then the error is probably later in the process; otherwise it is probably earlier, and the same test can be repeated on the remaining half.
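The midpoint isolation approach described above might look like the following Python sketch. It assumes the processing flow can be expressed as a list of stage functions, that the input is known to be good and the final output known to be bad, and that data stays bad once it has gone bad. The names `run_prefix` and `bisect_first_bad` and the example stages are hypothetical.

```python
def run_prefix(stages, data, count):
    """Run only the first `count` stages and return the intermediate data."""
    for stage in stages[:count]:
        data = stage(data)
    return data


def bisect_first_bad(stages, data, is_ok):
    """Return the index of the first stage whose output fails is_ok."""
    good, bad = 0, len(stages)  # OK after `good` stages, bad after `bad` stages
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_ok(run_prefix(stages, data, middle)):
            good = middle  # error is later in the flow
        else:
            bad = middle   # error is earlier in the flow
    return bad - 1


def increment(values):
    return [value + 1 for value in values]


def append_sentinel(values):
    return values + [-1]  # deliberate bug: introduces a negative value


def double(values):
    return [value * 2 for value in values]


pipeline = [increment] * 5 + [append_sentinel] + [double] * 2


def no_negatives(values):
    return all(value >= 0 for value in values)


culprit = bisect_first_bad(pipeline, [1, 2, 3], no_negatives)
print(culprit, pipeline[culprit].__name__)
```

With eight stages, three midpoint checks are enough to locate the faulty stage, where stepping sequentially could need up to eight.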
Identify cause of bug
Having found the location of the bug, the next step is to determine the actual cause of the bug, which might involve other sections of the program. For example, if it has been determined that the program faults because a field is wrong, the next step is to identify why the field is wrong. This is the actual source of the bug, although some would argue that the inability of a program to handle bad data can be considered a bug as well.
A good understanding of the system is vital to successfully identifying the source of the bug. A trained debugger can isolate where a problem originates, but only someone familiar with the system can accurately identify the actual cause behind the error. In some cases it might be external to the system: the input data was incorrect. In other cases it might be due to a logic error, where correct data was handled incorrectly. Other possibilities include unexpected values, where the initial assumptions were that a given field can have only "n" values, when in fact, it can have more, as well as unexpected combinations of values in different fields (field x was only supposed to have that value when field y was something different). Another possibility is incorrect reference data, such as a lookup table containing incorrect values relative to the record that was corrupted.
Having determined the cause of the bug, it is a good idea to examine similar sections of the code to see if the same mistake is repeated elsewhere. If the error was clearly a typo, this is less likely, but if the original programmer misunderstood the initial design and/or requirements, the same or similar mistakes could have been made elsewhere.
Determine fix for bug
Having identified the source of the problem, the next task is to determine how the problem can be fixed. An intimate knowledge of the existing system is essential for all but the simplest of problems. This is because the fix will modify the existing behavior of the system, which may produce unexpected results. Furthermore, fixing an existing bug can often either create additional bugs, or expose other bugs that were already present in the program, but never exposed because of the original bug. These problems are often caused by the program executing a previously untested branch of code, or under previously untested conditions.
In some cases, a fix is simple and obvious. This is especially true for logic errors where the original design was implemented incorrectly. On the other hand, if the problem uncovers a major design flaw that permeates a large portion of the system, then the fix might range from difficult to impossible, requiring a total rewrite of the application.
In some cases, it might be desirable to implement a "quick fix", followed by a more permanent fix. This decision is often made by considering the severity, visibility, frequency, and side effects of the problem, as well as the nature of the fix, and product schedules (e.g., are there more pressing problems?).
Fix and test
After the fix has been applied, it is important to test the system and determine that the fix handles the former problem correctly. Testing should answer two questions: (1) does the fix now handle the original problem correctly, and (2) has the fix created any undesirable side effects?
For large systems, it is a good idea to have regression tests, a series of test runs that exercise the system. After significant changes and/or bug fixes, these tests can be repeated at any time to verify that the system still executes as expected. As new features are added, additional tests can be included in the test suite.
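A small regression test might look like the sketch below, written with Python's standard `unittest` module. The function `parse_price` and the bug it once had (reading "12.5" as 5 cents rather than 50) are invented for illustration.

```python
import unittest


def parse_price(text: str) -> int:
    """Convert a price such as '12.50' into cents (hypothetical example)."""
    dollars, _, cents = text.strip().partition(".")
    cents = (cents + "00")[:2]  # the fix: pad so '12.5' means 50 cents
    return int(dollars) * 100 + int(cents)


class ParsePriceRegressionTests(unittest.TestCase):
    def test_original_bug_is_fixed(self):
        # Purpose 1: the fix handles the original problem.
        self.assertEqual(parse_price("12.5"), 1250)

    def test_existing_behaviour_is_unchanged(self):
        # Purpose 2: the fix has not introduced side effects.
        self.assertEqual(parse_price("12.50"), 1250)
        self.assertEqual(parse_price("7"), 700)
        self.assertEqual(parse_price(" 0.05 "), 5)
```

Saved in a file such as `test_prices.py` (a hypothetical name), the suite can be rerun after every change with `python -m unittest test_prices`.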
Steps to reduce debugging
There are concrete steps that can be taken to reduce the amount of time spent debugging software. These are listed in the sections below.
The correct mindset
Probably the most important thing you can do when you are starting to debug a program is to realize that you don't understand what is going on. Programmers who are convinced that their program should work fine are less likely to find errors simply because they are refusing to admit their confusion. If the program behaved the way you think it does, you wouldn't be debugging; the program would be working fine. Even when the program appears to work, if you examine it with the thought that there is at least one bug remaining and you are going to find it, then you are more likely to find something wrong with the program (assuming a bug still exists).
Start at the source
The time when you are most aware of where problems are likely to arise is usually when first designing and writing the code. By inserting integrity checks at various places within the program, problems can be detected and reported by the program itself. In addition to detecting problems, consideration should be given to how best to handle each error. Options include:
- Report error, set invalid fields to a default value, and continue
- Report error, discard the record associated with the invalid value, and continue
- Report error, transfer invalid record into separate file/table so the user can examine and possibly correct the problem
- Report error and terminate the program
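The four options above can be sketched in Python as a policy chosen when the records are processed. The record layout (a dictionary with an `age` field), the valid range, and the names `Policy` and `process` are assumptions made for this example.

```python
import logging
from enum import Enum

log = logging.getLogger("records")


class Policy(Enum):
    DEFAULT = "default"        # report, substitute a default, continue
    DISCARD = "discard"        # report, drop the record, continue
    QUARANTINE = "quarantine"  # report, set the record aside for review
    TERMINATE = "terminate"    # report and stop the program


def process(records, policy, default_age=0):
    """Return (accepted, quarantined) lists according to the policy."""
    accepted, quarantined = [], []
    for record in records:
        age = record.get("age")
        if isinstance(age, int) and 0 <= age <= 150:  # integrity check
            accepted.append(record)
            continue
        log.error("invalid age in record %r", record)  # always report
        if policy is Policy.DEFAULT:
            accepted.append({**record, "age": default_age})
        elif policy is Policy.QUARANTINE:
            quarantined.append(record)
        elif policy is Policy.TERMINATE:
            raise SystemExit(f"invalid record: {record!r}")
        # Policy.DISCARD: nothing more to do, the record is dropped.
    return accepted, quarantined
```

In a real system the quarantined records would typically be written to a separate file or table rather than held in a list.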
Treat user input with suspicion
Any data that originated from users (including external systems) should be treated with suspicion. Carefully validate all such input data, performing syntactic and semantic integrity checks. Invalid data of this kind are a common source of programming errors. Think not just of data entered in error, but of malicious data as well, as in buffer overflow exploits.
If data are entered interactively by users, you can provide appropriate error messages and allow the user to correct the invalid field(s). If data are not from an interactive source, then the erroneous records should be handled as described above.
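As a rough Python sketch of the two kinds of check, the function below validates a birth date: first its syntax (does it look like a date?), then its semantics (does the date exist, and is it plausible?). The field, the `YYYY-MM-DD` format, and the name `validate_birth_date` are assumptions for the example.

```python
import re
from datetime import date

DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def validate_birth_date(text, today):
    """Return (value, None) on success or (None, error message) on failure."""
    # Syntactic check: the text must have the expected shape.
    match = DATE_PATTERN.fullmatch(text.strip())
    if match is None:
        return None, "expected a date in the form YYYY-MM-DD"
    year, month, day = map(int, match.groups())
    # Semantic check 1: the date must exist in the calendar.
    try:
        value = date(year, month, day)
    except ValueError:
        return None, "that date does not exist in the calendar"
    # Semantic check 2: the date must make sense for this field.
    if value > today:
        return None, "a birth date cannot be in the future"
    return value, None
```

For interactive input, the returned message can be shown to the user so that the field can be corrected; for batch input, a failed record can be handled by one of the error policies listed earlier.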
Use of log files
Programs that write information to log files can provide significant information that can be used to analyze what was going on before, during, and after problems are encountered. The number of entries to be searched can be reduced by creating various log files, such as a separate log for each major component of the system, plus one log file strictly for errors. Each entry should be date/time stamped so that entries from different logs can be correlated.
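One possible arrangement, sketched with Python's standard `logging` module, gives each component its own file and sends errors from every component to a shared error log. The component names, the `app.` logger prefix, the file names, and the function `configure_logging` are assumptions for this example.

```python
import logging
from pathlib import Path

# %(asctime)s puts a date/time stamp on every entry so that
# entries from different log files can be correlated.
LOG_FORMAT = "%(asctime)s %(name)s %(levelname)s %(message)s"


def configure_logging(log_dir, components):
    """Create one log file per component plus a shared errors.log."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    formatter = logging.Formatter(LOG_FORMAT)

    # Shared handler: receives only ERROR and above, from every component.
    error_handler = logging.FileHandler(log_dir / "errors.log")
    error_handler.setLevel(logging.ERROR)
    error_handler.setFormatter(formatter)

    loggers = {}
    for name in components:
        component_handler = logging.FileHandler(log_dir / f"{name}.log")
        component_handler.setFormatter(formatter)
        logger = logging.getLogger(f"app.{name}")
        logger.setLevel(logging.INFO)
        logger.propagate = False  # keep entries out of the root logger
        logger.addHandler(component_handler)
        logger.addHandler(error_handler)
        loggers[name] = logger
    return loggers
```

After `logs = configure_logging("logs", ["parser", "billing"])`, a call such as `logs["billing"].error("negative total")` is written both to `billing.log` and to `errors.log`, while informational entries go only to the component's own file. The function is meant to be called once at startup; calling it again would attach duplicate handlers.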
Test suites
A standard set of tests that can be run to perform regression tests can assist in finding errors before they make it into production. These test cases should be automated as much as possible to reduce the effort required to run them. As new features are added to the system, additional tests should be created to exercise those features.
Change one thing at a time
When making a lot of changes, apply them incrementally. Add one change, and then test that change thoroughly before starting on the next change. This will reduce the number of possible sources of new bugs. If several different changes are applied at the same time, then it is much more difficult to identify the source of the problem. Furthermore, minor errors in different areas can interact to produce errors that never would have happened if those changes had been applied one at a time.
Back out changes that have no effect
If you make a change to fix a problem, but the program still behaves the same, back out those changes before proceeding. The fact that your changes didn't do anything indicates one of several things:
- The problem is not where you think it is
- The area you modified either isn't being called, or isn't being called the way you think it is
- Assuming the section you changed wasn't executed, you might have introduced new bugs that won't appear until you fix the current bug
Think of similar situations
When a bug has been found, think of other places where the same mistake might have been made. Check those places and see if the same problem exists there as well.
See also
- Crash (computing)
- Debugger
- Error-correction
- Breakpoint
- Memory debugger
- Magic number (programming) (section "Magic debug values")
- Computer programming
- Software testing
- Flaw detection
- Framepointer
References
- David J. Agans: Debugging: The Nine Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems, AMACOM, ISBN 0814471684
- Matthew A. Telles, Yuan Hsieh: The Science of Debugging, The Coriolis Group, ISBN 1576109178
- Robert Metzger: Debugging by Thinking : A Multidisciplinary Approach, Digital Press, ISBN 1555583075
- John Robbins: Debugging Applications, Microsoft Press, ISBN 0735608865
- Ann R. Ford, Toby J. Teorey: Practical Debugging in C++, Prentice Hall, ISBN 0130653942
- Bill Blunden: Software Exorcism: A Handbook for Debugging and Optimizing Legacy Code, APress, ISBN 1590592344
- Frederick Phillips Brooks: The Mythical Man-Month: Essays on Software Engineering, Pearson Addison Wesley, ISBN 0201006502
- Glenford J Myers: Software Reliability: Principles and Practices, John Wiley & Sons inc, ISBN 0471627658
- Glenford J Myers: The Art of Software Testing, John Wiley & Sons inc, ISBN 0471043281
External links
- Algorithmic and Automatic Debugging - extensive collection of links to debugging tools and methods
- Debugging Backwards in Time - Omniscient Debugging
- Citations from CiteSeer
- Learn the essentials of debugging - an article presenting a debugging methodology
- Debugging Software Crashes in C
- Debugging Software Crashes in C++