This can be difficult to determine. Many modern software applications are so complex, and run in such an interdependent environment, that complete testing can never be done. "When to stop testing" is one of the most difficult questions for a test engineer. Common factors in deciding when to stop are:
- Deadlines (release deadlines, testing deadlines)
- Test cases completed with certain percentages passed
- Test budget depleted
- Coverage of code/functionality/requirements reaches a specified point
- The rate at which bugs can be found is too small
- Beta or Alpha Testing period ends
- The risk in the project is under an acceptable limit.
Practically, we feel that the decision to stop testing is based on the level of risk acceptable to management. Since testing is a never-ending process, we can never assume that 100% testing has been done; we can only minimize the risk of shipping the product to the client with X amount of testing done. The risk can be measured by formal risk analysis, but for a short-duration, low-budget, low-resource project it can be gauged simply by:
- Measuring Test Coverage.
- Number of test cycles.
- Number of high priority bugs.
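As a rough illustration of how these three measures might be folded into a go/no-go decision, here is a hypothetical heuristic; the function name and thresholds are invented for this sketch and are not part of any standard.

```python
# Hypothetical heuristic (illustrative only): turn the three measures above
# into a coarse shipping-risk label.
def shipping_risk(coverage_pct, test_cycles_run, open_high_priority_bugs):
    """Return a coarse risk label from basic test metrics."""
    if open_high_priority_bugs > 0:
        return "high"       # any open high-priority bug keeps risk high
    if coverage_pct < 80 or test_cycles_run < 2:
        return "medium"     # thresholds here are arbitrary examples
    return "low"

print(shipping_risk(coverage_pct=92, test_cycles_run=3, open_high_priority_bugs=0))  # low
```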
The Software Assurance Technology Center (SATC) in the Systems Reliability
and Safety Office at Goddard Space Flight Center (GSFC) is investigating the use
of software error data as an indicator of testing status. Items of interest for
determining the status of testing include projections of the number of errors
remaining in the software and the expected amount of time to find some
percentage of the remaining errors.
To project the number of errors remaining in
software, one needs an estimate of the total number of errors in the software at
the start of testing and a count of the errors found and corrected throughout
testing. There are a number of models that reasonably fit the rate at which
errors are found in software; the most commonly used is referred to in this
paper as the Musa model. This model is not easily applicable at GSFC, however,
due to the limited availability and quality of the error data.
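For reference, the basic Musa execution-time model expresses the expected cumulative number of failures found as an exponential function of testing time. One common statement of it (the symbols here follow Musa's usual parameterization and are not taken from this text) is

$$\mu(\tau) = \nu_0\left(1 - e^{-\lambda_0 \tau / \nu_0}\right),$$

where $\nu_0$ is the total number of failures expected, $\lambda_0$ is the initial failure intensity, and $\tau$ is the cumulative execution (testing) time.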
At GSFC, useful error data is not easy to obtain for projects not in the
Software Engineering Laboratory. Of the projects studied by the SATC, only a few
had an organized accounting scheme for tracking errors, and even those often did not
have a consistent format for recording errors. Some projects recorded the errors that
were found but did not record any information about the resources applied to
testing. The error data frequently contained the date of entry of the error data
rather than the actual date of error discovery. In order to use traditional
models such as the Musa model for estimating the cumulative number of errors,
one needs fairly precise data on the time of discovery of errors and the level
of resources applied to testing. Real-world software projects are generally not
very accommodating when it comes to either accuracy or completeness of error
data. The models developed by the SATC to perform trending and prediction on
error data attempt to compensate for these shortcomings in the quantity and
availability of project data.
To compensate for the quality of the error data, the SATC developed
software error trending models using two techniques, each based on the basic
Musa model, but with the constant in the exponential term replaced by a function
of time that describes the 'intensity' of the testing effort. The shape and the
parameters for this function can be estimated using measures such as CPU time or
staff hours devoted to testing. The first technique involves fitting cumulative
error data to the modified Musa model using a least squares fit that is based on
gradient methods. This technique requires data on errors found and the number of
staff hours devoted to testing each week of the testing activity. The second
technique uses a Kalman filter to estimate both the total number of errors in
the software and the level of testing being performed. This technique requires
error data and initial estimates of the total number of errors and the initial
amount of effort applied to testing.
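The fragment below is a minimal sketch of the first technique only (the gradient-based least-squares fit; the Kalman-filter variant is not shown). It fits made-up weekly error counts and staff-hours to a modified Musa-style form N = A(1 - e^(-b*R)), where R is the cumulative testing effort; all data, names, and starting values are assumptions for illustration and are not SATC data or code.

```python
# Sketch: least-squares fit of cumulative errors to a modified Musa-style model
#   N(R) = A * (1 - exp(-b * R)),  R = cumulative staff-hours of testing.
# All numbers below are invented for illustration.
import numpy as np
from scipy.optimize import curve_fit

staff_hours_per_week = np.array([10, 25, 40, 60, 70, 70, 65, 50, 40, 30, 20, 10], float)
cumulative_effort = np.cumsum(staff_hours_per_week)                    # R(t)
cumulative_errors = np.array([5, 18, 33, 51, 66, 74, 80, 83, 85, 86, 87, 88], float)

def modified_musa(effort, total_errors, discovery_rate):
    """Cumulative errors found as a function of cumulative testing effort."""
    return total_errors * (1.0 - np.exp(-discovery_rate * effort))

# curve_fit performs a gradient-based nonlinear least-squares fit.
popt, _ = curve_fit(modified_musa, cumulative_effort, cumulative_errors, p0=[100.0, 0.01])
total_estimate, rate_estimate = popt
print(f"Estimated total errors: {total_estimate:.1f}")
print(f"Estimated errors remaining: {total_estimate - cumulative_errors[-1]:.1f}")
```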
The SATC has currently examined and modeled error data from a limited number
of projects. Generally, only the date on which an error was entered into the
error tracking system was available, not the date of discovery of the error. No
useful data was available on human or computer resources expended for testing.
What is needed for the most accurate model is the total time expended for
testing, even if the times are approximate. Using the sum of reported times to
find/fix individual errors did not produce any reasonable correlation with the
resource function required. Some indirect attempts to estimate resource usage,
however, led to some very good fits.
On one project, errors were reported along with the name of the person who found the error. Resource usage for testing was estimated as follows: a person was assumed to be working on the testing effort over a period beginning with the first error they reported and ending with the last error they reported. The percentage of time each person worked during that period was assumed to be an unknown constant that did not differ from person to person. Using this technique led to a resource curve that closely resembled the Rayleigh curve (Figure 1).
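A sketch of that indirect estimate, under the stated assumption that each tester works at a constant, person-independent level between their first and last reported error, might look like the following; the tester names and dates are invented, and this is not the SATC's code.

```python
# Sketch: build a relative testing-resource curve from (tester, report date) pairs,
# assuming each tester is active from their first to their last reported error.
from collections import defaultdict
from datetime import date, timedelta

reports = [                      # invented example records
    ("ann", date(2024, 1, 3)),  ("ann", date(2024, 2, 20)),
    ("bob", date(2024, 1, 15)), ("bob", date(2024, 3, 10)),
    ("cho", date(2024, 2, 1)),  ("cho", date(2024, 3, 25)),
]

# First and last report date per tester define that tester's activity window.
windows = defaultdict(lambda: [date.max, date.min])
for tester, day in reports:
    first, last = windows[tester]
    windows[tester] = [min(first, day), max(last, day)]

# Count active testers per day to get the relative resource curve.
start = min(first for first, _ in windows.values())
end = max(last for _, last in windows.values())
resource_curve = []
day = start
while day <= end:
    active = sum(first <= day <= last for first, last in windows.values())
    resource_curve.append((day, active))
    day += timedelta(days=1)
```

The resulting relative curve can then be compared against a Rayleigh shape, which rises to a single peak and tails off, roughly $f(t) \propto t\,e^{-t^2/(2\sigma^2)}$.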
Figure 1: Test Resource Levels for Project A
On most of the projects, there was good conformity between the trend model and the reported error data. More importantly, estimates of the total number of errors and the error discovery parameter, made fairly early in the testing activity, seemed to provide reliable indicators of the total number of errors actually found and the time it took to find future errors.
Figure 2 shows the
relationship between reported errors and the SATC trend model for one project.
The graph represents data available at the conclusion of the project. This close
fit was also found on other projects when sufficient data was available.
Figure 2: Cumulative Software Errors for Project A
On another project, different estimates of the total number of errors
were obtained when estimates were made over different testing time
intervals. That is, there was inconsistent agreement between the trend
model and the error data over different time intervals. Through
subsequent discussion with the project manager it was learned that the
rate of error reporting by the project went from approximately 100%
during integration testing to 40% during acceptance testing.
Furthermore, there was a significant amount of code rework, and testing
of the software involved a sequential strategy of completely testing a single functional area before proceeding to test the next functional area of the code.
Thus, the instability of the estimates of the total
errors was a useful indicator of the fact that there was a significant
change in the project's testing or reporting process. Figure 3
shows the results for this project. Note the change in slope of the
reported number of errors occurring around 150 days. The data curve
flattens at the right end due to a pause in testing, rather
than a lack of error detection. This project is still undergoing
testing.
Figure 3: Cumulative S/W Errors for Project B - Flight S/W
If error data is broken into the distinct testing phases of the life cycle
(e.g., unit, system, integration), the projected error curve using the SATC
model closely fits the rate at which errors are found in each phase.
Some points need to be clarified about the SATC error trend model. The
formulation of the SATC equation is the direct result of assuming that at any
instant of time, the rate of discovery of errors is proportional to the number
of errors remaining in the software and to the resources applied to finding
errors (a sketch of the resulting equation is given after the list below). Additional conditions needed for the SATC trending model to be valid are:
- The code being tested is not being substantially altered during the testing process, especially through the addition or rework of large amounts of code.
- All errors found are reported.
- All of the software is tested, and testing of the software is uniform throughout the time of the testing activity.
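As a sketch of that formulation (the notation here is assumed, not taken from the SATC paper): if $A$ is the number of errors present at the start of testing, $E(t)$ the cumulative errors found by time $t$, and $r(t)$ the testing resource or 'intensity' applied at time $t$, the proportionality assumption reads

$$\frac{dE}{dt} = b\,r(t)\,\big(A - E(t)\big), \qquad\text{so}\qquad E(t) = A\left(1 - e^{-b\int_0^t r(s)\,ds}\right),$$

which reduces to the basic Musa exponential form when $r(t)$ is constant.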
Condition 1 is present to ensure that the total number of errors is a
relatively stable number throughout the testing activity. Conditions 2 and 3 are
present to ensure that the estimate of the total number of errors is in fact an
estimate of the total errors present in the software at the start of testing -
no new errors are introduced during testing. If testing is not "uniform" then
the rate of error discovery will not necessarily be proportional to the number
of errors remaining in the software and so the equation will not be an
appropriate model for errors found. No attempt will be made here to make precise
the meaning of the word "uniform".
The SATC developed this model rather than using the standard Musa model
because it seems less sensitive to data inaccuracy and provides for non-constant
testing resource levels. An additional benefit from this work is the application
of the Rayleigh curve for estimating resource usage. Future work by the SATC
will continue to seek a practical balance between available trend analysis
theory and the constraints imposed by the limited accuracy and availability of
data from real-world projects.