Performance testing with pyUnitPerf

Summary: If you are familiar with Mike Clark's JUnitPerf framework, then you may be interested in knowing that I have just released a Python port: pyUnitPerf. You can browse the source code here and you can get a gzipped tarball from here.

Details

pyUnitPerf tests are meant to transparently add performance testing capabilities to existing pyUnit test suites. The pyUnitPerf framework introduces two new types of tests:
  • TimedTest: runs an existing pyUnit test case and fails if it exceeds a specified time limit
  • LoadTest: runs an existing pyUnit test case while simulating concurrent users and iterations
Let's look at a simple example adapted from the samples provided with JUnitPerf.

Assume you have the following pyUnit test case in a file called ExampleTestCase.py:
from unittest import TestCase, TestSuite, TextTestRunner, makeSuite
import time


class ExampleTestCase(TestCase):

    def __init__(self, name):
        TestCase.__init__(self, name)

    def testOneSecondResponse(self):
        time.sleep(1)

    def suite(self):
        return makeSuite(self.__class__)


if __name__ == "__main__":
    example = ExampleTestCase("testOneSecondResponse")
    runner = TextTestRunner()
    runner.run(example.suite())  # run the suite so the module does something when executed directly
Admittedly this is a contrived example, since the testOneSecondResponse method simply sleeps for 1 second and does not actually test anything, but it serves to illustrate the pyUnitPerf functionality.

Assume you want to create a timed test that waits for the completion of the ExampleTestCase.testOneSecondResponse method and then fails if the elapsed time exceeds 1 second. With pyUnitPerf, all you need to do is write the following code in a file called ExampleTimedTest.py:
from unittest import TestSuite, TextTestRunner

from ExampleTestCase import ExampleTestCase
from LoadTest import LoadTest
from TimedTest import TimedTest


class ExampleTimedTest:

    def __init__(self):
        self.toleranceInSec = 0.05

    def suite(self):
        s = TestSuite()
        s.addTest(self.make1SecondResponseTimedTest())
        return s

    def make1SecondResponseTimedTest(self):
        """
        Decorates a one second response time test as a
        timed test with a maximum elapsed time of 1 second.
        """
        maxElapsedTimeInSec = 1 + self.toleranceInSec

        testCase = ExampleTestCase("testOneSecondResponse")
        timedTest = TimedTest(testCase, maxElapsedTimeInSec)
        return timedTest


if __name__ == "__main__":
    TextTestRunner(verbosity=2).run(ExampleTimedTest().suite())

The suite() method constructs a TestSuite object and adds to it the test object returned by the make1SecondResponseTimedTest method. This method instantiates an ExampleTestCase object, passing it the method name to be tested: testOneSecondResponse. We then pass the testCase object to a TimedTest object, together with the desired maximum time to wait for the completion of the test (to which we add a 50 msec. tolerance to account for time potentially spent setting up and tearing down the test case). In the __main__ section of the module, we simply call the pyUnit TextTestRunner, passing it the suite.

If you run: python ExampleTimedTest.py at a command prompt, you will get the following output:
testOneSecondResponse (ExampleTestCase.ExampleTestCase) ... ok
TimedTest (WAITING): testOneSecondResponse (ExampleTestCase.ExampleTestCase): 1.0 sec.

----------------------------------------------------------------------
Ran 1 test in 1.000s

OK

Now let's make the test fail by requiring the timed test to finish in 0.9 seconds. To do this, simply change
maxElapsedTimeInSec = 1 + self.toleranceInSec
to
maxElapsedTimeInSec = 0.9 + self.toleranceInSec
Running python ExampleTimedTest.py now results in the following output:
testOneSecondResponse (ExampleTestCase.ExampleTestCase) ... ok
TimedTest (WAITING): testOneSecondResponse (ExampleTestCase.ExampleTestCase): 1.0 sec.
FAIL

======================================================================
FAIL: testOneSecondResponse (ExampleTestCase.ExampleTestCase)
----------------------------------------------------------------------
AssertionFailedError: Maximum elapsed time exceeded! Expected 0.95 sec., but was 1.0 sec.

----------------------------------------------------------------------
Ran 1 test in 1.000s

FAILED (failures=1)

Note that the test result for the pyUnit test case (ExampleTestCase.testOneSecondResponse) is still marked as OK, but the test result for the TimedTest is marked as FAILED, since the time it took was longer than the specified maximum of 0.95 sec.

Let's look at an example of a LoadTest. The following code can be saved in a file called ExampleLoadTest.py:
from unittest import TestSuite, TextTestRunner

from ExampleTestCase import ExampleTestCase
from LoadTest import LoadTest
from TimedTest import TimedTest


class ExampleLoadTest:

    def __init__(self):
        self.toleranceInSec = 0.05

    def suite(self):
        s = TestSuite()
        s.addTest(self.make1SecondResponseSingleUserLoadTest())
        s.addTest(self.make1SecondResponseMultipleUserLoadTest())
        s.addTest(self.make1SecondResponse1UserLoadIterationTest())
        return s

    def make1SecondResponseSingleUserLoadTest(self):
        """
        Decorates a one second response time test as a single-user
        load test with a maximum elapsed time of 1 second
        and a 0 second delay between users.
        """
        users = 1
        maxElapsedTimeInSec = 1 + self.toleranceInSec

        testCase = ExampleTestCase("testOneSecondResponse")
        loadTest = LoadTest(testCase, users)
        timedTest = TimedTest(loadTest, maxElapsedTimeInSec)
        return timedTest

    def make1SecondResponseMultipleUserLoadTest(self):
        """
        Decorates a one second response time test as a multiple-user
        load test with a maximum elapsed time of 1.5 seconds
        and a 0 second delay between users.
        """
        users = 10
        maxElapsedTimeInSec = 1.5 + self.toleranceInSec

        testCase = ExampleTestCase("testOneSecondResponse")
        loadTest = LoadTest(testCase, users)
        timedTest = TimedTest(loadTest, maxElapsedTimeInSec)
        return timedTest

    def make1SecondResponse1UserLoadIterationTest(self):
        """
        Decorates a one second response time test as a single-user
        load test with 10 iterations per user, a maximum
        elapsed time of 10 seconds, and a 0 second delay
        between users.
        """
        users = 1
        iterations = 10
        maxElapsedTimeInSec = 10 + self.toleranceInSec

        testCase = ExampleTestCase("testOneSecondResponse")
        loadTest = LoadTest(testCase, users, iterations)
        timedTest = TimedTest(loadTest, maxElapsedTimeInSec)
        return timedTest


if __name__ == "__main__":
    TextTestRunner(verbosity=1).run(ExampleLoadTest().suite())

The 3 methods defined in ExampleLoadTest cover some of the most commonly used load test scenarios. See the doc strings at the beginning of each method for more details. Running python ExampleLoadTest.py generates this output:
.TimedTest (WAITING): LoadTest (NON-ATOMIC): ThreadedTest: testOneSecondResponse (ExampleTestCase.ExampleTestCase): 1.03099989891 sec.
..........TimedTest (WAITING): LoadTest (NON-ATOMIC): ThreadedTest: testOneSecondResponse (ExampleTestCase.ExampleTestCase): 1.0150001049 sec.
..........TimedTest (WAITING): LoadTest (NON-ATOMIC): ThreadedTest: testOneSecondResponse (ExampleTestCase.ExampleTestCase)(repeated): 10.0 sec.

----------------------------------------------------------------------
Ran 21 tests in 12.046s

OK

This time all the tests passed. Note that the multiple user load test (make1SecondResponseMultipleUserLoadTest) runs the individual test cases in parallel, each test case in its own thread, and thus the overall time is only slightly longer than 1 second. The multiple iteration test (make1SecondResponse1UserLoadIterationTest) runs the 10 iterations of the test case sequentially, and thus the overall time is 10 seconds.
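To make the threading behavior concrete, here is a minimal sketch of the idea, not the actual pyUnitPerf source: each simulated user runs the wrapped pyUnit test in its own thread, while iterations for a single user run sequentially (run_concurrently and make_test are illustrative names introduced here):

import threading
import unittest


def run_concurrently(make_test, users=10, iterations=1):
    """Run a fresh test case `iterations` times in each of `users` threads."""
    results = [unittest.TestResult() for _ in range(users)]

    def run_user(result):
        for _ in range(iterations):
            make_test().run(result)   # iterations for one user run sequentially

    threads = [threading.Thread(target=run_user, args=(r,)) for r in results]
    for t in threads:
        t.start()                     # all simulated users start together
    for t in threads:
        t.join()                      # elapsed time is roughly one run, not users * run time
    return results

For example, run_concurrently(lambda: ExampleTestCase("testOneSecondResponse"), users=10) should finish in roughly one second, which matches the output above.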

We can make some of the tests fail by decreasing the value of maxElapsedTimeInSec, similar to what we did for the TimedTest.

Why should you use pyUnitPerf? Mike Clark makes a great case for using JUnitPerf here. To summarize, you use pyUnitPerf when you have an existing suite of pyUnit tests that verify the correctness of your code, and you want to isolate potential performance issues with your code.

The fact that the pyUnitPerf test suites are completely independent from the pyUnit tests helps you schedule different run times for the two types of tests:
  • you want to run the pyUnit tests very often, since they (should) run fast
  • you want to run the pyUnitPerf tests less frequently, when trying to verify that an identified bottleneck has been eliminated (potential bottlenecks can be pinpointed via profiling, for example); performance tests tend to take longer to run, so they could be scheduled, for example, during a nightly smoke test run

V Model to W Model | W Model in SDLC Simplified

We have already discussed that the V-model is the basis of structured testing. However, there are a few problems with the V-model. It represents a one-to-one relationship between the documents on the left-hand side and the test activities on the right, which is not always correct: system testing depends not only on the functional requirements but also on the technical design and architecture. A couple of testing activities are not explained in the V-model at all. This is a major shortcoming, and the V-model does not support the broader view of testing as a continuous, major activity throughout the software development lifecycle.
Paul Herzlich introduced the W-model, which covers the testing activities that are skipped in the V-model.
The W-model illustrates that testing starts from day one of project initiation.
If you look at the picture below, the first "V" shows all the phases of the SDLC and the second "V" validates each phase. In the first "V", every activity is shadowed by a test activity, whose purpose is to determine whether the objectives of that activity have been met and whether the deliverable meets its requirements. The W-model presents a standard development lifecycle with every development stage mirrored by a test activity. On the left-hand side, the deliverable of a development activity (for example, writing the requirements) is accompanied by a test activity (testing the requirements), and so on.


Fig 1: W Model

Fig 2: Each phase is verified/validated. The dotted arrows show that every phase in brown is validated/tested by a corresponding phase in sky blue.
Now, in the above figure,
  • Point 1 refers to building the test plan and test strategy.
  • Point 2 refers to scenario identification.
  • Points 3 and 4 refer to test case preparation from the specification and design documents.
  • Point 5 refers to reviewing the test cases and updating them as per the review comments.
The five points above cover static testing.
  • Point 6 refers to the various testing methodologies (e.g. unit/integration testing, path testing, equivalence partitioning, boundary value analysis, specification-based testing, security testing, usability testing, performance testing).
  • After this, there are regression test cycles and then user acceptance testing.
Conclusion: the V-model only shows dynamic test cycles, whereas the W-model gives a broader view of testing. The connection between the various test stages and the basis for each test is clear in the W-model, which is not the case in the V-model.
You can find more comparisons of the W-model with other SDLC models here.

V-model is the basis of structured testing

You will find out this is a great model!





  • The left side shows the classic software life cycle and the right side shows the verification and validation for each phase
Analyze User requirements
End users express their wish for a solution to one or more problems they have. In testing, you have to start preparing your user tests at this moment!

You should do test preparation sessions with your acceptance testers. Ask them what cases they want to test. It might help you to find good test cases if you interview end users about the everyday cases they work on. Ask them about the difficulties they meet in their everyday work now.

Give feedback about the results of this preparation (hand over the list of real-life cases and the questions) to the analyst team. Or even better, invite the analyst team to the test preparation sessions. They will learn a lot!

System requirements
One or more analysts interview end users and other parties to find out what is really wanted. They write down what they have found, and usually this is reviewed by the development/technical team, end users and third parties.
In testing you can start now by breaking the analysis down into 'features to test'. One 'feature to test' can have only two answers: 'pass' or 'fail'. One analysis document will yield a number of features to test. Later this will be extremely useful in your quality reporting!

Look for inconsistencies and things you don't understand in the analysis documents. There's a good chance that if you don't understand it, neither will the developers. Feed your questions and remarks back to the analyst team. This is a second review delivered by testing, aimed at finding bugs as early as possible!


Let's discuss the left side of the V-model:
- Global and detailed design
Development translates the analysis documents into technical design.

- Code / Build
Developers program the application and build the application.

- Note: in the classic waterfall software life cycle, testing would come at the end of the life cycle. The V-model is a little different; we have already added some testing reviews to it.

The right side shows the different testing levels:

- Component & Component integration testing
These are the tests the development team performs to make sure that everything in the technical and functional analysis has been implemented properly.

- Component testing (unit testing)
   Every time a developer finishes a part of the application, he should test it to see whether it works properly.

- Component integration testing
   Once a set of application parts is finished, a member of the development team should test whether the different parts do what they have to do.

Once these tests pass successfully, system testing can start.

- System and system integration testing
At this testing level we check whether the features to test, distilled from the analysis documents, are realised properly.

Best results will be achieved when these tests are performed by professional testers.

- System testing
   At this testing level each part (use case, screen description) is tested separately.

- System integration testing
   Different parts of the application are now tested together to examine the quality of the application. This is an important (but sometimes difficult) step.

Typical things to test: navigation between different screens, background processes started from one screen that produce a certain output (a PDF, a database update), consistency in the GUI, and so on.

System integration testing also involves testing the interfaces with other systems. For example, if you have a web shop, you will probably have to test whether the integrated online payment service works.

These interface tests are usually not easy to realise, because you will have to make arrangements with parties outside the project group.

- Acceptance testing
Here real users (= the people who will have to work with it) validate whether this application is what they really wanted. 

This comic explains why end users need to accept the application (its caption: "This is what the client actually needs").

During the project a lot of interpretation has to be done. The analyst team has to translate the wishes of the customer into text. Development has to translate these into program code. Testers have to interpret the analysis to build the features-to-test list.

Tell somebody a phrase. Make him tell this phrase to another person. And this person to another one... Do this 20 times.  You'll be surprised how much the phrase has changed!

This is exactly the same phenomenon you see in software development!  

Let the end users test the application with the real cases you listed in the test preparation sessions. Ask them to use real-life cases!

And, instead of getting angry, listen when they tell you that the application is not doing what it should do. They are the people who will suffer the application's shortcomings for the next couple of years. They are your customer!

Art of Test case writing

Objective and Importance of a Test Case
- The basic objective of writing test cases is to ensure complete test coverage of the application.



  • The most extensive effort in preparing to test software is writing test cases.
  • They give better reliability in estimating the test effort.
  • They improve productivity during test execution by reducing the "understanding" time during execution.
  • Writing effective test cases is a skill that can be acquired through experience and in-depth study of the application for which the test cases are being written.
  • Documenting the test cases prior to test execution ensures that the tester does the 'homework' and is prepared for the 'attack' on the Application Under Test.
  • Breaking down the test requirements into test scenarios and test cases helps the testers avoid missing certain test conditions.

What is a Test Case?
  • It is the smallest unit of testing.
  • A test case is a detailed procedure that fully tests a feature or an aspect of a feature. Whereas the test plan describes what to test, a test case describes how to perform a particular test.
  • A test case has components that describe an input, an action or event, and an expected response, to determine whether a feature of an application is working correctly.
  • Test cases must be written by a team member who thoroughly understands the function being tested.

Elements of a Test Case
Every test case must have the following details:

Anatomy of a Test Case
Test Case ID:
Requirement # / Section:
Objective: [What is to be verified?]
Assumptions & Prerequisites:
Steps to be executed:
Test data (if any): [Variables and their values]
Expected result:
Status: [Pass or Fail, with details of the Defect ID and proof (output files, screenshots) where applicable]
Comments:
Any CMMi company would have defined templates and standards to be adhered to while writing test cases.
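For illustration only, the same anatomy could be captured as a simple structure. This is a hedged sketch; the field names merely mirror the template above and are not a prescribed standard:

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TestCaseRecord:
    test_case_id: str
    requirement: str                   # Requirement # / Section
    objective: str                     # what is to be verified
    prerequisites: List[str] = field(default_factory=list)
    steps: List[str] = field(default_factory=list)
    test_data: Optional[dict] = None   # variables and their values
    expected_result: str = ""
    status: Optional[str] = None       # "Pass" or "Fail", with Defect ID if any
    comments: str = ""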


Language to be used in Test Cases:
1. Use Simple and Easy-to-Understand language.

2. Use the active voice while writing test cases. For example:
- Click on OK button
- Enter the data in screen1
- Choose the option1
- Navigate to the account Summary page.
 

3. Use words like "Verify"/"Validate" to start sentences in the test case description (especially for checking the GUI). For example:
- Validate the fields available in _________ screen/tab.

(Changed as per Rick’s suggestion – See comments)

4. Use words like "is/are" and use the present tense for expected results:
- The application displays the account information screen
- An error message is displayed on entering special characters

Test Design Techniques

The purpose of test design techniques is to identify test conditions and test scenarios through which effective and efficient test cases can be written. Using test design techniques is a much better approach than picking test cases out of the air, and it helps in achieving high test coverage. In this post, we will discuss the following:
1. Black Box Test Design Techniques
  • Specification Based
  • Experience Based
2. White-box or Structural Test design techniques

Black-box testing techniques

These include specification-based and experience-based techniques. They use external descriptions of the software, including specifications, requirements, and design, to derive test cases. These tests can be functional or non-functional, though they are usually functional. The tester does not need any knowledge of the internal structure or code of the software under test.
Specification-based techniques:
  • Equivalence partitioning
  • Boundary value analysis
  • Use case testing
  • Decision tables
  • Cause-effect graph
  • State transition testing
  • Classification tree method
  • Pair-wise testing
From ISTQB Syllabus:
Common features of specification-based techniques:
  • Models, either formal or informal, are used for the specification of the problem to be solved, the software or its components.
  • From these models test cases can be derived systematically.
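To make the first two specification-based techniques above concrete, here is a hedged sketch; the validate_age function and its valid range of 18-65 are invented for the example:

import unittest


def validate_age(age):
    """Hypothetical function: accepts ages in the range 18-65 inclusive."""
    return 18 <= age <= 65


class AgeValidationTest(unittest.TestCase):

    def test_equivalence_partitions(self):
        # one representative value from each partition
        self.assertFalse(validate_age(10))   # below the valid range
        self.assertTrue(validate_age(40))    # inside the valid range
        self.assertFalse(validate_age(80))   # above the valid range

    def test_boundary_values(self):
        # values at and just around each boundary
        self.assertFalse(validate_age(17))
        self.assertTrue(validate_age(18))
        self.assertTrue(validate_age(65))
        self.assertFalse(validate_age(66))


if __name__ == "__main__":
    unittest.main()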
Experience-based techniques:
  • Error Guessing
  • Exploratory Testing

From ISTQB Syllabus:
Common features of experience-based techniques:
  • The knowledge and experience of people are used to derive the test cases.
  • Knowledge of testers, developers, users and other stakeholders about the software, its
    usage and its environment.
  • Knowledge about likely defects and their distribution.

 

White-box techniques

Also referred to as structure-based techniques, these are based on the internal structure of the component. The tester must have knowledge of the internal structure or code of the software under test.
Structural or structure-based techniques include:
  • Statement testing
  • Condition testing
  • LCSAJ (Linear Code Sequence and Jump) testing
  • Path testing
  • Decision testing/branch testing
From ISTQB Syllabus: 
Common features of structure-based techniques:
  • Information about how the software is constructed is used to derive the test cases, for example, code and design.
  • The extent of coverage of the software can be measured for existing test cases, and further test cases can be derived systematically to increase coverage.
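As a small hedged illustration of decision/branch testing from the list above (the classify function is invented for the example), the two tests below exercise both outcomes of the single decision, which also gives full statement coverage:

import unittest


def classify(value):
    """Hypothetical function with a single decision, i.e. two branches."""
    if value < 0:
        return "negative"
    return "non-negative"


class ClassifyBranchTest(unittest.TestCase):

    def test_negative_branch(self):
        self.assertEqual(classify(-1), "negative")

    def test_non_negative_branch(self):
        self.assertEqual(classify(0), "non-negative")


if __name__ == "__main__":
    unittest.main()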

 

Testing Stop Process

This can be difficult to determine. Many modern software applications are so complex, and run in such an interdependent environment, that complete testing can never be done. "When to stop testing" is one of the most difficult questions for a test engineer. Common factors in deciding when to stop are:

  • Deadlines (release deadlines, testing deadlines)
  • Test cases completed with certain percentages passed
  • Test budget depleted
  • Coverage of code/functionality/requirements reaches a specified point
  • The rate at which bugs are found becomes too small
  • Beta or alpha testing period ends
  • The risk in the project is under the acceptable limit

Practically, we feel that the decision to stop testing is based on the level of risk acceptable to management. As testing is a never-ending process, we can never assume that 100% testing has been done; we can only minimize the risk of shipping the product to the client with X amount of testing done. The risk can be measured by risk analysis, but for a small-duration / low-budget / low-resource project, risk can be gauged simply by:
  • measuring test coverage
  • the number of test cycles
  • the number of high-priority bugs
The Software Assurance Technology Center (SATC) in the Systems Reliability and Safety Office at Goddard Space Flight Center (GSFC) is investigating the use of software error data as an indicator of testing status. Items of interest for determining the status of testing include projections of the number of errors remaining in the software and the expected amount of time to find some percentage of the remaining errors.
To project the number of errors remaining in software, one needs an estimate of the total number of errors in the software at the start of testing and a count of the errors found and corrected throughout testing. There are a number of models that reasonably fit the rate at which errors are found in software; the most commonly used is referred to in this paper as the Musa model. This model is not easily applicable at GSFC, however, due to the availability and the quality of the error data.
At GSFC, useful error data is not easy to obtain for projects not in the Software Engineering Laboratory. Of the projects studied by the SATC, only a few had an organized accounting scheme for tracking errors, but they often did not have a consistent format for recording errors. Some projects record errors that were found but did not record any information about resources applied to testing. The error data frequently contained the date of entry of the error data rather than the actual date of error discovery. In order to use traditional models such as the Musa model for estimating the cumulative number of errors, one needs fairly precise data on the time of discovery of errors and the level of resources applied to testing. Real world software projects are generally not very accommodating when it comes to either accuracy or completeness of error data. The models developed by the SATC to perform trending and prediction on error data attempt to compensate for these shortcomings in the quantity and availability of project data.
In order to compensate for the quality of the error data, the SATC developed a software error trending models using two techniques, each based on the basic Musa model, but with the constant in the exponential term replaced by a function of time that describes the 'intensity' of the testing effort. The shape and the parameters for this function can be estimated using measures such as CPU time or staff hours devoted to testing. The first technique involves fitting cumulative error data to the modified Musa model using a least squares fit that is based on gradient methods. This technique requires data on errors found and the number of staff hours devoted to testing each week of the testing activity. The second technique uses a Kalman filter to estimate both the total number of errors in the software and the level of testing being performed. This technique requires error data and initial estimates of the total number of errors and the initial amount of effort applied to testing.
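As a hedged sketch of the first technique (not the SATC's actual code), cumulative error counts can be fitted to a simple Musa-style saturation curve with a least-squares routine; the data arrays below are invented placeholders, and in the SATC model the constant b would be replaced by a time-dependent testing-intensity function:

import numpy as np
from scipy.optimize import curve_fit


def musa_curve(t, total_errors, b):
    # cumulative errors found by time t, assuming a constant intensity b
    return total_errors * (1.0 - np.exp(-b * t))


weeks = np.arange(1, 11)                                   # testing week number
cumulative_errors = np.array([5, 12, 20, 26, 31, 35, 38, 40, 41, 42])

(total_est, b_est), _ = curve_fit(musa_curve, weeks, cumulative_errors, p0=[50.0, 0.1])
print("Estimated total errors: %.1f" % total_est)
print("Estimated errors remaining: %.1f" % (total_est - cumulative_errors[-1]))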

The SATC has currently examined and modeled error data from a limited number of projects. Generally, only the date on which an error was entered into the error tracking system was available, not the date of discovery of the error. No useful data was available on human or computer resources expended for testing.  
What is needed for the most accurate model is the total time expended for testing, even if the times are approximate. Using the sum of reported times to find/fix individual errors did not produce any reasonable correlation with the resource function required. Some indirect attempts to estimate resource usage, however, led to some very good fits.
On one project errors were reported along with the name of the person that found the error. Resource usage for testing was estimated as follows: A person was estimated to be working on the testing effort over a period beginning with the first error that they reported and ending with the last error that they reported. The percentage of time that each person worked during that period was assumed to be an unknown constant that did not differ from person to person. Using this technique led to a resource curve that closely resembled the Rayleigh curve (Figure 1).




Figure 1: Test Resource Levels for Project A
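A hedged sketch of that per-person estimate (the names, dates, and error_reports structure are invented for illustration): each reporter is assumed to be active from the date of their first reported error to the date of their last one.

from collections import defaultdict
from datetime import date

error_reports = [
    ("alice", date(2004, 1, 5)),
    ("alice", date(2004, 2, 20)),
    ("bob", date(2004, 1, 12)),
    ("bob", date(2004, 3, 1)),
]

activity = defaultdict(list)
for person, reported_on in error_reports:
    activity[person].append(reported_on)

for person, dates in activity.items():
    first, last = min(dates), max(dates)
    print("%s: active from %s to %s (%d days)" % (person, first, last, (last - first).days))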

On most of the projects, there was good conformity between the trend model and the reported error data. More importantly, estimates of the total number of errors and the error discovery parameter, made fairly early in the testing activity, seemed to provide reliable indicators of the total number of errors actually found and the time it took to find future errors.
Figure 2 shows the relationship between reported errors and the SATC trend model for one project. The graph represents data available at the conclusion of the project. This close fit was also found on other projects when sufficient data was available.
Figure 2: Cumulative Software Errors for Project A

On another project, different estimates of the total number of errors were obtained when estimates were made over different testing time intervals. That is, there was inconsistent agreement between the trend model and the error data over different time intervals. Through subsequent discussion with the project manager it was learned that the rate of error reporting by the project went from approximately 100% during integration testing to 40% during acceptance testing. Furthermore, there was a significant amount of code rework, and testing of the software involved a sequential strategy of completely testing a single functional area before proceeding to test the next functional area of the code.  
Thus, the instability of the estimates of the total errors was a useful indicator that there had been a significant change in either the project's testing or its reporting process. Figure 3 shows the results for this project. Note the change in slope of the reported number of errors occurring around 150 days. The data curve flattens at the right end due to a pause in testing, rather than a lack of error detection. This project is still undergoing testing.
Figure 3: Cumulative S/W Errors for Project B - Flight S/W

If error data is broken into the distinct testing phases of the life cycle (e.g., unit, system, integration), the projected error curve using the SATC model closely fits the rate at which errors are found in each phase.
Some points need to be clarified about the SATC error trend model. The formulation of the SATC equation is the direct result of assuming that, at any instant of time, the rate of discovery of errors is proportional to the number of errors remaining in the software and to the resources applied to finding errors (this assumption is written out as a formula below). Additional conditions needed in order for the SATC trending model to be valid are:
  1. The code being tested is not being substantially altered during the testing process, especially through the addition or rework of large amounts of code.
  2. All errors found are reported.
  3. All of the software is tested, and testing of the software is uniform throughout the time of the testing activity.
Condition 1 is present to ensure that the total number of errors is a relatively stable number throughout the testing activity. Conditions 2 and 3 are present to ensure that the estimate of the total number of errors is in fact an estimate of the total errors present in the software at the start of testing - no new errors are introduced during testing. If testing is not "uniform" then the rate of error discovery will not necessarily be proportional to the number of errors remaining in the software and so the equation will not be an appropriate model for errors found. No attempt will be made here to make precise the meaning of the word "uniform".
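Written out, that proportionality assumption takes the form below (a hedged reconstruction from the description above; the symbols are chosen here, not taken from the SATC paper):

\frac{dN(t)}{dt} = b(t)\,\bigl(E - N(t)\bigr)

where N(t) is the cumulative number of errors found by time t, E is the total number of errors present at the start of testing, and b(t) describes the intensity of the testing effort (a constant b in the basic Musa model).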
The SATC developed this model rather than using the standard Musa model because it seems less sensitive to data inaccuracy and provides for non-constant testing resource levels. An additional benefit from this work is the application of the Rayleigh curve for estimating resource usage. Future work by the SATC will continue to seek a practical balance between available trend analysis theory and the constraints imposed by the limited accuracy and availability of data from real-world projects.