In a former life, I worked in software development, specifically on a suite of telecoms products in the 3G area that were live in networks around the globe. I was responsible for planning the release of software updates for several of these products at different times, and eventually for the overall planning of the complete 3G product suite. In this blog, I aim to highlight some of the challenges of the testing process, and how we learned from experience and came up with positive solutions and recommendations that helped the test teams and management achieve mutually beneficial outcomes.
At that time, we were running an Agile-like process where we had regular software publishing slots that varied depending on the age of the product. For example, every 2 weeks for a product that had just been released in the last few months, up to every 6 weeks for products approaching end-of-life (2 years after product release). Each slot was open for code updates up to 1 week before the publishing date, at which point the release was locked for final testing. This was only on the maintenance side of the house (i.e. dealing with released products). The development side of the house (i.e. prior to a product’s global release to adoring customers) was working with a Waterfall model involving anywhere from 6 to 12 months between product releases.
The problem that I, and the others responsible for planning the software releases of the other 3G products, encountered was a consistent thread of incomprehension in the area of testing, both from managers with experience of a development environment and from testers.
From a management perspective, there was a repeated lack of understanding when it came to issues detected in that final week. Firstly, why was the test suite that detected the issue not run earlier in the cycle? Secondly, why was another week of testing required once an updated bug fix was delivered? The testing run in the final week of the cycle included test suites that could only be run on the final software; running them on anything else would undermine the results. This included suites such as upgrade testing, where any change to the software affected the database translation engine and required the upgrade testing to start again from scratch.
The week-long schedule of testing was the minimum amount of testing we could run given the lab equipment we had allocated, the staff available, and the minimum scope of test suites the system experts had determined was required for comparison against prior software releases (to ensure no meaningful degradation in capacity, reliability, redundancy, or functionality). Without that testing, there was no appropriate way to assess the potential risk and compare or "score" the software release.

From the perspective of the testers, we encountered a blind spot when it came to troubleshooting. We had testers reporting faults as introduced in a specific version who could not state when the test suite in question had last run successfully, or whether the fault had been present then. When a fault was not obviously due to a specific bug fix, testers were unable to proceed with meaningful troubleshooting, as they did not understand how to investigate from first principles (stepping back through the internal builds of a release to identify when the issue first appeared). Nor did they understand that an issue they saw might be just one symptom of a larger bug, and that the disappearance of that one symptom did not automatically mean the issue was completely resolved.
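The "first principles" investigation described above amounts to a binary search over the ordered internal builds of a release. As an illustrative sketch only (not the team's actual tooling), assuming the fault, once introduced, persists in every later build, the idea looks like this; `run_test_suite` is a hypothetical stand-in for executing the relevant test suite against a given build:

```python
def find_first_bad_build(builds, run_test_suite):
    """Binary-search an ordered list of internal builds (oldest to newest)
    for the first build on which the test suite fails.

    Assumes that once a fault appears it persists in all later builds.
    Returns the first failing build, or None if every build passes."""
    lo, hi = 0, len(builds) - 1
    first_bad = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if run_test_suite(builds[mid]):
            # Suite passes here: the fault (if any) was introduced later.
            lo = mid + 1
        else:
            # Suite fails here: remember this build, look for an earlier one.
            first_bad = builds[mid]
            hi = mid - 1
    return first_bad

# Hypothetical example: builds 1..10, fault introduced in build 7,
# so the suite passes on builds 1-6 and fails from build 7 onwards.
builds = list(range(1, 11))
suite = lambda build: build < 7
print(find_first_bad_build(builds, suite))  # -> 7
```

The payoff of the binary search is that isolating the offending build among N internal builds takes roughly log2(N) test runs rather than N, which matters when each run of the suite is expensive lab time.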
After considerable exasperation and head-scratching on the part of the assorted planners, we stumbled across the reason: the managers and testers in question had worked exclusively on the development side of the house in the past. They were used to an environment with 6, potentially 12, months between software releases to clients, where hundreds if not over a thousand bug fixes, numerous feature improvements, multiple new features, and assorted regulatory changes were cumulatively made in a single release.
The result was managers who only infrequently assessed whether the software was good enough to release to clients, and who consequently had little experience in assessing risk or in fully understanding what absent or failing test results meant in this context. In addition, clients tended to deploy a newly released product in a test network for a number of weeks or months before rolling it out across their entire infrastructure (experimenting with new features, determining the best configurations for their networks, and so on). In maintenance, the software updates went straight into live networks in order to deal with bugs the client was actively experiencing. A client might run a quick and dirty test suite for a few days before deploying across their entire infrastructure. Client exposure to something slipping through the cracks was therefore greater in the maintenance situation.
When faced with a software release every 2 weeks, and told we could not publish a release until at least a week after a solution was found for a release-blocking issue, the managers in question did not grasp the situation properly. They overlooked that the software was going to be deployed into live, established client networks, and were of the impression that once the issue was fixed the software was good to go. After all, the developers were unlikely to make another mistake, and surely the testing was unnecessary at this stage?
Similarly, testers on the development side of the house did not have much troubleshooting experience, due to the time limitations imposed on them by the volume of code updates. They only did any significant troubleshooting when a bug was uncovered in a new feature that was a high priority amongst clients. Consequently, when they moved to the maintenance side of the house, where a greater need for troubleshooting skills existed, they were ill-equipped.
Once we understood the source of the issues, we had a number of prolonged education sessions with the managers. Thankfully, most of them understood the situation, the differences between the two sides of the house, and came to grips with the fact that the maintenance side of things was about risk management/avoidance and not simply hitting a publishing date for the sake of it. A late release that worked was far preferable to publishing on time and causing an outage in multiple live networks.
The testers were a more difficult issue, as it took time, experience, and the right mindset to create good troubleshooters. Due to a rotation policy, many promising troubleshooters rotated out to other parts of the company, so we were continuously receiving new testers. We created cheat sheets and guides that instructed inexperienced troubleshooters on what to do when an issue was encountered, reducing the time lost investigating tricky issues and ensuring the developers received a good supply of relevant information for tracing the source of an issue. This fast-tracked the competence progression of those testers with the aptitude to be good troubleshooters and quickly made them more useful to us. In summary, the key lessons were:
- Software testing is not fully understood, or appreciated, by many. Unfortunately, this includes people whose jobs incorporate software testing, so good communication and knowledge sharing are crucial to overcoming this gap.
- People who have prior experience in software development and testing sometimes find that little of it transfers to a different software development and testing situation. Processes and tools have to be established to ensure all parties can contribute effectively to a smooth testing operation and sidestep this issue.
- It is important to understand risk management, particularly in the context of live client networks that cost hundreds of millions, or in some cases low billions, to build and deploy.
- Cross-training, strong communication processes, and a watertight testing framework will enhance the experience of all parties and create the desired outcome: a successful software release.
Thankfully, Aspira’s Software Services group has widespread and extensive experience in all aspects of software development, testing, and the development of test strategies. Aspira understands that the testing appropriate for one project or product is not always appropriate for another, and that each situation must be assessed on multiple factors to determine the appropriate testing (status of the base platform, volume and type of code changes made, availability of automated test tools versus resources for manual testing, target environment, acceptable level of risk, etc.).
For anyone encountering difficulties like those I experienced in the past, or anyone who needs advice on testing strategy, Aspira’s Software Services group is an excellent choice.