Part 1: QA — the fire safety specialist for your project
After a conference talk, I was asked to recommend a QA process development strategy. The colleague asking the question was starting a new project and wanted to develop a long-term strategy based on the experience of projects like Bumble. I was pleased to comply and shared my vision with him.
A year after my promotion to project manager at Bumble Inc (the parent company that operates the Badoo and Bumble apps), I have benefited greatly from the assistance of the QA team. The experience I gained while completing and maintaining three projects in my new role means I am fully able to vouch for the validity of my recommendation. I stand by every word.
So, if you have been tasked with testing an entire project and you are unsure where to start, or if you are simply thinking about how to organise your work in such a way as to be helpful not just to you but also to your team, then read on!
At conferences and in informal conversations the topic occasionally comes round to the importance of the QA engineer’s work and their role on a project. This may come in the form of a tentative question asked by a colleague, “Couldn’t we just release it without QA?”. This question is usually followed by a lengthy talk.
It seems to me that the problem lies in the fact that the effectiveness of a QA engineer is inversely proportional to the number of 'urgent and unexpected' complicated tasks they have to deal with. This is like the paradox of a firefighter's salary. Firefighters get paid even when there are no fires to fight. The fewer accidents on their patch, the higher the quality and effectiveness of their preventive work. However, the fewer callouts they get, the more it seems to an outsider that firefighters aren't actually doing anything.
This attitude also generates myths about what QA engineers actually do.
Software testing is a relatively young discipline, so, in order to dispel the myths and to better understand the processes involved, it makes sense to draw an analogy with better-known and well-developed fields. I might, for example, compare the creation of a software product to the construction of modern modular buildings, and the work of the QA engineer to the work of a fire safety specialist.
A fire safety specialist is not responsible for putting out fires. They do other things. They:

1. give timely notification that a fire has broken out,
2. identify places where a fire could start,
3. reduce the likelihood of one occurring, and
4. share their knowledge, so that the people in the building know what to do.
It is similar for testing engineers.
The QA engineer is not the one who fixes the bug in the application, but they can:
1. give timely notification that the bug is there,
2. identify places where bugs could occur in the code,
3. reduce the likelihood of them occurring, and
4. share their knowledge, so that colleagues who encounter a similar problem are able to solve it easily.
Basically, to build a SAFE system the QA should:

1. Answer the question: how can I tell if there is an error in the component/integration?
2. Answer the questions: how, as changes are being made, can I determine whether this particular component won't be affected and will continue to work properly? Is it possible to return to an earlier point in the change process if something goes wrong? How can we minimise the likelihood of errors occurring in the code?
3. Answer the question: if an error occurs in this component, how can I be sure that this is where it occurred?
4. Answer the question: what might other participants in the process not understand when using this component? What might lead to incorrect use and problems?
Before looking at all the QA engineer's tasks one by one, allow me to point out some myths around how QA is perceived on the team, drawn from my experience of being a QA engineer. Then I'll attempt to dispel them.
At most companies, a QA engineer's sole responsibility is seen as preventing bugs from happening; at least, this is the first thing people say. This aspect of their work is greatly overvalued, generating a lot of stereotypes.
At Bumble we consider the following to be myths.
Myth #1: the QA engineer has to test everything possible, and does not approve a ticket until all the bugs in it have been fixed.
This myth is propagated either by beginner testers or by 'effective managers' who consider that the sole responsibility of a QA engineer is to locate the fire and find bugs.
This myth leads to two problems. Firstly, minor faults delay the release of new functionality, and the delay may be critical for the business. Secondly, the atmosphere among staff is greatly affected.
Myth #2: the QA department is full of ham-fisted, inadequately qualified workers who failed to become programmers, so, first and foremost, what they need to do is break everything.
It is either QA engineers themselves with their low self-esteem or developers with high self-esteem who are the main spreaders of this myth. They have got the idea into their heads that unless a tester has broken something, they aren’t very good at their job.
As a result, a QA engineer focused on looking for bugs can forget about the objectives of the business. This means that they search for obscure and involved cases, which only a very small proportion of users ever encounter, just so they can find a bug and feel needed.
Myth #3: the tester bears the responsibility if a release contains bugs.
Fortunately, at Bumble this myth had already been dispelled by the time I joined the team. Nevertheless, I often encounter this myth on other projects.
This particular myth affects the speed with which problems can be solved. The delay arises because the first person that people come to, asking, “So, where’s the feature?”, is actually the last person in the chain delivering the feature.
Myth #4: positive testing is the QA engineer's sole task.
As a rule, those who spread this myth are people who have a poor grasp of what equivalence classes and boundary cases are (the theoretical basis of QA work). As a result, tickets arrive for testing lacking the necessary information. The developer thinks, "Why should I spend my time writing a description if there is someone whose job it is to do the same thing all over again? Let them do it."
Besides generating an unhealthy atmosphere on the team this myth also creates the following problem: instead of comprehensive testing coverage, the QA engineer is literally doing detective work to find out firstly what the developer has done and then devising how to test it.
I came across the most comprehensive instructions for combating the first, second and third myths above at a talk by Nancy Kelln at the TestBash conference in Munich (to watch her talk, register at ministryoftesting.com or listen to her podcast on the subject).
In brief, the idea is as follows:
A QA engineer is someone whose job is to provide the most objective information about the state of a project/feature as fast as possible: information that enables a decision to be taken regarding project/feature release, based on the interests of the business.
The QA must remember that sometimes speed of delivery of the feature is critical for the business. This is why a worthwhile tester is not someone who identifies the greatest number of minor defects (demonstrating attention to detail and other qualities which might be in a job description) but is rather one who, as fast as possible, provides objective information on the current state of the product. The latter allows managers to take a decision regarding its release and to identify priorities in respect of forthcoming fixes.
The QA engineer may obtain this information by following a particular sequence of steps. It is important that the use of this sequence follows the ‘fractal principle’.
Let's return for a moment to the analogy of fire safety. Imagine the process of constructing a building. If we build without following a 'fractal' approach, then, first of all, we would put up one wall, plaster it, paint it, set up the alarm system, and only then proceed to the next wall. The fractal principle means that, at each stage of working on a project (architecture, individual integration, work on a specific ticket in Jira), feature testability needs to be put in place, so that, when the building work is completed, it doesn't turn out that the wiring for the alarm system has to be threaded through a tube which was omitted from the building's blueprint. In order to implement this at any level of the project (idea, implementation, servicing, refactoring, moving on to the next version), the QA engineer has questions to answer, in the following sequence (SAFE):
1. (Setup Signal System) How can I tell whether new errors have occurred in this functionality?
2. (Assess Affected Areas) How can I pre-empt errors occurring and reduce the likelihood of them happening?
3. (Facilitate Finding and Fixes) How fast can I identify what a given error is related to?
4. (Extend Expertise and Educate) Will my experience of solving a problem be useful to me or someone else in the future?
Let’s take a more detailed look at how to organise the solution for each of these questions at different stages in the development of the project and at different levels of abstraction.
This stage addresses the question, "How can I tell if there is an error in the component/integration in production?" The key input here is information from production.
On large, sophisticated projects it is the departments for gathering statistics and monitoring that perform analysis of production. Let’s consider how this aspect of the work can be used by a QA engineer using the example of the Bumble billing team.
We test over 70 different payment integrations in 250 countries of the world, and often statistics and logs are the only way to make sure that everything is okay in a particular integration (a combination of country and payment provider).
For example, lots of payment processing providers (especially SMS ones) have recently offered SDKs in which they themselves receive information from the user's SIM card (MNC/MCC code, IMSI, MSISDN). In this case, it is impossible to verify the integration without an actual SIM card from the country for which the payment is being verified.
For this reason, we try to cover as many integrations as possible with tests, using mocks, fixtures, stubs and other tools (read more about this in this article). However, sometimes an error occurs precisely in the part where the mock or fixture sits, and, even if the problem isn't at our end, that doesn't make things easier for our users. We have to do our best to identify mistakes and avert negative consequences.
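To make the mocking idea concrete, here is a minimal sketch using Python's standard `unittest.mock`. All names (`charge_user`, the provider's `charge` method, the response fields) are hypothetical; the point is only to show how a payment provider can be stubbed out so an integration path can be exercised without a real SIM card or provider account.

```python
# Minimal sketch: stubbing a payment provider so the surrounding
# integration logic can be tested without network access or a real SIM.
from unittest.mock import Mock

def charge_user(provider, user_id, amount):
    """Charge a user through the given payment provider client (hypothetical API)."""
    response = provider.charge(user_id=user_id, amount=amount)
    if response["status"] != "ok":
        raise RuntimeError(f"payment failed: {response['status']}")
    return response["transaction_id"]

# Stub the provider: no network, no SIM card required.
provider = Mock()
provider.charge.return_value = {"status": "ok", "transaction_id": "tx-123"}

assert charge_user(provider, user_id=42, amount=999) == "tx-123"
provider.charge.assert_called_once_with(user_id=42, amount=999)
```

As the article notes, the danger is that an error can hide precisely behind the stub, which is why statistics and logs from production remain the safety net.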
Meanwhile, statistics and logs are a good way of compensating for the consequences of human error: for example, if during testing you overlooked distinctive aspects of payment systems related to legislation in a particular country (say, rules on accepting payments from abroad), or to how mobile operators work. In such cases, analysis of statistics and logs can become something of a safety net (even if a delayed one).
At this stage, the QA engineer has to work on two fronts:
1. Researching the current state of the product after changes have been made. If there are problems, can they be minimised?
2. Monitoring whether there is a plan of action in case errors slip through the net.
As soon as a ticket passes from the developer to the QA engineer, someone is going to ask, "So, where's the feature?"
Colleagues don't do this merely to annoy you; of this I am absolutely convinced, speaking from my position as a product manager.
The speed with which a feature is delivered is crucial to the business, especially if it is something new for the market. Companies are often prepared to release a feature even if it contains errors (as long as the errors in question won’t block the advertised functionality), achieving this before their competitors do.
For this reason, in order to be able to track the history of a product at any point in time (something useful for any business), I recommend using the following sequence of steps when testing:
I will look at the specifics of each step in my next article. Here I want to move on to the second part: implementing preventive measures.
The team should also treat Plan B as a major issue: the option to reverse changes quickly if a 'fire' is discovered immediately after deployment. Just as in the case of metrics, implementation of Plan B will subsequently pass from the QA department to another team; the reversal of changes will then be handled by the release engineers. If, however, this process has not been set up, it is logical for the QA engineer to get involved.
Here, just as in the case of the alarm system, the role of the QA engineer varies depending on the project's level in the organisation. At Bumble, this task has been resolved at the level of release management: all changes bound for production are tested in a special environment, pre-production. Then follows a period of time during which the release isn't closed: if critical errors are found on production post-release, the changes can be reversed.
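One common shape for such a Plan B, not described in the article but widely used, is a feature flag acting as a kill switch: the new code path ships dark behind a flag, and reverting means flipping the flag rather than redeploying. A hedged sketch, with all names hypothetical and the flag store simplified to a dict (in a real system it would live in a config service):

```python
# Hypothetical sketch of a "Plan B" kill switch: a feature flag lets the
# team revert to the known-good code path without rolling back a release.
flags = {"new_payment_form": True}  # would live in a config service in practice

def render_old_form(user_id: int) -> str:
    return f"old-form:{user_id}"    # known-good fallback

def render_new_form(user_id: int) -> str:
    return f"new-form:{user_id}"    # the freshly released path

def payment_form(user_id: int) -> str:
    """Route the user to the new form unless the flag has been switched off."""
    if flags.get("new_payment_form"):
        return render_new_form(user_id)
    return render_old_form(user_id)

assert payment_form(1) == "new-form:1"
flags["new_payment_form"] = False   # a 'fire' is discovered: flip the switch
assert payment_form(1) == "old-form:1"
```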
These steps will allow you to reduce the volume of work in establishing preventive measures, and so free up resources for other activities, an aspect that lots of people fail to take into consideration.
Here comes my favourite part.
When testing, we do occasionally 'cheat'. Sometimes programmers write patches to be used in order to test something, or the QA engineer themselves creates test data which, although designed to fool the system, in fact tests the case.
You will be dealing with patches, data generators and other 'cheating' methods on more than one occasion, so it makes sense to document them straight away or, even better, to integrate them into the codebase.
Having a system that stores all the 'cheating' methods (mocks, stubs, data generators) achieves several objectives at once for us.
Paradoxically, this is what sometimes produces the greatest resistance from managers and developers. In a previous job at another company, I once had a conversation with a project manager that went pretty much as follows:
Me: I need a plug-in that will generate testing data for our system.
Manager: Send a ticket to the developers.
– But I need it now, and I can write it myself!
– Listen, you don’t get paid to write code. If you write code, you’ll spend more time on it than programmers would, and that means your code will cost the company more.
– But you don’t have the resources at the moment to devote to this ticket.
– In that case, you’re going to have to check everything by generating data manually.
Unfortunately, many companies think that a QA engineer is ineffective at writing code. So I was very pleased to discover that at Bumble this aspect of the QA engineer's role is not only desirable but is considered one of the criteria for growing professionally and moving forward in your career! I have an agreement with my manager that I can spend up to 20% of my time (that is, a day per week) on it. You can see the evidence of this approach in my colleagues' frequent presentations at conferences.
I won't be saying anything new if I tell you that documentation is a good thing. If, in the near future, you are planning to scale up a project, I recommend reassessing the roles and duties of the members of the team. Also, if a new person joins, creating documentation is worth making a higher priority. And when functionality appears that is going to be tested by several people, it is worth sharing knowledge.
Let us consider the extent to which given errors are critical (as I work in the Billing department, we will consider only issues connected with the payment process, but you can create your own priority list). Below, in descending order of criticality, is a list of errors that we encounter in practice:
1. A user has made a payment, but has not received a service or has received the wrong one (losses in terms of reputation).
2. A user cannot make a payment, and it is our fault or the fault of the payment provider; the entry point to the payment form is not working, the integration is broken etc (financial losses).
3. Other errors.
Having an alarm in place (using logs or statistics) is the most useful contribution to responding to, or preventing, the most critical errors in good time. Some critical errors may occur through no fault of ours, but we still have to respond to them: for example, a breakdown at the payment provider's end, to which we can respond by switching the user over to another payment provider or displaying an error message. Analysis of alarm messages in logs and statistics can also help pinpoint mistakes that make it into production through human error.
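The statistics-based alarm described above can be sketched in a few lines. This is a hedged illustration, not Bumble's actual monitoring: the function name, thresholds and baseline mechanism are all hypothetical. The idea is simply to flag an integration whose payment success rate drops well below its recent baseline, or which suddenly receives no traffic at all (a possible broken entry point).

```python
# Sketch of a statistics-based alarm for a payment integration
# (hypothetical thresholds and names).
def check_integration(name, attempts, successes, baseline_rate, tolerance=0.2):
    """Return an alert string if the success rate fell sharply, else None."""
    if attempts == 0:
        # No traffic at all may mean the entry point to the payment form is broken.
        return f"{name}: no traffic, possible broken entry point"
    rate = successes / attempts
    if rate < baseline_rate * (1 - tolerance):
        return f"{name}: success rate {rate:.0%} vs baseline {baseline_rate:.0%}"
    return None

assert check_integration("gb-sms", 200, 100, baseline_rate=0.9) is not None
assert check_integration("gb-sms", 200, 190, baseline_rate=0.9) is None
```

A check like this catches both classes of critical error from the list above: users failing to pay, and an integration silently going dark.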
Preventive measures also represent a major contribution to the search for errors, while perfecting testing tools and writing up documentation have an indirect impact on the process of dealing with them.
Can the order of the points (their order of priority) change? Yes, it can. In our team's experience, the ratio of 'localising fires' to 'testing' varies depending on the make-up of the sprint. If the sprint relates to new services and new integrations, the ratio is 40/60 (alarm/prevention, respectively); if it relates to refactoring and bug fixing, it is 60/40. Thus, in the first case, the order of priorities may shift significantly towards preventive measures, leaving the alarm system in second place.
Something similar is true with respect to the ratio between ‘perfecting tools’ and ‘writing up documentation’. Depending on the processes handled by the team, priorities may change. If an integration project is envisaged or the members of the team change, documentation may become a higher priority than developing new tools. If, however, the team is more or less stable, and the structure of the tasks involves more work inside the department rather than integration, then the development of tools becomes more important.
The work of a QA engineer can be compared to the work of a fire safety specialist when constructing a building. If you want to be useful to your team and, at the same time, to respond to problems in good time, then follow the SAFE sequence:

1. set up a signal system,
2. assess affected areas,
3. facilitate finding and fixes, and
4. extend expertise and educate.
In the next article, we’ll look at each step in detail, based on the experience of Bumble.
QA — the fire safety specialist for your project. Part 1 was originally published in Bumble Tech on Medium.