At Babbel, we revised our test automation strategy about 1.5 years ago. Since then, our focus has been on frontend testing (browser and mobile) for crucial parts of our business, which in our case mostly means that a user can register or log in and, after subscribing, navigate through the language lessons.
I’ll give a short introduction to test automation here, so if you’re a test automation engineer, what I’ll be covering is probably not new to you. If, however, you’re new to the topic, read on!
What is test automation?
Basically, instead of a person (usually a manual QA) interacting with a browser or mobile device, you have a driver doing it for you. This driver is a piece of software. There are several different browser/mobile device drivers out there, and the ones we are using are called Selenium and Appium. These drivers execute commands that are specified in code which, at Babbel, is written by us test automation engineers.
Test Code
The test code is usually divided into three layers. The topmost is the easiest to understand for a non-tester and can be written by anyone with sufficient domain knowledge. The middle layer is a bit more abstract and maps the topmost layer to the corresponding implementations that make up the bottom layer. The bottom layer is where you find the programming code, the actual implementations.
First layer
The first layer is written in regular human sentences and stored in a so-called feature file, where each tested potential problem is called a scenario, and each line/sentence in it is a step. The language is called Gherkin. Here is an example of a scenario:
@android @3
Scenario: Try to login with wrong email (babbel login)
Given I visit the start page
And I click on the login button
Then I am on the login page
When I click on the email button
Then I am on the login with email page
When I fill in an invalid username / password combination
And I click on the login button
Then I should see a red error box with the text "Invalid email or password."
Second layer
The next layer is composed of files that contain the step definitions. Regular expressions are used to match a step with its definition. Our definitions are written in Ruby, where regular expressions are marked with either // or %r{}, as you can see in the examples below (the definitions match some of the steps from our example scenario).
Then(/^I am on the login with email page$/) do
  @current_page = Pages::LoginWithEmail.new
  assert @current_page.page_detected?, "I am not on the login with email page."
end

When(%r{^I fill in an invalid username / password combination$}) do
  @current_page.fill_in_email_password_combination(true)
end

When(/^I click on the login button$/) do
  @current_page.click_login_button
end

Then(/^I should see a red error box with the text "([^"]*)"$/) do |text|
  assert @current_page.red_error_box_text_correct?(text)
end
The use of variables is possible, as can be seen in the last step definition above, where text is a variable that can be set to a specific value when the step is used. In our example scenario, the value of text is Invalid email or password. The test will fail if there is no red error box or if the red error box contains a different text.
In the last step definition, we are asserting that on the @current_page the function red_error_box_text_correct? with the input text returns true. The definition of this function can be found in the code in the next section.
Third layer
Finally, everything that is used in the step definition files needs to be coded somewhere. The actual code files make up the lowest and most complicated layer of the test code. At Babbel, we use the same feature and step definition files for all platforms (Android, iOS, web), but the actual implementations vary, so they are split into different folders. The code can be written in any desired programming language as long as there is a client library written for it - or you write your own (current status: Selenium bindings, Appium bindings). We are using Ruby.
Let’s have a look at the implementation of the email login example above for Android. Here is a small excerpt:
require_relative "android_base"
module Pages
class LoginWithEmail < AndroidBase
[...]
LOGINEMAIL_PASSWDFIELD_XPATH = "#{Pages::Base::ANDROID_XPATH_PREFIX}passwd_text']".freeze
TOP_RED_ERROR_TEXT_XPATH = "#{Pages::Base::ANDROID_XPATH_PREFIX}errorLabel']".freeze
[...]
def click_forgot_password_button
wait_for_element(:xpath, LOGINEMAIL_FORGOT_XPATH).click
end
def red_error_box_text_correct?(assumed_text)
actual_text = wait_for_element(:xpath, TOP_RED_ERROR_TEXT_XPATH).text
if assumed_text == actual_text
return true
end
false
end
end # end class
end # end module
To find out whether the assumed_text is shown, the top red error box needs to be found first. So in the function red_error_box_text_correct? we wait for an element to appear in the app that is identified by its xpath "#{Pages::Base::ANDROID_XPATH_PREFIX}errorLabel']". There are other ways (so-called locator strategies) to identify elements, but here xpath is used. Once we have the element, we have access to its text and can compare actual_text to the assumed_text and return true or false.
If you’re wondering about the #{Pages::Base::ANDROID_XPATH_PREFIX}, it is defined in the pages’ base class and translates to //*[@resource-id='com.babbel.mobile.android.en:id/.
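The wait_for_element helper and the prefix both live in the shared page base class, which is not shown here. Just to make the mechanics tangible, here is a minimal sketch of such a base class, assuming the Selenium Ruby bindings’ Wait class and a @driver instance (our actual implementation may differ):

require "selenium-webdriver"

module Pages
  class Base
    # The prefix mentioned above; page objects append a resource-id and "']" to it.
    ANDROID_XPATH_PREFIX = "//*[@resource-id='com.babbel.mobile.android.en:id/".freeze

    # Hypothetical helper: poll until the element identified by the given
    # locator strategy and selector shows up, then return it. A timeout
    # error is raised if it never appears.
    def wait_for_element(strategy, selector, timeout: 30)
      Selenium::WebDriver::Wait.new(timeout: timeout).until do
        @driver.find_element(strategy, selector)
      end
    end
  end
end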
Finding elements
There are several locator strategies. The input for a locator strategy is a selector. For example, we might want to find an element by its ID “green_button”. “Find by ID” would be the locator strategy and “green_button” the selector.
So how do we find the appropriate locator strategy and selectors? Keep in mind that if we can’t identify the elements that are displayed on a page, we cannot interact with that page.
The easiest way is to identify an element by its ID, name, value, or text. If none of those is available or useful, it can be identified by its xpath or CSS selector instead. With an xpath, anything can be located, but the xpath itself might not be easily comprehensible for a human.
Consider the example above. We are using the Appium inspector on an Android phone which is running our example scenario. It is trying to log in with an invalid username/password combination, and the desired red error box has shown up. In the inspector, the top red error box is selected and we can read its ID (called “resource-id”).
Luckily, the error box has the ID “errorLabel”. If it didn’t, we might have to use any of the other attributes. Look at the image above and consider them from top to bottom.
- The type android.widget.TextView is unspecific; on any page, there might be 10 TextViews
- The text Invalid email or password. is dependent on the phone’s locale (location and display language setting) and might differ between phones. It could also be changed if the product owner felt a different text might be better
- Finally, the xpath ‘android.widget.LinearLayout[1]/…./’ is incomprehensible and can easily change if the layout of the page is altered
An ID, however, usually does not change. It depends neither on the locale nor on the layout.
If you find yourself in a situation where identifying an element is not straightforward, or is risky (in the sense that the test might break soon), ask your developers to please add an identifier for you.
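To make this concrete, here is a small sketch (not taken from our code base) of how that error box could be located with the Ruby client once a session is running, first by its ID and then by the resource-id based xpath used in the page object above:

# A sketch: locating the error box with two different strategies.
# By its ID, i.e. the full Android resource-id:
error_box = driver.find_element(:id, "com.babbel.mobile.android.en:id/errorLabel")

# By the resource-id based xpath from the page object above:
error_box = driver.find_element(:xpath, "//*[@resource-id='com.babbel.mobile.android.en:id/errorLabel']")

error_box.text # => "Invalid email or password."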
When automating browser interaction, finding IDs/xpaths/CSS selectors is made very easy for us. Chrome and Firefox offer easy-to-use right-click inspect element options. In this example, we are looking at an element with the CSS selector “div#siteSub”, or simply the ID “siteSub”. The client libraries for Selenium and Appium offer methods to find elements by their ID/xpath/name/value/etc. Take a look at the Python library for a nice example.
Also, this cheat sheet is quite helpful.
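For the browser case, a minimal sketch with the Selenium Ruby bindings could look like the following; the URL is a placeholder, and we only assume the page contains the siteSub element from the example above:

require "selenium-webdriver"

# Start a browser session and open a page (placeholder URL).
driver = Selenium::WebDriver.for :firefox
driver.navigate.to "https://example.org/some-page"

# The same element, found once by CSS selector and once by ID.
by_css = driver.find_element(:css, "div#siteSub")
by_id  = driver.find_element(:id, "siteSub")

driver.quit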
The driver
Let’s go back and think about the driver again. What is it doing? When we run a test script on a computer, we first try to establish a connection to the driver, and the driver to the browser or phone.
Consider using a mobile phone for testing. We’d send a configuration to the driver (in this case, Appium) telling it what phone to use. And we’d also tell it which locale to use, which timeout, layout, and many other things. The configuration we’re sending is called ‘desired capabilities’. Selenium for web testing has them as well, just with settings that apply to browsers.
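As an illustration, with the appium_lib Ruby gem such a configuration could look roughly like the sketch below; all values are made up and are not our real setup:

require "appium_lib"

# Made-up desired capabilities for an Android test run.
options = {
  caps: {
    platformName:      "Android",
    platformVersion:   "8.1",
    deviceName:        "Android Emulator",
    app:               "/path/to/the-app-under-test.apk",
    language:          "en",
    locale:            "US",
    newCommandTimeout: 300
  },
  appium_lib: {
    server_url: "http://127.0.0.1:4723/wd/hub"
  }
}

# Connecting with these capabilities gives us the session we send our commands through.
driver = Appium::Driver.new(options, true).start_driver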
The driver will then try to connect to the desired phone with the given settings. If successful, we’ll receive a session ID, if not, an error message. Once a session has been established, we can start sending our commands. The driver translates our commands to commands that the phone understands and can execute. For every step in a scenario, messages are being sent back and forth, and finally, we receive either an OK or an error message.
Since so many messages are being sent between our computer, the driver, and the phone, the execution of such tests is usually slow (minutes to hours).
This is the runtime of all our current iOS tests run on an iPhone simulator on my computer: a local run, 244 steps, 83 minutes.
The closer together the computer, driver, and phone are physically located, the faster the test will run. This usually means a local setup, or server-side test execution when using an external provider. A server-side test is essentially a local setup too - just not in our office, but in the provider’s.
We use an external provider (a so-called ‘mobile farm’) to be able to run tests in parallel and to have access to a much greater choice of phone types than we have in our office. Until now, we have only been able to use client-side execution of tests, but server-side execution will follow later.
Client-side execution means that the test runs on a computer within our network or, for example, on a CI server like Travis or Bitrise, while the Appium server is located somewhere else, and the phones again somewhere else, although probably close to the Appium server, considering that it’s the same company running both server and mobile devices. The physical distance introduces lag time to the tests. The test execution time is roughly twice as long when using this setup. To learn more about client/server-side execution, you might want to read this article.
Why are we doing test automation?
Presumably, a business wants to make sure its product works as expected before releasing (a new version of) it. To some extent, the testing can be done by humans, but as a product gets more complex with more scenarios to test, or releases become very frequent, you’d have to hire an army of testers. Also, repeating the same tests over and over again is not an interesting job and might lead to careless mistakes.
Computers, on the other hand, neither get bored, nor forget things, nor make human errors; they just do as they are told. Also, as mentioned before, automated tests can run in parallel and off-site. How many devices can a manual QA operate in parallel? Probably not as many as a computer can.
Another often forgotten advantage of writing test cases (scenarios) - which is the first step in test automation - is that they help in understanding the complexity of a user story. Imagine the product owner comes up with a new feature. There is often not just one test case, but several positive as well as negative ones.
The typical example is the rather simple login field, where the positive test case would be ‘user can log in’, and the negative ‘user cannot log in’. But there isn’t just one scenario where the user cannot log in, but many, for example ‘wrong input’, ‘empty fields’, ‘wrong username/password combination’.
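In the first layer, those negative cases could be grouped in a scenario outline, for example; the steps and values below are purely illustrative and not our actual step definitions:

Scenario Outline: Try to login with invalid credentials
  Given I visit the start page
  And I click on the login button
  Then I am on the login page
  When I fill in "<email>" and "<password>"
  And I click on the login button
  Then I should see a red error box with the text "Invalid email or password."

  Examples:
    | email            | password     |
    | user@example.com | wrongpass    |
    | not-an-email     | any-password |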
Writing the test cases into the user story gives the developers an idea of what to keep in mind and reduces the number of questions that usually come back while implementing a story. It makes estimation easier, and the test cases can be used as acceptance criteria for the user story. Which test cases to automate can be decided on by the product team.
Automated tests take work off the manual QAs’ plates, work that can be better spent testing things that cannot or should not be automated, such as exploratory testing or very complex test cases. They also have more time to speak with team members, take care of tickets, etc., hopefully leaving them happier than if they were being used as test monkeys.
Since well-written frontend tests do what a typical user does, we can be pretty sure that our product is working correctly when we have an extensive test suite and all tests are passing. So… we should automate all the things, right?
Why not to automate all the things
As mentioned before, browser tests (and mobile tests even more so), are slow. Running the login scenario manually might take me 30 seconds. Running it with Appium, however, might take five minutes, four and a half of which are needed to set up the session, start up the mobile phone, apply the settings, and take care of the messages back and forth.
Besides this overhead, the tests are also flaky. There are often issues with timeouts. A manual tester can easily deal with a situation where an element appears too quickly or too late and can still execute all remaining steps, but an automated test will fail if any step fails.
From time to time, there will be incompatibilities between the driver and phone/browser versions that will make all the tests fail, requiring a couple of days’ work until the issues have been resolved by the owners of the respective driver/phone/browser software.
Frontend tests are also quite dependent on browser or phone models. The same test might pass on Firefox, but fail on Chrome, or work on Android Phone A, but not Android Phone B. A manual tester, however, can run tests on all of them.
So with an increasing number of tests, we would see an increasing number of false negatives, not to mention the increasing duration of the test run. Even when done in parallel, it would take a lot of time, and developers are not happy to wait for test results, especially not on a pull request level. If you are unfamiliar with pull requests and the usual git workflow, please read this easy introduction.
Your development team might fail to trust the tests if they are constantly failing for reasons not related to the developers’ code changes.
Also, managers might be pressured to release quickly and be tempted to release without testing if there is no fast-running test suite available, creating undesirable precedents or conventions.
Another pressing matter is the question of maintenance. Who writes the tests, and who maintains them? Technical QAs are often moved around teams to help where test coverage is low, so they leave teams and hand over the tests to the developers, who then might be reluctant to change the test code together with their other code changes.
I would recommend keeping the automated user tests to an absolute minimum, and to not run all of them all the time, but to find a setup in which some tests - and not necessarily the same ones - run
- On PR level (with green tests being mandatory to merge the branch) in a development environment
- In a staging environment before releasing to production
- On live during a nightly run with some form of built-in alarm system, because otherwise no one will look at the build results
Generally, in the development process, there is more than one good opportunity to run tests!
There are also ways to speed up tests and to make them more stable, but going into that would be too detailed for this blog entry. See, for example, this short guideline by saucelabs.
Keep in mind that before you automate every scenario you can come up with, it’s better to have a prioritization session with your team or product owner and to devise a suitable test strategy.
There are many other layers in the software architecture where we can test the correctness of code and services, and many things should be tested on lower levels. There is a so-called testing pyramid that can aid with this.
In fact, everything that can be tested on a lower level should be, and browser/mobile-level tests should cover only those features that are built only or mostly on the user interface, and those that are critical for the business. Think in terms of money, company brand, and reputation. What is unacceptable if it doesn’t work? That’s what you should automate.