Our experience with performance testing a Bonita page with Puppeteer and Lighthouse
Front-end developers want to know how well the user-facing pages of an application perform, especially where high performance is required. This article describes how our development team set up performance testing on a complex data set, with the ultimate goal of automating these tests.
I will explain how our runtime team set up performance tests on web-based application pages created with the Bonita platform’s development suite. I’ll go through the steps we took to develop these tests, what we tested, how we checked that the results are acceptable, and a bit about our choice of performance libraries.
Setting up performance tests for application pages
To performance test Bonita pages, we needed to:
- Prepare the data for the tests
- Define the test scenarios
- Define the test metrics
Prepare the data
The first thing we did was to prepare the data on which all the tests would run.
There are three prerequisites:
- Define what data is needed
- Find out how the data will be generated
- Find a place where the data will be stored
To find out what data was needed, we reached out to Bonita users to learn how they were using the platform. We wanted to stretch the limits of the platform and create a complex data set matching the largest volumes we found among these users. For example, if two different users have 15000 and 18000 processes in their database, we wanted to create 20000 in ours.
We already had a database with previously generated data, but it had to be extended to cover our use case better by adding new types of generated data.
To generate the new data, we used an internally developed tool that can be found here. This tool calls the Bonita engine directly instead of going through the APIs, which drastically reduced the generation time.
Since the Bonita platform is used in the cloud, we decided to generate all the data on an AWS platform. This introduces network latency, which is actually what we want: our lab setup should be close to the worst performance that could still be encountered in a real environment.
Define the test scenarios
The next step was to define our test scenarios. We settled on two:
- The page load
- Actions that are taken on the page
For the page load, we open the browser, log into the application, open a new tab, go to the page and then take measurements for several metrics. Then we close that tab and open a new one where we load the page again - rinse and repeat a couple hundred times.
We could have closed and reopened the browser for each execution, but we didn’t because the difference between the first and the following executions didn’t seem that big, and also because launching the browser took too long.
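To make this concrete, here is a minimal sketch of what such a loop can look like with Puppeteer. The URL, selectors and credentials are placeholders rather than our actual application, and the measurement itself is left out (it is covered in the Lighthouse section further down).

```javascript
// Minimal sketch of the page-load scenario: one browser, one login,
// then a fresh tab per run. URL, selectors and credentials are placeholders.
const puppeteer = require('puppeteer');

const APP_URL = 'https://example.com/bonita/apps/myapp/mypage'; // hypothetical page
const RUNS = 200;

(async () => {
  // Launch the browser once and reuse it for every run.
  const browser = await puppeteer.launch({ headless: true });

  // Log in once; the session cookie is shared by every tab in this browser.
  const loginPage = await browser.newPage();
  await loginPage.goto('https://example.com/bonita/login.jsp'); // hypothetical login URL
  await loginPage.type('#username', 'some.user');               // hypothetical selectors
  await loginPage.type('#password', 'secret');
  await Promise.all([
    loginPage.waitForNavigation(),
    loginPage.click('button[type="submit"]'),
  ]);
  await loginPage.close();

  for (let run = 0; run < RUNS; run++) {
    // Open a fresh tab, load the page, measure, then close the tab.
    const page = await browser.newPage();
    await page.goto(APP_URL, { waitUntil: 'networkidle0' });
    // ...collect the metrics here (see the Lighthouse example below)...
    await page.close();
  }

  await browser.close();
})();
```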
To test actions on the page, the scenario is the same. However, whether to keep the browser open or reopen it for each action is still an open question. If we keep the browser open, we could either close the tab and navigate to the page again, or keep the tab open and retest the action. We haven’t completed this part yet, but it is planned for the future.
Define the test metrics
The next thing to consider was the metrics. We began by using the same ones used by Google developers, which can be found here.
For the page load test, the suggested metrics are:
- The First Contentful Paint (FCP), which measures the time it takes for the first text/image/video to be rendered onto the page.
- The Largest Contentful Paint (LCP), which measures when the largest element has finished rendering onto the page.
- The Time to Interactive (TTI), which measures the time it takes until the page is able to respond reliably and quickly to user input.
- The Total Blocking Time (TBT). This measures the time the browser is blocked after something is displayed, but before it becomes interactive to the user.
For the page actions test, the important metrics are the other two that are mentioned on the web.dev website:
- First Input Delay (FID). This is the time between the user’s first interaction with the page and the moment the browser starts processing that interaction.
- Cumulative Layout Shift (CLS). This is measured as a score rather than a time. It measures the impact of any part of the page moving unexpectedly.
We executed each scenario around 200 to 300 times. To check that the page still performs acceptably with this data, we verified that the 90th percentile of each metric stays under a threshold (the threshold is different for each metric). We also checked the coefficient of variation, which we use to verify how stable the page is; it should stay under 20%.
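As an illustration, here is a minimal sketch of that acceptance check. The sample values and the 2500 ms threshold are hypothetical; in practice each metric has its own threshold.

```javascript
// Nearest-rank percentile of a list of samples (p in [0, 100]).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const index = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, index)];
}

// Coefficient of variation: standard deviation divided by the mean.
function coefficientOfVariation(samples) {
  const mean = samples.reduce((sum, v) => sum + v, 0) / samples.length;
  const variance =
    samples.reduce((sum, v) => sum + (v - mean) ** 2, 0) / samples.length;
  return Math.sqrt(variance) / mean;
}

// Hypothetical LCP samples in milliseconds and a hypothetical 2500 ms threshold.
const lcpSamples = [1800, 1950, 2100, 1870, 2240];
const acceptable =
  percentile(lcpSamples, 90) < 2500 &&
  coefficientOfVariation(lcpSamples) < 0.2; // stability check: under 20%
```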
We launched tests in two situations:
- When a new page is created
- When a page is updated
The difference between the two is that in the first case we just compare the page’s metrics to the thresholds mentioned above, while in the second case we compare the values measured before and after the update.
Currently, these tests are launched in a local environment and we are processing the results manually, but we plan to launch the tests in Jenkins and compare the results automatically to the predefined thresholds.
Our experience with these tests
In this part of the article, I’ll describe the libraries that we tried along the way.
At first, we tried using Playwright. This is a Node.js library for automated tests in Chrome, Firefox, and WebKit browsers. Multi-browser support seemed super interesting, but after trying to put it in place, we just couldn’t make Playwright work for measuring the metrics I described above.
Next we found Puppeteer, another Node.js library. Even though it doesn’t have multi-browser support by default, it integrates directly with Lighthouse, which can easily generate performance reports containing the metrics we want. In the end we decided to switch to Puppeteer and to keep looking into multi-browser support later, since it remains a very important point for us.
Using Lighthouse with Puppeteer was pretty easy. We use Puppeteer to launch the browser and log in, and then we connect Lighthouse to that browser. Even though we only need the 90th percentile and the coefficient of variation, we keep the other values from the reports in case we need to analyse an error.
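Here is a minimal sketch of that wiring, assuming a Lighthouse version that still exposes a CommonJS programmatic API (recent versions are ESM-only). The page URL is a placeholder; the audit IDs are the standard Lighthouse ones for the metrics listed above.

```javascript
const puppeteer = require('puppeteer');
const lighthouse = require('lighthouse');

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  // ...log in with Puppeteer here, as in the page-load sketch above...

  // Lighthouse attaches to the same Chrome instance through its debugging port.
  const port = new URL(browser.wsEndpoint()).port;
  const { lhr } = await lighthouse(
    'https://example.com/bonita/apps/myapp/mypage', // hypothetical page
    { port, output: 'json', onlyCategories: ['performance'] }
  );

  // Pick out the metrics we track; numericValue is in milliseconds,
  // except CLS, which is a unitless score.
  const metrics = {
    fcp: lhr.audits['first-contentful-paint'].numericValue,
    lcp: lhr.audits['largest-contentful-paint'].numericValue,
    tti: lhr.audits['interactive'].numericValue,
    tbt: lhr.audits['total-blocking-time'].numericValue,
    cls: lhr.audits['cumulative-layout-shift'].numericValue,
  };
  console.log(metrics);

  await browser.close();
})();
```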
This has been our experience with performance testing so far. I didn’t go into code too much, but you can find it here.
The work that remains to be done will be split into three parts:
- first, generate the data that has already been identified as missing;
- then, find out what data remains to be generated; and then
- add this data.
Ultimately, we expect to have a platform in the cloud that contains data of all possible types.