Collaborative Filtering for Predictive Test Selection

See how the collaborative filtering method can be applied to the predictive test selection problem. The application helps tackle one of the core challenges of predictive test selection: handling the lack of information on new tests.

Lior Neumann
Itay Cohen
May 2, 2023
Introduction

Our goal at Redefine is to predict whether a given test will fail following a certain code change. To achieve this, we set out to estimate the probability of failure for each test, given a change in a specific file.
In order to estimate this conditional probability, we use historical test runs. If a given file foo.py changed 1,000 times, and test_foo() failed in 300 of those runs, the estimated conditional probability is:

$$\hat{p}_{\texttt{test\_foo()} \mid \texttt{foo.py}} = \frac{300}{1000} = 0.3$$
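As a minimal sketch of this estimation step (the record format below is an illustrative assumption, not an actual production schema):

```python
from collections import defaultdict

def estimate_failure_probabilities(history):
    """Estimate P[test fails | file changed] from historical runs.

    `history` is an iterable of (file, test, failed) records; this
    record format is an assumption made for illustration.
    """
    runs = defaultdict(int)      # (file, test) -> number of runs
    failures = defaultdict(int)  # (file, test) -> number of failures
    for file, test, failed in history:
        runs[(file, test)] += 1
        if failed:
            failures[(file, test)] += 1
    return {key: failures[key] / runs[key] for key in runs}

# foo.py changed 1,000 times; test_foo() failed in 300 of those runs.
history = [("foo.py", "test_foo()", i < 300) for i in range(1000)]
probs = estimate_failure_probabilities(history)
print(probs[("foo.py", "test_foo()")])  # 0.3
```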
We model these conditional relationships using the following bipartite graph:

[Figure: a bipartite graph with changed files on one side and tests on the other; edge weights are the estimated failure probabilities.]
One of the core challenges of this estimation is the handling of new tests, for which little historical information is available. For example, the following graph shows a new test, new_test(), with missing information about the relationship between new_test() and bar2.py.

[Figure: the same bipartite graph with new_test() added; the edge between new_test() and bar2.py is unknown.]
Leveraging Collaborative Filtering 

Collaborative filtering, a popular method in the field of recommender systems, is usually modeled with two types of entities, typically thought of as users and movies. In our case, the entities are files and tests: instead of users rating movies, we have tests rating files.


For the purpose of this blog post, we assume that we have a finite set of files denoted by $F$ and a set of tests $T$. We further assume that we have a perfect understanding of the relationship between every pair $f \in F$ and $t \in T$ in terms of failure probability. That is, we assume we know $p_{t \mid f} = P[t \text{ fails} \mid f \text{ changed}]$.


Starting at new_test() on the right-hand side and moving a single step, we reach bar1.py. Moving another step, we reach test_bar(), which is also affected by bar1.py. We conclude that test_bar() and new_test() are both affected by bar1.py and hence are similar. test_foo() and new_test(), on the other hand, are not similar, as no file affects both of them.
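
As a minimal sketch of this two-step walk, using a toy dictionary representation of the graph (the edges and their weights are illustrative assumptions consistent with the figures described above):

```python
# Toy bipartite graph: file -> {test: estimated failure probability}.
# Edge weights are illustrative assumptions.
graph = {
    "foo.py":  {"test_foo()": 0.3},
    "bar1.py": {"test_bar()": 0.5, "new_test()": 0.4},
    "bar2.py": {"test_bar()": 0.6},   # the new_test() edge is unknown
}

def co_affected(test):
    """All tests reachable from `test` by a two-step walk through a shared file."""
    similar = set()
    for tests in graph.values():
        if test in tests:
            similar.update(tests)
    similar.discard(test)
    return similar

print(co_affected("new_test()"))   # {'test_bar()'}  -> similar
print(co_affected("test_foo()"))   # set()           -> shares no file
```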


A slightly more sophisticated way to compute the similarity is by using the adjacency matrix of the bipartite graph, which we call $P$. In our case, the rows of $P$ are indexed by foo.py, bar1.py, and bar2.py, and the columns of $P$ are indexed by test_foo(), test_bar(), and new_test().

[Matrix: the adjacency matrix $P$, with one row per file and one column per test; entries are the estimated failure probabilities.]
The similarity between any two tests can now be described by the inner product $\langle P_t, P_{t_{\mathrm{new}}} \rangle$, where $P_t$ denotes the column of $P$ corresponding to test $t$. This inner product sums over all two-step walks on the graph from $t_{\mathrm{new}}$ to $t$, weighting each walk by the product of its two edge weights. Note that this alone is not enough: a test that always fails would have a large similarity with every other test. To avoid this, we normalize the two-step walk, and the result is exactly the cosine similarity of the tests:

$$\mathrm{sim}(t, t') = \frac{\langle P_t, P_{t'} \rangle}{\lVert P_t \rVert \, \lVert P_{t'} \rVert}$$
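
A minimal NumPy sketch of this computation; since the post's actual matrix values lived in figures, the entries of $P$ below are assumptions consistent with the graph described above:

```python
import numpy as np

# Rows: foo.py, bar1.py, bar2.py.  Columns: test_foo(), test_bar(), new_test().
# Entries are assumed failure probabilities; 0.0 marks no observed edge.
P = np.array([
    [0.3, 0.0, 0.0],   # foo.py
    [0.0, 0.5, 0.4],   # bar1.py
    [0.0, 0.6, 0.0],   # bar2.py  (new_test() entry unknown)
])

# Normalize each column (test) to unit length, then take inner products:
# this is exactly the cosine similarity between every pair of tests.
norms = np.linalg.norm(P, axis=0)
S = (P / norms).T @ (P / norms)

print(np.round(S, 2))
# [[1.   0.   0.  ]
#  [0.   1.   0.64]
#  [0.   0.64 1.  ]]
```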

In our example, this results in the following similarity matrix:

[Matrix: pairwise cosine similarities between test_foo(), test_bar(), and new_test().]
Evaluating the new estimated probabilities for foo.py and bar2.py yields the following results:

[Results: updated probability estimates per test for changes to foo.py and bar2.py.]

Indeed, new_test() is similar to test_bar(), and test_bar() failed following a change in bar2.py, which increases the probability that new_test() will also fail following a change in bar2.py.
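
Continuing with the assumed $P$ and $S$ from the previous snippet, one simple way to fill the missing entry is a similarity-weighted average of the other tests' probabilities for the same file; this is an illustration of the idea, not necessarily the post's exact formula:

```python
def similarity_weighted_estimate(P, S, file_idx, test_idx):
    """Fill a missing P[test fails | file changed] entry using similar tests.
    One plausible weighting, assumed for illustration."""
    others = [t for t in range(P.shape[1]) if t != test_idx]
    weights = S[test_idx, others]
    if weights.sum() == 0:
        return 0.0  # no similar test has history for this file
    return float(weights @ P[file_idx, others] / weights.sum())

# Row 2 is bar2.py, column 2 is new_test(); the estimate borrows from test_bar().
print(round(similarity_weighted_estimate(P, S, file_idx=2, test_idx=2), 2))  # 0.6
```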

Collaborative Filtering with Uncertainty

Unlike classical collaborative filtering, our tests do not rate the files with their exact probability of failure: after only a handful of runs, the empirical estimate may be far from the true value. Removing the assumption of full and accurate probabilities, we can further improve our accuracy with the following estimation function:

[Equation: an estimation function that weights each estimate by a confidence term.]
One possible confidence function would be

[Equation: a confidence function of n(t, f).]

where n(t, f) is the number of times test t ran following a change in file f. Back to our example, given:

[Values: the example's run counts and estimated probabilities.]
and using the knowledge we have about the relationship between test_bar() and bar2.py, we can estimate the relationship between new_test() and bar2.py as follows:

[Equation: the resulting estimate for new_test() and bar2.py.]

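To make the idea concrete, here is a hedged sketch reusing P, S, and similarity_weighted_estimate from the snippets above. The confidence function n / (n + k) and the convex blend of the two estimates are assumptions standing in for the post's omitted formulas:

```python
import numpy as np

def confidence(n, k=10):
    """A plausible confidence function: 0 with no data, tending to 1 as n
    grows. Both the form n / (n + k) and the constant k are assumptions."""
    return n / (n + k)

def blended_estimate(P, S, counts, file_idx, test_idx):
    """Blend the direct empirical estimate with the similarity-based one,
    weighted by confidence -- an assumed formulation of the post's idea."""
    c = confidence(counts[file_idx, test_idx])
    direct = P[file_idx, test_idx]
    cf = similarity_weighted_estimate(P, S, file_idx, test_idx)
    return c * direct + (1 - c) * cf

# Assumed run counts n(t, f), aligned with P: new_test() has never run
# after a change to bar2.py, so its confidence there is zero.
counts = np.array([[1000,   0, 0],
                   [   0, 200, 5],
                   [   0, 150, 0]])

# With zero confidence, the estimate falls back entirely on the
# similarity-based value borrowed from test_bar().
print(round(blended_estimate(P, S, counts, file_idx=2, test_idx=2), 2))  # 0.6
```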
Summary

To summarize, we showed how the collaborative filtering method can be applied to the predictive test selection problem. The application helps tackle one of the core challenges of predictive test selection: handling the lack of information on new tests. It is worth mentioning that this is not a complete solution to the problem. Rather, at Redefine, we combine multiple techniques and methods in order to improve the accuracy of our ML models and reduce time to value.


Stay tuned for more posts like this...