# Collaborative Filtering for Predictive Test Selection

See how the collaborative filtering method can be applied to the predictive test selection problem. The application helps tackle one of the core challenges of predictive test selection: handling the lack of information on new tests.

##### Introduction

Our goal at Redefine is to predict whether a given test will fail following a certain code change. To achieve this, we set out to estimate the probability of failure for each test, given a change in a specific file.

In order to estimate this conditional probability, we use historical test runs. If a given file **foo.py** changed 1,000 times, and the test **test_foo()** failed in 300 of those runs, the estimated conditional probability is:

*P[test_foo() fails | foo.py changed] ≈ 300 / 1,000 = 0.3*
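This empirical estimate can be sketched directly from a run log. A minimal sketch in Python (the log format and names here are hypothetical):

```python
from collections import Counter

# Hypothetical run log: each entry is (changed_file, test_name, failed).
runs = [("foo.py", "test_foo", True)] * 300 + [("foo.py", "test_foo", False)] * 700

changes = Counter()   # times each (file, test) pair was exercised
failures = Counter()  # times the test failed for that file change

for changed_file, test, failed in runs:
    changes[(changed_file, test)] += 1
    if failed:
        failures[(changed_file, test)] += 1

def p_fail(test, changed_file):
    """Empirical estimate of P[test fails | file changed]."""
    n = changes[(changed_file, test)]
    return failures[(changed_file, test)] / n if n else None

print(p_fail("test_foo", "foo.py"))  # → 0.3
```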

We model these conditional relationships using the following bipartite graph:

One of the core challenges of this estimation is handling new tests, for which little historical information is available. For example, the following graph shows a new test, **new_test()**, with missing information about the relationship between **new_test()** and **bar2.py**.

##### Leveraging Collaborative Filtering

Collaborative filtering, a popular method in the field of recommender systems, is usually modeled with two types of entities. These entities are typically thought of as movies and users. In our case, the entities are files and tests. Instead of users ranking movies, in our model, we have tests ranking files.

For the purpose of this blog post, we assume that we have a finite set of files denoted by *F* and a set of tests *T*. We further assume that we have a perfect understanding of the relationship between every pair *f ∈ F* and *t ∈ T* in terms of failure probability. That is, we assume we know *p(t|f) = P[t fails | f changed]*.

Starting at **new_test()** on the right-hand side and moving a single step, we reach **bar1.py**. Moving another step on the graph, we reach **test_bar()**, which is also affected by **bar1.py**. We conclude that both **test_bar()** and **new_test()** are affected by **bar1.py** and hence are similar. On the other hand, **test_foo()** and **new_test()** are not similar, as they are not affected by any common file.

A slightly more sophisticated way to compute the similarity is by using the adjacency matrix of the bipartite graph, which we call *P*. In our case, the rows of *P* are indexed by **foo.py**, **bar1.py**, and **bar2.py**; and the columns of *P* are indexed by **test_foo()**, **test_bar()**, and **new_test()**.
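As a sketch, the adjacency matrix *P* can be written out with NumPy. The probability values below are hypothetical, chosen only to match the structure of the graph; since **new_test()**'s relationship with **bar2.py** is unknown, its entry is zero:

```python
import numpy as np

files = ["foo.py", "bar1.py", "bar2.py"]            # row index
tests = ["test_foo()", "test_bar()", "new_test()"]  # column index

# Hypothetical failure probabilities p(t|f); a zero entry means
# no observed relationship between the file and the test.
P = np.array([
    [0.3, 0.0, 0.0],  # foo.py
    [0.0, 0.5, 0.4],  # bar1.py
    [0.0, 0.6, 0.0],  # bar2.py: new_test()'s entry is unknown
])
print(P.shape)  # → (3, 3)
```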

The similarity between any two tests *t* and *t_new* can now be described by the inner product *⟨P_t, P_{t_new}⟩* of their columns, which corresponds to all the two-step walks on the graph from *t* to *t_new*, accounting for the weights. Note that this is not enough, as it means that a test that always fails has a large similarity with all other tests. In order to avoid this, we should normalize the two-step walk on the graph. The result is exactly the cosine similarity of the tests:

*sim(t, t_new) = ⟨P_t, P_{t_new}⟩ / (‖P_t‖ · ‖P_{t_new}‖)*
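The normalized two-step walk can be computed as the cosine similarity between columns of *P*. A minimal sketch with NumPy (the probability values are hypothetical):

```python
import numpy as np

# Rows: foo.py, bar1.py, bar2.py; columns: test_foo(), test_bar(), new_test().
# Hypothetical failure probabilities p(t|f).
P = np.array([
    [0.3, 0.0, 0.0],
    [0.0, 0.5, 0.4],
    [0.0, 0.6, 0.0],
])

def cosine_sim(M, i, j):
    """Cosine similarity between columns i and j of M."""
    a, b = M[:, i], M[:, j]
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# test_bar() and new_test() share bar1.py, so their similarity is nonzero.
print(round(cosine_sim(P, 1, 2), 2))  # → 0.64
# test_foo() and new_test() share no file, so their similarity is zero.
print(cosine_sim(P, 0, 2))            # → 0.0
```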

In our example, this results in the following similarity matrix:

Evaluating the new estimated probabilities for foo.py and bar2.py yields the following results:

Indeed, **new_test()** is similar to **test_bar()**; and **test_bar()** failed following a change in **bar2.py**, which increases the estimated probability that **new_test()** will also fail following a change in **bar2.py**.

##### Collaborative Filtering with Uncertainty

Unlike classical collaborative filtering, our tests do not "rank" the files with their exact probability of failure after a single run. Dropping the assumption of full and accurate knowledge of these probabilities, we can further improve our accuracy with the following estimation function:

One possible confidence function would be

where *n*(*t, f*) is the number of times test *t* ran following a change in file *f*. Back to our example, given:

and using the knowledge we have about the relationship between **test_bar()** and **bar2.py**, we can estimate the relationship between **new_test()** and **bar2.py** as follows:
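Setting the exact estimation and confidence functions aside, the idea can be sketched with a similarity-weighted average over the other tests. Both the form of the estimate below and the probability values are assumptions for illustration, not the exact formula used in practice:

```python
import numpy as np

# Rows: foo.py, bar1.py, bar2.py; columns: test_foo(), test_bar(), new_test().
# Hypothetical failure probabilities p(t|f).
P = np.array([
    [0.3, 0.0, 0.0],
    [0.0, 0.5, 0.4],
    [0.0, 0.6, 0.0],
])

def cosine_sim(M, i, j):
    a, b = M[:, i], M[:, j]
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def estimate(M, t_new, f):
    """Similarity-weighted estimate of p(t_new | f) from the other tests.

    An assumed illustrative form:
        sum_t sim(t_new, t) * p(t|f)  /  sum_t sim(t_new, t)
    """
    others = [t for t in range(M.shape[1]) if t != t_new]
    sims = np.array([cosine_sim(M, t, t_new) for t in others])
    probs = M[f, others]
    return (sims @ probs) / sims.sum()

# Estimate p(new_test() fails | bar2.py changed): the nonzero similarity
# with test_bar() pulls the estimate toward test_bar()'s failure rate.
print(round(estimate(P, t_new=2, f=2), 2))  # → 0.6
```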

##### Summary

To summarize, we showed how the collaborative filtering method can be applied to the predictive test selection problem. The application helps tackle one of the core challenges of predictive test selection: handling the lack of information on new tests. It is worth mentioning that this is not a complete solution to the problem. Rather, at Redefine, we combine multiple techniques and methods in order to improve the accuracy of our ML models and reduce time to value.

Stay tuned for more posts like this...