Simultaneous regression and classification for drug sensitivity prediction using an advanced random forest method

Abstract

Machine learning methods trained on cancer cell line panels are intensively studied for the prediction of optimal anti-cancer therapies. While classification approaches distinguish effective from ineffective drugs, regression approaches aim to quantify the degree of drug effectiveness. However, the high specificity of most anti-cancer drugs induces a skewed distribution of drug response values in favor of the more drug-resistant cell lines, negatively affecting the classification performance (class imbalance) and regression performance (regression imbalance) for the sensitive cell lines. Here, we present a novel approach called SimultAneoUs Regression and classificatiON Random Forests (SAURON-RF) based on the idea of performing a joint regression and classification analysis. We demonstrate that SAURON-RF improves the classification and regression performance for the sensitive cell lines at the expense of a moderate loss for the resistant ones. Furthermore, our results show that simultaneous classification and regression can be superior to regression or classification alone.
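The imbalance described above can be illustrated with a small sketch. It is not the SAURON-RF implementation: the response values, the threshold, the convention that a lower response means a more sensitive cell line, and the helper names `binarize` and `balanced_weights` are all assumptions made for illustration. The sketch binarizes continuous responses into sensitive and resistant classes, then derives per-sample weights that are inversely proportional to class frequency. Weights of this kind are one common way to keep a majority class from dominating model fitting.

```python
from collections import Counter


def binarize(responses, threshold):
    """Label a cell line sensitive (1) if its response is below the threshold, else resistant (0).

    Assumption: lower response values indicate higher drug sensitivity.
    """
    return [1 if r < threshold else 0 for r in responses]


def balanced_weights(labels):
    """Weight each sample by n / (k * count(class)) so that every class has equal total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


# Hypothetical response values for eight cell lines; most are resistant.
responses = [0.2, 1.5, 1.8, 2.1, 2.4, 1.9, 2.2, 0.4]
threshold = 1.0  # hypothetical binarization threshold

labels = binarize(responses, threshold)
weights = balanced_weights(labels)

for r, y, w in zip(responses, labels, weights):
    print(f"response={r:.1f} class={'sensitive' if y else 'resistant'} weight={w:.3f}")
```

With two sensitive and six resistant cell lines, each sensitive sample receives three times the weight of a resistant one, so both classes contribute the same total weight. Such weights could then be passed to a learner that accepts per-sample weights.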

Citation

[LEG+22] Lenhof, K., Eckhart, L., Gerstner, N., Kehl, T., Lenhof, H.-P. Simultaneous regression and classification for drug sensitivity prediction using an advanced random forest method. Scientific Reports, 2022. DOI: 10.1038/s41598-022-17609-x.