Integration of pathway knowledge into a weighted recursive feature elimination approach for cancer patients risk stratification

Integration of pathway knowledge into a reweighted recursive feature elimination approach for risk stratification of cancer patients - Motivation: One of the main goals of high-throughput gene-expression studies in cancer research is to identify prognostic gene signatures, which have the potential to predict the clinical outcome. It is common practice to investigate these questions using classification methods. However, standard methods merely rely on gene-expression data and assume the genes to be independent. Including pathway knowledge a priori into the classification process has recently been indicated as a promising way to increase classification accuracy as well as the interpretability and reproducibility of prognostic gene signatures.
Results: We propose a new method called Reweighted Recursive Feature Elimination. It is based on the hypothesis that a gene with a low fold-change should have an increased influence on the classifier if it is connected to differentially expressed genes. We used a modified version of Google's PageRank algorithm to alter the ranking criterion of the SVM-RFE algorithm. Evaluations of our method on an integrated breast cancer dataset comprising 788 samples showed an improvement of the area under the receiver operator characteristic curve as well as in the reproducibility and interpretability of selected genes.
Availability: The R code of the proposed algorithm is given in Supplementary Material.