Improving diagnostics and prognostics for cancer remains a challenging problem. Because cancer is largely a molecular disease, omics data are essential for answering diagnostic and prognostic questions. Omics data, however, are inherently high-dimensional, with the number of features often exceeding 10,000. This makes it difficult to build precise and reliable prediction models.
One way to improve predictions is to incorporate external information, also called co-data, into the models. A wealth of such information is available that could enhance diagnostic and prognostic learning. Examples of co-data sources are p-values from external studies, groupings of genes by their function in the cell, and the position of a gene on the genome.
My research focusses on incorporating co-data into tree-based methods such as Bayesian additive regression trees and random forests. The external information is integrated in such a way that the primary data remain central to the learner, while the co-data guide the search for relevant features. An example of such a co-data model can be found here.
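
To make the idea of co-data guiding the feature search concrete, here is a minimal sketch of one possible mechanism for a random forest: external p-values are turned into sampling probabilities for the candidate features considered at each split. This is an illustration only, not necessarily the method developed in this research. The function names, the `-log(p)` transform, and the `strength` parameter are all assumptions made for the example.

```python
import numpy as np

def codata_sampling_probs(external_pvalues, strength=1.0):
    """Turn external p-values (co-data) into feature sampling probabilities.

    Hypothetical helper: the -log transform and `strength` are assumptions.
    strength=0 ignores the co-data and gives uniform sampling.
    """
    p = np.clip(np.asarray(external_pvalues, dtype=float), 1e-12, 1.0)
    scores = -np.log(p)                # larger score = stronger external evidence
    weights = 1.0 + strength * scores  # baseline of 1 keeps every feature reachable
    return weights / weights.sum()

def sample_candidate_features(probs, mtry, rng):
    """Draw `mtry` distinct candidate features for one split, weighted by co-data."""
    return rng.choice(len(probs), size=mtry, replace=False, p=probs)

# Toy co-data: one external p-value per feature (made-up numbers).
rng = np.random.default_rng(0)
external_pvalues = np.array([0.001, 0.5, 0.9, 0.01, 0.8])
probs = codata_sampling_probs(external_pvalues, strength=1.0)
candidates = sample_candidate_features(probs, mtry=2, rng=rng)
```

In this sketch the co-data only change which features are proposed as split candidates. Which candidate is actually used for the split would still be decided by the primary data, so the primary data stay central to the learner.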