Lymphoma {EBcoBART}    R Documentation
Lymphoma
Description
Contains training and test data for predicting 2-year progression-free survival (yes/no) from four types of variables: copy number variation, point mutations, translocations, and clinical variables. For each variable, auxiliary information (co-data) is available, which may be used to give more weight to certain variables in the prediction model. This data set is used in the manuscript "Co-data Learning for Bayesian Additive Regression Trees".
Usage
data(Lymphoma)
Format
A list with five elements:
- Xtrain
Data frame with 101 rows (samples) and 140 columns (variables). Explanatory variables used for fitting BART. Variable names are anonymized.
- Ytrain
Numeric vector of length 101. Binary training response (0: 2-year progression-free survival, 1: disease returned within 2 years).
- Xtest
Data frame with 83 rows (samples) and 140 columns (variables). Explanatory variables for the test samples, used to evaluate the fitted BART model. Variable names are anonymized.
- Ytest
Numeric vector of length 83. Binary test response (0: 2-year progression-free survival, 1: disease returned within 2 years).
- CoData
Data frame with 140 rows and 2 columns. Auxiliary information on the 140 variables: a grouping structure indicating the type of each variable (copy number variation (CNV), mutation, translocation, or clinical), and p-values (logit scale) for each variable, obtained from a previous study.
Author(s)
Jeroen M. Goedhart, j.m.goedhart@amsterdamumc.nl
Jurriaan Janssen
References
Jeroen M. Goedhart, Thomas Klausch, Jurriaan Janssen, Mark A. van de Wiel. "Co-data Learning for Bayesian Additive Regression Trees." arXiv preprint arXiv:2311.09997. 2023 Nov 16.
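Examples
The following is a minimal sketch of how the list described under Format might be loaded and checked, not an official example from the package. It assumes that the EBcoBART package is installed and that data(Lymphoma) creates an object named Lymphoma. The helper logit_to_p is hypothetical and assumes that "logit scale" means log(p / (1 - p)). The column names of CoData are not stated on this page, so the sketch inspects them with str() rather than guessing.

```r
# Hypothetical helper (not part of EBcoBART): map a logit-scale value back
# to the probability scale, assuming logit(p) = log(p / (1 - p)).
logit_to_p <- function(x) stats::plogis(x)

# Only run the data checks when the package is available.
if (requireNamespace("EBcoBART", quietly = TRUE)) {
  data(Lymphoma, package = "EBcoBART")

  # Internal consistency: one response per sample, one co-data row per variable.
  stopifnot(
    nrow(Lymphoma$Xtrain) == length(Lymphoma$Ytrain),
    nrow(Lymphoma$Xtest) == length(Lymphoma$Ytest),
    ncol(Lymphoma$Xtrain) == nrow(Lymphoma$CoData)
  )

  # Class balance of the binary responses.
  print(table(Lymphoma$Ytrain))
  print(table(Lymphoma$Ytest))

  # Inspect the co-data columns before using them.
  str(Lymphoma$CoData)
}
```

Once the p-value column has been identified from the str() output, logit_to_p can be applied to it to obtain p-values on the usual 0 to 1 scale.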