This dataset, named `sample_data_extern`, is a subset of publicly available microarray data from the HG-U133PLUS2 chip. It contains expression levels of 200 genes across 50 samples and is used primarily as an external validation set in robust feature selection studies. The data were obtained from the ArrayExpress repository and have been used in the research articles cited below.
A data frame with 50 observations and 201 variables, including:
Factor. The response variable.
Numeric. Expression level of gene 236694_at.
Numeric. Expression level of gene 222356_at.
Numeric. Expression level of gene 1554125_a_at.
Numeric. Expression level of gene 232823_at.
Numeric. Expression level of gene 205766_at.
Numeric. Expression level of gene 1560446_at.
Numeric. Expression level of gene 202565_s_at.
Numeric. Expression level of gene 234887_at.
Numeric. Expression level of gene 209687_at.
Numeric. Expression level of gene 221592_at.
Numeric. Expression level of gene 1570123_at.
Numeric. Expression level of gene 241368_at.
Numeric. Expression level of gene 243324_x_at.
Numeric. Expression level of gene 224046_s_at.
Numeric. Expression level of gene 202775_s_at.
Numeric. Expression level of gene 216332_at.
Numeric. Expression level of gene 1569545_at.
Numeric. Expression level of gene 205946_at.
Numeric. Expression level of gene 203547_at.
Numeric. Expression level of gene 243239_at.
Numeric. Expression level of gene 234245_at.
Numeric. Expression level of gene 210832_x_at.
Numeric. Expression level of gene 224549_x_at.
Numeric. Expression level of gene 236628_at.
Numeric. Expression level of gene 214848_at.
Numeric. Expression level of gene 1553015_a_at.
Numeric. Expression level of gene 1554199_at.
Numeric. Expression level of gene 1557636_a_at.
Numeric. Expression level of gene 1558511_s_at.
Numeric. Expression level of gene 1561713_at.
Numeric. Expression level of gene 1561883_at.
Numeric. Expression level of gene 1568720_at.
Numeric. Expression level of gene 1569168_at.
Numeric. Expression level of gene 1569443_s_at.
Numeric. Expression level of gene 1570103_at.
Numeric. Expression level of gene 200916_at.
Numeric. Expression level of gene 201554_x_at.
Numeric. Expression level of gene 202371_at.
Numeric. Expression level of gene 204481_at.
Numeric. Expression level of gene 205831_at.
Numeric. Expression level of gene 207061_at.
Numeric. Expression level of gene 207423_s_at.
Numeric. Expression level of gene 209896_s_at.
Numeric. Expression level of gene 212646_at.
Numeric. Expression level of gene 214068_at.
Numeric. Expression level of gene 217727_x_at.
Numeric. Expression level of gene 221103_s_at.
Numeric. Expression level of gene 221785_at.
Numeric. Expression level of gene 224207_x_at.
Numeric. Expression level of gene 228257_at.
Numeric. Expression level of gene 228877_at.
Numeric. Expression level of gene 231173_at.
Numeric. Expression level of gene 231328_s_at.
Numeric. Expression level of gene 231639_at.
Numeric. Expression level of gene 232221_x_at.
Numeric. Expression level of gene 232349_x_at.
Numeric. Expression level of gene 232849_at.
Numeric. Expression level of gene 233601_at.
Numeric. Expression level of gene 234403_at.
Numeric. Expression level of gene 234585_at.
Numeric. Expression level of gene 234650_at.
Numeric. Expression level of gene 234897_s_at.
Numeric. Expression level of gene 236071_at.
Numeric. Expression level of gene 236689_at.
Numeric. Expression level of gene 238551_at.
Numeric. Expression level of gene 239414_at.
Numeric. Expression level of gene 241034_at.
Numeric. Expression level of gene 241131_at.
Numeric. Expression level of gene 241897_at.
Numeric. Expression level of gene 242611_at.
Numeric. Expression level of gene 244805_at.
Numeric. Expression level of gene 244866_at.
Numeric. Expression level of gene 32259_at.
Numeric. Expression level of gene 1552264_a_at.
Numeric. Expression level of gene 1552880_at.
Numeric. Expression level of gene 1553186_x_at.
Numeric. Expression level of gene 1553372_at.
Numeric. Expression level of gene 1553438_at.
Numeric. Expression level of gene 1554299_at.
Numeric. Expression level of gene 1554362_at.
Numeric. Expression level of gene 1554491_a_at.
Numeric. Expression level of gene 1555098_a_at.
Numeric. Expression level of gene 1555990_at.
Numeric. Expression level of gene 1556034_s_at.
Numeric. Expression level of gene 1556822_s_at.
Numeric. Expression level of gene 1556824_at.
Numeric. Expression level of gene 1557278_s_at.
Numeric. Expression level of gene 1558603_at.
Numeric. Expression level of gene 1558890_at.
Numeric. Expression level of gene 1560791_at.
Numeric. Expression level of gene 1561083_at.
Numeric. Expression level of gene 1561364_at.
Numeric. Expression level of gene 1561553_at.
Numeric. Expression level of gene 1562523_at.
Numeric. Expression level of gene 1562613_at.
Numeric. Expression level of gene 1563351_at.
Numeric. Expression level of gene 1563473_at.
Numeric. Expression level of gene 1566780_at.
Numeric. Expression level of gene 1567257_at.
Numeric. Expression level of gene 1569664_at.
Numeric. Expression level of gene 1569882_at.
Numeric. Expression level of gene 1570252_at.
Numeric. Expression level of gene 201089_at.
Numeric. Expression level of gene 201261_x_at.
Numeric. Expression level of gene 202052_s_at.
Numeric. Expression level of gene 202236_s_at.
Numeric. Expression level of gene 202948_at.
Numeric. Expression level of gene 203080_s_at.
Numeric. Expression level of gene 203211_s_at.
Numeric. Expression level of gene 203218_at.
Numeric. Expression level of gene 203236_s_at.
Numeric. Expression level of gene 203347_s_at.
Numeric. Expression level of gene 203960_s_at.
Numeric. Expression level of gene 204609_at.
Numeric. Expression level of gene 204806_x_at.
Numeric. Expression level of gene 204949_at.
Numeric. Expression level of gene 204979_s_at.
Numeric. Expression level of gene 205823_at.
Numeric. Expression level of gene 205902_at.
Numeric. Expression level of gene 205967_at.
Numeric. Expression level of gene 206186_at.
Numeric. Expression level of gene 207151_at.
Numeric. Expression level of gene 207379_at.
Numeric. Expression level of gene 207440_at.
Numeric. Expression level of gene 207883_s_at.
Numeric. Expression level of gene 208277_at.
Numeric. Expression level of gene 208280_at.
Numeric. Expression level of gene 209224_s_at.
Numeric. Expression level of gene 209561_at.
Numeric. Expression level of gene 209630_s_at.
Numeric. Expression level of gene 210118_s_at.
Numeric. Expression level of gene 210342_s_at.
Numeric. Expression level of gene 211566_x_at.
Numeric. Expression level of gene 211756_at.
Numeric. Expression level of gene 212170_at.
Numeric. Expression level of gene 212494_at.
Numeric. Expression level of gene 213118_at.
Numeric. Expression level of gene 214475_x_at.
Numeric. Expression level of gene 214834_at.
Numeric. Expression level of gene 215718_s_at.
Numeric. Expression level of gene 216283_s_at.
Numeric. Expression level of gene 217206_at.
Numeric. Expression level of gene 217557_s_at.
Numeric. Expression level of gene 217577_at.
Numeric. Expression level of gene 218152_at.
Numeric. Expression level of gene 218252_at.
Numeric. Expression level of gene 219714_s_at.
Numeric. Expression level of gene 220506_at.
Numeric. Expression level of gene 220889_s_at.
Numeric. Expression level of gene 221204_s_at.
Numeric. Expression level of gene 221795_at.
Numeric. Expression level of gene 222048_at.
Numeric. Expression level of gene 223142_s_at.
Numeric. Expression level of gene 223439_at.
Numeric. Expression level of gene 223673_at.
Numeric. Expression level of gene 224363_at.
Numeric. Expression level of gene 224512_s_at.
Numeric. Expression level of gene 224690_at.
Numeric. Expression level of gene 224936_at.
Numeric. Expression level of gene 225334_at.
Numeric. Expression level of gene 225713_at.
Numeric. Expression level of gene 225839_at.
Numeric. Expression level of gene 226041_at.
Numeric. Expression level of gene 226093_at.
Numeric. Expression level of gene 226543_at.
Numeric. Expression level of gene 227695_at.
Numeric. Expression level of gene 228295_at.
Numeric. Expression level of gene 228548_at.
Numeric. Expression level of gene 229234_at.
Numeric. Expression level of gene 229658_at.
Numeric. Expression level of gene 229725_at.
Numeric. Expression level of gene 230252_at.
Numeric. Expression level of gene 230471_at.
Numeric. Expression level of gene 231149_s_at.
Numeric. Expression level of gene 231556_at.
Numeric. Expression level of gene 231754_at.
Numeric. Expression level of gene 232011_s_at.
Numeric. Expression level of gene 233030_at.
Numeric. Expression level of gene 234161_at.
Numeric. Expression level of gene 235050_at.
Numeric. Expression level of gene 235094_at.
Numeric. Expression level of gene 235278_at.
Numeric. Expression level of gene 235671_at.
Numeric. Expression level of gene 235952_at.
Numeric. Expression level of gene 236158_at.
Numeric. Expression level of gene 236181_at.
Numeric. Expression level of gene 237055_at.
Numeric. Expression level of gene 237768_x_at.
Numeric. Expression level of gene 238897_at.
Numeric. Expression level of gene 239160_at.
Numeric. Expression level of gene 239998_at.
Numeric. Expression level of gene 240254_at.
Numeric. Expression level of gene 240612_at.
Numeric. Expression level of gene 240692_at.
Numeric. Expression level of gene 240822_at.
Numeric. Expression level of gene 240842_at.
Numeric. Expression level of gene 241331_at.
Numeric. Expression level of gene 241598_at.
Numeric. Expression level of gene 241927_x_at.
Numeric. Expression level of gene 242405_at.
This dataset was extracted from a larger dataset available on ArrayExpress and is used as an external validation set for feature selection tasks and other machine learning applications in bioinformatics.
Ellenbach, N., Boulesteix, A.L., Bischl, B., et al. (2021). Improved Outcome Prediction Across Data Sources Through Robust Parameter Tuning. Journal of Classification, 38, 212–231. doi:10.1007/s00357-020-09368-z.
Hornung, R., Causeur, D., Bernau, C., Boulesteix, A.L. (2017). Improving cross-study prediction through addon batch effect adjustment or addon normalization. Bioinformatics, 33(3), 397–404. doi:10.1093/bioinformatics/btw650.
# Load the dataset
data(sample_data_extern)
# View the first few rows of the dataset
head(sample_data_extern)
# Summary of the dataset
summary(sample_data_extern)
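# Split the data into response and predictors
# A minimal sketch, not part of the package: it assumes the response is the
# only factor column and that every other column is a numeric expression value,
# as described in the format above. `split_response` is a hypothetical helper name.
split_response <- function(df) {
  is_resp <- vapply(df, is.factor, logical(1))  # TRUE for factor columns
  stopifnot(sum(is_resp) == 1)                  # exactly one response expected
  list(y = df[[which(is_resp)]],
       X = df[, !is_resp, drop = FALSE])
}
# Guarded so the sketch only runs once the dataset has been loaded
if (exists("sample_data_extern")) {
  parts <- split_response(sample_data_extern)
  print(dim(parts$X))    # 50 rows and 200 columns if the description holds
  print(table(parts$y))  # class distribution of the response
}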