Thomas G Brooks (1), Gregory R Grant (1,2)
1. ITMAT, University of Pennsylvania
2. Dept. of Genetics, University of Pennsylvania
Poster #85
Omics data typically fall in the "p >> n" regime, where there are far fewer samples than measurements per sample. This creates two challenges for generating realistic simulated data for benchmarking. First, there are too few samples to estimate a full dependence structure (e.g., a full-rank correlation matrix). Second, generating omics-scale data with a specified correlation matrix is slow. As a result, simulators often assume that the measurements are independent, which does not reflect reality. Here, we give a simple solution to both problems: a low-rank correlation matrix is used both to approximate the realistic dependencies in a real dataset and to generate simulated data that mimics the real omics data with a NORTA (Normal to Anything) approach.
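As an illustration of the idea, here is a minimal Python sketch; it is not the authors' implementation. It assumes the low-rank correlation is taken from the top singular vectors of the standardized data, with a diagonal term added so that every feature keeps unit variance. It also assumes empirical quantiles of the real data as the target marginals, and it uses the fitted correlation directly for the latent normal variables without any NORTA correlation adjustment. The function names, the chosen rank, and the toy dataset are all hypothetical.

```python
import math
import numpy as np


def fit_low_rank_correlation(data, rank):
    """Approximate the feature correlation matrix as W @ W.T + diag(d).

    data has shape (n_samples, n_features) with no constant features.
    The p x p correlation matrix is never formed explicitly.
    """
    n = data.shape[0]
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # SVD of the n x p standardized matrix: z.T @ z / (n - 1) is the sample correlation.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    W = (vt[:rank].T * s[:rank]) / math.sqrt(n - 1)  # shape (p, rank)
    # Diagonal remainder so that each feature has unit total variance.
    d = np.clip(1.0 - np.sum(W**2, axis=1), 0.0, None)
    return W, d


def simulate_norta(W, d, real_data, n_samples, rng):
    """Draw correlated normals, map to uniforms, then to empirical marginals."""
    p, k = W.shape
    factors = rng.standard_normal((n_samples, k))
    noise = rng.standard_normal((n_samples, p))
    # Latent normals with covariance W @ W.T + diag(d); cost is O(n_samples * p * k).
    latent = factors @ W.T + noise * np.sqrt(d)
    # Standard normal CDF via math.erf (scipy.special.ndtr is a faster alternative).
    uniforms = 0.5 * (1.0 + np.vectorize(math.erf)(latent / math.sqrt(2.0)))
    # Assumed marginals: empirical quantiles of each real feature.
    sim = np.empty((n_samples, p))
    for j in range(p):
        sim[:, j] = np.quantile(real_data[:, j], uniforms[:, j])
    return sim


# Toy "omics-like" data: 20 samples, 200 features, driven by 3 hidden factors.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 200))
real = np.exp(hidden + 0.5 * rng.standard_normal((20, 200)))

W, d = fit_low_rank_correlation(real, rank=3)
sim = simulate_norta(W, d, real, n_samples=50, rng=rng)
```

Because the latent normals are built from a rank-k factor plus independent noise, the cost of simulation grows linearly in the number of features rather than requiring a factorization of a p x p matrix.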