Q-Mer Analysis: A Generalized Method for Analyzing RNA-Seq Data
Author(s): Tatsuma Shoji and Yoshiharu Sato.
Background: RNA-Seq data are usually summarized by counting the number of reads aligned to each gene. However, count-based methods discard alignment information, that is, where and how each read was mapped within the gene. This information is essential for characterizing samples accurately. In this study, we developed a method to summarize RNA-Seq data without losing alignment information.

Results: To include alignment information, we introduce “q-mer analysis,” which summarizes RNA-Seq data using the 4^q possible oligomers of length q. Using publicly available RNA-Seq datasets, we demonstrate that q ≥ 9 is required to capture alignment information in Homo sapiens. Note that 4^9 = 262,144 is more than 10 times the number of genes in H. sapiens (20,022 genes). Furthermore, principal component analysis showed that q-mer analysis with q = 14 linearly separated case samples from controls, whereas a count-based method failed to do so. These results indicate that alignment information is essential for characterizing transcriptomic samples.

Conclusions: We introduce q-mer analysis to incorporate alignment information into RNA-Seq analysis and demonstrate its advantage over count-based methods: q-mer analysis can distinguish case samples from controls. Combining RNA-Seq research with q-mer analysis could help identify distinguishing transcriptomic features that suggest hypotheses about disease mechanisms.
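
To make the idea of a 4^q-dimensional summary concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the abstract does not specify how q-mers are extracted, so the sketch assumes they are counted with a sliding window over read sequences and that windows containing ambiguous bases such as N are skipped. The function names `qmer_index` and `qmer_profile` are hypothetical.

```python
ALPHABET = "ACGT"


def qmer_index(qmer):
    """Map a q-mer to an integer in [0, 4**q) by reading it as a base-4 number."""
    idx = 0
    for base in qmer:
        idx = idx * 4 + ALPHABET.index(base)
    return idx


def qmer_profile(reads, q):
    """Return a list of length 4**q holding q-mer counts over all reads.

    Assumption (not from the abstract): every window of length q in each
    read is counted, and windows with non-ACGT characters are ignored.
    """
    profile = [0] * (4 ** q)
    for read in reads:
        read = read.upper()
        for i in range(len(read) - q + 1):
            window = read[i:i + q]
            if all(base in ALPHABET for base in window):
                profile[qmer_index(window)] += 1
    return profile


# Toy example with q = 2, so the profile has 4**2 = 16 entries.
reads = ["ACGTAC", "CGTNAC"]
profile = qmer_profile(reads, q=2)
```

Each sample then becomes one vector of length 4^q, and a matrix of such vectors (one row per sample) could be passed to principal component analysis. For q = 14 the vector has 4^14 = 268,435,456 entries, so a dense list like the one above would likely need to be replaced by a sparse or array-based representation in practice.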