WebSeq: A Genomic Data Analytics Platform for Monogenic Disease Discovery
Author(s): Milind Agarwal, Kshitiz Ghimire, Joy D. Cogan, Undiagnosed Disease Network, Janet Markle.
Whole exome sequencing (WES) is commonly used to study monogenic diseases. Its use among clinicians and researchers has grown as WES pricing has declined. The resulting accumulation of WES data creates a need for a robust, flexible, scalable, and easy-to-use analytics platform that allows researchers to gain biological insight from these genomic data. We present WebSeq, a self-contained server and web interface that facilitates intuitive analysis of WES data. WebSeq provides access to sophisticated tools and pipelines through a modern, user-friendly web interface. WebSeq includes modules for (i) FASTQ to VCF conversion, (ii) VCF to ANNOVAR [1] CSV conversion, (iii) family-based analyses for Mendelian disease gene discovery, (iv) cohort-wide gene enrichment analyses, (v) an automated IGV [2] browser, and (vi) ‘virtual gene panel’ analysis. WebSeq Pro, our expanded pipeline, also supports SNP genotype analyses such as ancestry inference and kinship testing. WebSeq Lite, our minimal pipeline, supports family-based analyses, cohort-wide gene enrichment analyses, and a virtual gene panel, along with the IGV [2] browser module. We anticipate that rigorous use of our web application will allow researchers to expedite discoveries from human genomic data [3]. WebSeq Lite, WebSeq, and WebSeq Pro are fully containerized using Docker [4], run on all major operating systems, and are freely available for personal, academic, and non-profit use at http://bitly.ws/g6cn.