Algorithm for Selecting Potential SARS-CoV-2 Dominant Variants based on POS-NT Frequency
Author(s): Eunhee Kang, TaeJin Ahn and Taesung Park
Coronavirus disease 2019 (COVID-19), currently prevalent worldwide, is caused by a novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Like other RNA viruses, SARS-CoV-2 continues to evolve through random mutations, producing numerous variants, including Alpha, Beta, and Delta. It is therefore necessary to predict which mutations will constitute a dominant variant before such a variant emerges, which can be achieved by continuously monitoring mutation trends and patterns. In the current study, we sought to design a dominant variant candidate (DVC) selection algorithm. To this end, we obtained SARS-CoV-2 sequence data from GISAID and, through data preprocessing, extracted position-nucleotide (POS-NT) frequency ratio data by country and date. We then defined the dominant dates for each variant in the USA and developed a frequency ratio prediction model for each POS-NT. Based on this model, we applied DVC criteria to build the selection algorithm, which was verified for Delta and Omicron. Using Condition 3 as the DVC criterion, 69 and 102 DVC POS-NTs were identified for Delta and Omicron, on average 47 and 82 days before the dominant dates, respectively. Moreover, 13 and 44 Delta- and Omicron-defining POS-NTs were recognized 18 and 25 days before the dominant dates, respectively. All DVC POS-NTs, including both rapidly and gradually increasing ones, were identified before the dominant dates. Considering that we successfully identified all POS-NT mutations for Delta and Omicron, the DVC algorithm may represent a valuable tool for providing early predictions regarding future variants, helping improve global health.
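To make the POS-NT frequency ratio concrete, the following minimal Python sketch shows one way such a ratio could be tabulated by country and date. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not define the ratio, so it is assumed here to be the share of sequences sampled in a given country on a given date that carry a given nucleotide at a given position. The function names `pos_nt_frequency_ratio` and `lead_days`, the record layout, and the toy sequences are all hypothetical, and the sequences are assumed to be pre-aligned to a common reference.

```python
from collections import Counter, defaultdict
from datetime import date


def pos_nt_frequency_ratio(records):
    """Return {(country, day, pos, nt): ratio} for aligned sequences.

    Assumption (not stated in the abstract): ratio = number of sequences
    carrying `nt` at `pos`, divided by all sequences sampled in that
    country on that day. Positions are 1-based.
    """
    totals = Counter()             # sequences per (country, day)
    counts = defaultdict(Counter)  # (country, day) -> counts of (pos, nt)
    for country, day, seq in records:
        key = (country, day)
        totals[key] += 1
        for pos, nt in enumerate(seq, start=1):
            counts[key][(pos, nt)] += 1
    return {
        (country, day, pos, nt): n / totals[(country, day)]
        for (country, day), per_site in counts.items()
        for (pos, nt), n in per_site.items()
    }


def lead_days(detected_on, dominant_on):
    """Days between the date a POS-NT is flagged and the dominant date."""
    return (dominant_on - detected_on).days


# Toy, made-up data: three aligned 4-nt sequences (not real GISAID records).
toy_records = [
    ("USA", date(2021, 4, 1), "ACGT"),
    ("USA", date(2021, 4, 1), "ACGA"),
    ("USA", date(2021, 4, 1), "ACGA"),
]
ratios = pos_nt_frequency_ratio(toy_records)
```

In this toy input, two of the three sequences carry A at position 4, so `ratios[("USA", date(2021, 4, 1), 4, "A")]` is 2/3. The DVC criteria themselves (for example, Condition 3) and the prediction model are not specified in the abstract and are deliberately left out of the sketch.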