Machine Learning Analysis of Virus based on Transmission Electron Microscopy Images: Application to SARS-CoV-2
Author(s): Y Dabiri, GS Kassab
The goal of this paper was to develop a machine learning (ML) platform for categorizing viruses from transmission electron microscopy (TEM) images. Once the virus family is identified, pathogenesis studies and treatment and vaccine development strategies can proceed more efficiently. We used three pretrained deep learning (DL) models, namely AlexNet, VGG16 and SqueezeNet. The classifier portion of each model was modified and trained on the available virus dataset, which included TEM images from 16 virus families, including the novel coronavirus (SARS-CoV-2). We used 20% of the images (320) for testing the DL models. We also used two unsupervised methods to analyze image clusters: principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE). The results from PCA and t-SNE were visualized using two components. The AlexNet, VGG16 and SqueezeNet models predicted the categories of the test images with accuracies of 77.8±4.5%, 75.3±4.7% and 77.8±4.5%, respectively. The receiver operating characteristic (ROC) curves had an area under the curve (AUC) greater than 0.9. Our PCA and t-SNE results suggested that SARS-CoV-2 is closest to the Influenza family of viruses. Using DL models, TEM images can be classified into virus families. This ML approach may lead to more accurate and faster tools for classifying virus TEM images, which is particularly important in pandemic situations such as the current SARS-CoV-2 crisis.
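The abstract does not say how the ± margins on the test accuracies were obtained. As a rough plausibility check only, the sketch below assumes they are 95% normal-approximation (Wald) binomial intervals on the 320 test images. The function name `accuracy_half_width` and the choice z = 1.96 are illustrative assumptions, not the authors' stated method.

```python
import math


def accuracy_half_width(accuracy, n_test, z=1.96):
    """Half-width of a normal-approximation (Wald) interval for a proportion.

    Assumption: the test predictions are independent Bernoulli trials and
    z = 1.96 corresponds to a 95% interval. The abstract does not confirm this.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be a fraction in [0, 1]")
    if n_test <= 0:
        raise ValueError("n_test must be positive")
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n_test)


N_TEST = 320  # 20% of the images, as stated in the abstract

# Accuracies as reported in the abstract, expressed as fractions.
reported = {"AlexNet": 0.778, "VGG16": 0.753, "SqueezeNet": 0.778}

for name, acc in reported.items():
    half_width = accuracy_half_width(acc, N_TEST)
    print(f"{name}: {100 * acc:.1f}% +/- {100 * half_width:.2f} percentage points")
```

Under these assumptions the computed half-widths fall within about 0.1 percentage point of the reported margins, which would be consistent with a binomial interval on the held-out test set, though it does not establish that this is how the margins were derived.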