AI-Based Image Quality Assessment in CT

Article Information

Lars Edenbrandt1,2,3, Elin Tragardh4,5, and Johannes Ulen6

1Region Vastra Gotaland, Sahlgrenska University Hospital, Department of Clinical Physiology, Gothenburg, Sweden

2Department of Molecular and Clinical Medicine, Institute of Medicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden

3SliceVault AB, Malmo, Sweden

4Department of Clinical Physiology and Nuclear Medicine, Skane University Hospital and Lund University, Malmo, Sweden

5Wallenberg Center for Molecular Medicine, Lund University, Malmo, Sweden

6Eigenvision AB, Malmo, Sweden

*Corresponding author: Lars Edenbrandt, Region Vastra Gotaland, Sahlgrenska University Hospital, Department of Clinical Physiology, Gothenburg, Sweden.

Received: 25 August 2022; Accepted: 06 September 2022; Published: 14 October 2022

Citation: Lars Edenbrandt, Elin Tragardh, and Johannes Ulen. AI-Based Image Quality Assessment in CT. Archives of Clinical and Biomedical Research 6 (2022): 869-872.


Abstract

Medical imaging, especially computed tomography (CT), is becoming increasingly important in research studies and clinical trials, and adequate image quality is essential for reliable results. The aim of this study was to develop an artificial intelligence (AI)-based method for quality assessment of CT studies, regarding both the parts of the body included (i.e., head, chest, abdomen, pelvis) and other image features (i.e., presence of hip prosthesis, intravenous contrast, and oral contrast).

Approach: 1,000 CT studies from eight different publicly available CT databases were retrospectively included. The full dataset was randomly divided into a training set (n = 500), a validation/tuning set (n = 250), and a test set (n = 250). All studies were manually classified by an imaging specialist. A deep neural network was then trained to classify the seven image properties directly.

Results: The classification results on the 250 test CT studies showed accuracies for the anatomical regions and the presence of hip prosthesis in the range 98.4% to 100.0%. The accuracy for intravenous contrast was 89.6% and for oral contrast 82.4%.

Conclusions: We have shown that it is feasible to develop an AI-based method that automatically assesses, with very high accuracy, whether the correct body parts are included in CT scans.

Keywords

Artificial Intelligence; Diagnostic Imaging; Machine Learning; Quality Assessment


Article Details

1. Introduction

Medical imaging, especially computed tomography (CT), is becoming increasingly important in research studies and clinical trials. Large projects and trials may include hundreds or thousands of CT studies, and adequate image quality is essential for reliable results. This is a particular concern in multi-center trials, which often provide detailed imaging guides that must be followed in order to correctly include patients. Problems related to imaging could lead either to exclusion of patients or to erroneous image data being incorporated into study or trial results. A quality check of the images selected for a study is therefore an important process. Today, this is performed manually. Often, the quality check must be performed promptly by the clinical research organization before the patient can be enrolled. In both clinical trials and large retrospective studies, this can be tedious work. The quality check of images could, for example, ensure that:

  1. The correct part of the body is visible in the CT, e.g., the chest in a lung cancer study,
  2. CT artefacts are not present, e.g., hip prostheses causing artefacts which usually prevent a proper analysis of the prostate,
  3. The CT is acquired according to the study protocol regarding the use of intravenous or oral contrast.

Information about a CT study should ideally be described in the DICOM tags. However, experience shows that it is not possible to rely on this information alone. This is especially true in research projects and clinical trials, where important information in the DICOM tags may be deleted in the pseudonymization process. The use of artificial intelligence (AI) to solve clinical problems has been intensely studied in recent years [1]. Deep learning, in particular, has gained attention as a method of obtaining complex information from medical images. AI could potentially be trained to help with image quality assessment and could be a valuable tool for this important, otherwise manual, task. The aim of this study was to develop an AI-based method for quality assessment of CT studies, regarding both the parts of the body included (i.e., head, chest, abdomen, pelvis) and other image features (i.e., presence of hip prosthesis, intravenous contrast, and oral contrast).
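As a simple illustration of why the DICOM header alone cannot be trusted (this is not part of the method developed in this study), the relevant tags could be inspected with pydicom as sketched below; the file path and the fallback handling are placeholders.

```python
# Minimal sketch: read a few DICOM header fields that would, in principle, answer the
# quality questions above. After pseudonymization these tags are often blank or removed.
import pydicom

ds = pydicom.dcmread("study/slice_0001.dcm", stop_before_pixels=True)  # placeholder path

body_part = ds.get("BodyPartExamined", "")   # e.g. "CHEST"; frequently empty
contrast = ds.get("ContrastBolusAgent", "")  # empty when no IV contrast is recorded
protocol = ds.get("ProtocolName", "")        # often stripped during pseudonymization

if not (body_part and contrast and protocol):
    print("Header incomplete - image-based quality assessment needed")
```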

2. Methods

2.1 Patients

We retrospectively included 1,000 CT studies from eight publicly available CT databases (Table 1).

Database | Number of images | References
C4KC-KiTS | 204 | [2–4]
ACRIN 6668 | 203 | [2, 5, 6]
CT Lymph Nodes | 176 | [2, 7–9]
CT-ORG | 117 | [2, 10–12]
NSCLC-Radiomics | 115 | [2, 13, 14]
Task 07 Pancreas | 100 | [15]
Task 03 Liver | 52 | [15]
Anti-PD-1 Immunotherapy Lung | 33 | [2, 16]
Total | 1,000 |

Table 1: The number of images selected from each publicly available database.

Before training the model, the full set of 1,000 CT studies was randomly divided into a training set (n = 500), a validation/tuning set (n = 250), and a test set (n = 250). The test set was reserved for model evaluation.
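A minimal sketch of such a random split is given below; the study identifiers and the fixed random seed are placeholders, not details taken from the study.

```python
# Randomly split 1,000 studies into training (500), validation/tuning (250) and test (250).
import random

study_ids = [f"study_{i:04d}" for i in range(1000)]  # placeholder identifiers
random.seed(42)                                      # any fixed seed makes the split reproducible
random.shuffle(study_ids)

train_ids = study_ids[:500]    # training set
val_ids = study_ids[500:750]   # validation/tuning set
test_ids = study_ids[750:]     # held-out test set, used only for final evaluation
```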

2.2 Manual Classification/Ground Truth Definition

All CT studies were classified by a nuclear medicine specialist experienced in hybrid imaging. Each case was classified based on the presence of the following seven features:

  1. Head: The cranium is at least partly visible. Head is not considered present if only part of the mandible is visible.
  2. Chest: The lungs are visible. Only very minor parts may be missing.
  3. Abdomen: The main parts of the liver, spleen, and kidneys are visible.
  4. Pelvis: The hip bones are visible.
  5. Hip prosthesis: Uni- or bilateral hip prosthesis, including implants for fixation of hip fractures.
  6. Intravenous (IV) contrast: Signs of intravenous contrast, including different phases.
  7. Oral contrast: Signs of oral contrast, including different phases.

An overview of the distribution of the different classes in the dataset is given in Table 2.

Class | Positive count | Negative count
Head | 256 | 744
Chest | 603 | 397
Abdomen | 903 | 97
Pelvis | 702 | 298
Hip prosthesis | 25 | 975
Intravenous contrast | 256 | 744
Oral contrast | 422 | 578

Table 2: Positive and negative examples for each class in the dataset.

2.3 AI Tool

The AI tool consists of a 3D-ResNet [17], a deep neural network designed for classification of 3D images. The network has an input shape of 110 × 110 × 110 × 1 voxels and seven output channels, each with its own sigmoid activation. Each output channel represents one of the classes defined in Section 2.2. Many CT images contain a large amount of air, which is not helpful for classification. To remove air, the images are smoothed using a Gaussian kernel with a standard deviation of 5 mm. An axis-aligned bounding box is then fitted to all voxels with Hounsfield unit (HU) above –800 in the smoothed image, and the original image is cropped to this bounding box. The cropped CT images are pre-processed by clamping the HU values to the range [-1000, 3000] and then normalizing to [-1, 1]. Furthermore, the CT volumes are resized to a resolution of 5 × 6 × 12 mm (or the smallest possible voxel size with the same aspect ratio that makes the full image fit) and placed in the middle of the input volume.
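To make the pre-processing steps concrete, a minimal sketch in Python (NumPy/SciPy) is given below. It assumes the CT volume is available as an array of HU values with known voxel spacing, and it simplifies the final resampling to an isotropic fit inside the 110 × 110 × 110 input rather than the anisotropic 5 × 6 × 12 mm target grid; it is an illustration, not the implementation used in this study.

```python
# Sketch of the pre-processing: crop away air, clamp and normalize HU, fit into the input volume.
import numpy as np
from scipy import ndimage


def preprocess_ct(volume_hu: np.ndarray, spacing_mm: tuple, input_size=(110, 110, 110)):
    """Return a (110, 110, 110, 1) array ready for the classification network."""
    # 1. Smooth with a Gaussian kernel (5 mm standard deviation, converted to voxel units).
    sigma_vox = [5.0 / s for s in spacing_mm]
    smoothed = ndimage.gaussian_filter(volume_hu.astype(np.float32), sigma=sigma_vox)

    # 2. Fit an axis-aligned bounding box to voxels above -800 HU and crop the original image.
    coords = np.argwhere(smoothed > -800)
    lo, hi = coords.min(axis=0), coords.max(axis=0) + 1
    cropped = volume_hu[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]].astype(np.float32)

    # 3. Clamp HU to [-1000, 3000] and normalize linearly to [-1, 1].
    cropped = np.clip(cropped, -1000, 3000)
    cropped = (cropped - 1000.0) / 2000.0

    # 4. Resample so the cropped volume fits the input (simplified to an isotropic zoom)
    #    and place it in the middle of an input-sized volume padded with -1 (air).
    zoom = min(t / s for t, s in zip(input_size, cropped.shape))
    resized = ndimage.zoom(cropped, zoom, order=1)
    out = np.full(input_size, -1.0, dtype=np.float32)
    start = [(t - s) // 2 for t, s in zip(input_size, resized.shape)]
    out[start[0]:start[0] + resized.shape[0],
        start[1]:start[1] + resized.shape[1],
        start[2]:start[2] + resized.shape[2]] = resized
    return out[..., np.newaxis]  # add the channel dimension expected by the network
```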

2.3.1 Sampling: The classes are quite imbalanced, as seen in Table 2. In order to sample uncommon examples more often, each image i is sampled with probability proportional to a weight w_i defined as:

w_i = \sum_{\ell \in L} \left( \frac{y_{i\ell}}{p_\ell} + \frac{1 - y_{i\ell}}{n_\ell} \right)

where L is the set of labels, y_{iℓ} ∈ {0, 1} indicates whether image i is positive for label ℓ, p_ℓ is the total number of positive examples, and n_ℓ the total number of negative examples for label ℓ. p_ℓ and n_ℓ are calculated individually for the training and validation sets.
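Under the reconstruction above, the weights can be computed from a binary label matrix as in the following sketch; the matrix y and its layout are assumptions made for illustration.

```python
# Inverse-frequency sampling weights from a binary label matrix y of shape
# (num_images, num_labels); images carrying rare label values get larger weights.
import numpy as np


def sampling_weights(y: np.ndarray) -> np.ndarray:
    p = y.sum(axis=0)        # positive examples per label
    n = (1 - y).sum(axis=0)  # negative examples per label
    w = (y / p + (1 - y) / n).sum(axis=1)
    return w / w.sum()       # normalize so the weights form a sampling distribution

# Usage: indices = np.random.choice(len(y), size=2000, p=sampling_weights(y))
```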

2.3.2 Training: Binary cross-entropy is used as the loss function, and the network is optimized using the Adam optimizer [18] with Nesterov momentum and an initial learning rate of 1 × 10−5. Each training epoch consists of 2,000 samples and each validation epoch of 400 samples. If the validation loss has not improved for 10 epochs, the learning rate is halved, down to a minimum value of 1 × 10−8. Training stops when the validation loss has not improved for 20 epochs. During training, the images are augmented using rotations (−0.1 to 0.1 radians), scaling (−10 to +10%), and intensity shifts (−100 to +100 HU).
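As a rough illustration (not the code used in this study), this schedule maps onto standard Keras components as sketched below; the small placeholder network and the dummy generator stand in for the 3D-ResNet and the weighted sampler, and augmentation is omitted.

```python
# Hedged sketch of the training schedule with Keras; model and data are placeholders.
import numpy as np
import tensorflow as tf

# Placeholder stand-in for the 3D-ResNet of ref [17].
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(110, 110, 110, 1)),
    tf.keras.layers.Conv3D(8, 3, strides=2, activation="relu"),
    tf.keras.layers.GlobalAveragePooling3D(),
    tf.keras.layers.Dense(7, activation="sigmoid"),   # one sigmoid output per class
])

model.compile(
    optimizer=tf.keras.optimizers.Nadam(learning_rate=1e-5),  # Adam with Nesterov momentum
    loss="binary_crossentropy",
)

callbacks = [
    # Halve the learning rate after 10 epochs without validation improvement, down to 1e-8.
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=10, min_lr=1e-8),
    # Stop training after 20 epochs without validation improvement.
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=20, restore_best_weights=True),
]

def dummy_generator(batch_size=1):
    """Placeholder for the weighted sampler; yields blank volumes and all-negative labels."""
    while True:
        yield (np.zeros((batch_size, 110, 110, 110, 1), np.float32),
               np.zeros((batch_size, 7), np.float32))

model.fit(
    dummy_generator(),
    steps_per_epoch=2000,              # 2,000 training samples per epoch
    validation_data=dummy_generator(),
    validation_steps=400,              # 400 validation samples per epoch
    epochs=1000,                       # effectively "until early stopping"
    callbacks=callbacks,
)
```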

3. Results

The classification results on the test set are presented in Table 3. The accuracies for the anatomical regions and the presence of hip prosthesis ranged from 98.4% to 100.0%. The accuracy for intravenous contrast was 89.6% and for oral contrast 82.4%. Figures 1 and 2 show patient examples with correct and incorrect classifications.

Classification task | TP | TN | FP | FN | Accuracy
Head | 70 | 176 | 0 | 4 | 98.40%
Chest | 146 | 104 | 0 | 0 | 100.00%
Abdomen | 222 | 24 | 0 | 4 | 98.40%
Pelvis | 167 | 80 | 0 | 3 | 98.80%
Hip prosthesis | 6 | 244 | 0 | 0 | 100.00%
Intravenous contrast | 149 | 75 | 13 | 13 | 89.60%
Oral contrast | 71 | 135 | 15 | 29 | 82.40%

Table 3: Results for the 250 test CT studies. True positive (TP), true negative (TN), false positive (FP), false negative (FN), and accuracy.
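As a worked example of how the accuracies in Table 3 follow from the confusion counts, the snippet below recomputes two of the seven tasks from the reported test-set numbers.

```python
# Accuracy = (TP + TN) / (TP + TN + FP + FN); the counts repeat two rows of Table 3.
counts = {
    "Head":                 {"TP": 70,  "TN": 176, "FP": 0,  "FN": 4},
    "Intravenous contrast": {"TP": 149, "TN": 75,  "FP": 13, "FN": 13},
}

for task, c in counts.items():
    accuracy = (c["TP"] + c["TN"]) / (c["TP"] + c["TN"] + c["FP"] + c["FN"])
    print(f"{task}: {accuracy:.1%}")  # Head: 98.4%, Intravenous contrast: 89.6%
```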


Figure 1: Example of a correctly classified image from ref [5]. Head, chest, abdomen, pelvis, hip prosthesis, and oral contrast were present. Intravenous contrast was not present.

4. Discussion

In this study we have shown that it is feasible to develop an AI-based tool that automatically checks, with very high accuracy, that the correct body parts are visible in CT studies. The AI-based method was also able to accurately detect hip prostheses even though the number of positive cases in the training and validation sets was limited (n = 19).

A limitation of this study is that the AI-based tool only classified the presence or absence of contrast. Many different phases of contrast enhancement exist [19], typically an early arterial phase (15–25 s post injection), a late arterial phase (30–40 s post injection), a hepatic or late portal venous phase (70–90 s post injection), a nephrogenic phase (85–120 s post injection), and an excretory or delayed phase (5–10 min post injection). No clearly defined times post injection of the contrast agent exist, but with a large number of images with different contrast phases in the training group, it would probably be possible to train an AI method to categorize the contrast phase in more detail than we did in this study. Other potential problems related to intravenous contrast are the different amounts of contrast agent administered, for example reduced doses in patients with kidney disease. Problems with oral contrast for this type of task include different timings of contrast administration as well as different contrast agents (for example barium- or iodine-based agents). A more comprehensive classification of contrast would most likely require a much larger training set. Some of the false negative cases in our test set represented very late intravenous phases with low contrast in the aorta but contrast in the kidneys or urinary bladder (Figure 2). This type of case was not common in the training set. The appearance of oral contrast on CT also showed substantial variation: in most cases contrast was clearly visible in the small bowel, whereas in other cases only the stomach or colon showed contrast.

Medical imaging is often a key asset in clinical trials, as it can provide efficacy evaluation and safety monitoring [20]. It is also often used to screen for eligible patients to include. Medical imaging can also improve clinical trial efficiency and reduce the time to complete a specific trial by offering imaging biomarkers that can act as surrogate endpoints. To do so, good image quality is crucial, and it is therefore necessary to monitor image quality throughout the different stages of a study. One step in the quality assessment could be to determine whether the correct body parts are included and whether the images contain contrast agent. Further development of automated image quality assessment could also include image properties such as noise level and patient motion. Evaluation by a human observer is both time-consuming and subjective; AI-based tools could help minimize both issues.


Figure 2: Example of a misclassified image from ref [5]. An excretory intravenous contrast phase is present but was not detected by the AI tool. All other classifications were correct.

5. Conclusions

We have shown that it is feasible to develop an AI-based method that automatically assesses, with very high accuracy, whether the correct body parts are included in CT scans.

Acknowledgment

We would like to thank Måns Larsson and Olof Enqvist for fruitful discussions regarding this study and closely related topics.

References

  1. Jarrett D, Stride E, Vallis K, et al. Applications and limitations of machine learning in radiation oncology. The British Journal of Radiology 92 (2019): 20190001.
  2. Clark KW, Vendt BA, Smith KE, et al. The Cancer Imaging Archive (TCIA): Maintaining and operating a public information repository. Journal of Digital Imaging 26 (2013): 1045-1057.
  3. Heller N, Sathianathen N, Kalapara A, et al. C4KC KiTS challenge kidney tumor segmentation dataset (2019).
  4. Heller N, Isensee F, Maier-Hein KH, et al. The state of the art in kidney and kidney tumor segmentation in contrast-enhanced CT imaging: Results of the KiTS19 challenge. Medical Image Analysis 67 (2021): 101821.
  5. Kinahan P, Muzi M, Bialecki B, et al. Data from the ACRIN 6668 trial NSCLC-FDG-PET (2019).
  6. Machtay M, Duan F, Siegel BA, et al. Prediction of survival by [18F]fluorodeoxyglucose positron emission tomography in patients with locally advanced non-small-cell lung cancer undergoing definitive chemoradiation therapy: results of the ACRIN 6668/RTOG 0235 trial. Journal of Clinical Oncology 31 (2013): 3823.
  7. Roth H, Lu L, Seff A, et al. A new 2.5D representation for lymph node detection in CT (2015).
  8. Roth HR, Lu L, Seff A, et al. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer (2014): 520-527.
  9. Seff A, Lu L, Cherry KM, et al. 2D view aggregation for lymph node detection using a shallow hierarchy of linear classifiers. International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer (2014): 544-552.
  10. Rister B, Shivakumar K, Nobashi T, et al. CT-ORG: A dataset of CT volumes with multiple organ segmentations (2019).
  11. Rister B, Yi D, Shivakumar K, et al. CT organ segmentation using GPU data augmentation, unsupervised labels and IoU loss (2018).
  12. Bilic P, Christ PF, Vorontsov E, et al. The liver tumor segmentation benchmark (LiTS). CoRR abs/1901.04056 (2019).
  13. Rister B, Shivakumar K, Nobashi T, et al. CT-ORG: A dataset of CT volumes with multiple organ segmentations (2019).
  14. Aerts HJWL, Velazquez ER, Leijenaar RTH, et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nature Communications (2014).
  15. Antonelli M, Reinke A, Bakas S, et al. The Medical Segmentation Decathlon (2021).
  16. Patnana M, Patel S, Tsao AS. Data from Anti-PD-1 Immunotherapy Lung (2019).
  17. Yang C, Rangarajan A, Ranka S. Visual explanations from deep 3D convolutional neural networks for Alzheimer's disease classification. CoRR abs/1803.02544 (2018).
  18. Kingma DP, Ba J. Adam: A method for stochastic optimization (2014).
  19. Fleischmann D, Kamaya A. Optimal vascular and parenchymal contrast enhancement: The current state of the art. Radiologic Clinics of North America 47 (2009): 13-26.
  20. Murphy P. Imaging in clinical trials. Cancer Imaging 10 (2010): S74-S82.
