Open Access 2024 | Original Paper | Book Chapter

Computed Tomography Artefact Detection Using Deep Learning—Towards Automated Quality Assurance

Authors: S. I. Inkinen, A. O. Kotiaho, M. Hanni, M. T. Nieminen, M. A. K. Brix

Published in: Digital Health and Wireless Solutions

Publisher: Springer Nature Switzerland


Abstract

Image artefacts in computed tomography (CT) limit the diagnostic quality of the images. The objective of this proof-of-concept study was to apply deep learning (DL) for automated CT artefact classification. Openly available head CT data from Johns Hopkins University were used. Three common artefacts (patient movement, beam hardening, and ring artefacts (RAs)) and artefact-free images were simulated using 2D axial slices. Simulated data were split into a training set (Ntrain = 1040 × 4 (4160)), two validation sets (Nval1 = 130 × 4 (520) and Nval2 = 130 × 4 (520)), and a separate test set (Ntest = 201 × 4 (804); two individual subjects). The VGG-16 architecture was used as the DL classifier, and the Grad-CAM approach was used to produce attention maps. Model performance was evaluated using accuracy, average precision, area under the receiver operating characteristic (ROC) curve, precision, recall, and F1-score. A sensitivity analysis was performed on two test-set slice images in which RAs of different radii (4 to 245 pixels) and movement artefacts, i.e., head tilts with rotation angles from 0.2° to 3°, were generated. Artefact classification performance was excellent on the test set: accuracy, average precision, and ROC area under the curve over all classes were 0.91, 0.86, and 0.99, respectively. The class-wise precision, recall, and F1-scores were at least 0.84, 0.71, and 0.85, respectively. The sensitivity analysis revealed that the model detected movement at all rotation angles, yet it failed to detect the smallest RAs (4-pixel radius). DL can be used for effective detection of CT artefacts. In the future, DL could be applied for automated quality assurance of clinical CT.

1 Introduction

Image artefacts encountered in computed tomography (CT) limit the diagnostic quality of the images and may result in a re-scan of the patient. CT artefacts can be caused by the patient, by CT physics, or by malfunctioning hardware [1]. Patient-based artefacts are mostly due to movement during acquisition. These artefacts cause, for example, blurring and distortions in the reconstructed images. They can, however, be mitigated using different motion correction algorithms [2].
The most common physics-based artefact is the beam hardening artefact, in which the low-energy photons of the polychromatic x-ray beam are attenuated the most in the patient, increasing the mean energy of the x-ray spectrum. Another common artefact type is the metal artefact, which results from beam hardening and photon starvation and causes streaking and cupping artefacts in the CT images. These artefacts can be alleviated by increasing the x-ray tube peak kilovoltage, using stronger beam filtration, and applying beam hardening and metal artefact reduction algorithms [3].
Common hardware-based artefacts are ring artefacts, tube arcing, and artefacts from air bubbles in the oil coolant of the tube [1, 4]. Ring artefacts (RAs) can be caused, for example, by dead pixels in the x-ray detector or by miscalibration. Artefacts arising from a gas bubble in the oil coolant of the tube are difficult to detect [5]. Hardware-based artefacts usually require maintenance service. However, RAs may be corrected using, e.g., interpolation methods, filtering approaches, or flat-field recalibration [6].
Several automated detection approaches have been proposed for CT image quality assessment [7, 8]. However, these studies do not focus on artefact detection but on technical image quality assessment directly from clinical patient images. For example, Smith et al. (2017) developed a method to automatically estimate the detectability index, noise power spectrum, and modulation transfer function directly from patient CT images [7]. On the machine learning front, deep learning (DL) convolutional neural networks have gained great interest in image processing [9]. For CT image quality assessment in particular, a DL method was recently developed for the quality assessment of deformable image registration from lung CT images [10]. In another recent study, focusing on MRI, a fast and automated DL method was developed for assessing the re-scan need in motion-corrupted brain series [11].
Even though sophisticated artefact correction algorithms exist, modern CT scanners may produce image artefacts that limit diagnostic quality and, in the worst case, require a re-scan. Therefore, imaging technologists must carefully review the CT reconstructions after image acquisition. An automated artefact detection tool could optimize this process. Additionally, artefact detection could be used to monitor CT artefact prevalence as a performance management tool in hospitals. Despite the recent developments in CT image quality assessment [12–14], to the authors' knowledge, only a few studies have focused on CT artefact detection using DL [15, 16]. In the work by Madesta et al., DL was applied to the detection of simulated respiration-related patient motion artefacts in radiotherapy 4D CT imaging using a 3D patch-based approach, and to the subsequent correction of the artefacts using DL-based inpainting [15]. In the work by Prakash and Dutta, detector-related artefacts were simulated in projection data, and a DL approach was employed to detect streaks, rings, and bands [16]. In this study, a VGG-16 CNN architecture is applied to image artefact detection directly from clinical head CT patient images. The hypothesis is that DL can be utilized in the automated assessment of clinical image artefacts.

2 Materials and Methods

Data:
In this study, the openly available Head CT scan dataset from the Johns Hopkins University Data Archive was used [17]. The dataset consists of non-contrast head CT scans of 35 subjects. The dataset had to be curated, as unwanted metal artefacts were present in some subjects' dental regions; those slices were manually excluded from the image stacks. After metal artefact exclusion, the whole dataset consisted of 1501 slice images (1300/201 training phase/test phase).
Three different artefacts were simulated in the dataset using internally developed algorithms: ring artefact, beam hardening artefact, and movement artefact (Fig. 1). The simulation was performed for each slice image in 2D, and all three artefacts were simulated for each slice. First, the slice image was segmented into {air, adipose, water, brain, skull} material regions based on HU thresholding of the individual pixel values {air ≤ −100 HU, adipose ≤ −30 HU, water ≤ 20 HU, brain < 1000 HU, skull ≥ 1000 HU}. The energy-dependent linear attenuation coefficients (µ) for each segmented material (x) were extracted from an attenuation table obtained from the XCAT virtual phantom software package [18]. The attenuation process was modeled in discrete form as:
$$I_{k}=\sum_{E=0}^{E_{max}} I_{0,E}\,\exp\left(-\sum_{j=1}^{N} a_{k,j}\sum_{m=1}^{M}\mu_{E,m}\,x_{j,m}\right),$$
(1)
where Ik is the transmitted x-ray intensity at sinogram index k, I0,E is the input spectrum, ak,j is the element of the forward projection matrix at row k and column j, and xj,m is the thickness of the mth material at pixel j. Emax is the maximum peak kilovoltage of the simulated x-ray tube spectrum. The 120 kVp and 70 kVp input spectra were simulated using the SPEKTR toolbox [19] in Matlab (2018b/v9.5.0, Mathworks Inc., Natick, MA, USA) (Fig. 1). The 120 kVp spectrum was applied for artefact-free projection data simulation and for all artefacts except beam hardening. A parallel-beam projection geometry with 180 projections at 1-degree intervals was applied in the simulations, and Poisson noise was added to the projection data. The projection data were flat-field corrected before filtered back projection (FBP) reconstruction with Ram-Lak filtering. Reconstructions were computed using the Astra toolbox (v.1.9.0.dev11) [20, 21]. The simulations and reconstructions were conducted in Python (v. 3.7).
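As a concrete illustration, the following is a minimal sketch of the polychromatic forward model in Eq. (1), assuming a precomputed material segmentation, a per-material attenuation table, and the Astra toolbox for the parallel-beam geometry. The function and variable names (`polychromatic_sinogram`, `mu_table`, `spectrum`) are illustrative, not the authors' actual implementation.

```python
import numpy as np
import astra

def polychromatic_sinogram(material_maps, mu_table, spectrum,
                           n_angles=180, det_count=512):
    """material_maps: (M, H, W) binary masks {air, adipose, water, brain, skull}.
    mu_table: (E, M) linear attenuation coefficients per energy bin and material.
    spectrum: (E,) photon counts I_0,E of the simulated tube spectrum."""
    H, W = material_maps.shape[1:]
    vol_geom = astra.create_vol_geom(H, W)
    angles = np.deg2rad(np.arange(n_angles))         # 180 projections, 1-degree steps
    proj_geom = astra.create_proj_geom('parallel', 1.0, det_count, angles)
    proj_id = astra.create_projector('linear', proj_geom, vol_geom)

    intensity = np.zeros((n_angles, det_count))
    for E, I0 in enumerate(spectrum):                # outer sum over energy bins
        if I0 == 0:
            continue
        # Line integrals of mu at this energy: sum_m mu_{E,m} * x_{j,m}
        mu_image = np.tensordot(mu_table[E], material_maps, axes=1)
        sino_id, line_integrals = astra.create_sino(mu_image, proj_id)
        astra.data2d.delete(sino_id)
        intensity += I0 * np.exp(-line_integrals)

    intensity = np.random.poisson(intensity)         # Poisson noise in projection data
    flat = spectrum.sum()                            # flat-field (air scan) intensity
    return -np.log(np.maximum(intensity, 1) / flat)  # flat-field corrected sinogram
```

Swapping the 120 kVp spectrum for the 70 kVp one in this sketch reproduces the beam hardening setup described below; the returned sinogram is then reconstructed with Ram-Lak-filtered FBP.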
The artefact-free projections and reconstructions were computed as explained above, and the artefacts were generated as follows:
Ring Artefact: RAs were inserted into the artefact-free FBP slice images as an image post-processing step. The images were chosen to contain either a single RA or several (1 to 20) RAs. A single RA corresponds to a situation where only one detector element is malfunctioning, whereas several RAs mimic a scenario of detector miscalibration or malfunctioning of multiple detector elements. When a single RA was simulated, a circular mask with a radius in the range of 4 to 245 pixels was generated, a value was drawn from a uniform distribution (−1000 to 2000), and the mask was added to the reconstructed image. For several RAs, isocenter distances from 4 to 245 pixels were simulated, and the RA values in the multiple-RA case were drawn from a Gaussian distribution with a mean equal to the image pixel values (excluding values < 0) and a standard deviation of 100. The RAs' thicknesses varied randomly from one to three pixels.
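A hedged sketch of the single-RA insertion described above follows: a one-to-three pixel wide circular band at a random isocenter distance, with an intensity drawn from a uniform distribution, is added to the artefact-free FBP reconstruction. The helper name `add_single_ring` is illustrative.

```python
import numpy as np

def add_single_ring(recon, rng=np.random.default_rng()):
    H, W = recon.shape
    cy, cx = H / 2, W / 2                      # RAs are centred on the isocenter
    yy, xx = np.mgrid[0:H, 0:W]
    dist = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2)

    radius = rng.integers(4, 246)              # radius of 4 to 245 pixels
    thickness = rng.integers(1, 4)             # 1 to 3 pixels thick
    value = rng.uniform(-1000, 2000)           # ring intensity offset

    mask = (dist >= radius) & (dist < radius + thickness)
    out = recon.copy()
    out[mask] += value
    return out
```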
Beam Hardening Artefact: A 70 kVp spectrum was used instead of the 120 kVp spectrum in the beam hardening simulation, as it produced the typical cupping artefact (Fig. 1c).
Movement Artefact: The movement artefact was simulated as a head-tilt rotation. The axial rotation axis was set at the center of the left edge of the slice image (Fig. 1d). The total rotation (R) was drawn from a uniform distribution [−20°, +20°], and 180 evenly spaced tilt angles were generated from 0.1° to R. For each tilt angle, the reconstructed image was rotated and forward projected, and a single projection of the rotated sample was stored in the sinogram. Repeating this for all 180 angles filled the sinogram of the moving target.
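The following is a minimal sketch of the head-tilt simulation under stated assumptions: the slice is rotated a little further before each projection angle, and only that angle's projection of the rotated slice is stored. SciPy's `rotate` (about the image center) stands in for the authors' rotation about the left-edge axis, and `forward_project` is an assumed helper wrapping the parallel-beam projector.

```python
import numpy as np
from scipy.ndimage import rotate

def movement_sinogram(recon, forward_project, rng=np.random.default_rng()):
    """forward_project(image) -> (180, det_count) parallel-beam sinogram."""
    R = rng.uniform(-20, 20)                   # total head tilt in degrees
    tilts = np.linspace(0.1, R, 180)           # one tilt per projection angle

    sino = np.zeros_like(forward_project(recon))
    for i, tilt in enumerate(tilts):
        moved = rotate(recon, tilt, reshape=False, order=1)
        sino[i] = forward_project(moved)[i]    # store only projection i
    return sino
```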
Deep Learning Model for CT Artefact Classification:
The PyTorch (v1.8.1) framework was utilized, and a VGG-16 convolutional neural network architecture pre-trained on the ImageNet dataset was applied for artefact classification [22]. The last fully connected layer was modified to produce four outputs. All model layers were trained during transfer learning for artefact detection. The reconstructed images were processed to a size of 1 × 512 × 512 and, during training, augmented by random cropping to 1 × 450 × 450 to obtain variability. Data augmentation was performed during the training phase before feeding the input data to the network. Subsequently, the images were resized to the required input size of 3 × 244 × 244 and fed to the pre-trained VGG-16 network.
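A sketch of this transfer-learning setup, assuming torchvision's ImageNet-pretrained VGG-16 is the model referenced in [22], is shown below; the transform chain mirrors the crop/resize augmentation described above.

```python
import torch.nn as nn
from torchvision import models, transforms

model = models.vgg16(pretrained=True)
# Replace the last fully connected layer with a four-class head:
# {artefact-free, ring artefact, movement artefact, beam hardening}
model.classifier[6] = nn.Linear(4096, 4)

train_transform = transforms.Compose([
    transforms.RandomCrop(450),                       # random 450 x 450 crop
    transforms.Resize((244, 244)),                    # resize to the network input size
    transforms.Lambda(lambda x: x.repeat(3, 1, 1)),   # 1-channel slice -> 3 channels
])
```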
The total dataset (NTotal = 1501 × 4 (6004)) was randomly split into a training set (Ntrain = 1040 × 4 (4160)), two validation sets (Nval1 = 130 × 4 (520) and Nval2 = 130 × 4 (520)), and a separate test set (Ntest = 201 × 4 (804); two individual subjects). In this notation, four refers to the {artefact-free, ring artefact, movement artefact, beam hardening} images simulated from one slice image. This yielded a data split of 76.5% for training, 13.2% for validation, and 10% for testing. The maximum number of epochs was set to 30. To avoid overfitting, early stopping with a patience of six epochs based on the validation loss was applied. The model training proceeded as follows: Ntrain was used for training, and Nval1 was used for the early stopping evaluation. The model was then evaluated on the Nval2 set, at which step hyperparameter tuning was applied.
After selecting the best performing hyperparameter combination, the final model was trained on Ntrain + Nval2, with Nval1 used for early stopping. Final evaluations were performed on the separate test set. Hyperparameter tuning covered learning rates (LRs) {1e−4, 1e−5, 1e−6}, batch sizes {8, 16}, and weight decays {1e−1, 1e−2, 1e−3}, yielding 18 possible combinations. The best performing combination, LR = 1e−5, batch size = 8, and weight decay = 0.01, was used in the final training with the ADAM optimizer for 15 epochs. Cross-entropy was used as the loss function.
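The training procedure can be condensed into the following sketch: Adam with the best-performing hyperparameters reported above, cross-entropy loss, and early stopping with a patience of six epochs on the validation loss. The data loaders (`train_loader`, `val_loader`) are assumed to exist elsewhere.

```python
import copy
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, weight_decay=0.01)
criterion = torch.nn.CrossEntropyLoss()

best_loss, patience, wait = float('inf'), 6, 0
for epoch in range(30):                               # maximum of 30 epochs
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():                             # early-stopping check (Nval1)
        val_loss = sum(criterion(model(x), y).item() for x, y in val_loader)
    if val_loss < best_loss:
        best_loss, wait = val_loss, 0
        best_state = copy.deepcopy(model.state_dict())
    else:
        wait += 1
        if wait >= patience:
            break

model.load_state_dict(best_state)                     # keep the best checkpoint
```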
Validation of Artefact Detection Framework:
For model performance validation, the area under the receiver operating characteristic curve (ROC-AUC), average precision, accuracy, precision, recall, and F1-score (2 × (precision × recall)/(precision + recall)) were determined. A Gradient-weighted Class Activation Mapping (Grad-CAM) approach was applied to provide visual explanations highlighting the regions important for the VGG-16 model decisions [23]. Finally, to assess the limits of reliability of the trained model (i.e., to test the detection limits for less prominent artefacts), an additional sensitivity analysis was performed on two slice images taken from the test dataset for the RA and movement artefact classes: (1) bright and dark RAs with radii increasing from 4 to 245 pixels in 10 evenly spaced steps were generated and fed to the trained model to assess how small an RA radius the model is capable of detecting; (2) movement from 0.2° to 3° in 10 evenly spaced steps was simulated in the slice images.
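As a sketch of the evaluation, the metrics above map directly onto scikit-learn, assuming `y_true` (integer class labels), `y_score` (softmax probabilities, shape N × 4), and `y_pred = y_score.argmax(1)` collected from the test set. A bare-bones Grad-CAM using hooks on the last convolutional layer of VGG-16 is also shown; a dedicated Grad-CAM library could be used instead.

```python
import torch.nn.functional as F
from sklearn.metrics import (accuracy_score, average_precision_score,
                             precision_recall_fscore_support, roc_auc_score)
from sklearn.preprocessing import label_binarize

y_onehot = label_binarize(y_true, classes=[0, 1, 2, 3])
accuracy = accuracy_score(y_true, y_pred)
roc_auc = roc_auc_score(y_onehot, y_score, average='macro')            # overall ROC-AUC
avg_prec = average_precision_score(y_onehot, y_score, average='macro')
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred)

# Grad-CAM: hook the last conv layer of the VGG-16 feature extractor.
acts, grads = {}, {}
layer = model.features[28]
layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

def grad_cam(image, target_class):
    model.zero_grad()
    model(image.unsqueeze(0))[0, target_class].backward()
    weights = grads['g'].mean(dim=(2, 3), keepdim=True)   # global-average-pooled grads
    cam = F.relu((weights * acts['a']).sum(dim=1))        # weighted activation map
    return F.interpolate(cam.unsqueeze(0), size=image.shape[-2:],
                         mode='bilinear', align_corners=False)[0, 0]
```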

3 Results

The model performance on the test set was excellent (Table 1). The poorest precision was observed for the movement and beam hardening artefact classes. The overall ROC-AUC, average precision, and accuracy were 0.99, 0.86, and 0.91, respectively (Fig. 2). When the ROC-AUC was evaluated class-wise, the poorest performance was found for the beam hardening artefact (area = 0.97) (Fig. 2).
Table 1. Precision, recall and F1-score values for simulated artefact-free and different image artefact types.

Image Type                Precision   Recall   F1-score
No artefact               0.99        0.71     0.85
Ring artefact             0.99        0.99     0.99
Beam hardening artefact   0.85        0.91     0.88
Motion artefact           0.84        0.99     0.92
The Grad-CAM visualizations showed that for all image classes, the model attention was focused on bony regions (Fig. 3). In the simulated artefact-free image, the model attention highlights the whole head region, whereas for RA images, the attention is intensified on the RA. For movement artefacts, the attention is focused on the blurring artefact as well as on the field-of-view edge, which exhibits a halo artefact (Fig. 3). For the beam hardening artefact, the attention map focuses on the uniform brain region as well as on the bony regions (Fig. 3c).
The qualitative sensitivity analysis revealed that the model classified all movement artefacts correctly. However, this was due to the visible edge in the field of view, which is also highlighted in the attention maps (Fig. 4). The model was incapable of detecting small RAs with a radius of 4 pixels (Fig. 5). Also, a bright RA near the bony regions was misclassified even though the model attention highlights the RA region (Fig. 5e).

4 Discussion

Data-driven algorithms and especially convolutional neural networks have shown their applicability in various medical image processing and analysis tasks. This simulation study demonstrated that deep learning can be utilized as an effective classification and detection tool for CT image artefacts. Furthermore, the attention maps created using the Grad-CAM approach provided visual assessment for model performance evaluation.
In image quality assurance tasks, recent advances have been made in automating the analysis of technical image quality parameters directly from patient images [7, 8]. The approach developed in this study could be incorporated as a monitoring or artefact indicator tool to support imaging technologists who monitor the patient during the clinical imaging process; DL could support this monitoring as a post-analysis step. Although the majority of these CT image artefacts can be differentiated by visual observation, a DL-based automated process may speed up the checking of images from large datasets after acquisition, thus enabling faster maintenance action. This approach could also have applications not only in artefact detection but also in assessing the artefact prevalence of CT scanners. The developed method has the potential to improve quality assessment and decrease recall rates. Further, DL-based quality assurance could be used to identify protocols that require further optimization or scanners that require maintenance. To illustrate, if the classification tool frequently detects movement artefacts in routine use of a specific protocol, the rotation time could be decreased, the pitch factor increased to reduce scan times, or the imaging technologists could be trained to better advise and guide the patients not to move during the scan.
Although the overall model performance was excellent, there were misclassifications in the test data. Therefore, an additional sensitivity analysis was performed to assess the limits of the model. This analysis, combined with the Grad-CAM visualization, revealed that small RAs were left undetected, as they were difficult to distinguish from anatomical structures, and that bright RAs near the skull were also missed by the model. The model performance may be further improved by introducing more of these misclassified cases into the learning process. In the movement artefact simulations, the model attention focused on the edge of the field-of-view region, which is not ideal, as out-of-field artefacts may be pre-corrected in clinical images.
This study has the following limitations. Only a simulated artefact dataset from the head region was utilized, as a comprehensive collection of clinical images with various artefacts was not available. This was because some of the artefacts are not commonly encountered in clinical practice; for illustration, RAs are usually monitored by radiographers using quality assurance phantoms. In addition, only one openly available head CT dataset was utilized, and it contained images only from the head region. However, CT artefacts and artefact types differ depending on which body region is imaged (e.g., in the thorax region, respiration-related motion artefacts are more common). Therefore, future studies should include datasets from other body regions; this was considered out of scope for this proof-of-concept study. One way to produce a non-simulated dataset with various artefacts across the whole body would be to scan anthropomorphic phantoms with artefacts. Imaging phantoms enable a controlled and systematic process for artefact generation with different CT imaging protocols. For example, movement artefacts could easily be produced by introducing movement, e.g., with a linear actuator. X-ray detector-related artefacts are more difficult to generate, but, for example, contrast material contamination on the detector cover or mylar window produces RAs. In this proof-of-concept study, only a pre-trained VGG-16 neural network architecture was utilized, as it performed very well initially. In future studies, other more sophisticated network architectures should be investigated in combination with real measured (i.e., non-simulated) artefact datasets.
Moreover, the Grad-CAM heatmaps generated from the trained VGG-16 model highlighted the artefact regions well. However, other architectures should be examined in more detail in future studies. In addition, the verification of model performance with clinical data needs to be addressed. Furthermore, the clinical data should cover a variety of CT scanner vendors and protocols to increase the variability in noise texture [14], image resolution, and contrast. Finally, exposing the model to different x-ray energy spectra and collimations may pose challenges with real patient data, as the contrast and overall image quality appearance would differ.

5 Conclusion

In summary, a classification pipeline for CT image artefact detection was developed using the VGG-16 convolutional neural network architecture. The model performance was excellent on a simulated dataset, and the developed method shows promise for medical image quality assurance as an integrated part of the routine diagnostic workflow. Before clinical use, however, the results need to be verified with a clinical dataset containing various artefacts and different CT scanners from different vendors.

Acknowledgments

This study was supported by the Academy of Finland (Project no. 316899).

Disclosure of Interests

The authors have no relevant conflicts of interest to disclose.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
References
4. Törmänen, J., Rautiainen, J., Tahvonen, P., Leinonen, K., Nieminen, M.T., Tervonen, O.: The 'Air in the CT X-ray tube oil' artifact—examples of the quality control images and the evaluation of four potential clinical patients' head computed tomography cases. J. Comput. Assist. Tomogr. 41(3), 489–493 (2017). https://doi.org/10.1097/RCT.0000000000000532
10. Galib, S.M., Lee, H.K., Guy, C.L., Riblett, M.J., Hugo, G.D.: A fast and scalable method for quality assurance of deformable image registration on lung CT scans using convolutional neural networks. Med. Phys. 47(1), 99–109 (2020). https://doi.org/10.1002/mp.13890
14.
16. Prakash, P., Dutta, S.: Deep learning-based artifact detection for diagnostic CT images. In: Bosmans, H., Chen, G.-H., Gilat Schmidt, T. (eds.) Medical Imaging 2019: Physics of Medical Imaging, p. 158. SPIE (2019). https://doi.org/10.1117/12.2511766
22. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: 3rd International Conference on Learning Representations (ICLR 2015), Conference Track Proceedings, pp. 1–14 (2014)
Metadata
Title: Computed Tomography Artefact Detection Using Deep Learning—Towards Automated Quality Assurance
Authors: S. I. Inkinen, A. O. Kotiaho, M. Hanni, M. T. Nieminen, M. A. K. Brix
Copyright year: 2024
DOI: https://doi.org/10.1007/978-3-031-59091-7_2
