Diagnostic effectiveness of deep learning-based MRI in predicting multiple sclerosis: A meta-analysis

Tareef S. Daqqaq; Ayman S. Alhasan; Hadeel A. Ghunaim

doi:10.17712/nsj.2024.2.20230103

Abstract

Objectives: Multiple sclerosis (MS) is an inflammatory disease that can affect the brain and spinal cord, which together constitute the central nervous system (CNS). Convolutional neural networks (CNNs), a machine learning method, can detect lesions early by learning patterns in brain magnetic resonance images (MRI). We performed this study to investigate the diagnostic performance of CNN-based MRI in the identification, classification, and segmentation of MS lesions.

Methods: PubMed, Web of Science, Embase, the Cochrane Library, CINAHL, and Google Scholar were searched for papers reporting the use of CNN-based MRI in MS diagnosis. Accuracy, sensitivity, specificity, and the Dice Similarity Coefficient (DSC) were evaluated.

Results: In total, 2174 studies were identified, of which only 15 met the inclusion criteria. The 2D-3D CNN showed high accuracy (98.81, 95% CI: 98.50–99.13), sensitivity (98.76, 95% CI: 98.42–99.10), and specificity (98.67, 95% CI: 98.22–99.12) in the identification of MS lesions. For classification, the overall accuracy was high (91.38, 95% CI: 83.23–99.54). A DSC of 63.78 (95% CI: 58.29–69.27) indicated that 2D-3D CNN-based MRI performed well in the segmentation of MS lesions. Sensitivity analysis showed that the results were consistent, indicating that the findings are robust.

Conclusion: This meta-analysis revealed that 2D-3D CNN-based MRI is an automated system with high diagnostic performance that can promptly and effectively predict the disease.

Multiple sclerosis (MS) is an inflammatory neurological condition that affects the central nervous system (CNS). It causes inflammation and demyelination, particularly in the white matter of the brain, which can slow down or block messages between the brain and the body. The World Health Organization (WHO) reported that MS affects 2.8 million people globally and that its prevalence is increasing every year.¹ Most MS patients live in North America, Europe, and Australia.² Typical MS symptoms include fatigue, difficulty walking, stiffness, weakness, vision problems, vertigo, cognitive changes, emotional changes, and sadness, among others.^3,4 To date, the etiology of MS remains unclear.⁵ MS is thought to be caused by a combination of hereditary and environmental factors.⁶ Environmental factors that may be related to MS include geographical location, vitamin D insufficiency, obesity, and smoking. Early detection of MS offers long-term benefits and could help researchers identify the best clinical strategy for managing the condition.⁷ Currently, no single sign, physical finding, or laboratory test can by itself establish a diagnosis of MS. Instead, several methods are used to assess whether a patient meets the recognized diagnostic criteria for MS and to rule out other potential causes of the presenting symptoms. These include a thorough medical history, a neurological examination, and various diagnostic tests such as magnetic resonance imaging (MRI), cerebrospinal fluid analysis, and blood tests.⁸ MRI is currently the standard non-invasive diagnostic modality for MS: it visualizes lesions and plays a major role in assessing prognosis and monitoring disease progression.⁹ Quantitative MRI techniques further improve understanding of the extent of tissue damage and disease.
These methods include: (1) MR spectroscopy, a non-invasive technique for examining biochemical changes in MS;¹⁰ (2) magnetization transfer imaging, which offers improved sensitivity and specificity for MS studies;¹¹ (3) diffusion-weighted imaging (DWI) and diffusion tensor imaging (DTI), quantitative MRI techniques that provide information on the size, integrity, geometry, and orientation of tissue fibers;¹² (4) dynamic contrast-enhanced MRI, which enables quantification of blood-brain barrier disruption;¹³ and (5) dynamic susceptibility contrast MRI, which uses an injected contrast agent to produce quantitative maps of cerebral blood flow, cerebral blood volume, and temporal metrics such as mean transit time.¹³

Nevertheless, accurate diagnosis of MS lesions remains challenging. Diagnosing MS from MRI has been shown to be difficult and time-consuming because most lesions are hard to detect manually, especially within the grey matter.¹⁴ Moreover, methods that rely on human interpretation are subject to substantial interobserver variability, which can reduce the accuracy and quality of the final results, and the inability to compare studies acquired with different modalities is a significant drawback. New approaches have therefore been proposed to detect MS lesions more reliably. Recently, deep learning (DL) tools, a branch of artificial intelligence (AI), have been developed for the diagnosis of various diseases and have attracted considerable attention from physicians.^15,16 Different DL techniques using MRI have been proposed for the diagnosis of MS. The principal advantage of DL methods is their capacity to learn intrinsic image representations directly from MRI data.¹⁷ Furthermore, DL does not require manual guidance in the feature extraction step.¹⁸

Since 2016, research has been conducted on the use of DL architectures and MRI data for the diagnosis of MS, with DL models applied to the identification, segmentation, and classification of MS lesions. Convolutional neural networks (CNNs) are among the most widely employed architectures in MS diagnosis.¹⁴ A CNN learns lesion characteristics from the images and uses multinomial logistic regression to classify them, improving the diagnosis of MS.¹⁹ Most researchers have used 2D and 3D CNN architectures for the classification and segmentation of MRI data. Because these networks reuse (share) weights across image positions and therefore have fewer parameters, they are well suited to 2D and 3D images.¹⁴ Given that 3D images contain much more information than 2D images, one might expect a 3D CNN to perform better than its 2D counterpart; however, differences in CNN design and/or in the training and testing datasets may be the underlying cause of observed performance variations. A 3D CNN also requires far more processing power for training and inference than a 2D CNN.
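To make the weight-sharing and processing-cost arguments concrete, the following minimal Python sketch counts the trainable parameters and multiply-accumulate operations of a single convolutional layer. The helper names and all layer sizes (one input channel, 32 filters, 3-voxel kernels, 128-voxel image sides, stride 1 with "same" padding) are hypothetical choices for illustration, not values taken from the included studies.

```python
from math import prod


def conv_params(in_channels: int, out_channels: int, kernel: tuple) -> int:
    """Trainable parameters of one convolutional layer (kernel weights + one bias per filter).

    The same kernel is reused at every spatial position (weight sharing),
    so the count does not depend on the image size.
    """
    return out_channels * (in_channels * prod(kernel) + 1)


def dense_params(in_features: int, out_features: int) -> int:
    """Trainable parameters of a fully connected layer (weights + biases)."""
    return out_features * (in_features + 1)


def conv_macs(in_channels: int, out_channels: int, kernel: tuple, output_shape: tuple) -> int:
    """Multiply-accumulate operations for one forward pass of a convolutional layer.

    Each output element of each filter needs in_channels * prod(kernel) multiply-adds.
    """
    return prod(output_shape) * out_channels * in_channels * prod(kernel)


# Hypothetical layer: 1 input channel (e.g. one MRI sequence), 32 filters, kernel width 3
p2d = conv_params(1, 32, (3, 3))
p3d = conv_params(1, 32, (3, 3, 3))
p_dense = dense_params(128 * 128, 32)   # fully connected alternative on a 128x128 slice
m2d = conv_macs(1, 32, (3, 3), (128, 128))
m3d = conv_macs(1, 32, (3, 3, 3), (128, 128, 128))
print(p2d, p3d, p_dense, m3d // m2d)   # 320 896 524320 384
```

Because each filter's weights are shared across all positions, the hypothetical 2D and 3D layers need only 320 and 896 parameters, whereas a fully connected layer over the same 128×128 slice needs over half a million. The 3D layer, however, performs 384 times more multiply-accumulate operations than the 2D layer, in line with the higher processing demands of 3D CNNs.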

The objective of this meta-analysis is to assess the effectiveness of MRI-based 2D and 3D CNN architectures in the diagnosis of MS.

Methods

Resources and search techniques

We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to conduct this study.²⁰ The following databases were searched for relevant papers published from January 2010 to December 2022: Web of Science, PubMed, CINAHL, Google Scholar, Embase, and the Cochrane Library. Two independent reviewers performed a systematic search using the following terms: "multiple sclerosis" AND ("magnetic resonance imaging" OR "MRI") AND ("machine learning" OR "artificial intelligence" OR "deep learning" OR "convolutional neural networks").

Selection criteria

After removal of duplicates, the titles and abstracts of the retrieved papers were screened. Papers were retained if they reported the use of 2D- or 3D-CNN for the identification, classification, or segmentation of MS lesions, and were then read in full to confirm eligibility.

Study inclusion criteria were: (1) papers reporting MS lesions; (2) identification, classification, or segmentation of MS lesions using a CNN method; (3) use of a 2D- or 3D-CNN architecture; (4) use of MRI as the neuroimaging modality; (5) original research papers; and (6) articles reporting sufficient information on CNN performance.

Study exclusion criteria were: (1) papers written in languages other than English; (2) letters, comments, opinions, guidelines, protocols, and review papers; (3) use of other CNN architectures (4D-CNN models, DeepSCAN); (4) overlapping study groups and duplicate publications; and (5) studies with insufficient information on the results.

Data extraction

Two authors independently extracted data from the eligible articles according to the inclusion and exclusion criteria and recorded them on a standardized data sheet that included: (1) article, (2) country, (3) dataset, (4) sample size, (5) diagnostic application, (6) neuroimaging modalities, (7) deep learning method, (8) deep learning architecture, and (9) performance.

Study quality assessment

The methodological quality of the included studies was evaluated independently by 2 authors using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool, which covers four domains: "patient selection", "index test", "reference standard", and "flow and timing".²¹ Each domain was assessed for risk of bias, and the first 3 domains were also assessed for applicability concerns. Each item was answered with "yes," "no," or "unclear": "yes" indicated a low risk of bias, whereas "no" or "unclear" indicated a high or uncertain risk. Disagreements were resolved by consensus, with arbitration by a third reviewer if necessary. RevMan version 5.4 (Cochrane Collaboration, Oxford, United Kingdom) was used to visualize the quality assessment results.

Outcome measures

Accuracy: the ability of 2D- or 3D-CNN to correctly detect MS when it is present and to correctly rule it out when it is absent, that is, the proportion of all cases that are classified correctly.

Sensitivity: the ability of 2D- or 3D-CNN to correctly classify an individual with MS as positive.

Specificity: the ability of 2D- or 3D-CNN to correctly classify an individual without MS as negative.
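These definitions correspond to the standard confusion-matrix formulas. The minimal sketch below uses hypothetical counts for 100 individuals with MS and 100 without; the class and function names are illustrative only. Multiplying each result by 100 gives the percentage scale used in the results.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # individuals with MS classified as MS
    fn: int  # individuals with MS classified as not MS
    tn: int  # individuals without MS classified as not MS
    fp: int  # individuals without MS classified as MS


def accuracy(c: ConfusionCounts) -> float:
    """Proportion of all individuals that are classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    """True positive rate: individuals with MS correctly classified as positive."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True negative rate: individuals without MS correctly classified as negative."""
    return c.tn / (c.tn + c.fp)


counts = ConfusionCounts(tp=95, fn=5, tn=90, fp=10)  # hypothetical test set
print(accuracy(counts), sensitivity(counts), specificity(counts))  # 0.925 0.95 0.9
```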

Dice Similarity Coefficient (DSC): a spatial overlap index and reproducibility validation metric that measures the similarity between two binary segmentations, such as an automated and a manual lesion mask.
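The DSC is twice the overlap of two binary masks divided by their combined size, DSC = 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). A minimal NumPy sketch follows; the toy 4×4 masks are hypothetical, and returning 1.0 when both masks are empty is a convention assumed here rather than one stated by the included studies.

```python
import numpy as np


def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice similarity coefficient between two binary segmentation masks."""
    pred = pred.astype(bool)
    ref = ref.astype(bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)


ref = np.zeros((4, 4), dtype=bool)
ref[1:3, 1:3] = True    # reference lesion: 4 voxels
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True   # predicted lesion: 6 voxels, 4 of which overlap
print(dice(pred, ref))  # 0.8
```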

These measures were used to evaluate: (i) identification of MS patients versus healthy controls, (ii) classification of MS lesions versus other brain lesions, and (iii) segmentation of MR images, which is used to measure and visualize the brain's anatomical structures, analyze brain changes, and delineate MS lesions.

Statistical analysis

Accuracy, sensitivity, specificity, and DSC values were pooled across the included studies. Statistical analyses were conducted using RevMan version 5.4 (Cochrane Collaboration, Oxford, United Kingdom), and a p-value <0.05 was considered significant. Heterogeneity was assessed with Cochran's chi-squared (Q) test, where p<0.05 indicates significant heterogeneity. To quantify the influence of heterogeneity on the results, we calculated I² values; I² ≥50% with p<0.05 indicated substantial heterogeneity. If I²<50% and p>0.05, a fixed-effect model was used; otherwise, a random-effects model was adopted.²² We also performed subgroup and sensitivity analyses to identify sources of heterogeneity. Publication bias was assessed by visual inspection of funnel plot symmetry, supported by Egger's test using the SPSS V25 statistical package.
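The pooling and model-selection rule described above can be sketched in plain Python. This is a minimal illustration of standard inverse-variance meta-analysis (Cochran's Q, I², and a fixed- or random-effects pooled estimate), not RevMan's implementation. The DerSimonian-Laird estimator for the between-study variance, the function names, and the study values in the example are assumptions chosen for illustration.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-squared variable with integer df.

    Uses the recurrence for the regularized upper incomplete gamma function,
    so only the standard library is needed.
    """
    if df % 2 == 0:
        total, k = math.exp(-x / 2), 2              # closed form for df = 2
    else:
        total, k = math.erfc(math.sqrt(x / 2)), 1   # closed form for df = 1
    while k < df:                                   # raise df by 2 per step
        total += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return total


def meta_analyse(estimates, variances, z=1.96):
    """Inverse-variance pooling with Cochran's Q, I^2 and the fixed/random rule."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    p = chi2_sf(q, df)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    if i2 < 50 and p > 0.05:
        model, weights = "fixed", w
    else:
        # DerSimonian-Laird between-study variance tau^2 (assumed estimator)
        tau2 = max(0.0, (q - df) / (sw - sum(wi ** 2 for wi in w) / sw))
        model, weights = "random", [1 / (v + tau2) for v in variances]

    pooled = sum(wi * yi for wi, yi in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return {"model": model, "pooled": pooled, "ci": (pooled - z * se, pooled + z * se),
            "Q": q, "df": df, "p": p, "I2": i2}


# Hypothetical accuracies (%) and their variances from three studies
result = meta_analyse([98.0, 99.0, 99.5], [0.25, 0.25, 0.25])
print(result["model"], round(result["Q"], 3), round(result["I2"], 1), round(result["p"], 3))
# random 4.667 57.1 0.097
```

With these hypothetical inputs, Q is not significant (p ≈ 0.097) but I² ≈ 57%, so the rule selects the random-effects model because both conditions for the fixed-effect model must hold.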

Results

Identification of studies

The literature search identified 2174 papers, of which 691 duplicates were removed. The remaining 1483 papers were screened by title and abstract, and 543 were excluded because no full-text article was available or the language was not English. The remaining 940 studies were identified as potentially eligible and underwent full-text review. Fifteen publications met the eligibility criteria and were included in this study. The PRISMA flowchart is shown in Figure 1.
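The screening counts can be checked with simple arithmetic; the variable names below are illustrative only.

```python
identified = 2174
duplicates = 691
excluded_at_screening = 543   # no full text available or non-English
included = 15

screened = identified - duplicates                  # title/abstract screening
full_text_reviewed = screened - excluded_at_screening
excluded_at_full_text = full_text_reviewed - included
print(screened, full_text_reviewed, excluded_at_full_text)  # 1483 940 925
```

The last figure, 925 articles excluded after full-text review, follows from the reported counts rather than being stated explicitly.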

Figure 1. PRISMA study flowchart.

Characteristics of studies

The 15 studies were published between 2018 and 2021 and came from nine countries: Australia (n=2), Iran (n=2), Italy (n=2), USA (n=2), China (n=2), Spain (n=2), Switzerland (n=1), India (n=1), and Germany (n=1). The number of patients per study ranged from 19 to 1111. Three, four, and eight studies reported the effectiveness of 2D- and 3D-CNN in the identification, classification, and segmentation of MS lesions, respectively. Study characteristics are presented in Table 1.

Table 1. Features of included studies

Identifying brain lesions on MRI is crucial for the clinical diagnosis of MS. Medical professionals face significant challenges when segmenting and categorizing brain lesions on MR images and are at risk of diagnostic errors. Many factors, including artifacts and intensity inhomogeneity, degrade MR image quality, which frequently results in misdiagnosis. The low-level and high-level preprocessing techniques applied to MRI data for MS diagnosis are covered in the sections that follow. A computer-aided diagnosis system (CADS) performs better when high-level preprocessing techniques, such as data augmentation (DA) and patch extraction, are used in conjunction with low-level preprocessing approaches. Table 2 summarizes the specific preprocessing steps used by each article to diagnose MS with DL techniques and MRI modalities. Many toolboxes are available for implementing DL models, and Table 2 also lists the tools used to build the DL architectures; TensorFlow and Keras are the most widely used. The final component of the DL-based CADS shown in Table 2 is the activation function of the final layer used for classification. Notably, the softmax function yielded the highest classification performance.
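Patch extraction, data augmentation, and a softmax output activation can be illustrated with a minimal NumPy sketch. The patch size, stride, flip-based augmentation, toy 8×8 "slice", and two-class scores below are hypothetical choices for illustration; the preprocessing actually used by each study is the one summarized in Table 2.

```python
import numpy as np


def extract_patches(image: np.ndarray, patch: int, stride: int) -> np.ndarray:
    """Slide a square window over a 2D slice and stack the patches (no padding)."""
    h, w = image.shape
    return np.stack([
        image[r:r + patch, c:c + patch]
        for r in range(0, h - patch + 1, stride)
        for c in range(0, w - patch + 1, stride)
    ])


def augment_flips(patches: np.ndarray) -> np.ndarray:
    """Data augmentation: append left-right and up-down mirrored copies of each patch."""
    return np.concatenate([patches, patches[:, :, ::-1], patches[:, ::-1, :]])


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis (class scores -> probabilities)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


slice_2d = np.arange(64, dtype=float).reshape(8, 8)    # stand-in for an MRI slice
patches = extract_patches(slice_2d, patch=4, stride=2)  # 3 x 3 grid of 4x4 patches
augmented = augment_flips(patches)                      # original + 2 flipped copies
probs = softmax(np.array([[2.0, 0.5], [0.1, 1.0]]))     # hypothetical 2-class scores
print(patches.shape, augmented.shape, probs.sum(axis=1))
```

Each row of `probs` sums to 1, so the predicted class is simply the one with the highest probability.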

Table 2. Summary of CADS developed for MS using MRI neuroimaging modalities and details of deep learning architectures.

Evaluation of the studies’ quality

A high risk of bias was found in approximately 25% of articles for the patient selection and flow and timing domains. In most papers (75%), a threshold was specified for the index test domain. For the reference standard domain, fewer than half of the included articles were at low risk of bias. Applicability concerns showed a broadly similar pattern (Figure 2): concerns were most frequent for the reference standard domain (33.33%), followed by patient selection (26.67%) and the index test (20%).

Figure 2. Risk of bias and applicability concerns graph: review authors’ judgements about each domain presented as percentages across included studies.

Types of application and outcome measures

Identification

Of the 15 included studies, three studies evaluated the diagnostic effectiveness of 2D- or 3D-CNN in the identification of MS lesions using accuracy, sensitivity, and specificity (25,36,37).

Accuracy

Heterogeneity was not statistically significant (Chi²=4.23, p=0.12, I²=53%), so a fixed-effect model was used. The pooled accuracy was high, at 98.81 (95% CI: 98.50–99.13; p<0.00001) (Figure 3).
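As a consistency check, the reported I² and p-value can be recomputed from the Chi² statistic (Cochran's Q) and the number of studies: I² = (Q − df)/Q with df = k − 1, and for df = 2 the chi-squared upper-tail probability reduces to exp(−Q/2). The sketch below uses only the values reported above.

```python
import math

q = 4.23        # reported Chi² (Cochran's Q) for identification accuracy
k = 3           # number of pooled studies
df = k - 1
i2 = max(0.0, (q - df) / q) * 100
p = math.exp(-q / 2)  # chi-squared survival function; this closed form holds only for df == 2
print(f"I2 = {i2:.1f}%, p = {p:.3f}")  # I2 = 52.7%, p = 0.121
```

Both values round to the reported I²=53% and p=0.12.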

Figure 3. Pooled accuracy rates of 2D-3D CNN in the identification of MS lesions.