Introduction

The terms “COVID-19 pandemic” and “SARS-CoV-2 virus” have become household names over the past 6 months. Despite its recent discovery, the virus has had a tremendous impact on our lives. As of today, no vaccine or effective treatment has been discovered. While the “Great Lockdown” is being lifted in many western nations, the battle against the pandemic is not over. Thus far, two sets of measures have been decisive in combatting the virus. The first set of policies places a burden on citizens to follow basic hygiene practices, thereby limiting the further spread of the virus. The second set of measures concerns governmental bodies, which are required to effectively detect, diagnose and track every infected citizen. The latter set of policies calls for more practical methods of detection, given the constrained resources of the medical systems.

At the time of writing, different types of tests are in use. The first kind of test, which accounts for the vast majority of examinations, is polymerase chain reaction (PCR) testing. The method searches for the viral RNA and can only diagnose a patient who is actively infected. The test is performed using a nasopharyngeal swab. This procedure is very labour intensive, which leads to human errors and incorrect diagnoses. The false negative rate can be as high as 30%.

The second category of tests is called serologic testing. This kind of test does not search for the viral RNA but for the antibodies to COVID-19. The test is carried out using a blood sample. Antibodies are produced by the body when an infection with a specific virus takes place. They are generally produced one or two weeks after infection and are useful to assess who has been in contact with the virus. This kind of test is less suitable for detecting the infection early and is particularly interesting for epidemiologists who want to build a clearer picture of the general sanitary conditions. The result of this test only shows who has been infected; it does not determine whether immunity to the virus has been acquired, or for how long.

Other tests are under development, notably lateral flow assays, which look for a biological marker in different samples (urine, blood, saliva, etc.). They work just like a pregnancy test and can be used for in-home testing. Other pharmaceutical companies are developing rapid in-clinic antigen testing. In this procedure, the analyzing device uses cartridges whose biological components are used to determine whether the patient has SARS-CoV-2 or any of 9 other respiratory diseases.

Depending on the test type and where it is performed, the prices can be high. Furthermore, reagents (2020) are also key components of most tests and are currently in short supply. China and the US are producers and exporters of reagents but both countries have been hit by the pandemic.

In short, tests are in great demand.

Up to now, only a fraction of the population has been tested, and even if testing every person may be unnecessary, having the capability to do so is critical.

Number of tests per thousand (“Tests Per Thousand Since the 5th Confirmed Death Due to Covid-19” 2020)

Even small countries like Iceland or Luxembourg have only been able to carry out 100 tests per thousand inhabitants.

SARS-CoV-2 transmission is difficult to trace as the symptoms can appear as long as 14 days after exposure to the virus. Most symptomatic people have fever, tiredness and a dry cough, among other things. Patients are also very susceptible to developing pneumonia.

Another method used to confirm a diagnosis is the chest X-ray.

This method has both advantages and drawbacks. First, it can be carried out in almost every country because the equipment needed to conduct the test is ubiquitous. Second, it is less costly than a CT scan or PCR test, and most of the time CT scans can only be carried out at bigger hospitals. Finally, X-rays can be conducted more safely than the nasal swab method as they do not risk aerosolizing the virus when the test is conducted.

The drawbacks of using X-ray scans include the increased difficulty of detecting asymptomatic cases, the fact that the scan cannot be performed remotely, and the need for a radiologist to analyze the scans and determine the outcome.

No single method will be perfect for detecting the virus, and combining several test techniques may be needed. For example, X-rays could be used as a first screen for COVID-19 so that PCR tests can be allocated to places where X-rays cannot be used.

Therefore, the authors consider a cheap, fast and reliable method such as X-rays suitable. However, it calls for automating the diagnostic process by analyzing the scans and determining whether the disease is present or not (2020). (This technique should not be used without the approval of a medical practitioner.)

Research question

Thus, our research question is

Can deep learning be used to rapidly and accurately analyze chest X-ray images to predict COVID-19?

Previous approaches

| Study | Type of images | Number of cases | Method used | Accuracy (%) |
|---|---|---|---|---|
| Apostolopoulos and Mpesiana (2020) | Chest X-ray | 224 COVID-19(+), 700 Pneumonia, 504 Healthy | VGG-19 | 93.48 |
| Wang and Wong (2020) | Chest X-ray | 53 COVID-19(+), 5526 COVID-19(-), 8066 Healthy | COVID-Net | 92.4 |
| Sethy and Behera (2020) | Chest X-ray | 25 COVID-19(+), 25 COVID-19(-) | ResNet50 + SVM | 95.38 |
| Hemdan et al. (2020) | Chest X-ray | 25 COVID-19(+), 25 Normal | COVIDX-Net | 90.0 |
| Narin et al. (2020) | Chest X-ray | 50 COVID-19(+), 50 COVID-19(-) | Deep CNN ResNet-50 | 98.0 |
| Ozturk et al. (2020) | Chest X-ray | 125 COVID-19(+), 500 No-Findings | DarkCovidNet | 98.08 |
| Ozturk et al. (2020) | Chest X-ray | 125 COVID-19(+), 500 Pneumonia, 500 No-Findings | DarkCovidNet | 87.02 |

The traditional approach consists of using Convolutional Neural Networks (CNN) to solve this task. Half of the previous approaches use transfer learning for this purpose. VGG-19 and ResNet50 are the pre-trained networks of preference. Sethy and Behera use a Support Vector Machine on top of their CNN, at the very end of the dense layers, in place of the traditional softmax activation function.

The other half of the approaches use customized networks such as COVIDX-Net or DarkCovidNet, whose architecture is based on the DarkNet model. The DarkNet model, which is 19 layers deep, is available in MATLAB.

Data

As SARS-CoV-2 is a new virus, the available X-rays come from different sources. At present, five of them are worth mentioning:

a. The Radiological Society of North America (RSNA)
b. Radiopaedia
c. The Italian Society of Medical and Interventional Radiology (SIRM)
d. Eurorad
e. Coronacases.org

As most of the data is not centralized, and therefore not always reliable, we rely on one of the leading repositories of COVID-19 positive (pneumonia) X-rays, the covid-chestxray-dataset. The data in this repository can be downloaded freely. The repository is approved by the University of Montreal’s Ethics Committee and is continuously updated. Furthermore, previous studies have already used it. We sample 140 images from it and place them in our data folder, in a sub-folder labeled chestxray_COVID. A small analysis and selection of this dataset can be found in scripts/import+storage, in the photo-organization script. Please note that we use only one view of chest X-rays from this repository, anteroposterior (AP), which is the most frequent class of chest X-ray scans. We ignore other views, which include but are not limited to posteroanterior and lateral scans.

To find COVID-19 negative images as well as general bacterial and viral pneumonia images, we use another dataset, provided by Kermany (2018). These images have been placed in our data folder, in a sub-folder labeled Kermany_OTHERS. These scans have the same AP view as the first dataset.

As the number of COVID+ images at our disposal is fairly limited, we build a train set, a small test set and a large test set. These are specified below.

The total size of the data is 1.28 GB, with 5893 items.

Methodology

CNN models

Two questions naturally arise as we conduct our experiments:

  • Can we accurately distinguish COVID+ individuals from COVID- individuals (binary classification task)?
  • Can we accurately distinguish among COVID+ individuals, COVID- individuals, individuals who have a viral pneumonia other than COVID, and individuals who have a bacterial pneumonia (multiclass classification task with 4 outcome classes)?

Based on the literature, we decide to use two models other than the ones which have been used so far, and we use transfer learning for our tasks. For each task, we train and test two pretrained CNNs, which serve as the base of our model, and we add dense layers on top of them.

  • VGG16: this pretrained model, built on the ImageNet dataset, has a size of 528 MB and a Top-1 accuracy of 0.713. It is 23 layers deep and has 138 million parameters, making it a very large model.

  • DenseNet201: this pretrained model, built on the ImageNet dataset, has a size of 80 MB and a Top-1 accuracy of 0.773. It is 201 layers deep and has 20 million parameters.

Both models are thus very different in size and in depth.

     

We select our best models based on the categorical accuracy as a first metric and the validation loss if there is a tie in categorical accuracy.
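The selection rule above can be sketched as a sort key. This is a minimal illustration rather than the tuning code used for the experiments; the run names and metric values are hypothetical.

```python
# Hypothetical tuning results:
# (run name, validation categorical accuracy, validation loss).
candidates = [
    ("run_a", 0.95, 0.31),
    ("run_b", 0.97, 0.40),
    ("run_c", 0.97, 0.22),
]


def select_best(results):
    """Pick the highest accuracy; break ties with the lowest validation loss."""
    return min(results, key=lambda r: (-r[1], r[2]))


best = select_best(candidates)
print(best[0])
```

Here run_b and run_c tie on accuracy, so the lower validation loss decides between them.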

Binary classification  

We use the 140 COVID+ images as well as 140 COVID- images, so 280 images in total. We use 80% of these (224 images) as a training set and 20% (56 images) as a testing set. Half of the training set serves as the validation set. We tune our models on Google Cloud. To challenge our model, we also build a larger test set in which COVID- images far outnumber COVID+ images; the exact counts per set are given in the table below.

| | COVID+ | COVID- |
|---|---|---|
| Train set | 112 | 112 |
| Small test set | 28 | 28 |
| Large test set | 28 | 1572 |
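The 80/20 split behind the train set and the small test set can be sketched as a per-class (stratified) split. This is a minimal sketch under stated assumptions: the file names, the seed and the helper `stratified_split` are hypothetical, and the project's own script may split the images differently.

```python
import random


def stratified_split(files_by_class, test_fraction=0.2, seed=0):
    """Split each class separately so both sets keep the same class balance."""
    rng = random.Random(seed)
    train, test = {}, {}
    for label, files in files_by_class.items():
        shuffled = list(files)
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test[label] = shuffled[:n_test]
        train[label] = shuffled[n_test:]
    return train, test


# Hypothetical file names standing in for the 140 + 140 sampled images.
files_by_class = {
    "COVID+": [f"covid_pos_{i:03d}.png" for i in range(140)],
    "COVID-": [f"covid_neg_{i:03d}.png" for i in range(140)],
}
train, test = stratified_split(files_by_class)
for label in files_by_class:
    print(label, len(train[label]), len(test[label]))
```

Splitting per class keeps both sets balanced, which matches the 112/28 counts per class in the table.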

     

Multiclass classification

Again, we build one training set and two testing sets; a small one and a large one. The table below indicates how many images were used per class and per set.

| | COVID+ | COVID- | Viral Pneumonia | Bacterial Pneumonia |
|---|---|---|---|---|
| Train set | 112 | 112 | 112 | 112 |
| Small test set | 28 | 28 | 28 | 28 |
| Large test set | 28 | 1583 | 1494 | 2788 |

   

Binary VGG16

Structure of the model:

  • VGG16 base
  • 2 hidden layers with 100 and 50 nodes respectively
  • Final layer with 2 nodes and the softmax activation function
  • SeLU activation function in each hidden layer
  • Dropout rate of 0.2
  • \(l_1\) regularization penalty of 0.001
  • Adamax optimizer with a learning rate of 0.001

We train this model for 30 epochs and we use early stopping with patience = 7.  
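The stopping rule can be illustrated in a few lines. This is a sketch of the general early-stopping logic, not the training code itself; it assumes the monitored quantity is the validation loss (the text does not say which quantity is monitored), and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=7, max_epochs=30):
    """Return (last epoch run, best epoch), both 1-based.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs, or when `max_epochs` is reached.
    """
    best_loss = float("inf")
    best_epoch = 0
    last_epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        last_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break
    return last_epoch, best_epoch


# Hypothetical run: the loss improves for 3 epochs, then plateaus.
losses = [1.0, 0.8, 0.6] + [0.7] * 20
print(early_stopping_epoch(losses))
```

With a patience of 7, a run whose best epoch is the third one stops after epoch 10 instead of using all 30 epochs.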

 

## Model: "sequential_1"
## ________________________________________________________________________________
## Layer (type)                        Output Shape                    Param #     
## ================================================================================
## vgg16 (Model)                       (None, 7, 7, 512)               14714688    
## ________________________________________________________________________________
## flatten_1 (Flatten)                 (None, 25088)                   0           
## ________________________________________________________________________________
## dense_3 (Dense)                     (None, 100)                     2508900     
## ________________________________________________________________________________
## dense_4 (Dense)                     (None, 50)                      5050        
## ________________________________________________________________________________
## dropout_1 (Dropout)                 (None, 50)                      0           
## ________________________________________________________________________________
## dense_5 (Dense)                     (None, 2)                       102         
## ================================================================================
## Total params: 17,228,740
## Trainable params: 2,514,052
## Non-trainable params: 14,714,688
## ________________________________________________________________________________
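The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The sketch below reproduces the figures, taking the size of the VGG16 base from the summary and treating it as frozen, which is consistent with its parameters being reported as non-trainable.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units


vgg16_base = 14_714_688   # convolutional base, taken from the summary
flattened = 7 * 7 * 512   # size of the Flatten output

head = [
    dense_params(flattened, 100),  # dense_3
    dense_params(100, 50),         # dense_4
    dense_params(50, 2),           # dense_5 (softmax output)
]
trainable = sum(head)
total = vgg16_base + trainable
print(head, trainable, total)
```

Almost all trainable parameters sit in the first dense layer, because it is connected to every one of the 25088 flattened features.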

   

Small test set

## $loss
## [1] 1.22
## 
## $categorical_accuracy
## [1] 0.964
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction COVID+ COVID-
##     COVID+     26      0
##     COVID-      2     28
##                                         
##                Accuracy : 0.964         
##                  95% CI : (0.877, 0.996)
##     No Information Rate : 0.5           
##     P-Value [Acc > NIR] : 2.22e-14      
##                                         
##                   Kappa : 0.929         
##                                         
##  Mcnemar's Test P-Value : 0.48          
##                                         
##             Sensitivity : 0.929         
##             Specificity : 1.000         
##          Pos Pred Value : 1.000         
##          Neg Pred Value : 0.933         
##              Prevalence : 0.500         
##          Detection Rate : 0.464         
##    Detection Prevalence : 0.464         
##       Balanced Accuracy : 0.964         
##                                         
##        'Positive' Class : COVID+        
## 

On our small test set, the model classifies 96.4% of the images correctly, with a sensitivity of 92.9% for COVID+ patients.
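The statistics reported for the small test set follow directly from the four cells of the confusion matrix. The sketch below recomputes them with COVID+ as the positive class; the helper name `binary_metrics` is ours and is not part of the original analysis.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard rates derived from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
    }


# Counts read from the confusion matrix, with COVID+ as the positive class:
# 26 true positives, 0 false positives, 2 false negatives, 28 true negatives.
metrics = binary_metrics(tp=26, fp=0, fn=2, tn=28)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The two errors are both false negatives, which is why sensitivity is lower than specificity.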

 

Binary DenseNet201

Structure of the model:

  • DenseNet201 base
  • 2 hidden layers with 100 and 100 nodes respectively
  • Final layer with 2 nodes and the softmax activation function
  • ReLU activation function in each hidden layer
  • Dropout rate of 0.2
  • No \(l_1\) regularization penalty
  • Adamax optimizer with a learning rate of 0.001

We train this model for 30 epochs and we use early stopping with patience = 7.

## Model: "sequential_1"
## ________________________________________________________________________________
## Layer (type)                        Output Shape                    Param #     
## ================================================================================
## densenet201 (Model)                 (None, 7, 7, 1920)              18321984    
## ________________________________________________________________________________
## flatten_1 (Flatten)                 (None, 94080)                   0           
## ________________________________________________________________________________
## dense_3 (Dense)                     (None, 100)                     9408100     
## ________________________________________________________________________________
## dense_4 (Dense)                     (None, 100)                     10100       
## ________________________________________________________________________________
## dropout_1 (Dropout)                 (None, 100)                     0           
## ________________________________________________________________________________
## dense_5 (Dense)                     (None, 2)                       202         
## ================================================================================
## Total params: 27,740,386
## Trainable params: 9,418,402
## Non-trainable params: 18,321,984
## ________________________________________________________________________________

   

Small test set

## $loss
## [1] 0.0092
## 
## $categorical_accuracy
## [1] 1
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction COVID+ COVID-
##     COVID+     28      0
##     COVID-      0     28
##                                     
##                Accuracy : 1         
##                  95% CI : (0.936, 1)
##     No Information Rate : 0.5       
##     P-Value [Acc > NIR] : <2e-16    
##                                     
##                   Kappa : 1         
##                                     
##  Mcnemar's Test P-Value : NA        
##                                     
##             Sensitivity : 1.0       
##             Specificity : 1.0       
##          Pos Pred Value : 1.0       
##          Neg Pred Value : 1.0       
##              Prevalence : 0.5       
##          Detection Rate : 0.5       
##    Detection Prevalence : 0.5       
##       Balanced Accuracy : 1.0       
##                                     
##        'Positive' Class : COVID+    
## 

On our small test set, the model classifies all 56 images correctly, so both accuracy and sensitivity are 100%.
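A perfect score on 56 images still leaves uncertainty, which is what the reported confidence interval expresses. Assuming the interval is the exact (Clopper-Pearson) binomial interval, which is our assumption since the output does not name the method, the lower bound has a closed form when every prediction is correct: it is the value p that solves p^n = α/2.

```python
def exact_lower_bound_all_correct(n, confidence=0.95):
    """Lower Clopper-Pearson bound when all n predictions are correct."""
    alpha = 1.0 - confidence
    return (alpha / 2.0) ** (1.0 / n)


print(round(exact_lower_bound_all_correct(56), 3))
```

Under that assumption the result agrees with the lower bound of 0.936 shown in the output, so the data are compatible with a true accuracy several points below 100%.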

 

Multiclass VGG16

Structure of the model:

  • VGG16 base
  • 2 hidden layers with 100 and 50 nodes respectively
  • Final layer with 4 nodes and the softmax activation function
  • ReLU activation function in each hidden layer
  • No dropout
  • No \(l_1\) regularization penalty
  • RMSprop optimizer with a learning rate of 0.0001

We train this model for 30 epochs and we use early stopping with patience = 7.

## Model: "sequential_1"
## ________________________________________________________________________________
## Layer (type)                        Output Shape                    Param #     
## ================================================================================
## vgg16 (Model)                       (None, 7, 7, 512)               14714688    
## ________________________________________________________________________________
## flatten_1 (Flatten)                 (None, 25088)                   0           
## ________________________________________________________________________________
## dense_3 (Dense)                     (None, 100)                     2508900     
## ________________________________________________________________________________
## dense_4 (Dense)                     (None, 50)                      5050        
## ________________________________________________________________________________
## dropout_1 (Dropout)                 (None, 50)                      0           
## ________________________________________________________________________________
## dense_5 (Dense)                     (None, 4)                       204         
## ================================================================================
## Total params: 17,228,842
## Trainable params: 2,514,154
## Non-trainable params: 14,714,688
## ________________________________________________________________________________

   

Small test set

## $loss
## [1] 0.402
## 
## $categorical_accuracy
## [1] 0.848
## Confusion Matrix and Statistics
## 
##               Reference
## Prediction     COVID+ COVID- Viral P. Bacterial P.
##   COVID+           28      0        0            0
##   COVID-            0     27        3            0
##   Viral P.          0      0       17            5
##   Bacterial P.      0      1        8           23
## 
## Overall Statistics
##                                         
##                Accuracy : 0.848         
##                  95% CI : (0.768, 0.909)
##     No Information Rate : 0.25          
##     P-Value [Acc > NIR] : <2e-16        
##                                         
##                   Kappa : 0.798         
##                                         
##  Mcnemar's Test P-Value : NA            
## 
## Statistics by Class:
## 
##                      Class: COVID+ Class: COVID- Class: Viral P.
## Sensitivity                   1.00         0.964           0.607
## Specificity                   1.00         0.964           0.940
## Pos Pred Value                1.00         0.900           0.773
## Neg Pred Value                1.00         0.988           0.878
## Prevalence                    0.25         0.250           0.250
## Detection Rate                0.25         0.241           0.152
## Detection Prevalence          0.25         0.268           0.196
## Balanced Accuracy             1.00         0.964           0.774
##                      Class: Bacterial P.
## Sensitivity                        0.821
## Specificity                        0.893
## Pos Pred Value                     0.719
## Neg Pred Value                     0.937
## Prevalence                         0.250
## Detection Rate                     0.205
## Detection Prevalence               0.286
## Balanced Accuracy                  0.857

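On our small test set, the model classifies 84.8% of the images correctly across the four classes. All 28 COVID+ images are identified correctly; most of the errors are confusions between viral and bacterial pneumonia.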
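The per-class statistics and the kappa value can be recomputed from the 4x4 confusion matrix by treating each class in turn as "positive" against the rest. This is a minimal sketch; the helper names are ours and are not part of the original analysis.

```python
labels = ["COVID+", "COVID-", "Viral P.", "Bacterial P."]
# Rows are predictions, columns are reference labels, as in the output.
matrix = [
    [28, 0, 0, 0],
    [0, 27, 3, 0],
    [0, 0, 17, 5],
    [0, 1, 8, 23],
]


def per_class_metrics(matrix, k):
    """One-vs-rest sensitivity and specificity for class index k."""
    total = sum(map(sum, matrix))
    tp = matrix[k][k]
    fp = sum(matrix[k]) - tp                  # predicted k, reference differs
    fn = sum(row[k] for row in matrix) - tp   # reference k, predicted otherwise
    tn = total - tp - fp - fn
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}


def cohen_kappa(matrix):
    """Agreement between prediction and reference, corrected for chance."""
    n = len(matrix)
    total = sum(map(sum, matrix))
    observed = sum(matrix[i][i] for i in range(n)) / total
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n)
    ) / total**2
    return (observed - expected) / (1 - expected)


for k, label in enumerate(labels):
    print(label, per_class_metrics(matrix, k))
print("kappa", round(cohen_kappa(matrix), 3))
```

The viral pneumonia class has the lowest sensitivity because 8 of its 28 images are predicted as bacterial pneumonia and 3 as COVID-.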

Multiclass DenseNet201

We discard this model because the accuracy of the best model after tuning is less than 60%.

Limitations / Improvements

Based on our two analyses, we think that further improvements could be made. First, the X-rays of healthy patients are in fact images of children, because we did not find reliable data on adults. This is one of the main limitations of our project. Did the model learn to correctly identify features specific to COVID infection, or did it learn to distinguish between adults and children? The most likely answer is that our model is sensitive to specific features of COVID+ patients. Still, it would be good to have access to data from healthy adults to check whether this is truly the case.

Another improvement would be access to more COVID+ images. At present, too few images are freely accessible. This issue is not specific to our project: the previous studies listed above used at most 224 images of infected patients.

This is even more important in an epidemiological context. Although we used the validation categorical accuracy and the validation loss as our first and second metrics, the most important criterion besides these is sensitivity. False negatives are much more costly because they mean failing to detect infected people.
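A short worked example shows why accuracy alone can hide false negatives on an imbalanced set. It uses the large binary test set counts from the table in the binary classification section (28 COVID+ and 1572 COVID- images) and a deliberately useless, hypothetical classifier that labels every image COVID-.

```python
def always_negative_scores(n_pos, n_neg):
    """Accuracy and sensitivity of a classifier that always predicts COVID-."""
    accuracy = n_neg / (n_pos + n_neg)   # every negative is "correct"
    sensitivity = 0 / n_pos              # no positive is ever detected
    return accuracy, sensitivity


accuracy, sensitivity = always_negative_scores(n_pos=28, n_neg=1572)
print(f"accuracy={accuracy:.4f}, sensitivity={sensitivity:.1f}")
```

Such a classifier would score above 98% accuracy while missing every infected patient, which is why sensitivity should be tracked alongside accuracy.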

The next steps would therefore be to collect images of healthy adults, to obtain more COVID chest X-rays, and to use sensitivity as a metric when improving our model.

Conclusion

We advise using DenseNet201 for the binary task and VGG16 for the multiclass task. For the binary task, DenseNet201 is better from both a technical and a contextual point of view: its accuracy is higher, and its sensitivity, which is invaluable to our goal, is higher than that of VGG16 as well. When we test both models on the small test set, DenseNet201 gives better results. Moreover, DenseNet201 is roughly one-sixth the size of VGG16 (80 MB vs 528 MB). Therefore, we choose DenseNet201 for the binary task.

On the other hand, for the multiclass classification, VGG16 performs much better than DenseNet201, to the degree that the latter was disregarded in the multiclass model evaluation.

In closing, despite the limitations of our project, we think deep learning has great potential for specific medical applications such as X-ray analysis. This is especially true when the availability of medical staff is the bottleneck of the process. We have learned not only how to apply a deep learning model but also how to take the context of application into account. This model should be reviewed by a medical practitioner and should complement the work of a radiologist, not replace it.

References

Apostolopoulos, Ioannis D., and Tzani A. Mpesiana. 2020. “Covid-19: Automatic Detection from X-Ray Images Utilizing Transfer Learning with Convolutional Neural Networks.” Physical and Engineering Sciences in Medicine, April. Springer Science and Business Media LLC. https://doi.org/10.1007/s13246-020-00865-4.

Hemdan, Ezz El-Din, Marwa A. Shouman, and Mohamed Esmail Karar. 2020. “COVIDX-Net: A Framework of Deep Learning Classifiers to Diagnose Covid-19 in X-Ray Images.” https://arxiv.org/abs/2003.11055.

Kermany, Daniel. 2018. “Large Dataset of Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images.” Mendeley. https://doi.org/10.17632/RSCBJBR9SJ.3.

Narin, Ali, Ceren Kaya, and Ziynet Pamuk. 2020. “Automatic Detection of Coronavirus Disease (Covid-19) Using X-Ray Images and Deep Convolutional Neural Networks.” https://arxiv.org/abs/2003.10849.

Ozturk, Tulin, Muhammed Talo, Eylul Azra Yildirim, Ulas Baran Baloglu, Ozal Yildirim, and U Rajendra Acharya. 2020. “Automated Detection of Covid-19 Cases Using Deep Neural Networks with X-Ray Images.” Computers in Biology and Medicine. Elsevier, 103792. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7187882/.

Sethy, Prabira Kumar, Santi Kumari Behera, Pradyumna Kumar Ratha, and Preesat Biswas. 2020. “Detection of Coronavirus Disease (Covid-19) Based on Deep Features and Support Vector Machine,” April, 643–51. https://doi.org/10.33889/IJMEMS.2020.5.4.052.

“Tests Per Thousand Since the 5th Confirmed Death Due to Covid-19.” 2020. Our World in Data. https://ourworldindata.org/grapher/total-tests-per-thousand-since-5th-death.

Wang, Linda, and Alexander Wong. 2020. “COVID-Net: A Tailored Deep Convolutional Neural Network Design for Detection of Covid-19 Cases from Chest X-Ray Images.” https://arxiv.org/abs/2003.09871.