Breast Cancer Classification using Computer Vision

Chandramouli Yalamanchili
Updated - 06/03/2021 [Created - 03/28/2021]

Introduction

Breast cancer is one of the most common forms of cancer in American women. In 2020, an estimated 30% of new cancer diagnoses in women were breast cancer. Within breast cancer, Invasive Ductal Carcinoma (IDC) is the most frequent subtype; in 2020 it accounted for 85% of all breast cancer cases.

Pathologists usually look for the regions of a whole mount sample that contain IDC in order to assign a grade to the whole sample. As a result, one of the critical pre-processing steps is to delineate the precise regions of IDC within a whole mount slide. An automated process for evaluating each patch of a mount sample would save time and could increase the accuracy of the diagnosis.

In this project we built a Keras-based Convolutional Neural Network (CNN) classifier that evaluates patches collected from several whole mount slide images and classifies each histology image as benign or malignant.

Project Motivation

Domain Introduction

Breast Cancer Introduction

Breast cancer is a type of cancer in which breast cells start to grow abnormally; it occurs mostly in women. The cancer cells form a tumor, which can be seen on an x-ray or felt as a lump. Lumps formed in the breast can be either benign or malignant. Benign tumors are non-cancerous and are usually not life threatening, as they grow slowly and do not spread to other parts of the body. Malignant tumors are cancerous and are considered life threatening: they grow rapidly, invade and destroy nearby tissue, and can spread throughout the body.

At a very high level, breast cancer can be divided into types such as invasive, non-invasive, metastatic, intrinsic, and molecular. For this version of the project, we considered one specific type: the Invasive Ductal Carcinoma (IDC) subtype of breast cancer, which accounts for 80% of breast cancer cases.

Figure 1: Invasive Ductal Carcinoma (IDC), showing the abnormal growth in the enlarged cell within the duct.


As shown in Figure 1, in IDC the cancer breaks through the wall of the duct and invades nearby tissue. The cancer starts in the ductal tissue of the breast; a duct is a tube that connects the lobules to the nipple. Carcinoma is a type of cancer that starts in the tissue that covers the internal organs, such as the breast tissue in this case. If not detected early, invasive ductal carcinoma can invade other tissues within the breast or even other parts of the body.

Although invasive ductal carcinoma is most common in older women, it can also affect younger women as well as men. Early detection of breast cancer helps increase patient survival rates. Currently, histopathologists examine tissue extracted from suspicious tumors and report the type of cancer and, when the tumor proves malignant, its grade. Data science can help by evaluating the suspected tumor tissue through an automated computer vision workflow and providing the diagnosis along with the grade, thereby saving time as well as increasing the accuracy of the diagnosis.

Breast Cancer Statistics

Figure 2: Annual rates of new cancer cases (left) and annual number of new cancer cases (right).


As Figure 2 shows, neither the rate of new breast cancer cases nor the number of breast cancer cases has been coming down over the years. Below are some of the statistics for breast cancer in 2020 that put the seriousness of the issue in perspective:

Project Details

Dataset Details

Dataset Link - https://www.kaggle.com/paultimothymooney/breast-histopathology-images/

Technology used

Exploratory Data Analysis

1. IDC dataset statistics

Table 1: Table with critical statistics from the IDC dataset.

| Parameter | Value |
| --- | --- |
| Number of patients | 279 |
| Total number of images | 277,524 |
| Number of benign cells | 198,738 |
| Number of malignant cells | 78,786 |
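
The counts in Table 1 imply a noticeable class imbalance. As a minimal sketch (the variable names are illustrative and not taken from the project code), the arithmetic can be checked as follows:

```python
# Counts copied from Table 1.
n_patients = 279
n_benign = 198_738
n_malignant = 78_786

# The two class counts should add up to the reported total of 277,524 images.
n_total = n_benign + n_malignant

# Share of each class, as a percentage of all images.
pct_benign = 100 * n_benign / n_total
pct_malignant = 100 * n_malignant / n_total

# Average number of image patches contributed by each patient.
avg_images_per_patient = n_total / n_patients

print(f"total images:       {n_total:,}")
print(f"benign share:       {pct_benign:.1f}%")
print(f"malignant share:    {pct_malignant:.1f}%")
print(f"images per patient: {avg_images_per_patient:.0f} (average)")
```

Roughly 72% of the images are benign, so a classifier that always predicts "benign" would already reach about 72% accuracy. That figure is a useful baseline when reading an accuracy score for this dataset.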

2. Distribution of data by diagnosis

IDC Images distribution
Figure 3: Bar Chart showing the distribution of different diagnosis classes in the IDC dataset.

3. Histology images

IDC Histology images
Figure 4: Benign and Malignant cells for 5 random patients.

Data Preparation

Modeling

Model Details

We have built a multi-layer Convolutional Neural Network (CNN) using Keras to perform image classification on the IDC dataset. Below are additional details about the model and its performance.
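
The exact architecture is not listed here, so the following is only a minimal sketch of what a multi-layer Keras CNN for this task could look like. The 50x50 RGB input size, the layer widths, the dropout rate, and the function name `build_idc_cnn` are all assumptions made for illustration, not the project's actual configuration.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_idc_cnn(input_shape=(50, 50, 3)):
    """Build a small binary image classifier (hypothetical architecture).

    Inputs are assumed to be RGB patches already scaled to the [0, 1] range.
    """
    model = keras.Sequential(
        [
            keras.Input(shape=input_shape),
            # Three convolution + pooling stages extract increasingly
            # abstract features while halving the spatial resolution.
            layers.Conv2D(32, 3, padding="same", activation="relu"),
            layers.MaxPooling2D(),
            layers.Conv2D(64, 3, padding="same", activation="relu"),
            layers.MaxPooling2D(),
            layers.Conv2D(128, 3, padding="same", activation="relu"),
            layers.MaxPooling2D(),
            # Classification head: flatten, one dense layer, dropout.
            layers.Flatten(),
            layers.Dense(128, activation="relu"),
            layers.Dropout(0.5),
            # Single sigmoid unit: estimated probability of the malignant class.
            layers.Dense(1, activation="sigmoid"),
        ]
    )
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_idc_cnn()
model.summary()
```

A single sigmoid output with binary cross-entropy is one common choice for a two-class problem; a two-unit softmax output with categorical cross-entropy would be an equally reasonable alternative.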

Model Performance

Table 2: Model performance

| Evaluation Metric | Value |
| --- | --- |
| Accuracy | 88% |
| Precision Score | 73% |
| Recall/Sensitivity Score | 74% |
| F1 Score | 74% |
| Specificity | 74% |
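
For reference, the metrics in Table 2 are all derived from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the counts passed to it are hypothetical and are not the project's results.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion-matrix counts.

    tp / fn: malignant images predicted malignant / benign.
    tn / fp: benign images predicted benign / malignant.
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)      # of predicted malignant, how many are malignant
    recall = tp / (tp + fn)         # sensitivity: of actual malignant, how many were found
    specificity = tn / (tn + fp)    # of actual benign, how many were found
    accuracy = (tp + tn) / total
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts, chosen only to illustrate the formulas.
example = binary_metrics(tp=80, fp=30, tn=170, fn=20)

# Accuracy is the prevalence-weighted average of recall and specificity.
prevalence = (80 + 20) / 300
weighted = prevalence * example["recall"] + (1 - prevalence) * example["specificity"]

for name, value in example.items():
    print(f"{name:<12}{value:.3f}")
print(f"{'weighted':<12}{weighted:.3f}")
```

Because accuracy is a weighted average of recall and specificity, it always lies between the two. That makes the last line a quick sanity check when reading any table of binary classification metrics.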

Figure 5: Accuracy and loss plots for the CNN model.


Confusion Matrix
Figure 6: Confusion Matrix.

Future scope

Substantial progress has been made in computer vision and image classification research over the last few decades. As a result, these techniques are now also used in health care and medical imaging to achieve faster and more accurate results by automating some of the diagnosis steps. It is impressive that, when sufficient data is provided to train the model, the results are on par with, or in some cases exceed, the manual evaluation performed by physicians.

Convolutional Neural Networks (CNNs) seem to be the most popular solution for computer vision, both in general and in healthcare. These models are used by several companies working on AI in medical imaging. It is also interesting that some studies and projects are applying computer vision to 3D images and AR (Augmented Reality).

Acknowledgement

Thanks to Bellevue University and all of its professors for their continuous guidance and support throughout the data science course. Thanks to Professor Fadi Alsaleem for providing continuous constructive feedback, and to my peers for the valuable input and discussions that helped me build this project.

I would also like to thank all the authors of the reference papers and articles.

Conclusion

Breast cancer is an increasing concern in today’s world; it is one of the most common forms of cancer in American women. Within breast cancer, Invasive Ductal Carcinoma (IDC) is the most common category, accounting for 85% of all breast cancer cases.

Substantial progress has been made in computer vision and image classification technology over the last few decades, making it possible to use these techniques in the healthcare domain for medical imaging and diagnosis. Any contribution that computer vision can make to the early detection of breast cancer would help reduce the mortality rate of cancer patients.

In this paper we took histology images of tissue from IDC patients and used them to train a Keras-based Convolutional Neural Network (CNN) to predict whether a particular histology image is benign or malignant. The CNN model classified the histology images with an accuracy score of 88%. We believe that using additional training data and allowing more training time would yield even better prediction accuracy.

References

  1. F. Milletari, N. Navab and S. Ahmadi, “V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation,” 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 2016, pp. 565-571, doi: 10.1109/3DV.2016.79. Retrieved May 9, 2021 from https://arxiv.org/pdf/1606.04797.pdf
  2. Junfeng Gao, Yong Yang, Pan Lin, Dong Sun Park, “Computer Vision in Healthcare Applications”, Journal of Healthcare Engineering, vol. 2018, Article ID 5157020, 4 pages, 2018. https://doi.org/10.1155/2018/5157020. Retrieved May 9, 2021 from https://www.hindawi.com/journals/jhe/2018/5157020/
  3. Nadim Mahmud, Jonah Cohen, Kleovoulos Tsourides, Tyler M. Berzin, Computer vision and augmented reality in gastrointestinal endoscopy, Gastroenterology Report, Volume 3, Issue 3, August 2015, Pages 179–184, https://doi.org/10.1093/gastro/gov027. Retrieved May 9, 2021 from https://academic.oup.com/gastro/article/3/3/179/613495
  4. Esteva, A., Chou, K., Yeung, S. et al. Deep learning-enabled medical computer vision. npj Digit. Med. 4, 5 (2021). https://doi.org/10.1038/s41746-020-00376-2. Retrieved May 9, 2021 from https://www.nature.com/articles/s41746-020-00376-2
  5. J. Thevenot, M. B. López and A. Hadid, “A Survey on Computer Vision for Assistive Medical Diagnosis From Faces,” in IEEE Journal of Biomedical and Health Informatics, vol. 22, no. 5, pp. 1497-1511, Sept. 2018, doi: 10.1109/JBHI.2017.2754861. Retrieved May 9, 2021 from https://www.researchgate.net/publication/320250581_A_Survey_on_Computer_Vision_for_Assistive_Medical_Diagnosis_From_Faces
  6. Chaohui Wang, Nikos Komodakis, Nikos Paragios. Markov Random Field Modeling, Inference & Learning in Computer Vision & Image Understanding: A Survey. Computer Vision and Image Understanding, Elsevier, 2013, 117 (11), pp. 1610-1627. doi: 10.1016/j.cviu.2013.07.004. hal-00858390v2. Retrieved May 9, 2021 from https://hal.archives-ouvertes.fr/hal-00858390/document
  7. Tim F. Cootes and Christopher J. Taylor “Statistical models of appearance for medical image analysis and computer vision”, Proc. SPIE 4322, Medical Imaging 2001: Image Processing, (3 July 2001); https://doi.org/10.1117/12.431093. Retrieved May 9, 2021 from https://www.spiedigitallibrary.org/conference-proceedings-of-spie/4322/0000/Statistical-models-of-appearance-for-medical-image-analysis-and-computer/10.1117/12.431093.pdf
  8. Sultana, F., Sufian, A., & Dutta, P. (2018, November). Advancements in image classification using convolutional neural network. In 2018 Fourth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN) (pp. 122-129). IEEE. Retrieved May 9, 2021 from https://arxiv.org/pdf/1905.03288.pdf
  9. O’Mahony, N., Campbell, S., Carvalho, A., Harapanahalli, S., Hernandez, G. V., Krpalkova, L., … & Walsh, J. (2019, April). Deep learning vs. traditional computer vision. In Science and Information Conference (pp. 128-144). Springer, Cham. Retrieved May 9, 2021 from https://arxiv.org/pdf/1910.13796.pdf
  10. Data-flair-training python projects. Project in Python – Breast Cancer Classification with Deep Learning. Retrieved May 9, 2021 from https://data-flair.training/blogs/project-in-python-breast-cancer-classification/
  11. Breast cancer statistics 2020. Retrieved May 9, 2021 from https://www.nationalbreastcancer.org/wp-content/uploads/2020-Breast-Cancer-Stats.pdf
  12. What is Breast Cancer? Retrieved May 9, 2021 from https://www.cancer.org/cancer/breast-cancer/about/what-is-breast-cancer.html
  13. United States Cancer Statistics. Retrieved May 9, 2021 from https://gis.cdc.gov/Cancer/USCS/DataViz.html
  14. Invasive Ductal Carcinoma (IDC). Retrieved May 9, 2021 from https://www.breastcancer.org/symptoms/types/idc
