Artificial Intelligence in Glaucoma: A Literature Review
Purpose:
This review explores the current clinical landscape, potential applications, and implementation challenges of artificial intelligence (AI) systems used to aid in glaucoma diagnosis, progression, and prediction.
Material and Methods:
A nonsystematic literature review was carried out between March 1, 2022 and March 25, 2022 in the PubMed database using search combinations of “artificial intelligence,” “deep learning,” “machine learning,” “glaucoma diagnosis,” and “glaucoma progression.”
Results:
Numerous AI systems have been developed which may provide insight into new disease biomarkers and which achieve high levels of sensitivity and specificity for glaucoma diagnosis and determination of disease progression using structural inputs (fundus photographs, spectral domain optical coherence tomography [SD-OCT]) and functional inputs (standard automated perimetry). These systems are currently being investigated for clinical application. Fundus photographs are currently the most commonly utilized input in AI systems. Diagnostic criteria vary between AI systems, training datasets may not be publicly available, and external validation of models may not be performed, all of which represent potential barriers to clinical application and utility.
Conclusions:
AI systems show significant potential to aid in the diagnosis of glaucoma and its progression, predict disease development and progression, and identify new biomarkers for disease development, with the aim of maximizing quality and accessibility of care, decreasing cost, and lowering health disparities. The lack of a definitive ground truth in glaucoma diagnosis and challenges in the explainability of complex AI systems pose regulatory and clinical application challenges.
Introduction
Despite advancements in diagnostic technology and widespread availability of effective treatments, glaucoma remains the leading cause of irreversible blindness worldwide.1 Glaucoma accounted for 3.6 million of the more than 33 million individuals aged 50 and older who were blind worldwide in 2020, and it was the second most common cause of sight loss certifications in England and Wales in 2018.1,2 While risk factors for the development of glaucoma, including elevated intraocular pressure, are important metrics to incorporate into clinical care, no single screening test exists for the diagnosis of glaucoma.3 Accurate diagnosis of glaucoma is challenging because it requires identification and application of multiple datapoints, evaluation of structural and functional features of the optic nerve, and incorporation and evaluation of underlying ocular, systemic, and familial risk factors by well-trained and experienced clinicians.4,5 Because glaucoma is asymptomatic until late in the disease course, early diagnosis and treatment to preserve functional vision throughout an individual’s life rely on preventative care measures, including comprehensive eye examinations.6 While access to eye care has improved globally, significant disparities persist. Among Americans at high risk for vision loss due to family history of eye disease, existing vision problems, age of 65 years or greater, and self-reported diabetes, only 59.9% were examined by an eye care professional on an annual basis.1,6
The prevalence of undiagnosed glaucoma is estimated to be up to 49% in the population and is associated with complex social, economic, and geographic factors which result in challenges in access to care.7,8 Overdiagnosis of glaucoma is also common, leading to unnecessary financial strain on healthcare resources.9,10 Current clinical and research efforts could be enhanced in several areas: improvement in accuracy of glaucoma diagnosis, earlier identification of disease progression, improvement of screening techniques, prediction of disease development and trajectory, and identification of new biomarkers for disease development which may inform pathophysiology and ultimately lead to identification of new treatment targets. Artificial intelligence (AI) systems have been successfully deployed in the fields of radiology and pathology and have the ability to inform and help overcome the challenges identified in glaucoma diagnosis and management while addressing global disparities in healthcare delivery, cost, quality, and accessibility of care.11-13 Areas of interest for application of AI systems include screening and diagnosis, evaluation and incorporation of data, workflow optimization, and disease prediction.11,13 AI-based technologies in healthcare can be considered an additional tool to assist clinicians in medical decision making.11,13 As additional risk factors, biomarkers, and ancillary imaging strategies are identified, developed, and utilized in a clinical environment, more sophisticated techniques are needed to evaluate, weigh, and incorporate each parameter to support clinicians in the evaluation of patients who have, or are suspected of having, glaucoma. This review describes the current landscape, contemporary developments, potential clinical applications, and challenges in implementation of AI systems in glaucoma diagnosis and management.
Material and Methods
The search for manuscripts included in this literature review was carried out between March 1 and March 25, 2022 using the database developed by the National Center for Biotechnology Information at the National Library of Medicine, PubMed (http://www.ncbi.nlm.nih.gov/pubmed). Search combinations included “artificial intelligence,” “deep learning,” “machine learning,” “glaucoma diagnosis,” and “glaucoma progression”.
Artificial Intelligence
Artificial intelligence is a branch of computer science concerned with systems which perform human-like tasks, and it encompasses machine learning and deep learning techniques.11,13 Machine learning (ML) refers to statistical models which learn from data, such as artificial neural networks (ANNs). ANNs analyze data through interconnected nodes with modifiable weights.11,13 Consider the example of using an ANN to identify a glaucomatous optic disc in a fundus image. First, a training dataset composed of a large number of labeled images is provided to the system, which analyzes multiple features of each image to determine the correct diagnosis, which is often binary in nature (i.e. “glaucoma” or “not glaucoma”). The model continues to develop and adjust input weights by retesting until the desired output is achieved.10 Following successful training, the model may be deployed to evaluate test, or unlabeled, data.
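The training process described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited studies: it assumes a single logistic unit in place of a full ANN, and the two input features (a cup-to-disc ratio and a normalized rim-thinning score), the toy values, the learning rate, and the function names are all hypothetical. Real systems learn their features directly from image pixels.

```python
import math


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(features, labels, lr=0.5, epochs=2000):
    """Fit the weights and bias of one logistic unit by gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = pred - y  # gradient of the log loss w.r.t. the pre-activation
            # Adjust each weight in proportion to its input and the error.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, x):
    """Return 1 ("glaucoma") or 0 ("not glaucoma") for one feature vector."""
    score = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return 1 if score >= 0.5 else 0


# Hypothetical labeled training data: [cup-to-disc ratio, rim-thinning score].
train_x = [[0.3, 0.1], [0.4, 0.2], [0.35, 0.15],
           [0.8, 0.7], [0.7, 0.8], [0.9, 0.6]]
train_y = [0, 0, 0, 1, 1, 1]

weights, bias = train(train_x, train_y)

# Deployment on unlabeled inputs that were not part of training.
print(predict(weights, bias, [0.9, 0.8]))
print(predict(weights, bias, [0.2, 0.05]))
```

The final two calls correspond to the deployment step, in which the trained model is applied to data it has not seen.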
Deep learning (DL) is an extension of the classical neural network which uses convolutional neural networks (CNNs) or deep neural networks (DNNs).11,13,14 DNNs have multiple intermediate internal, or hidden, layers, with each layer’s output being combined to provide input for the next successive layer. This allows for evaluation of a large number of traits, which is an optimal strategy for complex image analysis in eye care.11,13,14 The intended output of an AI system must be clearly defined and must reflect the model’s intended use.11 Clinically, the output of an AI system can be used to inform risk prediction and may result in suggesting specific ancillary tests due to risk of disease development. The output may express the likelihood, probability, or prediction of disease, or highlight areas of interest in a particular image.12 The model’s use may be intended to be autonomous, in which case the user receives an output of disease presence or staging, or it may be intended to be assistive, in which case the output is used by the clinician in the diagnosis or treatment of disease to assist in disease staging, screening, or diagnosis.11,12 In addition to having a clearly defined output and purpose, a well-performing model must be explainable in order to be translatable to a clinical setting: developers must include a thorough description of data selection, handling, and processing, offer transparency of training data characteristics, and describe how data derived from the system is stored and validated.12,13,15
Glaucoma diagnosis
Glaucoma is a group of chronic, progressive optic neuropathies which cause characteristic damage to retinal ganglion cells, identifiable as progressive retinal nerve fiber layer thinning and neuroretinal rim loss in a characteristic pattern which may be objectively documented using stereoscopic or monoscopic optic disc photographs.4,5 There is no single, gold-standard test for glaucoma detection.5,16 A specific glaucoma diagnosis, for example primary open angle glaucoma, is made through careful clinical evaluation of the anterior chamber angle, peripapillary area, optic disc, and nerve fiber layer by well-trained and skilled clinicians, supported by objective evaluation of quantifiable parameters using ancillary imaging techniques such as spectral domain optical coherence tomography (SD-OCT).4,5 The characteristic retinal ganglion cell damage which occurs in glaucoma leads to functional vision loss detectable through standard automated perimetry.4,5 Risk factors for the development and progression of primary open angle glaucoma continue to be identified and include elevated intraocular pressure, low corneal hysteresis, high myopia, optic disc hemorrhage, and low ocular perfusion pressure, in addition to systemic risk factors and familial features which impact risk assessment.4,5,16
Fundus photography in glaucoma
Monoscopic optic disc or fundus photographs represent a candidate on which to base widespread screening in large populations due to the technology’s overall low cost and ease of deployment when applied to a portable device, such as a smartphone camera.17 Screening tests require a high level of repeatability, sensitivity, and specificity to ensure that individuals with the disease are correctly identified (sensitivity) and that individuals who do not have the disease are correctly identified as healthy (specificity).13 The performance metric commonly utilized to measure how accurately a technology discriminates is the area under the receiver operating characteristic curve (AUROC), where a value approaching 1.0 identifies a near-perfect classifier.13 A current barrier to widespread deployment of screening using fundus photography is that subjective grading of optic nerve images must be performed by human experts, which is onerous and subject to interobserver variability.16,18 Monoscopic fundus photographs have been the most commonly utilized input for deep learning models for the detection of glaucoma.15 Deep learning models can be trained to learn the features detectable in a fundus image which are associated with the classification label of “glaucoma”. A recent meta-analysis compared the ability of various deep learning models to detect glaucoma with that of a varied group of clinicians, which included general ophthalmologists, ophthalmology residents in training, and glaucoma specialists, and found similar performance between the deep learning models and ophthalmologists and ophthalmologists in training.19 Use of stereoscopic rather than monoscopic fundus images in ML and DL models has not revealed a significant difference in the ability to detect glaucoma.15
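The three performance metrics defined above can be computed as in the following Python sketch. The labels, model scores, and 0.5 decision threshold are hypothetical and do not come from any cited study. The AUROC is calculated here through its rank interpretation, that is, the probability that a randomly chosen diseased eye receives a higher score than a randomly chosen healthy eye, which is equivalent to the area under the ROC curve.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # diseased, flagged
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # diseased, missed
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # healthy, flagged
    return tp / (tp + fn), tn / (tn + fp)


def auroc(labels, scores):
    """Fraction of (diseased, healthy) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth (1 = glaucoma) and model output scores.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(labels, predictions)
print(sens, spec)             # 0.75 0.75
print(auroc(labels, scores))  # 0.875
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUROC summarizes discrimination across all thresholds.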
Spectral domain optical coherence tomography in glaucoma
Spectral domain optical coherence tomography (SD-OCT) provides an objective evaluation of the retinal nerve fiber layer (RNFL) and ganglion cell layer (GCL) or ganglion cell complex (GCC), which are composed of the axons and cell bodies of retinal ganglion cells, respectively, and which are damaged in glaucoma.4,5 Limitations of SD-OCT are greatest in the evaluation of myopic eyes and eyes of individuals who are not well represented in normative reference databases.4,5,20 While individual devices differ in the specific metrics and images provided on a summary display of RNFL and GCC or GCL characteristics, all reports include comparison of an individual’s data to a normative reference database to identify statistically significant deviation. Very early glaucomatous damage or progression may be inherently subtle and may not meet the threshold for detection of statistical significance.20 Measurement of retinal nerve fiber layer thickness using commercially available SD-OCT systems relies on automated segmentation of the nerve fiber layer and macular thickness parameters such as GCL thickness.
Images must be carefully evaluated by the clinician because segmentation errors are likely and may lead to false-positive and false-negative results, increased test-retest variability, reduced ability to detect disease over time, and inaccurate diagnostic interpretation.21,22 Recently, a DL-based model has been developed to identify segmentation errors from SD-OCT optic disc scans in order to rapidly assess images for artifacts with high sensitivity and specificity.22 The use of segmentation-free SD-OCT data as the basis for DL models has the potential not only to discriminate glaucomatous from healthy eyes using RNFL analysis while eliminating errors in automated segmentation but also, through the application of class activation maps, or heat maps which identify areas of interest to the CNN, to identify new parameters important in disease detection.23 Identification of new parameters of clinical relevance may lead to improved understanding of the pathophysiology of glaucoma, which may in turn lead to the development of new treatment targets.23
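A class activation map can be illustrated with the short Python sketch below. It assumes one common formulation, in which the map is a weighted sum of the final convolutional feature maps and the weights are those connecting each feature map to the class of interest. Whether the cited work used this exact formulation is not stated here, and the 2 x 2 feature maps and class weights are hypothetical toy values.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps for one class (one weight per map)."""
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * cols for _ in range(rows)]
    for fmap, weight in zip(feature_maps, class_weights):
        for r in range(rows):
            for c in range(cols):
                cam[r][c] += weight * fmap[r][c]
    return cam


def normalize(cam):
    """Rescale a map to the range [0, 1] so it can be shown as a heat map."""
    flat = [v for row in cam for v in row]
    low, span = min(flat), max(flat) - min(flat)
    if span == 0:
        return [[0.0 for _ in row] for row in cam]
    return [[(v - low) / span for v in row] for row in cam]


# Hypothetical outputs of the last convolutional layer (two 2 x 2 maps)
# and the hypothetical weights linking each map to the "glaucoma" class.
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
class_weights = [0.5, 1.0]

heat = normalize(class_activation_map(feature_maps, class_weights))
print(heat)  # [[0.25, 0.0], [0.0, 1.0]]
```

In this toy case the lower-right region has the highest value, so it would be displayed as the area of greatest interest to the network.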
Standard automated perimetry in glaucoma
Functional damage due to glaucoma is determined through the use of standard automated perimetry.4,5 Challenges in repeatability and reliability of automated visual field testing due to patient-attention characteristics may lead to artifacts and unreliable results, resulting in delayed diagnosis or detection of progression.24,25 Logistical challenges of frequent visual field testing in a busy clinic environment, including the need for a separate, quiet space to perform testing and continuous technician oversight, may lead to testing being avoided or delayed, which in turn may delay glaucoma diagnosis or determination of progression.24,25 Identification of glaucoma progression can be challenging due to the typically slow progression of the disease and the need for evaluation and comparison of multiple datapoints over time.4,5 With currently available systems, the time required to detect glaucoma progression by visual field change in 80% of eyes considered to have rapidly progressive disease (mean deviation [MD] change of -2 decibels [dB]/year) is 3.3 years when visual field testing is performed once per year and improves to 2.4 years when testing is performed twice per year.26 In individuals who exhibit a more typical rate of disease progression, 0.5 dB/year, determination of progression may take up to 7.3 years.26 Frequency of visual field testing is individualized based on patient characteristics such as age, level of damage, test-retest variability, and previous history of progression.26 A recently described machine learning model applied to visual field parameters was able to detect disease progression 3 months earlier than traditional parameters.27
Considering that visual field testing is labor-intensive and heavily reliant on patient attention and test-taking ability, models which predict functional change, and may thereby aid in determining the optimal frequency of visual field testing, or which estimate functional parameters from objective imaging data would have high clinical utility. Modeling of parameters using data from automated visual field studies to predict visual field loss and disease progression has long been of interest and was made commercially available using linear regression models in the late 2000s.28 An ANN trained on visual field parameters and reported in 2013 was able to detect glaucoma with higher sensitivity and similar specificity to clinicians, exemplifying the challenges of visual field interpretation and the potential for improvement.29 More recent DL models have aimed to differentiate glaucomatous visual fields from normal visual fields, classify visual field defects, and identify visual field progression.30,31
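The linear regression approach to progression mentioned above can be sketched in Python as an ordinary least-squares fit of mean deviation against time. This is an illustration only and does not reproduce any commercial algorithm: the function name, the visit schedule, and the MD values are hypothetical, and the series is constructed to decline at exactly 2 dB per year.

```python
def md_slope(years, md_values):
    """Least-squares slope of mean deviation (dB) against time (years)."""
    n = len(years)
    mean_t = sum(years) / n
    mean_md = sum(md_values) / n
    # Covariance of time and MD divided by the variance of time.
    sxy = sum((t - mean_t) * (m - mean_md) for t, m in zip(years, md_values))
    sxx = sum((t - mean_t) ** 2 for t in years)
    return sxy / sxx


# Hypothetical visual field series: one test every six months for two years.
visit_years = [0.0, 0.5, 1.0, 1.5, 2.0]
mean_deviation = [-2.0, -3.0, -4.0, -5.0, -6.0]

slope = md_slope(visit_years, mean_deviation)
print(slope)  # -2.0 (dB per year)
```

Real visual field series contain test-retest variability, so the estimated slope is uncertain and more tests, or more frequent tests, are needed before a given rate of change can be detected with confidence.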
Machine-to-machine learning
Predicting metrics produced by one ancillary testing modality using an entirely different modality is described as “machine-to-machine” learning.26,32-34 Machine-to-machine learning allows features measured by high-technology structural and functional devices such as SD-OCT and automated visual field testing to be estimated using a lower-technology modality, for example monoscopic fundus photography.24,32-34 Retinal nerve fiber layer thickness has been shown to be accurately predicted from fundus photographs, and perimetric parameters have been shown to be accurately predicted from RNFL thickness measurements.24,32,33 Deployment of a relatively simple, potentially portable instrument capable of fundus imaging could thereby make it possible to estimate other data points which traditionally require nonportable, expensive systems, which would provide a significant opportunity within resource-limited regions.
Identification of new biomarkers
Primary open angle glaucoma is a heritable condition, and additional risk factors for disease development, including elevated intraocular pressure and ethnicity, are also heritable traits.35 AI systems have the ability to process large amounts of information and identify trends in data which may be associated with disease presence or progression. In the majority of patients who develop glaucoma, disease development is due to complex genetics: the interaction of many genes may lead to disease development, where each gene contributes a small amount of risk but no gene causes disease on its own.35 A genome-wide meta-analysis has identified 127 risk loci which may contribute to understanding of disease pathogenesis, influence biological pathways which contribute to disease development, or contribute to disease-associated features including elevated intraocular pressure.35 Recently, an additional 3 genes which represent diagnostic markers of glaucoma have been identified using a deep-learning model, and one gene in particular, ENO2, which is involved in protein coding, has been identified as a potential target for therapy.36 Future applications of genetic evaluation to the clinical care of patients who have, or are suspected of having, glaucoma may allow for the development of a polygenic risk score for an individual which takes into account the impact of all strong and weak genetic risk factors for disease development, which would provide an additional metric to be considered in glaucoma risk evaluation.
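A polygenic risk score is commonly computed as a weighted sum of risk-allele counts, where each weight is the effect size estimated for that locus (for example a log odds ratio). The Python sketch below illustrates the arithmetic under that assumption. The locus names, effect sizes, and genotype are hypothetical placeholders and are not drawn from the cited meta-analysis.

```python
def polygenic_risk_score(allele_counts, effect_sizes):
    """Sum of (risk-allele count x effect size) over all genotyped loci."""
    return sum(effect_sizes[locus] * count
               for locus, count in allele_counts.items())


# Hypothetical per-locus effect sizes (e.g. log odds ratios).
effect_sizes = {"locus_A": 0.25, "locus_B": 0.5, "locus_C": 0.125}

# Hypothetical genotype: number of risk alleles (0, 1, or 2) at each locus.
allele_counts = {"locus_A": 2, "locus_B": 1, "locus_C": 0}

score = polygenic_risk_score(allele_counts, effect_sizes)
print(score)  # 1.0
```

Because each locus contributes only a small amount, the score becomes informative only when many loci are combined and the result is compared with the distribution of scores in a reference population.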
Challenges in AI and glaucoma
Establishment of a reference “ground truth” for glaucoma diagnosis and progression is necessary for any AI system to use as a reference standard.15,25,30 The complexity of evaluating and interpreting structural and functional parameters, and the resulting lack of an international consensus standard for the diagnosis of glaucoma, require researchers to establish their own definition of glaucoma, with some studies basing disease presence on cup-to-disc ratio alone.15,25 Additionally, protocols for the development of grading criteria and labeling of datasets are often unclear.15 While many models focus on binary detection, primary open angle glaucoma represents a spectrum of disease from clinical glaucoma suspect through early manifest disease to advanced disease.4,5,25 The resulting variability between studies in the diagnostic criteria used to define glaucoma may impact performance of DL models when applied to real-world populations or during external validation.15,25,30 A potentially significant barrier to regulatory approval and clinician uptake of DL models to assist in clinical decision-making is the “black box” phenomenon which occurs between the model inputs and outputs.14 For regulators, patients, and physicians to accept new technology, there must be clear understanding and explainability of the purpose and limitations of the model and the ability to define the type and purpose of the output produced by the model.12,30
Advances and developments in DL models in glaucoma have largely been based on private datasets.15 The training dataset should contain broad representation of the individuals for whom the model is likely to be deployed, including individuals with a variety of ocular and systemic pathologies, as well as images of a wide range of quality.12,25 Stringent inclusion and exclusion criteria in studies evaluating DL models in glaucoma may lead to overestimation of the sensitivity and specificity of disease detection achievable in a real-world population.25 External validation of a model has been suggested to determine the ability of the model to perform outside of a controlled setting; however, validation reference data is often based on clinician-derived standards or normative reference database data, which may introduce additional sources of error and bias into the model.12,15
Conclusions
AI systems show significant promise in their ability to screen populations, aid in diagnosis of disease, predict disease development, identify disease progression earlier, and identify new biomarkers for disease development which may provide insight into underlying pathophysiology, and ultimately to maximize efficiency and optimize workflow in clinical practice. Current systems primarily focus on individual disease states and binary detection, which limits clinical utility. Looking ahead, complex deep learning models which have the ability to “multi-task” and evaluate multiple disease states and datapoints will further improve clinical application of AI systems. Researchers developing AI systems must be able to clearly define the model’s purpose and output specifications and to describe data processing, handling, and validation with a high level of transparency. Deep learning and its application to the clinical care of individuals who have, or are suspected of having, glaucoma should be regarded as an additional tool to aid clinical decision making and to support the clinical skills necessary to provide high-level care for an aging population with increasing life expectancy, in order to maximize quality and accessibility of care, decrease cost, and lower health disparities.
Palotie A., van Duijn C., Haines J. L., Hammond C., Pasquale L. R., Klaver C. C. W., Hauser M., Khor C. C., Mackey D. A., Kubo M., Cheng C. Y., Craig J. E., MacGregor S., Wiggs J. L. (2021). Genome-wide meta-analysis identifies 127 open-angle glaucoma loci with consistent effect across ancestries. Nat. Commun., 12, 1258.