Artificial Intelligence May Fall Short When Analyzing Data Across Multiple Health Systems
Study shows deep learning models must be carefully tested across multiple environments before being put into clinical practice.
Artificial intelligence (AI) tools trained to detect pneumonia on chest X-rays suffered significant decreases in performance when tested on data from outside health systems, according to a study conducted at the Icahn School of Medicine at Mount Sinai and published in a special issue of PLOS Medicine on machine learning and health care. These findings suggest that artificial intelligence in the medical space must be carefully tested for performance across a wide range of populations; otherwise, the deep learning models may not perform as accurately as expected.
Interest is growing in the use of computer system frameworks called convolutional neural networks (CNNs) to analyze medical imaging and provide computer-aided diagnosis, but recent studies have suggested that AI image classification may not generalize to new data as well as commonly portrayed.
Researchers at the Icahn School of Medicine at Mount Sinai assessed how AI models identified pneumonia in 158,000 chest X-rays across three medical institutions: the National Institutes of Health; The Mount Sinai Hospital; and Indiana University Hospital. Researchers chose to study the diagnosis of pneumonia on chest X-rays for its common occurrence, clinical significance, and prevalence in the research community.
In three out of five comparisons, CNNs’ performance in diagnosing diseases on X-rays from hospitals outside of their own network was significantly lower than on X-rays from the original health system. However, CNNs were able to detect the hospital system where an X-ray was acquired with a high degree of accuracy, and cheated at their predictive task based on the prevalence of pneumonia at the training institution. Researchers found that one difficulty of using deep learning models in medicine is that they use a massive number of parameters, which makes it challenging to identify the specific variables driving predictions, such as the types of CT scanners used at a hospital and the resolution quality of imaging.
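To illustrate how a model can "cheat" on prevalence, the following minimal sketch uses entirely hypothetical numbers (two made-up sites with 30% and 5% pneumonia prevalence; these are not figures from the study). The hypothetical "model" ignores the image and simply outputs the pneumonia rate of the hospital it recognizes. It appears to have predictive value on pooled data, yet performs at chance within each site.

```python
# Hypothetical illustration only: the site sizes and prevalences below are
# assumptions for this sketch, not data from the study.

def auc(scores, labels):
    """Probability that a random positive case outranks a random negative case
    (ties count as half), i.e. the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def make_site(n_positive, n_negative):
    """Return labels for one site and the score a 'site-only' model would give:
    every X-ray from the site receives that site's pneumonia prevalence."""
    labels = [1] * n_positive + [0] * n_negative
    prevalence = n_positive / len(labels)
    scores = [prevalence] * len(labels)
    return scores, labels

# Site A: 30 of 100 X-rays show pneumonia; Site B: 5 of 100 (assumed values).
scores_a, labels_a = make_site(30, 70)
scores_b, labels_b = make_site(5, 95)

# Within a single site the model gives every X-ray the same score: chance level.
site_a_auc = auc(scores_a, labels_a)
site_b_auc = auc(scores_b, labels_b)

# Pooled across sites, knowing the hospital alone looks like diagnostic skill.
pooled_auc = auc(scores_a + scores_b, labels_a + labels_b)

print(f"Site A AUC: {site_a_auc:.3f}")
print(f"Site B AUC: {site_b_auc:.3f}")
print(f"Pooled AUC: {pooled_auc:.3f}")
```

In this toy setup the pooled score rises well above chance even though the model has learned nothing about pneumonia itself, which is why evaluating within each site, and on sites the model has never seen, gives a more honest picture.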
“Our findings should give pause to those considering rapid deployment of artificial intelligence platforms without rigorously assessing their performance in real-world clinical settings reflective of where they are being deployed,” says senior author Eric Oermann, MD, Instructor in Neurosurgery at the Icahn School of Medicine at Mount Sinai. “Deep learning models trained to perform medical diagnosis can generalize well, but this cannot be taken for granted since patient populations and imaging techniques differ significantly across institutions.”
“If CNN systems are to be used for medical diagnosis, they must be tailored to carefully consider clinical questions, tested for a variety of real-world scenarios, and carefully assessed to determine how they impact accurate diagnosis,” says first author John Zech, a medical student at the Icahn School of Medicine at Mount Sinai.
This research builds on papers published earlier this year in the journals Radiology and Nature Medicine, which laid the framework for applying computer vision and deep learning techniques, including natural language processing algorithms, for identifying clinical concepts in radiology reports for CT scans.
About the Mount Sinai Health System
Mount Sinai Health System is one of the largest academic medical systems in the New York metro area, with more than 43,000 employees working across eight hospitals, over 400 outpatient practices, nearly 300 labs, a school of nursing, and a leading school of medicine and graduate education. Mount Sinai advances health for all people, everywhere, by taking on the most complex health care challenges of our time — discovering and applying new scientific learning and knowledge; developing safer, more effective treatments; educating the next generation of medical leaders and innovators; and supporting local communities by delivering high-quality care to all who need it.
Through the integration of its hospitals, labs, and schools, Mount Sinai offers comprehensive health care solutions from birth through geriatrics, leveraging innovative approaches such as artificial intelligence and informatics while keeping patients’ medical and emotional needs at the center of all treatment. The Health System includes approximately 7,300 primary and specialty care physicians; 13 joint-venture outpatient surgery centers throughout the five boroughs of New York City, Westchester, Long Island, and Florida; and more than 30 affiliated community health centers. We are consistently ranked by U.S. News & World Report's Best Hospitals, receiving high "Honor Roll" status, and are highly ranked: No. 1 in Geriatrics and top 20 in Cardiology/Heart Surgery, Diabetes/Endocrinology, Gastroenterology/GI Surgery, Neurology/Neurosurgery, Orthopedics, Pulmonology/Lung Surgery, Rehabilitation, and Urology. New York Eye and Ear Infirmary of Mount Sinai is ranked No. 12 in Ophthalmology. U.S. News & World Report’s “Best Children’s Hospitals” ranks Mount Sinai Kravis Children's Hospital among the country’s best in several pediatric specialties. The Icahn School of Medicine at Mount Sinai is one of three medical schools that have earned distinction by multiple indicators: It is consistently ranked in the top 20 by U.S. News & World Report's "Best Medical Schools," aligned with a U.S. News & World Report "Honor Roll" Hospital, and top 20 in the nation for National Institutes of Health funding and top 5 in the nation for numerous basic and clinical research areas. Newsweek’s “The World’s Best Smart Hospitals” ranks The Mount Sinai Hospital as No. 1 in New York and in the top five globally, and Mount Sinai Morningside in the top 20 globally.