How We Successfully Used Unstructured Data in Healthcare Analytics

by Damian Mingle on August 1, 2017 at 2:14 PM

All healthcare data is “dirty” from an analytic standpoint.

While precision medicine is progressing today due to a combination of advanced analytics and physician expertise, the new frontier in this field is the analysis of unstructured data.

We know that most medical information resides in unstructured form, with the most common being clinical notes and images. However, most of the analytics in healthcare today focus on structured data, typically from hospital EMR and claims/reimbursement related systems.

Data is everywhere, and to a computer an image is ultimately no different from any other set of 0s and 1s. But many data scientists do not touch imaging because it is layered and has a spatial aspect; it is neither flat nor labeled. A computer reads this kind of data differently and can retain the memory of what is being viewed.

So we decided to break the mold and explore this new frontier by examining a specific case in which images were used effectively in vision care, and by considering how images could be used across a wide range of other situations and use cases.

Analysis of Images for Diabetic Retinopathy

Diabetic retinopathy is a condition suffered by diabetics that causes progressive damage to the retina. A classification system used in these cases divides the retina into quadrants and counts hemorrhages, exudates, neovascularization and other defects. Counting hemorrhages is an objective measurement, but classification by visual analysis is subject to human error.

This causes a great deal of variance. The premise here is that a data science approach, driven by technology, could reduce or eliminate this variance by enabling the analysis of a large number of images to more accurately detect patterns that can inform a physician’s diagnosis.

Our starting data set came from a picture archiving and communication system (PACS). We evaluated diabetic retinopathy by analyzing retinal images that gave early warning of disease progression. Using the industry-standard five-part scoring system, we evaluated each patient’s degree of risk and then compared the results with the physician’s original diagnosis.
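To make the comparison step concrete, here is a minimal sketch of how software-assigned grades could be checked against a physician's grades. It is an illustration only, not the tooling used on the project: the function names are hypothetical, the grades are assumed to be integers from 0 (no disease) to 4, and the sample values are invented purely to exercise the code.

```python
def agreement_rate(model_grades, physician_grades):
    """Fraction of patients where both grades match exactly."""
    if not model_grades or len(model_grades) != len(physician_grades):
        raise ValueError("grade lists must be non-empty and the same length")
    matches = sum(m == p for m, p in zip(model_grades, physician_grades))
    return matches / len(model_grades)


def disagreements(model_grades, physician_grades):
    """Return (patient_index, model_grade, physician_grade) for each mismatch."""
    return [
        (i, m, p)
        for i, (m, p) in enumerate(zip(model_grades, physician_grades))
        if m != p
    ]


# Invented sample grades (0 = no disease, 1-4 = disease states); not real data.
model = [0, 2, 1, 4, 0, 3]
physician = [0, 2, 2, 4, 0, 1]

rate = agreement_rate(model, physician)
flagged = disagreements(model, physician)
print(f"agreement: {rate:.2f}, cases to review: {flagged}")
```

In a setup like this, the mismatched cases are the interesting ones: they are the images a physician would want to look at again.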

Most ophthalmologists see about 5,000 patients per year (or 10,000 eyes). In this case, the analysis included a half million images that were used to train a computer to recognize the disease stages by “looking at” a more expansive view of the image and discounting errors. In this way, 50 years of clinical experience were compressed into a single 24-hour period.
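The "50 years" figure follows directly from the numbers above. The short calculation below spells it out; the only assumption added here is that one image is treated as roughly equivalent to one eye examined.

```python
# Figures quoted in the post.
patients_per_year = 5_000
eyes_per_year = patients_per_year * 2   # 10,000 eyes
training_images = 500_000               # "a half million images"

# Assumption for illustration: one image counts as one eye examined.
equivalent_years = training_images / eyes_per_year
print(equivalent_years)  # 50.0
```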

The enhanced ability to analyze a vast number of images using software greatly improved the quality of diagnosis. These benefits can be extended to other use cases in radiology in general.

The Data Science Principles and Approach to the Problem

Although human beings learn over time, gathering knowledge and applying it, a computer retains everything that is learned and continues to evolve at a staggering rate.

From a data science perspective, we treated the project as a multi-class classification problem. In this case there were five classes: four disease states and one “no disease” state. The computer was trained to identify disease states based on the images. The nature of the problem required the team to build a neural network.
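For readers less familiar with multi-class classification, the sketch below shows the final step such a model typically performs: turning five raw scores into probabilities with a softmax and picking the most likely class. It is a generic illustration rather than the team's network; the class names and the example scores are placeholders, since the post only specifies one "no disease" class and four disease states.

```python
import math

# Placeholder label names: one "no disease" class plus four disease states.
CLASSES = ["no disease", "stage 1", "stage 2", "stage 3", "stage 4"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores):
    """Return the most probable class name and the full probability list."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


# Made-up scores standing in for a network's output on one image.
label, probs = predict([0.2, 1.1, 3.0, 0.4, -0.5])
print(label)  # stage 2
```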

Initially, we were given a relatively small set of about 35,000 images. Neural networks need large data sets so that they have more opportunities to learn what a problem looks like, so we expanded the data set synthetically to train the computer to see an “infinite” number of situations. This approach enabled the physician to benefit from computerized corroboration based on a better process and greater history.
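One common way to expand an image data set synthetically is to generate flipped and rotated copies of each image. The post does not say which transformations were used, so the sketch below is an assumption chosen for illustration: it represents an image as a plain list of pixel rows and produces eight variants (four rotations, each with a mirrored copy).

```python
def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augment(img):
    """Return the four rotations of img, each followed by its mirrored copy."""
    variants = []
    current = img
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate_90(current)
    return variants


# A tiny 2x2 "image" stands in for a retinal photograph.
tiny = [[1, 2], [3, 4]]
variants = augment(tiny)
print(len(variants))  # 8
```

Applied to every image, a scheme like this would turn 35,000 originals into 280,000 training examples; real pipelines often add small random shifts, crops, or brightness changes on top.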

Project Execution and What We Learned from the Process

Several conclusions and business benefits were derived from this unique project:

  1. Software developed from this use case could be extended and applied to imaging software (along with pre-authorization software, for example). In a real-time context, the provider could actually use this to validate and ensure their impression is accurate. It creates a real opportunity to enhance quality so that treatment matches diagnosis.
  2. In rural settings, physicians can read images with software and obtain “second opinions” without requiring patients to go anywhere else.
  3. This can be applied to high-volume radiology use cases such as mammography that typically require a dedicated radiologic resource. Most radiologists read everything from broken bones to cancer scans. With a computer analysis providing expertise that a single radiologist cannot gather in a lifetime, physician training is enhanced and mistakes minimized. Of course, a physician review would still be a necessity.

The use of unstructured data is a new frontier in healthcare data analytics. The analysis of images, in particular, requires specialized skills and advanced software. Typically, the computing infrastructure has to be more robust than in standard environments. Since images form a significant part of patient medical history, it is important to explore this frontier and expand to other use cases such as radiology departments.

The ability to rely on a computer to process large volumes of images and approve the results is a game changer in precision medicine. The resulting ability to scale would be tremendous and the accuracy would certainly be improved as well. Finally, the cost savings could also be significant.

Ultimately, the benefits will be in the form of early detection and treatment, which will result in improved quality of care and lower costs of care for patient populations.

This post was written by Damian Mingle

Damian Mingle is Chief Data Scientist for Intermedix. In this role Mr. Mingle works with a team of experts to extract knowledge from data. Prior to Intermedix, Mr. Mingle held positions with companies like WPC Healthcare, Hospital Corporation of America (HCA), Coventry Healthcare (an Aetna Company), and Morgan Stanley. As a leading authority on healthcare data science, Mr. Mingle speaks nationally and internationally on patient safety, global health, and applied data science. His current board service includes the Neural Network and JUMP. Mr. Mingle’s work has been published in The International Clinical Pathology Journal, Current Trends in Biomedical Engineering & Biosciences, and The International Journal of Biomedical Data Mining. He is ranked in the top 1 percent globally as a data scientist through regular competitions. Mr. Mingle was the 2015 NTC Data Scientist of the Year, 2016 Sepsis Hero Finalist, and the 2017 Best Use of AI in Food, Health, and Medicine for machine learning innovation in the Emergency Department.