
How Subject Matter Experts Are Vital to Data Science Success

by John Enderle on September 18, 2017 at 6:11 PM


People sometimes get antsy when they hear about big data, machine learning or data science, thinking that these concepts were developed by people who want to replace years of nuanced experience with cold, rule-bound, un-nuanced machines.

Naturally, this belief has led to worries about workers being replaced and losing their jobs. There are also concerns that machine methods, once they replace experts, will never acquire or leverage the key experience that experts have. I believe that should not be the case. At Intermedix, the approach our data science teams take isn’t about replacing experts at all. On the contrary, we confer with experts during almost every stage of our process.

Start with an Expert “Gut-Check”

From the very start of a project, we consult with experts: coders, doctors and business experts of many sorts. We need to know how we can be useful and what value we can provide. If we answer a certain question, how much would that help? Anything that helps us identify areas of value is noted and conveyed to our client. We want to make sure we are not wasting our time answering unnecessary questions or working on problems that already have established solutions.

Additionally, we need to understand what’s possible and what has already been tried. To use an analogy to weather, we’d want to know that predicting rainfall or temperature on a specific day more than a couple of weeks in advance can’t really be done, other than by knowing the long-term averages. And we wouldn’t want to predict something that’s completely random. We might be able to do something with how diseases and epidemics spread, but if we were to go down that road, we would do so with an awareness of all of the great work that has already been done in that field; we’d work with those experts. Most importantly, we would want to have a clear idea of how our technologies, capabilities, resources or data could give us new abilities.

When it comes to new abilities, we want to be groundbreaking. But to do that, we need to have a good understanding of our own capabilities, what’s already been done and how we might bring them together. For example, the main advantage that computers hold over people is the ability to look at large volumes of data in a small amount of time. If you give a person and a computer the same information, the human is usually going to have a better understanding of that information than the computer. In practical terms, though, computers often have an advantage, because they effectively have access to more information. A computer can’t read the way a person does, but in the time it takes a person to read one page, the computer can process hundreds of thousands of pages.

Apply Feature Engineering

Of course, for this to really be an advantage, you can’t just have a huge chunk of data – that data needs to actually contain something useful. Once again, experts come in. It’s important to understand that data science work isn’t all about writing new code, developing new algorithms or doing complex math. We do those things when it’s appropriate, but much more of our work at Intermedix is concerned with applying existing methodologies, or slight variations on them, to tackle new problems. The bulk of that work is something we call “feature engineering.” Feature engineering is the process of taking large chunks of raw data and filtering, combining and transforming it to end up with well-formatted data the algorithms can efficiently use.
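To make the filter, combine and transform steps concrete, here is a minimal Python sketch. It is an illustration only, not Intermedix’s actual pipeline: the record fields (patient_id, visit_date, birth_year, code, valid), the sample rows and the build_features helper are all invented for this example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw data: one row per clinic visit. Every field and value
# here is invented for illustration.
raw_visits = [
    {"patient_id": "p1", "visit_date": date(2017, 1, 5), "birth_year": 1950, "code": "J44", "valid": True},
    {"patient_id": "p1", "visit_date": date(2017, 3, 2), "birth_year": 1950, "code": "I10", "valid": True},
    {"patient_id": "p2", "visit_date": date(2017, 2, 11), "birth_year": 1988, "code": "J44", "valid": False},
    {"patient_id": "p2", "visit_date": date(2017, 4, 20), "birth_year": 1988, "code": "E11", "valid": True},
]


def build_features(visits, as_of):
    """Turn per-visit rows into one feature record per patient."""
    # Filter: drop rows that were flagged as invalid upstream.
    clean = [v for v in visits if v["valid"]]

    # Combine: group the remaining rows by patient.
    by_patient = defaultdict(list)
    for v in clean:
        by_patient[v["patient_id"]].append(v)

    # Transform: summarize each patient's rows into model-ready numbers.
    features = {}
    for pid, rows in by_patient.items():
        last_visit = max(r["visit_date"] for r in rows)
        features[pid] = {
            "age": as_of.year - rows[0]["birth_year"],  # approximate age in years
            "visit_count": len(rows),
            "distinct_codes": len({r["code"] for r in rows}),
            "days_since_last_visit": (as_of - last_visit).days,
        }
    return features


features = build_features(raw_visits, as_of=date(2017, 9, 18))
for pid, row in sorted(features.items()):
    print(pid, row)
```

Which columns are worth building is exactly where expert input matters; the code only does the mechanical part.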

Every piece of domain knowledge and expertise makes a real difference here. What diseases are communicated and how? Knowing whether transmission is via airborne methods, direct contact or some other means can make a difference. What environmental factors have an impact? What process does coding data go through by the time it gets to us? Which different ongoing trends link to each other? Some of this kind of information we would know anyway; for example, smoking is strongly linked to lung cancer and the risk of heart disease increases with age. However, there’s an awful lot that we wouldn’t hope to capture without consulting experts.

Once we’ve consulted with experts, we use our own expertise to pull in other kinds of data. Some of this data we may include based on our experience of its general usefulness. Yet the majority of the data we use is chosen by combining our expertise with that of our experts. One major case is where there are known links to whatever it is we’re trying to predict, but we don’t have access to the data that captures them. For instance, if we’re trying to look for a lung condition where we know asbestos is a key contributing factor, it’s very unlikely that we would routinely have data on asbestos contact. This is where our knowledge as data scientists can combine with that of our experts: we can look at what data we do have access to and work out how to link it back to the likelihood of contact with asbestos, which can in turn get us to our target.
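As a rough sketch of that idea, the Python below derives a stand-in (proxy) feature for asbestos contact from data we might plausibly hold, such as occupation and years in the role. The occupations, the weights, the 20-year cap and the asbestos_proxy function are all hypothetical placeholders; in practice the mapping would come from the experts.

```python
# Hypothetical expert-supplied weights: a rough relative likelihood of
# asbestos contact by occupation. All values are invented for illustration.
EXPERT_OCCUPATION_RISK = {
    "shipyard worker": 0.9,
    "insulation installer": 0.8,
    "construction": 0.5,
    "office": 0.05,
}
DEFAULT_RISK = 0.1  # assumed fallback for occupations the experts did not rate


def asbestos_proxy(occupation, years_in_role):
    """Estimate a 0-1 exposure proxy from occupation and tenure."""
    base = EXPERT_OCCUPATION_RISK.get(occupation.lower(), DEFAULT_RISK)
    # Scale by tenure, capped at 1.0 so very long careers do not dominate.
    tenure = min(years_in_role / 20.0, 1.0)
    return round(base * tenure, 3)


print(asbestos_proxy("Shipyard worker", 10))
print(asbestos_proxy("office", 40))
```

The resulting number is not a measurement of exposure; it is only a feature the model can use in place of data we do not have.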

Vet Model Accuracy

Experts are also useful once we have models built. In many situations, we can use experts to help evaluate the results that we’re getting. Often, we’re looking not so much to predict the future as to categorize the past. In a lot of problems like this, human experts can give us the right answers fairly definitively, but it’s just not practical to have people look at thousands upon thousands of examples. In cases like this, Intermedix will often have our human experts on standby to check a few dozen cases, make sure that our results are what we expect them to be and ensure that everything is calibrated properly.
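A spot check like this can be sketched in a few lines of Python. This is a simplified illustration under assumptions of my own: the function names, the sample size of 30 and the label format are invented, and a real review would also look at which cases disagree, not just how many.

```python
import random


def sample_for_review(case_ids, n=30, seed=0):
    """Pick a reproducible random sample of case IDs for experts to check."""
    rng = random.Random(seed)
    return rng.sample(case_ids, min(n, len(case_ids)))


def agreement_rate(model_labels, expert_labels):
    """Fraction of expert-reviewed cases where the model gave the same label."""
    shared = [cid for cid in expert_labels if cid in model_labels]
    if not shared:
        raise ValueError("no cases were labeled by both the model and the experts")
    matches = sum(model_labels[cid] == expert_labels[cid] for cid in shared)
    return matches / len(shared)


# Hypothetical usage: 5,000 categorized cases, 30 sent to experts.
all_cases = [f"case-{i}" for i in range(5000)]
to_review = sample_for_review(all_cases)
print(len(to_review), "cases selected for expert review")
```

Sampling at random, rather than picking the cases the model is most sure about, keeps the check honest.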

The other service that our experts provide is helping us make sure that what our models are telling us makes sense. This is not quite as important as getting the models to work accurately, but in most cases it’s close. Spitting out a bunch of numbers, even if they’re accurate, doesn’t necessarily help much. Usually, we want to take our predictions and use them to drive some further action. Accurately knowing which cases are high risk helps us here, but knowing why, and which factors led to that determination, can often help us understand what actions to take.
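One simple way to surface the “why” is to use a model whose score breaks down into per-factor contributions. The sketch below uses a plain weighted sum for that purpose; it is not a description of the models Intermedix uses, and the factor names and weights are invented for illustration.

```python
# Hypothetical weights for a simple additive risk score (invented values).
WEIGHTS = {"age_over_65": 1.2, "smoker": 1.5, "prior_admissions": 0.4}


def explain_score(case):
    """Return the total score and each factor's contribution, largest first."""
    contributions = {name: WEIGHTS[name] * case.get(name, 0) for name in WEIGHTS}
    total = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return total, ranked


total, ranked = explain_score({"age_over_65": 1, "smoker": 1, "prior_admissions": 3})
print(f"risk score: {total:.2f}")
for name, value in ranked:
    print(f"  {name}: {value:+.2f}")
```

A ranked list like this is something an expert can read and challenge, which is the point of the sanity check described above.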

Moving forward, then, it is important to remember that no matter what technologies evolve to assist in job performance or analytics, our experts are not obsolete – they are and will continue to be critical to innovation and business success.


This post was written by John Enderle

John Enderle is a Data Scientist at Intermedix.