In part 1 of this four-part blog series, I invited you, the reader, to join me on a data-driven journey: Exploratory Analysis and Discovery in the OR. The goal was to discover whether there were any business process insights that may be leveraged from the hospital supply chain at the point of use. I also hinted at several analytical steps (descriptive, diagnostic, predictive, and prescriptive) that such an endeavor typically goes through to solve an impending business problem.
The Surgical Recipe
Preference card management is a well-known healthcare industry process. Whether your expertise falls in a clinical or logistics domain, if you are a healthcare professional in a perioperative setting, you know exactly what I am talking about. If you are not, it simply means that each physician performing specific surgical procedures has their own ‘preference’ of products for those types of interventions that they perform regularly. Just like famous chefs creating their signature culinary masterpieces have their own recipes, types of ingredients, and tool preferences, surgeons follow a similar approach. The rationale is that what they have been accustomed to using during their training is likely what they will stick to during their professional life. Sure sounds like a sensible process.
So Where Is the Business Problem?
Well, apparently across hundreds of surgeons in a single hospital system, or integrated delivery network (IDN), things can get out of hand quickly. This is because each surgeon has their own procedure-specific preference cards, performs a variety of procedures, and uses all sorts of products from several different vendors. And this happens day in and day out.
I am curious to know what ‘getting out of hand’ means. To what extent is this an issue? Finding out ‘what happened?’ seems like a great first step. The key lies with the Descriptive Analytics that I referred to in part 1. Observing and detecting phenomena that have taken place in the perioperative setting will very likely give us a better understanding of what actually happened in the past, as represented by the data.
One of the key premises of any analytical activity is obtaining the data. It’s the cornerstone of any machine learning endeavor. Without it, this exercise is a non-starter. Getting it into my hands in the first place, however, seems to be far more difficult than anticipated. It almost feels like pulling teeth.
This means that I must first clear all the legal hurdles to ensure that I have permission to obtain the data, which may include scrutinizing and agreeing to the terms and conditions that a hospital system requires before it divulges its data for analysis.
However, it does not end there. Afterwards come the transparency and privacy concerns. Working in the medical field, one is acutely aware of the HIPAA mandate to keep Protected Health Information (PHI) anonymous. Guarding PHI means that no personally identifiable patient information should ever be accessible to anyone who does not have the legal or medical right to see it. Of course, this includes the data scientist who has set out to discover perioperative product usage: in this context, me.
This necessitates that all PHI-related data be scrubbed on site at the medical facility before it ever comes into my possession. While PHI may not be necessary (or legal, for that matter) for preference card and product usage analysis, surgeon-specific information is. This is where the personally identifiable information (PII) of surgeons, and the privacy surrounding it, comes into play. This is especially true when competitive anonymity comes into question for those who are analyzing the data. Yet merely scrubbing data elements, such as those related to the surgeon’s identity, would not only throw away the potential for valuable insight but also break the relationships and consistency that hold the data set together.
Data Engineering: The Challenges Continue to Roll in…
The last point in the previous section means that hospital IT staff must not only scrub patient information (PHI) but also anonymize all surgeon-specific information (PII), rather than simply delete it. Subsequently, the data needs to be prepared, exported, packaged, and made ready to be sent over. Another technical hurdle is the logistics of transporting massive amounts of data. Given the typical size of these enterprises, this is most likely on the order of hundreds of gigabytes.
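To make the difference between scrubbing and anonymizing concrete, here is a minimal Python sketch of one possible approach: replacing each surgeon identifier with a keyed hash, so the same surgeon always maps to the same opaque token and the relationships in the data survive. The field names, the sample records, the `SURG-` prefix, and the key are all hypothetical, and this is an illustration of the idea only, not a description of what any hospital IT team actually runs or a statement about what satisfies a given privacy requirement.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Map an identifier to a stable, opaque token using HMAC-SHA256."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Truncated for readability; a shorter token raises the collision risk.
    return "SURG-" + digest.hexdigest()[:12]


# Hypothetical usage records; real exports will look different.
records = [
    {"surgeon_id": "dr.smith", "procedure": "knee arthroplasty", "item": "suture 3-0"},
    {"surgeon_id": "dr.jones", "procedure": "knee arthroplasty", "item": "suture 2-0"},
    {"surgeon_id": "dr.smith", "procedure": "hip arthroplasty", "item": "suture 3-0"},
]

# Assumption: the key stays on site with the hospital and is never shared
# with the analyst, so tokens cannot be recomputed from known names.
site_key = b"hypothetical-key-kept-on-site"

anonymized = [
    {**row, "surgeon_id": pseudonymize(row["surgeon_id"], site_key)}
    for row in records
]

for row in anonymized:
    print(row)
```

Because the first and third records receive the same token, I can still ask how one surgeon's product choices vary across procedures without ever learning who that surgeon is.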
Assuming that the data comes into my possession, I usually have to follow a reverse process, whereby I need to prepare, ingest, format, and eliminate incomplete, null, duplicate, or corrupt data elements. If possible and feasible, I should follow some heuristics to fill in missing data, then scale, consolidate, and finally store the data in a form that is ready for further analysis. Given the likely size of the (big) data, I must employ other technologies that will enable me to handle such gigantic data sets before extracting business insight from them.
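The order of those steps can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the field names (`surgeon`, `item`, `qty`), the sample rows, the choice of the median as the fill-in heuristic, and min-max scaling. An in-memory list like this would obviously not cope with hundreds of gigabytes; it only shows the sequence.

```python
from statistics import median


def clean_usage_rows(rows):
    """Drop incomplete rows, de-duplicate, fill missing quantities, then scale."""
    # 1. Eliminate rows that lack a surgeon or an item.
    required = ("surgeon", "item")
    complete = [r for r in rows if all(r.get(k) not in (None, "") for k in required)]

    # 2. Eliminate exact duplicates while preserving the original order.
    seen, unique = set(), []
    for r in complete:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(r))
    if not unique:
        return []

    # 3. Heuristic fill: replace a missing quantity with the median of known ones.
    known = [r["qty"] for r in unique if r.get("qty") is not None]
    fill_value = median(known) if known else 0
    for r in unique:
        if r.get("qty") is None:
            r["qty"] = fill_value

    # 4. Min-max scale the quantity into the range [0, 1].
    lo = min(r["qty"] for r in unique)
    hi = max(r["qty"] for r in unique)
    for r in unique:
        r["qty_scaled"] = (r["qty"] - lo) / (hi - lo) if hi > lo else 0.0
    return unique


# Hypothetical sample: one duplicate, one incomplete row, one missing quantity.
raw_rows = [
    {"surgeon": "SURG-01", "item": "suture", "qty": 2},
    {"surgeon": "SURG-01", "item": "suture", "qty": 2},
    {"surgeon": "SURG-02", "item": "", "qty": 1},
    {"surgeon": "SURG-02", "item": "stapler", "qty": None},
    {"surgeon": "SURG-03", "item": "mesh", "qty": 6},
]

cleaned = clean_usage_rows(raw_rows)
for row in cleaned:
    print(row)
```

Whether a filled-in value is acceptable at all depends on the question being asked, which is why I hedge with "if possible and feasible" above.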
Remember, I still have not gotten to the step where I can ask the data “what happened?”
The approaches that I describe above, albeit superficially, make up parts of a discipline known as data engineering. It is a crucial process that typically makes up a very sizable proportion of any analytical effort. Getting it right is a necessary precursor to building a solid foundation on which the entire analytical endeavor will be based.
A shaky foundation at this early pre-analysis phase will not only produce a dangerously distorted view of what has happened, but will also carry that shakiness forward, jeopardizing all the subsequent, increasingly sophisticated phases of the process. A solid data engineering foundation with clear, clean, and robust data is necessary to extract actionable insight that can be expected to have a measurably positive business outcome. Otherwise, I will be extracting inaccuracies, misinterpretations, and spurious correlations from an analytical house of cards, which will be full of nothing but fallacies.