I’ve now been part of several teams that have considered, or are actively considering, how they might incorporate machine learning (ML) into their business as usual. In these cases I’ve seen some predictable, and avoidable, misunderstandings.
Misunderstanding #1: ML alone will precipitate gold from voluminous corporate databases.
It can be tempting for decision makers to think of ways to leverage their resources for value. Some, quite innocently, look at corporate databases and conclude ‘there must be value in there’, and assume ML will draw that value out.
This error can be quite frustrating and costly if it lives for too long among influential decision makers.
A more productive process for creating value – at least as far as ML is concerned – is to examine where the team is spending the most time and effort. Are any of those activities repetitive? Or do any require an experienced person to gather a lot of data to make a determination? If you answered ‘yes’ to either of those questions, then it is possible – though not certain! – that ML could help you.
Examples of where ML has been quite useful:
- Automating the diagnosis of mechanical failures in oil wells given static and time data.
- Automating MRI interpretation for radiologists.
- Determining if a customer is likely to buy a product based on their demographic information (as they walk into the store!).
- Using a person’s alcohol purchasing history to predict their voting behaviour in US presidential elections.
- Automatically interpreting oil well drilling information to raise alarms to drilling engineers.
Misunderstanding #2: ML has no overhead above and beyond the final ML analysis.
I’ve heard a famous machine learning professor – who consults regularly – say there’s no point in exploring ML unless there’s already a data warehouse available. If you don’t know whether your company has a data warehouse, then it probably doesn’t.
A data warehouse is a database that pulls in information from many different sources, and can be accessed using regular software tools across the organisation. The warehouse may be hosted in Amazon AWS or Google Cloud, or could be a solution like the PI System from OSIsoft (common in industrial plants, factories and oil/gas assets).
A data warehouse provides a foundation where data importing and cleansing can be automated. Once established, a data warehouse is the natural place to deploy a machine learning model (i.e. where you can generate newly calculated values in your database) and make that available to the organisation through established data warehouse tools.
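As a rough illustration of what ‘deploying a model in the warehouse’ can mean, here is a minimal Python sketch. It uses an in-memory SQLite database as a stand-in for a real warehouse, and everything named in it is an assumption made for illustration: the `well_readings` and `well_risk` tables, their columns, and the placeholder `score` function, which is not a trained model.

```python
import sqlite3


def score(pump_speed_rpm: float, flow_rate: float) -> float:
    """Hypothetical stand-in for a trained model: returns a risk score in [0, 1]."""
    # Placeholder logic only; a real project would load a trained model here.
    ratio = flow_rate / pump_speed_rpm if pump_speed_rpm else 0.0
    return round(max(0.0, min(1.0, 1.0 - ratio)), 3)


def publish_scores(conn: sqlite3.Connection) -> int:
    """Read cleansed readings, score them, and write the results back as a new table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS well_risk (well_id TEXT PRIMARY KEY, risk REAL)"
    )
    rows = conn.execute(
        "SELECT well_id, pump_speed_rpm, flow_rate FROM well_readings"
    ).fetchall()
    scored = [(well_id, score(rpm, flow)) for well_id, rpm, flow in rows]
    # The calculated values now live alongside the source data for anyone to query.
    conn.executemany("INSERT OR REPLACE INTO well_risk VALUES (?, ?)", scored)
    conn.commit()
    return len(scored)


def demo_warehouse() -> sqlite3.Connection:
    """Build a tiny in-memory 'warehouse' with made-up readings."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE well_readings (well_id TEXT, pump_speed_rpm REAL, flow_rate REAL)"
    )
    conn.executemany(
        "INSERT INTO well_readings VALUES (?, ?, ?)",
        [("W-001", 100.0, 90.0), ("W-002", 100.0, 20.0)],
    )
    return conn
```

The point of the sketch is the shape of the workflow rather than the scoring logic: the model reads from tables the warehouse already keeps clean, and its output becomes just another table that existing reporting tools can pick up.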
Misunderstanding #3: ML is supposed to be quick and cutting-edge. It doesn’t need laborious, manual, work.
Some ML techniques, called ‘supervised learning’ techniques, learn from a reliable ground truth dataset known as training data. The results of these models are only as good as the training data, so investing the manual effort to establish very high quality training data can be valuable.
Take ML algorithms that automate the interpretation of MRI scans, for example. It is easy to see why errors should be avoided – so training the algorithm on many images with known diagnoses would help ensure the resulting ML model is robust. But that means time and cost are required to generate the input training data before a suitable ML model can be built.
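To make the ‘only as good as the training data’ point concrete, here is a toy Python sketch. It is not an MRI model: it uses a one-dimensional, made-up dataset and a 1-nearest-neighbour classifier, chosen only because it is about the simplest supervised technique there is. The labels ‘healthy’ and ‘faulty’, the threshold of 5 and the three mislabelled points are all assumptions for illustration.

```python
def true_label(x: float) -> str:
    """Made-up ground truth for the toy problem: readings of 5 or more are faulty."""
    return "faulty" if x >= 5 else "healthy"


def flip(label: str) -> str:
    """Simulate a labelling mistake."""
    return "healthy" if label == "faulty" else "faulty"


def predict(train, x: float) -> str:
    """1-nearest-neighbour: return the label of the closest training example."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def accuracy(train, test) -> float:
    """Fraction of test examples whose predicted label matches the true label."""
    hits = sum(predict(train, x) == label for x, label in test)
    return hits / len(test)


# Carefully labelled training data.
clean_train = [(float(x), true_label(x)) for x in range(10)]

# Same measurements, but three labels were recorded wrongly.
mislabelled = {1.0, 4.0, 7.0}
sloppy_train = [
    (x, flip(label) if x in mislabelled else label) for x, label in clean_train
]

# Unseen examples, each close to one of the training points.
test_set = [(x + 0.3, true_label(x + 0.3)) for x in range(10)]

print(accuracy(clean_train, test_set))   # 1.0
print(accuracy(sloppy_train, test_set))  # 0.7
```

Nothing about the algorithm changes between the two runs. The drop in accuracy comes entirely from the three bad labels, which is why the manual effort of building a trustworthy training set tends to pay for itself.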
Misunderstanding #4: ML should be used to do cutting-edge work in our non-computer science field, not boring repetitive tasks like reporting.
In reality, ML can really shine when used to automate categorisation work currently being done manually.
I once knew a company that hired 21 engineers (3 teams of 6, and their leaders) to essentially do three things repeatedly:
1. Push wells to produce as much natural gas as possible.
   - If the well was already producing, i.e. was ‘up and running’, the options for the engineer were ‘speed up the pump’, ‘slow down the pump’ or ‘do nothing’.
2. Diagnose why an off-line well had failed, so that it could be repaired properly.
   - While there were ~60 different ways the wells could break, 3 of the failure mechanisms accounted for ~85% of all failures!
3. Fill out paperwork related to item 2 so that the well could be repaired.
   - The paperwork was very repetitive.
In total, this activity was worth about $25M/year in revenue and governed $240M/year in costs. So employing people to do the task was a no-brainer.
It may already be obvious, but this line of work was ripe for automation using ML. All of the production optimisation, diagnostics and paperwork could be automated by a single Python engineer and overseen by a single legacy engineer. Indeed, if the ML model were watched carefully, the results would be faster, more accurate and more reliable – and the model would never take a vacation day!
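A sensible first step before building any model is to check how concentrated the work really is, as with the handful of failure mechanisms that made up most of the failures above. The Python sketch below tallies a failure log and returns the smallest set of most common causes that covers a target share. The log contents and the failure names are invented for illustration; they are not the company’s data.

```python
from collections import Counter


def top_causes(failure_log, coverage=0.85):
    """Return the most common causes that together reach `coverage` of all failures,
    plus the share of failures they actually cover."""
    if not failure_log:
        return [], 0.0
    counts = Counter(failure_log)
    total = sum(counts.values())
    selected, covered = [], 0
    for cause, n in counts.most_common():
        if covered / total >= coverage:
            break
        selected.append(cause)
        covered += n
    return selected, covered / total


# Hypothetical log of 100 diagnosed failures (names and counts are made up).
example_log = (
    ["rod failure"] * 40
    + ["pump wear"] * 30
    + ["gas lock"] * 15
    + ["tubing leak"] * 6
    + ["electrical fault"] * 5
    + ["sand ingress"] * 4
)

causes, share = top_causes(example_log)
print(causes, share)  # ['rod failure', 'pump wear', 'gas lock'] 0.85
```

If a tally like this shows that a few categories dominate, a classifier only needs to be good at those few to take most of the load off the team, and the rare cases can still be routed to an experienced engineer.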
Note: You might think the other 16 engineers (and 3 team leaders) would be made redundant – but they may not be! They could be re-allocated to other valuable work within the organisation. It just depends on the opportunities available.
So yeah, those are the headline issues I’ve seen when decision makers are considering ML in their workflows. Just remember – ML, just like other techniques at work, is not magic! It is only a tool to help here and there. At times it can help a lot! But it does require some basic technical and organisational support.