Machine learning (ML) based products have particular characteristics and challenges, from data quality to counterfactual problems and explainability. What then are the implications of ML products for team structure, focus, and hiring?
Data science jobs are increasing at around 30% year on year, and if you don’t already have a data scientist in your ranks there’s a good chance you will soon. Perhaps you already have a number of data products you use to segment customers, predict prices or improve your product in other ways. Or maybe your core product is the machine learning, making recommendations and predictions for healthcare, security, ad tech or other applications.
ML lives and dies by the data it relies on; garbage in, garbage out. The wrong decision can be made if data is missing or biased, if there are collisions, or if data is received at the wrong time or in the wrong order. Furthermore, many ML models’ predictions will be used to determine a certain path in real time or near-real time. So once the data has been fed in and a decision has been made, it’s too late for a fix. The user has already been ushered down the wrong path.
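To make those failure modes concrete, here is a minimal sketch of the kind of pre-scoring checks this implies. The field names and event shape are invented for illustration; a real pipeline would validate far more.

```python
REQUIRED_FIELDS = {"event_id", "user_id", "timestamp"}

def check_event(event, seen_ids, last_ts):
    """Return a list of data-quality problems for one incoming event."""
    problems = []
    missing = REQUIRED_FIELDS - event.keys()
    if missing:  # missing data: the model would score on incomplete input
        problems.append(f"missing fields: {sorted(missing)}")
    eid = event.get("event_id")
    if eid in seen_ids:  # collision: the same ID has been seen before
        problems.append(f"duplicate event_id: {eid}")
    ts = event.get("timestamp")
    if ts is not None and last_ts is not None and ts < last_ts:
        problems.append("out-of-order timestamp")  # arrived late or out of sequence
    return problems
```

Checks like these have to run before the model scores the event, because once the decision is made there is no taking it back.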
Whether the data is good or bad, once interventions are made a counterfactual problem arises. The path having been taken, you cannot prove what would have happened otherwise – and by extension whether your product was right.
A/B testing can address this, in principle: let a small number of users through without intervention, and see if your predictions were correct. But in practice this can be a tough sell in cases where there is a very measurable cost – financial or otherwise – of not intervening. And even if agreed, there may be a temptation to constrain the test artificially and thereby undermine its premise.
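In practice, such a holdout is often implemented by hashing the user ID, so that assignment to the control group is deterministic and stable across sessions. A sketch, assuming a 2% holdout rate (the rate and function names are illustrative):

```python
import hashlib

HOLDOUT_PCT = 2  # let ~2% of users through without intervention

def in_holdout(user_id: str, pct: int = HOLDOUT_PCT) -> bool:
    """Deterministically assign a user to the no-intervention control group."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100  # uniform bucket in 0..99
    return bucket < pct

def decide(user_id: str, model_says_block: bool) -> str:
    """Apply the model's decision, except for holdout users."""
    if in_holdout(user_id):
        return "allow"  # control: log the prediction, but don't act on it
    return "block" if model_says_block else "allow"
```

Shrinking `HOLDOUT_PCT` to appease stakeholders is exactly the artificial constraint that can undermine the test's premise: too small a control group and the comparison loses statistical power.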
Then there’s the well-publicised black box challenge of ML products. We may know the input and the output, but not what the algorithm did in between. This is the question of “explainability”: the extent to which a human can analyse and explain the reasoning behind a prediction made by ML. In the UK, the Information Commissioner’s Office and The Alan Turing Institute are developing practical guidance and checklists to help with this challenge.
Different ML techniques have different levels of explainability. For example, decision trees are more explainable than neural networks, which tend to be more powerful but harder to interpret. There's often a tradeoff between accuracy and explainability. Going all out for accuracy without considering how your users will digest the output can undermine trust in your product. How can you trust this thing if you can’t understand it?
If you’re offering a B2B service, this can affect both the buyer of your product and the end users. The end users may perceive machine learning as a threat, rightly or wrongly. Even products which set out to augment existing teams will change how they work, and change is hard. For example, an insurance analyst may shift from personally reviewing applications to directing ML-driven review tools and investigating macro level trends. The perceived impact needs to be thought through and mitigated.
Given these particular challenges, how should the ML product be managed?
To address the critical role of data quality, you need to focus on how data is received and stored, and who is responsible. Data engineering is commonly thought of as the plumbing, but I don’t think that really does it justice; it’s also the architectural plans and foundations of the house. Without good data engineering the edifice will fall apart. So it’s important to consider who will work on it, their level of experience, and how much time should be dedicated to it. Rather than distribute the responsibility, it may be beneficial to have a centre of excellence in the interests of consistency, efficiency and ensuring the data gets the attention it deserves.
At Ravelin, for example, one of our teams is explicitly focused on data engineering. The team supports the needs of multiple other teams such as data scientists, analysts, and integration engineers. This runs the gamut from raw data being received to insights being extracted. In between, data may need to be normalised to ensure consistency and aid comparison; it may need to be enriched with complementary data sources; numerical values may need to be fed into calculations to populate a different data field; the list of actions goes on. And it’s not just the data this team grapples with, it’s also the surrounding infrastructure and pipeline for training and deploying models.
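Those in-between steps might look something like this in miniature. The field names and the IP-to-country lookup are hypothetical, chosen only to illustrate the normalise, enrich, and derive stages:

```python
def normalise(event: dict) -> dict:
    """Normalise raw fields so downstream comparisons are consistent."""
    out = dict(event)
    out["email"] = out.get("email", "").strip().lower()
    out["currency"] = out.get("currency", "GBP").upper()
    return out

def enrich(event: dict, country_by_ip: dict) -> dict:
    """Join a complementary data source (here, an IP-to-country lookup)."""
    out = dict(event)
    out["country"] = country_by_ip.get(out.get("ip"), "unknown")
    return out

def derive(event: dict) -> dict:
    """Populate a derived field from numeric values (e.g. price per item)."""
    out = dict(event)
    if out.get("quantity"):
        out["unit_price"] = out["total"] / out["quantity"]
    return out

def pipeline(event: dict, country_by_ip: dict) -> dict:
    return derive(enrich(normalise(event), country_by_ip))
```

Each stage is small on its own; the engineering challenge is running thousands of them reliably, in order, at scale.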
Data Science and Analysis
When it comes to data science, you could choose to create a centralised shared service team, or else distribute data scientists in multidisciplinary teams. Each approach has trade-offs.
With a centralised team, especially one where the data scientists are largely working on the same problem area, the central model makes it easier for lessons to be shared daily and applied rapidly. It creates a shared purpose and a focal point of responsibility for the accuracy of the ML: the buck stops here. It may also be the only option, depending on the size of the company or number of data scientists. The risk is the team becomes a functional silo, not working closely enough with product-engineering teams, with goals and priorities which are not in sync.
Conversely, embedding the data scientists in cross-functional teams can help with alignment and the early stages of new product development. That team will have reduced dependencies and increased autonomy. But this may not fit so well with the skillsets, areas of interest and future development of the data scientists. It may disrupt the balance between operational data science work, such as continually retraining models, and new product development work. Asking a data scientist to focus only on one area might be equivalent to asking a product manager to only look at the UX, or only the tech, or only the business.
The most important thing is close collaboration between data science, product and engineering, regardless of structure. Make sure you don’t miss out on the considerable brains trust of the data science team, including in less obviously “data sciencey” areas. On the one hand, you can add further diversity of thought to your decision-making. And on the other, you should ensure you aren’t making product changes with unforeseen impact on the quality of your ML. For example, you could unwittingly stop models from learning by encouraging user behaviour which cuts out particular data signals.
Another consideration is who is responsible for data analysis. Just the data science team? Or are there other people, teams or roles which could augment the machines with the benefit of real-world context, intuition and interpretation? You might look at existing analyst or customer care teams. At Ravelin we created a specific investigations team for this purpose, a blend of client support, fraud investigation and data science.
I don’t think a particular profile of product manager is needed for ML products, certainly not across the board. We’ve recruited a mix of technically oriented product managers and generalist product managers.
With ML products there is more emphasis on the data and APIs, and good analytical skills and attention to detail go a long way. Likewise proficiency with Chrome Developer Tools and SQL. You’ll regularly need to tell the difference between a bug and an issue with the underlying data, and then think of better ways to handle that data.
But the product management fundamentals for ML products remain the same. Good teamwork and communication, a large dose of curiosity, and an eagerness to learn are more than enough for someone to be successful working on ML products.
Before you start on a product role in this area, you should read up on metrics such as precision, recall, F1 scores and other common measurements. And read an entry-level machine learning book, check out the Google machine learning recipes videos, or this visual intro to ML.
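For the metrics, a small worked example helps. Given the counts from a confusion matrix, the definitions fit in a few lines (the fraud-style numbers below are made up for illustration):

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute the three metrics a PM will see in every model evaluation."""
    precision = tp / (tp + fp)  # of everything we flagged, how much was right?
    recall = tp / (tp + fn)     # of everything we should have flagged, how much did we catch?
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1

# e.g. a model that correctly flags 80 fraudulent orders, wrongly flags 20
# genuine ones, and misses 40 fraudulent ones:
p, r, f = precision_recall_f1(tp=80, fp=20, fn=40)
```

Precision comes out at 0.8 and recall at about 0.67: the model is usually right when it flags something, but it misses a third of what it should catch. Which of the two matters more is a product decision, not just a data science one.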
In the first month you’ll have to get your head round many new concepts. The existence of something called a confusion matrix will never seem more apt. But soon things start falling into place. You’ll figure out what you need to know, and what you don’t need to know. And it is of course a great place to be as a product manager: in an increasingly important domain with interesting challenges and many opportunities to learn and grow.