They're Just Fancy Averages by Ben Fields

In this ProductTank London talk, Ben Fields, formerly Lead Data Scientist at FutureLearn, looks at getting the most from the relationship between product managers and data people, working out people's background understanding, the benefits of leading with a story, and how best to spread knowledge.

There are three key aspects of how data science can effectively sit within a product and software organisation:

Gear and orient to your audience
Optimise for diversity
Embed with product

1. Gear and Orient to Your Audience

How can you show your audience content that they would not otherwise have found? Whether a site sells ecommerce products or promotes editorial content, an accurate link between what the user is viewing now and where you direct them next is more likely to lead to conversion.

A Story About Popularity Bias

Ben says that a recommender system and widget built at FutureLearn attempted to orientate the user to the course they were viewing and then gear and direct them to other possible, related options – options that they may not have encountered without this widget.

A recommender system should make effective use of data that an application has available. However, this can push a model to decisions that might not be the best for the viewing user. Popularity bias can creep into data models.

The FutureLearn recommender system would promote courses based upon similar enrolment patterns and behaviour from the wider user base. So, at its simplest, if Person A enrolled in Course 1 and Course 2, then when Person B views Course 1 they may also be recommended Course 2.

However, FutureLearn found that its recommender system kept promoting a similar set of courses that didn't necessarily have a fair relation to the course being viewed. This was because the system took popularity into account and one provider, the British Council, provided a number of courses that were popular among FutureLearn users. As a result, a lot of British Council courses appeared in the widget. This meant that the widget was not meeting the aim of showing courses that users would not have found otherwise.
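The effect can be reproduced with a small sketch. The enrolment data, course names, and function names below are hypothetical, and the adjustment shown (lift, which divides observed co-enrolments by what popularity alone would predict) is one common technique chosen for illustration; the talk does not say how FutureLearn addressed the problem.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical enrolments (learner -> courses); "english" is the popular course.
enrolments = {
    "a": {"python", "english"},
    "b": {"python", "english"},
    "c": {"python", "data", "english"},
    "d": {"python", "data"},
    "e": {"history", "english"},
    "f": {"cooking", "english"},
    "g": {"history", "english"},
}

def count_enrolments(enrolments):
    """Return per-course popularity and pairwise co-enrolment counts."""
    popularity = Counter()
    together = defaultdict(Counter)
    for courses in enrolments.values():
        popularity.update(courses)
        for x, y in permutations(courses, 2):
            together[x][y] += 1
    return popularity, together

def recommend(course, popularity, together, n_learners, adjust=False):
    """Rank other courses for `course`, optionally correcting for popularity."""
    scores = {}
    for other, both in together[course].items():
        if adjust:
            # Lift: observed co-enrolments relative to what popularity predicts.
            scores[other] = both * n_learners / (popularity[course] * popularity[other])
        else:
            # Raw count: favours whatever is popular across the whole site.
            scores[other] = both
    return sorted(scores, key=lambda c: (-scores[c], c))

popularity, together = count_enrolments(enrolments)
raw = recommend("python", popularity, together, len(enrolments))
adjusted = recommend("python", popularity, together, len(enrolments), adjust=True)
```

With raw counts the site-wide favourite ranks first for the "python" course; with the adjustment the more closely related "data" course moves ahead of it.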

If we're just repeating what we already saw then what work are we doing?

Ben has another story about popularity bias. A popular book retailer had a “people who bought this also bought…” widget. When browsing the product page for a Java RMI (remote method invocation) book, the widget recommended two other software engineering related books and then finally … Harry Potter.

This wasn't incorrect; it just wasn't what was wanted, and it likely had little value. The system picked out Harry Potter because “everyone” has bought Harry Potter.

It is very difficult to move from popular courses, books, content, artists, or products deep into the long tail of less popular, less visible items. However, that long tail can hold a lot of dormant value, which can be unlocked if users see it at relevant moments.

2. Optimise for Diversity

One way to combat popularity bias is to optimise for diversity. When designing a complicated system such as a behavioural recommender, it is key to spend time working with the people who will find the edges of the requirements.
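As a rough illustration of what optimising for diversity can mean in a widget, the sketch below re-ranks a recommendation list so that no single provider fills every slot. The talk does not describe a specific method; the per-provider cap, the function name `diversify`, and the course and provider names are all assumptions made for this example.

```python
from collections import Counter

def diversify(ranked, provider_of, max_per_provider=1, k=3):
    """Walk the ranked list in order, keeping at most `max_per_provider`
    courses from any one provider, until `k` courses are chosen."""
    picked = []
    seen = Counter()
    for course in ranked:
        provider = provider_of[course]
        if seen[provider] < max_per_provider:
            picked.append(course)
            seen[provider] += 1
        if len(picked) == k:
            break
    return picked

# Hypothetical ranked output dominated by one popular provider.
ranked = ["x_course_1", "x_course_2", "x_course_3", "a_stats", "b_writing"]
provider_of = {
    "x_course_1": "provider_x",
    "x_course_2": "provider_x",
    "x_course_3": "provider_x",
    "a_stats": "provider_a",
    "b_writing": "provider_b",
}

widget = diversify(ranked, provider_of, max_per_provider=1, k=3)
```

Here the widget keeps the top course from the dominant provider and fills the remaining slots from other providers. A hard cap is the bluntest option; it trades some relevance for variety, so the cap value is something to tune with the people who know the catalogue.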

Telling and Supporting Your Stories With Data

FutureLearn wanted to build a prediction engine that would predict course enrolment and revenue. The system would take the known information about a course due to run and model it against behaviour seen on the site to produce enrolment and revenue figures. The inputs included start date, course length, how many times the course had run before, how much learning it contained, and course density, coupled with other factors such as seasonality, peaks, and troughs.

Using a machine learning tool called TPOT, the team at FutureLearn was able to cut short system setup and prove it worked. It took a week to show that the system was able to predict revenue when given a set of data attributes.

This lean approach gave FutureLearn an immediate output and an evidence base, supported by data. While not pristine, the system predicted which courses would outperform others more accurately than the previous rule-based, business process methods.

3. Embed With Product

Finally, the FutureLearn team found success by keeping their data scientists close to product. The product team outnumbered the data scientists in the business but this was offset by using an embedded rotation system. Data scientists moved around and embedded into the various product teams based upon priority and focus.

Priority

Here the data team would consider need, asking when and where the data scientists would be most effective: which product team needs them, when, and for what scale of project?

Focus

The battle here is to ensure that the decision aligns with the strategic aims of the company. By serving and orienting to the business focus, the team could better put resources where they would be most effective.

Ben’s parting thought about scarce resources was that effective internal communication, and an understanding of the value that data science brings, will allow companies to be far more effective.