How Do You Build a Qualitative Data Lake?

How do you build a data lake? In fact, what even is one? And what's the key to understanding qualitative data operations? I'll share my thinking on these excellent questions, and the process I'm currently testing out with my own team – please feel free to share yours too!

For basically my entire 10 years working in tech, I’ve heard about companies building data-lakes. Massive repositories of terabytes of data, collected from a company’s entire portfolio of digital products, all flowing into a storage system that allows anyone with the necessary skills to “sip” from it. Building the capability of creating a quantitative data-lake is seemingly the holy-grail of technical infrastructure, and most companies in the know are chasing it. This concept of a pool of data deeper than Lake Baikal breeds visions in the head of any product person: depth and breadth of knowledge, nuance, evidence, focus, and clarity. Things we all want. Yet there is an entire range of data excluded from this particular concept of data-lakes, which often means that focus and clarity are lacking in quantitative data-lakes.

Qualitative data – collected during interviews, field research, focus groups, customer service touchpoints, sales conversations, etc. – is often data that is frittered away because it’s so difficult to collect in a way that allows you to pool it in one place. Even product teams, with careful research operations processes, architected drives, and elaborate Miro boards, have a hard time extending the value of qualitative data beyond a few months. But at least they have a PROCESS. Functional teams outside of Product often don’t have any hope of capturing the qualitative data that they collect. Perhaps it goes into a help desk black hole, or gets dropped in a Slack channel, never to be surfaced. Those countless feature requests from sales and marketing are useful nuggets and valuable data points that could build up to give REAL insights, if we could capture them.

Bad Data Habits

This complete lack of qualitative data process within functional teams often leads to really bad habits, like finding a way around the product manager, begging dev teams for favors, or throwing tantrums in a desperate attempt to get their voice inserted in the process. Product managers might complain about problematic stakeholders who “disrupt” their backlogs, but ask yourself: have you really taken the time to build a process that helps funnel this data in a way that is clear, transparent and workable? As they say in Jurassic Park, life “finds a way”, and it’s human nature to find a way around obstacles. If folks are going around you, it’s probably because you haven’t given them a way to work with you.

What are we aiming for?

I am certainly not the first or only person to address this issue. If you’re familiar with the Research Ops (ReOps) movement, then you’ll know that a lot of its focus has been to make qualitative research more accessible and more valuable over the long-term. I highly recommend resources from ResearchOps.

In an email thread I have going with the founder of ReOps, Kate Towsey, I learned that she is trying to pull all her organization’s functional teams under one insight-generating umbrella, and just call it “Data Ops.” As it happens, I’ve just kicked off a similar process. Now Kate works at Atlassian (a truly massive company) and I work at a 30-person start-up, so we’re both obviously going to have to tackle this issue in different ways. Each approach will have its own character and challenges, but the outcome that I think we are both going for is roughly what I’m calling a qualitative data-lake.

What Is a Qualitative Data-Lake?

Here’s the thing: I’m using this term in a way that makes sense to me, but others might have their own mental model. In this case, I see a “qualitative data-lake” as:

A process for collecting the valuable qualitative data from my organization’s functional teams, enabling us to regularly synthesize and prioritize insights, folding the top opportunities into the product roadmap, and eventually the backlog.

The other thing I want to do with this process is seamlessly blend this data stream with UX research and engineering’s understanding of effort. The data will come from wider functional teams, and will complement and be complemented by the continuous research that the product team will be conducting. This is NO SMALL TASK at any scale, and the larger your organization gets, the more complex it becomes. I’m quite lucky because the size of my organization is just large enough to make it interesting before it gets really painful.

How Do You Think About Building a Qualitative Data-Lake?

So I will start by saying that there are a lot of products out there that are trying to address this problem with integrations that hook into the help desk and Salesforce, and all the other fancy things. I think that solves a part of the problem at best, and I haven’t seen any team manage to make these apps really “sing” for them, but I’d love to be proven wrong (Do you use something like this well? DM me!).

You also have to build a few steps beyond just data capture into this process – namely synthesis and prioritization points that include members of all cross-functional teams. I’m thinking about a double gate, wherein the functional teams in question come to the prioritization session after they have internally boiled up their top 3-4 points to discuss. You might be able to do this in a different way, but this is the way I’ve decided to test against.

Data Lake Process in 3 Simple Steps:

Step 1: Functional teams capture data into a central repository. In this circumstance, functional teams include Marketing, Sales, and Customer Success. As we are lean, and testing out the process at the moment, we are just going to use a Typeform. They will be collecting data on:
1. Customer feedback
2. Feature requests
3. Org initiatives that require development work
4. Bugs
5. Major initiative requests like new integrations

Step 2: At specific time intervals, the functional teams will synthesize the data that they have collected. The product team is going to ask them to boil up the top 3-5 most important data bundles.

Step 3: We will then move into a cross-functional collaborative prioritization that will include reps from each functional team.
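
If it helps to see the capture and synthesis steps in a more concrete shape, here is a rough Python sketch. To be clear about what is made up: the Entry record, its field names (especially the short "theme" label), the category identifiers, and the idea of counting repeated themes are all my own assumptions for illustration. This is not how our Typeform is actually set up, and a count is only a conversation starter for the synthesis session, not a replacement for the team discussing why something matters.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

# The five kinds of data the functional teams collect (identifiers are hypothetical).
CATEGORIES = {
    "customer_feedback",
    "feature_request",
    "org_initiative",
    "bug",
    "major_initiative",
}


@dataclass(frozen=True)
class Entry:
    """One submission from a functional team (hypothetical shape)."""
    team: str      # e.g. "Sales", "Marketing", "Customer Success"
    category: str  # must be one of CATEGORIES
    theme: str     # short label chosen by the submitter (an assumption)
    note: str      # the free-text detail of what was heard


def validate(entry):
    """Reject entries whose category is not one of the agreed five."""
    if entry.category not in CATEGORIES:
        raise ValueError(f"unknown category: {entry.category!r}")
    return entry


def top_themes_by_team(entries, limit=5):
    """Count how often each theme shows up per team and keep the top `limit`.

    Ties keep the order in which themes were first submitted.
    """
    counts = defaultdict(Counter)
    for entry in entries:
        validate(entry)
        counts[entry.team][entry.theme] += 1
    return {
        team: [theme for theme, _ in counter.most_common(limit)]
        for team, counter in counts.items()
    }


# Hand-typed sample data, purely illustrative.
sample = [
    Entry("Sales", "feature_request", "single sign-on", "Prospect asked for SSO"),
    Entry("Sales", "feature_request", "single sign-on", "Second prospect asked too"),
    Entry("Sales", "bug", "export fails", "CSV export timed out in a demo"),
    Entry("Marketing", "customer_feedback", "pricing page", "Visitors find tiers unclear"),
]

shortlists = top_themes_by_team(sample, limit=3)
for team, themes in shortlists.items():
    print(team, themes)
```

Each team's shortlist is what they would bring to the cross-functional prioritization session. Getting submissions out of the form tool and into a list like this is left out on purpose, since that depends entirely on the tool you pick.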

The key steps are not the collection of the data, but the synthesis and then the prioritization of the data. The synthesis step forces the functional teams to really evaluate which items are the most important to them, and helps them discuss why. This takes a huge burden off of Product, saving us from having to do all the legwork around distilling insights from the qualitative data set.

The next step – a collaborative prioritization session – does two things. Firstly, it brings product/UX research into the conversation, which should blend, validate, and amplify functional team data points. Secondly, it allows the entire team to hear and ask questions of engineering, so that everyone is aware of how the amount of effort required for a feature will impact potential roadmap decisions.

After reading a rough draft of this, my colleague Tim mentioned that this “seems like an entire function”, and he’s right. Handling this type of process, as well as supporting ongoing customer research conducted by the product team, can eventually become an entire function. Indeed, more and more large companies are actually creating departments around Research Ops, or potentially “Data Ops”.

What Does a Data Lake Capture?

I think the ultimate goal is to help line up two streams of insight – what functional teams are hearing through their touchpoints, and what customers are saying to the product team – so that we can clearly prioritize the most significant data patterns. And then to bring in Engineering to make it clear where the technical complexity is.

It’s meant to ensure that there is a process for all of the different things the organization hears from customers, and that requests for the product have a place to be openly evaluated and discussed. Success will mean that stakeholders aren’t forced to use back-channels or disruptive tactics to wrangle work out of the team.

How Do You Build a Data Lake?

The honest answer here is: Search me?

The slightly longer answer is that I’m trying out an incremental approach. I did an overview with a few folks from a few teams, asked how this was going now, made a process map of IS and INTENDED, and gathered a bunch of questions and feedback. I’m really lucky in that I’ve been able to pull in a colleague who will help me drive this forward. The most important thing, more important than even defining the process, is getting organizational buy-in.

This involves mapping and visualizing the processes, and chatting with a few parts of the organization about the process as it stands and where they have questions. This seems like a throw-away step, but it is actually crucial to give an overview of the process, get clarity on where there is friction in the current process, and learn where there are questions about the changes you want to introduce. Processes don’t work if they aren’t supported by the teams that need to input into them.

Gently but persistently nudging people towards the process is necessary, but so is quickly following up with the next steps whenever they participate, so that they know their feedback hasn’t gone somewhere to die. People will use the process if they feel they can trust it to deliver an answer. Whether the answer is yes or no is often beside the point, as long as it’s clear WHY the answer is yes or no.

Right now we are getting folks to input the data they collect from their various touch points into a Typeform. We have a synthesis session scheduled that will be more of a training than a session, and then we’ll try out prioritization. I’ll let you know how things shape up as we move through them.

Questions? Comments? How do you do this at your org?