Groundhog Days
Data has always needed to be believable and valuable. Let’s get the industry out of this Groundhog Day loop and on to something bigger and better!
In the 20+ years I’ve worked in the data profession, it has often felt like a scene from the Bill Murray movie Groundhog Day. We ask the same questions over and over. I keep seeing two big questions recur [1]. First, “Do I believe the data?” Second, if the data can be believed, “How is this data adding value?”
Despite the increased visibility and importance of data (sexiest jobs of the century and all), along with improvements in technology and practices, it seems we’ve made very little progress in these two areas. I’ll first unpack each of these questions, then discuss what I think can be done to make data more believable and valuable. Especially given the current economic malaise, these two issues are growing more top of mind every day for me.
Question 1: “Do I believe the data?”
As my good friend Bill Inmon (the father of the data warehouse) once told me, throughout his very long and legendary career, his true north has always been “believable data.” The simplicity of this true north hit home for me. It cuts to the heart of what we’re trying to do as a profession. Belief means you accept something as true, and believable data means the data gives you the confidence to take action. Ask yourself if you have believable data. Would you bet your job on it?
As a data professional, your job is to provide believable data. How does data become believable? It’s almost too simple - you need to know what stakeholders believe. Does the data you provide make sense? Do stakeholders believe the data [2], and can they make effective decisions? If yes, you’ve done your job.
Believability should be simple, but it’s deceptively hard. Believability isn’t just a technical assessment of the structural quality of the data (“expectations,” as we call them nowadays). It’s also a matter of having people smell-test the data. I can’t tell you how many meetings I’ve been in where an executive fixates on the validity of numbers in a presentation. If an exec says the data’s not believable, you’re probably not going to have a good day. As my old boss told me, “if any number is wrong, they’re all wrong.” He’s not wrong. Fool me once or twice or something. Either way, don’t fool me. Just check your work.
Here are some questions to ask yourself about whether the data is believable.
Understand how your organization makes decisions [3]. Does the data pass muster for making a solid data-driven decision? How about supporting a gut-driven decision? What about the situation where it’s first a gut-driven decision, then a data-driven decision after the fact? Hopefully, the data supports different decision-making styles and leads to the best outcome.
Be ready to defend your data. Does the data make sense? If an executive calls out your presentation in a meeting, can you justify why you presented the data that way?
Data is a full-contact sport, which means you have to understand what people want and provide a response. Ultimately, making data believable means having empathy for the stakeholder and knowledge of the domain and context in which they will use data for decision-making.
Question 2: “How is data adding value?”
If data is believable, the next question is how it adds value.
I hear “We need to add more value” a ton from data professionals. Given how often people throw around the word “value,” I’m unsure we share a consistent definition of it. My favorite definition of value comes from the classic book, Lean Thinking [4].
“Value can only be defined by the ultimate customer. And it’s only meaningful when expressed in terms of a specific product (a good or a service, and often both at once) which meets the customer’s needs at a specific price at a specific time…Value is created by the producer. From the customer’s standpoint, this is why producers exist.” - Lean Thinking (Womack and Jones, 1996)
Simply put, the customer defines what’s valuable, and you produce that value for the customer.
Your first job is to identify who this person or group is and truly understand their needs. Who is the customer? It depends, but this is almost always the stakeholder who needs believable data. Know who your customer is, and you’ll know what they want and what’s valuable to them. Your second job is to deliver value, as defined by the customer.
Will Data Ever Be Believable and Valuable?
Believability and value are two sides of the same coin. They’re both customer-centric outcomes. Yet as an industry, we seem to struggle with them over and over. Grizzled data industry veterans commiserate that these problems have been around for ages. These aren’t easy problems to solve; otherwise, we’d have solved them long ago. I’d love to stop hearing about whether data is believable or valuable. I often wonder if data will ever be believable and valuable. So far, both have been elusive.
When I speak with data teams, the conversations are usually technology-focused (“should I use dbt or Spark? Also, do you like Databricks or Snowflake? I hear DuckDB is pretty cool. What are your thoughts, Joe?”). Rarely are the questions about how to talk with stakeholders to figure out what they want. Hopefully, the goals of the team and its managers align with those of stakeholders and executives. However, when I speak with stakeholders and executives, they often question the value of the data they’re getting and of the team providing it (this might be self-selection, but I doubt it since I speak with a LOT of people). The big question these days is - what value do the expensive data team salaries and fancy toys generate? I’m guessing that’s more and more open to interpretation these days.
One thing I’d like to do is avoid endlessly complaining like the industry’s done for decades. Complaints are easy, but they’re annoying and force people to write articles like this. Instead, I like solutions. Data teams have a dual mandate of making data believable and valuable (among other responsibilities). A good starting place is becoming more customer-focused. It’s not like people haven’t advocated customer-centric focus a million times before. We need to focus on the customer and make data believable and valuable.
That said, it’s better to invert the question and ask - How can we keep Groundhog Day going? Put another way, how do we make data that’s neither believable nor valuable? How can the industry keep asking the same dumb questions every year? Here are some hints.
Ignore the big picture, work in a silo and completely ignore your stakeholders. Instead, be inwardly focused, overcomplicate everything, and obsess over the hottest data tools and fads. That’s how it went in the good times.
During the good times of 2020/21, data teams could coast by on this approach, with future promises of providing value “someday.” Those days are long gone. The value offered by data teams is under intense scrutiny. The grim reaper of cost-cutting will spare data teams that add value and cull those that do not. That’s how it goes in downturns.
To avoid predictable and miserable failure, do the hard stuff - talk to your stakeholders and understand what they want. Build empathy for your customers. Get to know their goals and how they make decisions. Make them look like rock stars. In the end, all that matters is the customer and their needs. Be the rare data professional who wants to help the industry step out of the deja vu of Groundhog Day.
Remember, this is nothing new. Repeating Groundhog Day is a surefire recipe for continuing the themes stunting our profession, leaving a legacy of frustration for the next generation of data professionals and their stakeholders. But since you want to be proud of your career and make a positive impact, hopefully, you’ll ignore this terrible advice.
Data has always needed to be believable and valuable. Let’s get the industry out of this Groundhog Day loop and on to something bigger and better!
[1] Of course, if you’ve been in the data space for a while, I’m sure there are other questions than these that continue surfacing for you. Drop them in the comments.
[2] In my experience, stakeholders often use data to justify their gut-driven decisions after they’ve already made a decision. Regardless, data support decisions and actions. Data must be believable.
[3] I notice quite a few companies that have invested in expensive Modern Data Stacks, yet the executives still make gut-driven decisions from Excel reports, often from data outside the MDS.
[4] IMO, the data world could turn itself around and get out of this rut if we just adopted practices from Lean. Namely, relentless focus on what the customer finds valuable and the banishment of waste. I’m at a loss why Lean isn’t more widely adopted.
"This new tool is great, but how does it make it easier for the stakeholders to export the data to Excel?"
"Is there a better BI tool than Excel?"
"Do we know if they checked their report filters before submitting a support request ticket?"
Just searched your substack for the word "silo", so commenting on a Nov 2022 post in Jan 2024. This incredible article popped up on Hacker News this morning: https://fernandovillalba.substack.com/p/devops-dont-destroy-silos-transform - and it's reminiscent of the API-driven approach Bezos did at Amazon that ended up leading to the creation of AWS
Key quote (actually by Kelsey Hightower [1]): "Silos are fine as long as there is an API between them"
[1] https://www.youtube.com/watch?v=hD7HlWbmVqI