AI Denialists are the Data Industry’s Flat Earthers
Joe's Nerdy Rants #78 - Weekend reads, podcasts, and other stuff
For as long as I can remember, the success rate of data initiatives (and IT more broadly) has hovered around 20% (30% if we’re being optimistic). That’s not great, and there are many reasons for it, most of which are beyond the scope of this rant. IT is among the worst offenders when it comes to projects running late and massively over budget. A big reason data teams are constantly under the gun is slow delivery, often with little or nothing to show for it.
If your child had failing grades year after year, passing only about 20% of the time from elementary school through high school, you’d question many things - yourself, your kid, the school system, etc. You’d also likely be open to new ways of helping your child learn. If a new technology or approach could improve my child’s learning outcomes, I’d try it. My youngest son struggled with his reading tests, and we tried many things - after-school programs, and so on. The problem was that he’d freeze up during tests. Thankfully, a new technology appeared a few years ago, and I’ve been using it to give him mock reading exams. Because he was taking mock exams daily, his mental blocks disappeared, and he overcame his exam anxiety. What is this technology? It’s the same technology that can help the data industry - AI.
I’m known to be an occasionally cranky old man in our industry. Fundamentals excite me. Hype bores me. I was among the first people to publicly call BS on the data science hype cycle, and I defected to data engineering when that idea was heresy. As much as possible, I try to stay grounded in reality. I also often try out new technologies before the hype cycle begins, so I’ve got a decent sense of where things are heading. I’ve learned never to discount a technology, no matter how dumb or obnoxious it seems. The public gets ahead of itself in every hype cycle, but underneath every hype cycle is a useful technology and new approaches we can incorporate. The story of humanity is one of tension and technological evolution. We invent new things and get overly excited. Then the excitement fades into the background, and the new technology becomes woven into our lives. When did you last marvel at electricity, railroads, aircraft, cars, radios, computers, the Internet, or machine learning? Each was a massive hype cycle in its time, and now they’re boring because we get real utility from them. So it goes with the current crop of AI. It’s great, it’s getting better every day, and it will eventually get boring.
Also boring - the data industry. After decades, it plods along, doing the same thing it’s always done, screaming about “business value” and frustrated about not having a “seat at the adult table.” It’s painful to watch. There’s a reason I often go in waves of absolute horror and dejection about the data industry. Same shit, different decade. And finally, there’s a tool that can help - AI.
While more vendors, data scientists/analysts, and data engineers are incorporating AI into their workflows, some in our industry complain that AI is useless because it occasionally hallucinates and fails on obscure edge cases. They insist the industry keep doing things the way we’ve always done them. Based on their rants, I don’t believe most of these people have tried the newer models or understand how to use techniques like RAG (retrieval-augmented generation, which lets a model pull in external knowledge at query time) to supplement a model with knowledge of obscure topics. Instead, the blanket response is that AI is garbage. This mentality reminds me a lot of Flat Earthers. The definition of insanity is…
For example, in data modeling, I’ve found AI to be a massive help for summarizing and parsing text and audio recordings of stakeholder conversations. Any of the mainstream frontier models can do a decent job of identifying the entities, relationships, and attributes in text or audio. At a minimum, AI can save you days or weeks of sifting through stakeholder interviews. You can use that time to build a data model and deliver it to production, rather than spending the enormous amount of time this usually takes, assuming the initiative is ever completed. I’ve had similar success using AI as a sounding board for architecture design, writing code, etc. Even if the AI isn’t 100% correct, I know enough about what I want and what to look for to steer it back on track quickly. Of course, you need to balance this against laziness. My concern with pure vibe coding and letting the AI do all the work is that we’ll erode our skills, as research on aircraft pilots and other professions that rely on automation has repeatedly shown. But that’s a rant for another week.
Imagine using AI to help with the work that takes forever: data, code, and system migrations, or refactoring from one language to another. If you don’t believe me, try it yourself. Depending on your LLM token budget and your security and governance profile (don’t throw sensitive data into a public LLM), you could rip through a large codebase in minutes or hours. And hey, if it doesn’t produce the results you want, you don’t have to use them. But you potentially did in minutes or hours something that would have taken days, weeks, months, or longer. This seems like a no-brainer to me.
The book How Big Things Get Done discusses a big reason projects are late and over budget - it’s by design. If something can be done in a fraction of the time or for a fraction of the cost, some people might view that as a threat to their budget and kingdom. This is one of the major reasons projects get sandbagged. So if you’re in that situation, it makes sense why you might want to avoid AI. I recently chatted with the head of data at a big bank about a data migration his team had undertaken. It’s slated to take five years. When I asked him if he was using AI, he said no. Better to do it by hand. The answer didn’t make a lot of sense. Reading between the lines, I realized this was more about keeping his job and growing his kingdom’s budget. So it goes.
The other thing AI allows is “show, don’t tell.” Slides and diagrams are useful, but they’re no longer enough. AI helps you move fast enough that you might as well build an MVP. The expectation will be to show a working MVP of a data model, a data product, a dashboard, etc. Get the MVP in front of users and get their feedback. Tweak it and deliver something in minutes or hours. These fast iteration cycles bring you closer to the business, instead of the glacial death slogs data teams typically deliver. Nobody can wait several months for a column to be added to a database. Those days are over. Data practitioners need to keep pace now that engineering moves at warp speed thanks to AI code editors. The pace we’ve worked at for decades won’t cut it. It’s time for something different.
Please listen to the audio above or on Spotify (or your podcast platform of choice).
Have a wonderful weekend,
Joe
Join dbt Labs on May 28 for the dbt Launch Showcase to hear from executives and product leaders about the latest features landing in dbt. See firsthand how these features will empower data practitioners and organizations in the age of AI.
Thanks to dbt Labs for sponsoring this newsletter.
Cool Weekend Reads
Here are some cool articles I read this week. Enjoy!
LLMs are Making Me Dumber - Vincent Cheng
Saviors, Sages, and Shitposters: The Rise and Fall of CEO Archetypes
GitHub’s new AI coding agent can fix bugs for you | The Verge
From Metadata to Meaning: The Knowledge Infrastructure
The hidden benefits of being an open-source startup
Developers at Microsoft Build question their future relevance | Semafor
Anthropic’s Claude Opus 4 model can work autonomously for nearly a full workday
Florida judge rules AI chatbots not protected by First Amendment | Courthouse News Service
Whether AI is a bubble or revolution, how does software survive?
‘Everybody’s Replaceable’: The New Ways Bosses Talk About Workers
The Tech Industry Is Huge—and Europe’s Share of It Is Very Small
🔒 Your Data Stays Private. 🧠 Your AI Stays Smart.
LLMs are great at generating insights—until they leak your data or hallucinate your metrics.
The fix? A metadata-first architecture.
In this deep dive, GoodData shares its approach to securing AI-powered analytics using semantic models, deterministic execution, and zero raw data exposure.
🧠 Smart AI. 🔒 Safe data.
Thanks to GoodData for sponsoring this newsletter.
Upcoming Events
Analytics and data engineering used to live in separate worlds—different teams, different tools, different goals. But the lines are blurring fast. As modern data products demand speed, scale, and seamless integration, the best teams are embracing engineering principles and best practices.
In this no-BS conversation, Ryan Dolley, Matt Housley, and I dive into how engineering principles are transforming the way analytics is built, delivered, and scaled.
📆 May 27, 2025 🕘 9:00 AM PDT, 12:00 PM EDT, 6:00 PM CEST 🔗 Register here!
If you’re at Snowflake Summit, I’m doing a live event from the conference with Tom Ridings and Srinivasan Swaminatha about The Agentic Data Team.
Where: Streamed live
When: June 3rd at 10:30am PT
Thanks to Matillion for hosting this event.
Databricks Data & AI Summit - TBA
Iceland - Global Data Summit, June 23-24. Register here
Australia (Sydney, Melbourne) - Data Eng Bytes, July 24-30. Register here
UK - Big Data London, September 24-25. Register here
Helsinki Data Week - October TBA
More to be announced soon…
Podcasts
Freestyle Fridays - AI Denialism is Holding Back the Data Industry (Spotify)
Ryan Russon - Practical ML Engineering (Spotify)
Freestyle Fridays - What Does AI Do to The Craft of Dev and Engineering? (Spotify)
Laura McDonald - Navigating the Complex World of Enterprise Sales (Spotify)
Freestyle Friday - Navigating Data Strategy in the Age of AI w/ Dia Adams & Gordon Wong (Spotify)
Michael Drogalis - Building a Company in Public (Spotify)
John Giles - The Data Elephant in the Board Room, Data Modeling, and More (Spotify)
Zhamak Dehghani - Autonomous Data Products, Data Mesh, and NextData - Q&A (Spotify)
Freestyle Friday - Advice for 2025 Graduates (Spotify)
Jessica Talisman - Libraries, Knowledge, Shitty Tech Jobs, and More (Spotify)
Freestyle Fridays - “I don’t need to learn anything anymore.” (Spotify)
Juan Sequeda & Jesus Barrasa - Unlocking Knowledge with Graphs (Spotify)
Freestyle Fridays - Wartime Data Teams (Spotify)
Tim Berglund - The Art of Developer Relations, Hardware Hacking, and More (Spotify)
There are way more episodes over at the Joe Reis Show, available on Spotify, Apple Podcasts, or wherever you get your podcasts. Also available on YouTube.
Thanks! If you want to support this newsletter
The Data Engineering Professional Certificate is one of the most popular courses on Coursera! Learn practical data engineering with lots of challenging hands-on examples. Shoutout to the fantastic people at Deeplearning.ai and AWS, who helped make this a reality over the last year. Enroll here.
Practical Data Modeling. Great discussions about data modeling with data practitioners. This is also where early drafts of my new data modeling book will be published.
Fundamentals of Data Engineering by Matt Housley and me, available at Amazon, O’Reilly, and wherever you get your books.
The Data Therapy Session calendar is posted here. It’s an incredible group where you can share your experiences with data - good and bad - in a judgment-free place with other data professionals. If you’re interested in regularly attending, add it to your calendar.
My other show is The Joe Reis Show (Spotify and wherever you get your podcasts). I interview guests on it, and it’s unscripted, always fun, and free of shilling.
Would you like me to speak at your event? Please submit a speaking request if you want me to give a workshop or talk at your event.
If you’d like to sponsor my newsletter or podcast, please get in touch with me.
If you like the content, please leave a lovely review.
Thanks!
Joe Reis