It’s Fall, and I’m back on the road. I am wrapping up a three-week tour in the UK, Australia, and New Zealand. Here are some things I noticed or heard while talking with practitioners, executives, pundits, and vendors over the last few weeks.
2023 was all about imagining the use cases for Generative AI. In 2024, Generative AI is in the POC phase. It will be interesting to see how many use cases make it to production. Right now, I’m hearing that barely anybody uses LLMs or other generative AI in production. The gap between enthusiasm and real-world implementation is massive.
Lots of conflation of “AI” with generative AI, deep learning, and classical ML. It’s very confusing when you hear people talk about AI because it’s unclear what type of AI is meant (or if it’s even meant at all, as there’s lots of AI-washing, too).
Questions (and open cynicism) about AI and agents. People want to see things work in their companies rather than just cool demos. I’ve written in quite a few places about how I think we’re probably past the peak of the current AI bubble. There are still a lot of use cases, and AI isn’t going anywhere, but there’s a sheer amount of noise, and people are tired of it. I expect to see quite a few “AI vendors” having trouble soon. As I’ve written before, I’m long-term bullish on AI, but I feel this hype cycle has been played out by a crowd of grifters and a lot of sameness. Show us the money!
Is OpenAI the new WeWork? Especially with SoftBank’s recent investment and OpenAI’s culling of its executive team, we’ll see…
The winners among the data companies from circa 2020/21 are being settled. I'm not going to name specific vendors, but the top 3 in each category in Matt Turck’s MAD data landscape are much more apparent than last year. It will be a tough road for companies not in the top 3, and pivots, downsizing, and closures are to be expected. Best of luck to everyone out there.
Data’s still a mess. Most data initiatives fail. Data teams are seen as a cost center and not getting the support they deserve. Same as it ever was.
The countless practitioners and executives I met work at traditional companies. Big tech doesn’t apply. We’re just talking plain Jane data stacks and mostly conventional use cases. As I said last week, boring is good.
There’s more general malaise in the UK and Australia than usual. The economies aren’t doing great, and there’s a sense that things are limping along. This is compared with the USA, where we’re always “killing it,” but somehow also have a mass extinction event still going on with tech employment.
Anecdotally, practitioners are buzzing about Databricks. Whether it’s perceptions on cost or lagging on AI, at least with people I spoke with, Snowflake is now seen as the laggard. Even last year, I started noticing a shift in my conversations about these two vendors. If this year is any indication, Databricks is ahead, and Snowflake has much catching up to do.
There has been much talk about Iceberg, but very few production implementations in the wild. It’s still early days, so we’ll see how open table formats grow in popularity. There are still many questions and speculations about the future of Iceberg following Databricks’ acquisition of Tabular. I’m excited about the prospect of open table formats in general, such as Iceberg, Hudi, and others. Also, I hope Hudi and other non-Iceberg formats offer a counterbalance just in case Iceberg isn’t as open as people expect.
I’m sure there’s much more, but these things are at the top of my jet-lagged brain for now. If there’s more, I’ll edit this post accordingly.
As a side note, avoid self-serving questions if we meet at a conference. Especially questions that make me feel like you’re trying to get free consulting from me. While most questions are pretty innocent, I encountered a few situations where people blatantly wanted free consulting on their work. In one egregious case, a couple of guys were trying to fish answers out of me so they could return to their boss and say, “Joe said this.” Please don’t bring me into your workplace drama. I’m not your consultant, nor do I have a stake in your work. Leave me out of it. Thanks.
Anyway, I’ve got a busy conference schedule this Fall (and Spring 2025). I hope to see you wherever we happen to meet. My event schedule is posted toward the end of this newsletter if you’re interested in attending.
A shoutout to the Big Data London crew: Andy Steed, Palesa Amadi, Mike Ferguson, and others.
For DataEngBytes, being on tour with Zach Wilson, Chip Huyen, Adi Polak, Eevamaija Virtanen, Sean Falconer, and many others has been great! Also, thanks to Peter Hanssens and Simon Aubrey for organizing, Lars Klint for MC’ing, and the Cloud Shuttle crew for holding the tour together. You do an insane amount of work, and the community thanks you!
If you haven’t checked it out, the Data Engineering Professional Certificate is available on Coursera! Learn practical data engineering with lots of challenging hands-on examples. Shoutout to the fantastic people at DeepLearning.AI and AWS, who helped make this a reality over the last year. Enroll here.
On another note, the popular Data Therapy Session calendar is posted here. It’s an incredible group where you can share your experiences with data - good and bad - in a judgment-free place with other data professionals. If you’re interested in regularly attending, add it to your calendar.
Hope you have a fun weekend!
Thanks,
Joe
P.S. If you haven’t done so, please sign up for Practical Data Modeling. There are lots of great discussions on data modeling, and I’ll also be releasing early drafts of chapters for my new data modeling book here. Thanks!
Cool Weekend Reads
The Internet’s AI Slop Problem Is Only Going to Get Worse (NY Mag)
Getting value from data-wrapped products – it takes more than pretty packaging (Cambiont)
What I will remember from Fundamentals of Data Engineering book (Medium)
COBOL has been “dead” for so long, my grandpa wrote about it (Wumpus Cave)
The Pig Butchering Invasion Has Begun (WIRED)
forkme ⑂ - “Turn any repo into a billion-dollar idea with our patented Hype-Generator™!”
AI Can Only Do 5% of Jobs, Says MIT Economist Who Fears Crash
Beyond Bots: How AI Agents Are Driving the Next Wave of Enterprise Automation (Menlo Ventures)
New Show & Upcoming Events
The Joe Reis Show
5 Minute Friday - Notes from the Field, Early Fall 2024 Edition (Spotify)
Ilya Reznik - How to Lead New and Existing ML Teams and More (Spotify)
Jordan Morrow - How to Write Amazing Books (Spotify)
Venkat Subramaniam - Moving Beyond Agile as a Buzzword, Learning to do Less, and more (Spotify)
Paco Nathan - Hacker Culture, Cyberpunk, AI, and More (Spotify)
Bethany Lyons - Disrupting the Recruitment Industry, Startups, and the Future of Work (Spotify)
5 Minute Friday - How Good Do You Need To Be? (Spotify)
Jordan Tigani - Why Small Data is Awesome, DuckDB, and More (Spotify)
Demetrios Brinkmann - AI Hype vs Reality, Building a Global Community, and More (Spotify)
5 Minute Friday - Zero-Sum vs Positive Sum Games (Spotify)
Bill Inmon - History Lessons of the Data Industry. This is a real treat and a very rare conversation with the godfather himself (Spotify) - PINNED HERE.
Monday Morning Data Chat
Andrew Ng - Why Data Engineering is Critical to Data-Centric AI (Spotify, YouTube)
Tevje Olin - What Should Data Engineers Focus On? (Spotify, YouTube)
Rob Harmon - Small Data, Efficiency, and Data Modeling (Spotify, YouTube)
Joe Reis & Matt Housley - The Return of the Show! (Spotify, YouTube)
Nick Schrock & Wes McKinney - Composable Data Stacks and more (Spotify, YouTube)
Zhamak Dehghani + Summer Break Special (Spotify, YouTube)
Chris Tabb - Platform Gravity (YouTube)
Ghalib Suleiman - The Zero-Interest Hangover in Data and AI (Spotify, YouTube)
Events I’m At
dbt Coalesce - Las Vegas. October 7-10. Register here
Matillion Deep Dish Data (virtual event) - TBA. October 23.
OSDC West (virtual) - TBA. Late October.
Helsinki Data Week - Helsinki, Finland. October 28 - November 1. Register here
Austin - TBA. November 7-8.
NYC - Data Galaxy Event. November 13.
Amsterdam - TBA. November 21.
Forward Data Conference - Paris, France. November 25. Register here
AWS re:Invent - Las Vegas. Early December. Doing the after-conference scene. Let’s meet up.
Seoul, Korea - TBA. Mid December.
CES - Las Vegas. Early January 2025.
Data Day Texas - Austin, TX. January 25, 2025. Register here
Data Modeling Zone - Arizona. March 4, 2025. Register here
Winter Data Conference - Austria. March 7, 2025. Register here
Netherlands - TBA. April 2025
Much more to be announced soon…
Thanks! If you want to help out…
Thanks for supporting my content. If you aren’t a subscriber, please consider subscribing to this Substack.
Would you like me to speak at your event? Submit a speaking request here.
You can also find me here:
Monday Morning Data Chat (YouTube / Spotify and wherever you get your podcasts). Matt Housley and I interview the top people in the field. Live and unscripted.
My other show is The Joe Reis Show (Spotify and wherever you get your podcasts). I interview guests on it, and it’s unscripted and free of shilling.
Practical Data Modeling. Great discussions about data modeling with data practitioners. This is also where early drafts of my new data modeling book will be published.
Fundamentals of Data Engineering by Matt Housley and me, available at Amazon, O’Reilly, and wherever you get your books.
Be sure to leave a lovely review if you like the content.
Thanks!
Joe Reis
Such a great summary and super valuable insights!
Maybe we should avoid the top 3 or the winner-takes-all effect. If your provider meets your requirements and is profitable, that’s fine.
Picking the leader or one of the top 3 just to avoid being fired may not be the best answer, or at least should not be the #1 reason.
The current top 3 players may also not remain the top 3 in the years to come.
It would also encourage safe and sane competition and help avoid abuses from leaders.
So, business value first!