I’m back. I took last weekend off from my 5 Minute Friday and this newsletter due to AWS re:Invent and last-minute travel plans over the weekend for my cousin’s wedding. After next week, I won’t be traveling nearly as much. Instead, I will hide in the Bat Cave (my study), focusing on my upcoming course and book. Stay tuned for updates very soon, especially regarding the book.
Thanks,
Joe Reis
Data LARPs
Hark! Fellow data nerd. Gather ‘round as I regale you with tales of data projects that fail to enter the Kingdom of Production. In our world of Data LARPing, we valiant heroes of the Realm of Make Believe merely pretend to do “data things” while offering our daring stakeholders nothing in return.
For some reason, I’m highly intrigued by LARPs. Imagine full-grown adults gathering together for cosplay, battle, and more in city parks or Walmart parking lots. Identities get wrapped up in LARPing. Pretending to be somebody else is a big investment of time and energy. I imagine people have fun LARPing, so that’s cool.
What does this have to do with data teams? From my experience, there are two types of data teams. On one end, data teams push stuff to production and add value to real-world business functions. On the other end, there are data teams working in a vacuum, doing “data stuff” that will never see the light of day. These are the Data LARPs: data teams that go through the motions and pretend to add value.
LARPs are everywhere. In your city, knights and wizards might LARP in your local park or sword fight in a Walmart parking lot. Most of Corporate America is one massive LARP, with millions of people clocking into the job and producing little to no value. The wonderful book Bullshit Jobs is a must-read if you’ve felt like most jobs are, well, bullshit. People attend meetings, pretend to be interested, and get 30 minutes of actual work done in an 8-hour day. But I digress…
In Data LARPs, data teams produce very little that the business uses. Often, the data team knows this. I can’t say they’re entirely to blame. I’ve been there myself, being sold a bait and switch of “do awesome data stuff at our company,” which turns out to be a nothing burger. This happens a ton. Often, there’s no data (or the data is in terrible shape), or teams work on problems that won’t move the needle for the business. Especially with the hype around AI, there will be a ton of Data LARPing. Every company wants to get into AI. Very few will get anywhere. Same as it ever was.
One symptom of Data LARPs is POC Purgatory[1], where data teams spend their time building proofs of concept that never make it to production. POCs feel like work, but they’re just toys. Another symptom is designing complex architectures with shiny new technologies that look great on a diagram but have no utility in the real world and are usually impossible or expensive AF to implement. If you spend a lot of time working on stuff nobody uses, you’re probably LARPing.
What can be done? Start by figuring out whether you’re in a Data LARP. If you are, here’s some advice. First, determine whether you can get anything done at your organization. If the situation is intractable, understand that you will be spinning your wheels. The best choice is to move on and find an organization working on exciting problems from which the business gets value. Life is too short.
If there’s hope you can move away from Data LARPing, you need to get quick, visible, and tangible wins for your stakeholders. There’s likely a bunch of people cheering you on from other departments. Prove them right. Do stuff that adds value and moves the needle, no matter how small. Success begets success. People want to attach their names to the winners. Over time, small wins become big wins. This might be a report or ML model getting adopted within a department, adding so much value that people within that department start telling their coworkers. The flywheel of success starts, and away you go. You move out of Data LARPing and into doing real, valuable work.
Fellow brave data nerd, this is ye path to the Kingdom of Production.
Listen to the audio clip above on this topic, which is also my 5-Minute Friday on Spotify.
Cool Weekend Reads
Here are some cool things I read this week. Enjoy!
Tech, AI & Data
Google says new AI model Gemini outperforms ChatGPT in most tests (The Guardian)
The big news this week was Google’s release of Gemini. I am very curious to play with it and see what others experience. Is it better than ChatGPT? Who knows. We’ll find out soon.
Europe agrees landmark AI regulation deal (Reuters)
Literally hot off the press. The EU AI Act has finally been agreed upon. I chatted behind the scenes with players in this act, and the agreement was far from certain. I am glad to see it finally moved forward.
Here’s some great analysis from my good friend Juan Sequeda about using knowledge graphs to alleviate hallucinations when using LLMs to query SQL databases. I was lucky enough to catch Juan’s talk a few months ago in London at the Alan Turing Institute lecture series, and I can tell you this is the real deal. I'm really excited to see where LLMs and knowledge graphs go from here.
Extracting Training Data from ChatGPT
It turns out you can query ChatGPT and extract the training data. This is a good summary (and link to the research paper) of how this attack can be done. Around this time last year, I quipped that LLMs will be a new breeding ground for SEO-style content. By seeing the training data, you’ll know how to game LLMs to produce “answers.”
Joe’s pet peeve - the knowledge and skills gap (YouTube)
Airbyte was kind enough to have me speak at their data(move) conference. Here’s my 10-minute rant about the biggest gap I see in our industry - lots of great tools, tons of bad practices, and poor knowledge/skills on using these tools to their fullest potential.
Business & Startups
From Unicorns to Zombies: Tech Start-Ups Run Out of Time and Money (NYTimes)
The startup world is about to start looking like The Walking Dead. This is an excellent article describing the hellscape many startups from the Covid era (and some from before) are entering. It won’t be great. Not at all. Zombie Nation is a fitting tune.
Regretful Accelerationism (Stratechery)
“…the Internet stripped away the constraint of physical distribution, and now AI is removing the constraint of needing to actually produce content. That this is spoiling the Internet is perhaps the best hope for finding our way back to what is real.”
Great analysis (as always) from Ben Thompson on the impacts of generative AI on content.
First decide how to decide: “one weird trick” for easier decisions (Jacob Kaplan-Moss)
“The heart of this process – the move that I think makes it work so well – is that it includes an explicit step to first decide how to decide. That is: when a decision appears that it’ll be controversial or difficult to make, instead of immediately starting to discuss the matter at hand, the stakeholders first come to an agreement about how they’ll eventually decide. In fact, this happens twice: first at the macro scale when the organization agrees to adopt this process overall, and then in the micro scale, for each individual decision.”
The Terrible Twenties? The Assholocene? What to Call Our Chaotic Era (New Yorker)
There’s little argument we live in a very turbulent world right now. This article perfectly captures the insanity, joy, and craziness of our time.
New Content, Events, and Upcoming Stuff
Monday Morning Data Chat
Coming up…
Tristan Handy, Mike Ferguson, and more…
In case you missed it…
Sol Rashidi - Getting Business Value From Data, the CXO Playbook, and more (Spotify, YouTube)
Sarah Nagy - Automating Analytics w/ Generative AI (Spotify, YouTube)
Dave McComb - Knowledge Graphs, Semantics, and More (Spotify, YouTube)
EU AI Act w/ Kai Zenner (Spotify, YouTube)
The Joe Reis Show
Coming up…
Ben Rogojan, Eleanor Thompson, and more…
This week…
5 Minute Friday - Data LARPs (Spotify)
Matt Harrison - Self Publishing Technical Books, Working With Publishers, Book Piracy (Spotify)
In case you missed it…
Karin Wolok - All Things DevRel and More (Spotify)
5 Minute Friday - “This Was All Predictable” (Spotify)
Peggy Tsai - Setting CDOs Up For Success (Spotify)
Bill Inmon - History Lessons of the Data Industry. This is a real treat and a very rare conversation with the godfather himself (Spotify) - PINNED HERE.
Events
December
Boston - 12/13 and 12/14
2024
dbt + Joe Reis Roadshow (Dallas) - TBA
Data Day Texas (Austin) - register here
Data Modeling Zone (Arizona) - register here
Skiers in Data (Switzerland) - March, TBA
Saudi Arabia - March, TBA
London - May, TBA
Malaga, Spain - May, TBA
Berlin, Germany - May, TBA
Morocco - May, TBA
Vancouver, BC - June, TBA
South Africa - TBA
Dubai - TBA
Australia - TBA
Asia - TBA
Thanks! If you want to help out…
Thanks for supporting my content. If you aren’t a subscriber, please consider subscribing to this Substack.
You can also find me here:
Monday Morning Data Chat (YouTube / Spotify and wherever you get your podcasts). Matt Housley and I interview the top people in the field. Live and unscripted. Zero shilling tolerated.
The Joe Reis Show (Spotify and wherever you get your podcasts). My other show. I interview guests, and it’s unscripted with no shilling.
Fundamentals of Data Engineering (Amazon, O’Reilly, and wherever you get your books)
Be sure to leave a nice review if you like the content.
Thanks! - Joe Reis
[1] Thanks to Sol Rashidi for this one-liner.
Proof of concept purgatory. Sol nailed it. Great quote.