Joe's Nerdy Rants #9

Data Modeling - What If We Just Burn It All Down?, plus weekend reads and other stuff

Joe Reis

Jul 15, 2023

Data Modeling - What If We Just Burn It All Down?


Imagine two extremes. On one end, data modeling is done perfectly and harmoniously across the data lifecycle. On the other end, data modeling is ignored and thrown into the dustbin of history. Along this spectrum, where do you think we are as a data industry?

As I’ve been thinking about the state of data modeling for the last several years and where we’re going, I definitely think we’re on the latter end of the spectrum. Universally, when I talk with anyone who handles data (developers, “data people,” etc.), data modeling is forgotten, ignored, and sometimes scoffed at as being “too difficult and slow.” The default is to cobble together whatever data looks good for the task at hand.

I wonder if this is due to a lack of awareness of data modeling, incentives to “just ship” and leave rigorous/formal/resilient practices for another time, or something else. Regardless, the consequences are all over the place.

In this morning’s YouTube show with the Seattle Data Guy, I called today’s data modeling “query-driven modeling.” I suppose we can also call it “just-in-time modeling.” The notion is to react to the question at hand, then move on to the next query…and the next query…and the next query…Sort of like how a puppy gets excited about the world, including pooping all over your nice rug.

If this is where we are as a data industry, it begs the question - does data modeling matter? Apparently, companies that eschew data modeling perform just fine. They make a ton of money, certainly enough to throw costly compute to crunch whatever query needs to be run. And with AI replacing knowledge work anyway, what’s the point of data modeling? Or working, for that matter? AI’s going to replace knowledge work with better knowledge and less work shortly.

My open-ended question to you this week - 🔥What if we just burn it all down?🔥 What if we just forget the old practices and techniques of data modeling ever existed? Would we be fine? If not, why?

I look forward to your answers in the comments :)

Listen to the audio clip above on this topic, which is also my 5 Minute Friday on Spotify.

Cool Weekend Reads

Hope you all had a great week.

Here are some cool things I read this week…

Tech, AI & Data

Software engineers hate code (Dan Cowell)

Code is fun until it’s not…

“Don't write new code when you can use, improve or fix what already exists. If you must write new code, write only what you need to get the job done.”

Software Ate All the Easy Shit (Andrew Rea)

“Personally, I’m not willing to bet my career on beating Open AI, Google, Anthropic, etc. in LLMs. Nor am I willing to bet on being the 1 startup in 100 that finds a way to build an enduring company in “gen AI copywriting” or “gen AI outbound marketing for SDRs” faster than the incumbents or the dozens of venture-backed startups pursuing this.

Most of you probably shouldn’t either.

I am willing to bet that I can find a boring, unsexy problem, in a great market, build a better product, and distribute the shit out of it. Playing my role as a good soldier in the fight to spread the gospel of our Lord and Savior Shareholder Value.”

PoisonGPT: How we hid a lobotomized LLM on Hugging Face to spread fake news

Welcome to the future…

Gossip Protocol (System Design)

Distributed protocols are something I strangely enjoy nerding out on. Enjoy!

Business & Startups

ChatGPT was an 'oh crap' moment for hundreds of CEOs (Insider)

Shareholder-driven development is a thing… See my old post, the Golden Rule of Value.

An e-commerce CEO is getting absolutely roasted online for laying off 90% of his support staff after an AI chatbot outperformed them (Insider)

This CEO rightfully gets a lot of flak for being a douchebag. However, I expect these sorts of layoffs will become the norm, for better or worse.

Sarah Silverman is suing OpenAI and Meta for copyright infringement (The Verge)

The crazy, though unsurprising, part: “ThePile, the complaint points out, was described in an EleutherAI paper as being put together from ‘a copy of the contents of the Bibliotik private tracker.’ Bibliotik and the other ‘shadow libraries’ listed, says the lawsuit, are ‘flagrantly illegal.’”

FIGHTING (Marc Andreessen)

I had a pretty rough childhood and was in countless fights growing up. As a result of my experiences and seeing the world for the gritty place it is, my kids had to learn jiu-jitsu at very early ages. They think they’re better for it, and I agree.

Now that billionaires are challenging each other to fights and dick-measuring contests (is this the modern-day duel?), fighting is cool. That said, Marc’s spot on. The world is dangerous…learn to deal with it accordingly.

New Content, Events, and Upcoming Stuff

This week

Monday Morning Data Chat - #134 - Should Your Business Chase Generative AI? w/ Andreas Welsch (Spotify, YouTube)

Upcoming

Monday Morning Data Chat - Dataframe Deep Dive w/ Devin Petersohn (Live on LinkedIn and YouTube)

The Joe Reis Show - Lots coming up!

Here are some cool upcoming in-person events I’ll be at in June and beyond for 2023

Taking July off…🏔️, except for the virtual Portable Conference and a few other things. My calendar is otherwise completely blocked off from July to early August, so let’s chat over email or similar.

Joe Reis + dbt roadshow - Atlanta (8/10), Seattle (9/7). More details are coming soon.

DataEngBytes. I’ll be on the continental tour in Perth, Brisbane, Melbourne, and Sydney for a couple of weeks. August 2023 (more info and registration)

Big Data London - I’m keynoting. Big up the London Massive. September 2023.

Europe - September 2023 TBA

Dubai - October 2023

India - October 2023

Canada - November 2023

Vegas - re:Invent 2023

More to come…

Thanks! If you don’t mind helping out…

Thanks for supporting my content. If you aren’t a subscriber, please consider subscribing to this Substack.

You can also find me here:

Monday Morning Data Chat (YouTube / Spotify and wherever you get your podcasts). Matt Housley and I interview the top people in the field. Live and unscripted. Zero shilling tolerated.

The Joe Reis Show (Spotify and wherever you get your podcasts). My other show. I interview guests, and it’s totally unscripted with no shilling.

Fundamentals of Data Engineering (Amazon, O’Reilly, and wherever you get your books)

Be sure to leave a nice review if you like the content.

Thanks! - Joe Reis

Carlin Eng

Jul 16, 2023

One of the major issues with data modeling is it's split into two camps, with a never-ending push/pull of who does what, or where the logic goes. The first is the "transformation" camp, where data modeling is the act of producing a set of tables according to some methodology like Kimball or Data Vault. The second is the "semantic layer" camp, where data modeling is the act of linking tables together and defining metrics on top of them.

Neither one of them is great at the end-to-end pipeline -- Team Transformation can never anticipate all the different dimensional cuts required by business users, and ends up in a never-ending spin cycle of fulfilling data requests. Team Semantic Layer usually runs into performance issues when querying fact-level data, and thus inevitably pushes some logic into the transformation layer, at which point, metric definitions are now split across tools.

It's an artificial divide. Both teams are trying to accomplish the same thing, but the current state-of-the-art tooling falls short. The industry needs something that unifies both of these camps. I wrote about this a bit more on my blog, in a post I called "The Data Modeling Divide": https://carlineng.com/?postid=data-modeling-divide#blog


Andria Campbell

They perform “just fine” in spite of, not because of, “just-in-time modeling.” No company is ever going to publicly expose its struggles. I worked for a company, a very big one, and we provided reporting “one query at just the right time,” and it was a nightmare. We spent so much time asking why this result didn’t match that result and looking like fools to each other and in front of customers. The business units are now cannibalizing each other because market conditions have shifted. You can get away with a lot, or should I say you can get by with a lot, when the external factors are a wind at your back. Those chickens always come home to roost. Fundamentals are fundamental for a reason.
