6 Comments

One of the major issues with data modeling is that it's split into two camps, with a never-ending push/pull over who does what and where the logic goes. The first is the "transformation" camp, where data modeling is the act of producing a set of tables according to some methodology like Kimball or Data Vault. The second is the "semantic layer" camp, where data modeling is the act of linking tables together and defining metrics on top of them.

Neither one of them is great at the end-to-end pipeline -- Team Transformation can never anticipate all the different dimensional cuts required by business users, and ends up in a never-ending spin cycle of fulfilling data requests. Team Semantic Layer usually runs into performance issues when querying fact-level data, and thus inevitably pushes some logic into the transformation layer, at which point, metric definitions are now split across tools.
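
A toy sketch of that trade-off, in case it helps. Everything here is made up for illustration: the `ORDERS` rows, the `revenue_by` helper, and the `REVENUE_BY_REGION` table are hypothetical names, not anything from a real tool.

```python
from collections import defaultdict

# Hypothetical fact-level rows: (region, channel, revenue)
ORDERS = [
    ("EU", "web", 100.0),
    ("EU", "store", 50.0),
    ("US", "web", 200.0),
    ("US", "web", 50.0),
]

# Column positions of the dimensions in each fact row
DIMENSIONS = {"region": 0, "channel": 1}


def revenue_by(rows, *dims):
    """Semantic-layer style: one metric definition, sliced at query time.

    Scans every fact row on each call, which is where the performance
    problems show up on large fact tables.
    """
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[DIMENSIONS[d]] for d in dims)
        totals[key] += row[2]
    return dict(totals)


# Transformation style: a table pre-aggregated for one anticipated cut.
# Cheap to read, but "channel" is gone, so a request for revenue by
# channel means another table and another copy of the metric logic.
REVENUE_BY_REGION = revenue_by(ORDERS, "region")
```

The pre-built table answers the region question cheaply but cannot answer the channel question at all, while `revenue_by(ORDERS, "channel")` can answer it only by going back to the fact rows.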

It's an artificial divide. Both teams are trying to accomplish the same thing, but the current state-of-the-art tooling falls short. The industry needs something that unifies both of these camps. I wrote about this a bit more on my blog, in a post I called "The Data Modeling Divide": https://carlineng.com/?postid=data-modeling-divide#blog

Author · Jul 16, 2023 · edited Jul 16, 2023

Good points.

And it doesn't stop with data analytics. SWE teams are a much bigger culprit of poor or nonexistent data modeling. All of the analytical modeling approaches you mention have to duct-tape together whatever data they get from upstream producers, like SWE teams and 3rd-party APIs.


They perform “just fine” in spite of, not because of, “just in time modeling”. No company is ever going to publicly expose their struggles. I worked for a company, a very big one, and we provided reporting “one query at just the right time” and it was a nightmare. We spent so much time asking why this result didn’t match that result and looking like fools to each other and in front of customers. The business units are now cannibalizing each other because market conditions have shifted. You can get away with a lot, or should I say you can get by with a lot, when the external factors are a wind at your back. Those chickens always come home to roost. Fundamentals are fundamental for a reason.

author

“Fundamentals are fundamental for a reason”

Yep. Now we just need to find a way to teach these fundamentals.


Thanks for sharing this. I would be curious how you would design it differently from the ground up to potentially solve these downstream problems!


Burning everything down is quite an aggressive way to start fresh! I’m not saying we should burn everything down, but perhaps these ideas have something in them.

I’m a huge fan of Just-in-time delivery, and I think everything that’s not in use is waste. And a lot of data modelling really feels like planning for storage, not for use 🙅

Aside: ring-ring, the 1950s called, Toyota wants their Industrial Engineering methods back 😆

If you can live with the costs of “query-driven modelling”, by all means go for it! It’s just a business decision, one that needs to be made knowing the cost of compute, work, duplicate work, quality, delivery speed and such. And, most importantly, understanding the opportunity cost of actually doing “proper” modelling instead.

I do agree that a lot of data modelling stuff should be burned to the ground! Joe, you’ve recently used a lot of MMA analogies, so I’m going for martial arts as well 🤓

A lot of the old-school eastern martial arts have a similar vibe to many data modelling practices. They are really over-dogmatic, closed and often person-centric. One needs to practice forms for 10 years and then one might learn “The Secret Way”, that’s ofc better than the other person’s “The Secret Way” 🤪 It was the 90s and MMA that revealed the lack of pragmatism in many of these styles.

Not saying that there is no value at all in these styles, one just might need to study them through a historical lens. There are a lot of good underlying ideas in them! Like, BJJ traditionally doesn’t have leg locks, but now they are being introduced to it. This has led to all kinds of funky things, where some dogmatists don’t believe in leg locks and don’t tap to them, with catastrophic consequences 👀

Back to data. Some books that I’ve tried to consume recently have felt really shallow, almost as if they haven’t really been pragmatically tested out. This has led me back to trial and error and experimentation with a couple of recent challenges that I’ve had 🧑‍🔬

Can we have our MMA moment in data (modelling)? Where’s our octagon where we throw different styles in to fight? How can we evolve our craft pragmatically?

So what would happen if we just burned it all down? I’m envisioning some UFC 1 style chaotic fights. Perhaps not the prettiest thing, but we’re going to find out what kinds of things work, see all kinds of things tried out, and evolve to something pragmatic 🧐
