4 Comments
Jul 22, 2023 · edited Jul 22, 2023 · Liked by Joe Reis

Great article. The world of data engineering may feel a bit lukewarm on Kimball models, since some argue they don't scale as well as Data Vault or One Big Table. Still, I feel Kimball is more widely used now than at any other point in time, because it is the default way to model data in self-service business intelligence (BI) applications such as Power BI and Tableau.

And BI is ten times bigger than DE in usage, and I say that as a DE myself.

I think an argument can be made that we're living in an era where it is common for a large organisation to use multiple types of data model, whereas 10 to 20 years ago you would use only Kimball and be called crazy for questioning it (though I could be wrong; I was still in school then!).

Nice. Very similar thoughts. I think this applies well to what you have discussed for companies that have enough funds to drive data-driven change: data modelling based on semantics to support LLMs, data governance over usage and access scope, and ML being plug-and-play for most basic use cases (which is what most businesses need to begin with). The problem, to me, is that with high interest rates, companies are not in growth mode but in profit mode. What we see is companies scaling back their use of expensive SaaS tools, putting better data governance or observability in place to limit usage automatically, doing more optimisation, and migrating the repetitive workloads that rarely change to cheaper fully or partially open-source solutions. I think many companies using cloud solutions are in a honeymoon period: once their data or resource usage scales, or when they have to cut costs, they will eventually need to invest in talent and tools that optimise those costs. ML development is now very, very cheap thanks to transformers, but again, I see the same pattern that played out with big data engineering repeating itself with ML.

Just because there is unlimited compute to put anything on ML these days does not mean the value returned is worth the cost. I think companies need to keep their expectations grounded and focus on the long data journey instead of just the features delivered at the start. Companies need to be proactive and know how to optimise data costs, which can be a two-to-three-year journey; they cannot be reactive and expect this to be delivered within a quarter or two. In other words, cost optimisation should be bundled into the full roadmap when adding new features to your product, and counted as part of those features' cost, to avoid bigger bills later. If there is one thing from my time in the data world that I wish I, and everyone else, had known when starting a data career, it is this. I call it being "environmentally friendly" to your data world: you may never see the long-term good you do for your company, because you will probably job hop to another company later on. You do it so that future generations of people being onboarded inherit less technical debt. I know it's a cliché, but an ounce of prevention is worth a pound of cure.

Jul 22, 2023 · Liked by Joe Reis

One overlooked gem is The Kimball Group Reader, a book collecting articles from 1995 to 2015. It gives a good overview of Kimball's thinking in easy-to-read pieces; I'm currently reading a little bit every day. The Kimball Design Tips are also available at: https://www.kimballgroup.com/category/articles-design-tips/

Great article. This is why we still teach the ER model, the star schema, Data Vault and Anchor modeling in our program, and touch on NoSQL options and graphs, so students can make informed decisions.