Lots of Fancy Tools, and No Idea How to Use Them
Joe's Nerdy Rants #12 - Weekend reads and other stuff
There are a lot of data tools out there today. To say we live in an age of abundant data tooling is a serious understatement. For pretty much every conceivable data problem, there’s a tool to solve it. Meanwhile, we’re still talking about how to derive “business value” and other overdone tropes. What gives?
I don’t think we can blame tools at this point. For one thing, they’re inert; on their own, they don’t do much. For another, tools these days are awesome and only improving. If you want to use a “cloud data platform” or a “data lakehouse”, along with observability, catalogs, quality checks, LLMs, and anything else you can think of, all of these tools are readily available. And if you think tools today are terrible, I urge you to reflect on the good ol’ days of software even a decade ago. I doubt many people want to go back to that time.
So if tools are great, why are we still struggling to “deliver business value”? To me, the biggest issue in the data industry today is the massive knowledge gap around the basics of data and the best practices for the tools we use. We have all of these fancy toys, but from what I’ve seen, we’re often not using them to their full capabilities. Oftentimes, misusing tools causes more harm than good. Examples include dbt model sprawl and expensive Snowflake bills.
Some argue that vendors intentionally push this sort of bad behavior, but I disagree. While vendors are out to make money, they also offer best practices for operating on their platforms. If you take the time to read their documentation, vendors actually show you ways to reduce consumption and avoid doing dumb things. If users don’t take the time to educate themselves on the best practices for the tools they use, is that the vendor’s fault?
The other issue is a lack of awareness of basic data practices, such as writing clean code, architecture patterns, data modeling, and stakeholder management. This manifests in incoherent data, imploding data systems, expensive bills, spaghetti-code sprawl, and stakeholders who wonder what the hell the data team does all day. Data teams need to standardize their knowledge and competencies, both in the basics of data and in the tools they use. If data teams can do this, I think we’ll see a massive increase in the value they provide to the business.
As the old saying goes, Read The F*cking Manual.
Listen to the audio clip above on this topic, which is also my 5 Minute Friday on Spotify.
Cool Weekend Reads
Hope you all had a great week.
Here are some cool things I read this week…
Also…I’m holding off on posting anything about superconductors for now, since it’s unclear whether this is the real thing or total BS (which has happened a lot in the past with this sort of thing). But if superconductors are the new AI or crypto, cool ;)
Tech, AI & Data
Generative AI and Firm Values (National Bureau of Economic Research)
This is one of the most fascinating things I’ve read all summer. It examines how exposed the workforces of various publicly traded firms are to AI, along with the expected earnings of these firms. It will be interesting to see how these predictions pan out in the real world.
Patterns for Building LLM-based Systems & Products (Eugene Yan)
This is a super useful framework for building LLM-based systems and products.
AI and the Frontier Paradox (Sequoia Capital)
“We’ve been calling many different technologies AI for more than half a century. What we think of as AI is going to change again. And again.”
Yep…at a super early ML startup I worked at in the early 2010s, we actually banned the word “AI.” Even “machine learning” seemed very edgy. I like this article because it covers the Frontier Paradox - “Because we ascribe to humans the frontier beyond our technological mastery, that frontier will always be ill-defined.” The future will always be…the future.
As an author, I imagine publishing a book is a lot like seeing your kids off to adulthood. I guess Matt Housley and I raised a good kid who’s done well in the world :)
It’s always a trip to see unsolicited and unexpected reviews of your work. Though it’s been one year since the book was published, seeing stuff like this never gets old. Thanks, Karen Zhang!
Four ways to shoot yourself in the foot with Redis (Phil Booth)
Good advice for avoiding dumb mistakes in Redis. I feel like many of these lessons apply to other databases too - running a single instance, having no alerts on memory or other usage, etc.
Business & Startups
Greg Rutkowski Was Removed From Stable Diffusion, But AI Artists Brought Him Back (Decrypt)
“As the world of AI and art continues to evolve, the line between innovation and infringement becomes blurrier—especially when words like “styles,” “decentralization,” and open source come into play. The only certainty? The art community isn't shy about shaking up the digital canvas.”
Artist creates cool art, opts out of AI, and gets roped back in anyway. My oldest son is a professional artist (he’s also 12), and I’m very curious about what the world of art and creativity means for him as he gets older. I frankly have no clue…
Seed stage holds steady in Q2 2023 (Carta)
It’s not all doom and gloom out there. Seed is where the next opportunities are, and let’s hope some seed companies grow!
Inside Apple’s India Dream (Nikkei)
India vs. China is the story of this decade, and one that will play out over the next 30 years. Apple is just one of many companies to pay attention to in India and beyond. I’m also curious whether China becomes the next Japan or the next superpower. Time will tell…
New Content, Events, and Upcoming Stuff
This week
Monday Morning Data Chat - Why Apache Iceberg Won the Table Format War w/ Brian Olsen (Spotify, YouTube)
In case you missed it…
Monday Morning Data Chat - Why Your BI Team is Your Best Bet for Data Science w/ Dave Langer (Spotify, YouTube)
Monday Morning Data Chat - Dataframe Deep Dive w/ Devin Petersohn (Spotify, YouTube)
The Joe Reis Show
Scott Taylor - Being a Storyteller, Speaker, Creator, and Influencer in the Data Space (Spotify)
In case you missed it…
Ryan Boyd - Small Databases are Motherducking Awesome! (Spotify)
Kai Zenner - The Evolution, Challenges, and Potential of the EU AI Act (Spotify)
Benny Benford - Elevating Data to a Profession (Spotify)
Joshua Bowles - A Wide-Ranging Chat on ML and AI (Spotify)
Upcoming
Monday Morning Data Chat - The Rise and Importance of Business Language w/ John O'Gorman (LinkedIn, YouTube)
The Joe Reis Show - Lots coming up! Vin Vashishta, Kevin Hu, Gordon Wong, and many more…
August
US - Engineering Leadership Utah Meetup (8/8), Joe Reis + dbt roadshow - Atlanta (8/10), Utah Data Engineering Meetup (8/16) - Joe + Mage.ai!
Australia - DataEngBytes (August 2023) - I’ll be on the continental tour in Perth, Brisbane, Melbourne, and Sydney for a couple of weeks.
September
Joe Reis + dbt roadshow - Seattle (9/7) - details soon
Big Data London - 9/20
Europe - TBA
October
India - 10/12. Details TBA
Dubai - 10/16-10/19. Details TBA
Chicago - 10/26 - Details TBA
November
Canada - DAMA Toronto. Details TBA
Las Vegas - AWS re:Invent - I’ve got a massive special announcement in store :)
2024 - lots of stuff. Stay tuned :)
Thanks! If you don’t mind helping out…
Thanks for supporting my content. If you aren’t a subscriber, please consider subscribing to this Substack.
You can also find me here:
Monday Morning Data Chat (YouTube / Spotify and wherever you get your podcasts). Matt Housley and I interview the top people in the field. Live and unscripted. Zero shilling tolerated.
The Joe Reis Show (Spotify and wherever you get your podcasts). My other show. I interview guests, and it’s totally unscripted with no shilling.
Fundamentals of Data Engineering (Amazon, O’Reilly, and wherever you get your books)
Be sure to leave a nice review if you like the content.
Thanks! - Joe Reis