Joe's Nerdy Weekend Reads #4
It's the weekend, so time to unwind. Pour a cup of coffee (or many) and enjoy some interesting reading.
Happy Saturday!
It was conference week in the USA, and I ended up skipping Data Universe and Google Cloud Next to spend some much-needed time at home. The conferences looked super fun, and I definitely felt some FOMO. That said, it’s nice to be domestic for a bit, as I am about to start another travel bender next week.
This week, I spent quite a bit of time writing and editing the early chapters of my book. I expect to have the chapters available on Practical Data Modeling starting next week (FYI, they will be paywalled). I’m also drafting some articles, to be released “soon.”
Otherwise, most of my published content this week was podcasts and live shows, which I’ll recap for you.
Katharine Jarmul kicked the week off on the Monday Morning Data Chat, where we talked about whether we’re solving the “right” problems with AI. Katharine is a world-class security and privacy expert, and we had a really candid and great chat. Check it out here.
I just released a 5-Minute Friday called “Data Oceans” about how we need to think beyond rows and columns. There’s a massive world of unstructured datasets out there, especially images and video. Most of this data just sits idle on hard drives. What are we doing to unlock meaning and value from this data? I’ll write more about this in articles and my upcoming data modeling book (expected to be released in late Summer 2024).
If you’re into data modeling, check out the really awesome interview with Keith Belanger, a data modeling OG and an all-around good guy. Also, I’ve got a great podcast dropping next week with data legend Kent Graziano. The conversation went way different than I expected, which is cool.
Next week, I’ll be locked away, recording my upcoming data engineering course. It’s massive in terms of content and really complements the Fundamentals of Data Engineering. I think you’ll really enjoy it!
Right after that, I’m off to Chicago for a live broadcast for Matillion’s Deep Dish Data Series on Tuesday, 4/23. It's going to be great, so register here.
Anyway, off to watch my son’s climbing competition. Have a great weekend!
Thanks,
Joe
P.S. If you want to take your data career to the next level, check out my upcoming course with
. Get a 25% discount here.
If you haven’t done so, please sign up for Practical Data Modeling. There are lots of great discussions on data modeling, and I’ll also be releasing early drafts of chapters for my new data modeling book here. Thanks!
Cool Weekend Reads
Here are some cool things I read this week. Enjoy!
The new Standard for LLM Benchmarking (Mir’s Data Report)
“According to my research, no model was accurate on more than 50% questions (~20-40% was the range for most). Great, you say. Next year we will have 2x better models, so they will have 100% accuracy. No, they won’t! Next year, improvement in technology will allow me to introduce at least 2x more complexity into my benchmarking dataset for the same amount of time and resources, likely lowering the performance of these models further by 2x, so the end results will be similar.
This is because code generation is inherently a relative problem. No one today cares much whether ChatGPT can write more code than the average programmer from the 1980s. What matters is whether ChatGPT can replace a modern-era engineer, data analyst, etc. And if we take me as a benchmark for the level of an average Data Analyst, the answer is resounding: No - raw foundational models are not enough by themselves to replace the human coder.”
Code generation is hard, and I’m hearing mixed feelings about it from developers. Some like it, some hate it, and others are meh. But, like it or not, code generation is here to stay and is getting integrated into all sorts of workflows and products.
There are many whitepapers and a lot of research on text-to-SQL with LLMs. That’s cool, but I haven’t seen a place where someone can easily interact with and benchmark these models on their own questions.
Try out the benchmark yourself.
Gemini 1.5 and Google’s Nature (Stratechery by Ben Thompson)
Google announced a ton of new AI products and features. In this article, Ben Thompson discusses the big announcements. Google’s definitely at a crossroads. The nature of the web is changing with LLMs. Google’s got the best dataset on the planet, IMO. But how do they move from selling search ads to selling AI products? We’ll see, but if there’s any company that has the potential for this, it’s Google. At the same time, Google has great products and potential, but they’re also colossally bad at marketing their data/AI products and at earning trust that those products will stick around.
Data Platform Explained (Spotify Engineering)
“Since the beginning, Spotify has been a data-driven company. Today, we rely on insights that are drawn from a staggering 1.4 trillion data points processed daily.”
That's a LOT of daily events to process! Spotify’s data platform is legendary. This article is a primer on Spotify’s motivations for its data platform. Stoked to read more detailed articles from them.
Teachers are using AI to grade essays. Students are using AI to write them (CNN Business)
Generative AI in education is the Wild West. I speak with teachers, professors, and students, and I don’t get the sense anyone has a game plan. Teachers use ChatGPT to write lessons and grade assignments, and students use it for all sorts of stuff. I’m very curious whether generative AI will augment the learning experience or whether it’s the road to Idiocracy. I’m thankful my kids aren’t too keen on ChatGPT, preferring to do things the old-fashioned way - reading and trying to learn things.
Survey shows that teenagers are using more VR devices in the US (9to5Mac)
“According to the survey published this week, the weekly use of virtual reality devices increased from 10% to 13% compared to the fall of 2023. Although this is still a low figure, it shows that people are becoming more interested in such technologies.
At the same time, the Piper Sandler study points out that 33% of teenagers in the US now own a VR device. Last year, that figure was 31%.”
I had no idea that device usage and teenage ownership of VR are this high. My kids play with VR headsets once in a while but spend way more time playing console video games.
Other cool reads…
Typosquatting Campaign Targets Python Developers (Phylum)
The lifecycle of a code AI completion (Sourcegraph)
Meta’s new AI chips run faster than before (The Verge)
‘Social Order Could Collapse’ in AI Era, Two Top Japan Companies Say (WSJ)
The dry sky: future scenarios for humanity's modification of the atmospheric water cycle (Cambridge)
What Phones Are Doing to Reading (The New Yorker)
Hype Deflation & Inflation (Investing 101)
New Content, Events, and Upcoming Stuff
Monday Morning Data Chat
Coming up…
Solomon Kahn, David Yaffe & John Kutay, and more…
In case you missed it…
Katharine Jarmul - Are We Solving the “Right” Problems with AI? (Spotify, YouTube)
Matt Turck - The 2024 MAD Landscape (Special Show) (YouTube)
Cedric Chin & Sam Taylor - Communicating Sophisticated Stuff to Stakeholders (Spotify, YouTube)
Martin Musiol - Generative AI: Navigating the Course to the AGI Future (Spotify, YouTube)
Tony Baer - The Outlook for Generative AI in 2024 (and Beyond) (Spotify, YouTube)
The Joe Reis Show
Coming up…
and many more!
This week…
5 Minute Friday - Data Oceans (Spotify)
Keith Belanger - The Art of Data Modeling (Spotify)
In case you missed it…
5 Minute Friday - Your Mileage WILL Vary With Analytical Data Modeling (Spotify)
Kishore Aradhya - Teaching Tech and Data in a FAST Moving World (Spotify)
Toby Mao - SQL Mesh, Simplifying Data Transformations, and more (Spotify)
Angel Narciso - Live From LEAP Riyadh, Saudi Arabia! (Spotify)
5 Minute Friday - The Inverse Relationship of Talking About Value vs Adding Value (Spotify)
Bill Inmon - History Lessons of the Data Industry. This is a real treat and a very rare conversation with the godfather himself (Spotify) - PINNED HERE.
Events I’m Speaking At
Matillion - Deep Dish (Virtual) - April 23. Register here
J On the Beach (Malaga, Spain) - May 6-10. Register here
GenAI Conference (London) - May 20-22. Register here
DAMA Days (Vancouver, BC) - June 14th, TBA
AI Quality Conference (San Francisco) - June 25th. Register here. Rumor has it I’ll also be DJing there…
(Taking the Summer off, sort of…)
Big Data London - September, TBA
DataEngBytes (Australia) - Late September/Early October, TBA
Gitex (Dubai) - Fall, TBA
Helsinki Data Week - Fall, TBA
Lots of other stuff in Europe - Fall, TBA
Asia - Fall, TBA
Thanks! If you want to help out…
Thanks for supporting my content. If you aren’t a subscriber, please consider subscribing to this Substack.
Would you like me to speak at your event? Submit a speaking request here.
Want to sponsor this newsletter? Fill out this short form.
You can also find me here:
Monday Morning Data Chat (YouTube / Spotify and wherever you get your podcasts). Matt Housley and I interview the top people in the field. Live and unscripted. Zero shilling tolerated.
My other show is The Joe Reis Show (Spotify and wherever you get your podcasts). I interview guests on it, and it’s unscripted and free of shilling.
Practical Data Modeling. Great discussions about data modeling with data practitioners. This is also where early drafts of my new data modeling book will be published.
Fundamentals of Data Engineering by Matt Housley and me, available at Amazon, O’Reilly, and wherever you get your books.
Be sure to leave a lovely review if you like the content.
Thanks!
Joe Reis