The Software and Data Engineer Divide
The other day, I ran into an old friend at the park. She’s a principal-level software engineer who works at a company that sells an ML-heavy product. This isn’t a huge company, and I know a few other people who work there. So I asked her if she knew them and, if so, how they were doing. She said she didn’t know them and explained that her “engineering team” is very distanced from the “data team.” But wait, isn’t ML the product? Wouldn’t dev and data work side by side? Astonishingly not, though I should have known better.
If you work at a typical company, have you noticed a division between “engineering” and “data”? This divide is common in companies that are either mature (the divide has had time to harden) or built on preconceived notions of how the two should interact. In most situations, they don’t interact. Engineering throws stuff to the data team and expects the data team to deal with it. As the old saying goes, shit flows downhill; in this case, to the data teams. This contributes to the old trope that “data scientists spend most of their time finding, cleaning, and making sense of data.”
Once upon a time, data flowed in a one-way path. Applications and ERP systems (among other sources) would provide the data upon which to build reports. Reports were consumed by “decision makers” for “actionable insights.” That world still exists, but data is now far more critical to a business. I’m sorry if your company is still stuck in the reporting-only mindset. Reporting is hard, and best of luck.
The world has moved on from data teams providing basic reporting. It did that a while ago. Beyond simple CRUD transactional use cases, data is becoming integrated into applications. The ubiquity of ML/AI, streaming data, reverse ETL (I wish this term would disappear), and other feedback loops back to the application means data isn’t something that lives outside the application. Data is the application. Other forces like data products, data mesh, and data-centric models will nudge this along too. Inevitably, the software and data divide will disappear.
Listen to the audio clip above on this topic, which is also my 5 Minute Friday on Spotify.
Cool Weekend Reads
Tech, AI & Data
Why an Octopus-like Creature Has Come to Symbolize the State of A.I. (NY Times)
What does an awesomely scary HP Lovecraft character called Shoggoth have to do with AI? A lot…
Production AI systems are really hard (Methexis)
Remember several years ago when radiologists were going extinct? It’s easy to prognosticate and hard to execute in the real world.
Modern AI is Domestication (The Gradient)
Is domesticating AI similar to animal domestication?
How to evaluate dependencies (Phil Booth)
“One of my stock interview questions goes: "When picking between dependencies to use in production, what factors contribute to your decision?" I'm surprised by how often I receive an answer along the lines of "Github stars" and not much else. I happen to think Github stars is a terrible metric for selecting production code, so this post sets out my idea of a healthier framework to evaluate dependencies.”
It also starts with my favorite piece of advice - RTFM
Is 20M of rows still a valid soft limit of MySQL table in 2023? (Yisheng's blog)
Back in the days of HDDs, the common wisdom was don’t put more than 20M rows in a MySQL table. Is that still valid? Read on…
My Approach to Building Large Technical Projects (Mitchell Hashimoto)
One of the best articles I’ve seen in a bit for tackling large projects
Is Avoiding Extinction from AI Really an Urgent Priority? (AI Snake Oil)
“The history of technology to date suggests that the greatest risks come not from technology itself, but from the people who control the technology using it to accumulate power and wealth. The AI industry leaders who have signed this statement are precisely the people best positioned to do just that. And in calling for regulations to address the risks of future rogue AI systems, they have proposed interventions that would further cement their power. We should be wary of Prometheans who want to both profit from bringing the people fire, and be trusted as the firefighters.”
With the mass scare around AI-driven extinction being driven loudly by big tech companies, what are their ulterior motives?
Lawyer cites fake cases invented by ChatGPT, judge is not amused (Simon Willison)
As a reminder, ChatGPT is full of shit. Use it at your own risk. If you are clumsy or lazy enough to use ChatGPT to cite imaginary law cases before a judge, you deserve what’s coming at you.
AI Camera with no lens 🤯 (The Prompt)
This is pretty fricking cool and creepy.
ChatGPT took their jobs. Now they walk dogs and fix air conditioners.
Welcome to the world where once-safe white-collar work will increasingly change or disappear. Expect a rant on this very soon.
Business & Startups
Company Lifetime Value (Rohit Krishnan)
“Companies today don’t just become larger than any before in record time, they lose it too in record time.”
Vint Cerf’s Career Advice for Engineers (IEEE Spectrum)
Brilliant career advice for engineers, but it also applies to most fields.
The Next Larger Context (Camille Fournier)
As the old saying goes, what got you here won’t get you there.
Billionaires and the Evolution of Overconfidence (Forking Paths)
Is overconfidence a quirk that could make you rich AF?
How to Hire a Pop Star for Your Private Party (New Yorker)
Really fun to read about the insanely lucrative side hustles of celebrities.
New Content, Events, and Upcoming Stuff
This week
5 Minute Friday - The Software and Data Divide Needs to End (Spotify, or find wherever you get your podcasts)
Monday Morning Data Chat - Big & Small Data. It’s just Matt and me, which is a cool change. (Spotify, or find wherever you get your podcasts)
The Joe Reis Show
Brian Greene - How SWE influences Data Engineering, and much more (Spotify, or find wherever you get your podcasts)
John Giles - The Power of Using Data Model Patterns (Spotify, or find wherever you get your podcasts)
Upcoming
Monday Morning Data Chat - Saket Saurabh
The Joe Reis Show - John Kutay, and many others
Here are some cool upcoming in-person events I’ll be at in June and beyond for 2023
Vancouver BC DAMA. Friday 6/23
Ethan Aaron Low Key Happy Hour - Vegas Edition. Monday 6/26.
Data Engineering Meetup, San Francisco Edition - Tuesday, 6/27 (register here)
Striim event in San Francisco - Wednesday, 6/28. Details TBD
Joe Reis + dbt roadshow - Seattle, Atlanta, Chicago, and more. Details are coming soon.
Taking July off…🏔️, except for the Portable Conference and a few other things. My calendar is otherwise completely blocked off from July to early August, so let’s chat over email or similar.
Portable Low Key Conference - July 12
DataEngBytes. I’ll be on the continental tour in Perth, Brisbane, Melbourne, and Sydney. August 2023 (more info and registration)
Big Data London - I’m keynoting. Big up the London Massive. September 2023.
Europe - September 2023 TBA
Dubai - October 2023.
Vegas - re:Invent 2023.
More to come…
Thanks! If you don’t mind helping out…
Thanks for supporting my content. If you aren’t a subscriber, please consider subscribing to this Substack.
You can also find me here:
Monday Morning Data Chat (YouTube / Spotify and wherever you get your podcasts). Matt Housley and I interview the top people in the field. Live and unscripted. Zero shilling tolerated.
The Joe Reis Show (Spotify and wherever you get your podcasts). My other show. I interview guests, and it’s totally unscripted with no shilling.
Fundamentals of Data Engineering (Amazon, O’Reilly, and wherever you get your books)
Be sure to leave a nice review if you like the content.
Thanks! - Joe Reis