Fundamentals of Data Engineering - 2.5 Years Later
Reflections and advice on writing a best-selling technical book

Fundamentals of Data Engineering (FoDE) arrived on July 22, 2022, exactly 2.5 years ago. Coincidentally, this morning, I read a lovely review about the book from Sergio Ramos called “Self Taught Reviews: Fundamentals of Data Engineering by Joe Reis and Matt Housley.” The review got me thinking about how the book started, what it was like to write, and its impact on people and on my personal life.
Around 2020, data engineering was blowing up as a field. Matt Housley and I had been running our data engineering consultancy for a few years and toyed with the idea of writing a book that described the field of data engineering from first principles in a tool- and technology-agnostic way. That hadn’t been done before, and we were puzzled why. We felt like the way people - especially newcomers - learned data engineering was scattershot and overly tool-focused. We saw many tutorials and books about “learn data engineering with Spark,” but nothing that stepped back and asked, “What is data engineering in the first place?” What are the immutables of data engineering, as much as there can be any in a fast-paced field? We wanted to write something that would still be relevant several years after the book’s publication.
Around the end of 2020, Matt and I got an inquiry about writing a book. The first publisher’s request - find a popular book, copy it, and tweak it a bit - didn't sit well with us. This sounded corny AF. We’re not a cover band. But this sparked something in Matt and me. This time, we felt, “Why not write that book we discussed?” We decided to find a better publisher. A friend helped us contact O’Reilly, who initially said this book was “too ambitious.” After some back and forth on the book proposal, we were eventually greenlit to write the book. In my mind, I knew this book would be a hit. Sometimes, you just have that intuition.
Writing the book was very intense. If you want to feel humbled, write a book. You start off feeling like you know your topic and quickly realize you’re in for a harsh lesson in humility. Even if you’ve worked in the data field for a long time, there’s a big difference between doing the work and talking about it, versus writing about the topic from first principles for a general audience. Nothing will force you to level up your thinking more than writing for a general audience. You need to consider things not just from the angle of your job or experiences, but also empathize with, and broaden your thinking to include, the people who might read your book at any time in the future. You won’t know these people, but they’ll be hanging on your every word, critiquing and judging you every step of the way. Your job is to help these people learn what you’re teaching and keep them reading through the end of the book. The hard part about writing a fundamentals book is distilling complexity into something simple and understandable for newcomers. I’d say it’s more challenging than writing an expert-level book precisely because you’re writing for a broad audience with countless perspectives. Making things simple is incredibly difficult, and much easier said than done. If you don’t believe me, try it yourself 🙂
Writing a book doesn’t mean you get to ignore the other things in your life. While I was writing the book, COVID was rampant, and I was running a business with Matt while living in the middle of a house remodel. For over a year, my family and I subsisted in the basement of the home we were remodeling, cooking our food with an air fryer and a microwave. I’d often sneak off to the Front climbing gym (our unofficial office) to write. With business booming and the nonstop noise of saws, hammering, and other construction work all around me, I kept writing the book - I have no idea how.
I’d advise you to weigh the pros and cons of writing with a co-author. Matt and I were business partners and very close friends (still are). We know each other so well that we can finish each other’s sentences. This is good and bad. It’s good because you know what direction to move in, and it's terrible since one person’s blind spot is the other’s, too. Matt and I have very different approaches to writing. Understand your co-author’s cadence and try to empathize with how they work. The stress was sometimes hard for Matt and me, and it bled into our friendship. I can’t imagine writing the book with anyone other than Matt, but I know we both left a lot of blood on the floor to finish the book. We still have scar tissue.
The expectations and attention surrounding the book made our experience with FoDE different from that of many other authors. The book was getting a lot of attention and buzz. Several months before it was due for publication, it was already a top new release on Amazon. This meant a ton of pressure to make sure the book was not just good but as great as we could make it.
The book launched on July 22, 2022, and immediately shot to the top of several categories on Amazon. Suddenly, we were on the podcast and speaking tour circuit, appearing at Data Driven NYC and other prestigious data events around the world. Pretty soon, I was regularly traveling the world, giving talks and meeting with data/AI leaders and practitioners from…everywhere.
People told me there’s your career before writing a book and your career after. This is very apt. If you want to elevate your profile, write a book. For whatever reason, writing a book puts you in a different class. Even though you’re still the same doofus who sometimes puts their shoes on the wrong feet or their underwear on backward, you’re now an “authority.” I’ve never been comfortable with this status. My self-talk is lunatic - I’ve always had a chip on my shoulder and feel like I’m never good enough. So, I’m constantly pushing myself ultra-hard, always at a breaking point. I know this is insanely unhealthy, but that’s how I’m wired. It’s a feature and a bug. Part of my therapy is writing books. For me, there’s no substitute for the deep thinking and learning that writing a book forces upon you. I’m in the middle of another book and will continue writing books until I die.
Of course, every book gets its fair share of criticism and bashing. Before our book came out, Bill Inmon (the godfather of the data warehouse and author of 65+ books) gave me advice that’s still highly relevant today: Expect some people to hate not just your book but you personally. I’ve got my fair share of haters. I pay them no individual attention but live rent-free in their heads. It is what it is.
The criticisms of FoDE share a common theme: the book is “too basic,” and “I already knew all of this.” Someone even left an Amazon review saying, “It’s just a bunch of words.” In theory, these comments could be made about any book. To these critiques, I reply, “Cool, good for you.” The book is advertised and titled as a fundamentals book, meaning it starts at ground zero. It makes no pretension of being anything other than a fundamentals book. At some point, everyone starts at zero. Some move on from zero. Everyone is somewhere on their learning journey. Respect that.
Having met countless data engineering practitioners of all levels worldwide, I’ve yet to meet even expert practitioners who have a complete grasp of all the fundamental concepts in data engineering. We even got feedback from our heroes in the field, who said they learned something new from FoDE. If your knowledge and experience are beyond the fundamentals, this book probably isn’t for you. I suggest reading Designing Data-Intensive Applications (which I consider the “sequel” to FoDE), Database Internals, anything by Chris Date, and whatever else will help you hone your skills, which likely require more specialized knowledge. But like I said above, if you feel like you can do a better job at writing a fundamentals book on data engineering, nothing is stopping you. Go for it! I’ll be cheering you on from the sidelines.
However, these criticisms represent the opinions of a nanoscopic minority. Almost universally, FoDE is applauded for bringing data engineering to the masses in an easy-to-read and digestible way. The framework of the data engineering lifecycle and undercurrents is referenced countless times in classrooms, universities, businesses, and social media. I can’t tell you how many talks I’ve given at Big Tech, Fortune 500 companies, prestigious universities and colleges, and everything else. I get messages every day from people thanking us for writing the book. Andrew Ng’s DeepLearning.ai (in partnership with AWS) published a stellar 16-week data engineering specialization on Coursera, based on the book. That’s the power of frameworks.
The book - now available in audiobook format and translated into many languages - continues to be a best-seller. Most books dwindle in sales a few months after publication. Being a best-seller a few years after publication isn’t a fluke. At that point, word of mouth sells the book. Given the number of data engineers entering the field, I suspect this book will serve them for a while. And on a personal level, the book’s success gave Matt and me new opportunities that weren’t possible beforehand. We went from being two dorks living in relative obscurity in Utah to being “nerd famous.” We’ve since closed our consultancy and are pursuing new paths.
If you had asked me in 2020 if this would have happened, I would have laughed at the ridiculousness. Today, I am still humbled, and I thank all of you for your support.
Thanks! If you want to support this newsletter
The Data Engineering Professional Certificate is one of the most popular courses on Coursera! Learn practical data engineering with lots of challenging hands-on examples. Shoutout to the fantastic people at DeepLearning.ai and AWS, who helped make this a reality over the last year. Enroll here.
Practical Data Modeling. Great discussions about data modeling with data practitioners. This is also where early drafts of my new data modeling book will be published.
Fundamentals of Data Engineering by Matt Housley and me, available at Amazon, O’Reilly, and wherever you get your books.
The Data Therapy Session calendar is posted here. It’s an incredible group where you can share your experiences with data - good and bad - in a judgment-free place with other data professionals. If you’re interested in regularly attending, add it to your calendar.
My other show is The Joe Reis Show (Spotify and wherever you get your podcasts). I interview guests on it, and it’s unscripted, always fun, and free of shilling.
Want me to speak at your event? Please submit a speaking request if you’d like me to give a talk or a workshop.
If you’d like to sponsor my newsletter, please reach out to me.
I'm grateful you wrote this book, and for this article. You highlighted so many hard truths that are unknown (pun intended) to most, and hardly ever mentioned.
Nothing worth doing is easy - teaching fundamentals is akin to pouring the foundation for a building. It has to stand the test of time and support whatever is built on top. 20 years after I came up with this analogy, it still holds.
The foundation is something we don't see or think about - we like the shiny countertops, the latest cabinet designs, etc. Those we see and admire. But everything will topple if the foundation is not sound.
To any haters - I invite you to take one topic and teach it in an agnostic way. No mention of any tool, no LLM use. Then come back and let's have a conversation.
Last but not least, I'm really glad you read Sergio's article!
Thanks for the book, Joe! It's a great DE baseline. By the way, the Coursera course (DE + AWS + speakers) is awesome and builds the optimal path.