We all start by writing bad code - it's a rite of passage for programmers.
I asked the question below on Stack Overflow in 2018. Another user berated me for it, and I felt so bad that I deleted the question.
I keep it as a reminder of how far I have come on this journey.
![[43534405_297326430868843_7395248842586718208_n.png]]
## Data Engineering Landscape
Fast forward 7 years from my first foray into writing code - I wouldn't call myself a professional programmer by any means. I prefer to think of my skillset as optimization and problem solving using tech.
The data engineering landscape has also changed a great deal since I started exploring it. Almost every tool in my stack has been replaced:
Database: Postgres
Python package installers: pip -> poetry -> uv
Data manipulation library: pandas -> polars
Data notebooks: Jupyter -> marimo
Hosted data notebooks: Google Colab -> Deepnote
Data applications: Streamlit
Data warehousing: BigQuery -> MotherDuck (DuckDB)
Postgres is still the developer's choice for a relational / transactional DB - it has a long history of stability, is actively supported by the community, and has been popularized by hosting services such as Supabase (server) and Neon (serverless).
## Tool Selection
You want to choose tools that are well-supported, especially in the age of LLMs, because that's where more users raise (and fix) issues, which in turn feeds back into the LLM training data.
Code is permissionless leverage: anyone with a computer can create it. A small team of 2-3 people can create a GitHub repo and call it a new product. This is good for code consumers, as we live in an age of abundance. But it also makes it more important to curate your toolstack religiously and stick to the basics, as more and more libraries (abstractions) appear and move your understanding further and further away from the metal.
## Reflections
If I had to start over, what would I have done differently?
- **Take responsibility and ownership for your work**
I didn't present my work, for fear that others would judge me for it.
- **Visualize your stack**
Draw a diagram to map out what your code does. I use Excalidraw / mermaid.js to accomplish this nowadays.
- **Choose the right tool for the job**
Evaluate the requirements of your task and choose suitable tools. Admittedly, this gets easier once you have seen what happens when you make a bad decision and commit to it, e.g. using BigQuery as a transactional DB.
- **Persist your data**
When I was collecting data, I never had the habit of persisting it. Even when I did, my naming habits were inconsistent and messy. Eventually, your work and collection will collapse under the weight of bad habits.
- **Burnout**
I experienced severe burnout a few times, because *I didn't manage my time well*. I was coding all day, every day, but the quality of my output was low because I didn't plan ahead. Each time I sat down at the computer, I was banging my head against a brick wall and hoping for results to come.
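
To make the "Persist your data" point concrete, here is a minimal sketch of the habit I wish I'd had from the start. It is not what I actually ran back then. The function names (`slugify`, `snapshot_path`, `persist`), the folder layout, and the JSON format are all assumptions chosen for illustration, and it only uses the Python standard library.

```python
import json
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def slugify(name: str) -> str:
    """Lowercase a label and collapse every non-alphanumeric run into one underscore."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def snapshot_path(base_dir: Path, source: str, collected_at: datetime) -> Path:
    """Build a sortable path: <base_dir>/<source_slug>/<UTC timestamp>.json (hypothetical layout)."""
    stamp = collected_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return base_dir / slugify(source) / f"{stamp}.json"


def persist(records: list, base_dir: Path, source: str,
            collected_at: Optional[datetime] = None) -> Path:
    """Write the collected records to a new snapshot file and return its path."""
    collected_at = collected_at or datetime.now(timezone.utc)
    path = snapshot_path(base_dir, source, collected_at)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" raises FileExistsError instead of silently overwriting a snapshot.
    with path.open("x", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
    return path
```

Calling `persist(records, Path("data/raw"), "Stack Overflow: Q&A")` would write one file per collection run under a cleaned-up folder name, so nothing collected is lost and the files sort chronologically by name. Swapping JSON for Parquet or a DuckDB table is a reasonable next step once the habit is in place.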