April 2026
There’s an instinct in data work to keep everything.
More data feels safer. More complete. More “correct.”
But this month pushed back on that idea.
Some fields looked fine at first glance, but the deeper we looked, the more they broke down. Inconsistent, partially filled, or just wrong often enough to create doubt.
At a certain point, the question changed.
Not “can we use this?” but “should we?”
And the answer was no.
Those fields were blanked out entirely, their values intentionally replaced with empty ones. It felt strange at first. Like throwing away information. But the result was a system that people could actually trust.
Because bad data doesn’t just sit quietly; it leaks into decisions, dashboards, and assumptions.
And in that sense, removing unreliable data isn’t a loss of information.
It’s an increase in clarity.
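A minimal sketch of what that kind of intentional blanking could look like in Python. The field names and sample rows are hypothetical, since the actual fields aren't named here; the point is only that unreliable values become explicit empties rather than being silently kept.

```python
# Hypothetical field names; which fields count as unreliable is an assumption.
UNRELIABLE_FIELDS = {"fax_number", "secondary_contact"}


def blank_unreliable_fields(record, unreliable=UNRELIABLE_FIELDS):
    """Return a copy of the record with unreliable fields set to None."""
    return {
        key: (None if key in unreliable else value)
        for key, value in record.items()
    }


# Illustrative sample rows, not real data.
rows = [
    {"id": 1, "name": "Acme", "fax_number": "555-0100", "secondary_contact": ""},
    {"id": 2, "name": "Globex", "fax_number": "n/a", "secondary_contact": "J. Doe"},
]

# The original rows are left untouched; only the copies are blanked.
cleaned = [blank_unreliable_fields(row) for row in rows]
```

Keeping the column but emptying it means downstream consumers see an honest "unknown" instead of a value that only looks trustworthy.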
March 2026
There’s a point in a project where everything works… but something still feels off.
That showed up this month in an ETL pipeline. It was clean, correct, and reliable… but just slow enough to make every run feel a little heavy.
So I stepped back and asked a different question: what is each tool actually good at?
Python is great for orchestration, cleaning, and shaping data.
The database is built to ingest and store it efficiently.
That’s when it hit me. I had been so focused on building the pipeline that I lost sight of the system as a whole, and ended up asking one tool to do the job of the other.
Switching to bulk inserts let each part of the system do what it was designed for, and everything changed. Minutes became seconds, and the pipeline went from something that worked to something that felt right.
It was a perfect reminder that good systems aren’t just built; they’re composed.
When each tool is used for its strengths, performance improves, complexity drops, and the whole system becomes easier to reason about.
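A rough sketch of the bulk-insert idea, using Python's built-in `sqlite3` module as a stand-in. The database, table, and data here are assumptions for illustration; the actual pipeline's database and driver aren't specified, and other databases have their own bulk-load paths.

```python
import sqlite3

# Illustrative rows; an in-memory SQLite database stands in for the real one.
rows = [(i, f"item-{i}") for i in range(10_000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")

# Python shapes the rows; the database ingests them in one batched call
# inside a single transaction, instead of one statement per row.
with conn:
    conn.executemany("INSERT INTO items (id, label) VALUES (?, ?)", rows)

row_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

The shape is what matters: prepare the batch in Python, then hand the whole thing to the database at once.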
February 2026
One of the biggest takeaways from my algorithm design course so far is that most algorithms are not something to memorize.
They emerge naturally from the properties of the data structure being used.
When you understand the structure, the algorithm often becomes obvious, because an algorithm is frequently just constraints meeting structure.
Examples:
In that sense, algorithm design feels less like inventing clever tricks and more like discovering the structure already hiding inside the problem.
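As one illustration, picked here as an assumption rather than taken from the course: binary search falls almost directly out of the single property "the list is sorted." Comparing against the middle element tells you which half can be discarded, and the rest follows. The function name and sample values below are hypothetical.

```python
def find_index(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            # Sortedness guarantees the target cannot be at or left of mid.
            low = mid + 1
        else:
            # Sortedness guarantees the target cannot be at or right of mid.
            high = mid - 1
    return -1


values = [2, 3, 5, 7, 11, 13]
```

Nothing here needs memorizing: each branch is just the sorted-order constraint applied to the structure.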
January 2026
Most of the work in real data systems is not modeling or visualization. It is fixing the data.
Common issues:
A large part of a data engineer’s job is building pipelines that normalize messy operational data into something analyzable. Clean data isn’t glamorous, but it’s the foundation everything else sits on.
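A small sketch of one normalization step, assuming a few typical problems (stray whitespace, inconsistent column names, placeholder strings standing in for missing values). The marker list, function names, and sample record are all hypothetical.

```python
# Placeholder strings treated as missing; an assumed list, adjust per dataset.
MISSING_MARKERS = {"", "n/a", "na", "null", "none", "-"}


def normalize_value(value):
    """Convert a value to trimmed text, mapping placeholder strings to None."""
    if value is None:
        return None
    text = str(value).strip()
    if text.lower() in MISSING_MARKERS:
        return None
    return text


def normalize_record(record):
    """Lowercase and underscore the keys, then clean every value."""
    return {
        key.strip().lower().replace(" ", "_"): normalize_value(value)
        for key, value in record.items()
    }


# Illustrative input, not real data.
raw = {" Customer Name ": "  Acme Corp ", "Phone": "N/A", "City": "Winnipeg"}
clean = normalize_record(raw)
```

Real pipelines layer many such steps, but each one tends to look like this: a narrow rule applied consistently to every record.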
December 2025
Coming from kinesiology influenced how I think about software systems.
Biological systems work through interacting subsystems such as:
Software architectures behave similarly. A good system is not one giant component. It’s a collection of smaller components that communicate clearly.
Healthy systems share traits:
© 2026 Mathieu Sawatzky