No one knows how it works
You need to modify system X. It has been running smoothly. Surprisingly, it hasn't been modified for years. There's some documentation available, but as you go into the details, you realize the necessary information is missing. To make matters worse, everyone who worked on the project has since left. Go figure.
The first issue is letting a large, high-impact system sit in maintenance mode for too long. The longer nobody touches it, the more of its working knowledge leaves with the people who built it.
A second issue is documenting the what but not the why. The what can be recovered from the code; the rationale behind it cannot.
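To make the distinction concrete, here is a minimal sketch of a "why" comment. The function, the constant, and the rationale in the comment are all hypothetical, invented only for illustration; they are not taken from any real system.

```python
# Hypothetical cap on a session-length feature, in minutes.
MAX_SESSION_MINUTES = 240.0


def cap_session_length(minutes: float) -> float:
    """Return the session length, capped at MAX_SESSION_MINUTES."""
    # WHAT (already visible in the code): clamp the value to the cap.
    #
    # WHY (illustrative rationale that the code alone cannot express):
    # very long sessions are assumed here to be idle tabs rather than
    # real engagement, so they are capped before feeding the model.
    # If that assumption stops holding, this cap should be revisited.
    return min(minutes, MAX_SESSION_MINUTES)


print(cap_session_length(500.0))
print(cap_session_length(30.0))
```

A future maintainer can read the clamp in a second. What they cannot reconstruct is whether the cap is safe to remove, and that is what the second comment records.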
When building ML systems, build them for an ML newbie to maintain. If you build only for yourself, it’s easy to make the system complex and overlook the future costs. But ultimately, it’s your future self – or someone else – who’ll pay those costs. Instead, design as if you’re handing it off to someone who can say: “This is too complex; make it simpler.” That mindset keeps maintainability front and center.
Another option: build fewer, better things. Every ML feature comes with a fixed long-term maintenance cost. Keep them simple. Keep them few. Explore 20 ideas if you must – but launch only 3. You’ll not only make them better, but also streamline maintenance, spread knowledge more effectively, and reduce the upkeep burden. That frees you up to build what’s next.