Technical Debt
In my short career so far as an engineer, one of my biggest headaches has been technical debt. Startup culture puts a lot of emphasis on rapid growth and building out as quickly as possible, and with that comes a lot of tech debt. Based on my own and my colleagues’ experiences, technical debt is something I don’t think tech companies take seriously enough.
Of course, I’ve never held a formal leadership position at a tech company, so take my opinion with a grain of salt. Still, I want to shed some light on the problem here, along with some principles that may help minimize it going forward.
As a data scientist, I’ve had opportunities to cross-specialize as a Python developer, so some of the resources I mention are Python-specific. But the principles should apply regardless of which language you’re coding in.
What is technical debt?
At its simplest, technical debt is the cost of taking shortcuts. When engineers face tight deadlines or incidents that require an immediate response, they often reach for a quick or hacky solution.
This includes solutions that:
don’t solve the underlying problem
are extremely inefficient
use code that is poorly written and/or documented
It’s like using a band-aid to patch a hole in your boat; it’ll work for a bit, but you should not be riding in that boat in the long term without a real fix.
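To make the band-aid metaphor concrete, here is a small hypothetical Python example (the price-parsing scenario and both function names are invented for illustration, and `str.removeprefix` assumes Python 3.9+). The first function hides the problem; the second fixes the underlying cause, a currency symbol and thousands separator that `float()` can’t handle, while still failing loudly on genuinely bad input.

```python
def parse_price_hacky(text: str) -> float:
    """Band-aid: swallow every parsing error and return 0.0."""
    try:
        return float(text)
    except ValueError:
        # Malformed prices silently become 0.0 and flow downstream.
        return 0.0


def parse_price(text: str) -> float:
    """Real fix: handle the known format, e.g. '$1,200.50' -> 1200.5.

    Anything that still isn't a number raises ValueError, so bad data
    is surfaced instead of quietly corrupting later results.
    """
    cleaned = text.strip().removeprefix("$").replace(",", "")
    return float(cleaned)


print(parse_price_hacky("$1,200.50"))  # 0.0, the real problem is hidden
print(parse_price("$1,200.50"))        # 1200.5
```

The hacky version “works” in the sense that it never crashes, which is exactly why the debt goes unnoticed until someone asks why so many prices are zero.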
There are two options here:
Let the debt accumulate: build poor infrastructure on top of poor infrastructure in order to get things working quickly.
Pay off the debt: invest time in implementing proper fixes, which means time not spent building new features.
The ultimate cost is that improper fixes can prevent new features from being built or scaled up. They also make the software harder to maintain, “leakier,” and more susceptible to bugs.
Readability
A colleague once told me that as a software developer, I’ll spend 5-10x more time reading code than writing it. So far, I have found this to be very true.
And one of the biggest sources of technical debt I’ve seen so far is unreadable code. In a few cases, I’ve seen code so unreadable that trying to understand it either gives me a mental breakdown or makes me want to set my computer on fire.
Consider what happens when an engineer leaves the company; unreadable code makes it impossible to hand off responsibility between engineers in a timely manner. And in an industry where it’s the norm to change jobs every 2-3 years, that’s a recipe for disaster.
What’s the solution? Variable and function names should be descriptive, comments should err on the side of overuse, and individual files should not be bigger than, say, 200 lines of code.
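As a hypothetical before-and-after (the order data and function names are invented for illustration), both functions below do exactly the same thing, but only the second tells the reader what that thing is:

```python
# Before: terse names force the reader to reverse-engineer the intent.
def f(d, r):
    return [x for x in d if x[1] > r]


# After: descriptive names, tuple unpacking, and a short docstring
# make the intent obvious at a glance.
def filter_orders_above(orders, min_amount):
    """Return the (order_id, amount) pairs whose amount exceeds min_amount."""
    return [
        (order_id, amount)
        for order_id, amount in orders
        if amount > min_amount
    ]


orders = [("A-1", 25.0), ("A-2", 140.0), ("A-3", 99.5)]
print(filter_orders_above(orders, 100))  # [('A-2', 140.0)]
```

The second version costs a few extra keystrokes once, and saves every future reader (including you, six months from now) from decoding `d`, `r`, and `x[1]`.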
For Python, a dynamically typed language, a static type checker such as mypy can be used to enforce type hints. In layman’s terms, it checks whether functions and variables are being used with the types they were declared with. If a function expects a string and an integer is passed in, mypy will report an error. This makes the code a lot more traceable as a codebase gets bigger and bigger.
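Here is a minimal sketch of what that looks like (the `greet` function is a made-up example). The annotation says `name` should be a `str`. Python itself won’t complain at runtime if an integer is passed, because the f-string happily formats it, but a type checker flags the mismatch before the code ever ships.

```python
def greet(name: str, times: int = 1) -> str:
    """Return a greeting for `name`, repeated `times` times."""
    return " ".join(f"Hello, {name}!" for _ in range(times))


ok = greet("Ada", times=2)

# This runs without error at runtime and produces "Hello, 42!",
# but mypy reports an incompatible argument type here: int instead of str.
suspicious = greet(42)

print(ok)          # Hello, Ada! Hello, Ada!
print(suspicious)  # Hello, 42!
```

Running `mypy` on the file reports the error on the `greet(42)` line and exits with a non-zero status, which makes it straightforward to enforce in CI. Note that mypy only checks what you annotate, so the benefit grows as more of the codebase gets type hints.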
Library dependence
Using open-source code libraries that aren’t well-maintained is another major source of tech debt I’ve observed. If that library drops support for something you depend on, or stops being maintained altogether, you’re forced to either start maintaining the library yourself or rewrite ALL the code that uses it. And that can eat up a LOT of time.
For instance, consider two popular Python libraries for reading Excel data: xlrd and openpyxl. The latter has a higher score on Snyk, which is a good indicator that it’s less risky to use. Looking a little deeper, xlrd also has a much lower update frequency. And xlrd has already dropped key support once: version 2.0 removed the ability to read .xlsx files, leaving only the legacy .xls format.
And that doesn’t even get into the security concerns…
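One way to limit the blast radius of a risky dependency is to hide it behind a single function of your own, so that swapping libraries means editing one place instead of rewriting every caller. A minimal sketch, assuming openpyxl is installed and Python 3.9+; `read_excel_rows` is a hypothetical helper name, while `load_workbook` (with `read_only` and `data_only`), `workbook.active`, `iter_rows(values_only=True)`, and `close()` are part of openpyxl’s API:

```python
from typing import Any


def read_excel_rows(path: str) -> list[tuple[Any, ...]]:
    """Return every row of the workbook's active sheet as a tuple of values.

    Hypothetical wrapper: this is the only function in the codebase that
    knows which Excel library is used. If openpyxl ever had to be replaced,
    only this function would need to change.
    """
    # Imported here so no other module imports openpyxl directly.
    from openpyxl import load_workbook

    # read_only streams the file; data_only returns cached formula
    # results instead of the formula strings themselves.
    workbook = load_workbook(path, read_only=True, data_only=True)
    try:
        sheet = workbook.active
        return [tuple(row) for row in sheet.iter_rows(values_only=True)]
    finally:
        # Read-only workbooks are lazily loaded and must be closed explicitly.
        workbook.close()
```

Callers simply write something like `rows = read_excel_rows("sales.xlsx")` (file name hypothetical) and never touch the underlying library.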
Parting Thoughts
As a data scientist, I’ll admit that I’m guilty of writing hacky code sometimes. A lot of the time, my code is only used for one-off analyses and data cleaning. It usually runs once and never sees the light of day, so it doesn’t much matter how messy, hacky, or unreadable it is.
Data scientists are generally taught to code in a very different way from software engineers, especially considering that most data scientists come from non-CS STEM academic backgrounds.
That said, we’re not completely exempt from technical debt; sometimes our code becomes part of a data pipeline or a machine learning model that gets used in production. Data scientists also often take on software engineering responsibilities, especially at smaller companies, which means learning to write good code.
One major distinction I’ve learned from this is that knowing how to code is not the same as knowing how to write software.
That distinction can have a huge impact on what a tech company is able to build, which is why it’s crucial that all tech professionals understand it.