When you're developing software, bugs will appear.
Some things won't work as expected while developing them, and others will stop working after some time.
Often, the cause of a software failure is easy to find. But sometimes the hunt for a bug seems to drag on and on and on...
You'll develop and refine your own set of debugging strategies over time, adjusted for the tech stack you're working with.
The following are 10 of my most often used debugging strategies.
I'm a JS full stack developer, so YMMV.
Bugs typically reach me in one of three ways:
My first step is to reproduce the bug on my local machine with the current development build, unless I found it there myself. If I can't reproduce it there, I check the bug description for the software version, environment, and OS/browser version, and try again with a matching setup. If you can't reproduce the issue locally at all, debugging becomes much more difficult.
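When a bug report is missing version and environment details, I find it helps to have the application print them itself. The following is a minimal sketch for a Node.js process; the function name describeEnvironment and the appVersion parameter are made up for illustration, and a browser app would read navigator.userAgent instead.

```javascript
const os = require("os");

// Collect the details a bug report should carry so the setup can be matched locally.
// `appVersion` is passed in by the caller (for example, read from package.json).
function describeEnvironment(appVersion) {
  return {
    appVersion,
    nodeVersion: process.version, // e.g. "v20.x.y"
    platform: process.platform,   // e.g. "linux", "darwin", "win32"
    arch: process.arch,
    osRelease: os.release(),
  };
}

console.log(describeEnvironment("1.0.0"));
```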
All following steps are only necessary when the cause of the bug is not obvious from, e.g., an error message (such as "Cannot read property 'x' of undefined" on a specific line).
Sometimes there isn't an actual bug in your software. Get the easy fixes out of the way first. It's frustrating if cleaning the build cache fixes your error after two hours of fruitless investigation.
Of course, if the error happened right after you changed some code, skip this and go straight to the actual investigation (see below).
Clean up, reinstall dependencies, and rebuild your project.
This "fix" often helps when a project won't build locally (and the fault isn't wrong or missing configuration).
Remove all build artifacts, clean the build cache, and remove the local dependencies (e.g. node_modules).
Then install everything again and start a clean build.
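The cleanup can be scripted so it is a single command. This is only a sketch: the directory names in DEFAULT_TARGETS are assumptions about a typical JS project, so adjust them to wherever your build output and caches actually live.

```javascript
const fs = require("fs");
const path = require("path");

// Assumed locations of build output, caches, and installed dependencies.
const DEFAULT_TARGETS = ["dist", "build", ".cache", "node_modules"];

// Delete each target directory inside projectDir and return the ones that existed.
function clean(projectDir, targets = DEFAULT_TARGETS) {
  const removed = [];
  for (const target of targets) {
    const fullPath = path.join(projectDir, target);
    if (fs.existsSync(fullPath)) {
      // recursive + force behaves like `rm -rf` (needs Node 14.14 or newer).
      fs.rmSync(fullPath, { recursive: true, force: true });
      removed.push(target);
    }
  }
  return removed;
}

// Usage (not executed here): clean(process.cwd());
```

After running it, install again with npm install or yarn install and start your usual build.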
If that doesn't help, upgrading your dependencies is always a good idea to get the latest bug fixes. Check if your dependencies are up-to-date (run yarn outdated or npm outdated), and upgrade to at least the latest minor and patch versions for your major version. Be wary of major version upgrades and always check for breaking changes.
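To separate the safe upgrades from the risky ones, you can sort the outdated packages by upgrade type. The sketch below assumes input shaped like the output of npm outdated --json (an object keyed by package name with current and latest fields) and plain x.y.z version strings; the package names and versions in the sample are invented, and pre-release tags are not handled.

```javascript
// Compare two "x.y.z" versions and name the kind of upgrade between them.
function upgradeType(current, latest) {
  const [curMajor, curMinor, curPatch] = current.split(".").map(Number);
  const [newMajor, newMinor, newPatch] = latest.split(".").map(Number);
  if (newMajor !== curMajor) return "major";
  if (newMinor !== curMinor) return "minor";
  if (newPatch !== curPatch) return "patch";
  return "none";
}

// Group package names by the kind of upgrade that is available.
function classifyOutdated(outdated) {
  const result = { major: [], minor: [], patch: [] };
  for (const [name, info] of Object.entries(outdated)) {
    const type = upgradeType(info.current, info.latest);
    if (type !== "none") result[type].push(name);
  }
  return result;
}

// Hypothetical sample data, not real packages.
const sample = {
  "some-lib": { current: "1.2.0", wanted: "1.3.0", latest: "1.3.0" },
  "other-lib": { current: "4.17.1", wanted: "4.18.2", latest: "5.0.0" },
};
console.log(classifyOutdated(sample));
```

Everything in the minor and patch groups is a candidate for a quick upgrade; the major group is where the changelogs need reading first.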
If you can reproduce the error and the easy fixes don't work, dig deeper.
Investigate when exactly the error appears. Which user or API inputs lead to this? Does the application need to be in a certain state? Can you trigger it with any other inputs?
Understand all data flows that these inputs take. What happens to those inputs? What other changes do they trigger? Look at all places these variables are used. Are there side-effects?
Do all inputs and outputs of relevant functions contain the data you expect? You can either add logs or use a debugger. Everybody has their preferences; find out which one works for you in which situation.
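If you go the logging route, a small wrapper saves you from sprinkling log statements through the function body. This is a sketch for synchronous functions only; withLogging and applyDiscount are hypothetical names used for illustration.

```javascript
// Wrap a function so every call logs its arguments and its result (or thrown error).
function withLogging(fn, log = console.log) {
  const name = fn.name || "anonymous";
  return function (...args) {
    log(`${name} called with`, JSON.stringify(args));
    try {
      const result = fn.apply(this, args);
      log(`${name} returned`, JSON.stringify(result));
      return result;
    } catch (error) {
      log(`${name} threw`, error.message);
      throw error; // keep the original failure visible to the caller
    }
  };
}

// Hypothetical function under investigation.
function applyDiscount(price, percent) {
  return price - (price * percent) / 100;
}

const loggedApplyDiscount = withLogging(applyDiscount);
loggedApplyDiscount(200, 10);
```

If you prefer a debugger, a debugger; statement at the same spot pauses execution whenever the browser dev tools or the Node inspector are attached.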
Look closely at the error message. Yes, error messages are sometimes a confusing mess. But one may hide a valuable piece of the puzzle! Try to understand every part of it.
If there is no error message, check whether some function “swallows” the error without logging it in the necessary detail.
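A swallowed error typically looks like the first function below: a catch block that quietly returns a fallback. The second variant is one way to keep the failure visible. Both functions are hypothetical examples, not code from a real project.

```javascript
// Swallows the error: callers only ever see an empty config and no hint why.
function parseConfigSwallowing(text) {
  try {
    return JSON.parse(text);
  } catch (error) {
    return {};
  }
}

// Logs the message together with the offending input, then rethrows.
function parseConfigReporting(text, log = console.error) {
  try {
    return JSON.parse(text);
  } catch (error) {
    log(`Could not parse config: ${error.message}`, { input: text });
    throw error;
  }
}
```

When a bug comes without any error message, searching the code base for catch blocks like the first one is a good place to start.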
Sometimes you won't find the fault in the program by understanding the data flows, inputs, and outputs, at least not easily.
It may be faster to do something I call “difference-based debugging”. Find two versions of the software: one that has this bug, and one that doesn't.
Does this bug happen in all your environments? Check the current development branch, and check all deployed versions. If one stage does NOT have this bug, look at the diff between both code versions. Narrow it down to two versions that are as close as possible to each other; the differences may tell you where the bug originated.
Test different operating systems, browsers, and runtime versions. Does the bug happen everywhere? If not, why?
Look for breaking changes in recently updated dependencies. Maybe you didn't catch all changes while upgrading. Also, look out for externally loaded libraries (like the Google Maps JS API client), as some use the latest weekly/quarterly version and you need to fix breaking changes yourself.
Try out older builds/revisions of your software. If all environments show similar behaviour, this is worth investigating. Check out a build/commit from one month ago. Is the bug still occurring? If not, check two or three months ago. Has this bug always been there, from the beginning of the affected functionality?
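Once you have one good and one bad revision, you don't have to test every commit in between: a binary search halves the range with each check. The sketch below shows the idea under the assumptions that the oldest version is good, the newest is bad, and the bug stays present once it has been introduced. The commit names are invented; in practice the isBad check means "check out, build, and test", and git bisect automates exactly this process.

```javascript
// Binary search for the first bad version in a list ordered from oldest to newest.
function findFirstBad(versions, isBad) {
  let good = 0;                  // index of a version known to be good
  let bad = versions.length - 1; // index of a version known to be bad
  while (bad - good > 1) {
    const middle = Math.floor((good + bad) / 2);
    if (isBad(versions[middle])) {
      bad = middle;
    } else {
      good = middle;
    }
  }
  return versions[bad];
}

// Hypothetical history in which the bug was introduced in commit "f6".
const commits = ["a1", "b2", "c3", "d4", "e5", "f6", "g7", "h8"];
const brokenSince = commits.indexOf("f6");
const firstBad = findFirstBad(commits, (c) => commits.indexOf(c) >= brokenSince);
console.log(firstBad); // prints f6
```

With eight commits this takes three checks instead of up to six, and the gap grows quickly with longer histories.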
Create a minimal example. Comment out blocks of code. Set input objects to pre-defined variables. Strip out code until you have a minimal working feature that exposes the bug. That makes it easier to dig deep.
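The same stripping-down works for input data. As a rough sketch, the helper below removes top-level keys from a failing input one at a time and keeps each removal as long as the bug still reproduces. minimizeInput, formatUser, and the sample data are all hypothetical; real inputs are often nested and need more care.

```javascript
// Drop every top-level key that is not needed to reproduce the bug.
function minimizeInput(input, reproducesBug) {
  let current = { ...input };
  for (const key of Object.keys(input)) {
    const { [key]: _removed, ...candidate } = current;
    if (reproducesBug(candidate)) {
      current = candidate; // the bug survives without this key, so leave it out
    }
  }
  return current;
}

// Hypothetical buggy function: it crashes when `address` has no `city`.
function formatUser(user) {
  const city = user.address ? user.address.city.toUpperCase() : "";
  return `${user.name} ${city}`.trim();
}

function crashes(input) {
  try {
    formatUser(input);
    return false;
  } catch {
    return true;
  }
}

const failingInput = { name: "Ada", age: 36, roles: ["admin"], address: { street: "Main St" } };
console.log(minimizeInput(failingInput, crashes)); // prints { address: { street: 'Main St' } }
```

The reduced input points straight at the address handling, which is much easier to reason about than the full object.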
When you're working on fixing a bug, you're constantly weighing your chances of finding and understanding the bug against the time you spend.
All methods take time, and it's a fine balance how much of it you invest in one strategy before switching to another.
If you can't make sense of the error message, and the code is complicated, it's useful to jump directly to difference-based methods.