Debugging Microservices: The Ultimate Guide
Microservices have come a long way from being a shiny, new cool toy for hypesters to a legitimate architecture that transforms the way modern applications are built. Microservices are loosely coupled, independently deployable and scalable, allow a highly diverse technology stack – and these are just some of their biggest advantages. Also, these are some of their biggest disadvantages, especially when it comes to debugging microservices.
That’s because all the world’s a trade-off. And all those great advantages come with a price tag attached. For a long time the tag has been too high for many teams. In this blog post I am going to discuss the issue that was (and still is, in some cases) a very significant part of the aforementioned price – difficulties in debugging microservices. Then, I will recommend tools (one of which is our own production debugger Lightrun) and platforms that can help overcome these problems, because microservices aren’t going anywhere.
What is Microservices Architecture
Before we start, though, let’s clearly define a few things. First of all, “microservices” and “serverless” are two different things. True, microservices are often built on a serverless architecture, and serverless architecture is often used with microservices in mind. And yet, the main goal of serverless is to reduce the total cost of ownership of an application – i.e. reduce the cost of managing servers and the usage bill – and it has nothing to do with microservices. It is still possible to build a monolithic web application running entirely on AWS Elastic Beanstalk or Azure App Service, or to deploy a microservice on top of an nginx server running on EC2.
After this subtle but important distinction, note that I will still address serverless debugging issues alongside microservices debugging issues, since the two overlap very often.
Another important thing to mention is that the microservices architecture is just a subclass of a broader, more comprehensive cloud-native paradigm, which introduces even more challenges for development teams (out of the scope of this post). But whether you deploy microservices in the public cloud or totally on-prem, you will face the same difficulties. (I don’t assume your application’s gender, age, religion or cloud. And, of course, the language it is written in.)
Microservices Architecture Creates Microservices Debugging Challenges
Imagine that your huge, cumbersome monolith full of shitty legacy code and written in some old-fashioned, boring dinosaur language starts falling apart into beautiful, tiny microservices. What can go wrong?
A lot of things. And when they do go wrong you will find out that, all of a sudden, you can’t put a breakpoint in that new tiny beautiful microservice! You can’t make your favorite IDE debugger just stop there, you can’t see the stack, the values of variables, the process memory, you can’t pause threads and step through the code line by line (well, I do assume language(s) here, sorry for that).
You can’t do all this because the suspicious code is no longer just some class instance running, at worst, as another thread in the process your IDE is attached to. It now runs in a dedicated Docker container or Kubernetes pod, possibly written in another language: stateless, asynchronous, lonely.
Or even worse, it is now a Lambda function, which is born and dies hundreds of times in a second somewhere in a distant cloud, throwing NPEs every time it starts. How in the world am I supposed to debug a microservice like that? What have I done?
This post comes to the rescue. There are a lot of techniques, tools, and even startup companies that have emerged to address this problem. It is a vibrant and constantly evolving (which is another way to say poor and incomplete) ecosystem that I will review in two steps: debugging microservices locally (this blog post) and debugging microservices in production.
How to Debug Microservices Locally
Let’s see how it is possible to debug microservices when you either develop them locally or try to reproduce and fix a bug. Before I get into solutions, let’s outline the challenges you will face doing that.
Debugging Microservices Locally: The Challenges
In a good old-fashioned monolith, the functionality (e.g. adding an item to a shopping cart) you were trying to debug was implemented by a couple of classes. With all of the code in one place, it was easy to gain a holistic view.
Now, the same functionality is implemented by a couple of separate microservices, and each can be either a Docker container, a Kubernetes pod, or a serverless function. You are supposed to run all of them simultaneously in order to reproduce a bug and then, after a fix, perform an impact analysis.
To top it all off, to recreate the exact picture, each of these services must run the same version it runs in production – either one shared release version for all of them or, even worse, the specific version each service was running in the production environment where the bug was reported. Creating this environment is a huge challenge, and if you don’t do it right you won’t be able to properly debug your microservices.
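A cheap sanity check before you start reproducing a bug is to compare what you are running locally against what production was running. The sketch below assumes you can obtain both as plain service-to-tag mappings (for example from a deployment manifest and from your local compose file); the function name, service names and tags are all hypothetical.

```python
def version_drift(production, local):
    """Return {service: (production_tag, local_tag)} for every mismatch.

    A service that is missing locally is reported with a local tag of None.
    """
    drift = {}
    for service, prod_tag in production.items():
        local_tag = local.get(service)
        if local_tag != prod_tag:
            drift[service] = (prod_tag, local_tag)
    return drift


# Hypothetical versions: what production ran vs. what is running on the laptop.
production = {"cart": "1.4.2", "pricing": "2.0.0", "inventory": "0.9.1"}
local = {"cart": "1.4.2", "pricing": "2.1.0"}

for service, (prod_tag, local_tag) in version_drift(production, local).items():
    print(f"{service}: production={prod_tag} local={local_tag}")
```

If the report is not empty, whatever you reproduce locally may not be the bug that was reported.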
Direct synchronous method invocations (or, at worst, message queues between threads) are replaced in microservices with either synchronous REST or gRPC API calls. Even worse, sometimes they are replaced with an asynchronous event-driven architecture based on one of the many available message queues (async gRPC is also an option).
Too bad that the issues you hit with in-process message queues are nothing like those you face with distributed message queues: the configuration is complicated and has a lot of nuances, latency and performance are not always predictable, operational costs are very high (yes, Kafka, I am looking at you) and you may run out of budget very quickly if you are using managed solutions.
Forget about stack traces, forget about logs. Actually no, don’t forget about logs, forget about understanding anything by digging into those of a single microservice. Those magic ERROR lines you are looking for may be printed into the logs of some other microservice at an undefined time offset, mixed in with totally unrelated ERROR lines which were printed while handling a different HTTP request. In other words, recreating the application state which led to a bug is often mission impossible.
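To make the interleaving problem concrete, here is a minimal sketch of untangling such logs. It assumes every service writes one JSON object per line and tags it with a shared per-request identifier; the field names (`ts`, `service`, `correlation_id`, `msg`) and the sample records are hypothetical, not the format of any particular logging product.

```python
import json
from collections import defaultdict
from datetime import datetime


def group_by_request(log_lines):
    """Group JSON log records by correlation ID, each group ordered by time."""
    groups = defaultdict(list)
    for line in log_lines:
        record = json.loads(line)
        groups[record["correlation_id"]].append(record)
    for records in groups.values():
        # Timestamps are assumed to be ISO 8601 strings without a zone suffix.
        records.sort(key=lambda r: datetime.fromisoformat(r["ts"]))
    return dict(groups)


# Lines from two services, arriving out of order and mixed across requests.
lines = [
    json.dumps({"ts": "2020-07-01T12:00:00.300", "service": "pricing",
                "correlation_id": "req-1", "msg": "ERROR price lookup failed"}),
    json.dumps({"ts": "2020-07-01T12:00:00.200", "service": "cart",
                "correlation_id": "req-2", "msg": "INFO item added"}),
    json.dumps({"ts": "2020-07-01T12:00:00.100", "service": "cart",
                "correlation_id": "req-1", "msg": "INFO adding item"}),
]

for request_id, records in group_by_request(lines).items():
    print(request_id, [(r["service"], r["msg"]) for r in records])
```

Without the shared identifier there is nothing reliable to group on, which is exactly why a single service's log tells you so little.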
Back in the day it was one language to write them all, now it is a Noah’s ark of languages and you might have no idea WTF is going on with this “undefined has no properties” error that some weakly dynamically typed language loves to throw (who let this become a backend language, for crying out loud?).
Technical Difficulties in Running Microservices Locally
Well, that’s what Docker was invented for in the first place, right? Run docker-compose up and we are done. OK, but what about a Kubernetes cluster? A Kafka cluster? A bunch of Lambda functions? And then your laptop ~melted~ needs more RAM and CPU.
Now it is easy to see why until recently many teams just gave up. For some it cost days, for some it was weeks of frustration, anger and suppressed aggression – and I didn’t even get to production debugging of microservices. The industry reacted quickly to this mess and came up with plenty of solutions addressing these issues. Granted, these are still not even close to providing the speed and convenience of debugging a monolith with an IDE debugger, but the gap is slowly closing. Let’s take a close look at what you can do.
How You Can Debug Microservices
So what is in our microservices debugging kit as of July 2020? Let’s look at the main tools and platforms out there, and how they can help you.
Cloud Infrastructure-as-Code Tools
There are plenty of configuration orchestration tools, such as Terraform and AWS CloudFormation, as well as configuration management tools like Ansible or Puppet, which automate deployment and configuration of complex applications. These tools let you spin up a debugging environment quickly and repeatably – subject to your budget constraints, of course. To optimize costs, you can offload only some of the services to a remote cloud and run the rest locally on your machine.
All microservices should send logs to a centralized, preferably external, service. This way you can investigate, trace and find the root cause of a bug much more easily than by switching between multiple log files in your local text editor. You can choose from plenty of managed services like Logz.io and Datadog, deploy your own ELK stack, or just send the logs to ~/dev/null~ cold S3 storage. If you do not know when you will need the logs, cold storage is a much cheaper option and you can always fetch them later. The most important thing is to implement a Correlation Identifier, and then there are more best practices you should definitely read about.
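As a rough illustration of the Correlation Identifier idea, here is a minimal sketch using only the Python standard library. The header name `X-Correlation-ID`, the log format and the helper names are assumptions made for this example; use whatever convention your services already share.

```python
import contextvars
import logging
import sys
import uuid

# Holds the identifier of the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Copy the current correlation ID onto every log record."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True


def build_logger(name, stream):
    """Create a logger whose lines always carry the correlation ID."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(levelname)s [%(correlation_id)s] %(name)s: %(message)s"))
    handler.addFilter(CorrelationIdFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger, headers):
    """Reuse the caller's ID if present, otherwise start a new one."""
    cid = headers.get("X-Correlation-ID") or str(uuid.uuid4())
    token = correlation_id.set(cid)
    try:
        logger.info("adding item to cart")
        # Headers to attach to any outgoing call, so the next service
        # logs under the same ID.
        return {"X-Correlation-ID": cid}
    finally:
        correlation_id.reset(token)


log = build_logger("cart", sys.stdout)
handle_request(log, {"X-Correlation-ID": "abc-123"})
```

Once every service does this, searching the centralized logs for a single ID returns the whole story of one request, regardless of which service printed each line.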
Serverless Frameworks IaC
Some of your microservices might be implemented using serverless solutions like FaaS and/or other managed services like API Gateway. There are two main players that provide Infrastructure-as-Code frameworks for serverless: the cloud-agnostic Serverless Framework and AWS SAM, which is just an abstraction layer over CloudFormation. Back in the day, it was a real mess to develop and debug FaaS, but these days both allow local debugging, while SAM even allows using a local debugger in popular IDEs (Visual Studio, IntelliJ IDEA) with its handy AWS Toolkit. A real time saver!
Running Docker Compose locally is trivial unless you’re using a sophisticated architecture, such as a Kafka cluster alongside your Docker containers. Then things start getting complicated, though still feasible.
When it comes to Kubernetes though, it is much more difficult. There are some tools that try to simplify local Kubernetes deployment, such as MicroK8s and Minikube, but both require a lot of effort – well, you should not expect your life to be easy when dealing with Kubernetes anyway.
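Independently of either framework, it helps to remember that a Python Lambda handler is just a function taking `(event, context)`, so its logic can be exercised locally with a hand-built event before any tooling is involved. The sketch below assumes an API Gateway proxy-style event with a JSON string in `body`; the `sku` field and the response payload are hypothetical.

```python
import json


def handler(event, context):
    """Add an item to the cart; reject requests that arrive without a body."""
    body = event.get("body")
    if body is None:
        # Guard against the "null body" case instead of crashing on it.
        return {"statusCode": 400,
                "body": json.dumps({"error": "missing body"})}
    item = json.loads(body)
    return {"statusCode": 200, "body": json.dumps({"added": item["sku"]})}


# Invoke the handler locally with fake events; no cloud account required.
print(handler({"body": json.dumps({"sku": "A-1"})}, None))
print(handler({}, None))
```

This does not replace running the function through the framework's local emulation, which also exercises packaging and configuration, but it lets you set an ordinary breakpoint in the handler with no extra setup.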
Dedicated “Debuggers for Microservices”
Not very convincing until now, right? I mean, after a lot of effort you can (barely) create the microservices debugging environment and see logs in a manner which makes sense – things you hardly bother about when debugging a monolith. But what about the debugging capabilities that really matter – setting breakpoints throughout the application, following variable values on the fly, stepping through the code, and changing values during run time?
If your microservices leverage the Kubernetes platform, you can get all of these, at least to an extent. There are two powerful open source tools, Squash and Telepresence, which let you use your local IDE debugger features when debugging a Kubernetes environment and save your laptop from melting down while running Minikube.
Squash builds a bridge between some of the popular IDEs and debuggers (here’s the full list) and uses a sidecar approach to deploy its client on every Kubernetes node (the authors claim very low performance and resource consumption overhead). This allows you to use all the powerful features of the local debugger such as live debugging, setting breakpoints, stepping through code, viewing the values of variables, modifying them for troubleshooting, and more. You can find a thorough guide here.
Telepresence operates quite differently: it runs a service you want to debug locally, while connecting it to a remote Kubernetes cluster, so you can develop/test it locally and use any of your favorite local IDE debuggers seamlessly. A bunch of tutorials, FAQs and docs can be found here.
Unless I missed something (let me know in the comments), that’s what you have in your hands in mid-2020 when it comes to debugging microservices locally. Far from ideal, it is much better than just a couple of years ago, and it is constantly getting better.
In the next blog post I will discuss the tools and best practices for debugging microservices in production!
Spoiler: a great tool to debug microservices in production is Lightrun. You can add on-demand logs, performance metrics and snapshots (breakpoints that don’t stop your application) in real time without having to issue hotfixes or reproduce the bug locally – all of which makes life much easier when debugging microservices. You can start using Lightrun today, or request a demo to learn more.