Microservices have come a long way from being a shiny new toy for hypesters to a legitimate architecture that transforms the way modern applications are built. Microservices are loosely coupled, independently deployable and scalable, and allow a highly diverse technology stack, and these are only some of their biggest advantages. They are also some of their biggest disadvantages, especially when it comes to debugging microservices.
Because all the world’s a trade-off. And all those great advantages come with a price tag attached. For a long time the tag has been too high for many teams. In this blog post I am going to discuss the issue that was (and still is, in some cases) a very significant part of the aforementioned price – microservices debugging difficulties. Then, I will recommend tools and platforms that can help overcome these problems, because microservices aren’t going anywhere.
Characterizing Microservices Architecture
Before we start, though, let’s clearly define a few things. First of all, “microservices” and “serverless” are two different things. Well, right, microservices are pretty often built using serverless architecture, and serverless architecture is pretty often adopted with microservices in mind. And yet, the main goal of serverless is to reduce the total cost of ownership of an application, i.e., the cost of managing servers and the usage bill, and that has nothing to do with microservices. It is still possible to build a monolithic web application running entirely on AWS Elastic Beanstalk or Azure App Service, or to deploy a microservice on top of an nginx server running on EC2.
After this subtle but legally important distinction, it’s worth mentioning that I will still address serverless debugging issues alongside microservices debugging ones, since the two overlap very often.
Another important thing to mention is that microservices architecture is just a subclass of the broader cloud-native paradigm, which introduces even more challenges for development teams (these are out of the scope of this post). But whether you deploy microservices in the public cloud or entirely on-prem, you will face the same difficulties (I don’t assume your application’s gender, age, religion or cloud. Or, of course, the language it is written in).
Microservices Architecture Creates Microservices Debugging Challenges
So here is the picture: your huge, cumbersome monolith, full of shitty legacy code written in some old-fashioned, boring dinosaur language, starts falling apart into tiny, beautiful microservices. What can go wrong?
A lot of things. And when they do go wrong, you will find out that, all of a sudden, you can’t put a breakpoint in that new, tiny, beautiful microservice! You can’t make your favorite IDE debugger just stop there; you can’t see the stack, the variable values or the process memory; you can’t pause threads and step through the code line by line (well, I do assume language(s) here, sorry for that).
You can’t do any of this because the suspicious code is no longer just some class instance running, at worst, as another thread in the process your IDE is attached to. It now runs in a dedicated Docker container or Kubernetes pod, possibly written in another language: stateless, asynchronous, lonely.
Or even worse, it is now a Lambda function, which is born and dies hundreds of times in a second somewhere in a distant cloud, throwing NPEs every time it starts. How in the world am I supposed to debug that? What have I done?
This post comes to the rescue. Plenty of techniques, tools and even startups have emerged to address this problem. It is a vibrant and constantly evolving (which is another way to say poor and incomplete) ecosystem that I will review in two steps: Microservices Local Debugging (this blog post) and Microservices Debugging in Production (stay tuned).
How to Debug Microservices Locally
Let’s see how it is possible to debug microservices when you either develop them locally or try to reproduce and fix a bug. Before I get into solutions, let’s outline the challenges you will face doing that.
Microservices Debugging Locally: The Challenges
In a good old-fashioned monolith, the functionality you were trying to debug (e.g., adding an item to a shopping cart) was implemented by a couple of classes, which made it easy to get a holistic view.
Now, the same functionality is implemented by several separate microservices, each of which can be a Docker container, a Kubernetes pod or a serverless function. You have to run all of them simultaneously to reproduce a bug and then, after a fix, to perform an impact analysis.
To top it all off, to recreate the exact picture, each of these services must run the same version as in production: either all of them at one common release version or, even worse, the exact version each service was running in the specific production environment where the bug was reported. Creating this environment is a huge challenge, and if you don’t get it right, you won’t be able to debug properly.
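Even a trivial check helps here. The Python sketch below compares the versions you run locally against a snapshot of what was deployed where the bug was reported. The service names, the version strings and the use of a plain dictionary as the “manifest” are all hypothetical; in practice you would pull both sides from your deployment tooling.

```python
from typing import Dict, List


def version_mismatches(production: Dict[str, str], local: Dict[str, str]) -> List[str]:
    """List every service whose local version differs from (or is missing vs.) production."""
    problems = []
    for service, prod_version in sorted(production.items()):
        local_version = local.get(service)
        if local_version is None:
            problems.append(f"{service}: not running locally (production has {prod_version})")
        elif local_version != prod_version:
            problems.append(f"{service}: local {local_version} != production {prod_version}")
    return problems


if __name__ == "__main__":
    # Hypothetical snapshot of the environment where the bug was reported.
    production = {"cart": "1.4.2", "pricing": "2.0.1", "inventory": "0.9.7"}
    # Hypothetical versions currently running on the developer machine.
    local = {"cart": "1.4.2", "pricing": "2.1.0"}
    for line in version_mismatches(production, local):
        print(line)
```

An empty result does not prove the environments are identical (configuration and data can still differ), but a non-empty one tells you not to trust the reproduction yet.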
Asynchronous Calls Complications
In microservices, direct synchronous method invocations (or, at worst, message queues between threads) are replaced with synchronous REST or gRPC API calls. Even worse, they are sometimes replaced with an asynchronous, event-driven architecture built on one of the many available message queues (async gRPC is also an option).
Unfortunately, the issues of in-process message queues are nothing like the ones you will face with distributed message queues: the configuration is complicated and full of nuances, latency and performance are not always predictable, operational costs are very high (yes, Kafka, I am looking at you), and you may run out of budget very quickly if you are using managed solutions.
Forget about stack traces, forget about logs. Actually no, don’t forget about logs; forget about understanding anything by digging into the logs of a single microservice. The magic ERROR lines you are looking for may be printed into the logs of some other microservice at an undefined time offset, interleaved with totally unrelated ERROR lines printed while handling a different HTTP request. In other words, recreating the application state that led to a bug is often mission impossible.
Back in the day it was one language to write them all; now it is a Noah’s ark of languages, and you might have no idea WTF is going on with the “undefined has no properties” error that some weakly, dynamically typed language loves to throw (who let this one be a backend language, for crying out loud?).
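If every log line carries a request identifier (the Correlation Identifier I come back to below), untangling this mess becomes a merge-and-filter job. Here is a minimal Python sketch, assuming each service’s log has already been parsed into records sorted by timestamp; the LogLine fields are hypothetical. Keep in mind that timestamps from different machines are only as trustworthy as their clock synchronization, so the merged order is approximate.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class LogLine:
    timestamp: float  # epoch seconds; only comparable across hosts if clocks are in sync
    service: str
    request_id: str   # hypothetical field: ID of the request that produced this line
    message: str


def merge_logs(*streams: Iterable[LogLine]) -> List[LogLine]:
    """Merge per-service streams, each already sorted by time, into one timeline."""
    return list(heapq.merge(*streams, key=lambda line: line.timestamp))


def lines_for_request(lines: Iterable[LogLine], request_id: str) -> List[LogLine]:
    """Keep only the lines that belong to one request, across all services."""
    return [line for line in lines if line.request_id == request_id]


if __name__ == "__main__":
    cart = [LogLine(1.0, "cart", "req-1", "adding item"),
            LogLine(3.0, "cart", "req-2", "ERROR cart not found")]
    pricing = [LogLine(2.0, "pricing", "req-1", "ERROR price lookup failed"),
               LogLine(2.5, "pricing", "req-2", "price ok")]
    for line in lines_for_request(merge_logs(cart, pricing), "req-1"):
        print(line.timestamp, line.service, line.message)
```

heapq.merge is lazy and never loads more than one pending line per stream, which matters once the logs stop fitting in memory; the list() call here is only for convenience.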
Technical Difficulties to Run Microservices Locally
Well, that’s what Docker was invented for in the first place, right? Run docker-compose up and we are done. Ok, and what about a Kubernetes cluster? A Kafka cluster? A bunch of Lambda functions? And then ~your laptop melted~ needs more RAM and CPU.
Now it is easy to see why, until recently, many teams just gave up. For some it cost days, for others weeks of frustration, anger and suppressed aggression, and I haven’t even gotten to debugging in production yet. The industry reacted quickly to this mess and came up with plenty of solutions addressing these issues. They are still nowhere near the speed and convenience of debugging a monolith with an IDE debugger, but the gap is slowly closing. Let’s take a closer look at what you can do.
How You Can Debug Microservices
So what is in our debugging kit as of July 2020? Let’s look at the main tools and platforms out there, and how they can help you.
Cloud Infrastructure-as-Code Tools
There are plenty of infrastructure orchestration tools, such as Terraform and AWS CloudFormation, as well as configuration management tools like Ansible or Puppet, that automate the deployment and configuration of complex applications. With these tools you can spin up a debugging environment quickly and reproducibly, subject to your budget constraints, of course. To optimize costs, you can offload only some of the services to a remote cloud and run the rest locally on your machine.
All microservices must send logs to a centralized, preferably external, service. This way you can investigate, trace and find the root cause of a bug much more easily than by switching between multiple log files open in your local text editor. You can choose from plenty of managed services like Logz.io and Datadog, deploy your own ELK stack, or just send the logs to ~/dev/null~ cold S3 storage. If you do not know when you will need the logs, the latter is a much cheaper option, and you can always fetch them later. The most important thing is to implement a Correlation Identifier, and there are some more best practices you should definitely read about.
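On the application side, the hybrid setup can be as simple as resolving each dependency to either a local or a remote address. A hypothetical sketch: the service names, ports and example.com URLs are placeholders, and the LOCAL_SERVICES environment variable is my own convention, not something Terraform or CloudFormation provides.

```python
import os
from typing import Dict, Set

# Hypothetical addresses: local ports for services you run on your laptop,
# remote URLs for the copies deployed to a shared cloud environment.
LOCAL_ENDPOINTS: Dict[str, str] = {
    "cart": "http://localhost:8081",
    "pricing": "http://localhost:8082",
    "inventory": "http://localhost:8083",
}
REMOTE_ENDPOINTS: Dict[str, str] = {
    "cart": "https://cart.dev.example.com",
    "pricing": "https://pricing.dev.example.com",
    "inventory": "https://inventory.dev.example.com",
}


def local_services_from_env(var: str = "LOCAL_SERVICES") -> Set[str]:
    """Parse a comma-separated list such as 'cart,pricing' from the environment."""
    raw = os.environ.get(var, "")
    return {name.strip() for name in raw.split(",") if name.strip()}


def resolve_endpoints(local_services: Set[str]) -> Dict[str, str]:
    """Point each service at its local address if it runs locally, else at the cloud copy."""
    unknown = local_services - set(REMOTE_ENDPOINTS)
    if unknown:
        raise ValueError(f"unknown services: {sorted(unknown)}")
    return {
        name: LOCAL_ENDPOINTS[name] if name in local_services else remote
        for name, remote in REMOTE_ENDPOINTS.items()
    }


if __name__ == "__main__":
    # e.g. LOCAL_SERVICES=pricing python endpoints.py
    for service, url in resolve_endpoints(local_services_from_env()).items():
        print(f"{service} -> {url}")
```

Failing loudly on an unknown name is deliberate: a typo in LOCAL_SERVICES would otherwise silently send your “local” debugging session to the shared cloud copy.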
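In practice, a Correlation Identifier boils down to three steps: take the ID from the incoming request (or mint one at the edge), stamp it on every log line, and forward it on every outgoing call. A minimal Python sketch using only the standard library; the X-Correlation-ID header name is a common convention rather than a standard, and wiring start_request into your web framework is left out.

```python
import contextvars
import logging
import uuid
from typing import Dict, Optional

# Assumed header name; X-Correlation-ID is a common convention, not a standard.
HEADER = "X-Correlation-ID"

# Correlation ID of the request handled in the current context. Each asyncio task
# runs in a copy of the context, so concurrent requests don't overwrite each other.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Attach the current correlation ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True  # never drop records, only annotate them


def start_request(incoming_headers: Dict[str, str]) -> str:
    """Reuse the caller's ID if present (headers assumed normalized), else mint one."""
    cid = incoming_headers.get(HEADER) or str(uuid.uuid4())
    correlation_id.set(cid)
    return cid


def outgoing_headers(extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Headers for downstream calls (HTTP headers, gRPC metadata or message attributes)."""
    headers = dict(extra or {})
    headers[HEADER] = correlation_id.get()
    return headers


def make_logger(service: str) -> logging.Logger:
    """Logger whose lines include the correlation ID, ready for a central log store."""
    logger = logging.getLogger(service)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.addFilter(CorrelationIdFilter())
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(name)s [%(correlation_id)s] %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    log = make_logger("cart")
    start_request({HEADER: "abc-123"})
    log.info("adding item to cart")
    print(outgoing_headers())
```

Once every service does this, searching the central log store for a single ID gives you the whole story of one request, no matter which service printed which line.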
Serverless Frameworks IaC
Some of your microservices might be implemented with serverless solutions like FaaS and/or other managed services like API Gateway. There are two main players that provide Infrastructure-as-Code frameworks for serverless: the cloud-agnostic Serverless Framework and AWS SAM, which is just an abstraction layer over CloudFormation. Back in the day, developing and debugging FaaS was a real mess, but these days both allow local debugging, and SAM even lets you use the local debugger of some popular IDEs (Visual Studio, IntelliJ) through the handy AWS Toolkit. A real time saver!
Running Docker Compose locally is trivial unless you need a more sophisticated architecture, for example a Kafka cluster alongside your Docker containers. Then things start getting complicated, though still feasible; take a look.
Kubernetes, though, is much more difficult. Some tools, such as MicroK8s and Minikube, try to simplify local Kubernetes deployment, but both still require a lot of effort; then again, you shouldn’t expect an easy life when dealing with Kubernetes anyway.
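A related trick that needs no tooling at all: a Lambda handler is just a function, so you can call it directly with a hand-crafted event and set breakpoints in any IDE. The handler below is hypothetical; its event and response follow the shape of an API Gateway proxy integration (a JSON string in body, a statusCode in the response). It is not a substitute for sam local invoke, which also emulates the Lambda runtime around your code.

```python
import json
from typing import Any, Dict


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Hypothetical 'add to cart' handler using the API Gateway proxy event shape."""
    body = json.loads(event.get("body") or "{}")
    item = body.get("item")
    if not item:
        return {"statusCode": 400, "body": json.dumps({"error": "missing item"})}
    # Real business logic (calls to other services, a database) would go here.
    return {"statusCode": 200, "body": json.dumps({"added": item})}


if __name__ == "__main__":
    # Hand-crafted event with only the fields this handler reads;
    # set a breakpoint in handler() and run this file under your IDE debugger.
    fake_event = {"body": json.dumps({"item": "book"})}
    print(handler(fake_event, context=None))
```

This works best when the handler stays thin and the real logic lives in plain functions, which you can then debug and unit test without any cloud in the loop.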
Dedicated “Debuggers for Microservices”
Not very convincing so far, right? I mean, after a lot of effort you can (barely) create a debugging environment and see logs in a manner that makes sense, things you hardly think about when debugging a monolith. But what about the debugging capabilities that really matter: the ability to set breakpoints throughout the application, follow variable values on the fly, step through the code and change values at runtime?
Well, if your microservices run on Kubernetes, you can get all of those, at least partially. Two powerful open source tools, Squash and Telepresence, let you use your local IDE debugger features against a Kubernetes environment, saving your laptop from melting down while running Minikube.
Squash builds a bridge between some of the popular IDEs and debuggers (see the full list here) and uses a sidecar approach to deploy its client on every Kubernetes node (the authors claim very low performance and resource consumption overhead). This lets you use all the powerful features of your local debugger, such as live debugging, setting breakpoints, stepping through code, and viewing and modifying the values of variables of interest. You can find a thorough guide here.
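To get a feel for what Squash automates, here is the manual equivalent for a single Python service: open a debug listener inside the container and attach your IDE to it over a port-forward. This is a sketch using debugpy (a third-party package, installed separately) and a DEBUG_PORT environment variable I made up for the example; it does not describe how Squash itself is implemented. Only enable it in development clusters, since anyone who can reach an open debug port can run code in the process.

```python
import os


def maybe_wait_for_debugger(env_var: str = "DEBUG_PORT") -> bool:
    """Start a debugpy listener if env_var is set; return True once an IDE has attached."""
    port = os.environ.get(env_var)
    if not port:
        return False  # normal startup: no debugger, no open debug port
    import debugpy  # third-party (pip install debugpy); imported only when needed

    # Listen on all interfaces so a port-forward into the pod can reach the listener.
    debugpy.listen(("0.0.0.0", int(port)))
    debugpy.wait_for_client()  # block until the IDE attaches, so early breakpoints are hit
    return True


if __name__ == "__main__":
    maybe_wait_for_debugger()
    print("service starting")
```

With DEBUG_PORT=5678 set on the pod, kubectl port-forward <pod-name> 5678:5678 exposes the listener on localhost:5678, where an IDE that speaks the Debug Adapter Protocol (VS Code, for instance) can attach. Doing this by hand for every service and language is exactly the chore a tool like Squash takes off your plate.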
Telepresence operates quite differently: it runs the service you want to debug locally while connecting it to a remote Kubernetes cluster, so you can develop and test it locally and use any of your favorite local IDE debuggers seamlessly. A bunch of tutorials, FAQs and docs can be found here.
Unless I missed something (let me know in the comments), that’s what you have in your hands in mid-2020 when it comes to local microservices debugging. It’s far from an ideal world, but it’s much better than just a couple of years ago, and it keeps getting better.
In the next blog post I will discuss the tools and best practices for debugging microservices in the production environment, stay tuned!
Can’t wait for the next blog post? Schedule a Lightrun demo and learn how to debug microservices in production now.