  • Industry: Software Development
  • Use case: Real-time Lightrun Logs & Snapshots for faster issue resolution and a lower MTTR (Mean Time to Resolution).
  • Results: A developer-tailored solution for troubleshooting directly from the developer’s IDE, without the need to edit code or produce new builds just to add observability.

How Automation Anywhere’s SRE Team Mastered Fast Issue Resolution with Lightrun

Introduction

Automation Anywhere is a leading company in the field of Robotic Process Automation (RPA). It provides a comprehensive platform for automating business processes using software robots, or “bots,” that can perform a wide range of tasks typically carried out by human workers. These tasks include data entry, transaction processing, and responding to customer queries, among others.

Automation Anywhere helps organizations increase efficiency, reduce operational costs, and improve productivity by automating time-consuming and error-prone tasks, allowing human employees to focus on more strategic and value-added activities.

As an SRE on the Engineering team at Automation Anywhere, my primary focus is resolving customer issues related to our product. Our technology stack includes AWS, GCP, Java, and containers, and we support both on-premises and cloud-only solutions.

Technology Stack and Challenges

Our application has robust debugging and logging capabilities, designed to support logging at every level, from INFO to DEBUG and beyond. This versatility is crucial for troubleshooting and maintaining high-quality performance in both on-premises and cloud environments. However, in our cloud support environment, we usually limit logging to the INFO level. The reasons are primarily performance, security, and operational efficiency, and together they make it hard to enable debug-level logging when a customer issue arises.

Performance Constraints

Enabling debug-level logging in a cloud environment can significantly impact performance. Debug logs are typically more verbose, generating a large volume of data that needs to be processed and stored. This increased data flow can strain system resources, leading to slower response times and potentially degrading the user experience. In a cloud setting, where resources are shared and dynamically allocated, this performance hit can be even more pronounced, affecting not just individual instances but potentially the entire application’s responsiveness.

Security Concerns

Security is another critical factor influencing our logging strategy. Debug logs often contain detailed information about the system’s internal workings, including potentially sensitive data. In a cloud environment, where security threats are more prevalent, enabling extensive logging could expose this sensitive information to unauthorized access or breaches. Therefore, we maintain INFO level logging to balance the need for operational insights with the imperative to protect our data and infrastructure.

Operational Efficiency

Operational efficiency is also a key consideration. Managing and analyzing debug-level logs can be time-consuming and complex, particularly in a large-scale cloud environment. The sheer volume of logs generated at the debug level can overwhelm our monitoring systems and personnel, leading to delays in identifying and resolving issues. This can result in increased operational costs and reduced overall efficiency.

Customer Impact

When customer issues arise, these constraints present significant challenges. The limited logging information available at the INFO level often lacks the detailed context needed to diagnose and resolve complex issues quickly. This can lead to prolonged troubleshooting times, increased customer frustration, and a potential decline in customer satisfaction. Additionally, the inability to dynamically enable more detailed logging without impacting performance or security hampers our ability to provide swift and effective support.

Without the ability to dynamically enable these logs, we may have to resort to more disruptive and time-consuming methods, such as replicating the issue in a staging environment or implementing temporary logging changes, which could still miss the mark due to differences between environments.

In summary, while our application’s debugging and logging capabilities are inherently robust, the constraints of our cloud support environment necessitate a more limited approach. This creates a challenging dynamic where the need for detailed insights must be balanced against performance, security, and operational efficiency. Addressing these challenges is crucial for maintaining high levels of customer support and satisfaction.

Lightrun Advantages

Lightrun, a powerful observability platform, has transformed our debugging process. It provides a secure, restricted, dynamic, and flexible approach to logging and troubleshooting that directly addresses the constraints described above.

Dynamic Logging

One of Lightrun’s standout features is its ability to add logs dynamically at runtime. Unlike traditional logging mechanisms that require pre-defined log levels and configurations, Lightrun allows us to inject new log statements into live applications without the need for redeployment or restarts. This means we can raise our logging detail on demand, shifting from INFO to DEBUG or any other level of granularity as required, all while the application continues to run uninterrupted. This real-time logging capability is invaluable for quickly isolating issues in production environments, where changes and redeployments are often impractical or risky.
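The contrast between a fixed log level and on-demand detail can be illustrated with the JDK's own `java.util.logging`. This is only a sketch of the general idea of raising verbosity while a process keeps running; it is not Lightrun's API or mechanism. The logger name `demo.orders`, the `processOrder` method, and the `CapturingHandler` class are hypothetical, and `FINE` stands in for DEBUG.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

final class RuntimeLogLevelDemo {

    /** In-memory handler so the sketch can show which records were emitted. */
    static final class CapturingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override public void publish(LogRecord record) {
            if (isLoggable(record)) {
                messages.add(record.getLevel() + ": " + record.getMessage());
            }
        }
        @Override public void flush() {}
        @Override public void close() {}
    }

    /** Hypothetical business method with INFO and DEBUG-style (FINE) logging. */
    static void processOrder(Logger log, String orderId) {
        log.info("processing order " + orderId);
        log.fine("order " + orderId + " payload validated");
    }

    static List<String> run() {
        Logger log = Logger.getLogger("demo.orders");
        log.setUseParentHandlers(false);      // keep output confined to our handler
        CapturingHandler handler = new CapturingHandler();
        handler.setLevel(Level.ALL);
        log.addHandler(handler);

        log.setLevel(Level.INFO);             // static production setting
        processOrder(log, "A-1");             // the FINE record is dropped

        log.setLevel(Level.FINE);             // raised while the process keeps running
        processOrder(log, "A-2");             // the FINE record is now emitted

        log.removeHandler(handler);
        return handler.messages;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Note that this sketch can only surface log statements that already exist in the code, whereas the capability described above adds new statements to a live application.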

Real-Time/Live Debugging

Beyond dynamic logging, Lightrun offers real-time debugging tools that further enhance our ability to diagnose and resolve issues. We can set breakpoints (snapshots), capture stack traces, and evaluate expressions on live applications, all without affecting their performance or availability. This real-time interaction provides deep insights into the application’s behavior, allowing us to understand the exact state of the system at the moment an issue occurs. By enabling these capabilities directly within the runtime environment, Lightrun eliminates the guesswork and delays associated with traditional debugging methods.
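As a rough mental model of a conditional snapshot, the toy below records variable values and the current stack trace when a condition holds, without pausing the calling thread. Everything in it (`CapturePoint`, `hit`, the `retries` and `botId` variables, the hit limit) is hypothetical and written for illustration; it does not represent how Lightrun implements or exposes snapshots.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class SnapshotSketch {

    /** Captured state: variable values plus the call stack at the capture point. */
    static final class Snapshot {
        final Map<String, Object> variables;
        final StackTraceElement[] stack;

        Snapshot(Map<String, Object> variables, StackTraceElement[] stack) {
            this.variables = variables;
            this.stack = stack;
        }
    }

    /** Hypothetical capture point: records state when the condition holds, never pauses. */
    static final class CapturePoint {
        private final Predicate<Map<String, Object>> condition;
        private final int maxHits;
        final List<Snapshot> captured = new ArrayList<>();

        CapturePoint(Predicate<Map<String, Object>> condition, int maxHits) {
            this.condition = condition;
            this.maxHits = maxHits;
        }

        void hit(Map<String, Object> variables) {
            // The hit limit bounds the overhead; the condition narrows the target.
            if (captured.size() < maxHits && condition.test(variables)) {
                captured.add(new Snapshot(new LinkedHashMap<>(variables),
                        Thread.currentThread().getStackTrace()));
            }
        }
    }

    static List<Snapshot> run() {
        // Capture once, and only when the made-up retry counter exceeds 2.
        CapturePoint point =
                new CapturePoint(vars -> ((Integer) vars.get("retries")) > 2, 1);
        for (int retries = 0; retries <= 5; retries++) {
            Map<String, Object> vars = new LinkedHashMap<>();
            vars.put("botId", "bot-42");
            vars.put("retries", retries);
            point.hit(vars);                  // execution continues either way
        }
        return point.captured;
    }
}
```

The condition and the hit limit are the two knobs that keep such a capture narrowly targeted: the loop runs six times, but only the first iteration that satisfies the condition is recorded.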

Enhanced Enterprise-Grade Security

Lightrun’s design also addresses our security concerns through features such as SSO, RBAC, and SCIM. The platform keeps sensitive data protected by providing fine-grained control over what information is logged and how it is accessed (through PII redaction). We can selectively log specific data points relevant to the issue at hand, minimizing the exposure of potentially sensitive information. Additionally, Lightrun’s integration with existing security and access control frameworks ensures that only authorized personnel can add or modify logs, maintaining the integrity and confidentiality of our application data.
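The general idea behind redaction can be sketched as a pattern-based filter applied to a message before it is written. The class below is an assumption-laden illustration, not Lightrun's PII Redaction feature: the `PiiRedactor` name, the `[REDACTED]` marker, and the deliberately simple email pattern are all made up for this example.

```java
import java.util.regex.Pattern;

final class PiiRedactor {

    // Deliberately simple email pattern for illustration; real PII rules are broader.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    /** Replaces every email-like substring before the message is logged. */
    static String redact(String message) {
        return EMAIL.matcher(message).replaceAll("[REDACTED]");
    }

    public static void main(String[] args) {
        System.out.println(redact("login failed for alice@example.com"));
    }
}
```

Running `main` prints `login failed for [REDACTED]`; messages without an email-like substring pass through unchanged.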

Reduced Performance Impact

A major benefit of Lightrun is its minimal performance overhead. Traditional debug logging can significantly impact application performance, especially in high-load environments. Lightrun, however, is optimized to perform efficiently, ensuring that the addition of new logs or debug points has a negligible effect on the application’s performance. This allows us to gather detailed insights without compromising the user experience or system responsiveness.

Seamless Integration

Lightrun seamlessly integrates with our existing technology stack, including AWS, GCP, Java, and Containers. Its compatibility with our infrastructure means that we can deploy it across our entire environment without the need for extensive modifications or additional tools. This seamless integration extends to our CI/CD pipelines, where Lightrun can be used to enhance our automated testing and continuous monitoring efforts, providing early detection and resolution of issues before they reach production.

Improved Operational Efficiency

By enabling us to add narrowly targeted, conditional logs, adjust logging detail dynamically, and debug in real time, Lightrun significantly improves our operational efficiency. We can respond to customer issues more swiftly and effectively, reducing the mean time to resolution (MTTR) and enhancing overall customer satisfaction. This proactive approach to issue resolution also helps prevent minor issues from escalating into major problems, contributing to a more stable and reliable application environment.
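For reference, MTTR is the total time spent resolving incidents divided by the number of incidents. A minimal sketch of that arithmetic follows; the `Mttr` class is hypothetical, and the durations in the usage note are invented sample values, not measurements from Automation Anywhere.

```java
import java.time.Duration;
import java.util.List;

final class Mttr {

    /** Mean time to resolution: total resolution time divided by incident count. */
    static Duration mean(List<Duration> resolutionTimes) {
        if (resolutionTimes.isEmpty()) {
            throw new IllegalArgumentException("need at least one incident");
        }
        Duration total = Duration.ZERO;
        for (Duration d : resolutionTimes) {
            total = total.plus(d);
        }
        return total.dividedBy(resolutionTimes.size());
    }
}
```

With three made-up incidents resolved in 30, 90, and 60 minutes, `Mttr.mean(List.of(Duration.ofMinutes(30), Duration.ofMinutes(90), Duration.ofMinutes(60)))` returns a 60-minute duration.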

Summary

Lightrun has revolutionized our approach to observability and debugging. Its dynamic logging, real-time debugging, enhanced security, minimal performance impact, seamless integration, and improved operational efficiency have empowered us to overcome the limitations of static logging levels. As a result, we can deliver more reliable, high-quality solutions to our customers, ensuring their satisfaction and trust in our product. Lightrun is not just a tool; it is a game-changer in our continuous effort to provide exceptional support and maintain robust application performance.

This post was originally published on LinkedIn by Gowtham Arjunan
