
Industry

BI

Use case

Real-time Lightrun Logs & Snapshots for incident resolution by developers and Tier 4 Support engineers

Results

A more streamlined, quick-paced incident resolution process for Sisense engineers

Sisense helps businesses go beyond the dashboard, infuse analytics everywhere and empower their customers and employees to act on their data at the right time, every time. The Sisense Fusion platform provides businesses the analytics foundation they need to transform their business inside and out and deliver AI-powered insights. Sisense is unique in that it offers businesses the analytics building blocks they need to accelerate analytics adoption so employees and customers can make better, faster decisions, including: 

  • A high-performing elastic engine to simplify complex data and analysis 
  • A highly extensible platform with best-in-class APIs for limitless customization to build a personalized analytic experience 
  • A cloud-agnostic architecture for scale and flexibility
  • Out-of-the-box AI that automatically delivers actionable intelligence to anyone, when and where they need it

More than 2,000 global companies rely on Sisense to innovate, disrupt markets and drive meaningful change in the world.

The Challenge

Making a manual, repetitive process less cumbersome

As a BI company, Sisense deals with enormous amounts of user data every day. To handle these large data workloads, the Sisense team built a fully scalable, microservices-based infrastructure from the ground up around Kubernetes, hosted on multiple clouds (AWS, GCP, and Azure). This infrastructure is also supported by a large set of in-house, AI-based services that generate insights by leveraging multiple data sources and deep integrations with high-end data warehouses such as Snowflake.

However, even with the best tech money can buy and best-in-class engineers to build on top of it, hyper-growth always comes with a side of complexity. Understanding the current state of the system is inherently tied to understanding the convoluted data structures of the customer information Sisense’s systems were built to untangle.

In practice, when troubleshooting, Sisense engineers often rely on logs and metrics defined during development (the left-hand side of the SDLC) to understand what is happening in production (the right-hand side of the SDLC).

One of the most costly tasks during a debugging session is adding visibility to a running service. At Sisense, this process had been identified as repetitive, cumbersome, and a contributing factor to a higher MTTR (Mean Time To Resolve) for customer incidents, because it relied on manual hotfixes and outdated tools such as remote debuggers. Sisense was looking for innovative ways to streamline the debugging process as much as possible.

“All Sisense engineers that used Lightrun’s product found the tool to be valuable, time-saving, and relevant. It gives us a significant advantage when tackling production issues by allowing for the extraction of real-time information from applications, without having to stop the running process first. Lightrun is currently being adopted by both the R&D department and our Tier 4 Support teams.”

Moshe Ben-Yishay, R&D Architect at Sisense

The Solution

Adding on-demand, real-time Logs & Snapshots with Lightrun

As mentioned earlier, understanding the state of the processes running on a production machine has traditionally meant reading existing application logs or attaching a remote debugger. Both options require engineers to stop the running Java application at various points during the investigation, whether to redeploy it with additional logging or to pause it at a breakpoint.

For Sisense, piecing together what was happening inside the application from incomplete information, and debugging a process breakpoint by breakpoint, was not interactive enough and unbearably slow.

At this point, Moshe Ben-Yishay, an R&D architect at Sisense, introduced Lightrun into the Sisense toolchain as part of a PoC (Proof of Concept). With Lightrun, developers could gain deep insight into running Java processes by adding real-time, on-demand Logs and Snapshots. This let them see how the application behaves on the affected machine, in real time, as the incident occurs.

Lightrun offers an intuitive, local-like experience for debugging production services, which has increased the speed at which Sisense's engineers handle incidents. Days-long iterations that consisted of:

  1. Figuring out which piece of the code needed extra visibility
  2. Adding the required logs and measurements in the relevant places
  3. Deploying the release to the production server
  4. For on-prem instances, waiting for the customer to send back the relevant information
  5. Inspecting the returned information and repeating the whole process with new questions

were reduced with Lightrun to sessions lasting less than an hour, following a simple two-step process:

  1. Add relevant logs, metrics, and traces to your code
  2. See all information immediately inside your IDE or APM (and even Slack!)

The Results

2-3 days' worth of debugging effort reduced to under an hour

By using the information Lightrun provides to debug iteratively, Sisense developers can now reach the root cause of production issues much faster.

During the postmortem of one specific incident, the value of adding Lightrun Actions (logs, snapshots & metrics) and then piping the results back to each developer’s IDE was especially clear. 

In this particular case, Sisense engineers had to reproduce the issue on a machine in their private cloud. Even after reproducing the problem, however, every additional piece of information required redeploying the application to that machine with more logs. This was a long, arduous process for what is supposed to be a fast-paced investigation.

Using Lightrun, they reduced a process that used to take 2-3 days (given all the redeployments and context switches) to one that takes less than an hour. By enabling a real-time debugging experience, Lightrun gave the engineers the same valuable information without the costly, cumbersome instrumentation previously required.

The Sisense team also decided to extend the Lightrun PoC to its Tier 4 Support engineers. This proved extremely useful for Sisense's first responders when analyzing underlying issues, enabling them to act faster and with more granular information. Lightrun is now one of the suggested tools in the company's incident response runbooks and has proven to significantly increase developer productivity and reduce MTTR.