
[RFC][DEBUG]Support a debug framework for TVM Runtime


Objective

Support a debugging tool for TVM's computation graphs that provides access to internal graph structures, ops, and input and output values at TVM runtime.

In TVM's current computation-graph framework, computation after graph construction happens inside a Python function (graph_runtime.run). Basic Python debugging tools such as pdb cannot be used to debug graph_runtime.run because TVM's graph execution happens in the underlying C++ layer. C++ debugging tools such as gdb are not ideal either, because they cannot recognise and organise stack frames and variables in a way that is relevant to TVM's operations, tensors and other graph constructs.

Runtime debugging will fulfil the objectives below:

  • Easy enabling of debugging by setting a flag when creating the graph runtime.
  • Inspection of runtime op output values and node connections.

TODOs

  • Show fused graph summary
  • Perform a debug run and show node details, including input & output tensors
  • Provide flexibility to run without debug
  • Call graph run-time n times from UI
  • Support check for NaN during computation and break
  • Support check for Inf during computation and break
  • Support step debugging (step through the graph nodes one by one)
  • Inject a specific graph node value as numpy array through CLI and re-run the dependent nodes explicitly
  • Inject a graph node value from dump file through CLI
  • Support dumping of node outputs to a file
  • Support comparison of node output with a dump output
  • Support profiler for performance debugging
  • Test framework for tvmdbg

Proposed API Changes

tvm.contrib.graph_runtime.create adds a new Boolean flag debug to make the runtime debuggable; this API is exposed to the user to enable or disable the debug functionality. In class GraphModule, two members, debug and dbgobj, are added: the debug flag stores whether debugging is enabled for this module, and dbgobj holds the debug-runtime object (including the UI framework).
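From the user's point of view, the proposed flag would be used roughly as follows. This is a minimal sketch based on the RFC text: graph, lib and ctx are assumed to come from a prior compilation step, and the debug keyword is the proposal here, not necessarily the final upstream signature.

# Minimal usage sketch of the proposed `debug` flag; `graph`, `lib` and `ctx`
# are assumed to come from a normal build step, and `debug=True` is the flag
# proposed in this RFC, not necessarily the final upstream API.
import numpy as np
from tvm.contrib import graph_runtime

module = graph_runtime.create(graph, lib, ctx, debug=True)   # proposed flag
module.set_input("data", np.random.rand(1, 3, 224, 224).astype("float32"))
module.run()   # with debug enabled, this drops into the ncurses CLI described below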

tvm.contrib.graph_runtime.set_inputs is modified to pass the input data set from the script to the debug runtime when the debug flag is enabled.

tvm.contrib.graph_runtime.run is modified to invoke _debug_cli_run, which brings up the ncurses framework. The ncurses framework waits for the user's input before running. Once the user gives the input, runtime.GraphRuntime.DebugRun() in graph_runtime.cc is invoked if the user chooses to run with debugging; otherwise the usual runtime.GraphRuntime.Run() in graph_runtime.cc is invoked. DebugRun can execute a specific node only if all of its inputs are ready. c_runtime_api.h is modified to add a new struct to hold the output information.

/*!
 * \brief Debug tensor holding a node's output and the timestamp of its execution.
 */
typedef struct {
  /*! \brief DL Tensor to collect the output. */
  DLTensor out_tensor;
  /*! \brief The timestamp of each output */
  int64_t time_stamp;
} TVMDbgTensor;
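On the Python side, the dispatch described above can be pictured roughly as follows. This is an illustrative sketch only: debug, dbgobj and _debug_cli_run are names from this RFC, and the wiring is a simplification of the real GraphModule rather than its actual implementation.

# Illustrative sketch only: `debug`, `dbgobj` and `_debug_cli_run` are names
# from this RFC, and the wiring shown here simplifies the real GraphModule.
class GraphModule:
    def __init__(self, module, debug=False, dbgobj=None):
        self._run = module["run"]       # packed function -> GraphRuntime::Run()
        self.debug = debug              # proposed flag from graph_runtime.create
        self.dbgobj = dbgobj            # debug runtime plus ncurses UI wrapper

    def run(self):
        if self.debug:
            # The CLI waits for the user's choice, then calls either
            # GraphRuntime::DebugRun() (node by node) or GraphRuntime::Run().
            self.dbgobj._debug_cli_run(self)
        else:
            self._run()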

tvm.contrib.graph_runtime.set_debug_buffers: this new API is introduced to collect the run output of each node. In GraphRuntime, a new field std::vector<TVMDbgTensor*> debug_buffers_; is introduced to store the pointers to the output buffers.

After each operation executes in runtime.GraphRuntime.DebugRun(), its output is copied to the debug buffer, and the outputs are dumped to a temporary directory. The UI framework reads these outputs from the temporary directory and shows them in the display.
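A rough sketch of that dump step is shown below. The directory layout and the .npy file format are assumptions made for illustration; the RFC does not fix a dump format.

# Hedged sketch of dumping per-node outputs after DebugRun(); the directory
# layout and the .npy format are assumptions for illustration only.
import os
import tempfile
import numpy as np

def dump_node_outputs(node_names, debug_buffers, dump_root=None):
    """Write each node's output tensor to a temp directory for the UI to read."""
    dump_root = dump_root or tempfile.mkdtemp(prefix="tvmdbg_")
    for name, dbg_tensor in zip(node_names, debug_buffers):
        out = dbg_tensor.asnumpy() if hasattr(dbg_tensor, "asnumpy") else np.asarray(dbg_tensor)
        np.save(os.path.join(dump_root, name.replace("/", "_") + ".npy"), out)
    return dump_root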

tvm.contrib.graph_runtime.inject_value is used to inject a node tensor value during execution. Stepper functionality is supported to run the graph node by node. The stepper is invoked with invoke_stepper from tvm.tools.debug.wrapper.ui_framework based on the user's run option. invoke_stepper in tvm.tools.debug.wrapper.ui_wrapper creates the DebugStepper class (in tvm.tools.debug.ui.ui_stepper) for the stepper UI and its handlers. tvm.tools.debug.runtime.debug_runtime uses tvm.contrib.graph_runtime to create the stepper interfaces below:

  • step: perform the step by step execution from the current node
  • goto: specify the node to be executed next; stepping will continue from that node
  • inject_value: used to inject a node tensor value during the execution

A wrapper-interface layer is created in tvm.tools.debug.wrapper.ui_wrapper for the above interfaces. Based on DebugStepper user events, the stepper runtime interfaces are called through tvm.tools.debug.wrapper.ui_wrapper; a usage sketch follows.
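Only the interface names (step, goto, inject_value) come from this RFC; the stepper() accessor, the node names and the exact call signatures below are illustrative assumptions.

# Hedged sketch of stepper usage; `dbg_module` stands for the debug-runtime
# handle described in this RFC, and the signatures shown are assumptions.
import numpy as np

stepper = dbg_module.stepper()            # assumed accessor on the debug runtime
stepper.step()                            # execute the current node and advance
stepper.goto("fused_conv2d_relu_0")       # continue execution from a chosen node
stepper.inject_value("fused_conv2d_relu_0", np.zeros((1, 64, 56, 56), "float32"))
stepper.step()                            # downstream nodes see the injected value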

The TVMDBG profiler can be used to profile the model in terms of TVM kernels. The objective is to provide the execution time of each graph node and map it to its source in the TVM kernels. This can be used to identify time-consuming nodes and analyse their kernel source, which helps pinpoint the areas that need further analysis and optimisation.
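For example, per-node durations could be derived from the time_stamp recorded in each TVMDbgTensor. The aggregation below is an illustrative sketch that assumes timestamps in microseconds relative to the start of the run; it is not the profiler's actual output format.

# Hedged sketch: derive per-node durations from completion timestamps (as in
# TVMDbgTensor.time_stamp), assuming microseconds relative to the run start.
def report_node_times(node_names, timestamps_us):
    starts = [0] + timestamps_us[:-1]
    durations = [(n, end - start) for n, start, end in zip(node_names, starts, timestamps_us)]
    for name, dur in sorted(durations, key=lambda x: x[1], reverse=True):
        print(f"{name:<30s} {dur:>10d} us")

report_node_times(["fused_conv2d_0", "fused_relu_0", "fused_dense_0"], [1500, 1600, 2400])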


Top GitHub Comments

tqchen commented, Jun 22, 2018 (3 reactions)

First of all, it could indeed be helpful to introduce a profiling mode into the runtime. There are three major technical issues that I would like to see addressed.

Zero Overhead

We should design the debug runtime or profiler to be zero cost; this means we should not have to worry about it when it is switched off. That would likely mean we need a common implementation with two subclasses of the graph runtime, with the debugger only linked in when debugging is switched on.

Ideally, the debugger/profiler should not introduce changes to data structures and should keep everything internal.
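As an illustration of that layout (a sketch only, not TVM's actual class hierarchy), the plain runtime pays no instrumentation cost and everything debug-related lives in the subclass:

# Illustrative sketch of the "common base + debug subclass" layout; this is
# not TVM's real class hierarchy, only the shape of the zero-overhead idea.
import time

class GraphRuntimeBase:
    def __init__(self, ops):
        self.ops = ops                      # list of (name, callable) pairs

    def run(self):
        for _, fexec in self.ops:
            fexec()

class GraphRuntimeDebug(GraphRuntimeBase):
    def __init__(self, ops):
        super().__init__(ops)
        self.per_node_time = {}             # logging state kept inside the subclass

    def run(self):
        for name, fexec in self.ops:
            start = time.perf_counter()
            fexec()
            self.per_node_time[name] = time.perf_counter() - start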

Clear Log Data Schema and Separation of Logger and UX

It is important to have a clear separation between the UX and the data logging. In this case, a data schema for the log data being generated is extremely important, as we may want to switch the UX and keep it de-coupled from the logger itself.
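For instance, a line-oriented log format could carry everything a UX needs while staying frontend-agnostic. The record below is a hypothetical example; the field names are not a schema defined by TVM.

# Hypothetical example of a line-oriented log record; the field names are
# illustrative only, not a schema defined by TVM.
import json

record = {
    "node": "fused_conv2d_relu_0",
    "op": "tvm_op",
    "shape": [1, 64, 56, 56],
    "dtype": "float32",
    "time_us": 1532,
    "output_dump": "tvmdbg_run0/fused_conv2d_relu_0.npy",
}
print(json.dumps(record))   # one JSON object per executed node, one per line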

Choice of UX

Where possible, is it OK to reuse existing UX frameworks, for example by logging data in the TensorBoard format and reusing TensorBoard's infrastructure, which seems better than the current one? Of course, this design choice can be deferred as long as there is a clear data schema that does the de-coupling.

tqchen commented, Oct 4, 2018 (0 reactions)

The first debugger version has been merged in #1378, with profiling statistics for each layer
