Tracker for Debuggability Improvement

🚀 Key Features

This is a tracker issue for improving debuggability.

graph TD;
DP1-->DP2;
DP2-->DP3;
DP2-->DP4;
DP3-->DP5;
DP4-->DP6;
DP5-->DP6;
DP6-->output;

The graph above can be printed as follows:

>>> print_graph(traverse(dp6))
DP1 -> DP2 -> DP3 -> DP5
         \             \
         DP4 --------> DP6 ->
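
As a rough illustration of text-based graph printing, here is a minimal sketch. It does not reproduce the layout above, and it is not the proposed print_graph. It assumes the graph is a nested dict in which each node maps to a dict of its parents, and it uses plain strings in place of datapipes. The name format_graph is hypothetical.

```python
from typing import Dict, Hashable, List

# Assumed graph shape: each node maps to a dict of its parent nodes,
# nested all the way up to the sources. Strings stand in for datapipes.
Graph = Dict[Hashable, "Graph"]


def format_graph(graph: Graph, indent: int = 0) -> List[str]:
    """Return one line per node, with parents indented below their child."""
    lines: List[str] = []
    for node, parents in graph.items():
        lines.append("  " * indent + str(node))
        lines.extend(format_graph(parents, indent + 1))
    return lines


# The example pipeline, walked backwards from DP6 towards DP1.
dp2 = {"DP2": {"DP1": {}}}
example = {"DP6": {"DP4": dp2, "DP5": {"DP3": dp2}}}

print("\n".join(format_graph(example)))
```

A shared ancestor such as DP2 is printed once per path that reaches it, which keeps the sketch short but is less compact than the drawing above.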

Nice to Haves

This section tracks potential features that we may want.

  • Handling mixed usage of IterDataPipe, MapDataPipe, torcharrow DataFrame
    • Are users able to clearly differentiate these when they are using a mixture of these classes?
  • Connect profiling result with graph
    • Different colors for nodes based on their performance (similar to TensorBoard)
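
One way the profiling idea could look, as a sketch only: map each node's measured time to a color between green (fast) and red (slow). The timings below are made up, and color_for and colors_by_timing are hypothetical helpers, not part of any existing API.

```python
from typing import Dict


def color_for(seconds: float, slowest: float) -> str:
    """Map a timing to a hex color from green (fast) to red (slow)."""
    ratio = 0.0 if slowest <= 0 else min(seconds / slowest, 1.0)
    red = round(255 * ratio)
    green = round(255 * (1.0 - ratio))
    return f"#{red:02x}{green:02x}00"


def colors_by_timing(timings: Dict[str, float]) -> Dict[str, str]:
    """Color every node relative to the slowest node in the pipeline."""
    slowest = max(timings.values(), default=0.0)
    return {name: color_for(t, slowest) for name, t in timings.items()}


# Made-up per-node timings in seconds, for illustration only.
timings = {"DP1": 0.0, "DP2": 0.5, "DP3": 2.0}
print(colors_by_timing(timings))
```

The resulting mapping could be passed to whichever drawing backend renders the graph.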

Motivation, pitch

This would help our users and developers easily understand what’s going on with the pipeline. Feel free to post more requests for debuggability features.

Alternatives

No response

Additional context

No response

cc: @NivekT @VitalyFedyunin

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

1 reaction
pmeier commented, Mar 18, 2022

One thing that would also simplify debugging is for each datapipe to have a __repr__. We can follow the approach nn.Module takes:

class IterDataPipe:
    def extra_repr(self) -> str:
        return ""

    def __repr__(self) -> str:
        return f"{type(self).__name__.replace('IterDataPipe', '')}({self.extra_repr()})"


class MinimalIterDataPipe(IterDataPipe):
    pass


def my_map(x):
    return x


class MapperIterDataPipe(IterDataPipe):
    def __init__(self, fn):
        self.fn = fn

    def extra_repr(self):
        return self.fn.__name__


print(MinimalIterDataPipe())
print(MapperIterDataPipe(my_map))

Output:

Minimal()
Mapper(my_map)

That would also improve the graph visualization from #299, since each node could contain the __repr__ of the datapipe rather than just its name.
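
A sketch of how a graph node could pick its label under this idea: use the object's __repr__ when its class defines one, and fall back to the class name otherwise. The classes Mapper and Plain and the helper node_label are hypothetical stand-ins, not torchdata classes.

```python
class Mapper:
    """Stand-in for a datapipe with an nn.Module-style __repr__."""

    def __init__(self, fn):
        self.fn = fn

    def __repr__(self) -> str:
        return f"Mapper({self.fn.__name__})"


class Plain:
    """Stand-in for a datapipe that defines no __repr__ of its own."""


def node_label(obj) -> str:
    # Use the custom __repr__ if the class overrides object's default one;
    # otherwise fall back to the bare class name.
    if type(obj).__repr__ is not object.__repr__:
        return repr(obj)
    return type(obj).__name__


print(node_label(Mapper(len)))
print(node_label(Plain()))
```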

1 reaction
pmeier commented, Mar 16, 2022

Regarding graph visualization: I needed this today and hacked something together:

from __future__ import annotations

import dataclasses
from typing import Any, Optional

import matplotlib.pyplot as plt
import networkx as nx
from torch.utils.data.graph import traverse


@dataclasses.dataclass(repr=False)
class Node:
    # Wraps a datapipe together with the node that consumes its output.
    obj: Any
    child: Optional[Node] = None

    def __repr__(self):
        # Shown as the node label in the plot.
        return type(self.obj).__name__

    def __hash__(self):
        return hash(self.obj)


def scan(graph, child=None):
    # Treats the graph as a nested mapping: datapipe -> mapping of its parents.
    for node, parents in graph.items():
        current = Node(node, child)
        yield current
        yield from scan(parents, child=current)


def visualize_graph(dp):
    G = nx.DiGraph()
    # set() drops nodes that were reached more than once with the same child.
    for node in set(scan(traverse(dp))):
        if node.child is not None:
            G.add_edge(node, node.child)
    nx.draw_networkx(G)
    plt.show()

Simple example:

from torchdata.datapipes.iter import FileLister, FileOpener

dp = FileLister()
dp = FileOpener(dp).filter(bool).map(list)

visualize_graph(dp)

[Image: graph rendered for the simple example]

Complex example:

from torchvision.prototype import datasets

dp = datasets.load("coco")

visualize_graph(dp)

[Image: graph rendered for the coco example]


I’ve used networkx as the backend here, but we can use any graph visualization library. Let me know if I should prettify the plots and send a PR.
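
To illustrate that the backend is interchangeable, here is a sketch that emits Graphviz DOT text instead of drawing with networkx. It assumes the same nested shape the scan function above iterates over (each node maps to a dict of its parents) and uses made-up string names in place of real datapipes. The helpers edges and to_dot are hypothetical.

```python
from typing import Dict, Hashable, Iterator, Tuple

# Assumed graph shape: each node maps to a dict of its parents.
Graph = Dict[Hashable, "Graph"]


def edges(graph: Graph) -> Iterator[Tuple[str, str]]:
    """Yield (parent, child) name pairs from a child -> parents nested dict."""
    for child, parents in graph.items():
        for parent in parents:
            yield str(parent), str(child)
        yield from edges(parents)


def to_dot(graph: Graph) -> str:
    """Render the graph as DOT text, listing each edge once in sorted order."""
    unique = sorted(set(edges(graph)))
    body = "\n".join(f'  "{src}" -> "{dst}";' for src, dst in unique)
    return "digraph pipeline {\n" + body + "\n}"


# Made-up node names, loosely following the simple example above.
example = {"Mapper": {"Filter": {"FileOpener": {"FileLister": {}}}}}
print(to_dot(example))
```

The resulting text can be saved to a file and rendered with the Graphviz dot command, for example dot -Tpng.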
