Sometimes a user may want to look at the logs of completed workers or clusters. Right now, all log handling is backend-specific: users need to be familiar with the cluster backend and with how that backend handles logs. For example, YARN logs are stored in HDFS and can be accessed with the yarn CLI tool.
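Here is a minimal sketch of what fetching YARN logs by hand involves. It assumes that log aggregation is enabled (yarn.log-aggregation-enable) and that the YARN application id is already known. It wraps the real "yarn logs -applicationId <id>" command. The helper names yarn_logs_command and fetch_yarn_logs are hypothetical, and the command runner can be injected so the function can be exercised without a YARN installation.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def yarn_logs_command(application_id: str) -> List[str]:
    """Build the argv that fetches aggregated logs for a finished YARN application."""
    return ["yarn", "logs", "-applicationId", application_id]


def _run_cli(argv: Sequence[str]) -> str:
    # check=True raises CalledProcessError if the yarn CLI exits non-zero.
    result = subprocess.run(list(argv), check=True, capture_output=True, text=True)
    return result.stdout


def fetch_yarn_logs(
    application_id: str,
    run: Optional[Callable[[Sequence[str]], str]] = None,
) -> str:
    """Return all aggregated container logs for the application as one string.

    By default this shells out to the yarn CLI, which must be on PATH;
    pass `run` to substitute a different command runner.
    """
    runner = run if run is not None else _run_cli
    return runner(yarn_logs_command(application_id))
```

A backend-agnostic LogProvider would hide exactly this kind of backend-specific invocation from the user.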

It may be useful for dask-gateway to provide a LogProvider class that different log backends could implement. This might look like:

```python
from traitlets.config import LoggingConfigurable


class LogProvider(LoggingConfigurable):
    """Base class for accessing the logs of completed clusters and workers."""

    def get_logs_for_cluster(self, cluster_name, cluster_state):
        """Get the logs for a completed cluster.

        Parameters
        ----------
        cluster_name : str
            The cluster name.
        cluster_state : dict
            Any backend-specific information (e.g. application id, pod name, ...)

        Returns
        -------
        logs : dict[str, str]
            A mapping from job id to logs for that job.
        """
        raise NotImplementedError

    def get_logs_for_worker(self, cluster_name, cluster_state, worker_name, worker_state):
        """Get the logs for a completed worker."""
        raise NotImplementedError
```

I’d prefer that dask-gateway not manage the storage of these logs (although it could if needed). Instead, it should provide an abstraction for accessing logs wherever some other service or convention keeps them.

Possible implementations for our cluster backends:

  • YARN: this is hard, as YARN has no Java API for this, but we could hack something together.
  • Jobqueue: filesystem-backed. Logs could be stored per user in ~/dask-gateway-logs, or in some directory managed directly by dask-gateway?
  • Kubernetes: I’m not sure. There are many services people might use for logs here. Stackdriver, perhaps?
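The filesystem-backed Jobqueue option could be sketched as follows. The sketch uses a stdlib ABC as a stand-in for the traitlets-based LogProvider above so that it runs on its own. The on-disk layout (<root>/<cluster_name>/<job_id>.log) and the "job_id" key in worker_state are hypothetical conventions chosen for illustration. The default root follows the ~/dask-gateway-logs suggestion. The provider only reads files and never writes them, in keeping with the goal of not having dask-gateway manage log storage.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Union


class LogProvider(ABC):
    """Stdlib-only stand-in for the proposed LogProvider interface."""

    @abstractmethod
    def get_logs_for_cluster(self, cluster_name: str, cluster_state: dict) -> Dict[str, str]:
        """Return a mapping from job id to that job's logs."""

    @abstractmethod
    def get_logs_for_worker(self, cluster_name: str, cluster_state: dict,
                            worker_name: str, worker_state: dict) -> str:
        """Return the logs for a single completed worker."""


class FileSystemLogProvider(LogProvider):
    """Reads logs from <root>/<cluster_name>/<job_id>.log (hypothetical layout)."""

    def __init__(self, root: Union[str, Path] = "~/dask-gateway-logs"):
        self.root = Path(root).expanduser()

    def _cluster_dir(self, cluster_name: str) -> Path:
        return self.root / cluster_name

    def get_logs_for_cluster(self, cluster_name, cluster_state):
        cluster_dir = self._cluster_dir(cluster_name)
        if not cluster_dir.is_dir():
            return {}
        # One file per job; the file name without ".log" is treated as the job id.
        return {path.stem: path.read_text() for path in sorted(cluster_dir.glob("*.log"))}

    def get_logs_for_worker(self, cluster_name, cluster_state, worker_name, worker_state):
        # Hypothetical: the worker's batch job id is recorded under "job_id".
        path = self._cluster_dir(cluster_name) / f"{worker_state['job_id']}.log"
        return path.read_text() if path.exists() else ""
```

A missing cluster directory yields an empty mapping rather than an error, because a completed cluster may simply never have written any logs.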

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 11 (11 by maintainers)

Top GitHub Comments

3 reactions
jcrist commented, Nov 30, 2020

After a year of letting this linger, I’m leaning towards the following:

  • Writing up a generic LogProvider class allowing for alternative implementations and customization by users as needed
  • Writing a KubernetesLogProvider class that uses the k8s logging api to pull logs from containers
  • Modifying dask-gateway to not delete stopped worker pods immediately, but after some configurable period (not sure what the default should be, perhaps only delete stopped pods on cluster deletion?)
  • Modifying dask-gateway to not delete stopped cluster pods immediately, but after some configurable period (not sure what the default should be)
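The KubernetesLogProvider and pod-retention ideas above could be sketched as follows. The sketch relies on the official kubernetes Python client, where CoreV1Api.read_namespaced_pod_log(name, namespace) returns a pod's container log as a string. The client is passed in as a constructor argument, so any object with that method works. The cluster_state and worker_state keys ("namespace", "pod_names", "pod_name") and the pods_to_delete helper are hypothetical. A grace_period of None models the "only delete stopped pods on cluster deletion" option.

```python
from typing import Dict, List, Optional, Protocol


class PodLogReader(Protocol):
    # kubernetes.client.CoreV1Api satisfies this protocol.
    def read_namespaced_pod_log(self, name: str, namespace: str, **kwargs) -> str: ...


class KubernetesLogProvider:
    """Pulls logs for stopped (but not yet deleted) pods from the k8s API."""

    def __init__(self, core_v1: PodLogReader):
        self.core_v1 = core_v1

    def get_logs_for_cluster(self, cluster_name: str, cluster_state: dict) -> Dict[str, str]:
        # Hypothetical state layout: {"namespace": ..., "pod_names": [...]}.
        namespace = cluster_state["namespace"]
        return {
            pod: self.core_v1.read_namespaced_pod_log(name=pod, namespace=namespace)
            for pod in cluster_state["pod_names"]
        }

    def get_logs_for_worker(self, cluster_name: str, cluster_state: dict,
                            worker_name: str, worker_state: dict) -> str:
        # Hypothetical: the worker's pod name is stored under "pod_name".
        return self.core_v1.read_namespaced_pod_log(
            name=worker_state["pod_name"], namespace=cluster_state["namespace"]
        )


def pods_to_delete(stopped_at: Dict[str, float], now: float,
                   grace_period: Optional[float]) -> List[str]:
    """Return the stopped pods whose retention period has expired.

    stopped_at maps pod name -> stop time (seconds). grace_period=None keeps
    stopped pods until the whole cluster is deleted.
    """
    if grace_period is None:
        return []
    return sorted(pod for pod, t in stopped_at.items() if now - t >= grace_period)
```

In a real deployment, the client might be built with kubernetes.config.load_incluster_config() followed by kubernetes.client.CoreV1Api(). Retention matters because the Kubernetes API can only return a pod's logs while the pod object still exists.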

This covers the usual debugging question of “why did my worker/cluster die?” without the additional complexity of custom logging backends.

0 reactions
droctothorpe commented, Nov 30, 2020

An in-cluster ELK stack could handle this. Dask Gateway could then query Elasticsearch and/or provide a link to the relevant query in Kibana. That adds a lot of complexity, though, so it might be best to disable it by default and make it available to users willing to incur the operational overhead.
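For the ELK option, the gateway would mostly need to build an Elasticsearch Query DSL request that filters log documents by cluster (and optionally worker). The sketch below assumes that setup. The field names are placeholders, because the real ones depend on how the log shipper (e.g. Filebeat or Fluentd) indexes pod metadata. The function name cluster_log_query is hypothetical. The resulting dict would be sent as the body of a search request against the log index (the /<index>/_search endpoint).

```python
from typing import Any, Dict, Optional


def cluster_log_query(
    cluster_name: str,
    worker_name: Optional[str] = None,
    cluster_field: str = "kubernetes.labels.cluster",  # placeholder field name
    worker_field: str = "kubernetes.pod.name",          # placeholder field name
    size: int = 1000,
) -> Dict[str, Any]:
    """Build a Query DSL body returning a cluster's (or one worker's) log lines in time order."""
    # Exact-match filters; filter context skips relevance scoring.
    filters = [{"term": {cluster_field: cluster_name}}]
    if worker_name is not None:
        filters.append({"term": {worker_field: worker_name}})
    return {
        "size": size,
        "query": {"bool": {"filter": filters}},
        "sort": [{"@timestamp": {"order": "asc"}}],
    }
```

Using a bool filter rather than a full-text match keeps the query exact and cacheable, which suits lookups by identifier.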
