Implement `resource.analyze` function and CLI command

Overview

https://frictionlessdata.slack.com/archives/C0369HZ2SLT/p1651844750785019

Is there any tooling that goes beyond describe and actually analyses the data? For example, tools that would give you distributions for number fields, most-common-word statistics for text fields, distinct counts, counts of non-blank values, the most common categories used, and many others: essentially, statistics that are currently not in the Tabular Data Package resource specification. It could also help detect more kinds of data formats. Some of this could be done with the pandas describe function https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html, but there is a lot more potential than that. Also, is it OK to add such extra statistics to resources without causing validation errors? I think such tooling would be interesting, as it could give greater insight into the data before the need to analyse it.
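To make the kinds of statistics requested above concrete, here is a minimal standard-library sketch. The helper name `profile_rows`, its `top_n` parameter, and the shape of the returned dictionary are assumptions made for illustration; none of this is frictionless-py's API or an agreed output format.

```python
from collections import Counter
from statistics import mean, median


def profile_rows(rows, top_n=3):
    """Return per-field summary statistics for a list of dict rows.

    Hypothetical helper for illustration only; not part of frictionless-py.
    """
    # Collect the values of each field into its own column.
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    report = {}
    for name, values in columns.items():
        # Treat None and empty or whitespace-only strings as blank.
        present = [v for v in values if v is not None and str(v).strip() != ""]
        summary = {
            "not_blank": len(present),
            "distinct": len(set(present)),
            "most_common": Counter(present).most_common(top_n),
        }
        # Add numeric statistics only when every non-blank value is a number.
        numbers = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        if present and len(numbers) == len(present):
            summary.update(
                min=min(numbers),
                max=max(numbers),
                mean=mean(numbers),
                median=median(numbers),
            )
        report[name] = summary
    return report


rows = [
    {"city": "London", "population": 8900000},
    {"city": "Paris", "population": 2100000},
    {"city": "London", "population": None},
    {"city": "", "population": 3600000},
]
report = profile_rows(rows)
print(report["city"])
print(report["population"])
```

The sketch keeps blank handling and numeric detection deliberately simple; a real implementation would presumably rely on the field types declared in the resource schema rather than on Python's runtime types.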

Plan

  • @shashigharti @aivuk We need to brainstorm the analytics output format and contents. We can probably use the Stats class (resource.stats) as a target (alternatively, might be a validate/report part)
  • implement resource.analyze
  • implement package.analyze (reusing the above)
  • expose in the CLI (reusing above)
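The layering in the plan (a resource-level function, a package-level function that reuses it, and a CLI that reuses both) could look roughly like the sketch below. Every name here (`analyze_resource`, `analyze_package`, `main`, the `analyze` program name) is a hypothetical stand-in written with the standard library only; the statistics shown are placeholders, since the plan itself says the output format still needs to be brainstormed.

```python
import argparse
import csv
import io
import json


def analyze_resource(text):
    """Hypothetical stand-in for resource.analyze.

    Counts rows and non-blank cells per field in CSV text.
    """
    reader = csv.DictReader(io.StringIO(text))
    names = reader.fieldnames or []
    result = {"rows": 0, "not_blank": {name: 0 for name in names}}
    for row in reader:
        result["rows"] += 1
        for name in names:
            # Short rows yield None for missing cells; count those as blank.
            if row.get(name) not in (None, ""):
                result["not_blank"][name] += 1
    return result


def analyze_package(resources):
    """Hypothetical stand-in for package.analyze.

    Reuses the resource-level function for every resource in the package.
    """
    return {name: analyze_resource(text) for name, text in resources.items()}


def main(argv=None):
    """Hypothetical CLI entry point that reuses the package-level function."""
    parser = argparse.ArgumentParser(prog="analyze")
    parser.add_argument("paths", nargs="+", help="CSV files to analyze")
    args = parser.parse_args(argv)
    resources = {}
    for path in args.paths:
        with open(path, newline="", encoding="utf-8") as file:
            resources[path] = file.read()
    print(json.dumps(analyze_package(resources), indent=2))


# In-memory demonstration of the package-level call.
demo = {"cities.csv": "city,population\nLondon,8900000\nParis,\n"}
print(json.dumps(analyze_package(demo), indent=2))
```

Calling `main(["cities.csv"])` would run the same analysis on a file on disk. Keeping the package and CLI layers as thin wrappers means any change to the statistics only has to be made in the resource-level function.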

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 16 (16 by maintainers)

Top GitHub Comments

2 reactions
fjuniorr commented, Jun 2, 2022
  • We need to brainstorm the analytics output format and contents. We can probably use the Stats class (resource.stats) as a target (alternatively, might be a validate/report part)

It would be great if, alongside the implementation in frictionless-py, at least a pattern were added to the specs so that data resources with descriptive statistics generated by other tools (such as frictionless-r) stay consistent. Maybe push this discussion https://github.com/frictionlessdata/specs/issues/364 forward?

Also, since you mentioned pandas, pandas-profiling might be worth a look (at least for inspiration).

1 reaction
roll commented, Jun 29, 2022

@shashigharti We don’t need actions for now, just resource.analyze and package.analyze (it will be too hard to merge into v5 if we work on actions now)
