Add dataset loading capabilities


From what I gather, there’s no go-to utility to load a matrix / dataset from disk without resorting to the fromJSON function of DenseMatrix and SparseMatrix. It would be great if we had a clean dataset loading interface supporting .csv files, the MatrixMarket format (very popular in R), and similar formats.

Thanks! I would be happy to discuss this further and/or lend y’all a hand

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

josdejong commented, Feb 8, 2020 (1 reaction)

Thanks for your offer 👍

Before implementing anything, I think it would be good to write out the various use cases we see. That way we can see what kind of API makes sense. What comes to mind:

  • fromCSV and toCSV functionality.
  • Is CSV format the same as MatrixMarket data type or do we need a separate fromMTX and toMTX?
  • the functions can import/export data as string and as a ReadableStream (that’s great for .
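On the CSV versus MatrixMarket question: the MatrixMarket coordinate format is not comma-separated. A file starts with a %%MatrixMarket banner line, may contain comment lines starting with %, then has a size line (rows, columns, number of stored entries), followed by one whitespace-separated "row column value" triple per entry with 1-based indices. That suggests a separate fromMTX would be needed. Below is a rough sketch of what the parsing could look like, assuming only the coordinate format with numeric values; parseMatrixMarket is a hypothetical name, not an existing function of the library, and it returns a plain 2D array rather than a matrix object.

```javascript
// Hypothetical helper (not part of the library): parse MatrixMarket
// "coordinate" text into a dense 2D array of numbers.
function parseMatrixMarket(text) {
  const lines = text.split(/\r?\n/).map((line) => line.trim())

  // Banner line, e.g. "%%MatrixMarket matrix coordinate real general"
  const header = lines[0].toLowerCase().split(/\s+/)
  if (header[0] !== '%%matrixmarket' || header[2] !== 'coordinate') {
    throw new Error('This sketch only handles the MatrixMarket coordinate format')
  }
  const symmetric = header[4] === 'symmetric'

  // Drop the banner, comment lines (starting with %) and blank lines
  const body = lines.slice(1).filter((line) => line !== '' && !line.startsWith('%'))

  // Size line: number of rows, columns and stored entries
  const [rows, cols, count] = body[0].split(/\s+/).map(Number)
  const data = Array.from({ length: rows }, () => new Array(cols).fill(0))

  for (const line of body.slice(1, 1 + count)) {
    const [i, j, value] = line.split(/\s+/).map(Number)
    data[i - 1][j - 1] = value // indices in the file are 1-based
    if (symmetric) data[j - 1][i - 1] = value // mirror the stored triangle
  }
  return data
}
```

Entries that are not listed stay zero, so a real implementation would more likely fill a SparseMatrix than a dense array.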

We should be careful not to reinvent the wheel here, and embrace APIs and libraries that are already out there. The idea for a “simple” fromCSV(src) may turn out not to be so simple: you need to build in support for nodejs and the browser, people will need to be able to send authentication headers and CORS headers, and so on.

It would be nice if we can allow people to use their rest client of choice and embrace that instead of creating a full blown rest client ourselves. So usage can look like:

// fromCSV is the proposed function discussed above; it does not exist yet

// in the browser (user can add credentials, tokens, headers, whatever):
const response1 = await fetch(url) // <-- people use their loading mechanism of choice
const dataString1 = await response1.text() // fetch resolves to a Response; text() returns a promise
const matrix1 = fromCSV(dataString1)

// in the browser, streaming for large amounts of data:
const response2 = await fetch(url) // <-- people use their loading mechanism of choice
const dataStream2 = response2.body // a ReadableStream of bytes
const matrix2 = fromCSV(dataStream2)

// in nodejs:
const fs = require('fs')
const dataStream3 = fs.createReadStream(file) // <-- people use their loading mechanism of choice
const matrix3 = fromCSV(dataStream3)
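
To make the idea above more concrete, here is a minimal sketch of how such a fromCSV could accept both a string and a stream. It is an illustration only, under several assumptions: fromCSV and parseRow are hypothetical names, the result is a plain 2D array rather than a matrix object, all cells are numeric, quoted fields are not supported, and the stream is consumed as an async iterable (true for nodejs readable streams; browser support for iterating a ReadableStream this way varies). Because of the streaming case it returns a Promise, unlike the synchronous calls shown above.

```javascript
// Hypothetical sketch, not an existing library function.
// Turns one CSV line into an array of numbers (no quoted-field support).
function parseRow(line) {
  return line.split(',').map((cell) => Number(cell.trim()))
}

// Accepts a CSV string, or an async iterable of chunks (strings or bytes).
async function fromCSV(source) {
  const chunks = typeof source === 'string' ? [source] : source
  const decoder = new TextDecoder() // decodes byte chunks as UTF-8
  const rows = []
  let pending = '' // text received so far that does not yet end in a newline

  for await (const chunk of chunks) {
    pending += typeof chunk === 'string' ? chunk : decoder.decode(chunk, { stream: true })
    const lines = pending.split(/\r?\n/)
    pending = lines.pop() // the last piece may be an incomplete line
    for (const line of lines) {
      if (line.trim() !== '') rows.push(parseRow(line))
    }
  }

  pending += decoder.decode() // flush any bytes still buffered in the decoder
  if (pending.trim() !== '') rows.push(parseRow(pending))
  return rows
}
```

With this shape, the caller keeps full control over how the data is fetched, and the parser only ever sees text or chunks, for example: const rows = await fromCSV(fs.createReadStream(file)).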

davidmrdavid commented, Sep 22, 2020 (0 reactions)

@danielruss, I actually never got around to it due to my year getting a little crazy. If you want to give it a stab, I’d say go ahead 😃
