Add converters for tabular data

@ellisonbg mentioned that it would be good to support some default tabular data formats and to convert between them.

For each of these, we should define a data type, and define converters between them. Then we should make sure they work on some test datasets.
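As a rough illustration of "define a data type, and define converters between them", here is a minimal sketch in Python using only the standard library. The type labels, the `CONVERTERS` registry, and the `converter`/`convert` helpers are all hypothetical names made up for this example, not part of jupyterlab-data-explorer, and the converter declares every column as a string rather than inferring types.

```python
import csv
import io

# Hypothetical data type identifiers (MIME-type style labels).
CSV_TYPE = "text/csv"
TABLE_SCHEMA_TYPE = "application/vnd.dataresource+json"

# Hypothetical registry: (source type, target type) -> converter function.
CONVERTERS = {}


def converter(source, target):
    """Register the decorated function as the source -> target converter."""
    def register(func):
        CONVERTERS[(source, target)] = func
        return func
    return register


@converter(CSV_TYPE, TABLE_SCHEMA_TYPE)
def csv_to_table_schema(text):
    """Parse CSV text into a {"schema": ..., "data": ...} dict.

    Every field is declared as "string"; type inference is out of scope.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = [dict(row) for row in reader]
    fields = [{"name": name, "type": "string"} for name in reader.fieldnames or []]
    return {"schema": {"fields": fields}, "data": rows}


def convert(value, source, target):
    """Look up a directly registered converter and apply it."""
    try:
        func = CONVERTERS[(source, target)]
    except KeyError:
        raise LookupError(f"no converter from {source} to {target}") from None
    return func(value)


result = convert("city,population\nOslo,700000\n", CSV_TYPE, TABLE_SCHEMA_TYPE)
```

Testing against "some test datasets" would then amount to feeding small known inputs through `convert` and checking the resulting schema and rows.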

Some pipelines that should work after this:

  1. Open CSV files with the nteract data viewer, by first converting to JSON table schema
  2. View pandas dataframe output in the datagrid, by going from JSON table schema to the datagrid model
  3. If we create a Vega Lite spec that refers to a dataset by a URL like file:///notebooks/Table.ipynb#/cells/4/outputs/0/data/application/vnd.dataresource+json, then this should use the pandas output from that cell in the notebook as an input to the Vega spec. Depends on https://github.com/jupyterlab/jupyterlab-data-explorer/issues/20
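To make the third pipeline more concrete, the sketch below shows one possible way to resolve such a URL fragment against a notebook's JSON. It is an assumption about how the lookup could work, not how the data explorer does it: `resolve_fragment` is a hypothetical helper, and the notebook content is a made-up stand-in. The one wrinkle it handles is that the MIME type key itself contains a slash, so a plain split on "/" is not enough.

```python
from urllib.parse import urlsplit


def resolve_fragment(document, fragment):
    """Walk nested lists/dicts along a slash-separated fragment path.

    Dict keys may themselves contain "/" (for example MIME types), so at each
    dict we join successive segments until a matching key is found.
    """
    segments = [part for part in fragment.split("/") if part]
    node = document
    index = 0
    while index < len(segments):
        if isinstance(node, list):
            node = node[int(segments[index])]
            index += 1
            continue
        if not isinstance(node, dict):
            raise KeyError("/".join(segments[index:]))
        for end in range(index + 1, len(segments) + 1):
            key = "/".join(segments[index:end])
            if key in node:
                node = node[key]
                index = end
                break
        else:
            raise KeyError("/".join(segments[index:]))
    return node


url = (
    "file:///notebooks/Table.ipynb"
    "#/cells/4/outputs/0/data/application/vnd.dataresource+json"
)
parts = urlsplit(url)

# Made-up notebook content: four empty cells, then one cell with table output.
table_output = {
    "schema": {"fields": [{"name": "a", "type": "integer"}]},
    "data": [{"a": 1}],
}
notebook = {
    "cells": [{"outputs": []} for _ in range(4)]
    + [{"outputs": [{"data": {"application/vnd.dataresource+json": table_output}}]}]
}

table = resolve_fragment(notebook, parts.fragment)
```

Here `parts.path` identifies the notebook file and `table` is the JSON table schema payload that a Vega Lite spec could consume as its dataset.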

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
saulshanabrook commented, Feb 2, 2020

@Nestak2 You have to set pandas.set_option('display.html.table_schema', True) so that it outputs the JSON, as in these examples.

0 reactions
westurner commented, Mar 20, 2020

FWIW,

  • tablib https://tablib.readthedocs.io/en/stable/ supports a bunch of formats: ‘cli, csv, dbf, df (DataFrame), html, jira, json, latex, ods, rst, tsv, xls, xlsx, yaml’
  • Tabulate does [HTML, LaTeX, *] tables from lists of lists, lists of dicts, etc. (without pandas) https://github.com/astanin/python-tabulate . Hoping for these to land in a release soon:
  • odo https://github.com/blaze/odo (2018) does conversion between very many formats:

    Odo migrates data using network of small data conversion functions between type pairs. That network is below: [figure: odo conversions]

    Each node is a container type (like pandas.DataFrame or sqlalchemy.Table) and each directed edge is a function that transforms or appends one container into or onto another. We annotate these functions/edges with relative costs.

    This network approach allows odo to select the shortest path between any two types (thank you networkx). For performance reasons these functions often leverage non-Pythonic systems like NumPy arrays or native CSV->SQL loading functions. Odo is not dependent on only Python iterators.

  • Ibis https://docs.ibis-project.org/backends.html
    • Impala, BigQuery, HDFS, Spark, SQLAlchemy, Pandas
  • blazingsql https://github.com/BlazingDB/blazingsql is really fast. It reads into the GPU from CSV, TSV, JSON, Apache Parquet, Apache ORC, Apache Hive, GDF (GPU Dataframe), S3, GCS, Apache HDFS: https://docs.blazingdb.com/docs

    BlazingSQL is a GPU accelerated SQL engine built on top of the RAPIDS ecosystem. RAPIDS is based on the Apache Arrow columnar memory format, and cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.

    BlazingSQL is a SQL interface for cuDF, with various features to support large scale data science workflows and enterprise datasets.

  • notes re: “A dataframe protocol for the PyData ecosystem” https://discuss.ossdata.org/t/a-dataframe-protocol-for-the-pydata-ecosystem/267/9
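The odo approach quoted above (container types as nodes, conversion functions as cost-annotated directed edges, cheapest path between two types) can be sketched with a small Dijkstra search. This is only an illustration under stated assumptions: odo itself relies on networkx, whereas this sketch uses the standard library, and the type names and costs in `EDGES` are invented for the example.

```python
import heapq

# Hypothetical conversion edges: (source type, target type) -> relative cost.
EDGES = {
    ("csv", "dataframe"): 1.0,
    ("dataframe", "json_table_schema"): 1.0,
    ("csv", "json_table_schema"): 4.0,
    ("csv", "sql_table"): 5.0,
    ("sql_table", "dataframe"): 2.0,
    ("json_table_schema", "datagrid_model"): 1.0,
}


def cheapest_path(edges, start, goal):
    """Return (total cost, list of types) for the cheapest conversion chain.

    Returns None when no chain of conversions reaches the goal.
    """
    graph = {}
    for (source, target), cost in edges.items():
        graph.setdefault(source, []).append((target, cost))

    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for target, step in graph.get(node, []):
            if target not in visited:
                heapq.heappush(queue, (cost + step, target, path + [target]))
    return None


best = cheapest_path(EDGES, "csv", "datagrid_model")
```

With these made-up costs, the direct csv to json_table_schema edge loses to the two-step route through dataframe, which is the kind of choice the cost annotations are meant to enable.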

CSVW would be ideal for tabular data (with Linked Data metadata about the dataset and each column). More about this here: “Linked Data formats, tools, challenges, opportunities; CSVW, schema.org/Dataset, schema.org/ScholarlyArticle” https://discuss.ossdata.org/t/linked-data-formats-tools-challenges-opportunities-csvw-schema-org-dataset-schema-org-scholarlyarticle/160
