Dataset extends pandas.DataFrame (or geopandas.GeoDataFrame)


Context

Right now, we have two main entities for working with data in CARTOframes: a DataFrame and a Dataset. This causes doubts and some confusion in certain cases, for example:

  • what should the DO enrich method return: a DataFrame or a Dataset?
  • the same question applies to the Dataset download and upload methods
  • some people ask: why do I need a new entity? I want to work with pandas; I don’t want to learn a new entity.

Extending DataFrame (or GeoDataFrame)

If the Dataset class extends pandas.DataFrame or, even better, geopandas.GeoDataFrame:

  • all these doubts will be resolved, and at the same time
  • the class will become simpler to understand

This is only an idea; it has not been defined at a low level, but it seems possible. In the end, the Dataset class should have:

  • all the DataFrame methods and properties
  • credentials
  • table, query, schema, dataset_info props and methods
  • download / upload
  • delete, exists and is_public
  • is_remote, is_local, compute_geom_type, get_table_column_names, get_table_names, get_query
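As a rough illustration of the idea above, here is a minimal sketch of a pandas.DataFrame subclass that carries extra attributes. It is not the CARTOframes implementation: the attribute names credentials and table_name and the sample values are assumptions made for the example. It relies on the pandas subclassing hooks _metadata and _constructor so that derived frames keep the Dataset type and its attributes.

```python
import pandas as pd


class Dataset(pd.DataFrame):
    """Hypothetical sketch: a DataFrame that also carries CARTO context."""

    # pandas propagates the attributes named here to the results of
    # most operations (filtering, copying, slicing, ...).
    _metadata = ["credentials", "table_name"]

    def __init__(self, *args, credentials=None, table_name=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.credentials = credentials
        self.table_name = table_name

    @property
    def _constructor(self):
        # Make pandas operations return a Dataset, not a plain DataFrame.
        return Dataset


# Usage: the values below are placeholders, not real credentials or tables.
ds = Dataset({"a": [1, 2, 3]}, credentials="creds", table_name="my_table")
filtered = ds[ds["a"] > 1]

print(type(filtered).__name__)  # the subclass is preserved
print(filtered.table_name)      # the metadata travels with the result
```

With this approach every DataFrame method and property is available directly on the Dataset, while the CARTO-specific methods from the list above would be added as regular methods on the subclass.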

Some changes that we will need to tackle:

  • the _df prop and the dataframe method disappear. Every internal reference to them should also be changed (a reminder: _df can hold a geoDataFrame or a DataFrame)
  • methods related to CARTO tables, queries, and so on should be renamed to make the context of their action more explicit, for example: delete to delete_table, exists to table_exists, dataset_info to table_info.

We don’t have to go crazy, but the context of the class changes a bit, so we should reflect it.
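One possible way to phase in such renames is to keep the old name as a deprecated alias for a while. This is not something proposed in the issue itself, only a sketch of a common pattern; the class below is a hypothetical stand-in and the method body is a placeholder that does not talk to CARTO.

```python
import warnings


class Dataset:
    """Hypothetical stand-in; only the renaming pattern matters here."""

    def __init__(self, table_name):
        self.table_name = table_name
        self._deleted = False

    def delete_table(self):
        # Placeholder: a real implementation would delete the CARTO table.
        self._deleted = True
        return True

    def delete(self):
        # Old name kept as a thin alias that warns and delegates.
        warnings.warn(
            "delete() is deprecated; use delete_table() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.delete_table()
```

Callers using the old name keep working but see a DeprecationWarning that points them to the more explicit one.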

About geoDataFrame

As I said before, _df can hold a geoDataFrame or a DataFrame depending on the source of the data. Of course, this is not the best scenario. Basically, we use a geoDataFrame in 2 cases:

  • when the Dataset is created from a geoJSON or a geoDataFrame
  • when we are going to render data in a map: if the data is not a geoDataFrame, we convert it.

This could be the moment to move (or not) to a geoDataFrame. We have to take some things into account:

  • all the methods it supports: http://geopandas.org/reference.html#geodataframe (for example, the plot method should be overridden)
  • we have some geometry helpers, like compute_geom_type, that may be solved in a different way in geoDataFrame
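To make the two points above concrete, here is a small sketch using plain geopandas, not CARTOframes code. It converts a DataFrame with coordinate columns into a GeoDataFrame and reads the geometry type from the built-in geom_type attribute, which may cover part of what a helper like compute_geom_type does. The column names lng and lat and the sample rows are assumptions made for the example.

```python
import geopandas as gpd
import pandas as pd

# Plain DataFrame with coordinates in two (assumed) columns.
df = pd.DataFrame(
    {"name": ["a", "b"], "lng": [-3.70, 2.17], "lat": [40.42, 41.39]}
)

# Build point geometries from the coordinate columns.
gdf = gpd.GeoDataFrame(
    df.drop(columns=["lng", "lat"]),
    geometry=gpd.points_from_xy(df["lng"], df["lat"]),
    crs="EPSG:4326",
)

# geom_type is a Series with one geometry type name per row.
geom_types = set(gdf.geom_type)
print(geom_types)
```

Since GeoDataFrame is itself a subclass of pandas.DataFrame, a Dataset built on it would still expose the pandas API.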

cc @alasarr @alrocar @andy-esch

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
alasarr commented, Aug 16, 2019

Anyway, I think it’s a big change that we need to discuss carefully. I like the debate, but please don’t change any code until we have discussed it. In the meantime, we will continue with the current implementation of Dataset.

0 reactions
amine-aboufirass commented, Aug 5, 2020

Hi. I’m not familiar with cartoframes but I’m definitely curious as to how successful you were in subclassing the GeoDataFrame class? I also need to do this for my own projects but am running into serious roadblocks with some of the class methods that generate GeoDataFrames from files or features. Please see my post on SE and my post on the geopandas issues for some more details.
