Dataset extends pandas.DataFrame (or geopandas.GeoDataFrame)
Context
Right now, we have 2 main entities to work with data in CARTOframes: a DataFrame and a Dataset. This situation causes doubts and some confusion in certain cases, for example:
- what should we return from the DO enrich method? A DataFrame? A Dataset?
- the same question applies to the Dataset download and upload methods
- some people ask: why do I need a new entity? I want to work with pandas, I don’t want to learn a new entity.
Extending DataFrame (or GeoDataFrame)
If the Dataset class extends pandas.DataFrame or, even better, geopandas.GeoDataFrame:
- all these doubts will be resolved, and at the same time
- the class will become simpler to understand
This is an idea; it is not defined at a low level yet, but it seems feasible. In the end, the Dataset class should have:
- all the DataFrame methods and properties
- credentials, table, query, schema, dataset_info props and methods
- download / upload
- delete, exists and is_public
- is_remote, is_local, compute_geom_type, get_table_column_names, get_table_names, get_query
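To make the idea more concrete, here is a minimal sketch of how a DataFrame subclass can carry extra attributes, using the subclassing hooks pandas documents (_metadata and _constructor). It is only an illustration under assumptions, not the CARTOframes implementation: the attribute names (credentials, table_name, schema) and the is_remote logic are hypothetical. Note that pandas.DataFrame already has a query method, so a query prop would collide with it; the sketch avoids that name.

```python
import pandas as pd


class Dataset(pd.DataFrame):
    """Hypothetical sketch: a DataFrame that also carries CARTO context."""

    # Class-level defaults so the attributes always exist (assumed names).
    credentials = None
    table_name = None
    schema = None

    # pandas propagates the attributes listed here to derived objects.
    _metadata = ["credentials", "table_name", "schema"]

    @property
    def _constructor(self):
        # Operations such as filtering or copy() return a Dataset again.
        return Dataset

    def is_remote(self):
        # Illustrative rule only: "remote" means a table name is attached.
        return self.table_name is not None


ds = Dataset({"value": [1, 2, 3]})
ds.table_name = "my_table"

# Regular pandas operations keep working and keep the extra context.
subset = ds[ds["value"] > 1]
print(type(subset).__name__, subset.table_name)
```

With this approach every DataFrame method and property is available on the Dataset, and the CARTO-specific state travels with the object through most pandas operations that go through __finalize__.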
Some changes that we will need to tackle:
- the _df prop and the dataframe method disappear. Also, every internal reference to them should be changed (a reminder: _df can hold a geoDataFrame or a DataFrame)
- methods related to CARTO tables/queries/stuff should be renamed to make the context of the action explicit, for example: delete to delete_table, exists to table_exists, dataset_info to table_info.
We don't have to go crazy, but the context of the class changes a bit, so we should reflect it.
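One possible way to handle the renames, sketched below, is to keep the old names for a while as thin aliases that emit a DeprecationWarning. This is an assumption on my side, not something decided in this issue; the stand-in Dataset class, the deprecated_alias helper and the in-memory set of tables are all hypothetical and only there to keep the example runnable.

```python
import warnings


def deprecated_alias(new_name):
    """Build a method that warns and then forwards to `new_name`."""

    def alias(self, *args, **kwargs):
        warnings.warn(
            "this method is deprecated, use {}() instead".format(new_name),
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(self, new_name)(*args, **kwargs)

    return alias


class Dataset:
    """Stand-in class; a real one would talk to CARTO instead of a set."""

    def __init__(self, table_name, known_tables):
        self.table_name = table_name
        self._known_tables = known_tables  # fake catalogue of remote tables

    def table_exists(self):
        return self.table_name in self._known_tables

    def delete_table(self):
        existed = self.table_exists()
        self._known_tables.discard(self.table_name)
        return existed

    # Old names kept temporarily so existing user code does not break.
    exists = deprecated_alias("table_exists")
    delete = deprecated_alias("delete_table")


ds = Dataset("my_table", {"my_table", "other_table"})
print(ds.table_exists())  # the new, explicit name
```

The explicit names also avoid clashing with generic names that a DataFrame user might expect to act on the local data rather than on the CARTO table.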
About geoDataFrame
As I said before, _df can hold a geoDataFrame or a DataFrame depending on the source of the data. Of course, this is not the best scenario. Basically, we are using a geoDataFrame in 2 cases:
- if the Dataset is created from a geoJSON or a geoDataFrame
- when we are going to render data in a map: if the data is not a geoDataFrame, we convert it.
It could be the moment to move (or not) to a geoDataFrame. We have to take into account some things like:
- all the methods supported by it: http://geopandas.org/reference.html#geodataframe (for example, the plot method should be overridden)
- we have some geometry helpers like compute_geom_type that maybe are solved in a different way in geoDataFrame
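As a rough illustration of the two points above, the sketch below converts a plain DataFrame into a GeoDataFrame and reads the geometry type from the geom_type property that geopandas exposes on its geometry column. The column names and sample coordinates are made up, and whether this fully replaces compute_geom_type is an open question, not a claim.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Plain DataFrame with hypothetical longitude / latitude columns.
df = pd.DataFrame(
    {"name": ["a", "b"], "lon": [-3.7, 2.17], "lat": [40.4, 41.38]}
)

# Build one point per row and promote the DataFrame to a GeoDataFrame.
points = [Point(lon, lat) for lon, lat in zip(df["lon"], df["lat"])]
gdf = gpd.GeoDataFrame(df, geometry=points)

# geopandas already knows the geometry type of every row.
geom_types = set(gdf.geometry.geom_type)
print(geom_types)
```

Since a GeoDataFrame is itself a pandas.DataFrame subclass, all the DataFrame methods remain available after the conversion.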
Issue Analytics
- Created 4 years ago
- Comments: 6 (5 by maintainers)
Top GitHub Comments
Anyway, I think it’s a big change that we need to talk about carefully. I like the debate, but please don’t change any code until we have discussed it carefully. In the meantime, we will continue with the current implementation of Dataset.
Hi. I’m not familiar with cartoframes, but I’m definitely curious as to how successful you were in subclassing the GeoDataFrame class? I also need to do this for my own projects but am running into serious roadblocks with some of the class methods that generate GeoDataFrames from files or features. Please see my post on SE and my post on the geopandas issues for some more details.