How to implement cartodb_id in CARTOframes
Right now, we set cartodb_id as the index of the CartoDataFrame.
There are some special use cases where this can behave strangely. In the following example, we're returning a CartoDataFrame with a different index.
As I see it, this has the following issues:
- We're modifying the index of the DataFrame. The user could have a workflow that relies on their current index, and we're changing it.
- The index could even hold valuable information, for example if they created a special index to access the data (see the example in pandas's getting started guide).
- The user will expect the index of a DataFrame to start at 0.
- It gives the impression that we're not respecting the user's data; we're being very invasive here.
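To make the concern concrete, here is a minimal sketch using plain pandas only. It does not call CARTOframes; it merely simulates an index being replaced by 1..n ids, and the column and index names are made up for illustration.

```python
import pandas as pd

# A frame whose index carries meaning: one row per day (hypothetical data).
df = pd.DataFrame(
    {"value": [10, 20, 30]},
    index=pd.date_range("2020-01-01", periods=3, freq="D", name="day"),
)

# Label-based access works because the index is meaningful.
first_day_value = df.loc[pd.Timestamp("2020-01-01"), "value"]

# Simulate the index being silently replaced by cartodb_id values (1..n).
replaced = df.copy()
replaced.index = pd.RangeIndex(start=1, stop=len(df) + 1, name="cartodb_id")

print(list(replaced.index))       # the dates are no longer in the index
print("day" in replaced.columns)  # and they were not kept as a column either
```

After the replacement, the dates exist nowhere in the frame, so any workflow that looked rows up by day stops working.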
Solution:
I know cartodb_id is often a pain, but it's something our platform needs. I'd make it explicit, as we already do in our tables: a CartoDataFrame should have a cartodb_id column. It will make our lives easier, and users will understand what is happening. If they want to remove the column, they can.
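A rough sketch of what the explicit-column approach could look like from the user's side, again in plain pandas. The helper add_cartodb_id is hypothetical and is not part of CARTOframes; it only illustrates that the id can live in a column while the index stays untouched.

```python
import pandas as pd

def add_cartodb_id(df, column="cartodb_id"):
    """Hypothetical helper: return a copy with an explicit 1..n id column.

    The user's index is left exactly as it was.
    """
    out = df.copy()
    if column not in out.columns:
        out.insert(0, column, list(range(1, len(out) + 1)))
    return out

df = pd.DataFrame({"name": ["a", "b"]}, index=["x", "y"])
with_id = add_cartodb_id(df)

# cartodb_id is an ordinary column, so dropping it is a one-liner.
without_id = with_id.drop(columns="cartodb_id")
```

Because the id is just a column, users who don't want it can remove it with the usual pandas tools, and their own index survives either way.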
Issue Analytics
- Created 4 years ago
- Comments: 12 (12 by maintainers)
Top GitHub Comments
I agree you ideally shouldn't change the index silently, as is now the case (e.g. even if you have an index with strings or datetimes, it gets replaced with the integers 1, …, n).
The most explicit option is probably to include a cartodb_id column when reading from CARTO, as mentioned above. And I suppose you could then also use it, if present, when writing to CARTO (provided the column doesn't violate constraints such as no duplicated values)?
For the actual (original) index of the DataFrame: most pandas IO methods include the index as a column by default (e.g. to_csv has an index=True/False parameter that defaults to True, and to_sql has the same). So doing that here as well would probably be most consistent. We are actually dealing with this in GeoPandas as well: https://github.com/geopandas/geopandas/pull/1059. Currently, GeoDataFrame.to_file (the function that can write to any file format supported by GDAL) does not write the index of the DataFrame, losing this information. In that PR we are adding support for writing the index, but with a slight deviation from pandas' defaults: we only write the index if it is a "non-default" index (basically if it has a name, and potentially also if it has a different dtype), to avoid writing unnecessary data to the file.
Looks good to me! +1 to @jorisvandenbossche's comment about the index.
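The index handling discussed in the comments above could be sketched roughly as follows. This is one interpretation of "non-default index" (anything other than an unnamed RangeIndex starting at 0 with step 1), not the actual GeoPandas or CARTOframes implementation, and the helper names has_default_index, prepare_for_upload and usable_cartodb_id are hypothetical.

```python
import pandas as pd

def has_default_index(df):
    """True if the index is an unnamed 0..n-1 RangeIndex (assumed definition)."""
    idx = df.index
    return (
        isinstance(idx, pd.RangeIndex)
        and idx.name is None
        and idx.start == 0
        and idx.step == 1
    )

def prepare_for_upload(df):
    """Keep a non-default index as a column; drop a default one."""
    # reset_index(drop=False) moves the index into a column named after the
    # index (or "index" if it is unnamed); drop=True simply discards it.
    return df.reset_index(drop=has_default_index(df))

def usable_cartodb_id(df, column="cartodb_id"):
    """True if an existing id column has no duplicated values."""
    return column in df.columns and bool(df[column].is_unique)

plain = pd.DataFrame({"v": [1, 2]})
named = pd.DataFrame({"v": [1, 2]}, index=pd.Index(["a", "b"], name="key"))

print(list(prepare_for_upload(plain).columns))
print(list(prepare_for_upload(named).columns))
```

Note that reset_index raises an error if the index name clashes with an existing column, so a real implementation would need to decide how to handle that case.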