How to implement cartodb_id in CARTOframes

Right now, we set cartodb_id as the index of the CartoDataFrame.

This leads to strange behavior in some special use cases. In the following example, we’re returning a CartoDataFrame with a different index.

For me it has the following issues:

We’re modifying the index of the DataFrame. The user could have a workflow based on their own index, and we’re changing it.
The index could even hold valuable information if, say, they created a special index for accessing the data. For example a data

From Pandas’s getting started guide

The user will expect the index of a DataFrame to start at 0.
It gives the impression that we’re not respecting the user’s data. We’re being very invasive here.
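
To make the concern in the list above concrete, here is a minimal pandas-only sketch. The helper `replace_index_with_cartodb_id` is hypothetical: it only imitates the behavior described in this issue (the index being swapped for 1..n integers) and is not cartoframes code.

```python
import pandas as pd

# A user DataFrame whose index carries meaning (one row per day).
df = pd.DataFrame(
    {"value": [10, 20, 30]},
    index=pd.to_datetime(["2019-11-01", "2019-11-02", "2019-11-03"]),
)
df.index.name = "day"


def replace_index_with_cartodb_id(frame):
    """Hypothetical stand-in: swap the index for a 1..n cartodb_id."""
    out = frame.copy()
    out.index = pd.RangeIndex(start=1, stop=len(out) + 1, name="cartodb_id")
    return out


round_tripped = replace_index_with_cartodb_id(df)

# The dates are no longer available as an index or as a column.
print(round_tripped.index.name)
print("day" in round_tripped.columns)
```

After the call, the `day` information exists only in the original `df`, which is the loss the points above describe.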

Solution:

I know cartodb_id is often a pain, but it’s something our platform needs. I’d make it explicit, as we already do in our tables: a CartoDataFrame should have a cartodb_id column. It will make our lives easier, and users will understand what is happening. If they want to remove the column, they can.
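
A rough sketch of what the explicit-column approach could look like, using plain pandas. The helper name `with_explicit_cartodb_id` and the way the ids are supplied are assumptions for illustration, not the cartoframes API.

```python
import pandas as pd


def with_explicit_cartodb_id(frame, ids):
    """Hypothetical helper: add cartodb_id as a regular first column."""
    out = frame.copy()
    # DataFrame.insert raises ValueError if the column already exists.
    out.insert(0, "cartodb_id", list(ids))
    return out


# The user's own index ("x", "y") is left untouched.
df = pd.DataFrame({"name": ["a", "b"]}, index=["x", "y"])
cdf = with_explicit_cartodb_id(df, ids=[1, 2])

# Users who do not want the column can drop it explicitly.
trimmed = cdf.drop(columns="cartodb_id")
```

The point of the design is that nothing happens to the index behind the user’s back: cartodb_id is visible as data and can be removed like any other column.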

Issue Analytics

State:
Created 4 years ago
Comments: 12 (12 by maintainers)

Top GitHub Comments

2 reactions

jorisvandenbossche commented, Nov 25, 2019

I agree you ideally shouldn’t change the index silently, as is now the case (e.g. even if you have an index with strings or datetimes, it gets replaced with the integers 1, …, n).

The most explicit option is probably to include a cartodb_id column when reading from CARTO, as mentioned above. And I suppose you can then also potentially use it, if present, when writing to CARTO? (Provided the column doesn’t violate constraints such as no duplicated values.)
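
One way the "use it if present" check could be sketched is shown below. The function name `has_usable_cartodb_id` is hypothetical. The uniqueness check comes from the comment above; the non-null check is an added assumption, and CARTO’s real constraints may differ.

```python
import pandas as pd


def has_usable_cartodb_id(frame):
    """Hypothetical check: is there a cartodb_id column worth reusing?"""
    if "cartodb_id" not in frame.columns:
        return False
    ids = frame["cartodb_id"]
    # No duplicated values (from the discussion); no nulls (assumption).
    return bool(ids.is_unique and ids.notna().all())


good = pd.DataFrame({"cartodb_id": [1, 2, 3], "name": ["a", "b", "c"]})
duplicated = pd.DataFrame({"cartodb_id": [1, 1, 2], "name": ["a", "b", "c"]})
missing = pd.DataFrame({"name": ["a", "b", "c"]})

print(has_usable_cartodb_id(good))
print(has_usable_cartodb_id(duplicated))
print(has_usable_cartodb_id(missing))
```

When the check fails, a writer could fall back to letting the platform generate the ids, though that fallback is also only an assumption here.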

For the actual (original) index of the DataFrame: most pandas IO methods do indeed include the index as a column by default (e.g. to_csv has an index=True/False parameter that defaults to True, and to_sql has the same). So doing that here as well would probably be most consistent. We are actually dealing with this in GeoPandas as well: https://github.com/geopandas/geopandas/pull/1059. Currently, GeoDataFrame.to_file (the function that can write to any file format supported by GDAL) does not write the index of the DataFrame, losing this information. In that PR we are adding support for writing the index, but with a slight deviation from pandas’ defaults: we only write the index if it is a “non-default” index (basically if it has a name, and potentially also if it has a different dtype), to avoid writing unnecessary data to the file.
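
The pandas default mentioned above, plus a simplified version of the "non-default index" idea, can be sketched as follows. `has_non_default_index` is a hypothetical heuristic written for this illustration; it is not the logic from the GeoPandas PR.

```python
import pandas as pd

df = pd.DataFrame({"value": [1, 2]})

# to_csv writes the index by default (index=True); with no path it
# returns the CSV text as a string.
csv_with_index = df.to_csv()
csv_without_index = df.to_csv(index=False)


def has_non_default_index(frame):
    """Hypothetical heuristic: named index, or anything but a RangeIndex."""
    return frame.index.name is not None or not isinstance(
        frame.index, pd.RangeIndex
    )


plain = df
named = df.rename_axis("row_id")
labelled = pd.DataFrame({"value": [1, 2]}, index=["x", "y"])

for frame in (plain, named, labelled):
    # Only write the index when it looks like it carries information.
    print(has_non_default_index(frame))
```

A writer following this idea would pass something like `index=has_non_default_index(frame)` to the underlying IO call, so a plain 0..n-1 index does not add an unnecessary column.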

1 reaction

alasarr commented, Nov 27, 2019

Looks good to me! +1 to @jorisvandenbossche’s comment about the index.
