How to implement cartodb_id in CARTOframes

Right now, we set cartodb_id as the index of the CartoDataFrame.

This behaves strangely in some special use cases. In the following example, we’re returning a CartoDataFrame with a different index.


For me it has the following issues:

  • We’re modifying the index of the DataFrame. The user could have a workflow based on that index, and we’re changing it.

  • The index could even hold valuable information if, for example, they created a special index to access the data. For example a data


(Image from the pandas getting started guide.)

  • The user will expect the index of a DataFrame to start at 0.

  • It gives the impression that we’re not respecting the user’s data. We’re being very invasive here.
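
The concerns above can be illustrated with plain pandas. This is a minimal sketch, not CARTOframes code: `overwrite_index_with_cartodb_id` is a hypothetical stand-in for the behavior described in this issue, and the sample data is made up.

```python
import pandas as pd

# A DataFrame whose index carries information: one row per day.
df = pd.DataFrame(
    {"value": [10, 20, 30]},
    index=pd.date_range("2019-11-01", periods=3, freq="D", name="day"),
)


def overwrite_index_with_cartodb_id(frame):
    """Hypothetical stand-in: replace the index with cartodb_id values 1..n."""
    result = frame.copy()
    result.index = pd.RangeIndex(start=1, stop=len(frame) + 1, name="cartodb_id")
    return result


replaced = overwrite_index_with_cartodb_id(df)

# The dates are gone: they were only stored in the index, not in a column.
print(replaced)
```

After the call, the `day` values appear neither in the index nor in any column of `replaced`, which is the silent loss the list above describes.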


I know cartodb_id is often a pain, but it’s something our platform needs. I would make it explicit, as we already do in our tables: a CartoDataFrame should have a cartodb_id column. It will make our lives easier, and users will understand what is happening. If they want to remove the column, they can.
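
A rough sketch of the explicit-column idea, again with plain pandas. The helper name `with_cartodb_id` and the choice of 1..n values are assumptions for illustration, not the CARTOframes API.

```python
import pandas as pd


def with_cartodb_id(frame, column="cartodb_id"):
    """Hypothetical helper: add an explicit 1..n id column, leaving the index alone."""
    result = frame.copy()
    if column not in result.columns:
        result.insert(0, column, list(range(1, len(result) + 1)))
    return result


df = pd.DataFrame({"name": ["a", "b", "c"]}, index=["x", "y", "z"])

explicit = with_cartodb_id(df)

# The user's own index survives, and the id column can be dropped at will.
without_id = explicit.drop(columns="cartodb_id")
```

Because the id lives in an ordinary column, nothing about the user's index changes, and removing it is a single `drop` call.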

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 12 (12 by maintainers)

Top GitHub Comments

jorisvandenbossche commented, Nov 25, 2019

I agree that ideally you shouldn’t change the index silently, as is now the case (e.g. even if you have an index with strings or datetimes, it gets replaced with the integers 1, …, n).

The most explicit option is probably to include a cartodb_id column when reading from CARTO, as mentioned above. And I suppose you could then also use it, if present, when writing to CARTO (provided the column doesn’t violate constraints such as no duplicated values)?
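
One way to picture the "use it if present" check before writing is sketched below. The function name `can_reuse_cartodb_id` is hypothetical, and only the no-duplicates rule comes from the comment above; the no-missing-values rule is an added assumption.

```python
import pandas as pd


def can_reuse_cartodb_id(frame, column="cartodb_id"):
    """Hypothetical check: is there an id column that could be written as-is?"""
    if column not in frame.columns:
        return False
    ids = frame[column]
    # Assumption: ids must be present for every row; uniqueness is from the thread.
    return bool(ids.notna().all() and ids.is_unique)


good = pd.DataFrame({"cartodb_id": [1, 2, 3], "value": [10, 20, 30]})
duplicated = pd.DataFrame({"cartodb_id": [1, 1, 2], "value": [10, 20, 30]})
missing = pd.DataFrame({"value": [10, 20, 30]})

for name, frame in [("good", good), ("duplicated", duplicated), ("missing", missing)]:
    print(name, can_reuse_cartodb_id(frame))
```

When the check fails, a writer could fall back to generating fresh ids rather than rejecting the upload, but that policy is a design choice the thread does not settle.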

For the actual (original) index of the DataFrame: most pandas IO methods do include the index as a column by default (e.g. to_csv has an index=True/False parameter that defaults to True, and to_sql has the same). So doing that here as well would probably be most consistent. We are actually dealing with this in GeoPandas as well: currently, GeoDataFrame.to_file (the function that can write to any file format supported by GDAL) does not write the index of the DataFrame, losing this information. In that PR we are adding support for writing the index, but with a slight deviation from pandas’ defaults: we only write the index if it is a “non-default” index (basically if it has a name, and potentially also if it has a different dtype), to avoid writing unnecessary data to the file.
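
The two behaviors in this comment can be compared in a small pandas sketch. The `to_csv` default is real pandas behavior; `has_non_default_index` is a hypothetical approximation of the "non-default index" rule (a name, or anything other than a plain RangeIndex) and is not the GeoPandas implementation.

```python
import io

import pandas as pd


def has_non_default_index(frame):
    """Hypothetical heuristic: a named index, or anything but a RangeIndex."""
    index = frame.index
    if index.name is not None:
        return True
    return not isinstance(index, pd.RangeIndex)


def to_csv_text(frame):
    """Write the index only when it looks like it carries information."""
    buffer = io.StringIO()
    frame.to_csv(buffer, index=has_non_default_index(frame))
    return buffer.getvalue()


plain = pd.DataFrame({"value": [1, 2]})
named = pd.DataFrame({"value": [1, 2]}, index=pd.Index(["a", "b"], name="key"))

# pandas default: the index is written even when it is just 0..n-1.
default_text = plain.to_csv()

# Heuristic: skip the default index, keep the named one.
plain_text = to_csv_text(plain)
named_text = to_csv_text(named)
```

With the heuristic, `plain` is written with only its `value` column, while `named` keeps `key` as a first column.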

alasarr commented, Nov 27, 2019

Looks good to me! +1 to @jorisvandenbossche’s comment about the index.
