Filling country colours by value of column in other DataFrame

Hi,

I’ve just started experimenting with Altair, and am loving it so far. I am, however, struggling somewhat with generating map plots.

Problem

I’m trying to plot a world map of confirmed Covid-19 cases, using data provided by Johns Hopkins. I want to colour countries by the number of confirmed cases, but have been unsuccessful.

Data

The code to retrieve and clean the data is:

import janitor  # pip install pyjanitor; registers clean_names(), rename_column(), etc. on DataFrames
import pandas as pd

CASES_WORLDWIDE = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_country.csv"

def get_worldwide_cases(url: str = CASES_WORLDWIDE):
    cases = pd.read_csv(url)
    cleaned = (
        cases.clean_names()
        .rename_column("long_", "lon")
        .transform_column("last_update", lambda x: pd.to_datetime(x).normalize())
    )
    return cleaned

world_source = get_worldwide_cases()

The head of the data will look like this:

  country_region  last_update          lat       lon       confirmed  deaths  recovered  active
0 Australia       2020-03-27 00:00:00  -25       133       3143       13      194        2936
1 Austria         2020-03-27 00:00:00  47.5162   14.5501   7317       58      225        7034
2 Canada          2020-03-27 00:00:00  60.001    -95.001   4046       40      184        0
3 China           2020-03-27 00:00:00  30.5928   114.305   81897      3296    74720      3881
4 Denmark         2020-03-27 00:00:00  56        10        2163       52      57         2054
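If pyjanitor is not available, the cleaning step can be approximated with plain pandas. The sketch below is an assumption-laden stand-in, not pyjanitor's implementation: it only lowercases and strips the headers (enough for simple names such as `Country_Region` and `Long_`, but far less than `clean_names()` does), and it runs on a hypothetical one-row sample built from the Australia row above instead of downloading the CSV. The time of day in the sample is made up.

```python
import pandas as pd


def clean_cases(cases: pd.DataFrame) -> pd.DataFrame:
    """Approximate the pyjanitor cleaning chain with plain pandas."""
    cleaned = cases.copy()
    # Lowercase the headers; sufficient for simple names like "Long_",
    # but not a full replacement for janitor's clean_names().
    cleaned.columns = [str(col).strip().lower() for col in cleaned.columns]
    cleaned = cleaned.rename(columns={"long_": "lon"})
    # Drop the time-of-day component, keeping only the date.
    cleaned["last_update"] = pd.to_datetime(cleaned["last_update"]).dt.normalize()
    return cleaned


# Hypothetical one-row sample in the raw CSV's column layout
# (the time of day is invented for illustration).
raw = pd.DataFrame(
    {
        "Country_Region": ["Australia"],
        "Last_Update": ["2020-03-27 22:14:55"],
        "Lat": [-25.0],
        "Long_": [133.0],
        "Confirmed": [3143],
    }
)

world_sample = clean_cases(raw)
print(world_sample.columns.tolist())
print(world_sample.loc[0, "last_update"])
```

Against the real CSV you would call `clean_cases(pd.read_csv(CASES_WORLDWIDE))`; any header with spaces or punctuation would need extra handling that `clean_names()` already provides.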

Desired output

I would like to produce a plot that looks something like this:

[Image: covid19]

Current output

Since I’ve been unable to reproduce the above plot, I hacked together a workaround that displays the information in a different way. My current output looks like this:

[Image: current]

Current solution

The code that produces my plot is:

# Imports (world_source is the DataFrame built by get_worldwide_cases() above)
import altair as alt
from vega_datasets import data

# Map plot
source = alt.topo_feature(data.world_110m.url, "countries")
base_map = (
    alt.Chart(source)
    .mark_geoshape(fill="white", stroke="gray")
    .properties(width=600, height=300)
    .project("naturalEarth1")
)

points = (
    alt.Chart(world_source)
    .mark_point()
    .encode(
        latitude="lat",
        longitude="lon",
        fill=alt.value("red"),
        size="confirmed:Q",
        stroke=alt.value(None),
     )
)

final_map = (
    (base_map + points)
    .configure_view(strokeWidth=0)
    .configure_mark(opacity=0.5)
)

I understand that .transform_lookup() is one possible way to solve this, but so far all my attempts to look up the confirmed column in world_source have failed.

I would greatly appreciate any input!
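Conceptually, a lookup transform behaves like a left join: every map feature keeps its row, and the requested fields are copied from the secondary row whose key equals the feature's `lookup` field. By default Vega-Lite fills the fields with null when nothing matches. The stdlib-only sketch below mimics that matching rule on plain dictionaries to make it concrete; it is an illustration, not Altair's or Vega's code, and the feature ids and rows are hypothetical (case counts copied from the head of the data).

```python
from typing import Any, Dict, List, Optional


def lookup(
    primary: List[Dict[str, Any]],
    on: str,
    secondary: List[Dict[str, Any]],
    key: str,
    fields: List[str],
) -> List[Dict[str, Any]]:
    """Copy `fields` from `secondary` onto each `primary` row (a left join).

    Keys in `secondary` are assumed to be unique.
    """
    # Index the secondary table by its key column for constant-time matching.
    index = {row[key]: row for row in secondary}
    joined = []
    for row in primary:
        match: Optional[Dict[str, Any]] = index.get(row[on])
        out = dict(row)  # every primary row is kept, matched or not
        for field in fields:
            # Unmatched rows get None, mirroring the lookup transform's null default.
            out[field] = match[field] if match is not None else None
        joined.append(out)
    return joined


# Hypothetical map features and case rows, keyed by ISO 3166-1 numeric code.
features = [{"id": 36}, {"id": 40}, {"id": 10}]
cases = [
    {"id": 36, "country_region": "Australia", "confirmed": 3143},
    {"id": 40, "country_region": "Austria", "confirmed": 7317},
]

result = lookup(features, "id", cases, "id", ["country_region", "confirmed"])
for row in result:
    print(row)
```

The key point for the map is that the join only works if both sides share the same key values, which is why the thread below ends up adding ISO codes to the case data.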

Top GitHub Comments

smu095 commented, Mar 27, 2020

Hi @jakevdp, thanks for the reply and useful tips.

You were right: it was just a matter of finding the appropriate ISO 3166-1 codes and performing a lookup on them. This is the (preliminary) result:

[Image: Screenshot 2020-03-27 at 20 47 03]

In case anyone is interested, this is the code to generate the plot above. It assumes that we have defined a lookup dataset (in my case world_source) with the required ISO 3166-1 numeric codes in an id column and the rate to plot in a sick_per_100k column.

source = alt.topo_feature(data.world_110m.url, "countries")

background = alt.Chart(source).mark_geoshape(fill="white")

foreground = (
    alt.Chart(source)
    .mark_geoshape(stroke="black", strokeWidth=0.15)
    .encode(
        color=alt.Color(
            "sick_per_100k:N", scale=alt.Scale(scheme="lightgreyred"), legend=None,
        ),
        tooltip=[
            alt.Tooltip("country_region:N", title="Country"),
            alt.Tooltip("sick_per_100k:Q", title="Cases per 100k"),
        ],
    )
    .transform_lookup(
        lookup="id",
        from_=alt.LookupData(world_source, "id", ["sick_per_100k", "country_region"]),
    )
)

final_map = (
    (background + foreground)
    .configure_view(strokeWidth=0)
    .properties(width=700, height=400)
    .project("naturalEarth1")
)

Thanks again @jakevdp, stay safe.
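The comment above does not show how world_source gained its id and sick_per_100k columns. One possible way is sketched below: map each country name to its ISO 3166-1 numeric code (the kind of id the world_110m features carry) and divide confirmed cases by population, times 100,000. The name-to-code dictionary is a hypothetical subset covering only the five countries from the original head of the data, and the populations are placeholder round numbers chosen to keep the arithmetic checkable, so the resulting rates are not real statistics.

```python
import pandas as pd

# Hypothetical, hand-written subset of ISO 3166-1 numeric codes
# (only the five countries from the head of the data); a real
# dataset needs an entry for every country name in the JHU table.
ISO_NUMERIC = {"Australia": 36, "Austria": 40, "Canada": 124, "China": 156, "Denmark": 208}


def add_map_columns(cases: pd.DataFrame, population: pd.Series) -> pd.DataFrame:
    """Add an `id` key for transform_lookup and a cases-per-100k column."""
    out = cases.copy()
    # Names missing from ISO_NUMERIC become NaN and will not match any feature.
    out["id"] = out["country_region"].map(ISO_NUMERIC)
    # population is indexed by country name; rate = cases / population * 100,000
    out["sick_per_100k"] = out["confirmed"] / out["country_region"].map(population) * 100_000
    return out


cases = pd.DataFrame({"country_region": ["Austria", "Denmark"], "confirmed": [7317, 2163]})
# Placeholder populations, NOT real census figures.
population = pd.Series({"Austria": 10_000_000, "Denmark": 5_000_000})

world_source = add_map_columns(cases, population)
print(world_source[["country_region", "id", "sick_per_100k"]])
```

A hand-maintained dictionary is only practical for a handful of countries; for the full table, a complete name-to-code source (or a dataset that already ships the codes, as in the next comment) is the safer choice.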

reidjohnson commented, Dec 18, 2021

@ThelmaSilva et al., here’s a full working example with the current data:

  country_region  last_update  lat         lon        confirmed  deaths  incident_rate  uid
0 Afghanistan     2021-12-17   33.939110   67.709953  157734     7332    405.190655     4
1 Albania         2021-12-17   41.153300   20.168300  205224     3158    7131.28084     8
2 Algeria         2021-12-17   28.033900   1.659600   213745     6171    487.434244     12
3 Andorra         2021-12-17   42.506300   1.521800   20549      134     26595.4831     20
4 Angola          2021-12-17   -11.202700  17.873900  65648      1737    199.742788     24

import altair as alt
import janitor  # pip install pyjanitor; registers the DataFrame methods used below
import pandas as pd
from vega_datasets import data

CASES_WORLDWIDE = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_country.csv"


def get_worldwide_cases(url: str = CASES_WORLDWIDE):
    cases = pd.read_csv(url)
    cleaned = (
        cases.clean_names()
        .rename_column("long_", "lon")
        .transform_column("last_update", lambda x: pd.to_datetime(x).normalize())
    )
    return cleaned


world_source = get_worldwide_cases()

source = alt.topo_feature(data.world_110m.url, "countries")

background = alt.Chart(source).mark_geoshape(fill="white")

foreground = (
    alt.Chart(source)
    .mark_geoshape(stroke="black", strokeWidth=0.15)
    .encode(
        color=alt.Color(
            "incident_rate:N", scale=alt.Scale(scheme="lightgreyred"), legend=None,
        ),
        tooltip=[
            alt.Tooltip("country_region:N", title="Country"),
            alt.Tooltip("incident_rate:Q", title="Cases per 100k"),
        ],
    )
    .transform_lookup(
        lookup="id",
        # For country rows, uid equals the ISO 3166-1 numeric code held in each feature's id
        from_=alt.LookupData(world_source, "uid", ["incident_rate", "country_region"]),
    )
)

chart = (
    (background + foreground)
    .configure_view(strokeWidth=0)
    .properties(width=700, height=400)
    .project("naturalEarth1")
)

chart

[Image: visualization]
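If some countries show no data, a likely cause is that their feature id has no matching uid in the case table, so the lookup leaves their fields null. One way to list such keys is an outer pandas merge with `indicator=True`, sketched below on small hypothetical key tables. With the real data, the left side would hold the feature ids from the TopoJSON file and the right side `world_source[["uid", "country_region"]]`; the ids here are illustrative only.

```python
import pandas as pd

# Hypothetical keys: feature ids in the map vs. uid values in the case table.
map_ids = pd.DataFrame({"id": [4, 8, 10, 12]})
case_keys = pd.DataFrame(
    {"uid": [4, 8, 12, 20], "country_region": ["Afghanistan", "Albania", "Algeria", "Andorra"]}
)

# An outer merge keeps unmatched rows from both sides; indicator=True labels each row
# "both", "left_only" (map feature without data) or "right_only" (data without a feature).
check = map_ids.merge(case_keys, left_on="id", right_on="uid", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched)
```

"left_only" rows are countries drawn only by the white background layer; "right_only" rows are entries in the case data that have no shape on this map.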