
MaxRowsError for pandas.df with > 5000 rows


Hey,

Thanks for the package; I'm very keen to try it out on my own data. When I create a simple histogram, though, Vega-Lite fails on DataFrames with more than 5,000 rows. Here's a minimal reproducible example:

import altair as alt
import numpy as np
import pandas as pd

# 6,000 rows, comfortably over the default 5,000-row limit
lengths = np.random.randint(0, 2000, 6000)
lengths_list = lengths.tolist()
labels = [str(i) for i in lengths_list]
peak_lengths = pd.DataFrame.from_dict({'coords': labels, 'length': lengths_list},
                                      orient='columns')
alt.Chart(peak_lengths).mark_bar().encode(alt.X('length:Q', bin=True), y='count(*):Q')

Here’s the error:

---------------------------------------------------------------------------
MaxRowsError                              Traceback (most recent call last)
~/anaconda/envs/py3/lib/python3.5/site-packages/altair/vegalite/v2/api.py in to_dict(self, *args, **kwargs)
    259         copy = self.copy()
    260         original_data = getattr(copy, 'data', Undefined)
--> 261         copy._prepare_data()
    262 
    263         # We make use of two context markers:

~/anaconda/envs/py3/lib/python3.5/site-packages/altair/vegalite/v2/api.py in _prepare_data(self)
    251             pass
    252         elif isinstance(self.data, pd.DataFrame):
--> 253             self.data = pipe(self.data, data_transformers.get())
    254         elif isinstance(self.data, six.string_types):
    255             self.data = core.UrlData(self.data)

~/anaconda/envs/py3/lib/python3.5/site-packages/toolz/functoolz.py in pipe(data, *funcs)
    550     """
    551     for func in funcs:
--> 552         data = func(data)
    553     return data
    554 

~/anaconda/envs/py3/lib/python3.5/site-packages/toolz/functoolz.py in __call__(self, *args, **kwargs)
    281     def __call__(self, *args, **kwargs):
    282         try:
--> 283             return self._partial(*args, **kwargs)
    284         except TypeError as exc:
    285             if self._should_curry(args, kwargs, exc):

~/anaconda/envs/py3/lib/python3.5/site-packages/altair/vegalite/data.py in default_data_transformer(data)
    122 @curry
    123 def default_data_transformer(data):
--> 124     return pipe(data, limit_rows, to_values)
    125 
    126 

~/anaconda/envs/py3/lib/python3.5/site-packages/toolz/functoolz.py in pipe(data, *funcs)
    550     """
    551     for func in funcs:
--> 552         data = func(data)
    553     return data
    554 

~/anaconda/envs/py3/lib/python3.5/site-packages/toolz/functoolz.py in __call__(self, *args, **kwargs)
    281     def __call__(self, *args, **kwargs):
    282         try:
--> 283             return self._partial(*args, **kwargs)
    284         except TypeError as exc:
    285             if self._should_curry(args, kwargs, exc):

~/anaconda/envs/py3/lib/python3.5/site-packages/altair/vegalite/data.py in limit_rows(data, max_rows)
     47             return data
     48     if len(values) > max_rows:
---> 49         raise MaxRowsError('The number of rows in your dataset is greater than the max of {}'.format(max_rows))
     50     return data
     51 

MaxRowsError: The number of rows in your dataset is greater than the max of 5000

A quick issues search didn't turn up any hits for MaxRowsError. There is a related issue (#287), but that one involved a dataset with more than 300k rows, and mine is a measly 35k. Also, the FAQ link referenced in that issue now returns a 404. In the meantime, does the advice in #249 still apply?
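If it does, I'd guess a sketch like the following would work: write the frame to disk and hand Altair the path, so the data takes the core.UrlData branch shown in the traceback instead of going through limit_rows. This is untested on my setup, and the filename is just an example:

import altair as alt
import numpy as np
import pandas as pd

# Untested workaround sketch: serialize the DataFrame and pass a path/URL,
# which Altair wraps in core.UrlData instead of running through limit_rows.
df = pd.DataFrame({'length': np.random.randint(0, 2000, 6000)})
df.to_json('peak_lengths.json', orient='records')

alt.Chart('peak_lengths.json').mark_bar().encode(
    alt.X('length:Q', bin=True),
    y='count(*):Q',
)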

Package info: Running on Altair 2.0.0rc1, JupyterLab 0.31.12-py35_1 conda-forge

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 39 (21 by maintainers)

Top GitHub Comments

3 reactions
palewire commented, Mar 23, 2018

+1 to documenting how to work around these limits and to including a link to those docs in the error message. I think that could go a long way toward easing newbie frustration.

2 reactions
jakevdp commented, Mar 28, 2018

I just opened #672, which would let users run alt.data_transformers.enable('no_max_rows') and embed arbitrarily large datasets in the notebook, if that is what they wish to do.

@ellisonbg, do you think we should offer that option?
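For concreteness, usage under that proposal would presumably look like the sketch below. The registration name 'no_max_rows' is taken from the comment above and could change before release; the DataFrame here is stand-in data:

import altair as alt
import numpy as np
import pandas as pd

# Proposed in #672: turn off the row-count safeguard so the full dataset is
# embedded in the notebook output. Name taken from the comment above; the
# final API may differ.
alt.data_transformers.enable('no_max_rows')

df = pd.DataFrame({'length': np.random.randint(0, 2000, 6000)})
alt.Chart(df).mark_bar().encode(alt.X('length:Q', bin=True), y='count(*):Q')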


Top Results From Across the Web

  • Frequently Asked Questions — Altair 4.2.0 documentation: "MaxRowsError: The number of rows in your dataset is greater than the maximum allowed (5000). For information on how to plot larger..." (see the sketch after this list)
  • Using Altair on data aggregated from large datasets: "I'm learning that vegalite stores the entire underlying data and applies the transformation at run time, but it seems like altair could (and ..."
  • How to get first N rows of Pandas DataFrame?: "To get the first N rows of a DataFrame in Pandas, use the function DataFrame.head(). You can pass an optional integer that represents..."
  • How to get the first N rows in Pandas DataFrame - Data to Fish: "You can use df.head() to get the first N rows in Pandas DataFrame. For example, if you need the first 4 rows, then..."
  • Visualizing real estate prices with Altair | Haowen's AI blog: "MaxRowsError is the first trouble I got! It turns out that by default Altair only allows you to plot a dataset with a..."
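As a companion to the FAQ entry above, here is a minimal sketch using Altair's 'json' data transformer, which writes the data to a file and references it by URL so the embedding limit no longer applies. This assumes the transformer names registered in later Altair releases:

import altair as alt
import numpy as np
import pandas as pd

# The 'json' data transformer serializes the DataFrame to a .json file next
# to the notebook; the chart spec then stores only a URL, so the 5,000-row
# embedding limit does not apply.
alt.data_transformers.enable('json')

df = pd.DataFrame({'length': np.random.randint(0, 2000, 6000)})
alt.Chart(df).mark_bar().encode(alt.X('length:Q', bin=True), y='count(*):Q')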
