MaxRowsError for pandas.df with > 5000 rows
Hey,
Thanks for the package, I’m very keen to try it out on my own data. When trying to create a simple histogram with my own data, VegaLite fails on dataframes with more than 5000 rows. Here’s a minimal reproducible example:
import altair as alt
import os
import pandas as pd
import numpy as np
lengths = np.random.randint(0,2000,6000)
lengths_list = lengths.tolist()
labels = [str(i) for i in lengths_list]
peak_lengths = pd.DataFrame.from_dict({'coords': labels, 'length': lengths_list},orient='columns')
alt.Chart(peak_lengths).mark_bar().encode(alt.X('length:Q', bin=True),y='count(*):Q')
Here’s the error:
---------------------------------------------------------------------------
MaxRowsError Traceback (most recent call last)
~/anaconda/envs/py3/lib/python3.5/site-packages/altair/vegalite/v2/api.py in to_dict(self, *args, **kwargs)
259 copy = self.copy()
260 original_data = getattr(copy, 'data', Undefined)
--> 261 copy._prepare_data()
262
263 # We make use of two context markers:
~/anaconda/envs/py3/lib/python3.5/site-packages/altair/vegalite/v2/api.py in _prepare_data(self)
251 pass
252 elif isinstance(self.data, pd.DataFrame):
--> 253 self.data = pipe(self.data, data_transformers.get())
254 elif isinstance(self.data, six.string_types):
255 self.data = core.UrlData(self.data)
~/anaconda/envs/py3/lib/python3.5/site-packages/toolz/functoolz.py in pipe(data, *funcs)
550 """
551 for func in funcs:
--> 552 data = func(data)
553 return data
554
~/anaconda/envs/py3/lib/python3.5/site-packages/toolz/functoolz.py in __call__(self, *args, **kwargs)
281 def __call__(self, *args, **kwargs):
282 try:
--> 283 return self._partial(*args, **kwargs)
284 except TypeError as exc:
285 if self._should_curry(args, kwargs, exc):
~/anaconda/envs/py3/lib/python3.5/site-packages/altair/vegalite/data.py in default_data_transformer(data)
122 @curry
123 def default_data_transformer(data):
--> 124 return pipe(data, limit_rows, to_values)
125
126
~/anaconda/envs/py3/lib/python3.5/site-packages/toolz/functoolz.py in pipe(data, *funcs)
550 """
551 for func in funcs:
--> 552 data = func(data)
553 return data
554
~/anaconda/envs/py3/lib/python3.5/site-packages/toolz/functoolz.py in __call__(self, *args, **kwargs)
281 def __call__(self, *args, **kwargs):
282 try:
--> 283 return self._partial(*args, **kwargs)
284 except TypeError as exc:
285 if self._should_curry(args, kwargs, exc):
~/anaconda/envs/py3/lib/python3.5/site-packages/altair/vegalite/data.py in limit_rows(data, max_rows)
47 return data
48 if len(values) > max_rows:
---> 49 raise MaxRowsError('The number of rows in your dataset is greater than the max of {}'.format(max_rows))
50 return data
51
MaxRowsError: The number of rows in your dataset is greater than the max of 5000
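The last frames of the traceback show where the error originates: limit_rows compares the number of rows against max_rows and raises once the limit is exceeded. A simplified, stand-alone re-creation of that check might look like the sketch below. It is based only on the lines visible in the traceback; the real function in altair/vegalite/data.py handles more input types, and the default of 5000 is taken from the error message rather than from the source.

```python
class MaxRowsError(Exception):
    """Stand-in for Altair's exception of the same name."""


def limit_rows(data, max_rows=5000):
    """Return data unchanged, or raise if it has more than max_rows rows.

    Simplified sketch: `data` is any sized collection of rows.
    """
    if len(data) > max_rows:
        raise MaxRowsError(
            'The number of rows in your dataset is greater than '
            'the max of {}'.format(max_rows)
        )
    return data


# 6000 rows, like the DataFrame in the example above.
rows = [{'length': i % 2000} for i in range(6000)]

try:
    limit_rows(rows)
except MaxRowsError as exc:
    message = str(exc)
```

The check runs before any chart is rendered, so it depends only on the row count and not on the encoding.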
A quick issues search didn’t turn up any hits for MaxRowsError. There is a related issue (#287), but this was a data set with > 300k rows, and I have a measly 35k. Also, the FAQ link referenced in that issue now turns up a 404. For the meantime, does the advice in #249 still apply?
Package info: Running on Altair 2.0.0rc1, JupyterLab 0.31.12-py35_1 conda-forge
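Whatever #249 recommends, one general way to stay under the row limit for a histogram is to bin the data before it reaches the chart, so that only one row per bin is embedded. The sketch below uses plain Python with a hypothetical helper named bin_counts and an arbitrary bin width of 200. It is not taken from this thread or from the Altair docs.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the illustration is reproducible
# Same shape as the example data: 6000 integers in [0, 2000).
lengths = [random.randrange(0, 2000) for _ in range(6000)]


def bin_counts(values, bin_width):
    """Count values per fixed-width bin; return one row per non-empty bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return [
        {'bin_start': start, 'bin_end': start + bin_width, 'count': counts[start]}
        for start in sorted(counts)
    ]


binned = bin_counts(lengths, bin_width=200)
```

Here binned holds at most 10 rows, so a DataFrame built from it (for example with pd.DataFrame(binned)) is far below 5000 rows. The chart would then encode the precomputed bin_start, bin_end and count columns instead of relying on bin=True and count(*).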
Issue Analytics
- State:
- Created 6 years ago
- Comments:39 (21 by maintainers)
Top GitHub Comments
+1 to documenting how to work around these limits and including a link to the docs in the error message. I think that could go a long way to addressing newbie frustrations.
I just opened #672, which would allow users to run
alt.data_transformers.enable('no_max_rows')
and then be able to embed arbitrarily large datasets in the notebook, if that is what they wish to do.
@ellisonbg, do you think we should offer that option?
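The 'no_max_rows' name above is the proposal in #672, and this thread does not confirm that it shipped under that name. In released Altair versions, the documented way to get the same effect is to pass max_rows=None to the default transformer. A minimal sketch, wrapped in a hypothetical helper (disable_row_limit) so that it also runs where Altair is not installed:

```python
def disable_row_limit():
    """Turn off Altair's row-limit check for the current session.

    Returns True if Altair was importable and the option was set,
    False otherwise. The helper name is illustrative only.
    """
    try:
        import altair as alt
    except ImportError:
        return False
    # Keep the default transformer, but drop its row limit.
    alt.data_transformers.enable('default', max_rows=None)
    return True


limit_disabled = disable_row_limit()
```

With the limit disabled, every row of the DataFrame is embedded in the notebook, so notebook size grows with the data.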