KeyError when using Redshift
I'm trying to create line charts and time series charts with aggregates. In this case, I am summing the column itemsales over a date column, day. However, I keep getting this error:
- [ x] I have checked the superset logs for python stacktraces and included it here as text if any
2018-06-28 12:07:29,604:INFO:root:Database.get_sqla_engine(). Masked URL: redshift+psycopg2://user:XXXXXXXXXX@my-redshift.host.at.amazonaws.com:5439/testdb
2018-06-28 12:07:30,077:DEBUG:root:[stats_logger] (incr) loaded_from_source
2018-06-28 12:07:30,077:ERROR:root:u'SUM(itemsales)'
Traceback (most recent call last):
  File "/Users/minhmai/envs/py2/lib/python2.7/site-packages/superset/views/core.py", line 1107, in generate_json
    payload = viz_obj.get_payload()
  File "/Users/minhmai/envs/py2/lib/python2.7/site-packages/superset/viz.py", line 329, in get_payload
    payload['data'] = self.get_data(df)
  File "/Users/minhmai/envs/py2/lib/python2.7/site-packages/superset/viz.py", line 580, in get_data
    values=values)
  File "/Users/minhmai/envs/py2/lib/python2.7/site-packages/pandas/core/frame.py", line 4468, in pivot_table
    margins_name=margins_name)
  File "/Users/minhmai/envs/py2/lib/python2.7/site-packages/pandas/core/reshape/pivot.py", line 58, in pivot_table
    raise KeyError(i)
KeyError: u'SUM(itemsales)'
A bit of digging showed that the column names become lowercase when the result is turned into a pandas DataFrame, but the metric name keeps its capitalization, as shown by my logs above. I set a trace and it's exactly what I expected:
(Pdb) l
585 records=pt.to_dict(orient='index'),
586 columns=list(pt.columns),
587 is_group_by=len(fd.get('groupby')) > 0,
588 )
589 except:
590 -> import pdb; pdb.post_mortem()
591
592
593 class PivotTableViz(BaseViz):
594
595 """A pivot table view, define your rows, columns and metrics"""
(Pdb) values
[u'SUM(itemsales)']
(Pdb) df.head()
                 __timestamp  sum(itemsales)
0  2018-06-15 00:00:00+00:00             0.0
1  2018-06-11 00:00:00+00:00             0.0
2  2018-06-13 00:00:00+00:00             0.0
3  2018-06-09 00:00:00+00:00             0.0
4  2018-06-07 00:00:00+00:00             0.0
(Pdb) self.metrics
[u'SUM(itemsales)']
(Pdb) df.columns
Index([u'__timestamp', u'sum(itemsales)'], dtype='object')
The error occurred at line 578:
pt = df.pivot_table(
    index=DTTM_ALIAS,
    columns=columns,
    values=values)
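The mismatch can be reproduced outside Superset. The following is a minimal sketch, assuming only pandas and a hand-built DataFrame that mimics the lowercased columns shown in the pdb session above; the sample rows are illustrative, not the real query result.

```python
import pandas as pd

# Hand-built stand-in for the query result: the column label is lowercased,
# as seen in df.columns in the pdb session.
df = pd.DataFrame({
    "__timestamp": pd.to_datetime(["2018-06-15", "2018-06-11"]),
    "sum(itemsales)": [0.0, 0.0],
})

# The metric label keeps its original capitalization.
values = ["SUM(itemsales)"]

try:
    df.pivot_table(index="__timestamp", values=values)
    missing = None
except KeyError as exc:
    # pandas raises KeyError for a requested value label that is not a column.
    missing = exc.args[0]

print(missing)
```

Lookup of column labels in pandas is case-sensitive, so the capitalized metric label is not found among the lowercased columns.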
Make sure these boxes are checked before submitting your issue - thank you!
- [ x] I have reproduced the issue with at least the latest released version of superset
- [ x] I have checked the issue tracker for the same issue and I haven't found one similar
Superset version
superset==0.25.6
Expected results
I expect either the metric names to be lowercased, or the column names of the result DataFrame to match the form used in the aggregate query.
Actual results
The DataFrame has its column names lowercased, while the metrics retain their original capitalization.
Steps to reproduce
This happens on test data produced by a random number generator. I have seen this error in every case where I am using the SUM aggregation. The database is on Redshift and I have confirmed that I am using pandas==0.22.0.
I can push a fix to make the metrics lowercased or to have the column names of the DataFrame match the metrics, but I'm not sure if that is the best way to approach this.
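As a rough illustration of the second option (making the DataFrame columns match the metrics), here is a sketch under stated assumptions. The helper `align_columns_to_metrics` is a hypothetical name introduced for this example, not a Superset function, and this is not necessarily the fix that was eventually merged.

```python
import pandas as pd


def align_columns_to_metrics(df, metrics):
    """Hypothetical helper: rename columns to the metric labels, ignoring case."""
    # Map each lowercased metric label back to its original spelling.
    by_lower = {m.lower(): m for m in metrics}
    # Columns without a case-insensitive match keep their current name.
    return df.rename(columns=lambda c: by_lower.get(c.lower(), c))


# Illustrative data mimicking the lowercased result columns from the issue.
df = pd.DataFrame({
    "__timestamp": pd.to_datetime(["2018-06-15", "2018-06-11"]),
    "sum(itemsales)": [0.0, 0.0],
})
metrics = ["SUM(itemsales)"]

fixed = align_columns_to_metrics(df, metrics)
pt = fixed.pivot_table(index="__timestamp", values=metrics)
```

Renaming the columns rather than lowercasing the metrics keeps the labels in the form the user typed them. The sketch assumes that no two metrics differ only by case.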
Top GitHub Comments
Hi, does it look like this merge got clobbered? See https://github.com/apache/incubator-superset/commit/a165aec822e5014a99b6467ed6d3d87184e13bc4#diff-6519edc75f2440a575cb22492f401100
Thanks, @villebro. I tested it out and it worked beautifully. I pushed up a PR.