Using Row retrieved by extract(stream=True) in dict-unpacking function invocation ("double star" f(**row)) can fail
Hi,
many thanks for this cool new library - I’ve successfully used the fine tableschema- and datapackage-py libs in the past. This looks like a very nice successor.
Environment: Python 3.6, frictionless 4.40.3
I’ve run into “disturbing behaviour” 😉 when trying to add description and example attributes to a Resource object’s table schema fields. Problem: I want to add to the info gathered by frictionless.describe(<csv file>), namely additional field description and example attributes. For the examples I’d like to simply read the 1st data row and use these values as example values.
Here’s a minimal example that provokes the bug(?) for me:
Data: $ cat colors.csv
id,color,description
1,red,The color of blood
2,green,The color of hope
3,blue,The color of oceans
4,yellow,Bright as the sun
5,black,Dark as the night
Program: $ cat test_read_resource_examples.py
import frictionless


def add_attr(resource, attr_name, **field_attrs):
    """Add attr: value to data resource table schema fields for each field in
    {field_name: value} field dict.
    """
    for field in resource['schema']['fields']:
        field[attr_name] = field_attrs[field['name']]
    return resource


datafile = 'colors.csv'
resource = frictionless.describe(datafile)

# add additional attributes to resource table schema fields
add_attr(resource, 'description', id='The color ID', color='The color name',
         description='The exhaustive color description')
print(resource)

# read 1st resource table row for use as examples
example_row = next(resource.extract(stream=True))
#print(example_row)  # Uncommenting this print() line makes the program work!
add_attr(resource, 'example', **example_row)
print(example_row['color'])
Running this fails:
$ .venv/dev-venv2/bin/python test_read_resource_examples.py
{'encoding': 'utf-8',
 'format': 'csv',
 'hashing': 'md5',
 'name': 'colors',
 'path': 'colors.csv',
 'profile': 'tabular-data-resource',
 'schema': {'fields': [{'description': 'The color ID',
                        'name': 'id',
                        'type': 'integer'},
                       {'description': 'The color name',
                        'name': 'color',
                        'type': 'string'},
                       {'description': 'The exhaustive color description',
                        'name': 'description',
                        'type': 'string'}]},
 'scheme': 'file'}
Traceback (most recent call last):
  File "test_read_resource_examples.py", line 24, in <module>
    add_attr(resource, 'example', **example_row)
  File "test_read_resource_examples.py", line 9, in add_attr
    field[attr_name] = field_attrs[field['name']]
KeyError: 'id'
However, if I just uncomment the print(example_row) line…
$ .venv/dev-venv2/bin/python test_read_resource_examples.py
{'encoding': 'utf-8',
 'format': 'csv',
 'hashing': 'md5',
 'name': 'colors',
 'path': 'colors.csv',
 'profile': 'tabular-data-resource',
 'schema': {'fields': [{'description': 'The color ID',
                        'name': 'id',
                        'type': 'integer'},
                       {'description': 'The color name',
                        'name': 'color',
                        'type': 'string'},
                       {'description': 'The exhaustive color description',
                        'name': 'description',
                        'type': 'string'}]},
 'scheme': 'file'}
{'id': 1, 'color': 'red', 'description': 'The color of blood'}
red
… it suddenly works?!
I’m unsure if that actually should work or if I’m doing something unsupported, e.g. by adding to the Resource schema fields through its dict interface (I’ve noticed there are also example and description properties).
That said I found the behaviour rather surprising 😃 - what kind of “sync” does the row stdout output provoke to make it function? Do I somehow accidentally corrupt the Resource (meta)data?
Again, thanks for the cool library.
Best, Holger
Just noticed that the title “Dynamically adding information to Resource’s schema fields through dict interface provokes incomplete extract(stream=True) result row” was misleading imho, so changed it.
Hi @hjoukl!
The problem is that the row is lazily evaluated (that’s why print fixes it). We can make the code work by using row.to_dict().
Currently, in v5, we’re reworking the way Frictionless works with Metadata, making it simpler and easier. The next step for us will be revising the Row class, maybe also making it simpler to prevent this kind of confusing edge case.
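To illustrate the lazy-evaluation explanation, here is a toy model that runs without frictionless installed. It is only a sketch under assumptions: LazyRow and collect are hypothetical names, and the class merely assumes a dict subclass that fills its underlying storage on first access. It is not the actual frictionless Row implementation. On CPython, ** unpacking of such a subclass reads the raw dict storage directly, so an untouched row unpacks to nothing, while printing it first or calling to_dict() materializes the values.

```python
class LazyRow(dict):
    """Toy stand-in for a lazily evaluated row (hypothetical, not frictionless's Row)."""

    def __init__(self, names, cells):
        super().__init__()  # the underlying dict storage starts out empty
        self._names = list(names)
        self._cells = list(cells)
        self._processed = False

    def _process(self):
        # Copy the raw cells into the dict storage on first real access.
        if not self._processed:
            for name, cell in zip(self._names, self._cells):
                super().__setitem__(name, cell)
            self._processed = True

    def __getitem__(self, key):
        self._process()
        return super().__getitem__(key)

    def __repr__(self):
        self._process()
        return super().__repr__()

    def to_dict(self):
        # Explicitly materialize the row as a plain dict.
        self._process()
        return {name: self[name] for name in self._names}


def collect(**kwargs):
    """Return the keyword arguments the call actually received."""
    return kwargs


row = LazyRow(["id", "color"], [1, "red"])
before = collect(**row)  # unpacking bypasses __getitem__ and sees empty storage (CPython)
repr(row)                # print(row) ends up here; this triggers processing
after = collect(**row)   # the storage is filled now, so unpacking works

fresh = LazyRow(["id", "color"], [1, "red"])
fixed = collect(**fresh.to_dict())  # explicit materialization, no print needed

print(before, after, fixed)
```

Applied to the program from the report, the equivalent change would be add_attr(resource, 'example', **example_row.to_dict()), as suggested in the comment above.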