Using Row retrieved by extract(stream=True) in dict-unpacking function invocation ("double star" f(**row)) can fail

Hi,

many thanks for this cool new library - I’ve successfully used the fine tableschema- and datapackage-py libs in the past. This looks like a very nice successor.

Environment: Python 3.6, frictionless 4.40.3

I’ve run into “disturbing behaviour” 😉 when trying to add description and example attributes to a Resource object’s table schema fields. Problem: I want to extend the info gathered by frictionless.describe(<csv file>) with additional field description and example attributes. For the examples, I’d like to simply read the 1st data row and use its values as example values.

Here’s a minimal example that provokes the bug(?) for me:

Data: $ cat colors.csv

id,color,description
1,red,The color of blood
2,green,The color of hope
3,blue,The color of oceans
4,yellow,Bright as the sun
5,black,Dark as the night

Program: $ cat test_read_resource_examples.py

import frictionless


def add_attr(resource, attr_name, **field_attrs):
    """Add attr: value to data resource table schema fields for each field in
    {field_name: value} field dict.
    """
    for field in resource['schema']['fields']:
        field[attr_name] = field_attrs[field['name']]
    return resource


datafile = 'colors.csv'
resource = frictionless.describe(datafile)

# add additional attributes to resource table schema fields
add_attr(resource, 'description', id='The color ID', color='The color name',
         description='The exhaustive color description')
print(resource)

# read 1st resource table row for use as examples
example_row = next(resource.extract(stream=True))
#print(example_row)  # Uncommenting this print() line makes the program work!
add_attr(resource, 'example', **example_row)
print(example_row['color'])

Running this fails:

$ .venv/dev-venv2/bin/python test_read_resource_examples.py 
{'encoding': 'utf-8',
 'format': 'csv',
 'hashing': 'md5',
 'name': 'colors',
 'path': 'colors.csv',
 'profile': 'tabular-data-resource',
 'schema': {'fields': [{'description': 'The color ID',
                        'name': 'id',
                        'type': 'integer'},
                       {'description': 'The color name',
                        'name': 'color',
                        'type': 'string'},
                       {'description': 'The exhaustive color description',
                        'name': 'description',
                        'type': 'string'}]},
 'scheme': 'file'}
Traceback (most recent call last):
  File "test_read_resource_examples.py", line 24, in <module>
    add_attr(resource, 'example', **example_row)
  File "test_read_resource_examples.py", line 9, in add_attr
    field[attr_name] = field_attrs[field['name']]
KeyError: 'id'

However, if I just uncomment the print(example_row) line…

$ .venv/dev-venv2/bin/python test_read_resource_examples.py 
{'encoding': 'utf-8',
 'format': 'csv',
 'hashing': 'md5',
 'name': 'colors',
 'path': 'colors.csv',
 'profile': 'tabular-data-resource',
 'schema': {'fields': [{'description': 'The color ID',
                        'name': 'id',
                        'type': 'integer'},
                       {'description': 'The color name',
                        'name': 'color',
                        'type': 'string'},
                       {'description': 'The exhaustive color description',
                        'name': 'description',
                        'type': 'string'}]},
 'scheme': 'file'}
{'id': 1, 'color': 'red', 'description': 'The color of blood'}
red

… it suddenly works?!

I’m unsure whether that actually should work or whether I’m doing something unsupported, e.g. by adding to the Resource schema fields through its dict interface (I’ve noticed there are also example and description properties). That said, I found the behaviour rather surprising 😃 - what kind of “sync” does printing the row to stdout provoke to make it work? Do I somehow accidentally corrupt the Resource (meta)data?

Again, thanks for the cool library.

Best, Holger

hjoukl commented, Jul 5, 2022

Just noticed that the title “Dynamically adding information to Resource’s schema fields through dict interface provokes incomplete extract(stream=True) result row” was misleading imho, so changed it.

roll commented, Jul 18, 2022

Hi @hjoukl!

The problem is that the row is lazily evaluated (that’s why print fixes it). We can make the code work by using row.to_dict():

example_row = resource.read_rows(size=1)[0]  # also we can simplify the reading
add_attr(resource, "example", **example_row.to_dict())

Currently, in v5, we’re reworking the way Frictionless works with metadata, making it simpler and easier. The next step for us will be revising the Row class, maybe also making it simpler to prevent this kind of confusing edge case.
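
The lazy evaluation mentioned above can be illustrated without frictionless at all. What follows is only a minimal sketch, assuming a hypothetical LazyRow class (not frictionless's actual Row implementation) that subclasses dict and fills its storage only when a key is first accessed. In CPython, ** unpacking of a dict subclass that does not override __iter__ copies the underlying dict storage directly, so cells that have not been evaluated yet are simply absent. The collect helper is likewise made up for the illustration.

```python
class LazyRow(dict):
    """Hypothetical lazily evaluated row (NOT frictionless's actual Row class)."""

    def __init__(self, names, cells):
        super().__init__()  # the underlying dict storage starts out empty
        self._raw = dict(zip(names, cells))

    def __missing__(self, key):
        # dict.__getitem__ calls this on a miss: evaluate the cell and cache it.
        value = self._raw[key]
        self[key] = value
        return value

    def __repr__(self):
        # Rendering the row evaluates every cell, similar to print() in the report.
        return repr(self.to_dict())

    def to_dict(self):
        # Access every field by name, which forces evaluation of all cells.
        return {name: self[name] for name in self._raw}


def collect(**kwargs):
    """Hypothetical helper: return whatever arrived as keyword arguments."""
    return kwargs


row = LazyRow(["id", "color"], [1, "red"])

before = collect(**row)             # no cell has been evaluated yet
as_dict = collect(**row.to_dict())  # plain dict with every cell evaluated
after = collect(**row)              # cells are now cached in the dict storage

print(before)
print(as_dict)
print(after)
```

Converting to a plain dict first, as in the to_dict() suggestion above, avoids depending on which cells happen to have been evaluated already.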