
Using Row retrieved by extract(stream=True) in dict-unpacking function invocation ("double star" f(**row)) can fail


Hi,

Many thanks for this cool new library! I’ve successfully used the fine tableschema-py and datapackage-py libs in the past, and this looks like a very nice successor.

Environment: Python 3.6, frictionless 4.40.3

I’ve run into “disturbing behaviour” 😉 when trying to add description and example attributes to a Resource object’s table schema fields. Problem: I want to extend the info gathered by frictionless.describe(<csv file>) with additional field description and example attributes. For the examples, I’d like to simply read the first data row and use its values as example values.

Here’s a minimal example that provokes the bug(?) for me:

Data: $ cat colors.csv

id,color,description
1,red,The color of blood
2,green,The color of hope
3,blue,The color of oceans
4,yellow,Bright as the sun
5,black,Dark as the night

Program: $ cat test_read_resource_examples.py

import frictionless


def add_attr(resource, attr_name, **field_attrs):
    """Add attr: value to data resource table schema fields for each field in
    {field_name: value} field dict.
    """
    for field in resource['schema']['fields']:
        field[attr_name] = field_attrs[field['name']]
    return resource


datafile = 'colors.csv'
resource = frictionless.describe(datafile)

# add additional attributes to resource table schema fields
add_attr(resource, 'description', id='The color ID', color='The color name',
         description='The exhaustive color description')
print(resource)

# read 1st resource table row for use as examples
example_row = next(resource.extract(stream=True))
#print(example_row)  # Uncommenting this print() line makes the program work!
add_attr(resource, 'example', **example_row)
print(example_row['color'])

Running this fails:

$ .venv/dev-venv2/bin/python test_read_resource_examples.py 
{'encoding': 'utf-8',
 'format': 'csv',
 'hashing': 'md5',
 'name': 'colors',
 'path': 'colors.csv',
 'profile': 'tabular-data-resource',
 'schema': {'fields': [{'description': 'The color ID',
                        'name': 'id',
                        'type': 'integer'},
                       {'description': 'The color name',
                        'name': 'color',
                        'type': 'string'},
                       {'description': 'The exhaustive color description',
                        'name': 'description',
                        'type': 'string'}]},
 'scheme': 'file'}
Traceback (most recent call last):
  File "test_read_resource_examples.py", line 24, in <module>
    add_attr(resource, 'example', **example_row)
  File "test_read_resource_examples.py", line 9, in add_attr
    field[attr_name] = field_attrs[field['name']]
KeyError: 'id'

However, if I just uncomment the print(example_row) line…

$ .venv/dev-venv2/bin/python test_read_resource_examples.py 
{'encoding': 'utf-8',
 'format': 'csv',
 'hashing': 'md5',
 'name': 'colors',
 'path': 'colors.csv',
 'profile': 'tabular-data-resource',
 'schema': {'fields': [{'description': 'The color ID',
                        'name': 'id',
                        'type': 'integer'},
                       {'description': 'The color name',
                        'name': 'color',
                        'type': 'string'},
                       {'description': 'The exhaustive color description',
                        'name': 'description',
                        'type': 'string'}]},
 'scheme': 'file'}
{'id': 1, 'color': 'red', 'description': 'The color of blood'}
red

… it suddenly works?!

I’m unsure whether that should actually work or whether I’m doing something unsupported, e.g. by adding to the Resource schema fields through its dict interface (I’ve noticed there are also example and description properties). That said, I found the behaviour rather surprising 😃: what kind of “sync” does printing the row to stdout provoke to make it work? Do I somehow accidentally corrupt the Resource (meta)data?

Again, thanks for the cool library.

Best, Holger

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

2 reactions
hjoukl commented, Jul 5, 2022

Just noticed that the original title “Dynamically adding information to Resource’s schema fields through dict interface provokes incomplete extract(stream=True) result row” was misleading IMHO, so I changed it.

1 reaction
roll commented, Jul 18, 2022

Hi @hjoukl!

The problem is that the row is lazily evaluated (that’s why print fixes it). We can make the code work by using row.to_dict():

example_row = resource.read_rows(size=1)[0]  # also we can simplify the reading
add_attr(resource, "example", **example_row.to_dict())

Currently, in v5, we’re reworking the way Frictionless works with metadata, making it simpler and easier. The next step for us will be revising the Row class, possibly also simplifying it to prevent confusing edge-case situations like this one.
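roll’s explanation can be illustrated with a small, self-contained sketch. `LazyRow` below is a hypothetical stand-in, not frictionless’s actual `Row` class; it assumes, as the comment suggests, that a row fills its underlying dict storage only when it is accessed or printed. For a dict subclass like this one, which does not override `keys()` or `__iter__`, `**` unpacking copies whatever is currently in the storage, so unpacking an unevaluated row yields no keyword arguments and the report’s `add_attr` helper raises `KeyError: 'id'`.

```python
class LazyRow(dict):
    """Hypothetical stand-in for a lazily evaluated row (NOT frictionless's Row).

    The real dict storage stays empty until _process() runs; item access
    and repr() trigger it, but ** unpacking reads the storage directly.
    """

    def __init__(self, names, cells):
        super().__init__()  # underlying dict storage starts out empty
        self._names = list(names)
        self._cells = list(cells)
        self._processed = False

    def _process(self):
        # Copy the cells into the real dict storage exactly once.
        if not self._processed:
            for name, cell in zip(self._names, self._cells):
                super().__setitem__(name, cell)
            self._processed = True

    def __missing__(self, key):
        # dict.__getitem__ calls this for keys not yet in the storage.
        if self._processed:
            raise KeyError(key)
        self._process()
        return self[key]

    def __repr__(self):
        # Printing forces evaluation, which is why print(row) "fixes" things.
        self._process()
        return super().__repr__()

    def to_dict(self):
        # Return a plain dict that is safe to unpack with **.
        self._process()
        return dict(self)


def add_attr(resource, attr_name, **field_attrs):
    """Same helper as in the report: set attr_name on every schema field."""
    for field in resource["schema"]["fields"]:
        field[attr_name] = field_attrs[field["name"]]
    return resource


resource = {"schema": {"fields": [{"name": "id"}, {"name": "color"}]}}

row = LazyRow(["id", "color"], [1, "red"])
try:
    add_attr(resource, "example", **row)  # storage still empty -> no kwargs
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'id'

add_attr(resource, "example", **row.to_dict())  # works: plain, filled dict
print(resource["schema"]["fields"])
# [{'name': 'id', 'example': 1}, {'name': 'color', 'example': 'red'}]
```

Converting to a plain `dict` first (the role `to_dict()` plays in roll’s fix) sidesteps the problem because the conversion forces evaluation before anything is unpacked.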

