Associate custom data with Doc objects for use in custom components

Hi Matthew and Ines, I checked the documentation and the code, but I can't find a way to associate data with Doc objects created using either nlp(my_text) or Doc(...). In a nutshell, I have many paragraphs, each with a lot of data that I know must be associated with it. I also have a custom component that uses this data to perform computations on the associated text. However, there seems to be no way to attach this data when a Doc object is created.

Here is my pseudocode to illustrate the use case:

# Register the custom Doc extensions
set_doc_metadata_extensions(resources)

for document in documents:
    for paragraph in document:
        metadata = get_metadata(paragraph)
        doc = nlp(paragraph)   # I want to attach the metadata here, since this is where all components are run

        # Assign the values so a custom component can use them
        doc._.metadata_a = value
        ...
        doc._.metadata_b = value

As a result, accessing the various metadata_* attributes from the custom component returns the default value for each field, because the values are assigned only after the pipeline has run. I can't assign them earlier, since the Doc object doesn't exist yet. I also tried the Doc's user_data attribute, with the same result.

Is there a workaround to get the expected result? If not, this could be a useful feature request, since in my experience this use case is very common.

Thank you!

Your Environment

  • Operating System: macOS Sierra
  • Python Version Used: 2.6
  • spaCy Version Used: 2.0.10

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
alanramponi commented, Jun 19, 2018

Hi @SandeepNaidu, thank you for your reply. I solved it using your first suggestion: after registering the custom attributes and setting up the pipeline, I created a Doc object with the make_doc function, assigned the custom data to it, and finally ran each component myself. This way each component can access the data “a priori”, even if it is the first component in the pipeline. For anyone who faces the same problem in the future, this is the solution:

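from spacy.tokens import Doc

# Register the custom attributes once, before processing any text
Doc.set_extension("A", default=None)
Doc.set_extension("B", default=None)

# Create a Doc object (tokenization only; no pipeline component has run yet)
doc = nlp.make_doc(paragraph)

# Associate the metadata
doc._.A = value_A
doc._.B = value_B

# Run each component in pipeline order, passing the Doc along
for name, proc in nlp.pipeline:
    doc = proc(doc)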

It would be great if this scenario were documented somewhere. Thank you very much indeed!
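Why this works comes down to ordering: a component can only read values that are already on the Doc when it runs. The sketch below is a toy model in plain Python, not spaCy itself; `ToyDoc`, `make_doc`, `pipeline`, `run_pipeline` and `read_metadata` are hypothetical stand-ins for `Doc`, `nlp.make_doc`, `nlp.pipeline` and a custom component. It runs the same component once with the metadata assigned after the pipeline (as in the question) and once before (as in the solution).

```python
from types import SimpleNamespace


class ToyDoc:
    """Hypothetical stand-in for spacy.tokens.Doc with a `_` namespace."""

    def __init__(self, text):
        self.text = text
        # Mimics a registered extension attribute with default=None
        self._ = SimpleNamespace(A=None)
        # Records what the component saw while the pipeline was running
        self.seen_by_component = None


def make_doc(text):
    # Stand-in for nlp.make_doc: builds the Doc without running any component
    return ToyDoc(text)


def read_metadata(doc):
    # A custom component that needs doc._.A while the pipeline is running
    doc.seen_by_component = doc._.A
    return doc


pipeline = [("read_metadata", read_metadata)]


def run_pipeline(doc):
    # Same loop as in the solution: apply each component in order
    for name, proc in pipeline:
        doc = proc(doc)
    return doc


# Order from the question: run the pipeline, then assign (too late)
late = run_pipeline(make_doc("First paragraph."))
late._.A = "value_A"

# Order from the solution: make the Doc, assign, then run the components
early = make_doc("First paragraph.")
early._.A = "value_A"
early = run_pipeline(early)

print(late.seen_by_component)   # None
print(early.seen_by_component)  # value_A
```

In both cases the Doc ends up holding the value; only the second order makes it visible to the component while it runs.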

1 reaction
SandeepNaidu commented, Jun 19, 2018

Hi Alan,

Did you try using nlp.make_doc and then sending the result into the pipeline? If you can propagate the data, you can also write a custom pipeline component and position it after "sent" or "sbd" (sentence boundary detection) to intercept the Doc and assign the properties you want there. Otherwise, create a Doc before you call the pipeline and then send that Doc object through the pipeline.
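The alternative of letting a component assign the properties itself could look roughly like this, again as a plain-Python toy model rather than spaCy code. It assumes the metadata can be looked up from something the component can see, here a hypothetical dict `METADATA` keyed by paragraph text; `ToyDoc`, `make_metadata_setter`, `uses_metadata` and `process` are illustrative names, not spaCy API. The setter has to come before any component that reads the values.

```python
from types import SimpleNamespace

# Hypothetical lookup table: paragraph text -> metadata known in advance
METADATA = {
    "First paragraph.": {"A": "value_A", "B": "value_B"},
}


class ToyDoc:
    """Hypothetical stand-in for spacy.tokens.Doc with a `_` namespace."""

    def __init__(self, text):
        self.text = text
        # Mimics two registered extension attributes with default=None
        self._ = SimpleNamespace(A=None, B=None)


def make_metadata_setter(lookup):
    # Returns a component that copies any known metadata onto the Doc
    def set_metadata(doc):
        for key, value in lookup.get(doc.text, {}).items():
            setattr(doc._, key, value)
        return doc
    return set_metadata


def uses_metadata(doc):
    # Downstream component: reads the values assigned earlier in the pipeline
    doc.summary = (doc._.A, doc._.B)
    return doc


# The setter comes first so every later component sees the values
pipeline = [
    ("set_metadata", make_metadata_setter(METADATA)),
    ("uses_metadata", uses_metadata),
]


def process(text):
    # Build the Doc, then apply each component in order
    doc = ToyDoc(text)
    for name, proc in pipeline:
        doc = proc(doc)
    return doc


doc = process("First paragraph.")
print(doc.summary)  # ('value_A', 'value_B')
```

The catch with this design is the lookup key: the component only receives the Doc, so the metadata must be findable from the Doc itself (its text, in this sketch). When the metadata lives outside the text, assigning it right after make_doc avoids that constraint.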
