Flexible handling of non-existent message_attribute for given training sample


Description of Problem:

Currently, the SpacyNLP component provides the following:

provides = ["spacy_doc", "spacy_nlp", "intent_spacy_doc", "response_spacy_doc"]

This makes it necessary to handle non-existent / None-valued attributes for a given training sample. At present, None values are converted to empty strings, because spaCy can't create its Doc objects from None values.
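A minimal sketch of that conversion, for illustration only. It assumes training samples are plain dictionaries (a stand-in for the real message objects), and `texts_for_attribute` is a hypothetical helper name, not part of the component:

```python
from typing import Dict, List, Optional


def texts_for_attribute(
    samples: List[Dict[str, Optional[str]]], attribute: str
) -> List[str]:
    """Return one text per sample, mapping a missing or None attribute to ""."""
    return [sample.get(attribute) or "" for sample in samples]


samples = [
    {"text": "hello", "response": "hi there"},
    {"text": "bye"},                       # no response attribute at all
    {"text": "thanks", "response": None},  # attribute present but None
]
response_texts = texts_for_attribute(samples, "response")
```

The result has one entry per sample, so the order is preserved, but every missing value becomes an empty string.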

Simply filtering out those training samples would break their order and cause follow-on problems, so we need a more flexible solution.

Overview of the Solution: I am going to think about a robust solution and update this issue accordingly.

Examples: If there are no samples for the response attribute, calling pipe currently returns a list of empty Doc objects:

docs = [doc for doc in self.nlp.pipe(texts, batch_size=50)]

[, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ]

An empty string is valid input for a Doc object, but the resulting empty Docs are a problem for libraries such as spacy-pytorch-transformers and for other custom components that can't handle these cases properly.
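One possible direction, sketched under assumptions rather than taken from the eventual fix: pass only the non-empty texts through the pipeline and put `None` back at the positions of the missing samples, so the order is kept and downstream components never receive an empty Doc. `pipe_keeping_order` is a hypothetical helper, and `pipe` stands for any callable that yields one result per input text. The example uses a stand-in that upper-cases strings so it runs without spaCy.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def pipe_keeping_order(
    texts: List[Optional[str]],
    pipe: Callable[[List[str]], Iterable[T]],
) -> List[Optional[T]]:
    """Process only non-empty texts; keep None at the positions of missing ones."""
    # Remember where the usable texts sit in the original list.
    kept_indices = [i for i, text in enumerate(texts) if text]
    processed = pipe([texts[i] for i in kept_indices])

    # Start with None everywhere, then fill in the processed results.
    results: List[Optional[T]] = [None] * len(texts)
    for index, result in zip(kept_indices, processed):
        results[index] = result
    return results


def fake_pipe(batch: List[str]) -> Iterable[str]:
    """Stand-in for a real pipeline: lazily upper-cases each text."""
    return (text.upper() for text in batch)


docs = pipe_keeping_order(["hello", None, "", "bye"], fake_pipe)
```

With spaCy, the stand-in could be replaced by something like `lambda batch: nlp.pipe(batch, batch_size=50)`. Consumers of the result would then need to check for `None` instead of an empty Doc, which is a trade-off rather than a free win.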

The corresponding forum entry for this conversation can be found here. @dakshvar22

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

2 reactions
JulianGerhard21 commented, Sep 13, 2019

Hi @dakshvar22,

All right, I agree with you, and I am going to start working on this this afternoon. I will get back to you with a code proposal as soon as it is ready.

Thanks for your help!

Regards, Julian

0 reactions
joejuzl commented, Jan 28, 2021

Closing as this is in a minor release around 1.3.x.

