Flexible handling of non-existent message_attribute for given training sample

Description of Problem:

Currently, the SpacyNLP component provides the following:

provides = ["spacy_doc", "spacy_nlp", "intent_spacy_doc", "response_spacy_doc"]

This made it necessary to handle non-existent or None-valued attributes for a given training sample. Currently, None values are converted to empty strings, because spaCy cannot create Doc objects from None.
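To illustrate the current workaround, here is a minimal sketch. It assumes training samples can be represented as plain dicts; the helper name `texts_for_attribute` and the sample layout are hypothetical and not taken from the Rasa code base.

```python
from typing import Dict, List, Optional


def texts_for_attribute(
    samples: List[Dict[str, Optional[str]]], attribute: str
) -> List[str]:
    """Collect the text for `attribute` from every sample.

    Missing or None values become empty strings, so the result can be
    handed to spaCy without raising on None (hypothetical helper).
    """
    return [sample.get(attribute) or "" for sample in samples]


# Hypothetical training samples: neither has a usable "response".
samples = [
    {"text": "hello", "response": None},
    {"text": "bye"},
]

print(texts_for_attribute(samples, "text"))      # ['hello', 'bye']
print(texts_for_attribute(samples, "response"))  # ['', '']
```

The order of the samples is preserved, but every sample without the attribute still produces an empty input for spaCy.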

Simply filtering out those training samples would break their order and cause follow-on problems, so we need to find a more flexible solution.

Overview of the Solution: I am going to think about a robust solution and update this issue accordingly.

Examples: If there are no samples for the response attribute, calling pipe currently results in a list of empty Doc objects:

docs = [doc for doc in self.nlp.pipe(texts, batch_size=50)]

[, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ]

An empty string is valid input for a Doc object, but the resulting empty Docs are a problem for libraries such as spacy-pytorch-transformers and for other custom components that can't handle this case properly.
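One possible direction, sketched here only as an assumption and not as the solution that was eventually adopted, is to process just the non-empty texts and put the results back at their original positions, leaving None where a sample has no value. The helper name `pipe_non_empty` is hypothetical, and the `pipe` argument stands in for something like `lambda ts: nlp.pipe(ts, batch_size=50)`.

```python
from typing import Callable, Iterable, List, Optional, Sequence, TypeVar

D = TypeVar("D")


def pipe_non_empty(
    texts: Sequence[Optional[str]],
    pipe: Callable[[List[str]], Iterable[D]],
) -> List[Optional[D]]:
    """Run `pipe` only on non-empty texts while keeping sample order.

    Positions whose text is None or '' get None instead of an empty Doc,
    so downstream components can skip them explicitly.
    """
    # Remember the original index of every text that is worth processing.
    indexed = [(i, text) for i, text in enumerate(texts) if text]
    docs: List[Optional[D]] = [None] * len(texts)
    processed = pipe([text for _, text in indexed])
    # Write each result back to the slot its text came from.
    for (i, _), doc in zip(indexed, processed):
        docs[i] = doc
    return docs


# Stand-in for a spaCy pipeline so the sketch runs without a model.
def fake_pipe(batch: List[str]) -> Iterable[str]:
    return (text.upper() for text in batch)


print(pipe_non_empty(["hi", None, "", "bye"], fake_pipe))
# ['HI', None, None, 'BYE']
```

The trade-off is that downstream components must then accept None entries, which moves the special case out of spaCy but does not remove it.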

The corresponding forum entry for this conversation can be found here. @dakshvar22

Top GitHub Comments

JulianGerhard21 commented, Sep 13, 2019

Hi @dakshvar22,

All right, I agree with you, and I am going to start working on this this afternoon. I will get back to you with a code proposal as soon as it is ready.

Thanks for your help!

Regards Julian

joejuzl commented, Jan 28, 2021

Closing as this is in a minor release around 1.3.x.
