
Can't run pre-processing code.


Hi xinya, I ran your code and got an error during data preprocessing. The failing step was running 'python scripts/data/ace-event/convert_examples.py':

  File "scripts/data/ace-event/convert_examples.py", line 11, in <module>
    line = json.loads(line)
  File "/home/yangpan/anaconda3/envs/ace-event-preprocess/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/yangpan/anaconda3/envs/ace-event-preprocess/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/yangpan/anaconda3/envs/ace-event-preprocess/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)
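The traceback can be reproduced in isolation. This is a minimal sketch that assumes `parse_ace_event.py` writes each document as a pretty-printed JSON object spanning several lines. The sample string below is hypothetical. `convert_examples.py` passes one physical line at a time to `json.loads`. The first line of such a file is just `{`, so the decoder runs out of input where it expects a key. That gives the same position as in the traceback.

```python
import json

# Hypothetical sample: two documents pretty-printed back to back with no separator
# (an assumed layout, used only to reproduce the error).
sample = '{\n  "doc_key": "a"\n}\n{\n  "doc_key": "b"\n}\n'


def first_line_error(text):
    """Parse only the first physical line, as a line-by-line reader would, and return the error."""
    first_line = text.splitlines(keepends=True)[0]  # this is "{\n"
    try:
        json.loads(first_line)
    except json.JSONDecodeError as err:
        return err
    return None


err = first_line_error(sample)
print(err)  # Expecting property name enclosed in double quotes: line 2 column 1 (char 2)
```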

I checked my process. I followed the README steps exactly and re-downloaded the raw data from the official site:

  • When parsing the ACE05 data, I used the default settings: python ./scripts/data/ace-event/parse_ace_event.py default-settings

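I inspected the data produced by parse_ace_event.py and found some problems. For example, the 'events' field contains a large number of meaningless empty lists:

[Screenshot: 'events' field containing many empty lists]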

What is causing this?
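You can check how widespread the empty lists are by counting them per sentence. This is a minimal sketch that assumes the documents have already been parsed into dicts. It also assumes that `events` holds one list per sentence, which is the layout the converter script in this thread relies on. `count_empty_event_lists` is a hypothetical helper. An empty list may simply correspond to a sentence with no annotated event, so the count only measures how common empty lists are. It does not diagnose the cause.

```python
from typing import Dict, List


def count_empty_event_lists(docs: List[Dict]) -> Dict[str, int]:
    """Count sentences whose per-sentence `events` entry is an empty list.

    Assumes each document's `events` field is a list with one entry per sentence.
    """
    total = 0
    empty = 0
    for doc in docs:
        for sentence_events in doc["events"]:
            total += 1
            if not sentence_events:  # [] means no events recorded for this sentence
                empty += 1
    return {"sentences": total, "empty_events": empty}
```

For example, a document with two sentences, one of which has an event, yields `{"sentences": 2, "empty_events": 1}`.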

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7

Top GitHub Comments

6 reactions
HuangZhenyang commented, Mar 31, 2021

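The train.json produced by parse_ace_event.py has the form {}{}{}. Each JSON object spans multiple lines, and there is no separator between objects.

The author's code reads and parses the file line by line, which is why it raises an error. Perhaps this code doesn't support the default-settings case? (I haven't looked at parse_ace_event.py in detail.)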
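A file of back-to-back JSON objects can be read without depending on its line layout by using `json.JSONDecoder.raw_decode` from the standard library. `raw_decode` parses one value starting at a given index and returns the value together with the index where it ended. This is a minimal sketch; the helper name `iter_concatenated_json` is hypothetical and not part of the repository.

```python
import json
from typing import Iterator


def iter_concatenated_json(text: str) -> Iterator[dict]:
    """Yield each JSON value from a string of values written back to back, e.g. "{...}{...}"."""
    decoder = json.JSONDecoder()
    idx = 0
    n = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip newlines/spaces between objects.
        while idx < n and text[idx].isspace():
            idx += 1
        if idx >= n:
            return
        # Parse one complete value starting at idx; idx becomes the position just after it.
        obj, idx = decoder.raw_decode(text, idx)
        yield obj
```

This reads the whole file into memory, which is assumed to be acceptable for ACE05-sized splits. You would call it as `for doc in iter_concatenated_json(f.read()):` in place of the line-accumulation loop.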


Here is my modified convert_examples.py:

from os import path
import json
import collections

output_dir = "./data/ace-event/processed-data/json"
tmp_json_dir = "./data/ace-event/processed-data/default-settings/json"

for fold in ["train", "dev", "test"]:
    # Open input and output together so both files are closed when the fold is done.
    with open(path.join(tmp_json_dir, fold + ".json"), "r") as f, \
            open(path.join(output_dir, fold + "_convert.json"), "w") as f_convert:
        json_str = ""
        # Heuristic: a line consisting only of "}" closes a top-level document object.
        ed_char = "}"

        for line in f:
            line = line.strip()
            json_str += line
            if line == ed_char:
                json_obj = json.loads(json_str)
                json_str = ""

                sentences = json_obj["sentences"]
                ners = json_obj["ner"]
                relations = json_obj["relations"]
                events = json_obj["events"]
                sentence_starts = json_obj["sentence_start"]

                # Every per-sentence field must have exactly one entry per sentence.
                assert len(sentences) == len(ners) == len(relations) == len(events) == len(sentence_starts)

                for sentence, ner, relation, event, s_start in zip(sentences, ners, relations, events, sentence_starts):
                    sentence_annotated = collections.OrderedDict()
                    sentence_annotated["sentence"] = sentence
                    sentence_annotated["s_start"] = s_start
                    sentence_annotated["ner"] = ner
                    sentence_annotated["relation"] = relation
                    sentence_annotated["event"] = event

                    # default=int converts values json cannot serialize natively (e.g. numpy integers).
                    f_convert.write(json.dumps(sentence_annotated, default=int) + "\n")
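Pulling the per-sentence conversion into a function lets it be tested without the ACE05 files. This sketch mirrors the loop in the script. The assertion is replaced by an explicit `ValueError`, so the check still runs under `python -O`. The function name `split_document` is hypothetical.

```python
import collections
import json
from typing import Dict, List

# Per-sentence fields that must line up one-to-one (names taken from the script).
FIELDS = ("sentences", "ner", "relations", "events", "sentence_start")


def split_document(doc: Dict) -> List[str]:
    """Turn one parsed document into one JSON line per sentence."""
    lengths = {name: len(doc[name]) for name in FIELDS}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"per-sentence fields have different lengths: {lengths}")

    lines = []
    for sentence, ner, relation, event, s_start in zip(*(doc[name] for name in FIELDS)):
        record = collections.OrderedDict()
        record["sentence"] = sentence
        record["s_start"] = s_start
        record["ner"] = ner
        record["relation"] = relation
        record["event"] = event
        # default=int handles numpy integers that json cannot serialize directly.
        lines.append(json.dumps(record, default=int))
    return lines
```

Each returned string is one line of the `*_convert.json` output, in the same key order as the script writes.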

0 reactions
Akeepers commented, Dec 21, 2020

@JunnYu Yeah, got it~
