Parsing/Tokenizing a file with Ginza from command line

See original GitHub issue

Hi, GiNZA is primarily used as a dependency parser, but it can also be used for tokenization. I would therefore like to know whether there is a command-line interface, similar to Jieba's, for tokenizing a file.

In Jieba, we can use: python -m jieba -d filename

How could we do that using GiNZA? 😃

Kind Regards,
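As a sketch of what is being asked: Jieba's one-liner is shown above, and GiNZA installs a `ginza` console command that reads text from standard input and prints a CoNLL-U style analysis (one token per row, surface form in the second tab-separated column). The flags and output layout may differ between GiNZA versions, so treat the second command as an illustration rather than a definitive recipe:

```shell
# Jieba: tokenize a file from the command line, joining tokens with " "
python -m jieba -d " " input.txt > tokens.txt

# GiNZA: the `ginza` command reads stdin and emits CoNLL-U style rows;
# keep only token rows (numeric first field) and print the surface form.
# (Sketch: column layout may vary by GiNZA version.)
ginza < input.txt | awk -F'\t' '$1 ~ /^[0-9]+$/ { print $2 }' > tokens.txt
```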

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
hiroshi-matsuda-rit commented, Oct 21, 2019

@MastafaF Please try the sudachipy command, which is installed together with ginza, if you need it. Or do you need a --delimiter option like Jieba's? Actually, I don't want to add that to the ginza command. I'm planning to add a format option which provides MeCab-compatible output. https://taku910.github.io/mecab/
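A hedged sketch of the suggestion above: SudachiPy is installed as a GiNZA dependency, and its sudachipy console command tokenizes plain text, printing one token per line in a MeCab-like tab-separated format with EOS between sentences. Exact flags have changed across SudachiPy versions, so the options below are an illustration:

```shell
# Tokenize a file with the sudachipy CLI (installed alongside ginza).
# -m selects the split mode (A = shortest units, B, C = longest units);
# -o writes the result to a file instead of stdout.
# (Sketch: flag names may differ across SudachiPy versions.)
sudachipy -m C -o tokens.txt input.txt

# Surface forms only: the surface form is the first tab-separated field;
# skip the EOS sentence separators.
sudachipy -m C input.txt | awk -F'\t' '$0 != "EOS" { print $1 }'
```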

1 reaction
hiroshi-matsuda-rit commented, Oct 17, 2019

@MastafaF It is a typical use case of GiNZA. I'd like to add a tokenization mode to command_line.py in the next release. Thank you!


Top Results From Across the Web

Correcting tokenization errors by parser · Issue #3818 - GitHub
Feature description I'd like to implement a functionality which can correct tokenization errors (both boundaries and tags) by parser.
Gd Tokenizer - scripts - Godot Asset Store
The most important file is ./cmd/CommandParser.gd . This is where the tokenization and execution methods are defined. To use the class file you...
tokenize — Tokenizer for Python source — Python 3.11.1 ...
tokenize() determines the source encoding of the file by looking for a UTF-8 BOM or encoding cookie, according to PEP 263. tokenize.generate_tokens(readline)...
Tokenization - CoreNLP - Stanford NLP Group
Tokenizing From The Command Line. This command will take in the text of the file input.txt and produce a human readable output of...
Token · spaCy API Documentation
The leftward immediate children of the word in the syntactic dependency parse. Example. doc = nlp("I like New York in Autumn.") ...
