Pocket and Pinboard imports causing tags to be split incorrectly into individual characters w/ broken hyphenation
A simple bug due to a .split() or set() somewhere on the tags_str instead of the tags list. Should be easy to fix.
We should also add a filter to prevent empty-string / whitespace-only tags.
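A minimal sketch of such a filter, assuming the tags arrive as a single delimited string. The helper name parse_tags and the comma default are hypothetical, not taken from the ArchiveBox codebase (other exports may separate tags differently, so the delimiter is a parameter):

```python
def parse_tags(tags_str, delimiter=","):
    """Split a delimited tag string into a clean list of tag names.

    Hypothetical helper: the name and the default delimiter are
    assumptions for illustration only.
    """
    if not tags_str:
        return []
    # Strip surrounding whitespace from each piece, then drop any piece
    # that is empty once stripped (covers "" and whitespace-only tags).
    return [tag.strip() for tag in tags_str.split(delimiter) if tag.strip()]
```

Something like parse_tags("python, , web-dev,,  ") would then keep only the two real tags, with the hyphen in web-dev left intact.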
Issue Analytics
- State:
- Created 2 years ago
- Comments:17 (12 by maintainers)
Top GitHub Comments
I tried out what I suggested in the previous comment in my test environment, and it seems to have fixed the problem. All I did was copy the method that splits the tags out into a list from further down in the same file.
One note on the tests: without my changes, there are 8 failing tests for me. After my changes, the same 8 tests still fail, so my pull request did not introduce any new test failures.
I did a little digging because I would love to get my pinboard.json file imported as well.
I’m not sure if this will help, but this is what I found: I ran through the debugger when adding a json file, and stepped through each line until I wound up in this method: https://github.com/ArchiveBox/ArchiveBox/blob/84b927e3e5fb8da93fb86a9070291495a7563a35/archivebox/core/models.py#L249-L255
where I noticed it was processing one letter of a tag at a time. The entrypoint to this function looks like it is expecting tags to be a list.
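The symptom is what Python does when code written for a list is handed a plain string: iterating a string yields its characters. A small sketch, where collect_tag_names is a hypothetical stand-in for the real method and the comma-separated tags_str is an assumed input format:

```python
def collect_tag_names(tags):
    """Hypothetical stand-in for a method that expects an iterable of tag names."""
    # Iterates whatever it is given; a str is iterated character by character.
    return [name for name in tags]


tags_str = "web-dev,python"  # assumed format, for illustration only

# Passing the raw string: every character (including "-" and ",") becomes a "tag".
broken = collect_tag_names(tags_str)

# Splitting into a list first: whole tags come through, hyphens intact.
fixed = collect_tag_names(tags_str.split(","))
```

This would also explain the broken hyphenation in the title, since the hyphen is treated as just another single-character tag.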
There are two places that call this function:
https://github.com/ArchiveBox/ArchiveBox/blob/32764347ce2e59919f763c552bd3e250f49c2f5b/archivebox/index/sql.py#L47
https://github.com/ArchiveBox/ArchiveBox/blob/32764347ce2e59919f763c552bd3e250f49c2f5b/archivebox/index/sql.py#L113
It looks like the first one passes in the tag string: https://github.com/ArchiveBox/ArchiveBox/blob/32764347ce2e59919f763c552bd3e250f49c2f5b/archivebox/index/sql.py#L36-L38
While the second one sets up a list to pass in: https://github.com/ArchiveBox/ArchiveBox/blob/32764347ce2e59919f763c552bd3e250f49c2f5b/archivebox/index/sql.py#L107-L109
I’m not really a Python guy, but it looks like the first call also needs to set up the tags as a list before passing them to the save_tags method? Feel free to delete this if it doesn’t help. Cheers!