Pocket and Pinboard imports causing tags to be split incorrectly into individual characters w/ broken hyphenation

A simple bug, most likely caused by a .split() or set() call somewhere being applied to tags_str instead of the tags list. Should be easy to fix.

We should also add a filter to prevent empty-string / whitespace-only tags.
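
The failure mode described above can be reproduced in a few lines. This is a minimal sketch, not ArchiveBox's actual code: `naive_tag_names` and `clean_tags` are hypothetical names, and the comma delimiter is an assumption about how the tag string is formatted.

```python
def naive_tag_names(tags_str):
    # Bug pattern: a str is iterable, so set() (or a for loop) over it
    # yields individual characters, hyphens included.
    return sorted(set(tags_str))


def clean_tags(tags):
    """Trim each tag and drop empty or whitespace-only entries."""
    return [tag.strip() for tag in tags if tag and tag.strip()]


tags_str = "to-read,python"

# Applied to the string: every character becomes its own "tag".
print(naive_tag_names(tags_str))

# Applied to the split list (comma delimiter assumed): whole tags survive.
print(clean_tags(tags_str.split(",")))

# The filter also removes blank entries left by stray delimiters.
print(clean_tags(["", "  ", " web-dev "]))
```

Filtering after splitting keeps the two concerns separate: the split decides where one tag ends, and the filter decides which results are worth saving.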

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 17 (12 by maintainers)

Top GitHub Comments

2 reactions
hannah98 commented, Dec 20, 2021

I tried out what I suggested in the previous comment in my test environment and it seems to have fixed the problem. All I did was copy the method to split out the tags into a list from further down in the same file.

One note on the tests: without my changes, there are 8 failing tests for me. After my changes, the same 8 tests still fail, so my pull request did not introduce any new test failures.

2 reactions
hannah98 commented, Dec 1, 2021

I did a little digging because I would love to get my pinboard.json file imported as well.

I’m not sure if this will help, but this is what I found: I ran through the debugger when adding a json file, and stepped through each line until I wound up in this method: https://github.com/ArchiveBox/ArchiveBox/blob/84b927e3e5fb8da93fb86a9070291495a7563a35/archivebox/core/models.py#L249-L255

where I noticed it was processing one letter of a tag at a time. The entrypoint to this function looks like it is expecting tags to be a list.

There are two places that call this function:

https://github.com/ArchiveBox/ArchiveBox/blob/32764347ce2e59919f763c552bd3e250f49c2f5b/archivebox/index/sql.py#L47

https://github.com/ArchiveBox/ArchiveBox/blob/32764347ce2e59919f763c552bd3e250f49c2f5b/archivebox/index/sql.py#L113

It looks like the first one passes in the tag string: https://github.com/ArchiveBox/ArchiveBox/blob/32764347ce2e59919f763c552bd3e250f49c2f5b/archivebox/index/sql.py#L36-L38

While the second one sets up a list to pass in: https://github.com/ArchiveBox/ArchiveBox/blob/32764347ce2e59919f763c552bd3e250f49c2f5b/archivebox/index/sql.py#L107-L109

I’m not really a Python guy, but it looks like the first call also needs to set up the tags as a list before passing them to the save_tags method?

Feel free to delete this if it doesn’t help. Cheers!
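
The suggestion in this comment, building a list before calling save_tags, might look roughly like the sketch below. It makes several assumptions: `as_tag_list` is a hypothetical helper, the `save_tags` shown here is only a stand-in that iterates over its argument (the real method lives in archivebox/core/models.py and is not reproduced), and the comma separator is a guess at the tag string format.

```python
from typing import Iterable, List, Optional, Union


def as_tag_list(tags: Optional[Union[str, Iterable[str]]], sep: str = ",") -> List[str]:
    """Hypothetical helper: coerce a delimited string or an iterable of
    names into a de-duplicated list of non-empty, trimmed tag names."""
    if tags is None:
        return []
    if isinstance(tags, str):
        # Split the string first so later iteration sees whole tags.
        tags = tags.split(sep)
    result: List[str] = []
    for tag in tags:
        name = tag.strip()
        if name and name not in result:
            result.append(name)
    return result


def save_tags(tags: Iterable[str]) -> List[str]:
    """Stand-in for the real method: it simply iterates over its input."""
    return [name for name in tags]


raw = "to-read, python"

# Passing the raw string reproduces the bug: one entry per character.
print(save_tags(raw))

# Normalizing at the call site gives the method the list it expects.
print(save_tags(as_tag_list(raw)))
```

Normalizing at the call site means both callers hand the method the same type, so the method itself does not need to guess whether it received a string or a list.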
