Is it possible to unshorten urls with twarc2?
I’ve gathered tweets with twarc2 and I want to unshorten the URLs they contain. Do I have to use twarc (v1) for this, or is it possible to do it with twarc2?
I have been looking at the documentation (https://twarc-project.readthedocs.io/en/latest/). When I run the twarc unshrtn utility from the directory where I have this repository cloned and up to date:
cat archive_flatten.jsonl | utils/unshrtn.py > archive_flatten_unshrtn.jsonl
I get:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/numeroteca/.pyenv/versions/3.8.1/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "utils/unshrtn.py", line 78, in rewrite_line
    for url_dict in tweet["entities"]["urls"]:
KeyError: 'urls'
"""
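The traceback points at the cause: utils/unshrtn.py indexes tweet["entities"]["urls"] directly, and in twarc2 (Twitter API v2) data a tweet without links has no "urls" key under "entities" (and may have no "entities" at all), whereas v1.1 data always carried an empty list. Below is a minimal sketch of a v2-tolerant filter, assuming one flattened tweet per JSONL line. The names resolve_url, unshorten_lines and main, and the added "unshortened_url" field, are hypothetical choices for illustration, not part of twarc. It prefers the v2 "unwound_url" field when the API supplied one and otherwise follows redirects itself with a plain HEAD request.

```python
import http.client
import json
import sys
import urllib.request
from typing import Callable, Dict, Iterable, Iterator


def resolve_url(url: str, timeout: float = 10.0) -> str:
    """Follow HTTP redirects with a HEAD request and return the final URL.

    urllib follows redirects automatically and geturl() reports where the
    request ended. On any failure (malformed URL, network error, or a server
    that rejects HEAD), the input URL is returned unchanged.
    """
    try:
        # Building the Request can itself raise ValueError for malformed URLs.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.geturl()
    except (OSError, ValueError, http.client.HTTPException):
        return url


def unshorten_lines(
    lines: Iterable[str], resolve: Callable[[str], str] = resolve_url
) -> Iterator[str]:
    """Yield each JSONL tweet with a hypothetical "unshortened_url" added to its URL entities."""
    cache: Dict[str, str] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        tweet = json.loads(line)
        # v2 payloads omit "urls" (or "entities" entirely) when a tweet has no
        # links, so .get() with defaults avoids KeyError: 'urls'.
        for url_entity in tweet.get("entities", {}).get("urls", []):
            if url_entity.get("unwound_url"):
                # Use the API's own unwound URL when it was supplied.
                url_entity["unshortened_url"] = url_entity["unwound_url"]
                continue
            target = url_entity.get("expanded_url") or url_entity.get("url")
            if not target:
                continue
            if target not in cache:
                cache[target] = resolve(target)  # resolve each distinct URL once
            url_entity["unshortened_url"] = cache[target]
        yield json.dumps(tweet)


def main() -> None:
    """Read JSONL tweets from stdin and write the annotated tweets to stdout."""
    for out_line in unshorten_lines(sys.stdin):
        sys.stdout.write(out_line + "\n")
```

To use it like the command above, save it as a script (for example a hypothetical unshorten_v2.py), call main() under an `if __name__ == "__main__":` guard, and pipe the flattened archive through it. Resolving one URL at a time is slow for large archives; the original unshrtn.py uses a multiprocessing pool (visible in the traceback), and a similar speed-up could come from resolving the distinct URLs in parallel before rewriting the lines.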
Issue Analytics
- Created 2 years ago
- Comments: 14 (9 by maintainers)
Top GitHub Comments
I’ve documented my URL analysis in this wiki page https://github.com/DocNow/twarc/wiki/Analyzing-links-in-tweets
I’m going to open a ticket for the missing entities.urls on referenced tweets in twarc-csv.
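Links in retweets and quoted tweets may live on the referenced tweet rather than on the tweet itself, so a URL analysis that only reads the top-level entities.urls can miss them. Below is a minimal sketch that collects links from a flattened tweet and its referenced tweets. It assumes that each referenced_tweets entry has been expanded into the full referenced tweet object (as twarc2's flatten output appears to do); entries carrying only "type" and "id" simply contribute no URLs. The function names are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple


def urls_from_entities(tweet: Dict) -> List[str]:
    """Return expanded URLs from a v2 tweet, tolerating missing entities/urls."""
    urls = []
    for url_entity in tweet.get("entities", {}).get("urls", []):
        url = url_entity.get("expanded_url") or url_entity.get("url")
        if url:
            urls.append(url)
    return urls


def all_urls(tweet: Dict) -> Iterator[Tuple[str, str]]:
    """Yield (source, url) pairs for a tweet and its referenced tweets.

    source is "tweet" for the tweet itself, or the reference type
    (e.g. "retweeted", "quoted", "replied_to") for a referenced tweet.
    ASSUMPTION: referenced_tweets entries are expanded tweet objects,
    as in flattened twarc2 output.
    """
    for url in urls_from_entities(tweet):
        yield "tweet", url
    for ref in tweet.get("referenced_tweets", []):
        for url in urls_from_entities(ref):
            yield ref.get("type", "referenced"), url
```

Keeping the reference type alongside each URL lets an analysis count links from quoted or retweeted content separately from links the author posted directly.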