import: can't handle weird encoding on url
- version: 0.35.7+4e5ab2
- method of installation: git clone && pip install -e .
- kernel name: Linux
dvc import -v 'https://ru.wikipedia.org/wiki/%D0%94%D0%BE%D1%81%D1%82%D0%BE%D0%B5%D0%B2%D1%81%D0%BA%D0%B8%D0%B9,_%D0%A4%D1%91%D0%B4%D0%BE%D1%80_%D0%9C%D0%B8%D1%85%D0%B0%D0%B9%D0%BB%D0%BE%D0%B2%D0%B8%D1%87' cyrilic.html
DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
DEBUG: PRAGMA user_version = 3;
DEBUG: SELECT count from state_info WHERE rowid=1
DEBUG: fetched: [(0,)]
DEBUG: UPDATE state_info SET count = 0 WHERE rowid = 1
DEBUG: Path /home/mroutis/tmp/.dvc/cache inode 1610975953
DEBUG: INSERT OR REPLACE INTO state(inode, size, mtime, timestamp, md5) VALUES (1610975953, "6", "1556756110696766208", "1556756118497520896", "")
DEBUG: PRAGMA user_version;
DEBUG: fetched: [(3,)]
DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
DEBUG: PRAGMA user_version = 3;
DEBUG: Removing output 'cyrilic.html' of 'cyrilic.html.dvc'.
DEBUG: Removing 'cyrilic.html'
Importing 'https://ru.wikipedia.org/wiki/%D0%94%D0%BE%D1%81%D1%82%D0%BE%D0%B5%D0%B2%D1%81%D0%BA%D0%B8%D0%B9,_%D0%A4%D1%91%D0%B4%D0%BE%D1%80_%D0%9C%D0%B8%D1%85%D0%B0%D0%B9%D0%BB%D0%BE%D0%B2%D0%B8%D1%87' -> '/home/mroutis/tmp/cyrilic.html'
DEBUG: Computed stage 'cyrilic.html.dvc' md5: '742c553c2b86e0598eef4d225c10e00c'
DEBUG: Downloading 'https://ru.wikipedia.org/wiki/%D0%94%D0%BE%D1%81%D1%82%D0%BE%D0%B5%D0%B2%D1%81%D0%BA%D0%B8%D0%B9,_%D0%A4%D1%91%D0%B4%D0%BE%D1%80_%D0%9C%D0%B8%D1%85%D0%B0%D0%B9%D0%BB%D0%BE%D0%B2%D0%B8%D1%87' to '/home/mroutis/tmp/cyrilic.html'
[##############################] 100% cyrilic.html
DEBUG: SELECT count from state_info WHERE rowid=1
DEBUG: fetched: [(0,)]
DEBUG: UPDATE state_info SET count = 0 WHERE rowid = 1
DEBUG: Path /home/mroutis/tmp/.dvc/cache inode 1610975953
DEBUG: INSERT OR REPLACE INTO state(inode, size, mtime, timestamp, md5) VALUES (1610975953, "6", "1556756110696766208", "1556756119515656960", "")
ERROR: unexpected error
------------------------------------------------------------
Traceback (most recent call last):
File "/home/mroutis/src/iterative/dvc/dvc/main.py", line 38, in main
ret = cmd.run_cmd()
File "/home/mroutis/src/iterative/dvc/dvc/command/base.py", line 60, in run_cmd
return self.run()
File "/home/mroutis/src/iterative/dvc/dvc/command/imp.py", line 23, in run
self.args.url, out, self.args.resume, fname=self.args.file
File "/home/mroutis/src/iterative/dvc/dvc/repo/scm_context.py", line 4, in run
result = method(repo, *args, **kw)
File "/home/mroutis/src/iterative/dvc/dvc/repo/imp.py", line 19, in imp
stage.run(resume=resume)
File "/home/mroutis/src/iterative/dvc/dvc/stage.py", line 834, in run
self.save()
File "/home/mroutis/src/iterative/dvc/dvc/stage.py", line 689, in save
dep.save()
File "/home/mroutis/src/iterative/dvc/dvc/output/base.py", line 211, in save
self.info = self.remote.save_info(self.path_info)
File "/home/mroutis/src/iterative/dvc/dvc/remote/base.py", line 277, in save_info
assert path_info["scheme"] == self.scheme
AssertionError
------------------------------------------------------------
Having any troubles?. Hit us up at https://dvc.org/support, we are always happy to help!
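For reference, the URL in the command above is ordinary percent-encoded UTF-8. A minimal sketch, using only the Python standard library, that splits the URL and decodes the page title; the variable names are illustrative and nothing here is taken from DVC's code:

```python
from urllib.parse import unquote, urlsplit

# The URL from the report, split across lines only for readability.
URL = (
    "https://ru.wikipedia.org/wiki/"
    "%D0%94%D0%BE%D1%81%D1%82%D0%BE%D0%B5%D0%B2%D1%81%D0%BA%D0%B8%D0%B9,_"
    "%D0%A4%D1%91%D0%B4%D0%BE%D1%80_"
    "%D0%9C%D0%B8%D1%85%D0%B0%D0%B9%D0%BB%D0%BE%D0%B2%D0%B8%D1%87"
)

parts = urlsplit(URL)
scheme = parts.scheme

# unquote() decodes percent-escapes as UTF-8 by default.
page_title = unquote(parts.path.rsplit("/", 1)[-1])

print(scheme, page_title)
```

The scheme parses as plain `https` and the path decodes cleanly, so the escapes themselves are well-formed.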
Issue Analytics
- State:
- Created 4 years ago
- Reactions:2
- Comments:6 (4 by maintainers)
No, @efiop 🤔
dvc version:
Alright. The original command fails because there is no ETag or MD5 for that link, which is ok.
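A rough sketch of the kind of check described in that last comment, assuming the checksum would arrive in an `ETag` or `Content-MD5` response header. The helper `find_checksum` is hypothetical and is not DVC's implementation; it only shows how a response without either header leaves nothing to record:

```python
def find_checksum(headers):
    """Return (header_name, value) for the first checksum-like header, or None.

    `headers` is a plain dict of HTTP response headers. Header names are
    matched case-insensitively. Assumption: ETag is preferred over Content-MD5.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in ("etag", "content-md5"):
        value = lowered.get(name)
        if value:
            return name, value
    return None


# A response with an ETag yields a usable checksum.
print(find_checksum({"ETag": '"abc123"', "Content-Type": "text/html"}))

# A response with neither header yields None, so there is nothing to save.
print(find_checksum({"Content-Type": "text/html"}))
```

With a helper like this, the caller could report a clear "no checksum available" error rather than failing on an assertion later.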