import: can't handle weird encoding on url

  • version: 0.35.7+4e5ab2
  • method of installation: git clone && pip install -e .
  • kernel name: Linux
dvc import -v 'https://ru.wikipedia.org/wiki/%D0%94%D0%BE%D1%81%D1%82%D0%BE%D0%B5%D0%B2%D1%81%D0%BA%D0%B8%D0%B9,_%D0%A4%D1%91%D0%B4%D0%BE%D1%80_%D0%9C%D0%B8%D1%85%D0%B0%D0%B9%D0%BB%D0%BE%D0%B2%D0%B8%D1%87' cyrilic.html
DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
DEBUG: PRAGMA user_version = 3;
DEBUG: SELECT count from state_info WHERE rowid=1
DEBUG: fetched: [(0,)]
DEBUG: UPDATE state_info SET count = 0 WHERE rowid = 1
DEBUG: Path /home/mroutis/tmp/.dvc/cache inode 1610975953
DEBUG: INSERT OR REPLACE INTO state(inode, size, mtime, timestamp, md5) VALUES (1610975953, "6", "1556756110696766208", "1556756118497520896", "")
DEBUG: PRAGMA user_version;
DEBUG: fetched: [(3,)]
DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
DEBUG: PRAGMA user_version = 3;
DEBUG: Removing output 'cyrilic.html' of 'cyrilic.html.dvc'.
DEBUG: Removing 'cyrilic.html'
Importing 'https://ru.wikipedia.org/wiki/%D0%94%D0%BE%D1%81%D1%82%D0%BE%D0%B5%D0%B2%D1%81%D0%BA%D0%B8%D0%B9,_%D0%A4%D1%91%D0%B4%D0%BE%D1%80_%D0%9C%D0%B8%D1%85%D0%B0%D0%B9%D0%BB%D0%BE%D0%B2%D0%B8%D1%87' -> '/home/mroutis/tmp/cyrilic.html'
DEBUG: Computed stage 'cyrilic.html.dvc' md5: '742c553c2b86e0598eef4d225c10e00c'
DEBUG: Downloading 'https://ru.wikipedia.org/wiki/%D0%94%D0%BE%D1%81%D1%82%D0%BE%D0%B5%D0%B2%D1%81%D0%BA%D0%B8%D0%B9,_%D0%A4%D1%91%D0%B4%D0%BE%D1%80_%D0%9C%D0%B8%D1%85%D0%B0%D0%B9%D0%BB%D0%BE%D0%B2%D0%B8%D1%87' to '/home/mroutis/tmp/cyrilic.html'
[##############################] 100% cyrilic.html
DEBUG: SELECT count from state_info WHERE rowid=1
DEBUG: fetched: [(0,)]
DEBUG: UPDATE state_info SET count = 0 WHERE rowid = 1
DEBUG: Path /home/mroutis/tmp/.dvc/cache inode 1610975953
DEBUG: INSERT OR REPLACE INTO state(inode, size, mtime, timestamp, md5) VALUES (1610975953, "6", "1556756110696766208", "1556756119515656960", "")
ERROR: unexpected error
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/mroutis/src/iterative/dvc/dvc/main.py", line 38, in main
    ret = cmd.run_cmd()
  File "/home/mroutis/src/iterative/dvc/dvc/command/base.py", line 60, in run_cmd
    return self.run()
  File "/home/mroutis/src/iterative/dvc/dvc/command/imp.py", line 23, in run
    self.args.url, out, self.args.resume, fname=self.args.file
  File "/home/mroutis/src/iterative/dvc/dvc/repo/scm_context.py", line 4, in run
    result = method(repo, *args, **kw)
  File "/home/mroutis/src/iterative/dvc/dvc/repo/imp.py", line 19, in imp
    stage.run(resume=resume)
  File "/home/mroutis/src/iterative/dvc/dvc/stage.py", line 834, in run
    self.save()
  File "/home/mroutis/src/iterative/dvc/dvc/stage.py", line 689, in save
    dep.save()
  File "/home/mroutis/src/iterative/dvc/dvc/output/base.py", line 211, in save
    self.info = self.remote.save_info(self.path_info)
  File "/home/mroutis/src/iterative/dvc/dvc/remote/base.py", line 277, in save_info
    assert path_info["scheme"] == self.scheme
AssertionError
------------------------------------------------------------

Having any troubles?. Hit us up at https://dvc.org/support, we are always happy to help!
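
For context, the URL in the command above uses ordinary UTF-8 percent-encoding rather than an unusual scheme. The following is a minimal sketch using only Python's standard library (it is not part of DVC, and the variable names are illustrative) that decodes the path and checks that re-encoding reproduces the same escapes:

```python
from urllib.parse import quote, unquote, urlsplit

# The URL from the failing command, split across lines for readability.
URL = (
    "https://ru.wikipedia.org/wiki/"
    "%D0%94%D0%BE%D1%81%D1%82%D0%BE%D0%B5%D0%B2%D1%81%D0%BA%D0%B8%D0%B9,"
    "_%D0%A4%D1%91%D0%B4%D0%BE%D1%80_"
    "%D0%9C%D0%B8%D1%85%D0%B0%D0%B9%D0%BB%D0%BE%D0%B2%D0%B8%D1%87"
)

parts = urlsplit(URL)
decoded_path = unquote(parts.path)  # percent-escapes -> UTF-8 text

print(parts.scheme)   # https
print(decoded_path)   # /wiki/Достоевский,_Фёдор_Михайлович

# Re-encoding the decoded path yields the original escapes,
# so the URL round-trips cleanly ("/" and "," are left unescaped).
assert quote(decoded_path, safe="/,") == parts.path
```

The scheme parses as `https` and the path round-trips, so the encoding itself appears well-formed.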

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
ghost commented, Aug 28, 2019

No, @efiop 🤔

dvc import-url -v 'https://upload.wikimedia.org/wikipedia/commons/7/78/Vasily_Perov_-_%D0%9F%D0%BE%D1%80%D1%82%D1%80%D0%B5%D1%82_%D0%A4.%D0%9C.%D0%94%D0%BE%D1%81%D1%82%D0%BE%D0%B5%D0%B2%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_-_Google_Art_Project.jpg' dostoyevsky.jpg
DEBUG: PRAGMA user_version;
DEBUG: fetched: [(3,)]
DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
DEBUG: PRAGMA user_version = 3;
DEBUG: SELECT count from state_info WHERE rowid=?
DEBUG: fetched: [(18,)]
DEBUG: UPDATE state_info SET count = ? WHERE rowid = ?
DEBUG: PRAGMA user_version;
DEBUG: fetched: [(3,)]
DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
DEBUG: PRAGMA user_version = 3;
DEBUG: Removing output 'dostoyevsky.jpg' of 'dostoyevsky.jpg.dvc'.
Importing 'https://upload.wikimedia.org/wikipedia/commons/7/78/Vasily_Perov_-_%D0%9F%D0%BE%D1%80%D1%82%D1%80%D0%B5%D1%82_%D0%A4.%D0%9C.%D0%94%D0%BE%D1%81%D1%82%D0%BE%D0%B5%D0%B2%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_-_Google_Art_Project.jpg' -> 'dostoyevsky.jpg'
DEBUG: Computed stage 'dostoyevsky.jpg.dvc' md5: '68dac73c66eb94dbd609e4b87c746ac0'
DEBUG: Downloading 'https://upload.wikimedia.org/wikipedia/commons/7/78/Vasily_Perov_-_%D0%9F%D0%BE%D1%80%D1%82%D1%80%D0%B5%D1%82_%D0%A4.%D0%9C.%D0%94%D0%BE%D1%81%D1%82%D0%BE%D0%B5%D0%B2%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_-_Google_Art_Project.jpg' to 'dostoyevsky.jpg'
DEBUG: Path dostoyevsky.jpg inode 806058497
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: []
DEBUG: Path dostoyevsky.jpg inode 806058497
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: []
DEBUG: INSERT INTO state(inode, mtime, size, md5, timestamp) VALUES (?, ?, ?, ?, ?)
DEBUG: {'dostoyevsky.jpg': 'modified'}
DEBUG: Path dostoyevsky.jpg inode 806058497
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: [('1567023651993397248', '3876258', '6c1c99252185bd6c8b1bbde0493cfd6a', '1567023653339473920')]
DEBUG: UPDATE state SET timestamp = ? WHERE inode = ?
DEBUG: Computed stage 'dostoyevsky.jpg.dvc' md5: '5559f59d09f285b0ffa397b3fe47c66d'
DEBUG: cache '.dvc/cache/6c/1c99252185bd6c8b1bbde0493cfd6a' expected '6c1c99252185bd6c8b1bbde0493cfd6a' actual 'None'
Saving 'dostoyevsky.jpg' to '.dvc/cache/6c/1c99252185bd6c8b1bbde0493cfd6a'.
DEBUG: cache '.dvc/cache/6c/1c99252185bd6c8b1bbde0493cfd6a' expected '6c1c99252185bd6c8b1bbde0493cfd6a' actual 'None'
DEBUG: Created 'reflink': .dvc/cache/6c/1c99252185bd6c8b1bbde0493cfd6a -> dostoyevsky.jpg
DEBUG: Path dostoyevsky.jpg inode 806973593
DEBUG: REPLACE INTO link_state(path, inode, mtime) VALUES (?, ?, ?)
DEBUG: Path dostoyevsky.jpg inode 806973593
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: []
DEBUG: INSERT INTO state(inode, mtime, size, md5, timestamp) VALUES (?, ?, ?, ?, ?)
DEBUG: Path .dvc/cache/6c/1c99252185bd6c8b1bbde0493cfd6a inode 806058497
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: [('1567023651993397248', '3876258', '6c1c99252185bd6c8b1bbde0493cfd6a', '1567023653340432384')]
DEBUG: UPDATE state SET mtime = ?, size = ?, md5 = ?, timestamp = ? WHERE inode = ?
DEBUG: SELECT count from state_info WHERE rowid=?
DEBUG: fetched: [(18,)]
DEBUG: UPDATE state_info SET count = ? WHERE rowid = ?
Saving information to 'dostoyevsky.jpg.dvc'.

dvc version:

DVC version: 0.57.0+5d05e9
Python version: 3.7.4
Platform: Linux-5.2.9-arch1-1-ARCH-x86_64-with-arch
Binary: False
Cache: reflink - True, hardlink - True, symlink - True
Filesystem type (cache directory): ('xfs', '/dev/mapper/vg-root')
Filesystem type (workspace): ('xfs', '/dev/mapper/vg-root')

0 reactions
efiop commented, Aug 29, 2019

Alright. The original command fails because the server provides no ETag or MD5 for that link, which is OK.
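
To check whether a given link exposes either value, one can inspect its HTTP response headers. The sketch below is not DVC's code: the helper name `find_checksum` is hypothetical, and treating `Content-MD5` as the "MD5" header is an assumption. It takes a plain mapping of headers so it can be tried without network access.

```python
from typing import Mapping, Optional, Tuple


def find_checksum(headers: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (header_name, value) for the first checksum-like header, else None.

    Hypothetical helper for illustration; not taken from DVC.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in ("etag", "content-md5"):
        value = lowered.get(name)
        if value:
            return name, value
    return None


# A response with an ETag is usable; one with neither header is not.
print(find_checksum({"ETag": '"abc123"', "Content-Type": "image/jpeg"}))
print(find_checksum({"Content-Type": "text/html"}))
```

To try it against a live URL, pass the headers from any HTTP response object converted to a dict.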
