repro: invalid argument quoting during dictionary interpolation
During dictionary interpolation, values are always enclosed in single quotes. On Windows this always results in an error because the cmd shell does not treat single quotes as quoting characters. On Linux this results in an error at least when the value contains a single quote.
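The Linux half of the problem can be illustrated without DVC. The sketch below is only an approximation: `naive_single_quote` is a hypothetical stand-in for the behavior described above (it is not DVC's actual code), and `shlex.split` is used as a rough model of POSIX shell parsing.

```python
import shlex


def naive_single_quote(value: str) -> str:
    # Hypothetical stand-in for "always wrap the value in single quotes".
    return f"'{value}'"


for value in ("testing quotes", "test'n quotes"):
    naive = naive_single_quote(value)
    try:
        parsed = shlex.split(naive)
    except ValueError as exc:
        # An embedded single quote leaves the quoting unbalanced.
        parsed = f"parse error: {exc}"
    print("naive:", naive, "->", parsed)

    # shlex.quote escapes the embedded quote, so the value round-trips.
    safe = shlex.quote(value)
    print("shlex:", safe, "->", shlex.split(safe))
```

The value without a single quote survives either way, while the value containing one can only be parsed back correctly when it has been escaped with `shlex.quote`.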
Reproduce
A testing dvc.yaml:
vars:
- param:
param_1:
param: testing quotes
param_2:
param: test'n quotes
stages:
  test_quotes_1: # This fails on Windows
    cmd: python -c "import sys; assert sys.argv[2] == 'testing quotes', sys.argv[2]" ${param_1} && echo Passed || echo Failed
  test_quotes_2: # This fails on both Windows and Linux
    cmd: python -c "import sys; assert sys.argv[2] == 'test\'n quotes', sys.argv[2]" ${param_2} && echo Passed || echo Failed
- dvc init --no-scm
- Create a dvc.yaml as above
- dvc repro
On Windows: both stages print “Failed”.
On Linux: test_quotes_1 prints “Passed”; test_quotes_2 fails to execute due to a bash syntax error.
Expected
Both stages print “Passed”
Environment information
Output of dvc doctor:
Windows:
DVC version: 2.27.2 (pip)
---------------------------------
Platform: Python 3.9.7 on Windows-10-10.0.19044-SP0
Subprojects:
dvc_data = 0.10.0
dvc_objects = 0.4.0
dvc_render = 0.0.11
dvc_task = 0.1.2
dvclive = 0.10.0
scmrepo = 0.1.1
Supports:
http (aiohttp = 3.7.4.post0, aiohttp-retry = 2.8.3),
https (aiohttp = 3.7.4.post0, aiohttp-retry = 2.8.3),
s3 (s3fs = 2022.8.2, boto3 = 1.24.78)
Cache types: hardlink
Cache directory: NTFS on C:\
Caches: local
Remotes: local
Workspace directory: NTFS on C:\
Repo: dvc, git
Linux:
DVC version: 2.27.2 (pip)
---------------------------------
Platform: Python 3.9.7 on Linux-4.14.268-205.500.amzn2.x86_64-x86_64-with-glibc2.26
Subprojects:
dvc_data = 0.10.0
dvc_objects = 0.4.0
dvc_render = 0.0.11
dvc_task = 0.1.2
dvclive = 0.11.0
scmrepo = 0.1.1
Supports:
http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
s3 (s3fs = 2022.8.2, boto3 = 1.24.59)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: None
Workspace directory: xfs on /dev/xvda1
Repo: dvc (no_scm)
Top GitHub Comments
@daavoo, unfortunately different shells use different parsing rules, so the problem can’t be solved in a platform-independent way in principle. For UNIX-like shells the proper way is to use shlex.quote().
For Windows one can use subprocess.list2cmdline. It is an internal, undocumented function, but I don’t know a better option, and the algorithm is not trivial (see also How to escape os.system() calls?). It should be mentioned that for os.system(), subprocess.list2cmdline is not sufficient; one should also escape special characters. But if you always use the subprocess module, it is enough.
Let me clarify. Dict unpacking is not just for fun. The result is usually executed, so it is essential that the resulting command line is well-formed from the point of view of the shell in which it is going to be executed, and that this command line produces exactly the expected result. Different shells use different syntax, so if you pass the same command line to different shells, it may fail to execute or may execute in an unexpected way. I’d understand if you just inserted values without modification. In that case users would have to insert quotes etc. themselves, but at least the result would be easily predictable. But you are trying to modify values: backslash-escape double quotes and add surrounding quotes. On Windows this may lead to utterly puzzling results. E.g. a single backslash will convert to a single double-quote!
This gets:
The algorithm to escape quotes and backslashes correctly is different on Windows and Linux. But fortunately, it is already implemented; you don’t need to invent it. Just use shlex.quote on Linux and subprocess.list2cmdline on Windows. If you want to understand why your approach fails on Windows, you can consult A Better Way To Understand Quoting and Escaping of Windows Command Line Arguments.
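A minimal sketch of that suggestion follows. The helper names `quote_arg` and `dict_to_args`, the `--key value` rendering, and the `windows` override flag are assumptions made for illustration; they are not DVC's API. Only `shlex.quote` and `subprocess.list2cmdline` come from the standard library, and the latter is undocumented, as noted above.

```python
import os
import shlex
import subprocess
from typing import Dict, Optional


def quote_arg(value: str, windows: Optional[bool] = None) -> str:
    """Quote a single argument using the rules of the target platform."""
    if windows is None:
        windows = os.name == "nt"
    if windows:
        # Undocumented stdlib helper; applies the Windows argument rules
        # (double quotes, backslash doubling before a closing quote).
        return subprocess.list2cmdline([value])
    # POSIX shells: single-quote the value and escape embedded quotes.
    return shlex.quote(value)


def dict_to_args(params: Dict[str, object], windows: Optional[bool] = None) -> str:
    """Hypothetical dict unpacking: {"param": "x"} becomes "--param x"."""
    parts = []
    for key, value in params.items():
        parts.append(f"--{key}")
        parts.append(quote_arg(str(value), windows))
    return " ".join(parts)


# The same dictionary rendered for each platform.
print(dict_to_args({"param": "test'n quotes"}, windows=False))
print(dict_to_args({"param": "test'n quotes"}, windows=True))
# A trailing backslash is doubled so it cannot escape the closing quote.
print(dict_to_args({"param": "a b\\"}, windows=True))
```

As the comment above points out, `subprocess.list2cmdline` alone does not escape cmd metacharacters, so this sketch is only adequate when the command line is handed to the `subprocess` module rather than to `os.system()`.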