Corner cases where getopt behavior is not mimicked: -- or --help as string values


The goal of this library is to mimic the behavior of getopt, but there are a few corner cases where this library behaves differently than getopt would: in the handling of -- or --help when they are the value of a string parameter.

How getopt behaves

First, an illustration of how getopt works with the particular corner case I’m demonstrating. Let’s look at the standard gzip and gunzip tools found with any Linux distribution. They take many options, but one of them is --suffix (or -S for short); this lets you specify a different suffix than the standard .gz for the compressed file. E.g. if you have a README.md file in the current directory, then gzip -S .compressed README.md will create a README.md.compressed file instead of README.md.gz.

Now, what do you think will happen if I run this command?

gzip -S -- README.md

The correct answer is that it will create a compressed file named README.md-- in the current directory. Because the string -- was specified immediately after an option that takes a string value, it was processed as the value for that option (the --suffix option), and so gzip created a file with a -- suffix instead of .gz. Now look at these three examples:

1. gzip -- --help
2. gzip -S -- --help
3. gzip -S -- -- --help

What do you think these will do? Answer:

  1. This will compress a file named --help in the current directory, and create a file named --help.gz.
  2. This will print the help text, and do nothing else.
  3. This will compress a file named --help in the current directory, and create a file named --help--.

Why did gzip -S -- --help print the help text? Because -- was the value for the -S option, and so it was not treated as the “stop processing options now” marker. Then, after the -S option was fully processed, the only remaining argument was --help. Since --help was encountered as an option, gzip displayed the help screen and did nothing else.

With the gzip -S -- -- --help line, OTOH, the first -- became the value for the -S option. Then the second -- was processed as an option, and had the “stop processing options now” meaning. So --help was treated as a value (a filename), and gzip looked for a file named --help to compress. And since I specified that the compressed suffix should be --, the compressed file was named --help--.
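The three cases above can be reproduced with Python's standard getopt module, which follows the same conventions as C getopt for this situation. This is only an illustration: the option table below (S: plus suffix= and help) is a hypothetical stand-in for gzip's real option list.

```python
import getopt

# Hypothetical stand-in for gzip's option table:
# -S/--suffix takes a value, --help does not.
SHORT = "S:"
LONG = ["suffix=", "help"]

for argv in (["--", "--help"],
             ["-S", "--", "--help"],
             ["-S", "--", "--", "--help"]):
    opts, operands = getopt.getopt(argv, SHORT, LONG)
    print(argv, "-> options:", opts, "operands:", operands)
```

In the second case -- is reported as the value of -S and --help as an option; in the third case --help comes back as an operand (a filename).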

What CommandLine does

The current way CommandLine works is to call a preprocessor function to look for any -- options and, if found, mark anything found after them as a value. This means that in the gzip -S -- --help example, where the correct getopt-mimicking behavior would be to print the help text, CommandLine will instead return an error saying that -S needed a value and didn’t get one.

This corner case actually shows a fundamental difference between the behavior of CommandLine and the behavior of getopt. CommandLine uses a tokenizer to parse the command-line arguments and decide, based on the presence of - or -- at the front, to treat them as Name tokens or Value tokens. But if you read the getopt source code and figure out what it’s actually doing, it’s parsing one argument at a time, deciding whether that argument needs a value, and then if a value is needed, it swallows the next argument without further processing. Which is why you can pass -- as the suffix in gzip, and it will happily accept that.
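To make the difference concrete, here is a deliberately simplified sketch of prefix-based classification combined with a -- pre-scan. It is not CommandLine's actual code; the function name and the token labels are made up for illustration.

```python
def prefix_tokenize(args):
    """Classify each argument by its prefix alone, with a -- pre-scan."""
    tokens = []
    after_dashdash = False
    for arg in args:
        if after_dashdash:
            # Everything after -- is forced to be a plain value.
            tokens.append(("forced_value", arg))
        elif arg == "--":
            after_dashdash = True
        elif arg.startswith("-") and arg != "-":
            tokens.append(("name", arg.lstrip("-")))
        else:
            tokens.append(("value", arg))
    return tokens


print(prefix_tokenize(["-S", "--", "--help"]))
# [('name', 'S'), ('forced_value', '--help')]
```

Here -- never becomes the value of S, so a parser consuming these tokens sees S with no value of its own, which is the missing-value error described above.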

What CommandLine should do

The tokenizer, instead of processing all the arguments at once and deciding whether they’re names or values, should process each argument one at a time. Then the decision tree should look like:

  • Is this option exactly -- and EnableDashDash is true? Then stop processing; the rest of the arguments are all values.
  • Is this option exactly -- and EnableDashDash is false? Then it is the value --; continue processing the next argument.
  • Does this option start with -- and contain an equals sign? Then split it into two tokens, the part before the = is the name, and the part after the equals is the value. (Split at the first equals sign; any equals signs after that point would become part of the value).
  • Does this option start with -- and not contain an equals sign? Then we look at the list of option longnames that the tokenizer was given:
    • Name matches a boolean option: this is a name token. Resume tokenizing with the next argument (it is NOT swallowed).
    • NEW FEATURE: Name matches an int option and the option attribute has AllowMultiple=true: this is a name token. Resume tokenizing with the next argument (it is NOT swallowed). (This allows for things like -v or --verbose to be passed multiple times, like -vvv, which the parser will turn into Verbose=3 in the final options instance.)
    • Name matches an option that’s neither of the two cases above (boolean or int with AllowMultiple): this is a name token, and the next argument is a value token no matter what it is. “Swallow” the next argument, and resume tokenizing with the argument after next.
  • Does the option start with - and contain only letters that match shortnames? Split it into multiple shortnames. (I.e., -lR would become Name("l"), Name("R") if there are both -l and -R options).
  • Does the option start with - and its first letter matches a shortname, but the rest does not? Split it into first letter & rest, and that’s two tokens: Name(first letter) and Value(rest).
  • Does the option start with - and have only one letter? Then it’s a shortname, and we look at the type of the option with that shortname:
    • As above, if boolean, then don’t swallow the next argument.
    • NEW FEATURE: As above, if int with AllowMultiple, then don’t swallow the next argument.
    • As above, if other type, then swallow the next argument (WHATEVER it is) and treat it as a value.
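The decision tree above could be sketched roughly as follows. This is Python rather than CommandLine's C#, and every name in it (tokenize, takes_value, enable_dash_dash) is hypothetical. takes_value maps each known long or short name to True if the option swallows the next argument, and to False for booleans and for int options with AllowMultiple. The tree does not say what happens when a value-taking letter ends a group such as -lS, so this sketch leaves it as a bare name.

```python
def tokenize(args, takes_value, enable_dash_dash=True):
    """Tokenize one argument at a time, swallowing values where needed."""
    tokens = []
    i = 0
    while i < len(args):
        arg = args[i]
        i += 1
        if arg == "--":
            if enable_dash_dash:
                # Stop processing: the rest of the arguments are all values.
                tokens.extend(("value", a) for a in args[i:])
                break
            tokens.append(("value", arg))
        elif arg.startswith("--"):
            # Split at the first equals sign only.
            name, eq, value = arg[2:].partition("=")
            tokens.append(("name", name))
            if eq:
                tokens.append(("value", value))
            elif takes_value.get(name, False) and i < len(args):
                tokens.append(("value", args[i]))  # swallow, whatever it is
                i += 1
        elif arg.startswith("-") and len(arg) > 1:
            letters = arg[1:]
            if len(letters) == 1:
                tokens.append(("name", letters))
                if takes_value.get(letters, False) and i < len(args):
                    tokens.append(("value", args[i]))  # swallow next argument
                    i += 1
            elif all(ch in takes_value for ch in letters):
                # Group of known shortnames, e.g. -lR or -vvv.
                tokens.extend(("name", ch) for ch in letters)
            else:
                # First letter is the name, the rest is its value.
                tokens.append(("name", letters[0]))
                tokens.append(("value", letters[1:]))
        else:
            # Plain values, including a bare "-".
            tokens.append(("value", arg))
    return tokens


spec = {"S": True, "suffix": True, "help": False,
        "l": False, "R": False, "v": False}
print(tokenize(["-S", "--", "--help"], spec))
# [('name', 'S'), ('value', '--'), ('name', 'help')]
```

A name token that needed a value but reached the end of the arguments is emitted without one, leaving the missing-value error to the parser.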

Conclusion

Unfortunately, achieving getopt compatibility will need a big rewrite of the guts of CommandLine’s tokenizer and parser, so this is a big job. But the behavior I described above is how getopt works, so if we want to mimic getopt, that’s what it will take.

Also unfortunately, this is probably going to be a breaking change, so it might end up requiring a 3.0 version number. Some people might be very surprised when --stringoption --booloption ends up being parsed with --booloption as the string value of --stringoption; they would probably have come to expect that to produce a MissingValueOptionError for --stringoption. But surprise or not, the correct way to handle that is for --booloption to be the string value of --stringoption in that example.
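For comparison, this is how Python's standard getopt module (used here only as a convenient stand-in for getopt's conventions) handles that command line. The option names are the placeholder names from the paragraph above.

```python
import getopt

# "stringoption=" takes a value; "booloption" does not.
opts, operands = getopt.getopt(
    ["--stringoption", "--booloption"],
    "",
    ["stringoption=", "booloption"],
)
print(opts)      # [('--stringoption', '--booloption')]
print(operands)  # []
```

The second argument is swallowed as the value of --stringoption without any further inspection.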

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

rmunn commented, Apr 5, 2020

@Ozzard #607 also treats a bare - as a value — see https://github.com/commandlineparser/commandline/pull/607/files#diff-c55127e12f4102753e3927ba25bfba42R59 — for precisely that reason. It’s up to you to convert the - into stdin or stdout as appropriate, but it will be a value and not an empty option.

rmunn commented, Jun 10, 2020

Ah, so you were using POSIXLY_CORRECT in those options. I didn’t understand that, since you didn’t show it in the grep command lines you posted, and I don’t know anyone who has it set by default in their .bashrc because the default getopt behavior is so much more useful than the POSIX standard behavior.

And you’re arguing that CLP should mimic the POSIX standard by default, whereas I’m arguing that it should mimic getopt’s default (non-POSIX) behavior by default.

Actually, it will be pretty easy to allow both; I’ll tweak PR #607 to add a ParserSettings option called PosixlyCorrect that turns on the POSIX behavior (stop processing options after the first non-option argument), and I’ll also make it honor the POSIXLY_CORRECT environment variable so that end users who expect that behavior can make it happen. After doing a bit of Googling on the subject myself, I’ve come to the conclusion that sometimes POSIXLY_CORRECT is what you want, but most of the time it’s not, since most people write Bash scripts with the assumption that getopt’s default mixed-options-and-values behavior is what they’re going to get. So allowing for both behaviors is definitely the right thing to do here. I’ll leave it defaulting to mixed, since it seems that that’s what most people expect, but there will be a ParserSettings option to change that (like putting a + in front of the options string of getopt).
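The two modes can be seen side by side with Python's getopt.gnu_getopt, which implements the mixed default, the leading + switch, and the POSIXLY_CORRECT check. This only illustrates the getopt conventions; it says nothing about how the PosixlyCorrect setting would be implemented in CLP, and the -a and -b options are made up.

```python
import getopt
import os

# Keep the demo deterministic: gnu_getopt honors this variable if it is set.
os.environ.pop("POSIXLY_CORRECT", None)

argv = ["-a", "file.txt", "-b"]

# Default GNU behavior: options and non-option arguments may be mixed.
mixed = getopt.gnu_getopt(argv, "ab")

# A leading "+" stops option processing at the first non-option argument.
posix = getopt.gnu_getopt(argv, "+ab")

# The environment variable has the same effect as the "+" prefix.
os.environ["POSIXLY_CORRECT"] = "1"
env_posix = getopt.gnu_getopt(argv, "ab")
del os.environ["POSIXLY_CORRECT"]

print(mixed)      # ([('-a', ''), ('-b', '')], ['file.txt'])
print(posix)      # ([('-a', '')], ['file.txt', '-b'])
print(env_posix)  # ([('-a', '')], ['file.txt', '-b'])
```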

As for the question of validation of option values, I am firmly convinced that CLP should do exactly as much validation as is needed to validate the types of the options, and nothing more. I.e., if -s is a string option and -n is a number (say an int) option, then -n foo should be rejected, but -n -1 should be accepted and put the value -1 (negative one) into the Number property. And -s foo should be accepted, and so should -s -1, because CLP cannot know the end user’s intent. What if the end user preferred having tarballs with a .tar-gz extension instead of .tar.gz? If getopt worked the way CLP currently does, gzip -S -gz file.tar would throw an error, instead of producing the file.tar-gz file that the user wanted. But since opinion clearly does differ on this subject, I’ll put in another ParserSettings option to change that, and forbid string values starting with a - (except for the bare - value which means “stdin/stdout”, and should always be allowable as a string value). I have a feeling that most people will want to permit string values that start with -, so I think that most people will want to turn that particular option off, but in deference to CLP’s current behavior I’ll default that one to on so that the “no options that start with -” validation is kept by default.

AFAICT, the changes I made to the parser don’t change the validation of ints or other types: -n foo will still produce a parser error when it tries to convert “foo” to an integer. So I only really need to care about this for string values, because integer values in particular need to be able to allow -1 and the like.
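A rough sketch of the validation policy described in the last two paragraphs, under stated assumptions: convert_value and forbid_dash_strings are hypothetical names, not CLP API, and the default mirrors the "on by default" choice mentioned above.

```python
def convert_value(option_type, raw, forbid_dash_strings=True):
    """Validate only what the option's type requires."""
    if option_type is int:
        # int() accepts "-1" and raises ValueError for "foo".
        return int(raw)
    # String option: optionally reject values that look like options,
    # but always allow the bare "-" that conventionally means stdin/stdout.
    if forbid_dash_strings and raw.startswith("-") and raw != "-":
        raise ValueError(f"string value {raw!r} starts with '-'")
    return raw


print(convert_value(int, "-1"))                             # -1
print(convert_value(str, "-gz", forbid_dash_strings=False))  # -gz
print(convert_value(str, "-"))                               # -
```

With the setting left on, convert_value(str, "-gz") raises ValueError, and convert_value(int, "foo") raises ValueError regardless of the setting.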
