confusion about parseInteger
I just tried to solve a Codegolf task, paraphrased:
Given a string, parse it in all base-n representations (n in 2…36) where that makes sense (i.e. no digit >= n), and add all the results together.
This looks like a straightforward implementation:
Integer baseSum(String string) =>
sum((2..36).map((radix) => parseInteger(string, radix) else 0));
(This function tries to parse the string in each base, uses 0 where that doesn’t succeed, and sums up the results.)
It works for most examples, but when passed "2T", it returns 2000000000665 instead of the wanted 665.
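For comparison, here is a minimal sketch in Python (not Ceylon) of the strict digit-by-digit behavior the task expects. The names parse_in_base and base_sum are made up for illustration, and treating letters case-insensitively is an assumption of the sketch, not something taken from the Ceylon documentation.

```python
import string

# '0'..'9' map to 0..9 and 'a'..'z' map to 10..35
DIGITS = string.digits + string.ascii_lowercase

def parse_in_base(text, radix):
    """Return the value of text in the given radix, or None if it is not valid."""
    if not text:
        return None
    value = 0
    for ch in text.lower():
        digit = DIGITS.find(ch)
        # Reject characters that are not digits at all, or too large for this radix
        if digit < 0 or digit >= radix:
            return None
        value = value * radix + digit
    return value

def base_sum(text):
    """Sum the values of text over all radixes 2..36, counting failures as 0."""
    return sum(parse_in_base(text, radix) or 0 for radix in range(2, 37))

print(base_sum("2T"))  # 665
```

With this strict reading, "2T" is only valid in bases 30 to 36, because T has the digit value 29.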
It turns out that parseInteger(string, radix) has some special handling when radix is 10: then the suffixes “kMGTP” have their SI prefix meaning, so this is “2 Tera + 665”.
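The arithmetic behind “2 Tera + 665” can be checked with a few lines of Python. This is only a worked calculation of the two contributions, not a model of how parseInteger is implemented; the variable names are illustrative.

```python
TERA = 10**12  # SI prefix T

# "2T" read in radix 10 with the SI suffix: 2 Tera
decimal_with_suffix = 2 * TERA

# "2T" read as plain digits: 'T' has digit value 29, so only radixes 30..36 apply
T_DIGIT = int("T", 36)  # 29
plain = sum(2 * radix + T_DIGIT for radix in range(30, 37))

print(plain)                        # 665
print(decimal_with_suffix + plain)  # 2000000000665
```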
The documentation seems to support this:
The syntax accepted by this function is the same as the syntax for an Integer literal in the Ceylon language except that it may optionally begin with a sign character (+ or -) and may not contain grouping underscore characters.
But that actually is also not true … while the “Numeric literals” section in the Ceylon language spec allows these suffixes, it also allows some prefixes.
Try to guess what this code prints:
print(2T);
print(parseInteger("2T"));
print(parseInteger("2T", 10));
print($100);
print(parseInteger("$100"));
print(parseInteger("$100", 2));
print(#100);
print(parseInteger("#100"));
print(parseInteger("#100", 16));
print(2_000);
print(parseInteger("2_000"));
print(parseInteger("2_000", 10));
The output (with Ceylon 1.2):
2000000000000
2000000000000
2000000000000
4
<null>
<null>
256
<null>
<null>
2000
<null>
<null>
So parseInteger doesn’t actually support the integer literal syntax, just the part of it which corresponds to decimal literals (and even that not completely, as the grouping using _ is also not supported).
parseInteger also behaves as expected (by me) for all bases other than 10, and just has this strange exception for accepting these SI-prefix-like suffixes.
I could imagine this alternative (“more sensible”, in my eyes) behavior, if we want to support both “literal parsing” and “arbitrary base parsing”:
- When given an explicit radix parameter (whether 10 or not), just parse the string in this base. I.e. the SI suffixes would not be allowed.
- When given no radix parameter (or some special value indicating this), parse the string as a Ceylon Integer literal (including both decimal literals with possible SI suffixes, and also binary and hexadecimal literals with their $ and # prefixes).
So the type of parseInteger could be Integer(String,Integer|\IceylonLiteral=), with a ceylonLiteral object indicating that the second version should be used (and this would be the same as not passing the parameter).
I would guess there are not that many programs actually relying on the current behavior compared to this one.
Top GitHub Comments
I don’t see how the number makes the assertion any less true; behaving differently 1 in a million times is still behaving differently.
And when, of those FFFFFF times, FFFFFE will be for plain integers without any modifiers, what is the use? In fact I’d like to throw in my own invented statistic: 99.9% of the time people will just want to parse normal integers. What is the use case for parsing Ceylon integers after all? I can only imagine one: you’re writing your own Ceylon parser. In all other cases people will have to write guards to first check they’re dealing only with digits before being able to use this function without modifiers (and for parseFloat that’s even worse).
But we could easily support an extra flag that enables or disables this behaviour. Or two different methods (one accepting a radix and no modifiers and one only for radix 10 with modifiers). That way these functions are more useful and can be used in a more diverse set of circumstances.
Ah, thanks, I missed this. Sorry for the confusion.