
long datatype is unusable with JavaScript and potentially other languages


Hi,

I started reading the 2.0 specification and immediately encountered a serious cross-language compatibility problem.

https://github.com/OAI/OpenAPI-Specification/blob/master/versions/2.0.md#data-types

The datatype “long” is defined as a JSON Schema integer with a format keyword of int64.

JSON Schema integers are of course implemented in JSON using the JSON number type. As RFC 7159 points out, although the JSON specification defines numbers as unbounded, the reality is that languages can and do refuse, or fail, to faithfully deserialize numbers outside of certain ranges. The most notable case is JavaScript, where all JSON numbers are deserialized into the sole JavaScript number type, which is 64-bit floating point and therefore cannot exactly represent every integer beyond 2^53.

https://tools.ietf.org/html/rfc7159#section-6
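To make the 2^53 limit concrete, here is a minimal sketch. It uses Python's float, which is an IEEE 754 double, the same representation as JavaScript's number type. The helper name `survives_double` is made up for this illustration.

```python
# Python's float is an IEEE 754 double, the same 64-bit format
# that backs JavaScript's number type.
MAX_SAFE_INTEGER = 2**53 - 1  # mirrors JavaScript's Number.MAX_SAFE_INTEGER


def survives_double(n: int) -> bool:
    """Return True if n converts to a double and back without changing."""
    return int(float(n)) == n


print(survives_double(MAX_SAFE_INTEGER))  # True
print(survives_double(2**53 + 1))         # False: rounds to 2**53
print(survives_double(2**63 - 1))         # False: the largest int64 rounds to 2**63
```

Any int64 value above 2^53 may silently come back as a different number, which is the failure mode a JavaScript client would see.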

The Google Discovery Document format, which Swagger drew a lot of inspiration from, recognized this problem and defined an int64 format that is backed by a JSON Schema string, not an integer. It seems that Swagger/OpenAPI changed this. If you were to actually use the datatype as defined in OpenAPI, you would be locking out all JavaScript clients from using your API.

A real-world example of this is Twitter’s APIs, which initially provided tweet IDs as a JSON number, and then discovered that JavaScript clients weren’t going to be able to handle that, so they had to add a second, string-based field containing the same value.
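As a rough illustration of why the second field helps, the sketch below parses a payload the way a double-only runtime would. The payload and its ID value are made up; the field names `id` and `id_str` follow Twitter's naming. Passing `parse_int=float` to Python's `json.loads` is only a stand-in for a parser that maps every JSON number to a double.

```python
import json

# Hypothetical payload; the ID value (2**53 + 1) is invented for the example.
payload = '{"id": 9007199254740993, "id_str": "9007199254740993"}'

# parse_int=float forces every JSON integer through a double,
# imitating a runtime whose only numeric type is 64-bit floating point.
as_doubles = json.loads(payload, parse_int=float)

numeric_id = int(as_doubles["id"])     # went through a double
string_id = int(as_doubles["id_str"])  # travelled as text, parsed afterwards

print(numeric_id)  # 9007199254740992 (off by one)
print(string_id)   # 9007199254740993 (intact)
```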

I have two proposals to address this issue.

The first is simpler, but does not address the larger problem of unbounded numbers in JSON: redefine the “long” datatype as a string.

The second is to drop the int32 and int64 formats entirely. Instead, mandate that all instances of integer and number types must provide explicit values for maximum and minimum to be considered valid. Additionally, do not allow maximum and minimum to exceed (2^31)-1 or be smaller than -(2^31). This would ensure that JSON numbers are used in the most cross-language compatible way possible.

Additional formats or perhaps common schemas could be added to support larger values via the string type.
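The second proposal could be enforced by a small lint pass over schema fragments. The following is only a sketch of that idea: `check_numeric_bounds` is a hypothetical helper, not part of any OpenAPI tool, and it looks at a single schema object represented as a plain dict.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def check_numeric_bounds(schema: dict) -> list:
    """Return a list of problems; an empty list means the schema passes."""
    if schema.get("type") not in ("integer", "number"):
        return []  # the proposed rule only applies to numeric types
    problems = []
    for key in ("minimum", "maximum"):
        value = schema.get(key)
        if value is None:
            problems.append(f"missing '{key}'")
        elif not INT32_MIN <= value <= INT32_MAX:
            problems.append(f"'{key}' is outside the signed 32-bit range")
    return problems


print(check_numeric_bounds({"type": "integer", "minimum": 0, "maximum": 100}))
print(check_numeric_bounds({"type": "integer"}))
print(check_numeric_bounds({"type": "integer", "minimum": 0, "maximum": 2**63 - 1}))
```

Values that need a wider range would then be declared as strings, as suggested above.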

Issue Analytics

State:
Created 7 years ago
Comments: 8 (3 by maintainers)

Top GitHub Comments

7 reactions

webron commented, May 22, 2016

@auspicacious - you’re taking it into a slightly wrong direction, IMHO, and I say this as someone who encountered this issue before with other users.

The key problem here is the API design and not the documentation. Large numeric values would have issues with several languages, not just javascript. In Java, for example, I’d say that type integer that has no format defined would translate to BigInteger and not long. The format can serve two purposes - validation, and data type hinting (especially since the number of available types in JSON Schema is limited).

Now, it is true that javascript cannot (at least by default) process unbounded numbers. However, by using type: string for unbounded numbers - you are saying that the value itself should be transferred as a string and not a number. This is a reasonable solution to the problem (one which we’ve seen used due to the same limitation) - but this is an API design choice and has nothing to do with OpenAPI as a spec. My recommendation in this case would be to use format: number alongside the string type to indicate that it should be a number (or integer and so on). Most basic validators will not know what to do with it, but those can be extended to ‘understand’ what it means.
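A sketch of what "extending" a validator might look like, assuming a schema fragment of `type: string` with `format: integer`. The `validate` function and the regular expression are illustrative only; they handle string schemas and nothing else, and they are not taken from any existing validator.

```python
import re

# Hypothetical schema fragment: a string on the wire, hinted as an integer.
schema = {"type": "string", "format": "integer"}

# Optional minus sign, then either a lone zero or digits without a leading zero.
_INTEGER_TEXT = re.compile(r"-?(0|[1-9][0-9]*)")


def validate(value, schema: dict) -> bool:
    """Validate a value against a string schema, honouring format: integer."""
    if schema.get("type") != "string" or not isinstance(value, str):
        return False
    if schema.get("format") == "integer":
        return _INTEGER_TEXT.fullmatch(value) is not None
    return True  # unknown or absent formats are not checked


print(validate("9007199254740993", schema))  # True
print(validate("12abc", schema))             # False
print(validate(9007199254740993, schema))    # False: must be sent as a string
```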

1 reaction

auspicacious commented, May 23, 2016

I’m afraid that I disagree with that.

I’m in a similar position to you in that I have spent much of the past three years trying to educate people about the difficulties in authoring HTTP/JSON/REST APIs that will consistently work across platforms. Another important focus has been on making it easier to determine backwards-compatibility in APIs; you’ll see how those two intertwine in a moment.

I take it as an assumption that since you are developing a specification for HTTP/JSON APIs you are interested in building APIs that can be consumed by the widest variety of programming languages possible; I know that was the driver behind my company’s switch from SOAP and binary format. And it goes without saying that you want to help API designers do the right thing.

Further, if you’ve encountered this problem before, you know that most people don’t understand that this problem exists. Even the fact that RFC 7159 explicitly encourages developers never to use values greater than 2^53 isn’t widely known. I’d say it’s even worse: most developers I know don’t even think about cross-platform compatibility; they have never had to before. I’ve had people tell me that because jsonschema2pojo generates a 32-bit integer in Java when you pass "type": "integer", the JSON Schema standard must be saying that "integer" means 32-bit integer. I’ve had people tell me that JSON numbers have an implicit bound because “JS” stands for “JavaScript,” so all JavaScript rules apply. These are the people responsible for designing APIs, which I then have to tell them to rewrite when I’m able to catch the problem.

So, if you have a new developer, and they read an OpenAPI specification that encourages them to design an API that will break in one of the most commonly used Web languages, who is responsible? I believe that you are.

You also commented that Java should generate BigInteger when it sees "type": "integer". You’re right, it should. That would be defensive. But the only open-source code generator for Java, jsonschema2pojo, doesn’t – it generates int, or maybe long if you read the options! And this, of course, is a casualty of the fact that JSON Schema was so poorly specified, because no-one took responsibility for these issues. A naive person came along, tried to be helpful, and has helped encourage many people to shoot themselves in the foot.

Unless, of course, you are not actually interested in people consuming APIs specified by OpenAPI. I’m sure that people don’t enjoy being told that they shouldn’t have actually used the tools that are provided to them by the specification; it does not engender confidence, and you need to realize that 99% of your users do not come to OpenAPI with the knowledge they need to make this “API design choice” on their own.

Moreover, it does no harm to use string as a transport rather than integer; both will get the job done. Who would choose less cross-language compatibility for an HTTP API? Again, that’s one of the primary reasons people choose HTTP and JSON to begin with.

I said that backwards compatibility plays into this as well, and here’s how. I think that your proposal to use format to handle these situations doesn’t go far enough.

OpenAPI APIs contain a version number, so I assume that there is some level of concern about backwards-compatibility. For example, let’s say that someone defines an API with an ID field. This ID field is initially represented by a 32-bit integer. They never really mention this to their clients, and many of their clients, without better guidance available, create database columns to store that ID that can contain a 32-bit integer.

The API designers realize they’re running out of IDs and move to a 64-bit integer, causing massive errors and downtime in their clients, who have to scramble to redefine databases that are now quite large. I could present similar scenarios for string, but I’ll stick to integer here.

But this is easy to avoid, and avoiding it also solves the primary problem we’re discussing. I mentioned this solution in my initial post.

If OpenAPI required items of type integer to have a maximum and a minimum, it would become far easier to detect changes that are not backwards-compatible.
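With explicit bounds on both versions of a schema, the ID scenario above reduces to a range comparison. This is a minimal sketch: `range_widened` is a hypothetical helper, and it assumes the field is a value the server sends, so that a wider range means clients may receive numbers they never planned for.

```python
def range_widened(old: dict, new: dict) -> bool:
    """Return True if the new schema admits values the old one did not."""
    return new["minimum"] < old["minimum"] or new["maximum"] > old["maximum"]


# The ID field as first published, and after the move to 64 bits.
id_v1 = {"type": "integer", "minimum": -(2**31), "maximum": 2**31 - 1}
id_v2 = {"type": "integer", "minimum": -(2**63), "maximum": 2**63 - 1}

print(range_widened(id_v1, id_v2))  # True: clients storing 32-bit IDs would break
print(range_widened(id_v2, id_v1))  # False: narrowing stays inside the old range
```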

Moreover, it becomes possible for OpenAPI to specify absolute maximums and minimums, for example, those that correspond to a signed 32-bit integer. It allows OpenAPI to explain to its users why they should do this, in order to protect themselves. It allows tool implementers to make the right choices for their languages. In the absence of positive information about these restrictions, people will make the wrong choices. Boundaries must be explicit, or people will fail to consider them.

It is not possible to make design choices without being informed. As it stands, the OpenAPI specification provides the tools for people to make the wrong design choices, but doesn’t even provide a hint that they might be wrong. This is setting people up for failure. That’s not responsible.