Either $ref resolution doesn't work, or $id is ignored.

Wetzel version: Whatever it is in git right now.
OS: Windows 10
Node: 16.13.1

Given the following two schemas placed in a subdirectory named schemas:

schemas\a.json:

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "http://example.com/schema/schema_a",
    "title": "schema a",
    "type": "object",
    "properties": {
        "something": { "type": "string" }
    }
}

schemas\b.json:

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "http://example.com/schema/schema_b",
    "$ref": "http://example.com/schema/schema_a",
    "title": "schema b",
    "description": "schema a with an additional property",
    "type": "object",
    "properties": {
        "something_else": { "type": "string" }
    }
}

When I run:

node bin\wetzel.js -p schemas -i "[\"a.json\"]" schemas\b.json

Wetzel fails with:

Error: Unable to find $ref http://example.com/schema/schema_a
    at replaceRef (C:\...\wetzel\lib\replaceRef.js:54:19)

Why isn’t it loading a.json and how do I make it find the references? Is my understanding of the -i option incorrect?

I tried hand-wavily adding -s schemas as well, but the result was the same.

Thanks!

Top GitHub Comments

javagl commented, Jul 3, 2022

And I thought that my issue comments were long 😌

> Maybe you are overthinking it 😃. The meaning is straightforward and sensible

I might be overthinking this, but I have seen too many effects of ‘underthinking’, and this may just be a countermeasure. If you think that you can implement “The Right Solution®”, then feel free to open a pull request. As long as the updated state is still generating the same output for glTF, the repository maintainers will probably be willing to merge it.

But if you try, just a word of warning:

> know what schemas are available already (e.g. Wetzel’s -i option),

The -i option is totally unrelated to the question of which schemas are ‘known’. Its sole purpose is to exclude these schemas (i.e. their types) from the ‘Table Of Contents’. The -s option might be more closely related to that: It contains a ‘search path’. But … it’s difficult (and you will have a hard time convincing me otherwise). There is a DRAFT PR at https://github.com/CesiumGS/wetzel/pull/71 for supporting multiple search paths, but this raises a bunch of questions (most obviously, how to deal with ambiguities, related to the fact that the $id is not really used as an ‘ID’ in wetzel…)

> Therefore the premise of the question, “if retrieving a URL indicated by the $id yields a 404 then why should it work?”, is not valid: … for example, from a map of the canonical URIs of a user-provided list of additional schemas [potentially falling back on an actual filesystem/network request, more on that below])

I’m roughly (!) aware of some of these caveats. I occasionally looked at https://json-schema.org/understanding-json-schema/structuring.html , which explains some of these concepts in a slightly less formal way than the specs that you linked to (but I won’t claim to have thoroughly understood all that, and admit that I did not read the technical version of the specs and all the RFCs that are necessary to really understand that).

My (somewhat shallow) understanding seems to be in line with what you said in a more profound and elaborate form. Roughly:

- The $id is not really ‘the place where the schema file can be found’
  - Most important implication: The $id cannot be used as a basis for resolving $refs
- The $id might be used as a real identifier in wetzel (i.e. actually the key of a dictionary, as in something = dictionary[schema.$id]), because it should be unique
  - Caveat 1: It isn’t. There may be two schemas that define the same ID
  - Caveat 2: There may be a $ref that does not use an ID, but a filename.
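
The two caveats can be made concrete with a small sketch. This is not wetzel's code: `SchemaRegistry`, `register` and `lookup` are hypothetical names, and the schemas are the ones from the original question. It assumes the $id is used as a plain dictionary key, refuses a second schema with the same $id (Caveat 1), and shows that a $ref given as a file name finds nothing (Caveat 2).

```javascript
// Hypothetical sketch, not part of wetzel: schemas keyed by their $id.
class SchemaRegistry {
  constructor() {
    this.byId = new Map();
  }

  // Stores the schema under its $id and rejects duplicate IDs.
  register(fileName, schema) {
    const id = schema.$id;
    if (id === undefined) {
      return; // A schema without $id cannot be found via this registry
    }
    if (this.byId.has(id)) {
      // Caveat 1: two schemas may define the same ID
      const other = this.byId.get(id).fileName;
      throw new Error(`Duplicate $id "${id}" in ${fileName} and ${other}`);
    }
    this.byId.set(id, { fileName, schema });
  }

  // Returns the schema whose $id equals the given $ref, or undefined.
  lookup(ref) {
    const entry = this.byId.get(ref);
    return entry ? entry.schema : undefined;
  }
}

const registry = new SchemaRegistry();
registry.register("a.json", {
  $id: "http://example.com/schema/schema_a",
  title: "schema a",
});
registry.register("b.json", {
  $id: "http://example.com/schema/schema_b",
  title: "schema b",
});

// A $ref that uses the ID is found
const foundById = registry.lookup("http://example.com/schema/schema_a");

// Caveat 2: a $ref that uses a file name is not found by ID
const foundByFileName = registry.lookup("a.json");
```

Here `foundById` is the "schema a" object, while `foundByFileName` is `undefined`, so an ID-only dictionary would still need some notion of where each file was loaded from.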

This still leaves open the question of where and how exactly a $ref should be resolved. What is the actual URI for resolving a $ref? Yes, implementations SHOULD ‘know’ that…

const baseUrl = MagicalUrlFairy.whereAreWe();

but doing that in a ‘spec-compliant’ way that works in all cases that are covered by the spec, and (!) in all cases that appear in the real world can be difficult. Imagine you find a real-world schema that contains a $ref like

"$ref": "example.json"

You could argue that this is wrong, and it should use a proper ID (and that’s correct). But that’s not what’s happening. So where, exactly, is the example.json schema file? That depends on where you found the schema that contains this $ref, and in the case of wetzel, this may have been found in one of the ‘search paths’ that have been given at the command line…
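
To illustrate why the answer depends on where the referring schema was found, here is a minimal sketch. It assumes that the location of the containing schema is known as a URI and uses the standard `URL` class that Node.js provides globally. The helper `resolveRefUri` and the paths under `file:///project/schemas/` are made up for the example; this is not how wetzel resolves references.

```javascript
// Hypothetical helper: resolve a $ref against the URI that the
// containing schema was loaded from (RFC 3986 reference resolution).
function resolveRefUri(ref, baseUri) {
  return new URL(ref, baseUri).href;
}

// The same relative $ref ends up in different places,
// depending on where the containing schema was found:
const fromSearchPath = resolveRefUri(
  "example.json",
  "file:///project/schemas/b.json"
);
const fromWeb = resolveRefUri(
  "example.json",
  "https://example.com/schema/b.json"
);

// An absolute $ref is not affected by the base at all
const absolute = resolveRefUri(
  "http://example.com/schema/schema_a",
  "file:///project/schemas/b.json"
);

console.log(fromSearchPath); // file:///project/schemas/example.json
console.log(fromWeb); // https://example.com/schema/example.json
console.log(absolute); // http://example.com/schema/schema_a
```

With multiple search paths, the base URI is exactly the part that becomes ambiguous: the same `example.json` may exist under more than one of them.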

An aside: All this does not yet address the issue of fragments in $refs. Covering the cases of

"$ref": "#example"
"$ref": "#/definitions/example"
"$ref": "foo.schema.json#/definitions/example"
"$ref": "https://example.com/foo#/definitions/example"

is not entirely trivial. (Some related code is in some branch, but again: This is faaar from perfect - it just ‘worked for me’, as far as I needed it…)
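
As a rough illustration of what handling these four cases involves, the following sketch splits a $ref into its URI part and its fragment, and resolves fragments that are JSON Pointers (RFC 6901) inside an already loaded schema. The function names `splitRef` and `resolvePointer` are hypothetical, and plain-name fragments like `#example` are deliberately left unhandled, because they would require scanning the schema for anchors.

```javascript
// Hypothetical sketch: split a $ref into the part before and after '#'.
function splitRef(ref) {
  const hashIndex = ref.indexOf("#");
  if (hashIndex < 0) {
    return { uri: ref, fragment: "" };
  }
  return {
    uri: ref.substring(0, hashIndex),
    fragment: ref.substring(hashIndex + 1),
  };
}

// Resolve a JSON Pointer fragment (like "/definitions/example") inside
// the given schema. Returns undefined for anything that is not found.
function resolvePointer(schema, fragment) {
  if (fragment === "") {
    return schema; // An empty fragment refers to the whole schema
  }
  if (!fragment.startsWith("/")) {
    return undefined; // Plain-name fragments are not handled here
  }
  let current = schema;
  for (const rawToken of fragment.substring(1).split("/")) {
    // Undo the percent-encoding, then the JSON Pointer escapes
    const token = decodeURIComponent(rawToken)
      .replace(/~1/g, "/")
      .replace(/~0/g, "~");
    if (current === null || typeof current !== "object" || !(token in current)) {
      return undefined;
    }
    current = current[token];
  }
  return current;
}

const schema = { definitions: { example: { type: "string" } } };

const local = splitRef("#/definitions/example");
const inFile = splitRef("foo.schema.json#/definitions/example");
const plainName = splitRef("#example");

const localTarget = resolvePointer(schema, local.fragment);
const plainNameTarget = resolvePointer(schema, plainName.fragment);
```

Here `local.uri` is empty (the reference stays within the same schema), `inFile.uri` is `foo.schema.json` (which still has to be located first), and `plainNameTarget` is `undefined` because this sketch does not look for anchors.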

You seem to read the specs on a more detailed level than I do. So maybe I can throw in that random question here, which I carved out as some sort of “quiz”. Consider the following schema:

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "base.schema.json",
    "title": "Base",
    "type": "object",
    
    "definitions": {
        "example": {
            "type": "string"
        }
    },
    
    "additionalProperties": {
        "$ref": "#/definitions/example"
    }
    
}

It defines definitions/example to be of type string.

Now consider this one, “extending” it (even though there is no real ‘inheritance’ going on):

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "extended.schema.json",
    "title": "Extended",
    "type": "object",
    "$ref": "base.schema.json",
    
    "definitions": {
        "example": {
            "type": "number"
        }
    },
    
    "additionalProperties": {
        "$ref": "#/definitions/example"
    }
}

It defines definitions/example to be of type number.

Question 1: What type may the additional properties have so that they conform to the second schema?

string
number

Question 2: Are you sure about your answer to question 1?

> Still, the specification and behavior of “id” has been present since draft-03

I have read this ‘change log’, but admittedly, I will not read through each link. But a big 👍 for that nevertheless, because I might take a closer look at the links when this becomes immediately relevant for my work, and in any case, it is a useful overview (maybe for the case that someone wants to support multiple draft versions).

To summarize it, subjectively: The id/$id (sic!) was always present, but important details have changed considerably (or at least, been clarified) throughout the draft versions. If somebody were to implement something like wetzel from scratch, ‘on a green field’, it would be far easier to look at the latest spec and follow it diligently. Or to directly address the bottom line:

> In other words, $id has always been supposed to identify a schema, and schemas have always been required to be addressable by their $id.

This has never been followed in glTF, and it was never implemented in wetzel.

> Of course “some limitations” is completely reasonable, but schema IDs are a fundamental feature of JSON Schema. … To be clear, my intent isn’t to dish out negative criticism or make demands. What I mean is: Wetzel can obviously do whatever you want it or need it to do, but I strongly feel that if it’s not going to be given some more compliant behavior, then it at least ought to be removed from json-schema.org’s front page given its current level of compliance.

That’s all fine for me. I’m also only a user of wetzel. It does what it was intended for, but there are many aspects of JSON Schema that it never handled correctly, and many aspects that it never handled at all (roughly: because it wasn’t necessary for glTF).

Or to put it that way: Wetzel <strike>MUST</strike> SHOULD be improved in many ways.

> That’s definitely helpful. Essentially, the following URIs are defined: … And resolution is performed by: …

I went through some of these steps/approaches while I tried to use wetzel for a more complex schema. I originally tried to do these changes incrementally, in a somewhat backward-compatible way. But at some point, I had to ‘burn some bridges’, because the necessary changes completely changed the original implementation, and of course, the refactored state is still far from perfect, and vastly different from something that one could do when…

- implementing this from scratch
- implementing it only for the latest schema version
- implementing it under the assumption that all inputs perfectly follow the specs

I also considered using the $id for actual lookups (i.e. as a real identifier), but considering that this is not sufficient for actually resolving a $ref, one still has to carry along the “actual base URL” together with the $id (i.e. one of the “search paths”). Some of that is addressed in the SchemaRepository of the refactored state, but not in a deeply spec-compliant way.
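
The idea of carrying the base URL along with the $id can be sketched as follows. This is only an illustration under stated assumptions: `createEntry` and `findTarget` are hypothetical names that do not correspond to the actual SchemaRepository, the `file:///project/schemas/` locations are made up, and the lookup order (exact $id first, then the location derived from the referring schema) is one possible choice, not what the spec or wetzel prescribes.

```javascript
// Hypothetical sketch: each schema is stored together with its $id
// and with the URI that it was actually loaded from.
function createEntry(schema, loadedFrom) {
  return { id: schema.$id, baseUri: loadedFrom, schema };
}

// Finds the entry that a $ref in the 'referencing' entry points to.
function findTarget(entries, referencing, ref) {
  // 1. Treat the $ref as an identifier: look for an exact $id match
  const byId = entries.find((entry) => entry.id === ref);
  if (byId) {
    return byId;
  }
  // 2. Treat the $ref as a location, relative to the place where
  //    the referencing schema was found
  const location = new URL(ref, referencing.baseUri).href;
  return entries.find((entry) => entry.baseUri === location);
}

const entryA = createEntry(
  { $id: "http://example.com/schema/schema_a", title: "schema a" },
  "file:///project/schemas/a.json"
);
const entryB = createEntry(
  { $id: "http://example.com/schema/schema_b", title: "schema b" },
  "file:///project/schemas/b.json"
);
const entries = [entryA, entryB];

// Both styles of $ref in b.json lead to the same schema
const viaId = findTarget(entries, entryB, "http://example.com/schema/schema_a");
const viaFileName = findTarget(entries, entryB, "a.json");
const missing = findTarget(entries, entryB, "missing.json");
```

Both `viaId` and `viaFileName` refer to `entryA`, and `missing` is `undefined`. The sketch ignores duplicate IDs and multiple search paths, which is where the ambiguities mentioned above come in.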

> Incidentally, AJV’s loadSchema callback …

I occasionally looked at AJV. It is a project with ~11000 stars, ~2600 commits, ~150 releases, billion-dollar companies as sponsors, 180 contributors, (and still, 169 open issues and 29 pending pull requests). It’s an entirely different category of project than wetzel. One may find some “inspiration” there, in terms of spec-compliant handling of details like $id and $ref. But carving out the relevant parts (and translating them to JavaScript) does not seem to be a reasonable approach - and even if someone did that: If the result was something that couldn’t re-generate the glTF spec, verbatim, then it would be moot…

> As for glTF, those schemas are technically non-compliant with the current draft. … I could very reasonably go over to the glTF issue page and request that their schemas be given absolute URIs (don’t worry, I won’t, that’d be kind of a dick move given the current conversation, 😂).

I just did that dick move: https://github.com/KhronosGroup/glTF/issues/2182 . It is a valid point, so why not. The fact that glTF and wetzel are somewhat “coupled” should not prevent <strike>changes</strike> improvements on either side. But even when the $id in glTF are changed: This will not immediately affect wetzel. As I said: The $id is not used at all right now, so any way of taking it into account would require a considerable refactoring.

javagl commented, Jul 1, 2022

> So either Wetzel is doing something wrong on the resolution end, or it’s just plain ignoring $id. Not sure, but whatever it is, it appears to be non-compliant.

It’s both. As far as I know, the $id is not accessed anywhere in the codebase of wetzel at all, and even if it was, it would certainly not be used for any sort of resolution.

I cannot point my finger at “the” reason. And I agree that we could consider making wetzel more compliant with the specification in this regard. But some aspects to keep in mind:

- It’s complicated. The quoted statements like “implementations SHOULD understand ahead of time which schemas they will be using” and “Implementations SHOULD be able to associate arbitrary URIs with an arbitrary schema and/or automatically associate a schema’s “$id”-given URI” would still leave me with the question: “What do ‘understand’ or ‘associate’ mean here, exactly, on the implementation level?”.
- Schemas are usually not published at the ID path. The overly naive way of phrasing this is “http://example.com/schema/a.json yields a 404, so why should that work, exactly?”. In order to actually work and be resolvable, each schema has to be associated with a “base URI” from where ‘ref’ schemas actually can be resolved. And this base URI can basically never be obtained from the $id anyhow.
- Things change rapidly and arbitrarily. Wetzel was started with JSON schema draft-03 or draft-04, and many things have changed (or been clarified) in the meantime. There have been considerable changes in the mechanisms behind $id and $ref even between draft 2019-09 and draft 2020-12 (and this makes you wonder whether there will ever be a ‘JSON Schema 1.0.0 (final)’ …). Keeping up with the subtle changes between these drafts is challenging.

These points may appear to be a bit shallow and handwaving. But maybe some background is relevant here: wetzel was mainly intended for generating the property reference for the glTF schema. The glTF schema uses IDs like "$id": "accessor.schema.json". So there wasn’t so much effort put into implementing a ‘JSON schema spec compliant resolution mechanism’. The focus is that it should “Work In Practice®”. And at this point, the most important use case is that a $ref contains a file name (like in your “Case 3”), and this is resolved against whatever that file is supposed to refer to.

It may not be perfect in terms of spec compliance. But it works for glTF and other schemas.

An aside: In the refactored state that I pointed to in another issue, I tried to at least carry along some information about the ‘base URI’ together with the schema. This ‘base URI’ still consists of a ‘directory name’ in the current state, but at least, there is a structure for carrying that sort of information, which could either be derived from the $id or from the local file name. While still faaar from being perfect, it might be possible to come closer to the spec based on this state - see SchemaEntry.
