Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validator::validate() doesn't correctly apply schema ID when the schema is an assoc array (works when an object) #767

Open
dkarlovi opened this issue Nov 22, 2024 · 12 comments · May be fixed by #769

Comments

@dkarlovi
Copy link

dkarlovi commented Nov 22, 2024

Update: this is due to the validator not applying the schema ID correctly, it only handles an object, not an assoc array.

I have this schema:

{
    "id": "tests/schemas/suggest/empty.schema.json",
    "allOf": [
        {
            "$ref": "baseline.schema.json"
        }
    ],
    "properties": {
        "items": {
            "maxItems": 0
        }
    }
}

With the update to 6.x, this fails with

      file_get_contents(internal://provided-schema/tests/schemas/suggest/baseline.schema.json): Failed to open stream: No such file or directory (JsonSchema\Exception\ResourceNotFoundException)

I guess the schemaless ref is now interpreted as internal in some way?

If I modify my schema as so

{
    "id": "tests/schemas/suggest/empty.schema.json",
    "allOf": [
        {
            "$ref": "file://tests/schemas/suggest/baseline.schema.json"
        }
    ],
    "properties": {
        "items": {
            "maxItems": 0
        }
    }
}

Now it says

      file_get_contents(file://tests/schemas/suggest/baseline.schema.json): Failed to open stream: no suitable wrapper could be found (JsonSchema\Exception\ResourceNotFoundException)

which means it understands it needs to read the file locally. For some reason, it doesn't read the file even though it exists.

If I add

$uri = substr($uri, 7);

to strip file:// before here, it works.

$response = file_get_contents($uri);

Should this happen conditionally, if there's a file:// prefix, strip it? There's some file:// special handling a little later in the function, but it's already too late because an error was raised.

@dkarlovi

This comment was marked as outdated.

@DannyvdSluijs
Copy link
Collaborator

DannyvdSluijs commented Nov 22, 2024

Hi @dkarlovi

In willing to check this out but would need a little bit more context. Could you share the code in order te reproduce this issue?

Did you also check with the examples from the wiki? I guess the https://github.com/jsonrainbow/json-schema/wiki/Validate-using-an-inline-schema might best fit your case?

@dkarlovi
Copy link
Author

Hello, this is just validating a payload against a schema, but the schema is built as building blocks so being able to load relative schemas to compose into a larger more complex one is key.

I can send the code snippet, but there's nothing unexpected, the schema I'm using is just composed of smaller schemas in files next to it. This worked as shown in 5.x, it works in 6.x with the path change and the snippet shown, even though I think the path should be relative to the current schema, which 5.x did but 6.x does not.

@DannyvdSluijs
Copy link
Collaborator

@dkarlovi I think the code snippet can be very helpful in under standing the issue/impact. It is also to assist in the most effective way. This project is open source and is powered by people putting in their time for free.

This repo has a Bowtie report which points out that JSON Schema test suite with the references are working. This just makes me think since it worked for you in 5.x but not in 6.x but the 6.x bowtie report doesn't report issues with ref that between version 5 and 6 something has changed in terms of implementation that we need to highlight.

@dkarlovi
Copy link
Author

@DannyvdSluijs yes, I understand. 👍 I'll try to create a minimal reproducer.

@dkarlovi
Copy link
Author

dkarlovi commented Nov 25, 2024

@DannyvdSluijs found the issue: I was passing my schema as an assoc array. When I pass it as an object, it works.

This fails as is:

<?php

use JsonSchema\Constraints\Constraint;
use JsonSchema\Validator;

require_once __DIR__ . '/vendor/autoload.php';

$data = new stdClass;
$data->results = [];

validate($data, __DIR__.'/tests/schemas/suggest/empty.schema.json');

function validate(array|object $data, string $path): void
{
    // switch second arg to false to have it work
    $schema = json_decode(file_get_contents($path), true);

    $validator = new Validator();
    $response = $validator->validate(
        $data,
        $schema,
        Constraint::CHECK_MODE_VALIDATE_SCHEMA & Constraint::CHECK_MODE_APPLY_DEFAULTS
    );
    if ($response === 0) {
        return;
    }

    var_dump($validator->getErrors());
}

It only checks for is_object here and uses the ID if given, but it can AFAIK also be an assoc array, this doesn't take it into account.

if (is_object($schema) && property_exists($schema, 'id')) {
$schemaURI = $schema->id;
} else {
$schemaURI = SchemaStorage::INTERNAL_PROVIDED_SCHEMA_URI;
}

@dkarlovi dkarlovi changed the title \JsonSchema\Uri\Retrievers\FileGetContents::retrieve should strip file:// before doing file_get_contents() Validator::validate() doesn't correctly apply schema ID when the schema is an assoc array (works when an object) Nov 25, 2024
@DannyvdSluijs
Copy link
Collaborator

@dkarlovi great that you found the cause.
I have some concerns about solving this and accepting an associative array. Whereas I’m leaning towards an opinion that schemas should be objects and the fix lies in altering your json_decode call. Can you elaborate on the why you would deviate from the default and decode the json string into an associative array instead of an object?

Secondly, I would like to first understand why this change was added tot v6 in the first place before starting to add code in order to support associative array schemas.

As raised in other issues as well, associative arrays is something very specific to PHP and no other programming language.

@dkarlovi
Copy link
Author

dkarlovi commented Nov 26, 2024

@DannyvdSluijs this is probably some misunderstanding? The patch doesn't introduce support for accepting schemas as assoc arrays: it was always supported, you even have a test to make sure it works:

public function testValidateWithAssocSchema(): void
{
$schema = json_decode('{"properties":{"propertyOne":{"type":"array","items":[{"type":"string"}]}}}', true);
$data = json_decode('{"propertyOne":[42]}', true);
$validator = new Validator();
$validator->validate($data, $schema);
$this->assertFalse($validator->isValid(), 'Validation succeeded, but should have failed.');
}

This patch is fixing the regression in 6.x which broke this specific thing with not applying the schema ID correctly from an assoc array.

schemas should be objects

If so, the validate() method should clearly state so, currently it does not.

@DannyvdSluijs
Copy link
Collaborator

Well that helps a lot in clarifying things. I'll look into it.

@DannyvdSluijs
Copy link
Collaborator

Ive been diving into the git blame and history. While doing so I was able to find #389 which actually introduces the code in 5.x.x. which takes care of your case.
A couple of days later #407 was raised as there was impact to the added functionality and the object to array casting was removed. This raises a red flag with regards to supporting assoc. array schemas.

Since the suggested change is only adding the reading of the id key when the schema is an array at that point the fix seems easy. But following along what happens in the code, when the schema is registered in the storage it is casted to an object. Also according to the SchemaStorageInterface the addSchema functions should accept an object but apparently is stil accepting an array. The getSchema also is annotated to return an object

Seeing the history, the intention and the internal casting of schema's to objects I would lean towards the "Schema should be object" thought. Which then bring back your point that it currently isn't correctly stated by the lib. This might be the alternative approach to explore.

I'm very eager to see what @erayd thinks about this case and see if he is available for a brief answer or more.

@dkarlovi I also wanted to point out that you're still helping out. I noticed you're a First Time Contributor to the project and the amount of information as well as the quick turn around with information is helping us here. So please bear with us in finding the correct approach and seeing how we move forward with this and #769

@erayd
Copy link
Contributor

erayd commented Nov 27, 2024

I'm very eager to see what @erayd thinks about this case and see if he is available for a brief answer or more.

@DannyvdSluijs Happy to dig back into this, although it was over seven years ago - definitely going to be stretching the memory a bit on this one!

What is it specifically you're wanting me to look at? The OP report, or are you wanting some clarification on the historical changes you've linked above?

@dkarlovi
Copy link
Author

@DannyvdSluijs from my POV, the solution is quite clear:

  1. the library without a doubt supports assoc arrays as schemas currently, as it stands
  2. this probably isn't ideal and you're saying it shouldn't be so (which I tend to agree with)
  3. but this cannot change in 5.x or 6.x to keep BC, it can only get deprecated in 6.x and removed in 7.x

The proposed patch fixes this feature and removes the regression made in the 5.x -> 6.x transition. The patch should get accepted (or an equivalent patch be made) even if the feature is to be deprecated in 6.x and removed in 7.x, that consideration (which, again, I mostly agree with) should be made in a totally separate PR. Otherwise, you're keeping a fully supported feature broken for no good reason.

This will get removed in next major so why fix the bug in this one?

isn't a good argument, it's broken now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants