Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

[Enhancement] : Better error message with line number

See original GitHub issue

Hi @mildebrandt and the other contributors. First thank you for this amazing library. I honestly think this should be the standard way to validate YAML schema. It would be amazing if you can reach “ruamel.yaml” and “pyyaml” so they could include your library and set the way to validate schema in YAML.

Anyway. I started using Yamale and found myself to wish to get the line number in the yaml file where I have errors. I am using ruamel.yaml, (I don’t really now pyyaml but I guess it should be possible too). I use ruamel.yaml in round-trip mode and extract the line number of each keys and I copy your format (with the ‘.’ for nested keys). I get a one level dict with the line number.

Here below my code with comments as well as an example with output. I hope you will find it a nice addition, ideally I see that you add the line numbers into yamale.YamaleError.results.errors

from pathlib import Path
from typing import Dict

import yamale
from ruamel.yaml import YAML
from ruamel.yaml.comments import CommentedMap


def _get_lc_dict_helper(data: CommentedMap, dict_key_line: Dict[str, int], parentkey: str = "") -> Dict[str, int]:
    """
    Recursive helper function to fetch the line infos of each keys in the config yaml file.

    Built to be called inside of `_get_lc_dict`.
    """
    sep = "."  # Don't modify, it is to match the "keys" return in the errors of the yamale lib.
    keys_indexes = None

    try:
        if len(data) > 0:
            keys_indexes = range(len(data))
    except TypeError:
        pass
    try:
        keys = data.keys()
        keys_indexes = keys
    except AttributeError:
        pass

    if keys_indexes is None:
        return dict_key_line  # return condition from recursion

    for key in keys_indexes:
        if parentkey != "":
            keyref = parentkey + sep + str(key)
        else:
            keyref = str(key)
        try:
            lnum = data.lc.data[key][0] + 1
            if keyref in dict_key_line:
                print(
                    f"WARNING : key '{keyref}' is NOT UNIQUE, at lines {dict_key_line[keyref]:>4} and {lnum:>4}."
                    f" (overwriting)."
                )
            dict_key_line[keyref] = lnum
            # print(f"line {lnum:<3} : {keyref}")
            _get_lc_dict_helper(data[key], dict_key_line, keyref)  # recursion
        except AttributeError:
            pass

    return dict_key_line


def _get_lc_dict(path: Path) -> Dict[str, int]:
    """
    Helper function to trace back the line number in the yaml file for each keys.

    Built to be called inside of `validate`.

    Parameters
    ----------
    path : Path
        Path to the config yaml file (not the schema).

    Returns
    -------
    Dict[str, int]
        Maps the keys to their line number, the line counter (lc).
        This dictionary is only 1 level and the keys corresponds to the ones report by the yamale lib.
    """
    dict_key_line: Dict[str, int] = {}
    with YAML(typ="rt") as yaml:
        for data in yaml.load_all(path):
            dict_key_line = _get_lc_dict_helper(data, dict_key_line)
    return dict_key_line


def validate(path_schema: Path, path_data: Path):
    """
    Validates the config yaml file according to the schema yaml file.

    Will be silent if good and will exit the program if there is an error,
    and will output an detailed error message to fix the config file.

    Parameters
    ----------
    path_schema : Path
        Path to the schema yaml file.
    path_data : Path
        Path to the config yaml file.
    """
    # Create a schema object
    schema = yamale.make_schema(path=path_schema, parser="ruamel")

    # Create a Data object
    config = yamale.make_data(path=path_data, parser="ruamel")
    # Validate data against the schema. Throws a ValueError if data is invalid.
    try:
        yamale.validate(schema, config)
        print("Validation success!👍")
    except yamale.YamaleError as e:
        errmsg = "Validation failed!\n"
        lc = _get_lc_dict(path_data)
        for result in e.results:
            title1 = "Schema"
            title2 = "Config"
            sep = f"{'-'*40}\n"
            errmsg += f"{title1:<10} : {result.schema}\n{title2:<10} : {result.data}\n{sep}"
            for error in result.errors:
                keyerr = error.split(":", 1)
                keypath = keyerr[0]
                err = keyerr[1]
                l_num = lc.get(keypath, "?")
                errmsg += f"* line {l_num:>4}:  {keypath:<40} : {err}\n"
            errmsg += f"{sep}"

        print(errmsg)
        exit(1)

Then in another file I use it like that :

curr_path = Path(__file__).parent
path_schema = (curr_path / "schema.yaml").resolve()
path_data = (curr_path / "data.yaml").resolve()

validate(path_schema=path_schema, path_data=path_data)

I guess the code can be improved, I tested quiet a lot but I admit I did not try special cases. But it should not crash as I handled errors, maximum you don’t get the line number (just a ‘?’). I tried to be careful to use only “public” method from ruamel.yaml (CommentedMap.lc.data[key][0]). (I am on Python 3.8, ruamel.yaml 0.16.6 and yamale 3.0.1)

Here below an example: schema.yaml:

list_with_two_types: list(str(), include('variant'))
questions: list(include('question'))
---
variant:
  rsid: str()
  name: str()
# Comment 1
question:
  choices: list(include('choices')) # Comment 10
  questions: list(include('question'), required=False)

choices:
  id: str()
---
variant2:
  rsid2: str()
  name2: str()
  extra:
    abc: int()
    def: num()
# Comment 1
question2:
  choices2: list(include('choices')) # Comment 10
  questions2: list(include('question'), required=False)

data.yaml :

list_with_two_types:
  - name: "some SNP"
    rsid: "rs123"
  - "some"
  - "thing"
  - rsid: "rs312"
    name: 35
questions:
  - choices:
      - id: "id_str"
      - id: "id_str1"
    questions:
      - choices:
          - id: "id_str"
          - id: 66
---
list_with_two_types2:
  - name2: "some SNP"
    rsid2: "rs123"
  - "some2"
  - "thing2"
  - rsid2: "rs312"
    name2: 35
questions2:
  - choices2:
      - id2: "id_str"
      - id2: "id_str1"
    questions2:
      - choices2:
          - id2: "id_str"
          - id2: "id_str1"

And this is the output I am getting:

Validation failed!
Schema     : C:\Users\PC-G\Documents\Work\Workspace\sens\config\schema.yaml
Config     : C:\Users\PC-G\Documents\Work\Workspace\sens\config\data.yaml
----------------------------------------
* line    6:  list_with_two_types.3                    :  '{'rsid': 'rs312', 'name': 35}' is not a str.
* line    7:  list_with_two_types.3.name               :  '35' is not a str.
* line   15:  questions.0.questions.0.choices.1.id     :  '66' is not a str.
----------------------------------------
Schema     : C:\Users\PC-G\Documents\Work\Workspace\sens\config\schema.yaml
Config     : C:\Users\PC-G\Documents\Work\Workspace\sens\config\data.yaml
----------------------------------------
* line   24:  questions2                               :  Unexpected element
* line   17:  list_with_two_types2                     :  Unexpected element
* line    1:  list_with_two_types                      :  Required field missing
* line    8:  questions                                :  Required field missing
----------------------------------------

Issue Analytics

State:
Created 3 years ago
Comments:6 (1 by maintainers)

Top GitHub Comments

5reactions

mildebrandtcommented, Jul 31, 2020

No, we’ll leave it open as a reminder that people would like to have a line number in the output.

3reactions

mildebrandtcommented, Jul 31, 2020

Thanks for your interest in Yamale. I agree, the line number would be helpful. To be complete, we’d probably want to update the error class to hold the line numbers separately. It’ll take a little thought. Thanks for the start towards that.

For your code, be careful using . as the separator since that can be part of the key. We use that as a separator for the output, but internally we use something else. I can see how that may cause confusion when reading the output, and we may need to revisit that later.

Thanks!