[Enhancement] : Better error message with line number
Hi @mildebrandt and the other contributors. First, thank you for this amazing library. I honestly think this should be the standard way to validate a YAML schema. It would be amazing if you could reach out to the “ruamel.yaml” and “pyyaml” projects so they could include your library and make it the standard way to validate schemas in YAML.
Anyway, I started using Yamale and found myself wishing for the line number in the YAML file where the errors are. I am using ruamel.yaml (I don't really know pyyaml, but I guess it should be possible there too). I use ruamel.yaml in round-trip mode, extract the line number of each key, and copy your format (with the '.' for nested keys). The result is a one-level dict that maps each key path to its line number.
Below is my code with comments, as well as an example with its output.
I hope you will find it a nice addition. Ideally, you would add the line numbers to yamale.YamaleError.results.errors.
from pathlib import Path
from typing import Dict
import yamale
from ruamel.yaml import YAML
from ruamel.yaml.comments import CommentedMap
def _get_lc_dict_helper(data: CommentedMap, dict_key_line: Dict[str, int], parentkey: str = "") -> Dict[str, int]:
    """
    Recursive helper function to fetch the line info of each key in the config yaml file.
    Built to be called inside of `_get_lc_dict`.
    """
    sep = "."  # Don't modify, it matches the "keys" returned in the errors of the yamale lib.
    keys_indexes = None
    # Sequences: iterate over the indexes.
    try:
        if len(data) > 0:
            keys_indexes = range(len(data))
    except TypeError:
        pass
    # Mappings: iterate over the keys (overrides the index range above).
    try:
        keys = data.keys()
        keys_indexes = keys
    except AttributeError:
        pass
    if keys_indexes is None:
        return dict_key_line  # return condition from recursion
    for key in keys_indexes:
        if parentkey != "":
            keyref = parentkey + sep + str(key)
        else:
            keyref = str(key)
        try:
            # lc.data holds 0-based line numbers, hence the + 1.
            lnum = data.lc.data[key][0] + 1
            if keyref in dict_key_line:
                print(
                    f"WARNING : key '{keyref}' is NOT UNIQUE, at lines {dict_key_line[keyref]:>4} and {lnum:>4}."
                    f" (overwriting)."
                )
            dict_key_line[keyref] = lnum
            # print(f"line {lnum:<3} : {keyref}")
            _get_lc_dict_helper(data[key], dict_key_line, keyref)  # recursion
        except AttributeError:
            # Plain scalars (e.g. str) have no `lc` attribute: nothing to record.
            pass
    return dict_key_line
def _get_lc_dict(path: Path) -> Dict[str, int]:
    """
    Helper function to trace back the line number in the yaml file for each key.
    Built to be called inside of `validate`.
    Parameters
    ----------
    path : Path
        Path to the config yaml file (not the schema).
    Returns
    -------
    Dict[str, int]
        Maps the keys to their line number, the line counter (lc).
        This dictionary has only 1 level and its keys correspond to the ones reported by the yamale lib.
    """
    dict_key_line: Dict[str, int] = {}
    with YAML(typ="rt") as yaml:
        # All documents of the file share the same flat dictionary.
        for data in yaml.load_all(path):
            dict_key_line = _get_lc_dict_helper(data, dict_key_line)
    return dict_key_line
def validate(path_schema: Path, path_data: Path):
    """
    Validates the config yaml file according to the schema yaml file.
    Will be silent if good and will exit the program if there is an error,
    and will output a detailed error message to fix the config file.
    Parameters
    ----------
    path_schema : Path
        Path to the schema yaml file.
    path_data : Path
        Path to the config yaml file.
    """
    # Create a schema object
    schema = yamale.make_schema(path=path_schema, parser="ruamel")
    # Create a Data object
    config = yamale.make_data(path=path_data, parser="ruamel")
    # Validate data against the schema. Raises yamale.YamaleError (a ValueError) if data is invalid.
    try:
        yamale.validate(schema, config)
        print("Validation success!👍")
    except yamale.YamaleError as e:
        errmsg = "Validation failed!\n"
        lc = _get_lc_dict(path_data)
        for result in e.results:
            title1 = "Schema"
            title2 = "Config"
            sep = f"{'-'*40}\n"
            errmsg += f"{title1:<10} : {result.schema}\n{title2:<10} : {result.data}\n{sep}"
            for error in result.errors:
                # Each error looks like "key.path: message"; split on the first colon only.
                keyerr = error.split(":", 1)
                keypath = keyerr[0]
                err = keyerr[1]
                l_num = lc.get(keypath, "?")  # "?" when no line number is known
                errmsg += f"* line {l_num:>4}: {keypath:<40} : {err}\n"
            errmsg += f"{sep}"
        print(errmsg)
        raise SystemExit(1)  # same effect as exit(1), without relying on the site module
Then in another file I use it like this:
curr_path = Path(__file__).parent
path_schema = (curr_path / "schema.yaml").resolve()
path_data = (curr_path / "data.yaml").resolve()
validate(path_schema=path_schema, path_data=path_data)
I guess the code can be improved. I tested quite a lot, but I admit I did not try special cases. It should not crash since I handle the errors; at worst you don't get the line number (just a '?'). I tried to be careful to use only “public” attributes from ruamel.yaml (CommentedMap.lc.data[key][0]). (I am on Python 3.8, ruamel.yaml 0.16.6 and yamale 3.0.1.)
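To make the '?' fallback concrete, here is a minimal sketch of the lookup step in isolation. The helper name `format_error` and the hard-coded line map are made up for illustration; only the "keypath: message" shape of the error strings comes from the code above.

```python
from typing import Dict


def format_error(error: str, line_map: Dict[str, int]) -> str:
    """Split a 'keypath: message' string and prefix it with a line number."""
    # Split on the first colon only, so colons inside the message survive.
    keypath, _, message = error.partition(":")
    # Fall back to '?' when the key path has no recorded line.
    line = line_map.get(keypath, "?")
    return f"* line {line:>4}: {keypath} : {message.strip()}"


# Hypothetical line map, shaped like the dict built by _get_lc_dict.
line_map = {"list_with_two_types.3.name": 7}
print(format_error("list_with_two_types.3.name: '35' is not a str.", line_map))
print(format_error("unknown.key: Required field missing", line_map))
```

The `>4` format spec works for both the integer line number and the '?' string, which is why the fallback does not need special handling.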
Here is an example. schema.yaml:
list_with_two_types: list(str(), include('variant'))
questions: list(include('question'))
---
variant:
  rsid: str()
  name: str()
# Comment 1
question:
  choices: list(include('choices')) # Comment 10
  questions: list(include('question'), required=False)
choices:
  id: str()
---
variant2:
  rsid2: str()
  name2: str()
extra:
  abc: int()
  def: num()
# Comment 1
question2:
  choices2: list(include('choices')) # Comment 10
  questions2: list(include('question'), required=False)
data.yaml:
list_with_two_types:
  - name: "some SNP"
    rsid: "rs123"
  - "some"
  - "thing"
  - rsid: "rs312"
    name: 35
questions:
  - choices:
      - id: "id_str"
      - id: "id_str1"
    questions:
      - choices:
          - id: "id_str"
          - id: 66
---
list_with_two_types2:
  - name2: "some SNP"
    rsid2: "rs123"
  - "some2"
  - "thing2"
  - rsid2: "rs312"
    name2: 35
questions2:
  - choices2:
      - id2: "id_str"
      - id2: "id_str1"
    questions2:
      - choices2:
          - id2: "id_str"
          - id2: "id_str1"
And this is the output I am getting:
Validation failed!
Schema : C:\Users\PC-G\Documents\Work\Workspace\sens\config\schema.yaml
Config : C:\Users\PC-G\Documents\Work\Workspace\sens\config\data.yaml
----------------------------------------
* line 6: list_with_two_types.3 : '{'rsid': 'rs312', 'name': 35}' is not a str.
* line 7: list_with_two_types.3.name : '35' is not a str.
* line 15: questions.0.questions.0.choices.1.id : '66' is not a str.
----------------------------------------
Schema : C:\Users\PC-G\Documents\Work\Workspace\sens\config\schema.yaml
Config : C:\Users\PC-G\Documents\Work\Workspace\sens\config\data.yaml
----------------------------------------
* line 24: questions2 : Unexpected element
* line 17: list_with_two_types2 : Unexpected element
* line 1: list_with_two_types : Required field missing
* line 8: questions : Required field missing
----------------------------------------
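One thing visible in the second block of the output: `list_with_two_types` and `questions` are reported as missing from the second document, yet they get lines 1 and 8, which belong to the first document. That happens because a single flat dict is shared by all documents. A possible refinement, sketched below, keeps one line map per document. The name `lookup_line` and the hard-coded maps are illustrative only, and the sketch assumes that the entries of `e.results` come in the same order as the documents in the data file.

```python
from typing import Dict, List, Union

# One line map per YAML document, in file order.
# The numbers are the ones shown in the output above.
line_maps: List[Dict[str, int]] = [
    {"list_with_two_types": 1, "questions": 8},      # first document
    {"list_with_two_types2": 17, "questions2": 24},  # second document
]


def lookup_line(doc_index: int, keypath: str) -> Union[int, str]:
    """Return the line of `keypath` inside one document, or '?' if absent."""
    return line_maps[doc_index].get(keypath, "?")


print(lookup_line(1, "questions2"))           # key exists in the second document
print(lookup_line(1, "list_with_two_types"))  # missing there, so no line number
```

With this layout a "Required field missing" error no longer borrows a line number from another document.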
No, we’ll leave it open as a reminder that people would like to have a line number in the output.
Thanks for your interest in Yamale. I agree, the line number would be helpful. To be complete, we’d probably want to update the error class to hold the line numbers separately. It’ll take a little thought. Thanks for the start towards that.
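As a rough idea of what "holding the line numbers separately" could look like, here is a small sketch. `LocatedError` is a hypothetical name and is not part of Yamale; it only shows the key path, message, and line kept as separate fields instead of one formatted string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocatedError:  # hypothetical class, not Yamale's API
    keypath: str
    message: str
    line: Optional[int] = None  # None when the line is unknown

    def __str__(self) -> str:
        # Render the line only at display time; the fields stay separate.
        where = f"line {self.line}" if self.line is not None else "line ?"
        return f"{where}: {self.keypath}: {self.message}"


err = LocatedError("questions.0.questions.0.choices.1.id", "'66' is not a str.", 15)
print(err)
print(LocatedError("questions", "Required field missing"))
```

Callers that want the number (for an editor integration, say) can read `err.line` directly instead of parsing the message.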
For your code, be careful using '.' as the separator since that can be part of the key. We use that as a separator for the output, but internally we use something else. I can see how that may cause confusion when reading the output, and we may need to revisit that later. Thanks!
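The separator problem can be reproduced with plain dicts. The sketch below (the function `collect_paths` is made up for this illustration and does not reflect what Yamale does internally) keys each entry by a tuple of path parts and shows that two distinct tuples can render to the same dotted string.

```python
from typing import Any, Dict, Tuple

KeyPath = Tuple[Any, ...]


def collect_paths(node: Any, prefix: KeyPath = ()) -> Dict[KeyPath, str]:
    """Map each tuple path in nested dicts/lists to its dotted rendering."""
    paths: Dict[KeyPath, str] = {}
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return paths  # scalar: nothing below it
    for key, value in items:
        path = prefix + (key,)
        paths[path] = ".".join(str(part) for part in path)
        paths.update(collect_paths(value, path))
    return paths


# A key containing '.' and a nested key that render identically.
data = {"a.b": 1, "a": {"b": 2}}
paths = collect_paths(data)
for path, dotted in paths.items():
    print(path, "->", dotted)
```

Using the tuple as the dictionary key for line numbers, and joining with '.' only when printing, would avoid the collision.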