"Error parsing date..."
Describe the bug
I’m attempting to train a model from data in a large JSON array. An example of the relevant slice looks like:
```json
[
  {
    "date": "1941-03-01",
    "foo": "foo",
    "bar": "bar"
  },
  {
    "date": "2001-05-15",
    "foo": "foo",
    "bar": "bar"
  }
]
```
When I run the following command:
```shell
ludwig train --dataset input.json --data_format json -c config.yaml
```
After a few moments, I get the following output for every element of the array:
```
Error parsing date: 1941-03-01 00:00:00 with error strptime() argument 1 must be str, not Timestamp Please provide a datetime format that parses it in the preprocessing section of the date feature in the config. The preprocessing fill in value will be used.For more details: https://ludwig-ai.github.io/ludwig-docs/0.5/configuration/features/date_features/
Error parsing date: 2001-05-15 00:00:00 with error strptime() argument 1 must be str, not Timestamp Please provide a datetime format that parses it in the preprocessing section of the date feature in the config. The preprocessing fill in value will be used.For more details: https://ludwig-ai.github.io/ludwig-docs/0.5/configuration/features/date_features/
```
I’ve tried this with several variations of `config.yaml`, including each of the following:
```yaml
input_features:
  - name: date
    type: date
    preprocessing:
      datetime_format: "%Y-%m-%d %H:%M:%S"
```

```yaml
input_features:
  - name: date
    type: date
    preprocessing:
      datetime_format: "%Y-%m-%d"
```

```yaml
input_features:
  - name: date
    type: date
```
All 3 result in the same error message.
It looks like the date string is being parsed too early in the process. It has already been converted into a Timestamp object before it reaches strptime(), which causes the error.
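You can check this hypothesis with pandas alone, without Ludwig. The sketch below assumes that Ludwig's JSON loading goes through `pandas.read_json` with its default arguments. The comments below point at `read_json`, but the exact call inside Ludwig is not shown here. By default `read_json` uses `convert_dates=True` and `keep_default_dates=True`, and a column named exactly `date` is one of the names it treats as date-like. As a result, the strings come back as `Timestamp` values. Passing `convert_dates=False` keeps them as plain strings.

```python
import io
from datetime import datetime

import pandas as pd

# The records from the report, with the missing commas added.
RAW = """[
  {"date": "1941-03-01", "foo": "foo", "bar": "bar"},
  {"date": "2001-05-15", "foo": "foo", "bar": "bar"}
]"""

# Default arguments: "date" is one of the column names pandas treats as
# date-like, so the strings are converted to datetime64 values.
df_default = pd.read_json(io.StringIO(RAW))
first = df_default["date"].iloc[0]
print(type(first).__name__)  # Timestamp

# Handing a Timestamp to strptime raises a TypeError, as in the log above.
strptime_failed = False
try:
    datetime.strptime(first, "%Y-%m-%d")
except TypeError as exc:
    strptime_failed = True
    print(exc)

# convert_dates=False leaves the column as plain strings.
df_raw = pd.read_json(io.StringIO(RAW), convert_dates=False)
raw_first = df_raw["date"].iloc[0]
print(type(raw_first).__name__, datetime.strptime(raw_first, "%Y-%m-%d"))
```

This also explains why changing `datetime_format` in the config has no effect: the format is never applied, because the value is no longer a string by the time it reaches strptime().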
I’m not familiar enough with the codebase to know if this is a bug, an issue with JSON as the input source, or something else, but some guidance would be appreciated!
Expected behavior
The date should be parsed according to the provided format.
Environment:
- OS: macOS 12.3.1
- Python version: 3.10.5
- Ludwig version: 0.5.4
Comments
@noahlh Thanks for looking further into the `read_json` docs. Setting `convert_dates=False` seems like a reasonable suggestion. However, other data-loading libraries may have similar inconsistencies, so making sure that `date_feature` can handle data that is already datetime-typed seems like a good change to make.

OK, I did some exploring, and as a temporary workaround I’ve successfully patched `ludwig/features/date_feature.py` as follows:

Changed:

To:

I’m not going to submit this as a patch because it doesn’t fix the root cause, but it does get things working. I’d be happy to help submit a proper patch if someone can point me in the right direction to where this conversion might be happening upstream.
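The maintainer's suggestion is to make the date feature accept values that are already datetime-typed. The following sketch shows one way that could look. `parse_date_value` is a hypothetical name used for illustration only. It is not Ludwig's API, and it is not a reconstruction of the patch elided above.

The helper passes `pandas.Timestamp`, `datetime`, and `date` values through, converting them to a plain `datetime`. It calls `strptime` only on strings. When no format is given, it assumes ISO 8601 input, which is also an assumption.

```python
from datetime import date, datetime

import pandas as pd


def parse_date_value(value, datetime_format=None):
    """Hypothetical helper (not Ludwig's API): coerce one raw value to datetime.

    Values a loader such as pandas.read_json has already converted are passed
    through; only strings are parsed with the configured format.
    """
    if isinstance(value, pd.Timestamp):
        # Timestamp subclasses datetime; normalize to a plain datetime.
        return value.to_pydatetime()
    if isinstance(value, datetime):
        return value
    if isinstance(value, date):
        # A bare date has no time component; use midnight.
        return datetime(value.year, value.month, value.day)
    if isinstance(value, str):
        if datetime_format is not None:
            return datetime.strptime(value, datetime_format)
        # No format configured: assume ISO 8601, e.g. "2001-05-15".
        return datetime.fromisoformat(value)
    raise TypeError(f"unsupported date value: {value!r}")


# The same date arriving pre-converted and as a raw string.
print(parse_date_value(pd.Timestamp("1941-03-01"), "%Y-%m-%d"))
print(parse_date_value("1941-03-01", "%Y-%m-%d"))
```

With a guard like this, the configured `datetime_format` only matters for string input. Values already converted by the loader bypass it, so the choice of data-loading library no longer affects whether parsing succeeds.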