Support for custom logical writers
Hello,
I would like to propose more options for registering custom logical writers and readers. Right now I add my own logical writer/reader support by inserting the necessary functions into the dicts LOGICAL_WRITERS and LOGICAL_READERS, like this:
class NanoTime:
    ''' Times with precision to nanoseconds '''
    def __init__(self, nanoseconds):
        # number of nanoseconds since epoch
        self.nanoseconds = nanoseconds
...
import fastavro

def read_nanotime(data, writer_schema=None, reader_schema=None):
    return NanoTime(data)

def prepare_nanotime(data, schema):
    if isinstance(data, NanoTime):
        return data.nanoseconds
    else:
        return data

fastavro.read.LOGICAL_READERS['long-nano-time'] = read_nanotime
fastavro.write.LOGICAL_WRITERS['long-nano-time'] = prepare_nanotime
This works well so I can use schemata like this:
{
    "name": "Time:1",
    "doc": "Time represented as nanoseconds since epoch.",
    "namespace": "common",
    "type": "long",
    "logicalType": "nano-time"
}
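To make the lookup concrete, here is a minimal stand-alone sketch of how a key such as 'long-nano-time' lines up with the schema above. It deliberately does not import fastavro: the two dicts are stand-ins for the real registries, and registry_key, encode and decode are hypothetical helpers. The assumption, implied by the registration code above, is that the key is the Avro type and the logicalType joined with a hyphen.

```python
class NanoTime:
    """Time with nanosecond precision (nanoseconds since epoch)."""

    def __init__(self, nanoseconds):
        self.nanoseconds = nanoseconds

    def __eq__(self, other):
        return isinstance(other, NanoTime) and other.nanoseconds == self.nanoseconds


# Stand-ins for fastavro's LOGICAL_WRITERS / LOGICAL_READERS (not the real dicts).
LOGICAL_WRITERS = {}
LOGICAL_READERS = {}


def prepare_nanotime(data, schema):
    # Convert NanoTime to the underlying long; pass anything else through.
    return data.nanoseconds if isinstance(data, NanoTime) else data


def read_nanotime(data, writer_schema=None, reader_schema=None):
    return NanoTime(data)


LOGICAL_WRITERS["long-nano-time"] = prepare_nanotime
LOGICAL_READERS["long-nano-time"] = read_nanotime


def registry_key(schema):
    # Assumed convention: "<type>-<logicalType>".
    return "{}-{}".format(schema["type"], schema["logicalType"])


def encode(datum, schema):
    # Hypothetical helper: apply the registered writer, if there is one.
    prepare = LOGICAL_WRITERS.get(registry_key(schema))
    return prepare(datum, schema) if prepare else datum


def decode(raw, schema):
    # Hypothetical helper: apply the registered reader, if there is one.
    read = LOGICAL_READERS.get(registry_key(schema))
    return read(raw, schema, schema) if read else raw


schema = {"name": "Time:1", "namespace": "common",
          "type": "long", "logicalType": "nano-time"}
raw = encode(NanoTime(1495610391000000000), schema)
restored = decode(raw, schema)
```

In this sketch raw is the plain integer that would be serialized as a long, and restored is a NanoTime again. Values that are already integers pass through encode unchanged.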
This might seem like a hack, but it works well because we always go through our wrapper library to parse fastavro payloads, so it is seamless for us. However, when the schema common.Time:1 appears in a union, we cannot use it directly. With a schema like:
["null", "common.Time:1"]
and a value such as
NanoTime(0)
writing fails with:
ValueError: NanoTime(nanoseconds = 1495610391000000000) (type <class 'NanoTime'>) do not match ['null', 'common.Time:1']
This is happening because all checks for logical writers in fastavro._write.validate are hardcoded:
if record_type == 'long':
    return (
        (isinstance(datum, (int, long,)) and
         LONG_MIN_VALUE <= datum <= LONG_MAX_VALUE) or
        isinstance(datum, (
            datetime.time, datetime.datetime, datetime.date))
    )
To resolve this, I suggest preparing the data inside validate, like this:
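logical_type = schema.get('logicalType')
...
if record_type == 'long':
    if logical_type is not None:
        # registry keys are '<type>-<logicalType>', e.g. 'long-nano-time'
        prepare = LOGICAL_WRITERS.get('{}-{}'.format(record_type, logical_type))
        if prepare is not None:
            datum = prepare(datum, schema)
    return (
        isinstance(datum, (int, long,)) and
        LONG_MIN_VALUE <= datum <= LONG_MAX_VALUE
    )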
This would add one more call to the prepare_ functions, but on the other hand we would get rid of one isinstance call, and it would make adding custom logical types possible. The code would also be more flexible, because knowledge about logical types (such as datetime.date) would no longer be spread over multiple places. What do you think? I'm willing to open a PR if you have no objections.
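To illustrate the effect on union resolution, here is a simplified, stand-alone sketch of a validator that runs the registered writer before the type check. It is not fastavro's actual validate: it only handles 'null', 'long' and unions, the LOGICAL_WRITERS dict is a stand-in, and the union holds the schema inline rather than through the named reference common.Time:1, so no schema lookup is needed.

```python
LONG_MIN_VALUE = -(1 << 63)
LONG_MAX_VALUE = (1 << 63) - 1


class NanoTime:
    def __init__(self, nanoseconds):
        self.nanoseconds = nanoseconds


def prepare_nanotime(datum, schema):
    return datum.nanoseconds if isinstance(datum, NanoTime) else datum


# Stand-in for fastavro.write.LOGICAL_WRITERS (not the real dict).
LOGICAL_WRITERS = {"long-nano-time": prepare_nanotime}


def validate(datum, schema):
    """Simplified validator covering only 'null', 'long' and unions."""
    if isinstance(schema, list):
        # A union matches if any of its branches matches.
        return any(validate(datum, branch) for branch in schema)
    record_type = schema if isinstance(schema, str) else schema["type"]
    if record_type == "null":
        return datum is None
    if record_type == "long":
        logical_type = schema.get("logicalType") if isinstance(schema, dict) else None
        if logical_type is not None:
            prepare = LOGICAL_WRITERS.get("{}-{}".format(record_type, logical_type))
            if prepare is not None:
                # Convert the custom object to its underlying long first.
                datum = prepare(datum, schema)
        return isinstance(datum, int) and LONG_MIN_VALUE <= datum <= LONG_MAX_VALUE
    return False


time_schema = {"name": "Time:1", "namespace": "common",
               "type": "long", "logicalType": "nano-time"}
union = ["null", time_schema]

matches_union = validate(NanoTime(0), union)
matches_plain_long = validate(NanoTime(0), ["null", "long"])
```

With the prepare step in place, NanoTime(0) matches the union branch that carries the logical type, while a plain "long" branch without a logicalType still rejects it, since no writer is looked up there.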
@scottbelden I just noticed that write_union calls validate, which would break some of this logic if we don't update validate. Adding custom, descriptive validators per known type is already on my to-do list, since that is effectively what the current validate implementation does, only with if conditions.

I think the idea of custom logical types is a neat one, but I'd be hesitant to add it as it's not part of the Avro specification. I'll try to take some time to look more closely at what you have written above and give some further thoughts in a day or two.
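For reference, the per-type validators mentioned in the first comment could be organized roughly as a dispatch table instead of a chain of if conditions. This is only a sketch under that assumption: the VALIDATORS dict and the validate_* function names are hypothetical, and only three Avro types plus unions are covered.

```python
LONG_MIN_VALUE = -(1 << 63)
LONG_MAX_VALUE = (1 << 63) - 1


def validate_null(datum, schema):
    return datum is None


def validate_long(datum, schema):
    return isinstance(datum, int) and LONG_MIN_VALUE <= datum <= LONG_MAX_VALUE


def validate_string(datum, schema):
    return isinstance(datum, str)


# Hypothetical registry: one validator per Avro type name.
VALIDATORS = {
    "null": validate_null,
    "long": validate_long,
    "string": validate_string,
}


def validate(datum, schema):
    if isinstance(schema, list):
        # A union matches if any of its branches matches.
        return any(validate(datum, branch) for branch in schema)
    record_type = schema if isinstance(schema, str) else schema["type"]
    validator = VALIDATORS.get(record_type)
    if validator is None:
        raise ValueError("no validator registered for type {!r}".format(record_type))
    return validator(datum, schema)
```

Because write_union would go through the same table, registering or replacing an entry in VALIDATORS would change union resolution and plain validation together.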