ParquetSharp should take into account "isAdjustedToUTC" for Timestamps


Quality of life improvement.

ParquetSharp by default returns DateTimes with DateTime.Kind equal to Unspecified. ParquetSharp could be a bit smarter and use the isAdjustedToUTC field from the Parquet format to set DateTime.Kind to DateTimeKind.Utc.

By default, ParquetSharp uses new DateTime(), which returns a DateTime whose Kind is DateTimeKind.Unspecified.
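To illustrate what is being asked for, here is a minimal sketch that uses only the .NET base library. The class DateTimeKindDemo and the method FromUnixMicros are hypothetical names made up for this example; this is not ParquetSharp's code. It assumes a timestamp stored as microseconds since the Unix epoch and picks the DateTimeKind from an isAdjustedToUtc flag.

```csharp
using System;

// Hypothetical helper for illustration only; not part of ParquetSharp.
public static class DateTimeKindDemo
{
    // Converts a count of microseconds since the Unix epoch into a DateTime,
    // choosing the Kind from the isAdjustedToUtc flag.
    public static DateTime FromUnixMicros(long micros, bool isAdjustedToUtc)
    {
        // A .NET tick is 100 ns, so one microsecond is 10 ticks.
        long ticks = DateTime.UnixEpoch.Ticks + micros * 10;

        // UTC-adjusted values get Kind = Utc; everything else stays Unspecified.
        var kind = isAdjustedToUtc ? DateTimeKind.Utc : DateTimeKind.Unspecified;
        return new DateTime(ticks, kind);
    }
}
```

The Kind matters downstream: ToUniversalTime() leaves a Utc value untouched, but treats an Unspecified value as local time and shifts it by the machine's UTC offset.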

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
Nisden commented, May 30, 2022

Hi @adamreeve

Just a small follow-up: I have been feeding you the wrong information before.

Azure Synapse Serverless only supports isAdjustedToUTC=true. And we do actually save "locally adjusted" DateTimes to Parquet, so getting DateTimeKind.Utc back is kind of a pain.

For now I have implemented a workaround:

// Read converter factory that can force DateTimeKind.Unspecified on timestamp columns,
// regardless of the isAdjustedToUtc flag stored in the file.
public class CustomLogicalReadConverterFactory : LogicalReadConverterFactory
{
    private readonly bool forceDateTimeKindUnspecified;

    public CustomLogicalReadConverterFactory(bool forceDateTimeKindUnspecified)
    {
        this.forceDateTimeKindUnspecified = forceDateTimeKindUnspecified;
    }

    public override Delegate GetConverter<TLogical, TPhysical>(ColumnDescriptor columnDescriptor, ColumnChunkMetaData columnChunkMetaData)
    {
        // Required (non-nullable) DateTime columns.
        if (forceDateTimeKindUnspecified && typeof(TLogical) == typeof(DateTime))
        {
            var timestampType = (TimestampLogicalType)columnDescriptor.LogicalType;
            switch (timestampType.TimeUnit)
            {
                case TimeUnit.Millis:
                    return (LogicalRead<DateTime, long>.Converter)((s, _, d, _) => LogicalRead.ConvertDateTimeMillis(s, d, DateTimeKind.Unspecified));
                case TimeUnit.Micros:
                    return (LogicalRead<DateTime, long>.Converter)((s, _, d, _) => LogicalRead.ConvertDateTimeMicros(s, d, DateTimeKind.Unspecified));
            }
        }

        // Optional (nullable) DateTime columns.
        if (forceDateTimeKindUnspecified && typeof(TLogical) == typeof(DateTime?))
        {
            var timestampType = (TimestampLogicalType)columnDescriptor.LogicalType;
            switch (timestampType.TimeUnit)
            {
                case TimeUnit.Millis:
                    return (LogicalRead<DateTime?, long>.Converter)(
                        (source, rep, dest, def) => LogicalRead.ConvertDateTimeMillis(source, rep, dest, def, DateTimeKind.Unspecified));
                case TimeUnit.Micros:
                    return (LogicalRead<DateTime?, long>.Converter)(
                        (source, rep, dest, def) => LogicalRead.ConvertDateTimeMicros(source, rep, dest, def, DateTimeKind.Unspecified));
                case TimeUnit.Nanos:
                    return (LogicalRead<TPhysical?, TPhysical>.Converter)LogicalRead.ConvertNative;
            }
        }

        // Everything else uses ParquetSharp's default conversion.
        return base.GetConverter<TLogical, TPhysical>(columnDescriptor, columnChunkMetaData);
    }
}
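For readers wondering how a factory like this gets used, here is a rough sketch of plugging it into a reader. It assumes the ParquetSharp NuGet package, the CustomLogicalReadConverterFactory class from the comment above, and a file whose first column is a required timestamp column. ReadWithCustomFactory and ReadFirstColumn are hypothetical names for this example. As far as I know, the factory has to be assigned before any logical column reader is created.

```csharp
using System;
using ParquetSharp;

// Hypothetical wrapper for illustration; relies on CustomLogicalReadConverterFactory defined above.
public static class ReadWithCustomFactory
{
    // Reads the first column of the first row group as DateTime values.
    public static DateTime[] ReadFirstColumn(string path)
    {
        using var fileReader = new ParquetFileReader(path);

        // Swap in the custom factory before creating any logical readers.
        fileReader.LogicalReadConverterFactory =
            new CustomLogicalReadConverterFactory(forceDateTimeKindUnspecified: true);

        using var rowGroupReader = fileReader.RowGroup(0);
        int numRows = checked((int)rowGroupReader.MetaData.NumRows);

        // Assumes column 0 is a non-nullable timestamp column.
        using var columnReader = rowGroupReader.Column(0).LogicalReader<DateTime>();
        return columnReader.ReadAll(numRows);
    }
}
```

With forceDateTimeKindUnspecified set to true, the returned values should have Kind equal to Unspecified even when the file was written with isAdjustedToUtc=true.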

And I'll try to open a ticket with Microsoft for an actual solution that will allow us to use isAdjustedToUtc=false.

1 reaction
adamreeve commented, Mar 22, 2022

Yes, the proposed change only affects reading Parquet data into dotnet DateTime values. When writing DateTimes, we currently default to using isAdjustedToUtc: true, and there isn't really a way to infer what this should be, so we're not suggesting changing how data is written.

So from what you’re saying, it sounds like you already need to specify isAdjustedToUtc: false when writing files for compatibility with Azure Synapse Serverless. And there probably isn’t a need to make the default reading behaviour configurable then.
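Since the write-side default is isAdjustedToUtc: true, a writer that wants a different value has to override the column's logical type explicitly. The sketch below shows one way to do that, assuming the ParquetSharp NuGet package. LocalTimestampWriter and the column name "Timestamp" are hypothetical, and whether isAdjustedToUtc: false is the right choice depends on what the consuming system supports.

```csharp
using System;
using ParquetSharp;

// Hypothetical helper for illustration only.
public static class LocalTimestampWriter
{
    // Writes a single timestamp column marked as not adjusted to UTC.
    public static void Write(string path, DateTime[] values)
    {
        var columns = new Column[]
        {
            // Override the default logical type (isAdjustedToUtc: true).
            new Column<DateTime>(
                "Timestamp",
                LogicalType.Timestamp(isAdjustedToUtc: false, timeUnit: TimeUnit.Micros))
        };

        using var file = new ParquetFileWriter(path, columns);
        using var rowGroup = file.AppendRowGroup();

        using (var writer = rowGroup.NextColumn().LogicalWriter<DateTime>())
        {
            writer.WriteBatch(values);
        }

        file.Close();
    }
}
```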

