[Read] Read Decimal LogicalType

Hello,

Do you have an example of reading a decimal column from a Parquet file?

I am stuck on the error “System.NotSupportedException: only 16 bytes of decimal length is supported”.

PhysicalType = FixedLenByteArray

I can't find a workaround, since it fails whenever I call Column.LogicalReader<T>.

Is there perhaps a method to get the raw data, like Column.PhysicalReader<T>?

Br Nick

What I tried so far:

case LogicalTypeEnum.Decimal:
      return column.LogicalReader<decimal?>().ReadAll(numRows)
          .Select(l => (object)l)
          .ToArray();

Issue Analytics

  • State: closed
  • Created 8 months ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
adamreeve commented, Jan 10, 2023

To handle nullable values you need to use the ReadBatch overload that takes defLevels and repLevels. For a column of scalar nullable values (not nested in an array or other struct), repLevels can be ignored. Something like this should work:

var typeLength = column.ColumnDescriptor.TypeLength;
var definitionLevel = column.ColumnDescriptor.MaxDefinitionLevel;
var values = new FixedLenByteArray[numRows];
var defLevels = new short[numRows];
var bytes = new byte[typeLength];

var decimalValues = new List<decimal?>();

// Per-byte place values (powers of 256), most significant byte first
var coefs = Enumerable.Range(0, column.ColumnDescriptor.TypeLength)
    .Reverse()
    .Select(iB => (decimal)Math.Pow(256, iB))
    .ToArray();

long totalRowsRead = 0;
while (totalRowsRead < numRows)
{
    var rowsRead = column.ReadBatch(numRows - totalRowsRead, defLevels, null, values, out var valuesRead);
    var valueIdx = 0;
    for (var i = 0; i < rowsRead; ++i)
    {
        if (defLevels[i] == definitionLevel)
        {
            Marshal.Copy(values[valueIdx].Pointer, bytes, 0, typeLength);
            // Convert the raw bytes of this value to a decimal
            decimalValues.Add(ConvertFixedLengthByteArrayToDecimal(bytes, column.ColumnDescriptor.TypeScale, coefs));
            valueIdx++;
        }
        else
        {
            decimalValues.Add(null);
        }
    }

    totalRowsRead += rowsRead;
}
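
The snippet above calls ConvertFixedLengthByteArrayToDecimal without showing its body. A minimal sketch of what such a helper might look like follows, relying on the Parquet convention that a fixed-length decimal is stored as a big-endian two's complement unscaled integer. The class and method names here are hypothetical and not part of ParquetSharp, and the signature differs from the call above: it uses System.Numerics.BigInteger instead of a coefs array, because decimal cannot represent 256^12 or larger. Values that need more than 96 bits of magnitude will still fail, since System.Decimal cannot hold them.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration; not part of ParquetSharp.
public static class DecimalDecoding
{
    // Interprets bigEndianBytes as a big-endian two's complement unscaled integer
    // and applies the given scale (number of digits after the decimal point).
    public static decimal FromBigEndian(byte[] bigEndianBytes, int scale)
    {
        if (scale < 0 || scale > 28)
        {
            throw new ArgumentOutOfRangeException(nameof(scale), "System.Decimal supports scales 0 to 28.");
        }

        // BigInteger(byte[]) expects little-endian two's complement, so reverse a copy.
        var littleEndian = (byte[])bigEndianBytes.Clone();
        Array.Reverse(littleEndian);
        var unscaled = new BigInteger(littleEndian);

        // The explicit conversion throws OverflowException if the value does not fit in a decimal.
        var unscaledDecimal = (decimal)unscaled;
        var divisor = (decimal)BigInteger.Pow(10, scale);
        return unscaledDecimal / divisor;
    }
}
```

With a helper along these lines, the call inside the loop could become something like DecimalDecoding.FromBigEndian(bytes, column.ColumnDescriptor.TypeScale), and the coefs array would no longer be needed.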
1 reaction
adamreeve commented, Jan 8, 2023

Hi @Platob, which version of ParquetSharp are you using, and do you know the byte width of the decimal data you are trying to read? If your data uses a physical type of int32 or int64 then support for reading this has been added in ParquetSharp 10.0.1-beta1 (#315).

Otherwise we only currently support reading fixed-length byte array based decimal data that uses 16 bytes, but it is possible to read the raw byte values. See https://github.com/G-Research/ParquetSharp/discussions/317#discussioncomment-4210430 for an example that just prints a hex representation of the byte values.
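
As a rough illustration of the hex-printing idea (this is not the code from the linked discussion), the sketch below assumes the bytes of a single value have already been copied into a managed byte[]. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper for inspecting raw fixed-length values; not part of ParquetSharp.
public static class RawValueDump
{
    // Formats the bytes as an uppercase hex string with no separators.
    public static string ToHex(byte[] raw)
    {
        // BitConverter.ToString yields a form like "00-00-30-39"; strip the dashes.
        return BitConverter.ToString(raw).Replace("-", string.Empty);
    }
}
```

Printing values this way can help confirm the byte width and byte order of the stored decimals before attempting a conversion.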
