
Can I read a column's DataColumnStatistics before reading the entire column data?


Hi

Is there a way to read only the DataColumnStatistics before actually loading the entire column data into memory? Essentially, I have a method that checks whether the search value can exist in the data column by comparing it against the column's Min and Max values:

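// Returns false when `value` lies outside [MinValue, MaxValue], so the column can be skipped.
// A true result only means the value *may* be present; the data still has to be read to confirm it.
// Note: missing statistics also return false here, which would skip a column that might contain the value.
public static bool ValueExistsInColumnRange<T>(this DataColumn column, T value)
    where T : IComparable<T>
{
    if (value is null || column.Statistics.MinValue is null || column.Statistics.MaxValue is null ||
        ((T)column.Statistics.MinValue).CompareTo(value) > 0 || ((T)column.Statistics.MaxValue).CompareTo(value) < 0)
        return false;

    return true;
}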

If the value falls outside the column's range, I can skip the entire column without loading its values into memory. However, I've noticed that ParquetRowGroupReader.ReadColumnAsync loads the entire column data into memory. How can I load only the column statistics, and optionally load the column data on demand?
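The min/max check in the question is the usual statistics-based pruning idea: per-chunk statistics can rule out chunks that cannot contain the value, so only the remaining chunks need to be decoded. The sketch below is not Parquet.Net API. It uses hypothetical in-memory `(Min, Max)` tuples standing in for each row group's column statistics, plus hypothetical helpers `MayContain` and `RowGroupsToRead`, to show which row groups would be read and which skipped.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-row-group statistics for one int column: (Min, Max), null when absent.
// These stand in for values a Parquet file's metadata would provide; they are not Parquet.Net types.
var rowGroupStats = new List<(int? Min, int? Max)>
{
    (1, 10),      // row group 0
    (11, 20),     // row group 1
    (null, null), // row group 2: no statistics written
    (21, 30),     // row group 3
};

// A chunk can be skipped only when its statistics prove the value is out of range.
// Missing statistics mean "unknown", so that chunk must still be read.
static bool MayContain<T>(T? min, T? max, T value) where T : struct, IComparable<T>
{
    if (min is null || max is null)
        return true;
    return min.Value.CompareTo(value) <= 0 && max.Value.CompareTo(value) >= 0;
}

// Collect the indices of row groups whose column data would actually need to be loaded.
static List<int> RowGroupsToRead(List<(int? Min, int? Max)> stats, int value)
{
    var result = new List<int>();
    for (int i = 0; i < stats.Count; i++)
    {
        if (MayContain(stats[i].Min, stats[i].Max, value))
            result.Add(i);
    }
    return result;
}

var toRead = RowGroupsToRead(rowGroupStats, 15);
Console.WriteLine(string.Join(", ", toRead)); // prints "1, 2"
```

One design choice differs from the method in the question: here a chunk without statistics is treated as "may contain" rather than skipped, so pruning can never drop matching rows. Applying this to the question would still require the reader to expose each column chunk's statistics without decoding its data before ReadColumnAsync is called.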

Issue Analytics

  • State: closed
  • Created 7 months ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
mirosuav commented, Apr 19, 2023

@aloneguid No, I don't know GeoParquet 😃 We're doing our own research comparing different storage formats for big real-time data.

1 reaction
mirosuav commented, Apr 18, 2023

Thanks, @aloneguid. I already have a working solution for that, will PR it once I’m done testing.
