Can I read only a column's DataColumnStatistics before reading the entire column data?
Hi,
Is there a way to read only the DataColumnStatistics before loading the entire column data into memory? Essentially, I have a method that checks whether a search value could exist in a data column by comparing it against the column's Min and Max values:
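    // Returns true when value lies within the column's [MinValue, MaxValue] statistics range.
    public static bool ValueExistsInColumnRange<T>(this DataColumn column, T value)
        where T : IComparable<T>
    {
        // Without a value or statistics the range check cannot be made; report "not in range".
        if (value is null || column.Statistics.MinValue is null || column.Statistics.MaxValue is null)
            return false;

        return ((T)column.Statistics.MinValue).CompareTo(value) <= 0
            && ((T)column.Statistics.MaxValue).CompareTo(value) >= 0;
    }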
If the value falls outside that range, I can skip the entire column without loading its values into memory. However, I've noticed that ParquetRowGroupReader.ReadColumnAsync loads the entire column data into memory.
How can I load only the column statistics and, optionally, load the column data on demand?
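To make the desired behaviour concrete, here is a minimal sketch of the "statistics first, data on demand" pattern. It does not use Parquet.Net at all: `LazyColumn<T>` and `ColumnPruning` are hypothetical names made up for illustration, and the `loadValues` delegate stands in for whatever call would actually read the column.

```csharp
using System;

// Hypothetical stand-in (not a Parquet.Net type): statistics are available
// immediately, while the values are only materialised on first access.
public sealed class LazyColumn<T> where T : struct, IComparable<T>
{
    private readonly Lazy<T[]> _values;

    public LazyColumn(T? min, T? max, Func<T[]> loadValues)
    {
        Min = min;
        Max = max;
        _values = new Lazy<T[]>(loadValues);
    }

    public T? Min { get; }
    public T? Max { get; }

    // True once the expensive load has actually happened.
    public bool IsLoaded => _values.IsValueCreated;

    // First access triggers loadValues; later accesses reuse the result.
    public T[] Values => _values.Value;
}

public static class ColumnPruning
{
    // Range check on statistics only; never touches the column values.
    public static bool MightContain<T>(LazyColumn<T> column, T value)
        where T : struct, IComparable<T>
    {
        // Assumption of this sketch: missing statistics mean "cannot prune".
        if (column.Min is null || column.Max is null)
            return true;

        return column.Min.Value.CompareTo(value) <= 0
            && column.Max.Value.CompareTo(value) >= 0;
    }

    // Loads the values only when the statistics cannot rule the value out.
    public static bool Contains<T>(LazyColumn<T> column, T value)
        where T : struct, IComparable<T>
    {
        if (!MightContain(column, value))
            return false;

        return Array.IndexOf(column.Values, value) >= 0;
    }
}
```

Note that this sketch treats missing statistics as "might contain", which is more conservative than my method above (it returns false in that case). Which behaviour is right depends on whether a column without statistics should be skipped or scanned.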
Issue Analytics
- State:
- Created 7 months ago
- Comments:5 (5 by maintainers)
@aloneguid no, I don’t know GeoParquet 😃 We’re doing our own research comparing different storage formats for big real-time data.
Thanks, @aloneguid. I already have a working solution for that, will PR it once I’m done testing.