
Column types are wrong with larger scan depth and empty values: how to handle empty values

See original GitHub issue

Hello, I am using ChoETL version 1.0.9.9. I like your library a lot 🥇 since it offers data type discovery. Some of this does not work in newer releases, so I opted to stay at that version.

I need help configuring this correctly so that type discovery works even when there are empty values.

With WithMaxScanRows(6), the data type of the Cost1, Cost2, Cost3, ... columns shows up wrongly as String when it should be double (a number). If I change it to WithMaxScanRows(3), it works fine.

using System.Data;   // DataTable, DataColumn
using ChoETL;        // ChoCSVReader

using (var choReader = new ChoCSVReader(someCsvFile).WithFirstLineHeader().WithMaxScanRows(3))
{
    var dataTable = choReader.AsDataTable();

    foreach (DataColumn column in dataTable.Columns)
    {
        // SQLGetType: my own helper (not shown) that inspects the column's type.
        // If scan rows is 3 it works fine, but if it is 6 I get string.
        // How can I ensure it gives me the proper data type even if there are empty data cells?
        var checkType = SQLGetType(column);
    }

    // ...
}

But when I set WithMaxScanRows(3), Cost1, Cost2, Cost3, ... show up correctly as double.


CSV file

Date,Project,Login,Area,Cost1,Cost2,Cost3,Cost4,Cost5
2018-03-19 23:00:00,jasonLogin,sr1,taskscompleted,21.0,0.0,0.0,0.0,0.0
2018-06-27 23:00:00,jasonLogin,bblackmon,aircraftdelivery,480.0,0.0,0.0,0.0,0.0
2017-12-19 23:00:00,jasonLogin,sfinishing1,hoursearned,94.73,0.23333333333333334,0.0,0.0,0.0
2018-03-27 23:00:00,jasonLogin,bblackmon,hoursearned,46.5,0.0,0.0,0.0,0.0
2017-12-05 23:00:00,jasonLogin,tmodrick,toolissues,0.0
2018-03-11 23:00:00,jasonLogin,jclark,hoursearned,72.39,0.0,0.0,0.0,0.0
2018-02-28 23:00:00,jasonLogin,jmartinez,housekeeping1,30.0,0.0,0.0,0.0,0.0
2018-02-17 23:00:00,jasonLogin,sfinishing1,budgetrevenue,2018.94,0.0,0.0,0.0,0.0
2018-05-30 23:00:00,jasonLogin,khall,hoursworked,45.7,0.0,0.0,0.0,0.0
2018-03-28 23:00:00,jasonLogin,finish1,budgetrevenue,4834.77,0.0,0.0,0.0,0.0
2017-12-15 23:00:00,jasonLogin,btwilley,aircraftmove1,0.0
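In the sample data, the fifth record (tmodrick, toolissues) carries only one cost value, so Cost2 to Cost5 are missing in it. A scan of three rows never reaches that record, while a scan of six does. The thread does not show how ChoETL's discovery treats such empty or missing cells, so the following is only a hypothetical sketch, not ChoETL code: it illustrates sampling-based column type inference that skips empty cells instead of letting them push a column to string. The names ColumnTypeSketch, InferType and InferColumns, and the long, double, DateTime, string precedence, are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper for illustration only; this is NOT ChoETL's implementation.
public static class ColumnTypeSketch
{
    // Returns the narrowest of long, double, DateTime or string that fits every
    // non-empty sample. Empty or whitespace-only cells are skipped, so they
    // cannot force an otherwise numeric column to string.
    public static Type InferType(IEnumerable<string> samples)
    {
        List<string> present = samples
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .Select(s => s.Trim())
            .ToList();

        if (present.Count == 0)
            return typeof(string); // no evidence at all: fall back to string

        if (present.All(s => long.TryParse(s, NumberStyles.Integer, CultureInfo.InvariantCulture, out _)))
            return typeof(long);
        if (present.All(s => double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out _)))
            return typeof(double);
        if (present.All(s => DateTime.TryParse(s, CultureInfo.InvariantCulture, DateTimeStyles.None, out _)))
            return typeof(DateTime);
        return typeof(string);
    }

    // Infers one type per header column from the first maxScanRows data rows.
    // A row shorter than the header (like the "toolissues" rows) contributes an
    // empty value for each missing field.
    public static Dictionary<string, Type> InferColumns(
        string[] header, IEnumerable<string[]> rows, int maxScanRows)
    {
        List<string[]> sample = rows.Take(maxScanRows).ToList();
        var result = new Dictionary<string, Type>();
        for (int i = 0; i < header.Length; i++)
        {
            int col = i; // local copy captured by the lambda below
            result[header[i]] = InferType(sample.Select(r => col < r.Length ? r[col] : ""));
        }
        return result;
    }
}
```

Fed the header and the first six records above with maxScanRows = 6, this sketch yields double for Cost1 through Cost5 and DateTime for Date, because the missing fields in the toolissues record are skipped rather than counted as text. A variant that treated "" as a failed numeric parse would instead report string for Cost2 to Cost5, which is one possible explanation of the symptom, not a confirmed one.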

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

2 reactions
Cinchoo commented, Feb 5, 2020

In the NuGet package manager, check the ‘Include Prerelease’ checkbox at the top.


0 reactions
Cinchoo commented, Feb 20, 2020

Well, the RecordFieldTypeAssessment event is called for each record, letting you take control of assessing each field's type from its contents.

Alternatively, there is another event, MembersDiscovered. It is called once, and lets you assign field types up front without considering the payload.

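// p is the ChoCSVReader instance that is loading the file.
p.MembersDiscovered += (o, e) =>
{
    // e.Value holds the discovered field types keyed by field name;
    // assigning an entry overrides the type used for that field.
    var ft = e.Value;

    ft["Id.x"] = typeof(short);
    ft["Name"] = typeof(string);
    ft["City"] = typeof(string);
};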

PS: I found a bug where this event gets called twice during a single CSV load. I put a fix in the 1.1.0.5-beta6 package.
