ChoFixedLengthRecordWriter issue with handling double quotes
We are facing two issues with ChoFixedLengthRecordWriter when a field value contains a double quote, as detailed below.
ISSUE #1: When a field value contains a double quote character, ChoFixedLengthRecordWriter doubles it by default. The extra double quote characters widen the column that contains them and shift the starting positions of all subsequent columns. Since this is a fixed length file, each column should occupy exactly the size specified for it.
For example, given the model data below:
List<Employee> employee = new List<Employee>()
{
new Employee() { Id = 20, Name = "John Smith", Address = "PO BOX 12165", Age = "25" },
new Employee() { Id = 21, Name = "Bob Kevin", Address = "123 NEW LIVERPOOL RD \"APT 12\"", Age = "30" },
new Employee() { Id = 22, Name = "Jack Robert", Address = "PO BOX 123", Age = "40" }
};
It generates a fixed length file in which the Age column value for the second record is shifted by two positions, since the writer added two double quotes (one for each occurrence of a double quote in the Address column of the second record).
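The shift can be reproduced without the library. The sketch below is a minimal stand-in, not ChoETL code: `FormatField` and `BuildLine` are hypothetical helpers, the column widths (Address 30, Age 3) are assumed for illustration, and the order of operations (size the value first, then double the quotes) is an assumption chosen because it matches the two-position shift described above.

```csharp
using System;

public static class FixedWidthShiftDemo
{
    // Hypothetical helper (not ChoETL code): size the value to the column width,
    // then optionally double each double quote.
    public static string FormatField(string value, int width, bool doubleQuotes)
    {
        string sized = value.Length > width
            ? value.Substring(0, width)
            : value.PadRight(width);
        return doubleQuotes ? sized.Replace("\"", "\"\"") : sized;
    }

    // Builds one record from an Address column followed by an Age column.
    public static string BuildLine(string address, string age, bool doubleQuotes)
    {
        const int addressWidth = 30; // assumed width
        const int ageWidth = 3;      // assumed width
        return FormatField(address, addressWidth, doubleQuotes)
             + FormatField(age, ageWidth, doubleQuotes);
    }

    public static void Main()
    {
        string address = "123 NEW LIVERPOOL RD \"APT 12\"";
        string plain = BuildLine(address, "30", doubleQuotes: false);
        string escaped = BuildLine(address, "30", doubleQuotes: true);

        // Start position of the Age value in each record.
        Console.WriteLine(plain.IndexOf("30", StringComparison.Ordinal));   // 30
        Console.WriteLine(escaped.IndexOf("30", StringComparison.Ordinal)); // 32
    }
}
```

With quote doubling turned off, Age starts at the declared offset; with it turned on, the two extra characters in Address push Age two positions to the right.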
ISSUE #2: We don’t want the double quote to be doubled when it is present in the field value. It would be great if you could add a configuration setting to enable or disable this behavior. We would like to have this setting for both the CSV and Fixed Length writers.
As a temporary workaround, we set the ‘QuoteChar’ configuration to a character that is not expected in the flat file, as shown below. Since the ‘~’ (tilde) character never occurs in the file, the if condition below, which replaces each double quote with two double quotes, is never satisfied.
// Temporary fix
config.QuoteChar = '~'; // tilde character
// ChoFixedLengthRecordWriter implementation
if (fieldValue.Contains(Configuration.QuoteChar))
{
fieldValue = fieldValue.Replace(Configuration.QuoteChar.ToString(), Configuration.DoubleQuoteChar);
}
But since we change the QuoteChar configuration to the tilde character, ChoFixedLengthRecordWriter ends up using that character to enclose the field value (it adds a ‘~’ character at the start and end of the field value) when the “QuoteField” column configuration is set to ‘true’, which is not correct. A configuration setting that enables or disables replacing a single double quote with two double quotes would solve this problem. Please help.
// ChoFixedLengthRecordWriter implementation
if (quoteValue)
fieldValue = "{1}{0}{1}".FormatString(fieldValue, Configuration.QuoteChar);
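To illustrate the kind of setting we have in mind, here is a minimal sketch that separates the two behaviors. It is not the library's implementation: `PrepareField` and the `escapeQuotes` flag are hypothetical names used only for this example, and the real option may look different.

```csharp
using System;

public static class QuoteHandlingSketch
{
    // Hypothetical helper: escapeQuotes controls doubling of the quote character
    // inside the value; quoteField controls enclosing the value in quote characters.
    // The two switches are independent, so QuoteChar can stay a real double quote.
    public static string PrepareField(string fieldValue, char quoteChar,
                                      bool escapeQuotes, bool quoteField)
    {
        string quote = quoteChar.ToString();

        if (escapeQuotes && fieldValue.Contains(quote))
            fieldValue = fieldValue.Replace(quote, quote + quote);

        if (quoteField)
            fieldValue = quote + fieldValue + quote;

        return fieldValue;
    }

    public static void Main()
    {
        string value = "RD \"APT 12\"";

        // Escaping disabled: the value keeps its original length.
        Console.WriteLine(PrepareField(value, '"', escapeQuotes: false, quoteField: false));

        // Escaping disabled, enclosing enabled: the field is still wrapped in '"', not '~'.
        Console.WriteLine(PrepareField(value, '"', escapeQuotes: false, quoteField: true));
    }
}
```

With a switch like this, the tilde workaround is unnecessary, because disabling the escaping no longer changes which character is used to enclose the field.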
Top GitHub Comments
Done, released v1.2.1.26 today
@Cinchoo, I would like to confirm that the fixes are working fine for us. Thank you so much again for providing the fix so quickly!