Parsing the data file incorrectly
I am currently working on a project converting an EBCDIC binary file to a UTF-8 text file. I am using Cobrix, but the output seems to be incorrect. I am using the following script to load the data file and copybook:
dataFrame = spark.read.format("cobol").option("copybook", copyBook).option("is_record_sequence", "true").load(filename)
The output is shown below:
A screenshot of the original data as presented on the mainframe:
It looks like when parsing the data, the "EOB_FAMILY_NUM" field is always skipped and always comes back as null. Other fields are mismatched as well. I have tried adding more options, such as .option("rdw_adjustment", 4), but that doesn't solve the issue. Is there anything I can do to fix this?
I also attached a screenshot of the copybook below:
Issue Analytics
- State:
- Created 3 years ago
- Comments: 7
Top GitHub Comments
Hi Ruslan, after getting my new files with RDW, Cobrix is working perfectly fine now. Thanks again!
In order to read variable-length record files, there must be a way to determine the length of each record. The RDW is the best way: it is general, explicit, and deterministic. So if you can preserve the RDW, it is very easy to extract data from the file. Other options are more complicated: if one of the record fields contains the record size, that field can be used; if there is no such field, but there is a field that determines the record type, a custom record extractor can be used.
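To illustrate why the RDW makes record extraction deterministic, here is a minimal Python sketch (not Cobrix's actual implementation) of splitting an RDW-prefixed byte stream into records. It assumes each record begins with a 4-byte RDW whose first two bytes are a big-endian length covering only the payload; whether the length also counts the 4-byte header varies between systems, which is what an adjustment value like Cobrix's rdw_adjustment compensates for.

```python
import struct

def split_rdw_records(data: bytes, rdw_adjustment: int = 0):
    """Split a variable-length record stream into payloads using RDW headers.

    Assumes a 4-byte RDW per record: a 2-byte big-endian length followed
    by two (usually zero) bytes. `rdw_adjustment` is added to the stated
    length, mirroring the kind of correction needed when the length
    includes (or excludes) the header itself.
    """
    records = []
    pos = 0
    while pos + 4 <= len(data):
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        payload_len = length + rdw_adjustment
        pos += 4  # skip the RDW itself
        records.append(data[pos:pos + payload_len])
        pos += payload_len
    return records

# Two records of 3 and 5 bytes; here the RDW length excludes the header.
blob = b"\x00\x03\x00\x00ABC" + b"\x00\x05\x00\x00HELLO"
print(split_rdw_records(blob))  # [b'ABC', b'HELLO']
```

Without the RDW (or an equivalent length field), the reader has no way to know where one record ends and the next begins, which is exactly the failure mode that produces shifted and null fields.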
You can send the file and the copybook (or links to them on GDrive/Dropbox) to yruslan@gmail.com.