How can I load these multi-segment data from ASCII files?
Hello,
Thank you for your work on this project; it has been a great help to me.
So far, I have been able to load single-segment ASCII files successfully, but I fail with multi-segment ones.
Here is a simplified example of the kind of data I am trying to load into a DataFrame:
Copybook:
        01  COMPANY-DETAILS.
            05  SEGMENT-ID        PIC 9(1).
            05  STATIC-DETAILS.
               10  NAME           PIC X(2).
            05  CONTACTS REDEFINES STATIC-DETAILS.
               10  PERSON         PIC X(3).
Data:
1BB
2CCC
Code:
val copybook =
  """        01  COMPANY-DETAILS.
    |            05  SEGMENT-ID        PIC 9(1).
    |            05  STATIC-DETAILS.
    |               10  NAME           PIC X(2).
    |
    |            05  CONTACTS REDEFINES STATIC-DETAILS.
    |               10  PERSON         PIC X(3).
    """.stripMargin

val df = spark.read
  .format("cobol")
  .option("copybook_contents", copybook)
  .option("is_record_sequence", "true")
  .option("schema_retention_policy", "collapse_root")
  .option("encoding", "ascii")
  .load("data_ascii/mini.txt")
Output:
+----------+--------------+--------+
|SEGMENT_ID|STATIC_DETAILS|CONTACTS|
+----------+--------------+--------+
| null| [2C]| [2CC]|
+----------+--------------+--------+
I can see two problems in the output:
- SEGMENT_ID is null.
- There is only one row: the two records seem to be read as if they were one (in my tests with real data containing many records, I always end up with a single row in the DataFrame).
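For reference, here is a minimal sketch in plain Scala (no Spark or Cobrix involved) of the result this copybook and data are expected to produce: each line is one record with no record header, the first character is SEGMENT-ID, and the rest of the line is decoded with the redefine that matches the segment. The mapping 1 -> STATIC-DETAILS and 2 -> CONTACTS is an assumption inferred from the field lengths in the sample data, and `CompanyRecord`, `SegmentSketch`, `decode` and `parse` are hypothetical names used only for illustration.

```scala
// Hypothetical illustration only: plain Scala, no Spark or Cobrix.
// Hand-decodes the layout described by the copybook:
//   SEGMENT-ID  PIC 9(1)  -> column 1
//   segment 1 (assumed): STATIC-DETAILS / NAME   PIC X(2) -> columns 2-3
//   segment 2 (assumed): CONTACTS       / PERSON PIC X(3) -> columns 2-4

// One decoded record; only the field of the matching segment is populated.
final case class CompanyRecord(segmentId: Int, name: Option[String], person: Option[String])

object SegmentSketch {
  // Decode one line (one record, no record header) according to its segment id.
  def decode(line: String): CompanyRecord = {
    val segmentId = line.take(1).toInt
    val body = line.drop(1)
    segmentId match {
      case 1     => CompanyRecord(1, Some(body.take(2)), None)
      case 2     => CompanyRecord(2, None, Some(body.take(3)))
      case other => throw new IllegalArgumentException(s"Unknown segment id: $other")
    }
  }

  // Split on end-of-line (LF or CRLF) so that every non-empty line becomes one record.
  def parse(data: String): List[CompanyRecord] =
    data.split("\r?\n").toList.filter(_.nonEmpty).map(decode)

  def main(args: Array[String]): Unit =
    // Prints:
    // CompanyRecord(1,Some(BB),None)
    // CompanyRecord(2,None,Some(CCC))
    parse("1BB\n2CCC\n").foreach(println)
}
```

With the sample data this yields two records, one per segment, which corresponds to the two-row DataFrame expected here instead of the single collapsed row.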
After thoroughly reading your (very nice) README, I have tried modifying the copybook, the data and several options, but I still cannot load my data correctly.
Since I am new to COBOL formats, I suspect that either I am using the Cobrix options incorrectly or my data format (ASCII, no record headers in the data) is incompatible with Cobrix.
Can you see what is wrong here?
Thanks!
Comments:
Thanks for these clarifications 😃 I have proposed a pull request adding this example to the documentation; I hope it helps.
Yes, it is great. Thank you!