How can I load these multi-segment data from ASCII files?


Hello,

Thank you for your work on this project; it has been a great help to me.

So far, I have been able to load single-segment ASCII files successfully, but I have failed with multi-segment ones.

Here is a simplified example of the kind of data I am trying to load into a DataFrame:

Copybook:

01  COMPANY-DETAILS.
    05  SEGMENT-ID PIC 9(1).
    05  STATIC-DETAILS.
        10  NAME PIC X(2).

    05  CONTACTS REDEFINES STATIC-DETAILS.
        10  PERSON PIC X(3).

Data:

1BB
2CCC

Code:

val copybook =
      """       01  COMPANY-DETAILS.
        |            05  SEGMENT-ID		PIC 9(1).
        |            05  STATIC-DETAILS.
        |               10  NAME      	PIC X(2).
        |
        |            05  CONTACTS REDEFINES STATIC-DETAILS.
        |               10  PERSON    	PIC X(3).
      """.stripMargin

val df = spark.read
      .format("cobol")
      .option("copybook_contents", copybook)
      .option("is_record_sequence", "true")
      .option("schema_retention_policy", "collapse_root")
      .option("encoding", "ascii")
      .load("data_ascii/mini.txt")

Output:

+----------+--------------+--------+
|SEGMENT_ID|STATIC_DETAILS|CONTACTS|
+----------+--------------+--------+
|      null|          [2C]|   [2CC]|
+----------+--------------+--------+

I can see 2 problems in my output:

  • SEGMENT_ID is null
  • there is only one row: the 2 records seem to be read as if they were one (in my tests with my real data containing many records, I always end up with only one row in the DataFrame)
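
To make the expected result concrete, here is a minimal plain-Scala sketch (no Spark or Cobrix involved) of how the sample data would decode if each text line were treated as one record and SEGMENT-ID selected which redefined group applies. The names `SegmentSketch`, `Row`, `parseLine` and `parse` are hypothetical and exist only for this illustration; the sketch assumes that segment id 1 maps to STATIC-DETAILS and 2 to CONTACTS, as the sample data suggests.

```scala
// Plain-Scala illustration only: this is not how Cobrix is implemented.
object SegmentSketch {
  // Hypothetical row type mirroring the copybook; at most one group is populated.
  final case class Row(segmentId: Int, name: Option[String], person: Option[String])

  // Decode one line: the first character is SEGMENT-ID (PIC 9(1)),
  // the remainder is the area shared by STATIC-DETAILS and CONTACTS.
  def parseLine(line: String): Row = {
    val id = line.substring(0, 1).toInt
    val payload = line.substring(1)
    id match {
      case 1 => Row(id, Some(payload.take(2)), None) // STATIC-DETAILS: NAME PIC X(2)
      case 2 => Row(id, None, Some(payload.take(3))) // CONTACTS: PERSON PIC X(3)
      case _ => Row(id, None, None)                  // unknown segment id: leave both empty
    }
  }

  // Records are assumed to be newline-delimited, so split before decoding.
  def parse(data: String): List[Row] =
    data.split("\\r?\\n").toList.filter(_.nonEmpty).map(parseLine)

  def main(args: Array[String]): Unit =
    parse("1BB\n2CCC\n").foreach(println)
}
```

With the two sample lines, this yields two rows, one per segment, rather than the single merged row shown in the output above.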

After thoroughly reading your (very nice) README, I have tried to modify the copybook, data and several options, but I still fail to load my data correctly.

Since I am new to COBOL formats, I suspect that either I am using the Cobrix options incorrectly or my data format (ASCII, no record header in the data) is incompatible with Cobrix.

Can you see what is wrong here?

Thanks!

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 9

Top GitHub Comments

2 reactions
bastien-bonnet commented, Jul 25, 2019

Thanks for these clarifications 😃 I have opened a pull request adding this example to the documentation; I hope it helps.

1 reaction
yruslan commented, Jul 25, 2019

Yes, it is great. Thank you!
