Unicode support with PIC N notation

Question

We are reading COBOL files and are running into an issue with Unicode (PIC N) definitions in the copybooks.

      ******************************************************************        
      * COBOL DECLARATION FOR VIEW  XXXXXXXX                           *        
      ******************************************************************        
       01  YYYYYYYY-YYY.                                                        
         10  ZZZ-ZZZZZZ                      PIC N(4). 
         ... more lines

This results in an error:

Syntax error in the copybook at line 5: Invalid input 'N' at position 5:49
za.co.absa.cobrix.cobol.parser.exceptions.SyntaxErrorException: Syntax error in the copybook at line 5: Invalid input 'N' at position 5:49
	at za.co.absa.cobrix.cobol.parser.antlr.ThrowErrorStrategy.recover(ANTLRParser.scala:33)
	at za.co.absa.cobrix.cobol.parser.antlr.copybookParser.pic(copybookParser.java:2469)
	at za.co.absa.cobrix.cobol.parser.antlr.copybookParser.primitive(copybookParser.java:2791)
	at za.co.absa.cobrix.cobol.parser.antlr.copybookParser.item(copybookParser.java:3015)
	at za.co.absa.cobrix.cobol.parser.antlr.copybookParser.main(copybookParser.java:214)
	at za.co.absa.cobrix.cobol.parser.antlr.ANTLRParser$.parse(ANTLRParser.scala:72)
	at za.co.absa.cobrix.cobol.parser.CopybookParser$.parseTree(CopybookParser.scala:124)
	at za.co.absa.cobrix.spark.cobol.reader.fixedlen.FixedLenNestedReader.loadCopyBook(FixedLenNestedReader.scala:96)
	at za.co.absa.cobrix.spark.cobol.reader.fixedlen.FixedLenNestedReader.<init>(FixedLenNestedReader.scala:57)
	at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createFixedLengthReader(DefaultSource.scala:88)
	at za.co.absa.cobrix.spark.cobol.source.DefaultSource.buildEitherReader(DefaultSource.scala:75)
	at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createRelation(DefaultSource.scala:60)
	at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createRelation(DefaultSource.scala:48)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)

According to this, the copybook is valid.

Are there any plans to support this notation?

Thanks, Steve
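Some context on the notation. In IBM Enterprise COBOL, a PIC N item usually denotes national data (UTF-16, 2 bytes per character), so PIC N(4) occupies 8 bytes. Depending on the NSYMBOL compiler option, it can instead denote DBCS data. The Python sketch below shows, under the national assumption, how a PIC string containing N could be tokenized and sized. The names pic_size, TOKEN and BYTES_PER_SYMBOL are hypothetical. This is not Cobrix's grammar or API, and it only handles the X, A, 9 and N symbols.

```python
import re

# Hypothetical sketch: bytes per PIC symbol, assuming DISPLAY usage for
# X, A and 9 (1 byte each) and USAGE NATIONAL for N (UTF-16, 2 bytes each).
BYTES_PER_SYMBOL = {"X": 1, "A": 1, "9": 1, "N": 2}

# One symbol, optionally followed by a repeat count such as "(4)".
TOKEN = re.compile(r"([XA9N])(?:\((\d+)\))?")


def pic_size(pic: str) -> int:
    """Return the storage size in bytes of a simple PIC string like 'N(4)'."""
    pic = pic.upper()
    pos = 0
    total = 0
    while pos < len(pic):
        m = TOKEN.match(pic, pos)
        if m is None:
            # Signs, implied decimals, editing symbols, etc. are not handled.
            raise ValueError(f"unsupported PIC symbol at position {pos}: {pic[pos]!r}")
        symbol, count = m.group(1), int(m.group(2) or 1)
        total += BYTES_PER_SYMBOL[symbol] * count
        pos = m.end()
    return total


print(pic_size("N(4)"))      # 8
print(pic_size("X(2)N(3)"))  # 8
```

Symbols such as S, V or P raise ValueError here. A real copybook parser has to handle many more cases, including USAGE clauses that change the byte size.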

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 10 (3 by maintainers)

Top GitHub Comments

tr11 commented, Mar 16, 2020

schaloner-kbc commented, Mar 11, 2020

I’m adding the necessary support at the moment. I’ll prepare a PR for this with supporting data and send it over.
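Supporting PIC N also means decoding the field's bytes once the parser accepts the symbol. The following is a minimal sketch. It assumes the data is USAGE NATIONAL encoded as UTF-16 big-endian, which is the usual IBM Enterprise COBOL representation, and that trailing national spaces should be trimmed. decode_pic_n is a hypothetical helper, not part of Cobrix.

```python
def decode_pic_n(record: bytes, offset: int, n_chars: int) -> str:
    """Decode a PIC N(n_chars) field that starts at `offset` in `record`.

    Assumes USAGE NATIONAL data (UTF-16 big-endian, 2 bytes per character)
    and strips trailing national spaces (U+0020).
    """
    length = 2 * n_chars  # each national character occupies 2 bytes
    raw = record[offset:offset + length]
    if len(raw) != length:
        raise ValueError("record too short for PIC N field")
    return raw.decode("utf-16-be").rstrip(" ")


# Example: a PIC N(4) field holding "AB" followed by two national spaces.
field = "AB  ".encode("utf-16-be")
print(decode_pic_n(field, 0, 4))  # prints AB
```

If the copybook was compiled with NSYMBOL(DBCS), the same field would hold DBCS data instead. In that case a different code page, not UTF-16, would be needed.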
