Unicode support with PIC N notation
Question
We are reading in COBOL files and are encountering an issue with Unicode (PIC N) definitions in the copybooks.
******************************************************************
* COBOL DECLARATION FOR VIEW XXXXXXXX *
******************************************************************
01 YYYYYYYY-YYY.
10 ZZZ-ZZZZZZ PIC N(4).
... more lines
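For context (background knowledge, not stated in the issue): a PIC N(n) item holds n national characters at 2 bytes each, so the ZZZ-ZZZZZZ PIC N(4) field above occupies 8 bytes in the record. A minimal sketch of that size rule, with a hypothetical helper name:

```java
public class PicNSize {
    // Hypothetical helper: byte length of a PIC N(n) field.
    // Both interpretations of PIC N (NSYMBOL(NATIONAL) => UTF-16 units,
    // NSYMBOL(DBCS) => double-byte EBCDIC) use 2 bytes per character.
    static int picNByteLength(int n) {
        return 2 * n;
    }

    public static void main(String[] args) {
        // The PIC N(4) field from the copybook above
        System.out.println(picNByteLength(4)); // 8
    }
}
```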
This results in an error:
Syntax error in the copybook at line 5: Invalid input 'N' at position 5:49
za.co.absa.cobrix.cobol.parser.exceptions.SyntaxErrorException: Syntax error in the copybook at line 5: Invalid input 'N' at position 5:49
at za.co.absa.cobrix.cobol.parser.antlr.ThrowErrorStrategy.recover(ANTLRParser.scala:33)
at za.co.absa.cobrix.cobol.parser.antlr.copybookParser.pic(copybookParser.java:2469)
at za.co.absa.cobrix.cobol.parser.antlr.copybookParser.primitive(copybookParser.java:2791)
at za.co.absa.cobrix.cobol.parser.antlr.copybookParser.item(copybookParser.java:3015)
at za.co.absa.cobrix.cobol.parser.antlr.copybookParser.main(copybookParser.java:214)
at za.co.absa.cobrix.cobol.parser.antlr.ANTLRParser$.parse(ANTLRParser.scala:72)
at za.co.absa.cobrix.cobol.parser.CopybookParser$.parseTree(CopybookParser.scala:124)
at za.co.absa.cobrix.spark.cobol.reader.fixedlen.FixedLenNestedReader.loadCopyBook(FixedLenNestedReader.scala:96)
at za.co.absa.cobrix.spark.cobol.reader.fixedlen.FixedLenNestedReader.<init>(FixedLenNestedReader.scala:57)
at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createFixedLengthReader(DefaultSource.scala:88)
at za.co.absa.cobrix.spark.cobol.source.DefaultSource.buildEitherReader(DefaultSource.scala:75)
at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createRelation(DefaultSource.scala:60)
at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createRelation(DefaultSource.scala:48)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
According to this, the copybook is valid.
Are there any plans to support this notation?
Thanks, Steve
Issue Analytics
- Created 4 years ago
- Comments:10 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
According to https://www.ibm.com/support/knowledgecenter/en/SS6SG3_6.1.0/pg/concepts/cpuni01.html, it can be either.
I’m adding the necessary support at the moment. I’ll prepare a PR for this, with supporting data and send it over.
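As background to the comment above (my sketch, not from the thread): per the linked IBM page, whether a PIC N field holds DBCS EBCDIC or national (UTF-16) data depends on the NSYMBOL compiler option. Assuming NSYMBOL(NATIONAL), the 8 bytes of a PIC N(4) field are UTF-16BE and can be decoded with the standard JVM charset:

```java
import java.nio.charset.StandardCharsets;

public class DecodePicN {
    public static void main(String[] args) {
        // Raw bytes of a PIC N(4) field under NSYMBOL(NATIONAL):
        // UTF-16 big-endian, 2 bytes per national character.
        byte[] field = {0x00, 0x41, 0x00, 0x42, 0x00, 0x43, 0x00, 0x44};
        String value = new String(field, StandardCharsets.UTF_16BE);
        System.out.println(value); // ABCD
    }
}
```

Under NSYMBOL(DBCS) the same bytes would instead need a double-byte EBCDIC codec, which is why a parser cannot pick one decoding unconditionally.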