Unicode support with PIC N notation
Question
We are reading in COBOL files and are encountering an issue with Unicode (PIC N) definitions in the copybooks.
******************************************************************
* COBOL DECLARATION FOR VIEW XXXXXXXX *
******************************************************************
01 YYYYYYYY-YYY.
10 ZZZ-ZZZZZZ PIC N(4).
... more lines
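For context (background knowledge, not stated in the issue): a PIC N(n) item holds n national characters at 2 bytes each, so the ZZZ-ZZZZZZ PIC N(4) field above occupies 8 bytes in the record. A minimal sketch of that size rule, with a hypothetical helper name:

```java
public class PicNSize {
    // Hypothetical helper: byte length of a PIC N(n) field.
    // Both interpretations of PIC N (NSYMBOL(NATIONAL) => UTF-16 units,
    // NSYMBOL(DBCS) => double-byte EBCDIC) use 2 bytes per character.
    static int picNByteLength(int n) {
        return 2 * n;
    }

    public static void main(String[] args) {
        // The PIC N(4) field from the copybook above
        System.out.println(picNByteLength(4)); // 8
    }
}
```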
This results in an error:
Syntax error in the copybook at line 5: Invalid input 'N' at position 5:49
za.co.absa.cobrix.cobol.parser.exceptions.SyntaxErrorException: Syntax error in the copybook at line 5: Invalid input 'N' at position 5:49
at za.co.absa.cobrix.cobol.parser.antlr.ThrowErrorStrategy.recover(ANTLRParser.scala:33)
at za.co.absa.cobrix.cobol.parser.antlr.copybookParser.pic(copybookParser.java:2469)
at za.co.absa.cobrix.cobol.parser.antlr.copybookParser.primitive(copybookParser.java:2791)
at za.co.absa.cobrix.cobol.parser.antlr.copybookParser.item(copybookParser.java:3015)
at za.co.absa.cobrix.cobol.parser.antlr.copybookParser.main(copybookParser.java:214)
at za.co.absa.cobrix.cobol.parser.antlr.ANTLRParser$.parse(ANTLRParser.scala:72)
at za.co.absa.cobrix.cobol.parser.CopybookParser$.parseTree(CopybookParser.scala:124)
at za.co.absa.cobrix.spark.cobol.reader.fixedlen.FixedLenNestedReader.loadCopyBook(FixedLenNestedReader.scala:96)
at za.co.absa.cobrix.spark.cobol.reader.fixedlen.FixedLenNestedReader.<init>(FixedLenNestedReader.scala:57)
at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createFixedLengthReader(DefaultSource.scala:88)
at za.co.absa.cobrix.spark.cobol.source.DefaultSource.buildEitherReader(DefaultSource.scala:75)
at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createRelation(DefaultSource.scala:60)
at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createRelation(DefaultSource.scala:48)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
According to this, the copybook is valid.
Are there any plans to support this notation?
Thanks, Steve
Issue Analytics
- Created 4 years ago
- Comments:10 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
According to https://www.ibm.com/support/knowledgecenter/en/SS6SG3_6.1.0/pg/concepts/cpuni01.html, it can be either.
I’m adding the necessary support at the moment. I’ll prepare a PR for this, with supporting data and send it over.
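As background to the comment above (my sketch, not from the thread): per the linked IBM page, whether a PIC N field holds DBCS EBCDIC or national (UTF-16) data depends on the NSYMBOL compiler option. Assuming NSYMBOL(NATIONAL), the 8 bytes of a PIC N(4) field are UTF-16BE and can be decoded with the standard JVM charset:

```java
import java.nio.charset.StandardCharsets;

public class DecodePicN {
    public static void main(String[] args) {
        // Raw bytes of a PIC N(4) field under NSYMBOL(NATIONAL):
        // UTF-16 big-endian, 2 bytes per national character.
        byte[] field = {0x00, 0x41, 0x00, 0x42, 0x00, 0x43, 0x00, 0x44};
        String value = new String(field, StandardCharsets.UTF_16BE);
        System.out.println(value); // ABCD
    }
}
```

Under NSYMBOL(DBCS) the same bytes would instead need a double-byte EBCDIC codec, which is why a parser cannot pick one decoding unconditionally.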