R - read_feather() error: embedded nul in string ...
First off - thanks for the feather package! I use it all the time to quickly transfer large data between R and Python, and just to read and write data because it’s FAST.
Unfortunately, I frequently run into the same issue in R, and debugging it has been tricky. I usually work with data.tables, writing and reading them as feather files with something like:
library(feather)
library(data.table)
write_feather(transactions, "Data/transactions1.feather")
transactions1 <- data.table(read_feather("Data/transactions1.feather"))
Reading the file back frequently fails with the error “embedded nul in string”. In this case:
Error in coldataFeather(x, i) :
embedded nul in string: 'Acoustic\0\0\0\0\0\0\031\xa5\x9b\001)\xa5\x9b\001礛\001\006\xa5\x9b\0016\xa4\x9b\0017\xa4\x9b\001\x9d\xa4\x9b\001\x96\xa4\x9b\001\xf3\xa2\x9b\001M\xa3\x9b\001K\xa3\x9b\001P\xa3\x9b\001a\xa3\x9b\001\xba\xa9\x9b\001\xed\xa3\x9b\001\x8a\xa3\x9b\001\xff\xa2\x9b\0011\xa3\x9b\001\xb1\xa3\x9b\001a\xa2\x9b\001\b\xa3\x9b\001\ua89b\001\xf2\xa2\x9b\001\x98\xa9\x9b\001\xaa\xa2\x9b\001\xa9\xa2\x9b\001\xb0\xa2\x9b\001١\x9b\001R\xa1\x9b\001[\xa1\x9b\001H\xa1\x9b\001\xb6\xa2\x9b\001s\xa1\x9b\001Ѡ\x9b\001\x87\xa9\x9b\001\x96\xa0\x9b\001\x99\xa1\x9b\001\x9c\xa1\x9b\001-\xa1\x9b\001|\xa0\x9b\001\xa6\xa0\x9b\001\xab\xa0\x9b\001f\xa0\x9b\001h\xa0\x9b\001(\xa0\x9b\001\x81\xa9\x9b\0017\xa0\x9b\001\x80\xa9\x9b\001\a\xa1\x9b\001ӟ\x9b\001\xbb\x9f\x9b\001\xbc\x9f\x9b\001m\x9f\x9b'
Debugging is especially weird - if I try slicing my data in half, sometimes each half of the dataset will write and read back as feather just fine. Needless to say, I haven’t been able to build a reproducible example of this error, and I can’t share my large transactions dataset. Any tips to help me figure out what’s going wrong?
str(transactions1)
Classes ‘data.table’ and 'data.frame': 3000001 obs. of 6 variables:
$ ArticleID : int 13516378 13516378 13516378 13516379 13516379 13516379 13516379 13516379 13516379 13516379 ...
$ ArticleTags : int 34 34 34 24 24 24 24 24 24 24 ...
$ Tagset : chr "Apex predator|Autonomy|City|Climate|Ethics|Exercise|Human resource management|Jacksonville Jaguars|Jacksonville, Florida|Jaguar"| __truncated__ "Apex predator|Autonomy|City|Climate|Ethics|Exercise|Human resource management|Jacksonville Jaguars|Jacksonville, Florida|Jaguar"| __truncated__ "Apex predator|Autonomy|City|Climate|Ethics|Exercise|Human resource management|Jacksonville Jaguars|Jacksonville, Florida|Jaguar"| __truncated__ "AFC North|Baltimore Ravens|Blood|Cincinnati|Cincinnati Bengals|Cleveland Browns|Discrimination|Emotion|Hatred|Heinz|Hematology|"| __truncated__ ...
$ TransactionID: int 153089414 153089435 153089428 153089444 153089445 153089448 153089450 153089446 153089447 153089453 ...
$ TagID : int 23892 26058 26229 344 1977 2776 4828 4829 4963 7076 ...
$ Tag : chr "Stained glass" "Trousers" "U.S. state" "AFC North" ...
- attr(*, ".internal.selfref")=<externalptr>
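One way to narrow this down without sharing the data, as a rough sketch: round-trip each column through feather on its own to see which one fails, and report how many bytes each character column holds in total. The helper names (`find_failing_columns`, `feather_roundtrip`, `string_bytes`) are hypothetical. The idea that the total size of a string column matters is only an assumption, suggested by the fact that halves of the data read back fine; nothing in this thread confirms it.

```r
# Hypothetical helper: apply `roundtrip` to each column separately and
# return the names of the columns for which it raises an error.
find_failing_columns <- function(df, roundtrip) {
  df <- as.data.frame(df)  # plain data.frame, so df[col] selects a column
  failed <- character(0)
  for (col in names(df)) {
    ok <- tryCatch({
      roundtrip(df[col])
      TRUE
    }, error = function(e) FALSE)
    if (!ok) failed <- c(failed, col)
  }
  failed
}

# Write one data frame to a temporary feather file and read it back.
# Assumes the feather package is installed.
feather_roundtrip <- function(x) {
  path <- tempfile(fileext = ".feather")
  on.exit(unlink(path))
  feather::write_feather(x, path)
  feather::read_feather(path)
}

# Total bytes stored in each character column (NA for other column types).
# Converted to double before summing so that very large totals do not
# overflow R's integer type.
string_bytes <- function(df) {
  df <- as.data.frame(df)
  vapply(df, function(x) {
    if (!is.character(x)) return(NA_real_)
    sum(as.numeric(nchar(x[!is.na(x)], type = "bytes")))
  }, numeric(1))
}
```

With the data from this report, `find_failing_columns(transactions, feather_roundtrip)` would list the columns that cannot be read back on their own, and `string_bytes(transactions)` would show whether `Tagset` or `Tag` is unusually large.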
Issue Analytics
- Created 7 years ago
- Comments:23 (6 by maintainers)
Top GitHub Comments
We’re working on getting Feather users migrated over to the Arrow C++ libraries (see https://github.com/apache/arrow/pull/2947), so this issue should be resolved after the migration, but we should test to verify. @jameslamb would you be up for writing an ad hoc test (not necessarily to be run in testthat, but could be in a separate directory of integration tests) to check?
We can discuss doing a release with @hadley, who is the package’s current maintainer. Seems like it could be a good idea since there are known limitations/bugs in the old implementation.
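The ad hoc round-trip test mentioned above could look roughly like the following sketch. The function names `check_roundtrip` and `make_test_data` are hypothetical, the synthetic data only loosely imitates the `str()` output in the report, and the size needed to trigger the original error is not known from this thread.

```r
# Hypothetical round-trip check: write `df` with `write_fn`, read it back
# with `read_fn`, and compare the result to the original column by column.
# Stops with an error on the first mismatch.
check_roundtrip <- function(df, write_fn, read_fn) {
  path <- tempfile(fileext = ".feather")
  on.exit(unlink(path))
  write_fn(df, path)
  back <- as.data.frame(read_fn(path))
  stopifnot(identical(names(back), names(df)))
  for (col in names(df)) {
    stopifnot(identical(back[[col]], df[[col]]))
  }
  invisible(TRUE)
}

# Synthetic data: an integer ID and a long, repeated pipe-delimited string
# column. The tag text and repeat count are arbitrary choices.
make_test_data <- function(n = 1000L) {
  tags <- paste(rep("Apex predator|Autonomy|City|Climate", 20), collapse = "|")
  data.frame(
    ArticleID = seq_len(n),
    Tagset = rep(tags, n),
    stringsAsFactors = FALSE
  )
}
```

Assuming the arrow package is installed, a call such as `check_roundtrip(make_test_data(3e6), arrow::write_feather, arrow::read_feather)` would exercise the new implementation; passing `feather::write_feather` and `feather::read_feather` instead would run the same check against the old one.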