Read multiple feathers into a single tibble
There are many applications where people break up data into several medium-sized objects. It would be great to be able to read multiple feathers into a single tibble object without an intermediate step.
In the example below, memory use is roughly double what should be necessary, because the two intermediate data frames stay in memory alongside the combined result:
library(feather)
library(tidyverse)
iris_big <- bind_rows(lapply(1:10000, function(x) iris)) %>%
  as_tibble()
pryr::object_size(iris_big)
write_feather(iris_big, "tmp1.feather")
write_feather(iris_big, "tmp2.feather")
pryr::mem_change({
  featherdf1 <- read_feather("tmp1.feather")
  featherdf2 <- read_feather("tmp2.feather")
  final_df <- bind_rows(featherdf1, featherdf2)
})
This doesn't work, but it shows how this could look:
pryr::mem_change({
  feather1 <- feather("tmp1.feather")
  feather2 <- feather("tmp2.feather")
  final_df <- bind_rows(feather1, feather2) %>% collect()
})
file.remove("tmp1.feather", "tmp2.feather")
Issue Analytics
- State:
- Created 7 years ago
- Comments:5 (2 by maintainers)
Top GitHub Comments
Note that this is currently possible with the Julia Feather.jl package.
I created https://issues.apache.org/jira/browse/ARROW-2332 about working on this; let's continue the discussion there.
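For readers arriving later, here is a minimal sketch of one way this might be done with the arrow R package rather than the feather package used in the issue. It assumes the files were written with arrow::write_feather(); the file names are made up for illustration, and whether this actually lowers peak memory compared with read_feather() plus bind_rows() has not been measured here.

```r
library(arrow)
library(dplyr)

# Hypothetical file names, written to a temporary directory for the example.
paths <- file.path(tempdir(), c("part1.feather", "part2.feather"))
write_feather(iris, paths[1])
write_feather(iris, paths[2])

# Treat both files as one dataset, then materialise it as a single tibble.
ds <- open_dataset(paths, format = "feather")
combined <- collect(ds)

# Clean up the temporary files.
unlink(paths)
```

Here open_dataset() only registers the files, and the rows are read when collect() is called, so no per-file data frames are created in the R session along the way.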