
Read multiple feathers into a single tibble


Many applications split data across several medium-sized files. It would be great to be able to read multiple feather files into a single tibble without an intermediate step.

In the example below, the memory used is about double what should be necessary, since the two intermediate data frames stay in memory alongside the combined result:

library(feather)
library(tidyverse)
# Build a large tibble by stacking 10,000 copies of iris
iris_big <- bind_rows(lapply(1:10000, function(x) iris)) %>%
  as_tibble()
pryr::object_size(iris_big)
# Write the same data to two feather files
write_feather(iris_big, "tmp1.feather")
write_feather(iris_big, "tmp2.feather")
# Measure the memory change from reading both files and combining them
pryr::mem_change({
  featherdf1 <- read_feather("tmp1.feather")
  featherdf2 <- read_feather("tmp2.feather")
  final_df <- bind_rows(featherdf1, featherdf2)
})

The following does not work today, but it illustrates how such an interface might look:

pryr::mem_change({
  feather1 <- feather("tmp1.feather")
  feather2 <- feather("tmp2.feather")
  final_df <- bind_rows(feather1, feather2) %>% collect()
})
file.remove("tmp1.feather", "tmp2.feather")
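
For comparison, here is a minimal sketch of reading several files as one table. It assumes the separate arrow R package is installed, which is not the feather package used above. It also assumes the files are written with arrow::write_feather() (Feather V2), and it uses small temporary files in place of tmp1.feather and tmp2.feather. Whether this avoids the doubled memory use is not measured here.

```r
library(arrow)
library(dplyr)

# Assumption: files are written with arrow::write_feather() (Feather V2),
# not feather::write_feather() as in the example above.
paths <- file.path(tempdir(), c("tmp1.feather", "tmp2.feather"))
for (p in paths) write_feather(iris, p)

# Treat both files as a single dataset, then materialise it as one data frame.
final_df <- open_dataset(paths, format = "feather") %>% collect()

nrow(final_df)
file.remove(paths)
```

The call to collect() is the only point where the rows are pulled into R, so no named per-file data frame is created along the way.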

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions
quinnj commented, Dec 16, 2016

Note that this is currently possible with the Julia Feather.jl package:

julia> using Feather

julia> df = Feather.read(Pkg.dir("Feather") * "/test/newdata/BOD.feather")
6×2 DataFrames.DataFrame
│ Row │ Time │ demand │
├─────┼──────┼────────┤
│ 1   │ 1.0  │ 8.3    │
│ 2   │ 2.0  │ 10.3   │
│ 3   │ 3.0  │ 19.0   │
│ 4   │ 4.0  │ 16.0   │
│ 5   │ 5.0  │ 15.6   │
│ 6   │ 7.0  │ 19.8   │

julia> df = Feather.read(Pkg.dir("Feather") * "/test/newdata/BOD.feather", df; append=true)
12×2 DataFrames.DataFrame
│ Row │ Time │ demand │
├─────┼──────┼────────┤
│ 1   │ 1.0  │ 8.3    │
│ 2   │ 2.0  │ 10.3   │
│ 3   │ 3.0  │ 19.0   │
│ 4   │ 4.0  │ 16.0   │
│ 5   │ 5.0  │ 15.6   │
│ 6   │ 7.0  │ 19.8   │
│ 7   │ 1.0  │ 8.3    │
│ 8   │ 2.0  │ 10.3   │
│ 9   │ 3.0  │ 19.0   │
│ 10  │ 4.0  │ 16.0   │
│ 11  │ 5.0  │ 15.6   │
│ 12  │ 7.0  │ 19.8   │

0 reactions
wesm commented, Mar 21, 2018

I created https://issues.apache.org/jira/browse/ARROW-2332 to track work on this; let’s continue the discussion there.
