Implement the first batch of Serialization / IO / Conversion functions
See https://pandas.pydata.org/pandas-docs/stable/reference/frame.html#serialization-io-conversion
It looks like we can easily implement almost all of them by calling toPandas().func_name().
One thing is that some of the functions support max_rows. When that argument is specified, we should add a limit call in Spark to avoid moving all the data to the driver.
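A minimal sketch of that idea, not the project's actual implementation. The helper name call_pandas_method is hypothetical, and sdf is assumed to be any object exposing limit(n) and toPandas() the way pyspark.sql.DataFrame does:

```python
def call_pandas_method(sdf, func_name, max_rows=None, **kwargs):
    """Collect a Spark DataFrame to the driver and call a pandas method on it.

    Assumption: `sdf` exposes `limit(n)` and `toPandas()`, as
    pyspark.sql.DataFrame does. The helper name is hypothetical.
    """
    if max_rows is not None:
        # Cap the row count in Spark first, so at most `max_rows` rows
        # are moved to the driver.
        sdf = sdf.limit(max_rows)
        # Forward the same argument to the pandas method.
        kwargs["max_rows"] = max_rows
    pdf = sdf.toPandas()
    # Look the pandas method up by name, e.g. "to_html" or "to_string".
    return getattr(pdf, func_name)(**kwargs)
```

For example, call_pandas_method(sdf, "to_html", max_rows=10) would collect at most 10 rows before rendering. One trade-off to keep in mind: after the limit, only the first rows reach the driver, so the output may not match what pandas would render from the full data with the same max_rows.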
The functions to add in the first batch are:
- to_dict (see #169)
- to_excel (#288)
- to_html (we already have this, but let’s add a limit when max_rows is set), done in #206
- to_latex (#297)
- to_records (#298)
- to_string (done in #211 and #213)
- to_clipboard (#257)
Skipping the following because I don’t know how popular they are:
- to_pickle
- to_hdf
- to_stata
- to_msgpack
- to_records
- to_sparse
- to_dense
The following might require parallelization with Pandas UDFs, rather than collecting all the data to the driver, so leaving them for the future:
- to_sql
- to_gbq
I’m also not adding json and csv here. We need to design those properly because both Spark and Pandas have those.
Issue Analytics
- State:
- Created 4 years ago
- Comments:20 (16 by maintainers)
Thanks, @shril. This issue is nicely finished.
Thank you all!