Incorrect serialization of ints using Apache Arrow

Summary

When using Apache Arrow to serialize dataframes (the default since Streamlit 0.85), negative ints are handled incorrectly.

Steps to reproduce

Code snippet:

import pandas as pd
import streamlit as st

df = pd.DataFrame({
    "ints": [-1, 0, 1],
    "floats": [-1.0, 0.0, 1.0],
})
st.table(df)

Expected behavior:

Using the legacy serializer, the negative ints are displayed correctly.

streamlit run --global.dataFrameSerialization=legacy apps/neg.py

Actual behavior:

With Arrow serialization, the -1 is displayed as 18446744073709551615 (that is, 2^64 - 1), while 0 and 1 are unaffected.

streamlit run --global.dataFrameSerialization=arrow apps/neg.py
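The displayed value is exactly what results when the 8 bytes of a signed 64-bit -1 are read back as an unsigned 64-bit integer. The issue does not say where in the pipeline this happens, so the following is only a sketch of the arithmetic, not of Streamlit's or Arrow's actual code path; the helper name reinterpret_int64_as_uint64 is made up for illustration.

```python
import struct


def reinterpret_int64_as_uint64(value: int) -> int:
    """Pack value as a signed 64-bit int, then read the same 8 bytes as unsigned.

    Hypothetical helper: it mimics the symptom in this issue, not Streamlit internals.
    """
    raw = struct.pack("<q", value)      # signed 64-bit, little-endian
    return struct.unpack("<Q", raw)[0]  # same bytes read as unsigned 64-bit


for n in (-1, 0, 1):
    print(n, "->", reinterpret_int64_as_uint64(n))
# -1 -> 18446744073709551615
# 0 -> 0
# 1 -> 1
```

Non-negative values share the same bit pattern in both interpretations, which would explain why only the -1 row looks wrong in the table.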

Is this a regression?

Yes. The legacy serializer handled negative ints correctly. The arrow serializer became the default in Streamlit 0.85.

Debug info

  • Streamlit version: Streamlit, version 0.88.0
  • Python version: Python 3.9.7
  • Using Conda? PipEnv? PyEnv? Pex? pip
  • OS version: Debian GNU/Linux 11 (python:3.9 container)
  • Browser version: Safari 14.1.2 on macOS

Additional information

None.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

asaini commented, Oct 2, 2021 (1 reaction)

LukasMasuch commented, May 29, 2022 (0 reactions)

Closing this since it seems to work fine with the current version. I did quick tests with the current version of streamlit and a few different browsers and wasn’t able to reproduce this issue.
