Use pyarrow Tensor dtype


Feature request

I was going through the discussion about converting tensors to lists. Is there a way to leverage pyarrow's Tensors for nested arrays / embeddings?

For example:

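import numpy as np
import pyarrow as pa

# A 2x3 int32 array standing in for a small nested array / embedding
x = np.array([[2, 2, 4], [4, 5, 100]], np.int32)
# Wrap it as an Arrow Tensor with named dimensions
pa.Tensor.from_numpy(x, dim_names=["dim1", "dim2"])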

Apache docs

Maybe this belongs in the pyarrow features / repo.

Motivation

When working with big data, we need to make sure we use the best data structures and IO available.

Your contribution

I can try to open a PR if code changes are necessary.

Issue Analytics

  • State: open
  • Created 10 months ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

2 reactions
rok commented, Nov 21, 2022

The work stalled a little because it was not clear where TensorArray would live. However, the Arrow community recently agreed to create a well-known extension type document, and I would like https://github.com/apache/arrow/pull/8510 to land there, with an implementation added to C++/Python plus another language. Is that something you would find beneficial?

2 reactions
rok commented, Nov 21, 2022

Hey @franz101 & @lhoestq! There is a plan and a PR to create an ExtensionArray of Tensors of equal sizes, as well as a plan to do the same for Tensors of different sizes (ARROW-8714).


Top Results From Across the Web

pyarrow.Tensor - Apache Arrow v10.0.1
dim_name(self, i): Returns the name of the i-th tensor dimension. equals(self, Tensor other): Return true if the tensors...

Data Types and In-Memory Data Model - Apache Arrow
The Dictionary type in PyArrow is a special array type that is similar to a factor in R or a pandas.Categorical. It...

Data Types and Schemas - Apache Arrow v10.0.1
Construct pyarrow.Schema from collection of fields. from_numpy_dtype(dtype): Convert NumPy dtype to pyarrow.DataType.

pyarrow.DataType - Apache Arrow v10.0.1
Base class of all Arrow data types. Each data type is an instance of this class. ... Return true if type...

Extending pyarrow - Apache Arrow v10.0.1
Using the first approach, we create a UuidType subclass, and implement the __reduce__ method to ensure the class can be properly pickled: ...
