Efficient conversion from dataframes to numpy arrays retaining column dtypes and names

Problem description

I was looking into how to convert dataframes to numpy arrays so that both column dtypes and names are retained, ideally efficiently, without duplicating memory in the process. In effect, I would like a view, as a numpy array, on the data the dataframe already stores internally. I am fine with all the datatypes and column names the dataframe already uses.

The issue is that both as_matrix() and .values convert all columns to one common dtype, and to_records returns a numpy.recarray rather than a plain numpy array.
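A minimal sketch of the behavior described above, assuming a small hypothetical frame with int32, int8 and float64 columns (the names col1 to col3 mirror the example later in the thread). It uses .values because as_matrix() was deprecated and later removed from pandas; .to_numpy() behaves the same way in newer versions.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with three different column dtypes.
frame = pd.DataFrame({
    "col1": np.array([1, 1, 2], dtype=np.int32),
    "col2": np.array([0, 0, 1], dtype=np.int8),
    "col3": np.array([3.4, 3.4, 4.5], dtype=np.float64),
})

# Inside the DataFrame, each column keeps its own dtype.
print(list(frame.dtypes))  # [dtype('int32'), dtype('int8'), dtype('float64')]

# .values builds a single 2-D array, so every column is upcast to a common dtype.
values = frame.values
print(values.dtype)  # float64

# to_records keeps per-column dtypes and names, but the result is a
# numpy.recarray (an ndarray subclass), not a plain ndarray.
records = frame.to_records(index=False)
print(isinstance(records, np.recarray))  # True
print(records.dtype.names)  # ('col1', 'col2', 'col3')
```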

I have found two potential StackOverflow answers:

But it seems to me that all of those solutions copy the data through intermediate data structures and then store it in a new numpy array.

So I am asking for a way to get the data as-is, as a numpy array, without any dtype conversions.
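One reason a copy is hard to avoid: a NumPy structured (record) array stores each row as one record, with the fields of different dtypes interleaved in memory, while a single pandas column can be exposed as an array whose values sit next to each other. The sketch below compares the strides of the two layouts for a hypothetical three-dtype frame; it illustrates the general layout difference under that assumption and is not a description of pandas internals beyond it.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with three different column dtypes.
frame = pd.DataFrame({
    "col1": np.array([1, 1, 2], dtype=np.int32),
    "col2": np.array([0, 0, 1], dtype=np.int8),
    "col3": np.array([3.4, 3.4, 4.5], dtype=np.float64),
})

# One column as an ndarray: consecutive values are one float64 (8 bytes) apart.
column = frame["col3"].to_numpy()
print(column.strides)  # (8,)

# In a structured array, consecutive values of one field are a whole record
# apart, because all fields of a row are stored together.
records = frame.to_records(index=False)
field = records["col3"]
print(field.strides[0] == records.dtype.itemsize)  # True
print(records.dtype.itemsize)  # at least 4 + 1 + 8 bytes per record

# So the record array lives in its own buffer, separate from the frame's data.
print(np.shares_memory(field, column))  # False
```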

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.27-moby
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 20.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions:1
  • Comments:8 (7 by maintainers)

Top GitHub Comments

9 reactions
chris-b1 commented, May 31, 2017

From what I recall, recarray is a very thin subclass of ndarray, so something like this probably works if you have a strict ndarray requirement downstream:

In [14]: ra = frame.to_records(index=False)

In [15]: np.asarray(ra)
Out[15]: 
array([(1, 0, 3.4), (1, 0, 3.4), (2, 1, 4.5)], 
      dtype=(numpy.record, [('col1', '<i4'), ('col2', 'i1'), ('col3', '<f8')]))
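A self-contained version of the conversion above, assuming a hypothetical frame built to match the dtypes in that output (the int32 and int8 columns are set explicitly, since a plain integer column would typically default to int64 on Linux). np.asarray drops the recarray subclass but keeps the structured dtype, so field names and per-field dtypes survive; to_records itself still copies the data into a new buffer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame matching the dtypes shown in the output above.
frame = pd.DataFrame({
    "col1": np.array([1, 1, 2], dtype=np.int32),
    "col2": np.array([0, 0, 1], dtype=np.int8),
    "col3": [3.4, 3.4, 4.5],
})

# to_records copies the columns into one structured numpy.recarray.
ra = frame.to_records(index=False)

# np.asarray returns the base ndarray class; the structured dtype is unchanged.
arr = np.asarray(ra)
print(type(arr) is np.ndarray)  # True
print(arr.dtype.names)          # ('col1', 'col2', 'col3')

# Fields are accessed by name and keep their own dtypes.
print(arr["col2"].dtype)        # int8
print(arr["col3"].tolist())     # [3.4, 3.4, 4.5]
```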

4 reactions
jreback commented, May 31, 2017

@mitar

Using multi-dtype ndarrays is only supported via record arrays (@chris-b1 shows above how to convert).

You can certainly select out columns or do a .values conversion, but with mixed dtypes the target function then potentially has to deal with an object dtype array, which is not efficient at all. You need to segregate dtypes, which is simply a lot of work to do with numpy arrays; pandas does this with ease. So you can certainly use some of the solutions you pointed to, but I suspect you have other issues if the conversion to an ndarray is your bottleneck.
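A minimal sketch of both points, assuming a hypothetical frame that mixes numeric and string columns: a single .to_numpy() call (the newer spelling of .values) falls back to object dtype, while grouping columns by dtype by hand gives one homogeneous 2-D array per dtype. split_by_dtype is an illustrative helper written for this sketch, not a pandas API.

```python
import numpy as np
import pandas as pd


def split_by_dtype(frame):
    """Hypothetical helper: map each dtype name to (column names, 2-D array)."""
    names_by_dtype = {}
    for col, dtype in frame.dtypes.items():
        names_by_dtype.setdefault(str(dtype), []).append(col)
    # Every selected group shares one dtype, so to_numpy() does not upcast.
    return {name: (cols, frame[cols].to_numpy())
            for name, cols in names_by_dtype.items()}


# Hypothetical frame mixing integer, float and string columns.
frame = pd.DataFrame({
    "count": np.array([1, 2, 3], dtype=np.int64),
    "score": np.array([0.5, 1.5, 2.5], dtype=np.float64),
    "label": ["a", "b", "c"],
})

# One array for numeric and string columns together needs object dtype.
print(frame.to_numpy().dtype)  # object

groups = split_by_dtype(frame)
print(groups["int64"][1].dtype)  # int64
print(groups["float64"][0])      # ['score']
```

The design choice here is to key groups by the dtype's string name, which keeps the lookup simple and avoids comparing dtype objects directly.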
