Write a Cython/C in-place rstrip function to speed up writing of FITS tables
I was profiling the writing of FITS tables with string columns (which has terrible performance at the moment) and found that one of the bottlenecks is the following function in fitsrec.py:
    def _rstrip_inplace(array, chars=None):
        """
        Performs an in-place rstrip operation on string arrays.
        This is necessary since the built-in `np.char.rstrip` in Numpy does not
        perform an in-place calculation. This can be removed if ever
        https://github.com/numpy/numpy/issues/6303 is implemented (however, for
        the purposes of this module the only in-place vectorized string functions
        we need are rstrip and encode).
        """
        for item in np.nditer(array, flags=['zerosize_ok'],
                              op_flags=['readwrite']):
            item[...] = item.item().rstrip(chars)
https://github.com/numpy/numpy/issues/6303 hasn’t been implemented yet, but it should be reasonably easy to write a C or Cython function that does something like this ourselves for now. Specifically, we need a function that loops over the elements of a string array and replaces trailing spaces with null bytes. I think making the equivalent change in Numpy itself would be a lot harder.
This could speed up writing FITS tables with character arrays by a factor of ~10 or more when combined with some other changes I’m preparing locally (PR forthcoming).
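As a rough reference for what such a C or Cython loop would have to do, here is a minimal pure-Python sketch. It is deliberately slow and is only meant to pin down the behaviour. The name `rstrip_inplace_reference` is hypothetical, and the sketch assumes a C-contiguous array of fixed-width bytes (dtype kind 'S') and strips only spaces, not the general `chars` argument of `_rstrip_inplace`.

```python
import numpy as np


def rstrip_inplace_reference(array):
    """Replace trailing spaces with null bytes, in place (hypothetical helper).

    Assumption: `array` is a C-contiguous array of fixed-width bytes ('S').
    """
    if array.dtype.kind != "S" or not array.flags.c_contiguous:
        raise TypeError("expected a C-contiguous fixed-width bytes array")
    width = array.dtype.itemsize
    # View the same memory as a flat buffer of single bytes (no copy).
    buf = array.reshape(-1).view(np.uint8)
    for start in range(0, buf.size, width):
        # Walk backwards from the last byte of this element, skipping the
        # existing null padding and overwriting spaces with nulls.
        j = start + width - 1
        while j >= start and buf[j] in (0, 32):
            buf[j] = 0
            j -= 1
```

A compiled version would run the same backwards scan over the raw buffer, which is why it never needs to create a Python string per element.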
Issue Analytics
- State:
- Created 6 years ago
- Comments:6 (6 by maintainers)
@astrofrog - the numpy code essentially does the same as the above loop (it ends up calling back to python for the rstrip method), so I agree that is no solution. I played around a little and the following speeds up the code by a factor of 2 (for bytes; for unicode it is essentially the same, sadly). Maybe worth including until we do write the cython function…
…
But one does a factor 40 better by turning the arrays temporarily into numbers:
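The snippet from the original comment is not reproduced here. As a hedged illustration of the same idea (not the commenter's code), the sketch below views a bytes array as `uint8` numbers and zeroes trailing spaces column by column. The names `rstrip_inplace_numeric` and `chunk` are hypothetical, and the sketch assumes a C-contiguous array of dtype kind 'S' and handles spaces only.

```python
import numpy as np


def rstrip_inplace_numeric(array, chunk=10000):
    """Strip trailing spaces in place via a uint8 view (hypothetical sketch).

    Assumption: `array` is a C-contiguous array of fixed-width bytes ('S').
    """
    if array.dtype.kind != "S" or not array.flags.c_contiguous:
        raise TypeError("expected a C-contiguous fixed-width bytes array")
    width = array.dtype.itemsize
    # One row per string, one column per character; shares memory with `array`.
    chars = array.reshape(-1).view(np.uint8).reshape(array.size, width)
    # Process the rows in chunks so each boolean mask stays small.
    for j in range(0, chars.shape[0], chunk):
        block = chars[j:j + chunk]
        # True for rows whose characters to the right were all space or null.
        trailing = np.ones(block.shape[0], dtype=bool)
        for i in range(width - 1, -1, -1):
            col = block[:, i]
            trailing &= (col == 32) | (col == 0)
            if not trailing.any():
                break
            # Overwrite spaces (and rewrite nulls) in the trailing run.
            col[trailing] = 0
```

In this sketch, the outer loop over `j` bounds the mask at `chunk` booleans, and the contiguity check rejects the array layouts that the reshape-and-view trick cannot handle without copying.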
Here the loop in j tries to avoid creating very large temporary mask arrays (which would perhaps defeat the purpose of doing this in-place to start with). As written, it is obviously not safe for strange array shapes.

I meant complexity mostly for code developers, of having to look at a differently coded piece to see what’s happening (for me at least, this is not a negligible hurdle).