Speed up get_formatted_array

Problem description

get_formatted_array splits the loc::techs and loc::tech::carriers string sets and passes data back and forth between xarray and pandas to produce a sparse matrix that is easier to index (e.g. to sum over a single tech).

This can take a very long time for large DataArrays, and has been reported to hit memory limits on some devices.
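To illustrate where the memory can go, here is a minimal sketch in plain pandas. It is not Calliope's implementation; the location and technology names and the values are made up. It assumes a sparsely populated set of location/technology pairs: unstacking then yields a dense grid with one cell per combination, most of them filled with NaN.

```python
import pandas as pd

# Hypothetical data: each of three locations hosts exactly one technology.
index = pd.MultiIndex.from_tuples(
    [("region1", "ccgt"), ("region2", "pv"), ("region3", "wind")],
    names=["locs", "techs"],
)
series = pd.Series([1.0, 2.0, 3.0], index=index)

# Unstacking creates a dense locs x techs grid, padded with NaN
# wherever a location/technology pair does not exist.
dense = series.unstack("techs")

stored_values = series.size                    # values actually defined
dense_cells = dense.size                       # cells allocated after unstacking
empty_cells = int(dense.isna().sum().sum())    # cells that hold only NaN

print(stored_values, dense_cells, empty_cells)
```

With many locations, technologies, and carriers, the number of allocated cells grows with the product of the dimension sizes, while the number of defined values may grow much more slowly.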

So, it should be made more efficient. This could be a matter of defining loc::techs etc. as tuples instead of ::-concatenated strings. Then they are automagically parsed as a MultiIndex, instead of needing string operations.
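The difference between the two labelling styles can be sketched in plain pandas as follows. This is an illustration only, not the code in Calliope or in any PR; the labels, values, and level names ("locs", "techs") are assumptions.

```python
import pandas as pd

values = [10.0, 20.0, 30.0]

# Current style: "::"-concatenated labels, which need one string split per label.
concatenated = pd.Series(
    values, index=["region1::ccgt", "region1::pv", "region2::pv"]
)
split_index = pd.MultiIndex.from_tuples(
    [tuple(label.split("::")) for label in concatenated.index],
    names=["locs", "techs"],
)
from_strings = pd.Series(concatenated.to_numpy(), index=split_index)

# Proposed style: tuple labels, from which pandas builds the MultiIndex directly.
from_tuples = pd.Series(
    values,
    index=pd.MultiIndex.from_tuples(
        [("region1", "ccgt"), ("region1", "pv"), ("region2", "pv")],
        names=["locs", "techs"],
    ),
)

# Both routes end in the same structure, e.g. for summing over a single tech.
pv_total = from_tuples.xs("pv", level="techs").sum()
print(pv_total)
```

The tuple route skips the per-label string pass, which is the part that scales with the number of labels in a large array.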

Calliope version

0.6.3

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

2 reactions
brynpickering commented, May 7, 2019

@timtroendle, I actually just switched off postprocessing on the cluster in my runs, due to the same issue… Anyway, I had some stuff waiting to go on this, see PR #231 for a working branch that you could test with. It may still blow up on unstacking the MultiIndex (but my memory profiling suggests a much lower memory use than the previous incarnation of get_formatted_array).

I’m a bit confused by your solution: how does it go from ("loctechscarriers", data_var_df.index) to being possible to select a location using ("loctechscarriers", data_var_df.index)? If it offers an even better solution, I’m happy to look at updating the PR in line with it.

0 reactions
sjpfenninger commented, May 14, 2019

Some ideas in here for further improvements of how we deal with arrays…
