Speeding up Scaper with in-memory operations

I’ve done some investigation into speeding up Scaper quite a bit. The key idea is to make everything happen in-memory and most importantly reduce the number of exec calls. I’m writing out the roadmap that we came up with offline here. The milestones we should hit to make this happen are:

  • Write a profiling script that reliably tells us the speed of Scaper right now. This shouldn’t touch any existing code but should be a good benchmarking tool that we can use to report progress on this issue. (PR: #106)
  • We have soundfile now, so let’s use soundfile.info instead of sox.file_info. This avoids a costly exec call. (PR #110)
  • The current pysox master now can accept and return numpy arrays when doing tfm.build. This API may change slightly, however. See this issue and the associated PR. By updating to this, we will be able to do everything in-memory, but will still have exec calls. (PR #111)
  • Once we use soundfile to load the audio files, we can get rid of sox.Combiner and should do all the summing of mixtures in-memory. (PR #111)
  • Since we have soundfile which can read from an offset and a duration, we should use that instead of making calls to tfm.trim, which is not as efficient. tfm.trim decodes the entire file into memory, then trims it, which can be costly for large files, I believe. The same goes for tfm.fade which is slow compared to just multiplying in-memory with a quarter-sine ramp. (PR #117)
  • I have been working on making Python bindings for Sox with the goal of it being a drop-in replacement for PySox’s transformer API. The project is soxbindings. In it, I subclass sox.Transformer to have the same API with respect to Scaper’s use case. The bindings are generated via pybind11 and avoid all exec calls. The change at this point should be as simple as import sox -> import soxbindings as sox, after making soxbindings pip-installable with cross-platform wheels. At the moment soxbindings does not work on Windows, but we can make it an optional import in Scaper: if it’s installed, use it, otherwise fall back to sox.
  • We decided that the tfm.trim replacement should happen later, as it changes the regression data.
  • Now we make use of pyloudnorm, which gets around another exec call that’s in the current LUFS calculation, which uses ffmpeg. This breaks the regression data as pyloudnorm and ffmpeg LUFS don’t match exactly (they’re close enough). Everything up to this point should match the current regression data and pass tests. After this, we need to change the regression data.
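
The optional-import idea from the soxbindings milestone can be sketched as follows. This is only an illustration under assumptions: `import_first_available` is a hypothetical helper, not part of Scaper, pysox, or soxbindings, and it only shows the "use it if installed, otherwise fall back" pattern.

```python
import importlib


def import_first_available(module_names):
    """Return the first module in module_names that can be imported.

    Hypothetical helper for illustration only; it is not Scaper's code.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This backend is not installed; try the next candidate.
            continue
    raise ImportError(
        "none of these modules could be imported: " + ", ".join(module_names)
    )


# Intended usage (assumes at least one of the two packages is installed):
#     sox = import_first_available(["soxbindings", "sox"])
# The rest of the code then keeps calling sox.Transformer() as before.
```

A plain try/except around `import soxbindings as sox` would do the same job; the helper just keeps the preference order in one place.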

This issue is related to #76 and #3.

Update: the milestone about what Scaper.generate returns has been moved to a separate issue: #114.

  • We can now have the Scaper.generate function return numpy arrays. This is useful for on-the-fly mixing. At this stage, we should also make it so that the Scaper object doesn’t have to save jams and audio to disk but can instead just return them. Possibly this could be done via jams_file=None and audio_file=None when passing it into Scaper as arguments.
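
To make the "return arrays instead of writing to disk" idea concrete, here is a minimal sketch of summing events into a mixture in-memory with numpy. The function `mix_events` and its arguments are hypothetical and are not Scaper's API; it assumes every event is already at the target sample rate, has shape (n_samples, n_channels), and has a non-negative onset.

```python
import numpy as np


def mix_events(events, duration, sr, n_channels=1):
    """Sum (audio, onset_seconds) pairs into one in-memory mixture.

    Hypothetical sketch, not Scaper's implementation.
    """
    n_total = int(round(duration * sr))
    mix = np.zeros((n_total, n_channels))
    for audio, onset in events:
        start = int(round(onset * sr))
        if start >= n_total:
            # The event begins after the soundscape ends; skip it.
            continue
        # Clip events that would run past the end of the soundscape.
        stop = min(start + len(audio), n_total)
        mix[start:stop] += audio[:stop - start]
    return mix


# Two short events that overlap in the middle of the soundscape.
a = np.ones((4, 1))
b = 2 * np.ones((4, 1))
mixture = mix_events([(a, 0.0), (b, 0.5)], duration=1.5, sr=4)
```

The caller gets the array back directly and can decide whether to write it to disk at all.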

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

1 reaction
pseeth commented, Sep 24, 2020

All of this happened!

1 reaction
pseeth commented, Jul 21, 2020

So replacing tfm.fade operations with the following snippet works great:

https://github.com/justinsalamon/scaper/blob/dfb50f561b532b46930e3bb2bac664ce3fdf261c/scaper/core.py#L1820-L1828
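
For readers without the linked file at hand, a quarter-sine fade applied in-memory can be sketched roughly like this. It is not the code at the link above: `quarter_sine_fade` is a hypothetical name, and the sketch assumes audio of shape (n_samples, n_channels), as returned by soundfile.read with always_2d=True.

```python
import numpy as np


def quarter_sine_fade(audio, sr, fade_in_len, fade_out_len):
    """Apply quarter-sine fade-in and fade-out ramps (lengths in seconds).

    Hypothetical sketch; returns a faded copy and leaves the input untouched.
    """
    faded = np.array(audio, dtype=float)  # np.array copies by default
    n_in = min(int(fade_in_len * sr), len(faded))
    n_out = min(int(fade_out_len * sr), len(faded))
    if n_in > 0:
        # Rises from 0 to 1 along the first quarter of a sine period.
        ramp_in = np.sin(np.linspace(0, np.pi / 2, n_in))
        faded[:n_in] *= ramp_in[:, None]
    if n_out > 0:
        # Falls from 1 to 0 along the same curve, reversed.
        ramp_out = np.sin(np.linspace(np.pi / 2, 0, n_out))
        faded[-n_out:] *= ramp_out[:, None]
    return faded


# A constant one-channel signal makes the ramps easy to inspect.
signal = np.ones((200, 1))
faded_signal = quarter_sine_fade(signal, sr=100, fade_in_len=0.5, fade_out_len=0.5)
```

The whole operation is two element-wise multiplications, with no exec call and no temporary file.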

However, when trying to replace tfm.trim operations in a similar way by removing:

https://github.com/justinsalamon/scaper/blob/dfb50f561b532b46930e3bb2bac664ce3fdf261c/scaper/core.py#L1781-L1783

and replacing it with (using SoundFile to read from disk at specified start and stop according to the EventSpec):

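import soundfile

# synthesize edited foreground sound event
# (e is the event being synthesized, defined in the surrounding Scaper code)
# convert the event's source time and duration from seconds to samples
event_sr = soundfile.info(e.value['source_file']).samplerate
start_sample = int(event_sr * e.value['source_time'])
stop_sample = start_sample + int(event_sr * e.value['source_duration'])

# read only the requested region of the file
event_audio, event_sr = soundfile.read(
    e.value['source_file'], always_2d=True,
    start=start_sample, stop=stop_sample)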

results in test failures only when resampling is needed. This is because sox.Transformer resamples the entire audio file first, then trims it. In the workflow above, where the trim is done when the audio is loaded in, the audio is trimmed first then resampled. These operations can’t be swapped (resample -> trim != trim -> resample).
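
A toy example shows why the two orders disagree. It makes a strong assumption: linear interpolation via np.interp stands in for the resampler, which is not what sox does, and the sample rates and times are made up to keep the arrays small. Here the mismatch comes from the trim point landing on different sample grids; with a real filter-based resampler the segment edges would typically differ as well, since the filter no longer sees the samples outside the trimmed region.

```python
import numpy as np


def resample_linear(x, sr_in, sr_out):
    """Toy resampler using linear interpolation (a stand-in, not sox)."""
    n_out = int(round(len(x) * sr_out / sr_in))
    t_in = np.arange(len(x)) / sr_in
    t_out = np.arange(n_out) / sr_out
    return np.interp(t_out, t_in, x)


def trim(x, sr, start_time, duration):
    """Cut out `duration` seconds starting at `start_time`."""
    start = int(sr * start_time)
    return x[start:start + int(sr * duration)]


sr_in, sr_out = 8, 5             # made-up rates, small on purpose
t = np.arange(4 * sr_in) / sr_in
x = np.sin(2 * np.pi * t)        # 4 seconds of a 1 Hz sine

source_time, source_duration = 1.3, 1.0

# Order used by the original pipeline: resample the whole file, then trim.
resample_then_trim = trim(
    resample_linear(x, sr_in, sr_out), sr_out, source_time, source_duration)

# Order used when trimming at load time: trim first, then resample.
trim_then_resample = resample_linear(
    trim(x, sr_in, source_time, source_duration), sr_in, sr_out)

print(np.allclose(resample_then_trim, trim_then_resample))
```

Both results have the same length, but their samples are taken at different instants of the original signal, so a sample-exact regression test cannot pass for both orders.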

However, for long source audio files, it’d be better to trim before resampling. Otherwise, Scaper would incur the cost of resampling the entire audio file first before trimming it every time it accessed that audio in a soundscape. Making this change would require updating the regression data so I suggest we leave it till the end.
