Speeding up Scaper with in-memory operations
I’ve done some investigation into speeding up Scaper quite a bit. The key idea is to make everything happen in-memory and, most importantly, to reduce the number of exec calls. I’m writing out the roadmap that we came up with offline here. The milestones we should hit to make this happen are:
- Write a profiling script that reliably tells us the speed of Scaper right now. This shouldn’t touch any existing code but should be a good benchmarking tool that we can use to report progress on this issue. (PR: #106)
- We have `soundfile` now, so let’s use `soundfile.info` instead of `sox.file_info`. This avoids a costly exec call. (PR #110)
- The current `pysox` master can now accept and return numpy arrays when doing `tfm.build`. This API may change slightly, however. See this issue and the associated PR. By updating to this, we will be able to do everything in-memory, but will still have exec calls. (PR #111)
- Once we use `soundfile` to load the audio files, we can get rid of `sox.Combiner` and should do all the summing of mixtures in-memory. (PR #111)
- Since we have `soundfile`, which can read from an offset and a duration, we should use that instead of making calls to `tfm.trim`, which is not as efficient. `tfm.trim` decodes the entire file into memory, then trims it, which can be costly for large files, I believe. The same goes for `tfm.fade`, which is slow compared to just multiplying in-memory with a quarter-sine ramp. (PR #117)
- I have been working on making Python bindings for Sox with the goal of it being a drop-in replacement for PySox’s transformer API. The project is soxbindings. In it, I subclass `sox.Transformer` to have the same API with respect to Scaper’s use case. The bindings are generated via `pybind11` and avoid all exec calls. The change at this point should be as simple as `import sox` -> `import soxbindings as sox`, after making `soxbindings` pip-installable with cross-platform wheels. At the moment `soxbindings` does not work on Windows, but we can make it an optional import in Scaper: if it’s installed, use it, otherwise fall back to `sox`.
- We decided that the `tfm.trim` replacement should happen later, as it changes the regression data.
- Now we make use of pyloudnorm, which gets around another exec call that’s in the current LUFS calculation, which uses `ffmpeg`. This breaks the regression data, as pyloudnorm and ffmpeg LUFS don’t match exactly (they’re close enough). Everything up to this point should match the current regression data and pass tests. After this, we need to change the regression data.
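As a rough illustration of the in-memory milestones above (the quarter-sine fade that stands in for `tfm.fade`, and summing events without `sox.Combiner`), here is a minimal numpy sketch. The helper names `quarter_sine_fade` and `mix_events` are hypothetical and are not taken from Scaper’s code; audio is assumed to be shaped `(samples, channels)`.

```python
import numpy as np


def quarter_sine_fade(audio, sr, fade_in_len, fade_out_len):
    """Apply quarter-sine fades (lengths in seconds) to audio shaped (samples, channels)."""
    audio = np.array(audio, dtype=float, copy=True)  # work on a copy
    n_in = min(int(round(fade_in_len * sr)), len(audio))
    n_out = min(int(round(fade_out_len * sr)), len(audio))
    if n_in > 0:
        # Ramp rises from sin(0) = 0 to sin(pi/2) = 1.
        ramp = np.sin(np.linspace(0, np.pi / 2, n_in))
        audio[:n_in] *= ramp[:, None]
    if n_out > 0:
        # Ramp falls from 1 back to 0.
        ramp = np.sin(np.linspace(np.pi / 2, 0, n_out))
        audio[-n_out:] *= ramp[:, None]
    return audio


def mix_events(events, n_samples, n_channels=1):
    """Sum (start_sample, audio) pairs into a single in-memory buffer."""
    mix = np.zeros((n_samples, n_channels))
    for start, audio in events:
        # Clip events that would run past the end of the soundscape.
        end = min(start + len(audio), n_samples)
        mix[start:end] += audio[: end - start]
    return mix
```

Both steps are plain array arithmetic, so no subprocess is spawned per event.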
This issue is related to #76 and #3.
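The optional-import fallback mentioned in the soxbindings milestone could look roughly like the sketch below. The helper name `load_sox_backend` is hypothetical; the only assumption about the two packages is that they are importable as `soxbindings` and `sox`.

```python
import importlib


def load_sox_backend(candidates=("soxbindings", "sox")):
    """Return the first importable module from candidates, or None if none import."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not installed (e.g. soxbindings on Windows); try the next candidate.
            continue
    return None


# Prefer soxbindings when it is installed, otherwise fall back to pysox.
sox = load_sox_backend()
```

Since the subclass is meant to mirror the `sox.Transformer` API for Scaper’s use case, the rest of the code would not need to know which backend was picked.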
Update: the milestone about `Scaper.generate` returning numpy arrays has been moved to a separate issue: #114.
- We can now have the `Scaper.generate` function return numpy arrays. This is useful for on-the-fly mixing. At this stage, we should also make it so that the Scaper object doesn’t have to save jams and audio to disk but can instead just return them. Possibly this could be done via `jams_file=None` and `audio_file=None` when passing it into Scaper as arguments.
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:9 (3 by maintainers)
Top GitHub Comments
All of this happened!
So replacing `tfm.fade` operations with the following snippet works great: https://github.com/justinsalamon/scaper/blob/dfb50f561b532b46930e3bb2bac664ce3fdf261c/scaper/core.py#L1820-L1828
However, when trying to replace `tfm.trim` operations in a similar way by removing: https://github.com/justinsalamon/scaper/blob/dfb50f561b532b46930e3bb2bac664ce3fdf261c/scaper/core.py#L1781-L1783
and replacing it with a SoundFile read from disk at the start and stop specified by the EventSpec, the tests fail, but only when resampling is needed. This is because `sox.Transformer` resamples the entire audio file first, then trims it. In the workflow above, where the trim is done when the audio is loaded, the audio is trimmed first and then resampled. These operations can’t be swapped (resample -> trim != trim -> resample).
However, for long source audio files, it’d be better to trim before resampling. Otherwise, Scaper would incur the cost of resampling the entire audio file before trimming it every time it accessed that audio in a soundscape. Making this change would require updating the regression data, so I suggest we leave it till the end.
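For reference, a trim-at-load step of the kind described here might look like the following sketch. It is not the snippet from the original comment; the helper names and the `source_time` / `event_duration` parameters are hypothetical stand-ins for the values an EventSpec would supply. It uses `soundfile.info` and the `start` / `stop` arguments of `soundfile.read`.

```python
def event_frame_range(source_time, event_duration, samplerate):
    """Convert an event's start time and duration (seconds) to frame indices."""
    start = int(round(source_time * samplerate))
    stop = int(round((source_time + event_duration) * samplerate))
    return start, stop


def read_event_audio(path, source_time, event_duration):
    """Read only the requested segment of an audio file (hypothetical helper)."""
    # Imported here so the frame arithmetic above has no third-party dependency.
    import soundfile as sf

    samplerate = sf.info(path).samplerate  # header lookup, no exec call
    start, stop = event_frame_range(source_time, event_duration, samplerate)
    # always_2d=True returns shape (frames, channels), even for mono files.
    audio, sr = sf.read(path, start=start, stop=stop, always_2d=True)
    return audio, sr
```

The segment comes back at the file’s native sample rate, so any resampling would happen after the trim, which is exactly the ordering that differs from `sox.Transformer`.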