
[Announcement] Improving I/O for correct and consistent experience


tl;dr: how to migrate to the new backend/interface in 0.7

  • If you are using torchaudio in Linux/macOS environments, please use torchaudio.set_audio_backend("sox_io") to adapt to the upcoming changes.

  • If you are in a Windows environment, please set torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False and reload the backend to use the new interface.

  • Note that this ships with some bug fixes for formats other than 16-bit signed integer WAV, so you might experience some BC-breaking changes, as described in the section below.

News

[UPDATE] 2021/03/06

  • All the migration work has been completed on the master branch.

[UPDATE] 2021/02/12

  • Added bits_per_sample and encoding arguments (replacing dtype) to the save function.

[UPDATE] 2021/01/29

  • Added encoding to AudioMetaData

[UPDATE] 2021/01/22

  • Added format argument to the load/info/save functions.
  • Added bits_per_sample to AudioMetaData.

[UPDATE] 2020/10/21

  • Added description of the "soundfile" backend legacy interface.

[UPDATE] 2020/09/18

  • Added migration guide for "soundfile" backend.
  • Moved the phase in which the "soundfile" backend signatures change from 0.9.0 to 0.8.0, so that they match the "sox_io" backend, which becomes the default in 0.8.0.

[UPDATE] 2020/09/17

  • Added information on deprecation of native libsox structures such as signalinfo_t and encoding_t.

Improving I/O for correct and consistent experience

This is an announcement to users that we are making backward-incompatible changes to the I/O functions of torchaudio backends from the 0.7.0 release through the 0.9.0 release.

What is affected?

  • Public APIs

    • torchaudio.load
      • [Linux/macOS] By switching the default backend from "sox" to "sox_io" in 0.8.0, loading audio formats other than 16-bit signed integer WAV will return the correct tensor.
      • [Linux/macOS/Windows] The signature of the "soundfile" backend will be changed in 0.8.0 to match that of the "sox_io" backend.
    • torchaudio.save
      • [Linux/macOS] By switching to the "sox_io" backend, saving audio files will no longer degrade the data. The supported formats will be restricted to the tested formats only (please refer to the documentation for the supported formats).
      • [Linux/macOS/Windows] The signature of the "soundfile" backend will be changed in 0.8.0 to match that of the "sox_io" backend.
    • torchaudio.info
      • [Linux/macOS/Windows] The signature of the "soundfile" backend will be changed in 0.8.0 to match that of the "sox_io" backend.
    • torchaudio.load_wav
      • will be removed in 0.9.0 (the load function with normalize=False will provide the same functionality).
  • Internal APIs: the following functions/classes of the "sox" backend were accidentally exposed and will be removed in 0.9.0. There are no replacements for them; please use the save/load/info functions instead.

    • torchaudio.save_encinfo
      • will be removed in 0.9.0
    • torchaudio.get_sox_signalinfo_t
      • will be removed in 0.9.0
    • torchaudio.get_sox_encodinginfo_t
      • will be removed in 0.9.0
    • torchaudio.get_sox_option_t
      • will be removed in 0.9.0
    • torchaudio.get_sox_bool
      • will be removed in 0.9.0

The signatures of the other backends are not planned to be changed within this overhaul plan.

  • Classes
    • torchaudio.SignalInfo and torchaudio.EncodingInfo
      • will be replaced with AudioMetaData in 0.8.0 for the "soundfile" backend
      • will be removed in 0.9.0

Why

There are currently three backends in torchaudio (please refer to the documentation for details).

The "sox" backend is the original backend, which binds libsox with pybind11. The functionalities (load/save/info) of this backend are not well tested and have a number of issues (see https://github.com/pytorch/audio/pull/726).

Fixing these issues in a backward-compatible manner is not straightforward. Therefore, while we were adding TorchScript-compatible I/O functions, we decided to deprecate the original "sox" backend and replace it with the new "sox_io" backend, which has been confirmed not to have those issues.

As we switch the default backend for Linux/macOS from "sox" to "sox_io", we would like to align the interface of the "soundfile" backend as well. Therefore, we introduced the new interface to the "soundfile" backend (rather than a new backend, to reduce the number of public APIs).

When / What Changes

The following is the timeline for the planned changes:

Phase 1 (expected release: 0.7.0, Oct 2020)
  • The "sox" backend issues a deprecation warning. (#904)
  • The "soundfile" backend issues a warning about the expected signature change. (#906)
  • The new interface is added to the "soundfile" backend. (#922)
  • The load_wav functions of all backends are marked as deprecated. (#905)

Phase 2 (expected release: 0.8.0, March 2021)
  • [BC-Breaking] The "sox_io" backend becomes the default backend. Function signatures of the "soundfile" backend are aligned with the "sox_io" backend. (#978)
  • The get_sox_XXX functions issue deprecation warnings. (#975)

Phase 3 (expected release: 0.9.0)
  • The "sox" backend is removed. (#1311)
  • The legacy interface of the "soundfile" backend is removed. (#1311)
  • [BC-Breaking] The load_wav functions are removed from all backends. (#1362)

Planned signature changes of "soundfile" backend in 0.8.0

The following are the planned signature changes of the "soundfile" backend functions in the 0.8.0 release.

info function

The AudioMetaData implementation can be found here. The location of AudioMetaData might change.

~0.7.0:

def info(
    filepath: str,
) -> Tuple[SignalInfo, EncodingInfo]

0.8.0:

def info(
    filepath: str,
    format: Optional[str] = None,
) -> AudioMetaData

Migration

The values returned from the info function will change. Please use the corresponding new attributes.

~0.7.0:

si, ei = torchaudio.info(filepath)
sample_rate = si.rate
num_frames = si.length
num_channels = si.channels
precision = si.precision
bits_per_sample = ei.bits_per_sample
encoding = ei.encoding

0.8.0:

metadata = torchaudio.info(filepath)
sample_rate = metadata.sample_rate
num_frames = metadata.num_frames
num_channels = metadata.num_channels
bits_per_sample = metadata.bits_per_sample
encoding = metadata.encoding

Note: If the attribute you are using is missing, please file a Feature Request issue.
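To make the new attribute names concrete, here is a minimal stdlib-only sketch that fills an AudioMetaData-like record from a PCM WAV header using Python's `wave` module. The `AudioMetaData` dataclass and the `wav_info` function are hypothetical stand-ins, not torchaudio's implementation, and the "PCM_S"/"PCM_U" encoding strings are an assumption modeled on the 0.8.0 naming.

```python
import io
import wave
from dataclasses import dataclass
from typing import BinaryIO, Union


@dataclass
class AudioMetaData:
    # Hypothetical stand-in that only mirrors the 0.8.0 attribute names.
    sample_rate: int
    num_frames: int
    num_channels: int
    bits_per_sample: int
    encoding: str


def wav_info(src: Union[str, BinaryIO]) -> AudioMetaData:
    """Read PCM WAV header fields with the stdlib `wave` module (sketch)."""
    with wave.open(src, "rb") as f:
        bits = f.getsampwidth() * 8
        return AudioMetaData(
            sample_rate=f.getframerate(),
            num_frames=f.getnframes(),
            num_channels=f.getnchannels(),
            bits_per_sample=bits,
            # Assumption: 8-bit PCM WAV is unsigned, wider PCM WAV is signed.
            encoding="PCM_U" if bits == 8 else "PCM_S",
        )


# Usage: build a tiny in-memory WAV (100 stereo frames of 16-bit silence).
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 2 * 100)
buf.seek(0)
meta = wav_info(buf)
print(meta.sample_rate, meta.num_frames, meta.num_channels)  # 16000 100 2
```

Note that `precision` from the ~0.7.0 SignalInfo has no counterpart in this record, matching the migration table above.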

load function

~0.7.0:

def load(
    filepath: str,
    # out: Optional[Tensor] = None,
    #     To be removed.
    #     Currently not used.
    #     Raises AssertionError if given.
    normalization: Optional[bool] = True,
    #     To be renamed to normalize.
    #     Currently only accepts True.
    #     Raises AssertionError if any other value is given.
    channels_first: Optional[bool] = True,
    num_frames: int = 0,
    offset: int = 0,
    #     To be renamed to frame_offset.
    # signalinfo: SignalInfo = None,
    #     To be removed.
    #     Currently not used.
    #     Raises AssertionError if given.
    # encodinginfo: EncodingInfo = None,
    #     To be removed.
    #     Currently not used.
    #     Raises AssertionError if given.
    filetype: Optional[str] = None,
    #     To be removed.
    #     Currently not used.
) -> Tuple[Tensor, int]

0.8.0:

def load(
    filepath: str,
    frame_offset: int = 0,
    num_frames: int = -1,
    normalize: bool = True,
    channels_first: bool = True,
    format: Optional[str] = None,  # only required for file-like object input
) -> Tuple[Tensor, int]

Migration

Please change the argument names:

  • normalization -> normalize
  • offset -> frame_offset

~0.7.0:

waveform, sample_rate = torchaudio.load(
    filepath,
    normalization=normalization,
    channels_first=channels_first,
    num_frames=num_frames,
    offset=offset,
)

0.8.0:

waveform, sample_rate = torchaudio.load(
    filepath,
    frame_offset=offset,
    num_frames=num_frames,
    normalize=normalization,
    channels_first=channels_first,
)
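
If many call sites still pass the ~0.7.0 keyword names, a small translation layer can ease the migration. The sketch below is hypothetical (`migrate_load_kwargs` and its lookup tables are not part of torchaudio); it applies the two renames listed above and drops or rejects the removed arguments according to their comments in the ~0.7.0 signature. Mapping `num_frames=0` to `-1` is an assumption based on the two signatures' different defaults (0 in ~0.7.0, -1 in 0.8.0) for "read everything".

```python
from typing import Any, Dict

# ~0.7.0 keyword -> 0.8.0 keyword, following the renames listed above.
_RENAMED = {"normalization": "normalize", "offset": "frame_offset"}
# ~0.7.0 keywords documented as "Raises AssertionError if given".
_REJECTED = {"out", "signalinfo", "encodinginfo"}
# ~0.7.0 keyword documented as "Currently not used".
_IGNORED = {"filetype"}


def migrate_load_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Translate legacy load() keyword arguments to the 0.8.0 names (hypothetical)."""
    migrated: Dict[str, Any] = {}
    for key, value in kwargs.items():
        if key in _IGNORED:
            continue  # had no effect before, so dropping it changes nothing
        if key in _REJECTED:
            if value is not None:
                raise ValueError(f"{key!r} has no 0.8.0 counterpart")
            continue
        migrated[_RENAMED.get(key, key)] = value
    # Assumption: legacy num_frames=0 meant "read everything", spelled -1 in 0.8.0.
    if migrated.get("num_frames") == 0:
        migrated["num_frames"] = -1
    return migrated


legacy = {"normalization": True, "offset": 8000, "num_frames": 0, "channels_first": True}
print(migrate_load_kwargs(legacy))
# {'normalize': True, 'frame_offset': 8000, 'num_frames': -1, 'channels_first': True}
```

With torchaudio 0.8.0 installed, the result could then be unpacked into the call, e.g. torchaudio.load(filepath, **migrate_load_kwargs(legacy)).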

save function

~0.7.0:

def save(
    filepath: str,
    src: Tensor,
    sample_rate: int,
    precision: int = 16,
    #     Moved to the `bits_per_sample` argument.
    channels_first: bool = True,
)

0.8.0:

def save(
    filepath: str,
    src: Tensor,
    sample_rate: int,
    channels_first: bool = True,
    compression: Optional[float] = None,
    #     Added only for compatibility.
    #     soundfile does not support the compression option.
    #     Raises a warning if not None.
    format: Optional[str] = None,
    encoding: Optional[str] = None,
    bits_per_sample: Optional[int] = None,
)

Migration

~0.7.0:

torchaudio.save(
    filepath,
    waveform,
    sample_rate,
    channels_first=channels_first,
)

0.8.0:

torchaudio.save(
    filepath,
    waveform,
    sample_rate,
    channels_first=channels_first,
    bits_per_sample=16,
)
# You can also designate the audio format with `format` and configure the encoding with `compression` and `encoding`. See https://pytorch.org/audio/master/backend.html#save for details.
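
The save migration boils down to moving `precision` to `bits_per_sample` and keeping the old 16-bit default explicit. A hypothetical helper (not part of torchaudio) that applies this to legacy keyword arguments might look like:

```python
from typing import Any, Dict


def migrate_save_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Translate legacy save() keyword arguments to the 0.8.0 names (hypothetical)."""
    migrated = dict(kwargs)
    if "precision" in migrated:
        # ~0.7.0 `precision` moved to the `bits_per_sample` argument.
        migrated["bits_per_sample"] = migrated.pop("precision")
    # ~0.7.0 defaulted precision to 16; pin it so the output stays 16-bit.
    migrated.setdefault("bits_per_sample", 16)
    return migrated


print(migrate_save_kwargs({"channels_first": True}))
# {'channels_first': True, 'bits_per_sample': 16}
print(migrate_save_kwargs({"precision": 24, "channels_first": False}))
# {'channels_first': False, 'bits_per_sample': 24}
```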

BC-breaking changes

Read and write operations on formats other than 16-bit signed integer WAV were affected by small bugs.


Top GitHub Comments

2 reactions
mthrok commented, Jan 22, 2021

(note: I updated the save un-normalization code snippet based on the suggestion.)

Hi @f0k

Thanks for the comment. Those are very good points.

Let me first give you the context. The design principles for the new I/O modules are:

  1. Correct: Since I/O is the first step of data processing, these I/O modules must return data as accurately (as close to what is stored in the file) as possible.
  2. Easy to use: We want our library to be easy to use. Here, that means returning floating-point values within the range [-1.0, 1.0], since it is common practice for DL applications to work on such values.
  3. Predictable/reversible behavior: Since we want the library to be a good building block for research and real-world applications, we want our features to be well-behaved.

For the normalization, it is because of principles 2 and 3 that we return normalized values by default, and that the normalization uses fixed coefficients (determined by the dtype). If we normalized the resulting tensor with a value found in the tensor, users would have questions like "what normalization coefficient was used?", which they might never get an answer to. It is also because of principle 1 that we want to provide the option to return the uncompressed data without normalization. This design is influenced by the scipy.io.wavfile.read function. If someone is working on a non-DL application and wants to decode audio data in a format that other Python libraries do not support, they can use torchaudio, as PyTorch provides zero-overhead conversion from Tensor to NumPy ndarray.

Now, as for the parameter name "normalization", I get that it's confusing. (Other users have had the same confusion.) This is partly historical: the previous backend had a similar argument, and when I started working on this module, we did not intend to introduce a BC-breaking change. As for your suggestion of as_float or floatify, I think there is still ambiguity about the value range of the resulting Tensor. They are more explicit about the data type, but none of the options is perfect, so I am in favor of keeping the name as-is. However, I think the documentation should be updated to state that normalization is based on the data type.

As for the dtype argument, it would be nice to have, but it is also something users can easily do themselves. Since we expect floating-point values in the [-1.0, 1.0] range throughout the library (except for the kaldi module, which was introduced without design review and which we plan to address), and the use of integer types is reserved for user-specific cases, I think the use case is underdefined from our perspective.
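
As a rough illustration of normalizing with a fixed, dtype-determined coefficient (rather than a value found in the data), the sketch below maps signed integer PCM samples to floats by dividing by 2**(bits_per_sample - 1). This coefficient is an assumption chosen for illustration (it is the common single-value convention), not a copy of torchaudio's code, and `normalize_pcm` is a hypothetical name. Because the scale depends only on the sample width, the mapping is predictable and exactly reversible.

```python
from typing import List


def normalize_pcm(samples: List[int], bits_per_sample: int) -> List[float]:
    """Map signed integer PCM samples to floats with a fixed, width-determined scale."""
    # The coefficient depends only on the sample width, never on the data.
    scale = 2 ** (bits_per_sample - 1)
    return [s / scale for s in samples]


int16_samples = [-32768, -16384, 0, 16384, 32767]
floats = normalize_pcm(int16_samples, 16)
print(floats)  # [-1.0, -0.5, 0.0, 0.5, 0.999969482421875]

# Multiplying by the same scale recovers the original integers exactly.
restored = [int(f * 2 ** 15) for f in floats]
assert restored == int16_samples
```

With this convention the most negative integer maps to exactly -1.0, while the largest positive integer lands just below 1.0.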

About the un-normalization process: I looked into the details, and now I think you are right. Let me explain why I suggested the formula. When I started writing the new loading function in C++, I wondered how I could know that my code was doing the right thing, i.e. that the resulting Tensor had the right values. Internally, libsox represents samples as 32-bit signed integers, so normalization was needed. At the time, I did not know how libsox does the conversion internally, so I set up tests and changed the normalization strategy until I found an acceptable one (that is, values close to what the sox command generates, with no overflow). I ended up with this normalization, which is the reverse of what you pointed out. It achieved a closeness of about 4e-05 (or 3e-03 for mp3), which was the best.

Now I understand the libsox code base better, and I dug into it to find out how libsox does it. As you said, it normalizes with a single value and applies clipping.

https://github.com/dmkrepo/libsox/blob/b9dd1a86e71bbd62221904e3e59dfaa9e5e72046/src/sox.h#L994

I think I can update the implementation to do the same, which should yield results even closer to sox.

For the saving part, as @faroit suggested above, I am thinking of including un-normalization inside the save function and defaulting to 16-bit signed integer, so that users are not bothered with un-normalization and the default covers most real-world use cases.
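
A minimal sketch of the un-normalization step described above, assuming a single scale of 2**(bits_per_sample - 1), round-to-nearest (Python's round, which breaks ties to even), and clipping to the signed integer range, with 16-bit as the default. It follows the single-value-plus-clipping idea attributed to libsox above but is not a port of the linked macro; `unnormalize_pcm` is a hypothetical name.

```python
from typing import List


def unnormalize_pcm(values: List[float], bits_per_sample: int = 16) -> List[int]:
    """Convert floats in [-1.0, 1.0] to signed integer PCM with one scale and clipping."""
    scale = 2 ** (bits_per_sample - 1)
    lo, hi = -scale, scale - 1  # e.g. -32768 .. 32767 for 16-bit
    out = []
    for v in values:
        s = round(v * scale)  # single coefficient, nearest integer
        out.append(min(hi, max(lo, s)))  # clip instead of wrapping around
    return out


print(unnormalize_pcm([-1.5, -1.0, -0.5, 0.0, 0.5, 1.0]))
# [-32768, -32768, -16384, 0, 16384, 32767]
```

Clipping matters at the top of the range: 1.0 scales to 32768, which does not fit in int16, so it saturates at 32767 instead of overflowing.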

2 reactions
tbazin commented, Nov 5, 2020

This is great news, this will definitely improve trust and adoption of torchaudio 🙂 !
