Microphone buffer to Predictions.convert using React Native

Describe the bug
When I send my microphone buffer to Transcribe using Predictions.convert, I only get empty strings back. I'm not sure whether this should be a bug report or a feature request. My guess is that my audio buffer is formatted incorrectly for Predictions.convert, but the docs don't give enough information to verify that. This may be related to this open issue: https://github.com/aws-amplify/amplify-js/issues/4163

To Reproduce
Steps to reproduce the behavior:

  1. Follow the Amplify React-Native tutorial to this point: https://docs.amplify.aws/start/getting-started/nextsteps/q/integration/react-native
  2. Import this module to read the microphone stream: https://github.com/chadsmith/react-native-microphone-stream (the only one I've found that works with React Native)
  3. Convert the buffer using the pcmEncode function here: https://github.com/aws-samples/amazon-transcribe-websocket-static/blob/master/lib/audioUtils.js
  4. Send the buffer using Predictions.convert as described here: https://docs.amplify.aws/lib/predictions/transcribe/q/platform/js#working-with-the-api (steps 2 to 4 are condensed in the sketch after this list)
  5. Build app on Android phone (tested on Pixel 2). Verify that the app has microphone permissions. Press Start and talk into the microphone.
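
Condensed, the data path in steps 2 to 4 looks roughly like the sketch below. This is not a documented example, just the same flow the full App.js (in the Code Snippet section) implements with the UI removed; pcmEncode is the helper copied from the Transcribe websocket sample, and Amplify.configure/addPluggable are assumed to have already run as in App.js:

import MicStream from 'react-native-microphone-stream';
import Predictions from '@aws-amplify/predictions';

// Mic samples arrive as an array of numbers, are encoded to 16-bit
// little-endian PCM by pcmEncode (defined in App.js below), and the
// resulting ArrayBuffer is handed to Predictions.convert.
MicStream.init({bufferSize: 4096 * 32, sampleRate: 16000, bitsPerChannel: 16, channelsPerFrame: 1});
MicStream.addListener(async (data) => {
  const bytes = pcmEncode(data); // ArrayBuffer of 16-bit PCM samples
  const {transcription} = await Predictions.convert({
    transcription: {source: {bytes}, language: 'en-US'},
  });
  console.log(transcription.fullText); // always comes back empty
});
MicStream.start();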

Expected behavior
A transcription of the spoken text should be returned; instead, only empty strings come back.

Code Snippet
My App.js is here:

import React, {useState} from 'react';
import {
  View,
  Text,
  StyleSheet,
  TextInput,
  Button,
  TouchableOpacity,
} from 'react-native';
import Amplify from 'aws-amplify';
import Predictions, {
  AmazonAIPredictionsProvider,
} from '@aws-amplify/predictions';
import awsconfig from './aws-exports';
// import LiveAudioStream from 'react-native-live-audio-stream';
// import AudioRecord from 'react-native-audio-record';
import MicStream from 'react-native-microphone-stream';

// from https://github.com/aws-samples/amazon-transcribe-websocket-static/tree/master/lib
const util_utf8_node = require('@aws-sdk/util-utf8-node'); // utilities for encoding and decoding UTF8
const marshaller = require('@aws-sdk/eventstream-marshaller'); // for converting binary event stream messages to and from JSON
const eventStreamMarshaller = new marshaller.EventStreamMarshaller(
  util_utf8_node.toUtf8,
  util_utf8_node.fromUtf8,
);

Amplify.configure(awsconfig);
Amplify.addPluggable(new AmazonAIPredictionsProvider());
const initialState = {name: '', description: ''};

global.Buffer = global.Buffer || require('buffer').Buffer;

function VoiceCapture() {
  const [text, setText] = useState('');

  // from https://github.com/aws-samples/amazon-transcribe-websocket-static/tree/master/lib
  function pcmEncode(input) {
    var offset = 0;
    var buffer = new ArrayBuffer(input.length * 2);
    var view = new DataView(buffer);
    for (var i = 0; i < input.length; i++, offset += 2) {
      var s = Math.max(-1, Math.min(1, input[i]));
      view.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7fff, true);
    }
    return buffer;
  }

  async function transcribe(bytes) {
    await Predictions.convert({
      transcription: {
        source: {
          bytes,
        },
        language: 'en-US',
      },
    })
      .then(({transcription: {fullText}}) => console.log({fullText}))
      .catch((err) => console.log({err}));
  }

  var listener = MicStream.addListener((data) => {
    // console.log(data);

    // encode the mic input
    let pcmEncodedBuffer = pcmEncode(data);

    // // add the right JSON headers and structure to the message
    // let audioEventMessage = getAudioEventMessage(
    //   global.Buffer.from(pcmEncodedBuffer),
    // );

    // //convert the JSON object + headers into a binary event stream message
    // let binary = eventStreamMarshaller.marshall(audioEventMessage);

    // the docs say this takes a PCM audio byte buffer, so I assume the wrappers above aren't necessary. Tried them anyway with no luck.
    // (https://docs.amplify.aws/lib/predictions/transcribe/q/platform/js#set-up-the-backend)
    transcribe(pcmEncodedBuffer);
  });

  function startTranscribing() {
    MicStream.init({
      bufferSize: 4096 * 32, // tried multiplying this buffer size to send longer chunks - still no luck
      // sampleRate: 44100,
      sampleRate: 16000,
      bitsPerChannel: 16,
      channelsPerFrame: 1,
    });

    MicStream.start();
    console.log('Started mic stream');
  }

  function stopTranscribing() {
    MicStream.stop();
    listener.remove();
  }

  return (
    <View style={styles.container}>
      <View style={styles.horizontalView}>
        <TouchableOpacity
          style={styles.mediumButton}
          onPress={() => {
            // Voice.start('en_US');
            // transcribeAudio();
            startTranscribing();
          }}>
          <Text style={styles.mediumButtonText}>START</Text>
        </TouchableOpacity>

        <TouchableOpacity
          style={styles.mediumButton}
          onPress={() => {
            stopTranscribing();
          }}>
          <Text style={styles.mediumButtonText}>STOP</Text>
        </TouchableOpacity>
      </View>
      <TextInput
        style={styles.editableText}
        multiline
        onChangeText={(editedText) => setText(editedText)}>
        {text}
      </TextInput>
    </View>
  );
}

const App = () => {
  return (
    <View style={styles.container}>
      <VoiceCapture />
    </View>
  );
};

export const colors = {
  primary: '#0049bd',
  white: '#ffffff',
};

export const padding = {
  sm: 8,
  md: 16,
  lg: 24,
  xl: 32,
};
const styles = StyleSheet.create({
  container: {
    flex: 1,
    backgroundColor: 'white',
  },
  bodyText: {
    fontSize: 16,
    height: 20,
    fontWeight: 'normal',
    fontStyle: 'normal',
  },
  mediumButtonText: {
    fontSize: 16,
    height: 20,
    fontWeight: 'normal',
    fontStyle: 'normal',
    color: colors.white,
  },
  smallBodyText: {
    fontSize: 14,
    height: 18,
    fontWeight: 'normal',
    fontStyle: 'normal',
  },
  mediumButton: {
    alignItems: 'center',
    justifyContent: 'center',
    width: 132,
    height: 48,
    padding: padding.md,
    margin: 14,
    backgroundColor: colors.primary,
    fontSize: 20,
    fontStyle: 'normal',
    elevation: 1,
    shadowOffset: {width: 1, height: 1},
    shadowOpacity: 0.2,
    shadowRadius: 2,
    borderRadius: 2,
  },
  editableText: {
    textAlign: 'left',
    textAlignVertical: 'top',
    borderColor: 'black',
    borderWidth: 2,
    padding: padding.md,
    margin: 14,
    flex: 5,
    fontSize: 16,
    height: 20,
  },
  horizontalView: {
    flex: 1,
    flexDirection: 'row',
    alignItems: 'stretch',
    justifyContent: 'center',
  },
});

export default App;
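
One detail that may be worth checking, offered only as a hypothesis: the listener passes the raw ArrayBuffer from pcmEncode straight to Predictions.convert, while the global.Buffer polyfill near the top of App.js suggests a Node-style Buffer may have been intended. A minimal, untested variant of the listener that wraps the PCM data in a Buffer first (not confirmed against the Amplify Predictions implementation):

// Hypothetical variant of the mic listener: wrap the 16-bit PCM ArrayBuffer
// in a Buffer before passing it on as `bytes`. Sketch only, untested.
var listener = MicStream.addListener((data) => {
  let pcmEncodedBuffer = pcmEncode(data); // ArrayBuffer, 16-bit little-endian PCM
  let bytes = global.Buffer.from(pcmEncodedBuffer); // Buffer view over the same data
  transcribe(bytes);
});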

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 19 (10 by maintainers)

Top GitHub Comments

1 reaction
nathander commented, Oct 20, 2022

@meherranjan I ended up abandoning React Native for my project and building in Java and Swift instead.

1 reaction
nathander commented, Aug 14, 2020

@ashika01 I appreciate the update – thanks a lot for looking into this!
