How to detect the start of speech on Google Colab - javascript

I am trying to detect the start of speech on Google Colab.
I implemented a detector that captures 1-second recorded chunks using the JavaScript MediaRecorder. Thresholding of the amplitude works, and once speech is detected I append a 3-second recording.
However, there is a gap in the stream between the first 1-second chunk and the following 3-second recording.
I think there may be a delay caused by evaluating the JavaScript for each call.
Could you suggest a better approach?
I also tried doing this purely in JavaScript before, but I couldn't find a way to get the magnitude of the waveform from the Blob in the dataavailable event.
from IPython.display import Javascript, display
import IPython.display as ipd
from google.colab import output
from base64 import b64decode
from io import BytesIO
from pydub import AudioSegment

RECORD = """
const sleep = time => new Promise(resolve => setTimeout(resolve, time))
const b2text = blob => new Promise(resolve => {
  const reader = new FileReader()
  reader.onloadend = e => resolve(e.srcElement.result)
  reader.readAsDataURL(blob)
})
var record = time => new Promise(async resolve => {
  stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  recorder = new MediaRecorder(stream)
  chunks = []
  recorder.ondataavailable = e => {
    chunks.push(e.data)
  }
  recorder.start()
  await sleep(time)
  recorder.onstop = async () => {
    blob = new Blob(chunks)
    text = await b2text(blob)
    resolve(text)
  }
  recorder.stop()
})
"""

def record(sec=3):
    # Run the JS recorder, get back a base64 data URL, and decode it with pydub
    display(Javascript(RECORD))
    s = output.eval_js('record(%d)' % (sec * 1000))
    b = b64decode(s.split(',')[1])
    sound = AudioSegment.from_file(BytesIO(b))
    return sound

def AStoWave(sound):
    waveform = sound.get_array_of_samples()
    sample_rate = sound.frame_rate
    return waveform, sample_rate

def detectSpeech(sec=3):
    is_started = False
    SILENCE_THRESHOLD = 80
    while True:
        sound = record(1)
        vol = sound.max / 1e6
        if not is_started:
            if vol >= SILENCE_THRESHOLD:
                print('start of speech detected')
                is_started = True
        if is_started:
            return sound + record(sec)

waveform, rate = AStoWave(detectSpeech())
ipd.display(ipd.Audio(data=waveform, rate=rate))
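The gap is most likely caused by stopping the MediaRecorder and re-entering eval_js for every 1-second chunk: anything that happens between recorder.stop() and the next recorder.start() is simply never captured. One way around this (a minimal browser-side sketch, not part of the original code; recordContiguousChunks and onChunk are hypothetical names) is to keep a single recorder running and pass a timeslice to start(), so dataavailable fires with back-to-back chunks:
// Sketch: one long-lived MediaRecorder; chunks are contiguous because the
// recorder is never stopped between them.
async function recordContiguousChunks(onChunk, chunkMs = 1000) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  recorder.ondataavailable = e => onChunk(e.data); // called every chunkMs
  recorder.start(chunkMs); // timeslice argument in milliseconds
  return recorder; // call recorder.stop() when finished
}
This would still need to be wired back into Colab (for example by buffering chunks on the JS side and handing them to Python in one eval_js call), but the thresholding can then operate on chunks with no blanks between them.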

Related

Web Audio API - Stereo to Mono

I need to convert a stereo input (channelCount: 2) stream coming from chrome.tabCapture.capture to a mono stream and send it to a server, while keeping the original audio unchanged.
I've tried several things, but the destination stream always has 2 channels.
const context = new AudioContext()
const splitter = context.createChannelSplitter(1)
const merger = context.createChannelMerger(1)
const source = context.createMediaStreamSource(stream)
const dest = context.createMediaStreamDestination()
splitter.connect(merger)
source.connect(splitter)
source.connect(context.destination) // audio unchanged
merger.connect(dest) // mono audio sent to "dest"
console.log(dest.stream.getAudioTracks()[0].getSettings()) // channelCount: 2
I've also tried this:
const context = new AudioContext()
const merger = context.createChannelMerger(1)
const source = context.createMediaStreamSource(stream)
const dest = context.createMediaStreamDestination()
source.connect(context.destination)
source.connect(merger)
merger.connect(dest)
console.log(dest.stream.getAudioTracks()[0].getSettings()) // channelCount: 2
and this:
const context = new AudioContext()
const source = context.createMediaStreamSource(stream)
const dest = context.createMediaStreamDestination({
  channelCount: 1,
  channelCountMode: 'explicit'
})
source.connect(context.destination)
source.connect(dest)
console.log(dest.stream.getAudioTracks()[0].getSettings()) // channelCount: 2
There has to be an easy way to achieve this...
Thanks!
There is a bug in Chrome which requires the audio to flow before the channelCount property gets updated. It's 2 by default.
The following example assumes that the AudioContext is running. Calling resume() in response to a user action should work in case it's not allowed to run on its own.
const audioContext = new AudioContext();

const sourceNode = new MediaStreamAudioSourceNode(
  audioContext,
  { mediaStream }
);
const destinationNode = new MediaStreamAudioDestinationNode(
  audioContext,
  { channelCount: 1 }
);

sourceNode.connect(destinationNode);

setTimeout(() => {
  console.log(destinationNode.stream.getAudioTracks()[0].getSettings());
}, 100);
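If you also want the audio itself to be downmixed to mono rather than just reported with one channel, one option (a sketch, assuming the default 'speakers' channelInterpretation, which mixes left and right down to a single channel) is to route the source through an intermediate node that is forced to one channel:
// Sketch: explicit one-channel downmix before the destination node.
const monoGain = new GainNode(audioContext, {
  channelCount: 1,
  channelCountMode: 'explicit'
});
sourceNode.connect(monoGain);
monoGain.connect(destinationNode);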

Procedural Audio using MediaStreamTrack

I want to encode a video (from a canvas) and add procedural audio to it.
The encoding can be accomplished with MediaRecorder that takes a MediaStream.
For the stream, I want to obtain the video part from a canvas, using the canvas.captureStream() call.
I want to add an audio track to the stream. But instead of microphone input, I want to generate the samples on the fly; for simplicity's sake, let's assume it writes out a sine wave.
How can I create a MediaStreamTrack that generates procedural audio?
The Web Audio API has a createMediaStreamDestination() method, which returns a MediaStreamAudioDestinationNode object. You can connect your audio graph to this node, and it gives you access to a MediaStream instance fed by the audio context's output.
document.querySelector("button").onclick = (evt) => {
const duration = 5;
evt.target.remove();
const audioContext = new AudioContext();
const osc = audioContext.createOscillator();
const destNode = audioContext.createMediaStreamDestination();
const { stream } = destNode;
osc.connect(destNode);
osc.connect(audioContext.destination);
osc.start(0);
osc.frequency.value = 80;
osc.frequency.exponentialRampToValueAtTime(440, audioContext.currentTime+10);
osc.stop(duration);
// stream.addTrack(canvasStream.getVideoTracks()[0]);
const recorder = new MediaRecorder(stream);
const chunks = [];
recorder.ondataavailable = ({data}) => chunks.push(data);
recorder.onstop = (evt) => {
const el = new Audio();
const [{ type }] = chunks; // for Safari
el.src = URL.createObjectURL(new Blob(chunks, { type }));
el.controls = true;
document.body.append(el);
};
recorder.start();
setTimeout(() => recorder.stop(), duration * 1000);
console.log(`Started recording, please wait ${duration}s`);
};
<button>begin</button>
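The commented-out stream.addTrack(...) line hints at the asker's actual goal of pairing the procedural audio with canvas video. A possible way to combine the two (a sketch, assuming canvas is an existing <canvas> element that is being drawn to; the 30 fps value is arbitrary):
// Sketch: merge the canvas capture stream with the procedural audio stream.
const canvasStream = canvas.captureStream(30);
const combined = new MediaStream([
  ...canvasStream.getVideoTracks(),
  ...destNode.stream.getAudioTracks()
]);
const videoRecorder = new MediaRecorder(combined); // records video + audio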

Try to make the audio frequency analyser in javascript

Adobe Audition has an audio frequency analyser, but it requires a person to look at the output.
I want to build a similar function in JavaScript: load an .mp3 file, run the analysis, and get the most frequent (dominant) frequency.
Below, I analysed a 600 Hz pure tone in Adobe Audition.
const AudioContext = window.AudioContext || window.webkitAudioContext
const audioCtx = new AudioContext()
const gainNode = audioCtx.createGain()
const analyser = audioCtx.createAnalyser()
const audio = new Audio('./600hz.mp3')
const source = audioCtx.createMediaElementSource(audio)
source.connect(analyser)
analyser.connect(audioCtx.destination)
gainNode.gain.value = 1
analyser.fftSize = 1024
analyser.connect(gainNode)
const fftArray = new Uint8Array(analyser.fftSize)
analyser.getByteFrequencyData(fftArray)
audio.play()
const timer = setInterval(() => {
  const fftArray = new Uint8Array(analyser.fftSize)
  analyser.getByteFrequencyData(fftArray)
  console.log(fftArray)
}, 100)
setTimeout(() => {
  clearInterval(timer)
}, 2000)
In fact, I don't know much about audio processing.
I hope someone can give me advice.
Thanks.
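No answer is shown for this question, but one common approach (a sketch, reusing the analyser and audioCtx variables from the snippet above) is to read the frequency data, find the loudest bin, and convert its index to Hz; each bin spans sampleRate / fftSize Hz:
// Sketch: estimate the dominant frequency from an AnalyserNode.
function dominantFrequency(analyser, audioCtx) {
  const bins = new Uint8Array(analyser.frequencyBinCount); // fftSize / 2 bins
  analyser.getByteFrequencyData(bins);
  let maxIndex = 0;
  for (let i = 1; i < bins.length; i++) {
    if (bins[i] > bins[maxIndex]) maxIndex = i;
  }
  return maxIndex * audioCtx.sampleRate / analyser.fftSize; // bin width in Hz
}
Note that the buffer should be sized with analyser.frequencyBinCount (half of fftSize), not fftSize as in the question's code.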

WebRTC transmit high audio stream sample rate

Given a WebRTC PeerConnection between two clients, one client is trying to send an audio MediaStream to another.
If this MediaStream is an oscillator at 440 Hz, everything works fine: the audio is very crisp, and the transmission goes through correctly.
However, if the audio is at 20,000 Hz, it is very noisy and crackly. I expect to hear nothing, but I hear a lot of noise instead.
I believe this might be a problem with the sample rate used in the connection; maybe it's not sending the audio at 48,000 samples/second as I expect.
Is there a way for me to increase the sample rate?
Here is a fiddle to reproduce the issue:
https://jsfiddle.net/mb3c5gw1/9/
Minimal reproduction code including a visualizer:
<button id="btn">start</button>
<canvas id="canvas"></canvas>
<script>class OscilloMeter{constructor(a){this.ctx=a.getContext("2d")}listen(a,b){function c(){g.getByteTimeDomainData(j),d.clearRect(0,0,e,f),d.beginPath();let a=0;for(let c=0;c<h;c++){const e=j[c]/128;var b=e*f/2;d.lineTo(a,b),a+=k}d.lineTo(canvas.width,canvas.height/2),d.stroke(),requestAnimationFrame(c)}const d=this.ctx,e=d.canvas.width,f=d.canvas.height,g=b.createAnalyser(),h=g.fftSize=256,j=new Uint8Array(h),k=e/h;d.lineWidth=2,a.connect(g),c()}}</script>
btn.onclick = e => {
  const ctx = new AudioContext();
  const source = ctx.createMediaStreamDestination();
  const oscillator = ctx.createOscillator();
  oscillator.type = 'sine';
  oscillator.frequency.setValueAtTime(20000, ctx.currentTime); // value in hertz
  oscillator.connect(source);
  oscillator.start();
  // a visual cue of AudioNode out (uses an AnalyserNode)
  const meter = new OscilloMeter(canvas);
  const pc1 = new RTCPeerConnection(),
        pc2 = new RTCPeerConnection();
  pc2.ontrack = ({ track }) => {
    const endStream = new MediaStream([track]);
    const src = ctx.createMediaStreamSource(endStream);
    const audio = new Audio();
    audio.srcObject = endStream;
    meter.listen(src, ctx);
    audio.play();
  };
  pc1.onicecandidate = e => pc2.addIceCandidate(e.candidate);
  pc2.onicecandidate = e => pc1.addIceCandidate(e.candidate);
  pc1.oniceconnectionstatechange = e => console.log(pc1.iceConnectionState);
  pc1.onnegotiationneeded = async e => {
    try {
      await pc1.setLocalDescription(await pc1.createOffer());
      await pc2.setRemoteDescription(pc1.localDescription);
      await pc2.setLocalDescription(await pc2.createAnswer());
      await pc1.setRemoteDescription(pc2.localDescription);
    } catch (e) {
      console.error(e);
    }
  };
  const stream = source.stream;
  pc1.addTrack(stream.getAudioTracks()[0], stream);
};
Looking around in the WebRTC samples I found this: https://webrtc.github.io/samples/src/content/peerconnection/audio/ In that example they show a dropdown where you can choose the audio codec. I think this is your solution.
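If you want to set this up programmatically rather than via a dropdown, one possibility (a sketch, not from the linked sample; it assumes setCodecPreferences is supported and runs after addTrack but before the offer is created) is to reorder the codec preferences on the sending connection so Opus is tried first:
// Sketch: prefer Opus on the audio transceiver of pc1.
const transceiver = pc1.getTransceivers()[0];
const codecs = RTCRtpSender.getCapabilities('audio').codecs;
const opus = codecs.filter(c => c.mimeType.toLowerCase() === 'audio/opus');
if (transceiver.setCodecPreferences) {
  transceiver.setCodecPreferences([...opus, ...codecs.filter(c => !opus.includes(c))]);
}
Whether this removes the artefacts at 20 kHz depends on the codec parameters the browsers actually negotiate.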

How to cut an audio recording in JavaScript

Say I record audio from my microphone, using Recorder.js. How can I then trim/crop/slice/cut the recording? Meaning, say I record 3.5 seconds, and I want only the first 3 seconds. Any ideas?
You can stop the original recording at 3 seconds, or use an <audio> element, HTMLMediaElement.captureStream(), and MediaRecorder to record 3 seconds of playback of the original recording.
const audio = new Audio(/* Blob URL or URL of recording */);
const chunks = [];

audio.oncanplay = e => {
  audio.play();
  const stream = audio.captureStream();
  const recorder = new MediaRecorder(stream);
  recorder.ondataavailable = e => {
    if (recorder.state === "recording") {
      recorder.stop();
    }
    chunks.push(e.data);
  };
  recorder.onstop = e => {
    console.log(chunks);
  };
  recorder.start(3000);
};
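An alternative, if you already have the recording as a Blob, is to decode it and keep only the first few seconds of samples (a sketch, not part of the answer above; re-encoding the trimmed buffer back to a file is a separate step):
// Sketch: decode a recorded Blob and keep only the first "seconds" of audio.
async function trimBlobToSeconds(blob, seconds) {
  const ctx = new AudioContext();
  const decoded = await ctx.decodeAudioData(await blob.arrayBuffer());
  const frames = Math.min(decoded.length, Math.floor(seconds * decoded.sampleRate));
  const trimmed = ctx.createBuffer(decoded.numberOfChannels, frames, decoded.sampleRate);
  for (let ch = 0; ch < decoded.numberOfChannels; ch++) {
    trimmed.copyToChannel(decoded.getChannelData(ch).subarray(0, frames), ch);
  }
  return trimmed; // play it via an AudioBufferSourceNode, or re-encode it
}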
