How to detect the start of speech on Google Colab - javascript

I am trying to detect the start of speech on Google Colab.
I implemented a detector that captures 1-second recorded chunks using the JavaScript MediaRecorder. Thresholding of the amplitude works, and once speech is detected I append a 3-second recording.
However, there is a gap in the stream between the first 1-second chunk and the following 3-second recording.
I think there may be a delay caused by evaluating the JavaScript for each call.
Could you suggest a better approach?
I also tried doing this purely in JavaScript before, but I couldn't find a way to get the magnitude of the waveform from the Blob in the dataavailable event.
from IPython.display import Javascript, display
import IPython.display as ipd
from google.colab import output
from base64 import b64decode
from io import BytesIO
from pydub import AudioSegment

RECORD = """
const sleep = time => new Promise(resolve => setTimeout(resolve, time))
const b2text = blob => new Promise(resolve => {
  const reader = new FileReader()
  reader.onloadend = e => resolve(e.srcElement.result)
  reader.readAsDataURL(blob)
})
var record = time => new Promise(async resolve => {
  stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  recorder = new MediaRecorder(stream)
  chunks = []
  recorder.ondataavailable = e => {
    chunks.push(e.data)
  }
  recorder.start()
  await sleep(time)
  recorder.onstop = async () => {
    blob = new Blob(chunks)
    text = await b2text(blob)
    resolve(text)
  }
  recorder.stop()
})
"""

def record(sec=3):
    # Run the JS recorder, get back a base64 data URL, and decode it with pydub
    display(Javascript(RECORD))
    s = output.eval_js('record(%d)' % (sec * 1000))
    b = b64decode(s.split(',')[1])
    sound = AudioSegment.from_file(BytesIO(b))
    return sound

def AStoWave(sound):
    waveform = sound.get_array_of_samples()
    sample_rate = sound.frame_rate
    return waveform, sample_rate

def detectSpeech(sec=3):
    is_started = False
    SILENCE_THRESHOLD = 80
    while True:
        sound = record(1)
        vol = sound.max / 1e6
        if not is_started:
            if vol >= SILENCE_THRESHOLD:
                print('start of speech detected')
                is_started = True
        if is_started:
            return sound + record(sec)

waveform, rate = AStoWave(detectSpeech())
ipd.display(ipd.Audio(data=waveform, rate=rate))
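The gap is most likely caused by stopping the MediaRecorder and re-entering eval_js for every 1-second chunk: anything that happens between recorder.stop() and the next recorder.start() is simply never captured. One way around this (a minimal browser-side sketch, not part of the original code; recordContiguousChunks and onChunk are hypothetical names) is to keep a single recorder running and pass a timeslice to start(), so dataavailable fires with back-to-back chunks:
// Sketch: one long-lived MediaRecorder; chunks are contiguous because the
// recorder is never stopped between them.
async function recordContiguousChunks(onChunk, chunkMs = 1000) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  recorder.ondataavailable = e => onChunk(e.data); // called every chunkMs
  recorder.start(chunkMs); // timeslice argument in milliseconds
  return recorder; // call recorder.stop() when finished
}
This would still need to be wired back into Colab (for example by buffering chunks on the JS side and handing them to Python in one eval_js call), but the thresholding can then operate on chunks with no blanks between them.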

Related

Web Audio API - Stereo to Mono

I need to convert a stereo input (channelCount: 2) stream coming from chrome.tabCapture.capture to a mono stream and send it to a server, while keeping the original audio unchanged.
I've tried several things, but the destination stream always has 2 channels.
const context = new AudioContext()
const splitter = context.createChannelSplitter(1)
const merger = context.createChannelMerger(1)
const source = context.createMediaStreamSource(stream)
const dest = context.createMediaStreamDestination()
splitter.connect(merger)
source.connect(splitter)
source.connect(context.destination) // audio unchanged
merger.connect(dest) // mono audio sent to "dest"
console.log(dest.stream.getAudioTracks()[0].getSettings()) // channelCount: 2
I've also tried this:
const context = new AudioContext()
const merger = context.createChannelMerger(1)
const source = context.createMediaStreamSource(stream)
const dest = context.createMediaStreamDestination()
source.connect(context.destination)
source.connect(merger)
merger.connect(dest)
console.log(dest.stream.getAudioTracks()[0].getSettings()) // channelCount: 2
and this:
const context = new AudioContext()
const source = context.createMediaStreamSource(stream)
const dest = context.createMediaStreamDestination({
  channelCount: 1,
  channelCountMode: 'explicit'
})
source.connect(context.destination)
source.connect(dest)
console.log(dest.stream.getAudioTracks()[0].getSettings()) // channelCount: 2
There has to be an easy way to achieve this...
Thanks!
There is a bug in Chrome which requires the audio to flow before the channelCount property gets updated. It's 2 by default.
The following example assumes that the AudioContext is running. Calling resume() in response to a user action should work in case it's not allowed to run on its own.
const audioContext = new AudioContext();

const sourceNode = new MediaStreamAudioSourceNode(
  audioContext,
  { mediaStream }
);
const destinationNode = new MediaStreamAudioDestinationNode(
  audioContext,
  { channelCount: 1 }
);

sourceNode.connect(destinationNode);

setTimeout(() => {
  console.log(destinationNode.stream.getAudioTracks()[0].getSettings());
}, 100);
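If you also want the audio itself to be downmixed to mono rather than just reported with one channel, one option (a sketch, assuming the default 'speakers' channelInterpretation, which mixes left and right down to a single channel) is to route the source through an intermediate node that is forced to one channel:
// Sketch: explicit one-channel downmix before the destination node.
const monoGain = new GainNode(audioContext, {
  channelCount: 1,
  channelCountMode: 'explicit'
});
sourceNode.connect(monoGain);
monoGain.connect(destinationNode);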

Procedural Audio using MediaStreamTrack

I want to encode a video (from a canvas) and add procedural audio to it.
The encoding can be accomplished with MediaRecorder that takes a MediaStream.
For the stream, I want to obtain the video part from a canvas, using the canvas.captureStream() call.
I want to add an audio track to the stream. But instead of microphone input, I want to generate the samples on the fly; for simplicity's sake, let's assume it writes out a sine wave.
How can I create a MediaStreamTrack that generates procedural audio?
The Web Audio API has a createMediaStreamDestination() method, which returns a MediaStreamAudioDestinationNode object. You can connect your audio graph to this node, and it gives you access to a MediaStream instance fed by the audio context's output.
document.querySelector("button").onclick = (evt) => {
const duration = 5;
evt.target.remove();
const audioContext = new AudioContext();
const osc = audioContext.createOscillator();
const destNode = audioContext.createMediaStreamDestination();
const { stream } = destNode;
osc.connect(destNode);
osc.connect(audioContext.destination);
osc.start(0);
osc.frequency.value = 80;
osc.frequency.exponentialRampToValueAtTime(440, audioContext.currentTime+10);
osc.stop(duration);
// stream.addTrack(canvasStream.getVideoTracks()[0]);
const recorder = new MediaRecorder(stream);
const chunks = [];
recorder.ondataavailable = ({data}) => chunks.push(data);
recorder.onstop = (evt) => {
const el = new Audio();
const [{ type }] = chunks; // for Safari
el.src = URL.createObjectURL(new Blob(chunks, { type }));
el.controls = true;
document.body.append(el);
};
recorder.start();
setTimeout(() => recorder.stop(), duration * 1000);
console.log(`Started recording, please wait ${duration}s`);
};
<button>begin</button>
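The commented-out stream.addTrack(...) line hints at the asker's actual goal of pairing the procedural audio with canvas video. A possible way to combine the two (a sketch, assuming canvas is an existing <canvas> element that is being drawn to; the 30 fps value is arbitrary):
// Sketch: merge the canvas capture stream with the procedural audio stream.
const canvasStream = canvas.captureStream(30);
const combined = new MediaStream([
  ...canvasStream.getVideoTracks(),
  ...destNode.stream.getAudioTracks()
]);
const videoRecorder = new MediaRecorder(combined); // records video + audio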

Try to make the audio frequency analyser in javascript

Adobe Audition has an audio frequency analyser, but it requires a person to look at the output.
I want to build a similar function in JavaScript: load an .mp3 file, run the analysis, and get the most frequent (dominant) frequency.
Below, I analysed a 600 Hz pure tone in Adobe Audition.
const AudioContext = window.AudioContext || window.webkitAudioContext
const audioCtx = new AudioContext()
const gainNode = audioCtx.createGain()
const analyser = audioCtx.createAnalyser()
const audio = new Audio('./600hz.mp3')
const source = audioCtx.createMediaElementSource(audio)
source.connect(analyser)
analyser.connect(audioCtx.destination)
gainNode.gain.value = 1
analyser.fftSize = 1024
analyser.connect(gainNode)
const fftArray = new Uint8Array(analyser.fftSize)
analyser.getByteFrequencyData(fftArray)
audio.play()
const timer = setInterval(() => {
  const fftArray = new Uint8Array(analyser.fftSize)
  analyser.getByteFrequencyData(fftArray)
  console.log(fftArray)
}, 100)
setTimeout(() => {
  clearInterval(timer)
}, 2000)
In fact, I don't know much about audio processing.
I hope someone can give me advice.
Thanks.
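No answer is shown for this question, but one common approach (a sketch, reusing the analyser and audioCtx variables from the snippet above) is to read the frequency data, find the loudest bin, and convert its index to Hz; each bin spans sampleRate / fftSize Hz:
// Sketch: estimate the dominant frequency from an AnalyserNode.
function dominantFrequency(analyser, audioCtx) {
  const bins = new Uint8Array(analyser.frequencyBinCount); // fftSize / 2 bins
  analyser.getByteFrequencyData(bins);
  let maxIndex = 0;
  for (let i = 1; i < bins.length; i++) {
    if (bins[i] > bins[maxIndex]) maxIndex = i;
  }
  return maxIndex * audioCtx.sampleRate / analyser.fftSize; // bin width in Hz
}
Note that the buffer should be sized with analyser.frequencyBinCount (half of fftSize), not fftSize as in the question's code.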

WebRTC transmit high audio stream sample rate

Given a WebRTC PeerConnection between two clients, one client is trying to send an audio MediaStream to another.
If this MediaStream is an oscillator at 440 Hz, everything works fine: the audio is very crisp, and the transmission goes through correctly.
However, if the audio is at 20,000 Hz, it is very noisy and crackly. I expect to hear nothing, but I hear a lot of noise instead.
I believe this might be a problem with the sample rate used in the connection; maybe it's not sending the audio at 48,000 samples/second as I expect.
Is there a way for me to increase the sample rate?
Here is a fiddle to reproduce the issue:
https://jsfiddle.net/mb3c5gw1/9/
Minimal reproduction code including a visualizer:
<button id="btn">start</button>
<canvas id="canvas"></canvas>
<script>class OscilloMeter{constructor(a){this.ctx=a.getContext("2d")}listen(a,b){function c(){g.getByteTimeDomainData(j),d.clearRect(0,0,e,f),d.beginPath();let a=0;for(let c=0;c<h;c++){const e=j[c]/128;var b=e*f/2;d.lineTo(a,b),a+=k}d.lineTo(canvas.width,canvas.height/2),d.stroke(),requestAnimationFrame(c)}const d=this.ctx,e=d.canvas.width,f=d.canvas.height,g=b.createAnalyser(),h=g.fftSize=256,j=new Uint8Array(h),k=e/h;d.lineWidth=2,a.connect(g),c()}}</script>
btn.onclick = e => {
  const ctx = new AudioContext();
  const source = ctx.createMediaStreamDestination();
  const oscillator = ctx.createOscillator();
  oscillator.type = 'sine';
  oscillator.frequency.setValueAtTime(20000, ctx.currentTime); // value in hertz
  oscillator.connect(source);
  oscillator.start();
  // a visual cue of AudioNode out (uses an AnalyserNode)
  const meter = new OscilloMeter(canvas);
  const pc1 = new RTCPeerConnection(),
        pc2 = new RTCPeerConnection();
  pc2.ontrack = ({ track }) => {
    const endStream = new MediaStream([track]);
    const src = ctx.createMediaStreamSource(endStream);
    const audio = new Audio();
    audio.srcObject = endStream;
    meter.listen(src, ctx);
    audio.play();
  };
  pc1.onicecandidate = e => pc2.addIceCandidate(e.candidate);
  pc2.onicecandidate = e => pc1.addIceCandidate(e.candidate);
  pc1.oniceconnectionstatechange = e => console.log(pc1.iceConnectionState);
  pc1.onnegotiationneeded = async e => {
    try {
      await pc1.setLocalDescription(await pc1.createOffer());
      await pc2.setRemoteDescription(pc1.localDescription);
      await pc2.setLocalDescription(await pc2.createAnswer());
      await pc1.setRemoteDescription(pc2.localDescription);
    } catch (e) {
      console.error(e);
    }
  };
  const stream = source.stream;
  pc1.addTrack(stream.getAudioTracks()[0], stream);
};
Looking around in the WebRTC samples I found this: https://webrtc.github.io/samples/src/content/peerconnection/audio/ In that example they show a dropdown where you can choose the audio codec. I think this is your solution.
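If you want to set this up programmatically rather than via a dropdown, one possibility (a sketch, not from the linked sample; it assumes setCodecPreferences is supported and runs after addTrack but before the offer is created) is to reorder the codec preferences on the sending connection so Opus is tried first:
// Sketch: prefer Opus on the audio transceiver of pc1.
const transceiver = pc1.getTransceivers()[0];
const codecs = RTCRtpSender.getCapabilities('audio').codecs;
const opus = codecs.filter(c => c.mimeType.toLowerCase() === 'audio/opus');
if (transceiver.setCodecPreferences) {
  transceiver.setCodecPreferences([...opus, ...codecs.filter(c => !opus.includes(c))]);
}
Whether this removes the artefacts at 20 kHz depends on the codec parameters the browsers actually negotiate.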

How to cut an audio recording in JavaScript

Say I record audio from my microphone, using Recorder.js. How can I then trim/crop/slice/cut the recording? Meaning, say I record 3.5 seconds, and I want only the first 3 seconds. Any ideas?
You can stop the original recording at 3 seconds, or use an <audio> element, HTMLMediaElement.captureStream(), and MediaRecorder to record 3 seconds of playback of the original recording.
const audio = new Audio(/* Blob URL or URL of recording */);
const chunks = [];

audio.oncanplay = e => {
  audio.play();
  const stream = audio.captureStream();
  const recorder = new MediaRecorder(stream);
  recorder.ondataavailable = e => {
    if (recorder.state === "recording") {
      recorder.stop();
    }
    chunks.push(e.data);
  };
  recorder.onstop = e => {
    console.log(chunks);
  };
  recorder.start(3000);
};
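An alternative, if you already have the recording as a Blob, is to decode it and keep only the first few seconds of samples (a sketch, not part of the answer above; re-encoding the trimmed buffer back to a file is a separate step):
// Sketch: decode a recorded Blob and keep only the first "seconds" of audio.
async function trimBlobToSeconds(blob, seconds) {
  const ctx = new AudioContext();
  const decoded = await ctx.decodeAudioData(await blob.arrayBuffer());
  const frames = Math.min(decoded.length, Math.floor(seconds * decoded.sampleRate));
  const trimmed = ctx.createBuffer(decoded.numberOfChannels, frames, decoded.sampleRate);
  for (let ch = 0; ch < decoded.numberOfChannels; ch++) {
    trimmed.copyToChannel(decoded.getChannelData(ch).subarray(0, frames), ch);
  }
  return trimmed; // play it via an AudioBufferSourceNode, or re-encode it
}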
