Alright, so I have this line of code:
ipd.Audio(audio[0].data.cpu().numpy(), rate=hparams.sampling_rate)
I am trying to use the audio[0].data.cpu().numpy() part, which contains the audio sample data as a NumPy array.
I want to send it to the front-end, which I know how to do. The problem is that I don't know what to do with the data once it gets there. I have done some research on converting NumPy arrays to other forms of data, but I am still pretty lost on how to go about this.
What can I do on the front-end with JavaScript to turn it into audio? Or, better yet, can I use a Flask server to redirect to a GET route that returns an MP3 file?
I would start by looking into AudioBuffers.
Here is an example I copied from here.
This creates white noise, since it pushes random values into the audio buffer. In your case, use your own numeric values instead. Make sure you set the sample rate correctly: it should match the one defined in your Python tool, and you pass it to createBuffer.
<body>
<h1>AudioBuffer example</h1>
<button>Make white noise</button>
<script>
const button = document.querySelector('button');
const myScript = document.querySelector('script');
let AudioContext = window.AudioContext || window.webkitAudioContext;
let audioCtx;
// Stereo
let channels = 2;
function init() {
audioCtx = new AudioContext();
}
button.onclick = function() {
if(!audioCtx) {
init();
}
// Create an empty two second stereo buffer at the
// sample rate of the AudioContext
let frameCount = audioCtx.sampleRate * 2.0;
let myArrayBuffer = audioCtx.createBuffer(channels, frameCount, audioCtx.sampleRate);
// Fill the buffer with white noise;
//just random values between -1.0 and 1.0
for (let channel = 0; channel < channels; channel++) {
// This gives us the actual array that contains the data
let nowBuffering = myArrayBuffer.getChannelData(channel);
for (let i = 0; i < frameCount; i++) {
// Math.random() is in [0; 1.0]
// audio needs to be in [-1.0; 1.0]
nowBuffering[i] = Math.random() * 2 - 1;
}
}
// Get an AudioBufferSourceNode.
// This is the AudioNode to use when we want to play an AudioBuffer
let source = audioCtx.createBufferSource();
// set the buffer in the AudioBufferSourceNode
source.buffer = myArrayBuffer;
// connect the AudioBufferSourceNode to the
// destination so we can hear the sound
source.connect(audioCtx.destination);
// start the source playing
source.start();
source.onended = () => {
console.log('White noise finished');
}
}
</script>
</body>
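To make the "use your numeric values" step concrete, here is a minimal sketch. It assumes the samples arrive as a plain array of mono floats (for example parsed from a JSON response) and that you know the sample rate used on the Python side. The function names toFloat32Samples and playSamples are made up for illustration.

```javascript
// Convert a plain array of numbers into a Float32Array,
// clamped to the [-1, 1] range that the Web Audio API expects.
function toFloat32Samples(values) {
  const samples = new Float32Array(values.length);
  for (let i = 0; i < values.length; i++) {
    samples[i] = Math.max(-1, Math.min(1, values[i]));
  }
  return samples;
}

// Copy mono samples into an AudioBuffer and start playing it.
// audioCtx is an AudioContext; sampleRate should match the Python side
// (hparams.sampling_rate in the question).
function playSamples(audioCtx, values, sampleRate) {
  const samples = toFloat32Samples(values);
  // One channel, one frame per sample, at the source's own sample rate
  const buffer = audioCtx.createBuffer(1, samples.length, sampleRate);
  buffer.getChannelData(0).set(samples);
  const source = audioCtx.createBufferSource();
  source.buffer = buffer;
  source.connect(audioCtx.destination);
  source.start();
  return source;
}
```

In the browser you would call playSamples(audioCtx, arrayFromServer, 22050) from a click handler, like the button in the example above, with 22050 replaced by whatever rate your Python tool uses.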
Here's my basic example. I couldn't put it in a snippet because it generates a security error.
The problem is that the processing rate is higher than I need; I want about 300 to 500 milliseconds between each callback. Is there a way to control it?
And is there a way to pause the processing until the microphone receives input?
Thank you for your help.
HTML output that shows the rate:
<input type='text' id='output' >
The script:
navigator.getUserMedia = navigator.getUserMedia ||
navigator.webkitGetUserMedia ||
navigator.mozGetUserMedia;
if (navigator.getUserMedia) {
navigator.getUserMedia({
audio: true
},
function(stream) {
output=document.getElementById("output");
audioContext = new AudioContext();
analyser = audioContext.createAnalyser();
microphone = audioContext.createMediaStreamSource(stream);
javascriptNode = audioContext.createScriptProcessor(256, 1, 1);
analyser.smoothingTimeConstant = 0;// 0.8;
analyser.fftSize = 32;//1024;
microphone.connect(analyser);
analyser.connect(javascriptNode);
javascriptNode.connect(audioContext.destination);
canvasContext = document.querySelector("#canvas").getContext("2d");
javascriptNode.onaudioprocess = function() {
var array = new Uint8Array(analyser.frequencyBinCount);
analyser.getByteFrequencyData(array);
var values = 0;
var length = array.length;
for (var i = 0; i < length; i++) {
values += (array[i]);
}
var average = values / length;
output.value= average;
}; // end onaudioprocess
},
function(err) {
console.log("The following error occurred: " + err.name)
});
} else {
console.log("getUserMedia not supported");
}
What I'm trying to do is really simple. All I need is to scroll the page a bit whenever the audio volume passes a threshold. If you have a simpler alternative, that would be even better, for example a way to access the volume in a setTimeout callback.
You create your ScriptProcessorNode with a buffer size of 256. That means the onaudioprocess event is fired every 256 frames, which is about every 5.8 ms (at 44.1 kHz). If you want something on the order of 300 ms, you need about 0.3 * 44100 = 13230 frames. The buffer size must be a power of two, so round up to 16384, which is also the largest size allowed and gives about 370 ms between events.
Note also that you don't need the analyser node to get the data. The onaudioprocess handler already receives the audio samples in its event argument (event.inputBuffer), which you don't use.
Also, depending on your use case, you could get rid of the script processor altogether and just use the analyser node to get the data you want. But then you'll need a setTimeout or requestAnimationFrame to periodically request data from the analyser node.
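As a rough sketch of the two points above: the first helper picks a ScriptProcessorNode buffer size for a desired interval, and the second computes a volume level (RMS) directly from the samples that onaudioprocess hands you. The names bufferSizeFor, rmsLevel and isLoud are hypothetical, and the threshold you pick is up to you.

```javascript
// Smallest power-of-two buffer size covering intervalSeconds,
// limited to the 256..16384 range that createScriptProcessor accepts.
function bufferSizeFor(intervalSeconds, sampleRate) {
  const wanted = intervalSeconds * sampleRate;
  let size = 256;
  while (size < wanted && size < 16384) {
    size *= 2;
  }
  return size;
}

// Root-mean-square level of a block of samples in the range [-1, 1].
function rmsLevel(samples) {
  if (samples.length === 0) {
    return 0;
  }
  let sumOfSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumOfSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumOfSquares / samples.length);
}

// True when the block is louder than the given threshold (0..1).
function isLoud(samples, threshold) {
  return rmsLevel(samples) > threshold;
}
```

Inside the handler you would pass event.inputBuffer.getChannelData(0) to isLoud and scroll the page when it returns true.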
I'm exploring the Web Audio api in an attempt to try and adapt some aspects of the api into a non-web framework I'm working on (which'll get compiled for the web via Emscripten).
Take the following code:
var audioCtx = new AudioContext();
// imagine I've called getUserMedia and have the stream from a mic.
var source = audioCtx.createMediaStreamSource(stream);
// make a filter to alter the input somehow
var biquadFilter = audioCtx.createBiquadFilter();
// imagine we've set some settings
source.connect(biquadFilter);
Say I wanted to get the raw data of the input stream after it's been altered by the BiQuadFilter (or any other filter). Is there any way to do that? As far as I can tell it looks like the AnalyserNode might be what I'm looking for but ideally it'd be great to just pull a buffer off the end of the graph if possible.
Any hints or suggestions are appreciated.
There are two ways...
ScriptProcessorNode
You can use a ScriptProcessorNode, which is normally used to process data in your own code, to simply record the raw 32-bit float PCM audio data.
Whether this node outputs anything or not is up to you. I usually copy the input data to the output out of convenience, but there is a slight overhead to that.
MediaRecorder
The MediaRecorder can be used to record MediaStreams, both audio and/or video. First you'll need a MediaStreamAudioDestinationNode. Once you have that, you can use the MediaRecorder with the resulting stream to record it.
It's important to note that with the MediaRecorder you're typically recording compressed audio with a lossy codec. This is essentially the purpose of the MediaRecorder. However, support for PCM in WebM has recently been added by at least Chrome. Just pass {mimeType: 'audio/webm;codecs=pcm'} as the options when instantiating your MediaRecorder.
(I haven't tested this yet, but I suspect you're going to end up with 16-bit PCM, not 32-bit float which is used internally in the Web Audio API.)
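For the ScriptProcessorNode route, a minimal sketch of the recording side might look like this. PcmCollector is a made-up helper, not part of the Web Audio API; it assumes a single channel and just keeps copies of each block so they can be joined later.

```javascript
// Collects raw 32-bit float PCM blocks and joins them into one array.
class PcmCollector {
  constructor() {
    this.chunks = [];
    this.totalLength = 0;
  }

  // Call from onaudioprocess with event.inputBuffer.getChannelData(0).
  add(channelData) {
    // Copy the block: the input buffer is only meant to be used inside
    // the event handler, so its memory may be reused afterwards.
    const copy = new Float32Array(channelData);
    this.chunks.push(copy);
    this.totalLength += copy.length;
  }

  // Join everything recorded so far into a single Float32Array.
  toFloat32Array() {
    const result = new Float32Array(this.totalLength);
    let offset = 0;
    for (const chunk of this.chunks) {
      result.set(chunk, offset);
      offset += chunk.length;
    }
    return result;
  }
}
```

You would connect the filter to the script processor, call add() in its onaudioprocess handler, and read toFloat32Array() when you are done recording.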
Here is a web page. Save it as mycode.html, then open that file in your browser. It will prompt for access to your microphone. Take note of createMediaStreamSource, as well as where the raw audio buffer is accessed and then printed to the browser console log. Essentially, you define callback functions that make the raw audio available on each Web Audio API event loop iteration. Enjoy.
<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>capture microphone then show time & frequency domain output</title>
<script type="text/javascript">
var webaudio_tooling_obj = function () {
var audioContext = new AudioContext();
console.log("audio is starting up ...");
var BUFF_SIZE_RENDERER = 16384;
var SIZE_SHOW = 3; // number of array elements to show in console output
var audioInput = null,
microphone_stream = null,
gain_node = null,
script_processor_node = null,
script_processor_analysis_node = null,
analyser_node = null;
if (!navigator.getUserMedia)
navigator.getUserMedia = navigator.getUserMedia || navigator.webkitGetUserMedia ||
navigator.mozGetUserMedia || navigator.msGetUserMedia;
if (navigator.getUserMedia){
navigator.getUserMedia({audio:true},
function(stream) {
start_microphone(stream);
},
function(e) {
alert('Error capturing audio.');
}
);
} else { alert('getUserMedia not supported in this browser.'); }
// ---
function show_some_data(given_typed_array, num_row_to_display, label) {
var size_buffer = given_typed_array.length;
var index = 0;
console.log("__________ " + label);
if (label === "time") {
for (; index < num_row_to_display && index < size_buffer; index += 1) {
var curr_value_time = (given_typed_array[index] / 128) - 1.0;
console.log(curr_value_time);
}
} else if (label === "frequency") {
for (; index < num_row_to_display && index < size_buffer; index += 1) {
console.log(given_typed_array[index]);
}
} else {
throw new Error("ERROR - must pass time or frequency");
}
}
function process_microphone_buffer(event) {
var i, N, inp, microphone_output_buffer;
// not needed for basic feature set
// microphone_output_buffer = event.inputBuffer.getChannelData(0); // just mono - 1 channel for now
}
function start_microphone(stream){
gain_node = audioContext.createGain();
gain_node.connect( audioContext.destination );
microphone_stream = audioContext.createMediaStreamSource(stream);
microphone_stream.connect(gain_node);
script_processor_node = audioContext.createScriptProcessor(BUFF_SIZE_RENDERER, 1, 1);
script_processor_node.onaudioprocess = process_microphone_buffer;
microphone_stream.connect(script_processor_node);
// --- enable volume control for output speakers
document.getElementById('volume').addEventListener('change', function() {
var curr_volume = this.value;
gain_node.gain.value = curr_volume;
console.log("curr_volume ", curr_volume);
});
// --- setup FFT
script_processor_analysis_node = audioContext.createScriptProcessor(2048, 1, 1);
script_processor_analysis_node.connect(gain_node);
analyser_node = audioContext.createAnalyser();
analyser_node.smoothingTimeConstant = 0;
analyser_node.fftSize = 2048;
microphone_stream.connect(analyser_node);
analyser_node.connect(script_processor_analysis_node);
var buffer_length = analyser_node.frequencyBinCount;
var array_freq_domain = new Uint8Array(buffer_length);
var array_time_domain = new Uint8Array(buffer_length);
console.log("buffer_length " + buffer_length);
script_processor_analysis_node.onaudioprocess = function() {
// get the average for the first channel
analyser_node.getByteFrequencyData(array_freq_domain);
analyser_node.getByteTimeDomainData(array_time_domain);
// draw the spectrogram
if (microphone_stream.playbackState == microphone_stream.PLAYING_STATE) {
show_some_data(array_freq_domain, SIZE_SHOW, "frequency");
show_some_data(array_time_domain, SIZE_SHOW, "time"); // store this to record to aggregate buffer/file
}
};
}
}(); // webaudio_tooling_obj = function()
</script>
</head>
<body>
<p>Volume</p>
<input id="volume" type="range" min="0" max="1" step="0.1" value="0.0"/>
</body>
</html>
A variation of the above approach lets you replace the microphone with your own logic to synthesize the audio curve, again with the ability to access the audio buffer.
One other alternative is to create your graph with an OfflineAudioContext. You do have to know beforehand how much data you want to capture, but if you do, you'll usually get your result faster than real time.
You'll get the raw PCM data so you can save it or analyze it, further modify it, or whatever.
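A rough sketch of the OfflineAudioContext idea, assuming a mono render. The helper name renderOffline and its parameters are hypothetical; the context class is passed in as an argument (in a browser you would pass OfflineAudioContext), and buildGraph is your own function that creates the source and filters and connects them to the context's destination.

```javascript
// Render a graph offline and resolve with the raw PCM of channel 0.
function renderOffline(OfflineContextClass, seconds, sampleRate, buildGraph) {
  // The length must be known up front, in sample frames
  const frameCount = Math.ceil(seconds * sampleRate);
  const offlineCtx = new OfflineContextClass(1, frameCount, sampleRate);
  // Let the caller create nodes, connect them and start the sources
  buildGraph(offlineCtx);
  // startRendering() returns a promise for the rendered AudioBuffer
  return offlineCtx.startRendering().then((rendered) => rendered.getChannelData(0));
}
```

Note that a microphone stream cannot be rendered ahead of time, so this only fits sources whose data you already have, such as a decoded AudioBuffer.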
I would like to use Pizzicato JS with Three JS to create a sound visualizer. But for some reason, after I get the frequency data, it returns a value of 0 for each band. Is there something I'm missing in order to get these frequencies, so I can then manipulate my meshes with Three JS? I've attached a screenshot of my console window and pasted my code below for reference.
var context = Pizzicato.context;
var analyser = context.createAnalyser();
var ambient = new Pizzicato.Sound('./mp3/ambient.mp3', playAmbient);
ambient.loop = true;
ambient.volume = 1;
ambient.connect(analyser);
var frequencyData = new Uint8Array(analyser.frequencyBinCount);
console.log("Frequency Data: " , frequencyData);
console.log("Frequency Data Length: " , frequencyData.length);
function playAmbient(e)
{
console.log("playAmbient();");
ambient.play();
}
Thanks
So I figured it out. I was expecting the frequency data array to already contain the values for each band when I logged it with console.log(). In reality, I had to fill the array using the getByteFrequencyData method. I've pasted my new code below for reference.
context = Pizzicato.context;
analyser = context.createAnalyser();
sound = new Pizzicato.Sound(params, playAmbient);
sound.volume = 1;
sound.connect(analyser);
function playAmbient(e)
{
console.log("playAmbient();");
sound.play();
}
setInterval(function () {
try{
var bufferLength = analyser.frequencyBinCount;
frequencyData = new Uint8Array(bufferLength);
// The statement below was missing, and in return it will then
// update my frequencies for each band given from my
// frequencyData.
analyser.getByteFrequencyData(frequencyData);
// Now I'm seeing the frequencies update in my console.log window
// when each interval is fired.
console.log(frequencyData);
}catch(error){
console.log(error);
}
}, 500);
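Once the array is being filled, a visualizer usually wants a handful of bands rather than every bin. Here is a small sketch of one way to do that; averageBands is a made-up helper, and splitting the bins into equal-width groups is just an assumption (many visualizers use logarithmic bands instead).

```javascript
// Reduce the byte frequency data (values 0..255) to bandCount averages,
// each scaled to the range 0..1.
function averageBands(frequencyData, bandCount) {
  const binsPerBand = Math.floor(frequencyData.length / bandCount);
  const bands = new Array(bandCount).fill(0);
  if (binsPerBand === 0) {
    return bands; // not enough bins to fill every band
  }
  for (let band = 0; band < bandCount; band++) {
    let sum = 0;
    for (let i = 0; i < binsPerBand; i++) {
      sum += frequencyData[band * binsPerBand + i];
    }
    bands[band] = sum / binsPerBand / 255;
  }
  return bands;
}
```

Each value can then drive something like a mesh scale, after calling analyser.getByteFrequencyData(frequencyData) in the interval or animation loop.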
The Web Audio API provides the .stop() method to stop a sound.
I want my sound to decrease in volume before stopping. To do so I used a gain node. However, I'm facing weird issues where some sounds just don't play, and I can't figure out why.
Here is a dumbed-down version of what I do:
https://jsfiddle.net/01p1t09n/1/
You'll hear that every sound plays if you remove the line with setTimeout(). When setTimeout is there, not every sound plays. What really confuses me is that I use push and shift to find the correct source of the sound, yet it seems like a different one stops playing. The only way I can see this happening is if AudioContext.decodeAudioData isn't synchronous. Just try the jsfiddle to get a better understanding, and put your headset on, obviously.
Here is the code of the jsfiddle:
let url = "https://raw.githubusercontent.com/gleitz/midi-js-soundfonts/gh-pages/MusyngKite/acoustic_guitar_steel-mp3/A4.mp3";
let soundContainer = {};
let notesMap = {"A4": [] };
let _AudioContext_ = AudioContext || webkitAudioContext;
let audioContext = new _AudioContext_();
var oReq = new XMLHttpRequest();
oReq.open("GET", url, true);
oReq.responseType = "arraybuffer";
oReq.onload = function (oEvent) {
var arrayBuffer = oReq.response;
makeLoop(arrayBuffer);
};
oReq.send(null);
function makeLoop(arrayBuffer){
soundContainer["A4"] = arrayBuffer;
let currentTime = audioContext.currentTime;
for(let i = 0; i < 10; i++){
//playing at same intervals
play("A4", currentTime + i * 0.5);
setTimeout( () => stop("A4"), 500 + i * 500); //remove this line you will hear all the sounds.
}
}
function play(notePlayed, start) {
audioContext.decodeAudioData(soundContainer[notePlayed], (buffer) => {
let source;
let gainNode;
source = audioContext.createBufferSource();
gainNode = audioContext.createGain();
// pushing notes in note map
notesMap[notePlayed].push({ source, gainNode });
source.buffer = buffer;
source.connect(gainNode);
gainNode.connect(audioContext.destination);
gainNode.gain.value = 1;
source.start(start);
});
}
function stop(notePlayed){
let note = notesMap[notePlayed].shift();
note.source.stop();
}
This part just explains why I do it like this and don't simply use stop(); you can skip it.
The reason I'm doing all this is that I want to stop the sound gracefully, so if there is a way to do so without using setTimeout I'd gladly take it.
Basically I have a map at the top containing my sounds (notes like A1, A#1, B1, ...).
soundMap = {"A": [], "lot": [], "of": [], "sounds": []};
and a play() function where I populate the arrays once I play the sounds:
play(sound) {
// sound is just { soundName, velocity, start}
let source;
let gainNode;
// sound container is just a map from soundname to the sound data.
this.audioContext.decodeAudioData(this.soundContainer[sound.soundName], (buffer) => {
source = this.audioContext.createBufferSource();
gainNode = this.audioContext.createGain();
gainNode.gain.value = sound.velocity;
// pushing sound in sound map
this.soundMap[sound.soundName].push({ source, gainNode });
source.buffer = buffer;
source.connect(gainNode);
gainNode.connect(this.audioContext.destination);
source.start(sound.start);
});
}
And now the part that stops the sounds :
stop(sound){
// remember above, soundMap is a map from "soundName" to an array of {source, gainNode}
let dasound = this.soundMap[sound.soundName].shift();
let gain = dasound.gainNode.gain.value - 0.1;
// we lower the gain via incremental values to not have the sound stop abruptly
let i = 0;
for(; gain > 0; i++, gain -= 0.1){ // watchout funky syntax
((gain, i) => {
setTimeout(() => dasound.gainNode.gain.value = gain, 50 * i );
})(gain, i)
}
// we stop the source after the gain has reached 0 (setTimeout delays are in ms)
setTimeout(() => dasound.source.stop(), i * 50);
}
Aaah, yes, yes, yes! I finally found a lot of things by eventually bothering to read "everything" in the doc (diagonally). And let me tell you, this API is a diamond in the rough. Anyway, they actually have what I wanted with AudioParam:
The AudioParam interface represents an audio-related parameter, usually a parameter of an AudioNode (such as GainNode.gain). An AudioParam can be set to a specific value or a change in value, and can be scheduled to happen at a specific time and following a specific pattern.
It has a function linearRampToValueAtTime().
And they even have an example of what I asked for!
// create audio context
var AudioContext = window.AudioContext || window.webkitAudioContext;
var audioCtx = new AudioContext();
// set basic variables for example
var myAudio = document.querySelector('audio');
var pre = document.querySelector('pre');
var myScript = document.querySelector('script');
pre.innerHTML = myScript.innerHTML;
var linearRampPlus = document.querySelector('.linear-ramp-plus');
var linearRampMinus = document.querySelector('.linear-ramp-minus');
// Create a MediaElementAudioSourceNode
// Feed the HTMLMediaElement into it
var source = audioCtx.createMediaElementSource(myAudio);
// Create a gain node and set its gain value to 0
var gainNode = audioCtx.createGain();
// connect the MediaElementAudioSourceNode to the gainNode
// and the gainNode to the destination
gainNode.gain.setValueAtTime(0, audioCtx.currentTime);
source.connect(gainNode);
gainNode.connect(audioCtx.destination);
// set buttons to do something onclick
linearRampPlus.onclick = function() {
gainNode.gain.linearRampToValueAtTime(1.0, audioCtx.currentTime + 2);
}
linearRampMinus.onclick = function() {
gainNode.gain.linearRampToValueAtTime(0, audioCtx.currentTime + 2);
}
Working example here
They also have different types of timing, like an exponential ramp instead of a linear one, which I guess would fit this scenario better.
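Applied to the original stop() problem, a fade-out without any setTimeout could look roughly like this. fadeOutAndStop is a hypothetical helper; it assumes the { source, gainNode } pairs from the question and takes the current time as an argument (in practice audioContext.currentTime).

```javascript
// Ramp the gain down to 0 over fadeSeconds, then stop the source.
function fadeOutAndStop(source, gainNode, now, fadeSeconds) {
  const gain = gainNode.gain;
  // Drop anything scheduled earlier so the ramp starts from a known state
  gain.cancelScheduledValues(now);
  // Anchor the ramp at the current value; a ramp needs a preceding event
  gain.setValueAtTime(gain.value, now);
  gain.linearRampToValueAtTime(0, now + fadeSeconds);
  // stop() accepts a time in seconds on the AudioContext clock
  source.stop(now + fadeSeconds);
}
```

If you prefer the exponential variant, keep in mind that exponentialRampToValueAtTime cannot ramp to exactly 0, so you would ramp to a small value such as 0.001 instead.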
I have a live, constant source of waveform data that gives me a second of single-channel audio with constant sample rate every second. Currently I play them this way:
// data : Float32Array, context: AudioContext
function audioChunkReceived (context, data, sample_rate) {
var audioBuffer = context.createBuffer(2, data.length, sample_rate);
audioBuffer.getChannelData(0).set(data);
var source = context.createBufferSource(); // creates a sound source
source.buffer = audioBuffer;
source.connect(context.destination);
source.start(0);
}
Audio plays fine but with noticeable pauses between consecutive chunks being played (as expected). I'd like to get rid of them and I understand I'll have to introduce some kind of buffering.
Questions:
Is there a JS library that can do this for me? (I'm in the process of searching through them)
If there is no library that can do this, how should I do it myself?
Detecting when playback finished in one source and have another one ready to play it immediately afterwards? (using AudioBufferSourceNode.onended event handler)
Create one large buffer and copy my audio chunks one after another and control the flow using AudioBufferSourceNode.start AudioBufferSourceNode.stop functions?
Something different?
I've written a small class in TypeScript that serves as buffer for now. It has bufferSize defined for controlling how many chunks it can hold. It's short and self-descriptive so I'll paste it here. There is much to improve so any ideas are welcome.
( you can quickly convert it to JS using: https://www.typescriptlang.org/play/ )
class SoundBuffer {
private chunks : Array<AudioBufferSourceNode> = [];
private isPlaying: boolean = false;
private startTime: number = 0;
private lastChunkOffset: number = 0;
constructor(public ctx:AudioContext, public sampleRate:number,public bufferSize:number = 6, private debug = true) { }
private createChunk(chunk:Float32Array) {
var audioBuffer = this.ctx.createBuffer(2, chunk.length, this.sampleRate);
audioBuffer.getChannelData(0).set(chunk);
var source = this.ctx.createBufferSource();
source.buffer = audioBuffer;
source.connect(this.ctx.destination);
source.onended = (e:Event) => {
this.chunks.splice(this.chunks.indexOf(source),1);
if (this.chunks.length == 0) {
this.isPlaying = false;
this.startTime = 0;
this.lastChunkOffset = 0;
}
};
return source;
}
private log(data:string) {
if (this.debug) {
console.log(new Date().toUTCString() + " : " + data);
}
}
public addChunk(data: Float32Array) {
if (this.isPlaying && (this.chunks.length > this.bufferSize)) {
this.log("chunk discarded");
return; // throw away
} else if (this.isPlaying && (this.chunks.length <= this.bufferSize)) { // schedule & add right now
this.log("chunk accepted");
let chunk = this.createChunk(data);
chunk.start(this.startTime + this.lastChunkOffset);
this.lastChunkOffset += chunk.buffer.duration;
this.chunks.push(chunk);
} else if ((this.chunks.length < (this.bufferSize / 2)) && !this.isPlaying) { // add & don't schedule
this.log("chunk queued");
let chunk = this.createChunk(data);
this.chunks.push(chunk);
} else { // add & schedule entire buffer
this.log("queued chunks scheduled");
this.isPlaying = true;
let chunk = this.createChunk(data);
this.chunks.push(chunk);
this.startTime = this.ctx.currentTime;
this.lastChunkOffset = 0;
for (let i = 0;i<this.chunks.length;i++) {
let chunk = this.chunks[i];
chunk.start(this.startTime + this.lastChunkOffset);
this.lastChunkOffset += chunk.buffer.duration;
}
}
}
}
You don't show how audioChunkReceived is called, but to get seamless playback you have to make sure you have the data before you want to play it and before the previous chunk stops playing.
Once you have this, you can schedule the newest chunk to start playing when the previous one ends by calling start(t), where t is the end time of the previous chunk.
However, if the buffer sample rate is different from the context.sampleRate, it's probably not going to play smoothly because of the resampling that is needed to convert the buffer to the context rate.
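The scheduling part can be sketched as simple bookkeeping of where the previous chunk ends. ChunkScheduler is a made-up name, and the small lead time is my own assumption, added so that the first chunk (or a chunk arriving after a gap) is not scheduled in the past.

```javascript
// Tracks the end time of the last scheduled chunk so the next one
// can start exactly where it ends.
class ChunkScheduler {
  constructor(leadSeconds = 0.1) {
    this.leadSeconds = leadSeconds;
    this.nextStart = 0;
  }

  // currentTime: context.currentTime, duration: audioBuffer.duration.
  // Returns the time to pass to source.start().
  schedule(currentTime, duration) {
    // Continue from the previous chunk, unless playback has run dry
    const startAt = Math.max(this.nextStart, currentTime + this.leadSeconds);
    this.nextStart = startAt + duration;
    return startAt;
  }
}
```

In audioChunkReceived you would then replace source.start(0) with source.start(scheduler.schedule(context.currentTime, audioBuffer.duration)).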
I think it is because you allocate your buffer with 2 channels.
Change that to one, from
context.createBuffer(2, data.length, sample_rate);
to
context.createBuffer(1, data.length, sample_rate);