FFmpeg converting from video to audio missing duration - javascript

I'm attempting to load YouTube videos via their direct video URL (retrieved using ytdl-core). I load them using the request library. I then pipe the result to a stream, which is used as the input to ffmpeg (via fluent-ffmpeg). The code looks something like this:
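var PassThrough = require('stream').PassThrough;
var request = require('request');
var ytdl = require('ytdl-core');
var ffmpeg = require('fluent-ffmpeg');
var getAudioStream = function(req, res) {
  var requestUrl = 'http://youtube.com/watch?v=' + req.params.videoId;
  var audioStream = new PassThrough();
  var videoUrl;
  ytdl.getInfo(requestUrl, { downloadURL: true }, function(err, info) {
    res.setHeader('Content-Type', 'audio/x-wav');
    res.setHeader('Accept-Ranges', 'bytes');
    videoUrl = info.formats ? info.formats[0].url : '';
    // Download the video and feed it to ffmpeg through the PassThrough stream
    request(videoUrl).pipe(audioStream);
    // Transcode to WAV and stream the result straight to the response
    ffmpeg()
      .input(audioStream)
      .outputOptions('-map_metadata 0')
      .format('wav')
      .pipe(res);
  });
};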
This actually works just fine, and the frontend successfully receives just the audio in WAV format and is playable. However, the audio is missing any information about its size or duration (and all other metadata). This also makes it unseekable.
I'm assuming this is lost somewhere during the ffmpeg stage, because if I load the video directly via the URL passed to request it loads and plays fine, and has a set duration/is seekable. Any ideas?

It isn't possible to know the output size or duration until the conversion has finished. In most cases FFmpeg cannot know this information ahead of time. Even if it could, the way you are executing FFmpeg prevents you from accessing that extra information.
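To make the point about size and duration concrete, here is a minimal sketch that reads the size fields from a WAV header. It assumes the canonical 44-byte PCM layout (byte rate at byte 28, the data chunk id at byte 36 and its size at byte 40); real files may carry extra chunks before the data chunk, so the offsets and the helper names readWavSizes and buildHeader are illustrative only. A writer that cannot seek back in its output, as when piping, has no way to fill these fields in afterwards. The placeholder values checked below (0 and 0xFFFFFFFF) are an assumption about what such a writer leaves behind.

```javascript
// Sketch: inspect the size fields of a canonical 44-byte PCM WAV header.
// The fixed offsets are an assumption; files with extra chunks differ.
function readWavSizes(header) {
  const tag = (from, to) => header.toString('ascii', from, to);
  if (header.length < 44 || tag(0, 4) !== 'RIFF' ||
      tag(8, 12) !== 'WAVE' || tag(36, 40) !== 'data') {
    throw new Error('not a canonical 44-byte WAV header');
  }
  const byteRate = header.readUInt32LE(28); // bytes of audio per second
  const dataSize = header.readUInt32LE(40); // bytes of audio in the file
  // Assumed placeholders meaning "size unknown"
  const sizeKnown = dataSize !== 0 && dataSize !== 0xFFFFFFFF;
  return {
    dataSize,
    byteRate,
    durationSeconds: sizeKnown && byteRate > 0 ? dataSize / byteRate : null,
  };
}

// Hypothetical helper that builds a 16-bit PCM header, for demonstration only.
function buildHeader(dataSize, sampleRate = 44100, channels = 2) {
  const h = Buffer.alloc(44);
  const byteRate = sampleRate * channels * 2;
  h.write('RIFF', 0, 'ascii');
  h.writeUInt32LE(dataSize === 0xFFFFFFFF ? 0xFFFFFFFF : 36 + dataSize, 4);
  h.write('WAVE', 8, 'ascii');
  h.write('fmt ', 12, 'ascii');
  h.writeUInt32LE(16, 16);           // fmt chunk size
  h.writeUInt16LE(1, 20);            // format tag: PCM
  h.writeUInt16LE(channels, 22);
  h.writeUInt32LE(sampleRate, 24);
  h.writeUInt32LE(byteRate, 28);
  h.writeUInt16LE(channels * 2, 32); // block align
  h.writeUInt16LE(16, 34);           // bits per sample
  h.write('data', 36, 'ascii');
  h.writeUInt32LE(dataSize, 40);
  return h;
}
```

With a known data size the duration is simply dataSize / byteRate; with a placeholder there is nothing to compute it from, which matches the missing duration seen on the frontend.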
Besides, to support seeking you need to support range requests. This isn't possible either, short of encoding the file up to the byte requested and streaming from there on.
Basically, this isn't possible by the nature of what you're doing.
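To illustrate why seeking depends on knowing the total size, here is a minimal sketch of resolving a single-range HTTP Range header into byte offsets. The function name resolveRange is hypothetical, and it handles only the simple bytes=start-end forms. It cannot answer anything without totalSize, which is exactly the value that is unavailable while FFmpeg is still encoding.

```javascript
// Sketch: turn "bytes=start-end" into concrete offsets for a resource
// whose total size is known. Returns null when the range is unusable.
function resolveRange(rangeHeader, totalSize) {
  const match = /^bytes=(\d*)-(\d*)$/.exec(rangeHeader || '');
  if (!match || (match[1] === '' && match[2] === '')) return null;
  let start, end;
  if (match[1] === '') {
    // Suffix form "bytes=-N": the last N bytes of the resource
    const suffixLength = Number(match[2]);
    if (suffixLength === 0) return null;
    start = Math.max(totalSize - suffixLength, 0);
    end = totalSize - 1;
  } else {
    start = Number(match[1]);
    // An open-ended or oversized end is clamped to the last byte
    end = match[2] === '' ? totalSize - 1 : Math.min(Number(match[2]), totalSize - 1);
  }
  if (start > end || start >= totalSize) return null;
  return { start, end, contentRange: `bytes ${start}-${end}/${totalSize}` };
}
```

For example, resolveRange('bytes=500-', 1000) yields start 500 and end 999; every branch needs totalSize to produce a valid Content-Range value.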

Related

How to use GCP DLP with a file stream

I'm working with Node.js and GCP Data Loss Prevention to attempt to redact sensitive data from PDFs before I display them. GCP has great documentation on this here
Essentially you pull in the nodejs library and run this
const fileBytes = Buffer.from(fs.readFileSync(filepath)).toString('base64');
// Construct image redaction request
const request = {
  parent: `projects/${projectId}/locations/global`,
  byteItem: {
    type: fileTypeConstant,
    data: fileBytes,
  },
  inspectConfig: {
    minLikelihood: minLikelihood,
    infoTypes: infoTypes,
  },
  imageRedactionConfigs: imageRedactionConfigs,
};
// Run image redaction request
const [response] = await dlp.redactImage(request);
const image = response.redactedImage;
So normally, I'd get the file as a buffer, then pass it to the DLP function like the above. But, I'm no longer getting our files as buffers. Since many files are very large, we now get them from FilesStorage as streams, like so
return FilesStorage.getFileStream(metaFileInfo1, metaFileInfo2, metaFileInfo3, fileId)
  .then(stream => {
    return {fileInfo, stream};
  })
The question is, is it possible to perform DLP image redaction on a stream instead of a buffer? If so, how?
I've found some other questions that say you can stream with ByteContentItem, and GCP's own documentation mentions "streams". But I've tried passing the stream returned from .getFileStream into the byteItem['data'] property above, and it doesn't work.
So chunking the stream up into buffers of appropriate size is going to work best here. There seem to be a number of approaches to build buffers from a stream you can use here.
Potentially relevant: Convert stream into buffer?
(A native stream interface is a good feature request, just not yet there.)
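As a rough illustration of building a buffer from a stream, here is a minimal sketch that collects a Node.js readable stream and returns it base64-encoded. The function name streamToBase64 is hypothetical, and the sketch assumes the stream yields Buffers or strings. Note that it holds the whole file in memory, so it does not avoid the memory cost of large files; it only bridges the stream to an API that expects bytes.

```javascript
const { Readable } = require('stream');

// Sketch: gather every chunk of a readable stream, then base64-encode
// the concatenated bytes. Assumes chunks are Buffers or strings.
async function streamToBase64(stream) {
  const chunks = [];
  for await (const chunk of stream) {
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  return Buffer.concat(chunks).toString('base64');
}

// Usage with an in-memory stream standing in for the file stream
streamToBase64(Readable.from([Buffer.from('ab'), Buffer.from('c')]))
  .then((encoded) => console.log(encoded));
```

The resulting string could then be assigned to byteItem.data in the request shown in the question.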

Best way to stream large local media files with electron main process and handle them in the renderer process

Hello there,
I will try to be more specific about my question. I'm using Angular 9 and the latest Electron version, 8.1.1.
My goal is to read large media files such as videos (>4 GB), songs (FLAC), or other large files located on my file system, using the Node.js stream API in the main process and sending the result (chunk, buffer) to the renderer process.
How can I handle the stream buffer in the renderer process? I would like to use the video and audio APIs already built into HTML5. Using IPC between the main and renderer processes, I will send the chunks, and I would like to be able to display, play, and read those chunks using the HTML5 APIs (e.g. the video or audio tag with its src attribute). What should I give as the src for these elements so that it works like a basic web stream with a URL?
Here is my main process code (where I use fs with nodejs):
ipcMain.on('getFileStream', (event, arg) => {
  try {
    if(fs.existsSync(arg.path) && fs.lstatSync(arg.path).isFile()) {
      const s = fs.createReadStream(arg.path)
      s.on('data', (chunk: Buffer) => {
        console.log(chunk)
        event.sender.send('getFileStreamResponse', chunk)
      }).on('end', () => {
        console.log("end stream")
        event.sender.send('getFileStreamResponse', false)
        s.close()
      });
    }
  } catch(e) {
    console.log('getFileStream', e)
  }
})
I don't want to wait for all chunks to arrive before starting to play the media; I want to be able to play it as soon as enough data is loaded. I have read about MediaSource in HTML5, but wanted to know whether it's the right approach, or whether you know a better way to read large files and rebuild the chunks in the renderer process.
Thanks a lot for your time

Blob name issue with new tab in chrome and firefox [duplicate]

In my Vue app I receive a PDF as a blob, and want to display it using the browser's PDF viewer.
I convert it to a file, and generate an object url:
const blobFile = new File([blob], `my-file-name.pdf`, { type: 'application/pdf' })
this.invoiceUrl = window.URL.createObjectURL(blobFile)
Then I display it by setting that URL as the data attribute of an object element.
<object
:data="invoiceUrl"
type="application/pdf"
width="100%"
style="height: 100vh;">
</object>
The browser then displays the PDF using the PDF viewer. However, in Chrome, the file name that I provide (here, my-file-name.pdf) is not used: I see a hash in the title bar of the PDF viewer, and when I download the file using either 'right click -> Save as...' or the viewer's controls, it saves the file with the blob's hash (cda675a6-10af-42f3-aa68-8795aa8c377d or similar).
The viewer and file name work as I'd hoped in Firefox; it's only Chrome in which the file name is not used.
Is there any way, using native Javascript (including ES6, but no 3rd party dependencies other than Vue), to set the filename for a blob / object element in Chrome?
[edit] If it helps, the response has the following relevant headers:
Content-Type: application/pdf; charset=utf-8
Transfer-Encoding: chunked
Content-Disposition: attachment; filename*=utf-8''Invoice%2016246.pdf;
Content-Description: File Transfer
Content-Encoding: gzip
Chrome's extension seems to rely on the resource name set in the URI, i.e. the file.ext in protocol://domain/path/file.ext.
So if your original URI contains that filename, the easiest might be to simply point your <object>'s data at the URI you fetched the PDF from directly, instead of going the Blob way.
Now, there are cases where that can't be done, and for these there is a convoluted way, which requires setting up a Service Worker, might not work in future versions of Chrome, and probably doesn't work in other browsers.
As we said at first, Chrome parses the URI in search of a filename, so what we have to do is have a URI, with this filename, pointing to our blob:// URI.
To do so, we can use the Cache API, store our File in there as the response for our URL, and then retrieve that File from the Cache in the ServiceWorker.
Or in code,
From the main page
// register our ServiceWorker
navigator.serviceWorker.register('/sw.js')
  .then(...
...
async function displayRenamedPDF(file, filename) {
  // we use a hard-coded fake path
  // to not interfere with legit requests
  const reg_path = "/name-forcer/";
  const url = reg_path + filename;
  // store our File in the Cache
  const store = await caches.open( "name-forcer" );
  await store.put( url, new Response( file ) );
  const frame = document.createElement( "iframe" );
  frame.width = 400;
  frame.height = 500;
  document.body.append( frame );
  // makes the request to the File we just cached
  frame.src = url;
  // not needed anymore
  frame.onload = (evt) => store.delete( url );
}
In the ServiceWorker sw.js
self.addEventListener('fetch', (event) => {
  event.respondWith( (async () => {
    const store = await caches.open("name-forcer");
    const req = event.request;
    const cached = await store.match( req );
    return cached || fetch( req );
  })() );
});
Live example (source)
Edit: This actually doesn't work in Chrome...
While it does correctly set the filename in the dialog, Chrome seems unable to retrieve the file when saving it to disk...
It doesn't seem to perform a network request (and thus our SW isn't catching anything), and I don't really know where to look now.
Still, this may be a good basis for future work on this.
Another solution, which I didn't take the time to check myself, would be to run your own PDF viewer.
Mozilla has made its JS-based plugin pdf.js available, so from there we should be able to set the filename (even though, once again, I haven't dug into it yet).
And as a final note, Firefox is able to use the name property of the File object a blob URI points to.
So even though it's not what OP asked for, in FF all it requires is
const file = new File([blob], filename);
const url = URL.createObjectURL(file);
object.data = url;
In Chrome, the filename is derived from the URL, so as long as you are using a blob URL, the short answer is "No, you cannot set the filename of a PDF object displayed in Chrome." You have no control over the UUID assigned to the blob URL and no way to override that as the name of the page using the object element. It is possible that inside the PDF a title is specified, and that will appear in the PDF viewer as the document name, but you still get the hash name when downloading.
This appears to be a security precaution, but I cannot say for sure.
Of course, if you have control over the URL, you can easily set the PDF filename by changing the URL.
I believe Kaiido's answer expresses, briefly, the best solution here:
"if your original URI contains that filename, the easiest might be to simply make your object's data to the URI you fetched the pdf from directly"
Especially for those coming from this similar question, it would have helped me to have more description of a specific implementation (working for pdfs) that allows the best user experience, especially when serving files that are generated on the fly.
The trick here is using a two-step process that perfectly mimics a normal link or button click. The client must (step 1) request the file be generated and stored server-side long enough for the client to (step 2) request the file itself. This requires you have some mechanism supporting unique identification of the file on disk or in a cache.
Without this process, the user will just see a blank tab while file-generation is in-progress and if it fails, then they'll just get the browser's ERR_TIMED_OUT page. Even if it succeeds, they'll have a hash in the title bar of the PDF viewer tab, and the save dialog will have the same hash as the suggested filename.
Here's the play-by-play to do better:
You can use an anchor tag or a button for the "download" or "view in browser" elements
Step 1 of 2 on the client: that element's click event can make a request for the file to be generated only (not transmitted).
Step 1 of 2 on the server: generate the file and hold on to it. Return only the filename to the client.
Step 2 of 2 on the client:
If viewing the file in the browser, use the filename returned from the generate request to then invoke window.open('view_file/<filename>?fileId=1'). That is the only way to indirectly control the name of the file as shown in the tab title and in any subsequent save dialog.
If downloading, just invoke window.open('download_file?fileId=1').
Step 2 of 2 on the server:
view_file(filename, fileId) handler just needs to serve the file using the fileId and ignore the filename parameter. In .NET, you can use a FileContentResult like File(bytes, contentType);
download_file(fileId) must set the filename via the Content-Disposition header as shown here. In .NET, that's return File(bytes, contentType, desiredFilename);
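The answer's server side is .NET, where the framework builds the header for you. For readers on other stacks, here is a minimal JavaScript sketch of what such a Content-Disposition value could look like, using the filename*=UTF-8'' form that also appears in the question's response headers. The function name attachmentDisposition is hypothetical, and the ASCII fallback rule (replacing unsupported characters with an underscore) is an assumption, not a requirement.

```javascript
// Sketch: build a Content-Disposition value that carries both a plain
// ASCII fallback name and a percent-encoded UTF-8 name.
function attachmentDisposition(filename) {
  // Fallback for clients that ignore filename*: keep printable ASCII only
  const fallback = filename
    .replace(/[^\x20-\x7e]/g, '_')
    .replace(/["\\]/g, '_');
  // encodeURIComponent leaves ' ( ) * unescaped, so escape them by hand
  const encoded = encodeURIComponent(filename).replace(
    /['()*]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase()
  );
  return `attachment; filename="${fallback}"; filename*=UTF-8''${encoded}`;
}

console.log(attachmentDisposition('Invoice 16246.pdf'));
```

The value would be set as the Content-Disposition response header by the download_file handler; the view_file handler does not need it, since there the name comes from the URL path.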
client-side download example:
download_link_clicked() {
  // show spinner
  ajaxGet(generate_file_url,
    {},
    (response) => {
      // success!
      // the server-side is responsible for setting the name
      // of the file when it is being downloaded
      window.open('download_file?fileId=1', "_blank");
      // hide spinner
    },
    () => { // failure
      // hide spinner
      // problem, notify pattern
    },
    null
  );
}
client-side view example:
view_link_clicked() {
  // show spinner
  ajaxGet(generate_file_url,
    {},
    (response) => {
      // success!
      let filename = response.filename;
      // simplest, reliable method I know of for controlling
      // the filename of the PDF when viewed in the browser
      window.open('view_file/'+filename+'?fileId=1')
      // hide spinner
    },
    () => { // failure
      // hide spinner
      // problem, notify pattern
    },
    null
  );
}
I'm using the pdf-lib library; you can click here to learn more about it.
I solved part of this problem by calling the API Document.setTitle("Some title text you want").
The browser displayed my title correctly, but when I click the download button, the file name is still the previous UUID. Perhaps there is another API in the library that allows you to modify the download file name.

efficient way of streaming a html5 canvas content?

I'm trying to stream the content of an HTML5 canvas live using WebSockets and Node.js.
The content of the html5 canvas is just a video.
What I have done so far is:
I convert the canvas to blob and then get the blob URL and send that URL to my nodejs server using websockets.
I get the blob URL like this:
canvas.toBlob(function(blob) {
  url = window.URL.createObjectURL(blob);
});
The blob URLs are generated per video frame (20 frames per second to be exact) and they look something like this:
blob:null/e3e8888e-98da-41aa-a3c0-8fe3f44frt53
I then get that blob URL back from the server via WebSockets so I can use it to draw onto another canvas for other users to see.
I searched for how to draw onto a canvas from a blob URL, but I couldn't find anything close to what I am trying to do.
So the questions I have are:
Is this the correct way of doing what I am trying to achieve? Any pros and cons would be appreciated.
Is there a more efficient way of doing this, or am I on the right path?
Thanks in advance.
EDIT:
I should have mentioned that I cannot use WebRTC in this project and I have to do it all with what I have.
To make it clearer where I am right now, this is how I tried to display the blob URLs mentioned above in my canvas using WebSockets:
websocket.onopen = function(event) {
  websocket.onmessage = function(evt) {
    var val = evt.data;
    console.log("new data "+val);
    var canvas2 = document.querySelector('.canvMotion2');
    var ctx2 = canvas2.getContext('2d');
    var img = new Image();
    img.onload = function(){
      ctx2.drawImage(img, 0, 0)
    }
    img.src = val;
  };
  // Listen for socket closes
  websocket.onclose = function(event) {
  };
  websocket.onerror = function(evt) {
  };
};
The issue is that when I run that code in Firefox, the canvas is always empty/blank, but I see the blob URLs in my console, which makes me think that what I am doing is wrong.
In Google Chrome, I get a "Not allowed to load local resource: blob:" error.
SECOND EDIT:
This is where I am at the moment.
First option
I tried to send the whole blob(s) via WebSockets, and I managed that successfully. However, for some reason I couldn't read them back on the client side.
When I looked at my Node.js server's console, I could see something like this for each blob that I was sending to the server:
<buffer fd67676 hdsjuhsd8 sjhjs....
Second option:
Since the option above failed, I thought of something else: turning each canvas frame into base64 (JPEG), sending that to the server via WebSockets, and then drawing those base64 images onto the canvas on the client side.
I'm sending 24 frames per second to the server.
This worked, but the client-side canvas where these base64 images are displayed is very slow; it looks like it's drawing 1 frame per second. This is the issue that I have at the moment.
Third option:
I also tried to use a video without a canvas. Using WebRTC, I got the video stream as a single Blob, but I'm not entirely sure how to use that and send it to the client side so people can see it.
IMPORTANT: the system that I am working on is not a peer-to-peer connection. It's just one-way streaming that I am trying to achieve.
The most natural way to stream a canvas content: WebRTC
OP made it clear that they can't use it, and it may be the case for many because,
Browser support is still not that great.
It implies to have a MediaServer running (at least ICE+STUN/TURN, and maybe a gateway if you want to stream to more than one peer).
But still, if you can afford it, all you need then to get a MediaStream from your canvas element is
const canvas_stream = canvas.captureStream(minimumFrameRate);
and then you'd just have to add it to your RTCPeerConnection:
pc.addTrack(canvas_stream.getVideoTracks()[0], canvas_stream);
Example below will just display the MediaStream to a <video> element.
let x = 0;
const ctx = canvas.getContext('2d');
draw();
startStream();
function startStream() {
  // grab our MediaStream
  const stream = canvas.captureStream(30);
  // feed the <video>
  vid.srcObject = stream;
  vid.play();
}
function draw() {
  x = (x + 1) % (canvas.width + 50);
  ctx.fillStyle = 'white';
  ctx.fillRect(0,0,canvas.width,canvas.height);
  ctx.fillStyle = 'red';
  ctx.beginPath();
  ctx.arc(x - 25, 75, 25, 0, Math.PI*2);
  ctx.fill();
  requestAnimationFrame(draw);
}
video,canvas{border:1px solid}
<canvas id="canvas">75</canvas>
<video id="vid" controls></video>
The most efficient way to stream a live canvas drawing: stream the drawing operations.
Once again, OP said they didn't want this solution because their set-up doesn't match, but might be helpful for many readers:
Instead of sending the result of the canvas, simply send the drawing commands to your peers, which will then execute these on their side.
But this approach has its own caveats:
You will have to write your own encoder/decoder to pass the commands.
Some cases might get hard to share (e.g. external media would have to be shared and preloaded the same way on all peers, the worst case being drawing another canvas, where you'd also have to share its own drawing process).
You may want to avoid intensive image processing (e.g ImageData manipulation) to be done on all peers.
So a third, definitely less performant way to do it, is like OP tried to do:
Upload frames at regular interval.
I won't go in details in here, but keep in mind that you are sending standalone image files, and hence a whole lot more data than if it had been encoded as a video.
Instead, I'll focus on why OP's code didn't work.
First it may be good to have a small reminder of what is a Blob (the thing that is provided in the callback of canvas.toBlob(callback)).
A Blob is a special JavaScript object, which represents binary data, generally stored either in browser's memory, or at least on user's disk, accessible by the browser.
This binary data is not directly available to JavaScript though. To be able to access it, we need to either read this Blob (through a FileReader or a Response object), or to create a BlobURI, which is a fake URI, allowing most APIs to point at the binary data just like if it was stored on a real server, even though the binary data is still just in the browser's allocated memory.
But this BlobURI being just a fake, temporary, and domain restricted path to the browser's memory, can not be shared to any other cross-domain document, application, and even less computer.
All this to say that what should have been sent to the WebSocket, are the Blobs directly, and not the BlobURIs.
You'd create the BlobURIs only on the consumers' side, so that they can load these images from the Blob's binary data that is now in their allocated memory.
Emitter side:
canvas.toBlob(blob=>ws.send(blob));
Consumer side:
ws.onmessage = function(evt) {
  const blob = evt.data;
  const url = URL.createObjectURL(blob);
  img.src = url;
};
But actually, to even better answer OP's problem, a final solution, which is probably the best in this scenario,
Share the video stream that is painted on the canvas.

Streaming fragmented WebM over WebSocket to MediaSource

I am trying to do the following:
On the server I encode h264 packets into a WebM (MKV) container structure, so that each cluster gets a single frame packet. Only the first data chunk is different, as it contains something called the Initialization Segment. Here it is explained quite well.
Then I stream those clusters one by one in a binary stream via WebSocket to a browser, which is Chrome.
It probably sounds weird that I use the h264 codec and not VP8 or VP9, which are the native codecs for the WebM video format. But it appears that the HTML video tag has no problem playing this sort of video container. If I just write the whole stream to a file and pass it to video.src, it plays fine. But I want to stream it in real time. That's why I am breaking the video into chunks and sending them over WebSocket.
On the client, I am using the MediaSource API. I have little experience with web technologies, but I found that's probably the only way to go in my case.
And it doesn't work. I am getting no errors, the stream runs OK, and the video object emits no warnings or errors (checking via the developer console).
The client side code looks like this:
<script>
$(document).ready(function () {
  var sourceBuffer;
  var player = document.getElementById("video1");
  var mediaSource = new MediaSource();
  player.src = URL.createObjectURL(mediaSource);
  mediaSource.addEventListener('sourceopen', sourceOpen);
  //array with incoming segments:
  var mediaSegments = [];
  var ws = new WebSocket("ws://localhost:8080/echo");
  ws.binaryType = "arraybuffer";
  player.addEventListener("error", function (err) {
    $("#id1").append("video error "+ err.error + "\n");
  }, false);
  player.addEventListener("playing", function () {
    $("#id1").append("playing\n");
  }, false);
  player.addEventListener("progress",onProgress);
  ws.onopen = function () {
    $("#id1").append("Socket opened\n");
  };
  function sourceOpen()
  {
    sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.64001E"');
  }
  function onUpdateEnd()
  {
    if (!mediaSegments.length)
    {
      return;
    }
    sourceBuffer.appendBuffer(mediaSegments.shift());
  }
  var initSegment = true;
  ws.onmessage = function (evt) {
    if (evt.data instanceof ArrayBuffer) {
      var buffer = evt.data;
      //the first segment is always 'initSegment'
      //it must be appended to the buffer first
      if(initSegment == true)
      {
        sourceBuffer.appendBuffer(buffer);
        sourceBuffer.addEventListener('updateend', onUpdateEnd);
        initSegment = false;
      }
      else
      {
        mediaSegments.push(buffer);
      }
    }
  };
});
</script>
I also tried different profile codes for the MIME type, even though I know that my codec is "high profile". I tried the following profiles:
avc1.42E01E baseline
avc1.58A01E extended profile
avc1.4D401E main profile
avc1.64001E high profile
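One way to narrow this kind of problem down is to ask the browser which MIME/codec strings MediaSource accepts before calling addSourceBuffer. The sketch below uses MediaSource.isTypeSupported; the helper name findSupportedTypes and its injectable check parameter are hypothetical, added only so the function can also run outside a browser. The candidate list pairs the video/mp4 string from the code above with video/webm variants as an assumption about what is worth probing; it makes no claim about which of them Chrome accepts.

```javascript
// Sketch: filter a list of MIME/codec strings down to the ones the
// MediaSource implementation reports as supported.
function findSupportedTypes(candidates, isSupported) {
  // Default to the browser's own check when none is injected
  const check = isSupported || ((type) => MediaSource.isTypeSupported(type));
  return candidates.filter((type) => check(type));
}

// Assumed candidates, for illustration only
const candidates = [
  'video/mp4; codecs="avc1.64001E"',
  'video/webm; codecs="avc1.64001E"',
  'video/webm; codecs="vp8"',
];

// In a browser console: console.log(findSupportedTypes(candidates));
```

If a container/codec pairing is reported as unsupported, addSourceBuffer cannot be expected to accept data in that format, regardless of what video.src is able to play.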
In some examples I found from 2-3 years ago, I have seen developers using type="video/x-matroska", but probably a lot has changed since then, because now even video.src doesn't handle this MIME type.
Additionally, to make sure the chunks I am sending through the stream are not corrupted, I opened a local streaming session in VLC player, and it played them progressively with no issues.
The only thing I suspect is that the MediaCodec doesn't know how to handle this sort of hybrid container. But then I wonder why the video object plays such a video fine. Am I missing something in my client-side code? Or does the MediaCodec API indeed not support this type of media?
PS: For those curious why I am using MKV container and not MPEG DASH, for example. The answer is - container simplicity, data writing speed and size. EBML structures are very compact and easy to write in real time.
