pdf.js memory leak when generating thumbnails - javascript

I'm in the process of creating an nw.js application that needs to show a large number of PDFs. The PDFs are downloaded the first time the application starts. During that initialization phase I also need to create a thumbnail for each PDF, to be shown in lists.
The thumbnail generation itself wasn't an issue while we only had a few PDFs. It works by creating a canvas element, having PDF.js draw the first page onto it, and then saving the canvas as a PNG.
The issue is that PDF.js doesn't seem to unload the PDF between runs. Loading 20 PDF files of about 1 MB each usually leaves nw.js using around 500 MB of RAM. We will eventually have 100+, maybe even thousands of PDFs, so we need to figure out how to free the RAM between thumbnails: at around 80 PDFs, nw.js already uses 2 GB of RAM and freezes my laptop as it runs out of memory.
I've made a simple test that shows this issue:
var fs = require("fs");
var Q = require("q");
var glob = require("glob");

var canvas = document.createElement("canvas");
var ctx = canvas.getContext('2d');

PDFJS.workerSrc = "pdf.worker.js";

function pdf(pdfFile) {
    return new Q.Promise(function (fulfill, reject) {
        PDFJS.getDocument(pdfFile).then(function (pdf) {
            pdf.getPage(1).then(function (page) {
                var viewport = page.getViewport(0.5);
                canvas.height = viewport.height;
                canvas.width = viewport.width;

                var renderContext = {
                    canvasContext: ctx,
                    viewport: viewport
                };

                page.render(renderContext).then(function () {
                    // set to draw behind current content
                    ctx.globalCompositeOperation = "destination-over";
                    // set background color
                    ctx.fillStyle = "#ffffff";
                    // draw background rect on the entire canvas
                    ctx.fillRect(0, 0, canvas.width, canvas.height);

                    var img = canvas.toDataURL("image/png");
                    img = img.replace(/^data:image\/png;base64,/, "");
                    fs.writeFile(pdfFile + ".png", img, 'base64', function (err) {
                        console.log("Done thumbnail for: " + pdfFile);
                        fulfill();
                    });
                });
            });
        });
    });
}

glob("pdf/*.pdf", function (err, files) {
    if (err) {
        console.log(err);
    } else {
        function generate(file) {
            console.log("Generating thumb for: " + file);
            pdf(file).then(function () {
                if (files.length > 0) next();
            });
        }
        function next() {
            var file = files.pop();
            generate(file);
        }
        next();
    }
});
I've never done anything like this before. I've tried reusing the same canvas for all thumbnails, but that didn't seem to change a thing.
I've tried taking a heap snapshot in the developer tools to see what takes up all the RAM, but guess what? Taking the snapshot seems to trigger a garbage collection first, so nw.js drops from 500 MB to around 100 MB before the snapshot is made. This makes me believe the objects are actually marked for collection, but that the GC never gets a chance to run before the computer runs out of RAM. Loading 20 files and then just waiting doesn't trigger a GC, though, and neither does running out of RAM.
I've checked the PDF.js API and documentation, but I could not find anything about how to unload a PDF before loading the next one.
Any ideas on how I should proceed? One idea was to call some external tool, or to build a C/C++ library that I would call through node-ffi, but I'd have to use PDF.js to display the PDFs at a later stage anyway, and I imagine I would just run into the same issue again.
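A direction that might be worth trying (sketch only, not verified against this exact nw.js setup): PDF.js document objects expose a destroy() method (PDFDocumentProxy.destroy()), which is meant to release the parsed document's resources. Calling it once the thumbnail has been written, before moving on to the next file, could let the memory be reclaimed between runs. A minimal variant of the pdf() helper above, with a hypothetical name to keep it apart from the original:
// Sketch: same flow as the pdf() helper above (reuses the shared canvas/ctx/fs),
// but explicitly destroys the document once its thumbnail has been written.
// Assumes pdf.destroy() (PDFDocumentProxy.destroy) is available in the PDF.js build in use.
function pdfToThumb(pdfFile) {
    return new Q.Promise(function (fulfill, reject) {
        PDFJS.getDocument(pdfFile).then(function (pdf) {
            pdf.getPage(1).then(function (page) {
                var viewport = page.getViewport(0.5);
                canvas.height = viewport.height;
                canvas.width = viewport.width;
                page.render({ canvasContext: ctx, viewport: viewport }).then(function () {
                    // (white background fill omitted for brevity)
                    var img = canvas.toDataURL("image/png").replace(/^data:image\/png;base64,/, "");
                    fs.writeFile(pdfFile + ".png", img, 'base64', function (err) {
                        pdf.destroy(); // release the parsed document before the next file
                        if (err) { reject(err); } else { fulfill(); }
                    });
                });
            });
        });
    });
}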

Related

PDF.JS PDF is not rendering properly, appears mirrored and upside down

I'm trying to get pdf.js to work in IE. I've copied the code almost exactly from the "Hello World using base64 encoded PDF" example on the pdf.js site at https://mozilla.github.io/pdf.js/examples/. The PDF renders upside down and mirrored. I've looked around and a common cause of this is reusing the canvas for multiple renders, but I'm not doing that; I'm only rendering once, so I really have no idea.
At the top of my HTML document I have:
$html .= '<canvas width="600px" height="2000px" id="the-canvas"></canvas>';
Then I've basically copied the JS exactly from the demo, like so (the encodedString variable is my PDF base64 string):
var pdfData = atob(encodedString);

// Loaded via <script> tag, create shortcut to access PDF.js exports.
var pdfjsLib = window['pdfjs-dist/build/pdf'];

// The workerSrc property shall be specified.
pdfjsLib.GlobalWorkerOptions.workerSrc = '//mozilla.github.io/pdf.js/build/pdf.worker.js';

// Using DocumentInitParameters object to load binary data.
var loadingTask = pdfjsLib.getDocument({data: pdfData});
loadingTask.promise.then(function(pdf) {
    console.log('PDF loaded');

    // Fetch the first page
    var pageNumber = 1;
    pdf.getPage(pageNumber).then(function(page) {
        console.log('Page loaded');

        var scale = 1.5;
        var viewport = page.getViewport({scale: scale});

        // Prepare canvas using PDF page dimensions
        var canvas = document.getElementById('the-canvas');
        var context = canvas.getContext('2d');
        //canvas.height = viewport.height;
        //canvas.width = viewport.width;

        // Render PDF page into canvas context
        var renderContext = {
            canvasContext: context,
            viewport: viewport
        };
        var renderTask = page.render(renderContext);
        renderTask.promise.then(function () {
            console.log('Page rendered');
        });
    });
}, function (reason) {
    // PDF loading error
    console.error(reason);
});
The only thing I really changed was commenting out the couple of lines that set the canvas width and height from the viewport, because that wasn't working (the canvas was always collapsed), so instead I specified the width and height inline on the canvas element.
I can't seem to include images with this new Stack Overflow design, but the PDF does render and appear; it's just upside down and the text is mirrored, as if you were looking at it in a mirror.
If anyone could give me advice I'd appreciate it. Thanks.
Change {scale: scale} to scale. It wants a number not an object. Example docs are wrong.
The method signature changed in PR #10369, hence:
In version 2.0.943 and earlier it takes "regular" parameters, i.e. formatted as getViewport(scale, rotate, dontFlip).
In version 2.1.266 and later it takes an object, i.e. formatted as getViewport({ scale, rotation, dontFlip }).
Source:
https://github.com/mozilla/pdf.js/issues/10809
PR:
https://github.com/mozilla/pdf.js/pull/10369
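In practice the same call has to be written differently depending on which version is installed, e.g.:
// PDF.js 2.0.943 and earlier: positional arguments
var viewport = page.getViewport(1.5);

// PDF.js 2.1.266 and later: a parameter object
var viewport = page.getViewport({ scale: 1.5 });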

Javascript - Draw specific part of image on canvas

I'm trying to send images from a desktop app to a web app. On the desktop side I have a button that takes a screenshot, compares it with the previous one, calculates the difference between them, and generates a new bitmap based on that difference. It then sends this new bitmap to the web app, which draws it on a canvas. This all happens in real time using socket.io.
My problem: let's say I have Google Chrome on my screen and take a screenshot, then I open cmd.exe and take another one. Assuming cmd.exe is 300 x 100 and starts at the middle of the screen, the newly generated bitmap will contain the cmd.exe pixels, while the rest of the screen (the uncovered parts of Google Chrome) will be black. That's perfect for my case, since I want to reduce bandwidth usage. What I want to do now is take the difference bitmap (blob) on the JavaScript side and draw only the difference on top of the previous screenshot, which would make it look like I transferred the whole screenshot. If I simply paint the blob onto the canvas, I get a black screen with cmd.exe in the middle of it. Here is my current code:
socket.on("image up", (bin) => {
var ctx = canvas[0].getContext('2d');
var img = new Image();
img.onload = function() {
ctx.drawImage(img, 0, 0);
}
var urlCreator = window.URL || window.webkitURL;
var binaryData = [];
binaryData.push(bin);
img.src = urlCreator.createObjectURL(new Blob(binaryData));
});
Any suggestions? Thanks in advance.
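One thing that may help, purely as a sketch: the canvas 2D API can draw just a sub-rectangle of an image via the nine-argument form of drawImage(img, sx, sy, sw, sh, dx, dy, dw, dh). Assuming the desktop side were also to send the bounding box of the changed region along with the bitmap (the x/y/w/h fields below are hypothetical, not part of the original protocol), the handler could paint only that region over the previous frame:
socket.on("image up", (msg) => {
    // msg is assumed to be { bin, x, y, w, h } — the diff bitmap plus the
    // rectangle it covers (hypothetical extension of the original protocol)
    var ctx = canvas[0].getContext('2d');
    var urlCreator = window.URL || window.webkitURL;
    var url = urlCreator.createObjectURL(new Blob([msg.bin]));
    var img = new Image();
    img.onload = function() {
        // copy only the changed rectangle out of the received bitmap and draw it
        // over the previous frame, leaving the rest of the canvas untouched
        ctx.drawImage(img, msg.x, msg.y, msg.w, msg.h,   // source rect
                           msg.x, msg.y, msg.w, msg.h);  // destination rect
        urlCreator.revokeObjectURL(url); // free the blob URL once drawn
    };
    img.src = url;
});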

Is there a way to use a web worker to resize an image client side?

The way I'm resizing images now is by sticking them into a canvas element and then scaling the canvas context. The problem is that when I resize many images, the UI basically freezes. Is there any way I can move this resizing step to a web worker? The problem I'm having is that you can't use document.createElement('canvas') or Image() there, two functions crucial to this implementation.
It is possible. However, because canvas isn't available in a worker, you would have to use your own or third-party code to manipulate the image data in the worker.
For example, you could use https://github.com/nodeca/pica, which quite handily does its processing in a web worker if web workers are supported.
A rough example of using this to resize an image from an img element onto a canvas element...
<button onclick="resize()">Resize</button>
<img id="original" src="my-image.jpg">
<canvas id="resized"></canvas>
With the following JavaScript:
function resize() {
    // Find the original image
    var originalImg = document.getElementById("original");

    // Create an empty canvas element of the same dimensions as the original
    var originalCanvas = document.createElement("canvas");
    originalCanvas.width = originalImg.width;
    originalCanvas.height = originalImg.height;

    // Copy the image contents to the canvas
    var ctx = originalCanvas.getContext("2d");
    ctx.drawImage(originalImg, 0, 0);

    // Set the target dimensions
    var resizedCanvas = document.getElementById("resized");
    resizedCanvas.width = originalCanvas.width / 2;
    resizedCanvas.height = originalCanvas.height / 2;

    // Resize (using web workers if supported)
    pica.resizeCanvas(originalCanvas, resizedCanvas, {}, function(err) {
        // Do something on finish/error
    });
}
This can be seen in action at https://plnkr.co/edit/yPRjxqQkHryqeZKw4YIH?p=preview
Unfortunately, you cannot use integrated browser functions for that. Instead, you need to obtain pixel data:
var data = ctx.getImageData(0,0,canvas.width, canvas.height);
You need to send those to the worker. You can transfer the underlying array buffer instead of copying it:
worker.postMessage({
    name: "image_data",
    data: data.data,
    width: data.width,
    height: data.height
},
    [data.data.buffer] // transfer the underlying ArrayBuffer to the worker (typed arrays themselves are not transferable)
);
I modified a function from some other answer so that it can scale an image using the image data array. It's quite limited, as the scale factor is only allowed to be an integer, which means you can't scale down with it: https://jsfiddle.net/n3drn8v9/5/
I recommend googling some libraries for this, rather than reinventing the wheel.
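For completeness, a rough sketch of the worker side of that transfer (the worker.js file name and the reply message name are made up here, and the resize is only a crude nearest-neighbour halving):
// worker.js (hypothetical): receives the transferred pixel buffer, builds a
// half-size copy by nearest-neighbour sampling, and transfers the result back.
self.onmessage = function (e) {
    if (e.data.name !== "image_data") return;

    var src = e.data.data;     // Uint8ClampedArray backed by the transferred buffer
    var width = e.data.width;
    var height = e.data.height;

    var dstW = Math.floor(width / 2);
    var dstH = Math.floor(height / 2);
    var dst = new Uint8ClampedArray(dstW * dstH * 4);

    for (var y = 0; y < dstH; y++) {
        for (var x = 0; x < dstW; x++) {
            var s = ((y * 2) * width + (x * 2)) * 4; // sample every second pixel
            var d = (y * dstW + x) * 4;
            dst[d]     = src[s];
            dst[d + 1] = src[s + 1];
            dst[d + 2] = src[s + 2];
            dst[d + 3] = src[s + 3];
        }
    }

    // transfer the result back to the main thread instead of copying it
    self.postMessage({ name: "image_data_resized", data: dst, width: dstW, height: dstH }, [dst.buffer]);
};
Back on the main thread the result can be put onto a canvas of the target size with something like ctx.putImageData(new ImageData(msg.data, msg.width, msg.height), 0, 0).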

Swapping image on canvas eats memory [duplicate]

This question already has an answer here: Memory leaks when manipulating images in Chrome (1 answer). Closed 6 years ago.
I think I have a problem related to:
Systematically updating src of IMG. Memory leak
I don't have enough rep to comment on answers but https://stackoverflow.com/a/34085389/3270244 is exactly my case.
var canvasElement = $('canvas', camContainer);
var ctx = canvasElement[0].getContext('2d');

var image = new Image();
image.onload = function() {
    ctx.drawImage(this, 0, 0);
    image.src = '';
};

// for every getCamImg I receive exactly 1 image
socket.on('getCamImg', function(err, data) {
    if (data) {
        var dataImg = data.substring(data.indexOf(';') + 1);
        image.src = dataImg;
    }
    socket.emit('getCamImg');
});
socket.emit('getCamImg');
I change img.src every 1/10 s (JPEGs from a camera) and I can watch the browsers consume more and more memory. Firefox stops at 500 MB, Edge stops at 100 MB, and for Chrome I stopped testing near 1 GB. If I remove the img.src change, everything runs smoothly (without an image, of course).
I found a lot of (at least I think so) related issues:
https://bugs.chromium.org/p/chromium/issues/detail?id=36142
https://bugs.chromium.org/p/chromium/issues/detail?id=337425
memory leak while drawing many new images to canvas
Memory leaks when manipulating images in Chrome
Somewhere someone mentioned (sorry for this :D) that maybe the cache fills up because the old images are kept. I don't think it's a GC problem, because Chrome has a tool to run the GC manually and nothing changes.
Can someone reproduce this or point me in the right direction?
Update:
socket.on('getCamImg', function(err, data) {
    if (data) {
        var dataImg = data.substring(data.indexOf(';') + 1);
        var image = document.createElement("img");
        image.onload = function() {
            ctx.drawImage(this, 0, 0, ctx.canvas.width, ctx.canvas.height);
            socket.emit('getCamImg');
            image.src = '';
        };
        image.src = dataImg;
    }
});
This works well in Firefox (the image.src = '' is important). Chrome still leaks.
I'm doing nearly the same thing in my current project. As this is too much for a comment, I'll just share my observations in an answer. This is how I do it:
var canvas = document.getElementById("canvas"),
    ctx = canvas.getContext("2d"),
    onNewImgData = function (data) {
        // create a new image element for every base64 src
        var img = document.createElement("img");
        // bind the onload event handler to the image
        img.onload = function () {
            // draw the image on the canvas
            ctx.drawImage(this, 0, 0);
            // do some stuff
            // ...
            // done, request next image
        };
        // update the image source with given base64 data
        img.src = "data:image/bmp;base64," + data;
    };
I don't clean up anything and I can't see a memory leak in my application, no matter which browser. But I did have a memory leak before, when I logged all the base64 data to the browser console :) That caused the exact same issue you've described.

angularjs compress image before upload

I'm building a web site for mobile devices that uses angular-file-upload.min.js for uploading images from a mobile device's image library.
HTML code:
<div>
    <div class="rating-camera-icon">
        <input type="file" accept="image/*" name="file" ng-file-select="onFileSelect($files)">
    </div>
    <img ng-show="fileName" ng-src="server/{{fileName}}" width="40" style="margin-left:10px">
</div>
JavaScript code:
$scope.onFileSelect = function($files) {
    for (var i = 0; i < $files.length; i++) {
        var file = $files[i];
        if (!file.type.match(/image.*/)) {
            // this file is not an image.
        };
        $scope.upload = $upload.upload({
            url: BASE_URL + 'upload.php',
            data: {myObj: $scope.myModelObj},
            file: file
        }).progress(function(evt) {
            // console.log('percent: ' + parseInt(100.0 * evt.loaded / evt.total));
            // $scope.fileProgress = evt.loaded / evt.total * 100.0;
        }).success(function(data, status, headers, config) {
            // file is uploaded successfully
            $scope.fileName = data;
        });
    }
};
The upload is very slow on mobile devices. How can I compress the file?
Stringifying the image into a base-64 text format is all fine and well, but it takes a small amount of time and certainly does not compress anything. In fact, the result will likely be noticeably larger than the raw image. Unfortunately, your browser will also not gzip uploads (it can of course handle gzipped downloads). You could try to gzip the text yourself using a pure JS solution; looking on GitHub you can find such things, e.g. https://github.com/beatgammit/gzip-js. However, that also takes time, and there is no guarantee that the compressed text version of the image is any smaller than the raw JPEG you attach.
A native mobile app might use some native-code JPEG or PNG optimization before sending (basically resampling the image) if appropriate, but doing this in JavaScript seems potentially problematic at this point in time. Given Atwood's law (that everything will eventually be written in JavaScript) it certainly could be done, but as of mid-2014 it isn't.
You could try to draw the image on a canvas, then convert it to a base64 data URL and upload that string.
I did something like this in a POC. There's a bug in iOS regarding large images on a canvas (such as the ones you could take with the camera), but overall it works nicely. Something like:
file = files[0];
try {
    var URL = window.URL || window.webkitURL,
        imgURL = URL.createObjectURL(file);
    showPicture.src = imgURL;
    imgBlobToStore = imgURL;
    if (AppData.supports_html5_storage()) {
        var canvas = document.getElementById('storingCanvas'),
            ctx = canvas.getContext('2d'),
            img = new Image(),
            convertedFile;
        img.src = imgBlobToStore;
        img.onload = function () {
            canvas.width = img.width;
            canvas.height = img.height;
            ctx.drawImage(img, 0, 0, img.width, img.height);
            convertedFile = canvas.toDataURL("image/jpeg"); // or png
            // replace with angular storage here
            localStorage.setItem($('.pic').attr('id'), convertedFile);
        };
    }
} catch (e) {
    // handle errors (e.g. object URL creation failing) here
}
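Note that the snippet above redraws the image at its original dimensions, so the size reduction comes only from the JPEG re-encoding. To actually shrink the upload, the canvas can also be scaled down before calling toDataURL. A rough sketch (the function name and the 1024 px cap are illustrative, not from the original post):
// Sketch: downscale an image file on an off-screen canvas before uploading it.
function resizeForUpload(file, maxDim, callback) {
    var urlCreator = window.URL || window.webkitURL;
    var url = urlCreator.createObjectURL(file);
    var img = new Image();
    img.onload = function () {
        // scale so the longest side is at most maxDim (never upscale)
        var scale = Math.min(1, maxDim / Math.max(img.width, img.height));
        var canvas = document.createElement("canvas");
        canvas.width = Math.round(img.width * scale);
        canvas.height = Math.round(img.height * scale);
        canvas.getContext("2d").drawImage(img, 0, 0, canvas.width, canvas.height);
        urlCreator.revokeObjectURL(url);
        // JPEG quality 0.7 — adjust to taste
        callback(canvas.toDataURL("image/jpeg", 0.7));
    };
    img.src = url;
}

// usage (illustrative): resizeForUpload(file, 1024, function (dataUrl) { /* upload dataUrl */ });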
There are several libraries that do this for you on the client side.
https://github.com/oukan/angular-image-compress
https://github.com/sammychl/ng-image-compress
http://angularscript.com/client-side-image-compress-directive-with-angular/
As an alternative to a programmatic solution: if the image is being created by the device camera for upload, why not simply change the resolution of the camera? The smallest resolution may be 10x smaller than the largest, and that may be perfectly suitable for many situations.
