Handling Large stream with Node.js


Here is my attempt to convert an SVG string to a PNG buffer using Node and the ImageMagick convert tool. The PNG buffer is then used to draw an image in a PDF using pdfkit.
TL;DR: I have a large SVG string that needs to get to a child process "whole" (i.e. not chunked). How do I do so?
This is an example that works for small files.
var child_process = require('child_process');
var pdfDocument = require('pdfkit');
var convert = child_process.spawn("convert", ["svg:", "png:-"]),
    svgsrc = '<svg><rect height="100" width="100" style="fill:red;"/></svg>';
convert.stdout.on('data', function(data) {
  console.log(data.toString('base64'));
  var doc = new pdfDocument();
  doc.image(data);
});
convert.stdin.write(svgsrc);
convert.stdin.end();
This works when the SVG string is "small" (like the one provided in the example). I'm not sure where the cut-off from small to large is.
However, when attempting to use a larger SVG string (something you might generate using D3) like this [ large string ], I run into:
Error: Incomplete or corrupt PNG file
So my question is: How do I ensure that the convert child process reads the entire stream before processing it?
A few things are known:
The PNG buffer is indeed incomplete. I used a diff tool to compare the base64 string generated by the app with the base64 output of an online SVG-to-PNG converter. The non-corrupted string is much larger than the corrupted one (sorry I haven't been more specific about file size). That is, the convert tool seems not to be reading the entire source at any given time.
The source SVG string is not corrupted (as evidenced by the fact that the gist rendered it).
When used on the command line, the convert tool correctly generates a PNG file from an SVG "stream" with cat large_svg.svg | convert svg: png:- so this is not an issue with the convert tool.
This led me down a rabbit hole of looking at Node's buffer sizes for writable and readable streams, but to no avail. Maybe someone has worked with larger streams in Node and can help get this to work.

As @mscdex pointed out, I had to wait for the process to finish before attempting downstream work. All that was needed was to wait for the end event on the convert.stdout stream and concatenate the buffers received in the data events.
// start with an empty buffer
var graph = Buffer.alloc(0);
// on each 'data' event, append the incoming chunk to `graph`
convert.stdout.on('data', function(data) {
  graph = Buffer.concat([graph, data]);
});
convert.stdout.on('end', function() {
  // ... draw on pdf
});
EDIT:
Here is a more efficient version of the above, using @mscdex's suggestion to do the concatenation in the end callback and keeping a running total size so that Buffer.concat can allocate the result once when joining the chunks.
// collect the chunks in an array and track their combined size
var graph = [];
var totalsize = 0;
convert.stdout.on('data', function(data) {
  graph.push(data);
  totalsize += data.length;
});
convert.stdout.on('end', function() {
  var image = Buffer.concat(graph, totalsize);
  // ... draw on pdf
});
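The same idea can be wrapped in a promise so that downstream work only starts once the child process has finished. This is a minimal sketch, not the author's code: pipeThrough is a hypothetical helper name, and it waits for the child's 'close' event (which fires after the process has exited and its stdio streams have ended) rather than only for 'end' on stdout.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: write `input` to a child process's stdin and resolve
// with everything the child wrote to stdout once the process has closed.
function pipeThrough(command, args, input) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    const chunks = [];
    let totalSize = 0;

    // stdout arrives in chunks; keep them all and remember the total size.
    child.stdout.on('data', (chunk) => {
      chunks.push(chunk);
      totalSize += chunk.length;
    });

    child.on('error', reject);        // e.g. the command was not found
    child.stdin.on('error', reject);  // e.g. EPIPE if the child exits early

    child.on('close', (code) => {
      if (code !== 0) {
        reject(new Error('child exited with code ' + code));
      } else {
        resolve(Buffer.concat(chunks, totalSize));
      }
    });

    // end(input) writes the data and then closes stdin.
    child.stdin.end(input);
  });
}
```

With the arguments from the question, a call would look like pipeThrough("convert", ["svg:", "png:-"], svgsrc).then(function (png) { doc.image(png); }), where doc is the pdfkit document.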

Related

Does JIMP (nodeJS) create bmp files from top-to-bottom? Is there a way to reverse this?

I am trying to generate bmp images (graphs) in nodeJS for use in C to display on a low-res display.
Creating the graphs (with d3.js) and converting the svg-code to a bmp works fine (via svg2img and Jimp) and everything appears correctly in the bmp-file.
When I try to display it on the low-res screen, the C code reports that the image height is negative and fails. Now I read that bmp's can be stored top-to-bottom or bottom-to-top (here for example).
In which direction does Jimp work and how could it be reversed?
I have converted the bmp that I generated with Jimp again (using XnConvert) and tried the resulting bmp, which did successfully display on the low-res screen.
In node:
svg2img(body.select('.container').html(), function(error, buffer) {
//returns a Buffer
fs.writeFile('foo1.png', buffer, function(){
Jimp.read('./foo1.png')
.then(image => {
// Do stuff with the image.
image.background(0xffffffff).resize(1200, 500);
image.dither565();
image.write("./public/test.bmp"); // save
})
.catch(err => {
// Handle an exception.
res.send("error1");
});
});
});
In the C-script logs:
BMP_cfSize:1800054
BMP_cfoffBits:54
BMP_ciSize:40
BMP_ciWidth:1200
BMP_ciHeight:-500 <---------------------
//etc.
*****************************************
total_length:1800000,1800000
bytesPerLine = 3600
imageSize = -1800000
Is there a way to reverse the row order in Jimp? Or am I missing something else?
Or would it be easier to try to reverse the order in the C library (I'm not very good with C)?

As pointed out in the comments, making the negative value positive in the bmp-js library that Jimp uses in this line flipped the image but also solved the issue with the C library that required that order.
Using image.flip(false, true) in Jimp, I could keep the correct orientation in the final result.
Issue reported in the bmp-js GitHub.
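For reference, the sign of the height field can also be checked and corrected after the file has been written, without patching bmp-js. The sketch below is an alternative to the fix described above, not what the answer did. It assumes an uncompressed BMP with the standard 14-byte file header followed by a 40-byte BITMAPINFOHEADER (matching BMP_cfoffBits:54 and BMP_ciSize:40 in the log); toBottomUp is a hypothetical helper name.

```javascript
// Rewrite a top-down BMP (negative biHeight) as a bottom-up one.
// Assumes an uncompressed BMP with a BITMAPINFOHEADER.
function toBottomUp(bmp) {
  const dataOffset = bmp.readUInt32LE(10);   // bfOffBits: start of pixel data
  const width = bmp.readInt32LE(18);         // biWidth
  const height = bmp.readInt32LE(22);        // biHeight: negative means top-down
  const bitsPerPixel = bmp.readUInt16LE(28); // biBitCount

  if (height >= 0) {
    return bmp; // already bottom-up, nothing to do
  }

  const rows = -height;
  // Each row is padded to a multiple of 4 bytes.
  const stride = Math.floor((bitsPerPixel * width + 31) / 32) * 4;

  const out = Buffer.from(bmp); // copy, so the input is left untouched
  out.writeInt32LE(rows, 22);   // store the height as a positive number

  // Reverse the row order so the picture keeps its orientation.
  for (let y = 0; y < rows; y++) {
    const src = dataOffset + y * stride;
    const dst = dataOffset + (rows - 1 - y) * stride;
    bmp.copy(out, dst, src, src + stride);
  }
  return out;
}
```

For the file in the question (1200 pixels wide, 24 bits per pixel) the stride works out to 3600 bytes, which agrees with bytesPerLine in the log.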

Webscraping images in python with selenium and beautifulsoup from an AJAX website

I've spent a long time trying to go through the html, javascript, network traffic, etc, and learning a lot about javascript, blobs, base64 decoding/encoding of images but I still can't seem to figure out how to extract the images in these videos from this website: https://www.jamesallen.com/loose-diamonds/all-diamonds/
Here's what I know:
Each video is actually a set of up to 512 images, which are retrieved from a server via files titled setX.bin (X is a number). Then they are parsed via an int array into a blob object (There's also some base64 but I forget where), that is somehow converted into an image.
Following the source code is very difficult as it is purposely written as spaghetti code.
How can I extract each diamond's images and do so efficiently?
My one solution is:
I can get the setX.bin files very easily, and if I just 'pass' them into the javascript functions somehow then I should be good.
My second solution is:
to rotate each diamond manually and extract the images from the cache or something like that.
I'd like to use python to do this.
EDIT:
I found some JavaScript here on SO, but it gives "SecurityError: The operation is insecure". Here it is:
function exportCanvasAsPNG(id, fileName) {
var canvasElement = document.getElementById(id);
canvasElement.crossOrigin = "anonymous";
var MIME_TYPE = "image/png";
var imgURL = canvasElement.toDataURL(MIME_TYPE);
window.console.log(canvasElement);
var dlLink = document.createElement('a');
dlLink.download = fileName;
dlLink.href = imgURL;
dlLink.dataset.downloadurl = [MIME_TYPE, dlLink.download, dlLink.href].join(':');
document.body.appendChild(dlLink);
dlLink.click();
document.body.removeChild(dlLink);
}
exportCanvasAsPNG("canvas-key-_w5qzvdqpl",'asdf.png');
I ran it from Firefox console. When I ran a similar execute script in python, I got the same issue.
I want to be able to scrape all 360 degree images for each canvas.
Edit2: To make this question simpler, I know how to get the setX.bin files, but I don't know how to convert this collection of images from bin to jpg. Each bin file contains multiple jpg files.

The .bin files appear to just contain the jpegs concatenated together with some leading metadata. You can simply iterate through the bytes of the file looking for jpeg file signatures (0xFFD8) and slice out each image:
JPEG_MAGIC = b"\xff\xd8"
with open("set0.bin", "rb") as f:
    s = f.read()
i = 0
start_index = s.find(JPEG_MAGIC)
while True:
    end_index = s.find(JPEG_MAGIC, start_index + 1)
    if end_index == -1:
        end_index = len(s)
    with open(f"out{i:03}.jpg", "wb") as out:
        out.write(s[start_index:end_index])
    if end_index == len(s):
        break
    start_index = end_index
    i += 1

Put generated PNG image into JSZip

I am using JSZip to make a program that generates the image data from a canvas element and puts the image into a zip file.
Right now, it turns the canvas image into a data URL. Then I strip off the part of the resulting string that says data:image/png;base64, so that nothing is left but the base64 data. I then use atob to decode it to ASCII.
It seems like putting the remaining string into an image file should work, but the generated ASCII text is not correct. Many parts of it are correct, but something is not right.
Here is my code:
//screen is the name of the canvas.
var imgData = screen.toDataURL();
imgData = imgData.substr(22);
imgData = atob(imgData);
console.log(imgData);
Here is an image of the resulting png file (in notepad):
incorrect text http://upurs.us/image/71280.png
And here is what it should look like:
correct text http://upurs.us/image/71281.png
As you can see, there are slight differences, and I have no idea why. I know absolutely nothing about the PNG file type or ASCII, so I don't know where to go from here.
If you want to see all my work, here's the project:
http://s000.tinyupload.com/download.php?file_id=09682586927694868772&t=0968258692769486877226111
EDIT: My end goal is to have a program that exports every single frame of a canvas animation so that I can use them to make a video. If anyone knows a program that does that, please post it!

When you use zip.file("hello.png", imgData) where imgData is a string, you tell JSZip to save a (unicode) string. Since the content is not text, the result is corrupted. To fix that, you can do:
zip.file("hello.png", imgData, {binary: true})
As dandavis suggested, using a blob will be more efficient here. You can convert a canvas to a blob with canvas.toBlob:
screen.toBlob(function (blob) {
zip.file("hello.png", blob);
});
The only caveat is that toBlob is asynchronous: you should disable the download button during that time (or else, if a user is quick enough or the browser slow enough, zip.file won't be executed and you will give an empty zip to your user).
document.getElementById("download_button").disabled = true;
screen.toBlob(function (blob) {
zip.file("hello.png", blob);
document.getElementById("download_button").disabled = false;
});
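If only the data URL is available, another way to avoid the string-encoding problem is to decode it into raw bytes before handing it to JSZip. This is a minimal sketch: dataUrlToBytes is a hypothetical helper (not part of JSZip), and it assumes a base64 data URL and a global atob, which browsers provide. Splitting at the first comma also avoids hard-coding the prefix length as substr(22) does.

```javascript
// Hypothetical helper: turn "data:image/png;base64,...." into raw bytes.
function dataUrlToBytes(dataUrl) {
  // Everything after the first comma is the base64 payload.
  const base64 = dataUrl.slice(dataUrl.indexOf(',') + 1);
  // atob returns a "binary string": one character per byte.
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}
```

The resulting Uint8Array can then be passed as the file content, for example zip.file("hello.png", dataUrlToBytes(screen.toDataURL())).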

Windows OCR engine fails to recognize the text in canvas (converted to bitmap)

I have a cordova project where I have a "scribble pad" where the user can scribble their notes. This is a simple canvas object, and I'd like to get the OCR Engine to convert it into text. I'm struggling to convert the canvas data into the software bitmap that OCR Engine supports.
All the samples are based either around loading a file from the storage or reading a stream from camera. Do I have to save this canvas into a file on a device and read it back in into a stream?
I'd welcome the guidance in here as images are something I struggle with.
[Update]
So, I've managed to somehow get the stream, but unfortunately, OCR is not recognizing it.
I have the canvas object and after page is loaded, I place the text into it, so any capable OCR should be able to read it.
I also have the "img" element, for checking whether the stream is correct and contains the correct bitmap. Here is the code that handles the conversion from canvas to OCR recognition:
var blob = canvas.msToBlob();
// This is the stream I'll use for OCR detection
var randomAccessStream = blob.msDetachStream();
// This is the stream I'll use for the image element to make sure the stream above contains what I've placed into the canvas
var blob2 = MSApp.createBlobFromRandomAccessStream("image/png", randomAccessStream.cloneStream());
// Angular JS scope model
$scope.imageUrl = URL.createObjectURL(blob2);
// This works, but returns ""
var scope = this;
if (!this.ocrEngine)
return;
var bitmapDecoder = Windows.Graphics.Imaging.BitmapDecoder;
bitmapDecoder.createAsync(randomAccessStream).then(function (decoder) {
return decoder.getSoftwareBitmapAsync();
}).then(function (bitmap) {
return scope.ocrEngine.recognizeAsync(bitmap);
}).then(function (result) {
console.log(result.text);
});
After this all runs, the image is given the src and is loaded and contains exactly whatever is in the canvas so the stream is correct.
The ocrEngine is setup the following way:
var Globalization = Windows.Globalization;
var OCR = Windows.Media.Ocr;
this.ocrEngine = OCR.OcrEngine.tryCreateFromUserProfileLanguages();
if (!this.ocrEngine) {
// Try to create OcrEngine for specified language.
// If language is not supported on device, method returns null.
this.ocrEngine = OCR.OcrEngine.tryCreateFromLanguage(new Globalization.Language("en-us"));
}
if (!this.ocrEngine) {
console.error("Selected language is not available.");
}
Why is OCR not recognizing simple 'Hello World' ?

Well, it was rather embarrassing to realize why the OCR failed to read anything, even system-written text: the generated image had a transparent background. Once I included a rectangle with a white fill, it all started to work correctly.
Unfortunately, the OCR is struggling to recognize anything I scribble on the canvas, so e.g. handwritten numbers or multiline text in the canvas are not being recognized. (The original post showed four sample images here: one recognized and three not recognized.)
Then I've found Windows.UI.Input.Inking namespace and I reckon that's the only way to go.
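For the transparent-background part of the fix, one way to add the white rectangle without redrawing the existing strokes is to paint it behind them. This is a sketch under assumptions: addWhiteBackground is a hypothetical helper name, and it relies on the standard canvas 2D 'destination-over' compositing mode being available in the WebView.

```javascript
// Hypothetical helper: fill the transparent areas of a canvas with white.
function addWhiteBackground(canvas) {
  const ctx = canvas.getContext('2d');
  ctx.save();
  // 'destination-over' draws new pixels behind what is already on the canvas.
  ctx.globalCompositeOperation = 'destination-over';
  ctx.fillStyle = '#ffffff';
  ctx.fillRect(0, 0, canvas.width, canvas.height);
  ctx.restore();
}
```

It would be called just before canvas.msToBlob() in the code from the question.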

nodejs binary websocket mimetype handling

I'm not 100% sure, but from what I've read, when I send a blob (binary data) over a websocket, the blob does not carry any file information. (The official specification also states that websockets only send the raw binary.) Missing are:
the filesize
the mimetype
user info (explained later)
i'm using https://github.com/websockets/ws
Testing:
Sending directly the blob from an input file.
ws.send(this.files[0]) //this should already contain the info
Creating a new blob with the native javascript api from file setting the proper mimetype.
ws.send(new Blob([this.files[0]],{type:this.files[0].type})); //also this
on both sides you can get only the effective blob without any other information.
Is it possible to append let's say a 4kb predefined json data converted also to binary that contains important information like the mimetype and the filesize,
and then just split off the 4kb when needed?
{"mime":"txt/plain","size":345}____________4KB_REST_OF_THE_BINARY
OR
ws.send({"mime":"txt\/plain","size":345})
ws.send(this.files[0])
Even if the first one is the worst solution ever, it would allow me to send everything in one go.
The second one has a big problem:
it's a chat that also allows sending files like documents, images, music and videos.
I could write some sort of handshaking system that sends the file/user info before I send the binary data.
BUT
if another person also sends a file, then because it's async, the handshaking system has no way to determine which file belongs to which user and mimetype.
So how do you properly send a binary file in a multiuser async environment?
I know I can convert to base64, but that's 30% bigger.
btw. Totally disappointed with Apple... while Chrome shows every kind of binary data properly, my iOS devices are not able to handle blobs; only images will show in blob or base64 format, not even a simple txt file. Basically only an <img> tag can read dynamic files.
How everything works (now):
user sends a file
nodejs gets the binary data, also user info... but not mimetype,filename,size.
nodejs broadcasts the raw binary file to all the users.(can't specify user & file info)
clients create a bloburl (who send that? XD).
EDIT
what i have now:
client 1 (sends a file)CHROME
fileInput.addEventListener('change',function(e){
var file=this.files[0];
ws.send(new Blob([file],{
type:file.type //<- SET MIMETYPE
}));
//file.size
},false);
note: file is already a blob, but this is how you would normally create a new blob specifying the mimetype.
server (broadcasts the binary data to the other clients)NODEJS
aaaaaand the mimetype is gone...
ws.addListener('message',function(binary){
var b=0,c=wss.clients.length;
while(b<c){
wss.clients[b++].send(binary)
}
});
client 2 (receives the binary)CHROME
ws.addEventListener('message',function(msg){
var blob=new Blob([msg.data],{
type:'application/octet-stream' //<- LOST
});
var file=window.URL.createObjectURL(blob);
},false);
note: msg.data is already a blob, but this is how you would normally create a new blob specifying the mimetype, which is lost.
In client 2 I need the mimetype, and naturally I also need the info about the user, which can be retrieved from client 1 or the server (not a good choice)...

You're a bit out of luck with this because Node doesn't support the Blob interface, so any data you send or receive in binary with Node is just binary. You would need something that knows how to interpret a Blob object.
Here's an idea, and let me know if this works. The documentation for websockets/ws says it supports sending and receiving ArrayBuffers, which means you can use TypedArrays.
Here's where it gets nasty. You reserve a fixed number n of bytes at the beginning of every TypedArray for the mime type, encoded in UTF-8 or what have you, and the rest of the TypedArray contains your file's bytes.
I would recommend using Uint8Array because the ASCII characters that make up a mime type take one byte each in UTF-8, so the text stays readable when decoded byte by byte. As for the file bytes, you'll probably just end up writing them out somewhere and appending an extension.
Also note, this method of interpretation works both ways, whether in Node or in the browser.
This solution is really just a form of type casting and you might get some unexpected results. The fixed length of your mime type field is crucial.
Here it is illustrated. Copy, paste, set the image file to whatever you want and then run it. You'll see the mime type I set pop out.
var fs = require('fs');
//https://stackoverflow.com/questions/8609289/convert-a-binary-nodejs-buffer-to-javascript-arraybuffer
function toUint8Array(buffer) {
  var ab = new ArrayBuffer(buffer.length);
  var array = new Uint8Array(ab);
  for (var i = 0; i < buffer.length; ++i) {
    array[i] = buffer[i];
  }
  return array;
}
//data is a raw Buffer object
fs.readFile('./ducklings.png', function (err, data) {
  if (err) throw err;
  var mime = Buffer.from('image/png'); // Buffer.from replaces the deprecated new Buffer()
  var allBuffed = Buffer.concat([mime, data]);
  var array = toUint8Array(allBuffed);
  var mimeBytes = array.subarray(0, mime.length); //number of bytes in mime Buffer (9 here)
  console.log(String.fromCharCode.apply(null, mimeBytes));
});
Here's how you do it on the client side:
SOLUTION A: GET A PACKAGE
Get buffer, an implementation of Node's Buffer API for browsers. The solution to concatenate Byte buffers will work exactly as before. You can append fields like To: and what not as well. The way you format your headers in order to best serve your clients will be an evolving process I'm sure.
SOLUTION B: OLD SCHOOL
STEP 1: Convert your Blob to an ArrayBuffer
Notes: How to convert a String to an ArrayBuffer
var fr = new FileReader();
fr.addEventListener('loadend', function () {
//Asynchronous action in part 2.
var message = concatenateBuffers(headerStringAsBuffer, fr.result);
ws.send(message);
});
fr.readAsArrayBuffer(blob);
STEP 2: Concatenate ArrayBuffers
function concatenateBuffers(buffA, buffB) {
  var byteLength = buffA.byteLength + buffB.byteLength;
  var resultBuffer = new ArrayBuffer(byteLength);
  //wrap each ArrayBuffer in a typedArray/view
  var resultView = new Uint8Array(resultBuffer);
  var viewA = new Uint8Array(buffA);
  var viewB = new Uint8Array(buffB);
  //Copy 8 bit integers AKA Bytes
  resultView.set(viewA);
  resultView.set(viewB, viewA.byteLength);
  return resultView.buffer;
}
STEP 3: Receive and Reblob
I'm not going to repeat how to convert the header bytes back into a string because I've done it in the server example, but turning the file bytes into a Blob of your mime type is fairly simple. Note that the Blob constructor takes an array of parts.
new Blob([buffer.slice(offset, buffer.byteLength)], {type: mimetype});
This Gist by robnyman goes into further details on how you would use an image transmitted via XHR, put it into localstorage, and use it in an image tag on your page.
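A variation on the fixed-length header described above is a length-prefixed JSON header, which is closer to what the question proposed and leaves room for fields such as the sender or file size. This is a sketch of a different technique, not the answer's method. The frame layout and the names packMessage and unpackMessage are assumptions made for illustration, not part of the ws library. It uses only TextEncoder, TextDecoder, DataView and Uint8Array, which exist in both browsers and current Node versions.

```javascript
// Assumed frame layout (not defined by ws or the WebSocket protocol):
// [4-byte big-endian header length][UTF-8 JSON header][file bytes]
function packMessage(meta, fileBytes) {
  const header = new TextEncoder().encode(JSON.stringify(meta));
  const body = new Uint8Array(fileBytes); // accepts an ArrayBuffer or a typed array
  const frame = new Uint8Array(4 + header.length + body.length);
  new DataView(frame.buffer).setUint32(0, header.length); // big-endian by default
  frame.set(header, 4);
  frame.set(body, 4 + header.length);
  return frame.buffer;
}

// Expects an ArrayBuffer, e.g. msg.data when ws.binaryType is 'arraybuffer'.
function unpackMessage(frameBuffer) {
  const headerLength = new DataView(frameBuffer).getUint32(0);
  const headerBytes = new Uint8Array(frameBuffer, 4, headerLength);
  const meta = JSON.parse(new TextDecoder().decode(headerBytes));
  const fileBytes = frameBuffer.slice(4 + headerLength);
  return { meta: meta, fileBytes: fileBytes };
}
```

Because every frame carries its own header, the server can broadcast it unchanged and each client can tell which user and mime type a file belongs to, even when several uploads overlap. On the receiving side, new Blob([fileBytes], {type: meta.mime}) would rebuild the Blob.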

I liked @Breedly's idea of prepending a fixed-length byte array to indicate the mime type of the ArrayBuffer, so I created this npm package, which I use when dealing with websockets. Maybe others will find it useful too.
Example usage
const {
  arrayBufferWithMime,
  arrayBufferMimeDecouple
} = require('arraybuffer-mime')
// some image array buffer
const uint8 = new Uint8Array(1)
uint8[0] = 1
const ab = uint8.buffer
const mime = 'image/png'
const abWithMime = arrayBufferWithMime(ab, mime)
// renamed on destructuring so that `mime` is not declared twice
const {mime: decodedMime, arrayBuffer} = arrayBufferMimeDecouple(abWithMime)
console.log(decodedMime) // "image/png"
console.log(arrayBuffer) // ArrayBuffer
