I created a website where you can import an XML file and then read it out. It works perfectly fine for most files but I tried using an XML file with 730MB and it doesn't work anymore. I don't seem to be getting any errors on the console, but if I for example use this code,
numberOfReports = xmlDoc.getElementsByTagName("DailyReport").length;
I always get 0 even though it should be far more than that, since the XML file definitely contains multiple <DailyReport> elements. My function to import and parse the files looks like this:
// Function to import and serialize the XML file
function import_XML() {
    var input = document.createElement('input');
    input.type = 'file';
    input.onchange = e => {
        // getting a hold of the file reference
        file = e.target.files[0];
        // setting up the reader
        var reader = new FileReader();
        reader.readAsText(file, 'UTF-8');
        // Tell the reader what to do when it's done reading
        reader.onload = readerEvent => {
            content = readerEvent.target.result;
            const parser = new DOMParser();
            xmlDoc = parser.parseFromString(content, "application/xml");
            console.log(xmlDoc.documentElement.nodeName == "parsererror" ? "Error while parsing XML File" : xmlDoc.documentElement.nodeName);
            console.log("content: " + content);
            // Number of reports in the XML file
            numberOfReports = xmlDoc.getElementsByTagName("DailyReport").length;
            console.log("number of daily reports: " + numberOfReports);
            updateTable();
        }
    }
    input.click();
}
The content I get from content = readerEvent.target.result; logged to the console is also just empty.
I'm not sure if it's because the file is too large, but the XML file should not have any malformations. Can anyone help me with this problem? Would really appreciate any help!
I suspect you're exceeding the maximum string length of your browser's JavaScript engine. Different engines have different limits. MDN says Firefox's limit is about 1GB (although I just tried an experiment and it was more like 800MB). A quick experiment in Brave (Chrome-like) suggests a maximum of about 512MB:
let size = 0;
const chunk = "".padStart(4096, " ");
const max = 800 * 1024 * 1024;
try {
    let str = "";
    while (str.length < max) {
        size = str.length;
        str += chunk;
    }
    console.log(`worked! size = ${size / 1024 / 1024}`);
} catch {
    console.log(`ERROR, size = ${size / 1024 / 1024}`);
}
The same experiment in Node.js (which uses the same JavaScript engine as Chromium-based browsers, V8) yields the same result, suggesting it's the limit in V8.
Unfortunately, DOMParser only accepts strings, not (say) blobs. I think you're probably not going to be able to handle files that large on V8-based browsers.
I suspect DOMParser will get a method that allows it to read streams someday, but that doesn't help you now. The only solution I can think of is to find an XML parser written in JavaScript that either supports streams or that you could adapt to use a stream. There are several XML parsers available as npm packages; there might be one that can use a Blob or a ReadableStream, or one that supports Node.js streams that you could adapt to work with ReadableStream (and the browser's version of XML documents rather than whatever they're using on Node.js).
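For illustration, here is a rough sketch of that approach. It assumes a browser bundle of the sax npm package (the sax.parser() call, its onopentag callback, and the strict flag come from that package, not from anything built into the browser), and it counts <DailyReport> elements while streaming the file chunk by chunk:
async function countDailyReports(file) {
    // SAX-style parser: fires callbacks per element instead of building a DOM.
    const parser = sax.parser(true); // strict mode
    let numberOfReports = 0;
    parser.onopentag = node => {
        if (node.name === "DailyReport") numberOfReports++;
    };

    // Stream the File in chunks so no single huge string is ever created.
    const reader = file.stream().getReader();
    const decoder = new TextDecoder("utf-8");
    for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        parser.write(decoder.decode(value, { stream: true }));
    }
    parser.close();
    return numberOfReports;
}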
I am trying to write a JXA script in Apple Script Editor that converts PNG files to base64 strings, which can then be added to a JSON object.
I cannot seem to find a JXA method that works for doing the base64 encoding/decoding part.
I came across a droplet which was written using Shell Script that outsources the task to openssl and then outputs a .b64 file:
for f in "$@"
do
    openssl base64 -in "$f" -out "$f.b64"
done
So I was thinking of Frankenstein'ing this up to a method that uses evalAS to run inline AppleScript, per the example:
(() => {
    'use strict';

    // evalAS2 :: String -> IO a
    const evalAS2 = s => {
        const a = Application.currentApplication();
        return (a.includeStandardAdditions = true, a)
            .runScript(s);
    };

    return evalAS2(
        'use scripting additions\n' +
        'for f in ' + '\x22' + file + '\x22\n' +
        'do\n' +
        '    openssl base64 -in "$f" -out "$f.b64"\n' +
        'done'
    );
})();
And then re-opening the .b64 file in the script, but this all seems rather long-winded and clunky.
I know that it is possible to use Cocoa in JXA scripts, and I see that there are methods for base64 encoding/decoding in Cocoa...
As well as Objective-C:
NSData *imageData = UIImagePNGRepresentation(myImageView.image);
NSString * base64String = [imageData base64EncodedStringWithOptions:0];
The JXA Cookbook has a whole section going over Syntax for Calling ObjC functions, which I am trying to read over.
From what I understand, it should look something like:
var image_to_convert = $.NSData.alloc.UIImagePNGRepresentation(image)
var image_as_base64 = $.NSString.alloc.base64EncodedStringWithOptions(image_to_convert)
But I just am a total noob to this, so it is still difficult for me to understand it all.
In the speculative code above, I am not sure where I would get the image data from?
I am currently trying:
ObjC.import("Cocoa");
var image = $.NSImage.alloc.initWithContentsOfFile(file)
console.log(image);
var image_to_convert = $.NSData.alloc.UIImagePNGRepresentation(image)
var image_as_base64 = $.NSString.alloc.base64EncodedStringWithOptions(image_to_convert)
But it is resulting in the following errors:
$.NSData.alloc.UIImagePNGRepresentation is not a function. (In
'$.NSData.alloc.UIImagePNGRepresentation(image)',
'$.NSData.alloc.UIImagePNGRepresentation' is undefined)
I am guessing it is because UIImagePNGRepresentation is part of the UIKit framework, which is an iOS thing and not OS X?
I came across this post, which suggests this:
NSArray *keys = [NSArray arrayWithObject:@"NSImageCompressionFactor"];
NSArray *objects = [NSArray arrayWithObject:@"1.0"];
NSDictionary *dictionary = [NSDictionary dictionaryWithObjects:objects forKeys:keys];
NSImage *image = [[NSImage alloc] initWithContentsOfFile:[imageField stringValue]];
NSBitmapImageRep *imageRep = [[NSBitmapImageRep alloc] initWithData:[image TIFFRepresentation]];
NSData *tiff_data = [imageRep representationUsingType:NSPNGFileType properties:dictionary];
NSString *base64 = [tiff_data encodeBase64WithNewlines:NO];
But again, I have no idea how this translates to JXA. I'm just determined to get something working.
I was hoping that there was some way of just doing it in plain old JavaScript that will work in a JXA script?
I look forward to any answers and/or pointers that you might be able to provide. Thank you all in advance!
I'm sorry, I have never worked with JXA, but I have worked a lot in Objective-C.
I think you are getting the errors because you are always trying to allocate new objects.
I think it should simply be:
ObjC.import("Cocoa");
var imageData = $.NSData.alloc.initWithContentsOfFile(file);
console.log(imageData);
var image_as_base64 = imageData.base64EncodedStringWithOptions(0); // Call method of allocated object
0 is the encoding-options constant that gives you just the plain base64 string.
edit:
var theString = ObjC.unwrap(image_as_base64);
This makes the value visible to JXA as a plain JavaScript string.
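Putting both pieces together, a minimal sketch (assuming file already holds the POSIX path to the PNG):
ObjC.import("Cocoa");

var imageData = $.NSData.alloc.initWithContentsOfFile(file);        // raw PNG bytes
var image_as_base64 = imageData.base64EncodedStringWithOptions(0);  // NSData -> base64 NSString
var theString = ObjC.unwrap(image_as_base64);                       // bridge back to a JS string
console.log(theString.length);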
Use the code below. Read the file into var file from the jQuery file input element, then use FileReader's readAsDataURL. You will then have your PNG as a base64-encoded string.
You may need to split the base64 string at ',' to get the actual data part, which you can include in a JSON object and send to the backend via an API.
var file = $('#fileUpload').prop('files')[0];
var base64data;
var reader = new FileReader();
reader.readAsDataURL(file);
reader.onload = function() {
    base64data = reader.result;
    var dataUrl = base64data.split(",");
};
Usually the base64 string you get will be in this form:
data:image/png;base64,STREAM_OF_SOME_CHARACTERS...
The STREAM_OF_SOME_CHARACTERS part (the second element of dataUrl) is where the actual image data is.
Furthermore, you can display the image in an HTML page (where base64data is the full data URL) with:
<img src=base64data>
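As a rough sketch of wiring those pieces together (the element ids and the payload shape are assumptions, not from the question):
var file = $('#fileUpload').prop('files')[0];
var reader = new FileReader();
reader.onload = function () {
    var base64data = reader.result;                        // full data URL
    document.getElementById('preview').src = base64data;   // usable directly as <img> src
    var payload = { image: base64data.split(',')[1] };     // only the data part, for JSON
    // send `payload` to the backend API here
};
reader.readAsDataURL(file);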
I'm using the following approach in order to preview images before uploading them:
$("#file").change(function() {
var reader = new FileReader();
reader.readAsArrayBuffer(this.files[0]);
var fileName = this.files[0].name;
var fileType = this.files[0].type;
alert(fileType)
reader.onloadend = function() {
var base64Image = btoa(String.fromCharCode.apply(null, new Uint8Array(this.result)));
// I show the image now and convert the data to base 64
}
}
I have noticed that when the image is large, the method fails and I cannot preview the image.
I am unsure if the problem is due to base64 conversion or the FileReader.
Is there any setting to increase the max size, or is there any work around?
Here is the error message thrown in the console :
Uncaught RangeError: Maximum call stack size exceeded
at FileReader.reader.onloadend
Your problem is that you use Function.prototype.apply, which converts your typed array items into individual arguments to the String.fromCharCode method.
Functions have a maximum argument-count limit.
To avoid this when dealing with large files, the best approach is to not process the file at all.
If you need to send the file to your server, simply send the Blob directly, this can be easily achieved with the FormData API.
If you need to display the file, e.g. in an HTML media element, then use the URL.createObjectURL(yourFile) method.
And if you really need a dataURI version of the file, then use reader.readAsDataURL(yourFile) method.
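As minimal sketches of those three options (the endpoint and element names are placeholders, not from the question):
// 1) Send the File/Blob as-is with FormData; no client-side conversion needed.
var form = new FormData();
form.append('upload', file);                  // `file` is the File from the input
fetch('/upload', { method: 'POST', body: form });

// 2) Preview it without reading it into memory at all.
var img = document.getElementById('preview');
img.src = URL.createObjectURL(file);          // revoke with URL.revokeObjectURL(img.src) later

// 3) Only if a data URI is really needed:
var reader = new FileReader();
reader.onload = function () { img.src = reader.result; };
reader.readAsDataURL(file);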
Works for me:
var reader = new FileReader();
reader.onload = function (evt) {
    var binary = '';
    var bytes = new Uint8Array(reader.result);
    var len = bytes.byteLength;
    for (var i = 0; i < len; i++) {
        binary += String.fromCharCode(bytes[i]);
    }
    console.log(btoa(binary))
};
reader.readAsArrayBuffer(file)
If you read the file using the FileReader, the whole file is loaded into memory. If you try to handle large files this way, your web browser will simply crash. If you are really interested in passing the file as a base64 string, I recommend adding file-size constraints to prevent problems. In short, none of the FileReader methods are suitable for this purpose unless you are dealing with small files, not larger than 100 MB or so; otherwise you will run into problems.
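A minimal sketch of such a constraint, assuming the same jQuery file input as in the question (the 100 MB threshold is only illustrative):
var MAX_SIZE = 100 * 1024 * 1024; // ~100 MB, an illustrative limit
$("#file").change(function () {
    var file = this.files[0];
    if (file.size > MAX_SIZE) {
        alert("This file is too large to convert to base64 in the browser.");
        return;
    }
    var reader = new FileReader();
    reader.onloadend = function () {
        // small enough: safe to convert reader.result here
    };
    reader.readAsArrayBuffer(file);
});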
After playing around here's the solution:
$("#file").change(function () {
var reader = new FileReader();
reader.readAsBinaryString(this.files[0]);
var fileName = this.files[0].name;
var fileType = this.files[0].type;
alert(fileType)
reader.onloadend = function () {
var base64Image = btoa(this.result);
}
}
I'm trying to decompress zlib'ed XML such as the following:
https://drive.google.com/file/d/0B52P0MZLTdw8ZzQwQzVpZGZVZWc
Uploading to online decompress services works, such as: http://i-tools.org/gzip
In PHP, I'm using this code and works just fine, I get the XML string:
$raw = file_get_contents("file_here");
$uncompressed = zlib_decode($raw);
However, I want to do this in JavaScript.
The app is a client-side Chrome extension which uses chrome.devtools.network to read from the network logs.
It reads binary responses; an example is at the Google Drive link at the top.
The JS needs to decompress that response back to its original XML, which is then parsed into an object.
The only problem I have is the zlib decompression part.
As of the latest update, the decompression libraries work but unpacking doesn't. Please skip to the Sept 16 update at the bottom.
I have already tried several JavaScript libraries and still cannot make it work:
Pako: https://github.com/nodeca/pako
unpack() code: https://codereview.stackexchange.com/questions/3569/pack-and-unpack-bytes-to-strings
function unpack(str) {
    var bytes = [];
    for(var i = 0, n = str.length; i < n; i++) {
        var char = str.charCodeAt(i);
        bytes.push(char >>> 8, char & 0xFF);
    }
    return bytes;
}

$.get("file_here", function(response){
    var charData = unpack(response);
    var binData = new Uint8Array(charData);
    var data = pako.inflate(binData);
    var strData = String.fromCharCode.apply(null, new Uint16Array(data));
    console.log(strData);
});
Error: Uncaught incorrect header check
It's the same even when passing the response in other ways:
new Uint8Array(response);
pako.inflate(response);
Imaya's zlib: https://github.com/imaya/zlib.js
$.get("file_here", function(response){
var inflate = new Zlib.Inflate(response);
var output = inflate.decompress();
console.log(output);
});
Error: Uncaught Error: unsupported compression method inflate.js:60
Still using Imaya's zlib, combining with this Stack Overflow question:
Decompress gzip and zlib string in javascript
$.get("file_here", function(response){
var response = response.split('').map(function(e) {
return e.charCodeAt(0);
});
var inflate = new Zlib.Inflate(response);
var output = inflate.decompress();
console.log(output);
});
Error: Uncaught Error: invalid fcheck flag:29 inflate.js:65
dankogai's js-deflate: https://github.com/dankogai/js-deflate
console.log(RawDeflate.inflate(response));
Output: empty
augustl's js-inflate: https://github.com/augustl/js-inflate
console.log(JSInflate.inflate(response));
Output: empty
zlib-browserify: https://github.com/brianloveswords/zlib-browserify
Error: ReferenceError: exports is not defined
This is just a wrapper for Imaya's zlib. I think this uses RequireJS? I'm not even sure how to use it. Can it be used without installing anything, with just jQuery/JS? The app, as mentioned, is a downloadable Chrome extension with just HTML importing JS files.
UPDATE Sept 16, 2014
It seems the problem is with the JavaScript unpack() function. When I use the ByteArray generated by PHP: http://pastebin.com/uDWvK94B, the JavaScript decompression functions work.
PHP unpacking that works:
$unpacked = unpack("C*", $raw);
For the JavaScript unpack() code that I use, which doesn't work, see the top of the post under the Pako section.
So the new question is: why does JavaScript generate different ByteArray values than PHP does?
Is it really a problem with the unpack() function?
Or is it that when the JS fetches the file, the encoding changes and the bytes get messed up?
And lastly, what is your suggested fix?
UPDATE Sept 20, 2014
With more research, some of the answers here gave leads:
Sebastian S raised the idea that the problem was in the manner of retrieving the data and that it had something to do with text encodings.
user3995789 provided an example showing that it works even without the unpack() function, though outside the context of Chrome extensions.
Isaac provided examples in the context of Chrome extensions, but it still did not work.
With that, I researched further, combining all the leads, which led me to the theory that the reason behind all this is that Chrome is unable to get "raw" data through its request.getContent function. See here for the Chrome documentation for said function.
As of now, I have taken the issue to Chrome, see here.
UPDATE March 24, 2015
Although the problem was not fully resolved, the most useful answer to me was from @Sebastian S, who proposed that the way I was retrieving the data was at fault and that a bad conversion was the cause, which was closest to the actual problem.
jQuery reads the response as UTF-8 text; you have to read the raw file instead. This function will work:
function readTextFile(file)
{
    var rawFile = new XMLHttpRequest();
    rawFile.open('GET', file, true);
    rawFile.responseType = 'arraybuffer';
    rawFile.onload = function (response)
    {
        var words = new Uint8Array(rawFile.response);
        console.log(words[1]);
        console.log(pako.ungzip(words));
    };
    rawFile.send();
}
For more information see this answer.
I understand that you want to use zlib decompression inside a Chrome extension while reading response bodies from the network log.
You first need to retrieve the base64 content that will be decompressed. You can do this using the getContent method.
function zlibDecompress(base64Content){
    // var base64Content = base64Content.split(',')[1]; // Not sure if need to keep it
    // Decode base64 (convert ascii to binary)
    var strData = atob(base64Content);
    // Convert binary string to character-number array
    var charData = strData.split('').map(function(x){return x.charCodeAt(0);});
    // Turn number array into byte-array
    var binData = new Uint8Array(charData);
    // Pako inflate
    var data = pako.inflate(binData, { to: 'string' });
    return data;
}

chrome.devtools.network.onRequestFinished.addListener(
    function(request) {
        request.getContent(
            function(content, encoding){
                if(encoding == 'base64'){
                    var output = zlibDecompress(content);
                }
            }
        );
    }
);
https://developer.chrome.com/extensions/devtools_network#type-Request
Using XMLHttpRequest:
<script type="text/javascript" src="pako.js"></script>
<script type="text/javascript">
function zlibDecompress(url){
var xhr = new XMLHttpRequest();
xhr.open('GET', url, true);
xhr.responseType = 'blob';
xhr.onload = function(oEvent) {
// Base64 encode
var reader = new window.FileReader();
reader.readAsDataURL(xhr.response);
reader.onloadend = function() {
base64data = reader.result;
var base64 = base64data.split(',')[1];
// Decode base64 (convert ascii to binary)
var strData = atob(base64);
// Convert binary string to character-number array
var charData = strData.split('').map(function(x){return x.charCodeAt(0);});
// Turn number array into byte-array
var binData = new Uint8Array(charData);
// Pako inflate
var data = pako.inflate(binData, { to: 'string' });
console.log(data);
}
};
xhr.send();
}
zlibDecompress('fileurl');
</script>
If you want to use XMLHttpRequest within a Chrome extension, declare the host permissions in the manifest:
{
    "name": "My extension",
    ...
    "permissions": [
        "http://www.domain.com/", // The domain that holds the file
        "http://*/" // Or every domain
    ],
    ...
}
https://developer.chrome.com/extensions/xhr
Feel free to ask if you have any questions ;)
In my opinion, the question you should really be asking is: how do you retrieve the compressed data? As soon as it becomes a UTF-16 string, the trouble begins. I'm not even sure the conversion from raw byte data to JavaScript strings is lossless.
As you wrote something about PHP, I assume you're communicating with some sort of backend. If so, there are options for handling binary data with native means. Maybe this can help you: https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/Sending_and_Receiving_Binary_Data
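For instance, a minimal sketch of what that MDN page describes, applied to this case (the URL is a placeholder and pako is assumed to already be loaded):
var xhr = new XMLHttpRequest();
xhr.open('GET', 'file_here', true);
xhr.responseType = 'arraybuffer';     // keep the response as raw bytes, no string conversion
xhr.onload = function () {
    var bytes = new Uint8Array(xhr.response);
    var xml = pako.inflate(bytes, { to: 'string' });
    console.log(xml);
};
xhr.send();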
I'm trying to read and write raw data from files using Mozilla's add-on SDK. Currently I'm reading data with something like:
function readnsIFile(fileName, callback){
    var nsiFile = new FileUtils.File(fileName);
    NetUtil.asyncFetch(nsiFile, function (inputStream, status) {
        var data = NetUtil.readInputStreamToString(inputStream, inputStream.available(), {charset: "UTF-8"});
        callback(data, status, nsiFile);
    });
}
This works for text files, but when I start messing with raw bytes outside of Unicode's normal range, it doesn't work. For example, if a file contains the byte 0xff, then that byte and anything past that byte isn't read at all. Is there any way to read (and write) raw data using the SDK?
You've specified an explicit charset in the options to NetUtil.readInputStreamToString.
When you omit the charset option, then the data will be read as raw bytes. (Source)
function readnsIFile(fileName, callback){
    var nsiFile = new FileUtils.File(fileName);
    NetUtil.asyncFetch(nsiFile, function (inputStream, status) {
        // Do not specify a charset at all!
        var data = NetUtil.readInputStreamToString(inputStream, inputStream.available());
        callback(data, status, nsiFile);
    });
}
The suggestion to use io/byte-streams is OK as well, but keep in mind that that SDK module is still marked experimental, and that using ByteReader via io/file as the example suggests is not a good idea because this would be sync I/O on the main thread.
I don't really see the upside, as you'd use NetUtil anyway.
Anyway, this should work:
const {ByteReader} = require("sdk/io/byte-streams");

function readnsIFile(fileName, callback){
    var nsiFile = new FileUtils.File(fileName);
    NetUtil.asyncFetch(nsiFile, function (inputStream, status) {
        var reader = new ByteReader(inputStream);
        var data = reader.read(inputStream);
        reader.close();
        callback(data, status, nsiFile);
    });
}
Also, please keep in mind that reading large files like this is problematic. Not only will the whole file be buffered in memory, obviously, but:
The file is read as a char (byte) array first, so there will be a temporary buffer in the stream of at least file.size length (via asyncFetch).
Both NetUtil.readInputStreamToString and ByteReader will use another char (byte) array to read the result into from the inputStream, but ByteReader will do that in 32K chunks, while NetUtil.readInputStreamToString will use a big buffer of file.length.
The data is then read into the resulting jschar/wchar_t (word) array, i.e. the JavaScript string, so you need at least file.size * 2 bytes in memory.
E.g., reading a 1MB file would require more than fileSize * 4 = 4MB memory (NetUtil.readInputStreamToString) and/or more than fileSize * 3 = 3MB memory (ByteReader) during the read operation. After the operation, 2MB of that memory will be still alive to store the resulting data in a Javascript string.
Reading a 1MB file might be OK, but a 10MB file might be already problematic on mobile (Firefox for Android, Firefox OS) and a 100MB would be problematic even on desktop.
You can also read the data directly into an ArrayBuffer (or Uint8Array), which has more efficient storage for byte arrays than a Javascript string and avoid the temporary buffers of NetUtil.readInputStreamToString and/or ByteReader.
function readnsIFile(fileName, callback){
    var nsiFile = new FileUtils.File(fileName);
    NetUtil.asyncFetch(nsiFile, function (inputStream, status) {
        var bs = Cc["@mozilla.org/binaryinputstream;1"].
            createInstance(Ci.nsIBinaryInputStream);
        bs.setInputStream(inputStream);
        var len = inputStream.available();
        var data = new Uint8Array(len);
        bs.readArrayBuffer(len, data.buffer);
        bs.close();
        callback(data, status, nsiFile);
    });
}
PS: The MDN documentation might state something about "iso-8859-1" being the default if the charset option is omitted in the NetUtil.readInputStreamToString call, but the documentation is wrong. I'll fix it.
I want to read a file (on the client side) and get the content in an array. It will be just one file. I have the following and it doesn't work. 'query_list' is a textarea where I want to display the content of the file.
<input type="file" id="file" name="file" enctype="multipart/form-data"/>
<script>
document.getElementById('file').addEventListener('change', readFile, false);
function readFile (evt) {
    var files = evt.target.files;
    var file = files[0];
    var fh = fopen(file, 0);
    var str = "";
    document.getElementById('query_list').textContent = str;
    if(fh!=-1) {
        length = flength(fh);
        str = fread(fh, length);
        fclose(fh);
    }
    document.getElementById('query_list').textContent = str;
}
</script>
How should I go about it? Eventually I want to loop over the array and run some SQL queries.
If you want to read files on the client using HTML5's FileReader, you must use Firefox, Chrome or IE 10+. If that is an option for you, the following example reads a text file on the client.
Your example attempts to use fopen, which I have never heard of (on the client).
http://jsfiddle.net/k3j48zmt/
document.getElementById('file').addEventListener('change', readFile, false);

function readFile (evt) {
    var files = evt.target.files;
    var file = files[0];
    var reader = new FileReader();
    reader.onload = function(event) {
        console.log(event.target.result);
    }
    reader.readAsText(file)
}
For IE<10 support you need to look into using an ActiveX object like ADO.Stream or Scripting.FileSystemObject (http://msdn.microsoft.com/en-us/library/2z9ffy99(v=vs.85).aspx), but you'll run into a security problem. If you run IE allowing all ActiveX objects (for your website), it should work.
There is such a thing as the HTML5 File API for accessing local files picked by the user, without uploading them anywhere.
It is quite a new feature, but it is supported by most modern browsers.
I strongly recommend checking out this great article to see how you can use it.
There is one problem with this: you can't read big files (~400 MB and larger), because the straightforward File API functions attempt to load the entire file into memory.
If you need to read big files, search in them, or navigate by line index, check out my LineNavigator, which allows you to read, navigate and search in files of any size. Try it in jsFiddle! It is super easy to use:
var navigator = new FileNavigator(file);
navigator.readSomeLines(0, function linesReadHandler(err, index, lines, eof, progress) {
    // Some error
    if (err) return;

    // Process this line bucket
    for (var i = 0; i < lines.length; i++) {
        var line = lines[i];
        // Do something with it
    }

    // End of file
    if (eof) return;

    // Continue reading
    navigator.readSomeLines(index + lines.length, linesReadHandler);
});
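If you only need raw chunked reading without a library, a minimal sketch using Blob.slice and FileReader looks roughly like this (the chunk size and callback signature are assumptions; splitting text at chunk boundaries can cut multi-byte characters, so treat this only as a starting point):
function readInChunks(file, onChunk, onDone) {
    var CHUNK_SIZE = 1024 * 1024; // 1 MB per read, an illustrative choice
    var offset = 0;
    var reader = new FileReader();
    reader.onload = function () {
        onChunk(reader.result, offset);       // hand this chunk to the caller
        offset += CHUNK_SIZE;
        if (offset < file.size) {
            reader.readAsText(file.slice(offset, offset + CHUNK_SIZE));
        } else if (onDone) {
            onDone();
        }
    };
    reader.readAsText(file.slice(offset, offset + CHUNK_SIZE));
}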
Well, I got beaten to the answer, but mine is different:
<input type="file" id="fi" />
<button onclick="handleFile(); return false;">click</button>
function handleFile() {
    var preview = document.getElementById('prv')
    var file = document.getElementById('fi').files[0];
    var div = document.body.appendChild(document.createElement("div"));
    div.innerHTML = file.getAsText("utf-8");
}
This will work in FF 3.5 - 3.6, and that's it. In FF 4 and WebKit you need to use the FileReader, as mentioned by Juan Mendes.
For IE you may find a Flash solution.
I work there, but I still wanted to contribute because it works well: you can use the filepicker.io read API to do exactly this. You can pass in a DOM element and get the contents back, for text or binary data, even in IE8+.