Convert byte array to MS Word (.dotx) file with JavaScript

I am writing a frontend Angular application where users should be able to upload MS Word template files (.dotx) and later download them on demand. Due to file size limitations I have to upload files as a byte array, and files will be stored and downloaded in this format. The approach described below works fine with .txt files, but not with MS Word files: the downloaded MS Word file has encoding issues and is not readable.
Part 1: Uploading file
I use the following input to get the file from the user:
<input type="file" class="file-input" (change)="onFileSelected($event)">
Reading the file into a byte array ("uploading" the file):
onFileSelected($event) {
  const file = $event.target.files[0] as File;
  const reader = new FileReader();
  const fileByteArray = [];
  reader.readAsArrayBuffer(file);
  reader.onloadend = (evt) => {
    if (evt.target.readyState === FileReader.DONE) {
      const arrayBuffer = evt.target.result as ArrayBuffer;
      const array = new Uint8Array(arrayBuffer);
      for (const a of array) {
        fileByteArray.push(a);
      }
      this.byteArray = fileByteArray;
    }
  };
}
Downloading file:
downloadFile() {
  const bytesView = new Uint8Array(this.byteArray);
  const str = new TextDecoder('windows-1252').decode(bytesView); // windows-1252 encoding used here for Cyrillic
  const file = new Blob([str], {type: 'application/vnd.openxmlformats-officedocument.wordprocessingml.template'});
  saveAs(file, 'test.dotx');
}
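I suspect the TextDecoder round-trip is what corrupts the binary content. For comparison, a version that keeps the bytes binary from upload to download would presumably look like this (a sketch of what I have in mind, not verified; the downloadFileBinary name is just for illustration):
downloadFileBinary() {
  // Rebuild the typed array from the stored numbers and pass it to the
  // Blob directly; no text decode/encode round-trip that could corrupt bytes.
  const bytesView = new Uint8Array(this.byteArray);
  const file = new Blob([bytesView], {type: 'application/vnd.openxmlformats-officedocument.wordprocessingml.template'});
  saveAs(file, 'test.dotx');
}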
I did not find similar issues on the internet; I guess this is not the standard way of dealing with files. Is it even possible to process MS Word files as byte arrays with JS?
Another important question: is it safe to upload a file as an array of bytes and save it in a database?
I appreciate any help.

Related

JavaScript, creation of large files (more than 2 gb) to download from the client

I am implementing a WebSocket server to transfer files to clients: I send the bytes of the file in parts from the server, and on the client I join the byte fragments and create the file with its extension.
My problem is that a string can apparently only store and concatenate up to roughly 500 MB, which would limit the generation of larger files such as 1 GB or more, since the page freezes.
Is there any way to buffer that large amount of data and download it as a file?
I have searched and found StreamSaver.js, but I have not been able to adapt the examples on the internet to create that giant data buffer. Thanks!
Below is a sample of my code that converts a hex string to a file; a sketch of the streaming direction I am considering follows it.
function CreateFile(DataHex, FileName) {
  // Convert the hex string into an array of byte values
  var binary = new Array();
  for (var i = 0; i < DataHex.length / 2; i++) {
    var h = DataHex.substr(i * 2, 2);
    binary[i] = parseInt(h, 16);
  }
  var byteArray = new Uint8Array(binary);
  // Trigger a download via a temporary object URL
  var filecomp = window.document.createElement('a');
  filecomp.href = window.URL.createObjectURL(new Blob([byteArray], { type: 'application/octet-stream' }));
  filecomp.download = FileName;
  filecomp.click();
  window.URL.revokeObjectURL(filecomp.href);
}
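One direction I am considering, instead of accumulating everything in a string, is to stream each incoming chunk straight to disk. A minimal sketch using the File System Access API (this assumes a Chromium-based browser that supports window.showSaveFilePicker; saveStreamed is just an illustrative name, and ws stands in for my WebSocket connection):
async function saveStreamed(ws, fileName) {
  // Ask the user for a destination file (File System Access API).
  const handle = await window.showSaveFilePicker({ suggestedName: fileName });
  const writable = await handle.createWritable();
  ws.binaryType = 'arraybuffer'; // receive binary frames, not strings
  ws.onmessage = (evt) => {
    // Writes are queued by the stream, so chunks land in arrival order.
    writable.write(new Uint8Array(evt.data));
  };
  ws.onclose = async () => {
    await writable.close(); // flush and finalize the file on disk
  };
}
This never holds more than one chunk in JavaScript memory at a time; StreamSaver.js should allow the same pattern in browsers without showSaveFilePicker.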

Reading a local binary file in javascript and converting to base64

I have a local site which uses Javascript to browse files on my machine. This is not a NodeJS question. I have been reading binary files on my local filesystem and converting them to base64. The problem I'm having is when there are non-printable characters. The output I get from javascript is different to the base64 command line tool in Linux.
An example file, which we can use for this question, was generated with head -c 8 /dev/random > random; it's just some binary nonsense written to a file. In this example it yielded the following:
$ base64 random
Tg8j3hAv/u4=
If you want to play along at home you can run this to generate the same file:
echo -n 'Tg8j3hAv/u4=' | base64 -d > random
However, when I try and read that file in Javascript and convert it to base64 I get a different result:
Tg8j77+9EC/vv73vv70=
It looks kind of similar, but with some other characters in there.
Here's how I got it:
function readTextFile(file)
{
  let fileContents;
  var rawFile = new XMLHttpRequest();
  rawFile.open("GET", file, false); // synchronous request
  rawFile.onreadystatechange = function ()
  {
    if (rawFile.readyState === 4)
    {
      if (rawFile.status === 200 || rawFile.status == 0)
      {
        fileContents = rawFile.responseText;
      }
    }
  }
  rawFile.send(null);
  return fileContents;
}
var fileContents = readTextFile("file:///Users/henrytk/tmp/stuff/random");
console.log(btoa(unescape(encodeURIComponent(fileContents))));
// I also tried
console.log(Base64.encode(fileContents));
// from http://www.webtoolkit.info/javascript_base64.html#.YVW4WaDTW_w
// but I got the same result
How is this happening? Is it something to do with how I'm reading the file? I want to be able to read that file synchronously in a way which can be run locally - no NodeJS, no fancy third-party libraries, if possible.
I believe this is the problem:
fileContents = rawFile.responseText
This reads your file as a JavaScript string, and not all binary data maps to valid JavaScript character code points.
I recommend using fetch to get a blob, since that is the method I know best:
async function readTextFileAsBlob(file) {
  const response = await fetch( file );
  const blob = await response.blob();
  return blob;
}
Then, convert the blob to base64 using the browser's FileReader.
(Maybe that matches the Linux tool?)
const blobToBase64DataURL = blob => new Promise(
  resolvePromise => {
    const reader = new FileReader();
    reader.onload = () => resolvePromise( reader.result );
    reader.readAsDataURL( blob );
  }
);
In your example, you would use these functions like this:
readTextFileAsBlob( "file:///Users/henrytk/tmp/stuff/random" ).then(
async blob => {
const base64URL = await blobToBase64DataURL( blob );
console.log( base64URL );
}
);
This will give you a URL like data:.... You'll need to split off the data: URL prefix, but if all goes well, the last part should be the right base64 data. (Hopefully.)
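For example, something like this should isolate just the base64 payload (a quick sketch, assuming base64URL holds the reader result from above):
// base64URL looks like "data:<mime>;base64,<payload>"; keep only the payload
const base64 = base64URL.split(',')[1];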

File in selected file array md5 encryption with CryptoJS always gives the same md5 value

In my Vuejs front-end, there is a file upload button. When the user selects a file, Vuejs triggers the @change event. I have used the FileReader, and I have imported the CryptoJS libraries, which I downloaded as node modules (via npm):
import cryptoJs from '../../node_modules/crypto-js'
import md5 from '../../node_modules/crypto-js/md5'
My HTML code for the file upload button is as follows:
<input type="file" ref="uploadedfile" name="file1" id="file1" @change="handleFileUpload">
File reader code inside the @change handler:
handleFileUpload(e){
  const filesToRead = e.target.files;
  // getting the first file from the files array
  let file1 = filesToRead[0];
  const fileReader = new FileReader();
  fileReader.addEventListener('loadend', (evt) => {
    if (evt.target.readyState == FileReader.DONE) {
      file1 = fileReader.result;
      const encryptedvalue = md5(cryptoJs.enc.Latin1.parse(file1)).toString();
      console.log("MD5 value is :");
      console.log(encryptedvalue);
    }
  });
}
But I always get the same md5 value, although I selected different files.
In the file object array, I can see all the file-related data when I inspect it through the Chrome developer tools console (if I console.log as follows):
console.log(file1);
The posted code lacks the call that loads the data. This is probably just a copy/paste error. Since the data is parsed with the Latin1 (aka ISO 8859-1) encoder, FileReader.readAsBinaryString() is an appropriate method, e.g.:
handleFileUpload(e) {
  const filesToRead = e.target.files;
  let file1 = filesToRead[0];
  const fileReader = new FileReader();
  fileReader.addEventListener('loadend', (evt) => {
    if (evt.target.readyState == FileReader.DONE) {
      file1 = fileReader.result;
      const encryptedvalue = md5(cryptoJs.enc.Latin1.parse(file1)).toString();
      console.log("MD5 value is :");
      console.log(encryptedvalue);
    }
  });
  fileReader.readAsBinaryString(file1); // missing in the posted code
}
However, I cannot reproduce the problem with this code, neither locally nor online (https://codesandbox.io/s/brave-fast-cx9gz). If, in the online case, the error message "C is undefined" is displayed, it can generally be eliminated by commenting out and back in the two CryptoJS import lines in components/Repro (no idea why this happens).
However, I can reproduce the issue when the data is loaded with FileReader.readAsArrayBuffer(). If the ArrayBuffer is then parsed with the Latin1 encoder (as in the posted code), which is incompatible with this input type, the same hash always results for different files. The result is correct again if the WordArray is created directly from the ArrayBuffer with:
const encryptedvalue = md5(cryptoJs.lib.WordArray.create(file1)).toString();
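Putting that together, a sketch of the readAsArrayBuffer variant (same imports as in the question; building a WordArray from an ArrayBuffer assumes the lib-typedarrays module, which the full crypto-js import includes):
handleFileUpload(e) {
  const file1 = e.target.files[0];
  const fileReader = new FileReader();
  fileReader.addEventListener('loadend', () => {
    // Build the WordArray straight from the ArrayBuffer;
    // no lossy Latin1 text-parsing step in between.
    const wordArray = cryptoJs.lib.WordArray.create(fileReader.result);
    console.log('MD5 value is:', md5(wordArray).toString());
  });
  fileReader.readAsArrayBuffer(file1);
}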

JavaScript FileReader Slice Performance

I am trying to access the first few lines of text files using the FileApi in JavaScript.
In order to do so, I slice an arbitrary number of bytes from the beginning of the file and hand the blob over to the FileReader.
For large files this takes very long, even though my current understanding is that only the first few bytes of the file need to be accessed.
Is there some implementation in the background that requires the whole file to be accessed before it can be sliced?
Does it depend on the browser implementation of the FileApi?
I currently have tested in both Chrome and Edge (chromium).
Analysis in Chrome using the performance dev tools shows a lot of idle time before reader.onloadend and no increase in RAM usage. This might be, however, because the File API is implemented in the browser itself and does not show up in the JavaScript performance statistics.
My implementation of the FileReader looks something like this:
const reader = new FileReader();
reader.onloadend = (evt) => {
  if (evt.target.readyState == FileReader.DONE) {
    console.log(evt.target.result.toString());
  }
};
// Slice first 10240 bytes of the file
var blob = files.item(0).slice(0, 1024 * 10);
// Start reading the sliced blob
reader.readAsBinaryString(blob);
This works fine, but as described it performs quite poorly for large files. I tried it with 10 KB, 100 MB and 6 GB files. The time until the first 10 KB are logged seems to correlate directly with the file size.
Any suggestions on how to improve performance for reading the beginning of a file?
Edit:
Using Response and DOM streams as suggested by @BenjaminGruenbaum sadly does not improve the read performance.
var dest = new WritableStream({
  write(str) {
    console.log(str);
  },
});
var blob = files.item(0).slice(0, 1024 * 10);
(blob.stream ? blob.stream() : new Response(blob).body)
  // Decode the binary-encoded response to string
  .pipeThrough(new TextDecoderStream())
  .pipeTo(dest)
  .then(() => {
    console.log('done');
  });
how about this!!
function readFirstBytes(file, n) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => {
      resolve(reader.result);
    };
    reader.onerror = reject;
    reader.readAsArrayBuffer(file.slice(0, n));
  });
}
// file must be a File/Blob object, e.g. files.item(0) from an <input type="file">
readFirstBytes(file, 10).then(buffer => {
  console.log(buffer);
});

File Uploading ReadAsDataUrl

I have a question about the File API and uploading files in JavaScript and how I should do this.
I have already built a file uploader that was quite simple: it took the files from an input and made a request to the server, and the server then handled the files and put a copy in an uploads directory.
However, I am trying to give people the option to preview a file before uploading it. So I took advantage of the File API, specifically new FileReader() and readAsDataURL().
The file object has a list of properties such as .size and .lastModifiedDate, and I added the readAsDataURL() output to my file object as a property for easy access in my Angular ng-repeat().
My question is: it occurred to me as I was doing this that I could store the data URL in a database rather than upload the actual file. I was unsure if modifying the File data directly, with its data URL as a property, would affect its transfer.
What is the best practice? Is it better to upload a file, or can you just store the data URL and then output that, since that is essentially the file itself? Should I not modify the file object directly?
Thank you.
Edit: I should also note that this is a project for a customer who wants it to be hard for users to simply take uploaded content from the application, save it, and redistribute it. Would saving the files as urls in a database mitigate right-click-save-as behavior, or not really?
There is more than one way to preview a file. The first is a data URL with FileReader, as you mention, but there is also URL.createObjectURL, which is faster.
Decoding and encoding to and from base64 takes longer; it needs more calculations and more CPU/memory than if the data stayed in binary format.
This I can demonstrate below.
var url = 'https://upload.wikimedia.org/wikipedia/commons/c/cc/ESC_large_ISS022_ISS022-E-11387-edit_01.JPG'
fetch(url).then(res => res.blob()).then(blob => {
  // Simulates a file as if you were to upload it through a file input and listen for onchange
  var files = [blob]
  var img = new Image
  var t = performance.now()
  var fr = new FileReader
  img.onload = () => {
    // show it...
    // $('body').append(img)
    var ms = performance.now() - t
    document.body.innerHTML = `it took ${ms.toFixed(0)}ms to load the image with FileReader<br>`
    // Now create an object URL instead of using base64, which takes time to
    // 1. encode the blob to base64
    // 2. decode it back again from base64 to binary
    var t2 = performance.now()
    var img2 = new Image
    img2.onload = () => {
      // show it...
      // $('body').append(img)
      var ms2 = performance.now() - t2
      document.body.innerHTML += `it took ${ms2.toFixed(0)}ms to load the image with URL.createObjectURL<br><br>`
      document.body.innerHTML += `URL.createObjectURL was ${(ms - ms2).toFixed(0)}ms faster`
    }
    img2.src = URL.createObjectURL(files[0])
  }
  fr.onload = () => (img.src = fr.result)
  fr.readAsDataURL(files[0])
})
The base64 will be ~33% larger than the binary. For mobile devices, I think you would want to save bandwidth and battery.
But then there is also the latency of doing an extra request, though that's where HTTP/2 comes to the rescue.
