Fetching non-utf8 data with XMLHttpRequest - javascript

I want to fetch a document from the web with xmlHttpRequest. However the text in question isn't utf8 (in this case it's windows-1251 but in the generic case, I wouldn't know that for sure).
However, if I use responseType="text" it treats it as though the string is utf8, ignoring the charset in the content-type (resulting in a nasty mess).
If I used 'blob' (probably the nearest thing to what I want), could I then convert that to a DomString taking into account the encoding?

I actually found an API which does what I want, from here:
https://developers.google.com/web/updates/2014/08/Easier-ArrayBuffer-String-conversion-with-the-Encoding-API
Basically, use responseType="arraybuffer", pick the encoding from the returned headers, and use DataView and TextDecoder. It does exactly what is required.
const xhr = new XMLHttpRequest();
xhr.responseType = "arraybuffer";
xhr.onload = function() {
const contenttype = xhr.getResponseHeader("content-type");
const charset = contenttype.substring(contenttype.indexOf("charset=") + 8);
const dataView = new DataView(xhr.response);
const decoder = new TextDecoder(charset);
console.log(decoder.decode(dataView));
}
xhr.open("GET", "https://people.w3.org/mike/tests/windows-1251/test.txt");
xhr.send(null);
fetch("https://people.w3.org/mike/tests/windows-1251/test.txt")
.then(response => {
const contenttype = response.headers.get("content-type");
const charset = contenttype.substring(contenttype.indexOf("charset=") + 8);
response.arrayBuffer()
.then(ab => {
const dataView = new DataView(ab);
const decoder = new TextDecoder(charset);
console.log(decoder.decode(dataView));
})
})

If I used 'blob' (probably the nearest thing to what I want), could I then convert that to a DomString taking into account the encoding?
https://medium.com/programmers-developers/convert-blob-to-string-in-javascript-944c15ad7d52 outlines a general approach you can use. To apply that to the case of fetching a remote document:
Create a FileReader to read in the fetch response as a Blob
Use FileReader.readAsText() to get back text from that Blob in the right encoding
Like this:
const reader = new FileReader()
reader.addEventListener("loadend", function() {
console.log(reader.result)
})
fetch("https://people.w3.org/mike/tests/windows-1251/test.txt")
.then(response => response.blob())
.then(blob => reader.readAsText(blob, "windows-1251"))
Or if you instead really want to use XHR:
const reader = new FileReader()
reader.addEventListener("loadend", function() {
console.log(reader.result)
})
const xhr = new XMLHttpRequest()
xhr.responseType = "blob"
xhr.onload = function() {
reader.readAsText(xhr.response, "windows-1251")
}
xhr.open("GET", "https://people.w3.org/mike/tests/windows-1251/test.txt", true)
xhr.send(null)
However, if I use responseType="text" it treats it as though the string is utf8, ignoring the charset in the content-type
Yes. That’s what’s required by the Fetch spec (which for this is what the XHR spec relies on too):
Objects implementing the Body mixin also have an associated package data algorithm, given bytes, a type and a mimeType, switches on type, and runs the associated steps:
…
↪ text
            Return the result of running UTF-8 decode on bytes.

Related

Decode XMLHTTPResponseText into dataUrl without base encoding on server side

How can the plain text response of an XMLHTTPRequest be converted to a dataUrl on client side?
Image data is being send from the server to an Iframe and the only option to retrieve the data is the default encoded data from a GET request.*
I do not have any control over the server. I can not specify overrideMimeType nor the responseType of the request.
I tried to utf8 encode the returned data:
const utf8 = new TextEncoder();
const bytes = utf8.encode(imageDataAsText);
//Convert to data url
const fr = new FileReader();
const blob = new Blob([bytes], { type: attachment.metadata.mediaType });
fr.onload = (e) => {
setImageData(fr.result as string);
};
fr.readAsDataURL(blob);
Converting via the char code didn't work either:
https://stackoverflow.com/a/6226756/3244464
let bytesv2 = []; // char codes
for (var i = 0; i < imageDataAsString.length; ++i) {
var code = imageDataAsString.charCodeAt(i);
bytesv2 = bytesv2.concat([code & 0xff, code / 256 >>> 0]);
}
Raw data as it is displayed by console out. What is the actual default encoding of the data I am working with here?
Context:
* The data in question is recieved inside an iframe inside the atlassian (jira/confuence) ecosystem. They do not support piping binary data from the parent frame to the iframe, nor can I make my own request due to the authorization flow which requires cookies stored on the parent page. All other options mention to override some encoding, or changing it on the server side do not apply in this specific case.
When you are using XMLHttpRequest to get binary data then don't return it as a string. Use xhr.responseType = 'blob' (or arrayBuffer if you intend to read/modify it)
var xhr = new XMLHttpRequest()
xhr.responseType = 'blob'
xhr.onload = () => {
// Convert xhr blob to data url
const fr = new FileReader()
fr.onload = () => {
setImageData(fr.result)
}
fr.readAsDataURL(xhr.response)
}
Better yet use the fetch api instead of the old XMLHttpRequest to get the binary
It's more popular among web workers and servers. It's also simpler and based on promises
fetch(url)
.then(res => res.blob())
.then(blob => { ... })
And why do you need it to be a base64 url? if it's just to show a preview of an <img> then it's a waste of time, cpu and memory.
It would be better just to do:
img.src = URL.createObjectURL(blob)

Blob image from database to Java and from Java to Javascript Image

I have Blob, which stored in db and i take it from database with java server like this:
Entity.java
#Column(name = "img")
private Blob img;
public Blob getImg() {
return img;
}
public void setImg(Blob img) {
this.img = img;
}
Repository.java
#Transactional
#Query(value = "SELECT img FROM articles WHERE category = ?", nativeQuery = true)
//Blob findP(String category);
Blob findPic(String category);
Controller.java
#RequestMapping(value="/Pic_test")
#ResponseBody
public Blob getPics() throws SQLException, IOException {
return remindRepository.findPic("Java");
}
Then I receive it with Javascript to image it:
function toDataURL(url, callback) {
var xhr = new XMLHttpRequest();
xhr.onload = function() {
var reader = new FileReader();
reader.onloadend = function() {
callback(reader.result);
}
reader.readAsDataURL(xhr.response);
};
xhr.open('GET', url);
xhr.responseType = 'blob';
xhr.send();
}
toDataURL('http://localhost:8080/articles/Pic_test', function(dataUrl) {
var display = document.getElementById('display');
var url = window.URL.createObjectURL(new Blob([dataUrl]));
var img = new Image();
img.src = url;
document.getElementById("demo").innerHTML = img.src;
})
However, if I call my "img" Blob in java code, i have an error in server, but if I call it byte[], my picture is not shown just.
I can't comment the java part since I know nothing about it, but for the javascript one, what you do is... not correct.
You don't seem to understand what is a data URL, nor what you are doing here.
So a data URL is a string, made of an header and of some file content (data:|mime/type;|file-content).
A data URL is an URL that points to itself, useful to embed data that should normally be served from network.
Quite often, the file content part is encoded as base64, because the URI scheme is limited in its set of allowed characters, and that binary data couldn't be represented in this scheme.
Now let's see what you are doing here...
You are downloading a resource as a Blob. That's good, Blob are perfect objects to deal with binary data.
Then, you read this Blob a data URL. Less good, but I can see the logic, <img> can indeed load images from data URLs.
But then from this data URL string, you create a new Blob! This is completely wrong. The Blob you just created with new Blob([dataUrl]) is a text file, not your image file in any way. So yes, the data is still hidden somewhere in the base64 data which is itself in the data URL, but what your poor <img> will see when accessing the data hooked by the Blob URI is really just text, data:image/png;base64,iVBORw0... and not at all �PNG... like its parsing algo can read.
So the solution is quite easy: get rid of the FileReader step. You don't need it.
var xhr = new XMLHttpRequest();
xhr.open('get', 'https://upload.wikimedia.org/wikipedia/commons/4/47/PNG_transparency_demonstration_1.png');
xhr.responseType = 'blob';
xhr.onload = display;
xhr.send();
function display(evt) {
// we did set xhr.responseType = "blob"
var blob = evt.target.response; // so this is a Blob
// hence, no need for anything else than
var url = URL.createObjectURL(blob);
var img = new Image();
img.src = url;
document.body.appendChild(img);
}
And if I may, all your thing could also just be
document.getElementById('display').src = 'http://localhost:8080/articles/Pic_test';

What is the output of a piped file stream?

Perhaps the question is not worded in the greatest way but here's some more context. Using GridFSBucket, I'm able to store a file in mongo and obtain a download stream for that file. Here's my question. Let's say I wanted to send that file back as a response to my http request.
I do:
downloadStream.pipe(res);
On the client side now when I print the responseText, I get some long string with some funky characters that look to be encrypted. What is the format/type of this string/stream? How do I setup my response so that I can get the streamed data as an ArrayBuffer on my client side?
Thanks
UPDATE:
I haven't solved the problem yet, however the suggestion by #Jekrb, gives exactly the same output as doing console.log(this.responseText). It looks like the string is not a buffer. Here is the output from these 2 lines:
console.log(this.responseText.toString('utf8'))
var byteArray = new Uint8Array(arrayBuffer);
UPDATE 2 - THE CODE SNIPPETS
Frontend:
var savePDF = function(blob){
//fs.writeFile("test.pdf",blob);
var xhr = new XMLHttpRequest();
xhr.onreadystatechange = function() {
if (this.readyState === XMLHttpRequest.DONE && this.status === 200){
//TO DO: Handle the file in the response which we will be displayed.
console.log(this.responseText.toString('utf8'));
var arrayBuffer = this.responseText;
if (arrayBuffer) {
var byteArray = new Uint8Array(arrayBuffer);
}
console.log(arrayBuffer);
}
};
xhr.open("POST","/pdf",true);
xhr.responseType = 'arrayBuffer';
xhr.send(blob);
};
BACKEND:
app.post('/pdf',function(req,res){
MongoClient.connect("mongodb://localhost:27017/test", function(err, db) {
if(err) return console.dir(err);
console.log("Connected to Database");
var bucket = new GridFSBucket(db, { bucketName: 'pdfs' });
var CHUNKS_COLL = 'pdfs.chunks';
var FILES_COLL = 'pdfs.files';
// insert file
var uploadStream = bucket.openUploadStream('test.pdf');
var id = uploadStream.id;
uploadStream.once('finish', function() {
console.log("upload finished!")
var downloadStream = bucket.openDownloadStream(id);
downloadStream.pipe(res);
});
// This pipes the POST data to the file
req.pipe(uploadStream);
});
});
My guess is that either the response is being outputted as plain binary which is not base64 encoded (still a buffer) or it is a compressed (gzip) response that needs to be uncompressed first.
Hard to pinpoint the issue without seeing the code though.
UPDATE:
Looks like you're missing the proper response headers.
Try setting these headers before the downloadStream.pipe(res):
res.setHeader('Content-disposition', 'attachment; filename=test.pdf');
res.set('Content-Type', 'application/pdf');
Your stream is likely already a buffer. You might be able to call responseText.toString('utf8') to convert the streamed data into readable string.
I solved it!!!
Basically preset the response type to "arraybuffer" before you make the request using
req.responseType = "arraybuffer"
Now, once you receive the response, don't use responseText, instead use response. response contains the arraybuffer with the data for the file.

How to compare two blobs in JavaScript?

I load two blob files in JavaScript using the code below.
I want to compare them to see if they are precisely the same.
(blob1 === blob2) is returning false, even though the reported size of each blob is 574 bytes. What am I doing wrong?
getHTTPAsBlob(url, callback) {
let cacheBust = Math.random().toString()
url = url + '?&cachebust=' + cacheBust
let xhr = new XMLHttpRequest();
xhr.open('GET', url, true);
xhr.responseType = 'blob';
xhr.onload = function (e) {
if (xhr.status == 200) {
// get binary data as a response
let fileData = this.response;
let contentType = xhr.getResponseHeader("content-type")
var reader = new FileReader()
reader.onload = (e) => {
console.log(reader.result)
console.log(fileData)
callback(null, {
Body: reader.result,
Blob: fileData,
ContentType: contentType,
Etag: null,
LastModified: null,
})
}
reader.readAsText(fileData)
} else {
callback(xhr)
}
}
xhr.send();
}
You must use a FileReader to compare them
The documentation for FileReader is on MDN. Take a look at methods to choose what's best for your data.
A useless way of comparing two blobs is looking at their size. Although two blobs of the same size will return true no matter their content.
Example
new Blob([Math.random()]).size == new Blob([Math.random()]).size // true.
Unfortunately, this is a case of comparing two completely independent pointers that reference comparable content. To see how this might work for simpler content, try the following in the javascript console:
new Number(5) == new Number(5)
Returns false. Frustrating that 5 as an object does not equal an object whose value is 5. But at least this puts Blob into context.
I'm running into this same problem, and agree with the previous suggestion that FileReader is the only option.
I think "blob compare" plug-in will be help you.
It can compare blob size,type and so on.
https://www.npmjs.com/package/blob-compare
but I don't know if this plug-in is the perfect way.

Chrome extension: how to pass ArrayBuffer or Blob from content script to the background without losing its type?

I have this content script that downloads some binary data using XHR, which is sent later to the background script:
var self = this;
var xhr = new XMLHttpRequest();
xhr.open('GET', url);
xhr.responseType = 'arraybuffer';
xhr.onload = function(e) {
if (this.status == 200) {
self.data = {
data: xhr.response,
contentType: xhr.getResponseHeader('Content-Type')
};
}
};
xhr.send();
... later ...
sendResponse({data: self.data});
After receiving this data in background script, I'd like to form another XHR request that uploads this binary data to my server, so I do:
var formData = new FormData();
var bb = new WebKitBlobBuilder();
bb.append(data.data);
formData.append("data", bb.getBlob(data.contentType));
var req = new XMLHttpRequest();
req.open("POST", serverUrl);
req.send(formData);
The problem is that the file uploaded to the server contains just this string: "[object Object]". I guess this happens because ArrayBuffer type is lost somehow while transferring it from content process to the background? How can I solve that?
Messages passed between a Content Script and a background page are JSON-serialized.
If you want to transfer an ArrayBuffer object through a JSON-serialized channel, wrap the buffer in a view, before and after transferring.
I show an isolated example, so that the solution is generally applicable, and not just in your case. The example shows how to pass around ArrayBuffers and typed arrays, but the method can also be applied to File and Blob objects, by using the FileReader API.
// In your case: self.data = { data: new Uint8Array(xhr.response), ...
// Generic example:
var example = new ArrayBuffer(10);
var data = {
// Create a view
data: Array.apply(null, new Uint8Array(example)),
contentType: 'x-an-example'
};
// Transport over a JSON-serialized channel. In your case: sendResponse
var transportData = JSON.stringify(data);
//"{"data":[0,0,0,0,0,0,0,0,0,0],"contentType":"x-an-example"}"
// At the receivers end. In your case: chrome.extension.onRequest
var receivedData = JSON.parse(transportData);
// data.data is an Object, NOT an ArrayBuffer or Uint8Array
receivedData.data = new Uint8Array(receivedData.data).buffer;
// Now, receivedData is the expected ArrayBuffer object
This solution has been tested successfully in Chrome 18 and Firefox.
new Uint8Array(xhr.response) is used to create a view of the ArrayBuffer, so that the individual bytes can be read.
Array.apply(null, <Uint8Array>) is used to create a plain array, using the keys from the Uint8Array view. This step reduces the size of the serialized message. WARNING: This method only works for small amounts of data. When the size of the typed array exceeds 125836, a RangeError will be thrown. If you need to handle large pieces of data, use other methods to do the conversion between typed arrays and plain arrays.
At the receivers end, the original buffer can be obtained by creating a new Uint8Array, and reading the buffer attribute.
Implementation in your Google Chrome extension:
// Part of the Content script
self.data = {
data: Array.apply(null, new Uint8Array(xhr.response)),
contentType: xhr.getResponseHeader('Content-Type')
};
...
sendResponse({data: self.data});
// Part of the background page
chrome.runtime.onMessage.addListener(function(data, sender, callback) {
...
data.data = new Uint8Array(data.data).buffer;
Documentation
MDN: Typed Arrays
MDN: ArrayBuffer
MDN: Uint8Array
MDN: <Function> .apply
Google Chrome Extension docs: Messaging > Simple one-time requests
"This lets you send a one-time JSON-serializable message from a content script to extension, or vice versa, respectively"
SO bonus: Upload a File in a Google Chrome Extension - Using a Web worker to request, validate, process and submit binary data.
For Chromium Extensions manifest v3, URL.createObjectURL() approach doesn't work anymore because it is prohibited in the service workers.
The best (easiest) way to pass data from a service worker to a content script (and vice-versa), is to convert the blob into a base64 representation.
const fetchBlob = async url => {
const response = await fetch(url);
const blob = await response.blob();
const base64 = await convertBlobToBase64(blob);
return base64;
};
const convertBlobToBase64 = blob => new Promise(resolve => {
const reader = new FileReader();
reader.readAsDataURL(blob);
reader.onloadend = () => {
const base64data = reader.result;
resolve(base64data);
};
});
Then send the base64 to the content script.
Service worker:
chrome.tabs.sendMessage(sender.tab.id, { type: "LOADED_FILE", base64: base64 });
Content script:
chrome.runtime.onMessage.addListener(async (request, sender) => {
if (request.type == "LOADED_FILE" && sender.id == '<your_extension_id>') {
// do anything you want with the data from the service worker.
// e.g. convert it back to a blob
const response = await fetch(request.base64);
const blob = await response.blob();
}
});

Categories

Resources