javascript utf-8 encoded text being interpreted as unicode - javascript

I'm sending UTF-8 encoded text to a WSO2 service bus. The service bus interprets this as Unicode. In the header the encoding is specified and the character set is also set to UTF-8.
When sending the µ, it gets sent as C2 B5. If I check this on the UTF-8 code page it's the µ sign, but the bus thinks it's C2 = Â and B5 = µ.
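For illustration, here is a quick way to see both interpretations from JavaScript (a sketch, assuming an environment with TextEncoder/TextDecoder, i.e. a browser or a recent Node):
// Sketch: the UTF-8 bytes of µ, and what a Latin-1/Windows-1252 reader makes of them.
var bytes = new TextEncoder().encode('µ');                                // Uint8Array [0xC2, 0xB5]
console.log(Array.from(bytes, function (b) { return b.toString(16); }));  // ['c2', 'b5']
console.log(new TextDecoder('windows-1252').decode(bytes));               // "Âµ", i.e. two characters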
Does anyone know how to prevent or fix this?
EDIT:
The JSON my client sends to the WSO2:
Accept: application/json, text/plain, */*
Content-Type: application/json;charset=utf-8
Accept-Language: nl-BE
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0; MSAppHost/1.0)
Host: app1.o.esb.local
Content-Length: 156
Connection: Keep-Alive
Pragma: no-cache
{"meldingen":{"MeldingDto":[{"Id":-1,"Code":"","MeldingTijd":"2014-06-20T15:34:52.680+02:00","Melder":"ABW_PN","Commentaar":"testµ","InterventieId":381}]}}
Here, in the hex view in Fiddler, the µ sign is C2 B5.
The WSO2 service sees this as 2 characters instead of one.
When I use soapUI and send the same character, it gets sent as B5, and that gets interpreted correctly. I would expect that WSO2 would know how to handle UTF-8.
Should I edit my encoding to send B5 instead of C2 B5, or does the WSO2 ESB need an additional setting so it knows how to interpret the UTF-8 encoding correctly?
Thanks in advance
Ian

My colleagues and I figured out how to fix this. On the client we encoded the message in the following manner:
encodeURIComponent(comment);
In the case of the µ sign we got %C2%B5.
Then on the server side WSO2 we used a javascript function (local entry):
function decode(mc)
{
    // Read the URL-encoded value from the message context, decode it, and store the result
    var Input = mc.getProperty('InputdecodeURIComponent').toString();
    var Output = decodeURIComponent(Input);
    mc.setProperty('OutputdecodeURIComponent', Output);
}
<property name="InputdecodeURIComponent" value="PUT THE ENCODED VALUE HERE" />
<script language="js" key="HelperFunctions.js" function="decode"/>
<property name="Output" expression="get-property('OutputdecodeURIComponent')"/>
I hope this helps.
Regards
Ian

Related

Content-Length header doesn't match request body on chunked upload

Hello,
I have a problem with chunked uploading to Google Cloud Storage (GCS) using dropzone.js.
What I'm Trying To Do:
I want to upload a (bigger) file to Google Cloud Storage via Dropzone. Since it's quite a large file, I'm using Dropzone's internal function to chunk (and upload) it. I'm trying to upload via a signed URL that I'm creating, initializing, and afterwards passing to Dropzone so that it knows where to send the file. I'm also adding a Content-Range header to the request so that GCS keeps track of the current upload status.
Current Status:
When the upload starts, it seems to work. But after the first chunk finishes, Dropzone tries to re-upload the first chunk, because the xhr response status is 400.
The Problem:
When I investigated the xhr request and response, it showed that the Content-Length and the Content-Range header are NOT the same size. And that is also what the response text told me the error is.
My code in .on("sending") by dropzone:
let procChunks = file.upload.chunks; // Get all chunks processed until now
latestChunk = procChunks[procChunks.length-1]; // Select latest chunk
const chunkFirstByte = dz.options.chunkSize * latestChunk.index; // Calc first byte
const chunkLastByte = chunkFirstByte + (latestChunk.dataBlock.data.size-1); // Calc last byte
let header = "bytes "+chunkFirstByte+"-"+chunkLastByte +"/"+ (file.size-1); // Create header value
xhr.setRequestHeader("Content-Range", header); // Set header to xhr
Request Header:
PUT /upload/storage/v1/b/(myBucket)/o?uploadType=resumable&name=test2.fna&upload_id=(myUpLoadID) HTTP/2
Host: storage.googleapis.com
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0
Accept: application/json
Accept-Language: de,en-US;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate, br
Cache-Control: no-cache
X-Requested-With: XMLHttpRequest
Content-Range: bytes 0-8388607/13088352
Content-Type: multipart/form-data; boundary=---------------------------428770732237854445893641225227
Content-Length: 8388843
Origin: http://127.0.0.1:5000
Connection: keep-alive
Referer: http://127.0.0.1:5000/upload
TE: Trailers
The request body:
-----------------------------428770732237854445893641225227
Content-Disposition: form-data; name="file"; filename="test2.fna"
Content-Type: application/octet-stream
>NC_000913.3 Escherichia coli str. K-12 substr. MG1655, complete genome
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTG
GTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGAC
AGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGT
AACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAGCCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGG
TAACGAGGTAACAACCATGCGAGTGTTGAAGTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCG
ATATTCTGGAAAGCAATGCCAGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTG
GCGATGATTGAAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTT
GACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGCAATTGAAAACTTTCGTCGATCAGGAATTTGCCCAAATAA
AACATGTCCTGCATGGCATTAGTTTGTTGGGGCAGTGCCCGGATAGCATCAACGCTGCGCTGATTTGCCGTGGCGAGAAA
ATGTCGATCGCCATTATGGCCGGCGTATTAGAAGCGCGCGGTCACAACGTTACTGTTATCGATCCGGTCGAAAAACTGCT
GGCAGTGGGGCATTACCTCGAATCTACCGTCGATATTGCTGAGTCCACCCGCCGTATTGCGGCAAGCCGCATTCCGGCTG
ATCACATGGTGCTGATGGCAGGTTTCACCGCCGGTAATGAAAAAGGCGAACTGGTGGTGCTTGGACGCAACGGTTCCGAC
TACTCTGCTGCGGTGCTGGCTGCCTGTTTACGCGCCGATTGTTGCGAGATTTGGACGGACGTTGACGGGGTCTATACCTG
CGACCCGCGTCAGGTGCCCGATGCGAGGTTGTTGAAGTCGATGTCCTACCAGGAAGCGATGGAGCTTTCCTACTTCGGCG
CTAAAGTTCTTCACCCCCGCACCATTACCCCCATCGCCCAGTTCCAGATCCCTTGCCTGATTAAAAATACCGGAAATCCT
CAAGCACCAGGTACGCTCATTGGTGCCAGCCGTGATGAAGACGAATTACCGGTCAAGGGCATTTCCAATCTGAATAACAT
GGCAATGTTCAGCGTTTCTGGTCCGGGGATGAAAGGGATGGTCGGCATGGCGGCGCGCGTCTTTGCAGCGATGTCACGCG
CCCGTATTTCCGTGGTGCTGATTACGCAATCATCT
Response Text:
Invalid request. There were 8388843 byte(s) (or more) in the request body. There should have been 8388608 byte(s) (starting at offset 0 and ending at offset 8388607) according to the Content-Range header.
As you can maybe see, the first 4 lines are NOT from the file.
Additional Information:
The file is pure text data (UTF-8 encoded, I guess).
I'm chunking in 8 MB chunks (recommended by Google).
Somewhere in Google's API guide I read that when uploading via PUT there should/must be no other data except the file data.
The Content-Length header gets added automatically by Dropzone.
My Questions:
Where do those 4 lines come from?
Can I (re)set the xhr request body?
Is my problem maybe caused by some formatting problem of the file data (like bytes vs. strings)?
In case that's not the problem, do you have any other idea what could be the problem?
THANK YOU VERY MUCH!
Any help appreciated! If you need more information, please ask!

API Data Returning Unicode Characters in Console

I have been facing a rather confusing problem for the last two days. I am working on a document management system that uses an API that pulls in data from SOLR. The data is to the tune of around ~15 MB, and it pulls records for more than 4000 documents. The API response is in this format:
{
  "documents": [
    {
      "id": 123,
      "some_field": "abcd",
      "some_other_field": "abcdef"
    },
    {
      "id": 124,
      "some_field": "abcd1",
      "some_other_field": "abcdef1"
    }
  ]
}
Everything works fine in browser. If I hit the endpoint in Chrome or Firefox browser, it gives me the correct output and I am able to see the JSON output.
However, if I try hitting the same API endpoint with Java or JS code, the response code is 200, but the output in the console (terminal or Eclipse) shows Unicode escapes like \u0089 \u0078 U+0080. All the output comes this way, and since there are around 4000+ records being fetched by the API, the console kind of fills up with all of these characters.
The only difference that I see between the requests made from the browser and from my code is that in the browser I can see Content-Encoding: gzip, while I cannot find this header in the request made from the code that I have written. For example, in JS code, through the Chakram framework, I can check
expect(response).to.be.encoded.with.gzip
mentioned here. However, this returns a failure stating expected undefined to match gzip
What am I missing here? Is this something related to encoding/decoding or something entirely different?
Edit 1 : The Response Headers as seen in Network tab of Chrome :
cache-control: max-age=0, private, must-revalidate, max-age=315360000
content-encoding: gzip
content-type: application/json; charset=utf-8
date: Tue, 22 May 2018 06:07:26 GMT
etag: "a07eb7c1eef4ab97699afc8d61fb9c5d"
expires: Fri, 19 May 2028 06:07:26 GMT
p3p: CP="NON CUR OTPi OUR NOR UNI"
server: Apache
Set-Cookie : some_cookie
status: 200 OK
strict-transport-security:
transfer-encoding: chunked
vary: Accept-Encoding
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
x-request-id: abceefr4-1234-acds-100b-d2bef2413r47
x-runtime: 3.213943
x-ua-compatible: chrome=1
x-xss-protection: 1; mode=block
The Request Headers as seen in Network tab of Chrome
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Cookie: some_cookie
Host: abcd.bcd.com
IV_USER: demouser123
IV_USER_L: demouser123
MAIL: demouser#f.com
PERSON_ID: 123
Referer: http://abcd.bcd.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36
X-CSRF-TOKEN: some_csrf_token
Edit 2 : The tests that I am using
describe('Hits required API', () => {
  before(() => {
    return chakram.wait(api_response = chakram.get(url, options));
  });
  it('displays response', () => {
    return api_response.then((t_resp) => {
      console.log(JSON.stringify(t_resp));
      expect(t_resp).to.have.header('Content-Encoding', 'gzip');
    });
  });
});
This has nothing to do with encoding. Web servers generally compress responses with gzip to save bandwidth, since it is wasteful to transfer the whole 15 MB payload as-is (see this article for more about gzip and how it works: https://betterexplained.com/articles/how-to-optimize-your-site-with-gzip-compression/ ). Why it worked in Chrome is simple: Chrome's DevTools has a built-in Unicode parser (even an HTML parser) that shows you the parsed content rather than the weird text (the same can be seen in the Response tab next to the Preview tab). The reason you see weird text is that you are stringifying the response, which escapes any special characters: console.log(JSON.stringify(t_resp)). You cannot do something like console.log("response", t_resp) without stringifying in a terminal either, since the terminal has no JSON or Unicode parser; it just prints text. Try removing that console call, since stringifying a 15 MB response is a costly operation.
Edit 1:
If you still want to output it in the console, here is what to do.
Since Node cannot decode gzip directly by default (and neither can Chakram, which is just an API-testing platform), you can use zlib to do this. Please find the example snippet:
const zlib = require('zlib');

describe('Hits required API', () => {
  before(() => {
    return chakram.wait(api_response = chakram.get(url, options));
  });
  it('displays response', () => {
    return api_response.then((t_resp) => {
      // gunzip expects the raw gzipped bytes; the callback receives the decompressed Buffer
      zlib.gunzip(t_resp, function (err, dezipped) {
        console.log(dezipped);
      });
    });
  });
});
Try with console.dir to display your values
describe('Hits required API', () => {
  before(() => {
    return chakram.wait(api_response = chakram.get(url, options));
  });
  it('displays response', () => {
    return api_response.then((t_resp) => {
      console.dir(t_resp, { depth: null });
    });
  });
});
Console.dir

Websocket handshake Sec-WebSocket-Accept header value is incorrect

I'm writing a C++ WebSocket server; Chrome DevTools says the Sec-WebSocket-Accept header value is incorrect. I've tested for days and it all seems fine. The client closes with readyState 3 without the WebSocket onopen being called, although the handshake shows as 101 in Chrome DevTools.
This is my code for calculating the keys
string magickey = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";
string key = msgkey.append(magickey);
unsigned char* sha_str = SHA1(reinterpret_cast<const unsigned char*>(key.c_str()), key.length(), nullptr);
string acceptkey = base64_encode(reinterpret_cast<const unsigned char*>(sha_str), strlen((char*)sha_str));
string handshake_response = "HTTP/1.1 101 Switching Protocols\r\n";
handshake_response.append("Upgrade: websocket\r\n");
handshake_response.append("Connection: Upgrade\r\n");
handshake_response.append("Sec-WebSocket-Accept: "+acceptkey+"\r\n");
handshake_response.append("\r\n");
Chrome Response
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: 5T5MvxP1iz40vLpi3kQs/ifDaCo=
Chrome Request
GET ws://localhost:4897/echo HTTP/1.1
Host: localhost:4897
Connection: Upgrade
Pragma: no-cache
Cache-Control: no-cache
Upgrade: websocket
Origin: http://localhost
Sec-WebSocket-Version: 13
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
Sec-WebSocket-Key: LKF8lHGznbKGIgO1UzAOhg==
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
It says "Error during WebSocket handshake: Incorrect 'Sec-WebSocket-Accept' header value".
Chrome also shows one additional frame received, size 79 bytes, opcode -1.
Thanks Heaps!
Chrome says that 'Sec-WebSocket-Accept' is incorrect. Trying to compute it manually, I have to agree with Chrome.
My test:
concat "LKF8lHGznbKGIgO1UzAOhg==" and "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"
=> "LKF8lHGznbKGIgO1UzAOhg==258EAFA5-E914-47DA-95CA-C5AB0DC85B11", that's key.
compute SHA1 (160 bits, hex): bf15 14e3 7108 0ee4 7782 c709 a767 cc72 423d e5c4
From your log, your encoding to base64 is: 5T5MvxP1iz40vLpi3kQs/ifDaCo=
Decoding it to hex: e53e 4cbf 13f5 8b3e 34bc ba62 de44 2cfe 27c3 682a
Those two hex values should be equal. Feel free to correct me if I'm wrong somewhere.
Possible problems:
Is sha_str null-terminated? i.e. is strlen((char*)sha_str) == 20?
A signed/unsigned char mixup?
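For reference, the expected Sec-WebSocket-Accept value can be cross-checked in a few lines of Node.js (a sketch, assuming Node's built-in crypto module):
// Compute the RFC 6455 accept value for a given Sec-WebSocket-Key.
var crypto = require('crypto');
var key = 'LKF8lHGznbKGIgO1UzAOhg==';                    // from the request above
var magic = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';
var accept = crypto.createHash('sha1').update(key + magic).digest('base64');
console.log(accept);   // this is the value the browser expects in Sec-WebSocket-Accept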

Express JS 4.0, serve binary data, request Accept header changes output

Thanks in advance.
Short:
Express JS 4.0 alters the output data due to the Accept headers in the request.
Is there a way for me to override this behaviour and just write the same data regardless of the request headers?
When Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8 is present, the output is changed.
Is there a way I can ignore, remove, or override these headers?
Long (probably tl;dr):
I am trying to serve binary data from a Node/ExpressJS app.
I am storing a compressed log file (plain text) that has been gzipped, base64 encoded, and sent to my server app, where it is stored in a Mongo database using Mongoose. I know this is probably not optimal, but it is currently a necessary evil. This part is working fine.
$(gzip --stdout /var/log/cloud-init-script.log | base64 --wrap=0)
is being used to compress and base64-encode the data before it is sent, along with other data, as part of a JSON POST.
The problem occurs when I attempt to retrieve and decode the base64-encoded string and send it to the browser as a binary gzip file.
// node, referring to the machine the log came from
var log = new Buffer(node.log, 'base64');
res.setHeader('Content-Disposition', 'attachment; filename=' + node.name + "-log.gz");
res.setHeader('Content-Type', 'application/x-gzip');
res.setHeader('Content-Length', log.length);
console.log(log.toString('hex'));
// res.end(log, 'binary'); I tried this hoping I could bypass some content negotiation
res.send(log);
I had this working when using ExpressJS 3.0 using res.send.
But when I updated to Express JS 4.0, the downloaded data ceased to extract properly; the data being pulled down was seemingly corrupted somehow.
I started to try to fix this by comparing the downloaded file and the source file in hexadecimal output using xxd or od, and found that the downloaded file was different from the source. I also dumped the hex of the NodeJS Buffer to the console just before it is sent to the client, and this matches the source.
I have been banging my head against this issue for nearly a day now, and have suspected that NodeJS might be doing something funky with character encoding (UTF-8 vs. Buffer vs. UTF-16 strings) or OS endianness.
Eventually, finding none of this to be the problem, I assumed NodeJS had simply always been outputting the wrong data to the browser; but it turned out it wasn't always outputting the wrong data.
I had a breakthrough when I made a curl request to the endpoint and the data came through as expected (matching the source). I then added the request headers that were sent with my browser requests, and got back the mangled data.
Actual log file:
I'm a log file
Good Request:
> User-Agent: curl/7.37.1
> Host: 127.0.0.1:9000
> Accept: */*
>
< HTTP/1.1 200 OK
< X-Powered-By: Express
< Last-Modified: Tue, 26 May 2015 11:47:46 GMT
< Content-Description: File Transfer
< Content-Disposition: attachment; filename=test-log.gz
< Content-Type: application/x-gzip
< Content-Transfer-Encoding: binary
< Content-Length: 57
< Date: Tue, 26 May 2015 11:47:46 GMT
< Connection: keep-alive
0000000: 1f8b 0808 0256 6455 0003 636c 6f75 642d .....VdU..cloud-
0000010: 696e 6974 2d73 6372 6970 742e 6c6f 6700 init-script.log.
0000020: f354 cf55 4854 c8c9 4f57 48cb cc49 e502 .T.UHT..OWH..I..
0000030: 003b 5ff5 5f0f 0000 00 .;_._....
Bad Request:
> Host: localhost:9000
> Connection: keep-alive
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
> User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.65 Safari/537.36
> Referer: http://localhost:9000/nodes?query=environment%3D5549b6cbdc023b5e26fe6bd4%20type%3Dnat
> Accept-Language: en-US,en;q=0.8
>
< HTTP/1.1 200 OK
< X-Powered-By: Express
< Last-Modified: Tue, 26 May 2015 11:47:00 GMT
< Content-Description: File Transfer
< Content-Disposition: attachment; filename=test-log.gz
< Content-Type: application/x-gzip
< Content-Transfer-Encoding: binary
< content-length: 57
< Date: Tue, 26 May 2015 11:47:00 GMT
< Connection: keep-alive
0000000: 1ffd 0808 0256 6455 0003 636c 6f75 642d .....VdU..cloud-
0000010: 696e 6974 2d73 6372 6970 742e 6c6f 6700 init-script.log.
0000020: fd54 fd55 4854 fdfd 4f57 48fd fd49 fd02 .T.UHT..OWH..I..
0000030: 003b 5ffd 5f0f 0000 00 .;_._....
What fixed it was using
res.end(node.log, 'base64');
instead of
res.send(log);
where node.log is the raw base64-encoded string and log was a Buffer that had decoded that string.
Bearing in mind I am using Node v0.10.38.
I ended up following the function call chain.
// I call
res.send(log);
// ExpressJS calls on http.ServerResponse
this.end(chunk, encoding); // chunk = Buffer, encoding = undefined
// NodeJS http.ServerResponse calls
res.inject(string);
At this point NodeJS appears to be treating the data as a string, which is where the buffer contents were being mangled.
This behaviour was different when the 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8' header was not present: a different end(chunk, encoding) function was being called in that case, not using res.inject and not mangling the Buffer data.
I am not entirely sure where the content negotiation is happening and what is swapping in the different res.end functions, whether this is NodeJS or ExpressJS, but it would be nice to be able to control this content negotiation in some simple way.
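For context, here is a minimal sketch of a route with that workaround applied (assuming an Express 4 app; the route path, the node lookup and its fields are placeholders taken from the question, not real code):
// Minimal sketch, not the original code: serve the stored base64 gzip as a download.
var express = require('express');
var app = express();

app.get('/nodes/:id/log', function (req, res) {
  // `node` is assumed to be the Mongoose document whose `log` field holds the base64 string.
  var log = new Buffer(node.log, 'base64');   // decoded only to get the byte length

  res.setHeader('Content-Disposition', 'attachment; filename=' + node.name + '-log.gz');
  res.setHeader('Content-Type', 'application/x-gzip');
  res.setHeader('Content-Length', log.length);

  // Writing the base64 string with an explicit encoding avoids the string-based
  // end() path that mangled the Buffer under res.send().
  res.end(node.log, 'base64');
});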

DOJO xhrGet json with special character

I use Dojo and Struts 1.3.8, and I want to pass some "special" characters like è, °, ù, € via dojo.xhrGet to a Struts action saveBill. But when I print the JSON in the action it gives me
è la prova n°10
I don't know where the problem is. I set the content type to UTF-8 in all the JSPs... I also used a Struts filter for UTF-8 encoding... nothing... where am I wrong?
This is the javascript code
var billJson = {"row":"0","descr":"è la prova n°10"};
dojo.xhrGet({
  url: "saveBill.do",
  headers: {'bill': billJson, 'Content-Type': 'application/json; charset=UTF-8'},
  handleAs: "text",
  load: function(response, ioArgs) {
    showMessage(response);
  },
  error: function(message, ioArgs) {
    showMessage(message);
  }
});
and these are the request headers (copy & paste from Firebug)
Host localhost:9080
User-Agent Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0
Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language it-it,it;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding gzip, deflate
Accept-Charset UTF-8,*
Keep-Alive 115
Connection keep-alive
bill {"row":"0","descr":"è la prova n°10"}
Content-Type application/json; charset=UTF-8
X-Requested-With XMLHttpRequest
Referer http://localhost:9080/CA3_S_001/login.do
Cookie invoiceTreeSaveStateCookie=undefined%2C000001%2C000014; JSESSIONID=0000QeyArD4K7CDr_oyNkrpw9Zk:-1
Thanks!!!
You need to Unicode-escape those characters: so instead of è, you'd have \u00E8
here's a resource: http://www.fileformat.info/info/unicode/category/Ll/list.htm
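A small sketch of that idea in plain JavaScript (a hypothetical helper, not part of Dojo):
// Replace every non-ASCII character with its \uXXXX escape before sending.
function escapeNonAscii(s) {
  return s.replace(/[\u0080-\uffff]/g, function (c) {
    return '\\u' + ('000' + c.charCodeAt(0).toString(16).toUpperCase()).slice(-4);
  });
}
escapeNonAscii('è la prova n°10');   // "\u00E8 la prova n\u00B010"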
Sending JSON in an HTTP header is fairly non-standard, and I wouldn't recommend it. One reason not to send JSON in a header is that, as you've found out, HTTP headers are just bytes; they have no intrinsic code page to translate them into strings. I would instead send the JSON data in a POST body; I think you'll have much, much better luck.
However, if you absolutely must send non-ASCII JSON data in a header, you can try calling ServletRequest.setCharacterEncoding("UTF-8"). I think it only affects the parsing of URL parameters and POST bodies, but it's worth a try. You could also, as Robot Woods suggests, \uXXXX encode all non-ASCII characters in the JSON (where XXXX is the hex representation of the UTF-16 encoding of the character).
But seriously, just put it in the POST body; it's a stronger, better solution.
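A minimal sketch of that POST-body approach (assuming Dojo 1.x's dojo.xhrPost and the same saveBill.do action; the Struts side would then read the request body instead of a header):
var billJson = {"row": "0", "descr": "è la prova n°10"};
dojo.xhrPost({
  url: "saveBill.do",
  postData: dojo.toJson(billJson),   // the JSON travels in the body, not in a header
  headers: {'Content-Type': 'application/json; charset=UTF-8'},
  handleAs: "text",
  load: function(response, ioArgs) { showMessage(response); },
  error: function(message, ioArgs) { showMessage(message); }
});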
