How can I decode an RFC 2047 encoded string using JavaScript?

I have a service that returns a file, e.g. a JPG. Its name is in the Content-Disposition header, but the filename is encoded according to RFC 2047, so I get:
filename="=?UTF-8?Q?=C4=99=C5=9B.jpg?=", which should decode to ęś.jpg
I found that Java has MimeUtility.decodeText, which works nicely, but I need to decode this text on the client side using Angular 8. I tried decoding it on the server side in Java, passing it to the client, and then decoding it there with decodeURIComponent(escape(filename)), but that doesn't work.
Is there an equivalent function, or perhaps an npm dependency, that can decode it in Angular 8?

If you can perform the decoding on the server side, do that and then re-encode the name using something that JS can decode natively.
For example, you could send a URL-encoded (percent-encoded) string, which would be %C4%99%C5%9B.jpg in your example, and decode it using decodeURIComponent() (note that you should not add the escape call as in your question).
Base64 is a poorer fit because the JS built-in method (atob) needs extra work to decode these characters (https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/btoa#unicode_strings)
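As a rough sketch of the two options above: the helper name decodeBase64Utf8 is made up for this illustration, and both inputs are the ęś.jpg example from the question. It assumes an environment where atob and TextDecoder are available (current browsers and recent Node.js versions).

```javascript
// Option 1: the server sends a percent-encoded name; one built-in call decodes it.
const fromPercent = decodeURIComponent("%C4%99%C5%9B.jpg");

// Option 2: Base64. atob() returns one character per byte, so the bytes
// still have to be decoded as UTF-8 afterwards (the "extra work").
// decodeBase64Utf8 is a hypothetical helper, not a built-in.
function decodeBase64Utf8(base64) {
  const binary = atob(base64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

const fromBase64 = decodeBase64Utf8("xJnFmy5qcGc=");
console.log(fromPercent, fromBase64);
```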
Or, if you are looking for an npm module, here is one that supports parsing the header value directly: https://www.npmjs.com/package/content-disposition
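If neither server-side re-encoding nor a dependency is an option, the encoded-word format itself is small enough to decode by hand. The following is a minimal sketch, not a complete RFC 2047 implementation: decodeRfc2047 is a hypothetical name, it handles only the =?charset?Q?...?= and =?charset?B?...?= forms, and it assumes the charset label is one that TextDecoder accepts (an unknown label throws a RangeError).

```javascript
// Hypothetical helper: decode RFC 2047 encoded-words inside a string.
function decodeRfc2047(input) {
  // Whitespace between two adjacent encoded-words is not part of the text.
  const joined = input.replace(/(\?=)\s+(=\?)/g, "$1$2");
  const encodedWord = /=\?([^?]+)\?([bBqQ])\?([^?]*)\?=/g;

  return joined.replace(encodedWord, (match, charset, encoding, text) => {
    const bytes = [];
    if (encoding.toUpperCase() === "B") {
      // "B" encoding is plain Base64.
      for (const ch of atob(text)) bytes.push(ch.charCodeAt(0));
    } else {
      // "Q" encoding: "_" stands for a space, "=XX" is one byte in hex.
      for (let i = 0; i < text.length; i++) {
        const hex = text.slice(i + 1, i + 3);
        if (text[i] === "=" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
          bytes.push(parseInt(hex, 16));
          i += 2;
        } else {
          bytes.push(text[i] === "_" ? 0x20 : text.charCodeAt(i));
        }
      }
    }
    // Interpret the collected bytes in the charset named by the encoded-word.
    return new TextDecoder(charset).decode(Uint8Array.from(bytes));
  });
}

console.log(decodeRfc2047("=?UTF-8?Q?=C4=99=C5=9B.jpg?="));
```

Text outside encoded-words is passed through unchanged, so the function can be applied to a filename value whether or not the server encoded it.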

I came to this topic when my AWS SES subject line contained some non-standard characters. I found this repo, https://github.com/One-com/rfc2047, but it's a NodeJS project. I also made a static site, https://ldu2.github.io/rfc2047/, for live conversion. In case someone wants my source code: https://github.com/ldu2/rfc2047

Related

How to convert a file encoded to base64 string by javascript to a Ruby file object?

I am trying to upload a file to the Rails 6 ActiveStorage API (using ReactJS). The docs for this are not very clear to me, but after lots of searching, it seems best to use:
obj.attachment_name.attach(file)
This works well if I have the file on the Rails end. However, passing the file from React to Rails is a bit tricky; the best option I found was to convert the file to a base64 string and send it to the API.
So how do I convert a JavaScript file, passed as a base64 string, into a file object on the Rails side?
I tried this answer, but surprisingly, it doesn't convert the string to a file (I wonder if that is because the JS base64 conversion differs from the Rails one).
Note:
This works well to encode & decode files using ruby only
# Testing plain Ruby
require "base64"

# Read the file you wish to encode (binary mode)
file_path = "/Users/...path/some_image.jpg"
data = File.binread(file_path)
# Encode the image
encoded = Base64.encode64(data)
# Why doesn't this block work for a JS-encoded base64 string?
# i.e., if I pass the encoded string from JS here, it won't work
File.open("some_filename", "wb") do |file|
  file.write(Base64.decode64(encoded))
end
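One possible cause, offered as an assumption because the question does not show the JavaScript side: if the string was produced with FileReader.readAsDataURL(), it is a data URL such as data:image/jpeg;base64,... and not bare base64, so the prefix has to be removed before the payload is decoded. A minimal sketch of stripping it on the client (stripDataUrlPrefix is a hypothetical helper name):

```javascript
// Hypothetical helper: remove the "data:<mime>;base64," prefix that
// FileReader.readAsDataURL() puts in front of the Base64 payload.
// Strings without such a prefix are returned unchanged.
function stripDataUrlPrefix(value) {
  const marker = ";base64,";
  const index = value.indexOf(marker);
  if (value.startsWith("data:") && index !== -1) {
    return value.slice(index + marker.length);
  }
  return value;
}

console.log(stripDataUrlPrefix("data:image/jpeg;base64,/9j/4AAQ"));
```

Sending only the payload lets the server treat the value as ordinary base64.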

Node.js: How to send file text-content to user with russian symbols (I need utf8)?

I am developing in Node.js.
I read the text content of a .txt file using the fs module. The file contains:
Привет мир
Hello world
Hello world
...
Then I want to send this content to the user with res.end(fileString).
But my browser can't decode the Russian symbols. Only the English words are displayed correctly.

First and foremost, make sure the browser is told to interpret the response as UTF-8, for example through a charset declared in the Content-Type header.
If the text still does not show correctly, you can try an alternate encoding for sending the data, depending on how you're doing this.
As an example, you can encode the Russian text in base64 before sending it to the browser, and decode it on the browser side.
On the NodeJS side, you can use Buffer to encode the UTF-8 string to base64. See this thread for examples and more details.
On the client/browser side, the atob() and btoa() functions are used rather than Buffer. Here is the documentation for these functions. Use atob() to decode the base64; note that it returns a binary string with one character per byte, so those bytes still have to be decoded as UTF-8 (for example with TextDecoder) before the browser can show the text correctly.
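Both approaches can be sketched as follows. The helper names are invented for this example, and res is assumed to be the response object of Node's built-in http module; the first function is usually all that is needed, since it declares the charset explicitly.

```javascript
// Hypothetical helper: send text and declare its charset in the header,
// so the browser does not have to guess the encoding.
function sendUtf8Text(res, text) {
  res.setHeader("Content-Type", "text/plain; charset=utf-8");
  res.end(text, "utf8");
}

// Base64 alternative, server side (Node.js): UTF-8 string to Base64.
function encodeUtf8ToBase64(text) {
  return Buffer.from(text, "utf8").toString("base64");
}

// Base64 alternative, browser side: atob() gives one character per byte,
// TextDecoder then turns those bytes back into a UTF-8 string.
function decodeBase64ToUtf8(base64) {
  const bytes = Uint8Array.from(atob(base64), (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

console.log(decodeBase64ToUtf8(encodeUtf8ToBase64("Привет мир")));
```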

how to get data from a base64 encoding of a .pptx file in javascript

I receive a .pptx file from a server as base64-encoded data, and I would like to extract the text contained in it.
Is there a third-party JavaScript library that can do this by reading the base64 data directly rather than taking a file path? I would then like to insert these strings into a PowerPoint presentation using Office JS.
A client-side solution would be preferred.
Thanks

It seems that what you need is a JavaScript decoder for base64 data. There are many projects on GitHub that do this; for instance, here is one: https://github.com/mathiasbynens/base64.
That said, I am not sure about your scenario or what types of files are being base64-encoded. Base64 is, at the end of the day, a text "representation" of what is usually a binary file, like an image or a compressed zip file. I wonder whether you will get what you expect once you decode it. And if you are expecting text, I wonder why your service is even encoding it like this.
Anyway, once you have the text you want to insert, you can use the setSelectedDataAsync method of our Office.js in PowerPoint to write it into your presentation's active selection. https://dev.office.com/reference/add-ins/shared/document.setselecteddataasync
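To illustrate the point about binary content: a .pptx file is a ZIP container of XML parts, so decoding the base64 yields ZIP bytes, not readable text. The sketch below (helper names are made up) only decodes the data and checks for the ZIP signature; actually extracting slide text would additionally need a ZIP reader and XML parsing, which are not shown here.

```javascript
// Hypothetical helper: Base64 string to raw bytes.
function base64ToBytes(base64) {
  return Uint8Array.from(atob(base64), (ch) => ch.charCodeAt(0));
}

// Hypothetical helper: a ZIP local file header starts with "PK\x03\x04".
function looksLikeZip(bytes) {
  return (
    bytes.length >= 4 &&
    bytes[0] === 0x50 &&
    bytes[1] === 0x4b &&
    bytes[2] === 0x03 &&
    bytes[3] === 0x04
  );
}

// "UEsDBA==" is the Base64 form of just the four signature bytes.
console.log(looksLikeZip(base64ToBytes("UEsDBA==")));
```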

Encode/Decode a picture in Hex

A client (a job board) has asked me to do the following:
Create a form, gather information, and create an XML file containing all that information every time a user fills out the form. Easy enough.
The client sent me a sample XML file, and it contains an encoded picture and an encoded CV. Both seem to be encoded in hex, and I cannot work out how to decode this hex data (or how to encode data in the same format, for that matter).
Here are pieces of the XML file. I cannot post it in its entirety; you'll surely understand why.
<photo>
FFD8FFE10A1845786966000049492A000800000008000F010200040000004854430010010200150000006E0000001201030001000000010[.............]EF6A57F5A8E41EE594D62075FF8F77CFF00B1FF00D7A7C17D13B7FA99157FE0269C60E22E4D4DAB38A09E24788F5FF80D5B5B5FEE9ACE32E518AB6DFEDAD1F653FC2D57700FB23FFB1F9D5DB64289B4
</photo>
<cv>
255044462D312E340D0A25C8C9CACB0D0A372030206F626A0D0A3C3C2F54797065202F506167652F506172656E742033203020522F436F6E74656E74.......
</cv>
<extensionCv>.pdf</extensionCv>
And just to make it harder, here are several points to take into consideration:
This file is used to import information into software that was developed specifically for this company. I do not have access to it and cannot get in touch with the company that designed it. The XML file was created by this software as an export of a candidate record.
I cannot encode it in base64 (that would be too easy); it needs to be the same encoding.
I need to be able to encode it in either JS or PHP (once I'm sure the software can decode it, I'll only need to encode; I won't need to decode anything).
Thank you.

You can use the PHP function bin2hex to convert binary data into a hex string. Check the PHP documentation, where you can find an example of using bin2hex while reading a binary file.
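Since the question also allows JavaScript, the same conversion can be sketched there. This assumes the target format is plain uppercase hex with two digits per byte, which is what the samples suggest: the CV sample begins with 255044462D312E34, the bytes of "%PDF-1.4". The function names are illustrative.

```javascript
// Hypothetical helper: bytes to an uppercase hex string, two digits per byte.
function bytesToHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0"))
    .join("")
    .toUpperCase();
}

// Hypothetical helper: the reverse direction, hex string to bytes.
function hexToBytes(hex) {
  if (hex.length % 2 !== 0 || /[^0-9a-fA-F]/.test(hex)) {
    throw new Error("not a valid hex string");
  }
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16);
  }
  return bytes;
}

// The PDF header from the question's <cv> sample.
console.log(bytesToHex(new TextEncoder().encode("%PDF-1.4")));
```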

Having encoded a unicode string in javascript, how can I decode it in Python?

Platform: App Engine
Framework: webapp / CGI / WSGI
On my client side (JS), I construct a URL by concatenating a URL with an unicode string:
http://www.foo.com/地震
then I call encodeURI to get
http://www.foo.com/%E5%9C%B0%E9%9C%87
and I put this in a HTML form value.
The form gets submitted to PayPal, where I've set the encoding to 'utf-8'.
PayPal then (through IPN) makes a post request on the said URL.
On my server side, WSGIApplication tries to extract the unicode string using a regular expression I've defined:
(r'/paypal-listener/(.+?)', c.PayPalIPNListener)
I'd try to decode it by calling
query = unquote_plus(query).decode('utf-8')
(or a variation) but I'd get the error
/paypal-listener/%E5%9C%B0%E9%9C%87
... (omitted) ...
'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
(the first line is the request URL)
When I check the length of query, Python says it has length 18, which suggests to me that '%E5%9C%B0%E9%9C%87' has not been decoded in any way.

In principle this should work:
>>> urllib.unquote_plus('http://www.foo.com/%E5%9C%B0%E9%9C%87').decode('utf-8')
u'http://www.foo.com/\u5730\u9707'
However, note that:
unquote_plus is for application/x-www-form-urlencoded data such as POSTed forms and query string parameters. In the path part of a URL, + means a literal plus sign, not space, so you should use plain unquote here.
You shouldn't generally unquote a whole URL. Characters that have special meaning in a component of the URL will be lost. You should split the URL into parts, get the single pathname component (%E5%9C%B0%E9%9C%87) that you are interested in, and then unquote it.
(If you want to fully convert a URI to an IRI like http://www.foo.com/地震 things are a bit more complicated. Only the path/query/fragment part of an IRI is UTF-8-%-encoded; the domain name is mapped between Unicode and bytes using the oddball ‘Punycode’ IDN scheme.)
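The "split first, then unquote each component" advice is not specific to Python. As an illustration only, here is the same idea in JavaScript, the language of the client side in the question; decodedPathSegments is a made-up name, and the WHATWG URL class is assumed to be available.

```javascript
// Hypothetical helper: split the URL first, then decode each path segment.
// Decoding after splitting keeps an escaped "/" (%2F) inside its segment.
function decodedPathSegments(url) {
  return new URL(url).pathname
    .split("/")
    .filter((segment) => segment !== "")
    .map((segment) => decodeURIComponent(segment));
}

console.log(decodedPathSegments("http://www.foo.com/%E5%9C%B0%E9%9C%87"));
console.log(decodedPathSegments("http://www.foo.com/a%2Fb/c+d"));
```

In the second call, %2F becomes a slash inside the first segment without creating a new segment, and the + stays a literal plus sign, matching the two points above.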
This gets received in my python server side.
What exactly is your server-side? Server, gateway, framework? And how are you getting the url variable?
You appear to be getting a UnicodeEncodeError, which is about unexpected non-ASCII characters in the input to the unquote function, not a decoding problem at all. So I suggest that something has already decoded the path part of your URL to a Unicode string of some sort. Let's see the repr of that variable!
There are unfortunately a number of serious problems with several web servers that make using Unicode in the pathname part of a URL very unreliable, not just in Python but generally.
The main problem is that the PATH_INFO variable is defined (by the CGI specification, and subsequently by WSGI) to be pre-decoded. This is a dreadful mistake partly because of issue (1) above, which means you can't get %2F in a path part, but more seriously because decoding a %-sequence introduces a Unicode decode step that is out of the hands of the application. Server environments differ greatly in how non-ASCII %-escapes in the URL are handled, and it is often impossible to recreate the exact sequence of bytes that the web browser passed in.
IIS is a particular problem in that it will try to parse the URL path as UTF-8 by default, falling back to the wildly-unreliable system default codepage (eg. cp1252 on a Western Windows install) if the path isn't a valid UTF-8 sequence, but without telling you. You are then likely to have fairly severe problems trying to read any non-ASCII characters in PATH_INFO out of the environment variables map, because Windows envvars are Unicode but are accessed by Python 2 and many others as bytes in the system codepage.
Apache mitigates the problem by providing an extra non-standard environ REQUEST_URI that holds the original, completely undecoded URL submitted by the browser, which is easy to handle manually. However if you are using URL rewriting or error documents, that unmapped URL may not match what you thought it was going to be.
Some frameworks attempt to fix up these problems, with varying degrees of success. WSGI 1.1 is expected to make a stab at standardising this, but in the meantime the practical position we're left in is that Unicode paths won't work everywhere, and hacks to try to fix it on one server will typically break it on another.
You can always use URL rewriting to convert a Unicode path into a Unicode query parameter. Since the QUERY_STRING environ variable is not decoded outside of the application, it is much easier to handle predictably.

Assuming the HTML page is encoded in UTF-8, it should just be a simple path.decode('utf-8') if the framework decodes the URL's percent escapes.
If it doesn't, you could use:
urllib.unquote(path).decode('utf-8') if the URL is http://www.foo.com/地震
urllib.unquote_plus(path).decode('utf-8') if you're talking about a parameter sent via AJAX or in an HTML <form>
(see http://docs.python.org/library/urllib.html#urllib.unquote)
EDIT: Please supply us with the following information if you're still having problems to help us track this problem down:
Which web framework you're using inside of google app engine, e.g. Django, WebOb, CGI etc
How you're getting the URL in your app (please add a short code sample if you can)
repr(url) of when you add http://www.foo.com/地震 as the URL
Try adding this as the URL and post repr(url) so we can make sure the server isn't decoding the characters as either latin-1 or Windows-1252:
http://foo.com/¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
EDIT 2: Seeing as it's an actual URL (and not in the query section i.e. not http://www.foo.com/?param=%E5%9C%B0%E9%9C%87), doing
query = unquote(query.encode('ascii')).decode('utf-8')
is probably safe. It should be unquote and not unquote_plus if you're decoding the actual URL though. I don't know why google passes the URL as a unicode object but I doubt the actual URL passed to the app would be decoded using windows-1252 etc. I was a bit concerned as I thought it was decoding the query incorrectly (i.e. the parameters passed to GET or POST) but it doesn't seem to be doing that by the looks of it.

Server-side languages usually provide a function to decode URLs; in Python this is urllib.unquote(). You can also use the decodeURIComponent() function of JavaScript in your case.

urllib.unquote() doesn't like unicode-string in this case. Pass it byte-string and decode afterwards to get unicode.
This works:
>>> u = u'http://www.foo.com/%E5%9C%B0%E9%9C%87'
>>> print urllib.unquote(u.encode('ascii'))
http://www.foo.com/地震
>>> print urllib.unquote(u.encode('ascii')).decode('utf-8')
http://www.foo.com/地震
This doesn't (see also urllib.unquote decodes percent-escapes with Latin-1):
>>> print urllib.unquote(u)
http://www.foo.com/å °é
Decoding a string that is already unicode doesn't work:
>>> print urllib.unquote(u).decode('utf-8')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File ".../lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 19-24: ordinal not in range(128)

Try it this way:
var uri = "https://rasamarasa.com/service/catering/ගාල්ල-Galle";
var uri_enc = encodeURIComponent(uri);
var uri_dec = decodeURIComponent(uri_enc);
var res = "Encoded URI: " + uri_enc + "<br>" + "Decoded URI: " + uri_dec;
document.getElementById("demo").innerHTML = res;
For more, see this link:
https://www.w3schools.com/jsref/jsref_decodeuricomponent.asp
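One detail worth keeping in mind with the snippet above: encodeURIComponent is meant for a single component, so applying it to a whole URL also escapes the ":" and "/" separators. The question used encodeURI, which leaves them intact. A small comparison, using the URL from the question:

```javascript
const url = "http://www.foo.com/地震";

// encodeURI keeps URL structure characters such as ":" and "/".
const wholeUrl = encodeURI(url);

// encodeURIComponent escapes them too, which suits a single path
// segment or query value but not an entire URL.
const asComponent = encodeURIComponent(url);

console.log(wholeUrl);
console.log(asComponent);

// Either form decodes back to the original text.
console.log(decodeURIComponent(asComponent) === url);
```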

aaaah, the dreaded
'ascii' codec can't encode characters in position... ordinal not in range
error. unavoidable when dealing with languages like Japanese in python...
this is not a url encode/decode issue in this case. your data is most likely already decoded and ready to go.
i would try getting rid of the call to 'decode' and see what happens. if you get garbage but no error it probably means people are sending you data in one of the other lovely japanese specific encodings: eucjp, iso-2022-jp, shift-jis, or perhaps even the elusive iso-2022-jp-ext which is nowadays only rarely spotted in the wild. this latter case seems pretty unlikely though.
edit: id also take a look at this for reference:
What is the difference between encode/decode?
