I am trying to HTML-Encode a string with jQuery, but I can't seem to find the right encoding format.
What I got is a String like Ütest.docx. The server doesn't handle special characters very well so that I get a FileNotFoundException from Java (I have no way of editing the server itself).
Now, I tried around and found out that the URL works when I replace Ü with %DC. Now I tought this is called HTML Encoding, googled a bit but I always get results saying something about URL-Encoding. I checked that, and it seems like this isn't the right encoding, because Ü is beeing encoded to %C3%9C, which doesn't work for the server.
Now, which encoding is it, that would encode Ü to %DC? And is there a function in javascript or jQuery that would to the encoding for me?
Thanks for any help, I've been trying to find out which encoding I need for an hour now, but no luck.
They are both URL encoding, just that the UTF-8 one is a newer standard.
If you are using Tomcat, you can use just encodeURIComponent() which uses UTF-8
and works when you set the Tomcat connector URIEncoding attribute to <connector URIEncoding="UTF-8" ...>
If that's not ok, you can use this:
function uriEncodeLegacy( str ) {
return escape(str.replace( /[\u0100-\uFFFF]/g, ""));
}
uriEncodeLegacy("Ü") //%DC
However UTF-8 is recommended, otherwise you cannot even support the € character for example.
Related
I need to parse URLs from javascript source code using a different langauge(i.e. PHP and Java). The issue is that I don't know what kind of encoding is used on certain URLs(See below). Is there a standard(i.e. RFC, specifications) for this kind of encoding that I can use to implement it in the language I need? Initially I thought it simply backslashes the forward slashes but it seems is more than that as we have escaped characters such \x3d...
var _F_jsUrl = 'https:\/\/www.example.com\/accounts\/static\/_\/js\/k\x3dgaia.gaiafe_glif.en.nnMHsIffkD4.O\/m\x3dglifb,identifier,unknownerror\/am\x3dggIgAAAAAAEKBCEImA2CYiCxoQo\/rt\x3dj\/d\x3d1\/rs\x3dABkqax3Fc8CWFtgWOYXlvHJI_bE3oVSwgA';
I have an url like: file:///C:/Users/index.html?Scale:%20Service-Qualität
When I use window.location.search to get the parameter in the url, in this case the parameter should be Scale: Service-Qualität but what I actually received was Scale:%20Service-Qualit%C3%A4t, I dont know why my character ä changed to %C3%A4 and when I tested in the console it displayed as Scale: Service-Qualität
Can anyone help me to fix this problem?
I found the solution for my problem. What I need to do is decode again my url using: decodeURIComponent(url); then I will get again exact url string.
You are seeing two issues here.
The ä being converted into %C3%A4 is called URL or percent encoding.
It's because URLs can't, technically, contain Unicode characters.
Browsers and servers work around this by converting non-ASCII characters in URLs to their percent encoded equivalents.
It's generally nothing to worry about.
In your case however, there seems to be an actual problem as well. The weird output in the console could be because your web page uses a single-byte encoding (like ISO-8859-1) instead of UTF-8.
Switching the web page to UTF-8 might solve the problem, using this Meta tag:
<meta charset="utf-8"/>
and, of course, saving the HTML file as UTF-8 in your editor.
I just don't get it.
My case is, that my application is sending all the needed GUI text by JSON at page startup from my PHP server. On my PHP server I have all text special characters written in UTF-8. Example: Für
So on the client side I have exactly the same value, and it gets displayed nicely everywhere except on input fields. When I do this with JavaScript:
document.getElementById('myInputField').value = "FÖr";
Then it is written exactly like that without any transformation into the special character.
Did I understand something wrong in UTF-8 concepts?
Thanks for any hints.
The notation ü has nothing particular to do with UTF-8. The use of character references is a common way of avoiding the need to use UTF-8; they can be used with any encoding, but if you use UTF-8, you don’t need them.
The notation ü is an HTML notation, not JavaScript. Whether it gets interpreted by HTML rules when it appears inside your JavaScript code depends on the context (like JavaScript inside an HTML document vs. separate JavaScript file). This problem is best avoided by using either characters as such or by using JavaScript notations for characters.
For example, ü means the same as ü, i.e. U+00FC, ü (u with diaeresis). The JavaScript notation, for use inside string literals, for this is \u00fc (\u followed by exactly four hexadecimal digits). E.g., the following sets the value to “Für”:
document.getElementById('myInputField').value = "F\u00fcr";
Your using whats called HTML entities to encode characters which it not the same as UTF-8, but of course a UTF-8 string can include HTML entities.
I think the problem is that tag attributes can't include HTML entities so you have to use some other encoding when assigning the text input value attribute. I think you have two options:
Decode the HTML entity on the client side. A quite ugly solution to piggyback on the decoder available in the browser (im using jQuery in the example, but you probably get the point).
inputElement.value = $("<p/>").html("FÖr").text();
Another option, which is think is nicer, is to not send HTML entities in the server response but instead use proper UTF-8 encoding for all characters which should work fine when put into text nodes or tag attributes. This assumes the HTML page uses UTF-8 encoding of course.
The SO post below is comprehensive, but all three methods described fail to encode for periods.
Post: Encode URL in JavaScript?
For instance, if I run the three methods (i.e., escape, encodeURI, encodeURIComponent), none of them encode periods.
So "food.store" comes out as "food.store," which breaks the URL. It breaks the URL because the Rails app cannot recognize the URL as valid and displays the 404 error page. Perhaps it's a configuration mistake in the Rails routes file?
What's the best way to encode periods with Javascript for URLs?
I know this is an old thread, but I didn't see anywhere here any examples of URLs that were causing the original problem. I encountered a similar problem myself a couple of days ago with a Java application. In my case, the string with the period was at the end of the path element of the URL eg.
http://myserver.com/app/servlet/test.string
In this case, the Spring library I'm using was only passing me the 'test' part of that string to the relevant annotated method parameter of my controller class, presumably because it was treating the '.string' as a file extension and stripping it away. Perhaps this is the same underlying issue with the original problem above?
Anyway, I was able to workaround this simply by adding a trailing slash to the URL. Just throwing this out there in case it is useful to anybody else.
John
Periods shouldn't break the url, but I don't know how you are using the period, so I can't really say. None of the functions I know of encode the '.' for a url, meaning you will have to use your own function to encode the '.' .
You could base64 encode the data, but I don't believe there is a native way to do that in js. You could also replace all periods with their ASCII equivalent (%2E) on both the client and server side.
Basically, it's not generally necessary to encode '.', so if you need to do it, you'll need to come up with your own solution. You may want to also do further testing to be sure the '.' will actually break the url.
hth
I had this same problem where my .htaccess was breaking input values with .
Since I did not want to change what the .htaccess was doing I used this to fix it:
var val="foo.bar";
var safevalue=encodeURIComponent(val).replace(/\./g, '%2E');
this does all the standard encoding then replaces . with there ascii equivalent %2E. PHP automatically converts back to . in the $_REQUEST value but the .htaccess doesn't see it as a period so things are all good.
Periods do not have to be encoded in URLs. Here is the RFC to look at.
If a period is "breaking" something, it may be that your server is making its own interpretation of the URL, which is a fine thing to do of course but it means that you have to come up with some encoding scheme of your own when your own metacharacters need escaping.
I had the same question and maybe my solution can help someone else in the future.
In my case the url was generated using javascript. Periods are used to separate values in the url (sling selectors), so the selectors themselves weren't allowed to have periods.
My solution was to replace all periods with the html entity as is Figure 1:
Figure 1: Solution
var urlPart = 'foo.bar';
var safeUrlPart = encodeURIComponent(urlPart.replace(/\./g, '.'));
console.log(safeUrlPart); // foo%26%2346%3Bbar
console.log(decodeURIComponent(safeUrlPart)); // foo.bar
I had problems with .s in rest api urls. It is the fact that they are interpreted as extensions which in it's own way makes sense. Escaping doesn't help because they are unescaped before the call (as already noted). Adding a trailing / didn't help either. I got around this by passing the value as a named argument instead. e.g. api/Id/Text.string to api/Id?arg=Text.string. You'll need to modify the routing on the controller but the handler itself can stay the same.
If its possible using a .htaccess file would make it really cool and easy. Just add a \ before the period. Something like:\.
It is a rails problem, see Rails REST routing: dots in the resource item ID for an explanation (and Rails routing guide, Sec. 3.2)
You shouldn't be using encodeURI() or encodeURIComponent() anyway.
console.log(encodeURIComponent('%^&*'));
Input: %^&*. Output: %25%5E%26*. So, to be clear, this doesn't convert *. Hopefully you know this before you run rm * after "cleansing" that input server-side!
Luckily, MDN gave us a work-around to fix this glaring problem, fixedEncodeURI() and fixedEncodeURIComponent(), which is based on this regex: [!'()*]. (Source: MDN Web Docs: encodeURIComponent().) Just rewrite it to add in a period and you'll be fine:
function fixedEncodeURIComponent(str) {
return encodeURIComponent(str).replace(/[\.!'()*]/g, function(c) {
return '%' + c.charCodeAt(0).toString(16);
});
}
console.log(fixedEncodeURIComponent('hello.'));
I have a Javascript bookmarklet that, when clicked on, redirects the user to a new webpage and supplies the URL of the old webpage as a parameter in the query string.
I'm running into a problem when the original webpage has a double hyphen in the URL (ex. page--1--of--3.html). Stupid, I know - I can't control the original page The javascript escape function I'm using does not escape the hyphen, and IIS 6 gives a file not found error if asked to serve resource.aspx?original=page--1--of--3.html
Is there an alternative javascript escape function I can use? What is the best way to solve this problem? Does anybody know why IIS chokes on resource.aspx?original=page--1 and not page-1?
"escape" and "unescape" are deprecated precisely because it doesn't encode all the relevant characters. DO NOT USE ESCAPE OR UNESCAPE. use "encodeURIComponent" and "decodeURIComponent" instead. Supported in all but the oldest most decrepit browsers. It's really a huge shame this knowledge isn't much more common.
(see also encodeURI and decodeURI)
edit: err just tested, but this doesn't really cover the double hyphens still. Sorry.
Can you expand the escape function with some custom logic to encode the hypen's manually?
resource.aspx?original=page%2d%2d1%2d%2dof%2d%2d3.html
Something like this:
function customEscape(url) {
url = escape(url);
url = url.replace(/-/g, '%2d');
return url;
}
location.href = customEscape("resource.axd?original=test--page.html");
Update, for a bookmarklet:
Link
You're doing something else wrong. -- is legal in URLs and filenames. Maybe the file really isn't found?
-- is used to comment out text in a few scripting languages. SQL Server uses it to add comments. Do you use any database logic to store those filenames? Or create any queries where this name is part of the query string instead of using query parameters?