I'm a beginner in JavaScript. I have a URL containing Unicode characters, like this:
/Solutions/راه-کار-جامع-امنیت-اطلاعات
Now I need to read the path name with the following code:
window.location.pathname.split('/')
and in the output I get this:
"", "Solutions", "%D8%B1%D8%A7%D9%87-%DA%A9%D8%A7%D8%B1-%D8%AC%D8%A7…%D8%AA->%D8%A7%D8%B7%D9%84%D8%A7%D8%B9%D8%A7%D8%AA"
How can I solve this problem?
The Unicode text is URL-encoded. This means the Unicode characters are translated to codes that are safe to use in a URL. You can revert this using the decodeURIComponent or decodeURI function.
The difference between the two is already nicely explained in this question. In your case you will most likely want decodeURIComponent, applied after you perform the split.
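For example, a minimal sketch, assuming the path from your question:
var parts = window.location.pathname.split('/').map(decodeURIComponent);
// parts is now ["", "Solutions", "راه-کار-جامع-امنیت-اطلاعات"]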
Long story short, I'm trying to "fix" my system so that I'm using the same regular expressions on the back end as on the front end (validating both sides, for obvious security reasons). I've got my regex working just fine server-side, but getting it down to the client is a pain. My quickest thought was to simply store it in a data attribute on a tag, grab it, and then validate against it.
Well, me, think again! JS is throwing me for a loop, because apparently RegExp interprets the string differently depending on how it's pulled in. Can anyone shed some light on what is happening here, or on how I might go about resolving this issue?
HTML
<span data-regex="(^\\d{5}$)|(^\\d{5}-\\d{4}$)"></span>
Javascript
new RegExp($0.dataset.regex)
//returns /(^\\d{5}$)|(^\\d{5}-\\d{4}$)/
new RegExp($($0).data('regex'))
//returns /(^\\d{5}$)|(^\\d{5}-\\d{4}$)/
new RegExp("(^\\d{5}$)|(^\\d{5}-\\d{4}$)");
//returns /(^\d{5}$)|(^\d{5}-\d{4}$)/
Note in the first two how, if I pull the value from the data attribute dynamically, the RegExp constructor for some reason doesn't interpret the double backslash: it stays doubled in the resulting pattern. If, however, I copy and paste the value as a string literal and call RegExp on it, the double backslash is correctly interpreted and collapses to a single backslash in the pattern.
I've also attempted simply not escaping the backslash before the d on the server side, but as you might (or might not) have guessed, the opposite happens: when pulled from attributes/dataset, the \ is completely removed, leading the regex to look for the literal character "d" rather than digits. I'm at a loss for understanding what JS is thinking here. Please send help, Internet.
Your data attribute has redundant backslashes. There's no need to escape backslashes in HTML attributes, so you actually end up with a double backslash where you don't want one. When writing regular expressions as string literals in JavaScript you do have to escape backslashes, of course.
So you don't actually have the same string on both sides, simply because escaping works differently.
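A minimal sketch of the fix, assuming the same pattern as in your question: write single backslashes in the HTML, and the string read from dataset will match what your string literal produces after escaping.
HTML
<span data-regex="(^\d{5}$)|(^\d{5}-\d{4}$)"></span>
Javascript
new RegExp($0.dataset.regex)
//returns /(^\d{5}$)|(^\d{5}-\d{4}$)/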
I need to parse URLs out of JavaScript source code using a different language (i.e. PHP or Java). The issue is that I don't know what kind of encoding is used in certain URLs (see below). Is there a standard (i.e. an RFC or specification) for this kind of encoding that I could use to implement it in the language I need? Initially I thought it simply backslash-escapes the forward slashes, but it seems it's more than that, as we also have escaped characters such as \x3d...
var _F_jsUrl = 'https:\/\/www.example.com\/accounts\/static\/_\/js\/k\x3dgaia.gaiafe_glif.en.nnMHsIffkD4.O\/m\x3dglifb,identifier,unknownerror\/am\x3dggIgAAAAAAEKBCEImA2CYiCxoQo\/rt\x3dj\/d\x3d1\/rs\x3dABkqax3Fc8CWFtgWOYXlvHJI_bE3oVSwgA';
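For what it's worth, these appear to be ordinary ECMAScript string-literal escape sequences (defined in the ECMAScript specification's grammar for string literals) rather than a URL encoding: \/ is simply an escaped forward slash, and \xHH is a two-digit hexadecimal character escape. A quick console check:
'\x3d' === '='  //true: 0x3D is the character code for '='
'\/' === '/'    //true: the backslash is simply dropped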
I just don't get it.
My case is that my application sends all the needed GUI text as JSON at page startup from my PHP server. On my PHP server I have all text special characters written in UTF-8. Example: F&uuml;r
So on the client side I have exactly the same value, and it gets displayed nicely everywhere except in input fields. When I do this with JavaScript:
document.getElementById('myInputField').value = "F&Ouml;r";
Then it is written exactly like that, without any transformation into the special character.
Did I understand something wrong in UTF-8 concepts?
Thanks for any hints.
The notation &uuml; has nothing particular to do with UTF-8. The use of character references is a common way of avoiding the need to use UTF-8; they can be used with any encoding, but if you use UTF-8, you don't need them.
The notation &uuml; is an HTML notation, not JavaScript. Whether it gets interpreted by HTML rules when it appears inside your JavaScript code depends on the context (JavaScript inside an HTML document vs. a separate JavaScript file). This problem is best avoided by using either the characters as such or the JavaScript notations for characters.
For example, &uuml; means the same as &#xfc;, i.e. U+00FC, ü (u with diaeresis). The JavaScript notation for this, for use inside string literals, is \u00fc (\u followed by exactly four hexadecimal digits). E.g., the following sets the value to "Für":
document.getElementById('myInputField').value = "F\u00fcr";
You're using what's called HTML entities to encode characters, which is not the same as UTF-8, though of course a UTF-8 string can include HTML entities.
I think the problem is that a value assigned from JavaScript isn't parsed as HTML, so entities aren't decoded when you set the text input's value. I think you have two options:
Decode the HTML entity on the client side. A quite ugly solution is to piggyback on the decoder available in the browser (I'm using jQuery in the example, but you probably get the point; a jQuery-free version is sketched after these options).
inputElement.value = $("<p/>").html("F&Ouml;r").text();
Another option, which I think is nicer, is to not send HTML entities in the server response, but instead use proper UTF-8 encoding for all characters; that should work fine when put into text nodes or tag attributes. This assumes the HTML page itself uses UTF-8 encoding, of course.
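For reference, the jQuery-free version of the decoder trick from option 1 might look like this (a sketch; inputElement stands for your text input):
var decoder = document.createElement('p'); //throwaway element, never added to the page
decoder.innerHTML = "F&Ouml;r";            //the browser's HTML parser decodes the entity
inputElement.value = decoder.textContent;  //assigns the decoded text, "FÖr"
Note that assigning untrusted strings to innerHTML is an XSS risk, so only do this with text you control.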
What is wrong with the following regular expression, which works in many online JavaScript regular expression testers (and RegexBuddy), yet doesn't work in my application?
It is intended to replace URLs with hyperlinks. The JavaScript lives in a separate .js file.
var fixed = text.replace(/\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$]/ig, "<a href='$&' target='blank'>$&</a>");
Chrome, for example, complains that & is not valid (as does IE8). Is there some way to escape the ampersand (or whatever else is wrong), without resorting to the RegEx object?
Those testers let you input the regex in its raw form, but when you use it in source code you have to write it in the form of a string literal or (as is the case here) a regex literal. JavaScript uses forward-slashes for its regex-literal delimiters, so you have to escape any slashes in the regex itself to avoid confusing the interpreter.
Once you escape the slashes it should stop complaining about the ampersand. That was most likely caused by the malformed regex literal.
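For example, a sketch of the corrected line, with only the slashes escaped and the rest of the pattern unchanged:
var fixed = text.replace(/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/ig, "<a href='$&' target='blank'>$&</a>");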
I recognize that regex, having used it myself the other day; you got it from RegexBuddy's Library, didn't you? If you had used RB's "Use" feature to create a JS-compatible regex, it would have escaped the slashes for you.
This works for me in Chrome:
var fixed = text.replace(/(ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?/igm, "<a href='$1' target='blank'>$1</a>");
I need to create an EBCDIC string within my JavaScript and save it into an EBCDIC database; a process on the EBCDIC system then uses the data. I hadn't had any problems until I came across the character '¬', which in EBCDIC has the hex value 5F. All of the usual letters and symbols seem to automagically convert with no problem. Any idea how I can create the EBCDIC value for '¬' within JavaScript so I can store it properly in the EBCDIC db?
Thanks!
If "all of the usual letters and symbols seem to automagically convert", then I very strongly suspect that you do not have to create an EBCDIC string in Javascript. The character codes for Latin letters and digits are completely different in EBCDIC than they are in Unicode, so something in your server code is already converting the strings.
Thus what you need to determine is how that process works, and specifically you need to find out how the translation maps character codes from Unicode source into the EBCDIC equivalents. Once you know that, you'll know what Unicode character to use in your Javascript code.
As a further note: every single time I've been told by an IT organization that their mainframe software requires data to be supplied in EBCDIC, that advice has been dead wrong. The fact that there's some external interface means that something in the pile of iron that makes up the mainframe and its tentacles, something the IT people have forgotten about and probably couldn't find if they needed to, is already mapping "real world" character encodings like Unicode into EBCDIC. How does it work? Well, it may be impossible to figure out.
You might try whether this works: var notSign = "\u00AC";
Edit: also, here's a good reference for HTML entities and Unicode glyphs: http://www.elizabethcastro.com/html/extras/entities.html The HTML/XML syntax uses decimal numbers for the character codes. For JavaScript, you have to convert those to hex, and the notation in JavaScript strings is \u followed by a 4-digit hex constant. (That reference isn't complete, but it's pretty easy to read and it's got lots of useful symbols.)
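For example, going from the decimal character code 172 (the not sign) to the JavaScript notation, a quick sketch:
(172).toString(16);                    //"ac", pad to four digits: "\u00ac"
String.fromCharCode(172) === "\u00AC"; //true: both produce '¬'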