The original problem:
I send a JSON string containing Unicode strings (in many different languages, plus MD5 hashes) from a Java servlet to web clients. I run each string through URLEncoder.encode("my strings", "UTF-8") before building the JSON array.
(I'm almost sure something is wrong with this approach too, and I am probably encoding one time too many.)
Anyway:
In JavaScript I call unescape() to get the result back, but spaces (encoded as +) are not decoded.
So I use .replace(/\+/g,' ') to replace each + with a space before calling unescape().
But:
leading and trailing + signs are omitted
and
consecutive + signs are replaced by a single space.
Please lend me a hand (or mind) :)
Use this
var string = "+Salvis+Sumeet+Jacob,Srlawrjhkjh+";
var str = string.replace(/[+ ]+/g, " ");
console.log(str);
So I guess
leading and trailing + signs are omitted and consecutive + signs are replaced by a single space.
is what you want to achieve, not the outcome you currently get and want to avoid. If that's the case then
.replace(/\++/g,' ').trim()
will replace every one or more + characters with a single space, then remove leading/trailing space.
"++foo+bar++baz+".replace(/\++/g,' ').trim()
// "foo bar baz"
You may need a String.prototype.trim polyfill for IE8 and older.
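The reason unescape() doesn't change + to spaces is simply that doing so is not part of its spec.
Encoding a space as a plus sign is not part of generic URL percent-encoding. It comes from the application/x-www-form-urlencoded format used for HTML form submissions (which is what Java's URLEncoder produces), and it dates back to early versions of HTML.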
Per the spec for unescape() and escape(), the only things that are changed by unescape() are hexadecimal escape sequences in the form %XX and %uXXXX. escape() replaces unicode characters outside a small subset of unrestricted characters with such hexadecimal escape sequences; unescape(), naturally, just reverses the operation.
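A minimal sketch of decoding such values on the client, assuming the strings were produced by URLEncoder.encode(value, "UTF-8") as in the question. The name decodeFormValue is made up for illustration. It swaps unescape() for decodeURIComponent(), because URLEncoder emits UTF-8 byte sequences, which unescape() does not reassemble into characters.

```javascript
// Hypothetical helper (not from the thread): decodes one value produced by
// Java's URLEncoder.encode(value, "UTF-8").
function decodeFormValue(encoded) {
  // A literal plus sign arrives as %2B, so every raw "+" here stands for a space.
  // Each "+" becomes exactly one space, so leading, trailing and consecutive
  // spaces are all preserved.
  return decodeURIComponent(encoded.replace(/\+/g, " "));
}

console.log(JSON.stringify(decodeFormValue("+foo++bar+"))); // " foo  bar "
console.log(decodeFormValue("1%2B1+%3D+2"));                // 1+1 = 2
console.log(decodeFormValue("%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82")); // привет
```

If the spaces still appear to collapse after this, the cause may be in how the result is displayed (HTML rendering collapses runs of whitespace) rather than in the decoding itself.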
$("<h2/>", {"class" : "wi wi"+data.today.code}).text(" " + data.city + data.today.temp.now + "F").appendTo(custom_example);
Hi there, I'm trying to alter the code above to add the degree symbol just before the (F)ahrenheit marker. I've tried entering + html("°") + but it doesn't work. My JS is pretty rough and I was hoping I could get a quick answer here before I spent too long trying and failing. Thanks!
I want the end result to print something like: Encinitas 65°F
Special characters are characters that must be escaped with a backslash (\), such as:
Single quote \'
Double quote \"
Backslash \\
The degree sign ° is not a special character; you can just write it as it is.
Edit: If you want to use the single Unicode character for degrees Fahrenheit (℉, U+2109), just write: '\u2109'.
Escape Special Characters JavaScript
JavaScript uses the \ (backslash) as the escape character for:
\' single quote
\" double quote
\\ backslash
\n new line
\r carriage return
\t tab
\b backspace
\f form feed
\v vertical tab (IE < 9 treats '\v' as 'v' instead of a vertical tab ('\x0B'). If cross-browser compatibility is a concern, use \x0B instead of \v.)
\0 null character (U+0000 NULL) (only if the next character is not a decimal digit; otherwise it's an octal escape sequence)
Note that the \v and \0 escapes are not allowed in JSON strings.
First of all, the degree character does not need to be escaped. So simply entering "°F" should do the job.
However, if you are in doubt about the encoding of your JS source file, you could use a JavaScript escape sequence. JS escape sequences are quite different from HTML escapes. They do not support decimal values at all. So first you have to convert 176 to hex: b0. The correctly escaped equivalent of "°F" is "\xb0F". It will work too, and it is more robust against encoding issues in your platform's source editor.
If you really want to assign HTML code, you need to use the .html() function. But this is mutually exclusive with .text(). So in this case all of your content needs to be HTML rather than plain text; otherwise an HTML injection vulnerability arises. That is, you need to properly escape angle brackets and some other symbols in data.city, and maybe in data.today.temp.now as well.
JS itself has no built-in function to escape HTML. But jQuery provides a trick: $('<div/>').text(data.city).html() will return appropriately escaped HTML. See "HTML-encoding lost when attribute read from input field" for more details.
I would recommend not to use .html() unless you really need it, e.g. if you want to apply styles or formatting to parts of the text only.
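As a rough sketch of the two routes described above (plain text with an escape sequence, and manual escaping for the .html() case), the following uses two made-up helper names, formatTemperature and escapeHtml, which are assumptions for illustration and not part of jQuery or the original code:

```javascript
// Hypothetical helpers for illustration; neither name comes from the thread.

// Builds the label as plain text. "\u00B0" is the degree sign (same as "\xb0").
function formatTemperature(city, degrees) {
  return city + " " + degrees + "\u00B0F";
}

// Minimal HTML escaping for the case where the label must go through .html().
// "&" is replaced first so the other entities are not escaped twice.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(formatTemperature("Encinitas", 65)); // Encinitas 65°F
console.log(escapeHtml("<b>Encinitas</b>"));     // &lt;b&gt;Encinitas&lt;/b&gt;
```

With the original snippet, the result of formatTemperature(data.city, data.today.temp.now) could be passed straight to .text(), which needs no HTML escaping at all.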
I have a pretty simple question, but a few simple Google and Stack Exchange searches were not able to answer it, so I guess I'm missing something here.
Here are my simplified parameters:
I'm using Javascript.
I have a text that needs to get URL-encoded, and the text has more than one line.
My question is: What is the newline character before the text gets encoded? (I know that after the encoding the newline will be encoded into %0A)
I guess asking "What char is decoded when decoding %0A" will be the same.
Those codes consist of a percent sign, followed by a two character hexadecimal number representing a byte value.
So in this case, the byte value is 0A, representing the ASCII newline character. This is commonly written as \n inside strings in JavaScript (and others, like PHP).
But I think your question suggests you want to do some search and replace for this character. I would not do that, since there can be other characters too that need encoding. Instead, use the function encodeURIComponent, which can encode the entire string for you. There is encodeURI as well, but in your case, I think the first is more appropriate.
This example shows how special characters (newline, space, and others) are encoded to a URL-friendly format. Note that the accented character é is encoded as the two bytes of its UTF-8 representation (%C3%A9).
document.write(encodeURIComponent("Normal text\nEéy, check the specials: /, + and \t!"));
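To make the newline mapping concrete, here is a small round-trip sketch. It uses only the standard encodeURIComponent and decodeURIComponent functions; the variable names are arbitrary.

```javascript
// Two lines of text separated by "\n" (line feed, byte value 0x0A).
var multiline = "line one\nline two";

// Encoding turns the space into %20 and the newline into %0A.
var encoded = encodeURIComponent(multiline);
console.log(encoded); // line%20one%0Aline%20two

// Decoding %0A gives back the single character "\n".
var newline = decodeURIComponent("%0A");
console.log(newline === "\n");        // true
console.log(newline.charCodeAt(0));   // 10

// The round trip restores the original text unchanged.
console.log(decodeURIComponent(encoded) === multiline); // true
```

Note that text coming from some sources uses "\r\n" as its line separator, in which case the encoded form contains %0D%0A rather than just %0A.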
I need to get rid of unwanted symbols, such as the multiple spaces, the leading and trailing whitespaces, as well as escape single and double quotes and other characters that may pose problems in my Neo4J Cypher query.
I currently use this (string.js Node module and jsesc Node module)
result = S(result).trim().collapseWhitespace().s;
result = jsesc(result, { 'quotes': 'double' });
They work fine, however,
1) I want to find a better, easier way to do it (preferably without those libraries) ;
2) When I use other encodings (e.g. Russian), jsesc seems to translate it into some other encoding than UTF-8 that the other parts of my script don't understand.
So I wanted to ask you if you could recommend me a RegExp that would do the job above without me having to use those modules.
Thank you!
I have a series of regex replace calls that do what you seem to be looking for, or at least the issues you mentioned. I put together a test string with several items you mentioned.
var testString = ' I start with \"unwanted items and" end with a space". Also I have Quotes ';
var cleanedString = testString.replace(/\s\s+/g, ' ').replace(/^\s|\s$/g, '').replace(/([^\\])(['"])/g, "$1\\$2");
console.log(cleanedString);
This will escape quotes (single or double) that have not yet been escaped, though you would have to worry about the case where the item is preceded by an escaped escape symbol. For example \\' would not be turned into \\\' as it should be. If you want to escape more characters you just need to add them to the final .replace regex. Let me know if there are specific examples you are looking for.
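One possible way around the escaped-backslash case mentioned above is to let the regex consume existing escape pairs (a backslash plus the character after it) as a unit, so only quotes that are not already escaped get a backslash added. This is a sketch under that assumption, with a made-up function name; it has not been checked against Cypher's full escaping rules.

```javascript
// Hypothetical helper: collapses whitespace, trims, and escapes quotes
// that are not already escaped.
function cleanForQuery(input) {
  return input
    .replace(/\s+/g, " ")   // collapse any run of whitespace to one space
    .trim()                 // drop leading and trailing whitespace
    // Match either an existing escape pair (kept as is) or a bare quote
    // (prefixed with a backslash). Consuming "\\" as a pair means a quote
    // after an escaped backslash is correctly seen as unescaped.
    .replace(/\\[\s\S]|['"]/g, function (match) {
      return match.charAt(0) === "\\" ? match : "\\" + match;
    });
}

console.log(cleanForQuery("  I  have 'quotes'  ")); // I have \'quotes\'
console.log(cleanForQuery('already \\" escaped'));  // already \" escaped
console.log(cleanForQuery("x\\\\'y"));              // x\\\'y
```

Because this works on JavaScript strings directly, non-ASCII text such as Russian passes through unchanged.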
I created a html textarea with a capability to add "[" and "]" at the beginning and end of whatever text has been entered within that.
My problem is that when I enter some multiline data into the textarea, the regex is handled differently in Firefox and IE.
Input:
Iam
learning
regex
Expected Output: (I get this in FF )
[Iam]
[learning]
[regex]
Output in IE:
[Iam
][]
[learning
][]
[regex]
The Regex code is here:
(textareaIDelement.value).replace(/(^)(.*)(\n{0,})($)/gm, "[" + "$2" +"]");
I added the (\n{0,}) in the regex to match newlines.. but it doesn't have any effect..
Thanks
In IE, the line separator in a textarea's value property is \r\n. In all other major browsers it's \n. A simple solution would be to normalize line separators into \n first. I've also simplified the regex:
textareaIDelement.value.replace(/\r\n/g, "\n").replace(/^(.*)\n*$/gm, "[$1]");
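An alternative that sidesteps the multiline-regex subtleties is to split on any of the common line separators and wrap each line individually. This is only a sketch; bracketLines is a made-up name, and it takes the textarea value as a plain string argument.

```javascript
// Hypothetical helper: wraps every line of the text in square brackets.
function bracketLines(text) {
  return text
    .split(/\r\n|\r|\n/)                               // handles \r\n, \r and \n
    .map(function (line) { return "[" + line + "]"; }) // wrap each line
    .join("\n");                                       // normalize to \n
}

console.log(bracketLines("Iam\r\nlearning\r\nregex"));
// [Iam]
// [learning]
// [regex]
```

Unlike a pattern with \n*, this keeps empty lines as "[]" on their own line, which may or may not be what you want.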
My guess is that Firefox is using a single 0x0A (\n) as the line separator, whereas IE is using the Windows separator 0x0D 0x0A (\r\n).
Depending on the exact semantics of the regex engine, it's probably treating each of the two Windows characters as a line separator in its own right, so it detects the end of the line followed by a zero-character line.
(This isn't an actual answer per se, as I'm not massively familiar with exactly how JS processes regex metacharacters, but hopefully it will point you in the right direction.)
Before POST-ing a form with text fields, I'm able to convert curly quotes from Word into normal quotation marks with the following JavaScript snippet:
s = s.replace( /\u201c/g, '"' );
s = s.replace( /\u201d/g, '"' );
But I've recently encountered double opening/closing quotes, as shown in brackets in the Question Title. Does anyone know the Unicode numbers for these?
U+201C and U+201D are the Unicode characters “ and ”! You should already be catching them.
If you want to also pick up the single-quote characters ‘ and ’ and convert them to ', that would be U+2018 and U+2019.
However, this kind of replacement is a Unicode Smell. What are you trying to do here and why? ‚‘’„“”«»–— etc are perfectly valid characters and if your app can't handle them it won't be able to handle other non-ASCII characters either, which would generally be considered a Bad Thing. If at all possible, it is better to fix whatever problem these characters are currently triggering, rather than sweep it under the rug with a replacement.
You could easily find this out for yourself, in JavaScript, by using charCodeAt. You could even do it in the Firebug console:
>>> "”".charCodeAt(0).toString(16)
201d
The toString call at the end even converts it to hexadecimal for you. Remember to pad it with zeros if it's shorter than 4 digits.
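The lookup and the zero-padding step can be wrapped in a small function. This is a sketch with a made-up name, and it only looks at the first UTF-16 code unit, which is enough for the quote characters discussed here.

```javascript
// Hypothetical helper: returns the "U+XXXX" label for the first character.
function toCodePointLabel(ch) {
  var hex = ch.charCodeAt(0).toString(16).toUpperCase();
  // Pad with leading zeros up to four hex digits.
  while (hex.length < 4) {
    hex = "0" + hex;
  }
  return "U+" + hex;
}

console.log(toCodePointLabel("\u201D")); // U+201D
console.log(toCodePointLabel("\u2018")); // U+2018
console.log(toCodePointLabel("\xB0"));   // U+00B0
```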
Your code looks correct for unicode:
for start quote : U+201C
for end quote : U+201D
Source: http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
HTML Escaped entities:
&ldquo; &rdquo; (numerically: &#8220; &#8221;)
Converted with this tool: http://u-n-i.co/de/
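If the replacement is still wanted, the code points mentioned in this thread (U+201C, U+201D, U+2018, U+2019) can be handled with two character classes instead of one replace call per character. A minimal sketch, with a made-up function name:

```javascript
// Hypothetical helper: maps curly quotes to their plain ASCII counterparts.
function straightenQuotes(s) {
  return s
    .replace(/[\u201C\u201D]/g, '"')  // left and right double quotation marks
    .replace(/[\u2018\u2019]/g, "'"); // left and right single quotation marks
}

console.log(straightenQuotes("\u201CHi\u201D, it\u2019s me")); // "Hi", it's me
```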