Before POST-ing a form with text fields, I'm able to convert curly quotes from Word into normal quotation marks with the following JavaScript snippet:
s = s.replace( /\u201c/g, '"' );
s = s.replace( /\u201d/g, '"' );
But I've recently encountered double opening/closing quotes, as shown in brackets in the question title. Does anyone know the Unicode code points for these?
U+201C and U+201D are the Unicode characters “ and ”! You should already be catching them.
If you want to also pick up the single-quote characters ‘ and ’ and convert them to ', that would be U+2018 and U+2019.
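A minimal sketch of the original snippet extended to the single quotes as well, using one character class per replacement. The helper name `straightenQuotes` is hypothetical, and it only touches the four code points named above (U+201C, U+201D, U+2018, U+2019).

```javascript
// Hypothetical helper: maps curly quotes to their ASCII equivalents.
// U+201C / U+201D -> "     U+2018 / U+2019 -> '
function straightenQuotes(s) {
  return s
    .replace(/[\u201C\u201D]/g, '"')
    .replace(/[\u2018\u2019]/g, "'");
}

console.log(straightenQuotes("\u201CIt\u2019s fine\u201D")); // "It's fine"
```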
However, this kind of replacement is a Unicode Smell. What are you trying to do here, and why? ‚‘’„“”«»–— etc. are perfectly valid characters, and if your app can't handle them it won't be able to handle other non-ASCII characters either, which would generally be considered a Bad Thing. If at all possible, it is better to fix whatever problem these characters are currently triggering rather than sweep it under the rug with a replacement.
You could easily find this out for yourself, in JavaScript, by using charCodeAt. You could even do it in the Firebug console:
>>> "”".charCodeAt(0).toString(16)
201d
The toString(16) call at the end even converts it to hexadecimal for you. Remember to pad it with zeros if it's shorter than 4 digits.
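A small sketch that does the conversion and the zero padding in one step. The function name `codeUnitLabel` is hypothetical, and `padStart` assumes an ES2017+ environment. Note that charCodeAt returns a UTF-16 code unit, so for characters above U+FFFF it reports only the first half of the surrogate pair (codePointAt would be needed there).

```javascript
// Hypothetical helper: format the first UTF-16 code unit of a string as
// U+XXXX, zero-padded to 4 hex digits.
function codeUnitLabel(ch) {
  const hex = ch.charCodeAt(0).toString(16).toUpperCase();
  return "U+" + hex.padStart(4, "0");
}

console.log(codeUnitLabel("\u201D")); // "U+201D"
console.log(codeUnitLabel("A"));      // "U+0041"
```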
Your code looks correct for Unicode:
for start quote : U+201C
for end quote : U+201D
Source: http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
HTML Escaped entities:
&#8220; &#8221; (named entities: &ldquo; &rdquo;)
Converted with this tool: http://u-n-i.co/de/
Need some help with a nested double quotes regex
I have the following string:
"abcd-1234\":" : value\":1234\":
I want to capture the entire string and separate it into a key and a value, but I am not able to come up with a proper regex.
Basically, I have the following string format -->
"key" : "value"
and I want to find a proper regex for the string format.
I am able to capture the key and the value individually with the following regex -->
((^[\"]).*\2(?![^:]))
But I can't get a proper regex for the entire string.
Can someone please help me with the regex?
Imagine the following string: "\\". That contains \" but is still a complete, valid string. You can't just ignore \"; you have to count backslashes.
(?:[^"\\]|\\.) will cover any "in-the-string" character: either a backslash followed by anything (. matches anything except a line terminator), or any other character, as long as it is neither a backslash nor a quote. A string is a quote, followed by any number of those, followed by a quote, which gives the regexp "(?:[^"\\]|\\.)*".
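Here is a sketch that applies that string pattern twice to split a `"key" : "value"` line. The helper name `splitKeyValue` and the anchoring to a whole line are assumptions; it returns the raw quoted text, escapes included. The third call shows why backslashes must be counted: the \" inside the key does not close it.

```javascript
// One quoted string: an opening quote, then any number of
// (backslash + any char | any char that is neither a backslash nor a quote),
// then a closing quote. `.` here does not match line terminators.
const KEY_VALUE = /^\s*("(?:[^"\\]|\\.)*")\s*:\s*("(?:[^"\\]|\\.)*")\s*$/;

// Hypothetical helper: returns the raw quoted key and value, or null.
function splitKeyValue(text) {
  const m = KEY_VALUE.exec(text);
  return m ? { key: m[1], value: m[2] } : null;
}

console.log(splitKeyValue(String.raw`"a\"b" : "c"`)); // key is "a\"b", value is "c"
console.log(splitKeyValue(String.raw`"\\" : "v"`));   // key "\\" is a complete string
console.log(splitKeyValue(String.raw`"a\" : "b"`));   // null: \" does not close the key
```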
However, regexps probably aren't the right tool for the job. This looks like a part of a JSON formatted input; there are JSON parsers that do a much better job on this, covering far more cases.
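If the input really is JSON, the built-in JSON.parse already handles every escape. This sketch assumes the asker's pair can be wrapped into a complete JSON object. The opening quote on the value and the surrounding braces are my additions, because the original value has no opening quote and would not parse as-is.

```javascript
// JSON.parse understands all JSON escapes (\" \\ \n \uXXXX ...),
// so nothing has to be counted by hand.
const text = String.raw`{"abcd-1234\":" : "value\":1234\":"}`;
const obj = JSON.parse(text);

for (const [key, value] of Object.entries(obj)) {
  console.log(key, "=>", value); // abcd-1234": => value":1234":
}
```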
$("<h2/>", {"class" : "wi wi"+data.today.code}).text(" " + data.city + data.today.temp.now + "F").appendTo(custom_example);
Hi there, I'm trying to alter the code above to add the degree symbol just before the (F)ahrenheit marker. I've tried entering + html("°") + but it doesn't work. My JS is pretty rough, and I was hoping to get a quick answer here before spending too long trying and failing. Thanks!
I want the end result to print something like: Encinitas 65°F
Special characters are characters that must be escaped with a backslash (\) inside a string literal, like:
Single quote \'
Double quote \"
Backslash \\
The degree sign ° is not a special character; you can just write it as it is.
Edit: If you want the single Unicode character ℉ (DEGREE FAHRENHEIT), just write '\u2109'. The degree sign on its own is '\u00B0'.
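Here is a sketch of the label from the question built in plain JavaScript, so it runs without jQuery. The `data` object is a hypothetical stand-in for the weather API response, and the space between city and temperature is an addition to match the desired "Encinitas 65°F". The resulting string is what you would pass to .text().

```javascript
// "\u00B0" is DEGREE SIGN (°); "\u2109" is the single character ℉.
const DEGREE_F = "\u00B0F";  // two characters: ° followed by F
const FAHRENHEIT = "\u2109"; // one character

// Hypothetical stand-in for the `data` object from the question.
const data = { city: "Encinitas", today: { temp: { now: 65 } } };

// Plain-text label, suitable for jQuery's .text().
const label = data.city + " " + data.today.temp.now + DEGREE_F;
console.log(label); // "Encinitas 65°F"
```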
Escape Special Characters JavaScript
JavaScript uses the \ (backslash) as an escape character for:
\' single quote
\" double quote
\\ backslash
\n new line
\r carriage return
\t tab
\b backspace
\f form feed
\v vertical tab (IE < 9 treats '\v' as 'v' instead of a vertical tab ('\x0B'); if cross-browser compatibility is a concern, use \x0B instead of \v)
\0 null character (U+0000 NULL), only if the next character is not a decimal digit; otherwise it's an octal escape sequence
Note that the \v and \0 escapes are not allowed in JSON strings.
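A quick sketch that checks a few of these escapes and the JSON restriction. The helper `parsesAsJson` is hypothetical; it relies only on JSON.parse throwing a SyntaxError for escapes outside JSON's set (\" \\ \/ \b \f \n \r \t and \uXXXX).

```javascript
console.log("\x0B" === "\v");    // true: both are U+000B
console.log("\0".charCodeAt(0)); // 0
console.log("\\".length);        // 1: a single backslash

// Hypothetical helper: does this text parse as JSON?
function parsesAsJson(literal) {
  try {
    JSON.parse(literal);
    return true;
  } catch (e) {
    return false; // SyntaxError for invalid escapes
  }
}

console.log(parsesAsJson('"tab:\\t"'));  // true
console.log(parsesAsJson('"vtab:\\v"')); // false
console.log(parsesAsJson('"nul:\\0"'));  // false
```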
First of all, the degree character does not need to be escaped, so simply entering "°F" should do the job.
However, if you are unsure about the encoding (code page) of your JS source, you could use a JavaScript escape sequence. JS escape sequences are quite different from HTML escapes: they do not support decimal values at all. So first you have to convert the decimal code 176 to hex: b0. The correctly escaped equivalent of "°F" is then "\xb0F". It works too, and it is more robust against code-page issues in your platform's source editor.
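The conversion can be checked directly. This sketch shows that the hex escape, the \u escape and String.fromCharCode (which takes a plain number, so decimal 176 works there) all produce the same string.

```javascript
// 176 is the decimal code of the degree sign; string escapes want hex.
const hex = (176).toString(16);                     // "b0"
const viaHexEscape = "\xb0F";                       // \xHH: exactly two hex digits
const viaUnicodeEscape = "\u00b0F";                 // \uHHHH: exactly four hex digits
const viaCharCode = String.fromCharCode(176) + "F"; // a number, so decimal is fine

console.log(hex, viaHexEscape === viaUnicodeEscape, viaHexEscape === viaCharCode);
// b0 true true
```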
If you really want to assign HTML code, you need to use the .html() function. But it is mutually exclusive with .text(), so in this case all of your content needs to be HTML rather than plain text. Otherwise an HTML injection vulnerability arises: you need to properly escape angle brackets and some other symbols in data.city, and maybe in data.today.temp.now as well.
JS itself has no built-in function to escape HTML, but jQuery provides a trick: $('<div/>').text(data.city).html() will return appropriately escaped HTML. See the question "HTML-encoding lost when attribute read from input field" for more details.
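If you would rather not create a DOM element, a common plain-JS alternative is to replace the significant characters yourself. This is a sketch, not the jQuery trick: the name `escapeHtml` is hypothetical. It also escapes both quote characters, so the result is safe inside attribute values too.

```javascript
// Hypothetical helper: escape the five characters that matter in HTML.
// `&` must be replaced first so the entities added later are not
// escaped a second time.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const city = "<b>Encinitas</b> & co";
console.log(escapeHtml(city) + " 65&deg;F"); // safe to pass to .html()
```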
I would recommend not using .html() unless you really need it, e.g. if you want to apply styles or formatting to only parts of the text.
I'm trying to write a regex in JavaScript to identify string representations of arbitrary JavaScript functions found in JSON, i.e. something like
{
"key": "function() { return 'I am a function'; }"
}
It's easy enough to identify the start, but I can't figure out how to identify the ending double quotes since the function might also contain escaped double quotes. My best try so far is
/"\s*function\(.*\)[^"]*/g
which works nicely if there are no double quotes in the function string. A JSON value ends with a double quote followed by a comma or a closing brace. Is there some way to retrieve all characters (including newlines?) until a negated pattern such as
not "\s*, and not "\s*}
... or do I need to take a completely different approach without regex?
Here is the current test data I'm working with:
http://regexr.com/39pvi
It seems like you want something like this:
"\s*function\(.*\)(?:\\.|[^\\"])*
It also matches the escaped \" double quotes in between.
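A variation on that pattern, sketched under the assumption that the whole string value should be captured, closing quote included. It drops the greedy `.*\)`, which could run past the closing quote if a later `)` appears on the same line. Because each match is still a JSON string literal, JSON.parse can unescape it. The names `FUNCTION_STRING` and `findFunctionStrings` are hypothetical.

```javascript
// After `function(`, consume escape pairs (\\.) or any character that is
// neither a backslash nor a quote, then require the closing quote.
const FUNCTION_STRING = /"\s*function\s*\((?:\\.|[^\\"])*"/g;

function findFunctionStrings(jsonText) {
  const results = [];
  for (const m of jsonText.matchAll(FUNCTION_STRING)) {
    // m[0] is a JSON string literal, so JSON.parse unescapes \" etc.
    results.push(JSON.parse(m[0]));
  }
  return results;
}

const sample = String.raw`{ "key": "function() { return \"I am a function\"; }", "n": 1 }`;
console.log(findFunctionStrings(sample));
// [ 'function() { return "I am a function"; }' ]
```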
I have been trying to write a regexp that matches any text between a caret followed by a less-than sign and a greater-than sign followed by a caret.
So it would look like: ^< THE TEXT I WANT SELECTED >^
I have tried something like this, but it isn't working: ^<(.*?)>^
I'm assuming this is possible, right? I think the reason I have been having such a tough time is that the caret has a special meaning in a regex. Thanks for any help I get!
Update
Just so everyone knows, the following from am not i am worked:
/\^<(.*?)>\^/
But it turned out that I was getting HTML entities, since I was reading my string through the .innerHTML property. In other words,
&gt; ... >
&lt; ... <
To solve this, my regexp actually looks like this:
\^&lt;(.*?)((.|\n)*)&gt;\^
This allows the text in between to contain any character, including newlines. Thanks!
You need to escape the ^ symbol since it has special meaning in a JavaScript regex.
/\^<(.*?)>\^/
In a JavaScript regex, the ^ means beginning of the string, unless the m modifier was used, in which case it means beginning of the line.
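Here is a sketch that pulls out every marked section, including ones spanning newlines. [\s\S]*? is a lazy "any character" that, unlike (.|\n)*, stops at the first >^ instead of running on to the last one. The name `extractMarked` is hypothetical. If the text comes from the DOM, reading textContent instead of innerHTML should avoid the &lt;/&gt; entities mentioned in the update.

```javascript
// Escaped carets match literally; [\s\S]*? matches any character
// (newlines included) as few times as possible.
const MARKED = /\^<([\s\S]*?)>\^/g;

// Hypothetical helper: return the captured text of every ^<...>^ section.
function extractMarked(text) {
  return Array.from(text.matchAll(MARKED), m => m[1]);
}

console.log(extractMarked("a ^<first>^ b ^<second\nline>^ c"));
// [ 'first', 'second\nline' ]
```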
This should work:
\^<(.*?)>\^
In a regex, if you want to match a character that has a special meaning (caret, brackets, pipe, ...) literally, you have to escape it with a backslash. For example, \b(\w+\s)*\w+\. will match a sequence of words terminated by a dot; the backslash makes the final dot match only a literal period.
Careful!
If you have to pass the regex pattern as a string, i.e. there's no regex literal like in JavaScript or Perl, you may have to use a double backslash: the programming language turns it into a single one, which is then processed by the regex engine.
Same regex in multiple languages:
Python:
import re
myRegex=re.compile(r"\^<(.*?)>\^") # The r before the string prevents backslash escaping
PHP:
$result=preg_match("/\\^<(.*?)>\\^/",$subject); // Notice the double backslashes here?
JavaScript:
var myRegex=/\^<(.*?)>\^/,
subject="^<blah example>^";
subject.match(myRegex);
If you tell us what programming language you're writing in, we'll be able to give you some finished code to work with.
Edit: Whoops, didn't even notice this was tagged as JavaScript. With a regex literal, you don't have to worry about double backslashes at all.
Edit 2: \b represents a word boundary. Though I agree yours is what I would have used myself.
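JavaScript does need the doubled backslashes when the pattern comes from a string passed to the RegExp constructor. This sketch shows that, plus a helper that escapes every regex metacharacter in arbitrary text. `escapeRegExp` is a hypothetical name; its character class follows the function MDN's regular expressions guide has used for this purpose.

```javascript
// With the RegExp constructor the pattern is a string literal first,
// so each regex backslash must be written twice.
const fromString = new RegExp("\\^<(.*?)>\\^");
const fromLiteral = /\^<(.*?)>\^/;
console.log(fromString.source === fromLiteral.source); // true

// Hypothetical helper: prefix every regex metacharacter with a backslash.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const dynamic = new RegExp(escapeRegExp("^<") + "(.*?)" + escapeRegExp(">^"));
console.log("x ^<blah example>^ y".match(dynamic)[1]); // "blah example"
```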
I am trying to create some random Unicode strings within JavaScript and was wondering if there was an easy way. I tried doing something like this...
var username = "David Perry" + "/u4589";
But it just appends /u4589 to the end, which is to be expected since it's just a string. What I WANT it to do is convert that into the Unicode character in the string (AS IF I typed ALT 4589 on the keypad). I'm trying to build the string within JavaScript because I wanna test my form with various symbols and stuff, and I'm tired of trying ALT codes to see what weird characters there are... so I thought I would loop through ALL Unicode characters for FUN, populate my form and submit it automatically...
I was going to start at /u0000 and go up to /uffff and see which codes break my website when outputting them :)
I know there are different functions in JS, but I can't seem to figure out why I can't build a string of Unicode characters. lol.
If it's too complicated don't worry about it. It's just something I wanted to tinker with.
Try "\u4589" instead of "/u4589":
>>> "/u4589"
"/u4589"
>>> "\u4589"
"䖉"
The forward slash (/) is just a forward slash in a string, whereas the backslash (\) is an escape character.
If you wish to generate random characters or loop through a range of characters, then you could use String.fromCharCode(), which gives you the character with the Unicode number passed as argument, e.g. String.fromCharCode(0x4589) or String.fromCharCode(i) where i is a variable with an integer value.
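Here is a sketch of the loop idea with String.fromCharCode. `charRange` is a hypothetical helper with inclusive bounds. If you really walk the whole 0x0000 to 0xFFFF range, expect control characters and lone surrogates (0xD800 to 0xDFFF). Lone surrogates cannot be encoded as UTF-8; encodeURIComponent, for example, throws a URIError on them.

```javascript
// Hypothetical helper: build a string from code units start..end (inclusive).
function charRange(start, end) {
  let out = "";
  for (let i = start; i <= end; i++) {
    out += String.fromCharCode(i);
  }
  return out;
}

console.log(String.fromCharCode(0x4589) === "\u4589"); // true
console.log(charRange(0x41, 0x45));                    // "ABCDE"
```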
Both the \uxxxx notation and String.fromCharCode() work up to 0xFFFF only, i.e. for Basic Multilingual Plane characters. This may well suffice, but if you need non-BMP characters, check out e.g. the MDN page on fromCharCode.
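For non-BMP characters, environments supporting ES2015 or later also offer the \u{...} escape and String.fromCodePoint. This sketch shows that the result is still stored as a surrogate pair, so .length counts two UTF-16 code units.

```javascript
// ES2015+: code points above U+FFFF via String.fromCodePoint or \u{...}.
const emoji = String.fromCodePoint(0x1F600);

console.log(emoji === "\u{1F600}");             // true
console.log(emoji === "\uD83D\uDE00");          // true: the surrogate pair
console.log(emoji.length);                      // 2 code units
console.log(emoji.codePointAt(0).toString(16)); // "1f600"
```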