I have an object something like:
{
"category|subCategory" : value
}
Is it wrong to use "|" (which I intend to use as a delimiter) in the key of an object?
It is valid. Property names may be any string.
Wrongness seems like a moral judgement that is a matter of opinion.
According to the standard, any string can be used as the key.
A string is a sequence of Unicode code points wrapped with quotation marks (U+0022). All code points may
be placed within the quotation marks except for the code points that must be escaped: quotation mark
(U+0022), reverse solidus (U+005C), and the control characters U+0000 to U+001F.
Even {"⛄|⛱|☠": "is valid"} is valid.
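As a rough sketch of how such delimited keys might be consumed, the snippet below parses a JSON object and splits each key on "|". The sample data and the names `json`, `data` and `rows` are made up for illustration.

```javascript
// Hypothetical data: each key joins a category and a subcategory with "|".
const json = '{"fruit|apple": 3, "fruit|pear": 5, "tool|hammer": 1}';
const data = JSON.parse(json);

// Split every key back into its two parts.
const rows = Object.entries(data).map(([key, value]) => {
  const [category, subCategory] = key.split("|");
  return { category, subCategory, value };
});

console.log(rows[0]); // { category: 'fruit', subCategory: 'apple', value: 3 }
```

This only works cleanly if neither part can itself contain "|"; otherwise the split becomes ambiguous.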
Why does this:
console.log(/^(['"])(?:(?:\\[^])|[^\\])*\1/.test('"\"'))
result in true? Is this expected behavior or a bug? If it's expected, how to achieve intended behavior, which is to result in false as the last closing quote in the example shouldn't be matched as it's escaped? Maybe I made a mistake in writing the RegEx, in which case, I hope someone can kindly point out the error to me...
For the uninitiated, the above regular expression in JavaScript is intended to match only a complete single- or double-quoted string that may or may not contain backslash-escaped special characters. "Complete" here means that the matched portion should be a complete quoted string, not that the whole input string should be one. Nested levels of escaped strings may be present. Also, for simplicity and as per the requirement, the match starts at the beginning of the input string; otherwise a match could incorrectly start from an already escaped quote.
Tested in Firefox 82.0.2 and Edge 86.0.622.63
Ah, never mind! I figured out that the problem is not in the RegEx, but in the way I crafted the input string. The way I've written it, the outer string interprets the escape instead of the backslash acting as an escape for the inner string! The correct way to write it is to escape the backslash, so the above code should be rewritten as:
console.log(/^(['"])(?:(?:\\[^])|[^\\])*\1/.test('"\\"'))
So, the result is as expected after all, and not a bug!
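To make the difference between the two inputs visible, here is a small check of what each string literal actually contains. The variable names are illustrative only.

```javascript
const re = /^(['"])(?:(?:\\[^])|[^\\])*\1/;

// The string literal consumes \" itself, so this string is just two quotes.
const unescaped = '"\"';
// Escaping the backslash keeps it in the string: quote, backslash, quote.
const escaped = '"\\"';

console.log(unescaped.length); // 2
console.log(escaped.length);   // 3

console.log(re.test(unescaped)); // true  (an empty quoted string)
console.log(re.test(escaped));   // false (the closing quote is escaped)
```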
I have a file of localized properties coming in.
The file is like this:
str1=Rawr
str2=This is a dot \u00B7
In str2, they mean that \u00B7 is the Unicode character and not the literal string \\u00B7. Is there any way to parse the strings so that the Unicode escapes are converted?
Add double quotes around the value – then JSON.parse can do the job for you.
If you want to read and parse
str1=Rawr
str2=This is a dot \u00B7
as one value, then you will need to replace the line breaks with \n before doing so, otherwise it’ll break the syntax of the “string” you are passing to JSON.parse.
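A minimal sketch of the quoting idea, applied line by line rather than to the whole file as one value. The helper name `parseProperties` and the sample text are assumptions for illustration, and the sketch assumes values contain no double quotes and no backslash sequences that JSON would reject.

```javascript
// Sample input; String.raw keeps the backslash as a literal character.
const fileText = String.raw`str1=Rawr
str2=This is a dot \u00B7`;

// Hypothetical helper: splits "key=value" lines and decodes \uXXXX escapes.
function parseProperties(text) {
  const result = {};
  for (const line of text.split(/\r?\n/)) {
    const eq = line.indexOf("=");
    if (eq === -1) continue; // skip lines without key=value
    const key = line.slice(0, eq);
    const rawValue = line.slice(eq + 1);
    // Wrapping the value in double quotes turns it into a JSON string literal.
    result[key] = JSON.parse('"' + rawValue + '"');
  }
  return result;
}

const props = parseProperties(fileText);
console.log(props.str2); // This is a dot ·
```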
I have been trying to use a regexp that matches any text that is between a caret, less than and a greater than, caret.
So it would look like: ^< THE TEXT I WANT SELECTED >^
I have tried something like this, but it isn't working: ^<(.*?)>^
I'm assuming this is possible, right? I think the reason I have been having such a tough time is because the caret serves as a quantifier. Thanks for any help I get!
Update
Just so everyone knows, the following from am not i am worked:
/\^<(.*?)>\^/
But, it turned out that I was getting html entities since I was getting my string by using the .innerHTML property. In other words,
&gt; ... >
&lt; ... <
To solve this, my regexp actually looks like this:
\^<(.*?)((.|\n)*)>\^
This includes the fact that the string in between should be any character or new line. Thanks!
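For comparison, one possible way to handle both the entities and the line breaks is sketched below. It is not the pattern from the update: it swaps in `[\s\S]*?` for "any character including newline", and the sample `html` string is invented.

```javascript
// Hypothetical innerHTML value: < and > arrive as entities, with a newline inside.
const html = "before ^&lt;first line\nsecond line&gt;^ after";

// [\s\S] matches any character, newlines included; *? keeps the match lazy.
const re = /\^&lt;([\s\S]*?)&gt;\^/;

const match = html.match(re);
console.log(JSON.stringify(match[1])); // "first line\nsecond line"
```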
You need to escape the ^ symbol since it has special meaning in a JavaScript regex.
/\^<(.*?)>\^/
In a JavaScript regex, the ^ means beginning of the string, unless the m modifier was used, in which case it means beginning of the line.
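A quick illustration of the difference, using the sample text from the question. The variable names are only for this example.

```javascript
const subject = "^< THE TEXT I WANT SELECTED >^";

// Unescaped: ^ is an anchor, so the pattern needs "<" at position 0 and fails.
const unescaped = /^<(.*?)>^/;
// Escaped: \^ matches a literal caret character.
const escaped = /\^<(.*?)>\^/;

console.log(unescaped.test(subject)); // false

const match = subject.match(escaped);
console.log(JSON.stringify(match[1])); // " THE TEXT I WANT SELECTED "
```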
This should work:
\^<(.*?)>\^
In a regex, if you want to use a character that has a special meaning (caret, brackets, pipe, ...), you have to escape it using a backslash. For example, (\w\b)*\w\. will select a sequence of words terminated by a dot.
Careful!
If you have to pass the regex pattern as a string, i.e. there's no regex literal like in JavaScript or Perl, you may have to use a double backslash, which the programming language will unescape to a single one, which will then be processed by the regex engine.
Same regex in multiple languages:
Python:
import re
myRegex=re.compile(r"\^<(.*?)>\^") # The r before the string prevents backslash escaping
PHP:
$result=preg_match("/\\^<(.*?)>\\^/",$subject); // Notice the double backslashes here?
JavaScript:
var myRegex=/\^<(.*?)>\^/,
subject="^<blah example>^";
subject.match(myRegex);
If you tell us what programming language you're writing in, we'll be able to give you some finished code to work with.
Edit: Whoops, didn't even notice this was tagged as javascript. Then, you don't have to worry about double backslash at all.
Edit 2: \b represents a word boundary. Though I agree yours is what I would have used myself.
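One caveat worth sketching: the double-backslash rule does come back in JavaScript if the pattern is built from a string with the `RegExp` constructor instead of a regex literal. The variable names below are illustrative.

```javascript
const fromLiteral = /\^<(.*?)>\^/;

// In a string, each backslash must be doubled to reach the regex engine.
const fromString = new RegExp("\\^<(.*?)>\\^");

// With single backslashes the string literal drops them: "\^" is just "^".
const broken = new RegExp("\^<(.*?)>\^");

console.log(fromString.source === fromLiteral.source); // true
console.log(broken.source);                            // ^<(.*?)>^
console.log(fromString.test("^<blah example>^"));      // true
console.log(broken.test("^<blah example>^"));          // false
```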
I encountered this regular expression that detects string literal of Unicode characters in JavaScript.
'"'("\\x"[a-fA-F0-9]{2}|"\\u"[a-fA-F0-9]{4}|"\\"[^xu]|[^"\n\\])*'"'
but I couldn't understand the role and need of
1) "\\x"[a-fA-F0-9]{2}
2) "\\"[^xu]|[^"\n\\]
My guess about 1) is that it is detecting control characters.
"\\x"[a-fA-F0-9]{2}
This is a literal \x followed by two characters from the hex-digit group.
This matches the shorter-form character escapes for the code points 0–255, \x00–\xFF. These are valid in JavaScript string literals but they aren't in JSON, where you have to use \u0000–\u00FF instead.
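The JavaScript/JSON difference can be checked directly. The helper `parses` below is a throwaway function written for this example.

```javascript
// In a JavaScript string literal, \x41 and \u0041 both produce "A".
console.log("\x41" === "\u0041"); // true

// Hypothetical helper: reports whether a text is valid JSON.
function parses(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(parses('"\\u0041"')); // true:  \u0041 is a valid JSON escape
console.log(parses('"\\x41"'));   // false: JSON has no \x escape
```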
"\\"[^xu]|[^"{esc}\n]
This matches one of:
backslash followed by one more character, except for x or u. The valid cases for \xNN and \uNNNN were picked up in the previous |-separated clauses, so what this does is avoid matching invalid syntax like \uqX.
anything else, except for the " or newline. It is probably also supposed to be excluding other escape characters, which I'm guessing is what {esc} means. That isn't part of the normal regex syntax, but it may be some extended syntax or templating over the top of regex. Otherwise, [^"{esc}\n] would mean just any character except ", {, e, s, c, } or newline, which would be wrong.
Notably, the last clause, that picks up ‘anything else’, doesn't exclude \ itself, so you can still have \uqX in your string and get a match even though that is invalid in both JSON and JavaScript.
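To see the effect of the final clause, here is a rough JavaScript translation of the pattern in two variants: one whose last character class excludes the backslash and one whose last class lets it through. The translation and the names `strict` and `loose` are my own; the original is written in a lexer-style syntax, so treat this as an approximation.

```javascript
// strict: the final class excludes backslash, as in [^"\n\\].
const strict = /^"(?:\\x[a-fA-F0-9]{2}|\\u[a-fA-F0-9]{4}|\\[^xu]|[^"\n\\])*"$/;
// loose: the final class accepts a backslash, as in [^"\n].
const loose = /^"(?:\\x[a-fA-F0-9]{2}|\\u[a-fA-F0-9]{4}|\\[^xu]|[^"\n])*"$/;

const valid = '"\\u00B7 and \\x41"'; // the text "\u00B7 and \x41"
const invalid = '"\\uqX"';           // the text "\uqX"

console.log(strict.test(valid), loose.test(valid));     // true true
console.log(strict.test(invalid), loose.test(invalid)); // false true
```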
I am trying to do a substr on a UTF-8 string like हिन्दी.
The problem is that the result gets totally messed up, with a weird box at the end (it does not show here, although I copy-pasted it; it is something like [00 02]): हिन...
Okay, this is how it appears after using the substr function:
Screenshot: http://img27.imageshack.us/img27/765/capturexv.png
Is there some function to solve this problem? At least I want to remove that funny box.
Thank you for your time.
JavaScript encodes strings with UTF-16, meaning characters outside the basic multilingual plane have to be represented as a surrogate pair. Splitting a string in the middle of such a pair might explain your results.
As I understand the Wikipedia article, you'll have to check whether your last character lies in the range 0xD800–0xDBFF and, if so, either drop it or add the following character (which should be in the range 0xDC00–0xDFFF) to the substring.
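The "drop it" variant of that check could look roughly like this. `safeSubstr` is a hypothetical helper, and the sketch assumes `start` already sits on a character boundary.

```javascript
// Hypothetical helper: like substr, but never ends on a lone high surrogate.
function safeSubstr(str, start, length) {
  let end = Math.min(str.length, start + length);
  if (end > start && end < str.length) {
    const last = str.charCodeAt(end - 1);
    // A high surrogate at the cut point means its pair was split: drop it.
    if (last >= 0xd800 && last <= 0xdbff) end -= 1;
  }
  return str.slice(start, end);
}

// U+1F600 lies outside the Basic Multilingual Plane and takes two code units.
const sample = "a\u{1F600}b";
console.log(sample.length);             // 4
console.log(safeSubstr(sample, 0, 2));  // a
console.log(safeSubstr("हिन्दी", 0, 3)); // हिन
```

Note that the Devanagari characters in the question are all inside the Basic Multilingual Plane, so this check leaves them untouched.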
I believe that the box is the font's representation of the UTF-8 values that the substring created. Try to remove the character at the box's position and it should be removed.
Try to avoid putting UTF-8 byte sequences into JavaScript string objects. Instead, rely on the Unicode support of JavaScript, and use a proper Unicode string (instead of a UTF-8 string).
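A small sketch of why this matters, assuming the data arrives as UTF-8 bytes and that `TextEncoder`/`TextDecoder` are available (they are in current browsers and Node.js). The variable names are illustrative.

```javascript
// UTF-8 bytes of "हिन्दी"; each of its six characters takes three bytes.
const bytes = new TextEncoder().encode("हिन्दी");
console.log(bytes.length); // 18

// Cutting the bytes mid-character leaves an incomplete sequence,
// which decodes to U+FFFD, the replacement character.
const cut = new TextDecoder().decode(bytes.slice(0, 7));
console.log(cut.endsWith("\uFFFD")); // true

// Decoding first and slicing the resulting string avoids the problem.
const text = new TextDecoder().decode(bytes);
console.log(text.slice(0, 3)); // हिन
```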
My guess is that you managed to slice the string in the middle of a character, so that the result is an incomplete character. The browser then tries to render it anyway, leading to mojibake.