Regular expression to detect double quoted javascript object properties with brackets


There are three ways to access a JavaScript object property:
someObject.propertyName
someObject['propertyName'] // with single quote '
someObject["propertyName"] // with double quote "
Spaces between the brackets, i.e., someObject[ 'propertyName' ] or someObject[ "propertyName" ], are allowed.
To detect all properties of the object someObject within a text file, I wrote the following regexes.
Regex regex = new Regex(@"someObject\.[a-zA-Z_]+[a-zA-Z0-9_]*"); to detect properties of the form someObject.propertyName.
regex = new Regex(@"someObject\[[ ]*'[a-zA-Z_]+[a-zA-Z0-9_]*'[ ]*\]"); to detect properties of the form someObject['propertyName'].
But I couldn't write a regular expression for properties of the form someObject["propertyName"]. Whenever I try to write " or \" within a regular expression, Visual Studio gives an error.
I found some regular expressions on the internet that detect double-quoted text, but when I tried to add \[ and \] to them, Visual Studio gave an error again.
How can properties of the form someObject["propertyName"] be detected?
I'm using the C# System.Text.RegularExpressions library.

But I couldn't write regular expression for the properties of the form someObject["propertyName"]:
You can use this regex:
\bsomeObject\[\s*(['"])(.+?)\1\s*\]
Or to match any object:
\b\w+\[\s*(['"])(.+?)\1\s*\]
In C#, the regex would look like this (inside a verbatim @"..." string, a double quote is written by doubling it):
Regex regex = new Regex(@"\bsomeObject\[\s*(['""])(.+?)\1\s*\]");
RegEx Breakup:
\b # word boundary
\w+ # match any word
\[ # match opening [
\s* # match 0 or more whitespaces
(['"]) # match ' or " and capture it in group #1
(.+?) # match 1 or more of any character, as few as possible (lazy), and capture it in group #2
\1 # back reference to group #1 i.e. match closing ' or "
\s* # match 0 or more whitespaces
\] # match closing ]
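The pattern itself is not specific to C#. As a minimal sketch, here it is tried in JavaScript (used only for illustration; the question targets C#) against a made-up sample text. The sample lines and variable names are assumptions, not part of the original answer.

```javascript
// Hypothetical sample text; only the object name "someObject" comes from the question.
const source = `
someObject["first"] = 1;
someObject[ 'second' ] = 2;
other["third"] = 3;
someObject["mismatched'] = 4;
`;

// Group 1 captures the opening quote, \1 requires the same quote to close,
// group 2 captures the property name.
const pattern = /\bsomeObject\[\s*(['"])(.+?)\1\s*\]/g;

const propertyNames = Array.from(source.matchAll(pattern), (m) => m[2]);
console.log(propertyNames); // [ 'first', 'second' ]
```

The last sample line is skipped because its opening and closing quotes differ, which is what the back reference \1 is there to enforce.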

Related

Regular Expression to match text between # and only if # is not preceded by '

Hello I'm trying to find a regular expression that can help me find all matches inside a string when they're inside # and only if # are not preceded by an apostrophe "'".
Basically I need to bold the text just as here when we use double * to bold text like this, but the apostrophe should work as an escape character.
For example
#Hello my name is Noé# should look like Hello my name is Noé (in bold)
#Hello this has an escape apostrophe '# so I'll match until here# should look like Hello this has an escape apostrophe '# so I'll match until here (in bold)
Inside a long text there might or might not be several matches:
"Hello I'm a text #I'm bold#, and I need to know how to match my text that's inside two '#, and #I will not match either 'cause I got no end"
So I can print it like
"Hello I'm a text I'm bold, and I need to know how to match my text that's inside two '#, and #I will not match either 'cause I got no end"
If that's not possible with a RegExp I could program a finite state machine, but I was hoping it was possible. Thank you in advance, God bless you!
Note: I will handle the escape characters later; for now I just need to know how to match this.
/(?<!')#.*(?<!')#/gim
This was the only thing I could come up with, but honestly, I have no idea how negative lookbehind works :( and this regexp matches the wrong thing. For example, if I type:
"I'm a text #and I should be a match# and this should not #But this should as well# and I'm just some random extra text"
it matches from the first # occurrence until the last one, like so:
"I'm a text #and I should be a match# and this should not #But this should as well# and I'm just some random extra text"

I think this should work:
(?<!')#(.*?)(?<!')#
Here you can see the regexp working with your examples: https://regex101.com/r/wnguiA/1
(?<!') is a negative lookbehind: it tells the regex engine to temporarily step backwards in the string and check whether the text inside the lookbehind can be matched there; the match only continues if it cannot. For example, (?<!a)b matches a b that is not preceded by an a.
Simpler is (.*?), which matches any character (except line terminators); the ? makes the quantifier non-greedy, so it stops at the first occurrence of the token that follows.
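As a small sketch of how this pattern behaves on the inputs from the question, the snippet below collects the captured group for every match. The helper name findMarkedText is made up for illustration, and the code assumes a JavaScript engine with lookbehind support (for example a current Node.js).

```javascript
// Pattern from the answer above, with the g flag so every pair is found.
const boldPattern = /(?<!')#(.*?)(?<!')#/g;

// Hypothetical helper: returns the text found between unescaped # pairs.
function findMarkedText(text) {
  return Array.from(text.matchAll(boldPattern), (m) => m[1]);
}

const sample =
  "I'm a text #and I should be a match# and this should not " +
  "#But this should as well# and I'm just some random extra text";

console.log(findMarkedText(sample));
// [ 'and I should be a match', 'But this should as well' ]

console.log(findMarkedText("#escaped '# stays inside# rest"));
// [ "escaped '# stays inside" ]
```

Because .*? is lazy, each match ends at the first # that is not preceded by an apostrophe, instead of running on to the last # in the text.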

To avoid evaluating the negative lookbehind at every position in the string, you can also match # first and do the assertion after it.
#(?<!'#)(.*?)#(?<!'#)
Another option, instead of the non-greedy .*?, is to use a negated character class that matches any char except #.
When you then encounter a #, only match it if there is a ' before it, using a positive lookbehind.
#(?<!'#)([^#\n]*(?:#(?<='#)[^#\n]*)*)#(?<!'#)
#(?<!'#) Match # not directly preceded by '
( Capture group 1
[^#\n]* Optionally match any char except # or a newline
(?: Non capture group
#(?<='#) Match # directly preceded by '
[^#\n]* Match optional repetitions of any char except # or a newline
)* Close non capture group and optionally repeat it to match all occurrences
) Close group 1
#(?<!'#) Match # not directly preceded by '
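The breakdown above can be exercised with a short replacement sketch. The helper name toBold and the use of an HTML b tag are assumptions for illustration (the question only says the text should become bold), and lookbehind support in the JavaScript engine is assumed.

```javascript
// Unrolled pattern from the answer above (requires lookbehind support).
const unrolled = /#(?<!'#)([^#\n]*(?:#(?<='#)[^#\n]*)*)#(?<!'#)/g;

// Hypothetical helper: wraps each marked span in a <b> element.
function toBold(text) {
  return text.replace(unrolled, "<b>$1</b>");
}

const text =
  "Hello I'm a text #I'm bold#, and I need to know how to match my text " +
  "that's inside two '#, and #I will not match either 'cause I got no end";

console.log(toBold(text));
// Hello I'm a text <b>I'm bold</b>, and I need to know how to match my text
// that's inside two '#, and #I will not match either 'cause I got no end
// (logged as a single line)
```

The escaped '# and the unterminated #I will not match... are left untouched, which matches the output the question asks for.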

How to only catch what's between quotation marks in the following regex?

The following regex matches substrings inside quotation marks:
^("[^"]*")$
"Dialogue," she said. "More dialogue."
I don't want to catch the quotation marks (only what's inside the quotation marks). So I figured I should use a lookahead and a lookbehind:
^((?<=")[^"]*(?="))$
But now the regex isn't matching anything.
Why is this? And how to fix it?
https://regexr.com/5spdt
EDIT: Removing the outer capture group kind of worked, but now she said is being captured too. (?<=")[^"]*(?=")

You get too many matches because the assertions do not consume the ", so anything between any two double quotes is a match.
You can assert a " to the left, then match everything except " until you can assert a " to the right that is followed by optional pairs of " up to the end of the string.
This assumes there are no escaped double quotes between the double quotes:
(?<=")[^"]*(?="(?:[^"]*"[^"]*")*[^"]*$)
(?<=") Positive lookbehind, assert " directly to the left of the current position
[^"]* Match 0+ times any char except "
(?= Positive lookahead, assert to the right
" Match closing "
(?:[^"]*"[^"]*")* Match optional pairs of ""
[^"]*$ Match optional chars other than " and assert end of string
) Close lookahead
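A quick way to check the pattern is to run it against the sentence from the question. This is only a sketch in JavaScript with made-up variable names, and it assumes an engine that supports lookbehind.

```javascript
// Pattern from the answer above; g returns every match instead of the first.
const insideQuotes = /(?<=")[^"]*(?="(?:[^"]*"[^"]*")*[^"]*$)/g;

const line = '"Dialogue," she said. "More dialogue."';
const spoken = line.match(insideQuotes);

console.log(spoken); // [ 'Dialogue,', 'More dialogue.' ]
```

The text she said. is no longer matched: after its closing " an odd number of double quotes remains, so the lookahead for complete pairs up to the end of the string fails.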

KISS
The regex in the question is overly specific (exploded):
^ # Start of string
( # Begin capturing group
"
[^"]*
"
) # End capturing group
$ # End of string
This will only match strings of the form:
"some string"
It would not, for example, match strings of the form:
anything "some string" (does not start with a quote)
"some string" anything (does not end with a quote)
So given the goal is to capture quoted strings, just don't include the quotes in the capturing group:
"([^"]*)"
And then reference the capturing group, not the whole matching string.
Applied to Javascript
Consider the following code:
const input = '"one" something "two" something "three" etc.';
const regex = /"([^"]*)"/;
const match = input.match(regex);
match contains ["\"one\"", "one"]: entry 0 is the full matching string, entry 1 is the first capturing group. Adapt the JavaScript code as needed.
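The snippet above only returns the first quoted string. As a sketch of one way to collect every capture, the g flag can be combined with String.prototype.matchAll; the variable names here are illustrative.

```javascript
const input = '"one" something "two" something "three" etc.';

// Same pattern as above, with the g flag so matchAll can iterate all matches.
const quoted = /"([^"]*)"/g;

// Keep only capturing group 1 (the text without the quotes) from each match.
const contents = Array.from(input.matchAll(quoted), (m) => m[1]);

console.log(contents); // [ 'one', 'two', 'three' ]
```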

Allow space in regex when validating file

I've got a text box whose input I want to classify as good or bad.
For instance, good input could include:
GoodString
GoodString88
99GoodString
Bad input that I do not want to allow includes:
Good*String
Good&String
But one thing I do want to allow is spaces between words, so this should have been good:
Good String
However my regex/js is stating this is NOT a good string - I want to allow it. I'm using the test routine for this and I'm as dumb as you can get with regexes. I don't know why I can never understand these things...
In any event my validation is as follows:
var rx = /^[\w.-]+$/;
if (!rx.test($("#MainContent_txtNewDocumentTitle").val())) {
    // code for bad string
} else {
    // code for good string
}
What can I do to this:
var rx = /^[\w.-]+$/;
Such that spaces are allowed?

You can use this regex instead to allow spaces only in the middle (not at the start or end):
var rx = /^[\w.-]+(?:[ \t]+[\w.-]+)*$/gm;
RegEx Breakup:
^ # line start
[\w.-]+ # match 1 or more of a word character or DOT or hyphen
(?: # start a non-capturing group
[ \t]+ # match one or more space or tab
[\w.-]+ # match 1 or more of a word character or DOT or hyphen
)* # close the non-capturing group. * will allow 0 or more matches of group
$ # line end
/gm # g for global and m for multiline matches
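For validating a single text box value with test(), a sketch like the following may be closer to what the question needs. It uses the same pattern but drops the g and m flags, which is an assumption on my part: a global regex keeps its position in lastIndex between test() calls, so reusing it can give alternating results. The helper name isValidTitle is made up.

```javascript
// Same pattern as above, without g and m, for one single-line value at a time.
const titlePattern = /^[\w.-]+(?:[ \t]+[\w.-]+)*$/;

// Hypothetical helper wrapping the check.
function isValidTitle(value) {
  return titlePattern.test(value);
}

const samples = ["GoodString", "99GoodString", "Good String", "Good*String", " Good", "Good "];
for (const s of samples) {
  console.log(JSON.stringify(s), isValidTitle(s));
}
// "GoodString" true
// "99GoodString" true
// "Good String" true
// "Good*String" false
// " Good" false
// "Good " false
```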

How to prevent regex from validating double dots after @ character

I am using the following regex in JavaScript:
^[a-zA-Z0-9._+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$
It validates email addresses, including ones on a subdomain (e.g. myname@google.co.in).
Unfortunately, addresses with a double dot also validate as true, such as:
myname@..in
myname@domain..in
I understand that the part @[a-zA-Z0-9.-] needs to be modified, but I'm stuck. What is the best way to proceed?
TIA

Try using:
^([\w+-]+\.)*[\w+-]+@([\w+-]+\.)*[\w+-]+\.[a-zA-Z]{2,4}$
I've replaced the [a-zA-Z0-9_] with the exact equivalent \w in the char group.
Note that in the regex language the dot . is a special char that matches everything (but newlines). So to match a literal dot you need to escape it \..
Legend:
^ start of the string
([\w+-]+\.)* zero or more groups, each made of 1 or more word characters (in addition to plus + and minus -) followed by a literal dot \.
[\w+-]+ 1 or more word characters (or + or -)
@ literal char
([\w+-]+\.)*[\w+-]+ same sequence as above
\.[a-zA-Z]{2,4} literal dot followed by a sequence of 2 to 4 lowercase or uppercase letters
$ end of the string
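The following sketch checks the pattern against the addresses from the question. It assumes the separator is the usual @ of an email address; the candidate list and variable names are illustrative.

```javascript
// Pattern from the answer above, assuming @ separates local part and domain.
const emailPattern = /^([\w+-]+\.)*[\w+-]+@([\w+-]+\.)*[\w+-]+\.[a-zA-Z]{2,4}$/;

const candidates = [
  "myname@google.co.in", // subdomain, should pass
  "my.name@domain.com",  // dot in the local part, should pass
  "myname@..in",         // double dot right after @, should fail
  "myname@domain..in",   // double dot inside the domain, should fail
];

const results = candidates.map((address) => emailPattern.test(address));
console.log(results); // [ true, true, false, false ]
```

Every dot has to be preceded by at least one character from [\w+-], so two dots in a row can never match.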

Try this:
^([a-zA-Z0-9._+-]+)(@[a-zA-Z0-9-]+)(\.[a-zA-Z]{2,4}){2,}$
You can test it here - https://regex101.com/r/Ihj8sd/1

Match everything but not quoted strings

I want to match everything except quoted strings.
I can match all quoted strings with this: /(("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))/
So I tried to match everything except quoted strings with this: /[^(("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))]/ but it doesn't work.
I would like to use only a regex, because I want to replace the matched text and then get the quoted text back afterwards.
string.replace(regex, function(a, b, c) {
// return after a lot of operations
});
For me, a quoted string is something like "bad string" or 'cool string'.
So if I input:
he\'re is "watever o\"k" efre 'dder\'4rdr'?
it should output these matches:
["he\'re is ", " efre ", "?"]
And then I want to replace them.
I know my question is very difficult but it is not impossible! Nothing is impossible.
Thanks

EDIT: Rewritten to cover more edge cases.
This can be done, but it's a bit complicated.
result = subject.match(/(?:(?=(?:(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*'(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*')*(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*$)(?=(?:(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*"(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*")*(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*$)(?:\\.|[^\\'"]))+/g);
will return
, he said.
, she replied.
, he reminded her.
,
from this string (line breaks added and enclosing quotes removed for clarity):
"Hello", he said. "What's up, \"doc\"?", she replied.
'I need a 12" crash cymbal', he reminded her.
"2\" by 4 inches", 'Back\"\'slashes \\ are OK!'
Explanation: (sort of, it's a bit mindboggling)
Breaking up the regex:
(?:
  (?=   # Assert even number of (relevant) single quotes, looking ahead:
    (?:
      (?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*
      '
      (?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*
      '
    )*
    (?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*
    $
  )
  (?=   # Assert even number of (relevant) double quotes, looking ahead:
    (?:
      (?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*
      "
      (?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*
      "
    )*
    (?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*
    $
  )
  (?:\\.|[^\\'"])   # Match text between quoted sections
)+
First, you can see that there are two similar parts. Both these lookahead assertions ensure that there is an even number of single/double quotes in the string ahead, disregarding escaped quotes and quotes of the opposite kind. I'll show it with the single quotes part:
(?=                      # Assert that the following can be matched:
  (?:                    # Match this group:
    (?:                  # Match either:
      \\.                # an escaped character
    |                    # or
      "(?:\\.|[^"\\])*"  # a double-quoted string
    |                    # or
      [^\\'"]            # any character except backslashes or quotes
    )*                   # any number of times.
    '                    # Then match a single quote
    (?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*'  # Repeat once to ensure even number,
                         # (but don't allow single quotes within nested double-quoted strings)
  )*                     # Repeat any number of times including zero
  (?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*      # Then match the same until...
  $                      # ... end of string.
)                        # End of lookahead assertion.
The double quotes part works the same.
Then, at each position in the string where these two assertions succeed, the next part of the regex actually tries to match something:
(?:          # Match either
  \\.        # an escaped character
|            # or
  [^\\'"]    # any character except backslash, single or double quote
)            # End of non-capturing group
The whole thing is repeated once or more, as many times as possible. The /g modifier makes sure we get all matches in the string.

Here is a tested function that does the trick:
function getArrayOfNonQuotedSubstrings(text) {
    /*  Regex with three global alternatives to section the string:
          ('[^'\\]*(?:\\[\S\s][^'\\]*)*')  # $1: Single quoted string.
        | ("[^"\\]*(?:\\[\S\s][^"\\]*)*")  # $2: Double quoted string.
        | ([^'"\\]*(?:\\[\S\s][^'"\\]*)*)  # $3: Un-quoted string.
    */
    var re = /('[^'\\]*(?:\\[\S\s][^'\\]*)*')|("[^"\\]*(?:\\[\S\s][^"\\]*)*")|([^'"\\]*(?:\\[\S\s][^'"\\]*)*)/g;
    var a = [];                     // Empty array to receive the goods.
    text = text.replace(re,         // "Walk" the text chunk-by-chunk.
        function(m0, m1, m2, m3) {
            if (m3) a.push(m3);     // Push non-quoted stuff into array.
            return m0;              // Return this chunk unchanged.
        });
    return a;
}
This solution uses the String.replace() method with a replacement callback function to "walk" the string section by section. The regex has three global alternatives, one for each kind of section: $1 for single-quoted, $2 for double-quoted, and $3 for non-quoted substrings. Each non-quoted chunk is pushed onto the return array. It correctly handles all escaped characters, including escaped quotes, both inside and outside quoted strings. Single-quoted substrings may contain any number of double quotes and vice versa. Illegal orphan quotes are removed and serve to divide a non-quoted section into two chunks. Note that this solution requires no lookaround and only one pass. It also implements Friedl's "Unrolling-the-Loop" efficiency technique and is quite efficient.
Additional: Here is some code to test the function with the original test string:
// The original test string (with necessary escapes):
var s = "he\\'re is \"watever o\\\"k\" efre 'dder\\'4rdr'?";
alert(s); // Show the test string without the extra backslashes.
console.log(getArrayOfNonQuotedSubstrings(s).toString());
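Since the question ultimately wants to replace the non-quoted text while keeping the quoted parts, the same three-alternative regex can drive the replacement directly. The following is only a sketch of that idea: the helper name replaceOutsideQuotes and the upper-casing transform are made up for illustration.

```javascript
// Hypothetical helper: applies `transform` to text outside quotes only and
// leaves single- and double-quoted sections unchanged.
function replaceOutsideQuotes(text, transform) {
  // Same three alternatives as above: $1 single-quoted, $2 double-quoted,
  // $3 un-quoted text.
  const re = /('[^'\\]*(?:\\[\S\s][^'\\]*)*')|("[^"\\]*(?:\\[\S\s][^"\\]*)*")|([^'"\\]*(?:\\[\S\s][^'"\\]*)*)/g;
  return text.replace(re, (m0, single, double, unquoted) =>
    unquoted ? transform(unquoted) : m0
  );
}

// String.raw keeps the backslashes of the original test string as typed.
const original = String.raw`he\'re is "watever o\"k" efre 'dder\'4rdr'?`;
const replaced = replaceOutsideQuotes(original, (chunk) => chunk.toUpperCase());

console.log(replaced);
// HE\'RE IS "watever o\"k" EFRE 'dder\'4rdr'?
```

Returning m0 for the quoted alternatives (and for empty matches) is what keeps those parts of the string intact.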

You can't invert a regex. What you tried was to turn it into a character class and negate that, but for that to work you would at least have to escape all the closing brackets (\]), and a character class only ever matches a single character anyway.
EDIT: I would have started with
/(^|" |' ).+?($| "| ')/
This matches anything between the beginning or the end of a quoted string (very simple: a quotation mark plus a blank) and the end of the string or the start of a quoted string (a blank plus a quotation mark). Of course this doesn't handle any escape sequences or quotations which don't follow the scheme / ['"].*['"] /. See above answers for more detailed expressions :-)
