Combine Regex to match variations of a String - javascript

I have a string that I'd like to pull some content from using javascript. The string can have multiple forms as follows:
[[(a*, b*) within 20]] or [[...(a*, b*) within 20]] where the "..." may or may not exist.
I'd like a regex that will match the "(a*, b*) within 20" portion.
/\[\[(.*?)\]\]/.exec(text)[1] will match [[(a*, b*) within 20]]
and
/([^\.]+)\]\]/.exec(text)[1] will match [[...(a*, b*) within 20]]
How can I combine these so that both versions of the text will match "(a*, b*) within 20"?

You can use this regex:
var m = s.match(/\[\[.*?(\([^)]*\).*?)\]\]/);
if (m)
console.log(m[1]);
// (a*, b*) within 20 for both input strings

I'd like a regex that will match the (a*, b*) within 20 portion.
You can try
\[\[.*?(\(a\*, b\*\) .*?)\]\]
Here is a demo on regex101.
Note: instead of the literal a and b, you can use \w or [a-z] to match other names, as your needs dictate:
\[\[.*?(\(\w\*, \w\*\) .*?)\]\]
Here the escape character \ is used to escape characters that are special in a regex pattern, such as . [ ] * ( )

You can use the following to match both variations.
\[\[[^(]*(\([^)]*\)[^\]]*)\]\]
Explanation:
\[ # '['
\[ # '['
[^(]* # any character except: '(' (0 or more times)
( # group and capture to \1:
\( # '('
[^)]* # any character except: ')' (0 or more times)
\) # ')'
[^\]]* # any character except: '\]' (0 or more times)
) # end of \1
\] # ']'
\] # ']'
Working Demo
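As a quick check of the pattern explained above, here is a minimal sketch that runs it against both input forms from the question. The helper name extractInner and the third sample string are made up for illustration.

```javascript
// Pattern from the answer above: skip anything before the first "(",
// then capture from "(" up to (but not including) the closing "]]".
const bracketPattern = /\[\[[^(]*(\([^)]*\)[^\]]*)\]\]/;

// Hypothetical helper: returns the captured text, or null when nothing matches.
function extractInner(text) {
  const m = bracketPattern.exec(text);
  return m ? m[1] : null;
}

const samples = [
  '[[(a*, b*) within 20]]',
  '[[...(a*, b*) within 20]]',
  'no brackets here',
];

const extracted = samples.map(extractInner);
console.log(extracted);
```

Checking the result of exec before indexing into it avoids the TypeError that /.../.exec(text)[1] throws when the text does not match.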

Related

Javascript RegEx to match all whitespaces except between special characters

I need a regEx to match all whitespaces except those inside # signs to make a correct split.
I have this string:
[0] == #Item 1#
With the split I need the following array (with or without # in 3rd element):
var array = ["[0]","==","#Item 1#"];
With a simple split(" ") I get this:
var array = ["[0]","==","#Item","1#"];
Thank you for your help.
You can use
const text = '[0] == #Item 1#';
console.log( text.match(/(?:#[^#]*#|\S)+/g) )
See the regex demo. The (?:#[^#]*#|\S)+ pattern means:
(?: - start of a non-capturing group:
#[^#]*# - a # char, zero or more chars other than # and then a # char
| - or
\S - any non-whitespace char
)+ - end of the group, repeat one or more times.
The /g flag tells .match() to extract all occurrences.
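The pattern above can be wrapped in a small function. This is only a sketch: the name splitOutsideHashes is hypothetical, and the fallback to an empty array is an added assumption about what a caller would want when there are no tokens.

```javascript
// Same pattern as in the answer: a #...# run or any non-whitespace char, repeated.
const tokenRe = /(?:#[^#]*#|\S)+/g;

// Hypothetical helper: String.prototype.match returns null when nothing matches,
// so fall back to an empty array.
function splitOutsideHashes(text) {
  return text.match(tokenRe) || [];
}

console.log(splitOutsideHashes('[0] == #Item 1#'));
console.log(splitOutsideHashes('a#b c#d e'));
console.log(splitOutsideHashes('   '));
```

Note that an unpaired # is treated as an ordinary non-whitespace character, so '#Item 1' still splits at the space.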

How to match non-escaped quoted strings and also non-quoted strings?

I have a string that contains single, double, and escaped quotations:
Telling myself 'you are \'great\' ' and then saying "thank you" feels "a \"little\" nice"
I would like a single regex to pull out:
single quoted strings
double quoted strings
strings not in quotes
Expected Result: the following groups
Telling myself
you are \'great\'
and then saying
thank you
feels
a \"little\" nice
Requirements: don't return quotes, and ignore escaped quotes
What I have so far:
Regex #1 to return single and double quotes (source):
((?<![\\])['"])((?:.(?!(?<![\\])\1))*.?)\1
Result:
Regex #2 to return non-quoted strings:
((?<![\\])['"]|^).*?((?<![\\])['"]|$)
Result:
Problems:
I am unable to make regex #2 put the non-quoted string into a consistent group
I am unable to combine regex #1 and #2 to return all strings in one regex function
How about something like this:
(?<!\\)'(.+?)(?<!\\)'|(?<!\\)"(.+?)(?<!\\)"|(.+?)(?='|"|$)
Demo.
The basic idea is that it tries to match the quoted strings first, so that whatever is left afterwards is the text that was not enclosed in quotes. All the matched strings (not including the quotes) end up in the capturing groups.
Shortened version:
(?<!\\)(['"])(.+?)(?<!\\)\1|(.+?)(?='|"|$)
Demo.
If you don't want to use capturing groups, you may adjust it to work with Lookarounds like the following:
(?<=(?<!\\)').+?(?=(?<!\\)')|(?<=(?<!\\)").+?(?=(?<!\\)")|(?<=^|['"]).+?(?=(?<!\\)['"]|$)
Demo.
Shortened version:
(?<=(?<!\\)(['"])).+?(?=(?<!\\)\1)|(?<=^|['"]).+?(?=(?<!\\)['"]|$)
Demo.
JS version
/(?:"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|'([^'\\]*(?:\\[\S\s][^'\\]*)*)'|([^'"\\]+)|(\\[\S\s]))/
https://regex101.com/r/5xfs7q/1
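A minimal sketch of how the JS version might be used on the sample string from the question. The g flag, the trimming, and the dropping of empty pieces are additions for this illustration, not part of the pattern as posted.

```javascript
// The JS pattern from above, with the g flag added so matchAll can iterate.
// Groups: 1 = double-quoted body, 2 = single-quoted body,
//         3 = unquoted run, 4 = stray escaped character outside quotes.
const tokenPattern =
  /"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|'([^'\\]*(?:\\[\S\s][^'\\]*)*)'|([^'"\\]+)|(\\[\S\s])/g;

// String.raw keeps the backslashes exactly as written.
const input = String.raw`Telling myself 'you are \'great\' ' and then saying "thank you" feels "a \"little\" nice"`;

const pieces = [];
for (const m of input.matchAll(tokenPattern)) {
  // Exactly one group participates in each match; the others are undefined.
  const value = [m[1], m[2], m[3], m[4]].find((g) => g !== undefined);
  const trimmed = value.trim(); // assumption: surrounding whitespace is unwanted
  if (trimmed !== '') pieces.push(trimmed);
}

console.log(pieces);
```

Because the escaped quotes are consumed as backslash-plus-character pairs inside the quoted alternatives, no lookbehind is needed here.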
PCRE - Pro level, super version ..
(?|(?|\s*((?:[^'"\\]|(?:\\[\S\s][^'"\\]*))+)(?<!\s)\s*|\s+(*SKIP)(*FAIL))|(?<!\\)(?|"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|'([^'\\]*(?:\\[\S\s][^'\\]*)*)')|([\S\s]))
https://regex101.com/r/Tdyd3y/1
This is the cleanest, nicest one I've ever seen.
It trims surrounding whitespace, and the regex contains just a single capture group.
Explained
(?| # BReset
(?| # BReset
\s* # Wsp trim
( # (1 start), Non-quoted data
(?:
[^'"\\]
| (?: \\ [\S\s] [^'"\\]* )
)+
) # (1 end)
(?<! \s )
\s* # Wsp trim
| # or,
\s+ (*SKIP) (*FAIL) # Skip intervals with all whitespace
)
|
(?<! \\ ) # Not an escape behind
(?| # BReset
"
( # (1 start), double quoted string data
[^"\\]*
(?: \\ [\S\s] [^"\\]* )*
) # (1 end)
"
| # or,
'
( # (1 start), single quoted string data
[^'\\]*
(?: \\ [\S\s] [^'\\]* )*
) # (1 end)
'
)
|
( [\S\s] ) # (1), Pass through, single char
# Un-balanced " or ' or \ at EOF
)

RegEx - No sequentially spaces

I'm trying to achieve these tasks through RegEx:
The string must start with alphabet.
String can have maximum length of 30 characters.
String may contain Numbers, Alphabets and Space ( ).
String may be case-insensitive.
String should not have more than one space sequentially.
String cannot end with Space.
After going through RegEx wiki and other RegEx questions, I've this expression:
/^([A-Z])([A-Z0-9 ]){0,29}$/i
Although this successfully achieves tasks 1-4, I'm unable to find anything on tasks 5 and 6.
Note: I'm using Javascript for RegEx.
String should not have more than one space sequentially.
When matching a space, negative lookahead for another space.
String cannot end with Space.
Also negative lookahead for the end of the string when matching a space:
/^([A-Z])([A-Z0-9]| (?! |$)){0,29}$/i
                  ^^^^^^^^^
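To see the two lookahead conditions at work, here is a short sketch that runs the pattern above against a few inputs. The helper name isValidName and the sample strings are made up for illustration.

```javascript
// Pattern from the answer: a letter, then up to 29 more characters, where a
// space is only allowed if it is not followed by another space or the end.
const namePattern = /^([A-Z])([A-Z0-9]| (?! |$)){0,29}$/i;

// Hypothetical helper for readability.
function isValidName(s) {
  return namePattern.test(s);
}

console.log(isValidName('The days of wine and 007')); // single spaces only
console.log(isValidName('two  spaces'));              // consecutive spaces
console.log(isValidName('trailing '));                // ends with a space
console.log(isValidName('1abc'));                     // starts with a digit
console.log(isValidName('a'.repeat(30)));             // exactly 30 characters
console.log(isValidName('a'.repeat(31)));             // one character too long
```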
This regular expression works with Ruby. I assume it will with Javascript as well.
r = /^(?!.{31})\p{Alpha}(?:\p{Alnum}| (?! ))*(?<! )$/
"The days of wine and 007" =~ r #=> 0 (a match)
"The days of wine and roses and 007" =~ r #=> nil (too long)
"The days of  wine and 007" =~ r #=> nil (two consecutive spaces)
"The days of wine and 007!" =~ r #=> nil ('!' illegal)
The \p{} constructs match Unicode characters.
The regular expression can be expressed as follows in free-spacing mode (in order to document its component parts).
/
^ # beginning of string anchor
(?!.{31}) # 31 characters do not follow (neg lookahead)
\p{Alpha} # match a letter at beg of string
(?: # begin a non-capture group
\p{Alnum} # match an alphanumeric character
| # or
[ ] # match a space
(?![ ]) # a space does not follow (neg lookahead)
)* # end non-capture group and execute >= 0 times
(?<![ ]) # a space cannot precede end of string (neg lookbehind)
$ # end of string anchor
/x # free-spacing regex definition mode
Note that spaces are stripped from regexes defined in free-spacing mode, so spaces that are to be retained must be protected. I've put each in a character class ([ ]), but \s can be used as well (though that matches spaces, tabs, newlines and a few other characters, which should not be a problem).
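The Ruby pattern does not carry over to JavaScript unchanged: JavaScript only recognizes \p{...} when the u (or v) flag is set, has no free-spacing mode, and does not define \p{Alnum}. The following is a rough adaptation, not an exact equivalent. It assumes an engine with lookbehind support and approximates \p{Alnum} as \p{Alphabetic} plus \p{Nd}, which may differ from Ruby's definition for some characters.

```javascript
// Rough JavaScript adaptation of the Ruby pattern above (an approximation):
//   (?!.{31})      at most 30 characters
//   \p{Alphabetic} first character is a letter
//   (?: ... )*     letters, decimal digits, or a space not followed by a space
//   (?<! )$        the string does not end with a space
const unicodeNamePattern =
  /^(?!.{31})\p{Alphabetic}(?:[\p{Alphabetic}\p{Nd}]| (?! ))*(?<! )$/u;

const checks = [
  'The days of wine and 007',
  'The days of wine and roses and 007',
  'The days of  wine and 007',
  'The days of wine and 007!',
  'Trailing space ',
].map((s) => unicodeNamePattern.test(s));

console.log(checks);
```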

Regular expression to detect double quoted javascript object properties with brackets

There are three ways to access JavaScript Object property.
someObject.propertyName
someObject['propertyName'] // with single quote '
someObject["propertyName"] // with double quote "
Spaces between the brackets, i.e., someObject[ 'propertyName' ] or someObject[ "propertyName" ], are allowed.
To detect all properties of the object someObject within a text file, I wrote the following regexes.
Regex regex = new Regex(@"someObject\.[a-zA-Z_]+[a-zA-Z0-9_]*"); to detect properties of the form someObject.propertyName.
regex = new Regex(@"someObject\[[ ]*'[a-zA-Z_]+[a-zA-Z0-9_]*'[ ]*\]"); to detect properties of the form someObject['propertyName'].
But I couldn't write a regular expression for properties of the form someObject["propertyName"]. Whenever I try to write " or \" within a regular expression, Visual Studio gives an error.
I found some regular expressions on the internet to detect double-quoted text. For example this. But I couldn't add \[ and \] to the regex; Visual Studio gives an error.
How can properties of the form someObject["propertyName"] be detected?
I'm using the C# System.Text.RegularExpressions library.
But I couldn't write regular expression for the properties of the form someObject["propertyName"]:
You can use this regex:
\bsomeObject\[\s*(['"])(.+?)\1\s*\]
RegEx Demo
Or to match any object:
\b\w+\[\s*(['"])(.+?)\1\s*\]
In C#, regex would be like
Regex regex = new Regex(@"\bsomeObject\[\s*(['""])(.+?)\1\s*]");
RegEx Breakup:
\b # word boundary
\w+ # match any word
\[ # match opening [
\s* # match 0 or more whitespaces
(['"]) # match ' or " and capture it in group #1
(.+?) # match 1 or more of any character, as few as possible (lazy)
\1 # back reference to group #1 i.e. match closing ' or "
\s* # match 0 or more whitespaces
\] # match closing ]
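The same pattern syntax is also accepted by JavaScript's regex engine, so the breakup above can be tried out there. This is a sketch in JavaScript rather than the C# the question asks about; the sample source string and variable names are invented for the example.

```javascript
// Same pattern as above, with the g flag so every occurrence is found.
// Group 1 is the opening quote, group 2 is the property name.
const bracketAccess = /\bsomeObject\[\s*(['"])(.+?)\1\s*\]/g;

// Made-up input mixing both quote styles, optional spaces, and another object.
const source = `someObject['alpha'] + someObject[ "beta" ] + other["gamma"]`;

const propertyNames = Array.from(source.matchAll(bracketAccess), (m) => m[2]);
console.log(propertyNames);
```

The back reference \1 is what lets a single pattern cover both quote styles while still requiring the closing quote to match the opening one.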

RegExp - find all occurrences, but not inside quotes

I have this text (it's a string value, not a language expression):
hello = world + 'foo bar' + gizmo.hoozit + "escaped \"quotes\"";
And I would like to find all words ([a-zA-Z]+) which are not enclosed in double or single quotes. The quotes can be escaped (\" or \'). The result should be:
hello, world, gizmo, hoozit
Can I do this using regular expressions in JavaScript?
You can use this pattern; what you need is in the second capturing group.
EDIT: a slightly shorter version using a negative lookahead:
var re = /(['"])(?:[^"'\\]+|(?!\1)["']|\\{2}|\\[\s\S])*\1|([a-z]+)/ig
var mystr = 'hello = world + \'foo bar\' + gizmo.hoozit + "escaped \\"quotes\\"";';
var result = [];
var match; // declared explicitly to avoid creating an implicit global
while ((match = re.exec(mystr)) !== null) {
if (match[2]) result.push(match[2]);
}
console.log(mystr);
console.log(result);
the idea is to match content enclosed between quotes before the target.
Enclosed content details: (["'])(?:[^"'\\]+|(?!\1)["']|\\{2}|\\[\s\S])*\1
(["']) # the opening quote, single or double (captured in group 1)
(?: # open a non capturing group
[^"'\\]+ # all that is not a quote or a backslash
| # OR
(?!\1)["'] # a quote but not the captured quote
| # OR
\\{2} # 2 backslashes (to compose all even numbers of backslash)*
| # OR
\\[\s\S] # an escaped character (to allow escaped single quotes)
)* # repeat the group zero or more times
\1 # the closing quote, same kind as the opening one (backreference)
(* an even number of backslashes doesn't escape anything)
You might want to use several regular expression methods one after the other for simplicity and clarity of function (large Regexes may be fast, but they're hard to construct, understand and edit): first remove all escaped quotes, then remove all quoted strings, then run your search.
var matches = string
.replace( /\\'|\\"/g, '' )
.replace( /'[^']*'|"[^"]*"/g, '' )
.match( /\w+/g );
A few notes on the regular expressions involved:
The central construct in the 2nd replacement is character ('), followed by zero or more (*) of any character from the set ([]) which does not (^) conform to character (')
| means or, meaning either the part before or after the pipe can be matched
'\w' means 'any word character' and is shorthand for '[a-zA-Z0-9_]', so it also matches digits and underscores; use '[a-zA-Z]+' if only letters are wanted
jsFiddle demo.
Replace each escaped quote with an empty string;
Replace each pair of quotes and the string between with an empty string:
If you use a capture group for the opening quote (["']) then you can use a back-reference \1 to match the same style quote at the other end of the quoted string;
Matching with a back reference means you need to use a non-greedy (match as few characters as possible) wildcard match .*? to get the minimum possible quoted string.
Finally, find the matches using your regular expression [a-zA-Z]+.
Like this:
var text = "hello = world + 'foo bar' + gizmo.hoozit + \"escaped \\\"quotes\\\"\";";
var matches = text.replace( /\\["']/g, '' )
.replace( /(["']).*?\1/g, '' )
.match( /[a-zA-Z]+/g );
console.log( matches );
