Check for open, unescaped quotes in a given string - javascript

A part of the application I'm building allows you to evaluate bash commands in an interactive terminal. On enter, the command is run. I'm trying to make it a bit more flexible, and allow for commands spanning multiple lines.
I already check for a trailing backslash, now I'm trying to figure out how to tell if there is an open string. I have not been successful in writing a regex for this, as it should also support escaped quotes.
For example:
echo "this is a
\"very\" cool quote"

If you want a regex that matches a string (subject) only if it doesn't contain unbalanced (unescaped) quotes, then try the following:
/^(?:[^"\\]|\\.|"(?:\\.|[^"\\])*")*$/.test(subject)
Explanation:
^ # Match the start of the string.
(?: # Match either...
[^"\\] # a character besides quotes or backslash
| # or
\\. # any escaped character
| # or
" # a closed string, i. e. one that starts with a quote,
(?: # followed by either
\\. # an escaped character
| # or
[^"\\] # any other character except quote or backslash
)* # any number of times,
" # and a closing quote.
)* # Repeat as often as needed.
$ # Match the end of the string.

Related

How to match non-escaped quoted strings and also non-quoted strings?

I have a string that contains single, double, and escaped quotations:
Telling myself 'you are \'great\' ' and then saying "thank you" feels "a \"little\" nice"
I would like a single regex to pull out:
single quoted strings
double quoted strings
strings not in quotes
Expected Result: the following groups
Telling myself
you are \'great\'
and then saying
thank you
feels
a \"little\" nice
Requirements: don't return quotes, and ignore escaped quotes
What I have so far:
Regex #1 to return single and double quotes (source):
((?<![\\])['"])((?:.(?!(?<![\\])\1))*.?)\1
Result:
Regex #2 to return non-quoted strings:
((?<![\\])['"]|^).*?((?<![\\])['"]|$)
Result:
Problems:
I am unable to make regex #2 put the non-quoted string into a consistent group
I am unable to combine regex #1 and #2 to return all strings in one regex function
How about something like this:
(?<!\\)'(.+?)(?<!\\)'|(?<!\\)"(.+?)(?<!\\)"|(.+?)(?='|"|$)
Demo.
The basic idea behind this is that it tries to match the strings with quotes first so that whatever is left after that is the strings that were not enclosed quotes. You will have all the matched strings (not including the quotes) in the capturing groups.
Shortened version:
(?<!\\)(['"])(.+?)(?<!\\)\1|(.+?)(?='|"|$)
Demo.
If you don't want to use capturing groups, you may adjust it to work with Lookarounds like the following:
(?<=(?<!\\)').+?(?=(?<!\\)')|(?<=(?<!\\)").+?(?=(?<!\\)")|(?<=^|['"]).+?(?=(?<!\\)['"]|$)
Demo.
Shortened version:
(?<=(?<!\\)(['"])).+?(?=(?<!\\)\1)|(?<=^|['"]).+?(?=(?<!\\)['"]|$)
Demo.
JS version
/(?:"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|'([^'\\]*(?:\\[\S\s][^'\\]*)*)'|([^'"\\]+)|(\\[\S\s]))/
https://regex101.com/r/5xfs7q/1
PCRE - Pro level, super version ..
(?|(?|\s*((?:[^'"\\]|(?:\\[\S\s][^'"\\]*))+)(?<!\s)\s*|\s+(*SKIP)(*FAIL))|(?<!\\)(?|"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|'([^'\\]*(?:\\[\S\s][^'\\]*)*)')|([\S\s]))
https://regex101.com/r/Tdyd3y/1
This is the cleanest, nicest one I've ever seen.
Wsp trim and regex contains just a single capture group.
Explained
(?| # BReset
(?| # BReset
\s* # Wsp trim
( # (1 start), Non-quoted data
(?:
[^'"\\]
| (?: \\ [\S\s] [^'"\\]* )
)+
) # (1 end)
(?<! \s )
\s* # Wsp trim
| # or,
\s+ (*SKIP) (*FAIL) # Skip intervals with all whitespace
)
|
(?<! \\ ) # Not an escape behind
(?| # BReset
"
( # (1 start), double quoted string data
[^"\\]*
(?: \\ [\S\s] [^"\\]* )*
) # (1 end)
"
| # or,
'
( # (1 start), single quoted string data
[^'\\]*
(?: \\ [\S\s] [^'\\]* )*
) # (1 end)
'
)
|
( [\S\s] ) # (1), Pass through, single char
# Un-balanced " or ' or \ at EOF
)

Livecycle RegExp - trouble with decimal

Within Livecycle, I am validating that the number entered is a 0 through 10 and allows quarter hours. With the help of this post, I've written the following.
if (!xfa.event.newText.match(/^(([10]))$|^((([0-9]))$|^((([0-9]))\.?((25)|(50)|(5)|(75)|(0)|(00))))$/))
{
xfa.event.change = "";
};
The problem is periods are not being accepted. I have tried wrapping the \. in parenthesis but that did not work either. The field is a text field with no special formatting and the code in the change event.
Yikes, that's a convoluted regex. This can be simplified a lot:
/^(?:10|[0-9](?:\.(?:[27]?5)?0*)?)$/
Explanation:
^ # Start of string
(?: # Start of group:
10 # Either match 10
| # or
[0-9] # Match 0-9
(?: # optionally followed by this group:
\. # a dot
(?:[27]?5)? # either 25, 75 or 5 (also optional)
0* # followed by optional zeroes
)? # As said before, make the group optional
) # End of outer group
$ # End of string
Test it live on regex101.com.

Javascript Regular expression for currency amount with spaces

I have this regular expression
/^[',",\+,<,>,\(,\*,\-,%]?([£,$,€]?\d+([\,,\.]\d+)?[£,$,€]?\s*[\-,\/,\,,\.,\+]?[\/]?\s*)+[',",\+, <,>,\),\*,\-,%]?$/
It matches this very well $55.5, but in few of my test data I have some values like $ 55.5 (I mean, it has a space after $ sign).
The answers on this link are not working for me.
Currency / Percent Regular Expression
So, how can I change it to accept the spaces as well?
Try following RegEx:
/^[',",\+,<,>,\(,\*,\-,%]?([£,$,€]?\s*\d+([\,,\.]\d+)?[£,$,€]?\s*[\-,\/,\,,\.,\+]?[\/]?\s*)+[',",\+, <,>,\),\*,\-,%]?$/
Let me know if it worked!
Demo Here
TLDR:
/^[',",\+,<,>,\(,\*,\-,%]?([£,$,€]?\s*\d+([\,,\.]\d+)?[£,$,€]?\s*[\-,\/,\,,\.,\+]?[\/]?\s*)+[',",\+, <,>,\),\*,\-,%]?$/
The science bit
Ok, I'm guessing that you didn't construct the original regular expression, so here are the pieces of it, with the addition marked:
^ # match from the beginning of the string
[',",\+,<,>,\(,\*,\-,%]? # optionally one of these symbols
( # start a group
[£,$,€]? # optionally one of these symbols
\s* # <--- NEW ADDITION: optionally one or more whitespace
\d+ # then one or more decimal digits
( # start group
[\,,\.] # comma or a dot
\d+ # then one or more decimal digits
)? # group optional (comma/dot and digits or neither)
[£,$,€]? # optionally one of these symbols
\s* # optionally whitespace
[\-,\/,\,,\.,\+]? # optionally one of these symbols
[\/]? # optionally a /
\s* # optionally whitespace
)+ # this whole group one or more times
[',",\+, <,>,\),\*,\-,%]? # optionally one of these symbols
$ # match to the end of the string
Much of this is poking about matching stuff around the currency amount, so you could reduce that.

Match everything but not quoted strings

I want to match everything but no quoted strings.
I can match all quoted strings with this: /(("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))/
So I tried to match everything but no quoted strings with this: /[^(("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))]/ but it doesn't work.
I would like to use only regex because I will want to replace it and want to get the quoted text after it back.
string.replace(regex, function(a, b, c) {
// return after a lot of operations
});
A quoted string is for me something like this "bad string" or this 'cool string'
So if I input:
he\'re is "watever o\"k" efre 'dder\'4rdr'?
It should output this matches:
["he\'re is ", " efre ", "?"]
And than I wan't to replace them.
I know my question is very difficult but it is not impossible! Nothing is impossible.
Thanks
EDIT: Rewritten to cover more edge cases.
This can be done, but it's a bit complicated.
result = subject.match(/(?:(?=(?:(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*'(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*')*(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*$)(?=(?:(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*"(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*")*(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*$)(?:\\.|[^\\'"]))+/g);
will return
, he said.
, she replied.
, he reminded her.
,
from this string (line breaks added and enclosing quotes removed for clarity):
"Hello", he said. "What's up, \"doc\"?", she replied.
'I need a 12" crash cymbal', he reminded her.
"2\" by 4 inches", 'Back\"\'slashes \\ are OK!'
Explanation: (sort of, it's a bit mindboggling)
Breaking up the regex:
(?:
(?= # Assert even number of (relevant) single quotes, looking ahead:
(?:
(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*
'
(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*
'
)*
(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*
$
)
(?= # Assert even number of (relevant) double quotes, looking ahead:
(?:
(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*
"
(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*
"
)*
(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*
$
)
(?:\\.|[^\\'"]) # Match text between quoted sections
)+
First, you can see that there are two similar parts. Both these lookahead assertions ensure that there is an even number of single/double quotes in the string ahead, disregarding escaped quotes and quotes of the opposite kind. I'll show it with the single quotes part:
(?= # Assert that the following can be matched:
(?: # Match this group:
(?: # Match either:
\\. # an escaped character
| # or
"(?:\\.|[^"\\])*" # a double-quoted string
| # or
[^\\'"] # any character except backslashes or quotes
)* # any number of times.
' # Then match a single quote
(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*' # Repeat once to ensure even number,
# (but don't allow single quotes within nested double-quoted strings)
)* # Repeat any number of times including zero
(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])* # Then match the same until...
$ # ... end of string.
) # End of lookahead assertion.
The double quotes part works the same.
Then, at each position in the string where these two assertions succeed, the next part of the regex actually tries to match something:
(?: # Match either
\\. # an escaped character
| # or
[^\\'"] # any character except backslash, single or double quote
) # End of non-capturing group
The whole thing is repeated once or more, as many times as possible. The /g modifier makes sure we get all matches in the string.
See it in action here on RegExr.
Here is a tested function that does the trick:
function getArrayOfNonQuotedSubstrings(text) {
/* Regex with three global alternatives to section the string:
('[^'\\]*(?:\\[\S\s][^'\\]*)*') # $1: Single quoted string.
| ("[^"\\]*(?:\\[\S\s][^"\\]*)*") # $2: Double quoted string.
| ([^'"\\]*(?:\\[\S\s][^'"\\]*)*) # $3: Un-quoted string.
*/
var re = /('[^'\\]*(?:\\[\S\s][^'\\]*)*')|("[^"\\]*(?:\\[\S\s][^"\\]*)*")|([^'"\\]*(?:\\[\S\s][^'"\\]*)*)/g;
var a = []; // Empty array to receive the goods;
text = text.replace(re, // "Walk" the text chunk-by-chunk.
function(m0, m1, m2, m3) {
if (m3) a.push(m3); // Push non-quoted stuff into array.
return m0; // Return this chunk unchanged.
});
return a;
}
This solution uses the String.replace() method with a replacement callback function to "walk" the string section by section. The regex has three global alternatives, one for each section; $1: single quoted, $2: double quoted, and $3: non-quoted substrings, Each non-quoted chunk is pushed onto the return array. It correctly handles all escaped characters, including escaped quotes, both inside and outside quoted strings. Single quoted substrings may contain any number of double quotes and vice-versa. Illegal orphan quotes are removed and serve to divide a non-quoted section into two chunks. Note that this solution requires no lookaround and requires only one pass. It also implements Friedl's "Unrolling-the-Loop" efficiency technique and is quite efficient.
Additional: Here is some code to test the function with the original test string:
// The original test string (with necessary escapes):
var s = "he\\'re is \"watever o\\\"k\" efre 'dder\\'4rdr'?";
alert(s); // Show the test string without the extra backslashes.
console.log(getArrayOfNonQuotedSubstrings(s).toString());
You can't invert a regex. What you have tried was making a character class out of it and invert that - but also for doing that you would have to escape all closing brackets "\]".
EDIT: I would have started with
/(^|" |' ).+?($| "| ')/
This matches anything between the beginning or the end of a quoted string (very simple: a quotation mark plus a blank) and the end of the string or the start of a quoted string (a blank plus a quotation mark). Of course this doesn't handle any escape sequences or quotations which don't follow the scheme / ['"].*['"] /. See above answers for more detailed expressions :-)

Regex validation for comma separated string

I need to validate and input string client side.
Here is an example of the string:
1:30-1:34, 1:20-1:22, 1:30-1:37,
It's basically time codes for a video.
Can this be done with regex?
Banging my head against the wall...
^(?:\b\d+:\d+-\d+:\d+\b(?:, )?)+$
would probably work; at least it matches your example. But you might need to add a few edge cases to make the rules for matching/not matching clearer.
^ # Start of string
(?: # Try to match...
\b # start of a "word" (in this case, number)
\d+ # one or more digits
: # a :
\d+ # one or more digits
- # a dash
\d+ # one or more digits
: # a :
\d+ # one or more digits
\b # end of a "word"
(?:, )? # optional comma and space
)+ # repeat one or more times
$ # until the end of the string
The following is a simple representation. I have assumed that the string has the exact same form as you have shown. This may be a good starting point for you. I'll improve the regex if you provide more specific requirements.
([0-9]+:[0-9]{1,2}-[0-9]+:[0-9]{1,2},\w*)+
Explanation (inspired from Tim above)
[0-9]+   #One ore more digits
:      # A colon
[0-9]{1,2}  #A single digit or a pair of digits
-       #A dash
,       #A comma
\w*      #Optional whitespace

Categories

Resources