Exclude some characters in string - javascript

I am trying to exclude ; characters that appear inside quotes, using a regex in Node.js.
For example, I have an SQL statement like the one below:
update codes set character='.' where character = ';' and write_date < now()-5 ;
I want to find the affected rows before executing the statement. I wrote the regex below, but it does not work correctly if there is a ; character inside quotes, as above:
const regexp = /update\s+((.|\n)[^;]*?)\s+set\s+((.|\n)[^;]*?)\s+where\s+((.|\n)[^;]*?);/ig;
regexp.exec(str)
Expected output:
table: codes
where: character = ';' and write_date < now()-5
But I get:
table: codes
where: character = ';

You can use
update\s+([^;]*?)\s+set\s(?:[^;]*?\s)?where\s+((?:'[^']*'|[^;])*?)\s*;
See the regex demo. Details:
update - a word
\s+ - one or more whitespaces
([^;]*?) - Group 1: zero or more but as few as possible chars other than ;
\s+ - one or more whitespaces
set - a word
\s - a whitespace
(?:[^;]*?\s)? - an optional sequence of any chars other than ; as few as possible, and then a whitespace
where - a word
\s+ - one or more whitespaces
((?:'[^']*'|[^;])*?) - Group 2: zero or more (as few as possible) sequences of ', zero or more non-'s, and then ', or any single char other than a ;
\s* - zero or more whitespaces
; - a ; char.
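As a quick check of the pattern above, here is a minimal sketch that runs it against the statement from the question. The helper name parseUpdate and the shape of the returned object are illustrative assumptions, not part of the answer, and the i flag is assumed so that UPDATE and update both match.

```javascript
// Pattern from the answer above; the `i` flag is an assumption.
const UPDATE_RE =
  /update\s+([^;]*?)\s+set\s(?:[^;]*?\s)?where\s+((?:'[^']*'|[^;])*?)\s*;/i;

// Hypothetical helper: returns { table, where }, or null when nothing matches.
function parseUpdate(sql) {
  const m = UPDATE_RE.exec(sql);
  return m ? { table: m[1], where: m[2] } : null;
}

const sql =
  "update codes set character='.' where character = ';' and write_date < now()-5 ;";
const parsed = parseUpdate(sql);
console.log(parsed);
```

Only single-quoted literals are treated specially here, so a ; inside a double-quoted identifier or an SQL comment would still end the match early.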

First off, I'm not sure what (.|\n) is for, so I'm ignoring that.
I believe there are two problems with your regexp. Changing either will probably solve your problem, but I'd change both, just to be sure.
The ? after the * makes the * non-greedy, which means the regex will match as little as possible, so that the final ; in the regexp will match the first possible ; it finds, not the last possible. So I'd leave the ? out.
The regexp doesn't use $ to anchor to the end of string. Add $ after ; at the end (possibly \s*$ if you expect additional white space at the end of the string). If you do this, you actually don't need to exclude ;. And it may be a good idea, to add ^ (or ^\s*) at the beginning to anchor to the beginning of the string, too.
So the resulting regexp is
const regexp = /^\s*update\s+((.|\n).*)\s+set\s+((.|\n).*)\s+where\s+((.|\n).*);\s*$/ig;
Finally, some conceptual ideas: Why are you doing this in the first place? Instead of starting with the UPDATE SQL, why don't you start out with the structure:
{
table: "codes",
where: "character = ';' and write_date < now()-5"
}
and build both the UPDATE and the SELECT SQLs from that?
Or if you only have the UPDATE SQL, instead of using a regular expression, there are SQL parser libraries (example) which would probably be more reliable.
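The "start from a structure" idea could look roughly like the sketch below. The set field and the helper names buildUpdate and buildCount are assumptions added for illustration; the structure in the answer only shows table and where. The fragments are assumed to be trusted strings written by the developer, not user input.

```javascript
// Hypothetical statement description; `set` is an added assumption,
// since an UPDATE needs it and the structure above only has table/where.
const spec = {
  table: "codes",
  set: "character='.'",
  where: "character = ';' and write_date < now()-5",
};

// Build the UPDATE statement from the description.
function buildUpdate({ table, set, where }) {
  return `update ${table} set ${set} where ${where};`;
}

// Build a SELECT that counts the rows the UPDATE would touch.
function buildCount({ table, where }) {
  return `select count(*) from ${table} where ${where};`;
}

console.log(buildCount(spec));
console.log(buildUpdate(spec));
```

With this approach no parsing is needed, because both statements come from the same source of truth.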

Related

RegEx match end character only when other character is not present

I'm having quite some trouble to define a regEx that I'm needing....
Basically the idea is to detect all lines that end with a , or a ; character. For this I have defined the following regex:
(,|;)$
Which works fine for this, but then I have the exception that if there's a * character within that line (not necessarily starting with, but at some position), then I don't want to detect that match. Based on this sample:
/**
* Here there's a comment I don't want to find,
* but after this comment I do
*/
detectMe;
other,
I would intend to find 2 groups, the first one
/**
* Here there's a comment I don't want to find,
* but after this comment I do
*/
detectMe;
And the second one
other,
I've tried many things such as non capturing groups, negative looks ahead and also start of a string with [^\s*\*] with no success. Is there a way to do this?
Some of the regEx I've tried...
^[^\*](.*?)(,|;)$
^[^\s*\*](.*?)(,|;)$
To match an optional C comment and the following line ending with ; or , you may use
/(?:\/\*+[^*]*\*+(?:[^\/*][^*]*\*+)*\/\r?\n)?.*[;,]$/gm
See this regex demo
Details
(?:\/\*+[^*]*\*+(?:[^\/*][^*]*\*+)*\/\r?\n)? - an optional (as there is a ? quantifier after the group) non-capturing group matching 1 or 0 occurrences of
\/\*+[^*]*\*+(?:[^\/*][^*]*\*+)*\/ - a C comment pattern
\r?\n - a CRLF or LF ending
.*[;,]$ - a whole line that ends with ; or , ($ is the end of a line anchor here due to m modifier).
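To see the two groups the question asks for, the pattern can be run against the sample input. This is only a sketch; the variable names are illustrative.

```javascript
// Pattern from the answer above: optional C comment, then a line ending in ; or ,
const COMMENT_THEN_LINE =
  /(?:\/\*+[^*]*\*+(?:[^\/*][^*]*\*+)*\/\r?\n)?.*[;,]$/gm;

// Sample input from the question, joined with LF line endings.
const source = [
  "/**",
  "* Here there's a comment I don't want to find,",
  "* but after this comment I do",
  "*/",
  "detectMe;",
  "other,",
].join("\n");

const groups = source.match(COMMENT_THEN_LINE) || [];
groups.forEach((g, i) => console.log(`group ${i + 1}:\n${g}\n`));
```

The comma at the end of the first comment line is not reported on its own, because the whole comment is consumed as part of the first match.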
You can use this regex:
/^[^*]*?[,;]$/gm
It will start by matching any number of characters other than '*', then match ',' or ';' at the end of the line. It uses the global and multiline flags to match all lines.

Regex consume a character if it matches, but not otherwise

I am trying to write a regex expression which will capture all instances of the '#' character, except when two such characters appear in succession (essentially, an escape sequence). For example:
abd#ajk: # should be matched
abd##ajk: No matches
abd###ajk: The final # should match.
abd####ajk: No matches
This almost works with the negative lookahead expression #(?!#), except that because the second # is not consumed, the last of two # symbols will still be matched. What I think I want to do is to lookahead for an # but consume the character if it is there; otherwise, do not consume it. Is this possible?
Edit: I'm using Javascript which unfortunately rules out several good approaches :(
In JavaScript, to split strings at an unescaped #, you may actually match chunks of text that consist of either ## (an escaped #) or any char other than #:
var strs = ['abd#ajk','abd##ajk','abd###ajk','abd####ajk'];
var rx = /(?:[^#]|##)+/g;
for (var s of strs) {
console.log(s, "=>", s.match(rx))
}
The regex is
/(?:[^#]|##)+/g
See its demo
Details
(?: - start of a non-capturing group that matches either of the 2 alternatives:
[^#] - any char other than #
| - or
## - 2 #s
)+ - repeat matching 1 or more times.
The g modifier finds all matching occurrences inside the input string.
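If the positions of the unescaped # characters are needed rather than the chunks between them, the same "consume the escape first" idea can be sketched with an alternation that tries ## before #. The helper name unescapedHashIndexes is hypothetical, and String.prototype.matchAll is assumed to be available (ES2020).

```javascript
// Hypothetical helper: indexes of every `#` that is not part of a `##` pair.
function unescapedHashIndexes(str) {
  const indexes = [];
  // `##` is listed first, so an escaped pair is always consumed as a unit;
  // a lone `#` is only matched when no pair starts at that position.
  for (const m of str.matchAll(/##|#/g)) {
    if (m[0] === "#") indexes.push(m.index);
  }
  return indexes;
}

for (const s of ["abd#ajk", "abd##ajk", "abd###ajk", "abd####ajk"]) {
  console.log(s, "=>", unescapedHashIndexes(s));
}
```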
Since you didn't tag a programming language to your question here is my 2 cents for Java:
(?<=(?<!#)(?:##){0,999})#(?!#)
Java doesn't support infinite lookbehinds but bounded so here I explicitly specified max of even occurrences of #: 999.
JavaScript
Lookbehinds in JavaScript are not yet implemented or supported by many browsers. If you are trying to do this in JS then this would be your working solution:
Method 1
((?:[^#]*(?:##)+[^#]*)+)|#
(?:[^#]*(?:##)+[^#]*)+ Match ## occurrences and all its leading / trailing characters
|# Or a single #
JS Code:
str.split(/((?:[^#]*(?:##)+[^#]*)+)|#/).filter(Boolean);
Method 2 (Recommended)
Or, if you don't have a problem with using match(), this is much cleaner and of course faster:
(?:[^#]*(?:##)+[^#]*)+|[^#]+
JS Code:
console.log(
"aaaa#######bbb#aa###cccc##ddddd#".match(/(?:[^#]*(?:##)+[^#]*)+|[^#]+/g)
);

regular expression, not reading entire string

I have a regular expression that is not working correctly.
This expression is supposed to catch if a string has invalid characters anywhere in the string. It works perfectly on RegExr.com but not in my tests.
The exp is: /[a-zA-Z0-9'.\-]/g
It is failing on : ####
but passing with : aa####
It should fail both times, what am I doing wrong?
Also, /^[a-zA-Z0-9'.\-]$/g matches nothing...
// All boxes
$('input[type="text"]').each(function () {
    var text = $(this).prop("value")
    var textTest = /[a-zA-Z0-9'.\-]/g.test(text)
    if (!textTest && text != "") {
        allFieldsValid = false
        $(this).css("background-color", "rgba(224, 0, 0, 0.29)")
        alert("Invalid characters found in " + text + " \n\n Valid characters are:\n A-Z a-z 0-9 ' . -")
    }
    else {
        $(this).css("background-color", "#FFFFFF")
        $(this).prop("value", text)
    }
});
edit:added code
UPDATE AFTER QUESTION RE-TAGGING
You need to use
var textTest = /^[a-zA-Z0-9'.-]+$/.test(text)
                               ^^
Note the absence of /g modifier and the + quantifier. There are known issues when you use /g global modifier within a regex used in RegExp#test() function.
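The /g issue comes from the lastIndex property: a global regex remembers where its last match ended, and the next test() call on the same object resumes from there. A minimal sketch of the effect, followed by a validator without /g (the name isValidText is an illustrative assumption):

```javascript
// A reused global regex keeps state between calls via lastIndex.
const globalRe = /[a-z]/g;
const first = globalRe.test("a");  // match found at 0, lastIndex becomes 1
const second = globalRe.test("a"); // search resumes at index 1 and fails
console.log(first, second);

// Hypothetical validator: anchored, quantified, and without the g flag.
function isValidText(text) {
  return /^[a-zA-Z0-9'.-]+$/.test(text);
}

console.log(isValidText("####"));
console.log(isValidText("aa####"));
console.log(isValidText("O'Neil-Smith.jr"));
```

A regex literal written inline in a function body is typically fine either way, but dropping /g removes the risk entirely when the regex is stored in a variable and reused.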
You may shorten it a bit with the help of the /i case insensitive modifier:
var textTest = /^[A-Z0-9'.-]+$/i.test(text)
Also, as I mention below, you do not have to escape the - at the end of the character class [...], but it is advisable to keep escaped if the pattern will be modified later by less regex-savvy developers.
ORIGINAL C#-RELATED DETAILS
Ok, say, you are using Regex.IsMatch(str, @"[a-zA-Z0-9'.-]"). The Regex.IsMatch searches for partial matches inside a string. So, if the input string contains an ASCII letter, digit, ', . or -, this will pass. Thus, it is logical that aa#### passes this test, and #### does not.
If you use the second one as Regex.IsMatch(str, @"^[a-zA-Z0-9'.-]$"), only 1 character strings (with an optional newline at the end) would get matched as ^ matches at the start of the string, [a-zA-Z0-9'.-] matches 1 character from the specified ranges/sets, and $ matches the end of the string (or right before the final newline).
So, you need a quantifier (+ to match 1 or more, or * to match zero or more occurrences) and the anchors \A and \z:
Regex.IsMatch(str, @"\A[a-zA-Z0-9'.-]+\z")
                     ^^              ^^^
\A matches the start of string (always) and \z matches the very end of the string in .NET. The [a-zA-Z0-9'.-]+ will match 1+ characters that are either ASCII letters, digits, ', . or -.
Note that - at the end of the character class does not have to be escaped (but you may keep the \- if some other developers will have to modify the pattern later).
And please be careful where you test your regexps. Regexr only supports JavaScript regex syntax. To test .NET regexps, use RegexStorm.net or RegexHero.
/^[a-zA-Z0-9'.-]+$/g
In the second case (aa####), your regex /[a-zA-Z0-9'.-]/g passed because it matched the first letter. To make it correct, you need to match the whole string (use ^ and $) and also allow more than one character by adding + (or * if you allow an empty string).
Try this regex; it matches any char that isn't part of the allowed charset:
/[^a-zA-Z0-9'.\-]+/g
Test
>>regex = /[^a-zA-Z0-9'.\-]+/g
/[^a-zA-Z0-9'.\-]+/g
>>regex.test( "####dsfdfjsakldfj")
true
>>regex.test( "dsfdfjsakldfj")
false

Query on Javascript RegEx

I need a regex that allows 0-9, a-z, A-Z, hyphen, question mark and "/" slash characters alone. Also the length should be between 5 to 15 only.
I tried as follows, but it does not work:
var reg3 = /^([a-zA-Z0-9?-]){4,15}+$/;
alert(reg3.test("abcd-"));
length should be between 5 to 15 only
Is that why you have this?
{4,15}+
Just use {5,15}; it’s already a quantifier, and a + after it won’t work. Apart from that, the group isn’t necessary, but things should work.
/^[a-zA-Z0-9?/-]{5,15}$/
(I also added a slash character.)
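A few sample inputs make the effect of {5,15} and the added slash easier to see. This is a sketch only; the sample strings are made up for illustration, and the slash is escaped inside the literal to be safe.

```javascript
// Allowed: letters, digits, ?, / and -, with a total length of 5 to 15.
const ALLOWED = /^[a-zA-Z0-9?\/-]{5,15}$/;

// Illustrative inputs: exact minimum, too short, mixed symbols,
// a disallowed character, and too long.
const samples = ["abcd-", "abcd", "a/b?c-1", "abc=12", "a".repeat(16)];
const results = samples.map((s) => ALLOWED.test(s));

samples.forEach((s, i) => console.log(JSON.stringify(s), "=>", results[i]));
```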
This is what you need:
if (/^([a-z0-9\/?-]{5,15})$/i.test(subject)) {
    // Successful match
} else {
    // Match attempt failed
}
REGEX EXPLANATION
^([a-z0-9\/?-]{5,15})$
Options: Case insensitive
Assert position at the beginning of the string «^»
Match the regex below and capture its match into backreference number 1 «([a-z0-9\/?-]{5,15})»
Match a single character present in the list below «[a-z0-9\/?-]{5,15}»
Between 5 and 15 times, as many times as possible, giving back as needed (greedy) «{5,15}»
A character in the range between “a” and “z” (case insensitive) «a-z»
A character in the range between “0” and “9” «0-9»
The literal character “/” «\/»
The literal character “?” «?»
The literal character “-” «-»
Assert position at the very end of the string «$»
A couple of issues:
you need {5,15} instead of {4,15}+
you need to include /
Your code can be rewritten as
var reg3 = new RegExp('^[a-z0-9?/-]{5,15}$', 'i'); // i flag to eliminate need of A-Z
alert(reg3.test("a1?-A7="));
Update
Let's not confuse can be with MUST be and concentrate on the actual thing I was trying to convey.
{4,15}+ part in /^([a-zA-Z0-9?-]){4,15}+$/ should be written as {5,15}, and / must be included; which will make your regexp
/^([a-zA-Z0-9?/-]){5,15}$/
which CAN be written as
/^[a-z0-9?/-]{5,15}$/i // i flag to eliminate need of A-Z
Also I hope everybody is OK with use of /i

Javascript RegExp Tokenizing

Given a string, I want to use a regular expression to tokenize it. The pattern is as follows: any character (including new line, etc.), until "<", followed by a space zero or more times, followed by "%".
I tried
var patt = /(.)*<(\s)*%/;
but it does not yield the desired result. I would appreciate an explanation along with the pattern.
Use this:
"some string".split(/.*<\s*%/);
/^[\s\S]*?< *%/
should do what you want.
^ causes it to match at the beginning of the string.
[\s\S] matches any character. Literally, it means any space or non-space character, and works around the fact that . does not match newlines.
*? matches zero or more but the fewest necessary for the rest of the pattern to match.
< matches a literal '<'
" *" (a space followed by *) matches zero or more spaces. This is more readable if written as [ ]*.
% finally matches that character.
If you want to match the entire string (i.e. the % should be the last character in the string), then you can put a $ before the last /.
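The difference between [\s\S] and . shows up as soon as the input contains a newline. A minimal sketch, with a made-up input string:

```javascript
// Illustrative input with a newline before the "<  %" delimiter.
const input = "line one\nline two <  % rest";

// [\s\S]*? crosses the newline and stops at the first "<", spaces, "%".
const m = input.match(/^[\s\S]*?< *%/);
const consumed = m ? m[0] : "";
const remainder = input.slice(consumed.length);

// For comparison: "." does not match newlines, so only the second line is matched.
const dotOnly = input.match(/.*<\s*%/);

console.log(JSON.stringify(consumed));
console.log(JSON.stringify(remainder));
console.log(JSON.stringify(dotOnly && dotOnly[0]));
```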
