Why does this regex match in JavaScript?

Fiddle at http://jsfiddle.net/42zcL/
I have the following code, which should alert "No Match". If I put the regex into regexpal.com and run it, it doesn't match (as expected). With this code, it does match. I know there is another way to do it, which works correctly - /^((.*)Waiting(.*))?$/, but I am curious as to why this one fails. It should match a string with the text "Waiting" in it or nothing at all.
var teststring="Anything";
if (teststring.match(/^((.*)Waiting(.*))|()$/)) alert('match');
else alert('No Match');
EDIT: Clearer example:
var teststring="b";
if (teststring.match(/^(a)|()$/)) alert('match');
else alert('No Match');
Produces a Match, when I would expect "No Match"
Expected behaviour, as per regexpal.com:
teststring: a = match
teststring: b = no match
Actual behaviour in javascript:
teststring: a = match
teststring: b = match

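Because you have |()$ at the end. The | operator has the lowest precedence in a regex, so the pattern is read as "^(a) or ()$", which is like saying "Match what comes before |, but if you don't find it, match the empty string as long as it sits at the end of the input."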
Hopefully this explains it a little better:
The use of () in RegEx does not mean "Don't match anything". An empty group matches the empty string, and it can do so at any position in the string. Imagine it like this: the word "Anything" turned into an array - [A,n,y,t,h,i,n,g] - with n = length of that array. Positions 0 to n-1 hold the letters and position n is the end of the string. The empty group matches at position n, and $ succeeds there as well, so the alternative ()$ results in a "match".
Since |()$ therefore returns a positive result on any word tested, you will always see "match" in your alert.
I'm pretty terrible at conveying my thoughts so maybe this previous stack answer will fill in whatever holes my answer left open.
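To make the precedence issue concrete, here is a minimal sketch (the variable names are only for illustration). It compares the pattern from the question with a version where the alternation is wrapped in a non-capturing group, so that ^ and $ apply to both alternatives:

```javascript
// Pattern from the question: parsed as (^(a)) OR (()$).
const original = /^(a)|()$/;
// Grouped variant: ^ and $ now apply to both alternatives.
const grouped = /^(?:(a)|())$/;

console.log(original.test("b"));        // true: the empty alternative matches at the end of "b"
console.log("b".match(original).index); // 1, i.e. the end of the string
console.log(grouped.test("b"));         // false
console.log(grouped.test("a"));         // true
console.log(grouped.test(""));          // true: the empty string is still accepted
```

The same grouping should work for the longer pattern, for example /^(?:.*Waiting.*)?$/.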

Related

Javascript issue: Script that checks to see if words were included in an input box. Whole words only

I have an input box and that's working great, but my issue is with the script that checks to see if certain words were included in it. Right now I have:
$gameVariables.value(1).toLowerCase().includes("steal") || $gameVariables.value(1).toLowerCase().includes("thief") || $gameVariables.value(1).toLowerCase().includes("pickpocket")
This is working fine except I would prefer it to work for whole words only. How would I go about doing that?
I was also wondering if there's a way to shorten this script so I don't have to copy and paste the same line for every word. I've tried the following code, which doesn't work (it only works for the first word, not for the ones that come after):
$gameVariables.value(1).toLowerCase().includes("steal","thief","pickpocket")
If you only want to accept input that exactly matches thief, steal or pickpocket, you can use the following code:
const value = $gameVariables.value(1);
const regex = /^(thief|steal|pickpocket)$/i;
return regex.test(value);
So to explain the regex above,
^ will make sure that the regex matches from the start
( and ) creates a group for which regex should match
| is like or operator in coding
$ anchors the match at the end of the string, so nothing may follow the matched word
i is a flag that tells the regex to ignore case while doing the check, so you won't need to use .toLowerCase() here either
So the code above checks whether the input exactly matches thief, steal or pickpocket, and returns false if the input is thiefs or thief steal
But if you don't want the exact match and only want to know if the input value contains these words then the simple answer would be to not use ^ and $
const value = $gameVariables.value(1);
const regex = /(thief|steal|pickpocket)/i;
return regex.test(value);
This returns true for exact matches such as thief, steal and pickpocket, and also when any of the words appears inside a longer statement such as "this statement contains thief". It returns false for input such as the or stl. The problem is that it also matches steals, so it returns true for "this statement contains steals". To fix that, you can use the following code:
const value = $gameVariables.value(1);
const regex = /\b(thief|steal|pickpocket)\b/i;
return regex.test(value);
\b matches a word boundary, so wrapping the group in \b on both sides performs a "whole words only" search.
Thus, it returns false for "this statement contains steals" but returns true for "this statement contains steal" or for input that is exactly steal.
You can define an array ([]) of words to look for and then do this:
// Your list of words to look for
const words = ["steal", "thief", "pickpocket", "quick"];
// Your string
const str = "The quick brown fox jumps over the lazy dog";
const result = words.some(w => str.includes(w));
console.log(result);
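Array.prototype.some will return true if any of the words are found in your string. Note that String.prototype.includes is case-sensitive and also matches inside longer words, so on its own this is not a whole-word check.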
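The two ideas above (a word list and \b word boundaries) can be combined. The following is only a sketch: escapeRegExp and containsAnyWord are hypothetical helper names, and it assumes every word in the list starts and ends with a letter, digit or underscore, since \b only marks boundaries next to such characters.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true if `text` contains any of `words` as a whole word,
// ignoring case.
function containsAnyWord(text, words) {
  const alternatives = words.map(escapeRegExp).join("|");
  const pattern = new RegExp("\\b(?:" + alternatives + ")\\b", "i");
  return pattern.test(text);
}

const wordList = ["steal", "thief", "pickpocket"];
console.log(containsAnyWord("I want to STEAL the gem", wordList)); // true
console.log(containsAnyWord("He steals things", wordList));        // false
```

Adding a word then only means adding an entry to the array.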

Regex (Javascript) - Match certain chat queries

So I'm posting due to me having spent several hours working on a filter that should record only certain chat messages based on the start of said message. I've reached a point where it's about fifty-fifty, but my lack of knowledge regarding regex has stopped me from being able to continue working on it.
Basically, the expression is supposed to match messages that are one of a few annoying things. My apologies if this gets too specific; I'm unsure of how to get all of the conditions working together.
1. "word": (any word that is not "notice" or "type: s") - So anything like John:
2. word_word: (this time, the second word can be anything) - Something like John_Smith:
3. [Tag]word: or [Tag]word_word: (where a tag is either a unicode character or two characters between square brackets) - Something like [DM]Tom_Cruise: or such
4. One of the above, minus the colon. This is where I'm having issues. Something like [DM]Tom_Cruise waves.
5. Starts with (WHISPER) or (SHOUT). It doesn't matter what comes after it, in this case.
I've managed to get a regex that works with most of the situations, but I can't get condition 4 to work without getting unwanted messages.
In addition, if the message (received as a string per line) starts with (OOC), it shouldn't be matched. If it says (OOC) in the message later on, it's alright. If the string ends with "joined the game." or "left the game.", it should also not match.
So... yeah, I'm completely stuck on getting condition 4 to work, and hoped that the community that helped me get this far wouldn't mind answering a (hopefully not too specific) question about it. Here's the expression as I've gotten it:
(?!^\(OOC\))(_[a-z]+:)|(^[a-z]+:)|(^[a-z]+ [a-z]+ )
It can match most of the above conditions, except for 4 and some of 1. I can't figure out how to get the specific words (notice: and type:s) to not match, and 4 is just messing up some of my other conditions. And lastly, it doesn't seem to stop matches if, despite starting with (OOC), the string matches another condition.
Sorry if this is too specific, but I'm completely stuck and basically just picked up regex today. I'll take anything.
EDIT
Examples:
[AT]Smith_Johnson: "Hello there." - matches under Condition 3, works
Tom_Johnson: moves to the side. - matches under Condition 2, works
Notice: That private wooden door is locked. - should not match due to Condition 1, but currently does
Tom hops around like a fool. - Should match under Condition 4, doesn't
(OOC)SmithsonsFriend: hey guys, back - matches, but shouldn't under the not-match specifiers
(WHISPER)Bob_Ross: "Man, this is lame." - Condition 5
West Coast: This is a lovely place to live. - doesn't match due to whitespace, that's good
Joe joined the game. - matches, shouldn't under the not-match specifiers
EDIT TWO
To clarify:
A) string starts with (OOC) - never match
B) string starts with (WHISPER) or (SHOUT) - always match
If neither A nor B apply, then go to conditions 1-4.
You can use this regular expression:
^(?:\(shout\)|\(whisper\))?(?:\[[A-Z]{1,2}\])?(?!Notice|Note)[A-Za-z]*(?:_[A-Za-z]*)?(?::|\s(?![A-Za-z]*:))(?!(?:joined|left) the game)
^ Start of the string (make sure to check line by line)
(?:\(shout\)|\(whisper\))? allows an optional prefix such as (shout) or (whisper)
(?:\[[A-Z]{1,2}\])? matches a non-capturing group with 1 or 2 A-Z characters inside [] which is optional (because of the ? at the end)
(?!Notice|Note) negative lookahead: the name that follows must not start with one of these words
[A-Za-z]* matches as many alphabetic characters as possible
(?:_[A-Za-z]*)? optionally matches a _ followed by alphabetic characters
(?::|\s(?![A-Za-z]*:)) matches a : or a whitespace character \s, which however cannot be followed by a word and a colon ([A-Za-z]*:)
(?!(?:joined|left) the game) negative lookahead: the whole regex does not match if this pattern matches at this point
You should add the case insensitive flag /i in your regex, if you want to e.g. match (whisper) and (WHISPER).
→ Here are your example texts in an updated regex101 for a live test
Instead of making it one big (HUGE) regular expression, you could write a function that takes a message and checks it against a number of regular expressions (much more flexible and much easier to implement). Like this:
function isValid(msg){
// starts with "(WHISPER)" or "(SHOUT)" (the parentheses are part of the prefix)
if(/^\((?:whisper|shout)\)/i.test(msg)) return true;
// Check if it begins with "notice:" or "type:"
if(/^(?:notice|type)\s*:/i.test(msg)) return false;
// Check if it ends with "joined the game" or "left the game."
if(/(?:joined|left)\s+the\s+game\.?$/i.test(msg)) return false;
// starts with "(ooc)"
if(/^\(ooc\)/i.test(msg)) return false;
// "[at]word:" or "[a]word_word" or "word:" or "word_word" ...
if(/^(?:\[[a-z]{1,2}\])?[a-z_]+:?.*$/i.test(msg)) return true;
return false;
}
Example:
function isValid(msg) {
if (/^\((?:whisper|shout)\)/i.test(msg)) return true;
if (/^(?:notice|type)\s*:/i.test(msg)) return false;
if (/(?:joined|left)\s+the\s+game\.?$/i.test(msg)) return false;
if (/^\(ooc\)/i.test(msg)) return false;
if (/^(?:\[[a-z]{1,2}\])?[a-z_]+:?.*$/i.test(msg)) return true;
return false;
}
function check() {
var string = prompt("Enter a message: ");
if(isValid(string))
alert(string + " is valid!");
else
alert(string + " is not valid!");
}
<button onclick="check()">TRY</button>
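As a further sketch, the step-by-step idea can be combined with the negative lookahead from the single-regex answer so that all eight example lines from the question come out as described. The function name shouldRecord is hypothetical, and the rules are my reading of the question (in particular, that the excluded words are "notice" and "type"), so treat it as a starting point rather than a finished filter.

```javascript
// Hypothetical filter: returns true if the chat line should be recorded.
function shouldRecord(line) {
  // A) lines starting with (OOC) never match
  if (/^\(ooc\)/i.test(line)) return false;
  // B) lines starting with (WHISPER) or (SHOUT) always match
  if (/^\((?:whisper|shout)\)/i.test(line)) return true;
  // join/leave notifications never match
  if (/(?:joined|left) the game\.$/i.test(line)) return false;

  // Conditions 1-4: optional [Tag], then word or word_word, followed either by
  // a colon or by whitespace that is NOT followed by "word:" (as in "West Coast:").
  const m = /^(?:\[[^\]]{1,2}\])?([a-z]+)(?:_[a-z]+)?(?::|\s(?![a-z]*:))/i.exec(line);
  if (m === null) return false;

  // The first word must not be "notice" or "type".
  return !/^(?:notice|type)$/i.test(m[1]);
}

const examples = [
  '[AT]Smith_Johnson: "Hello there."',
  "Tom_Johnson: moves to the side.",
  "Notice: That private wooden door is locked.",
  "Tom hops around like a fool.",
  "(OOC)SmithsonsFriend: hey guys, back",
  '(WHISPER)Bob_Ross: "Man, this is lame."',
  "West Coast: This is a lovely place to live.",
  "Joe joined the game.",
];
for (const line of examples) {
  console.log(shouldRecord(line), line);
}
```

Keeping each rule on its own line makes it easy to reorder or drop a rule when the chat format changes.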

How to look for a pattern that might be missing some characters, but following a certain order?

I am trying to write a validation for "KQkq" <or> "-". In the first case, any of the letters can be missing (except all of them, in which case the value should be "-"). The order of the characters is also important.
Some quick examples of legal values:
-
Kkq
q
This is for a chess FEN validation. I have validated the first two parts using:
var fen_parts = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1";
fen_parts = fen_parts.split(" ");
if(!fen_parts[0].replace(/[1-8/pnbrqk]/gi,"").length
&& !fen_parts[1].replace(/[wb]/,"").length
&& !fen_parts[2].replace(/[kq-]/gi,"").length /*not working, allows KKKKKQkq to be valid*/
){
//...
}
But simply using /[kq-]/gi to validate the third part lets too many things through. Here are some quick examples of illegal values:
KKKKQkq (there is more than one K)
QK (order is incorrect)
You can do
-|K?Q?k?q?
though you will need to do a second test to ensure that the input is not empty. Alternatively, using only regex:
KQ?k?q?|Qk?q?|kq?|q|-
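Both suggestions need ^ and $ around the whole alternation before they can be used for validation. A minimal sketch of the two variants (the names isValidCastling and castlingOnly are only for illustration):

```javascript
// Variant 1: optional letters in fixed order, plus a separate non-empty check.
function isValidCastling(field) {
  return field.length > 0 && /^(?:-|K?Q?k?q?)$/.test(field);
}

// Variant 2: regex only; every alternative requires at least one character.
const castlingOnly = /^(?:KQ?k?q?|Qk?q?|kq?|q|-)$/;

for (const field of ["-", "Kkq", "q", "KQkq", "KKKKQkq", "QK", ""]) {
  console.log(JSON.stringify(field), isValidCastling(field), castlingOnly.test(field));
}
```

The non-capturing group matters here: without it, ^ would only apply to the first alternative and $ only to the last.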
This seems to work for me...
^(-|(K)?((?!\2)Q)?((?!\2\3)k)?((?!\2\3\4)q)?)$
A .match() returns null if the expression did not match. In that case you can use the logical OR to default to an array with an empty-string (a structure similar to the one returned by .match() on a successful match), which will allow you to check the length of the matched expression. The length will be 0 if the expression did not match, or K?Q?k?q? matched the empty string. If the pattern matches, the length will be > 0. in code:
("KQkq".match(/^(?:K?Q?k?q?|-)$/) || [""])[0].length
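Because | has lower precedence than you might expect (lower than the ^ and $ anchors next to it), it is necessary to wrap the alternatives in a non-capturing group (?:...) so that both anchors apply to every alternative.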
Having answered the question, let's have a look at the rest of your code:
if (!fen_parts[0].replace(/[1-8/pnbrqk]/gi,"").length)
is, from JavaScript's perspective, equivalent to
if (!fen_parts[0].match(/[^1-8/pnbrqk]/gi))
which translates to "false if the string contains any character other than 1-8/pnbrqk". This notation is not only simpler to read, it also avoids building a new string (replace) and measuring it (length).

Find longest repeating substring in JavaScript using regular expressions

I'd like to find the longest repeating string within a string, implemented in JavaScript and using a regular-expression based approach.
I have a PHP implementation that, when directly ported to JavaScript, doesn't work.
The PHP implementation is taken from an answer to the question "Find longest repeating strings?":
preg_match_all('/(?=((.+)(?:.*?\2)+))/s', $input, $matches, PREG_SET_ORDER);
This will populate $matches[0][X] (where X is the length of $matches[0]) with the longest repeating substring to be found in $input. I have tested this with many input strings and am confident the output is correct.
The closest direct port in JavaScript is:
var matches = /(?=((.+)(?:.*?\2)+))/.exec(input);
This doesn't give correct results:
input            Expected result   matches[0][X]
======================================================
inputinput       input             input
7inputinput      input             input
inputinput7      input             input
7inputinput7     input             7
XXinputinputYY   input             XX
I'm not familiar enough with regular expressions to understand what the regular expression used here is doing.
There are certainly algorithms I could implement to find the longest repeating substring. Before I attempt to do that, I'm hoping a different regular expression will produce the correct results in JavaScript.
Can the above regular expression be modified such that the expected output is returned in JavaScript? I accept that this may not be possible in a one-liner.
In JavaScript, exec only returns one match per call, so you have to loop in order to find multiple results. A little testing shows this gets the expected results:
function maxRepeat(input) {
  // The lookahead makes every match zero-length; group 2 holds the repeated substring.
  var reg = /(?=((.+)(?:.*?\2)+))/g;
  var maxstr = ""; // our maximum length repeated string
  var sub; // current result of exec
  while ((sub = reg.exec(input)) !== null) {
    if (sub[2].length > maxstr.length) {
      maxstr = sub[2];
    }
    // a zero-length match does not advance lastIndex, so move to the next position manually
    reg.lastIndex++;
  }
  return maxstr;
}
// I'm logging to console for convenience
console.log(maxRepeat("aabcd")); //a
console.log(maxRepeat("inputinput")); //input
console.log(maxRepeat("7inputinput")); //input
console.log(maxRepeat("inputinput7")); //input
console.log(maxRepeat("7inputinput7")); //input
console.log(maxRepeat("xxabcdyy")); //x
console.log(maxRepeat("XXinputinputYY")); //input
Note that for "xxabcdyy" you only get "x" back, as it returns the first string of maximum length.
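In newer engines the manual lastIndex handling can be avoided. The following sketch assumes an environment with String.prototype.matchAll (ES2020) and the s flag (ES2018); longestRepeat is a hypothetical name. matchAll steps past zero-length matches by itself, and the s flag mirrors the /s modifier of the PHP original so that . also matches newlines.

```javascript
// Hypothetical variant of the loop above using matchAll.
function longestRepeat(input) {
  let longest = "";
  // Each match is zero-length (lookahead); group 2 is the repeated substring.
  for (const m of input.matchAll(/(?=((.+)(?:.*?\2)+))/gs)) {
    if (m[2].length > longest.length) {
      longest = m[2];
    }
  }
  return longest;
}

console.log(longestRepeat("7inputinput7"));   // input
console.log(longestRepeat("XXinputinputYY")); // input
console.log(longestRepeat("abc"));            // empty string: nothing repeats
```

Like the loop above, this keeps the first substring of maximum length. The backreference makes the pattern expensive on long inputs, so a dedicated algorithm may still be the better choice there.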
It seems JS regexes are a bit weird. I don't have a complete answer, but here's what I found.
Although I thought they did the same thing, re.exec() and "string".match(re) behave differently. With /g, exec only returns one match per call (the next call continues from lastIndex), whereas match returns all of the full matches at once, without the capture groups.
On the other hand, exec seems to work correctly with ?= in the regex, whereas match returns only empty strings: a lookahead consumes no characters, so every full match is empty. Removing the ?= leaves us with
re = /((.+)(?:.*?\2)+)/g
Using that
"XXinputinputYY".match(re);
returns
["XX", "inputinput", "YY"]
whereas
re.exec("XXinputinputYY");
returns
["XX", "XX", "X"]
So at least with match you get inputinput as one of your values. Obviously, this neither pulls out the longest, nor removes the redundancy, but maybe it helps nonetheless.
One other thing, I tested in firebug's console which threw an error about not supporting $1, so maybe there's something in the $ vars worth looking at.
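The difference between the two calls can be reproduced with a short sketch (variable names are only for illustration):

```javascript
const input = "XXinputinputYY";

// match with /g: every full match at once, no capture groups.
const all = input.match(/((.+)(?:.*?\2)+)/g);
console.log(all); // the three full matches: XX, inputinput, YY

// exec with /g: one match per call, including groups; lastIndex stores where to continue.
const re = /((.+)(?:.*?\2)+)/g;
const first = re.exec(input);   // first[0] is "XX", first[2] is "X"
const afterFirst = re.lastIndex; // 2
const second = re.exec(input);  // second[0] is "inputinput", second[2] is "input"
console.log(first[2], afterFirst, second[2]);

// With the lookahead, every full match is the empty string, so match() only
// shows empty strings; the useful text sits in the capture groups.
const empties = input.match(/(?=((.+)(?:.*?\2)+))/g);
console.log(empties.every(function (m) { return m === ""; })); // true
```

So the empty strings are not an engine bug: match with /g simply does not expose the groups that the lookahead captured.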

Using javascript regexp to find the first AND longest match

I have a RegExp like the following simplified example:
var exp = /he|hell/;
When I run it on a string it will give me the first match, e.g.:
var str = "hello world";
var match = exp.exec(str);
// match contains ["he"];
I want the first and longest possible match,
and by that I mean sorted by index, then length.
Since the expression is combined from an array of RegExp's, I am looking for a way to find the longest match without having to rewrite the regular expression.
Is that even possible?
If it isn't, I am looking for a way to easily analyze the expression and arrange it in the proper order. But I can't figure out how, since the expressions could be a lot more complex, e.g.:
var exp = /h..|hel*/
How about /hell|he/ ?
All regex implementations I know of will (try to) match characters/patterns from left to right and terminate whenever they find an over-all match.
In other words: if you want to make sure you get the longest possible match, you'll need to try all your patterns (separately), store all matches and then get the longest match from all possible matches.
You can do it. It's explained here:
http://www.regular-expressions.info/alternation.html
(In summary: change the order of the alternatives so that the longer one comes first, as in /hell|he/, or make the extra part optional with a question mark, as in /he(ll)?/.)
You cannot do "longest match" (or anything involving counting, minus look-aheads) with regular expressions.
Your best bet is to find all matches, and simply compare the lengths in the program.
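A minimal sketch of that approach, assuming the individual expressions are still available as an array and carry no g or y flag (so exec always starts at index 0). firstLongestMatch is a hypothetical name. It runs each pattern separately and keeps the match with the lowest index, breaking ties by length:

```javascript
// Hypothetical helper: earliest match across all patterns, longest on ties.
function firstLongestMatch(patterns, str) {
  let best = null;
  for (const pattern of patterns) {
    const m = pattern.exec(str);
    if (m === null) continue;
    if (
      best === null ||
      m.index < best.index ||
      (m.index === best.index && m[0].length > best[0].length)
    ) {
      best = m;
    }
  }
  return best; // null if no pattern matched
}

console.log(firstLongestMatch([/he/, /hell/], "hello world")[0]);   // hell
console.log(firstLongestMatch([/h../, /hel*/], "hello world")[0]);  // hell
```

Each pattern still reports only its own leftmost match, so this compares one candidate per pattern rather than every possible match.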
I don't know if this is what you're looking for (Considering this question is almost 8 years old...), but here's my grain of salt:
(Putting hell before he makes the regex try the longer alternative first.)
var exp = /hell|he/;
var str = "hello world";
var match = exp.exec(str);
if(match)
{
match.sort(function(a, b){return b.length - a.length;});
console.log(match[0]);
}
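Here match[0] is the longest string in the array returned by exec, which holds the full match followed by any captured groups. This pattern has no groups, so it is simply the full match, hell.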
