Regex (JavaScript) - Match certain chat queries

So I'm posting due to me having spent several hours working on a filter that should record only certain chat messages based on the start of said message. I've reached a point where it's about fifty-fifty, but my lack of knowledge regarding regex has stopped me from being able to continue working on it.
Basically, the expression is supposed to match messages that are one of a few annoying things. My apologies if this gets too specific; I'm unsure how to get all of the conditions working together.
1. "word": (any word that is not "notice" or "type: s") - so anything like John:
2. word_word: (this time, the second word can be anything) - something like John_Smith:
3. [Tag]word: or [Tag]word_word: (where a tag is either a Unicode character or two characters between square brackets) - something like [DM]Tom_Cruise:
4. One of the above, minus the colon. This is where I'm having issues. Something like [DM]Tom_Cruise waves.
5. Starts with (WHISPER) or (SHOUT). In this case, it doesn't matter what comes after it.
I've managed to get a regex that works with most of the situations, but I can't get condition 4 to work without getting unwanted messages.
In addition, if the message (received as a string per line) starts with (OOC), it shouldn't be matched. If it says (OOC) in the message later on, it's alright. If the string ends with "joined the game." or "left the game.", it should also not match.
So... yeah, I'm completely stuck on getting condition 4 to work, and hoped that the community that helped me get this far wouldn't mind answering a (hopefully not too specific) question about it. Here's the expression as I've gotten it:
(?!^\(OOC\))(_[a-z]+:)|(^[a-z]+:)|(^[a-z]+ [a-z]+ )
It can match most of the above conditions, except for 4 and some of 1. I can't figure out how to get the specific words (notice: and type:s) to not match, and 4 is just messing up some of my other conditions. And lastly, it doesn't seem to stop matches if, despite starting with (OOC), the string matches another condition.
Sorry if this is too specific, but I'm completely stuck and basically just picked up regex today. I'll take anything.
EDIT
Examples:
[AT]Smith_Johnson: "Hello there." - matches under Condition 3, works
Tom_Johnson: moves to the side. - matches under Condition 2, works
Notice: That private wooden door is locked. - should not match due to Condition 1, but currently does
Tom hops around like a fool. - Should match under Condition 4, doesn't
(OOC)SmithsonsFriend: hey guys, back - matches, but shouldn't under the not-match specifiers
(WHISPER)Bob_Ross: "Man, this is lame." - Condition 5
West Coast: This is a lovely place to live. - doesn't match due to whitespace, that's good
Joe joined the game. - matches, shouldn't under the not-match specifiers
EDIT TWO
To clarify:
A) string starts with (OOC) - never match
B) string starts with (WHISPER) or (SHOUT) - always match
If neither A nor B apply, then go to conditions 1-4.

You can use this regular expression:
^(?:\(shout\)|\(whisper\))?(?:\[[A-Z]{1,2}\])?(?!Notice|Note)[A-Za-z]*(?:_[A-Za-z]*)?(?::|\s(?![A-Za-z]*:))(?!(?:joined|left) the game)
^ anchors the match at the start of the string (make sure to test line by line)
(?:\(shout\)|\(whisper\))? allows an optional (shout) or (whisper) prefix
(?:\[[A-Z]{1,2}\])? is an optional non-capturing group matching 1 or 2 A-Z characters inside [] (optional because of the ? at the end)
(?!Notice|Note) is a negative lookahead listing the words that must not start the name that follows
[A-Za-z]* matches as many alphabetic characters as possible
(?:_[A-Za-z]*)? optionally matches a _ followed by alphabetic characters
(?::|\s(?![A-Za-z]*:)) matches a : or a whitespace character \s that is not followed by [A-Za-z]*: (a word and a colon)
(?!(?:joined|left) the game) is a negative lookahead: the whole regex fails to match if this pattern matches at that point
You should add the case-insensitive flag i to your regex if you want to match, e.g., both (whisper) and (WHISPER).
→ Here are your example texts in an updated regex101 for a live test

Instead of making it one big (HUGE) regular expression, you could write a function that takes a message and checks it against a number of smaller regular expressions (much more flexible and much easier to implement). Like this:
function isValid(msg) {
  // starts with "(WHISPER)" or "(SHOUT)"
  if (/^\((?:whisper|shout)\)/i.test(msg)) return true;
  // begins with "notice:" or "type:"
  if (/^(?:notice|type)\s*:/i.test(msg)) return false;
  // ends with "joined the game." or "left the game."
  if (/(?:joined|left)\s+the\s+game\.?$/i.test(msg)) return false;
  // starts with "(ooc)"
  if (/^\(ooc\)/i.test(msg)) return false;
  // "[at]word:" or "[a]word_word" or "word:" or "word_word" ...
  if (/^(?:\[[a-z]{1,2}\])?[a-z_]+:?.*$/i.test(msg)) return true;
  return false;
}
Example (uses the isValid function above):
function check() {
  var string = prompt("Enter a message: ");
  if (isValid(string))
    alert(string + " is valid!");
  else
    alert(string + " is not valid!");
}
<button onclick="check()">TRY</button>
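As a rough check of the function-based approach, the sketch below encodes the rules from the question's EDIT TWO and runs them over the question's own example lines. The names shouldRecord, SPEAKER and EXCLUDED_WORDS are hypothetical, and the tag is assumed to be one or two arbitrary characters in square brackets, so treat it as a starting point and not as a finished filter.

```javascript
// Hypothetical names; the rules come from the question, not from any library.
const EXCLUDED_WORDS = new Set(["notice", "type"]);

// Optional [tag] of 1-2 characters, then word or word_word, then either a colon
// or a space that is not followed by "word:" (this rules out "West Coast:").
const SPEAKER = /^(?:\[[^\]]{1,2}\])?([a-z]+)(?:_[a-z]+)?(?::|\s(?![a-z]*:))/i;

function shouldRecord(line) {
  if (/^\(ooc\)/i.test(line)) return false;                    // A) never match
  if (/^\((?:whisper|shout)\)/i.test(line)) return true;       // B) always match
  if (/(?:joined|left) the game\.$/.test(line)) return false;  // join/leave notices
  const m = SPEAKER.exec(line);
  // Conditions 1-4: a speaker prefix whose first word is not excluded.
  return m !== null && !EXCLUDED_WORDS.has(m[1].toLowerCase());
}

const examples = [
  '[AT]Smith_Johnson: "Hello there."',
  "Tom_Johnson: moves to the side.",
  "Notice: That private wooden door is locked.",
  "Tom hops around like a fool.",
  "(OOC)SmithsonsFriend: hey guys, back",
  '(WHISPER)Bob_Ross: "Man, this is lame."',
  "West Coast: This is a lovely place to live.",
  "Joe joined the game.",
];
for (const line of examples) console.log(shouldRecord(line), line);
```

Keeping the excluded words in a set, separate from the pattern, means new words can be added without touching the regular expression.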

Related

JavaScript regex inline validation for basic calculation string with one operator

I've written a basic 2 operand calculator app (+ - * /) that uses a couple of inline regex validations to filter away invalid characters as they are typed.
An example looks like:
// check if an operator is present
if (/[+\-*\/]/.test(display1.textContent)) {
  // validate the string each time a new character is added
  if (!/^\d+\.?\d*[+\-*\/]?\d*\.?\d*$/.test(display1.textContent)) {
    console.log('invalid')
    return false
  }
// validate the string character by character before the operator
} else {
  if (!/^\d+\.?\d*$/.test(display1.textContent)) {
    console.log('invalid')
    return false
  }
}
In the above, a valid character doesn't return false:
23.4x0.00025 (no false returned and hence the string is typed out)
But, if an invalid character is typed the function returns false and the input is filtered away:
23.4x0.(x) x at the end returns a false so is filtered (only one operator allowed per calculation)
23.4x0. is typed
It works pretty well but allows for the following which I would like to deal with:
2.+.1
I would prefer 2.0+0.1
My regex would need an if-then-else conditional stating that if the current character is '.' then the next character must be a number else the next char can be number|.|operator. Or if the current character is [+-*/] then the next character must be a number, else the next char can be any char (while following the overall logic).
The tricky part is that the logic must process the string as it is typed character by character and validate at each addition (and be accurate), not at the end when the string is complete.
if-then-else regex is not supported in JavaScript (which I think would satisfy my needs) so I need to use another approach whilst remaining within the JS domain.
Any suggestions about this specific problem would be really helpful.
Thanks
https://github.com/jdineley/Project-calculator
Thanks @trincot for the tips on using capturing groups and lookaround. This helped me write what I needed:
https://regex101.com/r/khUd8H/1
The GitHub app is updated and works as desired. Now I just need to make it pretty!
To ensure that an operator is not allowed when the preceding number ends in a point, you can insert a positive lookbehind in your regex that requires the character before an operator to always be a digit: (?<=\d)
Demo:
const validate = s => /^(\d+(\.\d*)?((?<=\d)[+*/-]|$))*$/.test(s);
document.querySelector("input").addEventListener("input", function () {
this.style.backgroundColor = validate(this.value) ? "" : "orange";
});
Input: <input>
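To see how the lookbehind behaves while a string is being typed, here is a small sketch that runs the same validate function over a few intermediate inputs. The sample strings are my own; the pattern is copied from the demo, and it assumes an engine with lookbehind support (ES2018 or later).

```javascript
// Same pattern as in the demo; (?<=\d) requires a digit right before an operator.
const validate = s => /^(\d+(\.\d*)?((?<=\d)[+*/-]|$))*$/.test(s);

const inputs = ["2.", "2.0+", "2.0+0.1", "2.+", "2.+.1", "2+.1"];
for (const s of inputs) {
  console.log(JSON.stringify(s), validate(s));
}
// "2." true, "2.0+" true, "2.0+0.1" true, "2.+" false, "2.+.1" false, "2+.1" false
```

Note that "2." stays valid because a trailing point is a legitimate intermediate state; only an operator typed directly after the point is rejected. The pattern also accepts chained operations such as "1+2*3", so a one-operator limit would need an extra check.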

Javascript regular expression to capture every possible mathematical operation between parenthesis

I am trying to capture mathematical expressions between parentheses in a string with JavaScript. I need to capture parentheses that ONLY include numbers and the mathematical operators +, -, *, /, %, plus the decimal dot. The examples below demonstrate what I am after. I managed to get close to the desired result, but nested parentheses always break my regex, so I need help! I also need it to search globally, not for the first occurrence only.
let string = "If(2>1,if(a>100, (-2*(3-5)(8-2)), (1+2)), (3(1+2)) )";
What I want to do if possible is manage to transform this syntax
if(condition, iftrue, iffalse)
to this syntax
if(condition) { iftrue } else { iffalse }
so that it can be evaluated by JavaScript and previewed in the browser. I have gotten this far, but if the iftrue or iffalse blocks contain parentheses, everything blows up! So I'm trying to capture those parentheses and calculate them before transforming the syntax. Any advice is appreciated.
The closest I got was /[\d()+-*/.]/g, which gets what I want, but in this example
(1+2) (1 < 1) sdasdasd (1*(2+3))
instead of dismissing the (1<1) group entirely, it matches (1 and 1). My ideal scenario would be
(1+2) (1<1) sdasdasd (1*(2+3))
Another example:
let codeToEval = "if(a>10, 2, 2*(b+4))";
codeToEval is then passed to a function that replaces a and b with the correct values, so it ends up like this:
codeToEvalAfterReplacement = "if(5>10,2,2*(5+4))";
And now I want to transform this into
if(5>10) {
2
} else {
2*(5+4)
}
so it can be evaluated by javascript eval() and eventually previewed to the users.
Your current regex /[\d()+-*/.]/g matches single characters from the class, but multiple times because of the g flag; this is why (1 and 1) are still matched in (1 < 1).
Based on your pattern requirements I would change it to /\([-+*/%.0-9()]+\)/g.
This will match parentheses containing one or more of the characters you describe within them.
Note that your current pattern has a - in the middle of a class, which can lead to weird behaviour because regex engines treat +-* within a class as a range (plus through asterisk, which is a strange range; JavaScript rejects it as out of order, since * comes before + in character-code order). Notice I put - at the start of the class in the new pattern so it matches an actual -.
I've assumed there will be no empty parentheses (), if there are you can change + (one or more) after ] to * (zero or more)
The g flag is still added so you match every one of such expressions.
I can't say with 100% certainty that the new regex will allow you to robustly transform the syntax you state, as it depends on the complexity of the 'iftrue' and 'iffalse' code blocks. See if you can make it work with the new pattern, otherwise you may want to look into other solutions for parsing code.
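As a quick illustration of the suggested pattern, the sketch below applies it to the sample string from the question. The second call is my own example, added to show one limit of the approach: the pattern checks which characters appear between parentheses, not whether the parentheses are balanced.

```javascript
// Suggested pattern: "(" + one or more allowed characters + ")".
const pattern = /\([-+*/%.0-9()]+\)/g;

const sample = "(1+2) (1 < 1) sdasdasd (1*(2+3))";
console.log(sample.match(pattern)); // [ '(1+2)', '(1*(2+3))' ]

// Unbalanced input still matches, because balance is never checked.
console.log("((1+2)".match(pattern)); // [ '((1+2)' ]
```

The (1 < 1) group is skipped entirely because the space and < are not in the class, which is the behaviour the question asks for.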
Call a function in the if condition and put all the conditions in that function.
if (test()) {
  // if code
} else {
  // else code
}
function test() {
  // check both cases here; case1 and case2 are placeholders for your real conditions
  if (case1 && case2) {
    return true;
  }
  return false;
}

RegExp for Cyrillic+minimum 3 letters+without numbers and space+non repeatable at the end

I do have the following RegExp in the current web application.
function myCyrillicValidator(text) {
  return XRegExp("^\\p{Cyrillic}+$").test(text);
}
As you can see, I use the XRegExp JavaScript library. Currently, this regexp checks that the text is Cyrillic. I want to extend it to check that:
It's Cyrillic
It's a minimum of 3 letters
It doesn't have spaces
The last letters don't repeat, i.e. Gabenn <- is wrong, Moneyy <- is wrong, because the last 2 letters repeat
The first letter is capital
I tried a few online RegExp testers/builders to build on top of the current rule, but none of them showed that the current regexp works correctly. Surprisingly, it works in the web app but not in the online testers.
The XRegExp version is 2.0.0, if it matters.
Use the following regex:
XRegExp("^(?=\\p{Lu})(?!.*(.)\\1$)\\p{Cyrillic}{3,}$")
See the regex demo.
Details:
^ - start of string anchor
(?=\\p{Lu}) - the first letter must be an uppercase letter
(?!.*(.)\\1$) - the string should not end with 2 identical chars
\\p{Cyrillic}{3,} - the string should only consist of 3 or more Cyrillic letters
$ - end of string anchor
function myCyrillicValidator(text) {
  return XRegExp("^(?=\\p{Lu})(?!.*(.)\\1$)\\p{Cyrillic}{3,}$").test(text);
}
console.log(myCyrillicValidator("Ва")); // => false
console.log(myCyrillicValidator("вася")); // => false
console.log(myCyrillicValidator("Васяя")); // => false
console.log(myCyrillicValidator("Вася")); // => true
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/2.0.0/xregexp-all-min.js"></script>
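For comparison, the same checks can probably be written without XRegExp in engines that support Unicode property escapes (ES2018 or later). This is a sketch under that assumption, not a drop-in replacement for the XRegExp 2.0.0 setup in the question; the function name isValidCyrillicName is hypothetical.

```javascript
// Assumes native support for \p{...}, which requires the "u" flag.
function isValidCyrillicName(text) {
  // (?=\p{Lu})            first character is an uppercase letter
  // (?!.*(.)\1$)          the string does not end with two identical characters
  // \p{Script=Cyrillic}   every character belongs to the Cyrillic script
  return /^(?=\p{Lu})(?!.*(.)\1$)\p{Script=Cyrillic}{3,}$/u.test(text);
}

console.log(isValidCyrillicName("Ва"));    // false (too short)
console.log(isValidCyrillicName("вася"));  // false (no capital first letter)
console.log(isValidCyrillicName("Васяя")); // false (repeated last letter)
console.log(isValidCyrillicName("Вася"));  // true
```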

Why does this regex match in javascript?

Fiddle at http://jsfiddle.net/42zcL/
I have the following code, which should alert "No Match". If I put the regex into regexpal.com and run it, it doesn't match (as expected). With this code, it does match. I know there is another way to do it, which works correctly - /^((.*)Waiting(.*))?$/, but I am curious as to why this one fails. It should match a string with the text "Waiting" in it or nothing at all.
var teststring="Anything";
if (teststring.match(/^((.*)Waiting(.*))|()$/)) alert('match');
else alert('No Match');
EDIT: Clearer example:
var teststring="b";
if (teststring.match(/^(a)|()$/)) alert('match');
else alert('No Match');
Produces a Match, when I would expect "No Match"
Expected behaviour, as per regexpal.com:
teststring: a = match
teststring: b = no match
Actual behaviour in javascript:
teststring: a = match
teststring: b = match
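Because you have |()$ at the end, which is like saying: match what comes before the |, but if you don't find it, match the empty string as long as it sits at the end of the line. The | has the lowest precedence, so the ^ applies only to the first alternative and the $ only to the second.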
- Full RegEx reference
- Try it out
Hopefully this explains it a little better:
An empty group () in a regex does not mean "don't match anything". It matches the empty string, and it can do so at every position in the string. Imagine the word "Anything" turned into an array [A,n,y,t,h,i,n,g] of length n: at position n, just past the last letter, the empty string matches and $ is satisfied, so the second alternative succeeds even though no characters were consumed.
Since this means |()$ returns a positive result for any string tested, you will always see "match" in your alert.
I'm pretty terrible at conveying my thoughts so maybe this previous stack answer will fill in whatever holes my answer left open.
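The effect is easy to see by inspecting where the match happens. The sketch below uses the simplified pattern from the question's clearer example; the variable names are my own, and the grouped variant is one possible way to make the anchors apply to both alternatives.

```javascript
// /^(a)|()$/ is parsed as  ^(a)  OR  ()$ , so the second branch is not tied to ^.
const loose = "b".match(/^(a)|()$/);
console.log(loose.index, JSON.stringify(loose[0])); // 1 ""  (empty match at the end)

// Grouping the alternatives makes ^ and $ apply to both branches.
const strict = /^(?:(a)|())$/;
console.log(strict.test("a"), strict.test("b"), strict.test("")); // true false true
```

The match on "b" is an empty string at index 1, which is why match() returns a truthy result even though nothing visible was matched.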

How to look for a pattern that might be missing some characters, but following a certain order?

I am trying to write a validation for "KQkq" or "-". In the first case, any of the letters can be missing (except all of them, in which case it should be "-"). The order of the characters is also important.
Some quick examples of legal values:
-
Kkq
q
This is for a chess FEN validation. I have validated the first two parts using:
var fen_parts = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1";
fen_parts = fen_parts.split(" ");
if(!fen_parts[0].replace(/[1-8/pnbrqk]/gi,"").length
&& !fen_parts[1].replace(/[wb]/,"").length
&& !fen_parts[2].replace(/[kq-]/gi,"").length /*not working, allows KKKKKQkq to be valid*/
){
//...
}
But simply using /[kq-]/gi to validate the third part allows too many things to be introduced, here are some quick examples of illegal examples:
KKKKQkq (there is more than one K)
QK (order is incorrect)
You can do
-|K?Q?k?q?
though you will need to do a second test to ensure that the input is not empty. Alternatively, using only regex:
KQ?k?q?|Qk?q?|kq?|q|-
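A small sketch of the regex-only variant follows. The alternatives are the ones given above; the ^(?:...)$ wrapper and the name CASTLING are my additions, on the assumption that the whole field must match and not just a part of it.

```javascript
// Each alternative starts with a mandatory letter, so the empty string cannot match.
const CASTLING = /^(?:KQ?k?q?|Qk?q?|kq?|q|-)$/;

for (const field of ["KQkq", "Kkq", "q", "-", "", "KKKKQkq", "QK"]) {
  console.log(JSON.stringify(field), CASTLING.test(field));
}
// "KQkq" true, "Kkq" true, "q" true, "-" true, "" false, "KKKKQkq" false, "QK" false
```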
This seems to work for me...
^(-|(K)?((?!\2)Q)?((?!\2\3)k)?((?!\2\3\4)q)?)$
A .match() returns null if the expression did not match. In that case you can use the logical OR to default to an array containing an empty string (a structure similar to the one returned by .match() on a successful match), which lets you check the length of the matched text. The length will be 0 if the expression did not match or if K?Q?k?q? matched the empty string. If the pattern matches a non-empty string, the length will be > 0. In code:
("KQkq".match(/^(?:K?Q?k?q?|-)$/) || [""])[0].length
Because | has the lowest precedence (it splits the whole pattern, anchors included), it is necessary to wrap the alternatives in a non-capturing group (?:...).
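Wrapped in a helper, the length check might look like the sketch below. The function name castlingLength is hypothetical; the expression inside is the one shown above.

```javascript
// Returns the length of the matched field, or 0 when the field is invalid or empty.
function castlingLength(field) {
  return (field.match(/^(?:K?Q?k?q?|-)$/) || [""])[0].length;
}

console.log(castlingLength("KQkq")); // 4
console.log(castlingLength("-"));    // 1
console.log(castlingLength(""));     // 0 (matched, but empty)
console.log(castlingLength("QK"));   // 0 (no match, falls back to [""])
```

A caller can then treat any result of 0 as invalid, which covers both the empty input and the failed match.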
Having answered the question, let's have a look at the rest of your code:
if (!fen_parts[0].replace(/[1-8/pnbrqk]/gi,"").length)
is, from JavaScript's perspective, equivalent to
if (!fen_parts[0].match(/[^1-8/pnbrqk]/gi))
which translates to "false if there is any character other than 1-8/pnbrqk". This notation is not only simpler to read, it also executes faster, as there is no unnecessary string creation (replace) and computation (length) going on.
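The equivalence can be checked on a few inputs. In this sketch the helper names viaReplace and viaTest are hypothetical, and test() is used in place of match() since only a yes/no answer is needed. Both forms check the character set only, not the structure of the FEN field.

```javascript
// Original form: strip every allowed character and see whether anything is left.
const viaReplace = s => !s.replace(/[1-8/pnbrqk]/gi, "").length;

// Negated-class form: look for a single character that is not allowed.
const viaTest = s => !/[^1-8/pnbrqk]/i.test(s);

const samples = [
  "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR", // only allowed characters
  "rnbqkbnr/9/x",                                 // 9 and x are not allowed
  "",                                             // nothing to reject
];
for (const s of samples) console.log(viaReplace(s), viaTest(s));
// true true, false false, true true
```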
