Javascript profanity match NOT replace

I am building a very basic profanity filter that I only want to apply to some fields in my application (fullName, userDescription) on the server side.
Does anyone have experience with a profanity filter in production? I only want it to:
'ass hello' <- match
'asster' <- NOT match
Below is my current code, but for some reason it returns true and false alternately on successive calls.
var badWords = [ 'ass', 'whore', 'slut' ]
, check = new RegExp(badWords.join('|'), 'gi');
function filterString(string) {
return check.test(string);
}
filterString('ass'); // Returns true / false in succession.
How can I fix this "in succession" bug?

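When the regex has the g flag, the test method sets the regex's lastIndex property to the position just after the match, so that the next call continues searching from there (to find further occurrences, if there are any). When a call finds no match, lastIndex is reset to 0.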
check.lastIndex // 0 (init)
filterString('ass'); // true
check.lastIndex // 3
filterString('ass'); // false
check.lastIndex // now 0 again
So, you will need to reset it manually in your filterString function if you don't recreate the RegExp each time:
function filterString(string) {
check.lastIndex = 0;
return check.test(string);
}
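An alternative to resetting lastIndex by hand is to leave out the g flag, since test() only advances lastIndex on global (or sticky) regexes. A minimal sketch, assuming the same badWords list as in the question; the names statefulCheck, statelessCheck and containsBadWord are made up for illustration:

```javascript
// Same word list as in the question.
const badWords = ['ass', 'whore', 'slut'];

// A global regex keeps state between test() calls via lastIndex.
const statefulCheck = new RegExp(badWords.join('|'), 'gi');
const first = statefulCheck.test('ass');  // matches, lastIndex moves to 3
const second = statefulCheck.test('ass'); // search starts at index 3 and fails, lastIndex resets to 0

// Without the g flag, test() always starts at index 0.
const statelessCheck = new RegExp(badWords.join('|'), 'i');

// Hypothetical helper name, not from the question.
function containsBadWord(text) {
  return statelessCheck.test(text);
}

console.log(first, second);
console.log(containsBadWord('ass'), containsBadWord('ass'));
```

Since a yes/no check never needs to iterate over matches, dropping g is usually the simpler option here.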
Btw, to match only full words (like "ass", but not "asster"), you should wrap your matches in word boundaries like WTK suggested, i.e.
var check = new RegExp("\\b(?:"+badWords.join('|')+")\\b", 'gi');

You are matching via a substring comparison. Your regex needs to be modified to match whole words instead.

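How about a fixed regexp:
check = new RegExp('\\b(?:' + badWords.join('|') + ')\\b', 'i');
check.test('ass') // true
check.test('suckass') // false
check.test('mass of whore') // true
check.test('massive') // false
check.test('slut is massive') // true
I'm using \b here to match a word boundary (which also matches at the start or end of the whole string). The backslash has to be doubled inside a string literal, the alternatives are wrapped in a group so that the boundaries apply to every word, and the g flag is dropped so that test() does not carry lastIndex over between calls.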

Related

cannot match the string that finishes with one equality or two equality

I have a string like this
[network-traffic:src_port =
and I want to check if it ends with = or == or !=
I have a regex like this
[^=]*={1}
just to start, and when I feed it ssss=== it matches. So I am already failing at the first step: three = signs also match, though I need only one or two to be matched.
What is the best way to achieve the above?

You can use the following regex: ^[^=]*(?:={1,2}|!=)$ It breaks down as follows:
match the start of the line
match 0 or more chars which are not an =
match 1 or 2 = OR match !=
match the end of the line
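The breakdown above can be tried out directly. This is only a sketch: the constant name endsWithOperator and the sample strings (other than the one from the question) are made up for illustration.

```javascript
// Regex from the answer above: a prefix without "=", then "=", "==" or "!=" at the very end.
const endsWithOperator = /^[^=]*(?:={1,2}|!=)$/;

const samples = [
  '[network-traffic:src_port =',
  '[network-traffic:src_port ==',
  '[network-traffic:src_port !=',
  'ssss===',
  'ssss',
];

// No g flag is used, so test() keeps no state between calls.
for (const sample of samples) {
  console.log(JSON.stringify(sample), endsWithOperator.test(sample));
}
```

Note that, because the prefix may not contain any =, a string with an earlier = (for example 'a=b=') is rejected by this pattern as well.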

How about this?
function validate(str) {
return /(?<!.*=)([=!])?=$/.test(str)
}
console.log(validate('[network-traffic:src_port =')); // true
console.log(validate('[network-traffic:src_port ==!=')); // true
console.log(validate('ssss=== it')); // false
console.log(validate('ssss=== it===')); // false

How to include a dictionary in this regex expression

I'm just starting with JavaScript. I have created this function to check an input for certain words (it returns true or false):
export default function validate(props) {
return props.match(/war|gun|kill/g) != null;
}
But I will be adding more words in the future and the regex will become very long. Can you tell me a better way to rewrite this function?

You can maintain a list of words, and include regex in the words, such as guns? for singular and plural form.
Here is a flagString function based on your example:
function flagString(str) {
const bannedRe = new RegExp('\\b(' + banned.join('|') + ')\\b', 'i');
return bannedRe.test(str);
}
var banned = [ 'guns?', 'kill', 'war' ];
console.log(flagString('this is ok')); // returns false
console.log(flagString('guns are not ok')); // returns true
console.log(flagString('to kill is not ok')); // returns true
Notes:
The '\\b(' and ')\\b' anchor the words on word boundaries, to avoid false positives such as matching "war" inside "warning"
The .join('|') joins the words into a single regex of ORed alternatives, so that you can test your string in one pass
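If the list is meant to hold plain words rather than regex fragments, the words can be escaped and the regex built only once instead of on every call. A sketch under those assumptions; escapeRegExp, createWordMatcher and isFlagged are hypothetical names, and the list is assumed to be non-empty:

```javascript
// Escape regex metacharacters so a plain word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build the matcher once and reuse it. Unlike the answer above, this sketch
// treats entries as plain words, so plurals must be listed explicitly.
function createWordMatcher(words) {
  const pattern = '\\b(?:' + words.map(escapeRegExp).join('|') + ')\\b';
  const re = new RegExp(pattern, 'i'); // no g flag, so test() keeps no state
  return (str) => re.test(str);
}

const isFlagged = createWordMatcher(['gun', 'guns', 'kill', 'war']);

console.log(isFlagged('this is ok'));
console.log(isFlagged('guns are not ok'));
console.log(isFlagged('warning sign'));
```

The word list could then live in its own module or JSON file and grow without touching the matching code.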

Wrap a regular expression so that it only matches the whole string

Related: Regex - Match whole string
I want a given regexp to match the whole string. For example, given the regexp /abc/, it should only match the string "abc" but not "abcd". I have read the question above, which describes a situation similar to mine. The difference is that the regexp is not written directly in my code, so I can't change /abc/ to /^abc$/ in the source.
Let's say, I want a function which takes two arguments: a 'regexp' (e.g. /abc/) and a string (e.g. 'abc'). The function returns matched result if and only if the given regexp matches the whole string, but returns null otherwise.
What I'm trying:
function match(regexp, string) {
var parse = /\/(.*)\/(.*)/.exec(regexp);
var reg = new RegExp('^' + parse[1] + '$', parse[2]);
return string.match(reg);
}
Is this code correct? Any better way to do so?

You transform a regex such as /a+|b+/ into ^a+|b+$, which still matches e.g. 'xbb' because the anchors bind to the individual alternatives. A solution would be to wrap the inner regex in a non-capturing group: ^(?:a+|b+)$.
Also, you currently truncate useful regex flags such as /.../m or /.../i.
Alternatively, you could simply use the original regex and check if the result covers the whole input string length:
function fullmatch(regex, string) {
const result = string.match(regex);
return !!result && result[0].length === string.length;
}
// Example:
console.log(fullmatch(/a+|b+/, 'xbb')); // false
console.log(fullmatch(/a+|b+/, 'bbb')); // true
// Respects lazy quantifier:
console.log(fullmatch(/b+?/, 'bbb')); // false
console.log(fullmatch(/b+/, 'bbb')); // true
// Invariant to global flag g:
console.log(fullmatch(/b+?/g, 'bbb')); // false
console.log(fullmatch(/b+/g, 'bbb')); // true
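The group-wrapping idea can also be done without parsing the regex's string form, by using the source and flags properties of the RegExp object. A minimal sketch; fullMatchWrapped is a hypothetical name, and stripping the g and y flags is a choice made here so that exec() keeps no state:

```javascript
function fullMatchWrapped(regex, string) {
  // Drop g and y so exec() always starts at index 0 and keeps no state.
  const flags = regex.flags.replace(/[gy]/g, '');
  // The non-capturing group makes ^ and $ apply to every alternative
  // and leaves the numbering of existing capture groups unchanged.
  const anchored = new RegExp('^(?:' + regex.source + ')$', flags);
  return anchored.exec(string); // match array, or null
}

console.log(fullMatchWrapped(/a+|b+/, 'xbb'));
console.log(fullMatchWrapped(/a+|b+/, 'bbb'));
console.log(fullMatchWrapped(/a|ab/, 'ab'));
```

This behaves differently from the length check in two ways: a lazy pattern such as /b+?/ is allowed to expand until it reaches the end of the string, and a pattern such as /a|ab/ can fall back to its second alternative. With the m flag, ^ and $ also match at line breaks, so a single line of a multi-line string may match.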

Regex, end of string must contain number

I just need a really simple javascript regex test to make sure the last character of a given string is a number 0-9. I keep finding different variations of this in my searches but none that really match the simplicity of what I need. So for instance
var strOne = "blahblahblahblah"; // === false
var strTwo = "blahblahblahblah///"; // === false
var strThree = "blahblahblahblah123"; // === true
Any help is appreciated... I'm still trying to wrap my head around the rules of regex.

\d$ should do it.
\d - digit character class
$ - must match end of string
Tests:
/\d$/.test('blahblahblahblah'); // false
/\d$/.test('blahblahblahblah///'); // false
/\d$/.test('blahblahblahblah123'); // true

Try this regex /\d$/
var strOne = "blahblahblahblah";
var strTwo = "blahblahblahblah///";
var strThree = "blahblahblahblah123";
var regex = /\d$/;
console.log(regex.test(strOne)); // false
console.log(regex.test(strTwo)); // false
console.log(regex.test(strThree)); // true

This one also works, and it matches the whole trailing number, including an optional decimal part: /\d+(\.\d+)?$/

Determine if string has any characters that aren't in a list of characters and if so, which characters don't match?

I'm working on a simple password validator and wondering if it's possible with a regex or... anything besides individually checking for each character.
Basically if the user types in something like "aaaaaaaaa1aaaaa", I want to let the user know that the character "1" is not allowed (This is a super simple example).
I'm trying to avoid something like
if (value.indexOf('#') !== -1) {}
if (value.indexOf('#') !== -1) {}
if (value.indexOf('\\') !== -1) {}
Maybe something like:
if (/[^A-Za-z0-9]/.exec(value)) {}
Any help?

If you just want to check whether the string is valid, you can use RegExp.test(). It returns true as soon as it finds the first occurrence and, unlike exec(), does not need to build a result array:
var value = "abc$de%f";
// checks if value contains any invalid character
if(/[^A-Za-z0-9]/.test(value)) {
alert('invalid');
}
If you want to pick out which characters are invalid you need to use String.match():
var value = "abc$de%f";
var invalidChars = value.match(/[^A-Za-z0-9]/g);
alert('The following characters are invalid: ' + invalidChars.join(''));
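One thing to watch for: String.match() returns null when nothing matches, so calling join() on the result throws for a fully valid string. A small sketch that guards against that and also reports each invalid character only once; findInvalidChars is a hypothetical helper name:

```javascript
function findInvalidChars(value) {
  // match() returns null when nothing matches, so fall back to an empty array.
  const matches = value.match(/[^A-Za-z0-9]/g) || [];
  // A Set drops repeated characters while keeping first-seen order.
  return [...new Set(matches)];
}

console.log(findInvalidChars('abc$de%f$'));
console.log(findInvalidChars('abcdef'));
```

An empty result array then means the value is valid, so the same function covers both the yes/no check and the error message.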

Although a simple loop can do the job, here's another approach using a lesser known Array.prototype.some method. From MDN's description of some:
The some() method tests whether some element in the array passes the test implemented by the provided function.
The advantage over looping is that it'll stop going through the array as soon as the test is positive, avoiding breaks.
var invalidChars = ['#', '#', '\\'];
var input = "test#";
function contains(e) {
return input.indexOf(e) > -1;
}
console.log(invalidChars.some(contains)); // true

I'd suggest:
function isValid (val) {
// a simple regular expression to express that the string must be, from start (^)
// to end ($) a sequence of one or more letters, a-z ([a-z]+), of upper-, or lower-,
// case (i):
var valid = /^[a-z]+$/i;
// returning a Boolean (true/false) of whether the passed-string matches the
// regular expression:
return valid.test(val);
}
console.log(isValid ('abcdef') ); // true
console.log(isValid ('abc1def') ); // false
Otherwise, to show the characters that are found in the string and not allowed:
function isValid(val) {
// caching the valid characters ([a-z]), which can be present multiple times in
// the string (g), and upper or lower case (i):
var valid = /[a-z]/gi;
// if replacing the valid characters with zero-length strings reduces the string to
// a length of zero (the assessment is true), then no invalid characters could
// be present and we return true; otherwise, if the evaluation is false
// we replace the valid characters by zero-length strings, then split the string
// between characters (split('')) to form an array and return that array:
return val.replace(valid, '').length === 0 ? true : val.replace(valid, '').split('');
}
console.log(isValid('abcdef')); // true
console.log(isValid('abc1de#f')); // ["1", "#"]
References:
JavaScript conditional operator (assessment ? ifTrue : ifFalse).
JavaScript Regular Expressions.
String.prototype.replace().
String.prototype.split().
RegExp.prototype.test().

If I understand what you are asking you could do the following:
function getInvalidChars(inputString) {
var badChars = {
'#' : true,
'/' : true,
'<' : true,
'>' : true
}
var invalidChars = [];
for (var i=0,x = inputString.length; i < x; i++) {
if (badChars[inputString[i]]) invalidChars.push(inputString[i]);
}
return invalidChars;
}
var inputString = 'im/b#d:strin>';
var badCharactersInString = getInvalidChars(inputString);
if (badCharactersInString.length) {
document.write("bad characters in string: " + badCharactersInString.join(','));
}
