How to include a dictionary in this regex expression


I'm just starting with JavaScript. I have created this function to check an input for certain words (it returns true or false):
export default function validate(props) {
return props.match(/war|gun|kill/g) != null;
}
But I will be adding more words in the future, and the regular expression will become very long. Can you tell me a better way to write this function?

You can maintain a list of words, and the entries can themselves contain regex syntax, such as guns? to cover both the singular and plural forms.
Here is a flagString function based on your example:
function flagString(str) {
const bannedRe = new RegExp('\\b(' + banned.join('|') + ')\\b', 'i');
return bannedRe.test(str);
}
var banned = [ 'guns?', 'kill', 'war' ];
console.log(flagString('this is ok')); // returns false
console.log(flagString('guns are not ok')); // returns true
console.log(flagString('to kill is not ok')); // returns true
Notes:
The '\\b(' and ')\\b' wrap the alternation in word boundaries, which avoids false positives such as matching "war" inside "warning".
The .join('|') combines the words into a single alternation, so the string is tested in one pass instead of once per word.
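If the list holds plain words rather than regex fragments, escaping each entry before joining keeps characters such as . or + from being read as regex syntax. A minimal sketch, assuming two hypothetical helpers (buildBannedRegex and containsBanned, neither of which appears in the question) and the escapeRegExp one-liner from MDN's regular expressions guide:

```javascript
// Escape regex metacharacters so each word is matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical helper: build one case-insensitive, word-bounded regex
// from a list of literal words.
function buildBannedRegex(words) {
  return new RegExp('\\b(?:' + words.map(escapeRegExp).join('|') + ')\\b', 'i');
}

// Assumed word list; the regex is built once and reused for every check.
const bannedWords = ['gun', 'guns', 'kill', 'war'];
const bannedRegex = buildBannedRegex(bannedWords);

// Hypothetical helper: true when the text contains a banned word.
function containsBanned(text) {
  return bannedRegex.test(text);
}

console.log(containsBanned('this is ok'));      // false
console.log(containsBanned('Guns are not ok')); // true
console.log(containsBanned('warning sign'));    // false
```

The regex is created without the g flag, so repeated test() calls do not carry any state between them.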

Related

How can I check for only one occurrence of each provided symbol?

I am given an array of symbols, which can vary. For instance, like this: ['#']. One occurrence of each symbol is mandatory, but the string may contain each provided symbol only once.
This is what I do now:
const regex = new RegExp(`^\\w+[${validatedSymbols.join()}]\\w+$`);
But it also rejects strings that contain other signs, such as '='. For example:
/^\w+[#]\w+$/.test('string#=string') // false
So, the result I expect:
'string#string' - ok
'string##string' - not ok

Using a complex regex is most likely not the best solution. I think you would be better off creating a validation function.
In this function you can find all occurrences of the provided symbols in the string, then return false if no occurrences are found or if the list of occurrences contains duplicate entries.
// https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#escaping
const escapeRegExp = (string) => string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
function validate(string, symbols) {
if (symbols.length == 0) {
throw new Error("at least one symbol must be provided in the symbols array");
}
const symbolRegex = new RegExp(symbols.map(escapeRegExp).join("|"), "g");
const symbolsInString = string.match(symbolRegex); // <- null if no match
// string must at least contain 1 occurrence of any symbol
if (!symbolsInString) return false;
// symbols may only occur once
const hasDuplicateSymbols = symbolsInString.length != new Set(symbolsInString).size;
return !hasDuplicateSymbols;
}
const validatedSymbols = ["#", "="];
const strings = [
"string!*string", // invalid (doesn't have "#" nor "=")
"string#!string", // valid
"string#=string", // valid
"string##string", // invalid (max 1 occurrence per symbol)
];
console.log("validatedSymbols", "=", JSON.stringify(validatedSymbols));
for (const string of strings) {
const isValid = validate(string, validatedSymbols);
console.log(JSON.stringify(string), "//=>", isValid);
}
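The validate function above accepts a string as soon as any one of the symbols is present. If every provided symbol has to appear exactly once, as the question's wording suggests, a stricter check could count each symbol separately. A minimal sketch, assuming a hypothetical helper named hasEachSymbolOnce and non-empty symbols:

```javascript
// Hypothetical stricter variant: every symbol in `symbols` must occur
// exactly once in `text`. Splitting on a symbol yields one more piece
// than the number of times that symbol occurs.
function hasEachSymbolOnce(text, symbols) {
  return symbols.every((symbol) => text.split(symbol).length - 1 === 1);
}

console.log(hasEachSymbolOnce('string#string', ['#']));       // true
console.log(hasEachSymbolOnce('string##string', ['#']));      // false
console.log(hasEachSymbolOnce('string#=string', ['#', '='])); // true
console.log(hasEachSymbolOnce('string#string', ['#', '=']));  // false
```

Because split() is given plain strings here, no regex escaping is needed. Note that an empty symbols array makes every() return true, so a caller may want to reject that case first.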

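I think you are looking for the following:
const regex = new RegExp(`^\\w+[${validatedSymbols.join('')}]?\\w+$`);
The question mark means zero or one occurrence of the preceding character class. Note the join(''): a bare join() inserts commas between the symbols, which would add the comma to the character class.
You might also need to escape the symbols in validatedSymbols, as some symbols have a special meaning in a regex.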
Edit:
For mandatory symbols it would be easier to add a group per symbol:
^\w+(#\w*){1}(#\w*){1}\w+$
Where the group is:
(#\w*){1}

Wrap a regular expression so that it only matches the whole string

Related: Regex - Match whole string
I want a given regexp to match the whole string. For example, when given the regexp /abc/, it should only match the string "abc" but not "abcd". I have read the question above, which describes a situation similar to mine. The difference here is that the regexp is not written directly in my code, so I can't change /abc/ to /^abc$/ in the source code.
Let's say I want a function which takes two arguments: a regexp (e.g. /abc/) and a string (e.g. 'abc'). The function returns the match result if and only if the given regexp matches the whole string, and returns null otherwise.
What I'm trying:
function match(regexp, string) {
var parse = /\/(.*)\/(.*)/.exec(regexp);
var reg = new RegExp('^' + parse[1] + '$', parse[2]);
return string.match(reg);
}
Is this code correct? Any better way to do so?

You transform a regex such as /a+|b+/ into ^a+|b+$, which still matches e.g. 'xbb', because the anchors bind to the individual alternatives. A solution would be to wrap the inner regex in a non-capturing group: ^(?:a+|b+)$.
Also, you currently truncate useful regex flags such as /.../m or /.../i.
Alternatively, you could simply use the original regex and check if the result covers the whole input string length:
function fullmatch(regex, string) {
const result = string.match(regex);
return !!result && result[0].length === string.length;
}
// Example:
console.log(fullmatch(/a+|b+/, 'xbb')); // false
console.log(fullmatch(/a+|b+/, 'bbb')); // true
// Respects lazy quantifier:
console.log(fullmatch(/b+?/, 'bbb')); // false
console.log(fullmatch(/b+/, 'bbb')); // true
// Invariant to global flag g:
console.log(fullmatch(/b+?/g, 'bbb')); // false
console.log(fullmatch(/b+/g, 'bbb')); // true
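The non-capturing group suggestion can be sketched without parsing the regex's string form, since a RegExp exposes its pattern and flags through the source and flags properties. The helper names anchorWhole and matchWhole are hypothetical, and dropping the g and y flags is an assumption made here to keep the result free of lastIndex state:

```javascript
// Hypothetical helper: rebuild the regex anchored to the whole string.
// The non-capturing group keeps alternations such as a+|b+ inside the anchors.
function anchorWhole(regex) {
  // Assumption: g and y are dropped so the new regex carries no lastIndex state.
  const flags = regex.flags.replace(/[gy]/g, '');
  return new RegExp('^(?:' + regex.source + ')$', flags);
}

// Returns the match array if the regex matches the whole string, else null.
function matchWhole(regex, string) {
  return string.match(anchorWhole(regex));
}

console.log(matchWhole(/abc/, 'abc') !== null); // true
console.log(matchWhole(/abc/, 'abcd'));         // null
console.log(matchWhole(/a+|b+/, 'xbb'));        // null
console.log(matchWhole(/B+?/i, 'bbb')[0]);      // "bbb"
```

Unlike fullmatch above, the anchored version lets a lazy quantifier expand until the whole string is covered, so /b+?/ matches 'bbb' here. If the original regex has the m flag, ^ and $ also match at line breaks, which may not be what a whole-string check wants.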

How to replace all occurrences between two specific characters with the associated char code using regex?

I'd like to replace everything between two characters with another string. I came up with this function:
String.prototype.unformat = function() {
var s='';
for (var i=0; i<this.length;i++) s+=this[i]
return s.replace(/\$[^$]*\$/g, '')
};
Calling unformat on a string like 'This is a test $33$' returns 'This is a test '.
That works, but I'd like to replace every occurrence of $ ... $ with the character for the enclosed char code.
In the example 'This is a test $33$', I would like to replace $33$ with the result of the JavaScript String.fromCharCode() function, to get the string 'This is a test !'.
How to edit the prototype function above to get the desired result?
Thanks in advance :)

You can use a callback function that returns String.fromCharCode() for the matched code:
String.prototype.unformat = function() {
return this.replace(/\$([^$]*)\$/g, function (string, charcode) {
return String.fromCharCode(charcode);
});
};
console.log(("char: $33$").unformat());
In order to avoid any future problems, I would also adapt the regex to only match digits: /\$(\d+)\$/g

You can use a match group () and replace it with the String.fromCharCode result:
String.prototype.unformat = function() {
// match is the whole match (including the $ signs); group is the captured text between them
return this.replace(/\$(.*?)\$/g, function(match, group) {
return String.fromCharCode(group);
});
};
Notes:
No need to copy the string as replace doesn't mutate the original string (this).
The match group (.*?) is a non-greedy one (lazy one) that matches as few characters as possible.
It is better not to modify the prototypes of built-in objects (such as String, Number, ...).
Example:
String.prototype.unformat = function() {
return this.replace(/\$(.*?)\$/g, function(match, group) {
return String.fromCharCode(group);
});
};
console.log('This is a test $33$'.unformat());
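Following the note about leaving built-in prototypes alone, the same replacement can be written as a plain function. This is a minimal sketch with a hypothetical standalone unformat function, assuming that only runs of digits between the $ signs should be converted:

```javascript
// Hypothetical standalone version: String.prototype is left untouched.
// Only runs of digits between two $ signs are replaced.
function unformat(text) {
  return text.replace(/\$(\d+)\$/g, (match, code) => String.fromCharCode(Number(code)));
}

console.log(unformat('This is a test $33$')); // "This is a test !"
console.log(unformat('$72$$105$'));           // "Hi"
console.log(unformat('price: $abc$'));        // "price: $abc$"
```

With the \d+ capture, text such as $abc$ is left as it is instead of being turned into a NUL character, which is what String.fromCharCode returns for a non-numeric argument.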

Determine if string has any characters that aren't in a list of characters and if so, which characters don't match?

I'm working on a simple password validator and wondering if it's possible with a regex, or anything else besides checking for each character individually.
Basically, if the user types in something like "aaaaaaaaa1aaaaa", I want to let the user know that the character "1" is not allowed (this is a super simple example).
I'm trying to avoid something like
if (value.indexOf('#') !== -1) {}
if (value.indexOf('#') !== -1) {}
if (value.indexOf('\\') !== -1) {}
Maybe something like:
if (/[^A-Za-z0-9]/.exec(value)) {}
Any help?

If you just want to check whether the string is valid, you can use RegExp.test(). It returns a plain true or false as soon as the first match is found, so it is slightly cheaper than exec(), which also has to build a result array:
var value = "abc$de%f";
// checks if value contains any invalid character
if(/[^A-Za-z0-9]/.test(value)) {
alert('invalid');
}
If you want to pick out which characters are invalid, you can use String.match() with the g flag, which returns an array of every match (or null when there is none):
var value = "abc$de%f";
var invalidChars = value.match(/[^A-Za-z0-9]/g);
alert('The following characters are invalid: ' + invalidChars.join(''));
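Since match() returns null when nothing matches, calling join() on the result throws for a fully valid string. The sketch below guards against that and also reports each offending character only once. The helper name findInvalidChars is hypothetical; the character class is the one used above:

```javascript
// Hypothetical helper: list each disallowed character once, in order of
// first appearance. Returns an empty array when the value is valid.
function findInvalidChars(value) {
  const matches = value.match(/[^A-Za-z0-9]/g) || []; // match() gives null on no match
  return [...new Set(matches)];
}

console.log(findInvalidChars('abc$de%f'));  // ["$", "%"]
console.log(findInvalidChars('abcdef123')); // []
console.log(findInvalidChars('a#b#c'));     // ["#"]
```

An empty result can then double as the validity check, for example findInvalidChars(value).length === 0.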

Although a simple loop can do the job, here's another approach using a lesser known Array.prototype.some method. From MDN's description of some:
The some() method tests whether some element in the array passes the test implemented by the provided function.
The advantage over a hand-written loop is that it stops going through the array as soon as the test is positive, without needing an explicit break.
var invalidChars = ['#', '#', '\\'];
var input = "test#";
function contains(e) {
return input.indexOf(e) > -1;
}
console.log(invalidChars.some(contains)); // true

I'd suggest:
function isValid (val) {
// a simple regular expression to express that the string must be, from start (^)
// to end ($) a sequence of one or more letters, a-z ([a-z]+), of upper-, or lower-,
// case (i):
var valid = /^[a-z]+$/i;
// returning a Boolean (true/false) of whether the passed-string matches the
// regular expression:
return valid.test(val);
}
console.log(isValid ('abcdef') ); // true
console.log(isValid ('abc1def') ); // false
Otherwise, to show the characters that are found in the string and not allowed:
function isValid(val) {
// caching the valid characters ([a-z]), which can be present multiple times in
// the string (g), and upper or lower case (i):
var valid = /[a-z]/gi;
// if replacing the valid characters with zero-length strings reduces the string to
// a length of zero (the assessment is true), then no invalid characters could
// be present and we return true; otherwise, if the evaluation is false
// we replace the valid characters by zero-length strings, then split the string
// between characters (split('')) to form an array and return that array:
return val.replace(valid, '').length === 0 ? true : val.replace(valid, '').split('');
}
console.log(isValid('abcdef')); // true
console.log(isValid('abc1de#f')); // ["1", "#"]
References:
JavaScript conditional operator (assessment ? ifTrue : ifFalse).
JavaScript Regular Expressions.
String.prototype.replace().
String.prototype.split().
RegExp.prototype.test().

If I understand what you are asking you could do the following:
function getInvalidChars(inputString) {
var badChars = {
'#' : true,
'/' : true,
'<' : true,
'>' : true
}
var invalidChars = [];
for (var i=0,x = inputString.length; i < x; i++) {
if (badChars[inputString[i]]) invalidChars.push(inputString[i]);
}
return invalidChars;
}
var inputString = 'im/b#d:strin>';
var badCharactersInString = getInvalidChars(inputString);
if (badCharactersInString.length) {
document.write("bad characters in string: " + badCharactersInString.join(','));
}

Javascript profanity match NOT replace

I am building a very basic profanity filter that I only want to apply to some fields in my application (fullName, userDescription) on the server side.
Does anyone have experience with a profanity filter in production? I only want it to:
'ass hello' <- match
'asster' <- NOT match
Below is my current code, but it returns true and then false in succession for some reason.
var badWords = [ 'ass', 'whore', 'slut' ]
, check = new RegExp(badWords.join('|'), 'gi');
function filterString(string) {
return check.test(string);
}
filterString('ass'); // Returns true / false in succession.
How can I fix this "in succession" bug?

Because the regex has the g flag, the test method sets its lastIndex property to the position just after the current match, so that further invocations continue from there and find later occurrences (if there are any). When no match is found, lastIndex is reset to 0.
check.lastIndex // 0 (init)
filterString('ass'); // true
check.lastIndex // 3
filterString('ass'); // false
check.lastIndex // now 0 again
So, you will need to reset it manually in your filterString function if you don't recreate the RegExp each time:
function filterString(string) {
check.lastIndex = 0;
return check.test(string);
}
By the way, to match only full words (like "ass", but not "asster"), you should wrap your matches in word boundaries like WTK suggested, i.e.
var check = new RegExp("\\b(?:"+badWords.join('|')+")\\b", 'gi');
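The lastIndex behaviour can be seen side by side with a regex that has no g flag, which is a common alternative when test() is only used for a yes/no answer. A minimal sketch, where the variable names stateful and stateless are chosen here for illustration:

```javascript
const badWords = ['ass', 'whore', 'slut'];
const pattern = '\\b(?:' + badWords.join('|') + ')\\b';

// With the g flag, test() advances lastIndex after each successful match.
const stateful = new RegExp(pattern, 'gi');
console.log(stateful.test('ass'), stateful.lastIndex); // true 3
console.log(stateful.test('ass'), stateful.lastIndex); // false 0

// Without g, test() always starts at index 0, so no reset is needed.
const stateless = new RegExp(pattern, 'i');
console.log(stateless.test('ass'));    // true
console.log(stateless.test('ass'));    // true
console.log(stateless.test('asster')); // false
```

Dropping the g flag avoids the manual reset; keeping it only makes sense when the same regex is also used to iterate over several matches.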

You are matching via a substring comparison. Your regex needs to be modified to match whole words instead.

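How about a fixed regexp:
check = new RegExp('\\b(?:' + badWords.join('|') + ')\\b', 'i');
check.test('ass') // true
check.test('suckass') // false
check.test('mass of whore') // true
check.test('massive') // false
check.test('slut is massive') // true
I'm using \b here to match a word boundary, which also matches at the start or end of the whole string when a word character is adjacent. The backslash has to be doubled inside the string literal, because a single '\b' is a backspace character. The alternation is wrapped in a group so that the boundaries apply to every word, and the g flag is left out so that repeated test() calls do not depend on lastIndex.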
