Wrap a regular expression so that it only matches the whole string - javascript

Related: Regex - Match whole string
I want a given regexp to match the whole string. For example, when given a regexp /abc/, it should only match the string "abc" but not "abcd". I have read the question above, which describes a situation similar to mine. The difference here is that the regexp is not written directly in my code, so I can't change /abc/ to /^abc$/ in the source code.
Let's say I want a function which takes two arguments: a regexp (e.g. /abc/) and a string (e.g. 'abc'). The function returns the match result if and only if the given regexp matches the whole string, and returns null otherwise.
What I'm trying:
function match(regexp, string) {
var parse = /\/(.*)\/(.*)/.exec(regexp);
var reg = new RegExp('^' + parse[1] + '$', parse[2]);
return string.match(reg);
}
Is this code correct? Any better way to do so?

You transform a regex such as /a+|b+/ into ^a+|b+$, which still matches e.g. 'xbb' because the anchors bind to the individual alternatives. A solution would be to wrap the inner regex in a non-capturing group: ^(?:a+|b+)$.
Also, you currently truncate useful regex flags such as /.../m or /.../i.
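The wrapping idea can be sketched as follows. This is a minimal sketch rather than a definitive implementation: the helper names anchorWhole and matchWhole are made up for illustration, and it relies on the standard source and flags properties of a RegExp object, so no string parsing is needed and the original flags are carried over.

```javascript
// Hypothetical helper: builds a new RegExp that only matches the whole input.
function anchorWhole(regex) {
  // The non-capturing group keeps alternations such as a+|b+ inside the anchors.
  return new RegExp('^(?:' + regex.source + ')$', regex.flags);
}

// Hypothetical helper: returns the match result, or null if the regex
// does not cover the whole string.
function matchWhole(regex, string) {
  return string.match(anchorWhole(regex));
}

console.log(matchWhole(/a+|b+/, 'xbb'));    // null
console.log(matchWhole(/a+|b+/, 'bbb')[0]); // "bbb"
console.log(matchWhole(/abc/i, 'ABC')[0]);  // "ABC"
```

Note that with the m flag, ^ and $ also match at line breaks, so "whole string" effectively becomes "whole line" for such patterns.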
Alternatively, you could simply use the original regex and check if the result covers the whole input string length:
function fullmatch(regex, string) {
const result = string.match(regex);
return !!result && result[0].length === string.length;
}
// Example:
console.log(fullmatch(/a+|b+/, 'xbb')); // false
console.log(fullmatch(/a+|b+/, 'bbb')); // true
// Respects lazy quantifier:
console.log(fullmatch(/b+?/, 'bbb')); // false
console.log(fullmatch(/b+/, 'bbb')); // true
// Invariant to global flag g:
console.log(fullmatch(/b+?/g, 'bbb')); // false
console.log(fullmatch(/b+/g, 'bbb')); // true
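One caveat worth checking against your own patterns: the length comparison only looks at the first match the engine finds, so it can disagree with an anchored regex when an earlier alternative matches a shorter prefix. The snippet below repeats fullmatch from above so that it runs on its own; the pattern /a|ab/ is an illustrative assumption, not taken from the question.

```javascript
function fullmatch(regex, string) {
  const result = string.match(regex);
  return !!result && result[0].length === string.length;
}

// The first alternative "a" wins, so the first match is shorter than the input.
console.log(fullmatch(/a|ab/, 'ab'));    // false
// With anchors, the engine backtracks and tries the second alternative.
console.log(/^(?:a|ab)$/.test('ab'));    // true
```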

Related

How can I check for only one occurrence of each provided symbol?

I am given an array of symbols, whose contents can vary, for instance ['#']. Each provided symbol must occur in the string, and it may occur only once.
Now I do like this:
const regex = new RegExp(`^\\w+[${validatedSymbols.join()}]\\w+$`);
But it also rejects strings that contain other signs, such as '='. For example:
/^\w+[#]\w+$/.test('string#=string') // false
So, the result I expect:
'string#string' - ok
'string##string' - not ok
Using a complex regex is most likely not the best solution. I think you would be better off creating a validation function.
In this function you can find all occurrences of the provided symbols in the string, then return false if no occurrences are found or if the list of occurrences contains duplicate entries.
// https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#escaping
const escapeRegExp = (string) => string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
function validate(string, symbols) {
if (symbols.length == 0) {
throw new Error("at least one symbol must be provided in the symbols array");
}
const symbolRegex = new RegExp(symbols.map(escapeRegExp).join("|"), "g");
const symbolsInString = string.match(symbolRegex); // <- null if no match
// string must at least contain 1 occurrence of any symbol
if (!symbolsInString) return false;
// symbols may only occur once
const hasDuplicateSymbols = symbolsInString.length != new Set(symbolsInString).size;
return !hasDuplicateSymbols;
}
const validatedSymbols = ["#", "="];
const strings = [
"string!*string", // invalid (doesn't have "#" nor "=")
"string#!string", // valid
"string#=string", // valid
"string##string", // invalid (max 1 occurrence per symbol)
];
console.log("validatedSymbols", "=", JSON.stringify(validatedSymbols));
for (const string of strings) {
const isValid = validate(string, validatedSymbols);
console.log(JSON.stringify(string), "//=>", isValid);
}
I think you are looking for the following:
const regex = new RegExp(`^\\w+[${validatedSymbols.join()}]?\\w+$`);
The question mark means zero or one occurrence of the preceding character class.
You might also need to escape the symbols in validatedSymbols, as some symbols have a special meaning in a regex.
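A minimal sketch of that escaping step, under a few assumptions: the helper names escapeForCharClass and buildSymbolRegex are hypothetical, the symbols are single characters, and only the characters that are special inside a character class (], \, ^ and -) are escaped. It also joins with an empty string, because join() without arguments inserts commas, which would make the comma itself part of the character class.

```javascript
// Hypothetical helper: escapes the characters that are special inside [...].
const escapeForCharClass = (symbol) => symbol.replace(/[\]\\^-]/g, '\\$&');

// Hypothetical helper: builds the regex from the answer with escaped symbols.
function buildSymbolRegex(symbols) {
  const charClass = symbols.map(escapeForCharClass).join('');
  return new RegExp(`^\\w+[${charClass}]?\\w+$`);
}

const regex = buildSymbolRegex(['#', '=']);
console.log(regex.test('string#string'));  // true
console.log(regex.test('string##string')); // false
console.log(regex.test('string,string'));  // false, the comma is not in the class
```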
Edit:
For mandatory symbols it would be easier to add a group per symbol:
^\w+(#\w*){1}(#\w*){1}\w+$
Where the group is:
(#\w*){1}

How to replace all substrings enclosed by {} in a string?

I have to address an algo problem with vanilla JS.
I have a string:
let stringExample = "my {cat} has {2} ears";
And I want to replace it (in order) with an array of replacements
const replaceArray = ["dog", "4"];
So that after the replacement, the stringExample should be:
"my dog has 4 ears"
One approach that can be used is as follows:
// a sample of Strings and their corresponding replacement-arrays:
let string1 = "my {cat} has {2} ears",
arr1 = ["dog", "4"],
string2 = "They call me ~Ishmael~",
arr2 = ['Susan'],
string3 = "It is a ##truth## ##universally acknowledged##...",
arr3 = ['questionable assertion', 'oft cited'];
// here we use a named Arrow function which takes three arguments:
// haystack: String, the string which features the words/characters to
// be replaced,
// needles: Array of Strings with which to replace the identified
// groups in the 'haystack',
// delimiterPair: Array of Strings, these strings represent the characters with
// which the words/characters to be replaced may be identified;
// the first String (index 0) is the character marking the beginning
// of the capture group, and the second (index 1) indicates the
// end of the captured group; this is supplied with the default
// curly-braces:
const replaceWords = (haystack, needles, delimiterPair = ['{', '}']) => {
// we use destructuring assignment, to assign the string at
// index 0 to the 'start' variable, the string at index 1 to
// the 'end' variable; if no argument is provided the default
// curly-braces are used/assigned:
const [start, end] = delimiterPair,
// here we construct a regular expression, using a template
// string which interpolates the variables within the string;
// the regular expression is composed of:
// 'start' (eg: '{')
// .+ : any character that appears one or more times,
// ? : lazy quantifier so the expression matches the
// shortest possible string,
// 'end' (eg: '}'),
// matched with the 'g' (global) flag to replace all
// matches within the supplied string.
// this gives a regular expression of: /{.+?}/g
regexp = new RegExp(`${start}.+?${end}`, 'g')
// here we compose a String using the template literal, to
// interpolate the 'haystack' variable and also a tab-character
// concatenated with the result returned by String.prototype.replace()
// we use an anonymous function to supply the replacement-string:
//
return `"${haystack}":\t` + haystack.replace(regexp, () => {
// here we remove the first element from the array of replacements
// provided to the function, and return that to the string as the
// replacement:
return needles.shift();
})
}
console.log(replaceWords(string1, arr1));
console.log(replaceWords(string2, arr2, ['~', '~']));
console.log(replaceWords(string3, arr3, ['##', '##']));
This has no sanity checks at all, nor have I explored to find any edge-cases. If at all possible I would seriously recommend using an open source – and well-tested, well-proofed – framework or templating library.
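Two of the more obvious edge cases can be covered with a small variation. This is a sketch under stated assumptions, not a hardened implementation: the name fillPlaceholders is hypothetical, the escapeRegExp helper follows the pattern from MDN's regular expressions guide, delimiters are escaped so that characters such as ( or [ can be used, and the replacements array is read by index instead of being emptied with shift(). Placeholders without a corresponding replacement are left untouched.

```javascript
// Escapes regex metacharacters so a delimiter is matched literally.
const escapeRegExp = (string) => string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical variant of replaceWords that does not mutate its arguments.
function fillPlaceholders(template, replacements, [start, end] = ['{', '}']) {
  const pattern = new RegExp(`${escapeRegExp(start)}.+?${escapeRegExp(end)}`, 'g');
  let index = 0;
  // A replacer function is used, so "$" in a replacement is inserted literally.
  return template.replace(pattern, (match) =>
    index < replacements.length ? String(replacements[index++]) : match
  );
}

const replacements = ['dog', '4'];
console.log(fillPlaceholders('my {cat} has {2} ears', replacements)); // "my dog has 4 ears"
console.log(fillPlaceholders('a (x) b', ['y'], ['(', ')']));          // "a y b"
console.log(replacements.length);                                      // 2
```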

How to include a dictionary in this regex expression

I'm starting out with JavaScript and have created this function to check an input for certain words (it returns true or false):
export default function validate(props) {
return props.match(/war|gun|kill/g) != null;
}
But I will be adding more words in the future, and the regular expression will become very long. Can you tell me a better way to write this function?
You can maintain a list of words, and include regex in the words, such as guns? for singular and plural form.
Here is a flagString function based on your example:
function flagString(str) {
const bannedRe = new RegExp('\\b(' + banned.join('|') + ')\\b', 'i');
return bannedRe.test(str);
}
var banned = [ 'guns?', 'kill', 'war' ];
console.log(flagString('this is ok')); // returns false
console.log(flagString('guns are not ok')); // returns true
console.log(flagString('to kill is not ok')); // returns true
Notes:
The '\\b(' and ')\\b' anchor the words on word boundaries, to avoid false positives.
The .join('|') joins the words into a single regex with ORed words, so that you can test your string in a single pass for performance.
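If the list grows large, the regex can also be compiled once instead of on every call. A minimal sketch, assuming the word list is static; the name bannedWords is made up for illustration, and the entries are treated as regex fragments (as with guns? above), so they are deliberately not escaped. The g flag is left out, which keeps test() stateless between calls.

```javascript
// Illustrative word list; each entry is a regex fragment, not a literal string.
const bannedWords = ['guns?', 'kill', 'war'];

// Built once at load time and reused by every call.
const bannedRe = new RegExp('\\b(?:' + bannedWords.join('|') + ')\\b', 'i');

function flagString(str) {
  return bannedRe.test(str);
}

console.log(flagString('this is ok'));        // false
console.log(flagString('Guns are not ok'));   // true
console.log(flagString('this is a warning')); // false, "war" is not a whole word here
```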

Determine if string has any characters that aren't in a list of characters and if so, which characters don't match?

I'm working on a simple password validator and wondering if it's possible in regex, or anything else besides individually checking for each character.
Basically if the user types in something like "aaaaaaaaa1aaaaa", I want to let the user know that the character "1" is not allowed (This is a super simple example).
I'm trying to avoid something like
if (value.indexOf('#') !== -1) {}
if (value.indexOf('#') !== -1) {}
if (value.indexOf('\\') !== -1) {}
Maybe something like:
if (/[^A-Za-z0-9]/.exec(value)) {}
Any help?
If you just want to check if the string is valid, you can use RegExp.test(). This is more direct than exec(), as it simply returns true once it finds the first occurrence instead of returning a match array:
var value = "abc$de%f";
// checks if value contains any invalid character
if(/[^A-Za-z0-9]/.test(value)) {
alert('invalid');
}
If you want to pick out which characters are invalid you need to use String.match():
var value = "abc$de%f";
var invalidChars = value.match(/[^A-Za-z0-9]/g);
alert('The following characters are invalid: ' + invalidChars.join(''));
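Keep in mind that String.match() returns null when nothing matches, so calling join() on the result fails for a fully valid string. A minimal sketch that guards against this and also removes repeated characters; the function name findInvalidChars is hypothetical.

```javascript
// Hypothetical helper: returns the distinct characters outside A-Z, a-z, 0-9.
function findInvalidChars(value) {
  // match() returns null when there is no match, so fall back to an empty array.
  const matches = value.match(/[^A-Za-z0-9]/g) || [];
  // A Set keeps each invalid character only once, in order of first appearance.
  return [...new Set(matches)];
}

console.log(findInvalidChars('abc$de%f$')); // ["$", "%"]
console.log(findInvalidChars('abcdef'));    // []
```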
Although a simple loop can do the job, here's another approach using a lesser known Array.prototype.some method. From MDN's description of some:
The some() method tests whether some element in the array passes the test implemented by the provided function.
The advantage over looping is that it'll stop going through the array as soon as the test is positive, avoiding breaks.
var invalidChars = ['#', '#', '\\'];
var input = "test#";
function contains(e) {
return input.indexOf(e) > -1;
}
console.log(invalidChars.some(contains)); // true
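some() only tells you whether any listed character is present. To also report which ones, the same test can be passed to Array.prototype.filter. A small sketch, assuming a made-up helper name findPresent and an illustrative list of disallowed characters:

```javascript
// Illustrative list of disallowed characters.
const disallowed = ['#', '\\'];

// Hypothetical helper: returns the disallowed characters that occur in the input.
function findPresent(input, chars) {
  return chars.filter((c) => input.includes(c));
}

console.log(findPresent('test#', disallowed)); // ["#"]
console.log(findPresent('test', disallowed));  // []
```

Unlike some(), filter() always walks the whole list, which is what makes the complete report possible.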
I'd suggest:
function isValid (val) {
// a simple regular expression to express that the string must be, from start (^)
// to end ($) a sequence of one or more letters, a-z ([a-z]+), of upper-, or lower-,
// case (i):
var valid = /^[a-z]+$/i;
// returning a Boolean (true/false) of whether the passed-string matches the
// regular expression:
return valid.test(val);
}
console.log(isValid ('abcdef') ); // true
console.log(isValid ('abc1def') ); // false
Otherwise, to show the characters that are found in the string and not allowed:
function isValid(val) {
// caching the valid characters ([a-z]), which can be present multiple times in
// the string (g), and upper or lower case (i):
var valid = /[a-z]/gi;
// if replacing the valid characters with zero-length strings reduces the string to
// a length of zero (the assessment is true), then no invalid characters could
// be present and we return true; otherwise, if the evaluation is false
// we replace the valid characters by zero-length strings, then split the string
// between characters (split('')) to form an array and return that array:
return val.replace(valid, '').length === 0 ? true : val.replace(valid, '').split('');
}
console.log(isValid('abcdef')); // true
console.log(isValid('abc1de#f')); // ["1", "#"]
If I understand what you are asking you could do the following:
function getInvalidChars(inputString) {
var badChars = {
'#' : true,
'/' : true,
'<' : true,
'>' : true
}
var invalidChars = [];
for (var i=0,x = inputString.length; i < x; i++) {
if (badChars[inputString[i]]) invalidChars.push(inputString[i]);
}
return invalidChars;
}
var inputString = 'im/b#d:strin>';
var badCharactersInString = getInvalidChars(inputString);
if (badCharactersInString.length) {
document.write("bad characters in string: " + badCharactersInString.join(','));
}

Javascript profanity match NOT replace

I am building a very basic profanity filter that I only want to apply on some fields on my application (fullName, userDescription) on the serverside.
Does anyone have experience with a profanity filter in production? I only want it to:
'ass hello' <- match
'asster' <- NOT match
Below is my current code, but for some reason it alternates between returning true and false on successive calls.
var badWords = [ 'ass', 'whore', 'slut' ]
, check = new RegExp(badWords.join('|'), 'gi');
function filterString(string) {
return check.test(string);
}
filterString('ass'); // Returns true / false in succession.
How can I fix this "in succession" bug?
When the regex has the global (g) flag, the test method sets the lastIndex property of the regex to the position just after the current match, so that further invocations continue searching from there (and reset lastIndex to 0 once no further match is found).
check.lastIndex // 0 (init)
filterString('ass'); // true
check.lastIndex // 3
filterString('ass'); // false
check.lastIndex // now 0 again
So, you will need to reset it manually in your filterString function if you don't recreate the RegExp each time:
function filterString(string) {
check.lastIndex = 0;
return check.test(string);
}
Btw, to match only full words (like "ass", but not "asster"), you should wrap your matches in word boundaries like WTK suggested, i.e.
var check = new RegExp("\\b(?:"+badWords.join('|')+")\\b", 'gi');
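Since test() only needs a yes/no answer, another option is to leave out the g flag altogether, so that lastIndex never comes into play. A minimal sketch combining this with the word boundaries, using the word list from the question:

```javascript
const badWords = ['ass', 'whore', 'slut'];

// No "g" flag: test() always starts at index 0 and keeps no state between calls.
const check = new RegExp('\\b(?:' + badWords.join('|') + ')\\b', 'i');

function filterString(string) {
  return check.test(string);
}

console.log(filterString('ass hello')); // true
console.log(filterString('ass hello')); // true again on the second call
console.log(filterString('asster'));    // false
```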
You are matching substrings. Your regex needs to be modified to match whole words instead.
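How about a fixed regexp:
check = new RegExp('(?:^|\\b)(?:' + badWords.join('|') + ')(?:$|\\b)', 'i');
check.test('ass') // true
check.test('suckass') // false
check.test('mass of whore') // true
check.test('massive') // false
check.test('slut is massive') // true
I'm using \b here to match a word boundary (and ^ or $ for the start or end of the whole string). The alternatives are wrapped in a group so that the boundaries apply to every word, the backslashes are doubled because the pattern is built from a string literal, and the g flag is dropped so that test() does not carry lastIndex over between calls.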
