Javascript regular expression for matching whole words including special characters

I am trying to match whole, exact words using a JavaScript regular expression.
Given the strings: 1) "I know C++." and 2) "I know Java."
I have tried using new RegExp('\\b' + text + '\\b', 'gi'), and that works great for words without special characters, like example #2.
I've also taken a look at this question:
Regular expression for matching exact word affect the special character matching
and implemented:
escaped = escaped.replace(/^(\w)/, "\\b$1");
escaped = escaped.replace(/(\w)$/, "$1\\b");
and that will match text = 'C++' (it will match both examples).
However, if someone makes a typo and the string is "I know C++too.", the latter regex will still match the C++ when I don't want it to, because the word "C++too" is not an exact match for text = 'C++'.
What changes can I make so that it will not match unless C++ is both the start and the end of the word?

You can add a set of accepted characters ([+#]) after the word characters:
str = 'I know C++too. I know Java and C#.';
console.log(str.match(/(\w[+#]+|\w+)/g));
NB: \w[+#]+ must be placed first in the alternation expression to take precedence over the more generic \w+.

If whole words including special characters means everything but [\r\n\t\f\v ], you can simply do:
const REGEX = /([^\s]+)+/g;
function selectWords(string) {
  const REGEX = /([^\s]+)+/g;
  return string
    // remove punctuation
    .replace(/[^a-z0-9\s+#]/ig, "")
    // perform the match
    .match(REGEX)
    // prevent null returns
    || []
  ;
}
var text = "Hello World"
var [first, second, ...rest] = selectWords(text);
console.log(1, {first, second, rest});
// example with punctuation
var text = "I can come today, she said, but not tomorrow."
var [first, second, third, ...rest] = selectWords(text);
console.log(2, {first, second, third, rest});
// example with possible throw
var text = ",.'\"` \r"
var [first, second, third, ...rest] = selectWords(text);
console.log(3, {first, second, third, rest});
// example with a specific word to be matched
function selectSpecificWord(string, ...words) {
  return selectWords(string)
    .filter(word => ~words.indexOf(word))
  ;
}
var expected = "C++";
var test = "I know C++";
var test1 = "I know C++AndJava";
console.log("Test Case 1", selectSpecificWord(test, expected));
console.log("Test Case 2", selectSpecificWord(test1, expected));
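As an alternative to tokenizing and filtering, the search word can be escaped and wrapped in lookarounds, so that it only matches when no word character sits directly before or after it. This is a minimal sketch, not taken from the answers above: the helper names escapeRegExp and containsExactWord are hypothetical, and it assumes an engine with lookbehind support (ES2018 or later).

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true when `word` occurs in `haystack` with no word
// character (\w) directly before or after it.
function containsExactWord(haystack, word) {
  const pattern = new RegExp("(?<!\\w)" + escapeRegExp(word) + "(?!\\w)", "i");
  return pattern.test(haystack);
}

console.log(containsExactWord("I know C++.", "C++"));    // true
console.log(containsExactWord("I know C++too.", "C++")); // false
console.log(containsExactWord("I know Java.", "java"));  // true
```

Unlike \b, the lookarounds do not require the word itself to start or end with a word character, which is why they also work for C++.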

Use this ((?:(?:\w)+?)(?=\b|\w[-+]{2,2})(?:[-+]{2,2})?)
I've also included the - symbol as an example. See it live.

Related

Replace a specific character from a string with HTML tags

Given a text input, if it contains a specific character, that character must be converted to a tag. For example, if the special character is *, the text between two special characters must appear in italics.
For example:
This is *my* wonderful *text*
must be converted to:
This is <i>my</i> wonderful <i>text</i>
So I've tried like:
const arr = "This is *my* wonderful *text*";
if (arr.includes('*')) {
arr[index] = arr.replace('*', '<i>');
}
It replaces the first star character with <i>, but it doesn't work if there are more special characters.
Any ideas?

You can use a regular expression to detect any text that is surrounded by * and replace it with a tag (the <i> tag in your example), as follows:
Example
let str = "This is *my* wonderful *text*";
let regex = /(?<=\*)(.*?)(?=\*)/;
while (str.includes('*')) {
  let matched = regex.exec(str);
  let wrap = "<i>" + matched[1] + "</i>";
  str = str.replace(`*${matched[1]}*`, wrap);
}
console.log(str);

here you go my friend:
var arr = "This is *my* wonderful *text*";
const matched = arr.match(/\*(?:.*?)\*/g);
for (let i = 0; i < matched.length; i++) {
  arr = arr.replace(matched[i], `<i>${matched[i].replaceAll("*", "")}</i>`);
}
console.log(arr);
An explanation: first of all, we're matching the regex globally by setting the g flag. Note that match() with the global flag returns an array of all matches.
Secondly, we're looking for any characters that lie between two asterisks, and we're escaping the asterisks because * is a metacharacter.
.*? matches as few characters as possible (non-greedy), so a single match does not run from the first asterisk all the way to the last one.
?: makes the group non-capturing. Then we replace every element we've matched with itself, but without the asterisks.
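The same result can also be had with a single replace() call, using a capturing group and the $1 replacement pattern instead of a loop. This is a small sketch; the function name italicize is hypothetical.

```javascript
// Hypothetical helper: wrap every *...* pair in <i>...</i>.
// \*      a literal asterisk
// (.*?)   as few characters as possible, captured as group 1
// \*      the closing asterisk
function italicize(text) {
  return text.replace(/\*(.*?)\*/g, "<i>$1</i>");
}

console.log(italicize("This is *my* wonderful *text*"));
// This is <i>my</i> wonderful <i>text</i>
```

An unpaired trailing asterisk is simply left in place, so the input does not need to be validated first.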

Regex that allows a pattern to start with an optional, specific character, but no other character

How can I write a regex that allows a pattern to start with a specific character, but that character is optional?
For example, I would like to match all instances of the word "hello" where "hello" is either at the very start of the line or preceded by an "!", in which case it does not have to be at the start of the line. So the first three options here should match, but not the last:
hello
!hello
some other text !hello more text
ahello
I'm specifically interested in JavaScript.

Match it with: /^hello|!hello/g
The ^ will only grab the word "hello" if it's at the beginning of the input. Add the m flag if it should also match at the beginning of every line.
The | works as an OR.
var str = "hello\n!hello\n\nsome other text !hello more text\nahello";
var regex = /^hello|!hello/g;
console.log( str.match(regex) );
Edit:
If you're trying to match the whole line beginning with "hello" or containing "!hello" as suggested in the comment below, then use the following regex:
/^.*(^hello|!hello).*$/gm
var str = "hello\n!hello\n\nsome other text !hello more text\nahello";
var regex = /^.*(^hello|!hello).*$/gm;
console.log(str.match(regex));
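The effect of the m flag on ^ can be checked directly. This short sketch uses a made-up three-line input (the variable names are illustrative only) and counts the matches with and without the flag.

```javascript
// Illustrative input: two lines start with "hello", one starts with "!".
const lines = "hello\nhello again\n!hello";

// Without m, ^ only matches at the very start of the input.
const withoutM = lines.match(/^hello/g) || [];

// With m, ^ also matches right after every line break.
const withM = lines.match(/^hello/gm) || [];

console.log(withoutM.length); // 1
console.log(withM.length);    // 2
```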

Final solution (hopefully)
It looks like collecting the capture groups of every match directly is only available as of ECMAScript 2020.
As a workaround I've found the following solution:
const str = `hello
!hello
some other text !hello more text
ahello
this is a test hello !hello
JvdV is saying hello
helloing or helloed =).`;
function collectGroups(regExp, str) {
  const groups = [];
  str.replace(regExp, (fullMatch, group1, group2) => {
    groups.push(group1 || group2);
  });
  return groups;
}
const regex = /^(hello)|(?:!)(hello\b)/g;
const groups = collectGroups(regex, str)
console.log(groups)
/(?=!)?(\bhello\b)/g should do it. Playground.
Example:
const regexp = /(?=!)?(\bhello\b)/g;
const str = `
hello
!hello
some other text !hello more text
ahello
`;
const found = str.match(regexp)
console.log(found)
Explanation:
(?=!)?
(?=!) positive lookahead for !
? makes the lookahead optional
(\bhello\b): capturing group
\b word boundary ensures that hello is not directly preceded or followed by a word character
Note: If you also want to make sure that hello is not followed by !, you can simply add a negative lookahead, like so: /(?=!)?(\bhello\b)(?!!)/g.
Update
Thanks to the hint from @JvdV in the comments, I've now adapted the regex, which should meet your requirements:
/(^hello\b)|(?:!)(hello\b)/gm
Playground: https://regex101.com/r/CXXPHK/4 (The explanation can be found on the page as well).
Update 2:
Note that the non-capturing group (?:!) still consumes the !, and str.match() with the g flag returns whole matches rather than groups, so I get a result like ["hello", "!hello", "!hello", "!hello"], where the ! is included. Here is a workaround:
const regex = /(^hello\b)|(?:!)(hello\b)/gm;
const found = (str.match(regex) || []).map(m => m.replace(/^!/, ''));
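Another way to keep the ! out of the result is a lookbehind, which checks for the ! without consuming it. This is a sketch under the assumption that the target engine supports lookbehind assertions (ES2018 or later); the variable names are illustrative.

```javascript
const text = `hello
!hello
some other text !hello more text
ahello`;

// ^hello\b       "hello" at the start of a line (m flag)
// (?<=!)hello\b  "hello" directly preceded by "!", which is not part of the match
const helloPattern = /^hello\b|(?<=!)hello\b/gm;

const helloMatches = text.match(helloPattern) || [];
console.log(helloMatches.length); // 3, and every match is exactly "hello"
```

Because the lookbehind is zero-width, no post-processing with map() and replace() is needed.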

How to check if a string contains a WORD in javascript? [duplicate]

So, you can easily check if a string contains a particular substring using the .includes() method.
I'm interested in finding if a string contains a word.
For example, if I apply a search for "on" for the string, "phones are good", it should return false. And, it should return true for "keep it on the table".

You first need to convert the string into an array using split() and then use includes():
string.split(" ").includes("on")
You just need to pass a space " " to split() to get all the words.

This can be done with a regex (regular expression).
You can use the regex101 website when you need to work with them (it helps). The version below also handles words with custom separators.
function checkWord(word, str) {
  // Characters that may separate words: whitespace , ; " ' |
  const allowedSeparator = '\\s,;"\'|';
  const regex = new RegExp(
    `(^.*[${allowedSeparator}]${word}$)|(^${word}[${allowedSeparator}].*)|(^${word}$)|(^.*[${allowedSeparator}]${word}[${allowedSeparator}].*$)`,
    // Case insensitive
    'i',
  );
  return regex.test(str);
}
[
'phones are good',
'keep it on the table',
'on',
'keep iton the table',
'keep it on',
'on the table',
'the,table,is,on,the,desk',
'the,table,is,on|the,desk',
'the,table,is|the,desk',
].forEach((x) => {
console.log(`Check: ${x} : ${checkWord('on', x)}`);
});
Explanation:
I am creating multiple capturing groups here, one for each possibility:
(^.*\son$) on is the last word
(^on\s.*) on is the first word
(^on$) on is the only word
(^.*\son\s.*$) on is an in-between word
\s matches any whitespace character, such as a space or a newline
const regex = /(^.*\son$)|(^on\s.*)|(^on$)|(^.*\son\s.*$)/i;
console.log(regex.test('phones are good'));
console.log(regex.test('keep it on the table'));
console.log(regex.test('on'));
console.log(regex.test('keep iton the table'));
console.log(regex.test('keep it on'));
console.log(regex.test('on the table'));
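When the word consists only of letters, digits, or underscores, the four alternatives above can usually be collapsed into a single \b (word boundary) check. The sketch below is an assumption-laden simplification, not the answer's own code: the name hasWholeWord is hypothetical, and \b treats every non-word character as a separator, which is broader than the explicit separator list used above.

```javascript
// Hypothetical helper: whole-word test based on \b.
function hasWholeWord(text, word) {
  // Escape regex metacharacters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("\\b" + escaped + "\\b", "i").test(text);
}

console.log(hasWholeWord("phones are good", "on"));           // false
console.log(hasWholeWord("keep it on the table", "on"));      // true
console.log(hasWholeWord("keep iton the table", "on"));       // false
console.log(hasWholeWord("the,table,is,on|the,desk", "on"));  // true
```

This only behaves as expected when the word itself starts and ends with a word character; a word such as C++ needs a different boundary test.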

You can .split() your string by spaces (\s+) into an array, and then use .includes() to check if the array of strings has your word within it:
const hasWord = (str, word) =>
str.split(/\s+/).includes(word);
console.log(hasWord("phones are good", "on"));
console.log(hasWord("keep it on the table", "on"));
If you are worried about punctuation, you can remove it first using .replace() (as shown in this answer) and then split():
const hasWord = (str, word) =>
str.replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g,"").split(/\s+/).includes(word);
console.log(hasWord("phones are good son!", "on"));
console.log(hasWord("keep it on, the table", "on"));

You can split and then try to find:
const str = 'keep it on the table';
const res = str.split(/[\s,?.!]+/).some(f => f === 'on');
console.log(res);
In addition, the some() method is efficient because it stops and returns true as soon as the predicate holds for one element.

You can use .includes() and check for the word. To make sure it is a word and not part of another word, verify that the place you found it in is followed by a space, comma, period, etc and also has one of those before it.
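The approach described above can be sketched with indexOf() and a check of the characters on both sides of each hit. The function name includesWord and the exact set of separator characters are assumptions made for this illustration; adjust the set to the input you expect.

```javascript
// Hypothetical helper: true when `word` occurs in `text` with a separator
// (or the start/end of the string) on both sides.
function includesWord(text, word) {
  if (word === "") return false;
  // undefined means we ran off the start or end of the string.
  const isSeparator = (ch) => ch === undefined || /[\s.,;:!?"'()]/.test(ch);

  let from = 0;
  while (true) {
    const index = text.indexOf(word, from);
    if (index === -1) return false;
    const before = text[index - 1];
    const after = text[index + word.length];
    if (isSeparator(before) && isSeparator(after)) return true;
    from = index + 1; // keep looking after this occurrence
  }
}

console.log(includesWord("phones are good", "on"));       // false
console.log(includesWord("keep it on, the table", "on")); // true
```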

A simple version could just be splitting on the whitespace and looking through the resulting array for the word:
"phones are good".split(" ").find(word => word === "on") // undefined
"keep it on the table".split(" ").find(word => word === "on") // "on"
This only splits on spaces, though. When you need to parse real text (depending on your input), you'll encounter more word delimiters than whitespace. In that case you could use a regex to account for these characters.
Something like:
"Phones are good, aren't they? They are. Yes!".split(/[\s,?.!]+/)

I would go with the following assumptions:
Words at the start of a sentence always have a trailing space.
Words at the end of a sentence always have a preceding space.
Words in the middle of a sentence always have a trailing and preceding space.
Therefore, I would write my code as follows:
function containsWord(word, sentence) {
  return (
    sentence.startsWith(word.trim() + " ") ||
    sentence.endsWith(" " + word.trim()) ||
    sentence.includes(" " + word.trim() + " "));
}
console.log(containsWord("test", "This is a test of the containsWord function."));

Try the following -
var mainString = 'codehandbook'
var substr = /hand/
var found = substr.test(mainString)
if (found) {
  console.log('Substring found !!')
} else {
  console.log('Substring not found !!')
}

How to match a string inside another string while ignoring whitespace

I have a string to search, and I need to match and return another string at the beginning of the search string. The string being searched may have whitespace in it, which needs to be ignored for the purpose of searching, but still returned accurately. The string to be matched will never have whitespace in it.
stringA = "ThisIsAString";
Should give the following results when compared to stringB:
stringB = "This Is A String"; //"This Is A String"
stringB = "ThisIsAlsoAString"; //undefined
stringB = "ThisIs A String With Extra Words At The End"; //"ThisIs A String"
stringB = "Hey, ThisIsAString"; //undefined
What's an efficient way to do this?

You can use \s* to match zero or more whitespace characters. With the code below we put one of those matchers between each pair of characters.
const stringA = "ThisIsAString";
const tests = [
"This Is A String",
"ThisIsAlsoAString",
"ThisIs A String With Extra Words At The End",
"Hey, ThisIsAString",
];
const optionalSpaces = '\\s*';
const results = tests.map(test =>
  test.match(
    // "^" anchors the match to the beginning of the string being searched
    new RegExp('^' + stringA.split('').join(optionalSpaces))
  )
);
console.log(results);
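To get exactly the values listed in the question (the matched text, or undefined), the pattern can be anchored with ^ and the match array unwrapped. This is a sketch with hypothetical helper names (escapeRegExp, matchIgnoringWhitespace); the escaping step is an addition that guards against search strings containing regex metacharacters.

```javascript
// Hypothetical helper: escape characters that are special inside a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return the prefix of `haystack` that equals `needle`
// once whitespace is ignored, or undefined if there is no such prefix.
function matchIgnoringWhitespace(needle, haystack) {
  const source = "^" + Array.from(needle, (ch) => escapeRegExp(ch)).join("\\s*");
  const match = haystack.match(new RegExp(source));
  return match ? match[0] : undefined;
}

console.log(matchIgnoringWhitespace("ThisIsAString", "This Is A String"));
// "This Is A String"
console.log(matchIgnoringWhitespace("ThisIsAString", "Hey, ThisIsAString"));
// undefined
```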

An easy way to do this would be to remove all white space from the two things you are comparing.
var search = "Hello There";
var text = "HelloThere I'm Greg";
var found = text.replace(/ +?/g, '').indexOf(search.replace(/ +?/g, '')) !== -1;

Correct sentence structure via javascript regular expressions

Below I have a sentence and the desiredResult for that sentence. Using the pattern below I can grab the "t T" that needs to be changed to "t, t", but I don't know where to go from there.
var sentence = "Over the candidate behaves the patent Then the doctor.";
var desiredResult = "Over the candidate behaves the patent, then the doctor.";
var pattern = /[a-z]\s[A-Z]/g;
I want to correct a sentence by adding a comma and a space before a capital letter other than 'I' if the preceding letter is lowercase.

Use .replace() on your sentence and pass a replacer function as the second parameter:
var corrected = sentence.replace(
  /([a-z])\s([A-Z])/g,
  function(m,s1,s2){ //arguments: whole match (t T), subgroup1 (t), subgroup2 (T)
    return s1+', '+s2.toLowerCase();
  }
);
As for preserving uppercased I, there are many ways, one of them:
var corrected = sentence.replace(
  /([a-z])\s([A-Z])(.)/g,
  function(m,s1,s2,s3){
    return s1+((s2=='I' && /[^a-z]/i.test(s3))?(' '+s2):(', '+s2.toLowerCase()))+s3;
  }
);
But there are more cases where it will fail, such as "His name is Joe." or "WTF is an acronym for What a Terrible Failure.", and many others.
