Regex not finding two letter words that include Swedish letters

I am very new to regex, and I have managed to create a way to check whether a specific word exists inside a string without just being part of another word.
Example:
I am looking for the word "banana".
banana == true, bananarama == false
This all works. However, a problem occurs when I look for two-letter words that contain Swedish letters (Å, Ä, Ö).
Example:
I am looking for the word "på" in a string like this: "på påsk"
and it comes back negative.
However, if I look for the word "påsk", it comes back positive.
This is the regex I am using:
const doesWordExist = (s, word) => new RegExp('\\b' + word + '\\b', 'i').test(s);
const stringOfWords = "Färg på plagg";
console.log(doesWordExist(stringOfWords, "på"))
//Expected result: true
//Actual result: false
However, if I change the word "på" to a three-letter word, it comes back true:
const doesWordExist = (s, word) => new RegExp('\\b' + word + '\\b', 'i').test(s);
const stringOfWords = "Färg pås plagg";
console.log(doesWordExist(stringOfWords, "pås"))
//Expected result: true
//Actual result: true
I have been looking around for answers and found a few questions with similar issues around Swedish letters, but none of them look for the word in its entirety.
Could anyone explain what I am doing wrong?

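The word boundary \b depends strictly on the characters matched by \w, which is a shorthand character class for [A-Za-z0-9_]. Because å, ä and ö are not in that class, they count as non-word characters. In "på " there is no boundary between å and the following space (both are non-word characters), so \bpå\b fails, whereas "påsk" ends in k, a \w character, so the trailing boundary is found.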
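The behaviour can be checked directly. This is only a small demonstration; the sample strings and the name checks are made up for illustration. Note the last entry: plain \b also reports a match for "på" inside "pås", because a boundary is seen between å and s.

```javascript
// Plain \w and \b are ASCII-only, so å is treated as a non-word character.
const checks = {
  aRingIsWordChar: /\w/.test('å'),               // å is not in [A-Za-z0-9_]
  twoLetterWord: /\bpå\b/.test('Färg på plagg'), // no boundary between å and the space
  longerWord: /\bpåsk\b/.test('Glad påsk'),      // k is a word character, boundary found
  falsePositive: /\bpå\b/.test('pås'),           // boundary seen between å and s
};
console.log(checks);
```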
To get similar behaviour you have to re-implement it yourself, for example with a negative lookbehind and a negative lookahead:
const swedishCharClass = '[a-zäöå]';
const doesWordExist = (s, word) => new RegExp(
  '(?<!' + swedishCharClass + ')' + word + '(?!' + swedishCharClass + ')', 'i'
).test(s);
console.log(doesWordExist("Färg på plagg", "på")); // true
console.log(doesWordExist("Färg pås plagg", "pås")); // true
console.log(doesWordExist("Färg pås plagg", "på")); // false
For more complex alphabets, I suggest taking a look at "Concrete Javascript Regex for Accented Characters (Diacritics)".
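As a sketch of a more general variant, the hand-written character class can be replaced by Unicode property escapes. This assumes a JavaScript engine that supports lookbehind and the u flag with \p{L} and \p{N}. The names escapeRegExp, WORD_CHAR and containsWord are made up for this example and are not part of the answer above; the escaping step is an addition so that a search word containing regex metacharacters is matched literally.

```javascript
// Escape regex metacharacters so the search word is matched literally.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// "Word character" here means any Unicode letter, any Unicode digit, or underscore.
const WORD_CHAR = '[\\p{L}\\p{N}_]';

// True if `word` occurs in `s` without a word character directly before or after it.
const containsWord = (s, word) =>
  new RegExp(`(?<!${WORD_CHAR})${escapeRegExp(word)}(?!${WORD_CHAR})`, 'iu').test(s);

console.log(containsWord('Färg på plagg', 'på'));  // true
console.log(containsWord('Färg pås plagg', 'på')); // false
console.log(containsWord('FÄRG PÅ PLAGG', 'på'));  // true
```

Treating digits and the underscore as word characters mirrors what \w does for ASCII; drop them from WORD_CHAR if only letters should count.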

Related

capitalize first letter of word more than 3 character Regex

I am writing something in React and I want to use a regex to capitalize the first letter of every word longer than 3 letters, but I am lost with regex. I found a lot of things, but nothing works. Any advice?
Regex example that doesn't work:
"^[a-z](?=[a-zA-Z'-]{3})|\b[a-zA-Z](?=[a-zA-Z'-]{3,}$)|['-][a-z]"

\w{4,} - this regular expression matches every word that has more than 3 characters
let str = "this is Just an long string with long and short words";
const matches = str.matchAll(/\w{4,}/g);
// Uppercasing the first character keeps the length, so match.index stays valid
for (const match of matches) {
  str = str.substring(0, match.index) + match[0].charAt(0).toUpperCase() + match[0].slice(1) + str.substring(match.index + match[0].length);
}
console.log(str);

Here are two examples: one for sentences (like AidOnline01's answer, but using String#replaceAll) and a second one for single words.
When handling single words, however, you can also check the length instead of using a regexp.
const sentence = "This is a sentence with a few words which should be capitialized";
const word = "capitialized";
// use String#replaceAll to replace all words in a sentence
const sentenceResult = sentence.replaceAll(/\w{4,}/g, word => word[0].toUpperCase() + word.slice(1));
// use String#replace for a single word
const wordResult = word.replace(/\w{4,}/, word => word[0].toUpperCase() + word.slice(1));
console.log(sentenceResult);
console.log(wordResult);
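The length check mentioned above could look roughly like this. It is only a sketch: the function name capitalizeIfLong and the minLength parameter are made up for illustration, and it assumes the input is a single word with no surrounding whitespace.

```javascript
// Capitalize a single word only if it has at least minLength characters.
const capitalizeIfLong = (word, minLength = 4) =>
  word.length >= minLength ? word[0].toUpperCase() + word.slice(1) : word;

console.log(capitalizeIfLong('capitalized')); // "Capitalized"
console.log(capitalizeIfLong('the'));         // "the"
```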

How to find a word that has surrounded with indicator? javascript

I have the string below, which contains markers that identify a specific word in it.
String example: "I will c#hec*k on it"
The "#" marks the start and the "*" marks the end.
I want to get two strings:
check - the whole word that contains "#" and "*".
hec - the part that is surrounded by them.
I started with the code below, but it does not seem to work.
sentence.split('#').pop().split('*')[0];
Does anybody know how to do it? I would appreciate the help, thanks.

var s = "I will c#hec*k on it"
console.log(s.match(/(?<=#)[^*]*(?=\*)/)) // this will print ["hec"]
console.log(s.match(/\w*#[^*]*\*\w*/).map(s => s.replace(/#(.*)\*/, "$1"))) // this will print ["check"]
where:
(?<=#) means "preceded by a #"
[^*]* matches zero or more characters that are not a *
(?=\*) means "followed by a *"
\w* matches zero or more word characters
(.*) is a capturing group (referenced by $1) matching any number of any kind of character (except for newlines)
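As an alternative sketch, both strings can be obtained from a single match by using three capturing groups instead of lookarounds. The function name extractMarked and the shape of the returned object are assumptions made for this example.

```javascript
// Returns the marked word without its markers and the part between # and *,
// or null when the text contains no "#...*" pair.
const extractMarked = (text) => {
  const m = text.match(/(\w*)#([^*]*)\*(\w*)/);
  if (!m) return null;
  const [, before, inner, after] = m;
  return { word: before + inner + after, inner };
};

console.log(extractMarked('I will c#hec*k on it')); // { word: 'check', inner: 'hec' }
```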

I would try something like this in JavaScript;
there might be a better approach with a regex, though.
let sentence = "I will c#hec*k on it";
sentence.split(" ").forEach(word => {
  if(word.includes("#") && word.includes("*")){
    let betweenChars = word.substring(
      word.lastIndexOf("#") + 1,
      word.lastIndexOf("*")
    )
    console.log('Between chars: ', betweenChars);
    let withoutChars = word.replace(/[#*]/g,"");
    console.log('Without chars: ', withoutChars);
  }
});

How to check if a string contains a WORD in javascript? [duplicate]

So, you can easily check if a string contains a particular substring using the .includes() method.
I'm interested in finding if a string contains a word.
For example, if I apply a search for "on" for the string, "phones are good", it should return false. And, it should return true for "keep it on the table".

You first need to convert the string into an array using split() and then use includes():
string.split(" ").includes("on")
You just need to pass a space " " to split() to get all the words.

This is a job for a regex (regular expression).
You can use the regex101 website when you need to experiment with them (it helps). The function below also supports custom separators.
function checkWord(word, str) {
  // Characters allowed to separate words: whitespace , ; " ' |
  const allowedSeparator = '\\s,;"\'|';
  const regex = new RegExp(
    `(^.*[${allowedSeparator}]${word}$)|(^${word}[${allowedSeparator}].*)|(^${word}$)|(^.*[${allowedSeparator}]${word}[${allowedSeparator}].*$)`,
    // Case insensitive
    'i',
  );
  return regex.test(str);
}
[
  'phones are good',
  'keep it on the table',
  'on',
  'keep iton the table',
  'keep it on',
  'on the table',
  'the,table,is,on,the,desk',
  'the,table,is,on|the,desk',
  'the,table,is|the,desk',
].forEach((x) => {
  console.log(`Check: ${x} : ${checkWord('on', x)}`);
});
Explanation:
I create one capturing group for each possibility:
(^.*\son$) on is the last word
(^on\s.*) on is the first word
(^on$) on is the only word
(^.*\son\s.*$) on is an in-between word
\s matches any whitespace character, such as a space, a tab or a newline
const regex = /(^.*\son$)|(^on\s.*)|(^on$)|(^.*\son\s.*$)/i;
console.log(regex.test('phones are good'));
console.log(regex.test('keep it on the table'));
console.log(regex.test('on'));
console.log(regex.test('keep iton the table'));
console.log(regex.test('keep it on'));
console.log(regex.test('on the table'));

You can .split() your string on whitespace (\s+) into an array, and then use .includes() to check whether the array contains your word:
const hasWord = (str, word) =>
  str.split(/\s+/).includes(word);
console.log(hasWord("phones are good", "on"));
console.log(hasWord("keep it on the table", "on"));
If you are worried about punctuation, you can remove it first using .replace() (as shown in this answer) and then split():
const hasWord = (str, word) =>
  str.replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g,"").split(/\s+/).includes(word);
console.log(hasWord("phones are good son!", "on"));
console.log(hasWord("keep it on, the table", "on"));

You can split and then try to find:
const str = 'keep it on the table';
const res = str.split(/[\s,?.!]+/).some(f => f === 'on');
console.log(res);
In addition, the some method is efficient: it stops iterating and returns true as soon as the predicate is true for one element.

You can use .includes() and check for the word. To make sure it is a word and not part of another word, verify that the place you found it in is followed by a space, comma, period, etc and also has one of those before it.
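The check described above could be sketched as follows. It uses indexOf rather than includes so that the position of each occurrence is known. The names isSeparator and includesWord, and the exact set of separator characters, are assumptions made for this example; the start and end of the string are treated as separators too.

```javascript
// A neighbour counts as a separator if it is missing (start or end of the
// string) or is whitespace or one of a few punctuation marks.
const isSeparator = (ch) => ch === undefined || /[\s.,;:!?]/.test(ch);

function includesWord(text, word) {
  if (word === '') return false;
  let from = 0;
  while (true) {
    const i = text.indexOf(word, from);
    if (i === -1) return false;
    // Accept this occurrence only if both neighbours are separators.
    if (isSeparator(text[i - 1]) && isSeparator(text[i + word.length])) return true;
    from = i + 1; // otherwise keep looking after this occurrence
  }
}

console.log(includesWord('phones are good', 'on'));      // false
console.log(includesWord('keep it on the table', 'on')); // true
```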

A simple version could just be splitting on the whitespace and looking through the resulting array for the word:
"phones are good".split(" ").find(word => word === "on") // undefined
"keep it on the table".split(" ").find(word => word === "on") // "on"
This splits only on whitespace, though. When you need to parse text, depending on your input, you will encounter more word delimiters than whitespace. In that case you could use a regex to account for those characters.
Something like:
"Phones are good, aren't they? They are. Yes!".split(/[\s,?.!]+/)

I would go with the following assumptions:
Words at the start of a sentence always have a trailing space.
Words at the end of a sentence always have a preceding space.
Words in the middle of a sentence always have a trailing and preceding space.
Therefore, I would write my code as follows:
function containsWord(word, sentence) {
  return (
    sentence.startsWith(word.trim() + " ") ||
    sentence.endsWith(" " + word.trim()) ||
    sentence.includes(" " + word.trim() + " "));
}
console.log(containsWord("test", "This is a test of the containsWord function."));

Try the following -
var mainString = 'codehandbook'
var substr = /hand/
var found = substr.test(mainString)
if(found){
  console.log('Substring found !!')
} else {
  console.log('Substring not found !!')
}

Using Regex to add spaces after punctuation but ignore instances of U.S

I am using
(/(?<=[.,])(?=[^\s])/mg,' ')
to add spaces after . and , that are not followed by spaces. I want to ignore instances of the word U.S. Could someone help me do this?

You can use this regex:
\b(U\.S)\b|([,.])(?=\S)
\b(U\.S)\b - Matches U.S. Since the question does not specify otherwise, I am assuming word boundaries around it. (g1)
([,.])(?=\S) - Matches . or , followed by a non-whitespace character. (g2)
let str = 'ab.c,de'
let str2 = 'U.S xyzU.S U.S xyz.x'
const replacer = (input)=>{
  return input.replace(/\b(U\.S)\b|([,.])(?=\S)/gm, function(match,g1,g2){
    // Keep "U.S" untouched; otherwise append a space after the punctuation mark
    return g1 ? g1 : g2+' '
  })
}
console.log(replacer(str))
console.log(replacer(str2))

regex to extract numbers starting from second symbol

Sorry for adding one more to the tons of regexp questions, but I can't find anything similar to my needs. I want to output a string that may contain a digit or the letter 'A' as the first symbol and digits only in all other positions. The input is any string, for example:
---INPUT--- -OUTPUT-
A123asdf456 -> A123456
0qw#$56-398 -> 056398
B12376B6f90 -> 12376690
12A12345BCt -> 1212345
What I tried is replace(/[^A\d]/g, '') (I use JS), which almost does the job, except when there is an A in the middle of the string. I tried to use the ^ anchor, but then the pattern no longer matches the other numbers in the string. I am not sure which is easier: extracting the matching characters or removing the non-matching ones.

I think you can do it with a negative lookahead, replacing each match with an empty string.
In a non-capturing group (?:, use a negative lookahead (?! to assert that what follows is neither an A at the beginning of the string (^A) nor a digit (\d). If that assertion holds, match any character with the dot, and repeat the group one or more times.
(?:(?!^A|\d).)+
var pattern = /(?:(?!^A|\d).)+/g;
var strings = [
  "A123asdf456",
  "0qw#$56-398",
  "B12376B6f90",
  "12A12345BCt"
];
for (var i = 0; i < strings.length; i++) {
  console.log(strings[i] + " ==> " + strings[i].replace(pattern, ""));
}

You can match and capture desired and undesired characters within two different sides of an alternation, then replace those undesired with nothing:
^(A)|\D
JS code:
var inputStrings = [
  "A-123asdf456",
  "A123asdf456",
  "0qw#$56-398",
  "B12376B6f90",
  "12A12345BCt"
];
console.log(
  inputStrings.map(v => v.replace(/^(A)|\D/g, "$1"))
);

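You can use the following regex to extract the matching characters and join them: /^A|\d/g
var arr = ['A123asdf456','0qw#$56-398','B12376B6f90','12A12345BCt', 'A-123asdf456'],
  // match() returns null when nothing matches, so fall back to an empty array
  result = arr.map(s => (s.match(/^A|\d/g) || []).join(''));
console.log(result);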
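For comparison, the same rule can be written without a regex, which makes the two conditions explicit: keep an 'A' only at index 0, and keep digits everywhere. This is just a sketch; the function name keepLeadingAAndDigits is made up for illustration.

```javascript
// Keep a leading 'A' (index 0 only) and every ASCII digit; drop everything else.
const keepLeadingAAndDigits = (input) =>
  Array.from(input)
    .filter((ch, i) => (i === 0 && ch === 'A') || (ch >= '0' && ch <= '9'))
    .join('');

console.log(keepLeadingAAndDigits('A123asdf456')); // "A123456"
console.log(keepLeadingAAndDigits('12A12345BCt')); // "1212345"
```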
