find and remove words matching a substring in a sentence - javascript

Is it possible to use a regex to find all words within a sentence that contain a substring?
Example:
var sentence = "hello my number is 344undefined848 undefinedundefined undefinedcalling whistleundefined";
I need to find all words in this sentence that contain 'undefined' and remove those words.
Output should be "hello my number is ";
FYI - currently I tokenize (javascript) and iterate through all the tokens to find and remove, then merge the final string. I need to use regex. Please help.
Thanks!

You can use:
str = str.replace(/ *\b\S*?undefined\S*\b/g, '');
RegEx Demo

It certainly is possible.
Something like start of word, zero or more letters, "undefined", zero or more letters, end of word should do it.
A word boundary is \b outside a character class, so:
\b\w*?undefined\w*?\b
using non-greedy repetition to avoid the letter matching trying to match "undefined" and leading to lots of backtracking.
Edit: switched [a-zA-Z] to \w because the example includes numbers in the "words".

\S*undefined\S*
Try this simple regex. Replace by empty string. See demo.
https://www.regex101.com/r/fG5pZ8/5
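The \S*undefined\S* idea can be wrapped in a small reusable function. This is only a sketch: the helper names escapeRegExp and removeWordsContaining are made up for illustration, and collapsing and trimming the leftover whitespace is an assumption about the desired output (the question's expected result keeps a trailing space).

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: drop every whitespace-delimited word containing `fragment`.
function removeWordsContaining(sentence, fragment) {
  var pattern = new RegExp('\\S*' + escapeRegExp(fragment) + '\\S*', 'g');
  return sentence
    .replace(pattern, '')  // remove the matching words
    .replace(/\s+/g, ' ')  // collapse the gaps they leave behind
    .trim();               // assumption: no leading/trailing space wanted
}

var sentence = "hello my number is 344undefined848 undefinedundefined undefinedcalling whistleundefined";
console.log(removeWordsContaining(sentence, 'undefined')); // "hello my number is"
```

Escaping the fragment matters when it contains characters such as a dot, which would otherwise match any character.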

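You can use the str.replace function like this (note that this removes only the substring 'undefined' itself and leaves the rest of each word in place):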
str = str.replace(/undefined/g, '');

Since there are enough solutions with regular expressions, here is another one - using arrays and simple function that finds occurrence of a string in a string :)
Even though the code looks more "dirty", it can run faster than a regular expression, so it might make sense to consider it when dealing with LARGE strings; measure on your own data before relying on that.
var sentence = "hello my number is 344undefined848 undefinedundefined undefinedcalling whistleundefined";
var array = sentence.split(' ');
var sanitizedArray = [];
// Keep only the words that do not contain 'undefined'
for (var i = 0; i < array.length; i++) {
    if (array[i].indexOf('undefined') === -1) {
        sanitizedArray.push(array[i]);
    }
}
var sanitizedSentence = sanitizedArray.join(' ');
alert(sanitizedSentence);
Fiddle: http://jsfiddle.net/448bbumh/

Related

RegExp to filter characters after the last dot

For example, I have a string "esolri.gbn43sh.earbnf", and I want to remove every character after the last dot (i.e. get "esolri.gbn43sh"). How can I do so with a regular expression?
I could of course use a non-RegExp way to do it, for example:
"esolri.gbn43sh.earbnf".slice(0, "esolri.gbn43sh.earbnf".lastIndexOf("."));
But I want a regular expression.
I tried /\..*?/, but that removes the first dot instead.
I am using Javascript. Any help is much appreciated.
I would use standard JS rather than a regex for this one, as it will be easier for others to understand your code:
var str = 'esolri.gbn43sh.earbnf';
console.log(
    str.slice(0, str.lastIndexOf('.')) // "esolri.gbn43sh"
);
Pattern Matching
Match a dot followed by non-dots until the end of string
let re = /\.[^.]*$/;
Use this with String.prototype.replace to achieve the desired output
'foo.bar.baz'.replace(re, ''); // 'foo.bar'
Other choices
You may find it is more efficient to do a simple substring search for the last . and then use a string slicing method on this index.
let str = 'foo.bar.baz',
    i = str.lastIndexOf('.');
if (i !== -1) // i = -1 means no match
    str = str.slice(0, i); // "foo.bar"
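For comparison, here is a minimal sketch that puts the regex and the lastIndexOf approaches side by side. The function names are hypothetical; both variants are written to return the input unchanged when it contains no dot.

```javascript
// Hypothetical helper: regex variant, removes the last dot and everything after it.
function stripAfterLastDotRegex(text) {
  return text.replace(/\.[^.]*$/, '');
}

// Hypothetical helper: same result using lastIndexOf and slice.
function stripAfterLastDotSlice(text) {
  var i = text.lastIndexOf('.');
  return i === -1 ? text : text.slice(0, i); // -1 means there is no dot
}

console.log(stripAfterLastDotRegex('esolri.gbn43sh.earbnf')); // "esolri.gbn43sh"
console.log(stripAfterLastDotSlice('esolri.gbn43sh.earbnf')); // "esolri.gbn43sh"
```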

change regex to match some words instead of all words containing PRP

This regex matches all characters between whitespace if the word contains PRP.
How can I get it to match all words, or characters in between whitespace, if they contain PRP, but not if they contain "me" in any case?
So match all words containing PRP, but not containing ME or me.
Here is the regex to match words containing PRP: \S*PRP\S*
You can use negative lookahead for this:
(?:^|\s)((?!\S*?(?:ME|me))\S*?PRP\S*)
Working Demo
PS: Use group #1 for your matched word.
Code:
var re = /(?:^|\s)((?!\S*?(?:ME|me))\S*?PRP\S*)/;
var s = 'word abcPRP def';
var m = s.match(re);
if (m) console.log(m[1]); //=> abcPRP
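The snippet above returns only the first match. To collect every matching word, one option is to add the g flag and loop with exec, reading group #1 each time. This is a sketch; findPrpWords is a hypothetical name, and like the pattern above it rejects only the exact spellings "ME" and "me".

```javascript
// Hypothetical helper: return every word that contains PRP but neither "ME" nor "me".
function findPrpWords(text) {
  var re = /(?:^|\s)((?!\S*?(?:ME|me))\S*?PRP\S*)/g;
  var words = [];
  var m;
  // With the g flag, each exec call resumes after the previous match.
  while ((m = re.exec(text)) !== null) {
    words.push(m[1]); // group #1 holds the word without the leading whitespace
  }
  return words;
}

console.log(findPrpWords('MELOLPRP PRPRP PRPthemeok PRPmhm')); // ["PRPRP", "PRPmhm"]
```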
Instead of using complicated regular expressions which would be confusing for almost anyone who's reading it, why don't you break up your code into two sections, separating the words into an array and filtering out the results with stuff you don't want?
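function prpnotme(w) {
    // Split the input into whitespace-delimited words
    var r = w.match(/\S+/g);
    if (r == null)
        return [];
    var i = 0;
    while (i < r.length) {
        // Drop words without PRP, or with "me" in any case
        if (!r[i].includes('PRP') || r[i].toLowerCase().includes('me'))
            r.splice(i, 1);
        else
            i++;
    }
    return r;
}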
console.log(prpnotme('whattttttt ok')); // []
console.log(prpnotme('MELOLPRP PRPRP PRPthemeok PRPmhm')); // ['PRPRP', 'PRPmhm']
For a very good reason why this is important, imagine if you ever wanted to add more logic. You're much more likely to make a mistake when modifying a complicated regex to make it even more complicated, whereas this way it's done with simple logic that makes perfect sense when reading each predicate, no matter how much you add on.

getting contents of string between digits

I have a regex problem :(
What I would like to do is find the contents between two or more numbers.
var string = "90+*-+80-+/*70"
I'm trying to edit the symbols in between so that only the last symbol shows up and not the ones before it, i.e. to turn the above variable into 90+80*70. This is just an example, and I have no idea how to do this. The length of the numbers, how many "sets" of numbers there are, and the length of the symbols in between could be anything.
many thanks,
Steve,
The trick is in matching '90+*-+' and '80-+/*' separately, and selecting only the number and the last non-number character.
The expression for finding a number followed by 1 or more non-numbers would be
\d+[^\d]+
To select the number and the last non-number, add parens:
(\d+)[^\d]*([^\d])
Finally add a /g to repeat the procedure for each match, and replace it with the 2 matched groups for each match:
js> '90+*-+80-+/*70'.replace(/(\d+)[^\d]*([^\d])/g, '$1$2');
90+80*70
js>
Or you can use lookahead assertion and simply remove all non-numerical characters which are not last: "90+*-+80-+/*70".replace(/[^0-9]+(?=[^0-9])/g,'');
You can use a regular expression to match the non-digits and a callback function to process the match and decide what to replace:
var test = "90+*-+80-+/*70";
var out = test.replace(/[^\d]+/g, function(str) {
return(str.substr(-1));
})
alert(out);
See it work here: http://jsfiddle.net/jfriend00/Tncya/
This works by using a regular expression to match sequences of non-digits and then replacing that sequence of non-digits with the last character in the matched sequence.
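The three replace-based approaches above can be compared on the sample input. This is a small sketch that uses \D as shorthand for [^\d] and slice(-1) in place of substr(-1); the variable names are illustrative only.

```javascript
var input = "90+*-+80-+/*70";

// Capture the number and the last non-digit, discarding the symbols in between.
var viaGroups = input.replace(/(\d+)\D*(\D)/g, '$1$2');

// Remove every run of non-digits that is still followed by another non-digit.
var viaLookahead = input.replace(/\D+(?=\D)/g, '');

// Replace each run of non-digits with its own last character.
var viaCallback = input.replace(/\D+/g, function (run) {
  return run.slice(-1);
});

console.log(viaGroups, viaLookahead, viaCallback); // 90+80*70 90+80*70 90+80*70
```

The variants can differ on edge cases: for example, the capture-group version requires digits before the symbols, so a leading run of symbols is left untouched by it but shortened by the other two.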
i would use this tutorial, first, then review this for javascript-specific regex questions.
This should do it -
var string = "90+*-+80-+/*70";
var result = '';
// The capturing group keeps the numbers in the split result
var arr = string.split(/(\d+)/);
for (var i = 0; i < arr.length; i++) {
    if (!isNaN(arr[i])) result = result + arr[i];
    else result = result + arr[i].slice(arr[i].length - 1, arr[i].length);
}
alert(result);
Working demo - http://jsfiddle.net/ipr101/SA2pR/
Similar to @Arnout Engelen
var string = "90+*-+80-+/*70";
string = string.replace(/(\d+)[^\d]*([^\d])(?=\d+)/g, '$1$2');
This was my first thinking of how the RegEx should perform, it also looks ahead to make sure the non-digit pattern is followed by another digit, which is what the question asked for (between two numbers)
Similar to @jfriend00
var string = "90+*-+80-+/*70";
string = string.replace(/(\d+?)([^\d]+?)(?=\d+)/g, function () {
    return arguments[1] + arguments[2].substr(-1);
});
Instead of only matching on non-digits, it matches on non-digits between two numbers, which is what the question asked
Why would this be any better?
If your equation was embedded in a paragraph or string of text. Like:
This is a test where I want to clean up something like 90+*-+80-+/*70 and don't want to scrap the whole paragraph.
Result (Expected) :
This is a test where I want to clean up something like 90+80*70 and don't want to scrap the whole paragraph.
Why would this not be any better?
There is more pattern matching, which makes it theoretically slower (negligible)
It would fail if your paragraph had embedded numbers. Like:
This is a paragraph where Sally bought 4 eggs from the supermarket, but only 3 of them made it back in one piece.
Result (Unexpected):
This is a paragraph where Sally bought 4 3 of them made it back in one piece.

Remove all special characters with RegExp

I would like a RegExp that will remove all special characters from a string. I am trying something like this but it doesn’t work in IE7, though it works in Firefox.
var specialChars = "!@#$^&%*()+=-[]\/{}|:<>?,.";
for (var i = 0; i < specialChars.length; i++) {
    stringToReplace = stringToReplace.replace(new RegExp("\\" + specialChars[i], "gi"), "");
}
A detailed description of the RegExp would be helpful as well.
var desired = stringToReplace.replace(/[^\w\s]/gi, '')
As was mentioned in the comments it's easier to do this as a whitelist - replace the characters which aren't in your safelist.
The caret (^) character is the negation of the set [...], gi means global and case-insensitive (the latter is a bit redundant but I wanted to mention it) and the safelist in this example is digits, word characters, underscores (\w) and whitespace (\s).
Note that if you still want to exclude a set, including things like slashes and special characters you can do the following:
var outString = sourceString.replace(/[`~!@#$%^&*()_|+\-=?;:'",.<>\{\}\[\]\\\/]/gi, '');
Take special note that in order to also include the "minus" character, you need to escape it with a backslash, as is done for the later characters in the group. If you don't, +-= is read as a range that also selects 0-9, which is probably undesired.
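The effect of the unescaped hyphen is easy to see on a short input. In this sketch (sample string and variable names are made up), [+-=] is read as the range from "+" to "=", which covers the digits as well as several punctuation characters.

```javascript
var sample = 'a1+b2=c3-d4';

// Unescaped: "+-=" is a range from "+" (0x2B) to "=" (0x3D), which includes 0-9.
var unescaped = sample.replace(/[+-=]/g, '');

// Escaped: only the three literal characters "+", "-" and "=" are removed.
var escaped = sample.replace(/[+\-=]/g, '');

console.log(unescaped); // "abcd"
console.log(escaped);   // "a1b2c3d4"
```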
Plain Javascript regex does not handle Unicode letters.
Do not use [^\w\s]: it will remove letters with accents (like àèéìòù), not to mention Cyrillic or Chinese; letters from such languages will be completely removed.
You really don't want to remove these letters together with all the special characters. You have two options:
Add to your regex all the special characters you don't want to remove, for example: [^èéòàùì\w\s].
Have a look at xregexp.com. XRegExp adds base support for Unicode matching via the \p{...} syntax.
var str = "Їжак::: résd,$%& adùf"
var search = XRegExp('([^?<first>\\pL ]+)');
var res = XRegExp.replace(str, search, '',"all");
console.log(res); // returns "Їжак::: resd,adf"
console.log(str.replace(/[^\w\s]/gi, '') ); // returns " rsd adf"
console.log(str.replace(/[^\wèéòàùì\s]/gi, '') ); // returns " résd adùf"
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.js"></script>
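In engines that implement ES2018, a third option is the built-in Unicode property escapes, which need the u flag. The sketch below assumes such an engine; the function name is hypothetical, and keeping combining marks (\p{M}) and digits (\p{N}) is a choice made for this example. Unlike \w, this class does not keep the underscore.

```javascript
// Assumes an engine with ES2018 Unicode property escapes (the u flag is required).
function removeSpecialsUnicode(text) {
  // Keep letters (\p{L}), combining marks (\p{M}), digits (\p{N}) and whitespace.
  return text.replace(/[^\p{L}\p{M}\p{N}\s]/gu, '');
}

console.log(removeSpecialsUnicode("Їжак::: résd,$%& adùf")); // "Їжак résd adùf"
```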
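Using \W or [a-z0-9] in the regex won't work for non-English languages like Chinese.
It's better to list all the special characters in the regex and remove them from the given string. Note that the hyphen has to be escaped; an unescaped ?-_ would be read as a range that also removes the uppercase letters A-Z.
str.replace(/[~`!@#$%^&*()+={}\[\];:\'\"<>.,\/\\\?\-_]/g, '');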
The first solution does not work for any UTF-8 alphabet. (It will cut text such as Їжак.) I have managed to create a function which does not use RegExp and relies on the good UTF-8 support in the JavaScript engine. The idea is simple: if a symbol is equal in uppercase and lowercase, it is a special character. The only exception is made for whitespace.
function removeSpecials(str) {
    var lower = str.toLowerCase();
    var upper = str.toUpperCase();
    var res = "";
    for (var i = 0; i < lower.length; ++i) {
        if (lower[i] != upper[i] || lower[i].trim() === '')
            res += str[i];
    }
    return res;
}
Update: Please note, that this solution works only for languages where there are small and capital letters. In languages like Chinese, this won't work.
Update 2: I came to the original solution when I was working on a fuzzy search. If you are also trying to remove special characters to implement search functionality, there is a better approach. Use any transliteration library that produces a string made only of Latin characters, and then a simple RegExp will do all the magic of removing special characters. (This will work for Chinese too, and you will also get the side benefit of making Tromsø == Tromso.)
I use RegexBuddy for debugging my regexes; it supports almost all languages and is very useful. Then I copy/paste the result for the targeted language.
Terrific tool and not very expensive.
So I copy/pasted your regex, and your issue is that [ and ] are special characters in regex, so you need to escape them. So the regex should be: /!@#$^&%*()+=-[\x5B\x5D]\/{}|:<>?,./im
I did something like this: str.replace(/\s|[0-9_]|\W|[#$%^&*()]/g, "")
But there are some people who did it much more simply, like str.replace(/[\W_]/g, "");
@Seagull's answer (https://stackoverflow.com/a/26482552/4556619)
looks good, but you get the string "undefined" in the result when there are some special (Turkish) characters. See the example below.
let str="bənövşəyi 😟пурпурный İdÖĞ";
I slightly improved it and patched it with an undefined check.
function removeSpecials(str) {
    let lower = str.toLowerCase();
    let upper = str.toUpperCase();
    let res = "", i = 0, n = lower.length, t;
    for (i; i < n; ++i) {
        if (lower[i] !== upper[i] || lower[i].trim() === '') {
            t = str[i];
            if (t !== undefined) {
                res += t;
            }
        }
    }
    return res;
}
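The undefined values appear because case conversion can change the length of a string, so the indexes of lower, upper and str stop lining up. One way to sidestep that, sketched here under the assumption that for...of is available (ES2015), is to convert each character on its own. The name removeSpecialsPerChar is hypothetical; like the functions above, it also drops digits and any other caseless characters.

```javascript
var dotted = '\u0130'; // "İ", LATIN CAPITAL LETTER I WITH DOT ABOVE
console.log(dotted.length, dotted.toLowerCase().length); // 1 2

// Hypothetical variant: compare the two case forms of one character at a time.
function removeSpecialsPerChar(str) {
  var res = '';
  for (var ch of str) { // for...of walks code points, so an emoji stays in one piece
    if (ch.toLowerCase() !== ch.toUpperCase() || ch.trim() === '') {
      res += ch;
    }
  }
  return res;
}

console.log(removeSpecialsPerChar('ab, cd!')); // "ab cd"
```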
text.replace(/[`~!@#$%^*()_|+\-=?;:'",.<>\{\}\[\]\\\/]/gi, '');
Why don't you do something like:
var re = /^[a-z0-9 ]+$/i;
var isValid = re.test(yourInput);
to check whether your input contains any special characters? (isValid is true only when every character is a letter, a digit, or a space.)

Regular expressions for parsing "real" words in JavaScript

I've got some text where some words are "real" words, and others are masks that will be replaced with some text and that are surrounded with, say, "%". Here's the example:
Hello dear %Name%! You're %Age% y.o.
What regular expression should I use to get "real" words, without using lookbehind, because they don't exist in JavaScript?
UPD: I want to get words "Hello", "dear", "you're", "y.o.".
If I've understood your question correctly this might work.
I would go about it the other way around, instead of finding the real words I would remove the "fake-words."
s = "Hello dear %Name%! You're %Age% y.o."
realWords = s.replace(/%.*?%/g, "").split(/ +/)
You could use split to get the words and filter the words afterwards:
var str = "Hello dear %Name%! You're %Age% y.o.", words;
words = str.split(/\s+/).filter(function(val) {
return !/%[^%]*%/.test(val);
});
To do a search and replace with regexes, use the string's replace() method:
myString.replace(/replaceme/g, "replacement")
Using the /g modifier makes sure that all occurrences of "replaceme" are replaced. The second parameter is a normal string with the replacement text.
You can match the %Something% matches using %[^%]*?%, but how are you storing all of the individual mask values like Name and Age?
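If the mask values live in a plain object keyed by mask name, the same pattern can drive the replacement through a callback. This is only a sketch under that assumption: fillMasks and the values object are hypothetical, and unknown masks are deliberately left untouched.

```javascript
// Hypothetical helper: replace each %Name% mask with values[Name] when it is defined.
function fillMasks(template, values) {
  return template.replace(/%([^%]*)%/g, function (whole, name) {
    // Leave the mask as-is when no value is stored for it.
    return Object.prototype.hasOwnProperty.call(values, name)
      ? String(values[name])
      : whole;
  });
}

var filled = fillMasks("Hello dear %Name%! You're %Age% y.o.", { Name: 'Ann', Age: 30 });
console.log(filled); // "Hello dear Ann! You're 30 y.o."
```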
Use a regular expression in JavaScript and split the string on the matches of that regular expression.
//javascript
var s = "Hello dear %Name%! You're %Age% y.o.";
var words = s.split(/%[^%]*?%/);
// Print each chunk of text found between the masks
for (var i = 0; i < words.length; i++) {
    console.log(words[i]);
}
