Regexp replace all exact matching words/characters in string - javascript

I need to replace all matching words or special characters in a string, but I can't figure out a way to do so.
For example, I have a string: "This - is a great victory"
I need to replace all - with + signs, or great with unpleasant: the user selects a word to be replaced and gives a replacement for it.
"\\b"+originalTex+"\\b"
was working fine until I realised that \b only works with word characters.
So the question is: what replacement for \b would let me replace any matching word that is enclosed by whitespace?
EDIT: I cannot remove the word boundaries, as that would result in inexact matches. For example, in "you are creator of your world", changing "you" would also change "your", as it contains "you".

You need to use the following code:
var s = "you are creator of your world";
var search = "you";
var torepl = "we";
var rx = new RegExp("(^|\\s)" + search.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') + "(?!\\S)", "gi");
var res = s.replace(rx, "$1" + torepl);
console.log(res);
The (^|\\s) will match and capture into Group 1 start of string or a whitespace. The search.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') will escape special chars (if any) inside the search word. The (?!\\S) lookahead will require a whitespace or end of string right after the search word.
The $1 backreference inserts the contents of Group 1 back into the string during replacement (no need to use any lookbehinds here).
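The approach above can be wrapped in a small reusable function. This is a minimal sketch rather than a definitive implementation: the names escapeRegExp and replaceWholeToken are hypothetical, and it assumes a "word" is anything delimited by whitespace or the string boundaries. It uses a replacement function so that a $ in the user's replacement text is inserted literally.

```javascript
// Hypothetical helper: escape regex metacharacters in user-supplied text.
function escapeRegExp(text) {
  return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Hypothetical helper: replace `search` only where it is delimited by
// whitespace or by the start/end of the string.
function replaceWholeToken(input, search, replacement) {
  var rx = new RegExp("(^|\\s)" + escapeRegExp(search) + "(?!\\S)", "gi");
  // `lead` is Group 1: either an empty string or the whitespace before the match.
  return input.replace(rx, function (match, lead) {
    return lead + replacement;
  });
}

console.log(replaceWholeToken("This - is a great victory", "-", "+"));
// This + is a great victory
console.log(replaceWholeToken("you are creator of your world", "you", "we"));
// we are creator of your world
```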

How about two replaces
var txt = "This - is a great, great - and great victory"
var originalTex1 = "great",originalTex2 = "-",
re1 = new RegExp("\\b"+originalTex1+"\\b","g"),
re2 = new RegExp("\\s"+originalTex2+"\\s","g")
console.log(txt.replace(re1,"super").replace(re2," + "))

Related

How to use regex with an array of keywords to replace?

I am trying to create a loop that will replace certain words with their uppercase version. However I cannot seem to get it to work with capture groups as I need to only uppercase words surrounded by whitespace or a start-line marker. If I understand correctly \b is the boundary matcher? The list below is shortened for convenience.
raw_text = 'crEate Alter Something banana'
var lower_text = raw_text.toLowerCase();
var sql_keywords = ['ALTER', 'ANY', 'CREATE']
for (i = 0; i < sql_keywords.length; i++){
search_key = '(\b)' + sql_keywords[i].toLowerCase() + '(\b)';
replace_key = sql_keywords[i].toUpperCase();
lower_text = lower_text.replace(search_key, '$1' + replace_key + '$2');
}
It loops fine but the replace fails. I assume I have formatted it incorrectly but I cannot work out how to correctly format it. To be clear, it is searching for a word surrounded by either line start or a space, then replacing the word with the upper case version while keeping the boundaries preserved.
Several issues:
A backslash inside a string literal is an escape character, so if you intend to have a literal backslash (for the purpose of generating regex syntax), you need to double it
You did not create a regular expression. A dynamic regular expression is created with a call to RegExp
You would want to provide regex option flags, including g for global, and you might as well ease things by adding the i (case insensitive) flag.
There is no reason to make a capture group of a \b as it represents no character from the input. So even if your code would work, then $1 and $2 would just resolve to empty strings -- they serve no purpose.
You are casting the input to all lower case, so you will lose the capitalisation on words that are not matched.
It will be easier when you create one regular expression for all at the same time, and use the callback argument of replace:
var raw_text = 'crEate Alter Something banana';
var sql_keywords = ['ALTER','ANY','CREATE'];
var regex = RegExp("\\b(" + sql_keywords.join("|") + ")\\b", "gi");
var result = raw_text.replace(regex, word => word.toUpperCase());
console.log(result);
BTW, you probably also want to match reserved words when they are followed by punctuation, such as a comma. \b matches at any transition between a word character (letter, digit or underscore) and a non-word character, in either direction, so that seems fine.
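The first two issues in the list above can be checked directly. The following is a small sketch with illustrative variable names (they are not from the question): it shows what '\b' becomes inside a string literal and why passing a string to replace never behaves like a pattern.

```javascript
// In a string literal, '\b' is a backspace character (U+0008), not a word boundary.
var unescaped = '\bany\b';
console.log(unescaped.charCodeAt(0)); // 8

// Doubling the backslash keeps it in the string, so RegExp receives \b.
var source = '\\bany\\b';
var rx = new RegExp(source, 'gi');

// A plain string passed to replace() is searched for literally,
// so the text "\bany\b" is looked up as-is and nothing is found.
var viaString = 'Any company'.replace(source, 'ANY');
var viaRegExp = 'Any company'.replace(rx, 'ANY');
console.log(viaString); // Any company
console.log(viaRegExp); // ANY company
```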
You can use the RegExp constructor.
Then make a function:
const listRegexp = list => new RegExp(list.map(word => `(${word})`).join("|"), "gi");
Then use it:
const re = listRegexp(sql_keywords);
Then replace:
const output = raw_text.replace(re, x => x.toUpperCase());
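Note that a pattern built as a bare alternation has no word boundaries, so a keyword such as ANY would also match inside "company". A hedged variant is sketched below; escapeForRegExp and boundedListRegexp are hypothetical names, and the escaping step only matters if a keyword could contain regex metacharacters.

```javascript
const sqlKeywords = ['ALTER', 'ANY', 'CREATE'];

// Hypothetical helper: escape regex metacharacters in a keyword.
const escapeForRegExp = word => word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Build \b(?:alter|any|create)\b so only whole words match.
const boundedListRegexp = list =>
  new RegExp('\\b(?:' + list.map(escapeForRegExp).join('|') + ')\\b', 'gi');

const bounded = boundedListRegexp(sqlKeywords);
const upperCased = 'crEate Alter company banana'.replace(bounded, x => x.toUpperCase());
console.log(upperCased); // CREATE ALTER company banana
```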

Replace multiple characters by one character with regex

I have this string :
var str = '#this #is____#a test###__'
I want to replace every run of the characters # and _ with a single #, so the expected output is:
'#this #is#a test#'
Note:
I do not know how many # or _ characters appear in a row in the string.
What I tried:
I tried to write:
var str = '#this #is__ __#a test###__'
str = str.replace(/[#_]/g,'#')
alert(str)
But the output was :
#this #is## ###a test#####
I tried to use * to match a sequence, but that did not work:
var str = '#this #is__ __#a test###__'
str = str.replace(/[#_]*/g,'#')
alert(str)
So how can I get my expected output?
A well written RegEx can handle your problem rather easily.
Quoting Mohit's answer as a starting point:
var str = '#this #is__ __#a test###__';
var formattedStr = str.replace(/[#_,]+/g, '#');
console.log( formattedStr );
Line 2:
Put in formattedStr the result of the replace method on str.
How does replace work? The first parameter is a string or a RegEx.
Note: RegExps in JavaScript are objects of type RegExp, not strings. So writing
/yourRegex/
or
new RegExp('yourRegex')
is equivalent (the constructor takes the pattern without the surrounding slashes).
Now let's discuss this particular RegEx itself.
The leading and trailing slashes are used to surround the pattern, and the g at the end means "globally" - do not stop after the first match.
The square brackets describe a set of characters, any of which can match at that position, while the + sign means "one or more of the preceding element".
Basically, ### will match, but so will # or #####_#, because _ and # belong to the same set.
You might expect (#|_)+ to behave differently, but it does not: it means "# or _, one or more times", so it matches exactly the same runs as [#_]+, mixed ones like __## included.
To treat runs of each character separately, repeat the captured character with a backreference: (#|_)\1*
This means "# or _, then zero or more repetitions of that same character".
So ___ would match, as would ####, but __## would be 2 distinct matches (the former being __, the latter ##).
Another problem is not knowing whether to replace the pattern found with a _ or a #.
Luckily, parentheses give us capturing groups. You can store the text a group matched in temporary variables that can be used in the replacement pattern.
Invoking them is easy: prepend $ to the position of the parenthesised group.
/(foo)textnotgetting(bar)captured(baz)/ for example would fill the capturing groups "variables" this way:
$1 = foo
$2 = bar
$3 = baz
In our case, we want to replace each run with its first character only, which is what group 1 holds, since the \1* repetition sits outside the parentheses.
So we can simply write
str.replace(/(#|_)\1*/g, "$1");
to make it work. Note that the pattern must be a regex literal, not a string in quotes, and that this keeps a run of _ as a single _; to turn every run into a single # as the question asks, use /[#_]+/g with '#' as shown above.
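The behaviour of capturing groups can be verified with a short sketch. The variable names are illustrative; the sample text is just the three group contents glued together so that the pattern from the explanation matches. It also shows that a quantified group such as (#|_)+ keeps only its last repetition, while a backreference (\1) repeats whatever the group captured.

```javascript
var sample = "footextnotgettingbarcapturedbaz";
var groupsRx = /(foo)textnotgetting(bar)captured(baz)/;

// match() exposes the groups at indexes 1..3.
var m = sample.match(groupsRx);
console.log(m[1], m[2], m[3]); // foo bar baz

// In the replacement string, $1..$3 refer to the same groups.
var reordered = sample.replace(groupsRx, "$3-$2-$1");
console.log(reordered); // baz-bar-foo

// (#|_)+ matches the whole mixed run; $1 is the LAST character captured.
var lastOnly = "__##".replace(/(#|_)+/g, "$1");
console.log(lastOnly); // #

// (#|_)\1* matches a run of one repeated character; $1 is that character.
var perCharacter = "__##".replace(/(#|_)\1*/g, "$1");
console.log(perCharacter); // _#
```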
Have a nice day!
Your regex replaces each single instance of a matched character with the character you specified, i.e. #. You need to add the + quantifier to tell it that a run of consecutive matching characters (_, #) should be replaced as a whole instead of each character individually. The + quantifier means that one or more occurrences of the preceding pattern are matched in one go. You can read more about quantifiers on this page:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
var str = '#this #is__ __#a test###__';
var formattedStr = str.replace(/[#_,]+/g, '#');
console.log( formattedStr );
You should use the + to match one-or-more occurrences of the previous group.
var str = '#this #is__ __#a test###__'
str = str.replace(/[#_]+/g,'#')
alert(str)
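To make the difference between the attempts concrete, here is a small comparison using the string from the question (variable names are illustrative). It also shows why * is the wrong quantifier: [#_]* can match the empty string, so a replacement is inserted at every position, including between ordinary letters.

```javascript
var input = '#this #is__ __#a test###__';

// No quantifier: every single # or _ is replaced on its own.
var oneByOne = input.replace(/[#_]/g, '#');
console.log(oneByOne); // #this #is## ###a test#####

// + quantifier: each run of # and _ is replaced as one unit.
var perRun = input.replace(/[#_]+/g, '#');
console.log(perRun); // #this #is# #a test#

// * quantifier: empty matches occur everywhere, even in a string without # or _.
var emptyMatches = 'ab'.replace(/[#_]*/g, '#');
console.log(emptyMatches); // #a#b#
```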

Match parts of code

I'm trying to match parts of code with regex. How can I match var, a, =, 2 and ; from
"var a = 2;"
?
I believe you want this regexp: /\S+/g
To break it down: \S matches any non-whitespace character, + makes it match runs of consecutive non-whitespace characters together (e.g. 'var'),
and the g flag makes it find all of the occurrences in the string instead of stopping at the first one, which is the default behavior.
This is a helpful link for playing around until you find the right regexp: https://regex101.com/#javascript
var str = "var a = 2;";
// clean the duplicate whitespaces
var no_duplicate_whitespace = str.replace(new RegExp("\\s+", "g"), " ");
// and split by space
var tokens = no_duplicate_whitespace.split(" ");
Or as #kuujinbo pointed out:
str.split(/\s+/);
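One caveat: splitting on whitespace keeps "2;" together, while the question asks for 2 and ; as separate parts. A possible sketch, assuming a token is either a run of word characters or a single non-space symbol (a simplification, not a real JavaScript tokenizer):

```javascript
var code = "var a = 2;";

// Whitespace-delimited chunks: the semicolon stays attached to the 2.
var bySpace = code.match(/\S+/g);
console.log(bySpace); // [ 'var', 'a', '=', '2;' ]

// Runs of word characters, or any single character that is neither
// whitespace nor a word character.
var tokens = code.match(/\w+|[^\s\w]/g);
console.log(tokens); // [ 'var', 'a', '=', '2', ';' ]
```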

How to replace all \n with space? [duplicate]

I have a var that contains a big list of words (millions) in this format:
var words = "
car
house
home
computer
go
went
";
I want to make a function that will replace the newline between each word with space.
So the result would look something like this:
car house home computer go went
You can use the .replace() function:
words = words.replace(/\n/g, " ");
Note that you need the g flag on the regular expression to get replace to replace all the newlines with a space rather than just the first one.
Also, note that you have to assign the result of the .replace() to a variable because it returns a new string. It does not modify the existing string. Strings in Javascript are immutable (they aren't directly modified) so any modification operation on a string like .slice(), .concat(), .replace(), etc... returns a new string.
let words = "a\nb\nc\nd\ne";
console.log("Before:");
console.log(words);
words = words.replace(/\n/g, " ");
console.log("After:");
console.log(words);
In case there are multiple line breaks (newline symbols), possibly mixing \r and \n, and you need to replace each run of consecutive line breaks with one space, use
var new_words = words.replace(/[\r\n]+/g," ");
To match all Unicode line break characters and replace/remove them, add \x0B\x0C\u0085\u2028\u2029 to the above regex:
/[\r\n\x0B\x0C\u0085\u2028\u2029]+/g
The /[\r\n\x0B\x0C\u0085\u2028\u2029]+/g means:
[ - start of a positive character class matching any single char defined inside it:
\r - (\x0D) - a carriage return (CR)
\n - (\x0A) - a line feed character (LF)
\x0B - a line tabulation (VT)
\x0C - form feed (FF)
\u0085 - next line (NEL)
\u2028 - line separator (LS)
\u2029 - paragraph separator (PS)
] - end of the character class
+ - a quantifier that makes the regex engine match the previous atom (the character class here) one or more times (consecutive linebreaks are matched)
/g - find and replace all occurrences in the provided string.
var words = "car\r\n\r\nhouse\nhome\rcomputer\ngo\n\nwent";
document.body.innerHTML = "<pre>OLD:\n" + words + "</pre>";
var new_words = words.replace(/[\r\n\x0B\x0C\u0085\u2028\u2029]+/g," ");
document.body.innerHTML += "<pre>NEW:\n" + new_words + "</pre>";
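Since the list in the question starts and ends with a line break, the replacement leaves a space at both ends. A minimal sketch that also trims the result; joinLines is a hypothetical name, and trim() removes any other leading or trailing whitespace as well, which is assumed to be acceptable here.

```javascript
// Hypothetical helper: collapse each run of line breaks into one space,
// then drop the spaces left over at the start and end.
function joinLines(text) {
  return text.replace(/[\r\n\x0B\x0C\u0085\u2028\u2029]+/g, " ").trim();
}

var listed = "\ncar\r\n\r\nhouse\nhome\rcomputer\ngo\n\nwent\n";
console.log(joinLines(listed)); // car house home computer go went
```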
Code : (FIXED)
var new_words = words.replace(/\n/g," ");
A simple solution would look like
words.replace(/(\n)/g," ");
No need for a global regex; use replaceAll instead of replace:
myString.replaceAll('\n', ' ')
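The difference between the string forms of replace and replaceAll can be seen in a short sketch. It assumes an environment that supports String.prototype.replaceAll (ES2021); the variable names are illustrative. If a regular expression is passed to replaceAll, it must carry the g flag, otherwise a TypeError is thrown.

```javascript
var sample = "a\nb\nc";

// With a string pattern, replace() only touches the first occurrence.
var firstOnly = sample.replace("\n", " ");
console.log(JSON.stringify(firstOnly)); // "a b\nc"

// replaceAll() with a string pattern replaces every occurrence.
var all = sample.replaceAll("\n", " ");
console.log(JSON.stringify(all)); // "a b c"

// replaceAll() rejects a regex without the g flag.
var threwTypeError = false;
try {
  sample.replaceAll(/\n/, " ");
} catch (e) {
  threwTypeError = e instanceof TypeError;
}
console.log(threwTypeError); // true
```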

Javascript word boundary unicode space issue

I want to write a regex pattern that matches for full words or phrases even if they have unicode chars to wrap them with some html code. So I use this pattern:
var pattern=new RegExp('(^|\\s)'+phrase+'(?=\\s|$)', "gi");
It works perfectly even on multi-word phrases except for one issue. If the phrase isn't at the start of the string, the match includes the space before the phrase, so after I wrap it I lose that space. I only want to wrap the phrase variable and not the spaces.
For example:
var string="This is a nice sentence.";
var phrase="is a nice";
/*OUTPUT: Thisis a nicesentence*//*HTML OUTPUT: This<span>is a nice</span>sentence*/
/*What I want: This <span>is a nice</span> sentence*/
Of course this pattern could work:
var pattern=new RegExp(phrase, "gi");
But I'm not looking for those strings that are substrings of another.
Is it possible to solve my issue with a better regex pattern?
Simply write back what you captured in group 1:
output = string.replace(pattern, '$1<span>' + phrase + '</span>');
If you are not using replace but match or exec and do the replacement manually, you can still access the capturing group in the returned array and insert the space or empty string before your span.
By the way, if you capture the phrase as well, you don't need any string concatenation in the replacement:
var pattern = new RegExp('(^|\\s)('+phrase+')(?=\\s|$)', "gi");
output = string.replace(pattern, '$1<span>$2</span>');
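Putting this together, a minimal sketch might look as follows. The function names escapePhrase and wrapPhrase are hypothetical, and the escaping step is an addition not present in the question: it assumes the phrase should be treated as literal text rather than as a pattern.

```javascript
// Hypothetical helper: escape regex metacharacters in the phrase.
function escapePhrase(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wrap each whitespace-delimited occurrence of `phrase` in a <span>,
// writing the captured leading whitespace ($1) back in front of it.
function wrapPhrase(text, phrase) {
  var pattern = new RegExp('(^|\\s)(' + escapePhrase(phrase) + ')(?=\\s|$)', 'gi');
  return text.replace(pattern, '$1<span>$2</span>');
}

console.log(wrapPhrase("This is a nice sentence.", "is a nice"));
// This <span>is a nice</span> sentence.
console.log(wrapPhrase("I like c++ a lot", "C++"));
// I like <span>c++</span> a lot
```

Using $2 instead of concatenating the phrase keeps the original casing of the matched text when the i flag is set.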
