Unexpected result with Regular Expression in JavaScript

This is my string:
var re = "i have a string";
And this is my expression:
var str = re.replace(/(^[a-z])/g, function(x){return x.toUpperCase();});
I want it to convert the first character of every word to uppercase, but the replacement above only uppercases the first character of the string, even though I added /g at the end.
Where is my mistake?

You can use \b to anchor each match at a word boundary instead of at the start of the string:
const re = 'i am a string';
console.log(re.replace(/(\b[a-z])/g, (x) => x.toUpperCase()));
The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a "word boundary". This match is zero-length.
There are three different positions that qualify as word boundaries:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
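A quick way to see these three cases is to list every index where \b succeeds. The sketch below assumes a modern JavaScript engine with String.prototype.matchAll (Node.js 12+ or a current browser); boundaryPositions is only an illustrative name.

```javascript
// Illustrative helper: every index at which the zero-length \b assertion succeeds.
function boundaryPositions(s) {
  // matchAll steps past empty matches, so each position is reported once.
  return [...s.matchAll(/\b/g)].map((m) => m.index);
}

// Start of string, each word/space transition, and end of string.
console.log(JSON.stringify(boundaryPositions("i am a string"))); // [0,1,2,4,5,6,7,13]
```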

Vlaz's comment looks like the right answer to me: by putting a "^" at the beginning of your pattern, you've guaranteed that you'll only find the first character. The other letters won't match the pattern despite the "/g", because they don't immediately follow the start of the string.
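A minimal side-by-side sketch of the two patterns, using the string from the question (the helper toUpper is just for illustration):

```javascript
const phrase = "i have a string";
const toUpper = (letter) => letter.toUpperCase();

// Without the m flag, ^ is true only at index 0, so /g can find just one match.
const caretResult = phrase.replace(/^[a-z]/g, toUpper);

// \b is true in front of every word, so every leading letter is replaced.
const boundaryResult = phrase.replace(/\b[a-z]/g, toUpper);

console.log(caretResult); // I have a string
console.log(boundaryResult); // I Have A String
```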

Related

Why can't '\b\w+(?=ing)\b' match "I'm singing while you're dancing"

I have the following regex
var reg = /\b\w+(?=ing)\b/g;
var str = "I'm singing while you're dancing";
str.match(reg) // ==>null
But if the regex is /\b\w+(?=ing\b)/g then the str can match 'sing,danc'
Why does that match, but my previous example doesn't?
Because (?=ing) is a zero-length match, you are trying to match one or more word characters that are followed by 'ing' and, at that same position, by a word boundary.
Because a word boundary is a change between word characters and non-word characters, a word character followed by an 'i' is never followed by a word boundary.
/\b\w+(?=ing\b)/g matches the 's' in 'sing,danc' because ',' is a non-word character, and therefore there is a word boundary between the 'g' and ','. To find the correct regexp, you need to be clearer on why 'sing,danc' should not match.
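The two patterns from the question can be checked directly in Node.js or a browser console; a sketch:

```javascript
const singing = "I'm singing while you're dancing";

// The outer \b would have to hold at the very position where "ing" starts,
// i.e. between two word characters, so it never succeeds.
const withOuterBoundary = singing.match(/\b\w+(?=ing)\b/g);

// Inside the lookahead, \b is checked after "ing" instead.
const withInnerBoundary = singing.match(/\b\w+(?=ing\b)/g);

console.log(withOuterBoundary); // null
console.log(JSON.stringify(withInnerBoundary)); // ["sing","danc"]
```

The array ["sing","danc"] is what prints as sing,danc when it is converted to a string.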
You need to remove the \b at the end, since the part you want to match does not end on a boundary. A standalone danc or sing would match \b\w+\b, but the danc in dancing and the sing in singing would not:
\b\w+(?=ing)
Alternatively,
\b\w+(?=ing\b)
would make sure that your ing is at the boundary, and not the \w+ part.
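On the question's sentence the two patterns in this answer return the same result. A made-up input such as "singing kingdom" (chosen only for illustration) shows where they differ, assuming standard JavaScript regex semantics:

```javascript
const kingdomText = "singing kingdom";

// (?=ing) accepts "ing" anywhere, including in the middle of a word.
const anyIng = kingdomText.match(/\b\w+(?=ing)/g);

// (?=ing\b) accepts "ing" only when it finishes the word.
const finalIng = kingdomText.match(/\b\w+(?=ing\b)/g);

console.log(JSON.stringify(anyIng)); // ["sing","k"]
console.log(JSON.stringify(finalIng)); // ["sing"]
```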

How to understand regex '\b'?

I am learning regex, but I can't understand '\b', which matches a word boundary. There are three situations, like this:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
I can't understand the third situation. For example:
var reg = /end\bend/g;
var string = 'wenkend,end,end,endend';
alert( reg.test(string) ) ; //false
'\b' requires a '\w' character on one side and a non-'\w' character on the other side. The string 'end,end' should satisfy that rule: the character after the first 'end' is ',', and the character before the last 'end' is ','. So why is the result false? Could you help? Thanks in advance!
Update:
With your help, I understand it now. In 'end,end', the first 'end' matches and is followed by a boundary, but the next character is ',' and not 'e', so /end\bend/ is false.
In other words, /end\bend/g and similar patterns can never match anything.
Thanks again!
\b matches a position, not a character. So the regex /end\bend/g says that there must be the string end, followed by a position where the next character is not a word character. In your string that character is ",", so \b matches, but the regex engine doesn't move forward in the string; it stays at ",". The next character in your regex is e, and e doesn't match ",", so the regex fails. Here is what happens, step by step:
Step 1: /end\bend/g against "end,end": the first end of the regex matches the first end of the string (match).
Step 2: both the regex and string positions have moved; \b matches the zero-length position between "end" and "," (match).
Step 3: the previous match was zero-length, so only the regex position moved; the second end of the regex now faces "," in the string (no match).
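To make the zero-width point concrete, here is a small sketch using the string from the question; the comma-consuming pattern is only an illustrative variant:

```javascript
const endText = "wenkend,end,end,endend";

// \b consumes nothing: after it succeeds the engine is still in front of ",",
// and the "e" of the second "end" cannot match ",".
const zeroWidth = /end\bend/.test(endText);

// Consuming the comma explicitly lets the match continue past the boundary.
const withComma = endText.match(/end\b,end/g);

console.log(zeroWidth); // false
console.log(JSON.stringify(withComma)); // ["end,end","end,end"]
```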
With (most) regular expression engines, you can match, capture characters and assert positions within a string.
For the purpose of this example, let's assume the string:
Rogue One: A Star Wars Story
where you want to match the character o (which appears twice, after R and after t). Now suppose you want to constrain the position and match an o only when it is followed by a lowercase r.
You write (with a positive lookahead):
o(?=r)
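A sketch of the lookahead on the example string, reporting the matched text and its index; only the o in Story qualifies, since the o in Rogue is followed by g:

```javascript
const title = "Rogue One: A Star Wars Story";

// The lookahead checks for "r" without consuming it, so only the "o" is matched.
const oBeforeR = [...title.matchAll(/o(?=r)/g)].map((m) => [m[0], m.index]);

console.log(JSON.stringify(oBeforeR)); // [["o",25]]
```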
Now generalize the idea of zero-width assertions: you want to look for a word character ahead while making sure there's no word character immediately behind. For that you could write:
(?=\w)(?<!\w)
A positive lookahead and a negative lookbehind, combined. We're almost there :) You only need the same thing the other way round (a word character behind and no word character ahead), which is:
(?<=\w)(?!\w)
If you combine these two, you'll eventually get (see the | in the middle):
(?:(?=\w)(?<!\w)|(?<=\w)(?!\w))
Which is equivalent to \b (and a lot longer). Coming back to our string, this is true for:
Rogue One: A Star Wars Story
# right before R
# right after e in Rogue
# right before O of One
# right after e of One (: is not a word character)
# and so on...
See a demo on regex101.com.
To conclude, you can think of \b as a zero-width assertion which only ensures a position within the string.
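The equivalence can be spot-checked by comparing the positions each pattern reports on the example string. This is a sketch, not a proof, and it assumes an engine with ES2018 lookbehind support (current Node.js or browsers); positionsOf is an illustrative helper name:

```javascript
// Long-hand word boundary: a word character ahead and none behind,
// OR a word character behind and none ahead. Needs ES2018 lookbehind.
const longhandBoundary = /(?:(?=\w)(?<!\w)|(?<=\w)(?!\w))/g;

// Indices at which a global, zero-width pattern succeeds.
function positionsOf(re, s) {
  return [...s.matchAll(re)].map((m) => m.index);
}

const story = "Rogue One: A Star Wars Story";
const viaShorthand = positionsOf(/\b/g, story);
const viaLonghand = positionsOf(longhandBoundary, story);

console.log(JSON.stringify(viaShorthand)); // [0,5,6,9,11,12,13,17,18,22,23,28]
console.log(viaShorthand.join() === viaLonghand.join()); // true
```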
Try this Expression
/(end)\b|\b(end)/g
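A quick sketch of what this alternation finds on the question's string. It does not make a boundary appear between two ends; it simply matches each end that has a boundary on at least one side, which here is all five occurrences (including the one at the end of wenkend):

```javascript
const endList = "wenkend,end,end,endend";

// Either "end" followed by a boundary, or a boundary followed by "end".
const endPositions = [...endList.matchAll(/(end)\b|\b(end)/g)].map((m) => m.index);

console.log(JSON.stringify(endPositions)); // [4,8,12,16,19]
```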

How to extract the last word in a string with a JavaScript regex?

What I need is the last match. In the case below, that is the word test, without the $ signs or any other special character:
Test String:
$this$ $is$ $a$ $test$
Regex:
\b(\w+)\b
The $ represents the end of the string, so...
\b(\w+)$
However, your test string seems to have dollar sign delimiters, so if those are always there, then you can use that instead of \b.
\$(\w+)\$$
var s = "$this$ $is$ $a$ $test$";
document.body.textContent = /\$(\w+)\$$/.exec(s)[1];
If there could be trailing spaces, then add \s* before the end.
\$(\w+)\$\s*$
And finally, if there could be other non-word stuff at the end, then use \W* instead.
\b(\w+)\W*$
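The patterns in this answer can be compared on the question's string with a small sketch; lastWord is a hypothetical helper, and the trailing spaces and "!?" are added only to exercise the later variants:

```javascript
// Illustrative helper: capture group 1 of the first match, or null.
function lastWord(re, s) {
  const m = re.exec(s);
  return m ? m[1] : null;
}

const delimited = "$this$ $is$ $a$ $test$";

const noDelimiter = lastWord(/\b(\w+)$/, delimited); // null: the input ends with "$"
const dollarDelimited = lastWord(/\$(\w+)\$$/, delimited);
const trailingSpaces = lastWord(/\$(\w+)\$\s*$/, delimited + "   ");
const nonWordTail = lastWord(/\b(\w+)\W*$/, delimited + "!?");

console.log(noDelimiter, dollarDelimited, trailingSpaces, nonWordTail); // null test test test
```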
In some cases a word may be followed by non-word characters; for example, take the following sentence:
Marvelous Marvin Hagler was a very talented boxer!
If we want to match the word boxer, patterns that expect the word (or a $ delimiter) right at the end of the string will not suffice, because an exclamation mark follows the word. To ensure a successful capture, the following expression will suffice, and it also takes into account extraneous whitespace, newlines and any non-word character.
[a-zA-Z]+?(?=\s*?[^\w]*?$)
https://regex101.com/r/D3bRHW/1
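Breaking the pattern down:
[a-zA-Z] looks for letters only, either uppercase or lowercase.
+? is lazy, so the match expands only as necessary.
(?=...) is a positive lookahead, so the trailing characters are checked but not captured.
\s*? allows optional whitespace.
[^\w]*? allows any run of non-word characters (punctuation, $ signs, and so on).
$ asserts the end of the input.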
The benefits here are that we do not need any flags or word boundaries, non-word characters are taken into account, and we do not need to reach for negation.
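A short sketch of this pattern applied to the boxer sentence and to the $-delimited string from the question (no flags, so $ is the end of the whole input):

```javascript
const boxerSentence = "Marvelous Marvin Hagler was a very talented boxer!";

// Lazy run of letters that is followed only by optional whitespace and
// non-word characters up to the end of the input.
const lazyLetters = /[a-zA-Z]+?(?=\s*?[^\w]*?$)/;

const boxerWord = boxerSentence.match(lazyLetters)[0];
const testWord = "$this$ $is$ $a$ $test$".match(lazyLetters)[0];

console.log(boxerWord, testWord); // boxer test
```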
var input = "$this$ $is$ $a$ $test$";
If you use var result = input.match(/\b(\w+)\b/g), an array of all the matches is returned; you can then get the last one by using pop() on the result or by doing result[result.length - 1].
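A sketch of that approach. It assumes a regex literal rather than a string (inside the string "\b(\w+)\b", \b is a backspace escape, not a word boundary) and the g flag, so that match() returns every match:

```javascript
const dollarWords = "$this$ $is$ $a$ $test$";

// Regex literal with the g flag: match() returns every whole match.
const allWords = dollarWords.match(/\b(\w+)\b/g);

// Index length - 1 is the last element; index length is always undefined.
const lastOfAll = allWords[allWords.length - 1];

console.log(JSON.stringify(allWords)); // ["this","is","a","test"]
console.log(lastOfAll); // test
```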
Your regex will find a word, and since regexes operate left to right it will find the first word.
A \w+ matches as many consecutive word characters (letters, digits and underscore) as it can, but it must match at least one.
A \b matches the zero-width position between a word character and a non-word character. In your case, it matches next to the '$' characters.
What you need is to anchor your regex to the end of the input, which is denoted in a regex by the $ character.
To support input that may have more than just a '$' character at the end of the line (spaces or a period, for instance), you can use \W+, which matches as many non-word characters as it can:
\$(\w+)\W+$
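A sketch of this pattern on the question's string and on a hypothetical variant ending in a space and a period:

```javascript
// "$", a word, then at least one non-word character up to the end of input.
const lastBeforeNonWord = /\$(\w+)\W+$/;

const plain = lastBeforeNonWord.exec("$this$ $is$ $a$ $test$")[1];
const withPeriod = lastBeforeNonWord.exec("$this$ $is$ $a$ $test$ .")[1];

console.log(plain, withPeriod); // test test
```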
Avoid regex: use .split, then .pop the result, and use .replace to remove the special characters:
var match = str.split(' ').pop().replace(/[^\w\s]/gi, '');
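A sketch of the split-based approach on the question's string:

```javascript
const dollarString = "$this$ $is$ $a$ $test$";

// Last space-separated chunk, then drop anything that is neither a word
// character nor whitespace.
const splitLast = dollarString.split(" ").pop().replace(/[^\w\s]/g, "");

console.log(splitLast); // test
```

With trailing whitespace, split(" ").pop() returns an empty string, so calling .trim() on the input first is a reasonable safeguard.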

JS & Regex: how to replace punctuation pattern properly?

Given an input text where all spaces have been replaced by one or more _ characters:
Hello_world_?. Hello_other_sentenc3___. World___________.
I want to keep the _ between words, but I want to stick each punctuation mark back onto the last word of its sentence, with no space between the last word and the punctuation. I want to use the punctuation as the pivot of my regex.
I wrote the following JS-Regex:
str = str.replace(/(_| )*([:punct:])*( |_)/g, "$2$3");
This fails, since it returns:
Hello_world_?. Hello_other_sentenc3_. World_._
Why doesn't it work? How do I delete all the "_" between the last word and the punctuation?
http://jsfiddle.net/9c4z5/
Try the following regex, which makes use of a positive lookahead:
str = str.replace(/_+(?=\.)/g, "");
It replaces every run of underscores that is immediately followed by a period with the empty string, thus removing them.
If you want to match other punctuation characters than just the period, replace the \. part with an appropriate character class.
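A sketch of both the period-only version and a character-class version; the class [.,?!] is just an example of which punctuation to pivot on:

```javascript
const punctText = "Hello_world_?. Hello_other_sentenc3___. World___________.";

// Remove runs of "_" that sit directly before a period.
const periodOnly = punctText.replace(/_+(?=\.)/g, "");

// Same idea with an illustrative character class of punctuation marks.
const anyPunct = punctText.replace(/_+(?=[.,?!])/g, "");

console.log(periodOnly); // Hello_world_?. Hello_other_sentenc3. World.
console.log(anyPunct); // Hello_world?. Hello_other_sentenc3. World.
```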
JavaScript doesn't have :punct: in its regex implementation. I believe you'd have to list out the punctuation characters you care about, perhaps something like this:
str = str.replace(/(_| )+([.,?])/g, "$2");
That is, replace any group of _ or spaces that is immediately followed by punctuation with just the punctuation.
Demo: http://jsfiddle.net/9c4z5/2/
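A sketch of this replacement on the question's input:

```javascript
const punctInput = "Hello_world_?. Hello_other_sentenc3___. World___________.";

// A run of "_" or spaces before . , or ? is replaced by the punctuation
// itself (capture group 2).
const collapsed = punctInput.replace(/(_| )+([.,?])/g, "$2");

console.log(collapsed); // Hello_world?. Hello_other_sentenc3. World.
```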

With a JS regex, matching an exact word but not hyphenated words starting with that word

I could not find a match to this question.
I have a string like so
var s="one two one-two one-three one one_four"
and my function is as follows
function replaceMatches( str, word )
{
var pattern=new RegExp( '\\b('+word+')\\b','g' )
return str.replace( pattern, '' )
}
the problem is if I run the function like
var problem=replaceMatches( s,'one' )
it returns " two -two -three  one_four"
the function replaces every "one" like it should, but it treats hyphenated words as two words, replacing the "one" before the hyphen.
My question is not about the function but about the regex. What literal regex will match only the standalone words "one" in my string, and not "one-two" or "one-\w" (you know what I mean, lol)?
basically
var pat=/\b(one)\b/g
"one one-two one".replace( pat, '')
I want the above ^ to return
" one-two "
only replace the exact match "one" and not the one in "one-two"
The "one" at the end is important too; the regex must work if the match is at the very end.
Thank you, sorry if my question is relatively confusing. I am just trying to get my learn on, and expand my personal library.
What do you consider to be a word?
A word is a sequence of 1 or more word characters, and word boundary \b is defined based upon the definition of word character (and non-word character).
Word character as defined by \w in JavaScript RegExp is shorthand for character class [a-zA-Z0-9_].
What is your definition of a "word"? Let's say your definition is [a-zA-Z0-9_-].
Emulating word boundary
This post describes how to emulate a word boundary in languages that support look-behind and look-ahead. Too bad JS didn't support look-behind when this was written; it has been available since ES2018.
Let us assume the word to be replaced is one for simplicity.
We can limit the replacement with the following code:
inputString.replace(/([^a-zA-Z0-9_-]|^)one(?![a-zA-Z0-9_-])/g, "$1")
Note: I use the expanded form [a-zA-Z0-9_-] instead of [\w-] to avoid association with \w.
Break down the regex:
(
[^a-zA-Z0-9_-] # Negated character class of "word" character
| # OR
^ # Beginning of string
)
one # Keyword
(?! # Negative look-ahead
[a-zA-Z0-9_-] # Word character
)
I emulate the negative look-behind (which would be (?<![a-zA-Z0-9_-]) if supported) by matching either a character from the negated "word" character class or ^, the beginning of the string. This is natural: if we can't find a "word" character, then there must be either a non-"word" character or the beginning of the string. The alternation is wrapped in a capturing group so that it can be put back in the replacement.
Since one is only replaced if there is no "word" character before or after it, there is no risk of missing a match.
Putting together
Since you are removing "word"s, you must make sure your keyword contains only "word" characters.
function replaceMatches(str, keyword)
{
// The keyword must not contain non-"word" characters
if (!/^[a-zA-Z0-9_-]+$/.test(keyword)) {
throw new Error("not a word");
}
// Customize [a-zA-Z0-9_-] and [^a-zA-Z0-9_-] with your definition of
// "word" character
var pattern = new RegExp('([^a-zA-Z0-9_-]|^)' + keyword + '(?![a-zA-Z0-9_-])', 'g')
return str.replace(pattern, '$1')
}
You need to escape meta-characters in the keyword if your definition of "word" character includes regex meta-characters.
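Engines that support lookbehind (ES2018 and later, such as current Node.js and browsers) allow the same function without the capturing-group workaround. The following sketch also escapes the keyword, as suggested above; escapeRegExp and replaceMatchesLookbehind are illustrative names, and "word" characters are still assumed to be [a-zA-Z0-9_-]:

```javascript
// Escape regex metacharacters so the keyword is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Remove the keyword only when no "word" character touches it on either side.
// Uses a real negative lookbehind (ES2018+), so nothing needs to be put back.
function replaceMatchesLookbehind(str, keyword) {
  const word = "[a-zA-Z0-9_-]";
  const pattern = new RegExp(`(?<!${word})${escapeRegExp(keyword)}(?!${word})`, "g");
  return str.replace(pattern, "");
}

const lbA = replaceMatchesLookbehind("one one-two one", "one");
const lbB = replaceMatchesLookbehind("one two one-two one-three one one_four", "one");

console.log(JSON.stringify(lbA)); // " one-two "
console.log(JSON.stringify(lbB)); // " two one-two one-three  one_four"
```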
Use this for your RegExp:
function replaceMatches( str, word ) {
var pattern = new RegExp('(^|[^-])\\b('+word+')\\b([^-]|$)', 'g');
return str.replace(pattern, '$1$3')
}
The (^|[^-]) will match either the start of the string or any character except -. The ([^-]|$) will match either a character other than - or the end of the string.
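A sketch of this function on the question's example. One thing worth checking: because ([^-]|$) consumes the character after each match, two adjacent occurrences cannot share the space between them, so the second one is left in place. replaceMatchesConsuming is an illustrative name, and the keyword is assumed to contain no regex metacharacters:

```javascript
// The answer's function under another name.
function replaceMatchesConsuming(str, word) {
  const pattern = new RegExp("(^|[^-])\\b(" + word + ")\\b([^-]|$)", "g");
  return str.replace(pattern, "$1$3");
}

const consumingA = replaceMatchesConsuming("one one-two one", "one");

// ([^-]|$) consumes the character after a match, so the next match cannot use
// that same character for its (^|[^-]); the second of two adjacent words survives.
const consumingB = replaceMatchesConsuming("one one", "one");

console.log(JSON.stringify(consumingA)); // " one-two "
console.log(JSON.stringify(consumingB)); // " one"
```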
I'm not a JS pattern function expert but the function should replace all.
As for the hyphen in 'one-two': the position between one and - is a word boundary (i.e. \b), and the end of the string is a word boundary if a \w character comes right before it.
But it sounds like you may want 'one' to be preceded by a space or BOL.
([ ]|^)one\b: in that case, make the replacement capture group 1, thus stripping out 'one' only.
And, I'm not sure how that function call works in JS.
Edit: after the new expected output, the regex could be:
([ ]|^)one(?=[ ]|$)
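A sketch of the edited pattern, replacing with $1 so the captured space (or nothing, at the start) is put back. Because the trailing check is a lookahead, adjacent occurrences such as "one one" are both removed:

```javascript
// A space or the start of the input before "one" (captured, then put back),
// and a space or the end of the input after it (lookahead, not consumed).
const spaceBounded = /([ ]|^)one(?=[ ]|$)/g;

const spaceA = "one one-two one".replace(spaceBounded, "$1");
const spaceB = "one two one-two one-three one one_four".replace(spaceBounded, "$1");
const spaceC = "one one".replace(spaceBounded, "$1");

console.log(JSON.stringify(spaceA)); // " one-two "
console.log(JSON.stringify(spaceB)); // " two one-two one-three  one_four"
console.log(JSON.stringify(spaceC)); // " "
```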
