The second pattern of a regex not replacing apostrophe - javascript

I'm creating a regex that matches straight apostrophes and replaces them with curly ones. Sometimes an apostrophe sits between two characters. Other times it comes at the end of a word (e.g. ellipsis').
So I have two patterns that handle both situations (joined by an alternation, |).
However, only the first case is being replaced, not the second. In other words, this:
"Wor'd word'".replace(/(?<=\w)\'(?=\w)|(?<=\w)\'(?=\s)/, '’')
Becomes this:
"Wor’d word'"
This confuses me because both types of apostrophes are matching: https://regexr.com/4td7p
Why is this, and how can I fix it?
Update: I figured the problem was that there's no space after the last apostrophe, so I changed the second part of the regex to this: (?<=\w)\'(?!\w) (don't match if there's a character after the apostrophe). But I'm getting the same result.

If you want to match (?<=\w)\' followed by a character and also match (?<=\w)\' not followed by a character, why not just drop the logic after it altogether and just use (?<=\w)'? (no need to escape 's in a regex)
You also need the global flag to replace more than one thing at a time:
console.log(
"Wor'd word'".replace(/(?<=\w)'/g, '’')
);
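To make the effect of the global flag concrete, here is a minimal sketch that runs the asker's updated pattern once without g and once with it. The variable names are only illustrative, and it assumes an engine that supports lookbehind.

```javascript
// The two-branch pattern from the question's update, without the needless escapes.
const pattern = /(?<=\w)'(?=\w)|(?<=\w)'(?!\w)/;
const input = "Wor'd word'";

// Without the g flag, replace() stops after the first match.
const firstOnly = input.replace(pattern, '’');

// Rebuilding the same pattern with the g flag replaces every match.
const all = input.replace(new RegExp(pattern.source, 'g'), '’');

console.log(firstOnly); // Wor’d word'
console.log(all);       // Wor’d word’
```

Both branches of the alternation were matching all along; only the missing flag stopped the second replacement.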

updated
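var str = "Wor'd word' that's a good thing'";
// \b' matches an apostrophe that directly follows a word character
var afterReplace = str.replace(/\b'/g, '’');
console.log(afterReplace); // Wor’d word’ that’s a good thing’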

Related

Replace part of a string and upper case the result

Hi, everything good?
I would like some help with the following scenario:
STRING_ONE:
string_two
STRING_three :
string_four:
stringfive:
I need to identify words at the beginning of a line that end with :
After identifying the words, I need to remove the surrounding spaces and convert the words to uppercase.
I tried some regexes, but because the words change I can't hard-code the replacement: I need to keep the same word, just without the spaces and in capital letters.
The result I'm trying to get is this:
STRING_ONE:
string_two
STRING_three :
STRING_FOUR:
STRINGFIVE:
I can capture the words that match this pattern with the following regex, but I don't know how to write a replacement that only removes the spaces and uppercases the word while keeping the rest of the string the same:
^.*\b:
I tried to replace like this, but it didn't work:
"$1".toUpperCase()
Can anyone help please?
Thanks!
As you are not really replacing anything but rather looking for a pattern, I used the pattern with String.prototype.match() to identify the lines to which .trim() and .toUpperCase() need to be applied. The .split("\n") turns the initial string into an array of lines, which I then .map() over individually. At the end I .join() everything together again.
const str=`STRING_ONE:
string_two
STRING_three :
string_four:
stringfive:`;
console.log(str
.split("\n")
.map(s=>s.match(/^\s*\w+:\s*$/) && s.trim().toUpperCase() || s )
.join("\n")
);
Our regex patterns differ slightly:
yours (/^.*\b:/) will match any line that contains at least one word end followed by a colon;
mine (/^\s*\w+:\s*$/) is stricter and requires exactly one word followed by a colon, on a line that may be padded by any amount of whitespace on either side.
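For comparison, the same result can probably be reached with a single replace() call, since "$1".toUpperCase() is evaluated before replace() ever runs, whereas a replacement function receives each captured group. This is only a sketch: the leading spaces before string_four are an assumption about the original input, and the names text and result are illustrative.

```javascript
// Sample input; the two leading spaces on the fourth line are assumed.
const text = [
  "STRING_ONE:",
  "string_two",
  "STRING_three :",
  "  string_four:",
  "stringfive:",
].join("\n");

// m makes ^ and $ match at every line; [ \t]* avoids swallowing newlines.
// The callback receives the whole match first, then capture group 1.
const result = text.replace(
  /^[ \t]*(\w+:)[ \t]*$/gm,
  (_match, label) => label.toUpperCase()
);

console.log(result);
```

The line "STRING_three :" is left alone because the space before the colon prevents \w+: from matching.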

JavaScript regular expression replace - why does one work, but this other not?

I grabbed the following JavaScript regular expression replace from another site to strip out some invalid characters:
str = str.replace(/[^\u000D\u00B7\u0020-\u007E\u00A2-\u00A4]/g,'');
However, I noticed it wasn't catching occurrences of \u00B7 (the ISO-8859-1 middle dot character).
If I did it in two steps however, it works:
str = str.replace(/\u00B7/g,'');
str = str.replace(/[^\u000D\u00B7\u0020-\u007E\u00A2-\u00A4]/g,'');
The 1st replace seems to be included in the 2nd replace. Can somebody explain to me why the 2nd line doesn't work all by itself? Thanks.
The first and second patterns are completely different. Pattern one replaces \u00B7, while the second pattern replaces all characters NOT listed in the pattern. Because \u00B7 is listed inside the negated class, the second pattern leaves it alone. Remove \u00B7 from pattern two and that should fix your issue.
Just to be clear:
/[^\u000D\u00B7\u0020-\u007E\u00A2-\u00A4]/
matches all characters not in the set. So to match \u00B7 (and have it replaced with ''), remove it from the pattern:
/[^\u000D\u0020-\u007E\u00A2-\u00A4]/
The ASCII character set is given at http://www.asciitable.com/; that is likely the set you want to keep. The range \u0020-\u007E covers most of the common characters of interest; the others are typically not wanted.
\u000D is a carriage return. I would investigate whether you really need \u00A2, \u00A3 and \u00A4.
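A short sketch of the difference between the two negated classes, using a made-up sample string (the variable names are illustrative only):

```javascript
// Original class: \u00B7 is listed, so the negated class never matches it.
const withDotListed = /[^\u000D\u00B7\u0020-\u007E\u00A2-\u00A4]/g;
// Corrected class: \u00B7 is no longer listed, so it gets removed.
const withoutDot = /[^\u000D\u0020-\u007E\u00A2-\u00A4]/g;

// Sample text containing a middle dot (U+00B7) and an e-acute (U+00E9).
const sample = "a\u00B7b\u00E9c";

const keptDot = sample.replace(withDotListed, '');  // "a·bc"
const strippedDot = sample.replace(withoutDot, ''); // "abc"

console.log(keptDot, strippedDot);
```

In both cases the e-acute is removed because it falls outside every listed range; only the treatment of the middle dot changes.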

regular expressions explanation in javascript

Can somebody explain what this regular expression does?
document.cookie.match(/cookieInfo=([^;]*).*$/)[1]
Also it would be great if I can strip out the double quotes I'm seeing in the cookieInfo values. i.e. when cookieInfo="xyz+asd" - I want to strip out the double quotes using the above regular expression.
It's basically saying: grab as many characters as possible that are not semicolons and that follow the string 'cookieInfo='.
Try this to eliminate the double quotes:
document.cookie.match(/cookieInfo="([^;]*)".*$/)[1]
It searches the document.cookie string for cookieInfo=.
Next it grabs all of the characters which are not ; (until it hits the first semicolon).
[...] matches any one of the characters listed inside.
[^...] matches any one character that is not listed inside.
Then it lets the RegEx search through all other characters.
.* any character, 0 or more times.
$ end of string (or in some special cases, end of line).
You could replace " a couple of different ways, but rather than stuffing it into the regex, I'd recommend doing a replace on it after the fact:
var string = document.cookie.match(...)[1],
cleaned_string = string.replace(/^"|"$/g, "");
That second regex says "look at the start of the string and see if there's a ", or look at the end of the string and see if there's a "".
Normally, replace would stop after the first match it found. The g at the end means to keep going for every match it can possibly find in the string that you gave it.
I wouldn't put it in the original RegEx, because playing around with optional quotes can be ugly.
If they're guaranteed to always, always be there, then that's great, but if you assume they are, and you hit one that doesn't have them, then you're going to get a null match.
The regular expression matches a string starting with 'cookieInfo=', followed by 0 or more non-semicolon characters (which it captures), followed by 0 or more 'anythings'.
To strip out the double quotes you can match them with the regex /"/g and replace them with an empty string.
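Putting the suggestions together, a minimal sketch might look like the following. The helper name readCookieInfo and the sample cookie strings are hypothetical; in a browser you would pass document.cookie instead.

```javascript
// Hypothetical helper: returns the cookieInfo value without surrounding
// double quotes, or null when the cookie is not present.
function readCookieInfo(cookieString) {
  // Capture everything after "cookieInfo=" up to the next semicolon.
  const match = cookieString.match(/cookieInfo=([^;]*)/);
  if (match === null) {
    return null; // avoids indexing into a null match
  }
  // Strip one leading and one trailing double quote, if present.
  return match[1].replace(/^"|"$/g, '');
}

const quoted = readCookieInfo('theme=dark; cookieInfo="xyz+asd"; lang=en');
const unquoted = readCookieInfo('cookieInfo=xyz+asd');
const missing = readCookieInfo('theme=dark');

console.log(quoted, unquoted, missing); // xyz+asd xyz+asd null
```

Checking for null before reading [1] covers the case mentioned above where the expected text is not there at all.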

silent group not working in javascript regex match()

I'm trying to extract (potentially hyphenated) words from a string that have been marked with a '#'.
So for example from the string
var s = '#moo, #baa and #moo-baa are writing an email to a#bc.de'
I would like to return
['#moo', '#baa', '#moo-baa']
To make sure I don't capture the email address, I check that the group is preceded by a white-space character OR the beginning of the line:
s.match(/(^|\s)#(\w+[-\w+]*)/g)
This seems to do the trick, but it also captures the spaces, which I don't want:
["#moo", " #baa", " #moo-baa"]
Silencing the grouping like this
s.match(/(?:^|\s)#(\w+[-\w+]*)/g)
doesn't seem to work, it returns the same result as before. I also tried the opposite, and checked that there's no \w or \S in front of the group, but that also excludes the beginning of the line. I know I could simply trim the spaces off, but I'd really like to get this working with just a single 'match' call.
Anybody have a suggestion what I'm doing wrong? Thanks a lot in advance!!
[edit]
I also just noticed: Why is it returning the '#' symbols as well?! I mean, it's what I want, but why is it doing that? They're outside of the group, aren't they?
As far as I know, String.match returns only the whole matches when the "g" modifier is used. With that modifier you are telling the function to find every match of the whole expression instead of returning one match plus its numbered sub-expressions (groups). A global match does not return groups; each array entry is a complete match, which is also why the '#' symbols are included.
In your case, the regular expression you were looking for might be this:
'#moo, #baa and #moo-baa are writing an email to a#bc.de'.match(/(?!\b)(#[\w\-]+)/g);
You are looking for every "#" symbol that is not at a word boundary, i.e. not directly preceded by a word character. So there is no need for silent groups.
If you don't want to capture the space, don't put the \s inside of the parentheses. Anything inside the parentheses will be returned as part of the capture group.
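If the capture group itself is what you want back, one option is String.prototype.matchAll (ES2020), which keeps the groups for every global match. The following is a sketch under that assumption; the pattern is a slightly tidied variant of the one in the question, and the name tags is illustrative.

```javascript
const s = '#moo, #baa and #moo-baa are writing an email to a#bc.de';

// (?:^|\s) requires line start or whitespace before the tag without capturing it.
// Group 1 holds the tag itself: '#', a word, then optional hyphenated words.
const tagPattern = /(?:^|\s)(#\w+(?:-\w+)*)/g;

// matchAll yields one match object per hit; index 1 is capture group 1.
const tags = [...s.matchAll(tagPattern)].map((m) => m[1]);

console.log(tags); // [ '#moo', '#baa', '#moo-baa' ]
```

The email address is skipped because its '#' is preceded by a letter rather than whitespace or the start of the string.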

How can I shorten this regex for JavaScript?

Basically I just want it to match anything inside (). I tried the . and * but they don't seem to work. Right now my regex looks like:
\(([\\\[\]\-\d\w\s/*\.])+\)
The strings it's going to match are URL routes like:
#!/foo/bar/([a-z])/([\d\w])/(*)
In this example, my regex above matches:
([a-z])
([\d\w])
(*)
BONUS:
How can I make it so that it only matches when it starts with a ( and ends with a ). I thought I used the ^ at the front where it's \( and the $ and the end where it's \) but no luck.
Disregard this bonus. I didn't realize it didn't matter...
Are you worried about nested parentheses? If not, you could set it up to match all characters that aren't a closing paren:
\(([^)]*)\)
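As a quick check, here is that pattern applied to the route from the question. This is a sketch: String.raw is used only so the backslashes in the sample route do not need escaping, and the names route and segments are illustrative.

```javascript
// The route from the question, written with String.raw to keep \d and \w literal.
const route = String.raw`#!/foo/bar/([a-z])/([\d\w])/(*)`;

// Match "(", then any run of characters that are not ")", then ")".
const segments = route.match(/\([^)]*\)/g);

console.log(segments.length); // 3
console.log(segments[0]);     // ([a-z])
console.log(segments[2]);     // (*)
```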
Basically I just want it to match anything inside ().
BONUS: How can I make it so that it only matches when it starts with a ( and ends with a )?
Easy peasy.
var re1 = /^\(.*\)$/
// or
var re2 = new RegExp('^\\(.*\\)$');
Edit
Re: @Mike Samuel's comments
Does not match newlines between the parentheses which were explicitly matched by \s in the original.
...
Maybe you should use [\s\S] instead of .
...
If you're going to exclude newlines you should do so intentionally or explicitly.
Note that . matches any single character except the newline character. If you also want to match newlines as part of the "anything" between parentheses, use the [\s\S] character class:
var re3 = /^\([\s\S]*\)$/
// or
var re4 = new RegExp('^\\([\\s\\S]*\\)$');
To negate a match, you use the [^...] construct. Thus, to match anything within parentheses, you would use:
\([^)]+\)
which says "match any string that starts with an open parenthesis, contains one or more characters that are not closing parentheses, and ends with a closing parenthesis".
To match entire lines that match the above construct, just wrap it with ^ and $:
^\([^)]+\)$
I'm not completely sure I understand what you're doing, but try this:
var re = /\/(\([^()]+\))(?=\/|$)/;
Matching the leading slash in addition to the opening paren ensures that the paren is indeed at the beginning. You can't do the same thing at the end because you don't know there will be a trailing slash. And if there is one, you don't want to consume it because it's also the leading slash for the next match attempt.
Instead, you use the lookahead - (?=\/|$) - to match the trailing slash without consuming it. If there is no slash, I assume no other character should be present either--hence the anchor: $.
@patorjk brought up a good point, though: can there be more parentheses between the outermost pair? If there are, the problem is much more complicated. I won't bother trying to expand my regex to deal with nested parens; some regex flavors can handle such things, but not JavaScript. Instead I'll recommend this sloppier regex:
\/(\([\s\S]+?\))(?=\/|$)
I say "sloppy" because it relies on the assumption that the sequences /( and )/ will never appear inside a valid match. As with my first regex, the text that you're interested in (i.e., everything but the leading and trailing slashes) will be captured in group #1.
Notice the non-greedy quantifier, too. With a regular greedy quantifier it will match everything from the first ( to the last ) in one shot. In other words, it'll match ([a-z])/([\d\w])/(*) instead of ([a-z]), ([\d\w]) and (*) as you wanted.
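The greedy versus non-greedy difference can be seen with a small sketch on the route from the question (variable names are illustrative; matchAll needs an ES2020 engine):

```javascript
const route = String.raw`#!/foo/bar/([a-z])/([\d\w])/(*)`;

// Greedy +: runs to the last ")" that is followed by "/" or end of string.
const greedy = route.match(/\/(\([\s\S]+\))(?=\/|$)/)[1];

// Non-greedy +?: stops at the first ")" that is followed by "/" or end of string.
const lazy = [...route.matchAll(/\/(\([\s\S]+?\))(?=\/|$)/g)].map((m) => m[1]);

console.log(greedy);      // ([a-z])/([\d\w])/(*)
console.log(lazy.length); // 3
```

Because the lookahead does not consume the trailing slash, that slash is still available as the leading slash of the next match.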
