Match exact word and remove leading space in regular expression - javascript

I'm looking for a regular expression.
Requirement:
I need to select a complete word from a string (the word might contain special characters or anything else). I'm pretty close to the solution.
Example:
character-set
Regular expression: (?:^|\s)(cent-er)(?=\s|$)
Result: " character-set" with a leading space.
But I want to remove the leading space from the selected word. The word should match exactly, i.e. if I search for character or character- or -set or set, it should not return any result.
Any help is much appreciated. Thanks in advance.

It is not exactly what you seem to describe (as far as I could understand, that is), but maybe what you are looking for are word boundaries: \b. Try the regex (parentheses optional):
(\b)(cent-er)(\b)
Other than that, if you have to have a space before the word, then you will have to match it (and then use a capturing group to extract the word without the space), because JavaScript's regex had no lookbehinds before ES2018.
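A minimal sketch of the capturing-group approach, assuming the word is delimited by whitespace or the string boundaries as in the question. The names findExactWord and escapeRegExp are hypothetical helpers, not part of any library:

```javascript
// Escape regex metacharacters so the search word is treated literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns the word without the leading space, or null if it is not
// present as a complete, whitespace-delimited word.
function findExactWord(text, word) {
  const re = new RegExp("(?:^|\\s)(" + escapeRegExp(word) + ")(?=\\s|$)");
  const m = re.exec(text);
  return m ? m[1] : null; // group 1 holds only the word itself
}

console.log(findExactWord("the cent-er of town", "cent-er")); // "cent-er"
console.log(findExactWord("the cent-ers", "cent-er"));        // null
```

The leading space is still consumed by the match, but because it sits in a non-capturing group, group 1 contains only the word.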

Related

Can I make my regex split the punctuation marks from my special words?

I have the following string:
"By signing in, I agree to the {{#a}}[Terms of Use](https://www.example.com/termsofuse){{/a}} and {{#a}}[Privacy Policy](https://www.example.com/privacy){{/a}}."
And I am using the following regex to split the words while considering {{#a}}[Terms of Use](https://www.example.com/termsofuse){{/a}} and {{#a}}[Privacy Policy](https://www.example.com/privacy){{/a}} as whole words.
\s+(?![^\[]*\])
My problem is that my current regex does not split off the full stop at the end of {{#a}}[Privacy Policy](https://www.example.com/privacy){{/a}}. (the trailing full stop stays attached to the token). Ideally I would like my regex to split off full stops, exclamation marks and question marks. That being said, I'm not sure how I would differentiate between a full stop at the end of a word and a full stop that is part of the URL.
You can try a variation of the following regular expression:
\s+(?![^\[]*\])|(?=[\.?!](?![a-zA-Z0-9_%-]))
The new part being the alternation of (?=[\.?!](?![a-zA-Z0-9_%-])) at the end. It performs a positive lookahead of a period, question mark or bang, using a negative lookahead to make sure it's not followed by a URL-ish looking character. You may need to adjust that character class in brackets to contain the characters you want to consider part of the URL.
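As a rough illustration, here is how that pattern behaves with .split on the string from the question. The variable names are only for this sketch:

```javascript
const input = "By signing in, I agree to the {{#a}}[Terms of Use](https://www.example.com/termsofuse){{/a}} and {{#a}}[Privacy Policy](https://www.example.com/privacy){{/a}}.";

// First alternative: whitespace that is not inside [...].
// Second alternative: the zero-width position before . ? or ! when that
// punctuation mark is not followed by a URL-ish character.
const splitter = /\s+(?![^\[]*\])|(?=[\.?!](?![a-zA-Z0-9_%-]))/;

const parts = input.split(splitter);
console.log(parts);
// The final full stop ends up as its own element, while the dots inside
// the URLs stay attached to their tokens.
```

Note that the comma after "in" is not split off, since only full stops, question marks and exclamation marks are listed in the lookahead.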
Instead of .split you will be better off using .match here using this regex:
/\{\{#a}}.*?\{\{\/a}}/g
This matches {{#a}} followed by 0 or more of any character (matched lazily) followed by {{/a}}.
or else you may use this more strict regex match:
\{\{#a}}\[[^\]]*]\([^)]*\)\{\{\/a}}
Here:
\[[^\]]*]: Matches [...] substring
\([^)]*\): Matches (...) substring
var string = "By signing in, I agree to the {{#a}}[Terms of Use](https://www.example.com/termsofuse){{/a}} and {{#a}}[Privacy Policy](https://www.example.com/privacy){{/a}}.";
console.log( string.match(/\{\{#a}}.*?\{\{\/a}}/g) );

If Statement with .match(regex) in javascript not picking up spaces

Hi guys, I'm trying to check if a user input string contains a space. I'm using http://regexr.com/ to check if my regular expression is correct (FYI, I'm new to regex). It seems to be correct.
But it doesn't work: the value still gets returned even if there is a space. Is there something wrong with my if statement, or am I missing how regex works?
var regex = /([ ])\w+/g;
if (nameInput.match(regex) || realmInput.match(regex)) {
    alert('spaces not allowed');
} else {
    //do something else
}
Thanks in Advance
This regex /([ ])\w+/g will match any string which contain a space followed by any number of "word characters". This won't catch, for example, a space at the end of the string, not followed by anything.
Try using /\s+/g instead. It will match any occurrence of at least one space (including tabs).
Update:
If you wish to match only a single space, this will do the trick: / /g. There's no real need for the brackets and parentheses, and since one space is enough, even the g flag is unnecessary; it could simply have been / /.
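A small sketch of the check, assuming the inputs are plain strings. The function name hasWhitespace is made up for this example:

```javascript
// True if the value contains any whitespace character (space, tab,
// newline, ...). No g flag: with .test() a global regex keeps state in
// lastIndex between calls, which is not wanted for a simple yes/no check.
function hasWhitespace(value) {
  return /\s/.test(value);
}

console.log(hasWhitespace("abc "));  // true  (trailing space is caught)
console.log(hasWhitespace("a b"));   // true
console.log(hasWhitespace("abc"));   // false
```

Use / / instead of /\s/ in the function if only the literal space character should be rejected.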
Your current regex doesn't match 'abc ' (a word with a space character at the end). If you want to make sure, you can trim your input before the check :).
You can check here https://regex101.com/
The right regex for matching only white space is
/([ ])/g

Issue with JS regular expression

I'm trying to create a regular expression in order to check some text inserted in a textarea. Basically, when typing I check the number of words inserted. As special chars, only commas and full stop are allowed.
The problem is, for example, when I type word,anotherword my regex recognises only one word instead of two. I cannot find a good regex for it.
My current regex is:
val.match(/\S+[A-Za-z]/g)
What shall I add? Thanks a lot!
Since \S matches any non-whitespace character, including a comma, you should be using
val.match(/\w+/g)
A \w matches word characters, those in A-Z, a-z, 0-9 ranges and a _ (underscore).
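For counting, a minimal sketch could look like this. The function name countWords is hypothetical, and the null check is needed because .match returns null when nothing matches:

```javascript
// Count runs of word characters; commas, full stops and spaces all act
// as separators because none of them is matched by \w.
function countWords(val) {
  const words = val.match(/\w+/g);
  return words ? words.length : 0;
}

console.log(countWords("word,anotherword")); // 2
console.log(countWords("one, two. three"));  // 3
console.log(countWords(""));                 // 0
```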
Do you want to extract the words, or check if the entered text is of the correct format?
If the first, use the string's split(",") method and check that each word is OK.
Otherwise, I think this regex should do it:
val.match(/^(\w+\,?)+$/);
The ^ and $ anchors make sure the whole string, from start to end, is in the correct format.
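A quick sketch of how that validation pattern behaves. The function name isCommaSeparatedWords is invented for the example; note that the pattern as written accepts neither spaces nor full stops:

```javascript
// Validate that the whole string is one or more words, each optionally
// followed by a comma.
function isCommaSeparatedWords(val) {
  return /^(\w+\,?)+$/.test(val);
}

console.log(isCommaSeparatedWords("word,anotherword")); // true
console.log(isCommaSeparatedWords("word another"));     // false (space)
console.log(isCommaSeparatedWords(""));                 // false (empty)
```

Because the group nests one quantifier inside another, long inputs that almost match can backtrack heavily, so it is probably best kept to short textarea values or replaced with a split-and-check approach.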

Specific regex positive look(around|ahead|behind) in Javascript

I'm looking to match /(?=\W)(gimme)(?=\W)/gi or alike. The \W are supposed to be zero-width characters to surround my actual match.
Maybe some background. I want to replace certain words (always \w+) with some literal padding added, but only if they are not surrounded by a \w. (That does sound like a negative lookaround, but I hear JS doesn't do those!?)
(Btw: the above "gimme" is the word literal I want to replace. If that wasn't obvious.)
It has to be (?) a lookaround, because the \W have to be zero-width, because the intention is a .replace(...) and I cannot replace/copy the surrounding characters.
So this won't work:
text.replace(/(?=\W)(gimme)(?=\W)/gi, function(l, match, r) {
    return l + doMagic(match) + r;
})
The zero-width chars have to be ignored, so the function can return (and replace) only doMagic(match).
I have only very limited lookaround experience and none of it in JS. Grazie.
PS. Or maybe I need a lookbehind and those aren't supported in JS..? I'm confused?
PS. A little bit of context: http://jsfiddle.net/rudiedirkx/kMs2N/show/ (ooh a link!)
You can use the word boundary shortcut \b to assert that it's the whole word that you are matching.
The easiest way to achieve what you want to do is probably to match:
/(\s+gimme)(?=\W)/gi
and replace with [yourReplacement] - i.e. capture the whitespaces before 'gimme' and then include one in the replacement.
Another way to approach this would be capturing more characters before and after the gimme literal and then using the groups with backreference:
(\W+?)gimme(\W+?) - your match - note that this time the before and after characters are in the capturing groups 1 and 2
And you'd want to use $1[yourReplacement]$2 as the replacement string: in JavaScript, captured groups are referenced in the replacement string with $1 and $2 (inside the pattern itself, a backreference is written \1). The idea is to tell the engine to put back whatever was matched by the first and second capturing parentheses.
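The capture-and-put-back idea can be sketched with a replacement function. This is only one possible variant: the leading character is captured and re-emitted, while the trailing side stays a lookahead so it is not consumed. doMagic is the asker's placeholder, given a made-up body here:

```javascript
// Hypothetical stand-in for the asker's doMagic(): wraps the word.
function doMagic(word) {
  return "[" + word + "]";
}

const text = "gimme that, Gimme! notgimme gimmex";

// (^|\W) captures the character before the word (or nothing at the start
// of the string) so it can be put back; (?=\W|$) checks the character
// after the word without consuming it.
const result = text.replace(/(^|\W)(gimme)(?=\W|$)/gi, function (m, before, word) {
  return before + doMagic(word);
});

console.log(result); // "[gimme] that, [Gimme]! notgimme gimmex"
```

Only the standalone occurrences are replaced; "notgimme" and "gimmex" are left alone because a word character touches the literal on one side.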
What you currently have will not work, for the following reason, (?=\W) means "the next character is not a word character", and the next thing you try to match is a literal g, so you have a contradiction ("next character is a g, but isn't a letter").
You do in fact need a lookbehind, but they were not supported by JavaScript before ES2018.
Check out this article on Mimicking Lookbehind in JavaScript for a possible approach.
Have you considered using a lexer/parser combo?
This one is javascript based, and comes with a spiffy demonstration.

Javascript lookahead regular expression

I'm trying to write a regular expression to parse the following string out into three distinct parts. This is for a highlighting engine I'm writing:
"\nOn and available after solution."
I have a regular expression that's dynamically created for any word a user might input. In the above example, the word is "on".
The regular expression expects a word with any amount of white space ([\s]*) followed by the search word, with no -\w following it (e.g. on-time and on-wards should not be valid results). To complicate this, there can be a -, $, < or > symbol following the word, so on-, on> or on$ are valid. This is why there is a negative lookahead after the search word in my regular expression below.
There's a complicated reason for this, but it's not relevant to the question. The last part should be the rest of the sentence. In this example, " and available after solution."
So,
p1 = "\n"
p2 = "On"
p3 = " and available after solution"
I currently have the following regular expression.
test = new RegExp('([\\s]*)(on(?!\\-\\w))([$\\-><]*?\\s(?=[.]*))',"gi")
The first part of this regular expression ([\\s]*)(on(?!\\-\\w))[$\\-><]*? works as expected. The last part does not.
In the last part, what I'm trying to do is force the regular expression engine to match whitespace before matching additional characters. If it can not match a space, then the regular expression should end. However, when I run this regular expression, I get the following results
str1 = "\nOn ly available after solution."
test.exec(str1)
["\n On ", "\n ", "On"]
So it would appear to me that the last positive look ahead is not working. Thanks for any suggestions, and if anyone needs some clarification, let me know.
EDIT:
It would appear that my regular expression was not matching because I didn't realize the following caveat:
You can use any regular expression inside the lookahead. (Note that this is not the case with lookbehind. I will explain why below.) Any valid regular expression can be used inside the lookahead. If it contains capturing parentheses, the backreferences will be saved. Note that the lookahead itself does not create a backreference. So it is not included in the count towards numbering the backreferences. If you want to store the match of the regex inside a backreference, you have to put capturing parentheses around the regex inside the lookahead, like this: (?=(regex)). The other way around will not work, because the lookahead will already have discarded the regex match by the time the backreference is to be saved.
The dot in the character class [.] means a literal dot. Change it to just . if you wish to match any character.
The lookahead (?=.*) will always match and is completely pointless. Change it to (.*) if you just want to capture that part of the string.
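Putting those two remarks together, a sketch of the pattern with the pointless lookahead replaced by a capturing (.*) might look like this. The function name splitAroundWord is hypothetical, and the search word "on" is hard-coded as in the question:

```javascript
// Three capture groups: leading whitespace, the search word (not followed
// by -\w), and the rest of the line starting with optional $ - > <
// symbols and one whitespace character.
function splitAroundWord(str) {
  const re = new RegExp('(\\s*)(on(?!-\\w))([$\\-><]*?\\s.*)', 'i');
  const m = re.exec(str);
  return m ? [m[1], m[2], m[3]] : null;
}

console.log(splitAroundWord("\nOn and available after solution."));
// [ "\n", "On", " and available after solution." ]
console.log(splitAroundWord("on-time now")); // null
```

Since . does not match line breaks by default, the third group stops at the end of the current line.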
I think the problem is that your negative lookahead on(?!\-\w) only rejects an on that is followed by - and then \w. I think what you want instead is on(?!\-|\w), which matches an on that is not followed by - OR \w.
