JS RegExp catch words not followed by bracket - javascript

I'm trying to come up with a regular expression to match words that are not the beginning of a function.
So it should match any word that is not followed by an opening bracket.
Whatever follows the word should also not end up in the [0] element of the result. So for a string like test), the closing ) should not be part of the match, which is why something like ^([a-zA-Z][\w-]*)(\s|$|\|,)) doesn't work.
An additional problem is that the function name may contain a dash (hence the [\w-]*).
My first attempt:
new RegExp(/^([a-zA-Z][\w-]*)(?!\()/)
This will match everything but the last character from the word, so tes from test(.
The next attempt was: new RegExp(/^([a-zA-Z][\w-]*)(?!\()\b/).
This will not match something like test( but will match get- from get-border(, because the - is a word breaking character.
I guess what I would need is "\b that is not a -", but not capturing it?
A few examples to maybe make clearer what I'm trying to accomplish:
foo( -> null
arg) -> arg
foo-bar( -> null
arg -> arg
The motivation for this problem: I want to split a text like foo(bar(argument)) into a list of tokens: ['foo(', 'bar(', 'argument', ')', ')'], given the regular expressions FUNCTION_START, ARGUMENT (< problem), FUNCTION_END.
Pseudo-code:
while (line.length > 0) {
  regExp.some(r => {
    const match = line.match(r);
    if (match) {
      tokens.push(...);
      line = line.replace(r, '').trim();
      return true;
    }
    return false;
  });
}
The result should not depend on the order of the regular expressions.

You may use the expression:
^[a-zA-Z]+(?=\)|$)
^ Assert beginning of line.
[a-zA-Z]+ Alphabetic characters, lower and upper case, one or more.
(?=\)|$) Positive lookahead, match either a closing bracket ) or end of line $.
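Note that the expression above does not cover names containing a dash. A minimal sketch of an ARGUMENT pattern that handles the four examples from the question: the negative lookahead rejects an opening bracket as well as any further word character or dash, so the engine cannot backtrack to a shorter prefix such as tes or get-. The FUNCTION_START and FUNCTION_END definitions and the tokenize helper are assumptions made for illustration, not taken from the question.

```javascript
// ARGUMENT: a name that is not followed by "(" and cannot be cut short,
// because the lookahead also rejects a following word character or dash.
const ARGUMENT = /^[a-zA-Z][\w-]*(?![\w-]|\()/;
// Assumed definitions for the other two token types.
const FUNCTION_START = /^[a-zA-Z][\w-]*\(/;
const FUNCTION_END = /^\)/;

// Hypothetical tokenizer: tries every pattern at the start of the remaining input.
function tokenize(line) {
  const patterns = [ARGUMENT, FUNCTION_END, FUNCTION_START];
  const tokens = [];
  let rest = line.trim();
  while (rest.length > 0) {
    const hit = patterns.map(p => rest.match(p)).find(Boolean);
    if (!hit) throw new Error(`Unexpected input: ${rest}`);
    tokens.push(hit[0]);
    rest = rest.slice(hit[0].length).trim();
  }
  return tokens;
}

// Returns the matched text, or null when ARGUMENT does not match.
const firstMatch = s => (s.match(ARGUMENT) || [null])[0];

console.log(firstMatch('foo('), firstMatch('arg)'), firstMatch('foo-bar('), firstMatch('arg'));
console.log(tokenize('foo(bar(argument))'));
```

At any position at most one of the three patterns can match, so the order in which they are tried does not matter. Separators such as commas are not handled in this sketch.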

Related

check every occurrence of a special character followed by a whitespace using regex

I'm trying to check every occurrence of a # in a string.
Something like this works only when the # is at the very beginning of the string:
const comment = "#barnowl is cool"
const regex = /#[a-z]/i;
if (comment.charAt(0).includes("#")) {
  if (regex.test(comment)) {
    // do something
    console.log('test passed')
  } else {
    // do something else
  }
} else {
  // do other
}
But what if you have a textarea and a user uses # multiple times to reference other users? This test no longer works, because charAt(0) only looks at the first character of the string.
What regex test works in a situation where you have to check for an occurrence of a # followed by a space? I know I can ditch charAt(0) and use comment.includes("#"), but I want to use a regex pattern to check whether there is a space afterwards.
So if a user types #username followed by a space, the regex should pass.
Adding \s doesn't seem to make the test pass:
const regex = /#[a-z]\s/i; // shouldn't this check for white space after a letter ?
demo:
https://jsbin.com/riraluxape/edit?js,console
I think your expression is very close. There are two things that are missing:
The [a-z] match is only looking for one character, so in order to look for multiple characters it needs to be [a-z]+.
The flags section is missing the g modifier, which lets methods such as match() return every match in the text instead of just the first one.
I believe the regular expression declaration should be adjusted to the following:
const regex = /#[a-z]+\s/ig;
Is this what you want? Matching all the occurrences of the mention?
const regex = /#\w+/ig
I used \w here, a shorthand character class that matches any word character (letters, digits and underscore).
To check for multiple matches instead of only the first one, append g to the regex:
const regex = /#[a-z]*\s/ig;
Your regex with \s actually works, see: https://regex101.com/r/gyMyvB/1
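A short sketch of how the suggestions above differ, using a made-up comment string. test() only reports whether at least one mention followed by whitespace exists, while match() with the g flag collects all of them. The lookahead variant is an assumption on my part; it keeps the trailing whitespace out of each match.

```javascript
const sampleComment = "#barnowl is cool and #hawk agrees";

// true if at least one "#letters" is followed by whitespace (test needs no g flag)
const hasMention = /#[a-z]+\s/i.test(sampleComment);

// g flag: match() returns every occurrence instead of only the first
const withSpace = sampleComment.match(/#[a-z]+\s/gi) || [];

// lookahead: require the whitespace, but keep it out of the match
const namesOnly = sampleComment.match(/#[a-z]+(?=\s)/gi) || [];

console.log(hasMention, withSpace, namesOnly);
```

One caveat: a regex object created with the g flag remembers its lastIndex between test() calls, so reusing the same object for several test() calls can give surprising results.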

JS Regex: Remove anything (ONLY) after a word

I want to remove all occurrences of a symbol (which symbol depends on what I select at the time) after each word, without knowing what the word could be, but leave them in place before each word.
A couple of examples:
!!hello! my! !!name!!! is !!bob!! should return...
!!hello my !!name is !!bob ; for !
and
$remove$ the$ targetted$# $$symbol$$# only $after$ a $word$ should return...
$remove the targetted# $$symbol# only $after a $word ; for $
You need to use capture groups and replace:
"!!hello! my! !!name!!! is !!bob!!".replace(/([a-zA-Z]+)(!+)/g, '$1');
Which works for your test string. To work for any generic character or group of characters:
var stripTrailing = trail => {
  let regex = new RegExp(`([a-zA-Z0-9]+)(${trail}+)`, 'g');
  return str => str.replace(regex, '$1');
};
Note that this fails on any characters that have meaning in a regular expression: []{}+*^$. etc. Escaping those programmatically is left as an exercise for the reader.
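One possible way to do that escaping, as a sketch: prefix every regex metacharacter in the symbol with a backslash before building the pattern. The helper names escapeRegExp and stripTrailingSafe are made up for this example, and the non-capturing group around the symbol is my own change so that a multi-character symbol repeats as a unit.

```javascript
// Escape characters that have a special meaning in a regular expression.
const escapeRegExp = text => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Same idea as stripTrailing, but safe for symbols such as $ or +.
const stripTrailingSafe = trail => {
  const regex = new RegExp(`([a-zA-Z0-9]+)(?:${escapeRegExp(trail)})+`, 'g');
  return str => str.replace(regex, '$1');
};

const stripDollar = stripTrailingSafe('$');
console.log(stripDollar('$remove$ the$ targetted$# $$symbol$$# only $after$ a $word$'));
```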
UPDATE
Per your comment I thought an explanation might help you, so:
First, there's no way in this case to replace only part of a match; you have to replace the entire match. So we need to find a pattern that matches, split it into the part we want to keep and the part we don't, and replace the whole match with the part we want to keep. Let's break up my regex above into multiple lines to see what's going on:
First we want to match any number of sequential alphanumeric characters, that would be the 'word' to strip the trailing symbol from:
( // denotes capturing group for the 'word'
[ // [] means 'match any character listed inside brackets'
a-z // list of alpha character a-z
A-Z // same as above but capitalized
0-9 // list of digits 0 to 9
]+ // plus means one or more times
)
The capturing group means we want to have access to just that part of the match.
Then we have another group
(
! // I used ES6's string interpolation to insert the arg here
+ // match that exclamation (or whatever) one or more times
)
Then we add the g flag so the replacement happens for every match in the target string; without the flag, replace stops after the first match. JavaScript provides a convenient shorthand for accessing the capturing groups in the replacement string: the '$1' above means 'insert the contents of the first capture group here'.
So, in the above, if you replaced '$1' with '$1$2' you'd see the same string you started with, if you did 'foo$2' you'd see foo in place of every word trailed by one or more !, etc.
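The three replacement strings just described can be compared side by side. A small sketch using the test string from the question (the variable names are illustrative):

```javascript
const greeting = '!!hello! my! !!name!!! is !!bob!!';
const wordThenBangs = /([a-zA-Z]+)(!+)/g;

// '$1'   keeps only the word, dropping the trailing symbols
const stripped = greeting.replace(wordThenBangs, '$1');
// '$1$2' puts both groups back, so nothing changes
const unchanged = greeting.replace(wordThenBangs, '$1$2');
// 'foo$2' swaps the word for foo and keeps the trailing symbols
const fooed = greeting.replace(wordThenBangs, 'foo$2');

console.log(stripped);
console.log(unchanged);
console.log(fooed);
```

Note that "is" stays untouched in every variant, because it is not followed by a ! and therefore never matches.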

Select everything between not starting with but ending with

So I am trying to select some names with JS but I can't figure out how. I found 3 solutions here but still could not get it to work:
I would like to select a word that DOESN'T start with . and HAS to end with {
Here is what I have:
\b(?!\.)[\w\-]+(?=\s*{)\b
Also tried: ^(?!\.)[\w\-]+(?=\s*:)
Example:
.test { }
test { } <--- Select this test
If you wish to match -^!foo {}, use (?:\s|^)([^\s\.]+(?=\s*\{)).
If you wish to only match foo {}, use (?:[^\w\.]|^)([^\W\.]+(?=\s*\{)).
var pattern1 = /(?:\s|^)([^\s\.]+(?=\s*\{))/gm,
    pattern2 = /(?:[^\w\.]|^)([^\W\.]+(?=\s*\{))/gm,
    text = ".foo{} bar {} !!baz{} ..-boom {}",
    match;
console.log('First pattern:');
while (match = pattern1.exec(text)) {
  console.log(match[1]); // Prints "bar", "!!baz"
}
console.log('Second pattern:');
while (match = pattern2.exec(text)) {
  console.log(match[1]); // Prints "bar", "baz", "boom"
}
Explanation of the first regex:
We expect the position before your word to be either the start of the line ^ or whitespace \s.
The word itself consists of one or more non-whitespace characters that are not dots: [^\s\.]+.
The word must be followed by a {, optionally preceded by whitespace, for which we use the lookahead (?=\s*\{).
JavaScript engines without lookbehind support (it was only added in ES2018) need a non-capturing group (?:...) to match the leading position before your word.
See JavaScript regular expressions and sub-matches for an explanation of how to access capturing groups
See https://regex101.com/r/bT8sE5/5 for a live demo of the regex with further explanation.
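In engines that implement ES2018 lookbehind (an assumption about your runtime; recent versions of Node.js and the major browsers do), the leading position can be asserted without consuming it, so the whole match is the word itself. A sketch roughly equivalent to the second pattern, with made-up variable names:

```javascript
const selectorText = '.foo{} bar {} !!baz{} ..-boom {}';

// (?<![\w.])  the word must not be preceded by a word character or a dot
// \w+         the word itself
// (?=\s*\{)   it must be followed by optional whitespace and a {
const wordBeforeBrace = /(?<![\w.])\w+(?=\s*\{)/g;

const names = selectorText.match(wordBeforeBrace) || [];
console.log(names);
```

Because nothing before the word is consumed, there is no need to read a capture group here; each entry of the match array is already the bare word.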
How about this:
([^\w\.]|^)(\w+\{)
It basically says: a word followed by {, either at the start of the line or preceded by a character that is neither a word character nor a dot.
It's tricky to do with \b since it matches after the dot quite happily. You can possibly get it to work with the negative lookahead but it's pretty funky stuff at this point :)
You can do it with this: ^.*\.(\w+\{\}){1}.*$
Explanation:
^ is the beginning of the string
.*\. matches everything up to and including the dot (.)
(\w+\{\}){1} the capture group matches the word and the brackets after it (for example test{}) exactly once
.* matches everything after the word
$ is the end of the string
So for the input sadasdas.test{}daasdasdasdasd, the capture group will contain test{}
Try it out here: https://regex101.com/r/hE4uY4/1
The following worked for me on http://regexr.com/, where you can test it:
/(?![\s])(^[^.]([\S]+)[{}][\s])/igm

How to match string inside second set of brackets with Regex Javascript

Here is my string:
type_logistics[][delivery]
type_logistics[][random]
type_logistics[][word]
I would like to pull out the word, whatever it is, inside the second set of brackets. I thought that meant doing something like this:
Indicate that the start of the string I want to capture is [ by writing ^\[
Indicate that there will be any number 1+ of characters using [a-z]+
Indicate that the end will be ] by using \]$
The above three steps should get me to [delivery], [random], [word] in which case I'd just wrap the entire regex in a capture parenthesis ()
My finished statement would have been
string.match(/^\[([a-z]+)\]$/)
Have been playing with regex101.com and literally none of my assumptions have worked LOL. Please help?
With ^ you are assuming that the string you are checking starts there. Your string starts with type_logistics, not with the [ that the regex expects.
To detect the 2nd set of brackets you need to either add the type_logistics[] to the regex or just match everything before the 1st set of brackets with .*
When working with multiple lines (for example during testing on regex101), don't forget to set the modifiers gm
g modifier: global. All matches (don't return on first match)
m modifier: multi-line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
These all would work for your test cases
/^.*\[\]\[([a-z]+)\]$/gm
/^type_logistics\[\]\[([a-z]+)\]$/gm
/^.*\[([a-z]+)\]$/gm
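To actually pull the word out, read capture group 1 of the match. A small sketch using the first of the patterns above without the flags, applied to one line at a time; the variable names are illustrative:

```javascript
const fieldNames = [
  'type_logistics[][delivery]',
  'type_logistics[][random]',
  'type_logistics[][word]',
];

// Group 1 captures the lowercase word inside the second pair of brackets.
const secondBracket = /^.*\[\]\[([a-z]+)\]$/;

const innerWords = fieldNames.map(name => {
  const m = name.match(secondBracket);
  return m ? m[1] : null; // null when the line has no second bracket pair
});

console.log(innerWords);
```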
Match [ followed by a-z characters that are followed by ], join the matches back into a string, split on the [ character, and filter out the empty strings:
var str = "type_logistics[][delivery] type_logistics[][random] type_logistics[][word]"
var res = str.match(/(\[[a-z]+)(?=\])/g).join("").split(/\[/).filter(Boolean);
console.log(res);
document.body.textContent = res;

With a JS Regex matching exact word but not hyphenated words starting with said word

I could not find a match to this question.
I have a string like so
var s="one two one-two one-three one one_four"
and my function is as follows
function replaceMatches( str, word )
{
  var pattern=new RegExp( '\\b('+word+')\\b','g' )
  return str.replace( pattern, '' )
}
The problem is that if I run the function like
var problem=replaceMatches( s,'one' )
it returns " two -two -three  one_four".
The function replaces every "one" as it should, but it treats a hyphenated word as two words and also replaces the "one" before the hyphen.
My question is not about the function but about the regex. What literal regex will match only the standalone word "one" in my string and not the "one" in "one-two" or "one-\w" (you know what I mean)?
Basically:
var pat=/\b(one)\b/g
"one one-two one".replace( pat, '')
I want the above ^ to return
" one-two "
That is, replace only the exact match "one" and not the one in "one-two".
The "one" at the end is important too: the regex must work if the match is at the very end.
Thank you, sorry if my question is relatively confusing. I am just trying to get my learn on, and expand my personal library.
What do you consider to be a word?
A word is a sequence of 1 or more word characters, and word boundary \b is defined based upon the definition of word character (and non-word character).
Word character as defined by \w in JavaScript RegExp is shorthand for character class [a-zA-Z0-9_].
What is your definition of a "word"? Let's say your definition is [a-zA-Z0-9_-].
Emulating word boundary
This post describes how to emulate a word boundary in languages that support look-behind and look-ahead. Unfortunately, JS engines older than ES2018 don't support look-behind.
Let us assume the word to be replaced is one for simplicity.
We can limit the replacement with the following code:
inputString.replace(/([^a-zA-Z0-9_-]|^)one(?![a-zA-Z0-9_-])/g, "$1")
Note: I use the expanded form [a-zA-Z0-9_-] instead of [\w-] to avoid association with \w.
Break down the regex:
(
[^a-zA-Z0-9_-] # Negated character class of "word" character
| # OR
^ # Beginning of string
)
one # Keyword
(?! # Negative look-ahead
[a-zA-Z0-9_-] # Word character
)
I emulate the negative look-behind (which is (?<![a-zA-Z0-9_-]) if supported) by matching a character from negated character class of "word" character and ^ beginning of string. This is natural, since if we can't find a "word" character, then it must be either a non-"word" character or beginning of the string. Everything is wrapped in a capturing group so that it can be replaced back later.
Since one is only replaced if there is no "word" character before or after it, there is no risk of missing a match.
Putting it together
Since you are removing "word"s, you must make sure your keyword contains only "word" characters.
function replaceMatches(str, keyword)
{
  // The keyword must not contain non-"word" characters
  if (!/^[a-zA-Z0-9_-]+$/.test(keyword)) {
    throw new Error("not a word");
  }
  // Customize [a-zA-Z0-9_-] and [^a-zA-Z0-9_-] with your definition of
  // "word" character
  var pattern = new RegExp('([^a-zA-Z0-9_-]|^)' + keyword + '(?![a-zA-Z0-9_-])', 'g');
  // $1 puts back the character that was consumed in front of the keyword
  return str.replace(pattern, '$1');
}
You need to escape meta-characters in the keyword if your definition of "word" character includes regex meta-characters.
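Where look-behind is available (ES2018 and later; this is an assumption about your runtime), the same idea can be written without consuming and restoring the preceding character. The sketch below also escapes the keyword, so the word definition may include regex metacharacters. The names escapeKeyword and removeWord are hypothetical.

```javascript
// Escape regex metacharacters in the keyword.
const escapeKeyword = text => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

function removeWord(str, keyword) {
  // Customize with your own definition of a "word" character.
  const wordChar = '[a-zA-Z0-9_-]';
  // The keyword must be neither preceded nor followed by a "word" character.
  const pattern = new RegExp(
    `(?<!${wordChar})${escapeKeyword(keyword)}(?!${wordChar})`,
    'g'
  );
  return str.replace(pattern, '');
}

console.log(JSON.stringify(removeWord('one one-two one', 'one')));
```

Both assertions are zero-width, so only the keyword itself is removed and the surrounding characters are left alone.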
Use this for your RegExp:
function replaceMatches( str, word ) {
var pattern = new RegExp('(^|[^-])\\b('+word+')\\b([^-]|$)', 'g');
return str.replace(pattern, '$1$3')
}
The (^|[^-]) will match either the start of the string or any character except -. The ([^-]|$) will match either a character other than - or the end of the string.
I'm not an expert on JS pattern functions, but the function should replace all matches.
As for the hyphen in 'one-two': the position between one and - is a word boundary (i.e. \b), and the end of the string is a word boundary if a \w character comes right before it.
But it sounds like you may want 'one' to be preceded by a space or the beginning of the line:
([ ]|^)one\b
In that case, make the replacement capture group 1, thus stripping out only 'one'.
And I'm not sure how that function call works in JS.
Edit: after the new expected output, the regex could be:
([ ]|^)one(?=[ ]|$)
