Match only # and not ## without negative lookbehind - javascript

Using JavaScript, I need a regex that matches any instance of #{this-format} in any string. My original regex was the following:
#{[a-z-]*}
However, I also need a way to "escape" those instances. I want it so that if you add an extra #, the match gets escaped, like ##{this}.
I originally used a negative lookbehind:
(?<!#)#{[a-z-]*}
And that would work just fine, except... lookbehinds are an ECMAScript 2018 feature, currently only supported by Chrome.
I read some people suggesting the usage of a negated character set. So my little regex became this:
(?:^|[^#])#{[a-z-]*}
...which would have worked just as well, except it doesn't work if you put two of these together: #{foo}#{bar}
So, does anyone know how I can achieve this? Remember that these conditions need to be met:
Find #{this} anywhere in a string
Be able to escape like ##{this}
Be able to put multiple adjacent, like #{these}#{two}
Lookbehinds must not be used

If you include ## in your regex pattern as an alternate match option, it will consume the ## instead of allowing a match on the subsequent bracketed entity. Like this:
##|(#{[a-z-]*})
You can then check the captured group on each match in JavaScript: it is undefined whenever the ## alternative matched. The following code demonstrates this.
var targetText = '#{foo} in a #{bar} for a ##{foo} and #{foo}#{bar} things.';
var reg = /##|(#{[a-z-]*})/g;
var result;
while ((result = reg.exec(targetText)) !== null) {
    if (result[1] !== undefined) {
        alert(result[1]);
    }
}
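The same alternation also works with String.prototype.replace when the goal is to substitute the placeholders rather than list them. This is only a sketch: the helper name fillTemplate, the values object, and the choice to collapse ## into a single # are assumptions, not part of the original answer.

```javascript
// Hypothetical helper: substitutes #{name} placeholders and leaves ##{name} escaped.
function fillTemplate(template, values) {
  return template.replace(/##|#\{([a-z-]*)\}/g, function (whole, name) {
    // `name` is undefined when the "##" alternative matched.
    if (name === undefined) {
      return "#"; // assumption: an escaped "##" collapses to a literal "#"
    }
    // Unknown names are left untouched rather than replaced with "undefined".
    return Object.prototype.hasOwnProperty.call(values, name) ? values[name] : whole;
  });
}

console.log(fillTemplate("#{foo} ##{foo} #{foo}#{bar}", { foo: "1", bar: "2" }));
// 1 #{foo} 12
```

Because ## is consumed as its own match, the {foo} that follows it is never seen with a leading #, which is what keeps it escaped.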

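You could use (?:^|[^#]) to match either the start of the string or a preceding character that isn't #, and capture the following #{<sometext>} in a group. Wrapping the whole thing in a lookahead and then consuming a single character keeps adjacent instances like #{these}#{two} from using up each other's leading character. Since you don't want the initial (possible) [^#] to be in the result, you'll have to iterate over the matches manually and extract the group that contains the substring you want. For example: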
function test(str) {
    const re = /(?=(?:^|[^#])(#{[a-z-]*}))./g;
    let match;
    const matches = [];
    while ((match = re.exec(str)) !== null) {
        matches.push(match[1]); // extract the captured group
    }
    return matches;
}
console.log(test('##{this}')); // []
console.log(test('#{these}#{two}')); // ['#{these}', '#{two}']

Related

RegEx for detecting a string and a path in one go

Here is an example of the regex I need.
I have many of these lines in a file
build test/testfoo/CMakeFiles/testfoo2.dir/testfoo2.cpp.o: CXX_COMPILER__testfoo2_Debug /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp || cmake_object_order_depends_target_testfoo2
I need to detect the string between CXX_COMPILER__ and _Debug, which here is testfoo2.
At the same time, I need to also detect the entire file path /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp, which comes always after the first match.
I could not figure out a regex for this. So far I have .*CXX_COMPILER__(.\w+)_\w+|(\/[a-zA-Z_0-9-]+)+\.\w+ and I am using it in typescript like so:
const fileAndTargetRegExp = new RegExp('.*CXX_COMPILER__(.\w+)_\w+|(\/[a-zA-Z_0-9-]+)+\.\w+', 'gm');
let match;
while (match = fileAndTargetRegExp.exec(fileContents)) {
//do something
}
But I get no matches. Is there an easy way to do this?
Will it always have the || <stuff here> at the end? If so, this regex based on the one you provided should work:
/.*CXX_COMPILER__(\w+)_.+?((?:\/.+)+) \|\|.*/g
As the regex101 breakdown shows, the first capturing group should contain the string between CXX_COMPILER__ and _Debug, while the second should contain the path, using the space and pipes to detect where the latter ends.
let line = 'build test/testfoo/CMakeFiles/testfoo2.dir/testfoo2.cpp.o: CXX_COMPILER__testfoo2_Debug /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp || cmake_object_order_depends_target_testfoo2';
const matches = line.match(/.*CXX_COMPILER__(\w+)_.+?((?:\/.+)+) \|\|.*/).slice(1); //slice(1) just to not include the first complete match returned by match!
for (let match of matches) {
console.log(match);
}
If the pipes won't always be there, then this version should work instead (regex101):
.*CXX_COMPILER__(\w+)_.+?((?:\/(?:\w|\.|-)+)+).*
But it requires you to add all of the valid path characters individually every time you realize a new one might be there, and you'll need to make sure the paths don't have spaces because adding space to the regex would make it detect the stuff after the path too.
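If the file holds many such lines, one global regex with an exec loop can collect both values per line. The sketch below makes several assumptions: the two sample lines are made up for illustration, the configuration suffix is always _Debug, and paths never contain whitespace (so \S+ is enough to capture them).

```javascript
// Made-up sample input standing in for the real file contents.
const fileContents = [
  "build a/testfoo1.cpp.o: CXX_COMPILER__testfoo1_Debug /src/testfoo1.cpp || dep1",
  "build a/testfoo2.cpp.o: CXX_COMPILER__testfoo2_Debug /src/testfoo2.cpp || dep2"
].join("\n");

// Group 1: target name; group 2: the path (assumed to contain no whitespace).
const targetAndPath = /CXX_COMPILER__(\w+)_Debug\s+(\S+)/g;

const found = [];
let hit;
while ((hit = targetAndPath.exec(fileContents)) !== null) {
  found.push({ target: hit[1], path: hit[2] });
}

console.log(found);
```

Anchoring on the literal _Debug suffix stops \w+ from swallowing the underscore, since \w also matches _.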
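Looks good, but you need delimiters. In a quoted string, an escape such as \w loses its backslash before the RegExp constructor ever sees it, so either double the backslashes or use a regex literal: add "/" before and after your regex, with no quotation marks.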
let fileContents = 'build test/testfoo/CMakeFiles/testfoo2.dir/testfoo2.cpp.o: CXX_COMPILER__testfoo2_Debug /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp || cmake_object_order_depends_target_testfoo2';
const fileAndTargetRegExp = new RegExp(/.*CXX_COMPILER__(.\w+)_\w+|(\/[a-zA-Z_0-9-]+)+\.\w+/, 'gm');
let match;
while (match = fileAndTargetRegExp.exec(fileContents)) {
console.log(match);
}
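The difference between a quoted pattern and a regex literal can be checked directly by inspecting the source property. This is a small sketch using a shortened version of the pattern; the variable names are only for illustration.

```javascript
// In a normal string literal "\w" is an unknown escape, so it collapses to "w".
const fromString = new RegExp('CXX_COMPILER__(\w+)_Debug');
// Doubling the backslash keeps it in the pattern.
const fromEscapedString = new RegExp('CXX_COMPILER__(\\w+)_Debug');
// A regex literal needs no doubling.
const fromLiteral = /CXX_COMPILER__(\w+)_Debug/;

const sample = 'CXX_COMPILER__testfoo2_Debug';

console.log(fromString.source);               // CXX_COMPILER__(w+)_Debug
console.log(fromString.test(sample));         // false
console.log(fromEscapedString.test(sample));  // true
console.log(fromLiteral.exec(sample)[1]);     // testfoo2
```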
Here's my way of doing it with replace:
I need to detect the string between CXX_COMPILER__ and _Debug, which is here testfoo2.
Try to replace all characters of the string with just the first captured group $1 which is between CXX_COMPILER__ and _Debug:
/.*CXX_COMPILER__(\w+)_Debug.*/
^^^^<--testfoo2
I need to also detect the entire file path /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp
The same again, except this time keep only the second captured group, which is everything between our first captured group and the || separator:
/.*CXX_COMPILER__(\w+)_Debug\s+(.*?)\s*\|\|.*/
^^^<-- /home/.../testfoo2.cpp
let line = 'build test/testfoo/CMakeFiles/testfoo2.dir/testfoo2.cpp.o: CXX_COMPILER__testfoo2_Debug /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp || cmake_object_order_depends_target_testfoo2'
console.log(line.replace(/.*CXX_COMPILER__(\w+)_Debug.*/gm,'$1'))
console.log(line.replace(/.*CXX_COMPILER__(\w+)_Debug\s+(.*?)\s*\|\|.*/gm,'$2'))

How to allow only certain words consecutively with Regex in javascript

I'm trying to write a regex that will return true if it matches the format below, otherwise, it should return false. It should only allow words as below:
Positive match (return true)
UA-1234-1,UA-12345-2,UA-34578-2
Negative match (return false or null)
Note: A is missing after U
UA-1234-1,U-12345-2
It should always give me true when the string passed to regex is
UA-1234-1,UA-12345-2,UA-34578-2,...........
Below is what I am trying to do but it is matching only the first element and not returning null.
var pattern = /^UA-[0-9]+(-[0-9]+)?/g;
"UA-1234-1,UA-12345-2,UA-34578-2".match(pattern);
pattern.exec("UA-1234-1,UA-12345-2,UA-34578-2");
Thanks in advance. Help is greatly appreciated.
The pattern you need is enclosed in anchors (^ for start of string and $ for end of string): it matches the initial "block" first and then 0 or more occurrences of a , followed by the block pattern.
It looks like /^BLOCK(?:,BLOCK)*$/. You may allow optional whitespace in between, e.g. /^BLOCK(?:,\s*BLOCK)*$/.
In the end, the pattern looks like ^UA-[0-9]+(?:-[0-9]+)?(?:,UA-[0-9]+(?:-[0-9]+)?)*$. It is best to build it dynamically to keep it readable and easy to maintain:
const block = "UA-[0-9]+(?:-[0-9]+)?";
let rx = new RegExp(`^${block}(?:,${block})*$`); // RegExp("^" + block + "(?:," + block + ")*$") // for non-ES6
let tests = ['UA-1234-1,UA-12345-2,UA-34578-2', 'UA-1234-1,U-12345-2'];
for (var s of tests) {
console.log(s, "=>", rx.test(s));
}
Alternatively, split the string by commas and test each element instead.
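A minimal sketch of the split-and-test idea, assuming the same block format as above. The function name allBlocksValid is made up, and trimming each item (to tolerate spaces after commas) is an extra assumption.

```javascript
// One block, anchored at both ends; no g flag, so test() keeps no state between calls.
const singleBlock = /^UA-[0-9]+(?:-[0-9]+)?$/;

// Hypothetical helper: true only if every comma-separated item is a valid block.
function allBlocksValid(list) {
  return list.split(",").every(function (item) {
    return singleBlock.test(item.trim());
  });
}

console.log(allBlocksValid("UA-1234-1,UA-12345-2,UA-34578-2")); // true
console.log(allBlocksValid("UA-1234-1,U-12345-2"));             // false
```

An empty string splits into a single empty item, so it is rejected as well.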

How to split a string by a character not directly preceded by a character of the same type?

Let's say I have a string: "We.need..to...split.asap". What I would like to do is to split the string by the delimiter ., but I only wish to split by the first . and include any recurring .s in the succeeding token.
Expected output:
["We", "need", ".to", "..split", "asap"]
In other languages, I know that this is possible with a look-behind /(?<!\.)\./ but JavaScript unfortunately does not support such a feature.
I am curious to see your answers to this question. Perhaps there is a clever use of look-aheads that presently evades me?
I was considering reversing the string, then re-reversing the tokens, but that seems like too much work for what I am after... plus controversy: How do you reverse a string in place in JavaScript?
Thanks for the help!
Here's a variation of the answer by guest271314 that handles more than two consecutive delimiters:
var text = "We.need.to...split.asap";
var re = /(\.*[^.]+)\./;
var items = text.split(re).filter(function(val) { return val.length > 0; });
It relies on the fact that if the split expression includes a capture group, the captured items are included in the returned array. These captured items are the only thing we are interested in; the regular tokens between them are empty strings (apart from the unmatched tail, here asap), which we filter out.
EDIT: Unfortunately there's perhaps one slight bug with this. If the text to be split starts with a delimiter, that will be included in the first token. If that's an issue, it can be remedied with:
var re = /(?:^|(\.*[^.]+))\./;
var items = text.split(re).filter(function(val) { return !!val; });
(I think this regex is ugly and would welcome an improvement.)
You can do this without any lookaheads:
var subject = "We.need.to....split.asap";
var regex = /\.?(\.*[^.]+)/g;
var matches, output = [];
while(matches = regex.exec(subject)) {
output.push(matches[1]);
}
document.write(JSON.stringify(output));
It seemed like it'd work in one line, as it did on https://regex101.com/r/cO1dP3/1, but had to be expanded in the code above because the /g option by default prevents capturing groups from returning with .match (i.e. the correct data was in the capturing groups, but we couldn't immediately access them without doing the above).
See: JavaScript Regex Global Match Groups
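On engines that support String.prototype.matchAll (added in ES2020), the capture groups of a global regex can be read without a manual exec loop. This is a sketch using the same regex and subject as above; only the variable names are new.

```javascript
const dottedSubject = "We.need.to....split.asap";

// matchAll yields one match object per hit, each with its capture groups intact.
const dottedTokens = Array.from(
  dottedSubject.matchAll(/\.?(\.*[^.]+)/g),
  function (m) { return m[1]; }
);

console.log(JSON.stringify(dottedTokens));
// ["We","need","to","...split","asap"]
```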
An alternative solution with the original one liner (plus one line) is:
document.write(JSON.stringify(
"We.need.to....split.asap".match(/\.?(\.*[^.]+)/g)
.map(function(s) { return s.replace(/^\./, ''); })
));
Take your pick!
Note: This answer can't handle more than 2 consecutive delimiters, since it was written according to the example in the revision 1 of the question, which was not very clear about such cases.
var text = "We.need.to..split.asap";
// split "." if followed by "."
var res = text.split(/\.(?=\.)/).map(function(val, key) {
// if `val[0]` does not begin with "." split "."
// else split "." if not followed by "."
return val[0] !== "." ? val.split(/\./) : val.split(/\.(?!.*\.)/)
});
// concat arrays `res[0]` , `res[1]`
res = res[0].concat(res[1]);
document.write(JSON.stringify(res));

Change regex to match some words instead of all words containing PRP

This regex matches all characters between whitespace if the word contains PRP.
How can I get it to match all words, or characters in between whitespace, if they contain PRP, but not if they contain me in any case?
So match all words containing PRP, but not containing ME or me.
Here is the regex to match words containing PRP: \S*PRP\S*
You can use negative lookahead for this:
(?:^|\s)((?!\S*?(?:ME|me))\S*?PRP\S*)
PS: Use group #1 for your matched word.
Code:
var re = /(?:^|\s)((?!\S*?(?:ME|me))\S*?PRP\S*)/;
var s = 'word abcPRP def';
var m = s.match(re);
if (m) console.log(m[1]); //=> abcPRP
Instead of using complicated regular expressions which would be confusing for almost anyone who's reading it, why don't you break up your code into two sections, separating the words into an array and filtering out the results with stuff you don't want?
function prpnotme(w) {
    var r = w.match(/\S+/g);
    if (r == null)
        return [];
    var i = 0;
    while (i < r.length) {
        // drop words that lack PRP or that contain "me" in any case
        if (!r[i].includes('PRP') || r[i].toLowerCase().includes('me'))
            r.splice(i, 1);
        else
            i++;
    }
    return r;
}
console.log(prpnotme('whattttttt ok')); // []
console.log(prpnotme('MELOLPRP PRPRP PRPthemeok PRPmhm')); // ['PRPRP', 'PRPmhm']
For a very good reason why this is important, imagine if you ever wanted to add more logic. You're much more likely to make a mistake when modifying a complicated regex to make it even more complicated, whereas this way it's done with simple logic that makes perfect sense when reading each predicate, no matter how much you add on.
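The same two-step idea can be written more compactly with Array.prototype.filter, which puts each predicate on its own line. This is a sketch, not the answer's original code; the function name wordsWithPrpButNotMe is made up.

```javascript
// Hypothetical helper: keeps words that contain "PRP" but not "me" in any letter case.
function wordsWithPrpButNotMe(text) {
  const words = text.match(/\S+/g) || []; // match() returns null when there are no words
  return words.filter(function (word) {
    const hasPrp = word.includes("PRP");
    const hasMe = word.toLowerCase().includes("me");
    return hasPrp && !hasMe;
  });
}

console.log(wordsWithPrpButNotMe("whattttttt ok"));                     // []
console.log(wordsWithPrpButNotMe("MELOLPRP PRPRP PRPthemeok PRPmhm")); // ['PRPRP', 'PRPmhm']
```

Adding another rule later means adding one more named boolean rather than editing a lookahead.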

Convert string to function with parameters

I am struggling with writing a regular expression to turn the string
"fetchSomething('param1','param2','param3')"
into the proper function call. I can do it with some splitting and substrings but would rather do it with a .match using capture groups for efficiency's sake (and my own education).
However when I use
'something("stuff","moreStuff","yetMoreStuff")'.match(/(?:\(|,)("?\w+"?)/g)
I get
["("stuff"", ","moreStuff"", ","yetMoreStuff""]
This is the same result regardless of the ?:, which confuses me since I thought ?: would cause it to ignore the first capture group. Or am I completely misunderstanding capture groups?
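With the g flag active, .match returns only the full matches and drops the capture groups; the text matched by the non-capturing ?: group is still part of each full match. If you're only after the sub-matches, you will need to use .exec and a loop: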
var regex = /(?:\(|,)("?\w+"?)/g;
var s = 'something("stuff","moreStuff","yetMoreStuff")';
var match, matches=[];
while ( (match=regex.exec(s)) !== null ) {
matches.push(match[1]);
}
alert(matches);
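To go from the extracted arguments to an actual call, one option is to look the function name up in an object of allowed functions. This is only a sketch under stated assumptions: callFromString and the handlers object are hypothetical, arguments are limited to \w+ words with optional matching quotes, and every argument is passed as a string.

```javascript
// Hypothetical helper: looks up the named function in `handlers` and calls it.
function callFromString(source, handlers) {
  const call = /^(\w+)\((.*)\)$/.exec(source);
  if (call === null) return undefined;

  const name = call[1];
  // Only call functions that were explicitly registered on `handlers`.
  if (!Object.prototype.hasOwnProperty.call(handlers, name)) return undefined;

  // Group 1: optional opening quote; group 2: the argument; \1 requires the same closing quote.
  const argPattern = /(?:^|,)\s*(["']?)(\w+)\1/g;
  const args = [];
  let m;
  while ((m = argPattern.exec(call[2])) !== null) {
    args.push(m[2]);
  }
  return handlers[name].apply(null, args);
}

// Made-up handler standing in for the real fetchSomething.
const handlers = {
  fetchSomething: function (a, b, c) { return [a, b, c].join("+"); }
};

console.log(callFromString("fetchSomething('param1','param2','param3')", handlers));
// param1+param2+param3
```

Looking the name up in a fixed object avoids eval and limits the string to functions you have chosen to expose.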
