Get last 2 or 3 elements from path regex - javascript

I currently have a path and I am trying to fetch its last 3 segments.
Test:
/testing/path/here/src/handlebar/sample/colors.txt
/testing/path/here/src/handlebar/testing/another/colors.txt
Regex:
\/([^/]+\/[^/]+\/[^/]+)\.[^.]+$
Result:
handlebar/sample/colors
testing/another/colors
What I want it to do:
sample/colors
testing/another/colors
If there are 2 directories and then the item, it should use all 3, and if the path contains the word handlebar, it should only be two.

You could just create a group for everything after handlebar/, like this.
With a named capturing group (the subPath group contains the wanted value):
/handlebar\/(?<subPath>\S*)\.\S+$/gm
Without naming (the first group contains the wanted value):
/handlebar\/(\S*)\.\S+$/gm
Explanation: this regex matches text that ends with 'handlebar/', followed by any non-whitespace characters (zero or more times), a dot, and then non-whitespace characters (one or more times). The global and multiline flags are there in case you want to check multiple paths within one string, separated by line breaks.
As you tagged the question with javascript, here is some example code showing how to retrieve the value of the regex group:
function getSubPath(fullPath = '') {
  const regex = /handlebar\/(?<subPath>\S*)\.\S+$/gm
  const match = regex.exec(fullPath)
  if (match) {
    return match.groups.subPath
  }
  return fullPath // regex.exec did not deliver match
}
getSubPath('/testing/path/here/src/handlebar/sample/colors.txt')
// returns 'sample/colors'
getSubPath('/testing/path/here/src/handlebar/testing/another/colors.txt')
// returns 'testing/another/colors'
Without the named group, just read / return match[1] for the first capturing group; index 0 holds the full match (which would include 'handlebar/' and the file extension).
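If several paths arrive in one string, separated by line breaks (the case the g and m flags are meant for), a sketch along these lines could collect every sub-path at once. The helper name getAllSubPaths is made up for illustration, and String.prototype.matchAll assumes an ES2020+ runtime.

```javascript
// Hypothetical helper: collects the sub-path of every line that contains 'handlebar/'.
function getAllSubPaths(text = '') {
  // Same pattern as above; matchAll requires the g flag,
  // and the m flag lets $ match at the end of each line.
  const regex = /handlebar\/(?<subPath>\S*)\.\S+$/gm;
  return Array.from(text.matchAll(regex), (match) => match.groups.subPath);
}

const paths = [
  '/testing/path/here/src/handlebar/sample/colors.txt',
  '/testing/path/here/src/handlebar/testing/another/colors.txt',
].join('\n');

console.log(getAllSubPaths(paths));
```

Lines without 'handlebar/' are simply skipped, so the result can be shorter than the input.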

I hope this helps.
This version is dynamic: you can pass whatever parameters you need and get the result.
<script>
var res = "/testing/path/here/src/handlebar/sample/colors.txt";
var res1 = "/testing/path/here/src/handlebar/testing/another/colors.txt";
// Return the part of `val` after `text + '/'`, without the file extension
const Result = (val, text) => {
  var r = val.split(text + '/')[1];
  return r.substring(0, r.lastIndexOf('.'));
};
console.log(Result(res, "handlebar")); // sample/colors
console.log(Result(res1, "handlebar")); // testing/another/colors
</script>

A javascript solution without regex would look like this:
const getTokenizedPath = path => {
  const pathArray = path.split('/');
  // last element of array looks like "colors.txt" - split by dot and read the first value, removing the extension
  pathArray[pathArray.length - 1] = pathArray[pathArray.length - 1].split('.')[0];
  // Remove all elements up to and including the 'handlebar' token and join the remaining values together by '/'.
  return pathArray.slice(pathArray.indexOf('handlebar') + 1).join('/');
}
getTokenizedPath('/testing/path/here/src/handlebar/sample/colors.txt');
// returns 'sample/colors'
getTokenizedPath('/testing/path/here/src/handlebar/testing/another/colors.txt');
// returns 'testing/another/colors'
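The question also mentions paths that do not contain handlebar. One possible reading, sketched here as an assumption rather than a confirmed requirement, is to return everything after handlebar when it is present and the last three segments otherwise. The name getPathTail and its token and fallbackCount parameters are hypothetical.

```javascript
// Hypothetical helper; the fallback behaviour is an assumed reading of the question.
const getPathTail = (path, token = 'handlebar', fallbackCount = 3) => {
  // Drop the empty segment produced by the leading '/'.
  const segments = path.split('/').filter(Boolean);
  if (segments.length === 0) return '';

  // Strip the extension from the last segment (text after the final dot).
  const last = segments.length - 1;
  const dot = segments[last].lastIndexOf('.');
  if (dot > 0) segments[last] = segments[last].slice(0, dot);

  const tokenIndex = segments.indexOf(token);
  const tail = tokenIndex >= 0
    ? segments.slice(tokenIndex + 1)   // everything after the token
    : segments.slice(-fallbackCount);  // assumption: last N segments otherwise
  return tail.join('/');
};

console.log(getPathTail('/testing/path/here/src/handlebar/sample/colors.txt'));
console.log(getPathTail('/some/other/deep/path/colors.txt'));
```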

I guess,
(?!.*handlebar)/([^/]+/[^/]+/[^/]+)\.[^.]+$|/([^/]+/[^/]+)\.[^.]+$
might work OK.
Demo 1
and if lookarounds would be supported,
(?!.*handlebar)(?<=/)[^/]+/[^/]+/[^/]+(?=\.[^.]+$)|$|(?<=/)([^/]+/[^/]+)(?=\.[^.]+$)
Demo 2
would be an option too.
const regex = /(?!.*handlebar)\/([^\/]+\/[^\/]+\/[^\/]+)\.[^.]+$|\/([^\/]+\/[^\/]+)\.[^.]+$/gm;
const str = `/testing/path/here/src/handlebar/sample/colors.txt
/testing/path/here/src/handlebar/testing/another/colors.txt`;
let m;
while ((m = regex.exec(str)) !== null) {
  // This is necessary to avoid infinite loops with zero-width matches
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }
  // The result can be accessed through the `m`-variable.
  m.forEach((match, groupIndex) => {
    console.log(`Found match, group ${groupIndex}: ${match}`);
  });
}
If you wish to simplify/modify/explore the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.

Related

Ensure that regex moves to the second OR element only if the first one doesn't exist

I'm trying to match a certain word in a string and, only if it doesn't exist, match another one using the OR | operator, but the match ignores that order. How can I ensure that this behavior works?
const str = 'Soraka is an ambulance 911'
const regex = RegExp('('+'911'+'|'+'soraka'+')','i')
console.log(str.match(regex)[0]) // should get 911 instead
911 occurs late in the string, whereas Soraka occurs earlier, and the regex engine iterates character-by-character, so Soraka gets matched first, even though it's on the right-hand side of the alternation.
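A small sketch of that leftmost-match behavior, using the string from the question. The variable names are made up for illustration; the second part shows one simple way to give 911 real priority by testing it on its own first.

```javascript
const str = 'Soraka is an ambulance 911';

// 911 is listed first in the alternation, yet the match starts at index 0:
// the engine tries every alternative at position 0 before moving on to position 1.
const leftmost = str.match(/911|soraka/i);
console.log(leftmost[0], leftmost.index); // Soraka 0

// Checking the preferred pattern separately gives it priority over the other one.
const preferred = str.match(/911/) || str.match(/soraka/i);
console.log(preferred[0]); // 911
```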
One option would be to match Soraka or 911 in captured lookaheads instead, and then with the regex match object, alternate between the two groups to get the one which is not undefined:
const check = (str) => {
  const regex = /^(?=.*(911)|.*(Soraka))/;
  const match = str.match(regex);
  console.log(match[1] || match[2]);
};
check('Soraka is an ambulance 911');
check('foo 911');
check('foo Soraka');
You can use includes and find.
Pass the strings in priority order; as soon as find reaches one that occurs in the original string, it returns that string:
const str = 'Soraka is an ambulance 911'
const findStr = (...arg) => {
  return [...arg].find(toCheck => str.includes(toCheck))
}
console.log(findStr("911", "Soraka"))
You can extend findStr if you want the match to be case-insensitive, something like this:
const str = 'Soraka is an ambulance 911'
const findStr = (...arg) => {
  return [...arg].find(toCheck => str.toLowerCase().includes(toCheck.toLowerCase()))
}
console.log(findStr("Soraka", "911"))
If you want to match whole words rather than partial words, you can build a dynamic regex and use it to search for the value:
const str = '911234 Soraka is an ambulance 911'
const findStr = (...arg) => {
  return [...arg].find(toCheck => {
    let regex = new RegExp(`\\b${toCheck}\\b`, 'i')
    return regex.test(str)
  })
}
console.log(findStr("911", "Soraka"))
Just use a greedy dot before a capturing group that matches 911 or Soraka:
/.*(911)|(Soraka)/
See the regex demo
The .* (or, if there are line breaks, use /.*(911)|(Soraka)/s in Chrome/Node, or /[^]*(911)|(Soraka)/ to support legacy ECMAScript versions) will ensure the regex index advances to the rightmost position when matching 911 or Soraka.
JS demo (borrowed from @CertainPerformance's answer):
const check = (str) => {
  const regex = /.*(911)|(Soraka)/;
  const match = str.match(regex) || ["","NO MATCH","NO MATCH"];
  console.log(match[1] || match[2]);
};
check('Soraka is an ambulance 911');
check('Ambulance 911, Soraka');
check('foo 911');
check('foo Soraka');
check('foo oops!');

Problem with extracting multiple segments [duplicate]

This question already has answers here:
How can I get query string values in JavaScript?
(73 answers)
Closed 3 years ago.
I'm trying to capture the parts below from the attached log:
aff_lsr?tt_adv_id=806&tt_cid=&tt_adv_sub=b3fff3722fc6b52aedde9b86bb22bf23&tt_time=2016-04-05+16%3A08%3A18&
should capture:
● tt_adv_id
● 806
● tt_adv_sub
● b3fff3722fc6b52aedde9b86bb22bf23
● tt_time
● 2016-04-05+16%3A08%3A18
I have tried to create a regex to extract all strings which start with either “?” “&” or “=” and end with either “=” or “&”
this is the regex I have tried:
(?=[?/&/=]).*?(=)|(?=[=/&])
It ignores all parts between “=” and “&”,
so the result I get is:
?tt_adv_id =
&tt_cid=
&tt_adv_sub=
&tt_time =
You can simply use
/[?&=]([^&=]+)/
[?&=] - Match ?, & or =
([^&=]+) - Match anything except & and = one or more times
and then drop the first character of each match:
let str = "aff_lsr?tt_adv_id=806&tt_cid=&tt_adv_sub=b3fff3722fc6b52aedde9b86bb22bf23&tt_time=2016-04-05+16%3A08%3A18&"
let matched = str.match(/[?&=]([^&=]+)/g).map((v) => v.slice(1))
console.log(matched)
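Since the input is a URL query string, the built-in URLSearchParams class could be used instead of a regex. This is only a sketch, assuming a runtime that provides URLSearchParams (modern browsers, Node.js). Note that it decodes values, so + becomes a space and %3A becomes a colon. Skipping empty values is an assumption based on the desired output listed in the question.

```javascript
const input = 'aff_lsr?tt_adv_id=806&tt_cid=&tt_adv_sub=b3fff3722fc6b52aedde9b86bb22bf23&tt_time=2016-04-05+16%3A08%3A18&';

// Everything after the first '?' is the query string.
const query = input.slice(input.indexOf('?') + 1);

const pairs = [];
for (const [name, value] of new URLSearchParams(query)) {
  // Assumption: parameters with an empty value (tt_cid here) are not wanted.
  if (value !== '') pairs.push([name, value]);
}

console.log(pairs);
```

If the raw, still-encoded values are needed, the regex approach above keeps them untouched.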
Maybe, these simple expressions would extract our desired values:
.*(tt_adv_id)=([^&]*)&(?:tt_cid)=(?:[^&]*)&(tt_adv_sub)=([^&]*)&(tt_time)=([^&]*).*
or
(tt_adv_id)=([^&]*)&(?:tt_cid)=(?:[^&]*)&(tt_adv_sub)=([^&]*)&(tt_time)=([^&]*)
The expression is explained on the top right panel of this demo, if you wish to explore further or modify it, and in this link, you can watch how it would match against some sample inputs step by step, if you like.
const regex = /.*(tt_adv_id)=([^&]*)&(?:tt_cid)=(?:[^&]*)&(tt_adv_sub)=([^&]*)&(tt_time)=([^&]*).*/gm;
const str = `aff_lsr?tt_adv_id=806&tt_cid=&tt_adv_sub=b3fff3722fc6b52aedde9b86bb22bf23&tt_time=2016-04-05+16%3A08%3A18&
`;
let m;
while ((m = regex.exec(str)) !== null) {
  // This is necessary to avoid infinite loops with zero-width matches
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }
  // The result can be accessed through the `m`-variable.
  m.forEach((match, groupIndex) => {
    console.log(`Found match, group ${groupIndex}: ${match}`);
  });
}

Need to obtain match between specified delimiters

I'm trying to match specific tags between double block quote delimiters within a sentence :
Look for `foo="x"` ONLY between the specific double block quote delimiters [[foo="x"|bar="y"|baz="z"]]
Using the following regex matches also the foo="x" outside the delimiters :
(?:(foo|bar|baz)="([^"]+)")+
I've tried adding the positive lookbehind (?<=\[\[), but it only returns the first foo="x" within the block quotes and ignores the bar="y" and baz="z" matches.
const regex = /(?:(foo|bar|baz)="([^"]+)")+/gm;
const str = `Look for \`foo="x"\` ONLY between the specific double block quote delimiters [[foo="x"|bar="y"|baz="z"]]`;
let m;
while ((m = regex.exec(str)) !== null) {
  // This is necessary to avoid infinite loops with zero-width matches
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }
  // The result can be accessed through the `m`-variable.
  m.forEach((match, groupIndex) => {
    console.log(`Found match, group ${groupIndex}: ${match}`);
  });
}
If your strings inside [[ and ]] don't contain [ or ], a simple
/(foo|bar|baz)="([^"]+)"(?=[^\][]*]])/g
will work for you. The (?=[^\][]*]]) part will make sure there are 0 or more chars other than [ and ] and then ] are immediately to the right of the current location. See the regex demo.
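A minimal sketch of the lookahead pattern applied to the sentence from the question, assuming a runtime with String.prototype.matchAll (ES2020+). The variable names are made up for illustration.

```javascript
const input = 'Look for `foo="x"` ONLY between the specific double block quote delimiters [[foo="x"|bar="y"|baz="z"]]';

// The lookahead only succeeds when ]] can be reached without crossing a [ or ].
const pattern = /(foo|bar|baz)="([^"]+)"(?=[^\][]*]])/g;

// Keep only the captured name and value of each match.
const found = Array.from(input.matchAll(pattern), (m) => [m[1], m[2]]);

console.log(found);
```

The foo="x" in backticks is skipped because a [ appears before any ]] to its right.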
The safest solution includes two steps: 1) get the Group 1 value with /\[\[((foo|bar|baz)="([^"]+)"(?:\|(foo|bar|baz)="([^"]+)")*)]]/ (or a simpler but less precise but more generic /\[\[\w+="[^"]+"(?:\|\w+="[^"]+")*]]/g, see demo), and 2) use /(foo|bar|baz)="([^"]+)"/g (or /(\w+)="([^"]+)"/g) to extract the necessary values from Group 1.
const x = '(foo|bar|baz)="([^"]+)"'; // A key-value pattern block
const regex = new RegExp(`\\[\\[(${x}(?:\\|${x})*)]]`, 'g'); // Extracting the whole `[[]]`
const str = `Look for \`foo="x"\` ONLY between the specific double block quote delimiters [[foo="x"|bar="y"|baz="z"]]`;
let m;
while (m = regex.exec(str)) {
  let results = [...m[1].matchAll(/(foo|bar|baz)="([^"]+)"/g)]; // Grabbing individual matches
  console.log(Array.from(results, m => [m[1],m[2]]));
}
The \[\[((foo|bar|baz)="([^"]+)"(?:\|(foo|bar|baz)="([^"]+)")*)]] pattern will match
\[\[ - [[
((foo|bar|baz)="([^"]+)"(?:\|(foo|bar|baz)="([^"]+)")*) - Group 1:
(foo|bar|baz) - foo, bar or baz
= - =
"([^"]+)" - ", 1 or more chars other than " and a "
(?:\|(foo|bar|baz)="([^"]+)")* - 0 or more repetitions of | and the pattern described above
]] - ]] substring.
See the regex demo.
Try a slightly different definition of your requirements:
match name="value", with capturing groups for both name and value,
before the name there should be:
either double opening bracket ([[),
or a vertical bar (|),
after the value (and closing double quote) there should be:
either double closing bracket (]]),
or a vertical bar (|).
Then the regex can be as follows:
(?:\[\[|\|)(foo|ba[rz])="(\w+)"(?=]]|\|)
Details:
(?:\[\[|\|) - the content before (will be a part of the match, but not a part of any capturing group),
(foo|ba[rz])="(\w+)" - name / value pair (with double quotes),
(?=]]|\|) - the content after (this time expressed as a positive lookahead).
For a working example see https://regex101.com/r/dj51GS/1

Regex optimization and best practice

I need to parse information out from a legacy interface. We do not have the ability to update the legacy message. I'm not very proficient at regular expressions, but I managed to write one that does what I want it to do. I just need peer-review and feedback to make sure it's clean.
The message from the legacy system returns values resembling the example below.
%name0=value
%name1=value
%name2=value
Expression: /\%(.*)\=(.*)/g;
var strBody = body_text.toString();
var myRegexp = /\%(.*)\=(.*)/g;
var match = myRegexp.exec(strBody);
var objPair = {};
while (match != null) {
  if (match[1]) {
    objPair[match[1].toLowerCase()] = match[2];
  }
  match = myRegexp.exec(strBody);
}
This code works, and I can add partial matches in the middle of the name/values without anything breaking. I have to assume that any combination of characters could appear in the "values" match, meaning it could have equal and percent signs within the message.
Is this clean enough?
Is there something that could break the expression?
First of all, don't escape characters that don't need escaping: %(.*)=(.*)
The problem with your expression: an equals sign in the value would break your parser. Because the first .* is greedy, %name0=val=ue would be split into the name name0=val and the value ue, instead of the name name0 and the value val=ue.
One possible fix is to make the first repetition lazy by appending a question mark: %(.*?)=(.*)
But this is not optimal due to unneeded backtracking. You can do better by using a negated character class: %([^=]*)=(.*)
And finally, if empty names should not be allowed, replace the first asterisk with a plus: %([^=]+)=(.*)
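As a rough sketch, the final pattern could be dropped into the parsing loop like this. The function name parsePairs and the sample input are made up for illustration, and matchAll assumes an ES2020+ runtime.

```javascript
// Hypothetical helper wrapping the final pattern from the steps above.
function parsePairs(text) {
  const result = {};
  // [^=]+ stops the name at the first '='; (.*) takes the rest of the line as the value.
  for (const [, name, value] of text.matchAll(/%([^=]+)=(.*)/g)) {
    result[name.toLowerCase()] = value;
  }
  return result;
}

// Values containing '=' and '%' stay intact.
const sample = '%name0=val=ue\n%Name1=50%\n%name2=value';
console.log(parsePairs(sample));
```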
This is a good resource: Regex Tutorial - Repetition with Star and Plus
Your expression is fine, and wrapping it in two capturing groups is a simple way to get your desired variables and values.
You likely do not need to escape some of the characters, and it would still work.
You can use this tool and test/edit/modify/change your expressions if you wish:
%(.+)=(.+)
Since your data is pretty structured, you can also do so with string split and get the same desired outputs, if you want.
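A minimal sketch of that split-based alternative, assuming each pair sits on its own line and the value never contains a line break. The function name parseWithSplit and the sample input are hypothetical.

```javascript
// Hypothetical helper: parses '%name=value' lines without a regex for the pairs.
function parseWithSplit(text) {
  const result = {};
  for (const line of text.split(/\r?\n/)) {
    const eq = line.indexOf('='); // the first '=' separates name from value
    // Skip lines that lack the leading '%', a non-empty name, or an '='.
    if (!line.startsWith('%') || eq < 2) continue;
    result[line.slice(1, eq).toLowerCase()] = line.slice(eq + 1);
  }
  return result;
}

console.log(parseWithSplit('%name0=value\n%name1=a=b\nnoise\n%name2='));
```

Splitting at the first = only keeps any later = characters inside the value.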
JavaScript Test
const regex = /%(.+)=(.+)/gm;
const str = `%name0=value
%name1=value
%name2=value`;
let m;
while ((m = regex.exec(str)) !== null) {
  // This is necessary to avoid infinite loops with zero-width matches
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }
  // The result can be accessed through the `m`-variable.
  m.forEach((match, groupIndex) => {
    console.log(`Found match, group ${groupIndex}: ${match}`);
  });
}
Performance Test
This JavaScript snippet measures the runtime of that expression using a simple for loop that runs 1 million times.
const repeat = 1000000;
const start = Date.now();
for (var i = repeat; i > 0; i--) { // runs exactly `repeat` times
  const string = '%name0=value';
  const regex = /(%(.+)=(.+))/gm;
  var match = string.replace(regex, "\nGroup #1: $1 \n Group #2: $2 \n Group #3: $3 \n");
}
const end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match 💚💚💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");

Counting all the occurrences of a substring in a string using regular expression

I've seen many examples of this, but they didn't help. I have the following string:
var str = 'asfasdfasda'
and I want to extract the following
asfa asfasdfa asdfa asdfasda asda
i.e. all substrings starting with 'a' and ending with 'a'.
here is my regular expression
/a+[a-z]*a+/g
but this always returns only one match:
[ 'asfasdfasda' ]
Can someone point out the mistake in my implementation?
Thanks.
Edit Corrected no of substrings needed. Please note that overlapping and duplicate substring are required as well.
To capture overlapping matches you will need a lookahead regex, and then grab captured groups #1 and #2:
/(?=(a.*?a))(?=(a.*a))/gi
RegEx Demo
Explanation:
(?=...) is called a lookahead which is a zero-width assertion like anchors or word boundary. It just looks ahead but doesn't move the regex pointer ahead thus giving us the ability to grab overlapping matches in groups.
See more on look arounds
Code:
var re = /(?=(a.*?a))(?=(a.*a))/gi;
var str = 'asfasdfasda';
var m;
var result = {};
while ((m = re.exec(str)) !== null) {
  if (m.index === re.lastIndex)
    re.lastIndex++;
  result[m[1]]=1;
  result[m[2]]=1;
}
console.log(Object.keys(result));
//=> ["asfa", "asfasdfasda", "asdfa", "asdfasda", "asda"]
The parser doesn't go back to a previous position in the input to match the starting a again.
var str = 'asfaasdfaasda'; // you need to have extra 'a' to mark the start of next string
var substrs = str.match(/a[b-z]*a/g); // notice the regular expression is changed.
alert(substrs)
You can count it this way:
var str = "asfasdfasda";
var regex = /a+[a-z]*a+/g, result, indices = [];
while ((result = regex.exec(str))) {
  console.log(result.index); // you can instead count the values here.
}
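If every overlapping substring is wanted, a plain nested loop over the positions of 'a' is another option that avoids regex altogether. This is a sketch under the assumption that the delimiter is a single character; the helper name substringsBetween is made up. Note that it returns every start/end pair, which for this input is six substrings, one more than the question lists.

```javascript
// Hypothetical helper: every substring that starts and ends with the character `ch`.
function substringsBetween(str, ch) {
  // Record every index where `ch` occurs.
  const positions = [];
  for (let i = 0; i < str.length; i++) {
    if (str[i] === ch) positions.push(i);
  }
  // Pair each occurrence with every later occurrence.
  const result = [];
  for (let s = 0; s < positions.length; s++) {
    for (let e = s + 1; e < positions.length; e++) {
      result.push(str.slice(positions[s], positions[e] + 1));
    }
  }
  return result;
}

const all = substringsBetween('asfasdfasda', 'a');
console.log(all, all.length);
```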
