Problem with extracting multiple segments [duplicate] - javascript

This question already has answers here:
How can I get query string values in JavaScript?
(73 answers)
Closed 3 years ago.
I'm trying to capture the parts below from the attached log:
aff_lsr?tt_adv_id=806&tt_cid=&tt_adv_sub=b3fff3722fc6b52aedde9b86bb22bf23&tt_time=2016-04-05+16%3A08%3A18&
should capture:
● tt_adv_id
● 806
● tt_adv_sub
● b3fff3722fc6b52aedde9b86bb22bf23
● tt_time
● 2016-04-05+16%3A08%3A18
I tried to write a regex that extracts all strings that start with “?”, “&” or “=” and end with “=” or “&”.
This is the regex I tried:
(?=[?/&/=]).*?(=)|(?=[=/&])
It ignores all the parts between “=” and “&”,
so the result I get is:
?tt_adv_id =
&tt_cid=
&tt_adv_sub=
&tt_time =

You can simply use
/[?&=]([^&=]+)/
[?&=] - Match ?, & or =
([^&=]+) - Match anything except & and = one or more times
and then drop the first character of each match:
let str = "aff_lsr?tt_adv_id=806&tt_cid=&tt_adv_sub=b3fff3722fc6b52aedde9b86bb22bf23&tt_time=2016-04-05+16%3A08%3A18&"
let matched = str.match(/[?&=]([^&=]+)/g).map((v) => v.slice(1))
console.log(matched)
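Since the log line is an ordinary query string, the approach from the linked duplicate may also apply here. A minimal sketch, assuming a runtime that provides URLSearchParams (modern browsers and Node.js); the names log, query and pairs are illustrative only. Unlike the regex above, this decodes the values, so + becomes a space and %3A becomes a colon:

```javascript
const log = "aff_lsr?tt_adv_id=806&tt_cid=&tt_adv_sub=b3fff3722fc6b52aedde9b86bb22bf23&tt_time=2016-04-05+16%3A08%3A18&";

// Keep only the part after the first "?"
const query = log.slice(log.indexOf("?") + 1);

// URLSearchParams iterates over [name, value] pairs;
// drop the pairs whose value is empty (tt_cid in this sample)
const pairs = [...new URLSearchParams(query)].filter(([, value]) => value !== "");

console.log(pairs);
```

If the raw, still-encoded values are needed, the regex approach above keeps them untouched.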

Maybe, these simple expressions would extract our desired values:
.*(tt_adv_id)=([^&]*)&(?:tt_cid)=(?:[^&]*)&(tt_adv_sub)=([^&]*)&(tt_time)=([^&]*).*
or
(tt_adv_id)=([^&]*)&(?:tt_cid)=(?:[^&]*)&(tt_adv_sub)=([^&]*)&(tt_time)=([^&]*)
The expression is explained on the top right panel of this demo, if you wish to explore further or modify it, and in this link, you can watch how it would match against some sample inputs step by step, if you like.
const regex = /.*(tt_adv_id)=([^&]*)&(?:tt_cid)=(?:[^&]*)&(tt_adv_sub)=([^&]*)&(tt_time)=([^&]*).*/gm;
const str = `aff_lsr?tt_adv_id=806&tt_cid=&tt_adv_sub=b3fff3722fc6b52aedde9b86bb22bf23&tt_time=2016-04-05+16%3A08%3A18&
`;
let m;
while ((m = regex.exec(str)) !== null) {
  // This is necessary to avoid infinite loops with zero-width matches
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }
  // The result can be accessed through the `m`-variable.
  m.forEach((match, groupIndex) => {
    console.log(`Found match, group ${groupIndex}: ${match}`);
  });
}

Related

Get last 2 or 3 elements from path regex

I currently have a path and I am trying to fetch its last three elements:
Test:
/testing/path/here/src/handlebar/sample/colors.txt
/testing/path/here/src/handlebar/testing/another/colors.txt
Regex:
\/([^/]+\/[^/]+\/[^/]+)\.[^.]+$
Result:
handlebar/sample/colors
testing/another/colors
What I want it to produce:
sample/colors
testing/another/colors
If there are two directories before the item, it should use all three elements; if those three would include the word handlebar, it should use only the last two.
You could just create a group for everything behind handlebar/ like this:
with a named capturing group (subPath group contains wanted value):
/handlebar\/(?<subPath>\S*)\.\S+$/gm
without naming (first group contains wanted value):
/handlebar\/(\S*)\.\S+$/gm
Explanation: this regex matches everything ending in 'handlebar/(any non-whitespace characters, zero or more times).(any non-whitespace characters, one or more times)'. The global and multiline flags let you check multiple paths within one string, separated by line breaks.
As you tagged the question with javascript, here is some example code showing how to retrieve the value of the regex group:
function getSubPath(fullPath = '') {
  const regex = /handlebar\/(?<subPath>\S*)\.\S+$/gm
  const match = regex.exec(fullPath)
  if (match) {
    return match.groups.subPath
  }
  return fullPath // regex.exec did not deliver match
}
getSubPath('/testing/path/here/src/handlebar/sample/colors.txt')
// returns 'sample/colors'
getSubPath('/testing/path/here/src/handlebar/testing/another/colors.txt')
// returns 'testing/another/colors'
Without the named group, just read and return match[1] for the first capturing group; index 0 holds the full match (which would include 'handlebar/' and the file extension).
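The unnamed variant can be sketched as follows. The function name getSubPathUnnamed is hypothetical, chosen only to distinguish it from getSubPath above; the fallback of returning the full path mirrors that function:

```javascript
function getSubPathUnnamed(fullPath = '') {
  // Same pattern as above, but with a plain (numbered) capturing group
  const match = /handlebar\/(\S*)\.\S+$/.exec(fullPath);
  // match[0] is the whole match; match[1] is the first capturing group
  return match ? match[1] : fullPath;
}

console.log(getSubPathUnnamed('/testing/path/here/src/handlebar/sample/colors.txt'));
// prints 'sample/colors'
console.log(getSubPathUnnamed('/testing/path/here/src/handlebar/testing/another/colors.txt'));
// prints 'testing/another/colors'
```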
I hope this gets you what you need.
The function is dynamic: you can pass whatever parameters you require and get the result.
<script>
var res = "/testing/path/here/src/handlebar/sample/colors.txt";
var res1 = "/testing/path/here/src/handlebar/testing/another/colors.txt";
var Result = (val, text) => {
  // Take everything after "<text>/" and cut off the file extension
  var r = val.split(text + '/')[1];
  return r.substr(0, r.lastIndexOf('.'));
};
console.log(Result(res, "handlebar"));
console.log(Result(res1, "handlebar"));
</script>
A javascript solution without regex would look like this:
const getTokenizedPath = path => {
  const pathArray = path.split('/');
  // The last element looks like "colors.txt": split on the dot and keep the first part, dropping the extension
  pathArray[pathArray.length - 1] = pathArray[pathArray.length - 1].split('.')[0];
  // Remove all elements up to and including the 'handlebar' token and join the remaining values with '/'
  return pathArray.slice(pathArray.indexOf('handlebar') + 1).join('/');
}
getTokenizedPath('/testing/path/here/src/handlebar/sample/colors.txt');
--- sample/colors
getTokenizedPath('/testing/path/here/src/handlebar/testing/another/colors.txt');
--- testing/another/colors
I guess,
(?!.*handlebar)/([^/]+/[^/]+/[^/]+)\.[^.]+$|/([^/]+/[^/]+)\.[^.]+$
might work OK.
Demo 1
and if lookarounds would be supported,
(?!.*handlebar)(?<=/)[^/]+/[^/]+/[^/]+(?=\.[^.]+$)|$|(?<=/)([^/]+/[^/]+)(?=\.[^.]+$)
Demo 2
would be an option too.
const regex = /(?!.*handlebar)\/([^\/]+\/[^\/]+\/[^\/]+)\.[^.]+$|\/([^\/]+\/[^\/]+)\.[^.]+$/gm;
const str = `/testing/path/here/src/handlebar/sample/colors.txt
/testing/path/here/src/handlebar/testing/another/colors.txt`;
let m;
while ((m = regex.exec(str)) !== null) {
  // This is necessary to avoid infinite loops with zero-width matches
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }
  // The result can be accessed through the `m`-variable.
  m.forEach((match, groupIndex) => {
    console.log(`Found match, group ${groupIndex}: ${match}`);
  });
}
If you wish to simplify/modify/explore the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.

Need to obtain match between specified delimiters

I'm trying to match specific tags between double block quote delimiters within a sentence :
Look for `foo="x"` ONLY between the specific double block quote delimiters [[foo="x"|bar="y"|baz="z"]]
Using the following regex matches also the foo="x" outside the delimiters :
(?:(foo|bar|baz)="([^"]+)")+
I've tried adding the positive lookbehind (?<=\[\[), but it only returns the first foo="x" within the block quotes and ignores the bar="y" and baz="z" matches.
const regex = /(?:(foo|bar|baz)="([^"]+)")+/gm;
const str = `Look for \`foo="x"\` ONLY between the specific double block quote delimiters [[foo="x"|bar="y"|baz="z"]]`;
let m;
while ((m = regex.exec(str)) !== null) {
  // This is necessary to avoid infinite loops with zero-width matches
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }
  // The result can be accessed through the `m`-variable.
  m.forEach((match, groupIndex) => {
    console.log(`Found match, group ${groupIndex}: ${match}`);
  });
}
If your strings inside [[ and ]] don't have [ and ] a simple
/(foo|bar|baz)="([^"]+)"(?=[^\][]*]])/g
will work for you. The (?=[^\][]*]]) part makes sure that zero or more chars other than [ and ], followed by ]], are immediately to the right of the current location. See the regex demo.
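A short sketch of that single-regex approach applied to the sample sentence from the question, assuming String.prototype.matchAll is available (the variable name pairs is illustrative):

```javascript
const str = 'Look for `foo="x"` ONLY between the specific double block quote delimiters [[foo="x"|bar="y"|baz="z"]]';
const regex = /(foo|bar|baz)="([^"]+)"(?=[^\][]*]])/g;

// Collect [name, value] from the two capturing groups of every match
const pairs = Array.from(str.matchAll(regex), (m) => [m[1], m[2]]);

console.log(pairs);
// prints [ [ 'foo', 'x' ], [ 'bar', 'y' ], [ 'baz', 'z' ] ]
```

The foo="x" inside the backticks is skipped because the next bracket after it is [ rather than ]], so the lookahead fails there.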
The safest solution takes two steps: 1) get the Group 1 value with /\[\[((foo|bar|baz)="([^"]+)"(?:\|(foo|bar|baz)="([^"]+)")*)]]/ (or the simpler, more generic but less precise /\[\[\w+="[^"]+"(?:\|\w+="[^"]+")*]]/g, see demo), and 2) use /(foo|bar|baz)="([^"]+)"/g (or /(\w+)="([^"]+)"/g) to extract the necessary values from Group 1.
const x = '(foo|bar|baz)="([^"]+)"'; // A key-value pattern block
const regex = new RegExp(`\\[\\[(${x}(?:\\|${x})*)]]`, 'g'); // Extracting the whole `[[]]`
const str = `Look for \`foo="x"\` ONLY between the specific double block quote delimiters [[foo="x"|bar="y"|baz="z"]]`;
let m;
while (m = regex.exec(str)) {
  let results = [...m[1].matchAll(/(foo|bar|baz)="([^"]+)"/g)]; // Grabbing individual matches
  console.log(Array.from(results, m => [m[1],m[2]]));
}
The \[\[((foo|bar|baz)="([^"]+)"(?:\|(foo|bar|baz)="([^"]+)")*)]] pattern will match
\[\[ - [[
((foo|bar|baz)="([^"]+)"(?:\|(foo|bar|baz)="([^"]+)")*) - Group 1:
(foo|bar|baz) - foo, bar or baz
= - =
"([^"]+)" - ", 1 or more chars other than " and a "
(?:\|(foo|bar|baz)="([^"]+)")* - 0 or more repetitions of | and the pattern described above
]] - ]] substring.
See the regex demo.
Try a slightly different definition of your requirements:
match name="value", with capturing groups for both name and value,
before the name there should be:
either double opening bracket ([[),
or a vertical bar (|),
after the value (and closing double quote) there should be:
either double closing bracket (]]),
or a vertical bar (|).
Then the regex can be as follows:
(?:\[\[|\|)(foo|ba[rz])="(\w+)"(?=]]|\|)
Details:
(?:\[\[|\|) - the content before (will be a part of the match,
but not a part of any capturing group),
(foo|ba[rz])="(\w+)" - name / value pair (with double quotes),
(?=]]|\|) - the content after (this time expressed as a
positive lookahead).
For a working example see https://regex101.com/r/dj51GS/1
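As a rough check of this definition, here is the same regex run against the sample sentence from the question. This is only a sketch assuming String.prototype.matchAll is available; the variable names are illustrative:

```javascript
const str = 'Look for `foo="x"` ONLY between the specific double block quote delimiters [[foo="x"|bar="y"|baz="z"]]';
const regex = /(?:\[\[|\|)(foo|ba[rz])="(\w+)"(?=]]|\|)/g;

// The leading [[ or | is consumed by each match, while the trailing ]] or |
// is only looked at, so a | can still start the next match
const pairs = Array.from(str.matchAll(regex), (m) => [m[1], m[2]]);

console.log(pairs);
// prints [ [ 'foo', 'x' ], [ 'bar', 'y' ], [ 'baz', 'z' ] ]
```

Because the value is matched with \w+, values containing spaces or punctuation would not match; ([^"]+) from the question would be the looser choice.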

RegEx mismatch in node.js [duplicate]

This question already has answers here:
Why do regex constructors need to be double escaped?
(5 answers)
Closed 3 years ago.
Consider the following regex:
^[^-\s][a-zA-Z\sàèìòùÀÈÌÒÙáéíóúýÁÉÍÓÚÝâêîôûÂÊÎÔÛãñõÃÑÕäëïöüÿÄËÏÖÜŸçÇßØøÅåÆæœ\d!##$\+%&\'*]{1,20}$
I tried it on https://regexr.com/ with Collection '98 as the test string, and it matches.
I then implemented it in Node.js:
const myRegex = '^[^-\s][a-zA-Z\sàèìòùÀÈÌÒÙáéíóúýÁÉÍÓÚÝâêîôûÂÊÎÔÛãñõÃÑÕäëïöüÿÄËÏÖÜŸçÇßØøÅåÆæœ\d!##$\+%&\'*]{1,20}$';
const name = 'Collection \'98';
if (!name.match(myRegex))
  console.log('NOK');
else
  console.log('OK');
However, it always prints NOK.
Why doesn't the validation work in the app?
I'm not sure about your code; however, it seems to me that your expression is correct and that it works.
This snippet shows that it would return a match.
const regex = /[^-\s][a-zA-Z\sàèìòùÀÈÌÒÙáéíóúýÁÉÍÓÚÝâêîôûÂÊÎÔÛãñõÃÑÕäëïöüÿÄËÏÖÜŸçÇßØøÅåÆæœ\d!##$\+%&\'*]{1,20}/gm;
const str = `Collection '98`;
let m;
while ((m = regex.exec(str)) !== null) {
  // This is necessary to avoid infinite loops with zero-width matches
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }
  // The result can be accessed through the `m`-variable.
  m.forEach((match, groupIndex) => {
    console.log(`Found match, group ${groupIndex}: ${match}`);
  });
}
You can test/modify your expressions in this link.
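It appears that you might have forgotten to put your expression between two forward slashes, which you can fix by writing /expression/.
Enclose your regex between slashes (/) instead of quotation marks and it'll work. In a quoted string, escapes such as \s and \d are processed by the string literal first and lose their backslashes, so the regex engine sees a plain s and d: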
const myRegex = /^[^-\s][a-zA-Z\sàèìòùÀÈÌÒÙáéíóúýÁÉÍÓÚÝâêîôûÂÊÎÔÛãñõÃÑÕäëïöüÿÄËÏÖÜŸçÇßØøÅåÆæœ\d!##$\+%&\'*]{1,20}$/;
const name = 'Collection \'98';
if (!name.match(myRegex))
  console.log('NOK');
else
  console.log('OK');
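The effect of the string literal can be seen on a much smaller pattern. The sketch below uses a simplified stand-in (only whitespace and digits), not the asker's full expression, and the variable names are illustrative. It also shows the two usual ways to keep a string-based pattern working: doubling the backslashes, or using String.raw:

```javascript
// The string literal turns \s into s and \d into d before RegExp sees them
const fromString = new RegExp('^[\s\d]+$');
// Doubled backslashes survive the string literal
const doubled = new RegExp('^[\\s\\d]+$');
// String.raw keeps the backslashes exactly as written
const raw = new RegExp(String.raw`^[\s\d]+$`);

console.log(fromString.source); // prints ^[sd]+$
console.log(fromString.test('1 2'), doubled.test('1 2'), raw.test('1 2'));
// prints false true true
```

The same conversion happens when a string is passed to name.match(), since match() builds a RegExp from a non-regex argument.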

Regex optimization and best practice

I need to parse information out from a legacy interface. We do not have the ability to update the legacy message. I'm not very proficient at regular expressions, but I managed to write one that does what I want it to do. I just need peer-review and feedback to make sure it's clean.
The message from the legacy system returns values resembling the example below.
%name0=value
%name1=value
%name2=value
Expression: /\%(.*)\=(.*)/g;
var strBody = body_text.toString();
var myRegexp = /\%(.*)\=(.*)/g;
var match = myRegexp.exec(strBody);
var objPair = {};
while (match != null) {
  if (match[1]) {
    objPair[match[1].toLowerCase()] = match[2];
  }
  match = myRegexp.exec(strBody);
}
This code works, and I can add partial matches in the middle of the name/value pairs without anything breaking. I have to assume that any combination of characters could appear in the "values" match, meaning the value could contain equals and percent signs.
Is this clean enough?
Is there something that could break the expression?
First of all, don't escape characters that don't need escaping: %(.*)=(.*)
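The problem with your expression: an equals sign in the value would break your parser. %name0=val=ue would yield the name name0=val and the value ue instead of the name name0 and the value val=ue, because the first (.*) is greedy and runs up to the last equals sign.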
One possible fix is to make the first repetition lazy by appending a question mark: %(.*?)=(.*)
But this is not optimal due to unneeded backtracking. You can do better by using a negated character class: %([^=]*)=(.*)
And finally, if empty names should not be allowed, replace the first asterisk with a plus: %([^=]+)=(.*)
This is a good resource: Regex Tutorial - Repetition with Star and Plus
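Putting the final expression to work, here is a minimal sketch that builds the same kind of name/value object as the question's loop. The function name parsePairs and the sample text are made up for illustration, and String.prototype.matchAll is assumed to be available:

```javascript
function parsePairs(text) {
  const pairs = {};
  // Group 1: name (anything but "="), group 2: the rest of the line
  for (const [, name, value] of text.matchAll(/%([^=]+)=(.*)/g)) {
    pairs[name.toLowerCase()] = value;
  }
  return pairs;
}

const sample = "%Name0=value\n%name1=val=ue\n%name2=50%";
console.log(parsePairs(sample));
// prints { name0: 'value', name1: 'val=ue', name2: '50%' }
```

One caveat: [^=] also matches line breaks, so a line that starts with % but has no equals sign would merge with the following line. If that can occur in the legacy messages, [^=\n] would keep the name on a single line.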
Your expression is fine, and wrapping the name and the value in two capturing groups is a simple way to get your desired variables and values.
You likely do not need to escape some of the characters; it would still work without the escapes.
You can use this tool to test or modify your expressions if you wish:
%(.+)=(.+)
Since your data is pretty structured, you can also do so with string split and get the same desired outputs, if you want.
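A sketch of the split-based alternative, under the assumption that every pair sits on its own line and starts with %. The function name parsePairsWithSplit and the sample text are hypothetical:

```javascript
function parsePairsWithSplit(text) {
  const pairs = {};
  for (const line of text.split(/\r?\n/)) {
    // Skip lines that are not of the form %name=value
    const eq = line.indexOf('=');
    if (!line.startsWith('%') || eq === -1) continue;
    // Split at the first "=" only, so the value may contain "=" and "%"
    pairs[line.slice(1, eq).toLowerCase()] = line.slice(eq + 1);
  }
  return pairs;
}

console.log(parsePairsWithSplit("%name0=value\n%name1=val=ue\nnoise\n%name2=50%"));
// prints { name0: 'value', name1: 'val=ue', name2: '50%' }
```

Unlike the regex, this ignores a % that appears in the middle of a line, which may or may not be what the legacy format needs.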
JavaScript Test
const regex = /%(.+)=(.+)/gm;
const str = `%name0=value
%name1=value
%name2=value`;
let m;
while ((m = regex.exec(str)) !== null) {
  // This is necessary to avoid infinite loops with zero-width matches
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }
  // The result can be accessed through the `m`-variable.
  m.forEach((match, groupIndex) => {
    console.log(`Found match, group ${groupIndex}: ${match}`);
  });
}
Performance Test
This JavaScript snippet shows the performance of that expression using a simple 1-million times for loop.
const repeat = 1000000;
const start = Date.now();
for (var i = repeat; i > 0; i--) {
  const string = '%name0=value';
  const regex = /(%(.+)=(.+))/gm;
  var match = string.replace(regex, "\nGroup #1: $1 \n Group #2: $2 \n Group #3: $3 \n");
}
const end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match 💚💚💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");

Best way to get the word right before a certain word in javascript

I have the following string
this is the string and THIS is the word I want
I've tried using a regex for this:
var to_search = "is"
var regex = "/\S+(?="+to_search+")/g";
var matches = string.match(regex);
I wanted matches to contain "THIS" (the word that comes right before the second is), but it does not seem to be working.
Any idea? Thanks
regex101.com is a really great site to test your regex and it even generates the code for you.
const regex = /\bis.*(this)/gi;
const str = `this is the string and THIS is the word I want`;
let m;
while ((m = regex.exec(str)) !== null) {
  // This is necessary to avoid infinite loops with zero-width matches
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }
  // The result can be accessed through the `m`-variable.
  m.forEach((match, groupIndex) => {
    console.log(`Found match, group ${groupIndex}: ${match}`);
  });
}
First, you have to double the backslashes when building a regex from a string.
Second, you forgot the whitespace in your pattern:
var regex = new RegExp("\\S+\\s+(?="+to_search+")", "g");
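For completeness, a small sketch that runs this corrected pattern against the string from the question. The trimming step is an addition for illustration: each match includes the whitespace that precedes to_search, so it is trimmed off here:

```javascript
const string = "this is the string and THIS is the word I want";
const to_search = "is";
const regex = new RegExp("\\S+\\s+(?=" + to_search + ")", "g");

// match() returns null when nothing matches, hence the fallback to []
const matches = (string.match(regex) || []).map((m) => m.trim());

console.log(matches);
// prints [ 'this', 'THIS' ]
```

The lookahead also accepts longer words that merely start with is (such as island); appending \\b after to_search in the pattern would restrict it to the whole word, if that matters.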
