Finding multiple groups in one string - javascript

Consider the following string: it is a list of HTML a elements separated by commas. How can I get a list of {href, title} for the elements that sit between 'start' and 'end'?
not thisstartfoo, barendnot this
The following regex gives only the last iteration of a.
/start((?:<a href="(?<href>.*?)" title="(?<title>.*?)">.*?<\/a>(?:, )?)+)end/g
How can I get the whole list?

This should give you what you need.
https://regex101.com/r/isYIeR/1
/(?:start)*(?:<a href=(?<href>.*?)\s+title=(?<title>.*?)>.*?<\/a>)+(?:,|end)
UPDATE
This does not meet the requirement.
The Returned Value for a Given Group is the Last One Captured
I do not think this can be done in one regex match. Here is a JavaScript solution that uses two regex matches to get a list of {href, title}:
var sample='startfoo, bar,barendstart<img> something end\n' +
'beginfoo, bar,barend\n'+
'startfoo again, bar again,bar2 againend';
var reg = /start((?:\s*<a href=.*?\s+title=.*?>.*?<\/a>,?)+)end/gi;
var regex2 = /href=(?<href>.*?)\s+title=(?<title>.*?)>/gi;
var step1, step2 ;
var hrefList = [];
while ((step1 = reg.exec(sample)) !== null) {
    while ((step2 = regex2.exec(step1[1])) !== null) {
        hrefList.push({href: step2.groups["href"], title: step2.groups["title"]});
    }
}
console.log(hrefList);
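As a rough sketch of the same two-pass idea with String.prototype.matchAll: the outer pattern isolates each start...end block, and the inner pattern pulls every href/title pair out of that block. The sample string and its anchor markup are made up for illustration (the original input is not shown in full), and the names blockPattern, anchorPattern and links are hypothetical.

```javascript
// Hypothetical input: the anchor markup below is an assumption, not the asker's data.
const sample =
  'not this<a href="/x" title="X">skip</a>' +
  'start<a href="/foo" title="Foo">foo</a>, <a href="/bar" title="Bar">bar</a>end' +
  'not this';

// Pass 1: capture the whole comma-separated run of anchors between start and end.
const blockPattern = /start((?:<a href="[^"]*" title="[^"]*">.*?<\/a>(?:, )?)+)end/g;
// Pass 2: read href and title from each anchor inside one captured block.
const anchorPattern = /<a href="(?<href>[^"]*)" title="(?<title>[^"]*)">/g;

const links = [];
for (const block of sample.matchAll(blockPattern)) {
  for (const anchor of block[1].matchAll(anchorPattern)) {
    links.push({ href: anchor.groups.href, title: anchor.groups.title });
  }
}

console.log(links);
```

The anchor before start is skipped because the inner pattern only ever runs on the text captured by the outer one.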

If the format is constant, i.e. only href and title for each tag, you can use this regex with a lookahead to find every run of characters other than " that is followed by " and then whitespace or > (regex101):
const str = 'startfoo, barend';
const result = str.match(/[^"]+(?="[\s>])/gi);
console.log(result);

This regex:
<.*?>
matches every HTML tag, so replacing each match with an empty string removes the tags.
For example, given
<h1>1. This is a title </h1><ul><a href='www.google.com'>2. Click here </a></ul>
after applying the regex you will get:
1. This is a title 2. Click here
Not sure if this answers your question though.
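For what it's worth, a minimal sketch of how that tag-stripping regex might be applied with String.prototype.replace and the g flag. The variable names html and text are just for illustration, and this only holds for simple markup where no attribute value contains a > character.

```javascript
const html = "<h1>1. This is a title </h1><ul><a href='www.google.com'>2. Click here </a></ul>";

// <.*?> lazily matches from each "<" to the nearest ">", i.e. one tag at a time.
const text = html.replace(/<.*?>/g, '');

console.log(text);
```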

Related

JS What's the fastest way to display one specific line of a list?

In my JavaScript code, I get one very long line as a string.
This single line has around 65,000 characters. Example:
config=123&url=http://localhost/example&path_of_code=blablaba&link=kjslfdjs...
What I have to do is replace every & with a line break (\n) first and then pick only the line that starts with "path_of_code=". I have to store that line in a variable.
I already have the part that replaces & with a line break (\n), but not the second task.
var obj = document.getElementById('div_content');
var contentJS= obj.value;
var splittedResult;
splittedResult = contentJS.replace(/&/g, '\n');
What is the fastest way to do it? Please note, the list is usually very long.
It sounds like you want to extract the text after &path_of_code= up until either the end of the string or the next &. That's easily done with a regular expression using a capture group, then using the value of that capture group:
var rex = /&path_of_code=([^&]+)/;
var match = rex.exec(theString);
if (match) {
var text = match[1];
}
Live Example:
var theString = "config=123&url=http://localhost/example&path_of_code=blablaba&link=kjslfdjs...";
var rex = /&path_of_code=([^&]+)/;
var match = rex.exec(theString);
if (match) {
var text = match[1];
console.log(text);
}
Use a combination of String.indexOf() and String.substr():
var contentJS= "123&url=http://localhost/example&path_of_code=blablaba&link=kjslfdjs...";
var index = contentJS.indexOf("&path_of_code"),
substr = contentJS.substr(index+1),
res = substr.substr(0, substr.indexOf("&"));
console.log(res)
but the second task I didn't.
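You can split on & and then use filter() and startsWith(). Note that filter() is an array method, so it needs the array returned by split() rather than the string returned by replace():
splittedResult = contentJS.split('&').filter(i => i.startsWith('path_of_code='));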
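A small end-to-end sketch of the split-and-search idea, using find() instead of filter() since only one entry is wanted. The input is the example string from the question; the names query, prefix, line and value are hypothetical. Whether this is actually the fastest option for a 65,000-character string would need measuring.

```javascript
const query = 'config=123&url=http://localhost/example&path_of_code=blablaba&link=kjslfdjs...';
const prefix = 'path_of_code=';

// split() yields one entry per key=value pair; find() stops at the first hit.
const line = query.split('&').find(part => part.startsWith(prefix));

// Strip the key if only the value is needed; null when the key is absent.
const value = line === undefined ? null : line.slice(prefix.length);

console.log(line);
console.log(value);
```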

Regex extracting multiple matches for string [duplicate]

I'm trying to obtain all possible matches from a string using regex with javascript. It appears that my method of doing this is not matching parts of the string that have already been matched.
Variables:
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
Code:
var match = string.match(reg);
All matched results I get:
A1B1Y:A1B2Y
A1B5Y:A1B6Y
A1B9Y:A1B10Y
Matched results I want:
A1B1Y:A1B2Y
A1B2Y:A1B3Y
A1B5Y:A1B6Y
A1B6Y:A1B7Y
A1B9Y:A1B10Y
A1B10Y:A1B11Y
In my head, I want A1B1Y:A1B2Y to be a match along with A1B2Y:A1B3Y, even though A1B2Y in the string will need to be part of two matches.
Without modifying your regex, you can set it to start matching at the beginning of the second half of the match after each match using .exec and manipulating the regex object's lastIndex property.
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
var matches = [], found;
while (found = reg.exec(string)) {
matches.push(found[0]);
reg.lastIndex -= found[0].split(':')[1].length;
}
console.log(matches);
//["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
As per Bergi's comment, you can also take the index of the last match and increment it by 1, so that instead of resuming from the second half of the match, the regex resumes from the second character of each match:
reg.lastIndex = found.index+1;
The final outcome is the same. Though, Bergi's update has a little less code and performs slightly faster. =]
You cannot get the direct result from match, but it is possible to produce the result via RegExp.exec and with some modification to the regex:
var regex = /A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g;
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y'
var arr;
var results = [];
while ((arr = regex.exec(input)) !== null) {
results.push(arr[0] + arr[1]);
}
I used zero-width positive look-ahead (?=pattern) in order not to consume the text, so that the overlapping portion can be rematched.
Actually, it is possible to abuse the replace method to achieve the same result:
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y'
var results = [];
input.replace(/A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g, function ($0, $1) {
results.push($0 + $1);
return '';
});
However, since it is replace, it does extra useless replacement work.
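In environments that support String.prototype.matchAll, the same lookahead pattern can probably be used without the exec loop or the replace trick. This is only a sketch; the names overlapping and results are chosen for illustration.

```javascript
const input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';

// The lookahead captures the second half without consuming it,
// so the next match may start on that same second half.
const overlapping = /A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g;

// m[0] is the consumed first half, m[1] the captured lookahead.
const results = Array.from(input.matchAll(overlapping), m => m[0] + m[1]);

console.log(results);
```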
Unfortunately, it's not quite as simple as a single string.match.
The reason is that you want overlapping matches, which the /g flag doesn't give you.
You could use lookahead:
var re = /A\d+B\d+Y(?=:A\d+B\d+Y)/g;
But now you get:
string.match(re); // ["A1B1Y", "A1B2Y", "A1B5Y", "A1B6Y", "A1B9Y", "A1B10Y"]
The reason is that lookahead is zero-width, meaning that it just says whether the pattern comes after what you're trying to match or not; it doesn't include it in the match.
You could use exec to try and grab what you want. If a regex has the /g flag, you can run exec repeatedly to get all the matches:
// using re from above to get the overlapping matches
var m;
var matches = [];
var re2 = /A\d+B\d+Y:A\d+B\d+Y/g; // make another regex to get what we need
while ((m = re.exec(string)) !== null) {
// m is a match object, which has the index of the current match
matches.push(string.substring(m.index).match(re2)[0]);
}
matches == [
"A1B1Y:A1B2Y",
"A1B2Y:A1B3Y",
"A1B5Y:A1B6Y",
"A1B6Y:A1B7Y",
"A1B9Y:A1B10Y",
"A1B10Y:A1B11Y"
];
Here's a fiddle of this in action. Open up the console to see the results
Alternatively, you could split the original string on :, then loop through the resulting array and pull out each pair where array[i] and array[i+1] both match the pattern you want.
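The split-based alternative could look roughly like this. The anchored pattern piece and the variable names are assumptions made for the sketch; it tests each colon-separated piece on its own and keeps every adjacent pair in which both pieces pass.

```javascript
const input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';

// One piece of the original pattern, anchored so it must cover a whole piece.
const piece = /^A\d+B\d+Y$/;

const parts = input.split(':');
const pairs = [];
for (let i = 0; i + 1 < parts.length; i++) {
  // Keep the pair only when both neighbours are valid pieces.
  if (piece.test(parts[i]) && piece.test(parts[i + 1])) {
    pairs.push(parts[i] + ':' + parts[i + 1]);
  }
}

console.log(pairs);
```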

Trim a variable's value until it reaches to a certain character

My idea is like this:
var songList = ["1. somesong.mid","13. abcdef.mid","153. acde.mid"];
var newString = myString.substr(4); // I want this to dynamically trim the numbers until it reaches the .
// but I want the 1. 13. 153. and so on removed.
// I have more values in my array with different 'numbers' at the beginning
I'm having trouble with this. Can anyone help me find a simpler solution that dynamically chops off the leading characters up to the '.'?
You can do something like
var songList = ["1. somesong.mid","13. abcdef.mid","153. acde.mid"];
songList.forEach(function(value, i){
    // anchor at the start and also drop the space after the dot
    songList[i] = value.replace(/^\d+\.\s*/, ''); // or: value.substr(value.indexOf('.') + 1).trim()
});
You can use the string .match() method to extract the part up to and including the first . as follows (the dot is escaped so that it matches a literal . rather than any character):
var newString = myString.match(/[^.]*\./)[0];
That assumes that there will be a match. If you need to allow for no match occurring then perhaps:
var newString = (myString.match(/[^.]*\./) || [myString])[0];
If you're saying you want to remove the numbers and keep the rest of the string, then a simple .replace() will do it:
var newString = myString.replace(/^[^.]*\. */, "");
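Applied to the whole array from the question, the .replace() approach might look like the sketch below. It assumes every entry starts with digits, a dot and optional whitespace; entries that do not match are left unchanged. The name titles is hypothetical.

```javascript
const songList = ["1. somesong.mid", "13. abcdef.mid", "153. acde.mid"];

// ^\d+\.\s* matches the leading number, the dot and any following whitespace.
const titles = songList.map(name => name.replace(/^\d+\.\s*/, ''));

console.log(titles);
```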

Replace query string returns only first array value using Regex?

My URL has a parameter value like this:
Nr=AND(OR(abc:def),OR(ghi:jkl),OR(mno:pqr)...)
I used the regex below to extract values from this query string. It works, but it returns only the first match, e.g. I get only abc and def in the array.
OR\(([^:]*):([^)]*)\)
I want to extract all the values into two separate arrays: abc,ghi,mno and def,jkl,pqr...
Please find my code below:
var getNrValue = 'AND(OR(Analyzed:abc),OR(Compounds:def),OR(Chemical:mno))';
var regex = /OR\(([^:]*):([^)]*)\)/gm;
var s = regex.exec(getNrValue);
console.log(s);
Any help on this?
You can use this regex:
([^():]+):([^():]+)
In the regex demo, the right pane shows the capture groups. There is also a live JS demo.
Use this code to create the arrays (see the output of the live JS demo):
var array1 = [];
var array2 = [];
var string = 'Nr=AND(OR(abc:def),OR(ghi:jkl),OR(mno:pqr)...)'
var myregex = /([^():]+):([^():]+)/g;
var thematch = myregex.exec(string);
while (thematch != null) {
// add it to array of captures
array1.push(thematch[1]);
array2.push(thematch[2]);
document.write("left side: ",thematch[1],"<br />");
document.write("right side: ",thematch[2],"<br />");
// match the next one
thematch = myregex.exec(string);
}
Explanation:
([^():]+) captures to Group 1 any characters that are not parentheses () or colons :
: matches the literal colon between the two groups
([^():]+) captures to Group 2 any characters that are not parentheses () or colons :
The code retrieves the Group 1 and Group 2 matches and pushes them onto the two arrays.
Let me know if you have questions. :)
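As a possible alternative to the exec loop, here is a sketch that collects both arrays with String.prototype.matchAll. It reuses the regex from this answer and the example string from the question; the names left and right are hypothetical.

```javascript
const input = 'Nr=AND(OR(abc:def),OR(ghi:jkl),OR(mno:pqr)...)';
const pairPattern = /([^():]+):([^():]+)/g;

const left = [];
const right = [];
for (const match of input.matchAll(pairPattern)) {
  left.push(match[1]);  // text before the colon
  right.push(match[2]); // text after the colon
}

console.log(left);
console.log(right);
```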

How to remove the last matched regex pattern in javascript

I have a text which goes like this...
var string = '~a=123~b=234~c=345~b=456'
I need to extract the string such that it splits into
['~a=123~b=234~c=345','']
That is, I need to split the string with the /b=.*/ pattern, but it should match only the last occurrence of the pattern. How can I achieve this using RegEx?
Note: The numbers after the equals sign are randomly generated.
Edit:
The above one was just an example. I did not make the question clear I guess.
Generalized String being...
<word1>=<random_alphanumeric_word>~<word2>=<random_alphanumeric_word>..~..~..<word2>=<random_alphanumeric_word>
All parts have random length and every word is alphabetic; the length of the whole string is not fixed. The only known text is <word2>. Hence I need a RegEx for it, the pattern being /<word2>=.*/
This doesn't sound like a job for regexen considering that you want to extract a specific piece. Instead, you can just use lastIndexOf to split the string in two:
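var str = '~a=123~b=234~c=345~b=456';
var lio = str.lastIndexOf('b=');
var arr = [];
arr[0] = str.substr(0, lio); // everything before the last "b=", including the "~" in front of it
arr[1] = str.substr(lio);    // the last "b=..." pair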
http://jsfiddle.net/NJn6j/
I don't think I'd personally use a regex for this type of problem, but you can extract the last option pair with a regex like this:
var str = '~a=123~b=234~c=345~b=456';
var matches = str.match(/^(.*)~([^=]+=[^=]+)$/);
// matches[1] = "~a=123~b=234~c=345"
// matches[2] = "b=456"
Demo: http://jsfiddle.net/jfriend00/SGMRC/
This assumes the format is (~, alphanumeric name, =, and numbers) repeated an arbitrary number of times. The most important assumption is that ~ appears exactly once for each name-value pair and does not appear in the name.
You can remove the last token by a simple replacement:
str.replace(/(.*)~.*/, '$1')
This works by using the greedy property of * to force it to match the last ~ in the input.
This can also be achieved with lastIndexOf, since you only need to know the index of the last ~:
str.substring(0, (str.lastIndexOf('~') + 1 || str.length + 1) - 1)
(Well, I don't know if the code above is good JS or not... I would rather write in a few lines. The above is just for showing one-liner solution).
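Spelled out over a few lines, the lastIndexOf variant could be sketched like this. dropLastPair is a hypothetical helper name, and the same assumption applies: ~ appears once per name-value pair and never inside a name or value.

```javascript
// Remove the last "~name=value" token; return the input unchanged if there is no "~".
function dropLastPair(input) {
  const lastTilde = input.lastIndexOf('~');
  return lastTilde === -1 ? input : input.slice(0, lastTilde);
}

console.log(dropLastPair('~a=123~b=234~c=345~b=456'));
```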
A RegExp whose result you could use is:
string.match(/[a-z]*?=(.*?((?=~)|$))/gi);
// ["a=123", "b=234", "c=345", "b=456"]
But in your case the simplest solution is to split the string before extracting the content:
var results = string.split('~'); // ["", "a=123", "b=234", "c=345", "b=456"]
Now it is easy to extract each key and value and add them to an object:
var myObj = {};
results.forEach(function (item) {
    if (item) {
        var r = item.split('=');
        if (!myObj[r[0]]) {
            myObj[r[0]] = [r[1]];
        } else {
            myObj[r[0]].push(r[1]);
        }
    }
});
console.log(myObj);
Object:
a: ["123"]
b: ["234", "456"]
c: ["345"]
(?=.*(~b=[^~]*))\1
will get it done in one match, but if there are duplicate entries it will go to the first. Performance also isn't great and if you string.replace it will destroy all duplicates. It would pass your example, but against '~a=123~b=234~c=345~b=234' it would go to the first 'b=234'.
.*(~b=[^~]*)
will run a lot faster, but it requires another step because the match comes out in a group:
var re = /.*(~b=[^~]*)/.exec(string);
var result = re[1]; //~b=234
var array = string.split(re[1]);
This method will also have the same problem with exact duplicates. Another option is:
var regex = /.*(~b=[^~]*)/g;
var re = regex.exec(string);
var result = re[1];
// if you want an array from either side of the string:
var array = [string.slice(0, regex.lastIndex - re[1].length), string.slice(regex.lastIndex, string.length)];
This finds the exact location of the last match and removes it. Because the captured group sits at the very end of the overall match, regex.lastIndex - re[1].length is the index where the removed part begins, so the first slice keeps everything before it.
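If a regex is not strictly required, the same split can be sketched with lastIndexOf alone. splitAtLastKey is a hypothetical helper, and it assumes every pair, including the one being searched for, is preceded by ~ as in the example string.

```javascript
// Split the input around the last "~key=value" pair and drop that pair.
function splitAtLastKey(input, key) {
  const start = input.lastIndexOf('~' + key + '=');
  if (start === -1) return [input, '']; // key not present: nothing to remove

  // The pair ends at the next "~" or at the end of the string.
  const next = input.indexOf('~', start + 1);
  const end = next === -1 ? input.length : next;

  return [input.slice(0, start), input.slice(end)];
}

console.log(splitAtLastKey('~a=123~b=234~c=345~b=456', 'b'));
```

Because it works on positions rather than on the matched text, exact duplicates such as '~a=123~b=234~c=345~b=234' are not a problem here.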
