JavaScript regex .match finds only one result - javascript

I have this JS code:
result = subject.match(/<a.*class="gallery_browser_thumbnail".*href="(.+)">/i);
I want to get the href of multiple a tags in an HTML source, but it returns only one result.
If I add the g flag to the pattern, it returns the whole matches, but I only want the href part, i.e. the (.+) group.
This is how I capture the HTML input:
var subject = String(document.getElementsByTagName("body")[0].innerHTML);
Any help?
Final working script:
var subject = String(document.getElementsByTagName("body")[0].innerHTML);
var regex = /<a.*class="gallery_browser_thumbnail".*href="(.+)">/gi;
var matched = null;
while (matched = regex.exec(subject)) {
alert(matched[1]);
}

Change to lazy match by adding the lazy quantifier ?:
result = subject.match(/<a.*?class="gallery_browser_thumbnail".*?href="(.+?)">/i);
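To see why the quantifiers matter, here is a small sketch on a made-up one-line snippet (the `sample` markup and file names are assumptions, not the asker's page). The greedy pattern runs on to the last tag, while the lazy one stops at the first:

```javascript
// Hypothetical input: two thumbnail links on a single line.
const sample =
  '<a class="gallery_browser_thumbnail" href="a.jpg">' +
  '<a class="gallery_browser_thumbnail" href="b.jpg">';

// Greedy: each .* grabs as much as possible, so the match spans both tags
// and the group ends up holding the last href.
const greedy = sample.match(/<a.*class="gallery_browser_thumbnail".*href="(.+)">/i);

// Lazy: each .*? and .+? grabs as little as possible, so the match stops
// at the end of the first tag.
const lazy = sample.match(/<a.*?class="gallery_browser_thumbnail".*?href="(.+?)">/i);

console.log(greedy[1]); // b.jpg
console.log(lazy[1]);   // a.jpg
```

Without the g flag, match still returns only the first match, so collecting every href also needs a global search.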

You can call exec on the RegExp in a loop. Something like this:
var subject = String(document.getElementsByTagName("body")[0].innerHTML),
    regexp = /<a.*class="gallery_browser_thumbnail".*href="(.+)">/gi, // g for global match
    match = regexp.exec(subject),
    result = [];
while (match != null) {
    result.push(match[1]); // index 1 holds the captured group
    match = regexp.exec(subject);
}
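On engines that support String.prototype.matchAll (ES2020), the same collection can probably be written without an explicit loop. This is only a sketch: the `subject` string below is a made-up stand-in for the page's innerHTML, and the lazy quantifiers are my assumption to keep each match inside one tag.

```javascript
// Hypothetical input standing in for document.body.innerHTML.
const subject =
  '<a class="gallery_browser_thumbnail" href="one.jpg">\n' +
  '<a class="gallery_browser_thumbnail" href="two.jpg">';

// matchAll requires the g flag; lazy quantifiers keep each match inside one tag.
const regexp = /<a.*?class="gallery_browser_thumbnail".*?href="(.+?)">/gi;

// Each item yielded by matchAll is a match array; index 1 is the captured href.
const hrefs = Array.from(subject.matchAll(regexp), (m) => m[1]);

console.log(hrefs); // [ 'one.jpg', 'two.jpg' ]
```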

Related

Finding multiple groups in one string

Consider the following string: a list of HTML a tags separated by commas. How can I get a list of {href, title} for the tags between 'start' and 'end'?
not thisstartfoo, barendnot this
The following regex gives only the last iteration of a.
/start((?:<a href="(?<href>.*?)" title="(?<title>.*?)">.*?<\/a>(?:, )?)+)end/g
How can I get the whole list?
This should give you what you need.
https://regex101.com/r/isYIeR/1
/(?:start)*(?:<a href=(?<href>.*?)\s+title=(?<title>.*?)>.*?<\/a>)+(?:,|end)
UPDATE
The regex above does not meet the requirement, because the returned value for a given group is the last one captured.
I do not think this can be done in one regex match. Here is a JavaScript solution with two regex matches to get a list of {href, title}:
var sample='startfoo, bar,barendstart<img> something end\n' +
'beginfoo, bar,barend\n'+
'startfoo again, bar again,bar2 againend';
var reg = /start((?:\s*<a href=.*?\s+title=.*?>.*?<\/a>,?)+)end/gi;
var regex2 = /href=(?<href>.*?)\s+title=(?<title>.*?)>/gi;
var step1, step2 ;
var hrefList = [];
while( (step1 = reg.exec(sample)) !== null) {
while((step2 = regex2.exec(step1[1])) !== null) {
hrefList.push({href:step2.groups["href"], title:step2.groups["title"]});
}
}
console.log(hrefList);
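The "last one captured" behaviour that forces the two-step approach can be seen in a much smaller sketch (the digit list is just an illustrative input, not the question's data):

```javascript
// A repeated capturing group keeps only its last iteration.
const list = "1,2,3";
const once = /(?:(\d),?)+/.exec(list);
console.log(once[0]); // 1,2,3  (the whole run is matched)
console.log(once[1]); // 3      (but group 1 holds only the last digit)

// Applying the inner pattern globally recovers every iteration.
const all = Array.from(list.matchAll(/(\d),?/g), (m) => m[1]);
console.log(all); // [ '1', '2', '3' ]
```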
If the format is constant, i.e. only href and title for each tag, you can use this regex, which uses a lookahead to find each run of characters other than " that is followed by " and then whitespace or > (regex101):
const str = 'startfoo, barend';
const result = str.match(/[^"]+(?="[\s>])/gi);
console.log(result);
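Since the markup inside the question's string did not survive, here is a sketch of the same lookahead on a made-up input. The `sampleList` string and the pairing loop are assumptions of mine; the pairing only works if every tag has exactly href and title, in that order.

```javascript
// Hypothetical input: this string only assumes the href/title layout
// implied by the regexes in the question.
const sampleList =
  'start<a href="foo.html" title="Foo">foo</a>, ' +
  '<a href="bar.html" title="Bar">bar</a>end';

// Runs of non-quote characters that are followed by a closing quote
// and then whitespace or ">".
const values = sampleList.match(/[^"]+(?="[\s>])/g);

// Pair the flat list up as {href, title}.
const pairs = [];
for (let i = 0; i + 1 < values.length; i += 2) {
  pairs.push({ href: values[i], title: values[i + 1] });
}
console.log(pairs);
```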
This regex:
<.*?>
removes all html tags
so for example
<h1>1. This is a title </h1><ul><a href='www.google.com'>2. Click here </a></ul>
After using regex you will get:
1. This is a title 2. Click here
Not sure if this answers your question though.
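For reference, a sketch of how that pattern would be applied, assuming it is used with replace and the g flag (the answer shows only the pattern itself):

```javascript
const html = "<h1>1. This is a title </h1><ul><a href='www.google.com'>2. Click here </a></ul>";

// The g flag removes every tag; the lazy .*? stops at the first ">" of each one.
const text = html.replace(/<.*?>/g, "");

console.log(text); // "1. This is a title 2. Click here " (note the trailing space)
```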

JS What's the fastest way to display one specific line of a list?

In my JavaScript code, I get one very long line as a string.
This one line has around 65,000 characters. Example:
config=123&url=http://localhost/example&path_of_code=blablaba&link=kjslfdjs...
What I have to do is first replace every & with a line break (\n) and then pick only the line that starts with "path_of_code=". I have to store this line in a variable.
I already have the part that replaces & with a line break (\n), but not the second task.
var obj = document.getElementById('div_content');
var contentJS= obj.value;
var splittedResult;
splittedResult = contentJS.replace(/&/g, '\n');
What is the fastest way to do it? Please note, the list is usually very long.
It sounds like you want to extract the text after &path_of_code= up until either the end of the string or the next &. That's easily done with a regular expression using a capture group, then using the value of that capture group:
var rex = /&path_of_code=([^&]+)/;
var match = rex.exec(theString);
if (match) {
var text = match[1];
}
Live Example:
var theString = "config=123&url=http://localhost/example&path_of_code=blablaba&link=kjslfdjs...";
var rex = /&path_of_code=([^&]+)/;
var match = rex.exec(theString);
if (match) {
var text = match[1];
console.log(text);
}
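One thing to watch with the pattern above: it requires a leading &, so it would miss path_of_code if it happened to be the first parameter. A possible variant, as a sketch only (the `getPathOfCode` helper name is hypothetical):

```javascript
// Hypothetical helper; returns the value of path_of_code, or null if absent.
function getPathOfCode(query) {
  // (?:^|&) accepts the parameter at the very start or after an "&";
  // [^&]* stops at the next "&" or at the end of the string.
  const match = /(?:^|&)path_of_code=([^&]*)/.exec(query);
  return match ? match[1] : null;
}

console.log(getPathOfCode("config=123&url=http://localhost/example&path_of_code=blablaba&link=kjslfdjs")); // blablaba
console.log(getPathOfCode("path_of_code=first&config=123")); // first
console.log(getPathOfCode("config=123")); // null
```

A single exec call stops at the first hit, so it does not need to split or copy the whole 65,000-character string.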
Use a combination of String.indexOf() and String.substr():
var contentJS= "123&url=http://localhost/example&path_of_code=blablaba&link=kjslfdjs...";
var index = contentJS.indexOf("&path_of_code"),
substr = contentJS.substr(index+1),
res = substr.substr(0, substr.indexOf("&"));
console.log(res)
but the second task I didn't.
You can use split(), filter() and startsWith(). filter() needs an array, so split the string on & instead of replacing & with \n:
splittedResult = contentJS.split('&').filter(i => i.startsWith('path_of_code='));

Regex check both side of match but not include in match string

I want to get a match while checking the expressions on both sides of the main match.
var str = '1234 word !!! 5678 another *** 000more)))'; // get word and another
console.log(str.match(/(?!\d+\s?)\w+(?=\s?\W+)/g))
>> (3) ["word", "another", "more"]
It checks both sides but does not include them in the match.
But with HTML it does not work:
var str = ''; // get url, url2 and url3
console.log(str.match(/(?!href=")[^"]+?(?=")/g))
>> (6) ["<a href=", "url3"]
I tried a negative lookahead, (?!href="), and a positive lookahead, (?="), to match only the attribute values, but it returns more than the attribute values.
Is there any way to do this? Thanks.
What you could do for your example data is capture what is between the double quotes, href="([^"]+), in a capturing group and loop through the results:
var str = '';
var pattern = /href="([^"]+)/g;
var match = pattern.exec(str);
while (match != null) {
console.log(match[1]);
match = pattern.exec(str);
}
In other regex flavors you could have used a positive lookbehind, (?<=href="), but JavaScript regex only gained lookbehind support in ES2018, so older engines do not accept it.
A reasonable solution is:
Match href=" as "ordinary" content, to be ignored.
Match the attribute value as a capturing group, (\w+), to be "consumed".
Set the boundary of the above group with a positive lookahead, (?="), just as you did.
So the whole regex can be:
href="(\w+)(?=")
and you can read "your" value from group 1.
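The steps above can be sketched as follows. The `html` string is a made-up input, since the question's original string was not preserved:

```javascript
// Hypothetical input with three links.
const html = '<a href="url">one</a>, <a href="url2">two</a> and <a href="url3">three</a>';

// href=" is matched but not captured; group 1 is the value; (?=") checks
// for the closing quote without consuming it.
const pattern = /href="(\w+)(?=")/g;

const values = [];
let match;
while ((match = pattern.exec(html)) !== null) {
  values.push(match[1]);
}
console.log(values); // [ 'url', 'url2', 'url3' ]
```

Note that \w+ only covers letters, digits and underscores, so values containing slashes or dots would need something like [^"]+ in the group instead.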
You can't parse HTML with regex. Because HTML can't be parsed by regex.
Have you tried using the DOM parser that's right at your fingertips?
var str = '';
var div = document.createElement('div');
div.innerHTML = str; // parsing magic!
var links = Array.from(div.getElementsByTagName("a"));
var urls = links.map(function(a) {return a.href;});
// above returns fully-resolved absolute URLs.
// for the literal attribute value, try a.getAttribute("href")
console.log(urls);

Display characters other than alphabets using regular expression

I have tried to display characters other than alphabets in the particular string but it is displaying only the first char.
var myArray = /[^a-zA-Z]+/g.exec("cdAbb#2547dbsbz78678");
It only displays the first match because exec with the g (global) modifier is meant to be called in a loop to get all the matches.
var str = "cdAbb#2547dbsbz78678";
var re = /[^a-zA-Z]+/g;
var myArray;
while (myArray = re.exec(str)) {
console.log(myArray[0]);
}
Output
#2547
78678
If you were wanting to combine the matches you could use the following.
var str = "cdAbb#2547dbsbz78678",
res = str.match(/[\W\d]+/g).join('');
// => "#254778678"
Or do a replacement
str = str.replace(/[a-z]+/gi, '');
You can do:
"cdAbb#2547dbsbz78678".match(/[^a-zA-Z]+/g).join('');
//=> #254778678
RegExp.exec with g (global) modifier needs to run in loop to give you all the matches.
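The reason the loop works is that a global RegExp remembers where the previous match ended in its lastIndex property. A small sketch using the string from the question:

```javascript
const re = /[^a-zA-Z]+/g;
const str = "cdAbb#2547dbsbz78678";

// Each exec call resumes from re.lastIndex and then advances it.
const first = re.exec(str);
const afterFirst = re.lastIndex;
const second = re.exec(str);
const afterSecond = re.lastIndex;

// When nothing is left, exec returns null and lastIndex goes back to 0.
const third = re.exec(str);

console.log(first[0], first.index, afterFirst);     // #2547 5 10
console.log(second[0], second.index, afterSecond);  // 78678 15 20
console.log(third, re.lastIndex);                   // null 0
```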

regex: any string between two slashes first of them is prefixed with a defined string

I'd like to get the talker name from mp3 file paths such as the following:
/assets/audio/James_Lee/001.mp3
/assets/audio/Marc_Smith/001.mp3
/aasets/audio/blahblah/001.mp3
In the previous example we note that each talker name is surrounded by two slashes where the first of them is prefixed with the word audio. I need a pattern that matches names like the example above using javascript.
I tried at http://regexpal.com/ :
audio/.*/
but it matches the whole of *audio/The_name/* where I need only *The_name*. I also don't know how to use such patterns with JavaScript's replace().
This will get you the name: (?<=\/assets\/audio\/).*(?=\/)
Here's the regex in use: http://regexr.com?34747
Considering Javascript, you could do this:
var string = "/assets/audio/James_Lee/001.mp3";
var name = string.replace(/^.*\/audio\/|\/[\d]+\..*$/g, '');
Try this:
var str = "/assets/audio/James_Lee/001.mp3\n/assets/audio/Marc_Smith/001.mp3";
var pattern = /audio\/(.+?)\//g;
var match;
var matches = [];
while ((match = pattern.exec(str)) !== null){
matches.push(match[1]);
}
console.log(matches);
// If you want a string with only the names, you can re-combine the matches
str = matches.join('\n');
How about this?
str.replace(/.*audio\/([^\/]*)\/.*/,"$1")
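If you only need the name from one path at a time, a capture group read via exec may be simpler than replace(). A sketch, where the `talkerName` helper is a hypothetical name of mine:

```javascript
// Hypothetical helper: returns the path segment right after "/audio/",
// or null when the path has no such segment.
function talkerName(path) {
  const match = /\/audio\/([^\/]+)\//.exec(path);
  return match ? match[1] : null;
}

const paths = [
  "/assets/audio/James_Lee/001.mp3",
  "/assets/audio/Marc_Smith/001.mp3",
  "/aasets/audio/blahblah/001.mp3",
];
console.log(paths.map(talkerName)); // [ 'James_Lee', 'Marc_Smith', 'blahblah' ]
console.log(talkerName("/assets/video/001.mp4")); // null
```

Unlike the replace() versions, this returns null for a path that does not match instead of handing back the unchanged input.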
