My regular expression isn't matching the correct group - javascript

I have a string:
var str = ' not valid xml here <something unknown>123</something>\
<something hello>555</something>\
<something what>655</something>';
var matches = str.match(/something[^>]+>([^<]+)/g);
I want matches to equal [123, 555, 655] and I thought () around my regex indicated this, but for some reason matches equals ["something unknown>123", "something hello>555", "something what>655"]. My solution was to do
matches.map(function(data){ return data.split('>').pop() })
but I was wondering if there's a more elegant way to do this by directly editing the regex, and I was wondering why () did not work.

Macmee, with all the usual disclaimers about parsing xml in regex, here is a simple regex that captures what you want:
<[^>]*>([^<]*)<\/
See the online demo (you are looking for the Group 1 captures in the bottom right pane).
Make sure you use g to get all the captures—but you already know that.
Here is also a full code demo for the code below.
<script>
var subject = '<something unknown>123</something>\
<something hello>555</something>\
<something what>655</something>';
var regex = /<[^>]*>([^<]*)<\//g;
var group1Caps = [];
var match = regex.exec(subject);
// put Group 1 captures in an array
while (match != null) {
if( match[1] != null ) group1Caps.push(match[1]);
match = regex.exec(subject);
}
document.write("<br>*** Matches ***<br>");
if (group1Caps.length > 0) {
for (var key in group1Caps) document.write(group1Caps[key],"<br>");
}
</script>
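As for why () did not work: with the g flag, String.prototype.match returns only the full matches and drops the capture groups. Here is a minimal sketch of a more direct route, assuming an engine that supports String.prototype.matchAll (ES2020). The variable names are illustrative only.

```javascript
// Same input as in the question, joined with + instead of line continuations.
const str = ' not valid xml here <something unknown>123</something>' +
  '<something hello>555</something>' +
  '<something what>655</something>';

// matchAll yields one match object per hit; index 1 is the first capture group.
const values = Array.from(str.matchAll(/something[^>]+>([^<]+)/g), m => m[1]);

console.log(values); // [ '123', '555', '655' ]
```

The captures are strings; map them through Number if you need numeric values.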

Related

Regex match whole expression instead of 2 matches [duplicate]

I have the text:
s.events="event3"
s.pageName="Forum: Index"
s.channel="forum"
s.prop1="Forum: Index"
s.prop2="Index Page"
s.prop36=""
s.prop37=""
s.prop38=""
s.prop39=""
s.prop40="53"
s.prop41="Anonymous"
s.prop42="username"
s.prop43=""
s.prop47=""
s.eVar1="Forum: Index"
s.eVar2="Index Page"
s.eVar36=""
s.eVar37=""
saved in a var in javascript and I want to extract the text between the quotes of s.prop42 giving me the result:
"username"
what I have right now is
var regex = /\?prop.42="([^']+)"/;
var test = data.match(regex);
but it doesn't seem to work. Can someone help me out?
Use this:
var myregex = /s\.prop42="([^"]*)"/;
var matchArray = myregex.exec(yourString);
if (matchArray != null) {
var thematch = matchArray[1];
}
In the regex demo, look at the capture group in the right pane.
Explanation
s\.prop42=" matches s.prop42=" (but we won't retrieve it)
The parentheses in ([^"]*) capture any chars that are not a " to Group 1: this is what we want
The code gets the Group 1 capture
Can't comment on above answer, but I think the regex is better with .* like so:
var myregex = /s\.prop42="(.*)"/;
var matchArray = myregex.exec(yourString);
if (matchArray != null) {
var thematch = matchArray[1];
}
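To compare the two patterns above, here is a small sketch. The multi-line sample is a shortened copy of the question's text; the single-line sample is a hypothetical input that is not in the question, added only to show where the two patterns diverge.

```javascript
const negated = /s\.prop42="([^"]*)"/;
const greedy = /s\.prop42="(.*)"/;

// Shortened version of the question's data: one assignment per line.
const multiLine = 's.prop41="Anonymous"\ns.prop42="username"\ns.prop43=""';

// Hypothetical input with two assignments on the same line.
const oneLine = 's.prop42="username";s.prop43="x"';

const a = negated.exec(multiLine)[1]; // stops at the first closing quote
const b = greedy.exec(multiLine)[1];  // . does not cross the newline, so same result
const c = negated.exec(oneLine)[1];   // still stops at the first closing quote
const d = greedy.exec(oneLine)[1];    // runs on to the last quote on the line

console.log(a, b, c, d);
```

On the question's one-assignment-per-line data both patterns give the same capture; they differ only when a line holds more than one quoted value.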

JavaScript: RegEx - Do not return match1 match2 etc

I'm trying to adopt this regex for my needs:
(?:^|\s)(?:https?:\/\/)?(?:\w+(?=\.).)?(?<name>.*).(?<tld>(?<=\.)\w+)
Demo:
https://regex101.com/r/lI2lB4/2
I would like to return a match for all entries like:
www.example.com
example.com
http://example.com
http://www.example.com
but not for
example.
http://www.example
www.example
Now the regex shown above is working fine, but it returns different matches (Match 1, Match 2, ...) - but would like to get only one result: Matching or not matching.
As a result I would like to use
if (regExDomain.test(input.val()))
{
console.log('valid');
}
else
{
console.log('invalid');
}
The problem is: The regEx above seems always to return "valid".
Any ideas how to do that?
The test() method of RegExp should be enough to check whether the input matches the pattern.
You could do something like this:
var pattern = /^(http[s]?:\/\/)?(www\.)?([^\.]+)\.[^\.]{2,3}$/
var regex = new RegExp(pattern);
for(var i=1; i<=3; i++) {
if ( regex.test( $("#text"+i).text() ) )
$("#isMatch"+i).html("MATCHES");
else
$("#isMatch"+i).html("DOESN'T MATCH");
}
jsfiddle example: http://jsfiddle.net/jyu16m89/1/
The above example will return false for longer top-level domains (e.g. ".digital" or ".menu"). If you want to accept them, replace {2,3} with +.
If you want to accept subdomains or folders (e.g. returning true for entries like http://stackoverflow.com/questions/), remove the dollar sign so the pattern no longer requires the string to end there.
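For a version without jQuery, here is a minimal sketch that runs the same pattern against the entries listed in the question. The function name isDomain and the two arrays are made up for this illustration.

```javascript
// Pattern from the answer above, unchanged.
const domainPattern = /^(http[s]?:\/\/)?(www\.)?([^\.]+)\.[^\.]{2,3}$/;

// test() returns a plain boolean, so there are no Match 1, Match 2 results to handle.
function isDomain(input) {
  return domainPattern.test(input);
}

const shouldMatch = ['www.example.com', 'example.com',
  'http://example.com', 'http://www.example.com'];
const shouldFail = ['example.', 'http://www.example', 'www.example'];

for (const entry of shouldMatch.concat(shouldFail)) {
  console.log(entry, isDomain(entry) ? 'valid' : 'invalid');
}
```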
You have a regex with groups, so match() returns an array whose entries after index 0 hold the captured groups. If nothing matches, you get null as a result. Note that the pattern must be passed as a regex literal, not as a quoted string:
function isUrl(myString) {
// named groups and lookbehind need an ES2018 engine
var match = myString.match(/(?:^|\s)(?:https?:\/\/)?(?:\w+(?=\.).)?(?<name>.*).(?<tld>(?<=\.)\w+)/);
if(match !== null) {
return true;
}
return false;
}

Extract word between '=' and '('

I have the following string
234234=AWORDHERE('sdf.'aa')
where I need to extract AWORDHERE.
Sometimes there can be space in between.
234234= AWORDHERE('sdf.'aa')
Can I do this with a regular expression?
Or should I do it manually by finding indexes?
The datasets are huge, so it's important to do it as fast as possible.
Try this regex:
\d+=\s?(\w+)\(
Check Demo
In JavaScript it would look like this:
var myString = "234234=AWORDHERE('sdf.'aa')";// or 234234= AWORDHERE('sdf.'aa')
var myRegexp = /\d+=\s?(\w+)\(/g;
var match = myRegexp.exec(myString);
console.log(match[1]); // AWORDHERE
You could do this at least three ways. You need to benchmark to see what's fastest.
Substring w/ indexes
function extract(from) {
var ixEq = from.indexOf("=");
var ixParen = from.indexOf("(");
return from.substring(ixEq + 1, ixParen).trim(); // trim drops the optional space after =
}
Splits
function extract(from) {
var spEq = from.split("=");
var spParen = spEq[1].split("(");
return spParen[0].trim(); // trim drops the optional space after =
}
Regex (demo)
Here is some sample regex you could use
/[^=]+=([^(]+).*/g
This says
[^=]+ - One or more characters which are not an =
= - The = itself
( - opens a capturing group so you can access your match in code
[^(]+ - One or more characters which are not a (
) - closes the capturing group
.* - Matches the rest of the line
The /g on the end tells it to find every match in the input rather than stopping at the first one.
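The regex described above can be wrapped in a small helper. This is only a sketch: the name extractWord is made up here, the g flag is dropped because each call handles a single line, and the trim() call is an addition to cope with the optional space after the =.

```javascript
// [^=]+  everything up to the =, then the capture up to the first (
const wordPattern = /[^=]+=([^(]+).*/;

// Returns the word between = and (, or null when the line does not match.
function extractWord(line) {
  const match = wordPattern.exec(line);
  return match ? match[1].trim() : null;
}

console.log(extractWord("234234=AWORDHERE('sdf.'aa')"));
console.log(extractWord("234234= AWORDHERE('sdf.'aa')"));
```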
Using lookarounds, you can search for a string preceded by = and followed by ( as follows.
Regex: (?<==)[A-Z ]+(?=\()
Explanation:
(?<==) checks if [A-Z ] is preceded by an =.
[A-Z ]+ matches your pattern.
(?=\() checks if matched pattern is followed by a (.
Regex101 Demo
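In JavaScript this could be used roughly as follows. This is a sketch that assumes an engine with lookbehind support (ES2018 or later); the helper name wordBetween is hypothetical. Because [A-Z ] also accepts spaces, the raw match can carry a leading space, hence the trim().

```javascript
// Lookbehind for =, uppercase letters or spaces, lookahead for (
const lookaround = /(?<==)[A-Z ]+(?=\()/;

// Returns the matched word without surrounding spaces, or null if nothing matches.
function wordBetween(line) {
  const match = line.match(lookaround);
  return match ? match[0].trim() : null;
}

console.log(wordBetween("234234= AWORDHERE('sdf.'aa')"));
```

Note that this pattern only accepts uppercase letters, so a lowercase word would not match.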
var str = "234234= AWORDHERE('sdf.'aa')";
var regexp = /.*=\s*(\w+)\(.*\)/g; // \s* so the space after = is optional
var match = regexp.exec(str);
alert( match[1] );
I made my solution for this just a little more general than you asked for, but I don't think it takes much more time to execute. I didn't measure. If you need greater efficiency than this provides, comment and I or someone else can help you with that.
Here's what I did, using the command prompt of node:
> var s = "234234= AWORDHERE('sdf.'aa')"
undefined
> var a = s.match(/(\w+)=\s*(\w+)\s*\(.*/)
undefined
> a
[ '234234= AWORDHERE(\'sdf.\'aa\')',
'234234',
'AWORDHERE',
index: 0,
input: '234234= AWORDHERE(\'sdf.\'aa\')' ]
>
As you can see, this matches the number before the = in a[1], and it matches the AWORDHERE name as you requested in a[2]. This will work with any number (including zero) of spaces before and/or after the =.

Javascript Regex to get text between certain characters

I need a regex in Javascript that would allow me to match an order number in two different formats of order URL:
The URLs:
http://store.apple.com/vieworder/1003123464/test#test.com
http://store.apple.com/vieworder/W411234368/test#test.com/AOS-A=M-104121
The first one will always be all numbers, and the second one will always start with a W, followed by just numbers.
I need to be able to use a single regex to return these matches:
1003123464
W411234368
This is what I've tried so far:
/(vieworder\/)(.*?)(?=\/)/g
RegExr link
That allows me to match:
vieworder/1003123464
vieworder/W411234368
but I'd like it to not include the first capture group.
I know I could then run the result through a string.replace('vieworder/'), but it'd be cool to be able to do this in just one command.
Use your expression without grouping vieworder
vieworder\/(.*?)(?=\/)
DEMO
var string = 'http://store.apple.com/vieworder/1003123464/test#test.com http://store.apple.com/vieworder/W411234368/test#test.com/AOS-A=M-104121';
var myRegEx = /vieworder\/(.*?)(?=\/)/g;
var index = 1;
var matches = [];
var match;
while (match = myRegEx.exec(string)) {
matches.push(match[index]);
}
console.log(matches);
Use replace instead of match, since older JavaScript engines do not support lookbehinds (they were added in ES2018). You could also use capturing groups and the exec method to get the characters inside a particular group.
> var s1 = 'http://store.apple.com/vieworder/1003123464/test#test.com'
undefined
> var s2 = 'http://store.apple.com/vieworder/W411234368/test#test.com/AOS-A='
undefined
> s1.replace(/^.*?vieworder\/|\/.*/g, '')
'1003123464'
> s2.replace(/^.*?vieworder\/|\/.*/g, '')
'W411234368'
OR
> s1.replace(/^.*?\bvieworder\/([^\/]*)\/.*/g, '$1')
'1003123464'
I'd suggest
W?\d+
That ought to translate to "one or zero W and one or more digits".
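One caveat: on its own, W?\d+ with the g flag also picks up other digit runs, such as the 104121 at the end of the second URL. A possible sketch that combines it with the vieworder/ prefix from the other answers follows; the function name orderNumber is made up for this example.

```javascript
const url1 = 'http://store.apple.com/vieworder/1003123464/test#test.com';
const url2 = 'http://store.apple.com/vieworder/W411234368/test#test.com/AOS-A=M-104121';

// Unanchored: every run of digits (optionally prefixed by W) is returned.
const loose = url2.match(/W?\d+/g);

// Anchored to the path segment: only the order number is captured.
function orderNumber(url) {
  const match = /vieworder\/(W?\d+)\//.exec(url);
  return match ? match[1] : null;
}

console.log(loose);
console.log(orderNumber(url1), orderNumber(url2));
```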

Using Regex to pull out a part of a string

I can't figure out how to pull out multiple matches from the following example:
This code:
/prefix-(\w+)/g.exec('prefix-firstname prefix-lastname');
returns:
["prefix-firstname", "firstname"]
How do I get it to return:
[
["prefix-firstname", "firstname"],
["prefix-lastname", "lastname"]
]
Or
["prefix-firstname", "firstname", "prefix-lastname", "lastname"]
This will do what you want:
var str="prefix-firstname prefix-lastname";
var out =[];
str.replace(/prefix-(\w+)/g,function(match, Group) {
var row = [match, Group]
out.push(row);
});
Probably a mis-use of .replace, but I don't think you can pass a function to .match...
_Pez
Using a loop:
var re = /prefix-(\w+)/g;
var str = 'prefix-firstname prefix-lastname';
var match = re.exec(str);
while (match != null) {
// match[0] is the full match, match[1] is the captured group
match = re.exec(str);
}
You get each match one at a time.
Using match:
Here, the regex will have to be a bit different, because you cannot get sub-captures (or I don't know how to do it with multiple matches)...
re = /[^\s-]+(?=\s|$)/g;
str = 'prefix-firstname prefix-lastname';
match = str.match(re);
alert(match);
[^\s-]+ matches all characters except spaces and dashes/hyphens, but only if they are followed by a space or are at the end of the string, which is a condition imposed by (?=\s|$).
You can find the groups in two steps:
"prefix-firstname prefix-lastname".match(/prefix-\w+/g)
.map(function(s) { return s.match(/prefix-(\w+)/) })
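On newer engines both requested shapes can be produced in one pass. This is a sketch that assumes String.prototype.matchAll (ES2020) and Array.prototype.flat (ES2019) are available; the variable names are illustrative.

```javascript
const text = 'prefix-firstname prefix-lastname';

// One [fullMatch, group1] pair per match.
const rows = Array.from(text.matchAll(/prefix-(\w+)/g), m => [m[0], m[1]]);

// The same data as a single flat array.
const flat = rows.flat();

console.log(rows);
console.log(flat);
```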
