regex in js, match pattern except keywords - javascript

I'm trying to find a regex pattern in JS
Any_Function() //match : Any_Function(
butnotthis() //I don't want to match butnotthis(
I have this pattern : /([a-zA-Z_]+\()/ig
and would like something like
/(not:butnotthis)|([a-zA-Z_]+\()/ig (don't try this)
Demo here :
http://regexr.com/38qag
Is it possible to exclude keywords from the match?

The way I interpreted your question, you wanted to be able to create a blacklist of ignored functions. As far as I know, you cannot do this with regular expressions; however, you could do it with a bit of JavaScript.
I created a JSFiddle: http://jsfiddle.net/DQN79/
var str = "Any_Function();butnotthis();",
matches = [],
blacklist = { butnotthis: true };
str.replace(/([a-zA-Z_]+\()/ig, function (match) {
if (!blacklist[match.substr(0, match.length - 1)])
matches.push(match);
});
console.log(matches);
In this example, I abused the String#replace() method because it accepts a callback that will be fired for each match. I used this callback to check for blacklisted function names - if the function is not blacklisted, it will be added to the matches array.
I used a hashmap for the blacklist because it is programmatically easier, but you could also use a string, array, etc.

You can establish a convention between functions and keywords, where function names start with an uppercase letter. In that case the regex would be (without the i flag, which would defeat the uppercase check):
/(^[A-Z][a-zA-Z_]+\()/g

Here is one working version:
/^(?!(butnotthis\())([a-zA-Z_]+\()/ig
Specify the functions to be ignored inside the parentheses of the negative lookahead (?!...).
http://regexr.com/38qb8
For Javascript:
var str = "Any_Function();butnotthis();",
matches = [],
blacklist = ["butnotthis"];
// Uses the native Array.prototype.filter method (no jQuery needed)
matches = str.match(/([a-zA-Z_]+\()/ig).filter(
function (e) {
// reject the match if it is exactly a blacklisted name followed by "("
for (var i = 0; i < blacklist.length; i++) {
if (e === blacklist[i] + "(") return false;
}
return true;
});
console.log(matches);
jsBin : http://jsbin.com/vevip/1/edit
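The negative lookahead can also be built from a blacklist at run time. The following is a minimal sketch rather than code from the answers above: the helper name matchCallsExcept is hypothetical, and it assumes the blacklisted names are plain identifiers that need no regex escaping.

```javascript
// Hypothetical helper: returns every "name(" in source whose name is not blacklisted.
// Assumes blacklist entries contain only letters and underscores (no escaping is done).
function matchCallsExcept(source, blacklist) {
  // \b stops the engine from skipping the lookahead by starting mid-identifier
  var pattern = new RegExp(
    '\\b(?!(?:' + blacklist.join('|') + ')\\()[a-zA-Z_]+\\(',
    'g'
  );
  return source.match(pattern) || [];
}

var allowedCalls = matchCallsExcept('Any_Function();butnotthis();', ['butnotthis']);
console.log(allowedCalls); // ["Any_Function("]
```

The word boundary matters here: without it the regex could start one character into butnotthis and match utnotthis( instead.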

Related

Regex extracting multiple matches for string [duplicate]

I'm trying to obtain all possible matches from a string using regex with javascript. It appears that my method of doing this is not matching parts of the string that have already been matched.
Variables:
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
Code:
var match = string.match(reg);
All matched results I get:
A1B1Y:A1B2Y
A1B5Y:A1B6Y
A1B9Y:A1B10Y
Matched results I want:
A1B1Y:A1B2Y
A1B2Y:A1B3Y
A1B5Y:A1B6Y
A1B6Y:A1B7Y
A1B9Y:A1B10Y
A1B10Y:A1B11Y
In my head, I want A1B1Y:A1B2Y to be a match along with A1B2Y:A1B3Y, even though A1B2Y in the string will need to be part of two matches.
Without modifying your regex, you can set it to start matching at the beginning of the second half of the match after each match using .exec and manipulating the regex object's lastIndex property.
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
var matches = [], found;
while (found = reg.exec(string)) {
matches.push(found[0]);
reg.lastIndex -= found[0].split(':')[1].length;
}
console.log(matches);
//["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
As per Bergi's comment, you can also take the index of the last match and increment it by 1, so that instead of resuming from the second half of the match, the regex resumes from the second character of each match:
reg.lastIndex = found.index+1;
The final outcome is the same. Though, Bergi's update has a little less code and performs slightly faster. =]
You cannot get the direct result from match, but it is possible to produce the result via RegExp.exec and with some modification to the regex:
var regex = /A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g;
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y'
var arr;
var results = [];
while ((arr = regex.exec(input)) !== null) {
results.push(arr[0] + arr[1]);
}
I used zero-width positive look-ahead (?=pattern) in order not to consume the text, so that the overlapping portion can be rematched.
Actually, it is possible to abuse the replace method to achieve the same result:
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y'
var results = [];
input.replace(/A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g, function ($0, $1) {
results.push($0 + $1);
return '';
});
However, since it is replace, it does extra useless replacement work.
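Another variation on the lookahead idea, sketched here under the assumption that String.prototype.matchAll is available (ES2020 engines): put the whole pattern inside a capturing group within the lookahead, so every match is zero-width and nothing is consumed. This is not from the answers above, only an illustration of the same technique.

```javascript
var overlapInput = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';

// The whole pair sits inside (?=(...)), so each match has zero width and the
// engine retries at the next character; the pair itself is read from group 1.
var overlapRegex = /(?=(A\d+B\d+Y:A\d+B\d+Y))/g;

var overlapResults = Array.from(overlapInput.matchAll(overlapRegex), function (m) {
  return m[1];
});
console.log(overlapResults);
```

matchAll advances past empty matches on its own, so no manual lastIndex handling is needed.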
Unfortunately, it's not quite as simple as a single string.match.
The reason is that you want overlapping matches, which the /g flag doesn't give you.
You could use lookahead:
var re = /A\d+B\d+Y(?=:A\d+B\d+Y)/g;
But now you get:
string.match(re); // ["A1B1Y", "A1B2Y", "A1B5Y", "A1B6Y", "A1B9Y", "A1B10Y"]
The reason is that lookahead is zero-width, meaning that it just says whether the pattern comes after what you're trying to match or not; it doesn't include it in the match.
You could use exec to try and grab what you want. If a regex has the /g flag, you can run exec repeatedly to get all the matches:
// using re from above to get the overlapping matches
var m;
var matches = [];
var re2 = /A\d+B\d+Y:A\d+B\d+Y/g; // make another regex to get what we need
while ((m = re.exec(string)) !== null) {
// m is a match object, which has the index of the current match
matches.push(string.substring(m.index).match(re2)[0]);
}
matches == [
"A1B1Y:A1B2Y",
"A1B2Y:A1B3Y",
"A1B5Y:A1B6Y",
"A1B6Y:A1B7Y",
"A1B9Y:A1B10Y",
"A1B10Y:A1B11Y"
];
Here's a fiddle of this in action. Open up the console to see the results
Alternatively, you could split the original string on :, then loop through the resulting array, pulling out the pairs where array[i] and array[i+1] both match the pattern you want.
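A minimal sketch of the split approach, assuming every token is delimited by colons exactly as in the sample string (the variable names are illustrative, not from the question):

```javascript
var splitInput = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var tokens = splitInput.split(':');
var tokenPattern = /^A\d+B\d+Y$/; // one half of the wanted pair, anchored to the whole token
var pairs = [];

// Compare each token with its right-hand neighbour and keep the pair if both match
for (var i = 0; i + 1 < tokens.length; i++) {
  if (tokenPattern.test(tokens[i]) && tokenPattern.test(tokens[i + 1])) {
    pairs.push(tokens[i] + ':' + tokens[i + 1]);
  }
}
console.log(pairs);
```

Because tokenPattern has no g flag, test() keeps no state between calls and can safely be reused in the loop.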

Javascript Regex to get text between certain characters

I need a regex in Javascript that would allow me to match an order number in two different formats of order URL:
The URLs:
http://store.apple.com/vieworder/1003123464/test#test.com
http://store.apple.com/vieworder/W411234368/test#test.com/AOS-A=M-104121
The first one will always be all numbers, and the second one will always start with a W, followed by just numbers.
I need to be able to use a single regex to return these matches:
1003123464
W411234368
This is what I've tried so far:
/(vieworder\/)(.*?)(?=\/)/g
That allows me to match:
vieworder/1003123464
vieworder/W411234368
but I'd like it to not include the first capture group.
I know I could then run the result through a string.replace('vieworder/', ''), but it'd be cool to be able to do this in just one command.
Use your expression without grouping vieworder
vieworder\/(.*?)(?=\/)
var string = 'http://store.apple.com/vieworder/1003123464/test#test.com http://store.apple.com/vieworder/W411234368/test#test.com/AOS-A=M-104121';
var myRegEx = /vieworder\/(.*?)(?=\/)/g;
var index = 1;
var matches = [];
var match;
while (match = myRegEx.exec(string)) {
matches.push(match[index]);
}
console.log(matches);
Use replace instead of match, since older JavaScript engines don't support lookbehinds (they were only added in ES2018). You could also use capturing groups and the exec method to extract the characters inside a particular group.
> var s1 = 'http://store.apple.com/vieworder/1003123464/test#test.com'
undefined
> var s2 = 'http://store.apple.com/vieworder/W411234368/test#test.com/AOS-A='
undefined
> s1.replace(/^.*?vieworder\/|\/.*/g, '')
'1003123464'
> s2.replace(/^.*?vieworder\/|\/.*/g, '')
'W411234368'
OR
> s1.replace(/^.*?\bvieworder\/([^\/]*)\/.*/g, '$1')
'1003123464'
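In engines that do support lookbehind (ES2018 and later), the "one command" the question asks for can be sketched as follows. This is an illustration under that assumption, not part of the answer above:

```javascript
var orderUrls = 'http://store.apple.com/vieworder/1003123464/test#test.com ' +
  'http://store.apple.com/vieworder/W411234368/test#test.com/AOS-A=M-104121';

// (?<=vieworder\/) asserts that "vieworder/" precedes the match without including it
var orderNumbers = orderUrls.match(/(?<=vieworder\/)[^\/]+/g);
console.log(orderNumbers); // ["1003123464", "W411234368"]
```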
I'd suggest
W?\d+
That ought to translate to "one or zero W and one or more digits".
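On its own, W?\d+ matches the first run of digits anywhere in the URL, so a safer sketch anchors it to the vieworder/ prefix and reads the capture group. The variable names below are illustrative assumptions:

```javascript
var sampleUrls = [
  'http://store.apple.com/vieworder/1003123464/test#test.com',
  'http://store.apple.com/vieworder/W411234368/test#test.com/AOS-A=M-104121'
];
var orderPattern = /vieworder\/(W?\d+)/; // group 1: optional W followed by digits

var orderIds = sampleUrls.map(function (url) {
  var m = orderPattern.exec(url);
  return m ? m[1] : null; // null when the URL has no order number
});
console.log(orderIds); // ["1003123464", "W411234368"]
```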

Regular Expression to find complex markers

I want to use JavaScript's regular expression something like this
/marker\d+"?(\w+)"?\s/gi
In a string like this:
IDoHaveMarker1"apple" IDoAlsoHaveAMarker352pear LastPointMakingmarker3134"foo"
And I want it to return an array like this:
[ "apple", "pear", "foo" ]
The quotes are to make clear they are strings. They shouldn't be in the result.
If you are asking about how to actually use the regex:
To get all captures of multiple (global) matches you have to use a loop and exec in JavaScript:
var regex = /marker\d+"?(\w+)/gi;
var result = [];
var match;
while (match = regex.exec(input)) {
result.push(match[1]);
}
(Note that you can omit the trailing "?\s? if you are only interested in the capture, since they are optional anyway, so they don't affect the matched result.)
And no, g will not allow you to do all of that in one call. If you had omitted g then exec would return the same match every time.
As Blender mentioned, if you want to rule out things like Marker13"something Marker14bar (unmatched ") you need to use another capturing group and a backreference. Note that this will push your desired capture to index 2:
var regex = /marker\d+("?)(\w+)\1/gi;
var result = [];
var match;
while (match = regex.exec(input)) {
result.push(match[2]);
}
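For reference, the same extraction can be written without an explicit loop where String.prototype.matchAll is available (ES2020). This sketch runs the backreference version against the sample string from the question; markerInput is an illustrative name:

```javascript
var markerInput = 'IDoHaveMarker1"apple" IDoAlsoHaveAMarker352pear LastPointMakingmarker3134"foo"';
var markerRegex = /marker\d+("?)(\w+)\1/gi; // group 1: optional quote, group 2: the word

// matchAll yields one match object per hit; keep only capture group 2
var words = Array.from(markerInput.matchAll(markerRegex), function (m) {
  return m[2];
});
console.log(words); // ["apple", "pear", "foo"]
```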

how do I capture something after something else? like a referer=someString

I have ref=Apple
and my current regex is
var regex = /ref=(.+)/;
var ref = regex.exec(window.location.href);
alert(ref[0]);
but that includes the ref=
now, I also want to stop capturing characters if a & is at the end of the ref param. cause ref may not always be the last param in the url.
You'll want to split the url parameters, rather than using a regular expression.
Something like:
var get = window.location.href.split('?')[1] || ''; // empty string when there is no query part
var params = get.split('&');
for (var p = 0; p < params.length; p++) {
var pair = params[p].split('=');
if (pair[0] == 'ref') {
alert('ref is ' + pair[1]);
}
}
Use ref[1] instead.
This accesses what is captured by group 1 in your pattern.
Note that there's almost certainly a better way to do key/value parsing in Javascript than regex.
References
regular-expressions.info/Brackets for Capturing
You are using the ref wrong, you should use ref[1] for the (.+), ref[0] is the whole match.
If & is at the end, modify the regexp to /ref=([^&]+)/, to exclude &s.
Also, make sure you URL-decode the match (decodeURIComponent in JavaScript; the older unescape is deprecated).
Capture only word characters and numbers:
var regex = /ref=(\w+)/;
var ref = regex.exec(window.location.href);
alert(ref[1]);
Capture word characters, numbers, - and _:
var regex = /ref=([\w_\-]+)/;
var ref = regex.exec(window.location.href);
alert(ref[1]);
More information about Regular Expressions (the basics)
try this regex pattern ref=(.*?)&
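This pattern will match anything after ref= and stop before the next '&'. Note that it requires an '&' to follow, so it will not match when ref is the last parameter.
To get the value of ref, use the following code: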
var regex = /ref=(.*?)&/;
var ref = regex.exec(window.location.href);
alert(ref[1]);
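As an alternative to both regex and manual splitting, the standard URL and URLSearchParams APIs parse and decode query parameters directly. A minimal sketch, where the example.com address is a made-up stand-in for window.location.href:

```javascript
// Hypothetical URL standing in for window.location.href
var pageHref = 'http://example.com/page?ref=Apple&lang=en';

// searchParams.get returns the decoded value, or null if the parameter is absent
var refValue = new URL(pageHref).searchParams.get('ref');
console.log(refValue); // "Apple"

var missingValue = new URL('http://example.com/page').searchParams.get('ref');
console.log(missingValue); // null
```

This works regardless of where ref appears in the query string, so no special handling of a trailing & is needed.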

How to extract a string using JavaScript Regex?

I'm trying to extract a substring from a file with JavaScript Regex. Here is a slice from the file :
DATE:20091201T220000
SUMMARY:Dad's birthday
the field I want to extract is "Summary". Here is the approach:
extractSummary : function(iCalContent) {
/*
input : iCal file content
return : Event summary
*/
var arr = iCalContent.match(/^SUMMARY\:(.)*$/g);
return(arr);
}
function extractSummary(iCalContent) {
var rx = /\nSUMMARY:(.*)\n/g;
var arr = rx.exec(iCalContent);
return arr[1];
}
You need these changes:
Put the * inside the parentheses, as suggested above. Otherwise your matching group will contain only one character.
Get rid of the ^ and $. Without the m (multiline) flag they match only at the start and end of the full string, rather than at the start and end of each line. Match on explicit newlines instead.
I suppose you want the matching group (what's inside the parentheses) rather than the full array? arr[0] is the full match ("\nSUMMARY:...") and the following indexes contain the group matches.
With the g flag, String.match(regexp) returns only the full matches, not the groups (this is what I see in Safari on Mac), but RegExp.exec(string) returns the groups as well.
You need to use the m flag:
multiline; treat beginning and end characters (^ and $) as working
over multiple lines (i.e., match the beginning or end of each line
(delimited by \n or \r), not only the very beginning or end of the
whole input string)
Also put the * in the right place:
"DATE:20091201T220000\r\nSUMMARY:Dad's birthday".match(/^SUMMARY\:(.*)$/gm);
// Two changes: the * is now inside the group, (.*), and the m flag is added at the end
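Putting the m flag and the corrected group together, the function from the question could be sketched like this. It uses exec without the g flag so that the capture group is available; returning null when no SUMMARY line exists is an assumption, not something the question specifies:

```javascript
function extractSummary(iCalContent) {
  // m flag: ^ and $ match at line boundaries; (.*) captures the rest of the line
  var found = /^SUMMARY:(.*)$/m.exec(iCalContent);
  return found ? found[1] : null; // null when there is no SUMMARY line
}

var summary = extractSummary("DATE:20091201T220000\r\nSUMMARY:Dad's birthday\r\nEND:VEVENT");
console.log(summary); // "Dad's birthday"
```

Since . does not match \r or \n in JavaScript, the captured text stops before the line ending even with Windows-style \r\n.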
Your regular expression most likely wants to be
/\nSUMMARY:(.*)$/g
A helpful little trick I like to use is to default assign on match with an array.
var arr = iCalContent.match(/\nSUMMARY:(.*)$/g) || [""]; //could also use null for empty value
return arr[0];
This way you don't get annoying type errors when you go to use arr
This code works:
let str = "governance[string_i_want]";
let res = str.match(/[^governance\[](.*)[^\]]/g);
console.log(res);
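res will contain "string_i_want". However, in this example res is still an array, so do not treat res like a string.
Note that [^governance\[] is a negated character class, not a group: it matches one character that is none of those letters and not [. The pattern works for this input only because the wanted text starts with s, which is not in that class, so treat it as specific to this example.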
You can try it out here: https://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_match_regexp
Good luck.
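A more general way to get the text between the brackets is a capturing group, which does not depend on which letters the text starts with. A minimal sketch with illustrative variable names:

```javascript
var bracketed = 'governance[string_i_want]';

// \[ and \] match the literal brackets; ([^\]]*) captures everything in between
var bracketMatch = /\[([^\]]*)\]/.exec(bracketed);
var inner = bracketMatch ? bracketMatch[1] : null;
console.log(inner); // "string_i_want"
```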
(.*) instead of (.)* would be a start. The latter will only capture the last character on the line.
Also, no need to escape the :.
You should use this :
var arr = iCalContent.match(/^SUMMARY\:(.)*$/g);
return(arr[0]);
this is how you can parse iCal files with javascript
function calParse(str) {
function parse() {
var obj = {};
while(str.length) {
var p = str.shift().split(":");
var k = p.shift(); p = p.join(":"); // rejoin with ":" so values that contain colons are preserved
switch(k) {
case "BEGIN":
obj[p] = parse();
break;
case "END":
return obj;
default:
obj[k] = p;
}
}
return obj;
}
str = str.replace(/\n /g, " ").split("\n");
return parse().VCALENDAR;
}
example =
'BEGIN:VCALENDAR\n'+
'VERSION:2.0\n'+
'PRODID:-//hacksw/handcal//NONSGML v1.0//EN\n'+
'BEGIN:VEVENT\n'+
'DTSTART:19970714T170000Z\n'+
'DTEND:19970715T035959Z\n'+
'SUMMARY:Bastille Day Party\n'+
'END:VEVENT\n'+
'END:VCALENDAR\n'
cal = calParse(example);
alert(cal.VEVENT.SUMMARY);
