Iterate through regular expression array in Javascript - javascript

How can I use an array of Regex expressions and iterate that array with 'exec' operation. I did initialize an array with various regular expressions like this:
var arrRegex = new Array(/(http:\/\/(?:.*)\/)/g, /(http:\/\/(?:.*)\/)/g);
Now I created a for loop that does this:
for(i=0;i<arrRegex.length;i++){
arrRegex[i].exec(somestring);
}
The thing is that this doesn't seems to work. I don't want to use it hardcoded like this:
(/(http:\/\/(?:.*)\/)/g).exec(somestring);
When using the array option, the '.exec' function returns null. When I use the hardcoded option it returns the matches as I wanted.

The exec() returns the match so you should be able to capture it.
somestring = 'http://stackoverflow.com/questions/11491489/iterate-through-regular-expression-array-in-javascript';
var arrRegex = new Array(/(http:\/\/(?:.*)\/)/g, /(http:\/\/(?:.*)\/)/g);
for (i = 0; i < arrRegex.length; i++) {
match = arrRegex[i].exec(somestring);
}
match is an array, with the following structure:
{
[0] = 'string matched by the regex'
[1] = 'match from the first capturing group'
[2] = 'match from the second capturing group'
... and so on
}
Take a look at this jsFiddle http://jsfiddle.net/HHKs2/1/
You can also use test() instead of exec() as a shorthand for exec() != null. test() will return a boolean variable depending on whether the regex matches part of the string or not.

What you probably want to do is to capture the first group:
for(i=0;i<arrRegex.length;i++){
var someotherstring = arrRegex[i].exec(somestring)[1];
// do something with it ...
}
BTW: That is my guess, not sure what you are trying to do. But if you are trying to get the host name of a URL you should use /(http:\/\/(?:.?)\/)/g. The question mark after .* makes the previous quantifier (*) ungreedy.

Related

How can I cut the string after a second underscore?

I'm receiving a list of files in an object and I just need to display a file name and its type in a table.
All files come back from a server in such format: timestamp_id_filename.
Example: 1568223848_12345678_some_document.pdf
I wrote a helper function which cuts the string.
At first, I did it with String.prototype.split() method, I used regex, but then again - there was a problem. Files can have underscores in their names so that didn't work, so I needed something else. I couldn't come up with a better idea. I think it looks really dumb and it's been haunting me the whole day.
The function looks like this:
const shortenString = (attachmentName) => {
const file = attachmentName
.slice(attachmentName.indexOf('_') + 1)
.slice(attachmentName.slice(attachmentName.indexOf('_') + 1).indexOf('_') + 1);
const fileName = file.slice(0, file.lastIndexOf('.'));
const fileType = file.slice(file.lastIndexOf('.'));
return [fileName, fileType];
};
I wonder if there is a more elegant way to solve the problem without using loops.
You can use replace and split, with the pattern we are replacing the string upto the second _ from start of string and than we split on . to get name and type
let nameAndType = (str) => {
let replaced = str.replace(/^(?:[^_]*_){2}/g, '')
let splited = replaced.split('.')
let type = splited.pop()
let name = splited.join('.')
return {name,type}
}
console.log(nameAndType("1568223848_12345678_some_document.pdf"))
console.log(nameAndType("1568223848_12345678_some_document.xyz.pdf"))
function splitString(val){
return val.split('_').slice('2').join('_');
}
const getShortString = (str) => str.replace(/^(?:[^_]*_){2}/g, '')
For input like
1568223848_12345678_some_document.pdf, it should give you something like some_document.pdf
const re = /(.*?)_(.*?)_(.*)/;
const name = "1568223848_12345678_some_document.pdf";
[,date, id, filename] = re.exec(name);
console.log(date);
console.log(id);
console.log(filename);
some notes:
you want to make the regular expression 1 time. If you do this
function getParts(str) {
const re = /expression/;
...
}
Then you're making a new regular expression object every time you call getParts.
.*? is faster than .*
This is because .* is greedy so the moment the regular expression engine sees that it puts the entire rest of the string into that slot and then checks if can continue the expression. If it fails it backs off one character. If that fails it backs off another character, etc.... .*? on the other hand is satisfied as soon as possible. So it adds one character then sees if the next part of the expression works, if not it adds one more character and sees if the expressions works, etc..
splitting on '_' works but it could potentially make many temporary strings
for example if the filename is 1234_1343_a________________________.pdf
you'd have to test to see if using a regular experssion is faster or slower than splitting, assuming speed matters.
You can kinda chain .indexOf to get second offset and any further, although more than two would look ugly. The reason is that indexOf takes start index as second argument, so passing index of the first occurrence will help you find the second one:
var secondUnderscoreIndex = name.indexOf("_",name.indexOf("_")+1);
So my solution would be:
var index = name.indexOf("_",name.indexOf("_")+1));
var [timestamp, name] = [name.substring(0, index), name.substr(index+1)];
Alternatively, using regular expression:
var [,number1, number2, filename, extension] = /([0-9]+)_([0-9]+)_(.*?)\.([0-9a-z]+)/i.exec(name)
// Prints: "1568223848 12345678 some_document pdf"
console.log(number1, number2, filename, extension);
I like simplicity...
If you ever need the date in times, theyre in [1] and [2]
var getFilename = function(str) {
return str.match(/(\d+)_(\d+)_(.*)/)[3];
}
var f = getFilename("1568223848_12345678_some_document.pdf");
console.log(f)
If ever files names come in this format timestamp_id_filename. You can use a regular expression that skip the first two '_' and save the nex one.
test:
var filename = '1568223848_12345678_some_document.pdf';
console.log(filename.match(/[^_]+_[^_]+_(.*)/)[1]); // result: 'some_document.pdf'
Explanation:
/[^]+[^]+(.*)/
[^]+ : take characters diferents of ''
: take '' character
Repeat so two '_' are skiped
(.*): Save characters in a group
match method: Return array, his first element is capture that match expression, next elements are saved groups.
Split the file name string into an array on underscores.
Discard the first two elements of the array.
Join the rest of the array with underscores.
Now you have your file name.

Convert string to function with parameters

I am struggling with writing a regular expression to turn the string
"fetchSomething('param1','param2','param3')"
into the proper function call. I can do it with some splitting and substrings but would rather do it with a .match using capture groups for efficiency's sake (and my own education).
However when I use
'something("stuff","moreStuff","yetMoreStuff")'.match(/(?:\(|,)("?\w+"?)/g)
I get
["("stuff"", ","moreStuff"", ","yetMoreStuff""]
Which is the same result regardless of the ?:, this confuses me since I thought ?: would cause it to ignore the first capture group? Or am I completely miss understanding capture groups?
You get the whole string when you have the g flag active. If you're going only after the sub-matches, then you will need to use .exec and a loop:
var regex = /(?:\(|,)("?\w+"?)/g;
var s = 'something("stuff","moreStuff","yetMoreStuff")';
var match, matches=[];
while ( (match=regex.exec(s)) !== null ) {
matches.push(match[1]);
}
alert(matches);
jsfiddle

get the matched data into array in javascript without using any loop

Problem background -- I want to get the nth root of the number where user can enter expression like "the nth root of x'.I have written a function nthroot(x,n) which return proper expected output.My problem is to extract the value of x and n from expression.
I want to extract some matched pattern and store it to a an array for further processing so that in next step i will pop two elements from array and replace the result in repression.But I am unable to get all the values into an array without using loop.
A perl equivalent of my code will be like below.
$str = "the 2th root of 4+678+the 4th root of -10000x90";
#arr = $str =~ /the ([-+]?\d+)th\s?root\s?of\s?([-+]?\d+)/g;
print "#arr";
I want the javascript equivalent of the above code.
or
any one line expression like below.
expr = expr.replace(/the\s?([+-]\d+)th\s?root\s?of([+-]\d+)/g,nthroot(\\$2,\\$1));
Please help me for the same.
The .replace() method that you are currently using is, as its name implies, used to do a string replacement, not to return the individual matches. It would make more sense to use the .match() method instead, but you can (mis)use .replace() if you use a callback function:
var result = expr.replace(/the\s?([+-]\d+)th\s?root\s?of([+-]\d+)/,function(m,m1,m2){
return nthroot(+m2, +m1);
});
Note that the arguments in the callback will be strings, so I'm converting them to numbers with the unary plus operator when passing them to your nthroot() function.
var regex=/the ([-+]?\d+)th\s?root\s?of\s?([-+]?\d+)/g;
expr=expr.replace(regex, replaceCallback);
var replaceCallback = function(match,p1,p2,offset, s) {
return nthroot(p2,p1);
//p1 and p2 are similar to $1 $2
}

javascript regexp can't find way to access grouped results

re = //?(\w+)/(\w+)/
s = '/projects/new'
s.match(re)
I have this regular expression which I will use to sieve out the branch name, e.g., projects, and the 'tip' name, e.g., new
I read that one can have access to the grouped results with $1, $2, and so on, but I can't seem to get it to work, at least in Firebug console
When I run the above code, then run
RegExp.$1
it shows
""
Same goes on for $2.
any ideas?
Thanks!
Without the g flag, str.match(regexp) returns the same as regexp.exec(str). And that is:
The returned array has the matched text as the first item, and then one item for each capturing parenthesis that matched containing the text that was captured. If the match fails, the exec method returns null.
So you can do this:
var match = s.match(re);
match[1]
match gives you an array of the matched expressions:
> s.match(re)[1]
"projects"
> s.match(re)[2]
"new"
I you are accessing the array of matches the wrong way do something like:
re = /\/?(\w+)\/(\w+)/
s = '/projects/new'
var j = s.match(re)
alert(j[1]);
alert(j[2]);

Regex to extract substring, returning 2 results for some reason

I need to do a lot of regex things in javascript but am having some issues with the syntax and I can't seem to find a definitive resource on this.. for some reason when I do:
var tesst = "afskfsd33j"
var test = tesst.match(/a(.*)j/);
alert (test)
it shows
"afskfsd33j, fskfsd33"
I'm not sure why its giving this output of original and the matched string, I am wondering how I can get it to just give the match (essentially extracting the part I want from the original string)
Thanks for any advice
match returns an array.
The default string representation of an array in JavaScript is the elements of the array separated by commas. In this case the desired result is in the second element of the array:
var tesst = "afskfsd33j"
var test = tesst.match(/a(.*)j/);
alert (test[1]);
Each group defined by parenthesis () is captured during processing and each captured group content is pushed into result array in same order as groups within pattern starts. See more on http://www.regular-expressions.info/brackets.html and http://www.regular-expressions.info/refcapture.html (choose right language to see supported features)
var source = "afskfsd33j"
var result = source.match(/a(.*)j/);
result: ["afskfsd33j", "fskfsd33"]
The reason why you received this exact result is following:
First value in array is the first found string which confirms the entire pattern. So it should definitely start with "a" followed by any number of any characters and ends with first "j" char after starting "a".
Second value in array is captured group defined by parenthesis. In your case group contain entire pattern match without content defined outside parenthesis, so exactly "fskfsd33".
If you want to get rid of second value in array you may define pattern like this:
/a(?:.*)j/
where "?:" means that group of chars which match the content in parenthesis will not be part of resulting array.
Other options might be in this simple case to write pattern without any group because it is not necessary to use group at all:
/a.*j/
If you want to just check whether source text matches the pattern and does not care about which text it found than you may try:
var result = /a.*j/.test(source);
The result should return then only true|false values. For more info see http://www.javascriptkit.com/javatutors/re3.shtml
I think your problem is that the match method is returning an array. The 0th item in the array is the original string, the 1st thru nth items correspond to the 1st through nth matched parenthesised items. Your "alert()" call is showing the entire array.
Just get rid of the parenthesis and that will give you an array with one element and:
Change this line
var test = tesst.match(/a(.*)j/);
To this
var test = tesst.match(/a.*j/);
If you add parenthesis the match() function will find two match for you one for whole expression and one for the expression inside the parenthesis
Also according to developer.mozilla.org docs :
If you only want the first match found, you might want to use
RegExp.exec() instead.
You can use the below code:
RegExp(/a.*j/).exec("afskfsd33j")
I've just had the same problem.
You only get the text twice in your result if you include a match group (in brackets) and the 'g' (global) modifier.
The first item always is the first result, normally OK when using match(reg) on a short string, however when using a construct like:
while ((result = reg.exec(string)) !== null){
console.log(result);
}
the results are a little different.
Try the following code:
var regEx = new RegExp('([0-9]+ (cat|fish))','g'), sampleString="1 cat and 2 fish";
var result = sample_string.match(regEx);
console.log(JSON.stringify(result));
// ["1 cat","2 fish"]
var reg = new RegExp('[0-9]+ (cat|fish)','g'), sampleString="1 cat and 2 fish";
while ((result = reg.exec(sampleString)) !== null) {
console.dir(JSON.stringify(result))
};
// '["1 cat","cat"]'
// '["2 fish","fish"]'
var reg = new RegExp('([0-9]+ (cat|fish))','g'), sampleString="1 cat and 2 fish";
while ((result = reg.exec(sampleString)) !== null){
console.dir(JSON.stringify(result))
};
// '["1 cat","1 cat","cat"]'
// '["2 fish","2 fish","fish"]'
(tested on recent V8 - Chrome, Node.js)
The best answer is currently a comment which I can't upvote, so credit to #Mic.

Categories

Resources