Regex does not match the content but the whole searched string - JavaScript

I'm using this regex to match an "href" attribute in an <a> tag:
var href_matches = postRep.match(/href="(.*?)"/g);
The regex finds the href correctly, except that it returns the whole 'href="example.com"' string.
How do I get only the href value (e.g. "example.com")?

You can either run exec() on the regex:
var url_match = /href="(.*?)"/g.exec(postRep);
or remove the global flag:
var url_match = postRep.match(/href="(.*?)"/);
String's match() function does not return capture groups when the global flag is set; it returns only the full matched substrings. In both variants above, url_match[1] holds the href value of the first match.
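If every href value is needed rather than just the first, one option on newer engines is String.prototype.matchAll (ES2020), which requires the global flag yet keeps the capture groups of each match. A minimal sketch, assuming such an engine; the helper name hrefValues and the sample markup are made up for illustration:

```javascript
// Hypothetical helper (not from the answer above): collect capture group 1
// of every match. matchAll needs the g flag and yields full match arrays.
function hrefValues(postRep) {
  return Array.from(postRep.matchAll(/href="(.*?)"/g), function (m) {
    return m[1];
  });
}

// Made-up sample input.
var sampleHtml = '<a href="http://example.com">one</a> <a href="/about">two</a>';

// match() with the g flag: full matches only, capture groups are dropped.
console.log(sampleHtml.match(/href="(.*?)"/g));
// matchAll(): the href values themselves.
console.log(hrefValues(sampleHtml));
```

The first log shows the behaviour the question ran into; the second shows only the captured values.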

Just another idea.
You can try something like this function:
function getHrefs(inputString) {
  var out = [];
  // replace() is used only to visit every match; its return value is discarded
  inputString.replace(/\bhref\b=['"]([^'"]+)['"]/gi, function(result, backreference) {
    out.push(backreference);
    return '';
  });
  return out;
}
Improved solution (much shorter):
function getHrefs(inputString) {
  // The i flag on the inner replace keeps it in step with the case-insensitive match
  return (inputString.match(/\bhref\b=['"][^'"]+(?=['"])/gi) || []).map(s => s.replace(/^href=["']/i, ""));
}
Edit:
Another option is exec, but with exec you need a loop to get all matches (if you need them all).
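The exec variant mentioned in the edit could look roughly like this. It is a sketch only: the function name getHrefsWithExec and the sample markup are assumptions, and the pattern is the one from the first function above.

```javascript
// Hypothetical exec-based variant of getHrefs.
function getHrefsWithExec(inputString) {
  var re = /\bhref\b=['"]([^'"]+)['"]/gi;
  var out = [];
  var m;
  // With the g flag, each exec() call resumes at re.lastIndex,
  // so the loop visits every match once and ends when exec() returns null.
  while ((m = re.exec(inputString)) !== null) {
    out.push(m[1]); // capture group 1 is the attribute value
  }
  return out;
}

// Made-up sample input with mixed case and mixed quote styles.
console.log(getHrefsWithExec('<a href="a.html">A</a><a HREF=\'b.html\'>B</a>'));
```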

You can use a regex lookbehind to check that "href=" is there without including it in the match.
For example, the regex (?<=href=)example\.com applied to href=example.com matches only example.com.
EDIT:
This method only works in engines that support regex lookbehinds. JavaScript gained them only in ECMAScript 2018, so older engines reject the pattern. (Thanks to Georgi Naumov for pointing this out.)
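On engines that implement ECMAScript 2018, lookbehind assertions are available in JavaScript too. A minimal sketch under that assumption; the function name and the sample input are hypothetical:

```javascript
// Assumes an engine with ES2018 lookbehind support; older engines throw a
// SyntaxError when parsing this pattern. The function name is hypothetical.
function hrefValuesWithLookbehind(html) {
  // (?<=href=") asserts the prefix without making it part of the match.
  return html.match(/(?<=href=")[^"]*/g) || [];
}

console.log(hrefValuesWithLookbehind('<a href="example.com">link</a>'));
```

Because the prefix is only asserted, match() with the g flag returns the bare values directly.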

Related

Match only # and not ## without negative lookbehind

Using JavaScript, I need a regex that matches any instance of #{this-format} in any string. My original regex was the following:
#{[a-z-]*}
However, I also need a way to "escape" those instances. I want it so that if you add an extra #, the match gets escaped, like ##{this}.
I originally used a negative lookbehind:
(?<!#)#{[a-z-]*}
And that would work just fine, except... lookbehinds are an ECMAScript2018 feature, only supported by Chrome.
I read some people suggesting the usage of a negated character set. So my little regex became this:
(?:^|[^#])#{[a-z-]*}
...which would have worked just as well, except it doesn't work if you put two of these together: #{foo}#{bar}
So, does anyone know how I can achieve this? Remember that these conditions need to be met:
Find #{this} anywhere in a string
Be able to escape like ##{this}
Be able to put multiple adjacent, like #{these}#{two}
Lookbehinds must not be used
If you include ## in your regex pattern as an alternate match option, it will consume the ## instead of allowing a match on the subsequent bracketed entity. Like this:
##|(#{[a-z-]*})
You can then check the capture group of each match in JavaScript, as the following code demonstrates:
var targetText = '#{foo} in a #{bar} for a ##{foo} and #{foo}#{bar} things.'
var reg = /##|(#{[a-z-]*})/g;
var result;
while((result = reg.exec(targetText)) !== null) {
if (result[1] !== undefined) {
alert(result[1]);
}
}
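To check the alternation trick against the conditions listed in the question, the same loop can collect the matches into an array instead of alerting them. A sketch only; the wrapper name findPlaceholders is made up:

```javascript
// Hypothetical wrapper around the ##|(#{[a-z-]*}) pattern from the answer.
function findPlaceholders(text) {
  var reg = /##|(#{[a-z-]*})/g;
  var found = [];
  var result;
  while ((result = reg.exec(text)) !== null) {
    // Group 1 is undefined when the ## alternative matched (an escape).
    if (result[1] !== undefined) {
      found.push(result[1]);
    }
  }
  return found;
}

console.log(findPlaceholders('a #{this} b'));    // placeholder anywhere in a string
console.log(findPlaceholders('##{this}'));       // escaped placeholder
console.log(findPlaceholders('#{these}#{two}')); // adjacent placeholders
```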
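You could use (?:^|[^#]) to match the start of the string or a non-# character, and capture the following #{<sometext>} in a group. Wrapping this in a lookahead, followed by a single . that consumes one character, keeps the closing } of one match available as the [^#] before the next, so adjacent placeholders are found too. Since the preceding character must not be part of the result, you'll have to iterate over the matches manually and extract the group that contains the substring you want. For example: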
function test(str) {
const re = /(?=(?:^|[^#])(#{[a-z-]*}))./g;
let match;
const matches = [];
while (match = re.exec(str)) {
matches.push(match[1]); // extract the captured group
}
return matches;
}
console.log(test('##{this}'))
console.log(test('#{these}#{two}'))

JavaScript: Can I use the filter function with regular expressions?

I tried to find a similar question to avoid creating a duplicate and I couldn’t, but I apologise if I missed any. I've just started learning how to code and I've encountered this problem:
With JavaScript, I want to use the array filter method (https://www.freecodecamp.org/challenges/filter-arrays-with-filter) with a regular expression for all non-alphanumeric characters.
For example:
var newArray = oldArray.filter(function(val) {
return val !== /[\W_]/g;
});
Can I do that? In the Mozilla guide (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions) it mentions that you can use regular expressions with replace, and I understand how to do that, but it doesn't mention filter at all.
To give a less abstract example, this is the code I'm working on:
function palindrome(str) {
var splitStr = str.split("");
var filterArray = splitStr.filter(function(val) {
return val !== /[\W_]/g;
});
return filterArray;
}
palindrome("ey*e");
If I’m doing things right so far, the function should return [“e”, “y”, “e”]. But it returns [“e”, “y”, “*”, “e”] (as if I hadn’t filtered it at all). I just wonder if I’ve made a mistake in my code, or if one simply can’t use filter with regular expressions.
If that's the case, why? Why can't one use filter with regular expressions!? Why do we have to use replace instead?
This really isn't an issue relating to .filter(), it's just that you aren't testing your string against your regular expression properly.
To test a string against a regular expression you use the .test() method:
function palindrome(str) {
var splitStr = str.split("");
var filterArray = splitStr.filter(function(val) {
// Test the character against the regular expression
// and keep it only if there is no match (note the leading !).
// The g flag is not needed for a single test() call.
return !/[\W_]/.test(val);
});
return filterArray;
}
console.log(palindrome("ey*e"));
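Why the original comparison never filtered anything can be seen in isolation: val !== /[\W_]/g compares a string with a RegExp object, and those are never strictly equal, so the callback returns true for every character. A small sketch with made-up variable names:

```javascript
var nonAlnum = /[\W_]/;

// Strict inequality between a string and a RegExp object is always true,
// whatever the string contains.
var comparedStar = ("*" !== nonAlnum);
var comparedE = ("e" !== nonAlnum);

// test() is what actually runs the pattern against the string.
var testedStar = nonAlnum.test("*");
var testedE = nonAlnum.test("e");

console.log(comparedStar, comparedE); // true true
console.log(testedStar, testedE);     // true false
```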
Instead of first splitting the string into chars, and then test every single one of them, why don't you just get all matches for the string?
function palindrome(str) {
return str.match(/[a-zA-Z0-9]/g) || [];
}
let chars = palindrome("ey*e");
console.log(chars);
About the regex used: \W is the same as [^\w] or [^a-zA-Z0-9_]. So the negation of [\W_] is equivalent to [a-zA-Z0-9].
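The equivalence of the two character classes can be checked by brute force. A sketch that compares them over the ASCII range only; the function name is hypothetical:

```javascript
// Hypothetical check: "not [\W_]" and "[a-zA-Z0-9]" should agree on every
// ASCII character (codes 0 to 127).
function classesAgreeOnAscii() {
  for (var code = 0; code < 128; code++) {
    var ch = String.fromCharCode(code);
    var keptByNegation = !/[\W_]/.test(ch);
    var keptByPositive = /[a-zA-Z0-9]/.test(ch);
    if (keptByNegation !== keptByPositive) {
      return false;
    }
  }
  return true;
}

console.log(classesAgreeOnAscii());
```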

JavaScript exec maintaining state

I am currently trying to build a little templating engine in JavaScript that finds and replaces tags in HTML5 markup with a regex.
I am using exec on my regular expression and looping over the results. Why does the regular expression break in its current form, with the /g flag, but work fine without it?
Check the broken example below, then remove the /g flag from the regular expression to see the correct output.
var TemplateEngine = function(tpl, data) {
var re = /(?:<|<)%(.*?)(?:%>|>)/g, match;
while(match = re.exec(tpl)) {
tpl = tpl.replace(match[0], data[match[1]])
}
return tpl;
}
https://jsfiddle.net/stephanv/u5d9en7n/
Can somebody explain in a bit more depth why my example breaks exactly on:
<p><%more%></p>
The reason is explained in javascript string exec strange behavior.
The solution you need is actually a String.replace with a callback as a replacement:
var TemplateEngine = function(tpl, data) {
var re = /(?:<|<)%(.*?)(?:%>|>)/g, match;
return tpl.replace(re, function($0, $1) {
return data[$1] ? data[$1] : $0;
});
}
Here, the regex finds all non-overlapping matches in the string, sequentially, and passes the match to the callback method. $0 is the full match and $1 is the Group 1 contents. If data[$1] exists, it is used to replace the whole match, else, the whole match is inserted back.
Check this link https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastIndex. When using the g flag the object that you store the regex in (re) will keep track of the position of the last match in the lastIndex property and the next time you use that object the search will start from the position of lastIndex.
To solve this you could either manually reset the lastIndex property each time or not save the regex in an object and use it inline like so:
while(match = /(?:<|<)%(.*?)(?:%>|>)/g.exec(tpl)) {
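The lastIndex behaviour described above can be observed on a tiny example. This is a sketch with made-up strings and variable names, not the template engine itself: the same global regex is used on a long string and then on a shorter one, mimicking what happens when tpl changes between exec() calls.

```javascript
var stateful = /a/g;                 // one regex object, reused below

var first = stateful.exec("aXa");    // matches at index 0
var indexAfterFirst = stateful.lastIndex;

// The next search starts at lastIndex, even though the input is a
// different string now, so the "a" at index 0 is skipped.
var second = stateful.exec("a");
var indexAfterSecond = stateful.lastIndex; // a failed match resets it

var third = stateful.exec("a");      // now the search starts at 0 again

console.log(first.index, indexAfterFirst);  // 0 1
console.log(second, indexAfterSecond);      // null 0
console.log(third.index);                   // 0
```

In the question's loop, tpl is modified after each match while lastIndex still refers to a position in the old string, which is how matches can be skipped.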

JS regex match (doesn't begin with [A-z]+://)www

I have some links to which people did not add the protocol, e.g. www.stackoverflow.com. If a link begins with www., I want to replace that with 'http://www.'.
How can I do this with JavaScript regular expressions?
I tried the code below, but I can't seem to match the pattern 'doesn't start with [A-z]+://www.'.
The links are mixed in with text.
jQuery(document).ready(function () {
jQuery('.myClass').each(function (index) {
var temp = wwwify(jQuery(this).text());
jQuery(this).html(temp);
});
});
function wwwify(text) {
var regex = /(?!\b([A-z]+:\/\/))www\./igm;
return text.replace(regex, 'http://www.');
}
Why not just use the following?
if (text.substring(0,4)=='www.') {
text = 'http://'+text;
}
You could simply replace each "http://www." with "www." and then replace every "www." with "http://www.". It might not be the prettiest regexp you could imagine, but it will solve your problem.
$(document).ready(function () {
$('.myClass').each(function (index) {
var $elm = $(this); // cache $(this) for reuse
var html = $elm.html();
html = html.replace(/http:\/\/www\./ig, "www.").replace(/www\./ig, "http://www.");
$elm.html(html);
});
});
You need to anchor your regex to the start of the string. Also, the range needs to be /[a-z]/, as the /i modifier will cover the upper-case possibilities. The /m and /g modifiers are irrelevant here. That leaves:
var regex = /^(?![a-z]+:\/\/)www\./i;
My apologies, I missed the part saying "The links are mixed in with text". Without look-behind this can only be done using a function to return a replacement string. I suggest this, which captures any protocol before the www. and prepends http:// if none is present:
var regex = /\b([a-z]+:\/\/)?www\./ig;
// replace() returns a new string, so the result has to be assigned
text = text.replace(regex, function(url, protocol) {
  return protocol ? url : "http://" + url;
});
Since I haven't found any suitable regex solutions through SO or elsewhere, just using regular JavaScript replace may be the best solution.
For now I'm making two passes through the text:
function wwwLineBeginsWith(text) {
  // www. at the start of a line; the dot is escaped so it matches literally
  var regex = /^www\./gim;
  return text.replace(regex, 'http://www.');
}
function wwwWordBeginsWith(text) {
  // www. after whitespace; the whitespace is captured and put back
  var regex = /(\s)www\./gi;
  return text.replace(regex, '$1http://www.');
}
var test1 = 'www.test2.com';
test1 = wwwLineBeginsWith(test1);
test1 = wwwWordBeginsWith(test1);
console.log(test1);
How about replacing those with a protocol regardless?
function wwwify(text) {
return text.replace(/(http(s?):\/\/)?www\./ig, 'http$2://www.');
}
The reason it's currently not working is that JavaScript regular expressions before ES2018 don't support lookbehinds, only lookaheads. You would need the (?<! syntax, which is not available in those engines.
If you absolutely must use RegExp to determine this, I would recommend using something like /^[^Hh][^Tt]{2}[^Pp]:\/\// for the RegExp. Otherwise, I agree with the other posters... using indexOf would be far easier (i.e., url.toLowerCase().indexOf('http://') !== 0).
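Where an ES2018 engine can be assumed, the negative lookbehind is available and the replacement fits in one expression. A minimal sketch under that assumption; the function name and the sample text are made up:

```javascript
// Assumes ES2018 lookbehind support; older engines reject this pattern.
// The function name is hypothetical.
function wwwifyWithLookbehind(text) {
  // (?<![a-z]+:\/\/) rejects a www. that directly follows a protocol;
  // \b keeps the match from starting in the middle of a word.
  return text.replace(/(?<![a-z]+:\/\/)\bwww\./gi, "http://www.");
}

console.log(wwwifyWithLookbehind("see www.a.com and https://www.b.com"));
// "see http://www.a.com and https://www.b.com"
```

Links that already carry any protocol are left untouched, and links mixed into running text are handled because the pattern is not anchored to the start of the string.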

JavaScript match substring after RegExp

I have a string that looks something like
something30-mr200
I would like to get everything after the mr (basically the number that follows mr). The -mr is always going to be there.
Any help would be appreciated.
You can use a regexp like the one Bart gave you, but I suggest using match rather than replace: when no match is found, replace returns the entire string, whereas match returns null, which seems more logical (as a general thought).
Something like this would do the trick:
function getNumber(string) {
var matches = string.match(/-mr([0-9]+)/);
// match() returns null when nothing matches, so guard before indexing
return matches ? matches[1] : null;
}
console.log(getNumber("something30-mr200"));
var result = "something30-mr200".split("mr")[1];
or
var result = "something30-mr200".match(/mr(.*)/)[1];
Why not simply:
-mr(\d+)
Then getting the contents of the capture group?
What about:
function getNumber(input) { // rename with a meaningful name
var match = input.match(/^.*-mr(\d+)$/);
if (match) { // check if the input string matched the pattern
return match[1]; // get the capturing group
}
}
getNumber("something30-mr200"); // "200"
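This may work for you:
// Perform the reg exp test (the backslash must be doubled inside a string literal,
// otherwise "\d" collapses to a plain "d")
new RegExp(".*-mr(\\d+)").test("something30-mr200");
// RegExp.$1 holds the first capture group of the last successful match
// (a legacy static property; the array returned by match() or exec() is preferable)
var result = RegExp.$1;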
What about finding the position of -mr, then getting the substring from there + 3?
It's not regex, but seems to work given your description?
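A sketch of that non-regex approach, assuming the marker is always the literal -mr; the function name valueAfterMr is made up:

```javascript
// Hypothetical helper: return everything after the first "-mr",
// or null when the marker is missing.
function valueAfterMr(input) {
  var marker = "-mr";
  var pos = input.indexOf(marker);
  if (pos === -1) {
    return null;
  }
  // pos + marker.length is the "+ 3" from the description above
  return input.substring(pos + marker.length);
}

console.log(valueAfterMr("something30-mr200")); // "200"
```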
