Regex: check both sides of a match without including them in the matched string - JavaScript

I want to get a match by checking the expressions on both sides of the main match.
var str = "1234 word !!! 5678 another *** 000more)))"; // get word and another
console.log(str.match(/(?!\d+\s?)\w+(?=\s?\W+)/g))
>> (3) ["word", "another", "more"]
It checks both sides but does not include them in the main match.
But it does not work with HTML:
var str = ''; // get url, url2 and url3
console.log(str.match(/(?!href=")[^"]+?(?=")/g))
>> (6) ["<a href=", "url3"]
I tried a negative lookahead, (?!href="), and a positive lookahead, (?="), to match only the value of the attribute, but it returns other parts of the markup as well.
Is there any way to do this? Thanks.

What you could do for your example data is capture what is between the double quotes with href="([^"]+) in a capture group and loop through the results:
var str = '';
var pattern = /href="([^"]+)/g;
var match = pattern.exec(str);
while (match != null) {
console.log(match[1]);
match = pattern.exec(str);
}

In other regex flavors you could have used a positive lookbehind, (?<=href="), but JavaScript regex did not support lookbehinds until ES2018, so older engines reject them.
A reasonable solution is:
Match href=" as "ordinary" content, to be ignored.
Match the attribute value as a capturing group, (\w+), to be "consumed".
Set the boundary of the above group with a positive lookahead, (?="), just as you did.
So the whole regex can be:
href="(\w+)(?=")
and you read "your" value from group 1.
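As a minimal sketch of the capture-group approach, the snippet below runs it over a made-up HTML string. The sample markup and the helper name extractHrefs are assumptions, since the original string is not shown. It uses [^"]+ rather than \w+ so that values containing characters such as / or . are captured as well.

```javascript
// Hypothetical helper, not from the original answers.
function extractHrefs(html) {
  var pattern = /href="([^"]+)(?=")/g;
  var result = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    result.push(match[1]); // group 1 holds only the attribute value
  }
  return result;
}

// Assumed sample input, for illustration only.
var sampleHtml = '<a href="url">a</a> <a href="url2">b</a> <a href="url3">c</a>';
console.log(extractHrefs(sampleHtml)); // ["url", "url2", "url3"]
```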

You can't parse HTML with regex. Because HTML can't be parsed by regex.
Have you tried using the DOM parser that's right at your fingertips?
var str = '';
var div = document.createElement('div');
div.innerHTML = str; // parsing magic!
var links = Array.from(div.getElementsByTagName("a"));
var urls = links.map(function(a) {return a.href;});
// above returns fully-resolved absolute URLs.
// for the literal attribute value, try a.getAttribute("href")
console.log(urls);

Related

How to regex replace a query string with matching 2 words?

I have a url and I want to replace the query string. For example
www.test.com/is/images/383773?wid=200&hei=200
I want to match wid= and hei= (the numbers don't have to be 200) and replace the whole query string, so that it looks like this.
Expected
www.test.com/is/images/383773?#HT_dtImage
I tried the following, but it only replaced the matching wid and hei.
const url = "www.test.com/is/images/383773?wid=200&hei=200"
url.replace(/(wid)(hei)_[^\&]+/, "#HT_dtImage")
You can match either wid= or hei= until the next optional ampersand and then remove those matches, and then append #HT_dtImage to the result.
\b(?:wid|hei)=[^&]*&?
The pattern matches:
\b A word boundary to prevent a partial word match
(?:wid|hei)= Non capture group, match either wid or hei followed by =
[^&]*&? Match 0+ times a char other than &, and then match an optional &
See a regex demo.
let url = "www.test.com/is/images/383773?wid=200&hei=200"
url = url.replace(/\b(?:wid|hei)=[^&]*&?/g, "") + "#HT_dtImage";
console.log(url)
I would just use string split here:
var url = "www.test.com/is/images/383773?wid=200&hei=200";
var output = url.split("?")[0] + "?#HT_dtImage";
console.log(output);
If you only want to target query strings havings both keys wid and hei, then use a regex approach:
var url = "www.test.com/is/images/383773?wid=200&hei=200";
var output = url.replace(/(.*)\?(?=.*\bwid=\d+)(?=.*\bhei=\d+).*/, "$1?#HT_dtImage");
console.log(output);
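To see what the two lookaheads buy you, here is a small sketch that wraps the pattern above in a function (the name replaceIfBothKeys is hypothetical) and runs it on one URL with both keys and one with only wid. The second URL is left untouched because the hei lookahead fails.

```javascript
// Hypothetical wrapper around the regex from the answer above.
function replaceIfBothKeys(url) {
  return url.replace(/(.*)\?(?=.*\bwid=\d+)(?=.*\bhei=\d+).*/, "$1?#HT_dtImage");
}

console.log(replaceIfBothKeys("www.test.com/is/images/383773?wid=200&hei=200"));
// www.test.com/is/images/383773?#HT_dtImage
console.log(replaceIfBothKeys("www.test.com/is/images/383773?wid=200"));
// www.test.com/is/images/383773?wid=200 (unchanged)
```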
You can replace everything from the ? onward using the regex /\?.*/
const url = 'www.test.com/is/images/383773?wid=200&hei=200';
const result = url.replace(/\?.*/, '?#HT_dtImage');
console.log(result);
Try
url.replace(/\?.*/, "?#HT_dtImage")

Regex match cookie value and remove hyphens

I'm trying to extract out a group of words from a larger string/cookie that are separated by hyphens. I would like to replace the hyphens with a space and set to a variable. Javascript or jQuery.
As an example, the larger string has a name and value like this within it:
facility=34222%7CConner-Department-Store;
(notice the leading "C")
So first, I need to match()/find facility=34222%7CConner-Department-Store; with regex. Then break it down to "Conner Department Store"
var cookie = document.cookie;
var facilityValue = cookie.match( REGEX ); ??
var test = "store=874635%7Csomethingelse;facility=34222%7CConner-Department-Store;store=874635%7Csomethingelse;";
var test2 = test.replace(/^(.*)facility=([^;]+)(.*)$/, function(matchedString, match1, match2, match3){
return decodeURIComponent(match2);
});
console.log( test2 );
console.log( test2.split('|')[1].replace(/[-]/g, ' ') );
If I understood correctly, you want to build a phrase from the words between the hyphens, where each word starts with a single uppercase letter followed by lowercase letters, so I'd prefer using a regex in that case.
This is a regex solution that works with any cookie in the same format and extracts the wanted phrase from it:
var matches = str.match(/([A-Z][a-z]+)-?/g);
console.log(matches.map(function(m) {
return m.replace('-', '');
}).join(" "));
Demo:
var str = "facility=34222%7CConner-Department-Store;";
var matches = str.match(/([A-Z][a-z]+)-?/g);
console.log(matches.map(function(m) {
return m.replace('-', '');
}).join(" "));
Explanation:
Use the regex /([A-Z][a-z]+)-?/g to match the words separated by -.
Remove any - occurrence from the matched words.
Then join the array of matches with a space.
Ok,
first, you should decode this string as follows:
var str = "facility=34222%7CConner-Department-Store;"
var decoded = decodeURIComponent(str);
// decoded = "facility=34222|Conner-Department-Store;"
Then you have multiple possibilities to split up this string.
The easiest way is to use substring()
var solution1 = decoded.substring(decoded.indexOf('|') + 1, decoded.length)
// solution1 = "Conner-Department-Store;"
solution1 = solution1.replace(/-/g, ' '); // /-/g replaces every hyphen; '-' would replace only the first
// solution1 = "Conner Department Store;"
As you can see, substring(arg1, arg2) returns the part of the string starting at index arg1 and ending just before index arg2.
If you want to cut the trailing ; just pass decoded.length - 1 as arg2 in the snippet above.
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1)
//returns "Conner-Department-Store"
or all above in just one line:
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1).replace(/-/g, ' ')
If you want still to use a regular Expression to retrieve (perhaps more) data out of the string, you could use something similar to this snippet:
var solution2 = "";
var regEx= /([A-Za-z]*)=([0-9]*)\|(\S[^:\/?#\[\]\#\;\,']*)/;
if (regEx.test(decoded)) {
solution2 = decoded.match(regEx);
/* returns
[0:"facility=34222|Conner-Department-Store",
1:"facility",
2:"34222",
3:"Conner-Department-Store",
index:0,
input:"facility=34222|Conner-Department-Store;"
length:4] */
solution2 = solution2[3].replace(/-/g, ' ');
// "Conner Department Store"
}
I have applied some rules for the regex to work; feel free to modify them according to your needs.
facility can be any Word built with alphabetical characters lower and uppercase (no other chars) at any length
= needs to be the char =
34222 can be any number but no other characters
| needs to be the char |
Conner-Department-Store can be any characters except one of the following (reserved delimiters): :/?#[]#;,'
Hope this helps :)
edit: to find only the part facility=34222%7CConner-Department-Store; just modify the regex to match facility= instead of ([A-Za-z]*)=:
/(facility)=([0-9]*)\|(\S[^:\/?#\[\]\#\;\,']*)/
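Putting the steps from the answers above together (find the facility cookie, decode it, take the part after |, replace the hyphens), a minimal sketch could look like this. The helper name getFacilityName and the choice to return null when nothing is found are assumptions for illustration.

```javascript
// Hypothetical helper, not from the original answers.
function getFacilityName(cookieString) {
  // Find "facility=<value>" at the start or after a ";" separator.
  var m = /(?:^|;\s*)facility=([^;]*)/.exec(cookieString);
  if (!m) return null;
  var decoded = decodeURIComponent(m[1]); // "%7C" becomes "|"
  var parts = decoded.split('|');
  if (parts.length < 2) return null;
  return parts[1].replace(/-/g, ' '); // /g so that every hyphen is replaced
}

var test = "store=874635%7Csomethingelse;facility=34222%7CConner-Department-Store;store=874635%7Csomethingelse;";
console.log(getFacilityName(test)); // "Conner Department Store"
```

In a browser you would pass document.cookie instead of the test string.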
You can use cookies.js, a small cookie framework published on MDN (Mozilla Developer Network).
Simply include the cookies.js file in your application, and write:
docCookies.getItem("facility"); // look the cookie up by its name, not by its value

Getting each 'word' after every underscore in a string in Javascript using regex

I'm wanting to extract each block of alphanumeric characters that come after underscores in a Javascript string. I currently have it working using a combination of string methods and regex like so:
var string = "ignore_firstMatch_match2_thirdMatch";
var firstValGone = string.substr(string.indexOf('_'));
// returns "_firstMatch_match2_thirdMatch"
var noUnderscore = firstValGone.match(/[^_]+/g);
// returns ["firstMatch", "match2" , "thirdMatch"]
I'm wondering if there's a way to do it purely using regex? Best I've managed is:
var string = "ignore_firstMatch_match2_thirdMatch";
var matchTry = string.match(/_[^_]+/g);
// returns ["_firstMatch", "_match2", "_thirdMatch"]
but that returns the preceding underscore too. Given you can't use lookbehinds in JS I don't know how to match the characters after, but exclude the underscore itself. Is this possible?
You can use a capture group (_([^_]+)) and use RegExp#exec in a loop while pushing the captured values into an array:
var re = /_([^_]+)/g;
var str = 'ignore_firstMatch_match2_thirdMatch';
var res = [], m; // declare m so the loop does not create an implicit global
while ((m = re.exec(str)) !== null) {
res.push(m[1]);
}
document.body.innerHTML = "<pre>" + JSON.stringify(res, 0, 4) + "</pre>";
Note that String#match() with a regex that has the global modifier /g returns only the full matches and drops all captured text; that is why you cannot just use str.match(/_([^_]+)/g).
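In engines that support String.prototype.matchAll (ES2020), the same result can be sketched without a manual loop, because matchAll keeps the capture groups for every match. This is an alternative to the exec loop, not part of the original answer.

```javascript
var input = 'ignore_firstMatch_match2_thirdMatch';
// matchAll requires the /g flag and yields one match array per occurrence.
var captured = Array.from(input.matchAll(/_([^_]+)/g), function (m) {
  return m[1]; // group 1: the text after the underscore
});
console.log(captured); // ["firstMatch", "match2", "thirdMatch"]
```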
Since lookbehind is not supported in JS the only way I can think of is using a group like this.
Regex: _([^_]+) and capture group using \1 or $1.
Regex101 Demo
var myString = "ignore_firstMatch_match2_thirdMatch";
var myRegexp = /_([^_]+)/g;
var match = myRegexp.exec(myString);
while (match != null) {
document.getElementById("match").innerHTML += "<br>" + match[1]; // group 1 excludes the underscore
match = myRegexp.exec(myString);
}
<div id="match">
</div>
An alternative using a lookahead would be something like this.
But it takes a long time in JS; it killed my page three times. It would make a good ReDoS exploit.
Regex: (?=_([A-Za-z0-9]+)) and capture groups using \1 or $1.
Regex101 Demo
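The hang is most likely not catastrophic backtracking. A pattern that consists only of a lookahead produces zero-length matches, and a manual exec() loop does not advance lastIndex after an empty match, so it keeps finding the same position forever. A minimal sketch of the usual guard, assuming the same input string as above:

```javascript
var lookaheadRe = /(?=_([A-Za-z0-9]+))/g;
var source = 'ignore_firstMatch_match2_thirdMatch';
var found = [];
var hit;
while ((hit = lookaheadRe.exec(source)) !== null) {
  found.push(hit[1]);
  // A zero-length match leaves lastIndex where it was; step past it manually.
  if (hit.index === lookaheadRe.lastIndex) {
    lookaheadRe.lastIndex++;
  }
}
console.log(found); // ["firstMatch", "match2", "thirdMatch"]
```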
Why do you assume you need regex? a simple split will do the job:
string str = "ignore_firstMatch_match2_thirdMatch";
IEnumerable<string> matches = str.Split('_').Skip(1);

Javascript regex to bring back all symbol matches?

I need a javascript regex object that brings back any matches of symbols in a string,
take for example the following string:
input = !"£$[]{}%^&*:#\~#';/.,<>\|¬`
then the following code:
input.match(regExObj,"g");
would return an array of matches:
[[,!,",£,$,%,^,&,*,:,#,~,#,',;,/,.,,,<,>,\,|,¬,`,]]
I have tried the following with no luck.
match(/[U+0021-U+0027]/g);
and I cannot use the following because I need to allow non-ASCII characters, for example Chinese characters.
[^0-9a-zA-Z\s]
var re = /[!"\[\]{}%^&*:#~#';/.<>\\|`]/g;
var matches = [], match;
var someString = "aejih!\"£$[]{}%^&*:#\~#';/.,<>\\|¬`oejtoj%";
while ((match = re.exec(someString)) !== null) {
matches.push(match[0]); // the regex has no capture group, so the matched character is match[0]
}
Getting
['!','"','[',']','{','}','%','^','&','*',':','#','~','#',"'",';','/','.','<','>','\\','|','`','%']
What about
/[!"£$\[\]{}%^&*:#\\~#';\/.,<>|¬`]/g
?
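If listing every symbol by hand is impractical, engines that support Unicode property escapes (ES2018, with the u flag) offer another option: match the general categories Punctuation (\p{P}) and Symbol (\p{S}). This is a sketch of a different technique from the answers above, and the sample string is made up. Letters such as Chinese characters belong to the Letter category, so they are not matched.

```javascript
// \p{P} = any punctuation, \p{S} = any symbol; the u flag is required.
var symbolRe = /[\p{P}\p{S}]/gu;
var sampleText = 'abc!中文£$ 12~x'; // assumed sample input
var symbols = sampleText.match(symbolRe);
console.log(symbols); // ["!", "£", "$", "~"]
```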

How can I remove all characters up to and including the 3rd slash in a string?

I'm having trouble removing all characters up to and including the third slash in JavaScript. This is my string:
http://blablab/test
The result should be:
test
Does anybody know the correct solution?
To get the last item in a path, you can split the string on / and then pop():
var url = "http://blablab/test";
alert(url.split("/").pop());
//-> "test"
To specify an individual part of a path, split on / and use bracket notation to access the item:
var url = "http://blablab/test/page.php";
alert(url.split("/")[3]);
//-> "test"
Or, if you want everything after the third slash, split(), slice() and join():
var url = "http://blablab/test/page.php";
alert(url.split("/").slice(3).join("/"));
//-> "test/page.php"
var string = 'http://blablab/test'
string = string.replace(/[\s\S]*?\//,'').replace(/[\s\S]*?\//,'').replace(/[\s\S]*?\//,'')
alert(string)
This uses a regular expression, which I will explain below.
The regex is /[\s\S]*?\//
/ is the start of the regex
[\s\S] means whitespace or non-whitespace (anything), not to be confused with . which does not match line breaks.
*? means that we match zero or more [\s\S], but as few as possible (lazy), so each replace stops at the first slash it finds; a greedy * would run to the last slash instead
\/ means match a slash character
The last / is the end of the regex
var str = "http://blablab/test";
var index = 0;
for(var i = 0; i < 3; i++){
index = str.indexOf("/",index)+1;
}
str = str.substr(index);
To make it a one liner you could make the following:
str = str.substr(str.indexOf("/",str.indexOf("/",str.indexOf("/")+1)+1)+1);
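The loop above generalizes to any number of slashes. A minimal sketch follows; the function name removeThroughNthSlash and the choice to return the input unchanged when it has fewer than n slashes are assumptions, not part of the original answer.

```javascript
// Hypothetical helper: drop everything up to and including the n-th "/".
function removeThroughNthSlash(str, n) {
  var index = 0;
  for (var i = 0; i < n; i++) {
    var pos = str.indexOf('/', index);
    if (pos === -1) {
      return str; // fewer than n slashes: leave the input as it is (assumption)
    }
    index = pos + 1;
  }
  return str.slice(index);
}

console.log(removeThroughNthSlash('http://blablab/test', 3)); // "test"
console.log(removeThroughNthSlash('http://blablab/test/page.php', 3)); // "test/page.php"
```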
You can use split to break the string into parts and slice to return all parts after the third slash.
var str = "http://blablab/test",
arr = str.split("/");
arr = arr.slice(3);
console.log(arr.join("/")); // "test"
// A longer string:
var str = "http://blablab/test/test"; // "test/test";
You could use a regular expression like this one:
'http://blablab/test'.match(/^(?:[^/]*\/){3}(.*)$/);
// -> ['http://blablab/test', 'test']
A string’s match method gives you either an array (holding the whole match, in this case the whole input, followed by any capture groups; we want the first capture group) or null. So, for general use you need to pull out the element at index 1 of the array, or null if a match wasn’t found:
var input = 'http://blablab/test',
re = /^(?:[^/]*\/){3}(.*)$/,
match = input.match(re),
result = match && match[1]; // With this input, result contains "test"
let str = "http://blablab/test";
let data = new URL(str).pathname.split("/").pop();
console.log(data);
