JavaScript regex to bring back all symbol matches?

I need a JavaScript regex object that returns every symbol found in a string.
Take, for example, the following string:
input = !"£$[]{}%^&*:#\~#';/.,<>\|¬`
then the following code:
input.match(regExObj,"g");
would return an array of matches:
[[,!,",£,$,%,^,&,*,:,#,~,#,',;,/,.,,,<,>,\,|,¬,`,]]
I have tried the following with no luck.
match(/[U+0021-U+0027]/g);
I also cannot use the following, because I need to allow non-ASCII characters, for example Chinese characters.
[^0-9a-zA-Z\s]

var re = /[!"\[\]{}%^&*:#~#';/.<>\\|`]/g;
var matches = [];
var someString = "aejih!\"£$[]{}%^&*:#\~#';/.,<>\\|¬`oejtoj%";
var match;
while ((match = re.exec(someString)) !== null) {
  matches.push(match[0]); // no capture group in the pattern, so the matched text is match[0]
}
This gives:
['!','"','[',']','{','}','%','^','&','*',':','#','~','#','\'',';','/','.','<','>','\\','|','`','%']

What about
/[!"£$\[\]{}%^&*:#\\~#';\/.,<>|¬`]/g
?

Related

Regex check both side of match but not include in match string

I want to get a match while checking the expressions on both sides of the main match.
var str = '1234 word !!! 5678 another *** 000more)))'; // get "word" and "another"
console.log(str.match(/(?!\d+\s?)\w+(?=\s?\W+)/g))
>> (3) ["word", "another", "more"]
It checks both sides but does not include them in the main match.
But with HTML it does not work:
var str = ''; // get url, url2 and url3
console.log(str.match(/(?!href=")[^"]+?(?=")/g))
>> (6) ["<a href=", "url3"]
I tried to use a negative lookahead (?!href=") and a positive lookahead (?=") to match only the attribute value, but it returns more than the attribute values.
Is there any way to do this? Thanks.
What you could do for your example data is capture what is between the double quotes, href="([^"]+), in a capturing group and loop through the results:
var str = '';
var pattern = /href="([^"]+)/g;
var match = pattern.exec(str);
while (match != null) {
console.log(match[1]);
match = pattern.exec(str);
}
In other flavors of regex you could have used e.g. a positive lookbehind
((?<=href=")), but JavaScript regex did not support lookbehinds when this
was written (they were added in ES2018, so older engines still lack them).
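Lookbehind assertions were added to JavaScript in ES2018, so in engines that implement them the lookbehind idea can be tried directly. A minimal sketch follows; the markup in `html` is invented for illustration, since the original string is not shown.

```javascript
// Sample markup is an assumption made for illustration only.
const html = '<a href="url">one</a> <a href="url2">two</a> <a href="url3">three</a>';

// (?<=href=") asserts the prefix without consuming it;
// [^"]+ then takes everything up to the closing quote.
const hrefs = html.match(/(?<=href=")[^"]+/g) || [];
console.log(hrefs); // url, url2, url3
```

The capturing-group loop above remains the safer choice when older browsers have to be supported.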
A reasonable solution is:
Match href=" as "ordinary" content, to be ignored.
Match the attribute value as a capturing group ((\w+)),
to be "consumed".
Set the boundary of the above group with a positive lookahead
((?=")), just as you did.
So the whole regex can be:
href="(\w+)(?=")
and read "your" value from group 1.
You can't parse HTML with regex. Because HTML can't be parsed by regex.
Have you tried using the DOM parser that's right at your fingertips?
var str = '';
var div = document.createElement('div');
div.innerHTML = str; // parsing magic!
var links = Array.from(div.getElementsByTagName("a"));
var urls = links.map(function(a) {return a.href;});
// above returns fully-resolved absolute URLs.
// for the literal attribute value, try a.getAttribute("href")
console.log(urls);

regex match everything between two different characters

I want to match a string between (but not including) these two characters: ? and &
Example string:
localhost/path/doc.html?970441179&token=specialtoken&actionurl=/portletaction/01654/0112
So from the above I want to match the string 970441179
var str = "?samplestring&";
var patt = /[?]([^&]*)[&]/g;
var res = patt.exec(str)[1];
'res' is your desired result.
Try this regex (\d+)(?=&):
var str = "localhost/path/doc.html?970441179&token=specialtoken&actionurl=/portletaction/01654/0112";
console.log(str.match(/(\d+)(?=&)/g));
Note that it will work just for that specific case.
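For a single value, a non-global `match` is enough, because without the `g` flag the returned array still carries the capture groups. A small sketch, where `firstQueryToken` is a hypothetical helper name:

```javascript
// Hypothetical helper: text between the first "?" and the next "&".
function firstQueryToken(url) {
  // No /g flag, so match() returns the capture groups as well.
  const m = url.match(/\?([^&]*)&/);
  return m ? m[1] : null; // null when there is no "?...&" section
}

const token = firstQueryToken(
  'localhost/path/doc.html?970441179&token=specialtoken&actionurl=/portletaction/01654/0112'
);
console.log(token); // 970441179
```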

Getting each 'word' after every underscore in a string in Javascript using regex

I'm wanting to extract each block of alphanumeric characters that come after underscores in a Javascript string. I currently have it working using a combination of string methods and regex like so:
var string = "ignore_firstMatch_match2_thirdMatch";
var firstValGone = string.substr(string.indexOf('_'));
// returns "_firstMatch_match2_thirdMatch"
var noUnderscore = firstValGone.match(/[^_]+/g);
// returns ["firstMatch", "match2" , "thirdMatch"]
I'm wondering if there's a way to do it purely using regex? Best I've managed is:
var string = "ignore_firstMatch_match2_thirdMatch";
var matchTry = string.match(/_[^_]+/g);
// returns ["_firstMatch", "_match2", "_thirdMatch"]
but that returns the preceding underscore too. Given you can't use lookbehinds in JS I don't know how to match the characters after, but exclude the underscore itself. Is this possible?
You can use a capture group (_([^_]+)) and use RegExp#exec in a loop while pushing the captured values into an array:
var re = /_([^_]+)/g;
var str = 'ignore_firstMatch_match2_thirdMatch';
var res = [];
var m;
while ((m = re.exec(str)) !== null) {
res.push(m[1]);
}
document.body.innerHTML = "<pre>" + JSON.stringify(res, 0, 4) + "</pre>";
Note that using a string#match() with a regex defined with a global modifier /g will lose all the captured texts, that's why you cannot just use str.match(/_([^_]+)/g).
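In engines that provide `String.prototype.matchAll` (ES2020), the captured groups can be kept without a manual `exec` loop. A minimal sketch using the same pattern:

```javascript
const source = 'ignore_firstMatch_match2_thirdMatch';

// matchAll requires the /g flag and yields one match object per hit,
// each with its capture groups intact; m[1] is the text after the underscore.
const words = Array.from(source.matchAll(/_([^_]+)/g), (m) => m[1]);
console.log(words); // firstMatch, match2, thirdMatch
```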
Since lookbehind is not supported in JS the only way I can think of is using a group like this.
Regex: _([^_]+) and capture group using \1 or $1.
var myString = "ignore_firstMatch_match2_thirdMatch";
var myRegexp = /_([^_]+)/g;
var match = myRegexp.exec(myString);
while (match != null) {
  document.getElementById("match").innerHTML += "<br>" + match[1]; // group 1 excludes the underscore
match = myRegexp.exec(myString);
}
<div id="match">
</div>
An alternative using a lookahead would be something like this.
But it takes very long in JS: it killed my page three times. It would make a good ReDoS exploit.
Regex: (?=_([A-Za-z0-9]+)) and capture groups using \1 or $1.
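A likely explanation for the hang is not catastrophic backtracking but the zero-width match: a lookahead-only pattern consumes nothing, so in an `exec` loop `lastIndex` never moves and the same position matches forever. The sketch below, under that assumption, advances `lastIndex` by hand after each empty match.

```javascript
const re = /(?=_([A-Za-z0-9]+))/g;
const input = 'ignore_firstMatch_match2_thirdMatch';
const captured = [];
let m;

while ((m = re.exec(input)) !== null) {
  captured.push(m[1]);
  // A zero-width match leaves lastIndex where it was; step past it by hand.
  if (m.index === re.lastIndex) {
    re.lastIndex++;
  }
}

console.log(captured); // firstMatch, match2, thirdMatch
```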
Why do you assume you need regex? A simple split will do the job (shown here in C#):
string str = "ignore_firstMatch_match2_thirdMatch";
IEnumerable<string> matches = str.Split('_').Skip(1);
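The snippet above is C#, not JavaScript. A rough JavaScript equivalent of the same split-and-skip idea might look like this (variable names are illustrative):

```javascript
const raw = 'ignore_firstMatch_match2_thirdMatch';

// split('_') yields every segment; slice(1) drops the part before the first underscore.
const parts = raw.split('_').slice(1);
console.log(parts); // firstMatch, match2, thirdMatch
```

Unlike the `[^_]+` regex, this keeps empty strings when two underscores are adjacent, so the two approaches only agree on input like the example.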

Javascript Regex to get text between certain characters

I need a regex in Javascript that would allow me to match an order number in two different formats of order URL:
The URLs:
http://store.apple.com/vieworder/1003123464/test#test.com
http://store.apple.com/vieworder/W411234368/test#test.com/AOS-A=M-104121
The first one will always be all numbers, and the second one will always start with a W, followed by just numbers.
I need to be able to use a single regex to return these matches:
1003123464
W411234368
This is what I've tried so far:
/(vieworder\/)(.*?)(?=\/)/g
That allows me to match:
vieworder/1003123464
vieworder/W411234368
but I'd like it to not include the first capture group.
I know I could then run the result through string.replace('vieworder/', ''), but it'd be cool to be able to do this in just one command.
Use your expression without grouping vieworder
vieworder\/(.*?)(?=\/)
var string = 'http://store.apple.com/vieworder/1003123464/test#test.com http://store.apple.com/vieworder/W411234368/test#test.com/AOS-A=M-104121';
var myRegEx = /vieworder\/(.*?)(?=\/)/g;
var index = 1;
var matches = [];
var match;
while (match = myRegEx.exec(string)) {
matches.push(match[index]);
}
console.log(matches);
Use replace instead of match, since JS did not support lookbehinds when this was written. Alternatively, you could use capturing groups and the exec method to read the characters inside a particular group.
> var s1 = 'http://store.apple.com/vieworder/1003123464/test#test.com'
undefined
> var s2 = 'http://store.apple.com/vieworder/W411234368/test#test.com/AOS-A='
undefined
> s1.replace(/^.*?vieworder\/|\/.*/g, '')
'1003123464'
> s2.replace(/^.*?vieworder\/|\/.*/g, '')
'W411234368'
OR
> s1.replace(/^.*?\bvieworder\/([^\/]*)\/.*/g, '$1')
'1003123464'
I'd suggest
W?\d+
That ought to translate to "one or zero W and one or more digits".
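On its own, `W?\d+` would also match other digit runs in the URL (for example the trailing `104121` in the second URL when used with the `g` flag). One way to avoid that, sketched below, is to anchor it to the `vieworder/` prefix and read the capture group; `orderNumber` is a hypothetical helper name.

```javascript
// Hypothetical helper: order number that directly follows "vieworder/".
function orderNumber(url) {
  // Optional leading W, then digits, captured as group 1.
  const m = url.match(/vieworder\/(W?\d+)/);
  return m ? m[1] : null;
}

const first = orderNumber('http://store.apple.com/vieworder/1003123464/test#test.com');
const second = orderNumber('http://store.apple.com/vieworder/W411234368/test#test.com/AOS-A=M-104121');
console.log(first, second); // 1003123464 W411234368
```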

regex: any string between two slashes first of them is prefixed with a defined string

I'd like to get the talker name from some mp3 file paths such as the following:
/assets/audio/James_Lee/001.mp3
/assets/audio/Marc_Smith/001.mp3
/assets/audio/blahblah/001.mp3
In the previous example we note that each talker name is surrounded by two slashes where the first of them is prefixed with the word audio. I need a pattern that matches names like the example above using javascript.
I tried at http://regexpal.com/ :
audio/.*/
but it matches *audio/The_name/* where I need *The_name* only. I also don't know how to use such patterns with JavaScript's replace().
This will get you the name: (?<=\/assets\/audio\/).*(?=\/)
Here's the regex in use: http://regexr.com?34747
Considering Javascript, you could do this:
var string = "/assets/audio/James_Lee/001.mp3";
var name = string.replace(/^.*\/audio\/|\/[\d]+\..*$/g, '');
Try this:
var str = "/assets/audio/James_Lee/001.mp3\n/assets/audio/Marc_Smith/001.mp3";
var pattern = /audio\/(.+?)\//g;
var match;
var matches = [];
while ((match = pattern.exec(str)) !== null){
matches.push(match[1]);
}
console.log(matches);
// If you want a string with only the names, you can re-combine the matches
str = matches.join('\n');
how about this?
str.replace(/.*audio\/([^\/]*)\/.*/,"$1")
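Since the name is simply the path segment after `audio`, a regex-free variant is also possible. This is a sketch that assumes `audio` appears as a full segment of the path; `talkerName` is a hypothetical helper name.

```javascript
// Hypothetical helper: the path segment that follows "audio".
function talkerName(path) {
  const segments = path.split('/');
  const i = segments.indexOf('audio');
  // The name is the segment right after "audio", if there is one.
  return i !== -1 && i + 1 < segments.length ? segments[i + 1] : null;
}

const names = [
  '/assets/audio/James_Lee/001.mp3',
  '/assets/audio/Marc_Smith/001.mp3',
].map(talkerName);
console.log(names); // James_Lee, Marc_Smith
```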
