Regex advanced way to replace with many expressions - javascript

I didn't find a solution in the regex documentation for my current problem. I'm using JavaScript and HTML.
My code is like this:
var text = 'This [close](animal) is a dog';
I want to get this using regex:
'This {animal} is a dog';
In other words, I want the [close](...) wrapper replaced with { and }.
I know, there's a solution like:
var res = text.replace('[close](','{').replace(')','}');
but in my case I have many rules, and I don't want to duplicate that line for each one. Sometimes I use other patterns, like '[ xxxxx ]'.
Any idea? Thank you!

You may use
var text = 'This [close](animal) is a dog';
console.log(text.replace(/\[[^\][]*]\(([^()]*)\)/g, '{$1}'));
See the regex demo.
Details
\[ - a [ char
[^\][]* - 0 or more chars other than [ and ]
]\( - a ]( substring
([^()]*) - Capturing group 1: any 0 or more chars other than ( and )
\) - a ) char.
The {$1} replacement is the contents of the capturing group enclosed with braces.
If the only values inside [...] are close and open, and you want to replace close with {...} and open with }...{, you may use
var text = '[open](animal)This [close](animal) is a dog';
console.log(text.replace(/\[(open|close)]\(([^()]*)\)/g, function($0, $1, $2) {
return $1==='close' ? '{'+$2+'}' : '}'+$2+'{';})
);
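A possible generalization of the callback above, as a sketch: keep the tag-to-wrapper mapping in a Map (the name wrappers and the [other] tag in the example are illustrative), so supporting a new tag means adding one entry rather than another branch. Tags that are not in the map are left unchanged. A Map is used instead of a plain object so that a tag such as toString cannot accidentally match an inherited property.

```javascript
// Hypothetical mapping: tag name -> [opening string, closing string].
const wrappers = new Map([
  ['close', ['{', '}']],
  ['open', ['}', '{']],
]);

function applyWrappers(text) {
  // Match [tag](content), where content contains no parentheses.
  return text.replace(/\[(\w+)\]\(([^()]*)\)/g, (match, tag, content) => {
    const pair = wrappers.get(tag);
    // Unknown tags: return the original match untouched.
    return pair ? pair[0] + content + pair[1] : match;
  });
}

console.log(applyWrappers('[open](animal)This [close](animal) is a dog [other](x)'));
// -> }animal{This {animal} is a dog [other](x)
```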

Don't forget that you can pass a regex to String.prototype.replace. In your case it would be text.replace(/\[close\]\(/g, '{'). A full solution to your question would look like:
var res = text.replace(/\[\w+\]\(([^()]*)\)/g, (match, inner) => {
console.log(match, inner); // e.g. "[close](animal)" and "animal"
return `{${inner}}`;
});
The parentheses around [^()]* capture animal into the variable inner, and the g flag replaces every occurrence rather than only the first.

Thank you, Wiktor, I've found a solution based on what you said:
var res0 = text.replace(/\[close]\(([^()]*)\)/g, '{$1}');
var res1 = res0.replace(/\[open]\(([^()]*)\)/g, '}$1{');
Sorry for any mistakes, I'm not used to writing in English :-)
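Since the question is about handling many rules without repeating the replace line, one option is to keep the rules in a list and apply them in a loop. This is a sketch that assumes the rules run in array order, so each rule sees the output of the previous ones; the names rules and applyRules are hypothetical.

```javascript
// Hypothetical rule list: each entry is [pattern, replacement].
// Adding a rule means adding an entry, not another chained .replace().
const rules = [
  [/\[close\]\(([^()]*)\)/g, '{$1}'],
  [/\[open\]\(([^()]*)\)/g, '}$1{'],
];

function applyRules(text, ruleList) {
  // Feed the output of each rule into the next one.
  return ruleList.reduce(
    (current, [pattern, replacement]) => current.replace(pattern, replacement),
    text
  );
}

console.log(applyRules('[open](animal)This [close](animal) is a dog', rules));
// -> }animal{This {animal} is a dog
```

Reusing the same global regexes across calls is safe here because String.prototype.replace resets lastIndex to 0 before searching with a global pattern.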

Related

Parse query parameters with regexp

I need to parse the URL /domain.com?filter[a.b.c]=value1&filter[a.b.d]=value2
and get 2 groups: 'a.b.c' and 'a.b.d'.
I tried parsing it with the regexp [\?&]filter\[(.+\..+)+\]= but the result is 'a.b.c]=value1&filter[a.b.d'. How can I make it stop at the first occurrence of ]=?
You may use
/[?&]filter\[([^\].]+\.[^\]]+)]=/g
See the regex demo
Details
[?&] - a ? or &
filter\[ - a filter[ substring
([^\].]+\.[^\]]+) - Capturing group 1:
[^\].]+ - 1 or more chars other than ] and .
\. - a dot
[^\]]+ - 1 or more chars other than ]
]= - a ]= substring
JS demo:
var s = '/domain.com?filter[a.b.c]=value1&filter[a.b.d]=value2';
var rx = /[?&]filter\[([^\].]+\.[^\]]+)]=/g;
var m, res=[];
while(m=rx.exec(s)) {
res.push(m[1]);
}
console.log(res);
Note that in case & is never present as part of the query param value, you may add it to the negated character classes, [^\].]+ => [^\]&.]+, to make sure the regex does not overmatch across param values.
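For comparison, a sketch of the same extraction with String.prototype.matchAll (ES2020), which replaces the manual exec loop; it assumes a runtime that supports matchAll, and the variable names are illustrative.

```javascript
const query = '/domain.com?filter[a.b.c]=value1&filter[a.b.d]=value2';
const filterRx = /[?&]filter\[([^\].]+\.[^\]]+)]=/g;

// matchAll requires the g flag and yields one match array per hit;
// the mapping function keeps only capturing group 1.
const found = Array.from(query.matchAll(filterRx), m => m[1]);
console.log(found); // [ 'a.b.c', 'a.b.d' ]
```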
Since you need to extract text inside outer square brackets that may contain consecutive [...] substrings with at least 1 dot inside one of them, you may use a simpler regex with a bit more code:
var strs = ['/domain.com?filter[a.b.c]=value1&filter[a.b.d]=value2',
'/domain.com?filter[a.b.c]=value1&filter[a.b.d]=value2&filter[a][b.e]=value3',
'/domain.com?filter[a.b.c]=value1&filter[b][a.b.d][d]=value2&filter[a][b.e]=value3'];
var rx = /[?&]filter((?:\[[^\][]*])+)=/g;
for (var s of strs) {
var m, res=[];
console.log(s);
while(m=rx.exec(s)) {
if (m[1].indexOf('.') > -1) {
res.push(m[1].substring(1,m[1].length-1));
}
}
console.log(res);
console.log("--- NEXT STRING ----");
}
(?<=[?&]filter\[)[^\]]+\.[^\]]+(?=\]=)
This will give you only the groups you mentioned (a.b.c and a.b.d).
The part (?<=[?&]filter\[) is a lookbehind: it requires ?filter[ or &filter[ before what you want without including it in the match. The part (?=\]=) is a lookahead: it requires ]= after what you want, again without including it. Lookbehind is supported in JavaScript since ES2018.
[^\]] matches any character that isn't a closing square bracket.
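A quick check of a lookaround version in JavaScript, as a sketch assuming an engine with ES2018 lookbehind support. The trailing assertion has to be a positive lookahead, (?=\]=), so that ]= is required after the match; because neither assertion is part of the match, match with the g flag returns just the keys.

```javascript
const url = '/domain.com?filter[a.b.c]=value1&filter[a.b.d]=value2';

// Lookbehind: "?filter[" or "&filter[" must precede the match.
// Lookahead: "]=" must follow it. Neither ends up in the result.
const keys = url.match(/(?<=[?&]filter\[)[^\]]+\.[^\]]+(?=\]=)/g);
console.log(keys); // [ 'a.b.c', 'a.b.d' ]
```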

Extract word between '=' and '('

I have the following string
234234=AWORDHERE('sdf.'aa')
where I need to extract AWORDHERE.
Sometimes there can be a space in between:
234234= AWORDHERE('sdf.'aa')
Can I do this with a regular expression?
Or should I do it manually by finding indexes?
The datasets are huge, so it's important to do it as fast as possible.
Try this regex:
\d+=\s?(\w+)\(
In JavaScript it would look like this:
var myString = "234234=AWORDHERE('sdf.'aa')";// or 234234= AWORDHERE('sdf.'aa')
var myRegexp = /\d+=\s?(\w+)\(/g;
var match = myRegexp.exec(myString);
console.log(match[1]); // AWORDHERE
You could do this at least three ways. You need to benchmark to see what's fastest.
Substring with indexes
function extract(from) {
var ixEq = from.indexOf("=");
var ixParen = from.indexOf("(");
// trim() drops the optional space after the =
return from.substring(ixEq + 1, ixParen).trim();
}
Splits
function extract(from) {
var spEq = from.split("=");
var spParen = spEq[1].split("(");
return spParen[0].trim();
}
Regex (demo)
Here is a sample regex you could use:
/[^=]+=([^(]+).*/g
This says
[^=]+ - One or more character which is not an =
= - The = itself
( - creates a matching group so you can access your match in code
[^(]+ - One or more character which is not a (
) - closes the matching group
.* - Matches the rest of the line
The /g flag at the end makes the search global, so it finds every match in the input instead of stopping at the first one.
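Since the answer suggests benchmarking, here is a rough sketch that first checks that the three approaches agree on both sample inputs and then times them. It assumes well-formed input that contains both = and (; the function names and iteration count are illustrative, and Date.now() timings are only a coarse indication, not a proper benchmark.

```javascript
// Each extractor returns the text between "=" and "(", trimmed so that
// "234234=AWORDHERE(...)" and "234234= AWORDHERE(...)" give the same result.
function bySubstring(from) {
  return from.substring(from.indexOf('=') + 1, from.indexOf('(')).trim();
}

function bySplit(from) {
  return from.split('=')[1].split('(')[0].trim();
}

const wordRx = /[^=]+=([^(]+)/; // no g flag: only the first match is needed
function byRegex(from) {
  const m = wordRx.exec(from);
  return m ? m[1].trim() : null;
}

const inputs = ["234234=AWORDHERE('sdf.'aa')", "234234= AWORDHERE('sdf.'aa')"];

// Make sure all three agree before timing anything.
for (const input of inputs) {
  console.log(bySubstring(input), bySplit(input), byRegex(input));
}

// Coarse timing only; real data and warm-up runs would matter.
for (const fn of [bySubstring, bySplit, byRegex]) {
  const start = Date.now();
  for (let i = 0; i < 100000; i++) fn(inputs[i % 2]);
  console.log(fn.name, Date.now() - start, 'ms');
}
```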
Using lookarounds, you can search for a string preceded by = and followed by (, as follows.
Regex: (?<==)[A-Z ]+(?=\()
Explanation:
(?<==) checks if [A-Z ] is preceded by an =.
[A-Z ]+ matches your pattern.
(?=\() checks if matched pattern is followed by a (.
var str = "234234= AWORDHERE('sdf.'aa')";
var regexp = /.*=\s*(\w+)\(.*\)/g; // \s* also allows no space after the =
var match = regexp.exec(str);
alert( match[1] );
I made my solution for this just a little more general than you asked for, but I don't think it takes much more time to execute. I didn't measure. If you need greater efficiency than this provides, comment and I or someone else can help you with that.
Here's what I did, using the command prompt of node:
> var s = "234234= AWORDHERE('sdf.'aa')"
undefined
> var a = s.match(/(\w+)=\s*(\w+)\s*\(.*/)
undefined
> a
[ '234234= AWORDHERE(\'sdf.\'aa\')',
'234234',
'AWORDHERE',
index: 0,
input: '234234= AWORDHERE(\'sdf.\'aa\')' ]
>
As you can see, this matches the number before the = in a[1], and the AWORDHERE name you asked for in a[2]. It works with any number of spaces (including zero) after the = and before the (.
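The same pattern with ES2018 named capture groups, as a sketch: the two parts are read by name from the groups property of the match instead of by index. The group names id and name are illustrative.

```javascript
const line = "234234= AWORDHERE('sdf.'aa')";

// Named groups label the parts instead of numbering them.
const parts = line.match(/(?<id>\w+)=\s*(?<name>\w+)\s*\(/);
console.log(parts.groups.id);   // 234234
console.log(parts.groups.name); // AWORDHERE
```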

regular expression disallow in javascript

I want to match every literal letter s, but not the s of a literal \s.
(?<!\\)s
works in C#, but I'm not able to adapt it to JavaScript. How do I match every literal s in JavaScript while excluding the one in a literal \s?
For example, in the string test\ss, the s in test and the final s should match, but the s right after the backslash should not.
Edit:
As Mitch says, I want to catch every literal s that does not follow a literal \.
You can create DIY boundaries:
var r = 'test\\ss'.replace(/(^|[^\\])s/gi, '$1ş');
console.log(r); //=> 'teşt\sş'
Or use a workaround:
var r = 'test\\ss'.replace(/(\\)?s/gi, function($0,$1) { return $1 ? $0 : 'ş'; });
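In engines that implement ES2018 lookbehind assertions (current Node.js and recent versions of the major browsers), the C# pattern from the question can be used as is. A minimal sketch, assuming such an engine: unlike the DIY boundary version, which consumes the character before each s, the lookbehind tests every s independently, so a run like sss is fully replaced (the (^|[^\\])s version turns sss into şsş).

```javascript
const input = 'test\\ss'; // the characters t e s t \ s s

// Negative lookbehind: an s that is not preceded by a backslash.
const out = input.replace(/(?<!\\)s/g, 'ş');
console.log(out); // teşt\sş

// Consecutive s characters are each checked on their own.
console.log('sss'.replace(/(?<!\\)s/g, 'ş')); // şşş
```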
Based on the comment in your question, try this instead:
/(?:\B|\s)s/g
Try this in your browser's console to confirm it works:
var re = /(?:\B|\s)s/g;
var str = 'test\\ss';
var res = str.match(re);
console.log(str.replace(re, '0'));
res will have 2 results in it

Extract all email addresses from bulk text using jquery

I have the text below:
sdabhikagathara#rediffmail.com, "assdsdf" <dsfassdfhsdfarkal#gmail.com>, "rodnsdfald ferdfnson" <rfernsdfson#gmail.com>, "Affdmdol Gondfgale" <gyfanamosl#gmail.com>, "truform techno" <pidfpinfg#truformdftechnoproducts.com>, "NiTsdfeSh ThIdfsKaRe" <nthfsskare#ysahoo.in>, "akasdfsh kasdfstla" <akashkatsdfsa#yahsdfsfoo.in>, "Bisdsdfamal Prakaasdsh" <bimsdaalprakash#live.com>,; "milisdfsfnd ansdfasdfnsftwar" <dfdmilifsd.ensfdfcogndfdfatia#gmail.com>
Here, the emails are separated by , or ;.
I want to extract all emails present above and store them in array. Is there any easy way using regex to get all emails directly?
Here's how you can approach this:
HTML
<p id="emails"></p>
JavaScript
var text = 'sdabhikagathara#rediffmail.com, "assdsdf" <dsfassdfhsdfarkal#gmail.com>, "rodnsdfald ferdfnson" <rfernsdfson#gmal.com>, "Affdmdol Gondfgale" <gyfanamosl#gmail.com>, "truform techno" <pidfpinfg#truformdftechnoproducts.com>, "NiTsdfeSh ThIdfsKaRe" <nthfsskare#ysahoo.in>, "akasdfsh kasdfstla" <akashkatsdfsa#yahsdfsfoo.in>, "Bisdsdfamal Prakaasdsh" <bimsdaalprakash#live.com>,; "milisdfsfnd ansdfasdfnsftwar" <dfdmilifsd.ensfdfcogndfdfatia#gmail.com> datum eternus hello+11#gmail.com';
function extractEmails (text)
{
return text.match(/([a-zA-Z0-9._+-]+#[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/gi);
}
$("#emails").text(extractEmails(text).join('\n'));
Result
sdabhikagathara#rediffmail.com,dsfassdfhsdfarkal#gmail.com,rfernsdfson#gmal.com,gyfanamosl#gmail.com,pidfpinfg#truformdftechnoproducts.com,nthfsskare#ysahoo.in,akashkatsdfsa#yahsdfsfoo.in,bimsdaalprakash#live.com,dfdmilifsd.ensfdfcogndfdfatia#gmail.com,hello+11#gmail.com
Source: Extract email from bulk text (with Regular Expressions, JavaScript & jQuery)
You can use this regex:
var re = /(([^<>()[\]\\.,;:\s#\"]+(\.[^<>()[\]\\.,;:\s#\"]+)*)|(\".+\"))#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))/g;
You can extract the e-mails like this:
('sdabhikagathara#rediffmail.com, "assdsdf" <dsfassdfhsdfarkal#gmail.com>, "rodnsdfald ferdfnson" <rfernsdfson#gmail.com>, "Affdmdol Gondfgale" <gyfanamosl#gmail.com>, "truform techno" <pidfpinfg#truformdftechnoproducts.com>, "NiTsdfeSh ThIdfsKaRe" <nthfsskare#ysahoo.in>, "akasdfsh kasdfstla" <akashkatsdfsa#yahsdfsfoo.in>, "Bisdsdfamal Prakaasdsh" <bimsdaalprakash#live.com>,; "milisdfsfnd ansdfasdfnsftwar" <dfdmilifsd.ensfdfcogndfdfatia#gmail.com>').match(re);
//["sdabhikagathara#rediffmail.com", "dsfassdfhsdfarkal#gmail.com", "rfernsdfson#gmail.com", "gyfanamosl#gmail.com", "pidfpinfg#truformdftechnoproducts.com", "nthfsskare#ysahoo.in", "akashkatsdfsa#yahsdfsfoo.in", "bimsdaalprakash#live.com", "dfdmilifsd.ensfdfcogndfdfatia#gmail.com"]
Just an update to the accepted answer: a pattern without + in the character class before the # does not work for "plus" signs in the email address, and Gmail supports emailaddress+randomtext#gmail.com.
I've updated it to:
return text.match(/([a-zA-Z0-9._+-]+#[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/gi);
The function below is RFC 2822 compliant, according to Regexr.com.
ES5 :
var extract = function(value) {
var reg = /[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/g;
return value && value.match(reg);
}
ES6 :
const reg = /[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/g
const extract = value => value && value.match(reg)
Regexr community source
function GetEmailsFromString(input) {
var ret = [];
var email = /\"([^\"]+)\"\s+\<([^\>]+)\>/g
var match;
while (match = email.exec(input))
ret.push({'name':match[1], 'email':match[2]})
return ret;
}
var str = '"Name one" <foo#domain.com>, ..., "And so on" <andsoon#gmx.net>'
var emails = GetEmailsFromString(str)
Source
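A variant of the function above using matchAll (ES2020) and named groups, sketched under the assumption that every address is written as "Display Name" <address>; the function name getNamedEmails is hypothetical, and the sample reuses the addresses from the example above.

```javascript
function getNamedEmails(input) {
  // "Display Name" <address>: named groups keep the two parts readable.
  const rx = /"(?<name>[^"]+)"\s+<(?<email>[^>]+)>/g;
  return Array.from(input.matchAll(rx), m => ({
    name: m.groups.name,
    email: m.groups.email,
  }));
}

const pairs = getNamedEmails('"Name one" <foo#domain.com>, "And so on" <andsoon#gmx.net>');
console.log(pairs);
```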
You don't need jQuery for that; JavaScript itself has built-in regex support.
Have a look at Regular Expressions for more info on using regex with JavaScript.
Other than that, I think you'll find the exact answer to your question somewhere else on Stack Overflow - How to find out emails and names out of a string in javascript
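Building on that point, a minimal plain-JavaScript sketch (no jQuery) that also removes duplicates. The function name extractEmails and the sample string are illustrative; the pattern is a simplified one that requires a dot and an extension of at least two letters in the domain, and it assumes addresses that differ only in case count as duplicates. As elsewhere in this thread, # appears where a real address has @.

```javascript
function extractEmails(text) {
  // match returns null when nothing is found, so fall back to [].
  const found = text.match(/[a-z\d._+-]+#[a-z\d.-]+\.[a-z]{2,}/gi) || [];
  // A Set drops duplicates while keeping first-seen order.
  return [...new Set(found.map(e => e.toLowerCase()))];
}

const sample = 'a#x.com, "B" <b#y.org>; "A again" <A#X.com>';
console.log(extractEmails(sample)); // [ 'a#x.com', 'b#y.org' ]
```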
const regex = /\S+[a-z0-9]#[a-z0-9\.]+/img;
"hello sean#example.com how are you? do you know bob#example.com?".match(regex)
Several answers here include both lowercase and uppercase ranges [a-zA-Z] AND the case-insensitive flag i, which is redundant.
i modifier: case-insensitive match (ignores the case of [a-zA-Z]).
\d matches a digit (equivalent to [0-9]). (Domain extensions don't end with numeric characters.)
Combining the i flag with the \d token gives a much more condensed and elegant expression:
/[a-z\d._+-]+#[a-z\d._-]+/gi
Demo
let input = 'sdabhikagathara#rediffmail.com, "assdsdf" <dsfassdfhsdfarkal#gmail.com>, "rodnsdfald ferdfnson" <rfernsdfson#gmail.com>, "Affdmdol Gondfgale" <gyfanamosl#gmail.com>, "truform techno" <pidfpinfg#truformdftechnoproducts.com>, "NiTsdfeSh ThIdfsKaRe" <nthfsskare#ysahoo.in>, "akasdfsh kasdfstla" <akashkatsdfsa#yahsdfsfoo.in>, "Bisdsdfamal Prakaasdsh" <bimsdaalprakash#live.com>,; "milisdfsfnd ansdfasdfnsftwar" <dfdmilifsd.ensfdfcogndfdfatia#gmail.com>'
function get_email(string) {
return string.match(/[a-z\d._+-]+#[a-z\d._-]+/gi)
};
$('#output').html(get_email(input).join('; '));
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div id="output"></div>
See it live at https://regex101.com/r/OveC5B/1/

Prevent regex group from including previous character?

I'm attempting to get any words starting with #, such as in "#word", but only get the "word" value.
My sample text is:
#bob asodija qwwiq qwe #john #cat asdasd#qeqwe
My current regex is:
/\B#(\w+)/gi
This works perfectly, except that the "#" is still included in each match. The output of this match is:
"#bob"
"#john"
"#cat"
I've tried putting the # in a non-capturing group, but it's still included in the results:
/\B(?:#)(\w+)/gi
You want to use the match array returned from exec
var teststr = '#bob asodija qwwiq qwe #john #cat asdasd#qeqwe';
var exp = /\B#(\w+)/gi;
var match = exp.exec(teststr);
while(match != null){
alert(match[1]); // match 1 = 1st group captured
match = exp.exec(teststr);
}
Here's a neat trick using the String.replace method, which can take a function as the replacement.
var matches = [];
var str = "#bob asodija qwwiq qwe #john #cat asdasd#qeqwe";
str.replace( /\B#(\w+)/g, function( all, firstCaptureGroup ) {
matches.push( firstCaptureGroup );
});
console.log( matches ); //["bob", "john", "cat"]
Here is a solution that needs no extra code beyond the regular expression, using a lookbehind (supported in JavaScript since ES2018):
(?<=\B#)(\w+)
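Both ideas from this thread in runnable form, as a sketch: matchAll (ES2020) maps each match to its first capturing group, while the lookbehind pattern (ES2018) keeps the # out of the match entirely, so String.prototype.match with the g flag returns the words directly. The \B before # still rejects asdasd#qeqwe, because the position between d and # is a word boundary.

```javascript
const tagText = '#bob asodija qwwiq qwe #john #cat asdasd#qeqwe';

// Option 1: keep the capturing group and take group 1 of each match.
const viaGroups = Array.from(tagText.matchAll(/\B#(\w+)/g), m => m[1]);

// Option 2: lookbehind, so the # is never part of the match.
const viaLookbehind = tagText.match(/(?<=\B#)\w+/g);

console.log(viaGroups);     // [ 'bob', 'john', 'cat' ]
console.log(viaLookbehind); // [ 'bob', 'john', 'cat' ]
```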
