Regex between two text elements without capturing those elements? Javascript - javascript

So, I'm writing a google script, and I've been using the regex101 tool to troubleshoot it, but I'm having absolutely no luck.
Here's the regex I need:
Is there a way in JS to isolate "randomstuff" from "myemail+randomstuff#gmail.com" without getting the + or # included in the returned result?
I've tried using my more limited regex skills to do so, but the end result always includes '+' and '#'.

You can use the match function with captured groups as
matched = "myemail+randomstuff#gmail.com".match(/\+(.*?)#/);
matched[1]
// Outputs
// randomstuff
\+ Matches +
(.*?) Matches anything, captured in group 1
# Matches #
matched[1] The match will return an Array containing the matched results. The contentes of the array will be the captured groups in the regex match along( which are indexed by the the capture group number. Here the group is one hence we used matched[1]) and the entire match( indexed by 0)

var originalString = "myemail+randomstuff#gmail.com";
/\+([^#]+)#/.exec(originalString )[1];
This will return
"randomstuff"

I used this regex http://regexr.com/3b1fa
Here's a fiddle: http://jsfiddle.net/rt4h215g/
/\w+(?=#)/g
the (?=#) uses positive look ahead to check and see when the # symbol presents itself without including it in the match. Since + isn't considered a word character you can simply use /w+ to get the characters before the # symbol and after the +.

Related

Is it possible to replace only in a match group - REGEX

I have always tried to avoid regex because I simply can't get my head around how it really works. Most of the time I manage to get the expected result by luck more than actual skill.
However, I am trying to replace any whitespace character in a bundled webpack source with the string-replace-loader or the String-Replace-Plugin (which ever turns out easier). But before I try to do this on the actual source, I want to understand the regex which I am trying to perform.
The problem
I have query strings which always start with dqlParse followed by \n then maybe some \t and other whitespace characters. I have already managed to get my whitespace characters removed in a test string if I match this
/\s+\s/g
and simply replace it with " ".
Since I don't have control over all the strings within my bundle, I thought I can indicate which string is set for replacement by adding dqlParse infront of the string and then match and replace by groups. Unfortunately no luck so far.
What I have tried
So far I have tried something like this
/(^dqlParse)(.*)/g
which basically does what it should since match group $1 is dqlParse and match group $2 is the rest of the string where I would like to do the replacement.
Is it possible to replace only in the second match group?
Thanks! Any help appreciated!
Yes, you can do that with String#replace:
text = text.replace(/^(dqlParse)(.*)/g, function(_, x, y) {return x + y.replace(/\s{2,}/g, ' ');})
This will match and capture dqlParse into Group 1 (x variable in the callback function), and the rest of the line will get captured into Group 2 (y in the callback function). So, once the match is found, the replacement will be the concatenation of x and y with all two or more whitespace chunks replaced with a single space.

Unable to match regex for any character except ' and "

I've written a regex to match against the string
{{AB.group.one}}:"eighth",{{AB.group.TWO}}:"third",{{attr1111}}:"fourth","fifth":{{attr_22_2qq2}},"sixth":{{AB.group.three}},{{ab.group.fourth}}:"seventh","ninth":{{attr1111}}}
Regex:
/[^'"]({{2}[a-zA-Z0-9$_].*?}{2})[^'"]/gi
Breaking the regex above:
[^'"]: Start with a character which is neither ' nor ".
({{2}[a-zA-Z0-9$_].*?}{2}): Have exactly 2 {{, then any character in the range a-zA-Z0-9$_ . After that, exactly 2 }}
[^'"]: Any character except for ' and ".
Below matches are not the exact matches but the captured groups. I'll perform my operations on the captured groups so for simplicity, we can consider them as our matches.
Expected matches:
{{AB.group.one}}
{{AB.group.TWO}}
{{attr1111}}
{{attr_22_2qq2}}
{{AB.group.three}}
{{ab.group.fourth}}
{{attr1111}}}
Resultant matches:
{{AB.group.TWO}}
{{attr1111}}
{{attr_22_2qq2}}
{{AB.group.three}}
{{attr1111}}}
As you can see in the image below {{AB.group.one}} and {{ab.group.fourth}} do not match. I want them to match them as well.
I know the reasons why they aren't matching.
The reason why {{AB.group.one}} doesn't match is because [^'"] expects one character except for ' and " and I'm not providing one. If I replace [^'"] with ["'"]*, it'll work but in that case "{{AB.group.one}}" will match as well.
So, the problem statement is match any character(if there's any) before {{ and after }} but the character can't be ' or ".
The reason why {{ab.group.fourth}} doesn't match is because the character preceding this match i.e. , is part of another match. This is just my speculation, the reason could be something else. But if I include any character between {{AB.group.three}}, and {{ab.group.fourth}} (e.g. {{AB.group.three}}, {{ab.group.fourth}}), then the pattern matches. I have no idea how can I fix this.
Please help me in solving these two problems. Thank you.
Here is a regex based approach which seems to be working. First, we can string off all double-quoted terms, then replace islands of comma/colon with just a single comma separator. Finally, split on comma to generate an array of terms.
var input = "{{AB.group.one}}:\"eighth\",{{AB.group.TWO}}:\"third\",{{attr1111}}:\"fourth\",\"fifth\":{{attr_22_2qq2}},\"sixth\":{{AB.group.three}},{{ab.group.fourth}}:\"seventh\",\"ninth\":{{attr1111}}},\"blah\":\"stuff\",{{one}}:{{two}}";
var terms = input.replace(/\".*?\"/g, "").replace(/[,:]+/g, ",").split(",");
console.log(terms);
You were actually really close with what you had.
let input = '{{AB.group.one}}:"eighth",{{AB.group.TWO}}:"third",{{attr1111}}:"fourth","fifth":{{attr_22_2qq2}},"sixth":{{AB.group.three}},{{ab.group.fourth}}:"seventh","ninth":{{attr1111}}}'
let regex = /(?<=[^'"]?)({{2}[a-zA-Z0-9$_].*?}{2})(?=[^'"]?)/gi;
console.log(input.match(regex))
(?<=[^'"]?) is a positive lookbehind. Since the negated character set is used, we're checking that the character before the match is not ' or ". The question mark makes this optional - match zero or one of the previous token (the negated character set).
(?=[^'"]?) is a positive lookahead and checks the token immediately after the expression to ensure that it's not a ' or " (or that there is no token after the expression).
Another option, since lookbehinds aren't supported in every browser:
let input = '{{AB.group.one}}:"eighth",{{AB.group.TWO}}:"third",{{attr1111}}:"fourth","fifth":{{attr_22_2qq2}},"sixth":{{AB.group.three}},{{ab.group.fourth}}:"seventh","ninth":{{attr1111}}}'
let regex = /(?:[^{'"])?({{2}[a-zA-Z0-9$_].*?}{2})(?:[^}'"])?/gi
console.log([...input.matchAll(regex)].map(reg => reg[1]))
String.match() loses reference to capture groups when the global flag is passed, so only returns the "match". Since you're creating a capture group with ({{2}[a-zA-Z0-9$_].*?}{2}), if you wanted to just ensure the characters immediately surrounding the bracketed expression aren't quotation marks, you can just use non-capture groups for those optional checks.
(?:[^{'"])? is a non-capturing group, as is (?:[^}'"])?
Using String.matchAll, the first element of the arrays created for each match is the entire match, the second element is the first capturing group, etc. So the logic for mapping over [...input.matchAll(regex)] is just to collect the capturing group from each match.

JS Regex: Remove anything (ONLY) after a word

I want to remove all of the symbols (The symbol depends on what I select at the time) after each word, without knowing what the word could be. But leave them in before each word.
A couple of examples:
!!hello! my! !!name!!! is !!bob!! should return...
!!hello my !!name is !!bob ; for !
and
$remove$ the$ targetted$# $$symbol$$# only $after$ a $word$ should return...
$remove the targetted# $$symbol# only $after a $word ; for $
You need to use capture groups and replace:
"!!hello! my! !!name!!! is !!bob!!".replace(/([a-zA-Z]+)(!+)/g, '$1');
Which works for your test string. To work for any generic character or group of characters:
var stripTrailing = trail => {
let regex = new RegExp(`([a-zA-Z0-9]+)(${trail}+)`, 'g');
return str => str.replace(regex, '$1');
};
Note that this fails on any characters that have meaning in a regular expression: []{}+*^$. etc. Escaping those programmatically is left as an exercise for the reader.
UPDATE
Per your comment I thought an explanation might help you, so:
First, there's no way in this case to replace only part of a match, you have to replace the entire match. So we need to find a pattern that matches, split it into the part we want to keep and the part we don't, and replace the whole match with the part of it we want to keep. So let's break up my regex above into multiple lines to see what's going on:
First we want to match any number of sequential alphanumeric characters, that would be the 'word' to strip the trailing symbol from:
( // denotes capturing group for the 'word'
[ // [] means 'match any character listed inside brackets'
a-z // list of alpha character a-z
A-Z // same as above but capitalized
0-9 // list of digits 0 to 9
]+ // plus means one or more times
)
The capturing group means we want to have access to just that part of the match.
Then we have another group
(
! // I used ES6's string interpolation to insert the arg here
+ // match that exclamation (or whatever) one or more times
)
Then we add the g flag so the replace will happen for every match in the target string, without the flag it returns after the first match. JavaScript provides a convenient shorthand for accessing the capturing groups in the form of automatically interpolated symbols, the '$1' above means 'insert contents of the first capture group here in this string'.
So, in the above, if you replaced '$1' with '$1$2' you'd see the same string you started with, if you did 'foo$2' you'd see foo in place of every word trailed by one or more !, etc.

regex exact match multiple search words using jQuery

I'm using jQuery. I have to check if a given list of words are in a paragraph or not. I want the exact match of a word or a phrase(whole word match).ie, if i search for 'be' in 'Be a bee', only one match is there. I have done like this.
var searchText="tool,media,be,team";
var regexExactMatch = new RegExp('\^' + searchText.split(",").join("|") + '\$');
if (regexExactMatch.test(item.Name))
{
//Found
}
It is working for one search term, ie, without any comma (eg: media).
But for comma separated search, it will break.
How to do a exact match search for multiple search terms. I'm very very new to regex. Also I have to do the same search for integers and date (MM/dd/yyyy). Thanks in advance.
For full input string match use
new RegExp('^(?:' + searchText.split(",").join("|") + ')$');
^^^ ^
For a whole word search, replace ^ and $ with \b:
new RegExp('\\b(?:' + searchText.split(",").join("|") + ')\\b');
Otherwise, the anchors are applied respectively to the first and last alternatives only (i.e. your regex will look like /^tool|media|be|team$/ looking for tool at the beginning only, media and be anywhere in the string and team only at the end of the string).
Note I am using (?:...) non-capturing group since grouping is only necessary here, not capturing (no storing of the submatch). If you need to access the matched text, you can access the 0th group that equals the whole match.
Also, you do not need those \s before ^ and $, they are not necessary at all and are ignored in the constructor notation since there are no escape sequences like \^ and \$.
Remove the ^ from the beginning and $ from the end of the RegExp. Like this :
var regexExactMatch = new RegExp(searchText.split(",").join("|"));
Reason
^ will set the condition that the matched text need to be at the beginning of the string and $ set the condition that the matched text need to be at the end of the string, which can only happen if there is only that text in the string.

Regexp: excluding a word but including non-standard punctuation

I want to find strings that contain words in a particular order, allowing non-standard characters in between the words but excluding a particular word or symbol.
I'm using javascript's replace function to find all instances and put into an array.
So, I want select...from, with anything except 'from' in between the words. Or I can separate select...from from select...from (, as long as I exclude nesting. I think the answer is the same for both, i.e. how do I write: find x and not y within the same regexp?
From the internet, I feel this should work: /\bselect\b^(?!from).*\bfrom\b/gi but this finds no matches.
This works to find all select...from: /\bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b/gi but modifying it to exclude the parenthesis "(" at the end prevents any matches: /\bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b\s*^\(/gi
Can anyone tell me how to exclude words and symbols within this regexp?
Many thanks
Emma
Edit: partial string input:
left outer join [stage].[db].[table14] o on p.Project_id = o.project_id
left outer join
(
select
different_id
,sum(costs) - ( sum(brushes) + sum(carpets) + sum(fabric) + sum(other) + sum(chairs)+ sum(apples) ) as overallNumber
from
(
select ace from [stage].db.[table18] J
Javascript:
sequel = stringInputAsAbove;
var tst = sequel.replace(/\bselect\b[\s\S]*?\bfrom\b/gi, function(a,b) { console.log('match: '+a); selects.push(b); return a; });
console.log(selects);
Console.log(selects) should print an array of numbers, where each number is the starting character of a select...from. This works for the second regexp I gave in my info, printing: [95, 251]. Your \s\S variation does the same, #stribizhev.
The first example ^(?!from).* should do likewise but returns [].
The third example \s*^\( should return 251 only but returns []. However I have just noticed that the positive expression \s*\( does give 95, so some progress! It's the negatives I'm getting wrong.
Your \bselect\b^(?!from).*\bfrom\b regex doesn't work as expected because:
^ means here beginning of a line, not negation of next part, so
the \bselect\b^ means, select word followed by beginning of a
line. After removal of ^ regex start to match something
(DEMO) but it is still invalid.
in multiline text .* without modification will not match new line,
so regex will match only select...from in single lines, but if you
change it for (.|\n)* (as a simple example) it will match
multiline, but still invalid
the * is greede quantifire, so it will match as much a possible,
but if you use reluctant quantifire *?, regex will match to first
occurance of from word, and int will start to return relativly
correct result.
\bselect\b(?!from) means match separate select word which is not
directly followed by separate from word, so it would be
selectfrom somehow composed of separate words (because
select\bfrom) so (?!from) doesn't work and it is redundant
In effect you will get regex very similar to what Stribizhev gave you: \bselect\b(.|\n)*?\bfrom\b
In third expression you meke same mistake: \bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b\s*^\( using ^ as (I assume) a negation, not beginning of a line. Remove ^ and you will again get relativly valid result (match from select through from to closing parathesis ) ).
Your second regex works similar to \bselect\b(.|\n)*?\bfrom\b or \bselect\b[\s\S]*?\bfrom\b.
I wrote "relativly valid result", as I also think, that parsing SQL with regex could be very camplicated, so I am not sure if it will work in every case.
You can also try to use positive lookahead to match just position in text, like:
(?=\bselect\b(?:.|\n)*?\bfrom\b)
DEMO - the () was added to regex just to return beginning index of match in groups, so it would be easier to check it validity
Negation in regex
We use ^ as negation in character class, for example [^a-z] means match anything but not letter, so it will match number, symbol, whitespace, etc, but not letter from range a to z (Look here). But this negation is on a level of single character. I you use [^from] it will prevent regex from matching characters f,r,o and m (demo). Also the [^from]{4} will avoid matching from but also form, morf, etc.
To exlude the whole word from matching by regex, you need to use negative look ahead, like (?!from), which will fail to match, if there will be chosen word from fallowing given position. To avoid matching whole line containing from you could use ^(?!.*from.*).+$ (demo).
However in your case, you don't need to use this construction, because if you replace greedy quantifire .*\bfrom with .*?\bfrom it will match to first occurance of this word. Whats more it would couse problems. Take a look on this regex, it will not match anything because (?![\s\S]*from[\s\S]*) is not restricted by anything, so it will match only if there is no from after select, but we want to match also from! in effect this regex try to match and exclude from at once, and fail. so the (?!.*word.*) construction works much better to exclude matching line with given word.
So what to do if we don't what to match a word in a fragment of a match? I think select\b([^f]|f(?!rom))*?\bfrom\b is a good solution. With ([^f]|f(?!rom))*? it will match everything between select and from, but will not exclude from.
But if you would like to match only select...from not followed by ( then it is good idea to use (?!\() like. But in your regex (multiline, use of (.|\n)*? or [\s\S]*? it will cause to match up to next select...from part, because reluctant quantifire will chenge a plece where it need to match to make whole regex . In my opinion, good solution would be to use again:
select\b([^f]|f(?!rom))*?\bfrom\b(?!\s*?\()
which will not overlap additional select..from and will not match if there is \( after select...from - check it here

Categories

Resources