JS Regex: Remove anything (ONLY) after a word - javascript

I want to remove all of the symbols (The symbol depends on what I select at the time) after each word, without knowing what the word could be. But leave them in before each word.
A couple of examples:
!!hello! my! !!name!!! is !!bob!! should return...
!!hello my !!name is !!bob ; for !
and
$remove$ the$ targetted$# $$symbol$$# only $after$ a $word$ should return...
$remove the targetted# $$symbol# only $after a $word ; for $

You need to use capture groups and replace:
"!!hello! my! !!name!!! is !!bob!!".replace(/([a-zA-Z]+)(!+)/g, '$1');
Which works for your test string. To work for any generic character or group of characters:
var stripTrailing = trail => {
let regex = new RegExp(`([a-zA-Z0-9]+)(${trail}+)`, 'g');
return str => str.replace(regex, '$1');
};
Note that this fails on any characters that have meaning in a regular expression: []{}+*^$. etc. Escaping those programmatically is left as an exercise for the reader.
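A minimal sketch of that exercise, assuming the usual set of regex metacharacters is all that needs escaping. The names escapeRegExp and stripTrailingSafe are hypothetical and not part of the answer above:

```javascript
// Hypothetical helper: backslash-escape every character that is special in a regex.
const escapeRegExp = text => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Same idea as stripTrailing, but the symbol is escaped first and wrapped in a
// non-capturing group so a multi-character symbol repeats as a unit.
const stripTrailingSafe = trail => {
  const regex = new RegExp(`([a-zA-Z0-9]+)(?:${escapeRegExp(trail)})+`, 'g');
  return str => str.replace(regex, '$1');
};

const stripDollar = stripTrailingSafe('$');
console.log(stripDollar('$remove$ the$ targetted$# $$symbol$$# only $after$ a $word$'));
// prints "$remove the targetted# $$symbol# only $after a $word"
```

Escaping before building the RegExp means symbols such as $ or . are treated literally instead of as regex syntax.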
UPDATE
Per your comment I thought an explanation might help you, so:
First, there's no way in this case to replace only part of a match, you have to replace the entire match. So we need to find a pattern that matches, split it into the part we want to keep and the part we don't, and replace the whole match with the part of it we want to keep. So let's break up my regex above into multiple lines to see what's going on:
First we want to match any number of sequential alphanumeric characters, that would be the 'word' to strip the trailing symbol from:
( // denotes capturing group for the 'word'
[ // [] means 'match any character listed inside brackets'
a-z // list of alpha character a-z
A-Z // same as above but capitalized
0-9 // list of digits 0 to 9
]+ // plus means one or more times
)
The capturing group means we want to have access to just that part of the match.
Then we have another group
(
! // I used ES6's string interpolation to insert the arg here
+ // match that exclamation (or whatever) one or more times
)
Then we add the g flag so the replace will happen for every match in the target string, without the flag it returns after the first match. JavaScript provides a convenient shorthand for accessing the capturing groups in the form of automatically interpolated symbols, the '$1' above means 'insert contents of the first capture group here in this string'.
So, in the above, if you replaced '$1' with '$1$2' you'd see the same string you started with, if you did 'foo$2' you'd see foo in place of every word trailed by one or more !, etc.
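The three replacement strings mentioned above can be compared side by side. This is only an illustrative run on the question's first test string:

```javascript
const input = '!!hello! my! !!name!!! is !!bob!!';
const pattern = /([a-zA-Z]+)(!+)/g;

// '$1' keeps only the word and drops the trailing symbols.
const stripped = input.replace(pattern, '$1');
// '$1$2' re-inserts both groups, reproducing the original string.
const unchanged = input.replace(pattern, '$1$2');
// 'foo$2' swaps each matched word for "foo" and keeps its trailing symbols.
const swapped = input.replace(pattern, 'foo$2');

console.log(stripped);  // !!hello my !!name is !!bob
console.log(unchanged); // !!hello! my! !!name!!! is !!bob!!
console.log(swapped);   // !!foo! foo! !!foo!!! is !!foo!!
```

The word "is" is untouched in every case because it is not followed by a !, so the pattern never matches it.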

Related

Unable to match regex for any character except ' and "

I've written a regex to match against the string
{{AB.group.one}}:"eighth",{{AB.group.TWO}}:"third",{{attr1111}}:"fourth","fifth":{{attr_22_2qq2}},"sixth":{{AB.group.three}},{{ab.group.fourth}}:"seventh","ninth":{{attr1111}}}
Regex:
/[^'"]({{2}[a-zA-Z0-9$_].*?}{2})[^'"]/gi
Breaking the regex above:
[^'"]: Start with a character which is neither ' nor ".
({{2}[a-zA-Z0-9$_].*?}{2}): Have exactly 2 {{, then any character in the range a-zA-Z0-9$_ . After that, exactly 2 }}
[^'"]: Any character except for ' and ".
Below matches are not the exact matches but the captured groups. I'll perform my operations on the captured groups so for simplicity, we can consider them as our matches.
Expected matches:
{{AB.group.one}}
{{AB.group.TWO}}
{{attr1111}}
{{attr_22_2qq2}}
{{AB.group.three}}
{{ab.group.fourth}}
{{attr1111}}}
Resultant matches:
{{AB.group.TWO}}
{{attr1111}}
{{attr_22_2qq2}}
{{AB.group.three}}
{{attr1111}}}
As the lists above show, {{AB.group.one}} and {{ab.group.fourth}} do not match. I want them to match as well.
I know the reasons why they aren't matching.
The reason why {{AB.group.one}} doesn't match is because [^'"] expects one character except for ' and " and I'm not providing one. If I replace [^'"] with ["'"]*, it'll work but in that case "{{AB.group.one}}" will match as well.
So, the problem statement is match any character(if there's any) before {{ and after }} but the character can't be ' or ".
The reason why {{ab.group.fourth}} doesn't match is because the character preceding this match i.e. , is part of another match. This is just my speculation, the reason could be something else. But if I include any character between {{AB.group.three}}, and {{ab.group.fourth}} (e.g. {{AB.group.three}}, {{ab.group.fourth}}), then the pattern matches. I have no idea how can I fix this.
Please help me in solving these two problems. Thank you.
Here is a regex-based approach which seems to be working. First, we strip off all double-quoted terms, then replace islands of commas/colons with a single comma separator. Finally, we split on comma to generate an array of terms.
var input = "{{AB.group.one}}:\"eighth\",{{AB.group.TWO}}:\"third\",{{attr1111}}:\"fourth\",\"fifth\":{{attr_22_2qq2}},\"sixth\":{{AB.group.three}},{{ab.group.fourth}}:\"seventh\",\"ninth\":{{attr1111}}},\"blah\":\"stuff\",{{one}}:{{two}}";
var terms = input.replace(/\".*?\"/g, "").replace(/[,:]+/g, ",").split(",");
console.log(terms);
You were actually really close with what you had.
let input = '{{AB.group.one}}:"eighth",{{AB.group.TWO}}:"third",{{attr1111}}:"fourth","fifth":{{attr_22_2qq2}},"sixth":{{AB.group.three}},{{ab.group.fourth}}:"seventh","ninth":{{attr1111}}}'
let regex = /(?<=[^'"]?)({{2}[a-zA-Z0-9$_].*?}{2})(?=[^'"]?)/gi;
console.log(input.match(regex))
(?<=[^'"]?) is a positive lookbehind on a negated character set, so it looks for a character before the match that is not ' or ". The question mark makes that character optional (zero or one of the previous token), which lets the expression match at the very start of the string. Because zero characters always satisfy it, this lookbehind on its own does not reject a preceding quote.
(?=[^'"]?) is a positive lookahead that applies the same optional check to the token immediately after the expression, so it also succeeds when there is no token after the expression.
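If quotes directly next to the braces must be rejected outright, negative lookarounds are one way to express that. A sketch, assuming lookbehind support in the runtime and that placeholders never contain braces themselves; the sample string is made up for illustration:

```javascript
// (?<!['"]) : the character before {{ must not be a quote (start of input is fine)
// (?!['"])  : the character after }} must not be a quote (end of input is fine)
const placeholder = /(?<!['"])\{\{[a-zA-Z0-9$_][^{}]*\}\}(?!['"])/g;

const sample = '{{a.b}}:"x","{{quoted}}":{{c}},{{d}}';
const found = sample.match(placeholder);
console.log(found); // [ '{{a.b}}', '{{c}}', '{{d}}' ]
```

Lookarounds consume no characters, so the comma between {{c}} and {{d}} is not used up by the first match and both are found.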
Another option, since lookbehinds aren't supported in every browser:
let input = '{{AB.group.one}}:"eighth",{{AB.group.TWO}}:"third",{{attr1111}}:"fourth","fifth":{{attr_22_2qq2}},"sixth":{{AB.group.three}},{{ab.group.fourth}}:"seventh","ninth":{{attr1111}}}'
let regex = /(?:[^{'"])?({{2}[a-zA-Z0-9$_].*?}{2})(?:[^}'"])?/gi
console.log([...input.matchAll(regex)].map(reg => reg[1]))
String.match() loses reference to capture groups when the global flag is passed, so only returns the "match". Since you're creating a capture group with ({{2}[a-zA-Z0-9$_].*?}{2}), if you wanted to just ensure the characters immediately surrounding the bracketed expression aren't quotation marks, you can just use non-capture groups for those optional checks.
(?:[^{'"])? is a non-capturing group, as is (?:[^}'"])?
Using String.matchAll, the first element of the arrays created for each match is the entire match, the second element is the first capturing group, etc. So the logic for mapping over [...input.matchAll(regex)] is just to collect the capturing group from each match.

Cut prefix and suffix from word by regex

can you help me write regex which gives me word without specified prefix and suffix?
Every word starts with dot (.) and ends with 'Zacher', e.g:
.mobileZacher => output should be mobile
.carZacher => output should be car
.StevenZacher => output should be Steven
I tried str.replace(/(?:.)|(?:Zacher)/, '') but it replaces only the dot.
Just try with following regex:
str.replace(/\.(.+?)Zacher/, '$1')
We're looking for a dot character, then matching everything up to the first occurrence of Zacher, and replacing the whole match with the string between them.
You can also replace (.+?) part (which accepts any char) with ([a-zA-Z]+?) to match only letters.
Or make it even case insensitive with i:
str.replace(/\.([a-z]+?)Zacher/i, '$1')
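A small sketch applying the same idea to the three sample words. The helper name stripAffixes is hypothetical, and the ^ and $ anchors are an added assumption that each string holds exactly one word:

```javascript
// Hypothetical helper: strip a leading dot and a trailing "Zacher" from one word.
// Strings that do not fit the pattern are returned unchanged.
const stripAffixes = word => word.replace(/^\.(.+)Zacher$/, '$1');

const cleaned = ['.mobileZacher', '.carZacher', '.StevenZacher'].map(stripAffixes);
console.log(cleaned); // [ 'mobile', 'car', 'Steven' ]
```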
I would extract the group between . and Zacher using this RegEx:
\.(.*)Zacher
The backslash is used to escape the . character.
It basically tells RegEx not to interpret the . as a wildcard character (its standard function in RegEx) but as a literal ".".
Then I'd use it in a string replace.
Since we want to extract the 1st (and only) group extracted we'll use $1:
str.replace(/\.(.*)Zacher/, '$1')
If you want to know more, this kind of result is obtained using RegEx grouping.
Grouping syntax makes use of parentheses: (something_in_here).
Here's a brief explanation from Mozilla Documentation:
(x) Matches x and remembers the match. These are called capturing groups.
For example, /(foo)/ matches and remembers "foo" in "foo bar".
The capturing groups are numbered according to the order of left parentheses of capturing groups, starting from 1. The matched substring can be recalled from the resulting array's elements [1], ..., [n] or from the predefined RegExp object's properties $1, ..., $9.
Capturing groups have a performance penalty. If you don't need the matched substring to be recalled, prefer non-capturing parentheses (see below).
I suggest you experiment with your RegEx using RegExr.
If you want to learn more while doing exercises, RegExOne was of great help for me.

JavaScript RegExp all characters except dynamic series

So, I'm working on an open-source project as a way to expand my knowledge of JavaScript, and created a utility that processes strings dynamically and replaces specific occurrences with other strings.
An example of this would be the following:
jdhfkjhs${c1}kdfjh$%^%$S654sgdsjh${c20}SUYTDRF^%$&*#(Y
And assuming I select the character '#', the RegExp processes it to be:
########${c1}####################${c20}###############
The problem I am facing is my RegExp /[^\$\{c\d\}]/g is also matching any of the characters inside of the RegExp, so a string such as _,met$$$$$1234{}cccgg. will be returned as #####$$$$$1234{}ccc###
Is there a way I can catch such a dynamic group with JavaScript, or should I find an alternative way to achieve what I am doing?
For some context, the project code can be found here.
You may match the group and capture it to restore later, and just match any char (with . if no line breaks are expected or with [^] / [\s\S]):
var rx = /(\${c\d+})|./g;
var str = 'jdhfkjhs\${c1}kdfjh\$%^%\$S654sgdsjh\${c20}SUYTDRF^%\$&*#(Y';
var result = str.replace(rx, function ($0,$1) {
return $1 ? $1 : '#';
});
console.log(result);
Details:
(\${c\d+}) - Group 1: a literal ${c substring, then 1+ digits and a literal }
| - or
. - any char but a line break char (or any char if you use [^] or [\s\S]).
In the replacement, $0 stands for the whole match, $1 stands for the contents of the first capturing group. If the $1 is set, it is re-inserted to the resulting string, else, the char is replaced with #.
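The same capture-and-restore technique can be wrapped in a reusable function. This is a sketch only: maskExceptPlaceholders is a hypothetical name, and it uses [\s\S] rather than . so that line breaks are masked too:

```javascript
// Hypothetical helper: replace every character with maskChar, except ${cN} placeholders.
// Group 1 captures a placeholder; when it is undefined, a single other character matched.
const maskExceptPlaceholders = (text, maskChar) =>
  text.replace(/(\$\{c\d+\})|[\s\S]/g, (match, placeholder) => placeholder || maskChar);

console.log(maskExceptPlaceholders('_,met$$$$$1234{}cccgg.', '#'));
console.log(maskExceptPlaceholders('ab${c1}cd${c20}', '*')); // **${c1}**${c20}
```

Using a replacer function rather than a replacement string means a mask character such as $ is inserted literally.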

regex exact match multiple search words using jQuery

I'm using jQuery. I have to check whether a given list of words appears in a paragraph. I want an exact match of a word or a phrase (whole word match), i.e., if I search for 'be' in 'Be a bee', there is only one match. I have done it like this.
var searchText="tool,media,be,team";
var regexExactMatch = new RegExp('\^' + searchText.split(",").join("|") + '\$');
if (regexExactMatch.test(item.Name))
{
//Found
}
It is working for one search term, ie, without any comma (eg: media).
But for comma separated search, it will break.
How do I do an exact-match search for multiple search terms? I'm very new to regex. I also have to do the same search for integers and dates (MM/dd/yyyy). Thanks in advance.
For full input string match use
new RegExp('^(?:' + searchText.split(",").join("|") + ')$');
             ^^^                                       ^
For a whole word search, replace ^ and $ with \b:
new RegExp('\\b(?:' + searchText.split(",").join("|") + ')\\b');
Otherwise, the anchors are applied respectively to the first and last alternatives only (i.e. your regex will look like /^tool|media|be|team$/ looking for tool at the beginning only, media and be anywhere in the string and team only at the end of the string).
Note I am using (?:...) non-capturing group since grouping is only necessary here, not capturing (no storing of the submatch). If you need to access the matched text, you can access the 0th group that equals the whole match.
Also, you do not need those backslashes before ^ and $. They are not necessary at all and are ignored in the constructor notation, since there are no string escape sequences like \^ and \$.
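A sketch of the whole-word variant as a reusable function. The names escapeTerm and buildWordMatcher are hypothetical, and the i flag is an assumption based on the question treating 'Be' as a match for 'be':

```javascript
// Hypothetical helper: escape regex metacharacters in a search term.
const escapeTerm = term => term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Build a case-insensitive whole-word matcher from a comma-separated list.
const buildWordMatcher = searchText =>
  new RegExp('\\b(?:' + searchText.split(',').map(escapeTerm).join('|') + ')\\b', 'gi');

const matcher = buildWordMatcher('tool,media,be,team');
console.log('Be a bee'.match(matcher));  // [ 'Be' ]
console.log('A beehive'.match(matcher)); // null
```

Note that \b only behaves as a word boundary when a term starts and ends with a word character (letter, digit or underscore), which holds for plain words, integers and MM/dd/yyyy dates.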
Remove the ^ from the beginning and $ from the end of the RegExp. Like this :
var regexExactMatch = new RegExp(searchText.split(",").join("|"));
Reason
^ sets the condition that the matched text needs to be at the beginning of the string, and $ sets the condition that the matched text needs to be at the end of the string, which can only happen if that text is the only thing in the string.

Regexp: excluding a word but including non-standard punctuation

I want to find strings that contain words in a particular order, allowing non-standard characters in between the words but excluding a particular word or symbol.
I'm using javascript's replace function to find all instances and put into an array.
So, I want select...from, with anything except 'from' in between the words. Or I can separate select...from from select...from (, as long as I exclude nesting. I think the answer is the same for both, i.e. how do I write: find x and not y within the same regexp?
From the internet, I feel this should work: /\bselect\b^(?!from).*\bfrom\b/gi but this finds no matches.
This works to find all select...from: /\bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b/gi but modifying it to exclude the parenthesis "(" at the end prevents any matches: /\bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b\s*^\(/gi
Can anyone tell me how to exclude words and symbols within this regexp?
Many thanks
Emma
Edit: partial string input:
left outer join [stage].[db].[table14] o on p.Project_id = o.project_id
left outer join
(
select
different_id
,sum(costs) - ( sum(brushes) + sum(carpets) + sum(fabric) + sum(other) + sum(chairs)+ sum(apples) ) as overallNumber
from
(
select ace from [stage].db.[table18] J
Javascript:
sequel = stringInputAsAbove;
var tst = sequel.replace(/\bselect\b[\s\S]*?\bfrom\b/gi, function(a,b) { console.log('match: '+a); selects.push(b); return a; });
console.log(selects);
Console.log(selects) should print an array of numbers, where each number is the starting character of a select...from. This works for the second regexp I gave in my info, printing: [95, 251]. Your \s\S variation does the same, #stribizhev.
The first example ^(?!from).* should do likewise but returns [].
The third example \s*^\( should return 251 only but returns []. However I have just noticed that the positive expression \s*\( does give 95, so some progress! It's the negatives I'm getting wrong.
Your \bselect\b^(?!from).*\bfrom\b regex doesn't work as expected because:
^ here means the beginning of a line, not negation of the next part, so \bselect\b^ means the word select followed by the beginning of a line. After removing ^ the regex starts to match something (DEMO), but it is still invalid.
In multiline text, .* without modification will not match a newline, so the regex will only match select...from within a single line. If you change it to (.|\n)* (as a simple example) it will match across lines, but the result is still invalid.
* is a greedy quantifier, so it will match as much as possible. If you use the reluctant quantifier *?, the regex will match up to the first occurrence of the word from, and it will start to return relatively correct results.
\bselect\b(?!from) means: match the separate word select when it is not directly followed by from. But select directly followed by from would be "selectfrom", which cannot have a word boundary in the middle (select\bfrom never matches), so (?!from) never rejects anything here and is redundant.
In effect you will get regex very similar to what Stribizhev gave you: \bselect\b(.|\n)*?\bfrom\b
In the third expression you make the same mistake: \bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b\s*^\( uses ^ as (I assume) a negation, not the beginning of a line. Remove ^ and you will again get a relatively valid result (a match from select through from to the opening parenthesis ( ).
Your second regex works similar to \bselect\b(.|\n)*?\bfrom\b or \bselect\b[\s\S]*?\bfrom\b.
I wrote "relatively valid result" because I also think that parsing SQL with regex can be very complicated, so I am not sure it will work in every case.
You can also try to use positive lookahead to match just position in text, like:
(?=\bselect\b(?:.|\n)*?\bfrom\b)
DEMO - the () was added to the regex just to return the beginning index of each match in the groups, so it is easier to check its validity.
Negation in regex
We use ^ as negation in a character class; for example, [^a-z] means match anything but a letter, so it will match a digit, symbol, whitespace, etc., but not a letter from a to z (Look here). But this negation works at the level of a single character. If you use [^from] it will prevent the regex from matching the characters f, r, o and m (demo). Likewise, [^from]{4} will avoid matching from but also form, morf, etc.
To exclude a whole word from matching, you need to use a negative lookahead, like (?!from), which will fail to match if the chosen word, from, follows the given position. To avoid matching a whole line containing from you could use ^(?!.*from.*).+$ (demo).
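The difference between the two kinds of negation can be checked directly. A short sketch with made-up test strings:

```javascript
// [^from] is a character class: it rejects the single characters f, r, o and m.
const classRejectsForm = /^[^from]+$/.test('form');     // false
const classAllowsSelect = /^[^from]+$/.test('select');  // true

// (?!from) is a negative lookahead: it rejects the whole word at that position.
const lookaheadRejectsFrom = /^(?!from)\w+$/.test('from'); // false
const lookaheadAllowsForm = /^(?!from)\w+$/.test('form');  // true

// With the m flag, ^(?!.*from).+$ keeps only the lines that do not contain "from".
const lines = 'select a\nfrom t\nwhere x'.match(/^(?!.*from).+$/gm);
console.log(lines); // [ 'select a', 'where x' ]
```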
However, in your case you don't need this construction, because if you replace the greedy quantifier in .*\bfrom with the reluctant .*?\bfrom it will match up to the first occurrence of this word. What's more, it would cause problems. Take a look at this regex: it will not match anything because (?![\s\S]*from[\s\S]*) is not restricted by anything, so it will match only if there is no from after select, but we also want to match from! In effect this regex tries to match and exclude from at once, and fails. So the (?!.*word.*) construction works much better for excluding a whole line that contains a given word.
So what to do if we don't want a word to appear inside a fragment of a match? I think select\b([^f]|f(?!rom))*?\bfrom\b is a good solution. ([^f]|f(?!rom))*? matches everything between select and from one character at a time and can never step over a from, while the final \bfrom\b itself is still matched.
But if you would like to match only a select...from that is not followed by (, then it is a good idea to use (?!\(). In your regex, however (multiline, using (.|\n)*? or [\s\S]*?), it will cause the match to extend up to the next select...from part, because the reluctant quantifier will change the place where it stops in order to make the whole regex match. In my opinion, a good solution would again be to use:
select\b([^f]|f(?!rom))*?\bfrom\b(?!\s*?\()
which will not overlap additional select..from and will not match if there is \( after select...from - check it here
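To see the effect of that pattern, here is a sketch run against a simplified, made-up SQL string (not the asker's input), with the g and i flags added so every occurrence is found:

```javascript
const sql = 'select a from t1 left join ( select b from ( select c from t2 ) )';

// The pattern from above: the middle part can never step over a "from",
// and the trailing lookahead rejects a "from" that is followed by "(".
const selectFrom = /select\b([^f]|f(?!rom))*?\bfrom\b(?!\s*?\()/gi;

// matchAll yields one result per match; index is where the match starts.
const hits = [...sql.matchAll(selectFrom)].map(m => ({ index: m.index, text: m[0] }));
console.log(hits);
```

In this sample the first and third select...from are reported, and the second one, whose from is followed by (, is skipped without the match spilling over into the third.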
