Cut prefix and suffix from word by regex - javascript

Can you help me write a regex that returns a word without its specified prefix and suffix?
Every word starts with a dot (.) and ends with 'Zacher', e.g.:
.mobileZacher => output should be mobile
.carZacher => output should be car
.StevenZacher => output should be Steven
I tried str.replace(/(?:.)|(?:Zacher)/, '') but it replaces only the dot.

Try the following regex:
str.replace(/\.(.+?)Zacher/, '$1')
We look for a dot character, then lazily match everything up to the first occurrence of Zacher, and replace the whole match with the string captured between the two.
You can also replace the (.+?) part (which accepts any character) with ([a-zA-Z]+?) to match only letters.
Or make it case-insensitive with the i flag:
str.replace(/\.([a-z]+?)Zacher/i, '$1')
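As a quick check of the patterns above, here is a minimal sketch run against the three sample inputs from the question. The helper name stripDotAndZacher and the added ^ and $ anchors are assumptions for illustration, not part of the original answer.

```javascript
// Hypothetical helper: strips a leading "." and a trailing "Zacher".
function stripDotAndZacher(str) {
  // ^ and $ (added here as an assumption) anchor the pattern so that only a
  // string of the exact form ".<word>Zacher" is rewritten.
  return str.replace(/^\.(.+?)Zacher$/, '$1');
}

const samples = ['.mobileZacher', '.carZacher', '.StevenZacher'];
console.log(samples.map(s => stripDotAndZacher(s)));
// → [ 'mobile', 'car', 'Steven' ]
```

Strings that do not fit the pattern are returned unchanged, because replace only rewrites what the regex matches.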

I would extract the group between . and Zacher using this RegEx:
\.(.*)Zacher
The backslash escapes the . character.
It tells the regex engine to treat the . as a literal "." rather than as a wildcard that matches any character (its standard meaning in a regex).
Then I'd use it in a string replace.
Since we want the first (and only) captured group, we use $1:
str.replace(/\.(.*)Zacher/, '$1')
If you want to know more, this kind of result is obtained with the regex grouping feature.
Grouping syntax uses parentheses: (something_in_here).
Here's a brief explanation from the Mozilla documentation:
(x) Matches x and remembers the match. These are called capturing groups.
For example, /(foo)/ matches and remembers "foo" in "foo bar".
The capturing groups are numbered according to the order of left parentheses of capturing groups, starting from 1. The matched substring can be recalled from the resulting array's elements [1], ..., [n] or from the predefined RegExp object's properties $1, ..., $9.
Capturing groups have a performance penalty. If you don't need the matched substring to be recalled, prefer non-capturing parentheses (see below).
I suggest you experiment with your regex using RegExr.
If you want to learn more while doing exercises, RegExOne was of great help to me.
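To make the quoted description of capturing groups concrete, the sketch below inspects the array returned by exec for the pattern from this answer. The variable names are illustrative only.

```javascript
// Element 0 is the whole match; element 1 is the first capturing group.
const match = /\.(.*)Zacher/.exec('.mobileZacher');
console.log(match[0]); // → .mobileZacher
console.log(match[1]); // → mobile

// With a non-capturing group (?:...) nothing is remembered,
// so the result array holds only the whole match.
const nonCapturing = /\.(?:.*)Zacher/.exec('.mobileZacher');
console.log(nonCapturing.length); // → 1
```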

Related

JS Regex: Remove anything (ONLY) after a word

I want to remove all of the symbols (The symbol depends on what I select at the time) after each word, without knowing what the word could be. But leave them in before each word.
A couple of examples:
!!hello! my! !!name!!! is !!bob!! should return...
!!hello my !!name is !!bob ; for !
and
$remove$ the$ targetted$# $$symbol$$# only $after$ a $word$ should return...
$remove the targetted# $$symbol# only $after a $word ; for $
You need to use capture groups and replace:
"!!hello! my! !!name!!! is !!bob!!".replace(/([a-zA-Z]+)(!+)/g, '$1');
Which works for your test string. To work for any generic character or group of characters:
var stripTrailing = trail => {
  let regex = new RegExp(`([a-zA-Z0-9]+)(${trail}+)`, 'g');
  return str => str.replace(regex, '$1');
};
Note that this fails on any characters that have meaning in a regular expression: []{}+*^$. etc. Escaping those programmatically is left as an exercise for the reader.
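One possible way to handle the escaping mentioned above is sketched here. The escapeRegExp helper is an assumption added for illustration (JavaScript had no built-in for this when the answer was written), and the non-capturing group around the escaped text is a small change so that a multi-character trail is repeated as a unit.

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter in a string.
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

const stripTrailing = trail => {
  // (?:...)+ repeats the whole escaped trail, not just its last character.
  const regex = new RegExp(`([a-zA-Z0-9]+)(?:${escapeRegExp(trail)})+`, 'g');
  return str => str.replace(regex, '$1');
};

console.log(stripTrailing('!')('!!hello! my! !!name!!! is !!bob!!'));
// → !!hello my !!name is !!bob
console.log(stripTrailing('$')('$remove$ the$ targetted$# $$symbol$$# only $after$ a $word$'));
// → $remove the targetted# $$symbol# only $after a $word
```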
UPDATE
Per your comment I thought an explanation might help you, so:
First, there's no way in this case to replace only part of a match, you have to replace the entire match. So we need to find a pattern that matches, split it into the part we want to keep and the part we don't, and replace the whole match with the part of it we want to keep. So let's break up my regex above into multiple lines to see what's going on:
First we want to match any number of sequential alphanumeric characters, that would be the 'word' to strip the trailing symbol from:
( // denotes capturing group for the 'word'
[ // [] means 'match any character listed inside brackets'
a-z // list of alpha character a-z
A-Z // same as above but capitalized
0-9 // list of digits 0 to 9
]+ // plus means one or more times
)
The capturing group means we want to have access to just that part of the match.
Then we have another group
(
! // I used ES6's string interpolation to insert the arg here
+ // match that exclamation (or whatever) one or more times
)
Then we add the g flag so the replace will happen for every match in the target string, without the flag it returns after the first match. JavaScript provides a convenient shorthand for accessing the capturing groups in the form of automatically interpolated symbols, the '$1' above means 'insert contents of the first capture group here in this string'.
So, in the above, if you replaced '$1' with '$1$2' you'd see the same string you started with, if you did 'foo$2' you'd see foo in place of every word trailed by one or more !, etc.
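The three replacement strings discussed above can be compared side by side. This is a small sketch on a shortened input; the variable names are illustrative.

```javascript
const input = '!!hello! my!';
const re = /([a-zA-Z0-9]+)(!+)/g;

// Keep only the word: the trailing symbols are dropped.
const stripped = input.replace(re, '$1');
// Re-insert both groups: the string is unchanged.
const unchanged = input.replace(re, '$1$2');
// Swap the word for fixed text but keep the trailing symbols.
const swapped = input.replace(re, 'foo$2');

console.log(stripped);  // → !!hello my
console.log(unchanged); // → !!hello! my!
console.log(swapped);   // → !!foo! foo!
```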

Select a character if some character from a list is before the character

I have this regular expression:
/([a-záäéěíýóôöúüůĺľŕřčšťžňď])-$\s*/gmi
This regex selects č- from my text:
sme! a Želiezovce 2015: Spoloíč-
ne pre Európu. Oslávili aj 940.
But I want to select only - (without č) (if some character from the list [a-záäéěíýóôöúüůĺľŕřčšťžňď] is before the -).
In other languages you would use a lookbehind
/(?<=[a-záäéěíýóôöúüůĺľŕřčšťžňď])-$\s*/gmi
This matches -$\s* only if it's preceded by one of the characters in the list.
However, JavaScript did not support lookbehind before ES2018, so the workaround for older engines is to use a capturing group for the part of the regular expression after it.
var match = /[a-záäéěíýóôöúüůĺľŕřčšťžňď](-$\s*)/gmi.exec(string);
When you use this, match[1] will contain the part of the string beginning with the hyphen.
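Here is a minimal sketch of the capturing-group approach in use, assuming the goal is to re-join a word that was hyphenated across a line break (the question does not say what the match is used for). The second variant uses lookbehind, which newer engines (ES2018 and later) accept; on older engines that regex literal is a syntax error.

```javascript
const text = 'sme! a Želiezovce 2015: Spoloíč-\nne pre Európu. Oslávili aj 940.';

// Capture the letter before the hyphen and put it back with $1, so only
// the hyphen and the following line break are effectively removed.
const joined = text.replace(/([a-záäéěíýóôöúüůĺľŕřčšťžňď])-$\s*/gmi, '$1');

// Lookbehind variant: the letter is asserted but not consumed,
// so the replacement can simply be the empty string.
const joinedLookbehind = text.replace(/(?<=[a-záäéěíýóôöúüůĺľŕřčšťžňď])-$\s*/gmi, '');

console.log(joined);
// → sme! a Želiezovce 2015: Spoloíčne pre Európu. Oslávili aj 940.
console.log(joined === joinedLookbehind); // → true
```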
First, everything you put in parentheses in a regex is captured during matching, so the match array contains the full matched string at position 0, followed by the contents of each parenthesized group from left to right.
/[a-záäéěíýóôöúüůĺľŕřčšťžňď](-)$\s*/gmi
With exec, this returns an array like ["č-", "-"] for your string (the first element also includes the line break consumed by \s*), so you can extract the specific data you need from your match.
Also, $ is a zero-width assertion for the end of a line, and you are using the multiline flag, so the \s* after it can only match the line break and any whitespace at the start of the next line.
If you do not want those included in the match, the regex should be /[a-záäéěíýóôöúüůĺľŕřčšťžňď](-)$/gmi

Regular expression with optional group wrapping multiple groups returns undefined for branches not taken

I'm trying to write a regular expression in JavaScript that returns the first quoted or non-quoted word in a string without the quotes (if present). For example:
'"quoted phrase" followed by text' => 'quoted phrase'
'phrase without quotes followed by text' => 'phrase'
My regular expression currently is this: (?:"([^"]*)"|([^"\s]+))
However, what I'm noticing is that the output always includes two match groups, one that's always undefined, presumably from the branch that wasn't taken (i.e. it's the first match if the first word is not quoted, second otherwise).
What kind of changes can I make to avoid getting the undefined match group and still get the quote-stripping behavior?
NOTE: The words are NOT strictly "word-only" (e.g. alphanumeric) characters. They can include non-word characters, just not the " character.
You can't do what you want using just the regex. Other regex flavors have power features like the Branch Reset Group (which causes capturing groups in each branch to start with the same number):
(?|"([^"]*)"|([^"\s]+))
...or they let you use the same name for more than one group:
(?:"(?<token>[^"]*)"|(?<token>[^"\s]+))
...but JavaScript has nothing. Of all the regex flavors associated with programming languages (Perl, Python, Java, etc.), JavaScript is the most lacking in useful features. You just have to go through all the groups and find the one that's not undefined.
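Picking the group that is not undefined can be sketched as follows, using the regex from the question. The function name firstToken is hypothetical.

```javascript
// Hypothetical helper: returns the first quoted or unquoted token, without quotes.
function firstToken(str) {
  const m = /(?:"([^"]*)"|([^"\s]+))/.exec(str);
  if (m === null) return null; // no token at all
  // Exactly one of the two groups took part in the match; the other is undefined.
  return m[1] !== undefined ? m[1] : m[2];
}

console.log(firstToken('"quoted phrase" followed by text'));       // → quoted phrase
console.log(firstToken('phrase without quotes followed by text')); // → phrase
```

The explicit comparison with undefined (rather than a truthiness check) keeps an empty quoted string "" from falling through to the second group.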
You are getting extra matches because of the nested groupings you have defined inside your regular expression. The corrected expression should be
(?:"[^"]*"|[^"\s]+) which would produce the following for your inputs (without string quotes)
'"quoted phrase" followed by text' => "quoted phrase"
'phrase without quotes followed by text' => phrase
You need to use ^ (the start anchor) to match the first word, and you can simply use \w+ to match the word. I also think you don't need the outer group:
"([^"]*)"|(^\w+)

RegEx not working as expected /(d).\1/

I am a beginner in RegEx, so I am reading the regex info page on Stack Overflow.
eg: /(d).\1/ matches and captures 'dad' in "abcdadef" while
/(?:.d){2}/ matches but doesn't capture 'cdad'.
I tried :-
var pattern=/(d).\1/
var val="abcdadef";
console.log(pattern.exec(val));
It shows an array of ["dad","d"], but I don't know why.
The info page says it only captures "dad", so why are there two values in the array?
And what is the use of '\1' at the end of the pattern?
Please give me more info on how to use it.
Thanks :-)
When you use (), you're telling the regex engine to match what is between the parentheses and store it as a capturing group. Each match has capturing groups of its own. A regex match object is normally a collection that contains the entire match of the regex followed by the capturing groups of that match.
Edit: As per your comment below, here's another pattern, (m).\1, and the text we run the regex against is mum.
In this example, regex will attempt to do the following:
(m) matches the literal m and, because it is wrapped in (), stores the match in a capturing group. This capturing group will appear in the match collection later.
. matches any character other than a newline, so in our case it matches the literal u.
\1 attempts to match the next character against the text captured by the first group, which is the literal m in our case.
The final result will be the regex match of mum and the only capturing group would be m.
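The walkthrough above can be checked directly with exec. This short sketch runs the pattern from the explanation and the two patterns from the question; the variable names are illustrative.

```javascript
// (m) captures "m", "." matches "u", and \1 must repeat the captured "m".
const mum = /(m).\1/.exec('mum');
console.log(mum[0], mum[1]); // → mum m

// The pattern from the question: whole match first, then the captured "d".
const dad = /(d).\1/.exec('abcdadef');
console.log(dad[0], dad[1], dad.index); // → dad d 3

// A non-capturing group matches but stores nothing beyond the whole match.
const cdad = /(?:.d){2}/.exec('abcdadef');
console.log(cdad[0], cdad.length); // → cdad 1
```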

How to replace all occurrences of "\\" string in JavaScript

This seems a very simple question but I haven't been able to get this to work.
How do I convert the following string:
var origin_str = "abc/!/!"; // Original string
var modified_str = "abc!!"; // replaced string
I tried this:
console.log(origin_str.replace(/\\/,''));
This only removes the first occurrence of backslash. I want to replaceAll. I followed this instruction in SO: How to replace all occurrences of a string in JavaScript?
origin_str.replace(new RegExp('\\', 'g'), '');
This code throws me an error SyntaxError: Invalid regular expression: /\/: \ at end of pattern. What's the regex for removing backslash in javascript.
A quick basic overview of regular expressions in JavaScript
When using regular expressions you can define the expression in two ways.
Either directly as a literal, by using /regular expression/
Or by using the RegExp constructor: new RegExp('regular expression').
Please note the difference between the two ways of defining. In the first, the search pattern is enclosed in forward slashes, while in the second the search pattern is passed as a string.
Remember that regular expressions are in fact a search language with its own syntax. Some characters have a special meaning: \, ^, $, . (dot), |, ?, *, +, (, ), [, {. These characters are called metacharacters and need to be escaped if you want them to be part of the search pattern. In addition, a forward slash / must be escaped inside a regex literal, and quotes and backslashes must be escaped inside a string passed to the constructor. If not, they will be treated as operators or generate script errors. Escaping is done with the backslash. E.g. \\ escapes the second backslash, and the pattern will then search for a literal backslash.
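The difference between the two notations explains the SyntaxError from the question. In a string, '\\' is a single backslash, so the constructor receives an incomplete escape; the pattern string needs four backslashes. A small sketch with an assumed sample string:

```javascript
// This string contains two real backslashes: a\b\c
const withBackslashes = 'a\\b\\c';

// Regex literal: \\ matches one literal backslash.
const viaLiteral = withBackslashes.replace(/\\/g, '');

// Constructor: the string '\\\\' holds the two characters \\ that the regex needs.
const viaConstructor = withBackslashes.replace(new RegExp('\\\\', 'g'), '');

// The string '\\' holds a lone backslash, which is an invalid pattern.
let constructorError = null;
try {
  new RegExp('\\', 'g');
} catch (err) {
  constructorError = err;
}

console.log(viaLiteral, viaConstructor, constructorError instanceof SyntaxError);
// → abc abc true
```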
There are a multitude of options you can add to your search pattern:
Examples
\d matches a single digit (equivalent to [0-9]); \w matches a letter, a digit, or the underscore. Simple regular expressions are parsed from left to right.
/javascript/
Searches for the word javascript in a string.
/[a-z]/
When a pattern is put between square brackets, it matches a single character equal to any one of the characters inside the brackets. This will find d in 229302d34330.
You can build a regular expression with multiple blocks.
/(java|ecma)script/
Finds javascript or ecmascript in a string. The | is the or operator; the parentheses limit the alternation to the prefix.
/a/ vs. /a+/
The first matches the first a in aaabbb, the second matches a repetition of a until another character is found. So the second matches: aaa.
The plus sign + means find a one or more times. You can also use * which means zero or more times.
/^\d+$/
We've seen the \d earlier and also the plus sign. This means find one or more numeric characters. The ^ (caret) and $ (dollar sign) are new. The ^ says the match must begin at the start of the string, while the $ says it must reach the end of the string. This expression will match 574545485 but not d43849343, 549854fff or 4348d8788.
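The four sample values above can be verified with test. The helper name isAllDigits is an assumption for illustration.

```javascript
// Hypothetical helper: true only if the whole string consists of digits.
const isAllDigits = s => /^\d+$/.test(s);

const inputs = ['574545485', 'd43849343', '549854fff', '4348d8788'];
console.log(inputs.map(s => isAllDigits(s)));
// → [ true, false, false, false ]
```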
Flags
Flags are modifiers declared after the regular expression: /regular expression/flags
The three most commonly used flags in JavaScript are:
g (global) Searches multiple times for the pattern.
i (ignore case) Ignores case in pattern.
m (multiline) treat beginning and end characters (^ and $) as working over multiple lines (i.e., match the beginning or end of each line (delimited by \n or \r), not only the very beginning or end of the whole input string)
So a regular expression like this:
/d[0-9]+/ig
matches D094938 and D344783 in 98498D094938A37834D344783.
The i makes the search case-insensitive. Matching a D because of the d in the pattern. If D is followed by one or more numbers then the pattern is matched. The g flag commands the expression to look for the pattern globally or simply said: multiple times.
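To see what each flag contributes, the sketch below runs the same pattern with different flag combinations on the sample string. The variable names are illustrative.

```javascript
const input = '98498D094938A37834D344783';

// Both flags: every match, regardless of case.
const all = input.match(/d[0-9]+/ig);
console.log(all); // → [ 'D094938', 'D344783' ]

// Only i: match() stops at the first match and reports its position.
const first = input.match(/d[0-9]+/i);
console.log(first[0], first.index); // → D094938 5

// Only g: the lowercase d never matches the uppercase D, so match() returns null.
const caseSensitive = input.match(/d[0-9]+/g);
console.log(caseSensitive); // → null
```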
In your case @Qwerty provided the correct regex:
origin_str.replace(/\//g, "")
Here the search pattern is a single forward slash /, escaped by the backslash to prevent script errors. The g flag tells the replace function to search for all occurrences of the forward slash in the string and replace them with an empty string "".
For a comprehensive tutorial and reference : http://www.regular-expressions.info/tutorial.html
Looking for this?
origin_str.replace(/\//g, "")
The syntax for replace is
.replace(/pattern/flags, replacement)
So in my case the pattern is \/ (an escaped slash)
and g is the global flag.
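Applied to the string from the question, the replace call gives the expected result. The split/join line is an alternative added here for comparison, not something from the answers; it avoids regex escaping entirely.

```javascript
const origin_str = 'abc/!/!';

// Global regex replace, as in the answers above.
const viaRegex = origin_str.replace(/\//g, '');

// Alternative without a regex: split on the slash and glue the pieces together.
const viaSplitJoin = origin_str.split('/').join('');

console.log(viaRegex, viaSplitJoin); // → abc!! abc!!
```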
