Extract specific string using RegExp - javascript

I have a string like this:
Example 1 #abc#xy-example.com and #xyz#abc.com Example 2
Now I want to remove the part that starts at the second # in each token, i.e. #xy-example.com and #abc.com, and preserve the rest of the data, which should then look like:
Example 1 #abc and #xyz Example 2
I have tried many regular expressions and looked at many examples, but have had no luck so far.
If anyone has tried something similar, I'd appreciate your help.

For the second half of your sample you can simply match both #s and replace only the second one, by capturing the first part in a group and using that group in the replacement.
Pattern: /(#[^\s#]*)#[^\s#]*/g
Replacement: '$1'
This matches a # followed by anything but spaces and #, and stores it in group 1. It then matches the next # and again anything but spaces and #.
If there might be other characters between the two #s, you could use (#[^#]*) for the capturing group instead.
For the first part of your sample, you would have to find a better pattern to match what follows the second #. It could be something along the lines of #[^\s#<]*(?:<[^<>]*>[^<>]*<\/[^<>]*>), but I'm not quite sure about your requirements, and matching across tags is always tricky.
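A minimal JavaScript sketch of the pattern and replacement above, assuming neither part of a token contains whitespace or an extra #. The function name keepFirstHashPart is hypothetical, chosen for illustration.

```javascript
// Capture "#first" in group 1, match the following "#second" part,
// and replace the whole match with group 1 only.
function keepFirstHashPart(text) {
  return text.replace(/(#[^\s#]*)#[^\s#]*/g, "$1");
}

const sample = "Example 1 #abc#xy-example.com and #xyz#abc.com Example 2";
console.log(keepFirstHashPart(sample)); // "Example 1 #abc and #xyz Example 2"
```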

Capture the first # followed by non-# characters in a group, then match # again followed by non-space characters, and replace with the first captured group:
(#[^#]+?)#[^ ]+?(?= )
Result:
Example 1 #abc and #xyz Example 2
https://regex101.com/r/mXRlsZ/1
Note that this also strips every further #-segment after the first one; e.g. #abc#xy#foo will become #abc.
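To make the difference between the two answers concrete, here is a small JavaScript comparison on a token with three # signs. The variable names are hypothetical. It also shows a side effect of the lookahead (?= ): a token at the very end of the string, with no space after it, is left unchanged.

```javascript
// Pattern from the first answer: stops at whitespace or the next #.
const firstAnswer = /(#[^\s#]*)#[^\s#]*/g;
// Pattern from the second answer: everything after the second # up to a space.
const secondAnswer = /(#[^#]+?)#[^ ]+?(?= )/g;

const input = "#abc#xy#foo bar";
console.log(input.replace(firstAnswer, "$1"));  // "#abc#foo bar"
console.log(input.replace(secondAnswer, "$1")); // "#abc bar"

// No trailing space, so the lookahead (?= ) never succeeds here.
console.log("x #abc#xy".replace(secondAnswer, "$1")); // "x #abc#xy"
```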

Try this pattern: ^[^#]*#[^#]+(?=#). It will match everything before the second #. It anchors at the beginning of the string, then matches everything except #: [^#]*, then matches #, then again matches everything except #: [^#]+, until the next # is reached: (?=#) (positive lookahead).
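A short JavaScript check of this pattern. Note that it only selects the text before the second #, and because of the ^ anchor only for the first token; it does not remove anything by itself, and the later #xyz#abc.com is not touched.

```javascript
// Anchored at the start: non-# text, a #, non-# text, then a # must follow.
const prefixPattern = /^[^#]*#[^#]+(?=#)/;

const text = "Example 1 #abc#xy-example.com and #xyz#abc.com Example 2";
const found = text.match(prefixPattern);
console.log(found ? found[0] : null); // "Example 1 #abc"
```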

using System.Text.RegularExpressions;

string input = @"Example 1 #abc#xy-example.com and #xyz#abc.com Example 2";
string pattern = @"#.+?(?=#|$)";
int x = 0;
// x counts the matches; only the second match is rewritten
string s = Regex.Replace(
    input,
    pattern,
    m => ++x == 2 ? Regex.Match(m.Value, @">(\s+.+?\s+)$").Groups[1].Value : m.Value);
Explanation
First, the pattern searches for the # symbol, matching each # together with the text up to the next # (or the end of the string). Variable x tracks the number of matches. As soon as it reaches 2, we extract everything between the > and the end of the match. If x doesn't equal 2, we just return the match unchanged (m.Value).
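The same counter-in-a-callback idea can be expressed in JavaScript, since String.prototype.replace also accepts a function. This is an adaptation, not a translation of the C# answer: instead of looking for a >, it assumes every second #-segment begins with the token to drop (such as #xy-example.com) and keeps the words after it. The function name is hypothetical.

```javascript
// Counter inside a replace callback (adapted idea, not the C# answer's exact logic).
// Assumption: segments alternate between "#keep" and "#drop <rest of text>".
function dropEverySecondSegment(input) {
  let count = 0;
  // A # plus the text up to the next # or the end of the string.
  return input.replace(/#.+?(?=#|$)/g, (segment) => {
    count += 1;
    // Odd-numbered segments (#abc, #xyz) are kept unchanged.
    if (count % 2 === 1) return segment;
    // Even-numbered segments lose their leading #token but keep what follows it.
    return segment.replace(/^#\S*/, "");
  });
}

console.log(dropEverySecondSegment("Example 1 #abc#xy-example.com and #xyz#abc.com Example 2"));
// "Example 1 #abc and #xyz Example 2"
```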

Related

Check for a specific suffix by RegEx and select entire match including suffix

This is my first question in the community, so please excuse any mistakes, Experts! I am learning regex and have run into a scenario I can't solve on my own.
Suppose there is a huge paragraph. Can we first match on a specific suffix (say '%') and only then go back and select the desired pattern, including the suffix?
For example, part of the text is "abcd efghMNP 0.40 % ijkl mnopSNP -3.20 % xyz".
If you look at this (and I got this far), there is a pattern like /([MS]NP[\s\d\.-%]+)/.
I want to replace "MNP 0.40 %" and "SNP -3.20 %" with an empty string. The replacing part seems easy :) The problem is that, despite everything I've learned, I can't select the desired text ONLY IF there is a '%' at the end of the match.
The matching sequence I'm after is: if the suffix '%' exists, match the preceding pattern, and if that succeeds, select everything including the suffix and replace it with an empty string.
There are several expressions that would do so, for instance this one with added constraints:
[A-Z]{3}\s+[-+]?[0-9.]+\s*%
Test
const regex = /[A-Z]{3}\s+[-+]?[0-9.]+\s*%/gm;
const str = `abcd efghMNP 0.40 % ijkl mnopSNP -3.20 % xyz
"MNP 0.40 %" or "SNP -3.20 %"`;
const subst = ``;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log(result);
Or a simplified version would be:
[A-Z]{3}(.*?)%
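A quick JavaScript comparison of the two patterns above. On the sample text they behave the same, but the simplified one accepts anything between three capital letters and the next %, so it can remove more than intended. The input "NOTE: see 5 %" is a made-up example.

```javascript
// Strict: three capitals, whitespace, an optional sign, a number, then %.
const strict = /[A-Z]{3}\s+[-+]?[0-9.]+\s*%/g;
// Simplified: three capitals, then anything (lazily) up to the next %.
const simplified = /[A-Z]{3}(.*?)%/g;

const sample = "abcd efghMNP 0.40 % ijkl mnopSNP -3.20 % xyz";
console.log(sample.replace(strict, ""));     // "abcd efgh ijkl mnop xyz"
console.log(sample.replace(simplified, "")); // "abcd efgh ijkl mnop xyz"

const other = "NOTE: see 5 %";
console.log(other.replace(strict, ""));     // "NOTE: see 5 %" (no match)
console.log(other.replace(simplified, "")); // "" (the whole string matched)
```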
You cannot go back in the matching once you have encountered the suffix %, but you can make it part of the pattern so that it has to be matched.
In JavaScript you could use a zero-length lookahead assertion (?=...) to make sure that what follows contains a pattern, in this case a %, but that brings no real benefit here because you want the % to be part of the match.
A bit more specific match could be:
[MS]NP\s*-?\d+(?:\.\d+)?\s*%
[MS]NP Match M or S followed by NP
\s*-? Match 0+ whitespace chars followed by an optional -
\d+(?:\.\d+)? Match 1+ digits followed by an optional part to match a dot and 1+ digits
\s*% Match 0+ whitespace chars followed by matching %
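A minimal JavaScript sketch of the more specific pattern. Because % is simply the last required token, a candidate without a trailing % never matches, so no backwards step is needed. The function name stripPercentCodes is hypothetical.

```javascript
// [MS]NP, optional whitespace and minus, a number with optional decimals, then %.
const pattern = /[MS]NP\s*-?\d+(?:\.\d+)?\s*%/g;

function stripPercentCodes(text) {
  return text.replace(pattern, "");
}

console.log(stripPercentCodes("abcd efghMNP 0.40 % ijkl mnopSNP -3.20 % xyz"));
// "abcd efgh ijkl mnop xyz"
console.log(stripPercentCodes("MNP 0.40 units")); // unchanged: no trailing %
```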

exclude full word with javascript regex word boundary

I'm looking to exclude matches that contain a specific word or phrase. For example, how could I match only the lines marked "match" below? The \b word boundary does not work the way I intuitively expected.
foo.js # match
foo_test.js # do not match
foo.ts # match
fun_tset.js # match
fun_tset_test.ts # do not match
UPDATE
What I want to exclude is strings ending explicitly with _test before the extension. At first I had something like [^_test], but that also excludes any combination of those characters (like line 3).
Regex: ^(?!.*_test\.).*$
Working examples: https://regex101.com/r/HdGom7/1
Why it works: uses negative lookahead to check if _test. exists somewhere in the string, and if so doesn't match it.
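A short JavaScript check of the negative lookahead against the file names from the question. The regex has no g flag, so repeated .test() calls don't carry lastIndex state between names.

```javascript
// Reject any string that contains "_test." anywhere.
const notTestFile = /^(?!.*_test\.).*$/;

const files = ["foo.js", "foo_test.js", "foo.ts", "fun_tset.js", "fun_tset_test.ts"];
const kept = files.filter((name) => notTestFile.test(name));
console.log(kept); // ["foo.js", "foo.ts", "fun_tset.js"]
```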
Adding to @pretzelhammer's answer, it looks like you want to grab strings that are file names ending in ts or js:
^(?!.*_test)(.*\.[jt]s)$
The expression in the first parentheses is a negative lookahead that excludes any string containing _test; the second group matches strings that end in a period followed by [jt] (j or t) and then s, with $ making sure that really is the end of the string.
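A JavaScript check of the file-name pattern written with a trailing $ anchor. Without the anchor, .test() succeeds on a name like foo.json because its prefix foo.js already matches; foo.json is an extra, made-up test case.

```javascript
// Not containing "_test", and really ending in .js or .ts.
const jsOrTsNonTest = /^(?!.*_test)(.*\.[jt]s)$/;
const unanchored = /^(?!.*_test)(.*\.[jt]s)/;

const names = ["foo.js", "foo_test.js", "foo.ts", "fun_tset.js", "fun_tset_test.ts", "foo.json"];
const accepted = names.filter((name) => jsOrTsNonTest.test(name));
console.log(accepted);                    // ["foo.js", "foo.ts", "fun_tset.js"]
console.log(unanchored.test("foo.json")); // true: "foo.js" is a matching prefix
```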

How can I match the last part of an email via JavaScript? [duplicate]

Using a regular expression (replaceregexp in Ant) how can I match (and then replace) everything from the start of a line, up to and including the last occurrence of a slash?
What I need is to start with any of these:
../../replace_this/keep_this
../replace_this/replace_this/Keep_this
/../../replace_this/replace_this/Keep_this
and turn them into this:
what_I_addedKeep_this
It seems like it should be simple but I'm not getting it. I've made regular expressions that will identify the last slash and match from there to the end of the line, but what I need is one that will match everything from the start of a line until the last slash, so I can replace it all.
This is for an Ant build file that reads a bunch of .txt files and transforms any links it finds in them. I'd like to use just replaceregexp, not variables or properties, if possible.
You can match this:
.*\/
and replace with your text.
What you want is a greedy match, i.e. the longest possible match of the pattern (usually the default), that runs up to the last instance of '/'.
That would be something like this:
.*\/
Explanation:
. any character (except a line break)
* repeated zero or more times, as many as possible (greedy)
\/ an escaped slash; since .* first runs to the end of the line and then backtracks, the match stops at the last instance of '/'
You can see it in action here: http://regex101.com/r/pI4lR5
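For illustration only, here is the greedy pattern next to its lazy counterpart in JavaScript; greedy versus lazy quantifiers behave the same way in most regex flavors. The lazy .*? stops at the first slash instead of the last one.

```javascript
// Greedy .* backtracks from the end of the line to the LAST slash;
// lazy .*? stops at the FIRST slash.
const path = "../replace_this/replace_this/Keep_this";

console.log(path.replace(/^.*\//, ""));  // "Keep_this"
console.log(path.replace(/^.*?\//, "")); // "replace_this/replace_this/Keep_this"
```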
Option 1
Search: ^.*/
Replace: Empty string
Because the * quantifier is greedy, ^.*/ will match from the start of the line to the very last slash. So you can directly replace that with an empty string, and you are left with your desired text.
Option 2
Search: ^.*/(.*)
Replace: Group 1 (typically, the syntax would be $1 or \1, not sure about Ant)
Again, ^.*/ matches to the last slash. You then capture the end of the line to Group 1 with (.*), and replace the whole match with Group 1.
In my view, there's no reason to choose this option, but it's good to understand it.
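Both options applied to the three sample paths, sketched with JavaScript's String.prototype.replace, where group 1 is written as $1 (Ant's own replacement syntax may differ, as noted above). "what_I_added" is the placeholder prefix from the question.

```javascript
const inputs = [
  "../../replace_this/keep_this",
  "../replace_this/replace_this/Keep_this",
  "/../../replace_this/replace_this/Keep_this",
];

// Option 1: replace everything up to the last slash with the prefix.
const option1 = (s) => s.replace(/^.*\//, "what_I_added");
// Option 2: capture the tail in group 1 and put it back after the prefix.
const option2 = (s) => s.replace(/^.*\/(.*)/, "what_I_added$1");

for (const s of inputs) {
  console.log(option1(s), option2(s));
}
// what_I_addedkeep_this what_I_addedkeep_this
// what_I_addedKeep_this what_I_addedKeep_this
// what_I_addedKeep_this what_I_addedKeep_this
```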

Regex delimit the start of a string and the end

I've been having trouble with regex, which I don't understand at all.
I have a string '#anything#that#i#say' and I want the regex to detect one word per #, so the result will be [#anything, #that, #i, #say].
It needs to work with spaces too :(
The closest I came is [#\w]+, but this only gets one word, and I want them separated.
You're close; [#\w] will match anything that is either a # or a word character, so [#\w]+ swallows the whole string in one match. What you want is to match a single # followed by one or more word characters, like this: #\w+ (without the brackets).
var str = "#anything#that#i#say";
var regexp = /#\w+/gi;
console.log(str.match(regexp));
It's possible to have this deal with spaces as well, but I'd need to see an example of what you mean to tell you how; there are lots of ways that "need to work with spaces" can be interpreted, and I'd rather not guess.
Use the expression /#\s*(\w+)/g
\s* : allows zero or more spaces between the # and the word
This matches all 4 words in your string '#anything#that#i#say',
even if the string contains spaces after a #, as in '#anything# that#i# say'.
sample to check: http://www.regextester.com/?fam=97638
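A JavaScript sketch of the difference between the full matches and the captured words for the spaced example. String.prototype.match with the g flag returns whole matches (including # and any spaces), while matchAll exposes capture group 1.

```javascript
const spaced = "#anything# that#i# say";
const pattern = /#\s*(\w+)/g;

// Whole matches, including "#" and the optional spaces.
const whole = spaced.match(pattern);
// Group 1 only: just the words.
const words = [...spaced.matchAll(pattern)].map((m) => m[1]);

console.log(whole); // ["#anything", "# that", "#i", "# say"]
console.log(words); // ["anything", "that", "i", "say"]
```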

JavaScript and regular expressions: get the number of parenthesized subpattern

I need to get the number of parenthesized substring matches (capturing groups) in a regular expression:
var reg=/([A-Z]+?)(?:[a-z]*)(?:\([1-3]|[7-9]\))*([1-9]+)/g,
nbr=0;
//Some code
alert(nbr); //2
In the above example, the total is 2: only the first and the last pairs of parentheses create capturing groups.
How can I find this number for any regular expression?
My first idea was to check the values of RegExp.$1 to RegExp.$9, but even when there are no corresponding parentheses, these values are not null but empty strings...
I've also seen the RegExp.lastMatch property, but this one represents only the value of the last matched characters, not the corresponding number.
So, I've tried to build another regular expression to scan any RegExp and count this number, but it's quite difficult...
Do you have a better solution to do that?
Thanks in advance!
JavaScript's String.prototype.match() method returns an array of matches. You might just want to check the length of that result array.
var mystr = "Hello 42 world. This 11 is a string 105 with some 2 numbers 55";
var res = mystr.match(/\d+/g);
console.log( res.length );
Well, judging from the code snippet, we can assume that the input pattern is always a valid regular expression, because otherwise it would fail before the //Some code part, right? That makes the task much easier,
because we just need to count how many opening capturing parentheses there are!
var reg = /([A-Z]+?)(?:[a-z]*)(?:\([1-3]|[7-9]\))*([1-9]+)/g;
var nbr = (' '+reg.source).match(/[^\\](\\\\)*(?=\([^?])/g);
nbr = nbr ? nbr.length : 0;
alert(nbr); // 2
And here is a breakdown:
[^\\] Make sure we don't start the match with an escaping backslash.
(\\\\)* Allow any number of escaped backslashes (backslash pairs) before the opening parenthesis.
(?= Look ahead. More on this later.
\( The opening parenthesis we are looking for.
[^?] Make sure it is not followed by a question mark, which means the group is capturing.
) End of look ahead
Why match with a lookahead? To check that the parenthesis is not escaped, we need to match what comes before it. No big deal here; we know JS didn't have lookbehind (it was only added in ES2018).
The problem is that if two opening parentheses are adjacent, then once the first parenthesis is consumed, the second has no preceding character left to match, because that character has already been consumed!
So that a parenthesis can serve as the preceding character for the next one, we need to keep it out of the match, which is what the lookahead does.
And the space prepended to the source? It serves as the preceding character for the first character, in case that is an opening parenthesis.
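A different technique, not taken from either answer above: append an empty alternative (|) so the pattern is guaranteed to match the empty string, then count the slots in the exec() result, which holds the full match plus one entry per capturing group. This sketch assumes the pattern stays valid when its g and y flags are dropped; the function name countCapturingGroups is hypothetical.

```javascript
// Count capturing groups by forcing a successful match on "".
function countCapturingGroups(re) {
  // Drop "g" and "y" so exec() does not depend on lastIndex; keep other flags.
  const flags = re.flags.replace(/[gy]/g, "");
  const probe = new RegExp(re.source + "|", flags);
  // Result = [fullMatch, group1, group2, ...]; unmatched groups are undefined.
  return probe.exec("").length - 1;
}

const reg = /([A-Z]+?)(?:[a-z]*)(?:\([1-3]|[7-9]\))*([1-9]+)/g;
console.log(countCapturingGroups(reg));          // 2
console.log(countCapturingGroups(/\(a\)(b)/));   // 1 (escaped parens are not groups)
console.log(countCapturingGroups(/(?<x>a)(b)/)); // 2 (named groups also capture)
```

Unlike scanning the source text, this lets the regex engine itself decide what counts as a group, so escaped parentheses, non-capturing groups and parentheses inside character classes are all handled for free.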
