Capturing the character before the regex - javascript

I have a quick question about a regex that I wrote in JavaScript. It is the following (?<=,)(.*)(?=:) and it captures everything between , and :. I want it, however, to capture the comma itself too, as in.
So,<< this is what my regex captures at the moment>>: end would become
So<<, this is what my regex captures at the moment>>: end.
I tried using a . before the , in the regex but it doesn't seem to be working.

Use a simple capturing group. It is shorter than your current regex:
var regex = /(,.*?):/g;
var string = "So,<< this is what my regex captures at the moment>>: end";
console.log(string.match(regex));
Explanation:
() - denotes a capturing group
, - match a comma
.*? - match any characters, as few as possible (lazy)
: - match a colon
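One caveat worth checking with the pattern above: when the g flag is set, match() returns whole matches, so the trailing colon is included. A minimal sketch of three ways to read the result, assuming only the captured group is wanted (the variable names are illustrative, not from the answer):

```javascript
const input = "So,<< this is what my regex captures at the moment>>: end";

// With the g flag, match() returns the full matches, colon included.
const fullMatches = input.match(/(,.*?):/g);
console.log(fullMatches); // [ ',<< this is what my regex captures at the moment>>:' ]

// Without the g flag, index 1 of the result holds the first capturing group.
const firstGroup = input.match(/(,.*?):/)[1];
console.log(firstGroup); // ,<< this is what my regex captures at the moment>>

// matchAll() yields every match together with its groups.
const allGroups = [...input.matchAll(/(,.*?):/g)].map((m) => m[1]);
console.log(allGroups);
```

matchAll() needs a reasonably recent engine; exec() in a loop is the older equivalent.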

Assuming the double arrows indicate the start and the end of what your current pattern matches, you could match the comma and then one or more characters other than a colon using a negated character class:
,[^:]+
If the colon at the end should be there, you could use a capturing group:
(,[^:]+):
You can omit the positive lookahead (?=:) by just matching the colon because you are already using a capturing group to get the match.
const regex = /(,[^:]+):/;
const str = `So,<< this is what my regex captures at the moment>>: end`;
let res = str.match(regex);
console.log(res[1]);

As you said:
So,<< this is what my regex captures at the moment>>: end would become
So<<, this is what my regex captures at the moment>>: end.
You could use replace like this:
var str = `So,<< this is what my regex captures at the moment>>: end`;
var replace = str.replace(/(.*?)(,)(<<)(.*)/,"$1$3$2$4");
console.log(replace);

Related

Validate text with javascript RegEX

I'm trying to validate text with JavaScript but can't find out why it's not working.
I have been using https://regex101.com/ for testing, where it works, but in my script it fails.
var check = "test"
var pattern = new RegExp('^(?!\.)[a-zA-Z0-9._-]+$(?<!\.)','gmi');
if (!pattern.test(check)) validate_check = false;else validate_check = true;
What I'm looking for is: the first and last char are not a dot, and the string may contain [a-zA-Z0-9._-].
But the above check always fails, even on the word test.
+$(?<!\.) is invalid in your RegEx
$ will match the end of the text or line (with the m flag)
Negative lookbehind → (?<!Y)X will match X, but only if Y is not before it
What about a simpler RegEx?
var checks = ["test", "1-t.e_s.t0", ".test", "test.", ".test."];
checks.forEach(check => {
var pattern = new RegExp('^[^.][a-zA-Z0-9\._-]+[^.]$','gmi');
console.log(check, pattern.test(check))
});
Your code should look like this:
var check = "test";
var pattern = new RegExp('^[^.][a-zA-Z0-9\._-]+[^.]$','gmi');
var validate_check = pattern.test(check);
console.log(validate_check);
A few notes about the pattern:
You are using the RegExp constructor, where you have to double escape the backslash. With a single backslash, the pattern becomes ^(?!.)[a-zA-Z0-9._-]+$(?<!.), and the negative lookahead (?!.) makes the pattern fail whenever there is a character other than a newline to the right. That is why it does not match test.
If you use the i flag for a case-insensitive match, you can shorten [A-Za-z] to just one of the ranges, like [a-z], or use \w to match a word character, as in your character class.
This part (?<!\.) using a negative lookbehind is not invalid in your pattern, but it is not supported in every JavaScript engine.
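The escaping note above can be checked directly by comparing the source of each regex. This is a small sketch with a deliberately simplified pattern (only the leading lookahead is kept); the variable names are illustrative:

```javascript
// In a string literal, '\.' is just '.', so the backslash never reaches the regex.
const singleEscaped = new RegExp('^(?!\.)[a-z]+$', 'i');
// Doubling the backslash passes a real \. to the regex engine.
const doubleEscaped = new RegExp('^(?!\\.)[a-z]+$', 'i');
// A regex literal needs no extra escaping.
const literal = /^(?!\.)[a-z]+$/i;

console.log(singleEscaped.source); // ^(?!.)[a-z]+$
console.log(doubleEscaped.source); // ^(?!\.)[a-z]+$

console.log(singleEscaped.test('test')); // false: (?!.) rejects any following character
console.log(doubleEscaped.test('test')); // true
console.log(literal.test('test'));       // true
```

Using a regex literal sidesteps the problem whenever the pattern does not have to be built from a string.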
For your requirements, you don't have to use lookarounds. If you also want to allow a single char, you can use:
^[\w-]+(?:[\w.-]*[\w-])?$
^ Start of string
[\w-]+ Match 1+ occurrences of a word character or -
(?: Non capture group
[\w.-]*[\w-] Match optional word chars, dots or hyphens, followed by a word char or hyphen
)? Close non capture group and make it optional
$ End of string
const regex = /^[\w-]+(?:[\w.-]*[\w-])?$/;
["test", "abc....abc", "a", ".test", "test."]
.forEach((s) =>
console.log(`${s} --> ${regex.test(s)}`)
);

Regex capturing group only capturing last occurrence

Here is my input:
start
#var=somevar1
#var=somevar2
end
I am using this regex
start(?:\s*\n*(?:#var=(.*)\s*)*)\s*\n*end
It should give the output as
somevar1
somevar2
but it's giving just somevar2.
Is there any way to get all occurrence of the capturing group?
One option is to use a positive lookbehind with an infinite quantifier to assert start followed by a newline to the left.
See the support for lookbehinds.
(?<=^start\n[^]*#var=)\S+(?=[^]*\nend$)
const regex = /(?<=^start\n[^]*#var=)\S+(?=[^]*\nend$)/gm;
const str = `start
#var=somevar1
#var=somevar2
end`;
console.log(str.match(regex));
If only lines of the form #var=somevar may appear between start and end, rather than arbitrary content:
(?<=^start\n\s*(?:#var=\S+\s+)*#var=)\S+(?=(?:\s*#var=\S+)*\s*\nend$)
See another regex demo
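Where lookbehind support is a concern, a different technique is a two-step match: first isolate the block between start and end, then collect every #var= value inside it. This is only a sketch and not the approach from the answer above; it assumes the same input layout as the question, and the variable names are illustrative:

```javascript
const text = `start
#var=somevar1
#var=somevar2
end`;

// Step 1: capture everything between the "start" line and the "end" line.
const block = text.match(/^start\n([^]*?)\nend$/m);

// Step 2: collect each #var= value inside that block (empty list if no block).
const values = block
  ? [...block[1].matchAll(/^#var=(\S+)/gm)].map((m) => m[1])
  : [];

console.log(values); // [ 'somevar1', 'somevar2' ]
```

A repeated capturing group keeps only its last iteration, which is why the second pass over the block is needed to see every value.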

Javascript how to identify a combination of letters and strip a portion of it

I'm very new to regex. Right now I'm trying to use regex to prepare my markup string before sending it to the database.
Here is an example string:
#[admin](user:3) Testing this string #[hellotessginal](user:4) Hey!
So far I am able to identify the entire term #[admin](user:3) using /#\[(.*?)]\((.*?):(\d+)\)/g
The next step is to remove the (user:3) part, leaving me with #[admin].
Hence the result of passing through the stripper function would be:
#[admin] Testing this string #[hellotessginal] Hey!
Please help!
You may use
s.replace(/(#\[[^\][]*])\([^()]*?:\d+\)/g, '$1')
See the regex demo. Details:
(#\[[^\][]*]) - Capturing group 1: #[, 0 or more chars other than [ and ] as many as possible and then ]
\( - a ( char
[^()]*? - 0 or more (but as few as possible) chars other than ( and )
: - a colon
\d+ - 1+ digits
\) - a ) char.
The $1 in the replacement pattern refers to the value captured in Group 1.
See the JavaScript demo:
const rx = /(#\[[^\][]*])\([^()]*?:\d+\)/g;
const remove_parens = (string, regex) => string.replace(regex, '$1');
let s = '#[admin](user:3) Testing this string #[hellotessginal](user:4) Hey!';
s = remove_parens(s, rx);
console.log(s);
Try this:
var str = "#[admin](user:3) Testing this string #[hellotessginal](user:4) Hey!";
str = str.replace(/ *\([^)]*\) */g, ' ');
console.log(str);
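Note that the two patterns above differ in scope: the shorter one removes any parenthesized text, while the earlier one only removes a (name:digits) group that directly follows #[...]. A small comparison, using a made-up sample string and illustrative variable names:

```javascript
// Removes every parenthesized run plus surrounding spaces.
const loose = / *\([^)]*\) */g;
// Removes only a (name:digits) group that directly follows #[...].
const strict = /(#\[[^\][]*])\([^()]*?:\d+\)/g;

// Hypothetical input containing an unrelated "(hello)".
const sample = '#[admin](user:3) said (hello) to #[bob](user:4)';

const looseResult = sample.replace(loose, ' ');
const strictResult = sample.replace(strict, '$1');

console.log(JSON.stringify(looseResult));  // "#[admin] said to #[bob] "
console.log(JSON.stringify(strictResult)); // "#[admin] said (hello) to #[bob]"
```

If the text can contain other parentheses, the stricter pattern is the safer choice.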
You can replace matches of the following regular expression with empty strings.
str.replace(/(?<=\#\[(.*?)\])\(.*?:\d+\)/g, '');
I've assumed the strings for which "admin" and "user" are placeholders in the example cannot contain the characters in the string "()[]". If that's not the case please leave a comment and I will adjust the regex.
I've kept the first capture group on the assumption that it is needed for some unstated purpose. If it's not needed, remove it:
(?<=\#\[.*?\])\(.*?:\d+\)
There is of course no point creating a capture group for a substring that is to be replaced with an empty string.
Javascript's regex engine performs the following operations.
(?<= : begin positive lookbehind
\#\[ : match '#['
(.*?) : match 0+ chars, lazily, save to capture group 1
\] : match ']'
) : end positive lookbehind
\(.*?:\d+\) : match '(', 0+ chars, lazily, ':', 1+ digits, ')'

How to get last occurrence with regex javascript?

Could you help me extract "women-watches" from the string:
https://www.aliexpress.com/category/200214036/women-watches.html?spm=2114.search0103.0.0.160b628cMC1npI&site=glo&SortType=total_tranpro_desc&g=y&needQuery=n&shipFromCountry=cn&tag=
I tried
\/(?:.(?!\/.+\.))+$
But I don't know how to do it right.
One option could be to use a capturing group to match a word character or a hyphen. Your match will be in the first capturing group.
^.*?\/([\w-]+)\.html
That will match:
^ Start of the string
.*? Match any character except a newline non greedy
\/ Match /
([\w-]+) Capturing group to match 1+ times a word character or a hyphen
\.html Match .html
const regex = /^.*?\/([\w-]+)\.html/;
const str = `https://www.aliexpress.com/category/200214036/women-watches.html?spm=2114.search0103.0.0.160b628cMC1npI&site=glo&SortType=total_tranpro_desc&g=y&needQuery=n&shipFromCountry=cn&tag=`;
console.log(str.match(regex)[1]);
Another option to match from the last occurrence of the forward slash is to match a forward slash and use a negative lookahead to assert that no more forward slashes follow. Then use a capturing group to match one or more characters other than a dot:
\/(?!.*\/)([^.]+)\.html
const regex = /\/(?!.*\/)([^.]+)\.html/;
const str = `https://www.aliexpress.com/category/200214036/women-watches.html?spm=2114.search0103.0.0.160b628cMC1npI&site=glo&SortType=total_tranpro_desc&g=y&needQuery=n&shipFromCountry=cn&tag=`;
console.log(str.match(regex)[1]);
Without using a regex to find the path, you might use the DOM and split:
const str = `https://www.aliexpress.com/category/200214036/women-watches.html?spm=2114.search0103.0.0.160b628cMC1npI&site=glo&SortType=total_tranpro_desc&g=y&needQuery=n&shipFromCountry=cn&tag=`;
let elm = document.createElement("a");
elm.href = str;
let part = elm.pathname.split('/').pop().split('.')[0];
console.log(part);
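Outside the browser there is no document, so the same idea can be sketched with the standard URL class instead of an anchor element. The query string below is shortened for readability, and the variable names are illustrative:

```javascript
// Shortened version of the URL from the question.
const link = 'https://www.aliexpress.com/category/200214036/women-watches.html?spm=2114&site=glo';

// URL parsing separates the path from the query string.
const { pathname } = new URL(link);

// Take the last path segment and drop a trailing ".html".
const lastSegment = pathname.split('/').pop();
const slug = lastSegment.replace(/\.html$/, '');

console.log(slug); // women-watches
```

Parsing the URL first means the dots and slashes in the query string cannot interfere with the match.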

regex capturing group/alternative combination not working in quotes

I am trying to use this expression:
var reg = "/(jan|feb|mar)[A-z]*\[0-9]/"
to capture at least the first three letters of the month (or more letters) plus a digit. This does not work, however. When I remove the parentheses, it works, but then the [A-z]*[0-9] bit only applies to march. Please help, thanks.
Your regex is incorrect. Also, the regex should not be a string.
Use regex /(jan|feb|mar)[a-z]*[0-9]/i
Regex explanation: https://regex101.com/r/9Qv2dy/2
Snippet:
var reg = /(jan|feb|mar)[a-z]*[0-9]/i;
console.log('January1'.match(reg));
Your code contains several issues.
The /.../ regex literal should not be put inside quotes.
[A-z] matches more than just letters, you need [A-Za-z]
A \[ pattern matches a literal [ char. To match a digit, you need [0-9] or \d. To match 1 or more digits: [0-9]+ or \d+.
Use
var reg = /(?:jan|feb|mar)[a-z]*[0-9]/i;
See JS demo:
var reg = /(?:jan|feb|mar)[a-z]*[0-9]/i;
console.log("Date: January1".match(reg));
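Two of the issues listed above are easy to verify in isolation: the [A-z] range also covers the punctuation that sits between Z and a in ASCII, and a quoted "/.../" is treated as a plain pattern string whose slashes are literal characters. A short sketch with illustrative variable names:

```javascript
// [A-z] spans ASCII 65..122, which includes [ \ ] ^ _ and the backtick.
const tooWide = /^[A-z]+$/;
const lettersOnly = /^[A-Za-z]+$/;

console.log(tooWide.test('[_^]'));     // true
console.log(lettersOnly.test('[_^]')); // false

// A quoted "/.../" becomes a pattern that expects literal slashes
// and has no i flag, so it does not match here.
const quoted = "/(jan|feb|mar)[a-z]*[0-9]/";
const quotedResult = 'January1'.match(quoted);
console.log(quotedResult); // null

// The regex literal matches as intended.
const literalResult = 'January1'.match(/(?:jan|feb|mar)[a-z]*[0-9]/i);
console.log(literalResult[0]); // January1
```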
