JS - how to replace in regex for links - javascript

I have a regex that replaces the link from the moment http to the occurrence of certain characters in [], I would like to add to these characters - that is to replace the string with the occurrence of certain characters or a hard space:
"https://test.pl'".replace(/https?:\/\/[^ "'><]+/g," ")
Works fine for the characters mentioned in [], I don't know how to add here

You can use
.replace(/https?:\/\/.*?(?: |[ '"><]|$)/g," ")
See the regex demo.
Details:
https?:\/\/ - http:// or https://
.*? - any zero or more chars other than line break chars as few as possible
(?: |[ '"><]|$) - one of the following:
- char sequence
| - or
[ "'><] - a space, ", ', > or < char
| - or
$ - end of string.
JavaScript demo:
const texts = ["https://test.pl test","https://test.pl'test"];
const re = /https?:\/\/.*?(?: |[ '"><]|$)/g;
for (const s of texts) {
console.log(s, '=>', s.replace(re, " "));
}

Related

Is there a regex to remove everything after comma in a string except first letter

I am trying to remove all the characters from the string after comma except the first letter. The string is basically the last name,first name.
For example:
Smith,John
I tried as below but it removes comma and everything after comma.
let str = "Smith,John";
str = str.replace(/\s/g, ""); // to remove all whitespace if there is any at the beginning, in the middle and at the end
str = str.split(',')[0];
Expected output: Smith,J
Thank you!
Or try (,\w).* with replace:
let str = "Smith,John";
str = str.replace(/(,\w).*/, '$1');
console.log(str);
Try this regex out:
\w+,\w
This matches one or more characters before the comma and then matches only 1 character.
Here is the demo: https://regex101.com/r/bKpWt7/1
Note: \w matches any character from [a-zA-Z0-9_].
Taking optional spaces around the comma in to account, and perhaps multiple "names" before the comma:
*([^\s,][^,\n]*?) *, *([^\s,]).*
* Match optional spaces
( Capture group 1
*([^\s,] Match optional spaces and match at least a single char other than a whitespace char or a ,
[^,\n]*? Match any char except a , or a newline non greedy
) Close group 1
*, * Match a comma between optional spaces
([^\s,]) Capture group 2, match a single char other than , or a whitespace char
.* Match the rest of the line
Regex demo
In the replacement using group 1 and group 2 with a comma in between $1,$2
const regex = / *([^\s,][^,\n]*?) *, *([^\s,]).*/;
[
"Smith,John Jack",
"Smith Lastname , Jack John",
"Smith , John",
" ,Jack"
].forEach(s => console.log(s.replace(regex, "$1,$2")));

How to match 2 separate numbers in Javascript

I have this regex that should match when there's two numbers in brackets
/(P|C\(\d+\,{0,1}\s*\d+\))/g
for example:
C(1, 2) or P(2 3) //expected to match
C(43) or C(43, ) // expect not to match
but it also matches the ones with only 1 number, how can i fix it?
You have a couple of issues. Firstly, your regex will match either P on its own or C followed by numbers in parentheses; you should replace P|C with [PC] (you could use (?:P|C) but [PC] is more performant, see this Q&A). Secondly, since your regex makes both the , and spaces optional, it can match 43 without an additional number (the 4 matches the first \d+ and the 3 the second \d+). You need to force the string to either include a , or at least one space between the numbers. You can do that with this regex:
[PC]\(\d+[ ,]\s*\d+\)
Demo on regex101
Try this regex
[PC]\(\d+(?:,| +) *\d+\)
Click for Demo
Explanation:
[PC]\( - matches either P( or C(
\d+ - matches 1+ digits
(?:,| +) - matches either a , or 1+ spaces
*\d+ - matches 0+ spaces followed by 1+ digits
\) - matches )
You can relax the separator between the numbers by allowing any combination of command and space by using \d[,\s]+\d. Test case:
const regex = /[PC]\(\d+[,\s]+\d+\)/g;
[
'C(1, 2) or P(2 3)',
'C(43) or C(43, )'
].forEach(str => {
let m = str.match(regex);
console.log(str + ' ==> ' + JSON.stringify(m));
});
Output:
C(1, 2) or P(2 3) ==> ["C(1, 2)","P(2 3)"]
C(43) or C(43, ) ==> null
Your regex should require the presence of at least one delimiting character between the numbers.
I suppose you want to get the numbers out of it separately, like in an array of numbers:
let tests = [
"C(1, 2)",
"P(2 3)",
"C(43)",
"C(43, )"
];
for (let test of tests) {
console.log(
test.match(/[PC]\((\d+)[,\s]+(\d+)\)/)?.slice(1)?.map(Number)
);
}

Setting the end of the match

I have the following string:
[TITLE|prefix=a] [STORENAME|prefix=b|suffix=c] [DYNAMIC|limit=10|random=0|reverse=0]
And I would like to get the value of the prefix of TITLE, which is a.
I have tried it with (?<=TITLE|)(?<=prefix=).*?(?=]|\|) and that seems to work but that gives me also the prefix of STORENAME (b). So if [TITLE|prefix=a] will be missing in the string, I'll have the wrong value.
So I need to set the end of the match with ] that belongs to [TITLE. Please notice that this string is dynamic. So it could be [TITLE|suffix=x|prefix=y] as well.
const regex = "[TITLE|prefix=a] [STORENAME|prefix=b|suffix=c] [DYNAMIC|limit=10|random=0|reverse=0]".match(/(?<=TITLE|)(?<=prefix=).*?(?=]|\|)/);
console.log(regex);
You can use
(?<=TITLE(?:\|suffix=[^\]|]+)?\|prefix=)[^\]|]+
See the regex demo. Details:
(?<=TITLE(?:\|suffix=[^\]|]+)?\|prefix=) - a location in string immediately preceded with TITLE|prefix| or TITLE|suffix=...|prefix|
[^\]|]+ - one or more chars other than ] and |.
See JavaScript demo:
const texts = ['[TITLE|prefix=a] [STORENAME|prefix=b|suffix=c] [DYNAMIC|limit=10|random=0|reverse=0]', '[TITLE|suffix=s|prefix=a]'];
for (let s of texts) {
console.log(s, '=>', s.match(/(?<=TITLE(?:\|suffix=[^\]|]+)?\|prefix=)[^\]|]+/)[0]);
}
You could also use a capturing group
\[TITLE\|(?:[^|=\]]*=[^|=\]]*\|)*prefix=([^|=\]]*)[^\]]*]
Explanation
\[TITLE\| Match [TITLE|
(?:\w+=\w+\|)* Repeat 0+ occurrences wordchars = wordchars and |
prefix= Match literally
(\w+) Capture group 1, match 1+ word chars
[^\]]* Match any char except ]
] Match the closing ]
Regex demo
const regex = /\[TITLE\|(?:\w+=\w+\|)*prefix=(\w+)[^\]]*\]/g;
const str = `[TITLE|prefix=a] [STORENAME|prefix=b|suffix=c] [DYNAMIC|limit=10|random=0|reverse=0]
[TITLE|suffix=x|prefix=y]`;
let m;
while ((m = regex.exec(str)) !== null) {
console.log(m[1]);
}
Or with a negated character class instead of \w
\[TITLE\|(?:[^|=\]]*=[^|=\]]*\|)*prefix=([^|=\]]*)[^\]]*]
Regex demo

Regex - ignoring text between quotes / HTML(5) attribute filtering

So I have this Regular expression, which basically has to filter the given string to a HTML(5) format list of attributes. It currently isn't doing my fulfilling, but that's about to change! (I hope so)
I'm trying to achieve that whenever an occurrence is found, it selects the text until the next occurrence OR the end of the string, as the second match. So if you'd take a look at the current regular expression:
/([a-zA-Z]+|[a-zA-Z]+-[a-zA-Z0-9]+)=["']/g
A string like this: hey="hey world" hey-heyhhhhh3123="Hello world" data-goed="hey"
Would be filtered / matched out like this:
MATCH 1. [0-3] `hey`
MATCH 2. [16-32] `hey-heyhhhhh3123`
MATCH 3. [47-56] `data-goed`
This has to be seen as the attribute-name(s), and now.. we just have to fetch the attribute's value(s). So the mentioned string has to have an outcome like this:
MATCH 1.
1 [0-3] `hey`
2 [6-14] `hey world`
MATCH 2.
1 [16-32] `hey-heyhhhhh3123`
2 [35-45] `Hello world`
MATCH 3.
1 [47-56] `data-goed`
2 [59-61] `hey`
Could anyone try and help me to get my fulfilling? It would be appericiated a lot!
You can use
/([^\s=]+)=(?:"([^"\\]*(?:\\.[^"\\]*)*)"|(\S+))/g
See regex demo
Pattern details:
([^\s=]+) - Group 1 capturing 1 or more characters other than whitespace and = symbol
= - an equal sign
(?:"([^"\\]*(?:\\.[^"\\]*)*)"|(\S+)) - a non-capturing group of 2 alternatives (one more '([^'\\]*(?:\\.[^'\\]*)*)' alternative can be added to account for single quoted string literals)
"([^"\\]*(?:\\.[^"\\]*)*)" - a double quoted string literal pattern:
" - a double quote
([^"\\]*(?:\\.[^"\\]*)*) - Group 2 capturing 0+ characters other than \ and ", followed with 0+ sequences of any escaped symbol followed with 0+ characters other than \ and "
" - a closing dlouble quote
| - or
(\S+) - Group 3 capturing one or more non-whitespace characters
JS demo (no single quoted support):
var re = /([^\s=]+)=(?:"([^"\\]*(?:\\.[^"\\]*)*)"|(\S+))/g;
var str = 'hey="hey world" hey-heyhhhhh3123="Hello \\"world\\"" data-goed="hey" more=here';
var res = [];
while ((m = re.exec(str)) !== null) {
if (m[3]) {
res.push([m[1], m[3]]);
} else {
res.push([m[1], m[2]]);
}
}
console.log(res);
JS demo (with single quoted literal support)
var re = /([^\s=]+)=(?:"([^"\\]*(?:\\.[^"\\]*)*)"|'([^'\\]*(?:\\.[^'\\]*)*)'|(\S+))/g;
var str = 'pseudoprefix-before=\'hey1"\' data-hey="hey\'hey" more=data and="more \\"here\\""';
var res = [];
while ((m = re.exec(str)) !== null) {
if (m[2]) {
res.push([m[1], m[2]])
} else if (m[3]) {
res.push([m[1], m[3]])
} else if (m[4]) {
res.push([m[1], m[4]])
}
}
console.log(res);

Javascript regexp capture matches delimited by character

I have a string like
classifier1:11:some text1##classifier2:fdglfgfg##classifier3:fgdfgfdg##classifier4
I am trying to capture terms like classifier1:11, classifier2:, classifier3 and classifier4
So these classifiers can be followed by a single semicolon or not.
So far I came up with
/([^#]*)(?::(?!:))/g
But that does not seem to capture classifier4, not sure what I am missing here
It seems that a classifier in your case consists of any word chars that may have single : in between and ends with a digit.
Thus, you may use
/(\w+(?::+\w+)*\d)[^#]*/g
See the regex demo
Explanation:
(\w+(?::+\w+)*\d) - Group 1 capturing
\w+ - 1 or more [a-zA-Z0-9_] (word) chars
(?::+\w+)* - zero or more sequences of 1+ :s and then 1+ word chars
\d - a digit should be at the end of this group
[^#]* - zero or more characters other than the delimiter #.
JS:
var re = /(\w+(?::+\w+)*\d)[^#\n]*/g;
var str = 'classifier4##classifier1:11:some text1##classifier2:fdglfgfg##classifier3:fgdfgfdg\nclassifier1:11:some text1##classifier4##classifier2:fdglfgfg##classifier3:fgdfgfdg##classifier4';
var res = [];
while ((m = re.exec(str)) !== null) {
res.push(m[1]);
}
document.body.innerHTML = "<pre>" + JSON.stringify(res, 0, 4) + "</pre>";
Basing on your pattern you can use a regex like this:
([^#]*)(?::|$)
Working demo

Categories

Resources