Could you help me extract "women-watches" from the string:
https://www.aliexpress.com/category/200214036/women-watches.html?spm=2114.search0103.0.0.160b628cMC1npI&site=glo&SortType=total_tranpro_desc&g=y&needQuery=n&shipFromCountry=cn&tag=
I tried
\/(?:.(?!\/.+\.))+$
But I don't know how to do it right.
One option could be to use a capturing group to match a word character or a hyphen. Your match will be in the first capturing group.
^.*?\/([\w-]+)\.html
That will match:
^ Start of the string
.*? Match any character except a newline, as few times as possible (non-greedy)
\/ Match /
([\w-]+) Capturing group to match 1+ times a word character or a hyphen
\.html Match .html
const regex = /^.*?\/([\w-]+)\.html/;
const str = `https://www.aliexpress.com/category/200214036/women-watches.html?spm=2114.search0103.0.0.160b628cMC1npI&site=glo&SortType=total_tranpro_desc&g=y&needQuery=n&shipFromCountry=cn&tag=`;
console.log(str.match(regex)[1]);
Another option is to match from the last occurrence of the forward slash: match a forward slash, then use a negative lookahead to assert that no more forward slashes follow. A capturing group then matches one or more characters that are not a dot:
\/(?!.*\/)([^.]+)\.html
const regex = /\/(?!.*\/)([^.]+)\.html/;
const str = `https://www.aliexpress.com/category/200214036/women-watches.html?spm=2114.search0103.0.0.160b628cMC1npI&site=glo&SortType=total_tranpro_desc&g=y&needQuery=n&shipFromCountry=cn&tag=`;
console.log(str.match(regex)[1]);
Without using a regex, you might use the DOM and split:
const str = `https://www.aliexpress.com/category/200214036/women-watches.html?spm=2114.search0103.0.0.160b628cMC1npI&site=glo&SortType=total_tranpro_desc&g=y&needQuery=n&shipFromCountry=cn&tag=`;
let elm = document.createElement("a");
elm.href = str;
let part = elm.pathname.split('/').pop().split('.')[0];
console.log(part);
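Outside a browser there is no document to create an anchor element from. A minimal sketch of the same split-based idea using the standard URL constructor, which exists in both browsers and Node.js; the helper name lastPathSegment is made up for this illustration, and the URL is shortened from the one in the question:

```javascript
// Hypothetical helper: returns the last path segment without its extension.
function lastPathSegment(urlString) {
  // pathname excludes the query string and the hash
  const { pathname } = new URL(urlString);
  const fileName = pathname.split('/').pop(); // e.g. "women-watches.html"
  return fileName.split('.')[0];
}

const url = 'https://www.aliexpress.com/category/200214036/women-watches.html?spm=2114.search0103.0.0.160b628cMC1npI&site=glo';
console.log(lastPathSegment(url)); // women-watches
```

Note that new URL() throws on strings that are not absolute URLs, so untrusted input may need a try/catch around the call.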
I want to validate a URL endpoint using a regex. An example endpoint: /user/update.
First I tried (/[A-Za-z0-9_.:-~/]*), but in JavaScript it also matches http://url.com/user/update. I want the validation to pass only when the whole string is an endpoint like /user/update.
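You can use a lookbehind to get the path after the .com with /(?<=\.com).*/ (the dot is escaped so that it matches a literal . rather than any character). Note that this extracts the path from a full URL rather than validating a bare endpoint.
const matchEndPoint = (str) => str.match(/(?<=\.com).*/)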
const [result] = matchEndPoint('http://url.com/user/update');
console.log(result)
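If the goal is only to pull the path out of a full URL, the URL constructor avoids depending on the .com suffix. A small sketch, assuming the input is an absolute URL (the constructor throws on a bare path such as /user/update); the helper name getPath is hypothetical:

```javascript
// Hypothetical helper: extracts the path from an absolute URL.
function getPath(urlString) {
  return new URL(urlString).pathname;
}

console.log(getPath('http://url.com/user/update')); // /user/update
console.log(getPath('https://example.org/user/update?x=1')); // /user/update
```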
You might use a pattern like
^\/[\w.:~-]+\/[\w.:~-]+$
Or, for example, to disallow consecutive separators such as -- and to allow one or more path segments that each start with a forward slash:
^\/\w+(?:[.:~-]\w+)*(?:\/\w+(?:[.:~-]\w+)*)*$
Explanation
^ Start of string
\/\w+ Match / and 1+ word chars
(?:[.:~-]\w+)* Optionally repeat a char of the character class and 1+ word chars
(?: Non capture group
\/\w+ Match / and 1+ word chars
(?:[.:~-]\w+)* Optionally repeat a char of the character class and 1+ word chars
)* Close group and optionally repeat
$ End of string
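The pattern explained above can be tried against a few inputs. The attempt from the question matched the full URL for two reasons: it has no ^ and $ anchors, so it can match a substring, and inside the character class :-~ is read as a range from : to ~. The sample strings below are made up for illustration:

```javascript
const endpoint = /^\/\w+(?:[.:~-]\w+)*(?:\/\w+(?:[.:~-]\w+)*)*$/;

const samples = [
  '/user/update',               // plain endpoint
  '/user/update.v2',            // separator followed by word chars
  'http://url.com/user/update', // full URL, does not start with /
  '/user--update',              // consecutive separators
  '/user/',                     // trailing slash without a segment
];

// The anchors force the whole string to match, not just a part of it
const results = samples.map((s) => endpoint.test(s));
samples.forEach((s, i) => console.log(`${s} --> ${results[i]}`));
```

Only the first two samples pass.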
I have a string like:
Webcam recording https://www.example.com/?id=456&code=123
or like:
Webcam recording https://www.example.com/?id=456&code=123<br><b>test<b>
To extract the URL from the first example I used: var reg_exUrl = /\bhttps?:\/\/[^ ]+/g;
Now I tried to extend the Regex so it takes the first match until whitespace (end of line) or <br> tag.
This was my attempt:
var reg_exUrl = /\b(https?:\/\/[^ ]+)(\<br\>)/g;
Which looks good on https://regex101.com/r/gudNab/1 and shows up as two different matches.
But when I use the regex in JavaScript, the <br> tag always gets included in the link.
Using var matches = line.match(reg_exUrl); gives me with matches[0]:
https://www.example.com/?id=456&code=123<br>
instead of the desired https://www.example.com/?id=456&code=123
If you want to select the text before the <br>, you can use a positive lookahead.
https?:\/\/.*?(?=<br>)
Adding in a $ and \n for an early end of input: https?:\/\/.*?(?=<br>|$|\n)
const regexp = /https?:\/\/.*?(?=<br>|$|\n)/;
const testString = "Webcam-Aufnahme https://www.example.com/file?id=959559110184937375.mp4&code=4yrn1ev<br>**test**";
console.log(testString.match(regexp)[0])
You get the full match as you are using matches[0] but you have 2 capture groups where the part without the <br> is in capture group 1.
You can get that group value using match if you remove the global /g flag.
var line = "Webcam recording https://www.example.com/?id=456&code=123<br><b>test<b>\n";
var reg_exUrl = /\b(https?:\/\/[^ ]+)(\<br\>)/;
var matches = line.match(reg_exUrl);
console.log(matches[1]);
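If the g flag has to stay, for instance because a line can hold several links, String.prototype.matchAll still exposes the capture groups of every match. A sketch under that assumption, reusing the pattern from the question; the sample line is made up:

```javascript
const line = 'first https://www.example.com/?id=1<br>second https://www.example.com/?id=2<br>';
const reg_exUrl = /\b(https?:\/\/[^ ]+)(<br>)/g;

// matchAll yields one match object per hit; m[1] is the URL without the <br>
const urls = Array.from(line.matchAll(reg_exUrl), (m) => m[1]);
urls.forEach((u) => console.log(u));
```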
If you want both examples to match, you can avoid a non-greedy quantifier: use a negated character class that matches any char except whitespace or <, and match a < only if it is not directly followed by br>.
The pattern matches:
\bhttps?:\/\/ Match a word boundary followed by http:// or https://
[^\s<]* Optionally match any char except a whitespace char or <
(?: Non capture group
<(?!br>) Match < if not directly followed by br>
[^\s<]* Optionally match any char except a whitespace char or <
)* Close non capture group and optionally repeat
const regex = /\bhttps?:\/\/[^\s<]*(?:<(?!br>)[^\s<]*)*/;
[
"Webcam-Aufnahme https://www.example.com/file?id=959559110184937375.mp4&code=4yrn1ev<br><b>test<b><br>",
"Webcam-Aufnahme https://www.example.com/file?id=959559110184937375.mp4&code=4yrn1ev"
].forEach(s => {
const m = s.match(regex);
if (m) {
console.log(m[0]);
}
});
I'm trying to validate text with JavaScript but can't find out why it's not working.
I have been using https://regex101.com/ for testing, where it works, but in my script it fails.
var check = "test"
var pattern = new RegExp('^(?!\.)[a-zA-Z0-9._-]+$(?<!\.)','gmi');
if (!pattern.test(check)) validate_check = false;else validate_check = true;
What I'm looking for is: the first and last char are not a dot, and the string may contain only [a-zA-Z0-9._-]
But the above check always fails even on the word : test
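The $(?<!\.) at the end of your regex places a negative lookbehind after the end anchor. That is legal where lookbehind is supported, but it is not needed here.
$ matches the end of the text, or the end of a line with the m flag
Negative lookbehind → (?<!Y)X will match X, but only if Y is not before it
How about a simpler regex?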
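var checks = ["test", "1-t.e_s.t0", ".test", "test.", ".test."];
// First and last char come from the allowed set without the dot;
// [^.] would also accept characters outside that set, such as "!"
var pattern = /^[a-z0-9_-](?:[a-z0-9._-]*[a-z0-9_-])?$/i;
checks.forEach(check => {
  console.log(check, pattern.test(check));
});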
Your code should look like this:
var check = "test";
var pattern = /^[a-z0-9_-](?:[a-z0-9._-]*[a-z0-9_-])?$/i;
var validate_check = pattern.test(check);
console.log(validate_check);
A few notes about the pattern:
You are using the RegExp constructor, where you have to double escape the backslash. With a single backslash, the pattern becomes ^(?!.)[a-zA-Z0-9._-]+$(?<!.), and the negative lookahead (?!.) at the start fails whenever any character other than a newline follows. That is why it does not match test
If you use the /i flag for a case insensitive match, you can shorten [A-Za-z] to just one of the ranges like [a-z], or use \w to match a word character like in your character class
This part (?<!\.) using a negative lookbehind is not invalid in your pattern, but it is not always supported
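The escaping note above can be checked by comparing the source of both variants. This is a small sketch that assumes an engine with lookbehind support; the variable names are made up:

```javascript
// In a string literal, '\.' is just '.', so the backslash never reaches the regex
const singleEscaped = new RegExp('^(?!\.)[a-zA-Z0-9._-]+$(?<!\.)');
// '\\.' produces the two characters \. that the regex engine needs
const doubleEscaped = new RegExp('^(?!\\.)[a-zA-Z0-9._-]+$(?<!\\.)');

console.log(singleEscaped.source);        // ^(?!.)[a-zA-Z0-9._-]+$(?<!.)
console.log(singleEscaped.test('test'));  // false
console.log(doubleEscaped.test('test'));  // true
console.log(doubleEscaped.test('test.')); // false
```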
For your requirements, you don't have to use lookarounds. If you also want to allow a single char, you can use:
^[\w-]+(?:[\w.-]*[\w-])?$
^ Start of string
[\w-]+ Match 1+ occurrences of a word character or -
(?: Non capture group
[\w.-]*[\w-] Optionally match word chars, dots or hyphens, followed by a single word char or hyphen
)? Close non capture group and make it optional
$ End of string
const regex = /^[\w-]+(?:[\w.-]*[\w-])?$/;
["test", "abc....abc", "a", ".test", "test."]
.forEach((s) =>
console.log(`${s} --> ${regex.test(s)}`)
);
I have a quick question about a regex that I wrote in JavaScript. It is the following: (?<=,)(.*)(?=:), and it captures everything between , and :. However, I want it to capture the comma itself too, as in:
So,<< this is what my regex captures at the moment>>: end would become
So<<, this is what my regex captures at the moment>>: end.
I tried using a . before the , in the regex but it doesn't seem to be working.
Use a simple capturing group - it's shorter than your current regex and works perfectly:
var regex = /(,.*?):/;
var string = "So,<< this is what my regex captures at the moment>>: end";
// Group 1 holds the comma and the text up to, but not including, the colon
console.log(string.match(regex)[1]);
Explanation:
() - denotes a capturing group
, - match a comma
.*? - match any characters, as few as possible
: - match a colon
Assuming the double arrows indicate the start and the end of what your current pattern matches, you could match the comma and then 1+ times not a colon using a negated character class:
,[^:]+
If the colon at the end should be there, you could use a capturing group:
(,[^:]+):
You can omit the positive lookahead (?=:) by just matching the colon because you are already using a capturing group to get the match.
const regex = /(,[^:]+):/;
const str = `So,<< this is what my regex captures at the moment>>: end`;
let res = str.match(regex);
console.log(res[1]);
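For comparison, the lookaround pattern from the question can be kept almost as is. A lookbehind asserts the comma without consuming it, so turning (?<=,) into a literal , makes the comma part of the match. A short sketch, assuming the string contains a single colon as in the example and an engine with lookbehind support:

```javascript
const str = 'So,<< this is what my regex captures at the moment>>: end';

// The lookbehind asserts the comma but leaves it out of the match
const withoutComma = str.match(/(?<=,).*(?=:)/)[0];
// Matching the comma literally includes it in the match
const withComma = str.match(/,.*(?=:)/)[0];

console.log(withoutComma); // << this is what my regex captures at the moment>>
console.log(withComma);    // ,<< this is what my regex captures at the moment>>
```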
As you said:
So,<< this is what my regex captures at the moment>>: end would become
So<<, this is what my regex captures at the moment>>: end.
You could use replace like this:
var str = `So,<< this is what my regex captures at the moment>>: end`;
var replace = str.replace(/(.*?)(,)(<<)(.*)/,"$1$3$2$4");
console.log(replace);
Let's say in the following text
I want [this]. I want [this too]. I don't want \[this]
I want the contents of anything between [] but not \[]. How would I go about doing that? So far I've got /\[([^\]]+)\]/gi, but it matches everything.
Use this one: /(?:^|[^\\])\[(.*?)\]/gi
Here's a working example: http://regexr.com/3clja
?: Non-capturing group
^|[^\\] Beginning of string or any character except \
\[(.*?)\] Match anything between []
Here's a snippet:
var string = "[this i want]I want [this]. I want [this too]. I don't want \\[no]";
var regex = /(?:^|[^\\])\[(.*?)\]/gi;
var match = null;
document.write(string + "<br/><br/><b>Matches</b>:<br/> ");
while(match = regex.exec(string)){
document.write(match[1] + "<br/>");
}
Use this regexp, which first matches the \[] version (but doesn't capture it, thereby "throwing it away"), then the [] cases, capturing what's inside:
var r = /\\\[.*?\]|\[(.*?)\]/g;
         ^^^^^^^^^                   MATCH \[this]
                   ^^^^^^^^^         MATCH [this]
Loop with exec to get all the matches:
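var match;
while ((match = r.exec(str)) !== null) {
  // match[1] is undefined when the escaped \[...] alternative matched
  if (match[1] !== undefined) console.log(match[1]);
}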
/(?:[^\\]|^)\[([^\]]*)\]/g
The content is in the first capture group, $1
(?:[^\\]|^) matches the beginning of the string or anything that's not a backslash, non-capturing.
\[ matches an opening bracket.
([^\]]*) captures any number of consecutive characters that are not closing brackets
\] matches a closing bracket
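Where lookbehind is available (ES2018 and later engines), a negative lookbehind can check for the backslash without consuming a character. That also keeps adjacent groups such as [a][b] intact, which the patterns that consume the preceding character do not handle. A sketch with made-up variable names:

```javascript
const text = "I want [this]. I want [this too]. I don't want \\[this]";
// (?<!\\) asserts that the opening bracket is not preceded by a backslash
const unescapedBrackets = /(?<!\\)\[([^\]]*)\]/g;

const contents = Array.from(text.matchAll(unescapedBrackets), (m) => m[1]);
console.log(contents.join(' | ')); // this | this too

// Adjacent groups are both found because nothing before "[" is consumed
const adjacent = Array.from('[a][b]'.matchAll(unescapedBrackets), (m) => m[1]);
console.log(adjacent.join(' | ')); // a | b
```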