Regex matching a string pattern and number (URL format) - JavaScript

I have a string that follows this URL pattern:
https://www.examples.org/activity-group/8712/activity/202803
// note: the ending of the URL can vary
https://www.examples.org/activity-group/8712/activity/202803?ref=bla
https://www.examples.org/activity-group/8712/activity/202803/something
I'm trying to write a regex that matches
https://www.examples.org/activity-group/{number}/activity/{number}*
Where {number} is an integer of length 1 to 10.
How do I define a regex that checks the string pattern and also checks that each number is at the right position in the string?
Background: in a Google Form, I want to validate an answer by requiring people to enter a URL in this format, hence this regular expression.
For URLs that do not match that format, the regex should not match. For example: https://www.notthesite.org/group/8712/activity/astring
I went through several examples, but they only check whether a number is present somewhere in the string.
Example sources:
How to find a number in a string using JavaScript?
Get the first Int(s) in a string with javascript

^https:\/\/www\.examples\.org\/activity-group\/[0-9]{1,10}\/activity\/[0-9]{1,10}(\/[a-z]+)*((\?[a-z]+=[a-zA-Z0-9]+)(\&[a-z]+=[a-zA-Z0-9]+)*)*$
^ - start of string
\ - escape character
[0-9] - a digit
{1,10} - between one and ten of the previous items
(\/[a-z]+)* - Allow additional URL segments
((\?[a-z]+=[a-zA-Z0-9]+)(\&[a-z]+=[a-zA-Z0-9]+)*)* - Allow query parameters with first parameter using a ? and all others using &
$ - end of string
This is assuming the URL segment and query parameter keys are lowercase letters only. The query parameter values can be lowercase letters, uppercase letters, or digits.
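As a quick check, the pattern above can be run against the URLs from the question. This is a minimal sketch: the names activityUrl and isActivityUrl are made up for illustration, and the 11-digit URL is an extra test case that is not in the question.

```javascript
// The pattern from this answer, written as a JavaScript regex literal.
const activityUrl = /^https:\/\/www\.examples\.org\/activity-group\/[0-9]{1,10}\/activity\/[0-9]{1,10}(\/[a-z]+)*((\?[a-z]+=[a-zA-Z0-9]+)(\&[a-z]+=[a-zA-Z0-9]+)*)*$/;

// Hypothetical helper: true when the whole string has the expected shape.
function isActivityUrl(url) {
  return activityUrl.test(url);
}

const base = 'https://www.examples.org/activity-group/8712/activity/202803';
console.log(isActivityUrl(base));                 // true
console.log(isActivityUrl(base + '?ref=bla'));    // true
console.log(isActivityUrl(base + '/something'));  // true
console.log(isActivityUrl('https://www.notthesite.org/group/8712/activity/astring')); // false
// An 11-digit group number exceeds {1,10}, so it is rejected.
console.log(isActivityUrl('https://www.examples.org/activity-group/12345678901/activity/1')); // false
```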

You could use
https?:\/\/(?:[^/]+\/){2}(\d+)\/[^/]+\/(\d+)
See a demo on regex101.com.
Broken down, this says:
https?:\/\/ # http:// or https://
(?:[^/]+\/){2} # not "/", followed by "/", twice
(\d+) # 1+ digits
\/[^/]+\/ # same pattern as above
(\d+) # the other number
You'll need to use group 1 and 2, respectively.
If this is too permissive, use
https:\/\/[^/]+\/activity-group\/(\d+)\/activity\/(\d+)
Which reads
https:\/\/[^/]+ # https:// + some domain name
\/activity-group\/ # /activity-group/
(\d+) # first number
\/activity\/ # /activity/
(\d+) # second number
See another demo on regex101.com.
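To show how groups 1 and 2 are read in JavaScript, here is a small sketch using the stricter pattern. The function name extractIds and the property names are hypothetical. String.prototype.match returns null when nothing matches, so that case is handled explicitly.

```javascript
// The stricter pattern from this answer; [^\/] is the same class as [^/].
const idPattern = /https:\/\/[^\/]+\/activity-group\/(\d+)\/activity\/(\d+)/;

// Hypothetical helper returning both numbers, or null if the URL does not match.
function extractIds(url) {
  const m = url.match(idPattern);
  if (m === null) return null;
  return { groupId: m[1], activityId: m[2] }; // capture groups are strings
}

console.log(extractIds('https://www.examples.org/activity-group/8712/activity/202803?ref=bla'));
// { groupId: '8712', activityId: '202803' }
console.log(extractIds('https://www.notthesite.org/group/8712/activity/astring'));
// null
```

Because the pattern is unanchored and \d+ is unbounded, using it for validation as described in the question would additionally need anchors and a bounded quantifier such as {1,10}.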

You probably need something like:
(https?:\/\/)?www\.examples\.org\/activity-group\/(\d{1,10})\/activity\/(\d{1,10})(\S*)$
Where:
(https?:\/\/)? matches an optional http:// or https:// prefix.
www\.examples\.org is your domain name, with the dots escaped so that they match literally.
(\d{1,10}) matches the first integer, up to 10 digits long (after activity-group).
The second (\d{1,10}) matches the second integer, after activity.
Finally, (\S*)$ matches any optional data after the second number up to the end of the line, assuming you use the multiline flag m. Note that \S also matches digits, so this trailing group will absorb an eleventh digit rather than reject it.
Check it at http://regexr.com/3h448
Hope it helps!
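The effect of the multiline flag can be sketched as follows. The pattern here is a variant of the one above with the dots escaped, a leading ^ added, and a trailing \S* group that may be empty. Those tweaks, the sample text, and the variable names are assumptions made for the example.

```javascript
// g: find every match; m: let ^ and $ match at the start and end of each line.
const linePattern = /^(https?:\/\/)?www\.examples\.org\/activity-group\/(\d{1,10})\/activity\/(\d{1,10})(\S*)$/gm;

const text = [
  'https://www.examples.org/activity-group/8712/activity/202803',
  'www.examples.org/activity-group/1/activity/2?ref=bla',
  'https://www.notthesite.org/group/8712/activity/astring',
].join('\n');

// Collect [first number, second number, trailing data] for each matching line.
const found = [...text.matchAll(linePattern)].map((m) => [m[2], m[3], m[4]]);
console.log(found);
// [ [ '8712', '202803', '' ], [ '1', '2', '?ref=bla' ] ]
```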

Related

Regex - how to exclude a specific string that contains a specific character

I need to exclude any whole string that contains one of these characters: $ % &
The string looks like a URL. It should start with 'http(s)://', 'ftp://' or 'www.' and match everything after it except the invalid characters $ % &
------- For example:-------
Valid strings are:
www.localhost
http://www.aaaaaa.com/aaaaa5-test5
https://map:1234
www.google.com
http://map:1234
Invalid strings are:
http://www.aaaaa%a.com/test5
https://map:12$34
www.google.com&
I have written this regex (https://regex101.com/r/Gl60ls/1)
/(\b(https?:\/\/|ftp:\/\/|www\.).+?([^\$\%\&\s\n])+)/gim
But it matches the first part of the string, up to the invalid character
------- For example: -------
If I have a string http://www.aaaaa%a.com/test5 , it will match http://www.aaaaa
I need to completely exclude the entire string
Any ideas ? I will be so grateful !
The part .+?([^\$\%\&\s\n])+) will match as soon as at least one character is not in your "forbidden" list. It is not forbidding those characters to be matched in the .+? part.
You can use negative look-ahead for your purpose:
(^(https?:\/\/|ftp:\/\/|www\.)(?!.*?[$%&]).+)
If you have several URLs on the same line, separated by a space, then:
\b(https?:\/\/|ftp:\/\/|www\.)([^$%&\s]+)(?!\S)
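Both patterns can be tried against the sample strings from the question. The sketch below is only illustrative: the variable names and the combined test line are assumptions, and the i and g flags are added for convenience.

```javascript
// Whole-string check: the lookahead (?!.*?[$%&]) fails if a forbidden
// character appears anywhere after the prefix.
const wholeLine = /^(https?:\/\/|ftp:\/\/|www\.)(?!.*?[$%&]).+/i;

const samples = [
  'www.localhost',
  'https://map:1234',
  'http://www.aaaaa%a.com/test5',
  'https://map:12$34',
  'www.google.com&',
];
const accepted = samples.filter((s) => wholeLine.test(s));
console.log(accepted); // [ 'www.localhost', 'https://map:1234' ]

// Several URLs on one line: each candidate must run up to whitespace
// (or the end of the string) without hitting a forbidden character.
const perUrl = /\b(https?:\/\/|ftp:\/\/|www\.)([^$%&\s]+)(?!\S)/gi;
const line = 'www.google.com http://www.aaaaa%a.com/test5 https://map:1234';
const urlsOnLine = line.match(perUrl);
console.log(urlsOnLine); // [ 'www.google.com', 'https://map:1234' ]
```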

Regex for matching certain url

It should match these URLs:
https://example.com/id/username/
http://example.com/id/username/
https://www.example.com/id/username
http://example.com/id/username/
Basically, it should start with http or https, then an optional www., then example.com, then /id, and finally the username, which could be anything. The trailing / is not always present.
I've got this so far:
if (input.match(/http\:\/\/example\.com/i)) {
  console.log('-');
}
Also, how can I check with a regex whether a URL ends with 7 digits, like 1234567/ or 3523173? The trailing / is not always present.
Use the following regular expression:
https?:\/\/(www\.)?example\.com\/id\/[a-zA-Z0-9]+
You can change [a-zA-Z0-9] to suit your username format if required. For example:
[a-zA-Z0-9]+ ==> Username contains uppercase letters, lowercase letters, and digits. (john008)
[a-zA-Z]+ ===> Username contains uppercase and lowercase letters. (john)
[0-9]+ ===> Username contains only digits. (123456)
https?\:\/\/(www\.)?example\.com\/id\/([a-zA-Z]+)\/?
Without further specification you could use
\bhttps?:.+?example\.com\/[a-zA-Z]+\/\w+\/?(?=\s|$)
See a demo on regex101.com.
This is
\b # a word boundary
https?: # http/https:
.+? # anything else afterwards, lazily
example\.com # what it says
\/[a-zA-Z]+\/\w+\/? # /id/username with / optional
(?=\s|$) # followed by whitespace or the end of the string ($ is used because JavaScript has no \Z)
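Putting the pieces of this thread together, an anchored check might look like the sketch below. It assumes the username is any run of characters other than / and whitespace, and it adds a separate pattern for the 7-digit ending asked about in the question. Both patterns and all variable names are illustrative and not taken from the answers.

```javascript
// http or https, optional www., example.com, /id/, a username, optional trailing slash.
const profileUrl = /^https?:\/\/(www\.)?example\.com\/id\/[^\/\s]+\/?$/i;

const urls = [
  'https://example.com/id/username/',
  'http://example.com/id/username/',
  'https://www.example.com/id/username',
  'https://example.org/id/username', // wrong domain
];
const results = urls.map((u) => profileUrl.test(u));
console.log(results); // [ true, true, true, false ]

// Ends with exactly seven digits after a slash, with an optional trailing slash.
const endsWithSevenDigits = /\/\d{7}\/?$/;
console.log(endsWithSevenDigits.test('https://example.com/id/1234567/')); // true
console.log(endsWithSevenDigits.test('https://example.com/id/3523173'));  // true
console.log(endsWithSevenDigits.test('https://example.com/id/12345678')); // false
```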

exclude full word with javascript regex word boundary

I'm looking to exclude matches that contain a specific word or phrase. For example, how could I match only the lines marked "match" below? The \b word boundary does not work the way I intuitively expected.
foo.js # match
foo_test.js # do not match
foo.ts # match
fun_tset.js # match
fun_tset_test.ts # do not match
UPDATE
What I want to exclude is strings ending explicitly with _test before the extension. At first I had something like [^_test], but that also excludes any combination of those characters (like line 3).
Regex: ^(?!.*_test\.).*$
Working examples: https://regex101.com/r/HdGom7/1
Why it works: uses negative lookahead to check if _test. exists somewhere in the string, and if so doesn't match it.
Adding to @pretzelhammer's answer, it looks like you want to grab strings that are file names ending in ts or js:
^(?!.*_test)(.*\.[jt]s)$
The expression in the first parentheses is a negative lookahead that excludes any string containing _test. The second parentheses match any string that ends in a period, followed by [jt] (j or t), followed by s; the final $ makes sure that really is the end of the string.
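A short sketch of the negative-lookahead approach applied to the file names from the question. The variable names are made up for the example.

```javascript
// Negative lookahead: reject any name that contains "_test." anywhere.
const notATestFile = /^(?!.*_test\.).*$/;

const files = ['foo.js', 'foo_test.js', 'foo.ts', 'fun_tset.js', 'fun_tset_test.ts'];
const kept = files.filter((name) => notATestFile.test(name));
console.log(kept); // [ 'foo.js', 'foo.ts', 'fun_tset.js' ]
```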

regular expression match numeric value

I'm trying to match a pattern:
show_clipping.php?CLIP_id=*
from:
a href="javascript:void(0);" onclick="MM_openBrWindow('show_clipping.php?CLIP_id=575','news','scrollbars=yes,resizable=yes,width=500,height=400,left=100,top=60')">some text</a>
where
*
can be only numeric values(eg: 0, 1 , 1234)
the result has to return the whole thing(show_clipping.php?CLIP_id=575)
what I've tried:
show_clipping.php\?CLIP_id=([1-9]|[1-9][0-9]|[1-9][0-9][0-9])
but my attempt would truncate the rest of the digits from 575, leaving the results like:
show_clipping.php?CLIP_id=5
How do I match the numeric part properly?
Another issue is that the value 575 can be any numeric value. My regex will not work beyond 3 digits; how do I make it work with any number of digits?
You didn't specify what language you are using, so here is just the regex:
'([^']+)'
Explanation
' # Match a single quote
([^']+) # Capture anything that is not a single quote
' # Match the closing single quote
So basically it captures everything in single quotes; show_clipping.php?CLIP_id=575 is in the first capture group.
See it in action here.
To capture only show_clipping.php?CLIP_id=575 I would use '(.*CLIP_id=[0-9]+)'
' # Match a single quote
(.* # Start capture group, match anything
CLIP_id= # Match the literal string
[0-9]+) # Match one or more digits and close the capture group
' # Match the closing single quote
Answer:
^(0|[1-9][0-9]*)$
answered before:
Regex pattern for numeric values
(answer number 6)
What about this?
onclick.match(/show_clipping\.php\?CLIP_id=\d+/)
["show_clipping.php?CLIP_id=575"]
(From the tags of your question I assume you're using JavaScript)
show_clipping.php\?CLIP_id=(\d+)
\d matches a digit, and + means 1 or more of them.
How about:
/(show_clipping.php\?CLIP_id=[1-9]\d*)/
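The truncation seen in the question comes from alternation order, which the following sketch illustrates. It uses the onclick value from the question and escapes the dot in show_clipping.php; the variable names are assumptions.

```javascript
const onclick = "MM_openBrWindow('show_clipping.php?CLIP_id=575','news','scrollbars=yes,resizable=yes,width=500,height=400,left=100,top=60')";

// The question's attempt: alternatives are tried left to right, and the
// single-digit branch [1-9] succeeds first, so the match stops after "5".
const attempt = /show_clipping\.php\?CLIP_id=([1-9]|[1-9][0-9]|[1-9][0-9][0-9])/;
const truncated = onclick.match(attempt)[0];
console.log(truncated); // show_clipping.php?CLIP_id=5

// \d+ is greedy and takes every consecutive digit, however many there are.
const fixed = /show_clipping\.php\?CLIP_id=(\d+)/;
const m = onclick.match(fixed);
console.log(m[0]); // show_clipping.php?CLIP_id=575
console.log(m[1]); // 575
```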

Javascript multiple regex pattern

I'm trying to exclude some internal IP addresses and some internal IP address formats from viewing certain logos and links in the site. I have multiple ranges of IP addresses (sample given below). Is it possible to write a regex that could match all the IP addresses in the list below using JavaScript?
10.X.X.X
12.122.X.X
12.211.X.X
64.X.X.X
64.23.X.X
74.23.211.92
and 10 more
Quote the periods, replace the X's with \d+, and join them all together with pipes:
const allowedIPpatterns = [
  "10.X.X.X",
  "12.122.X.X",
  "12.211.X.X",
  "64.X.X.X",
  "64.23.X.X",
  "74.23.211.92" //, etc.
];
const allowedRegexStr = '^(?:' +
  allowedIPpatterns.
    join('|').
    replace(/\./g, '\\.').
    replace(/X/g, '\\d+') +
  ')$';
const allowedRegexp = new RegExp(allowedRegexStr);
Then you're all set:
'10.1.2.3'.match(allowedRegexp) // => ['10.1.2.3']
'100.1.2.3'.match(allowedRegexp) // => null
How it works:
First, we have to turn the individual IP patterns into regular expressions matching their intent. One regular expression for "all IPs of the form '12.122.X.X'" is this:
^12\.122\.\d+\.\d+$
^ means the match has to start at the beginning of the string; otherwise, 112.122.X.X IPs would also match.
12 etc: digits match themselves
\.: a period in a regex matches any character at all; we want literal periods, so we put a backslash in front.
\d: shorthand for [0-9]; matches any digit.
+: means "1 or more" - 1 or more digits, in this case.
$: similarly to ^, this means the match has to end at the end of the string.
So, we turn the IP patterns into regexes like that. For an individual pattern you could use code like this:
const regexStr = `^` + ipXpattern.
  replace(/\./g, '\\.').
  replace(/X/g, '\\d+') +
  `$`;
Which just replaces all .s with \. and Xs with \d+ and sticks the ^ and $ on the ends.
(Note the doubled backslashes; both string parsing and regex parsing use backslashes, so wherever we want a literal one to make it past the string parser to the regular expression parser, we have to double it.)
In a regular expression, the alternation this|that matches anything that matches either this or that. So we can check for a match against all the IPs at once if we turn the list into a single regex of the form re1|re2|re3|...|relast.
Then we can do some refactoring to make the regex matcher's job easier; in this case, since all the regexes are going to have ^...$, we can move those constraints out of the individual regexes and put them on the whole thing: ^(10\.\d+\.\d+\.\d+|12\.122\.\d+\.\d+|...)$. The parentheses keep the ^ from being only part of the first pattern and $ from being only part of the last. But since plain parentheses capture as well as group, and we don't need to capture anything, I replaced them with the non-capturing version (?:...).
And in this case we can do the global search-and-replace once on the giant string instead of individually on each pattern. So the result is the code above:
const allowedRegexStr = '^(?:' +
  allowedIPpatterns.
    join('|').
    replace(/\./g, '\\.').
    replace(/X/g, '\\d+') +
  ')$';
That's still just a string; we have to turn it into an actual RegExp object to do the matching:
const allowedRegexp = new RegExp(allowedRegexStr);
As written, this doesn't filter out illegal IPs - for instance, 10.1234.5678.9012 would match the first pattern. If you want to limit the individual byte values to the decimal range 0-255, you can use a more complicated regex than \d+, like this:
(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])
That matches "any one or two digits, or '1' followed by any two digits, or '2' followed by any of '0' through '4' followed by any digit, or '25' followed by any of '0' through '5'". Replacing the \d+ with that turns the full string-munging expression into this:
const allowedRegexStr = '^(?:' +
  allowedIPpatterns.
    join('|').
    replace(/\./g, '\\.').
    replace(/X/g, '(?:\\d{1,2}|1\\d{2}|2[0-4]\\d|25[0-5])') +
  ')$';
And makes the actual regex look much more unwieldy:
^(?:10\.(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\.(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\.(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])|12\.122\....
but you don't have to look at it, just match against it. :)
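The difference between the loose \d+ version and the 0-255 version can be checked with a short sketch. The helper buildIpRegExp, the constant OCTET, and the shortened pattern list are hypothetical names and data chosen for the example. A function is passed to replace so that the wildcard text is inserted literally.

```javascript
const samplePatterns = ['10.X.X.X', '12.122.X.X', '74.23.211.92'];

// Regex source for one decimal byte, 0-255, as given in the answer above.
const OCTET = '(?:\\d{1,2}|1\\d{2}|2[0-4]\\d|25[0-5])';

// Hypothetical helper: turn the X-patterns into one anchored RegExp,
// substituting `wildcard` for every X.
function buildIpRegExp(patterns, wildcard) {
  const source = patterns
    .join('|')
    .replace(/\./g, '\\.')
    .replace(/X/g, () => wildcard);
  return new RegExp('^(?:' + source + ')$');
}

const loose = buildIpRegExp(samplePatterns, '\\d+');
const strict = buildIpRegExp(samplePatterns, OCTET);

console.log(loose.test('10.1234.5678.9012'));  // true
console.log(strict.test('10.1234.5678.9012')); // false
console.log(strict.test('10.255.0.1'));        // true
console.log(strict.test('10.256.0.1'));        // false
console.log(strict.test('74.23.211.92'));      // true
```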
You could do it in regex, but it's not going to be pretty, especially since JavaScript doesn't even support verbose regexes, which means that it has to be one humongous line of regex without any comments. Furthermore, regexes are ill-suited for matching ranges of numbers. I suspect that there are better tools for dealing with this.
Well, OK, here goes (for the samples you provided):
var myregexp = /\b(?:74\.23\.211\.92|(?:12\.(?:122|211)|64\.23)\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])|(?:10|64)\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]))\b/g;
As a verbose ("readable") regex:
\b # start of number
(?: # Either match...
74\.23\.211\.92 # an explicit address
| # or
(?: # an address that starts with
12\.(?:122|211) # 12.122 or 12.211
| # or
64\.23 # 64.23
)
\. # .
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\. # followed by 0..255 and a dot
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]) # followed by 0..255
| # or
(?:10|64) # match 10 or 64
\. # .
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\. # followed by 0..255 and a dot
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\. # followed by 0..255 and a dot
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]) # followed by 0..255
)
\b # end of number
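Since JavaScript has no verbose flag, one workaround is to assemble the same pattern from named, commented pieces and compile it with new RegExp. The sketch below does that for the regex above. The variable names are illustrative, and String.raw is used only so that the backslashes do not need doubling.

```javascript
// A decimal byte, 0..255.
const octet = String.raw`(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])`;
const dot = String.raw`\.`;

const source = [
  String.raw`\b(?:`,
  String.raw`74\.23\.211\.92`,            // an explicit address
  '|',
  String.raw`(?:12\.(?:122|211)|64\.23)`, // 12.122, 12.211 or 64.23
  dot, octet, dot, octet,                 // followed by two bytes
  '|',
  String.raw`(?:10|64)`,                  // 10 or 64
  dot, octet, dot, octet, dot, octet,     // followed by three bytes
  String.raw`)\b`,
].join('');

const internalIp = new RegExp(source);

console.log(internalIp.test('12.211.0.255')); // true
console.log(internalIp.test('64.23.1.1'));    // true
console.log(internalIp.test('74.23.211.93')); // false
console.log(internalIp.test('12.123.0.1'));   // false
```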
/^(X|\d{1,3})(\.(X|\d{1,3})){3}$/ should do it.
If you don't actually need to match the "X" character you could use this:
\b(?:\d{1,3}\.){3}\d{1,3}\b
Otherwise I would use the solution cebarrett provided.
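I'm not entirely sure what you're trying to achieve here (it doesn't look like anyone else is either).
However, if it's validation, then here's a solution to validate an IP address that doesn't use RegEx. First, split the input string at the dots. Then check that there are exactly four segments and, using parseInt on each one, make sure it is a plain number no higher than 255.
function ipValidator(ipAddress) {
  var ipSegments = ipAddress.split('.');
  // An IPv4 address has exactly four dot-separated segments.
  if (ipSegments.length !== 4) {
    return 'fail';
  }
  for (var i = 0; i < ipSegments.length; i++) {
    var value = parseInt(ipSegments[i], 10);
    // Reject segments that are not plain decimal numbers (this also rejects
    // leading zeros such as "010") and values outside 0-255.
    if (String(value) !== ipSegments[i] || value < 0 || value > 255) {
      return 'fail';
    }
  }
  return 'match';
}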
Running the following returns 'match':
document.write(ipValidator('10.255.255.125'));
Whereas this will return 'fail':
document.write(ipValidator('10.255.256.125'));
Here's an annotated version in a jsfiddle with some examples: http://jsfiddle.net/VGp2p/2/
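The same split-and-compare idea can also be extended to the X patterns from the question without building a regex per pattern. The sketch below is one possible way to do it: isOctet, matchesIpPattern, and isInternal are hypothetical names, the pattern list is shortened, and octets are assumed to be written without leading zeros.

```javascript
// True for a decimal string 0..255 with no leading zeros, signs, or spaces.
function isOctet(segment) {
  const value = Number.parseInt(segment, 10);
  return String(value) === segment && value >= 0 && value <= 255;
}

// Hypothetical helper: compare an address with a pattern such as "12.122.X.X",
// where X stands for any valid octet.
function matchesIpPattern(ip, pattern) {
  const ipParts = ip.split('.');
  const patternParts = pattern.split('.');
  if (ipParts.length !== 4 || patternParts.length !== 4) return false;
  return ipParts.every(
    (part, i) => isOctet(part) && (patternParts[i] === 'X' || patternParts[i] === part)
  );
}

const patterns = ['10.X.X.X', '12.122.X.X', '74.23.211.92'];
const isInternal = (ip) => patterns.some((p) => matchesIpPattern(ip, p));

console.log(isInternal('10.1.2.3'));       // true
console.log(isInternal('100.1.2.3'));      // false
console.log(isInternal('10.255.256.125')); // false
console.log(isInternal('74.23.211.92'));   // true
```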
