Regex: Getting content from URL - javascript

I want to get "the-game" using regex from URLs like
http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/another-one/another-one/
http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/another-one/
http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/

What parts of the URL could vary and what parts are constant? The following regex matches whatever sits between "/en/" and the next slash ("the-game" in your example).
(?<=/en/).*?(?=/)
This one will match the contents of the 2nd set of slashes of any URL containing "webdev", assuming the first set of slashes contains a 2 or 3 character language code.
(?<=.*?webdev.*?/.{2,3}/).*?(?=/)
Hopefully you can tweak these examples to accomplish what you're looking for.
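Both patterns above rely on lookbehind, which JavaScript engines only support from ES2018 onward. As a rough sketch of the same idea without lookbehind, a capture group can do the job. The function name segmentAfterEn is made up for illustration, and it assumes the language code is always "en".

```javascript
// Hypothetical helper: capture the segment after "/en/" without lookbehind.
function segmentAfterEn(url) {
  // Group 1 holds everything up to (not including) the next slash.
  const match = /\/en\/([^\/]+)/.exec(url);
  return match ? match[1] : null;
}

console.log(segmentAfterEn(
  'http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/'
)); // the-game
```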

var myregexp = /^(?:[^\/]*\/){4}([^\/]+)/;
var match = myregexp.exec(subject); // subject holds the URL string
var result;
if (match != null) {
    result = match[1];
} else {
    result = "";
}
This matches whatever lies between the fourth and fifth slash and stores it in the variable result.

You probably should use some kind of url parsing library rather than resorting to using regex.
In Python 3:
from urllib.parse import urlparse
url = urlparse('http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/another-one/another-one/')
print(url.path)
Which would yield:
/en/the-game/another-one/another-one/another-one/
From there, you can do simple things like stripping /en/ from the beginning of the path. Otherwise, you're bound to do something wrong with a regular expression. Don't reinvent the wheel!
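Since the question is tagged JavaScript, the same approach can be sketched with the standard URL class available in browsers and Node.js. The helper name secondPathSegment is hypothetical, and the sketch assumes the wanted value is always the second path segment, right after the language code.

```javascript
// Hypothetical helper: return the path segment that follows the language code.
function secondPathSegment(urlString) {
  const path = new URL(urlString).pathname;          // e.g. "/en/the-game/another-one/"
  const segments = path.split('/').filter(Boolean);  // drop the empty strings
  return segments.length > 1 ? segments[1] : '';
}

console.log(secondPathSegment(
  'http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/'
)); // the-game
```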

Related

Match only # and not ## without negative lookbehind

Using JavaScript, I need a regex that matches any instance of #{this-format} in any string. My original regex was the following:
#{[a-z-]*}
However, I also need a way to "escape" those instances. I want it so that if you add an extra #, the match gets escaped, like ##{this}.
I originally used a negative lookbehind:
(?<!#)#{[a-z-]*}
And that would work just fine, except... lookbehinds are an ECMAScript2018 feature, only supported by Chrome.
I read some people suggesting the usage of a negated character set. So my little regex became this:
(?:^|[^#])#{[a-z-]*}
...which would have worked just as well, except it doesn't work if you put two of these together: #{foo}#{bar}
So, does anyone know how I can achieve this? Remember that these conditions need to be met:
Find #{this} anywhere in a string
Be able to escape like ##{this}
Be able to put multiple adjacent, like #{these}#{two}
Lookbehinds must not be used
If you include ## in your regex pattern as an alternate match option, it will consume the ## instead of allowing a match on the subsequent bracketed entity. Like this:
##|(#{[a-z-]*})
You can then evaluate the inner match object in javascript. Here is a jsfiddle to demonstrate, using the following code.
var targetText = '#{foo} in a #{bar} for a ##{foo} and #{foo}#{bar} things.'
var reg = /##|(#{[a-z-]*})/g;
var result;
while ((result = reg.exec(targetText)) !== null) {
    if (result[1] !== undefined) {
        alert(result[1]);
    }
}
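For what it's worth, here is a small sketch that wraps the same "##" alternation in a function and collects the results instead of alerting them. The name findPlaceholders is made up, and it uses String.prototype.matchAll (ES2020) rather than the exec loop, so treat it as one possible variation rather than the answer's exact code.

```javascript
// Hypothetical helper: collect every unescaped #{name} placeholder.
function findPlaceholders(text) {
  const pattern = /##|(#\{[a-z-]*\})/g;
  const found = [];
  for (const match of text.matchAll(pattern)) {
    // match[1] is undefined when the "##" branch consumed the escape
    if (match[1] !== undefined) {
      found.push(match[1]);
    }
  }
  return found;
}

console.log(findPlaceholders('#{foo}#{bar} and ##{baz}')); // [ '#{foo}', '#{bar}' ]
```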
You could use (?:^|[^#])# to match the start of the pattern, and capture the following #{<sometext>} in a group. Since you don't want the initial (possible) [^#] to be in the result, you'll have to iterate over the matches manually and extract the group that contains the substring you want. For example:
function test(str) {
    const re = /(?=(?:^|[^#])(#{[a-z-]*}))./g;
    let match;
    const matches = [];
    while (match = re.exec(str)) {
        matches.push(match[1]); // extract the captured group
    }
    return matches;
}
console.log(test('##{this}'))
console.log(test('#{these}#{two}'))

How would I write a Regular Expression to capture the value between Last Slash and Query String?

Problem:
Extract image file name from CDN address similar to the following:
https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba
Two-stage Solution:
I am using two regular expressions to retrieve the file name:
var postLastSlashRegEx = /[^\/]+$/,
preQueryRegEx = /^([^?]+)/;
var fileFromURL = urlString.match(postLastSlashRegEx)[0].match(preQueryRegEx)[0];
// fileFromURL = "photo%2FB%_2.jpeg"
Question:
Is there a way I can combine both regular expressions?
I've tried using capture groups, but haven't been able to produce a working solution.
From my comment
You can use a lookahead to find the "?" and use [^/] to match any non-slash characters.
/[^/]+(?=\?)/
To remove the dependency on the URL needing a "?", you can make the lookahead match a question mark or the end of line indicator (represented by $), but make sure the first glob is non-greedy.
/[^/]+?(?=\?|$)/
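A quick sketch of how that second pattern behaves with and without a query string. The function name fileNameFromUrl and the second sample URL are made up for illustration, and the sketch assumes the query string itself contains no slash.

```javascript
// Lazy run of non-slash characters that ends right before "?" or end of string.
const fileNamePattern = /[^/]+?(?=\?|$)/;

// Hypothetical helper wrapping the pattern.
function fileNameFromUrl(urlString) {
  const match = urlString.match(fileNamePattern);
  return match ? match[0] : null;
}

console.log(fileNameFromUrl(
  'https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba'
)); // photo%2FB%_2.jpeg
console.log(fileNameFromUrl('https://cdnstorage.api.com/v0/b/my-app.com/o/photo.jpeg')); // photo.jpeg
```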
You don't have to use regex, you can just use split and substr.
var str = "https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba".split("?")[0];
var fileName = str.substr(str.lastIndexOf('/')+1);
but if regex is important to you, then:
str.match(/[^?]*\/([^?]+)/)[1]
The code using the substring method would look like the following -
var fileFromURL = urlString.substring(urlString.lastIndexOf('/') + 1, urlString.lastIndexOf('?'))
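One thing to watch with the substring version: if the URL has no "?", lastIndexOf('?') returns -1 and the result is no longer the file name. A minimal sketch that guards against that, with the hypothetical name fileNameBySubstring:

```javascript
// Hypothetical helper: cut off the query first, then take what follows the last slash.
function fileNameBySubstring(urlString) {
  const queryStart = urlString.indexOf('?');
  const withoutQuery = queryStart === -1 ? urlString : urlString.slice(0, queryStart);
  return withoutQuery.slice(withoutQuery.lastIndexOf('/') + 1);
}

console.log(fileNameBySubstring(
  'https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba'
)); // photo%2FB%_2.jpeg
```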

Regex to match the first URL once it is followed by a space

I want to match the first URL that is followed by a space, using a regular expression, while the user types in an input box.
For example:
If I type www.google.com, it should be matched only once the URL is followed by a space,
i.e. www.google.com<SPACE>
Code
$(".site").keyup(function () {
    var site = $(this).val();
    var exp = /^http(s?):\/\/(\w+:{0,1}\w*)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%#!\-\/]))?/;
    var find = site.match(exp);
    var url = find ? find[0] : null;
    if (url === null) {
        var exp = /[-\w]+(\.[a-z]{2,})+(\S+)?(\/|\/[\w#!:.?+=&%#!\-\/])?/g;
        var find = site.match(exp);
        url = find ? 'http://' + find[0] : null;
    }
});
Please help, Thanks in advance
You should be using a better regex to correctly match the query and fragment parts of your URL. Have a look at "What is the best regular expression to check if a string is a valid URL?" for a regex that follows the IRI/URI structure.
But here's a rudimentary version:
var regex = /[-\w]+(\.[a-z]{2,})+(\/?)([^\s]+)/g;
var text = 'test google.com/?q=foo basdasd www.url.com/test?q=asdasd#cheese something else';
console.log(text.match(regex));
Expected Result:
["google.com/?q=foo", "www.url.com/test?q=asdasd#cheese"]
If you really want to check for URLs, make sure you include scheme, port, username & password checks just to be safe.
In the context of what you're trying to achieve, you should really put in some delay so that you don't impact browser performance. Regex tests can be expensive when you use complex rules especially so when running the same rule every time a new character is entered. Just think about what you're trying to achieve and whether or not there's a better solution to get there.
With a lookahead:
var exp = /[-\w]+(\.[a-z]{2,})+(\S+)?(\/|\/[\w#!:.?+=&%#!\-\/])?(?= )/g;
I only added this "(?= )" to your regex.
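To illustrate what the lookahead changes, here is a small sketch using the same pattern. The helper name completedUrls is made up; the point is only that nothing matches until a space follows the URL.

```javascript
// Same pattern as above: the trailing (?= ) requires a space right after the URL.
const exp = /[-\w]+(\.[a-z]{2,})+(\S+)?(\/|\/[\w#!:.?+=&%#!\-\/])?(?= )/g;

// Hypothetical helper: return every URL that has already been "completed" by a space.
function completedUrls(input) {
  return input.match(exp) || [];
}

console.log(completedUrls('www.google.com'));  // []
console.log(completedUrls('www.google.com ')); // [ 'www.google.com' ]
```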

How to find in javascript with regular expression string from url?

Good evening. How can I use a regular expression in JavaScript to extract a string from a URL? For example, I have the URL http://www.odsavacky.cz/blog/wpcproduct/mikronebulizer/ and I need only the string between the last two slashes, as in http://something.cz/something/string/. In this example the word I need is mikronebulizer. Thank you very much for your help.
You could use a regex match with a group.
Use this:
/([\w\-]+)\/$/.exec("http://www.odsavacky.cz/blog/wpcproduct/mikronebulizer/")[1];
Here's a jsfiddle showing it in action
This part: ([\w\-]+)
means one or more characters from the set of alphanumerics, underscore and hyphen, captured as the first match group.
It is followed by a /
and finally by: $
which means the string must end there.
.exec() returns an array whose first value is the full match (i.e. "mikronebulizer/"), followed by each match group.
So .exec()[1] returns your value: mikronebulizer
Simply:
url.match(/([^\/]*)\/$/);
Should do it.
If you want to match (optionally) without a trailing slash, use:
url.match(/([^\/]*)\/?$/);
See it in action here: http://regex101.com/r/cL3qG3
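Note that match() returns an array, so the segment itself is in index 1. A minimal sketch using the optional-trailing-slash pattern, with the hypothetical helper name lastSegment:

```javascript
// Hypothetical helper: return the last path segment, with or without a trailing slash.
function lastSegment(url) {
  const match = url.match(/([^\/]*)\/?$/);
  return match ? match[1] : '';
}

console.log(lastSegment('http://www.odsavacky.cz/blog/wpcproduct/mikronebulizer/')); // mikronebulizer
console.log(lastSegment('http://www.odsavacky.cz/blog/wpcproduct/mikronebulizer'));  // mikronebulizer
```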
If you have the url provided, then you can do it this way:
var url = 'http://www.odsavacky.cz/blog/wpcproduct/mikronebulizer/';
var urlsplit = url.split('/');
var urlEnd = urlsplit[urlsplit.length- (urlsplit[urlsplit.length-1] == '' ? 2 : 1)];
This returns everything after the last slash if there is any content there; otherwise it returns the part between the second-to-last and the last slash.
Something else to consider: yes, a pure regex approach might be easier (heck, and faster), but I wanted to include this simply to point out window.location.pathname.
function getLast() {
    // Strip trailing slash if present
    var path = window.location.pathname.replace(/\/$/, '');
    return path.split('/').pop();
}
Alternatively, you could get it using split:
var pieces = "http://www.odsavacky.cz/blog/wpcproduct/mikronebulizer/".split("/");
var lastSegment = pieces[pieces.length - 2];
// lastSegment == mikronebulizer
var url = 'http://www.odsavacky.cz/blog/wpcproduct/mikronebulizer/';
if (url.slice(-1) == "/") {
    url = url.substr(0, url.length - 1);
}
var lastSegment = url.split('/').pop();
document.write(lastSegment + "<br>");

Regex to detect a string that contains a URL or file extension

I'm trying to create a small script that detects whether the string input is either:
1) a URL (which will hold a filename): 'http://ajax.googleapis.com/html5shiv.js'
2) just a filename: 'html5shiv.js'
So far I've found this but I think it just checks the URL and file extension. Is there an easy way to make it so it uses an 'or' check? I'm not very experienced with RegExp.
var myRegExp = /[^\\]*\.(\w+)$/i;
Thank you in advance.
How about this regex?
(\.js)$
It checks whether the line ends with .js.
$ denotes end of line.
Tested here.
Basically, to use 'OR' in regex, simply use the 'pipe' delimiter.
(aaa|bbb)
will match
aaa
or
bbb
For regex to match a url, I'd suggest the following:
\w+://[\w\._~:/?#\[\]@!$&'()*+,;=%]*
This is based on the allowed character set for a URL.
For the file, what's your definition of a filename?
If you want to search for strings that match "one or more non-fullstop characters, followed by a fullstop, followed by one or more non-fullstop characters", I'd suggest the following regex:
[^\.]+\.[^\.]+
And altogether:
(\w+://[\w\._~:/?#\[\]@!$&'()*+,;=%]*|[^\.]+\.[^\.]+)
Here's a working example (in JavaScript): jsfiddle
You can test the regex online here: http://gskinner.com/RegExr/
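As a rough sketch of how the two patterns could be used to tell the inputs apart, the snippet below anchors each one with ^ and $ and checks them in turn. The names classify, urlPattern and fileNamePattern are made up, and the anchoring, the added hyphen and the exclusion of slashes from file names are my assumptions rather than part of the answer above.

```javascript
// Assumed variants of the patterns above, anchored to the whole input.
const urlPattern = /^\w+:\/\/[\w.~:\/?#\[\]@!$&'()*+,;=%-]*$/;
const fileNamePattern = /^[^.\/]+\.[^.\/]+$/;

// Hypothetical helper: report which kind of input was given.
function classify(input) {
  if (urlPattern.test(input)) return 'url';
  if (fileNamePattern.test(input)) return 'filename';
  return 'neither';
}

console.log(classify('http://ajax.googleapis.com/html5shiv.js')); // url
console.log(classify('html5shiv.js'));                            // filename
```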
If it is for the purpose of flow control you can do the following:
var test = "http://ajax.googleapis.com/html5shiv.js";
// to recognize http & https
var regex = /^https?:\/\/.*/i;
var result = regex.exec(test);
if (result == null) {
    // no URL found code
} else {
    // URL found code
}
For the purpose of capturing the file name you could use:
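var test = "http://ajax.googleapis.com/html5shiv.js";
var regex = /(\w+\.\w+)$/i;
var match = regex.exec(test); // exec() returns an array (or null), not the string itself
var filename = match ? match[1] : null; // "html5shiv.js"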
Yes, you can use the alternation operator |. Be careful, though, because its priority is very low. Lower than sequencing. You will need to write things like /(cat)|(dog)/.
It's very hard to understand exactly what you want with so few use/test cases, but
(http://[a-zA-Z0-9\./]+)|([a-zA-Z0-9\.]+)
should give you a starting point.
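To make the precedence point concrete, here is a small sketch comparing an ungrouped and a grouped alternation. The variable names loose and grouped are just for illustration.

```javascript
// Without grouping, the anchors bind to one alternative each:
// this reads as (^cat) OR (dog$).
const loose = /^cat|dog$/;

// With a group, the whole string must be exactly "cat" or "dog".
const grouped = /^(?:cat|dog)$/;

console.log(loose.test('catalog'));   // true  (starts with "cat")
console.log(grouped.test('catalog')); // false
console.log(grouped.test('dog'));     // true
```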
If it's a URL, strip it down to the last part and treat it the same way as "just a filename".
function isFile(fileOrUrl) {
    // This will return everything after the last '/'; if there's
    // no forward slash in the string, the unmodified string is used
    var filename = fileOrUrl.split('/').pop();
    return (/.+\..+/).test(filename);
}
Try this:
var ajx = 'http://ajax.googleapis.com/html5shiv.js';
function isURL(str) {
    return /((\/\w+)|(^\w+))\.\w{2,}$/.test(str);
}
console.log(isURL(ajx));
Have a look at this (requires no regex at all):
var filename = string.indexOf('/') == -1
? string
: string.split('/').slice(-1)[0];
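Here is the program!
<script>
var url = "Home/this/example/file.js";
var condition = 0; // counts the separators ("." or "/") seen so far
var result = "";
// Walk backwards: skip the extension, then collect characters until the last "/"
for (var i = url.length - 1; i >= 0 && condition < 2; i--) {
    if (url[i] != "/" && url[i] != ".") {
        if (condition == 1) { result = url[i] + result; }
    } else {
        condition++;
    }
}
document.write(result); // writes "file"
</script>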
