How to extract end of URL in Javascript?


I have URLs in the form:
serverName/app/image/thumbnail/2012/4/23/1335228884300/bb65efd50ade4b3591dcf7f4c693042b
Where serverName is the domain name of the server.
I would like to write a JS function that accepts one of these URLs and returns the very last (right-most) forward-slash-delimited string. So if the URL above was passed into the function, it would return "bb65efd50ade4b3591dcf7f4c693042b".
function getImageDirectoryByFullURL(url) {
// ... Not sure how to define regexp to delimit on forward slashes,
// or how to start searching from end of string.
}

Split by slashes /, pop off the last element and return it:
function getImageDirectoryByFullURL(url){
return url.split('/').pop()
}
//a step by step breakdown
function getImageDirectoryByFullURL(url){
url = url.split('/'); //url = ["serverName","app",...,"bb65efd50ade4b3591dcf7f4c693042b"]
url = url.pop(); //url = "bb65efd50ade4b3591dcf7f4c693042b"
return url; //return "bb65efd50ade4b3591dcf7f4c693042b"
}
This splits the URL on / and returns an array of the values between, but not including, the slashes. Since split() returns an array, we can use pop() to remove its last item and return it.

In this case, substr() might be faster than split(). Not 100% sure.
function getImageDirectoryByFullURL(url){
return url.substr(url.lastIndexOf("/")+1);
}
Edit: I forgot, you don't need to include the extra length parameter. Without it, substr() returns everything to the end of the string, which is what is desired. While this solution is admittedly a little uglier than Joseph's answer, it is twice as fast in Chrome and something like five times as fast in Firefox.

To make it a little more robust and allow for the possible presence of a trailing slash, hash tags or query parameters on the URL:
function getImageDirectoryByFullURL(url){
url = url.replace(/#[^#]+$/, "").replace(/\?[^\?]+$/, "").replace(/\/$/, "");
return url.substr(url.lastIndexOf("/") + 1);
}
And a working demo with a bunch of test cases: http://jsfiddle.net/jfriend00/akVVf/
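Where the standard URL constructor is available (modern browsers and Node.js), another option is to let it do the parsing instead of stripping the hash and query by hand. This is only a minimal sketch: the function name lastPathSegment and the placeholder base URL are assumptions made for illustration, and the base is needed only because the URLs in the question have no scheme.

```javascript
// Hypothetical helper, not part of any answer above.
function lastPathSegment(url) {
  // The second argument is a placeholder base; it is used only when
  // `url` is relative (for example "serverName/app/...").
  const { pathname } = new URL(url, 'http://placeholder.invalid/');
  // pathname never contains the query string or the hash.
  // filter(Boolean) drops empty pieces caused by leading/trailing slashes.
  const segments = pathname.split('/').filter(Boolean);
  return segments.length ? segments[segments.length - 1] : '';
}

console.log(lastPathSegment('http://serverName/app/image/abc/?x=1#frag'));
```

Note that the returned segment stays percent-encoded, so decodeURIComponent may be needed if the names can contain escaped characters.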

Since there are no sensible regular expression versions, consider:
return url.replace(/^.*\//,'');

Also you can try something like this, getting the same result.
function getImageUrl( url){
var result = url.substring(url.lastIndexOf("/") + 1);
return result;
}

Related

How can I cut the string after a second underscore?

I'm receiving a list of files in an object and I just need to display a file name and its type in a table.
All files come back from a server in such format: timestamp_id_filename.
Example: 1568223848_12345678_some_document.pdf
I wrote a helper function which cuts the string.
At first, I did it with String.prototype.split() method, I used regex, but then again - there was a problem. Files can have underscores in their names so that didn't work, so I needed something else. I couldn't come up with a better idea. I think it looks really dumb and it's been haunting me the whole day.
The function looks like this:
const shortenString = (attachmentName) => {
const file = attachmentName
.slice(attachmentName.indexOf('_') + 1)
.slice(attachmentName.slice(attachmentName.indexOf('_') + 1).indexOf('_') + 1);
const fileName = file.slice(0, file.lastIndexOf('.'));
const fileType = file.slice(file.lastIndexOf('.'));
return [fileName, fileType];
};
I wonder if there is a more elegant way to solve the problem without using loops.

You can use replace and split: the pattern removes everything up to and including the second _ from the start of the string, and then we split on . to get the name and type.
let nameAndType = (str) => {
let replaced = str.replace(/^(?:[^_]*_){2}/g, '')
let splited = replaced.split('.')
let type = splited.pop()
let name = splited.join('.')
return {name,type}
}
console.log(nameAndType("1568223848_12345678_some_document.pdf"))
console.log(nameAndType("1568223848_12345678_some_document.xyz.pdf"))

function splitString(val){
return val.split('_').slice(2).join('_');
}

const getShortString = (str) => str.replace(/^(?:[^_]*_){2}/g, '')
For input like
1568223848_12345678_some_document.pdf, it should give you something like some_document.pdf

const re = /(.*?)_(.*?)_(.*)/;
const name = "1568223848_12345678_some_document.pdf";
const [, date, id, filename] = re.exec(name);
console.log(date);
console.log(id);
console.log(filename);
some notes:
you want to make the regular expression 1 time. If you do this
function getParts(str) {
const re = /expression/;
...
}
Then you're making a new regular expression object every time you call getParts.
.*? is faster than .*
This is because .* is greedy so the moment the regular expression engine sees that it puts the entire rest of the string into that slot and then checks if can continue the expression. If it fails it backs off one character. If that fails it backs off another character, etc.... .*? on the other hand is satisfied as soon as possible. So it adds one character then sees if the next part of the expression works, if not it adds one more character and sees if the expressions works, etc..
splitting on '_' works but it could potentially make many temporary strings
for example if the filename is 1234_1343_a________________________.pdf
you'd have to test to see if using a regular expression is faster or slower than splitting, assuming speed matters.
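As a rough illustration of the first note, the regular expression can be created once outside the function and reused on every call. The names PARTS_RE and getParts below are illustrative, and the pattern uses [^_]* for the first two fields as an assumed alternative to .*? (it cannot run past an underscore, so no backtracking is involved there).

```javascript
// Created once, when the script loads, and reused by every call.
const PARTS_RE = /^([^_]*)_([^_]*)_(.*)$/;

function getParts(str) {
  const match = PARTS_RE.exec(str);
  // exec returns null when the string does not contain two underscores.
  if (match === null) return null;
  const [, timestamp, id, filename] = match;
  return { timestamp, id, filename };
}

console.log(getParts('1568223848_12345678_some_document.pdf'));
```

Returning null for input that does not match avoids the TypeError that destructuring a failed exec result would throw.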

You can kinda chain .indexOf to get second offset and any further, although more than two would look ugly. The reason is that indexOf takes start index as second argument, so passing index of the first occurrence will help you find the second one:
var secondUnderscoreIndex = name.indexOf("_",name.indexOf("_")+1);
So my solution would be:
var index = name.indexOf("_",name.indexOf("_")+1);
var [timestamp, name] = [name.substring(0, index), name.substr(index+1)];
Alternatively, using regular expression:
var [,number1, number2, filename, extension] = /([0-9]+)_([0-9]+)_(.*?)\.([0-9a-z]+)/i.exec(name)
// Prints: "1568223848 12345678 some_document pdf"
console.log(number1, number2, filename, extension);

I like simplicity...
If you ever need the timestamp and id, they're in [1] and [2].
var getFilename = function(str) {
return str.match(/(\d+)_(\d+)_(.*)/)[3];
}
var f = getFilename("1568223848_12345678_some_document.pdf");
console.log(f)

If file names always come in the format timestamp_id_filename, you can use a regular expression that skips the first two '_' and captures what follows.
test:
var filename = '1568223848_12345678_some_document.pdf';
console.log(filename.match(/[^_]+_[^_]+_(.*)/)[1]); // result: 'some_document.pdf'
Explanation:
/[^_]+_[^_]+_(.*)/
[^_]+ : match one or more characters other than '_'
_ : match a literal '_' character
Repeat, so the first two '_' are skipped
(.*) : capture the remaining characters in a group
match method: returns an array whose first element is the full match; the following elements are the captured groups.

Split the file name string into an array on underscores.
Discard the first two elements of the array.
Join the rest of the array with underscores.
Now you have your file name.
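The steps above can be sketched as follows. The extra split into name and extension mirrors the [fileName, fileType] pair from the question; the function name splitAttachmentName is made up for this example.

```javascript
// Hypothetical helper combining split/slice/join with lastIndexOf.
function splitAttachmentName(attachmentName) {
  // Drop the timestamp and id, then restore underscores inside the file name.
  const file = attachmentName.split('_').slice(2).join('_');
  const dot = file.lastIndexOf('.');
  // No dot (or only a leading dot): treat the whole string as the name.
  if (dot <= 0) return [file, ''];
  return [file.slice(0, dot), file.slice(dot)];
}

console.log(splitAttachmentName('1568223848_12345678_some_document.pdf'));
```

As in the question's helper, the file type is returned with its leading dot.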

covering query string variations of a window.location.href

Consider the following JS code:
if ( window.location.href == "https://teamtreehouse.com/signin" ) {
// do stuff...
}
I need the comparison to cover not only the exact URL, but also any possible variation of it with query strings and data coming after the phrase "signin".
How would you do that in JS? I know it should involve regex, but as linear learning is important to me, I would prefer to wait for my course on JS regex in the coming weeks and just ask here on this special occasion.

Try the following regex:
/^https:\/\/teamtreehouse\.com\/signin/i
The ^ means "starting with", and the extra \ characters are just to escape special characters inside the regex.
As per the comment by @epascarello, you should definitely use the "ignore-case" parameter, i (added above).
Edit:
Use the test function:
(/^https:\/\/teamtreehouse\.com\/signin/i).test(window.location.href)

You can use String#match to check if some string contains something.
Ex:
if (window.location.href.match("https://teamtreehouse.com/signin")){
// Do something
}
You can also use the .host and .pathname properties of the window.location object.
Ex:
var link = window.location.host + window.location.pathname;
if (link === "teamtreehouse.com/signin"){
// Do something
}
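A variation on the same idea, sketched here with the standard URL constructor so it can be tried outside a browser: compare only the origin and the path, which ignores any query string or hash. The function name isSigninPage is an assumption for illustration, and whether extra path segments after /signin should also count is a judgment call this sketch answers with "no".

```javascript
// Hypothetical helper: true for the sign-in page regardless of ?query or #hash.
function isSigninPage(href) {
  const url = new URL(href);
  // origin is scheme + host (+ port); pathname excludes query and hash.
  return url.origin === 'https://teamtreehouse.com' && url.pathname === '/signin';
}

console.log(isSigninPage('https://teamtreehouse.com/signin?next=%2Fhome#top'));
```

In a page this would be called as isSigninPage(window.location.href).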

if ( /^https:\/\/teamtreehouse\.com\/signin([?#].*)?$/.test(window.location.href)) {
// do stuff...
}

Regex do not match content but whole searched string

I'm using this regex to match an "href" attribute in a <a> tag:
var href_matches = postRep.match(/href="(.*?)"/g);
The regex matches correctly the href except it returns the whole "href=http:example.com" string.
How do I manage to get only the href value (eg. "example.com")?

You can either run exec() on the regex :
var url_match = /href="(.*?)"/g.exec(postRep);
or remove the global flag
var url_match = postRep.match(/href="(.*?)"/);
Using String's match() function won't return captured groups if the
global modifier is set.
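If every href in the string is needed, String.prototype.matchAll (ES2020) keeps the global flag and still exposes the captured groups. A small sketch, assuming an environment that supports matchAll; the function name getHrefValues is made up for the example.

```javascript
// Hypothetical helper: collect the first capture group of every match.
function getHrefValues(html) {
  const re = /href="(.*?)"/g; // matchAll requires the global flag
  // Each item yielded by matchAll is a match array; index 1 is the captured value.
  return Array.from(html.matchAll(re), (m) => m[1]);
}

console.log(getHrefValues('<a href="http://example.com">x</a> <a href="/b">y</a>'));
```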

Just another idea.
You can try something like this function:
function getHrefs(inputString) {
var out = [];
inputString.replace(/\bhref\b=['"]([^'"]+)['"]/gi, function(result, backreference) {
out.push(backreference);
return '';
});
return out;
}
Improved solution (much shorter):
function getHrefs(inputString) {
return (inputString.match(/\bhref\b=['"][^'"]+(?=['"])/gi) || []).map(s => s.replace(/^href=["']/,""));
}
Edit:
There is another option: exec. But with exec you will need a loop to get all matches (if you need them).

You can use regex lookbehinds to check if the "href=" is there without actually including it in the match.
For example, the regex (?<=href=)example\.com applied to href=example.com should only match example.com.
EDIT:
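This method only works in languages that support regex lookbehinds. JavaScript did not support this feature when this answer was written; lookbehind assertions were added in ES2018, so it works only in engines that implement them. (thanks to Georgi Naumov for pointing this out)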
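For engines that do implement ES2018 lookbehind assertions (current Node.js and recent browsers), the idea can be sketched like this. The function name is illustrative, and the pattern assumes double-quoted attribute values as in the question.

```javascript
// Hypothetical helper; requires an engine with lookbehind support.
function getHrefValuesWithLookbehind(html) {
  // (?<=href=") asserts that the match is preceded by href=" without consuming it,
  // so each match is just the attribute value.
  return html.match(/(?<=href=")[^"]*/g) || [];
}

console.log(getHrefValuesWithLookbehind('<a href="http://example.com">x</a>'));
```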

Regex to find urls with hashes and exclamation marks #! [duplicate]

I know this has been asked a thousand times before (apologies), but searching SO/Google etc I am yet to get a conclusive answer.
Basically, I need a JS function which when passed a string, identifies & extracts all URLs based on a regex, returning an array of all found. e.g:
function findUrls(searchText){
var regex=???
result= searchText.match(regex);
if(result){return result;}else{return false;}
}
The function should be able to detect and return any potential URLs. I am aware of the inherent difficulties/issues with this (closing parentheses etc.), so I have a feeling the process needs to be:
Split the string (searchText) into distinct sections starting/ending with either nothing, a space or a carriage return on either side, resulting in distinct content chunks, e.g. do a split.
For each content chunk that results from the split, see whether it fits the logic for a URL of any construction, namely, does it contain a period immediately followed by text (the one constant rule for qualifying a potential URL).
The regex should see whether the period is immediately followed by other text, of the type allowable for a tld, directory structure & query string, and preceded by text of the allowable type for a URL.
I am aware false positives may result; however, any returned values will then be checked with a call to the URL itself, so this can be ignored. The other functions I have found often don't return the URL's query string too, if present.
From a block of text, the function should thus be able to return any type of URL, even if it means identifying will.i.am as a valid one!
eg. http://www.google.com, google.com, www.google.com, http://google.com,
ftp.google.com, https:// etc...and any derivation thereof with a query string
should be returned...
Many thanks, and apologies again if this exists elsewhere on SO, but my searches haven't returned it.
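The split-then-test process described in the question could be sketched roughly as below. This is a deliberately permissive illustration, not a vetted URL matcher: the function name findUrlCandidates, the punctuation that gets trimmed, and the pattern itself are all assumptions, and false positives such as will.i.am are expected. Unlike the stub in the question, it returns an empty array rather than false when nothing is found.

```javascript
// Hypothetical sketch of the "split on whitespace, then test each chunk" idea.
function findUrlCandidates(searchText) {
  // Optional scheme, then dot-separated labels, then an optional path/query/hash.
  const looksLikeUrl =
    /^(?:[a-z][a-z0-9+.-]*:\/\/)?[^\s.\/?#]+(?:\.[^\s.\/?#]+)+(?:[\/?#]\S*)?$/i;
  return searchText
    .split(/\s+/)
    // Strip wrapping quotes/brackets and trailing sentence punctuation.
    .map((chunk) => chunk.replace(/^[("'<]+|[)"'>.,;!]+$/g, ''))
    .filter((chunk) => looksLikeUrl.test(chunk));
}

console.log(findUrlCandidates('See http://www.google.com, google.com and will.i.am today.'));
```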

I just use URI.js -- makes it easy.
var source = "Hello www.example.com,\n"
+ "http://google.com is a search engine, like http://www.bing.com\n"
+ "http://exämple.org/foo.html?baz=la#bumm is an IDN URL,\n"
+ "http://123.123.123.123/foo.html is IPv4 and "
+ "http://fe80:0000:0000:0000:0204:61ff:fe9d:f156/foobar.html is IPv6.\n"
+ "links can also be in parens (http://example.org) "
+ "or quotes »http://example.org«.";
var result = URI.withinString(source, function(url) {
return "<a>" + url + "</a>";
});
/* result is:
Hello <a>www.example.com</a>,
<a>http://google.com</a> is a search engine, like <a>http://www.bing.com</a>
<a>http://exämple.org/foo.html?baz=la#bumm</a> is an IDN URL,
<a>http://123.123.123.123/foo.html</a> is IPv4 and <a>http://fe80:0000:0000:0000:0204:61ff:fe9d:f156/foobar.html</a> is IPv6.
links can also be in parens (<a>http://example.org</a>) or quotes »<a>http://example.org</a>«.
*/
https://github.com/medialize/URI.js
http://medialize.github.io/URI.js/

You could use the regex from URI.js:
// gruber revised expression - http://rodneyrehm.de/t/url-regex.html
var uri_pattern = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/ig;
String#match and/or String#replace may help…

The following regular expression extracts URLs (including the query string) from a string and returns them as an array:
var url = "asdasdla hakjsdh aaskjdh https://www.google.com/search?q=add+a+element+to+dom+tree&oq=add+a+element+to+dom+tree&aqs=chrome..69i57.7462j1j1&sourceid=chrome&ie=UTF-8 askndajk nakjsdn aksjdnakjsdnkjsn";
var matches = url.match(/\bhttps?::\/\/\S+/gi) || url.match(/\bhttps?:\/\/\S+/gi);
Output:
["https://www.google.com/search?q=format+to+6+digir&…s=chrome..69i57.5983j1j1&sourceid=chrome&ie=UTF-8"]
Note:
This handles both http:// with a single colon and http::// with a double colon in the string, and likewise for https, so it's safe for you to use. :)

Try this:
var expression = /[-a-zA-Z0-9@:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/gi;
You could use this website to test the regexp: http://gskinner.com/RegExr/

In UIPath Studio the following built-in regex rule has been defined:
/(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)(?:\([-a-zA-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-a-zA-Z0-9+&@#\/%=~_|$?!:,.])*(?:\([-a-zA-Z0-9+&@#\/%=~_|$?!:,.]*\)|[a-zA-Z0-9+&@#\/%=~_|$])/

Regex to detect a string that contains a URL or file extension

I'm trying to create a small script that detects whether the string input is either:
1) a URL (which will hold a filename): 'http://ajax.googleapis.com/html5shiv.js'
2) just a filename: 'html5shiv.js'
So far I've found this but I think it just checks the URL and file extension. Is there an easy way to make it so it uses an 'or' check? I'm not very experienced with RegExp.
var myRegExp = /[^\\]*\.(\w+)$/i;
Thank you in advance.

How about this regex?
(\.js)$
it checks the end of the line if it has a .js on it.
$ denotes end of line.
tested here.

Basically, to use 'OR' in regex, simply use the 'pipe' delimiter.
(aaa|bbb)
will match
aaa
or
bbb
For regex to match a url, I'd suggest the following:
\w+://[\w\._~:/?#\[\]@!$&'()*+,;=%]*
This is based on the allowed character set for a url.
For the file, what's your definition of a filename?
If you want to search for strings, that match "(at least) one to many non-fullstop characters, followed by a fullstop, followed by (at least) one to many non-fullstop characters", I'd suggest the following regex:
[^\.]+\.[^\.]+
And altogether:
(\w+://[\w\._~:/?#\[\]@!$&'()*+,;=%]*|[^\.]+\.[^\.]+)
Here's an example of working (in javascript): jsfiddle
You can test it out regex online here: http://gskinner.com/RegExr/

If it is for the purpose of flow control you can do the following:
var test = "http://ajax.googleapis.com/html5shiv.js";
// to recognize http & https
var regex = /^https?:\/\/.*/i;
var result = regex.exec(test);
if (result == null){
// no URL found code
} else {
// URL found code
}
For the purpose of capturing the file name you could use:
var test = "http://ajax.googleapis.com/html5shiv.js";
var regex = /(\w+\.\w+)$/i;
var filename = regex.exec(test);

Yes, you can use the alternation operator |. Be careful, though, because its priority is very low. Lower than sequencing. You will need to write things like /(cat)|(dog)/.
It's very hard to understand what you exactly want with so few use/test cases, but
(http://[a-zA-Z0-9\./]+)|([a-zA-Z0-9\.]+)
should give you a starting point.
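To make the precedence point concrete, here is one way the alternation might be wrapped so the ^ and $ anchors apply to both branches. The function name classifyInput, the return labels, and the exact sub-patterns are illustrative assumptions rather than a complete URL or file name grammar.

```javascript
// Hypothetical helper: reports which branch of the alternation matched.
function classifyInput(input) {
  // (?: ... | ... ) groups the alternatives so ^ and $ apply to both.
  // Group 1: http/https URL. Group 2: bare file name with an extension.
  const pattern = /^(?:(https?:\/\/\S+)|([^\s\/\\]+\.\w+))$/i;
  const match = pattern.exec(input);
  if (match === null) return 'neither';
  // A capture group that did not take part in the match is undefined.
  return match[1] !== undefined ? 'url' : 'filename';
}

console.log(classifyInput('http://ajax.googleapis.com/html5shiv.js'));
console.log(classifyInput('html5shiv.js'));
```

Without the enclosing group, /^a|b$/ would read as "starts with a" or "ends with b", which is the low-priority behaviour the answer warns about.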

If it's a URL, strip it down to the last part and treat it the same way as "just a filename".
function isFile(fileOrUrl) {
// This will return everything after the last '/'; if there's
// no forward slash in the string, the unmodified string is used
var filename = fileOrUrl.split('/').pop();
return (/.+\..+/).test(filename);
}

Try this:
var ajx = 'http://ajax.googleapis.com/html5shiv.js';
function isURL(str){
return /((\/\w+)|(^\w+))\.\w{2,}$/.test(str);
}
console.log(isURL(ajx));

Have a look at this (requires no regex at all):
var filename = string.indexOf('/') == -1
? string
: string.split('/').slice(-1)[0];

Here is the program!
<script>
var url="Home/this/example/file.js";
var condition=0;
var result="";
for(var i=url.length-1; i>=0 && condition<2 ;i--)
{
if(url[i]!="/" && url[i]!="."){result= (condition==1)? (url[i]+result):(result);}
else{condition++;}
}
document.write(result);
</script>
