Extract part of URL with Regex - javascript

I have a URL that will always be in one of these formats:
http://domain.tld/foo/bar/boo
http://www.domain.tld/foo/bar/boo
http://sub.domain.tld/foo/bar/boo
http://www.sub.domain.tld/foo/bar/boo
I'd like to use Regex to extract bar from the url, regardless of the format.
I am using JavaScript.
I tried to break the url up using something like
var x = 'http://domain.tld/foo/bar/boo'
x.split(/^((http[s]?):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(.*)?(#[\w\-]+)?$/g)
but this doesn't really work, nor does it help: I get an array of items when I really just need the value at bar.

var el = document.createElement('a');
el.href = "http://www.domain.tld/foo/bar/boo";
var importantPart = el.pathname.split('/')[2];
console.log(importantPart);
fiddle: https://jsfiddle.net/dcyo4ph5/1/
sources: https://css-tricks.com/snippets/javascript/get-url-and-url-parts-in-javascript/ & JavaScript - Get Portion of URL Path
I guess this doesn't use regex. So that's maybe not what you want.

I'll list both a regex and a non-regex way. Surprisingly, the regex way seems shorter.
Regex Way
The regex to find bar and boo is /.*\/(.*)\/(.*)$/, which is short and does what you need here.
Let's put it into practice:
const params = "http://www.sub.domain.tld/foo/bar/boo".match(/.*\/(.*)\/(.*)$/)
This results in,
params;
["http://www.sub.domain.tld/foo/bar/boo","bar","boo"]
Just access them as params[1] (bar) and params[2] (boo); params[0] is the whole match.
Regex Explanation:
Extended Version:
The regex can be extended to also accept a trailing slash, as in /bar/boo/, like this (the second group is lazy so the trailing slash stays out of it):
.*\/\b(.*)\/\b(.*?)(\/?)$
Which means,
and it can be further extended, but let's keep it simple for now.
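As a hedged sketch of one such variation: a pattern built on [^\/]+ instead of .* keeps each capture inside a single path segment, so an optional trailing slash never ends up in a group. The helper name lastTwoSegments is made up for illustration, and query strings or fragments are assumed to be absent.

```javascript
// Hypothetical helper: returns the last two path segments of a URL-like string.
function lastTwoSegments(url) {
  // ([^\/]+) captures one segment (no slashes); \/?$ tolerates one trailing slash.
  const m = url.match(/([^\/]+)\/([^\/]+)\/?$/);
  return m ? { secondLast: m[1], last: m[2] } : null;
}

console.log(lastTwoSegments("http://www.sub.domain.tld/foo/bar/boo"));
// { secondLast: "bar", last: "boo" }
console.log(lastTwoSegments("http://domain.tld/foo/bar/boo/"));
// { secondLast: "bar", last: "boo" }
```

Returning null on no match avoids the TypeError that indexing a failed match() result would throw.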
Non Regex Way
Use native methods like .split(),
function getLastParam(str, targetIndex = 1) {
const arr = str
.split("/") // split by slash
.filter(e=>e); // remove empty array elements
return arr[arr.length - targetIndex];
}
Let's test it out quickly for different cases
[
"http://domain.tld/foo/bar/boo",
"http://www.domain.tld/foo/bar/boo",
"http://sub.domain.tld/foo/bar/boo",
"http://www.sub.domain.tld/foo/bar/boo",
"http://domain.tld/foo/bar/boo/",
".../bar/boo"
].map(e => {
console.log({ input: e, output: getLastParam(e, 1) });
});
This yields the following:
{input: "http://domain.tld/foo/bar/boo", output: "boo"}
{input: "http://www.domain.tld/foo/bar/boo", output: "boo"}
{input: "http://sub.domain.tld/foo/bar/boo", output: "boo"}
{input: "http://www.sub.domain.tld/foo/bar/boo", output: "boo"}
{input: "http://domain.tld/foo/bar/boo/", output: "boo"}
{input: ".../bar/boo", output: "boo"}
If you want bar, then use 2 for targetIndex instead. It will get the second last. In which case, getLastParam(str, 2) would result in bar.
Speed stuff
Here is the small benchmark stuff, http://jsbench.github.io/#a6bcecaa60b7d668636f8f760db34483
getLastParamNormal: 5,203,853 ops/sec
getLastParamRegex: 6,619,590 ops/sec
Well, it doesn't matter. But nonetheless, it's interesting.

Split and slice will do that as simple as this, where split('/') creates an array and slice(-2)[0] will pick the first [0] of the last two (-2).
With replace(/\/$/, "") you get rid of any trailing slash (shown in the 4th sample below).
Stack snippet
var x = 'http://domain.tld/foo/bar/boo'
console.log( x.split('/').slice(-2)[0] );
var x = 'http://www.sub.domain.tld/foo/bar/boo'
console.log( x.split('/').slice(-2)[0] );
var x = 'http://www.domain.tld/foo/bar/boo'
console.log( x.split('/').slice(-2)[0] );
// and this one will trim trailing slash
var x = 'http://www.domain.tld/foo/bar/boo/'
console.log( x.replace(/\/$/, "").split('/').slice(-2)[0] );
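Or just reverse the array and take the 2nd item ([1], as arrays are zero-based). If there is a trailing slash, trim it first, otherwise [1] is the last segment rather than the second last:
var x = 'http://www.domain.tld/foo/bar/boo/'
console.log( x.replace(/\/$/, "").split('/').reverse()[1] );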

You don't need regex. Anchor elements have an API that breaks down the URL for you. You can then split the pathname to get the path
function parse(path) {
let a = document.createElement('a');
a.href = path;
return a.pathname.split('/')[2];
}
console.log(parse('http://domain.tld/foo/bar/boo'));
console.log(parse('http://www.domain.tld/foo/bar/boo'));
console.log(parse('http://sub.domain.tld/foo/bar/boo'));
console.log(parse('http://www.sub.domain.tld/foo/bar/boo'));
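The same idea also works without creating a DOM element, through the standard URL constructor (available in modern browsers and in Node.js). This is only a sketch: the helper name pathSegment is an assumption, and the index here counts non-empty segments from zero, so bar is at index 1 rather than 2.

```javascript
// Hypothetical helper: pick one path segment by position using the URL API.
function pathSegment(href, index) {
  // new URL() throws a TypeError if href cannot be parsed as an absolute URL.
  const { pathname } = new URL(href);
  // "/foo/bar/boo/" -> ["foo", "bar", "boo"]; filter drops the empty pieces.
  const segments = pathname.split('/').filter(Boolean);
  return segments[index];
}

console.log(pathSegment('http://domain.tld/foo/bar/boo', 1)); // "bar"
console.log(pathSegment('http://www.sub.domain.tld/foo/bar/boo/?q=1#top', 1)); // "bar"
```

Because pathname excludes the query string and the fragment, those parts cannot leak into the result.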

Related

JavaScript find string in string between 2 words

I have strings as below.
/edit/120/test or /edit/120/test1, ... etc
I want to extract 120 from strings like these. How can I do that with JavaScript?
As David said, there are a few ways to do this. Given that the string you are looking for is always between "/edit/" and "/" followed by any text, here is how I would do it :
var target = "/edit/120/test" // for example
var result = target.substring(6) // removes the first 6 characters ("/edit/")
.split('/') // produces the following array : ["120", "test"]
[0] // the first element of the array ("120")
Actually, let's be honest, I'm too lazy for that. Here's how I'd really do it :
var target = "/edit/120/test" // for example
var result = target.split('/')[2]; // takes the 3rd element of the following array : ['', 'edit', '120', 'test']
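If a regex is preferred after all, a minimal sketch could look like the following. It assumes the value after "/edit/" is always made of digits, which the question suggests but does not state; the helper name editId is made up for illustration.

```javascript
// Hypothetical helper: returns the ID that follows "/edit/", or null if absent.
function editId(path) {
  // ^\/edit\/ anchors on the literal prefix; (\d+) captures the digits after it;
  // (?:\/|$) requires the ID to be followed by a slash or the end of the string.
  const m = /^\/edit\/(\d+)(?:\/|$)/.exec(path);
  return m ? m[1] : null;
}

console.log(editId("/edit/120/test"));  // "120"
console.log(editId("/edit/120/test1")); // "120"
console.log(editId("/view/120/test"));  // null
```

Unlike split('/')[2], this version also checks that the path really starts with /edit/.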

Regex working in regex tester but not in JS (wrong matches)

This is actually the first time I've encountered this problem.
I'm trying to parse a string for key-value pairs where the separator can be one of several characters. It works fine in every regex tester but not in my current JS project. I already found out that JS regex works differently than, for example, PHP's, but I couldn't figure out what to change in mine.
My regex is the following:
[(\?|\&|\#|\;)]([^=]+)\=([^\&\#\;]+)
it should match:
#foo=bar#site=test
MATCH 1
1. [1-4] `foo`
2. [5-8] `bar`
MATCH 2
1. [9-13] `site`
2. [14-18] `test`
and the JS is:
'#foo=bar#site=test'.match(/[(\?|\&|\#|\;)]([^=]+)\=([^\&\#\;]+)/g);
Result:
["#foo=bar", "#site=test"]
For me it looks like the grouping is not working properly.
Is there a way around this?
With the g flag, String#match returns only the full matches, without the capture groups. To get the groups, loop through regex.exec instead:
var match;
while (match = regex.exec(str)) {
// Use each result
}
If (like me) that assignment in the condition bothers you, you can use a !! to make it clearly a test:
var match;
while (!!(match = regex.exec(str))) {
// Use each result
}
Example:
var regex = /[(\?|\&|\#|\;)]([^=]+)\=([^\&\#\;]+)/g;
var str = '#foo=bar#site=test';
var match;
while (!!(match = regex.exec(str))) {
console.log("result", match);
}
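In environments that support ES2020, String.prototype.matchAll gives the same per-match arrays without the manual loop. The sketch below reduces the character class to the four separators the question seems to intend (an assumption; the original class also matches "(", "|" and ")").

```javascript
// Same idea as the question's pattern, with the separator class simplified.
const regex = /[?&#;]([^=]+)=([^&#;]+)/g;
const str = '#foo=bar#site=test';

// matchAll requires the g flag and yields one match array per hit,
// with the capture groups at indexes 1 and 2.
const pairs = Array.from(str.matchAll(regex), (m) => [m[1], m[2]]);

console.log(pairs); // [["foo", "bar"], ["site", "test"]]
```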
I wouldn't rely on a complex regex that is hard to read and can fail in surprising ways; I'd use simple functions to split the string instead:
var str = '#foo=bar#site=test'
// split it by # and do a loop
str.split('#').forEach(function(splitted){
// split by =
var splitted2 = splitted.split('=');
if(splitted2.length === 2) {
// here you have
// splitted2[0] = foo
// splitted2[1] = bar
// and in the next iteration
// splitted2[0] = site
// splitted2[1] = test
}
});
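The split-based approach can be sketched as a small function that collects the pairs into an object. The name parsePairs is hypothetical, and splitting on all four separators from the question (? & # ;) is an assumption about what the input may contain.

```javascript
// Hypothetical helper: turn "#foo=bar#site=test" into { foo: "bar", site: "test" }.
function parsePairs(str) {
  const result = {};
  // Split on any of the separators from the question: ? & # ;
  for (const part of str.split(/[?&#;]/)) {
    const eq = part.indexOf('=');
    if (eq <= 0) continue; // skip empty pieces and pieces without a key
    // Everything before the first "=" is the key, everything after is the value.
    result[part.slice(0, eq)] = part.slice(eq + 1);
  }
  return result;
}

console.log(parsePairs('#foo=bar#site=test')); // { foo: "bar", site: "test" }
```

If a key occurs more than once, the last value wins in this sketch.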

How do I select the second last word from each string in an array in JavaScript?

Say I have a list of strings like:
foo = ["asdfas ad sfa", "asdfa dfas fasd", "adf adfasd sdfasf adfdf"];
Now I want a mapping that maps this array into a list of the second last word of each string:
bar = ["ad", "dfas", "sdfasf"];
I know I can find them by using the regex /(\w+)\s\w+$/, but then how do I use the found result to perform the mapping?
I'd like to use regex replacement like
bar = foo.map(function(d){return d.replace(/^.*\s(\w+)\s\w+$/,"$1")});
Update: After reading the comment below, I think this one might be better:
bar = foo.map(function(d){return d.match(/(\w+)\s\w+$/)[1]});
It throws an error when there's no second-last word, which satisfies my need. And of course, if you want it to leave the string unchanged when no second-last word is found, try this:
bar = foo.map(function(d){return d.replace(/.*?(\w+)\s\w+$/,"$1")});
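A third option, sketched here without regex capture groups, is to return null when a string has fewer than two words, so the mapping neither throws nor passes the input through. The helper name secondLastWord and the choice of null as the "missing" value are assumptions for illustration.

```javascript
// Hypothetical helper: second-to-last whitespace-separated word, or null if none.
function secondLastWord(s) {
  // trim() avoids empty first/last entries caused by surrounding spaces.
  const words = s.trim().split(/\s+/);
  return words.length >= 2 ? words[words.length - 2] : null;
}

const foo = ["asdfas ad sfa", "asdfa dfas fasd", "adf adfasd sdfasf adfdf", "single"];
const bar = foo.map(secondLastWord);

console.log(bar); // ["ad", "dfas", "sdfasf", null]
```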
that's right, map is what you need
foo = ["asdfas ad sfa", "asdfa dfas", "adf adfasd sdfasf adfdf"];
foo.map(function(str) {
var words = str.split(" "), total = words.length;
if (total>2) this.push(words[total-2]);
}, bar=[]);
alert(bar);
EDIT: as @jcsanyi mentioned, my previous code returned empty elements when a string contains fewer than 3 words, so I found a solution and modified it again.
foo is still global (it's not a bug), and the second element now has only two words, to check that case.
Try this. I used the Array reduce method from JavaScript 1.8:
[
"asdfas ad sfa",
"asdfa dfas fasd",
"adf adfasd sdfasf adfdf"
].reduce( function ( m , r ){
var splitted = r.split(/\W+/g);
if ( splitted.length > 1 ){
m.push( splitted[splitted.length-2] );
}
return m;
},[]);
try this:
bar = foo.map(function(el){
return el.match(/^.*?(\w+)\s\w+$/)[1];
});
or with jquery
bar = $.map(foo, function(el, i){
return el.match(/^.*?(\w+)\s\w+$/)[1];
});
Try this to get the second-last character of a string:
var str = "HELLO WORLD";
var n = str.charAt(str.length-2);
Result is: L

How to remove the last matched regex pattern in javascript

I have a text which goes like this...
var string = '~a=123~b=234~c=345~b=456'
I need to extract the string such that it splits into
['~a=123~b=234~c=345','']
That is, I need to split the string with /b=.*/ pattern but it should match the last found pattern. How to achieve this using RegEx?
Note: The numbers present after the equal is randomly generated.
Edit:
The above one was just an example. I did not make the question clear I guess.
Generalized String being...
<word1>=<random_alphanumeric_word>~<word2>=<random_alphanumeric_word>..~..~..<word2>=<random_alphanumeric_word>
Every part has a random length, every <wordN> is alphabetic, and the overall string length is not fixed. The only known text is <word2>, which is why I wanted a regex for it, with the pattern /<word2>=.*/
This doesn't sound like a job for regexen considering that you want to extract a specific piece. Instead, you can just use lastIndexOf to split the string in two:
var lio = str.lastIndexOf('b=');
var arr = [];
arr[0] = str.substr(0, lio);
arr[1] = str.substr(lio);
http://jsfiddle.net/NJn6j/
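The lastIndexOf idea can be wrapped in a small function, sketched below. It searches for "~<key>=" rather than "<key>=" so that the leading ~ goes with the removed pair and a key such as "ab" is not mistaken for "b"; this assumes the pair to cut is preceded by ~, as in the example string. The name splitAtLast is hypothetical.

```javascript
// Hypothetical helper: split str just before the last "~<key>=" pair.
function splitAtLast(str, key) {
  const marker = '~' + key + '=';
  const i = str.lastIndexOf(marker);
  if (i === -1) return [str, '']; // key not present: nothing to cut off
  // Everything before the last pair, then the pair itself up to the end.
  return [str.slice(0, i), str.slice(i)];
}

console.log(splitAtLast('~a=123~b=234~c=345~b=456', 'b'));
// ["~a=123~b=234~c=345", "~b=456"]
```

The first element is the string with the last pair removed; the second is whatever was cut off.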
I don't think I'd personally use a regex for this type of problem, but you can extract the last option pair with a regex like this:
var str = '~a=123~b=234~c=345~b=456';
var matches = str.match(/^(.*)~([^=]+=[^=]+)$/);
// matches[1] = "~a=123~b=234~c=345"
// matches[2] = "b=456"
Demo: http://jsfiddle.net/jfriend00/SGMRC/
This assumes the format is (~, alphanumeric name, =, numbers) repeated an arbitrary number of times. The most important assumption is that ~ appears exactly once per name-value pair and never appears in a name.
You can remove the last token by a simple replacement:
str.replace(/(.*)~.*/, '$1')
This works by using the greedy property of * to force it to match the last ~ in the input.
This can also be achieved with lastIndexOf, since you only need to know the index of the last ~:
str.substring(0, (str.lastIndexOf('~') + 1 || str.length + 1) - 1)
(Well, I don't know if the code above is good JS or not... I would rather write in a few lines. The above is just for showing one-liner solution).
A RegExp that gives a result you could use is:
string.match(/[a-z]*?=(.*?((?=~)|$))/gi);
// ["a=123", "b=234", "c=345", "b=456"]
But in your case the simplest solution is to split the string before extracting the content:
var results = string.split('~'); // ["", "a=123", "b=234", "c=345", "b=456"]
Now it is easy to extract each key and value and add them to an object:
var myObj = {};
results.forEach(function (item) {
if(item) {
var r = item.split('=');
if (!myObj[r[0]]) {
myObj[r[0]] = [r[1]];
} else {
myObj[r[0]].push(r[1]);
}
}
});
console.log(myObj);
Object:
a: ["123"]
b: ["234", "456"]
c: ["345"]
(?=.*(~b=[^~]*))\1
will get it done in one match, but if there are duplicate entries it will go to the first. Performance also isn't great and if you string.replace it will destroy all duplicates. It would pass your example, but against '~a=123~b=234~c=345~b=234' it would go to the first 'b=234'.
.*(~b=[^~]*)
will run a lot faster, but it requires another step because the match comes out in a group:
var re = /.*(~b=[^~]*)/.exec(string);
var result = re[1]; //~b=234
var array = string.split(re[1]);
This method will also have the same problem with exact duplicates. Another option is:
var regex = /.*(~b=[^~]*)/g;
var re = regex.exec(string);
var result = re[1];
// if you want an array from either side of the string:
var array = [string.slice(0, regex.lastIndex - re[1].length), string.slice(regex.lastIndex, string.length)];
This finds the exact location of the last match and removes it. Because the captured group ends exactly at regex.lastIndex, regex.lastIndex - re[1].length is the index where the removed part starts.

Javascript split to split string in 2 parts irrespective of number of split characters present in string

I want to split a string in JavaScript into 2 parts using the split function.
For example, I have the string:
str='123&345&678&910'
If I use JavaScript's split, it splits it into 4 parts.
But I need only 2 parts, split at the first '&' it encounters.
In Perl's split, if I use:
($fir, $sec) = split(/&/,str,2)
it splits str into 2 parts, but JavaScript only gives me:
str.split(/&/, 2);
fir=123
sec=345
I want sec to be:
sec=345&678&910
How can I do this in JavaScript?
var subStr = string.substring(string.indexOf('&') + 1);
View this similar question for other answers:
split string only on first instance of specified character
You can use match instead of split:
str='123&345&678&910';
splited = str.match(/^([^&]*?)&(.*)$/);
splited.shift();
console.log(splited);
output:
["123", "345&678&910"]
You can stick with split by using the following trick:
var str='123&345&678&910',
splitted = str.split( '&' ),
// shift() removes the first item and returns it
first = splitted.shift();
console.log( first ); // "123"
console.log( splitted.join( '&' ) ); // "345&678&910"
I wrote this function:
function splitter(mystring, mysplitter) {
var myreturn = [],
myindexplusone = mystring.indexOf(mysplitter) + 1;
if (myindexplusone) {
myreturn[0] = mystring.split(mysplitter, 1)[0];
myreturn[1] = mystring.substring(myindexplusone);
}
return myreturn;
}
var str = splitter("hello-world-this-is-a-test", "-");
console.log(str.join("<br>"));
//hello
//world-this-is-a-test
The output will be either an empty array (no match) or an array with 2 elements (the part before the separator and everything after it).
I have that:
var str='123&345&678&910';
str.split('&',1).concat( str.split('&').slice(1).join('&') );
//["123", "345&678&910"]
str.split('&',2).concat( str.split('&').slice(2).join('&') );
//["123", "345", "678&910"];
for convenience:
String.prototype.mySplit = function( sep, chunks) {
chunks = chunks|=0 &&chunks>0?chunks-1:0;
return this.split( sep, chunks )
.concat(
chunks?this.split( sep ).slice( chunks ).join( sep ):[]
);
}
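For readers who prefer not to extend String.prototype, here is a hedged sketch of a standalone function with Perl-like semantics, where limit is the maximum number of pieces returned (so 2 gives the two parts asked for in the question). The name splitLimit is made up, and sep is assumed to be a plain string, since it is reused by join.

```javascript
// Hypothetical helper: split into at most `limit` pieces, keeping the
// remainder (separators included) in the last piece, like Perl's split.
function splitLimit(str, sep, limit) {
  const parts = str.split(sep);
  // Nothing to merge if there is no usable limit or few enough pieces already.
  if (limit < 1 || parts.length <= limit) return parts;
  const head = parts.slice(0, limit - 1);
  // Glue the remaining pieces back together with the same separator.
  head.push(parts.slice(limit - 1).join(sep));
  return head;
}

console.log(splitLimit('123&345&678&910', '&', 2)); // ["123", "345&678&910"]
console.log(splitLimit('123&345&678&910', '&', 3)); // ["123", "345", "678&910"]
```

Note that the limit here counts the resulting pieces, whereas mySplit above takes the number of splits to perform.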
What about the use of split() and replace()?:
Given we have that string str='123&345&678&910' We can do
var first = str.split("&",1); //gets the first word
var second = str.replace(first[0]+"&", ""); //removes the first word and the ampersand
Please note that split() returns an array, which is why taking the first element with first[0] is recommended. It also worked for me without the index, i.e. first+"&", because the one-element array is converted to a string.
Feel free to replace the "&" with the string you need to split with.
Hope this helps :)
