Extracting url path using regexp - javascript

Scenario
I want to extract the path string from the document.location, excluding the leading slash.
So for example, if the url is:
http://stackoverflow.com/questions/ask
I would get:
questions/ask
This should be straightforward:
/* group everything after the leading slash */
var re = /\/(.+)/gi;
var matches = document.location.pathname.match(re);
console.log(matches[0]);
But if I run this snippet in the firebug console, I still get the leading slash.
I have already tested the regexp, and the regexp engine correctly extracts the group.
Question
How do I properly get the group 1 string?

You don't really need regular expressions if you just want to get pathname without leading slash. Since location.pathname always starts with / you can simply take the substring from the first index:
document.location.pathname.slice(1) // .substr(1) also works, but substr is a legacy method
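A minimal sketch of this approach that also runs outside a browser. The sample URL is the one from the question; parsing it with the standard URL constructor stands in for document.location here, and the helper name pathWithoutLeadingSlash is only illustrative.

```javascript
// Hypothetical helper: returns the path of a URL without its leading slash.
function pathWithoutLeadingSlash(href) {
  // For http(s) URLs, pathname always starts with "/".
  const pathname = new URL(href).pathname;
  return pathname.slice(1);
}

const path = pathWithoutLeadingSlash('http://stackoverflow.com/questions/ask');
console.log(path); // "questions/ask"
```

For a URL with no path at all, such as http://stackoverflow.com/, the result is an empty string.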

Do you mean the trailing or the leading slash? From your post it looks like the leading slash.
document.location.pathname.replace(/^\//,"")
By the way, your regexp is right; you just need to remove the g flag (the i flag is harmless here, since the pattern contains no letters) and read matches[1] rather than matches[0]. matches[0] is the whole string matched by the regexp, while matches[1] is the part captured by the first group (the parentheses in the regexp). With the g flag, match() returns only the full matches and no groups.
var matches = document.location.pathname.match(/\/(.+)/);
console.log(matches); // ["/questions/ask", "questions/ask"]
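The difference the g flag makes can be seen side by side in a small sketch. The pathname constant is a sample value standing in for document.location.pathname, so the snippet does not need a browser.

```javascript
// Sample value standing in for document.location.pathname.
const pathname = '/questions/ask';

// With the g flag, match() returns every full match and drops the groups.
const withGlobal = pathname.match(/\/(.+)/g);
console.log(withGlobal); // ["/questions/ask"]

// Without g, index 0 is the full match and index 1 is the first group.
const withoutGlobal = pathname.match(/\/(.+)/);
console.log(withoutGlobal[0]); // "/questions/ask"
console.log(withoutGlobal[1]); // "questions/ask"
```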

Using regex you can do:
var m = 'http://stackoverflow.com/questions/ask'.match(/\/{2}[^\/]+(\/.+)/);
console.log(m[1]); // "/questions/ask"

Related

Trying to split string with escaped and non-escaped delimiter

I have a string with the form of a b/c\/d\/e/f. I'm trying to split the string on the non-escaped forward slashes.
I have this regex so far (?:[^\\/])/. However it consumes the last character preceding the /. So if I'm doing a replace with "#" instead of a split, the string looks like a #c\/d\/#f. In the case of the split, I get the strings separated the same with the last character being consumed.
I tried using a non capturing group but that doesn't seem to do the trick either. Doing this in javascript.
You can use this regex in JavaScript to match each segment that ends at an unescaped /, ignoring escaped slashes such as \/. It also handles the case where the \ is itself escaped as \\.
/[^\\\/]*(?:\\.[^\\\/]*)*(?=\/|$)/gm
const regex = /[^\\\/]*(?:\\.[^\\\/]*)*(?=\/|$)/gm
const str = `\\\\\\\\\\\\/a b/c\\/d\\\\/e\\\\\\/f1/2\\\\\\\\\\\\\\/23/99`;
let m = str.match(regex).filter(Boolean)
console.log(m)
.filter(Boolean) is used to filter out empty matches.
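Applied to the string from the question, the same regex can be wrapped in a small function. This is only a sketch; the names unescapedParts and splitOnUnescapedSlash are made up for illustration.

```javascript
// The regex from the answer above: a run of non-slash, non-backslash
// characters and escaped pairs, followed by a slash or the end of input.
const unescapedParts = /[^\\\/]*(?:\\.[^\\\/]*)*(?=\/|$)/gm;

// Hypothetical helper: returns the segments between unescaped slashes.
function splitOnUnescapedSlash(input) {
  const matches = input.match(unescapedParts) || [];
  // Drop the empty matches found at each delimiter and at the end.
  return matches.filter(Boolean);
}

// In source form the backslashes are doubled; the actual string is a b/c\/d\/e/f
const parts = splitOnUnescapedSlash('a b/c\\/d\\/e/f');
console.log(parts.length); // 3
console.log(parts.join(' | ')); // a b | c\/d\/e | f
```

The escaped slashes stay in the output segments; removing the backslashes afterwards would be a separate step.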

Javascript: Remove trailing chars from string if they are non-numeric

I am passing codes to an API. These codes are alphanumeric, like this one: M84.534D
I just found out that the API does not use the trailing letters. In other words, the API is expecting M84.534, no letter D at the end.
The problem I am having is that the format is not the same for the codes.
I may have M84.534DAC, or M84.534.
What I need to accomplish before sending the code is to remove any non-numeric characters from the end of the code, so in the examples:
M84.534D -> I need to pass M84.534
M84.534DAC -> I also need to pass M84.534
Is there any function or regex that will do that?
Thank you in advance to all.
You can use the regex below. It removes everything from the end of the string that is not a digit.
let code = 'M84.534DAC'
console.log(code.replace(/[^0-9]+?$/, ""));
[^0-9] matches any character that is not a digit
+? matches one or more times, as few as possible (lazy); the $ anchor still forces the match to extend to the end
$ matches the end of the string
Put together, it matches any run of non-digits at the end of the string and replaces it with nothing.
You could use the following expression:
\D*$
As in:
var somestring = "M84.534D".replace(/\D*$/, '');
console.log(somestring);
Explanation:
\D stands for not \d, the star * means zero or more times (greedily) and the $ anchors the expression to the end of the string.
Given your limited data sample, this simple regular expression does the trick. You just replace the match with an empty string.
I've used document.write just so we can see the results. You use this whatever way you want.
var testData = [
'M84.534D',
'M84.534DAC'
]
var regex = /\D+$/
testData.forEach((item) => {
var cleanValue = item.replace(regex, '')
document.write(cleanValue + '<br>')
})
RegEx breakdown:
\D = Anything that's not a digit
+ = One or more occurrences
$ = End of line/input
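The breakdown above can be checked against all three code shapes from the question, including one that is already clean. This is a sketch only; stripTrailingNonDigits is an illustrative name, not part of any API.

```javascript
// Hypothetical helper: removes any run of non-digit characters at the end.
function stripTrailingNonDigits(code) {
  return code.replace(/\D+$/, '');
}

const samples = ['M84.534D', 'M84.534DAC', 'M84.534'];
const cleaned = samples.map((code) => stripTrailingNonDigits(code));
console.log(cleaned); // ["M84.534", "M84.534", "M84.534"]
```

Note that a code consisting only of non-digits would be reduced to an empty string, so input of that shape may need a separate check.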

Use Regex to Replace the Starting substring

I need to replace the following statement:
page_url.replace("www.", "");
with an equivalent that uses a regex. Please let me know how to achieve this.
Logically, it should replace "www." with "" if and only if the string starts with www.
"www.test.com".replace(/^www\./, "") // returns "test.com"
or in long form:
var regex = /^www\./;
var stringToMatch = "www.test.com";
var result = stringToMatch.replace(regex, ""); // "test.com"
where /^www\./ defines the regex object. It is made up of:
/ start of the regex
^ start of string, meaning the match must appear at the beginning
www match the characters www
\. match the character . (it must be escaped with \ because in a regex an unescaped . matches any character)
/ end of the regex definition
If you want to play with the regex to see how it works, try it in this web regex tester: https://regex101.com/r/7ErXz8/2
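A short sketch of how the anchored regex differs from the original string replacement. The function name stripLeadingWww and the sample host names are illustrative only.

```javascript
// Hypothetical helper: removes "www." only when it starts the string.
function stripLeadingWww(host) {
  return host.replace(/^www\./, '');
}

console.log(stripLeadingWww('www.test.com')); // "test.com"
console.log(stripLeadingWww('test.www.com')); // "test.www.com" (unchanged)

// The plain string version removes the first "www." wherever it occurs.
console.log('test.www.com'.replace('www.', '')); // "test.com"
```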

Javascript string split with regex

I am trying to split a string using a regular expression for links (urls).
The regex in question is
var regex = new RegExp('(?:^(?:(?:[a-z]+:)?//)(?:\S+(?::\S*)?#)?(?:localhost|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:[/?#]\S*)?$)')
If I do
regex.test("https://google.com"); // returns true
but doing
"Go to https://google.com".split(regex);
// returns ["Go to https://google.com"]
whereas I expect it to return
["Go to ", "https://google.com"]
Any idea what's going on here?
First of all, you're using a string literal to build your regex, which means that you have to escape your backslashes (since a backslash has a special meaning in strings, used for the line feed char \n for example):
var regex = new RegExp('(?:^(?:(?:[a-z]+:)?//)(?:\\S+(?::\\S*)?#)?(?:localhost|(?:(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)(?:\\.(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)*(?:\\.(?:[a-z\\u00a1-\\uffff]{2,})))(?::\\d{2,5})?(?:[/?#]\\S*)?$)');
Another solution is to use a regex literal, which JavaScript supports, but you then have to escape the forward slashes:
var regex = /(?:^(?:(?:[a-z]+:)?\/\/)(?:\S+(?::\S*)?#)?(?:localhost|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:[\/?#]\S*)?$)/;
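The escaping issue is easier to see with a much smaller pattern than the URL regex. The sketch below uses \d+ purely as a stand-in and compares the three ways of writing it.

```javascript
// In a string literal, "\d" is just "d", so the backslash is lost.
const unescaped = new RegExp('\d+');
// Doubling the backslash keeps it in the pattern.
const escaped = new RegExp('\\d+');
// A regex literal needs no doubling.
const literal = /\d+/;

console.log(unescaped.source); // "d+"
console.log(escaped.source === literal.source); // true

console.log(unescaped.test('2024')); // false
console.log(escaped.test('2024')); // true
console.log(literal.test('2024')); // true
```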
Then, your regex will try to match against the entire input due to the ^ and $ anchors. So if you remove them (or better, replace them with word boundaries \b), you'll be able to find URLs in a string for example:
var regex = /(?:\b(?:(?:[a-z]+:)?\/\/)(?:\S+(?::\S*)?#)?(?:localhost|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:[\/?#]\S*)?\b)/;
But the main point is that you're misunderstanding how split works. Given the string "hello world", if you split by space you end up with ["hello", "world"]: the space is gone, since it was the character used to split.
That is, if you split by the URL regex, the output array won't contain the URLs anymore. It seems to me that a lookahead could suit your needs:
var regex = /(?=(?:\b(?:(?:[a-z]+:)?\/\/)(?:\S+(?::\S*)?#)?(?:localhost|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:[\/?#]\S*)?\b))/;
"Go to https://google.com".split(regex) // ["Go to ", "https://google.com"]
The regex explained:
(?=(?:\b(?:(?:[a-z]+:)?//)(?:\S+(?::\S*)?#)?(?:localhost|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:[/?#]\S*)?\b))
When you split a string with a positive lookahead (?=content_of_lookahead), the string is split at every position between characters that is followed by the content of the lookahead. The lookahead itself consumes nothing, so no text is removed.
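To compare the behaviours, here is a sketch with a deliberately simplified URL pattern (https?:\/\/\S+ is an assumption made for brevity, not the full regex from the question). It shows a consuming split, a split with a capturing group, and a split on a lookahead.

```javascript
const text = 'Go to https://google.com';

// Plain pattern: the matched URL is consumed as the delimiter.
const consumed = text.split(/https?:\/\/\S+/);
console.log(consumed); // ["Go to ", ""]

// Capturing group: the delimiter is kept as its own array element.
const captured = text.split(/(https?:\/\/\S+)/);
console.log(captured); // ["Go to ", "https://google.com", ""]

// Lookahead: zero-width match, so nothing is removed from the string.
const lookahead = text.split(/(?=https?:\/\/)/);
console.log(lookahead); // ["Go to ", "https://google.com"]
```

The capturing-group form leaves a trailing empty string when the URL ends the input, which can be filtered out if it is not wanted.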
Take a look at 8 Regular Expressions You Should Know.
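To match a URL you can use the following regex. Note that it must be a regex literal (a plain string passed to split is treated as literal text), and that match, not split, returns the URL:
var regex = /(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w# \.-]*)*\/?$/;
"Go to https://google.com".match(regex)[0];
// returns "https://google.com"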
Hope this helps.

Regex produces different result in javascript

Why does this regex return an entirely different result in javascript as compared to an on-line regex tester, found at http://www.gskinner.com/RegExr/
var patt = new RegExp(/\D([0-9]*)/g);
"/144444455".match(patt);
The return in the console is:
["/144444455"]
While it does return the correct group in the regexr tester.
All I'm trying to do is extract the first amount inside a piece of text. Regardless if that text starts with a "/" or has a bunch of other useless information.
The regex does exactly what you tell it to:
\D matches a non-digit (in this case /)
[0-9]* matches a string of digits (144444455)
You will need to access the content of the first capturing group:
var match = patt.exec(subject);
if (match != null) {
    result = match[1];
}
Or simply drop the \D entirely - I'm not sure why you think you need it in the first place...
Then, you should probably remove the /g modifier if you only want to match the first number, not all numbers in your text. So,
result = subject.match(/\d+/);
should work just as well.
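Both suggestions can be tried on the string from the question. The sketch below also adds a null check, since match() returns null when nothing matches; the helper name firstNumber and the extra sample strings are illustrative.

```javascript
const subject = '/144444455';

// With g, match() returns full matches only, including the leading "/".
console.log(subject.match(/\D([0-9]*)/g)); // ["/144444455"]

// Without g, exec() exposes the capturing group at index 1.
const groups = /\D([0-9]*)/.exec(subject);
console.log(groups[1]); // "144444455"

// Hypothetical helper: first run of digits in a text, or null if none.
function firstNumber(text) {
  const match = text.match(/\d+/);
  return match ? match[0] : null;
}

console.log(firstNumber(subject)); // "144444455"
console.log(firstNumber('abc 42 def 7')); // "42"
console.log(firstNumber('no digits here')); // null
```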
