JavaScript: Website URL validation with regex

I'm trying to create a regular expression in JavaScript to validate website URLs. I searched Stack Overflow a bit and did not find anything that was completely helpful.
My regex so far: /(https?:\/\/)?(www\.)?[a-zA-Z0-9]+\.[a-zA-Z]{2,}/g
But it is not reliable: for example, it wrongly accepts a URL with only two w's, like ww.test.com.
Should pass the regex test:
http://www.test.com
https://www.test.com
www.test.com
www.test.co.uk
www.t.com
test.com
test.fr
test.co.uk
Should not pass the regex test:
w.test.com
ww.test.com
www.test
test
ww.test.
.test
.test.com
.test.co.ul
.test.
Any suggestions or thoughts?

Even if this answer goes a bit beyond the question, it illustrates the underlying problem: while it may be possible to write a regexp that checks the URL, it is much simpler and more robust to parse the URL into a real object, so that the overall test can be decomposed into a number of smaller tests.
So the built-in URL constructor of modern browsers may help you here (LINK 1, LINK 2).
One approach to testing your URL might look like this:
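function testURL(urlstring) {
  var errors = [];
  try {
    var url = new URL(urlstring);
    // url.protocol includes the trailing colon, e.g. "https:"
    if (!/^https?:$/.test(url.protocol)) {
      errors.push('wrong protocol');
    }
    // more tests here
  } catch (err) {
    // new URL() throws a TypeError when the string cannot be parsed;
    // record it, otherwise an unparseable string would count as valid
    errors.push('invalid URL');
  }
  return errors;
}
if (testURL('mr.bean').length === 0) { runSomething(); }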
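One catch with the URL constructor is that it throws on scheme-less input such as www.test.com, which the question wants to accept. Here is a minimal sketch of one way around that, assuming it is acceptable to prepend http:// when no scheme is given. The name isPlausibleWebsite and the "at least two non-empty host labels" rule are my own illustrative assumptions, not part of the answer above.

```javascript
// Sketch only: isPlausibleWebsite is a hypothetical helper name.
function isPlausibleWebsite(input) {
  // new URL() needs a scheme, so assume http:// when none is given.
  var candidate = /^https?:\/\//i.test(input) ? input : 'http://' + input;
  var url;
  try {
    url = new URL(candidate);
  } catch (err) {
    return false; // not parseable as a URL at all
  }
  // Require at least two dot-separated host labels, none of them empty.
  var labels = url.hostname.split('.');
  return labels.length >= 2 && labels.every(function (label) {
    return label.length > 0;
  });
}

console.log(isPlausibleWebsite('www.test.com')); // true
console.log(isPlausibleWebsite('test'));         // false
console.log(isPlausibleWebsite('.test.com'));    // false
```

This still accepts ww.test.com and www.test, since both are syntactically valid host names; rejecting them would need an extra rule of your own.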

Here's an unofficial one that works for most things, with an explanation. It should be good enough for most situations.
(https?:\/\/)?[\w\-~]+(\.[\w\-~]+)+(\/[\w\-~]*)*(#[\w\-]*)?(\?.*)?
(https?:\/\/)? - start with http:// or https:// or not
[\w\-~]+(\.[\w\-~]+)+ - follow it with the domain name [\w\-~]+ and at least one extension (\.[\w\-~]+)+
[\w\-~] == [a-zA-Z0-9_\-~]
Multiple extensions would mean test.go.place.com
(\/[\w\-~]*)* then as many sub directories as wished
In order to easily make test.com/ pass, the slash does not enforce following characters. This can be abused like so: test.com/la////la.
(#[\w\-]*)? Followed maybe by an element id
(\?.*)? Followed maybe by url params, which (for the sake of simplicity) can be pretty much whatever
There are plenty of edge cases where this will break, or where it should but it doesn't. But, for most cases where people aren't doing anything wacky, this should work.
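One thing worth spelling out: the pattern as written has no anchors, so .test() succeeds whenever any substring of the input looks like a URL. Here is a small sketch of the difference, assuming the goal is to validate a whole input string. The names loosePattern, anchoredPattern and looksLikeUrl are illustrative only.

```javascript
// The pattern from the answer above, unchanged.
var loosePattern = /(https?:\/\/)?[\w\-~]+(\.[\w\-~]+)+(\/[\w\-~]*)*(#[\w\-]*)?(\?.*)?/;

// Same pattern wrapped in ^...$ so the whole string has to match.
var anchoredPattern = new RegExp('^(?:' + loosePattern.source + ')$');

function looksLikeUrl(input) {
  return anchoredPattern.test(input);
}

console.log(loosePattern.test('.test.com'));     // true: the substring "test.com" matches
console.log(looksLikeUrl('.test.com'));          // false
console.log(looksLikeUrl('test.com/la////la'));  // true: the slash abuse mentioned above
console.log(looksLikeUrl('ww.test.com'));        // true: nothing here treats "ww" specially
```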

/((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:#\-_=#]+\.([a-zA-Z0-9\&\.\/\?\:#\-_=#])*/g
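For the exact pass/fail lists in the question, one possible pattern is sketched below. It is my own assumption rather than something taken from the answers above: it anchors the match, rejects a leading label made of one or two w's, and rejects www. followed by a single label. The names websitePattern and isWebsite are hypothetical. It also drops the g flag, because a global regex keeps state in lastIndex between .test() calls, which can make repeated validation give surprising results.

```javascript
// Hypothetical pattern tailored to the question's two lists.
var websitePattern = new RegExp(
  '^(https?:\\/\\/)?' +          // optional scheme
  '(?!w{1,2}\\.)' +              // reject "w." and "ww." prefixes
  '(?!www\\.[^.]+$)' +           // reject "www." followed by a single label, e.g. www.test
  '(www\\.)?' +                  // optional www.
  '[a-zA-Z0-9]+' +               // domain name, as in the question's regex
  '(\\.[a-zA-Z]{2,})+$'          // one or more extensions, e.g. .com or .co.uk
);

function isWebsite(input) {
  return websitePattern.test(input);
}

var shouldPass = ['http://www.test.com', 'https://www.test.com', 'www.test.com',
  'www.test.co.uk', 'www.t.com', 'test.com', 'test.fr', 'test.co.uk'];
var shouldFail = ['w.test.com', 'ww.test.com', 'www.test', 'test', 'ww.test.',
  '.test', '.test.com', '.test.co.ul', '.test.'];

console.log(shouldPass.every(isWebsite));                  // true
console.log(shouldFail.some(isWebsite));                   // false
```

Like the regex in the question, this does not allow hyphens, ports, paths or query strings, so treat it as a starting point only.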

Related

Optimising regex for matching domain name in url

I have a regex that matches iframe urls, and captures various components. The regex is given below
/(<iframe.*?src=['|"])((?:https?:\/\/|\/\/)[^\/]*)(?:.*?)(['|"][^>]*some-token:)([a-zA-Z0-9]+)(.*?>)/igm
To be clear, my actual requirement is to transform, within an HTML string, strings such as
<iframe src="http://somehost.com/somepath1/path2" class="some-token:abc123">
to
<iframe src="http://somehost.com/newpath?token=abc123" class="some-token:abc123">
The regex works as it is supposed to, but for HTML of normal length it takes around 2 seconds to execute, which I think is very high.
I would really appreciate it if someone could point out how to optimise this regex. I am sure I am doing something terribly wrong, because before this I used the regex
/(<iframe.*?src=['|"])(?:.*?)(['|"][^>]*some-token:)([a-zA-Z0-9]+)(.*?>)/igm
to completely replace the source URL and just add the parameter, and it was taking just 100 ms.
You do not need to (and should not) parse the iframe element as a string; you just need to access its attributes, and retrieve information from them and rewrite them.
function fix_iframe_src(iframe) {
  var src = iframe.getAttribute('src');
  var klass = iframe.getAttribute('class');
  var token = get_token(klass);
  src = fix_src(src, token);
  iframe.setAttribute('src', src);
}
Writing get_token and fix_src is left as an exercise.
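For what it's worth, here is a rough sketch of what those two helpers could look like. It assumes the token always follows the literal text some-token: in the class attribute and that the new path is always /newpath, as in the question's example; both function bodies are guesses at the intent, not a definitive implementation.

```javascript
// Pull the token out of a class value such as "some-token:abc123".
function get_token(klass) {
  var match = /some-token:([a-zA-Z0-9]+)/.exec(klass || '');
  return match ? match[1] : null;
}

// Keep the scheme and host of src, replace the rest with /newpath?token=...
// Accepts http://, https:// and protocol-relative // URLs, like the question's regex.
function fix_src(src, token) {
  var match = /^((?:https?:)?\/\/[^\/]+)/i.exec(src || '');
  if (!match || !token) {
    return src; // leave anything unexpected untouched
  }
  return match[1] + '/newpath?token=' + encodeURIComponent(token);
}

console.log(fix_src('http://somehost.com/somepath1/path2', get_token('some-token:abc123')));
// http://somehost.com/newpath?token=abc123
```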
If you want to find a bunch of iframes and fix them all up, then
var iframes = document.querySelectorAll('iframe');
for (var i = 0; i < iframes.length; i++) {
  fix_iframe_src(iframes[i]);
}
By the way, the value of your class attribute seems to be broken. I doubt if it will match any CSS rules, if that's the intent. Are you using it for something other than to provide the token? In that case, you would be best off using a data attribute such as data-token.
Minor point about regexp flags: the g and m flags are going to do nothing for you. m is about matching anchors like ^ and $ to the beginning and end of lines within the source string, which is not an issue for you. g is about matching multiple times, which is also not an issue.
The reason your regexp is taking so long is most likely that you are throwing the entire DOM at it. Hard to tell unless you show us the code from which you are calling it.

Find and replace regex JavaScript

I know this is a super easy question, but I can't seem to wrap my head around it. I've got a bunch of URLs in varying languages such as:
www.myurl.com?lang=spa
www.myurl.com?lang=deu
www.myurl.com?lang=por
I need to create buttons to quickly switch from any language extension (spa, por, deu, rus, ukr, etc) to another language. I have the following code so far:
var url = window.location.toString();
window.location = url.replace(/lang=xxx/, 'lang=deu');
I just can't figure out the 3-character wildcard character. I know that I need to do some sort of regular expression or something, I'm just not sure how to go about it. Any help?
Thanks in advance
You can use
([&?]lang=)\w+
This will work with urls like www.myurl.com?foo=bar&lang=por&bar=foo too.
Instead of lang=deu, you'll have to replace with $1deu.
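Put together, that could look like the sketch below. The helper name switchLang is made up for illustration, and it uses a replacer function instead of the $1 string so that a language code starting with a digit cannot be misread as part of the group reference.

```javascript
// Hypothetical helper: swap whatever follows ?lang= or &lang= for a new code.
function switchLang(url, lang) {
  return url.replace(/([&?]lang=)\w+/, function (match, prefix) {
    // prefix is the captured "?lang=" or "&lang="
    return prefix + lang;
  });
}

console.log(switchLang('www.myurl.com?lang=spa', 'deu'));
// www.myurl.com?lang=deu
console.log(switchLang('www.myurl.com?foo=bar&lang=por&bar=foo', 'deu'));
// www.myurl.com?foo=bar&lang=deu&bar=foo
```

In the browser, a button handler would then do something like window.location = switchLang(window.location.toString(), 'deu').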
Try ... or .{3} or \w{3} or even [a-z]{3}, depending on how specific you want to be.
var s = 'www.myurl.com?lang=spa';
s.replace(/lang=[a-z]{3}/, 'lang=deu');
// => "www.myurl.com?lang=deu"
Use /lang=[a-z]{3}/.

JavaScript my function to test a password doesn't work

function demoMatchClick() {
  var validString = /^[a-z](?=[a-z]*[0-9])[a-z0-9]{0,6}[a-z]$/
  var re = new RegExp(validString);
  if (document.form1.subject.value.test(re)) {
    alert("Successful match");
  } else {
    alert("No match");
  }
}
<INPUT TYPE=SUBMIT VALUE="Replace" ONCLICK="demoReplaceClick()">
I can't get the alert to pop up.
I want these rules to be enforced:
•Not have upper-case letters.
•Begin with a letter.
•Have at least 1 digit, not at the beginning or the end.
•Have up to 8 alphanumeric characters.
•Does NOT have any symbols (characters like !@#$%^&*()-+).
I am using a button to execute the code for now.
Well, I suppose this regex suits your rules...
var rules = /^[a-z](?=[a-z]*[0-9])[a-z0-9]{0,6}[a-z]$/;
But I think there are several issues with your code, which I'd like to point out. Don't take it as a personal offense, please: believe me, I'm actually saving you a LOT of time and nerves.
First, there's a standard rule: each function should do only one thing - but do it really well (or so they say, these perfectionists!). Your code is too tightly coupled with DOM extraction: I was really surprised when it failed to work when pasted in my environment! Only then I noticed that document.forms call. It's not really needed here: it's sufficient to build a function taking one parameter, then call this function with the value extracted somewhere else. This way, btw, you can easily separate the causes of errors: it would be either in DOM part, or within the function.
Second, regexes are very close to being first-class citizens in JavaScript (not as much as in Perl, but still much closer than in some other languages). That means you can write a regex literal as is and use it later, without the new RegExp construct.
With all that said, I'd write your code as...
function validatePassword(password) {
  var rules = /^[a-z](?=[a-z]*[0-9])[a-z0-9]{0,6}[a-z]$/;
  return rules.test(password);
}
... then use it by something like ...
var password = document.form1.subject.value;
alert( validatePassword(password) ? 'Success! :)' : 'Failure... :(' );
P.S. And yes, Riccardo is right: set too strict rules for passwords - and suffer the consequences of narrowing the range of search for an attacker. And it's quite easy to see the validation rules set in Javascript: even obfuscators won't help much.
Here is the modified code:
function demoMatchClick(input) {
  var validString = /^[a-z](?=[a-z]*[0-9])[a-z0-9]{0,6}[a-z]$/;
  if (validString.test(input)) {
    alert("Successful match");
  } else {
    alert("No match");
  }
}
demoMatchClick("hello world");
The validString variable is already a RegExp object, so you can use it directly. Additionally, the .test() method belongs to the regex object, not to the string.
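To see both points without a form or a browser, here is a small sketch that runs the same regex over a few made-up sample passwords. The sample values and the checkAll helper are mine, chosen to hit each rule in turn.

```javascript
var rules = /^[a-z](?=[a-z]*[0-9])[a-z0-9]{0,6}[a-z]$/;

function validatePassword(password) {
  return rules.test(password);
}

// Strings have no .test() method, which is why the original call threw
// before either alert could run.
console.log(typeof 'abc'.test); // "undefined"

// Hypothetical helper: map each sample to its validation result.
function checkAll(samples) {
  return samples.map(validatePassword);
}

console.log(checkAll(['a1b', 'ab1c']));    // [ true, true ]
console.log(checkAll([
  'abc',        // no digit
  'ab1',        // digit at the end
  'A1b',        // upper-case letter
  '1ab',        // starts with a digit
  'a$1b',       // symbol
  'a1bcdefgh'   // 9 characters
]));              // all false
```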

JavaScript XPath [@StoreName]?

I'm doing some research for a project that uses document.createTreeWalker, and I'm looking at a script that uses quite a few XPaths, but I'm curious as to where these come from. Some are obvious and I have been able to find answers online, such as [@AttributeName] and [@TagName], but what are [@StoreName], [@AttributeValue1], [@AttributeValue2]? These I have not been able to look up online.
In particular, I'm looking at these lines and not understanding them:
thisURL = window.document.location.href.toString();
if(thisURL.search("[@StoreName]") != -1) { //do something }
Perhaps I'm misunderstanding your question, but there's nothing functionally or syntactically different between [@AttributeName] and [@StoreName]. They're both predicates that are looking for elements with particular attributes. The first one is looking for AttributeName attributes, while the second is looking for StoreName attributes.
That said, the code you're showing isn't actually doing any XPath work. It just runs JavaScript's string search function on the URL and does something if it finds a match. Note that search() converts a string argument into a regular expression, so "[@StoreName]" is treated as a character class that matches any single one of those characters rather than the literal sequence [@StoreName]; indexOf() would test for the literal text.
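A quick way to check what that search call really does is sketched below, using a made-up URL. String.prototype.search turns a string argument into a regular expression, so a bracketed string such as "[@StoreName]" behaves as a character class, whereas indexOf and includes look for the literal text.

```javascript
var sampleUrl = 'http://example.com/'; // hypothetical URL for illustration
var placeholder = '[@StoreName]';

// Treated as the regex /[@StoreName]/: matches the first character that is
// any of @ S t o r e N a m, here the "t" in "http".
console.log(sampleUrl.search(placeholder));   // 1

// Literal substring checks: the text "[@StoreName]" is not in the URL.
console.log(sampleUrl.indexOf(placeholder));  // -1
console.log(sampleUrl.includes(placeholder)); // false
```

So on almost any real URL the condition in the question would be true, whether or not the bracketed text is present.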

What would be the regex to get the view code from youtube urls?

I just need to get the view code from youtube urls. The api is returning back strings that look like this:
http:\/\/www.youtube.com\/watch?v=XODUrTtvZks&feature=youtube_gdata_player
I need to get this part:
XODUrTtvZks
from the above, keep in mind that sometimes there may be additional parameters after the v=something like:
&feature=youtube_gdata_player
and sometimes there may not be. Can someone please provide the regex that would work in this situation and an example of how to use it using javascript?
You can use /v=([^&]+)/ and get the match at offset 1.
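As a sketch of how that might be used: the function name getVideoId is hypothetical, and the pattern is tightened slightly to require ? or & before v= so that a parameter such as rev= is not picked up by mistake.

```javascript
// Hypothetical helper: return the value of the v parameter, or null.
function getVideoId(url) {
  // Group 1 captures everything after v= up to the next & or #.
  var match = /[?&]v=([^&#]+)/.exec(url);
  return match ? match[1] : null;
}

console.log(getVideoId('http://www.youtube.com/watch?v=XODUrTtvZks&feature=youtube_gdata_player'));
// XODUrTtvZks
console.log(getVideoId('http://www.youtube.com/watch?v=XODUrTtvZks'));
// XODUrTtvZks
console.log(getVideoId('http://www.youtube.com/'));
// null
```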
This snippet only matches URLs from youtube.com:
var url = 'http://www.youtube.com/watch?v=XODUrTtvZks&feature=youtube_gdata_player';
// dots are escaped so that they only match a literal "."
var matches = url.match(/^https?:\/\/www\.youtube\.com\/watch\?\s*v=([^&]+)/i);
if (matches) {
  var videoID = matches[1];
  // do stuff
}
You can use an online tool called RegExr (http://gskinner.com/RegExr/) to build and test your regular expression.
Regards
Rahul
This snippet is from Google’s own parser at closure:
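function getIdFromUrl(url) {
  // exec() returns null when the URL does not match, so guard before indexing
  var match = /https?:\/\/(?:[a-zA-Z]{2,3}\.)?(?:youtube\.com\/watch\?)((?:[\w\d\-\_\=]+&(?:amp;)?)*v(?:<[A-Z]+>)?=([0-9a-zA-Z\-\_]+))/i.exec(url);
  return match ? match[2] : null;
}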
You can see it here:
http://code.google.com/p/closure-library/source/browse/trunk/closure/goog/ui/media/youtube.js?r=1221#246
