JS regex match (doesn't begin with [A-z]+://)www - javascript

I have some links to which people did not add the protocol, e.g. www.stackoverflow.com. If a link begins with www., I want to replace that with 'http://www.'.
How can I do this with JavaScript regular expressions?
I tried the code below, but I can't seem to match the pattern 'doesn't start with [A-z]+://www.'.
The links are mixed in with text.
jQuery(document).ready(function () {
jQuery('.myClass').each(function (index) {
var temp = wwwify(jQuery(this).text());
jQuery(this).html(temp);
});
});
function wwwify(text) {
var regex = /(?!\b([A-z]+:\/\/))www\./igm;
return text.replace(regex, 'http://www.');
}

Why not just use the following?
if (text.substring(0,4)=='www.') {
text = 'http://'+text;
}

You could simply replace each "http://www." with "www." and then replace every "www." with "http://www.". It might not be the prettiest regex you could imagine, but it will solve your problem. (Note that it does not handle https: "https://www." would become "https://http://www.".)
$(document).ready(function () {
$('.myClass').each(function (index) {
var $elm = $(this); // cache $(this) for reuse
var html = $elm.html();
html = html.replace(/http:\/\/www\./ig, "www.").replace(/www\./ig, "http://www.");
$elm.html(html);
});
});

You need to anchor your regex to the start of the string. Also, the range should be /[a-z]/, since the /i modifier covers the upper-case letters ([A-z] also matches the six characters between Z and a: [ \ ] ^ _ and `). The /m and /g modifiers are irrelevant here. That leaves:
var regex = /^(?![a-z]+:\/\/)www\./i;
My apologies, I missed the part saying "The links are mixed in with text". Without look-behind (which JavaScript lacked when this was written; ES2018 added it), this can only be done using a function that returns the replacement string. I suggest the following, which captures any protocol before the www. and prepends http:// only if none was captured:
var regex = /\b([a-z]+:\/\/)?www\./ig;
text = text.replace(regex, function(url, protocol) { // replace() returns a new string
return protocol ? url : "http://" + url;
});
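A self-contained version of the callback approach, reusing the question's wwwify name; the sample string is made up for illustration. The callback receives the whole match and the optional captured scheme, and only rewrites matches that have no scheme:

```javascript
// Prepend "http://" to every "www." that is not already preceded by a scheme.
function wwwify(text) {
  // Group 1 captures an optional scheme such as "https://" or "ftp://".
  const regex = /\b([a-z]+:\/\/)?www\./gi;
  return text.replace(regex, (url, protocol) =>
    // Scheme present: keep the match as-is; otherwise add "http://".
    protocol ? url : 'http://' + url
  );
}

console.log(wwwify('Visit www.example.com or https://www.example.org'));
// Visit http://www.example.com or https://www.example.org
```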

Since I haven't found a suitable regex-only solution on SO or elsewhere, plain JavaScript replace() calls may be the best option.
For now I'm making two passes through the text:
function wwwLineBeginsWith(text) {
var regex = /^www\./i; // "www." at the very start of the string
return text.replace(regex, 'http://www.');
}
function wwwWordBeginsWith(text) {
var regex = /(\s)www\./gi; // "www." after whitespace; $1 puts the whitespace back
return text.replace(regex, '$1http://www.');
}
var test1 = 'www.test2.com';
test1 = wwwLineBeginsWith(test1);
test1 = wwwWordBeginsWith(test1);
console.log(test1); // http://www.test2.com
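The two passes can also be folded into one by capturing whatever precedes www. (the start of the string or a whitespace character) and writing it back with $1. A sketch; the name wwwifyOnePass is hypothetical:

```javascript
// One pass: "www." at the start of the string or right after whitespace.
function wwwifyOnePass(text) {
  // (^|\s) captures either the empty start-of-string match or the
  // whitespace character, and $1 writes it back before the new prefix.
  return text.replace(/(^|\s)www\./gi, '$1http://www.');
}

console.log(wwwifyOnePass('www.test2.com and www.example.com'));
// http://www.test2.com and http://www.example.com
```

A "www." that follows "//" is left alone, because neither alternative of (^|\s) can match there.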

How about replacing those with a protocol regardless?
function wwwify(text) {
return text.replace(/(http(s?):\/\/)?www\./ig, 'http$2://www.');
}
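It currently doesn't work because your (?!...) is a lookahead: it only checks the text after the current position, so it succeeds in front of every www. and can never see a protocol before it. You would need a lookbehind, (?<!, which JavaScript regular expressions did not support when this was written (it was added in ES2018).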
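Engines that implement ES2018 (for example current Node.js and current browsers) do support lookbehind, so the original idea can be written directly with (?<!...). A sketch, assuming such an engine; older browsers throw a SyntaxError on this pattern:

```javascript
// Requires ES2018 lookbehind support.
function wwwify(text) {
  // (?<![a-z]+:\/\/) rejects a "www." immediately preceded by a scheme
  // such as "http://" or "ftp://"; \b rejects a "www." inside a word.
  return text.replace(/\b(?<![a-z]+:\/\/)www\./gi, 'http://www.');
}

console.log(wwwify('see www.example.com and https://www.example.org'));
// see http://www.example.com and https://www.example.org
```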

If you absolutely must use a RegExp to determine this, I would recommend an anchored negative lookahead such as /^(?!http:\/\/)/i, which matches only when the string does not start with http://. Otherwise, I agree with the other posters: using indexOf would be far easier (e.g., url.toLowerCase().indexOf('http://') !== 0).

Related

Regex do not match content but whole searched string

I'm using this regex to match an "href" attribute in an <a> tag:
var href_matches = postRep.match(/href="(.*?)"/g);
The regex correctly matches the href, but it returns the whole href="example.com" string.
How do I get only the href value (e.g. "example.com")?
You can either run exec() on the regex (the value is then in url_match[1]):
var url_match = /href="(.*?)"/g.exec(postRep);
or remove the global flag:
var url_match = postRep.match(/href="(.*?)"/);
String's match() function doesn't return captured groups when the global modifier is set.
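If you need every href value rather than only the first, String.prototype.matchAll() (ES2020) keeps the g flag and still exposes each capture group. The postRep sample below is made up:

```javascript
const postRep = '<a href="example.com">one</a> <a href="example.org/page">two</a>';

// matchAll() requires the g flag and yields one match array per hit;
// index 1 of each array is the captured href value.
const hrefs = [...postRep.matchAll(/href="(.*?)"/g)].map((m) => m[1]);

console.log(hrefs); // [ 'example.com', 'example.org/page' ]
```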
Just another idea.
You can try something like this function:
function getHrefs(inputString) {
var out = [];
inputString.replace(/\bhref\b=['"]([^'"]+)['"]/gi, function(result, backreference) {
out.push(backreference);
return '';
});
return out;
}
Improved solution (much shorter):
function getHrefs(inputString) {
return (inputString.match(/\bhref\b=['"][^'"]+(?=['"])/gi) || []).map(s => s.replace(/^href=["']/i, "")); // i flag here too, so HREF= is stripped as well
}
Edit:
There is another option, exec(), but with exec() you need a loop to get all matches (if you need them).
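With exec() the loop looks roughly like this: a regex with the g flag remembers its position in lastIndex, so each call returns the next match and null once there are none left. The function name and sample input are illustrative:

```javascript
function getHrefsWithExec(inputString) {
  const regex = /\bhref\b=['"]([^'"]+)['"]/gi;
  const out = [];
  let match;
  // Each exec() call resumes from regex.lastIndex; it returns null when
  // there are no further matches (and resets lastIndex to 0).
  while ((match = regex.exec(inputString)) !== null) {
    out.push(match[1]);
  }
  return out;
}

console.log(getHrefsWithExec('<a href="a.html">A</a><a HREF=\'b.html\'>B</a>'));
// [ 'a.html', 'b.html' ]
```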
You can use regex lookbehinds to check if the "href=" is there without actually including it in the match.
For example, the regex (?<=href=)example\.com applied to href=example.com should only match example.com.
EDIT:
This method only works in regex flavors that support lookbehinds. JavaScript did not support this feature when this was written (thanks to Georgi Naumov for pointing this out); it was added in ES2018.
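In an engine with ES2018 lookbehind support, the same idea works in JavaScript. A minimal sketch; the input string is illustrative:

```javascript
const tag = '<a href="example.com">link</a>';

// (?<=href=") asserts that href=" comes right before the match without
// consuming it, so match[0] is just the attribute value.
const match = tag.match(/(?<=href=")[^"]*/);

console.log(match[0]); // example.com
```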

Regex to detect a string that contains a URL or file extension

I'm trying to create a small script that detects whether the string input is either:
1) a URL (which will hold a filename): 'http://ajax.googleapis.com/html5shiv.js'
2) just a filename: 'html5shiv.js'
So far I've found this but I think it just checks the URL and file extension. Is there an easy way to make it so it uses an 'or' check? I'm not very experienced with RegExp.
var myRegExp = /[^\\]*\.(\w+)$/i;
Thank you in advance.
How about this regex?
(\.js)$
It checks whether the string ends with .js.
$ anchors the match to the end of the input (or of each line, with the m flag).
Basically, to express 'OR' in a regex, use the pipe (|) delimiter: (aaa|bbb) will match either aaa or bbb.
To match a URL, I'd suggest the following:
\w+://[\w\._~:/?#\[\]@!$&'()*+,;=%\-]*
This is based on the set of characters allowed in a URL (RFC 3986).
For the file, what's your definition of a filename?
If you want to search for strings that match "one or more non-full-stop characters, followed by a full stop, followed by one or more non-full-stop characters", I'd suggest the following regex:
[^\.]+\.[^\.]+
And altogether:
(\w+://[\w\._~:/?#\[\]@!$&'()*+,;=%\-]*|[^\.]+\.[^\.]+)
Here's a working example (in JavaScript): jsfiddle
You can test the regex online here: http://gskinner.com/RegExr/
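For a yes/no check in JavaScript, the combined pattern probably needs ^ and $ anchors (an addition here) so the whole input must match one alternative rather than just a substring. This sketch also lists @ and - among the URL characters, as RFC 3986 allows them; the inputs are made up:

```javascript
// Either "scheme://" plus URL characters, or "name.ext" with a single dot.
// The ^...$ anchors are an addition to the unanchored pattern.
const urlOrFile = /^(\w+:\/\/[\w._~:\/?#\[\]@!$&'()*+,;=%-]*|[^.]+\.[^.]+)$/;

console.log(urlOrFile.test('http://ajax.googleapis.com/html5shiv.js')); // true
console.log(urlOrFile.test('html5shiv.js')); // true
console.log(urlOrFile.test('html5shiv')); // false
```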
If it is for the purpose of flow control you can do the following:
var test = "http://ajax.googleapis.com/html5shiv.js";
// to recognize http & https
var regex = /^https?:\/\/.*/i;
var result = regex.exec(test);
if (result == null){
// no URL found code
} else {
// URL found code
}
For the purpose of capturing the file name you could use:
var test = "http://ajax.googleapis.com/html5shiv.js";
var regex = /(\w+\.\w+)$/i;
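var match = regex.exec(test); // exec() returns an array (or null), not a string
var filename = match ? match[1] : null; // "html5shiv.js"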
Yes, you can use the alternation operator |. Be careful, though: its precedence is very low, lower than sequencing, so /^cat|dog$/ means (^cat)|(dog$). You will need to group the alternatives, as in /^(cat|dog)$/.
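The precedence pitfall is easy to see with a few test() calls; the sample words are arbitrary:

```javascript
// | binds more loosely than concatenation, so this is (^cat)|(dog$).
const loose = /^cat|dog$/;
// Grouping makes both anchors apply to every alternative.
const grouped = /^(cat|dog)$/;

console.log(loose.test('catfish'));   // true  (matches ^cat)
console.log(loose.test('hotdog'));    // true  (matches dog$)
console.log(grouped.test('catfish')); // false
console.log(grouped.test('dog'));     // true
```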
It's very hard to tell exactly what you want with so few use/test cases, but
(http://[a-zA-Z0-9\./]+)|([a-zA-Z0-9\.]+)
should give you a starting point.
If it's a URL, strip it down to the last part and treat it the same way as "just a filename".
function isFile(fileOrUrl) {
// This will return everything after the last '/'; if there's
// no forward slash in the string, the unmodified string is used
var filename = fileOrUrl.split('/').pop();
return (/.+\..+/).test(filename);
}
Try this:
var ajx = 'http://ajax.googleapis.com/html5shiv.js';
function isURL(str){
return /((\/\w+)|(^\w+))\.\w{2,}$/.test(str);
}
console.log(isURL(ajx));
Have a look at this (requires no regex at all):
var filename = string.indexOf('/') == -1
? string
: string.split('/').slice(-1)[0];
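Here is a loop-based program; it walks backwards from the end of the string and collects the file name without its extension: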
<script>
var url="Home/this/example/file.js";
var condition=0;
var result="";
for(var i=url.length-1; i>=0 && condition<2 ;i--) // start at the last character and include index 0
{
if(url[i]!="/" && url[i]!="."){result= (condition==1)? (url[i]+result):(result);}
else{condition++;}
}
document.write(result);
</script>

Matching hashes using regex, but not when they are part of a URL

I am struggling with a JavaScript regex that should capture the text after # up to the first word boundary, but not when the hash is part of a URL. So
#test - should match test
sometext#test2 - should match test2
xx moretext#test3 - should match test3
http://test.com#tab1 - should not match tab1
I am replacing the text after the hash with a link (but not the hash character itself). There can be more than one hash in the text, and it should match them all (I guess I should use /g for that).
Matching the part after the hash is quite easy: /#\b(.+?)\b/g, but not matching it if the string itself starts with "http" is something I cannot solve. I should probably use a negative look-around, but I am having problems getting my head around that.
Any help is greatly appreciated!
Try this regex, which uses a negative lookahead instead (JavaScript didn't support lookbehinds when this was written):
/^(?!http:\/\/).*#\b(.+?)\b/
You may want to check for www too, depending on your conditions.
Edit: Then you can do this:
str = str.replace(re.exec(str)[1], 'replaced!');
http://jsfiddle.net/j7c79/2/
Edit 2: Sometimes a regex alone is not the way to go if it gets too complicated. Try a different approach:
var txt = "asdfgh http://asdf#test1 #test2 woot#test3";
function replaceHashWords(str, rep) {
var isUrl = /^http/.test(str), result = [];
!isUrl && str.replace(/#\b(.+?)\b/g, function(a,b){ result.push(b); });
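// Guard: with no hashes, new RegExp('()', 'g') would insert rep between every character.
return result.length ? str.replace(new RegExp('(' + result.join('|') + ')', 'g'), rep) : str;
}
alert(replaceHashWords(txt, 'replaced!'));
// asdfgh http://asdf#replaced! #replaced! woot#replaced!
// Note: isUrl only checks the start of the whole string, so the #test1 inside the URL is replaced too.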
As regex is often quite expensive to use, I'd suggest using basic string and array methods to determine whether a given set of characters represents a URL (though I'm assuming that all URLs will start with the http string):
$('ul li').each(
function() {
var t = $(this).text(),
words = t.split(/\s+/),
foundHashes = [],
word = '';
for (var i = 0, len = words.length; i < len; i++) {
word = words[i];
if (word.indexOf('http') == -1 && word.indexOf('#') !== -1) {
var match = word.substring(word.indexOf('#') + 1);
foundHashes.push(match);
}
}
// the following just shows what, if anything, was found
// and can definitely be safely omitted
if (foundHashes.length) {
var newSpan = $('<span />', {
'class': 'matchedWords'
}).text(foundHashes.join(', ')).appendTo($(this));
}
});
JS Fiddle demo (with some timing information printed to the console).
References:
jQuery:
appendTo().
each().
text().
'Vanilla' JavaScript:
Array.join().
String.indexOf().
String.split().
String.substring().
This would require a lookbehind, something JavaScript lacked when this was written (ES2018 added it).
However, if your subject string is some HTML and those URLs are in href attributes, you can create a document out of it and search for text nodes, only replacing their nodeValues instead of the whole HTML string.
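Another way around a missing lookbehind, which works in every engine, is to let the regex also match the text you want to skip: one alternative consumes a whole URL and the other captures a hashtag word (#(\w+) here, instead of the question's #\b(.+?)\b), and the replacement callback only rewrites the latter. A sketch; it assumes URLs start with http:// or https:// and run to the next whitespace, and the /tags/ link target is hypothetical:

```javascript
// Link hashtags in plain text, leaving URL fragments such as "#tab1" alone.
function linkHashes(text) {
  // Alternative 1 consumes a whole http(s) URL up to the next whitespace,
  // so its "#fragment" is never seen by alternative 2.
  // Alternative 2 captures the word after a "#" in group 1.
  return text.replace(/\bhttps?:\/\/\S+|#(\w+)/g, (match, tag) =>
    // tag is undefined when the URL alternative matched: return it unchanged.
    tag === undefined ? match : '#<a href="/tags/' + tag + '">' + tag + '</a>'
  );
}

console.log(linkHashes('#test sometext#test2 http://test.com#tab1'));
// #<a href="/tags/test">test</a> sometext#<a href="/tags/test2">test2</a> http://test.com#tab1
```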

How do I use match() for a white list of characters?

I use preg_match for my server-side validation, but I want client-side validation too.
For my PHP I allow those characters:
'/^[A-Za-z][a-zA-Z0-9 .:-,!?]+$/'
How would I make a white list of characters with match() in JavaScript?
EDIT:
I tried this but it didn't work for some reason:
My debugger says, right before the if statement:
218: SyntaxError: Invalid regular expression: range out of order in character class
$('#title').blur(function(){
input = $('#title').val();
var invalidChars = /^[^a-z][^a-z\d .:-,!?]+$/i;
if (!invalidChars.test(input)){
alert('true');
}
else {
alert('false');
}
});
Just a side note: instead of writing [A-Za-z], a simple /[a-z]/i will suffice; the i flag makes the match case-insensitive. Note also that the hyphen has to be escaped (\-), otherwise :-, is read as a backwards range:
var validChars = /^[a-z][a-z\d .:\-,!?]+$/i;
if (validChars.test(myText)){ ... }
Using regex.test(str) is slightly more performant than str.match(regex) if all you want is to know if a match exists or not.
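The debugger error comes from the unescaped hyphen: inside a character class, :-, is read as a range from ":" (U+003A) down to "," (U+002C), which runs backwards. A small sketch showing the failure and the escaped version; the sample inputs are made up:

```javascript
// The unescaped hyphen forms the range ":"-",", which is out of order.
let error = null;
try {
  new RegExp('^[a-z][a-z\\d .:-,!?]+$', 'i');
} catch (e) {
  error = e;
}
console.log(error instanceof SyntaxError); // true

// Escaping the hyphen makes it a literal character again.
const validChars = /^[a-z][a-z\d .:\-,!?]+$/i;
console.log(validChars.test('Hello, world!'));          // true
console.log(validChars.test('1 starts with a digit'));  // false
console.log(validChars.test('Tab\there'));              // false (tab is not whitelisted)
```

Moving the hyphen to the end of the class ([... .:,!?-]) would also make it literal.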
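Alternatively, you can bail out early as soon as you see an invalid character:
var invalidChars = /^[^a-z]|[^a-z\d .:\-,!?]/i; // first char not a letter, or any char outside the whitelist
if (myStr.length > 1 && !invalidChars.test(myStr)){
// we passed
}
This allows the regex test to stop the moment it sees a disallowed character (the length check mirrors the + in validChars, which requires at least two characters).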
Try the following
var text = ...;
if (text.match(/^[A-Za-z][a-zA-Z0-9 .:\-,!?]+$/)) {
...
}
Or the other way around, calling test() on the regex:
var regexp = /^[A-Za-z][a-zA-Z0-9 .:\-,!?]+$/;
if (regexp.test(text)) {
}

Simple Regex question

I want to remove the ?page=1 part from a URL string like this:
http://.....?page=1
I know this doesn't work, but I was wondering how you would do this properly.
document.URL.replace("?page=[0-9]", "")
Thanks
It seems like you want to get rid of the protocol and the querystring. So how about just concatenating the remaining parts?
var loc = window.location;
var str = loc.host + loc.pathname + loc.hash;
http://jsfiddle.net/9Ng3Z/
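The same idea works for an arbitrary URL string, not just the current page: the built-in URL constructor exposes host, pathname and hash properties just as window.location does. The sample URL is made up:

```javascript
// Parse a URL string; host/pathname/hash mirror the properties of window.location.
const url = new URL('http://example.com:8080/list/?page=3#top');

// Drop the protocol and the query string by joining the remaining parts.
// host includes the port when one is present.
const str = url.host + url.pathname + url.hash;

console.log(str); // example.com:8080/list/#top
```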
I'm not entirely certain what the requirements are, but this fairly simple regex works (here loc is the URL as a string, such as location.href, not the Location object):
loc.replace(/^https?:\/\/([^?]+).*$/, '$1');
It may be a naive implementation, but give it a try and see if it fits your need.
http://jsfiddle.net/9Ng3Z/1/
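? is a regex special character, so you need to escape it to match a literal ?. Also, replace() treats a string argument as literal text rather than a pattern, so use a regular expression literal (with [0-9]+ so multi-digit page numbers are removed too):
document.URL.replace(/\?page=[0-9]+/, "")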
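The difference between a string pattern and a regex literal is easy to check on a made-up URL:

```javascript
const address = 'http://example.com/list?page=12';

// A string pattern is matched literally, so this looks for the exact text
// "?page=[0-9]" and, finding none, returns the input unchanged.
console.log(address.replace('?page=[0-9]', '')); // http://example.com/list?page=12

// A regex literal with an escaped ? and [0-9]+ removes the whole parameter.
console.log(address.replace(/\?page=[0-9]+/, '')); // http://example.com/list
```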
The answer from patrick dw is the most practical, but if you're really curious about a regular-expression solution, here is what I would do:
var trimUrl = function(s) {
var r=/^http:\/\/(.*?)\?page=\d+.*$/, m=(""+s).match(r);
return (m) ? m[1] : s;
}
trimUrl('http://foo.com/?page=123'); // => "foo.com/"
trimUrl('http://foo.com:8080/bar/?page=123'); // => "foo.com:8080/bar/"
trimUrl('foobar'); // => "foobar"
You're super close. To grab the URL, use location.href, and escape the question mark inside a regular expression literal (a plain string argument is matched literally):
var url = location.href.replace(/\?page=[0-9]+/, "");
location.href = url; // and redirect if that's what you intend to do
You can also strip all query string parameters:
var url = location.href.replace(/\?.*/, "");
