I'm writing a chrome extension which allows the user to modify content on specific websites. I'd like the user to be able to specify these websites using wildcards, for example http://*.google.com or http://google.com/*
I found the following code
currentUrl = "http://google.com/";
matchUrl = "http://*.google.com/*";
match = RegExp(matchUrl.replace(/\*/g, "[^]*")).test(currentUrl);
But there are a few problems with it.
http://test.google.com/ is a match
http://google.com/ is not a match
http://test.google.com is not a match
http://.google.com/ is a match
Clarification:
http://google.com Isn't a match, and that is the real problem.
So how can I create a JavaScript code snippet that will correctly check whether there is a match?
I suggest parsing the URL into protocol, base part and the rest, and then re-build the validation regex replacing * inside the base part with (?:[^/]*\\.)* and otherwise with (?:/[^]*)?. Also, you must escape all other special chars with .replace(/[?()[\]\\.+^$|]/g, "\\$&"). You will also need anchors (^ for start of string and $ for the end of string position) to match the entire string. A case insensitive /i modifier is just a bonus to make the pattern case insensitive.
So, for this exact matchUrl, the regex will look like
/^http:\/\/(?:[^\/]*\.)*google\.com(?:\/[^]*)?$/
See the regex demo
var rxUrlSplit = /((?:http|ftp)s?):\/\/([^\/]+)(\/.*)?/;
var strs = ['http://test.google.com/', 'http://google.com/','http://test.google.com', 'http://.google.com/','http://one.more.test.google.com'];
var matchUrl = "http://*.google.com/*";
var prepUrl = "";
var m = matchUrl.match(rxUrlSplit); // declared explicitly to avoid an implicit global
if (m !== null) {
prepUrl = m[1]+"://"+m[2].replace(/[?()[\]\\.+^$|]/g, "\\$&").replace(/\*\\./g,'(?:[^/]*\\.)*').replace(/\*$/,'[^/]*');
if (m[3]) {
prepUrl+= m[3].replace(/[?()[\]\\.+^$|]/g, "\\$&").replace(/\/\*(?=$|\/)/g, '(?:/[^]*)?');
}
}
if (prepUrl) {
// console.log(prepUrl); // ^http://(?:[^/]*\.)*google\.com(?:/[^]*)?$
var rx = RegExp("^" + prepUrl + "$", "i");
for (var s of strs) {
if (s.match(rx)) {
console.log(s + " matches!");
} else {
console.log(s + " does not match!");
}
}
}
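The steps above can be wrapped into a reusable function. This is only a sketch of the same idea; the name wildcardToRegExp and the choice to return null for patterns that do not start with http(s):// or ftp(s):// are my own assumptions, not part of the answer.

```javascript
// Hypothetical helper (not from the original answer): builds an anchored,
// case-insensitive RegExp from a wildcard pattern such as "http://*.google.com/*".
function wildcardToRegExp(matchUrl) {
  var m = matchUrl.match(/((?:http|ftp)s?):\/\/([^\/]+)(\/.*)?/);
  if (m === null) {
    return null; // not an http(s)/ftp(s) pattern
  }
  // Escape regex metacharacters, but leave * alone so it can be expanded below.
  var escapeRegExp = function (s) {
    return s.replace(/[?()[\]\\.+^$|]/g, "\\$&");
  };
  // Host part: "*." becomes "any number of subdomain labels".
  var host = escapeRegExp(m[2])
    .replace(/\*\\\./g, "(?:[^/]*\\.)*")
    .replace(/\*$/, "[^/]*");
  // Path part: a "/*" segment becomes "optionally anything after this point".
  var path = m[3]
    ? escapeRegExp(m[3]).replace(/\/\*(?=$|\/)/g, "(?:/[^]*)?")
    : "";
  return new RegExp("^" + m[1] + "://" + host + path + "$", "i");
}

var rx = wildcardToRegExp("http://*.google.com/*");
console.log(rx.test("http://google.com/"));      // true
console.log(rx.test("http://test.google.com"));  // true
console.log(rx.test("http://evilgoogle.com/"));  // false
```

Building the RegExp once per pattern and reusing it for every visited URL keeps the per-page check cheap.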
with this matchUrl
matchUrl = "http://*.google.com/*";
the RegExp is something like this
"http://.*\.google\.com/.*"
so try replacing each * entered by the user with .* (and escaping the literal dots) before running the regexp match
you can use this tool to test it
I'm trying to extract out a group of words from a larger string/cookie that are separated by hyphens. I would like to replace the hyphens with a space and set to a variable. Javascript or jQuery.
As an example, the larger string has a name and value like this within it:
facility=34222%7CConner-Department-Store;
(notice the leading "C")
So first, I need to match()/find facility=34222%7CConner-Department-Store; with regex. Then break it down to "Conner Department Store"
var cookie = document.cookie;
var facilityValue = cookie.match( REGEX ); ??
var test = "store=874635%7Csomethingelse;facility=34222%7CConner-Department-Store;store=874635%7Csomethingelse;";
var test2 = test.replace(/^(.*)facility=([^;]+)(.*)$/, function(matchedString, match1, match2, match3){
return decodeURIComponent(match2);
});
console.log( test2 );
console.log( test2.split('|')[1].replace(/[-]/g, ' ') );
If I understood it correctly, you want to make a phrase by getting all the words between hyphens and disallowing two successive Uppercase letters in a word, so I'd prefer using Regex in that case.
This is a Regex solution, that works dynamically with any cookies in the same format and extract the wanted sentence from it:
var matches = str.match(/([A-Z][a-z]+)-?/g);
console.log(matches.map(function(m) {
return m.replace('-', '');
}).join(" "));
Demo:
var str = "facility=34222%7CConner-Department-Store;";
var matches = str.match(/([A-Z][a-z]+)-?/g);
console.log(matches.map(function(m) {
return m.replace('-', '');
}).join(" "));
Explanation:
Use this Regex /([A-Z][a-z]+)-?/g to match the words between the - characters.
Remove any - occurrence from the matched words.
Then just join the array of matches with white space.
Ok,
first, you should decode this string as follows:
var str = "facility=34222%7CConner-Department-Store;"
var decoded = decodeURIComponent(str);
// decoded = "facility=34222|Conner-Department-Store;"
Then you have multiple possibilities to split up this string.
The easiest way is to use substring()
var solution1 = decoded.substring(decoded.indexOf('|') + 1, decoded.length)
// solution1 = "Conner-Department-Store;"
solution1 = solution1.replace(/-/g, ' '); // the g flag replaces every hyphen, not just the first
// solution1 = "Conner Department Store;"
As you can see, substring(arg1, arg2) returns the string, starting at index arg1 and ending at index arg2. See Full Documentation here
If you want to cut the last ; just set decoded.length - 1 as arg2 in the snippet above.
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1)
//returns "Conner-Department-Store"
or all above in just one line:
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1).replace(/-/g, ' ')
If you want still to use a regular Expression to retrieve (perhaps more) data out of the string, you could use something similar to this snippet:
var solution2 = "";
var regEx= /([A-Za-z]*)=([0-9]*)\|(\S[^:\/?#\[\]\@\;\,']*)/;
if (regEx.test(decoded)) {
solution2 = decoded.match(regEx);
/* returns
[0:"facility=34222|Conner-Department-Store",
1:"facility",
2:"34222",
3:"Conner-Department-Store",
index:0,
input:"facility=34222|Conner-Department-Store;"
length:4] */
solution2 = solution2[3].replace(/-/g, ' ');
// "Conner Department Store"
}
I have applied some rules for the regex to work; feel free to modify them according to your needs.
facility can be any Word built with alphabetical characters lower and uppercase (no other chars) at any length
= needs to be the char =
34222 can be any number but no other characters
| needs to be the char |
Conner-Department-Store can be any characters except one of the following (reserved delimiters): :/?#[]@;,'
Hope this helps :)
edit: to find only the part
facility=34222%7CConner-Department-Store; just modify the regex to
match facility= instead of ([A-Za-z]*)=:
/(facility)=([0-9]*)\|(\S[^:\/?#\[\]\@\;\,']*)/
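Putting the decode, cut and replace steps together, a small helper could look like the sketch below. The function name getFacilityName is made up for illustration, and it assumes the cookie is always named facility and that the display name follows the first | in the decoded value.

```javascript
// Hypothetical helper: pulls the "facility" cookie out of a cookie string
// and turns "34222%7CConner-Department-Store" into "Conner Department Store".
function getFacilityName(cookieString) {
  // Match "facility=" at the start of the string or right after a ";".
  var m = /(?:^|;\s*)facility=([^;]*)/.exec(cookieString);
  if (m === null) {
    return null; // no facility cookie present
  }
  var decoded = decodeURIComponent(m[1]);              // "34222|Conner-Department-Store"
  var name = decoded.slice(decoded.indexOf("|") + 1);  // text after the first "|"
  return name.replace(/-/g, " ");                      // every hyphen becomes a space
}

var test = "store=874635%7Csomethingelse;facility=34222%7CConner-Department-Store;store=874635%7Csomethingelse;";
console.log(getFacilityName(test)); // "Conner Department Store"
```

In the browser you would pass document.cookie instead of the test string.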
You can use cookies.js, a mini framework from MDN (Mozilla Developer Network).
Simply include the cookies.js file in your application, and write:
docCookies.getItem("facility"); // look the cookie up by its name, not by the text you want to extract
In a text file, there are strings encoding HTTP-related information. The following are examples of these strings:
URL 123.34.45.7:http://captive.apple.com/hotspot-detect.html
or
URL 123.45.67.8:http://www.google-analytics.com/r/collect?v=1&_v=j41&a=1071188231&t=pageview&_s=1&dl=http%3A%2F%2Fm.sherdog.com%2F&ul=en-us&de=UTF-8&dt=Sherdog.com%3A%20UFC%2C%20Mixed%20Martial%20Arts%20(MMA)%20News%2C%20Results%2C%20Fighting&sd=32-bit&sr=320x480&vp=320x460&je=0&_utma=236548035.1293902652.1385044241.1442
I wrote some regular expression to extract the part until http, such as
url\\s\\d+[.]\\d+[.]\\d+[.]\\d+[:](http|https|ftp)
but I am not sure how to write regular expression to match the part following http. Thanks.
Try the following regex:
/^(URL[^:]+:)(?:.*?\/\/)(.*)/gm
Demo:
var re = /^(URL[^:]+:)(?:.*?\/\/)(.*)/gm;
var str = 'URL 123.34.45.7:http://captive.apple.com/hotspot-detect.html';
var m;
while ((m = re.exec(str)) !== null) {
console.log(m[1]+m[2]);
}
It will print:
URL 123.34.45.7:captive.apple.com/hotspot-detect.html
.* will match any character any number of times.
So if you add that to the end of the string you'll get:
url\s\d+[.]\d+[.]\d+[.]\d+[:](http|https|ftp).*
That will match until the end of the line.
Note that I've unescaped \\ to just \ for readability. You may need to re-escape them.
Here's an example of that regular expression at work
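If you want the IP address and the full URL as separate values, capturing groups do the job. The sketch below is one way to write it; the function name parseUrlLine and the returned object shape are my own assumptions.

```javascript
// Hypothetical helper: splits one "URL <ip>:<url>" line into its two parts.
function parseUrlLine(line) {
  // Group 1: dotted IP address. Group 2: everything from the protocol to the end of the line.
  var m = /^URL\s+(\d+\.\d+\.\d+\.\d+):((?:https?|ftp):\/\/.*)$/i.exec(line);
  return m === null ? null : { ip: m[1], url: m[2] };
}

var parsed = parseUrlLine("URL 123.34.45.7:http://captive.apple.com/hotspot-detect.html");
console.log(parsed.ip);  // "123.34.45.7"
console.log(parsed.url); // "http://captive.apple.com/hotspot-detect.html"
```

For a whole file, split the text on line breaks and call the function on each line, skipping the null results.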
For finding a string of the type "URL [IP ADDRESS]:[URL]" within a given (possibly large) string, try this:
var patterns = {
ip: '\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}',
url: (() => {
var protocol = '(http(s)?(://))?(www\\.)?'; // \\. so that the RegExp sees an escaped dot
var domains = '[a-zA-Z0-9-_.]+';
var params = '([-a-zA-Z0-9:%_\+.~#?&//=]*)';
return protocol + domains + params;
})()
}
var regex = new RegExp(`URL ${patterns.ip}:${patterns.url}`);
I am trying to create a regex which will ultimately be used with Google Forms to validate a textarea input.
The rule is,
Input area can have one or more URLs (http or https)
URLs must be separated from each other by one or more new lines
Each line which has text, must be a single valid URL
Last URL may have or may not have new line character/s after it
Till now, I have written this regex ^(https?://.+[\r\n]+)*(https?://.+[\r\n]+?)$ but the problem is that if a line has more than 1 url, it validates that too.
Here is my testing playground: http://goo.gl/YPdvBH.
Here is what you are looking for
function validate(ele) {
var str = ele.value;
str = str.replace(/\r/g, "");
while (/\s\n/.test(str)) {
str = str.replace(/\s\n/g, "\n");
}
while (/\n\n/.test(str)) {
str = str.replace(/\n\n/g, "\n");
}
ele.value = str;
str = str.replace(/\n/g, "_!_&_!_").split("_!_&_!_")
var result = [], counter = 0;
for (var i = 0; i < str.length; i++) {
str[i] = str[i].replace(/(?:(?:^|\n)\s+|\s+(?:$|\n))/g, '').replace(/\s+/g, ' ');
if(str[i].length !== 0){
if (isValidAddress(str[i])) {
result.push(str[i]);
}
counter += 1;
}
}
function isValidAddress(s) {
return /^(https?|ftp):\/\/(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$/i.test(s)
}
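// compare against the number of non-empty lines so a trailing line break does not fail validation
return (counter > 0 && result.length === counter);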
}
var ele = document.getElementById('urls');
validate(ele);
This is closer to the regex you are looking for:
^(https?://[\S]+[\r\n]+)*(https?://[\S]+[\r\n]+?)$
The difference between your regex and this one is that you use .+ which will match all characters except newline, whereas I use [\S]+ (note it is a capital S) which will match only non-whitespace characters. So this doesn't match more than one token on one line. Hence, on each line you can match at most one token, and that token must be of the form that you have defined.
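To see the difference, here is a quick comparison of the two patterns as JavaScript literals. The third pattern is my own assumed variant, not part of the answer: it uses [\r\n]* on the last group so that the final line break becomes optional, as the question asks.

```javascript
// Pattern from the question: .+ lets several URLs share one line.
var loose = /^(https?:\/\/.+[\r\n]+)*(https?:\/\/.+[\r\n]+?)$/;
// Pattern from this answer: \S+ stops at the first whitespace character.
var strict = /^(https?:\/\/\S+[\r\n]+)*(https?:\/\/\S+[\r\n]+?)$/;
// Assumed variant: [\r\n]* makes the final line break optional.
var strictOptionalNewline = /^(https?:\/\/\S+[\r\n]+)*(https?:\/\/\S+[\r\n]*)$/;

var twoOnOneLine = "http://a.com http://b.com\n";
var onePerLine = "http://a.com\nhttps://b.com\n";
var noTrailingNewline = "http://a.com\nhttps://b.com";

console.log(loose.test(twoOnOneLine));                      // true
console.log(strict.test(twoOnOneLine));                     // false
console.log(strict.test(onePerLine));                       // true
console.log(strict.test(noTrailingNewline));                // false
console.log(strictOptionalNewline.test(noTrailingNewline)); // true
```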
For a regex to match a single URL, look at this question on StackOverflow:
What is the best regular expression to check if a string is a valid URL?
I don't know whether Google Forms has a length limit for the pattern. But if it has one, a full URL regex is almost sure to run into it.
If I understand correctly, your regexp is missing the m flag for multiline matching, so you need something like this
/^(https?://.+this your reg exp for one url)$/m
A sample with the regexp from "Javascript URL validation regex":
/^(ht|f)tps?:\/\/[a-z0-9-\.]+\.[a-z]{2,4}\/?([^\s<>\#%"\,\{\}\\|\\\^\[\]`]+)?$/m
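One thing to keep in mind in plain JavaScript: with the m flag, test() returns true as soon as any single line matches, so it does not prove that every line is a URL. A sketch that checks each line separately is below; the short oneUrl pattern is only a placeholder for whichever single-URL regex you choose, and the function name is made up.

```javascript
// Placeholder single-URL pattern; swap in whichever URL regex you prefer.
var oneUrl = /^https?:\/\/\S+$/;

// Hypothetical helper: true only if every non-empty line is a URL.
function allLinesAreUrls(text) {
  var lines = text.split(/[\r\n]+/).filter(function (line) {
    return line.length > 0; // ignore the empty piece after a trailing line break
  });
  return lines.length > 0 && lines.every(function (line) {
    return oneUrl.test(line);
  });
}

var input = "not a url\nhttp://a.com";
console.log(/^https?:\/\/\S+$/m.test(input));                   // true: one matching line is enough
console.log(allLinesAreUrls(input));                            // false
console.log(allLinesAreUrls("http://a.com\nhttps://b.com\n"));  // true
```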
Here's an example string:
++++#foo+bar+baz++#yikes
I need to extract foo and only foo from there or a similar scenario.
The + and the # are the only characters I need to worry about.
However, regardless of what precedes foo, it needs to be stripped or ignored. Everything else after it needs to as well.
try this:
/\++#(\w+)/
and read capturing group 1 from the result.
You can simply use the match() method.
var str = "++++#foo+bar+baz++#yikes";
var res = str.match(/\w+/g);
console.log(res[0]); // foo
console.log(res); // ["foo", "bar", "baz", "yikes"]
Or use exec
var str = "++++#foo+bar+baz++#yikes";
var match = /(\w+)/.exec(str);
alert(match[1]); // foo
Using exec with a g modifier (global) is meant to be used in a loop getting all sub matches.
var str = "++++#foo+bar+baz++#yikes";
var re = /\w+/g;
var match;
while (match = re.exec(str)) {
// In array form, match is now your next match..
}
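For completeness, here is the same loop with a body that collects each match and where it was found. The found array is just an illustrative name.

```javascript
var str = "++++#foo+bar+baz++#yikes";
var re = /\w+/g;
var found = [];
var match;
// With the g flag, each exec() call resumes from re.lastIndex and
// returns null once there are no more matches.
while ((match = re.exec(str)) !== null) {
  found.push({ text: match[0], index: match.index });
}
console.log(found.length);   // 4
console.log(found[0].text);  // "foo"
console.log(found[0].index); // 5
```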
How exactly do + and # play a role in identifying foo? If you just want any string that follows # and is terminated by + that's as simple as:
var foostring = '++++#foo+bar+baz++#yikes';
var matches = (/\#([^+]+)\+/g).exec(foostring);
if (matches !== null) {
// exec returns null when nothing matches; matches[0] is the whole match and the captured group is in matches[1]
alert('found ' + matches[1] + '!'); // alerts 'found foo!'
}
To help you more specifically, please provide information about the possible variations of your data and how you would go about identifying the token you want to extract even in cases of differing lengths and characters.
If you are just looking for the first segment of text preceded and followed by any combination of + and #, then use:
var foostring = '++++#foo+bar+baz++#yikes';
var result = foostring.match(/[^+#]+/);
// will be the single-element array, ['foo'], or null.
Depending on your data, using \w may be too restrictive as it is equivalent to [a-zA-Z0-9_]. Does your data have anything else such as punctuation, dashes, parentheses, or any other characters that you do want to include in the match? Using the negated character class I suggest will catch every token that does not contain a + or a #.
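A small comparison, using a made-up sample string that contains a dash and parentheses, shows where the two approaches differ:

```javascript
// Made-up sample: the first token contains characters that \w does not cover.
var sample = "++#foo-bar(1)+baz";

var byWord = sample.match(/\w+/);       // stops at the first non-word character
var byNegated = sample.match(/[^+#]+/); // stops only at + or #

console.log(byWord[0]);    // "foo"
console.log(byNegated[0]); // "foo-bar(1)"
```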
How can I find if text contains a url string. I mean if I have
Sometexthttp://daasddas some text
I want http://daasddas to be anchored, i.e. marked up as a link, with JavaScript
function replaceURLWithHTMLLinks(text)
{
var exp = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
return text.replace(exp,"<a href='$1'>$1</a>");
}
While the code above works well if all given URLs are full (http://mydomain.com), I had problems parsing a URL like:
www.mydomain.com
i.e. without a protocol.
So I added some simple code to the function:
var exp = /(\b(((https?|ftp|file):\/\/)|www[.])[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
var temp = text.replace(exp,"<a href=\"$1\">$1</a>");
var result = "";
while (temp.length > 0) {
var pos = temp.indexOf("href=\"");
if (pos == -1) {
result += temp;
break;
}
result += temp.substring(0, pos + 6);
temp = temp.substring(pos + 6, temp.length);
if ((temp.indexOf("://") > 8) || (temp.indexOf("://") == -1)) {
result += "http://";
}
}
return result;
If someone should find a better solution for adding a default protocol to URLs, let me know!
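One possibly simpler option is to pass a function to replace() and decide the href per match, which avoids the second pass over the string. This is only a sketch: the name linkify is made up, the character classes are adapted from the function above, and it assumes the input text contains no HTML that needs escaping.

```javascript
// Hypothetical helper: wraps URLs in <a> tags and adds "http://" to the href
// of URLs that start with "www." instead of a protocol.
function linkify(text) {
  var exp = /\b(?:(?:https?|ftp|file):\/\/|www\.)[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/ig;
  return text.replace(exp, function (url) {
    // Only the href gets the default protocol; the visible text stays as typed.
    var href = /^www\./i.test(url) ? "http://" + url : url;
    return '<a href="' + href + '">' + url + "</a>";
  });
}

console.log(linkify("see www.mydomain.com now"));
// see <a href="http://www.mydomain.com">www.mydomain.com</a> now
```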
You have to use regex(Regular expressions) to find URL patterns in blocks of text.
Here's a link to same question and answers:
Regular Expression to find URLs in block of Text (Javascript)
I tweaked dperini's regex-url script so that a URL embedded in a string can be found. It will not find google.com; this is necessary for a user input field, because the user might leave out the whitespace after a period/full stop. It will also find www.google.com, since hardly anyone types the protocol.
(?:((?:https?|ftp):\/\/)|ww)(?:\S+(?::\S*)?#)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?
I tested it on www.regextester.com, it worked for me, if you encounter a problem, please comment.
You can use a regular expression to find a URL and replace it with the same text wrapped in a leading <a> and a trailing </a> tag.
Many of the solutions get very complex and are hard to make work in a variety of situations. Here's a function I created to capture any URL beginning with http/https/ftp/file/www. This is working like a charm for me; the only thing it doesn't add a link to is user-entered URLs without an http or www at the beginning (e.g. google.com). I hope this solution is helpful for somebody.
function convertText(txtData) {
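// 1. wrap URLs that carry a protocol
var urlRegex =/(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
txtData = txtData.replace(urlRegex, '<a href="$1">$1</a>');
// 2. wrap URLs that start with www (the match includes the leading whitespace)
urlRegex =/(\b(\swww).[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
txtData = txtData.replace(urlRegex, ' <a href="$1">$1</a>');
// 3. drop the whitespace that step 2 left inside the link text
urlRegex =/(>\swww)/ig;
txtData = txtData.replace(urlRegex, '>www');
// 4. give the href of the www links a default protocol
urlRegex =/(\"\swww)/ig;
txtData = txtData.replace(urlRegex, '"http://www');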
return txtData;
}
function replaceURLWithHTMLLinksHere(text)
{
var exp = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
return text.replace(exp,"<a href='$1'>$1</a>");
}
Okay, we have this regular expression in the function:
/(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig
Let's understand it.
/ / > the slashes delimit the regex literal; the pattern goes between them.
\b > a word boundary. It makes sure https, ftp or file starts a word, so the keyword must not have a word character attached in front of it: bbhttps or bbhttp will not match.
https? > here ? means zero or one of the preceding character or group. In this case the s is optional.
| > matches one of the given alternatives, just like OR.
() > creates a group to be matched.
\ > means the next character is special and is not to be interpreted literally. For example, a 'b' without a preceding '\' matches lowercase 'b's wherever they occur, but '\b' by itself doesn't match any character; it matches a word boundary. Before a / it works the other way round: \/ is a literal slash rather than the end of the regex.
[] > a character class (character set). It lists a group of characters, and exactly one character out of the set is matched at a time.
[-A-Z0-9+&@#\/%?=~_|!:,.;]* > the * means zero or more occurrences of the preceding element. For example, b*c matches "c", "bc", "bbc", "bbbc", and so on.
[-A-Z0-9+&@#\/%=~_|] > means exactly one character out of this set.
i > case-insensitive search.
g > global search.
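A few quick checks make the \b, ? and flag behaviour visible. The pattern below is a simplified stand-in for illustration only: \S+ replaces the two character classes.

```javascript
// Simplified pattern for illustration only: \S+ stands in for the two character classes.
var exp = /\b(https?|ftp|file):\/\/\S+/ig;

// \b: no word boundary between "b" and "h", so nothing matches.
console.log("bbhttp://x.com".match(exp)); // null

// i and g: case is ignored and every match is returned.
console.log("HTTPS://A.com and ftp://b.org".match(exp)); // ["HTTPS://A.com", "ftp://b.org"]

// Without g, match() returns only the first match plus its capturing groups.
var first = "http://a.com ftp://b.org".match(/\b(https?|ftp|file):\/\/\S+/i);
console.log(first[0]); // "http://a.com"
console.log(first[1]); // "http"
```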
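function replaceURLWithLinks(text){
// work on the argument itself; re-declaring text as "" would throw the input away
text = text.replace(/\r?\n/g, '<br />');
// URI.withinString comes from the URI.js library
var result = URI.withinString(text, function(url) {
return "<a href='"+url+"' target='_blank'>" + url + "</a>";
});
return result;
}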