Only match regex if it doesn't start with a pattern in JavaScript - javascript

I have a bit of a strange one here. I have a large chunk of text which may or may not contain links to images.
Say it does: I have a pattern that extracts the image URL fine, but once a match is found it is replaced with an <img> element that has the link as its src. The problem is that there may be multiple matches within the text, and this is where it gets tricky: the URL pattern now also matches the URL inside the src attribute, so the search ends up in an infinite loop.
So is there a way to ONLY match in regex if the match doesn't start with a pattern like ="|=' ? That way it would match the URL in something like:
some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6
but not
some image <img src="http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6">
I am not sure if it is possible, but if it is, could someone point me in the right direction? A replace by itself will not suffice in this scenario, as the matched URL needs to be used elsewhere too, so it has to be available as a capture.
The main scenarios I need to account for are:
Many links in one block of varied text
A single link without any other text
A single link with other varied text
== edit ==
Here is the current regex I am using to match urls:
(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))
== edit 2 ==
Just so everyone understands why I cannot use the /g flag here: the following answer explains the issue. If I could use /g like I originally tried, it would make things a lot simpler.
Javascript regex multiple captures again

What you are looking for is a negative lookbehind. JavaScript engines that predate ES2018 don't support lookbehind of any kind, so there you will either have to use a callback function that checks what was matched and makes sure it is not preceded by a ' or ", or you can use the following regex:
(?:^|[^"'])(\b(https?|ftp|file):\/\/[-a-zA-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))
It has a single drawback: on a successful match the overall match includes one extra character, the one right before the (\b(https?|ftp|file) part in the input. The URL itself is still available in capture group 1, so this is easy to deal with.
Regex101 Demo
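As a rough sketch of how the extra leading character can be handled, the prefix can be turned into a capture group and handed back from a replace callback. The helper name wrapBareImageUrls and the shape of the returned object are made up for illustration, and the character class assumes the usual @# pair:

```javascript
// Hypothetical helper: wraps bare image URLs in <img> tags and also returns
// the URLs so they can be used elsewhere. URLs preceded by a quote are skipped.
function wrapBareImageUrls(text) {
  var re = /(^|[^"'])(\b(?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig;
  var urls = [];
  var html = text.replace(re, function (match, prefix, url) {
    urls.push(url);
    // Put the captured leading character back in front of the new tag.
    return prefix + '<img src="' + url + '">';
  });
  return { html: html, urls: urls };
}
```

Because a URL directly after a quote never matches, running the function again over its own output should leave the already wrapped URLs alone.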

Using the /ig command at the end should work... the g is for global replace and the i is for case-insensitivity, which is necessary as you've only got A-Z instead of a-zA-Z.
Using the following vanilla JS appears to work for me (see jsfiddle)...
var test="some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6";
var re = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig;
document.getElementById("output").innerHTML = test.replace(re,"<img src=\"$1\"/>");
What it does highlight, though, is that the query string part of the URL (the ?v=6) is not being picked up by your regex.
For jQuery, it would be (see jsfiddle)...
$(document).ready(function () {
    var test = "some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6";
    var re = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig;
    $("#output").html(test.replace(re, "<img src=\"$1\"/>"));
});
Update
Just in case my example of using the same image URL in the example doesn't convince you - it also works with different URLs... see this jsfiddle update
var test="http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 http://cdn.sstatic.net/serverfault/img/sprites.png?v=7";
var re = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig;
document.getElementById("output").innerHTML = test.replace(re,"<img src=\"$1\"/>");
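On the query-string remark: one way it might be picked up is to take "?" out of the path character class and append an optional query group after the image extension. This is only a sketch; the names imageUrlWithQuery and toImgTags are illustrative, and the @ in the class is an assumption about the intended pattern:

```javascript
// Same idea as the pattern above, but the path class no longer contains "?"
// and an optional "?..." group follows the image extension.
var imageUrlWithQuery = /(\b(?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp)(?:\?[-A-Z0-9+&@#\/%=~_|!:,.;]*)?)/ig;

// Hypothetical helper: replaces each image URL with an <img> tag.
function toImgTags(text) {
  return text.replace(imageUrlWithQuery, '<img src="$1"/>');
}
```

With this variant the ?v=6 suffix should end up inside the src attribute instead of trailing after the tag.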

Couldn't you just check for whitespace in front of the URL instead of that word boundary? It seems to work, although you will have to remove the matched whitespace afterwards.
(\s(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))
http://rubular.com/r/9wSc0HNWas
Edit: Damn, too slow :) I'll still leave this here as my regex is shorter ;)

As freefaller said, you can use the /g flag to find all matches in one go, if exec is not a must.
Otherwise, you can add (="|=')? to the beginning of your regex and check whether $1 is undefined. If it is undefined, the match was not preceded by a =" or =' pattern.
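A minimal sketch of that optional-prefix idea with an exec loop follows. The function name findBareImageUrls is hypothetical, and the URL class again assumes the usual @# pair:

```javascript
// Collects image URLs that are not preceded by =" or ='.
function findBareImageUrls(text) {
  var re = /(="|=')?(\b(?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig;
  var found = [];
  var m;
  while ((m = re.exec(text)) !== null) {
    // Group 1 is undefined when the optional prefix did not take part.
    if (m[1] === undefined) {
      found.push(m[2]);
    }
  }
  return found;
}
```

The prefixed URLs are still consumed by the loop, which is what keeps the scan from stopping on them again; they are simply not reported.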

Related

regex to match all keywords in a string

Being a noob at regex, I need some support from the community.
Let's say I have this string str:
1. www.anysite.com hello demo try this link
2. anysite.com indeed demo link
3. http://www.anysite.com another one
4. www.anysite.com
5. http://anysite.com
Consider 1-5 together as the whole string str here.
I want to convert all occurrences of 'anysite.com' into clickable HTML links, for which I am using:
str = str.replace(/((http|https|ftp):\/\/[\w?=&.\/-;#~%-]+(?![\w\s?&.\/;#~%"=-]*>))/g, '$1');
This converts every space-separated word starting with http/https/ftp into a link that wraps the URL.
So lines 3 and 5 are converted correctly. To convert every www.anysite.com into a link as well, I then used
str = str.replace(/(\b^(http|https|ftp)?(www\.)[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig, '$1');
However, it only converts www.anysite.com into a link when it is found at the very beginning of str, so it converts line 1 but not line 4.
Note that I used ^(http|https|ftp)?(www\.) to find every www not starting with http/https/ftp, since the http ones have already been converted.
Also, what would the regex be for the link on line 2, which starts with neither http nor www but simply ends with .com?
For reference, you can try posting this whole string to your Facebook timeline: it converts all five lines into links.
Thanks for help, the final RegEx that helped me is:
//remove all http:// and https://
str = str.replace(/(http|https):\/\//ig, "");
//replace all string ending with .com or .in only into link
str = str.replace(/((www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.(com|in))/ig, '$1');
I used .com and .in for my specific requirement; otherwise the solution at http://regexr.com/39i0i will work.
There is still an issue, though: it doesn't convert shortened URLs into links properly, e.g. http://s.ly/qhdfTyuiOP is only linked up to s.ly.
Any further suggestions?
^(http|https|ftp)?(www\.) does not mean "all www not starting with http/https/ftp" but rather "a string that starts with an optional http/https/ftp followed by www.".
Indeed, ^ in this context isn't a negation but an anchor representing the start of the string. I suppose you used it this way because of its meaning inside a character class ([^...]); it is rather tricky, since its meaning changes depending on the context it is found in.
You could just remove it and you should be fine, as I see no point in making sure the string does not start with http/https/ftp (you transformed those occurrences just before, so there should be none left).
Edit: I mentioned lookbehind but forgot it was not available in JS at the time (it only arrived with ES2018).
If you wanted to express a real negation, the easiest way would be a negative lookbehind:
(?<!(?:https?|ftp):\/\/)www\.
This matches "www." only when it is not directly preceded by http://, https:// or ftp://.
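For engines that do implement ES2018 lookbehind, a sketch along these lines might link only the bare www hosts. The name linkBareWww, the http:// scheme put into the href, and the @ in the character class are all assumptions made for the example:

```javascript
// Requires lookbehind support (ES2018). Matches "www." hosts that are not
// directly preceded by http://, https:// or ftp://.
var bareWww = /(?<!(?:https?|ftp):\/\/)\bwww\.[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/ig;

// Hypothetical helper: wraps each bare www host in an anchor element.
function linkBareWww(str) {
  return str.replace(bareWww, function (host) {
    return '<a href="http://' + host + '">' + host + '</a>';
  });
}
```

Note that the lookbehind has to include the :// part, because that is what sits directly in front of www. in a full URL.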

Javascript Regular Expression for non-image url

In JavaScript, I want to extract a non-image url from a string e.g.
http://example.com
http://example.com/a.png
http://www.example.ccom/acd.php
http://www.example.com/b.jpg etc.
I would like to extract 1st and 3rd (non-image) URLs and ignore 2nd and 4th (image) URLs.
I tried the following which did not work
(https?:)?\/\/?[^\'"<>]+?^(\.(jpe?g|gif|png))
It is a modification of the following image URL regular expression (RE), to which I added ^() (meaning "not") to get the snippet above:
(https?:)?//?[^\'"<>]+?\.(jpg|jpeg|gif|png)
Note: the REs in the examples above are case-sensitive; any clue on how to make them case-insensitive would also be welcome.
You can use a negative lookahead, which excludes anything containing the given string.
Assuming your URLs are newline-delimited as in your example, something like this should work:
(?!.*(jpg|jpeg|gif|png).*).*
EDIT: it looks like my example doesn't work, but hopefully it at least points you in the right direction.
Another option is to first remove the image URLs:
var tmp = text.replace(/https?:\/\/\S+\.(png|jpeg|jpg|gif)/gi, '');
and then match the URLs that remain:
var m = tmp.match(/https?:\/\/\S+/gi);
console.log(m);
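The negative-lookahead idea can probably also be made to work in a single pass if the lookahead sits right after the scheme and only rejects URLs whose last segment ends in an image extension. A sketch, with the hypothetical name extractNonImageUrls and the assumption that URLs are delimited by whitespace:

```javascript
// Returns every http(s) URL that does not end in .png, .jpg, .jpeg or .gif.
function extractNonImageUrls(text) {
  // The lookahead fails when the run of non-space characters after the
  // scheme ends with an image extension followed by whitespace or the end
  // of the input.
  var re = /https?:\/\/(?!\S*\.(?:png|jpe?g|gif)(?:\s|$))\S+/gi;
  return text.match(re) || [];
}
```

The i flag takes care of case-insensitivity, and the g flag makes match return all URLs rather than just the first.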

javascript regex replace some words with links, but not within existing links

Trying to replace certain words in HTML pages with the same word but as a URL linking to that resource.
For example, replace the word 'MySQL' with a link to mysql.com whose link text is MySQL.
Using the JS replace function with regex, and it's doing the replacing just fine.
BUT it's also replacing words that are already part of URLs... which is the problem.
For the MySQL example, it's replacing BOTH the "MySQL" text that's already linked, AND the URL leading to mysql.com, so breaking the already existing link.
Is there a way to update the inline regex (in the .replace call) to NOT do any replacing inside existing links, i.e. <a> elements?
Here's the replace code:
var NewHTML = OriginalHTML
.replace(/\bJavaScript\b/gi, "$&")
.replace(/\bMySQL\b/gi, "$&")
;
Here's the full sample code (tried to paste it inline but wasn't looking right with the backticks):
http://pastie.org/private/v4l2s2c42aqduqlopurpw
I went through the JS regexp reference and tried various other permutations of the regex, like the following, but all that does is make it not match ANY words on the page...
.replace(/\b(\<a\>*!\>)JavaScript\b/i,xxxxx
The following regex DOES prevent the match from happening wherever the word is literally touching a slash or a dash... but that's not the solution (and it does not fix the mysql example above):
.replace(/\b(?!\>)(?!\-)(?!\/)MySQL\b(?!\-)(?!\/)/gi, "$&")
I've read through the related threads on stackoverflow and elsewhere, but can't seem to find this particular scenario, not in JavaScript anyway.
Any help would be greatly appreciated. :-)
Thanks!
You could change your regex to exclude keywords that precede the end anchor tag, </a>:
.replace(/\bMySQL\b(?![^<]*?<\/a>)/gi, "$&")
See jsfiddle for example.
A negative lookahead should be sufficient:
.replace(/\bMySQL(?!\.com)\b/gi, "$&")
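Putting the </a> lookahead to work, a small helper could look roughly like this. The name linkKeyword and its parameters are invented for the example, and the second lookahead, which skips matches sitting inside a tag's attributes, is an addition of this sketch rather than part of the answers above. Regex-based HTML rewriting like this assumes simple, non-nested markup:

```javascript
// Hypothetical helper: links every occurrence of `word` that is neither
// inside an existing <a>...</a> nor inside a tag such as <img alt="...">.
function linkKeyword(html, word, url) {
  // Escape regex metacharacters so the keyword is matched literally.
  var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  var re = new RegExp('\\b' + escaped + '\\b(?![^<]*?<\\/a>)(?![^<>]*>)', 'gi');
  return html.replace(re, function (match) {
    // Keep the original casing of the matched word as the link text.
    return '<a href="' + url + '">' + match + '</a>';
  });
}
```

For instance, linkKeyword(OriginalHTML, 'MySQL', 'http://www.mysql.com') should leave an already linked MySQL, including the mysql in its href, untouched.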

Need a regex for acceptable file names

I'm using Fancy Upload 3 and onSelect of a file I need to run a check to make sure the user doesn't have any bad characters in the filename. I'm currently getting people uploading files with hieroglyphics and such in the names.
What I need is to check if the filename only contains:
A-Z
a-z
0-9
_ (underscore)
- (minus)
SPACE
ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜäëïöü (as single and double byte)
Obviously you can see the difficult thing there. The non-english single and double byte chars.
I've seen this:
[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]
And this:
[\x80-\xA5]
But neither of them fully cover the situation right.
Examples that should work:
fást.zip
abc.zip
ABC.zip
Über.zip
Examples that should NOT work:
∑∑ø∆.zip
¡wow!.zip
•§ªº¶.zip
The following is close, but I'm NO RegEx'pert, not even close.
var filenameReg = /^[A-Za-z0-9-_]|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF]+$/;
Thanks in advance.
Solution from Zafer mostly works, but it does not catch all of the other symbols, see below.
Uncaught:
¡£¢§¶ª«ø¨¥®´åß©¬æ÷µç
Caught:
™∞•–≠'"πˆ†∑œ∂ƒ˙∆˚…≥≤˜∫√≈Ω
Regex:
var filenameReg = /^([A-Za-z0-9\-_. ]|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF])+$/;
Alternation between two character classes (ie. [abc]|[def]) can be simplified to a single character class ([abcdef]) -- the first can be read as "(a or b or c) OR (d or e or f)"; the second as "(a or b or c or d or e or f)". What probably tripped up your regular expression is the unescaped dash in the first class -- if you want a literal dash, it should be the last character in the class.
So we'll modify your expression to get it working:
var filenameReg = /^[A-Za-z0-9_\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF-]+$/;
The problem now is that you're not accounting for the file extension, but that is an easy modification (assuming you're always getting .zip files):
var filenameReg = /^[A-Za-z0-9_\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF-]+\.zip$/;
Replace zip with another pattern if the extension differs.
It looks like it is the character ranges that are causing the problem, because they include some unallowable characters in between. Since you already have the list of allowable characters, the best thing would be to just use that directly:
var filenameReg = /^[A-Za-z0-9_\-\ ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜäëïöü]+$/;
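A sketch of that whitelist approach, extended with an extension check, is shown below. It also hints at why the \x ranges misbehave: \x consumes exactly two hex digits. The names ALLOWED_ACCENTS and isAcceptableFilename are illustrative, the extension rule is an assumption, and normalizing to NFC is only one possible reading of the "single and double byte" requirement (precomposed versus decomposed accents):

```javascript
// Note: \x takes exactly two hex digits, so [\x00A0] means NUL, "A" or "0";
// a four-digit code point has to be written as \u00A0 instead.
var ALLOWED_ACCENTS = 'ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜäëïöü';

// Base name from the whitelist, then a dot and an alphanumeric extension.
// The dash goes last in the class so it is read as a literal.
var filenameReg = new RegExp('^[A-Za-z0-9_ ' + ALLOWED_ACCENTS + '-]+\\.[A-Za-z0-9]+$');

function isAcceptableFilename(name) {
  // NFC folds a base letter plus combining accent into one precomposed
  // character, so both spellings of "á" are treated alike.
  return filenameReg.test(name.normalize('NFC'));
}
```

Listing the characters explicitly avoids accepting whatever symbols happen to fall inside a broad code point range.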
The following should work:
var filenameReg = /^([A-Za-z0-9\-_. ]|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF])+$/;
I've put a \ in front of the - and grouped the two expressions, because otherwise the + quantifier doesn't apply to the first expression.
EDIT 1: I've also put . in the expression.
We have different rules for different platforms, but I think you mean long file names on Windows. For that you can use the following regex (written as a C# verbatim string):
var longFilenames = @"^[^\./:*\?\""<>\|]{1}[^\/:*\?\""<>\|]{0,254}$";
NOTE: Instead of saying which characters are allowed, you say which ones are not allowed!
But keep in mind that this is not a 100% complete regex. If you really want to make it complete, you have to add exceptions for reserved names as well.
You can find more information about filename rules here:
http://msdn.microsoft.com/en-us/library/aa365247%28VS.85%29.aspx

Building a Hashtag in Javascript without matching Anchor Names, BBCode or Escaped Characters

I would like to convert any instances of a hashtag in a String into a linked URL:
#hashtag -> should have "#hashtag" linked.
This is a #hashtag -> should have "#hashtag" linked.
This is a [url=http://www.mysite.com/#name]named anchor[/url] -> should not be linked.
This isn&#39;t a pretty way to use quotes -> should not be linked.
Here is my current code:
String.prototype.parseHashtag = function() {
return this.replace(/[^&][#]+[A-Za-z0-9-_]+(?!])/, function(t) {
var tag = t.replace("#","")
return t.link("http://www.mysite.com/tag/"+tag);
});
};
Currently, this appears to handle escaped characters (by excluding matches preceded by an ampersand) and named anchors, but it doesn't link the #hashtag if it's the first thing in the message, and it seems to include the 1-2 characters before the "#" in the link.
Halp!
How about the following:
/(^|[^&])#([A-Za-z0-9_-]+)(?![A-Za-z0-9_\]-])/g
matches the hashtags in your example. Since JavaScript doesn't support lookbehind, it tries to either match the start of the string or any character except & before the hashtag. It captures the latter so it can later be replaced. It also captures the name of the hashtag.
So, for example:
subject.replace(/(^|[^&])#([A-Za-z0-9_-]+)(?![A-Za-z0-9_\]-])/g, "$1http://www.mysite.com/tag/$2");
will transform
#hashtag
This is a #hashtag and this one #too.
This is a [url=http://www.mysite.com/#name]named anchor[/url]
This isn&#39;t a pretty way to use quotes
into
http://www.mysite.com/tag/hashtag
This is a http://www.mysite.com/tag/hashtag and this one http://www.mysite.com/tag/too.
This is a [url=http://www.mysite.com/#name]named anchor[/url]
This isn&#39;t a pretty way to use quotes
This probably isn't what t.link() (which I don't know) would have returned, but I hope it's a good starting point.
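If an actual anchor element is wanted, the same pattern can be used with a replace callback. The following is only a sketch: the function name linkHashtags is made up, and the tag URL base is taken from the question:

```javascript
// Links hashtags that are at the start of the string or preceded by any
// character other than "&", and that are not followed by "]" or more tag
// characters.
function linkHashtags(text) {
  var re = /(^|[^&])#([A-Za-z0-9_-]+)(?![A-Za-z0-9_\]-])/g;
  return text.replace(re, function (match, before, tag) {
    // Re-emit the captured leading character so it is not swallowed.
    return before + '<a href="http://www.mysite.com/tag/' + tag + '">#' + tag + '</a>';
  });
}
```

Re-emitting the first capture group is what keeps the character before the "#" out of the link.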
There is an open-source Ruby gem to do this sort of thing (hashtags and @usernames) called twitter-text. You might get some ideas and regexes from that, or try out this JavaScript port.
Using the JavaScript port, you'll want to just do:
var linked = TwitterText.auto_link_hashtags(text, {hashtag_url_base: "http://www.mysite.com/tag/"});
Tim, your solution was almost perfect. Here's what I ended up using:
subject.replace(/(^| )#([A-Za-z0-9_-]+)(?![A-Za-z0-9_\]-])/g, "$1#$2");
The only change is the first conditional, changed it to match the beginning of the string or a space character. (I tried \s, but that didn't work at all.)
