Regex to replace only links without replacing the src attributes - javascript

I'm trying to replace the URLs in a block of text with clickable links while rendering.
The regex I am using:
/(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig
Example
This is the text i got from http://www.sample.com
it should be converted to
This is the text i got from
http://www.sample.com
The problem is that when the text contains an img tag, the src attribute also gets replaced, which I don't want.
Kindly help me replace only direct links, not the links in src="" attributes.
Thanks

Add a negative look-behind assertion at the beginning of your regex, to search only for strings not after src=":
(?<!src=")
Edit: Unfortunately, look-behind assertions are not supported by older JavaScript engines (they were only added in ES2018). Alternatively, you can use a negative look-ahead assertion like this:
((?!src=").{0,4})
Remember that you need to include the matched string in the replacement (otherwise you would delete up to 4 characters before http://).
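A different way to get the same effect, sketched here as an alternative rather than as the look-ahead approach above: match whole tags and URLs with one alternation, and only rewrite the URL branch in a replacement callback. The helper name linkify and the anchor markup it produces are assumptions for illustration.

```javascript
// Hypothetical helper: wraps bare URLs in anchors, but leaves anything
// inside a tag (such as src="..." or href="...") untouched.
function linkify(html) {
  // Branch 1 captures a whole tag; branch 2 captures a URL in text.
  const pattern = /(<[^>]*>)|(\b(?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gi;
  return html.replace(pattern, (match, tag, url) =>
    // If the tag branch matched, return it unchanged; otherwise link the URL.
    tag ? tag : `<a href="${url}">${url}</a>`
  );
}

console.log(linkify('<img src="http://x.com/a.png"> see http://y.com'));
```

Because tags are consumed first, a URL inside an attribute never reaches the URL branch. URLs that are already the visible text of an existing anchor would still be wrapped, so this sketch assumes the input has no such anchors.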

Related

replace \n from tag attribute using javascript regex with empty character

I have a tag like <span style="font-size:10.5pt;\nfont-family:\nKaiTi"> and I want to replace \n within the tag with an empty string.
Note: The tag could be anything (not fixed).
I want a regex to do this in JavaScript.
You should be able to strip out the \n character before applying this HTML to the page.
Having said that, try this (\\n)
You can see it here: regex101
Edit: A bit of refinement and I have this (\W\\n). It works with the example you provided. It breaks down if you have spaces in the body of the tags (<span> \n </span>).
I've tried everything I know to do. Perhaps someone with more regex experience can assist?
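One way to restrict the replacement to the inside of tags is a two-step replace: first match each tag, then strip \n only within that match. The sketch below assumes the \n may be either a literal backslash followed by n or a real line break, since the question does not say which; the helper name is hypothetical.

```javascript
// Hypothetical helper: removes "\n" sequences that occur inside tags,
// leaving text between tags untouched.
function stripNewlinesInTags(html) {
  return html.replace(/<[^>]*>/g, (tag) =>
    // Matches a literal backslash + "n", or an actual line break.
    tag.replace(/\\n|\r?\n/g, '')
  );
}

const sample = '<span style="font-size:10.5pt;\\nfont-family:\\nKaiTi"> a \\n b </span>';
console.log(stripNewlinesInTags(sample));
```

This avoids the problem with spaces in the tag body mentioned above, because the inner replace never sees the text between tags. If a real line break separates two attributes, replacing it with a space instead of an empty string would be safer.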

Regexp to catch links not in anchor tag

I'm trying to make a regexp in JavaScript to catch all links in a text, except ones inside anchor tags (neither in the href attribute nor in the inner text).
For example, the following should match:
http://google.com
However, nothing should match in the following:
Link
http://google.com
I've found this post on StackOverflow, but it requires lookbehind, which is not supported by JavaScript.
Try:
(ht|f)tps?:\/\/[^"]*?(?=<|\s|$)
OhAuth's answer (ht|f)tps?:\/\/[^"]*?(?=<|\s|$) uses the fact that the actual link in an anchor tag is followed by a double quote ("), meaning neither lookbehind nor its workarounds are necessary.
EDIT:
Using only lookaheads, we can achieve something like this: (ht|f)tps?:\/\/[^\"<]*?(?=\s|$|<\/[^a]>), which results in this: https://regex101.com/r/eR3mT4/1. It fails on an anchor title that contains a link and additional characters. That situation seems difficult for regex, and lookbehinds wouldn't help.
Check this:
https://stackoverflow.com/a/35603748/2943191
((https?|ftps?):\/\/[^"<\s]+)(?![^<>]*>|[^"]*?<\/a)
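A quick way to try the last pattern is to run it over a few inputs with matchAll. In this sketch the helper name and the sample anchor markup are assumptions, since the question's own examples are not shown in full.

```javascript
// The pattern from the linked answer, with the global flag added.
const bareLinkPattern = /((https?|ftps?):\/\/[^"<\s]+)(?![^<>]*>|[^"]*?<\/a)/g;

// Hypothetical helper: returns URLs that are not inside an anchor's
// href attribute or inner text.
function findBareLinks(text) {
  return Array.from(text.matchAll(bareLinkPattern), (m) => m[1]);
}

console.log(findBareLinks('Visit http://google.com today'));
console.log(findBareLinks('<a href="http://google.com">Link</a>'));
console.log(findBareLinks('<a href="#">http://google.com</a>'));
```

The look-ahead rejects a URL when the rest of the current tag follows it (the href case) or when a closing </a> follows before any double quote (the inner text case). It is still a heuristic: a bare link followed by an unquoted anchor later in the text can be rejected by mistake.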

Check for HTML tags

I'm trying to write a regular expression to find out if there are any HTML tags. So far I have:
/^[^<>]+$/
It's for a validator: if no HTML tags exist, the input validates.
You can use the adapted version of the regex in this SO post:
^((?!<[^<]+>)[\s\S])*$
See demo.
Perhaps, you can further enhance it to only match if the first character after < is a letter:
^((?!<[a-zA-Z][^<]*>)[\s\S])*$
See another demo
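As a rough sketch of how the second pattern could be used in a validator (function names are hypothetical), the whole-string match can be wrapped in a helper. A shorter form with the same intent is to search for a tag-like substring and negate the result.

```javascript
// Pattern from above: no position in the string may start a "<letter...>" tag.
const noTagsPattern = /^((?!<[a-zA-Z][^<]*>)[\s\S])*$/;

// Hypothetical validator: true when the input contains no tag-like text.
function hasNoHtmlTags(input) {
  return noTagsPattern.test(input);
}

// Same intent without a look-ahead at every character.
function hasNoHtmlTagsSimple(input) {
  return !/<[a-zA-Z][^<]*>/.test(input);
}

console.log(hasNoHtmlTags('a < b and c > d'));
console.log(hasNoHtmlTags('hello <b>bold</b>'));
```

Requiring a letter after < lets comparisons such as a < b pass, which the original /^[^<>]+$/ would reject.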
You can use a regex like this:
.*<\/?.*?>.*
Working demo
The idea is to find strings with tags like <tag>, <tag withAttribute="something"> or </closingTag>
Update: as Mr. Llama pointed out in his comment, you could enable the s flag so that . matches any character, including line breaks. This will help you with multi-line strings.
(?s).*<\/?.*?>.*
^--- use this for inline single line flag or use the first regex but enable the `s` flag
Working demo

jQuery match first letter in a string and wrap with span tag

I'm trying to get the first letter in a paragraph and wrap it with a <span> tag. Notice I said letter and not character, as I'm dealing with messy markup that often has blank spaces.
Existing markup (which I can't edit):
<p> Actual text starts after a few blank spaces.</p>
Desired result:
<p> <span class="big-cap">A</span>ctual text starts after a few blank spaces.</p>
How do I ignore anything but /[a-zA-Z]/ ? Any help would be greatly appreciated.
$('p').html(function (i, html) {
    return html.replace(/^[^a-zA-Z]*([a-zA-Z])/g, '<span class="big-cap">$1</span>');
});
Demo: http://jsfiddle.net/mattball/t3DNY/
I would vote against using JS for this task. It'll make your page slower, and using JS for presentation purposes is bad practice.
Instead, I suggest using the :first-letter pseudo-element to assign additional styles to the first letter of the paragraph. Here is the demo: http://jsfiddle.net/e4XY2/. It should work in all modern browsers except IE7.
Matt Ball's solution is good, but if your paragraph has an image, markup, or quotes, the regex will not just fail but break the HTML,
for instance
<p><strong>Important</strong></p>
or
<p>"Important"</p>
You can avoid breaking the HTML in these cases by adding "'< to the excluded initial characters, though in these cases no span will be wrapped around the first character.
return html.replace(/^[^a-zA-Z'"<]*([a-zA-Z])/g, '<span class="big-cap">$1</span>');
Optimally, I think you may wish to wrap the first character after a ' or ".
I would, however, consider it best not to wrap the character if it is already inside markup, but that probably requires a second replace pass.
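A minimal sketch of that idea, assuming a plain string helper (the name wrapFirstLetter is hypothetical): capture the leading non-letters so they are kept in the output, allow quotes in that prefix, and refuse to match when markup or an entity comes first.

```javascript
// Hypothetical helper: wraps the first letter in a span, keeping any
// leading whitespace or quotes, and leaving the string unchanged when
// it starts with markup ("<") or an entity ("&").
function wrapFirstLetter(html) {
  return html.replace(
    /^([^a-zA-Z<&]*)([a-zA-Z])/,
    '$1<span class="big-cap">$2</span>'
  );
}

console.log(wrapFirstLetter(' Actual text starts after a few blank spaces.'));
console.log(wrapFirstLetter('<strong>Important</strong>'));

// With jQuery it could be applied as:
// $('p').html(function (i, html) { return wrapFirstLetter(html); });
```

Unlike the original replace, this keeps the leading spaces in the output, which matches the desired result in the question.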
I do not seem to have permission to reply to an answer, so forgive me for doing it like this. The answer given by Matt Ball will not work if the P contains another element as its first child. Go to the fiddle and add an IMG (very common) as the first child of the P, and the I from Img will turn into a drop cap.
If you use the x flag (JavaScript's native RegExp does not support it), you can have the engine ignore whitespace in the pattern. Then use something like this:
/^([a-zA-Z]).*$/
You know what format your first character should be, and it should grab only that character into a group. If you could have other characters other than whitespace before your first letter, maybe something like this:
/.*?([a-zA-Z]).*/
This optionally consumes other characters first, and then captures the first letter into a group, which you could then wrap in a span tag.

Using negative lookahead multiple times (or matching multiple characters with ^)?

I want to do something like this:
/<script[^>]*>(?!<\/script>)*<\/script>/g
to match all script tags in an HTML string using JavaScript.
I know this won't work, but I can't seem to find any other solution.
The script tag can either use the src attribute and close itself right after (<script src="..." type="text/javascript"></script>) or contain the code within the script tag (<script type="text/javascript">...</script>)
You were close
/<script[^>]*>(?:(?!<\/script>).)*<\/script>/g
You must have something to eat the actual script body. That's what the . does here.
The look-ahead check must occur before every character, so it is wrapped in an extra (non-capturing) group. To capture the script source code in group 1, just add another set of parens around the (?:...), as @AlanMoore pointed out in the comments.
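Here is a small sketch of that capturing variant. It makes one change that is not in the answer: the . is swapped for [\s\S], because . does not match line breaks in JavaScript unless the s flag is set, and script bodies usually span several lines. The helper name is hypothetical.

```javascript
// Tempered pattern with the body captured in group 1.
// [\s\S] is used instead of "." so multi-line scripts are matched too.
const scriptPattern = /<script[^>]*>((?:(?!<\/script>)[\s\S])*)<\/script>/g;

// Hypothetical helper: returns the body of every script tag.
function extractScripts(html) {
  return Array.from(html.matchAll(scriptPattern), (m) => m[1]);
}

const page =
  '<script src="a.js" type="text/javascript"></script><p>x</p>' +
  '<script type="text/javascript">var a = 1;\nvar b = 2;</script>';
console.log(extractScripts(page));
```

A tag that only has a src attribute yields an empty string, and an inline script yields its source text. A lazy [\s\S]*? in place of the tempered group should give the same matches here.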
Try this
/<script[^>]*>.*?<\/script>/g
I don't see a reason for a negative look ahead. .*? is a lazy match so that it only matches till the next closing tag and not till the last one.
