JavaScript RegExp replace HTML comments

I'm looking for a way to remove all HTML comments from a string the way a browser does, including multi-line and unclosed comments.
For example, I currently use /(<\!--[\s\S]*?-->)/gim, but if the HTML comment is unclosed, it is not replaced.
Normally, if the comment is never closed, it swallows everything after the opening <!--.
Is there a way to adapt this regexp (or use another one) to handle that in JavaScript?

This matches every comment, including one without an end tag, where comments look like <!-- some text -->:
<!--[\s\S]*?(?:-->|$)
This matches every comment, including one without an end tag, where comments look like <!-- some text //-->:
<!--[\s\S]*?(?://-->|$)
This matches everything from the first <!-- to the very end of the file:
<!--[\s\S]*?(?:$) with the regex set so that `^$ don't match at line breaks`
This matches everything from the first <!-- to the end of the line:
<!--.*

I must agree that using regex like this is not good practice and you shouldn't do it... here's why.
Buuuut, as a matter of understanding regex better, you can make something optional like this:
/(<\!--[\s\S]*?(?:-->)?)/gim
I wrapped --> in parentheses to group it together.
I put a ? after that group to make it optional.
(Not necessary) I put ?: inside the group to keep the regex engine from saving a back reference... it's a performance nuance.
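
One thing worth checking before relying on the optional group: because [\s\S]*? is lazy and the closing group is optional, the engine can stop as soon as it has matched <!--. The sketch below compares that pattern with the alternation (?:-->|$); the sample string is made up for illustration.

```javascript
// Hypothetical sample input, not taken from the question.
const sample = 'a<!-- x -->b';

// Optional closing group: the lazy [\s\S]*? is never forced to expand,
// so each match is just the opening "<!--".
const optionalClose = sample.replace(/<!--[\s\S]*?(?:-->)?/g, '');

// Alternation: the match has to end at "-->" or at the end of the input.
const closeOrEnd = sample.replace(/<!--[\s\S]*?(?:-->|$)/g, '');

console.log(JSON.stringify(optionalClose)); // "a x -->b"
console.log(JSON.stringify(closeOrEnd));    // "ab"
```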

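Thanks to @Andie2302 for the help.
This regexp works fine: /<!--[\s\S]*?(?:-->|$)/gi
Do not use the m flag! With it, $ also matches at the end of each line, so an unclosed comment is only removed up to the first line break.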
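
A minimal sketch of how this could be wrapped in a helper, and of what goes wrong with the m flag. The function name stripComments and the sample input are assumptions made for illustration.

```javascript
// Removes closed comments and, if a comment is never closed,
// everything from its "<!--" to the end of the string.
function stripComments(html) {
  return html.replace(/<!--[\s\S]*?(?:-->|$)/gi, '');
}

// Hypothetical input: one closed comment, one unclosed multi-line comment.
const input = 'a<!-- x -->b<!-- open\nmore';

const withoutM = stripComments(input);

// Same pattern with the m flag: "$" now also matches before "\n",
// so the unclosed comment is cut off at the first line break.
const withM = input.replace(/<!--[\s\S]*?(?:-->|$)/gim, '');

console.log(JSON.stringify(withoutM)); // "ab"
console.log(JSON.stringify(withM));    // "ab\nmore"
```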

Related

replace \n from tag attribute using javascript regex with empty character

I have a tag like <span style="font-size:10.5pt;\nfont-family:\nKaiTi"> and I want to remove the \n inside the tag (replace it with an empty string).
Note: the tag could be anything (it is not fixed).
I want a regular expression that does this in JavaScript.

You should be able to strip out the \n character before applying this HTML to the page.
Having said that, try this: (\\n)
You can see it here: regex101
Edit: With a bit of refinement I have this: (\W\\n). It works with the example you provided, but it breaks down if you have spaces in the body of the tags (<span> \n </span>).
I've tried everything I know to do. Perhaps someone with more regex experience can assist?
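
One possible way around the problem with tag bodies is to match each tag first and clean only inside it, using a replacer function. This is a sketch, not the answerer's method: the name stripNewlinesInTags is hypothetical, it handles both a literal backslash-n and a real line break because the question does not say which one is meant, and it assumes attribute values contain no > character.

```javascript
// Removes "\n" (literal backslash + n, or a real newline) inside tags only.
function stripNewlinesInTags(html) {
  // Outer regex finds each tag; inner replace cleans just that tag.
  return html.replace(/<[^>]*>/g, (tag) => tag.replace(/\\n|\n/g, ''));
}

// Hypothetical input: literal "\n" inside the tag and inside the body.
const messy = '<span style="font-size:10.5pt;\\nfont-family:\\nKaiTi"> a\\nb </span>';
const cleaned = stripNewlinesInTags(messy);

// The tag is cleaned, the text between the tags is left untouched.
console.log(cleaned);
```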

Check for HTML tags

I'm trying to write a regular expression to find out whether a string contains any HTML tags. So far I have:
/^[^<>]+$/
It's for a validator: if no HTML tags exist, the input validates.

You can use the adapted version of the regex in this SO post:
^((?!<[^<]+>)[\s\S])*$
See demo.
Perhaps, you can further enhance it to only match if the first character after < is a letter:
^((?!<[a-zA-Z][^<]*>)[\s\S])*$
See another demo
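
A small sketch of how the second pattern might be used in a validator. The function name hasNoHtmlTags and the sample strings are assumptions for illustration. Because the pattern requires a letter right after <, a lone closing tag such as </b> is not treated as a tag.

```javascript
// True when the string contains nothing that looks like an opening tag.
const NO_TAGS_RE = /^((?!<[a-zA-Z][^<]*>)[\s\S])*$/;

function hasNoHtmlTags(value) {
  return NO_TAGS_RE.test(value);
}

console.log(hasNoHtmlTags('a < b and c > d')); // true: "<" is not followed by a letter
console.log(hasNoHtmlTags('hi <b>there</b>')); // false: "<b>" looks like a tag
console.log(hasNoHtmlTags(''));                // true: the empty string has no tags
```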

You can use a regex like this:
.*<\/?.*?>.*
Working demo
The idea is to find strings with tags like <tag>, <tag withAttribute="something"> or </closingTag>
Update: as Mr. Llama pointed out in his comment, you can enable the s flag so that . also matches line breaks. This helps with multi-line strings.
(?s).*<\/?.*?>.*
^--- use this inline single-line flag where the engine supports it, or use the first regex with the `s` flag enabled (JavaScript does not accept the inline (?s) form; write /.../s instead)
Working demo

jQuery match first letter in a string and wrap with span tag

I'm trying to get the first letter in a paragraph and wrap it with a <span> tag. Notice I said letter and not character, as I'm dealing with messy markup that often has blank spaces.
Existing markup (which I can't edit):
<p> Actual text starts after a few blank spaces.</p>
Desired result:
<p> <span class="big-cap">A</span>ctual text starts after a few blank spaces.</p>
How do I ignore anything but /[a-zA-Z]/ ? Any help would be greatly appreciated.

$('p').html(function (i, html) {
    return html.replace(/^[^a-zA-Z]*([a-zA-Z])/g, '<span class="big-cap">$1</span>');
});
Demo: http://jsfiddle.net/mattball/t3DNY/
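
Note that the replacement above substitutes the whole match, so any leading whitespace before the first letter is dropped. If the leading characters should be kept, as in the desired result from the question, a variant with a second capture group might look like this. It is a sketch that works on a plain string; the function name wrapFirstLetter is hypothetical.

```javascript
// Wraps the first ASCII letter in a span and keeps whatever precedes it.
function wrapFirstLetter(html) {
  // $1 = leading non-letters (kept as-is), $2 = the first letter.
  return html.replace(/^([^a-zA-Z]*)([a-zA-Z])/, '$1<span class="big-cap">$2</span>');
}

console.log(wrapFirstLetter('  Actual text starts here.'));
// "  <span class="big-cap">A</span>ctual text starts here."
```

With jQuery, the same function could be passed the html argument inside the .html() callback shown above.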

I would vote against using JS for this task. It'll make your page slower and also it's a bad practice to use JS for presentation purposes.
Instead, I suggest using the :first-letter pseudo-element to assign additional styles to the first letter of the paragraph. Here is the demo: http://jsfiddle.net/e4XY2/. It should work in all modern browsers except IE7.

Matt Ball's solution is good, but if your paragraph starts with an image, other markup, or quotes, the regex will not just fail but break the HTML,
for instance
<p><strong>Important</strong></p>
or
<p>"Important"</p>
You can avoid breaking the HTML in these cases by adding "'< to the excluded initial characters, though in that case no span is wrapped around the first character.
return html.replace(/^[^a-zA-Z'"<]*([a-zA-Z])/g, '<span class="big-cap">$1</span>');
Optimally, I think you may wish to wrap the first character after a ' or ".
I would, however, consider it best not to wrap the character if it is already inside markup, but that probably requires a second replace pass.

I do not seem to have permission to reply to an answer, so forgive me for doing it like this. The answer given by Matt Ball will not work if the P contains another element as its first child. Go to the fiddle and add an IMG (very common) as the first child of the P, and the I from Img will turn into a drop cap.
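
One way to handle a leading element or quote is to skip over whole tags, whitespace, and quote characters before capturing the first letter. The sketch below is an assumption-laden variant, not one of the posted answers: the function name is hypothetical, it only recognises ASCII letters, and it assumes attribute values contain no > character.

```javascript
// Skips leading whitespace, complete tags, and quotes, then wraps the first letter.
function wrapFirstLetterSkippingTags(html) {
  // $1 = everything skipped (kept unchanged), $2 = the first letter of the text.
  return html.replace(
    /^((?:\s|<[^>]*>|["'])*)([a-zA-Z])/,
    '$1<span class="big-cap">$2</span>'
  );
}

console.log(wrapFirstLetterSkippingTags('<img src="a.png"> Intro text'));
// <img src="a.png"> <span class="big-cap">I</span>ntro text
console.log(wrapFirstLetterSkippingTags('<strong>Important</strong>'));
// <strong><span class="big-cap">I</span>mportant</strong>
```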

If you use the x (extended) flag, you can have the engine ignore whitespace in the pattern; note that JavaScript's built-in RegExp does not support this flag, and jQuery relies on the same engine. Then use something like this:
/^([a-zA-Z]).*$/
You know what format your first character should have, and the pattern should grab only that character into a group. If characters other than whitespace can come before your first letter, maybe use something like this:
/.*?([a-zA-Z]).*/
This lazily skips any other characters first and then captures the first letter into a group, which you can then wrap in a span tag.

Using negative lookahead multiple times (or matching multiple characters with ^)?

I want to do something like this:
/<script[^>]*>(?!<\/script>)*<\/script>/g
to match all script tags in an HTML string using JavaScript.
I know this won't work, but I can't seem to find any other solution.
The script tag can either use the src attribute and close itself right after (<script src="..." type="text/javascript"></script>) or contain the code within the tag (<script type="text/javascript">...</script>).

You were close:
/<script[^>]*>(?:(?!<\/script>).)*<\/script>/g
You must have something to eat the actual script body. That's what the . does here.
The look-ahead check must occur before every character, so it is wrapped in an extra (non-capturing) group. To capture the script source code in group 1, just add another set of parens around the (?:...)*, as @AlanMoore pointed out in the comments.
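
A sketch of that capturing variant in use. Two things here are assumptions rather than part of the answer: the . is swapped for [\s\S] so that script bodies spanning several lines are matched too, and the helper name extractScripts and the sample markup are made up.

```javascript
// Group 1 captures the script body; the lookahead is checked before every character.
const SCRIPT_RE = /<script[^>]*>((?:(?!<\/script>)[\s\S])*)<\/script>/gi;

// Returns the body of every script element (an empty string for src-only scripts).
function extractScripts(html) {
  return Array.from(html.matchAll(SCRIPT_RE), (match) => match[1]);
}

// Hypothetical input with one external and one inline script.
const page =
  '<script src="a.js"></script><p>x</p>' +
  '<script>var a = 1;\nvar b = 2;</script>';

console.log(extractScripts(page)); // [ '', 'var a = 1;\nvar b = 2;' ]
```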

Try this
/<script[^>]*>.*?<\/script>/g
I don't see a reason for a negative look ahead. .*? is a lazy match so that it only matches till the next closing tag and not till the last one.

Regex replace string but not inside html tag

I want to replace a string in an HTML page using JavaScript, but ignore it if it is inside an HTML tag. For example:
visit google search engine
you can search on google tatatata...
I want to replace google by <b>google</b>, but not here:
visit google search engine
you can search on <b>google</b> tatatata...
I tried with this one:
regex = new RegExp(">([^<]*)?(google)([^>]*)?<", 'i');
el.innerHTML = el.innerHTML.replace(regex,'>$1<b>$2</b>$3<');
but the problem is that I get <b>google</b> inside the <a> tag:
visit <b>google</b> search engine
you can search on <b>google</b> tatatata...
How can I fix this?

You'd be better using an html parser for this, rather than regex. I'm not sure it can be done 100% reliably.

You may or may not be able to do this with a regexp. It depends on how precisely you can define the conditions. Saying you want the string replaced except if it's in an HTML tag is not narrow enough, since everything on the page is presumably within some HTML tag (BODY if nothing else).
It would probably work better to traverse the DOM tree for this instead of trying to use a regexp on the HTML.

Parsing HTML with a regular expression is not going to be easy for anything other than trivial cases, since HTML isn't regular.
For more details see this Stackoverflow question (and answers).

I think you're all missing the question here...
When he says inside the tag, he means inside the opening tag, as in the <a href="google.com"> tag... This is quite different from text inside, say, a <p> </p> tag pair or <body> </body>. While I don't have the answer yet, I'm struggling with this same problem and I know it has to be solvable using regex. Once I figure it out, I'll come back and post.

WORKAROUND
If you can't use an HTML parser, or are quite confident about your HTML structure, try this:
do the "bad" replacement
repeat the replacement of (<[^>]*)(<[^>]+>) with $1 a few times (as many as you need)
It's a simple workaround, but it works for me.
Cons?
Well... you have to do the replace twice for the case ... ...>, as each pass removes only the first unwanted tag from every tag on the page
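
The repeated replacement could be sketched as a loop that runs until nothing changes. The function name and the sample string are hypothetical, and the approach assumes the markup contains no stray < in ordinary text, since such a character would be treated as the start of a tag.

```javascript
// Removes tags that ended up inside other tags, one nested tag per pass.
function stripTagsInsideTags(html) {
  const nested = /(<[^>]*)(<[^>]+>)/g;
  let previous;
  do {
    previous = html;
    html = html.replace(nested, '$1'); // keep the outer part, drop the inner tag
  } while (html !== previous);         // stop once a pass changes nothing
  return html;
}

// Hypothetical result of the "bad" replacement: <b> tags inside an href.
const broken = '<a href="<b>google</b>.com">visit</a>';
console.log(stripTagsInsideTags(broken)); // <a href="google.com">visit</a>
```

Here the first pass removes <b> and the second removes </b>, which matches the "replace twice" remark above.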
[edit:]
SOLUTION
Why not use jQuery, put the HTML code into the page, and do something like this:
$(containerOrSth).find('a').each(function () {
    if ($(this).children().length == 0) {
        $(this).text($(this).text().replace('google', 'evil'));
    } else {
        // here you have to take care of child tags, but you have to know where to expect them: before or after the text. Comment for more help.
    }
});

I'm using
regex = new RegExp("(?=[^>]*<)google", 'i');
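
A sketch of how that lookahead behaves when used for the replacement. The lookahead requires that the next angle bracket after the word is a <, which is not the case inside a tag's own markup. The helper name and the sample markup are assumptions; the g flag is added so that every occurrence is handled. Text that is not followed by any < (for example at the very end of the string) is not matched by this pattern.

```javascript
// Wraps "google" in <b> when it appears in text, but not inside a tag's markup.
function boldOutsideTags(html) {
  // "$&" in the replacement stands for the matched text itself.
  return html.replace(/(?=[^>]*<)google/gi, '<b>$&</b>');
}

// Hypothetical input: the word occurs in an href and in the link text.
const link = '<a href="http://google.com">visit google</a>';
console.log(boldOutsideTags(link));
// <a href="http://google.com">visit <b>google</b></a>
```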

you can't really do that, your "google" is always in some tag, either replace all or none

Well, since everything is part of a tag, your request makes no real sense. If it's just the <a /> tag, you might just check for that part, mainly by making sure you don't have a trailing </a> tag before a fresh <a>.

You can do that using a regex, but filtering blocks like STYLE, SCRIPT and CDATA needs more work and is not implemented in the following solution.
Most of the answers state that 'your data is always in some tags', but they are missing the point: the data is always 'between' some tags, and you want to filter out the cases where it is 'in' a tag.
Note that tag characters in inline scripts will likely break this, so if they exist, they should be processed separately with this method. Take a look here:
complex html string.replace function

I can give you a hacky solution...
Pick a non-printable character that's not in your string. Duplicate your buffer, then overwrite the tags in the duplicate with that character. Run the regex on the duplicate to find the position and length of each match. Now you know where to perform the replacement in the original buffer.
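
The masking idea can be sketched as follows. Everything named here is an assumption for illustration: the function name, the choice of \u0000 as the filler character (assumed not to occur in the input), and the requirement that the search word contains no regex metacharacters.

```javascript
// Replaces `word` only where it occurs outside of tags.
// `wrap` receives the matched text and returns its replacement.
function replaceOutsideTags(html, word, wrap) {
  const FILLER = '\u0000'; // assumed never to appear in the input
  // Masked copy with the same length: every tag is overwritten with filler,
  // so match positions in the copy are valid positions in the original.
  const masked = html.replace(/<[^>]*>/g, (tag) => FILLER.repeat(tag.length));

  const pattern = new RegExp(word, 'gi'); // word assumed to be a plain literal
  let result = '';
  let last = 0;
  for (const match of masked.matchAll(pattern)) {
    const end = match.index + match[0].length;
    // Copy the untouched part, then the wrapped slice from the original.
    result += html.slice(last, match.index) + wrap(html.slice(match.index, end));
    last = end;
  }
  return result + html.slice(last);
}

const out = replaceOutsideTags(
  '<a href="google.com">go</a> google',
  'google',
  (text) => '<b>' + text + '</b>'
);
console.log(out); // <a href="google.com">go</a> <b>google</b>
```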
