Replace several consecutive <span> tags with one <span> - javascript

I'm looking for a solution similar to
Regex to replace multiple spaces with a single space
but instead of spaces, the question is about <span>. It doesn't contain any additional attributes, such as class; it's exactly the 6 characters <span> (no spaces, nothing else).
As result, the string
"<span>The <span><span><span><span>dog <span><span>has</span> a long</span> tail, and it </span></span></span>is RED</span></span>!"
should be replaced to
"<span>The <span>dog <span>has</span> a long</span> tail, and it </span></span></span>is RED</span></span>!"
(Please ignore the fact that there will then be more closing spans than opening ones; further modifications are planned afterwards.)
P.S. Yes, you're right: you may ask whether 2+ consecutive spans can have spaces, tabs, or even newlines between them. Honestly, yes, but an answer that only handles them with nothing in between will still be useful. Thank you.

Try the following two replace calls (you can chain them).
If <span> or </span> occurs twice or more in a row, replace the whole run with a single tag:
.replace(/(<span>){2,}/g, "<span>")
.replace(/(<\/span>){2,}/g, "</span>")
By the way, regexr.com is a great place if you want to try out regex!
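A minimal sketch of a variant that also covers the P.S. (whitespace between the repeated tags). It assumes, as the question states, that the opening tag is always exactly <span>; the function name collapseOpeningSpans is only illustrative, and closing tags are deliberately left alone.

```javascript
// Collapse each run of two or more "<span>" tags into a single "<span>".
// \s* also swallows spaces, tabs and newlines between the repeated tags.
// Closing "</span>" tags are not touched.
function collapseOpeningSpans(html) {
  return html.replace(/<span>(?:\s*<span>)+/g, "<span>");
}

const input =
  "<span>The <span><span><span><span>dog <span><span>has</span> a long</span>" +
  " tail, and it </span></span></span>is RED</span></span>!";

console.log(collapseOpeningSpans(input));
// <span>The <span>dog <span>has</span> a long</span> tail, and it </span></span></span>is RED</span></span>!
```

If the closing tags should be collapsed later as well, the same shape works with <\/span> in place of <span>.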

Related

Remove characters between 2 specifics strings in a JavaScript string

I'm trying to remove all the characters between <p and the closing > of the tag (basically all the attributes in the p tags).
The following code removes everything, including the text inside the <p>:
MyString.replace(/<p.*>/, '<p>');
Example: <p style="test" class="test">my content</p> gives <p></p>
Thank you in advance for your help!
Try this regex: /<p [^>]*>/. It simply removes the closing bracket from the accepted characters: . matches any character, so .* runs on to the last > in the string, which is why your version doesn't work. With the new one, the match stops at the first >.
Edit: you can add the global and multiline flags: /<p [^>]*>/gm (only g matters here; m just changes how ^ and $ behave). Also, as one of the comments pointed out, removing the p from the pattern makes it applicable to every tag, although that makes the replacement a bit harder, since each tag's name has to be kept. That regex is: /<[^>]*>/gm
MyString.replace(/\<p.*<\/p>/, '<p></p>');
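A small sketch of both suggestions from the first answer as plain functions (the names are illustrative). It uses \s instead of a literal space so that a tab or newline after the tag name also counts, and the every-tag variant captures the tag name and puts it back, which is the "bit harder" replacement. Like any regex approach, it assumes no attribute value contains a >.

```javascript
// Remove all attributes from <p ...> opening tags, keeping the content.
// [^>]* stops at the first ">", unlike the greedy .* in the question.
function stripParagraphAttributes(html) {
  return html.replace(/<p\s[^>]*>/g, "<p>");
}

// Same idea for every opening tag: capture the tag name and put it back.
// Closing tags (</p>) and tags without attributes (<br>) are not matched.
function stripAllAttributes(html) {
  return html.replace(/<([a-z][a-z0-9]*)\s[^>]*>/gi, "<$1>");
}

console.log(stripParagraphAttributes('<p style="test" class="test">my content</p>')); // <p>my content</p>
console.log(stripAllAttributes('<div id="a"><p class="b">x</p></div>')); // <div><p>x</p></div>
```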

Using regular expression to parse text to prevent XSS

I'm trying to parse a blob of text in html format, that only allow bold <b></b> and italic <i></i>.
I know it's nearly impossible to parse HTML text securely enough to prevent XSS. But given the constraint of allowing only bold and italic, is it feasible to use regex to filter out the unwanted tags?
Thanks.
--- Edit ---
I meant to do the parsing on the client side, and render it right back.
Please test your code against this, before jumping into conclusion.
http://voog.github.io/wysihtml/examples/simple.html
BTW, why is the question itself getting downvoted?
--- Closed ---
I picked @Siguza's answer to close this discussion.
The easiest and probably most secure way I can think of (doing this with regex) is to first replace all < and > with &lt; and &gt; respectively, and then explicitly "un-replace" the b and i tags.
To replace < and > you just need text substitution, no regex. But I trust you know how to do this in regex anyway.
To re-enable the i and b tags, you could also use four text replacements:
&lt;b&gt; => <b>
&lt;/b&gt; => </b>
&lt;i&gt; => <i>
&lt;/i&gt; => </i>
Or, in regex: replace /&lt;(\/?[bi])&gt;/g with <$1>.
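A minimal JavaScript sketch of this escape-then-restore idea, with one addition that is not in the answer: & is escaped first, so input that already contains &lt;b&gt; stays visible text instead of turning into a tag. The function name is illustrative, and the result is only meant to be inserted as element content (for example via innerHTML), not into an attribute.

```javascript
// Escape all markup, then re-enable only the exact tags <b>, </b>, <i>, </i>.
function allowBoldItalic(text) {
  const escaped = text
    .replace(/&/g, "&amp;") // must run first, or the entities below get double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  // "Un-replace" the four allowed tags; anything with attributes stays escaped.
  return escaped.replace(/&lt;(\/?[bi])&gt;/g, "<$1>");
}

console.log(allowBoldItalic("<b>bold</b> <i>it</i> <img src=x onerror=alert(1)>"));
// <b>bold</b> <i>it</i> &lt;img src=x onerror=alert(1)&gt;
```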
But...
...for the sake of completeness, it actually is possible with a single regex substitution:
Replace /<(|\/|[^>\/bi]|\/[^>bi]|[^\/>][^>]+|\/[^>][^>]+)>/g with &lt;$1&gt;.
I will not guarantee that this is bullet-proof, but I tested it against the following block using RegExr, where it appeared to hold up:
<>Test</>
<i>Test</i>
<iii>Test</iii>
<b>Test</b>
<bbb>Test</bbb>
<a>Test</a>
<abc>Test</abc>
<some tag with="attributes">Test</some>
<br/>
<br />
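For reference, the same single substitution in JavaScript, wrapped in an illustrative function. Note that it only escapes complete <...> sequences, so a lone < with no > after it (as in a < b) is left as-is.

```javascript
// Escape every <...> sequence except <b>, </b>, <i> and </i>.
const notBoldOrItalic = /<(|\/|[^>\/bi]|\/[^>bi]|[^\/>][^>]+|\/[^>][^>]+)>/g;

function escapeOtherTags(html) {
  return html.replace(notBoldOrItalic, "&lt;$1&gt;");
}

console.log(escapeOtherTags("<b>Test</b> <iii>Test</iii> <br/> <a>Test</a>"));
// <b>Test</b> &lt;iii&gt;Test&lt;/iii&gt; &lt;br/&gt; &lt;a&gt;Test&lt;/a&gt;
```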
Can you do this with regex? Kind of. You have to write a regex that finds every tag that is not a b or i tag. Below is a simple one: it matches tags with at least three characters between the brackets whose last character isn't /, so single-letter tags such as <a>, <b>, <i>, <p>, <q>, <s>, and <u> (with no spaces, attributes, or classes) are let through, which I believe fits your needs. Be aware that two-character tags such as <br> and any tag ending in />, for example <img src=x onerror=alert(1) />, are let through as well. There may well be a more precise regex for this, but this one is simple. It may or may not catch everything; it probably doesn't.
<[^>]{2,}[^/]>
Should you do this with regex? No. There are better, more secure ways:
1. Parse out the tags and replace them with a special delimiter (or store their indices).
2. XSS-sanitize the input.
3. Replace the delimiters with the tags.
4. Make sure you don't have any mismatched tags.
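One possible sketch of those steps, under stated assumptions: the placeholders are hypothetical characters from the Unicode Private Use Area that the input is assumed never to contain, the sanitizing step is plain HTML escaping rather than a full sanitizer library, and the tag check runs before the placeholders are swapped back. A client-side version like this does not replace server-side checks.

```javascript
// Hypothetical placeholders (Private Use Area); assumed absent from the input.
const PLACEHOLDER = { "<b>": "\uE000", "</b>": "\uE001", "<i>": "\uE002", "</i>": "\uE003" };
const CLOSE_TO_OPEN = { "\uE001": "\uE000", "\uE003": "\uE002" };

function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function sanitizeBoldItalic(input) {
  // Parse out the allowed tags, replacing each with a placeholder.
  let text = input.replace(/<\/?[bi]>/g, (tag) => PLACEHOLDER[tag]);

  // Sanitize everything else by escaping it.
  text = escapeHtml(text);

  // Reject mismatched or unclosed tags before restoring them.
  const open = [];
  for (const ch of text) {
    if (ch === "\uE000" || ch === "\uE002") {
      open.push(ch);
    } else if (ch === "\uE001" || ch === "\uE003") {
      if (open.pop() !== CLOSE_TO_OPEN[ch]) throw new Error("Mismatched <b>/<i> tags");
    }
  }
  if (open.length > 0) throw new Error("Unclosed <b>/<i> tag");

  // Swap the placeholders back for the real tags.
  for (const [tag, mark] of Object.entries(PLACEHOLDER)) {
    text = text.split(mark).join(tag);
  }
  return text;
}

console.log(sanitizeBoldItalic("<b>bold</b> and <i>italic</i> <script>alert(1)</script>"));
// <b>bold</b> and <i>italic</i> &lt;script&gt;alert(1)&lt;/script&gt;
```

A tag with attributes, such as <b onclick="...">, is escaped like any other text, so its </b> ends up unmatched and the input is rejected.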
XSS sanitizing needs to be done server-side - the client is in control of the client-side, and can circumvent any checks there.
I still maintain that the OWASP Cheat Sheet is sufficient for XSS sanitization, and replacing only empty bold and italic tags shouldn't compromise any of the rules.

Use Javascript to get the Sentence of a Clicked Word

This is a problem I'm running into and I'm not quite sure how to approach it.
Say I have a paragraph:
"This is a test paragraph. I love cats. Please apply here"
And I want a user to be able to click any one of the words in a sentence, and then return the entire sentence that contains it.
You would first have to split your paragraph into elements, as you can't (easily) detect clicks on text without elements:
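$('p').each(function() {
    // Split at each ., ? or ! that is followed by a space; because the group
    // is capturing, split() keeps each punctuation mark as a separate item.
    $(this).html($(this).text().split(/([\.\?!])(?= )/).map(
        function(v){return '<span class=sentence>'+v+'</span>'}
    ).join(''));
});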
Note that it correctly splits paragraphs like this one:
<p>I love cats! Dogs are fine too... Here's a number : 3.4. Please apply here</p>
Then bind the click handler:
$('.sentence').click(function(){
alert($(this).text());
});
I don't know whether a colon (:) separates sentences in English. If it does, it can of course be added to the regex.
First of all, be prepared to accept a certain level of inaccuracy. This may seem simple on the surface, but trying to parse natural languages is an exercise in madness. Let us assume, then, that all sentences are punctuated by ., ?, or !. We can forget about interrobangs and so forth for the moment. Let's also ignore quoted punctuation like "!", which doesn't end the sentence.
Also, let's try to grab quotation marks after the punctuation, so that a quoted sentence like "Foo?" ends up as "Foo?" with its closing quote, and not as "Foo? without it.
Finally, for simplicity, let's assume that there are no nested tags inside the paragraph. This is not really a safe assumption, but it will simplify the code, and dealing with nested tags is a separate issue.
$('p').each(function() {
var sentences = $(this)
.text()
.replace(/([^.!?]*[^.!?\s][.!?]['"]?)(\s|$)/g,
'<span class="sentence">$1</span>$2');
$(this).html(sentences);
});
$('.sentence').on('click', function() {
console.log($(this).text());
});
It's not perfect (for example, quoted punctuation will break it), but it should work for most ordinary prose.
Live demo: http://jsfiddle.net/SmhV3/
Slightly amped-up version that can handle quoted punctuation: http://jsfiddle.net/pk5XM/1/
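The core of this answer is the replace() call, which can be pulled out into a plain function and tried without a page. In this sketch the function name and the escaping step are additions, not part of the answer: the text is HTML-escaped first so that characters like < in the paragraph are not turned into markup when the result goes back through .html(). Only &, < and > are escaped, because the regex itself looks for ' and " after the punctuation.

```javascript
// Wrap each sentence in <span class="sentence">, using the regex from the answer.
// Only &, < and > are escaped, because the regex needs to see ' and ".
function wrapSentences(text) {
  const escaped = text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  return escaped.replace(
    /([^.!?]*[^.!?\s][.!?]['"]?)(\s|$)/g,
    '<span class="sentence">$1</span>$2'
  );
}

console.log(wrapSentences("This is a test paragraph. I love cats. Please apply here"));
// <span class="sentence">This is a test paragraph.</span> <span class="sentence">I love cats.</span> Please apply here
```

The last sentence has no closing punctuation, so the regex leaves it unwrapped. In the page, the answer's $(this).html(sentences) line would receive wrapSentences($(this).text()).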
Match the sentences. You can use a regex along the lines of /[^!.?]+[!.?]/g for this.
Replace each sentence with a wrapping span that has a click event to alert the entire span.
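A sketch of those two steps as plain functions (the names are illustrative). It uses a slight variation of the suggested regex, /[^.!?]+[.!?]*/g, so that runs of punctuation like ?! or ... stay with their sentence and a final sentence without punctuation is still picked up. Like the other answers, it will split on decimal points such as 3.4.

```javascript
// Step 1: match the sentences. Each match is a run of non-terminators plus any
// terminators that follow it (zero at the very end of the text).
function splitSentences(text) {
  return (text.match(/[^.!?]+[.!?]*/g) || [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Step 2: build one span per sentence. In the browser you would then attach
// the click handler to these spans (e.g. with jQuery's .on("click", ...)).
function sentencesToHtml(text) {
  const escape = (s) => s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  return splitSentences(text)
    .map((s) => '<span class="sentence">' + escape(s) + "</span>")
    .join(" ");
}

console.log(splitSentences("This is a test paragraph. I love cats. Please apply here"));
```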
I suggest you take a look at Selection and ranges in JavaScript.
There is no built-in method that gives you the currently selected sentence, so you have to code that on your own.
Rangy is a JavaScript library for working with selection ranges across browsers.
I'm not sure how to get the complete sentence, but if you split the text on spaces, you can try this to get it word by word:
<div id="myDiv" onmouseover="splitToSpans(this)" onclick="alert(event.target.innerHTML)">This is a test paragraph. I love cats. Please apply here</div>
function splitToSpans(element){
    // Split only once; later mouseovers find the spans already in place
    if ($(element).children().length)
        return;
    var words = $(element).text().split(' ');
    $(element).empty();
    $.each(words, function (i, word) {
        // .text() inserts the word as plain text instead of parsing it as HTML
        $(element).append($('<span>').text(word + ' '));
    });
}

jQuery match first letter in a string and wrap with span tag

I'm trying to get the first letter in a paragraph and wrap it with a <span> tag. Notice I said letter and not character, as I'm dealing with messy markup that often has blank spaces.
Existing markup (which I can't edit):
<p> Actual text starts after a few blank spaces.</p>
Desired result:
<p> <span class="big-cap">A</span>ctual text starts after a few blank spaces.</p>
How do I ignore anything but /[a-zA-Z]/? Any help would be greatly appreciated.
$('p').html(function (i, html)
{
return html.replace(/^[^a-zA-Z]*([a-zA-Z])/g, '<span class="big-cap">$1</span>');
});
Demo: http://jsfiddle.net/mattball/t3DNY/
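Note that ^[^a-zA-Z]* is part of the match, so the leading spaces are replaced along with the letter instead of being kept as in the desired result (browsers collapse leading whitespace anyway, so the difference rarely shows). A small variant that captures that prefix and puts it back, written as a plain function so it can be tested on its own; the name wrapFirstLetter is illustrative. The g flag is dropped because a pattern anchored with ^ can only match once.

```javascript
// Wrap the first ASCII letter in a span, keeping whatever came before it.
function wrapFirstLetter(html) {
  return html.replace(/^([^a-zA-Z]*)([a-zA-Z])/, '$1<span class="big-cap">$2</span>');
}

console.log(wrapFirstLetter(" Actual text starts after a few blank spaces."));
// " <span class="big-cap">A</span>ctual text starts after a few blank spaces."

// In the page: $('p').html(function (i, html) { return wrapFirstLetter(html); });
```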
I would vote against using JS for this task. It'll make your page slower and also it's a bad practice to use JS for presentation purposes.
Instead, I suggest using the :first-letter pseudo-element to style the first letter of the paragraph. Here is the demo: http://jsfiddle.net/e4XY2/. It should work in all modern browsers except IE7.
Matt Ball's solution is good, but if your paragraph starts with an image, other markup, or quotes, the regex will not just fail but break the HTML, for instance:
<p><strong>Important</strong></p>
or
<p>"Important"</p>
You can avoid breaking the HTML in these cases by adding ", ' and < to the excluded leading characters, though then no span will be wrapped around the first character.
return html.replace(/^[^a-zA-Z'"<]*([a-zA-Z])/g, '<span class="big-cap">$1</span>');
Optimally, you may wish to wrap the first character after a ' or ".
However, I would consider it best not to wrap the character if it is already inside markup, but that probably requires a second replace pass.
I do not seem to have permission to reply to an answer, so forgive me for doing it like this. The answer given by Matt Ball will not work if the P contains another element as its first child. Go to the fiddle and add an IMG (very common) as the first child of the P, and the I from Img will turn into a drop cap.
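One way to handle both the <strong> case and the leading IMG, sketched under the assumption that no attribute value contains a >: let the prefix skip whole tags and whitespace, allow one optional quote, and only then capture the letter. The first letter of the text is wrapped even when it sits inside markup, and the tags themselves are never split. The function name is illustrative, and this is still regex-on-HTML, so unusual markup can defeat it.

```javascript
// Prefix = any mix of whitespace and complete tags, then an optional quote.
const FIRST_TEXT_LETTER = /^((?:\s|<[^>]*>)*['"]?)([a-zA-Z])/;

function wrapFirstTextLetter(html) {
  return html.replace(FIRST_TEXT_LETTER, '$1<span class="big-cap">$2</span>');
}

console.log(wrapFirstTextLetter("<strong>Important</strong>"));
// <strong><span class="big-cap">I</span>mportant</strong>
console.log(wrapFirstTextLetter('<img src="cat.jpg"> "Important"'));
// <img src="cat.jpg"> "<span class="big-cap">I</span>mportant"
```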
If you use the x flag, you can have the script ignore whitespace in the pattern (note, though, that JavaScript regexes do not support x, and jQuery uses plain JavaScript regexes). Then use something like this:
/^([a-zA-Z]).*$/
You know what format your first character should have, and this grabs only that character into a group. If characters other than whitespace could come before your first letter, maybe something like this:
/.*?([a-zA-Z]).*/
Lazily match any other characters first, then capture the first letter into a group, which you can then wrap in a span tag.

Using negative lookahead multiple times (or matching multiple characters with ^)?

I want to do something like this:
/<script[^>]*>(?!<\/script>)*<\/script>/g
to match all script tags in an HTML string using JavaScript.
I know this won't work, but I can't seem to find any other solution.
The script tag can either use the src attribute and close itself right after (<script src="..." type="text/javascript"></script>) or contain the code within the script tag (<script type="text/javascript">...</script>).
You were close
/<script[^>]*>(?:(?!<\/script>).)*<\/script>/g
You must have something to eat the actual script body. That's what the . does here.
The lookahead check must occur before every character, so it is wrapped in an extra (non-capturing) group. To capture the script source code in group 1, just add another set of parentheses around the (?:...), as @AlanMoore pointed out in the comments.
Try this
/<script[^>]*>.*?<\/script>/g
I don't see a reason for a negative lookahead. .*? is a lazy match, so it only matches up to the next closing tag and not the last one.
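One caveat applies to both patterns: in JavaScript, . does not match line terminators, so a script whose body spans several lines is not matched. A sketch using [\s\S] instead (the s/dotAll flag would also work in engines that support it), plus the i flag so <SCRIPT> is found too. The function name findScriptBodies is illustrative; it returns each body via capture group 1.

```javascript
// Tempered token from the first answer, with [\s\S] so newlines are allowed.
const SCRIPT_TAG = /<script[^>]*>((?:(?!<\/script>)[\s\S])*)<\/script>/gi;

// Equivalent lazy form in the style of the second answer:
// /<script[^>]*>([\s\S]*?)<\/script>/gi

function findScriptBodies(html) {
  // matchAll needs the g flag and yields one match object per script tag.
  return Array.from(html.matchAll(SCRIPT_TAG), (m) => m[1]);
}

const page =
  '<p>hi</p><script src="a.js" type="text/javascript"></script>\n' +
  '<script type="text/javascript">\nvar x = 1;\nalert(x);\n</script>';

console.log(findScriptBodies(page));
```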
