Highlighting words across dom nodes - javascript

I have some messy HTML produced by OCR that I need to highlight. Words are sometimes split across DOM nodes. I need to search for arbitrary user-entered text and wrap a highlight span around every place that text appears in the HTML.
Example
<div id="content">
<span>my birthday par<span>ty is today</span></span>
</div>
The search term here would be "birthday party". I have tried a regex but am unable to capture the right group.
(regex noob) new RegExp(`([${searchTerm}]+)!?<[^>]*>`, 'gi'), which produces ["y birthday pa<span>", "day</span>"]
I would need to capture something like birthday par<span>ty, or its index, so that I can wrap it in another element to highlight it.
Ideal outcome would be
<div id='content'>
<span>my <mark class='highlight'>birthday par<span>ty</mark> is today</span></span>
</div>
Thanks in advance!
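One possible approach, sketched here under assumptions rather than as a definitive answer: a single <mark> cannot legally straddle a <span> boundary, so instead of matching against the markup, match against the concatenated text of the text nodes and wrap each covered slice in its own <mark>. The names findSlices and highlight are hypothetical. The sketch assumes the split halves of a word are directly adjacent (no stray whitespace between nodes) and that lowercasing does not change string length.

```javascript
// Pure helper: given the text of each text node (in document order) and a
// search term, return the slice of every node that is covered by a match.
function findSlices(texts, term) {
  const haystack = texts.join('').toLowerCase();
  const needle = term.toLowerCase();
  const slices = [];
  if (!needle) return slices;

  // starts[i] is the offset of texts[i] inside the joined string.
  const starts = [];
  let offset = 0;
  for (const t of texts) {
    starts.push(offset);
    offset += t.length;
  }

  let from = 0;
  for (;;) {
    const hit = haystack.indexOf(needle, from);
    if (hit === -1) break;
    const end = hit + needle.length;
    texts.forEach((t, i) => {
      // Intersect the match [hit, end) with this node's range.
      const s = Math.max(hit, starts[i]);
      const e = Math.min(end, starts[i] + t.length);
      if (s < e) slices.push({ node: i, start: s - starts[i], end: e - starts[i] });
    });
    from = end;
  }
  return slices;
}

// Browser-only wiring: wrap every slice in <mark class="highlight">.
function highlight(root, term) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  const nodes = [];
  while (walker.nextNode()) nodes.push(walker.currentNode);

  const slices = findSlices(nodes.map(n => n.nodeValue), term);
  // Go backwards so earlier offsets inside the same text node stay valid.
  for (const { node, start, end } of slices.reverse()) {
    const range = document.createRange();
    range.setStart(nodes[node], start);
    range.setEnd(nodes[node], end);
    const mark = document.createElement('mark');
    mark.className = 'highlight';
    range.surroundContents(mark);
  }
}
```

In a browser, highlight(document.getElementById('content'), 'birthday party') would wrap "birthday par" and "ty" in two separate marks, which keeps the resulting markup well-formed.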

Related

Specific Webpage Word-Counter Needed with a Logical Semantic Twist and an exact placeholder

Context
The following question does NOT already have an answer on Stack Overflow.
After reading many broad, unspecific and ambiguous questions, and equally broad answers like here, I have written a very specific and precisely formulated question that, unlike the broader questions on this site, leaves no room for ambiguity.
Problem
This question narrows the search for a logical word count in a semantically different way from other questions. It allows for a precise and useful answer with a practical implementation that can be placed anywhere on the page where one wants to count words and/or show the counter after a preset delay, so that the code causes no overhead while the words are still being fetched.
You are free to fork the link above, or any of the many other works already available elsewhere, and tweak it for your specific answer to this question.
You are free to use the jQuery library 3.6.0 if you wish, or to make one with zero dependencies on jQuery.
Starting point
Given the following HTML syntax, I would like to count the number of useful words in the <article> html element.
A useful word is any word of two or more characters. Single-letter English words such as a or o are not counted as meaningful words! Only words of two or more characters need to be counted, to reflect substance and valuable content.
Include words in normal HTML tags like <p> <h1> <h2> <h3> <em> <i> <strong> <mark> or <td>.
Only elements that are inside the <article> and within a <section> or an <hgroup> should be counted. Everything else should be skipped.
Dashed-words can count as one word: the space around the words is the defining factor.
<article>
<hgroup>
<h1>word word word word word</h1>
</hgroup>
<something>Ignored Ignored</something>
<section>
<h2>word word word word word</h2>
<p>word word</p>
<p><em>Word Word word</em></p>
<p><i>Word Word word</i></p>
<p>word word</p>
<p><strong>Word Word word</strong></p>
<p><mark>Word Word word</mark></p>
<p>word word</p>
</section>
<something>Ignored Ignored</something>
<section>
<h3>word word word word word</h3>
<ul>
<li>word word word word</li>
<li>word word word word</li>
<li>word word word word</li>
</ul>
<p>word word</p>
<p>word word</p>
<table>...<td>Maandag</td><td>Dinsdag</td>...
</table>
<p>a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a</p>
</section>
</article>
Placement
The number of words should then be written inside a prechosen HTML element, for example <wordcount>...</wordcount>.
In the example above, the correct word count should be 51
(15 for the three headings, 10 in unstyled paragraphs, 12 in styled elements, 12 in list items and 2 in table data).
<wordcount>51</wordcount>
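A minimal zero-dependency sketch of the rules above, not a definitive answer. The function names and the 2000 ms delay are hypothetical. It counts per text node so that adjacent cells such as <td>Maandag</td><td>Dinsdag</td> are not glued into one word, and it additionally assumes a word must contain at least one letter, so that a stray "..." is not counted. Nested <section> elements would be counted twice by this selector.

```javascript
// Count whitespace-separated tokens of two or more characters that
// contain at least one letter (the letter rule is an assumption).
function countUsefulWords(text) {
  return text
    .split(/\s+/)
    .filter(word => word.length >= 2 && /\p{L}/u.test(word))
    .length;
}

// Browser-only: sum the counts of every text node inside
// <section> or <hgroup> elements that live in an <article>.
function countInPage() {
  let total = 0;
  const blocks = document.querySelectorAll('article section, article hgroup');
  blocks.forEach(block => {
    const walker = document.createTreeWalker(block, NodeFilter.SHOW_TEXT);
    while (walker.nextNode()) {
      total += countUsefulWords(walker.currentNode.nodeValue);
    }
  });
  return total;
}

// Write the result after a preset delay (hypothetical value), so the
// count runs once the content has had time to arrive.
if (typeof document !== 'undefined') {
  setTimeout(() => {
    const target = document.querySelector('wordcount');
    if (target) target.textContent = String(countInPage());
  }, 2000);
}
```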

JavaScript string: get the content of two adjacent pieces of text and wrap them together

I'm trying to create a small script that would wrap some parts of text from e.g. <p> tag like this one: <p>... 'displayed text'[popup content] ...</p> in a span wrapper.
The end result would look like this:
<span class='wrapper'>
displayed text
<span class='popup'>popup content</span>
</span>
At the moment I'm able to find and replace the text between apostrophes like this:
some_string.replace(/'(.*?)'/g,'<span>$1</span>');
But I would really like to wrap the popup content part first and then wrap it together with displayed text inside the wrapper element.
Would that be possible?
Sure - how about this?
some_string.replace(/'(.*?)'\[(.*?)\]/g, "<span class='wrapper'>$1<span class='popup'>$2</span></span>");
Add a \s* between the two parts of the regex if they could be separated by whitespace:
/'(.*?)'\s*\[(.*?)\]/
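As a runnable sketch of the same idea: the function name wrapPopups is hypothetical, and the pattern uses negated character classes instead of lazy quantifiers, which is a stylistic choice rather than a requirement. It emits the outer wrapper span requested in the question and uses the g flag so that every occurrence is replaced.

```javascript
// Wrap every 'displayed text'[popup content] pair in the nested spans.
function wrapPopups(text) {
  return text.replace(
    /'([^']*)'\s*\[([^\]]*)\]/g,
    "<span class='wrapper'>$1<span class='popup'>$2</span></span>"
  );
}

console.log(wrapPopups("See 'displayed text'[popup content] here."));
// prints: See <span class='wrapper'>displayed text<span class='popup'>popup content</span></span> here.
```

Note that an apostrophe inside the displayed text (for example "don't") would end the match early, so real input may need a less ambiguous delimiter.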

Raw HTML is getting rendered on DOM in React component

I am getting the response below from the API. I want to convert it into proper HTML and render it in the DOM, but it renders as raw HTML with special characters.
example api response:
resp = {
body: "&lt;p&gt;Cali Thirty Seven turned what appeared to be certain defeat into an exhilarating and much-deserved victory late Saturday afternoon at Gulfstream Park. She reasserted herself after relinquishing the lead to 8-5 favorite Stormy Victoria to successfully defend her title in the $100,000 Powder Break Stakes.&lt;/p&gt;\r\n&lt;p&gt;"
}
In the react component I am rendering it in the following way:
<p
className="newsDescription"
dangerouslySetInnerHTML={{ __html:this.props.story.desp }}
/>
I tried to escape the HTML, but it is not working.
You have to replace the HTML entities in your string with the corresponding characters. Since your example only contains &lt;p&gt; (which corresponds to <p>), you can do this:
Check that this.props.story.desp has a value and is a string, and then replace:
<span
className="newsDescription"
dangerouslySetInnerHTML={{ __html: this.props.story.desp.replace(/&lt;/g, '<').replace(/&gt;/g, '>')}}
/>
This will generate a <span> element containing a <p> element (the paragraph coming from your API) with your text. Also, notice that the g flag replaces all occurrences, so every paragraph tag is restored.
Unfortunately there is no generic vanilla JavaScript string function for replacing all possible entities.
Note that I changed the <p> tag to a <span> because a <p> element should not contain other block elements such as another <p>. Have a look at this question on SO.
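A small sketch of the replacement step as a reusable function, assuming only a handful of common entities need decoding. The name decodeBasicEntities and the entity list are hypothetical. It decodes in a single pass, so a double-escaped value such as &amp;lt; is not decoded twice, and it returns an empty string for a missing or non-string value.

```javascript
// Map of the few entities this sketch handles (an assumed, partial list).
const ENTITIES = {
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#39;': "'",
  '&amp;': '&',
};

// Decode the entities above in one pass; anything else is left untouched.
function decodeBasicEntities(value) {
  if (typeof value !== 'string') return '';
  return value.replace(/&(?:lt|gt|quot|#39|amp);/g, match => ENTITIES[match]);
}

console.log(decodeBasicEntities('&lt;p&gt;Cali Thirty Seven&lt;/p&gt;'));
// prints: <p>Cali Thirty Seven</p>
```

In the component this could be used as dangerouslySetInnerHTML={{ __html: decodeBasicEntities(this.props.story.desp) }}. Keep in mind that injecting decoded markup from an API is only safe if the source is trusted or the markup is sanitized first.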

Get Only Tag ChildNodes Not TextNodes

I have been programming with JavaScript for a while now, but today I learned that element.childNodes returns a collection of nodes that includes the text nodes!
I have probably never run into trouble because I used to create the element and append text in paragraphs; but now that I think about it, things could get really messy with text as nodes!
My question is: how can I get only the child nodes that are tags, not text nodes?
For example:
<div id="e">
I don't want to include this text right here.
<p>I want to get this paragraph child</p>
I also want to <em>exclude</em> this text...
<img src="image.jpg" alt=" " />
<a>Google</a>
Exclude this text too..
</div>
Therefore, I want to get the p, img, a and maybe em objects only..
Here is a simple example of using .children to target only the elements. childNodes returns textNodes by default and at times that is very useful, but not appropriate for your example usage.
[].forEach.call(document.querySelector('#e').children,function(el){ el.style.color = 'red'; });
<div id="e">
I don't want to include this text right here.
<p>I want to get this paragraph child</p>
I also want to <em>exclude</em> this text...
<img src="image.jpg" alt=" " />
<a>Google</a>
Exclude this text too..
</div>
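For comparison, the same result can be obtained from childNodes by filtering on nodeType. This is a sketch with a hypothetical helper name, elementChildren. The constant 1 is the value of Node.ELEMENT_NODE in browsers and is hard-coded here only so that the function can also be exercised outside a browser.

```javascript
// Value of Node.ELEMENT_NODE; text nodes have nodeType 3.
const ELEMENT_NODE = 1;

// Return only the element children of a node, skipping text and comments.
function elementChildren(parent) {
  return Array.from(parent.childNodes).filter(
    node => node.nodeType === ELEMENT_NODE
  );
}

// Browser usage: the same set that .children would give, as a real array.
if (typeof document !== 'undefined') {
  const container = document.querySelector('#e');
  if (container) {
    elementChildren(container).forEach(el => { el.style.color = 'red'; });
  }
}
```

In practice .children is the simpler choice. The filter is mainly useful when a real array is needed or when other node types should be selected in the same way.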

JQuery - Paste event, stripping rich text

I have a contentEditable field where I'd like to do the following. When a user pastes rich text into the field (from Microsoft Word or elsewhere), I'd like to strip all the rich text formatting but retain the line breaks.
My thinking was: if you paste rich text into a plain <textarea>, it removes all the formatting and retains the breaks (created by explicit new lines as well as block-level elements). I'd like to somehow simulate this. In other words: create a temporary textarea, intercept the paste event, apply it to the textarea, retrieve the results, and insert them back into the original contentEditable field.
However, I haven't found a way to simulate this. If I paste the contents into a textarea via jQuery, it seems to retain all the rich text formatting when I try to copy it from there back to the original field.
You could achieve something like this without a textarea, by processing the content of the contentEditable div every time its value changes and removing all tags except paragraphs and line breaks.
The idea is that every time the content of the div changes (listen to the input event), you:
In the inner HTML, replace all </p> and <br> with non-HTML tokens (e.g. [p] and [br] respectively).
Remove all HTML tags (you can do it with .text()).
Replace the tokens with their HTML equivalents (and their opening tags!).
Wrap everything between <p> and </p>.
Here is a simple demo:
$("#editable").on("input", function() {
var $this = $(this); // declare locally instead of leaking a global
// replace all br and closing p tags with special tokens
$this.html($this.html().replace(/\<\/p\>/g,"[p]").replace(/\<br\>/g,"[br]"));
// remove all the tags, and then replace the tokens for their original values
$this.html("<p>" + $this.text().replace(/\[p\]/g, "</p><p>").replace(/\[br\]/g,"<br>") + "</p>");
});
div#editable, div#demo {
border:1px solid gray;
padding:6px;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<div contenteditable="true" id="editable"></div>
<h3>Demo</h3>
<p>Copy this code and paste it on the editable div above:</p>
<div id="demo">
<p>This <b>is</b> <i>a</i> styled paragraph.</p>
<p> </p>
<p>The <span style="font-weight:bold">above paragraph</span> is empty.</p>
<p>And this is a new paragraph…<br>with a line break!</p>
</div>
You can also see it running on this JSFiddle: http://jsfiddle.net/v5rae96w/
I tried this solution with MS Word and HTML and it works fine. But it has one issue: it only creates line breaks for p and br (which works nicely with MS Word and other word processors). If the user copies HTML that uses div (or other block elements that cause a line break), it won't work as nicely. If you need to break on all block elements, this solution may require some changes.
To fix that, you could treat all the block tags like p (or div, or whichever element you want) by listing them in the regular expression:
$this.html().replace(/(\<\/p\>|\<\/h1\>|\<\/h2\>|\<\/div\>)/gi,"[p]")
As you can see here:
$("#editable").on("input", function() {
var $this = $(this); // declare locally instead of leaking a global
// replace all closing block tags with special token
$this.html($this.html().replace(/(\<\/p\>|\<\/h1\>|\<\/h2\>|\<\/h3\>|\<\/h4\>|\<\/h5\>|\<\/h6\>|\<\/div\>)/gi,"[p]").replace(/\<br\>/gi,"[br]"));
// remove all the tags, then turn the tokens back into paragraph and line breaks
$this.html("<p>" + $this.text().replace(/\[p\]/g,"</p><p>").replace(/\[br\]/g,"<br>") + "</p>");
});
div#editable, div#demo {
border:1px solid gray;
padding:6px;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<div contenteditable="true" id="editable"></div>
<div>
<h3>Demo</h3>
<p>Copy this code and paste it on the editable div above:</p>
<div id="demo">
<p>This <b>is</b> <i>a</i> styled paragraph.</p>
<p> </p>
<p>The <span style="font-weight:bold">above paragraph</span> is empty.</p>
<p>And this is a new paragraph…<br>with a line break!</p>
</div>
</div>
Or on this JSFiddle: http://jsfiddle.net/v5rae96w/1/
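The token approach above can also be sketched as a pure string function without jQuery, which makes it easier to try in isolation. This is an illustration under assumptions, not the answer's exact code: the name stripRichText is hypothetical, tags are removed with a regular expression instead of .text() (so HTML entities are not decoded), the trailing empty paragraph is dropped as an extra step, and literal "[p]" or "[br]" typed by the user would be converted as well. It is not a sanitizer.

```javascript
// Strip formatting tags but keep paragraph and line breaks.
function stripRichText(html) {
  // 1. Replace closing block tags and <br> with non-HTML tokens.
  const tokenized = html
    .replace(/<\/(?:p|div|h[1-6])\s*>/gi, '[p]')
    .replace(/<br\s*\/?>/gi, '[br]');

  // 2. Remove every remaining tag.
  const text = tokenized.replace(/<[^>]*>/g, '');

  // 3. Turn the tokens back into markup and wrap everything in <p>...</p>.
  const rebuilt =
    '<p>' +
    text.replace(/\[p\]/g, '</p><p>').replace(/\[br\]/g, '<br>') +
    '</p>';

  // 4. Drop the empty paragraph left behind by the final closing tag.
  return rebuilt.replace(/<p><\/p>$/, '');
}

console.log(stripRichText('<p>This <b>is</b> bold</p><div>Line<br>two</div>'));
// prints: <p>This is bold</p><p>Line<br>two</p>
```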
