\b metacharacter not working correctly

I know this code has worked before, but it has now stopped working. I am working on a personal highlighter tool, but if I highlight a word and then search for "mark", it highlights the actual mark element. Here is my code:
function Search (tagC) {
var notes = document.getElementsByClassName("NoteOp");
for (var i = 0; i < notes.length; i++) {
var n = notes[i];
var tagOut = tagC
var tagFront = tagOut.slice(0, -9);
var tagLast = tagOut.slice(-9);
n.innerHTML = n.innerHTML.replace(new RegExp("\\b(" + tagFront + ")\\b", "gim"), "<mark class=" + tagLast + ">$1</mark>");
if(window.Bold === "Yes") {
$("mark").css("font-weight", "bold");
}
}
}
tagFront is the search term, while tagLast is a class name that always has 9 letters. Can anyone see a problem with this code?
An example of tagC would be:
testYelColBox
...and the text I'm searching looks like this:
<div id="NoteHolder">
<p class="NoteOp">This is a test paragraph uses to TeSt filters.</p>
<p class="NoteOp">Random words, I need to see if it will mess up mark</p>
</div>
Main question: Why does my code mark an HTML element even though my regex uses the \b metacharacter?

Your problem seems to be this:
If you highlight a word once, it works correctly. But after that your HTML contains <mark> tags, so when you search a second time for the word "mark", the existing tag itself gets a nested mark, which is undesired and makes your HTML invalid.
Why this happens
The \b escape matches any position in the string being searched where a word character (a letter, digit, or underscore) meets a non-word character or the start or end of the string. Because your regex runs on innerHTML, that string includes the markup, so \b also matches at the position right after the < of <mark ...> and at the position right after the k (because of the space that follows).
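A quick way to see this outside the browser is to run the question's pattern against a string that already contains a highlight. The HTML string below is a made-up example of what innerHTML might look like after a first pass; the pattern is built the same way as in the question.

```javascript
// Hypothetical innerHTML after a first highlight pass
var html = '<mark class="YelColBox">test</mark> paragraph';

// Same construction as the question: \b(term)\b with the "gim" flags
var pattern = new RegExp("\\b(mark)\\b", "gim");

// Record the start index of every match
var matchPositions = [];
for (var match of html.matchAll(pattern)) {
  matchPositions.push(match.index);
}

// Both hits are inside tags: the opening <mark ...> and the closing </mark>
console.log(matchPositions);
```

Neither match is in the visible text, which is why restricting the replacement to text nodes avoids the problem.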
Solution
Do a controlled replacement by only applying it to text nodes, not to HTML elements. For this you need to iterate over the nodes, check their type, and when they are text nodes, perform the replacement. As the replacement involves the insertion of an HTML element, you should actually split that text node into what comes before, the mark element, and what comes after.
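The core of the per-text-node step is splitting the text on a regex with one capture group: String.prototype.split then returns the matched pieces at odd indexes and the unmatched text at even indexes. The DOM-free sketch below shows only that step; splitForHighlight is an illustrative name, and it assumes the search term contains only word characters (otherwise it should be regex-escaped first).

```javascript
// DOM-free sketch of the text-node step; assumes `term` contains only word characters.
function splitForHighlight(text, term) {
  var regex = new RegExp("\\b(" + term + ")\\b", "gi");
  // One capture group, so split() returns: unmatched, match, unmatched, match, ...
  return text.split(regex)
    .map(function (part, k) {
      return { highlight: k % 2 === 1, text: part };
    })
    .filter(function (piece) {
      return piece.text !== ""; // drop empty pieces at the start or end
    });
}

var pieces = splitForHighlight("This is a test paragraph uses to TeSt filters.", "test");
console.log(pieces.filter(function (p) { return p.highlight; }).map(function (p) { return p.text; }));
// [ 'test', 'TeSt' ]
```

In the DOM version, each highlighted piece becomes a cloned <mark> element and every other piece becomes a plain text node.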
Here is code that does all this:
function clear() {
var notes = document.getElementsByClassName("NoteOp");
for (var i = 0; i < notes.length; i++) {
var n = notes[i];
// Remove all opening/closing mark tags
n.innerHTML = n.innerHTML.replace(/<\/?mark.*?>/gm, "");
}
}
function highlight(tagC) {
// Sanity check
if (tagC.length <= 9) return; // ignore wrong input
var notes = document.getElementsByClassName("NoteOp");
// Split into parts before entering loop:
var tagFront = tagC.slice(0, -9);
var tagLast = tagC.slice(-9);
// Escape tagFront characters that could conflict with regex syntax:
tagFront = tagFront.replace(/([.*+?^${}()|\[\]\/\\])/g, "\\$1");
var regex = new RegExp("\\b(" + tagFront + ")\\b", "gim");
// Create a template of the highlight that can be cloned
var mark = document.createElement('mark');
mark.setAttribute('class', tagLast);
// Loop over notes
for (var i = 0; i < notes.length; i++) {
// Create a span that will have the contents after replacements
var span = document.createElement('span');
// Go through all child nodes of this note
var nodes = notes[i].childNodes;
for (var j = 0; j < nodes.length; j++) {
var node = nodes[j];
if (node.nodeType === 3) {
// Only if text node, perform replacement
var parts = node.textContent.split(regex);
// As regex has capture group, also the split expression is a part
for (var k = 0; k < parts.length; k++) {
// Add this part
if (k % 2) {
// Add highlighted text
mark.textContent = parts[k];
span.appendChild(mark.cloneNode(true));
} else {
// Add text part that did not match as such
span.appendChild(document.createTextNode(parts[k]));
}
}
} else {
// Non-text nodes are just copied as they are
span.appendChild(node.cloneNode(true));
}
}
// Replace note contents with new contents
notes[i].innerHTML = span.innerHTML;
}
// Setting style for CSS class should happen outside of the loop
$("mark").css("font-weight", window.Bold === "Yes" ? "bold": "normal");
}
// I/O
var inp = document.querySelector('#inp');
var btnMark = document.querySelector('#mark');
var btnClear = document.querySelector('#clear');
btnMark.onclick = function () {
highlight(inp.value + 'YelColBox');
}
btnClear.onclick = clear;
Type text to be highlighted and press Mark:<br>
<input id="inp" value="">
<button id="mark">Mark</button>
<button id="clear">Clear</button>
<div id="NoteHolder">
<p class="NoteOp">This is a test paragraph uses to TeSt filters.</p>
<p class="NoteOp">Random words, I need to see if it will mess up mark</p>
</div>

Related

Locate node by character position(position relative to innerText)

I have this structure (can be more complex/contain nested tags)
<div id="editor">
<p>Some <i>text</i>
<p>Some other text</p>
</div>
I'm extracting the text as it is visible on screen with editor.innerText
And I get:
Some text
Some other text
My backend service analyzes this text and gives me highlights with "positions". For the example above let's say it returns (0,9) meaning start is text node containing "Some" and end node is the one containing "text". How can I locate these nodes by given positions?
I've had some success collecting the text with DOM traversal and keeping track of positions, but I'm losing newlines and some whitespace (due to textContent).

The answer to this isn't trivial.
The best solution is to pass the innerHTML to your backend service so it can highlight the text correctly, which means it will need to be able to parse HTML.
Alternatively, you can pass your innerText to the backend, then step through all the characters in the innerHTML and ignore all the characters inside angle brackets.
This will probably require some cleaning up of whitespace, and a bit of HTML mangling, but I'll leave that up to you.
Here's an example of what I mean
let searchHtml = "<p>Some <i>text</i><p>Some other text</p>";
let outputHtml = "";
let highlightOpenTag = "<b>";
let highlightCloseTag = "</b>";
let currentlyHighlighting = false;
// Start and end positions from your backend
let startIndex = 0;
let endIndex = 9;
let inTag = false;
let textIndex = 0;
for (let i = 0; i < searchHtml.length; i++) {
let char = searchHtml[i];
// We don't want to insert highlight tags when we're inside a tag already
if (char === '<') inTag = true;
if (inTag) {
outputHtml += char;
} else {
// If we're not in a tag, but we are within the text positions
// returned from the backend, let's get highlighting
if (textIndex >= startIndex && textIndex < endIndex) {
if (!currentlyHighlighting) {
outputHtml += highlightOpenTag;
currentlyHighlighting = true;
}
}
outputHtml += char;
// If we're highlighting and the next character starts a tag, or we've reached
// the end of the string or the end of the requested range, close the highlight
if (currentlyHighlighting && (i + 1 >= searchHtml.length || searchHtml[i + 1] === '<' || textIndex + 1 >= endIndex)) {
outputHtml += highlightCloseTag;
currentlyHighlighting = false;
}
textIndex++;
}
if (char === '>') inTag = false;
}
console.log(outputHtml);
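For the traversal approach mentioned in the question (collecting text node by node and tracking positions), the remaining step is mapping a global offset back to a node. A minimal sketch, assuming the text node contents have already been collected into an array in document order; locateOffset is a hypothetical helper, and plain strings stand in for the text nodes. Offsets only line up with innerText if the collected strings contain the same whitespace and newlines, which is exactly the mismatch the question describes.

```javascript
// Hypothetical helper: map a global text offset to (segment index, offset inside it).
// `segments` stands in for the nodeValue of each text node, in document order.
function locateOffset(segments, position) {
  var start = 0;
  for (var index = 0; index < segments.length; index++) {
    var end = start + segments[index].length;
    if (position < end) {
      return { index: index, offset: position - start };
    }
    start = end;
  }
  return null; // position lies beyond the collected text
}

var segments = ["Some ", "text", "Some other text"];
console.log(locateOffset(segments, 0)); // { index: 0, offset: 0 }
console.log(locateOffset(segments, 8)); // { index: 1, offset: 3 }
```

For an exclusive end such as the 9 in (0,9), look up position 8, the last highlighted character, which lands in the "text" segment.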

Adding style tags around specific characters only

I am trying to add inline styling to only numbers in paragraph elements. For example:
<p>This paragraph has the numbers 1 and 2 in it.</p>
So in this instance, I would want to put <span class="style">1</span> and <span class="style">2</span> around the two numbers in that paragraph.
I am trying to write a javascript to accomplish this so I don't have to go back into the document I'm working on and manually add the styling tags around each number, as the document is very long.
So far this is what I wrote, but I'm having difficulty figuring out what to do for the next step on how to incorporate the edits back into the paragraph HTML.
let regEx=/[0-9]/g;
let list = [];
let paragraphs = document.getElementsByTagName("p");
for (var i = 0; i < paragraphs.length; i++) {
let html = paragraphs[i].innerHTML;
list.push(html);
}
// all paragraphs into one string.
let joined = list.join(' ');
// all the numbers in the paragraphs stored in array
let numbers = joined.match(regEx);
// define array for styling edits
let edits = [];
// adding the styling tags to each num
numbers.forEach(function(num){
edits.push('<span class="style">' + num + '</span>');
// outputs ["<span class='style'>3</span>", "<span class='style'>7</span>", "<span class='style'>4</span>", "<span class='style'>5</span>"]
});
// need to insert edits into paragraph html
If anyone can offer any suggestions on how I might be able to accomplish this that would be great, I am still relatively new to working with JS.

const paragraphs = document.getElementsByTagName("p");
for (var i = 0; i < paragraphs.length; i++) {
const regEx=/([0-9])/g;
const newHtml = paragraphs[i].innerHTML.replace(regEx, '<span class="style">$1</span>');
paragraphs[i].innerHTML = newHtml;
}
I updated your regex to put the number in a capture group; in the replacement string you can reference that group, and since there is only one, it is $1. As you can see, the replacement wraps the number in the appropriate span, and the result is plugged right back into innerHTML.
I noticed that your regex only captures single-digit numbers. If you want to capture multi-digit numbers, you can update it like this: /([0-9]+)/g.
I created a simple jsfiddle to show you how it works: https://jsfiddle.net/andyorahoske/dd6k6ekp/35/
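One caveat with running the replacement on innerHTML: digits inside the markup itself, such as class="h2" or href="page1.html", get wrapped too and break the HTML. A sketch of one way around it, assuming the markup has no > characters inside attribute values and no numeric character references like &#160; (whose digits would still be wrapped): match whole tags as a separate alternative and hand them back unchanged. The function name is hypothetical.

```javascript
// Hypothetical helper: wrap runs of digits that sit outside of tags.
function wrapNumbersOutsideTags(html) {
  // Alternative 1 captures a whole tag, alternative 2 captures a run of digits.
  return html.replace(/(<[^>]*>)|(\d+)/g, function (match, tag, digits) {
    // Tags are returned unchanged; only digits in the text get a span.
    return tag ? tag : '<span class="style">' + digits + '</span>';
  });
}

console.log(wrapNumbersOutsideTags('<a href="page1.html">Chapter 12</a>'));
// <a href="page1.html">Chapter <span class="style">12</span></a>
```

In the loop, it would replace the inline call: paragraphs[i].innerHTML = wrapNumbersOutsideTags(paragraphs[i].innerHTML);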

I broke out the most fundamental part of this into a reusable function that you may find helpful in other contexts.
/**
* Wraps numbers in a string with any provided wrapper.
* @param {String} str A string containing numbers to be wrapped.
* @param {String} wrapper A string with a %s placeholder that defines the wrapper. Example: <pre>%s</pre>
* @return {String} The original string with numbers wrapped using the wrapper param.
*/
function wrapNumbers(str, wrapper) {
var numbersInStr = str.match(/\d+/g) || [];
var chunks = [];
var segmentStart = 0;
for(var i = 0; i < numbersInStr.length; i += 1) {
var number = numbersInStr[i];
var indexOfNumber = str.indexOf(number, segmentStart); // search after the previous match, so "2" isn't found inside an earlier "12"
var fWrapper = wrapper.replace('%s', number);
chunks.push(str.slice(segmentStart, indexOfNumber));
chunks.push(fWrapper);
segmentStart = indexOfNumber + number.length;
}
if(segmentStart < str.length) {
chunks.push(str.slice(segmentStart, str.length));
}
return chunks.join('');
}
To use this in your use case it might look like the following:
var paragraphs = document.getElementsByTagName("p");
var wrapper = '<span class="style">%s</span>';
for(var i = 0; i < paragraphs.length; i += 1) {
var paragraph = paragraphs[i];
paragraph.innerHTML = wrapNumbers(paragraph.innerHTML, wrapper);
}
Codepen: https://codepen.io/bryceewatson/pen/vRqeVy?editors=1111
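The same contract (each run of digits substituted into the %s placeholder of wrapper) can also be expressed as a single replace call with a callback. A sketch; wrapNumbersWithReplace is a hypothetical name. Because replace hands the callback each match in order, there is no need to look up where a number occurred, so a string like "12 and 2" can't confuse the two numbers.

```javascript
// Hypothetical alternative to wrapNumbers: one replace() call with a callback.
function wrapNumbersWithReplace(str, wrapper) {
  return str.replace(/\d+/g, function (number) {
    // Substitute the matched digits into the first "%s" of the wrapper
    return wrapper.replace('%s', number);
  });
}

console.log(wrapNumbersWithReplace('12 apples and 2 pears', '<b>%s</b>'));
// <b>12</b> apples and <b>2</b> pears
```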

Here's a new code https://jsfiddle.net/fazanaka/au4jufrr/1/
var element = document.getElementById('text'),
text = element.innerText,
wordsArray = text.split(' '),
newString;
for(var i = 0; i < wordsArray.length; i++){
if(!isNaN(parseFloat(wordsArray[i])) && isFinite(wordsArray[i])){
wordsArray[i] = "<span class='style'>" + wordsArray[i] + "</span>";
}
}
newString = wordsArray.join(' ');
element.innerHTML = newString;
I hope it helps you
UPD:
For all paragraphs https://jsfiddle.net/fazanaka/qx2ehym4/
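The token test in this answer can be tried without the DOM. The sketch below (wrapNumericWords is a hypothetical name) applies the same split(' ') and parseFloat/isFinite check to a plain string. It also shows a quirk of that check: a token like "2." passes, because Number("2.") is 2, so the period ends up inside the span, while "1," fails the isFinite test and is not wrapped at all.

```javascript
// Hypothetical DOM-free version of the answer's token test.
function wrapNumericWords(text) {
  return text.split(' ').map(function (word) {
    // Same check as the answer: parseFloat must succeed and the whole token must be finite
    var isNumber = !isNaN(parseFloat(word)) && isFinite(word);
    return isNumber ? "<span class='style'>" + word + "</span>" : word;
  }).join(' ');
}

console.log(wrapNumericWords("numbers 1 and 2 in it."));
// numbers <span class='style'>1</span> and <span class='style'>2</span> in it.
console.log(wrapNumericWords("items 1, 2."));
// items 1, <span class='style'>2.</span>
```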

Pure Javascript - Find/Replace a list of words in DOM

I have a list of words and I want to replace them in DOM.
The replacement works, but it has an issue: it puts an unwanted slash before and after the replaced text (e.g. "reduced to a brigade" becomes "reduced/ OTHER WORD 1 /a brigade").
Also, I need to wrap the replaced word with an html tag, like .
(i.e. reduced to a brigade > reduced OTHER WORD 1 a brigade)
You can see the code here:
https://jsfiddle.net/realgrillo/9uL6ofge/
var elements = document.getElementsByTagName('*');
var words = [
[/(^|\s)([Tt]o)(\s|$)/ig, 'OTHER WORD 1'],
[/(^|\s)([Aa]nd)(\s|$)/ig, 'OTHER WORD 2']
];
for (var i = 0; i < elements.length; i++) {
var element = elements[i];
for (var j = 0; j < element.childNodes.length; j++) {
var node = element.childNodes[j];
if (node.nodeType === 3) {
var text = node.data;
for (var k = 0; k < words.length; k++)
{
var from = new RegExp(words[k][0]);
var to = new RegExp('$1' + words[k][1] + '$3');
var replacedText = text.replace(from, to);
//console.log("from: " + from);
//console.log("to: " + to);
//console.log("toDo: " + toDo);
if (replacedText !== text)
{
element.innerHTML = element.textContent.replace(from, to);
console.log("text: " + text);
console.log("replacedText: " + replacedText);
}
}
/*
//
//!!!!THIS CODE FOR 1 WORD WORKS!!!! I need to do this for the list of words.
//
var replacedText = text.replace(/(^|\s)([Tt]o)(\s|$)/ig, '$1OTHER WORD$3');
if (replacedText !== text) {
element.innerHTML = element.textContent.replace(/(^|\s)([Tt]o)(\s|$)/ig, '$1<b class=modified>OTHER WORD</b>$3');
}
*/
}
}
}
Masters of JS, can you help me? Pure javascript, not jQuery please.
Thanks.

Here is a much simpler function you can use. First, define a function for replacing words like this:
String.prototype.replaceAll = function (search, replacement) {
try {
var target = this;
return target.split(search).join(replacement);
} catch (e) {
return "";
}
};
The idea is that the plain JavaScript replace function only replaces the first occurrence (which is true for a string pattern or a regex without the g flag). After that, you can use it in your code like this:
var replacedText = text.replaceAll(/(^|\s)([Tt]o)(\s|$)/ig, '$1OTHER WORD$3');
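Two things to be aware of with this approach. First, modern engines already provide a native String.prototype.replaceAll (ES2021), and this snippet overwrites it. Second, split() does not interpret $1/$3 in the joined string, and when the separator regex has capture groups, the captured text is spliced into the result array. The sketch below compares the split/join output with a plain replace using the g flag, which already replaces every match; the sample text is made up.

```javascript
var text = "go to and to";
var pattern = /(^|\s)([Tt]o)(\s|$)/ig;

// split/join: captured groups stay in the array, and "$1"/"$3" are inserted literally
var viaSplit = text.split(pattern).join('$1OTHER WORD$3');

// replace with the g flag: every match is replaced and $1/$3 are expanded
var viaReplace = text.replace(pattern, '$1OTHER WORD$3');

console.log(viaSplit.indexOf('$1') !== -1); // true
console.log(viaReplace); // go OTHER WORD and OTHER WORD
```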

You don't need to wrap words[k][0] in a new RegExp, because it already is one (though it won't break, since the RegExp constructor accepts an existing RegExp and copies it).
Likewise, you don't need to turn words[k][1] into a RegExp: the second argument of String.prototype.replace is expected to be a plain string (or a function). The slashes appear because the RegExp is converted to a string such as "/$1OTHER WORD 1$3/", and those slashes end up in the output.
So this is a change that will fix it:
var to = '$1' + words[k][1] + '$3';
And this is optional but probably a good practice:
var from = words[k][0];
To get an HTML tag around the replacement, you can add it either directly in each words[k][1] or in the declaration of to, depending on what you want.
Extra:
In the for loop, the code assumes that there are always at least three groups (because of the '$3' reference), an alternative could be to delegate this replacement directly in words[k][1], but it is up to you to decide.
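Putting these changes together, a DOM-free sketch of the replacement loop could look like the following. It assumes, as the question's patterns do, that every pattern has the three groups (^|\s), the word, and (\s|$), and it wraps each replacement in the <b class=modified> tag from the commented-out code in the question. applyWordList is a hypothetical name.

```javascript
// Each entry: [pattern with groups (^|\s)(word)(\s|$), replacement word]
var words = [
  [/(^|\s)([Tt]o)(\s|$)/ig, 'OTHER WORD 1'],
  [/(^|\s)([Aa]nd)(\s|$)/ig, 'OTHER WORD 2']
];

// Hypothetical helper: apply every [pattern, word] pair in turn
function applyWordList(text, list) {
  return list.reduce(function (result, entry) {
    // Plain string replacement: $1 and $3 put back the surrounding whitespace
    var to = '$1<b class=modified>' + entry[1] + '</b>$3';
    return result.replace(entry[0], to);
  }, text);
}

console.log(applyWordList('reduced to a brigade and more', words));
// reduced <b class=modified>OTHER WORD 1</b> a brigade <b class=modified>OTHER WORD 2</b> more
```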

Javascript Markdown Parsing

I'm working on a markdown to HTML parser. I understand this is a big project and there are third-party libraries, but nonetheless I want to roll my own simple solution that doesn't have to handle every single aspect of markdown.
So far the process is to take an input (in my case the value of a textarea) and parse it line by line.
var html = [];
var lines = txt.split('\n'); //Convert string to array
//Remove empty lines
for(var index = lines.length-1; index >= 0; index--) {
if(lines[index] == '') lines.splice(index, 1);
}
//Parse line by line
for(var index = 0; index <= lines.length-1; index++) {
var str = lines[index];
if(str.match(/^#[^#]/)) {
//Header
str = str.replace(/#(.*?)$/g, '<h1>$1</h1>');
} else if(str.match(/^##[^#]/)) {
//Header 2
str = str.replace(/##(.*?)$/g, '<h2>$1</h2>');
} else if(str.match(/^###[^#]/)) {
//Header 3
str = str.replace(/###(.*?)$/g, '<h3>$1</h3>');
} else if(str.trim().startsWith('+')) {
//Unordered List
var orig = str;
str = str.replace(/\+(.*?)$/, '<li>$1</li>');
var previous, next;
if(index > 0) previous = lines[index-1];
if(!previous || previous && previous.indexOf('+') < orig.indexOf('+')) {
str = '<ul>' + str;
}
if(index < lines.length-1) next = lines[index+1];
if(!next || next && next.indexOf('+') < orig.indexOf('+')) {
var count = Math.max(0, orig.indexOf('+') / 4);
if(next) count = count - Math.max(0, next.indexOf('+') / 4);
for(var i=1; i<=count; i++) {
str = str + '</ul>';
}
}
if(next && next.trim().indexOf('+') == -1) str = str + '</ul>';
} else if(str.match(/^[0-9a-zA-Z]/)) {
//Paragraph
str = str.replace(/^([0-9a-zA-Z].*?)$/g, '<p>$1</p>');
}
//Inline formatting
str = str.replace(/\*\*(.*?)\*\*/g, '<strong>$1</strong>'); //Bold
str = str.replace(/\_\_(.*?)\_\_/g, '<strong>$1</strong>'); //Another bold
str = str.replace(/\*(.*?)\*/g, '<em>$1</em>'); //Italics
str = str.replace(/\_(.*?)\_/g, '<em>$1</em>'); //Another italics
//Append formatted to return string
html.push(str);
}
Where I run into problems is with nested blocks such as ul. Currently the code looks at a line that starts with a + and wraps it in an li. Great, but these list items never get placed within a ul. I could run through the output again after the line by line and just wrap every group of li's, but that screws me up when I have nested li's that require their own ul.
Any thoughts on how to apply these additional wrapper tags? I've considered using my own special characters around list type elements so I know where to add the wrapper tags, but that breaks traditional markdown. I wouldn't be able to pass the raw markdown to someone other than myself and know they'd understand what was going on.
Edit: I updated my code sample to include a working sample, which also supports nested lists.

You need a very simple state machine.
When you encounter the first + you add <ul> and raise a flag.
When you reach a line that doesn't start with + while the flag is raised, close the list with </ul> and lower the flag.
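A minimal sketch of that flag, assuming flat (non-nested) lists whose items start with +, and treating every other non-empty line as a paragraph; renderLines is a hypothetical name. Nested lists would need a counter or a stack of open lists instead of a single boolean.

```javascript
// Hypothetical sketch of the flag-based state machine for flat "+" lists.
function renderLines(lines) {
  var html = [];
  var inList = false; // the "raised flag"
  lines.forEach(function (line) {
    var item = line.match(/^\s*\+\s*(.*)$/);
    if (item) {
      // First list item after non-list content: open the list
      if (!inList) {
        html.push('<ul>');
        inList = true;
      }
      html.push('<li>' + item[1] + '</li>');
    } else {
      // Any other line while the flag is raised: close the list first
      if (inList) {
        html.push('</ul>');
        inList = false;
      }
      if (line.trim() !== '') html.push('<p>' + line + '</p>');
    }
  });
  if (inList) html.push('</ul>'); // close a list that runs to the end of the input
  return html;
}

console.log(renderLines(['+ one', '+ two', 'Text', '+ three']).join(''));
// <ul><li>one</li><li>two</li></ul><p>Text</p><ul><li>three</li></ul>
```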

JavaScript: loop through DOM tree and replace text?

I need to use JavaScript to loop through the DOM tree of a webpage and replace every instance of the word 'hipster' with a different word UNLESS it is part of a link or image src. For example, if 'hipster' appears in a paragraph, it should be replaced.
But if it's in the src="" url for an image, that should not be replaced because if it replaces that word in a url, the url obviously breaks.
I've been having a really hard time implementing this. Here's one thing I tried:
var items = document.getElementsByTagName("*");
var i = 0;
for (i = 0; i < items.length; i++){
if(i.nodeType == 3){
i.html().replace(/hipster/gi, 'James Montour');
}
else{
//do nothing for now
}
}

You’re close, but:
document.getElementsByTagName('*') will never return text nodes; it gets elements only (and comment nodes in old versions of IE)
Text nodes don’t have an html() method
replace doesn’t modify strings in place
With that in mind, you can get every child of every element, check their types, read their nodeValues, and, if appropriate, replace the entire node:
var elements = document.getElementsByTagName('*');
for (var i = 0; i < elements.length; i++) {
var element = elements[i];
for (var j = 0; j < element.childNodes.length; j++) {
var node = element.childNodes[j];
if (node.nodeType === 3) {
var text = node.nodeValue;
var replacedText = text.replace(/hipster/gi, 'James Montour');
if (replacedText !== text) {
element.replaceChild(document.createTextNode(replacedText), node);
}
}
}
}
Try it out on this page!

To use variables instead of the hard-coded 'hipster', replace:
var replacedText = text.replace(/hipster/gi, 'James Montour');
with these lines:
var needle = 'hipster';
var replacement = 'James Montour';
var regex = new RegExp(needle, "gi");
var replacedText = text.replace(regex, replacement);
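One caveat with building the regex from a variable: characters such as . or ( in needle are treated as regex syntax, and $ sequences in replacement (such as $&) are expanded by replace. A sketch that escapes the needle and passes the replacement through a function so it is inserted literally; escapeRegExp and replaceLiteral are illustrative names.

```javascript
// Escape characters that have special meaning in a RegExp pattern,
// so the needle is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: case-insensitive, literal find/replace of every occurrence
function replaceLiteral(text, needle, replacement) {
  var regex = new RegExp(escapeRegExp(needle), 'gi');
  // A function replacement stops "$" sequences in `replacement` from being interpreted
  return text.replace(regex, function () { return replacement; });
}

console.log(replaceLiteral('Hipster hipster', 'hipster', 'James Montour'));
// James Montour James Montour
console.log(replaceLiteral('a.b and axb', 'a.b', 'X'));
// X and axb
```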
