Locate node by character position (position relative to innerText) - javascript

I have this structure (can be more complex/contain nested tags)
<div id="editor">
<p>Some <i>text</i>
<p>Some other text</p>
</div>
I'm extracting the text as it is visible on screen with editor.innerText
and I get:
Some text
Some other text
My backend service analyzes this text and gives me highlights with "positions". For the example above, let's say it returns (0,9), meaning the start node is the text node containing "Some" and the end node is the one containing "text". How can I locate these nodes from the given positions?
I've had some success collecting text with DOM traversal and keeping track of positions, but I'm losing newlines and some whitespace (due to textContent).

The answer to this isn't trivial.
The best solution is to pass the innerHTML to your backend service so that it can highlight the text correctly; for that, the backend needs to be able to parse HTML.
However, your other option is to pass the innerText to the backend, then step through all the characters in the innerHTML and skip every character inside angle brackets when counting positions.
This will probably require some cleaning up of whitespace, and a bit of HTML mangling, but I'll leave that up to you.
Here's an example of what I mean
let searchHtml = "<p>Some <i>text</i><p>Some other text</p>";
let outputHtml = "";
let highlightOpenTag = "<b>";
let highlightCloseTag = "</b>";
let currentlyHighlighting = false;
// Start and end positions from your backend
let startIndex = 0;
let endIndex = 9;
let inTag = false;
let textIndex = 0;
for (let i = 0; i < searchHtml.length; i++) {
  let char = searchHtml[i];
  // We don't want to insert highlight tags when we're inside a tag already
  if (char === '<') inTag = true;
  if (inTag) {
    outputHtml += char;
  } else {
    // If we're not in a tag, but we are within the text positions
    // returned from the backend, let's get highlighting
    if (textIndex >= startIndex && textIndex < endIndex) {
      if (!currentlyHighlighting) {
        outputHtml += highlightOpenTag;
        currentlyHighlighting = true;
      }
    }
    outputHtml += char;
    // If we're already highlighting and the next character starts a tag
    // (or the input ends, or the range ends), insert our end highlight tag
    let nextStartsTag = i + 1 >= searchHtml.length || searchHtml[i + 1] === '<';
    if (currentlyHighlighting && (nextStartsTag || textIndex + 1 >= endIndex)) {
      outputHtml += highlightCloseTag;
      currentlyHighlighting = false;
    }
    textIndex++;
  }
  if (char === '>') inTag = false;
}
console.log(outputHtml);
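If you would rather work on the DOM than on the HTML string, the same position bookkeeping can be done over text nodes. The following is only a minimal sketch: locateTextNode is a hypothetical helper, and it counts offsets the way textContent does, so innerText positions that include line breaks between block elements would have to be adjusted first. It relies on nothing but nodeType, nodeValue and childNodes, which is why the plain objects in the usage lines can stand in for real DOM nodes.

```javascript
// Hypothetical helper: walk the tree in document order and return the text
// node that contains the given character position, plus the offset inside it.
// Assumption: positions are counted as in textContent (no synthetic newlines).
function locateTextNode(root, position) {
  let consumed = 0; // number of text characters seen so far
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node.nodeType === 3) {
      const length = node.nodeValue.length;
      if (position >= consumed && position < consumed + length) {
        return { node: node, offset: position - consumed };
      }
      consumed += length;
    } else if (node.childNodes) {
      // Push children in reverse so they are popped in document order
      for (let i = node.childNodes.length - 1; i >= 0; i--) {
        stack.push(node.childNodes[i]);
      }
    }
  }
  return null; // position is past the end of the text
}

// Stand-in for <div><p>Some <i>text</i></p><p>Some other text</p></div>
const text = (value) => ({ nodeType: 3, nodeValue: value });
const element = (...children) => ({ nodeType: 1, childNodes: children });
const editor = element(
  element(text("Some "), element(text("text"))),
  element(text("Some other text"))
);

const start = locateTextNode(editor, 0);
const end = locateTextNode(editor, 8);
console.log(start.node.nodeValue, start.offset);
console.log(end.node.nodeValue, end.offset);
```

In a browser, the node and offset pairs found this way could then be handed to a Range (setStart/setEnd) to build the highlight.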

Related

Adding style tags around specific characters only

I am trying to add inline styling to only numbers in paragraph elements. For example:
<p>This paragraph has the numbers 1 and 2 in it.</p>
So in this instance, I would want to end up with <span class="style">1</span> and <span class="style">2</span> around the two numbers in that paragraph.
I am trying to write a script to accomplish this so I don't have to go back into the document I'm working on and manually add the styling tags around each number, as the document is very long.
So far this is what I wrote, but I'm having difficulty figuring out what to do for the next step on how to incorporate the edits back into the paragraph HTML.
let regEx=/[0-9]/g;
let list = [];
let paragraphs = document.getElementsByTagName("p");
for (var i = 0; i < paragraphs.length; i++) {
  let html = paragraphs[i].innerHTML;
  list.push(html);
}
// all paragraphs into one string.
let joined = list.join(' ');
// all the numbers in the paragraphs stored in array
let numbers = joined.match(regEx);
// define array for styling edits
let edits = [];
// adding the styling tags to each num
numbers.forEach(function(num){
  edits.push('<span class="style">' + num + '</span>');
  // outputs ["<span class='style'>3</span>", "<span class='style'>7</span>", "<span class='style'>4</span>", "<span class='style'>5</span>"]
});
// need to insert edits into paragraph html
If anyone can offer any suggestions on how I might be able to accomplish this that would be great, I am still relatively new to working with JS.
const paragraphs = document.getElementsByTagName("p");
for (var i = 0; i < paragraphs.length; i++) {
  const regEx=/([0-9])/g;
  const newHtml = paragraphs[i].innerHTML.replace(regEx, '<span class="style">$1</span>');
  paragraphs[i].innerHTML = newHtml;
}
I updated your regex to put the number in a capture group; in the replacement string you can then reference that group, and since there is only one, it is $1. The replacement wraps the match in the appropriate span, and the result is assigned straight back to innerHTML.
I did notice that your regex only captures single digits. If you want to capture multi-digit numbers, you could update your regex like this: /([0-9]+)/g.
I created a simple jsfiddle to show you how it works: https://jsfiddle.net/andyorahoske/dd6k6ekp/35/
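One caveat with running the replace over innerHTML is that digits sitting inside tags (for example in an href or a class name) get wrapped as well. A hedged sketch of one way around that is to split the HTML on tags and only touch the pieces in between. wrapDigitsOutsideTags is a made-up name, and the tag pattern is a simplification that assumes no ">" inside attribute values.

```javascript
// Hypothetical helper: wrap runs of digits, but leave anything inside <...> alone.
function wrapDigitsOutsideTags(html) {
  return html
    // The capture group keeps the tags in the result, at the odd indices
    .split(/(<[^>]*>)/)
    .map(function (part, index) {
      if (index % 2 === 1) return part; // a tag: copy unchanged
      return part.replace(/([0-9]+)/g, '<span class="style">$1</span>');
    })
    .join('');
}

const sample = '<a href="page2.html">Chapter 12</a> and 3';
console.log(wrapDigitsOutsideTags(sample));
// <a href="page2.html">Chapter <span class="style">12</span></a> and <span class="style">3</span>
```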
I broke out the most fundamental part of this into a reusable function that you may find helpful in other contexts.
/**
 * Wraps numbers in a string with any provided wrapper.
 * @param {String} str A string containing numbers to be wrapped.
 * @param {String} wrapper A string with placeholder %s to define the wrapper. Example - <pre>%s</pre>
 * @return {String} The original string with numbers wrapped using the wrapper param.
 */
function wrapNumbers(str, wrapper) {
  var numbersInStr = str.match(/\d+/g) || [];
  var chunks = [];
  var segmentStart = 0;
  for(var i = 0; i < numbersInStr.length; i += 1) {
    var number = numbersInStr[i];
    // Search from segmentStart so a repeated number is not found at its earlier position
    var indexOfNumber = str.indexOf(number, segmentStart);
    var fWrapper = wrapper.replace('%s', number);
    chunks.push(str.slice(segmentStart, indexOfNumber));
    chunks.push(fWrapper);
    segmentStart = indexOfNumber + number.length;
  }
  if(segmentStart < str.length) {
    chunks.push(str.slice(segmentStart, str.length));
  }
  return chunks.join('');
}
To use this in your use case it might look like the following:
var paragraphs = document.getElementsByTagName("p");
var wrapper = '<span class="style">%s</span>';
for(var i = 0; i < paragraphs.length; i += 1) {
  var paragraph = paragraphs[i];
  paragraph.innerHTML = wrapNumbers(paragraph.innerHTML, wrapper);
}
Codepen: https://codepen.io/bryceewatson/pen/vRqeVy?editors=1111
Here's another version: https://jsfiddle.net/fazanaka/au4jufrr/1/
var element = document.getElementById('text'),
    text = element.innerText,
    wordsArray = text.split(' '),
    newString;
for(var i = 0; i < wordsArray.length; i++){
  if(!isNaN(parseFloat(wordsArray[i])) && isFinite(wordsArray[i])){
    wordsArray[i] = "<span class='style'>" + wordsArray[i] + "</span>";
  }
}
newString = wordsArray.join(' ');
element.innerHTML = newString;
I hope it helps you
UPD:
For all paragraphs https://jsfiddle.net/fazanaka/qx2ehym4/

\b metacharacter not working correctly

I know this code has worked before, but it has now stopped working. I am working on a person highlighter tool, but if I type in a word and then type in mark, it highlights the actual mark element. Here is my code:
function Search (tagC) {
  var notes = document.getElementsByClassName("NoteOp");
  for (var i = 0; i < notes.length; i++) {
    var n = notes[i];
    var tagOut = tagC;
    var tagFront = tagOut.slice(0, -9);
    var tagLast = tagOut.slice(-9);
    n.innerHTML = n.innerHTML.replace(new RegExp("\\b(" + tagFront + ")\\b", "gim"), "<mark class=" + tagLast + ">$1</mark>");
    if(window.Bold === "Yes") {
      $("mark").css("font-weight", "bold");
    }
  }
}
tagFront is the search term while tagLast is a class that always has 9 letters. Any problems seen in the coding?
An example of tagC would be:
testYelColBox
...and the text I'm searching looks like this:
<div id="NoteHolder">
<p class="NoteOp">This is a test paragraph uses to TeSt filters.</p>
<p class="NoteOp">Random words, I need to see if it will mess up mark</p>
</div>
Main question: Why does my code mark an HTML element even though my code has a \b metacharacter selector?
Your problem seems to be this:
If you first highlight a word, it works correctly. But now your HTML has <mark> tags, so if now you search a second time with search word "mark", that tag gets a nested mark, which is undesired and makes your HTML invalid.
Why this happens
The \b escape matches any position in the search string where the character sequence switches from a word character (a letter, digit or underscore) to a non-word character or vice versa. This means \b also matches at the position right after the < of <mark ...>, and at the position right after the k (because of the space that follows).
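To see this in isolation, here is a small sketch run against a made-up HTML string that already contains a mark element (the sample string and the bracket replacement are only for illustration):

```javascript
// Sample HTML as it might look after a first highlight of "test"
const markedHtml = '<p>It will mess up <mark class="YelColBox">test</mark></p>';

// Same kind of regex as in the question, with "mark" as the search term
const wordRegex = new RegExp("\\b(mark)\\b", "gim");

// Both tag names match: \b sits between "<" and "m", and between "k" and " " or ">"
const matches = markedHtml.match(wordRegex);
const replaced = markedHtml.replace(wordRegex, "[$1]");

console.log(matches);  // [ 'mark', 'mark' ]
console.log(replaced); // <p>It will mess up <[mark] class="YelColBox">test</[mark]></p>
```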
Solution
Do a controlled replacement by only applying it to text nodes, not to HTML elements. For this you need to iterate over the nodes, check their type, and when they are text nodes, perform the replacement. As the replacement involves the insertion of an HTML element, you should actually split that text node into what comes before, the mark element, and what comes after.
Here is code that does all this:
function clear() {
  var notes = document.getElementsByClassName("NoteOp");
  for (var i = 0; i < notes.length; i++) {
    var n = notes[i];
    // Remove all opening/closing mark tags
    n.innerHTML = n.innerHTML.replace(/<\/?mark.*?>/gm, "");
  }
}
function highlight(tagC) {
  // Sanity check
  if (tagC.length <= 9) return; // ignore wrong input
  var notes = document.getElementsByClassName("NoteOp");
  // Split into parts before entering loop:
  var tagFront = tagC.slice(0, -9);
  var tagLast = tagC.slice(-9);
  // Escape tagFront characters that could conflict with regex syntax:
  tagFront = tagFront.replace(/([.*+?^${}()|\[\]\/\\])/g, "\\$1");
  var regex = new RegExp("\\b(" + tagFront + ")\\b", "gim");
  // Create a template of the highlight that can be cloned
  var mark = document.createElement('mark');
  mark.setAttribute('class', tagLast);
  // Loop over notes
  for (var i = 0; i < notes.length; i++) {
    // Create a span that will have the contents after replacements
    var span = document.createElement('span');
    // Go through all child nodes of this note
    var nodes = notes[i].childNodes;
    for (var j = 0; j < nodes.length; j++) {
      var node = nodes[j];
      if (node.nodeType === 3) {
        // Only if text node, perform replacement
        var parts = node.textContent.split(regex);
        // As regex has capture group, also the split expression is a part
        for (var k = 0; k < parts.length; k++) {
          // Add this part
          if (k % 2) {
            // Add highlighted text
            mark.textContent = parts[k];
            span.appendChild(mark.cloneNode(true));
          } else {
            // Add text part that did not match as such
            span.appendChild(document.createTextNode(parts[k]));
          }
        }
      } else {
        // Non-text nodes are just copied as they are
        span.appendChild(node.cloneNode(true));
      }
    }
    // Replace note contents with new contents
    notes[i].innerHTML = span.innerHTML;
  }
  // Setting style for CSS class should happen outside of the loop
  $("mark").css("font-weight", window.Bold === "Yes" ? "bold": "normal");
}
// I/O
var inp = document.querySelector('#inp');
var btnMark = document.querySelector('#mark');
var btnClear = document.querySelector('#clear');
btnMark.onclick = function () {
  highlight(inp.value + 'YelColBox');
}
btnClear.onclick = clear;
Type text to be highlighted and press Mark:<br>
<input id="inp" value="">
<button id="mark">Mark</button>
<button id="clear">Clear</button>
<div id="NoteHolder">
<p class="NoteOp">This is a test paragraph uses to TeSt filters.</p>
<p class="NoteOp">Random words, I need to see if it will mess up mark</p>
</div>
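The part of the code above that tends to surprise people is the split with a capture group: the matched words are kept in the result, always at the odd indices. Here is a small standalone sketch of just that step, without any DOM. escapeRegExp and splitOnWord are illustrative names, not functions from the answer.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Split text around whole-word, case-insensitive occurrences of word.
// Because of the capture group, matches land at odd indices of the result.
function splitOnWord(text, word) {
  const regex = new RegExp("\\b(" + escapeRegExp(word) + ")\\b", "i");
  return text.split(regex);
}

const parts = splitOnWord("This is a test paragraph uses to TeSt filters.", "test");
console.log(parts);
// [ 'This is a ', 'test', ' paragraph uses to ', 'TeSt', ' filters.' ]
```

Each even-indexed part would become a plain text node and each odd-indexed part a mark element, which is the loop over k in the answer.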

Parsing text with javascript: mysteriously generated paragraph tags

When I parse text in JS and want to retrieve a (DNA sequence) query name that spans multiple lines and put it between paragraph tags, it won't work correctly.
(Part of) The text file:
Database: db
22,774 sequences; 12,448,185 total letters
Searching..................................................done
Query= gi|998623327|dbj|LC126440.1| Rhodosporidium sp. 14Y315 genes
for ITS1, 5.8S rRNA, ITS2, partial and complete sequence
(591 letters)
Score E
Sequences producing significant alignments: (bits) Value
The code:
(I read the lines into an array first)
for(var i = 0; i < lines.length; i++){
  var line = lines[i];
  if(line.search("Query= ") != -1){
    results.innerHTML += " <p class='result_name'> <br>Result name: ";
    //the name starts at 7th char
    results.innerHTML += line.slice(7);
    //take the next line
    i++;
    // tried to search for "\n" or "\r" or "\r\n" to end cycle - didn't work
    // so instead I put this for the while condition:
    while(lines[i].length > 2 ){
      results.innerHTML += lines[i];
      i++;
    }
    //here is where I want the result_name paragraph to end.
    results.innerHTML += " </p> <p>Result(s):</p>";
  }
}
The result:
Don't use innerHTML +=.
Generate your whole HTML beforehand and then assign it to innerHTML in one go. My guess is that when you assign to innerHTML, the browser adds the missing end tags automatically.
Filling innerHTML with partial HTML gets auto-corrected: the browser adds the missing end tags. So create a temporary variable to collect your string, and assign it to the destination in one go, as below. That fixes the issue.
var temp = "";
for(var i = 0; i < lines.length; i++){
  var line = lines[i];
  if(line.search("Query= ") != -1){
    temp += " <p class='result_name'> <br>Result name: ";
    //the name starts at 7th char
    temp += line.slice(7);
    //take the next line
    i++;
    // tried to search for "\n" or "\r" or "\r\n" to end cycle - didn't work
    // so instead I put this for the while condition
    // (the bounds check avoids reading past the last line):
    while(i < lines.length && lines[i].length > 2 ){
      temp += lines[i];
      i++;
    }
    //here is where I want the result_name paragraph to end.
    temp += " </p> <p>Result(s):</p>";
  }
}
results.innerHTML = temp;

Javascript Markdown Parsing

I'm working on a Markdown-to-HTML parser. I understand this is a big project and there are third-party libraries, but nonetheless I want to roll a simple solution of my own that doesn't have to handle every single aspect of Markdown.
So far the process is to take an input (in my case the value of a textarea) and parse it line by line.
var html = [];
var lines = txt.split('\n'); //Convert string to array
//Remove empty lines
for(var index = lines.length-1; index >= 0; index--) {
  if(lines[index] == '') lines.splice(index, 1);
}
//Parse line by line
for(var index = 0; index <= lines.length-1; index++) {
  var str = lines[index];
  if(str.match(/^#[^#]/)) {
    //Header
    str = str.replace(/#(.*?)$/g, '<h1>$1</h1>');
  } else if(str.match(/^##[^#]/)) {
    //Header 2
    str = str.replace(/##(.*?)$/g, '<h2>$1</h2>');
  } else if(str.match(/^###[^#]/)) {
    //Header 3
    str = str.replace(/###(.*?)$/g, '<h3>$1</h3>');
  } else if(str.trim().startsWith('+')) {
    //Unordered List
    var orig = str;
    str = str.replace(/\+(.*?)$/, '<li>$1</li>');
    var previous, next;
    if(index > 0) previous = lines[index-1];
    if(!previous || previous && previous.indexOf('+') < orig.indexOf('+')) {
      str = '<ul>' + str;
    }
    if(index < lines.length-1) next = lines[index+1];
    if(!next || next && next.indexOf('+') < orig.indexOf('+')) {
      var count = Math.max(0, orig.indexOf('+') / 4);
      if(next) count = count - Math.max(0, next.indexOf('+') / 4);
      for(var i=1; i<=count; i++) {
        str = str + '</ul>';
      }
    }
    if(next && next.trim().indexOf('+') == -1) str = str + '</ul>';
  } else if(str.match(/^[0-9a-zA-Z]/)) {
    //Paragraph
    str = str.replace(/^([0-9a-zA-Z].*?)$/g, '<p>$1</p>');
  }
  //Inline formatting
  str = str.replace(/\*\*(.*?)\*\*/g, '<strong>$1</strong>'); //Bold
  str = str.replace(/\_\_(.*?)\_\_/g, '<strong>$1</strong>'); //Another bold
  str = str.replace(/\*(.*?)\*/g, '<em>$1</em>'); //Italics
  str = str.replace(/\_(.*?)\_/g, '<em>$1</em>'); //Another italics
  //Append formatted to return string
  html.push(str);
}
Where I run into problems is with nested blocks such as ul. Currently the code looks at a line that starts with a + and wraps it in an li. Great, but these list items never get placed within a ul. I could run through the output again after the line by line and just wrap every group of li's, but that screws me up when I have nested li's that require their own ul.
Any thoughts on how to apply these additional wrapper tags? I've considered using my own special characters around list type elements so I know where to add the wrapper tags, but that breaks traditional markdown. I wouldn't be able to pass the raw markdown to someone other than myself and know they'd understand what was going on.
Edit: I updated my code sample to include a working sample. The working sample also supports nested lists.
You need a very simple state machine.
When you encounter the first line that starts with +, you emit <ul> and raise a flag.
When you reach a line that doesn't start with + while the flag is raised, you emit the closing </ul> and lower the flag.
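A minimal sketch of that flag-based approach, for flat (non-nested) lists only, might look like the following. wrapListItems is a hypothetical name, and the sketch assumes each list item is a single line beginning with +. Nested lists would need a stack of indentation levels instead of a single flag.

```javascript
// Hypothetical helper: wrap consecutive "+" lines in <ul>...</ul> using one flag.
function wrapListItems(lines) {
  const out = [];
  let inList = false; // the "flag": true while we are inside an open <ul>
  for (const line of lines) {
    const isItem = line.trim().startsWith('+');
    if (isItem && !inList) {
      out.push('<ul>');   // first item of a list: open it
      inList = true;
    } else if (!isItem && inList) {
      out.push('</ul>');  // first non-item after a list: close it
      inList = false;
    }
    out.push(isItem ? '<li>' + line.trim().slice(1).trim() + '</li>' : line);
  }
  if (inList) out.push('</ul>'); // input ended while a list was still open
  return out;
}

console.log(wrapListItems(['+ a', '+ b', 'text', '+ c']).join('\n'));
```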

Making individual letters in a string different colours

I'm trying to make the colour different for certain letters (if found) in a string, e.g. the letter i. The search count is working; I just can't figure out how to change the HTML colour of the individual letter.
I know if it was a whole word then I could just use split strings, but can't figure out how to do it for a single letter. I've found some examples, one that I have tried is at the bottom that is not working either.
//getMsg is another function, which passes in a user inputted string
function searchMsg(getMsg) {
  alert (getMsg);
  var msgBoxObject = document.getElementById('msgBox');
  var pos = getMsg.indexOf('i')
  var txtToFind = (document.getElementById('txtToFind').value);
  var count = 0;
  while (pos !== -1){
    count++;
    pos = getMsg.indexOf('i', pos + 1);
    document.writeln (+count);
    msgBoxObject.innerHTML = (count);
  }
  getMsg = getMsg.replace('/i/g<span class="red">i</span>');
  document.writeln (getMsg);
}
Edit: I've added in this, but can't get the loop to work correctly so that it displays all instances of the letter found instead of just one:
/*while (pos !== -1){
  count++;
  pos = getMsg.indexOf('i', pos + 1);
  document.writeln (+count);
  msgBoxObject.innerHTML = (count);
}
*/
var count = 0; // Count of target value
var i = 0; // Iterative counter
// Examine each element.
for(i=0; i<arr.length; i++)
{ if(arr[i] == targetValue)
count++;
}
return count;
}
searchIndex = txtMsg.indexOf(txtToFind);
if (searchIndex >=0 ) {
// Copy text from phrase up till the match.
matchPhrase = txtMsg.slice(0, searchIndex);
matchPhrase += '<font color="red">' + txtToFind + '</font>';
matchPhrase += txtMsg.slice(searchIndex + txtToFind.length);
} else {
matchPhrase = "No matches"
}
displayProcessedMsg(matchPhrase);
document.writeln(matchPhrase);
You either need to add the corresponding CSS for that class or change the tag as @john_Smith specified.
Adding the CSS
span.red {
color: red;
}
Changing the tag
In your code, replace this
getMsg = getMsg.replace('/i/g<span class="red">i</span>');
with
getMsg = getMsg.replace(/i/g, '<span style="color:red">i</span>');
Some example of inline css
Some advice on color palettes
Try looking into d3 color scales (https://github.com/mbostock/d3/wiki/Ordinal-Scales#categorical-colors) or apply a principle similar to incrementing an RGB value instead of using names of colors.
Hope this helps.
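For the counting and the colouring together, a rough sketch that avoids regex escaping altogether is to split the message on the letter and join it back with the span. colourLetter is a made-up name. The sketch assumes a non-empty, case-sensitive search string and a message that contains no HTML of its own, and the red class still needs the CSS rule shown earlier.

```javascript
// Hypothetical helper: count occurrences of letter and wrap each one in a span.
function colourLetter(message, letter) {
  const pieces = message.split(letter); // n occurrences give n + 1 pieces
  return {
    count: pieces.length - 1,
    html: pieces.join('<span class="red">' + letter + '</span>')
  };
}

const result = colourLetter("this is it", "i");
console.log(result.count); // 3
console.log(result.html);
// th<span class="red">i</span>s <span class="red">i</span>s <span class="red">i</span>t
```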
