Contenteditable Div - Cursor position in terms of innerHTML position

I've searched and found similar questions on Stack Overflow, but the people asking them wanted the position either as x and y coordinates or as a column from the left. I want to know the position of the cursor with respect to the div's innerHTML.
For example:
innerHTML = "This is the innerHTML of the <b>div</b> and bla bla bla..."
                                                         ^
                                                  Cursor is here
So the result I want for this case is 44. How can I do this?

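// Insert a temporary marker text node at the caret.
var target = document.createTextNode("\u0001");
document.getSelection().getRangeAt(0).insertNode(target);
// The marker's index within innerHTML is the caret position.
var position = contentEditableDiv.innerHTML.indexOf("\u0001");
// Remove the marker again.
target.parentNode.removeChild(target);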
This temporarily inserts a dummy text node containing a non-printable character (\u0001), and then finds the index of that character within the div's innerHTML.
For the most part this leaves the DOM and the current selection unchanged, with one minor possible side effect: if the cursor is in the middle of text from a single text node, that node will be broken up into two consecutive text nodes. Usually that should be harmless, but keep it in mind in the context of your specific application.
UPDATE: Turns out you can merge the consecutive text nodes using Node.normalize().
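The steps above, including the Node.normalize() cleanup, could be wrapped in a helper along these lines. This is only a sketch: the name getCaretHtmlOffset and its selection and doc parameters are illustrative (they exist mainly so the function can be exercised outside a browser), and it assumes the caret is inside the div.

```javascript
// Returns the caret's index within container.innerHTML, or -1 if there is no selection.
// `selection` and `doc` default to the browser globals; both parameters are hypothetical
// conveniences, not part of any standard API.
function getCaretHtmlOffset(container, selection = document.getSelection(), doc = document) {
  if (!selection.rangeCount) return -1;

  // Temporarily insert a marker character at the start of the current range.
  const marker = doc.createTextNode("\u0001");
  selection.getRangeAt(0).insertNode(marker);

  // The marker's index in the serialized HTML is the caret position.
  const position = container.innerHTML.indexOf("\u0001");

  // Remove the marker and merge the text nodes that the insertion split apart.
  const parent = marker.parentNode;
  parent.removeChild(marker);
  parent.normalize();

  return position;
}
```

In a page this would be called as getCaretHtmlOffset(contentEditableDiv). If the selection is not collapsed, insertNode places the marker at the start of the selection, so the result is the offset of the selection start.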

Related

Preserve DOM elements position while removing texts

I am looking for a way to remove text (or replace it with other characters) in the DOM while the position of every DOM element stays the same.
Background
My project captures the full source code of sensitive web pages. The sensitive data itself does not matter and needs to be removed before the source is transmitted to the server. The captured source code will later be used to recreate what the Administrator was seeing (without the text).
Example
Assume this is a page:
<div>Some text here
<input type="button" value="some other text" />
some more text
</div>
So it will be rendered like this by browser:
some text here [some other text]some more text
I need it to be like this:
------ ------ ------ [------- ------ ------]------- ------- ------
Current buggy approach
Currently, I get the text in the DOM, count the characters between spaces, and replace those characters with dashes. Unfortunately, it renders like this:
---- ---- --- [---- ----- ----]---- ---- ----
As you can see, the button and link end up in completely different positions than in the original.
Purpose
The main purpose is to recreate the DOM later for UX purposes, without transmitting any text to the server that might contain sensitive information. The text can be removed completely, replaced with any character (I used - in this example), or replaced with other text such as "Lorem ipsum", as long as the original text is completely removed from the source code and the exact location of the DOM elements is preserved.
It is used to record mouse click and mouse move positions (X, Y) and show them as a click/move heat-map.
Restrictions
I am not able to change fonts or code on the target web pages, and each page and each element might use a different font.
Ideas?
Can anyone come up with an idea for this?
The issue here is that - has a different character width than the characters used in the real text.
I have thought of scrambling the words in every sentence, which would preserve the total width of the text. However, someone might be able to reshuffle them back into the original text, which is a security/privacy risk.
I have thought of replacing each word with multiple dashes based on its size (this is what I currently use). But how do I get the size of each word in its specific DOM element? Each DOM element might use a different font, and therefore a different width for each character, and creating a hidden div next to each element just to calculate the width of its text could be a big performance issue.
On each parent element that contains text, get the computed style for font-size, font-family and letter-spacing and apply it to a new div to measure the width of a space in that font. Then put the original text in that div and measure its width. Dividing the width of the original text by the width of a space gives the number of spaces needed to produce the same width. The issue is that on pages with a lot of text this would be overkill for browser performance.
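One way to try the width-matching idea without a hidden div per element is to measure strings with a canvas 2D context instead. The sketch below is an illustration under assumptions, not a tested solution: maskPreservingWidth and canvasMeasurerFor are made-up names, canvas measurements may differ slightly from real layout, and letter-spacing is ignored.

```javascript
// Replace each word with filler characters of roughly the same rendered width.
// `measure` is any function that maps a string to its width in pixels.
function maskPreservingWidth(text, measure, filler = "-") {
  const fillerWidth = measure(filler);
  // Without a usable filler width, fall back to one filler per visible character.
  if (!(fillerWidth > 0)) return text.replace(/\S/g, filler);
  return text.replace(/\S+/g, (word) => {
    const count = Math.max(1, Math.round(measure(word) / fillerWidth));
    return filler.repeat(count);
  });
}

// Browser-only helper (hypothetical): builds a measure function that uses the
// element's computed font on an off-screen canvas.
function canvasMeasurerFor(element) {
  const style = window.getComputedStyle(element);
  const ctx = document.createElement("canvas").getContext("2d");
  ctx.font = `${style.fontStyle} ${style.fontWeight} ${style.fontSize} ${style.fontFamily}`;
  return (s) => ctx.measureText(s).width;
}
```

In a page, one measurer per element could be reused for all of that element's text, for example maskPreservingWidth(textNode.nodeValue, canvasMeasurerFor(textNode.parentNode)). Whitespace is left untouched so that line breaks can still fall between words.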
Your idea?

Try with this:
// Select 'div', 'a' and 'input' elements.
// You can add more elements or even select all with '*'.
$('div,a,input').each(function() {
    var contents = $(this).contents();
    if (contents.length > 0) {
        if (contents.get(0).nodeType == Node.TEXT_NODE) {
            // Get the element's own text, without the text of its children
            var elementText = $(this)
                .clone()    // clone the element
                .children() // select all the children
                .remove()   // remove all the children
                .end()      // go back to the selected element
                .text();
            // Replace text
            $(this).text(elementText.replace(/[a-zA-Z0-9]/g, '-')).append(contents.slice(1));
        }
    }
    // For input tags, replace the value
    if ($(this).is('input')) {
        $(this).val($(this).val().replace(/[a-zA-Z0-9]/g, '-'));
    }
});
Here is a JSFiddle Demo
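As far as I can tell, the snippet above only rewrites elements whose first child is a text node, and text nodes that follow a child element are appended back unchanged. A variation that masks every text node in place, without moving any elements, might look like the sketch below. The names maskText and maskTextNodes are made up for illustration, and treating only letters and digits as sensitive is an assumption.

```javascript
// Replace every letter and digit (in any script) with the filler character.
function maskText(text, filler = "-") {
  return text.replace(/[\p{L}\p{N}]/gu, filler);
}

// Browser-only: mask all text nodes under `root` in place, leaving elements untouched.
function maskTextNodes(root) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  let node;
  while ((node = walker.nextNode())) {
    const parentName = node.parentNode ? node.parentNode.nodeName : "";
    // Leave script and style contents alone so the page still renders.
    if (parentName === "SCRIPT" || parentName === "STYLE") continue;
    node.nodeValue = maskText(node.nodeValue);
  }
  // Input values live in attributes and properties, not in text nodes.
  root.querySelectorAll("input").forEach((input) => {
    input.value = maskText(input.value);
    if (input.hasAttribute("value")) {
      input.setAttribute("value", maskText(input.getAttribute("value")));
    }
  });
}
```

Running maskTextNodes on a clone, for example document.documentElement.cloneNode(true), would leave the page the user sees untouched. This keeps the DOM structure but does not by itself preserve text width.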

Managing No Man's Land in a content editable editor

Background:
My program offers smart brace completion, that is automatic addition of a ] on typing a [.
Problem:
Consider this scenario:
Notice the editor on the left where the caret is placed. As you can see in the Inspect Element on the right, the caret is placed right between two consecutive <br>s. There is no text node or element node between them. The caret belongs to neither of the <br>s. It is a no-man's land. What is surprising is that the caret belongs to the parent editor!
Above you can see range.startContainer and range.endContainer both point to the parent content editable editor div.editor.show. Also, the second print line that mentions 2 is the caret position that I got using range.startOffset. I have a faint guess that the 2 refers to one text node and one <br> that precede the caret.
What happens due to problem:
The second ] that has to be inserted after [ gets inserted at the second index in the entire div.editor, meaning right after Th at the beginning. So, after I locate my caret in the no-man's land and press [, this happens:
Question:
How am I supposed to detect and provide a fix for this no-man's land problem?
JSFiddle
Note: this does not occur in <textarea>s.
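When a Range's startContainer is an element rather than a text node, startOffset counts the child nodes that precede the caret, which is consistent with the guess about the 2 above. A possible way to detect this case, and to insert the closing brace through the range itself instead of through a text index, is sketched below. The function names are made up, and insertClosingBrace assumes the [ has already been inserted and the caret sits right after it.

```javascript
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE

// Returns null when the caret is inside a text node; otherwise describes the
// gap between child nodes in which the caret sits.
function caretGap(range) {
  const container = range.startContainer;
  if (!range.collapsed || container.nodeType !== ELEMENT_NODE) return null;
  const children = container.childNodes;
  return {
    container,
    before: children[range.startOffset - 1] || null, // node just before the caret
    after: children[range.startOffset] || null       // node just after the caret
  };
}

// Browser-only: insert the closing brace at the caret, whatever the container is,
// and keep the caret in front of it.
function insertClosingBrace(brace = "]") {
  const selection = window.getSelection();
  if (!selection.rangeCount) return;
  const range = selection.getRangeAt(0);
  const node = document.createTextNode(brace);
  range.insertNode(node);      // works for text and element containers alike
  range.setStartBefore(node);
  range.collapse(true);        // caret ends up just before the inserted brace
  selection.removeAllRanges();
  selection.addRange(range);
}
```

Because Range.insertNode works on boundary points rather than string indexes, the same call should cover both the ordinary case and the gap between two <br> elements.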

Contenteditable setCursorPos(elem, pos), pos = getCursorPos(elem)

Does anybody have workable solution to get or set cursor position in contenteditable div with elements inside?
What I am going to do is create a Twitter-like field where I insert a span when the # sign is pressed. The problem is that browsers are pretty buggy with the contenteditable attribute, so I can't rely on them for the insert. So my algorithm is:
1. When # is pressed
2. GetCursorPos
3. Get the content of the div as a string and split it into two halves
4. Insert the string with the span between the halves
5. Do $(div).html(new_content) - the cursor will be dropped to the beginning
6. Move the cursor to old_cursor + the span's text length
The problems are with steps 2 and 5. I checked the following questions:
Set cursor position on contentEditable <div>
How to set caret(cursor) position in contenteditable element (div)?
Set the caret position always to end in contenteditable div
How to move cursor to end of contenteditable entity
contenteditable, set caret at the end of the text (cross-browser)
And many more... There is NO workable solution (FF, Chrome, IE) for setCursorPos(elem, pos); every answer either moves the cursor to the end or saves and restores the selection. I also have a getCursorPosition, but in Chrome it sometimes gives incorrect results, so a working version of that function would be appreciated too!
Thanks a lot!
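A minimal sketch of the two functions asked for, counting positions in characters of text (tags and <br> elements do not count). It is untested across browsers, and locateTextOffset is a made-up helper name; getCursorPos assumes the caret is inside elem.

```javascript
const TEXT_NODE = 3; // same value as Node.TEXT_NODE

// Map a character offset within root's text to { node, offset }.
// Offsets past the end are clamped to the end of the last text node.
function locateTextOffset(root, pos) {
  let remaining = pos;
  let last = null;
  const stack = [root];
  while (stack.length) {
    const node = stack.pop();
    if (node.nodeType === TEXT_NODE) {
      const len = node.nodeValue.length;
      if (remaining <= len) return { node, offset: remaining };
      remaining -= len;
      last = node;
    } else if (node.childNodes) {
      // Push children in reverse so they are visited in document order.
      for (let i = node.childNodes.length - 1; i >= 0; i--) stack.push(node.childNodes[i]);
    }
  }
  return last ? { node: last, offset: last.nodeValue.length } : { node: root, offset: 0 };
}

// Browser-only: number of text characters between the start of elem and the caret.
function getCursorPos(elem) {
  const selection = window.getSelection();
  if (!selection.rangeCount) return 0;
  const range = selection.getRangeAt(0);
  const pre = document.createRange();
  pre.selectNodeContents(elem);
  pre.setEnd(range.startContainer, range.startOffset);
  return pre.toString().length;
}

// Browser-only: place the caret `pos` text characters into elem.
function setCursorPos(elem, pos) {
  const { node, offset } = locateTextOffset(elem, pos);
  const range = document.createRange();
  range.setStart(node, offset);
  range.collapse(true);
  const selection = window.getSelection();
  selection.removeAllRanges();
  selection.addRange(range);
}
```

Since both functions count only text, a position read with getCursorPos before replacing the HTML can be passed to setCursorPos afterwards, plus the length of any inserted text.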

Find location of line on screen

So I have HTML text being rendered in a browser (in this case an Android WebView). I want to find out what the (x,y) location in pixels of any given line of text is AFTER it is rendered. The working definition of line I am using is not just all the text contained in a <p> tag or that appears before a <br> tag. I mean a line as it would appear to the user.
I am open to any suggested method.
Is there any CSS property that lets you find the number of lines in a div and their respective heights? That would provide a workable solution.
Thanks!!

You can't access a line individually. But, with some JavaScript, you can find the position of a line with a known index; here's a basic outline:
var p = document.getElementById("ptag"); //get the text container that contains your line
var nthline = 3; //the line for which you'd like to find the position
var lnheight = parseInt(window.getComputedStyle(p).lineHeight); //get the height of each line
var linepos = [p.offsetLeft, p.offsetTop + lnheight * (nthline - 1)]; //a [left, top] pair that represents the line's position
Note: This assumes the container doesn't have anything but text, and that its computed line-height is a pixel value; if it is "normal", parseInt returns NaN.

There is no standard way of doing that. You will have to use your imagination and invent some hack. Right now I can think of two ideas:
Enclose each word within a span, like <span class="word">word</span>. That can easily be done with a regex or string functions. Later, loop over each <span> and read its position; with some calculation you can find out how many lines there are, where a line starts (a word whose top position increased from the previous one) and where a line ends (the last word of the line plus the width of that word).
Apply some style to the first line using the :first-line pseudo-element, like
p:first-line { background-color: white; /* same as the existing color, so the display is not affected */ }
and later find in the DOM which text that style was applied to. This idea is not as good as the first one, but maybe it can make you think of other ways.
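The first idea, grouping word spans by their top position, could be sketched roughly as follows. The names groupIntoLines and wordRects are made up, and the one-pixel tolerance is an assumption that will not hold for lines that mix font sizes.

```javascript
// Group word rectangles (in document order) into lines: a new line starts
// whenever the top position differs from the current line's top by more than `tolerance`.
function groupIntoLines(rects, tolerance = 1) {
  const lines = [];
  for (const r of rects) {
    const line = lines[lines.length - 1];
    if (line && Math.abs(r.top - line.top) <= tolerance) {
      // Same line: extend its bounding box to include this word.
      line.left = Math.min(line.left, r.left);
      line.right = Math.max(line.right, r.right);
      line.bottom = Math.max(line.bottom, r.bottom);
    } else {
      lines.push({ left: r.left, top: r.top, right: r.right, bottom: r.bottom });
    }
  }
  return lines;
}

// Browser-only: collect the rectangles of all <span class="word"> elements.
function wordRects(container) {
  return Array.from(container.querySelectorAll("span.word"), (s) => s.getBoundingClientRect());
}
```

groupIntoLines(wordRects(p)) would then give one { left, top, right, bottom } box per rendered line, in viewport coordinates.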

JavaScript: Given the DOM, find the largest piece of continuous text (content part)

The goal is to find the largest piece of contiguous text in a document. The problem is that the largest piece does not lie under a single element, e.g. a blog post which has <p> tags in it, so iterating over nodes and comparing their innerHTML is not going to work. And if you compare the innerText of elements, the root node always contains the most text. So how should one accomplish that?
Thanks

Your problem can be complicated because if there is a div that contains 2 words, plus another <p> inside the div with 200 words in it, then do you count the div having 202 words, or do you count the p having 200 words and therefore is the biggest?
If there are 4 borders for p, then it can make sense to say it is p with 200 words. If there is no border, then it makes sense to say it is div with 202 words.
You can try writing a function to traverse down a node, and if there is any block element with 4 borders, then don't include the word counts.
Things can be more complicated if there are floated divs, which are set to display:inline to work around an IE 6 bug. Or if there are borders, but the color is the same as the background color of the containing div.
If you don't care about the inside elements having borders, then one attempt can be just to look at the immediate children of body, and find out how many characters there are inside of it (sum of text under all descendants, probably using innerText or innerHTML and strip all the tags).
You might also look into finding the element with the biggest area (width x height) if you are looking for the content section, unless there is a long and narrow sidebar or ad section to the left and right, with the content area wide but really short.

The most time effective tactic in screen scraping is always to define templates for each instance of what you are scraping. Considering that most pages these days have a "content" container, all you have to do is add the name of the "content" div for each of your sources. If you are scraping blogs it also becomes much easier as you can create rules for most popular blogging systems as they usually have the same content container across implementations. So you can try defaults first and if they come up empty log the url and manually identify the container.
If you really want to automate this you will probably (and I am guessing here) need to compare the size and type of the sibling nodes at each level of the DOM tree and only follow the largest branch. When you hit a level where all the siblings are text nodes, the container for these is most likely your "main content" container. You can accomplish this using jQuery for node iteration or just "normal" JavaScript DOM functions.
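A rough sketch of the "follow the largest branch" idea is shown below. The stopping rule is an assumption that differs from the one described above: instead of stopping only when all siblings are text nodes, it stops when no single child holds at least minShare of the text, so that a post split across several <p> tags resolves to their common container. The names and the 0.6 default are made up.

```javascript
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
const TEXT_NODE = 3;    // same value as Node.TEXT_NODE

// Total length of all text under a node, ignoring surrounding whitespace.
function textLength(node) {
  if (node.nodeType === TEXT_NODE) return node.nodeValue.trim().length;
  let total = 0;
  for (const child of node.childNodes || []) total += textLength(child);
  return total;
}

// Descend into the child element holding most of the text until no single
// child holds at least `minShare` of the current element's text.
function followLargestBranch(root, minShare = 0.6) {
  let current = root;
  for (;;) {
    const total = textLength(current);
    let largest = null;
    let largestLength = 0;
    for (const child of current.childNodes || []) {
      if (child.nodeType !== ELEMENT_NODE) continue;
      const length = textLength(child);
      if (length > largestLength) {
        largest = child;
        largestLength = length;
      }
    }
    if (!largest || largestLength < total * minShare) return current;
    current = largest;
  }
}
```

In a page it would be called as followLargestBranch(document.body). Text inside script and style elements is counted as well, so filtering those out first may be needed.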

When I started out typing this answer, I was going to write that it is pretty simple.
I was thinking about cloneNode(false). Then I thought about text nodes, then the normalize function, and then the case when text nodes aren't adjacent.
Apart from recursing through the entire DOM, you will have to do the following for each element node (nodeType == 1):
var ElLength = 0; // an element's nodeValue is null, so start from zero
for (var i = 0; i < thisEl.childNodes.length; i++) {
    var node = thisEl.childNodes[i];
    if (node.nodeType == 3) { // text node
        ElLength += node.data.length;
    }
}
then you'll have to remember the largest ElLength and the corresponding element.
It's gonna be slow if your DOM is huge.
Code hasn't been tested... I wrote it just to give an example
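Put together with the traversal and the "remember the largest" step, the idea might look like the sketch below. The function names are made up, and trimming whitespace before counting is an assumption. Like the outline above, it only counts an element's own text nodes, so a post split across several <p> tags yields the largest single paragraph.

```javascript
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
const TEXT_NODE = 3;    // same value as Node.TEXT_NODE

// Length of the text held directly by an element's own text nodes.
function directTextLength(el) {
  let total = 0;
  for (const child of el.childNodes) {
    if (child.nodeType === TEXT_NODE) total += child.nodeValue.trim().length;
  }
  return total;
}

// Walk all elements under root and remember the one with the most direct text.
function findLargestTextElement(root) {
  let best = { element: null, length: 0 };
  const stack = [root];
  while (stack.length) {
    const el = stack.pop();
    const length = directTextLength(el);
    if (length > best.length) best = { element: el, length };
    for (const child of el.childNodes) {
      if (child.nodeType === ELEMENT_NODE) stack.push(child);
    }
  }
  return best;
}
```

In a page it would be called as findLargestTextElement(document.body), which returns the element together with its direct text length.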
