text nodeValue containing HTML entity - javascript

I'm creating a real-time HTML editor that loads after a DOM has been rendered and builds the source by looping through all nodes. I've noticed that when I try to read the nodeValue of a text node containing an HTML entity, I always get the rendered Unicode character of that entity instead.
How can I read a rendered text node, and keep the HTML entity code? (using vanilla JS)
Example:
<div id="test">copyright &copy;</div>
<script>
var test = document.getElementById('test');
console.log(test.childNodes[0].nodeValue);
// expected: copyright &copy;
// actual: copyright ©
</script>

Unfortunately you can't. The Text interface inherits from CharacterData, and both interfaces return only DOMStrings, which contain Unicode characters.
Furthermore, the HTML5 parsing algorithm removes the entity entirely. This is defined in several sections of 8.2.4 Tokenization:
8.2.4.1 Data state: describes that an ampersand switches the parser to the Character reference in data state.
8.2.4.2 Character reference in data state: describes that the characters following the ampersand should be consumed. If everything works fine, it emits the Unicode character tokens, not the entity!
8.2.4.69 Tokenizing character references: describes how one interprets &...; (basically, consume the characters up to the semicolon and, if everything is OK, look the name up in the table).
So by the time your parser has finished the entity is already gone and has been replaced by the Unicode symbols. This is not that surprising, since you can also just put the symbol © right into your HTML code if you want.
However, you can still undo that transformation: take a copy of the entity table, and check every character in your document for an entry in it:
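// Maps Unicode code points to entity names (extend with the full table)
var entityTable = {
    169: "copy"
};
function reEntity(character) {
    var index = character.charCodeAt(0), name;
    if (index < 128) // leave plain ASCII characters alone
        return character;
    if (entityTable[index]) {
        name = entityTable[index];
    } else {
        name = "#" + index; // no named entity: use a numeric reference
    }
    return "&" + name + ";";
}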
This is quite a cumbersome task, but due to the parser's behaviour you will probably have to do it. (Don't forget to check whether someone has already done it.)
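As a rough sketch of the same idea applied to a whole string (toEntities and ENTITY_NAMES are hypothetical names, and the table holds only a few sample entries rather than the full list from the HTML specification), the version below walks the string by code point, so a character outside the Basic Multilingual Plane such as an emoji becomes one numeric reference instead of two surrogate halves. It also escapes &, < and >, which an editor rebuilding source from text nodes has to do anyway. Like anything based on nodeValue, it cannot tell whether the author originally typed © or &copy;; it only produces one valid spelling.

```javascript
// Sample entries only (assumption): a complete version would load the full
// named-entity table from the HTML specification.
const ENTITY_NAMES = {
    169: "copy",  // ©
    174: "reg",   // ®
    8364: "euro"  // €
};

// Characters that must always be escaped when writing HTML text content.
const MUST_ESCAPE = { "&": "&amp;", "<": "&lt;", ">": "&gt;" };

function toEntities(str) {
    let out = "";
    // for...of walks the string by code point, so an astral character
    // (e.g. an emoji) is one item here rather than two surrogate halves
    for (const ch of str) {
        const code = ch.codePointAt(0);
        if (MUST_ESCAPE[ch]) {
            out += MUST_ESCAPE[ch];
        } else if (code < 128) {
            out += ch;                               // other ASCII: keep as-is
        } else if (ENTITY_NAMES[code]) {
            out += "&" + ENTITY_NAMES[code] + ";";   // named reference
        } else {
            out += "&#" + code + ";";                // numeric reference
        }
    }
    return out;
}
```

For example, toEntities("copyright ©") returns "copyright &copy;".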

Related

How to populate currency symbols in an HTML5 input element

I have the following html Input element:
<input size=6 type="text" id="prd_price" title="prd_price" name="prd_price" >
I want the currency symbol ман for the Azerbaijani manat to be saved in the database. I would like to show this symbol inside the HTML input element and perhaps concatenate it with the price of a product. But when I populate the input element dynamically with the UTF-8 code, it stays in code form and does not turn into the currency symbol it is supposed to become.
Does anyone know what I am missing here?
The UTF-8 encoding can represent the complete Unicode catalogue (in fact, the letter U in the acronym comes from Unicode), so you don't need HTML entities in the first place. Such entities are only necessary if you have characters that your current encoding cannot handle, which isn't the case here.
If you absolutely need to use those HTML entities (e.g., you consume a third-party data feed you cannot tweak) you need to realise that they only make sense in HTML context. A fairly common error in jQuery is to use .text() where you should really be using .html().
In this precise situation you have an <input> element, so you cannot use either. Your only choice is .val(). However, since an <input> cannot contain HTML at all, everything you pass to .val() will eventually be handled as plain text.
A little trick you can use is to create a detached node so you can use .html() to populate it with HTML and .text() to extract the plain text representation:
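var $input = $("#prd_price");
// Original string "&#1084 ;&#1072 ;&#1085 ;" fixed (no space before the semicolons):
var symbols = "&#1084;&#1072;&#1085;";
// Let a detached <span> parse the entities, then read back the plain text
var plainText = $("<span></span>").html(symbols).text();
$input.val(plainText);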
... will render as:
ман
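The detached-node trick relies on the browser's HTML parser, so it needs a DOM. If the string has to be decoded somewhere without one (in Node.js, for instance), a minimal sketch that handles only numeric character references could look like this. decodeNumericRefs is a hypothetical name, and named entities such as &copy; are deliberately left untouched, because decoding them needs the full entity table.

```javascript
// Decodes decimal ("&#1084;") and hexadecimal ("&#x43C;") character
// references. Named entities are left as they are.
function decodeNumericRefs(str) {
    return str.replace(/&#(x[0-9a-f]+|\d+);/gi, function (whole, num) {
        const code = num[0].toLowerCase() === "x"
            ? parseInt(num.slice(1), 16)
            : parseInt(num, 10);
        // Values beyond U+10FFFF are not valid code points: keep the original text
        return code <= 0x10FFFF ? String.fromCodePoint(code) : whole;
    });
}
```

With it, decodeNumericRefs("&#1084;&#1072;&#1085;") returns "ман", which can then be passed to .val().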
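First, I got the JavaScript escape sequence for the Azerbaijani manat symbol ман from "https://r12a.github.io/apps/conversion/" (these are JavaScript \u escapes of UTF-16 code units, not UTF-8 bytes). In this case it came out as \u043C\u0430\u043D. Then I used the following code to display it inside the input element:
// A string literal like "\u043C\u0430\u043D" is already decoded by JavaScript;
// the replace below only matters when the text contains literal "\uXXXX"
// sequences, e.g. text read from a data source.
var x = "\u043C\u0430\u043D";
var r = /\\u([0-9a-f]{4})/gi;
x = x.replace(r, function (match, grp) {
    return String.fromCharCode(parseInt(grp, 16));
});
document.getElementById("prd_price").value = x;
console.log(x);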

Regex replacement with prompting/callback UI

I'm trying to write a function that takes a long string of text, identifies placeholders within the text, and prompts the user to supply a value that should take the place of each placeholder. The markup for the placeholders looks similar to the Markdown used for images or links:
some text, some more text, ?[name][description] more text, not just commas
Where name and description are arbitrary runs of text. When I've found these placeholders, I want to pop up a nicely formatted dialog, using the names and descriptions, and have the user supply a replacement value.
I already have a function (called htmlPrompt) that takes a piece of HTML (for the main part of the prompt), shows a text box, and then calls a callback function you supply with the result (or null if Cancel is pressed). It has the following signature:
function (htmlText, inputStartValue, callback)
Before plugging in this function, I wrote this rough-and-ready version:
myText = myText.replace(/(\?\[([^\]]+)\][ ]?(?:\n[ ]*)?\[([^\]]+)\])/g,
    function (wholematch, m1, m2, m3) {
        // m2 is the name, m3 the description
        var repValue = prompt(m2);
        if (repValue == null) {
            return m1; // Cancel: leave the placeholder untouched
        }
        return repValue;
    });
This uses the browser's built-in prompt method, which doesn't do an adequate job for me when it comes to formatting.
However, I can't think of a way of plugging in htmlPrompt: it only simulates a modal dialog and provides the final result by calling callback.
I did think of doing the replacements manually, using the results from match rather than replace, but as far as I can see the values returned by match are just strings; they don't give you anything useful (such as the location of the match within the overall text).
Or do you think I'm going about this completely wrong? The overall flow I want is:
Find each placeholder in the text
Prompt the user for a replacement, using both the name and description values
Replace the placeholder expressions in the text with the user supplied value.
For each of the name and description tuples:
First use match to read the name and description.
Prompt the user.
Then use replace to replace them.
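A sketch of that flow for a callback-based prompt such as the question's htmlPrompt: first collect every placeholder, then ask for the values one at a time (each callback starts the next prompt), and only when all answers are in, run a second replace that substitutes them in order. replacePlaceholders, askUser and done are hypothetical names; askUser is assumed to follow the htmlPrompt(htmlText, inputStartValue, callback) signature from the question, and the regex uses [^\]]+ so that two placeholders on one line are not merged into one match.

```javascript
// Placeholder syntax from the question: ?[name][description]
// [^\]]+ keeps each group inside its own pair of brackets.
const PLACEHOLDER = /\?\[([^\]]+)\][ ]?(?:\n[ ]*)?\[([^\]]+)\]/g;

// askUser(htmlText, inputStartValue, callback) is assumed to work like the
// question's htmlPrompt: it eventually calls callback(value), or
// callback(null) when Cancel is pressed. done(result) receives the new text.
function replacePlaceholders(text, askUser, done) {
    const found = [];
    // Pass 1: record each placeholder's name and description, in order.
    text.replace(PLACEHOLDER, function (whole, name, description) {
        found.push({ name: name, description: description });
        return whole;
    });

    const answers = [];
    function ask(i) {
        if (i === found.length) {
            // Pass 2: replace() visits matches in the same order as pass 1,
            // so the k-th match gets the k-th answer.
            let k = 0;
            done(text.replace(PLACEHOLDER, function (whole) {
                const value = answers[k++];
                return value == null ? whole : value; // Cancel keeps the placeholder
            }));
            return;
        }
        // name and description are inserted as HTML without escaping
        const html = "<b>" + found[i].name + "</b><br>" + found[i].description;
        askUser(html, "", function (value) {
            answers.push(value);
            ask(i + 1); // prompt for the next placeholder only after this one is answered
        });
    }
    ask(0);
}
```

In the page it would be called as replacePlaceholders(myText, htmlPrompt, function (result) { myText = result; }). Because each prompt is opened from the previous callback, this works whether htmlPrompt answers synchronously or after the user clicks OK.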

How can spaces be converted to &nbsp; without breaking HTML tags?

I've inherited some pretty complex code for a web forum, and one of the features I'm trying to implement is preventing runs of spaces from being collapsed into a single space. This is mainly because our users often want to include ASCII art, tables, etc. in their posts.
I first did this using a simple search and replace in JavaScript, which had the side effect of breaking HTML tags (e.g. <a href=....> became <a&nbsp;href=....>).
I then tried doing this on the server side when the strings are retrieved, converting spaces before the links and code that people insert are converted to HTML. This works to a degree, but it causes issues with other parts of the code: for example, where a message is truncated to appear on the home page, it might leave part of the space code behind, such as
Here is a message&nb
I think there may be a way to just alter the original JavaScript to achieve this: it only needs to match spaces that are not inside an HTML tag.
The script I was using originally was message = message.replace(/\s/g, "&nbsp;").
Thanks for any help you can provide with this.
You can use the pre element to include preformatted text, which renders spaces as-is. See http://www.w3.org/TR/html5-author/the-pre-element.html
Those docs specifically say one of the best uses of the pre element is "Displaying ASCII art".
Example: http://jsbin.com/owuruz/edit#preview
<pre>
/\_/\
____/ o o \
/~____ =ø= /
(______)__m_m)
</pre>
In your case, just put your message inside a pre tag.
Yes, but you need to process the text content of elements, not the whole HTML source of the document. Moreover, you need to exclude the content of style and script elements. As you can limit yourself to things inside the body element, you could use a recursive function like the following, calling it with process(document.body) to apply it to the entire document (though you probably want to apply it to a specific element only):
function process(element) {
    var children = element.childNodes;
    for (var i = 0; i < children.length; i++) {
        var child = children[i];
        if (child.nodeType === 3) { // text node
            if (child.data) {
                child.data = child.data.replace(/[ ]/g, "\xa0");
            }
        } else if (child.nodeType === 1 &&
                   child.tagName != "SCRIPT" && child.tagName != "STYLE") {
            process(child); // recurse into elements, skipping script and style
        }
    }
}
(No reason to use the entity reference here; you can use the no-break space character U+00A0 itself, referring to it as "\xa0" in JavaScript.)
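The recursive function above works on an already-parsed DOM. If you only have the message as an HTML string, as with the message.replace call in the question, a rough string-based sketch is to split the string on tags and replace spaces only in the pieces between them. nbspOutsideTags is a hypothetical name, and the sketch assumes that tags never contain a literal > inside attribute values and that the message holds no comments or script/style blocks; a real HTML parser is more robust.

```javascript
// Replace spaces with &nbsp; only outside HTML tags (rough sketch, see the
// assumptions above).
function nbspOutsideTags(html) {
    // The capturing group makes split() keep the tags in the result, so the
    // odd indices are tags and the even indices are the text between them.
    return html.split(/(<[^>]*>)/)
        .map(function (part, i) {
            return i % 2 === 1 ? part : part.replace(/ /g, "&nbsp;");
        })
        .join("");
}
```

For example, nbspOutsideTags('<a href="x">a b</a>') returns '<a href="x">a&nbsp;b</a>', leaving the space inside the tag alone.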
One way is to use <pre> tags to wrap your users' posts so that their ASCII art is preserved. But why not use Markdown (like Stack Overflow does)? There are a couple of different ports of Markdown to JavaScript:
Showdown
WMD
uedit

Interactive string manipulation via javascript

I have a webapp that must allow users to interactively manipulate strings (words, phrases and so on...)
Example:
Given a foobar string, if the user clicks on b, the string is split in two and a space is inserted, resulting in foo bar.
I could put each single character inside a span element, but I fear this would be troublesome for long strings.
Any advice?
This version using jQuery (not necessary) should pretty much do what you need if I understood you correctly:
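// Given a textarea with the content
$('textarea').click(function () {
    // Read the current value on every click so earlier edits are kept
    var text = this.value.split('');
    // Insert a space at the caret position set by the click
    text.splice(this.selectionStart, 0, " ");
    this.value = text.join('');
});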
It's a very simple example and not cross-browser (older versions of Internet Explorer do not support selectionStart), but it should get you started.
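The string operation itself does not depend on jQuery or on the textarea. A minimal sketch of it as a plain function (insertSpaceAt is a hypothetical name), clamping the index so an out-of-range position simply puts the space at the start or end:

```javascript
// Insert a space into text at the given index, clamped to [0, text.length].
function insertSpaceAt(text, index) {
    const pos = Math.max(0, Math.min(index, text.length));
    return text.slice(0, pos) + " " + text.slice(pos);
}
```

Inside the click handler this becomes this.value = insertSpaceAt(this.value, this.selectionStart); and insertSpaceAt("foobar", 3) returns "foo bar".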
Yes, that will be OK, but set up your event handler on the whole container rather than on the individual spans (a single handler can use the event target to tell which span was clicked), and then see here: http://en.wikipedia.org/wiki/Flyweight_pattern

Looking for a way to search an html page with javascript

What I would like to do is search the HTML page for a specific string, read a certain number of characters after it, and present those characters in an anchor tag.
The problem I'm having is figuring out how to search the page for a string; everything I've found relates to searching by tag or id. I'm also hoping to make it a Greasemonkey script for my personal use.
function createlinks(srchstart, srchend) {
    var page = document.getElementsByTagName('html')[0].innerHTML;
    page = page.substring(srchstart, srchend);
    if (page.search("file','http:") != -1) {
        var begin = page.search("file','http:") + 7;
        var end = begin + 79;
        var link = page.substring(begin, end);
        document.body.innerHTML += '<a href="' + link + '">LINK</a> | ';
        createlinks(end + 1, page.length);
    }
}
This is what I came up with; unfortunately, after finding the links it loops over the document again.
Assisted Direction
Look up JavaScript regular expressions (RegExp).
Apply your regex to the page's HTML (see below).
Different regex functions do different things. You could search the document for the string, as suggested, but you'd have to do it repeatedly, since the string you're searching for may be listed in multiple places.
To Get the Text in the Page
JavaScript: document.getElementsByTagName('html')[0].innerHTML
jQuery: $('html').html()
Note:
In HTML documents getElementsByTagName matches tag names case-insensitively, so 'html' and 'HTML' both work.
Also, the document may contain newline characters (\n) that you might want to strip out, since one could fall in the middle of the string you're looking for.
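Following that advice, a minimal sketch that uses a global regular expression with exec() to collect every link in a single pass, without recursion. findFileLinks is a hypothetical name, and the pattern assumes (unlike the question's fixed 79 characters) that each URL ends at the next single quote.

```javascript
// Collect every URL that follows the marker "file','" in a string of HTML,
// assuming each URL runs up to the next single quote.
function findFileLinks(html) {
    const pattern = /file','(http:[^']*)'/g;
    const links = [];
    let match;
    // With the g flag, exec() resumes after the previous match each time,
    // so the loop walks through the string exactly once.
    while ((match = pattern.exec(html)) !== null) {
        links.push(match[1]);
    }
    return links;
}
```

In the Greasemonkey script you would read document.documentElement.innerHTML once, pass it to findFileLinks, and only then append the anchors, so the newly added markup is never searched again.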
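Okay, so in JavaScript you've got the whole document in the DOM tree. You can search for your string by recursively searching the DOM for it. This is straightforward; here it is in plain JavaScript using only standard DOM properties (adapt it if you use a library). It returns the innermost elements whose HTML contains the string, e.g. search(document.body, "file','http:"):
function search(node, string, results) {
    results = results || [];
    // The string is not anywhere in this subtree: stop
    if (node.innerHTML.indexOf(string) === -1) return results;
    var before = results.length;
    // Look further down, in the child elements
    for (var i = 0; i < node.children.length; i++) {
        search(node.children[i], string, results);
    }
    // No child's innerHTML contains it, so this is the innermost element that does
    if (results.length === before) results.push(node);
    return results;
}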
