jQuery html() function and &nbsp; - javascript

I have a string that contains a Unicode non-breaking space (\xa0). I need to save this string to a hidden HTML element so that another function can read the value.
It looks like the html() function transforms the string. Example:
var testString = "string with \xa0 non breaking space";
$(".export-file-buffer").html(testString);
var receivedString = $(".export-file-buffer").html();
console.log(testString);
console.log(receivedString);
What I see in console:
string with   non breaking space
string with &nbsp; non breaking space
Why exactly is this happening? Could you point me to the documentation that describes this behavior?

If you don't need the value to be displayed and only need to store it on an element, you can use the data() method.
var testString = "string with \xa0 non breaking space";
var $target = $('#target');
$target.data('rawData', testString);
console.log($target.data('rawData'));
var fromData = $target.data('rawData');
console.log(
  fromData.split('').map(function(character){
    // escape every character outside the printable ASCII range (space to tilde)
    if (character < ' ' || character > '~') {
      return '\\x' + character.charCodeAt(0).toString(16);
    } else {
      return character;
    }
  }).join('')
);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="target"></div>
As you can see, the value is not converted to &nbsp;. The reason is that when jQuery sets a value on an element with data(), it does not put it directly on the element. Instead, it stores the value in an internal cache associated with the element. Since the value only lives in JavaScript memory and is never turned into HTML markup, the browser does not convert it.
The value still prints in the console as a blank rather than as \xa0, because \xa0 is an escape sequence for a whitespace character, not something the console displays literally. The small script above escapes every character that falls outside the printable ASCII range (before the space or after the tilde) so you can see it.
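As background for the original question (why html() gives back &nbsp;): html() returns the element's innerHTML, and reading innerHTML serializes the DOM back into markup. The HTML standard's serialization algorithm escapes text content by replacing & with &amp;, U+00A0 NO-BREAK SPACE with &nbsp;, < with &lt; and > with &gt;. The hypothetical escapeTextLikeInnerHTML function below is only a plain-JavaScript sketch of that escaping step, so it runs without a DOM; it is not the browser's or jQuery's actual code.

```javascript
// Hypothetical helper mirroring the text-escaping step that the HTML
// serialization algorithm (used when reading innerHTML) applies to text nodes.
function escapeTextLikeInnerHTML(text) {
  return text
    .replace(/&/g, '&amp;')       // first, so the entities added below are not re-escaped
    .replace(/\u00A0/g, '&nbsp;') // NO-BREAK SPACE becomes the named entity
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

var testString = "string with \xa0 non breaking space";
console.log(escapeTextLikeInnerHTML(testString));
// prints: string with &nbsp; non breaking space
```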

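There are multiple ways of keeping a string that can be shared within an HTML page:
Using an input type="hidden". Here you can store the value with $(".export-file-buffer").val(testString), just as you did with html(); val() reads and writes the element's value property, so the string is not round-tripped through HTML markup.
(Less recommended) Using a global variable, i.e. keeping it on window:
window.exportFileBuffer = testString
and reading it back later from window.exportFileBuffer. (A hyphenated name such as export-file-buffer is not a valid identifier and would need bracket notation: window['export-file-buffer'].)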

Related

String is treated differently when extracted from DOM

I am facing a very weird problem with JavaScript. When I extract text from the DOM and try to decode HTML entities, it doesn't work. However, when I assign the value directly in the code, it works just fine.
I just don't get why the string is treated differently in the two cases. I have tested in Firefox and Chrome and both produce the same result.
Update:
The correct output should be %7B (after decoding the string). So when I assign the value directly to the variable it works correctly, but when I extract it from the DOM it doesn't. How can I extract the text from the DOM and decode it so that it produces "%7B"?
HTML:
<div class="myclass">\u00257B</div>
Javascript Code:
$(document).ready(function(){
  //Extracting the text from DOM
  var myText = $(".myclass").html();
  //decoding HTML entities
  var decodedText = $("<div />").html(myText).text();
  //alerting the decoded text
  alert(decodedText); // output: \u00257B
  //assigning the value directly to the variable
  var myText2 = "\u00257B";
  //decoding HTML entities
  var decodedText2 = $("<div />").html(myText2).text();
  //alerting decoded text
  alert(decodedText2); // output: %7B
});
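myText2 produces a different result because the backslash is an escape character in JavaScript string literals. In source code, "\u0025" is already decoded to the single character %, so myText2 contains "%7B" before any HTML decoding happens, whereas the text read from the DOM contains a literal backslash.
To escape a backslash, simply use it twice:
myText2 = "\\u00257B";
Here is some further information about escape characters in JavaScript.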
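To see the difference concretely, compare the two literals: with a single backslash the parser has already turned \u0025 into %, while a doubled backslash keeps the backslash as an ordinary character, just like the text read from the DOM. A minimal check:

```javascript
var fromSource = "\u00257B";   // parser already decoded \u0025: value is "%7B"
var likeDomText = "\\u00257B"; // backslash kept literally, as in the DOM text

console.log(fromSource.length);              // 3
console.log(likeDomText.length);             // 8
console.log(likeDomText.charAt(0) === "\\"); // true
```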
EDIT
There's probably a better way, but this will work (note that eval is generally frowned upon and has security implications if the text is uncontrolled input):
myText = eval("\"" + decodedText + "\"")
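If eval is not acceptable, a regular-expression replacement can decode just the \uXXXX sequences. This is a different technique from the eval answer, and decodeUnicodeEscapes is a hypothetical helper name; the sketch assumes only the four-hex-digit form appears and leaves other escapes (\n, \u{...} and so on) untouched.

```javascript
// Hypothetical helper: turns literal "\uXXXX" sequences (backslash, "u",
// four hex digits) into the characters they denote, without using eval.
function decodeUnicodeEscapes(text) {
  return text.replace(/\\u([0-9a-fA-F]{4})/g, function (match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

// Text as read from the DOM contains a real backslash.
var domText = "\\u00257B";
console.log(decodeUnicodeEscapes(domText)); // %7B
```

In the question's code it would be applied to the extracted text, e.g. decodeUnicodeEscapes($(".myclass").text()).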
I think this is because when you extract the string from the DOM, the "\u" is a literal backslash followed by u, not an escape sequence.
If you do var myText2 = "\\u00257B"; you'll get the same result:
http://jsfiddle.net/9n6t5qxr/1/
If you do console.log('\u0025'), it prints %, which is why you are seeing %7B.

ASCII character not being recognized in if statement

I am trying to get a string from an HTML page with jQuery, and this is what I have:
var text = $(this).text();
var key = text.substring(0,1);
if(key == ' ' || key == '&nbsp;')
key = text.substring(1,2);
text is this: &nbsp;Home
I want to skip the space and/or the &nbsp; shown above. It appears this code does not work: it only gets text.substring(0,1) instead of text.substring(1,2), because the if statement is not catching, and I am not sure why. Any help would be super awesome! Thanks!
There are several problems with the code in the question. First, &nbsp; has no special meaning in JavaScript: it is a string literal with six characters. Second, text.substring(1,2) returns only the second character of text, not all characters from the second one onwards.
Assuming that you wish to remove one leading SPACE or NO-BREAK SPACE (which is what &nbsp; means in HTML; it is not an ASCII character, by the way), the following code would work:
var first = text.substring(0, 1);
if(first === ' ' || first === '\u00A0') {
text = text.substring(1, text.length);
}
The notation \u00A0 is a JavaScript escape notation for NO-BREAK SPACE U+00A0.
Should you wish to remove multiple spaces at the start, and perhaps at the end too, some modifications are needed. In that case, using a replace operation with regular expression is probably best.
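For the multiple-space case, a sketch of the regular-expression approach, assuming only SPACE and NO-BREAK SPACE should be stripped (stripSpaces is a hypothetical name). The first alternative handles the start of the string and the second the end; drop the second alternative to strip leading characters only.

```javascript
// Removes any run of SPACE (U+0020) or NO-BREAK SPACE (U+00A0)
// at the start and at the end of the string.
function stripSpaces(text) {
  return text.replace(/^[ \u00A0]+|[ \u00A0]+$/g, '');
}

console.log(stripSpaces("\u00A0 \u00A0Home\u00A0")); // Home
```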
If you want to remove spaces at the beginning (and end) of a string, you can use the trim function:
var myVar = " home";
myVar.trim() // --> "home"
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/Trim
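trim() also covers the &nbsp; case from the question, because the whitespace it removes in JavaScript includes U+00A0 NO-BREAK SPACE, not only the ordinary space. A quick check (the sample string is illustrative); trimStart() can be used instead if only leading whitespace should go.

```javascript
var myVar = "\u00A0Home"; // leading NO-BREAK SPACE, as produced by &nbsp;

console.log(myVar.trim());        // Home
console.log(myVar.trim().length); // 4
```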

Removing non-break-spaces in JavaScript

I am having trouble removing spaces from a string. First I am converting the div to text with text() to remove the tags (which works), and then I'm trying to remove the "&nbsp;" part of the string, but it won't work. Any idea what I'm doing wrong?
newStr = $('#myDiv').text();
newStr = newStr.replace(/&nbsp;/g, '');
$('#myText').val(newStr);
<html>
<div id = "myDiv"><p>remove&nbsp;space</p></div>
<input type = "text" id = "myText" />
</html>
When you use the text() function, you're not getting HTML but text: the entities have already been changed to (non-breaking) space characters.
So simply replace the spaces:
var str = "\u00A0a\u00A0\u00A0b\u00A0\u00A0", // bunch of NBSPs
    newStr = str.replace(/\s/g,'');
console.log(newStr);
If you want to replace only the spaces coming from &nbsp;, do the replacement before the conversion to text:
newStr = $($('#myDiv').html().replace(/&nbsp;/g,'')).text();
.text()/textContent do not contain HTML entities (such as &nbsp;); these are returned as literal characters. Here's a regular expression using the non-breaking space Unicode escape sequence:
var newStr = $('#myDiv').text().replace(/\u00A0/g, '');
$('#myText').val(newStr);
It is also possible to use a literal non-breaking space character instead of the escape sequence in the Regex, however I find the escape sequence more clear in this case. Nothing that a comment wouldn't solve, though.
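The two regexes used in these answers behave differently: /\s/g removes every kind of whitespace (in JavaScript, \s matches U+00A0 as well as ordinary spaces, tabs and newlines), while /\u00A0/g removes only non-breaking spaces. A small comparison on a plain string, no DOM involved (the sample text is illustrative):

```javascript
// Text as .text() would return it for "<p>remove&nbsp;space and more</p>".
var text = "remove\u00A0space and more";

var allWhitespace = text.replace(/\s/g, ''); // strips NBSP and ordinary spaces
var onlyNbsp = text.replace(/\u00A0/g, '');  // strips NBSP only

console.log(allWhitespace); // removespaceandmore
console.log(onlyNbsp);      // removespace and more
```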
It is also possible to use .html()/innerHTML to retrieve the HTML containing HTML entities, as in @Dystroy's answer.
Below is my original answer, where I misinterpreted the OP's use case. I'll leave it here in case anyone needs to remove &nbsp; from DOM elements' text content:
[...] However, be aware that re-setting the .html()/innerHTML of an element throws away all of the listeners and data associated with its contents.
So here's a recursive solution that only alters the text content of text nodes, without reparsing HTML and without side effects.
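function removeNbsp($el) {
  $el.contents().each(function() {
    if (this.nodeType === 3) {
      // text node: strip NO-BREAK SPACE characters from its value
      this.nodeValue = this.nodeValue.replace(/\u00A0/g, '');
    } else {
      // element (or other) node: recurse into its children
      removeNbsp( $(this) );
    }
  });
}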
removeNbsp( $('#myDiv') );

Value &# to unicode convert

I have lots of characters in the form &#182; which I would like to display as Unicode characters in my text editor.
This ought to convert them:
var newtext = doctext.replace(
/&#(\d+);/g,
String.fromCharCode(parseInt("$1", 10))
);
But doesn't seem to work. The regular expression /&#(\d+);/ is getting me the numbers out -- but the String.fromCharCode does not appear to give the results I'd like. What is up?
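The replacement part should be an anonymous function instead of an expression. In the original, String.fromCharCode(parseInt("$1", 10)) is evaluated once, before replace() runs; at that point "$1" is just a two-character string, so parseInt returns NaN and every match gets the same replacement. A function, by contrast, is called for each match and receives the captured digits: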
var newtext = doctext.replace(
/&#(\d+);/g,
function($0, $1) {
return String.fromCharCode(parseInt($1, 10));
}
);
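A small check that runs without a browser, comparing the two versions on an illustrative string: the expression version replaces every reference with U+0000 (which is what String.fromCharCode(NaN) returns), while the callback version decodes each one.

```javascript
var doctext = "A&#182;B&#169;";

// Original attempt: the replacement string is computed once, up front.
var broken = doctext.replace(/&#(\d+);/g, String.fromCharCode(parseInt("$1", 10)));

// Fixed version: the callback receives each captured group.
var fixed = doctext.replace(/&#(\d+);/g, function ($0, $1) {
  return String.fromCharCode(parseInt($1, 10));
});

console.log(JSON.stringify(broken)); // "A\u0000B\u0000"
console.log(fixed);                  // A¶B©
```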
The replace method is not foolproof if you handle full HTML (i.e. you don't control what the input is). For example, the method submitted by Jack (and the idea in the original post as well) works well if your entities are all decimal, but doesn't work for hexadecimal ones like &#x41;, and even less for named entities like &quot;.
For this, there is another trick you can do: create an element, set its innerHTML to the source, then read out its text value. Basically, browsers know what to do with entities, so we delegate. :) In jQuery it is easy:
$('<div/>').html('&amp;').text()
// => "&"
With plain JS it gets a bit more verbose:
var el = document.createElement('div');
el.innerHTML = '&amp;';
el.textContent
// => "&"
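When no DOM is available (for example in Node.js), the replace() approach can be extended instead. The sketch below decodes decimal and hexadecimal references with String.fromCodePoint and looks named entities up in a small table; NAMED_ENTITIES, codeToChar and decodeEntities are hypothetical names, and the table is a deliberately incomplete subset (the full HTML list is much larger), so unknown names are left as they are.

```javascript
// Hypothetical, deliberately incomplete table of named entities.
var NAMED_ENTITIES = {
  amp: '&', lt: '<', gt: '>', quot: '"', apos: "'", nbsp: '\u00A0'
};

// Converts a numeric code to a character, keeping the original
// reference if the value is outside the Unicode range.
function codeToChar(code, original) {
  return code <= 0x10FFFF ? String.fromCodePoint(code) : original;
}

// Decodes decimal (&#182;), hexadecimal (&#x41;) and the named entities
// listed above in a single pass; anything else is left unchanged.
function decodeEntities(text) {
  return text.replace(
    /&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([a-zA-Z][a-zA-Z0-9]*));/g,
    function (match, dec, hex, name) {
      if (dec !== undefined) return codeToChar(parseInt(dec, 10), match);
      if (hex !== undefined) return codeToChar(parseInt(hex, 16), match);
      return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, name)
        ? NAMED_ENTITIES[name]
        : match;
    }
  );
}

console.log(decodeEntities('&#182; &#x41; &quot;x&quot; &amp; &copy;'));
// prints: ¶ A "x" & &copy;
```

Because the replacement is done in a single pass, "&amp;lt;" decodes to "&lt;" rather than "<", which matches how a browser treats it.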

How is a non-breaking space represented in a JavaScript string?

This apparently is not working:
X = $td.text();
if (X == '&nbsp;') {
X = '';
}
Is there something about a non-breaking space or the ampersand that JavaScript doesn't like?
&nbsp; is an HTML entity. When doing .text(), all HTML entities are decoded to their character values.
Instead of comparing using the entity, compare using the actual raw character:
var x = $td.text();
if (x == '\xa0') { // Non-breaking space is char 0xa0 (160 dec)
x = '';
}
Or you can create the character from its character code instead of using the escaped form:
var x = $td.text();
if (x == String.fromCharCode(160)) { // Non-breaking space is char 160
x = '';
}
More information about String.fromCharCode is available here:
fromCharCode - MDC Doc Center
More information about character codes in different charsets is available here:
Windows-1252 Charset
UTF-8 Charset
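All three spellings below denote the same single character, so any of them works in the comparison. If the cell may also contain ordinary spaces or several NBSPs, comparing trim() against the empty string is a broader test, since trim() also strips U+00A0. isBlankCellText is a hypothetical helper, used for example as var x = $td.text(); if (isBlankCellText(x)) { x = ''; }

```javascript
// Three spellings of the same character, NO-BREAK SPACE (U+00A0, 160 decimal).
var a = '\xa0';
var b = '\u00A0';
var c = String.fromCharCode(160);
console.log(a === b && b === c); // true

// Hypothetical helper: true for "", "\u00A0", " \u00A0 " and similar.
function isBlankCellText(text) {
  return text.trim() === '';
}

console.log(isBlankCellText('\u00A0'));   // true
console.log(isBlankCellText(' x\u00A0')); // false
```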
Remember that .text() strips out markup, so I don't believe you're going to find &nbsp; in a non-markup result.
Made into an answer:
var p = $('<p>').html('&nbsp;');
if (p.text() == String.fromCharCode(160) && p.text() == '\xA0')
alert('Character 160');
This shows an alert, because the character that the markup represents (U+00A0, which is not an ASCII character) is returned instead.
That entity is converted to the character it represents when the browser parses the page. JS (jQuery) reads the resulting DOM, so it will not encounter such a text sequence. The only way it could encounter one is if the entities are double-encoded (e.g. &amp;nbsp;).
The jQuery docs for text() say: "Due to variations in the HTML parsers in different browsers, the text returned may vary in newlines and other white space."
I'd use $td.html() instead.
