Javascript Removing Whitespace When It Shouldn't?

I have an HTML file that has code similar to the following.
<table>
<tr>
<td id="MyCell">Hello World</td>
</tr>
</table>
I am using javascript like the following to get the value
document.getElementById(cell2.Element.id).innerText
This returns the text "Hello World" with only 1 space between hello and world. I MUST keep the same number of spaces, is there any way for that to be done?
I've tried using innerHTML, outerHTML and similar items, but I'm having no luck.

HTML is whitespace-insensitive, which means your DOM is too. Would wrapping your "Hello World" in a pre block work at all?

In HTML, any run of more than one space is collapsed, both when displaying text and when retrieving it via the DOM. The only guaranteed way to maintain spaces is to use a non-breaking space (&nbsp;).
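As a rough sketch of the non-breaking-space idea: before the text is written into the page, every space inside a run of two or more could be swapped for a non-breaking space (U+00A0), and swapped back after reading it out. The helper names preserveSpaces and restoreSpaces are made up for this illustration and are not part of any library.

```javascript
// Hypothetical helper: keep runs of spaces from collapsing by turning
// every space inside a run of two or more into a non-breaking space (U+00A0).
// Single spaces are left alone so normal line wrapping still works.
function preserveSpaces(text) {
  return text.replace(/ {2,}/g, function (run) {
    return '\u00a0'.repeat(run.length);
  });
}

// Hypothetical reverse step: turn non-breaking spaces back into ordinary
// spaces after the text has been read out of the DOM.
function restoreSpaces(text) {
  return text.replace(/\u00a0/g, ' ');
}

var stored = preserveSpaces('Hello   World');
console.log(stored.length);         // 13
console.log(restoreSpaces(stored)); // Hello   World
```

Note that restoreSpaces also converts any non-breaking spaces that were in the original text on purpose, so this only round-trips cleanly when the source text contains none.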

Just a tip, innerText only works in Internet Explorer, while innerHTML works in every browser... so, use innerHTML instead of innerText

The pre tag or white-space: pre in your CSS will treat all spaces as meaningful. This will also, however, turn newlines into line breaks, so be careful.

Just an opinion here and not canonical advice, but you're headed for a world of hurt if you're trying to extract exact text values from the DOM using the inner/outer HTML/TEXT properties via Javascript. Different browsers are going to return slightly different values, based on how the browser "sees" the internal document.
If you can, I'd change the HTML you're rendering to include a hidden input, something like
<table>
<tr>
<td id="MyCell">Hello World<input id="MyCell_VALUE" type="hidden" value="Hello World" /></td>
</tr>
</table>
And then grab your value in javascript something like
document.getElementById(cell2.Element.id+'_VALUE').value
The input tags were designed to hold values, and you'll be less likely to run into fidelity issues.
Also, it sounds like you're using a .NET control of some kind. It might be worth looking through the documentation (ha) or asking a slightly different question to see if the control offers an official client-side API of some kind.

Just checked it and it looks like wrapping with the pre tag should do it.

Edit: I am wrong, ignore me.
You can get a text node's nodeValue, which should correctly represent its whitespace.
Here is a function to recursively get the text within a given element (and it's library-safe, won't fail if you use something that modifies Array.prototype or whatever):
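var textValue = function(element) {
    // childNodes is inherited from the prototype, so test for its presence
    // instead of calling hasOwnProperty (which returns false in current browsers)
    if(!element.childNodes) {
        return '';
    }
    var childNodes = element.childNodes, text = '', childNode;
    // index loop: for...in over a NodeList also visits non-node properties
    for(var i = 0; i < childNodes.length; i++) {
        childNode = childNodes[i];
        if(childNode.nodeType == 3) { // 3 = text node
            text += childNode.nodeValue;
        } else {
            text += textValue(childNode);
        }
    }
    return text;
};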

This is a bit hacky, but it works on my IE.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<title></title>
</head>
<body>
<div id="a">a  b</div>
<script>
var a = document.getElementById("a");
a.style.whiteSpace = "pre"
window.onload = function() {
alert(a.firstChild.nodeValue.length) // should show 4
}
</script>
</body>
</html>
Some notes:
You must have a doctype.
You cannot query the DOM element before window.onload has fired
You should use element.nodeValue instead of innerHTML et al to avoid bugs when the text contains things like < > & "
You cannot reset whiteSpace once IE finishes rendering the page due to what I assume is an ugly bug

If someone could format my last post correctly it would look more readable. Sorry, I messed that one up. Basically the trick is to create a throwaway pre element, then append a copy of your node to that. Then you can get innerText or textContent depending on the browser.
All browsers except IE basically do the obvious thing correctly. IE requires this hack since it only preserves white-space in pre elements, and only when you access innerText.

This following trick preserves white-space in innerText in IE
var cloned = element.cloneNode(true);
var pre = document.createElement("pre");
pre.appendChild(cloned);
var textContent = pre.textContent
? pre.textContent
: pre.innerText;
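// No cleanup needed: delete has no effect on variables, and the
// detached pre and cloned nodes are garbage-collected on their own.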

Related

Replace non-code text on webpage

I searched through a bunch of related questions that help with replacing site innerHTML using JavaScript, but most rely on targeting the ID or class of the text. However, my text can be either inside a span or td tag, possibly elsewhere. I finally was able to gather a few resources to make the following code work:
$("body").children().each(function() {
$(this).html($(this).html().replace(/\$/g,"%"));
});
The problem with the above code is that I randomly see some code artifacts or other issues on the loaded page. I think it has something to do with there being multiple "$" characters in the website code; the above script converts them to %, hence breaking things.
Is there any way to modify the code (JavaScript/jQuery) so that it does not affect code elements and only replaces the visible text (i.e. >Here<)?
Thanks!
---Edit---
It looks like the reason I'm getting a conflict with some other code is this error: "Uncaught TypeError: Cannot read property 'innerText' of undefined". So I'm guessing there are some elements that don't have innerText (even though they don't meet the regex criteria) and it breaks other inline script code.
Is there anything I can add or modify the code with to not try the .replace if it doesn't meet the regex expression or to not replace if it's undefined?

Wholesale regex modifications to the DOM are a little dangerous; it's best to limit your work to only the DOM nodes you're certain you need to check. In this case, you want text nodes only (the visible parts of the document.)
This answer gives a convenient way to select all text nodes contained within a given element. Then you can iterate through that list and replace nodes based on your regex, without having to worry about accidentally modifying the surrounding HTML tags or attributes:
var getTextNodesIn = function(el) {
return $(el)
.find(":not(iframe, script)") // skip <script> and <iframe> tags
.andSelf()
.contents()
.filter(function() {
return this.nodeType == 3; // text nodes only
}
);
};
getTextNodesIn($('#foo')).each(function() {
var txt = $(this).text().trim(); // trimming surrounding whitespace
txt = txt.replace(/^\$\d$/g,"%"); // your regex
$(this).replaceWith(txt);
})
console.log($('#foo').html()); // tags and attributes were not changed
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="foo"> Some sample data, including bits that a naive regex would trip up on:
foo<span data-attr="$1">bar<i>$1</i>$12</span><div>baz</div>
<p>$2</p>
$3
<div>bat</div>$0
<!-- $1 -->
<script>
// embedded script tag:
console.log("<b>$1</b>"); // won't be replaced
</script>
</div>

I solved it slightly differently and test each value against the regex before attempting to replace it:
var regEx = new RegExp(/^\$\d$/);
var allElements = document.querySelectorAll("*");
for (var i = 0; i < allElements.length; i++){
var allElementsText = allElements[i].innerText;
var regExTest = regEx.test(allElementsText);
if (regExTest=== true) {
console.log(allElements[i]); // was el[i], but el is never defined
var newText = allElementsText.replace(regEx, '%');
allElements[i].innerText=newText;
}
}
Does anyone see any potential issues with this?
One issue I found is that it does not work if part of the page refreshes after the page has loaded. Is there any way to have it re-run the script when new content is generated on page?

Faithfully insert a string in an area

I would like to dynamically print a string (which is NOT constant) in a certain area (eg, div result). Then I use the following code:
<!DOCTYPE html>
<html>
<body>
<div id="result"></div>
<script>
var elt = document.createElement("span");
elt.innerHTML = "=A2<C2";
document.querySelector("#result").appendChild(elt);
</script>
</body>
</html>
The problem is that if the string I want to print contains <, it interprets that and does not print < faithfully. For example, the above code prints =A2.
I see some threads proposing to replace < by < + space. But I don't like the space inserted. Additionally, I don't know if there are other special characters that will be interpreted.
So does anyone know any general solution to print a string faithfully?
PS: JSBin

You can substitute .textContent for .innerHTML
elt.textContent = "=A2<C2";

Definitely don't insert text with innerHTML; only insert HTML that way. Inserting text is a bit more verbose, but not too difficult:
elt.appendChild(document.createTextNode("=A2<C2"));
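If for some reason the string has to go through innerHTML (for example because it is concatenated into a larger piece of markup), a small escaping helper is one option. This is only a sketch; the name escapeHtml is made up here and is not a built-in, and textContent or createTextNode remain the simpler choice whenever they fit.

```javascript
// Hypothetical helper: escape the characters that are special in HTML
// so a string can be embedded in markup and still display literally.
function escapeHtml(text) {
  var entities = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;'
  };
  return String(text).replace(/[&<>"']/g, function (ch) {
    return entities[ch];
  });
}

console.log(escapeHtml('=A2<C2')); // =A2&lt;C2
```

With that in place, something like elt.innerHTML = "<b>" + escapeHtml(value) + "</b>" would show the value literally inside the bold tag.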

Javascript get inner HTML text for span by class name

This is the basic format of the code the table is contained within a div named
<div class="leftCol">
.....
<tr id="my_cd">
<td><span class="agt_span">My Code</span></td>
</tr>
.....
</div>
I need to be able to get whatever text is contained within the span class, in this case I need to pull the text "My Code" and then add that into an array. Adding the text into an array is not the issue that's easy but I can't figure out how to pull the text. No matter what I try I can't get anything but an 'undefined' value.
How do I get the Inner HTML text value for a span by class name?
First Question solved thanks!!
Second question expand on first:
<div class="leftCol">
.....
<tr id="my_cd">
<td><span class="agt_span">My Code</span></td>
<td>
<div>
<select name="agt_drp" id="agt_drp" class="agt_drp">...</select>
</div>
</td>
</tr>
</div>
Let's say I have the select id "agt_drp" and I want to get the span class text. Is there any way to do that?

Jquery:
var test = $("span.agt_span").text();
alert(test);
Javascript:
http://www.w3schools.com/jsref/met_document_getelementsbyclassname.asp

in vanilla javascript, you can use getElementsByClassName():
var htmlString = document.getElementsByClassName('agt_span')[0].innerHTML;
https://jsfiddle.net/ky38esoo/
Notice the index behind the method.

JQuery:
$('span.agt_span').text();
Pure JavaScript (you need to specify the position of your class element: [0] to get the first one):
document.getElementsByClassName('agt_span')[0].innerHTML;
If you have multiple elements with this class, you can loop over them:
var elts = document.getElementsByClassName('agt_span');
for (var i = 0; i < elts.length; ++i) {
alert(elts[i].innerHTML);
}

Though getElementsByClassName seems to be supported by all major browsers, that is no argument for using it. To keep your code compatible and useful, it is better to use the W3C standard DOM Level 3 Core. The Document IDL does not describe such a method there!
So please use
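var table = document.getElementById("my_cd"); /* id is unique by definition! */
var spans = table.getElementsByTagName("span");
var txt;
/* index loop: for...in over a collection also visits length, item, etc. */
for (var i = 0; i < spans.length; i++) {
    var cls = spans[i].getAttribute("class") || ""; /* null when there is no class attribute */
    if (cls.indexOf("agt_span") !== -1) { /* strings have no contains() method */
        txt = spans[i].firstChild; /* a span should have only one child node, which contains the text */
    }
}
/* txt is now the text node (or undefined if nothing matched); txt.nodeValue holds the string */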
This method isn't perfect, as you actually need to split the class attribute on space characters, spans[i].getAttribute("class").split(" "), and check whether the resulting array contains "agt_span".
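That split-and-check step could look roughly like the following. The function name hasClass is hypothetical and chosen only for this sketch; it takes the raw value returned by getAttribute("class") rather than an element, so it does not depend on any particular DOM API.

```javascript
// Hypothetical helper: does a raw class attribute value contain the
// given class name as a whole, space-separated token?
function hasClass(classAttr, className) {
  if (!classAttr) {
    return false; // attribute missing (getAttribute returned null) or empty
  }
  var tokens = classAttr.split(/\s+/);
  for (var i = 0; i < tokens.length; i++) {
    if (tokens[i] === className) {
      return true;
    }
  }
  return false;
}

console.log(hasClass('agt_span highlighted', 'agt_span')); // true
console.log(hasClass('my_agt_span_wide', 'agt_span'));     // false
```

The second call shows why a plain substring search is not enough: the name appears inside another class name but is not a class of its own.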
By the way: innerHTML is not a DOM attribute either. But you can implement anything in a compatible and flexible way using the W3C DOM, and you will be sure to write effective and compatible code.
If the js programmers had used the W3C documents and if there were no Internet Explorer to break all those ECMAScript and W3C rules, there would never have been that many incompatibilities between all those browser versions.

Does javascript consider everything enclosed in <> as html tags?

I am tasked with converting hundreds of Word document pages into a knowledge base html application. This means copying and pasting the HTML of the word document into an editor like Notepad++ and cleaning it up. (Since it is internal document I need to convert, I cannot use online converters).
I have been able to do most of what I need with a javascript function that works "onload" of the body tag. I then copy the resulting HTML into my application framework.
Here is part of the function I wrote: (it shows only code for removing attributes of div and p tags but works for all html tags in the document)
function removeatts() //this function will remove all attributes from all elements and also remove empty span elements
{//for removing div tag attributes
var divs=document.getElementsByTagName('div'); //look at all div tags
var divnum=divs.length; //number of div tags on the page
for (var i=0; i<divnum; i++) //run through all the div tags
{//remove attributes for each div tag
divs[i].removeAttribute("class");
divs[i].removeAttribute("id");
divs[i].removeAttribute("name");
divs[i].removeAttribute("style");
divs[i].removeAttribute("lang");
}
//for removing p tag attributes
var ps=document.getElementsByTagName('p'); //look at all p tags
var pnum=ps.length; //number of p tags on the page
for (var i=0; i<pnum; i++) //run through all the p tags
{//remove attributes for each p tag
var para=ps[i].innerHTML;
if (para.length!==0) //ie if there is content inside the p tag
{
ps[i].removeAttribute("class");
ps[i].removeAttribute("id");
ps[i].removeAttribute("name");
ps[i].removeAttribute("style");
ps[i].removeAttribute("lang");
}
else
{//remove empty p tag
ps[i].remove() ;
}
if (para=="<o:p></o:p>" || para=="<o:p> </o:p>" || para=="<o:p> </o:p>")
{
ps[i].remove() ;
}
}
The first problem I encountered is that if I included the if (para=="<o:p></o:p>" || para=="<o:p> </o:p>" || para=="<o:p> </o:p>") part in an else if statement, the whole function stopped executing.
However, without the if (para=="<o:p></o:p>" || para=="<o:p> </o:p>" || para=="<o:p> </o:p>") part, the function does exactly what it is supposed to.
If, however, I keep it the way it is right now, it does some of what I want it to do.
The trouble occurs over some of the Word generated html that looks like this:
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto; margin-
left:.25in;text-align:justify;text-indent:-.25in;line-height:150%;
mso-list:l0 level1 lfo1;tab-stops:list .75in'>
<![if !supportLists]><span style='font-family:Symbol;mso-fareast-font-family:Symbol;mso-bidi-font-family:Symbol;color:black'><span style='mso-list:Ignore'>·
<span style='font:7.0pt "Times New Roman"'>
</span></span></span>
<![endif]><span style='font-family:"Arial","sans-serif";mso-fareast-font-family:Calibri;color:black'>
SOME TEXT.<span style='mso-spacerun:yes'>  </span>SOME MORE TEXT.<span style='mso-spacerun:yes'>  </span>EVEN MORE TEXT.
<span style='mso-spacerun:yes'>  </span>BLAH BLAH BLAH.<o:p></o:p></span></p>
<p><o:p></o:p></p>
Notice the <o:p></o:p> in the last two lines..... This is not getting removed either when treated as plain text or if I write code for it in the function just like the divs and paragraphs as shown in the function above. When I run the function on this, I get
<p>
<![if !supportLists]><span>·
<span>
</span></span></span>
<![endif]><span>
SOME TEXT.<span> </span>SOME MORE TEXT.<span> </span>EVEN MORE TEXT.
<span> </span>BLAH BLAH BLAH.<o:p></o:p></span></p>
<p><o:p></o:p></p>
I have looked around but cannot find any information about whether javascript works the same on known html tags and on something like this that follows the principle of opening and closing tags but doesn't match known HTML tags!
Any ideas about a workaround would be greatly appreciated!

Javascript has no special processing of HTML tags in javascript strings. It honestly doesn't know anything about HTML in the string.
More likely your issue is trying to compare the .innerHTML of a tag to a predetermined string. You cannot and should not do that because there is no guarantee for the format of .innerHTML. As there are hundreds of ways that the same HTML can be formatted, and some browsers don't remember the original HTML but reconstitute it when you ask for .innerHTML, you simply can't do that type of string comparison.
To be sure of your comparison, you will have to actually parse the HTML (at least with some sort of crude parser, which perhaps could even be a regex) to see if it matches what you want, because you can't rely on optional spacing or optional capitalization in a direct string comparison.
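As one possible shape for such a crude regex check (a sketch only; the function name isEmptyOfficeParagraph is invented here), the comparison could ignore tag case and accept any mix of whitespace, non-breaking spaces or &nbsp; entities inside the <o:p> element. It assumes the <o:p> tag carries no attributes.

```javascript
// Hypothetical helper: is this innerHTML string nothing but an empty
// <o:p> element? Tolerates upper/lower case tag names and any mix of
// whitespace, non-breaking spaces (U+00A0) or &nbsp; entities inside it.
// Assumption: the <o:p> tag itself has no attributes.
function isEmptyOfficeParagraph(html) {
  return /^\s*<o:p>(?:\s|\u00a0|&nbsp;)*<\/o:p>\s*$/i.test(html);
}

console.log(isEmptyOfficeParagraph('<o:p></o:p>'));       // true
console.log(isEmptyOfficeParagraph('<O:P>&nbsp;</O:P>')); // true
console.log(isEmptyOfficeParagraph('<o:p>text</o:p>'));   // false
```

In the question's loop, this would stand in for the chain of para=="..." comparisons.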
Or, perhaps even better, since your HTML is already parsed, why not just look at the actual HTML objects themselves and see if you have what you want there. You shouldn't even have to remove all those attributes then.

It's not Javascript that is unhappy with the unknown tags. It's the browser.
For JS it's simply a string. So, if it's a very specific case that you don't need <o:p> in particular then you could just remove it by running it with a regex itself.
para = para.replace(/<\/?o:p>/ig, ""); // replace() returns a new string, so assign the result
But if there are many more, I would strongly suggest you to get familiar with XSLT transformation.

The first problem I encountered is that if I included the if (para=="<o:p></o:p>" || para=="<o:p> </o:p>" || para=="<o:p> </o:p>")
part in an else if statement, the whole function stopped executing.
This is because you cannot have else if after else.
Notice the <o:p></o:p> in the last two lines..... This is not getting removed
I cannot confirm that. When I run your function it removes the <o:p> inside the <p>, as it is supposed to. The <o:p> within the <span> is not processed, because your function does not do that.
If you want to remove all <o:p>s, try
[].forEach.call(document.querySelectorAll('o\\:p'), function (el) {
el.remove();
});
After that, you may want to remove empty <p>s like this
[].forEach.call(document.querySelectorAll('p'), function (el) {
if (!el.childNodes.length) {
el.remove();
}
});

How do I change the text of a span element using JavaScript?

If I have a span, say:
<span id="myspan"> hereismytext </span>
How do I use JavaScript to change "hereismytext" to "newtext"?

For modern browsers you should use:
document.getElementById("myspan").textContent="newtext";
While older browsers may not know textContent, it is not recommended to use innerHTML as it introduces an XSS vulnerability when the new text is user input (see other answers below for a more detailed discussion):
//POSSIBLY INSECURE IF NEWTEXT BECOMES A VARIABLE!!
document.getElementById("myspan").innerHTML="newtext";

Using innerHTML is SO NOT RECOMMENDED.
Instead, you should create a textNode. This way, you are "binding" your text and you are not, at least in this case, vulnerable to an XSS attack.
document.getElementById("myspan").innerHTML = "sometext"; //INSECURE!!
The right way:
span = document.getElementById("myspan");
txt = document.createTextNode("your cool text");
span.appendChild(txt);
For more information about this vulnerability:
Cross Site Scripting (XSS) - OWASP
Edited Nov 4th 2017:
Modified third line of code according to @mumush's suggestion: "use appendChild(); instead".
Btw, according to @Jimbo Jonny, I think everything should be treated as user input by applying the security-by-layers principle. That way you won't encounter any surprises.

EDIT: This was written in 2014. A lot has changed. You probably don't care about IE8 anymore. And Firefox now supports innerText.
If you are the one supplying the text and no part of the text is supplied by the user (or some other source that you don't control), then setting innerHTML might be acceptable:
// * Fine for hardcoded text strings like this one or strings you otherwise
// control.
// * Not OK for user-supplied input or strings you don't control unless
// you know what you are doing and have sanitized the string first.
document.getElementById('myspan').innerHTML = 'newtext';
However, as others note, if you are not the source for any part of the text string, using innerHTML can subject you to content injection attacks like XSS if you're not careful to properly sanitize the text first.
If you are using input from the user, here is one way to do it securely while also maintaining cross-browser compatibility:
var span = document.getElementById('myspan');
span.innerText = span.textContent = 'newtext';
Firefox doesn't support innerText and IE8 doesn't support textContent so you need to use both if you want to maintain cross-browser compatibility.
And if you want to avoid reflows (caused by innerText) where possible:
var span = document.getElementById('myspan');
if ('textContent' in span) {
span.textContent = 'newtext';
} else {
span.innerText = 'newtext';
}

document.getElementById('myspan').innerHTML = 'newtext';

I use Jquery and none of the above helped, I don't know why but this worked:
$("#span_id").text("new_value");

Here's another way:
var myspan = document.getElementById('myspan');
// test for the property itself: checking its value would fail on an empty span
if ('innerText' in myspan) {
    myspan.innerText = "newtext";
} else if ('textContent' in myspan) {
    myspan.textContent = "newtext";
}
The innerText property will be detected by Safari, Google Chrome and MSIE. For Firefox, the standard way of doing things was to use textContent but since version 45 it too has an innerText property, as someone kindly apprised me recently. This solution tests to see if a browser supports either of these properties and if so, assigns the "newtext".
Live demo: here

In addition to the pure javascript answers above, You can use jQuery text method as following:
$('#myspan').text('newtext');
If you need to extend the answer to get/change html content of a span or div elements, you can do this:
$('#mydiv').html('<strong>new text</strong>');
References:
.text(): http://api.jquery.com/text/
.html(): http://api.jquery.com/html/

You may also use the querySelector() method, assuming the 'myspan' id is unique as the method returns the first element with the specified selector:
document.querySelector('#myspan').textContent = 'newtext';
developer.mozilla

Many people still come across this question (in 2022) and the available answers are not really up to date.
Using innerText is the best method
As you can see in the MDN docs, innerText is the best way to retrieve and change the text of a <span> HTML element via Javascript.
The innerText property is VERY well supported (97.53% of all web users according to Caniuse)
How to use
Simple retrieve and set new text with the property like this:
let mySpan = document.getElementById("myspan");
console.log(mySpan.innerText);
mySpan.innerText = "Setting a new text content into the span element.";
Why better than innerHTML ?
Don't use innerHTML to update the content with user input; this can lead to a major vulnerability, since the string you set will be interpreted and converted into HTML tags.
This means users can insert script(s) into your site, this is known as XSS attacks/vulnerabilities (Cross-site scripting).
Why better than textContent ?
First point: textContent isn't supported by IE8 (but I think in 2022 nobody cares anymore).
But the main point is the real difference in the result you get when using textContent instead of innerText.
The example from the MDN documentation is perfect to illustrate that, so we have the following setup:
<p id="source">
<style>#source { color: red; } #text { text-transform: uppercase; }</style>
<span id=text>Take a look at<br>how this text<br>is interpreted
below.</span>
<span style="display:none">HIDDEN TEXT</span>
</p>
If you use innerText to retrieve the text content of <p id="source"> we get:
TAKE A LOOK AT
HOW THIS TEXT
IS INTERPRETED BELOW.
This is perfectly what we wanted.
Now using textContent we get:
#source { color: red; } #text { text-transform: uppercase; }
Take a look athow this textis interpreted
below.
HIDDEN TEXT
Not exactly what you expected...
This is why using textContent isn't the correct way.
Last point
If your goal is only to append text to a <p> or <span> HTML element, the answer from nicooo. is right: you can create a new text node and append it to your existing element like this:
let mySpan = document.getElementById("myspan");
const newTextNode = document.createTextNode("Youhou!");
mySpan.appendChild(newTextNode);

As in the other answers, innerHTML and innerText are not recommended; it's better to use textContent. This attribute is well supported, as you can check here:
http://caniuse.com/#search=textContent

document.getElementById("myspan").textContent="newtext";
This will select the DOM node with id myspan and change its text content to "newtext".

You can do document.querySelector("span").textContent = "content_to_display";

textContent can't be used to insert HTML code, for example:
var a = "get the file <a href='url'>the link</a>"
var b = "get the file <a href='url'>another link</a>"
var c = "get the file <a href='url'>last link</a>"
Using
document.getElementById("myspan").textContent=a;
on
<span id="myspan">first text</span>
with a timer just shows the markup as literal text instead of rendering the link, even though it shows up correctly in the source code. If the jQuery approach is not really a solution, then using
document.getElementById("myspan").innerHTML = a; // likewise for b and c
is the best way to make it work.

const span = document.querySelector("#span");
const btn = document.querySelector("#changeBtn");
btn.addEventListener("click", () => {
span.innerText = "text changed"
})
<span id="span">Sample Text</span>
<button id="changeBtn">Change Text</button>

For this span
<span id="name">sdfsdf</span>
You can go like this :-
$("name").firstChild.nodeValue = "Hello" + "World";

(function ($) {
$(document).ready(function(){
$("#myspan").text("This is span");
});
}(jQuery));
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<span id="myspan"> hereismytext </span>
Use text() to change the span text.

I used this one document.querySelector('ElementClass').innerText = 'newtext';
Appears to work with span, texts within classes/buttons

For some reason, it seems that using jQuery's text() method is the way to go with most browsers.
It worked for me
$("#span_id").text("text value to assign");
