I have the following jQuery that mostly works:
$("article > p, article > div, article > ol > li, article > ul > li").contents().each(function() {
if (this.nodeType === 3) {
strippedValue = $.trim($(this).text());
doStuff(strippedValue);
}
if (this.nodeType === 1) {
strippedValue = $.trim($(this).html());
doStuff(strippedValue);
}
})
The problem comes when (inside doStuff()) I try to replace HTML tags. Here is a view of my elements:
And I'm trying to replace those <kbd> tags thusly:
newStr = newStr.replace(/<kbd>/g, " <b>");
newStr = newStr.replace(/<\/kbd>/g, "</b> ");
That doesn't work, and I can see in the debugger that the <kbd> tags are treated as first-class children and looped over separately, whereas I want everything inside my selectors to be seen as a raw string so I can replace things. I realize I'm asking for a contradiction, because .contents() means get the children and their contents. So if I have a selector that is a direct parent of <kbd>, then <kbd> ceases to be a raw string and instead becomes a node that is being looped over.
So it seems like my selectors are wrong BUT whenever I try to bring my selectors higher in the hierarchy, immediately I lose textual contents and I end up with a bunch of html with no contents inside the elements. (The screenshot shows good contents, as expected.)
So for example I tried this:
$("article").contents().each(function() {
...
});
...hoping that the selector looping would occur a little higher, and thus allow HTML tags further down to come through as raw text. But clearly I'm lost.
My objective is to simply perform a bunch of string replacements on the contents of the html. But there are two challenges with this:
The page contents load dynamically, with ajaxy calls or similar, so full contents are not available until about a second or two after page load.
When I try to grab high-level elements such as body, it ends up devoid of much of the textual contents. The selectors I currently have don't suffer from that problem; those get everything I want BUT then HTML/XML elements get looped instead of coming through as plain text so that I can perform replacements.
Why do you need to perform the modification on raw HTML? You could just replace the DOM elements directly (not to mention that this is much more reliable than using string replacement):
$('kbd').replaceWith(function() {
return ` <b>${this.textContent}</b> `;
// or directly create DOM elements:
// const b = document.createElement('b');
// b.textContent = this.textContent;
// return b;
});
console.log($('b').length);
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<kbd>hello world</kbd>
Of course you can still do string replacements where it makes sense, but you should work with DOM elements as much as possible.
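Where string replacement does make sense, one option is to run it once on the HTML string of a container rather than on individual child nodes. A minimal sketch, assuming the markup contains only bare <kbd> tags without attributes; the name kbdToBold is made up for illustration:

```javascript
// Hypothetical helper: swaps bare <kbd> tags for <b> tags in an HTML string.
// Assumes the tags carry no attributes; anything else is left untouched.
function kbdToBold(html) {
  return html
    .replace(/<kbd>/g, ' <b>')
    .replace(/<\/kbd>/g, '</b> ');
}

console.log(kbdToBold('Press <kbd>Ctrl</kbd> now'));

// In the browser, jQuery's .html(function) form can apply it per element:
// $('article').html(function (index, oldHtml) { return kbdToBold(oldHtml); });
// Note that this rebuilds the child nodes, so handlers bound to them are lost.
```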
Related
Let's say I have an empty div:
<div id='myDiv'></div>
Is this:
$('#myDiv').html("<div id='mySecondDiv'></div>");
The same as:
var mySecondDiv=$("<div id='mySecondDiv'></div>");
$('#myDiv').append(mySecondDiv);
Whenever you pass a string of HTML to any of jQuery's methods, this is what happens:
A temporary element is created, let's call it x. x's innerHTML is set to the string of HTML that you've passed. Then jQuery will transfer each of the produced nodes (that is, x's childNodes) over to a newly created document fragment, which it will then cache for next time. It will then return the fragment's childNodes as a fresh DOM collection.
Note that it's actually a lot more complicated than that, as jQuery does a bunch of cross-browser checks and various other optimisations. E.g. if you pass just <div></div> to jQuery(), jQuery will take a shortcut and simply do document.createElement('div').
EDIT: To see the sheer quantity of checks that jQuery performs, have a look here, here and here.
innerHTML is generally the faster approach, although don't let that govern what you do all the time. jQuery's approach isn't quite as simple as element.innerHTML = ... -- as I mentioned, there are a bunch of checks and optimisations occurring.
The correct technique depends heavily on the situation. If you want to create a large number of identical elements, then the last thing you want to do is create a massive loop, creating a new jQuery object on every iteration. E.g. the quickest way to create 100 divs with jQuery:
jQuery(Array(101).join('<div></div>'));
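The 101 in the line above is not a typo: joining an array of n + 1 empty slots places the separator between them n times. A small sketch of the same idea in plain JavaScript, with String.prototype.repeat (ES2015) shown as an assumed modern alternative; repeatMarkup is a hypothetical name:

```javascript
// Hypothetical helper: builds `count` copies of a markup string.
function repeatMarkup(markup, count) {
  // Array(count + 1) has count + 1 empty slots, so join() inserts
  // the separator (our markup) exactly `count` times.
  return Array(count + 1).join(markup);
}

var joined = repeatMarkup('<div></div>', 100);
var repeated = '<div></div>'.repeat(100); // ES2015 equivalent

console.log(joined === repeated); // true
```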
There are also issues of readability and maintenance to take into account.
This:
$('<div id="' + someID + '" class="foobar">' + content + '</div>');
... is a lot harder to maintain than this:
$('<div/>', {
id: someID,
"class": 'foobar',
html: content
});
They are not the same. The first one replaces the HTML without creating another jQuery object first. The second creates an additional jQuery wrapper for the second div, then appends it to the first.
One jQuery Wrapper (per example):
$("#myDiv").html('<div id="mySecondDiv"></div>');
$("#myDiv").append('<div id="mySecondDiv"></div>');
Two jQuery Wrappers (per example):
var mySecondDiv=$('<div id="mySecondDiv"></div>');
$('#myDiv').html(mySecondDiv);
var mySecondDiv=$('<div id="mySecondDiv"></div>');
$('#myDiv').append(mySecondDiv);
You have a few different use cases going on. If you want to replace the content, .html is a great call since it's the equivalent of innerHTML = "...". However, if you just want to append content, the extra $() wrapper set is unneeded.
Only use two wrappers if you need to manipulate the added div later on. Even in that case, you still might only need to use one:
var mySecondDiv = $("<div id='mySecondDiv'></div>").appendTo("#myDiv");
// other code here
mySecondDiv.hide();
If by .add you mean .append, then the result is the same if #myDiv is empty.
Is the performance the same? I don't know.
.html(x) ends up doing the same thing as .empty().append(x)
Well, .html() uses .innerHTML which is faster than DOM creation.
.html() will replace everything.
.append() will just append at the end.
You can get the second method to achieve the same effect by:
var mySecondDiv = $('<div></div>');
mySecondDiv.attr('id', 'mySecondDiv'); // set the id on the new div itself; .find() would only search its (empty) descendants
$('#myDiv').append(mySecondDiv);
Luca mentioned that html() just inserts the HTML, which results in faster performance.
On some occasions, though, you would opt for the second option. Consider:
// Clumsy string concat, error prone
$('#myDiv').html("<div style='width:" + myWidth + "px'>Lorem ipsum</div>");
// Isn't this a lot cleaner? (though longer)
var newDiv = $('<div></div>');
newDiv.css('width', myWidth); // style the new div itself; .find() would only search its (empty) descendants
$('#myDiv').append(newDiv);
Other than the given answers, in the case that you have something like this:
<div id="test">
<input type="file" name="file0" onchange="changed()">
</div>
<script type="text/javascript">
var isAllowed = true;
function changed()
{
if (isAllowed)
{
var tmpHTML = $('#test').html();
tmpHTML += "<input type=\"file\" name=\"file1\" onchange=\"changed()\">";
$('#test').html(tmpHTML);
isAllowed = false;
}
}
</script>
meaning that you want to automatically add one more file upload if any files were uploaded, the mentioned code will not work, because after the file is uploaded, the first file-upload element will be recreated and therefore the uploaded file will be wiped from it. You should use .append() instead:
function changed()
{
if (isAllowed)
{
var tmpHTML = "<input type=\"file\" name=\"file1\" onchange=\"changed()\">";
$('#test').append(tmpHTML);
isAllowed = false;
}
}
This has happened to me with jQuery version 3.3.
If you are looping through a list of objects and want to add each object as a child of some parent DOM element, then .html and .append will behave very differently. Because each .html call replaces the existing content, only the last object ends up in the parent element, whereas .append adds all the list objects as children of the parent element.
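If .html() is still preferred in a loop like that, one way around the problem is to build the complete markup first and set it once. A rough sketch, assuming the items are plain strings and that <li> elements are wanted; renderItems and escapeHtml are hypothetical helpers, not part of jQuery, and '#parent' is an assumed selector:

```javascript
// Hypothetical helper: escapes characters that are special in HTML text.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: turns a list of strings into one markup string.
function renderItems(items) {
  return items.map(function (item) {
    return '<li>' + escapeHtml(item) + '</li>';
  }).join('');
}

var markup = renderItems(['one', 'two', 'a < b']);
console.log(markup);

// In the browser, a single call would then replace the old children:
// $('#parent').html(markup);
```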
I searched through a bunch of related questions that help with replacing site innerHTML using JavaScript, but most rely on targeting the ID or class of the text. However, my text can be inside a span or td tag, or possibly elsewhere. I was finally able to gather a few resources to make the following code work:
$("body").children().each(function() {
$(this).html($(this).html().replace(/\$/g,"%"));
});
The problem with the above code is that I randomly see some code artifacts or other issues on the loaded page. I think it has something to do with there being multiple "$" characters in the website code, which the above script converts to "%", hence breaking things.
Is there any way to modify the code (JavaScript/jQuery) so that it does not affect code elements and only replaces the visible text (i.e. >Here<)?
Thanks!
---Edit---
It looks like the reason I'm getting a conflict with some other code is this error: "Uncaught TypeError: Cannot read property 'innerText' of undefined". So I'm guessing there are some elements that don't have innerText (even though they don't meet the regex criteria) and it breaks other inline script code.
Is there anything I can add or modify the code with to not try the .replace if it doesn't meet the regex expression or to not replace if it's undefined?
Wholesale regex modifications to the DOM are a little dangerous; it's best to limit your work to only the DOM nodes you're certain you need to check. In this case, you want text nodes only (the visible parts of the document.)
This answer gives a convenient way to select all text nodes contained within a given element. Then you can iterate through that list and replace nodes based on your regex, without having to worry about accidentally modifying the surrounding HTML tags or attributes:
var getTextNodesIn = function(el) {
return $(el)
.find(":not(iframe, script)") // skip <script> and <iframe> tags
.addBack() // .andSelf() is deprecated and was removed in jQuery 3.0
.contents()
.filter(function() {
return this.nodeType == 3; // text nodes only
}
);
};
getTextNodesIn($('#foo')).each(function() {
var txt = $(this).text().trim(); // trimming surrounding whitespace
txt = txt.replace(/^\$\d$/g,"%"); // your regex
$(this).replaceWith(txt);
})
console.log($('#foo').html()); // tags and attributes were not changed
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="foo"> Some sample data, including bits that a naive regex would trip up on:
foo<span data-attr="$1">bar<i>$1</i>$12</span><div>baz</div>
<p>$2</p>
$3
<div>bat</div>$0
<!-- $1 -->
<script>
// embedded script tag:
console.log("<b>$1</b>"); // won't be replaced
</script>
</div>
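The same idea can be sketched without jQuery by walking childNodes recursively and touching only text nodes. This is an illustration under assumptions, not the answer's code: replaceInTextNodes is a made-up name, and it relies only on the standard nodeType, nodeName, nodeValue and childNodes properties.

```javascript
// Hypothetical helper: applies a regex replacement to text nodes only.
// Attributes, comments and <script>/<style>/<iframe> contents are skipped.
function replaceInTextNodes(node, pattern, replacement) {
  var TEXT_NODE = 3;
  var ELEMENT_NODE = 1;
  var skipped = { SCRIPT: true, STYLE: true, IFRAME: true };

  if (node.nodeType === TEXT_NODE) {
    node.nodeValue = node.nodeValue.replace(pattern, replacement);
    return;
  }
  if (node.nodeType !== ELEMENT_NODE || skipped[node.nodeName]) {
    return; // comments, skipped elements, and anything else are left alone
  }
  for (var i = 0; i < node.childNodes.length; i++) {
    replaceInTextNodes(node.childNodes[i], pattern, replacement);
  }
}

// Browser usage (assumes an element with id "foo" exists):
// replaceInTextNodes(document.getElementById('foo'), /\$/g, '%');
```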
I solved it slightly differently, testing each value against the regex before attempting to replace it:
var regEx = new RegExp(/^\$\d$/);
var allElements = document.querySelectorAll("*");
for (var i = 0; i < allElements.length; i++){
var allElementsText = allElements[i].innerText;
var regExTest = regEx.test(allElementsText);
if (regExTest=== true) {
console.log(allElements[i]); // was el[i], but no variable named el exists
var newText = allElementsText.replace(regEx, '%');
allElements[i].innerText=newText;
}
}
Does anyone see any potential issues with this?
One issue I found is that it does not work if part of the page refreshes after the page has loaded. Is there any way to have it re-run the script when new content is generated on page?
I am tasked with converting hundreds of Word document pages into a knowledge base html application. This means copying and pasting the HTML of the word document into an editor like Notepad++ and cleaning it up. (Since it is internal document I need to convert, I cannot use online converters).
I have been able to do most of what I need with a javascript function that works "onload" of the body tag. I then copy the resulting HTML into my application framework.
Here is part of the function I wrote: (it shows only code for removing attributes of div and p tags but works for all html tags in the document)
function removeatts() //this function will remove all attributes from all elements and also remove empty span elements
{//for removing div tag attributes
var divs=document.getElementsByTagName('div'); //look at all div tags
var divnum=divs.length; //number of div tags on the page
for (var i=0; i<divnum; i++) //run through all the div tags
{//remove attributes for each div tag
divs[i].removeAttribute("class");
divs[i].removeAttribute("id");
divs[i].removeAttribute("name");
divs[i].removeAttribute("style");
divs[i].removeAttribute("lang");
}
//for removing p tag attributes
var ps=document.getElementsByTagName('p'); //look at all p tags
var pnum=ps.length; //number of p tags on the page
for (var i=0; i<pnum; i++) //run through all the p tags
{//remove attributes for each p tag
var para=ps[i].innerHTML;
if (para.length!==0) //ie if there is content inside the p tag
{
ps[i].removeAttribute("class");
ps[i].removeAttribute("id");
ps[i].removeAttribute("name");
ps[i].removeAttribute("style");
ps[i].removeAttribute("lang");
}
else
{//remove empty p tag
ps[i].remove() ;
}
if (para=="<o:p></o:p>" || para=="<o:p> </o:p>" || para=="<o:p> </o:p>")
{
ps[i].remove() ;
}
}
} //end of removeatts (remaining tag handling omitted)
The first problem I encountered is that if I included the if (para=="<o:p></o:p>" || para=="<o:p> </o:p>" || para=="<o:p> </o:p>") part in an else if statement, the whole function stopped executing.
However, without the if (para=="<o:p></o:p>" || para=="<o:p> </o:p>" || para=="<o:p> </o:p>") part, the function does exactly what it is supposed to.
If, however, I keep it the way it is right now, it does some of what I want it to do.
The trouble occurs over some of the Word generated html that looks like this:
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
margin-left:.25in;text-align:justify;text-indent:-.25in;line-height:150%;
mso-list:l0 level1 lfo1;tab-stops:list .75in'>
<![if !supportLists]><span style='font-family:Symbol;mso-fareast-font-family:Symbol;mso-bidi-font-family:Symbol;color:black'><span style='mso-list:Ignore'>·
<span style='font:7.0pt "Times New Roman"'>
</span></span></span>
<![endif]><span style='font-family:"Arial","sans-serif";mso-fareast-font-family:Calibri;color:black'>
SOME TEXT.<span style='mso-spacerun:yes'> </span>SOME MORE TEXT.<span style='mso-spacerun:yes'> </span>EVEN MORE TEXT.
<span style='mso-spacerun:yes'> </span>BLAH BLAH BLAH.<o:p></o:p></span></p>
<p><o:p></o:p></p>
Notice the <o:p></o:p> in the last two lines..... This is not getting removed either when treated as plain text or if I write code for it in the function just like the divs and paragraphs as shown in the function above. When I run the function on this, I get
<p>
<![if !supportLists]><span>·
<span>
</span></span></span>
<![endif]><span>
SOME TEXT.<span> </span>SOME MORE TEXT.<span> </span>EVEN MORE TEXT.
<span> </span>BLAH BLAH BLAH.<o:p></o:p></span></p>
<p><o:p></o:p></p>
I have looked around but cannot find any information about whether javascript works the same on known html tags and on something like this that follows the principle of opening and closing tags but doesn't match known HTML tags!
Any ideas about a workaround would be greatly appreciated!
Javascript has no special processing of HTML tags in javascript strings. It honestly doesn't know anything about HTML in the string.
More likely your issue is trying to compare the .innerHTML of a tag to a predetermined string. You cannot and should not do that, because there is no guarantee about the format of .innerHTML. There are hundreds of ways the same HTML can be formatted, and some browsers don't remember the original HTML but reconstitute it when you ask for .innerHTML, so you simply can't rely on that type of string comparison.
To be sure of your comparison, you would have to actually parse the HTML (at least with some sort of crude parser, which perhaps could even be a regex) to see if it matches what you want, because you can't rely on optional spacing or optional capitalization in a direct string comparison.
Or, perhaps even better, since your HTML is already parsed, why not just look at the actual HTML objects themselves and see if you have what you want there. You shouldn't even have to remove all those attributes then.
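Checking the parsed elements instead of the serialized string might look roughly like this. The function name is hypothetical, and the sketch assumes an HTML document in which Word's <o:p> tag is parsed as an element whose tagName reads "O:P" (hence the lower-casing):

```javascript
// Hypothetical helper: true when a <p> holds nothing but whitespace
// and, optionally, Word's <o:p> placeholder elements.
function isEmptyWordParagraph(p) {
  // trim() also strips non-breaking spaces, which Word emits often.
  if (p.textContent.trim() !== '') {
    return false;
  }
  for (var i = 0; i < p.children.length; i++) {
    if (p.children[i].tagName.toLowerCase() !== 'o:p') {
      return false; // some other element (an image, say) lives here
    }
  }
  return true;
}

// Browser usage: copy the live collection first, because removing
// elements from it while iterating by index skips entries.
// Array.prototype.slice.call(document.getElementsByTagName('p'))
//   .filter(isEmptyWordParagraph)
//   .forEach(function (p) { p.remove(); });
```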
It's not Javascript that is unhappy with the unknown tags. It's the browser.
For JS it's simply a string. So, if it's a very specific case and you just don't need <o:p> in particular, you could remove it with a regex. Note that replace() returns a new string, so the result has to be assigned:
para = para.replace(/<\/?o:p>/ig, "");
But if there are many more, I would strongly suggest you to get familiar with XSLT transformation.
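Applied to a fragment of the Word output from the question, the regex approach can be sketched as follows. stripOfficeTags is a hypothetical name, and the pattern is slightly widened (an assumption) to tolerate attributes inside the tag. It removes only the tags themselves; any text between them is kept.

```javascript
// Hypothetical helper: removes <o:p> opening and closing tags from a string.
function stripOfficeTags(html) {
  // Matches <o:p>, </o:p> and <o:p ...attributes...>, case-insensitively.
  return html.replace(/<\/?o:p\b[^>]*>/gi, '');
}

var sample = 'BLAH BLAH BLAH.<o:p></o:p></span></p>';
console.log(stripOfficeTags(sample)); // BLAH BLAH BLAH.</span></p>
```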
The first problem I encountered is that if I included the if (para=="<o:p></o:p>" || para=="<o:p> </o:p>" || para=="<o:p> </o:p>")
part in an else if statement, the whole function stopped executing.
This is because you cannot have else if after else.
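In other words, every else if has to come before the final else. A minimal sketch of the ordering, with the decision pulled into a hypothetical classifyParagraph function so the branches are easy to see; the string test merely stands in for the question's comparison:

```javascript
// Hypothetical helper: labels a paragraph by its innerHTML string.
function classifyParagraph(innerHtml) {
  if (innerHtml.length === 0) {
    return 'empty';
  } else if (/^<o:p>\s*<\/o:p>$/i.test(innerHtml)) {
    // The "else if" sits between "if" and the final "else".
    return 'placeholder';
  } else {
    return 'content';
  }
}

console.log(classifyParagraph(''));            // empty
console.log(classifyParagraph('<o:p></o:p>')); // placeholder
console.log(classifyParagraph('Some text'));   // content
```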
Notice the <o:p></o:p> in the last two lines..... This is not getting removed
I cannot confirm that. When I run your function it removes the <o:p> inside the <p>, as it is supposed to. The <o:p> within the <span> is not processed, because your function does not do that.
If you want to remove all <o:p>s, try
[].forEach.call(document.querySelectorAll('o\\:p'), function (el) {
el.remove();
});
After that, you may want to remove empty <p>s like this
[].forEach.call(document.querySelectorAll('p'), function (el) {
if (!el.childNodes.length) {
el.remove();
}
});
Here is an example. Check the console for the result. The first two divs (not appended; above the <script> in the console) have the proper spacing and indentation. However, the second two divs do not show the same formatting or white space as the original, even though they are otherwise identical, because they were appended.
For example the input
var newElem = document.createElement('div');
document.body.appendChild(newElem);
var another = document.createElement('div');
newElem.appendChild(another);
console.log(document.body.innerHTML);
Gives the output
<div><div></div></div>
When I want it to look like
<div>
<div></div>
</div>
Is there any way to generate the proper white space between appended elements and retain that spacing when obtaining it using innerHTML (or a possible similar means)? I need to be able to visually display the hierarchy and structure of the page I'm working on.
I have tried appending it within an element that is in the actual HTML but it has the same behavior
I'd be okay with doing it using text nodes and line breaks as lincolnk suggested, but it needs to affect dynamic results, meaning I cannot use the same .createTextNode(' </br>') because different elements are in different levels of the hierarchy
No jQuery please
I think you're asking to be able to append elements to the DOM, such that the string returned from document.body.innerHTML will be formatted with indentation etc. as if you'd typed it into a text editor, right?
If so, something like this might work:
function indentedAppend(parent,child) {
var indent = "",
elem = parent;
while (elem && elem !== document.body) {
indent += "  "; // two spaces per nesting level
elem = elem.parentNode;
}
if (parent.hasChildNodes() && parent.lastChild.nodeType === 3 && /^\s*[\r\n]\s*$/.test(parent.lastChild.textContent)) {
parent.insertBefore(document.createTextNode("\n" + indent), parent.lastChild);
parent.insertBefore(child, parent.lastChild);
} else {
parent.appendChild(document.createTextNode("\n" + indent));
parent.appendChild(child);
parent.appendChild(document.createTextNode("\n" + indent.slice(0,-2)));
}
}
demo: http://jsbin.com/ilAsAki/28/edit
I've not put too much thought into it, so you might need to play with it, but it's a starting point at least.
Also, I've assumed an indentation of 2 spaces as that's what you seemed to be using.
Oh, and you'll obviously need to be careful when using this with a <pre> tag or anywhere the CSS is set to maintain the whitespace of the HTML.
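If the goal is only to display the hierarchy, an alternative that leaves the DOM untouched is to build the indented string from the element tree instead of reading innerHTML. A rough sketch under assumptions: formatTree is a made-up name, it prints element tags only (no attributes or text, and no special handling of void elements such as <br>), and it uses the same 2-space indentation.

```javascript
// Hypothetical helper: returns an indented outline of an element tree.
function formatTree(element, depth) {
  depth = depth || 0;
  var indent = new Array(depth + 1).join('  '); // two spaces per level
  var tag = element.tagName.toLowerCase();

  if (element.children.length === 0) {
    return indent + '<' + tag + '></' + tag + '>';
  }

  var lines = [indent + '<' + tag + '>'];
  for (var i = 0; i < element.children.length; i++) {
    lines.push(formatTree(element.children[i], depth + 1));
  }
  lines.push(indent + '</' + tag + '>');
  return lines.join('\n');
}

// Browser usage:
// console.log(formatTree(document.body));
```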
You can use document.createTextNode() to add a string directly.
var ft = document.createElement('div');
document.body.appendChild(ft);
document.body.appendChild(document.createTextNode(' '));
var another = document.createElement('div');
document.body.appendChild(another);
console.log(document.body.innerHTML);
I'm using javascript, jQuery and regex to add anchors (#hashtag) around all hashtags on the page. The regex detects things that are hashtags, and then I use jQuery to re-write the HTML and a javascript .replace() to add in the anchor tags. I also do a javascript if statement so it doesn't replace things inside of script and style tags.
var regExp = /(\W)#([a-zA-Z_]+)(\W)/gm;
var boxLink = "$1<a class='tagLink' onClick=\"doServer('#$2')\">#$2</a>$3"
$('body').children().each(function(){
if (($(this).get(0).tagName.toLowerCase() != 'style')
&& ($(this).get(0).tagName.toLowerCase() != 'script')
) {
$(this).html($(this).html().replace(regExp, boxLink));
}
});
Simple enough... right?
The problem is that I'm making a plugin, so developers will deploy this on their websites. The html rewrite ($(this).html($(this).html().replace(regExp, boxLink));) breaks seemingly random areas of javascript on websites. It also messes up some HTML structure sometimes. It's just a really messy thing to be doing on lots of different sites.
So rather than fix the re-write, I'd like to just find another way to do this. Is there any way I can accomplish the same thing (adding anchor tags around all hashtags on the page) without re-writing the entire HTML on the page each load?
If not, how can I tweak the javascript I have so it isn't so conflicting with javascript on people's sites.
This replaces every textnode with a hash tag on this page with:
<span>texts without hash <a name = "myplugin">#</a></span>
You can substitute the regex to match yours :)
var getTextNodesIn = function(el) {
$(el).find("*").addBack().contents().each(function() { // .addBack() replaces the deprecated .andSelf()
var parentNode = this.parentNode.nodeName,
data = this.data;
if(this.nodeType == 3 && parentNode !== "SCRIPT" && parentNode !== "STYLE" && data.indexOf("#") > -1){
var anch = data.replace(/#/g,"#".anchor("myplugin"));
$(this).replaceWith("<span>"+anch+"</span>");
}
});
};
getTextNodesIn(document.body);
P.S. The getTextNodesIn function was taken from this post:
https://stackoverflow.com/a/4399718/776575
I think part of the problem is that you need to isolate the text nodes and operate on those, not chunks of html. Your example only iterates across the direct children of body, but then tries to apply replacements to whatever html is within those children. This could easily cause existing markup and javascript to break.
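Operating on text nodes means that each node's text has to be split into plain runs and hashtags before any anchors are created. A minimal sketch of that splitting step only, in plain JavaScript; splitHashtags is a hypothetical helper, and the pattern borrows the character class from the question's regex:

```javascript
// Hypothetical helper: splits text into plain runs and hashtag names.
function splitHashtags(text) {
  var pattern = /#([a-zA-Z_]+)/g;
  var parts = [];
  var last = 0;
  var match;

  while ((match = pattern.exec(text)) !== null) {
    if (match.index > last) {
      parts.push({ type: 'text', value: text.slice(last, match.index) });
    }
    parts.push({ type: 'tag', value: match[1] }); // name without the "#"
    last = pattern.lastIndex;
  }
  if (last < text.length) {
    parts.push({ type: 'text', value: text.slice(last) });
  }
  return parts;
}

console.log(splitHashtags('hi #foo bar'));

// In the browser, each part could then become a node: 'text' parts via
// document.createTextNode(part.value), 'tag' parts via a new <a> element
// whose textContent is '#' + part.value. Replacing the original text node
// with those nodes leaves the surrounding markup and scripts untouched.
```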
Answers to this question might be helpful: How do I select text nodes with jQuery?