I have a page with some html rendered into it.
I want to get the rendered page as text, but somehow also include the newlines. In addition, if relevant, I'm looking for an extended solution that will also support lists (using spaces and •), tables (using spaces, but with no borders) and similar cases.
I'm looking for a JavaScript solution, either on the client or the server side.
Please note: not every element in the page produces a new line (e.g. some divs can be laid out inline and some create new lines).
For example, the snippet below is the HTML, and the output should be the text itself, as you can see below (after running).
#inline{
display:flex;
flex-direction:row;
}
#inline div{
margin-right:5px;
}
#notInline{
display:flex;
flex-direction:column;
}
<div>
<div id='inline'><div>some</div><div>divs</div><div>inline</div></div>
<div id='notInline'><div>some</div><div>divs</div><div>on top of each other</div></div>
</div>
You can try this: first the inline text, then the "on top of each other" text.
var inlineOutput = '';
document.querySelector('#inline').childNodes.forEach(e=>{inlineOutput += e.textContent + ' '});
inlineOutput += "\n";
console.log(inlineOutput);
var noInLineOutput = '';
document.querySelector('#notInline').childNodes.forEach(e=>{noInLineOutput += e.textContent + " \n"});
console.log(noInLineOutput);
There's a JS scraping library called Cheerio that could extract all the text for you, though I've never used it. It gives you access to the DOM, so you can gather whichever parts of the page you need. Here's a tutorial that uses it with Node.
Not sure if this is what you're looking for. If they're your own pages, you could probably write a function that walks everything in the DOM, splits on the opening and closing angle brackets, and grabs the text in between, with a switch for when it sees the notInline element.
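The idea of walking the tree and choosing a separator per container can be sketched as follows. This is only a minimal sketch under stated assumptions: the node shape (`text`, `layout`, `children`) and the function name `renderText` are hypothetical, and in a browser the `layout` value would have to be derived from something like `getComputedStyle(el)` rather than stored on the node.

```javascript
// Hypothetical node shape (an assumption for this sketch):
//   { text: 'some' }                          -> a text leaf
//   { layout: 'row' | 'column', children }    -> a container
// 'row' children are joined with a space, 'column' children with a newline.
function renderText(node) {
  if (typeof node.text === 'string') {
    return node.text;
  }
  const separator = node.layout === 'row' ? ' ' : '\n';
  return node.children
    .map(renderText)
    .filter(part => part.length > 0) // skip empty containers
    .join(separator);
}

// Mirrors the #inline / #notInline example from the question.
const page = {
  layout: 'column',
  children: [
    { layout: 'row', children: [{ text: 'some' }, { text: 'divs' }, { text: 'inline' }] },
    { layout: 'column', children: [{ text: 'some' }, { text: 'divs' }, { text: 'on top of each other' }] },
  ],
};

console.log(renderText(page));
```

Lists and tables could be handled the same way by adding more layout values (for example a bullet prefix per child), but that is left out here.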
I'm working on a full-stack project that is somehow loading inconsistent CSS styles on my anchor elements. Using Javascript I am doing something like the following:
recordData.forEach(record => {
let a = help.createElement('a');
let text = record.jobTitle + " (" + record.deptName + ", " + record.subDeptName + ")-" + record.email;
a.textContent = text;
a.href = `/frontend/contractorForm/contractorForm.html`;
a.addEventListener('click', function(event) {
sessionStorage.clear();
sessionStorage.setItem('record', JSON.stringify(record));
}, false);
parent.appendChild(a);
});
The idea of this was that although I have one single HTML form "template" created, I can populate the values inside contractorForm.html through values stored in my sessionStorage.
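For context, the receiving side of this pattern might look roughly like the sketch below. The helper name `loadRecord` and the `fakeStorage` stand-in are hypothetical and exist only so the example runs outside a browser; on the real page, `window.sessionStorage` would be passed in instead.

```javascript
// Hypothetical helper: reads back the record saved by the click handler.
// `storage` is any object with a getItem(key) method, e.g. window.sessionStorage.
function loadRecord(storage) {
  const raw = storage.getItem('record');
  if (raw === null) {
    return null; // nothing stored, e.g. the form page was opened directly
  }
  try {
    return JSON.parse(raw);
  } catch (err) {
    return null; // stored value was not valid JSON
  }
}

// Stand-in for sessionStorage (assumption, for running outside a browser).
const fakeStorage = {
  data: { record: JSON.stringify({ jobTitle: 'Engineer', email: 'a@example.com' }) },
  getItem(key) {
    return key in this.data ? this.data[key] : null;
  },
};

const record = loadRecord(fakeStorage);
console.log(record.jobTitle, record.email);
```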
Below are my anchor tag frontend views and I also attached images of what happens when I click on other ones. The problem with this is that when I click on my anchor tags on the front end, this is what I get.
My CSS for contractorForm.html is basically display:flex; justify-content:center. But as shown in the images, only the first anchor link works.
Things I've checked and verified: CSS page does load when looking at Devtools, disabling and clearing cache, attaching ?version={random number} onto the .html href, changing style on devtools to see if it works (and it does), changing the background color (it works perfectly), loading my CSS code after bootstrap link, checking paths and links (all correct)
The only issue here is that my display: flex is simply not working. Any help or ideas will be appreciated! Thank you!
[Images: anchor tag HTML view; view after clicking the first anchor tag; view after clicking the second anchor tag]
Fixed it - if you have CSS issues, make sure you are referring correctly to the parent.
My CSS looked like this before:
#contractor-form-section {
width: 50rem;
}
#contractor-form-section wraps just the form, and my DOM structure looked something like this:
<body>
<div id="contractor-form-border">
<section id="contractor-form-section">
<h1>...</h1>
<form>
</form>
</section>
</div>
</body>
Essentially, my CSS was referring to #contractor-form-section rather than #contractor-form-border. Switching to that basically fixed the issue!
I have the following jQuery that mostly works:
$("article > p, article > div, article > ol > li, article > ul > li").contents().each(function() {
if (this.nodeType === 3) {
strippedValue = $.trim($(this).text());
doStuff(strippedValue);
}
if (this.nodeType === 1) {
strippedValue = $.trim($(this).html());
doStuff(strippedValue);
}
})
The problem comes when (inside doStuff()) I try to replace HTML tags. Here is a view of my elements:
And I'm trying to replace those <kbd> tags thusly:
newStr = newStr.replace(/<kbd>/g, " <b>");
newStr = newStr.replace(/<\/kbd>/g, "</b> ");
That doesn't work, and I can see in the debugger that the <kbd> tags are treated as first-class children and looped over separately, whereas I want everything inside my selectors to be treated as a raw string so I can replace things. I realize I'm asking for a contradiction, because .contents() means get the children and their contents. So if I have a selector that is a direct parent of <kbd>, then <kbd> ceases to be a raw string and instead becomes a node that is being looped over.
So it seems like my selectors are wrong BUT whenever I try to bring my selectors higher in the hierarchy, immediately I lose textual contents and I end up with a bunch of html with no contents inside the elements. (The screenshot shows good contents, as expected.)
So for example I tried this:
$("article").contents().each(function() {
...
});
...hoping that the selector looping would occur a little higher, and thus allow HTML tags further down to come through as raw text. But clearly I'm lost.
My objective is simply to perform a bunch of string replacements on the contents of the HTML. But there are two challenges with this:
1. The page contents load dynamically, with ajaxy calls or similar, so the full contents are not available until about a second or two after page load.
2. When I try to grab high-level elements such as body, the result ends up devoid of much of the textual content. The selectors I currently have don't suffer from that problem; they get everything I want, BUT then HTML/XML elements get looped over instead of coming through as plain text that I can perform replacements on.
Why do you need to perform the modification on raw HTML? You could just replace the DOM elements directly (not to mention that this is much more reliable than using string replacement):
$('kbd').replaceWith(function() {
return ` <b>${this.textContent}</b> `;
// or directly create DOM elements:
// const b = document.createElement('b');
// b.textContent = this.textContent;
// return b;
});
console.log($('b').length);
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<kbd>hello world</kbd>
Of course you can still do string replacements where it makes sense, but you should work with DOM elements as much as possible.
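Where a string replacement does make sense, it has to run on one complete HTML string (for example the parent's innerHTML) rather than on each child node separately. A minimal sketch, assuming a hypothetical helper named `kbdToBold` and a made-up sample string:

```javascript
// Hypothetical helper: converts <kbd>...</kbd> to <b>...</b> in an HTML string,
// padding with spaces like the replacements in the question.
function kbdToBold(html) {
  return html
    .replace(/<kbd>/g, ' <b>')
    .replace(/<\/kbd>/g, '</b> ');
}

// Sample input is made up for illustration.
const sample = 'Press <kbd>Ctrl</kbd>+<kbd>C</kbd>';
console.log(kbdToBold(sample));
```

This only matches bare lowercase `<kbd>` tags without attributes, which is one reason the DOM-based approach above tends to be more robust.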
I'm well aware that span is an inline element, and that making a separate span and simply using display:block on it would be the solution in most other cases. Mine is a bit more complicated, since my code is tangled up in JavaScript and semi-PHP code. So here's what I'm facing:
This part of the code is used in the template file (.tpl):
<p id="reduction_amount" {if !$product->specificPrice || $product->specificPrice.reduction_type != 'amount' || $product->specificPrice.reduction|floatval ==0} style="display:none"{/if}>
{strip}
<span id="reduction_amount_display">
{if $product->specificPrice && $product->specificPrice.reduction_type == 'amount' && $product->specificPrice.reduction|floatval !=0}
SAVE {convertPrice price=$productPriceWithoutReduction|floatval-$productPrice|floatval}
{/if}
</span>
{/strip}
</p>
..And this part inside a .js file:
if (combination.specific_price.reduction_type == 'amount') {
$('#reduction_amount_display').html('SAVE ' + formatCurrency(discountValue, currencyFormat, currencySign, currencyBlank));
$('#reduction_amount').show();
}
This works perfectly, and this is how it currently looks:
(note: the <p> element has a fixed width and height)
The thing is, I'd like to give a line break after the "SAVE" part and be able to style the amount shown with a bigger font size. Demonstration:
As I mentioned above, this could be achieved by wrapping the two parts in separate spans and styling them the way I want. However, as my knowledge of JavaScript is quite limited, I do not know the correct way to insert a span. I tried several ways, for example adding html += '<span class="blabla">' right before formatCurrency and a closing tag after the parenthesis, but this messed up the whole design for some reason. I also tried without html +=, using plain quotes, and the same thing happened. What solution is there for this? It does not necessarily have to be JavaScript; a CSS or HTML solution would work as well.
EDIT:
Using <br> is unfortunately not feasible, since I wouldn't be able to style the amount separately. That tag also creates a huge gap, so the amount overflows the <p> element.
You can wrap the dollar value in an element created by jQuery using append(), and style it accordingly with a given class.
var amount = $('#reduction_amount_display');
amount.html('SAVE ');
amount.append('<div class="style-amount">$10</div>');
.style-amount {
font-size:32px;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<p id="reduction_amount_display">
</p>
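Applied to the code from the question, the same idea could be sketched as building the markup in one string and passing it to .html(). The helper name `buildSaveMarkup` is hypothetical, and using a span with display: block (instead of the div above) is an assumption made so the markup stays valid inside the <p>.

```javascript
// Hypothetical helper, not part of the original template code.
// Wraps an already formatted amount in a span that CSS can style on its own.
// Matching CSS (assumed): .style-amount { display: block; font-size: 32px; }
function buildSaveMarkup(formattedAmount) {
  return 'SAVE <span class="style-amount">' + formattedAmount + '</span>';
}

// In the .js file from the question this could replace the string passed to .html():
// $('#reduction_amount_display').html(
//   buildSaveMarkup(formatCurrency(discountValue, currencyFormat, currencySign, currencyBlank))
// );
console.log(buildSaveMarkup('$10'));
```

Because the span is a block, the amount drops onto its own line without a <br>, and the font size can be set independently of the "SAVE" label.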
Basically, I want to show a title and some text on hover on all my index thumbnails, like this website does:
http://www.timboelaars.nl/
However, in the current squarespace template that I am using (I believe it's called York), the markup is only grabbing the page title and therefore displaying the page title on hover. (See the below code block, you can see the page title in there, that's the only thing that the template displays on Hover)
<div class="index-item-text-wrapper">
<h2 class="index-item-title">
<a class="index-item-title-link" href="/google-shopping/" data-ajax-loader="ajax-loader-binded"><span class="index-item-title-text">**PAGE TITLE**</span></a>
</h2>
</div>
There's no field for me to put any HTML so I am seeking help to use javascript to manually inject custom HTML markup to every single thumbnail, then show them on hover.
TL;DR I want to be able to display more than just the title on hover (ideally my own HTML markup so I can customize the style) on my thumbnails but that's not supported by the template.
Here is my website http://shensamuel.com/
I am really weak at JavaScript and I've been searching for a solution to this problem for quite a while. Any help will be much appreciated!
Thanks!
The following Javascript can be used to insert text for each tile on the page. The code would be inserted using the footer code injection area (unless you're using Developer Mode in which case you'd insert it with the rest of your scripts).
<script>
(function() {
var tiles = document.getElementsByClassName('index-section');
var thisTile;
var titleText;
var description;
var parent;
var i, I;
for (i=0, I=tiles.length; i<I; i++) {
thisTile = tiles[i];
titleText = thisTile.getElementsByClassName('index-item-title-text')[0];
parent = thisTile.getElementsByClassName('index-item-text-wrapper')[0];
description = document.createElement('span');
description.className = 'index-item-description-text';
switch(titleText.innerHTML.toLowerCase()) {
case "google shopping":
description.innerHTML = "Some custom text.";
break;
case "hana":
description.innerHTML = "More text that's custom.";
break;
case "wali":
description.innerHTML = "Custom text here.";
break;
case "cypress":
description.innerHTML = "Type anything you want.";
break;
case "ryde":
description.innerHTML = "Just another bit of text.";
break;
default:
description.innerHTML = "";
}
parent.appendChild(description);
}
})();
</script>
Observe the pattern in the code in order to add new tiles or edit existing ones. You will see that the script attempts to match (a lower case version of) the 'title' text and then inserts text based on each title. This allows you to add more in the future by repeating this 'case' pattern. Of course if you ever change the title of a tile you'd have to correspondingly change this Javascript code.
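If the list of tiles grows, the same title-to-text matching could also be expressed as a lookup table instead of a long switch. This is only a sketch of an alternative: the names `descriptions` and `descriptionFor` are hypothetical, and the entries are copied from the switch above.

```javascript
// Lookup table keyed by the lower-case tile title (entries taken from the switch above).
const descriptions = {
  'google shopping': 'Some custom text.',
  'hana': "More text that's custom.",
  'wali': 'Custom text here.',
  'cypress': 'Type anything you want.',
  'ryde': 'Just another bit of text.',
};

// Hypothetical helper: returns the description for a tile title,
// or an empty string when the title has no entry (like the default case).
function descriptionFor(title) {
  const key = title.trim().toLowerCase();
  return Object.prototype.hasOwnProperty.call(descriptions, key)
    ? descriptions[key]
    : '';
}

console.log(descriptionFor('Google Shopping'));
console.log(descriptionFor('Unknown tile') === '');
```

Inside the loop, `description.innerHTML = descriptionFor(titleText.innerHTML);` would then take the place of the switch statement.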
You can then style the description by inserting CSS via the Squarespace CSS Editor (or via your base.less file if using Developer Mode). For example:
.index-item-description-text {
display: block;
font-size: 1.2em;
color: #FFFFFF
}
Note that while there is an alternative method that would use each tile's respective URL to do an AJAX query and obtain meta data about each project (and therefore allow you to use the Squarespace content manager to insert this 'description'), that method seems unnecessarily complex for your case.
Update 8/17/2016: Regarding AJAX and how to disable AJAX loader in Squarespace: Jason Barone has suggested adding this snippet to your Code Injection > Footer to disable the "AJAX" pageloader. He noted that it will disable the smooth, AJAX transitions between pages, but will allow custom Javascript like usual.
<script>
//Credit: Jason Barone, http://jasonbarone.com/
window.Template.Constants.AJAXLOADER = false;
</script>
Also, some templates have an option to disable AJAX within the style editor (image credit: SSSUPERS):
Update 9/28/2016:
It has been reported that the code provided above no longer disables AJAX. However, some newer templates have added an 'Enable AJAX Loading' setting that can be toggled off.
I am tasked with converting hundreds of Word document pages into a knowledge base html application. This means copying and pasting the HTML of the word document into an editor like Notepad++ and cleaning it up. (Since it is internal document I need to convert, I cannot use online converters).
I have been able to do most of what I need with a javascript function that works "onload" of the body tag. I then copy the resulting HTML into my application framework.
Here is part of the function I wrote: (it shows only code for removing attributes of div and p tags but works for all html tags in the document)
function removeatts() //this function will remove all attributes from all elements and also remove empty span elements
{//for removing div tag attributes
var divs=document.getElementsByTagName('div'); //look at all div tags
var divnum=divs.length; //number of div tags on the page
for (var i=0; i<divnum; i++) //run through all the div tags
{//remove attributes for each div tag
divs[i].removeAttribute("class");
divs[i].removeAttribute("id");
divs[i].removeAttribute("name");
divs[i].removeAttribute("style");
divs[i].removeAttribute("lang");
}
//for removing p tag attributes
var ps=document.getElementsByTagName('p'); //look at all p tags
var pnum=ps.length; //number of p tags on the page
for (var i=0; i<pnum; i++) //run through all the p tags
{//remove attributes for each p tag
var para=ps[i].innerHTML;
if (para.length!==0) //ie if there is content inside the p tag
{
ps[i].removeAttribute("class");
ps[i].removeAttribute("id");
ps[i].removeAttribute("name");
ps[i].removeAttribute("style");
ps[i].removeAttribute("lang");
}
else
{//remove empty p tag
ps[i].remove() ;
}
if (para=="<o:p></o:p>" || para=="<o:p> </o:p>" || para=="<o:p> </o:p>")
{
ps[i].remove() ;
}
}
The first problem I encountered is that if I included the if (para=="<o:p></o:p>" || para=="<o:p> </o:p>" || para=="<o:p> </o:p>") part in an else if statement, the whole function stopped executing.
However, without the if (para=="<o:p></o:p>" || para=="<o:p> </o:p>" || para=="<o:p> </o:p>") part, the function does exactly what it is supposed to.
If, however, I keep it the way it is right now, it does some of what I want it to do.
The trouble occurs over some of the Word generated html that looks like this:
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto; margin-
left:.25in;text-align:justify;text-indent:-.25in;line-height:150%;
mso-list:l0 level1 lfo1;tab-stops:list .75in'>
<![if !supportLists]><span style='font-family:Symbol;mso-fareast-font-family:Symbol;mso-bidi-font-family:Symbol;color:black'><span style='mso-list:Ignore'>·
<span style='font:7.0pt "Times New Roman"'>
</span></span></span>
<![endif]><span style='font-family:"Arial","sans-serif";mso-fareast-font-family:Calibri;color:black'>
SOME TEXT.<span style='mso-spacerun:yes'> </span>SOME MORE TEXT.<span style='mso-spacerun:yes'> </span>EVEN MORE TEXT.
<span style='mso-spacerun:yes'> </span>BLAH BLAH BLAH.<o:p></o:p></span></p>
<p><o:p></o:p></p>
Notice the <o:p></o:p> in the last two lines..... This is not getting removed either when treated as plain text or if I write code for it in the function just like the divs and paragraphs as shown in the function above. When I run the function on this, I get
<p>
<![if !supportLists]><span>·
<span>
</span></span></span>
<![endif]><span>
SOME TEXT.<span> </span>SOME MORE TEXT.<span> </span>EVEN MORE TEXT.
<span> </span>BLAH BLAH BLAH.<o:p></o:p></span></p>
<p><o:p></o:p></p>
I have looked around but cannot find any information about whether javascript works the same on known html tags and on something like this that follows the principle of opening and closing tags but doesn't match known HTML tags!
Any ideas about a workaround would be greatly appreciated!
Javascript has no special processing of HTML tags in javascript strings. It honestly doesn't know anything about HTML in the string.
More likely your issue is trying to compare the .innerHTML of a tag to a predetermined string. You cannot and should not do that, because there is no guarantee about the format of .innerHTML. There are hundreds of ways the same HTML can be formatted, and some browsers don't remember the original HTML but reconstitute it when you ask for .innerHTML, so you simply can't do that type of string comparison.
To be sure of your comparison, you will have to actually parse the HTML (at least with some sort of crude parser, which could perhaps even be a regex) to see if it matches what you want, because you can't rely on optional spacing or optional capitalization in a direct string comparison.
Or, perhaps even better, since your HTML is already parsed, why not just look at the actual HTML objects themselves and see if you have what you want there. You shouldn't even have to remove all those attributes then.
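The crude regex-based check mentioned above could look something like this. It is a sketch under assumptions: the function name `isEmptyWordParagraph` is hypothetical, and it only treats whitespace, `&nbsp;` entities and empty `<o:p>` elements as "empty".

```javascript
// Hypothetical helper: true when a paragraph's HTML contains nothing but
// whitespace, &nbsp; entities and empty <o:p>...</o:p> elements.
// The match is case-insensitive and tolerant of spacing, unlike a == comparison.
function isEmptyWordParagraph(innerHtml) {
  const emptyPattern = /^(?:\s|&nbsp;|<o:p>(?:\s|&nbsp;)*<\/o:p>)*$/i;
  return emptyPattern.test(innerHtml);
}

console.log(isEmptyWordParagraph('<o:p></o:p>'));
console.log(isEmptyWordParagraph('<O:P> &nbsp;</O:P>\n'));
console.log(isEmptyWordParagraph('BLAH BLAH BLAH.<o:p></o:p>'));
```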
It's not Javascript that is unhappy with the unknown tags. It's the browser.
For JS it's simply a string. So if this is a very specific case where you just don't need <o:p>, you could remove it from the string with a regex (replace returns a new string, so assign the result):
para = para.replace(/<[/]?o:p>/ig, "");
But if there are many more, I would strongly suggest you to get familiar with XSLT transformation.
The first problem I encountered is that if I included the if (para=="<o:p></o:p>" || para=="<o:p> </o:p>" || para=="<o:p> </o:p>")
part in an else if statement, the whole function stopped executing.
This is because you cannot have else if after else.
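The branch order that does work is if, then else if, then else last. A minimal sketch of that ordering, using a hypothetical function `paragraphAction` that only decides what should happen to a paragraph and leaves the DOM untouched:

```javascript
// Hypothetical helper: decides what to do with a <p> based on its inner HTML.
// The point is the branch order: if -> else if -> else (else must come last).
function paragraphAction(innerHtml) {
  const content = innerHtml.trim();
  if (content.length === 0) {
    return 'remove'; // empty <p>
  } else if (/^<o:p>\s*<\/o:p>$/i.test(content)) {
    return 'remove'; // <p> holding only an empty <o:p>
  } else {
    return 'strip-attributes'; // real content: keep it, drop the attributes
  }
}

console.log(paragraphAction(''));
console.log(paragraphAction('<o:p> </o:p>'));
console.log(paragraphAction('SOME TEXT.'));
```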
Notice the <o:p></o:p> in the last two lines..... This is not getting removed
I cannot confirm that. When I run your function it removes the <o:p> inside the <p>, as it is supposed to. The <o:p> within the <span> is not processed, because your function does not do that.
If you want to remove all <o:p>s, try
[].forEach.call(document.querySelectorAll('o\\:p'), function (el) {
el.remove();
});
After that, you may want to remove empty <p>s like this
[].forEach.call(document.querySelectorAll('p'), function (el) {
if (!el.childNodes.length) {
el.remove();
}
});