Replacing text with replaceText() and defining said replacement as a heading - javascript

I have a Google spreadsheet and a Google document. The document is a report which gets filled from the spreadsheet, and the spreadsheet also defines what goes into the report. Therefore I have a script which gathers a bunch of placeholders depending on values in the document.
After all the placeholders have been inserted in the document (there are a couple of pages before that) it looks kind of like this:
{{header1.1}}
{{text1.1}} // this is already a couple of lines of text
{{table1.1}}
{{table.dir}}
{{blob1.1}}
{{blob.dir}}
I already have a script which inserts all the text parts, and I have set up a script which should be capable of writing the tables at the correct position. So far I can replace the {{header1.1}}, and defining it as a heading works, but everything after header1.1 becomes a heading as well.
I've been at this problem for quite a while without getting far; it's always one step forward, one step back. Also, this is my first question after a couple of years of just reading on Stack Overflow. I'd appreciate it if someone could help.
function myUeberschriftenboi() {
  var doc = DocumentApp.openById('someID');
  console.log(doc.getName());
  var body = doc.getBody();
  // formats
  const plain3style = {};
  plain3style[DocumentApp.Attribute.HEADING] = DocumentApp.ParagraphHeading.HEADING3;
  var lvl2array = [ "{{header1.1}}", "{{header1.2}}" ];
  var fill2array = [ "Energy", "Energyflow" ];
  var lvl2count = 1;
  for (var j = 0; j < lvl2array.length; j++) {
    var seek = body.findText(lvl2array[j]);
    if (seek != null) {
      body.replaceText(lvl2array[j], "1.1." + lvl2count + " " + fill2array[j] + "\n");
      var seek2 = body.findText("1." + lvl2count + " " + fill2array[j]);
      seek2.getElement().getParent().getChild().setAttributes(plain3style);
      lvl2count++;
    }
  }
}
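A minimal sketch of an alternative approach, assuming each placeholder sits in its own paragraph (as in the layout above) and 'someID' stands in for the real document ID: style the paragraph that contained the match directly with setHeading(), which applies the heading to that one paragraph only. Note that getChild() with no index, as in the snippet above, throws an error, and one likely reason the heading bleeds into the following text is that the "\n" appended by replaceText() does not start a new paragraph element.
// Sketch: replace each placeholder and style only its own paragraph.
function replacePlaceholderHeadings() {
  var body = DocumentApp.openById('someID').getBody();
  var lvl2array = ["{{header1.1}}", "{{header1.2}}"];
  var fill2array = ["Energy", "Energyflow"];
  var lvl2count = 1;
  for (var j = 0; j < lvl2array.length; j++) {
    var seek = body.findText(lvl2array[j]);
    if (seek !== null) {
      // The matched Text element's parent is the paragraph that holds it.
      var par = seek.getElement().getParent().asParagraph();
      par.setText("1.1." + lvl2count + " " + fill2array[j]);
      // setHeading() is scoped to this paragraph, so the text after it
      // keeps its own formatting.
      par.setHeading(DocumentApp.ParagraphHeading.HEADING3);
      lvl2count++;
    }
  }
}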

Related

Grab data from website HTML table and transfer to Google Sheets using App-Script

Ok, I know there are similar questions out there to mine, but so far I have yet to find any answers that work for me. What I am trying to do is gather data from an entire HTML table on the web (https://www.sports-reference.com/cbb/schools/indiana/2022-gamelogs.html) and then parse it and transfer it to a range in my Google Sheet. The code below is probably the closest thing I've found so far, because at least it doesn't error out, but it will only find one string or value, not the whole table. I've found other answers that use XmlService.parse; however, that doesn't work for me, I believe because the HTML has issues that it can't parse. Does anyone have an idea of how to edit what I have below, or a whole new idea that may work for this website?
function SAMPLE() {
  const url = "http://www.sports-reference.com/cbb/schools/indiana/2022-gamelogs.html#sgl-basic?";
  // Get all the static HTML text of the website
  const res = UrlFetchApp.fetch(url, {muteHttpExceptions: true}).getContentText();
  // Find the index of the string of the parameter we are searching for
  const index = res.search("td class");
  // Create a substring to get only the right number values, ignoring all the HTML tags and classes
  const sub = res.substring(index + 92, index + 102);
  Logger.log(sub);
  return sub;
}
I understand that I can use IMPORTHTML natively in a Google Sheet, and that's what I'm currently doing. However, I am doing this for over 350 webpage tables, iterating through each one to load it and then copying the values to another sheet. Apps Script bogs down quite a bit when it is repeatedly waiting on Sheets to load an IMPORTHTML, grabbing some data, and doing it all over again on another URL. I apologize for any formatting issues in this post or things I've done wrong; this is my first time posting here.
Edit: OK, I've found a method that works, but it's still much slower than I would like, because it uses the Drive API to create a document with the HTML data and then parses it and creates an array from there. The Drive.Files.insert line is the most time-consuming part. Anyone have an idea of how to make this quicker? It may not seem that slow to you right now, but when I need to do this 350 times, it adds up.
function parseTablesFromHTML() {
  var html = UrlFetchApp.fetch("https://www.sports-reference.com/cbb/schools/indiana/2022-gamelogs.html");
  var docId = Drive.Files.insert(
    { title: "temporalDocument", mimeType: MimeType.GOOGLE_DOCS },
    html.getBlob()
  ).id;
  var tables = DocumentApp.openById(docId)
    .getBody()
    .getTables();
  var res = tables.map(function(table) {
    var values = [];
    for (var row = 0; row < table.getNumRows(); row++) {
      var temp = [];
      var cols = table.getRow(row);
      for (var col = 0; col < cols.getNumCells(); col++) {
        temp.push(cols.getCell(col).getText());
      }
      values.push(temp);
    }
    return values;
  });
  Drive.Files.remove(docId);
  var range = SpreadsheetApp.getActive().getSheetByName("Test").getRange(3, 6, res[0].length, res[0][0].length);
  range.setValues(res[0]);
  SpreadsheetApp.flush();
}
Solution by formula
Try
=importhtml(url,"table",1)
Other solution by script
function importTableHTML() {
  var url = 'https://www.sports-reference.com/cbb/schools/indiana/2022-gamelogs.html';
  var html = '<table' + UrlFetchApp.fetch(url, {muteHttpExceptions: true}).getContentText().replace(/(\r\n|\n|\r|\t| )/gm,"").match(/(?<=\<table).*(?=\<\/table)/g) + '</table>';
  var trs = [...html.matchAll(/<tr[\s\S\w]+?<\/tr>/g)];
  var data = [];
  for (var i = 0; i < trs.length; i++) {
    var tds = [...trs[i][0].matchAll(/<(td|th)[\s\S\w]+?<\/(td|th)>/g)];
    var prov = [];
    for (var j = 0; j < tds.length; j++) {
      var donnee = tds[j][0].match(/(?<=\>).*(?=\<\/)/g)[0];
      prov.push(stripTags(donnee));
    }
    data.push(prov);
  }
  return data;
}
function stripTags(body) {
  var regex = /(<([^>]+)>)/ig;
  return body.replace(regex, "");
}
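To write the result into a sheet, a usage sketch (assuming a sheet named "Test" exists; rows are padded because setValues() requires a rectangular 2D array):
function writeImportedTable() {
  var data = importTableHTML();
  // Pad shorter rows so every row has the same width.
  var width = Math.max.apply(null, data.map(function(r) { return r.length; }));
  var padded = data.map(function(r) {
    return r.concat(new Array(width - r.length).fill(''));
  });
  var sheet = SpreadsheetApp.getActive().getSheetByName('Test');
  sheet.getRange(1, 1, padded.length, width).setValues(padded);
}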

Exception: Document is missing (perhaps it was deleted, or you don't have read access?)

I'm working on a project that takes "profiles" stored in a Google Sheet, makes a unique Google Doc for each profile, and then updates the unique Google Doc with any new information when you push a button on the Google Sheet.
I have some other automations built into my original code, but I simplified most of it to what's pertinent to the error I'm getting, which is this:
Exception: Document is missing (perhaps it was deleted, or you don't have read access?)
It happens on Line 52 of my script, in the fileUpdate function. Here's the appropriate line for reference:
var file = DocumentApp.openById(fileName);
And this is the rest of my code:
function manageFiles() {
  // Basic setup. Defining the range and retrieving the spreadsheet to store as an array.
  var date = new Date();
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  var array = sheet.getDataRange().getValues();
  var arrayL = sheet.getLastRow();
  var arrayW = sheet.getLastColumn();
  for (var i = 1; i < arrayL; i++) {
    if (array[i][arrayW-2] == "") {
      // Collect the data from the current sheet.
      // Create the document and retrieve some information from it.
      var docTitle = array[i][0];
      var doc = DocumentApp.create(docTitle);
      var docBody = doc.getBody();
      var docLink = doc.getUrl();
      // Use a for loop to collect the unique data from each cell in the row.
      docBody.insertParagraph(0, "Last Updated: " + date);
      for (var j = 2; j <= arrayW; j++) {
        var colName = array[0][arrayW-j];
        var data = array[i][arrayW-j];
        if (colName !== "Filed?") {
          docBody.insertParagraph(0, colName + ": " + data);
        }
      }
      // Insert a hyperlink to the file in the cell containing the SID.
      sheet.getRange(i+1, 1).setFormula('=HYPERLINK("' + docLink + '", "' + SID + '")');
      // Insert a checkbox and check it.
      sheet.getRange(i+1, arrayW-1).insertCheckboxes();
      sheet.getRange(i+1, arrayW-1).setFormula('=TRUE');
    }
    else if (array[i][arrayW-2] !== "") {
      fileUpdate(i);
    }
  }
  sheet.getRange(1, arrayW).setValue('Last Update: ' + date);
}
// Note: I hate how cluttered fileUpdate is. I'm going to clean it up later.
function fileUpdate(rowNum) {
  // Now you do the whole thing over again from manageFiles().
  // Basic setup. Defining the range and retrieving the spreadsheet to store as an array.
  var date = new Date();
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getActiveSheet();
  var array = sheet.getDataRange().getValues();
  var arrayL = sheet.getLastRow();
  var arrayW = sheet.getLastColumn();
  // Collect the data from the current sheet.
  var fileName = array[rowNum][0];
  var file = DocumentApp.openById(fileName);
  // Retrieve the body of the document and clear the text, making it blank.
  file.getBody().setText("");
  // Use a for loop to collect the unique data from every non-blank cell in the row.
  file.getBody().insertParagraph(0, "Last Updated: " + date);
  for (var j = 2; j <= arrayW; j++) {
    var colName = array[0][arrayW-j];
    var data = array[rowNum][arrayW-j];
    file.getBody().insertParagraph(0, colName + ": " + data);
  }
}
If you'd like to take a look at my sample spreadsheet, you can see it here. I suggest you make a copy though, because you won't have permissions to the Google Docs my script created.
I've looked at some other forums with this same error and tried several of the prescribed solutions (signing out of other Google Accounts, clearing my cookies, completing the URL with a slash, widening permissions to everyone with the link), but to no avail.
Note to anyone offended by my janky code or formatting: I'm self-taught, so I do apologize if my work is difficult to read.
The problem (in the updated code attached to your sheet) comes from your URL.
Side note: In your initial question, you define DocumentApp.openById(fileName);. I assume you realized that this is not correct, since you updated your code to DocumentApp.openByUrl(docURL);, so I will discuss the problem of the latter in the following.
The URLs in your sheet are of the form
https://docs.google.com/open?id=1pT5kr7V11TMH0pJea281VhZg_1bOt8YDRrh9thrUV0w
while DocumentApp.openByUrl expects a URL of form
https://docs.google.com/document/d/1pT5kr7V11TMH0pJea281VhZg_1bOt8YDRrh9thrUV0w/
Just adding a / is not enough!
Either create the expected URL manually, or, much easier, use the method DocumentApp.openById(id) instead.
For this, you can extract the id from your URL as follows:
var id = docURL.split("https://docs.google.com/open?id=")[1];
var file = DocumentApp.openById(id)
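For completeness, the manual-URL route would look like this (a sketch building the form openByUrl expects from the same extracted id):
var id = docURL.split("https://docs.google.com/open?id=")[1];
var file = DocumentApp.openByUrl("https://docs.google.com/document/d/" + id + "/edit");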

Looping through InDesign file, placing graphic on each page

I've set up a document that contains identical structures, including a text box labeled "pageNumber" to hold a variable based on the page number, an empty rectangle to receive an Illustrator file, and a text box filled with placeholder text.
Eventually, I'll be using data-merge to create a much longer document, comprised of identically structured pages, but containing personalized data.
(I created a zip file containing these elements, in case it would be helpful to see what I'm testing: http://arthousemedia.com/files/PageNumbers.zip )
The jsx I've created works to an extent. When I run it, there are a couple of alerts that show I've created the right variables. But instead of placing pageNumber-p1.ai on page one and pageNumber-p2.ai on the second page, whichever page I'm "focused" upon gets its graphic placed twice on page 1. Apparently, I'm not succeeding in looping through pages. Not sure.
Any advice you have would be helpful. Thanks.
var doc = app.activeDocument,
    pagesLength = doc.pages.length;
for (var i = 0; i < pagesLength; i++) {
    var myLabel = "pageNumber",
        myPage = app.properties.activeWindow && app.activeWindow.activePage,
        myTextFrames = myPage.textFrames.everyItem().getElements(myLabel).slice(0),
        l = myTextFrames.length,
        myVariable;
    alert(myVariable);
    while (l--) {
        if (myTextFrames[l].label != myLabel) continue;
        myVariable = myTextFrames[l].contents;
        break;
    }
    var thisState = "~/Documents/PageNumbers/pageNumber-" + myVariable + ".ai";
    alert(thisState);
    var rect = doc.pages[i].rectangles[0];
    try {
        rect.place(File(thisState));
        rect.fit(FitOptions.CONTENT_TO_FRAME);
        rect.fit(FitOptions.PROPORTIONALLY);
        rect.fit(FitOptions.CENTER_CONTENT);
    }
    catch (e) {
        alert(e);
    }
}
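One likely fix, sketched below, is to read the labeled text frame from doc.pages[i] instead of the active window's page, so each iteration really targets a different page (same file-naming scheme as the original):
var doc = app.activeDocument;
for (var i = 0; i < doc.pages.length; i++) {
    var myPage = doc.pages[i]; // the page for this iteration, not the focused one
    var myVariable;
    var frames = myPage.textFrames.everyItem().getElements();
    for (var f = 0; f < frames.length; f++) {
        if (frames[f].label === "pageNumber") {
            myVariable = frames[f].contents;
            break;
        }
    }
    var thisState = "~/Documents/PageNumbers/pageNumber-" + myVariable + ".ai";
    var rect = myPage.rectangles[0];
    try {
        rect.place(File(thisState));
        rect.fit(FitOptions.CONTENT_TO_FRAME);
        rect.fit(FitOptions.PROPORTIONALLY);
        rect.fit(FitOptions.CENTER_CONTENT);
    } catch (e) {
        alert(e);
    }
}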

JavaScript performance optimization

I created the following JS function:
function csvDecode(csvRecordsList)
{
  var cel;
  var chk;
  var chkACB;
  var chkAF;
  var chkAMR;
  var chkAN;
  var csvField;
  var csvFieldLen;
  var csvFieldsList;
  var csvRow;
  var csvRowLen = csvRecordsList.length;
  var frag = document.createDocumentFragment();
  var injectFragInTbody = function () {tblbody.replaceChild(frag, tblbody.firstElementChild);};
  var isFirstRec;
  var len;
  var newEmptyRow;
  var objCells;
  var parReEx = new RegExp(myCsvParag, 'ig');
  var tblbody;
  var tblCount = 0;
  var tgtTblBodyID;
  for (csvRow = 0; csvRow < csvRowLen; csvRow++)
  {
    if (csvRecordsList[csvRow].startsWith(myTBodySep))
    {
      if (frag.childElementCount > 0)
      {
        injectFragInTbody();
      }
      tgtTblBodyID = csvRecordsList[csvRow].split(myTBodySep)[1];
      newEmptyRow = getNewEmptyRow(tgtTblBodyID);
      objCells = newEmptyRow.cells;
      len = newEmptyRow.querySelectorAll('input')[0].parentNode.cellIndex; // Finds the cell index where the first input (checkbox or button) is placed
      tblbody = getElById(tgtTblBodyID);
      chkAF = toBool(tblbody.dataset.acceptfiles);
      chkACB = toBool(tblbody.dataset.acceptcheckboxes);
      chkAN = toBool(tblbody.dataset.acceptmultiplerows);
      tblCount++;
      continue;
    }
    csvRecordsList[csvRow] = csvRecordsList[csvRow].replace(parReEx, myInnerHTMLParag); // Replaces all the paragraph symbols ¶ used in the db.csv file with the <br> tag needed in the HTML content of table cells; this makes line breaks inside table cells possible
    csvFieldsList = csvRecordsList[csvRow].split(myEndOfFld);
    csvFieldLen = csvFieldsList.length;
    for (csvField = 0; csvField < csvFieldLen; csvField++)
    {
      cel = chkAN ? csvField + 1 : csvField;
      if (chkAF && cel === 1) {objCells[cel].innerHTML = makeFileLink(csvFieldsList[csvField]);}
      else if (chkACB && cel === len) {objCells[cel].firstChild.checked = toBool(csvFieldsList[csvField]);}
      else {objCells[cel].innerHTML = csvFieldsList[csvField];}
    }
    frag.appendChild(newEmptyRow.cloneNode(true));
  }
  injectFragInTbody();
  var recNum = getElById(tgtTblBodyID).childElementCount;
  customizeHtmlTitle();
  return csvRow - tblCount + ' (di cui ' + recNum + ' record di documenti)';
}
More than 90% of records could contain file names that have to be processed by the following makeFileLink function:
function makeFileLink(fname)
{
  return ['<a href="', dirDocSan, fname, '" target="', previewWinName, '" title="Apri il file allegato: ', fname, '" >', fname, '</a>'].join('');
}
It aims to decode a record list from a special type of *.db.csv file (comma-separated values where the commas are replaced by another symbol I hard-coded into the variable myEndOfFld). (This special type of *.db.csv is created by another function I wrote; it is just a text file.)
The record list to decode and append to HTML tables is passed to the function with its lone parameter: (csvRecordsList).
The CSV file hosts data coming from several HTML tables.
Tables differ in number of rows and columns and in some of the contained data types (which could be filenames, numbers, strings, dates, checkbox values).
Some tables could be just 1 row, others accept more rows.
A row of data has the following basic structure:
data field content 1|data field content 2|data field content 3|etc...
Once decoded by my algorithm it will be rendered correctly into the HTML td element, even if a field contains several paragraphs. In fact the <br> tag will be added where needed by the code:
csvRecordsList[csvRow].replace(parReEx, myInnerHTMLParag)
which replaces every occurrence of the character I chose to represent the paragraph symbol, hard-coded into the variable myCsvParag.
It isn't possible to know at programming time the number of records to load into each table, nor the number of records loaded from the CSV file, nor the number of fields in each record, nor which table fields will contain data and which will be empty: in the same record some fields could contain data and others could be empty. Everything has to be discovered at runtime.
In the special CSV file each table is separated from the next by a row which contains just a string with the following pattern: myTBodySep + tablebodyid, where myTBodySep = "targettbodydatatable" is just a hard-coded string of my choice.
tablebodyid is just a placeholder containing the id of the target table tbody element to insert new records into, for example: tBodyDataCars, tBodyDataAnimals... etc.
So when the first for loop finds in csvRecordsList a string starting with the string in the variable myTBodySep, it takes the tablebodyid from the same row: this will be the new tbody id to target for injecting the next records. This is how each table is archived in the CSV file.
The first for loop scans the CSV record list from the file, and the second for loop prepares what is needed to fill the targeted table with data.
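For example, a fragment of such a file could look like this (hypothetical ids and field data, with | standing in for myEndOfFld as in the row structure above):
targettbodydatatabletBodyDataCars
Fiat|Panda|2003
Ford|Focus|2012
targettbodydatatabletBodyDataAnimals
cat|mammal|domestic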
The above code works well, but it is a little bit slow: loading about 300 records from the CSV file into the HTML tables takes a bit more than 2.5 seconds on a computer with 2 GB RAM and a Pentium Core 2 4300 dual-core at 1800 MHz, but if I comment out the row that updates the DOM the function needs less than 0.1 s. So IMHO the bottleneck is the fragment- and DOM-manipulating part of the code.
My aim and hope is to optimize the speed of the above code without losing functionalities.
Notice that I'm targeting just modern browsers and I don't care about other, non-standards-compliant browsers... I feel sorry for them...
Any suggestions?
Thanks in advance.
Edit 16.02.2018
I don't know if it is useful, but lately I've noticed that if data is loaded from the browser's sessionStorage, the load and rendering time is more or less halved. But strangely it is the exact same function that loads data from both the file and sessionStorage.
I don't understand the reason for this different behavior, considering that the data is exactly the same and in both cases it is passed to a variable handled by the function itself before performance timing starts.
Edit 18.02.2018
Number of rows is variable depending on the target table: from 1 to 1000 (could be even more in particular cases)
Number of columns is variable depending on the target table: from 10 to 18-20
In fact, building the table using DOM manipulations is way slower than a simple innerHTML update of the table element.
And if you tried to rewrite your code to prepare an HTML string and put it into the table's innerHTML, you would see a significant performance boost.
Browsers are optimized to parse the text/html they receive from the server, as that is their main purpose. DOM manipulations via JS are secondary, so they are not as optimized.
I've made a simple benchmark for you.
Let's make a table 300x300 and fill 90,000 cells with 'A'.
There are two functions.
The first one is a simplified variant of your code which uses DOM methods:
var table = document.querySelector('table tbody');
var cells_in_row = 300, rows_total = 300;
var start = performance.now();
fill_table_1();
console.log('using DOM methods: ' + (performance.now() - start).toFixed(2) + 'ms');
table.innerHTML = '<tbody></tbody>';
function fill_table_1() {
  var frag = document.createDocumentFragment();
  var injectFragInTbody = function() {
    table.replaceChild(frag, table.firstElementChild);
  };
  var getNewEmptyRow = function() {
    var row = table.firstElementChild;
    if (!row) {
      row = table.insertRow(0);
      for (var c = 0; c < cells_in_row; c++) row.insertCell(c);
    }
    return row.cloneNode(true);
  };
  for (var r = 0; r < rows_total; r++) {
    var new_row = getNewEmptyRow();
    var cells = new_row.cells;
    for (var c = 0; c < cells_in_row; c++) cells[c].innerHTML = 'A';
    frag.appendChild(new_row.cloneNode(true));
  }
  injectFragInTbody();
  return false;
}
<table><tbody></tbody></table>
The second one prepares an HTML string and puts it into the table's innerHTML:
var table = document.querySelector('table tbody');
var cells_in_row = 300, rows_total = 300;
var start = performance.now();
fill_table_2();
console.log('setting innerHTML: ' + (performance.now() - start).toFixed(2) + 'ms');
table.innerHTML = '<tbody></tbody>';
function fill_table_2() { // setting innerHTML
  var html = '';
  for (var r = 0; r < rows_total; r++) {
    html += '<tr>';
    for (var c = 0; c < cells_in_row; c++) html += '<td>A</td>';
    html += '</tr>';
  }
  table.innerHTML = html;
  return false;
}
<table><tbody></tbody></table>
I believe you'll come to some conclusions.
I've got two thoughts for you.
1: If you want to know which parts of your code are (relatively) slow you can do very simple performance testing using the technique described here. I didn't read all of the code sample you gave but you can add those performance tests yourself and check out which operations take more time.
2: What I know of JavaScript and the browser is that changing the DOM is an expensive operation, you don't want to change the DOM too many times. What you can do instead is build up a set of changes and then apply all those changes with one DOM change. This may make your code less nice, but that's often the tradeoff you have when you want to have high performance.
Let me know how this works out for you.
You should start by refactoring your code into multiple functions to make it a bit more readable. Make sure that you are separating DOM manipulation functions from data processing functions. Ideally, create a class and get those variables out of your function; this way you can access them with this.
Then, you should execute each function processing data in a web worker, so you're sure that your UI won't get blocked by the process. You won't be able to access this in a web worker so you will have to limit it to pure "input/output" operations.
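For instance, a minimal sketch of that split (renderRows and the '|' field separator are assumptions standing in for your own rendering code and myEndOfFld):
// main.js - DOM work stays on the main thread
var worker = new Worker('csvWorker.js');
worker.onmessage = function (e) {
  renderRows(e.data); // hypothetical function that builds the table rows
};
worker.postMessage(csvRecordsList);

// csvWorker.js - pure data processing, no DOM access here
onmessage = function (e) {
  var rows = e.data.map(function (rec) { return rec.split('|'); });
  postMessage(rows);
};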
You can also use promises instead of homemade callbacks. It makes the code a bit more readable and, honestly, easier to debug. You can do some cool stuff like:
this.processThis('hello').then((resultThis) => {
  this.processThat(resultThis).then((resultThat) => {
    this.displayUI(resultThat);
  }, (error) => {
    this.errorController.show(error); // processThat error
  });
}, (error) => {
  this.errorController.show(error); // processThis error
});
Good luck!

Extracting img urls from webpage using Google Apps Script

This is an Apps Script that goes through a webpage and collects img urls that are inside some div of a special class.
function getIMGs(url) {
  var result = UrlFetchApp.fetch(url);
  if (result.getResponseCode() == 200) {
    var doc = Xml.parse(result, true);
    var bodyHtml = doc.html.body.toXmlString();
    doc = XmlService.parse(bodyHtml);
    var html = doc.getRootElement();
    var thumbs = getElementsByClassName(html, 'thumb');
    var sheet = SpreadsheetApp.getActiveSheet();
    for (var i in thumbs) {
      var output = '';
      var linksInMenu = getElementsByTagName(thumbs[i], 'img');
      for (var j in linksInMenu) {
        output += XmlService.getRawFormat().format(linksInMenu[j]);
      }
      var linkRegExp = /data-src="(.*?)"/;
      var dataSrc = linkRegExp.exec(output);
      sheet.appendRow([dataSrc[1]]);
    }
  }
}
So first the code gets the HTML, then uses an auxiliary function to get certain elements, which look like this:
<div class="thumb"><div class="loader"><span class="icon-uniE611"></span></div><img src="//xxx" data-src="https://xxx/8491a83b1cacc2401907997b5b93e433c03c91f.JPG" data-target="#image-slider" data-slide-to="0"></div>
Then the code gets the img elements, and finally extracts the data-src address via RegExp.
While this kinda works, I have a problem: after 9 loops it crashes on the appendRow line, as the last 4 thumb elements don't have data-src, hence what I'm trying to write into the spreadsheet is null.
Any solution for this? I have fixed it for the moment by doing only 9 iterations of the for loop, but this is far from optimal, as it's not automated and required me to go through the page to count the elements with data-src.
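A minimal guard along those lines (a sketch of the automated fix) would be to skip thumbs without a data-src instead of capping the loop count:
var dataSrc = linkRegExp.exec(output);
if (dataSrc) { // exec() returns null when the thumb has no data-src
  sheet.appendRow([dataSrc[1]]);
}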
Also, any suggestion of a more elegant solution will be appreciated! I will be really grateful for any helping hand!
Cheers
