Object - An Object in the Object - An array of those Objects - javascript

I'm new to javascript so let me just say that right up front.
A web site I frequent has 50 or so items, with details about that item, in a table. Each table row contains several td cells. Some rows have types of things that are similar, like USB drives or whatever. I want to capture each row so that I can group and reorder them to suit my tastes.
I have this object:
function vnlItemOnPage(){
this.Category = "unknown";
this.ItemClass = "vnlDefaultClass";
this.ItemBlock = {};
}
This represents one row.
What I've been trying to figure out is how to capture the block of HTML, <tr>stuff</tr>, and save it into this.ItemBlock.
That part is pretty easy:
vnlItemOnPage.ItemBlock = element.getElementsByClassName('className')[0]
?
That seems pretty straight forward. Am I missing something?
This part I am stuck:
There'll be 50 of them so I need an array of vnlItemOnPage?
vnlAllItems = ???
var vnlAllItems = [vnlItemOnPage]?
And how would I add to the array and delete from the array? I probably won't delete from the array, so if that is complicated, don't bother with it.
Once I capture the <tr> HTML, I can just append it to a table element like so:
myTable.appendChild(vnlAllItems[0].ItemBlock);
Correct?
I'm open to any suggestions if you think I'm approaching this from the wrong direction. Performance is not a big issue - at least right now. Later I may try to conflate several pages for a couple hundred items.
Thanks for your assistance!
[edit]
Perhaps the second part of the question is so basic it's hard to believe I don't know the answer.
The array could be: var vnlAllItems = []
And then it is just:
var row1 = new vnlItemOnPage;
vnlAllItems.push(row1);
var row2 = new vnlItemOnPage;
row2.ItemBlock = element.getElementsByClassName('className')[0];
I'd like to close the question but I hate to do that without something about handling the array.

jQuery is your friend here.
This will give you the inner HTML for the first row in the body of your desired table:
var rowHtml = $('table#id-of-desired-table tbody tr:first').html() ;
To get the outer HTML, you need a jQuery extension method:
jQuery.fn.outerHTML = function() {
return $('<div>').append( this.eq(0).clone() ).html();
};
Usage is simple:
var rowHtml = $('table#id-of-desired-table tbody tr:first').outerHTML() ;
Enjoy!

Not sure if this is what you are looking for, but if I wanted to manipulate table rows I would store two things:
1. The row's whole HTML, <td>1</td>...<td>n</td>, as a string, so I can quickly reconstruct the row.
2. The actual cell values [1, ..., n] for each row, so I can manipulate the values (for example, sort by them).
To get row as html you can use:
var rowHtml = element.getElementsByClassName('className')[0].innerHTML;
To get array of cell values you can use:
var cells = [];
var cellElements = element.getElementsByClassName('className')[0].cells;
for(var i=0;i<cellElements.length;i++) {
cells.push(cellElements[i].innerText);
}
So the object to store all this would look something like:
function vnlItemOnPage(){
this.Category = "unknown";
this.ItemClass = "vnlDefaultClass";
this.RowHtml = "";
this.RowCells = [];
}
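A minimal sketch of the array handling asked about in the question, assuming plain records instead of live <tr> elements so that it runs outside a browser. The constructor arguments and the sample data are made up for illustration; on a real page each record could also keep its row element in ItemBlock and be re-appended with appendChild in the sorted order.

```javascript
// Hypothetical variant of the constructor that takes its values as arguments.
function vnlItemOnPage(category, cells) {
  this.Category = category;
  this.RowCells = cells;
}

// Adding to the array: push one record per captured row.
var vnlAllItems = [];
vnlAllItems.push(new vnlItemOnPage("usb", ["USB drive 64GB", "14.99"]));
vnlAllItems.push(new vnlItemOnPage("cable", ["HDMI cable", "4.50"]));
vnlAllItems.push(new vnlItemOnPage("usb", ["USB drive 32GB", "9.99"]));

// Grouping and reordering: sort by Category, then by the first cell.
vnlAllItems.sort(function (a, b) {
  return a.Category.localeCompare(b.Category) ||
         a.RowCells[0].localeCompare(b.RowCells[0]);
});
var sortedNames = vnlAllItems.map(function (item) { return item.RowCells[0]; });
console.log(sortedNames);

// Deleting from the array: splice(index, count) removes in place and
// returns the removed records in a new array.
var removed = vnlAllItems.splice(0, 1);
console.log(removed[0].RowCells[0], vnlAllItems.length);
```

Sorting the records and then appending each stored row element in order is usually enough for regrouping, because appendChild moves an element that is already in the document instead of copying it.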

Related

Grab data from website HTML table and transfer to Google Sheets using App-Script

Ok, I know there are similar questions out there to mine, but so far I have yet to find any answers that work for me. What I am trying to do is gather data from an entire HTML table on the web (https://www.sports-reference.com/cbb/schools/indiana/2022-gamelogs.html) and then parse it/transfer it to a range in my Google Sheet. The code below is probably the closest thing I've found so far because at least it doesn't error out, but it will only find one string or value, not the whole table. I've found other answers where they use xmlservice.parse, however that doesn't work for me, I believe because the HTML format has issues that it can't parse. Does anyone have an idea of how to edit what I have below, or a whole new idea that may work for this website?
function SAMPLE() {
const url="http://www.sports-reference.com/cbb/schools/indiana/2022-gamelogs.html#sgl-basic?"
// Get all the static HTML text of the website
const res = UrlFetchApp.fetch(url, {muteHttpExceptions: true}).getContentText();
// Find the index of the string of the parameter we are searching for
index = res.search("td class");
// create a substring to only get the right number values ignoring all the HTML tags and classes
sub = res.substring(index+92,index+102);
Logger.log(sub);
return sub;
}
I understand that I can use importHTML natively in a Google Sheet, and that's what I'm currently doing. However, I am doing this for over 350 webpage tables, iterating through each one to load it and then copy the value to another sheet. Apps Script bogs down quite a bit when it is repeatedly waiting on Sheets to load an importHTML, then grabbing some data and doing it all over again with another URL. I apologize for any formatting issues in this post or things I've done wrong; this is my first time posting here.
Edit: OK, I've found a method that works, but it's still much slower than I would like, because it uses the Drive API to create a document from the HTML data and then parses it to build an array. The Drive.Files.insert line is the most time-consuming part. Does anyone have an idea of how to make this quicker? It may not seem that slow for one page, but when I need to do this 350 times, it adds up.
function parseTablesFromHTML() {
var html = UrlFetchApp.fetch("https://www.sports-reference.com/cbb/schools/indiana/2022-gamelogs.html");
var docId = Drive.Files.insert(
{ title: "temporalDocument", mimeType: MimeType.GOOGLE_DOCS },
html.getBlob()
).id;
var tables = DocumentApp.openById(docId)
.getBody()
.getTables();
var res = tables.map(function(table) {
var values = [];
for (var row = 0; row < table.getNumRows(); row++) {
var temp = [];
var cols = table.getRow(row);
for (var col = 0; col < cols.getNumCells(); col++) {
temp.push(cols.getCell(col).getText());
}
values.push(temp);
}
return values;
});
Drive.Files.remove(docId);
var range=SpreadsheetApp.getActive().getSheetByName("Test").getRange(3,6,res[0].length,res[0][0].length);
range.setValues(res[0]);
SpreadsheetApp.flush();
}
Solution by formula
Try
=importhtml(url,"table",1)
Other solution by script
function importTableHTML() {
var url = 'https://www.sports-reference.com/cbb/schools/indiana/2022-gamelogs.html'
var html = '<table' + UrlFetchApp.fetch(url, {muteHttpExceptions: true}).getContentText().replace(/(\r\n|\n|\r|\t| )/gm,"").match(/(?<=\<table).*(?=\<\/table)/g) + '</table>';
var trs = [...html.matchAll(/<tr[\s\S\w]+?<\/tr>/g)];
var data = [];
for (var i=0;i<trs.length;i++){
var tds = [...trs[i][0].matchAll(/<(td|th)[\s\S\w]+?<\/(td|th)>/g)];
var prov = [];
for (var j=0;j<tds.length;j++){
var donnee = tds[j][0].match(/(?<=\>).*(?=\<\/)/g)[0]; // declared with var so it is not an implicit global
prov.push(stripTags(donnee));
}
data.push(prov);
}
return(data);
}
function stripTags(body) {
var regex = /(<([^>]+)>)/ig;
return body.replace(regex,"");
}
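Since the question involves about 350 pages, one option worth trying is UrlFetchApp.fetchAll, which takes an array of request objects and issues them together. The following is a rough sketch under assumptions: the helper names (chunk, buildRequests, fetchAllPages) and the batch size are invented here, no timing was measured, and the target site may throttle many requests at once. Only fetchAllPages needs the Apps Script runtime; the two helpers are plain JavaScript.

```javascript
// Split a list into batches of at most `size` items.
function chunk(items, size) {
  var batches = [];
  for (var i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Build request objects in the shape UrlFetchApp.fetchAll accepts.
function buildRequests(urls) {
  return urls.map(function (url) {
    return { url: url, muteHttpExceptions: true };
  });
}

// Apps Script only: fetch every page batch by batch and return the HTML
// texts in the same order as the input URLs.
function fetchAllPages(urls, batchSize) {
  var pages = [];
  chunk(buildRequests(urls), batchSize).forEach(function (batch) {
    UrlFetchApp.fetchAll(batch).forEach(function (response) {
      pages.push(response.getContentText());
    });
  });
  return pages;
}
```

Each returned text could then go through the same regular-expression parsing as in importTableHTML, and all results could be written to the sheet with a single setValues call per range.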

Javascript performance optimization

I created the following js function
function csvDecode(csvRecordsList)
{
var cel;
var chk;
var chkACB;
var chkAF;
var chkAMR;
var chkAN;
var csvField;
var csvFieldLen;
var csvFieldsList;
var csvRow;
var csvRowLen = csvRecordsList.length;
var frag = document.createDocumentFragment();
var injectFragInTbody = function () {tblbody.replaceChild(frag, tblbody.firstElementChild);};
var isFirstRec;
var len;
var newEmbtyRow;
var objCells;
var parReEx = new RegExp(myCsvParag, 'ig');
var tblbody;
var tblCount = 0;
var tgtTblBodyID;
for (csvRow = 0; csvRow < csvRowLen; csvRow++)
{
if (csvRecordsList[csvRow].startsWith(myTBodySep))
{
if (frag.childElementCount > 0)
{
injectFragInTbody();
}
tgtTblBodyID = csvRecordsList[csvRow].split(myTBodySep)[1];
newEmbtyRow = getNewEmptyRow(tgtTblBodyID);
objCells = newEmbtyRow.cells;
len = newEmbtyRow.querySelectorAll('input')[0].parentNode.cellIndex; // Finds the cell index where is placed the first input (Check-box or button)
tblbody = getElById(tgtTblBodyID);
chkAF = toBool(tblbody.dataset.acceptfiles);
chkACB = toBool(tblbody.dataset.acceptcheckboxes) ;
chkAN = toBool(tblbody.dataset.acceptmultiplerows) ;
tblCount++;
continue;
}
csvRecordsList[csvRow] = csvRecordsList[csvRow].replace(parReEx, myInnerHTMLParag); // Replaces all the paragraph symbols (¶) used in the db.csv file with the <br> tag needed in the HTML content of table cells, so that line breaks can be used inside table cells
csvFieldsList = csvRecordsList[csvRow].split(myEndOfFld);
csvFieldLen = csvFieldsList.length;
for (csvField = 0; csvField < csvFieldLen; csvField++)
{
cel = chkAN ? csvField + 1 : csvField;
if (chkAF && cel === 1) {objCells[cel].innerHTML = makeFileLink(csvFieldsList[csvField]);}
else if (chkACB && cel === len) {objCells[cel].firstChild.checked = toBool(csvFieldsList[csvField]);}
else {objCells[cel].innerHTML = csvFieldsList[csvField];}
}
frag.appendChild(newEmbtyRow.cloneNode(true));
}
injectFragInTbody();
var recNum = getElById(tgtTblBodyID).childElementCount;
customizeHtmlTitle();
return csvRow - tblCount + ' (di cui '+ recNum + ' record di documenti)';
}
More than 90% of records could contain file names that have to be processed by the following makeFileLink function:
function makeFileLink(fname)
{
return ['<a href="', dirDocSan, fname, '" target="', previewWinName, '" title="Apri il file allegato: ', fname, '" >', fname, '</a>'].join('');
}
It aims to decode a record list from a special type of *.db.csv file (a comma-separated-values file in which the commas are replaced by another symbol that I hard-coded into the variable myEndOfFld). This special type of *.db.csv file is created by another function I wrote, and it is just a text file.
The record list to decode and append to the HTML tables is passed to the function through its only parameter, csvRecordsList.
The CSV file holds data coming from several HTML tables.
The tables differ in their number of rows and columns and in the types of data they contain (file names, numbers, strings, dates, checkbox values).
Some tables can hold just one row; others accept multiple rows.
A row of data has the following basic structure:
data field content 1|data field content 2|data field content 3|etc...
Once decoded by my algorithm, a row is rendered correctly into the HTML td elements even if a field contains several paragraphs. The <br> tag is added where needed by this code:
csvRecordsList[csvRow].replace(parReEx, myInnerHTMLParag)
which replaces every occurrence of the character I chose to represent the paragraph symbol, hard-coded in the variable myCsvParag.
It isn't possible to know at programming time the number of records to load into each table, the number of records loaded from the CSV file, the number of fields in each record, or which table fields will contain data and which will be empty: in the same record some fields may contain data while others are empty. Everything has to be discovered at runtime.
In the special CSV file, each table is separated from the next by a row which contains just a string with the following pattern: myTBodySep = tablebodyid where myTBodySep = "targettbodydatatable", which is just a hard-coded string of my choice.
tablebodyid is just a placeholder for a string representing the id of the target table's tbody element into which new records are inserted, for example: tBodyDataCars, tBodyDataAnimals... etc.
So when the first for loop finds in csvRecordsList a string starting with the value of the variable myTBodySep, it gets the tablebodyid from the same row: this becomes the id of the tbody into which the following records are injected.
Each table is archived into the CSV file.
The first for loop scans the CSV record list from the file, and the second for loop prepares what is needed to fill the targeted table with data.
The above code works well but it is a little slow: loading about 300 records from the CSV file into the HTML tables takes a bit more than 2.5 seconds on a computer with 2 GB of RAM and a Pentium core 2 4300 dual-core at 1800 MHz, but if I comment out the line that updates the DOM, the function needs less than 0.1 seconds. So IMHO the bottleneck is the fragment and DOM manipulation part of the code.
My aim and hope is to optimize the speed of the above code without losing functionalities.
Notice that I'm targeting just modern browsers and I don't care about others and non standards-compliant browsers... I feel sorry for them...
Any suggestions?
Thanks in advance.
Edit 16-02.2018
I don't know if it is useful but lastly I've noticed that if data is loaded from browser sessionstorage the load and rendering time is more or less halved. But strangely it is the exact same function that loads data from both file and sessionstorage.
I don't understand why of this different behavior considering that the data is exactly the same and in both cases is passed to a variable handled by the function itself before starting checking performance timing.
Edit 18.02.2018
Number of rows is variable depending on the target table: from 1 to 1000 (could be even more in particular cases)
Number of columns depending on the target table: from 10 to 18-20
In fact, building the table using DOM manipulation is often much slower than a simple innerHTML update of the table element.
If you rewrite your code to prepare an HTML string and assign it to the table's innerHTML, you should see a significant performance boost.
Browsers are optimized to parse the text/html they receive from the server, as that is their main purpose. DOM manipulation via JS is secondary, so it is not as optimized.
I've made a simple benchmark for you.
Let's make a 300x300 table and fill its 90000 cells with 'A'.
There are two functions.
The first one is a simplified variant of your code which uses DOM methods:
var table = document.querySelector('table tbody');
var cells_in_row = 300, rows_total = 300;
var start = performance.now();
fill_table_1();
console.log('using DOM methods: ' + (performance.now() - start).toFixed(2) + 'ms');
table.innerHTML = '<tbody></tbody>';
function fill_table_1() {
var frag = document.createDocumentFragment();
var injectFragInTbody = function() {
table.replaceChild(frag, table.firstElementChild)
}
var getNewEmptyRow = function() {
var row = table.firstElementChild;
if (!row) {
row = table.insertRow(0);
for (var c = 0; c < cells_in_row; c++) row.insertCell(c);
}
return row.cloneNode(true);
}
for (var r = 0; r < rows_total; r++) {
var new_row = getNewEmptyRow();
var cells = new_row.cells;
for (var c = 0; c < cells_in_row; c++) cells[c].innerHTML = 'A';
frag.appendChild(new_row.cloneNode(true));
}
injectFragInTbody();
return false;
}
<table><tbody></tbody></table>
The second one prepares an HTML string and puts it into the table's innerHTML:
var table = document.querySelector('table tbody');
var cells_in_row = 300, rows_total = 300;
var start = performance.now();
fill_table_2();
console.log('setting innerHTML: ' + (performance.now() - start).toFixed(2) + 'ms');
table.innerHTML = '<tbody></tbody>';
function fill_table_2() {// setting innerHTML
var html = '';
for (var r = 0; r < rows_total; r++) {
html += '<tr>';
for (var c = 0; c < cells_in_row; c++) html += '<td>A</td>';
html += '</tr>';
}
table.innerHTML = html;
return false;
}
<table><tbody></tbody></table>
I believe you'll come to some conclusions.
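Applied to the question's record format, the string-building approach could look roughly like the sketch below. It assumes "|" as the field separator (the question keeps its own in myEndOfFld), the function names are invented for illustration, and it skips the checkbox and file-link columns handled by the original csvDecode. Fields are inserted as-is, as in the question, so they may contain <br> tags.

```javascript
// Hypothetical field separator; the question stores its own in myEndOfFld.
const FIELD_SEP = "|";

// Turn one record ("a|b|c") into the markup for one table row.
function recordToRowHtml(record, fieldSep) {
  const cells = record.split(fieldSep).map(function (field) {
    return "<td>" + field + "</td>";
  });
  return "<tr>" + cells.join("") + "</tr>";
}

// Build the markup for all rows in memory so the DOM is touched only once.
function recordsToTbodyHtml(records, fieldSep) {
  return records.map(function (record) {
    return recordToRowHtml(record, fieldSep);
  }).join("");
}

const sampleRecords = ["Fiat|Panda|2012", "Ford|Focus|2015"];
const tbodyHtml = recordsToTbodyHtml(sampleRecords, FIELD_SEP);
console.log(tbodyHtml);

// In a browser, a single assignment then replaces the rows of the target tbody:
// document.getElementById("tBodyDataCars").innerHTML = tbodyHtml;
```

If any field can contain user-supplied text, it would need HTML escaping before being concatenated this way.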
I've got two thoughts for you.
1: If you want to know which parts of your code are (relatively) slow, you can do very simple performance testing by timing individual sections of the code. I didn't read all of the code sample you gave, but you can add those timing checks yourself and see which operations take more time.
2: What I know of JavaScript and the browser is that changing the DOM is an expensive operation, you don't want to change the DOM too many times. What you can do instead is build up a set of changes and then apply all those changes with one DOM change. This may make your code less nice, but that's often the tradeoff you have when you want to have high performance.
Let me know how this works out for you.
You should start by refactoring your code into multiple functions to make it a bit more readable. Make sure that you separate the DOM manipulation functions from the data processing functions. Ideally, create a class and move those variables out of your function, so that you can access them through this.
Then, you should execute each function processing data in a web worker, so you're sure that your UI won't get blocked by the process. You won't be able to access this in a web worker so you will have to limit it to pure "input/output" operations.
You can also use promises instead of homemade callbacks. It makes the code a bit more readable, and honestly easier to debug. You can do some cool stuff like :
this.processThis('hello').then((resultThis) => {
this.processThat(resultThis).then((resultThat) => {
this.displayUI(resultThat);
}, (error) => {
this.errorController.show(error); //processThat error
});
}, (error) => {
this.errorController.show(error); //processThis error
});
Good luck!

Javascript code doesn't load the page

I have a little problem with this JavaScript code: when I add more sites to the list, the page doesn't load. I have to add more than 200 sites.
I'm a noob with JavaScript. Can someone explain what the problem is and what I'm doing wrong?
<script language="JavaScript" type="text/javascript">
var a = new Array(
'notiziepericolose.blogspot.it',
'ilcorrieredellanotte.it',
'ilmattoquotidiano.it',
'ilfattonequotidiano.com',
'rebubblica.altervista.org',
'coriere.net'
);
var aa = a.slice();
aa.sort();
document.write("<ol>");
document.write("<b>");
for (i = 0; i < a.length; i=i+1) {
document.write('<li id="demo'+i+'">'+a[i]+'</li>');
}
document.write("</b>");
document.write("</ol>");
</script>
I guess the first thing is that document.write is very rarely used now, as there are better and more efficient ways of adding things (elements, text, etc.) to the DOM (more on that later). In addition, document.write is not quite like echo or println: while the page is still being parsed its output is appended to the document, but if it is called after the page has finished loading it implicitly opens a new document and wipes out the existing content. If the page stops rendering right after you add entries to the array, it is also worth checking the browser console for a syntax error such as a missing comma or quote, since that prevents the whole script from running.
The second thing is that there are better ways of "labelling" elements than with ids, particularly if there are a lot of them on the page, as there will be in your case. Again, there are now much better ways of targeting elements or catching events than there were ten or fifteen years ago.
So, let's talk about your code.
You can quickly create an array using the [] brackets.
var arr = [
'notiziepericolose.blogspot.it',
'ilcorrieredellanotte.it',
'ilmattoquotidiano.it',
'ilfattonequotidiano.com',
'rebubblica.altervista.org',
'coriere.net'
];
You don't have to create a copy of the array in order to sort it - it can be done in place:
arr.sort();
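A small sketch of the difference between sorting in place and sorting a copy, using a shortened version of the list; the variable names sortedCopy and sameArray are only for illustration.

```javascript
var arr = ['ilmattoquotidiano.it', 'coriere.net', 'rebubblica.altervista.org'];

// slice() makes a shallow copy, so sorting the copy leaves arr untouched.
var sortedCopy = arr.slice().sort();
var firstBefore = arr[0]; // still the original first entry

// sort() reorders the array it is called on and returns that same array.
var sameArray = arr.sort();

console.log(sortedCopy);
console.log(sameArray === arr);
```

Keeping a copy only matters when the original order is needed later; otherwise sorting in place is enough.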
I'm going to keep your loop but show you a different way of concatenating strings together. Some people prefer adding strings together, but I prefer this way, and that's to create an array of the little parts of your string and then join() them together**.
// Set l as the length, and create an output array called list
for (var i = 0, l = arr.length, list = []; i < l; i++) {
// I've changed things here. I've added a class called item
// but also changed the element id to a data-id instead
var li = ['<li class="item" data-id="', i, '">', arr[i], '</li>'];
// Push the joined li array of strings into list
list.push(li.join(''));
}
Assuming you have an element on your page called "main":
HTML
<div id="main"></div>
JS
You can add the list array as an HTML string to main by using the insertAdjacentHTML method:
var main = document.getElementById('main');
// Note that I've given the ordered list an id called list
var html = ['<ol id="list"><b>', list.join(''), '</b></ol>'].join('');
main.insertAdjacentHTML('beforeend', html);
OK, so that's pretty easy. But I bet you're asking how you can target the individual items in the list so that if I click on one of them it alerts what it is (or something).
Instead of adding an event listener to each list item (which we could, but it can become expensive the more items you have), we're going to attach one to the ol element we gave the id "list" and catch events from the items as they bubble up:
var ol = document.getElementById('list');
Then an event listener is added to the list that tells us what function (checkItem) is called when a click event is raised:
ol.addEventListener('click', checkItem);
Our function uses the event (e) to find out what the event's target was (what item was clicked), and alerts its text content.
function checkItem(e) {
alert(e.target.textContent);
}
Hope some of this was of some help.
** Here's another way of sorting, and looping through the array using reduce:
var list = arr.sort().reduce(function (p, c, i) {
return p.concat(['<li class="item" data-id="', i, '">', c, '</li>']);
}, []).join('');
If ES6 is possible for you, you can do it like this:
var a = new Array(
'notiziepericolose.blogspot.it',
'ilcorrieredellanotte.it',
'ilmattoquotidiano.it',
'ilfattonequotidiano.com',
'rebubblica.altervista.org',
'coriere.net');
var aa = a.slice();
var mL = document.getElementById('mylist');
aa.sort().forEach(el => { // forEach, because the callback is used only for its side effects
var li = document.createElement("li");
var b = document.createElement("b");
var t = document.createTextNode(el);
b.appendChild(t);
li.appendChild(b);
mL.appendChild(li);
});
<ol id="mylist"></ol>
If you're using an Array, you can use a forEach instead of a loop.
var domUpdate = '';
var websites = ['notiziepericolose.blogspot.it','ilcorrieredellanotte.it','ilmattoquotidiano.it','ilfattonequotidiano.com','rebubblica.altervista.org','coriere.net'];
websites.forEach(function(website, index){
domUpdate += '<li id="website-' + ( index + 1 ) + '"><b>' + website + '</b></li>';
});
document.getElementById('hook').innerHTML = '<ol>' + domUpdate + '</ol>';
<div id="hook"></div>
I'm thinking document.write is the wrong choice here, as it clears the document when it is called after the page has finished loading. https://developer.mozilla.org/en-US/docs/Web/API/Document/write You probably want to attach the new content to existing HTML through document.getElementById or something like that.

javascript: 2d associative arrays 101

This is my first Q&A ever, so hopefully it's alright.
As someone who generally picks stuff up quickly, I found that the information on this topic was sporadic and generally overcomplicated, with many people saying it simply couldn't be done. So here it is, broken down very simply.
Take this scenario as an example, we have a number of form components (text boxes, buttons, etc.), all with a number of properties, all of which have values... and we want to store these in a javascript array.
Here's my tinkerings. This code doesn't explicitly answer a question, as no question was explicitly asked, however I hope you find it useful
For good measure, here also is a jsfiddle http://jsfiddle.net/cQ8Xc/
var $parent_arr = new Array();
var $child_arr = new Array();
//we can add the key => value pairs like so:
//(obviously they won't be done like this, more likely a loop for example)
//this works just fine $child_arr[$key] = $value;
$child_arr['Top'] = '12';
$child_arr['Left'] = '13';
$child_arr['Right'] = '14';
$child_arr['Bottom'] = '15';
//we can add the array to another array like so:
$parent_arr['component1'] = $child_arr;
//clear the array for reuse (note that it is obviously not necessary to reuse the array)
$child_arr = [];
//refill it
//note that the child arrays don't have to be identical lengths or values
$child_arr['Height'] = '22';
$child_arr['Width'] = '23';
$child_arr['Colour'] = 'blue';
$parent_arr['component2'] = $child_arr;
//we can access the data like this:
alert($parent_arr['component1']['Top']);
alert($parent_arr['component2']['Colour']);
//these didn't work for me (JSON.stringify and join only see an array's numeric indexes, not string keys); you've likely seen them in other answers if you've been researching this topic
//alert(JSON.stringify($parent_arr['component1'], null, 4));
//alert(JSON.stringify($parent_arr['component1']));
//alert($parent_arr['component1'].join("\n"));
//the array can be looped over like so:
for(var component in $parent_arr) {
for(var propertyName in $parent_arr[component]) {
alert(component + '.' + propertyName + '=' + $parent_arr[component][propertyName]);
}
}
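For comparison, here is a sketch of the same data held in plain objects, which is the usual way to get key/value maps in JavaScript. The variable names (components, json, emptyJson, lines) are chosen for this example only. With objects, JSON.stringify includes the keys, whereas an array with string keys serializes as an empty array.

```javascript
// Plain objects map string keys to values; nesting them gives the "2D" shape.
var components = {};
components.component1 = { Top: '12', Left: '13', Right: '14', Bottom: '15' };
components.component2 = { Height: '22', Width: '23', Colour: 'blue' };

// An object serializes together with its keys.
var json = JSON.stringify(components.component1);
console.log(json);

// An array that only has string keys serializes as an empty array.
var keyedArray = [];
keyedArray['Top'] = '12';
var emptyJson = JSON.stringify(keyedArray);
console.log(emptyJson);

// Object.keys lists only the object's own keys, in insertion order here.
var lines = [];
Object.keys(components).forEach(function (component) {
  Object.keys(components[component]).forEach(function (propertyName) {
    lines.push(component + '.' + propertyName + '=' +
               components[component][propertyName]);
  });
});
console.log(lines.join('\n'));
```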

Javascript and HTML data model and presentation model design question

So I've been working on a project in JavaScript that takes in objects the user provides and represents them in HTML. Right now they are represented in memory as an array, and in the display as a separate array. After integrating some code changes, problems have arisen in that the display array seems to have trouble removing its contents, so things that should be removed don't disappear from the view.
Declaring lists:
this.divList = gDocument.getElementById( element );
this.objectList = [];
Adding an object to the lists:
addObject = function (address, type){
var newDiv = gDocument.createElement("div");
this.divList.appendChild( newDiv );
var d = this.createObject( newDiv, address, type );
if (undefined != d)
{
this.objectList.push(d);
}
}
The divList accurately reflects the objectList until any changes are made to the objectList at runtime. When restarted, the lists are in sync once again. When I tried to fix it, things were very complicated. I'm wondering if there is a better way to design such an idea (the object model and the graphical representation). Any comments would be helpful, thanks.
Question vagueness aside, my recommendation would be to store one list, not two, in memory. Each list element is an object with all the necessary data you need for that particular abstract "object" (the ones that "the user provides"). Something like this:
this.divList = gDocument.getElementById(element);
this.masterList = [];
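var i,
children = this.divList.children, // an element has no length of its own; its child elements do
len = children.length;
for (i = 0; i<len; i++)
{
this.masterList.push({
elt: children[i],
obj: null /* replace with however you'd create the object in this.objectList */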
});
}
Edit: your addObject function would be changed to something like this:
addObject = function (address, type)
{
var newDiv = gDocument.createElement("div"),
newObj = {elt: newDiv,
obj: this.createObject(newDiv, address, type)};
this.masterList.push(newObj);
this.divList.appendChild(newDiv);
}
You should store a reference to the HTML element that you're appendChild()ing to. You're already doing this - but when you need to manipulate the individual elements (say, remove one), use the masterList instead:
removeObject = function (i)
{
var removed = this.masterList.splice(i, 1); // splice returns an array of the removed entries
if (removed.length)
{
this.divList.removeChild(removed[0].elt);
}
}
See also Array.splice().
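A short sketch of what splice returns, using strings as stand-ins for the div elements so that it runs without a DOM; the sample entries are made up.

```javascript
// Strings stand in for the DOM elements kept in elt.
var masterList = [
  { elt: 'div-a', obj: { address: 1 } },
  { elt: 'div-b', obj: { address: 2 } },
  { elt: 'div-c', obj: { address: 3 } }
];

// splice(start, count) removes entries in place and returns them in an array.
var removed = masterList.splice(1, 1);
console.log(removed.length, removed[0].elt);
console.log(masterList.length);

// An out-of-range index removes nothing and yields an empty array, which is
// still truthy, so checking removed.length is safer than checking the array.
var nothing = masterList.splice(10, 1);
console.log(nothing.length);
```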
