Iterate through an array generated from a Google Sheet | Google Apps Script - javascript

I'm new to Google Apps Script and I'm learning at work. What I am trying to do is write a script that, based on a Google Sheet, checks whether the subject of an email contains the string present in one of the rows and, if so, saves the attachment to Google Drive. The ID of the destination folder is also on the same Google Sheet.
So the structure would be:
Column A | Column B | Column C
x        | EXE2928  | folderID29382938
x        | EXE823   | folderID29383994
x        | EX3948   | folderID55154988
I already have a script that saves the attachments, and it works. If I read the information from the Google Sheet one row at a time, I can send each attachment to the correct folder. But this is not optimal, since there are a lot of rows.
What I tried so far was the following code to get the rows as an array and then iterate over it:
var dataspreadsheet = SpreadsheetApp.openById("SpreadsheetID");
var sheet = dataspreadsheet.getSheetByName("Sheet1");
// Just looking at column B
var data = sheet.getRange(1, 2, 5).getValues();
var header = data[0];
data.shift();
// ...
for (var l = 0; l < data.length; l++) {
    if (filename.indexOf(data[l][0]) !== -1) {
        // Here I still need to get the folder ID from the same Google Sheet
        folderName = folder1;
    } else {
        folderName = folder2;
    }
}
Could you give me some support or ideas on how to go on with this script?
Thank you in advance!

This answer assumes that you are using GmailApp to retrieve the emails. Also, it can only be considered a bit of a guideline, since I have not spent the time to test and measure the code.
To keep in mind
Tooling
To optimize code, the first thing you need is tooling.
The most important tool is one that measures the time a piece of code takes to run. You can do so with console.time and console.timeEnd (see the console timers reference). These methods measure the time elapsed between the two calls.
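For example, a timed section could look like this (the label name is just illustrative):
console.time('matchSubjects');    // start a named timer
// ... the code being measured, e.g. the loop over the sheet data ...
console.timeEnd('matchSubjects'); // logs the elapsed time for that label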
For parts of the code that don't require Apps Script classes or methods, you can test them locally and measure their performance with any other tool. Just be aware that the results may not translate perfectly to Google Apps Script.
Know what you want to achieve
What is fast enough? This is something that you need to know before starting. It is usually determined by the reason you are optimizing in the first place.
Best practices
Read the official Apps Script best practices guide. It has a lot of advice that almost always holds true.
Know your system
A lot of times there are constraints on the system that you haven't even considered.
For example: if the string to search is always at the start of the subject, you may be able to make a more specific code that performs better.
Another example: do these threads contain only a single email, or multiple ones? This can change the code a lot.
Measure everything
Don't assume. Measure. Sometimes things that seem like they should be slower are faster. This is more true the more you optimize. It's highly recommended to get a baseline of the current time and work from there.
Test the simple first
Don't get carried away trying complex code. Sometimes simple code is faster. Sometimes it's not faster, but it's fast enough.
Weird can be good
Try to think outside the box. Sometimes weird solutions are the fastest. This may reduce readability.
An example would be to generate a regular expression with all the values and use it to detect whether the subject contains one of them, and which one. This could be slower or faster, but it's worth trying.
const r = /(EXE2928|EXE823|EX3948)/ // generate this dynamically
const m = subject.match(r)
if (m != null) {
    const key = m[1] // key is the value included in the subject
}
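A rough sketch of how that expression could be built at runtime (the values are the column B strings read from the sheet; note they would need escaping if they ever contained regex special characters):
const values = ['EXE2928', 'EXE823', 'EX3948'];     // read these from column B of the sheet
const r = new RegExp('(' + values.join('|') + ')'); // same pattern as above, built dynamically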
Some ideas
Get the minimum data and only once
You only need the mapping of columns B (the text to find) and C (the folder to save to) to do what you need. Simply get that data. You will need to iterate anyway, so there's no need to transform the data. Also skip the headers.
var mapping = sheet.getRange("B2:C").getValues()
Also try to limit the number of Gmail threads that you read.
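As a rough, untested sketch of how that mapping could then be used (subject and attachment are assumed to come from your existing attachment-saving script; column B is the text to find, column C the folder ID):
for (var i = 0; i < mapping.length; i++) {
    var textToFind = mapping[i][0];
    var folderId = mapping[i][1];
    if (textToFind && subject.indexOf(textToFind) !== -1) {
        DriveApp.getFolderById(folderId).createFile(attachment); // a GmailAttachment can be passed as a Blob
        break; // stop at the first match
    }
}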
Organize the emails
I'd try transforming the emails into a more easily digestible data structure to iterate over.
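For example, something along these lines (untested; the search query is just a placeholder):
var messages = GmailApp.search('has:attachment newer_than:1d') // placeholder query
    .map(function (thread) { return thread.getMessages(); })
    .reduce(function (all, msgs) { return all.concat(msgs); }, [])
    .map(function (msg) {
        return { subject: msg.getSubject(), attachments: msg.getAttachments() };
    });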
Change when it's executed
I don't know when this code is executed, but changing when it runs could change the execution time.
Changing platform
Google Apps Script may not be the best platform for this. Calling the APIs directly from a client library (there is a Python one) may be better.
References
console (MDN)
Best practices (Google Apps Script guides)

Related

How do I scrape the dynamic URL of a page?

I am trying to do some website testing through Selenium and Python. I filled in the form on http://www.flightcentre.co.nz/ and submitted it. Now the search results take me to a new page with the URL https://secure.flightcentre.co.nz/eyWD/results . How will my web driver handle this? I am doing this for the first time. Could anyone help by providing an example or pointing me to the right tutorial for this sort of thing?
Thanks.
OK, since I tried to answer your other question, I'll give this one a go as well, although you are not exactly explaining what you want.
One thing to remember is that Selenium is driving your browser rather than acting as a traditional web scraper. That means a URL change is not a big deal; the only time you have to change how you approach scraping in Selenium is when you get a popup.
One thing you can do from your other code is, when looking for a flight, call
driver.implicitly_wait(40)  # 40 is the number of seconds
This makes the driver wait up to 40 seconds for elements to appear before raising an exception, so your next step runs once the page has finished loading, or once whatever you want to interact with is present in the DOM.
Now if you are trying to scrape all of the flight data that comes up, that'll be fairly tricky. You could do a for loop, grab every matching element on the page, and write it to a CSV file.
import csv

class_for_departure_flight = driver.find_elements_by_xpath('//div[@class="iata"]')

for flights in class_for_departure_flight:
    try:
        with open('my_flights.csv', 'a', newline='') as flights_book:
            csv_writer = csv.writer(flights_book, delimiter=',')
            csv_writer.writerow([flights.text])
    except:
        print("Missed a flight")
One thing to notice in this second part is that I am using the csv library in Python to write rows of data. Note that you can also group several pieces of data together and write them as one row, like:
data = (flights, dates, times)
csv_writer.writerow(data)
and it will write all of those different things on the same row in the spreadsheet.
The other two big things that are easily missed are:
class_for_departure_flight = driver.find_elements_by_xpath('//div[@class="iata"]')
That is driver.find_elements_by_xpath; notice that elements is plural, which means it looks for every element matching the XPath and stores them in a list so you can iterate over them in a for loop.
The next part is csv_writer.writerow([flights.text]): when you iterate over your flights, you need to grab the text, and you do that with flights.text. If you were doing this with a single lookup instead, you could do something like this as well:
class_for_departure_flight = driver.find_element_by_xpath('//div[@class="iata"]').text
Hopefully this helps!
This is a good place to start: http://selenium-python.readthedocs.org/getting-started.html
Here are some things about Selenium I've learned the hard way:
1) When the DOM refreshes, you lose your references to page objects (i.e. after something like element = driver.find_element_by_id("passwd-id"), element can become stale)
2) Test shallow; each test case should do only one assert/validation of page state, maybe two. This enables you to take screen shots when there's a failure and save you from dealing with "is it a failure in the test, or the app?"
3) It's a big race condition between any JavaScript on the page, and Selenium. Use Explicit Waits to block Selenium while the JavaScript works to refresh the DOM.
To be clear, this is my experience with using Selenium; thus not universal to everyone's experience.
Good luck! Hope this is helpful.

Parse JavaScript and keep track of all variables and their values

I was watching Bret Victor's talk "Inventing on Principle" the other night and decided to try and build the real time JavaScript editor he demoed. You can see it in action at 18:05 when he implements binary search.
It doesn't look like he ever released such an editor, but regardless, I thought I could learn a lot building one like it.
Here's what I have so far
What it can currently do:
Keep track of variables and their values (if assigned as literals)
Print them on the same line on the right
Show parsing errors
I'm using Electron and Angular to build the app, so it's a desktop app for OSX, but written in JavaScript and HTML.
For parsing, I'm using Acorn. So far it's a fantastic parser, but it's really hard to actually run the code after it's been parsed. Permitting only literal assignments such as var x = 1 is doable, but things get really complex fast once you try something as simple as var x = 1 + 2, due to how Acorn structures the parsed result.
I don't want to just eval the whole thing, since it could be dangerous and there are probably better ways to do it.
Ideally, I could find a safe way to evaluate the code on the left and keep track of all the variables somehow. Unfortunately, my research indicates that there is no access to private variables in JavaScript, so I'm hoping I can count on fellow developers' ingenuity to help me with this. Any hints on how to do this better/easier than with Acorn would be greatly appreciated.
If you need it, my code base is here: https://github.com/dchacke/nasherai
Try sandbox for safe evaluation of strings.
var Sandbox = require("sandbox"); // the "sandbox" npm module
var s = new Sandbox();
s.run('1 + 1 + " apples"', function(output) {
    // output.result == "2 apples"
});
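For the variable-tracking part of the question (literal assignments only), a minimal, untested sketch on top of Acorn might look like this:
var acorn = require('acorn');

function literalAssignments(source) {
    var ast = acorn.parse(source, { ecmaVersion: 'latest' });
    var values = {};
    ast.body.forEach(function (node) {
        if (node.type !== 'VariableDeclaration') return;
        node.declarations.forEach(function (decl) {
            if (decl.init && decl.init.type === 'Literal') {
                values[decl.id.name] = decl.init.value;
            }
        });
    });
    return values;
}

// literalAssignments('var x = 1; var y = "hi";')  ->  { x: 1, y: 'hi' }
It only walks top-level declarations, but the same idea extends to nested scopes by recursing into function bodies.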

How do you send and receive the same image between the client and the server?

I'm trying to implement a bandwidth test, and it looks like the most conventional way to do this is basically to transmit one or more images back and forth between a client and a server and see what the upload and download times are. In fact, this answer already covers the idea of getting a download time.
At this point though, I'm not too sure how to make the process go both ways, with or without using the other answer. Even after adding debugging statements, I haven't found where the picture's data is stored in the other answer's code. And if I try to start off with a clean slate, a lot of the API information I'm finding on the Internet / Stack Overflow about sending images back and forth has very little explanation or elaboration, so I'm not able to use it to put the two together. Furthermore, some experiments I have put together, some involving other platforms, seemed to really throttle bandwidth usage, as well as scale the delay improperly with the images' sizes.
Using JavaScript, how do you transmit the same image back and forth between the client and server in such a way that you can accurately use the delay and the image's size to measure bandwidth? How do you make this work both ways with the same image, without throttling, and without interaction from the user?
EDIT
I could try posting things I've tried, but it's hard for it to be meaningful. A lot of it was in Flash. When I started using JavaScript, I began to experiment a little along these lines:
$.get('http://ip address/test.php?filename=XXX&data=LONG STRING OF DATA REPRESENTING THE DATA TO BE SAVED PROBABLY LESS THAN 2K IN SIZE AND ALSO YOU SHOULD ESCAPE OR AT LEAST URIENCODE IT', function(data) {
    alert('here');
    eval(data);
});
The PHP file being:
<?php
echo "response=here";
?>
And I used the PHP file both for Flash and for JavaScript. I also used Adobe Media Server with Flash. But going from a 1MB file to a 32MB file while using Flash/PHP, Flash would only scale the delay by 10 times, nowhere near 32. It also seemed to throttle bandwidth usage at least when paired with the AMS, and maybe even when it was paired with the PHP file.
I was about to convert the JavaScript code to pass the actual image into the PHP file... but I can't get to it. Even when I do things like:
for (var s in download) {
    alert(s + ": " + download[s]);
}
download being the object that downloaded the image in the JavaScript (see the linked answer for the code), I'm not seeing anything useful. download.children.length is 0 and so on. I'm also reluctant to trust that the results aren't throttling bandwidth usage, like the Flash experiments did, without further confirmation; maybe the image has to be passed in using one type of API call or another to get it to really work right?
In essence, I'm really looking for good API information. Other stuff I saw just wasn't elaborate enough to connect the dots with.
2ND EDIT
One of the pitfalls I've run into is using POST to download the images. I'm running into a lot of difficulty in getting IIS 7 to allow POST to download arbitrary file types (namely jpgs) with "unusual" binary characters and allow them to be more than 2MB in size.
Why don't you send some text using $.post?
E.g:
Generate some big text:
var someText = '0123456789';
for (var i = 0; i < 10000; i++) {
    someText += '0123456789'; // builds roughly 100 KB of text
}
then post it to the server:
var start = new Date();
$.post('yourUrl', { data: someText }, function() {
    var end = new Date();
    // this will show the bytes-per-second upload bandwidth
    alert(someText.length / ((end - start) / 1000));
});
To make the result more reliable, you can run this, for example, 10 times and take the average value.
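For the download direction, a similar untested sketch could fetch a file of known size (the path and size here are placeholders) with a cache-busting parameter:
var fileSizeBytes = 1024 * 1024; // size of the test file, known in advance
var start = new Date();
$.get('/testfile.bin?nocache=' + start.getTime(), function() {
    var seconds = (new Date() - start) / 1000;
    // bytes-per-second download bandwidth
    alert(fileSizeBytes / seconds);
});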

Run Database Stored RegEx against DOM

I have a question about how to approach a certain scenario before I get halfway through it and figure out it was not the best option.
I work for a large company that has a team which creates tools for teammates to use that aren’t official enterprise tools. We have no direct access to the database, just access to an internal server to store the files we run, and we can access the main site with JavaScript etc. (same domain).
What I am working on is a tool with a ton of options that let you select what I will call “data points” on a page.
There are things like “Account status, Balance, Name, Phone number, email, etc.”, and the tool saves those to an Excel sheet.
So you input account numbers, choose what you need, and then, using IE objects, it navigates to the page and scrapes the data you request.
My question is as follows.
I want to make the scraping part pretty dynamic in the way it works. I want to be able to add new data points on the fly.
My goal or idea is to store, in the table with the “data point” option, the regular expression needed to get that specific piece of data.
If I choose “Name”, it looks up the expression for name in the database and runs it against the DOM.
What would be the best way to create that type of function in JavaScript / jQuery?
I need to pass a regex to a function, have it run against the DOM, and then return the result.
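Roughly, I'm imagining something like this (untested sketch; the pattern would come from the database):
function runDataPointRegex(pattern, flags) {
    var re = new RegExp(pattern, flags || '');
    var match = re.exec(document.body.innerHTML); // or a narrower container's text
    return match ? (match[1] || match[0]) : null;
}

// e.g. runDataPointRegex('Account status:\\s*(\\w+)');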
I have a feeling that there will be things that require more than 1 step to get the information etc.
I am just trying to think of the best way to approach it without having to hardcode 200+ expressions into the file as the page may get updated and need to be changed.
Any ideas?
IRobotSoft scraper may be the tool you are looking for. Check this forum and see if questions are similar to what you are doing: http://irobotsoft.org/bb/YaBB.pl?board=newcomer. It is free.
What it uses is not regular expressions but a language called HTQL, which may be more suitable for extracting data from web pages. It also supports regular expressions, but not as the main language.
It organizes all your actions well with a visual interface, so you can dynamically compose actions or tasks for changing needs.

JavaScript and performance: use jQuery or plain JavaScript?

I have a lot of data that should be shown in a table.
I use JavaScript to fill the table instead of printing it in the HTML.
Here is a sample of the data I use:
var aUsersData = [[1, "John Smith", "...."],[...],.......];
The problem is that Firefox warns me that "there is a heavy script running; should I continue or stop it?"
I don't want my visitors to see that warning. How can I make the performance better? jQuery? Pure JavaScript? Or another library you'd suggest?
You can use the method described here to show a progress bar and keep the browser from locking up on you:
http://www.kryogenix.org/days/2009/07/03/not-blocking-the-ui-in-tight-javascript-loops
I am using almost that method on this page:
http://www.bacontea.com/bb/
to keep the browser from hanging and to show feedback while loading.
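One way to apply that idea is to process the rows in small chunks and yield back to the browser between chunks; here is a rough sketch (the helper names are made up):
function processInChunks(rows, chunkSize, handleRow, done) {
    var index = 0;
    (function nextChunk() {
        var end = Math.min(index + chunkSize, rows.length);
        for (; index < end; index++) {
            handleRow(rows[index], index);
        }
        if (index < rows.length) {
            setTimeout(nextChunk, 0); // give the browser a chance to repaint
        } else if (done) {
            done();
        }
    })();
}

// e.g. processInChunks(aUsersData, 100, appendRowToTable, hideProgressBar);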
jQuery doesn't usually make things faster, just easier. I use jQuery to populate tables, but they're pretty small (at most 2 columns by 40 rows). How much data are you populating into the table? This could be the limiting factor.
If you post some of your table-populating code we can see if it's possible to improve performance in any way.
My suspicion is that it won't make much difference either way, although sometimes adding a layer of abstraction like jQuery can impact performance. Alternately, the jQuery team may have found an obscure, really fast way of doing something that you would have done in a more obvious, but slower, way if you weren't using it. It all depends.
Two suggestions that apply regardless:
First, since you're already relying on your users having JavaScript enabled, I'd use paging and possibly filtering as well. My suspicion is that it's building the table that takes the time. Users don't like to scroll through really long tables anyway; adding some paging and filtering features so they only see the entries from the array they really want may help quite a lot.
Second, when building the table, the way you do it can have a big impact on performance. It may seem a bit counter-intuitive, but with most browsers, building up a big string and then setting the innerHTML property of a container is usually faster than using the DOM createElement function over and over to create each row and cell. The fastest overall tends to be to push strings onto an array (rather than repeatedly concatenating) and then join the array:
var markup, rowString, index, length;
markup = [];
markup.push("<table><tbody>");
for (index = 0, length = array.length; index < length; ++index) {
    rowString = /* ...code here to build a row as a string... */;
    markup.push(rowString);
}
markup.push("</tbody></table>");
document.getElementById('containerId').innerHTML = markup.join("");
(That's raw JavaScript/DOM; if you're already using jQuery and prefer, that last line can be rewritten $('#containerId').html(markup.join(""));)
This is faster than using createElement all over the place because it allows the browser to process the HTML by directly manipulating its internal structures, rather than responding to the DOM API methods layered on top of them. And the join thing helps because the strings are not constantly being reallocated, and the JavaScript engine can optimize the join operation at the end.
Naturally, if you can use a pre-built, pre-tested, and pre-optimised grid, you may want to use that instead -- as it can provide the paging, filters, and fast-building required if it's a good one.
You can try a JS templating engine like PURE (I'm the main contributor)
It is very fast on all browsers, and keeps the HTML totally clean and separated from the JS logic.
If you prefer the <%...%> type of syntax, there are plenty of other JS template engines available.
