Applescript with do Javascript and passed Applescript Variable - javascript

I have written a script that automates the creation of products on a website I administer. In my process I upload JPEG images of the products and pull the keywords that are tagged in the JPEG to add to the product information. To do this I use AppleScript to activate Safari and run a line of JavaScript. That line of code includes a variable whose value comes from an AppleScript shell script call.
Code below
tell application "Finder"
set sourceFolder to folder POSIX file "/Users/<username>/Desktop/Upload/Temp/HighRes/"
set theFiles to files of sourceFolder
set inputPath to "/Users/<username>/Desktop/Upload/Temp/"
end tell
repeat with afile in theFiles
set filename to name of afile
set fname to text 1 thru ((offset of "." in filename) - 1) of filename
--INPUT CODE TO BE LOOPED OVER BELOW--
--Add Image Keywords from Metadata--
try
set pathVAR1 to "/Users/<username>/Desktop/Upload/Temp/HighRes/"
set pathVAR2 to pathVAR1 & filename
set myvar to do shell script "mdls -name kMDItemKeywords " & quoted form of pathVAR2
set var1 to ((offset of "(" in myvar) + 1)
set var2 to ((length of myvar) - 1)
set myKeywords to ((characters var1 thru var2 of myvar) as string)
--Inputs the Keywords from the Image Metadata--
tell application "Safari"
activate
do JavaScript "document.getElementById('ctl00_cphMainContent_txtKeyWords').value = \"" & myKeywords & "\";" in current tab of window 1
end tell
end try
--END OF CODE TO BE LOOPED OVER--
end repeat
==End Code==
Problem:
The code below is not passing the variable myKeywords to Safari, but if I run a dialog it will appear in the dialog.
do JavaScript "document.getElementById('ctl00_cphMainContent_txtKeyWords').value = \"" & myKeywords & "\";" in current tab of window 1

I don't have a specific solution that will definitely solve your problem, but I do have a number of observations about your script with recommendations on how it can be changed to improve its speed, robustness and adherence to principles of best practice.
Get rid of that try block. You have no idea what's happening in your script when things go wrong if you're masking the errors with unnecessary error-catching. The only line that needs to be enclosed in try...end try is do shell script, but only put it in once you know your code is working. In general, try blocks should only be used:
when your script has the potential to throw an error that is entirely predictable and explainable, and you understand the reasons why and under what conditions the error occurs, allowing you to implement an effective error-handling method;
around the fewest possible number of lines of code within which the error arises, leaving all lines of code whose existence doesn't depend on the result of the error-prone statement(s);
after your script has been written, tested, and debugged, where placing the try block(s) no longer serves to force a script to continue executing in the wake of an inconvenient error of unknown origin, but has a clear and well-defined function to perform in harmony with your code, and not against it.
As a general rule in AppleScript, don't use Finder to perform file system operations if you can avoid it: it's slow, and blocks while it's performing the operations, meaning you can't interact with the GUI during this time. Use System Events instead. It's a faceless application that won't stop other things operating when it's performing a task; it's fast, in the context of AppleScript and Finder in particular, and isn't prone to timing out quite so much as Finder does; it handles posix paths natively (including expansion of tildes), without any coercion necessary using POSIX file; it returns alias objects, which are the universal class of file object that every other scriptable application understands.
There are a couple of instances where Finder is still necessary. System Events cannot reveal a file; nor can it get you the currently selected files in Finder. But it's simple enough to have Finder retrieve the selection as an alias list, then switch to System Events to do the actual file handling on this list.
This is curious:
set filename to name of afile
set fname to text 1 thru ((offset of "." in filename) - 1) of filename
Am I right in thinking that fname is intending to hold just the base file name portion of the file name, and this operation is designed to strip off the extension ? It's a pretty good first attempt, and well done for using text here to itemise the components of the string rather than characters. But, it would, of course, end up chopping off a lot more than just the file extension if the file name had more than one "." in it, which isn't uncommon.
One way to safely trim the extension off the end of the file name is to use text item delimiters:
set filename to the name of afile
set fname to the filename
set my text item delimiters to "."
if "." is in the filename then set fname to text items 1 thru -2 of the filename as text
You should then be mindful of resetting the text item delimiters afterwards, or there'll be consequences later on when you try to concatenate strings together.
Another way of chopping off the extension without utilising text item delimiters is string scanning, which is where you iterate through the characters of a string, performing operations or tests as you go, until you reach the desired outcome. It's speedier than it sounds and a powerful technique for very complex string searching and manipulations:
set filename to the name of afile
set fname to the filename
repeat while the last character of fname ≠ "."
set fname to text 1 thru -2 of fname
end repeat
set fname to text 1 thru -2 of fname
You could also retrieve the name extension property of the file, get its length, and remove (1 + that) many characters from the end of the file's name. There are myriad ways to achieve the same outcome.
This is wrong in this particular instance:
set myKeywords to ((characters var1 thru var2 of myvar) as string)
characters produces a list, which you then have to concatenate back into a string, and this is unsafe if you aren't sure what the text item delimiters are set to. As you haven't made a reference to them in your script, they should be set to an empty string, so joining the characters back into words would produce the expected result. However, this could easily not be the case if, say, you used the first technique for removing the file extension and neglected to set the text item delimiters back: the resulting string would then have a period between every single letter.
As a policy in AppleScript (which you can personally choose to adhere to or ignore), it's considered by some as poor form if you perform list to string coercion operations without first setting the text item delimiters to a definitive value.
But you needn't do so here, because rather than using characters, use text:
set myKeywords to text var1 thru var2 of myvar
You're performing a shell command that looks like this: mdls -name kMDItemKeywords <file>, and then the two lines of AppleScript that follow awkwardly try and trim off the leading and trailing parentheses around the text representation of a bash array. Instead, you can turn on the -raw flag for mdls, which simplifies the output by stripping off the name of the key for you. This then places the parentheses as the very first and very last characters; however, since there's a load of dead whitespace in the output as well, you might as well get bash to perform all the clean up for you:
mdls -raw -name kMDItemKeywords <file> | grep -E -io '[^()",[:blank:]]+'
This disregards parentheses, double quotes, commas, and whitespace, so all you get back is a list of keywords, one per line, and without any extra baggage. If you needed to itemise them, you can set a variable to the paragraphs of the output from the do shell script command, which splits the text into lines placing each keyword into a list. But it seems here that you need text and don't mind it being multilinear.
When I started to write this answer, I didn't have an inkling as to what was causing the specific issue that brought you here. Having gone through the details of how mdls formats its output, I now see that the issue is that the myKeywords string will contain a bunch of double quotes, and you've surrounded the myKeywords value in your JavaScript expression with double quotes as well. All of these quotes are escaped once, in the AppleScript environment, but not in the JavaScript environment, which results in each neighbouring double quote acting as an open-close pair. I ran a similar command in bash to obtain an array of values (kMDItemContentTypeTree), and then processed the text in the way AppleScript does, before opening the JavaScript console in my browser and pasting it in to illustrate what's going on.
In the console's highlighting, anything in red is contained inside a string; everything else is therefore taken as a JavaScript identifier or object (or it would be if the messed-up quotes didn't also mess up the syntax and result in an unterminated string that's still expecting one last quote to pair with).
I think the solution is to use a continuation character "\" for backward compatibility with older browsers: so you would need to have each line (except the last one) appended with a backslash, and you need to change the pair of double quotes surrounding the myKeywords value in your JavaScript expression to a pair of single quotes. In newer browsers, you can forgo the headache of appending continuation marks to each line and instead replace the pair of outside double quotes with a pair of backticks (`) instead:
❌'This line throws
an EOF error in
JavaScript';
✅'This line is \
processed successfully \
in JavaScript';
✅`This line is also
processed successfully
in JavaScript`;
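As a complement to the quoting options above, here is a minimal JavaScript sketch of what a fully escaped statement looks like. It assumes the raw keyword text is available to JavaScript before the statement is built; in the AppleScript workflow the same escaping would have to be reproduced before calling do JavaScript. The helper name buildAssignment and the sample keyword text are hypothetical, not part of the original script.

```javascript
// Hypothetical helper: builds the statement that would be handed to `do JavaScript`.
function buildAssignment(elementId, value) {
  // JSON.stringify produces a double-quoted literal in which quotes,
  // backslashes and newlines are already escaped.
  // U+2028 / U+2029 are escaped too, because older engines treat them as line breaks.
  const literal = JSON.stringify(value)
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
  return `document.getElementById(${JSON.stringify(elementId)}).value = ${literal};`;
}

// Sample text resembling the raw mdls output quoted in this thread (assumed shape).
const rawKeywords = 'Heart,\n"Heart Tree",\n"all occasion"';
const statement = buildAssignment('ctl00_cphMainContent_txtKeyWords', rawKeywords);
console.log(statement);
```

Because the value is emitted as a single escaped literal, embedded double quotes and line breaks no longer terminate the string early.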

I had tried the backticks ( ` ) suggested by CJK but that did not work for me. The main issue being raised was that the kMDItemKeywords returned escaped characters.
Heart,
Studio,
Red,
\"RF126-10.tif\",
Tree,
\"Heart Tree\",
occasion,
Farm,
birds,
\"Red Farm Studio\",
\"all occasion\",
all
I was able to get rid of the escaped characters using the following:
NEW CODE
set myKeywords to do shell script "echo " & quoted form of myKeywords & " | tr -d '[:cntrl:]'| tr '[:upper:]' '[:lower:]' | tr -d '\"'"
UPDATED CODE FOR JAVASCRIPT
--Inputs the Keywords from the Image Metadata--
tell application "Safari"
activate
do JavaScript "document.getElementById('ctl00_cphMainContent_txtKeyWords').value = '" & myKeywords & "';" in current tab of window 1
end tell
RESULT
--> " heart, studio, red, rf126-10.tif, tree, heart tree, occasion, farm, birds, red farm studio, all occasion, all"


[Nearley]: how to parse matching opening and closing tag

I'm trying to parse a very simple language with nearley: you can put a string between matching opening and closing tags, and you can chain some tags. It looks like a kind of XML, but with [ instead of <, with tags always 2 chars long, and without nesting.
[aa]My text[/aa][ab]Another Text[/ab]
But I can't seem to parse this correctly: as soon as I have more than one tag, I get an error saying that the grammar is ambiguous.
The grammar that I have right now:
@builtin "string.ne"
@builtin "whitespace.ne"
openAndCloseTag[X] -> "[" $X "]" string "[/" $X "]"
languages -> openAndCloseTag[[a-zA-Z] [a-zA-Z]] (_ openAndCloseTag[[a-zA-Z] [a-zA-Z]]):*
string -> sstrchar:* {% (d) => d[0].join("") %}
Relatedly, I would ideally like the tags to be case-insensitive (e.g. [bc]TESt[/BC] would be valid).
Does anyone have any idea how we can do that? I wasn't able to find a nearley XML parser example.
Your language is almost too simple to need a parser generator. And at the same time, it is not context-free, which makes it difficult to use a parser generator. So it is quite possible that the Nearley parser is not the best tool for you, although it is probably possible to make it work with a bit of hackery.
First things first. You have not actually provided an unambiguous definition of your language, which is why your parser reports an ambiguity. To see the ambiguity, consider the input
[aa]My text[/ab][ab]Another Text[/aa]
That's very similar to your test input; all I did was swap a pair of letters. Now, here's the question: Is that a valid input consisting of a single aa tag? Or is it a syntax error? (That's a serious question. Some definitions of tagging systems like this consider a tag to only be closed by a matching close tag, so that things which look like different tags are considered to be plain text. Such systems would accept the input as a single tagged value.)
The problem is that you define string as sstrchar:*, and if we look at the definition of sstrchar in string.ne, we see (leaving out the postprocessing actions, which are irrelevant):
sstrchar -> [^\\'\n]
| "\\" strescape
| "\\'"
Now, the first possibility is "any character other than a backslash, a single quote or a newline", and it's easy to see that all of the characters in [/ab] are in sstrchar. (It's not clear to me why you chose sstrchar; single quotes don't appear to be special in your language. Or perhaps you just didn't mention their significance.) So a string could extend up to the end of the input. Of course, the syntax requires a closing tag, and the Nearley parser is determined to find a match if there is one. But, in fact, there are two of them. So the parser declares an ambiguity, since it doesn't have any criterion to choose between the two close tags.
And here's where we come up against the issue that your language is not context-free. (Actually, it is context-free in some technical sense, because there are "only" 676 two-letter case-insensitive tags, and it would theoretically be possible to list all 676 possibilities. But I'm guessing you don't want to do that.)
A context-free grammar cannot express a language that insists that two non-terminals expand to the same string. That's the very definition of context-free: if one non-terminal can only match the same input as a previous non-terminal, then the second non-terminal's match is dependent on the context, specifically on the match produced by the first non-terminal. In a context-free grammar, a non-terminal expands to the same thing, regardless of the rest of the text. The context in which the non-terminal appears is not allowed to influence the expansion.
Now, you quite possibly expected that your macro definition:
openAndCloseTag[X] -> "[" $X "]" string "[/" $X "]"
is expressing a context-sensitive match by repeating the $X macro parameter. But it is not by accident that the Nearley documentation describes this construct as a macro. X here refers exactly to the string used in the macro invocation. So when you say:
openAndCloseTag[[a-zA-Z] [a-zA-Z]]
Nearley macro-expands that to
"[" [a-zA-Z] [a-zA-Z] "]" string "[/" [a-zA-Z] [a-zA-Z] "]"
and that's what it will use as the grammar production. Observe that the two $X macro parameters were expanded to the same argument, but that doesn't mean that they will match the same input text. Each of those subpatterns will independently match any two alphabetic characters. Context-freely.
As I alluded to earlier, you could use this macro to write out the 676 possible tag patterns:
tag -> openAndCloseTag["aa"i]
| openAndCloseTag["ab"i]
| openAndCloseTag["ac"i]
| ...
| openAndCloseTag["zz"i]
If you did that (and you managed to correctly list all of the possibilities) then the parser would not complain about ambiguity as long as you never use the same tag twice in the same input. So it would be ok with both your original input and my altered input (as long as you accept the interpretation that my input is a single tagged object). But it would still report the following as ambiguous:
[aa]My text[/aa][aa]Another Text[/aa]
That's ambiguous because the grammar allows it to be either a single aa tagged string (whose text includes characters which look like close and open tags) or as two consecutive aa tagged strings.
To eliminate the ambiguity you would have to write the string pattern in a way which does not permit internal tags, in the same way that sstrchar doesn't allow internal single quotes. Except, of course, it is not nearly as simple to match a string which doesn't contain a pattern as it is to match a string which doesn't contain a single character. It could be done using Nearley, but I really don't think that it's what you want.
Probably your best bet is to use native Javascript regular expressions to match tagged strings. This will prove simpler because Javascript regular expressions are much more powerful than mathematical regular expressions, even allowing the possibility of matching (certain) context-sensitive constructions. You could, for example, use Javascript regular expressions with the Moo lexer, which integrates well into Nearley. Or you could just use the regular expressions directly, since once you match the tagged text, there isn't much else you need to do.
To get you started, here's a simple Javascript regular expression which matches tagged strings with matching case-insensitive labels (the i flag at the end):
/\[([a-zA-Z]{2})\].*?\[\/\1\]/gmi
You can play with it online using Regex 101
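To make the behaviour of that regular expression concrete, here is a small sketch that uses it to pull out tag and text pairs. The function name parseTags, the extra capture group around the text, and the s flag (so that text may span line breaks) are additions for illustration and are not part of the expression above.

```javascript
// Matches [xx]...[/xx]; \1 forces the closing label to repeat the opening one.
// The i flag makes both the letters and the backreference case-insensitive.
// The s flag (ES2018) lets the tagged text span line breaks.
const TAGGED = /\[([a-z]{2})\](.*?)\[\/\1\]/gis;

// Hypothetical helper: returns one { tag, text } object per tagged string.
function parseTags(input) {
  return Array.from(input.matchAll(TAGGED), (m) => ({
    tag: m[1].toLowerCase(), // normalise the label so "BC" and "bc" compare equal
    text: m[2],
  }));
}

console.log(parseTags('[aa]My text[/aa][ab]Another Text[/ab]'));
console.log(parseTags('[bc]TESt[/BC]'));
// A mismatched close tag is treated as plain text inside the outer tag:
console.log(parseTags('[aa]My text[/ab][ab]Another Text[/aa]'));
```

The lazy quantifier stops at the first matching close tag, which resolves the ambiguity discussed above by always preferring the shortest tagged string.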

Regex in Google Apps Script practical issue. Forms doesn't read regex as it should

I hope its just something i'm not doing right.
I've been using a simple script to create a form out of a spreadsheet. The script seems to be working fine. The output form is going to get some inputs from third parties so i can analyze them in my consulting activity.
Creating the form was not a big deal, the structure is good to go. However, after having the form creator script working, i've started working on its validations, and that's where i'm stuck at.
For text validations, i will need to use specific Regexes. Many of the inputs my clients need to give me are going to be places' and/or people's names, therefore, i should only allow them to use A-Z, single spaces, apostrophes and dashes.
My resulting regexes are:
//Regex allowing a **single name** with the first letter capitalized and the occasional use of "apostrophes" or "dashes".
const reg1stName = /^[A-Z]([a-z\'\-])+/
//Should allow (a single name/surname) like Paul, D'urso, Mac'arthur, Saint-Germaine etc.
//Regex allowing **composite names and places names** with the first letter capitalized and the occasional use of "apostrophes" or "dashes". It must avoid double spaces, however.
const regNamesPlaces = /^[^\s]([A-Z]|[a-z]|\b[\'\- ])+[^\s]$/
//This should allow (names/surnames/places' names) like Giulius Ceasar, Joanne D'arc, Cosimo de'Medici, Cosimo de Medici, Jean-jacques Rousseau, Firenze, Friuli Venezia-giulia, L'aquila etc.
Further in the script, these Regexes are called as validation patterns for the form's text items, according to each case.
//Validation for single names
var val1stName = FormApp.createTextValidation()
.setHelpText("Only the person First Name Here! Use only (A-Z), a single apostrophe (') or a single dash (-).")
.requireTextMatchesPattern(reg1stName)
.build();
//Validation for composite names and places names
var valNamesPlaces = FormApp.createTextValidation()
.setHelpText(("Careful with double spaces, ok? Use only (A-Z), a single apostrophe (') or a single dash (-)."))
.requireTextMatchesPattern(regNamesPlaces)
.build();
Further yet, i have a "for" loop that creates the form based on the spreadsheets fields. Up to this point, things are working just fine.
for(var i=0;i<numberRows;i++){
var questionType = data[i][0];
if (questionType==''){
continue;
}
else if(questionType=='TEXTNamesPlaces'){
form.addTextItem()
.setTitle(data[i][1])
.setHelpText(data[i][2])
.setValidation(valNamesPlaces)
.setRequired(false);
}
else if(questionType=='TEXT1stName'){
form.addTextItem()
.setTitle(data[i][1])
.setHelpText(data[i][2])
.setValidation(val1stName)
.setRequired(false);
}
The problem is when i run the script and test the resulting form.
Both validations types get imported just fine (as can be seen in the form's edit mode), but when testing it in preview mode i get an error, as if the Regex wasn't matching (sry the error message is in portuguese, i forgot to translate them as i did with the code up there):
A screenshot of the form in edit mode
A screenshot of the form in preview mode
However, if i manually remove the bars out of this regex "//" it starts working!
A screenshot of the form in edit mode, Regex without bars
A screenshot of the form in preview mode, Regex without bars
What am i doing wrong? I'm no professional dev but in my understanding, it makes no sense to write a Regex without bars.
If this is some Gforms pattern of reading regexes, i still need all of this to be read by the Apps script that creates this form after all. If i even try to pass the regex without the bars there, the script will not be able to read it.
const reg1stName = ^[A-Z]([a-z\'])+
const regNamesPlaces = ^[^\s]([A-Z]|[a-z]|\b[\'\- ])+[^\s]$
//Can't even be saved. Returns: SyntaxError: Unexpected token '^' (line 29, file "Code.gs")
Passing manually all the validations is not an option. Can anybody help me?
Thanks so much
This
/^[A-Z]([a-z\'\-])+/
will not work because requireTextMatchesPattern takes the pattern as a string, so the / delimiters of the regex literal end up being matched as literal characters.
This
^[A-Z]([a-z\'\-])+
also will not work, because if the name is hyphenated, you will only match up to the hyphen. This will match the 'Some-' in 'Some-Name', for example. Also, perhaps you want a name like 'Saint John' to pass also?
I recommend the following :)
^[A-Z][a-z]*[-\.' ]?[A-Z]?[a-z]*
^ anchors to the start of the string
[A-Z] matches exactly 1 capital letter
[a-z]* matches zero or more lowercase letters (this enables you to match a name like D'Urso)
[-\.' ]? matches zero or 1 instances of - (hyphen), . (period), ' (apostrophe) or a single space (the . (period) needs to be escaped with a backslash because . is special to regex)
[A-Z]? matches zero or 1 capital letter (in case there's a second capital in the name, like D'Urso, St John, Saint-Germaine)
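One way to keep writing regex literals in the script while still handing Forms a pattern without the slashes is to pass the literal's source property. The sketch below is plain JavaScript and only demonstrates the string conversion; the helper name toPatternString is hypothetical, and the FormApp call is shown in a comment because it only runs inside Apps Script.

```javascript
// Regex literal from the question.
const reg1stName = /^[A-Z]([a-z'\-])+/;

// String(regex) keeps the delimiting slashes; regex.source drops them.
console.log(String(reg1stName));
console.log(reg1stName.source);

// Hypothetical helper: returns the text a string-based API should receive.
function toPatternString(regex) {
  return regex.source;
}

// Inside Apps Script the validation would then be built along these lines:
//   FormApp.createTextValidation()
//     .requireTextMatchesPattern(toPatternString(reg1stName))
//     .build();
```

This keeps the patterns testable as ordinary regular expressions in the script while the form receives only the pattern text.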

How to assign large string to a variable without ILLEGAL Token error?

I need to assign a long string (4 pages worth of text) to a variable, so far I've been doing it like this
var myText = "[SOME] Text goes \
.. here ? and 'there' \
is more ( to \
come etc. !)";
Backslashes need to be added at the end of every line of the text, and I can't imagine how long this will take to do manually. Also, I get an ILLEGAL error on the first line for some reason I don't understand.
Therefore I wanted to find out the best way to handle this situation. I was looking into solutions of passing in a .txt file, but would rather do it as a really long string (this is not a production app). Also string shown in example is random, showing that there can be a lot of various characters in it that need to be accounted for.
You have to concatenate the string:
var t = ""
+"text line 1"
...
+"text line n"
But I would put the text in a text file and read it using xhr (on client) or io (on server).
A classic (pre-ES6) string literal cannot span several lines in JavaScript, but you have several options:
save your text in a file and read this file from your program
use the multiline npm module, which proposes a hack that uses function comments as multi-line string definitions
use ES6 template strings, which have multi-line support - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/template_strings#Multi-line_strings
Saving the text in a file would seem to me the preferred option in your case, since the text seems to be very long and potentially comes from an untrusted source. You do not want the pasted text to close the string and start making inappropriate function calls.
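For reference, a minimal sketch of the template-string option applied to the sample text from the question. It assumes an ES6-capable runtime; note that any backtick, backslash, or ${ sequence inside the pasted text would still need to be escaped.

```javascript
// Template literal: line breaks inside the backticks become part of the string,
// so no trailing backslashes are needed.
const myText = `[SOME] Text goes
.. here ? and 'there'
is more ( to
come etc. !)`;

console.log(myText.split('\n').length); // 4
```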

Parsing inconsistent data

Here's what the data's supposed to look like:
Some junk data
More junk data
1. fairly long key, all on one line
value: some other text with spaces and stuff
2. hey look! another long key. still on one line
value: a different value with some different information
There's several of these per file, usually between twenty and thirty. The total number of key-value pairs exceeds 20,000, meaning manually correcting each file is a non-option. The number prefacing each key is supposed to increment properly. There is supposed to be a newline between a value and the following key. Each value should be prefaced with the string "value: "
Right now, I go line by line and classify each line as either key, value, or junk. I then parse the number out of the key and store the number, key, and value in an object.
Issues arise when the data is improperly formatted. Here are a few issues I've encountered thus far:
no newline between the key and value.
an unexpected newline in the middle of the key or value, which results in the program viewing a portion of each key or value as junk data.
the word "value" being spelled wrong.
I handle the third scenario by computing the Levenshtein distance between the first six characters of each line and a master string "value:". How can I fix the other two issues?
If it matters, the parsing is happening on a node.js server, but I'm open to other languages if they can work with this inconsistent data more easily.
Take a look at this:
RegEx: ^(\d+)\. ?(.+?)(?:value|vlaue|balue|valie): ?(.+?)[\n\r]{2,}
Explained demo here: http://regex101.com/r/gG0wH8
If you have your 'misspelled value' issue fixed you can simplify it to:
^(\d+)\. ?(.+?)value: ?(.+?)[\n\r]{2,}
Otherwise, add as many misspellings as you need, separated by |, in that part of the RegEx.
For this to work I hooked on:
line must start with digit(s) and a dot with an optional space
key is everything after the id and before the value
value ends after at least 2 line breaks
You should also remove the correct entries and then reexamine the file to check if anything else is missing.
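Since the parsing happens on Node.js, here is a hedged sketch of how the simplified expression could be applied there. The flags are an assumption (g to find every entry, m so ^ matches at each line start, s so keys and values may contain stray line breaks), as are the helper names parseEntries and tidy and the sample data. It also assumes \n line endings and pads the input so the last entry is terminated.

```javascript
// Simplified pattern from above; flags g, m and s are assumptions (see lead-in).
const ENTRY = /^(\d+)\. ?(.+?)value: ?(.+?)[\n\r]{2,}/gms;

// Collapse stray line breaks and repeated spaces inside a key or value.
const tidy = (s) => s.replace(/\s+/g, ' ').trim();

// Hypothetical helper: returns { id, key, value } for every entry found.
function parseEntries(text) {
  const padded = text + '\n\n'; // make sure the final entry ends with a blank line
  return Array.from(padded.matchAll(ENTRY), (m) => ({
    id: Number(m[1]),
    key: tidy(m[2]),
    value: tidy(m[3]),
  }));
}

const sample = [
  'Some junk data',
  '1. fairly long key, all on one line',
  'value: some other text',
  '',
  '2. key with no newline before value: second value',
  '',
  '3. key broken',
  'across lines',
  'value: third value',
].join('\n');

console.log(parseEntries(sample));
```

Comparing the parsed ids against the expected 1..n sequence is one way to spot entries the pattern missed.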

Regex won't find '\u2028' unicode characters

We're having a lot of trouble tracking down the source of \u2028 (Line Separator) in user submitted data which causes the 'unterminated string literal' error in Firefox.
As a result, we're looking at filtering it out before submitting it to the server (and then the database).
After extensive googling and reading of other people's problems, it's clear I have to filter these characters out before submitting to the database.
Before writing the filter, I attempted to search for the character just to ensure it can find it using:
var index = content.search("/\u2028/");
alert("Index: [" + index + "]");
I get -1 as the result every time, even when I know the character is in the content variable (I've confirmed via a Java jUnit test on the server side).
Assuming that content.replace() would work the same way as search(), is there something I'm doing wrong or anything I'm missing in order to find and strip these line separators?
Your regex syntax is incorrect. You only use the two forward slashes when using a regex literal. It should be just:
var index = content.search("\u2028");
or:
var index = content.search(/\u2028/); // regex literal
But this should really be done on the server, if anywhere. JavaScript sanitization can be trivially bypassed. It's only useful for user convenience, and I don't think accidentally entering line separator is that common.
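For completeness, a short sketch of the difference between the two search calls and of stripping the character with replace. The helper name stripSeparators is hypothetical, and also removing U+2029 (Paragraph Separator) is an added assumption, since it is treated as a line terminator in the same way.

```javascript
// Regex literal with the g flag so every occurrence is replaced.
// U+2029 is included as an assumption; the question only mentions U+2028.
const LINE_BREAK_SEPARATORS = /[\u2028\u2029]/g;

// Hypothetical helper: removes the separators from user-submitted text.
function stripSeparators(content) {
  return content.replace(LINE_BREAK_SEPARATORS, '');
}

const content = 'first\u2028second';

// The string "/\u2028/" looks for a slash, the separator, then another slash.
console.log(content.search('/\u2028/')); // -1
// The regex literal looks for the separator alone.
console.log(content.search(/\u2028/)); // 5
console.log(stripSeparators(content)); // firstsecond
```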
