How to get the producer of a PDF in Google Apps Script - JavaScript

I am trying to write a Gmail add-on that iterates over all emails and creates a report based on the producers of their PDFs. Iterating over emails is the easy part, and I have done that; however, I can't find any way to get the producer line of each PDF.
So far I have tried:
Analyzing the blob. However, this amounts to writing a PDF library that parses the full syntax, and the Producer tag is not clearly present.
Adding pdf.js, a third-party open-source tool that can extract such information. However, I couldn't add it because of ES3 vs. ES6 support issues.
What's the best way to get the producer line of a PDF in Google Apps Script?
Thank you

You want to retrieve the value of Producer from a PDF file.
If my understanding is correct, how about this sample script? It retrieves the Producer value of your shared PDF files from the raw file content using 2 regular expressions: one for the document information entry (Producer(...)) and one for the XMP metadata (<pdf:Producer>...</pdf:Producer>). This only works when that metadata is not stored inside a compressed stream. Please think of this as one of several possible answers.
Sample script:
Before using this script, please set folderId to the ID of the folder containing the PDF files. The script retrieves the value from all PDF files in that folder.
function getPdfProducers() {
  var folderId = "### folderId ###"; // Please set the folder ID.
  var files = DriveApp.getFolderById(folderId).getFilesByType(MimeType.PDF);
  // 1st: document information entry, 2nd: XMP metadata.
  // If both match, the later pattern (XMP) wins in the reduce below.
  var regex = [/Producer\((\w.+)\)/i, /<pdf:Producer>(\w.+)<\/pdf:Producer>/i];
  var result = [];
  while (files.hasNext()) {
    var file = files.next();
    // Raw file content as a string; metadata inside compressed streams will not match.
    var content = file.getBlob().getDataAsString();
    var r = regex.reduce(function(s, e) {
      var m = content.match(e);
      if (Array.isArray(m)) s = m[1];
      return s;
    }, "");
    result.push({
      fileName: file.getName(),
      fileId: file.getId(),
      valueOfProducer: r,
    });
  }
  Logger.log(result); // Result
}
Result:
This sample result was retrieved from a folder in my Google Drive containing the 3 shared PDF files.
[
  {
    "fileName": "2348706469653861032.pdf",
    "fileId": "###",
    "valueOfProducer": "iText� 7.1.5 �2000-2019 iText Group NV \(iText; licensed version\)"
  },
  {
    "fileName": "Getting started with OneDrive.pdf",
    "fileId": "###",
    "valueOfProducer": "Adobe PDF library 15.00"
  },
  {
    "fileName": "DITO-Salesflow-040419-1359-46.pdf",
    "fileId": "###",
    "valueOfProducer": "iText 2.1.7 by 1T3XT"
  }
]
Note:
For 2348706469653861032.pdf, the Producer value includes characters that cannot be displayed (shown as �).
This is a sample script, so please modify it for your situation.
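A slightly more tolerant variant of the two regular expressions above can be written as a standalone function, which also makes it testable outside Apps Script. This is a sketch, not a PDF parser: the name extractProducer is hypothetical, it assumes the metadata is stored uncompressed (the same limitation as the script above), it allows whitespace after /Producer and backslash-escaped characters such as \( and \), but it does not decode hex strings (/Producer <FEFF...>) or unescaped balanced parentheses.

```javascript
// Hypothetical helper: pull the Producer value out of a PDF's raw text.
// Assumes the metadata is stored uncompressed; hex strings are not handled.
function extractProducer(content) {
  // XMP metadata, e.g. <pdf:Producer>Adobe PDF library 15.00</pdf:Producer>
  var xmp = content.match(/<pdf:Producer>([^<]*)<\/pdf:Producer>/i);
  if (xmp) return xmp[1].trim();

  // Info dictionary literal string, e.g. /Producer (iText 2.1.7 by 1T3XT)
  // The group accepts backslash escapes (\( \) \\) or any char except \ and ).
  var info = content.match(/\/Producer\s*\(((?:\\[\s\S]|[^\\)])*)\)/);
  if (info) {
    // Unescape \( \) and \\ back to the literal characters.
    return info[1].replace(/\\([()\\])/g, "$1");
  }
  return ""; // no Producer found
}
```

In the loop above, var r = extractProducer(content); could replace the reduce call. Reading the blob with getDataAsString("ISO-8859-1") instead of the default UTF-8 maps each byte to one character and may keep more of the raw content intact, but treat that as an untested suggestion.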

Related

Delete current item in a JavaScript while-hasNext loop?

I have read 10+ questions about deleting items in JavaScript loops, but they don't seem to apply to my situation.
I have this code
var childFolders = parent.getFolders();
// List folders inside the folder
while (childFolders.hasNext()) {
  var childFolder = childFolders.next();
  // processing childFolder
}
After this, I want to delete the current item (childFolder) from childFolders. The reason is that I am doing some work in Google Apps Script that often takes too long and times out, so I need to be able to restart the loop with only the unprocessed items left. To achieve this, on every iteration I copy the contents of childFolders into permanent storage that I can restore on the next run.
I believe your goal is as follows.
You want to stop the loop over the folder iterator and, when you run the script again, resume the iterator from where it stopped.
In this case, how about using a continuation token? FolderIterator.getContinuationToken() and DriveApp.continueFolderIterator() are native Google Apps Script methods that do exactly this, so you don't need to copy the remaining items yourself. When this is reflected in your script, how about the following script?
Sample script:
Please set your parent folder ID in var parent = DriveApp.getFolderById("###");.
// When you want to clear the token, please run this function.
function clearToken() {
  PropertiesService.getScriptProperties().deleteProperty("token");
}

// This is the main function.
function main() {
  var parent = DriveApp.getFolderById("###"); // Please set your parent folder.
  var numberOfLoop = 2; // Number of folders processed per run.
  var p = PropertiesService.getScriptProperties();
  var token = p.getProperty("token");
  // Resume from the saved token if there is one; otherwise start from the beginning.
  // Continuation tokens are generally valid for one week.
  var childFolders = token ? DriveApp.continueFolderIterator(token) : parent.getFolders();
  var count = 0;
  while (childFolders.hasNext()) {
    count++;
    var childFolder = childFolders.next();
    // processing childFolder
    console.log(childFolder.getName()); // This is a sample.
    if (count == numberOfLoop) {
      // Save the position so that the next run continues after this folder.
      p.setProperty("token", childFolders.getContinuationToken());
      break;
    }
  }
  // All folders processed: remove the token so the next run starts over.
  if (!childFolders.hasNext()) p.deleteProperty("token");
}
When you run main, this sample script processes 2 folders and then stops. When you run main again, the folder iterator continues from where the previous run stopped.
When you want to start again from the first folder, please run clearToken.
References:
getContinuationToken() of Class FolderIterator
continueFolderIterator(continuationToken) of Class DriveApp
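Instead of a fixed numberOfLoop, you might stop based on elapsed time, since Apps Script executions have a maximum run time. The sketch below is hypothetical: processWithTimeBudget works with any object exposing hasNext(), next() and getContinuationToken() (such as a FolderIterator), and the budget value is an assumption you would tune to stay a safe margin below your execution time limit.

```javascript
// Hypothetical helper: process items from a Drive-style iterator until the
// time budget is used up. Returns a continuation token if items remain,
// or null when the iterator is exhausted.
// `now` is injectable for testing; defaults to Date.now.
function processWithTimeBudget(it, processItem, budgetMs, now) {
  now = now || Date.now;
  var start = now();
  while (it.hasNext()) {
    processItem(it.next());
    if (now() - start >= budgetMs) {
      // Out of time: hand back a token only if there is still work left.
      return it.hasNext() ? it.getContinuationToken() : null;
    }
  }
  return null; // everything processed
}
```

In main, something like var next = processWithTimeBudget(childFolders, function (folder) { /* processing */ }, 4 * 60 * 1000); followed by p.setProperty("token", next) when next is non-null, or p.deleteProperty("token") otherwise, would replace the counter.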
Added:
From your following reply,
It is the script in step 3 here, ourtechroom.com/fix/…, that I have a problem with. I changed it to add all files to an array first and insert them into a sheet in a separate step at the end, but that wasn't enough. Hence my question.
My issue is that your solution is a little too complicated for me. That is, I have a hard time applying it to the script in the link.
Do you want to retrieve the file metadata of all files in your Google Drive? If so, I think the script in your link has a high process cost because appendRow is called inside the loop (Ref). This might be the cause of the timeouts in your actual situation. If my understanding is correct, how about the following sample script?
Usage:
1. Install the Google Apps Script library.
You can see how to install the FilesApp Google Apps Script library here.
2. Enable Drive API.
This script uses Drive API, so please enable Drive API in Advanced Google services.
3. Sample script.
Please copy and paste the following script into the script editor of the Spreadsheet, and set the top folder ID to folderId. If you use var folderId = "root";, all files in your Google Drive are retrieved. The script uses arrow functions, spread syntax and flatMap, so it requires the V8 runtime.
function myFunction() {
  var folderId = "###"; // Please set the top folder ID.
  var header = ["parent", "folder", "name", "update", "Size", "URL", "ID", "description", "type"]; // This is from your script.
  var obj = FilesApp.createTree(folderId, null, "files(name,modifiedTime,size,webViewLink,id,description,mimeType)");
  var values = [header, ...obj.files.flatMap(({ folderTreeByName, filesInFolder }) => {
    // Column 1: folder path joined by "|"; column 2: the folder's own name.
    const f = [folderTreeByName.join("|"), folderTreeByName.pop()];
    // Empty folders still get one row, padded with nulls to the header width (2 + 7 = 9).
    return filesInFolder.length == 0
      ? [[...f, ...Array(7).fill(null)]]
      : filesInFolder
          .filter(({ mimeType }) => mimeType != MimeType.FOLDER)
          .map(({ name, modifiedTime, size, webViewLink, id, description, mimeType }) =>
            [...f, name || null, new Date(modifiedTime), size || 0, webViewLink, id, description || null, mimeType]);
  })];
  // Write everything with a single setValues() call instead of appendRow() per file.
  SpreadsheetApp.getActiveSheet().clear().getRange(1, 1, values.length, values[0].length).setValues(values);
}
References:
FilesApp of Google Apps Script library (Author me)
Files: list of Drive API v3
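If you would rather not install a library, the core idea (collect rows in an array and write them with one setValues() call instead of calling appendRow() per file) can be sketched with DriveApp alone. This is a simplified, hypothetical example: listFilesBatched only lists files directly inside one folder (no recursion into subfolders) and uses a reduced set of columns.

```javascript
// Hypothetical sketch (Google Apps Script): list the files directly inside one
// folder (not recursive) and write them to the active sheet in one call.
function listFilesBatched(folderId) {
  var header = ["name", "update", "Size", "URL", "ID"];
  var values = [header];
  var files = DriveApp.getFolderById(folderId).getFiles();
  while (files.hasNext()) {
    var f = files.next();
    // Collect the row in memory instead of calling appendRow() here.
    values.push([f.getName(), f.getLastUpdated(), f.getSize(), f.getUrl(), f.getId()]);
  }
  // One write for all rows; every row has header.length cells, as setValues() requires.
  SpreadsheetApp.getActiveSheet()
    .clear()
    .getRange(1, 1, values.length, header.length)
    .setValues(values);
  return values.length - 1; // number of files written
}
```

Recursing into subfolders (what FilesApp.createTree does for you) would need an extra loop over getFolders(), and very large drives may still need to be split across executions.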

Creating an XLSX (Excel) spreadsheet (or Zip files) in server-side JavaScript

I'm looking for some guidance or ideas on how to create a properly formatted Excel (XLSX) spreadsheet using server-side JavaScript.
I've found multiple sites/libraries (such as SheetJS) that can create the file, but they depend on web functions (i.e., Blobs and the like).
Alternatively, I'd take a JS library that can create a zip file without using Blobs or other web functions (i.e., I can create the XML files that make up the XLSX file/zip, but I cannot compress them server-side).
The reason for this is the need to export these files from server-side scripts within NetSuite/SuiteScript... so far I've come up empty.
You may be able to get what you are looking for using the 'N/file' SuiteScript module. Create a file using file.Type.EXCEL.
Here's a scheduled script code sample that takes saved search results and creates an Excel file that is saved to the File Cabinet. You can reference SuiteAnswer ID 93557. Also, the file.Type enum does allow the type ZIP; reference SuiteAnswer ID 43530.
/**
 * @NApiVersion 2.x
 * @NScriptType ScheduledScript
 */
define(['N/search', 'N/file'], function(search, file) {
  function execute(scriptContext) {
    // Load saved search (internal ID 47 in this example)
    var mySearch = search.load({id: '47'});
    // Run saved search
    var mySearchResultSet = mySearch.run();
    // Headers of the CSV file, separated by commas and ending with a new line (\r\n)
    var csvFile = 'Internal ID,Item Name,Average Cost\r\n';
    // Iterate through each result (ResultSet.each() stops after 4,000 results)
    mySearchResultSet.each(iterator);
    function iterator(resultObject) {
      // Get values
      var internalid = resultObject.getValue(mySearch.columns[0]);
      var itemid = resultObject.getValue(mySearch.columns[1]);
      var formulacolumn = resultObject.getValue(mySearch.columns[2]);
      // Add each result as a new line of the CSV (values containing commas are not escaped)
      csvFile += internalid + ',' + itemid + ',' + formulacolumn + '\r\n';
      return true; // continue iterating
    }
    // Variable for datetime
    var date = new Date();
    // Create the file; note that the contents are CSV text, not an XLSX workbook
    var fileObj = file.create({
      name: 'Saved Search Result - ' + date.toLocaleDateString() + '.xlsx',
      fileType: file.Type.EXCEL,
      contents: csvFile,
      description: 'This is a CSV file.',
      encoding: file.Encoding.UTF8,
      folder: 123 // internal ID of the File Cabinet folder
    });
    // Save the file to the File Cabinet
    var fileId = fileObj.save();
  }
  return {
    execute: execute
  };
});
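Because the sample builds CSV lines by plain concatenation, an item name containing a comma, a double quote or a line break would shift the columns. A small escaping helper in the usual RFC 4180 style (wrap the field in double quotes and double any embedded quotes) is sketched below; csvField and csvLine are hypothetical names, and the code sticks to ES5 so it should also run under SuiteScript 2.0.

```javascript
// Hypothetical helper: escape one value for a CSV field (RFC 4180 style).
function csvField(value) {
  var s = value == null ? "" : String(value);
  // Quote only when needed: commas, quotes, or line breaks inside the value.
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Hypothetical helper: build one CRLF-terminated CSV line from an array of values.
function csvLine(values) {
  return values.map(csvField).join(",") + "\r\n";
}
```

Inside iterator, csvFile += csvLine([internalid, itemid, formulacolumn]); would replace the manual concatenation.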

Script to merge PDF files with the same name in G Suite, set to a weekly trigger

Looking for help putting together code to merge PDF files with the same name in Google Documents. I have a script that saves a spreadsheet as a PDF whose name is a formatted date, so there could be up to 7 files with the same name. I need to merge these into 1 PDF, and I need this to run automatically on a weekly trigger.
Anything helps. Thank you!
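This combines the text of all PDFs in a folder. Each PDF is converted to a Google Doc with OCR and only the recognized text is concatenated, so images and the original page layout are not preserved.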
// Requires the Advanced Drive service with Drive API v2 (which provides Drive.Files.insert).
function combinePDFs() {
  var folder = DriveApp.getFolderById('1Zd_ty0O1WljjADzGQGrtUM57hvE5berT');
  var destFolder = DriveApp.getFolderById('1mHRFCwzqccJn83N7THnvZ8_Z-DCLeGOV');
  var files = folder.getFilesByType(MimeType.PDF);
  var text = '';
  while (files.hasNext()) {
    var file = files.next();
    var blob = file.getBlob();
    var resource = {title: blob.getName(), mimeType: blob.getContentType()};
    // Upload a copy with OCR so it can be opened as a Google Doc
    var f = Drive.Files.insert(resource, blob, {ocr: true, ocrLanguage: "en"});
    var doc = DocumentApp.openById(f.id);
    text += doc.getBody().getText() + '\n------------------------------------------------------------\n';
    DriveApp.getFileById(f.id).setTrashed(true); // trash intermediate files
  }
  var tf = DocumentApp.create('combined.doc');
  tf.getBody().setText(text);
  tf.saveAndClose(); // combined text file
  var docblob = DocumentApp.openById(tf.getId()).getAs('application/pdf');
  docblob.setName('combined.pdf');
  destFolder.createFile(docblob); // combined PDF
  DriveApp.getFileById(tf.getId()).setTrashed(true); // trash the combined text document
}
Reference from Amit Agarwal
Drive.Files.insert
Setup a Time Based Trigger
Upload files with Google Apps Script
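The script above combines every PDF in the folder, while the question asks to merge only files that share a name, on a weekly schedule. A hedged sketch of those two missing pieces follows: groupByName (hypothetical) buckets {name, id} records so that each group with more than one file could be fed to combinePDFs-style logic, and installWeeklyTrigger creates a time-driven trigger with ScriptApp's trigger builder. The Monday, 6 a.m. schedule is only an example.

```javascript
// Hypothetical helper: map each file name to the IDs of all files with that name.
function groupByName(files) {
  var groups = {};
  files.forEach(function (f) {
    if (!groups[f.name]) groups[f.name] = [];
    groups[f.name].push(f.id);
  });
  return groups;
}

// Google Apps Script only: collect {name, id} for the PDFs in a folder.
function listPdfs(folderId) {
  var out = [];
  var it = DriveApp.getFolderById(folderId).getFilesByType(MimeType.PDF);
  while (it.hasNext()) {
    var f = it.next();
    out.push({ name: f.getName(), id: f.getId() });
  }
  return out;
}

// Google Apps Script only: run combinePDFs once a week, on Mondays around 6 a.m.
// in the script's time zone. Run this function once manually to install the trigger.
function installWeeklyTrigger() {
  ScriptApp.newTrigger("combinePDFs")
    .timeBased()
    .everyWeeks(1)
    .onWeekDay(ScriptApp.WeekDay.MONDAY)
    .atHour(6)
    .create();
}
```

Groups whose ID list has a single entry need no merging; for the others, the OCR loop in combinePDFs would iterate over DriveApp.getFileById(id) for each ID instead of over the whole folder.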

Simple CSV parsing in Javascript

Here is my problem:
I am trying to parse a local CSV file in JavaScript. The file looks like this:
Year,Promo,Surname,Name,Mail
2005/2006,0,XXXX,XXXXX,xxxxx.xxxxx#gmail.com
(...)
2006/2007,1,XXXX,XXXXX,xxxxx.xxxxx#gmail.com
(...)
2007/2008,2,XXXX,XXXXX,xxxxx.xxxxx#gmail.com
etc.
I tried to parse it using several libraries (PapaParse.js, jquery-csv, d3.js...), but:
either the parsing fails (my array is empty)
or I get a XMLHttpRequest cannot load - Cross origin requests are only supported for protocol schemes: http, data, chrome, chrome-extension, https, chrome-extension-resource error, since my file is stored locally.
Is there a simple solution to parse a CSV file in JavaScript, in order to access the data? I looked up hundreds of posts on forums but I could not get it to work.
Thank you very much (excuse me, I am quite new to JS).
Tom.
This answer is meant to be canonical: it addresses anyone's problem that matches the description in the question. Only the first of these answers is meant for the OP, although, regarding the last comment, the edit section I added at the bottom is also specifically for the OP.
If you are doing this for a small, local app, you can probably do one of these two things:
Launch the browser with web security (and therefore CORS checks) disabled. Recent Chrome versions only honor this flag together with a separate profile directory (any empty folder):
chrome.exe --disable-web-security --user-data-dir="C:\chrome-dev-profile"
The source also has instructions for Firefox.
src
Run a micro server for your files:
If you have Python installed (most Mac and Linux users do), you can start a quick local web server for testing. In a command prompt, navigate to the directory that contains your HTML files and run the following command (Python 3; on Python 2, use python -m SimpleHTTPServer instead):
python -m http.server
Your files should now be accessible from http://localhost:8000/ and have a good chance of working where file:/// does not.
src
A better solution, if you run into CORS issues with the Python server, might be local-web-server for Node: https://www.npmjs.com/package/local-web-server
The typical user looking for an answer to this question is probably using Node.js:
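'use strict';
var fs = require('fs');
// Wrap fs.readFile in a Promise; 'utf8' makes it return a string instead of a Buffer.
// (On current Node versions, fs.promises.readFile(file, 'utf8') does the same.)
var readFilePromise = function(file) {
  return new Promise(function(ok, notOk) {
    fs.readFile(file, 'utf8', function(err, data) {
      if (err) {
        notOk(err);
      } else {
        ok(data);
      }
    });
  });
};
readFilePromise('/etc/passwd').then(function(data) {
  // do something with the data...
}).catch(function(err) {
  console.error(err);
});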
src
Edit: setting it up for a simple application:
Make the server a service in rc.d or wherever. Follow a guide like this: https://blog.terminal.com/using-daemon-to-daemonize-your-programs/
Don't make the server a local service that is always active! Instead, make a script to launch your app, and start the daemon only from that script. In your init script for the service, write a check that looks for your app's PID (or something similar) every few minutes and shuts the server down automatically when the app is no longer running.
Here is sample code for basic CSV parsing that you could try.
First step: read the file.
We can read the file content with the FileReader method readAsText, because the content of a CSV file is just text.
Read more about FileReader here: https://developer.mozilla.org/en-US/docs/Web/API/FileReader
This code must be inside an async function, because it uses await to wait for the promise to resolve or reject.
Here, the file variable is the File object you get from the file input HTML element.
// 'load' fires only on success and 'error' fires if the read fails, so the promise
// rejects properly (in modern browsers, `await file.text()` is a shorter alternative).
const fileContent = await new Promise((resolve, reject) => {
  const fileReader = new FileReader();
  fileReader.onload = () => resolve(fileReader.result);
  fileReader.onerror = () => reject(fileReader.error);
  fileReader.readAsText(file);
});
Second step: transform the file.
Here I transform the file content into a 2D array containing the CSV data.
/** Extract the lines by splitting the text on line breaks (CRLF or LF) */
const linesArray = fileContent.split(/\r?\n/);
const outcomeArray = [];
for (let rowIndex = 0; rowIndex < linesArray.length; rowIndex++) {
  /** Check whether the line is empty.
      A CSV file may contain blank lines; process only non-blank ones. */
  if (linesArray[rowIndex].trim()) {
    /** Extract the cells of the current line.
        Note: a plain split on ',' does not handle quoted fields such as "Doe, John". */
    const currentline = linesArray[rowIndex].split(',').map((cellData, columnIndex) => {
      /** Form the data as an object. This can be customised as needed */
      return {
        rowIndex,
        columnIndex,
        value: cellData.trim()
      };
    });
    outcomeArray.push(currentline);
  }
}
Example
If we parse a CSV having this content:
10,11
20,21
The output is a 2D array:
[
  [
    {
      "rowIndex": 0,
      "columnIndex": 0,
      "value": "10"
    },
    {
      "rowIndex": 0,
      "columnIndex": 1,
      "value": "11"
    }
  ],
  [
    {
      "rowIndex": 1,
      "columnIndex": 0,
      "value": "20"
    },
    {
      "rowIndex": 1,
      "columnIndex": 1,
      "value": "21"
    }
  ]
]
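The split(',') approach above breaks on quoted fields such as "Doe, John" or on embedded quotes. A small quote-aware parser, roughly following RFC 4180 (fields may be wrapped in double quotes, "" inside a quoted field means one literal quote, and quoted fields may contain commas and line breaks), is sketched below. parseCsv is a hypothetical name; unlike the code above it returns plain arrays of strings rather than {rowIndex, columnIndex, value} objects, it skips blank lines, and it is not a replacement for a tested library such as PapaParse.

```javascript
// Hypothetical, minimal RFC 4180-style CSV parser (sketch, not exhaustive).
function parseCsv(text) {
  var rows = [];
  var row = [];
  var field = "";
  var inQuotes = false;
  for (var i = 0; i < text.length; i++) {
    var c = text[i];
    if (inQuotes) {
      if (c === '"') {
        // "" inside quotes is an escaped quote; a single " closes the field.
        if (text[i + 1] === '"') { field += '"'; i++; }
        else { inQuotes = false; }
      } else {
        field += c; // commas and line breaks are literal inside quotes
      }
    } else if (c === '"') {
      inQuotes = true;
    } else if (c === ",") {
      row.push(field);
      field = "";
    } else if (c === "\n" || c === "\r") {
      if (c === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one break
      row.push(field);
      // Skip blank lines (a row consisting of a single empty field).
      if (!(row.length === 1 && row[0] === "")) rows.push(row);
      row = [];
      field = "";
    } else {
      field += c;
    }
  }
  // Flush the last row when the text does not end with a line break.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}
```

With the question's data, parseCsv(fileContent)[0] would be the header row ["Year", "Promo", "Surname", "Name", "Mail"].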

I need to overwrite an existing Google Sheets file with an attached Script

I have a Google Sheets file with an attached (bound) script. The script does a number of things; one of them is making a copy of itself using makeCopy. This part works. Now I want to keep the same cloned file name and file ID and just update the content, which includes the spreadsheet and the associated Google Apps Script.
if (!fileFound) {
  var file = masterSSFile.makeCopy(reportFileName, RepFolder);
} else {
  oldFile.setContent(masterSSFile.getBlob());
}
When I use makeCopy with the same file name, it creates a second file with the same name but a different file ID.
The else branch fails because setContent() only accepts a string: the result is the word "Blob" in oldFile, and everything else is gone.
I have other scripts that update the contents of an existing spreadsheet by overwriting the contents of the various sheets, but I also want the associated script to be included in the updated file, keeping the same file ID.
I found this question:
Overwrite an Image File with Google Apps Script
and tried the following:
var masterSpreadsheetID = SpreadsheetApp.getActiveSpreadsheet().getId();
var masterSpreadsheetFile = DriveApp.getFileById(masterSpreadsheetID);
var oldFileID = oldFile.getId();
var oldFileName = oldFile.getName();
var newBlob = masterSpreadsheetFile.getBlob();
var file = {
  title: oldFileName,
  mimeType: 'application/vnd.google-apps.spreadsheet'
};
var f = Drive.Files.update(file, oldFileID, newBlob);
I get the error "We're sorry, a server error occurred. Please wait a bit and try again." on this line: Drive.Files.update(file, oldFileID, newBlob);
After reading this:
https://github.com/google/google-api-nodejs-client/issues/495
it looks like Drive.Files.update() does not support bound scripts.
