Preventing runaway AWS Lambda function triggers - javascript

I have a Lambda function that is triggered when a folder object -- for example,
67459e53-20cb-4e7d-8b7a-10e4cd165a44
is created in the root bucket.
Also in the root is index.json, the content index -- a simple JSON array of these folder names, e.g. [ "folder1", "folder2", ..., "folderN" ].
Every time a folder object (like above) is added, the Lambda function triggers, gets index.json, adds the new folder object to the JSON array, and then puts index.json back.
Obviously, this createObject event is going to trigger the same Lambda function.
My code, below, should only process the event object if it's a folder, i.e., a key that ends with /. (A Stack Overflow user was kind enough to help me with this solution.)
I have tested this code locally with lambda-local and everything looks good. My concern is (fear of God) that I could have RUNAWAY EXECUTION.
I have scoured the Lambda best practices and googled for "infinite loops" and the like, but cannot find a way to ENSURE that my Lambda won't execute more than, say, 50 times per day.
Yes, I could have the Lambda that actually creates the folder also write to index.json but that Lambda is part of the AWS Video-on-Demand reference example, and I don't really understand it yet.
Two questions: Can I configure S3 notifications so that they filter on a (random) folder key name with a suffix of /, as described
here? And/or: how can I configure this Lambda in the console to absolutely prevent runaway execution?
// dependencies
var async = require('async');
var AWS = require('aws-sdk');
var util = require('util');

// constants
const VOD_DEST_FOLDER = 'my-triggering-bucket'; //not used bc part of event object
const CONTENT_INDEX_FILENAME = 'index.json';

// get reference to S3 client
var s3 = new AWS.S3();

exports.handler = async (event) => {
    try {
        console.log('Event', JSON.stringify(event));
        // Bucket name.
        const triggerBucket = event.Records[0].s3.bucket.name;
        // New folder key added.
        const newKey = event.Records[0].s3.object.key;
        // Add newKey to content index ONLY if it is a folder object. If any other object
        // is added in the bucket root then it won't result in a new write.
        if (newKey.indexOf('/') > -1) {
            // Get existing data.
            let existing = await s3.getObject({
                Bucket: triggerBucket,
                Key: CONTENT_INDEX_FILENAME
            }).promise();
            // Parse JSON object.
            let existingData = JSON.parse(existing.Body);
            // Get the folder name.
            const folderName = newKey.substring(0, newKey.indexOf("/"));
            // Check if we have an array.
            if (!Array.isArray(existingData)) {
                // Create array.
                existingData = [];
            }
            existingData.push(folderName);
            await s3.putObject({
                Bucket: triggerBucket,
                Key: CONTENT_INDEX_FILENAME,
                Body: JSON.stringify(existingData),
                ContentType: 'application/json'
            }).promise();
            console.log('Added new folder name ' + folderName);
            return folderName;
        } else {
            console.log('Not a folder.');
            return 'Ignored';
        }
    }
    catch (err) {
        return err;
    }
};

You can configure S3 notifications with key name filtering. Here's a step-by-step guide on how to do it in the web console. I think if you add a / suffix filter to the notification that triggers your Lambda, you will achieve your goal.
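If you prefer to set the filter programmatically rather than in the console, here is a minimal sketch using the AWS SDK's putBucketNotificationConfiguration. The bucket name and Lambda ARN are placeholders, and note that this call replaces the bucket's existing notification configuration and assumes S3 already has permission to invoke the function.

// Sketch: attach the Lambda trigger with a "/" suffix filter so only
// folder-style keys (e.g. "67459e53-.../") invoke the function.
// Bucket name and Lambda ARN are placeholders, not from the question.
var AWS = require('aws-sdk');
var s3 = new AWS.S3();

s3.putBucketNotificationConfiguration({
    Bucket: 'my-triggering-bucket',
    NotificationConfiguration: {
        LambdaFunctionConfigurations: [{
            LambdaFunctionArn: 'arn:aws:lambda:us-east-1:123456789012:function:add-to-index',
            Events: ['s3:ObjectCreated:*'],
            Filter: {
                Key: {
                    FilterRules: [{ Name: 'suffix', Value: '/' }]
                }
            }
        }]
    }
}).promise()
    .then(function () { console.log('Notification filter applied'); })
    .catch(function (err) { console.error(err); });

With the suffix filter in place, the index.json put never matches the notification, so the Lambda cannot re-trigger itself.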

Related

Copy a storage file within a Firebase Cloud Function

I am launching a cloud function in order to replicate one register I have in Firestore. One of the fields is an image, and the function first tries to copy the image and then duplicate the register.
This is the code:
export async function copyContentFunction(data: any, context: any): Promise<String> {
    if (!context.auth || !context.auth.token.isAdmin) {
        throw new functions.https.HttpsError('unauthenticated', 'Auth error.');
    }

    const id = data.id;
    const originalImage = data.originalImage;
    const copy = data.copy;

    if (id === null || originalImage === null || copy === null) {
        throw new functions.https.HttpsError('invalid-argument', 'Missing mandatory parameters.');
    }

    console.log(`id: ${id}, original image: ${originalImage}`);

    try {
        // Copy the image
        await admin.storage().bucket('content').file(originalImage).copy(
            admin.storage().bucket('content').file(id)
        );

        // Create new content
        const ref = admin.firestore().collection('content').doc(id);
        await ref.set(copy);

        return 'ok';
    } catch {
        throw new functions.https.HttpsError('internal', 'Internal error.');
    }
}
I have tried multiple combinations but this code always fails. For some reason the process of copying the image is failing. Am I doing anything wrong?
Thanks.
Using the copy() method in a Cloud Function should work without problems. You don't share any details about the error you get (I recommend using catch(error) instead of a bare catch), but I can see two potential problems with your code:
The file corresponding to originalImage does not exist;
The content bucket does not exist in your Cloud Storage instance.
The second problem usually comes from the common mistake of mixing up the concepts of buckets and folders (or directories) in Cloud Storage.
Actually Google Cloud Storage does not have genuine "folders". In the Cloud Storage console, the files in your bucket are presented in a hierarchical tree of folders (just like the file system on your local hard disk) but this is just a way of presenting the files: there aren't genuine folders/directories in a bucket. The Cloud Storage console just uses the different parts of the file paths to "simulate" a folder structure, by using the "/" delimiter character.
This doc on Cloud Storage and gsutil explains and illustrates this "illusion of a hierarchical file tree" very well.
So, if you want to copy a file from your default bucket to a content "folder", do as follows:
await admin.storage().bucket().file(`content/${originalImage}`).copy(
    admin.storage().bucket().file(`content/${id}`)
);
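As a side note on the catch(error) suggestion above, here is a minimal sketch (reusing the question's admin and functions imports) that logs the underlying error before mapping it to an HttpsError, so the real cause shows up in the Cloud Functions logs:

try {
    await admin.storage().bucket().file(`content/${originalImage}`).copy(
        admin.storage().bucket().file(`content/${id}`)
    );
} catch (error) {
    // Log the real error (missing file, missing bucket, permissions, ...)
    // instead of swallowing it.
    console.error('Copy failed:', error);
    throw new functions.https.HttpsError('internal', 'Internal error.');
}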

Reading Multiple files and writing to one file Node.JS

I am currently trying to make a data pipeline using Node.js
Of course, it's not the best way to make it but I want to try implementing it anyways before I make improvements upon it.
This is the situation
I have multiple gzip-compressed CSV files on AWS S3. I get these "objects" using the AWS SDK
like the following and turn them into a read stream:
const unzip = createGunzip();
const input = s3.getObject(parameterWithBucketandKey)
    .createReadStream()
    .pipe(unzip);
and using the stream above I create readline interface
const targetFile = createWriteStream('path to target file');
const rl = createInterface({
    input: input
});

let first = true;
rl.on('line', (line) => {
    if (first) {
        first = false;
        return;
    }
    targetFile.write(line);
    await getstats_and_fetch_filesize();
    if (filesize > allowed_size) {
        changed_file_name = change_the_name_of_file();
        compress(change_file_name);
    }
});
and this is wrapped as a promise
and I have an array of filenames to be retrieved from AWS S3, which I map like this:
const arrayOfFileNames = [name1, name2, name3 ... and 5000 more];
const arrayOfPromiseFileProcesses = arrayOfFileNames.map((filename) => promiseFileProcess(filename));
await Promise.all(arrayOfPromiseFileProcesses);
// the result should be multiple gzip files that are compressed again.
Sorry, I wrote this in pseudocode; if more is needed to provide context I will write more, but I thought this would give the general context of my problem.
My problem is that it writes to a file fine, but when I change the file_name it doesn't create a new one afterwards. I am lost in this synchronous and asynchronous world...
Please give me a hint/reference to read upon. Thank you.
The line event handler must be an async function, since it uses await:
rl.on('line', async (line) => {
    if (first) {
        first = false;
        return;
    }
    targetFile.write(line);
    await getstats_and_fetch_filesize();
    if (filesize > allowed_size) {
        changed_file_name = change_the_name_of_file();
        compress(change_file_name);
    }
});
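Note that even with an async handler, readline keeps emitting line events while the awaited call is in flight, so lines are not processed strictly one at a time. If that matters here, one option (a sketch, assuming Node 11.14+ where the readline interface is async-iterable; the helper names are the question's own placeholders) is to consume the lines with for await...of, which naturally waits between lines:

// Sketch: process lines sequentially by iterating the readline interface.
async function processFile(rl, targetFile) {
    let first = true;
    for await (const line of rl) {
        if (first) {            // skip the CSV header line
            first = false;
            continue;
        }
        targetFile.write(line);
        await getstats_and_fetch_filesize();
        if (filesize > allowed_size) {
            changed_file_name = change_the_name_of_file();
            compress(changed_file_name);
        }
    }
}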

Writing a file to a bare repository not at the root with NodeGit / LibGit2

I've been able to write a file to a branch in a bare repository using the below code, but it only works for files in the root. I haven't been able to find a good example in the documentation of how to build a tree for a subfolder and use that as a commit.
async function writeFile(filename, buffer) {
    const signature = NodeGit.Signature.now('Jamie', 'jamie#diffblue.com');
    const repo = await NodeGit.Repository.openBare('java-demo.git');
    const commit = await repo.getBranchCommit('master');
    const rootTree = await commit.getTree();
    const builder = await NodeGit.Treebuilder.create(repo, rootTree);
    const oid = await NodeGit.Blob.createFromBuffer(repo, buffer, buffer.length);
    await builder.insert(filename, oid, NodeGit.TreeEntry.FILEMODE.BLOB);
    const finalOid = await builder.write();
    await repo.createCommit('refs/heads/master', signature, signature, 'Commit message', finalOid, [commit]);
}

const buffer = new Buffer('Hello\n', 'utf-8');
writeFile('test.txt', buffer).then(() => console.log('Done'));
What modifications would be needed to post in (for example) src/test.txt, instead of test.txt?
The typical workflow for writing trees goes through the index. For example, git_index_add_frombuffer followed by git_index_write_tree. Even if you don't want to write to the repository's index on disk, you can still use the index interface by creating an in-memory index.
In a bare repository without an index, you can use git_index_new followed by git_index_read_tree to get an index initialized to the contents of your tree. Then write the tree out to the repository with git_index_write_tree_to.
I'm less familiar with the treebuilder interface, but it looks like you would have to create new subtrees recursively. For example, get or create the src subtree and insert the test.txt blob into it. Then get or create the root tree and insert the src subtree into it.
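A rough, untested sketch of that treebuilder approach follows. It assumes NodeGit's Tree#getEntry, TreeEntry#getTree and FILEMODE.TREE behave as documented, and that a src/ subtree already exists (creating it from scratch would mean starting from an empty treebuilder instead):

async function writeFileInSubfolder(repo, commit, buffer) {
    const rootTree = await commit.getTree();

    // Build a new version of the existing "src" subtree with the blob added.
    const srcEntry = await rootTree.getEntry('src');   // assumes src/ already exists
    const srcTree = await srcEntry.getTree();
    const srcBuilder = await NodeGit.Treebuilder.create(repo, srcTree);
    const blobOid = await NodeGit.Blob.createFromBuffer(repo, buffer, buffer.length);
    await srcBuilder.insert('test.txt', blobOid, NodeGit.TreeEntry.FILEMODE.BLOB);
    const newSrcOid = await srcBuilder.write();

    // Build a new root tree that points at the updated subtree.
    const rootBuilder = await NodeGit.Treebuilder.create(repo, rootTree);
    await rootBuilder.insert('src', newSrcOid, NodeGit.TreeEntry.FILEMODE.TREE);
    return rootBuilder.write();                        // OID of the new root tree
}

The returned root tree OID can then be passed to repo.createCommit exactly as in the question's code.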

Running out of memory writing to a file in NodeJS

I'm processing a very large amount of data, manipulating it, and storing it in a file. I iterate over the dataset and then want to store it all in a JSON file.
My initial method using fs, storing it all in an object then dumping it didn't work as I was running out of memory and it became extremely slow.
I'm now using fs.createWriteStream but as far as I can tell it's still storing it all in memory.
I want the data to be written object by object to the file, unless someone can recommend a better way of doing it.
Part of my code:
// Top of the file
var wstream = fs.createWriteStream('mydata.json');
...
// In a loop
let JSONtoWrite = {}
JSONtoWrite[entry.word] = wordData
wstream.write(JSON.stringify(JSONtoWrite))
...
// Outside my loop (when memory is probably maxed out)
wstream.end()
I think I'm using Streams wrong, can someone tell me how to write all this data to a file without running out of memory? Every example I find online relates to reading a stream in but because of the calculations I'm doing on the data, I can't use a readable stream. I need to add to this file sequentially.
The problem is that you're not waiting for the data to be flushed to the filesystem, but instead keep pushing new data to the stream synchronously in a tight loop.
Here's a piece of pseudocode that should work for you:
// Top of the file
const wstream = fs.createWriteStream('mydata.json');

// I'm not sure how you're getting the data; let's say you have it all in an object
const entry = {};
const words = Object.keys(entry);

function writeCB(index) {
    if (index >= words.length) {
        wstream.end();
        return;
    }
    const JSONtoWrite = {};
    JSONtoWrite[words[index]] = entry[words[index]];
    // Only start the next write once this chunk has been flushed.
    wstream.write(JSON.stringify(JSONtoWrite), writeCB.bind(null, index + 1));
}

writeCB(0); // kick off the first write
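An alternative not shown in the original answer (a sketch using Node's standard backpressure mechanism) is to check write()'s return value and wait for the 'drain' event before continuing:

// Sketch: respect backpressure via the 'drain' event instead of per-write callbacks.
const fs = require('fs');
const wstream = fs.createWriteStream('mydata.json');

async function writeAll(entry) {
    for (const word of Object.keys(entry)) {
        const chunk = JSON.stringify({ [word]: entry[word] });
        // write() returns false when the internal buffer is full...
        if (!wstream.write(chunk)) {
            // ...so pause until the buffer has drained before writing more.
            await new Promise((resolve) => wstream.once('drain', resolve));
        }
    }
    wstream.end();
}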
You should wrap your data source in a readable stream too. I don't know what your source is, but you have to make sure it does not load all your data into memory.
For example, assuming your data set comes from another file where JSON objects are separated by end-of-line characters, you could create a readable stream as follows:
const Readable = require('stream').Readable;

class JSONReader extends Readable {
    constructor(options = {}) {
        super(options);
        this._source = options.source; // the source stream
        this._buffer = '';
        // read whenever the source is ready
        this._source.on('readable', function () {
            this.read();
        }.bind(this));
    }

    _read(size) {
        var chunk;
        var line;
        var lineIndex;
        var result;
        if (this._buffer.length === 0) {
            chunk = this._source.read(); // read more from source when buffer is empty
            if (chunk !== null) {
                this._buffer += chunk;
            }
        }
        lineIndex = this._buffer.indexOf('\n'); // find end of line
        if (lineIndex !== -1) { // we have an end of line and therefore a new object
            line = this._buffer.slice(0, lineIndex); // get the characters belonging to the object
            if (line) {
                result = JSON.parse(line);
                this._buffer = this._buffer.slice(lineIndex + 1);
                this.push(JSON.stringify(result)); // push to the internal read queue
            } else {
                this._buffer = this._buffer.slice(1);
            }
        }
    }
}
Now you can use it like this:
const source = fs.createReadStream('mySourceFile');
const reader = new JSONReader({source});
const target = fs.createWriteStream('myTargetFile');
reader.pipe(target);
then you'll have a better memory flow.
Please note that the above example is taken from the excellent Node.js in Practice book.

Create exportable object or module to wrap third-party library with CommonJS/NodeJS javascript

I'm new to JavaScript and creating classes/objects. I'm trying to wrap an open source library's code with some simple methods for me to use in my routes.
I have the below code that is straight from the source (sjwalter's Github repo; thanks Stephen for the library!).
I'm trying to export a file/module to my main app/server.js file with something like this:
var twilio = require('nameOfMyTwilioLibraryModule');
or whatever it is I need to do.
I'm looking to create methods like twilio.send(number, message) that I can easily use in my routes to keep my code modular. I've tried a handful of different ways but couldn't get anything to work. This might not be a great question because you need to know how the library works (and Twilio too). The var phone = client.getPhoneNumber(creds.outgoing); line makes sure that my outgoing number is a registered/paid-for number.
Here's the full example that I'm trying to wrap with my own methods:
var TwilioClient = require('twilio').Client,
    Twiml = require('twilio').Twiml,
    creds = require('./twilio_creds').Credentials,
    client = new TwilioClient(creds.sid, creds.authToken, creds.hostname),
    // Our numbers list. Add more numbers here and they'll get the message
    numbers = ['+numbersToSendTo'],
    message = '',
    numSent = 0;

var phone = client.getPhoneNumber(creds.outgoing);
phone.setup(function() {
    for (var i = 0; i < numbers.length; i++) {
        phone.sendSms(numbers[i], message, null, function(sms) {
            sms.on('processed', function(reqParams, response) {
                console.log('Message processed, request params follow');
                console.log(reqParams);
                numSent += 1;
                if (numSent == numToSend) {
                    process.exit(0);
                }
            });
        });
    }
});
Simply add the function(s) you wish to expose as properties on the exports object. Assuming your file is named mytwilio.js and stored under app/, it would look like this:
app/mytwilio.js
var twilio = require('twilio');
var TwilioClient = twilio.Client;
var Twiml = twilio.Twiml;
var creds = require('./twilio_creds').Credentials;
var client = new TwilioClient(creds.sid, creds.authToken, creds.hostname);

// keeps track of whether the phone object
// has been populated or not.
var initialized = false;
var phone = client.getPhoneNumber(creds.outgoing);

phone.setup(function() {
    // phone object has been populated
    initialized = true;
});

exports.send = function(number, message, callback) {
    // ignore request and throw if not initialized
    if (!initialized) {
        throw new Error("Patience! We are init'ing");
    }
    // otherwise process request and send SMS
    phone.sendSms(number, message, null, function(sms) {
        sms.on('processed', callback);
    });
};
This file is mostly identical to what you already have with one crucial difference. It remembers whether the phone object has been initialized or not. If it hasn't been initialized, it simply throws an error if send is called. Otherwise it proceeds with sending the SMS. You could get fancier and create a queue that stores all messages to be sent until the object is initialized, and then sends em' all out later.
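For example, a minimal sketch of that queueing idea (the pending array and flush logic are illustrative additions, not part of the original wrapper) could replace the setup callback and send function above:

// Sketch: buffer messages until phone.setup() has finished, then flush them.
var pending = []; // queued { number, message, callback } entries

phone.setup(function() {
    initialized = true;
    // send everything that was queued while we were initializing
    pending.forEach(function(item) {
        phone.sendSms(item.number, item.message, null, function(sms) {
            sms.on('processed', item.callback);
        });
    });
    pending = [];
});

exports.send = function(number, message, callback) {
    if (!initialized) {
        pending.push({ number: number, message: message, callback: callback });
        return;
    }
    phone.sendSms(number, message, null, function(sms) {
        sms.on('processed', callback);
    });
};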
This is just a lazy approach to get you started. To use the function(s) exported by the wrapper in mytwilio.js, simply include it in the other js file(s). The send function captures everything it needs (the initialized and phone variables) in a closure, so you don't have to worry about exporting every single dependency. Here's an example of a file that makes use of the above.
app/mytwilio-test.js
var twilio = require("./mytwilio");
twilio.send("+123456789", "Hello there!", function(reqParams, response) {
// do something absolutely crazy with the arguments
});
If you don't want to include the full/relative path of mytwilio.js, then add it to the paths list. Read up more on the module system and how module resolution works in Node.js.
