Node - safest way to execute code from a string during runtime - javascript

My Node app gets an HTML page via axios, parses it via htmlparser2 then sends the valuable information to a frontend JS app as JSON.
The HTML page has some JavaScript in it that creates an array, and I need to work with that array in my code. htmlparser2 gets the content of the script as a string. I have two options to handle it as far as I know:
Write a parser that goes through the string and extracts the required info (doable, but complicated)
Run string as some JavaScript code and handle the values from that.
Assume I want to go with option 2. According to this StackOverflow question, using Node's VM module is possible, but the official documentation says "The node:vm module is not a security mechanism. Do not use it to run untrusted code."
I consider the code in my use case untrusted. What would be a safe solution for this?
EDIT: A snippet from the string:
hatizsakCucc = new Array();
hazbanCucc = new Array();
function adatokMessage(targyIndexStr,tomb) {
var targyIndex = parseInt(targyIndexStr);
if (tomb.length<1) alert("Nincs semmi!");
else alert(tomb[targyIndex]);
}
hatizsakCucc[0]="Név: ezüst\nSúly: 0.0001 kg.\nMennyiség: 453\nÖsszsúly: 0.0453 kg.\n";
hatizsakCucc[1]="Név: kaja\nSúly: 0.4 kg.\nÁr: 2 ezüst\nMennyiség: 68\nÖsszár: 136 ezüst\nÖsszsúly: 27.2 kg.\n";
hatizsakCucc[2]="Típus: fegyver\nNév: bot\nSúly: 2 kg.\nÁr: 6 ezüst\nMin. szint: 1\nMaximum sebzés: 6\nSebzés szórás: 5\nFajta: ütő/zúzó\n";
hatizsakCucc[3]="Típus: fegyver\nNév: parittya\nSúly: 0.3 kg.\nÁr: 14 ezüst\nMin. szint: 1\nMaximum sebzés: 7\nSebzés szórás: 4\nFajta: távolsági\n";
hatizsakCucc[4]="Név: csodatarisznya\nSúly: 4 kg.\nÁr: 1000 ezüst\nExtra: templomi árú\n";
hatizsakCucc[5]="Név: imamalom\nSúly: 5 kg.\nÁr: 150 ezüst\nExtra: templomi árú\n";
The whole string is about 100 lines of this, so it's not too much data.
What I need is the contents of the hatizsakCucc array. Actually, getting an array of that is not too difficult with a regex, I'm realizing now.
hatizsakSzkript.match(/hatizsakCucc(.*)\\n/g);
This gives me an array of the hatizsakCucc elements, so I guess my problem is solved.
That said, I'm still curious about the possibility of running "untrusted" code safely.
Further context:
I plan to parse each array element into an object whose properties are the substrings separated by the \n-s.
So the expected result for the first array element will be:
hatizsakCucc[0]{
nev: "ezüst",
suly: 0.0001,
mennyiseg: ...
}
I'll write a function that splits the string into substrings at each \n, then parses the data with match().
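For the regex route, the extraction and the per-entry parsing described above could be sketched roughly as follows. The helper names extractEntries and parseEntry are made up for illustration, and the sample lines are copied from the snippet in the question. The sketch assumes every element is assigned as a double-quoted literal whose escapes are also valid JSON escapes (true for \n, not for every JavaScript escape). It never executes the scraped script.

```javascript
// Hypothetical helpers; nothing here evaluates the scraped script.

// Collect the string assigned by each arrayName[i] = "..." statement.
function extractEntries(scriptSource, arrayName) {
  // Assumes arrayName is a plain identifier (no regex metacharacters).
  const pattern = new RegExp(
    arrayName + '\\[(\\d+)\\]\\s*=\\s*"((?:[^"\\\\]|\\\\.)*)"',
    'g'
  );
  const entries = [];
  for (const match of scriptSource.matchAll(pattern)) {
    // JSON.parse decodes escapes such as \n in the literal as plain data.
    entries[Number(match[1])] = JSON.parse('"' + match[2] + '"');
  }
  return entries;
}

// Split one entry at the line breaks and store each "label: value" pair.
function parseEntry(text) {
  const item = {};
  for (const line of text.split('\n')) {
    const separator = line.indexOf(': ');
    if (separator === -1) continue; // e.g. the empty piece after the last \n
    item[line.slice(0, separator)] = line.slice(separator + 2);
  }
  return item;
}

// String.raw keeps the backslashes exactly as they appear in the scraped source.
const sample = String.raw`hatizsakCucc = new Array();
hatizsakCucc[0]="Név: ezüst\nSúly: 0.0001 kg.\nMennyiség: 453\nÖsszsúly: 0.0453 kg.\n";
hatizsakCucc[1]="Név: kaja\nSúly: 0.4 kg.\nÁr: 2 ezüst\nMennyiség: 68\nÖsszár: 136 ezüst\nÖsszsúly: 27.2 kg.\n";`;

const items = extractEntries(sample, 'hatizsakCucc').map(parseEntry);
console.log(items);
```

The keys keep the original labels (Név, Súly, ...) and the values stay strings. Renaming keys or converting weights with parseFloat would be a further step on top of this.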

Related

Node console.log on large array shows "... 86 more items"

I'm new to puppeteer. I used to have PhantomJS and CasperJS, but while setting up a newer server (FreeBSD 12) I found out that support for PhantomJS is gone and CasperJS gives me segmentation faults.
I was able to port my applications to puppeteer just fine but ran into the problem that when I want to capture data from a table, this data seems to be incomplete or truncated.
I need all the info from a table but always end up getting less.
I have tried smaller tables but it also comes out truncated.
I don't know if the console.log buffer can be extended or not, or if there is a better way to get the values of all tds in the table.
const data = await page.$$eval('table.dtaTbl tr td', tds => tds.map((td) => {
return td.innerHTML;
}));
console.log(data);
I should be able to get all rows but instead I get this
[ 'SF xx/xxxx 3-3999 06-01-16',
'Sample text - POLE',
'',
/* tons of other rows (removed by me in this example) <- */
'',
/* end of output */ ... 86 more items ]
I need the other 86 items, because I'm having PHP pick the output up from stdout as the code is executed.
Why console.log does not work
Under the hood, console.log uses util.inspect, which produces output intended for debugging. To create reasonable debugging information, this function will truncate output which would be too long. To quote the docs:
The util.inspect() method returns a string representation of object that is intended for debugging. The output of util.inspect may change at any time and should not be depended upon programmatically.
Solution: Use process.stdout
If you want to write output to stdout you can use process.stdout which is a writable stream. It will not modify/truncate what you write on the stream. You can use it like this:
process.stdout.write(JSON.stringify(data) + '\n');
I added a line break at the end, as the function will not produce a line break itself (in contrast to console.log). If your script does not rely on it you can simply remove it.
You can also use
console.log(JSON.stringify(data, null, 4));
instead of
process.stdout.write(JSON.stringify(data) + '\n');
I know the question is from a couple of years ago, but this has been an issue I've seen time and time again. Discovering (through this thread) the underlying util.inspect call has helped me to overcome this issue in the following way:
process.stdout.write(`${util.inspect(data, { maxArrayLength: 1000 })}\n`)
By default maxArrayLength is 100 which is why the data is truncated for longer arrays.
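To see the difference between these approaches side by side, here is a small sketch. The 150 placeholder strings are an assumption standing in for the scraped table cells. The only option used is maxArrayLength, where null removes the limit.

```javascript
const util = require('util');

// 150 placeholder rows stand in for the scraped table cells.
const data = Array.from({ length: 150 }, (_, i) => `cell ${i}`);

// Default inspection stops after 100 array elements.
const inspectedDefault = util.inspect(data);
// Passing null for maxArrayLength shows every element.
const inspectedFull = util.inspect(data, { maxArrayLength: null });
// JSON.stringify is not meant for debugging output and never truncates.
const asJson = JSON.stringify(data);

console.log(inspectedDefault.includes('more items')); // true
console.log(inspectedFull.includes('more items'));    // false
console.log(JSON.parse(asJson).length);               // 150
```

If another program (such as the PHP script) reads the output, JSON is the safer choice of the two, since the util.inspect format may change between Node versions.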
Do you absolutely have to use stdout? That's not recommended for monitoring, because it's very easy for stdout to overrun the buffer or produce incomplete output, which is the problem you've run into.
Why not modify the PHP script to read from a file as a stream using the readfile function, and write to that stream from your JS code using fs?
https://nodejs.org/docs/latest-v10.x/api/fs.html#fs_class_fs_writestream
https://www.php.net/manual/en/function.readfile.php
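On the Node side, the file-based handoff could look roughly like this. The file name and location are hypothetical, the sample rows are taken from the output in the question, and a synchronous write is used only to keep the sketch short. The PHP script would have to read the same path.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical output location; the PHP script must read the same path.
const outFile = path.join(os.tmpdir(), 'table-data.json');

// Sample rows; in the real script this would be the result of page.$$eval.
const data = ['SF xx/xxxx 3-3999 06-01-16', 'Sample text - POLE', ''];

// Write the complete array as JSON, with no truncation.
fs.writeFileSync(outFile, JSON.stringify(data), 'utf8');

console.log('wrote ' + data.length + ' items to ' + outFile);
```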

Convert what looks like a number but isn't to an integer (Google Earth Engine)

I'm trying to get the number of images in an image collection in the Google Earth Engine (GEE) code editor. The image collection filteredCollection contains all Landsat 8 images on GEE that cover Greenwich (just an example).
The number of images is printed as 113 but it doesn't appear to be of type integer and I can't coerce it to an integer either. Here's what that looks like:
var imageCollection = ee.ImageCollection("LANDSAT/LC8_SR");
var point = ee.Geometry.Point([0.0, 51.48]);
var filteredCollection = imageCollection.filterBounds(point);
var number_of_images = filteredCollection.size();
print(number_of_images); // prints 113
print(number_of_images > 1); // prints false
print(+number_of_images); // prints NaN
print(parseInt(number_of_images, 10)); // prints NaN
print(Number(number_of_images)); // prints NaN
print(typeof number_of_images); // prints object
print(number_of_images.constructor); // prints <Function>
print(number_of_images.constructor.name); // prints Ik
var number_of_images_2 = filteredCollection.length;
print(number_of_images_2); // prints undefined
Any idea what's happening here and how I can get the number of images in the collection as an integer?
P.S.: Collection.size() is the recommended function for getting the number of images in the GEE docs.
This is due to the GEE architecture, the way the GEE client and server side interact with each other. You can read about it in the docs.
But in short:
If you're writing Collection.size(), you're basically building a JSON object on your side (client) which doesn't contain any info per-se. Once you're invoking the print function, you're sending the JSON object to the server side where it gets evaluated and returns the output. This also applies to any other function where you include your variable number_of_images. If the function is evaluated on the server side it will work (because it will be evaluated there), if the function is only executed locally (as number_of_images > 1), it will fail.
This has also a "big" implication on how to use loops in GEE, which is better described in the docs (link above).
So as for a solution:
You can use the function .getInfo(), which basically retrieves the result from the server and lets you assign it to a variable.
So
var number_of_images = filteredCollection.size().getInfo();
will get you where you want. This method is to be used with caution, as stated in the docs:
You shouldn't use getInfo() unless you absolutely need to. If you call getInfo() in your code, Earth Engine will open the container and tell you what's inside, but it will block the rest of your code until that's done
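The behaviour in the question can be reproduced outside Earth Engine with a toy model. DeferredNumber below is not part of the Earth Engine API. It is a made-up class that only holds a description of work to be done elsewhere, which is enough to show why plain JavaScript operators fail on such an object.

```javascript
// Toy model only: DeferredNumber is NOT part of the Earth Engine API.
class DeferredNumber {
  constructor(compute) {
    this.compute = compute; // work that would run on the server
  }

  // Stands in for the blocking round trip that getInfo() performs.
  getInfo() {
    return this.compute();
  }
}

const numberOfImages = new DeferredNumber(() => 113);

console.log(numberOfImages > 1);           // false: the object coerces to NaN
console.log(Number(numberOfImages));       // NaN
console.log(typeof numberOfImages);        // object
console.log(numberOfImages.getInfo() > 1); // true: now a plain number
```

The comparison only gives the expected answer once the value has been turned into an ordinary JavaScript number, which is what getInfo() does in the answer above.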
HTH

"Fixing" JSON coming out of MySQL

I'm fetching JSON code stored in MySQL and it has extra slashes, which I have to remove in order to parse it in JavaScript, after I print it on the page. Right now I'm doing the following:
$save = str_replace("\n", "<br>", $save); // Replace new line characters with <br>
$save = str_replace('\\"', '"', $save); // top-level JSON
$save = str_replace('\\\\"', '\"', $save); // HTML inside top level JSON
$save = str_replace('\\\\\\\\\\"', '\\\\\"', $save); // HTML inside second level JSON
Here is an example JSON code, as it comes out from MySQL:
{\"id\":2335,\"editor\":{\"selected_shape\":\"spot-7488\"},\"general\":{\"name\":\"HTML Test\",\"shortcode\":\"html-test\",\"width\":1280,\"height\":776},\"spots\":[{\"id\":\"spot-7488\",\"x\":9.9,\"y\":22.6,\"default_style\":{\"use_icon\":1},\"tooltip_content\":{\"content_type\":\"content-builder\",\"plain_text\":\"<p class=\\\"test\\\">Test</p>\",\"squares_json\":\"{\\\"containers\\\":[{\\\"id\\\":\\\"sq-container-293021\\\",\\\"settings\\\":{\\\"elements\\\":[{\\\"settings\\\":{\\\"name\\\":\\\"Paragraph\\\",\\\"iconClass\\\":\\\"fa fa-paragraph\\\"},\\\"options\\\":{\\\"text\\\":{\\\"text\\\":\\\"<p class=\\\\\\\"test\\\\\\\">Test</p>\\\"}}}]}}]}\"}}]}
And here is how it's supposed to look in order to get parsed correctly (using jsonlint.com to test):
{"id":2335,"editor":{"selected_shape":"spot-7488"},"general":{"name":"HTML Test","shortcode":"html-test","width":1280,"height":776},"spots":[{"id":"spot-7488","x":9.9,"y":22.6,"default_style":{"use_icon":1},"tooltip_content":{"content_type":"content-builder","plain_text":"<p class=\"test\">Test</p>","squares_json":"{\"containers\":[{\"id\":\"sq-container-293021\",\"settings\":{\"elements\":[{\"settings\":{\"name\":\"Paragraph\",\"iconClass\":\"fa fa-paragraph\"},\"options\":{\"text\":{\"text\":\"<p class=\\\"test\\\">Test</p>\"}}}]}}]}"}}]}
Please note that I have HTML code inside JSON, which is inside another JSON and this is where it gets a bit messy.
My question - is there a function or library for PHP (for JS will work too) which covers all those corner cases, because I'm sure someone will find a way to break the script.
Thanks!
The short answer, which is woefully inadequate, is for you to use stripslashes. The reason this answer is not adequate is that your JSON string might have been escaped or had addslashes called on it multiple times and you would have to call stripslashes precisely once for each time this had happened.
The proper solution is to find out where the slashes are being added and either a) avoid adding the slashes or b) understand why the slashes are there and respond accordingly. I strongly believe that the process that creates that broken JSON is where the problem lies.
Slashes are typically added in PHP in a few cases:
magic_quotes are turned on. This is an old PHP feature which has been removed. The basic idea is that PHP used to auto-escape quotes in incoming requests to let you just cram incoming strings into a db. Guess what? NOT SAFE.
addslashes has been called. Why call this? Some folks use it as an incorrect means of escaping data before sticking stuff in a db. Others use it to keep HTML from breaking when echoing variables out (htmlspecialchars should probably be used instead). It can also come in handy in a variety of other meta situations when you are defining code in a string.
When escaping data input. The most common escaping function is mysqli_real_escape_string. It's very important to escape values before inserting them in a db to prevent sql injection and other exploits but you should never escape things twice.
So there's a possibility that your code is double-escaping things, or that addslashes is getting called, or that something like magic_quotes is causing the problem. But I suspect it is another problem: some JS code might be supplying this JSON not as a proper JSON string, but as one that has been escaped so that it can define a string within JavaScript.
If you take your example JSON string above, and slap some quotes around it:
var myJSON = "<put your string here>";
then SURPRISE your javascript is not broken and the var myJSON contains a string that is actually valid JSON and can be parsed into a valid object:
var myJSON = "{\"id\":2335,\"editor\":{\"selected_shape\":\"spot-7488\"},\"general\":{\"name\":\"HTML Test\",\"shortcode\":\"html-test\",\"width\":1280,\"height\":776},\"spots\":[{\"id\":\"spot-7488\",\"x\":9.9,\"y\":22.6,\"default_style\":{\"use_icon\":1},\"tooltip_content\":{\"content_type\":\"content-builder\",\"plain_text\":\"<p class=\\\"test\\\">Test</p>\",\"squares_json\":\"{\\\"containers\\\":[{\\\"id\\\":\\\"sq-container-293021\\\",\\\"settings\\\":{\\\"elements\\\":[{\\\"settings\\\":{\\\"name\\\":\\\"Paragraph\\\",\\\"iconClass\\\":\\\"fa fa-paragraph\\\"},\\\"options\\\":{\\\"text\\\":{\\\"text\\\":\\\"<p class=\\\\\\\"test\\\\\\\">Test</p>\\\"}}}]}}]}\"}}]}";
console.log(JSON.parse(myJSON)); // this is an actual object
The key here is to examine the point of entry where this JSON arrives in your system. I suspect some AJAX request has created some object and, rather than sending valid JSON of that object, it is instead sending an escaped string of a JSON object.
EDIT:
Here's a simple example of what happens when you have too many encodings. Try running this JS in your browser and observe the console output:
var myObj = {"key":"here is my value"};
console.log(myObj);
var myJSON = JSON.stringify(myObj);
console.log(myJSON);
var doubleEncoded = JSON.stringify(myJSON);
console.log(doubleEncoded);
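Going the other way, each extra encoding needs one extra JSON.parse. The sketch below is a stopgap rather than a fix for the source of the problem. parseNested is a made-up helper name, and fromDb is a shortened piece of the MySQL sample from the question.

```javascript
// Hypothetical helper: keep parsing while the result is still a string.
function parseNested(text, maxRounds = 5) {
  let value = text;
  for (let round = 0; round < maxRounds && typeof value === 'string'; round++) {
    try {
      value = JSON.parse(value);
    } catch (err) {
      break; // not JSON at this level, so stop unwrapping
    }
  }
  return value;
}

const myObj = { key: 'here is my value' };
const doubleEncoded = JSON.stringify(JSON.stringify(myObj));

console.log(doubleEncoded);              // "{\"key\":\"here is my value\"}"
console.log(parseNested(doubleEncoded)); // { key: 'here is my value' }

// A shortened piece of the MySQL sample: escaped, but without outer quotes.
const fromDb = String.raw`{\"id\":2335,\"general\":{\"name\":\"HTML Test\"}}`;

// Adding the missing quotes turns it into a double-encoded JSON string.
const repaired = parseNested('"' + fromDb + '"');
console.log(repaired.general.name);      // HTML Test
```

A helper like this cannot tell a string that was encoded too often from a string that merely looks like JSON, which is one more reason to fix the place where the extra escaping happens.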

How to push a large number of string elements into an array in JavaScript without error

I'm using the following function to take a large array of strings (user names) and check them for single quotes, then push them into my new array and return that.
Recently, the number of users in this list increased dramatically (7418 currently) and now this function is getting an error:
Caused by: java.lang.ClassFormatError: Invalid method Code length 105684 in class file org/mozilla/javascript/gen/c135516
The version of javascript is embedded in the application so upgrading that is not an option at this time.
Is there a better way to do this? or a different way to try to avoid this error?
function listExcludedUsers(rInactive) {
var result = new Array('user1', 'user2', 'user3', 'user4');
for (var i = 0; i < rInactive.length; i++) {
//replace single quote with two single quotes for DQL
if (rInactive[i].indexOf("'") !== -1) {
rInactive[i] = rInactive[i].replace(/'/g, "''");
}
result.push(rInactive[i]);
}
return result;
}
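The JVM restricts the bytecode of a single method to less than 65536 bytes (64 KB), and the error reports 105684, so it seems as if you have found a bug in Mozilla's JavaScript engine for Java (Rhino, going by the org/mozilla/javascript package). Please file it (with an example if possible) at https://bugzilla.mozilla.org/.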
Meanwhile: try to cut your method in multiple smaller parts, e.g.: chop the input and push it into several smaller arrays and concatenate those at the end. You should do it with several different functions.
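One way to read this advice, assuming the oversized method comes from the user names being written out as one large literal in the script, is sketched below. The part functions and the names inside them are placeholders, and the code sticks to older syntax because the embedded engine cannot be upgraded.

```javascript
// Placeholder data: each function holds only a slice of the full list,
// so each one compiles to a separate, smaller method.
function inactiveUsersPart1() {
  return ['alice', "o'brien"];
}

function inactiveUsersPart2() {
  return ['carol'];
}

// Replace single quote with two single quotes for DQL.
function escapeForDql(name) {
  return name.replace(/'/g, "''");
}

// Takes an array of functions and concatenates what they return.
function listExcludedUsers(parts) {
  var result = ['user1', 'user2', 'user3', 'user4'];
  for (var p = 0; p < parts.length; p++) {
    var names = parts[p]();
    for (var i = 0; i < names.length; i++) {
      result.push(escapeForDql(names[i]));
    }
  }
  return result;
}

var excluded = listExcludedUsers([inactiveUsersPart1, inactiveUsersPart2]);
console.log(excluded);
```

If the list is built elsewhere and passed in at runtime, the size limit is more likely hit in the code that generates the call, and the same splitting would have to happen there.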
It looks like the JVM is rejecting a generated method that is too long, and the method gets that long because of the JavaScript input you are giving it. It would be nice to reproduce the case first.
I would try changing the input that makes the generated method too long, perhaps by splitting it into several smaller sets of inputs.

Calling toString on a javascript function returns source code

I just found out that when you call toString() on a javascript function, as in myFunction.toString(), the source code of that function is returned.
If you try it in the Firebug or Chrome console it will even go as far as formatting it nicely for you, even for minimized javascript files.
I don't know what it does for obfuscated files.
What's the use of such a toString implementation?
It has some use for debugging, since it lets you see the code of the function. You can check if a function has been overwritten, and if a variable points to the right function.
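As a rough illustration of the "has it been overwritten" check: built-in functions stringify with a [native code] body, while functions written in JavaScript show their source. looksNative is a made-up helper name, and the check is only a debugging aid, since a function whose source happens to contain that text would pass it too.

```javascript
// Hypothetical helper: native built-ins stringify with a "[native code]" body.
function looksNative(fn) {
  return Function.prototype.toString.call(fn).includes('[native code]');
}

// A user-defined function that someone might have assigned over a built-in.
function replacement() {
  return 0;
}

console.log(looksNative(Array.prototype.push)); // true
console.log(looksNative(replacement));          // false
console.log(replacement.toString());            // prints the source of replacement
```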
It has some uses for obfuscated javascript code. If you want to do hardcore obfuscation in javascript, you can transform your whole code into a bunch of special characters, and leave no numbers or letters. This technique relies heavily on being able to access most letters of the alphabet by forcing the toString call on everything with +""
example: (![]+"")[+[]] is f since (![]+"") evaluates to the string "false" and [+[]] evaluates to [0], thus you get "false"[0] which extracts the first letter f.
Some letters like v can only be accessed by calling toString on a native function like [].sort. The letter v is important for obfuscated code, since it lets you call eval, which lets you execute anything, even loops, without using any letters. Here is an example of this.
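The character-extraction trick described above can be checked directly in a console. This small sketch uses indexOf to find the v, because the exact layout of a native function's string differs between engines.

```javascript
// (![] + '') is the string "false"; +[] is 0, so this picks its first letter.
const f = (![] + '')[+[]];

// (!![] + '') is the string "true".
const t = (!![] + '')[+[]];

// Concatenating a native function with '' forces toString on it.
const sortSource = [].sort + '';

// The word "native" in that string supplies the letter v.
const v = sortSource[sortSource.indexOf('v')];

console.log(f, t, v); // f t v
```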
Function.prototype.toString() - Returns a string representing the source code of the function. For Function objects, the built-in toString method decompiles the function back into the JavaScript source that defines the function.
Read this on mozilla.
You can use it as an implementation for multi-line strings in Javascript source.
As described in this blog post by @tjanczuk, one of the massive inconveniences in Javascript is multi-line strings. But you can leverage .toString() and the syntax for multi-line comments (/* ... */) to produce the same results.
By using the following function:
function uncomment(fn){
return fn.toString().split(/\/\*\n|\n\*\//g).slice(1,-1).join();
};
…you can then pass in multi-line comments in the following format:
var superString = uncomment(function(){/*
String line 1
String line 2
String line 3
*/});
In the original article, it was noted that Function.toString()'s behaviour is not standardised and is therefore implementation-specific, and the recommended usage was for Node.js (where the V8 engine can be relied on). However, a Fiddle I wrote seems to work on every browser I have available to me (Chrome 27, Firefox 21, Opera 12, Internet Explorer 8).
A nice use case is remoting. Just toString the function in the client, send it over the wire and execute it on the server.
My use case - I have a node program that processes data and produces interactive reports as html/js/css files. To generate a js function, my node code calls myfunc.toString() and writes it to a file.
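Both the remoting and the code-generation use cases come down to the same round trip: turn the function into text, move the text, and turn it back into a function. Here is a minimal sketch, where formatPrice is a made-up example function. It only works for functions that do not depend on outer variables, and the receiving side should only rebuild source it trusts, since this executes the text as code.

```javascript
// Made-up example function with no references to outer variables.
function formatPrice(amount) {
  return amount.toFixed(2) + ' USD';
}

// The text that could be sent over the wire or written into a .js file.
const source = formatPrice.toString();

// Rebuild a callable from the text. Only do this with source you trust.
const rebuilt = new Function('return (' + source + ');')();

console.log(source.startsWith('function formatPrice')); // true
console.log(rebuilt(3.5));                              // 3.50 USD
```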
You can use it to create a Web Worker from function defined in the main script:
onmessage = function(e) {
console.log('[Worker] Message received from main script:',e.data);
postMessage('Worker speaking.');
}
b = new Blob(["onmessage = " + onmessage.toString()], {type: 'text/javascript'})
w = new Worker(window.URL.createObjectURL(b));
w.onmessage = function(e) {
console.log('[Main] Message received from worker script:' + e.data);
};
w.postMessage('Main speaking.');
