javascript regex strip out beginning and end part around JSON looking string - javascript

I'm hoping someone might be able to help with this.
I've got a text file on the server that contains the following:
var templateCache = '{"templateCache":[ {"test":"123"} ]}';
As it's a text file, we open it and aim to strip out
var templateCache = '----';
so we can convert the string into an object using JSON.parse().
We are making use of Rhino.js as the server so we can only use vanilla JS functions to process this string into something usable for our app.
Back story
The file is included in the main function of our little app, but to manipulate this set of variables we open it, convert it into a JSON object, apply what's necessary, and then save it back as the variable so it doesn't impact our app. But I can't figure out how to strip out the var templateCache = ''; and leave the middle content in place,
and I'm not sure what to search for on Google to get this sorted.

/var templateCache = '(.+?)';$/m
The regex feature you are looking for is called 'capturing'. It's normally implemented with () parentheses in most languages, js included.
What this example regex does is it 'captures' and remembers everything between the () parentheses and makes it available for more processing.
Here's a quick example for your case:
var fileContent = 'var templateCache = \'{"templateCache":[ {"test":"123"} ]}\';'
var regex = /var templateCache = '(.+?)';$/m;
var matchedGroups = regex.exec(fileContent);
console.log('Result String: ' + matchedGroups[1]);
console.log(JSON.parse(matchedGroups[1]));
Edit: changed the regex to handle cases where the file has more '; substrings on the same line after the json part.
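The question also asks about saving the modified data back as the variable. A minimal sketch of that round trip follows, using only vanilla JS. The helper names readTemplateCache and writeTemplateCache are made up for illustration, and the sketch assumes the JSON contains no single quotes or backslashes, which would need escaping before being placed inside a single-quoted JS string.

```javascript
// Pattern from the answer: capture everything between the quotes.
var TEMPLATE_REGEX = /var templateCache = '(.+?)';$/m;

// Hypothetical helper: returns the parsed object,
// or null when the declaration is not found.
function readTemplateCache(fileContent) {
  var match = TEMPLATE_REGEX.exec(fileContent);
  return match ? JSON.parse(match[1]) : null;
}

// Hypothetical helper: serializes the object back into the same
// one-line declaration. Assumes no single quotes or backslashes
// occur in the serialized JSON.
function writeTemplateCache(data) {
  return "var templateCache = '" + JSON.stringify(data) + "';";
}

var fileContent = 'var templateCache = \'{"templateCache":[ {"test":"123"} ]}\';';
var data = readTemplateCache(fileContent);
data.templateCache.push({ test: '456' }); // example modification
var updated = writeTemplateCache(data);
console.log(updated);
```

The null check matters because exec returns null when nothing matches, and indexing into null would throw.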

Related

Convert a javascript variable to scala in play framework

I have some variables in javascript:
var something = 1;
var url = "@CSRF(routes.Some.thing(something))";
I get an error during compilation because "something" does not refer to the JavaScript variable; in other words, the compiler can't identify it. Is it possible to convert or inject the JavaScript variable somehow? Also, does this work in real time in JavaScript, or do I need to prepare an "@CSRF(routes.Some.thing(something))" array containing each possible "something" value?
It's supposed to be a simple rest call, seen in routes file:
/something/:something controllers.Some.thing(something : Long)
An alternative would be to use a form, but I want to try not to.
You need to use JavaScript routing and add the CSRF token to the request.
JavaScript routing description: https://www.playframework.com/documentation/2.6.x/ScalaJavascriptRouting
See my answer to the question "How to update src of a img from javascript in a play framework project?", which explains how to use it for assets (under "Correct and long solution"); the usage for other purposes is the same.
So in your case, the Javascript routes generation can look like:
JavaScriptReverseRouter("jsRoutes")(
routes.javascript.Some.thing
)
And in the JavaScript:
var something = 1;
var url = jsRoutes.controllers.Some.thing(something).url;
Finally, do not forget to add the Csrf-Token header to the request.
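As a rough sketch of that last step, the snippet below builds request arguments that carry the token in a Csrf-Token header. The helper buildCsrfRequest, the stand-in route object, and the token value are all assumptions for illustration. In a real page the route would come from jsRoutes.controllers.Some.thing(something), and the token would have to be made available to the script by the server-rendered template.

```javascript
// Hypothetical helper: turns a route object (with url and method
// properties) into arguments for an HTTP request that carries the
// CSRF token in the Csrf-Token header.
function buildCsrfRequest(route, csrfToken) {
  return {
    url: route.url,
    options: {
      method: route.method,
      headers: { 'Csrf-Token': csrfToken }
    }
  };
}

// Stand-in for jsRoutes.controllers.Some.thing(1); assumed shape.
var fakeRoute = { url: '/something/1', method: 'GET' };

// 'token-from-page' is a placeholder, not a real token.
var request = buildCsrfRequest(fakeRoute, 'token-from-page');

// In a browser this could then be sent with:
//   fetch(request.url, request.options)
console.log(request.url, request.options.headers);
```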

Javascript crashes on special characters from query string

To use this value in my TypeScript I am getting it from my query string like this:
var UserName = @Request.QueryString["UserName"];
But I get an "Unexpected identifier" error, because when I look in DevTools at the point where it breaks, the query string value has been rendered like this:
var UserName = ANT -- ANT 37690 / THIRD PARTY
So is there a way to do some kind of sanitation on it so it wouldn't crash? I guess there are illegal characters in that value for JS?
The error has nothing to do with "special" characters; it occurs because the right side of the assignment is not wrapped in quotes, so the JS engine reads it as a sequence of unknown identifiers.
One way to properly format data that becomes part of javascript code is to use JavaScriptSerializer class from System.Web.Script.Serialization namespace.
var UserName = @new System.Web.Script.Serialization.JavaScriptSerializer().Serialize(Request.QueryString["UserName"]);
The shorter version of this for a string is:
var UserName = "@System.Web.HttpUtility.JavaScriptStringEncode(Request.QueryString["UserName"])";
or the overloaded version that wraps the result in double quotes:
var UserName = @System.Web.HttpUtility.JavaScriptStringEncode(Request.QueryString["UserName"], true);
You need to include quotes for the value.
var UserName = "@(Request.QueryString["UserName"])";
Otherwise the name will come through verbatim in your code and cause the problems you are seeing.
There is no need to protect against an attack vector here as the user can alter the page as they see fit at any time with a user script, and the QueryString is entered by them and only seen as a result by them in this scenario.
If there were a need to scrub the user input, it should be done on the server side before the value reaches the view. However, if you are still concerned about scrubbing output into a view in this type of scenario, it would be prudent to apply an encoder available from Razor:
var sanitizedJsVariable = "@System.Web.HttpUtility.JavaScriptStringEncode(model.VariableFromServer)";

Decoding/reading json part of complex text file

I am starting to develop a desktop application using Electron. This app will parse some files and show data from them. These files contain complex data.
Now, I am trying to get JSON data from a complex text file. This text file contains some strings and JSON objects. A sample file looks like this:
...strings that I'm not interested in...
{
"partOneA":0,
"partOneB":7,
....
}
...randomly strings may stand between json sections...
{
"partTwoA":7,
"partTwoB":4,
"partTwoC":4,
...
}
{
"differentPartA":3,
"differentPartB":5,
"differentPartC":6,
...
}
...somemoretext....
The problem is: how can I get the JSON parts from this complex file using JavaScript? Performance of the solution should be considered.
Additionally, consider that the JSON structure may be nested like this:
{
"partOneA":0,
"partOneB" :{
"partOneBnode1":0,
"partOneBnode2":7,
}
}
A solution based on regular expressions is not suitable for this problem.
Now, I am trying to find a javascript based solution.
As long as you can rely on { and } as starting and closing tags you could use a regex like:
var jsonRegex = /({(?:(.|\n)*?(?:[^\\])){0,1}?})/g;
var matches = [], result;
while ((result = jsonRegex.exec(text)) !== null) {
  matches.push(result[1]); // group 1 holds the whole {...} piece
}
var firstMatch = matches[0];
Because of the g flag, each call to exec returns the next match, so calling it in a loop collects every piece in order. You can read the docs on MDN.
You can play around with regex on sites like http://regexr.com/
Note
This approach does not work with nested JSON, because you would need to match the same number of opening and closing brackets (see this answer).
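For the nested case, one alternative to a regex is a single pass over the text that counts brace depth. This is only a sketch. The function name extractJsonBlocks is made up, and it assumes the surrounding non-JSON text contains no stray { or } characters. It tracks double-quoted strings so that braces inside string values are ignored, and it returns the raw text of each block, leaving any JSON.parse call to the caller.

```javascript
// Hypothetical helper: returns every balanced top-level {...} block
// found in text, in order of appearance.
function extractJsonBlocks(text) {
  var blocks = [];
  var depth = 0;        // current brace nesting level
  var start = -1;       // index of the "{" that opened the current block
  var inString = false; // true while inside a double-quoted string
  var escaped = false;  // true right after a backslash inside a string

  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);

    if (inString) {
      if (escaped) {
        escaped = false;
      } else if (ch === '\\') {
        escaped = true;
      } else if (ch === '"') {
        inString = false;
      }
      continue;
    }

    if (ch === '"' && depth > 0) {
      inString = true; // quotes outside any block are ignored
    } else if (ch === '{') {
      if (depth === 0) {
        start = i;
      }
      depth++;
    } else if (ch === '}' && depth > 0) {
      depth--;
      if (depth === 0) {
        blocks.push(text.slice(start, i + 1));
        start = -1;
      }
    }
  }
  return blocks;
}

var sample = 'junk\n{"a":0,"b":{"c":1,"d":"x}y"}}\nmore { "e": 2 } tail';
var found = extractJsonBlocks(sample);
console.log(found);
```

Since every character is visited once, the cost grows linearly with the file size, which addresses the performance concern in the question.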

"Fixing" JSON coming out of MySQL

I'm fetching JSON code stored in MySQL and it has extra slashes, which I have to remove in order to parse it in JavaScript, after I print it on the page. Right now I'm doing the following:
$save = str_replace("\n", "<br>", $save); // Replace new line characters with <br>
$save = str_replace('\\"', '"', $save); // top-level JSON
$save = str_replace('\\\\"', '\"', $save); // HTML inside top level JSON
$save = str_replace('\\\\\\\\\\"', '\\\\\"', $save); // HTML inside second level JSON
Here is an example JSON code, as it comes out from MySQL:
{\"id\":2335,\"editor\":{\"selected_shape\":\"spot-7488\"},\"general\":{\"name\":\"HTML Test\",\"shortcode\":\"html-test\",\"width\":1280,\"height\":776},\"spots\":[{\"id\":\"spot-7488\",\"x\":9.9,\"y\":22.6,\"default_style\":{\"use_icon\":1},\"tooltip_content\":{\"content_type\":\"content-builder\",\"plain_text\":\"<p class=\\\"test\\\">Test</p>\",\"squares_json\":\"{\\\"containers\\\":[{\\\"id\\\":\\\"sq-container-293021\\\",\\\"settings\\\":{\\\"elements\\\":[{\\\"settings\\\":{\\\"name\\\":\\\"Paragraph\\\",\\\"iconClass\\\":\\\"fa fa-paragraph\\\"},\\\"options\\\":{\\\"text\\\":{\\\"text\\\":\\\"<p class=\\\\\\\"test\\\\\\\">Test</p>\\\"}}}]}}]}\"}}]}
And here is how it's supposed to look in order to get parsed correctly (using jsonlint.com to test):
{"id":2335,"editor":{"selected_shape":"spot-7488"},"general":{"name":"HTML Test","shortcode":"html-test","width":1280,"height":776},"spots":[{"id":"spot-7488","x":9.9,"y":22.6,"default_style":{"use_icon":1},"tooltip_content":{"content_type":"content-builder","plain_text":"<p class=\"test\">Test</p>","squares_json":"{\"containers\":[{\"id\":\"sq-container-293021\",\"settings\":{\"elements\":[{\"settings\":{\"name\":\"Paragraph\",\"iconClass\":\"fa fa-paragraph\"},\"options\":{\"text\":{\"text\":\"<p class=\\\"test\\\">Test</p>\"}}}]}}]}"}}]}
Please note that I have HTML code inside JSON, which is inside another JSON and this is where it gets a bit messy.
My question - is there a function or library for PHP (for JS will work too) which covers all those corner cases, because I'm sure someone will find a way to break the script.
Thanks!
The short answer, which is woefully inadequate, is for you to use stripslashes. The reason this answer is not adequate is that your JSON string might have been escaped or had addslashes called on it multiple times and you would have to call stripslashes precisely once for each time this had happened.
The proper solution is to find out where the slashes are being added and either a) avoid adding the slashes or b) understand why the slashes are there and respond accordingly. I strongly believe that the process that creates that broken JSON is where the problem lies.
Slashes are typically added in PHP in a few cases:
magic_quotes are turned on. This is an old PHP feature which has been removed. The basic idea is that PHP used to auto-escape quotes in incoming requests to let you just cram incoming strings into a db. Guess what? NOT SAFE.
addslashes has been called. Why call this? Some folks use it as an incorrect means of escaping data before sticking stuff in a db. Others use it to keep HTML from breaking when echoing variables out (htmlspecialchars should probably be used instead). It can also come in handy in a variety of other meta situations when you are defining code in a string.
When escaping data input. The most common escaping function is mysqli_real_escape_string. It's very important to escape values before inserting them in a db to prevent sql injection and other exploits but you should never escape things twice.
So there's a possibility that your code is double-escaping things, or that addslashes is getting called, or that something like magic_quotes is causing the problem. But I suspect it is another problem: some JS code might be supplying this JSON not as a proper JSON string, but as one that has been escaped so that it can define a string literal within JavaScript.
If you take your example JSON string above, and slap some quotes around it:
var myJSON = "<put your string here>";
then SURPRISE your JavaScript is not broken, and the var myJSON contains a string that is actually valid JSON and can be parsed into an object:
var myJSON = "{\"id\":2335,\"editor\":{\"selected_shape\":\"spot-7488\"},\"general\":{\"name\":\"HTML Test\",\"shortcode\":\"html-test\",\"width\":1280,\"height\":776},\"spots\":[{\"id\":\"spot-7488\",\"x\":9.9,\"y\":22.6,\"default_style\":{\"use_icon\":1},\"tooltip_content\":{\"content_type\":\"content-builder\",\"plain_text\":\"<p class=\\\"test\\\">Test</p>\",\"squares_json\":\"{\\\"containers\\\":[{\\\"id\\\":\\\"sq-container-293021\\\",\\\"settings\\\":{\\\"elements\\\":[{\\\"settings\\\":{\\\"name\\\":\\\"Paragraph\\\",\\\"iconClass\\\":\\\"fa fa-paragraph\\\"},\\\"options\\\":{\\\"text\\\":{\\\"text\\\":\\\"<p class=\\\\\\\"test\\\\\\\">Test</p>\\\"}}}]}}]}\"}}]}";
console.log(JSON.parse(myJSON)); // this is an actual object
The key here is to examine the point of entry where this JSON arrives in your system. I suspect some AJAX request has created some object and, rather than sending valid JSON of that object, it is sending an escaped string of a JSON object.
EDIT:
Here's a simple example of what happens when you have too many encodings. Try running this JS in your browser and observe the console output:
var myObj = {"key":"here is my value"};
console.log(myObj);
var myJSON = JSON.stringify(myObj);
console.log(myJSON);
var doubleEncoded = JSON.stringify(myJSON);
console.log(doubleEncoded);
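Going the other way, a value that has been encoded too many times can be unwrapped by parsing until the result is no longer a string. This is a sketch rather than a fix for the root cause. The name decodeNested and the round limit are assumptions, and it presumes the innermost value is an object or array rather than a plain string.

```javascript
// Hypothetical helper: repeatedly parses a JSON string until the
// result is no longer a string, up to maxRounds attempts.
// JSON.parse throws a SyntaxError if a layer is not valid JSON.
function decodeNested(value, maxRounds) {
  var rounds = 0;
  while (typeof value === 'string' && rounds < maxRounds) {
    value = JSON.parse(value);
    rounds++;
  }
  return value;
}

var original = { key: 'here is my value' };
var encodedTwice = JSON.stringify(JSON.stringify(original));
var decoded = decodeNested(encodedTwice, 5);
console.log(decoded);
```

If the number of rounds needed varies between records, that is a sign the data is being encoded inconsistently somewhere upstream.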

How to parse JavaScript using Nokogiri and Ruby

I need to parse an array out of a website. The part of the JavaScript I want to parse looks like this:
_arPic[0] = "http://example.org/image1.jpg";
_arPic[1] = "http://example.org/image2.jpg";
_arPic[2] = "http://example.org/image3.jpg";
_arPic[3] = "http://example.org/image4.jpg";
_arPic[4] = "http://example.org/image5.jpg";
_arPic[5] = "http://example.org/image6.jpg";
I get the whole JavaScript using something like this:
product_page = Nokogiri::HTML(open(full_url))
product_page.css("div#main_column script")[0]
Is there an easy way to parse all the variables?
If I read you correctly you're trying to parse the JavaScript and get a Ruby array with your image URLs yes?
Nokogiri only parses HTML/XML, so you're going to need a different library. A cursory search turns up the RKelly library, which has a parse function that takes a JavaScript string and returns a parse tree.
Once you have a parse tree you're going to need to traverse it and find the nodes of interest by name (e.g. _arPic) then get the string content on the other side of the assignment.
Alternatively, if it doesn't have to be too robust (and it wouldn't be) you can just use a regex to search the JavaScript if possible:
/^\s*_arPic\[\d+\] = "(.+)";$/
might be a good starter regex.
The easy way:
_arPic = URI.extract product_page.css("div#main_column script")[0].text
which can be shortened to:
_arPic = URI.extract product_page.at("div#main_column script").text
