Decoding/reading json part of complex text file - javascript

I am starting to develop a desktop application using Electron. The app will parse some files and display data from them. These files contain complex data.
Now I am trying to get JSON data from a complex text file. The file contains a mix of plain strings and JSON objects. A sample file looks like this:
...strings that I'm not interested in...
{
"partOneA":0,
"partOneB":7,
....
}
...randomly strings may stand between json sections...
{
"partTwoA":7,
"partTwoB":4,
"partTwoC":4,
...
}
{
"differentPartA":3,
"differentPartB":5,
"differentPartC":6,
...
}
...somemoretext....
The problem is: how can I get the JSON parts out of this complex file using JavaScript? Performance of the solution should be considered.
Additionally, consider that the JSON structure may be nested like this:
{
"partOneA":0,
"partOneB" :{
"partOneBnode1":0,
"partOneBnode2":7,
}
}
Solving this with regular expressions alone is not applicable to this issue.
I am looking for a JavaScript-based solution.

As long as you can rely on { and } as starting and closing tags you could use a regex like:
var jsonRegex = new RegExp(/({(?:(.|\n)*?(?:[^\\])){0,1}?})/g);
var result = jsonRegex.exec(text);
var firstMatch= result[1];
Here result[1] holds the first JSON piece. Because the regex has the g flag, each further call to jsonRegex.exec(text) returns the next match, or null when there are no more. You can read the docs on MDN.
You can play around with regex on sites like http://regexr.com/
Note
This approach does not work with nested JSON, because you would need to match the same number of opening and closing brackets (see this answer).
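For the nested case, one alternative is to skip regex and scan the text once while counting braces. The following is only a sketch under some assumptions: extractJsonObjects is a made-up helper name, the surrounding text is assumed not to contain stray { characters, and each candidate span is handed to JSON.parse, so anything that is not strict JSON (for example a trailing comma) is skipped.

```javascript
// Scan the text once, tracking brace depth, and try JSON.parse on each
// balanced {...} span. `extractJsonObjects` is a hypothetical helper name.
function extractJsonObjects(text) {
  const results = [];
  let depth = 0;        // current nesting level of braces
  let start = -1;       // index of the opening brace of the current candidate
  let inString = false; // true while inside a double-quoted string in a candidate
  let escaped = false;  // true right after a backslash inside a string

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Braces inside strings must not affect the depth counter.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"' && depth > 0) {
      inString = true;
    } else if (ch === "{") {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === "}" && depth > 0) {
      depth--;
      if (depth === 0) {
        try {
          results.push(JSON.parse(text.slice(start, i + 1)));
        } catch (e) {
          // Balanced braces but not valid JSON; skip this span.
        }
      }
    }
  }
  return results;
}
```

Calling extractJsonObjects(fileContents) returns an array of parsed objects in file order. It is a single pass over the input, so the cost grows linearly with file size plus the cost of JSON.parse on each span.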

Related

Extract function names in python code using javascript regex

I am using prism.js to highlight python code but for some reasons it is not highlighting function names which are called for example deep_flatten() or flatten([1,2,3]) or flatten(list(map(lambda x: x*2,[1,2,3]))).
So made the following code to overcome this problem
[...document.getElementsByClassName('code')].forEach(x => {
  x.innerText = x.innerText.replace(/(\w+)\([^\(\)]*\)/gi, match => {
    if (match.match(/^\d/)) {
      return match
    }
    else {
      return `<span class="called-function">${match}</span>`
    }
  })
})
It works fine for the simple calls but fails for the nested ones.
After a Google search I found that this kind of matching is called recursive and can only be done with parsers. Searching for Python parsers in JavaScript, I found a lot of them, but they are very big and meant for parsing whole programs.
How can I make a parser/regex which extracts all function names and encloses them within span tags?
I don't want the specific code, just some pseudo-code or an algorithm for how to proceed.
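Python's default re package in the standard library can't handle recursive regexes; however, it seems the regex package can. With a recursion-capable engine,
/(\w+)\([^\(\)]*\)/gi
can be changed to
/(\w+)(\((?:[^\(\)]|(?2))*\))/gi
Note that JavaScript's built-in RegExp does not support the (?2) recursion syntax, so this pattern will not work there as written.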
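If only the function names need to be wrapped (not the whole call with its arguments), nesting stops being a problem, because it is enough to find identifiers that are directly followed by an opening parenthesis. A rough sketch in plain JavaScript follows; highlightCalls is a made-up name, and this will also match definitions like def foo( and anything inside strings or comments, so treat it as a starting point only.

```javascript
// Wrap every identifier that is immediately followed by "(" in a span.
// `highlightCalls` is an illustrative name, not part of prism.js.
function highlightCalls(code) {
  // \b            start at a word boundary so names starting with a digit are skipped
  // [A-Za-z_]\w*  a Python-style identifier
  // (?=\s*\()     lookahead: an opening parenthesis follows (it is not consumed)
  return code.replace(
    /\b([A-Za-z_]\w*)(?=\s*\()/g,
    '<span class="called-function">$1</span>'
  );
}

console.log(highlightCalls("flatten(list(map(lambda x: x*2,[1,2,3])))"));
```

Because the lookahead does not consume the parenthesis or the arguments, inner calls such as list( and map( are found on the same pass. The result would need to be assigned through innerHTML (with the rest of the code HTML-escaped first) for the spans to take effect.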

UTF-8 symbol is converted when inserted to dom

I have the following problem: I am building an app that uses a data stream from AJAX calls, so the incoming data is escaped inside a JSON string.
example: 1°Set
When i insert that data to DOM it is being converted like this: 1°Set
I don't use any libraries like jQuery, just pure JavaScript.
I also tried to store the converted name in another place, but I cannot seem to convert it manually. I tried the following functions:
var test = function(str) {
console.log(unescape(encodeURIComponent(str)) );
console.log(decodeURIComponent(escape(str)) );
};
test('1°Set');
It stays the same. Does anyone have an idea how to convert it to the DOM-like version?
Sounds like you're having a problem because your backend serves a JSON that looks like:
{
"something": "1°Set"
}
Instead of the string "1°Set", you're serving HTML source code (with the degree sign written as an HTML entity) that amounts to "1°Set". This looks unnecessary. I cannot see a good reason for using HTML escaping inside JSON, unless you actually want your JSON to contain a piece of HTML source (with formatting and everything) rather than just a string.
My suggestion: Let's keep it simple and instead serve something like:
{
"something": "1°Set"
}
or equivalently escape it properly using JSON syntax:
{
"something": "1\u00b0Set"
}
Now your JavaScript will receive a plain string that can be easily displayed, for example inside element.textContent or element.value or anywhere else. You won't even need any conversions.
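As a quick check of the second variant, here is a small sketch showing that JSON.parse already turns the \u00b0 escape into the degree sign. The payload string is a made-up stand-in for whatever the backend would send.

```javascript
// Hypothetical payload as it might arrive over the wire: the degree sign is
// written with a JSON \u escape rather than an HTML entity.
const payload = '{"something": "1\\u00b0Set"}';

// JSON.parse decodes the \u escape, so the result is a plain 5-character string.
const data = JSON.parse(payload);
console.log(data.something, data.something.length);

// In a browser the string could then be shown as-is, for example:
// element.textContent = data.something;
```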

"Fixing" JSON coming out of MySQL

I'm fetching JSON code stored in MySQL and it has extra slashes, which I have to remove in order to parse it in JavaScript, after I print it on the page. Right now I'm doing the following:
$save = str_replace("\n", "<br>", $save); // Replace new line characters with <br>
$save = str_replace('\\"', '"', $save); // top-level JSON
$save = str_replace('\\\\"', '\"', $save); // HTML inside top level JSON
$save = str_replace('\\\\\\\\\\"', '\\\\\"', $save); // HTML inside second level JSON
Here is an example JSON code, as it comes out from MySQL:
{\"id\":2335,\"editor\":{\"selected_shape\":\"spot-7488\"},\"general\":{\"name\":\"HTML Test\",\"shortcode\":\"html-test\",\"width\":1280,\"height\":776},\"spots\":[{\"id\":\"spot-7488\",\"x\":9.9,\"y\":22.6,\"default_style\":{\"use_icon\":1},\"tooltip_content\":{\"content_type\":\"content-builder\",\"plain_text\":\"<p class=\\\"test\\\">Test</p>\",\"squares_json\":\"{\\\"containers\\\":[{\\\"id\\\":\\\"sq-container-293021\\\",\\\"settings\\\":{\\\"elements\\\":[{\\\"settings\\\":{\\\"name\\\":\\\"Paragraph\\\",\\\"iconClass\\\":\\\"fa fa-paragraph\\\"},\\\"options\\\":{\\\"text\\\":{\\\"text\\\":\\\"<p class=\\\\\\\"test\\\\\\\">Test</p>\\\"}}}]}}]}\"}}]}
And here is how it's supposed to look in order to get parsed correctly (using jsonlint.com to test):
{"id":2335,"editor":{"selected_shape":"spot-7488"},"general":{"name":"HTML Test","shortcode":"html-test","width":1280,"height":776},"spots":[{"id":"spot-7488","x":9.9,"y":22.6,"default_style":{"use_icon":1},"tooltip_content":{"content_type":"content-builder","plain_text":"<p class=\"test\">Test</p>","squares_json":"{\"containers\":[{\"id\":\"sq-container-293021\",\"settings\":{\"elements\":[{\"settings\":{\"name\":\"Paragraph\",\"iconClass\":\"fa fa-paragraph\"},\"options\":{\"text\":{\"text\":\"<p class=\\\"test\\\">Test</p>\"}}}]}}]}"}}]}
Please note that I have HTML code inside JSON, which is inside another JSON and this is where it gets a bit messy.
My question: is there a function or library for PHP (JS would work too) that covers all those corner cases? I'm sure someone will find a way to break the script.
Thanks!
The short answer, which is woefully inadequate, is for you to use stripslashes. The reason this answer is not adequate is that your JSON string might have been escaped or had addslashes called on it multiple times and you would have to call stripslashes precisely once for each time this had happened.
The proper solution is to find out where the slashes are being added and either a) avoid adding the slashes or b) understand why the slashes are there and respond accordingly. I strongly believe that the process that creates that broken JSON is where the problem lies.
Slashes are typically added in PHP in a few cases:
magic_quotes are turned on. This is an old PHP feature which has been removed. The basic idea is that PHP used to auto-escape quotes in incoming requests to let you just cram incoming strings into a db. Guess what? NOT SAFE.
addslashes has been called. Why call this? Some folks use it as an incorrect means of escaping data before sticking it in a db. Others use it to keep HTML from breaking when echoing variables out (htmlspecialchars should probably be used instead). It can also come in handy in a variety of other meta situations when you are defining code in a string.
When escaping data input. The most common escaping function is mysqli_real_escape_string. It's very important to escape values before inserting them in a db to prevent sql injection and other exploits but you should never escape things twice.
So there's a possibility that your code is double-escaping things or that addslashes is getting called or something like magic_quotes is causing the problem, but I suspect it is another problem: some JS code might be supplying this JSON not as a proper JSON string, but one that has been escaped so to define a string within javascript.
If you take your example JSON string above, and slap some quotes around it:
var myJSON = "<put your string here>";
then SURPRISE, your JavaScript is not broken, and the var myJSON contains a string that is actually valid JSON and can be parsed into a valid object:
var myJSON = "{\"id\":2335,\"editor\":{\"selected_shape\":\"spot-7488\"},\"general\":{\"name\":\"HTML Test\",\"shortcode\":\"html-test\",\"width\":1280,\"height\":776},\"spots\":[{\"id\":\"spot-7488\",\"x\":9.9,\"y\":22.6,\"default_style\":{\"use_icon\":1},\"tooltip_content\":{\"content_type\":\"content-builder\",\"plain_text\":\"<p class=\\\"test\\\">Test</p>\",\"squares_json\":\"{\\\"containers\\\":[{\\\"id\\\":\\\"sq-container-293021\\\",\\\"settings\\\":{\\\"elements\\\":[{\\\"settings\\\":{\\\"name\\\":\\\"Paragraph\\\",\\\"iconClass\\\":\\\"fa fa-paragraph\\\"},\\\"options\\\":{\\\"text\\\":{\\\"text\\\":\\\"<p class=\\\\\\\"test\\\\\\\">Test</p>\\\"}}}]}}]}\"}}]}";
console.log(JSON.parse(myJSON)); // this is an actual object
The key here is to examine the point of entry where this JSON arrives in your system. I suspect some AJAX request has created some object and, rather than sending valid JSON of that object, it is sending an escaped string of a JSON object instead.
EDIT:
Here's a simple example of what happens when you have too many encodings. Try running this JS in your browser and observe the console output:
var myObj = {"key":"here is my value"};
console.log(myObj); // the object itself
var myJSON = JSON.stringify(myObj);
console.log(myJSON); // {"key":"here is my value"}
var doubleEncoded = JSON.stringify(myJSON);
console.log(doubleEncoded); // "{\"key\":\"here is my value\"}"
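Going the other way, if the stored text really has exactly one extra layer of escaping like the sample in the question, that layer can be undone by treating the text as the body of a JSON string literal. This is only a sketch: decodeEscapedJson is a made-up helper, the sample is a shortened stand-in for the real data, and it will throw if the text was escaped some other way (or contains raw newlines). Fixing the place where the extra escaping happens is still the better option.

```javascript
// `raw` mimics text that was escaped one time too many (shortened, hypothetical sample).
const raw = String.raw`{\"id\":2335,\"plain_text\":\"<p class=\\\"test\\\">Test</p>\"}`;

// Hypothetical helper: wrap the text in quotes so it becomes a JSON string
// literal, decode that literal, then parse the decoded text as JSON.
function decodeEscapedJson(text) {
  const inner = JSON.parse('"' + text + '"'); // undo one layer of escaping
  return JSON.parse(inner);                   // parse the actual JSON
}

const obj = decodeEscapedJson(raw);
console.log(obj.id, obj.plain_text);
```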

Converting JSON to have a new top-level wrapper

I suspect I am using the wrong terminology, so please bear with me.
I have been using mustache.js very effectively to work with the JSON output of an API. I am working with a sibling API whose output looks mostly similar, except that there is no top-level name. I am trying to figure out how to adjust the new JSON output into the other format so that I can continue using my mustache.js approach (this seems like the easier option; the other is using something other than mustache.js, which I am seeking to avoid).
I can use mustache.js to iterate very easily over the following JSON body using the top-level name 'records':
{"records":
[{"dt_created":"08/23/2013 04:49:13 PM","created_by":"x, x","dt_updated":"08/23/2013 04:49:13 PM","updated_by":"x, x","field_60374": ["Project 67"],"field_60331":["Ability to adjust the number of lines dynamically based on the mood of the person 3 rows down, 4th cube, 11th building. A breakthrough in mind-control data processing."],"field_60333": [{"id":"x","first":"x","last":"x"}],"field_60394": [{"id":"x","first":"x","last":"x"}],"field_60414":["11"],"field_60375": ["22"],"field_60395":["A"],"record_id":"1920704","form_id":"10898"},
{"dt_created":"08/23/2013 04:47:45 PM","created_by":"x, x","dt_updated":"08/23/2013 04:47:45 PM","updated_by":"x, x","field_60374":["Project 2"],"field_60331":["Very cool project to allow more than a single invoice to be in an ERP at any one time. Quite a big leap forward."],"field_60333": [{"id":"x","first":"x","last":"x"}],"field_60394": [{"id":"x","first":"x","last":"x"}],"field_60414":["x, x"],"field_60375":["60"],"field_60395": ["A"],"record_id":"1920703","form_id":"10898"}],
"meta":{"total":2,"count":2}
}
The format that I can't iterate through, since it is missing a top-level name, is:
[{"id":"x","user": {"id":"x","first_name":"x","last_name":"x"},"title":"Test Post 3","text":"This is an equally cool and enthralling post.","created_at":"2013-08-29T17:46:04.801Z","updated_at":"2013-08-29T17:46:04.804Z","num_comments":0,"num_likes":0},
{"id":"y","user": {"id":"212342277","first_name":"x","last_name":"x"},"title":"Test Post 1","text":"Super cool content you want to read!","created_at":"2013-08-29T17:44:58.188Z","updated_at":"2013-08-29T17:44:58.190Z","num_comments":0,"num_likes":0}]
I tried the following to massage the second JSON example into the format of the first:
$.ajax({
  […….],
  success: function(json_data){
    alert("success");
    var template=$('#listPosts').html();
    var stuff = {"records":json_data}; //here
    var stuff = JSON.stringify(stuff); //here
    alert(stuff);
    //var html = Mustache.to_html(template, json_data);
    var html = Mustache.to_html(template, stuff);
    //$('.content-pane').empty();
    $('#listOfPostsContainer').html(html);
  }
});
I have an alert that pops up with what is in var 'stuff', and it seems to be formatted like the first JSON example, but mustache.js doesn't parse it (I have verified mustache.js executes). Before trying to address any other issues, I wanted to understand whether the way I am adding a top-level name to the JSON array in the success callback above is correct, or whether I have to do it differently.
If I am missing something to help explain this, let me know, so I can add it.
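One thing that may be worth checking here: a Mustache view is normally an object, while JSON.stringify produces a string, and a string has no records property to iterate over. A minimal sketch with made-up sample data follows; the render call is left as a comment since it assumes mustache.js is loaded.

```javascript
// Hypothetical payload shaped like the second API's response (already parsed).
const jsonData = [
  { id: "x", title: "Test Post 3" },
  { id: "y", title: "Test Post 1" }
];

// Wrap the array under a top-level name so a {{#records}} section can iterate it.
const view = { records: jsonData };

// Stringifying the wrapper turns it into text; the `records` property is gone.
const asString = JSON.stringify(view);

console.log(typeof view, Array.isArray(view.records));
console.log(typeof asString, asString.records);

// With mustache.js loaded, the object (not the string) would be passed, e.g.:
// const html = Mustache.render(template, view);
```

So the wrapping step itself looks reasonable; under this reading, it is the extra stringify that gets in the way.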

How to parse JavaScript using Nokogiri and Ruby

I need to parse an array out of a website. The part of the JavaScript I want to parse looks like this:
_arPic[0] = "http://example.org/image1.jpg";
_arPic[1] = "http://example.org/image2.jpg";
_arPic[2] = "http://example.org/image3.jpg";
_arPic[3] = "http://example.org/image4.jpg";
_arPic[4] = "http://example.org/image5.jpg";
_arPic[5] = "http://example.org/image6.jpg";
I get the whole JavaScript using something like this:
product_page = Nokogiri::HTML(open(full_url))
product_page.css("div#main_column script")[0]
Is there an easy way to parse all the variables?
If I read you correctly, you're trying to parse the JavaScript and get a Ruby array with your image URLs, yes?
Nokogiri only parses HTML/XML, so you're going to need a different library. A cursory search turns up the RKelly library, which has a parse function that takes a JavaScript string and returns a parse tree.
Once you have a parse tree you're going to need to traverse it and find the nodes of interest by name (e.g. _arPic) then get the string content on the other side of the assignment.
Alternatively, if it doesn't have to be too robust (and it wouldn't be), you can just use a regex to search the JavaScript:
/^\s*_arPic\[\d+\] = "(.+)";$/
might be a good starter regex.
The easy way:
_arPic = URI.extract product_page.css("div#main_column script")[0].text
which can be shortened to:
_arPic = URI.extract product_page.at("div#main_column script").text
