Scrapy: Converting a JavaScript array to JSON in Python

I have been struggling with a site I am scraping using Scrapy.
This site returns a series of JavaScript variables (arrays) containing the product data.
Example:
datos[0] = ["12345","3M YELLOW CAT5E CABLE","6.81","1","A","N","N","N","N","N",0,0,0,0,0,"0","0","0","0","0","P","001-0030","12","40K8957","28396","250","Due: 30-12-1899",0.0000,1,"",''];
datos[1] = ["12346","3M GREEN CAT5E CABLE","7.81","1","A","N","N","N","N","N",0,0,0,0,0,"0","0","0","0","0","P","001-0030","12","40K8957","28396","250","Due: 30-12-1899",0.0000,1,"",''];
...
And so on.
Getting the array as a string with Scrapy was easy, since the site's response prints the variables.
The problem is that I want to transform it into JSON so I can process it and store it in a database table.
Normally I would use JavaScript's JSON.stringify to convert it to JSON and post it to PHP.
However, with Python's json.loads (and even StringIO) I am unable to load the array as JSON.
It is probably a format error, but I can't identify it, since I am not an expert in JSON or Python.
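To pin down the format errors, here is a small Python check on a shortened copy of one of the lines above (the helper name try_json is made up for illustration). It suggests two separate problems: the datos[0] = ...; assignment wrapper is JavaScript, not JSON, and the trailing '' is a single-quoted string, which JSON does not allow. The numeric 0.0000 is fine.

```python
import json

# One line from the question, shortened for readability
line = 'datos[0] = ["12345","3M YELLOW CAT5E CABLE","6.81",0.0000,1,"",\'\'];'


def try_json(text):
    """Return the parsed value, or an "error: ..." string if json.loads rejects it."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        return "error: " + exc.msg


# 1. The JavaScript assignment "datos[0] = ...;" is not part of JSON.
whole_line = try_json(line)

# 2. Even the bare array fails: JSON strings must use double quotes, so '' is invalid.
array_text = line.split("=", 1)[1].strip().rstrip(";")
bare_array = try_json(array_text)

# 3. With the single-quoted empty string rewritten as "", the array parses.
#    (A blind replace like this is only safe for this sample, not for arbitrary data.)
fixed = try_json(array_text.replace("''", '""'))
```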
EDIT:
I just realized that since Scrapy can't execute JavaScript, the main issue is probably that the data is just a string. I need to convert it into JSON format.
Any help is more than welcome.
Thank you.

If you want to take an array and create a JSON object, you could do something like this:
import json

values = ["12345","3M YELLOW CAT5E CABLE","6.81","1","A","N","N","N","N","N",0,0,0,0,0,"0","0","0","0","0","P","001-0030","12","40K8957","28396","250","Due: 30-12-1899",0.0000,1]
keys = [x for x in range(len(values))]  # the array indices 0, 1, 2, ...
d = dict(zip(keys, values))
x = json.dumps(d)
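One detail worth knowing about this approach, shown on a shortened list: JSON object keys are always strings, so the integer indices come back as "0", "1", and so on after a round trip. If an index-keyed object isn't actually needed, the list can be dumped directly as a JSON array.

```python
import json

values = ["12345", "3M YELLOW CAT5E CABLE", "6.81", 0.0000, 1]

# dict(enumerate(...)) builds the same index-keyed dict as zip(range(...), values)
d = dict(enumerate(values))
as_object = json.dumps(d)
# JSON object keys are always strings: the integer index 0 comes back as "0"
round_trip = json.loads(as_object)

# If an index-keyed object isn't required, a list serializes straight to a JSON array
as_array = json.dumps(values)
```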

The Scrapy docs have a section describing various ways to parse JavaScript code. In your case, if you just need the data as an array, you can use a regular expression to extract it.
Since the question doesn't name the website you are scraping, I am assuming this is the most straightforward way to get it, but you can use whichever approach seems suitable.
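A minimal sketch of the regex route, assuming each assignment sits on its own line in the form datos[N] = [...]; as in the question's sample. The names DATOS_RE and extract_datos are made up for illustration. Rather than hand-fixing the quotes, it swaps in ast.literal_eval from the Python standard library: this particular array also happens to be a valid Python list literal, and literal_eval parses literals without executing code. That assumption breaks if the data ever contains JavaScript-only tokens such as true, false or null.

```python
import ast
import json
import re

# Assumption: every assignment sits on its own line as  datos[N] = [ ... ];
DATOS_RE = re.compile(
    r"^[ \t]*datos\[(\d+)\][ \t]*=[ \t]*(\[.*\])[ \t]*;[ \t]*$", re.MULTILINE
)


def extract_datos(text):
    """Map each index N to the Python list parsed from its datos[N] = [...]; line."""
    rows = {}
    for match in DATOS_RE.finditer(text):
        # The array is also a valid Python list literal (double- and single-quoted
        # strings, ints, floats), so ast.literal_eval reads it without running code.
        # It would fail on JavaScript-only tokens such as true, false or null.
        rows[int(match.group(1))] = ast.literal_eval(match.group(2))
    return rows


sample = """
datos[0] = ["12345","3M YELLOW CAT5E CABLE","6.81","1","A",0,0.0000,1,"",''];
datos[1] = ["12346","3M GREEN CAT5E CABLE","7.81","1","A",0,0.0000,1,"",''];
"""

rows = extract_datos(sample)
as_json = json.dumps(rows[0])  # a proper JSON array, ready for storage
```

In a Scrapy callback, the text passed in would typically be response.text; each resulting list can then be serialized with json.dumps or written to the database directly.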

Related

In Node-RED, how can I separate values from JSON data sent by the Watson IoT Platform?

How can I get the distance value and assign it to another variable?
I receive this data from Bluemix (Watson IoT Platform) in Node-RED:
{distance:"45.9"};
I tried:
var data=msg.distance;
Use JSON.parse to convert the string into an object; you can then access its properties.
If you include the JSON node, it will convert your JS object to JSON. But maybe you have mixed up the quotes and actually have JSON already; by default you would.
I would add a Debug node to your IoT In node and check exactly what you receive. After that it is usually easy to parse, for example (depending on what you send):
var distance = msg.payload.d.distance
var distance = msg.payload.distance
You might want to edit your Question to include exactly what you receive in the Debug node and need to parse.
Also be aware that your value for distance is a string; you'll probably want to convert it to a number at some point. If it is in your control, it would be better practice to send it as a number to start with.

JSON parsing a Titanium.App.Properties string

In an app made with TideSDK, I assign a global variable (shocking, I know) to the result of JSON-parsing a string stored in Titanium.App.Properties:
var workbookArray = JSON.parse(Titanium.App.Properties.getString('workbookArray'));
workbookArray is an array of objects.
Then, when a page unloads, I set the Titanium.App.Properties string to the value of workbookArray, which may have been changed by whoever is using the app:
Titanium.App.Properties.setString('workbookArray', JSON.stringify(workbookArray));
Each time I open the app, however, I'm told that JSON was unable to parse the first code snippet (initializing workbookArray).
Aside from this issue, I don't expect to use the app Properties API for my storage needs in the long term; I wish I could use IndexedDB with Titanium. SQL is an option, but it is a little messy when it comes to objects. Any other suggestions for a database solution?
Try getList and setList
http://docs.appcelerator.com/titanium/latest/#!/api/Titanium.App.Properties
What is stored in the list?

How to convert a JSON string to a JSON string with a different structure

I am building an application where data is retrieved from a third-party system as a JSON string. I need to convert this JSON string to another JSON string with a different structure so that it can be used with pre-existing functions defined in an internal JavaScript library.
Ideally I want to be able to perform this conversion on the client machine using Javascript.
I have looked at JSONT as a means of achieving this but that project does not appear to be actively maintained:
http://goessner.net/articles/jsont/
Is there a de facto way of achieving this? Or do I have to roll my own mapping code?
You shouldn't be passing JSON into an internal JavaScript library. You should parse the JSON into a JS object, then iterate over it, transforming it into the new format.
Example
var json = '[{"a": 1, "b": 2}, {"a": 4, "b": 5}]';
var jsObj = JSON.parse(json);
// Transform property a into aa and property b into bb
var transformed = jsObj.map(function(obj){
    return {
        aa: obj.a,
        bb: obj.b
    };
});
// transformed = [{aa:1, bb:2},{aa:4, bb:5}]
If you really want JSON you'd just call JSON.stringify(transformed)
https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/Array/map
Here's another answer with an even more complicated transformation How to make a jquery Datatable array out of standard json?
From what I can tell from the home page, the JSONT project is about transforming JSON into entirely different formats anyway (i.e. JSON => HTML).
It's going to be a lot simpler to write your own mapping code, possibly just as a from_json() method on the object you're creating (so YourSpecialObject.from_json(input); returns an instance of that object generated from the JSON data).
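The from_json() idea, sketched in Python for illustration only (the question itself targets client-side JavaScript, where the same shape applies). The class Target and its fields aa and bb are hypothetical, mirroring the a to aa / b to bb mapping in the previous answer; the point is that each target type owns the knowledge of how to build itself from the parsed source data.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Target:
    """Hypothetical object in the shape the internal library expects."""
    aa: int
    bb: int

    @classmethod
    def from_json(cls, obj):
        # Map one parsed source record (keys a, b) onto the new field names.
        return cls(aa=obj["a"], bb=obj["b"])


source = '[{"a": 1, "b": 2}, {"a": 4, "b": 5}]'
items = [Target.from_json(record) for record in json.loads(source)]
# Only if a JSON string is really needed downstream:
output = json.dumps([asdict(item) for item in items])
```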
From your question, I'm not sure if this fits your use case, but hopefully someone else will have a better answer soon.
Another option is using XSLT. As there are SAX readers and writers for JSON, you can happily use XSLT with JSON; no horrific conversion from JSON to XML and back needs to happen. See: http://www.gerixsoft.com/blog/json/xslt4json
I can definitely see the irony in using an XML-based language to transform JSON, but it seems like a good option.
Otherwise you're probably best off writing your own mapping code.

How should I save the response data in an object (JSON) for better manipulation and performance?

What I am doing is:
1. Get values from an AJAX response (in JSON format) for listing rows of data:
response = {"categories":[{"name":"General","id":"6305","pop":"show when clicked"},{"name":"Navigation","id":"6043","pop":"show when clicked"},{"name":"New","id":"6051","pop":"show when clicked"},{"name":"Time","id":"6117","pop":"show when clicked"},{"name":"Reesh","id":"6207","pop":"show when clicked"}]}
2. Parse the JSON and store it in an object like this:
ex:
object= {6305:{"name":"General","id":"6305","pop":"show when clicked"},
6043:{"name":"Navigation","id":"6043","pop":"show when clicked"},
6051:{"name":"New","id":"6051","pop":"show when clicked"},
6117:{"name":"Time","id":"6117","pop":"show when clicked"},
6207:{"name":"Reesh","id":"6207","pop":"show when clicked"}};
I do this because I can then get the data by id,
ex: object[6305] will give me the data.
3. So I can retrieve the data, and also change values in the object by id when changes occur in the DB.
ex: object[6305].pop="changed";
Please tell me:
--> Is this the correct method, or can I do it in a simpler or more efficient way?
--> Can I store the JSON response as it is and read data from it directly? If so, please explain with an example.
Yes, of course you would not need to build the object:
function getObject(id) {
    for (var i=0; i<response.categories.length; i++)
        if (response.categories[i].id == id)
            return response.categories[i];
    return null;
}
However, if you often need to access objects by their ids this function would be slow. Creating the lookup table as you did will not create much memory overhead, but make retrieving data much faster.
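The same trade-off, sketched in Python for illustration on a response trimmed to two categories: a linear scan like getObject versus a lookup table built once and keyed by id. One difference from JavaScript is worth noting: a JS object turns the key 6305 into the string "6305" automatically, while a Python dict does not, so the table must be keyed by the same string ids the response contains.

```python
import json

response = json.loads(
    '{"categories":[{"name":"General","id":"6305","pop":"show when clicked"},'
    '{"name":"Navigation","id":"6043","pop":"show when clicked"}]}'
)


def get_object(category_id):
    """Linear scan, like getObject above: O(n) per lookup."""
    for category in response["categories"]:
        if category["id"] == category_id:
            return category
    return None


# Lookup table built once: O(1) average per lookup afterwards.
# The ids are strings in the response, so the dict is keyed by strings too;
# unlike a JavaScript object, a Python dict would not match the integer 6305.
by_id = {category["id"]: category for category in response["categories"]}

# Both views share the same inner dicts, so an update is visible through either.
by_id["6305"]["pop"] = "changed"
```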
BTW: Your title question "save data as object or json" is confusing. Serializing manipulated objects back to JSON makes no sense, as you always will use the parsed objects. Of course, if you just needed to manipulate a JSON string, and knew exactly what to do, (simple) string manipulation could be faster than parsing, manipulating and stringifying.

Convert Google results object (pure js) to Python object

So I'm trying to use the Google Maps suggest API to request place name suggestions. Unfortunately I can't find the docs for this bit.
Here is an example URI:
http://maps.google.com/maps/suggest?q=lon&cp=3&ll=55.0,-3.5&spn=11.9,1.2&hl=en&gl=uk&v=2
which returns:
{suggestion:[{query:"London",...
I want to use this in Python (2.5). In proper JSON there would have been quotation marks around the keys, like so:
{"suggestion":[{"query":"London",...
and I could have used simplejson or something, but as it is I'm a bit stuck.
There are two possible solutions here; either I can get to the API code and find an option to return proper JSON, or I do that in python.
Any ideas please.
Ugh, that's indeed pretty annoying. It's a JavaScript literal but it — pointlessly — isn't JSON.
In theory you are supposed to be able to import json.decoder.JSONDecoder from the Python stdlib (or simplejson pre-2.6, which is the same) and subclass it, then pass that subclass to json.loads to override decoder behaviour. In reality this isn't really feasible as json.decoder is full of global cross-references that resist subclassing, and the bit you need to change is slap bang in the middle of def JSONObject.
So it's probably worth looking at other Python JSON libraries. I found this one which, in ‘non-strict’ mode, will parse unquoted object property names:
>>> import demjson
>>> demjson.decode('{suggestion:[{query:"London",interpretation: ...')
{u'suggestion': [{u'query': u'London', u'operation': 2, u'interpretation': ...
I would try to poke around in order to get JSON, but failing that there's this little monstrosity which someone will inevitably yell at me about:
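class Iden(object):
    # eval() looks every bare name up in this mapping; returning the name
    # itself turns an unquoted key like suggestion into the string "suggestion"
    def __getitem__(self, name):
        return name

notjson = '{...}'
data = eval(notjson, {}, Iden())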
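A runnable sketch of the same Iden trick for Python 3, with one hypothetical extension: a plain Iden would turn the JavaScript literals true, false and null into the strings "true", "false" and "null", so this version maps them to Python values first. The names JSNames and parse_js_literal and the sample string (including its exact field) are made up for illustration. It is still eval(), so it must never see untrusted input; emptying __builtins__ does not make it safe.

```python
import json

# Hypothetical extension of the Iden idea: also map the JavaScript literals
# true/false/null, which a plain Iden would turn into the strings "true" etc.
JS_CONSTANTS = {"true": True, "false": False, "null": None}


class JSNames(object):
    def __getitem__(self, name):
        # eval() asks this mapping for every bare name; unknown names
        # (the unquoted keys) come back as their own string.
        return JS_CONSTANTS.get(name, name)


def parse_js_literal(text):
    # WARNING: still eval(). Never use this on text you don't trust.
    return eval(text, {"__builtins__": {}}, JSNames())


data = parse_js_literal('{suggestion:[{query:"London",operation:2,exact:true}]}')
proper_json = json.dumps(data)
```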
import demjson
demjson.decode(js_text)  # js_text: the raw, non-JSON response body
I found this when trying to parse Google Finance option "JSON" data, which, as many note, isn't JSON-compliant.
demjson saved me writing an obnoxious regex string; it just works.
