AWS CloudSearch - Getting results of a search in JSON format - javascript

I am performing a search on my AWS CloudSearch domain from a Lambda function in Node.js.
I uploaded a document such as this:
{
  "some_field": "bla bla",
  "some_date_field": 1.466719E9,
  "number_field": 4,
  "some_string": "some long string blabla"
}
And I perform a search like this:
var params = {
  query: 'bla bla',
};
cloudsearchdomain.search(params, function (err, data) {
  if (err) {
    console.log(err, err.stack); // an error occurred
    context.fail(err);
  } else {
    context.succeed(data); // successful response
  }
});
The search works, and as documented here, CloudSearch returns document info in the fields property of a hit. Here is an example:
{
  "status": {
    "timems": 2,
    "rid": "blabla"
  },
  "hits": {
    "found": 1,
    "start": 0,
    "hit": [
      {
        "id": "452545-49B4-45C3-B94F-43524542352-454352435.6666-8532-4099-xxxx-1",
        "fields": {
          "some_field": [
            "bla bla"
          ],
          "some_date_field": [
            "1.466719E9"
          ],
          "number_field": [
            "4"
          ],
          "some_string": [
            "some long string blabla"
          ]
        }
      }
    ]
  }
}
As you can see, all the fields are returned as arrays of strings. Is there any way to get the results as JSON that preserves the types of all the fields?

After submitting a report about this to AWS I received this reply:
Hello, This is actually the intended behavior. The SDK team chose to implement the "fields" property as a dictionary of string keys and string-array values to maintain consistency across the various languages in which the AWS SDK exists. They place the responsibility for handling the various response formats (HTTP request vs. SDK method) on the client. For more details, please see: https://github.com/aws/aws-sdk-js/issues/791
Unfortunately, the only current solutions to the problem I describe above are:
1) Create a parser that parses the results based on your expected response, taking your data types into account.
2) Add a new field to your CloudSearch index (text type) containing a stringified version of your entire JSON object/document. You can then just use JSON.parse() on this to get the document back in JSON format, as sketched below. This solution is not ideal because it adds an unnecessary chunk of text to your document, but it proved a quick solution to my problem above.
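A minimal sketch of solution 2, assuming the stringified copy lives in a hypothetical literal/text field named document_json, and reusing the handler shape from the question:
var params = {
  query: 'bla bla',
};
// cloudsearchdomain and context as in the question's Lambda handler.
cloudsearchdomain.search(params, function (err, data) {
  if (err) {
    context.fail(err);
    return;
  }
  var documents = data.hits.hit.map(function (hit) {
    // Every field comes back as an array of strings, so take element 0
    // of the (hypothetical) document_json field and parse it to recover types.
    return JSON.parse(hit.fields.document_json[0]);
  });
  context.succeed(documents);
});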
I'd love to hear of any more solutions if anyone knows of any.

CloudSearch does preserve the field type; the results imply that you've configured these fields as arrays.
You can confirm this by going to Indexing Options for your domain in the AWS web console. You should see fields of type text-array, literal-array, etc., as in the screenshot below; those will be returned as arrays. You can change them to non-array types if you will only ever submit a single value for each field in each document, and you'll get back non-array values.
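Note that even with non-array types, the JavaScript SDK still hands every value back as a string (see the GitHub issue quoted above), so a small coercion helper along the lines of solution 1 can still be useful. This is only a sketch; the schema map and field names are assumptions you would fill in from your own indexing options:
// Coerce CloudSearch's string-array field values back to typed values
// using a hand-maintained schema map (hypothetical field names).
var schema = {
  some_field: String,
  some_date_field: Number,
  number_field: Number,
  some_string: String,
};

function parseHit(hit) {
  var doc = {};
  Object.keys(hit.fields).forEach(function (name) {
    var values = hit.fields[name].map(schema[name] || String);
    // Unwrap single-value fields; keep genuine array fields as arrays.
    doc[name] = values.length === 1 ? values[0] : values;
  });
  return doc;
}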

Related

Limiting data based on document field in Firestore

I am playing around with Firebase Firestore for my data needs. I have a data model with a collection of apples, where each document represents a unique apple and has a field of type object, which is essentially a map of <string, string>.
Think of each apple document like:
name: "SiberianApple",
weight: "400g"
color: {
today: "green",
yesterday: "red",
tomorrow: "crimson"
}
Reading through the docs, I understood that we can query collections in various ways, but is it possible to limit the amount of information a client needs to fetch when fetching a document? Can I have a query on this field color such that it only returns:
name: "SiberianApple",
weight: "400g",
color: {
today: "green"
}
Basically something like GraphQL, where I can ask for exactly what I want. I wanted to know whether this is possible with a query, or whether this field should be a subcollection of the document so that I can query colors using the path apples/<appleId>/colors.
Cloud Firestore listeners fire on the document level. There is no way to get triggered with just particular fields in a document or split the document to get only one property. It's the entire document or nothing. So the Firestore client-side SDKs always return complete documents. Unfortunately, there is no way to request only a part of the document.
is it possible to limit the amount of information.
As said, that's not possible. What you can do is create another collection with documents that hold only the fields you need. This practice is called denormalization, and it's quite a common practice when it comes to NoSQL databases.
As Alex explained in his answer, with the Client SDKs it is not possible to get only a subset of the fields of a Document. When you fetch a Document you get it with all its fields.
However, this is possible with the Firestore REST API: With the REST API you can use a DocumentMask when you fetch one document with the get method. The DocumentMask will "restrict a get operation on a document to a subset of its fields".
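For example, a get call restricted to the three fields above might look like this (the project and document IDs are placeholders, and mask.fieldPaths is repeated once per field):
GET https://firestore.googleapis.com/v1/projects/<project-id>/databases/(default)/documents/apples/<appleId>?mask.fieldPaths=name&mask.fieldPaths=weight&mask.fieldPaths=color.today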
For querying several documents, it is a bit different: you use the runQuery method. The request body shall contain a
JSON object which has a structuredQuery property.
In turn, the StructuredQuery has a select property which contains a Projection object.
Here is an example of a payload used when calling the runQuery method (i.e. https://firestore.googleapis.com/v1/{parent=projects/*/databases/*/documents}:runQuery):
const payloadObj = {
  structuredQuery: {
    where: {
      // ....
    },
    orderBy: [
      {
        field: {
          fieldPath: 'name',
        },
        direction: 'ASCENDING',
      },
    ],
    from: [
      {
        collectionId: 'apples',
      },
    ],
    select: {
      fields: [
        {
          fieldPath: 'name',
        },
        {
          fieldPath: 'weight',
        },
        {
          fieldPath: 'color.today',
        },
      ],
    },
    limit: 1000,
  },
};
As you can see in the code above, you can target only one of the properties of the color map, with fieldPath: 'color.today',
End note: the REST API is less user-friendly than the client SDKs, because you need to build the payload passed to the request and parse the responses, but it is not difficult to use. In a web app, use fetch, axios or gaxios to execute the calls.
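As a rough sketch, posting the payload above with fetch could look like this (the project ID is a placeholder, and the bearer token is assumed to come from elsewhere, e.g. a service account or Firebase Auth):
// Minimal sketch: POST the structuredQuery payload to the runQuery endpoint.
const url = 'https://firestore.googleapis.com/v1/projects/<project-id>/databases/(default)/documents:runQuery';

async function runQuery(payloadObj, token) {
  const response = await fetch(url, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: 'Bearer ' + token,
    },
    body: JSON.stringify(payloadObj),
  });
  // runQuery returns an array of objects, each wrapping a matching document.
  return response.json();
}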

Logic App bypass Null in filter query

I am building a Logic App that uses the Azure Resource connector to obtain a list of my resources. I would then like to filter the results to Microsoft.Compute resources that have a tag name and value of stop and normal.
Here is a snippet of resource that I receive back without any filters
{
  "id": "/subscriptions/<subscription>/resourceGroups/Env1/providers/Microsoft.Compute/virtualMachines/MyVM1",
  "name": "MyVM1",
  "type": "Microsoft.Compute/virtualMachines",
  "location": "westeurope",
  "tags": {
    "stop": "normal"
  }
},
{
  "id": "/subscriptions/<subscription>/resourceGroups/LogicApp/providers/Microsoft.Logic/workflows/DWSize-Check",
  "name": "DWSize-Check",
  "type": "Microsoft.Logic/workflows",
  "location": "westeurope",
  "tags": {}
}
As you can see, the bottom resource does not contain any tags, as with many others that will appear in the list.
I use the standard Filter Array action to try to filter the values I receive back.
Here is the code that I wish to use for the filter command:
#and(contains(item()?['id'], '/Microsoft.Compute/virtualMachines/'), contains(item()?['tags'], variables('TagName')), contains(item()?['tags'], variables('TagValue')))
variables('TagName') and variables('TagValue') will be stop and normal, as shown in the example tags listed in my results snippet.
However, because there are no tag values listed on other resources, such as Microsoft.Logic/workflows, I receive the following null error:
InvalidTemplate. The execution of template action 'Filter_array'
failed: The evaluation of 'query' action 'where' expression
'#contains(item()?['tags'], variables('TagValue'))' failed: 'The
template language function 'contains' expects its first argument
'collection' to be a dictionary (object), an array or a string. The
provided value is of type 'Null'.'.
Would anyone know how to get around this?
I have tried similar queries to this, #contains(item().tags?.stop, variables('TagValue')), just to see if it picks up anything, but I'm still blocked by a null response :(
I tried the above with the help of the Workflow Definition Language, but still no dice. Any help would be greatly appreciated.
EDIT
In addition to the answer posted by Thomas, I have performed the following (image below) to filter out null from the results and get to the TagName, but I am still unable to get to the TagValue, even if I use contains:
#and(contains(item()?['tags'], variables('TagName')), contains(item()?['tags'], variables('TagValue')))
or even just trying to look for TagValue:
#contains(item()?['tags'], variables('TagValue'))
You can check for null and return an empty value (an empty array in your case).
You can replace item()?['tags'] with this expression or create a variable :
if(equals(item()?['tags'], null), [], item()?['tags'])
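Building on that guard, a sketch of the whole filter might look like this (untested, and written with the @ prefix that Logic Apps expressions normally use in code view):
@and(contains(item()?['id'], '/Microsoft.Compute/virtualMachines/'), equals(item()?['tags']?[variables('TagName')], variables('TagValue')))
The null-safe ?[] access returns null when tags is missing or the tag name is absent, so the equals comparison simply evaluates to false instead of throwing. It also sidesteps the TagValue problem from the edit above: contains on an object inspects its keys, which is why matching TagValue against tags never worked.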

Detect when no previous posts available in Facebook Graph posts edge?

I'm accessing the Facebook Graph API for posts and am trying to figure out the pagination handling. I understand the use of the paging.next and paging.previous properties of the results, but I'd like to know when there are actually previous results. In particular, when I make the first 'posts' call, I get back a paging.previous URL even though there are no previous values. Upon calling that URL I get a response with no results.
For example, calling "168073773388372/posts?limit=2" returns the following:
{
  "data": [
    {
      "story": "Verticalmotion test added a new photo.",
      "created_time": "2015-12-02T17:04:56+0000",
      "id": "168073773388372_442952469233833"
    },
    {
      "message": "http://www.youtube.com/watch?v=QD2Rdeo8vuE",
      "created_time": "2013-12-16T23:19:30+0000",
      "id": "168073773388372_184840215045061"
    }
  ],
  "paging": {
    "previous": "https://graph.facebook.com/v2.6/168073773388372/posts?limit=2&format=json&since=1449075896&access_token=****&__paging_token=enc_AdA69SApv4VoBZB0PPZA7W5EivCYQal8KMFmRNkyhr8ZBk4w0YmFEQUJWV3JZBS70ihyMpbqieQaERhY50enqNCMBuIZATadeopYj8xPvQL7Y8KueaQZDZD&__previous=1",
    "next": "https://graph.facebook.com/v2.6/168073773388372/posts?limit=2&format=json&access_token=****&until=1387235970&__paging_token=enc_AdAVMaUlPmpxjBmq5ZClVdNpFp7f9MyMFWjE7ygqsMLW7zvSx3eGHLkfwDxdCx0uO3ooAZCKDmCwMWHZA9RNyxkYUPJyjMtO3kynKm5uF2PhoPZB2gZDZD"
  }
}
How can I tell if it's the first set of results?
From tidbits scattered around the documentation and web, it seems like the previous URL shouldn't be there.
I don't think it matters, because I get the same results in the Graph Explorer, but I'm using OpenFB to access the API.
You can set the order to be reverse chronological and then get the first result:
https://developers.facebook.com/docs/graph-api/using-graph-api
Ordering
You can order certain data sets chronologically. For example you may sort a photo's comments in reverse chronological order using the key reverse_chronological:
GET graph.facebook.com
/{photo-id}?
fields=comments.order(reverse_chronological)
order must be one of the following values:
chronological
reverse_chronological
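One way to apply that idea (a hedged sketch, not part of the original answer): fetch a single post once to learn the id of the newest one, then treat any page whose first item carries that id as the first page and ignore its paging.previous URL. The page id and token are placeholders, and fetch is assumed to be available:
const base = 'https://graph.facebook.com/v2.6';

async function getNewestPostId(pageId, token) {
  // Posts come back newest-first by default, so limit=1 yields the newest.
  const res = await fetch(base + '/' + pageId + '/posts?limit=1&access_token=' + token);
  const body = await res.json();
  return body.data.length ? body.data[0].id : null;
}

function isFirstPage(page, newestPostId) {
  // If this page starts with the newest post, there is nothing previous,
  // even though the API still returns a paging.previous URL.
  return page.data.length > 0 && page.data[0].id === newestPostId;
}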

How to call the REC Registry API and store returned JSONs into some kind of database

I'd like to break this into smaller, tighter questions, but I don't yet know enough to do that, so hopefully I can get specific answers that help me do so.
The scope of the solution requires receiving and parsing a lot of records: 2013 had ~17 million certificate transactions, while I'm only interested in a very small subset of around 40,000 records.
In pseudo code:
iterate dates(thisDate)
  send message to API for thisDate
  receive JSONs as todaysRecords
  examine todaysRecords to look for whatever criteria match inside the structure
  append a subset of todaysRecords to recordsOut
save recordsOut to a SQL/CSV file.
There's a large database of Renewable Energy Certificates for use under the Australian Government RET Scheme called the REC Registry, and alongside the web interface linked to here, there is an API provided that has a simple call logic as follows:
http://rec-registry.gov.au/rec-registry/app/api/public-register/certificate-actions?date=<user provided date> where:
The date part of the URL should be provided by the user
Date format should be YYYY-MM-DD (no angle brackets & 1 date limit)
A JSON is returned (with potentially 100,000s of records on each day).
The API documentation (13pp PDF) is here, but it mainly explains the elements of the returned structure, which is less relevant to my question. It includes two sample JSON responses.
While I know some JavaScript (mostly not in a web context), I'm not sure how to send this message within a script, and I figure I'd need to do it server-side to be able to process (filter) the returned information and then save the records I'm interested in. I'll have no issue parsing the JSON (if I can use JS) and copying the objects I wish to save, but I'm not sure where to even start doing this. Do I need a LAMP setup to do this (or MAMP, since I'm on OS X), or is there a more lightweight JS way I can execute this? I've never known how to save a file from within web-browser JS; I thought it was banned for security reasons, but I guess there are ways and means.
If I can rewrite this question to be clearer and more effective in soliciting an answer, I'm happy for edits to the question also.
I guess maybe I'm after some boilerplate code for calling a simple API like this, and the stack or application context in which I need to do it. I realise there are potentially several ways to execute this, but I'm looking for the most straightforward one for someone with JS knowledge and not much PHP/Python experience (but willing to learn what it takes).
Easy, right?
Ok, to point you in the right direction.
Requirements
If the language of choice is JavaScript, you'll need to install Node.js. No server whatsoever needed.
The same is valid for PHP or Python or whatever: no Apache needed, just the language interpreter.
Running a script with node
Create a file.js somewhere. To run it, you'll just need to type node file.js in the console, in the directory the file lives in.
Getting the info from the REC web service
Here's an example of a GET request:
var https = require('https');
var fs = require('fs');

// Request options for the public register endpoint (one date per request).
var options = {
  host: 'rec-registry.gov.au',
  port: 443,
  path: '/rec-registry/app/api/public-register/certificate-actions?date=2015-06-03'
};

var jsonstr = '';
var request = https.get(options, function (response) {
  process.stdout.write("downloading data...");
  // The body arrives in chunks; accumulate them into one string.
  response.on('data', function (chunk) {
    process.stdout.write(".");
    jsonstr += chunk;
  });
  // Once the response is complete, dump the raw JSON to disk.
  response.on('end', function () {
    process.stdout.write("DONE!");
    console.log(' ');
    console.log('Writing to file...');
    fs.writeFile("data.json", jsonstr, function (err) {
      if (err) {
        return console.error('Error saving file');
      }
      console.log('The file was saved!');
    });
  });
});
request.on('error', function (e) {
  console.log('Error downloading file: ' + e.message);
});
Transforming a JSON string into an object/array
Use JSON.parse.
Parsing the data
examine todaysRecords to look for whatever criteria match inside the structure
Can't help you there, but it should be relatively straightforward to look for the correct object properties; a filtering sketch follows the access example below.
NOTE: Basically, what you get from the request is a string. You then parse that string with
var foo = JSON.parse(jsonstr)
In this case foo is an object. The "certificates" are actually inside the property result, which is an array:
var results = foo.result;
In this example the array contains about 1700 records and the structure of a certificate is something like this:
"actionType": "STC created",
"completedTime": "2015-06-02T21:51:26.955Z",
"certificateRanges": [{
"certificateType": "STC",
"registeredPersonNumber": 10894,
"accreditationCode": "PVD2259359",
"generationYear": 2015,
"generationState": "QLD",
"startSerialNumber": 1,
"endSerialNumber": 72,
"fuelSource": "S.G.U. - solar (deemed)",
"ownerAccount": "Solargain PV Pty Ltd",
"ownerAccountId": 25782,
"status": "Pending audit"
}]
So, to access, for instance, the "ownerAccount" of the first of the "certificateRanges" of the first certificate, you would do:
var results = JSON.parse(jsonstr).result;
var ownerAccount = results[0].certificateRanges[0].ownerAccount;
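As a hedged example of the "examine todaysRecords" step, keeping only the certificates whose ranges match some criteria (the criteria below are made up) could look like:
// Keep only certificates with at least one range matching example criteria.
var matches = results.filter(function (certificate) {
  return (certificate.certificateRanges || []).some(function (range) {
    return range.generationState === 'QLD' && range.status === 'Pending audit';
  });
});
console.log('kept ' + matches.length + ' of ' + results.length + ' records');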
Creating a CSV
The best way is to create an abstract structure (that meets your needs) and convert it to a CSV.
There's a good npm library called json2csv that can help you there
Example:
var fs = require('fs');
var json2csv = require('json2csv');

var fields = ['car', 'price', 'color']; // csv titles
var myCars = [
  {
    "car": "Audi",
    "price": 40000,
    "color": "blue"
  }, {
    "car": "BMW",
    "price": 35000,
    "color": "black"
  }, {
    "car": "Porsche",
    "price": 60000,
    "color": "green"
  }
];

json2csv({ data: myCars, fields: fields }, function (err, csv) {
  if (err) console.log(err);
  fs.writeFile('file.csv', csv, function (err) {
    if (err) throw err;
    console.log('file saved');
  });
});
If you wish to append instead of writing to a new file, you can use:
fs.appendFile('file.csv', csv, function (err) { });

URL GET parameter representing and parsing embedded objects

I'm working on an API that consists of several collections related to each other. I want to give users the opportunity to eagerly fetch records from associated collections via a GET URL parameter.
For instance:
/api/clients/
Would return an array of objects, each representing a client.
But clients have "employees" and "templates". "Templates" also have "revisions", and lastly, "revisions" have "groups".
My strategy for the format of the url parameter is something like this:
/api/clients?expand=[employees][templates[revisions[groups]]]
Which represents:
clients
  + employees
  + templates
    + revisions
      + groups
My question is, what is a good way to go about parsing a string in this format: [employees][templates[revisions[groups]]]
to arrive at an object like this:
{
  "employees": {},
  "templates": {
    "revisions": {
      "groups": {}
    }
  }
}
Or something similar that's easy to work with. I'm working in NodeJS so any answers specific to that environment are a plus. Regex?
Going off Kovge's suggestion, I'm going to handle this situation with a JSON string passed in my URL GET request. It's a bit verbose, but should be acceptable.
Here's how I'd represent my eager-fetching associated collection records:
[
  {
    "resource": "employees"
  },
  {
    "resource": "templates",
    "extend": [
      {
        "resource": "revisions",
        "extend": [
          {
            "resource": "groups"
          }
        ]
      }
    ]
  }
]
Basically, I'm using arrays of objects with "resource" parameters and optional "extend" parameters. My request will end up looking like this (URL-encoded):
http://example.com/api/clients?extend=%5B%7B%22resource%22%3A%22employees%22%7D%2C%7B%22resource%22%3A%22templates%22%2C%22extend%22%3A%5B%7B%22resource%22%3A%22revisions%22%2C%22extend%22%3A%5B%7B%22resource%22%3A%22groups%22%7D%5D%7D%5D%7D%5D
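Building and decoding that parameter is just a JSON round trip. A sketch (the Express server below is an assumption about the framework):
// Client side: serialize the extend structure into the query string.
var extend = [
  { resource: 'employees' },
  {
    resource: 'templates',
    extend: [{ resource: 'revisions', extend: [{ resource: 'groups' }] }],
  },
];
var url = 'http://example.com/api/clients?extend=' +
  encodeURIComponent(JSON.stringify(extend));

// Server side (assuming Express): query values arrive URL-decoded,
// so a single JSON.parse restores the nested structure.
var express = require('express');
var app = express();
app.get('/api/clients', function (req, res) {
  var requested = req.query.extend ? JSON.parse(req.query.extend) : [];
  // ...use `requested` to decide which associations to eager-load...
  res.json({ requested: requested });
});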
EDIT:
This is what my result ended up looking like, still playing with things.
