I'd like to use the expand and compact methods of the jsonld.js library to translate data from various sources into a common format for processing. If I take a source JSON document, add a @context to it, then pass it through the expand method, I'm able to get the common format that I need.
The use case that I haven't been able to find a solution for is when multiple values need to be merged. For example, schema.org defines a PostalAddress with a single field for the streetAddress, but many systems store the street address as separate values (street number, street name, street direction...). To translate the incoming data to the schema.org format I need a way to indicate in my @context that multiple fields make up the streetAddress, in the correct order.
Compacted Document
{
"@context": {
"displaName": "http://schema.org/name",
"website": "http://schema.org/homepage",
"icon": "http://schema.org/image",
"streetNumber": "http://schema.org/streetAddress"
},
"displaName": "John Doe",
"website": "http://example.com/",
"icon": "http://example.com/images/test.png",
"streetNumber": "123",
"streetName": "Main St",
"streetDirection": "South"
}
Expanded Document
{
"http://schema.org/name":[
{
"@value":"John Doe"
}
],
"http://schema.org/image":[
{
"@value":"http://example.com/images/test.png"
}
],
"http://schema.org/streetAddress":[
{
"@value":"123"
}
],
"http://schema.org/homepage":[
{
"@value":"http://example.com/"
}
]
}
I've reviewed all of the JSON-LD specs that I could find and haven't been able to locate anything that indicates a way to split or concatenate values using the @context.
Is anyone aware of a way to map multiple values into one context property, in the correct order, and possibly add whitespace between the values? I also need to find a solution for the reverse scenario, where I need to split one field into multiple values, in the correct order.
Note: Even if I map all three properties to streetAddress, the values will all be included in the array, but there's no guarantee they'll be in the correct order.
One possible way to achieve this is to use a single array field for your address containing the ordered address components (i.e. ["number", "direction", "name"]). Then in the @context you can specify the address with @container: @list, which will ensure the address components are correctly ordered.
So the compacted document would be:
{
"@context": {
"displaName": "http://schema.org/name",
"website": "http://schema.org/homepage",
"icon": "http://schema.org/image",
"address": {
"@id": "http://schema.org/streetAddress",
"@container": "@list"
}
},
"displaName": "John Doe",
"website": "http://example.com/",
"icon": "http://example.com/images/test.png",
"address": ["123", "South", "Main St"]
}
And the expanded one would be
{
"http://schema.org/streetAddress": [
{
"@list": [
{
"@value": "123"
},
{
"@value": "South"
},
{
"@value": "Main St"
}
]
}
],
"http://schema.org/name": [
{
"@value": "John Doe"
}
],
"http://schema.org/image": [
{
"@value": "http://example.com/images/test.png"
}
],
"http://schema.org/homepage": [
{
"@value": "http://example.com/"
}
]
}
I posted an issue on the jsonld.js GitHub repository. According to @dlongley, the original creator of the jsonld.js library, it's not possible to manipulate properties in this manner using standard JSON-LD.
https://github.com/digitalbazaar/jsonld.js/issues/115
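Since the merge can't be expressed in JSON-LD itself, one workaround is to combine the fields in plain JavaScript before handing the document to expand. The following is only a sketch under that assumption: mergeStreetAddress is a hypothetical helper (not part of jsonld.js), and the field names are taken from the sample document in the question.

```javascript
// Hypothetical pre-processing step, run before jsonld.js sees the document.
// "parts" lists the source fields in the order they should appear.
function mergeStreetAddress(doc, parts, separator) {
  const merged = Object.assign({}, doc);
  const values = parts
    .map(function (key) { return doc[key]; })
    .filter(function (value) { return value !== undefined && value !== ""; });

  // Drop the separate fields so only the merged value remains.
  parts.forEach(function (key) { delete merged[key]; });

  if (values.length > 0) {
    merged.streetAddress = values.join(separator === undefined ? " " : separator);
  }
  return merged;
}

const source = {
  displaName: "John Doe",
  streetNumber: "123",
  streetName: "Main St",
  streetDirection: "South"
};

const prepared = mergeStreetAddress(
  source,
  ["streetNumber", "streetDirection", "streetName"]
);
// prepared.streetAddress is "123 South Main St"
```

The context would then map the single streetAddress term to http://schema.org/streetAddress. The reverse direction (splitting one value into several) would need a similar hand-written step after compacting.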
Related
I am currently building a web tool which enables the user to generate a package of options in the form of a String. To select which options he wants he uses a form with different inputs (radio, checkbox) which is generated from a dictionary.json that currently holds all available options and their codes in the following format (subject to change):
[
{
"id": 0001,
"title":"foo",
"type":"radio",
"options":[
{
"bar":"",
"foo":"489",
"foobar":"489+490"
}
]
},
{
"id": 0002,
"title":"something",
"type":"check",
"options":[
{
"everything":"M016",
"evenmore":"M139"
}
]
},
[...]
As you can see it is basically a small database. The problem is that the options depend on each other so if foo is foobar it might determine that something is definitely evenmore and can NOT be changed to everything. How would I map these dependencies in the dictionary.json so that the generated form can reliably grey out options that are determined by other choices?
The structure has to be flexible so new dependencies can be inserted and would generate the new form reliably or validate existing outputs against them. There could also be options that depend on multiple other options. I can't think of a smart way of saving these dependencies and I wonder if JSON is the right format to go with here.
Any tips or ideas are welcome. Thanks!
You could try to save every option as one object which stores all the options which will be excluded if that option is selected.
So your JSON could look like the following:
[
{
"id": 0001,
"title":"foo",
"type":"radio",
"options":[
{
"bar":"",
"excludes": []
},
{
"foo":"489",
"excludes": []
},
{
"foobar":"489+490",
"excludes": [
{
"id": 0002,
"options": [
"everything"
]
},
{
"id": 0003,
"options": [
"apple",
"cherry"
]
}
]
}
]
},
{
"id": 0002,
"title":"something",
"type":"check",
"options":[
{
"everything":"M016",
"excludes": []
},
{
"evenmore":"M139",
"excludes": []
}
]
},
[...]
Every time an option is selected, you would have to check its excludes list and disable all of those options for the specified fields.
To improve usability, you could check whether only one option is left for a field, select that option, and then disable the whole field.
EDIT:
Additionally you could add an isExcludedBy field to each of the options.
The everything option of id 0002 would then look like this:
"isExcludedBy": [
{
"id": 0001,
"options": [
"foobar"
]
}
]
This would be kind of redundant, but depending on what you want your UI to show, it could save you some computing time.
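The "check the excludes list on every selection" step could look roughly like the sketch below. It assumes the layout suggested above (each option object has one name key plus an excludes array); disabledOptions and the selections shape are made up for illustration, and the ids are written as strings because JSON numbers cannot have leading zeros.

```javascript
// Sample data in the layout proposed in this answer (ids as strings).
const dictionary = [
  { id: "0001", title: "foo", type: "radio", options: [
    { bar: "", excludes: [] },
    { foo: "489", excludes: [] },
    { foobar: "489+490", excludes: [{ id: "0002", options: ["everything"] }] }
  ]},
  { id: "0002", title: "something", type: "check", options: [
    { everything: "M016", excludes: [] },
    { evenmore: "M139", excludes: [] }
  ]}
];

// selections maps a field id to the chosen option name, e.g. { "0001": "foobar" }.
// Returns an object mapping field ids to the option names that must be disabled.
function disabledOptions(fields, selections) {
  const disabled = {};
  fields.forEach(function (field) {
    const chosen = selections[field.id];
    if (chosen === undefined) return;
    field.options.forEach(function (option) {
      if (!(chosen in option)) return; // not the selected option
      option.excludes.forEach(function (rule) {
        disabled[rule.id] = (disabled[rule.id] || []).concat(rule.options);
      });
    });
  });
  return disabled;
}

const toDisable = disabledOptions(dictionary, { "0001": "foobar" });
// toDisable is { "0002": ["everything"] }
```

The form code would call this after every change and grey out the returned options.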
A possible simple solution (which answers your question):
// dictionary.json
{
"options": [
{
"id": 0001,
"title":"foo",
"type":"radio",
"options":[
{
"bar":"",
"foo":"489",
"foobar":"489+490"
}
]
}
// etc.; same as before
],
// this is it:
"dependencies": [
[["0001", "foobar"], ["0002", "evenmore"]]
]
}
dependencies here consist of pairs of [path to option in options that implies another option, path to the implied option].
You could make a Map data structure out of this directly (the implying options are keys, the implied are values).
This assumes that one option can imply only one other option (but it still allows for options that depend on multiple other options).
You could of course easily extend that like so:
[["0001", "foobar"], [["0002", "evenmore"], ["0003", "namaste"]]]
This would mean that "0001"/"foobar" implies both "0002"/"evenmore" and "0003"/"namaste". But perhaps YAGNI. :)
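As a rough illustration of the Map idea, the pairs can be loaded like this. It is a sketch, not a finished design: buildImplications is a hypothetical name, and the path is joined into a "0001/foobar" string because arrays used as Map keys are compared by identity rather than by content.

```javascript
// Dependencies in the simple one-to-one form shown above.
const dependencies = [
  [["0001", "foobar"], ["0002", "evenmore"]]
];

// Builds a Map from "fieldId/optionName" to the list of implied [fieldId, optionName] paths.
function buildImplications(pairs) {
  const implied = new Map();
  pairs.forEach(function (pair) {
    const from = pair[0].join("/");
    const targets = implied.get(from) || [];
    targets.push(pair[1]);
    implied.set(from, targets);
  });
  return implied;
}

const implications = buildImplications(dependencies);
// implications.get("0001/foobar") is [["0002", "evenmore"]]
```

Storing a list per key means several pairs with the same implying option simply accumulate, which covers the extended form without changing the file format.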
One way to approach this is to model the domain you're actually expressing, and generate the form based on that. For example, we know that apartments have street numbers, and apartment numbers, whereas houseboats don't even have streets.
{
"dwelling": {
"type": "houseboat",
"latitude": null,
"longitude": null
}
}
or
{
"dwelling": {
"type": "apartment",
"street": "Beech St.",
"street_number": 123,
"apartment_number": 207
}
}
By modelling the domain rather than the form, you can write rules that apply beyond the form, and you won't have to develop a mini-language for expressing form dependencies.
For a Chrome app, which stores data in IndexedDB, I have an object like this:
var simplifiedOrderObject = {
"ordernumber": "123-12345-234",
"name": "Mr. Sample",
"address": "Foostreet 12, 12345 Bar York",
"orderitems": [
{
"item": "brush",
"price": "2.00"
},
{
"item": "phone",
"price": "30.90"
}
],
"parcels": [
{
"service": "DHL",
"track": "12345"
},
{
"service": "UPS",
"track": "3254231514"
}
]
}
If I store the whole object in an objectStore, can I use an index for "track", which can be contained multiple times in each order object?
Or is it necessary, or possibly better/faster, to split each object into multiple objectStores, as known from relational DBs:
order
orderitem
parcel
The solution should also be fast with 100,000 or more objects stored.
Answering my own question: I have run some tests now. It looks like it is not possible to do this with that object in only one objectStore.
Another example object which would work:
var myObject = {
"ordernumber": "123-12345-234",
"name": "Mr. Sample",
"shipping": {"method": "letter",
"company": "Deutsche Post AG" }
}
Creating an index will be done by:
objectStore.createIndex(objectIndexName, objectKeypath, optionalObjectParameters);
With setting objectKeypath it is possible to address a value in the main object like "name":
objectStore.createIndex("name", "name", {unique: false});
It is also possible to address a value from a subobject of an object, like "shipping.method":
objectStore.createIndex("shipping", "shipping.method", {unique: false});
BUT it is not possible to address values like the ones of "track", which are contained in objects stored in an array. Even something like "parcels[0].track", to get the first value as the index, does not work.
However, it is possible to index all simple elements of an array (but not objects).
So the following simpler structure allows creating an index entry for each parcel number in the array "trackingNumbers":
var simplifiedOrderObject = {
"ordernumber": "123-12345-234",
"name": "Mr. Sample",
"address": "Foostreet 12, 12345 Bar York",
"orderitems": [
{
"item": "brush",
"price": "2.00"
},
{
"item": "phone",
"price": "30.90"
}
],
"trackingNumbers": ["12345", "3254231514"]
}
when creating the index with multiEntry set to true:
objectStore.createIndex("tracking", "trackingNumbers", {unique: false, multiEntry: true});
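If the carrier information in "parcels" should not be lost, one option is to keep that array and derive the flat "trackingNumbers" array from it just before storing. This is only a sketch of that idea; withTrackingNumbers is a hypothetical helper, not an IndexedDB API.

```javascript
// Returns a copy of the order with a flat trackingNumbers array added,
// so a multiEntry index on "trackingNumbers" can be used while "parcels" stays intact.
function withTrackingNumbers(order) {
  const parcels = Array.isArray(order.parcels) ? order.parcels : [];
  return Object.assign({}, order, {
    trackingNumbers: parcels.map(function (parcel) { return parcel.track; })
  });
}

const order = {
  ordernumber: "123-12345-234",
  parcels: [
    { service: "DHL", track: "12345" },
    { service: "UPS", track: "3254231514" }
  ]
};

const storable = withTrackingNumbers(order);
// storable.trackingNumbers is ["12345", "3254231514"]
```

The object passed to the objectStore would then be storable rather than order.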
Still, the inability to index values of objects inside arrays makes using IndexedDB unnecessarily complicated. It's a design failure: it forces the developer to structure data as in relational DBs, while lacking all the possibilities of SQL. Really bad :(
I have a large JSON file that I want to use to create a subset using vars that I'll be storing in localStorage. I prefer jQuery but other approaches are welcomed.
The id (which will exist in both places) will be used to match and determine that the keys/values should be part of the new subset.
The "master" JSON file is structured similar to this:
{
"myStuff": [
{
"id": "53b0c01de4b0deedb5c9015f",
"brief": "Joe's Stuff",
"author": "Joe"
},
{
"id": "545fb8c4e4b03cfb303de9f2",
"brief": "Jim's Stuff",
"author": "Jim"
},
{
"id": "54676ae4e4b09ffed41ffc7c",
"brief": "Mary's Stuff",
"author": "Mary"
}
]
}
I have flexibility in how the items that will determine the subset are presented. It will always contain multiple values. Those are "id" values to match existing "id" values in the master JSON file.
For example, I can get a string out of localStorage that would look like this:
{"id1":"545fb8c4e4b03cfb303de9f2","id2":"54676ae4e4b09ffed41ffc7c"}
or as simple as this:
545fb8c4e4b03cfb303de9f2, 54676ae4e4b09ffed41ffc7c
Suggestions to which approach may be better are welcomed.
So, in this case, the subset should return just the stuff from Jim and Mary and ignore Joe.
{
"myStuffSubset": [
{
"id": "545fb8c4e4b03cfb303de9f2",
"brief": "Jim's Stuff",
"author": "Jim"
},
{
"id": "54676ae4e4b09ffed41ffc7c",
"brief": "Mary's Stuff",
"author": "Mary"
}
]
}
Please let me know if I've missed something in the explanation. And, I find fiddles help me learn the best. Thanks!
I've created a plnkr for you.
Basically, the filter function will get the string separated by commas and search inside your JSON:
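function search(str, obj) {
    // Split on commas and trim, so "id1, id2" works as well as "id1,id2"
    var arr = str.split(',').map(function(s) { return s.trim(); });
    return obj.myStuff.filter(function(o) {
        return arr.indexOf(o.id) > -1;
    });
}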
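A variation that accepts either of the two localStorage formats from the question and wraps the result in a "myStuffSubset" key might look like the sketch below. The function name subsetByIds is made up for illustration, and the choice between the formats is a simple "does the string start with {" check.

```javascript
// stored is either a JSON object string such as {"id1":"...","id2":"..."}
// or a comma-separated list such as "..., ...".
function subsetByIds(master, stored) {
  const text = stored.trim();
  const ids = text.charAt(0) === "{"
    ? Object.values(JSON.parse(text))
    : text.split(",").map(function (s) { return s.trim(); });

  const wanted = new Set(ids);
  return {
    myStuffSubset: master.myStuff.filter(function (item) {
      return wanted.has(item.id);
    })
  };
}

const master = {
  myStuff: [
    { id: "53b0c01de4b0deedb5c9015f", brief: "Joe's Stuff", author: "Joe" },
    { id: "545fb8c4e4b03cfb303de9f2", brief: "Jim's Stuff", author: "Jim" },
    { id: "54676ae4e4b09ffed41ffc7c", brief: "Mary's Stuff", author: "Mary" }
  ]
};

const fromList = subsetByIds(master, "545fb8c4e4b03cfb303de9f2, 54676ae4e4b09ffed41ffc7c");
const fromJson = subsetByIds(master, '{"id1":"545fb8c4e4b03cfb303de9f2","id2":"54676ae4e4b09ffed41ffc7c"}');
// Both contain the entries for Jim and Mary, in that order.
```

Using a Set keeps each lookup cheap when the master list is large.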
Consider this example collection:
{
"_id": 0,
"firstname":"Tom",
"children" : {
"childA":{
"toys":{
'toy 1':'batman',
'toy 2':'car',
'toy 3':'train',
},
"movies": {
'movie 1': "Ironman",
'movie 2': "Deathwish"
}
},
"childB":{
"toys":{
'toy 1':'doll',
'toy 2':'bike',
'toy 3':'xbox',
},
"movies": {
'movie 1': "Frozen",
'movie 2': "Barbie"
}
}
}
}
Now I would like to retrieve ONLY the movies from a particular document.
I have tried something like this:
movies = users.find_one({'_id': 0}, {'_id': 0, 'children.ChildA.movies': 1})
However, I get the whole field structure from 'children' down to 'movies' and its content. How do I just do a query and retrieve only the content of 'movies'?
To be specific I want to end up with this:
{
'movie 1': "Frozen"
'movie 2': "Barbie"
}
The problem here is your current data structure is not really great for querying. This is mostly because you are using "keys" to actually represent "data points", and while it might initially seem to be a logical idea it is actually a very bad practice.
So rather than assigning "childA" and "childB" as keys of an object or "sub-document", you are better off assigning these as "values" to a generic key name in a structure like this:
{
"_id": 0,
"firstname":"Tom",
"children" : [
{
"name": "childA",
"toys": [
"batman",
"car",
"train"
],
"movies": [
"Ironman",
"Deathwish"
]
},
{
"name": "childB",
"toys": [
"doll",
"bike",
"xbox"
],
"movies": [
"Frozen",
"Barbie"
]
}
]
}
This is not perfect, since there are nested arrays, which can be a potential problem (there are workarounds for that as well, covered later), but the main point is that this is a lot better than defining the data in "keys". The main problem with "keys" that are not consistently named is that MongoDB does not generally allow any way to "wildcard" these names, so you are stuck with naming an "absolute path" in order to access elements, as in:
children -> childA -> toys
children -> childB -> toys
That, in a nutshell, is bad when compared to this:
"children.toys"
With the sample prepared above, I would say that is a much better approach to organizing your data.
Even so, just getting back something such as a "unique list of movies" is out of scope for standard .find() type queries in MongoDB. This requires something more like "document manipulation", which is well supported in the aggregation framework for MongoDB. It has extensive manipulation capabilities that are not present in the query methods, and as a per-document response with the above structure you can do this:
db.collection.aggregate([
# De-normalize the array content first
{ "$unwind": "$children" },
# De-normalize the content from the inner array as well
{ "$unwind": "$children.movies" },
# Group back, well optionally, but just the "movies" per document
{ "$group": {
"_id": "$_id",
"movies": { "$addToSet": "$children.movies" }
}}
])
So now the "list" response in the document only contains the "unique" movies, which corresponds more to what you are asking. Alternatively you could just $push instead and make a "non-unique" list. But, somewhat absurdly, that returns essentially the same content as this:
db.collection.find({},{ "_id": False, "children.movies": True })
As a "collection wide" concept, you could simplify this a lot by simply using the .distinct() method, which basically forms a list of "distinct" values based on the input you provide. This plays with arrays really well:
db.collection.distinct("children.toys")
And that is essentially a collection-wide analysis of all the "distinct" occurrences of each "toys" value in the collection, returned as a simple "array".
As for your existing structure, it deserves an explanation of a solution, but you must understand that the solution is horrible. The problem is that the "native", optimized methods available to general queries and aggregation are not available at all, and the only option is JavaScript-based processing. Even though that is a little better through "v8" engine integration, it is still a complete slouch when compared side by side with native code methods.
So from the "original" form that you have (in JavaScript form; the functions should be easy to translate):
db.collection.mapReduce(
    // Mapper: emit one single-element "toys" list per toy of every child
    function() {
        var id = this._id,
            children = this.children;
        Object.keys(children).forEach(function(child) {
            var toys = children[child].toys || {};
            Object.keys(toys).forEach(function(toy) {
                emit( id, { "toys": [toys[toy]] } );
            });
        });
    },
    // Reducer: merge the emitted lists, keeping each toy only once
    function(key,values) {
        var output = { "toys": [] };
        values.forEach(function(value) {
            value.toys.forEach(function(toy) {
                if ( output.toys.indexOf( toy ) == -1 )
                    output.toys.push( toy );
            });
        });
        return output;
    },
    {
        "out": { "inline": 1 }
    }
)
So JavaScript evaluation is the "horrible" approach as this is much slower in execution, and you see the "traversing" code that needs to be implemented. Bad news for performance, so don't do it. Change the structure instead.
As a final part, you could model this differently to avoid the "nested array" concept. And understand that the only real problem with a "nested array" is that "updating" a nested element is really impossible without reading in the whole document and modifying it.
So $push and $pull methods work fine. But using a "positional" $ operator just does not work as the "outer" array index is always the "first" matched element. So if this really was a problem for you then you could do something like this, for example:
{
"_id": 0,
"firstname":"Tom",
"childtoys" : [
{
"name": "childA",
"toy": "batman"
},
{
"name": "childA",
"toy": "car"
},
{
"name": "childA",
"toy": "train"
},
{
"name": "childB",
"toy": "doll"
},
{
"name": "childB",
"toy": "bike"
},
{
"name": "childB",
"toy": "xbox"
}
],
"childMovies": [
{
"name": "childA",
"movie": "Ironman"
},
{
"name": "childA",
"movie": "Deathwish"
},
{
"name": "childB",
"movie": "Frozen"
},
{
"name": "childB",
"movie": "Barbie"
}
]
}
That would be one way to avoid the problem with nested updates if you did indeed need to "update" items on a regular basis rather than just $push and $pull items to the "toys" and "movies" arrays.
But the overall message here is to design your data around the access patterns you actually use. MongoDB does generally not like things with a "strict path" in the terms of being able to query or otherwise flexibly issue updates.
Projections in the MongoDB shell use 1 and 0 (or the JavaScript booleans true and false), not Python-style 'True'/'False'.
Also make sure the field names are written in the correct case: the key is 'childA', not 'ChildA'.
The query should be as below:
db.users.findOne({'_id': 0}, {'_id': 0, 'children.childA.movies': 1})
Which will result in :
{
"children" : {
"childA" : {
"movies" : {
"movie 1" : "Ironman",
"movie 2" : "Deathwish"
}
}
}
}
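Because the projection still returns the enclosing structure, the inner "movies" object has to be picked out of the result on the client side. A minimal sketch of that last step follows; getPath is a hypothetical helper written for illustration, not a MongoDB function, and the result object is copied from the output above.

```javascript
// The document as returned by the projection above.
const result = {
  children: {
    childA: {
      movies: { "movie 1": "Ironman", "movie 2": "Deathwish" }
    }
  }
};

// Walks a dotted path and returns undefined if any step is missing.
function getPath(doc, path) {
  return path.split(".").reduce(function (node, key) {
    return node == null ? undefined : node[key];
  }, doc);
}

const movies = getPath(result, "children.childA.movies");
// movies is { "movie 1": "Ironman", "movie 2": "Deathwish" }
```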
Consider a JSON like this:
[{
"type": "person",
"name": "Mike",
"age": "29"
},
{
"type": "person",
"name": "Afshin",
"age": "21"
},
{
"type": "something_else",
"where": "NY"
}]
I want to search in the JSON value with a key (for example type='person') and then select a whole object of matched item in JSON. For example when I search for type='person' I expect this value:
[{
"type": "person",
"name": "Mike",
"age": "29"
},
{
"type": "person",
"name": "Afshin",
"age": "21"
}]
Because it's a really big JSON value, I don't want to do a brute-force search in all nodes, so I think the only way is using Regular Expressions but I don't know how can I write a Regex to match something like above.
I'm using NodeJs for the application.
Using underscore.js#where:
var results = _(yourObject).where({ type: 'person' })
If your data set is very very big [e.g. 10k or so], consider filtering / paginating stuff server side.
Plain javascript :
var results = dataset.filter(function(p) {
    return p.type === 'person';
});
If the requirement is to scan the collection multiple times, the following one-time construction overhead might be worthwhile.
Use hashing based on the values of type: convert the current data structure to a hash map.
var hashMap ={
};
hashMap['person'] =[{},{}];
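Building that hash map from the array could be sketched as follows. The name indexByType is made up for illustration; the sample data is the array from the question.

```javascript
// Groups the items by their "type" value in a single pass.
function indexByType(items) {
  const map = Object.create(null); // no inherited keys such as "constructor"
  items.forEach(function (item) {
    if (!map[item.type]) {
      map[item.type] = [];
    }
    map[item.type].push(item);
  });
  return map;
}

const dataset = [
  { type: "person", name: "Mike", age: "29" },
  { type: "person", name: "Afshin", age: "21" },
  { type: "something_else", where: "NY" }
];

const byType = indexByType(dataset);
// byType["person"] holds the two person objects; later lookups need no scan.
```

The single pass over the data is still needed once, but every later lookup by type is a direct property access.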
Hope this helps you.
Use
var results = $.grep(jsonarrayobj, function(n, i) {
    // keep only the elements whose type is "person"
    return n.type === "person";
});