Regex for parsing single key: values out of JSON in Javascript - javascript

I'm trying to see if it's possible to look up individual keys in a JSON string in Javascript and return their values with regex. Sort of like building a JSON search tool.
Imagine the following JSON
"{
"Name": "Humpty",
"Age": "18",
"Siblings" : ["Dracula", "Snow White", "Merlin"],
"Posts": [
{
"Title": "How I fell",
"Comments": [
{
"User":"Fairy God Mother",
"Comment": "Ha, can't say I didn't see it coming"
}
]
}
]
}"
I want to be able to search through the JSON string and only pull out individual properties.
Let's assume it's already a function; it would look something like this:
function getPropFromJSON(prop, JSONString){
// Obviously this regex will only match Keys that have
// String Values.
var exp = new RegExp("\""+prop+"\"\:[^\,\}]*");
return JSONString.match(exp)[0].replace("\""+prop+"\":","");
}
It would return the substring of the Value for the Key.
e.g.
getPropFromJSON("Comments", JSONString)
> "[
{
"User":"Fairy God Mother",
"Comment": "Ha, can't say I didn't see it coming"
}
]"
If you're wondering why I want to do this instead of using JSON.parse(), I'm building a JSON document store around localStorage. localStorage only supports key/value pairs, so I'm storing a JSON string of the entire Document under a unique Key. I want to be able to run a query on the documents, ideally without the overhead of calling JSON.parse() on the entire Collection of Documents and then recursing over the Keys/nested Keys to find a match.
I'm not the best at regex so I don't know how to do this, or if it's even possible with regex alone. This is only an experiment to find out if it's possible. Any other ideas as a solution would be appreciated.

I would strongly discourage you from doing this. JSON is not a regular language as clearly stated here: https://cstheory.stackexchange.com/questions/3987/is-json-a-regular-language
To quote from the above post:
For example, consider an array of arrays of arrays:
[ [ [ 1, 2], [2, 3] ] , [ [ 3, 4], [ 4, 5] ] ]
Clearly you couldn't parse that with true regular expressions.
I'd recommend converting your JSON to an object (JSON.parse) & implementing a find function to traverse the structure.
Other than that, you can take a look at the guts of Douglas Crockford's json2.js parse method. Perhaps an altered version would allow you to search through the JSON string and return just the particular object you were looking for without converting the entire structure to an object. This is only useful if you never retrieve any other data from your JSON. If you do, you might as well have converted the whole thing to begin with.
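A minimal sketch of the parse-then-traverse approach, using a shortened version of the question's document. The name findValues is hypothetical; it collects every value stored under a given key at any depth.

```javascript
// Hypothetical helper: walk a parsed JSON value and collect every value
// found under `key`, no matter how deeply it is nested.
function findValues(node, key, found = []) {
  if (node === null || typeof node !== "object") return found;
  for (const [k, v] of Object.entries(node)) {
    // Array indices are not property names, so only compare keys of plain objects.
    if (!Array.isArray(node) && k === key) found.push(v);
    findValues(v, key, found);
  }
  return found;
}

// Simulates the string that would be read back from localStorage.
const storedDocument = JSON.stringify({
  Name: "Humpty",
  Posts: [{
    Title: "How I fell",
    Comments: [{ User: "Fairy God Mother", Comment: "Ha, can't say I didn't see it coming" }],
  }],
});

const parsedDocument = JSON.parse(storedDocument);
console.log(findValues(parsedDocument, "User"));
```

The whole document is still parsed once, which is exactly the cost the question wants to avoid, but the lookup itself is correct for any nesting depth.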
EDIT
Just to further show how regex breaks down, here's a regex that attempts to parse JSON. If you plug it into http://regexpal.com/ with "Dot Matches All" checked, you'll find that it can match some elements nicely, like:
Regex
"Comments"[ :]+((?=\[)\[[^]]*\]|(?=\{)\{[^\}]*\}|\"[^"]*\")
JSON Matched
"Comments": [
{
"User":"Fairy God Mother",
"Comment": "Ha, can't say I didn't see it coming"
}
]
Regex
"Name"[ :]+((?=\[)\[[^]]*\]|(?=\{)\{[^\}]*\}|\"[^"]*\")
JSON Matched
"Name": "Humpty"
However as soon as you start querying for the higher structures like "Posts", which has nested arrays, you'll find that you cannot correctly return the structure since the regex does not have context of which "]" is the designated end of the structure.
Regex
"Posts"[ :]+((?=\[)\[[^]]*\]|(?=\{)\{[^\}]*\}|\"[^"]*\")
JSON Matched
"Posts": [
{
"Title": "How I fell",
"Comments": [
{
"User":"Fairy God Mother",
"Comment": "Ha, can't say I didn't see it coming"
}
]
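What the regex lacks is a counter for nesting depth. A hand-written scanner can keep one, as in the sketch below. The names extractRawValue and scanValueEnd are hypothetical, the input is assumed to be well-formed JSON, and only the first occurrence of the key is returned, so treat it as an illustration of the idea rather than a complete search tool.

```javascript
// Return the index just past the JSON value that starts at `start`.
function scanValueEnd(json, start) {
  let depth = 0;
  let inString = false;
  for (let i = start; i < json.length; i++) {
    const ch = json[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') {
        inString = false;
        if (depth === 0) return i + 1; // a top-level string value just ended
      }
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "[" || ch === "{") depth++;
    else if (ch === "]" || ch === "}") {
      if (depth === 0) return i; // the parent closed: end of a number/true/false/null
      depth--;
      if (depth === 0) return i + 1; // the matching close of the value itself
    } else if (ch === "," && depth === 0) return i;
  }
  return json.length;
}

// Hypothetical helper: raw substring of the first value stored under `key`.
function extractRawValue(json, key) {
  const needle = JSON.stringify(key);
  let from = 0;
  for (;;) {
    const at = json.indexOf(needle, from);
    if (at === -1) return undefined;
    let i = at + needle.length;
    while (i < json.length && /\s/.test(json[i])) i++;
    if (json[i] === ":") {
      i++;
      while (i < json.length && /\s/.test(json[i])) i++;
      return json.slice(i, scanValueEnd(json, i)).trim();
    }
    from = at + 1; // the text was a string value, not a key; keep looking
  }
}

const rawSample = JSON.stringify({
  Name: "Humpty",
  Age: 18,
  Posts: [{ Title: "How I fell", Comments: [{ User: "Fairy God Mother" }] }],
}, null, 2);

console.log(extractRawValue(rawSample, "Posts"));
```

Only the extracted substring needs to go through JSON.parse() afterwards, not the whole document.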

You can use the following regex and iterate over all the tokens of a JSON string with repeated matches:
\{|\}|\[|\]|,|:|(\-)?\d+(\.\d+)?|".+?"
This only tokenizes the JSON; the parsing part has to be implemented by you.
Since you're using JavaScript, as I assume from the tags, your best way to decode the JSON is still JSON.parse().
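As a rough illustration, here is the token pattern from this answer written as a JavaScript regex literal with the global flag. The tokenize wrapper is a hypothetical name. The pattern has no alternatives for true, false or null and does not handle empty strings or escaped quotes, so this is a sketch of the idea only.

```javascript
// The answer's token pattern as a JavaScript regex literal.
const TOKEN = /\{|\}|\[|\]|,|:|(\-)?\d+(\.\d+)?|".+?"/g;

// Hypothetical wrapper: returns the list of tokens, or an empty list if none match.
function tokenize(json) {
  return json.match(TOKEN) || [];
}

const tokens = tokenize('{"Name": "Humpty", "Age": 18, "Siblings": ["Dracula"]}');
console.log(tokens);
```

Turning the token list back into a structure still requires a parser that tracks the brackets.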

I'm almost 10 years late to the party, but I came up with this.
Not tested in crazier JSONs than this, but it solves my use cases.
const obj1 = {
id: 1,
'name.1': '123',
address: {
'address.1': 'Chicken Dinner Road, 69',
'address.2': 'Psycho lane, 666',
},
'age.1': {
'thisIsSomeCrazyJson.3': 10,
age: 50,
},
types: [
{
id: 22,
'name.name': '123',
typeOption: {
id: 1,
'whoTFWroteThisJSON.2': '123',
},
},
{
id: 32,
'name.1': '123',
},
],
};
const obj2 = {
Name: 'Humpty',
Age: '18',
Siblings: ['Dracula', 'Snow White', 'Merlin'],
Posts: [
{
Title: 'How I fell',
Comments: [
{
'User': 'Fairy God Mother',
'Comment': "Ha, can't say I didn't see it coming",
},
],
},
],
};
// Keep only the entries whose key matches `pattern`, plus the ancestors that lead to them.
// Pass a pattern without the g flag, since test() is stateful on global regexes.
function matchKeyDeep(input, pattern) {
  return Object.entries(input).reduce((nextInput, [key, value]) => {
    const isMatch = pattern.test(key);
    if (Array.isArray(value)) {
      // Recurse into object items; leave primitives and null untouched.
      let nextValue = value.map((arrItem) => {
        if (typeof arrItem === 'object' && arrItem !== null) {
          return matchKeyDeep(arrItem, pattern);
        }
        return arrItem;
      });
      if (!isMatch) {
        // The array's own key did not match: keep only the filtered object items.
        nextValue = nextValue.filter((v) => (typeof v === 'object' && v !== null));
        if (nextValue.length === 0) return nextInput;
      }
      nextInput[key] = nextValue;
      return nextInput;
    }
    // typeof null is 'object', so guard against it before recursing.
    if (typeof value === 'object' && value !== null) {
      const recurse = matchKeyDeep(value, pattern);
      if (!isMatch && Object.keys(recurse).length === 0) {
        return nextInput;
      }
      nextInput[key] = recurse;
      return nextInput;
    }
    if (isMatch) {
      nextInput[key] = value;
    }
    return nextInput;
  }, {});
}
const res = matchKeyDeep(obj1, /\.\d/);
const res2 = matchKeyDeep(obj2, /Comment/);
console.log(res);
console.log(res2);

First, stringify the JSON object. Then, you need to store the starts and lengths of the matched substrings. For example:
"matched".search("ch") // yields 3
For a JSON string, this works exactly the same, unless you are searching explicitly for commas and curly brackets, in which case I'd recommend transforming your JSON object before applying the regex (i.e. think :, {, }).
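A small sketch of that first step: stringify the object and record where each match starts and how long it is. The name matchSpans is hypothetical and is not taken from the library mentioned in this answer. The pattern must carry the g flag because String.prototype.matchAll requires it.

```javascript
// Hypothetical helper: stringify `obj` and list every match with its position.
function matchSpans(obj, pattern) {
  const text = JSON.stringify(obj);
  return Array.from(text.matchAll(pattern), (m) => ({
    start: m.index,      // offset of the match in the JSON string
    length: m[0].length, // length of the matched substring
    text: m[0],
  }));
}

const spans = matchSpans({ a: "matched" }, /ch/g);
console.log(spans);
```

The offsets refer to the stringified form, so they are only meaningful against the same string that was searched.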
Next, you need to reconstruct the JSON object. The algorithm I authored does this by detecting JSON syntax by recursively going backwards from the match index. For instance, the pseudo code might look as follows:
find the next key preceding the match index, call this theKey
then find the number of all occurrences of this key preceding theKey, call this theNumber
using the number of occurrences of all keys with the same name as theKey up to the position of theKey, traverse the object until a key named theKey has been discovered theNumber times
return this object called parentChain
With this information, it is possible to use regex to filter a JSON object to return the key, the value, and the parent object chain.
You can see the library and code I authored at http://json.spiritway.co/

Related

Using lodash isEqual() to compare single object property with matching properties from large array with multiple objects

I've looked at lodash documentation and played around with comparing simple objects. I've also found a number of explanations online for comparing entire objects and other types of comparisons, but I want to compare one property value in a single object with the values of all properties of a certain name in a large array with multiple objects.
Is lodash smart enough to do this as is, and, if so, what would be the proper syntax to handle this? Or do I need some sort of loop to work through the larger object and recursively compare its properties of a certain name with the small object property?
The javascript comparison I'm looking for would be something like this, but I don't know how to indicate that I want to compare all itemURL properties in the large array:
// guard clause to end the larger function if test is true, any match found
if (_.isEqual(feedItem.link, rssDataFileArr.itemURL)) {
return;
}
Small object example:
const feedItem = {
link: 'https://news.google.com/rss/search?q=nodejs',
otherProperty: 'whatever'
}
Large array of objects example:
const rssDataFileArr = [
{
"itemURL": "https://news.google.com/rss/search?q=rss-parser",
"irrelevantProperty": "hello"
},
{
"itemURL": "https://news.google.com/rss/search?q=nodejs",
"irrelevantProperty": "world"
},
{
"itemURL": "https://news.google.com/rss/search?q=javascript",
"irrelevantProperty": "hello"
}
]
Any and all help appreciated.
As per suggestion in comment, I went with a built-in javascript method instead of lodash. I used some() because I only needed a true/false boolean result, not a find() value.
const feedItem = {
link: 'https://news.google.com/rss/search?q=nodejs',
otherProperty: 'whatever',
};
const rssDataFileArr = [
{
itemURL: 'https://news.google.com/rss/search?q=rss-parser',
irrelevantProperty: 'hello',
},
{
itemURL: 'https://news.google.com/rss/search?q=nodejs',
irrelevantProperty: 'world',
},
{
itemURL: 'https://news.google.com/rss/search?q=javascript',
irrelevantProperty: 'hello',
},
{
itemURL: 'https://news.google.com/rss/search?q=nodejs',
irrelevantProperty: 'world',
},
];
const linkMatch = rssDataFileArr.some(
({ itemURL }) => itemURL === feedItem.link
);
// guard clause to end the larger function if test is true, any match found
if (linkMatch) {
console.log('linkMatch is true');
return;
}
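If the same guard runs for many feed items, scanning the array each time adds up. One possible variation, sketched below, builds a Set of the stored URLs once and checks membership against it. The names makeSeenChecker and processFeedItem are hypothetical, and the return values exist only to make the sketch observable.

```javascript
// Hypothetical helper: build the lookup once, then answer membership checks quickly.
function makeSeenChecker(rows) {
  const seen = new Set(rows.map(({ itemURL }) => itemURL));
  return (link) => seen.has(link);
}

const storedRows = [
  { itemURL: 'https://news.google.com/rss/search?q=rss-parser', irrelevantProperty: 'hello' },
  { itemURL: 'https://news.google.com/rss/search?q=nodejs', irrelevantProperty: 'world' },
];

const alreadySeen = makeSeenChecker(storedRows);

// Stand-in for "the larger function" from the question.
function processFeedItem(item) {
  // Guard clause: stop early when the link is already stored.
  if (alreadySeen(item.link)) return 'skipped';
  return 'processed';
}

console.log(processFeedItem({ link: 'https://news.google.com/rss/search?q=nodejs' }));
```

For a single check, some() as used above is simpler and just as good.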

Proper way of handling errors when working with nested objects and arrays

My data structure (condensed)
const data = {
"cars": [
{
"name": "toyota",
"sedan": {
"2d": [
{
"name": "corolla",
"year": 2020
},
{
"name": "supra",
"year": 1986
}
]
}
}
]
};
To find the object by name, I would do:
const twoDcars = data.cars.reduce(car => car.sedan);
const match = twoDcars.sedan['2d'].filter(car => car.name === "corolla");
console.log(match); //[ { name: 'corolla', year: 2020 } ]
With conditional check:
const twoDcars = data.cars && data.cars.reduce(car => car.sedan);
const match = twoDcars && twoDcars.sedan && twoDcars.sedan['2d'] && twoDcars.sedan['2d'].filter(car => car && car.name === "corolla");
console.log(match); //[ { name: 'corolla', year: 2020 } ]
With try/catch:
let match;
try {
match = data.cars.reduce(car => car.sedan).sedan['2d'].filter(car => car.name === "corolla");
} catch {}
console.log(match); //[ { name: 'corolla', year: 2020 } ]
My question is, what is the preferred/industry-standard of doing this.
A && A.B && A.B.C && A.B.C.D
try {A.B.C.D} catch{}
Some other approach?
My requirement is pretty simple.
Find the match if possible
App shouldn't break on any conditions.
What I am trying to do is avoid hundreds of && or try{}catch{} everywhere in my code. I can create utility methods whenever possible, but with the complex/nested data that I'm dealing with, it is often impossible.
If possible, I would probably do some massaging of the raw data to get it in a form where you can filter down at the top level and ensure you're not dealing with all the possible nulls everywhere in your code. I'd also get rid of the check on cars by ensuring there's always an empty list of cars. That way, filter and the rest will just work.
I would probably shoot to flatten the car objects into individual cars with all the props; like so:
const data = {
"cars": [
{
"year": 2020,
"make": "toyota",
"model": "corolla",
"type": "sedan",
"doors" : 2
},
{
"year": 1986,
"make": "toyota",
"model": "supra",
"type": "sedan",
"doors" : 2
}
]
};
I wouldn't use multiple chained filters for this; I'm just showing how much easier it would be to filter more directly and get all sedans, two-door sedans, etc., simplifying your code and life :)
let results = data
.cars
.filter(car => car.type === 'sedan') // all sedans
.filter(car => car.doors === 2) // two-door sedans
.filter(car => car.model === 'corolla'); // two-door corollas
Of course, once you massage it, you can reorder the filters to be more direct; like so (assuming you know a corolla is a sedan and you want only two-door models):
let results = data
.cars
.filter(car => car.model === 'corolla' && car.doors === 2);
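The massaging step itself could look something like the sketch below. The function name flattenCars is hypothetical, and it assumes the raw data has exactly the shape from the question: each entry in cars is an object with a name (the make), then one property per body type, then one property per door count such as '2d'.

```javascript
// Hypothetical helper: turn the nested make -> type -> doors -> models structure
// into a flat list of car objects. Always returns a list, even for missing data.
function flattenCars(raw) {
  const flat = [];
  for (const { name: make, ...types } of raw?.cars ?? []) {
    for (const [type, byDoors] of Object.entries(types)) {
      for (const [doorsKey, models] of Object.entries(byDoors ?? {})) {
        for (const { name: model, year } of models ?? []) {
          // '2d' -> 2; parseInt stops at the first non-digit character.
          flat.push({ year, make, model, type, doors: parseInt(doorsKey, 10) });
        }
      }
    }
  }
  return { cars: flat };
}

const nestedCars = {
  cars: [{
    name: "toyota",
    sedan: { "2d": [{ name: "corolla", year: 2020 }, { name: "supra", year: 1986 }] },
  }],
};

console.log(flattenCars(nestedCars).cars.filter(car => car.model === "corolla"));
```

Doing this once at the boundary where the data arrives keeps the null checks in one place.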
Whether to use try/catch or to add in the guarding conditions, is a matter of opinion, although I have seen more often the guarded expressions.
But there is no doubt that we're all going to be fans of the optional chaining operator (?.), which was a stage 3 proposal when this was written and has been part of the language since ES2020.
Then your code would look like:
const match = data.cars?.find(car => car.sedan)
?.sedan?.['2d']?.filter(car => car?.name === "corolla");
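To also satisfy the "app shouldn't break" requirement, the chain can be combined with the nullish coalescing operator (??) so callers always get an array. The sketch below wraps it in a hypothetical function named findTwoDoorSedans. Note that ?. only guards against null and undefined: if cars exists but is not an array, the call to find would still throw.

```javascript
// Hypothetical wrapper: returns the matching cars, or an empty array when any
// part of the path is missing.
function findTwoDoorSedans(data, name) {
  return data?.cars
    ?.find(car => car?.sedan)
    ?.sedan?.['2d']
    ?.filter(car => car?.name === name) ?? [];
}

const carData = {
  cars: [{
    name: "toyota",
    sedan: { "2d": [{ name: "corolla", year: 2020 }, { name: "supra", year: 1986 }] },
  }],
};

console.log(findTwoDoorSedans(carData, "corolla"));
console.log(findTwoDoorSedans({}, "corolla"));
```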
If searches in a nested object are frequent, then you could consider to flatten the structure into an array of non-nested objects.
To avoid a scan of the whole array, you could sort that array by one of its object-properties, allowing for a binary search. You can even add some Map objects as separate ways to access the same data, but by key. This obviously brings you back to more nesting, but it would be an additional structure (not replacing the array) for drilling down into your data faster than by filtering the whole lot. Such a Map would hold per key an array of matching objects (no copies, but the same object references as in the main array).
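A sketch of the Map idea, assuming the data has already been flattened into an array of plain car objects. The helper name indexBy is hypothetical. Each Map holds, per key, an array of references to the same objects as the main array, not copies.

```javascript
const flatCars = [
  { year: 2020, make: "toyota", model: "corolla", type: "sedan", doors: 2 },
  { year: 1986, make: "toyota", model: "supra", type: "sedan", doors: 2 },
];

// Hypothetical helper: group the objects of `list` by the value of `prop`.
function indexBy(list, prop) {
  const index = new Map();
  for (const item of list) {
    const bucket = index.get(item[prop]);
    if (bucket) bucket.push(item);
    else index.set(item[prop], [item]);
  }
  return index;
}

// Two separate access paths into the same data.
const byModel = indexBy(flatCars, "model");
const byMake = indexBy(flatCars, "make");

console.log(byModel.get("corolla"));
console.log(byMake.get("toyota").length);
```

The indexes have to be rebuilt or updated whenever the main array changes, which is the price of the faster lookup.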

Querying data in mongodb using where clause [duplicate]

I want to perform a query on this collection to determine which documents have any keys in things that match a certain value. Is this possible?
I have a collection of documents like:
{
"things": {
"thing1": "red",
"thing2": "blue",
"thing3": "green"
}
}
EDIT: for conciseness
If you don't know what the keys will be and you need it to be interactive, then you'll need to use the (notoriously performance challenged) $where operator like so (in the shell):
db.test.find({$where: function() {
for (var field in this.things) {
if (this.things[field] == "red") return true;
}
return false;
}})
If you have a large collection, this may be too slow for your purposes, but it's your only option if your set of keys is unknown.
MongoDB 3.6 Update
You can now do this without $where by using the $objectToArray aggregation operator:
db.test.aggregate([
// Project things as a key/value array, along with the original doc
{$project: {
array: {$objectToArray: '$things'},
doc: '$$ROOT'
}},
// Match the docs with a field value of 'red'
{$match: {'array.v': 'red'}},
// Re-project the original doc
{$replaceRoot: {newRoot: '$doc'}}
])
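To see why matching on 'array.v' works, here is a plain JavaScript imitation of the pipeline. It is not MongoDB code: the local objectToArray function only mimics the shape that $objectToArray produces (an array of documents with k and v fields), and the sample documents are made up for illustration.

```javascript
// Imitates the output shape of $objectToArray: { a: 1 } -> [{ k: "a", v: 1 }].
function objectToArray(obj) {
  return Object.entries(obj).map(([k, v]) => ({ k, v }));
}

// Made-up sample documents.
const sampleDocs = [
  { _id: 1, things: { thing1: "red", thing2: "blue", thing3: "green" } },
  { _id: 2, things: { thing1: "yellow" } },
];

// Equivalent of the $match stage: keep documents where any v equals 'red'.
const redDocs = sampleDocs.filter((doc) =>
  objectToArray(doc.things).some(({ v }) => v === "red")
);

console.log(objectToArray(sampleDocs[0].things));
console.log(redDocs.map((doc) => doc._id));
```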
I'd suggest a schema change so that you can actually do reasonable queries in MongoDB.
From:
{
"userId": "12347",
"settings": {
"SettingA": "blue",
"SettingB": "blue",
"SettingC": "green"
}
}
to:
{
"userId": "12347",
"settings": [
{ name: "SettingA", value: "blue" },
{ name: "SettingB", value: "blue" },
{ name: "SettingC", value: "green" }
]
}
Then, you could index on "settings.value", and do a query like:
db.settings.createIndex({ "settings.value" : 1}) // ensureIndex() in MongoDB versions before 3.0
db.settings.find({ "settings.value" : "blue" })
The change really is simple ..., as it moves the setting name and setting value to fully indexable fields, and stores the list of settings as an array.
If you can't change the schema, you could try @JohnnyHK's solution, but be warned that it's basically the worst case in terms of performance and it won't work effectively with indexes.
Sadly, none of the previous answers address the fact that mongo can contain nested values in arrays or nested objects.
THIS IS THE CORRECT QUERY:
{$where: function() {
var deepIterate = function (obj, value) {
for (var field in obj) {
if (obj[field] == value){
return true;
}
var found = false;
if ( typeof obj[field] === 'object') {
found = deepIterate(obj[field], value)
if (found) { return true; }
}
}
return false;
};
return deepIterate(this, "573c79aef4ef4b9a9523028f")
}}
Since typeof returns 'object' for both arrays and nested objects, the query recurses into every nested element until it finds a field with the given value.
You can check previous answers with a nested value and the results will be far from desired.
Stringifying the whole object is a hit on performance, since the resulting string has to be scanned character by character to find a match. It also creates a copy of the object as a string in RAM, which is inefficient on both counts: the query uses more memory, and it is slower because the function context already has the object loaded.
The query itself can work with objectId, string, int and any basic javascript type you wish.

Methodology Approach - json string/output

Hope this question is up to par for SO. I've never posted before but looking for some info in regards to approaching a task I have been assigned.
I have a page with Form information, some simple fields that I've got going.
The page outputs a json string of the form fields and writes them to a text file.
What I need to do is use the key: values to fill in a bash script to output to a hotfolder for ingestion into the production environment.
The "addmore" array needs to determine the number of filename fields produced. I should add that this is dynamic, the "addmore" array may contain 1 entry, or up to technically an unlimited number. This number will fluctuate, though.
JSON Output:
{"formatcode":"JFM","serialnumber":"555","callletters":"555","rotator":"555","addmore":["555","444","333",""]}
How can I use the key : value pairs to output this into a file like below:
{"type":"sequential","items":[
{"filename": "/assets/$formatcode/$serialnumber/$addmore1", "description":"First item"},
{"filename": "/assets/$formatcode/$serialnumber/$addmore2", "description": "Second item"},
{"filename": "/assets/$formatcode/$serialnumber/$addmore3", "description": "Third item"}
]}
This should do the trick, I'm using ES6 string templates to create the dynamic filename and description. The resulting object should have addmore.length items, so we reduce based on data.addmore, creating the basic object as our starting point ({ type: "sequential", items: [] }) and pushing the dynamic strings into output.items on each iteration.
const data = {
"formatcode":"JFM",
"serialnumber":"555",
"callletters":"555",
"rotator":"555",
"addmore":[
"555",
"444",
"333",
]
};
const desiredOutput = data.addmore.reduce((p, c, index) => {
p.items.push({
filename: `/assets/${data.formatcode}/${data.serialnumber}/${c}`,
description: `Item ${index + 1}`
});
return p;
}, {
type: "sequential",
items: []
});
console.log(desiredOutput);
If you don't have ES6 available, you can turn the arrow function into a regular function and use the + concat operator instead of string templates.
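For reference, an ES5 version along those lines might look like this. The filter step that drops empty entries is an addition of this sketch, based on the trailing "" in the question's addmore array; remove it if empty entries should be kept.

```javascript
var formData = {
  "formatcode": "JFM",
  "serialnumber": "555",
  "callletters": "555",
  "rotator": "555",
  "addmore": ["555", "444", "333", ""]
};

var es5Output = formData.addmore
  // Assumption: empty entries come from unused form fields and should be skipped.
  .filter(function (entry) { return entry !== ""; })
  .reduce(function (out, entry, index) {
    out.items.push({
      filename: "/assets/" + formData.formatcode + "/" + formData.serialnumber + "/" + entry,
      description: "Item " + (index + 1)
    });
    return out;
  }, { type: "sequential", items: [] });

console.log(JSON.stringify(es5Output, null, 2));
```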

Live filtering of json

This Meteor client app "mostly smartPhone usage" needs to accept input from user in the form of typed text and filter json data of about 500 bottom nodes, giving back the branches of the tree where the bottom node text contains the user input text.
{
"people": {
"honour": [
[
"family"
],
[
"friends"
]
],
"respect": [
[
"one another"
]
]
},
"animals": {
"eat": [
[
"row food"
]
]
}
}
When the user inputs 'a', the code needs to give the tree where the occurrence exists:
people, honour, family.
people, respect, one another
When the user types 'o', output should be:
people, respect, one another.
animals, eat, row food.
When the user types 'oo', output should be:
animals, eat, row food.
when the user types 'f', output should be:
people, honour, family.
people, honour, friends.
animals, eat, row food.
My options are:
Converting the json to a javascript object and writing the search/find/match logic with a few loops.
Use defiantjs which I never used before and have to learn.
Import the json to mongodb and filter the database.
Whatever else you suggest.
Which would be the best fit for fast results and ease of maintenance? Thanks
OK, this question was an excuse for me to create a generic Object method, Object.prototype.paths(), to get all the paths within an object. An object holds many values, each reachable by a path, and the same value might sit at the end of several different paths. We will generate an object that has the original object's values as properties, and each property's value is the list of paths to it: an array of string arrays, where each string array is a single path to that value.
So once we have this tool to map the object values and paths, it becomes very easy to get your result.
Object.prototype.paths = function(root = [], result = {}) {
var ok = Object.keys(this);
return ok.reduce((res,key) => { var path = root.concat(key);
typeof this[key] === "object" &&
this[key] !== null ? this[key].paths(path,res)
: res[this[key]] == 0 || res[this[key]] ? res[this[key]].push(path)
: res[this[key]] = [path];
return res;
},result);
};
var data = {"people":{"honour":[["family"],["friends"]],"respect":[["one another"],["friends"]]},"animals":{"eat":[["row food"]]}},
paths = data.paths(),
values = Object.keys(paths),
keystr = document.getElementById("keystr"),
getPaths = function(str){
var valuesOfInterest = values.filter(f => f.includes(str));
return valuesOfInterest.reduce((p,c) => p.concat({[c]: paths[c]}),[]);
};
keystr.oninput = function(e){
console.log(JSON.stringify(getPaths(e.target.value),null,2))
}
<input id="keystr" placeholder = "enter some characters" value = ""/>
So when you press "o" you will get the following
[
{
"one another": [
[
"people",
"respect",
"0",
"0"
]
]
},
{
"row food": [
[
"animals",
"eat",
"0",
"0"
]
]
}
]
Which means:
The outer array has 2 object items. This means that in the original
object there are two values with "o" character/s in it. "one another" and "row food",
"one another" has only one path ["people", "respect", "0", "0"]. If "one another" was listed at multiple places like "friends" listed both under "respect" and "honour" then this array would contain two sub arrays with paths. Type "fr" and see it for yourself.
A few words of warning: we should be cautious when playing with the Object prototype. Our modification should have the descriptor enumerable = false, or it will show up in for in loops and, for instance, jQuery will not work (this is how silly jQuery is, since apparently it does not make a hasOwnProperty check in its for in loops). So we have to add this Object method with Object.defineProperty() to make it enumerable = false. But for the sake of simplicity, and to stay within the scope of the question, I haven't included that part in the code.
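The Object.defineProperty() variant mentioned above might look like the sketch below. It is a rewrite for illustration rather than the exact code from this answer: the body uses a plain loop, and the result object is created with Object.create(null) so that values such as "constructor" cannot collide with inherited properties. The sample data is a shortened version of the one used above.

```javascript
Object.defineProperty(Object.prototype, "paths", {
  value: function (root = [], result = Object.create(null)) {
    for (const key of Object.keys(this)) {
      const path = root.concat(key);
      const value = this[key];
      if (typeof value === "object" && value !== null) {
        value.paths(path, result); // recurse into nested objects and arrays
      } else {
        // Group paths by the value found at their end.
        (result[value] = result[value] || []).push(path);
      }
    }
    return result;
  },
  enumerable: false, // keeps "paths" out of for...in loops
  writable: true,
  configurable: true,
});

const pathDemo = {
  people: {
    honour: [["family"], ["friends"]],
    respect: [["one another"], ["friends"]],
  },
};

console.log(pathDemo.paths()["friends"]);
```

Because the property is non-enumerable, a for in loop over any object still lists only that object's own enumerable keys.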
I think a few loops would work just fine. In the example below, as you type in the input, the results matching your search are logged to the console.
$("#search").on("input", function() {
var result = [];
var search = this.value;
if (search.length) {
$.each(data, function(key1, value1) {
//key1: people, animals
$.each(value1, function(key2, value2) {
//key2: honour, respect, eat
$.each(value2, function(i, leaf) {
if (leaf.length && leaf[0].indexOf(search) >= 0) {
//found a match, push it onto the result
var obj = {};
obj[key1] = {};
obj[key1][key2] = leaf;
result.push(obj);
}
});
});
});
}
console.log(result);
});
var data = {
"people": {
"honour": [
[
"family"
],
[
"friends"
]
],
"respect": [
[
"one another"
]
]
},
"animals": {
"eat": [
[
"row food"
]
]
}
};
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<input id="search" />
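The same idea without jQuery and without hard-coding three levels of nesting could be sketched as a small recursive function. The name findLeafPaths is hypothetical. It returns one array per matching leaf, made of the object keys leading to it followed by the leaf text, which is the "people, honour, family" shape asked for in the question.

```javascript
// Hypothetical helper: collect [key, key, ..., leafText] for every string leaf
// that contains `search`. Array indices are left out of the trail.
function findLeafPaths(node, search, trail = []) {
  if (typeof node === "string") {
    return node.includes(search) ? [trail.concat(node)] : [];
  }
  if (node === null || typeof node !== "object") return [];
  return Object.entries(node).flatMap(([key, child]) =>
    findLeafPaths(child, search, Array.isArray(node) ? trail : trail.concat(key))
  );
}

const tree = {
  people: {
    honour: [["family"], ["friends"]],
    respect: [["one another"]],
  },
  animals: {
    eat: [["row food"]],
  },
};

console.log(findLeafPaths(tree, "oo").map((path) => path.join(", ")));
```

With about 500 leaves, running this on every keystroke should be cheap enough, but that is an estimate rather than a measurement.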
Use this package for Meteor; it is awesome: https://atmospherejs.com/matteodem/easy-search
