Best practice creating a key/value map dev/prod node.js - javascript

I have a Node.js app, APP-A, that communicates with another C# app, APP-B, using APP-B's API. APP-B has a RESTful API that returns JSON. Apart from a few standard fields (e.g., name and description), APP-B's keys are defined when the user creates the field in the system. The resulting JSON looks like this:
{
  "name": "An example name",
  "description": "Description for the example",
  "cust_fields": {
    "cust_123": "Joe Bloggs",
    "cust_124": "Essex"
  }
}
I have two instances of APP-B, a dev and prod environment, which are separate installations. As a result, the JSON from the prod environment is as above, and the JSON from the dev environment looks like this:
{
  "name": "An example name",
  "description": "Description for the example",
  "cust_fields": {
    "cust_782": "Joe Bloggs",
    "cust_793": "Essex"
  }
}
This is dealt with in APP-A (the Node.js app) by having a JSON map like this:
{
  "name": "name",
  "description": "description",
  "cust_fields": {
    "full_name": "cust_123",
    "city": "cust_124"
  }
}
Which is loaded like this:
var map;
switch (env) {
  case 'dev':
    map = require('../env/dev/map.json');
    break;
  case 'prod':
    map = require('../env/prod/map.json');
    break;
}
module.exports = {
  name: map.name,
  description: map.description,
  cust_fields: {
    full_name: map.cust_fields.full_name,
    city: map.cust_fields.city
  }
};
So I am wondering: is there a better way of dealing with this? I don't see a way around creating some kind of manual relationship between the key names across prod and dev, as there is no way to find out which field corresponds to which, but it seems like a lot of work.
Thanks for reading.
Update:
I have created a jsFiddle to better illustrate my question: http://jsfiddle.net/7k9k03o6.

If the mapping is unavoidable and everything is done manually right now, the next best progression would be to automate the building of those lookup maps through some persistent storage, e.g. a database.
The general flow would be:
When APP-B creates a new form, the field information is stored in the database with all its identifying information. You could store production and dev data in the same DB (distinguished by a flag), but more likely they would just be different databases. The structure might be: customerId, formId, fieldName, fieldMapping, fieldValue, isProduction --> 123, 2, 'cust_124', 'city', 'Essex', true.
When APP-A needs a field listing, it queries the DB for the relevant field lists. "Find the mapping for customer X for form Y in production" --> WHERE custId = 123 AND formId = 2 AND isProduction = true would yield a list of fields and their mapping values (which you would post-process/reduce into the mapping you need).
This automated process leaves you less manual work, and you shouldn't accidentally miss or forget a mapping the way you can with a hand-generated file.
This adds a tiny bit of work to server processing, as you'll need the field mapping from the DB every time a request is processed. (You could back off a bit and do one big query each time a customer is loaded, or, further back, each time the server starts... it depends how dynamic these custom fields are.) Plus you would have to map the DB results into a usable listing for your purposes.
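That last step is a small reduce. As a sketch (assuming a promise-based db client; the table and column names are just the illustrative ones above):

// Sketch: turn mapping rows from the DB into the lookup map APP-A uses.
// db.query and the field_mappings table/columns are assumptions.
function buildFieldMap(db, custId, formId, isProduction) {
  return db.query(
    'SELECT fieldName, fieldMapping FROM field_mappings ' +
    'WHERE custId = ? AND formId = ? AND isProduction = ?',
    [custId, formId, isProduction]
  ).then(function (rows) {
    // rows e.g. [{ fieldName: 'cust_124', fieldMapping: 'city' }, ...]
    return rows.reduce(function (map, row) {
      map[row.fieldMapping] = row.fieldName; // { city: 'cust_124', ... }
      return map;
    }, {});
  });
}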
Depending on how many customers and custom forms you are monitoring, an automated process will save you a lot of time and avoid the mistakes that come with anything hand-generated.

Related

How to use my JSON file to build a NoSQL database?

I have a JSON file which is 3000+ lines. What I'd like to do is create a NoSQL database with the same structure (it has embedded documents between 3 and 5 levels deep). But I want to add information to each level and create a schema for each item, so that I can go back at a later stage and update the information fields, and even have users log in and change their own values.
I am using JavaScript to write a script that will iterate through the file and upload to MongoDB the schema that I want, based on the information at each level. But I'm struggling to write the code that does this efficiently. At this stage, I'm just wasting too much time trying this and that, and want to move on to the next step of my site.
Below is an example of the file. Basically, it's a bunch of embedded documents, and then at the final level (which will be at a different depth depending on which document it's in), there is an array where each of the fields is a string.
How can I use this data to create a MongoDB database while adding a schema to each item, but keeping the hierarchical nature of the documents? I want all of the documents to have one schema, and then each of the strings at the final depth to have its own, separate schema as well. I can't think of an efficient way to iterate through it.
Example from the JSON file:
{
  "Applied Sciences": {
    "Agriculture": {
      "Agricultural Economics": [
        "Agricultural Environment And Natural Resources",
        "Developmental Economics",
        "Food And Consumer Economics",
        "Production Economics And Farm Management"
      ],
      "Agronomy": [
        "Agroecology",
        "Biotechnology",
        "Plant Breeding",
        "Soil Conservation",
        "Soil Science",
        "Theoretical Modeling"
      ],
Here's my schema for all but the strings at the end:
{
  name: String,
  completed: Boolean,
  category: "Field",
  items: {
    type: Array
  },
  description: String,
  resources: {
    type: Array
  }
}
And here is my rough code, which at this stage just iterates through. I'm trying to use the same recursive call to create the arrays in the schema, but I'm not up to that stage yet because I can't even iterate through properly:
function createDatabase(data) {
  var items = {};
  // recurse into every nested object (arrays are objects too)
  for (var field in data) {
    if (typeof data[field] === "object" && data[field] !== null) {
      items[field] = createDatabase(data[field]);
    }
  }
  return items;
}
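For what it's worth, here is a minimal sketch of the kind of recursion that could build those documents, assuming a single hypothetical Mongoose model Field based on the schema above, and treating arrays of strings as the leaf level:

// A sketch only: the Field model and all names are illustrative assumptions.
var mongoose = require('mongoose');

var fieldSchema = new mongoose.Schema({
  name: String,
  completed: Boolean,
  category: { type: String, default: 'Field' },
  items: Array,
  description: String,
  resources: Array
});
var Field = mongoose.model('Field', fieldSchema);

function buildDoc(node, name) {
  if (Array.isArray(node)) {
    // Leaf level: each string becomes its own item.
    return { name: name, category: 'Field', items: node.map(function (s) {
      return { name: s, completed: false };
    }) };
  }
  // Object level: recurse into each child, keeping the hierarchy in items.
  var items = Object.keys(node).map(function (key) {
    return buildDoc(node[key], key);
  });
  return { name: name, category: 'Field', items: items };
}

// Usage: Field.create(buildDoc(data, 'root')).then(function (doc) { ... });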

Firebase Realtime Database - Best practice for additional data nodes (lookups and initial load)

I have a Firebase Realtime Database with the below structure.
I wish to fetch all "notes" a user has access to and, initially, only show titles for the notes.
notes: {
  "noteId-1345": {
    "access": {
      "author": "1234567890",
      "members": {
        "1234567890": 0,  // <--- author
        "0987654321": 1   // <--- member
      }
    },
    "data": {
      "title": "Hello",
      "content": "Konichiwa!",
      "comment": "123"
    }
  }
}
To be able to fetch all notes a user has access to I have expanded my data model by keeping an additional user_notes node in the root:
Whenever I associate a user (update of members) with a note, I write that relation both in /notes/$noteid and in /user_notes/$uid.
user_notes: {
  "$uid": {
    "noteId-1345": true
  }
}
When fetching initial data I only need the "notes" the user has access to - including titles.
(I only fetch the entire "note" if a user wants to view a complete "note")
I begin by fetching the ids for notes the user has access to and then I have to do another lookup to fetch the titles.
let titles = []
database.ref(`user_notes/${uid}`)
  .on('value', (snaps) => {
    snaps.forEach((snap) => {
      const noteId = snap.key
      database.ref(`notes/${noteId}/data/title`).on('value', (noteSnap) => {
        const title = noteSnap.val()
        titles.push(title)
      })
    })
  })
Is this the most efficient approach? - It seems inefficient to do double lookups.
Should I store title, and other data needed for initial load, in the user_notes node as well to avoid double lookups?
What is considered to be best practice in cases like this when using a NoSQL database?
Kind regards /K
What you're doing is indeed the common approach. It is not nearly as slow as you may initially think, since Firebase pipelines the requests over a single connection.
A few things to consider:
I'd typically move the members for each note under a top-level node note_members. Separating the types of data typically makes it much easier to keep your security rules reasonable, and is documented under keep your data structure flat.
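For example, a sketch of that flattened structure, mirroring the members map from the question:

note_members: {
  "noteId-1345": {
    "1234567890": 0,
    "0987654321": 1
  }
}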
If you'd like to get rid of the lookup, you can consider storing the title of each note under each user_notes node where you have the ID. You'd essentially replace the true with the name:
user_notes: {
  "$uid": {
    "noteId-1345": "Hello"
  }
}
This simplifies the lookup code (its main advantage) and makes it a bit faster.
This sort of data duplication is quite common when using Firebase and other NoSQL databases: you trade write complexity and extra data storage for read simplicity and scalability.
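To keep the duplicated title in sync, writes are typically fanned out to every location that holds a copy, in a single atomic multi-location update. A minimal sketch, assuming the structures above:

// Fan-out write: set the note's title and the duplicated copy under
// user_notes in one atomic update() call; uid and noteId are illustrative.
function setNoteTitle(database, uid, noteId, title) {
  const updates = {}
  updates[`notes/${noteId}/data/title`] = title
  updates[`user_notes/${uid}/${noteId}`] = title // duplicated copy
  return database.ref().update(updates)
}

With several members you would add one user_notes entry per member uid.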

How to do a simple join in GraphQL?

I am very new in GraphQL and trying to do a simple join query. My sample tables look like below:
{
  phones: [
    {
      id: 1,
      brand: 'b1',
      model: 'Galaxy S9 Plus',
      price: 1000,
    },
    {
      id: 2,
      brand: 'b2',
      model: 'OnePlus 6',
      price: 900,
    },
  ],
  brands: [
    {
      id: 'b1',
      name: 'Samsung'
    },
    {
      id: 'b2',
      name: 'OnePlus'
    }
  ]
}
I would like to have a query to return a phone object with its brand name in it instead of the brand code.
E.g., if queried for the phone with id = 2, it should return:
{id: 2, brand: 'OnePlus', model: 'OnePlus 6', price: 900}
TL;DR
Yes, GraphQL does support a sort of pseudo-join. You can see the books and authors example below running in my demo project.
Example
Consider a simple database design for storing info about books:
create table Book ( id string, title string, pageCount int, authorId string );
create table Author ( id string, firstName string, lastName string );
Because an Author can write many Books, the database model puts them in separate tables. Here is the GraphQL schema:
type Query {
  bookById(id: ID): Book
}

type Book {
  id: ID
  title: String
  pageCount: Int
  author: Author
}

type Author {
  id: ID
  firstName: String
  lastName: String
}
Notice there is no authorId on the Book type; instead it has a field author of type Author. The database's authorId column on the book table is not exposed to the outside world. It is an internal detail.
We can pull back a book and its author using this GraphQL query:
{
  bookById(id: "book-1") {
    id
    title
    pageCount
    author {
      firstName
      lastName
    }
  }
}
Running that query against my demo project, the result nests the Author details:
{
  "data": {
    "bookById": {
      "id": "book-1",
      "title": "Harry Potter and the Philosopher's Stone",
      "pageCount": 223,
      "author": {
        "firstName": "Joanne",
        "lastName": "Rowling"
      }
    }
  }
}
The single GQL query resulted in two separate fetch-by-id calls into the database. When a single logical query turns into multiple physical queries we can quickly run into the infamous N+1 problem.
The N+1 Problem
In our case above, a book can only have one author. If we only query one book by ID, we only get a "read amplification" of 2x against our database. Imagine if you could query books whose title starts with a prefix:
type Query {
  booksByTitleStartsWith(titlePrefix: String): [Book]
}
Then we call it asking it to fetch the books with a title starting with "Harry":
{
  booksByTitleStartsWith(titlePrefix: "Harry") {
    id
    title
    pageCount
    author {
      firstName
      lastName
    }
  }
}
In this GQL query we fetch the books with a single database query of title like 'Harry%', which returns many books, including the authorId of each. The engine then makes an individual fetch by ID for every author of every book. That is a total of N+1 queries: the 1 query pulls back N records, and we then make N separate fetches to build up the full picture.
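As an aside, a common mitigation in JavaScript GraphQL servers, distinct from the schema change discussed next, is request-scoped batching with the DataLoader library. A sketch, where findAuthorsByIds is a hypothetical data-access helper that issues one WHERE id IN (...) query:

const DataLoader = require('dataloader')

// Batches every authorLoader.load(id) call made while resolving one request
// into a single findAuthorsByIds call. findAuthorsByIds is assumed.
const authorLoader = new DataLoader(async (ids) => {
  const rows = await findAuthorsByIds(ids)
  // DataLoader requires results in the same order as the requested keys
  return ids.map((id) => rows.find((row) => row.id === id))
})

// In the Book.author resolver:
// author: (book, args, context) => authorLoader.load(book.authorId)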
The easy fix for that example is to not expose a field author on Book, and to force the person using your API to fetch the authors in a separate query, authorsByIds, so we give them two queries:
type Query {
  booksByTitleStartsWith(titlePrefix: String): [Book]  # <- single database call
  authorsByIds(authorIds: [ID]): [Author]              # <- single database call
}
type Book {
  id: ID
  title: String
  pageCount: Int
}

type Author {
  id: ID
  firstName: String
  lastName: String
}
The key thing to note about that last example is that there is no way in that model to walk from one entity type to another. If the person using your API wants to load the books' authors at the same time, they simply call both queries in a single post:
query {
  booksByTitleStartsWith(titlePrefix: "Harry") {
    id
    title
  }
  authorsByIds(authorIds: ["author-1", "author-2", "author-3"]) {
    id
    firstName
    lastName
  }
}
Here the person writing the query (perhaps using JavaScript in a web browser) sends a single GraphQL post to the server asking for both booksByTitleStartsWith and authorsByIds to be passed back at once. The server can now make two efficient database calls.
This approach shows that there is "no magic bullet" for how to map the "logical model" to the "physical model" when it comes to performance. This is known as the Object–relational impedance mismatch problem. More on that below.
Is Fetch-By-ID So Bad?
Note that the default behaviour of GraphQL is still very helpful. You can map GraphQL onto anything: internal REST APIs, some types in a relational database and other types in a NoSQL database. These can live in the same schema behind the same GraphQL end-point. There is no reason why you cannot have Author stored in Postgres and Book stored in MongoDB. This is because GraphQL does not, by default, "join in the datastore"; it fetches each type independently and builds the response in memory to send back to the client. It may be that your model only joins against a small dataset that gets very good cache hits, in which case you can add caching to your system, never hit the problem, and benefit from all the advantages of GraphQL.
What About ORM?
There is a project called Join Monster which looks at your database schema and the runtime GraphQL query, and tries to generate efficient database joins on-the-fly. That is a form of Object Relational Mapping, which sometimes gets a lot of "OrmHate", mainly due to the Object-relational impedance mismatch problem.
In my experience, any ORM works if you write the database model to exactly support your object API. In my experience, any ORM tends to fail when you have an existing database model that you try to map with an ORM framework.
IMHO, if the data model was optimised without thinking about ORM or queries (for example, to conserve space in classical third normal form), then avoid ORM. My recommendation there is to avoid querying the main data model directly and use the CQRS pattern. See below for an example.
What Is Practical?
If you do want to use pseudo-joins in GraphQL but you hit an N+1 problem, you can write code to map specific "field fetches" onto hand-written database queries. Carefully performance test using realistic data whenever any field returns an array.
Even when you can put in hand-written queries, you may hit scenarios where those joins don't run fast enough. In that case, consider the CQRS pattern and denormalise some of the data model to allow for fast lookups.
Update: GraphQL Java "Look-Ahead"
In our case we use graphql-java and pure configuration files to map DataFetchers to database queries. There is some generic logic that looks at the graph query being run and calls parameterized SQL queries held in a custom configuration file. We saw the article Building efficient data fetchers by looking ahead, which explains that you can inspect at runtime what the person who wrote the query selected to be returned. We can use that to "look ahead" at what other entities we will be asked to fetch to satisfy the entire query. At that point we can join the data in the database and pull it all back efficiently in a single database call. The graphql-java engine will still make N in-memory fetches to our code, but the N requests to get the author of each book are satisfied by simple lookups in a hashmap that we loaded from the single database call that joined the author table to the books table, returning N complete rows efficiently.
Our approach might sound a little like ORM, yet we made no attempt to make it intelligent. The developer creating the API and our custom configuration files has to decide which GraphQL queries will be mapped to which database queries. Our generic logic just "looks ahead" at what the runtime GraphQL query actually selects in total, to understand all the database columns it needs to load out of each row returned by the SQL to build the hashmap. Our approach can only handle parent-child-grandchild style trees of data, yet this is a very common use case for us. The developer making the API still needs to keep a careful eye on performance, and to adapt both the API and the custom mapping files to avoid poor performance.
GraphQL as a query language on the front-end does not support 'joins' in the classic SQL sense.
Rather, it allows you to pick and choose which fields in a particular model you want to fetch for your component.
To query all phones in your dataset, your query would look like this:
query myComponentQuery {
  phone {
    id
    brand
    model
    price
  }
}
The GraphQL server that your front-end is querying would then have individual field resolvers - telling GraphQL where to fetch id, brand, model etc.
The server-side resolver would look something like this:
Phone: {
id(root, args, context) {
pg.query('Select * from Phones where name = ?', ['blah']).then(d => {/*doStuff*/})
//OR
fetch(context.upstream_url + '/thing/' + args.id).then(d => {/*doStuff*/})
return {/*the result of either of those calls here*/}
},
price(root, args, context) {
return 9001
},
},
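Applied to the question's data, the pseudo-join the asker wants would live in a resolver on the Phone type that swaps the brand code for the brand name. Here is a minimal sketch; context.db is an assumed in-memory stand-in for whatever data source you actually use:

const resolvers = {
  Query: {
    phone(root, args, context) {
      // ID arguments arrive as strings, so compare as strings
      return context.db.phones.find((p) => String(p.id) === String(args.id))
    },
  },
  Phone: {
    brand(phone, args, context) {
      // resolve the brand code (e.g. 'b2') to its display name
      const brand = context.db.brands.find((b) => b.id === phone.brand)
      return brand ? brand.name : null
    },
  },
}

With that in place, querying phone(id: 2) { id brand model price } returns the brand as 'OnePlus' rather than 'b2'.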

Angular.js accessing and displaying nested models efficiently

I'm building a site at the moment where there are many relational links between data. As an example, users can make bookings, which will have booker and bookee, along with an array of messages which can be attached to a booking.
An example json would be...
booking = {
  id: 1,
  location: 'POST CDE',
  desc: "Awesome stackoverflow description.",
  booker: {
    id: 1, fname: 'Lawrence', lname: 'Jones',
  },
  bookee: {
    id: 2, fname: 'Stack', lname: 'Overflow',
  },
  messages: [
    { id: 1, mssg: 'For illustration only' }
  ]
}
Now my question is, how would you model this data in your angular app? And, while very much related, how would you pull it from the server?
As I can see it I have a few options.
Pull everything from the server at once
Here I would rely on the server to serialize the nested data and just use the given json object. Downsides are that I don't know which users will be involved when requesting a booking or similar object, so I can't cache them, and I'll therefore be pulling a large chunk of data with every request.
Pull the booking with booker/bookee as user ids
For this I would use promises for my data models, and have the server return an object such as...
booking = {
  id: 1,
  location: 'POST CDE',
  desc: "Awesome stackoverflow description.",
  booker: 1, bookee: 2,
  messages: [1]
}
Which I would then pass to a Booking constructor, which would resolve the relevant ids (booker, bookee and messages) into data objects via their respective factories.
The disadvantages here are that many ajax requests are used for a single booking request, though it gives me the ability to cache user/message information.
In summary, is it better practice to rely on a single ajax request to collect all the nested information at once, or on various requests to 'flesh out' the initial response after the fact?
I'm using Rails 4 if that helps (maybe Rails would be more suited to a single request?)
I'm going to use a system where I can hopefully have the best of both worlds, by creating a base class for all my resources with a custom resolve function that knows which fields in that particular class may require resolving. A sample resource would look like this...
class Booking
  # other methods...
  resolve: ->
    booking = this
    User
      .query(booking.booker, booking.bookee)
      .then (users) ->
        [booking.booker, booking.bookee] = users
Where it will pass the value of the booker and bookee fields to the User factory, which will have a constructor like so...
class User
  # other methods
  constructor: (data) ->
    user = this
    if not isNaN(id = parseInt data, 10)
      User.get(data).then (data) ->
        angular.extend user, data
    else angular.extend this, data
If the value passed to the User constructor can be parsed into a number (so this will happily take string ids as well as numerical ones), then it will use the User factory's get function to retrieve the data from the server (or through a caching system; the implementation lives inside the get function itself). If however the value cannot be parsed into a number, then I'll assume the User has already been serialized and just extend this with the value.
So the caching is invisible to the caller and independent of how the server returns the nested objects. It allows for modular ajax requests and avoids redownloading unnecessary data via its caching system.
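As a sketch of what that caching get might look like, in plain JavaScript rather than CoffeeScript, using Angular's $http and $q (the app module, cache shape and URL are assumptions):

app.factory('User', function ($http, $q) {
  var cache = {};
  return {
    get: function (id) {
      // cache hit: hand back an already-resolved promise
      if (cache[id]) return $q.when(cache[id]);
      return $http.get('/users/' + id).then(function (res) {
        cache[id] = res.data; // remember for subsequent lookups
        return res.data;
      });
    }
  };
});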
Once everything is up and running I'll write some tests to see whether the application would be better served with larger, chunked ajax requests or smaller, modular ones like the above. Either way, this lets you pass all model data through your angular factories, so you can rely on every record having inherited any prototype methods you may want to use.

BreezeJS Database Connection Security

Is interacting with the database in BreezeJS secure?
For example, if I use the following code, it clearly shows the Database name, tables and the query itself directly in the javascript. Does it make a secure connection to the database?
var manager = new breeze.EntityManager('api/northwind');
var query = new breeze.EntityQuery()
  .from("Employees");
manager.executeQuery(query).then(function (data) {
  ko.applyBindings(data);
}).fail(function (e) {
  alert(e);
});
The line "var manager = new breeze.EntityManager('api/northwind');" doesn't says anything about the database. It is the route to the MVC controller ( webapi in this case ).
And the line "var query = new breeze.EntityQuery().from("Employees");" does not have any relation to the database, it's the name of a methd in you controller.
Considering that you can use the mechanisms MVC provides to secure the controller (like the Authorize attribute), I don't see any risk in using Breeze.
The security of breeze.js in the end falls to the server-side language used to actually run the queries. As the docs show, it's mainly aimed at ASP.NET.
Checking the TODO sample, doing an action calls /api/todos/SaveChanges with a payload of:
{
  "entities": [{
    "Id": 2908,
    "Description": "Wine",
    "CreatedAt": "2012-08-22T09:06:00.000Z",
    "IsDone": true,
    "IsArchived": false,
    "entityAspect": {
      "entityTypeName": "TodoItem:#Todo.Models",
      "entityState": "Modified",
      "originalValuesMap": {
        "IsDone": false
      },
      "autoGeneratedKey": {
        "propertyName": "Id",
        "autoGeneratedKeyType": "Identity"
      }
    }
  }],
  "saveOptions": {
    "allowConcurrentSaves": false
  }
}
The only sensitive thing there is the Id. Even if you don't use JavaScript, you still have to expose some data one way or another. I'm not saying this is the best way of doing it, but it does not have any immediate drawbacks that I can think of, at least not in the JS component.
It falls to the application (just like in any situation) to sanitize any input from users. This includes any AJAX calls, whether done with Breeze or not.
If you can comment with some of the ASP code used to sanitize/run the queries, we can offer more insight on the matter.
So, in summary: no issues. JavaScript by itself does NOT connect to the database, so it does not have any inherent security issues.
