Passing down arguments using Facebook's DataLoader

I'm using DataLoader for batching the requests/queries together.
In my loader function I need to know the requested fields so that I can run a SELECT field1, field2, ... FROM query instead of a SELECT * FROM query.
What would be the best approach with DataLoader to pass down the resolveInfo needed for this? (I use resolveInfo.fieldNodes to get the requested fields.)
At the moment, I'm doing something like this:
await someDataLoader.load({ ids, args, context, info });
and then in the actual loaderFn:
const loadFn = async options => {
  const ids = [];
  let args;
  let context;
  let info;
  options.forEach(a => {
    ids.push(a.ids);
    if (!args && !context && !info) {
      args = a.args;
      context = a.context;
      info = a.info;
    }
  });
  return Promise.resolve(await new DataProvider().get({ ...args, ids }, context, info));
};
but as you can see, it's hacky and doesn't really feel good...
Does anyone have an idea how I could achieve this?

I am not sure there is a good answer to this question, simply because Dataloader is not made for this use case, but I have worked extensively with Dataloader, written similar implementations and explored similar concepts in other programming languages.
Let's understand why Dataloader is not made for this use case and how we could still make it work (roughly as in your example).
Dataloader is not made for fetching a subset of fields
Dataloader is made for simple key-value lookups: given a key such as an ID, it loads the value behind it. For that it assumes that the object behind the ID stays the same until it is invalidated. This single assumption is what makes Dataloader powerful; without it, its three key features no longer work:
Batching requests (multiple requests are done together in one query)
Deduplication (requests to the same key twice result in one query)
Caching (consecutive requests of the same key don't result in multiple queries)
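To make the three features concrete, here is a deliberately tiny, synchronous stand-in for Dataloader. The class name MiniLoader and its manual dispatch() step are hypothetical; the real library returns Promises and schedules dispatch on its own. The sketch only shows the idea: queued keys go out in one batch, repeated keys are queued once, and later requests are answered from the cache.

```javascript
// MiniLoader: a hypothetical, synchronous illustration of Dataloader's core ideas.
// The real Dataloader returns Promises and dispatches automatically.
class MiniLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // receives unique keys, returns values in the same order
    this.cache = new Map(); // key -> value (caching)
    this.queue = [];        // keys waiting for the next batch (batching)
  }

  // Queue a key for the next batch.
  load(key) {
    if (!this.cache.has(key) && !this.queue.includes(key)) {
      this.queue.push(key); // deduplication: each key is queued at most once
    }
  }

  // Run one batch for everything queued and store the results in the cache.
  dispatch() {
    if (this.queue.length === 0) return;
    const keys = this.queue;
    this.queue = [];
    const values = this.batchFn(keys);
    keys.forEach((key, i) => this.cache.set(key, values[i]));
  }

  get(key) {
    return this.cache.get(key);
  }
}

// Usage: record every call the batch function receives.
const calls = [];
const loader = new MiniLoader(ids => {
  calls.push(ids);
  return ids.map(id => ({ id, name: `user${id}` }));
});

loader.load(1);
loader.load(2);
loader.load(1);    // duplicate key in the same batch
loader.dispatch(); // one call with [1, 2]
loader.load(2);    // already cached, nothing is queued
loader.dispatch(); // no new call
```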
This leads us to the following two important rules if we want to maximise the power of Dataloader:
Two different entities cannot share the same key, otherwise we might return the wrong entity. This sounds trivial, but it is not in your example. Let's say we want to load a user with ID 1 and the fields id and name. A little bit later (or at the same time) we want to load the user with ID 1 and the fields id and email. These are technically two different entities and they need different keys.
The same entity should always have the same key. Again, this sounds trivial but is not in your example. The user with ID 1 and fields id and name should be the same as the user with ID 1 and fields name and id (notice the order).
In short a key needs to have all the information needed to uniquely identify an entity but not more than that.
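Both rules matter because Dataloader's default cache is a Map keyed by whatever you pass to load(), so object keys are compared by identity, not by content. A small plain-JavaScript sketch; normalizeKey is a hypothetical helper doing the same normalization as the cacheKeyFn in this answer:

```javascript
// A Map compares object keys by identity, not by content.
const byObject = new Map();
byObject.set({ id: 1, fields: ['id', 'name'] }, 'user 1');
const objectHit = byObject.has({ id: 1, fields: ['id', 'name'] }); // a different object

// Normalize: include the fields (rule 1) and sort/dedupe them so order does not matter (rule 2).
function normalizeKey({ id, fields }) {
  const sorted = [...new Set(fields)].sort().join(';');
  return `${id}[${sorted}]`;
}

const byString = new Map();
byString.set(normalizeKey({ id: 1, fields: ['id', 'name'] }), 'user 1');

// Same entity, different field order: same key.
const sameEntityHit = byString.has(normalizeKey({ id: 1, fields: ['name', 'id'] }));
// Different field selection: different key.
const otherEntityHit = byString.has(normalizeKey({ id: 1, fields: ['id', 'email'] }));
```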
So how do we pass down fields to Dataloader?
await someDataLoader.load({ ids, args, context, info });
In your question you have provided a few more things to your Dataloader as a key. First, I would not put args and context into the key. Does your entity change when the context changes (e.g. because you are now querying a different database)? Probably yes, but do you want to account for that in your Dataloader implementation? I would instead suggest creating new Dataloaders for each request, as described in the docs.
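The per-request approach boils down to building fresh loaders in the function that creates each request's context, so no cache outlives a request and nothing request-specific needs to go into the key. A minimal sketch without the library; createLoaders, createContext, loadUser and the fake db are hypothetical stand-ins:

```javascript
// Hypothetical loader factory: every call gets its own, empty cache.
function createLoaders(db) {
  const userCache = new Map();
  return {
    userCache,
    loadUser(id) {
      if (!userCache.has(id)) userCache.set(id, db.findUser(id));
      return userCache.get(id);
    },
  };
}

// Called once per incoming GraphQL request (e.g. from your server's context option).
function createContext(db) {
  return { loaders: createLoaders(db) };
}

// Fake data source that counts lookups.
const db = {
  lookups: 0,
  findUser(id) {
    this.lookups += 1;
    return { id };
  },
};

const requestA = createContext(db);
const requestB = createContext(db);
requestA.loaders.loadUser(1);
requestA.loaders.loadUser(1); // served from requestA's cache
requestB.loaders.loadUser(1); // requestB starts with an empty cache
```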
Should the whole request info be in the key? No, but we need the fields that are requested. Apart from that, your implementation is wrong and would break when the loader is called with two different resolve infos: you only keep the resolve info from the first call, but it might be different for each key (think about the user example above). Ultimately we could arrive at the following implementation of a Dataloader:
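// Assumes the dataloader and graphql-fields npm packages are installed
const DataLoader = require('dataloader');
const graphqlFields = require('graphql-fields');
// This function creates unique cache keys for different selected
// fields
function cacheKeyFn({ id, fields }) {
  const sortedFields = [...new Set(fields)].sort().join(';');
  return `${id}[${sortedFields}]`;
}
function createLoaders(db) {
  const userLoader = new DataLoader(async keys => {
    // Create a set with all requested fields; always include id so that
    // rows can be matched back to their keys
    const fields = keys.reduce((acc, key) => {
      key.fields.forEach(field => acc.add(field));
      return acc;
    }, new Set(['id']));
    // Get all our ids for the DB query
    const ids = keys.map(key => key.id);
    // Please be aware of possible SQL injection, don't copy + paste
    // (assumes db.query resolves to an array of row objects)
    const rows = await db.query(`
      SELECT
        ${[...fields].join(', ')}
      FROM
        user
      WHERE
        id IN (${ids.join(', ')})
    `);
    // Dataloader expects exactly one result per key, in the order of keys
    const rowsById = new Map(rows.map(row => [row.id, row]));
    return keys.map(key => rowsById.get(key.id) || null);
  }, { cacheKeyFn });
  return { userLoader };
}
// now in a resolver (ctx holds the loaders created for this request)
resolve(parent, args, ctx, info) {
  // https://www.npmjs.com/package/graphql-fields
  return ctx.userLoader.load({ id: args.id, fields: Object.keys(graphqlFields(info)) });
}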
This is a solid implementation, but it has a few weaknesses. First, we overfetch a lot of fields when different keys in the same batch request different fields. Second, if we have fetched an entity with the cache key 1[id;name] (as produced by cacheKeyFn), we could also answer the keys 1[id] and 1[name] with that object, at least in JavaScript. Here we could build a custom map implementation and supply it to Dataloader through its cacheMap option; it would be smart enough to know these things about our cache.
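A rough sketch of such a map, assuming keys in the `id[field;field]` format produced by cacheKeyFn above. Dataloader's cacheMap option expects an object with get, set, delete and clear; the class name FieldAwareCache is hypothetical. It would be passed as `new DataLoader(batchFn, { cacheKeyFn, cacheMap: new FieldAwareCache() })`.

```javascript
// Hypothetical cacheMap for keys like "1[id;name]". A lookup for a subset of
// fields (e.g. "1[id]") is answered from any cached entry for the same id whose
// field set is a superset. Dataloader stores Promises as the values.
class FieldAwareCache {
  constructor() {
    this.entries = new Map(); // cache key -> cached value
  }

  static parse(cacheKey) {
    const open = cacheKey.indexOf('[');
    const inner = cacheKey.slice(open + 1, -1);
    return { id: cacheKey.slice(0, open), fields: inner === '' ? [] : inner.split(';') };
  }

  get(cacheKey) {
    if (this.entries.has(cacheKey)) return this.entries.get(cacheKey);
    const wanted = FieldAwareCache.parse(cacheKey);
    // Linear scan: fine for a sketch, too slow for large caches.
    for (const [storedKey, value] of this.entries) {
      const stored = FieldAwareCache.parse(storedKey);
      if (stored.id === wanted.id && wanted.fields.every(f => stored.fields.includes(f))) {
        return value;
      }
    }
    return undefined;
  }

  set(cacheKey, value) {
    this.entries.set(cacheKey, value);
    return this;
  }

  delete(cacheKey) {
    return this.entries.delete(cacheKey);
  }

  clear() {
    this.entries.clear();
  }
}
```

Note that delete only removes the exact key, so clearing "1[id]" would not evict a cached "1[id;name]"; a real implementation would need to decide how invalidation should work.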
Conclusion
We see that this is really a complicated matter. I know it is often listed as a benefit of GraphQL that you don't have to fetch all fields from a database for every query, but the truth is that in practice this is seldom worth the hassle. Don't optimise what is not slow. And even if it is slow, is it a bottleneck?
My suggestion is: write trivial Dataloaders that simply fetch all (needed) fields. If you have one client, it is very likely that for most entities the client fetches all fields anyway; otherwise they would not be part of your API, right? Then use something like query introspection to measure slow queries and find out which field exactly is slow. Then optimise only the slow thing (see for example my answer here that optimises a single use case). And if you are a big e-commerce platform, please don't use Dataloader for this. Build something smarter and don't use JavaScript.
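A trivial loader in that spirit keeps plain ids as keys (so Dataloader's default cache works unchanged), selects one fixed column list, and uses placeholders instead of string interpolation. The sketch below only builds the query and orders the rows; USER_COLUMNS, buildUserQuery, orderByKeys and the users table are hypothetical, and the $1, $2, ... placeholders and db.query(text, values) call assume a node-postgres-like client.

```javascript
// Hypothetical fixed column list: everything the API exposes for a user.
const USER_COLUMNS = ['id', 'name', 'email'];

// Build a parameterized query ($1, $2, ... placeholders, as in node-postgres).
function buildUserQuery(ids) {
  const placeholders = ids.map((_, i) => `$${i + 1}`).join(', ');
  return {
    text: `SELECT ${USER_COLUMNS.join(', ')} FROM users WHERE id IN (${placeholders})`,
    values: ids,
  };
}

// Dataloader requires one result per key, in key order; missing rows become null.
function orderByKeys(keys, rows) {
  const byId = new Map(rows.map(row => [row.id, row]));
  return keys.map(key => byId.get(key) ?? null);
}

// The batch function would then be (db is an assumed node-postgres-like client):
// new DataLoader(async ids => {
//   const { text, values } = buildUserQuery(ids);
//   const { rows } = await db.query(text, values);
//   return orderByKeys(ids, rows);
// });
```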

Related

Using Merge with a single Create call in FaunaDB is creating two documents?

Got a weird bug using FaunaDB with Node.js running in a Netlify Function.
I am building out a quick proof-of-concept and initially everything worked fine. I had a Create query that looked like this:
const faunadb = require('faunadb');
const q = faunadb.query;
const CreateFarm = (data) => (
  q.Create(
    q.Collection('farms'),
    { data },
  )
);
As I said, everything here works as expected. The trouble began when I tried to start normalizing the data FaunaDB sends back. Specifically, I want to merge the Fauna-generated ID into the data object, and send just that back with none of the other metadata.
I am already doing that with other resources, so I wrote a helper query and incorporated it:
const faunadb = require('faunadb');
const q = faunadb.query;
const Normalize = (resource) => (
  q.Merge(
    q.Select(['data'], resource),
    { id: q.Select(['ref', 'id'], resource) },
  )
);
const CreateFarm = (data) => (
  Normalize(
    q.Create(
      q.Collection('farms'),
      { data },
    ),
  )
);
This Normalize function works as expected everywhere else. It builds the correct merged object with an ID with no weird side effects. However, when used with CreateFarm as above, I end up with two identical farms in the DB!!
I've spent a long time looking at the rest of the app. There is definitely only one POST request coming in, and CreateFarm is definitely only being called once. My best theory was that since Merge copies the first resource passed to it, Create is somehow getting called twice on the DB. But reordering the Merge call does not change anything. I have even tried passing in an empty object first, but I always end up with two identical objects created in the end.

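Your helper creates an FQL query with two separate Create expressions. Normalize is a plain JavaScript function, so the Create expression you pass in is inserted into the query once for every place resource is used. Each copy is evaluated and creates a new Document. This is not related to the Merge function. The query that is actually sent looks like this: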
Merge(
  Select(['data'], Create(
    Collection('farms'),
    { data },
  )),
  { id: Select(['ref', 'id'], Create(
    Collection('farms'),
    { data },
  )) },
)
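The duplication happens in plain JavaScript before anything reaches FaunaDB. The sketch below uses a hypothetical mock q whose functions just build plain objects (it is not the faunadb driver) and counts how many Create nodes end up in the serialized query.

```javascript
// Hypothetical mock of the query builder: each function returns a plain object.
const q = {
  Create: (collection, params) => ({ create: collection, params }),
  Collection: name => ({ collection: name }),
  Select: (path, from) => ({ select: path, from }),
  Merge: (obj, withObj) => ({ merge: obj, with: withObj }),
};

// Same shape as the helper in the question.
const Normalize = resource => q.Merge(
  q.Select(['data'], resource),
  { id: q.Select(['ref', 'id'], resource) },
);

// Count Create nodes in the serialized query, as the server would receive it.
function countCreates(expr) {
  return (JSON.stringify(expr).match(/"create":/g) || []).length;
}

const query = Normalize(q.Create(q.Collection('farms'), { data: { name: 'Green Acres' } }));
// countCreates(query) is 2: one Create per use of `resource`.
```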
Use Let to create the document, then Update it with the id. Note that this increases the number of Write Ops required for your application; it roughly doubles the cost of creating Documents. But for what you are trying to do, this is how to do it.
Let(
  {
    newDoc: Create(Collection("farms"), { data }),
    id: Select(["ref", "id"], Var("newDoc")),
    data: Select(["data"], Var("newDoc"))
  },
  Update(
    Select(["ref"], Var("newDoc")),
    {
      data: Merge(
        Var("data"),
        { id: Var("id") }
      )
    }
  )
)
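If the goal is only to return the normalized object, rather than to store the id inside the document, binding the Create result once with Let already avoids the duplicate document, without the extra Update. A sketch assuming the faunadb v4 JavaScript driver's q.Let and q.Var; NormalizeOnce is a hypothetical name, and q is passed as a parameter so the helper does not depend on how the driver is imported.

```javascript
// Bind the resource once with Let and refer to it through Var,
// so the Create expression appears (and runs) only once.
// With the real driver this would be called as NormalizeOnce(faunadb.query, ...) (assumption).
const NormalizeOnce = (q, resource) => q.Let(
  { doc: resource },
  q.Merge(
    q.Select(['data'], q.Var('doc')),
    { id: q.Select(['ref', 'id'], q.Var('doc')) },
  ),
);

const CreateFarm = (q, data) => NormalizeOnce(
  q,
  q.Create(q.Collection('farms'), { data }),
);
```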
Aside: why store id in the document data?
It's not clear why you might need to do this. Indexes can be created on the Ref values themselves. If your client receives a Ref, then that can be passed into subsequent queries directly. In my experience, if you need the plain id value directly in an application, transform the Document as close to that point in the application as possible (like using ids as keys for an array of web components).
There's even a slight Compute advantage for using Ref values rather than re-building Ref expressions from a Collection name and ID. The expression Ref(Collection("farms"), "1234") counts as 2 FQL functions toward Compute costs, but reusing the Ref value returned by queries is free.
Working with GraphQL, the _id field is abstracted out for you, because working with Document types in GraphQL would be pretty awful. However, the best practice for FQL queries is to use the Refs directly as much as possible.
Don't let me talk in absolute terms, though! I believe generally that there's a reason for anything. If you believe you really need to duplicate the ID in the Document's data, then I would be interested in a comment explaining why.

Why does my sequelize model instance lose its id?

I've got a node-based microservice built on top of postgres, using sequelize to perform queries. I've got a table of Pets, each with an id (uuid) and a name (string). And, I've got a function for fetching Pets from the database by name, which wraps the nasty-looking sequelize call:
async function getPetByName( petName ) {
  const sqlzPetInstance = await Database.Pet.findOne({
    where: { name: { [Sequelize.Op.iLike]: petName } }
  })
  if(!sqlzPetInstance) return undefined
  return sqlzPetInstance
}
It works great.
Later, to improve performance, I added some very short-lived caching to that function, like so:
async function getPetByName( petName ) {
  if( ramCache.get(petName) ) return ramCache.get(petName)
  const sqlzPetInstance = await Database.Pet.findOne({ ... })
  if(!sqlzPetInstance) return undefined
  ramCache.set(petName, sqlzPetInstance) // persists for 5 seconds
  return sqlzPetInstance
}
Now I've noticed that items served from the cache sometimes have their id prop removed! WTF?!
I've added logging, and discovered that the ramCache entry is still being located reliably, and the value is still an instance of the sqlz Pet model. All the other attributes on the model are still present, but dataValues.id is undefined. I also noticed that _previousDataValues.id has the correct value, which suggests to me this really is the model instance I want it to be, but modified for some reason.
What can explain this? Is this what I would see if callers who obtain the model mutate it by assigning to id? What can cause _previousDataValues and dataValues to diverge? Are there cool sqlz techniques I can use to catch the culprit (perhaps by defining custom setters that log or throw)?
EDIT: experimentation shows that I can't overwrite the id by assigning to it. That's cool, but now I'm pretty much out of ideas. If it's not some kind of irresponsible mutation (which I could protect against), then I can't think of any sqlz instance methods that would result in removing the id.

I don't have a smoking gun, but I can describe the fix I wrote and the hypothesis that shaped it.
As I said, I was storing sequelize model instances in RAM:
ramCache[ cacheKey ] = sqlzModelInstance
My hypothesis is that, by providing the same instance to every caller, I created a situation in which naughty callers could mutate the shared instance.
I never figured out how that mutation was happening. I proved through experimentation that I could not modify the id attribute by overwriting it:
// this does not work
sqlzModelInstance.id = 'some-fake-id'
// unchanged
However, I read a few things in the sqlz documentation that suggested that every instance retains some kind of invisible link to a central authority, and so there's the possibility of "spooky action at a distance."
So, to sever that link, I modified my caching system to store the raw data, rather than sqlz model instances, and to automatically re-hydrate that raw data upon retrieval.
Crudely:
function saveInCache( cacheKey, sqlzModelInstance ) {
  cache[ cacheKey ] = sqlzModelInstance.get({ plain: true })
}
function getFromCache( cacheKey ) {
  let data = cache[ cacheKey ]
  if(!data) return undefined
  return MySqlzClass.build( data, { isNewRecord: false, raw: true } )
}
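The hypothesis can be illustrated without Sequelize: a cache that hands the same object to every caller lets one caller's mutation leak to all later callers, while caching a plain snapshot and handing out a fresh copy on each read does not. A minimal plain-JavaScript sketch; getShared, getSnapshot and loadPet are hypothetical, and a JSON round-trip stands in for get({ plain: true }) plus build().

```javascript
// Deep copy of plain data (stand-in for get({ plain: true }) + build()).
const copy = value => JSON.parse(JSON.stringify(value));

// Caching the live object: every caller gets the same reference.
const sharedCache = new Map();
function getShared(key, load) {
  if (!sharedCache.has(key)) sharedCache.set(key, load());
  return sharedCache.get(key);
}

// Caching a snapshot: every caller gets its own copy.
const snapshotCache = new Map();
function getSnapshot(key, load) {
  if (!snapshotCache.has(key)) snapshotCache.set(key, copy(load()));
  return copy(snapshotCache.get(key));
}

const loadPet = () => ({ id: 'a1b2', name: 'Rex' });

// A careless caller mutates what it received...
delete getShared('rex', loadPet).id;
delete getSnapshot('rex', loadPet).id;

// ...and the next caller only sees the damage in the shared cache.
const fromShared = getShared('rex', loadPet);     // { name: 'Rex' }
const fromSnapshot = getSnapshot('rex', loadPet); // { id: 'a1b2', name: 'Rex' }
```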
I never located the naughty caller -- and my general practice is to avoid mutating arguments, so it's unlikely any straightforward mutation is happening -- but the change I describe has fixed the easily-reproducible bug I was encountering. So, I think my hypothesis, vague as it is, is accurate.
I will refrain for a while from marking my answer as correct, in the hopes that someone can shed some more light on the problem.

Wait for Observable to complete in order to submit a form

I have a 'new trip' form, where the user can write the names of the participants and then submit the form to create the trip.
On submit, I query a Firebase database with the names, in order to get the IDs of the participants (/users). I then add the IDs to the participantsID field of the trip object and then I push the new trip to Firebase.
The problem is that the Firebase query is async and returns an Observable, therefore my function will proceed to push the object before the Observable has completed, so the participantsID field of the new object is empty.
Is there any method to wait for the Observable to complete (in a kind of synchronous way) so that I can manipulate the data and then proceed? All my attempts to fix this have failed so far.
Here's my simple code.
getUserByAttribute(attribute, value) {
  return this.db.list('/users', {
    query: {
      orderByChild: attribute,
      equalTo: value,
      limitToFirst: 1
    }
  });
}
createTrip(trip) {
  for (const name of trip.participantsName.split(',')) {
    this.getUserByAttribute('username', name)
      .subscribe(user => trip.participantsID.push(user[0].$key));
  }
  this.db.list('/trips').push(trip);
}

You could combine all the Observables into a single Observable with forkJoin:
createTrip(trip) {
  // One Observable per participant name
  const observableArray = trip.participantsName.split(',')
    .map(name => this.getUserByAttribute('username', name));
  // forkJoin emits once, after every inner Observable has completed
  Observable.forkJoin(observableArray).subscribe(users => {
    trip.participantsID = users.map(user => user[0].$key);
    this.db.list('/trips').push(trip);
  });
}

In the end I used part of @Pankaj Parkar's answer to solve the problem.
I forkJoin all the Observables returned by mapping over the split names, and I subscribe to the resulting Observable, whose value is an array of arrays, where each inner array contains a user object.
getUserByAttribute(attribute, value) {
  return this.db.list('/users', {
    query: {
      orderByChild: attribute,
      equalTo: value,
      limitToFirst: 1
    }
  }).first();
}
createTrip(trip) {
  Observable.forkJoin(
    trip.participantsName.split(',')
      .map(name => this.getUserByAttribute('name', name))
  ).subscribe(
    participants => {
      trip.participants = participants.map(p => p[0].$key);
      this.tripService.createTrip(trip);
    }
  );
}
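The same wait-for-all idea can be expressed with Promise.all instead of RxJS forkJoin, so it runs without Angular or Firebase: start every participant lookup, and push the trip only after all of them have finished. findUserId and saveTrip are hypothetical stand-ins for the Firebase calls.

```javascript
// Hypothetical async lookup standing in for getUserByAttribute(...).first().
function findUserId(name) {
  const ids = { ann: 'u1', bob: 'u2' };
  return new Promise(resolve => setTimeout(() => resolve(ids[name.trim()]), 10));
}

// Hypothetical persistence call standing in for this.db.list('/trips').push(trip).
const savedTrips = [];
function saveTrip(trip) {
  savedTrips.push(trip);
}

async function createTrip(trip) {
  // Start all lookups at once; Promise.all resolves when every one has finished
  // and keeps the results in the same order as the names (like forkJoin).
  const ids = await Promise.all(trip.participantsName.split(',').map(name => findUserId(name)));
  saveTrip({ ...trip, participantsID: ids });
}
```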

You have a difficult problem: you have to get the users' info before pushing a new trip.
You can't just create a new subscription every time because of memory leaks (or you have to be careful to unsubscribe). If you are using Firebase, you can use AngularFire's Subject support.
You can update a subscription by using a Subject in your query (in equalTo) and then push the user to retrieve with .next(user).
Then you still have to wait for all users. For that, you can have only one subscription and get all IDs one after another, or have multiple subscriptions to get multiple results faster (but that's difficult).
To solve this problem, I created:
a queue of callbacks (just an array, using the push() and shift() methods)
a queue of values
one subject for one subscription.
If you want an ID, you have to:
push the value
push the callback that will retrieve the value returned.
Wrap the pushes in a function, because you have to call .next() yourself when the queue is empty (to start it!).
Then, in your subscription's callback, i.e. when you receive the remote user object, call the first callback in the queue. Don't forget to remove that value and callback from the queues, and to call next() for the next value if there is one.
This way, you can push your trip from the callback of the last user. And since it's all callbacks, your app is never blocked.
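The queue mechanism described above can be sketched in plain JavaScript. LookupQueue is a hypothetical name; startLookup stands in for subject.next(value) on the single AngularFire subscription, and onResult is what that subscription's callback would call when a user object arrives.

```javascript
// Hypothetical helper: runs one lookup at a time through a single subscription
// and hands each result to the callback that asked for it.
class LookupQueue {
  constructor(startLookup) {
    this.startLookup = startLookup; // e.g. value => subject.next(value)
    this.values = [];    // pending lookup values, oldest first
    this.callbacks = []; // matching callbacks, same order
  }

  request(value, callback) {
    this.values.push(value);
    this.callbacks.push(callback);
    // If the queue was empty, nothing is in flight, so start this lookup now.
    if (this.values.length === 1) this.startLookup(value);
  }

  // Call from the subscription callback when the remote object arrives.
  onResult(result) {
    this.values.shift();
    const callback = this.callbacks.shift();
    callback(result);
    // Start the next pending lookup, if any.
    if (this.values.length > 0) this.startLookup(this.values[0]);
  }
}

// Usage with a fake subscription: record what was started, answer manually.
const started = [];
const queue = new LookupQueue(value => started.push(value));
const ids = [];
let tripPushed = false;
const names = ['ann', 'bob'];
names.forEach((name, i) => {
  queue.request(name, user => {
    ids.push(user.id);
    // The last participant's callback is where the trip would be pushed.
    if (i === names.length - 1) tripPushed = true;
  });
});
queue.onResult({ id: 'u1' }); // answers 'ann', then starts 'bob'
queue.onResult({ id: 'u2' }); // answers 'bob'
```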
I still haven't decided whether this should be done in a Cloud Function instead, because the user has to stay connected, and this uses their data and processor. On the other hand, it's good to have all the code in the same place, and Cloud Functions are limited on the free Firebase plan. What would a Firebase developer advise?
I searched a lot for a better solution, so please share it if you have one. It's a little complicated, I think, but it works very well. I had the same problem when a user wants to add a new flight: I need to get the airports' information (coordinates) first and then push multiple objects (details, maps, etc.).

Best practice for filtering an array in Angular 2

I have a component called Tours that shows a full list of tours.
I have a link to a page called favorites, which should display only the tours with the favorite property set to true and otherwise be identical to the Tours component.
Is there a best practice for achieving this?
I can think of a few ways
Create a separate route to this component and filter based on the value in the route
Create a custom pipe that's triggered based on the route path
However, neither of them seems optimal to me.

I exclusively use the filter function, especially in cases such as yours. If you have an array of tours with a favorite property on each item, you could do this:
this.tours.filter((item) => {
  return (item.favorite === true)
})
Or if you wanted a filter function with a favorite parameter, you could do this:
filterFavorites(favorite: boolean) {
  return this.tours.filter((item) => {
    return item.favorite === favorite;
  })
}
Let me know if this helps you.
Edit: you could definitely create a pipe, but I think that would be overkill in your case, unless you intend to use the pipe multiple times in other places.

Updating Data between two components in React

I am new to React and I don't know what's the best way to do this.
I have a list of cars, and clicking a row should slide in a full-page view with that car's details.
My code structure is:
I have App, which renders two components: CarList and CarDetails. CarDetails is hidden initially. The reason I render CarDetails in App is that it's a massive fixed template, so I would like to render it once when the app is loaded and only update its data when a row is clicked.
CarList also renders CarRow component which is fine.
Now my problem: I have a getDetails function on the CarRow component that makes a call to get the details based on the car id. How do I get the CarDetails component's data updated? I used
this.setState({itemDetails:data});
but it seems that the state of CarRow is not the same as the state of CarDetails.
Any help?

This is a fundamental issue that a lot of thought and many man-hours have gone into solving. It probably can't be answered, except on a surface level, in a StackOverflow post. It's not React-centric, either. This is an issue across most applications, regardless of the framework you're using.
Since you asked in the context of React, you might consider reading into Flux, which is the de facto implementation of this one-way data-flow idea in concert with React. However, that architecture is by no means "the best". It simply has advantages and disadvantages, like everything else.
Some people don't like the idea of the global "event bus" that Flux proposes. If that's the case, you can simply implement your own intermediate data layer API that collects query callbacks and A) invokes the callbacks on any calls to save data and B) refreshes any appropriate queries to the server. For now, though, I'd stick with Flux, as it will give you an idea of the general principles involved in having the things most people consider "good", like a single source of truth for your data, one-way flow, etc.
To give a concrete example of the callback idea:
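// data layer
const listeners = [];
// Hypothetical server calls; replace with your own API client
function saveToServer(someData) { return Promise.resolve(someData); }
function queryServer(params) { return Promise.resolve({ params }); }
const data = {
  save: save,
  query: query
};
function save(someData) {
  // save data to the server, and then notify every registered query
  return saveToServer(someData).then(saved => {
    listeners.forEach(listener => listener(saved));
  });
}
function query(params, callback) {
  // remember the callback for later refreshes, then query the server with the params
  listeners.push(callback);
  return queryServer(params).then(callback);
}
// component
componentDidMount() {
  data.query(params, result => this.setState({ myData: result }));
},
save() {
  // when the save operation is complete, it will "refresh" the query above
  data.save(someData);
}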
This is a very distilled example and doesn't address optimization, such as potential for memory leaks when moving to different views and invoking "stale" callbacks, however it should give you a general idea of another approach.
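One common way to handle the stale-callback problem is to have the data layer return an unsubscribe function that the component calls when it unmounts (for example from componentWillUnmount). A minimal, synchronous sketch of just the listener bookkeeping; subscribe and notify are hypothetical names, and a Set is used so removal is cheap and the same function is never registered twice.

```javascript
const listeners = new Set();

// Register a callback and return a function that removes it again.
function subscribe(callback) {
  listeners.add(callback);
  return () => listeners.delete(callback);
}

// Called after a successful save: only still-registered views are notified.
function notify(data) {
  listeners.forEach(listener => listener(data));
}

// Usage: two views subscribe, one goes away before the next save.
const received = { list: [], details: [] };
const unsubscribeList = subscribe(d => received.list.push(d));
subscribe(d => received.details.push(d));

notify('first save');
unsubscribeList(); // e.g. the list view unmounted
notify('second save');
```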
The two approaches have the same policy (a single source of truth for data and one way data flow) but different implementations (global "event bus" which necessitates keeping track of events, or the simple callback method, which can necessitate a form of memory management).
