Using Merge with a single Create call in FaunaDB is creating two documents? - javascript

Got a weird bug using FaunaDB from a Node.js app running on a Netlify Function.
I am building out a quick proof-of-concept and initially everything worked fine. I had a Create query that looked like this:
const faunadb = require('faunadb');
const q = faunadb.query;

const CreateFarm = (data) => (
  q.Create(
    q.Collection('farms'),
    { data },
  )
);
As I said, everything here works as expected. The trouble began when I tried to start normalizing the data FaunaDB sends back. Specifically, I want to merge the Fauna-generated ID into the data object, and send just that back with none of the other metadata.
I am already doing that with other resources, so I wrote a helper query and incorporated it:
const faunadb = require('faunadb');
const q = faunadb.query;

const Normalize = (resource) => (
  q.Merge(
    q.Select(['data'], resource),
    { id: q.Select(['ref', 'id'], resource) },
  )
);

const CreateFarm = (data) => (
  Normalize(
    q.Create(
      q.Collection('farms'),
      { data },
    ),
  )
);
This Normalize function works as expected everywhere else. It builds the correct merged object, including the ID, with no weird side effects. However, when used with CreateFarm as above, I end up with two identical farms in the DB!!
I've spent a long time looking at the rest of the app. There is definitely only one POST request coming in, and CreateFarm is definitely only being called once. My best theory was that since Merge copies the first resource passed to it, Create is somehow getting called twice on the DB. But reordering the arguments to Merge does not change anything. I have even tried passing in an empty object first, but I always end up with two identical documents created in the end.

Your helper creates an FQL query with two separate Create expressions. Each is evaluated and creates a new Document. This is not related to the Merge function.
Merge(
  Select(['data'], Create(
    Collection('farms'),
    { data },
  )),
  { id: Select(['ref', 'id'], Create(
    Collection('farms'),
    { data },
  )) },
)
Use Let to create the document, then Update it with the id. Note that this increases the number of Write Ops required for your application: it essentially doubles the cost of creating Documents. But for what you are trying to do, this is how to do it.
Let(
  {
    newDoc: Create(Collection("farms"), { data }),
    id: Select(["ref", "id"], Var("newDoc")),
    data: Select(["data"], Var("newDoc"))
  },
  Update(
    Select(["ref"], Var("newDoc")),
    {
      data: Merge(
        Var("data"),
        { id: Var("id") }
      )
    }
  )
)
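For completeness, here is a sketch of how this could be wired up from the Netlify Function in the question, assuming the Let/Update expression above is wrapped in a CreateFarm helper built with the JavaScript driver's q.* functions (the handler shape and the secret variable name are assumptions):

const faunadb = require('faunadb');
const q = faunadb.query;
const client = new faunadb.Client({ secret: process.env.FAUNADB_SECRET }); // assumed env var

exports.handler = async (event) => {
  const data = JSON.parse(event.body);
  // CreateFarm(data) builds the Let/Update expression above; query() evaluates it once,
  // so only a single document is created.
  const farm = await client.query(CreateFarm(data));
  // Update returns the document, whose data now includes the generated id.
  return { statusCode: 200, body: JSON.stringify(farm.data) };
};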
Aside: why store id in the document data?
It's not clear why you might need to do this. Indexes can be created on the Ref values themselves. If your client receives a Ref, it can be passed into subsequent queries directly. In my experience, if you need the plain id value in an application, transform the Document as close to that point in the application as possible (for example, when using ids as keys for an array of web components).
There's even a slight Compute advantage for using Ref values rather than re-building Ref expressions from a Collection name and ID. The expression Ref(Collection("farms"), "1234") counts as 2 FQL functions toward Compute costs, but reusing the Ref value returned by queries is free.
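As a rough sketch with the JavaScript driver (the client setup and the stored ref are illustrative assumptions):

const faunadb = require('faunadb');
const q = faunadb.query;
const client = new faunadb.Client({ secret: process.env.FAUNADB_SECRET }); // assumed env var

// Rebuilding the Ref from a collection name and an id string:
// Ref(Collection("farms"), "1234") counts as 2 FQL functions toward Compute.
const getById = (id) => client.query(q.Get(q.Ref(q.Collection('farms'), id)));

// Reusing the Ref value returned by an earlier query adds nothing to Compute;
// `farm.ref` is whatever a previous Create or Get returned.
const getByRef = (farm) => client.query(q.Get(farm.ref));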
Working with GraphQL, the _id field is abstracted out for you, because working with Document types in GraphQL would be pretty awful. For FQL queries, though, the best practice is to use Refs directly as much as possible.
Don't let me talk in absolute terms, though! I generally believe there's a reason for everything. If you believe you really need to duplicate the ID in the Document's data, I'd be interested in a comment explaining why.

Related

Apollo/GraphQL/React, How to query for data in a loop?

In one of my React components, I have an array that includes a number of IDs as strings. I want to query each ID in the array and return an array of their associated objects from my MongoDB database. However, this only happens conditionally, so I wanted to use useLazyQuery instead of useQuery.
I tried to accomplish it via mapping, but it seems I had an issue with using useLazyQuery. Here's the issue:
const [queryId, { loading, data }] = useLazyQuery(QUERY_ID);

if (queryForIds) {
  // Using just the first ID
  queryId({ variables: { id: idList[0] } });
  console.log(data);
}
This results in an infinite loop of nothing being printed (maybe my resolver returns nothing but regardless I get an infinite loop that crashes my site). I am most likely misusing useLazyQuery but I'm unsure of how.
Originally my idea was to do this:
const [queryId, { loading, data }] = useLazyQuery(QUERY_ID);

if (queryForIds) {
  // Using just the first ID
  idList.map((id) => queryId({ variables: { id: idList[0] } }).data);
}
But I'm also unsure if that works either. How can I resolve either issue?
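For reference, one way to keep the conditional call out of the render path is to trigger it from an effect; a minimal sketch, assuming Apollo Client 3 and the QUERY_ID document from the question (the component shape is hypothetical):

import { useEffect } from 'react';
import { useLazyQuery } from '@apollo/client';

// idList and queryForIds are the values from the question; QUERY_ID is assumed to be defined elsewhere.
function IdLookup({ idList, queryForIds }) {
  const [queryId, { loading, data }] = useLazyQuery(QUERY_ID);

  // Calling queryId during render triggers a state update, which re-renders the
  // component and calls it again: the infinite loop. Triggering it from an effect
  // (or an event handler) breaks that cycle.
  useEffect(() => {
    if (queryForIds && idList.length > 0) {
      queryId({ variables: { id: idList[0] } });
    }
  }, [queryForIds, idList, queryId]);

  if (loading || !data) return null;
  return <pre>{JSON.stringify(data, null, 2)}</pre>;
}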

How to execute a sort after loading in data into an array in UseEffect - React Native

I'm trying to create a chat app and there is a small issue. Whenever I load my messages from Firebase, they appear in the chat app in unsorted order, so I'm attempting to sort the messages by timestamp so they appear in order. I can do this if I move the sort and the setMessages call inside onReceive in useEffect, but I feel like that would be pretty inefficient, because it sorts and calls setMessages separately for each message retrieved from Firebase. I want to do it all at the end, after all the messages are loaded into the array.
Right now with my logs, I get this:
[REDACTED TIME] LOG []
[REDACTED TIME] LOG pushing into loadedMessages
[REDACTED TIME] LOG pushing into loadedMessages
So it's printing the (empty) array first, then loading in messages. How can I make sure this is done in the correct order?
useEffect(() => {
  // Gets User ID
  fetchUserId(getUserId());

  const messagesRef = firebase.database().ref(`${companySymbol}Messages`);
  messagesRef.off();

  const onReceive = async (data) => {
    const message = data.val();
    const iMessage = {
      _id: message._id,
      text: message.text,
      createdAt: new Date(message.createdAt),
      user: {
        _id: message.user._id,
        name: message.user.name,
      },
    };
    loadedMessages.push(iMessage);
    console.log('pushing into loadedMessages');
  };

  messagesRef.on('child_added', onReceive);

  loadedMessages.sort(
    (message1, message2) => message2.createdAt - message1.createdAt,
  );
  console.log(loadedMessages);

  return () => {
    console.log('useEffect Return:');
    messagesRef.off();
  };
}, []);
I think that the perspective is a bit off.
The right way to do this is to fetch the Firebase data already sorted.
Firebase has built-in sorting, although it does come with its limitations.
In my opinion, you should try something like:
const messagesRef = firebase.database().ref(`${companySymbol}Messages`);

messagesRef.orderByChild("createdAt").on("child_added", function (snapshot) {
  // the callback function once a new message has been created.
  console.log(snapshot.val());
});
And if I may add one more thing: bringing in every single message since the dawn of time can get a bit hairy once you've got over a thousand or so, so I would recommend limiting it. That can be achieved using the built-in limit function, for example limitToLast(1000).
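Combining the ordering and the limit, a sketch along those lines (using the same messagesRef as above):

const messagesRef = firebase.database().ref(`${companySymbol}Messages`);

// Order by timestamp and only pull the most recent 1000 messages.
messagesRef
  .orderByChild("createdAt")
  .limitToLast(1000)
  .on("child_added", (snapshot) => {
    console.log(snapshot.val());
  });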
Good luck!
Well, the name of the database is "Realtime Database". You are using the "child_added" listener, which is triggered every time a new object gets added to the Messages collection. The onReceive callback should do the sorting - otherwise the messages won't be in the correct order. Yes, that is inefficient for the first load, as "child_added" will most likely be triggered for every item already in the collection and you'll be repeating the sort.
What you could explore as an alternative is a .once listener (https://firebase.google.com/docs/database/web/read-and-write#read_data_once) for the first time you populate the data in your app. This will return all the data you need. After that is complete, you can create your "child_added" listener and only listen for new objects. This way onReceive isn't called for every existing item on first load, and afterwards it makes sense to sort on each new item that comes in.
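A rough sketch of that approach, reusing the messagesRef and setMessages from the question (the startAt guard is my addition, to keep child_added from replaying the messages that were already loaded):

const messagesRef = firebase.database().ref(`${companySymbol}Messages`);

// 1. Load the existing messages once, sort them, and set state a single time.
messagesRef.once('value').then((snapshot) => {
  const initial = [];
  snapshot.forEach((child) => {
    initial.push(child.val());
  });
  initial.sort((m1, m2) => m2.createdAt - m1.createdAt);
  setMessages(initial);

  // 2. Then listen only for messages newer than what was just loaded.
  const newest = initial.length ? initial[0].createdAt : 0;
  messagesRef
    .orderByChild('createdAt')
    .startAt(newest + 1)
    .on('child_added', (data) => {
      const message = data.val();
      setMessages((prev) => [message, ...prev]);
    });
});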
Also have a look at sorting: https://firebase.google.com/docs/database/web/lists-of-data#sorting_and_filtering_data
You might be able to return the messages in the correct order.
And also - if you need queries - look at Firestore...

Firebase Realtime Database - Database trigger structure best practice

I have some data in a Firebase Realtime Database where I wish to split one single onUpdate() trigger into two triggers. My data is structured as below.
notes: {
  note-1234: {
    access: {
      author: "-L1234567890",
      members: {
        "-L1234567890": 0,
        "-LAAA456BBBB": 1
      }
    },
    data: {
      title: "Hello",
      order: 1
    }
  }
}
Currently I have one onUpdate() database trigger for node 'notes/{noteId}'.
exports.onNotesUpdate = functions
  .region('europe-west1')
  .database
  .ref('notes/{noteId}')
  .onUpdate((change, context) => {
    // Perform actions
    return noteFunc.notesUpdate({ change, context, type: ACTIVE })
  })
However, since my code is getting quite extensive handling both data and access updates, I am considering splitting it into two parts: one handling updates in the access child node and one handling the data child node. This way the code would be easier to read and understand, being logically split into separate blocks.
exports.onNotesUpdateAccess = functions
  .region('europe-west1')
  .database
  .ref('notes/{noteId}/access')
  .onUpdate((change, context) => {
    // Perform actions
    return noteFunc.notesAccessUpdate({ change, context, type: ACTIVE })
  })

exports.onNotesUpdateData = functions
  .region('europe-west1')
  .database
  .ref('notes/{noteId}/data')
  .onUpdate((change, context) => {
    // Perform actions
    return noteFunc.notesDataUpdate({ change, context, type: ACTIVE })
  })
I am a bit unsure though, since both access and data are child nodes to the note-1234 (noteId) node.
My question is - Would this be a recommended approach or could separate triggers on child nodes create problems?
Worth mentioning is that the entire note-1234 node (both access and data) will sometimes be updated with one .update() action from my application. At other times only access or data will be updated.
Kind regards /K
It looks like you've nested two types of data under a single branch, which is something the Firebase documentation explicitly recommends against in its sections on avoiding nested data and flattening the data structure.
So instead of merely splitting the code into two, I'd also recommend splitting the data structure into two top-level nodes: one for each type of data. For example:
"notes-data": {
note-1234: {
author: "-L1234567890",
members: {
"-L1234567890": 0,
"-LAAA456BBBB": 1
}
}
},
"notes-access": {
note-1234: {
title: "Hello",
order: 1
}
}
By using the same key in both top-level nodes, you can easily look up the other type of data for a note. And because Firebase pipelines these requests over a single connection, such client-side joining of data is not nearly as slow as you may initially think.
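For example, a client-side join by the shared key might look like this sketch (assuming the Admin SDK, as in your Cloud Functions; the function name is illustrative):

const admin = require('firebase-admin');

async function loadNote(noteId) {
  // Both reads are pipelined over the same connection, so this "join" is cheap.
  const [accessSnap, dataSnap] = await Promise.all([
    admin.database().ref(`notes-access/${noteId}`).once('value'),
    admin.database().ref(`notes-data/${noteId}`).once('value'),
  ]);
  return { id: noteId, access: accessSnap.val(), data: dataSnap.val() };
}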

Why does my sequelize model instance lose its id?

I've got a node-based microservice built on top of postgres, using sequelize to perform queries. I've got a table of Pets, each with an id (uuid) and a name (string). And, I've got a function for fetching Pets from the database by name, which wraps the nasty-looking sequelize call:
async function getPetByName( petName ) {
  const sqlzPetInstance = await Database.Pet.findOne({
    where: { name: { [Sequelize.Op.iLike]: petName } }
  })
  if(!sqlzPetInstance) return undefined
  return sqlzPetInstance
}
It works great.
Later, to improve performance, I added some very short-lived caching to that function, like so:
async function getPetByName( petName ) {
if( ramCache.get(petName) ) return ramCache.get(petName)
const sqlzPetInstance = await Database.Pet.findOne({ ... })
if(!sqlzPetInstance) return undefined
return ramCache.set(petName, sqlzPetInstance) // persists for 5 seconds
}
Now I've noticed that items served from the cache sometimes have their id prop removed! WTF?!
I've added logging, and discovered that the ramCache entry is still being located reliably, and the value is still an instance of the sqlz Pet model. All the other attributes on the model are still present, but dataValues.id is undefined. I also noticed that _previousDataValues.id has the correct value, which suggests to me this really is the model instance I want it to be, but modified for some reason.
What can explain this? Is this what I would see if callers who obtain the model mutate it by assigning to id? What can cause _previousDataValues and dataValues to diverge? Are there cool sqlz techniques I can use to catch the culprit (perhaps by defining custom setters that log or throw)?
EDIT: experimentation shows that I can't overwrite the id by assigning to it. That's cool, but now I'm pretty much out of ideas. If it's not some kind of irresponsible mutation (which I could protect against), then I can't think of any sqlz instance methods that would result in removing the id.
I don't have a smoking gun, but I can describe the fix I wrote and the hypothesis that shaped it.
As I said, I was storing sequelize model instances in RAM:
ramCache[ cacheKey ] = sqlzModelInstance
My hypothesis is that, by providing the same instance to every caller, I created a situation in which naughty callers could mutate the shared instance.
I never figured out how that mutation was happening. I proved through experimentation that I could not modify the id attribute by overwriting it:
// this does not work
sqlzModelInstance.id = 'some-fake-id'
// unchanged
However, I read a few things in the sqlz documentation that suggested that every instance retains some kind of invisible link to a central authority, and so there's the possibility of "spooky action at a distance."
So, to sever that link, I modified my caching system to store the raw data, rather than sqlz model instances, and to automatically re-hydrate that raw data upon retrieval.
Crudely:
function saveInCache( cacheKey, sqlzModelInstance ) {
  cache[ cacheKey ] = sqlzModelInstance.get({ plain: true })
}

function getFromCache( cacheKey ) {
  let data = cache[ cacheKey ]
  if(!data) return undefined
  return MySqlzClass.build( data, { isNewRecord: false, raw: true } )
}
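Wired back into the lookup function from the question (with MySqlzClass standing in for Database.Pet), the cached path then looks roughly like this:

async function getPetByName( petName ) {
  // Serve a freshly built instance from the cached plain data, if present.
  const cached = getFromCache(petName)
  if (cached) return cached

  const sqlzPetInstance = await Database.Pet.findOne({
    where: { name: { [Sequelize.Op.iLike]: petName } }
  })
  if (!sqlzPetInstance) return undefined

  // Only plain data goes into the cache, so callers can never mutate a shared instance.
  saveInCache(petName, sqlzPetInstance)
  return sqlzPetInstance
}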
I never located the naughty caller -- and my general practice is to avoid mutating arguments, so it's unlikely any straightforward mutation is happening -- but the change I describe has fixed the easily-reproducible bug I was encountering. So, I think my hypothesis, vague as it is, is accurate.
I will refrain for a while from marking my answer as correct, in the hopes that someone can shed some more light on the problem.
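As for the custom-setter idea raised in the question, here is one sketch of what that diagnostic could look like, assuming Sequelize's attribute-level setters (illustrative only, not part of the fix above):

const { DataTypes } = require('sequelize')

// assuming an existing `sequelize` instance
const Pet = sequelize.define('Pet', {
  id: {
    type: DataTypes.UUID,
    primaryKey: true,
    defaultValue: DataTypes.UUIDV4,
    set(value) {
      const current = this.getDataValue('id')
      if (current !== undefined && value !== current) {
        // Throwing here produces a stack trace pointing at whoever set the id.
        throw new Error(`attempt to overwrite id ${current} with ${value}`)
      }
      this.setDataValue('id', value)
    },
  },
  name: DataTypes.STRING,
})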

Passing down arguments using Facebook's DataLoader

I'm using DataLoader for batching the requests/queries together.
In my loader function I need to know the requested fields to avoid having a SELECT * FROM query but rather a SELECT field1, field2, ... FROM query...
What would be the best approach using DataLoader to pass down the resolveInfo needed for it? (I use resolveInfo.fieldNodes to get the requested fields)
At the moment, I'm doing something like this:
await someDataLoader.load({ ids, args, context, info });
and then in the actual loaderFn:
const loadFn = async options => {
  const ids = [];
  let args;
  let context;
  let info;

  options.forEach(a => {
    ids.push(a.ids);
    if (!args && !context && !info) {
      args = a.args;
      context = a.context;
      info = a.info;
    }
  });

  return Promise.resolve(await new DataProvider().get({ ...args, ids }, context, info));
};
but as you can see, it's hacky and doesn't really feel good...
Does anyone have an idea how I could achieve this?
I am not sure there is a good answer to this question, simply because Dataloader is not made for this use case, but I have worked extensively with Dataloader, written similar implementations, and explored similar concepts in other programming languages.
Let's understand why Dataloader is not made for this use case and how we could still make it work (roughly as in your example).
Dataloader is not made for fetching a subset of fields
Dataloader is made for simple key-value lookups. That means that, given a key like an ID, it will load the value behind it. For that it assumes that the object behind the ID will always be the same until it is invalidated. This is the single assumption that enables the power of Dataloader. Without it, the three key features of Dataloader won't work anymore:
Batching requests (multiple requests are done together in one query)
Deduplication (requests to the same key twice result in one query)
Caching (consecutive requests of the same key don't result in multiple queries)
This leads us to the following two important rules if we want to maximise the power of Dataloader:
Two different entities cannot share the same key, otherwise we might return the wrong entity. This sounds trivial, but it is not in your example. Let's say we want to load the user with ID 1 and the fields id and name. A little bit later (or at the same time) we want to load the user with ID 1 and the fields id and email. These are technically two different entities and they need to have different keys.
The same entity should have the same key all the time. Again sounds trivial but really is not in the example. User with ID 1 and fields id and name should be the same as user with ID 1 and fields name and id (notice the order).
In short a key needs to have all the information needed to uniquely identify an entity but not more than that.
So how do we pass down fields to Dataloader
await someDataLoader.load({ ids, args, context, info });
In your question you have provided a few more things to your Dataloader as a key. First, I would not put args and context into the key. Does your entity change when the context changes (e.g. you are querying a different database now)? Probably yes, but do you want to account for that in your dataloader implementation? I would instead suggest creating new dataloaders for each request, as described in the docs.
Should the whole request info be in the key? No, but we do need the fields that are requested. Apart from that, the implementation you provided is wrong and would break when the loader is called with two different resolve infos. You only set the resolve info from the first call, but it might really be different on each object (think about the first user example above). Ultimately we could arrive at the following implementation of a dataloader:
// This function creates unique cache keys for different selected fields
function cacheKeyFn({ id, fields }) {
  const sortedFields = [...(new Set(fields))].sort().join(';');
  return `${id}[${sortedFields}]`;
}

function createLoaders(db) {
  const userLoader = new Dataloader(async keys => {
    // Create a set with all requested fields
    const fields = keys.reduce((acc, key) => {
      key.fields.forEach(field => acc.add(field));
      return acc;
    }, new Set());
    // Get all our ids for the DB query
    const ids = keys.map(key => key.id);
    // Please be aware of possible SQL injection, don't copy + paste
    const result = await db.query(`
      SELECT
        ${[...fields].join(', ')}
      FROM
        user
      WHERE
        id IN (${ids.join()})
    `);
    // Return one row per key, in the same order as the incoming keys
    // (the exact shape of `result` depends on your DB driver).
    return keys.map(key => result.find(row => row.id === key.id) || null);
  }, { cacheKeyFn });
  return { userLoader };
}

// now in a resolver
resolve(parent, args, ctx, info) {
  // https://www.npmjs.com/package/graphql-fields
  return ctx.userLoader.load({ id: args.id, fields: Object.keys(graphqlFields(info)) });
}
This is a solid implementation, but it has a few weaknesses. First, we are overfetching a lot of fields if we have different field requirements in the same batch request. Second, if we have cached an entity under the key 1[id;name] from the cache key function, we could also answer (at least in JavaScript) the keys 1[id] and 1[name] with that object. Here we could build a custom map implementation that we could supply to Dataloader. It would be smart enough to know these things about our cache.
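A sketch of that idea, using DataLoader's documented cacheMap option (any object with get/set/delete/clear); the superset-matching logic is my own illustration, not something the library provides:

// Cache keys produced by cacheKeyFn above look like "1[id;name]".
function parseKey(key) {
  const match = /^(.*)\[(.*)\]$/.exec(key);
  return { id: match[1], fields: match[2] ? match[2].split(';') : [] };
}

class SupersetCacheMap {
  constructor() {
    this.map = new Map();
  }
  get(key) {
    if (this.map.has(key)) return this.map.get(key);
    // Answer "1[id]" from a previously cached "1[id;name]" if possible.
    const wanted = parseKey(key);
    for (const [storedKey, value] of this.map) {
      const stored = parseKey(storedKey);
      if (
        stored.id === wanted.id &&
        wanted.fields.every(field => stored.fields.includes(field))
      ) {
        return value;
      }
    }
    return undefined;
  }
  set(key, value) {
    this.map.set(key, value);
    return this;
  }
  delete(key) {
    return this.map.delete(key);
  }
  clear() {
    this.map.clear();
  }
}

// Supplied next to cacheKeyFn:
// new Dataloader(batchFn, { cacheKeyFn, cacheMap: new SupersetCacheMap() });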
Conclusion
We see that this is really a complicated matter. I know it is often listed as a benefit of GraphQL that you don't have to fetch all fields from a database for every query, but the truth is that in practice this is seldom worth the hassle. Don't optimise what is not slow. And even if it is slow, is it a bottleneck?
My suggestion is: write trivial Dataloaders that simply fetch all (needed) fields. If you have one client, it is very likely that for most entities the client fetches all fields anyway; otherwise they would not be part of your API, right? Then use something like query introspection to measure slow queries, and find out which field exactly is slow. Then optimise only the slow thing (see, for example, my answer here that optimises a single use case). And if you are a big e-commerce platform, please don't use Dataloader for this. Build something smarter, and don't use JavaScript.
