I have fetched the data from table tb_project_milestones and want to insert this projectMilestoneRow into a table tb_xyz using streams. I checked the documentation, but couldn't find how to implement it.
Has anyone implemented reading through streams and inserting through streams in MySQL?
let insertProjectMilestones = [];
const getProjectMilestones = executeQueryStream.query('SELECT * FROM tb_project_milestones WHERE project_id = ? ');

getProjectMilestones
  .on('error', function(err) {
    // Handle error, an 'end' event will be emitted after this as well
  })
  .on('result', function(projectMilestoneRow) {
    // Pausing the connection is useful if your processing involves I/O
    connection.pause();
    processRow(projectMilestoneRow, function() {
      _.each(payload.projects, (project_id) => {
        _.each(projectMilestoneRow, (el) => {
          insertProjectMilestones.push([el.project_milestone_id, el.name, el.prefix, el.short_name, el.description, el.pre_requisites, project_id,
            el.milestone_template_id, el.generic_milestone_id, el.planned_date, el.actual_date, el.forecast_date,
            el.planned_date_only, el.forecast_date_only, el.actual_date_only, el.planned_time_only, el.forecast_time_only, el.actual_time_only,
            el.planned_date_formula, el.actual_date_formula, el.forecast_date_formula, el.planned_date_is_active, el.forecast_date_is_active,
            el.actual_date_is_active, el.creation_datetime, el.allow_notes, el.forecast_date_allow_notes, el.actual_date_allow_notes,
            el.planned_date_allow_notes, 0, el.requires_approval]);
        });
      });
      connection.resume();
    });
  })
  .on('end', function() {
    // all rows have been received
  });
EDIT
I used streams in this case because millions of records are fetched from tb_project_milestones, manipulated, collected into an array and then inserted into another table.
Considering that pushing that many rows into an array would increase Node's memory usage, I thought of using a stream here.
Is a stream the better choice, or could I just implement a batch insert in the DB using transactions?
You can use a knex stream and async iteration (ES2018 / Node 10) for that:
const knex = require('knex');
const knexClient = knex(someMysqlClientSettings);

const dbStream = knexClient("tb_project_milestones").where({ projectId }).stream();

for await (const row of dbStream) {
  // transform each row and insert it before pulling the next one
  const processedRowObj = process(row);
  await knexClient("tb_xyz").insert(processedRowObj);
}
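If inserting one row at a time turns out to be too slow for millions of records, the same loop can be batched. This is only a sketch under the assumption that multi-row inserts (passing an array to .insert()) fit your data; BATCH_SIZE is an arbitrary value:
const BATCH_SIZE = 1000; // assumed batch size, tune to your payload
let batch = [];

for await (const row of dbStream) {
  batch.push(process(row));
  if (batch.length >= BATCH_SIZE) {
    // knex turns an array of objects into a single multi-row INSERT
    await knexClient("tb_xyz").insert(batch);
    batch = [];
  }
}
if (batch.length > 0) {
  await knexClient("tb_xyz").insert(batch); // flush the remainder
}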
Wouldn't it be much faster and simpler to perform a single SQL statement:
INSERT INTO tb_xyz (...)
    SELECT ... FROM tb_project_milestones;
That way, the data is not shoveled to the client only to be turned around and shoveled back to the server.
And you could do transformations (expressions in the SELECT) and/or filtering (WHERE in SELECT) at the same time.
MySQL will impose essentially no limits on how big the table can be.
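For illustration only, such a statement could be issued straight from Node. This sketch assumes tb_xyz is the target table from the question, keeps the column lists abbreviated, and uses projectId as a placeholder for one of the ids in payload.projects:
// assumes the same mysql `connection` object as in the question
const sql = `
  INSERT INTO tb_xyz (project_milestone_id, name, prefix /* , ...remaining columns */)
  SELECT project_milestone_id, name, prefix /* , ...expressions for any transformations */
  FROM tb_project_milestones
  WHERE project_id = ?`; // filtering happens on the server

connection.query(sql, [projectId], (err, result) => {
  if (err) throw err;
  console.log('Copied ' + result.affectedRows + ' rows without round-tripping them through Node');
});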
Related
I have a large CSV and I need to read it and insert the data into MongoDB.
The CSV contains a user name, a category name and a policy name.
I need to insert the users into the User collection with a category id and a policy id. The CSV provides only the category name and policy name, so I need to fetch the category id from its collection using the name.
If the category name does not exist, create a new one and return its id. The same applies to the policy.
So I tried:
const fs = require('fs');
const csv = require('csv-parser'); // assuming the csv-parser package

fs.createReadStream('./data_sheet.csv')
  .pipe(csv())
  .on('data', async (row) => {
    // console.log(row)
    let res = await Category.findOneOrCreate({ name: row.cat.trim() });
    console.log(res);
  })
  .on('end', () => {
    console.log('CSV file successfully processed');
  });
categorySchema.statics.findOneOrCreate = async function findOneOrCreate(condition) {
  try {
    const self = this;
    let agent = await self.findOne(condition);
    console.log("condition");
    console.log(condition);
    console.log("agent");
    console.log(agent);
    if (agent) {
      return agent._id;
    } else {
      agent = await self.create(condition);
      return agent._id;
    }
  } catch (e) {
    console.log(e);
  }
};
This is not working properly. What is the proper way to do this?
If by "not working" you mean that the category data is not coming up, then make sure you follow the right async approach; otherwise, provide more info.
There are several things to keep in mind:
You might need to create a cron job for a recursive process
Import the CSV file into an array of objects
Loop over the objects to match each row with a category id
If there is none, create a new category and use its id
Return to the main function
Update the documents [here you can either update them all, or update one at a time using the looping from step 2]; a rough sketch follows below
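As a sketch of those steps, assuming the csv-parser package, Category and Policy models with findOneOrCreate statics like the one above, a User model, and CSV column names cat, policy and name (all of which are assumptions about your setup):
const fs = require('fs');
const csv = require('csv-parser');

async function importUsers(filePath) {
  // import the CSV file into an array of objects
  const rows = [];
  await new Promise((resolve, reject) => {
    fs.createReadStream(filePath)
      .pipe(csv())
      .on('data', (row) => rows.push(row))
      .on('end', resolve)
      .on('error', reject);
  });

  // loop over the objects, resolving (or creating) the ids one by one
  for (const row of rows) {
    const categoryId = await Category.findOneOrCreate({ name: row.cat.trim() });
    const policyId = await Policy.findOneOrCreate({ name: row.policy.trim() });

    // update (or insert) each user document; field names are assumptions about the User schema
    await User.updateOne(
      { name: row.name.trim() },
      { name: row.name.trim(), category: categoryId, policy: policyId },
      { upsert: true }
    );
  }
}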
I am stuck in what I thought was a very simple use case: I have a list of client ids in an array. All I want to do is fetch all those clients and "watch" them (using the .onSnapshot).
To fetch the client objects it is nice and simple: I simply go through the array and get each client by its id. The code looks something like this:
const accessibleClients = ['client1', 'client2', 'client3']
const clients = await Promise.all(
  accessibleClients.map(async clientId => {
    return db
      .collection('clients')
      .doc(clientId)
      .get()
  })
)
If I just needed the list of clients, it would be fine, but I need to perform the .onSnapshot on it to see changes of the clients I am displaying. Is this possible to do? How can I get around this issue?
I am working with AngularFire so it is a bit different, but I also had the problem that I needed to listen to unrelated documents which cannot be queried together.
I solved this with an object that contains all the snapshot listeners. This allows you to unsubscribe from individual client snapshots, or from all snapshots if you do not need them anymore.
const accessibleClients = ['client1', 'client2', 'client3'];
const clientSnapshotObject = {};
const clientDataArray = [];

accessibleClients.forEach(clientId => {
  // keep the unsubscribe function of each listener under its clientId
  clientSnapshotObject[clientId] = db.collection('clients').doc(clientId).onSnapshot(doc => {
    const index = clientDataArray.findIndex(client => doc.id === client.clientId);
    if (index !== -1) {
      // replace the stale data of this client with the new snapshot data
      clientDataArray.splice(index, 1, doc.data());
    } else {
      clientDataArray.push(doc.data());
    }
  });
});
With the clientIds of the accessibleClients array, I create an object of snapshot listeners, keyed by clientId.
The snapshot callback function pushes the specific client's data into clientDataArray. If a snapshot changes, the callback function replaces the old data with the new data.
I do not know your exact data model, but I hope this code helps with your problem.
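For example, tearing the listeners down again could look roughly like this (a sketch based on the object above; onSnapshot returns an unsubscribe function):
// unsubscribe from a single client...
clientSnapshotObject['client2']();
delete clientSnapshotObject['client2'];

// ...or from all clients at once
Object.values(clientSnapshotObject).forEach(unsubscribe => unsubscribe());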
I have the following code:
myTable()
  .update(data, {
    where: criteria
  })
  .then(delay(100))
  .then((entries) => {
    ...
    ...
The .then(delay(100)) part sets a delay of 100ms.
If I don't use that delay, sometimes entries (the resulting updated rows) aren't correct, meaning their fields were not updated. But sometimes they are.
If I'm using the delay, the content of entries is always correct.
Why do I have to set a delay for it to work?
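For reference, the question does not show how delay is implemented; a typical promise-based helper of this kind would look roughly like:
// hypothetical implementation of the delay helper used above
const delay = (ms) => (value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));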
My local MySQL my.cnf file:
[mysqld]
general_log_file = /var/log/mysql.log
general_log = 1
sql_mode = STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
innodb_file_per_table = 1
innodb_log_file_size = 100M
interactive_timeout = 32000
lock_wait_timeout = 41536000
net_read_timeout = 120
net_write_timeout = 900
wait_timeout = 32000
max_allowed_packet = 1G
innodb_buffer_pool_size = 1G
In terms of the table schema and model:
It has a few double columns, a couple of datetime and char columns, one json column and one enum column.
They are defined the same way in the model.
1) Please check the isolation level on the database.
2) In general, transactions may be what you need; if an isolation-level issue appears, try selecting a different isolation level for the transaction (a rough sketch follows below).
3) Cluster mode may result in such issues.
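For point 2, a sketch of selecting an isolation level for a single transaction in Sequelize, assuming a sequelize instance and the myTable model from the question:
const Sequelize = require('sequelize');

await sequelize.transaction(
  { isolationLevel: Sequelize.Transaction.ISOLATION_LEVELS.SERIALIZABLE },
  async (t) => {
    // run the update and any follow-up reads inside the same transaction
    const entries = await myTable().update(data, { where: criteria, transaction: t });
    return entries;
  }
);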
So, the .then(...) block is executed after the update promise is resolved, and it does not influence the update query.
return sequelize.transaction(function (t) {
  // chain all your queries here. make sure you return them.
  // note: the transaction must be passed inside the same options object as `where`
  return yourModel.update(updates, { where: { id: id }, transaction: t })
    .then(function (entries) {
      // your logic with your entries.
    });
}).then(function (result) {
  // Transaction has been committed
  cb(null, result);
  console.log('transaction committed');
}).catch(function (err) {
  // Transaction has been rolled back
  cb(err, null);
  console.log('Transaction rolled back');
});
And if you are curious about what is happening when you add the .then(delay(100)) statement, use a catch block: when you find the entries different from what you expected, it is because the query has already failed to update.
I'm struggling to find an example of using a cursor with pg-promise. node-postgres supports its pg-cursor extension. Is there a way to use that extension with pg-promise? I'm attempting to implement an asynchronous generator (to support for-await-of). pg-query-stream doesn't seem to be appropriate for this use case (I need "pull", rather than "push").
As an example, I use SQLite for my unit tests and my (abridged) generator looks something like this...
async function* () {
  const stmt = await db.prepare(...);
  try {
    while (true) {
      const record = await stmt.get();
      if (isUndefined(record)) {
        break;
      }
      yield record;
    }
  } finally {
    stmt.finalize();
  }
}
Using pg-cursor, the assignment to stmt would become something like client.query(new Cursor(...)), stmt.get would become stmt.read(1) and stmt.finalize would become stmt.close.
Thanks
Following the original examples, we can modify them for use with pg-promise:
const pgp = require('pg-promise')(/* initialization options */);
const db = pgp(/* connection details */);
const Cursor = require('pg-cursor');
const c = await db.connect(); // manually managed connection
const text = 'SELECT * FROM my_large_table WHERE something > $1';
const values = [10];
const cursor = c.client.query(new Cursor(text, values));
cursor.read(100, (err, rows) => {
  cursor.close(() => {
    c.done(); // releasing connection
  });
  // or you can just do: cursor.close(c.done);
});
Since pg-promise doesn't support pg-cursor explicitly, one has to manually acquire the connection object and use it directly, as shown in the example above.
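To get the pull-style async generator from the question on top of that, cursor.read can be wrapped in a promise. This is only a sketch, not an official pg-promise API, and it reuses the cursor and connection c from the code above:
async function* readCursor(cursor, batchSize = 100) {
  try {
    while (true) {
      // promisify pg-cursor's callback-based read
      const rows = await new Promise((resolve, reject) => {
        cursor.read(batchSize, (err, res) => (err ? reject(err) : resolve(res)));
      });
      if (rows.length === 0) {
        break; // cursor exhausted
      }
      for (const row of rows) {
        yield row;
      }
    }
  } finally {
    await new Promise(resolve => cursor.close(resolve));
    c.done(); // release the manually acquired connection
  }
}

// usage: for await (const row of readCursor(cursor)) { ... }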
pg-query-stream doesn't seem to be appropriate for this use case (I need pull, rather than push).
Actually, in the context of these libraries, both streams and cursors are only for pulling data. So it would be ok for you to use streaming also.
UPDATE
For reading data in a simple and safe way, check out pg-iterator.
I'm learning FRP using Bacon.js, and would like to assemble data from a paginated API in a stream.
The module that uses the data has a consumption API like this:
// UI module, displays unicorns as they arrive
beautifulUnicorns.property.onValue(function(allUnicorns){
console.log("Got "+ allUnicorns.length +" Unicorns");
// ... some real display work
});
The module that assembles the data requests sequential pages from an API and pushes onto the stream every time it gets a new data set:
// beautifulUnicorns module
var curPage = 1;
var stream = new Bacon.Bus();
var property = stream.toProperty();
property.onValue(function(){}); // You have to add an empty subscriber, otherwise future onValues will not receive the initial value. https://github.com/baconjs/bacon.js/wiki/FAQ#why-isnt-my-property-updated
var allUnicorns = []; // !!! stateful list of all unicorns ever received. Is this idiomatic for FRP?
var getNextPage = function() {
  /* get data for subsequent pages.
     Skipping for clarity */
};
var gotNextPage = function (resp) {
  Array.prototype.push.apply(allUnicorns, resp); // just adds the responses to the existing array reference
  stream.push(allUnicorns);
  curPage++;
  if (curPage <= pageLimit) { getNextPage(); }
};
How do I subscribe to the stream in a way that provides me with a full list of all unicorns ever received? Is this flatMap or something similar? I don't think I need a new stream out of it, but I don't know. I'm sorry, I'm new to the FRP way of thinking. To be clear, assembling the array works; it just feels like I'm not doing the idiomatic thing.
I'm not using jQuery or another Ajax library for this, which is why I'm not using Bacon.fromPromise.
You may also wonder why my consuming module wants the whole set instead of just the incremental update. If it were just appending rows, that could be OK, but in my case it's an infinite scroll and it should draw data only if both: 1. data is available and 2. the area is on screen.
This can be done with the .scan() method. You will also need a stream that emits the items of one page; you can create it with .repeat().
Here is draft code (sorry, not tested):
var itemsPerPage = Bacon.repeat(function(index) {
  var pageNumber = index + 1;
  if (pageNumber < PAGE_LIMIT) {
    return Bacon.fromCallback(function(callback) {
      // your method that talks to the server
      getDataForAPage(pageNumber, callback);
    });
  } else {
    return false;
  }
});

var allItems = itemsPerPage.scan([], function(allItems, itemsFromAPage) {
  return allItems.concat(itemsFromAPage);
});

// Here you go
allItems.onValue(function(allUnicorns){
  console.log("Got " + allUnicorns.length + " Unicorns");
  // ... some real display work
});
As you noticed, you also won't need the .onValue(function(){}) hack or the curPage external state.
Here is a solution using flatMap and fold. When dealing with the network, you have to remember that the data can come back in a different order than the requests were sent; that's why there is the combination of fold and map.
var pages = Bacon.fromArray([1, 2, 3, 4, 5])

var requests = pages.flatMap(function(page) {
  return doAjax(page)
    .map(function(value) {
      return {
        page: page,
        value: value
      }
    })
}).log("Data received")

var allData = requests.fold([], function(arr, data) {
  return arr.concat([data])
}).map(function(arr) {
  // I would normally write this as a oneliner
  var sorted = _.sortBy(arr, "page")
  var onlyValues = _.pluck(sorted, "value")
  var inOneArray = _.flatten(onlyValues)
  return inOneArray
})

allData.log("All data")

function doAjax(page) {
  // This would actually be Bacon.fromPromise($.ajax...)
  // Math.random to simulate the fact that requests can return out
  // of order
  return Bacon.later(Math.random() * 3000, [
    "Page" + page + "Item1",
    "Page" + page + "Item2"])
}
http://jsbin.com/damevu/4/edit