How to do nested looping over many pages in CasperJS - javascript

I don't have a clue where to start with this. Basically I need CasperJS to run through about 15 different pages; for each page it runs through, it needs to get the data for 150 different locations that need to be set as cookie values. For each location, I need to check the data for 5 different dates.
Any one of these on its own seems pretty straightforward, but trying to get all three to happen is confusing me.
I tried to set it up this way:
for (Iterate through URLs) {
    for (Iterate through locations) {
        for (Iterate through dates) {
            phantom.addCookie({
                // Cookie data here based on location and date
            });
            casper.start(url)
                .then(function(){
                    // Do some stuff here
                })
                .run();
        }
    }
}
Essentially what it does is loop through everything, then load the page for the last link, at the last location, on the last date. Every other location gets skipped. Is there an easier way to do this? Better yet, is there a way to tell my JavaScript loop to wait for Casper to finish doing what it needs to do before jumping to the next loop iteration?
I'm happy to provide more details if needed. I tried to simplify the process as best I can without cutting out needed info.

That's pretty much it. Two things to look out for:
casper.start() and casper.run() should only be called once per script. You can use casper.thenOpen() to open different URLs.
Keep in mind that all casper.then*() and casper.wait*() functions are asynchronous step functions and are only scheduled for execution after the current step. Since JavaScript has function-level scope, you need to "fix" the iteration variables for each iteration, otherwise you will only get the last URL. (More information)
Example code:
casper.start(); // deliberately empty
for (var i = 0; i < urls.length; i++) {
    for (var j = 0; j < locations.length; j++) {
        for (var k = 0; k < dates.length; k++) {
            (function(url, location, date){
                casper.then(function(){
                    phantom.addCookie({
                        // Cookie data here based on location and date
                    });
                }).thenOpen(url)
                .then(function(){
                    // Do some stuff here
                });
            })(urls[i], locations[j], dates[k]);
        }
    }
}
casper.run(); // start all the scheduled steps
If you use Array.prototype.forEach instead of the for-loop, then you can safely skip the use of the IIFE to fix the variables.
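For example, here is a sketch of the same scheduling with forEach, assuming urls, locations and dates are plain arrays:
casper.start(); // deliberately empty
urls.forEach(function(url) {
    locations.forEach(function(location) {
        dates.forEach(function(date) {
            casper.then(function(){
                phantom.addCookie({
                    // Cookie data here based on location and date
                });
            }).thenOpen(url)
            .then(function(){
                // Do some stuff here
            });
        });
    });
});
casper.run(); // start all the scheduled steps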
I'm not sure, but you may need to open a page first before adding a cookie for its domain. It may be that PhantomJS only accepts a cookie when a page of that cookie's domain is currently open.
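If that turns out to be the case, a hedged variation is to open the URL once, set the cookie, and then open it again so the request is actually made with the cookie:
casper.thenOpen(url)
    .then(function(){
        phantom.addCookie({
            // Cookie data here based on location and date
        });
    })
    .thenOpen(url)
    .then(function(){
        // Do some stuff here, now with the cookie set for this domain
    });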

Related

Firebase concurrency issue: how to prevent 2 users from getting the same Game Key?

DATABASE:
SITUATION:
My website sells keys for a game.
A key is a randomly generated string of 20 characters whose uniqueness is guaranteed (not created by me).
When someone buys a key, NTWKeysLeft is read to find its first element. That element is then copied, deleted from NTWKeysLeft and pasted to NTWUsedKeys.
Said key is then displayed on the buyer's screen.
PROBLEM:
How can I prevent the following problem :
1) 2 users buy the game at the exact same time.
2) They both get the same key read from NTWKeysLeft (first element in list)
3) And thus both get the same key
I know about Firebase Transactions already. I am looking for a pseudo-code/code answer that will point me in the right direction.
CURRENT CODE:
Would something like this work ? Can I put a transaction inside another transaction ?
var keyRef = admin.database().ref("NTWKeysLeft");
keyRef.limitToFirst(1).transaction(function (keySnapshot) {
    keySnapshot.forEach(function(childKeySnapshot) {
        // Key is read here:
        var key = childKeySnapshot.val();
        // How can I prevent two concurrent read requests from reading the same key? Using a transaction to change a boolean could only happen after the read happens, since I first need to read in order to know which key boolean to change.
        var selectedKeyRef = admin.database().ref("NTWKeysLeft/" + key);
        var usedKeyRef = admin.database().ref("NTWUsedKeys/" + key);
        var keysLeftRef = admin.database().ref("keysLeft");
        selectedKeyRef.remove();
        usedKeyRef.set(true);
        keysLeftRef.transaction(function (keysLeft) {
            if (!keysLeft) {
                keysLeft = 0;
            }
            keysLeft = keysLeft - 1;
            return keysLeft;
        });
        res.render("bought", {key: key});
    });
});
Just to be clear: keyRef.limitToFirst(1).transaction(function (keySnapshot) { does not work, but I would like to accomplish something to that effect.
Most of this depends on how you generate the keys, since that determines how likely collisions are. I recommend reading about Firebase's push IDs to get an idea how unique those are, and compare that to your keys. If you can't statistically guarantee the uniqueness of your keys, or if statistical uniqueness isn't good enough, you'll have to use transactions to prevent conflicting updates.
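If you do end up needing a transaction, a rough sketch (my own, not part of this answer, and assuming the Admin SDK and res from the question) could claim the first remaining key inside a single transaction on the key list:
var keysRef = admin.database().ref("NTWKeysLeft");
var claimedKey = null;

keysRef.transaction(function(keys) {
    if (!keys) return;                   // nothing left to sell: abort the transaction
    claimedKey = Object.keys(keys)[0];   // pick the first remaining key
    keys[claimedKey] = null;             // setting a child to null removes it
    return keys;
}, function(error, committed) {
    if (error || !committed || !claimedKey) {
        // sold out, or the transaction could not be committed
        return;
    }
    admin.database().ref("NTWUsedKeys/" + claimedKey).set(true);
    res.render("bought", { key: claimedKey });
});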
The OP has changed the question a bit, so I will update the answer as follows: I will leave the bottom part about transactions as it was and put the new update on top.
I can see two ways to proceed:
1) Handle the locking yourself and use JavaScript callbacks or other mechanisms to prevent simultaneous access to a portion of the code.
or
2) Use Firebase transactions. In this case, I don't have a setup ready to share code beyond the sample/pseudocode provided at the bottom of this answer.
With respect to option 1 above:
I have coded a use case and put it on Plunker. It uses JavaScript callbacks to queue users as they try to access the part of the code under lock.
I. A user comes in and is placed in the queue.
II. The callback function is then called, which pops users on a first-come, first-served basis. I have the keys at the top of the page, shared by the functions.
I have wired this to a button click event; when you click the button twice quickly, you will see keys assigned, and they are different keys.
To read this code, click on the script.js file on the left and read starting from the bottom of the page where it calls the functions.
Here is the sample code in Plunker. After opening it, click Run at the top of the page and then click the button on the right-hand side. An alert will pop up to show which key is given (note: there are two calls back to back to simulate two users coming in at the same time).
https://plnkr.co/edit/GVFfvqQrlLeMaKlo5FCj?p=info
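The gist of the callback-queue idea, as a rough sketch rather than the exact Plunker code (all names below are made up):
var keys = ["AAAA-1111", "BBBB-2222", "CCCC-3333"]; // keys shared by the functions
var queue = [];                                     // buyers waiting for a key

function requestKey(callback) {
    queue.push(callback); // every buyer is queued first
    processQueue();
}

function processQueue() {
    while (queue.length > 0 && keys.length > 0) {
        var callback = queue.shift(); // first come, first served
        var key = keys.shift();       // each buyer gets a different key
        callback(key);
    }
}

// clicking the button twice in quick succession still hands out two different keys:
requestKey(function(key) { alert("buyer 1 got " + key); });
requestKey(function(key) { alert("buyer 2 got " + key); });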
The Firebase transactions:
Use Firebase transactions to prevent concurrent read/write issues - below is the transaction() method signature:
transaction(transactionUpdate, onComplete, applyLocally) returns a firebase.Promise containing
{ committed: boolean, snapshot: nullable firebase.database.DataSnapshot }
Note that transaction() needs a write operation as its first parameter, and in your case it looks like you're removing a key upon success, hence the following function to be used as the write operation.
Try this pseudo code:
// first, get a reference to your db
var selectedKeyRef = admin.database().ref("NTWKeysLeft/" + key);

// update function, needed by transaction() as its first parameter
function writeOperation(currentValue) {
    // returning null from the update function removes the key
    return null;
}

selectedKeyRef.transaction(writeOperation, function(error, committed, snapshot) {
    if (error) {
        console.log('Transaction failed abnormally!', error);
    } else if (!committed) {
        console.log('We aborted the transaction (because xyz).');
    } else {
        console.log('Key removed!');
    }
    console.log("showKey: ", snapshot.val());
}); // end of the transaction() method call
Docs - to see the parameters/return objects of the transaction() method, see:
https://firebase.google.com/docs/reference/js/firebase.database.Reference#transaction
From the docs: "If another client writes to the location before your new value is successfully written, your update function is called again with the new current value, and the write is retried."
https://firebase.google.com/docs/database/web/read-and-write#save_data_as_transactions
I don't think the problem you're worried about can happen. JavaScript, including Node, is single-threaded and can only do one thing at a time. If you had a big server infrastructure with more than one server running this code, then it would be possible, but for a single Node program, there's no problem.
Since none of the previous answers discussing the scope of Transactions worked out, I would suggest a different workaround.
Is it possible to trigger the unique code generation when someone buys a code? If yes, you could generate the unique string when the "buy" button is clicked, display the ID, and save the ID to your database.
Later, the user enters the key in your game, which checks whether the ID exists in your database. This would probably also save a bit of data, since you do not need to keep track of the unique IDs before they get bought, and you will not run out of IDs, since they are only generated when necessary.
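A loose sketch of that workaround (assuming an Express-style route, the Admin SDK from the question, and a hypothetical generateKey() helper that returns the 20-character string):
app.post("/buy", function(req, res) {
    var key = generateKey(); // hypothetical helper producing a unique 20-character string
    // the key only exists once someone buys it, so there is no shared list to race over
    admin.database().ref("NTWUsedKeys/" + key).set(true).then(function() {
        res.render("bought", { key: key });
    });
});

// later, the game validates a key the player typed in:
function isValidKey(key, callback) {
    admin.database().ref("NTWUsedKeys/" + key).once("value", function(snapshot) {
        callback(snapshot.exists());
    });
}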

nedb update and delete methods create a new entry instead of updating the existing one

I'm using nedb and I'm trying to update an existing record by matching its ID and changing a title property.
What happens is that a new record gets created, and the old one is still there.
I've tried several combinations, and tried googling for it, but the search results are scarce.
var Datastore = require('nedb');
var db = {
    files: new Datastore({ filename: './db/files.db', autoload: true })
};
db.files.update(
    {_id: id},
    {$set: {title: title}},
    {},
    callback
);
What's even crazier is that when performing a delete, a new record gets added again, but this time the record has a weird property:
{"$$deleted":true,"_id":"WFZaMYRx51UzxBs7"}
This is the code that I'm using:
db.files.remove({_id: id}, callback);
The nedb docs say the following:
localStorage has size constraints, so it's probably a good idea to set
recurring compaction every 2-5 minutes to save on space if your client
app needs a lot of updates and deletes. See database compaction for
more details on the append-only format used by NeDB.
 
Compacting the database
Under the hood, NeDB's persistence uses an append-only format, meaning
that all updates and deletes actually result in lines added at the end
of the datafile. The reason for this is that disk space is very cheap
and appends are much faster than rewrites since they don't do a seek.
The database is automatically compacted (i.e. put back in the
one-line-per-document format) everytime your application restarts.
You can manually call the compaction function with
yourDatabase.persistence.compactDatafile which takes no argument. It
queues a compaction of the datafile in the executor, to be executed
sequentially after all pending operations.
You can also set automatic compaction at regular intervals with
yourDatabase.persistence.setAutocompactionInterval(interval), interval
in milliseconds (a minimum of 5s is enforced), and stop automatic
compaction with yourDatabase.persistence.stopAutocompaction().
Keep in mind that compaction takes a bit of time (not too much: 130ms
for 50k records on my slow machine) and no other operation can happen
when it does, so most projects actually don't need to use it.
I haven't used nedb myself, but from the docs it uses an append-only format for its update and delete methods (the localStorage note only concerns size constraints in the browser).
Looking into the source code, the persistence tests check for the $$deleted key, and they mention: "If a doc contains $$deleted: true, that means we need to remove it from the data".
So, in my opinion, you can try compacting the database manually, or the second way (automatic compaction at regular intervals) can be useful.
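For example, with the db object from the question, the compaction calls quoted above would look roughly like this:
// queue a one-off compaction of the datafile
db.files.persistence.compactDatafile();

// or compact automatically every 5 minutes (interval in milliseconds, a minimum of 5s is enforced)
db.files.persistence.setAutocompactionInterval(5 * 60 * 1000);

// stop automatic compaction again if needed
db.files.persistence.stopAutocompaction();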

How to make specific JS script work only for certain page

How do I make a script run only on a specific page, check the existing cookies, and redirect to a specified page if they exist? So far I have something like this, but it keeps triggering the function on every page for some reason (just to clarify, I use jQuery Mobile):
$(document).on('pageinit', function(){
    jQuery(function($){
        if ($('body#home').length){
            if($.cookie('usr') && $.cookie('psw')){
                $.mobile.changePage("http://imes.**********.com/userpanel.php");
            }
        }
    });
});
In the future I won't be storing the name and password in a cookie; I know about the security issues of this approach. I will be storing a generated key to match the user from the user cookie, but for now, at this stage of testing, I use these cookies.
Your ID approach is essentially good, but testing for the length is something I don't trust (I've got no arguments for that, just a feeling).
That's why I would go for this approach:
$("body#home").on("load", function() {
// Eat your cookies here.
});
P.S.
Naming the tag is useless; it only costs time. The browser reads CSS selectors (and I think JS query selectors too; this P.S. is based on that assumption) from right to left, so it will first find the one element with the ID 'home' and then get all bodies (in this case there is only one body, but imagine you used div#home and had 100 divs on your page) and pick the right one. Since every ID can only occur once on a page, it is not necessary to name the tag.
I'm not too sure what you are asking, but if you want to run code on a specific page only you can play around with location.pathname to determine what page you are on.
jQuery(function($){
    if ( location.pathname.indexOf('/home') !== -1 ){
        // ...
    }
});
This approach would be faster than having to look up DOM nodes. Also, the username and password should be handled by sessions in your backend language.

Database Backed Work Queue

My situation ...
I have a set of workers that are scheduled to run periodically, each at different intervals, and would like to find a good implementation to manage their execution.
Example: Let's say I have a worker that goes to the store and buys me milk once a week. I would like to store this job and its configuration in a MySQL table. But it seems like a really bad idea to poll the table (every second?) to see which jobs are ready to be put into the execution pipeline.
All of my workers are written in javascript, so I'm using node.js for execution and beanstalkd as a pipeline.
If new jobs (ie. scheduling a worker to run at a given time) are being created asynchronously and I need to store the job result and configuration persistently, how do I avoid polling a table?
Thanks!
I agree that it seems inelegant, but given the way that computers work something *somewhere* is going to have to do polling of some kind in order to figure out which jobs to execute when. So, let's go over some of your options:
Poll the database table. This isn't a bad idea at all - it's probably the simplest option if you're storing the jobs in MySQL anyway. A rate of one query per second is nothing - give it a try and you'll notice that your system doesn't even feel it.
Some ideas to help you scale this to possibly hundreds of queries per second, or just keep system resource requirements down:
Create a second table, 'job_pending', where you put the jobs that need to be executed within the next X seconds/minutes/hours.
Run queries on your big table of all jobs only once in a longer while, then populate the small table which you query every shorter while.
Remove jobs that were executed from the small table in order to keep it small.
Use an index on your 'execute_time' (or whatever you call it) column.
If you have to scale even further, keep the main jobs table in the database, and use the second, smaller table I suggest, just put that table in RAM: either as a memory table in the DB engine, or in a queue of some kind in your program. Query the queue at extremely short intervals if you have to - it'll take some extreme use cases to cause any performance issues here.
The main issue with this option is that you'll have to keep track of jobs that were in memory but didn't execute, e.g. due to a system crash - more coding for you...
Create a thread for each of a bunch of jobs (say, all jobs that need to execute in the next minute), and call thread.sleep(millis_until_execution_time) (or whatever, I'm not that familiar with node.js).
This option has the same problem as no. 2 - you have to keep track of job execution for crash recovery. It's also the most wasteful imo - every sleeping job thread still takes system resources.
There may be additional options of course - I hope that others answer with more ideas.
Just realize that polling the DB every second isn't a bad idea at all. It's the most straightforward way imo (remember KISS), and at this rate you shouldn't have performance issues so avoid premature optimizations.
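As a rough illustration of option 1 in Node (the table and column names are invented here, and runJob() stands in for pushing the job into beanstalkd):
var mysql = require('mysql'); // assuming the 'mysql' npm package

var db = mysql.createConnection({ host: 'localhost', user: 'worker', database: 'jobs_db' });
db.connect();

setInterval(function() {
    // grab the jobs that are due and not yet finished
    db.query(
        'SELECT * FROM jobs WHERE execute_time <= NOW() AND finished = 0',
        function(err, rows) {
            if (err) { return console.error(err); }
            rows.forEach(function(job) {
                // in a real system you would also mark the job as picked up here
                runJob(job); // e.g. put the job into the beanstalkd pipeline
            });
        }
    );
}, 1000); // poll once per second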
Why not have a Job object in node.js that's saved to the database?
var Job = {
    id: long,
    task: String,
    configuration: JSON,
    dueDate: Date,
    finished: bit
};
I would suggest you only store the id in RAM and leave all the other Job data in the database. When your timeout function finally runs it only needs to know the .id to get the other data.
var job = createJob(...); // create from async data somewhere.
job.save(); // save the job.
var id = job.id; // only store the id in RAM
// ask the job to be run in the future.
setTimeout(function() {
    // load the job when you want to run it
    db.load(id, function(job) {
        // run it.
        run(job);
        // mark as finished
        job.finished = true;
        // save your finished = true state
        job.save();
    });
}, job.dueDate - Date.now());
// remove job from RAM now.
job = null;
If the server ever crashes, all you have to do is query all jobs that have [finished=false], load them into RAM, and start the setTimeouts again.
If anything goes wrong you should be able to restart cleanly like such:
db.find("job", { finished: false }, function(jobs) {
each(jobs, function(job) {
var id = job.id;
setTimeout(Date.now - job.dueDate, function() {
// load the job when you want to run it
db.load(id, function(job) {
// run it.
run(job);
// mark as finished
job.finished = true;
// save your finished = true state
job.save();
});
});
job = null;
});
});

jQuery $.each()-problem

I'm making a WordPress plugin and I have a function where I import images. This is done with a $.each() loop that calls a .load() function on every iteration. The page the load function calls downloads the image and returns a number. The number is inserted into a span element. The source and destination arrays are read from the LI elements of hidden ULs.
This way the user sees a counter counting from zero up to the total number of images being imported. You can see my jQuery code below:
jQuery(document).ready(function($) {
    $('#mrc_imp_img').click(function(){
        var dstA = [];
        var srcA = [];
        $("#mrc_dst li").each(function() { dstA.push($(this).text()) });
        $("#mrc_src li").each(function() { srcA.push($(this).text()) });
        $.each(srcA, function (i,v) {
            $('#mrc_imgimport span.fc').load('/wp-content/plugins/myplugin/imp.php?num='+i+'&dst='+dstA[i]+'&src='+srcA[i]);
        });
    });
});
This works pretty well, but sometimes it looks like the load function isn't updating the DOM as fast as it should: the number the span is updated with is sometimes lower than the previous one, and almost every time a lower number replaces the final number at the end. How can I prevent this from happening, and how can I hide '#mrc_imp_img' when the $.each loop is done?
AJAX calls which have been called earlier are not guaranteed to finish earlier so the smaller number can overwrite the bigger. One solution is to simply increment the counter on each successful call:
jQuery(function($) {
    $('#mrc_imp_img').click(function(){
        var dstList = $("#mrc_dst li");
        var srcList = $("#mrc_src li");
        dstList.each(function(i) {
            var dst = $(this).text();
            var src = srcList.eq(i).text();
            $.post('/wp-content/plugins/myplugin/imp.php?num='+i+'&dst='+dst+'&src='+src, function() {
                var counter = $('#mrc_imgimport span.fc');
                counter.text((parseInt(counter.text(), 10) || 0) + 1);
            });
        });
    });
});
(Changed the code to avoid unnecessary array operations, changed onready call to use shorthand, changed AJAX call to use POST which should be used for operations that change state.)
Most servers likely have a finite number of threads running. If you're firing off 10 calls at once, and your server only has 5 threads, 5 of them will fail.
Also - once you max out all the running threads, no other users can access the server, so you're essentially DOS-ing the server.
If you don't mind slowing it down to one call at a time, do what Tgr recommended, which serializes the calls, waiting until each one completes before starting the next one.
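A rough sketch of that serialized approach, reusing dstA and srcA from the question (my own code, not Tgr's exact answer):
function importNext(i) {
    if (i >= srcA.length) {
        $('#mrc_imp_img').hide(); // all images imported, hide the import button
        return;
    }
    $.post('/wp-content/plugins/myplugin/imp.php?num='+i+'&dst='+dstA[i]+'&src='+srcA[i], function() {
        $('#mrc_imgimport span.fc').text(i + 1); // the counter now increases strictly in order
        importNext(i + 1);                       // start the next request only after this one finished
    });
}
importNext(0);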
I would prefer what Yoda suggested: turn it into one server call that processes the entire array. If you really want to update a counter client-side, that one server call can update a counter in the database, and a second AJAX call can poll the server every few seconds to find out where the counter is. Obviously it won't be guaranteed to be sequential, but it will be better for your server's health. You could also fake the sequential aspect (if you're on #3 and the next call yields #6, increment it client-side one by one).
As far as not seeing an alert goes, there is probably a JavaScript error before or on the alert line. Try using Firebug and the console.log statement, or even better, step through it with the Firebug debugger.
