Recursive Facebook Page Webscraper with Selenium & Node.js - javascript

I am trying to loop through an array of Facebook page IDs and print the page source of each page's events tab. Unfortunately, I only get the source of the last page ID in the array, repeated once per array element. For example, with 3 IDs in the array I get the source of the last page 3 times.
I have already experimented with async/await, but without success.
The expected outcome is the source of each page.
Thank you for any help and examples.
// Looping through pages
pages.forEach(
  function(page) {
    // Creating URL
    let url = "https://mbasic.facebook.com/"+page+"?v=events";
    // Getting URL
    driver.get(url).then(
      function() {
        // Page loaded
        driver.getPageSource().then(function(result) {
          console.log(result);
        });
      }
    );
  }
);

You have hit the same issue I did when I built a scraper using Python and Selenium. Facebook has countermeasures against changing the URL directly: I received the same data again and again, even though the navigation was automated. To get good results you need access to Facebook's Graph API, which provides a complete object for a Facebook page along with its pagination URLs.
The second way I got it right was to use Selenium's click action on the button that loads the next page, rather than changing the URL as if you were typing it. I prefer using the Graph API.
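Separately from the countermeasures described above, one likely contributor to the repeated output is that forEach starts every driver.get call without waiting for the previous one, so by the time the getPageSource callbacks run, the single shared browser may already be on the last URL. The following is a minimal sketch of a sequential loop. It assumes driver is a selenium-webdriver WebDriver instance (any object with promise-returning get and getPageSource methods behaves the same way); the function name scrapePages is hypothetical.

```javascript
// Sketch: visit each page one at a time with a single shared driver.
// `driver` is assumed to expose promise-returning get() and getPageSource(),
// as selenium-webdriver's WebDriver does. `scrapePages` is a made-up name.
async function scrapePages(driver, pages) {
  const sources = [];
  for (const page of pages) {
    const url = "https://mbasic.facebook.com/" + page + "?v=events";
    // Wait for navigation to finish before reading the source,
    // so the next driver.get() cannot overtake this page.
    await driver.get(url);
    sources.push(await driver.getPageSource());
  }
  return sources;
}
```

Calling scrapePages(driver, pages).then(console.log) would then log one source per ID, in array order, provided Facebook actually serves a different page for each URL.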

Related

Apify crawler with more than 2 clickable element

I am trying to create an Apify crawler that has multiple clickable elements. The first click paginates, the second visits each result, and the third visits a section of each result to extract more information.
function pageFunction(context) {
  var $ = context.jQuery;
  if (context.request.label === 'category' || context.request.label === 'detail') {
    context.skipLinks();
    var result = {
      item_name: $('name').text(),
      categories: $('.categories').text(),
      email: $('email').text(),
      kvk: $('kvk').text()
    };
    return result;
  } else {
    context.skipOutput();
  }
}
The first two clicks happen: it paginates, visits the results, and extracts the first three values: item_name, categories and email.
The fourth value, kvk, is not returned. I think either the third click is not happening or the code I used has some errors. Can anyone please help me fix this?
One of the problems may be context.skipLinks(), a function that prevents any new pages from being enqueued. Also, did you check all the selectors in the developer console? For debugging I would advise you to log the content of the page so you know it loaded. First, you need to find the source of the problem.
One side note: I would advise you to start developing with our modern web scraper. The Crawler platform is no longer maintained and may perform worse in some cases.

Attempting to use a global array inside of a JS file shared between 2 HTML files and failing

So I have one HTML page which consists of a bunch of form elements for the user to fill out. I push all the selections that the user makes into one global variable, allTheData[], inside my only JavaScript file.
Then I have a 2nd HTML page which loads in after a user clicks a button. This HTML page is supposed to take some of the data inside the allTheData array and display it. I am calling the function to display allTheData by using:
window.onload = function () {
  if (window.location.href.indexOf('Two') > -1) {
    carousel();
  }
}
function carousel() {
  console.log("oh");
  alert(allTheData.toString());
}
However, I am finding that nothing gets displayed in my 2nd HTML page, and the allTheData array appears to be empty despite being filled in previously on the 1st HTML page. I am pretty confident that I am correctly pushing data into the allTheData array, because when I use alert(allTheData.toString()) while I'm still on my 1st HTML page, all the data gets displayed.
I think there's something happening during my transition from the 1st to 2nd HTML page that causes the allTheData array to empty or something but I am not sure what it is. Please help a newbie out!
Web Storage: Each page load starts with a fresh JavaScript environment, so global variables do not survive the move from the 1st page to the 2nd. This sounds like a job for the window.sessionStorage object, which along with its cousin window.localStorage allows data-as-strings to be saved in the user's browser for use across pages on the same origin.
However, keep in mind that they are both cookie-like features, and therefore their availability depends on the user's cookie preferences for each domain.
A simple condition will determine if the web storage option is available, like so...
if (window.sessionStorage) {
  // continue with app ...
} else {
  // inform user about web storage
  // and ask them to accept Cookies
  // before reloading the page (or whatever)
}
Saving to and retrieving from web storage requires conversion to-and-from String data types, usually via JSON methods like so...
// save to...
var array = ['item0', 'item1', 2, 3, 'IV'];
sessionStorage.myApp = JSON.stringify(array);
// retrieve from...
var array = JSON.parse(sessionStorage.myApp);
There are more specific methods available than these. Further details, compatibility tables, etc. are in Using the Web Storage API on MDN.
Hope that helps. :)
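Applied to the question's allTheData array, the save and retrieve steps could be wrapped in two small helpers. This is only a sketch: the helper names and the storage key are hypothetical, and the storage argument is assumed to be a Web Storage object such as window.sessionStorage.

```javascript
// Hypothetical key under which the array is stored.
var STORAGE_KEY = 'allTheData';

// Serialize the array and store it; `storage` is assumed to be
// window.sessionStorage or window.localStorage.
function saveAllTheData(storage, data) {
  storage.setItem(STORAGE_KEY, JSON.stringify(data));
}

// Read the array back; returns an empty array if nothing was saved yet.
function loadAllTheData(storage) {
  var raw = storage.getItem(STORAGE_KEY);
  return raw === null ? [] : JSON.parse(raw);
}
```

On the 1st page, saveAllTheData(window.sessionStorage, allTheData) would run before navigating away; on the 2nd page, carousel() could start with var allTheData = loadAllTheData(window.sessionStorage).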

Get all the emails from a whole website (not only a page)

Hey, I'm trying to write code that automates extracting all the emails from a website by going through all the links and checking for a regex match, but I can't figure it out. Here is what I've got.
function getEmails() {
  var search_in = document.body.innerHTML;
  var string_context = search_in.toString();
  // Match email-like strings; match() returns null when nothing is found
  var array_mails = string_context.match(/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/gi);
  return array_mails;
}
You have to create a loop that opens every link present on the main page: make an AJAX request for each one, run your function on each page that comes back, and push the emails it finds into an array. Then you will have one array with all the results. You also need to make sure your loop isn't infinite, so store every link you have already visited and skip it the next time it appears.
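The loop described above can be sketched as a breadth-first crawl with a set of visited links. Everything here is illustrative: crawlEmails, fetchPage and maxPages are hypothetical names, fetchPage is assumed to be a function you supply that returns a promise for a page's HTML, and links are pulled out with a simple regex rather than a real HTML parser. Because the pages arrive as text, the email regex is run on the fetched HTML string instead of document.body.

```javascript
var EMAIL_RE = /[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+/g;

// Crawl same-origin links starting at startUrl and collect email-like strings.
// fetchPage(url) is an assumed callback returning a Promise of the page HTML.
async function crawlEmails(startUrl, fetchPage, maxPages) {
  var origin = new URL(startUrl).origin;
  var limit = maxPages || 100;      // safety cap so the crawl always ends
  var visited = new Set();          // links already opened
  var queue = [new URL(startUrl).href];
  var emails = new Set();           // a Set removes duplicates

  while (queue.length > 0 && visited.size < limit) {
    var url = queue.shift();
    if (visited.has(url)) continue;
    visited.add(url);

    var html;
    try {
      html = await fetchPage(url);
    } catch (err) {
      continue;                     // skip pages that fail to load
    }

    (html.match(EMAIL_RE) || []).forEach(function (m) { emails.add(m); });

    // Naive link extraction; a fresh regex per page keeps lastIndex at 0.
    var hrefRe = /href\s*=\s*["']([^"'#]+)/gi;
    var match;
    while ((match = hrefRe.exec(html)) !== null) {
      var next;
      try {
        next = new URL(match[1], url);   // resolve relative links
      } catch (err) {
        continue;
      }
      next.hash = '';
      if (next.origin === origin && !visited.has(next.href)) {
        queue.push(next.href);
      }
    }
  }
  return Array.from(emails);
}
```

In a browser, fetchPage could be function (url) { return fetch(url).then(function (r) { return r.text(); }); }, which only works for pages on the same origin or ones that allow cross-origin requests.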

Change URL data on page load

Hello, I have a small website where data is passed between pages via the URL.
My question is: can someone tamper with it and make it always pass the same data?
For example, let's say that when you click button one, the page below is loaded.
example.com?clicked=5
On that page I take the value 5 and get some more data from the user through a form, then pass all the data to a third page. On this page the data is entered into a database. While reviewing the collected data I saw some unusual combinations of records. How can I verify this?
Yes. Everything in the URL and in the page's JavaScript is visible to the visitor, so anyone can change it.
You will need to write some code on your backend to validate it.
Always assume that your users/customers will try to hack your system.
So take precautions: check that the user is the user of the session, that they are logged in, and that they are allowed to do what they are trying to do. Check that the record they are trying to get exists.
If you are running a standalone site where you wrote all the code from scratch, you will need to implement these things yourself,
for example using standard PHP sessions, doing the data validation, etc.
Or you can find classes that other people have written; there are a lot of them on Google, as this is a common problem in web programming.
If you are using a mainstream backend framework, it probably already has one, so go check its documentation.
html:
<a id = 'button-one' name = '5'> Button One </a>
javascript:
window.onload = function() {
  document.getElementById('button-one').onclick = function() {
    changeURL(this.attributes.name.value);
  };
};
function changeURL(data) {
  location.hash = data;
}
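As an illustration of the backend validation recommended above, here is a minimal sketch of checking the clicked parameter before trusting it. The function name parseClicked and the list of allowed values are hypothetical; the same idea applies in PHP or any other backend language.

```javascript
// Hypothetical list of button values the site actually offers.
var ALLOWED_BUTTONS = [1, 2, 3, 4, 5];

// Return the validated button number, or null if the query string
// is missing the parameter or carries an unexpected value.
function parseClicked(queryString) {
  var params = new URLSearchParams(queryString);
  var raw = params.get('clicked');
  if (raw === null || !/^\d+$/.test(raw)) {
    return null;                    // missing or not a plain integer
  }
  var value = Number(raw);
  return ALLOWED_BUTTONS.indexOf(value) !== -1 ? value : null;
}
```

A request whose parameter fails this check should be rejected on the server before anything is written to the database.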

creating a configuration page and passing variables to a simply.js app

I developed a simply.js app that fetches bus arrival times from a web service. The problem is that, as of now, it works for only one stop.
I want to create a configuration page with a multiselect where I can choose multiple stops, send them to the Pebble as an array, and cycle through the array with the up/down buttons to show different bus stops.
I'm not good at C; I prefer JavaScript, which is why I used simply.js.
I'd like to know and learn how to do it, because I don't think there is much documentation or many examples online.
Found a similar question/issue on the simply.js GitHub page: https://github.com/Meiguro/simplyjs/issues/11. The code example below comes from Meiguro's first answer. The code sends the user to your configuration website, which you should set up to send JSON back.
You can probably copy the code example for enabling the configuration window and paste it at the beginning of your main Pebble app.js file. Do not forget to add "capabilities": [ "configurable" ], to your appinfo.json file. If you are using CloudPebble, go to the settings page of your app and make sure the configurable box is checked.
var initialized = false;
Pebble.addEventListener("ready", function() {
  console.log("ready called!");
  initialized = true;
});
Pebble.addEventListener("showConfiguration", function() {
  console.log("showing configuration");
  //change this url to yours
  Pebble.openURL('http://assets.getpebble.com.s3-website-us-east-1.amazonaws.com/pebble-js/configurable.html');
});
Pebble.addEventListener("webviewclosed", function(e) {
  console.log("configuration closed");
  // webview closed
  var options = JSON.parse(decodeURIComponent(e.response));
  console.log("Options = " + JSON.stringify(options));
});
(https://github.com/pebble-hacks/js-configure-demo/blob/master/src/js/pebble-js-app.js)
To then push the settings back to the Pebble, I think you need to add
Pebble.sendAppMessage(options);
just after
console.log("Options = " + JSON.stringify(options));
so that it runs once options has been parsed.
I found this out from the last post on this Pebble forum thread: http://forums.getpebble.com/discussion/12854/appmessage-inbox-handlers-not-getting-triggered-by-javascript-configuration-data
You can also find a configuration website example named configurable.html in the same repository as the code example, at https://github.com/pebble-hacks/js-configure-demo
Hope this helps a bit on the way to achieving your goal.
So the configuration page is a web page, and you can host it and provide your URL as mentioned by Ankan above.
Like this:
Pebble.openURL('http://assets.getpebble.com.s3-website-us-east-1.amazonaws.com/pebble-js/configurable.html');
Let's say you decide to collect the name and age of the user on the configuration page. You would have two text fields for them to enter their information, plus a submit button. For the submit button, write a JavaScript function that uses jQuery to read the values of the text fields on click, saves those values to a variable, and sends them to the phone as JSON. Here is an example of a complete configuration web page: https://github.com/pebble-hacks/js-configure-demo
Enjoy.
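The round trip between the configuration page and the phone can be sketched with two small helpers. This is an assumption-heavy sketch: the pebblejs://close# return URL is the convention the demo's configurable.html appears to use, so check it against that file, and the helper names and the stops field are hypothetical.

```javascript
// Configuration page side: build the URL the page navigates to on submit.
// The 'pebblejs://close#' prefix is assumed from the js-configure-demo page.
function buildCloseUrl(options) {
  return 'pebblejs://close#' + encodeURIComponent(JSON.stringify(options));
}

// Phone side: decode e.response inside the "webviewclosed" handler.
// Returns null when the user cancelled or the payload is not valid JSON.
function parseConfigResponse(response) {
  if (!response) {
    return null;
  }
  try {
    return JSON.parse(decodeURIComponent(response));
  } catch (err) {
    return null;
  }
}

// Example with a hypothetical list of selected bus stops:
//   on the web page:  document.location = buildCloseUrl({ stops: ['1001', '1002'] });
//   in the app:       var options = parseConfigResponse(e.response);
```

Guarding against an empty response matters because the webviewclosed event also fires when the user dismisses the page without submitting.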
