So I am trying to make Puppeteer launch a page and then put a token inside localStorage.
.setItem is not working; it just crashes my Chromium.
There is a site called Discord,
and if you have a user token you can log in to the page with a script.
I found out that someone has made a script that you can paste into the console,
and where the code says "your token" you paste your token and then it all happens:
let token = "your token";
function login(token) {
setInterval(() => {
document.body.appendChild(document.createElement `iframe`).contentWindow.localStorage.token = `"${token}"`
}, 50);
setTimeout(() => {
location.reload();
}, 2500);
}
login(token);
This is the code.
My idea is to make Puppeteer run this code and then just log in to the page after the refresh.
Is there any way to do that?
If there is not, I have thought about another solution:
maybe make Puppeteer type the whole code into the console.
If you want to execute JavaScript code in the browser context with Puppeteer, you need the evaluate method. For reference: https://github.com/puppeteer/puppeteer/blob/v10.4.0/docs/api.md#pageevaluatepagefunction-args or search for similar, more user-friendly examples; there are plenty on the web and on SO.
Basically, to execute any JS code in a browser context, you put your code inside an evaluate method and it should look like this:
await page.evaluate(() => new Promise((resolve) => {
// your browser JS code goes here
resolve(); // settle the promise once the browser-side work is done
}));
As for the cookies part, they should persist even after reload (haven't tested, but check this for reference: Cookies gone after reload Puppeteer => page.setCookie(...cookies)).
Also, maybe unrelated, but be careful that everything is alright legal-wise, because bots and bot-like behavior is frowned upon by many sites and in breach of their ToS.
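For the localStorage part of the question specifically, a minimal sketch might look like the following. It assumes `page` is a Puppeteer `Page` and relies on `page.evaluateOnNewDocument`, which runs a function in the browser before the page's own scripts. The helper name `openWithLocalStorage`, the URL, and the key/value are placeholders rather than anything from the posts above, and it has not been tested against any particular site.

```javascript
// Sketch only: `page` is assumed to be a Puppeteer Page object.
// `openWithLocalStorage` is a hypothetical helper name.
async function openWithLocalStorage(page, url, key, value) {
  // The registered function runs in the browser on every new document,
  // before the page's own scripts, so the item exists when the app starts.
  await page.evaluateOnNewDocument((k, v) => {
    localStorage.setItem(k, v);
  }, key, value);

  // Navigate once; no reload loop or iframe trick is needed.
  await page.goto(url, { waitUntil: 'networkidle2' });
}

// Usage (assumes `npm install puppeteer`):
// const puppeteer = require('puppeteer');
// (async () => {
//   const browser = await puppeteer.launch({ headless: false });
//   const page = await browser.newPage();
//   // The console script in the question stores the value wrapped in double
//   // quotes, which JSON.stringify reproduces for a plain string.
//   await openWithLocalStorage(page, 'https://example.com', 'token', JSON.stringify('your token'));
//   await browser.close();
// })();
```

Because the value is written before the page's scripts run, there should be no need to call setItem from the console afterwards.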
I am trying to write end to end tests for this application with Cypress: https://app.gotphoto.com/admin/auth/login
When I visit the above URL from my browser, a login form is shown, as expected.
When I visit the above url through Cypress:
cypress first navigates to https://app.gotphoto.com/admin/auth/login
immediately afterwards I am redirected to https://app.gotphoto.com/__/ and the login form is not showing
These are two screenshots from inside Cypress:
My question is: why is there a difference between how it runs in my browser and how it runs in Cypress / Cypress's browser?
The browser I am using is Chrome 89, both when running with and without Cypress.
The entirety of the test I am running is this:
describe('login screen', () => {
it('logs in', () => {
cy.visit('/admin/auth/login');
});
});
with a cypress.json:
{
"baseUrl": "https://app.gotphoto.com"
}
I created a repo with the above configuration so it's simple to reproduce.
The /__/ portion of https://app.gotphoto.com/__/ is called the clientRoute and is an internal configuration item in Cypress.
You can turn it off in your cypress.json configuration file
{
...
"clientRoute": "/"
}
This effectively keeps your original url and allows the page to load properly.
cy.visit('https://app.gotphoto.com/admin/auth/login')
cy.get('input#username', { timeout: 10000 }).type('admin') // long timeout
// wait for page to load
cy.get('input#password').type('password')
cy.intercept('POST', 'api.getphoto.io/v4/auth/login/user').as('user')
cy.contains('button', 'Submit').click()
cy.wait('@user').then(interception => {
// incorrect credentials
expect(interception.response.body.detail).to.eq('Login failed!')
})
I'm not sure of any bad side effects of changing clientRoute, will post more information if I find it.
That redirect to __/ sounds similar to an issue I stumbled upon some time ago. I found this comment in one of Cypress' issues quite helpful.
So did you already try to use the configuration option experimentalSourceRewriting? In your cypress.json, it may look like this:
{
"baseUrl": "https://app.gotphoto.com",
"experimentalSourceRewriting": true
}
As it's labelled experimental, I'd recommend testing it carefully but maybe it helps a bit. I hope for the best! 🙏
why is there a difference between how it runs in my browser and how it runs in Cypress / Cypress's browser?
Your normal browser waits for the XHR requests to be completed and renders the final output created by whatever js magic you have written in there but cy.visit is not supposed to wait for those XHR / AJAX requests inside. It gets 200 in response and moves ahead. If you add a cypress command next to cy.visit, something like cy.get('h1'), you will notice that this command runs instantly after cy.visit, and after that, your XHR requests are resolved.
One work around here can be to use cy.intercept, for example (Cypress 6.8.0, Chrome 89):
describe("login screen", () => {
it("logs in", () => {
cy.intercept({
method: "GET",
url: "admin/version/master/index.html"
}).as("indexHTML"); // Similarly add other internal xhr requests
cy.visit("/admin/auth/login");
cy.wait("@indexHTML").then(interception => {
expect(interception.response.statusCode).to.be.eq(200);
});
});
});
Output:
It basically waits for your internal XHR requests to finish and allows you to play with the request and responses once they are resolved.
This issue will help you debug further: https://github.com/cypress-io/cypress/issues/4383
Also, this /__/ has no hand in rendering the blank page IMO.
An example of logging in. Ultimately this is a bit of a hacky solution as it fails on the very first try; however, it works on any subsequent attempt.
Add the following to your commands.js:
// -- Visit multiple domains in one test
Cypress.Commands.add('forceVisit', url => {
cy.window().then(win => {
return win.open(url, '_self');
});
});
login.spec.js
describe('login screen', () => {
it('logs in', {
retries: {
runMode: 1,
openMode: 1
}
}, () => {
cy.forceVisit('https://app.gotphoto.com/admin/auth/login');
cy.get('#username').should('exist');
});
});
Screenshot:
I'm a beginner with Cypress. I'm sure it is a simple question, and I have already read the Cypress documentation, but something still seems to be wrong in my Cypress test. I want to wait for an XHR request to finish when I click on a different language on the page I want to test.
It works when I use wait(5000), but I think there must be a better way to wait for the XHR request to finish than a fixed 5-second wait.
This is my code:
describe('test',() => {
it('should open homepage, page "history", click on English language, click on German language',() => {
cy.server();
cy.route('POST','/ajax.php').as('request');
cy.visit('http://localhost:1234/history');
cy.wait('@request');
cy.get('div[class="cursorPointer flagSelect flag-icon-gb"]').click({force:true});
cy.route('POST','/ajax.php').as('request');
cy.wait(['@request']);
//cy.wait(5000); // <- this works, but seems to be not the best way
cy.get('h2').should(($res) => {
expect($res).to.contain('History');
})
cy.get('.dataContainer').find('.container').should('have.length', 8);
});
});
The last check
cy.get('.dataContainer').find('.container').should('have.length', 8);
is not successful, because the xhr request is not yet finished.
The xhr request is being fired, when the click on the icon is done:
cy.get('div[class="cursorPointer flagSelect flag-icon-gb"]').click({force:true});
Here an image of the xhr request, if that helps to find the error:
Are you sure that this line is correct? Otherwise the cy.wait won't function as you want.
cy.route('POST','/ajax.php').as('request');
I expect something like
cy.route('GET','/endpoint').as('request');
You can look up which route it is via the developer tools (F12 in Chrome).
Go to network to monitor what kind of XHRs load when you open your page.
Find out request URL and Method - example with bing.com
Also:
I prefer to include the cy.server() and cy.route() command in the beforeEach.
Then you only need the cy.wait() in the test itself.
See https://docs.cypress.io/guides/references/best-practices.html#2-Run-shared-code-before-each-test for more information about that.
You should do it like this:
describe('test', () => {
it('should open homepage, page "history", click on English language, click on German language', () => {
cy.server();
cy.route('POST', '/ajax.php').as('requestToWait'); // alias the route
// ... trigger the request here (cy.visit, click, ...) ...
// Cypress commands are not promises, so use .then() instead of await
cy.wait('@requestToWait').then((xhr) => {
// any other code; xhr holds the request and response
});
});
});
I'm building a very simple scraper to get the 'now playing' info from an online radio station I like to listen to.
It's stored in a simple p element on their site:
data html location
Now using the standard apify/web-scraper I run into a strange issue. The scraping sometimes works, but sometimes doesn't using this code:
async function pageFunction(context) {
const { request, log, jQuery } = context;
const $ = jQuery;
const nowPlaying = $('p.js-playing-now').text();
return {
nowPlaying
};
}
If the scraper works I get this result:
[{"nowPlaying": "Hangover Hotline - hosted by Lamebrane"}]
But if it doesn't I get this:
[{"nowPlaying": ""}]
And there is only a 5 minute difference between the two scrapes. The website doesn't change, the data is always presented in the same way. I tried checking all the boxes to circumvent security and different mixes of options (Use Chrome, Use Stealth, Ignore SSL errors, Ignore CORS and CSP) but that doesn't seem to fix it unfortunately.
Any suggestions on how I can get this scraping task to constantly return the data I need?
It would be great if you could attach the URL; it would help me find the problem.
From the information you provided, I guess that the data you want is loaded asynchronously. You can use the context.waitFor() function.
async function pageFunction(context) {
const { request, log, jQuery } = context;
const $ = jQuery;
await context.waitFor(() => !!$('p.js-playing-now').text());
const nowPlaying = $('p.js-playing-now').text();
return {
nowPlaying
};
}
You can pass a function to waitFor, and it will wait until that function returns a truthy value. You can check the docs.
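To illustrate what "wait until the function returns true" means in practice, here is a rough polling sketch in plain JavaScript. This is not Apify's implementation; `waitForCondition` and its `timeoutMs` / `intervalMs` options are names made up for this example.

```javascript
// Hypothetical helper, not Apify's implementation: polls `predicate`
// until it returns a truthy value or the timeout elapses.
function waitForCondition(predicate, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + timeoutMs;
    const check = () => {
      let result;
      try {
        result = predicate();
      } catch (err) {
        reject(err); // a throwing predicate ends the wait immediately
        return;
      }
      if (result) {
        resolve(result); // condition met: hand back the truthy value
      } else if (Date.now() >= deadline) {
        reject(new Error(`Condition not met within ${timeoutMs} ms`));
      } else {
        setTimeout(check, intervalMs); // try again a bit later
      }
    };
    check();
  });
}

// Usage inside a page function (assuming jQuery is available as $):
// const text = await waitForCondition(() => $('p.js-playing-now').text());
```

The point is that an empty string is falsy, so the scraper keeps checking until the element actually contains text instead of reading it once right after load.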
I am preparing JavaScript code that shows a random number to the user as follows: if the user takes more than two minutes to move to the next web page, or if the current page has the GET parameter "&source", the random number is replaced by another one. Otherwise, the same random number is displayed on all the web pages.
The problem is that the JavaScript code has to be executed manually from the browser console on each page load: I need to prepare code that can be run on any web page from the console.
Is there any difference from the normal case (including the script with <script></script>)?
Thanks for posting! In future posts, please try to provide some code or an example of something you've tried previously.
Anyways, here is a brief example of a script that will check for an existing number, check to see if there is a &source parameter set, begin the timer if there isn't one, and generate a new number if the timer finishes or the parameter is set.
To save the information between pages, you should consider using window.localStorage. This will allow you to check for and save the number to be used on later loads.
Note that this snippet isn't going to work until you bring it into your own page. Also, as @Sorin-Vladu mentioned, you'll have to use a browser extension if you don't have access to modify the pages you're running the script on.
const timeout = 120000
// This can be replaced by your manual execution
window.onload = () => {
start()
}
function start() {
// Attempt to pull the code from storage
let code = localStorage.getItem('code')
console.log(code)
// Get the URL parameters
let urlParams = new URLSearchParams(window.location.search)
// Check to see if the source parameter exists
if (!urlParams.has('source')) {
// If not, begin the timer
setTimeout(() => {
setCode()
}, timeout)
} else {
setCode()
}
}
function setCode() {
const code = Math.floor(Math.random() * 1000000)
localStorage.setItem('code', code)
console.log(code)
}
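One caveat with the snippet above: a setTimeout only runs while the page stays open, so it cannot tell how long the user took to reach the next page. A possible alternative, sketched below, is to store a timestamp next to the number and compare it on the next load. The function name `resolveCode` and the storage key `codeRecord` are assumptions made for this example, not part of the original answer.

```javascript
const MAX_AGE_MS = 120000; // two minutes, same limit as the snippet above

// Pure decision function (hypothetical name): returns the record to show and store.
// `stored` is { code, savedAt } or null; `now` is a millisecond timestamp.
function resolveCode(stored, now, hasSource, random = Math.random) {
  const expired = !stored || now - stored.savedAt > MAX_AGE_MS;
  if (expired || hasSource) {
    return { code: Math.floor(random() * 1000000), savedAt: now };
  }
  // Same number, but refresh the timestamp so the two minutes
  // are counted from this page load.
  return { code: stored.code, savedAt: now };
}

// Browser wiring; skipped when there is no window (for example under Node).
if (typeof window !== 'undefined') {
  const raw = localStorage.getItem('codeRecord'); // assumed key name
  const hasSource = new URLSearchParams(window.location.search).has('source');
  const record = resolveCode(raw ? JSON.parse(raw) : null, Date.now(), hasSource);
  localStorage.setItem('codeRecord', JSON.stringify(record));
  console.log(record.code);
}
```

Keeping the decision in a pure function makes it easy to check the three cases (no stored number, stale number, source parameter) without a browser.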
I am working on a scraper using PhantomJS with Node.js. PhantomJS loads the page with an async function, like this: var status = await page.open(url). Sometimes, because of a slow internet connection, the page takes longer to load, and after a while no page status is returned at all, so I cannot check whether it has loaded. page.open() then hangs without returning anything, and all further execution is stuck waiting.
So my basic question is: is there any way to keep page.open(url) alive, given that the rest of the code waits until the page is loaded?
My Code is
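const phantom = require('phantom');
// await is only valid inside an async function, so wrap the calls
(async () => {
const ph_instance = await phantom.create();
const ph_page = await ph_instance.createPage();
const status = await ph_page.open("https://www.cscscholarship.org/");
if (status === 'success') {
console.log("Page is loaded successfully !");
//do more stuff
}
await ph_instance.exit(); // shut down the PhantomJS process
})();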
From your comment, it seems like it might be timing out (because of slow internet sometimes)... you can validate this by adding the onResourceTimeout method to your code (link: http://phantomjs.org/api/webpage/handler/on-resource-timeout.html)
It would look something like this:
// onResourceTimeout is a page-level handler, so register it on the page object
ph_page.on('onResourceTimeout', (request) => {
console.log('Timeout caught:' + JSON.stringify(request));
});
And if that ends up being true, you can increase the default resource timeout settings (link: http://phantomjs.org/api/webpage/property/settings.html) like this:
ph_page.setting('resourceTimeout', 60000) // 60 seconds; settings belong to the page object
Edit: I know the question is about Phantom, but I also wanted to mention another framework I've used for scraping projects before, called Puppeteer (link: https://pptr.dev/). I personally found its API easier to understand and code against, and it is an actively maintained project, unlike PhantomJS, which is no longer maintained (its last release was two years ago).
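Whichever library is used, if the worry is that the open call may never settle, one option is to race it against a timer so the rest of the code is not blocked forever. The sketch below is plain JavaScript; `withTimeout` is a made-up helper name, and it only stops waiting. It does not cancel the underlying page load.

```javascript
// Hypothetical helper: rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage with the code from the question (inside an async function):
// try {
//   const status = await withTimeout(ph_page.open(url), 60000, 'page.open');
//   if (status === 'success') { /* do more stuff */ }
// } catch (err) {
//   console.log(err.message);   // e.g. retry or skip this URL
//   await ph_instance.exit();   // the load itself is not cancelled by the race
// }
```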