I want to create a scraper that:
opens a headless browser,
goes to a URL,
logs in (there is Steam OAuth),
fills some inputs,
and clicks 2 buttons.
My problem is that every new instance of the headless browser clears my login session, and then I need to log in again and again...
How can I persist the session across instances? (using Puppeteer with headless Chrome)
Or how can I open an already-logged-in headless Chrome instance? (if I'm already logged in in my main Chrome window)
There is an option to save user data using the userDataDir option when launching Puppeteer. This stores the session and other profile data related to launching Chrome.
puppeteer.launch({
  userDataDir: "./user_data"
});
It doesn't go into great detail but here's a link to the docs for it: https://pptr.dev/#?product=Puppeteer&version=v1.6.1&show=api-puppeteerlaunchoptions
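For example, a minimal sketch (the URL is illustrative) that reuses the same profile directory across runs, so a login from an earlier run is still present:
const puppeteer = require('puppeteer');

(async () => {
  // Reusing the same userDataDir means cookies and localStorage persist
  // between launches, so the login only has to happen once.
  const browser = await puppeteer.launch({
    headless: true,
    userDataDir: "./user_data"
  });
  const page = await browser.newPage();
  await page.goto('https://example.com'); // illustrative URL
  // ...log in on the first run; later runs start already logged in
  await browser.close();
})();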
In Puppeteer you have access to the session cookies through page.cookies().
So once you log in, you could get every cookie and save it in a json file:
const fs = require('fs');
const cookiesFilePath = 'cookies.json';
// Save Session Cookies
const cookiesObject = await page.cookies()
// Write cookies to temp file to be used in other profile pages
fs.writeFile(cookiesFilePath, JSON.stringify(cookiesObject),
  function(err) {
    if (err) {
      console.log('The file could not be written.', err)
    } else {
      console.log('Session has been successfully saved')
    }
  })
Then, on your next iteration, right before calling page.goto(), you can call page.setCookie() to load the cookies from the file one by one:
const previousSession = fs.existsSync(cookiesFilePath)
if (previousSession) {
  // If the file exists, load the cookies
  const cookiesString = fs.readFileSync(cookiesFilePath);
  const parsedCookies = JSON.parse(cookiesString);
  if (parsedCookies.length !== 0) {
    for (let cookie of parsedCookies) {
      await page.setCookie(cookie)
    }
    console.log('Session has been loaded in the browser')
  }
}
Check out the docs:
https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagecookiesurls
https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagesetcookiecookies
For a version of the above solution that actually works and doesn't rely on jsonfile (instead using the more standard fs), check this out:
Setup:
const fs = require('fs');
const cookiesPath = "cookies.txt";
Reading the cookies (put this code first):
// If the cookies file exists, read the cookies.
const previousSession = fs.existsSync(cookiesPath)
if (previousSession) {
  const content = fs.readFileSync(cookiesPath);
  const cookiesArr = JSON.parse(content);
  if (cookiesArr.length !== 0) {
    for (let cookie of cookiesArr) {
      await page.setCookie(cookie)
    }
    console.log('Session has been loaded in the browser')
  }
}
Writing the cookies:
// Write Cookies
const cookiesObject = await page.cookies()
fs.writeFileSync(cookiesPath, JSON.stringify(cookiesObject));
console.log('Session has been saved to ' + cookiesPath);
For writing cookies:
async function writingCookies() {
  const cookieArray = require(C.cookieFile); // C.cookieFile can be replaced by ('./filename.json')
  await page.setCookie(...cookieArray);
  await page.cookies(C.feedUrl); // C.feedUrl can be replaced by ('https://example.com')
}
For reading cookies, you have to install jsonfile in your project: npm install jsonfile
const jsonfile = require('jsonfile');

async function getCookies() {
  const cookiesObject = await page.cookies();
  jsonfile.writeFile('linkedinCookies.json', cookiesObject, { spaces: 2 },
    function (err) {
      if (err) {
        console.log('The Cookie file could not be written.', err);
      } else {
        console.log("Cookie file has been successfully saved in the current working directory: '" + process.cwd() + "'");
      }
    })
}
Call these two functions using await and it will work for you.
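For instance, a hedged sketch of the calling order (page and C are assumed to be set up as in the snippets above):
(async () => {
  await writingCookies(); // load previously saved cookies into the page
  await page.goto(C.feedUrl);
  // ...log in if needed, then persist the fresh cookies:
  await getCookies();
})();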
Related
I currently have a function that prompts the user to download a JSON file:
function downloadObjectAsJson (exportObj, exportName) {
  const dataStr = 'data:text/json;charset=utf-8,' + encodeURIComponent(JSON.stringify(exportObj))
  let downloadAnchorNode = document.createElement('a')
  downloadAnchorNode.setAttribute('href', dataStr)
  downloadAnchorNode.setAttribute('download', exportName + '.json')
  document.body.appendChild(downloadAnchorNode) // required for firefox
  downloadAnchorNode.click()
  downloadAnchorNode.remove()
}
Is there a way to get the path the user selected to download this file to? Just need it to be displayed on the UI.
There are some APIs available that allow access to the client file system, like this one, which lists all the files in the selected directory:
async function listFilesInDirectory () {
  const dirHandle = await window.showDirectoryPicker()
  const promises = []
  for await (const entry of dirHandle.values()) {
    if (entry.kind !== 'file') {
      continue // skip non-file entries instead of stopping the whole loop
    }
    promises.push(entry.getFile().then((file) => `${file.name} (${file.size})`))
  }
  console.log(await Promise.all(promises))
}
So I thought there might be some way to also get the path selected by the user when saving files.
Any other suggestions/means are welcome.
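For example, the saving counterpart of showDirectoryPicker is window.showSaveFilePicker. A minimal sketch (Chromium-only; note that even this API exposes only the chosen file's name, not the directory path the user picked):
async function saveObjectAsJson (exportObj, exportName) {
  const handle = await window.showSaveFilePicker({
    suggestedName: exportName + '.json',
    types: [{ description: 'JSON file', accept: { 'application/json': ['.json'] } }]
  })
  const writable = await handle.createWritable()
  await writable.write(JSON.stringify(exportObj))
  await writable.close()
  console.log('Saved as', handle.name) // file name only, no path
}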
I am trying to implement an identity check for my Chrome extension, since I want to sell it to some users.
The code below is using the chrome.identity API to get the unique OpenID in combination with the email that is logged in. Then it fetches data from a pastebin and checks whether the id is included or not.
If not, I want to block the user from using the extension. What would be the best approach?
My code:
// license check
chrome.identity.getProfileUserInfo({ 'accountStatus': 'ANY' }, async function (info) {
  email = info.email;
  console.log(info.id);
  let response = await fetch('https://pastebin.com/*****');
  let data = await response.text();
  console.log(data.indexOf(info.id));
  if (data.indexOf(info.id) !== -1) {
    console.log('included');
  } else {
    console.log('not included');
    // block chrome extension usage;
  }
});
I found a pretty simple solution that should work for most cases.
// license check
chrome.identity.getProfileUserInfo({ 'accountStatus': 'ANY' }, async function (info) {
  email = info.email;
  console.log(info.id);
  let response = await fetch('https://pastebin.com/*****');
  let data = await response.text();
  console.log(data.indexOf(info.id));
  if (data.indexOf(info.id) !== -1) {
    console.log('included');
  } else {
    console.log('not included');
    // block chrome extension usage:
    // index.html has to be in the extension folder and can have
    // e.g. an <h1> which says "Invalid license"
    chrome.browserAction.setPopup({ popup: 'index.html' });
  }
});
});
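Note: in Manifest V3, chrome.browserAction was replaced by chrome.action, so there the equivalent call would be:
chrome.action.setPopup({ popup: 'index.html' });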
As an example of what I'm trying to achieve, consider launching VS Code from the terminal. The code <file-name> command opens a new instance of VS Code if one isn't already running, or tells the running instance to open the file otherwise. Also, once it has launched, the user can use the terminal session for other tasks again (as if the process was disowned).
My script needs to interact with my Electron app in the same way, with the only difference being that my app will be in the tray and not visible in the dock.
The solution only needs to work on Linux.
Use a Unix socket server for inter-process communication.
In Electron:
const net = require('net');

const handleIpc = (conn) => {
  conn.setEncoding('utf8');
  conn.on('data', (line) => {
    let args = line.split(' ');
    switch (args[0]) {
      case 'hey':
        conn.write('whatsup\n');
        break;
      default: conn.write('new phone who this?\n');
    }
    conn.end();
  })
}
const server = net.createServer(handleIpc);
server.listen('/tmp/my-app.sock');
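One caveat worth adding (an assumption, not from the original answer): if the app crashed previously, a stale /tmp/my-app.sock file will make listen() fail with EADDRINUSE, so it is common to remove it before listening:
const fs = require('fs');

// Remove a stale socket left over from a previous crash before listening.
try {
  fs.unlinkSync('/tmp/my-app.sock');
} catch (e) {
  // ignore: no stale socket existed
}
server.listen('/tmp/my-app.sock');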
Then your CLI is:
#!/usr/bin/node
const net = require('net');

let args = process.argv;
args.shift(); // Drop /usr/bin/node
args.shift(); // Drop script path
let line = args.join(' ');

net.connect('/tmp/my-app.sock', (conn) => {
  conn.setEncoding('utf8');
  conn.on('data', (response) => {
    console.log(response);
    process.exit(0);
  });
  conn.write(line + '\n');
}).on('error', (err) => {
  console.error(err);
  process.exit(1);
});
If I understand correctly, you want to keep only one instance of your app and handle attempts to launch another instance. In old versions of Electron, app.makeSingleInstance(callback) was used to achieve this. As for Electron ...v13 - v15, app.requestSingleInstanceLock() with the second-instance event is used. Here is an example of how to use it:
const { app } = require('electron');

let myWindow = null;
const gotTheLock = app.requestSingleInstanceLock();

if (!gotTheLock) {
  app.quit();
} else {
  app.on('second-instance', (event, commandLine, workingDirectory) => {
    // Someone tried to run a second instance
    // Do the stuff, for example, focus the window
    if (myWindow) {
      if (myWindow.isMinimized()) myWindow.restore()
      myWindow.focus()
    }
  })

  // Create myWindow, load the rest of the app, etc...
  app.whenReady().then(() => {
    myWindow = createWindow();
  })
}
So when someone launches ./app arg1 arg2 a second time, the callback will be called. By the way, this solution is cross-platform.
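To get the VS Code-like behavior from the question (passing a file to the already-running instance), the commandLine argument of the second-instance handler carries the second invocation's full argv. A hedged sketch (openFile is a hypothetical helper):
const path = require('path');

app.on('second-instance', (event, commandLine, workingDirectory) => {
  // commandLine is the second instance's argv, e.g. ['/path/to/app', 'arg1', 'arg2']
  const fileArg = commandLine[commandLine.length - 1];
  if (fileArg && !fileArg.startsWith('-')) {
    openFile(path.resolve(workingDirectory, fileArg)); // hypothetical helper
  }
});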
It's the first time I've worked with Evernote.
Like in the example given in the JS SDK, I create my client with the token that I get from OAuth, and I can get all the notebooks of my current user, so that part works for me.
But I'm facing a problem that I can't understand: when I use any method of my shared store, it throws a Thrift exception with error code 12, giving the shard id in the message.
I know that error code 12 means the shard is temporarily unavailable...
But I suspect it's something else, because it's not temporary...
I have a full-access API key and it works with the note store. Did I miss something?
// This is the example in the JS SDK
var linkedNotebook = noteStore.listLinkedNotebooks()
  .then(function(linkedNotebooks) {
    // just pick the first LinkedNotebook for this example
    return client.getSharedNoteStore(linkedNotebooks[0]);
  }).then(function(sharedNoteStore) {
    // /!\ Here is the problem: this throws the Thrift exception!
    return sharedNoteStore.listNotebooks().then(function(notebooks) {
      return sharedNoteStore.listTagsByNotebook(notebooks[0].guid);
    }).then(function(tags) {
      // tags here is a list of Tag objects
    });
  });
This seems to be an error with the SDK. I created a PR (https://github.com/evernote/evernote-sdk-js/pull/90).
You can work around this by using authenticateToSharedNotebook yourself.
const client = new Evernote.Client({ token, sandbox });
const noteStore = client.getNoteStore();

const notebooks = await noteStore
  .listLinkedNotebooks()
  .catch(err => console.error(err));
const notebook = notebooks.find(x => x.guid === guid);

const { authenticationToken } = await client
  .getNoteStore(notebook.noteStoreUrl)
  .authenticateToSharedNotebook(notebook.sharedNotebookGlobalId);

const client2 = new Evernote.Client({
  token: authenticationToken,
  sandbox
});
const noteStore2 = client2.getNoteStore();

const [notebook2] = await noteStore2.listNotebooks();
noteStore2.listTagsByNotebook(notebook2.guid)
I've written a webapp that allows you to store the images in the localStorage until you hit save (so it works offline, if signal is poor).
When the localStorage reaches 5MB, Google Chrome produces an error in the JavaScript console:
Uncaught Error: QUOTA_EXCEEDED_ERR: DOM Exception 22
How do I increase the size of the localStorage quota on Google Chrome?
5MB is a hard limit, and that is stupid. IndexedDB gives you ~50MB, which is more reasonable. To make it easier to use, try Dexie.js: https://github.com/dfahlander/Dexie.js
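A minimal Dexie.js sketch (database and table names are illustrative):
const Dexie = require('dexie'); // npm install dexie

const db = new Dexie('imageCache');
db.version(1).stores({ images: 'name' }); // 'name' is the primary key

// Inside an async function: store and read back a value
await db.images.put({ name: 'photo1', data: dataUrlString });
const entry = await db.images.get('photo1');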
Update:
Dexie.js was actually still overkill for my simple key-value purposes, so I wrote this much simpler script: https://github.com/DVLP/localStorageDB
With this you have 50MB and can get and set values like this:
// Setting values
ldb.set('nameGoesHere', 'value goes here');
// Getting values - the callback is required because the data is retrieved asynchronously:
ldb.get('nameGoesHere', function (value) {
  console.log('And the value is', value);
});
Copy/paste the line below so ldb.set() and ldb.get() from the example above will become available.
!function(){function e(t,o){return n?void(n.transaction("s").objectStore("s").get(t).onsuccess=function(e){var t=e.target.result&&e.target.result.v||null;o(t)}):void setTimeout(function(){e(t,o)},100)}var t=window.indexedDB||window.mozIndexedDB||window.webkitIndexedDB||window.msIndexedDB;if(!t)return void console.error("indexDB not supported");var n,o={k:"",v:""},r=t.open("d2",1);r.onsuccess=function(e){n=this.result},r.onerror=function(e){console.error("indexedDB request error"),console.log(e)},r.onupgradeneeded=function(e){n=null;var t=e.target.result.createObjectStore("s",{keyPath:"k"});t.transaction.oncomplete=function(e){n=e.target.db}},window.ldb={get:e,set:function(e,t){o.k=e,o.v=t,n.transaction("s","readwrite").objectStore("s").put(o)}}}();
You can't, it's hard-wired at 5MB. This is a design decision by the Chrome developers.
In Chrome, the Web SQL db and cache manifest also have low limits by default, but if you package the app for the Chrome App Store you can increase them.
See also Managing HTML5 Offline Storage - Google Chrome.
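For a packaged app or extension, the relevant manifest entry is the unlimitedStorage permission; a sketch of the manifest.json fragment (other required fields omitted):
{
  "name": "My App",
  "permissions": ["unlimitedStorage"]
}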
The quota is for the user to set: how much space they wish to allow each website.
Therefore, since the purpose is to restrict the web pages, the web pages cannot change the restriction.
If storage is low, you can prompt the user to increase local storage.
To find out whether storage is low, you could probe the local storage size by saving an object and then deleting it, as in the sketch below.
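A rough probe sketch (key name and probe size are arbitrary):
function isLocalStorageLow () {
  try {
    // Try to write ~100KB; if the quota is (nearly) full this throws.
    localStorage.setItem('__probe__', 'x'.repeat(100 * 1024));
    localStorage.removeItem('__probe__');
    return false;
  } catch (e) {
    return true; // QuotaExceededError: storage is low
  }
}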
You can't, but if you save JSON in your localStorage you can use a library to compress the data, like: https://github.com/k-yak/JJLC
demo: http://k-yak.github.io/JJLC/
Here you can test your program; you should also handle the cases where the quota is exceeded.
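JJLC's own API isn't shown here; as an illustration of the compression idea with the widely used lz-string library instead (an assumed stand-in, not the author's choice):
const LZString = require('lz-string'); // npm install lz-string

const json = JSON.stringify(bigObject); // bigObject: your data
localStorage.setItem('data', LZString.compressToUTF16(json));
const restored = JSON.parse(
  LZString.decompressFromUTF16(localStorage.getItem('data'))
);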
https://stackoverflow.com/a/5664344/2630686 The above answer is amazing. I applied it in my project and implemented a full solution for requesting all kinds of resources.
// Firstly reference the above ldb code in the answer I mentioned.
export function get_file({ url, d3, name, enable_request = false }) {
  if (name === undefined) { // set saved data name by url parsing alternatively
    name = url.split('?')[0].split('/').at(-1).split('.')[0];
  }
  const html_name = location.href.split('/').at(-1).split('.')[0];
  name = `${html_name}_${name}`;
  let ret = null;
  const is_outer = is_outer_net(url); // check outer net url by its start with http or //
  // try to access data from local storage. Resolve to null if not found
  if (is_outer && !enable_request) {
    if (localStorage[name]) {
      ret = Promise.resolve(JSON.parse(localStorage[name]));
    } else {
      ret = new Promise(r => {
        ldb.get(name, function (value) {
          r(value);
        });
      });
    }
  } else {
    ret = Promise.resolve(null);
  }
  // The chained promise must be returned; otherwise callers would get the
  // bare cache lookup (possibly null) instead of the fetched data.
  return ret.then(data => {
    if (data) {
      return data;
    }
    const method = url.split('.').at(-1);
    let req;
    // d3 method supported
    if (d3 && d3[method]) {
      req = d3[method](url);
    } else {
      if (url.startsWith('~/')) { // local file access supported. You need a local service that can return local file data by the requested url's address value
        url = `http://localhost:8010/get_file?address=${url}`;
      }
      req = fetch(url).then(res => {
        // parse data by the requested data type
        if (url.endsWith('txt')) {
          return res.text();
        }
        return res.json();
      });
    }
    return req.then(da => {
      if (is_outer) { // save data to localStorage firstly
        try {
          localStorage[name] = JSON.stringify(da);
        } catch (e) { // save to ldb if the 5MB quota is exceeded
          ldb.set(name, da);
        }
      }
      return da;
    });
  });
}
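A usage sketch of the function above (the URL is illustrative; d3 is optional):
get_file({ url: 'https://example.com/data.json' }).then(data => {
  console.log(data); // served from localStorage/ldb when cached, otherwise fetched
});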