Run a JavaScript script when an Excel file triggers an action - javascript

I'm using Puppeteer to automate some processes. One of them: I want to open an Excel file, read the data inside, and search the web using that data (open Google, then search using the cell's value).
I can do this correctly with JavaScript, but I want to know if I can run Puppeteer when an Excel trigger occurs.
I don't want this to happen at an arbitrary time; I want it to happen when a specific event occurs inside the Excel sheet.
I've been searching for a while and couldn't find a useful resource. I found https://learn.microsoft.com/en-us/javascript/api/excel/excel.worksheet?view=excel-js-preview#onchanged
but it didn't help me much, because I couldn't understand how to use it.
Example:
I have an Excel file containing only one cell, {facebook}. I was
wondering if there is a way to run a JavaScript script through cmd
(one that controls Puppeteer) when I set another cell in Excel to
{open}. In other words, whenever a cell changes its value in the Excel sheet,
that change should trigger my script.

You can use exceljs from Node.js:
const puppeteer = require('puppeteer');
const Excel = require('exceljs');

// Read test.xlsx, then start the browser run with the worksheet.
const wb = new Excel.Workbook();
wb.xlsx.readFile('test.xlsx').then(function () {
    const sh = wb.getWorksheet('sheet1');
    start_page(sh);
});

// Launch one browser and open a page for each data row (row 1 is the header).
async function start_page(sh) {
    const browser = await puppeteer.launch({ headless: false });
    for (let i = 2; i <= sh.rowCount; i++) {
        const result_cell = sh.getRow(i).getCell(3).text;
        await open_page(browser, result_cell);
    }
}

// Open a new tab, navigate to the target URL, and run the scraping steps.
async function open_page(browser, result_cell) {
    const page = await browser.newPage();
    page.setDefaultNavigationTimeout(10000);
    await page.goto('testurl', {
        waitUntil: 'networkidle2'
    });
    // your code here
    await page.close();
}
This is sample code I wrote for another project; you can adapt it to your case.
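The exceljs snippet above reads the workbook once; it does not by itself react to a change inside Excel. The onChanged event linked in the question is part of the Office JavaScript API and only runs inside an Excel add-in, not from a script started on the command line. A workable compromise from plain Node.js is to watch the saved file and re-read it whenever it changes. The sketch below only illustrates that idea, assuming a hypothetical layout where the trigger word is in cell A1 of sheet1 and the search term is in A2; it fires only after the workbook is saved, and fs.watch can be flaky on Windows when Excel replaces the file on save.

const fs = require('fs');
const Excel = require('exceljs');
const puppeteer = require('puppeteer');

// Hypothetical layout: the trigger cell is A1 and the search term is in A2;
// adjust the file name and addresses to your own workbook.
const FILE = 'test.xlsx';
let busy = false;

fs.watch(FILE, async () => {
    if (busy) return;           // ignore the burst of events a single save can generate
    busy = true;
    try {
        const wb = new Excel.Workbook();
        await wb.xlsx.readFile(FILE);
        const sh = wb.getWorksheet('sheet1');
        if (sh.getCell('A1').text === 'open') {
            const term = sh.getCell('A2').text;
            const browser = await puppeteer.launch({ headless: false });
            const page = await browser.newPage();
            await page.goto('https://www.google.com/search?q=' + encodeURIComponent(term));
            // ... your scraping steps here ...
            await browser.close();
        }
    } finally {
        busy = false;
    }
});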

Related

JavaScript google-spreadsheet API logging not working properly

I am a beginner at JavaScript.
I am trying to read data from Google Sheets using the google-spreadsheet API (https://www.npmjs.com/package/google-spreadsheet).
For some reason, console.log is not working in the code below.
Also, all of the Google Sheet info is being printed to the console, which is so much output that I cannot see the full log.
const { GoogleSpreadsheet } = require("google-spreadsheet");
const doc = new GoogleSpreadsheet("****************-*******-*******");
const serviceAccountCreds = require("./serviceAccountCredentials.json");

let expenseSheet;

// Load envs in process
process.env["GOOGLE_SERVICE_ACCOUNT_EMAIL"] = serviceAccountCreds.client_email;
process.env["GOOGLE_PRIVATE_KEY"] = serviceAccountCreds.private_key;

// authenticate to the google sheet
const authenticate = async () => {
  await doc.useServiceAccountAuth({
    client_email: process.env.GOOGLE_SERVICE_ACCOUNT_EMAIL,
    private_key: process.env.GOOGLE_PRIVATE_KEY,
  });
};
authenticate();

// load Spreadsheet doc info
const docInfo = async () => {
  await doc.loadInfo(); // loads document properties and worksheets
  console.log(doc.title); // This line does not seem to work
};
docInfo();
I need help with two things:
The console.log does not seem to print anything.
The Google Sheet information is getting printed, which I don't want.
Thanks in advance.
To check the console.log output, I tried redirecting it to a file with the command below on Windows:
nodejs readSheet.js > path_to_file
Even though the file is generated, there is no log in it.
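No answer is shown here, but the likeliest culprit in the snippet above is that docInfo() runs while authenticate() is still pending, and any rejection is swallowed because neither promise is awaited or given a .catch. A minimal sketch, reusing only the calls already shown in the question, that chains the two steps and surfaces errors might look roughly like this:

const { GoogleSpreadsheet } = require("google-spreadsheet");
const serviceAccountCreds = require("./serviceAccountCredentials.json");

const doc = new GoogleSpreadsheet("****************-*******-*******");

const main = async () => {
  // Authenticate first, then load document properties, then log the title.
  await doc.useServiceAccountAuth({
    client_email: serviceAccountCreds.client_email,
    private_key: serviceAccountCreds.private_key,
  });
  await doc.loadInfo();
  console.log(doc.title);
};

// Surface any rejection instead of letting it disappear silently.
main().catch((err) => console.error(err));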

Can't load all of the page content with phantom

I just wanted to let you know that I've already tried the solution with setTimeout (before marking my question as a duplicate).
The problem is related to the website that I'm scraping.
When I use PhantomJS it only scrapes the first part of the web page, not the whole page.
I even tried another web-scraping tool (Apify) and it returns the same content.
This is the page that I'm scraping: the URL appears in the code below.
And this is the code that I'm using:
var phantom = require("phantom");
var cheerio = require("cheerio");

(async function() {
  const instance = await phantom.create();
  const page = await instance.createPage();
  await page.on("onResourceRequested", function(requestData) {
    console.info("Requesting", requestData.url);
  });

  const status = await page.open(
    "https://www.articles-epresse.fr/media/894eab75-c642-46a2-a1ba-b240c278ebbc?"
  );

  const content = await page.property("content");
  console.log(content);

  var $ = cheerio.load(content);
  console.log($("#article319670").attr("href")); // returns undefined
  // because PhantomJS has not rendered the end of the page yet

  await instance.exit();
})();
PS: I used phantom because the page's source code is not the same as what "inspect element" shows.
Thank you.
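No answer is shown for this one. Since the missing element is added by client-side JavaScript after the initial load, one hedged alternative (using Puppeteer, which already appears elsewhere on this page) is to wait for the selector from the question before reading the DOM. This is only a sketch of that idea, not a tested fix for this particular site:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(
    'https://www.articles-epresse.fr/media/894eab75-c642-46a2-a1ba-b240c278ebbc?',
    { waitUntil: 'networkidle2' }
  );
  // Wait until the element mentioned in the question has actually been rendered;
  // adjust the timeout to the site's loading speed.
  await page.waitForSelector('#article319670', { timeout: 30000 });
  const href = await page.$eval('#article319670', a => a.getAttribute('href'));
  console.log(href);
  await browser.close();
})();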

Writing to Excel spreadsheet with ASP.Net site using the Excel Javascript API

I have created an ASP.NET web app using the Empty template in Visual Studio 2017. The website has many familiar Web Controls such as Buttons, an ImageButton, and Labels. Users open a picture inside the ImageButton control and are able to click inside the control. The web app calculates a value depending on where the user clicks in the ImageButton, and the values are displayed in the corresponding Label controls. The user is meant to write the values into an open Excel spreadsheet (this is where the issue lies). For additional context, every action taken by the user is handled by a client-side JavaScript function, with the exception of opening the picture; the opening action is handled by C# code belonging to the aspx page.
In the process of writing a similar Excel Web Add-in, I found some very helpful code for writing to an Excel spreadsheet using the Excel JavaScript API.
Here is that code:
function HighlightCell() {
    Excel.run(function (ctx) {
        // Create a proxy object for the selected range and load its properties
        var sourceRange = ctx.workbook.getSelectedRange().load("values, rowCount, columnCount");
        var sheet = ctx.workbook.worksheets.getActiveWorksheet();
        // Run the queued-up command, and return a promise to indicate task completion
        return ctx.sync()
            .then(function () {
                var highestRow = 0;
                var highestCol = 0;
                var highestValue = sourceRange.values[0][0];
                // Find the cell to highlight
                for (var i = 0; i < sourceRange.rowCount; i++) {
                    for (var j = 0; j < sourceRange.columnCount; j++) {
                        if (!isNaN(sourceRange.values[i][j]) && sourceRange.values[i][j] > highestValue) {
                            highestRow = i;
                            highestCol = j;
                            highestValue = sourceRange.values[i][j];
                        }
                    }
                }
                cellToHighlight = sourceRange.getCell(highestRow, highestCol);
                cellToHighlight.format.fill.color = "IndianRed";
                cellToHighlight.values = 5;
            })
            .then(ctx.sync);
    })
    .catch(errorHandler);
}
The code works like a charm within the Excel Web Add-in, but it hasn't worked so far within the ASP.NET web app. From my understanding, this is because the code hasn't been able to retrieve the active workbook/worksheet, which could be due to the disconnect between the server and the client side.
Is there any way to open an Excel spreadsheet on the client side with JavaScript or C#? Can I even use the above code in an ASP.NET web app?
EDIT: more code
I opened the spreadsheet on the client side with this code:
function excload() {
    var selectedFile = document.getElementById('imgupload').files[0];
    document.getElementById("frame").src = window.URL.createObjectURL(selectedFile);
}
In this code, imgupload is an HTML file input and "frame" is an iframe element. I'm not sure why, when I run it, the spreadsheet opens in a new instance of the Excel program on the computer instead of simply opening in the iframe.
I noticed something weird while writing the code:
Calling the HighlightCell function (the one that writes to the cell) refreshes the page, while none of the other JavaScript functions do. This happens even if I add a return false; line to the function and call it from a button with onclick="HighlightCell(); return false;".
REDUX of "Noticed something weird while writing the code":
I managed to call HighlightCell without a refresh by using:
$(document).ready(function () {
    $('#chosen').click(HighlightCell);
});
But still no writing takes place
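No accepted answer appears here, so two hedged observations that are not from the original post: Excel.run only has a workbook to talk to when the page is loaded inside an Office Add-in host (for example a task pane in Excel), so calling it from an ordinary ASP.NET page in the browser cannot reach a workbook opened locally in the Excel program; and Range.values always expects a two-dimensional array, even for a single cell, so assigning a scalar as in cellToHighlight.values = 5 will fail. A minimal sketch of a write that works inside an add-in context:

function writeValue() {
    Excel.run(function (ctx) {
        var sheet = ctx.workbook.worksheets.getActiveWorksheet();
        var cell = sheet.getRange("A1");
        // values is a 2-D array matching the range's dimensions, even for one cell
        cell.values = [[5]];
        return ctx.sync();
    }).catch(function (error) {
        console.log(error);
    });
}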

How to make excel cells readonly using Javascript?

I have used the Ignite UI Excel library to create an Excel workbook using JavaScript. Unfortunately, I didn't find any method in their library to make columns/rows of the Excel sheet read-only. Is there a way to make columns read-only before creating an Excel sheet in JavaScript/jQuery?
I achieved this with the following steps:
First, protect the entire Excel sheet with:
sheet.protect();
(sheet is my worksheet variable.)
Then unlock certain cells of the Excel sheet with:
sheet.getCell('H'+j).cellFormat().locked(false);
(where H is the column name and j is a row number, an integer value)
Hope that helps someone else.
Overview:
I'm using Node.js and exceljs, and I wanted to append new row data to my .xlsx file while the file was open for reading (not for saving) on Windows 10. Because Excel locks the file, I was not able to write to it; exceljs threw an exception (Error: EBUSY: resource busy or locked). I looked for a "ReadOnlyRecommended" property in exceljs so I could save the file with ReadOnlyRecommended = true; that way I could read the file and at the same time write to the original (since Excel would open it read-only), but unfortunately exceljs doesn't have such an option.
After a long search I achieved this using fs.chmod from const fs = require('fs');. When I create the file for the first time, or edit it, I use fs.chmodSync(excelFilePath, 0o600); so I can write to the file, and when I finish writing I use fs.chmodSync(excelFilePath, 0o400); to set the file to read-only. This way, when a user opens the Excel file it is in read-only mode, so Excel will not lock it.
I hope this helps somebody.
https://www.geeksforgeeks.org/node-js-fs-chmod-method/
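A minimal sketch of the permission-flipping approach described above, assuming a hypothetical workbook named report.xlsx with data on its first sheet:

const fs = require('fs');
const Excel = require('exceljs');

// Hypothetical file name; adjust to your own workbook.
const excelFilePath = 'report.xlsx';

async function appendRow(rowValues) {
    // Make the file writable for ourselves before saving...
    fs.chmodSync(excelFilePath, 0o600);

    const wb = new Excel.Workbook();
    await wb.xlsx.readFile(excelFilePath);
    wb.getWorksheet(1).addRow(rowValues);
    await wb.xlsx.writeFile(excelFilePath);

    // ...then flip it back to read-only so Excel opens it without locking it.
    fs.chmodSync(excelFilePath, 0o400);
}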
Another option, using the Excel JavaScript API (for code running inside an Office Add-in):
Excel.run(function (ctx) {
    // Worksheet
    var sheet = ctx.workbook.worksheets.getItem("Sheet1");

    // Entire range
    var entireRange = sheet.getRange();
    entireRange.format.protection.locked = false;

    // Specific range
    var range = sheet.getRange("A1:B5");

    return ctx.sync()
        .then(() => {
            // Set the specific range's "locked" status to true.
            range.format.protection.locked = true;
        })
        .then(ctx.sync)
        .then(() => {
            // Protect the entire sheet
            sheet.protection.protect({
                allowInsertRows: false,
                allowDeleteRows: false
            });
        });
}).catch(errorHandler);

Is it possible to write a web crawler in JavaScript?

I want to crawl a page, check the hyperlinks on that page, follow those hyperlinks, and capture data from the pages they lead to.
Generally, browser JavaScript can only crawl within the domain of its origin, because fetching pages would be done via Ajax, which is restricted by the Same-Origin Policy.
If the page running the crawler script is on www.example.com, then that script can crawl all the pages on www.example.com, but not the pages of any other origin (unless some edge case applies, e.g., the Access-Control-Allow-Origin header is set for pages on the other server).
If you really want to write a fully-featured crawler in browser JS, you could write a browser extension: for example, Chrome extensions are packaged Web applications that run with special permissions, including cross-origin Ajax. The difficulty with this approach is that you'll have to write multiple versions of the crawler if you want to support multiple browsers. (If the crawler is just for personal use, that's probably not an issue.)
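To make the first point concrete, here is a rough sketch (not from the original answer) of a single same-origin crawl step in plain browser JavaScript, assuming the script runs on a page of the site being crawled:

// Fetch a same-origin page, parse it, and collect its same-origin links.
async function crawlSameOrigin(path) {
  const res = await fetch(path);               // same origin, so no CORS issue
  const html = await res.text();
  const doc = new DOMParser().parseFromString(html, 'text/html');
  return Array.from(doc.querySelectorAll('a[href]'))
    .map(a => new URL(a.getAttribute('href'), location.href).href)
    .filter(href => new URL(href).origin === location.origin);
}

// Usage sketch: crawlSameOrigin('/').then(links => console.log(links));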
If you use server-side JavaScript it is possible.
You should take a look at Node.js.
An example of a crawler can be found in the link below:
http://www.colourcoding.net/blog/archive/2010/11/20/a-node.js-web-spider.aspx
Google's Chrome team released Puppeteer in August 2017, a Node library which provides a high-level API for both headless and non-headless Chrome (headless Chrome has been available since version 59).
It uses a bundled version of Chromium, so it is guaranteed to work out of the box. If you want to use a specific Chrome version, you can do so by launching Puppeteer with an executable path as a parameter, such as:
const browser = await puppeteer.launch({executablePath: '/path/to/Chrome'});
An example of navigating to a webpage and taking a screenshot of it shows how simple it is (taken from the GitHub page):
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({path: 'example.png'});
  await browser.close();
})();
We can crawl pages using JavaScript on the server side with the help of a headless WebKit. For crawling, there are a few libraries such as PhantomJS and CasperJS; there is also a newer wrapper on top of PhantomJS called Nightmare JS which makes the work easier.
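For illustration only (not from the original answer), a minimal Nightmare JS sketch that loads a page and reads its title, following the usage shown in the library's README:

const Nightmare = require('nightmare');
const nightmare = Nightmare({ show: false });

nightmare
  .goto('https://example.com')
  .evaluate(() => document.title)   // runs inside the page context
  .end()
  .then(title => console.log(title))
  .catch(err => console.error('crawl failed:', err));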
There are ways to circumvent the same-origin policy with JS. I wrote a crawler for Facebook that gathered information from the profiles of my friends and my friends' friends and allowed filtering the results by gender, current location, age, marital status (you catch my drift). It was simple. I just ran it from the console. That way your script gets the privilege to make requests on the current domain. You can also make a bookmarklet to run the script from your bookmarks.
Another way is to provide a PHP proxy. Your script accesses the proxy on the current domain and requests files from another domain with PHP. Just be careful with those: they might get hijacked and used as a public proxy by a third party if you are not careful.
Good luck, maybe you make a friend or two in the process like I did :-)
My typical setup is to use a browser extension with cross-origin privileges set, which injects both the crawler code and jQuery.
Another take on JavaScript crawlers is to use a headless browser like PhantomJS or CasperJS (which boosts Phantom's powers).
This is what you need: http://zugravu.com/products/web-crawler-spider-scraping-javascript-regular-expression-nodejs-mongodb
They use Node.js, MongoDB, and ExtJS as the GUI.
Yes, it is possible:
Use Node.js (it's server-side JS).
There is NPM (a package manager that handles third-party modules) in Node.js.
Use PhantomJS in Node.js (a third-party module that can crawl through websites).
There is a client-side approach for this, using the Firefox Greasemonkey extension. With Greasemonkey you can create scripts to be executed each time you open specified URLs.
Here is an example:
If you have URLs like these:
http://www.example.com/products/pages/1
http://www.example.com/products/pages/2
then you can use something like this to open all pages containing a product list (execute this manually):
var j = 0;
for (var i = 1; i < 5; i++) {
    setTimeout(function () {
        j = j + 1;
        window.open('http://www.example.com/products/pages/' + j, '_blank');
    }, 15000 * i);
}
Then you can create a script to open all products in a new window for each product-list page, and include this URL pattern in Greasemonkey for it:
http://www.example.com/products/pages/*
and then a script for each product page to extract the data, call a web service passing the data, close the window, and so on.
I made an example JavaScript crawler on GitHub.
It's event-driven and uses an in-memory queue to store all the resources (i.e., URLs).
How to use it in your Node environment:
var Crawler = require('../lib/crawler');
var crawler = new Crawler('http://www.someUrl.com');

// crawler.maxDepth = 4;
// crawler.crawlInterval = 10;
// crawler.maxListenerCurrency = 10;
// crawler.redisQueue = true;

crawler.start();
Here I'm just showing you two core methods of a JavaScript crawler.
Crawler.prototype.run = function() {
  var crawler = this;
  process.nextTick(() => {
    // the run loop
    crawler.crawlerIntervalId = setInterval(() => {
      crawler.crawl();
    }, crawler.crawlInterval);
    // kick off the first one
    crawler.crawl();
  });
  crawler.running = true;
  crawler.emit('start');
};

Crawler.prototype.crawl = function() {
  var crawler = this;
  if (crawler._openRequests >= crawler.maxListenerCurrency) return;

  // go get the item
  crawler.queue.oldestUnfetchedItem((err, queueItem, index) => {
    if (queueItem) {
      // got the item, start the fetch
      crawler.fetchQueueItem(queueItem, index);
    } else if (crawler._openRequests === 0) {
      crawler.queue.complete((err, completeCount) => {
        if (err) throw err;
        crawler.queue.getLength((err, length) => {
          if (err) throw err;
          if (length === completeCount) {
            // no open requests and no unfetched items: stop the crawler
            crawler.emit("complete", completeCount);
            clearInterval(crawler.crawlerIntervalId);
            crawler.running = false;
          }
        });
      });
    }
  });
};
Here is the GitHub link: https://github.com/bfwg/node-tinycrawler.
It is a JavaScript web crawler written in under 1000 lines of code.
This should put you on the right track.
You can make a web crawler driven from a remote JSON file that opens all links from a page in new tabs as soon as each tab loads, except the ones that have already been opened. If you set this up as a browser extension running in a basic browser (nothing running except the web browser and an internet configuration program) and had it shipped and installed somewhere with good internet, you could build a database of webpages with an old computer; it would just need to retrieve the content of each tab. You could do that for about $2,000, contrary to most estimates for search-engine costs. Your algorithm would basically rank pages by how often a term appears in the page's innerText property, keywords, and description. You could also set up another PC to recrawl old pages from the one-time database and add more. I'd estimate it would take about three months and $20,000, maximum.
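Purely to illustrate the ranking idea in the previous answer (none of this is from the original post), here is a rough sketch of scoring a parsed page against a search term using the properties it mentions; the function name and weights are hypothetical:

// Hypothetical helper: score a parsed document for one search term by
// counting occurrences in the body text, meta keywords, and meta description.
function scorePageForTerm(doc, term) {
  var needle = term.toLowerCase();
  var count = function (text) {
    return (text || '').toLowerCase().split(needle).length - 1;
  };

  var bodyText = doc.body ? doc.body.innerText : '';
  var keywordsTag = doc.querySelector('meta[name="keywords"]');
  var descriptionTag = doc.querySelector('meta[name="description"]');

  // Arbitrary weights: explicit keywords count more than body matches.
  return count(bodyText)
    + 3 * count(keywordsTag && keywordsTag.content)
    + 2 * count(descriptionTag && descriptionTag.content);
}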
Axios + Cheerio
You can do this with axios and cheerio. Check the axios docs for the response format.
const cheerio = require('cheerio');
const axios = require('axios');

// crawl: fetch the URL and parse its metadata
var url = 'http://amazon.com';

axios.get(url)
  .then((res) => {
    // response format
    var body = res.data;
    var statusCode = res.status;
    var statusText = res.statusText;
    var headers = res.headers;
    var request = res.request;
    var config = res.config;

    // jQuery-style parsing
    let $ = cheerio.load(body);

    // example: meta tags
    var title = $('meta[name=title]').attr('content');
    if (title == undefined || title == 'undefined') {
      title = $('title').text();
    }
    var description = $('meta[name=description]').attr('content');
    var keywords = $('meta[name=keywords]').attr('content');
    var author = $('meta[name=author]').attr('content');
    var type = $('meta[http-equiv=content-type]').attr('content');
    var favicon = $('link[rel="shortcut icon"]').attr('href');
  })
  .catch(function (e) {
    console.log(e);
  });
Node-Fetch + Cheerio
You can do the same thing with node-fetch and cheerio.
const fetch = require('node-fetch');
const cheerio = require('cheerio');

var url = 'http://amazon.com';

fetch(url, {
  method: "GET",
})
  .then(function (response) {
    // resolve to the response body text
    return response.text();
  })
  .then(function (html) {
    // jQuery-style parsing
    let $ = cheerio.load(html);

    // meta tags
    var title = $('meta[name=title]').attr('content');
    if (title == undefined || title == 'undefined') {
      title = $('title').text();
    }
    var description = $('meta[name=description]').attr('content');
    var keywords = $('meta[name=keywords]').attr('content');
    var author = $('meta[name=author]').attr('content');
    var type = $('meta[http-equiv=content-type]').attr('content');
    var favicon = $('link[rel="shortcut icon"]').attr('href');
  })
  .catch((error) => {
    console.error('Error:', error);
  });
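The two snippets above only read metadata from a single page. Since the question also asks about following hyperlinks, here is a hedged sketch (not part of the original answer) of collecting a page's links with the same axios + cheerio pair so they can be queued and crawled in turn; extractLinks is a hypothetical helper name:

const axios = require('axios');
const cheerio = require('cheerio');

// Hypothetical helper: fetch a page and return the absolute URLs of its links.
async function extractLinks(pageUrl) {
  const res = await axios.get(pageUrl);
  const $ = cheerio.load(res.data);
  const links = [];
  $('a[href]').each((_, el) => {
    // Resolve relative hrefs against the page URL.
    links.push(new URL($(el).attr('href'), pageUrl).href);
  });
  return links;
}

// Usage sketch: print the links found on one page.
// extractLinks('http://amazon.com').then((links) => console.log(links));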
