Determining session ID from WebDriverJS - javascript

I'm trying to run WebDriverJS on the browser, but the documentation is somewhat vague on how to get it controlling the host browser. Here, it says:
Launching a browser to run a WebDriver test against another browser is a tad redundant (compared to simply using node). Instead, using WebDriverJS in the browser is intended for automating the browser actually running the script. This can be accomplished as long as the URL for the server and session ID for the browser are known. While these values may be passed to the builder directly, they may also be defined using the wdurl and wdsid "environment variables", which are parsed from the loading page's URL query data:
<!-- Assuming HTML URL is /test.html?wdurl=http://localhost:4444/wd/hub&wdsid=foo1234 -->
<!DOCTYPE html>
<script src="webdriver.js"></script>
<input id="input" type="text"/>
<script>
  // Attaches to the server and session controlling this browser.
  var driver = new webdriver.Builder().build();
  var input = driver.findElement(webdriver.By.tagName('input'));
  input.sendKeys('foo bar baz').then(function() {
    assertEquals('foo bar baz',
                 document.getElementById('input').value);
  });
</script>
I want to open up my test page from Node.js, and then run the commands included in the client-side script. However, I don't know how I would be able to extract the session ID (wdsid query parameter) when I build the session. Does anyone have any idea?

Finally figured it out after a lot of experimentation and reading through the WebDriverJS source code.
var webdriver = require('./assets/webdriver');

var driver = new webdriver.Builder().
    usingServer('http://localhost:4444/wd/hub').
    withCapabilities({
      'browserName': 'chrome',
      'version': '',
      'platform': 'ANY',
      'javascriptEnabled': true
    }).
    build();
var testUrl = 'http://localhost:3000/test',
    hubUrl = 'http://localhost:4444/wd/hub',
    sessionId;

driver.session_.then(function(sessionData) {
  sessionId = sessionData.id;
  driver.get(testUrl + '?wdurl=' + hubUrl + '&wdsid=' + sessionId);
});
driver.session_ is a Promise that resolves with the session data once the session has been created. Using .then(function(sessionData) { ... }) lets you read the ID (and anything else you need) as soon as it is available.
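For completeness, on the browser side new webdriver.Builder().build() picks wdurl and wdsid up automatically; if you want to read them yourself (say, to log them), a minimal sketch using the standard URLSearchParams API:
// In the loaded test page: read the same query parameters the Builder parses.
var params = new URLSearchParams(window.location.search);
var hubUrl = params.get('wdurl');    // e.g. http://localhost:4444/wd/hub
var sessionId = params.get('wdsid'); // the ID forwarded by the Node.js script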

My Selenium version: 4.1.0
Get the session ID with await:
const get_session_id = async (SeleniumDriver) => {
  const res1 = await SeleniumDriver.getSession();
  return res1.getId();
};
Call it from another async function:
await get_session_id(SeleniumDriver);
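Put together, a minimal self-contained sketch (assuming selenium-webdriver 4.x with chromedriver on the PATH):
const { Builder } = require('selenium-webdriver');

(async () => {
  const driver = await new Builder().forBrowser('chrome').build();
  // getSession() resolves with a Session object exposing the ID.
  const session = await driver.getSession();
  console.log('Session ID:', session.getId());
  await driver.quit();
})();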

Related

MarkLogic JavaScript scheduled task

I'm trying to schedule a script using 'Scheduled Tasks' in ML8. The documentation explains this a bit, but only for XQuery.
Now I have a JavaScript file I'd like to schedule.
The error in the log file:
2015-06-23 19:11:00.416 Notice: TaskServer: XDMP-NOEXECUTE: Document is not of executable mimetype. URI: /scheduled/cleanData.js
2015-06-23 19:11:00.416 Notice: TaskServer: in /scheduled/cleanData.js [1.0-ml]
My script:
/* Scheduled script to delete old data */
var now = new Date();
var yearBack = now.setDate(now.getDate() - 65);
var date = new Date(yearBack);
var b = cts.jsonPropertyRangeQuery("Dtm", "<", date);
var c = fn.subsequence(cts.uris("", [], b), 1, 10);
while (true) {
  var uri = c.next();
  if (uri.done == true) {
    break;
  }
  xdmp.log(uri.value, "info"); // log for testing
}
Try the *.sjs extension (Server-side JavaScript).
The *.js extension can be used for static JavaScript resources that are returned to the client instead of being executed on the server.
Hoping that helps,
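Once the task runs as an .sjs module, the read loop can also be written with for...of, which MarkLogic's Server-Side JavaScript supports over sequences; a sketch of the same logic as the question's script:
/* Scheduled script to delete old data (sketch, same logic as above) */
var now = new Date();
var cutoff = new Date(now.setDate(now.getDate() - 65));
var query = cts.jsonPropertyRangeQuery("Dtm", "<", cutoff);
for (var uri of fn.subsequence(cts.uris("", [], query), 1, 10)) {
  xdmp.log(uri, "info"); // log for testing
}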
I believe that ehennum found the issue for you (the extension, which is what the mimetype error is complaining about).
However, on the same subject: not all items in ML work quite as you would expect for server-side JavaScript. For example, using .sjs as the target of a trigger does not (or at least until recently did not) work. For cases like that, it is also possible to wrap the .sjs call inside XQuery using xdmp:invoke.

C# Faster way to get javascript DOM than EO.WebBrowser

I have code in place where I'm using EO.WebBrowser to get the html from a page using the EO.WebView Request:
var cookie = new EO.WebBrowser.Cookie("cookie", "value");
cookie.Path = path;
cookie.Domain = domain;
var options = new BrowserOptions();
options.EnableWebSecurity = false;
Runtime.SetDefaultOptions(options);
var request = new Request(url);
request.Cookies.Add(cookie);
webView.LoadRequestAndWait(request);
Finally I use the following to get the HTML I need:
webView.GetDOMWindow().document.body.outerHTML
My issue is that this is very slow and although I can get it to run it locally, I can not get it to run on Azure server code. Is there a way to do the same thing using HttpWebRequest?
You can use JavaScript:
var data = (string)webView.EvalScript("document.body.outerHTML");
No, HttpWebRequest (and other similar "get me HTML response") methods will only give you HTML itself and will not run JavaScript on the page.
For server-side processing of dynamic HTML, consider using a proper headless browser instead of trying to convince regular IE to work correctly without a UI.
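If a headless browser is an option, the cookie-plus-outerHTML flow from the question looks roughly like this in Node.js with puppeteer (a sketch; the cookie fields and URL are placeholders):
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Equivalent of the EO.WebBrowser cookie setup in the question.
  await page.setCookie({ name: 'cookie', value: 'value', domain: 'example.com' });
  await page.goto('https://example.com');
  // Let the page's JavaScript run, then read the rendered DOM.
  const html = await page.evaluate(() => document.body.outerHTML);
  console.log(html);
  await browser.close();
})();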
EO.WebBrowser runs multi-process like Chrome and is unsupported by many cloud service environments.
Just use WebClient, HttpWebRequest, RestSharp, or something similar that can make HTTP requests to get the response HTML.

Auto Save a file in Firefox

I am trying to find a way to auto-save a file in Firefox using JS. This is the way I have done it till now, using FireShot on a Windows desktop:
var element = content.document.createElement("FireShotDataElement");
element.setAttribute("Entire", EntirePage);
element.setAttribute("Action", Action);
element.setAttribute("Key", Key);
element.setAttribute("BASE64Content", "");
element.setAttribute("Data", Data);
element.setAttribute("Document", content.document);
if (typeof(CapturedFrameId) != "undefined")
    element.setAttribute("CapturedFrameId", CapturedFrameId);
content.document.documentElement.appendChild(element);
var evt = content.document.createEvent("Events");
evt.initEvent("capturePageEvt", true, false);
element.dispatchEvent(evt);
But the issue is that it opens a dialog box to confirm the local drive location details. Is there a way I can hard code the local drive storage location and auto save the file?
If you are creating a Firefox add-on then FileUtils and NetUtil.asyncCopy are your friends:
Components.utils.import("resource://gre/modules/FileUtils.jsm");
Components.utils.import("resource://gre/modules/NetUtil.jsm");

var TEST_DATA = "this is a test string";
var source = Components.classes["@mozilla.org/io/string-input-stream;1"].
             createInstance(Components.interfaces.nsIStringInputStream);
source.setData(TEST_DATA, TEST_DATA.length);

var file = new FileUtils.File("c:\\foo\\bar.txt");
var sink = FileUtils.openSafeFileOutputStream(file, FileUtils.MODE_WRONLY |
                                                    FileUtils.MODE_CREATE);
NetUtil.asyncCopy(source, sink);
This will asynchronously write the string this is a test string into the file c:\foo\bar.txt. Note that NetUtil.asyncCopy closes both streams automatically, you don't need to do it. However, you might want to pass a function as third parameter to this method - it will be called when the write operation is finished.
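A sketch of that completion callback (Components.isSuccessCode is the standard way to test an XPCOM status code):
NetUtil.asyncCopy(source, sink, function(status) {
  if (!Components.isSuccessCode(status)) {
    // The write failed; report the status and bail out.
    Components.utils.reportError("asyncCopy failed with status " + status);
    return;
  }
  // At this point the data has been safely written to the file.
});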
See also: Code snippets, writing to a file
Every computer has a different file structure, but there is still a way: you can save the data to a cookie or to session storage, depending on how "permanent" it needs to be.
Do not consider writing a physical file, as that requires extra permissions.

is it possible to write web crawler in javascript?

I want to crawl a page, check for the hyperlinks on it, follow those hyperlinks, and capture data from the resulting pages.
Generally, browser JavaScript can only crawl within the domain of its origin, because fetching pages would be done via Ajax, which is restricted by the Same-Origin Policy.
If the page running the crawler script is on www.example.com, then that script can crawl all the pages on www.example.com, but not the pages of any other origin (unless some edge case applies, e.g., the Access-Control-Allow-Origin header is set for pages on the other server).
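As a minimal illustration, a same-origin crawl sketch (assuming the crawled pages share the script's origin and are plain HTML):
// Fetch same-origin pages and collect every URL reachable from the start page.
async function crawl(url, seen = new Set()) {
  if (seen.has(url)) return seen;
  seen.add(url);
  const html = await fetch(url).then((r) => r.text());
  const doc = new DOMParser().parseFromString(html, 'text/html');
  for (const a of doc.querySelectorAll('a[href]')) {
    const next = new URL(a.getAttribute('href'), url);
    if (next.origin === location.origin) {
      await crawl(next.href, seen);
    }
  }
  return seen;
}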
If you really want to write a fully-featured crawler in browser JS, you could write a browser extension: for example, Chrome extensions are packaged Web applications that run with special permissions, including cross-origin Ajax. The difficulty with this approach is that you'll have to write multiple versions of the crawler if you want to support multiple browsers. (If the crawler is just for personal use, that's probably not an issue.)
If you use server-side javascript it is possible.
You should take a look at node.js
An example of a crawler can be found in the link below:
http://www.colourcoding.net/blog/archive/2010/11/20/a-node.js-web-spider.aspx
Google's Chrome team released puppeteer in August 2017, a Node library which provides a high-level API for both headless and non-headless Chrome (headless Chrome has been available since version 59).
It uses an embedded version of Chromium, so it is guaranteed to work out of the box. If you want to use a specific Chrome version, you can do so by launching puppeteer with an executable path as a parameter, such as:
const browser = await puppeteer.launch({executablePath: '/path/to/Chrome'});
An example of navigating to a webpage and taking a screenshot of it shows how simple it is (taken from the GitHub page):
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({path: 'example.png'});
  await browser.close();
})();
We can crawl pages using JavaScript from the server side with the help of a headless WebKit. For crawling, there are a few libraries such as PhantomJS and CasperJS, and there is also a newer wrapper on PhantomJS called Nightmare JS which makes the work easier.
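A minimal Nightmare sketch (assuming the nightmare npm package; the URL is a placeholder):
const Nightmare = require('nightmare');
const nightmare = Nightmare({ show: false });

nightmare
  .goto('https://example.com')
  .evaluate(function () {
    // Runs inside the page; return whatever you want to capture.
    return document.title;
  })
  .end()
  .then(function (title) { console.log(title); })
  .catch(function (err) { console.error(err); });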
There are ways to circumvent the same-origin policy with JS. I wrote a crawler for Facebook that gathered information from Facebook profiles of my friends and my friends' friends and allowed filtering the results by gender, current location, age, marital status (you catch my drift). It was simple. I just ran it from the console. That way your script gets the privilege to make requests on the current domain. You can also make a bookmarklet to run the script from your bookmarks.
Another way is to provide a PHP proxy. Your script will access the proxy on the current domain and request files from another one with PHP. Just be careful with those: they might get hijacked and used as a public proxy by a 3rd party if you are not careful.
Good luck, maybe you make a friend or two in the process like I did :-)
My typical setup is to use a browser extension with cross-origin privileges set, which injects both the crawler code and jQuery.
Another take on Javascript crawlers is to use a headless browser like phantomJS or casperJS (which boosts phantom's powers)
This is what you need http://zugravu.com/products/web-crawler-spider-scraping-javascript-regular-expression-nodejs-mongodb
They use NodeJS, MongoDB and ExtJs as GUI
Yes, it is possible:
Use Node.js (it's server-side JS)
There is npm (a package manager that handles 3rd-party modules) in Node.js
Use PhantomJS in Node.js (a third-party module that can crawl through websites); a sketch follows
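A sketch of that combination, assuming the phantom npm bridge module (the URL is a placeholder):
const phantom = require('phantom');

(async () => {
  const instance = await phantom.create();
  const page = await instance.createPage();
  await page.open('https://example.com');
  // evaluate() runs in the page context inside PhantomJS; collect the links.
  const links = await page.evaluate(function () {
    return Array.prototype.map.call(
      document.querySelectorAll('a[href]'),
      function (a) { return a.href; }
    );
  });
  console.log(links);
  await instance.exit();
})();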
There is a client-side approach for this, using the Firefox Greasemonkey extension. With Greasemonkey you can create scripts to be executed each time you open specified URLs.
Here's an example:
If you have URLs like these:
http://www.example.com/products/pages/1
http://www.example.com/products/pages/2
then you can use something like this to open all pages containing the product list (execute this manually):
var j = 0;
for (var i = 1; i < 5; i++) {
  setTimeout(function() {
    j = j + 1;
    window.open('http://www.example.com/products/pages/' + j, '_blank');
  }, 15000 * i);
}
then you can create a script to open all products in a new window for each product-list page, and include this URL in Greasemonkey for that:
http://www.example.com/products/pages/*
and then a script for each product page to extract the data, pass it to a web service, close the window, and so on; a sketch of such a script follows.
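A hedged sketch of such a product-page script (the selectors and the collecting-service URL are hypothetical, and the service must allow cross-origin requests; otherwise use GM.xmlHttpRequest):
// ==UserScript==
// @name   Product scraper
// @match  http://www.example.com/products/*
// ==/UserScript==

// Extract data via hypothetical selectors and ship it to a web service.
var product = {
  url: location.href,
  name: document.querySelector('h1').textContent,
  price: document.querySelector('.price').textContent
};

fetch('http://localhost:3000/collect', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(product)
}).then(function () {
  window.close(); // move on once the data is saved
});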
I made an example JavaScript crawler on GitHub.
It's event-driven and uses an in-memory queue to store all the resources (i.e. URLs).
How to use it in your node environment:
var Crawler = require('../lib/crawler');
var crawler = new Crawler('http://www.someUrl.com');

// crawler.maxDepth = 4;
// crawler.crawlInterval = 10;
// crawler.maxListenerCurrency = 10;
// crawler.redisQueue = true;
crawler.start();
Here I'm just showing two core methods of a JavaScript crawler.
Crawler.prototype.run = function() {
  var crawler = this;
  process.nextTick(() => {
    // the run loop
    crawler.crawlerIntervalId = setInterval(() => {
      crawler.crawl();
    }, crawler.crawlInterval);
    // kick off the first one
    crawler.crawl();
  });
  crawler.running = true;
  crawler.emit('start');
};

Crawler.prototype.crawl = function() {
  var crawler = this;
  if (crawler._openRequests >= crawler.maxListenerCurrency) return;
  // go get the item
  crawler.queue.oldestUnfetchedItem((err, queueItem, index) => {
    if (queueItem) {
      // got the item, start the fetch
      crawler.fetchQueueItem(queueItem, index);
    } else if (crawler._openRequests === 0) {
      crawler.queue.complete((err, completeCount) => {
        if (err) throw err;
        crawler.queue.getLength((err, length) => {
          if (err) throw err;
          if (length === completeCount) {
            // no open requests, no unfetched items: stop the crawler
            crawler.emit("complete", completeCount);
            clearInterval(crawler.crawlerIntervalId);
            crawler.running = false;
          }
        });
      });
    }
  });
};
Here is the GitHub link: https://github.com/bfwg/node-tinycrawler.
It is a JavaScript web crawler written in under 1,000 lines of code.
This should put you on the right track.
You can make a web crawler driven from a remote JSON file that opens all links from a page in new tabs as soon as each tab loads, except ones that have already been opened. If you set this up with a browser extension running in a basic browser (nothing running except the web browser and an internet-config program) and had it shipped and installed somewhere with good internet, you could build a database of webpages with an old computer. You would just need to retrieve the content of each tab. You could do that for about $2000, contrary to most estimates for search-engine costs. Your algorithm would then provide pages based on how often a term appears in the page's innerText property, keywords, and description. You could also set up another PC to recrawl old pages from the one-time database and add more. I'd estimate the whole thing would take about three months and $20,000, maximum.
Axios + Cheerio
You can do this with axios and cheerio. Check the axios docs for the response format.
const cheerio = require('cheerio');
const axios = require('axios');

// crawl: get the url
var url = 'http://amazon.com';

axios.get(url)
  .then((res) => {
    // response format
    var body = res.data;
    var statusCode = res.status;
    var statusText = res.statusText;
    var headers = res.headers;
    var request = res.request;
    var config = res.config;

    // jquery-style parsing
    let $ = cheerio.load(body);

    // example: meta tags
    var title = $('meta[name=title]').attr('content');
    if (title === undefined) {
      title = $('title').text();
    }
    var description = $('meta[name=description]').attr('content');
    var keywords = $('meta[name=keywords]').attr('content');
    var author = $('meta[name=author]').attr('content');
    var type = $('meta[http-equiv=content-type]').attr('content');
    var favicon = $('link[rel="shortcut icon"]').attr('href');
  }).catch(function (e) {
    console.log(e);
  });
Node-Fetch + Cheerio
You can do the same thing with node-fetch and cheerio.
const cheerio = require('cheerio');
const fetch = require('node-fetch');

fetch(url, {
  method: "GET",
}).then(function (response) {
  // resolves with the response body as text
  return response.text();
}).then(function (html) {
  // jquery-style parsing
  let $ = cheerio.load(html);

  // meta tags
  var title = $('meta[name=title]').attr('content');
  if (title === undefined) {
    title = $('title').text();
  }
  var description = $('meta[name=description]').attr('content');
  var keywords = $('meta[name=keywords]').attr('content');
  var author = $('meta[name=author]').attr('content');
  var type = $('meta[http-equiv=content-type]').attr('content');
  var favicon = $('link[rel="shortcut icon"]').attr('href');
}).catch((error) => {
  console.error('Error:', error);
});

Mozilla (Firefox, Thunderbird) Extension: How to get extension id (from install.rdf)?

If you are developing an extension for one of the Mozilla applications (e.g. Firefox, Thunderbird, etc.), you define an extension ID in the install.rdf.
If for some reason you need to know the extension ID, e.g. to retrieve the extension directory in the local file system (1) or to send it to a web service (usage statistics), it would be nice to get it from the install.rdf instead of hardcoding it in your JavaScript code.
But how do you access the extension ID from within the extension?
1) example code:
var extId = "myspecialthunderbirdextid@mydomain.com";
var filename = "install.rdf";
var file = extManager.getInstallLocation(extId).getItemFile(extId, filename);
var fullPathToFile = file.path;
I'm fairly sure the 'hard-coded ID' should never change throughout the lifetime of an extension. That's the entire purpose of the ID: it's unique to that extension, permanently. Just store it as a constant and use that constant in your libraries. There's nothing wrong with that.
What IS bad practice is using the install.rdf, which exists for the sole purpose of... well, installing. Once the extension is developed, the install.rdf file's state is irrelevant and could well be inconsistent.
"An Install Manifest is the file an Add-on Manager-enabled XUL application uses to determine information about an add-on as it is being installed" [1]
To give it an analogy, it's like accessing the memory of a deleted object via an overflow. That object may still exist in memory, but it's no longer logically relevant, and using its data is a really, really bad idea.
[1] https://developer.mozilla.org/en/install_manifests
Like lwburk, I don't think it's available through Mozilla's APIs, but I have an idea which works, though it seems like a complex hack. The basic steps are:
Set up a custom resource url to point to your extension's base directory
Read the file and parse it into XML
Pull the id out using XPath
Add the following line to your chrome.manifest file
resource packagename-base-dir chrome/../
Then we can grab and parse the file with the following code:
function myId() {
  var req = new XMLHttpRequest();
  // synchronous request
  req.open('GET', "resource://packagename-base-dir/install.rdf", false);
  req.send(null);
  if (req.status !== 0) {
    throw("file not found");
  }
  var data = req.responseText;

  // this is so that we can query xpath with namespaces
  var nsResolver = function(prefix) {
    var ns = {
      "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
      "em": "http://www.mozilla.org/2004/em-rdf#"
    };
    return ns[prefix] || null;
  };

  var parser = CCIN("@mozilla.org/xmlextras/domparser;1", Ci.nsIDOMParser);
  var doc = parser.parseFromString(data, "text/xml");

  // you might have to change this xpath expression a bit to fit your setup
  var myExtId = doc.evaluate("//em:targetApplication//em:id", doc, nsResolver,
                             Ci.nsIDOMXPathResult.FIRST_ORDERED_NODE_TYPE, null);
  return myExtId.singleNodeValue.textContent;
}
I chose to use an XMLHttpRequest (as opposed to simply reading from a file) to retrieve the contents, since in Firefox 4 extensions aren't necessarily unzipped. However, XMLHttpRequest will still work if the extension remains packed (I haven't tested this, but have read about it).
Please note that resource URLs are shared by all installed extensions, so if packagename-base-dir isn't unique, you'll run into problems. You might be able to leverage Programmatically adding aliases to solve this problem.
This question prompted me to join Stack Overflow tonight, and I'm looking forward to participating more... I'll be seeing you guys around!
As Firefox now just uses Chrome's WebExtension API, you can use @serg's answer at How to get my extension's id from JavaScript?:
You can get it like this (no extra permissions required) in two different ways:
Using the runtime API: var myid = chrome.runtime.id;
Using the i18n API: var myid = chrome.i18n.getMessage("@@extension_id");
I can't prove a negative, but I've done some research and I don't think this is possible. Evidence:
- This question, which shows that the nsIExtensionManager interface expects you to retrieve extension information by ID
- The full nsIExtensionManager interface description, which shows no method that helps
The interface does allow you to retrieve a full list of installed extensions, so it's possible to retrieve information about your extension using something other than the ID. See this code, for example:
var em = Cc['@mozilla.org/extensions/manager;1']
           .getService(Ci.nsIExtensionManager);
const nsIUpdateItem = Ci.nsIUpdateItem;

var extension_type = nsIUpdateItem.TYPE_EXTENSION;
items = em.getItemList(extension_type, {});
items.forEach(function(item, index, array) {
  alert(item.name + " / " + item.id + " version: " + item.version);
});
But you'd still be relying on hardcoded properties, of which the ID is the only one guaranteed to be unique.
Take a look at this add-on; maybe its author could help you, or you can figure it out yourself:
[Extension Manager] Extended is very simple to use. After installing, just open the extension manager by going to Tools and then clicking Extensions. You will now see next to each extension the ID of that extension.
(Not compatible yet with Firefox 4.0)
https://addons.mozilla.org/firefox/addon/2195
