Using the JavaScriptExecutor with PowerShell

I've created several PowerShell scripts that use the Selenium WebDriver.
Now I need to add some JavaScript functionality to one of them, but I can't figure out how to get the syntax right.
I attempted to convert the C# code from this discussion:
Execute JavaScript using Selenium WebDriver in C#
And this is what my code looks like at the moment:
# Specify path to Selenium drivers
$DriverPath = (get-item ".\" ).parent.parent.FullName + "\seleniumdriver\"
$files = Get-ChildItem "$DriverPath*.dll"
# Read in all the Selenium drivers
foreach ($file in $files) {
    $FilePath = $DriverPath + $file.Name
    [Reflection.Assembly]::LoadFile($FilePath) | Out-Null
}
# Create instance of ChromeDriver
$driver = New-Object OpenQA.Selenium.Chrome.ChromeDriver
# Go to example page google.com
$driver.Url = "http://www.google.com"
# Create instance of IJavaScriptExecutor
$js = New-Object IJavaScriptExecutor($driver)
# Run Javascript to get current url title
$title = $js.executeScript("return document.title")
# Write title to cmd
write-host $title
But I constantly get the error below when creating an instance of IJavaScriptExecutor:
"New-Object : Cannot find type [IJavaScriptExecutor]: make sure the assembly containing this type is loaded."
Can anyone figure out what I'm missing? Is the code incorrect? Am I missing additional DLLs?
Br,
Christian

The problem is that IJavaScriptExecutor is an interface, and you can't create an instance of an interface. Instead, you need to create an instance of a class that implements the interface. In this case, the ChromeDriver class implements it, so you can skip the line that creates the $js variable and call the method on $driver instead.
So you'd get something like the following, given that your JavaScript works as expected:
# Create instance of ChromeDriver
$driver = New-Object OpenQA.Selenium.Chrome.ChromeDriver
# Go to example page google.com
$driver.Url = "http://www.google.com"
# Run JavaScript to get the current page title
$title = $driver.ExecuteScript("return document.title")
You can read more about these classes on the Selenium Documentation.

Related

Can't narrow down correct element in Python/Selenium

So I'm trying to craft a website manipulation script to help automate the creation of email mailboxes on our hosted provider.
I'm both new to Python and new to scripting web resources, so if something looks weird or mediocre, that's why :)
Here's my script:
import time
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
from selenium import webdriver
from selenium.webdriver.support.select import Select as driverselect
driver = webdriver.Firefox()
main_url = 'https://website.com:446'
opts = Options()
# noinspection PyDeprecation
# opts.set_headless()
#assert opts.headless # Operating in headless mode
browser = Firefox(options=opts)
browser.get(main_url)
search_form = browser.find_element_by_id('LoginName')
search_form.send_keys('username')
search_form = browser.find_element_by_id('Password')
search_form.send_keys('password')
search_form.submit()
time.sleep(5)
# provision = driverselect(driver.find_element_by_xpath("/html/body/div[2]/div[2]/nav/div/ul/li[4]"))
provision = driver.find_element_by_xpath('/html/body/div[2]/div[2]/nav/div/ul/li[4]/a/span[1]')
provision.submit()
# exchange = driver.find_element_by_name('Exchange')
# exchange.submit()
My error is:
Traceback (most recent call last):
  File "/home/turd/PycharmProjects/Automate_NSGEmail/selenium_test.py", line 23, in <module>
    provision = driver.find_element_by_xpath('/html/body/div[2]/div[2]/nav/div/ul/li[4]/a/span[1]')
  File "/home/turd/.local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 394, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "/home/turd/.local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 976, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "/home/turd/.local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/turd/.local/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: /html/body/div[2]/div[2]/nav/div/ul/li[4]/a/span[1]
Now, that XPath value I copied straight from the dev tools on that page; the question included a screenshot of that block of HTML from the site.
I'm trying to grab and 'click' the one active Dynamic-Menu item shown in that screenshot. I think that menu is JS, but I'm not 100% positive.
Anyway, I'd be much obliged if anyone could help me narrow this down and grab that blasted element.
So I discovered the answer myself. I had some wrong code at the beginning of my script:
driver = webdriver.Firefox()
main_url = 'https://website.com:446'
opts = Options()
# noinspection PyDeprecation
# opts.set_headless()
#assert opts.headless # Operating in headless mode
browser = Firefox(options=opts)
browser.get(main_url)
I changed this section to:
driver = webdriver.Firefox()
url = 'https://website.com:446'
opts = Options()
driver.maximize_window()
driver.get(url)
I was opening two instances of Firefox before, so the driver.* lines were attempting to locate the XPath on the Firefox instance that was not logged in.
Derp.
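For reference, a minimal consolidated sketch of the corrected flow with a single driver instance throughout. It assumes the same Selenium 3.x API, element IDs, and XPath from the question, and uses click() on the menu element instead of submit():
import time
from selenium import webdriver

# One driver instance only, so the login and the later lookups share a session
driver = webdriver.Firefox()
driver.maximize_window()
driver.get('https://website.com:446')

# Log in using the element IDs from the original script
driver.find_element_by_id('LoginName').send_keys('username')
password_field = driver.find_element_by_id('Password')
password_field.send_keys('password')
password_field.submit()

time.sleep(5)  # crude wait; WebDriverWait would be more robust

# The lookup now runs against the logged-in session
provision = driver.find_element_by_xpath('/html/body/div[2]/div[2]/nav/div/ul/li[4]/a/span[1]')
provision.click()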

Web scraping with BeautifulSoup won't work

Ultimately, I'm trying to open all articles of a news website and then make a top 10 of the words used across all the articles. To do this, I first wanted to see how many articles there are so I could iterate over them at some point; I haven't really figured out how I want to do everything yet.
I wanted to use BeautifulSoup4 for this. I think the class I'm trying to get is rendered by JavaScript, as I'm not getting anything back.
This is my code:
url = "http://ad.nl"
ad = requests.get(url)
soup = BeautifulSoup(ad.text.lower(), "xml")
titels = soup.findAll("article")
print(titels)
for titel in titels:
print(titel)
The article name is sometimes an h2 and sometimes an h3. It always has one and the same class, but I can't get anything through that class. It has some parents that use the same class name but with an extension like -wrapper, for example. I don't even know how to use a parent to get what I want, but I think those classes are rendered by JavaScript as well. There's also an href which I'm interested in, but once again that probably comes from JavaScript too, as it returns nothing.
Does anyone know how I could get any of this (preferably the href, but the article name would be OK as well) using BeautifulSoup?
In case you don't want to use selenium, this works for me. I've tried it on 2 PCs with different internet connections. Can you try?
from bs4 import BeautifulSoup
import requests

cookies = {"pwv": "2",
           "pws": "functional|analytics|content_recommendation|targeted_advertising|social_media"}
page = requests.get("https://www.ad.nl/", cookies=cookies)
soup = BeautifulSoup(page.content, 'html.parser')
articles = soup.findAll("article")
Then follow kimbo's code to extract h2/h3.
As #Sri mentioned in the comments, when you open up that url, you have a page show up where you have to accept the cookies first, which requires interaction.
When you need interaction, consider using something like selenium (https://selenium-python.readthedocs.io/).
Here's something that should get you started.
(Edit: you'll need to run pip install selenium before running the code below.)
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
url = 'https://ad.nl'
# launch firefox with your url above
# note that you could change this to some other webdriver (e.g. Chrome)
driver = webdriver.Firefox()
driver.get(url)
# click the "accept cookies" button
btn = driver.find_element_by_name('action')
btn.click()
# grab the html. It'll wait here until the page is finished loading
html = driver.page_source
# parse the html soup
soup = BeautifulSoup(html.lower(), "html.parser")
articles = soup.findAll("article")
for article in articles:
    # check for article titles in both h2 and h3 elems
    h2_titles = article.findAll('h2', {'class': 'ankeiler__title'})
    h3_titles = article.findAll('h3', {'class': 'ankeiler__title'})
    for t in h2_titles:
        # first I was doing print(t.text), but some of them had leading
        # newlines and things like '22:30', which I assume was the hour of the day
        text = ''.join(t.findAll(text=True, recursive=False)).lstrip()
        print(text)
    for t in h3_titles:
        text = ''.join(t.findAll(text=True, recursive=False)).lstrip()
        print(text)
# close the browser
driver.close()
This may or may not be exactly what you have in mind, but this is just an example of how to use selenium and beautiful soup. Feel free to copy/use/modify this as you see fit.
And if you're wondering about what selectors to use, read the comment by #JL Peyret.
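Since the original goal was a top 10 of the words used across all the articles, here is a minimal sketch of that last counting step. The titles list below is placeholder data standing in for whatever text you end up scraping with either approach above:
import re
from collections import Counter

# Placeholder input: replace with the titles/texts you scraped above
titles = ["voorbeeld titel een", "nog een voorbeeld titel"]

words = []
for title in titles:
    # split into lowercased word characters so counting is case-insensitive
    words.extend(re.findall(r"\w+", title.lower()))

# Print the ten most common words across everything that was scraped
for word, count in Counter(words).most_common(10):
    print(word, count)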

How to yield fragment URLs in scrapy using Selenium?

With my limited knowledge of web scraping, I've run into an issue that is quite complex for me, which I'll try to explain as best I can (so I'm open to suggestions or edits to my post).
I started using the web crawling framework Scrapy long ago for my web scraping, and it's still the one I use nowadays. Lately, I came across this website, and found that my framework (Scrapy) was not able to iterate over the pages, since this website uses fragment URLs (#) to load the data (the next pages). Then I made a post about that problem (having no idea of the root cause yet): my post
After that, I realized that my framework can't handle it without a JavaScript interpreter or a browser imitation, and the Selenium library was mentioned to me. I read as much as I could about that library (i.e. example1, example2, example3 and example4). I also found this StackOverflow post that gives some leads on my issue.
So finally, my biggest questions are:
1 - Is there any way to iterate/yield over the pages of the website shown above, using Selenium along with Scrapy?
So far, this is the code I'm using, but it doesn't work...
EDIT:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# The required imports...


def getBrowser():
    path_to_phantomjs = "/some_path/phantomjs-2.1.1-macosx/bin/phantomjs"
    dcap = dict(DesiredCapabilities.PHANTOMJS)
    dcap["phantomjs.page.settings.userAgent"] = (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 "
        "(KHTML, like Gecko) Chrome/15.0.87")
    browser = webdriver.PhantomJS(executable_path=path_to_phantomjs, desired_capabilities=dcap)
    return browser


class MySpider(Spider):
    name = "myspider"
    browser = getBrowser()

    def start_requests(self):
        the_url = "http://www.atraveo.com/es_es/islas_canarias#eyJkYXRhIjp7ImNvdW50cnlJZCI6IkVTIiwicmVnaW9uSWQiOiI5MjAiLCJkdXJhdGlvbiI6NywibWluUGVyc29ucyI6MX0sImNvbmZpZyI6eyJwYWdlIjoiMCJ9fQ=="
        yield scrapy.Request(url=the_url, callback=self.parse, dont_filter=True)

    def parse(self, response):
        self.get_page_links()

    def get_page_links(self):
        """ This first part goes through all available pages """
        for i in xrange(1, 3):  # 210
            new_data = {"data": {"countryId": "ES", "regionId": "920", "duration": 7, "minPersons": 1},
                        "config": {"page": str(i)}}
            json_data = json.dumps(new_data)
            new_url = "http://www.atraveo.com/es_es/islas_canarias#" + base64.b64encode(json_data)
            self.browser.get(new_url)
            print "\nThe new URL is -> ", new_url, "\n"
            content = self.browser.page_source
            self.get_item_links(content)

    def get_item_links(self, body=""):
        if body:
            """ This second part goes through all available items """
            raw_links = re.findall(r'listclickable.+?>', body)
            links = []
            if raw_links:
                for raw_link in raw_links:
                    new_link = re.findall(r'data-link=\".+?\"', raw_link)[0].replace("data-link=\"", "").replace("\"", "")
                    links.append(str(new_link))
            if links:
                ids = self.get_ids(links)
                for link in links:
                    current_id = self.get_single_id(link)
                    print "\nThe Link -> ", link
                    # If the line below is commented out the code works; otherwise it doesn't
                    yield scrapy.Request(url=link, callback=self.parse_room, dont_filter=True)

    def get_ids(self, list1=[]):
        if list1:
            ids = []
            for elem in list1:
                raw_id = re.findall(r'/[0-9]+', elem)[0].replace("/", "")
                ids.append(raw_id)
            return ids
        else:
            return []

    def get_single_id(self, text=""):
        if text:
            raw_id = re.findall(r'/[0-9]+', text)[0].replace("/", "")
            return raw_id
        else:
            return ""

    def parse_room(self, response):
        # More scraping code...
        pass
So this is mainly my problem. I'm almost sure that what I'm doing isn't the best way, which is why I'm asking my second question. And to avoid having to deal with these kinds of issues in the future, I'm asking my third question.
2 - If the answer to the first question is negative, how could I tackle this issue? I'm open to other approaches as well.
3 - Can anyone tell me or show me pages where I can learn how to combine web scraping with JavaScript and Ajax? Nowadays more and more websites use JavaScript and Ajax to load content.
Many thanks in advance!
Selenium is one of the best tools for scraping dynamic data. You can use Selenium with any web browser to fetch data that is loaded by scripts; it works exactly like a browser's click operations. But it's not what I prefer.
For getting dynamic data you can use the Scrapy + Splash combo. From Scrapy you will get all the static data, and Splash handles the dynamic content.
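For illustration, here is a minimal sketch of what the Scrapy + Splash combo can look like. It assumes a Splash instance running locally and the scrapy-splash package installed and wired up in settings.py (SPLASH_URL plus the downloader/spider middlewares from the scrapy-splash README); the URL and selectors are placeholders, not taken from the original site:
import scrapy
from scrapy_splash import SplashRequest  # pip install scrapy-splash


class JsSpider(scrapy.Spider):
    # Hypothetical spider, purely for illustration
    name = "js_spider"

    def start_requests(self):
        # SplashRequest has Splash render the page (running its JavaScript)
        # before the resulting HTML is handed back to Scrapy
        yield SplashRequest(
            "http://quotes.toscrape.com/js/",
            callback=self.parse,
            args={"wait": 2},  # give the page time to execute its scripts
        )

    def parse(self, response):
        # response now contains the JS-rendered HTML
        for text in response.css("div.quote span.text::text").getall():
            yield {"text": text}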
Have you looked into BeautifulSoup? It's a very popular web scraping library for Python. As for JavaScript, I would recommend something like Cheerio (if you're asking for a scraping library in JavaScript).
If the website loads its content through HTTP requests, you could always try to replicate those manually with something like the requests library, as sketched below.
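As a rough sketch of that suggestion, assuming you find the underlying JSON endpoint in the browser dev tools' Network tab (the endpoint and parameters below are placeholders, not the real ones for the site in question):
import requests

# Hypothetical endpoint and parameters, found by watching the Network tab
response = requests.get(
    "https://example.com/api/listings",
    params={"page": 1, "region": "islas_canarias"},
    headers={"User-Agent": "Mozilla/5.0"},
)
data = response.json()
for item in data.get("results", []):
    print(item)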
Hope this helps
You can definitely use Selenium as a standalone tool to scrape webpages with dynamic content (like AJAX loading).
Selenium simply relies on a WebDriver (basically a web browser) to fetch content over the Internet.
Here are a few of them (the most often used):
ChromeDriver
PhantomJS (my favorite)
Firefox
Once you're set up, you can start your bot and parse the HTML content of the webpage.
I included a minimal working example below using Python and ChromeDriver:
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(executable_path='chromedriver')
driver.get('https://www.google.com')
# Then you can search for any element you want on the webpage
search_bar = driver.find_element(By.CLASS_NAME, 'tsf-p')
search_bar.click()
driver.close()
See the documentation for more details!
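To tie this back to the original question, a rough sketch of iterating the fragment-URL pages with standalone Selenium might look like the following. It assumes chromedriver is on your PATH, that the result items expose the data-link attribute the original spider's regex was extracting, and it rebuilds the base64-encoded JSON fragments the same way the question's code does:
import base64
import json

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()   # assumes chromedriver is on your PATH
driver.implicitly_wait(10)    # crude wait for the JS-rendered results

base_url = "http://www.atraveo.com/es_es/islas_canarias#"
for page in range(1, 3):
    # Rebuild the fragment exactly as the question's code does:
    # a base64-encoded JSON blob describing the search and the page number
    payload = {"data": {"countryId": "ES", "regionId": "920",
                        "duration": 7, "minPersons": 1},
               "config": {"page": str(page)}}
    fragment = base64.b64encode(json.dumps(payload).encode()).decode()
    driver.get(base_url + fragment)

    # Collect the detail-page links the original spider was extracting via regex
    links = [el.get_attribute("data-link")
             for el in driver.find_elements(By.CSS_SELECTOR, "[data-link]")]
    print(page, links)

driver.quit()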

How to read and write to a file (Javascript) in ui automation?

I want to identify a few properties during my run and form a JSON object, which I would like to write to a ".json" file and save on the disk.
var target = UIATarget.localTarget();
var properties = new Object();
var jsonObjectToRecord = {"properties":properties}
jsonObjectToRecord.properties.name = "My App"
UIALogger.logMessage("Pretty Print TEST Log"+jsonObjectToRecord.properties.name);
var str = JSON.stringify(jsonObjectToRecord)
UIALogger.logMessage(str);
// -- CODE TO WRITE THIS JSON TO A FILE AND SAVE ON THE DISK --
I tried:
// Sample code to see if it is possible to write data
// onto some file from my automation script
function WriteToFile()
{
set fso = CreateObject("Scripting.FileSystemObject");
set s = fso.CreateTextFile("/Volumes/DEV/test.txt", True);
s.writeline("HI");
s.writeline("Bye");
s.writeline("-----------------------------");
s.Close();
}
AND
function WriteFile()
{
// Create an instance of StreamWriter to write text to a file.
sw = new StreamWriter("TestFile.txt");
// Add some text to the file.
sw.Write("This is the ");
sw.WriteLine("header for the file.");
sw.WriteLine("-------------------");
// Arbitrary objects can also be written to the file.
sw.Write("The date is: ");
sw.WriteLine(DateTime.Now);
sw.Close();
}
But I'm still unable to read and write data to a file from UI Automation in Instruments.
Possible workaround?
Could we redirect to stdout by executing a terminal command from the UI Automation script? So, can we execute a terminal command from the script?
Haven't tried:
1. Including the library that has those methods and giving it a try.
Your assumptions are good, but the Xcode UI Automation script is not full JavaScript.
I don't think you can simply write normal browser-based JavaScript in an Xcode UI Automation script.
set fso = CreateObject("Scripting.FileSystemObject");
is not JavaScript; it is VBScript, which only works on Microsoft platforms and in testing tools like QTP.
Scripting.FileSystemObject
is an ActiveX object which only exists on Microsoft Windows.
Only a few JavaScript features like basic Math, Array, etc. are provided by Apple's JavaScript library, so you are limited to the classes documented here: https://developer.apple.com/library/ios/documentation/DeveloperTools/Reference/UIAutomationRef/
If you want to do more scripting, then try the Selenium iOS Driver: http://ios-driver.github.io/ios-driver/
Hey, so this is something I was looking into for a project but never fully got around to implementing, so this answer will be more of a guide than a step-by-step copy and paste.
First you're going to need to create a bash script that writes to a file. This can be as simple as:
#!/bin/bash
echo "$1" >> /path/to/output.json   # append the argument to whatever output file you choose
Then you call this from inside your Xcode Instruments UIAutomation tool with
var target = UIATarget.localTarget();
var host = target.host();
var result = host.performTaskWithPathArgumentsTimeout("your/script/path", ["Object description in JSON format"], 5);
Then after your automation ends you can open that file on your computer and look at the results.
EDIT: This will let you write to the file line by line, but the actual JSON formatting is up to you. Looking at some examples, I don't think it would be difficult to implement, but obviously you'll need to give it some thought first.

How to get the version of a pebble app on the watch?

I want to show the app version of my Pebble app on its splash screen, but how can I access it?
Is there a way to access information from appinfo.json on the watch or in JS? I need at least the version string.
The easiest way to get your app version into the C code is to modify the wscript to generate a header file containing it as part of the build process.
User pedrolane on the Pebble forums has provided his wscript as an example which you can find here: https://code.google.com/p/pebble-for-gopro/source/browse/wscript?spec=svn8634d98109cb03c30c4dab52e665c4ac548cb20a&r=8634d98109cb03c30c4dab52e665c4ac548cb20a
Here's the contents of the file. The generate_appinfo function reads in appinfo.json, grabs the versionLabel and writes it to generated/appinfo.h.
import json

top = '.'
out = 'build'


def options(ctx):
    ctx.load('pebble_sdk')


def configure(ctx):
    ctx.load('pebble_sdk')


def build(ctx):
    ctx.load('pebble_sdk')

    def generate_appinfo(task):
        src = task.inputs[0].abspath()
        tgt = task.outputs[0].abspath()
        json_data = open(src)
        data = json.load(json_data)
        f = open(tgt, 'w')
        f.write('#ifndef appinfo_h\n')
        f.write('#define appinfo_h\n')
        f.write('#define VERSION_LABEL "' + data["versionLabel"] + '"\n')
        f.write('#endif\n')
        f.close()

    ctx(
        rule=generate_appinfo,
        source='appinfo.json',
        target='generated/appinfo.h',
    )

    ctx.pbl_program(source=ctx.path.ant_glob(['src/**/*.c', 'generated/**/*.c']),
                    includes='generated',
                    target='pebble-app.elf')

    ctx.pbl_bundle(elf='pebble-app.elf',
                   js=ctx.path.ant_glob('src/js/**/*.js'))
To use the value, include appinfo.h and use VERSION_LABEL.
Another, hackier solution without code generation: add the following lines to your main.c:
#include "pebble_app_info.h"
extern const PebbleAppInfo __pbl_app_info;
Then you can get the version of your app like this:
__pbl_app_info.app_version.major
__pbl_app_info.app_version.minor
