Selenium doesn't seem to load the JavaScript part of the website

Hello, I have a script using Python and Selenium, and I don't understand why it can't retrieve the JavaScript part of the website (the same script works perfectly on my other machine):
import json
import chromedriver_binary  # adds the chromedriver binary to PATH
from bs4 import BeautifulSoup
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("window-size=1024,768")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--enable-javascript")
url = "https://deliveroo.co.uk/restaurants/london/holborn?geohash=gcpvj6kxet58&collection=pizza"
driver = webdriver.Chrome(options=chrome_options)
driver.get(url)
soup = BeautifulSoup(driver.page_source, "lxml")
show_data = soup.find_all("script", id="__NEXT_DATA__")
mydata = json.loads( show_data[0].text )
I get the following error, which suggests the JSON in that script tag was not read:
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I'm not sure why this works on my other machine but not on my current one.

The .text attribute doesn't return the script's contents here. Using encode_contents() instead worked for me; only the definition of mydata changes:
mydata = json.loads( show_data[0].encode_contents())
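The error message in the question is the one json.loads reports for empty input, which is consistent with .text having returned nothing on that machine. The sketch below reproduces it with the standard library only; the payload is a made-up stand-in for the real __NEXT_DATA__ contents, and try_parse is a hypothetical helper.

```python
import json

# Hypothetical stand-in for the contents of the __NEXT_DATA__ script tag.
payload_bytes = b'{"props": {"restaurants": []}}'


def try_parse(raw):
    """Return the parsed JSON, or the decoder's error message if parsing fails."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        return str(exc)


# An empty string reproduces the error from the question.
empty_result = try_parse("")

# json.loads accepts bytes (Python 3.6+), which is what encode_contents() returns.
parsed = try_parse(payload_bytes)

print(empty_result)
print(parsed)
```

Printing show_data[0].text before parsing is a quick way to check whether the string is empty on the failing machine.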

Related

How to get all of a website’s js files and their urls [duplicate]

I want to scan some websites and get the names and contents of all their JavaScript files. I tried Python requests with BeautifulSoup but wasn't able to get the script details and contents. Am I missing something?
I have been trying a lot of methods, but I feel like I'm stumbling in the dark.
This is the code I am trying:
import requests
from bs4 import BeautifulSoup
r = requests.get("http://www.marunadanmalayali.com/")
soup = BeautifulSoup(r.content, "html.parser")

You can get all the linked JavaScript files with the code below:
l = [i.get('src') for i in soup.find_all('script') if i.get('src')]
soup.find_all('script') returns a list of all the <script> tags in the page.
A list comprehension loops over all the elements in the list returned by soup.find_all('script').
i is a dict-like object; .get('src') checks whether it has a src attribute. If it doesn't, the tag is ignored. Otherwise, the value is put into a list (called l in the example).
The output in this case looks like this:
['http://adserver.adtech.de/addyn/3.0/1602/5506153/0/6490/ADTECH;loc=700;target=_blank;grp=[group]',
'http://tags.expo9.exponential.com/tags/MarunadanMalayalicom/ROS/tags.js',
'http://tags.expo9.exponential.com/tags/MarunadanMalayalicom/ROS/tags.js',
'http://js.genieessp.com/t/057/794/a1057794.js',
'http://ib.adnxs.com/ttj?id=5620689&cb=[CACHEBUSTER]&pubclick=[INSERT_CLICK_TAG]',
'http://ib.adnxs.com/ttj?id=5531763',
'http://advs.adgorithms.com/ttj?id=3279193&cb=[CACHEBUSTER]&pubclick=[INSERT_CLICK_TAG]',
'http://xp2.zedo.com/jsc/xp2/fo.js',
'http://www.marunadanmalayali.com/js/mnmads.js',
'http://www.marunadanmalayali.com/js/jquery-2.1.0.min.js',
'http://www.marunadanmalayali.com/js/jquery.hoverIntent.minified.js',
'http://www.marunadanmalayali.com/js/jquery.dcmegamenu.1.3.3.js',
'http://www.marunadanmalayali.com/js/jquery.cookie.js',
'http://www.marunadanmalayali.com/js/swanalekha-ml.js',
'http://www.marunadanmalayali.com/js/marunadan.js?r=1875',
'http://www.marunadanmalayali.com/js/taboola_home.js',
'http://d8.zedo.com/jsc/d8/fo.js']
This code misses some links because they are not actually in the HTML source.
You can see them in the browser console, but they are not in the page source.
Usually, that's because these links were generated by JavaScript. The requests module doesn't run any JavaScript in the page like a real browser does; it only sends a request to get the HTML source.
If you also need them, you have to use another module to run the JavaScript in that page, and then you can see these links. For that, I'd suggest using selenium, which drives a real browser so it can run the JavaScript in the page.
For example (make sure that you have already installed selenium and a web driver for it):
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome() # use Chrome driver for example
driver.get('http://www.marunadanmalayali.com/')
soup = BeautifulSoup(driver.page_source, "html.parser")
l = [i.get('src') for i in soup.find_all('script') if i.get('src')]
__import__('pprint').pprint(l)
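To illustrate why a static parser misses script tags that are injected at runtime, here is a minimal sketch using the standard library's html.parser as a stand-in for BeautifulSoup. The page markup and the ScriptSrcCollector class are made up for the example.

```python
from html.parser import HTMLParser

# Made-up page: one script is in the markup, a second is only added at runtime.
STATIC_HTML = """
<html><head>
<script src="/js/app.js"></script>
<script>
  var s = document.createElement('script');
  s.src = '/js/injected.js';
  document.head.appendChild(s);
</script>
</head><body></body></html>
"""


class ScriptSrcCollector(HTMLParser):
    """Collect the src attribute of every <script> tag present in the markup."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


collector = ScriptSrcCollector()
collector.feed(STATIC_HTML)
print(collector.sources)
```

Only /js/app.js is collected. /js/injected.js would exist as a tag only after a browser has executed the inline script, which is the gap a browser-driven approach closes.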

You can use select with script[src], which only finds script tags that have a src attribute, so you don't need to call .get multiple times:
import requests
from bs4 import BeautifulSoup
r = requests.get("http://www.marunadanmalayali.com/")
soup = BeautifulSoup(r.content, "html.parser")
src = [sc["src"] for sc in soup.select("script[src]")]
You can also specify src=True with find_all to do the same:
src = [sc["src"] for sc in soup.find_all("script",src=True)]
Both give you the same output:
['http://tags.expo9.exponential.com/tags/MarunadanMalayalicom/ROS/tags.js', 'http://tags.expo9.exponential.com/tags/MarunadanMalayalicom/ROS/tags.js', 'http://js.genieessp.com/t/052/954/a1052954.js', '//s3-ap-northeast-1.amazonaws.com/tms-t/marunadanmalayali-7219.js', 'http://advs.adgorithms.com/ttj?id=3279193&cb=[CACHEBUSTER]&pubclick=[INSERT_CLICK_TAG]', 'http://www.marunadanmalayali.com/js/mnmcombined1.min.js', 'http://www.marunadanmalayali.com/js/mnmcombined2.min.js']
Also, if you use selenium, you can use it with PhantomJS for headless browsing. You don't need BeautifulSoup at all in that case; you can use the same CSS selector directly in selenium:
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get('http://www.marunadanmalayali.com/')
src = [sc.get_attribute("src") for sc in driver.find_elements_by_css_selector("script[src]")]
print(src)
Which gives you all the links:
u'https://pixel.yabidos.com/fltiu.js?qid=836373f5137373f5131353&cid=511&p=165&s=http%3a%2f%2fwww.marunadanmalayali.com%2f&x=admeta&nci=&adtg=96331&nai=', u'http://gum.criteo.com/sync?c=72&r=2&j=TRC.getRTUS', u'http://b.scorecardresearch.com/beacon.js', u'http://cdn.taboola.com/libtrc/impl.201-1-RELEASE.js', u'http://p165.atemda.com/JSAdservingMP.ashx?pc=1&pbId=165&clk=&exm=&jsv=1.84&tsv=2.26&cts=1459160775430&arp=0&fl=0&vitp=0&vit=&jscb=&url=&fp=0;400;300;20&oid=&exr=&mraid=&apid=&apbndl=&mpp=0&uid=&cb=54613943&pId0=64056124&rank0=1&gid0=64056124:1c59ac&pp0=&clk0=[External%20click-tracking%20goes%20here%20(NOT%20URL-encoded)]&rpos0=0&ecpm0=&ntv0=&ntl0=&adsid0=', u'http://cdn.taboola.com/libtrc/marunadanaalayali-network/loader.js', u'http://s.atemda.com/Admeta.js', u'http://www.google-analytics.com/analytics.js', u'http://tags.expo9.exponential.com/tags/MarunadanMalayalicom/ROS/tags.js', u'http://tags.expo9.exponential.com/tags/MarunadanMalayalicom/ROS/tags.js', u'http://js.genieessp.com/t/052/954/a1052954.js', u'http://s3-ap-northeast-1.amazonaws.com/tms-t/marunadanmalayali-7219.js', u'http://d8.zedo.com/jsc/d8/fo.js', u'http://z1.zedo.com/asw/fm/1185/7219/9/fm.js?c=7219&a=0&f=&n=1185&r=1&d=9&adm=&q=&$=&s=1936&l=%5BINSERT_CLICK_TRACKER_MACRO%5D&ct=&z=0.025054786819964647&tt=0&tz=0&pu=http%3A%2F%2Fwww.marunadanmalayali.com%2F&ru=&pi=1459160768626&ce=UTF-8&zpu=www.marunadanmalayali.com____1_&tpu=', u'http://cas.criteo.com/delivery/ajs.php?zoneid=308686&nodis=1&cb=38688817829&exclude=undefined&charset=UTF-8&loc=http%3A//www.marunadanmalayali.com/', u'http://ads.pubmatic.com/AdServer/js/showad.js', 
u'http://showads.pubmatic.com/AdServer/AdServerServlet?pubId=135167&siteId=135548&adId=600924&kadwidth=300&kadheight=250&SAVersion=2&js=1&kdntuid=1&pageURL=http%3A%2F%2Fwww.marunadanmalayali.com%2F&inIframe=0&kadpageurl=marunadanmalayali.com&operId=3&kltstamp=2016-3-28%2011%3A26%3A13&timezone=1&screenResolution=1024x768&ranreq=0.8869257988408208&pmUniAdId=0&adVisibility=2&adPosition=999x664', u'http://d8.zedo.com/jsc/d8/fo.js', u'http://z1.zedo.com/asw/fm/1185/7213/9/fm.js?c=7213&a=0&f=&n=1185&r=1&d=9&adm=&q=&$=&s=1948&l=%5BINSERT_CLICK_TRACKER_MACRO%5D&ct=&z=0.08655649935826659&tt=0&tz=0&pu=http%3A%2F%2Fwww.marunadanmalayali.com%2F&ru=&pi=1459160768626&ce=UTF-8&zpu=www.marunadanmalayali.com____1_&tpu=', u'http://advs.adgorithms.com/ttj?id=3279193&cb=[CACHEBUSTER]&pubclick=[INSERT_CLICK_TAG]', u'http://ib.adnxs.com/ttj?ttjb=1&bdc=1459160761&bdh=ZllBLkzcj2dGDVPeS0Sw_OTWjgQ.&tpuids=eyJ0cHVpZHMiOlt7InByb3ZpZGVyIjoiY3JpdGVvIiwidXNlcl9pZCI6Il9KRC1PUmhLX3hLczd1cUJhbjlwLU1KQ2VZbDQ2VVUxIn1dfQ==&view_iv=0&view_pos=664,2096&view_ws=400,300&view_vs=3&bdref=http%3A%2F%2Fwww.marunadanmalayali.com%2F&bdtop=true&bdifs=0&bstk=http%3A%2F%2Fwww.marunadanmalayali.com%2F&&id=3279193&cb=[CACHEBUSTER]&pubclick=[INSERT_CLICK_TAG]', u'http://www.marunadanmalayali.com/js/mnmcombined1.min.js', u'http://www.marunadanmalayali.com/js/mnmcombined2.min.js', u'http://pixel.yabidos.com/iftfl.js?ver=1.4.2&qid=836373f5137373f5131353&cid=511&p=165&s=http%3a%2f%2fwww.marunadanmalayali.com%2f&x=admeta&adtg=96331&nci=&nai=&nsi=&cstm1=&cstm2=&cstm3=&kqt=&xc=&test=&od1=&od2=&co=0&tps=34&rnd=3m17uji8ftbf']
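One difference between the two outputs is that the BeautifulSoup list contains a protocol-relative entry ('//s3-ap-northeast-1.amazonaws.com/...'), exactly as written in the HTML, while the browser-based list shows it with a scheme. If you want absolute URLs from the static approach, a minimal sketch with urllib.parse.urljoin follows. The '/js/example.js' entry is a hypothetical relative path added for illustration.

```python
from urllib.parse import urljoin

page_url = "http://www.marunadanmalayali.com/"

raw_srcs = [
    "//s3-ap-northeast-1.amazonaws.com/tms-t/marunadanmalayali-7219.js",
    "http://www.marunadanmalayali.com/js/mnmcombined1.min.js",
    "/js/example.js",  # hypothetical relative path, not from the output above
]

# urljoin resolves each src against the page URL, as a browser would.
absolute_srcs = [urljoin(page_url, src) for src in raw_srcs]

for src in absolute_srcs:
    print(src)
```

Already-absolute URLs pass through unchanged, so the same comprehension can be applied to the whole list.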

Scrape a javascript variable from a webpage

I am scraping a site with Beautiful Soup, but all the content is hidden inside a script, in a JS variable (see the code below).
I can't find any solution other than Selenium, which is not an option in this case (I won't go into detail why, but it just doesn't work). I can already scrape it by getting the inside of the script tag and then using eval() on it, but that introduces a few problems (unexpected indent, unwanted functions). I can use Python, JavaScript and maybe C#, if anything there helps.
Expected behaviour: anything that gets the info (the variable in the last line) into any of those 3 languages (preferably Python).
The code (sorry for the formatting, but I can't fix it since it's so long; it isn't even the full variable, which is huge):
barLoadGoogleFont('Open Sans'); barCssLoad('/global/pics/js/jquery/royalSlider/skins/universal/rs-universal.css?v=e449c4'); barCssLoad('/global/pics/css/material-icons.css?v=e6d856'); barCssLoad('/user/pics/css/user.css?v=eced9d');
barCssLoad('/user/pics/css/userIcons.css?v=6f9a03');
barCssLoad('/timeline/pics/css/timeline.css?v=8ec2ca'); barJsLibraryLoad('/global/pics/js/jquery/jquery.royalslider.min.js?v=515a43'); barJsLibraryLoad('/anketa/pics/js/utilsAnketa.js?v=9383d5'); barJsLibraryLoad('/znamky/pics/js/utilsZnamky.js?v=7afc9e'); barJsLibraryLoad('/exam/pics/js/utilsExam.js?v=033d55'); barJsLibraryLoad('/timeline/pics/js/utilsTimeline.js?v=29cf0e'); barJsLibraryLoad('/timeline/pics/js/timelineItemCreator.js?v=c37c99'); barJsLibraryLoad('/timeline/pics/js/timelineInputbox.js?v=2fde70'); barJsLibraryLoad('/timeline/pics/js/timelineViewer.js?v=f35e45');
barJsLibraryLoad('/user/pics/js/DailyPlan.js?v=e81fb9'); barJsLibraryLoad('/user/pics/js/userHomeEtest.js?v=6166f3');
$j(document).ready(function() { $j('#jwbcddd3da_md').userhome({"items":[{"timelineid":"2140963","timestamp":"2020-12-09 09:59:13","reakcia_na":"692638","typ":"h_clearplany","user":"Plan5077","target_user":null,"user_meno":"Kvarta aj2","ineid":"clearplany","text":"","cas_pridania":"2020-12-09 09:59:13","cas_udalosti":null,"data":"null","vlastnik":"Ucitel8678605","vlastnik_meno":"Barbora Drugajov\u00e1","pocet_reakcii":"0","posledna_reakcia":"","pomocny_zaznam":"1","removed":"0","cas_pridania_btc":"2020-12-09 09:59:13","posledna_reakcia_btc":null},{"timelineid":"2287814","timestamp":"2020-12-09 09:59:12","reakcia_na":"2290613","typ":"h_dailyplan","user":"Trieda8694210","target_user":null,"user_meno":"Kvarta A","ineid":"daily2020-12-09","text":"","cas_pridania":"2020-12-09 09:59:12","cas_udalosti":null,"data":"[]","vlastnik":"Ucitel8678605","vlastnik_meno":"Barbora Drugajov\u00e1","pocet_reakcii":"0","posledna_reakcia":"","pomocny_zaznam":"1","removed":"0","cas_pridania_btc":"2020-12-09 09:59:12","posledna_reakcia_btc":null},{"timelineid":"1439827","timestamp":"2020-12-09 08:56:57","reakcia_na":null,"typ":"h_clearplany","user":"*","target_user":null,"user_meno":"Cel\u00e1 \u0161kola","ineid":"clearplany","text":"","cas_pridania":"2020-12-09 08:56:57","cas_udalosti":null,"data":"null","vlastnik":"Ucitel16434","vlastnik_meno":"Ivor Dian","pocet_reakcii":"0","posledna_reakcia":"","pomocny_zaznam":null,"removed":"0","cas_pridania_btc":"2020-12-09 08:56:57","posledna_reakcia_btc":null},{"timelineid":"2290324","timestamp":"2020-12-09 08:37:22","reakcia_na":null,"typ":"sprava","user":"CustPlan5075","target_user":null,"user_meno":"Kvarta A+Kvarta B - nj4 \u00b7 nemeck\u00fd jazyk","ineid":null,"text":"Ahojte, zajtra...

OK, it's a little tough to debug without actually working with the page, but you'll need to pull out that JSON structure. You can do it with a split, so this is fairly generic code:
from bs4 import BeautifulSoup
import pandas as pd
import requests
import json
url = 'https://www.thesite.com'  # placeholder; requests needs the scheme
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
scripts = soup.find_all('script')
for script in scripts:
    if '.userhome({' in script.text:
        json_str = script.text
        data = json_str.split('.userhome(')[-1]
        # raw_decode parses the leading JSON object and ignores
        # whatever follows it, such as the closing "); });"
        jsonData, _ = json.JSONDecoder().raw_decode(data)
        break
rows = []
for row in jsonData['items']:
    rows.append(row)
table = pd.DataFrame(rows)
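One way to pull a JSON argument out of a JavaScript call without trimming trailing characters by trial and error is json.JSONDecoder.raw_decode, which parses a single JSON value starting at a given index and reports where it stopped. The sketch below uses only the standard library. The script text is a shortened, made-up sample shaped like the snippet in the question, and extract_call_argument is a hypothetical helper name.

```python
import json

# Shortened, made-up script text shaped like the snippet in the question.
script_text = (
    "barJsLibraryLoad('/user/pics/js/DailyPlan.js?v=e81fb9'); "
    "$j(document).ready(function() { $j('#widget').userhome("
    '{"items": [{"timelineid": "2140963", "text": "a; b"}]}'
    "); });"
)


def extract_call_argument(text, marker):
    """Parse the JSON value that immediately follows `marker` in `text`."""
    start = text.index(marker) + len(marker)
    # raw_decode parses one JSON value beginning at `start`, so the
    # trailing "); });" never has to be stripped off by hand.
    value, _end = json.JSONDecoder().raw_decode(text, start)
    return value


data = extract_call_argument(script_text, ".userhome(")
print(data["items"][0]["timelineid"])
```

Because the decoder stops at the end of the JSON value, semicolons or parentheses inside string values (like "a; b" here) do not confuse the extraction. Note that raw_decode does not skip leading whitespace, so the marker has to end right where the JSON begins.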

Scraping Javascript Website With BeautifulSoup 4 & Requests_HTML

I'm learning how to build another scraper for another website, Reverb.com, after getting my scraper on another website to work properly. Reverb, however, has been more challenging to extract information from and the model with my old scraper isn't working the same. I did some research and using requests_html instead of requests seemed like the option most were using for Javascript like what Reverb.com has.
I'm essentially trying to scrape out text versions of the headline and price information and either paginate through the different pages or loop through a list of URLs to get all the content. I'm sort of there but hitting road blocks. Below are 2 versions of code I'm fiddling with.
The first version below prints out all of what looks like only 3 of many pages of content but it prints out all the instrument names and prices with the markup. In the CSV, however, all of those items are printed together on 3 rows only, not 1 item/price pair per row.
from requests_html import HTMLSession
from bs4 import BeautifulSoup
import csv
from fake_useragent import UserAgent
session = HTMLSession()
r = session.get("https://reverb.com/marketplace/bass-guitars?year_min=1900&year_max=2022")
r.html.render(sleep=5)
soup = BeautifulSoup(r.html.raw_html, "html.parser")
#content scrape
b = soup.findAll("h4", class_="grid-card__title") #title
for i in b:
    print(i)
p = soup.findAll("div", class_="grid-card__price") #price
for i in p:
    print(i)
Conversely, this version prints only 3 lines to a CSV, but the name and price are stripped of all the markup. That only happens when I change findAll to just find. I read that for html in r.html is a way to loop through pages without having to make a list of URLs.
from requests_html import HTMLSession
from bs4 import BeautifulSoup
import csv
from fake_useragent import UserAgent
#make csv file
csv_file = open("rvscrape.csv", "w", newline='') #added the newline thing on 5.17.20 to try to stop blank lines from writing
csv_writer = csv.writer(csv_file)
csv_writer.writerow(["bass_name","bass_price"])
session = HTMLSession()
r = session.get("https://reverb.com/marketplace/bass-guitars?year_min=1900&year_max=2022")
r.html.render(sleep=5)
soup = BeautifulSoup(r.html.raw_html, "html.parser")
for html in r.html:
    #content scrape
    bass_name = []
    b = soup.find("h4", class_="grid-card__title").text.strip() #title
    #for i in b:
    # bass_name.append(i)
    # for i in bass_name:
    # print(i)
    price = []
    p = soup.find("div", class_="grid-card__price").text.strip() #price
    #for i in p:
    # print(i)
    csv_writer.writerow([b, p])

In order to extract all the pages of search results, you need to extract the link of the next page and keep going until there is no next page available. We can do this using a while loop and checking the existence of the next anchor tag.
The following script performs the loop and also adds the results to the csv. It also prints the url of the page, so that we have an estimate of what page the program is on.
from requests_html import HTMLSession
from bs4 import BeautifulSoup
import csv
from fake_useragent import UserAgent
# make csv file
# added the newline thing on 5.17.20 to try to stop blank lines from writing
csv_file = open("rvscrape.csv", "w", newline='')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(["bass_name", "bass_price"])
session = HTMLSession()
r = session.get(
    "https://reverb.com/marketplace/bass-guitars?year_min=1900&year_max=2022")
r.html.render(sleep=5)
stop = False
next_url = ""
while not stop:
    print(next_url)
    soup = BeautifulSoup(r.html.raw_html, "html.parser")
    titles = soup.findAll("h4", class_="grid-card__title") # titles
    prices = soup.findAll("div", class_="grid-card__price") # prices
    for i in range(len(titles)):
        title = titles[i].text.strip()
        price = prices[i].text.strip()
        csv_writer.writerow([title, price])
    next_link = soup.find("li", class_="pagination__page--next")
    if not next_link:
        stop = True
    else:
        next_url = next_link.find("a").get("href")
        r = session.get("https://reverb.com/marketplace" + next_url)
        r.html.render(sleep=5)
Output layout problems like this are common when scraping JavaScript-heavy websites. They can also be solved with a scraper that drives a real browser.
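The one-item-per-row layout the question asks for comes from pairing each title with its price before writing. A minimal sketch of that step on its own, using only the standard library: the titles and prices are made-up values standing in for the stripped text of the scraped elements, and an in-memory buffer stands in for the CSV file.

```python
import csv
import io

# Made-up values standing in for the stripped text of each title and price element.
titles = ["Fender Precision Bass 1976", "Rickenbacker 4001 1978"]
prices = ["$2,400", "$3,150"]

# In-memory stand-in for open("rvscrape.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["bass_name", "bass_price"])

# zip pairs the n-th title with the n-th price, giving one row per listing.
for title, price in zip(titles, prices):
    writer.writerow([title, price])

# Read the CSV text back to confirm the row layout.
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(rows)
```

zip stops at the shorter list, so a listing without a price would shift later pairs rather than raise an error. Selecting each card element first and reading its title and price from there avoids that, assuming the page groups them in a common parent.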

Search a string in javascript using python

Following my previous question :
how to fetch javascript contents in python
I tried to make another script which fetches data from the JavaScript, after getting the webpage contents of course.
But it's just not showing the content I want. I want to find "content_id" in the JavaScript of the page. This is the page: http://www.hulu.com/watch/815743
Here's what I have right now.
import re
import requests
from bs4 import BeautifulSoup
import os
import fileinput
Link = 'http://www.hulu.com/watch/815743'
q = requests.get(Link)
soup = BeautifulSoup(q.text)
#print soup
subtitles = soup.findAll('script',{'type':'text/javascript'})
pattern = re.compile(r'"content_id":"(.*?)"', re.MULTILINE | re.DOTALL)
script = soup.find("script", text=pattern)
print pattern.search(script.text).group(1)
I get this error :
AttributeError: 'NoneType' object has no attribute 'text'
Any idea how to solve this issue..?

There are two problems in your regular expression pattern:
the quotes are escaped with backslashes in the script contents, so the pattern has to match those backslashes
there is whitespace after the colon
Here is the fixed version:
pattern = re.compile(r'\\"content_id\\":\s*\\"(.*?)\\"', re.MULTILINE | re.DOTALL)
Works for me, getting 60585710 as a result.
FYI, here is the complete code that I'm executing:
import re
import requests
from bs4 import BeautifulSoup
Link = 'http://www.hulu.com/watch/815743'
q = requests.get(Link)
soup = BeautifulSoup(q.text, "html.parser")
pattern = re.compile(r'\\"content_id\\":\s*\\"(.*?)\\"', re.MULTILINE | re.DOTALL)
script = soup.find("script", text=pattern)
print(pattern.search(script.text).group(1))
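The effect of the two fixes can be checked offline against a small sample. The sketch below uses a made-up script fragment shaped the way the answer describes (backslash-escaped quotes, a space after the colon) and compares the original pattern with the corrected one.

```python
import re

# Made-up fragment: JSON embedded in a JavaScript string, so its quotes are escaped.
script_text = r'var config = "{\"video\": {\"content_id\": \"60585710\", \"title\": \"x\"}}";'

# Pattern from the question: expects unescaped quotes and no space after the colon.
naive_pattern = re.compile(r'"content_id":"(.*?)"')

# Pattern from the answer: matches the literal backslashes and optional whitespace.
fixed_pattern = re.compile(r'\\"content_id\\":\s*\\"(.*?)\\"')

naive_match = naive_pattern.search(script_text)
fixed_match = fixed_pattern.search(script_text)

content_id = fixed_match.group(1) if fixed_match else None
print(naive_match)
print(content_id)
```

The original pattern finds nothing in this sample, which is the same situation that makes soup.find return None and produces the AttributeError in the question.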

Issue in invoking "onclick" event using PyQt & javascript

I am trying to scrape data from a website using beautiful soup. By default, this webpage shows 18 items and after clicking on a javascript button "showAlldevices" all 41 items are visible. Beautiful soup scrapes data only for items visible by default, to get data for all items I used PyQt module and invoked the click event using the javascript code. Below is the referred code:
import csv
import urllib2
import sys
import time
from bs4 import BeautifulSoup
from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import *
class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()
    def _loadFinished(self, result):
        self.frame = self.mainFrame()
        self.app.quit()
url = 'http://www.att.com/shop/wireless/devices/smartphones.html'
r = Render(url)
jsClick = """var evObj = document.createEvent('MouseEvents');
evObj.initEvent('click', true, true );
this.dispatchEvent(evObj);
"""
allSelector = "a#deviceShowAllLink"
allButton = r.frame.documentElement().findFirst(allSelector)
allButton.evaluateJavaScript(jsClick)
html = allButton.webFrame().toHtml()
page = html
soup = BeautifulSoup(page)
soup.prettify()
with open('Smartphones_26decv2.0.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    spamwriter.writerow(["Date","Day of Week","Device Name","Price"])
    items = soup.findAll('a', {"class": "clickStreamSingleItem"},text=True)
    prices = soup.findAll('div', {"class": "listGrid-price"})
    for item, price in zip(items, prices):
        textcontent = u' '.join(price.stripped_strings)
        if textcontent:
            spamwriter.writerow([time.strftime("%Y-%m-%d"),time.strftime("%A") ,unicode(item.string).encode('utf8').strip(),textcontent])
I am feeding the HTML to Beautiful Soup with this line of code: html = allButton.webFrame().toHtml(). The code runs without any errors, but I am still not getting data for all 41 items in the output CSV.
I also tried feeding html to beautiful soup using these lines of code:
allButton = r.frame.documentElement().findFirst(allSelector)
a = allButton.evaluateJavaScript(jsClick)
html = a.webFrame.toHtml()
page = html
soup = BeautifulSoup(page)
But I came across this error: html = a.webFrame.toHtml()
AttributeError: 'QVariant' object has no attribute 'webFrame'
Please pardon my ignorance if I am asking anything fundamental here, as I am new to programming. Please help me solve this issue.

I think there is a problem with your JavaScript code. Since you're creating a MouseEvent object you should use an initMouseEvent method for initialization. You can find an example here.
UPDATE2
But I think the simplest thing you can try is to use the JavaScript DOM method onclick of the a element instead of using your own JavaScript code. Something like this:
allButton.evaluateJavaScript("this.onclick()")
should work. I suppose you will have to reload the page after clicking, before passing it to the parser.
UPDATE 3
You can reload the page via r.action(QWebPage.ReloadAndBypassCache) or r.action(QWebPage.Reload), but it doesn't seem to have any effect. I've tried to display the page with QWebView, click the link and see what happens. Unfortunately I'm getting lots of Segmentation Fault errors, so I would swear there is a bug somewhere in PyQt4/Qt4. As the page being scraped uses jQuery, I've also tried to display it after loading jQuery in the QWebPage, but again no luck (the segfaults do not disappear). I'm giving up :( I hope other users here at SO will help you. Anyway, I recommend that you ask for help on the PyQt4 mailing list. They provide excellent support to PyQt users.
UPDATE
The error you get when changing your code is expected: remember that allButton is a QWebElement object. The QWebElement.evaluateJavaScript method returns a QVariant object (as stated in the docs), and that kind of object doesn't have a webFrame attribute, as you can check by reviewing this page.
