How to use Selenium to get real-time stock price on website? - javascript

I am working on a school project to get the real-time stock price from the JPMorgan website. I would like to get all the data shown in <div class="price_area">. I have tried BeautifulSoup and the Yahoo API but still cannot get what I want. This is my first time trying Selenium, and I have no idea how to run the JavaScript with it. Here is my code:
def getStockPrice():
    driver = webdriver.Chrome()
    driver.get("http://www.jpmhkwarrants.com/en_hk/market-statistics/underlying/underlying-terms/code/1")
    try:
        stock_index = WebDriverWait(driver, 10).until(
            driver.find_element_by_class_name('price_area').find_element_by_class_name('price')
        )
    finally:
        driver.quit()
But it shows the error 'WebElement' is not callable. How can I get the real-time price, % change and the open price? Thanks.

To use .find_element_by_* inside WebDriverWait you have to wrap it in a lambda function. until() expects a callable that takes the driver as its argument, and a WebElement is not callable, which is exactly the error you are seeing:
stock_index = WebDriverWait(driver, 10).until(
    lambda d: d.find_element_by_class_name('price_area').find_element_by_class_name('price')
)
Don't forget to use .text to get the element's text content. Full example:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException

def getStockPrice():
    driver = webdriver.Chrome()
    driver.get("http://www.jpmhkwarrants.com/en_hk/market-statistics/underlying/underlying-terms/code/0700")
    try:
        # wait until the price area has loaded, then pick out its child elements
        stock_index = WebDriverWait(driver, 10).until(
            lambda x: x.find_element_by_class_name('price_area')
        )
        price = stock_index.find_element_by_class_name('price')
        percentage = stock_index.find_element_by_css_selector('.percentage.rise')
        open_price = stock_index.find_element_by_css_selector('ul li span')
        print('Current price: ' + price.text)
        print('Percentage: ' + percentage.text)
        print('Open price: ' + open_price.text)
    except TimeoutException:
        print('wait timeout, element not found')
    finally:
        driver.quit()
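If you are on Selenium 4, note that the find_element_by_* helpers have been removed. A minimal sketch of the same wait using By locators and expected_conditions (same page and class names assumed):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("http://www.jpmhkwarrants.com/en_hk/market-statistics/underlying/underlying-terms/code/0700")
try:
    # presence_of_element_located takes a (By, value) tuple and is itself
    # a callable, so no lambda is needed here
    stock_index = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "price_area"))
    )
    print(stock_index.find_element(By.CLASS_NAME, "price").text)
finally:
    driver.quit()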

You can use requests and BeautifulSoup to get the three items you mention by calling the Ajax endpoint directly:
import requests
from bs4 import BeautifulSoup

url = 'http://www.jpmhkwarrants.com/en_hk/ajax/terms_quick_search?_=1543832902362'
res = requests.get(url)
soup = BeautifulSoup(res.content, "lxml")
items = [item.text for item in soup.select('.price,.percentage.rise,li:nth-of-type(3) span')]
print(items)
This prints the current price, percentage change and open price as a list.
The real time box has its own Ajax call of:
http://www.jpmhkwarrants.com/en_hk/ajax/market-terms-real-time-box/code/0700?_=1543832902364
You can use that to retrieve all items in that box.
import requests
from bs4 import BeautifulSoup

url = 'http://www.jpmhkwarrants.com/en_hk/ajax/market-terms-real-time-box/code/0700?_=1543832902364'
res = requests.get(url)
soup = BeautifulSoup(res.content, "lxml")
items = [item.text.strip().split('\n') for item in soup.select('.price_area div')]
tidied = [item for sublist in items for item in sublist if item and item != 'Change (%)']
print(tidied)
This prints a tidied list of every value shown in the real-time box.
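A side note on those URLs: the long number in the `_=` query parameter looks like a millisecond timestamp used as a cache-buster rather than an API key. That is an assumption worth verifying against the site, but if it holds you can generate a fresh one per request:

import time
import requests

# assumption: `_` is only a cache-busting millisecond timestamp, not a credential
url = 'http://www.jpmhkwarrants.com/en_hk/ajax/market-terms-real-time-box/code/0700'
res = requests.get(url, params={'_': int(time.time() * 1000)})
print(res.status_code)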

That data isn't real-time.
You usually have to pay for real-time data.
If your project involves any kind of paper trading or analysis, know that everything you pull from a scrape will probably be delayed by 5-15 minutes.
I've heard Bloomberg has a free API, but I don't know if the real-time data is free.
Check out the Interactive Brokers API. I'm pretty sure access to the data is free, and it lets you connect to a paper trading account to test strategies and algorithms.

Related

Python: Javascript rendered webpage not parsing

I want to parse the information at the following URL. I want to parse the name of the trade, the strategy description, and the transactions in the "Trading History" and "Open Positions". When I parse the page, I do not get this data.
I am new to parsing JavaScript-rendered webpages, so I would appreciate some explanation of why my code below isn't working.
import bs4 as bs
import urllib
import dryscrape
import sys
import time
url = 'https://www.zulutrade.com/trader/314062/trading'
sess = dryscrape.Session()
sess.visit(url)
time.sleep(10)
sauce = sess.body()
soup = bs.BeautifulSoup(sauce, 'lxml')
Thanks!
The link in your code doesn't get you anything, because the original URL you should be working with is the one I'm pasting below. The one you tried automatically redirects to it:
https://www.zulutrade.com/zulutrade-client/traders/api/providers/314062/tradeHistory?
You can scrape the JSON data behind that table like this:
import requests

r = requests.get('https://www.zulutrade.com/zulutrade-client/traders/api/providers/314062/tradeHistory?')
j = r.json()
items = j['content']
for item in items:
    print(item['currency'], item['pips'], item['tradeType'], item['transactionCurrency'], item['id'])
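If you want to keep the history rather than just print it, here is a small follow-on sketch (the field names are taken from the loop above; everything else is standard library):

import csv
import requests

r = requests.get('https://www.zulutrade.com/zulutrade-client/traders/api/providers/314062/tradeHistory?')
rows = r.json()['content']

with open('trade_history.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f,
                            fieldnames=['id', 'currency', 'pips', 'tradeType', 'transactionCurrency'],
                            extrasaction='ignore')  # skip any extra keys in the API response
    writer.writeheader()
    writer.writerows(rows)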

Scrapy + Selenium 302 redirection handling

So I'm building a web crawler that logs into my bank account and gathers data about my spending. I originally was going to use only Scrapy, but it didn't work since the First Merit page uses JavaScript to log in, so I piled Selenium on top.
My code logs in (first you need to input the username, and then the password, not both together as on most pages) through a series of yielded Requests with specific callback functions that handle the next step.
import scrapy
from scrapy import Request
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
import selenium
import time

class LoginSpider(scrapy.Spider):
    name = 'www.firstmerit.com'
    # allowed_domains = ['https://www.firstmeritib.com']
    start_urls = ['https://www.firstmeritib.com/AccountHistory.aspx?a=1']

    def __init__(self):
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
        # Obtaining the necessary components to input my own stuff
        username = WebDriverWait(self.driver, 10).until(lambda driver: self.driver.find_element_by_xpath('//*[@id="txtUsername"]'))
        login_button = WebDriverWait(self.driver, 10).until(lambda driver: self.driver.find_element_by_xpath('//*[@id="btnLogin"]'))
        # The actual interaction
        username.send_keys("username")
        login_button.click()
        # The process of logging in is broken up into two functions since the website requires me
        # to enter my username first, which redirects me to a password page where I can finally
        # enter my account (after inputting the password)
        yield Request(url=self.driver.current_url,
                      callback=self.password_handling,
                      meta={'dont_redirect': True,
                            'handle_httpstatus_list': [302],
                            'cookiejar': response}
                      )

    def password_handling(self, response):
        print("^^^^^^")
        print(response.url)
        password = WebDriverWait(self.driver, 10).until(lambda driver: self.driver.find_element_by_xpath('//*[@id="MainContent_txtPassword"]'))
        login_button2 = WebDriverWait(self.driver, 10).until(lambda driver: self.driver.find_element_by_xpath('//*[@id="MainContent_btnLogin"]'))
        password.send_keys("password")
        login_button2.click()
        print("*****")
        print(self.driver.current_url)
        print("*****")
        yield Request(url=self.driver.current_url,
                      callback=self.after_login,  # , dont_filter = True,
                      meta={'dont_redirect': True,
                            'handle_httpstatus_list': [302],
                            'cookiejar': response.meta['cookiejar']}
                      )

    def after_login(self, response):
        print("***")
        print(response.url)
        print("***")
        print(response.body)
        if "Account Activity" in response.body:
            self.logger.error("Login failed")
            return
        else:
            print("you got through!")
            print()
The issue is that once I finally get to my account page where all my spending is displayed, I can't actually access the HTML data. I've properly handled the 302 redirections, but the "meta =" options seem to take me to the page through Selenium, yet don't let me scrape it.
Instead of getting all the data from response.body in the after_login function, I get the following:
<html><head><title>Object moved</title></head><body>
<h2>Object moved to here.</h2>
</body></html>
How do I manage to actually be able to get that information to scrape it?
Is this redirection in place by the bank to protect account from being crawled?
Thank you!
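A note on what usually causes this (not from the original thread, just a sketch): Selenium's browser holds the authenticated session, but the Request objects Scrapy yields carry none of its cookies, so the bank bounces them with a 302. Two common workarounds are to parse the page Selenium already has, or to copy the browser's cookies into the Scrapy request. The selector and callback names below are illustrative:

from scrapy import Request, Selector

# these would live inside LoginSpider, called after the final login click
def parse_with_selenium(self):
    # Option 1: skip Scrapy for this page and parse what the browser already rendered
    sel = Selector(text=self.driver.page_source)
    for row in sel.css('table tr'):  # hypothetical selector for the activity table
        print(row.extract())

def request_with_cookies(self):
    # Option 2: hand Selenium's session cookies to Scrapy so the request is authenticated
    cookies = {c['name']: c['value'] for c in self.driver.get_cookies()}
    return Request(url=self.driver.current_url,
                   cookies=cookies,
                   callback=self.after_login)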

Scrapy - making selections from a dropdown (e.g. date) on a webpage

I'm new to Scrapy and Python and I am trying to scrape data off the following start URL.
After login, this is my start URL --->
start_urls = ["http://www.flightstats.com/go/HistoricalFlightStatus/flightStatusByFlight.do?"]
(a) From there I need to interact with the webpage to select ---by-airport--- and then make the ---airport, date, time period selection---. How can I do that? I would like to loop over all time periods and past dates.
I used Firebug to see the source; I cannot show it here as I do not have enough points to post images.
I read a post mentioning the use of Splinter.
(b) After the selections it will lead me to a page with links to the eventual pages holding the information I want. How do I populate the links and make Scrapy look into every one to extract the information? Using rules? Where should I insert the rules/LinkExtractor function?
I am willing to try myself; I hope help can be given to find posts that can guide me. I am a student and I have spent more than a week on this. I have done the Scrapy tutorial and the Python tutorial, read the Scrapy documentation, and searched previous Stack Overflow posts, but I did not manage to find any that cover this.
A million thanks.
My code so far to log in, plus the items to scrape via XPath from the eventual target site:
import scrapy
from tutorial.items import FlightItem
from scrapy.http import FormRequest, Request

class flightSpider(scrapy.Spider):
    name = "flight"
    allowed_domains = ["flightstats.com"]
    login_page = 'https://www.flightstats.com/go/Login/login_input.do;jsessionid=0DD6083A334AADE3FD6923ACB8DDCAA2.web1:8009?'
    start_urls = [
        "http://www.flightstats.com/go/HistoricalFlightStatus/flightStatusByFlight.do?"]

    def init_request(self):
        """This function is called before crawling starts."""
        return Request(url=self.login_page, callback=self.login)

    def login(self, response):
        """Generate a login request."""
        return FormRequest.from_response(response,
                                         formdata={'loginForm_email': 'marvxxxxxx@hotmail.com',
                                                   'password': 'xxxxxxxx'},
                                         callback=self.check_login_response)

    def check_login_response(self, response):
        """Check the response returned by a login request to see if we are successfully logged in."""
        if "Sign Out" in response.body:
            self.log("\n\n\nSuccessfully logged in. Let's start crawling!\n\n\n")
            # Now the crawling can begin..
            return self.initialized()  # ****THIS LINE FIXED THE LAST PROBLEM*****
        else:
            self.log("\n\n\nFailed, Bad times :(\n\n\n")
            # Something went wrong, we couldn't log in, so nothing happens.

    def parse(self, response):
        for sel in response.xpath('/html/body/div[2]/div[2]/div'):
            item = FlightItem()
            item['flight_number'] = sel.xpath('/div[1]/div[1]/h2').extract()
            item['aircraft_make'] = sel.xpath('/div[4]/div[2]/div[2]/div[2]').extract()
            item['dep_date'] = sel.xpath('/div[2]/div[1]/div').extract()
            item['dep_airport'] = sel.xpath('/div[1]/div[2]/div[2]/div[1]').extract()
            item['arr_airport'] = sel.xpath('/div[1]/div[2]/div[2]/div[2]').extract()
            item['dep_gate_scheduled'] = sel.xpath('/div[2]/div[2]/div[1]/div[2]/div[2]').extract()
            item['dep_gate_actual'] = sel.xpath('/div[2]/div[2]/div[1]/div[3]/div[2]').extract()
            item['dep_runway_actual'] = sel.xpath('/div[2]/div[2]/div[2]/div[3]/div[2]').extract()
            item['dep_terminal'] = sel.xpath('/div[2]/div[2]/div[3]/div[2]/div[1]').extract()
            item['dep_gate'] = sel.xpath('/div[2]/div[2]/div[3]/div[2]/div[2]').extract()
            item['arr_gate_scheduled'] = sel.xpath('/div[3]/div[2]/div[1]/div[2]/div[2]').extract()
            item['arr_gate_actual'] = sel.xpath('/div[3]/div[2]/div[1]/div[3]/div[2]').extract()
            item['arr_terminal'] = sel.xpath('/div[3]/div[2]/div[3]/div[2]/div[1]').extract()
            item['arr_gate'] = sel.xpath('/div[3]/div[2]/div[3]/div[2]/div[2]').extract()
            yield item
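On part (a): dropdowns are just form fields, so if the page submits a normal form you can usually skip browser automation and post the selections yourself with FormRequest.from_response, looping over the dates and time periods you want. This is only a sketch; every form field name below is made up, so check the real ones in Firebug's Net panel first:

import scrapy
from datetime import date, timedelta
from scrapy.http import FormRequest

# these methods would live inside flightSpider
def submit_search(self, response):
    # part (a): loop over the last 30 days and a few time periods, submitting the form for each
    for days_back in range(1, 31):
        day = date.today() - timedelta(days=days_back)
        for period in ('morning', 'afternoon', 'evening'):        # hypothetical values
            yield FormRequest.from_response(
                response,
                formdata={'airport': 'LAX',                           # hypothetical field name
                          'departureDate': day.strftime('%Y-%m-%d'),  # hypothetical field name
                          'timePeriod': period},                      # hypothetical field name
                callback=self.follow_results)

def follow_results(self, response):
    # part (b): follow every flight link on the results page; self.parse (above)
    # then extracts the items from each detail page
    for href in response.xpath('//a[contains(@href, "flightStatusByFlight")]/@href').extract():
        yield scrapy.Request(response.urljoin(href), callback=self.parse)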

Accepting form data from HTML in Python

I am trying to write a simple web app as a school assignment. Basically, the app uses the Twitter streaming API to pull tweets, and then I'm going to graph the results using Google Charts.
I have written a Python script that is able to connect to Twitter and pull tweets in real time, but I want to be able to define the search criteria based on user input on an HTML page. (I don't have it written yet, but I plan to have a list of radio buttons that a user can select from to choose the search option.)
My question is: how do I get data from an HTML document into my Python script, and then return the data back to my HTML page?
I am very new to Python and web programming in general, so oversimplification would be nice!
This is the code in my Python file:
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import time

ckey = ''
csecret = ''
atoken = ''
asecret = ''

class listener(StreamListener):
    def on_data(self, data):
        try:
            tweet = data.split(',"text":"')[1].split('","source')[0]
            print tweet
            saveThis = str(time.time()) + '::' + tweet
            print saveThis
            saveFile = open('twitDB.csv', 'a')
            saveFile.write(saveThis)
            saveFile.write('\n')
            saveFile.close()
            return True
        except BaseException, e:
            print 'failed ondata, ', str(e)
            time.sleep(5)

    def on_error(self, status):
        print status

auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=["car"])
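One common way to wire this up (a sketch, not the only option): put a small web framework such as Flask in front of the script. The HTML form posts the user's choice to a route, the route kicks off the stream with that search term, and the response renders the result back into the page. The page markup and route here are made up for illustration:

from flask import Flask, request, render_template_string

app = Flask(__name__)

# hypothetical minimal page: radio buttons post the chosen track term back to us
PAGE = '''
<form method="post">
  <input type="radio" name="track" value="car" checked> car
  <input type="radio" name="track" value="bike"> bike
  <button type="submit">Search</button>
</form>
<p>{{ result }}</p>
'''

@app.route('/', methods=['GET', 'POST'])
def index():
    result = ''
    if request.method == 'POST':
        term = request.form['track']  # this is the form data coming in from the HTML
        # here you would start the tweepy Stream with track=[term] and collect output
        result = 'would start streaming tweets matching: ' + term
    return render_template_string(PAGE, result=result)

if __name__ == '__main__':
    app.run(debug=True)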

Python- Get Stock Information for a company for a range of dates

I'm trying to get more than just the stock information at the current time, and I can't figure out whether Google Finance allows retrieving information for more than one date. For example, if I wanted to find the Google stock value over the last 30 days and return that data as a list... how would I go about doing this?
Using the code below only gets me a single value:
import urllib2
import json

class GoogleFinanceAPI:
    def __init__(self):
        self.prefix = "http://finance.google.com/finance/info?client=ig&q="

    def get(self, symbol, exchange):
        url = self.prefix + "%s:%s" % (exchange, symbol)
        u = urllib2.urlopen(url)
        content = u.read()
        obj = json.loads(content[3:])
        return obj[0]

c = GoogleFinanceAPI()
quote = c.get("MSFT", "NASDAQ")
print quote
Here is a recipe to get historical values from Google Finance:
http://code.activestate.com/recipes/576495-get-a-stock-historical-value-from-google-finance/
It looks like it returns the data in .csv format.
Edit: Here is your script modified to fetch the .csv. It works for me.
import urllib2
import csv

class GoogleFinanceAPI:
    def __init__(self):
        self.url = "http://finance.google.com/finance/historical?client=ig&q={0}:{1}&output=csv"

    def get(self, symbol, exchange):
        page = urllib2.urlopen(self.url.format(exchange, symbol))
        content = page.readlines()
        page.close()
        reader = csv.reader(content)
        for row in reader:
            print row

c = GoogleFinanceAPI()
c.get("MSFT", "NASDAQ")
The best way forward is to use the APIs provided by Google. Specifically, look for the returns parameter, where you specify how long a period you want.
Alternatively, if you want to do it in Python, find the query pattern, i.e. where the date entry goes in the URL, substitute your date, do a GET, parse the result, and append it to your result list.
