I am trying to get a specific dynamic figure from a webpage into Excel. I managed to gather the website's whole GET response into an "all" variable, which I am supposed to parse to extract my numbers, except that when I check the string variable I can see everything but the required dynamic figure! :) (The attached photo shows the dynamic figure, which at that very instant was 2.19.)
Any ideas why I am capturing everything except that value would be much appreciated. Thanks in advance.
My thoughts:
1. I am guessing the figures are injected by JavaScript, or by some server-side code that executes after my XMLHTTP request is processed. If that is the case (or if it isn't), I need your expertise.
2. The website might not respond unless it sees specific HTML request headers, so I may need to mimic Chrome's headers, but I don't know what they look like (see the sketch just below).
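For what it's worth, Chrome's headers are easy to copy from the dev tools Network tab. Roughly, a Chrome GET sends something like the sketch below (a Python illustration; the values vary by Chrome version, and in the VBA below each one would be set with .setRequestHeader). Note that mimicking headers will not help if the value is injected by JavaScript after the page loads.
import requests

headers = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/96.0.4664.110 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}
html = requests.get("https://www.tradingview.com/symbols/CRYPTOCAP-ADA.D/",
                    headers=headers).text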
Please see my code below and a screenshot of the figure I would like to capture.
'Tools > References > Microsoft XML, v3.0 must be referenced
Public Function GetWebSource(ByRef URL As String) As String
    Dim xml As IXMLHTTPRequest
    On Error Resume Next
    Set xml = CreateObject("Microsoft.XMLHTTP")
    With xml
        .Open "GET", URL, False
        .send
        GetWebSource = .responseText
    End With
    Set xml = Nothing
End Function
Sub ADAD()
    Dim all As Variant
    Dim pos As Long
    Dim testString As String

    all = GetWebSource("https://www.tradingview.com/symbols/CRYPTOCAP-ADA.D/")
    pos = InStr(all, "tv-symbol-price-quote__value js-symbol-last")
    testString = Mid(all, pos, 200)
    'I am supposed to see the dynamic figure within the tag, but it is not showing!
    Debug.Print testString
End Sub
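For comparison, the same check in Python (using the requests package; a sketch, not a fix) shows the same thing: the class name is present in the raw HTML, but the number itself is not, because it is filled in by JavaScript after the page loads.
import requests

html = requests.get("https://www.tradingview.com/symbols/CRYPTOCAP-ADA.D/").text
pos = html.find("tv-symbol-price-quote__value js-symbol-last")
print(html[pos:pos + 200])  # the tag appears, but without the dynamic figure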
(Screenshot: HTML for the required dynamic values)
@Tim Williams This is code using Selenium, but it doesn't seem to do the trick of getting the value.
PhantomJS Selenium VBA
Sub Test()
    Dim bot As Selenium.PhantomJSDriver
    Set bot = New Selenium.PhantomJSDriver
    With bot
        .Get "https://www.tradingview.com/symbols/CRYPTOCAP-ADA.D/"
        .Wait 2000
        Debug.Print .FindElementByXPath("//div[@class='tv-symbol-price-quote__value js-symbol-last']").Attribute("outerHTML")
    End With
End Sub
Chrome VBA Selenium
PhantomJS doesn't seem to work properly, so here's a Chrome version of the Selenium VBA code:
Private bot As Selenium.ChromeDriver

Sub Test()
    Set bot = New Selenium.ChromeDriver
    With bot
        .Start
        .Get "https://www.tradingview.com/symbols/CRYPTOCAP-ADA.D/"
        Debug.Print .FindElementByXPath("//div[@class='tv-symbol-price-quote__value js-symbol-last']").Text 'or .Attribute("outerHTML")
        .Quit
    End With
End Sub
Python Solution
And this is the working Python code that my tutor @QHarr provided in the comments:
from selenium import webdriver

d = webdriver.Chrome("D:/Webdrivers/chromedriver.exe")
d.get('https://www.tradingview.com/symbols/CRYPTOCAP-ADA.D/')
print(d.find_element_by_css_selector('.tv-symbol-price-quote__value.js-symbol-last').text)
Related
I'm trying to get web-table data from the website below and extract the first table, regarding policy rates, on the right-hand side.
https://www.researchonline.se/macro/our_forecasts
I have used the following code just to see if it spits out the desired data, but I keep getting error 91 (object variable not set). I suspect there is something about JavaScript that I need to consider in my code. Below is my code.
' Requires a reference to Microsoft HTML Object Library (for HTMLDocument).
Dim request As Object
Dim response As String
Dim html As New HTMLDocument
Dim website As String
Dim price As Variant
' Website to go to.
website = "https://www.researchonline.se/macro/our_forecasts"
' Create the object that will make the webpage request.
Set request = CreateObject("MSXML2.XMLHTTP")
' Where to go and how to go there - probably don't need to change this.
request.Open "GET", website, False
' Get fresh data.
request.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
' Send the request for the webpage.
request.send
' Get the webpage response data into a variable.
response = StrConv(request.responseBody, vbUnicode)
' Put the webpage into an html object to make data references easier.
html.body.innerHTML = response
' Get the price variable from the specified element on the page and just check in a message box if that is the correct data.
price = html.getElementsByTagName("table")(0).innerText
' Output the price into a message box.
MsgBox price
Set request = CreateObject("MSXML2.XMLHTTP")
The above cannot execute JavaScript. Instead of calling the URL and fetching the value from the table, you can use their API directly.
For example, to get the policy rate, the site uses an API (there are many APIs; I found them by looking into the site). One example is
https://www.researchonline.se/api/MacroAdmin/GetForecast?name=policy&start=Mon%20Sep%2014%202020&end=Tue%20Sep%2014%202021
which returns an XML-ish response that you can fetch and parse with MSXML2.XMLHTTP.
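If it helps, here is the same call sketched in Python (just for brevity; the equivalent GET works from MSXML2.XMLHTTP, with .responseXML for the parsing). I have not verified the element names in the payload, so the loop simply dumps the tree:
import requests
import xml.etree.ElementTree as ET

url = ("https://www.researchonline.se/api/MacroAdmin/GetForecast"
       "?name=policy&start=Mon%20Sep%2014%202020&end=Tue%20Sep%2014%202021")
resp = requests.get(url)
resp.raise_for_status()
root = ET.fromstring(resp.content)
for elem in root.iter():  # element names depend on the actual payload
    print(elem.tag, elem.text)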
I am a Java newbie and ran into this question; anyone who can help would be appreciated. ;)
I wrote a Java servlet that just outputs JavaScript code to generate a virtual HTML page, and the JS code in that HTML outputs a 'String' I need (actually an RSA-decrypted password). This works fine in a browser, but when I use the 'curl' command (or Java code) to visit this servlet, it returns just the whole JS code, not the 'String' I need.
I tried setting "contentType text/html", "application/json", etc., but it always returns the JS code. Is the approach wrong? How can I get the String on the backend?
The code looks something like this:
out.println("<html>");
out.println("<script type=\"text/javascript\" src=\"./js/jsencrypt.js\"></script> ");
out.println("<head>");
out.println("</head>");
out.println("<body onload=testenc()>");
out.println("</body>");
out.println("<script>");
out.println(" function testenc() ");
out.println("{");
out.println(" var encrypts = new JSEncrypt();");
out.println("encrypts.setPublicKey('MIGfMA0GCSqG.....
The browser displays something like:
MGFuXPaCe+NgHUYQh8Xq86ooG13RJaxwyX2ZxIxhm+ewi/rrQPKbnz0hugwxdWcy3wBLuTx5bxougDLU0Bo...
but the curl command returns the whole source code. Any help would be appreciated.
I want to make a general scraper that can crawl and scrape all data from any type of website, including AJAX websites. I have searched the internet extensively but could not find any proper link explaining how Scrapy and Splash together can scrape AJAX websites (including pagination, form data, and clicking a button before the page is displayed). Every link I have found tells me that JavaScript websites can be rendered using Splash, but there's no good tutorial/explanation of using Splash to render JS websites. Please don't give me solutions involving driving a full browser (I want to do everything programmatically; headless browser suggestions are welcome, but I want to use Splash).
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class FlipSpider(CrawlSpider):
    name = "flip"
    allowed_domains = ["www.amazon.com"]
    start_urls = ['https://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=mobile']

    rules = (Rule(LinkExtractor(), callback='lol', follow=True),)

    def parse_start_url(self, response):
        yield scrapy.Request(response.url,
                             self.lol,
                             meta={'splash': {'endpoint': 'render.html',
                                              'args': {'wait': 5, 'iframes': 1}}})

    def lol(self, response):
        """
        Some code
        """
The problem with Splash and pagination is the following:
I wasn't able to produce a Lua script that delivers a new webpage (after a click on the pagination link) in the format of a response, rather than pure HTML.
So my solution is the following: click the link, extract the newly generated URL, and direct the crawler to that new URL.
So, on the page that has the pagination link, I execute
yield SplashRequest(url=response.url, callback=self.get_url, endpoint="execute", args={'lua_source': script})
with the following Lua script:
def parse_categories(self, response):
    script = """
    function main(splash)
        assert(splash:go(splash.args.url))
        splash:wait(1)
        splash:runjs('document.querySelectorAll(".next-page")[0].click()')
        splash:wait(1)
        return splash:url()
    end
    """
and the get_url function:
def get_url(self, response):
    yield SplashRequest(url=response.body_as_unicode(), callback=self.parse_categories)
This way I was able to loop my queries.
Similarly, if you don't expect a new URL, your Lua script can just return pure HTML that you then have to work out with regex (which is bad), but this is the best I was able to do; see the Selector sketch below for a less painful alternative to regex.
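For reference, instead of regex you can feed the returned HTML into a Scrapy Selector; a minimal sketch (the CSS selector is hypothetical):
from scrapy import Selector

def parse_rendered(self, response):
    # response.text holds the HTML string that the Lua script returned
    sel = Selector(text=response.text)
    for title in sel.css("h2.title::text").getall():
        yield {"title": title}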
You can emulate behaviors, like a click or a scroll, by writing a JavaScript function and telling Splash to execute that script when it renders your page.
A little example:
You define a JavaScript function that selects an element in the page and then clicks on it
(source: Splash docs):
# Get button element dimensions with javascript and perform mouse click.
_script = """
function main(splash)
assert(splash:go(splash.args.url))
local get_dimensions = splash:jsfunc([[
function () {
var rect = document.getElementById('button').getClientRects()[0];
return {"x": rect.left, "y": rect.top}
}
]])
splash:set_viewport_full()
splash:wait(0.1)
local dimensions = get_dimensions()
splash:mouse_click(dimensions.x, dimensions.y)
-- Wait split second to allow event to propagate.
splash:wait(0.1)
return splash:html()
end
"""
Then, when you make the request, you modify the endpoint, setting it to "execute", and you add "lua_source": _script to the args.
Example:
def parse(self, response):
yield SplashRequest(response.url, self.parse_elem,
endpoint="execute",
args={"lua_source": _script})
You will find all the information about Splash scripting here.
I just answered a similar question here: scraping ajax based pagination. My solution is to get the current and the last pages, and then replace the page variable in the request URL, as sketched below.
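The page-variable idea looks roughly like this (a sketch; the URL pattern and page count are hypothetical):
import scrapy

# inside your Spider class
def start_requests(self):
    last_page = 10  # in practice, scrape this from the first page
    for page in range(1, last_page + 1):
        yield scrapy.Request(f"https://example.com/items?page={page}",
                             callback=self.parse_page)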
Also, the other thing you can do is look at the Network tab in the browser dev tools and see if you can identify any API that is called. If you look at the requests under XHR, you can see those that return JSON.
You can then call the API directly and parse the JSON/HTML response. Here is the link from the Scrapy docs: The Network-tool
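Once you have spotted such an endpoint under XHR, calling it directly is usually simpler than rendering the page. A sketch (endpoint and field names hypothetical):
import requests

resp = requests.get("https://example.com/api/search?q=mobile&page=1")
resp.raise_for_status()
for item in resp.json()["results"]:  # inspect the real payload for its keys
    print(item["name"], item["price"])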
I am struggling with this problem.
Let me summarize what's going on.
I want to replicate the submit of a form hosted on an external site. But when I get the response, I get question marks ("?") where there are supposed to be characters with accents.
Let me be more explicit.
1) Check the site (I did not build it):
It's in Spanish, but let me guide you.
You can see a form; in the combo box select "Asunto", in the "N°" textbox insert "1234/2016", and then click the "Consultar" button.
After that you can see the result, and you can perfectly see the accented characters like á, é, etc.
Now, I am trying to replicate that from my software.
I replicate the submit using a jQuery/Ajax call, and it works; I can get the results and everything, BUT the accented characters are shown as "?" question marks, like this (from Chrome dev tools):
The JavaScript code used to make the call is:
var ajaxCall4 = $.ajax({
url: '<*the url provided at the beginning*>',
type:'POST',
data:{
expediente_numero: _valores.expediente_numero,
expediente_codigo: _valores.expediente_codigo,
expediente_ano: _valores.expediente_ano,
documentacion_tipo_id: _valores.documentacion_tipo_id,
documentacion_id: _valores.documentacion_id ,
B1:'Consultar'
},
timeout: 5000
});
I am totally lost on this. I tried setting the contentType explicitly, but it doesn't work. I need your help.
Check out an example execution from the Advanced REST Client extension in Chrome:
IN ADDITION:
*In the HTML file that displays the results, I have the <meta> charset tag.
*The results obtained from the Ajax request are appended to a <table> tag, adding new rows.
When displaying a page that contains special characters, you have to specify the encoding in the head section, like this:
<meta charset="utf-8">
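For intuition about where the "?" comes from (a Python sketch of the failure mode, not the fix): accented characters get replaced when text passes through an encoding that cannot represent them, for example:
text = "Número de expediente"
mangled = text.encode("ascii", errors="replace").decode("ascii")
print(mangled)  # "N?mero de expediente" - the same symptom as in the response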
I am using an AJAX HTML editor in an ASP.NET web application, and I am trying to get the text the user has entered in the editor so I can send it back to a client JavaScript function that will show the text in a div. But I am getting this error: "Object reference not set to an instance of an object."
First I tried to access the text of the textbox linked with the HtmlEditorExtender through JavaScript, but it was not working for me, so I moved to an AJAX WebMethod, but this time I am also facing a problem. Please help me.
[System.Web.Services.WebMethod]
public static string seteditor()
{
    string x = "";
    try
    {
        Content c = new Content();
        x = c.txteditor.Text;   // fails: this fresh page instance has no controls
    }
    catch (Exception ex) { x = ex.Message; }
    return x;
}
Here, txteditor is the ID of the asp:TextBox that is linked with the AjaxControlToolkit HtmlEditorExtender.
You cannot get to your ASPX controls inside a static method.
If you are calling a static method from jQuery, the Page and its controls don't even exist. You need to look for another workaround for your problem.
EDIT:
I always pass my control values to page methods like this.
Assume I have two text controls: txtGroupName and txtGroupLevel.
My JS with jQuery will be:
var grpName = $("#<%=txtGroupName.ClientID%>").val();
var grpLevel = $("#<%= txtGroupLevel.ClientID %>").val();
data: "{'groupName':'" + grpName + "','groupLevel':'" + grpLevel + "'}",
where groupName and groupLevel are my WebMethod parameters.
EDIT2:
Include your script like this:
<script type="text/javascript" src="<%= ResolveUrl("~/Scripts/jquery-1.4.1.js") %>"></script>
I suggest you use the latest jQuery version.
Web methods do not interact with the page object or the control hierarchy like this. That's why they're static in the first place. You need to pass the text from the client as a parameter to the web method, not read it from the textbox.
This issue had been torturing me for the last 18 hours straight. First I tried JavaScript, then a WebMethod, and then on user1042031's suggestion I tried jQuery; then I tried JavaScript again, and look how easily it can be done with a single line of code:
var a = document.getElementById('<%= txteditor.ClientID %>').value;
Read this Stack Overflow article: Getting Textbox value in Javascript
I apologize to everyone who responded to me in this question, but I had not found that article in my initial search.