Get full URL of external website by JS, PHP or VBA - javascript

Is there a way to return the full URL (or just the parameters) of a website using JavaScript or PHP, or even VBA in Excel?
I cannot preview this site in an iframe, because it gives me an error:
("frame-ancestors 'self'". error, Content Security Policy)
so I can't use window.location (probably because of the cross-origin restriction). I wonder how it can be done in PHP or Excel VBA (how would I use xmlhttp.Open in such a case?), or in JS, without returning the content itself. I don't need the content, just the full URL or its parameters, knowing only part of the URL.
I have the full list of IDs from the shop, so if this is possible I could save the reseller a lot of typing, because he uses the ID when ordering specific items from his supplier. When I use the ID, the website loads the content and changes the URL to the proper one. It looks like this in the browser:
My request: https://exampledomain.com/product_id=11
After loading URL is: https://exampledomain.com/product_id=11&category=bikes&type=street
so the part I am interested in is &category=bikes&type=street, but the full URL would also be enough for further analysis.
To be more precise, here are working examples. Observe how the URL changes after the content is loaded; I need to grab the new, changed URL:
https://www.olx.pl/oferta/CID767-IDCkB7E.html
https://www.google.pl/maps/place/Paris,+France/
https://www.openstreetmap.org/search?query=Warszawa

I think this is what you are after.
'Import Everything From a Web Page:
Sub Test()
    Dim IE As Object
    Dim x As Variant
    Set IE = CreateObject("InternetExplorer.Application")
    With IE
        .Visible = True
        .Navigate "http://your_link_here/" ' should work for any URL
        Do Until .ReadyState = 4: DoEvents: Loop ' wait until the page has loaded
        x = .document.body.innertext
        x = Replace(x, Chr(10), Chr(13))
        x = Split(x, Chr(13))
        ' Split returns a zero-based array, so it holds UBound(x) + 1 lines
        Range("A1").Resize(UBound(x) + 1) = Application.Transpose(x)
        .Quit
    End With
End Sub
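For the URL itself rather than the page text, a minimal JavaScript sketch follows. It assumes Node 18 or later (for the global fetch) and that the shop changes the address through an HTTP redirect; if the page rewrites its address with client-side script instead, only a real browser will see the new URL. The function names appendedPart and resolveFinalUrl are made up for illustration.

```javascript
// Returns the part the site appended to the requested URL,
// or null when the final URL does not start with the requested one.
function appendedPart(requestedUrl, finalUrl) {
  return finalUrl.startsWith(requestedUrl)
    ? finalUrl.slice(requestedUrl.length)
    : null;
}

// Assumption: runs server-side in Node 18+, where fetch is a global.
// A HEAD request avoids downloading the body; response.url is the
// address reached after following any HTTP redirects.
async function resolveFinalUrl(requestedUrl) {
  const response = await fetch(requestedUrl, { method: "HEAD", redirect: "follow" });
  return response.url;
}
```

With the example from the question, appendedPart("https://exampledomain.com/product_id=11", finalUrl) would isolate the &category=bikes&type=street part once resolveFinalUrl has produced finalUrl. Some servers reject HEAD requests, in which case a GET may be needed.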

Related

VBA & Selenium | Access iframe within HTML containing #document

I am trying to access the HTML within two iframes using Selenium Basic in VBA, as IE has been blocked on our machines, and Python, etc. are not available to us.
Previously I could access the html with this:
Dim IE As InternetExplorerMedium
Set IE = New InternetExplorerMedium
' actual website excluded as it is a work hosted website which requires login, etc.
website = "..."
IE.navigate (website)
Dim IEDocument As HTMLDocument
Set IEDocument = IE.document.getElementById(id1).contentDocument.getElementById(id2).contentDocument
From there I would have access to all the HTML elements which I could work with.
Now I am trying the following with Selenium Basic:
Set cd = New Selenium.ChromeDriver
website = "..."
cd.Start baseUrl:=website
cd.Get "/"
Dim af1 As Selenium.WebElement, af2 As Selenium.WebElement
Set af1 = cd.FindElementById("CRMApplicationFrame")
Set af2 = af1.FindElementById("WorkAreaFrame1")
It works up to the last line, as it is able to set af1 to the "CRMApplicationFrame" id; however, I am unable to get inside of it.
I think the solution lies in executing a bit of JavaScript, similar to this video:
https://www.youtube.com/watch?v=phYGCGXGtEw
Although I don't have a #ShadowDOM line, I do have a #document line.
Based on and trying to adapt the video I have tried the following:
Set af2 = cd.ExecuteScript(Script:="return arguments[0].contentDocument", arguments:=af1 )
However, that did not work.
I also tested:
Dim af1 As Selenium.WebElement
Set af1 = cd.FindElementById("CRMApplicationFrame")
call cd.SwitchToFrame (af1)
Debug.Print cd.PageSource
However, the SwitchToFrame line won't execute, with a 438 error: Object doesn't support this property or method.
Any advice or guidance on how I could succeed would be highly appreciated!
Replace:
call cd.SwitchToFrame (af1)
with:
cd.SwitchToFrame "CRMApplicationFrame"
You can find a relevant detailed discussion in Selenium VBA Excel - problem clicking a link within an iframe

How to Detect If JS is Running in Website Builder?

I want to display forums inside websites where my javascript (and HTML and CSS) is embedded, but if the javascript is running inside a website builder, I just want to have some text telling the user their forums are installed here (in the embedded DIV) and not try to display any forums. My only idea is to look at the URL and if I see a known website builder, then run the website builder code, but I would need a large list of all website builder URLs. Does anyone have such a list or is there a better solution? My current code looks like this:
var hostURL = window.location.href;
if (hostURL == "about:srcdoc") hostURL = window.parent.location.href;
if (hostURL.indexOf("websites.godaddy.com") > -1 || // godaddy
hostURL.indexOf(".preview.editmysite.com") > -1) { // weebly
displayWebsiteBuilderInfo();
return;
}
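A slightly stricter variant of the check above can be sketched as follows. It compares the parsed hostname instead of searching the whole URL string, so a builder host that merely appears in a query string does not match. The two hosts come from the code above; the function name isWebsiteBuilderUrl and the list names are made up for illustration.

```javascript
// Hosts taken from the question; extend the lists as needed.
const EXACT_HOSTS = ["websites.godaddy.com"];       // GoDaddy
const HOST_SUFFIXES = [".preview.editmysite.com"];  // Weebly

// Returns true when href points at a known website-builder host.
function isWebsiteBuilderUrl(href) {
  let hostname;
  try {
    hostname = new URL(href).hostname.toLowerCase();
  } catch (e) {
    return false; // not a parseable absolute URL
  }
  return EXACT_HOSTS.includes(hostname) ||
    HOST_SUFFIXES.some((suffix) => hostname.endsWith(suffix));
}
```

This still depends on a hand-maintained list, so it does not remove the need the question describes; it only makes each entry harder to trigger by accident.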
Here's what I did, but I'm not sure if it's a good solution (and it's not a solution for the original question):
In the PHP code that handles the request to get the forums data I read the content at the referer URL (comes from the client - window.location.href) to see if the javascript is there. If it's not there, assume the request came from a website builder. Then if isWebsiteBuilder is true back at the client, call displayWebsiteBuilderInfo();
Here's the PHP code:
$siteContent = @file_get_contents($referer);
$siteContent = htmlspecialchars_decode($siteContent);
$idx = strpos($siteContent, "<script async src=\"https://www.bubblecritic.com/js/embed/the_js.js\"></script>");
if ($idx === false) $isWebsiteBuilder = true;

SharePoint Rest Document library

I am creating a custom page writing the HTML and javascript for a SharePoint site. I would like to embed document libraries inside my custom html I am writing in SharePoint designer.
I have not found a way to easily embed document libraries in custom HTML, but I did stumble on some documentation for a REST API. I figured I could use this and write my own Ajax app in the HTML for users to navigate the document library.
I am currently trying this JavaScript just to see if I can pull HTML or JSON for a document library's contents:
<script type="text/javascript">
var folderUrl = "x/x/x/testDocumentLibrary/Forms/AllItems.aspx";
var url = _spPageContextInfo.webServerRelativeUrl + "/_api/Web/GetFolderByServerRelativeUrl('" + folderUrl + "')?$expand=Folders,Files";
$.getJSON(url,function(data,status,xhr){
for(var i = 0; i < data.Files.length;i++){
console.log(data.Files[i].Name);
}
for(var i = 0; i < data.Folders.length;i++){
console.log(data.Folders[i].Name);
}
});
</script>
I am not sure if I am using the right url for the folderUrl variable.
In order to conduct some tests, what is _spPageContextInfo.webServerRelativeUrl pulling? I am trying to see if I can work backwards and create the URL manually first, without the SP function calls.
The folderUrl variable in your example code should end with the path to the library; everything up until /Forms/AllItems.aspx, so /x/x/x/testDocumentLibrary where /x/x/x/ is the server-relative path to the site on which the library resides.
The _spPageContextInfo object provides two variations of server-relative URL, one for the current site (called a "web" in SharePoint jargon) and one for the current site collection (called a "site" in SharePoint jargon). Appropriately, these properties are named webServerRelativeUrl and siteServerRelativeUrl. Both of these are server-relative, meaning that they exclude the scheme and domain name. (Instead of https://constoso.com/sites/stackoverflow they'll give you /sites/stackoverflow.)
For a REST call, you probably want the absolute URL, not the server-relative URL. You can access the web and site absolute URLs through _spPageContextInfo's properties webAbsoluteUrl and siteAbsoluteUrl.
If the list/library you're accessing is on the current site where your REST code is running, use the webAbsoluteUrl property.
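Putting the two points together, a rough sketch of building the request URL might look like this. Both helper names are hypothetical, and the sample values reuse the placeholder site from the explanation above; in a real page the first argument would come from _spPageContextInfo.webAbsoluteUrl.

```javascript
// Hypothetical helper: trims a view page such as /Forms/AllItems.aspx
// off the end of a library URL, leaving the path to the library itself.
function libraryPathFromViewUrl(viewUrl) {
  return viewUrl.replace(/\/Forms\/[^/]+\.aspx$/i, "");
}

// Hypothetical helper: builds the folder endpoint used in the question.
function buildFolderEndpoint(webAbsoluteUrl, folderServerRelativeUrl) {
  const web = webAbsoluteUrl.replace(/\/+$/, ""); // drop trailing slashes
  // Single quotes inside an OData string literal are escaped by doubling them.
  const folder = folderServerRelativeUrl.replace(/'/g, "''");
  return web + "/_api/web/GetFolderByServerRelativeUrl('" + folder + "')?$expand=Folders,Files";
}

const folderPath = libraryPathFromViewUrl(
  "/sites/stackoverflow/testDocumentLibrary/Forms/AllItems.aspx"
);
const endpoint = buildFolderEndpoint("https://constoso.com/sites/stackoverflow", folderPath);
```

Building the string outside SharePoint first, as here, is one way to work backwards and check the URL by pasting it into the browser before wiring it into $.getJSON.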

Cache static HTML pages with get variables

I have a website with a lot of iframes like this:
<iframe src="expamle.com\page.html?var=blabla&id=42" scrolling="no"></iframe>
I have to change var=blabla&id=42 for each iframe. These parameters are used in the JavaScript of the iframe. Is there any way to cache page.html (which is static) once for all variables, for example by giving hints to the browser?
I have to use an iframe since I want to be able to update this code (from another server) and to run it in another scope.
No. Anything that changes the query string represents a separate resource for the browser.
However, you may be able to achieve that effect if you can make some slight changes to page.html. If you write it this way:
<iframe src="expamle.com\page.html#var=blabla&id=42" scrolling="no"></iframe>
Note the use of the # character - that's the key there.
The URL the browser requests then becomes simply "page.html", because the part after the # is never sent to the server, and it will be cached that way. However, the JavaScript of that page will have access to document.location.hash, which will contain "#var=blabla&id=42". It'll be a single string, but it shouldn't be difficult to parse. Some libraries even use the fragment to pass parameters in semi-real-time to iframes for IE6 compatibility.
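As an illustration of how little parsing is involved, here is a minimal sketch using the standard URLSearchParams class. The function name parseHashParams is made up; inside the iframe it would be called with document.location.hash.

```javascript
// Turns a fragment such as "#var=blabla&id=42" into a plain object.
// All values come back as strings.
function parseHashParams(hash) {
  const raw = hash.startsWith("#") ? hash.slice(1) : hash;
  return Object.fromEntries(new URLSearchParams(raw));
}

const params = parseHashParams("#var=blabla&id=42");
```

URLSearchParams is not available in very old browsers such as the IE versions mentioned above, so a hand-written split on & and = would be needed there.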
If the parameters are only used in the JavaScript and there is really only one page on the server side, don't use ? but use # instead. The browser will treat it as the same page at different anchor points. So if test.com/#foo is cached then test.com/#bar is too (same page, different anchor points).
You can update the frame URLs from code:
var fr = document.getElementsByTagName('iframe');
var sites = "1.com,2.com".split(",");
// assign one site per iframe, stopping when either list runs out
for (var x = 0; x < fr.length && x < sites.length; x++) {
    fr[x].src = "http://" + sites[x];
}

Issues in developing web scraper

I want to develop a platform where users can enter a URL and then my website will open the webpage in an iframe. Now the user can modify his website by simply right clicking and I will provide him options like "remove this element", "copy this element". I am almost through. Many of the websites are opening perfectly in iframe but for a few websites some errors have shown up. I could not identify the reason so asking for your help.
I have solved other issues like XSS problem.
Here is the procedure I have followed:
I use JavaScript to send the request to my Java server, which connects to the URL specified by the user and fetches the HTML. It then uses the Jsoup HTML parser to convert relative URLs into absolute URLs and saves the HTML to disk. Finally, I render the saved HTML in my iframe.
Is something wrong with this approach?
A few websites are working perfectly but a few are not.
For example:-
When I tried to open http://www.snapdeal.com it gave me the
Uncaught TypeError: Cannot read property 'paddingTop' of undefined
error. I don't understand why this is happening..
Update
I really wonder how this is implemented at http://www.proxywebsites.in/browse.php?u=Oi8vd3d3LnNuYXBkZWFsLmNvbQ%3D%3D&b=13&f=norefer
2 issues, pick any you like:
your server side proxy code contains bugs
plenty of sites have either explicit frame-break code or at least expect to be top level frame.
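The second point also covers sites that refuse framing through response headers rather than script. A simplified sketch of how a proxy could spot those headers is below; the function name forbidsForeignFraming is made up, the headers are assumed to arrive as a plain object, and real policies have more cases than this handles.

```javascript
// Returns true when the response headers ask browsers not to render
// the page inside a frame on another origin. Simplified on purpose.
function forbidsForeignFraming(headers) {
  const lower = {};
  for (const [name, value] of Object.entries(headers)) {
    lower[name.toLowerCase()] = String(value);
  }

  const xfo = (lower["x-frame-options"] || "").trim().toUpperCase();
  if (xfo === "DENY" || xfo === "SAMEORIGIN") return true;

  const csp = lower["content-security-policy"] || "";
  for (const directive of csp.split(";")) {
    const [name, ...sources] = directive.trim().split(/\s+/);
    if (name.toLowerCase() === "frame-ancestors") {
      // Anything other than a wildcard restricts who may embed the page.
      return !sources.includes("*");
    }
  }
  return false;
}
```

A page served with "frame-ancestors 'self'" would be reported as not frameable by this check, while one with no such headers may still break out of the frame with its own script.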
You can try one more thing. Your proxy script saves the web page to disk and then loads it into an iframe. Instead of loading the saved page in an iframe, try opening it directly in the browser. All those sites that restrict their pages from being loaded into an iframe will then open without any error.
Try this; I think it can work.
My proxy server-side code:
DateFormat df = new SimpleDateFormat("ddMMyyyyHHmmss");
String dirName = df.format(new Date());
String dirPath = "C:/apache-tomcat-7.0.23/webapps/offlineWeb/" + dirName;
String serverName = "http://localhost:8080/offlineWeb/" + dirName;
boolean directoryCreated = new File(dirPath).mkdir();
if (!directoryCreated)
    log.error("Error in creating directory");
String html = Jsoup.connect(url.toString()).get().html();
// the base URI lets Jsoup resolve the abs: attribute lookups below
doc = Jsoup.parse(html, url.toString());
links = doc.select("link");
scripts = doc.select("script");
images = doc.select("img");
for (Element element : links) {
    String linkHref = element.attr("abs:href");
    // compare string contents; != "" would compare object references
    if (!linkHref.isEmpty()) {
        element.attr("href", linkHref);
    }
}
for (Element element : scripts) {
    String scriptSrc = element.attr("abs:src");
    if (!scriptSrc.isEmpty()) {
        element.attr("src", scriptSrc);
    }
}
for (Element element : images) {
    String imgSrc = element.attr("abs:src");
    if (!imgSrc.isEmpty()) {
        element.attr("src", imgSrc);
        log.info(imgSrc);
    }
}
And now I am just returning the path where I saved my HTML file.
That's all there is to my server code.
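For comparison, the relative-to-absolute step can be sketched in JavaScript with the standard URL class. This is an illustration of the same idea, not a replacement for the Jsoup code; the function name toAbsolute and the sample paths are made up.

```javascript
// Resolves a possibly relative reference against the page URL.
// Returns null for empty or unparseable input, so the caller can
// leave the attribute untouched, as the emptiness checks above do.
function toAbsolute(pageUrl, reference) {
  if (!reference) return null;
  try {
    return new URL(reference, pageUrl).href;
  } catch (e) {
    return null;
  }
}

const page = "http://www.snapdeal.com/products/index.html";
const fromParentDir = toAbsolute(page, "../img/logo.png");
const fromRoot = toAbsolute(page, "/css/site.css");
const protocolRelative = toAbsolute(page, "//cdn.example.com/a.js");
```

Note that rewriting link, script and img attributes does not touch URLs that the page builds at runtime in its own script, which is one way a proxied copy can still fail.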
