Finding previous URL from window.history - javascript

I use hosts-file blocking, so from time to time I get these "Page not found" errors while browsing through deals.
Since I got tired of copying the target URL, unescaping it, pasting it into the address bar and hitting Enter, I wrote a handy bookmarklet to automate this:
(function () {
var href = window.location.href;
var loc = href.indexOf('url=');
if (loc > 0) {
var endLoc = href.indexOf('&', loc + 4);
endLoc = endLoc > 0 ? endLoc : href.length;
window.location.href = unescape(href.substring(loc + 4, endLoc));
}
})()
Now the problem is that Chrome internally redirects an unreachable page to its own bounce.php, which produces the following error page.
Since Chrome supports the History API, the URL in the browser address bar doesn't change, as is evident from the following data:
> JSON.stringify(window.history)
{"state":null,"length":2}
Now the problem is, my bookmarklet doesn't work since window.location.href points to "data:text/html,chromewebdata" once this happens.
I've looked at the question "How do you get the previous url in Javascript?", whose accepted answer is blissfully incorrect: as expected, document.referrer is empty in my case.
So is there a way to find the previous URL from window.history? window.history.previous is non-standard and doesn't work on Chrome anyway.

Found a way (at least while it lasts!)
Chrome sets the entire URL in document.title, so querying it is all that's needed. Here's the modified code if anyone's interested:
(function () {
var href = document.title;
var loc = href.lastIndexOf('http');
if (loc >= 0) {
var endLoc = href.indexOf(' is not available', loc);
endLoc = endLoc > 0 ? endLoc : href.length;
window.location.href = unescape(unescape(href.substring(loc, endLoc)))
}
})()
Note: The double unescape is needed for links coming via Google Ads.
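The title-parsing logic can be pulled into a small function so it is easier to check away from the error page. The following is a minimal sketch, not the original bookmarklet: the name urlFromErrorTitle and the example.com / example.org addresses are made up for illustration, it assumes the title still ends with " is not available", and it swaps the deprecated unescape for decodeURIComponent, decoding at most twice.

```javascript
// Hypothetical helper: recover the target URL from an error-page title.
// Assumes the title looks like "<url> is not available".
function urlFromErrorTitle(title) {
  var start = title.lastIndexOf('http');
  if (start < 0) return null; // no URL in the title

  var end = title.indexOf(' is not available', start);
  var raw = title.substring(start, end > 0 ? end : title.length);

  // Decode up to two times (ad links may be double-encoded); stop early
  // once decoding no longer changes the string or the input is malformed.
  for (var i = 0; i < 2; i++) {
    try {
      var decoded = decodeURIComponent(raw);
      if (decoded === raw) break;
      raw = decoded;
    } catch (e) {
      break; // malformed percent sequence: keep what we have
    }
  }
  return raw;
}

// Illustrative, made-up title with a double-encoded target:
var sampleTitle = 'http://ads.example.com/click?url=http%253A%252F%252Fshop.example.org%252Fdeal is not available';
console.log(urlFromErrorTitle(sampleTitle)); // http://shop.example.org/deal
```

In the bookmarklet itself, the result would be assigned to window.location.href whenever it is not null.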

Related

In vanilla JavaScript, turn relative path + base URL into absolute URL

In Ruby, it’s simple to do this, but in JavaScript, I’m not sure.
Given a starting page, such as http://example.org/foo/bar, I want to be able to take any link on the page, which can have any sort of href such as /x.php, ?p=3, y.html, etc., and turn it into a fully qualified absolute URL, such as (in the last example) http://example.org/foo/y.html.
Is there any sort of simple way to do this? If it helps, we can assume these paths do live in an actual web page as actual <a href> elements.
The URL constructor takes a second, base argument, which does exactly what you want:
const base = 'http://example.org/foo/bar';
[ '/x.php',
'?p=3',
'y.html'
].forEach(urlPart => {
const url = new URL(urlPart, base);
console.log(url.href);
});
The URL API works in all major browsers except IE. If you need to support IE, there are polyfills available. Node.js also has it built in (const { URL } = require('url');).
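As a quick check of how the base argument resolves different kinds of href, here is a small sketch. The helper name toAbsolute is hypothetical, and the extra inputs ('../z', '#top', '//cdn.example.com/a.js') are illustrative additions to the ones in the question.

```javascript
// Hypothetical helper: resolve href against base, or return null if it cannot be parsed.
function toAbsolute(href, base) {
  try {
    return new URL(href, base).href;
  } catch (e) {
    return null; // e.g. the base itself is not an absolute URL
  }
}

var base = 'http://example.org/foo/bar';
console.log(toAbsolute('/x.php', base));                 // http://example.org/x.php
console.log(toAbsolute('?p=3', base));                   // http://example.org/foo/bar?p=3
console.log(toAbsolute('y.html', base));                 // http://example.org/foo/y.html
console.log(toAbsolute('../z', base));                   // http://example.org/z
console.log(toAbsolute('#top', base));                   // http://example.org/foo/bar#top
console.log(toAbsolute('//cdn.example.com/a.js', base)); // http://cdn.example.com/a.js
```

Wrapping the constructor in try/catch is a design choice for link scraping, where one unparsable href should not abort the whole loop.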
If your baseURL is equal to the current page, try this:
var getAbsoluteUrl = (function() {
var a;
return function(url) {
if(!a) a = document.createElement('a');
a.href = url;
return a.href;
};
})();
Found here: https://davidwalsh.name/get-absolute-url
Tried it and it worked well for relative as well as absolute URLs (it makes them all absolute) - assuming your basePath is actually your own page.
Use this script (but test it first for the various cases, I just wrote it and wouldn't guarantee I haven't overlooked any case). Note that if the path of the URL specifies a directory and not a file, it always ends in a /, even though the browser might not show that.
var getAbsoluteURL = function (url, href) {
var path = url.split(/[#?]/)[0];
var basePath = path.slice(0, path.lastIndexOf('/'));
var domain = url.split('/').slice(0,3).join('/');
var protocol = url.split('/')[0];
switch (href.charAt(0)) {
case '/':
{
if (href.length > 1 && href.charAt(1) == '/')
return protocol + href;
else
return domain + href;
}
case '#':
case '?':
return path + href;
default:
return basePath + '/' + href;
}
}

Scraping table from website, with javascript:subOpen href link

For each link on this page, I would like to scrape the details page behind it.
I can get all the information on this page: PAGE
However, I would also like to get all the information on the details pages, but the href of each link looks like this, for example:
href="javascript:subOpen('9ca8ed0fae15d43dc1257e7300345b99')"
Here is my sample spreadsheet using the ImportHTML function to get the general overview.
Google Spreadsheet
Any suggestions how to get the details pages?
UPDATE
I implemented the method as follows:
function doGet(e){
var base = 'http://www.ediktsdatei.justiz.gv.at/edikte/ex/exedi3.nsf/'
var feed = UrlFetchApp.fetch(base + 'suche?OpenForm&subf=e&query=%28%5BVKat%5D%3DEH%20%7C%20%5BVKat%5D%3DZH%20%7C%20%5BVKat%5D%3DMH%20%7C%20%5BVKat%5D%3DMW%20%7C%20%5BVKat%5D%3DMSH%20%7C%20%5BVKat%5D%3DGGH%20%7C%20%5BVKat%5D%3DRH%20%7C%20%5BVKat%5D%3DHAN%20%7C%20%5BVKat%5D%3DWE%20%7C%20%5BVKat%5D%3DEW%20%7C%20%5BVKat%5D%3DMAI%20%7C%20%5BVKat%5D%3DDTW%20%7C%20%5BVKat%5D%3DDGW%20%7C%20%5BVKat%5D%3DGA%20%7C%20%5BVKat%5D%3DGW%20%7C%20%5BVKat%5D%3DUL%20%7C%20%5BVKat%5D%3DBBL%20%7C%20%5BVKat%5D%3DLF%20%7C%20%5BVKat%5D%3DGL%20%7C%20%5BVKat%5D%3DSE%20%7C%20%5BVKat%5D%3DSO%29%20AND%20%5BBL%5D%3D0').getContentText();
var d = document.createElement('div'); //assuming you can do this
d.innerHTML = feed;//make the text a dom structure
var arr = d.getElementsByTagName('a') //iterate over the page links
var response = "";
for(var i = 0;i<arr.length;i++){
var atr = arr[i].getAttribute('onclick');
if(atr) atr = atr.match(/subOpen\((.*?)\)/) //if onclick calls subOpen
if(atr && atr.length > 1){ //get the id
var detail = UrlFetchApp.fetch(base + '0/'+atr[1]).getContentText();
response += detail; //process the relevant part of the content and append to the response text
}
}
return ContentService.createTextOutput(response);
}
However, I get an error when running the method:
ReferenceError: "document" is not defined. (line 6, file "")
What is the document an object of?
I have updated the Google Spreadsheet with a web app.
You can use Firebug in order to inspect the page contents and javascript. For instance you can find that subOpen is actually an alias to subOpenXML declared in xmlhttp01.js.
function subOpenXML(unid) {/*open found doc from search view*/
if (waiting) return alert(bittewar);
var wState = dynDoc.getElementById('windowState');
wState.value = 'H';/*httpreq pending*/
var last = '';
if (unid==docLinks[0]) {last += '&f=1'; thisdocnum = 1;}
if (unid==docLinks[docLinks.length-1]) {
last += '&l=1';
thisdocnum = docLinks.length;
} else {
for (var i=1;i<docLinks.length-1;i++)
if (unid==docLinks[i]) {thisdocnum = i+1; break;}
}
var url = unid + html_delim + 'OpenDocument'+last + '&bm=2';
httpreq.open('GET', // &rand=' + Math.random();
/*'/edikte/test/ex/exedi31.nsf/0/'+*/ '0/'+url, true);
httpreq.onreadystatechange=onreadystatechange;
// httpreq.setRequestHeader('Accept','text/xml');
httpreq.send(null);
waiting = true;
title2src = firstTextChild(dynDoc.getElementById('title2')).nodeValue;
}
So, copy the function source and modify it in Firebug's Console tab to add a console.log(url) before the HTTP call, like this:
var url = unid + html_delim + 'OpenDocument'+last + '&bm=2';
console.log(url)
httpreq.open('GET', // &rand=' + Math.random();
/*'/edikte/test/ex/exedi31.nsf/0/'+*/ '0/'+url, true);
You can then execute the function declaration in Firebug's Console tab to overwrite subOpen with the modified source.
Clicking the link will then show that the invoked URL is composed of the id passed as a parameter to subOpen, prefixed by '0/', so in the example you posted it would be a GET to:
http://www.ediktsdatei.justiz.gv.at/edikte/ex/exedi3.nsf/0/1fd2313c2e0095bfc1257e49004170ca?OpenDocument&f=1&bm=2
You could also verify this by opening the Network tab in firebug and clicking the link.
Therefore, in order to scrape the details page you'd need to:
- Parse the id passed to subOpen
- Make a GET call to '0/' followed by that id, relative to the base URL
- Parse the response
Looking at the response in Firebug's Network tab suggests that you'll probably need to do similar parsing to actually get the displayed contents, but I haven't looked deeply into it.
UPDATE
The importHTML function is not suitable for the kind of scraping you want. Google's HTML or Content Services are better suited for this. You'll need to create a web app and implement the doGet function:
function doGet(e){
var base = 'http://www.ediktsdatei.justiz.gv.at/edikte/ex/exedi3.nsf/'
var feed = UrlFetchApp.fetch(base + 'suche?OpenForm&subf=e&query=%28%5BVKat%5D%3DEH%20%7C%20%5BVKat%5D%3DZH%20%7C%20%5BVKat%5D%3DMH%20%7C%20%5BVKat%5D%3DMW%20%7C%20%5BVKat%5D%3DMSH%20%7C%20%5BVKat%5D%3DGGH%20%7C%20%5BVKat%5D%3DRH%20%7C%20%5BVKat%5D%3DHAN%20%7C%20%5BVKat%5D%3DWE%20%7C%20%5BVKat%5D%3DEW%20%7C%20%5BVKat%5D%3DMAI%20%7C%20%5BVKat%5D%3DDTW%20%7C%20%5BVKat%5D%3DDGW%20%7C%20%5BVKat%5D%3DGA%20%7C%20%5BVKat%5D%3DGW%20%7C%20%5BVKat%5D%3DUL%20%7C%20%5BVKat%5D%3DBBL%20%7C%20%5BVKat%5D%3DLF%20%7C%20%5BVKat%5D%3DGL%20%7C%20%5BVKat%5D%3DSE%20%7C%20%5BVKat%5D%3DSO%29%20AND%20%5BBL%5D%3D0').getContentText();
var response = "";
var match = feed.match(/subOpen\('.*?'\)/g)
if(match){
for(var i = 0; i < match.length;i++){
var m = match[i].match(/\('(.*)'\)/);
if(m && m.length > 1){
var detailText = UrlFetchApp.fetch(base + '0/'+m[1]).getContentText();
// process the relevant part of the detail text here, then append it to the response
response += detailText;
}
}
}
return ContentService.createTextOutput(response);
}
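The id extraction can also be kept separate from the network calls, which makes it easy to try on a saved copy of the search page. This is a sketch under assumptions: the names extractDetailIds and detailUrl are hypothetical, the ids are assumed to be hex strings as in the question's example, and the ?OpenDocument&bm=2 suffix is copied from the URL observed in the Network tab above rather than from any documented API.

```javascript
// Hypothetical helper: collect the unique ids passed to subOpen('...') in a page's HTML.
function extractDetailIds(html) {
  var re = /subOpen\('([0-9a-fA-F]+)'\)/g; // assumes hex ids in single quotes
  var ids = [];
  var m;
  while ((m = re.exec(html)) !== null) {
    if (ids.indexOf(m[1]) === -1) ids.push(m[1]); // skip duplicates
  }
  return ids;
}

// Hypothetical helper: build the details URL for one id.
// The query suffix is an assumption based on the observed request.
function detailUrl(base, id) {
  return base + '0/' + id + '?OpenDocument&bm=2';
}

var sampleHtml =
  '<a href="javascript:subOpen(\'9ca8ed0fae15d43dc1257e7300345b99\')">first</a>' +
  '<a href="javascript:subOpen(\'1fd2313c2e0095bfc1257e49004170ca\')">second</a>';
var base = 'http://www.ediktsdatei.justiz.gv.at/edikte/ex/exedi3.nsf/';

extractDetailIds(sampleHtml).forEach(function (id) {
  console.log(detailUrl(base, id));
});
```

Inside doGet, each URL built this way would be passed to UrlFetchApp.fetch(...).getContentText() instead of being logged.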

How to store a clicked URL as a variable to use in an if statement?

Here is my issue. I want window.open(TargetLink1[0].href); to run only if the element alertboxHeader does not exist, TargetLink1 matches at least one link, and the current page is the one that was opened when I clicked a link. I have the first two conditions working; the problem is with getting, storing or checking the right URL, and I don't know where it goes wrong. This is my code. The stored URL also has to change whenever a new link is clicked.
var varurl;
var TargetLink1 = $("a:contains('Accept')")
if ((!document.getElementById('alertboxHeader') && (TargetLink1.length) && (window.location.href.indexOf("" + varurl + "") > -1) )) {
window.open(TargetLink1[0].href);
}
function storeurl() {
var varurl = document.URL;
}
document.onclick = storeurl;
I think what you want is something like:
var validSource = (document.referrer !== "") ? (document.location.href.indexOf(document.referrer) == 0) : false;
But be aware that the above compares the document.referrer URL to the current URL as two strings, so that if your referrer were:
http://example.org?q=test
and the current URL (the link they followed) is:
http://example.org/1
it would handle it as not matching because of the query string in the referrer URL.
Here's a better way to handle it, using the URL object (which is not necessarily supported in all browsers, but works in Chrome and Firefox):
var referrerOrigin = new URL(document.referrer).origin;
var currentOrigin = document.location.origin;
var validSource = ( referrerOrigin == currentOrigin );
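One caveat with the origin comparison: new URL('') throws, so an empty document.referrer needs to be handled first. A minimal sketch, with a hypothetical helper name (sameOrigin) that takes both URLs as strings so it can be tried outside the browser:

```javascript
// Hypothetical helper: true only when both URLs parse and share scheme, host and port.
function sameOrigin(referrer, current) {
  if (!referrer) return false; // empty referrer: nothing to compare
  try {
    return new URL(referrer).origin === new URL(current).origin;
  } catch (e) {
    return false; // one of the strings is not a valid absolute URL
  }
}

console.log(sameOrigin('http://example.org?q=test', 'http://example.org/1')); // true
console.log(sameOrigin('', 'http://example.org/1'));                          // false
console.log(sameOrigin('https://example.org/', 'http://example.org/1'));      // false
```

In the page, the call would be sameOrigin(document.referrer, document.location.href).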
The problem is here: document.onclick = storeurl; You should attach the handler to a specific element, selected by its id. For example:
document.getElementById("IdHere").onclick = storeurl;

document generation only works the first time

I'm using openxml in my HTML5 mobile app to generate word documents on the mobile device.
In general openxml works fine and is straightforward, but I'm struggling with an annoying problem.
Document generation only works the first time after I've started the app; that time I can open and view the document. Restarting the app means:
- Redeploy from development machine
- Removing the app from the task pane (pushing aside; I assume the app is removed then?)
The second time, I get a message that the document is corrupted and I'm unable to view the file.
UPDATE:
I can't reproduce this behaviour when I'm running the app connected to the remote debugger without a breakpoint set. That way I always get a working document.
It doesn't make a difference whether I make any changes to the document or not. Simply opening and saving reproduces the error.
After doing some research I've found that the structure of the docx.zip file is the same for the working and the corrupt file. They also have the same file length. But in the corrupt docx, some files have a wrong/invalid CRC. See here an example of trying to extract a corrupt file from the zip. Other files work as expected.
The properties for this file are->
(CRC in a working version is: 44D3906C)
Code for processing the doc-template:
/*
* Process the template
*/
function processTemplate(doc64, callback)
{
"use strict";
console.log("PROCESS TEMPLATE");
var XAttribute = Ltxml.XAttribute;
var XCData = Ltxml.XCData;
var XComment = Ltxml.XComment;
var XContainer = Ltxml.XContainer;
var XDeclaration = Ltxml.XDeclaration;
var XDocument = Ltxml.XDocument;
var XElement = Ltxml.XElement;
var XName = Ltxml.XName;
var XNamespace = Ltxml.XNamespace;
var XNode = Ltxml.XNode;
var XObject = Ltxml.XObject;
var XProcessingInstruction = Ltxml.XProcessingInstruction;
var XText = Ltxml.XText;
var XEntity = Ltxml.XEntity;
var cast = Ltxml.cast;
var castInt = Ltxml.castInt;
var W = openXml.W;
var NN = openXml.NoNamespace;
var wNs = openXml.wNs;
var doc = new openXml.OpenXmlPackage(doc64);
// add a paragraph to the beginning of the document.
var body = doc.mainDocumentPart().getXDocument().root.element(W.body);
var tpl_row = ((doc.mainDocumentPart().getXDocument().descendants(W.tbl)).elementAt(1).descendants(W.tr)).elementAt(2);
var newrow = new XElement(tpl_row);
doc.mainDocumentPart().getXDocument().descendants(W.tbl).elementAt(1).add(newrow);
// callback(doc);
var mod_file = null;
var newfile;
var path;
if (doc != null && doc != undefined ) {
mod_file = doc.saveToBlob();
// Start writing document
path = "Templates";
newfile = "Templates/Bau.docx";
console.log("WRITE TEMPLATE DOCUMENT");
fs.root.getFile("Templates/" + "MyGenerated.docx", {create: true, exclusive: false},
function(fileEntry)
{
fileEntry.createWriter(
function(fileWriter)
{
fileWriter.onwriteend = function(e) {
console.log("TEMPLATE DOCUMENT WRITTEN:"+e.target.length);
};
fileWriter.onerror = function(e) {
console.log("ERROR writing DOCUMENT:" + e.code + ";" + e.message);
};
var blobreader = new FileReader();
blobreader.onloadend = function()
{
fileWriter.write(blobreader.result); // reader.result contains the contents of blob as a typed array
};
blobreader.readAsArrayBuffer(mod_file);
},
null);
}, null);
}
}
Any ideas what I'm doing wrong?
Thanks for posting about the error. There were some issues with jszip.js that I encountered when I was developing the Open XML SDK for JavaScript.
At the following link, there is a sample javascript app that demonstrates generating a document.
Open XML SDK for JavaScript Demo
In that app you can save multiple DOCXs, one after another, and they are not corrupted.
In order to work on this issue, I need to be able to reproduce it locally. Maybe you can take that little working web app and replace parts with your parts until it generates invalid files?
Cheers, Eric
P.S. I am traveling and have intermittent access to internet. If you can continue the thread on OpenXmlDeveloper.org, then it will help me to answer quicker. :-)
What made it work for me was changing the way images (parts) are added to the document. I was using the type "binary" for adding images; I changed this to "base64".
So I changed the source from:
mydoc.addPart( "/word/"+reltarget, openXml.contentTypes.png, "binary", fotodata ); // add Image Part to doc
to:
mydoc.addPart( "/word/"+reltarget, openXml.contentTypes.png, "base64", window.btoa(fotodata) ); // add Image Part to doc
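For reference, window.btoa expects a "binary string", that is, one character per byte with every character code at most 255. Below is a minimal sketch of that conversion only; the helper name toBase64 is hypothetical, the Buffer fallback is there so it can also run under Node.js, and it does not explain why the "binary" parts ended up corrupted.

```javascript
// Hypothetical helper: base64-encode a binary string (one char per byte).
function toBase64(binaryString) {
  if (typeof btoa === 'function') {
    return btoa(binaryString); // browsers and recent Node.js versions
  }
  // Fallback for environments without btoa (assumes Node.js Buffer is available)
  return Buffer.from(binaryString, 'binary').toString('base64');
}

// The 8-byte PNG file signature as a binary string
var pngSignature = '\x89PNG\r\n\x1a\n';
console.log(toBase64(pngSignature)); // iVBORw0KGgo=
```

If the image data is held as an ArrayBuffer rather than a binary string, it would need to be converted byte by byte before calling btoa.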

Jquery Redirect Based on URL location

This is what I'm trying to solve for...
Only if the URL explicitly contains /foldername/index.htm && /foldername/ on mydomain.com then redirect to http://www.example.com
Should the URL contain any URL parameter /foldername/index.htm?example it should not redirect
All other URLs should not redirect
This is my javascript which is incomplete, but is ultimately what I'm trying to solve for...
var locaz=""+window.location;
if (locaz.indexOf("mydomain.com") >= 0) {
var relLoc = [
["/foldername/index.htm"],
["/foldername/"]
];
window.location = "http://www.example.com";
}
The purpose is to handle a URL that some users reach in a particular way, such as through a bookmark. Without removing the page, we want to monitor how many people are hitting it before we take further action.
Won't the page always be on the same domain, also if the url contains /foldername/pagename.htm won't it also already include /foldername? So an && check there would be redundant.
Try the below code.
var path = window.location.pathname;
if ( (path === '/foldername/' || path === '/foldername/index.htm') && !window.location.search ) {
alert('should redirect');
} else {
alert('should not redirect');
}
var url = window.location;
var regexDomain = /mydomain\.com\/[a-zA-Z0-9_\-]*\/[a-zA-Z0-9_\-]*[\/\.a-z]*$/
if(regexDomain.test(url)) {
window.location = "http://www.example.com";
}
Familiarize yourself with the location object. It provides pathname, search and hostname as properties, sparing you the RegExp hassle (which you'd most likely get wrong anyway). You're looking for something along the lines of:
// no redirect if there is a query string
var redirect = !window.location.search
// only redirect if this is run on mydomain.com or on one of its sub-domains
&& window.location.hostname.match(/(?:^|\.)mydomain\.com$/)
// only redirect if path is /foldername/ or /foldername/index.htm
&& (window.location.pathname === '/foldername/' || window.location.pathname === '/foldername/index.htm');
if (redirect) {
alert('boom');
}
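The same decision can be written as a function that takes the location as a parameter, which makes the three rules from the question easy to check against sample URLs. This is a sketch: the name shouldRedirect is hypothetical, and it uses the question's /foldername/index.htm and mydomain.com placeholders. Any object with hostname, pathname and search works, so window.location or a URL instance can be passed.

```javascript
// Hypothetical helper: decide whether a location matches the redirect rules.
function shouldRedirect(loc) {
  // mydomain.com itself or any of its sub-domains
  var onDomain = /(?:^|\.)mydomain\.com$/.test(loc.hostname);
  // exactly the folder or its index page
  var pathMatches = loc.pathname === '/foldername/' ||
                    loc.pathname === '/foldername/index.htm';
  // any query string disables the redirect
  return onDomain && pathMatches && !loc.search;
}

console.log(shouldRedirect(new URL('http://mydomain.com/foldername/index.htm')));         // true
console.log(shouldRedirect(new URL('http://www.mydomain.com/foldername/')));              // true
console.log(shouldRedirect(new URL('http://mydomain.com/foldername/index.htm?example'))); // false
console.log(shouldRedirect(new URL('http://mydomain.com/other/')));                       // false
```

On the page itself this would be used as if (shouldRedirect(window.location)) { window.location = "http://www.example.com"; }.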
