For each link on this page, I would like to scrape the details page behind it.
I can get all the information on this page: PAGE
However, I would also like to get all the information on the details page, but the href looks like this, for example:
href="javascript:subOpen('9ca8ed0fae15d43dc1257e7300345b99')"
Here is my sample spreadsheet, which uses the ImportHTML function to get the general overview.
Google Spreadsheet
Any suggestions on how to get the details pages?
UPDATE
I implemented the method as follows:
function doGet(e){
var base = 'http://www.ediktsdatei.justiz.gv.at/edikte/ex/exedi3.nsf/'
var feed = UrlFetchApp.fetch(base + 'suche?OpenForm&subf=e&query=%28%5BVKat%5D%3DEH%20%7C%20%5BVKat%5D%3DZH%20%7C%20%5BVKat%5D%3DMH%20%7C%20%5BVKat%5D%3DMW%20%7C%20%5BVKat%5D%3DMSH%20%7C%20%5BVKat%5D%3DGGH%20%7C%20%5BVKat%5D%3DRH%20%7C%20%5BVKat%5D%3DHAN%20%7C%20%5BVKat%5D%3DWE%20%7C%20%5BVKat%5D%3DEW%20%7C%20%5BVKat%5D%3DMAI%20%7C%20%5BVKat%5D%3DDTW%20%7C%20%5BVKat%5D%3DDGW%20%7C%20%5BVKat%5D%3DGA%20%7C%20%5BVKat%5D%3DGW%20%7C%20%5BVKat%5D%3DUL%20%7C%20%5BVKat%5D%3DBBL%20%7C%20%5BVKat%5D%3DLF%20%7C%20%5BVKat%5D%3DGL%20%7C%20%5BVKat%5D%3DSE%20%7C%20%5BVKat%5D%3DSO%29%20AND%20%5BBL%5D%3D0').getContentText();
var d = document.createElement('div'); //assuming you can do this
d.innerHTML = feed;//make the text a dom structure
var arr = d.getElementsByTagName('a') //iterate over the page links
var response = "";
for(var i = 0;i<arr.length;i++){
var atr = arr[i].getAttribute('onclick');
if(atr) atr = atr.match(/subOpen\((.*?)\)/) //if onclick calls subOpen
if(atr && atr.length > 1){ //get the id
var detail = UrlFetchApp.fetch(base + '0/'+atr[1]).getContentText();
response += detail//process the relevant part of the content and append to the reposnse text
}
}
return ContentService.createTextOutput(response);
}
However, I get an error when running the method:
ReferenceError: "document" is not defined. (line 6, file "")
Which object does document belong to?
I have updated the Google Spreadsheet with a web app.
You can use Firebug in order to inspect the page contents and javascript. For instance you can find that subOpen is actually an alias to subOpenXML declared in xmlhttp01.js.
function subOpenXML(unid) {/*open found doc from search view*/
if (waiting) return alert(bittewar);
var wState = dynDoc.getElementById('windowState');
wState.value = 'H';/*httpreq pending*/
var last = '';
if (unid==docLinks[0]) {last += '&f=1'; thisdocnum = 1;}
if (unid==docLinks[docLinks.length-1]) {
last += '&l=1';
thisdocnum = docLinks.length;
} else {
for (var i=1;i<docLinks.length-1;i++)
if (unid==docLinks[i]) {thisdocnum = i+1; break;}
}
var url = unid + html_delim + 'OpenDocument'+last + '&bm=2';
httpreq.open('GET', // &rand=' + Math.random();
/*'/edikte/test/ex/exedi31.nsf/0/'+*/ '0/'+url, true);
httpreq.onreadystatechange=onreadystatechange;
// httpreq.setRequestHeader('Accept','text/xml');
httpreq.send(null);
waiting = true;
title2src = firstTextChild(dynDoc.getElementById('title2')).nodeValue;
}
Copy the function source into Firebug's Console tab and add a console.log(url) before the HTTP call, like this:
var url = unid + html_delim + 'OpenDocument'+last + '&bm=2';
console.log(url)
httpreq.open('GET', // &rand=' + Math.random();
/*'/edikte/test/ex/exedi31.nsf/0/'+*/ '0/'+url, true);
You can then execute the function declaration in Firebug's Console tab and overwrite subOpen with the modified source.
Clicking the link will then show that the invoked URL is composed of the id passed as a parameter to subOpen, prefixed by '0/'. For a link like the one you posted, it would be a GET to a URL such as:
http://www.ediktsdatei.justiz.gv.at/edikte/ex/exedi3.nsf/0/1fd2313c2e0095bfc1257e49004170ca?OpenDocument&f=1&bm=2
You could also verify this by opening the Network tab in Firebug and clicking the link.
Therefore, in order to scrape the details page you'd need to:
Parse the id passed to subOpen
Make a GET call to '0/' followed by that id
Parse the response
Looking at the response in Firebug's Network tab suggests you will probably need some similar parsing to get at the displayed contents, but I haven't looked deeply into it.
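The first two steps can be sketched as a plain function that takes the HTML of the overview page and returns the detail URLs to fetch. This is only a sketch: the name extractDetailUrls is made up for illustration, and it assumes the ids are hex strings like the one in the question and that each appears as subOpen('...') in the page source.

```javascript
// Hypothetical helper: collect the ids passed to subOpen('...') in the HTML
// and turn each one into a detail URL of the form base + '0/' + id.
function extractDetailUrls(html, base) {
  var pattern = /subOpen\('([0-9a-f]+)'\)/gi; // assumes hex ids, as in the question
  var seen = {};
  var urls = [];
  var m;
  while ((m = pattern.exec(html)) !== null) {
    var id = m[1];
    if (!seen[id]) { // skip ids that occur more than once
      seen[id] = true;
      urls.push(base + '0/' + id);
    }
  }
  return urls;
}

// Tiny stand-in for the real overview page.
var sampleHtml =
  '<a href="javascript:subOpen(\'9ca8ed0fae15d43dc1257e7300345b99\')">A</a>' +
  '<a href="javascript:subOpen(\'9ca8ed0fae15d43dc1257e7300345b99\')">A again</a>' +
  '<a href="javascript:subOpen(\'1fd2313c2e0095bfc1257e49004170ca\')">B</a>';
var detailUrls = extractDetailUrls(
  sampleHtml, 'http://www.ediktsdatei.justiz.gv.at/edikte/ex/exedi3.nsf/');
```

Each entry of detailUrls can then be requested with whatever HTTP client is available in your environment.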
UPDATE
The importHTML function is not suitable for the kind of scraping you want. Google's HTML or Content Services are better suited for this. You'll need to create a web app and implement the doGet function:
function doGet(e){
var base = 'http://www.ediktsdatei.justiz.gv.at/edikte/ex/exedi3.nsf/'
var feed = UrlFetchApp.fetch(base + 'suche?OpenForm&subf=e&query=%28%5BVKat%5D%3DEH%20%7C%20%5BVKat%5D%3DZH%20%7C%20%5BVKat%5D%3DMH%20%7C%20%5BVKat%5D%3DMW%20%7C%20%5BVKat%5D%3DMSH%20%7C%20%5BVKat%5D%3DGGH%20%7C%20%5BVKat%5D%3DRH%20%7C%20%5BVKat%5D%3DHAN%20%7C%20%5BVKat%5D%3DWE%20%7C%20%5BVKat%5D%3DEW%20%7C%20%5BVKat%5D%3DMAI%20%7C%20%5BVKat%5D%3DDTW%20%7C%20%5BVKat%5D%3DDGW%20%7C%20%5BVKat%5D%3DGA%20%7C%20%5BVKat%5D%3DGW%20%7C%20%5BVKat%5D%3DUL%20%7C%20%5BVKat%5D%3DBBL%20%7C%20%5BVKat%5D%3DLF%20%7C%20%5BVKat%5D%3DGL%20%7C%20%5BVKat%5D%3DSE%20%7C%20%5BVKat%5D%3DSO%29%20AND%20%5BBL%5D%3D0').getContentText();
var response = "";
var match = feed.match(/subOpen\('.*?'\)/g)
if(match){
for(var i = 0; i < match.length;i++){
var m = match[i].match(/\('(.*)'\)/);
if(m && m.length > 1){
var detailText = UrlFetchApp.fetch(base + '0/'+m[1]).getContentText();
//do something with the detail text here (extract the relevant part)
//and concatenate it to the response
response += detailText;
}
}
}
return ContentService.createTextOutput(response);
}
I want to use an AJAX request, and I know how to use it.
My problem is that I want to pass parameters to the request. My first page had 4 parameters, so I built the URL like this:
var url = "./ControllerServlet?PAGE_ID=BPCLA&ACTION=closeAssessment&SAVE_FLAG=true&closeReason="+closeReasonStr+"&closeCmt="+closeCmt;
But the number of parameters is growing; I now have 20 more, so building the URL like this is going to get messy. Is there a better way to do this?
Here is the JavaScript function where I build the URL:
function closeAssessment() {
var closeReason = document.getElementById("SectionClousureReason");
var closeReasonStr = closeReason.options[closeReason.selectedIndex].value;
var closeCmt=document.getElementById("SectionCloseAssessmentCmt").value;
var url = "./ControllerServlet?PAGE_ID=BPCLA&ACTION=closeAssessment&SAVE_FLAG=true&closeReason="+closeReasonStr+"&closeCmt="+closeCmt;
ajaxRequest(url);
return;
}
Edit:
As requested, here is my ajaxRequest function:
function ajaxRequest(url) {
strURL = url;
var xmlHttpRequest = false;
var self = this;
// Mozilla, Safari
if (window.XMLHttpRequest) {
self.xmlHttpRequest = new XMLHttpRequest();
} else if (window.ActiveXObject) { // IE
self.xmlHttpRequest = new ActiveXObject("Microsoft.XMLHTTP");
}
self.xmlHttpRequest.open("POST", strURL, true);
self.xmlHttpRequest.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
self.xmlHttpRequest.onreadystatechange = function() {
if (self.xmlHttpRequest.readyState == 4) {
if (self.xmlHttpRequest.status == 200) {
var htmlString = self.xmlHttpRequest.responseText;
var parser = new DOMParser();
var responseDoc = parser.parseFromString(htmlString, "text/html");
window.close();
} else {
ajaxFailedCount++;
// Try for 1 min (temp fix for racing condition)
if (ajaxFailedCount < 1200) {window.setTimeout(function() {ajaxRequest(url)}, 50);}
else {alert("Refresh failed!")};
}
}
}
self.xmlHttpRequest.send(null);
}
You could make an object with the key/value pairs being what you want added to the URL.
var closeReason = document.getElementById("SectionClousureReason");
var params = {
PAGE_ID: 'BPCLA',
ACTION: 'closeAssessment',
SAVE_FLAG: 'true',
closeReason: closeReason.options[closeReason.selectedIndex].value,
closeCmt: document.getElementById("SectionCloseAssessmentCmt").value
};
Then add them to the URL via a loop.
var url = "./ControllerServlet?";
var urlParams = Object.keys(params).map(function(key){
return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
}).join('&');
url += urlParams;
ajaxRequest(url);
Note: I added encodeURIComponent so that characters such as & or = inside a key or value cannot break the query string.
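To see what the encoding buys you, here is a small sketch that wraps the same loop in a function (the name toQueryString is just for illustration) and compares it with plain concatenation for a comment containing & and =.

```javascript
// Same idea as the loop above, wrapped in a reusable function.
function toQueryString(params) {
  return Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
  }).join('&');
}

var comment = 'done & dusted=yes';

// Plain concatenation: the server would see an extra, bogus parameter.
var naive = 'closeCmt=' + comment;

// Encoded: the whole comment stays inside the closeCmt value.
var encoded = toQueryString({ closeCmt: comment });
// encoded is "closeCmt=done%20%26%20dusted%3Dyes"
```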
EDIT: From your comment, it seems you want to submit a <form> but you want to use AJAX to do so. In that case, you can loop over the form elements and build the above params object.
var params = {
PAGE_ID: 'BPCLA',
ACTION: 'closeAssessment',
SAVE_FLAG: 'true'
};
var form = document.getElementById('yourForm'),
elem = form.elements;
for(var i = 0, len = elem.length; i < len; i++){
var x = elem[i];
params[x.name] = x.value;
}
Build up an object of your parameters and put them in the uri through a loop like this:
var values= {
page_id: 'BPCLA',
action: 'test'
},
uri_params = [],
uri = 'http://yoururl.com/file.php?';
for (var param in values) uri_params.push( encodeURIComponent( param ) + '=' + encodeURIComponent( values[ param ] ) );
uri = uri + uri_params.join( '&' );
console.log( uri );
Alternatively, consider using POST to transport your parameters, as many browsers limit the length of the query string.
Edit: you can also write a function that traverses your form and builds the values object for you, so you don't have to do it manually.
Be aware, however, that anyone can inject custom URL parameters simply by appending form elements before submitting the form (using the developer tools, for example), so keep that in mind.
If you are using jQuery you can use .serializeArray(), or have a look at this answer for a possible function you could use.
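A function that traverses the form could look roughly like the sketch below. The name collectFormValues is hypothetical, and the rules it applies (skip unnamed and disabled elements, skip unchecked checkboxes and radio buttons) are my assumption about what you would want, loosely following how browsers submit forms. It accepts any array-like list of objects with name and value properties, so in the browser you would pass form.elements.

```javascript
// Hypothetical helper: build a values object from a list of form controls.
// `initial` holds fixed parameters that are not part of the form.
function collectFormValues(elements, initial) {
  var values = {};
  for (var key in initial) {
    if (Object.prototype.hasOwnProperty.call(initial, key)) {
      values[key] = initial[key];
    }
  }
  for (var i = 0; i < elements.length; i++) {
    var el = elements[i];
    if (!el.name || el.disabled) continue; // nothing to submit
    if ((el.type === 'checkbox' || el.type === 'radio') && !el.checked) continue;
    values[el.name] = el.value;
  }
  return values;
}

// Plain objects standing in for real form controls.
var fakeElements = [
  { name: 'closeReason', type: 'select-one', value: 'DUPLICATE' },
  { name: 'closeCmt', type: 'textarea', value: 'see ticket' },
  { name: 'notify', type: 'checkbox', value: 'on', checked: false },
  { name: '', type: 'button', value: 'Save' }
];
var values = collectFormValues(fakeElements, { page_id: 'BPCLA' });
// In the browser:
// collectFormValues(document.getElementById('yourForm').elements, {...})
```

Keeping the traversal separate from the URL building means the same values object can be sent either in the query string or in a POST body.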
I am currently working on some JavaScript that can be included in the header of surveys that use TrueSample and that will dynamically generate and fire web service calls for the survey. One of the requirements of TrueSample is that, after every page, it is sent the amount of time spent on that page, as well as some other arbitrary information generated at the beginning of the survey. I am trying to automate the per-page web service call so that I don't have to have hundreds of web services in every survey.
I am pretty far along, and have found some cool tricks to make this all work, but I am struggling with firing the webservice using javascript.
Here is what I have so far:
Qualtrics.SurveyEngine.addOnload(function()
{
var pageStart = new Date();
var beginning = pageStart.getTime();
// Necessary Variables
// Hyphens are not allowed in JavaScript identifiers, so the variables use camelCase
var accountId = parseInt("${e://Field/account-id}");
var passcode = parseInt("${e://Field/passcode}");
var surveyCountry = parseInt("${e://Field/survey-country}");
var endClientId = parseInt("${e://Field/end-client-id}");
var pageExposureDuration;
var pageId = parseInt("${e://Field/pageID}");
var platformId = parseInt("${e://Field/platform-id}");
var respondentId = parseInt("${e://Field/respondent-id}");
var responseId = parseInt("${e://Field/response-id}");
var sourceId = parseInt("${e://Field/source-id}");
var surveyId = parseInt("${e://Field/survey-id}");
var apiVersion = parseInt("${e://Field/api-version}");
//End Variables
var that = this;
that.hideNextButton();
var para = document.createElement("footnote");
var test = document.getElementById("Buttons");
var node = document.createElement('input');
var next = document.getElementById("NextButton");
node.id = "tsButton";
node.type = "button";
node.name = "tsButton";
node.value = " >> ";
node.onclick = function trueSample(){
var pageEnd = new Date();
var end = pageEnd.getTime();
var time = end - beginning;
window.alert(pageId + ", time spent on page = " + time);
Qualtrics.SurveyEngine.setEmbeddedData("pageID", pageId + 1);
new Ajax.Request('webserviceURL', {
parameters: {
// Hyphenated parameter names must be quoted when used as object keys
'account-id': accountId,
'passcode': passcode,
'survey-country': surveyCountry,
'end-client-id': endClientId,
'page-exposure-duration': time,
'page-id': pageId,
'platform-id': platformId,
'respondent-id': respondentId,
'response-id': responseId,
'source-id': sourceId,
'survey-id': surveyId,
'api-version': apiVersion}
});
that.clickNextButton();
};
para.appendChild(node);
test.insertBefore(para, next);
});
Does anyone have experience with firing web service calls from JavaScript? If so, do you have any ideas on how to finalize the AJAX request and make it work? Or is there another (potentially better) method that I could use for these calls? I understand that there is information on this on Stack Overflow, but I am having a hard time understanding how the specific use cases apply to mine.
Also, please note that, while I would love to use jQuery, I am limited to vanilla JavaScript and Prototype.js.
You can make the AJAX call with a plain JavaScript XMLHttpRequest. For a SOAP web service, a few HTTP headers are needed: SOAPAction, Content-Type and Accept. The values for these headers must be as follows:
SOAPAction:""
Content-Type:text/xml
Accept:text/xml
So your code for making an AJAX call to the web service should look something like this:
//Get XML Request Object
var request = new XMLHttpRequest();
// Define the URL
var url="http://your.end.point.url?wsdl";
//Define HTTP Method. Always POST for a Webservice
request.open("POST", url, true); // Remember that all the Webservice calls should be POST
//Setting Request Headers
request.setRequestHeader("SOAPAction", "\"\""); // The header value is an empty pair of double quotes: ""
request.setRequestHeader("Accept","text/xml");
request.setRequestHeader("Content-Type","text/xml");
//Make your AJAX call
request.send(soap); // where soap is your SOAP request payload
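Parsing the response (assign the handler before calling request.send):
request.onreadystatechange=stateChanged;
function stateChanged()
{
if (request.readyState!==4) return; // wait until the response is complete
if (request.status==200)
{
// Success. Parse the SOAP Response
}
if(request.status==500)
{
//Failure. Handle the SOAP Fault
}
}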
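The soap payload itself is just an XML string. Here is a minimal sketch of building one; the helper names, the operation name RecordPageExposure and the namespace http://example.com/hypothetical are all made up for illustration, so take the real element names and namespace from your service's WSDL. Only the SOAP 1.1 envelope namespace is standard.

```javascript
// Escape the characters that are special in XML text and attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Build a SOAP 1.1 envelope with one operation element whose children are
// the given parameters. `operation` and `namespace` are placeholders here.
function buildSoapEnvelope(operation, namespace, params) {
  var fields = Object.keys(params).map(function (name) {
    return '<' + name + '>' + escapeXml(params[name]) + '</' + name + '>';
  }).join('');
  return '<?xml version="1.0" encoding="utf-8"?>' +
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">' +
    '<soap:Body>' +
    '<' + operation + ' xmlns="' + escapeXml(namespace) + '">' +
    fields +
    '</' + operation + '>' +
    '</soap:Body>' +
    '</soap:Envelope>';
}

var soap = buildSoapEnvelope('RecordPageExposure', 'http://example.com/hypothetical', {
  'page-id': 3,
  'page-exposure-duration': 1500
});
```

The resulting string is what would be passed to request.send(soap).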
I would like to give the users of my website the ability to download a "lnk" file.
My idea is to generate this file so that it contains an address that can be used only once.
Is there a way to generate this file in JavaScript?
The flow is something like this:
the user presses a button
the JavaScript generates this file and downloads it to the user's machine
the user sends this file to another user, who uses the one-time address from their machine
Is something like this doable in JavaScript on the client side, or would I need to generate the file with Java on the server side?
This is a faithful translation of mslink.sh.
I only tested my answer in Windows 8.1, but I would think that it works in older versions of Windows, too.
function create_lnk_blob(lnk_target) {
function hex_to_arr(s) {
var result = Array(s.length / 2);
for (var i = 0; i < result.length; ++i) {
result[i] = +('0x' + s.substr(2*i, 2));
}
return result;
}
function str_to_arr(s) {
var result = Array(s.length);
for (var i = 0; i < s.length; ++i) {
var c = s.charCodeAt(i);
if (c >= 128) {
throw Error("Only ASCII paths are supported :-(");
}
result[i] = c;
}
return result;
}
function convert_CLSID_to_DATA(s) {
var idx = [[6,2], [4,2], [2,2], [0,2],
[11,2], [9,2], [16,2], [14,2],
[19,4], [24,12]];
var s = idx.map(function (ii) {
return s.substr(ii[0], ii[1]);
});
return hex_to_arr(s.join(''));
}
function gen_IDLIST(s) {
var item_size = (0x10000 + s.length + 2).toString(16).substr(1);
return hex_to_arr(item_size.replace(/(..)(..)/, '$2$1')).concat(s);
}
var HeaderSize = [0x4c, 0x00,0x00,0x00],
LinkCLSID = convert_CLSID_to_DATA("00021401-0000-0000-c000-000000000046"),
LinkFlags = [0x01,0x01,0x00,0x00], // HasLinkTargetIDList ForceNoLinkInfo
FileAttributes_Directory = [0x10,0x00,0x00,0x00],
FileAttributes_File = [0x20,0x00,0x00,0x00],
CreationTime = [0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00],
AccessTime = [0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00],
WriteTime = [0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00],
FileSize = [0x00,0x00,0x00,0x00],
IconIndex = [0x00,0x00,0x00,0x00],
ShowCommand = [0x01,0x00,0x00,0x00], //SW_SHOWNORMAL
Hotkey = [0x00,0x00], // No Hotkey
Reserved = [0x00,0x00],
Reserved2 = [0x00,0x00,0x00,0x00],
Reserved3 = [0x00,0x00,0x00,0x00],
TerminalID = [0x00,0x00],
CLSID_Computer = convert_CLSID_to_DATA("20d04fe0-3aea-1069-a2d8-08002b30309d"),
CLSID_Network = convert_CLSID_to_DATA("208d2c60-3aea-1069-a2d7-08002b30309d"),
PREFIX_LOCAL_ROOT = [0x2f],
PREFIX_FOLDER = [0x31,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00],
PREFIX_FILE = [0x32,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00],
PREFIX_NETWORK_ROOT = [0xc3,0x01,0x81],
PREFIX_NETWORK_PRINTER = [0xc3,0x02,0xc1],
END_OF_STRING = [0x00];
if (/.*\\+$/.test(lnk_target)) {
lnk_target = lnk_target.replace(/\\+$/g, '');
var target_is_folder = true;
}
var prefix_root, item_data, target_root, target_leaf;
if (lnk_target.substr(0, 2) === '\\\\') {
prefix_root = PREFIX_NETWORK_ROOT;
item_data = [0x1f, 0x58].concat(CLSID_Network);
target_root = lnk_target.substr(0, lnk_target.lastIndexOf('\\')); // everything before the last backslash
if (/\\\\.*\\.*/.test(lnk_target)) {
target_leaf = lnk_target.substr(lnk_target.lastIndexOf('\\') + 1);
}
if (target_root === '\\') {
target_root = lnk_target;
}
} else {
prefix_root = PREFIX_LOCAL_ROOT;
item_data = [0x1f, 0x50].concat(CLSID_Computer);
target_root = lnk_target.replace(/\\.*$/, '\\');
if (/.*\\.*/.test(lnk_target)) {
target_leaf = lnk_target.replace(/^.*?\\/, '');
}
}
var prefix_of_target, file_attributes;
if (!target_is_folder) {
prefix_of_target = PREFIX_FILE;
file_attributes = FileAttributes_File;
} else {
prefix_of_target = PREFIX_FOLDER;
file_attributes = FileAttributes_Directory;
}
target_root = str_to_arr(target_root);
for (var i = 1; i <= 21; ++i) {
target_root.push(0);
}
var id_list_items = gen_IDLIST(item_data);
id_list_items = id_list_items.concat(
gen_IDLIST(prefix_root.concat(target_root, END_OF_STRING)));
if (target_leaf) {
target_leaf = str_to_arr(target_leaf);
id_list_items = id_list_items.concat(
gen_IDLIST(prefix_of_target.concat(target_leaf, END_OF_STRING)));
}
var id_list = gen_IDLIST(id_list_items);
var data = [].concat(HeaderSize,
LinkCLSID,
LinkFlags,
file_attributes,
CreationTime,
AccessTime,
WriteTime,
FileSize,
IconIndex,
ShowCommand,
Hotkey,
Reserved,
Reserved2,
Reserved3,
id_list,
TerminalID);
return new Blob([new Uint8Array(data)], { type: 'application/x-ms-shortcut' });
}
var blob = create_lnk_blob('C:\\Windows\\System32\\Calc.exe');
Use it like:
var blob_to_file = create_lnk_blob('C:\\Windows\\System32\\Calc.exe');
var blob_to_folder = create_lnk_blob('C:\\Users\\Myself\\Desktop\\'); // with a trailing backslash
Demo: http://jsfiddle.net/5cjgLyan/2/
This would be simple if your website allows PHP.
If your script is part of an HTML file, just write the JavaScript as if you were sending a static lnk file. Then, at the point where the lnk address goes, split the JavaScript into two parts and, between them, put in:
<?php /* PHP code to set a variable */ /* PHP code to generate the proper string */ print /* PHP variable */
?>
I think a purely client-side solution is impossible.
Even the WebRTC protocol needs at least one ICE server to signal the other client.
I think the easiest way to do this is to use http://peerjs.com/
You could first create a client token for the room owner:
//room owner side
peer.on('open', function(my_peer_id) {
console.log('My peer ID is: ' + my_peer_id);
});
Then send the token to anyone you want (by text file, web chat, etc.).
The other user then connects using the token above:
//the other one
var conn = peer.connect(other_peer_id);
After the room owner detects that someone has entered the room,
disconnect from the signaling server so that the token becomes unusable:
//room owner side
peer.disconnect()
For generating and reading files on the client side, I recommend reading the articles below.
http://www.html5rocks.com/en/tutorials/file/dndfiles/ (reading from a file)
How to use filesaver.js (saving as a file)
I believe compatibility of the FileReader API and Blob is not a concern,
since there will never be a browser that supports WebRTC but not the FileReader API.
Sorry if this is a noob question; I'm a network admin unknowingly turned web developer :) I am trying to understand how to get the current sessid and put it into the JavaScript where sessid= (current sessid) appears. It is in the web address and is generated when you visit the search page, e.g.: http://www.southerntiredirect.com/shop/catalog/search?sessid=uUQgRHQyekRGJcyWwTFwf5hxep7cdYlV4CdKfunmjxNOQPEgDZdJD2tNgRsD7Prm&shop_param=
<script language="JavaScript">
var url= "http://www.southerntiredirect.com/online/system/ajax_search_manufacturer?sessid=????????";
</script><script type="text/javascript" src="http://www.southerntiredirect.com/online/templatemedia/all_lang/manufacturer.js"></script><input type="hidden" name="sessid" value="sessid??????">
Use my handy-dandy library URLTools!
Library
//URLTools - a tiny js library for accessing parts of the url
function urlAnalyze(url) {
//default to the address of the current page
if (url === undefined) {url = document.location.toString();}
url = url.replace(/%20/g," ");
//separates the parts of the url
var parts = url.split("?");
//no query string: return an empty object
if (parts[1] === undefined) {return {};}
//splits into separate key=value pairs
var keyValues = parts[1].split("&");
var keys = {};
keyValues.forEach(function (keyValue) {
var keyAndValue = keyValue.split("=");
keys[keyAndValue[0]] = keyAndValue[1];
});
return keys;
}
Then, just call urlAnalyze and get the sessid key.
Usage
var urlKeys = urlAnalyze(),
sessid = urlKeys["sessid"];
Here is a great function that grabs whatever you want and returns the key and value for it.
The main portion of this function gets the URL using window.location.href and then runs a regular expression on it to find both the key and the value.
I DO NOT TAKE CREDIT FOR THIS CODE.
Please go to the link to see the full example.
function getUrlVars() {
var vars = {};
var parts = window.location.href.replace(
/[?&]+([^=&]+)=([^&]*)/gi,
function(m,key,value) {
vars[key] = value;
});
return vars;
}
You could use a simple regexp:
var url = "http://www.southerntiredirect.com/shop/catalog/search?sessid=uUQgRHQyekRGJcyWwTFwf5hxep7cdYlV4CdKfunmjxNOQPEgDZdJD2tNgRsD7Prm&shop_param=";
var match = url.match(/sessid=([^&]+)/);
if (match === null) {
throw new Error("now what? D:");
}
var sessid = match[1];
The regexp in English: look for "sessid=" then capture anything that isn't an &
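As an alternative to the regular expression, environments that provide the standard URL and URLSearchParams APIs can do the parsing for you. This is a different technique from the regexp above, shown only as a sketch; the helper name getQueryParam and the short sessid value abc123 are placeholders.

```javascript
// Read one query parameter from a URL string using the standard URL API.
// Returns null when the parameter is not present.
function getQueryParam(urlString, name) {
  return new URL(urlString).searchParams.get(name);
}

// Placeholder page address; in the browser you would pass window.location.href.
var pageUrl = "http://www.southerntiredirect.com/shop/catalog/search?sessid=abc123&shop_param=";
var sessid = getQueryParam(pageUrl, "sessid");

// Reuse the value when building the AJAX URL from the question.
var ajaxUrl = "http://www.southerntiredirect.com/online/system/ajax_search_manufacturer?sessid=" +
  encodeURIComponent(sessid);
```

Note that searchParams.get also decodes percent-escapes in the value, which the regexp versions do not.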