I need to save 4 pages as HTML output.
Here is my PhantomJS code:
var i = 0;
while (i < 4)
{
    var page = require('webpage').create();
    var fs = {};
    fs = require('fs');
    if (i == 0)
    {
        var url = 'http://www.lamoda.ru/shoes/dutiki-i-lunohody/?sitelink=leftmenu&sf=16&rdr565=1#sf=16';
    } else {
        var url = 'http://www.lamoda.ru/shoes/dutiki-i-lunohody/?sitelink=leftmenu&sf=16&rdr565=1#sf=16&p=' + i;
    }
    page.open(url, function (status) {
        var js = page.evaluate(function () {
            return document;
        });
        console.log(js.all[0].outerHTML);
        page.render('export' + i + '.png');
        fs.write(i + '.html', js.all[0].outerHTML, 'w');
        phantom.exit();
    });
    i++;
}
It seems that I need to change the fs variable, but I don't know how... I don't want to create fs1, fs2, fs3, fs4... I'm looking for a better solution; I hope you can help, thank you)
Is it okay if your requests are serial, so that page 2 is not requested until page 1 has returned? If so, I recommend you base your code on the multi-url sample in the documentation.
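To illustrate the serial approach, here is a minimal sketch (not tested; it reuses the URLs and file names from your snippet, and uses page.content instead of document.all[0].outerHTML):

var fs = require('fs');
var page = require('webpage').create();
var base = 'http://www.lamoda.ru/shoes/dutiki-i-lunohody/?sitelink=leftmenu&sf=16&rdr565=1#sf=16';

function savePage(i) {
    // Stop once the four pages have been saved.
    if (i >= 4) {
        phantom.exit();
        return;
    }
    var url = (i === 0) ? base : base + '&p=' + i;
    page.open(url, function (status) {
        fs.write(i + '.html', page.content, 'w');
        page.render('export' + i + '.png');
        savePage(i + 1); // only request the next page once this one is done
    });
}

savePage(0);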
If you want the requests to run in parallel, then you need to use a JavaScript closure to protect the local variables (see https://stackoverflow.com/a/17619716/841830 for an example of how to do that). Once you are doing that, you can either parse url to find out whether it ends in p=1, p=2, etc., or assign i to the page object and access it with this.i.
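As a rough sketch of the closure idea (baseUrl stands for the URL from your code; with parallel requests you also still need to count finished pages before calling phantom.exit()):

var fs = require('fs');
for (var i = 0; i < 4; i++) {
    (function (i) { // the IIFE captures its own copy of i
        var page = require('webpage').create();
        var url = (i === 0) ? baseUrl : baseUrl + '&p=' + i;
        page.open(url, function (status) {
            fs.write(i + '.html', page.content, 'w');
            page.render('export' + i + '.png');
        });
    })(i);
}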
I want to create an app using Node.js and Gulp.js that opens a specific URL and then scrolls the page down to the end. Is that possible?
Here is my code inside gulpfile.js
const {series, src, dest} = require('gulp');
var fs = require('fs');
var path = require('path');
var open =require('open');
var https = require('https');
async function getLinks(params) {
    var pageLink = 'https://youtube.com';
    var links = [];
    open(pageLink, {app: 'chrome'});
    https.get(pageLink, (res) => {
        let rawHtml = '';
        res.on('data', (chunk) => { rawHtml += chunk; });
        res.on('end', () => {
            try {
                console.log(rawHtml);
            } catch (e) {
                console.error(e.message);
            }
        });
    });
}
exports.default = getLinks;
I would be thankful for some help!
I do not believe that exactly what you are looking for is currently possible without extensions or some external program, but there are some alternatives that may help you accomplish part of what you are aiming for.
IDs
If the page you are linking to has an ID on the element you want to link to, you can append it to the end of the URL. For example, if you wanted to link to an element with the id pricing, your link would look like this:
https://example.com#pricing
Obviously this is only useful on some pages, and only with elements that have IDs.
Text Fragments
These are slightly closer to what you may be looking for, in that they allow you to link to anywhere on a page, regardless of whether the element has an ID or not. Here is an example of how you would link to the More information... text on example.com:
http://example.com/#:~:text=more%20information...
Unfortunately this still has some cons, chiefly in the browser support arena. According to caniuse, only Chromium browsers currently support the feature. (That is 71% of users, though.)
I am trying to create a website that takes you to a random website. However, I couldn't really find any way to store the website links outside JavaScript, forcing me to store them in the script itself. This will become an issue in the future when I need to navigate around the script, and it is going to be difficult to work with. What should I do?
example script:
function clicked(){
    window.open(links[Math.floor(Math.random()*max)]);
}
var links = [
    "linkexample.com",
    "linkexample2.com",
    "linkexample3.com"
];
As the last comment said, the best way is to store them in your server script and access them with AJAX. Here is an example of a Node.js server script and a browser script.
Nodejs
const express = require("express");
var app = express();
app.post("/links", (req, res) => {
    var links = [
        "linkexample.com",
        "linkexample2.com",
        "linkexample3.com"
    ];
    links = JSON.stringify(links);
    res.setHeader("Content-Type", "application/json; charset=utf-8");
    res.send(links);
});
app.listen(3000, "localhost");
JS Browser Script
var xhr = new XMLHttpRequest();
xhr.onloadend = function(e) {
    // You can see your links array in the console
    // You can also use JSON.parse() to parse the array in your script and use it
    console.log(e.currentTarget.responseText);
};
xhr.open("POST", "http://localhost:3000/links", true);
xhr.send();
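For example, inside the onloadend handler you could parse the response and use it the same way the question does (a small sketch):

var links = JSON.parse(e.currentTarget.responseText);
window.open(links[Math.floor(Math.random() * links.length)]);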
Thanks, everyone, for answering. I managed to figure out my own solution, which was to create an additional JavaScript file just for the links and then get the values from there.
external.js
var links = [
    "linkexample.com",
    "linkexample2.com",
    "linkexample3.com"
];
main.js
function clicked(){
    window.open(links[Math.floor(Math.random() * links.length)]);
}
I changed the max variable since it wasn't working for some reason, but at least it works perfectly now :D
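For anyone reading along, a sketch of how the two files would be wired up in the HTML (the button is just an example; external.js must be loaded first so links exists when clicked() runs):

<script src="external.js"></script>
<script src="main.js"></script>
<button onclick="clicked()">Random website</button>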
I have a folder filled with files accessible to the end user, and I am working on a JavaScript file to parse through them and deliver them as needed. However, rather than manually updating the list, I'd like the JavaScript to scan the folder and then iterate through an array of the files in that folder. Is there a decent way in front-end JS to do this? All the solutions I've looked into have turned out to be purely for Node.
For example, say I have a folder structure like so...
/ (Web Root)
|__ /Build_a_card
|__ /Cool pictures
|__ /Summer Pictures
summer_dog.gif
smiling_sun.svg
|__ /Winter Pictures
snowman.png
cat.jpg
And then in the javascript I'd run something like
var image_list = get_list("/Cool Pictures");
build_select_list(image_list);
function get_list(folder_to_look_in){
    var the_list = ???
    return the_list;
}
...
And then, for example, the JS is run, and after some parsing, the user would see...
<select>
<option value="summer_pictures/summer_dog.gif">summer_dog.gif</option>
<option value="summer_pictures/smiling_sun.svg">smiling_sun.svg</option>
<option value="winter_pictures/snowman.png">snowman.png</option>
<option value="cat.jpg">cat.jpg</option>
</select>
In an insane world, since the individual files in the folder are accessible to javascript, hypothetically I could brute-force every single possible file name in the folder and return success on each one:
function get_list(folder){
    var list_of_files = {};
    var starting_character = 0;
    list_of_files = every_single_option({starting_character}, 0, 40, folder);
    return list_of_files;
}

function every_single_option(existing_characters, current_depth, max_depth, folder){
    var there_array = {};
    var this_string = String.fromCharCode(existing_characters);
    if (request_url(this_string, folder)){
        there_array[this_string] = this_string;
    }
    if (current_depth < max_depth){
        for (var i = 0; i < 127; i++){
            let temp_array = there_array;
            temp_array[i] = i;
            mix(there_array, every_single_option(existing_characters, current_depth + 1, max_depth, folder));
        }
    }
    return there_array;
}

function request_url(url, folder){
    var oReq = new XMLHttpRequest();
    oReq.addEventListener("load", reqListener);
    oReq.open("GET", "/" + folder + "/" + url);
    oReq.send();
}

function mix(source, target) {
    for(var key in source) {
        if (source.hasOwnProperty(key)) {
            target[key] = source[key];
        }
    }
}
But as mentioned, doing it that way would be insane (both ridiculously slow and very bad code design; resorting to brute-forcing your own website is just dumb).
It does hypothetically prove, though, that there's no reason JavaScript shouldn't be able to just get a directory listing, assuming public permissions. Alternatively, I could make a backend API that returns a JSON listing, but that requires backend code for something that is a frontend process. I'm trying to pull this off with something sane and simple, but the question is... how?
(If you insist on posting a jQuery way to do this, please also post a non-jQuery way as well, since there is no jQuery available in my environment.)
So, refusing to admit it's impossible, I engineered a solution that works, and requires no API.
That said, the server must not be actively blocking the JavaScript from viewing the directory: indexing hasn't been turned off, the directory doesn't have an index.html or equivalent that overrides the listing, and the server isn't doing any URL rewriting. In other words, this should work in any server environment that doesn't rewrite or block indexes.
Here's a rough draft (still buggy, needs finishing):
var request = new XMLHttpRequest();
request.open('GET', '/my/directory/', true);
request.onload = function() {
    if (request.status >= 200 && request.status < 400) {
        // Success! The directory listing is only available inside this callback.
        var directory_listing = request.responseText;
        // URL-matching regex; the "g" flag is needed so exec() advances past each match.
        var regexp = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/ig;
        var match, files = [];
        while ((match = regexp.exec(directory_listing)) != null) {
            files.push(match[0]);
        }
        console.log(files);
    }
};
request.send();
Building off lilHar's answer, we can use DOMParser to build a detached DOM document for the directory page we're accessing, and then use that to find any links we need:
// relative path to the desired directory
const directory = "/DIRECTORY-NAME/";
// selector for the relevant links in the directory's index page
const selector = "LINK SELECTOR";
const request = new XMLHttpRequest();
request.open("GET", directory, true);
request.onload = () => {
// successful response
if(request.status >= 200 && request.status < 400)
{
// create DOM from response HTML
const doc = new DOMParser().parseFromString(request.responseText, "text/html");
// get all links
const links = doc.querySelectorAll(selector);
console.log("Links:", links);
links.forEach(link => {
// do stuff with the links
});
}
};
request.send();
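For instance, assuming the index page's anchors point straight at the files, the loop body could collect them into the kind of array the question's build_select_list() expects (a sketch; the attribute handling may need adjusting for your server's index format):

const image_list = [];
links.forEach(link => {
    // use the raw href attribute so relative paths are kept as written
    image_list.push(link.getAttribute("href"));
});
build_select_list(image_list);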
Is there a decent way in front-end JS to do this?
No. Nor is there a way that isn't decent.
The front end can communicate with the server via HTTP or WebSockets.
Neither of those provides any built-in mechanism for exploring a filesystem.
You need the server to provide an API (e.g. a web service) which provides the information you want.
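For example, a minimal sketch of such a web service in Node with Express (the route name and folder path here are made up for illustration):

const express = require("express");
const fs = require("fs");
const app = express();

// Return the file names of a public folder as JSON.
app.get("/api/list", (req, res) => {
    fs.readdir("public/cool-pictures", (err, files) => { // placeholder folder
        if (err) return res.status(500).json({ error: err.message });
        res.json(files);
    });
});

app.listen(3000);

The front end can then request /api/list and build the select element from the returned array.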
I am trying to pull the source code of several webpages at once. The links are fed into the array via a source text file. I am able to iterate through the array and print out the links and confirm they are there, but when trying to pass them through a function, they become undefined after the first iteration.
My ultimate goal is to have it save the source of each page to its own document. It does the first page correctly, but subsequent attempts are undefined. I've searched for hours and would appreciate it if someone could point me in the right direction.
var fs = require('fs');
var pageContent = fs.read('input.txt');
var arrdata = pageContent.split(/[\n]/);
var system = require('system');
var page = require('webpage').create();
var args = system.args;
var imagelink;
var content = " ";

function handle_page(file, imagelink){
    page.open(file, function(){
        var js = page.evaluate(function (){
            return document;
        });
        fs.write(imagelink, page.content, 'w');
        setTimeout(next_page(),500);
    });
}

function next_page(imagelink){
    var file = imagelink;
    if(!file){ phantom.exit(0); }
    handle_page(file, imagelink);
}

for(var i in arrdata){
    next_page(arrdata[i]);
}
I realize now that the for loop will only iterate once and that the other two functions then make their own loop, so that makes sense, but I'm still having issues getting it running.
PhantomJS's page.open() is asynchronous (that's why there is a callback). The other thing is that page.open() is a long operation. If two such calls are made, the second will overwrite the first, because you're operating on the same page object.
The best way would be to use recursion:
function handle_page(i){
    if (arrdata.length === i) {
        phantom.exit();
        return;
    }
    var imageLink = arrdata[i];
    page.open(imageLink, function(){
        fs.write("file_"+i+".html", page.content, 'w');
        handle_page(i+1);
    });
}
handle_page(0);
Couple of other things:
setTimeout(next_page(),500); immediately invokes next_page() without waiting. You wanted setTimeout(next_page, 500);, but then it also wouldn't work, because without an argument next_page simply exits.
In fs.write(imagelink, page.content, 'w'), imagelink is probably a URL, in which case you probably want to devise a filename some other way.
While for(var i in arrdata){ next_page(arrdata[i]); } works here, be aware that this doesn't work on all arrays and array-like objects. Use plain for loops like for(var i = 0; i < length; i++), or array.forEach(function(item, index){...}) if it is available.
page.evaluate() is sandboxed and provides access to the DOM, but everything that is not JSON serializable cannot be passed out of it. You will have to put that into a serializable format before passing it out of evaluate().
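For example, instead of returning document (which is not serializable), return the markup as a string (a sketch):

var html = page.evaluate(function () {
    return document.documentElement.outerHTML;
});
fs.write("file_" + i + ".html", html, 'w');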
browser: Chrome
environment: grails app localhost
I'm running a Grails app on localhost (I know there's an issue with pdf.js and the local file system), and instead of using a file: URL, which I know would fail, I'm passing in a typed JavaScript array, and it's still failing. To be precise, it's not telling me anything except "Warning: Setting up fake worker." and then it does nothing.
this.base64ToBinary = function(dataURI) {
    var BASE64_MARKER = ';base64,';
    var base64Index = dataURI.indexOf(BASE64_MARKER) + BASE64_MARKER.length;
    var base64 = dataURI.substring(base64Index);
    var raw = window.atob(base64);
    var rawLength = raw.length;
    var array = new Uint8Array(new ArrayBuffer(rawLength));
    for (var i = 0; i < rawLength; i++) {
        array[i] = raw.charCodeAt(i);
    }
    return array;
};

PDFJS.disableWorker = true; // due to CORS

// I convert some base64 data to binary data here which comes back correctly
var data = utilities.base64ToBinary(result);

PDFJS.getDocument(data).then(function (pdf) {
    // nothing console logs or reaches here
    console.log(pdf);
}).catch(function(error){
    // no error message is logged either
    console.log("Error occurred", error);
});
I'm wondering if I just don't have it set up correctly. Can I use this library purely on the client side by just including pdf.js, or do I need to include viewer.js too? I also noticed the compatibility file... The setup isn't very clear: this example works (FIDDLE) and mine doesn't, and I'm not understanding the difference. Also, if I use the URL supplied in that example, it says the same thing.
I get to answer my own question:
The documentation isn't clear at all. If you don't define PDFJS.workerSrc to point to the correct pdf.worker.js file, then pdf.js tries to figure out the correct src path to the file and load it itself.
Their method for doing this, however, is pretty sketchy:
if (!PDFJS.workerSrc && typeof document !== 'undefined') {
// workerSrc is not set -- using last script url to define default location
PDFJS.workerSrc = (function () {
'use strict';
var scriptTagContainer = document.body ||
document.getElementsByTagName('head')[0];
var pdfjsSrc = scriptTagContainer.lastChild.src;
return pdfjsSrc && pdfjsSrc.replace(/\.js$/i, '.worker.js');
})();
}
They only grab the last script tag in the document and assume that it is the right src for loading the file, instead of searching all the script tags for a src that contains "pdf.js" and using that as the correct one.
Instead, they should just make it clear and require that you explicitly set PDFJS.workerSrc = "(your path)/pdf.worker.js".
Here is the short answer: define PDFJS.workerSrc at the beginning of your code.
PDFJS.workerSrc = "(your path)/pdf.worker.js"
See the example in the documentation: https://mozilla.github.io/pdf.js/examples/#interactive-examples
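Putting it together, a minimal sketch (the worker path is a placeholder; data is the Uint8Array from the question):

PDFJS.workerSrc = "/js/pdf.worker.js"; // placeholder path to your copy of pdf.worker.js

PDFJS.getDocument(data).then(function (pdf) {
    console.log("PDF loaded, " + pdf.numPages + " page(s)");
});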