Web Scraping with Javascript? - javascript

I'm having a hard time figuring out how to scrape this webpage to get this wedding list into my onepager. It doesn't seem complicated at first but when I get into the code, I just can't get any results.
I've tried ygrab.js, which was fairly simple and got me somewhere but then I can't seem to scrape the images and it only prints the output in the console (not much documentation to go on).
$(function() {
var $listResult = $('#list-result');
var kado = [];
var data = [
{
url: 'https://www.kadolog.com/fr/list/liste-de-mariage-laura-julien',
selector: '.kado-not-full',
loop: true,
result: [{
name: 'photo',
find: '.views-field-field-photo',
grab: {
by: 'attr',
value: 'src'
}
},
{
name: 'title',
find: '.views-field-title .field-content',
grab: {
by: 'text',
value: ''
}
},
{
name: 'description',
find: '.views-field-body .field-content',
grab: {
by: 'text',
value: ''
}
},
{
name: 'price',
find: '.price',
grab: {
by: 'text',
value: ''
}
},
{
name: 'remaining',
find: '.topinfo',
grab: {
by: 'text',
value: ''
}
},
{
name: 'link',
find: '.views-field-nothing .field-content .btn',
grab: {
by: 'attr',
value: 'href'
}
},
],
},
];
ygrab(data, function(result){
console.log(JSON.stringify(result, null, 2)); //photos = undefined
});
}); // closes $(function() { ... })
Then there's Node.js with Request and Cheerio (and I tried Crawler too), but I have no idea how node works.
var request = require("request");
This gives me an error in the console saying require is not defined. Fair enough, I added require.js to the scripts in my page. I got another error ("Uncaught Error: Mismatched anonymous define() module: ...").
My question is this: Is there a simple Javascript way (possibly without involving node?), to scrape the wedding list I'm trying to get? Or maybe a tutorial that resembles what I'm trying to do step by step ?
I'd be truly grateful for any help or advice.

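I think your only issue is the img selector: the current one targets the wrapper element rather than the img inside it, so there is no src attribute to grab.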
Change
{
name: 'photo',
find: '.views-field-field-photo',
grab: {
by: 'attr',
value: 'src'
}
},
To this
{
name: 'photo',
find: '.views-field-field-photo .field-content img',
grab: {
by: 'attr',
value: 'src'
}
},
I can't test this right now, but it should work.

Node.js is a separate application that executes JavaScript independently of a web page.
require is Node's way of importing packages, and it isn't defined by the browser. require.js is a JavaScript library for loading modules in the browser, but it doesn't work the same way as Node's require function.
To use request and cheerio, you'd need to install Node.js from here, then install request and cheerio with the following commands:
npm install request --save
npm install cheerio --save
Then any code you write with Node.js in that directory will have access to the modules.
Here's a tutorial to web scraping in Node.js with cheerio.
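For the wedding list in the question, a minimal Node.js sketch might look like the following. It assumes request and cheerio were installed as described above, that the page is served as static HTML, and that the selectors taken from the question still match the markup. extractGifts and scrapeWeddingList are hypothetical names, not part of either library.

```javascript
// scrape.js: run with Node.js, not in the browser.

// Turns a cheerio-loaded document into an array of plain gift objects.
function extractGifts($) {
  return $('.kado-not-full').map(function (i, el) {
    const item = $(el);
    return {
      photo: item.find('.views-field-field-photo .field-content img').attr('src'),
      title: item.find('.views-field-title .field-content').text().trim(),
      description: item.find('.views-field-body .field-content').text().trim(),
      price: item.find('.price').text().trim(),
      link: item.find('.views-field-nothing .field-content .btn').attr('href')
    };
  }).get(); // .get() converts the cheerio result into a regular array
}

// Downloads the page and passes the extracted gifts to callback(error, gifts).
function scrapeWeddingList(url, callback) {
  // Loaded here so extractGifts stays usable without the network modules.
  const request = require('request');
  const cheerio = require('cheerio');
  request(url, function (error, response, body) {
    if (error) {
      return callback(error);
    }
    callback(null, extractGifts(cheerio.load(body)));
  });
}

// Example call (not run here):
//   scrapeWeddingList('https://www.kadolog.com/fr/list/liste-de-mariage-laura-julien',
//     function (error, gifts) { console.log(error || JSON.stringify(gifts, null, 2)); });
```

The result is ordinary JSON, so the script could write it to a file that the one-pager then loads.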

Related

Inline Js with Vue-Meta In Nuxt.js

Basically, I'm using Nuxt 2.9.2 and trying to use the innerHTML property to inline a Google Optimize script, but whenever I run npm run generate, the code gets transformed even though __dangerouslyDisableSanitizers is whitelisting innerHTML.
This is my Script in Nuxt Config head object
script: [
{
innerHTML: `(function(a,s,y,n,c,h,i,d,e){s.className+=' '+y;h.start=1*new Date;h.end=i=function(){s.className=s.className.replace(RegExp(' ?'+y),'')};(a[n]=a[n]||[]).hide=h;setTimeout(function(){i();h.end=null},c);h.timeout=c;})(window,document.documentElement,'async-hide','dataLayer', 500 , ${JSON.stringify(
{ [process.env.GOOGLE_OPTIMIZE_ID]: true }
)})`
}
],
__dangerouslyDisableSanitizers: ['innerHTML']
},
This renders as shown below. I tried multiple different ways but could not get it to inline as expected:
!function(e,n,t,a,c,s,d){n.className+=" "+t,s.start=1*new Date,s.end=d=function(){n.className=n.className.replace(RegExp(" ?"+t),"")},(e[a]=e[a]||[]).hide=s,setTimeout(function(){d(),s.end=null},500),s.timeout=500}(window,document.documentElement,"async-hide","dataLayer",0,{"GTM-XXXXXX":!0})
should be
(function(a,s,y,n,c,h,i,d,e){s.className+=' '+y;h.start=1*new Date;h.end=i=function(){s.className=s.className.replace(RegExp(' ?'+y),'')};(a[n]=a[n]||[]).hide=h;setTimeout(function(){i();h.end=null},c);h.timeout=c;})(window,document.documentElement,'async-hide','dataLayer', 500 , {"GTM-XXXXXX":true})
script: [
{
innerHTML: `window.MY_CONST = 'abcd1234'`,
type: 'text/javascript',
charset: 'utf-8',
},
],
__dangerouslyDisableSanitizers: ['script', 'innerHTML'],
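Applied to the Google Optimize snippet from the question, the same settings could be sketched as below. buildOptimizeHead is a hypothetical helper, and whether these settings stop the transformation during npm run generate is not confirmed in the thread. The sketch mainly shows the string that the template is expected to produce.

```javascript
// Hypothetical helper for nuxt.config.js: returns the parts of `head`
// that inline the Google Optimize anti-flicker snippet.
function buildOptimizeHead(optimizeId) {
  // Serialize the container map first so the snippet string stays readable.
  const containers = JSON.stringify({ [optimizeId]: true });
  const snippet =
    "(function(a,s,y,n,c,h,i,d,e){s.className+=' '+y;h.start=1*new Date;" +
    "h.end=i=function(){s.className=s.className.replace(RegExp(' ?'+y),'')};" +
    "(a[n]=a[n]||[]).hide=h;setTimeout(function(){i();h.end=null},c);h.timeout=c;" +
    "})(window,document.documentElement,'async-hide','dataLayer',500," + containers + ');';
  return {
    script: [{ innerHTML: snippet, type: 'text/javascript', charset: 'utf-8' }],
    // Same sanitizer settings as in the answer above.
    __dangerouslyDisableSanitizers: ['script', 'innerHTML']
  };
}

// In nuxt.config.js the ID would come from process.env.GOOGLE_OPTIMIZE_ID.
const head = buildOptimizeHead('GTM-XXXXXX');
console.log(head.script[0].innerHTML);
```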

bootboxjs invalid prompt type - example from documentation doesn't work

I am trying to use http://bootboxjs.com/examples.html#bb-prompt in my project, so naturally I started by using their example code to see if it runs.
As per the documentation, I am using the latest Bootstrap, jQuery and Bootstrap JS, and then loading bootbox.js. This is the code I am trying to run:
bootbox.prompt({
title: "This is a prompt with a set of radio inputs!",
message: '<p>Please select an option below:</p>',
inputType: 'radio',
inputOptions: [
{
text: 'Choice One',
value: '1',
},
{
text: 'Choice Two',
value: '2',
},
{
text: 'Choice Three',
value: '3',
}
],
callback: function (result) {
console.log(result);
}
});
When this code executes, I get this error:
Uncaught Error: invalid prompt type
This is a pretty sweet library and I'd love to use it in my project; but I am a bit stumped. Any ideas?
I've solved the issue. I had initially used just bootbox.min.js but it turns out I need to use bootbox.locales.min.js also. I should've rtfm...
Also, I use webpack (via laravel-mix) to bundle all the libs into one .js file, so using the bootbox.all.min.js (Production build with locales) as the last thing I load seemed to have helped also.
This is my webpack.mix.js config file that works for me:
const mix = require('laravel-mix');
mix
.copyDirectory('resources/libs/font-awesome/fonts', 'public/fonts')
.styles(
[
// snipped...
],
'public/css/app.css'
)
.scripts(
[
'resources/libs/jquery/jquery-3.4.1.min.js',
'resources/libs/bootstrap/bootstrap.bundle.min.js',
// bunch of other js libs...
'resources/libs/bootbox/bootbox.all.min.js',
],
'public/js/app.js'
);

Bad Request in Discord.js (Node) and cant find out whats causing it

I'm coding a bot in Discord.js (Node) and I'm trying to send an embed with the server info. I've got all the code, but it keeps causing a Bad Request and I've tried everything I know. Here's the code:
var FieldsData = [{ name: "Channels", value: msg.guild.channels.size }, { name: "Emojis", value: msg.guild.emojis.size }, { name: "Members", value: msg.guild.members.size }, { name: "Owner", value: msg.guild.owner }, { name: "Roles", value: msg.guild.roles.size }, { name: "Region", value: msg.guild.region }, { name: "Id", value: msg.guild.id }, { name: "Icon", value: msg.guild.iconURL }, { name: "Created At", value: msg.guild.createdAt }];
msg.channel.send('', {
embed: {
color: 37119,
title: "Server info for " + msg.guild.name,
fields: FieldsData
}
});
I've tried the message with just one field and it works,
I've tried it with each field by itself and it works,
but when I put them all together they make a Bad Request,
I've checked every line, every character and I'm just
stumped at what could possibly be causing this,
the max fields is 25 and I don't have that many,
all the variables are valid, none produce 'Null' or 'Undefined',
I've tried different setups of the code layout,
I've tried adding/removing parts, editing parts, replacing bits
here and there, but to no avail: I can't get it to work at all.
I've been trying to figure this out for 2 hours. I've searched online, in the docs, etc.
Please note: I'm not that advanced with JavaScript, so if I've made a big mistake then don't be surprised.
"msg" is the object of the message, Example:
Bot.on('message', function (msg) { /*Stuff*/ });
I hope I've explained this enough, I'm using the LATEST version of Discord.js at the time of posting this and I'm not using ANY other extensions, packages, etc
SHORT ANSWER:
Now, don't just ignore this post after I say this (actually read my reasons, the whole thing), but please just use a Rich Embed
LONG ANSWER:
First of all, I strongly suggest using Rich Embeds, as it is easier to play with and edit. Anyways, here:
The first suggestion comes from your message event. In ES6, we now have arrow functions which look like this (arg1, arg2) => {doSomething();}, and using this new feature, your message event handler should look more like this:
client.on('message', msg => {
//Do my thing with that msg object
});
Now back to the point.
Objects are weird k? I believe that this: "Server info for " + msg.guild.name is not allowed. I don't know why, but when I tried to use a variable to display my bot's version, it gave me an error too. So if you want to fix that you have two options:
Recommended: Use Rich Embeds
Not Recommended: Use `${myVar}` instead (Not Tested)
Don't overcomplicate. What is the empty '' argument for? You can just do msg.channel.send({embed:{}});
You don't just use variables for the sake of it. What is the point of using FieldsData? It is only used once, and why can't you just do:
msg.channel.send({
embed: {
color: 37119,
title: "Server info for " + msg.guild.name,
fields: [{name: "Channels", value: msg.guild.channels.size}, { name: "Emojis", value: msg.guild.emojis.size }, { name: "Members", value: msg.guild.members.size }, { name: "Owner", value: msg.guild.owner }, { name: "Roles", value: msg.guild.roles.size }, { name: "Region", value: msg.guild.region }, { name: "Id", value: msg.guild.id }, { name: "Icon", value: msg.guild.iconURL }, { name: "Created At", value: msg.guild.createdAt }]
}
});
VERY IMPORTANT NOTE:
Now, you don't give any valid reason why you don't want to use Rich Embeds, because a rich embed is also an object. ;-; So just use a rich embed.
i have scripts that require the use of objects, and they are pre made
I wonder how you get access to your embed if its not stored.... Very interesting.
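The thread never pins down the cause of the Bad Request. One thing that may be worth checking is the field values: the embed above mixes numbers, a Date and a member object, while Discord documents field values as non-empty strings of at most 1024 characters. The sketch below converts everything to strings before sending. toEmbedFields is a hypothetical helper, and this is an assumption about the cause, not a confirmed fix.

```javascript
// Hypothetical helper: builds embed fields from a plain { name: value } object,
// making sure every value is a non-empty string within the length limit.
function toEmbedFields(info) {
  return Object.keys(info).map(function (name) {
    const raw = info[name];
    const text = raw === null || raw === undefined ? '' : String(raw);
    return {
      name: name,
      // Fall back to a placeholder for empty values and cap the length at 1024.
      value: text.length > 0 ? text.slice(0, 1024) : 'N/A'
    };
  });
}

console.log(toEmbedFields({ Channels: 12, Icon: null }));

// With the message object from the question it might be used as (not run here):
//   msg.channel.send({
//     embed: {
//       color: 37119,
//       title: 'Server info for ' + msg.guild.name,
//       fields: toEmbedFields({ Channels: msg.guild.channels.size, Region: msg.guild.region })
//     }
//   });
```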

TypeError: c[a] is undefined in CKEditor

I am loading ckeditor.js file using $.getScript and in callback I am initiating CKEditor. But it is showing an error TypeError: c[a] is undefined. Here is my code. How can I solve this issue?
$.getScript("ckeditor.js", function (data, textStatus, jqxhr) {
if (textStatus == 'success' && jqxhr.status == 200) {
CKEDITOR.replace( 'commentBox',
{
toolbar :
[
{ name: 'basicstyles', items : [ 'Bold','Italic','Underline','Strike','Subscript','Superscript','-','RemoveFormat' ] },
{ name: 'paragraph', items : [ 'NumberedList','BulletedList','-','Blockquote'] },
{ name: 'insert', items : [ 'Table','HorizontalRule','SpecialChar' ] },
{ name: 'styles', items : [ 'Styles','Format','Font','FontSize' ] },
{ name: 'colors', items : [ 'TextColor','BGColor' ] }
]
});
}
});
I was getting the same error in similar circumstances.
I checked the formatted source in Chrome and discovered that this was being caused by the Format plugin trying to load its labels from the CKEDITOR.language object.
Turns out I didn't have en-gb included in my build and apparently it won't automatically fall back to straight en. Adding English (United Kingdom) to the build corrected the issues.
Re. https://stackoverflow.com/a/50719171/6462713
I had the same issue, even though I had loaded all supported languages into the "/lang" folder. My problem was that CKEditor wasn't correctly identifying its own folder path, so I set a CKEDITOR_BASEPATH variable before loading CKEditor.
It's briefly said here: (but there might be other places where it's explained better.) http://docs.cksource.com/ckeditor_api/symbols/CKEDITOR.html#.basePath
Therefore implementation will be like this:
<script>
window.CKEDITOR_BASEPATH = 'http://example.com/path/to/libs/ckeditor/';
</script>
In my case I used window.CKEDITOR_BASEPATH = '/app/storereport/ckeditor/';
Then load the main ckeditor.js script. Hope this may help you.
<script type="application/javascript">
$(document).ready(function (){
CKEDITOR.replace( 'product_content' ); // ID of element
});
</script>
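Combined with the $.getScript approach from the question, the two suggestions could be sketched as follows. loadEditor is a hypothetical helper that receives the window object and a script loader as arguments. Pinning language to 'en' assumes that locale is part of the build, and whether this removes the c[a] error in every setup is not confirmed.

```javascript
// Hypothetical helper. `win` is the browser window and `getScript` is a
// function (url, callback), for example a thin wrapper around $.getScript.
function loadEditor(win, getScript, basePath, elementId) {
  // Must be set before ckeditor.js runs so the editor can find its own folders.
  win.CKEDITOR_BASEPATH = basePath;
  getScript(basePath + 'ckeditor.js', function () {
    // Assumption: forcing a UI language that is included in the build
    // avoids looking up labels in a locale that was never loaded.
    win.CKEDITOR.replace(elementId, { language: 'en' });
  });
}

// Browser usage (not run here):
//   loadEditor(window, function (url, done) { $.getScript(url, done); },
//     '/app/storereport/ckeditor/', 'commentBox');
```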

how to get stored options from `this.prompt` inside yeoman context?

Basically, Yeoman forces you to ask the developer for everything you need. It's a good thing that you can store an answer so that it is autocompleted in future runs. The point is that I don't want to ask the developer at all if he has already answered those questions.
Here is an example of a basic Yeoman generator (name will be saved and autocompleted later):
var yeoman = require('yeoman-generator');
module.exports = yeoman.generators.Base.extend({
init: function () {
var cb = this.async();
this.prompt([{
name: 'name',
message: 'your name:',
store: true,
}, {
name: 'moduleName',
message: 'module name:'
}], function (props) {
console.log(
props.name, // developer’s name
props.moduleName // module’s name
);
cb(); // tell Yeoman that the async prompt has finished
}.bind(this));
}
});
The question is how to get stored options from this.prompt inside the Yeoman context, to do something like this:
this.prompt([!this.name.stored && {
name: 'name', // so after first run this will never be asked again
message: 'your name:',
store: true,
}, {
name: 'moduleName',
message: 'module name:'
}], function (props) {
console.log(
props.name, // developer’s name
props.moduleName // module’s name
)
}.bind(this));
There's no public way to access the stored previous prompt answers.
If you want to cache some data and access it later, then use the storage functionality (this.config)
FWIW, the prompt cache is stored into the private this._globalConfig. I'm adding this detail for completeness, you probably shouldn't use it.
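A minimal sketch of the this.config approach: save the answer yourself after the first run, then leave the question out when a saved value exists. skipRemembered is a hypothetical helper. Note that this.config is stored per project in .yo-rc.json, so the answer is remembered per project rather than globally.

```javascript
// Hypothetical helper: drops the questions whose answer is already saved.
// `config` is anything with a get(key) method, such as a generator's this.config.
function skipRemembered(prompts, config, rememberedNames) {
  return prompts.filter(function (prompt) {
    const remembered = rememberedNames.indexOf(prompt.name) !== -1;
    return !(remembered && config.get(prompt.name) !== undefined);
  });
}

// Inside a generator method it might be used as (not run here):
//
//   const questions = skipRemembered([
//     { name: 'name', message: 'your name:' },
//     { name: 'moduleName', message: 'module name:' }
//   ], this.config, ['name']);
//   const props = await this.prompt(questions);
//   if (props.name !== undefined) {
//     this.config.set('name', props.name); // remember it for the next run
//   }
//   const name = props.name !== undefined ? props.name : this.config.get('name');
```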
You can pass a function as the default value; its first argument is the answers given so far in the current prompt session:
const answers = await this.prompt([
{
type: 'input',
name: 'projectName',
message: 'Your project id',
default: this.appname,
store: true
},
{
type: 'input',
name: 'projectTitle',
message: 'Your project title',
default: ({ projectName }) => projectName
},
])
You can add a config.js that stores the information as a config.json file in the user's home directory; the app can later read this file and use its values as defaults.
{
name: 'authorName',
message: 'What\'s your name?',
'default': self.defaultAuthorName
}
Check out https://www.npmjs.com/package/generator-yo-wordpress where the developer makes use of a config.js file.
