cheeriojs has error: exports.load.initialize - javascript

I'm using cheeriojs for web scraping. I run into a problem after loading the body into cheerio. I can see that the body is well-formed HTML, but I get what looks like an error, exports.load.initialize, and I can't select any elements with CSS selectors.
parseWebsite = function () {
    request.post(url, {
        followAllRedirects: true,
        headers: {
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
        },
        form: formval
    },
    function (error, response, body) {
        if (error) {
            return console.error(error);
        }
        var $ = cheerio.load(body);
        console.log('test');
        var table = $('#ContentPlaceHolder1_dgCRF'); // shown as table: exports.load.initialize
    });
};

I finally figured it out. I'm using WebStorm, and I think that "error" is just something WebStorm displays. It wasn't actually an error at all.

Related

When adding a Job to scheduler: Value cannot be null, Job class cannot be null?

My question is very similar to:
Quartz.net - "Job's key cannot be null"
However, my setup is different because I am using the REST API.
I am able to run a job when adding it through Startup.cs, but when I call the API to add a job from JavaScript, it fails with the error below:
ERROR:
System.ArgumentNullException: Value cannot be null. (Parameter 'typeName')
at System.RuntimeType.GetType(String typeName, Boolean throwOnError, Boolean ignoreCase, StackCrawlMark& stackMark)
at System.Type.GetType(String typeName)
at Quartz.Web.Api.JobsController.AddJob(String schedulerName, String jobGroup, String jobName, String jobType, Boolean durable, Boolean requestsRecovery, Boolean replace) in E:\Amit\DotNet\QuartzApi\QuartzApi\Controllers\JobsController.cs:line 108
at lambda_method14(Closure , Object )
at Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor.AwaitableResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeActionMethodAsync>g__Logged|12_1(ControllerActionInvoker invoker)
at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeNextActionFilterAsync>g__Awaited|10_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow(ActionExecutedContextSealed context)
at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.InvokeInnerFilterAsync()
--- End of stack trace from previous location ---
at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeFilterPipelineAsync>g__Awaited|19_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeAsync>g__Logged|17_1(ResourceInvoker invoker)
at Microsoft.AspNetCore.Routing.EndpointMiddleware.<Invoke>g__AwaitRequestTask|6_0(Endpoint endpoint, Task requestTask, ILogger logger)
at Microsoft.AspNetCore.Authorization.AuthorizationMiddleware.Invoke(HttpContext context)
at Microsoft.AspNetCore.Diagnostics.DeveloperExceptionPageMiddleware.Invoke(HttpContext context)
HEADERS
=======
Accept: application/json
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Connection: close
Content-Length: 83
Content-Type: application/json
Host: localhost:44379
Referer: https://localhost:44379/jobs.html
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36
sec-ch-ua: "Chromium";v="92", " Not A;Brand";v="99", "Google Chrome";v="92"
sec-ch-ua-mobile: ?0
origin: https://localhost:44379
sec-fetch-site: same-origin
sec-fetch-mode: cors
sec-fetch-dest: empty
SETUP:
In VS, I created the Quartz REST API and the front end in a single project. Running the project loads a web page listing the jobs, with the API running in the background.
All controller endpoints work except AddJob (i.e. get jobs, view job details, pause, resume, trigger, delete).
Dependency:
Quartz.Extensions.Hosting 3.3.3
JobsController.cs
quartznet/JobsController.cs at main · quartznet/quartznet · GitHub
[HttpPut]
[Route("{jobGroup}/{jobName}")]
public async Task AddJob(string schedulerName, string jobGroup, string jobName, string jobType, bool durable, bool requestsRecovery, bool replace = false)
{
var scheduler = await GetScheduler(schedulerName).ConfigureAwait(false);
var jobDetail = new JobDetailImpl(jobName, jobGroup, Type.GetType(jobType), durable, requestsRecovery);
await scheduler.AddJob(jobDetail, replace).ConfigureAwait(false);
}
HelloWorldJob.cs:
https://andrewlock.net/using-quartz-net-with-asp-net-core-and-worker-services/
Startup.cs: (Adds a job without API and runs it using trigger at start)
void ConfigureHostQuartz(IServiceCollection services)
{
services.AddQuartz(q =>
{
q.UseMicrosoftDependencyInjectionScopedJobFactory();
var jobKey = new JobKey("HelloWorldJob");
q.AddJob<HelloWorldJob>(opts => opts.WithIdentity(jobKey));
q.AddTrigger(opts => opts
.ForJob(jobKey)
.WithIdentity("HelloWorldJob-trigger")
.WithCronSchedule("0/5 * * * * ?"));
});
services.AddQuartzHostedService(
q => q.WaitForJobsToComplete = true);
}
Html/Javascript front end:
Following this example:
Tutorial: Call an ASP.NET Core web API with JavaScript | Microsoft Docs
<form action="javascript:void(0);" method="POST" onsubmit="addJob()">
<input type="text" id="add-name" placeholder="New job">
<input type="submit" value="Add">
</form>
<script>
function addJob() {
const addNameTextbox = document.getElementById('add-name').value.trim();
const item = {
jobType: "HelloWorldJob",
durable: true,
requestsRecovery: false,
replace: false
};
fetch(`${uri}/DEFAULT/${addNameTextbox}`, {
method: 'PUT',
headers: {
'Accept': 'application/json',
'Content-Type': 'application/json'
},
body: JSON.stringify(item)
})
.then(response => console.log(response))
.then(() => {
getJobs();
addNameTextbox.value = '';
})
.catch(error => console.error('Unable to add job.', error));
}
</script>
I have tried updating the API to include jobType in the URL, but then it gives a different error:
Job class cannot be null
at Quartz.Impl.JobDetailImpl.set_JobType(Type value)
You need to supply the assembly-qualified name as the job type. The problem is here:
jobType: "HelloWorldJob",
jobType should be something like "MyNameSpace.JobType, MyAssembly". You can probably print it to the console with Console.WriteLine(typeof(HelloWorldJob).AssemblyQualifiedName). You can ignore the version etc.; only the type name with namespace and the assembly name are needed.
Please also note that your setup has security implications as you allow CLR types to be passed from the UI.
API controller changes:
As mentioned by Marko above, jobType needs a fully qualified name. The assembly reference is not necessary in my case, however, because my jobs are in the same assembly.
[HttpPut]
[Route("{jobGroup}/{jobName}/{jobType}/{replace}/new")]
public async Task NewJob(string schedulerName, string jobGroup,
string jobName, string jobType, bool replace = false)
{
//Note: Job added without a trigger must be durable.
var scheduler = await GetScheduler(schedulerName).ConfigureAwait(false);
var jobDetail = new JobDetailImpl(jobName, jobGroup,
Type.GetType("QuartzApi.Jobs." + jobType), true, false);
await scheduler.AddJob(jobDetail, replace).ConfigureAwait(false);
}
JavaScript fetch query changes:
Removed the JSON body and added the extra parameters to the URL. Note that it's a job without a trigger. At a later stage jobType can be a variable; for now it's hard-coded in the fetch string.
function addJob() {
    // Keep a reference to the input element so it can be cleared afterwards.
    const addNameTextbox = document.getElementById('add-name');
    const jobName = encodeURIComponent(addNameTextbox.value.trim());
    fetch(`${uri}/DEFAULT/${jobName}/HelloWorldJob/false/new`, {
        method: 'PUT',
        headers: {
            'Accept': 'application/json',
            'Content-Type': 'application/json'
        }
    })
        .then(response => console.log(response))
        .then(() => {
            getJobs();
            // Clear the text box once the job list has been refreshed.
            addNameTextbox.value = '';
        })
        .catch(error => console.error('Unable to add job.', error));
}
Running the UI request now adds the job without a trigger (triggers will be worked on separately). To confirm, I then ran an API request in the browser to fetch all jobs for the running scheduler using:
[https://localhost:44379/api/schedulers/QuartzScheduler/jobs]
resulting in:
[{"name":"HelloWorldJob","group":"DEFAULT"},{"name":"TestJob","group":"DEFAULT"}]
That implies a few things:
Passing a JSON object in the body does not bind it to the API method's parameters. I need to put all parameters in the URL to use them. Maybe there is a way to use body parameters.
Now that the class is correctly referenced in the API, I can continue passing just the class name from the UI, without namespace and assembly, which keeps it more secure because the class is defined in the project at build time.
Adding Console.WriteLine in the API function did not produce any output at runtime.
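As a possible alternative to adding more route segments: ASP.NET Core model binding normally reads simple-typed action parameters (string, bool) from route values and the query string, so the original AddJob signature could likely be fed through query parameters. The sketch below only builds such a URL. The helper name buildAddJobUrl, the base URI and the assembly name "QuartzApi" are assumptions for illustration, not part of the original project.

```javascript
// Hypothetical helper: puts the simple-typed AddJob parameters into the
// query string instead of a JSON body.
function buildAddJobUrl(baseUri, jobGroup, jobName, params) {
    const query = new URLSearchParams();
    for (const [key, value] of Object.entries(params)) {
        query.set(key, String(value));
    }
    // Encode each path segment so job names with special characters stay valid.
    const path = [jobGroup, jobName].map(encodeURIComponent).join('/');
    return `${baseUri}/${path}?${query.toString()}`;
}

// Assumed base URI and assembly-qualified type name.
const addJobUrl = buildAddJobUrl(
    'https://localhost:44379/api/schedulers/QuartzScheduler/jobs',
    'DEFAULT',
    'TestJob',
    {
        jobType: 'QuartzApi.Jobs.HelloWorldJob, QuartzApi',
        durable: true,
        requestsRecovery: false,
        replace: false
    }
);
console.log(addJobUrl);
```

In the browser, the result would be passed to fetch(addJobUrl, { method: 'PUT' }) with no request body.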

When using PUT method body is not passed using Koa

I am trying to write an update function where a user can PUT new data and the data on the server gets updated. It's a simple task; however, when I try to PUT new data, the body is always undefined.
The data which gets sent:
request: {
method: 'PUT',
url: '/api/v1.0/articles/1',
header: {
'user-agent': 'PostmanRuntime/7.17.1',
accept: '*/*',
'cache-control': 'no-cache',
host: 'localhost:3000',
'accept-encoding': 'gzip, deflate',
'content-length': '98',
connection: 'keep-alive'
}
},
response: {
status: 404,
message: 'Not Found',
header: [Object: null prototype] {}
},
I also tried passing it as key-value pairs instead of the raw method. This is the body I am trying to send:
{
"title": "another article",
"fullText": "again here is some text hereto fill the body"
}
This is the function that should update the data, but it gets undefined from the PUT request.
router.put("/:id", updateArticle);
function updateArticle(cnx, next) {
let id = parseInt(cnx.params.id);
console.log(cnx);
if (articles[id - 1] != null) {
//articles[id - 1].title = cnx.request.body.title;
cnx.body = {
message:
"Updated Successfully: \n:" + JSON.stringify(updateArticle, null, 4)
};
} else {
cnx.body = {
message:
"Article does not exist: \n:" + JSON.stringify(updateArticle, null, 4)
};
}
}
I am using Postman with Body -> Raw | JSON. I should mention that all other methods work perfectly: delete, create, getAll, getById.
With a PUT or POST, the data is in the body of the request. You have to have some code (in your request handler or some middleware) that actually reads the body from the stream and populates the body property for you. If you don't have that, then the data is still sitting in the request stream waiting to be read.
You can see an example of reading it yourself here: https://github.com/koajs/koa/issues/719 or there is pre-built middleware that will do that for you.
Here are a couple of modules that provide that middleware for you:
https://github.com/dlau/koa-body
https://www.npmjs.com/package/koa-body-parser
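To illustrate what "reading the body from the stream" means, here is a minimal sketch using only Node's built-in stream module. The function name readJsonBody is hypothetical, and a fake request stream stands in for the real one so the example runs without a server. In a Koa handler the stream would be the underlying Node request, cnx.req.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: collects every chunk from a readable stream and
// parses the concatenated text as JSON.
function readJsonBody(stream) {
    return new Promise((resolve, reject) => {
        const chunks = [];
        stream.on('data', chunk => chunks.push(Buffer.from(chunk)));
        stream.on('error', reject);
        stream.on('end', () => {
            try {
                const text = Buffer.concat(chunks).toString('utf8');
                // An empty body is treated as an empty object.
                resolve(text.length > 0 ? JSON.parse(text) : {});
            } catch (err) {
                reject(err);
            }
        });
    });
}

// Stand-in for an incoming PUT request, delivered in two chunks.
const fakeRequest = Readable.from([
    '{"title": "another article", ',
    '"fullText": "again here is some text hereto fill the body"}'
]);

readJsonBody(fakeRequest).then(body => {
    console.log(body.title);
});
```

Inside updateArticle this would look roughly like const body = await readJsonBody(cnx.req), with the handler declared async. The pre-built middleware listed above does the same job with more care (size limits, content types), so it is usually the better choice.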

Setting up proxy in Chrome by my extension

I'm trying to make a Chrome extension that turns a proxy on and off in the browser. I wrote some JavaScript following the Chrome extensions documentation, and interestingly, the extension sometimes works correctly.
Here is some Go code that iterates through a list of HTTPS proxy addresses, makes HTTP requests to stackoverflow.com through each proxy and, if the response status code is 2xx (some of the addresses are invalid, deprecated, unavailable, banned, etc.), marks the proxy address as good and prints it.
The second block of code sets the chrome.proxy.settings property to one of the "good" proxy addresses found. For now I'm just copy-pasting it by hand.
The question is: why does the Go code filter out "bad" proxy addresses and give me a list of "good" proxies, while Chrome gives me ERR_PROXY_CONNECTION_FAILED when I try to set any of these "good" proxies?
EDIT: I added a User-Agent header in Go and a scheme: "https" field to the proxy config object in JavaScript.
Still not working.
// checkProxy returns the address and true if proxy address a is valid and
// an HTTP request sent through this proxy gets an OK (2xx) response.
// Example addresses:
// 185.132.179.108:1080
// 185.132.179.107:1080
func checkProxy(a string) (string, bool) {
	proxyUrl, err := url.Parse("http://" + a)
	if err != nil {
		return "", false
	}
	httpClient := &http.Client{
		Transport: &http.Transport{
			Proxy: http.ProxyURL(proxyUrl),
		},
	}
	req, err := http.NewRequest("GET", "http://stackoverflow.com", nil)
	if err != nil {
		return "", false
	}
	req.Header.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36")
	response, err := httpClient.Do(req)
	if err != nil {
		return "", false
	}
	defer response.Body.Close()
	// Reject any status code outside the 2xx range.
	if response.StatusCode < 200 || response.StatusCode > 299 {
		return "", false
	}
	return a, true
}
function setProxySettings(iip, pport) {
var config = {
mode: "fixed_servers",
rules: {
//https://www.proxy-list.download/HTTPS
// host: "107.191.63.8",
// port: 32482
singleProxy: {
scheme: "https",
host: iip,
port: parseInt(pport)
},
}
};
chrome.proxy.settings.set({
value: config,
scope: 'regular'
},
function () {});
}
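One difference between the two snippets is worth noting: the Go checker reaches each proxy over plain HTTP (it parses "http://" + address), while the extension declares the proxy scheme as "https", which asks Chrome to talk to the proxy server itself over TLS. Whether this mismatch is the cause of ERR_PROXY_CONNECTION_FAILED here is an assumption, not something the post confirms. The sketch below builds the config object with the scheme as an explicit parameter, defaulting to "http" to match what the Go code tested. The helper name buildProxyConfig is hypothetical.

```javascript
// Hypothetical helper: builds a chrome.proxy config for a single fixed proxy
// from a "host:port" string, validating the port before it reaches Chrome.
function buildProxyConfig(address, scheme = 'http') {
    const [host, portText] = address.split(':');
    const port = Number.parseInt(portText, 10);
    if (!host || !Number.isInteger(port) || port < 1 || port > 65535) {
        throw new Error(`Invalid proxy address: ${address}`);
    }
    return {
        mode: 'fixed_servers',
        rules: {
            singleProxy: { scheme, host, port }
        }
    };
}

// Same scheme the Go checker used when it marked the address as good.
const proxyConfig = buildProxyConfig('185.132.179.108:1080');
console.log(JSON.stringify(proxyConfig));
```

In the extension, the result would be passed as value: proxyConfig to chrome.proxy.settings.set, exactly as in setProxySettings above.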

Javascript: Cheerio is returning inconsistent results for anchor tag

I'm trying to scrape websites and grab their mailto links:
const url = "https://www.cverification.com/";
axios.get(url).then(({ data }) => {
const $_ = cheerio.load(data);
const mailToLink = $_('a[href^="mailto:"]');
console.log("maillllllllll: ", mailToLink);
if (!mailToLink || !mailToLink.length) {
console.log("NO EMAILLLL: ", url); // <------------ this prints
return;
}
const email = mailToLink.attr("href").replace("mailto:", "");
console.log("SUCCEEDEDDD", url, email);
});
However, Cheerio is returning a weird object for some of the links:
maillllllllll: initialize {
options:
{ withDomLvl1: true,
normalizeWhitespace: false,
xml: false,
decodeEntities: true },
_root:
initialize {
'0':
{ type: 'root',
name: 'root',
namespace: 'http://www.w3.org/1999/xhtml',
attribs: {},
This script works for some websites and not for others. When I visit https://www.cverification.com/ and run the code above line by line (just using jQuery) it works. What am I doing wrong?
As others in the comments have discovered, the site uses React, so the link is only inserted after React has rendered its components.
I fixed this by changing the User-Agent of my request:
const instance = axios.create({
headers: {
"User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/600.1.3 (KHTML, like Gecko) Version/8.0 Mobile/12A4345d Safari/600.1.4"
}
});
This fixed it!
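A side note on the extraction step in the question: replace("mailto:", "") leaves any ?subject=... query and percent-encoding in the result. A minimal sketch of a more tolerant parse, using the built-in URL class, might look like this. The helper name emailsFromMailto and the example.com addresses are placeholders, not taken from the scraped site.

```javascript
// Hypothetical helper: returns the address(es) contained in a mailto: href,
// or an empty array if the href is not a usable mailto link.
function emailsFromMailto(href) {
    try {
        const url = new URL(href);
        if (url.protocol !== 'mailto:') {
            return [];
        }
        // For mailto: URLs, pathname holds the recipients without the query.
        return url.pathname
            .split(',')
            .map(part => decodeURIComponent(part).trim())
            .filter(part => part.length > 0);
    } catch (err) {
        // Relative links and malformed escapes end up here.
        return [];
    }
}

console.log(emailsFromMailto('mailto:info@example.com?subject=Hello'));
```

With cheerio, the input would be the value returned by mailToLink.attr("href").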

how to add a basic keen.io event with javascript

I am trying to set up a basic example that sends a custom keen.io event via js. At the moment I do not need any presentation, visualisation, etc.
Here is the example that I created from another one I found online. I attempted several variations, and all of them work in Google Chrome, but none of them works in Firefox (38.0 for Ubuntu canonical - 1.0).
If I add the inline script (!function(a,b){a("Keen"...) to the head, as proposed in the manual, I do not get any errors in FF, but it seems that addEvent never gets called and it produces no response, neither "err" nor "res".
If I include the library from the CDN (d26b395fwzu5fz.cloudfront.net/3.2.4/keen.min.js), I get an error when the page is loaded:
ReferenceError: Keen is not defined
var keenClient = new Keen({
If I download the js file and serve it locally, after the button is clicked, I get the following error response:
Error: Request failed
err = new Error(is_err ? res.body.message : 'Unknown error occurred');
All of these attempts work from Chrome, but I need this to work from other browsers too.
I received a response from the keen.io team. It turned out that Adblock Plus was interfering with the script. After I disabled it, everything works in FF as in Chrome.
After some investigation it turned out that requests to http://api.keen.io were blocked by ABP's "EasyPrivacy" filter with this rule: keen.io^$third-party,domain=~keen.github.io|~keen.io
So, sending the request to an "in-between" server (a proxy) seems to be the only solution that I can see.
We have a somewhat specific use case (we need to track a static site and also have access to a Rails API server), but the solution we ended up using may be useful to someone.
error.html
<html>
<head>
<title>Error</title>
<script src="/js/vendor/jquery-1.11.2.min.js"></script>
<script src="/js/notification.js"></script>
<script type="text/javascript">
$(document).on('ready', function () {
try {
$.get(document.URL).complete(function (xhr, textStatus) {
var code = xhr.status;
if (code == 200) {
var codeFromPath = window.location.pathname.split('/').reverse()[0].split('.')[0];
if (['400', '403', '404', '405', '414', '416', '500', '501', '502', '503', '504'].indexOf(codeFromPath) > -1) {
code = codeFromPath;
}
}
Notification.send(code);
});
}
catch (error) {
Notification.send('error.html', error);
}
});
</script>
</head>
<body>
There was an error. Site Administrators were notified.
</body>
</html>
notification.js
var Notification = (function () {
var endpoint = 'http://my-rails-server-com/notice';
function send(type, jsData) {
try {
if (jsData == undefined) {
jsData = {};
}
$.post(endpoint, clientData(type, jsData));
}
catch (error) {
}
}
// private
function clientData(type, jsData) {
return {
data: {
type: type,
jsErrorData: jsData,
innerHeight: window.innerHeight,
innerWidth: window.innerWidth,
pageXOffset: window.pageXOffset,
pageYOffset: window.pageYOffset,
status: status,
navigator: {
appCodeName: navigator.appCodeName,
appName: navigator.appName,
appVersion: navigator.appVersion,
cookieEnabled: navigator.cookieEnabled,
language: navigator.language,
onLine: navigator.onLine,
platform: navigator.platform,
product: navigator.product,
userAgent: navigator.userAgent
},
history: {
length: history.length
},
document: {
documentMode: document.documentMode,
documentURI: document.documentURI,
domain: document.domain,
referrer: document.referrer,
title: document.title,
URL: document.URL
},
screen: {
width: screen.width,
height: screen.height,
availWidth: screen.availWidth,
availHeight: screen.availHeight,
colorDepth: screen.colorDepth,
pixelDepth: screen.pixelDepth
},
location: {
hash: window.location.hash,
host: window.location.host,
hostname: window.location.hostname,
href: window.location.href,
origin: window.location.origin,
pathname: window.location.pathname,
port: window.location.port,
protocol: window.location.protocol,
search: window.location.search
}
}
}
}
return {
send: send
}
}());
Example of sending a notification manually from JS code:
try {
// some code that may produce an error
}
catch (error) {
Notification.send('name of keen collection', error);
}
rails
# gemfile
gem 'keen'
#routes
resource :notice, only: :create
#controller
class NoticesController < ApplicationController
def create
# response to Keen.publish does not include an ID of the newly added notification, so we add an identifier
# that we can use later to easily track down the exact notification on keen
data = params['data'].merge('id' => Time.now.to_i)
Keen.publish(data['type'], data) unless dev?(data)
# we send part of the payload to a company chat; the channel depends on whether the exception originated in dev or production
ChatNotifier.notify(data, dev?(data)) unless data['type'] == '404'
render json: nil, status: :ok
end
private
def dev?(data)
%w(site local).include?(data['location']['origin'].split('.').last)
end
end
