I have a website with various applets/widgets/RSS feeds. How would I go about writing a cron job/script that can determine when one of these applets/RSS feeds was last updated and store that in a database?
I'd need to tell an update in one feed apart from an update in another. Some widgets contain only pictures, and one of them is the Twitter widget, so the content differs completely from one source to the next.
You could look for an element that carries a date. For example, an RSS feed has a pubDate element such as <pubDate>Mon, 28 Jul 2014 14:03:00 -0400</pubDate>. Alternatively, you could keep a database row for each source that records the last article parsed from its feed. There are many ways to do this; it might be worth looking on GitHub for an open-source RSS reader to see how it solves this problem.
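A minimal Node.js sketch of both ideas, assuming the cron job has already fetched each source's content as a string. latestPubDate pulls the newest pubDate out of a feed; for widgets that carry no dates (the picture widgets, the Twitter widget), contentFingerprint hashes the fetched content instead. The Map passed to recordCheck stands in for a database table with one row per source. All function names here are hypothetical, not from any library.

```javascript
const crypto = require("crypto");

// Return the newest <pubDate> in an RSS 2.0 document as a Date, or null if
// none parse. A regex is enough for a sketch; an XML parser is more robust.
function latestPubDate(rssXml) {
  let newest = null;
  for (const [, value] of rssXml.matchAll(/<pubDate>([^<]+)<\/pubDate>/g)) {
    // RFC 822 dates such as "Mon, 28 Jul 2014 14:03:00 -0400" parse in Node
    // and major browsers, although the ECMAScript spec does not require it.
    const time = Date.parse(value.trim());
    if (!Number.isNaN(time) && (newest === null || time > newest)) {
      newest = time;
    }
  }
  return newest === null ? null : new Date(newest);
}

// Fingerprint for widgets with no dates (pictures, the Twitter widget):
// a SHA-256 hash of whatever content the cron job fetched for that source.
function contentFingerprint(content) {
  return crypto.createHash("sha256").update(content).digest("hex");
}

// Record one check of one source. The Map stands in for a database table with
// one row per source (source_id, fingerprint, last_updated); lastUpdated only
// moves when that source's fingerprint changes. Returns true on a change.
function recordCheck(store, sourceId, fingerprint, checkedAt = new Date()) {
  const row = store.get(sourceId);
  if (row && row.fingerprint === fingerprint) {
    return false;
  }
  store.set(sourceId, { fingerprint, lastUpdated: checkedAt });
  return true;
}
```

For an RSS source you could pass latestPubDate(xml).toISOString() as the fingerprint (when it is not null); for any other widget, contentFingerprint(html). Because every source has its own row, an update in one feed never changes the timestamp of another.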
Related
I have a table on one of my webpages that can be filtered using a search bar.
The script works as intended except for one problem.
Each row of the table contains an image of a flag. All of these images were uploaded to my site in 2018, so 2018 appears in every image's src URL. This means that if a user searches for 2018, the table stays unfiltered, because the script matches the 2018 in the image src URLs.
Is it possible to make the script ignore those image URLs?
If you search for 2018 in the search bar on WEBSITE REMOVED, you'll see my issue. Search for other years and you'll see that it works fine.
The script I'm using is:
SCRIPT REMOVED
I would appreciate any help on this.
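If you look at the string your script searches for each table cell, you'll see it includes the image URL. That happens when the cell's markup (for example its innerHTML) is searched; textContent contains only text nodes, not attribute values such as an img src.
The solution is to be more specific about which text you match against, for example only the text of the link in the relevant cell: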
rows[0].cells[2].getElementsByTagName("a")[0].textContent
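If you would rather keep searching the whole row, here is a minimal sketch, assuming the rows sit in the table's tbody: it matches the query against the cells' textContent, which is built from text nodes and does not include attribute values like an img src. The names matchesQuery and filterTable are hypothetical, and filterTable needs a browser DOM.

```javascript
// Case-insensitive "does this text contain the query?" check.
function matchesQuery(text, query) {
  return text.toUpperCase().includes(query.trim().toUpperCase());
}

// Browser-only: hide each body row whose cell text does not contain the query.
// textContent is built from text nodes, so an <img src="..."> URL is ignored.
function filterTable(table, query) {
  for (const row of table.tBodies[0].rows) {
    const text = Array.from(row.cells, (cell) => cell.textContent).join(" ");
    row.style.display = matchesQuery(text, query) ? "" : "none";
  }
}
```

You could wire it up with something like searchInput.addEventListener("input", () => filterTable(table, searchInput.value)), where searchInput and table are your own elements. An empty query matches every row, so clearing the search box shows the whole table again.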
I am pretty new to programming and kindly ask for your help.
I am currently analyzing the travel restrictions imposed by countries during the COVID outbreak. For the analysis I need to download a summary table from an online Power BI dashboard, but it has no download option and no static URL. The table is in the 'travel restriction' section of the website.
The data is loaded as you scroll the page, and you can't reach the whole dataset until you have scrolled for quite a long time. I tried a simple 'select all' (Ctrl+A) to copy at least part of the dataset, but that doesn't work either; it copies just one row.
Any ideas how to scrape the dataset? I would greatly appreciate any tips or solutions.
On the same page there is the DataSet:
Clicking it takes you to an Excel download of the data. There you will find the restriction data you are looking for (at least, that is what I assume):
There are some old answers, but they seem to be based on the client's current date.
I just need a DIV to be displayed if the article's posted (published) date is on or after 2/25/2017. The platform is Google Blogger, so PHP is not an option.
Thanks
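A minimal client-side sketch that compares the post's own published date rather than the reader's clock, assuming the template can write each post's published timestamp into the page as an ISO 8601 string. The data-published attribute and the .post-notice class (the DIV, hidden by default) are hypothetical names, not Blogger features.

```javascript
// Cutoff from the question: show the DIV for posts published on or after 2/25/2017.
const CUTOFF = "2017-02-25";

// Compare only the calendar-date part (YYYY-MM-DD) of an ISO 8601 timestamp.
// Zero-padded ISO dates sort lexicographically in chronological order, and the
// blog's own date is used, so the reader's clock and time zone do not matter.
function isOnOrAfterCutoff(publishedIso, cutoff = CUTOFF) {
  return publishedIso.slice(0, 10) >= cutoff;
}

// Browser-only: reveal the hypothetical .post-notice DIV inside every element
// carrying a hypothetical data-published attribute on or after the cutoff.
function showNotices(root = document) {
  for (const post of root.querySelectorAll("[data-published]")) {
    const notice = post.querySelector(".post-notice");
    if (notice && isOnOrAfterCutoff(post.dataset.published)) {
      notice.style.display = "block";
    }
  }
}
```

Call showNotices() once the page has loaded. Comparing date strings instead of Date objects sidesteps time-zone conversion, since a date-only string like "2017-02-25" would otherwise be parsed as midnight UTC.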
I am new to web scraping, and so far I only know how to scrape basic HTML pages using Python and Beautiful Soup. What I want is to extract the information on this page. Specifically, I would like to get the following data for all the fellows (around 700 of them):
name
background
insight project
current employer
However, the page is rendered by JavaScript, and the desired information only shows up in a separate box when a mouseover event is triggered on each fellow's picture.
How can I extract the text in this case? Any information (books, web resources) is appreciated. Python solutions are preferred if possible. Many thanks.
Check the page source of the website.
The information is already present in the DOM, just hidden using CSS. At first glance, the JavaScript logic seems to do nothing but CSS manipulation.
The fact that the information is hidden by CSS will not prevent you from scraping it from the source using a web scraping tool.
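To check this quickly before writing a scraper, here is a minimal sketch you could paste into the browser console. The CSS selectors are hypothetical placeholders; replace them with the class names you find in the page source. It relies on textContent, which returns an element's text regardless of CSS, whereas innerText depends on rendering and can leave hidden parts out.

```javascript
// Hypothetical selectors: replace with the class names the site actually uses.
const FIELD_SELECTORS = {
  name: ".fellow-name",
  background: ".fellow-background",
  project: ".fellow-project",
  employer: ".fellow-employer",
};

// Read one field's text, or null if the card has no such element.
// textContent ignores CSS, so text inside display:none boxes is returned too.
function fieldText(card, selector) {
  const el = card.querySelector(selector);
  return el ? el.textContent.trim() : null;
}

// Build one record per fellow card (works on a NodeList or an array).
function extractFellows(cards, selectors = FIELD_SELECTORS) {
  return Array.from(cards, (card) => {
    const record = {};
    for (const [field, selector] of Object.entries(selectors)) {
      record[field] = fieldText(card, selector);
    }
    return record;
  });
}

// Usage in the browser console (".fellow-card" is also a hypothetical selector):
// console.table(extractFellows(document.querySelectorAll(".fellow-card")));
```

Beautiful Soup does not apply CSS either, so if the fields are in the HTML source as suggested above, the same hidden elements are present in the tree it builds and can be selected the same way.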
In the screenshot at the following link, the black box labeled #timestamp shows a date and time (sorry, you need 10 reputation to post an image):
https://www.elastic.co/assets/bltde09cbafb09b7f08/Screen-Shot-2014-09-30-at-4.07.15-PM.png
I have a Kibana 4 dashboard embedded in a web page, and I would like to access these timestamps. Ideally, when a node is clicked, its timestamp would be printed to the console.
I have already suppressed the same-origin policy.
Has anyone performed something similar? I would appreciate any solution or insight.
Try to grab it using jQuery's :contains() selector:
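// Run after the dashboard has rendered. :contains() also matches every ancestor
// (e.g. <body>), so keep only the innermost elements whose text has "#timestamp".
$(':contains("#timestamp")').not(':has(:contains("#timestamp"))').on('click', function () {
  // Look next to the label for the value; the date text contains a comma.
  var info = $(this).parent().find(':contains(",")').last().text();
  console.log(info);
});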
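If you need the value as a Date rather than a string, here is a minimal sketch, assuming the timestamp is rendered like "September 30th 2014, 16:07:15.000" (moment.js format MMMM Do YYYY, HH:mm:ss.SSS, which I believe is Kibana's default dateFormat; check the advanced settings on your install). The function name is hypothetical.

```javascript
const MONTHS = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

// Parse text like "September 30th 2014, 16:07:15.000" into a Date in the
// browser's local time zone. Returns null if the assumed format is not found.
function parseKibanaTimestamp(text) {
  const m = /([A-Z][a-z]+) (\d{1,2})(?:st|nd|rd|th) (\d{4}), (\d{2}):(\d{2}):(\d{2})\.(\d{3})/.exec(text);
  if (!m) return null;
  const month = MONTHS.indexOf(m[1]);
  if (month < 0) return null;
  // Groups 2..7: day, year, hours, minutes, seconds, milliseconds.
  const [day, year, hours, minutes, seconds, ms] = m.slice(2).map(Number);
  return new Date(year, month, day, hours, minutes, seconds, ms);
}
```

Inside the click handler you could then log parseKibanaTimestamp(info) instead of the raw string.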