What I want
I want to be able to copy/paste the entire content of a chat to memory so I can extract the YouTube URLs it contains.
What I know
As you may know, group chats run on a separate URL and are loaded page by page. Normally you go to the previous page either by simply scrolling upwards, or by clicking a 'show previous' link (it works differently on different devices, I think).
Things I tried
Sadly I can't find the URLs for either anymore, but ...
Add a script to Chrome console
The point was to add a script that went looking for the 'show previous' link and clicked it.
Add a start=0 parameter to the URL
This assumes you can find out the actual URL, either manually or through something like Fiddler.
The idea was that you add something like ?start=0 to the URL. This would cause the paging to start from the very first record and load everything.
Neither solution worked.
Possibly this is because Facebook made these options obsolete. It's my impression that Facebook initially provided more dev options than it does now.
My question
What can I do to fully load chat content?
Not really sure what this has to do with C#, but I'll give a C# solution anyway. My solution would be to use something like HtmlAgilityPack to get the InnerHtml from a page once it's loaded. This will obviously require some kind of authentication, so I suggest using something like a WebClient and sending auth credentials along with whatever you're doing, or creating a method to log in first. Then use the same WebClient to access the chats via URL, call DownloadString() to get the contents of the page, and use HtmlAgilityPack's methods to get the InnerHtml of whatever the chat box is called/identified as.
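If C# turns out not to matter, the same login-then-scrape flow can be sketched in Python with requests and BeautifulSoup. Note the login URL, form field names, and element id below are placeholders for illustration, not Facebook's real ones:

import requests
from bs4 import BeautifulSoup

session = requests.Session()

# Hypothetical login: post credentials once; the session keeps the cookies
session.post("https://example.com/login",
             data={"email": "you@example.com", "pass": "secret"})

# Fetch the chat page with the authenticated session
html = session.get("https://example.com/messages/123").text

# Parse out whatever element holds the chat content
soup = BeautifulSoup(html, "html.parser")
chat_box = soup.find("div", id="chat-box")  # placeholder id
print(chat_box.decode_contents() if chat_box else "chat box not found")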
Right now this is the nearest thing I can find:
https://www.facebook.com/help/community/question/?id=10200611181580779
There is a way to see your complete chat history on Facebook easily.
With this method you can also see photos or videos you've shared on
Facebook, your wall posts, etc. -- 'A copy of what you've shared on
Facebook'. Follow these steps:
Go to 'Account Settings'
Click on 'Download a copy of your Facebook data' at the bottom of the General section
Then click 'Start My Archive' -- It may take a little while to gather your photos, wall posts, messages, and other information.
(Usually 20 to 60 minutes)
Once the archive is generated, download it.
Extract and open 'index.html' from the downloaded folder
Now you can see 'Messages' at the bottom of the page; click it.
Done!
I got a response in my mail much faster than 20 minutes.
You will get an email with a link to a zip file containing your archive.
In the html folder you will find messages.htm.
From there I can write a script that looks for YouTube URLs in that file.
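A minimal sketch of that script in Python (the regex is rough, matching the common watch?v= and youtu.be forms, not every possible YouTube URL):

import re

# Read the archive file Facebook generated
with open("messages.htm", encoding="utf-8") as f:
    html = f.read()

# Match the common YouTube URL forms; deduplicate and sort
pattern = re.compile(
    r"https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)[\w-]+")
for url in sorted(set(pattern.findall(html))):
    print(url)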
Related
Python novice here.
I am trying to scrape company information from the Dutch Transparency Benchmark website for a number of different companies, but I'm at a loss as to how to make it work. I've tried
pd.read_html("https://www.transparantiebenchmark.nl/en/scores-0#/survey/4/company/793")
and
requests.get("https://www.transparantiebenchmark.nl/en/scores-0#/survey/4/company/793")
and then working from there. However, it seems like the data is dynamically generated/queried, and thus not actually contained in the html source code these methods retrieve.
If I go to my browser's developer tools and copy the "final" html as shown there in the "Elements" tab, the whole information is in there. But as I'd like to repeat the process for several of the companies, is there any way to automate it?
Alternatively, if there's no direct way to obtain the info from the html, there might be a second possibility. The site allows to download the information as an Excel-file for each individual company. Is it possible to somehow automatically "click" the download button and save the file somewhere? Then I might be able to loop over all the companies I need.
Please excuse if this question is poorly worded, and thank you very much in advance
Many thanks!
Edit: I have also tried it using BeautifulSoup, as @pmkroeker suggested. But I'm not really sure how to make it work so that it first runs all the JavaScript, so the site actually contains the data.
I think you will want to either use a library to render the page, or find the site's own data API (see further below). This answer seems to apply to Python. I will also copy the code from that answer for completeness.
You can pip install selenium from a command line, and then run something like:
from selenium import webdriver
from urllib.request import urlopen  # Python 3; the original answer used Python 2's urllib2

url = 'http://www.google.com'
file_name = 'C:/Users/Desktop/test.txt'

# Download the raw, unrendered HTML and save it to disk
conn = urlopen(url)
data = conn.read()
conn.close()

with open(file_name, 'wb') as f:  # the response body is bytes
    f.write(data)

# Open the saved file in a real browser so its JavaScript runs,
# then read back the rendered DOM
browser = webdriver.Firefox()
browser.get('file:///' + file_name)
html = browser.page_source
browser.quit()
I think you could probably skip the file write and just pass it to that browser.get call, but I'll leave that to you to find out.
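If that works, it would look something like this (a sketch, not tested against this particular site):

from selenium import webdriver

# Load the live page so its JavaScript executes, then read the rendered DOM
browser = webdriver.Firefox()
browser.get('https://www.transparantiebenchmark.nl/en/scores-0#/survey/4/company/793')
# The dynamic content may take a moment to appear; a WebDriverWait
# on a known element would be more robust than reading immediately
html = browser.page_source
browser.quit()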
The other thing you can do is look for the Ajax calls in your browser's developer tools. E.g. in Chrome: the three dots -> More tools -> Developer tools, or press F12. Then look at the Network tab. There will be various requests. Click one, click the Preview tab, and go through each until you find a response that looks like JSON data. You are effectively looking for the API calls the site uses to fetch the data it renders. Once you find one, click the Headers tab and you will see a Request URL.
e.g. this https://sa-tb.nl/api/widget/chart/survey/4/sector/38 returns lots of data
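Once you have such a URL you can query it directly, something like this (a sketch; it assumes the endpoint returns JSON and needs no authentication):

import requests

# Hit the discovered endpoint directly instead of scraping the rendered page
resp = requests.get("https://sa-tb.nl/api/widget/chart/survey/4/sector/38")
resp.raise_for_status()
data = resp.json()  # assumes a JSON response
print(data)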
The problem here is that it may or may not be repeatable (the API may change, IDs may change). You may have a similar problem with plain HTML scraping, as the HTML could change just as easily.
I have a Blogspot blog (Blogger), and have just noticed that my posts are being scraped (illegally copied) onto another site by some low-life.
I have 2 options:
Just ignore it
Try to trick them
I would like to put a script on my blog posts that looks for the domain name of my blog and, if it is not correct, redirects the viewer to my blog.
Is this possible? Will it work?
I am hoping that the scrape method being used is just a copy-paste method, and would like to redirect anyone who visits the offending site back to me (the original content creator).
I know that they could just remove the script, but I would still like to know if it can be done. I would like to see if it works.
if (window.location.hostname !== "YOURDOMAIN") window.location.href = "YOURSITE";
Comparing the hostname (rather than the full href) means your own visitors are not redirected when they are on a different page of your site. This should work fine if they copy the entire HTML.
If they just copy the text, this won't work at all.
Getting your articles copied sucks, I hope you'll be able to resolve it.
First of all, sorry for the lame question (probably). I tried to search for an answer but I'm not finding everything I need for my issue.
So... I have a Bootstrap website and I am trying to change the page URLs to appear like this:
For example: www.site.com/AboutUs.html should appear as www.site.com/about-us.
I am using the pushState method for this, as follows:
var stateObj = { AboutUs: "about-us" };
history.pushState(stateObj, "About Us", "about-us");
So I get the needed URL there (www.site.com/about-us).. so far so good. But on page refresh it throws an error stating "The requested URL /about-us was not found on this server."
If I hit the browser back button it goes to www.site.com/AboutUs.html again (and it is supposed to go to the home page).
My question is :
What am I missing? Am I supposed to make a controller, and how?
I am not using C#. I could probably use some help with PHP, because I am not good at it. JavaScript / jQuery answers are welcome.
Thanks in advance and sorry for the dumb question.
Happy days!
The point of pushState is to be able to say: "I have modified the page with JavaScript; the new state is what you would get if you just asked the server for this URL."
You shouldn't use JavaScript for this problem at all. You just want to have a page appear at a particular URL.
You need to configure the server to serve up the content you want on the URL you want.
What am I missing? Am I supposed to make a controller, and how?
You need something on the server to handle the URL /about-us.
"A controller" is something you would probably use if you were using the MVC architecture on the server … and it doesn't sound like you are.
More likely you will want to use an Alias, a tool like mod_rewrite, or simply to move the static file to a directory called about-us and rename it index.html.
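For example, with Apache and mod_rewrite, a couple of lines like these in an .htaccess file would serve your existing file at the clean URL (a sketch; adjust the names to your setup):

# Serve AboutUs.html when /about-us is requested
RewriteEngine On
RewriteRule ^about-us/?$ /AboutUs.html [L]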
I have the following problem:
A blank HTML page on server 1.
A WordPress site on server 2.
What I need is to pull the content from www.wordpress.site/sample-page/ into the HTML page on server 1; not the entire page, only the part that I can edit from wp-admin, so without header and footer.
Also, I don't know if there is any other method, but I need it to be done via JavaScript/jQuery or Ajax.
I've used Google, but it is hard to find a tutorial for this. I've tried a lot of tutorials, but none is what I need, and I don't know that much JavaScript to make it work.
SO, can someone help me please?
BIG Thanks!
Andrei
Later edit:
I've found this working: http://jsfiddle.net/mdawaffe/hLWdH/
It works as written, but if I change the domain to mine, it will not work.
What script do I have to implement on the server from which the content is called (taken)?
For more information, as you asked:
I have an HTML + CSS + JS template that I will use with PhoneGap (if you don't know about it, try it, it's very useful) to create a mobile app for Android, iOS, and BlackBerry.
Now, I have this site: m.trafficvoice.ro (I hope I can post links here).
On the 'live stream' page (it's called services.html), I have an HTML5 audio tag/player.
What I need is to get the content from www.trafficvoice.ro/whatever-the-name-page, but only the part that I can edit in WordPress (so without header and footer).
Why? Because in the future there will be more streams to add, and maybe some of them will go down for unknown reasons, so I need to be able to update that page without updating the entire app, uploading it to the store, waiting for approval, waiting for the client to download it, etc.
Big thanks!
Andrei
Could you just use an iframe instead? (Cross-domain Ajax like in that jsfiddle is blocked by the browser's same-origin policy unless the WordPress server explicitly allows it.) You could modify a template in your theme to not display the header/footer and then use that in the iframe.
I inject JavaScript code into the page the user is currently viewing; on the user's command this script makes DOM changes. At the end of this interaction the user might want to save the page so that s/he can view/edit it later. I could remember the DOM changes the user made, but if the original page (at its source) changes, I will not be able to restore the page for the user. That is why I want to send the changed page to my server. I should be able to restore it completely, and the page should behave exactly the way it did (including scripts and media).
Additionally, I cannot store the media of the user's page at my end (a resource limitation), so I guess I have to parse and rewrite all media addresses/references/links to absolute URLs/URIs wherever they appear (HTML/CSS/JavaScript).
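For the rewriting part, I imagine something like this on my server (a sketch in Python with BeautifulSoup; it assumes the page's original URL is sent along with the upload, and it ignores URLs inside CSS and JavaScript):

from urllib.parse import urljoin
from bs4 import BeautifulSoup

def absolutize(html, base_url):
    # Rewrite relative src/href attributes against the page's original URL
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(src=True):
        tag["src"] = urljoin(base_url, tag["src"])
    for tag in soup.find_all(href=True):
        tag["href"] = urljoin(base_url, tag["href"])
    return str(soup)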
Now the question is: is there a library/framework/jQuery extension that can help me achieve this objective?
If not, what is the right/professional way to do it?
Since you are using jQuery you could try $("html").html(); just make sure to add the appropriate <html> tags when you output it again.
$('body').html()  // contents of <body>
$('head').html()  // contents of <head>
$('html').html()  // everything inside the <html> element
Download Firebug and try these in the console window on this page. I am getting what looks like the correct data back.
Have I got it right that you are building some kind of CMS that lets the user edit entire pages (not just separate content blocks) in contenteditable mode?
I would definitely advise looking at a solution like CKEditor/TinyMCE etc., because doing it all yourself will be a terrible pain.
The answer from @Sydenam should work fine to save the whole HTML page.
Meanwhile, and this is IMPORTANT, I would recommend you consider a potential SECURITY ISSUE here. The user can inject whatever they want into the DOM and have you save it, such as a nasty JavaScript function sending confidential information to a remote server.
So, in my perspective, a professional way of doing this would be to dedicate a PART of the DOM only to that usage, let's say a <div id='editable_div'> that you can load using $('#editable_div').load('your_url', parameters, etc...), and save afterwards using another AJAX call.
When saving it you can parse this chunk of HTML and make sure nothing nasty is inside (for example, strip <script> tags).
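For example, a minimal sketch of that check on the server, in Python with BeautifulSoup (using a parser rather than a regexp, since regexps are unreliable on HTML; the tag list is just an example):

from bs4 import BeautifulSoup

def strip_nasty(html):
    # Remove tags that could execute script when the page is restored.
    # This list is only an example, not an exhaustive blacklist;
    # inline event handlers (onclick=...) would still need handling,
    # so a vetted sanitizer library is safer in practice.
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(["script", "iframe", "object", "embed"]):
        tag.decompose()
    return str(soup)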
Hope it helps,
Regards,