I am developing an extension for Opera, which is an absolute first for me. I want to save the context in which a user highlighted some text, so that when the user refreshes the site or reopens it later, the same text is highlighted again. I execute a JS script that does the highlighting and sends the result back to the background process, which stores it in an array (for now I only need it to persist through a single Opera session). Then, once any tab finishes loading, the background process has it run a different JS script that highlights any previously saved text from that webpage. To do this, I have a highlight object that currently holds the highlighted text, the source URL, and an ID. I tried passing the range that I used for highlighting the text, but as soon as the range object is sent to the background process, it is received as a generic object, which I cannot use.
So the problem I am facing right now is that once a page loads, I know what pieces of text had previously been highlighted on that page, but I don't have a way to highlight them.
I guess what I was trying to do (passing the range, or the start and end containers) didn't really make a lot of sense, but I can't think of another way to do it, nor can I find anything online to help.
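One possible way around the "generic object" problem, sketched here under assumptions: instead of sending the Range itself, send a plain description of it (child-index paths to the start and end containers plus the offsets), which survives message passing as ordinary JSON. The function names below (nodePath, resolvePath, serializeRange, resolveRange) are made up for this illustration and are not part of any extension API.

```javascript
// Path of child indices leading from root down to node, e.g. [1, 0].
function nodePath(node, root) {
  const path = [];
  while (node !== root) {
    const parent = node.parentNode;
    if (!parent) throw new Error("node is not inside root");
    path.unshift(Array.prototype.indexOf.call(parent.childNodes, node));
    node = parent;
  }
  return path;
}

// Walk a saved path back down from root; returns null if the page changed
// and the path no longer exists.
function resolvePath(root, path) {
  let node = root;
  for (const index of path) {
    node = node && node.childNodes[index];
  }
  return node || null;
}

// Turn a Range (or anything with the same four properties) into plain data.
function serializeRange(range, root) {
  return {
    startPath: nodePath(range.startContainer, root),
    startOffset: range.startOffset,
    endPath: nodePath(range.endContainer, root),
    endOffset: range.endOffset,
    text: range.toString(), // kept so the restored range can be sanity-checked
  };
}

// Look the containers up again on a freshly loaded page.
function resolveRange(saved, root) {
  const startContainer = resolvePath(root, saved.startPath);
  const endContainer = resolvePath(root, saved.endPath);
  if (!startContainer || !endContainer) return null;
  return {
    startContainer,
    startOffset: saved.startOffset,
    endContainer,
    endOffset: saved.endOffset,
  };
}
```

In the content script, the result of resolveRange(saved, document.body) could be fed to document.createRange() via setStart and setEnd, and the new range's toString() compared with saved.text before highlighting. Note that if the highlighting itself wraps text in new elements, it changes the paths of later nodes, so paths should be recorded against the unhighlighted page.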
Related
I'm trying to use JavaScript to scrape data from the following page, specifically the "free shipping free returns" text that appears when you hover your mouse over the cart icon:
Whenever I hover over the cart icon, new HTML is added to the DOM.
And when I move my mouse away, the previously added HTML goes away. I want to be able to parse data from the HTML that gets added without having the popup visible. How would I be able to scrape this text data even if someone does not hover over the cart icon? Is there a way to access all the HTML data at once?
You can try to catch the JavaScript function being executed when you hover your mouse over the cart icon. You can do this via the developer tools. Add break points to code execution if the DOM changes (on the parent element in which the new element is added).
Once you get the function, just execute it directly on that page and you'll probably be able to see the popup and extract its contents.
You could also try to simulate a hover as explained in these answers: How do I simulate a mouseover in pure JavaScript that activates the CSS ":hover"?
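A minimal sketch of the simulated-hover idea, assuming the popup is created by a JavaScript mouseenter or mouseover listener on the cart icon. The helper name simulateHover is made up for this example. As far as I know, synthetic events like these fire script listeners but do not switch on the CSS :hover state.

```javascript
// Dispatch the events a real hover would produce on `target`.
// Falls back to the generic Event constructor where MouseEvent is missing.
function simulateHover(target) {
  const EventCtor = typeof MouseEvent === "function" ? MouseEvent : Event;
  // mouseenter does not bubble; mouseover does.
  target.dispatchEvent(new EventCtor("mouseenter", { bubbles: false }));
  target.dispatchEvent(new EventCtor("mouseover", { bubbles: true }));
}
```

On the page, this would be called with the cart icon element, for example the result of a document.querySelector call with whatever selector matches the icon, after which the added HTML can be read from the DOM.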
Scraping a page for data is not usually recommended, since pages can change over time (especially ones whose markup is generated rather than written directly in HTML; these usually have CSS classes like 8h2H1).
If this is not supposed to be a long-term solution, the above answer by @nvkrjn is a good one. Or you can just check for an element with the id free-shipping-label.
But if this is supposed to be a long-term solution, then I would suggest using an API (this site doesn't seem to have one) or querying the database the way the page's JavaScript does. Also, if you're using a non-browser environment (e.g. BeautifulSoup), it may not run the JS required to get the data.
I need a quick way to get the image URL, just like I would get if I right-clicked on an image and selected "Copy Image URL". I'm thinking AppleScript, though others have mentioned JavaScript.
This needs to be compatible with an Automator workflow and needs to work with Google Chrome, Chromium, and Safari, at a minimum.
More specifics:
1. I already have an Automator workflow that this will be added to.
2. The workflow begins with text and images that I have selected on a webpage using the mouse.
3. The processing of the text is working fine.
4. I just need an AppleScript, JavaScript, or shell script (which I assume are the only kinds of outside code that can be added to an Automator workflow) that will grab any and all image URLs within the part of the page selected in step 2.
5. Images are NOT downloaded. Only the image URL is needed.
The basic logic is this:
Does selected input contain images?
If yes,
get URL of image(s)
pass to the next step
else continue
Any help or ideas appreciated!
OS X Services would be your best bet. Those work with text selections and are supported in most apps (e.g. see the Safari>Services submenu). You can also assign them keyboard shortcuts, which is very handy for repetitive tasks.
Basically, you want to get the selection as web content (i.e. HTML data, not plain text) then extract the URLs from that. You can create services in Automator, which includes various actions for working with web content, so I recommend starting there.
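The extraction step could look roughly like the sketch below, assuming an earlier step in the workflow has already handed you the selection as an HTML string. The function name extractImageUrls is made up, and the regular expression is a simplification that will miss unusual markup; how the HTML reaches this code in Automator is not shown.

```javascript
// Collect the src value of every <img> tag in an HTML fragment.
// Handles double-quoted, single-quoted and unquoted src attributes.
function extractImageUrls(html) {
  const pattern = /<img\b[^>]*?\ssrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/gi;
  const urls = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    const url = match[1] || match[2] || match[3];
    // Skip empty values and duplicates.
    if (url && urls.indexOf(url) === -1) urls.push(url);
  }
  return urls;
}
```

An empty result corresponds to the "else continue" branch of the logic in the question. Relative URLs are returned as written, so they would still need to be resolved against the page address.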
I'm working on a JS Chrome extension that allows the user to modify text on any web page. The challenge is that when the user visits the web page again, he/she should see the modified text rather than the original page's text. Let's assume the user can only modify paragraph elements (p).
In other words, on page load, the extension needs to scan the document, find the matching text, and modify it.
This is a challenging problem because:
- In between visits, the page could change
- There could be any number of text occurrences. For example, the string "I am a ninja" could appear 10 times in a page.
- Other extensions could modify the DOM as well (who knows what extensions the user has installed).
- This needs to work on ANY WEB PAGE
On a subsequent page visit, when the user needs to see his/her modified text - how would you go about determining what text to modify? Right now I'm doing simple string matching which is far from ideal.
Ideally, I would have a function which scans each element in the document and returns a degree of certainty (from 0 to 1) that this is the element the user modified.
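To make the idea of a 0 to 1 score concrete, here is one possible sketch. It swaps in a standard technique, the Dice coefficient over character bigrams, for the plain string matching, and breaks ties between equally similar paragraphs by how close they are to the position the paragraph had when it was saved. The names textSimilarity and rankCandidates, and the shape of the saved record, are assumptions made for this example.

```javascript
// Count overlapping two-character sequences in normalised text.
function bigrams(text) {
  const s = text.toLowerCase().replace(/\s+/g, " ").trim();
  const counts = new Map();
  for (let i = 0; i < s.length - 1; i++) {
    const gram = s.slice(i, i + 2);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// Dice coefficient: 1 for identical text, 0 when no bigram is shared.
function textSimilarity(a, b) {
  const A = bigrams(a);
  const B = bigrams(b);
  let total = 0;
  for (const n of A.values()) total += n;
  for (const n of B.values()) total += n;
  // Strings too short to have bigrams: fall back to exact comparison.
  if (total === 0) return a.trim() === b.trim() ? 1 : 0;
  let shared = 0;
  for (const [gram, n] of A) shared += Math.min(n, B.get(gram) || 0);
  return (2 * shared) / total;
}

// saved: { text, index } recorded when the user edited the paragraph.
// paragraphs: text of each <p> on the reloaded page, in document order.
function rankCandidates(saved, paragraphs) {
  return paragraphs
    .map((text, index) => ({
      index,
      score: textSimilarity(saved.text, text),
      distance: Math.abs(index - saved.index),
    }))
    .sort((x, y) => y.score - x.score || x.distance - y.distance);
}
```

Comparing the whole original paragraph rather than only the edited phrase gives repeated strings such as "I am a ninja" more surrounding context to differ on. A cutoff below which no element is accepted would still have to be chosen by experiment.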
FOOTNOTE:
I realize that there will be instances where the page will be modified completely and it will be impossible to find the element, but I'm not interested in those instances. In those cases, I will render those differently.
Simply thinking it over, I came up with this (as yet incomplete) solution:
Whenever the user selects the text to modify, right clicks and calls your extension, what you should do is:
Use the Selection and Range objects to get the nearest proper node (one having a class and id; if none is present, then simply the nearest node) that contains the range. I assume that the marked text cannot be within a textarea or input element. Then get the offset of the selected text, and grab the details of that nearest proper node, i.e. its class and id.
Store all this data in the synced or local (quota-limited) storage and then use it to re-apply the modification the next time the user visits the page.
Note that this assumes these proper nodes are not modified later. For example, if I mark some text in this answer and then delete some other part of the answer, shifting the marked text's offset to the left, the above solution would not work.
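The offset-shift problem can be softened a little. The sketch below assumes the stored record holds the marked text and its character offset within the node's text, and that the node itself has already been found again by its id or class. The function name relocate is made up for this example.

```javascript
// saved: { text, offset } recorded when the user marked the text.
// currentText: the text content of the same node on the reloaded page.
// Returns the offset where the marked text now starts, or -1 if it is gone.
function relocate(saved, currentText) {
  const { text, offset } = saved;
  if (!text) return -1;
  // Fast path: nothing before the marked text has changed.
  if (currentText.startsWith(text, offset)) return offset;
  // Otherwise pick the occurrence closest to where the text used to be.
  let best = -1;
  let i = currentText.indexOf(text);
  while (i !== -1) {
    if (best === -1 || Math.abs(i - offset) < Math.abs(best - offset)) best = i;
    i = currentText.indexOf(text, i + 1);
  }
  return best;
}
```

This still fails when the marked text itself was edited, and with repeated phrases the nearest occurrence is only a guess.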
I am replacing the showModalDialog function which no longer works in Chrome and FF. We have many applications using that. The problem is, pop up windows do post instructions to the web server and update the database. For instance if there's a list of accounts on screen and edit is clicked on one of the accounts, an edit page appears as a pop up, posts changes back to the web server, then the list is refreshed with changes. The entire list may be refreshed or just text that changed.
I made a javascript function to do pop up content using overlays. I thought it would be simple to replace showModalDialog calls with the javascript function, but I did not consider post instructions sent by the pop up page to update the database, and complexity to facilitate that. Posting can be done via ajax-like functionality, encapsulated in a set of functions. Before I start writing code to do this I'd like to know what other people have done in this circumstance. Thanks
I wrote some javascript to do everything I want. Since my pop up windows had javascript, I needed to run javascript upon rendering modal content, and also when the modal content went away. This will produce any number of overlays on top of each other, managing each. Content can optionally appear in a frame with a title bar, closely matching the functionality of showModalDialog.
Download at http://bikehappy.org/modal.html . If used, please give feedback saying if it works and provide update suggestions.
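For readers who want the general shape without downloading anything, here is a small sketch of the stacking idea described above. It is not the code from the link. The class name ModalStack and its callbacks are assumptions made for this example, and the actual overlay rendering is left to the onOpen and onClose callbacks.

```javascript
// Keeps track of nested overlays; the most recently opened one closes first.
class ModalStack {
  constructor() {
    this.stack = [];
  }

  // onOpen(depth) runs once the overlay is registered (render it there);
  // onClose(result) runs when it is dismissed (remove it, refresh the list).
  // The returned promise resolves with the value passed to close().
  open({ onOpen, onClose } = {}) {
    return new Promise((resolve) => {
      this.stack.push({ onClose, resolve });
      if (onOpen) onOpen(this.stack.length);
    });
  }

  // Close the topmost overlay; returns false if nothing is open.
  close(result) {
    const entry = this.stack.pop();
    if (!entry) return false;
    if (entry.onClose) entry.onClose(result);
    entry.resolve(result);
    return true;
  }

  get depth() {
    return this.stack.length;
  }
}
```

A caller that used to read the return value of showModalDialog could instead await the promise from open() and refresh the list once the edit form has posted its changes and called close().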
I'm trying to write a python web scraper that takes a pandora account and gets all the stations from it.
However, the stations do not all show up immediately, and I need to click the "show all" button to view all of the stations. Moreover, even after I click "show all", the source code remains unchanged!
My question is where is the html that displays these extra elements that are seemingly invisible?
Example)
if you go to http://www.pandora.com/people/nenadbach#tbl_stations_table,all
(the #tbl_stations_table,all makes all the stations show up; this is where the "show all" button takes you)
and view source, the stations after The Girl From Ipanema Radio aren't stored in the immediate source.
Thanks for the help!
If you view the source from Firebug (if you use Firefox) or Inspector (if you use Safari or Chrome) you can see that the data is there. It's most likely being pulled in via ajax (JavaScript).
You would need either a scraper that understands JavaScript, or to find the HTTP Ajax calls it's making and call them yourself. The call that you are probably looking for is:
http://www.pandora.com/favorites/profile_tablerows_station.vm?webname=nenadbach&countRowsOnBrowser=10&countRowsNeeded=25
Note that most likely this is using a cookie to detect who you are and what list to show.
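Calling that endpoint yourself could look roughly like this. It is sketched in JavaScript, but the same request can be made from a Python scraper. The function names stationsUrl and fetchStationRows are made up, what the two count parameters mean is a guess from their names, and whether the endpoint still responds this way is not something I can confirm.

```javascript
// Build the request URL quoted above for a given profile name.
function stationsUrl(webname, rowsNeeded, rowsOnBrowser = 10) {
  const url = new URL("http://www.pandora.com/favorites/profile_tablerows_station.vm");
  url.searchParams.set("webname", webname);
  url.searchParams.set("countRowsOnBrowser", String(rowsOnBrowser));
  url.searchParams.set("countRowsNeeded", String(rowsNeeded));
  return url.toString();
}

// Fetch the extra table rows as an HTML fragment.
// `cookie` is optional: pass your session cookie string if the server needs it.
async function fetchStationRows(webname, rowsNeeded, cookie) {
  const response = await fetch(stationsUrl(webname, rowsNeeded), {
    headers: cookie ? { Cookie: cookie } : {},
  });
  if (!response.ok) throw new Error("HTTP " + response.status);
  return response.text();
}
```

The returned fragment would then be parsed for station names the same way as the rows already present in the page source.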