
Rendering HTML from a python String in web2py
I am trying to render an anchor link in an HTML file generated server side in web2py, whose text is a hashtag like
#username
The link generates correctly; however, when I output it in my view with {{=link}}, the page does not render it as HTML. I have tried using
mystring.decode('utf-8')
and various other conversions. Passing the string to JavaScript and back to the page displays the link fine. Is there something specific about Python strings that does not communicate well with HTML?
In the controller the string is generated by the following functions:
import re

# code borrowed from Luca de Alfaro's UCSC CMPS 183 class examples
# REGEX is not shown in the original post; a pattern matching hashtags
# such as #username is assumed here
REGEX = re.compile(r'#(\w+)')

def regex_text(s):
    def makelink(match):
        # The title is the matched phrase, e.g. #username
        title = match.group(0).strip()
        # The page is the stripped title, 'username', in lowercase
        page = match.group(1).lower()
        return '%s' % A(title, _href=URL('default', 'profile', args=[page]))
    return re.sub(REGEX, makelink, s)

def linkify(s):
    return regex_text(s)

def represent_links(s, v):
    return linkify(s)
These replace each #username in the text with a link to that user's profile (so args(0) is the username), and the result is sent to the view by the controller:
def profile():
    # 'string' stands for the text being processed
    link = linkify(string)
    return dict(link=link)
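For illustration, here is a standalone sketch of the same substitution without the web2py helpers (the REGEX pattern is assumed, since the original post does not show it):

import re

REGEX = re.compile(r'#(\w+)')  # assumed hashtag pattern

def plain_linkify(s):
    # replace each #username with a plain HTML anchor to the profile page
    return REGEX.sub(
        lambda m: '<a href="/default/profile/%s">%s</a>' % (m.group(1).lower(), m.group(0)),
        s)

print(plain_linkify('hello #Alice'))
# prints: hello <a href="/default/profile/alice">#Alice</a>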

For security, web2py templates will automatically escape any text inserted via {{=...}}. To disable the escaping, you can wrap the text in the XML() helper:
{{=XML(link)}}
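Alternatively, you can mark the string as safe in the controller, so the view can keep using {{=link}}; a minimal sketch using the same XML() helper (the 'string' variable is from the question):

def profile():
    link = XML(linkify(string))
    return dict(link=link)

Either way, XML() tells the template engine the string is trusted HTML, so only apply it to markup you generate yourself, never to raw user input.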

Related

How to scrape AJAX based websites by using Scrapy and Splash?

I want to make a general scraper which can crawl and scrape all data from any type of website, including AJAX websites. I have searched the internet extensively but could not find any proper guide explaining how Scrapy and Splash together can scrape AJAX websites (covering pagination, form data, and clicking a button before the page is displayed). Every link I have found says that JavaScript websites can be rendered using Splash, but there is no good tutorial or explanation of how to use Splash to render JS websites. Please don't give me solutions involving full browsers (I want to do everything programmatically; headless browser suggestions are welcome, but I want to use Splash).
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class FlipSpider(CrawlSpider):
    name = "flip"
    allowed_domains = ["www.amazon.com"]
    start_urls = ['https://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=mobile']
    # note the trailing comma and closing parenthesis: rules must be a tuple
    rules = (Rule(LinkExtractor(), callback='lol', follow=True),)

    def parse_start_url(self, response):
        yield scrapy.Request(
            response.url,
            self.lol,
            meta={'splash': {'endpoint': 'render.html',
                             'args': {'wait': 5, 'iframes': 1}}})

    def lol(self, response):
        """
        Some code
        """
The problem with Splash and pagination is the following:
I wasn't able to produce a Lua script that delivers a new webpage (after a click on the pagination link) in the format of a response, rather than pure HTML.
So my solution is the following: click the link, extract the newly generated URL, and direct the crawler to that new URL.
So, on the page that has the pagination link, I execute
yield SplashRequest(url=response.url, callback=self.get_url, endpoint="execute", args={'lua_source': script})
with the following Lua script:
def parse_categories(self, response):
    script = """
    function main(splash)
        assert(splash:go(splash.args.url))
        splash:wait(1)
        splash:runjs('document.querySelectorAll(".next-page")[0].click()')
        splash:wait(1)
        return splash:url()
    end
    """
and the get_url function
def get_url(self, response):
    # the Lua script returned only the new URL, so the response body is that URL
    yield SplashRequest(url=response.body_as_unicode(), callback=self.parse_categories)
This way I was able to loop my queries.
Similarly, if you don't expect a new URL, your Lua script can just return the raw HTML, which you then have to pick apart with regex (which is bad), but this is the best I was able to do.
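For completeness, a minimal sketch of that regex fallback (the pattern and the callback names are illustrative, not from the original answer):

import re
import scrapy

def parse_raw_html(self, response):
    # the Lua script returned raw HTML, so pull hrefs out with a regex
    # (fragile; a real parser such as parsel or lxml is preferable)
    for href in re.findall(r'href="([^"]+)"', response.text):
        yield scrapy.Request(response.urljoin(href), callback=self.parse_item)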
You can emulate behaviors, like a click or a scroll, by writing a JavaScript function and telling Splash to execute that script when it renders your page.
A little example:
You define a JavaScript function that selects an element in the page and then clicks on it:
(source: Splash doc)
# Get button element dimensions with javascript and perform mouse click.
_script = """
function main(splash)
    assert(splash:go(splash.args.url))
    local get_dimensions = splash:jsfunc([[
        function () {
            var rect = document.getElementById('button').getClientRects()[0];
            return {"x": rect.left, "y": rect.top}
        }
    ]])
    splash:set_viewport_full()
    splash:wait(0.1)
    local dimensions = get_dimensions()
    splash:mouse_click(dimensions.x, dimensions.y)
    -- Wait split second to allow event to propagate.
    splash:wait(0.1)
    return splash:html()
end
"""
Then, when you make the request, you change the endpoint to "execute" and add "lua_source": _script to the args.
Example:
def parse(self, response):
    yield SplashRequest(response.url, self.parse_elem,
                        endpoint="execute",
                        args={"lua_source": _script})
You will find all the information about Splash scripting in the Splash scripting documentation.
I just answered a similar question here: scraping ajax based pagination. My solution is to get the current and last pages and then replace the page variable in the request URL.
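A minimal sketch of that page-variable idea (the CSS selector and the query parameter are hypothetical and depend on the target site):

import scrapy

def parse(self, response):
    # read the last page number from the pagination widget (selector assumed)
    last_page = int(response.css('.pagination a::text').getall()[-1])
    for page in range(2, last_page + 1):
        # replace the page variable in the request URL (parameter name assumed)
        yield scrapy.Request(response.urljoin('?page=%d' % page),
                             callback=self.parse)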
Also, the other thing you can do is look at the Network tab in the browser dev tools and see if you can identify any API that is called. If you look at the requests under XHR, you can see those that return JSON.
You can then call the API directly and parse the JSON/HTML response. Here is the link from the Scrapy docs: The Network-tool
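For instance, once you have spotted such an XHR endpoint, you can query it directly; a minimal sketch (the endpoint and field names are hypothetical):

import requests

# hypothetical JSON endpoint discovered under the XHR tab
resp = requests.get('https://example.com/api/items', params={'page': 1})
resp.raise_for_status()
for item in resp.json()['items']:  # field name assumed
    print(item['title'])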

How to parse url with underscoreJS

I have a URL in this format:
domain.com/update/item/1
How could I retrieve the number 1 from the URL (for instance, by parsing the URL) and make it available for use?
To give some more details about the scenario:
- the page rendered for that URL is a Backbone template called by a view
- I need that value 1 so I can use it in an HTML form
- it would be useful if there were a way to retrieve that value directly from the template page using inline code (for instance, using <% codeHere %>)

Rendering Jinja template in Flask following ajax response

This is my first dive into Flask + Jinja, but I've used HandlebarsJS a lot in the past, so I know this is possible but I'm not sure how to pull this off with Flask:
I'm building an app: a user enters a string, which is processed via python script, and the result is ajax'd back to the client/Jinja template.
I can output the result using $("body").append(response) but this would mean I need to write some nasty html within the append.
Instead, I'd like to render another template once the result is processed, and append that new template in the original template.
Is this possible?
My python:
from flask import Flask, render_template, request, jsonify
from script import *

app = Flask(__name__)

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/getColors')
def add_colors():
    user = request.args.get("handle", 0, type=str)
    # pass the handle to process_data (the original called it with an
    # undefined name, 'data')
    return jsonify(avatar_url=process_data(user))

if __name__ == '__main__':
    app.run()
There is no rule that your ajax routes have to return JSON; you can return HTML exactly as you do for your regular routes.
@app.route('/getColors')
def add_colors():
    user = request.args.get("handle", 0, type=str)
    return render_template('colors.html',
                           avatar_url=process_data(user))
Your colors.html file does not need to be a complete HTML page, it can be the snippet of HTML that you want the client to append. So then all the client needs to do is append the body of the ajax response to the appropriate element in the DOM.

render html page /#:id using flask

I'm using Flask and want to render html pages and directly focus on a particular dom element using /#:id.
Below is the code for default / rendering
@app.route('/')
def my_form():
    return render_template("my-form.html")
Below is the function called upon a POST request.
@app.route('/', methods=['POST'])
def my_form_post():
    # ...code goes here
    return render_template("my-form.html")
I want to render my my-form.html page as my-form.html/#output so that it directly focuses on the desired element in the DOM.
But trying return render_template("my-form.html/#output") fails with an error saying there's no such file, and trying @app.route('/#output', methods=['POST']) doesn't work either.
UPDATE:
Consider this web app, JSON2HTML - http://json2html.herokuapp.com/
What is happening: whenever a person clicks the send button, the textarea input and the styling settings are sent over to my Python backend flask-app as below.
@app.route('/', methods=['POST'])
def my_form_post():
    # ...code for converting JSON to HTML goes here
    return render_template("my-form.html", data=processed_data)
What I want: whenever the send button is clicked, the form data is POSTed and processed, and the same page is rendered again with a new parameter containing the processed_data to be displayed. My problem is rendering the same page with the fragment identifier #outputTable appended, so that after the conversion the page focuses directly on the output the user wants.
The fragment identifier part of the URL is never sent to the server, so Flask does not have it. It is supposed to be handled in the browser; within JavaScript you can access it as window.location.hash.
If you need to do your highlighting on the server then you need to look for an alternative way of indicating what to highlight that the server receives so that it can give it to the template. For example:
# focus element in the query string
# http://example.com/route?id=123
@app.route('/route')
def route():
    id = request.args.get('id')
    # id will be the given id, or None if not available
    return render_template('my-form.html', id=id)
And here is another way:
# focus element in the URL
# http://example.com/route/123
@app.route('/route')
@app.route('/route/<id>')
def route(id=None):
    # id is sent as an argument
    return render_template('my-form.html', id=id)
Response to the edit: as I said above, the fragment identifier is handled by the web browser; Flask never sees it. To begin, add <a name="outputTable">Output</a> to your template in the part you want to jump to. You then have two options. The hacky one is to write the action attribute of your form with the hashtag included. The better choice is to add an onload JavaScript event handler that performs the jump after the page has loaded.
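As a side note on the first option, Flask's url_for accepts an _anchor argument that appends a fragment to the generated URL, which keeps the hashtag out of hand-written markup (route names taken from the question):

from flask import url_for

# inside a request context:
action = url_for('my_form_post', _anchor='outputTable')  # -> '/#outputTable'
# used as the form's action, the browser keeps the fragment when the POST
# response renders, so it jumps to the element named "outputTable"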

Extract feeds from web page

I'm looking for a code snippet (language is not important here) that will extract all feeds (RSS, Atom, etc.) associated with a given page.
So the input is a URL and the output is a list of channels.
Completeness is important: if the page has any associated information channel, it should be found.
I'm asking primarily about what to look for in the HTML code, and where to look, to achieve completeness.
Thank you.
You find feeds in the head tag of HTML files. There they should be specified as link tags with an associated content type and an href attribute giving the feed's location.
To extract all feed urls from a page using python you could use something like this:
import urllib
from HTMLParser import HTMLParser  # Python 2; on Python 3 use html.parser and urllib.request

class FeedParser(HTMLParser):
    def __init__(self, *args, **kwargs):
        self.feeds = set()
        HTMLParser.__init__(self, *args, **kwargs)

    def handle_starttag(self, tag, attrs):
        # feeds are advertised as <link> tags with an RSS/Atom content type
        if tag == 'link':
            try:
                href = [attr[1] for attr in attrs if attr[0] == 'href'][0]
            except IndexError:
                return None
            else:
                if ('type', 'application/atom+xml') in attrs or ('type', 'application/rss+xml') in attrs:
                    self.feeds.add(href)

def get_all_feeds_from_url(url):
    f = urllib.urlopen(url)
    contents = f.read()
    f.close()
    parser = FeedParser()
    parser.feed(contents)
    parser.close()
    return list(parser.feeds)
This code would have to be extended quite a bit, though, if you want to cover all the quirky ways a feed can be added to an HTML page.
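As a usage example, calling it on a site's front page (URL illustrative) prints any advertised feed locations:

feeds = get_all_feeds_from_url('http://example.com/')  # hypothetical URL
for feed in feeds:
    print(feed)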
