Stop text being made HTML unless needed


I have the ability for a user to post some text into a message feed. They are able to type whatever they want.
The backend server detects that there are links and other linkable elements (like tagging someone else), then saves these links into a database.
When a user is served the full list of messages, they are shown what each user has posted, and anything that was meant to be a link is turned into a link.
For example, if someone typed www.foobar.com, the post will show <a href='www.foobar.com'>www.foobar.com</a>
This works great, but what if someone types HTML themselves? First, it will mess with the formatting, and second, it allows users to post their own HTML, which can be dangerous for obvious reasons.
I need a way of not turning the text that comes back from the API into HTML unless it was HTML that I intended. However I don't want to STOP someone trying to post HTML, just stop it turning into HTML
Twitter does it correctly. If I post this to Twitter:
<a href='www.foobar.com'>www.foobar.com</a>
It will show me exactly what you see there, but the www.foobar.com will be highlighted and clickable.
Although it may not be relevant, I am using an ASP.NET Web API backend and Knockout.JS for the front end.
UPDATE
I am using the following code to show the text (knockout JS observables)
<div data-bind="foreach: allMessages">
<span data-bind="html: theText"></span>
</div>

Your backend code should encode the incoming data, so that HTML tags are stored as plain text rather than as markup.
For example, URL-encoding <div>test</div> gives %3Cdiv%3Etest%3C/div%3E (HTML-entity encoding would give &lt;div&gt;test&lt;/div&gt; instead). This prevents unwanted markup from getting into your database.
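As a rough sketch of the "encode first, then add only the links you intend" idea, the snippet below escapes every piece of the raw text and wraps only the parts that look like URLs in an anchor. The function names (escapeHtml, renderMessage) and the simple URL pattern are assumptions for illustration, not part of the original post; a real backend would use its own link detection. The result could be fed to the Knockout html binding, whereas the text binding would display the string literally.

```javascript
// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: split the raw text on URL-like pieces, escape every
// piece, and wrap only the URL-like pieces in an <a> element.
function renderMessage(rawText) {
  // Deliberately simple pattern (an assumption): http(s):// or www. prefixes.
  const urlPattern = /((?:https?:\/\/|www\.)[^\s<>"']+)/g;
  // split() with one capturing group puts the matched URLs at odd indexes.
  return rawText
    .split(urlPattern)
    .map((part, i) => {
      const safe = escapeHtml(part);
      if (i % 2 === 0) return safe; // ordinary text: escaped, never markup
      const href = part.startsWith("www.") ? "http://" + part : part;
      return '<a href="' + escapeHtml(href) + '">' + safe + "</a>";
    })
    .join("");
}

// Usage: the typed tag is shown literally, the address becomes clickable.
console.log(renderMessage("<a href='www.foobar.com'>www.foobar.com</a>"));
```

Because the link markup is generated after escaping, the only HTML in the output is the HTML the code itself produced.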

Related

How to scrape data from website when the data is only accessible by pressing a button?

I want to scrape data from a website within my java-application. The data I want to collect is inside a html-table-element. I tried two different methods:
I tried to load the website with a BufferedReader into a String and collect the data from the String.
I tried to use Jsoup to get access to the exact html-element, but it's empty.
It turns out that the table exists, but it stays empty until the user presses a button (labeled "load raw data"). I inspected the source code of the webpage. When the user presses the button, a load_table() function is called, which loads the data into the table. The URL remains the same; otherwise I could have just used the other URL where the data is already loaded into the table. Does anyone have an idea how to scrape data from a website when the data only appears after the user presses a button once the page has loaded?
I'm not really a trained Javascript-coder, but I tried to look through the script which is executed after the user presses the button. It's kind of hard to understand for me but I made a pastebin of the script with a highlighting where I think the rows are added to the table if that helps. The code for the button is:
Load raw data
The code I use to access the html element with Jsoup would be (all the child(x) methods are called on different div-elements to go deeper into the html-document until I finally reach the table-element):
Jsoup.connect(url).get().body().children().get(5).child(0).child(4).child(1).child(1);
As I stated above, the element is empty. I hope the description of my problem is detailed enough and somebody has at least an idea of what I'm trying to say. Sorry for my clumsy expressions. Not a native speaker.

If you are familiar with Selenium WebDriver, you could use Selenium to load the page (and press the button), then pass the page source to BeautifulSoup.
html = driver.page_source  # driver: a Selenium WebDriver instance (Python bindings)
You could then parse the page from that string, I guess.

How does Google Calendar update the content of an email AFTER it is sent?

Google calendar invite emails will update after they are sent if the original event has been changed... how does Google achieve this? Is there a general technique for anyone to do this? Or is this only possible because Google owns both gMail/gCalendar and the two systems are integrated behind the scenes outside of SMTP?
My first guess was that they used an iframe or an image that was loaded when the email was opened, but inspecting the source of the gMail page doesn't show any signs of that.
Here's a screenshot of the updated text:
And here's the HTML for that section of the page when reading the email within gMail:

Note:
Inspect Element shows only the markup of the page as it stands after all dynamic operations, including Ajax, have run.
To check the actual source, you want to visit view-source:url.
Now, to the question:
That information is updated automatically at run time by JavaScript.
In the image, you used Inspect Element, which shows the markup of the live view, so you saw the updated content.
It is done through JavaScript DOM and text manipulation.
To verify this:
Click on the address bar.
Add view-source: before the URL, so that it looks like view-source:https://url
Then press Ctrl+F (or the equivalent shortcut) to search.
Search for <div id=":8hg", which will return 0 results.
view-source: loads the source of the file without any Ajax or JavaScript manipulation.
The div is not present in the source, so we can conclude that it is added dynamically.
Looking in more detail,
in the source, we can see a link https://www.google.com/calendar/event?action\u003dVIEW\u0026eid\u003db..... which is stored in an array.
The content is fetched from this link.
(I blacked out some text for privacy).
Based on what the URL returns, the content of the mail is updated.
To verify this:
In the mail, you can see This invitation is out of date
But in the view-source: page, search for This invitation is out of date and it will return 0 results.
So it is clear that Gmail fetches the calendar details through a call to the Google Calendar API.

I wonder if, on sending the email, they create an image at some URL and then remove it if the event changes, so that the email contains something like
<div id="updated"></div>
<img src="asdfawe" onerror="document.getElementById('updated').innerHTML = 'some text'"/>
Although I'm not sure they can use the onerror attribute (because email + JS = bad idea). The only other way would be to use the alt attribute and some CSS trickery, but I don't see how that could result in the inspected code.

How to remove hidden divs from view source of a page?

I have an HTML page that contains some hidden DIVs, and these DIVs are visible via "view source". They should not be visible to a user who views the source of the page.
How can this be done? Perhaps with JavaScript or some other solution?

You can't really prevent a div from being read, because if you do, there will be no render of it.
It can be encrypted and generated via javascript. But once it is generated, user will be able to see it clearly in computed source.

There is no way of doing what you want. The source (in case of HTML) is just text containing HTML markup. The show source view in the browser shows it to you as it came from the server with added syntax highlighting, but unlike the developer tools, it doesn't reflect any DOM changes done with Javascript. Even if some browser had a feature to prevent some parts of the source from being displayed, users will still be able to open it in another browser or download the HTML as a file and examine the source in a text editor.

JavaScript will only change the "computed source" so the client will still be able to see them. In order to really remove them you'll need to remove them server side.
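To make "remove them server side" concrete, here is a minimal sketch of a server-side render function that never writes the restricted section into the response unless the user is allowed to see it. The function name renderPage, the isAdmin flag, and the markup are all hypothetical; the original answers do not say what the hidden DIVs contain or how the page is generated.

```javascript
// Hypothetical server-side template function. Sections the user must not see
// are simply never added to the HTML that is sent to the browser.
function renderPage(user) {
  const parts = ["<h1>Dashboard</h1>"];
  if (user.isAdmin) {
    // Only authorized users ever receive this markup.
    parts.push('<div id="admin-tools">Admin tools</div>');
  }
  return parts.join("\n");
}

// Usage: a regular visitor's "view source" contains no trace of the section.
console.log(renderPage({ isAdmin: false }));
```

Unlike a display:none rule or client-side removal, there is nothing for the visitor to find, because the markup never leaves the server.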

You cannot really hide the source code, but you can encrypt it. Whatever you transmit from the server to the client ends up in the client's browser and can be seen somehow.
With a tool like the one I just googled, http://www.iwebtool.com/html_encrypter, it is possible to encrypt HTML.
It will encrypt your HTML code, and you can insert it via JavaScript later. Encryption will not ultimately hide it from someone keen on using debugging tools, but a "normal" user won't see it directly in the source.
Still, you should think about keeping information you want to hide from the user on the server side, in a session or something similar.

User-Generated-Content—where to start? Users submits content into divs which adds to the gallery

A user would click a submit button, and a function would create a div in a gallery on my site, which they could link to if they wanted to share that content specifically. The content is just embedded and hosted on other sites like YouTube, so the user would not actually be uploading any content or need an account. It's a free, open gallery: anyone could copy a URL, paste it into an input, and submit that content into a div in the gallery.
Any ideas where to start? Would this require php?

Well, if you're a super beginner or something, the first step would be to make your website just the way you want it, and inside these divs you can just put the URL that the user submitted instead of the content that URL points to. [If you can do this then I assume you wouldn't need to ask this question, so don't mind me treating you like a complete beginner]
How would you achieve this? Well, you'll definitely need to:
Use some sort of server-side language (PHP is a good choice) that allows you to read the input from the user (the POST request from the form he/she submits).
Check it for correctness / clean up the input / check for supported websites, etc.
Save this information somewhere (a database) so that you can get it back later.
The next steps would be to get the information from the database and show it on your gallery page the way you want it. This involves:
Getting whatever subset of information you want to display on a particular page from the database. Perhaps only cat-related things or something, I don't know.
Just displaying it in your divs using a for loop or something.
foreach ($subset as $url) {
    // Escape the stored URL so user input cannot inject markup.
    echo "<div>" . htmlspecialchars($url) . "</div>";
}
Then the last step would be to convert these links into actual videos / images or whatever, depending on the type of link. This can be done either client side using JavaScript or server side using PHP or some other language.
This is going to be a lot of manual work, looking through every website's API and figuring out how to convert a URL into a video, for example. Images are easy, but they may be hotlink protected, so you might have to go through an API there as well.
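For the "convert a URL into a video" step, a client-side sketch for YouTube might look like the following. It only handles the two common link shapes (youtube.com/watch?v=ID and youtu.be/ID) and builds an iframe pointing at the /embed/ID address; the function names youtubeId and embedHtml are made up for this example, and every other site would need its own handling.

```javascript
// Returns the video ID for the two URL shapes handled here, or null.
function youtubeId(link) {
  let url;
  try {
    url = new URL(link);
  } catch (e) {
    return null; // not a parseable absolute URL
  }
  const host = url.hostname.replace(/^www\./, "");
  let id = null;
  if (host === "youtube.com" && url.pathname === "/watch") {
    id = url.searchParams.get("v");
  } else if (host === "youtu.be") {
    id = url.pathname.slice(1);
  }
  // Accept only letters, digits, "_" and "-" so the ID is safe in an attribute.
  return id && /^[\w-]+$/.test(id) ? id : null;
}

// Builds the embed markup, or returns null for links that are not recognized.
function embedHtml(link) {
  const id = youtubeId(link);
  if (id === null) return null;
  return '<iframe src="https://www.youtube.com/embed/' + id +
    '" allowfullscreen></iframe>';
}

// Usage with a placeholder ID:
console.log(embedHtml("https://youtu.be/abc123"));
```

Links that are not recognized return null, so the gallery can fall back to showing the plain (escaped) URL.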

What is the best approach for storing YouTube content appended to user contributed content?

So I'm trying to implement functionality similar to Facebook's, where I include information from YouTube if a user's post contains a link, and when it is clicked the video is embedded.
I've accomplished it thus far, I'm just wondering how Facebook stores this information.
To me there are two options:
1) Have the post saved as normal (it is just plain text), and if the post contains a youtube link, append it on the fly in JavaScript whenever that content is viewed. However I know that when you post a link, Facebook gives you the option to change the title, description etc. Which leads me to..
2) Generate the HTML that would be otherwise appended when viewed and store it alongside the post at the database-insert level.
If so, doesn't that add a significant amount of information per post? What happens if you want to change the formatting of all YouTube content within posts on your site later on? Each one will be stored individually, which seems like it would be a pain.
What is the best way to manage & engineer this sort of functionality?
Cheers,

I'd store the information itself in the database, but not as HTML. Generate the HTML on the fly but store the data in a separate place. If you don't want to add too many extra database fields consider storing the information in some serialized form (like serialize() in PHP).
Anyway I would always keep information separated and never store auto-generated HTML unless it's some sort of cache that can be re-generated.
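A small sketch of that separation, assuming the video details are kept as structured data next to the post text and serialized as JSON (the JavaScript counterpart of PHP's serialize()). The field names (text, video.id, video.title) and the markup are invented for the example; the point is only that the stored row contains no HTML, so the presentation can change later without touching the data.

```javascript
// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// What would be written to the database: plain data, no markup.
function serializePost(post) {
  return JSON.stringify(post);
}

// HTML is generated on the fly from the stored data at display time.
function renderPost(serialized) {
  const post = JSON.parse(serialized);
  let html = "<p>" + escapeHtml(post.text) + "</p>";
  if (post.video) {
    html += '<div class="video" data-video-id="' + escapeHtml(post.video.id) + '">' +
      "<h4>" + escapeHtml(post.video.title) + "</h4></div>";
  }
  return html;
}

// Usage with made-up data:
const stored = serializePost({
  text: "Look at this",
  video: { id: "abc123", title: "Cats & dogs" },
});
console.log(renderPost(stored));
```

Changing how every video is displayed then means editing renderPost once, rather than rewriting HTML stored in every row.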

If you want the user to include the video within the text, store the link as HTML within that user's post, and output it as is from the database on the page. Then your users can edit their posts to decide whether to place the video before, after, in the middle, or not at all, and can change the details in HTML.
If you are showing the video in some standard way, then store the video link along with the post in a separate database column, and generate the HTML on the fly. You can have data in columns for size, colour, etc., but the flexibility will always be limited to what you decide to store: if there is a database column for colour then you are letting the user choose the colour, otherwise not.
So, the most flexible option is to let your users type HTML. If you think they aren't up to it, or you want to limit what they can format, you could use a JavaScript rich text editor of the type used on Stack Overflow, Wikipedia, etc., with buttons that edit the text in certain chosen ways. You could also store the post as XML, say in a chosen subset of HTML5 (anything that is valid in a certain container...), and transform it at presentation time.

To me this sounds like a problem that was taken too far. If you implement CKEditor in your post form, it should resolve the problem (if I understood it right), since in CKEditor you can embed an SWF/FLV and the output will be HTML. That gives the editor the power to decide exactly where they want the video, since they can add the link wherever they want in the form. Since the FLV/SWF comes with its metadata from YouTube, you don't need to save that data, just the link to the video.
