Link terms on page to Wikipedia articles in pure JavaScript - javascript

While browsing I came across this blog post about using the Wikipedia API from JavaScript, to link a single search term to its definition. At the end of the blog post the author mentions possible extensions including:
A plugin which auto links terms to Wikipedia articles.
This fits the bill perfectly for a project requirement I'm working on, but sadly I lack the programming skills to extend the original source code. What I'd like is to have a pure JavaScript snippet I can add to a webpage, that links all the terms on that webpage that have an article on an internal wiki to that wiki.
I know this might be asking for much, but the code looks like it's nearly there, and I'd be willing to add a bounty if anyone will do the remaining work for that virtual credit.. ;) I also suspect this might be of value to a few others, as I've seen similar requests but no working implementation (that's a mere JavaScript (and therefore portable) library/snippet include).
Here's a sample of the original source code, I hope anyone is able to add to this or point me to what I'd need to add if I were to implement this myself (in which case I'll share the code if I manage to put something together).
<script type="text/javascript"><!--
var spellcheck = function (data) {
var found = false; var url=''; var text = data [0];
if (text != document.getElementById ('spellcheckinput').value)
return;
for (i=0; i<data [1].length; i++) {
if (text.toLowerCase () == data [1] [i].toLowerCase ()) {
found = true;
url ='http://en.wikipedia.org/wiki/' + text;
document.getElementById ('spellcheckresult').innerHTML = '<b style="color:green">Correct</b> - <a target="_top" href="' + url + '">link</a>';
}
}
if (! found)
document.getElementById ('spellcheckresult').innerHTML = '<b style="color:red">Incorrect</b>';
};
var getjs = function (value) {
if (! value)
return;
url = 'http://en.wikipedia.org/w/api.php?action=opensearch&search='+value+'&format=json&callback=spellcheck';
document.getElementById ('spellcheckresult').innerHTML = 'Checking ...';
var elem = document.createElement ('script');
elem.setAttribute ('src', url);
elem.setAttribute ('type','text/javascript');
document.getElementsByTagName ('head') [0].appendChild (elem);
};--></script>
<form action="#" method="get" onsubmit="return false">
<p>Enter a word - <input id="spellcheckinput" onkeyup="getjs (this.value);" type="text"> <span id="spellcheckresult"></span></p></form>
Update
As pointed out in the comments, both the time it would take to link all words and how to handle article names that span multiple words were concerns of mine as well.
I'd think starting with single word articles would already cover a large percentage of the use cases, with maybe some performance benefits gained when skipping the 500 most common words in the English language, but still I'm uncertain how feasible this approach will be..
On the upside however this would all be client side, and some delay in linking terms is fully acceptable.
Alternatively searching for terms the mouse is hovering over / selected might be acceptable as well, but I'm unsure if this would decrease or increase complexity..
Update 2
'Pointy' explained below that this functionality could be achieved by altering some fairly standard highlighting scripts, after having obtained a list of article topics from api.php?action=query&list=allpages.
To reiterate: we're using an internal wiki, so the list of articles is likely limited, unambiguous and domain specific enough to overcome some of the expected problems in matching words.
Since we've had some good suggestions so far, and a few workable ideas, I'm starting a bounty to see if I can get a few answers on this..

Perhaps something like this might help:
Assuming very simple HTML/Text like so:
<div id="theText">Testing the auto link system here...</div>
And two very small scripts.
dictionary.js sets up your list of your terms. My thought was that this could be generated in php by querying the articles database if you wanted. It also can be loaded cross domain (as it sets window.termsRE). If you don't need to generate the list from the database, you could also manually put it with termlinker.js.
The code that generates the RegExp assumes that your terms array contains properly formatted strings to match using Regular Expressions, so be sure to use \\ to escape []\.?*+|(){}^$
// dictionary.js - define some terms
var terms = ['testing', 'auto link'];
window.termsRE = new RegExp("\\b("+terms.join("|")+")\\b",'gi');
termlinker.js is just a simple regexp search and replace on the defined terms. It could be an inline <script> too. It requires that dictionary.js has been loaded before you run it.
// termlinker.js - add some tags
var element = document.getElementById("theText");
element.innerHTML = element.innerHTML.replace(termsRE, function(term) {
return "<a href='http://en.wikipedia.org/wiki/"+escape(term)+"'>"+term+"</a>";
});
This simply searches for any words in the terms array and replaces them with a link to the term. Of course, it will also match properties and values inside HTML tags, which could break your markup a little.
All thrown together you get this (jsbin preview)
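To avoid the caveat above (matching inside tags), here is a minimal sketch that only touches the text between tags. The function names and the baseUrl parameter are made up for illustration, and it assumes reasonably well-formed HTML. It escapes the terms for you and tries longer terms first, so "auto link" wins over "auto".

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\\/]/g, '\\$&');
}

// Build one case-insensitive pattern, longest terms first.
function buildTermsRE(terms) {
  var sorted = terms.slice().sort(function (a, b) {
    return b.length - a.length;
  });
  return new RegExp('\\b(' + sorted.map(escapeRegExp).join('|') + ')\\b', 'gi');
}

// Link terms in an HTML string, leaving the tags themselves untouched.
// baseUrl is a hypothetical wiki prefix such as 'http://wiki.example/wiki/'.
function linkTermsInHtml(html, terms, baseUrl) {
  var re = buildTermsRE(terms);
  // Splitting on a capturing group keeps the tags, at the odd indexes.
  return html.split(/(<[^>]*>)/).map(function (chunk, i) {
    if (i % 2 === 1) {
      return chunk; // a tag: do not touch attributes or tag names
    }
    return chunk.replace(re, function (term) {
      var page = encodeURIComponent(term.replace(/ /g, '_'));
      return '<a href="' + baseUrl + page + '">' + term + '</a>';
    });
  }).join('');
}
```

Usage would be something like element.innerHTML = linkTermsInHtml(element.innerHTML, terms, 'http://wiki.example/wiki/'). Note that it still links text that already sits inside an <a> or a <script> element, so treat it as a starting point only.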
Using the API
Based off of the "minimum case" from before, here is the code sample for using the API to receive the list of words directly and the jsbin preview
// Utility Function
RegExp.escape = function(text) {
if (!arguments.callee.sRE) {
var specials = [
'/', '.', '*', '+', '?', '|',
'(', ')', '[', ']', '{', '}', '\\'
];
arguments.callee.sRE = new RegExp(
'(\\' + specials.join('|\\') + ')', 'g'
);
}
return text.replace(arguments.callee.sRE, '\\$1');
};
// JSONP Callback for receiving the API
function receiveAPI(data) {
var terms = [];
if (!data || !data['query'] || !data['query']['allpages']) return false;
var pages = data.query.allpages
for (var x in pages) {
terms.push(RegExp.escape(pages[x].title));
}
window.termsRE = new RegExp("\\b("+terms.reverse().join("|")+")\\b",'gi');
linkterms();
}
function linkterms() {
var element = document.getElementById("theText");
element.innerHTML = element.innerHTML.replace(termsRE, function(term) {
return "<a href='http://en.wikipedia.org/wiki/"+escape(term)+"'>"+term+"</a>";
});
}
// the apfrom=testing can be removed, it is only there so that
// we can get some useful terms near "testing" to work with.
// we are limited to 500 terms for the purpose of this demo:
var url = 'http://en.wikipedia.org/w/api.php?action=query&list=allpages&aplimit=500&format=json&callback=receiveAPI' + '&apfrom=testing';
var elem = document.createElement('script');
elem.setAttribute('src', url);
elem.setAttribute('type','text/javascript');
document.getElementsByTagName('head')[0].appendChild (elem);
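Since the demo stops at 500 terms, a wiki with more articles would need several requests. A rough sketch of the bookkeeping follows. It assumes the response carries a continue.apcontinue value when more pages are available (which is how current MediaWiki versions report it; older ones may differ), and the helper names are my own.

```javascript
// Build an allpages request URL. apiBase and callbackName come from the caller;
// continueFrom is the apcontinue value of the previous response, if any.
function allpagesUrl(apiBase, callbackName, continueFrom) {
  var params = {
    action: 'query',
    list: 'allpages',
    aplimit: '500',
    format: 'json',
    callback: callbackName
  };
  if (continueFrom) {
    params.apcontinue = continueFrom;
  }
  var query = Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
  }).join('&');
  return apiBase + '?' + query;
}

// Pull the titles and the continuation marker out of one response.
function readAllpages(data) {
  var pages = (data && data.query && data.query.allpages) || [];
  var cont = data && data['continue'] && data['continue'].apcontinue;
  return {
    titles: pages.map(function (page) { return page.title; }),
    next: cont || null // null means there is nothing more to fetch
  };
}
```

In the JSONP callback you would collect result.titles and, while result.next is not null, append another script element pointing at allpagesUrl(apiBase, 'receiveAPI', result.next) before building the RegExp.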

Related

HTML Theft Prevention

I'm working on designing a website, and I want to make sure that no one can steal the code. I would like to prevent the code from being taken out of the website, and display an error message if a user tries to do so.
HTML Obfuscation is a transformational tool that both preserves the code and prevents it from being reverse-engineered. You can find out more about it here.
Here is an example of obfuscated code.
This is extremely simple HTML code:
<a href="mailto:someone@domain.com">Mail me</a>
This can be turned into this:
<script type="text/javascript">
<!--
var s="=b!isfg>#nbjmup;tpnfpofAepnbjo/dpn#?Nbjm!nf=0b?";
m=""; for (i=0; i<s.length; i++) m+=String.fromCharCode(s.charCodeAt(i)-1); document.write(m);
//-->
</script>
<noscript>
&#13&#10&#60&#97&#32&#104&#114&#101&#102&#61&#34&#109&#97&#105&#108&#116&#111&#58&#115&#111&#109&#101&#111&#110&#101&#64&#100&#111&#109&#97&#105&#110&#46&#99&#111&#109&#34&#62&#77&#97&#105&#108&#32&#109&#101&#60&#47&#97&#62
</noscript>
This is called Combined obfuscation.
<script type="text/javascript">
<!--
var s="=b!isfg>#nbjmup;tpnfpofAepnbjo/dpn#?Nbjm!nf=0b?";
m=""; for (i=0; i<s.length; i++) m+=String.fromCharCode(s.charCodeAt(i)-1); document.write(m);
//-->
</script>
<noscript>
You must enable JavaScript to see this text.
</noscript>
This is called JavaScript obfuscation.
&#13&#10&#60&#97&#32&#104&#114&#101&#102&#61&#34&#109&#97&#105&#108&#116&#111&#58&#115&#111&#109&#101&#111&#110&#101&#64&#100&#111&#109&#97&#105&#110&#46&#99&#111&#109&#34&#62&#77&#97&#105&#108&#32&#109&#101&#60&#47&#97&#62&#13&#10
This is called Character Entities obfuscation.
All of these methods are entirely free on that website, and let you keep all your code private.
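For reference, both transformations are small enough to write by hand. This is a sketch of what the examples above appear to do (shift every character code by one, or emit numeric character references); the function names are my own and this is not the website's actual code.

```javascript
// "JavaScript obfuscation": shift every character code up by one.
function shiftEncode(html) {
  return html.split('').map(function (ch) {
    return String.fromCharCode(ch.charCodeAt(0) + 1);
  }).join('');
}

// The inverse, which is what the loop in the generated <script> does.
function shiftDecode(encoded) {
  return encoded.split('').map(function (ch) {
    return String.fromCharCode(ch.charCodeAt(0) - 1);
  }).join('');
}

// "Character Entities obfuscation": one numeric reference per character.
function toCharEntities(html) {
  return html.split('').map(function (ch) {
    return '&#' + ch.charCodeAt(0) + ';';
  }).join('');
}
```

shiftEncode('<a href="mailto:someone@domain.com">Mail me</a>') gives the string s used in the examples, and shiftDecode turns it straight back, so anyone who reads the page source can undo it just as easily.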
EDIT:
After further research, I found another website, JSF**K, which lets you encode items using a series of brackets, parentheses, exclamations and plus signs. Below is how it encodes a simple item:
alert(1)
becomes:
[][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]][([][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]]+[])[!+[]+!+[]+!+[]]+(!![]+[][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[+!+[]+[+[]]]+([][[]]+[])[+!+[]]+(![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[+!+[]]+([][[]]+[])[+[]]+([][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[+!+[]+[+[]]]+(!![]+[])[+!+[]]]((![]+[])[+!+[]]+(![]+[])[!+[]+!+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]+(!![]+[])[+[]]+(![]+[][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[!+[]+!+[]+[+[]]]+[+!+[]]+(!![]+[][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[!+[]+!+[]+[+[]]])()
This is practically impossible to crack, as you'd need to "fuzz" the website with data to obtain the character codes and then use regular expressions to build a decoder.
@JBDouble05 gave a great answer to your question and I totally recommend it. I wanted to share an example that I threw together for fun, that employs some of the techniques he described. It uses HTML obfuscation via zero-width whitespace characters. I also threw in JSF*CK to really make the source code interesting :D
First, the URL which is serving the obfuscated code: http://jackpattishall.com/obfuscated.html
(Uses padStart, so you'll need to view in a browser that supports that!)
If you view the source (using Chrome), you'll notice that 98% or so of the visible markup is JSF*CK (basically all those () and []):
If you scroll long enough, you'll see a variable that seems to be assigned nothing:
The var m is actually assigned the following zero-width whitespace characters:
const m = "​​‍‍‍‍​​​‍‍​​‍​​​‍‍​‍​​‍​‍‍‍​‍‍​​​‍​​​​​​‍‍​​​‍‍​‍‍​‍‍​​​‍‍​​​​‍​‍‍‍​​‍‍​‍‍‍​​‍‍​​‍‍‍‍​‍​​‍​​​‍​​‍‍​‍‍‍‍​‍‍​​​‍​​‍‍​​‍‍​​‍‍‍​‍​‍​‍‍‍​​‍‍​‍‍​​​‍‍​‍‍​​​​‍​‍‍‍​‍​​​‍‍​​‍​‍​‍‍​​‍​​​​‍​​​‍​​​‍‍‍‍‍​​‍​​‍​​​​‍‍​​‍​‍​‍‍​‍‍​​​‍‍​‍‍​​​‍‍​‍‍‍‍​​‍​‍‍‍​​​‍​​​​​​‍​‍​‍​​​‍‍​‍​​​​‍‍​‍​​‍​‍‍‍​​‍‍​​‍​​​​​​‍‍‍​‍​​​‍‍​​‍​‍​‍‍‍‍​​​​‍‍‍​‍​​​​‍​​​​​​‍‍‍​‍‍‍​‍‍​​​​‍​‍‍‍​​‍‍​​‍​​​​​​‍‍​‍‍‍‍​‍‍​​​‍​​‍‍​​‍‍​​‍‍‍​‍​‍​‍‍‍​​‍‍​‍‍​​​‍‍​‍‍​​​​‍​‍‍‍​‍​​​‍‍​​‍​‍​‍‍​​‍​​​​‍​‍‍‍​​​‍​​​​​​‍​‍​‍​‍​‍‍‍​​‍‍​‍‍​‍​​‍​‍‍​‍‍‍​​‍‍​​‍‍‍​​‍​​​​​​‍​​‍​‍​​‍‍​​​​‍​‍‍‍​‍‍​​‍‍​​​​‍​‍​‍​​‍‍​‍‍​​​‍‍​‍‍‍​​‍​​‍‍​‍​​‍​‍‍‍​​​​​‍‍‍​‍​​​​‍​‍‍​​​​‍​​​​​​‍‍‍​‍‍‍​‍‍​​‍​‍​​‍​​​​​​‍‍‍​‍​​​‍‍‍​​‍​​‍‍​​​​‍​‍‍​‍‍‍​​‍‍‍​​‍‍​‍‍​​‍‍​​‍‍​‍‍‍‍​‍‍‍​​‍​​‍‍​‍‍​‍​​‍​​​​​​‍‍‍​‍​​​‍‍​‍‍‍‍​​‍​​​​​​‍‍‍​​‍‍​‍‍​‍‍‍‍​‍‍​‍‍​‍​‍‍​​‍​‍​‍‍‍​‍​​​‍‍​‍​​​​‍‍​‍​​‍​‍‍​‍‍‍​​‍‍​​‍‍‍​​‍​​​​​​‍‍‍​​‍​​‍‍​​‍​‍​‍‍​​​​‍​‍‍​​‍​​​‍‍​​​​‍​‍‍​​​‍​​‍‍​‍‍​​​‍‍​​‍​‍​​‍​​​​‍​​‍‍‍‍​​​​‍​‍‍‍‍​‍‍​​‍​​​‍‍​‍​​‍​‍‍‍​‍‍​​​‍‍‍‍‍​";
Try copy/pasting the previous line in Chrome console. You should get something like:
(Any text editor that shows special characters will do the same!)
The massive JSF*CK code is basically the following (but minified):
const zero_regex = new RegExp(zero, 'g');
const one_regex = new RegExp(one, 'g');
const binToText = text => {
let str = text.replace(zero_regex, '0').replace(one_regex, '1');
if (str.match(/[10]{8}/g)) {
return str.match(/([10]{8}|\s+)/g).map(val => {
return String.fromCharCode(parseInt(val, 2));
}).join('');
}
}
The script responsible for the obfuscation looks like:
// Our zero-width whitespace chars
const zero = '​';
const one = '‍';
const textToBin = text => {
let len = text.length;
let output = [];
let i = 0;
for (; i < len; i++) {
output.push(text[i].charCodeAt().toString(2).padStart(8, '0'));
}
return output.join('').replace(/0/g, zero).replace(/1/g, one);
}
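If you want to try the round trip without the JSF*CK layer, here is a self-contained sketch. I'm assuming U+200B and U+200C as the two invisible characters, written as escapes so they survive copy/paste (the page may well use a different pair), and the function names are made up. It only handles characters with codes below 256.

```javascript
// Assumed zero-width characters, written as escapes so they stay visible in source.
var ZERO = '\u200B'; // zero width space stands for bit 0
var ONE = '\u200C';  // zero width non-joiner stands for bit 1

// Text -> 8 bits per character -> invisible characters.
function hide(text) {
  return text.split('').map(function (ch) {
    var bits = ch.charCodeAt(0).toString(2);
    while (bits.length < 8) {
      bits = '0' + bits; // left-pad without needing padStart
    }
    return bits;
  }).join('').replace(/0/g, ZERO).replace(/1/g, ONE);
}

// Invisible characters -> bits -> text.
function reveal(hidden) {
  var bits = hidden.split(ZERO).join('0').split(ONE).join('1');
  var bytes = bits.match(/[01]{8}/g) || [];
  return bytes.map(function (b) {
    return String.fromCharCode(parseInt(b, 2));
  }).join('');
}
```

reveal(hide('alert(1)')) gives back 'alert(1)', and the hidden string is eight invisible characters per input character.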
Here's a JSFiddle that shows a bit more of the magic:
http://jsfiddle.net/z5gu4bq1/
Hope this was helpful. Have fun with JavaScript! And please never do this in Production :)
Resources:
JSF*CK
White-space Obfuscation Reference

Regex to find <a> tags containing links to specific file types

I am trying to write a small jQuery / javascript function that searches through all the links on a page, identifies the type of file to which the tag links, and then adds an appropriate class. The purpose of this task is to style the links depending on the type of file at the other end of the link.
So far I have this:
$(document).ready(function(){
$('#rt-mainbody a').each(function(){
linkURL = $(this).attr('href');
var match = linkURL.match("^.*\.(pdf|PDF)$");
if(match != null){$(this).addClass('pdf');}
});
});
Fiddle me this.
And then I would continue the concept to identify, for example, spreadsheet files, Word documents, text files, jpgs, etc.
it works... but the thing is, to me this is super clunky because I have completely botched it together from odds and sods I've found around SO and the internet - I'm sure there must be a neater, more efficient, more readable way of doing this but I have no idea what it might be. Can someone give it a spit and polish for me, please?
Ideally the function should detect (a) that the extension is at the end of the href string, and (b) that the extension is preceded by a dot.
Thanks! :)
EDIT
Wow! Such a response! :) Thanks guys!
When I saw the method using simply the selector it was a bit of a facepalm moment - however the end user I am building this app for is linking to PDFs (and potentially other MIMEs) on a multitude of resource websites and has no control over the case usage of the filenames to which they'll be linking... using the selector is clearly not the way to go because the result would be so inconsistent.
EDIT
And the grand prize goes to @Dave Stein!! :D
The solution I will adopt is a "set it and leave it" script (fiddle me that) which will accommodate any extension, regardless of case, and all I need to do is tweak the CSS for each reasonable eventuality.
It's actually nice to learn that I was already fairly close to the best solution already... more through good luck than good judgement though XD
Well you don't want to use regex to search strings so I like that you narrowed it to just links. I saved off $(this) so you don't have to double call it. I also changed the regex so it's case insensitive. And lastly I made sure that the class is adding what the match was. This accomplish what you want?
$(document).ready(function(){
$('#rt-mainbody a').each(function(){
var $link = $(this),
linkURL = $link.attr('href'),
// I can't remember offhand but I think some extensions have numbers too
match = linkURL.match( /^.*\.([a-z0-9]+)$/i );
if( match != null ){
$link.addClass( match[1].toLowerCase() );
}
});
});
Oh and I almost forgot, I made sure linkURL was no longer global. :)
"Attribute ends with" selector:
$('#rt-mainbody a[href$=".pdf"], #rt-mainbody a[href$=".PDF"]').addClass('pdf')
EDIT: Or more generally and flexibly:
var types = {
doc: ['doc', 'docx'],
pdf: ['pdf'],
// ...
};
function addLinkClasses(ancestor, types) {
var $ancestor = $(ancestor);
$.each(types, function(type, extensions) {
var selector = $.map(extensions, function(extension) {
return 'a[href$=".' + extension + '"]';
}).join(', ');
$ancestor.find(selector).addClass(type);
});
}
addLinkClasses('#rt-mainbody', types);
This is case sensitive, so I suggest you canonicalise all extensions to lowercase on your server.
Regex should be /^.*\.(pdf)$/i .
You can use this in your selector (to find all links to pdf files)
a[href$=".pdf"]
use this regex (without quotes):
/\.(pdf|doc)$/i
this regex matches (case insensitive) anything that ends with .pdf, .doc etc.
for dynamic class:
var match = linkURL.match(/\.(pdf|doc)$/i);
match = match ? match[1].toLowerCase() : null;
if (match != null) {
$(this).addClass(match);
}
Another answer, building off of @Amadan's, is:
var extensions = [
'pdf',
'jpg',
'doc'
];
$.each( extensions, function( i, v) {
$('#rt-mainbody').find( 'a[href$=".' + v + '"], a[href$=".' + v.toUpperCase() + '"]')
.addClass( v ); // v is the current extension; "extension" was never defined here
});
The only suggestion I would make is that you can change your match to inspect what the file extension is, instead of having to do a different regex search for each possible file extension:
var linkURL = $(this).attr('href'); //<--you had accidentally declared linkURL as a global BTW.
var match = linkURL.match(/\.([^.\/]+)$/); // capture only what follows the last dot
if(match != null){
//we can extract the part between the parens in our regex
var ext = match[1].toLowerCase()
switch(ext){
case 'pdf': $(this).addClass('pdf'); break;
case 'jpg': $(this).addClass('jpg'); break;
//...
}
}
This switch statement is mostly useful if you want the option of using class names that are different from your file extensions. If the class name is always the same as the file extension, you can consider changing the regex to something that fits the file extensions you want
/\.(pdf|jpg|txt)$/i //i for "case insensitive"
and then just do
var ext = match[1].toLowerCase()
$(this).addClass(ext);
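One thing none of the snippets above handle is a link with a query string or fragment after the file name (report.pdf?download=1). Here's a small sketch that strips those first and then maps the extension to a class through a lookup table. The function names and the table contents are just examples of mine.

```javascript
// Return the lowercased file extension of a link target, or null.
function fileExtension(href) {
  if (!href) {
    return null;
  }
  var path = href.split(/[?#]/)[0]; // drop query string and fragment
  // Drop "scheme://host" (or "//host") so "http://example.com" is not read as ".com".
  path = path.replace(/^(?:[a-z][a-z0-9+.\-]*:)?\/\/[^\/]*/i, '');
  var match = path.match(/\.([a-z0-9]+)$/i);
  return match ? match[1].toLowerCase() : null;
}

// Example table: several extensions can share one CSS class.
var CLASS_FOR_EXTENSION = {
  pdf: 'pdf',
  doc: 'doc', docx: 'doc',
  xls: 'sheet', xlsx: 'sheet',
  jpg: 'image', jpeg: 'image'
};

// Return the class for a link, or null when the extension is not in the table.
function linkClass(href) {
  var ext = fileExtension(href);
  if (ext && Object.prototype.hasOwnProperty.call(CLASS_FOR_EXTENSION, ext)) {
    return CLASS_FOR_EXTENSION[ext];
  }
  return null;
}
```

Inside the .each() callback this would be used as: var cls = linkClass($(this).attr('href')); if (cls) { $(this).addClass(cls); }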

Detect if string contains javascript tags using jQuery/JavaScript

I am trying to create a very simplistic XSS detection system for a system I am currently developing. The system as it stands, allows users to submit posts with javascript embedded within the message. Here is what I currently have:-
var checkFor = "<script>";
alert(checkFor.indexOf("<script>") !== -1);
This doesn't really work that well at all. I need to write code that incorporates an array which contains the terms I am searching for [e.g - "<script>","</script>","alert("]
Any suggestions as to how this could be achieved using JavaScript/jQuery.
Thanks for checking this out. Many thanks :)
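Replacing characters is a very fragile way to avoid XSS. (There are dozens of ways to get < in without typing the character, like &#60;.) Instead, HTML-encode your data. I use these functions:
var encode = function (data) {
var result = data;
if (data) {
// Setting text and reading html turns <, > and & into entities.
result = $("<div />").text(data).html();
}
return result;
};
var decode = function (data) {
var result = data;
if (data) {
// Setting html and reading text turns entities back into characters.
// Only call this on strings you trust: .html() parses its input as markup.
result = $("<div />").html(data).text();
}
return result;
};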
As Explosion Pills said, if you're looking for cross–site exploits, you're probably best to either find one that's already been written or someone who can write one for you.
Anyway, to answer the question, regular expressions are not appropriate for parsing markup. If you have an HTML parser (client side is easy, server a little more difficult) you could insert the text as the innerHTML of an new element, then see if there are any child elements:
function mightBeMarkup(s) {
var d = document.createElement('div');
d.innerHTML = s;
return !!(d.getElementsByTagName('*').length);
}
Of course there still might be markup in the text, just that it's invalid so doesn't create elements. But combined with some other text, it might be valid markup.
The most effective way to prevent XSS attacks is by replacing all <, > and & characters with
&lt;, &gt; and &amp;.
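A minimal sketch of that replacement in plain JavaScript (the function name is mine; I also escape both quote characters, which matters when the value ends up inside an attribute):

```javascript
// Replace the characters that are significant in HTML with entities.
// The ampersand goes first so the entities produced below are not escaped again.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}
```

With this, a submitted post containing a script tag is displayed as text instead of being executed, provided every piece of user data goes through it on output.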
There is a javascript library from OWASP. I haven't worked with it yet so can't tell you anything about the quality. Here is the link: https://www.owasp.org/index.php/ESAPI_JavaScript_Readme

Replace all strings "<" and ">" in a variable with "&lt;" and "&gt;"

I am currently trying to code an input form where you can type and format a text for later use as XML entries. In order to make the HTML code XML-readable, I have to replace the code brackets with the corresponding symbol codes, i.e. < with &lt; and > with &gt;.
The formatted text gets transferred as HTML code with the variable inputtext, so we have for example the text
The <b>Genji</b> and the <b>Heike</b> waged a long and bloody war.
which needs to get converted into
The &lt;b&gt;Genji&lt;/b&gt; and the &lt;b&gt;Heike&lt;/b&gt; waged a long and bloody war.
I tried it with the .replace() function:
inputxml = inputxml.replace("<", "&lt;");
inputxml = inputxml.replace(">", "&gt;");
But this would just replace the first occurrence of the brackets. I'm pretty sure I need some sort of loop for this; I also tried using the each() function from jQuery (a friend recommended I looked at the jQuery package), but I'm still new to coding in general and I have troubles getting this to work.
How would you code a loop which would replace the code brackets within a variable as described above?
Additional information
You are, of course, right in the assumption that this is part of something larger. I am a graduate student in Japanese studies and currently, I am trying to visualize information about Japanese history in a more accessible way. For this, I am using the Simile Timeline API developed by MIT grad students. You can see a working test of a timeline on my homepage.
The Simile Timeline uses an API based on AJAX and Javascript. If you don't want to install the AJAX engine on your own server, you can implement the timeline API from the MIT. The data for the timeline is usually provided either by one or several XML files or JSON files. In my case, I use XML files; you can have a look at the XML structure in this example.
Within the timeline, there are so-called "events" on which you can click in order to reveal additional information within an info bubble popup. The text within those info bubbles originates from the XML source file. Now, if you want to do some HTML formatting within the info bubbles, you cannot use code bracket because those will just be displayed as plain text. It works if you use the symbol codes instead of the plain brackets, however.
The content for the timeline will be written by people absolutely and totally not accustomed to codified markup, i.e. historians, art historians, sociologists, among them several persons of age 50 and older. I have tried to explain to them how they have to format the XML file if they want to create a timeline, but they occasionally slip up and get frustrated when the timeline doesn't load because they forgot to close a bracket or to include an apostrophe.
In order to make it easier, I have tried making an easy-to-use input form where you can enter all the information and format the text WYSIWYG style and then have it converted into XML code which you just have to copy and paste into the XML source file. Most of it works, though I am still struggling with the conversion of the text markup in the main text field.
The conversion of the code brackets into symbol code is the last thing I needed to get working in order to have a working input form.
look here:
http://www.bradino.com/javascript/string-replace/
just use this regex to replace all:
str = str.replace(/\</g,"&lt;") //for <
str = str.replace(/\>/g,"&gt;") //for >
To store an arbitrary string in XML, use the native XML capabilities of the browser. It will be a hell of a lot simpler that way, plus you will never have to think about the edge cases again (for example attribute values that contain quotes or pointy brackets).
A tip to keep in mind when working with XML: never, ever build XML from strings by concatenation if there is any way to avoid it. You will get yourself into trouble that way. There are APIs to handle XML, use them.
Going from your code, I would suggest the following:
$(function() {
$("#addbutton").click(function() {
var eventXml = XmlCreate("<event/>");
var $event = $(eventXml);
$event.attr("title", $("#titlefield").val());
$event.attr("start", [$("#bmonth").val(), $("#bday").val(), $("#byear").val()].join(" "));
if (parseInt($("#eyear").val()) > 0) {
$event.attr("end", [$("#emonth").val(), $("#eday").val(), $("#eyear").val()].join(" "));
$event.attr("isDuration", "true");
} else {
$event.attr("isDuration", "false");
}
$event.text( tinyMCE.activeEditor.getContent() );
$("#outputtext").val( XmlSerialize(eventXml) );
});
});
// helper function to create an XML DOM Document
function XmlCreate(xmlString) {
var x;
if (typeof DOMParser === "function") {
var p = new DOMParser();
x = p.parseFromString(xmlString,"text/xml");
} else {
x = new ActiveXObject("Microsoft.XMLDOM");
x.async = false;
x.loadXML(xmlString);
}
return x.documentElement;
}
// helper function to turn an XML DOM Document into a string
function XmlSerialize(xml) {
var s;
if (typeof XMLSerializer === "function") {
var x = new XMLSerializer();
s = x.serializeToString(xml);
} else {
s = xml.xml;
}
return s
}
https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/String/replace
You might use a regular expression with the "g" (global match) flag.
var entities = {'<': '&lt;', '>': '&gt;'};
'<inputtext><anotherinputext>'.replace(
/[<>]/g, function (s) {
return entities[s];
}
);
You could also surround your XML entries with the following:
<![CDATA[...]]>
See example:
<xml>
<tag><![CDATA[The <b>Genji</b> and the <b>Heike</b> waged a long and bloody war.]]></tag>
</xml>
Wikipedia Article:
http://en.wikipedia.org/wiki/CDATA
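One catch with CDATA: the text itself must not contain ]]>, because that sequence ends the section. A small sketch of a wrapper that splits the section at that point (the function name is my own):

```javascript
// Wrap arbitrary text in a CDATA section. Any "]]>" inside the text is split
// across two sections so it cannot terminate the first one early.
function toCdata(text) {
  return '<![CDATA[' + String(text).split(']]>').join(']]]]><![CDATA[>') + ']]>';
}
```

For the example above, toCdata('The <b>Genji</b> and the <b>Heike</b> waged a long and bloody war.') produces exactly the content of the <tag> element shown.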
What you really need, as mentioned in comments, is to XML-encode the string. If you absolutely want to do this in JavaScript, have a look at the PHP.js function htmlentities.
I created a simple JS function to replace Greater Than and Less Than characters
Here is an example dirty string: < noreply@email.com >
Here is an example cleaned string: [ noreply@email.com ]
function RemoveGLthanChar(notes) {
var regex = /<[^>](.*?)>/g;
var strBlocks = notes.match(regex) || []; // match() returns null when nothing matches
strBlocks.forEach(function (dirtyBlock) {
let cleanBlock = dirtyBlock.replace("<", "[").replace(">", "]");
notes = notes.replace(dirtyBlock, cleanBlock);
});
return notes;
}
Call it using
$('#form1').submit(function (e) {
e.preventDefault();
var dirtyBlock = $("#comments").val();
var cleanedBlock = RemoveGLthanChar(dirtyBlock);
$("#comments").val(cleanedBlock);
this.submit();
});

jQuery Text to Link Script? [duplicate]

This question already has answers here:
How to replace plain URLs with links?
(25 answers)
Closed 9 years ago.
Does anyone know of a script that can select all text references to URLs and automatically replace them with anchor tags pointing to those locations?
For example:
http://www.google.com
would automatically turn into
<a href="http://www.google.com">http://www.google.com</a>
Note: I am wanting this because I don't want to go through all my content and wrap them with anchor tags.
NOTE: An updated and corrected version of this script is now available at https://github.com/maranomynet/linkify (GPL/MIT licence)
Hmm... to me this seems like the perfect task for jQuery.
...something like this came off the top of my mind:
// Define: Linkify plugin
(function($){
var url1 = /(^|&lt;|\s)(www\..+?\..+?)(\s|&gt;|$)/g,
url2 = /(^|&lt;|\s)(((https?|ftp):\/\/|mailto:).+?)(\s|&gt;|$)/g,
linkifyThis = function () {
var childNodes = this.childNodes,
i = childNodes.length;
while(i--)
{
var n = childNodes[i];
if (n.nodeType == 3) {
var html = $.trim(n.nodeValue);
if (html)
{
html = html.replace(/&/g, '&amp;')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;')
.replace(url1, '$1<a href="http://$2">$2</a>$3')
.replace(url2, '$1<a href="$2">$2</a>$5');
$(n).after(html).remove();
}
}
else if (n.nodeType == 1 && !/^(a|button|textarea)$/i.test(n.tagName)) {
linkifyThis.call(n);
}
}
};
$.fn.linkify = function () {
return this.each(linkifyThis);
};
})(jQuery);
// Usage example:
jQuery('div.textbody').linkify();
It attempts to turn all occurrences of the following into links:
www.example.com/path
http://www.example.com/path
mailto:me@example.com
ftp://www.server.com/path
...all of the above wrapped in angle brackets (i.e. <...>)
Enjoy :-)
JQuery isn't going to help you a whole lot here as you're not really concerned with DOM traversal/manipulation (other than creating the anchor tag). If all your URLs were in <p class="url"> tags then perhaps.
A vanilla JavaScript solution is probably what you want, and as fate would have it, this guy should have you covered.
I have this function i call
textToLinks: function(text) {
var re = /(https?:\/\/(([-\w\.]+)+(:\d+)?(\/([\w/_\.]*(\?\S+)?)?)?))/g;
return text.replace(re, "<a href=\"$1\">$1</a>");
}
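If the text comes from users, it is worth escaping it before adding the anchors, otherwise the replace happily passes markup through. A rough standalone sketch (names are my own; it only handles http and https, and drops trailing punctuation from the match):

```javascript
// Escape text for safe use inside HTML content and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Turn plain text into HTML in which every http(s) URL is an anchor.
function linkify(text) {
  // The last character class keeps trailing punctuation out of the link.
  var urlRE = /\bhttps?:\/\/[^\s<>"]+[^\s<>".,;:!?)]/g;
  var out = '';
  var last = 0;
  var m;
  while ((m = urlRE.exec(text)) !== null) {
    out += escapeHtml(text.slice(last, m.index)); // text before the URL
    var url = escapeHtml(m[0]);
    out += '<a href="' + url + '">' + url + '</a>';
    last = m.index + m[0].length;
  }
  return out + escapeHtml(text.slice(last)); // text after the last URL
}
```

The result is meant to be assigned to innerHTML (or passed to jQuery's .html()) of an element whose content was plain text to begin with.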
I suggest you do this on your static pages before rendering to the browser, or you'll be pushing the burden of conversion computation onto your poor visitors. :) Here's how you might do it in Ruby (reading from stdin, writing to stdout):
while line = gets
puts line.gsub( /(^|[^"'])(http\S+)/, "\\1<a href='\\2'>\\2</a>" )
end
Obviously, you'll want to think about how to make this as robust as you desire. The above requires all URLs to start with http, and will check not to convert URLs that are in quotes (i.e. which may already be inside an <a href="...">). It will not catch ftp://, mailto:. It will happily convert material in places like <script> bodies, which you may not want to happen.
The most satisfactory solution is really to do the conversion by hand with your editor so you can eyeball and approve all substitutions. A good editor will let you do regexp substitution with group references (aka back references), so it shouldn't be a big deal.
Take a look at this JQuery plugin: https://code.google.com/p/jquery-linkifier/
Doing this server-side is not an option sometimes. Think of a client-side Twitter widget (that goes directly to Twitter API using jsonp), and you want to linkify all the URLs in the Tweets dynamically...
If you want a solution from another perspective... if you can run the pages through php and HTML Purifier, it can autoformat the output and linkify any urls.
