JavaScript: dynamic regex for my own parser

I am trying a new direction in Language Kits (or whatever you want to call those multi language text files with placeholders). Basically, I have text like this: Hello, my name is %0. Welcome to %1!. This would be my pText.
My pValues is an array whose values represent %0 and %1.
The following function should find %0 and replace it with pValues[0] and so on.
function _parseDialogMessage(pText, pValues) {
var result = pText;
for (var i=0; i<pValues.length; ++i) {
var regex = new RegExp('\%'+i, 'gi');
pText = pText.replace(regex, pValues[i]);
}
return result;
}
Everything works except that the placeholders %0 and %1 are not replaced. All variables have the expected values, but .replace doesn't seem to find my placeholders.
Any help?
Edit 1
Shame on me... -.-

You don't need "dynamic regex", since replace can take a function as argument:
function _parseDialogMessage(pText, pValues) {
return pText.replace(/%(\d+)/g, function (s, i) { return pValues[i]; });
}
(and you should return pText.)
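One caveat with the callback version: if the text contains a placeholder that has no matching entry in pValues, the callback returns undefined and the literal text "undefined" ends up in the output. Below is a minimal sketch of a guarded variant. The name formatMessage and the policy of leaving unknown placeholders untouched are assumptions for illustration, not part of the answer above.

```javascript
// Hypothetical helper: replaces %0, %1, ... with entries of `values`.
// Placeholders without a matching value are left as-is (an assumed policy).
function formatMessage(text, values) {
  return text.replace(/%(\d+)/g, function (match, index) {
    var i = Number(index);
    return i < values.length ? String(values[i]) : match;
  });
}

var greeting = formatMessage('Hello, my name is %0. Welcome to %1!', ['Ada', 'the site']);
var partial = formatMessage('%0 and %5', ['only one']);
```

Because the replacement comes from a function, sequences such as $& inside a value are inserted literally rather than being treated as special replacement patterns.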

You are returning the result variable, which holds the initial value of the pText parameter.
Return the pText variable instead.

Uhm, you return result and not the replaced pText.

While you can't use any of their code unless you want to GPL your library, the manual for gnu's gettext covers the rationale behind a number of topics related to internationalization.
http://www.gnu.org/software/gettext/manual/gettext.html
edit : I know you're just looking for a magic regex, but it won't be enough.
easy example :
I have %n computers.
I have 1 computers.
Did you know that Arabic has a special grammatical form (the dual) for exactly two things, and that Chinese does not mark the number of things on the noun at all?
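To make the plural problem concrete, here is a small sketch that picks a message by plural category using the built-in Intl.PluralRules. The messages table and the computerCount function are hypothetical, and only English is shown.

```javascript
// Hypothetical message table keyed by plural category.
var messages = {
  one: 'I have %n computer.',
  other: 'I have %n computers.'
};

// Intl.PluralRules maps a number to a plural category for a locale.
var rules = new Intl.PluralRules('en-US');

function computerCount(n) {
  var category = rules.select(n); // 'one' or 'other' for English
  var template = messages[category] || messages.other;
  return template.replace(/%n/g, String(n));
}

var single = computerCount(1);
var several = computerCount(3);
```

Other locales can return additional categories such as 'two', 'few' or 'many', so a real message table needs one entry per category that the target language uses.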

Related

Make text content between specified HTML tags toUpperCase in React-Native

I want to make to uppercase the contents of specific HTML tags with plain JavaScript in a React-Native application.
Note: This is a React-Native application. There is no JS document object available, nor jQuery. Likewise, CSS text-transform: uppercase cannot be used because the content will not be displayed in a web browser.
Let's say, there is the following HTML text:
<p>This is an <mytag>simple Example</mytag></p>
The content of the Tag <mytag> shall be transformed to uppercase:
<p>This is an <mytag>SIMPLE EXAMPLE</mytag></p>
I tried this code:
let regEx = storyText.match(/<mytag>(.*?)<\/mytag>/g)
if(regEx) storyText = regEx.map(function(val){
return val.toUpperCase();
});
But the map() function returns only the matched content instead of the whole string variable with the transformed part of <mytag>.
Also, the match() method will return null, if the tag wasn't found. So a fluent programming style like storyText.match().doSomething isn't possible.
Since there are more tags to transform, an approach where I can pass variables to the regex-pattern would be appreciated.
Any hints to solve this?
(This code is used in a React-Native-App with the react-native-html-view Plugin which doesn't support text-transform out of the box.)

Since it seems that document and DOM manipulation (e.g., through jQuery and native JS document functions) are off limits, I guess you do have to use regex.
Then why not just create a function that does a job like the above: looping through each tag and replacing it via regex?
var storyText = "your HTML in a string";
function tagsToUppercase(tags) {
for (const tag of tags) {
let regex = new RegExp("(<" + tag + ">)([^<]+)(</" + tag + ">)", "g");
storyText = storyText.replace(regex, function(match, g1, g2, g3) {
return g1 + g2.toUpperCase() + g3;
});
}
}
// uppercase all <div>, <p>, <span> for example
tagsToUppercase(["div", "p", "span"]);
See it working on JSFiddle.
Also, although it probably doesn't apply to this case, (@Bergi urged me to remind you to) try to avoid using regular expressions to manipulate the DOM.

Edit, Updated
The content of the Tag <mytag> shall be transformed to uppercase:
<p>This is an <mytag>SIMPLE EXAMPLE</mytag></p>
You can use String.prototype.replace() with RegExp /(<mytag>)(.*?)(<\/mytag>)/g to create three capture groups, call .toUpperCase() on second capture group
let storyText = "<p>This is an <mytag>simple Example</mytag></p>";
let regEx = storyText.replace(/(<mytag>)(.*?)(<\/mytag>)/g
, function(val, p1, p2, p3) {
return p1 + p2.toUpperCase() + p3
});
console.log(regEx);
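Since the question also asks for a way to pass tag names into the pattern, the same idea can be sketched with a list of names and a backreference (\1) so the closing tag has to match the opening one. The helper name upperCaseTags is an assumption, and the sketch assumes simple tag names, no attributes and no nesting.

```javascript
// Hypothetical helper: upper-cases the text inside each listed tag.
// Assumes simple tag names (letters/digits), no attributes, no nesting.
function upperCaseTags(html, tagNames) {
  // \1 refers back to whichever tag name the first group matched.
  var pattern = new RegExp('<(' + tagNames.join('|') + ')>(.*?)</\\1>', 'g');
  return html.replace(pattern, function (match, tag, content) {
    return '<' + tag + '>' + content.toUpperCase() + '</' + tag + '>';
  });
}

var storyHtml = '<p>This is an <mytag>simple Example</mytag> and <other>more</other></p>';
var shouted = upperCaseTags(storyHtml, ['mytag', 'other']);
var untouched = upperCaseTags('<p>no match</p>', ['mytag']);
```

Unlike match(), replace() returns the original string unchanged when nothing matches, so no null check is needed.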

In general you shouldn't be parsing html with javascript. With that in mind, if this is what you truly need to do, then try something like this:
let story = '<p>smallcaps</p><h1>heading</h1><div>div</div><p>stuff</p>';
console.log( story.replace(/<(p|span|div)>([^<]*)<\/(p|span|div)>/ig,
(fullmatch, startag,content,endtag) => `<${startag}>${content.toUpperCase()}</${endtag}>` )
)
Consider the cases where you might have nested values, p inside a div, or an a or strong or em inside your p. For those cases this doesn't work.

Why not this way ?
$("mytag").text($("mytag").text().toUpperCase())
https://jsfiddle.net/gub61haL/

Detect if string contains javascript tags using jQuery/JavaScript

I am trying to create a very simplistic XSS detection system for a system I am currently developing. The system as it stands, allows users to submit posts with javascript embedded within the message. Here is what I currently have:-
var checkFor = "<script>";
alert(checkFor.indexOf("<script>") !== -1);
This doesn't really work that well at all. I need to write code that incorporates an array which contains the terms I am searching for [e.g - "<script>","</script>","alert("]
Any suggestions as to how this could be achieved using JavaScript/jQuery.
Thanks for checking this out. Many thanks :)

Replacing characters is a very fragile way to avoid XSS. (There are dozens of ways to get < in without typing the character, like &#60;.) Instead, HTML-encode your data. I use these functions:
var encode = function (data) {
var result = data;
if (data) {
// set as plain text, read back as escaped HTML
result = $("<div />").text(data).html();
}
return result;
};
var decode = function (data) {
var result = data;
if (data) {
// set as HTML, read back as plain text
result = $("<div />").html(data).text();
}
return result;
};

As Explosion Pills said, if you're looking for cross–site exploits, you're probably best to either find one that's already been written or someone who can write one for you.
Anyway, to answer the question, regular expressions are not appropriate for parsing markup. If you have an HTML parser (client side is easy, server a little more difficult) you could insert the text as the innerHTML of a new element, then see if there are any child elements:
function mightBeMarkup(s) {
var d = document.createElement('div');
d.innerHTML = s;
return !!(d.getElementsByTagName('*').length);
}
Of course there still might be markup in the text, just that it's invalid so doesn't create elements. But combined with some other text, it might be valid markup.

The most effective way to prevent XSS attacks is by replacing all <, > and & characters with
&lt;, &gt;, and &amp;.
There is a javascript library from OWASP. I haven't worked with it yet so can't tell you anything about the quality. Here is the link: https://www.owasp.org/index.php/ESAPI_JavaScript_Readme
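For reference, the character replacement described in this answer can be sketched in plain JavaScript as follows. The escapeHtml helper and the HTML_ENTITIES table are hypothetical and are not taken from the OWASP library mentioned above. Quotes are escaped as well, on the assumption that the text might end up inside an attribute value.

```javascript
// Hypothetical helper, not taken from the OWASP library mentioned above.
var HTML_ENTITIES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

function escapeHtml(text) {
  // A single pass over the string, so an already produced "&" is never escaped twice.
  return String(text).replace(/[&<>"']/g, function (ch) {
    return HTML_ENTITIES[ch];
  });
}

var escaped = escapeHtml('<script>alert("x")</script>');
var ampersand = escapeHtml('a & b');
```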

system for censoring banned words

I'm working on a website in which I need to replace many words with asterisks, for example banana with ******.
The website uses PHP and MySQL, but I also use JavaScript.
My database has a table that contains the banned words.
I receive these words in an array from my database. I'm looking for a function that can replace these words across the whole page. I cannot use functions like ob_start.
Ideally the function would run on body onload and replace the words.

This is a rather difficult task to tackle because:
People will try to circumvent this system by replacing certain letters, such as "s" with "$", "a" with "#", or by misspelling words that can still be understood
How will you deal with words like "password" that contain a swear word?
I would recommend going with a service that already has this figured out:
http://www.webpurify.com/
Look at this SO post: How do you implement a good profanity filter?

I'm going to use CoffeeScript, you can compile to JavaScript here if you wish or just use this as pseudocode.
String::replaceAll = (a, b) ->
  regExp = new RegExp(a, "ig")
  @replace regExp, b
_stars = (string) ->
  str = ""
  # exclusive range (...) so the result has exactly string.length stars
  for i in [0...string.length]
    str = "#{str}*"
  str
bannedWords = [ "bannedword", "anotherbannedword" ]
_formSubmitHandler = (data) ->
  for bannedWord in bannedWords
    # the mask length follows the banned word, not the whole input
    data.userInput = data.userInput.replaceAll bannedWord, _stars(bannedWord)

If the page content also comes from the database, or is being entered into it, why not filter it with PHP's str_replace before it is inserted or when it is retrieved?
// PREFERRED WAY
$filteredContent = str_replace($bannedlist, "**", $content2Filter);
Or if you are looking for a javascript version, then you would need to use either multiple str.replace or regex. Something like:
var search = /word1|word2|word3/gi; // a regex literal, not a string; this would be your array joined by a pipe delimiter
var ret=str.replace(search,'**');

I made a very simple censoring method for this. It will only track words you put into the array of bad words. I would suggest you use an advanced library for word censors.
censor.js
var censor = (function() {
function convertToAsterisk(word) {
var asteriskSentence = '';
for(var asterisks=0;asterisks<word.length;asterisks++) {
asteriskSentence+='*';
}
return asteriskSentence;
}
return function(sentence, bannedWords) {
sentence = sentence || undefined;
bannedWords = bannedWords || undefined;
if(sentence!==undefined && bannedWords!==undefined) {
for(var word=0;word<bannedWords.length;word++) {
// split/join replaces every occurrence, not only the first one
sentence = sentence.split(bannedWords[word]).join(convertToAsterisk(bannedWords[word]));
}
}
return sentence;
};
})();
The method can be used like so:
var sentence = 'I like apples, grapes, and peaches. My buddy likes pears';
var bannedWords = [
'pears',
'peaches',
'grapes',
'apples'
];
sentence = censor(sentence, bannedWords);
This system does not handle bad words within other words, or tricky misspellings. Only the basics.
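One way to address the "bad words within other words" limitation is to match on word boundaries. The sketch below is an assumption-laden illustration: the helper names are made up, and \b only understands ASCII word characters, so it will not behave well for every language.

```javascript
// Hypothetical helpers; the names are illustrative only.
function escapeRegExp(text) {
  // Escape regex metacharacters so banned words are matched literally.
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function censorWholeWords(sentence, bannedWords) {
  if (!bannedWords.length) return sentence;
  // \b ... \b only matches the banned word when it stands alone.
  var pattern = new RegExp('\\b(' + bannedWords.map(escapeRegExp).join('|') + ')\\b', 'gi');
  return sentence.replace(pattern, function (word) {
    return '*'.repeat(word.length);
  });
}

var censored = censorWholeWords('Apples, pineapples and APPLES', ['apples']);
```

Here "pineapples" is left alone because the banned word is preceded by another letter, while both standalone spellings are masked regardless of case.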

var str="badword";
var ret=str.replace("badword","*******");
And to detect the length automatically (useful inside a function):
var str="badword";
var ret=str.replace("badword",function(match) {
// build a mask as long as the matched word, not the whole string
var stars = "";
for(var loop = 0; loop < match.length; loop++) {
stars = stars + "*";
}
return stars;
});

In the end I found my own way to build this system. It is easy, and you don't need to change the code of your whole website, just the page that needs to be censored.
In my case I have thousands of pages, but there is one main page that includes all the others.
For people who may be interested: all you have to do is put <?php ob_start(); ?> at the beginning of your page and, at the end of the body, put this code:
<?php
//We get the content of the page
$content = ob_get_contents();
// and we replace all
$content = str_replace('naughty', '*****', $content);
// VERY important: call ob_end_clean() before echo $content, otherwise the original, uncensored output would be displayed as well
ob_end_clean ();
echo $content;
?>
This is an easy way, but you can also do an array for all censored words.

Full disclosure, I wrote the plugin.
I've written a jQuery plugin that does what you're looking for. It is not completely water tight, and others can very easily circumvent the plugin by disabling javascript. If you'd like to try it out, here's a link.
http://profanityfilter.chaseflorell.com/
And here's some example code.
<div id="someDiv">swears are ass, but passwords are ok</div>
<script>
$('#someDiv').profanityFilter({
customSwears: ['ass']
});
</script>

Pass value to Regex

I am trying to make a function with regular expression [javascript].
Please take a look.
function ReplaceIt(key)
{
var KeyCode = /.body\s*\{([^\}]*?)\}/m; // i want to replace the body to the key
}
var key ="h1";
ReplaceIt(key);
so the final result will be
var Keycode = /.h1\s*\{([^\}]*?)\}/m;
I am a bit of a newbie with JavaScript and I don't know how to search for other resources.
Note: Friends, why are you deleting the answers?? Each and every comment/answer is helping us to improve, but we will choose the most appropriate/best answer, please don't delete comments/answers.

You can use RegExp with a string to build up your regular expression. If you build the regular expression that way, you should escape all '\'. So your function could look like:
function ReplaceIt(key)
{
return RegExp('.'+key+'\\s*\\{([^\\}]*?)\\}','m');
}
var reKey = ReplaceIt('h1'); //=> /.h1\s*\{([^\}]*?)\}/m
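If doubling every backslash feels error-prone, a template literal tagged with String.raw is one alternative, since it keeps backslashes literal. The sketch below uses a hypothetical function name, buildRuleRegex, and assumes key is a plain name such as "h1" that contains no regex metacharacters.

```javascript
// Hypothetical variant of ReplaceIt; assumes `key` is a plain name such as "h1"
// with no regex metacharacters in it.
function buildRuleRegex(key) {
  // String.raw keeps backslashes literal, so \s and \{ need no doubling.
  return new RegExp(String.raw`.${key}\s*\{([^\}]*?)\}`, 'm');
}

var reH1 = buildRuleRegex('h1');
var ruleBody = '.h1 { color: red }'.match(reH1)[1];
```

As in the original pattern, the leading dot is unescaped and therefore matches any single character, not only a literal ".".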

Replace all strings "<" and ">" in a variable with "&lt;" and "&gt;"

I am currently trying to code an input form where you can type and format a text for later use as XML entries. In order to make the HTML code XML-readable, I have to replace the code brackets with the corresponding symbol codes, i.e. < with &lt; and > with &gt;.
The formatted text gets transferred as HTML code with the variable inputtext, so we have for example the text
The <b>Genji</b> and the <b>Heike</b> waged a long and bloody war.
which needs to get converted into
The &lt;b&gt;Genji&lt;/b&gt; and the &lt;b&gt;Heike&lt;/b&gt; waged a long and bloody war.
I tried it with the .replace() function:
inputxml = inputxml.replace("<", "&lt;");
inputxml = inputxml.replace(">", "&gt;");
But this would just replace the first occurrence of the brackets. I'm pretty sure I need some sort of loop for this; I also tried using the each() function from jQuery (a friend recommended I look at the jQuery package), but I'm still new to coding in general and I have trouble getting this to work.
How would you code a loop which would replace the code brackets within a variable as described above?
Additional information
You are, of course, right in the assumption that this is part of something larger. I am a graduate student in Japanese studies and currently, I am trying to visualize information about Japanese history in a more accessible way. For this, I am using the Simile Timeline API developed by MIT grad students. You can see a working test of a timeline on my homepage.
The Simile Timeline uses an API based on AJAX and Javascript. If you don't want to install the AJAX engine on your own server, you can implement the timeline API from the MIT. The data for the timeline is usually provided either by one or several XML files or JSON files. In my case, I use XML files; you can have a look at the XML structure in this example.
Within the timeline, there are so-called "events" on which you can click in order to reveal additional information within an info bubble popup. The text within those info bubbles originates from the XML source file. Now, if you want to do some HTML formatting within the info bubbles, you cannot use code bracket because those will just be displayed as plain text. It works if you use the symbol codes instead of the plain brackets, however.
The content for the timeline will be written by people absolutely and totally not accustomed to codified markup, i.e. historians, art historians, sociologists, among them several persons of age 50 and older. I have tried to explain to them how they have to format the XML file if they want to create a timeline, but they occasionally slip up and get frustrated when the timeline doesn't load because they forgot to close a bracket or to include an apostrophe.
In order to make it easier, I have tried making an easy-to-use input form where you can enter all the information and format the text WYSIWYG style and then have it converted into XML code which you just have to copy and paste into the XML source file. Most of it works, though I am still struggling with the conversion of the text markup in the main text field.
The conversion of the code brackets into symbol code is the last thing I needed to get working in order to have a working input form.

look here:
http://www.bradino.com/javascript/string-replace/
just use this regex to replace all:
str = str.replace(/\</g,"&lt;") //for <
str = str.replace(/\>/g,"&gt;") //for >

To store an arbitrary string in XML, use the native XML capabilities of the browser. It will be a hell of a lot simpler that way, plus you will never have to think about the edge cases again (for example attribute values that contain quotes or pointy brackets).
A tip to think of when working with XML: Do never ever ever build XML from strings by concatenation if there is any way to avoid it. You will get yourself into trouble that way. There are APIs to handle XML, use them.
Going from your code, I would suggest the following:
$(function() {
$("#addbutton").click(function() {
var eventXml = XmlCreate("<event/>");
var $event = $(eventXml);
$event.attr("title", $("#titlefield").val());
$event.attr("start", [$("#bmonth").val(), $("#bday").val(), $("#byear").val()].join(" "));
if (parseInt($("#eyear").val()) > 0) {
$event.attr("end", [$("#emonth").val(), $("#eday").val(), $("#eyear").val()].join(" "));
$event.attr("isDuration", "true");
} else {
$event.attr("isDuration", "false");
}
$event.text( tinyMCE.activeEditor.getContent() );
$("#outputtext").val( XmlSerialize(eventXml) );
});
});
// helper function to create an XML DOM Document
function XmlCreate(xmlString) {
var x;
if (typeof DOMParser === "function") {
var p = new DOMParser();
x = p.parseFromString(xmlString,"text/xml");
} else {
x = new ActiveXObject("Microsoft.XMLDOM");
x.async = false;
x.loadXML(xmlString);
}
return x.documentElement;
}
// helper function to turn an XML DOM Document into a string
function XmlSerialize(xml) {
var s;
if (typeof XMLSerializer === "function") {
var x = new XMLSerializer();
s = x.serializeToString(xml);
} else {
s = xml.xml;
}
return s
}

https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/String/replace
You might use a regular expression with the "g" (global match) flag.
var entities = {'<': '&lt;', '>': '&gt;'};
'<inputtext><anotherinputext>'.replace(
/[<>]/g, function (s) {
return entities[s];
}
);

You could also surround your XML entries with the following:
<![CDATA[...]]>
See example:
<xml>
<tag><![CDATA[The <b>Genji</b> and the <b>Heike</b> waged a long and bloody war.]]></tag>
</xml>
Wikipedia Article:
http://en.wikipedia.org/wiki/CDATA
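One detail to watch with CDATA: the sequence ]]> cannot appear inside a CDATA section, because it ends the section. A common workaround is to split the text across two sections at that point. The sketch below uses a hypothetical helper named toCdata to illustrate this.

```javascript
// Hypothetical helper: wraps text in a CDATA section.
// "]]>" cannot appear inside CDATA, so it is split across two sections.
function toCdata(text) {
  return '<![CDATA[' + String(text).split(']]>').join(']]]]><![CDATA[>') + ']]>';
}

var simple = toCdata('The <b>Genji</b> waged war.');
var tricky = toCdata('a ]]> b');
```

In the second case the first section ends with "]]" and the next one starts with ">", so an XML parser reading both sections back-to-back recovers the original text.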

What you really need, as mentioned in comments, is to XML-encode the string. If you absolutely want to do this in JavaScript, have a look at the PHP.js function htmlentities.

I created a simple JS function to replace Greater Than and Less Than characters
Here is an example dirty string: < noreply@email.com >
Here is an example cleaned string: [ noreply@email.com ]
function RemoveGLthanChar(notes) {
var regex = /<[^>](.*?)>/g;
var strBlocks = notes.match(regex) || []; // match() returns null when nothing is found
strBlocks.forEach(function (dirtyBlock) {
let cleanBlock = dirtyBlock.replace("<", "[").replace(">", "]");
notes = notes.replace(dirtyBlock, cleanBlock);
});
return notes;
}
Call it using
$('#form1').submit(function (e) {
e.preventDefault();
var dirtyBlock = $("#comments").val();
var cleanedBlock = RemoveGLthanChar(dirtyBlock);
$("#comments").val(cleanedBlock);
this.submit();
});
