How to convert string into control characters? - javascript

We have a program where a user can input a line-break for another system.
The problem is that when we append this line-break to a string, it simply adds the literal text "\n".
For instance, this works:
var test = 'message';
test+='\n';
console.log("message " + test); // (doesn't show \n)
This doesn't work:
var test="message";
var myLineBreakFromDatabase = getMyLineBreak(); // simply returns a string containing "\n"
test+=myLineBreakFromDatabase;
test is now "message\n" (a literal backslash and n rather than a line break), which is useless.
However, we are using JavaScript throughout the system, so I guess we just need to figure out how to convert a line-break stored as a string into the actual control character?

The issue is that the string you are getting back from your database is actually something like "\\n", not "\n". Most likely you have some autoescaping going on somewhere, either before things get saved in the DB or before it gets to where you are using it in your app.
Hacky workaround that isn't a great idea but will probably "work":
var myLineBreakFromDatabase = getMyLineBreak().split('\\n').join('\n');
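A quick way to confirm this diagnosis is to compare lengths and character codes. The sketch below uses a hard-coded stand-in for the database value (the variable names are illustrative, not from the original code) and applies the same replacement as the workaround, written with a global regex:

```javascript
// Hypothetical stand-in for the database value: a backslash followed by "n".
var fromDatabase = '\\n';
// A real line break: one LF control character.
var realLineBreak = '\n';

console.log(fromDatabase.length);        // 2
console.log(realLineBreak.length);       // 1
console.log(fromDatabase.charCodeAt(0)); // 92, the backslash

// Same idea as split('\\n').join('\n'): swap the two-character text for a real LF.
var converted = fromDatabase.replace(/\\n/g, '\n');
console.log(converted === realLineBreak); // true
```

If the first length printed for your real data is 1, the value is already a control character and the problem lies elsewhere.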

If your linebreaks for different systems are stored in a database where the linebreak for unix is \u000A and the one for windows is \u000D\u000A, then you could read the text into myLineBreak and then do something like:
var actualLineBreakString=JSON.parse('"' + myLineBreak + '"');
You'll have to make sure your JavaScript environment provides the JSON object. If needed, it can be grabbed at json.org.
This approach works for any of the special characters. See here: http://msdn.microsoft.com/en-us/library/ie/2yfce773(v=vs.94).aspx
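As a minimal sketch of that idea, the helper below (decodeEscapes is a hypothetical name) wraps the stored text in double quotes and lets JSON.parse interpret the escape sequences. It assumes the stored value contains no unescaped double quotes and only escapes that JSON understands, such as \n, \r, \t and \uXXXX:

```javascript
// Hypothetical helper: turns stored escape text into the characters it describes.
// Assumes `stored` has no bare double quotes and uses only JSON-compatible escapes.
function decodeEscapes(stored) {
  return JSON.parse('"' + stored + '"');
}

// In source code, '\\u000A' is the six-character text \u000A, as it would come from a database.
var unixBreak = decodeEscapes('\\u000A');
var windowsBreak = decodeEscapes('\\u000D\\u000A');

console.log(unixBreak === '\n');      // true
console.log(windowsBreak === '\r\n'); // true
```

JSON.parse throws a SyntaxError on input it cannot interpret, so a try/catch around the call may be worth adding if the stored values are not fully trusted.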

Related

How to check if the special character I'm seeing is indeed what I'm seeing or it's just an HTML entity in disguise?

I have a div containing a string which is a Cypher query:
<div id="foo">match (n)-[r]-() where n.gid='Cx' return n,r</div>
Since the ' character will be encoded as ’, and since Cypher query isn't able to automatically decode it, I need to make sure whether the ' character I see (and Cypher sees as well) is really the ' character, not just ’ but the browser automatically transforms it to '.
Everywhere else the character is displayed as '; the only place it shows its true form is the HTML view-source page. I tried the solution from "What's the right way to decode a string that has special HTML entities in it?":
function decodeHtml(html) {
    var txt = document.createElement("textarea");
    txt.innerHTML = html;
    return txt.value;
}
But Cypher still doesn't accept it. I suspect I cannot check whether the string is still encoded, because wherever I look, the browser shows the decoded version.
Another problem appears when I run this in the console:
string = document.getElementById('foo').innerText
string.includes("'") //false
string.includes("&") //false
If it returns false in both cases, then which version of the character is actually in there? Is it the decoded or the encoded version?
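One way to find out which character a string really contains, regardless of how it is rendered, is to print its code points. The sketch below uses a hypothetical helper, describeChars, and assumes for illustration that the quote in the page is U+2019 (a curly quote), which the question does not confirm:

```javascript
// Hypothetical helper: lists each character of a string as a U+XXXX code point.
function describeChars(text) {
  return Array.from(text).map(function (ch) {
    return 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  });
}

// Assumed example data: a straight apostrophe versus a curly right single quote.
var straight = "'";
var curly = '\u2019';

console.log(describeChars(straight)); // [ 'U+0027' ]
console.log(describeChars(curly));    // [ 'U+2019' ]
console.log(curly.includes("'"));     // false
```

In the browser console the same helper could be given document.getElementById('foo').textContent. A result of false for both includes("'") and includes("&") is consistent with a decoded character that looks like an apostrophe but is a different code point.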
Related: Can Cypher interpret HTML character codes as input?

Dealing with the Cyrillic encoding in Node.Js / Express App

In my app a user submits text through a form's textarea and this text is passed on to the app and is then processed by jsesc library, which escapes javascript strings.
The problem is that when I type in a text in Russian, such as
нам #интересны наши #идеи
what I get is
'\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438'
I then need to pass this data through FlowDock to extract hashtags and FlowDock just does not recognize it.
Can someone please tell me
1) What is the need for converting it into that representation;
2) Whether it makes sense to convert it back to Cyrillic for FlowDock and for the database, or whether I should keep it as Unicode escapes and try to make FlowDock work with it?
Thanks!
UPDATE
The complete script is:
result = getField(req, field);
result = S(result).trim().collapseWhitespace().s;
// at this point result = "нам #интересны наши #идеи"
result = jsesc(result, {
    'quotes': 'double'
});
// now I end up with the Unicode escapes shown above (\u....)
var hashtags = FlowdockText.extractHashtags(result);
FlowDock receives the result which is
\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438
And doesn't extract hashtags from it...
These are 2 representations of the same string:
'нам #интересны наши #идеи' === '\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438'
It looks like flowdock-text doesn't work well with non-ASCII characters.
UPD: Tried, actually works well:
fdt.extractHashtags('\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438');
You shouldn't have used escaping in the first place: it gives you a string literal representation (suitable for eval, etc.), not the string itself.
UPD2: I've reduced your code to the following:
var jsesc = require('jsesc');
var fdt = require('flowdock-text');
var result = 'нам #интересны наши #идеи';
result = jsesc(result, {
    'quotes': 'double'
});
var hashtags = fdt.extractHashtags(result);
console.log(hashtags);
As I said, the problem is with jsesc: you don't need it. It returns a JavaScript-escaped string. You need it only when you build code by concatenation for eval and must protect against code injection, or something like that. For example, if you add result = eval('"' + result + '"');, it will work.
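The difference between a string and its escaped representation can be seen without either library. In this sketch the escaped value is written by hand as a stand-in for what an escaping step would produce, and JSON.parse is used in place of eval to turn it back; both choices are assumptions made for illustration:

```javascript
// The real string: "нам #идеи" written with escapes that the parser resolves.
var original = '\u043D\u0430\u043C #\u0438\u0434\u0435\u0438';

// Hand-written stand-in for escaped output: literal backslash-u sequences.
var escaped = '\\u043D\\u0430\\u043C #\\u0438\\u0434\\u0435\\u0438';

console.log(original.length);      // 9
console.log(escaped.length);       // 44
console.log(original === escaped); // false

// Decoding the escaped text again, without eval.
var decoded = JSON.parse('"' + escaped + '"');
console.log(decoded === original); // true
```

A hashtag extractor given the 44-character escaped text sees backslashes and ASCII letters after each #, not Cyrillic letters, so passing the unescaped string is the simpler route.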
What is the need for converting it into that representation?
jsesc is a JavaScript library for escaping JavaScript strings while generating the shortest possible valid ASCII-only output. Here’s an online demo.
This can be used to avoid mojibake and other encoding issues, or even to avoid errors when passing JSON-formatted data (which may contain U+2028 LINE SEPARATOR, U+2029 PARAGRAPH SEPARATOR, or lone surrogates) to a JavaScript parser or an UTF-8 encoder, respectively.
Sounds like in this case you don’t intend to use jsesc at all.
Try this:
decodeURIComponent("\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438");

JS/XSS: When assigning user-provided strings to variables; is it enough to replace <,>, and string delimiter?

If a server-side script generates the following output:
<script>
var a = 'text1';
var b = 'text2';
var c = 'text3';
</script>
, and the values (in this example "text1", "text2" and "text3") are user supplied (via HTTP GET/POST), is it enough to remove < and > from the input and to replace
'
with
' + "'" + '
in order to be safe from XSS? (This is my main question)
I'm particularly worried about the backslash not being escaped because an attacker could unescape the trailing '. Could that be a potential problem in this context? If the variable assignments were not separated by line breaks, an attacker could supply the values
text1
text2\
;alert(1);//
and end up with working JS code like
<script>
var a = 'text1'; var b = 'text2\'; var c = ';alert(1);//text3';
</script>
But since there are line breaks that shouldn't be a problem either. Am I missing something else?
It would be more secure to JSON encode your data, instead of rolling your own Javascript encoding function. When dealing with web application security, rolling your own is almost always not the answer. A JSON representation would handle the quotes and backslashes and any other special characters.
Most server side languages have a JSON module. Some also have a function specifically for what you're doing such as HttpUtility.JavaScriptStringEncode for the .NET framework.
If you were to roll your own, it would be better to replace special characters with escape sequences, for example " with \x22, instead of changing or removing single quotes. Also consider that there is a multitude of creative XSS attacks that you'd need to defend against.
The end result, whatever method you use, is that your data should remain intact when presented to the user. For example it's no good having O"Neil if someone's name is O'Neil.
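As a rough sketch of the JSON approach, assuming the server side is written in JavaScript (the question does not say which language is used), a value can be serialized with JSON.stringify and then have the few characters escaped that are risky inside a script block. The helper name toScriptLiteral is hypothetical:

```javascript
// Hypothetical helper: serializes a value for embedding inside a <script> block.
function toScriptLiteral(value) {
  return JSON.stringify(value)
    .replace(/</g, '\\u003c')       // prevents "</script>" from closing the block
    .replace(/\u2028/g, '\\u2028')  // line separator, a line terminator in older JS engines
    .replace(/\u2029/g, '\\u2029'); // paragraph separator, same concern
}

// Attack-style input: trailing backslash, quote, and a closing script tag.
var userInput = "text2\\'; alert(1);// </script>";
var line = 'var b = ' + toScriptLiteral(userInput) + ';';

console.log(line);
console.log(JSON.parse(toScriptLiteral("O'Neil"))); // O'Neil
```

Quotes and backslashes are handled by JSON.stringify itself, and the data round-trips unchanged, so a name like O'Neil is displayed as entered. This is not a complete XSS defense; other output contexts (HTML attributes, URLs) need their own encoding.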

Passing strings with Single Quote from MVC Razor to JavaScript

This seems so simple it's embarrassing. However, the first question is when passing a value from the new ViewBag in MVC 3.0 (Razor) into a JavaScript block, is this the correct way to do it? And more importantly, where and how do you apply the proper string replacement code to prevent a single quote from becoming &#39 as in the resultant alert below?
Adding this into a single script block:
alert('@ViewBag.str') // "Hi, how's it going?"
Results in the following alert:
Razor will HTML encode everything, so to prevent the ' from being encoded to &#39;, you can use
alert('@Html.Raw(ViewBag.str)');
However, now you've got an actual ' in the middle of your string which causes a javascript error. To get around this, you can either wrap the alert string in double quotes (instead of single quotes), or escape the ' character. So, in your controller you would have
ViewBag.str = "Hi, how\\'s it going?";
Another solution is to use a JSON string:
C#
ViewBag.str = "[{\"Text\":\"Hi, how's it going?\"}]";
Javascript
var j = @Html.Raw(ViewBag.str);
alert (j[0].Text);

Regex won't find '\u2028' unicode characters

We're having a lot of trouble tracking down the source of \u2028 (Line Separator) in user submitted data which causes the 'unterminated string literal' error in Firefox.
As a result, we're looking at filtering it out before submitting it to the server (and then the database).
After extensive googling and reading of other people's problems, it's clear I have to filter these characters out before submitting to the database.
Before writing the filter, I attempted to search for the character just to ensure it can find it using:
var index = content.search("/\u2028/");
alert("Index: [" + index + "]");
I get -1 as the result every time, even when I know the character is in the content variable (I've confirmed via a Java jUnit test on the server side).
Assuming that content.replace() would work the same way as search(), is there something I'm doing wrong or anything I'm missing in order to find and strip these line separators?
Your regex syntax is incorrect. You only use the two forward slashes when using a regex literal. It should be just:
var index = content.search("\u2028");
or:
var index = content.search(/\u2028/); // regex literal
But this should really be done on the server, if anywhere. JavaScript sanitization can be trivially bypassed. It's only useful for user convenience, and I don't think accidentally entering a line separator is that common.
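For the stripping step the question asks about, replace() accepts the same regex literal. The sketch below uses a hypothetical helper, stripLineSeparators, and also removes U+2029 (paragraph separator), which is an addition beyond the question:

```javascript
// Hypothetical helper: removes U+2028 and U+2029 from a string.
function stripLineSeparators(content) {
  return content.replace(/[\u2028\u2029]/g, '');
}

// Sample data containing a line separator between two words.
var content = 'first\u2028second';

console.log(content.search(/\u2028/));     // 5
console.log(content.search("/\u2028/"));   // -1, the slashes become part of the pattern
console.log(stripLineSeparators(content)); // firstsecond
```

The g flag matters here: without it, replace() would remove only the first occurrence.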
