Our web application has gone wrong a couple of times because of text copy-pasted from other applications. The root cause was always an ASCII control character that got saved (e.g. 0x1A). I would like to strip all control characters from the input string except newline and tab. I wrote the following code:
var originalString, newString;
newString = originalString.replace(/[\x00-\x08\x0B-\x1F\x7F]+/g, '');
So I keep the tab (0x09) and newline (0x0A). My tests all pass, but I would like to be sure before adding this code to our application.
Is this what I want? Is it correct? Should it be extended? Or should this rather be done in the backend?
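A quick way to see what the pattern keeps and drops is to run it over a string that mixes the characters in question. This is only a sketch: stripControlChars and the sample string are made-up names for illustration, the pattern itself is the one from the question. Note that carriage return (0x0D) falls inside \x0B-\x1F, so a Windows \r\n line break comes out as a bare \n.

```javascript
// Hypothetical helper name; the pattern is copied from the question.
function stripControlChars(input) {
  // Removes control characters 0x00-0x08 and 0x0B-0x1F plus DEL (0x7F).
  // Tab (0x09) and line feed (0x0A) are outside the ranges and survive.
  return input.replace(/[\x00-\x08\x0B-\x1F\x7F]+/g, '');
}

// Made-up sample: SUB (0x1A), a tab, a line feed, a CRLF pair and DEL.
var sample = 'a\x1Ab\tc\nd\r\ne\x7F';
var cleaned = stripControlChars(sample);

// JSON.stringify makes the invisible characters visible in the console.
console.log(JSON.stringify(cleaned));
```

Whether dropping \r is acceptable depends on what the backend expects, and a check like this on the client does not replace doing the same filtering on the server.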
Long story short, I'm trying to "fix" my system so that I use the same regular expressions on the backend as on the front end (validating on both sides for obvious security reasons). I've got my regex working server side just fine, but getting it down to the client is a pain. My quickest thought was to simply store it in a data attribute on a tag, grab it, and then validate against it.
Well, me, think again! JS is throwing me for a loop because apparently RegExp interprets the string differently depending on how it's pulled in. Can anyone shed some light on what is happening here or how I might go about resolving this issue?
HTML
<span data-regex="(^\\d{5}$)|(^\\d{5}-\\d{4}$)"></span>
Javascript
new RegExp($0.dataset.regex)
//returns /(^\\d{5}$)|(^\\d{5}-\\d{4}$)/
new RegExp($($0).data('regex'))
//returns /(^\\d{5}$)|(^\\d{5}-\\d{4}$)/
new RegExp("(^\\d{5}$)|(^\\d{5}-\\d{4}$)");
//returns /(^\d{5}$)|(^\d{5}-\d{4}$)/
Note how in the first two, where I pull the value from the data attribute dynamically, the RegExp constructor for some reason doesn't interpret the double backslash correctly. If, however, I copy and paste the value as a string and call RegExp on it, the double backslash is interpreted correctly and I get the right pattern.
I've also tried not escaping the \d on the server side (a single backslash instead of a double one), but as you might (or might not) have guessed, the opposite happens. When pulled from attributes/dataset, the \ is completely removed, leading the regex to think I'm looking for the "d" character rather than digits. I'm at a loss for understanding what JS is thinking here. Please send help, Internet.
Your data attribute has redundant backslashes. There's no need to escape backslashes in HTML attributes, so you'll actually get a double-backslash where you don't want one. When writing regular expressions as strings in JavaScript you have to escape backslashes, of course.
So you don't actually have the same string on both sides, simply because escaping works differently.
I have a big string (1,116,902 characters long) that I want to process with a regex (a pretty simple one). I get a response from a SOAP server that is encoded in base64, so I just take the content between the appropriate XML tags and then decode it.
This works for a small request. But when I get a big response back, the callback function of the replace() method is never called. I have tested the string on the regex101 website and it finds the result there, so I wonder if there is a limitation in my JavaScript engine. I'm working on a Wakanda Server V10 that uses WebKit as its JavaScript engine. I cannot provide the string because it contains some enterprise information.
Here is my regex: /xsd:base64Binary">((.|\n)*?)<\/responseData>/
I thought maybe there is a special character that is not matched by the ((.|\n)*?) group. But then why does regex101 find the result? (So maybe it is the JavaScript engine.)
Can anybody help me?
Thanks
If you can guarantee that there are no tags between your start and end delimiter, which sounds like it might be the case, you could just change your RE to
/xsd:base64Binary">([^<]*)<\/responseData>/
which shouldn't require any backtracking and might work for you.
[^<] simply means everything but the < character. Since there shouldn't be any tags between the open and closing tags of your section (at least that's what I understand) that will accept everything until you hit your closing tag. The important thing is that the RE engine can tell immediately whether something matches that or not, so no branching or backtracking is required.
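As a rough check, the suggested pattern can be tried on a synthetic string of about the same size. Everything about the test data here is made up (the real response is not shown in the question), and the sketch assumes, as above, that no < appears inside the element:

```javascript
// Made-up stand-in for the SOAP response: one million base64-looking characters.
var payload = new Array(250001).join('QUJD');
var xml = '<responseData xsi:type="xsd:base64Binary">' + payload + '</responseData>';

// Negated character class instead of the lazy (.|\n)*? group.
var match = /xsd:base64Binary">([^<]*)<\/responseData>/.exec(xml);

console.log(match ? match[1].length : 'no match');
```

If the element can contain a literal <, this pattern stops too early, and taking the substring between the two tags with indexOf may be the simpler route.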
In my web application (JSP, JQuery...) there is a form which, along with other fields, has a textarea where the user can input notes freely. The value is saved to the database as is.
The problem happens when the value has newline characters and is loaded back to the textarea; it sometimes "breaks" the Jquery code. Explaining further:
The value is loaded to the textarea using Jquery:
$('#p_notas').text("value_from_db");
When the user hits Enter to insert a new paragraph, the resulting value will include a newline character (or more than one char). This char is the problem as it varies from browser to browser and I haven't found out which one is causing the problem.
The error I get is a console error: SyntaxError: unterminated string literal. The page doesn't load correctly.
I'm not able to reproduce the problem. I tried with Chrome, Firefox and IE Edge (with several combinations of user agent and document mode).
We advise our users to use IE8+, Firefox or Chrome but we can't control it.
What I wanted to know is which character is causing the problem and how can I solve it.
Thanks
EDIT: Summing up - What are the differences in newline characters for the different browsers? Can I do anything to make them uniform?
EDIT 2: Looking at the page in the debugger, what I get is:
Case 1 (No problem)
$('#p_notas').text("This is the text I inserted \r\n More text");
Case 2 (Problem)
$('#p_notas').text("This is the text I inserted
More text");
In case 2 I get the Javascript error "SyntaxError: unterminated string literal." because it is interpreted as two lines of code
EDIT 3: #m02ph3u5 I tried using '\r' '\n' '\r\n' '\n\r' and I couldn't reproduce the problem.
EDIT 4: I'm going to try and replace all line breaks with '\n\r'
EDIT 5: In case it is of interest, what I did was treat the value before it was saved
value.replace(/(?:\r\n|\r(?=\n)|\n(?=\r))/g, '\n\r')
The problem isn't the browser but the operating system. Quoting from this post:
So, using \r\n will ensure linebreaks on all major operating systems without issue.
Here's a nice read on the why: why do operating systems implement line breaks differently?
The problem you might be experiencing comes from saving the value of the textarea and then returning that value including whatever newlines it contained. What you could do is "normalize" the value before saving, so that you don't have to change the output. In other words: get the value from the textarea and replace every possible form of newline (\r, \n or \r\n) with one that works on all operating systems: \r\n. Then, when you get the value from the database later on, it'll always be consistent.
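The normalization described above could look roughly like this. normalizeNewlines is a made-up name, and the sketch assumes the value is cleaned up in JavaScript just before it is submitted:

```javascript
// Hypothetical helper: converts every kind of line break to \r\n.
function normalizeNewlines(value) {
  // \r\n is listed first so a Windows line break is matched as one unit
  // and not as a \r followed by a separate \n.
  return value.replace(/\r\n|\r|\n/g, '\r\n');
}

// Mixed input: Unix (\n), old Mac (\r) and Windows (\r\n) line breaks.
var mixed = 'line 1\nline 2\rline 3\r\nline 4';
console.log(JSON.stringify(normalizeNewlines(mixed)));
```

This only makes the stored value uniform. A value that is later written into a JavaScript string literal on the page still has to be escaped at that point.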
I suspect your problem is actually that any new line in the entered input causes an issue. It looks like on the server you have a templated page, something like:
$('#p_notas').text("<%=db.value%>");
So what you end up with client side is:
$('#p_notas').text("some notes that
were entered by the user");
or some other characters that break the JS. Embedded quotes would do it too.
You need to escape the user-entered values somehow. The preferred "modern" way is to return the data via AJAX. If you are embedding the value within a template, what I might do is:
<div style="display:none" id="userdata"><%=db.value%></div>
<script>$('#p_notas').text($("#userdata").text());</script>
Of course, if it were exactly this, you could just embed the data in the textarea: <textarea><%=db.value%></textarea>
When you output data to the response, you always need to encode it using the appropriate encoding for the context it appears in.
You haven't mentioned which server-side technology you're using. In ASP.NET, for example, the HttpUtility class contains various encoding methods for different contexts:
HtmlEncode for general HTML output;
HtmlAttributeEncode for HTML attributes;
JavaScriptStringEncode for javascript strings;
UrlEncode for values passed in the query-string of a URL;
In some cases, you might need to encode the value more than once. For example, if you're passing a value in a URL via a javascript string, you'd need to UrlEncode the raw value, then JavaScriptStringEncode the result.
Assuming that you're using ASP.NET, and your code currently looks something like this:
$('#p_notas').text("<%# Eval("SomeField") %>");
change it to:
$('#p_notas').text("<%# HttpUtility.JavaScriptStringEncode(Eval("SomeField", "{0}")) %>");
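If the page is not generated by ASP.NET, the same idea (encode for the JavaScript-string context) can be sketched in JavaScript itself, for example in a server-side script that builds the page. toJsStringLiteral is a made-up helper, not a library function; it relies on JSON.stringify producing a quoted, escaped string and then escapes a few characters that are still risky inside an inline script block:

```javascript
// Hypothetical helper: returns a quoted literal that can be pasted into inline script code.
function toJsStringLiteral(value) {
  return JSON.stringify(String(value))
    .replace(/</g, '\\u003c')       // keeps "</script>" in the data from ending the script block
    .replace(/\u2028/g, '\\u2028')  // line separator, a line terminator in older engines
    .replace(/\u2029/g, '\\u2029'); // paragraph separator, same reason
}

// Made-up note containing a line break, quotes and a closing script tag.
var note = 'This is the text I inserted\r\nMore "text" </script>';

// The literal already includes its own quotes, so none are added here.
var line = "$('#p_notas').text(" + toJsStringLiteral(note) + ");";
console.log(line);
```

The generated line stays on a single line of source, so the "unterminated string literal" error cannot occur, and the browser gets the original text back when it evaluates the literal.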
I am sending a parameter, from Javascript to .NET. The parameter could contain multiple spaces like 'John [3 spaces here, stackoverflow not showing them] Smith', I need the spaces to stay. However, it looks like the spaces disappear in .NET. In an attempt to fix this, I made sure to encode the URI on client side, and decode it on server side. The code (in VB.NET) looks like this:
<AjaxPro.AjaxMethod()> _
Public Function GetSearch(ByVal strValue As String) As String
strValue = HttpUtility.UrlDecode(strValue)
...
End Function
Before the UrlDecode, strValue looks like John%20%20%20Smith. Afterwards it looks like John Smith (a single space). Can anyone tell me how to fix this?
I am using .NET 2.0 Web Forms.
EDIT: Following one of the suggestions below, I replaced all the spaces with (ampersand)nbsp;. I can see all the spaces when I debug. However, my database is SQL Server, and for some reason it treats regular spaces as different from (ampersand)nbsp; spaces, so the query does not return the right values. It took me some time to figure this out because I could not see the difference with the naked eye.
Ok, I was able to resolve the issue in the front-end by replacing all spaces with Non-breaking spaces. I then had to replace the Non-breaking spaces back with regular spaces in my SQL stored procedure (otherwise the query wouldn't work).
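The reason the database sees a difference is that a non-breaking space is a separate character (U+00A0) from a regular space (U+0020), even though both look the same on screen. A small sketch of the round trip, with made-up variable names, and of converting back in JavaScript instead of in the stored procedure:

```javascript
var name = 'John   Smith'; // three regular spaces (U+0020)

// URL encoding on its own keeps all three spaces.
var encoded = encodeURIComponent(name);
var decoded = decodeURIComponent(encoded);

// Swap regular spaces for non-breaking spaces (U+00A0), as in the workaround.
var withNbsp = name.replace(/ /g, '\u00A0');

// Looks identical when printed, but compares as a different string.
console.log(withNbsp === name); // false

// Convert back to regular spaces before the value is used in a query.
var restored = withNbsp.replace(/\u00A0/g, ' ');
console.log(restored === name); // true
```

Since encodeURIComponent and decodeURIComponent preserve the spaces here, the collapse most likely happens somewhere else in the request handling or in how the value is displayed, which is worth checking before relying on the non-breaking-space workaround.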
I have a search query from the user and I want to process it before sending it on from the browser. Since I'm using SEO-friendly URLs with htaccess, the search URL looks like this: /search/[user query]. I should do something to prevent the user from doing naughty things :) like searching for ../include/conf.php, which would give away my configuration file. I want to process the query by removing spaces, dots (which will cause problems), commas, etc.
var q = document.getElementById('q').value;
var q = q.replace(/ /gi,"+");
var q = q.replace(/../gi,"");
document.location='search/'+q;
The first replace works just fine, but the second one messes with my query. Is there a way to replace these risky characters safely?
So if I disable JavaScript or use curl, I can still do "naughty things"? On the client side, do the escaping with:
encodeURIComponent(document.getElementById('q').value)
and leave the security checks to the server. You would be amazed what a malicious user can do (using escape sequences instead of a plain . is the simplest example).
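For the client-side part, a minimal sketch (buildSearchUrl is a made-up name; the search/ prefix is the one from the question):

```javascript
// Hypothetical helper: builds the search URL with the query encoded as one path segment.
function buildSearchUrl(query) {
  return 'search/' + encodeURIComponent(query);
}

console.log(buildSearchUrl('two words'));
console.log(buildSearchUrl('../include/conf.php'));
```

encodeURIComponent turns the slashes into %2F and the space into %20, but it leaves the dots alone, and the server will usually decode the value again anyway. So this only keeps the URL well-formed. Rejecting paths like ../ still has to happen on the server.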
I'd do this server-side - it's too easy for someone to alter your JS in the page or switch it off altogether. Your search script that runs server-side can't (as) easily be tampered with and can then filter the search consistently.
You might also want to restrict what the search returns... if it's able to show sensitive config files, your search may have a little too much reach.
Dots in regular expressions match anything. You need to escape them with a back-slash ('\'):
var q = q.replace(/\.\./gi,"");
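To see why the original second replace "messes with" the query, compare the two patterns on the example path from the question (variable names are made up for the sketch):

```javascript
var query = '../include/conf.php';

// Unescaped: each . matches any character, so the string is eaten two characters at a time.
var unescaped = query.replace(/../g, '');

// Escaped: only a literal pair of dots is removed.
var escaped = query.replace(/\.\./g, '');

console.log(JSON.stringify(unescaped));
console.log(JSON.stringify(escaped));
```

The example string has 19 characters, so the unescaped pattern removes nine pairs and leaves only the final character. The i flag from the original is dropped here because dots have no case.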