When searching my db all special characters work aside from the "+" - it thinks its a space. Looking on the backend which is python, there is no issues with it receiving special chars which I believe it is the frontend which is Javascript
what i need to do is replace "+" == "%2b". Is there a way for me to use create this so it has this value going forth?
You can use decodeURIComponent('%2b'), or encodeUriComponent('+');
if you decode the response from the server, you get the + sign-
if you want to replace all ocurrence just place the whole string insde the method and it decodes/encodes the whole string.
Related
I'm running a NodeJS app that gets certain posts from an API.
When trying to JSON.parse with special characters in, the JSON.parse would fail.
Special characters can be just any other language, emojis etc.
Parsing works fine when posts don't have special characters.
I need to preserve all of the text, I can't just ignore those characters since I need to handle every possible language.
I'm getting the following error:
"Unexpected token �"
Example of a text i'm supposed to be able to handle:
"summary": "★リプライは殆ど見てません★ Tokyo-based E-J translator. ここは流れてくるニュースの自分用記録でRT&メモと他人の言葉の引用、ブログのフィード。ここで意見を述べることはしません。「交流」もしません。関心領域は匦"�アイルランドと英国(他は専門外)※Togetterコメ欄と陰謀論が嫌いです。"
How can I properly parse such a text?
Thanks
You have misdiagnosed your problem, it has nothing to do with that character.
Your code contains an unescaped " immediately before the special character you think is causing the problem. The early " is prematurely terminating the string.
If you insert a backslash to escape the ", your string can be parsed as JSON just fine:
x = '{"summary": "★リプライは殆ど見てません★ Tokyo-based E-J translator. ここは流れてくるニュースの自分用記録でRT&メモと他人の言葉の引用、ブログのフィード。ここで意見を述べることはしません。「交流」もしません。関心領域は匦\\"�アイルランドと英国(他は専門外)※Togetterコメ欄と陰謀論が嫌いです。"}';
console.log(JSON.parse(x));
You need to pass a string not as an object.
Example
JSON.parse('{"summary" : "a"}');
In your case it should be like this
JSON.parse(
'{"summary" : "★リプライは殆ど見てません★ Tokyo-based E-J translator. ここは流れてくるニュースの自分用記録でRT&メモと他人の言葉の引用、ブログのフィード。ここで意見を述べることはしません。「交流」もしません。関心領域は匦�アイルランドと英国(他は専門外)※Togetterコメ欄と陰謀論が嫌いです。"}')
My page state can be described by a JavaScript object that can be serialized into JSON. But I don't think a JSON string is suitable for use in a fragment ID due to, for example, the spaces and double-quotes.
Would encoding the JSON string into a base64 string be sensible, or is there a better way? My goal is to allow the user to bookmark the page and then upon returning to that bookmark, have a piece of JavaScript read window.location.hash and change state accordingly.
I think you are on a good way. Let's write down the requirements:
The encoded string must be usable as hash, i.e. only letters and numbers.
The original value must be possible to restore, i.e. hashing (md5, sha1) is not an option.
It shouldn't be too long, to remain usable.
There should be an implementation in JavaScript, so it can be generated in the browser.
Base64 would be a great solution for that. Only problem: base64 also contains characters like - and +, so you win nothing compared to simply attaching a JSON string (which also would have to be URL encoded).
BUT: Luckily, theres a variant of base64 called base64url which is exactly what you need. It is specifically designed for the type of problem you're describing.
However, I was not able to find a JS implementation; maybe you have to write one youself – or do a bit more research than my half-assed 15 seconds scanning the first 5 Google results.
EDIT: On a second thought, I think you don't need to write an own implementation. Use a normal implementation, and simply replace the “forbidden” characters with something you find appropriate for your URLs.
Base64 is an excellent way to store binary data in text. It uses just 33% more characters/bytes than the original data and mostly uses 0-9, a-z, and A-Z. It also has three other characters that would need encoded to be stored in the URL, which are /, =, and +. If you simply used URL encoding, it would take up 300% (3x) the size.
If you're only storing the characters in the fragment of the URL, base64-encoded text it doesn't need to be re-encoded and will not change. But if you want to send the data as part of the actual URL to visit, then it matters.
As referenced by lxg, there there is a base64url variant for that. This is a modified version of base64 to replace unsafe characters to store in the URL. Here is how to encode it:
function tobase64url(s) {
return btoa(x).replace(/\+/g,'-').replace(/\//g,'_').replace(/=/g,'');
}
console.log(tobase64url('\x00\xff\xff\xf1\xf1\xf1\xff\xff\xfe'));
// Returns "AP__8fHx___-" instead of "AP//8fHx///+"
And to decode a base64 string from the URL:
function frombase64url(s) {
return atob(x.replace(/-/g,'+').replace(/_/g, '/'));
}
Use encodeURIComponent and decodeURIComponent to serialize data for the fragment (aka hash) part of the URL.
This is safe because the character set output by encodeURIComponent is a subset of the character set allowed in the fragment. Specifically, encodeURIComponent escapes all characters except:
A - Z
a - z
0 - 9
- . _ ~ ! ' ( ) *
So the output includes the above characters, plus escaped characters, which are % followed by hexadecimal digits.
The set of allowed characters in the fragment is:
A - Z
a - z
0 - 9
? / : # - . _ ~ ! $ & ' ( ) * + , ; =
percent-encoded characters (a % followed by hexadecimal digits)
This set of allowed characters includes all the characters output by encodeURIComponent, plus a few other characters.
I've got some data from dbpedia using jena and since jena's output is based on xml so there are some circumstances that xml characters need to be treated differently like following :
Guns n ' Roses
I just want to know what kind of econding is this?
I want decode/encode my input based on above encode(r) with the help of javascript and send it back to a servlet.
(edited post if you remove the space between & and amp you will get the correct character since in stackoverflow I couldn't find a way to do that I decided to put like that!)
Seems to be XML entity encoding, and a numeric character reference (decimal).
A numeric character reference refers to a character by its Universal
Character Set/Unicode code point, and uses the format
You can get some info here: List of XML and HTML character entity references on Wikipedia.
Your character is number 39, being the apostrophe: ', which can also be referenced with a character entity reference: '.
To decode this using Javascript, you could use for example php.js, which has an html_entity_decode() function (note that it depends on get_html_translation_table()).
UPDATE: in reply to your edit: Basically that is the same, the only difference is that it was encoded twice (possibly by mistake). & is the ampersand: &.
This is an SGML/HTML/XML numeric character entity reference.
In this case for an apostrophe '.
We're allowing users to upload pictures and provide a text description. Users can view this through a pop up box (actually a div ) via javascript. The uploaded text is a parameter to a javascript function. I 'm worried about XSS and also finding issues with HTMLEncode().
We're using HTMLEncode to guard against XSS. Unfortunately, we're finding that HTMLEncode() only replaces '<' and '>'. We also need to replace single and double quotes that people may include. Is there a single function that will do all these special type characters or must we do that manually via .NET string.Replace()?
Unfortunately, we're finding that HTMLEncode() only replaces '<' and '>'.
Assuming you are talking about HttpServerUtility.HtmlEncode, that does encode the double-quote character. It also encodes as character references the range U+0080 to U+00FF, for some reason.
What it doesn't encode is the single quote. Bit of a shame but you can usually work around it by using only double quotes as attribute value delimiters in your HTML/XML. In that case, HtmlEncode is enough to prevent HTML-injection.
However, javascript is in your tags, and HtmlEncode is decidedly not enough to escape content to go in a JavaScript string literal. JavaScript-encoding is a different thing to HTML-encoding, so if that's the reason you're worried about the single quote then you need to employ a JS string encoder instead.
(A JSON encoder is a good start for that, but you would want to ensure it encodes the U+2028 and U+2029 characters which are, annoyingly, valid in JSON but not in JavaScript. Also you might well need some variety of HTML-escaping on top of that, if you have JavaScript in an HTML context. This can get hairy; it's usually better to avoid these problems by hiding the content you want in plain HTML, for example in a hidden input or custom attribute, where you can use standard HTML-escaping, and then read that data from the DOM in JS.)
If the text description is embedded inside a JavaScript string literal, then to prevent XSS, you will need to escape special characters such as quotes, backslashes, and newlines. The HttpUtility.HtmlEncode method is not suitable for this task.
If the JavaScript string literal is in turn embedded inside HTML (for example, in an attribute), then you will need to apply HTML encoding as well, on top of the JavaScript escaping.
You can use Microsoft's Anti-Cross Site Scripting library to perform the necessary escaping and encoding, but I recommend that you try to avoid doing this yourself. For example, if you're using WebForms, consider using an <asp:HiddenField> control: Set its Value property (which will be HTML-encoded automatically) in your server-side code, and access its value property from client-side code.
how about you htmlencode all of the input with this extended function:
private string HtmlEncode(string text)
{
char[] chars = HttpUtility.HtmlEncode(text).ToCharArray();
StringBuilder result = new StringBuilder(text.Length + (int)(text.Length * 0.1));
foreach (char c in chars)
{
int value = Convert.ToInt32(c);
if (value > 127)
result.AppendFormat("&#{0};", value);
else
result.Append(c);
}
return result.ToString();
}
this function will convert all non-english characters, symbols, quotes, etc to html-entities..
try it out and let me know if this helps..
If you're using ASP.NET MVC2 or ASP.NET 4 you can replace <%= with <%: to encode your output. It's safe to use for everything it seems (like HTML Helpers).
There is a good write up of this here: New <%: %> Syntax for HTML Encoding Output in ASP.NET 4 (and ASP.NET MVC 2)
I have an issue with submitting post data. I have a form which have a couple of text fields in, and when a button is pressed to submit the data, it is run through a custom from validation (JS), then I construct a query string like
title=test&content=some content
which is then submitted to the server. The problem I had is when I have '&' (eg  ) entered into one of the inputs which then breaks up the query string. Eg:
title=test&content=some content  
How do I get around this?
Thanks in advance,
Harry.
Run encodeURIComponent over each key and value.
var title = "test";
var content = "some content   ";
var data = encodeURIComponent('title') + /* You don't actually need to encode this as it is a string that only contains safe characters, but you would if you weren't sure about the data */
'=' + encodeURIComponent(title) +
'&' + encodeURIComponent('content') +
'=' + encodeURIComponent(content);
Encode the string..when you want to encode a query string with special characters you need to use encoding. ampersand is encoded like this
title=test&content=some content %26
basically any character in a query string can be replaced by its ASCII Hex equivalent with a % as the prefix
Space = %20
A = %41
B = %42
C = %43
...
You need to encode your query to make it URL-safe. You can refer to the following links on how to do that in JS:
http://xkr.us/articles/javascript/encode-compare/
http://www.webtoolkit.info/javascript-url-decode-encode.html
You said:
...and when a button is pressed to submit the data, it is run through a custom from validation (JS), then I construct a query string...
In the section where you are building the query string you should also run the value of each input through encodeURIComponent() as David Dorward suggested.
As you do - be careful that you only assign the new value to your processed query string and NOT the form element value, otherwise your users will think their input was somehow corrupted and potentially freak out.
[EDIT]
I just re-read your question and realized something important: you're encoding an   ;character. This is probably a more complicated issue than other posters here have read into. If you want that character, and other &code; type characters to transfer over you'll need to realize that they are codes. Those characters &, n, b, s, p and ; are not themselves the same as " " which is a space character that does not break.
You'll have to add another step of encoding/decoding. You can place this step either before of after the data is sent (or "POSTed").
Before:
(Using this question's answers)
var data = formElement.value;
data = rhtmlspecialchars(data, 0);
Which is intended to replace your "special" characters like with " " so that they are then properly encoded by encodeURIComponent(data)
Or after:
(using standard PHP functions)
<?PHP
$your_field_name = htmlspecialchars_decode(urldecode($_POST['your_field_name']));
?>
This assumes that you escaped the & in your POST with %26
If you replaced it with some function other than encodeURIComponent() you'll have to find a different way to decode it in PHP.
This should solve your problem:
encodeURIComponent(name)+'='+encodeURIComponent(value)+'&'+encodeURIComponent(name2)+'='+encodeURIComponent(value2)
You need to escape each value (and name if you want to be on the safe side) before concatenating them when you're building your query.
The JavaScript global function encodeURIComponent() does the escaping.
The global function escape() (DOM) does this for you in a browser. Although people are saying it is not doing the escaping well for unicode chars. Anyway if you're only concerned about '&' then this would solve your problem.