Storing long formatted text for a web app - javascript

My intention is to store books and other large blobs of formatted text (from a hundred to thousands of words per chapter) so that they can be displayed with their formatting in an application built with the Aurelia framework. I would prefer using JSON, but I could try other alternatives. The text has been written using Google Docs.
So far, trying to use JSON, Visual Studio Code says Unexpected end of string at the first carriage return, and the application gives me an error in the console:
Unhandled rejection SyntaxError: Unexpected token in JSON at position 780
Is there any way to indicate to JSON that something is formatted text, or any decent alternative?

Your JSON has characters in it that aren't properly escaped. Most likely these are quote " characters, which each need \" in their place, and literal line breaks, which a JSON string cannot contain and which must be written as \n. Unless you have a particularly robust workflow set up to handle transcribing, you're going to run into this problem a lot with large documents, especially ones coming from a word processor.
Instead, why not simply store the material as HTML? It is specifically designed to store and mark up documents. It has headings, paragraphs, lists, etc. Browsers are already equipped to display it without any processing, and it can be injected into your application by simply appending it to any element on the page.
Additionally, Google Docs should be able to save the document as HTML directly, so you don't have to do any manual markup.

You need to escape special characters. This discussion may help. Note that you will probably have your own list of escaped characters, which depends on your source string.
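If the text is available to a script as a plain string, the escaping does not have to be done by hand. The following is a minimal sketch, assuming a made-up `chapterText` value and a made-up `title`/`body` object shape; it only shows that `JSON.stringify` escapes quotes and line breaks and that `JSON.parse` restores them.

```javascript
// Hypothetical chapter text containing the characters that typically break
// hand-written JSON: double quotes and a literal line break.
const chapterText = 'She said "hello".\nThen the chapter ended.';

// JSON.stringify escapes quotes, backslashes and control characters.
const json = JSON.stringify({ title: 'Chapter 1', body: chapterText });

// JSON.parse restores the original string, line break included.
const restored = JSON.parse(json).body;

console.log(json);
console.log(restored === chapterText);
```

Generating the file this way, instead of pasting text between quotes in an editor, avoids the "Unexpected end of string" class of errors altogether.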

Related

JavaScript/NodeJS RTF CJK Conversions

I'm working on a node module that parses RTF files and does some find and replace. I have already come up with a solution for special characters expressed in escaped unicode here, but have run into a wall when it comes to CJK characters. Is there an easy way to do these conversions in JavaScript, either with a library or built in?
Example:
An RTF file viewed in plain text contains:
Now testing symbols {鈴:200638d}
When parsed in NodeJS, this part of the file looks like:
Now testing symbols \{
\f1 \'e2\'8f
\f0 :200638d\}\
I understand that \f1 and \f0 denote font changes, and the \'e2\'8f block is the actual character... but how can I take \'e2\'8f and convert it back to 鈴, or conversely, convert 鈴 to \'e2\'8f?
I have tried looking up the character in different encodings and am not seeing anything that remotely resembles \'e2\'8f. I understand that the RTF control \'hh is "a hexadecimal value, based on the specified character set (may be used to identify 8-bit values)" (source), or, perhaps a better definition from the Microsoft RTF spec, "%xHH (OCTET with the hexadecimal value of HH)" (download), but I have no idea what to do with that information to get these conversions going.
I was able to parse your sample file using my RTF parser and retrieve the correct character.
The key thing is the \fonttbl command, as the name suggests, defines the fonts used in the document. As part of the definition of each font the \fcharset command determines the character set to be used with this font. You need to use this to correctly interpret the character data.
My parser maps the argument of \fcharset to a codeset name here, which is then translated to a character set name that can be used to retrieve the correct Java Charset here. Your character set handling will obviously be different as you are working in JavaScript, but hopefully this information will help you move forward.
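In JavaScript the same idea could be sketched as follows. This is an illustration under assumptions, not the parser's actual code: the `FCHARSET_TO_ENCODING` table and the `decodeRtfHexRun` helper are hypothetical names, the table covers only a few common \fcharset values, and which entry applies to the sample above depends on the \fonttbl of that file, which is not shown here. It relies on the standard `TextDecoder`, which in Node needs a build with full ICU data to handle legacy encodings.

```javascript
// Assumed mapping from a few common \fcharset values to WHATWG encoding labels.
const FCHARSET_TO_ENCODING = {
  0: 'windows-1252', // ANSI
  128: 'shift_jis',  // Japanese
  129: 'euc-kr',     // Korean
  134: 'gbk',        // Simplified Chinese (GB2312)
  136: 'big5',       // Traditional Chinese
};

// Decode a run of \'hh escapes (for example "\\'e2\\'8f") by collecting the
// raw bytes and interpreting them in the encoding of the active font.
function decodeRtfHexRun(run, encoding) {
  const bytes = [];
  const pattern = /\\'([0-9a-fA-F]{2})/g;
  let match;
  while ((match = pattern.exec(run)) !== null) {
    bytes.push(parseInt(match[1], 16));
  }
  return new TextDecoder(encoding).decode(Uint8Array.from(bytes));
}
```

A call would look like `decodeRtfHexRun("\\'e2\\'8f", FCHARSET_TO_ENCODING[charsetOfCurrentFont])`, where `charsetOfCurrentFont` is whatever your parser read from the font table. The built-in `TextEncoder` only produces UTF-8, so the reverse direction (character to \'hh bytes) would need a library that can encode to legacy charsets.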

HTML in JSON that "should" not be there

I have description field in a form.
As suggested here, HTML escaping should not be done in input, so if you put <h1>Description</h1> it is saved like this to database.
The problem is that I have defined a REST API, and the output "could" be HTML.
Should I escape the field when constructing the JSON, or should I output HTML in JSON and let the client escape it?
I feel I should escape the HTML server side, but then this operation would cost processing time. On the other hand, escaping on the client saves this server time, but people using the API without carefully escaping HTML could end up with XSS attacks.
A client may, probably will, be a Javascript client which should process such potential HTML values using the DOM API:
document.getElementById('output').textContent = json.result;
Using this DOM API is perfectly safe and does not require escaping json.result, since it's never interpolated as HTML but treated as a text node by a higher-level API. If you send escaped HTML and the client is doing it properly like here, then escaped HTML will be shown on the client; i.e. you're turning your data into garbage.
So, no, never escape values for unrelated contexts. Escape/encode for JSON when putting values into JSON, don't worry about what may or may not happen later.
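As a small sketch of that rule (the `description` value and the `result` field name are only illustrative): the stored text goes through JSON encoding and nothing else, and the client gets back exactly what was stored.

```javascript
// The value exactly as stored in the database: raw, not HTML-escaped.
const description = '<h1>Description</h1> with a "quote"';

// Encode for the JSON context only; the HTML characters are left alone.
const payload = JSON.stringify({ result: description });

// A client that parses the JSON gets the original text back, and can then
// assign it to textContent without any further escaping.
const received = JSON.parse(payload).result;

console.log(payload);
console.log(received === description);
```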

What is the maximum length that $.parseJSON() can handle?

I have a long JSON array that needs to be sent to an HTML5 mobile app and parsed. The whole array is around 700kb (gzipped to 150kb) and it's 554976 characters long at the moment, but it will grow over time.
Using jquery to parse the json, my app crashes while trying to parse it. And so does jsonlint, json parser.fr and any other online json validator I try, so I'm guessing eval() is not an option either.
Might be a broad question but what is the maximum "acceptable" length for a json array?
I have already removed as much data as I can from the array, the only option I can think of is to split the array in 3-4 server calls and parse it separately in the app. Is there any other option?
EDIT
Thanks to #fabien for pointing out that if jsonlint crashes there is a problem with the JSON. There was a hidden "space" character in one of the nodes. It was parsed correctly on the server but not on the client.
I've parsed way bigger arrays with jquery.
My first guess is that there's an error in your json.
You have many ways to find it (Sublime Text can highlight the error, but that sometimes takes a while). Try pasting it into a web tool like http://www.jsoneditoronline.org/ and use any of the buttons (to format or to send to the right view). It'll tell you where the error is.
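The same check can be done locally. The sketch below is only an illustration, with hypothetical helper names `findInvisibleChars` and `checkJson`: it reports the parser's own error message and, on failure, lists the positions of characters that are easy to miss by eye, such as a non-breaking space.

```javascript
// List control characters, non-breaking spaces, zero-width characters and
// similar invisible characters together with their position in the text.
function findInvisibleChars(text) {
  const pattern = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F\u00A0\u200B-\u200D\u2028\u2029\uFEFF]/g;
  const hits = [];
  let match;
  while ((match = pattern.exec(text)) !== null) {
    const hex = match[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0');
    hits.push({ index: match.index, codePoint: 'U+' + hex });
  }
  return hits;
}

// Try to parse; on failure return the error message plus the invisible
// characters found, which are hints rather than a definitive diagnosis.
function checkJson(text) {
  try {
    JSON.parse(text);
    return { ok: true, error: null, invisible: [] };
  } catch (err) {
    return { ok: false, error: err.message, invisible: findInvisibleChars(text) };
  }
}
```

Some of these characters are legal inside string values, so a hit only marks a place worth looking at; outside of strings JSON accepts nothing but space, tab, line feed and carriage return as whitespace.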

Special character handling in javascript

I have some web pages. Users can enter anything they want into the forms on these pages.
I want special characters (both those visible on the keyboard and those that are not) to be saved in the database and retrieved exactly as they were entered.
Any suggestions ?
First of all, define what counts as a special character - since that description doesn't mean anything beyond "I think this might be handled differently."
Secondly, you shouldn't have to do anything extra in order to store these "special" characters (I'm guessing they're non-ASCII NLS characters) in the database - so long as the database supports these characters (you'll likely need to define your column as nvarchar). If the database doesn't support them at all, you'll have to store binary streams as BLOBs and just do all the decoding within your application.
So, as your question stands at the moment, my answer is simply:
Save the Unicode strings to a unicode column in the database
Load this column from the DB at a later point to retrieve them as-is.
If you've tried this, and are coming across any particular problems, then post them. But if you're merely investigating before implementing, I can't see why you'd run into issues.

nested quotes in javascript

A bit of a noob question here...
I have a javascript function on a list of table rows
<tr onclick="ClosePopup('{ScenarioID}', '{Name}');" />
However, the {Name} value can sometimes contain the character "'" (single quote). At the moment the error Expected: ')' comes up as a result, because the quote is effectively ending the javascript string early and destroying the syntax.
What is the best way to prevent single quotes in the {Name} value from affecting the javascript?
Cheers!
You're committing the first mortal sin of insecure web template programming - not escaping the content of the values being rendered into the template. I can almost guarantee you that if you take that approach, your web app will be vulnerable to XSS (cross site scripting) and any third party will be able to run custom javascript in your page, stealing user data and wreaking havoc as they wish.
Check it out. http://en.wikipedia.org/wiki/Cross-site_scripting
The solution is to escape the content. And doing that properly for javascript that is itself inside html takes a lot more than just putting backslashes in front of quotes.
Any decent templating engine out there should provide you a way to escape content as it's written to the template. Your database values can be left as-is, the important part is escaping it at output time. If your template engine or dynamic web app framework doesn't allow for this, change to one that does. :)
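To illustrate what that two-layer escaping involves when no template engine does it for you, here is a minimal sketch. The helpers `escapeHtmlAttr` and `onclickAttr` are hypothetical names, and the sketch assumes the attribute is written with double quotes: the value is first turned into a JavaScript string literal with `JSON.stringify`, then the whole script is escaped for the HTML attribute.

```javascript
// Escape text for use inside a double- or single-quoted HTML attribute.
// The ampersand is replaced first so later entities are not escaped twice.
function escapeHtmlAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Build the onclick attribute: JavaScript-escape each argument, then
// HTML-escape the resulting script as a whole.
function onclickAttr(scenarioId, name) {
  const script = `ClosePopup(${JSON.stringify(scenarioId)}, ${JSON.stringify(name)});`;
  return `onclick="${escapeHtmlAttr(script)}"`;
}

console.log(onclickAttr('42', "O'Brien"));
```

The browser decodes the entities before running the handler, so `ClosePopup` receives the name with its quote intact.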
In support of the prior comment please read the following to gain a better understanding of why the security advice is so important.
http://eval.symantec.com/mktginfo/enterprise/white_papers/b-whitepaper_web_based_attacks_03-2009.en-us.pdf
I would think that you could kill just about any code injection by, for example, replacing
"Hello"
with
String.fromCharCode(72,101,108,108,111)
Although the security information provided by everyone is very valuable, it was not so relevant to me in this situation, as everything in this instance is client-side and security measures are applied when getting the data and rendering the XML. The page is also protected through Windows authentication (administration section only) and the web app framework cannot be changed. The answer I was looking for was really quite simple in the end.
<tr onclick='ClosePopup("{ScenarioID}", "{Name}");' />
