Why does this character encoding issue only occur on select systems?

We are using a JavaScript WYSIWYG text editor called CKEditor. The editor has a source view that marks up, with HTML, what the user has entered in the text editor. Sometimes the editor will insert non-breaking spaces (&nbsp;) into this source view, which is fine.
Everything seemed to work correctly on the dev machines so we deployed to our production servers. At this point we started seeing a weird Â character (Â) being inserted into the text. After some reading I saw that this was reported in several tickets on the CKEditor bug tracking page. I was able to resolve the issue by setting the charset attribute on the script tag for ckeditor.js to UTF-8.
My question is this: Why did the script tag need the charset attribute set in the first place, and why only on certain systems?
The last comment on this SO question mentions that the UTF-8 byte sequence for a non-breaking space (0xC2 0xA0), when read as latin1 (which is ISO-8859-1, right?), is the Â character followed by a non-breaking space. This could definitely be a clue, because another Â character is inserted, one after another, every time the user switches to source view. It is as if the CKEditor framework is trying to inject a non-breaking space, but that gets turned into Â&nbsp;, then ÂÂ&nbsp;, and so on. The Content-Type on all systems (viewed in the Chrome debugger) is text/html;charset=ISO-8859-1, and I am unsure why. The -Dfile.encoding option in all Tomcat configs is set to utf-8. The meta tag is also <meta charset="utf-8">.

Fire up your development tools in the Web browser. When a form is rendered / submitted, stop and look at the request and response headers that are sent back and forth. Make sure you see UTF-8 everywhere. If it's missing, then one side will assume "default encoding" - whatever that might be.
Also make sure you have set the charset on the forms because they don't automatically inherit the one from the page.
EDIT: This page explains in detail how to set the charset when using Tomcat, plus the necessary code for your servlets.
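To see why a non-breaking space turns into Â plus a non-breaking space, here is a minimal sketch that encodes the character as UTF-8 and then reads each byte back as one ISO-8859-1 character. The function name misreadUtf8AsLatin1 is made up for this illustration; it only models the mismatch and is not what CKEditor or Tomcat actually run.

```javascript
// Encode text as UTF-8, then misread every byte as one ISO-8859-1 character.
// ISO-8859-1 maps byte value N directly to code point U+00NN.
// (misreadUtf8AsLatin1 is a hypothetical helper, not a library function.)
function misreadUtf8AsLatin1(text) {
  const utf8Bytes = new TextEncoder().encode(text);
  return Array.from(utf8Bytes, (byte) => String.fromCharCode(byte)).join('');
}

const nbsp = '\u00a0'; // non-breaking space, UTF-8 bytes C2 A0
const garbled = misreadUtf8AsLatin1(nbsp);

// Print the code points so the invisible NBSP shows up in the output.
const codePoints = Array.from(garbled, (ch) =>
  'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
console.log(codePoints.join(' ')); // U+00C2 U+00A0, i.e. "Â" followed by NBSP
```

One character goes in and two come out, which fits the symptom described in the question when one side writes UTF-8 and the other side assumes ISO-8859-1.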

Related

Why does this â€‹ sign pop up?

While running some code in Notepad++ that makes traffic lights run automatically, I noticed that this sign, â€‹, pops up next to one of the buttons. I did some research and found out that to make it go away I need to set the charset to utf-8. I did this and the sign went away. However, I am confused, because the default character encoding in HTML5 is utf-8, and Notepad++ even shows that it is using utf-8.
I was wondering if someone could tell me why the sign pops up considering the fact that it was already encoded in utf-8.

There are a number of things that all need to be set to UTF-8.
The original file, of course, needs to be UTF-8.
However, there is also an HTTP header (Content-Type) that specifies the encoding of the file. If this header is set incorrectly, the browser may try another encoding.
So, using a specific override in the HTML file can "work around" this issue.
There is a bit of discussion here: <meta charset="utf-8"> vs <meta http-equiv="Content-Type">
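The sign in the question is most likely a zero-width space (U+200B) whose UTF-8 bytes were read as windows-1252, which is what browsers use for pages labelled ISO-8859-1. A minimal sketch of that misreading follows. The helper name and the lookup table are written out for this illustration, and the assumption that the hidden character is U+200B comes from the shape of the garbled text, not from the original file.

```javascript
// Code points that windows-1252 assigns to bytes 0x80..0x9F. The five bytes
// the encoding leaves undefined are mapped to the matching C1 control code.
const CP1252_HIGH = [
  0x20ac, 0x0081, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008d, 0x017d, 0x008f,
  0x0090, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x009d, 0x017e, 0x0178,
];

// Hypothetical helper: encode as UTF-8, then decode byte by byte as windows-1252.
function misreadUtf8AsCp1252(text) {
  const utf8Bytes = new TextEncoder().encode(text);
  return Array.from(utf8Bytes, (byte) => {
    const inHighBlock = byte >= 0x80 && byte <= 0x9f;
    return String.fromCharCode(inHighBlock ? CP1252_HIGH[byte - 0x80] : byte);
  }).join('');
}

const zeroWidthSpace = '\u200b'; // assumed culprit, UTF-8 bytes E2 80 8B
const garbled = misreadUtf8AsCp1252(zeroWidthSpace);
console.log(garbled === '\u00e2\u20ac\u2039'); // true: "â", "€", "‹"
```

The file can be saved as UTF-8 and still display this way if the browser is told, or guesses, a different encoding, which is why declaring the charset made the sign disappear.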

Cross-Browser - Newline characters in textareas

In my web application (JSP, jQuery...) there is a form which, along with other fields, has a textarea where the user can input notes freely. The value is saved to the database as is.
The problem happens when the value has newline characters and is loaded back into the textarea; it sometimes "breaks" the jQuery code. Explaining further:
The value is loaded into the textarea using jQuery:
$('#p_notas').text("value_from_db");
When the user hits Enter to insert a new paragraph, the resulting value will include a newline character (or more than one char). This char is the problem as it varies from browser to browser and I haven't found out which one is causing the problem.
The error I get is a console error: SyntaxError: unterminated string literal. The page doesn't load correctly.
I'm not able to reproduce the problem. I tried with Chrome, Firefox and IE Edge (with several combinations of user agent and document mode).
We advise our users to use IE8+, Firefox or Chrome but we can't control it.
What I wanted to know is which character is causing the problem and how can I solve it.
Thanks
EDIT: Summing up - What are the differences in newline characters for the different browsers? Can I do anything to make them uniform?
EDIT 2: Looking at the page in the debugger, what I get is:
Case 1 (No problem)
$('#p_notas').text("This is the text I inserted \r\n More text");
Case 2 (Problem)
$('#p_notas').text("This is the text I inserted
More text");
In case 2 I get the Javascript error "SyntaxError: unterminated string literal." because it is interpreted as two lines of code
EDIT 3: @m02ph3u5 I tried using '\r', '\n', '\r\n' and '\n\r' and I couldn't reproduce the problem.
EDIT 4: I'm going to try and replace all line breaks with '\n\r'
EDIT 5: In case it is of interest, what I did was treat the value before it was saved
value.replace(/(?:\r\n|\r(?=\n)|\n(?=\r))/g, '\n\r')

The problem isn't the browser but the operating system. Quoting from this post:
So, using \r\n will ensure linebreaks on all major operating systems
without issue.
Here's a nice read on the why: why do operating systems implement line breaks differently?
The problem you might be experiencing is saving the value of the textarea and then returning that value including any newlines. What you could do is "normalize" the value before saving, so that you don't have to change the output. In other words: get the value from the textarea, then find every possible occurrence of a newline (\r, \n) and replace it with a value that works on all operating systems, \r\n. Then, when you get the value from the database later on, it will always be consistent.
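The normalization described above can be sketched as a single replace. The function name normalizeNewlines is chosen for this example; the pattern lists \r\n first so that an existing pair is treated as one line break rather than two.

```javascript
// Convert every line break style (\r\n, lone \r, lone \n) to \r\n.
// \r\n comes first in the alternation so existing pairs are matched as a unit.
function normalizeNewlines(value) {
  return value.replace(/\r\n|\r|\n/g, '\r\n');
}

const mixed = 'line 1\nline 2\rline 3\r\nline 4';
const normalized = normalizeNewlines(mixed);
console.log(JSON.stringify(normalized));
// "line 1\r\nline 2\r\nline 3\r\nline 4"
```

Running the function on already normalized text leaves it unchanged, so it is safe to apply on every save. Note that this only makes the stored value uniform; it does not by itself make the value safe to paste into a JavaScript string literal.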

I suspect your problem is actually that any newline in the entered input causes an issue. It looks like on the server you have a templated page, something like:
$('#p_notas').text("<%=db.value%>");
So what you end up with client side is:
$('#p_notas').text("some notes that
were entered by the user");
or some other characters that break the JS. Embedded quotes would do it too.
You need to escape the user-entered values somehow. The preferred "modern" way is to return the data via AJAX. If you are embedding the value within a template, what I might do is:
<div style="display:none" id="userdata"><%=db.value%></div>
<script>$('#p_notas').text($("#userdata").text());</script>
Of course, if it were exactly this, you could just embed the data directly in the textarea: <textarea><%=db.value%></textarea>

When you output data to the response, you always need to encode it using the appropriate encoding for the context it appears in.
You haven't mentioned which server-side technology you're using. In ASP.NET, for example, the HttpUtility class contains various encoding methods for different contexts:
HtmlEncode for general HTML output;
HtmlAttributeEncode for HTML attributes;
JavaScriptStringEncode for JavaScript strings;
UrlEncode for values passed in the query string of a URL.
In some cases, you might need to encode the value more than once. For example, if you're passing a value in a URL via a JavaScript string, you'd need to UrlEncode the raw value, then JavaScriptStringEncode the result.
Assuming that you're using ASP.NET, and your code currently looks something like this:
$('#p_notas').text("<%# Eval("SomeField") %>");
change it to:
$('#p_notas').text("<%# HttpUtility.JavaScriptStringEncode(Eval("SomeField", "{0}")) %>");
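For comparison, here is a minimal sketch of the same idea when the page is generated by JavaScript code instead of JSP or ASP.NET, which is an assumption and not the asker's setup. JSON.stringify produces a quoted string with line breaks and quotes escaped. The helper name toScriptSafeLiteral is made up; the extra replacements cover `<` (so the value cannot close the surrounding script element) and U+2028/U+2029, which older JavaScript engines treat as line terminators inside string literals.

```javascript
// Hypothetical helper: turn an arbitrary value into a JavaScript string
// literal that can be embedded in an inline <script> block.
function toScriptSafeLiteral(value) {
  return JSON.stringify(String(value))
    .replace(/</g, '\\u003c')       // avoid an early </script>
    .replace(/\u2028/g, '\\u2028')  // line separator
    .replace(/\u2029/g, '\\u2029'); // paragraph separator
}

const fromDb = 'first line\r\nsecond "quoted" line';
const scriptLine = "$('#p_notas').text(" + toScriptSafeLiteral(fromDb) + ');';

// The generated statement stays on one line; the line break is emitted
// as the escape sequences \r\n instead of raw characters.
console.log(scriptLine);
```

The literal already includes its own double quotes, so the template must not add another pair around it.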

Dynamically created JavaScript function not working with long parameter

I have several HTML A tags generated programmatically in ASP.NET, each with a JavaScript function call taking a long parameter in its href. One of those has over 20K characters when it is assigned in the backend, but on the browser side the actual link has only 5,239 characters and the JavaScript function call is not closed, so the link never works. I am thinking about workarounds, since it's not a good idea to put this much data in links, but for now I'm just curious about the cause of the issue.
Examples of the code assigning values to the link:
HtmlAnchor.HRef = "javascript:doSomething('Import','" + strHeader_LineIds + "');"
In this case the variable strHeader_LineIds carries a string over 20k characters.
Example of what I'm actually seeing in client side:
<a id=anchor1 class=class1 href="javascript:doSomething('Import', 'blahblahblahblah....">Link Text</a>
Please note that the JavaScript function call is not closed here. But when I debug in the backend I do see the closing of the function call.
I guess this issue may have something to do with the browser's URL length limit? I am using IE, and I learned here that IE has a maximum URL length of 2,083 characters. But then how can the link show up with 5,239 characters?

I've had a similar issue with JavaScript functions that were created dynamically in code and then called. I found that I had to experiment with swapping the single quotes in the JavaScript function for double quotes, or with escaping the quotes.
Then again, from reading your post, it could be a length limit issue.
Have you tried assigning the long value to an element in the background and then referencing that element from the JavaScript? I know IE gets funny with spaces in passed-in parameters.
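The suggestion of keeping the long value out of the link could look roughly like this. It is a sketch under several assumptions: payloads, registerPayload and doSomethingById are hypothetical names, the doSomething body is a stand-in for the page's real function, and the long value would still have to reach the page some other way (for example a hidden field or a script block) before being registered.

```javascript
// Hypothetical registry mapping an element id to its long value.
const payloads = new Map();

function registerPayload(elementId, value) {
  payloads.set(elementId, value);
}

// Stand-in for the page's existing doSomething(action, ids) function.
function doSomething(action, ids) {
  return action + ':' + ids.length;
}

// Hypothetical wrapper: the link passes a short id, the value is looked up here.
function doSomethingById(action, elementId) {
  return doSomething(action, payloads.get(elementId));
}

const longIds = 'x'.repeat(20000); // placeholder for strHeader_LineIds
registerPayload('anchor1', longIds);

// The href now carries only the short id, regardless of the payload size.
const href = "javascript:doSomethingById('Import','anchor1');";
console.log(href.length, doSomethingById('Import', 'anchor1'));
```

With this shape the length of the href no longer depends on the data, so any browser limit on javascript: URIs stops being a factor.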

I think I found an answer to the issue, though. According to this article:
JavaScript URIs
The JavaScript protocol is used for bookmarklets (aka favlets), a lightweight form of extensibility that permits a user to click a button and run some stored JavaScript on the currently loaded page. In IE9, the team did some work to relax the length limit (from ~260 characters, if I recall correctly) to something significantly larger (~5kb, if I recall correctly).
So I just hit the ~5kb limit.

How do I keep my UTF-8 characters from becoming junk?

I'm creating a simple JavaScript multiple choice game. Here is a sample question:
p ∧ q ≡ q ∧ p by which rule?
When I run it on localhost, it works fine, it prints out those special characters. However, when I upload it to my school's server, it prints out garbage:
p âˆ¨ q â‰¡ q âˆ¨ p by which rule?
I have this at the top of my HTML:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
I can't use PHP in my assignment, or I'd use header('Content-Type: text/xml; charset=utf-8');
If you want, I can give a link... but I'd rather not because then everyone can see my really bad educational game...
How can I keep my UTF-8 characters?
Edit: I found out that if I Filezilla my files up to the server and download them from the server, the characters become little squares. I don't know if that's useful information.

Edit: I found out that if I Filezilla my files up to the server and download them from the server, the characters become little squares. I don't know if that's useful information.
Yes, FileZilla is corrupting your files in transit. Make sure FileZilla transfers your files in binary mode so that the text is not altered. If it's transferring in ASCII mode, it will try to fix newlines and unrecognized characters.

If you cannot easily fix the HTTP headers, escape from the problem by using “character escapes.” If e.g. “∧” occurs in HTML content, use &#x2227; for it. If it occurs in a JavaScript string literal, use \u2227 for it.
To check out the codes for other characters, consult e.g.
http://www.alanwood.net/unicode/mathematical_operators.html
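Rather than looking up each code by hand, the escapes can be generated. The sketch below uses two made-up helpers: toJsEscapes rewrites every non-ASCII UTF-16 code unit as \uXXXX for use inside a JavaScript string literal, and toHtmlReferences rewrites every non-ASCII character as a hexadecimal character reference. Neither helper escapes ASCII characters such as &, < or quotes.

```javascript
// Replace each non-ASCII UTF-16 code unit with a \uXXXX escape.
function toJsEscapes(text) {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    out += unit < 0x80 ? text[i] : '\\u' + unit.toString(16).padStart(4, '0');
  }
  return out;
}

// Replace each non-ASCII character with an HTML hex character reference.
function toHtmlReferences(text) {
  return Array.from(text, (ch) => {
    const codePoint = ch.codePointAt(0);
    return codePoint < 0x80 ? ch : '&#x' + codePoint.toString(16).toUpperCase() + ';';
  }).join('');
}

const question = 'p \u2227 q \u2261 q \u2227 p by which rule?';
console.log(toJsEscapes(question));      // p \u2227 q \u2261 q \u2227 p by which rule?
console.log(toHtmlReferences(question)); // p &#x2227; q &#x2261; q &#x2227; p by which rule?
```

The output is pure ASCII, so it survives being served or transferred under the wrong encoding.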

Copying and pasting the questions into notepad or any other app that allows you to save as UTF-8 might work if that is a viable option.
I think you could also use a regex to identify the hex values and replace them with the corresponding values that would work in UTF-8.
Also, if you're using a specialized font, this could cause the problem. Are the questions styled with a particular font, or a set of fallbacks? You may need to do an @font-face import, but I suspect there's another option. With the symbols you're trying to use, it seems like LaTeX might be an option; I believe there are a few options out there for JavaScript, fonts, etc.
this article may also be useful: http://www.joelonsoftware.com/printerFriendly/articles/Unicode.html

Google AJAX Feed API, Dynamic Feed Control and the Japanese Language

English is fine, but for Japanese feeds it is showing invalid characters.
Why am I getting invalid characters in the Japanese feeds?
http://acsjapan.jp/j/index.html
But not in the English ones?
http://acsjapan.jp/
Please help me fix this for the Japanese feeds.

This is an encoding issue.
You are using (implicit) ISO-8859-1 encoding on your web page. Your AJAX feed serves UTF-8 characters.
This is tricky: I don't think you can make the Google Service deliver its data in the ISO-8859-1 character set. The best way would be to switch your site to UTF-8 - but that may have deeper consequences, and require other changes, especially if you are using a CMS.
Mandatory basic reading: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
