SyntaxError Invalid character '\u8220' - javascript

There is a GTK+ Javascript ( Seed ) Database Tutorial here:
Javascript GTK+ Seed SqLite Tutorial
When I cut and paste the code into Geany or Gedit and run it, it throws this error:
** (seed:19814): CRITICAL **: Line 3 in ./db.js: SyntaxError Invalid character '\u8220'
After trying unsuccessfully to locate and delete the offending character, not once but countless times, I gave up. Then I simply typed all the code into Geany by hand, and it compiles and executes with no errors.
My question is: is there a way for Geany or Gedit to display, or at least mark, such invisible characters? Retyping small programs is no problem, but as they grow larger it becomes exceedingly wearisome and inefficient, especially when one is trying to learn a new framework.
My apologies if this is not a proper question for Stack Overflow, but I truly am trying to understand how cutting and pasting adds these characters. I have looked through the HTML and I see no such character in the tutorial.
The whole point of having sample code is to let the end user cut and paste, compile and learn, isn't it? But this gave me a real headache until I finally decided to type it all in, and then everything worked.

Your editors are displaying the offending character, you're just not recognizing it. The code point in the error message refers to the left double quotation mark character, which is indeed present in some examples on the linked page. This is almost certainly an unintentional effect of some well-meaning text editing software. As you correctly point out, code examples are intended to be copied and pasted, and you should report a bug to the site admins.
To add to the confusion, the error message printed by Seed's JavaScript parser is wrong: it misrepresents the \u201c character (which has decimal code 8220) as \u8220. This is why attempts to search for \u8220 fail.
To fix the problem in your file, you need to replace the character with an upright quotation mark. To search for an arbitrary Unicode character in a GTK-based text editor, press ctrl-f to initiate search and input the character by pressing ctrl-shift-u followed by the code point, in this case 201c. You will need to do the same for the right quotation mark, whose hex code is 201d.
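For larger files, the hunt can also be scripted. The following is a minimal sketch in plain modern JavaScript, not part of Seed or of either editor; the names SMART_QUOTES, findSmartQuotes and replaceSmartQuotes are made up for illustration. It reports where typographic quotes occur and swaps them for upright ones.

```javascript
// Hypothetical helper names; only standard JavaScript built-ins are used.
// Typographic quotes mapped to their upright ASCII equivalents.
const SMART_QUOTES = new Map([
  ['\u201c', '"'], // left double quotation mark (decimal 8220)
  ['\u201d', '"'], // right double quotation mark (decimal 8221)
  ['\u2018', "'"], // left single quotation mark
  ['\u2019', "'"], // right single quotation mark
]);

// Return the 1-based line and column of every typographic quote.
function findSmartQuotes(source) {
  const hits = [];
  source.split('\n').forEach((line, index) => {
    for (let col = 0; col < line.length; col++) {
      const ch = line[col];
      if (SMART_QUOTES.has(ch)) {
        const hex = ch.codePointAt(0).toString(16).toUpperCase();
        hits.push({ line: index + 1, column: col + 1, codePoint: 'U+' + hex });
      }
    }
  });
  return hits;
}

// Replace every typographic quote with its upright equivalent.
function replaceSmartQuotes(source) {
  return source.replace(/[\u201c\u201d\u2018\u2019]/g, (ch) => SMART_QUOTES.get(ch));
}
```

Running findSmartQuotes on the pasted source before handing it to the interpreter gives the exact line and column to fix, which is easier than searching by eye.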

Related

WordPress loading javascript with strange character set

I'm using WordPress 5.1 with Yoast SEO. Yoast SEO relies on the file components.js which is throwing the following error in the console (I've edited this for brevity - it's a very long string)
Uncaught SyntaxError: Invalid regular expression:
/[A-Za-zªµºÀ-ÖØ-öø-ƺƻƼ-Æ¿Ç€-ǃDŽ-ʓʔʕ-ʯʰ-ʸʻ-ËË-Ë‘Ë -ˤˮͰ-ͳͶ-ͷͺͻ-ͽͿΆΈ-ΊΌΎ-Î
The identical string does not appear in the file, though the file does include the following line when looking with a text editor:
["+"A-Za-zªµºÀ-ÖØ-öø-ƺƻƼ-Æ¿Ç€-ǃDŽ-ʓʔʕ-ʯʰ-ʸʻ-ËË-Ë‘Ë -ˤˮͰ-ͳͶ-ͷͺÍ
The line looks like this when looking through the webhost control panel:
["+"A-Za-zªµºÀ-ÖØ-öø-ƺƻƼ-ƿǀ-ǃDŽ-ʓʔʕ-ʯʰ-ʸʻ-ˁː-ˑˠ-ˤˮͰ-ͳͶ-ͷͺͻ-ͽͿΆΈ-ΊΌΎ-ΡΣ-ϵϷ-ҁ҂Ҋ-ԯԱ-Ֆՙ՚-՟ա-և։ःऄ-हऻ
The only odd thing is that the database had a mix of character sets (latin1, utf8 and utf8mb4), which I have attempted to fix; all tables now use utf8mb4_unicode_ci (this was chosen as it was the most common character set in the db).
There is also a mix of InnoDB and MyISAM table types. The site has a number of  characters around the site which is a common indicator of character set issues as far as I can tell.
So I'm guessing for some reason WordPress is loading the javascript file with the incorrect character set which is creating errors.
Is there a way to fix this? I'm a bit baffled.
Fixed.
This was due to blog_charset being set to UTF-7 in wp_options. Changing this to UTF-8 has solved the problem.

Cross-Browser - Newline characters in textareas

In my web application (JSP, jQuery...) there is a form which, along with other fields, has a textarea where the user can input notes freely. The value is saved to the database as is.
The problem happens when the value has newline characters and is loaded back to the textarea; it sometimes "breaks" the jQuery code. Explaining further:
The value is loaded to the textarea using Jquery:
$('#p_notas').text("value_from_db");
When the user hits Enter to insert a new paragraph, the resulting value will include a newline character (or more than one char). This char is the problem as it varies from browser to browser and I haven't found out which one is causing the problem.
The error I get is a console error: SyntaxError: unterminated string literal. The page doesn't load correctly.
I'm not able to reproduce the problem. I tried with Chrome, Firefox and IE Edge (with several combinations of user agent and document mode).
We advise our users to use IE8+, Firefox or Chrome but we can't control it.
What I wanted to know is which character is causing the problem and how can I solve it.
Thanks
EDIT: Summing up - What are the differences in newline characters for the different browsers? Can I do anything to make them uniform?
EDIT 2: Looking at the page in the debugger, what I get is:
Case 1 (No problem)
$('#p_notas').text("This is the text I inserted \r\n More text");
Case 2 (Problem)
$('#p_notas').text("This is the text I inserted
More text");
In case 2 I get the Javascript error "SyntaxError: unterminated string literal." because it is interpreted as two lines of code
EDIT 3: @m02ph3u5 I tried using '\r' '\n' '\r\n' '\n\r' and I couldn't reproduce the problem.
EDIT 4: I'm going to try and replace all line breaks with '\n\r'
EDIT 5: In case it is of interest, what I did was treat the value before it was saved
value.replace(/(?:\r\n|\r(?=\n)|\n(?=\r))/g, '\n\r')
The problem isn't the browser but the operating system. Quoting from this post:
So, using \r\n will ensure linebreaks on all major operating systems
without issue.
Here's a nice read on the why: why do operating systems implement line breaks differently?
The problem you might be experiencing is saving the value of the textarea and then returning that value including any newlines. What you could do is "normalize" the value before saving, so that you don't have to change the output. In other words: get the value from the textarea, then find and replace every possible occurrence of a newline (\r, \n) with a value that works on all operating systems: \r\n. Then, when you get the value from the database later on, it'll always be correct.
I suspect your problem is actually that any new line in the entered input causes an issue. It looks like on the server you have a templated page, something like:
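The normalization described above could look roughly like this. It is a minimal sketch; the function name normalizeNewlines is made up for illustration, and it assumes a lone \r or a lone \n each stands for one line break.

```javascript
// Hypothetical helper: rewrite every line break as \r\n before saving.
// The alternation tries \r\n first so an existing pair is not doubled.
function normalizeNewlines(text) {
  return text.replace(/\r\n|\r|\n/g, '\r\n');
}
```

Applying it once, just before the value is written to the database, means every later read sees the same line-break convention regardless of where the text was typed.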
$('#p_notas').text("<%=db.value%>");
So what you end up with client side is:
$('#p_notas').text("some notes that
were entered by the user");
or some other characters that break the JS. Embedded quotes would do it too.
You need to escape the user-entered values somehow. The preferred "modern" way is to return the data from an AJAX request instead of embedding it in the script. If you are embedding the value within a template, what I might do is:
<div style="display:none" id="userdata"><%=db.value%></div>
<script>$('#p_notas').text($("#userdata").text());</script>
Of course if it were this exactly you could just embed the data in the text area <textarea><%=db.value%></textarea>
When you output data to the response, you always need to encode it using the appropriate encoding for the context it appears in.
You haven't mentioned which server-side technology you're using. In ASP.NET, for example, the HttpUtility class contains various encoding methods for different contexts:
HtmlEncode for general HTML output;
HtmlAttributeEncode for HTML attributes;
JavaScriptStringEncode for javascript strings;
UrlEncode for values passed in the query-string of a URL.
In some cases, you might need to encode the value more than once. For example, if you're passing a value in a URL via a javascript string, you'd need to UrlEncode the raw value, then JavaScriptStringEncode the result.
Assuming that you're using ASP.NET, and your code currently looks something like this:
$('#p_notas').text("<%# Eval("SomeField") %>");
change it to:
$('#p_notas').text("<%# HttpUtility.JavaScriptStringEncode(Eval("SomeField", "{0}")) %>");
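On stacks without a ready-made JavaScript string encoder, the same idea can be sketched with the standard JSON.stringify, which escapes quotes, backslashes and line breaks. The helper name toJsStringLiteral is hypothetical, and the extra replacements for "<" and the U+2028/U+2029 line separators are a cautious assumption for values embedded in an inline script block, not something the answers above prescribe.

```javascript
// Hypothetical helper: turn an arbitrary value into a JavaScript string
// literal (including the surrounding double quotes) that is safe to paste
// into generated script source.
function toJsStringLiteral(value) {
  return JSON.stringify(String(value))
    .replace(/</g, '\\u003c')        // keeps "</script>" from ending the block
    .replace(/\u2028/g, '\\u2028')   // line separators break older parsers
    .replace(/\u2029/g, '\\u2029');
}

// Example: the generated line contains no raw line break.
const notes = 'This is the text I inserted \r\n More "text"';
const generated = "$('#p_notas').text(" + toJsStringLiteral(notes) + ");";
```

Because the literal already carries its own quotes, the template must not wrap it in a second pair.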

Why does this character encoding issue only occur on select systems?

We are using a JavaScript WYSIWYG text editor called CKEditor. The editor has a source view that shows the HTML markup for what the user has entered in the text editor. Sometimes the editor will insert non-breaking spaces (&nbsp;) into this source view, which is fine.
Everything seemed to work correctly on the dev machines, so we deployed to our production servers. At this point we started seeing a weird character (Â) being inserted into the text. After some reading I saw that this was reported in several tickets on the CKEditor bug tracking page. I was able to resolve the issue by setting the charset attribute on the script tag for ckeditor.js to UTF-8.
My question is this: Why did the script tag need the charset attribute set in the first place, and why only on certain systems?
The last comment on this SO question mentions that the byte sequence for a non-breaking space in UTF-8 is actually the Â character followed by a non-breaking space in latin1 (which is ISO-8859-1, right?). This could definitely be a clue, because another Â character is inserted, one after another, every time the user switches to source view. It is as if the CKEditor framework is trying to inject a non-breaking space, but that gets turned into Â&nbsp, then ÂÂ&nbsp, and so on. The content-type on all systems (viewed from the Chrome debugger) is text/html;charset=ISO-8859-1, and I am unsure why. The -Dfile.encoding option in all Tomcat configs is set to utf-8. The meta tag is also <meta charset="utf-8">.
Fire up your development tools in the Web browser. When a form is rendered / submitted, stop and look at the request and response headers that are sent back and forth. Make sure you see UTF-8 everywhere. If it's missing, then one side will assume "default encoding" - whatever that might be.
Also make sure you have set the charset on the forms because they don't automatically inherit the one from the page.
EDIT This page explains in detail how you can set the charset when using Tomcat plus the necessary code for your servlets.
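The Â symptom described in the question can be reproduced outside the browser. This is a small sketch assuming Node.js (it relies on Node's Buffer); the function name misreadUtf8AsLatin1 is made up for illustration. It encodes a non-breaking space as UTF-8 and then decodes the same bytes as latin1, which is what a consumer does when it assumes the wrong charset.

```javascript
// Assumes Node.js: Buffer is a Node built-in, not a browser API.
// Hypothetical helper: encode text as UTF-8, then misread the bytes as latin1.
function misreadUtf8AsLatin1(text) {
  return Buffer.from(text, 'utf8').toString('latin1');
}

const nbsp = '\u00a0';
const bytes = Buffer.from(nbsp, 'utf8');      // two bytes: 0xC2 0xA0
const garbled = misreadUtf8AsLatin1(nbsp);    // "Â" followed by a non-breaking space
```

Seeing two characters come out where one went in is a strong hint that some layer is decoding UTF-8 bytes with a single-byte charset.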

JS backslash escape char being converted to non-escaping character by Shift JIS

I'm currently working on a website that has two versions, one American website that's served as utf-8 and one Japanese version that's served as Shift JIS. The site is generated using Perl.
The problem:
I'm serving Javascript akin to the following.
var text = "test \"quote\"";
Which, on the Japanese site, is returning an error "Uncaught SyntaxError: Unexpected identifier." This is because the backslash is being converted to an elongated backslash character \, which isn't seen as an escape character and thus is breaking the line.
I can't seem to find anyone else running into this problem, which makes me suspect that something is fundamentally wrong with our website. Has anyone encountered a similar situation and found a solution?
Many thanks
I found some helpful information here:
Why browser is showing different back slash for a email validation regex. How to prevent that?
Which led me to this upsetting hack:
var text = "test ¥"quote¥"";
This works perfectly. Now, obviously this isn't the way to do it, but it will enable other devs to get on testing other JS interactions on the site while I concentrate on refactoring this code into something that doesn't rely on character escaping. I hope this information helps someone else at some point!
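As an illustration of what code that "doesn't rely on character escaping" might look like, here are two possible directions. This is only a sketch, not the refactoring the poster actually carried out, and the variable names are made up.

```javascript
// Option 1: delimit the string with the other quote character, so the
// embedded double quotes need no backslash at all.
var textSingleQuoted = 'test "quote"';

// Option 2: build the quote from its character code (34 is the double quote).
var quote = String.fromCharCode(34);
var textFromCharCode = 'test ' + quote + 'quote' + quote;
```

Neither line contains a backslash, so nothing is left for the charset conversion to mangle.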

why does minified jQuery have line breaks? [duplicate]

I know about a similar question but it is a tiny bit off from what I am asking here, so please don't flag it as a duplicate.
When you see the production version of jQuery, why is there a newline after a while? I downloaded a copy and deleted all the newlines (apart from the licence) and it still worked. (I ran the entire unit test suite against my changes on Mozilla Firefox, Google Chrome and Opera.)
I know three newlines (not counting the license) is not going to slow it down a lot, but still, doesn't every tiny bit help?
I have assigned myself a small challenge, to squeeze every little bit of performance out of my JavaScript code.
jQuery currently uses UglifyJS to minify its source code. In its build script, it specifically sets the max_line_length directive to 32 * 1024.
The documentation for UglifyJS has this to say on the max-line-len directive:
--max-line-len (default 32K characters): add a newline after around 32K characters. I’ve seen both FF and Chrome croak when all the code was on a single line of around 670K. Pass --max-line-len 0 to disable this safety feature.
To cite the Closure Compiler FAQ:
The Closure Compiler intentionally adds line breaks every 500
characters or so. Firewalls and proxies sometimes corrupt or ignore
large JavaScript files with very long lines. Adding line breaks every
500 characters prevents this problem. Removing the line breaks has no
effect on a script's semantics. The impact on code size is small, and
the Compiler optimizes line break placement so that the code size
penalty is even smaller when files are gzipped.
This is relevant to any minification programs in general.
The lines (excluding the license) are all around 30k characters in length. It could be to avoid bugs where some Javascript parsers die on extremely long lines. This probably won't happen on today's browsers but maybe some older or more obscure ones have such limits.
(Old answer below, which might also be applicable, just not in this case)
This might be because JSMin, a popular Javascript minifier, will retain line feeds in the output under certain conditions. This is because in Javascript line feeds are significant if you leave out semicolons, for example. The documentation says:
It is more conservative in omitting linefeeds, because linefeeds are sometimes treated as semicolons. A linefeed is not omitted if it precedes a non-ASCII character or an ASCII letter or digit or one of these characters:
\ $ _ { [ ( + -
and if it follows a non-ASCII character or an ASCII letter or digit or one of these characters:
\ $ _ } ] ) + - " '
Other minifiers might have similar rules.
So this is mostly a precaution against accidentally removing a line feed that may be necessary, syntax-wise. The last thing you want is that your minified JS won't work anymore because the minifier destroyed its semantics.
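A small sketch of why a line feed can matter: when statements are written without semicolons, the line feed is what separates them. The helper name parses is made up; it uses the standard Function constructor only to check whether a snippet is syntactically valid, without running it.

```javascript
// Hypothetical helper: true if the text parses as a function body.
function parses(body) {
  try {
    new Function(body); // compiles the body, does not execute it
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// Two statements separated only by a line feed (no semicolons).
const withLineFeed = 'var a = 1\nvar b = 2';
// The same text after a naive minifier strips the line feed.
const withoutLineFeed = withLineFeed.replace(/\n/g, '');

const before = parses(withLineFeed);    // valid: the line feed ends the statement
const after = parses(withoutLineFeed);  // invalid: "1var" is not a legal token sequence
```

This is the kind of breakage a conservative minifier avoids by keeping a line feed whenever it cannot prove the removal is safe.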
Regarding »I know three newlines (not counting the license) is not going to slow it down a lot, but still, doesn't every tiny bit help?«: When your server uses gzip compression the difference will likely be moot anyway.
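To check the line lengths of a minified file for yourself, something like the following would do. It is a minimal sketch and the function name longestLine is made up; the source text would come from wherever you load the file.

```javascript
// Hypothetical helper: length in characters of the longest line in a source text.
function longestLine(source) {
  return source
    .split(/\r\n|\r|\n/)
    .reduce((max, line) => Math.max(max, line.length), 0);
}
```

Comparing the result against the minifier's configured limit shows whether the line breaks in a build are the work of that setting.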
