unicode chars give "unterminated string literal" in js

This error occurs when my HTML contains an unusual character that is rendered as whitespace.
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title></title>
</head>
<body>
<p>Some  Text</p>
</body>
</html>
Note that there is an invisible character between Some and Text; it does not show up here. I need to pass this HTML to toJson(), but it throws an "unterminated string literal" error.
Everything works fine when I use plain text instead, for example:
Some<space>Text
I've tried every replace approach I found while searching for this problem:
1) var re = /(?![\x00-\x7F]|[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF7][\x80-\xBF]{3})./g;
params.body_html = html.replace(re, '');
angular.toJson(params); // gives error
2) params.body_html.replace(/\uFFFD/g, '');
angular.toJson(params); // gives error
I don't know what character this is (it may be Unicode). When I copy it into an Emacs buffer, it shows up as �򠠨.
Note: the character appears as a red dot when you edit this question and open the snippet editor for the HTML above.
Any hints or ideas on how I can make this work?

Got this working with:
params.body_html = params.body_html.replace(/\u2028/g, '');
angular.toJson(params); //works fine.
Thanks to @Gothdo for providing the character link.
The problem is that this only handles this one particular Unicode character. Is there a function that replaces or trims all such Unicode characters?
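A broader variant of the fix above, as a minimal sketch. The helper name stripInvisible and the exact set of characters it removes (U+2028, U+2029, the zero-width characters U+200B to U+200D, and U+FEFF) are assumptions chosen for illustration, not something from the original post; extend the character class if other invisible characters show up in your input.

```javascript
// Hypothetical helper: removes the line separator (U+2028), the paragraph
// separator (U+2029), zero-width characters (U+200B-U+200D) and the
// byte order mark (U+FEFF) from a string.
function stripInvisible(text) {
  return text.replace(/[\u2028\u2029\u200B-\u200D\uFEFF]/g, '');
}

const params = { body_html: '<p>Some\u2028Text</p>' };
params.body_html = stripInvisible(params.body_html);
console.log(params.body_html); // <p>SomeText</p>
```

Ordinary non-ASCII letters such as ü or æ are left alone, because only the listed code points are matched.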

Related

Differentiate normal text from text in quotes

For a project that embeds shortened JS code in a webpage, I want to know whether text taken from the value of a textarea on the page is in quotes or not.
I already have this RegExp:
/(?:^|")([^"]*)(?:$|")/
It behaves oddly when I run .exec() on it in about:blank with input such as "\"console\" console \"asdf\" asdf \"consolea\" consolea" (the result contains only """ and ""), but I suspect that is because I don't understand what the result means, am using it incorrectly, or have the wrong expression.
In the abstract, this is what I want my code to do:
1) [Completed] Get the stringified value of the textarea on the page by its ID.
2) If console, without any extra characters, appears before the quoted text, get the quoted text minus its quotes (as a string, e.g. "text" instead of "\"text\"") that follows it, regardless of new-lines, provided that its opening quote comes before anything else after console.
3) Log the refined string to the console via console.log.
Code:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Custom Programming Language</title>
</head>
<body>
<textarea id="code"></textarea>
<br>
<button id="run">Run!</button>
<script>
var code = document.getElementById("code").value.toString();
// etc.
</script>
</body>
</html>

If you already have an algorithm in mind, just map it to a regular expression. Let's break step 2 down:
"If console without any extra characters" - match (?:^|\s+)console\s+ ("console" at the start of a line or preceded by one or more whitespace characters)
"before the quoted text minus its quotes" - match \\?"(.+?)\\?" (anything wrapped in quotes as a capturing group, quantified lazily so that it stops at the first closing quote). If you only allow escaped quotes, remove the ? quantifier.
"regardless of new-lines" - set the m flag so that ^ matches at the start of every line.
Combining all of the above yields /(?:^|\s+)console\s+\\?"(.+?)\\?"/gm
(() => {
const code = document.querySelector("#code");
const btn = document.querySelector("#run");
code.value = `console \"test\"
some other code here
console \"another test\"
`;
const regex = /(?:^|\s+)console\s+\\?"(.+?)\\?"/gm;
btn.addEventListener("click", () => {
const { value } = code;
[...value.matchAll(regex)].forEach(m => console.log(m[1]));
});
})();
textarea {
width: 50vw;
height: 50vh;
}
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Custom Programming Language</title>
</head>
<body>
<textarea id="code"></textarea>
<br>
<button id="run">Run!</button>
</body>
</html>
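The same regular expression can also be tried without a page, as a small sketch. The function name extractConsoleArgs and the sample input are assumptions made for illustration; only the regular expression comes from the answer above.

```javascript
// Regular expression from the answer: "console", whitespace, then a quoted
// string whose contents are captured in group 1.
const consoleCallRegex = /(?:^|\s+)console\s+\\?"(.+?)\\?"/gm;

// Hypothetical helper: returns the quoted text after every "console".
function extractConsoleArgs(source) {
  return [...source.matchAll(consoleCallRegex)].map(match => match[1]);
}

const sample = 'console "test"\nsome other code here\nconsole "another test"\n';
console.log(extractConsoleArgs(sample)); // [ 'test', 'another test' ]
```

Note that a word such as consolea is not matched, because the pattern requires whitespace directly after console.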

innerHTML does not show unicode text correctly

What is wrong with this code:
document.getElementById("artist").innerHTML = "Jürgen";
How can I get text containing à, ü, ë and so on to display correctly?
In the head I have set:
<meta charset="UTF-8">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
But this does not seem to work.
I also tried replacing ü with \uü. That didn't work either.

Yes, you need to use HTML character entities to display those letters. Paste the codes below (without the quotation marks) wherever you need those letters or symbols inside your HTML, like this:
<p>My fancy letter a looks like this: &agrave;</p>
For à use "&agrave;"
For ü use "&uuml;"
For ë use "&euml;"

For the umlaut, replace the ü with the HTML code below, which should show the correct output:
&uuml;

To properly declare your charset, note that it must be:
Within the <head> element,
Before any elements that contain text, such as the <title> element,
AND Within the first 512 bytes of your document, including DOCTYPE and whitespace
(source: code.google.com)
In short, it should be the first thing in your page <head>.
Also, I can't find the reference anymore, but it seems that it is not syntactically correct to use both <meta charset="utf-8"> and <meta http-equiv="content-type" content="text/html; charset=UTF-8">. At the very least it is useless, since they mean the same thing. The second form is deprecated and should be used only for compatibility with very old browsers.
And of course, your HTML file must be saved as UTF-8 too.
I hope this solves your problem.
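As a side note on the \uü attempt from the question: a JavaScript Unicode escape needs the code point in hexadecimal, so ü is written \u00FC. The sketch below shows that form; it sidesteps file-encoding problems because the source then contains only ASCII. The variable name artistName is an assumption, and the DOM line only runs in a browser that has an element with the id artist.

```javascript
// "\u00FC" is the escape sequence for ü (U+00FC), so this source line is
// pure ASCII and does not depend on how the file was saved.
const artistName = "J\u00FCrgen";
console.log(artistName); // Jürgen

// Only touch the DOM when running in a browser.
if (typeof document !== "undefined") {
  const target = document.getElementById("artist");
  if (target) {
    target.textContent = artistName;
  }
}
```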

If you are using PHP, try utf8_encode(text) to see if it works!
example:
var html = `
<p class="project-title"><?php print utf8_encode($title); ?></p>
<p class="project-desc"><?php print utf8_encode($description); ?></p>`
document.body.innerHTML += html;
This worked for me :)

Display path with backslash (javascript)

I am trying to display a path with a simple JavaScript alert call:
<html xmlns="http://www.w3.org/1999/xhtml" lang="fr" xml:lang="fr">
<head>
</head>
<body>
<div onClick=myFunction('D:\user\myself\dos')>
clic here
</div>
<SCRIPT LANGUAGE = "JAVASCRIPT">
function myFunction(p) {
alert(p);
}
</SCRIPT>
But it does not display the backslashes.
I suppose I should replace every "\" with "\\", but I can't find a way to do it.
(I tried p = p.replace(/\\/g, '\\\\'); and a lot of other variants, but none of them worked.)
Do you have any idea how to deal with this?
EDIT:
The path is returned by a function, so I can't edit it directly in "onClick".

The backslash '\' is itself the escape character in JavaScript strings.
So add one more backslash before every backslash you want to display.
If you cannot modify the URL, add it as a new attribute and read that attribute inside the onClick handler.
Try the working snippet below:
function myFunction(elem) {
alert(elem.getAttribute('data-url'));
}
<html xmlns="http://www.w3.org/1999/xhtml" lang="fr" xml:lang="fr">
<head>
</head>
<body>
<div data-url="D:\user\myself\dos" onClick=myFunction(this)>
clic here
</div>
Update: Code snippet updated to allow displaying url without modifying string.

You just need to call your function with double the backslashes to escape the escape character:
myFunction('D:\\user\\myself\\dos')
Will this work in your case?
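Since the path comes from a function, the doubling has to happen before the path is written into the inline handler; once the browser has parsed the string literal, the backslashes are already gone, which is likely why calling replace inside myFunction had no effect. A rough sketch of that step follows. The helper name escapeBackslashes is an assumption, and String.raw is used here only to write the sample path without escaping it by hand.

```javascript
// Hypothetical helper: doubles every backslash so the text survives being
// parsed as a JavaScript string literal.
function escapeBackslashes(path) {
  return path.replace(/\\/g, '\\\\');
}

// String.raw keeps the backslashes exactly as typed.
const rawPath = String.raw`D:\user\myself\dos`;

// Text that would be written into the onClick attribute.
const handler = `myFunction('${escapeBackslashes(rawPath)}')`;
console.log(handler); // myFunction('D:\\user\\myself\\dos')
```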

regex - replace multi line breaks with single in javascript

This is the content of a variable in JavaScript:
<meta charset="utf-8">
<title>Some Meep meta, awesome</title>
<-- some comment here -->
<meta name="someMeta, yay" content="meep">
</head>
I want to reduce runs of multiple line breaks (of unknown length) to a single line break while keeping the rest of the formatting intact. This should be done in JavaScript with a regex.
I am having trouble handling the tab characters and keeping the formatting.

Try this:
text.replace(/\n\s*\n/g, '\n');
This basically looks for two line breaks with only whitespace in between. And then it replaces those by a single line break. Due to the global flag g, this is repeated for every possible match.
edit:
is it possible to leave a double line break instead of a single
Sure, simplest way would be to just look for three line breaks and replace them by two:
text.replace(/\n\s*\n\s*\n/g, '\n\n');
If you want to maintain the whitespace on one of the lines (for whatever reason), you could also do it like this:
text.replace(/(\n\s*?\n)\s*\n/g, '$1');
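To see the first pattern at work, here is a small sketch. The function name collapseBlankLines and the sample text are assumptions made for illustration; the regular expression is the one from this answer.

```javascript
// Replaces every run of line breaks that contains only whitespace in
// between with a single line break.
function collapseBlankLines(text) {
  return text.replace(/\n\s*\n/g, '\n');
}

const input = '<head>\n\n\n\t<title>Meep</title>\n \n</head>';
console.log(JSON.stringify(collapseBlankLines(input)));
// "<head>\n\t<title>Meep</title>\n</head>"
```

The tab in front of the title survives because \s* has to be followed by a line break, so the match ends before the indentation of the next non-empty line.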

myText = myText.replace(/\n{2,}/g, '\n');
See demo

Given the following (remember to encode HTML entities such as <, > and (among others, obviously) &):
<pre>
<head>
<meta charset="utf-8">
<title>Some Meep meta, awesome</title>
<-- some comment here -->
<meta name="someMeta, yay" content="meep">
</head>
</pre>
<pre>
</pre>
The following JavaScript works:
var nHTML = document.getElementsByTagName('pre')[0].textContent.replace(/[\r\n]{2,}/g,'\r\n');
document.getElementsByTagName('pre')[1].appendChild(document.createTextNode(nHTML));
JS Fiddle demo.

To replace all the extra line breaks and leave only one, use:
myText = myText.replace(/\n\n*/g,'\r\n');

jQuery - .length() counts special chars as 2?

When I use $('#id').val().length; it returns 2 for characters like æ, ø and å.
Can someone tell me why, and how I can get each of them counted as one character?

I suspect something else is going on here and that it is not an encoding issue.
I refuse to believe this is a jQuery issue (see http://jsfiddle.net/KLzYf/ for my justification).
The following raw HTML will report back "1":
<html>
<head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
</head>
<body>
<input type="text" value="æ" id="test"/>
<script type="text/javascript">
alert(document.getElementById("test").value.length);
</script>
</body>
</html>
I'd be interested to see some of your HTML and other code. It would also help to run a few tests; for instance, what do the following give you?
alert("æ".length); //=1?
alert('"' + $('#id').val() + '"'); //are there any spaces/other chars?
Also, if you view the source of the HTML, what do the contents of your input look like?

try this:
http://jsfiddle.net/Innuendo108/GXwGG/
There are 2 characters and it says length=2

I think you have an extra space on one or both sides of that character.
<p id="t">æ</p>
$('#t').text().length
This works properly.
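Two possible causes of a length of 2 can be checked quickly; neither is confirmed by the thread, so treat this as a sketch under those assumptions. The length property counts UTF-16 code units, so å typed as a plus a combining ring counts as 2 until it is normalized, and a stray space around the character adds 1 as well.

```javascript
const precomposed = '\u00E5'; // å as a single code point
const decomposed = 'a\u030A'; // a + combining ring above, also renders as å

console.log(precomposed.length);                 // 1
console.log(decomposed.length);                  // 2
console.log(decomposed.normalize('NFC').length); // 1

// A surrounding space is counted too; trim() removes it.
console.log(' \u00E6'.length);        // 2
console.log(' \u00E6'.trim().length); // 1
```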
