regex - replace multi line breaks with single in javascript

This is the content of a JavaScript variable:
<head>
<meta charset="utf-8">
<title>Some Meep meta, awesome</title>
<-- some comment here -->
<meta name="someMeta, yay" content="meep">
</head>
I want to collapse runs of multiple line breaks (of unknown length) into a single line break while keeping the rest of the formatting intact. This should be done in JavaScript with a regex.
I have trouble with the tab characters and with keeping the formatting intact.

Try this:
text.replace(/\n\s*\n/g, '\n');
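This basically looks for two line breaks with only whitespace in between and replaces them with a single line break. Because \s also matches line breaks, one match covers a whole run of blank or whitespace-only lines, while any indentation after the last line break is left untouched. Due to the global flag g, this is repeated for every possible match.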
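A minimal sketch wrapping the regex above in a function (the name collapseBlankLines is hypothetical). It shows that tab indentation on the line after a run of blank lines survives, which was the concern in the question:

```javascript
// Collapse every run of blank or whitespace-only lines into one line break.
// \s also matches \n, so one match can span many blank lines, but the match
// must end on a \n, so indentation after the last break is kept.
function collapseBlankLines(text) {
  return text.replace(/\n\s*\n/g, "\n");
}

const input = "<head>\n\n\n\t<meta charset=\"utf-8\">\n   \n\t<title>Meep</title>";
console.log(JSON.stringify(collapseBlankLines(input)));
// "<head>\n\t<meta charset=\"utf-8\">\n\t<title>Meep</title>"
```

Single line breaks are not touched, since the pattern needs at least two \n characters.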
Edit:
Is it possible to leave a double line break instead of a single one?
Sure, the simplest way would be to look for three line breaks and replace them with two:
text.replace(/\n\s*\n\s*\n/g, '\n\n');
If you want to maintain the whitespace on one of the lines (for whatever reason), you could also do it like this:
text.replace(/(\n\s*?\n)\s*\n/g, '$1');
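Both "keep a double line break" variants from this answer, side by side as a sketch (the function names are hypothetical). The second one only differs in that the capture group keeps whatever whitespace sat between the first two line breaks:

```javascript
// Variant 1: reduce runs of three or more line breaks to exactly two
// (one blank line). Whitespace on the removed blank lines is dropped.
function keepOneBlankLine(text) {
  return text.replace(/\n\s*\n\s*\n/g, "\n\n");
}

// Variant 2: keep the first two line breaks, plus any whitespace between
// them, via $1. The g flag applies it to every run, not only the first.
function keepOneBlankLinePreservingSpace(text) {
  return text.replace(/(\n\s*?\n)\s*\n/g, "$1");
}

console.log(JSON.stringify(keepOneBlankLine("a\n\n\n\nb")));                  // "a\n\nb"
console.log(JSON.stringify(keepOneBlankLinePreservingSpace("a\n  \n\n\nb"))); // "a\n  \nb"
```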

myText = myText.replace(/\n{2,}/g, '\n');
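Note that /\n{2,}/ only matches consecutive \n characters, so text with Windows line endings ("\r\n\r\n") never matches. A sketch that normalizes line endings first (the function name is hypothetical); whitespace-only blank lines are still not collapsed by this pattern:

```javascript
function collapseNewlines(text) {
  return text
    .replace(/\r\n?/g, "\n")   // CRLF and lone CR -> LF
    .replace(/\n{2,}/g, "\n"); // runs of two or more LF -> single LF
}

console.log(JSON.stringify(collapseNewlines("a\r\n\r\n\r\nb"))); // "a\nb"
console.log(JSON.stringify(collapseNewlines("a\n \nb")));        // "a\n \nb" (space in between, no match)
```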

Given the following markup (remember to encode characters such as <, > and, of course, & as HTML entities):
<pre>
<head>
<meta charset="utf-8">
<title>Some Meep meta, awesome</title>
<-- some comment here -->
<meta name="someMeta, yay" content="meep">
</head>
</pre>
<pre>
</pre>
The following JavaScript works:
var nHTML = document.getElementsByTagName('pre')[0].textContent.replace(/[\r\n]{2,}/g,'\r\n');
document.getElementsByTagName('pre')[1].appendChild(document.createTextNode(nHTML));

To replace all the extra line breaks and leave only one, use:
myText = myText.replace(/\n\n*/g,'\r\n');
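For reference, \n\n* ("one \n, then zero or more \n") is the same pattern as \n+. A short sketch (hypothetical function name) showing two side effects of the '\r\n' replacement: a lone \n also becomes \r\n, and input that already uses \r\n is not collapsed but gains extra \r characters:

```javascript
function collapseToCrlf(text) {
  return text.replace(/\n+/g, "\r\n"); // same as /\n\n*/g
}

console.log(JSON.stringify(collapseToCrlf("a\n\n\nb")));   // "a\r\nb"
console.log(JSON.stringify(collapseToCrlf("a\nb")));       // "a\r\nb"
console.log(JSON.stringify(collapseToCrlf("a\r\n\r\nb"))); // "a\r\r\n\r\r\nb"
```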

Related

innerHTML does not show unicode text correctly

What is wrong with this code:
document.getElementById("artist").innerHTML = "Jürgen";
How can I make text with characters like à, ü, ë and so on display correctly?
In the head I have set:
<meta charset="UTF-8">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
But this does not seem to work.
I also tried replacing ü with \uü, but that didn't work either.

Yes, you need to use HTML character entities to display those letters. Just paste the codes below (without the quotation marks) wherever you need those letters or symbols in your HTML, like this:
<p>My fancy letter a looks like this: &aacute;</p>
For à use "&agrave;" (for á, use "&aacute;")
For ü use "&uuml;"
For ë use "&euml;"

For the umlaut, replace the ü with the HTML code below, which should display correctly:
&uuml;

To properly declare your charset, note that it must be:
Within the <head> element,
Before any elements that contain text, such as the <title> element,
And within the first 512 bytes of your document, including DOCTYPE and whitespace
(source: code.google.com)
In short, it should be the first thing in your page <head>.
Also, I can't find the reference anymore, but it seems it's not valid to use both <meta charset="utf-8"> and <meta http-equiv="content-type" content="text/html; charset=UTF-8">. At the very least it is redundant, since they mean the same thing. The second one is the legacy form and should only be needed for compatibility with very old browsers.
And of course, your HTML file should be saved in UTF-8 too.
I hope this solves your problem.
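If the file's encoding can't easily be fixed, one JavaScript-side workaround is to write non-ASCII characters as Unicode escapes. A \u escape needs exactly four hexadecimal digits (ü is U+00FC), so the question's \uü is not a valid escape sequence. The escape is resolved by the JavaScript parser, so it does not depend on how the file was saved. The id "artist" comes from the question; the helper toJsEscape is hypothetical:

```javascript
// "\u00FC" is ü written with ASCII characters only.
const artistName = "J\u00FCrgen";

// In the browser, as in the question:
// document.getElementById("artist").innerHTML = artistName;

// Hypothetical helper: build the \u escape for a character.
// Only correct for characters in the Basic Multilingual Plane (up to U+FFFF).
function toJsEscape(ch) {
  return "\\u" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0");
}

console.log(artistName);      // Jürgen
console.log(toJsEscape("ü")); // \u00FC
console.log(toJsEscape("à")); // \u00E0
```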

If you are using PHP, try using utf8_encode(text) to see if it works!
Example:
var html = `
<p class="project-title"><?php print utf8_encode($title); ?></p>
<p class="project-desc"><?php print utf8_encode($description); ?></p>`
document.body.innerHTML += html;
This worked for me :)

Why can document.write('\ud83d\ude00') output an emoji in an HTML page with a UTF-8 charset?

This is the HTML file:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>emoji</title>
</head>
<body>
\ud83d\ude00
<script>
var string = '\ud83d\ude00';
document.write(string);
console.log(string);
</script>
</body>
</html>
The Unicode escape '\ud83d\ude00' in the body tag doesn't show as an emoji, only as literal text. But the same '\ud83d\ude00' does show as an emoji via document.write() and console.log(), even though the document charset is UTF-8. Why?

The UTF-8 in the meta tag just says how to read the text of the HTML response; it doesn't put the actual DOM document into some kind of mode that makes the document itself UTF-8.
As for why your string works: A JavaScript string is a series of UTF-16 code units. So '\ud83d\ude00' defines the emoji at the JavaScript level. Then you use document.write to write that string out to the document. At that point, you're dealing with a live document, not the source text that you said was in UTF-8.
If you want to include the emoji in the document directly, rather than via document.write, just paste it into the document; your editor will output the appropriate UTF-8 sequence for it if you save the file as UTF-8 (which you need to, because you've told the browser that's the encoding you're using).
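To see the JavaScript side of this concretely, a short sketch (runnable in Node or a browser console) showing that the two escapes are the two UTF-16 code units of a single character, U+1F600:

```javascript
// U+1F600 lies outside the Basic Multilingual Plane, so a JavaScript
// string stores it as two UTF-16 code units (a surrogate pair).
const fromEscapes = "\ud83d\ude00";
const fromCodePoint = String.fromCodePoint(0x1f600);

console.log(fromEscapes === fromCodePoint);           // true
console.log(fromEscapes.length);                      // 2 (code units)
console.log([...fromEscapes].length);                 // 1 (code points)
console.log(fromEscapes.codePointAt(0).toString(16)); // 1f600
```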

Please just consider this an addition to T.J. Crowder's answer; I don't have 50 rep, so I couldn't add it as a comment :(
You're escaping incorrectly for HTML. In JavaScript, backslashes (\) are used for escape sequences, but in HTML, escaping is done by prefixing with an ampersand (&) and suffixing with a semicolon (;).
You can use HTML numeric character references:
<div>decimal: &#128512;</div>
<div>hex: &#x1F600;</div>
And here is a good reference for emoji HTML codes.
Disclaimer: I have no affiliation with the website
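If you need these numeric references for arbitrary text, a sketch like the following can generate them (the function toHtmlReferences is hypothetical). It is not a full HTML escaper: ASCII characters, including <, > and &, are passed through unchanged:

```javascript
// Convert every non-ASCII character to an HTML numeric character reference.
// for...of walks the string by code point, so a surrogate pair such as
// "\ud83d\ude00" becomes one reference, not two.
function toHtmlReferences(text, hex = false) {
  let out = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp < 0x80) {
      out += ch; // ASCII left as-is (note: <, > and & are NOT escaped)
    } else {
      out += hex ? `&#x${cp.toString(16).toUpperCase()};` : `&#${cp};`;
    }
  }
  return out;
}

console.log(toHtmlReferences("\ud83d\ude00"));       // &#128512;
console.log(toHtmlReferences("\ud83d\ude00", true)); // &#x1F600;
```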

The most obvious reason is that \ud83d\ude00 is not an escape sequence in HTML text.
Using HTML character references instead, it should work:
\ud83d\ude00
&#x1F600;
&#128512;

unicode chars give "unterminated string literal" in js

This error occurs when my HTML contains some odd characters that look like whitespace.
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title></title>
</head>
<body>
<p>Some  Text</p>
</body>
</html>
Note that there is a character between Some and Text, but it is invisible here. I need to pass this to the function toJson(), but it fails with the error "unterminated string literal".
Everything works fine when I use plain text instead, for example Some<space>Text.
I've tried all the str_replace-style approaches I found while searching for this:
1) var re = /(?![\x00-\x7F]|[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF7][\x80-\xBF]{3})./g;
params.body_html = html.replace(re, '');
angular.toJson(params); // gives error
2) params.body_html.replace(/\uFFFD/g, '');
angular.toJson(params); // gives error
I don't know what this character is (maybe a Unicode character). When I copy it into an Emacs file, it shows up as �򠠨.
Note: you can see this character as a red dot if you edit this question and open the snippet editor for the HTML above.
Any hints or ideas on how I can make this work?

Got this working with:
params.body_html = params.body_html.replace(/\u2028/g, '');
angular.toJson(params); //works fine.
Thanks to @Gothdo for providing the character link.
The problem is that this only handles this one particular Unicode character. Is there a function that replaces or trims all such Unicode characters?
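Some background on why U+2028 in particular causes trouble: before ES2019, U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR) counted as line terminators in JavaScript source, so they could not appear raw inside a string literal, even though JSON allows them. Where exactly the error is thrown isn't shown in the question, so treating that as the cause is an assumption. Rather than deleting the characters, one option is to escape both in the JSON text; a sketch with a hypothetical toSafeJson helper:

```javascript
// JSON.stringify leaves U+2028/U+2029 raw; replace them with \u escapes
// so the JSON text is also safe inside older JavaScript string literals.
function toSafeJson(value) {
  return JSON.stringify(value).replace(/[\u2028\u2029]/g, (ch) =>
    ch === "\u2028" ? "\\u2028" : "\\u2029"
  );
}

const params = { body_html: "<p>Some\u2028Text</p>" };
const json = toSafeJson(params);
console.log(json); // {"body_html":"<p>Some\u2028Text</p>"}
console.log(JSON.parse(json).body_html === params.body_html); // true
```

The escaped form decodes back to the original character, so nothing is lost, unlike stripping all non-ASCII characters, which would also remove legitimate text.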

Why can't I get this entity code to display correctly in a browser?

I'm trying to write a UK pound symbol to a document with JavaScript, but its entity code is not being translated and is instead displayed as entered.
See this JSBin http://jsbin.com/orocox/1/edit
This is the JavaScript:
$("#price").text('&pound; 1.99');
This is the html:
<!DOCTYPE html>
<html>
<head>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.8/jquery.min.js"></script>
<meta charset=utf-8 />
<title>JS Bin</title>
</head>
<body>
<span id="price"></span>
</body>
</html>
This is the result:
'&pound(;) 1.99'
*Note that the parentheses around the ';' were added by me to prevent the entity code from being translated on Stack Overflow; they do not exist in the actual output.
The result I want is:
'£ 1.99'.

Use a Unicode escape instead:
$("#price").text('\u00A3 1.99');
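Explanation: &pound; is an HTML entity, and .text() inserts its argument as plain text, so the entity is never decoded. A Unicode escape such as \u00A3, on the other hand, is resolved by JavaScript itself, so it works for any text. Since you are using .text(), the value is processed as a string, not as HTML.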
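A quick check in plain JavaScript (no jQuery needed) of the difference described above: the entity string is just ordinary characters, while the \u escape already is the pound sign by the time any function, .text() included, receives it:

```javascript
// To .text(), "&pound;" is seven ordinary characters; nothing decodes it.
const asEntity = "&pound; 1.99";
// "\u00A3" is decoded by the JavaScript parser before jQuery sees it.
const asEscape = "\u00A3 1.99";

console.log(asEntity.length);       // 12
console.log(asEscape.length);       // 6
console.log(asEscape === "£ 1.99"); // true
```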

Try $("#price").html('&pound; 1.99'); instead.

Use the character itself:
$("#price").text('£ 1.99');
This is good for the readability of your code. How you type “£” depends on your editing environment. E.g., on Windows, you can produce it by typing Alt 0163 if you cannot find any more convenient way (depending on keyboard, keyboard layout, and editor being used).

jQuery - .length() counts special chars as 2?

I have this problem when I use $('#id').val().length; it returns 2 when I use characters like æ, ø and å.
Can someone tell me why, and how I can get it to count them as one character?

I suspect something else is going on here and it is not an issue of encoding.
I refuse to believe this is a jQuery issue (see http://jsfiddle.net/KLzYf/ for my justification).
The following raw HTML will report back "1":
<html>
<head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
</head>
<body>
<input type="text" value="æ" id="test"/>
<script type="text/javascript">
alert(document.getElementById("test").value.length);
</script>
</body>
</html>
I'd be interested to see some of the HTML and other code, and to have you run a few tests. For instance, what do the following give you?
alert("æ".length); //=1?
alert('"' + $('#id').val() + '"'); //are there any spaces/other chars?
Also, if you view the source of the HTML, what do the contents of your input look like?
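Following the suggestion to check for extra characters, a small diagnostic sketch (the helper name describe is hypothetical) that lists each UTF-16 code unit of a string; in the page it would be called as describe($('#id').val()). Two possible causes of a length of 2 that it would reveal, offered as assumptions rather than a diagnosis of the asker's page: a decomposed å (a plus combining ring U+030A), and mojibake, where the UTF-8 bytes of æ are decoded as Latin-1 and become two characters:

```javascript
// List the UTF-16 code units of a string so invisible extras
// (spaces, combining marks, mis-decoded bytes) become visible.
function describe(str) {
  return Array.from({ length: str.length }, (_, i) =>
    "U+" + str.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0")
  ).join(" ");
}

console.log(describe("æ"));       // U+00E6
console.log(describe(" æ"));      // U+0020 U+00E6
console.log(describe("a\u030A")); // U+0061 U+030A (å as a + combining ring)
console.log(describe("Ã¦"));      // U+00C3 U+00A6 (UTF-8 bytes of æ read as Latin-1)

// A decomposed å has length 2; NFC normalization recombines it.
console.log("a\u030A".length, "a\u030A".normalize("NFC").length); // 2 1
```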

Try this:
http://jsfiddle.net/Innuendo108/GXwGG/
There are 2 characters and it says length=2

I think you have an extra space on one side or the other of that character.
<p id="t">æ</p>
$('#t').text().length
This works properly.

Develop Reference

JavaScript is the programming language of the Web.
