Remove multiple line breaks (\n) in JavaScript - javascript

We have an onboarding form for new employees with multiple newlines (4-5 between lines) that need to be stripped. I want to get rid of the extra newlines but still space out the blocks with one \n.
example:
New employee<br/>
John Doe
Employee Number<br/>
1234
I'm currently using text = text.replace(/(\r\n|\r|\n)+/g, '$1'); but that gets rid of all newlines without spacing.

text = text.replace(/(\r\n|\r|\n){2,}/g, '$1\n');
Use this; it collapses any run of two or more consecutive newlines.
Update
At the OP's specific request, I have edited the answer a bit.
text = text.replace(/(\r\n|\r|\n){2}/g, '$1').replace(/(\r\n|\r|\n){3,}/g, '$1\n');

We can tidy up the regex as follows:
text = text.replace(/[\r\n]{2,}/g, "\n");
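One detail to keep in mind with the patterns above: a Windows line break (\r\n) is two characters, so [\r\n]{2,} also matches a single \r\n. A minimal sketch that sidesteps this by normalizing line endings first is shown here; the function name collapseBlankLines and its keepBlankLine parameter are made up for illustration and are not from the answers.

```javascript
// Hypothetical helper: normalize line endings, then collapse runs of newlines.
function collapseBlankLines(text, keepBlankLine = false) {
  // Turn "\r\n" and lone "\r" into "\n" so every line break is one character.
  const normalized = text.replace(/\r\n?/g, "\n");
  // Collapse runs of two or more newlines into one (or into one blank line).
  return normalized.replace(/\n{2,}/g, keepBlankLine ? "\n\n" : "\n");
}

const sample = "New employee\r\nJohn Doe\r\n\r\n\r\n\r\nEmployee Number\n\n\n1234";
console.log(JSON.stringify(collapseBlankLines(sample)));
console.log(JSON.stringify(collapseBlankLines(sample, true)));
```

Passing true as the second argument keeps one empty line between blocks, which is closer to what the updated answer aims for.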

Related

Why do I need to replace \n with \n?

I have a line of data like this:
1•#00DDDD•deeppink•1•100•true•25•100•Random\nTopics•1,2,3,0•false
in a text file.
Specifically, for my "problem", I am using Random\nTopics as a piece of text data, and I then search for '\n', and split the message up into two lines based on the placement of '\n'.
It is stored in blockObj.msg, and I search for it using blockObj.msg.split('\n'), but I kept getting an array of 1 (no splits). I thought I was doing something fundamentally wrong and spent over an hour troubleshooting, until on a whim, I tried
blockObj.msg = blockObj.msg.replace(/\\n/g, "\n")
and that seemed to solve the problem. Any ideas as to why this is needed? My solution works, but I am clueless as to why, and would like to understand better so I don't need to spend so long searching for an answer as bizarre as this.
I have a similar error when reading "text" from an input text field. If I type a '\n' in the box, the split will not find it, but using a replace works (the replace seems pointless, but apparently isn't...)
obj.msg = document.getElementById('textTextField').value.replace(/\\n/g, "\n")
Sorry if this is jumbled, long time user of reading for solutions, first time posting a question. Thank you for your time and patience!
P.S. If possible... is there a way to do the opposite? Replace a real "\n" with a fake "\n"? (I would like to have my dynamically generated data file to have a "\n" instead of a new line)
It is stored in blockObj.msg, and I search for it using blockObj.msg.split('\n'),
In a JavaScript string literal, \n is an escape sequence representing a new line, so you are splitting the data on new lines.
The data you have doesn't have new lines in it though. It has backslash characters followed by n characters. They are data, not escape sequences.
Your call to replace (blockObj.msg = blockObj.msg.replace(/\\n/g, "\n")) works around this by replacing each backslash-and-n pair with a new line.
That's an overcomplicated approach though. You can match the characters you have directly. blockObj.msg.split('\\n')
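A small sketch of the difference, which also covers the reverse direction asked about in the P.S. The variable names are illustrative only, and the sample string stands in for the value read from the file.

```javascript
// The data as read from the file: a backslash followed by "n", not a line break.
const msg = "Random\\nTopics";

// "\n" in a string literal is a real newline, which the data does not contain.
const byNewline = msg.split("\n");
// "\\n" is a backslash followed by "n", which is what the data does contain.
const byLiteral = msg.split("\\n");

console.log(byNewline.length, byLiteral.length);

// The reverse: turn real line breaks into the two characters backslash + "n".
const multiLine = "Random\nTopics";
const escaped = multiLine.replace(/\n/g, "\\n");
console.log(escaped);
```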
in your text file
1•#00DDDD•deeppink•1•100•true•25•100•Random\nTopics•1,2,3,0•false
means that the characters \ and n are stored literally, as two separate characters. To insert a real newline character by replacement, you therefore have to search for the \ and n character pair.
obj.msg = document.getElementById('textTextField').value.replace(/\\n/g, "\n")
When you do the replace(/\\n/g, "\n"), you are searching for \\n, which is the escaped form of the pattern. The replace must find every occurrence of the two characters \n, and to search for a literal backslash you need to escape it first as \\.
EDIT
/\\n/g is the regex and "\n" is the replacement value. A regex literal has the form /REGEXSTUFFHERE/flags: the last / is followed by the regex flags, so the g in /g makes it a global search.
regex resources
test regex online
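To make the escaping levels concrete, here is a short sketch comparing a regex literal with the RegExp constructor. The sample strings are made up for illustration.

```javascript
// Data containing a literal backslash followed by "n" (two characters).
const raw = "Random\\nTopics";

// Regex literal: \\ matches one backslash, followed by the letter n.
const viaLiteral = raw.replace(/\\n/g, "\n");

// RegExp constructor: the string literal is unescaped first, so each
// backslash in the pattern has to be written twice.
const viaConstructor = raw.replace(new RegExp("\\\\n", "g"), "\n");

// Without the g flag, only the first occurrence is replaced.
const firstOnly = "a\\nb\\nc".replace(/\\n/, "\n");

console.log(viaLiteral === viaConstructor);
console.log(JSON.stringify(firstOnly));
```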

How to add '>' to every new line in a string in javascript?

I have a text area on a UI and I need the user to type in Markdown. I need to make sure that each line they type will start with > as I want to view everything they typed as a blockquote when they preview it.
So for example if they type in:
> some text user <b>typed</b>
another line
When the markdown is rendered, only the first line is a blockquote. The rest is plain text outside the blockquote.
Is there a way I can check each line and add the > if it is missing.
Things I have tried:
I tried removing all > characters and replacing each \n with a \n>. This however messed up the markdown as the user can also type in <b>bold text</b>.
I have a loop that checks for the > character after every new line. I just don't know how to insert the > if it's missing.
Loop code:
var match = /\r|\n/.exec(theString);
if (match) {
  if (theString.charAt(match.index) != '>') {
    // don't know how to add the character
  }
}
I also thought that maybe I could enforce the > in the textarea, but that research got me nowhere. As in, I don't think that is possible.
I also thought, what if the user types multiple >>>>. At that stage I was thinking about it too much and said I'd leave out cases like that as maybe that is the user's intention.
If anyone has any suggestions and/or alternative solutions it would be very much appreciated. Thank you :)
You can use a regular expression to insert > to the beginning of each line, if it doesn't exist:
const input = `> some text user <b>typed</b>
another line
another line 2
> another line 3`;
const output = input.replace(/^(?!>)/gm, '> ');
console.log(output);
The pattern ^(?!>) means: match the beginning of a line, which is not followed by >.
If you only want to insert >s where lines have text already, then also lookahead for non-whitespace in the line:
const input = `> some text user <b>typed</b>
another line
another line 2
> another line 3`;
const output = input.replace(/^(?!>)(?=[^\n]*\S)/gm, '> ');
console.log(output);
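For comparison, a loop-style version of the same idea, closer to what the question attempted, might look like the sketch below. The function name ensureBlockquote is hypothetical, and the sketch makes the simplifying assumption that any line already starting with > is left alone.

```javascript
// Hypothetical helper: prefix every line that does not already start with ">".
function ensureBlockquote(text) {
  return text
    .split(/\r\n|\r|\n/)                       // handle any line-ending style
    .map((line) => (line.startsWith(">") ? line : "> " + line))
    .join("\n");
}

const typed = "> some text user <b>typed</b>\nanother line\n> another line 3";
console.log(ensureBlockquote(typed));
```

Because only the first character of each line is inspected, inline HTML such as <b>bold text</b> is not touched.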
I'd go with replace (first thing you tried). In order to insert literal > in HTML, you have to escape it.
Just replace \n with \n&gt; (the escaped form of >) and you're all set.

How to detect sentences without comments and markdown using Javascript regex?

Problem
I have a piece of text. It can contain every character from ASCII 32 (space) to ASCII 126 (tilde) and including ASCII 9 (horizontal tab).
The text may contain sentences. Every sentence ends with dot, question mark or exclamation mark, directly followed by space.
The text may contain a basic markdown styling, that is: bold text (**, also __), italic text (*, also _) and strikethrough (~~). Markdown may occur inside sentences (e.g. **this** is a sentence.) or outside them (e.g. **this is a sentence!**). Markdown may not occur across sentences, that is, there may not be a situation like this: **sentence. sente** nce.. Markdown may include more than one sentence, that is, there may be a situation like this: **sentence. sentence.**.
It can also contain two sequences of characters: <!-- and -->. Everything between these sequences is treated as a comment (like in HTML). Comments can occur at every position in the text, but cannot contain newline characters (I hope that on Linux it is just ASCII 10).
I want to detect in Javascript all sentences, and for each of them put its length after this sentence in a comment, like this: sentence.<!-- 9 -->. Mainly, I do not care if their length includes the length of the markdown tags or not, but it would be nice if it does not.
What have I done so far?
So far, with help of this answer, I have prepared the following regex for detecting sentences. It mostly fits my needs – except that it includes comments.
const basicSentence = /(?:^|\n| )(?:[^.!?]|[.!?][^ *_~\n])+[.!?]/gi;
I have also prepared the following regex for detecting comments. It also works as expected, at least in my own tests.
const comment = /<!--.*?-->/gi;
Example
To better see what I want to achieve, let us have an example. Say, I have the following piece of text:
foo0
b<!-- comment -->ar.
foo1 bar?
<!-- comment -->
foo2bar!
(There is also a newline at the end of it, but I do not know how to add an empty line in Stackoverflow markdown.)
And the expected result is:
foo0
b<!-- comment -->ar.<!-- 10 -->
foo1 bar?<!-- 9 -->
<!-- comment -->
foo2bar!<!-- 12 -->
(This time, there is also no newline at the end.)
UPDATE: Sorry, I have corrected the expected result in the example.
Pass a callback to .replace that replaces all comments with the empty string, and then returns the length of the resulting trimmed match:
const input = `foo0
b<!-- comment -->ar.
foo1 bar?
<!-- comment -->
foo2bar!
`;
const output = input.replace(
  /(?:^|\n| )(?:[^.!?]|[.!?][^ *_~\n])+[.!?]/g,
  (match) => {
    const matchWithoutComments = match.replace(/<!--.*?-->/g, '');
    return `${match}<!-- ${matchWithoutComments.length} -->`;
  }
);
console.log(output);
Of course, you can use a similar pattern to replace markdown notation with the inner text content as well, if you wish:
.replace(/([*_]{1,2}|~~)((.|\n)*?)\1/g, '$2')
(due to nested and possibly unbalanced tags, which regex is not very good at working with, you may have to repeat that line until no further replacements can be found)
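The "repeat until no further replacements" step could be sketched as a loop like the one below. The function name stripMarkdown is hypothetical. The pattern is the one from the line above, and the sketch assumes every *, _ or ~~ pair in the text really is markdown (a stray "a * b * c" would be treated as italics).

```javascript
// Hypothetical helper: strip bold/italic/strikethrough markers, repeating the
// replacement until the text stops changing so nested markers are unwrapped.
function stripMarkdown(text) {
  const pattern = /([*_]{1,2}|~~)((.|\n)*?)\1/g;
  let previous;
  let current = text;
  do {
    previous = current;
    current = current.replace(pattern, "$2");
  } while (current !== previous);
  return current;
}

console.log(stripMarkdown("**bold** and _it_ and ~~gone~~"));
console.log(stripMarkdown("**_nested_**"));
```

Each successful replacement removes at least two characters, so the loop always terminates.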
Also, per comment, your current regular expression is expecting every sentence to end in ., !, or ?. The comment's ! in <!-- is treated as the end of a (short) sentence. One option would be to lookahead for whitespace (a space, or a newline) or the end of the input at the very end of the regex:
const input = `foo0
b<!-- comment -->ar.
foo1 bar?
<!-- comment -->
foo2bar!
<!-- comment -->`;
const output = input.replace(
  /(?:^|\n| )(?:[^.!?]|[.!?][^ *_~\n])+[.!?](?=\s|$|[*_~])/g,
  (match) => {
    const matchWithoutComments = match.replace(/<!--.*?-->/g, '');
    return `${match}<!-- ${matchWithoutComments.length} -->`;
  }
);
console.log(output);
https://regex101.com/r/RaTIOi/1

Extract specific text in between 2 strings

Assume we have text such as the following.
Title: (some text)
My Title [abc]
Content: (some test)
My long content paragraph. With multiple sentences. [abc]
Short Content: (some text)
Short content [abc]
Using Javascript and RegEx, is it possible to extract the text so that it would be as follows.
Title: My Title
Content: My long content paragraph. With multiple sentences.
Short Content: Short content
Basically ignoring new lines and text in the () and [] brackets?
I've tried to use Regex but I can't get it to do exactly what I'd like. I'm also running into the issue that when I match Content: I'm getting a match for both Content: and Short Content:, whereas I want to match only the occurrence that is an exact match.
EDIT:
I'm new to RegEx. So far to extract the titles such as Title:, Content: and so on I have
/[A-Za-z]+:|[A-Za-z]+ [A-Za-z]+:|[A-Za-z]+ [A-Za-z]+ [A-Za-z]+:|[A-Za-z]+ [A-Za-z]+ [0-9]+:/g
And then I loop through and use this
[TITLENAME]:.*\n.*
I'm struggling to get past this. My next step would be to loop through the text that is matched above and then remove the bracket stuff. I'm sure there is a better way to do this!
You could use String.replace( /(\(|\)|\[|\])/g , '')
If you take a string and use the replace method with these two arguments, it will return a string with the ()[] characters removed. I have escaped them all with \ since they are special characters in regex. It might be a little overzealous.
Also g makes the regular expression global so it will remove all instances
If the text within the parentheses and brackets (e.g. 'abc') is fixed and has a special meaning, you can also go with: '/(\(some text\)\n|\(some test\)\n|(\[abc\]))|(^$\n)/gm'.
This way you would allow parentheses in the real text that you want to preserve, e.g. some text (this I want to preserve) and other text.
Please note the multiline m flag.
https://regex101.com/r/cS3pRR/1
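Another way to get the exact output from the question is to match each label line together with the line that follows it. This is only a sketch under several assumptions: every field is a "Label: (note)" line followed by exactly one value line, lines end in \n, and the bracketed tag sits at the end of the value line. The names fieldPattern and extractFields are made up for illustration.

```javascript
// Label up to the colon, an optional "(...)" note, then the next line as the value.
const fieldPattern = /^([^:\n]+):[ \t]*(?:\([^)\n]*\))?[ \t]*\n(.*)$/gm;

function extractFields(text) {
  const lines = [];
  for (const match of text.matchAll(fieldPattern)) {
    const label = match[1].trim();
    // Drop a trailing "[...]" tag from the value line.
    const value = match[2].replace(/\s*\[[^\]]*\]\s*$/, "").trim();
    lines.push(`${label}: ${value}`);
  }
  return lines.join("\n");
}

const sampleText = [
  "Title: (some text)",
  "My Title [abc]",
  "Content: (some test)",
  "My long content paragraph. With multiple sentences. [abc]",
  "Short Content: (some text)",
  "Short content [abc]",
].join("\n");

console.log(extractFields(sampleText));
```

Because the label is anchored to the start of a line with ^ and the m flag, Content: and Short Content: are captured as separate labels rather than one matching inside the other.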

Converting special characters in c# string to html special characters

I am using .NET:
fulltext = File.ReadAllText(@location);
to read the text content of any file at a given location.
I got result as:
fulltext="# vdk10.syx\t1.1 - 3/15/94\r# #(#)Copyright (C) 1987-1993 Verity, Inc.\r#\r# Synonym-list Database Descriptor\r#\r$control: 1\rdescriptor:\r{\r data-table: syu\r {\r worm:\tTHDSTAMP\t\tdate\r worm:\tQPARSER\t\t\ttext\r\t/_hexdata = yes\r varwidth:\tWORD\t\tsyv\r fixwidth:\tEXPLEN\t\t\t2 unsigned-integer\r varwidth:\tEXPLIST\t\tsyx\r\t/_hexdata = yes\r }\r data-table: syw\r {\r varwidth:\tSYNONYMS\tsyz\r }\r}\r\r ";
Now, I want this fulltext to be displayed in an HTML page so that special characters are rendered properly. For example, \r should be replaced by a line break tag so that the text is properly formatted when the page is displayed.
Is there any .NET class to do this? I am looking for a universal method since I am reading a file and it can contain any special characters. Thanks in advance for any help or direction.
You're trying to solve two problems:
1. Ensure special characters are properly encoded
2. Pretty-print your text
Solve them in this order:
1. First, encode the text, by importing the System.Web namespace and using HttpUtility (asked on StackOverflow). Use the result in step 2.
2. Pretty-printing is trickier, depending on the amount of pretty-printing that you want. Here are a few approaches, in increasing order of difficulty:
   - Put the text in a pre element. This should preserve newlines, tabs, spaces. You can still adjust the font using CSS if you first slap a CSS class on the pre.
   - Replace all \r\n, then any remaining \r and \n, with <br/>.
   - Study the structure of your text, parse it according to this structure, and provide specific tags in specific contexts. For example, the tab characters in your example may be indicative of a list of items. HTML provides the ol and ul elements for lists. Similarly, consecutive line breaks may indicate paragraphs, for which HTML provides the well-known p element.
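The encode-then-replace order can be sketched as follows. The sketch is in JavaScript to match the client-side code used elsewhere on this page rather than the .NET HttpUtility route the answer points to, and the helper names escapeHtml and toHtmlWithBreaks are hypothetical. Tabs are left as they are here, so they would still need a pre element or CSS white-space handling to show up.

```javascript
// Step 1: encode characters that have a special meaning in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")   // must come first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Step 2: turn every line break (\r\n, \r or \n) into a <br/> tag.
function toHtmlWithBreaks(text) {
  return escapeHtml(text).replace(/\r\n|\r|\n/g, "<br/>");
}

console.log(toHtmlWithBreaks("a<b\r# x & y\r\nz"));
```

Doing the steps in the other order would escape the inserted <br/> tags themselves, which is why encoding comes first.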
Thanks, everyone, for your valuable comments. I solved my formatting problem on the client side with the following code.
document.getElementById('textView').innerText = fulltext;
Here textView is the div where I want to display my fulltext. I don't think I need to replace special characters in the string fulltext. The output is as shown in the figure.
