JavaScript: how to use a regular expression to remove blank lines from a string? - javascript

I need to use JavaScript to remove blank lines from an HTML text box. The blank lines can be anywhere in the textarea element. A blank line can be just a line break, or whitespace followed by a line break.
I am expecting a regular expression solution. Here are some I tried, but they do not work and I cannot figure out why:
/^\s*\r?\n/g
/^\s*\r?\n$/g
Edit 1
It appears that the solution (I modified it a little) suggested by aaronman and m.buettner works:
string.replace(/^\s*\n/gm, "")
Can someone tell me why my first regular expression does not work?
Edit 2
After reading all useful answers, I came up with this:
/^[\s\t]*(\r\n|\n|\r)/gm
Will this cover all situations?
Edit 3
This is the most concise one covering all spaces (white spaces, tabs) and platforms (Linux, Windows, Mac).
/^\s*[\r\n]/gm
Many thanks to m.buettner!

Your pattern seems alright, you just need to include the multiline modifier m, so that ^ and $ match line beginnings and endings as well:
/^\s*\n/gm
Without the m, the anchors only match string-beginnings and endings.
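A quick way to see the effect of the m flag, as a minimal sketch (the sample string and the two function names are made up for illustration):

```javascript
// Hypothetical sample: one blank line between "a" and "b".
const sample = "a\n\nb";

// Without m, ^ can only match at index 0, so nothing is removed.
function withoutMultiline(text) {
  return text.replace(/^\s*\n/g, "");
}

// With m, ^ also matches right after a line break, so the blank line is removed.
function withMultiline(text) {
  return text.replace(/^\s*\n/gm, "");
}
```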
Note that you miss out on old Mac-style line endings (only \r). This would help in that case:
/^\s*[\r\n]/gm
Also note that (in both cases) you don't need to match the optional \r in front of the \n explicitly, because that is taken care of by \s*.
As Dex pointed out in a comment, this will fail to clear the last line if it consists only of spaces (and there is no newline after it). A way to fix that would be to make the actual newline optional but include an end-of-line anchor before it. In this case you do have to match the line ending properly though:
/^\s*$(?:\r\n?|\n)?/gm
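If a solution that is not purely a regular expression is acceptable, a split-and-filter version sidesteps the line-ending details. This is only a sketch: removeBlankLines is a made-up name, and it assumes it is fine to normalize every line break in the output to \n. One thing worth checking for CRLF input: as far as I can tell, JavaScript's multiline ^ also matches between \r and \n, so anchored patterns like the ones above can leave a stray \r behind on such input.

```javascript
// Hypothetical helper: drops lines that are empty or contain only whitespace.
// Assumption: the result may use "\n" for all line breaks.
function removeBlankLines(text) {
  return text
    .split(/\r\n|\r|\n/)                   // CRLF first, so it counts as one break
    .filter((line) => line.trim() !== "")  // keep lines with visible content
    .join("\n");
}
```

Because the last line is filtered like any other, a final line made only of spaces is removed even without a trailing line break.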

I believe this will work
searchText.replace(/(^[ \t]*\n)/gm, "")

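This should do the trick, I think:
var el = document.getElementsByName("nameOfTextBox")[0];
// replace() returns a new string, so assign the result back to the element
el.value = el.value.replace(/(\r\n|\n|\r)/gm, "");
EDIT: Removes three types of line breaks. Note that this strips every line break, not only blank lines.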

Here's a reusable function that will trim each line's whitespace and remove any blank or space-only lines:
function trim_and_remove_blank_lines(string)
{
    return string.replace(/^(?=\n)$|^\s*|\s*$|\n\n+/gm, "");
}
Usage example:
trim_and_remove_blank_lines("Line 1 \nLine2\r\n\r\nLine4\n")
//Returns 'Line 1\nLine2\nLine4'

function removeEmptyLine(text) {
    return text.replace(/(\r?\n)\s*\1+/g, '$1');
}
test:
console.assert(removeEmptyLine('a\r\nb') === 'a\r\nb');
console.assert(removeEmptyLine('a\r\n\r\nb') === 'a\r\nb');
console.assert(removeEmptyLine('a\r\n \r\nb') === 'a\r\nb');
console.assert(removeEmptyLine('a\r\n \r\n \r\nb') === 'a\r\nb');
console.assert(removeEmptyLine('a\r\n \r\n 2\r\n \r\nb') === 'a\r\n 2\r\nb');
console.assert(removeEmptyLine('a\nb') === 'a\nb');
console.assert(removeEmptyLine('a\n\nb') === 'a\nb');
console.assert(removeEmptyLine('a\n \nb') === 'a\nb');
console.assert(removeEmptyLine('a\n \n \nb') === 'a\nb');
console.assert(removeEmptyLine('a\n \n2 \n \nb') === 'a\n2 \nb');

Related

Regex find string and replace that line and following lines

I am trying to find a regex to achieve the following criteria which I need to use in javascript.
Input file
some string is here and above this line
:62M:C111111EUR1211498,00
:20:0000/11111000000
:25:1111111111
:28C:00001/00002
:60M:C170926EUR1211498,06
:61:1710050926C167,XXNCHKXXXXX 11111//111111/111111
Output has to be
some string is here and above this line
:61:1710050926C167,XXNCHKXXXXX 11111//111111/111111
Briefly, find :62M: and then delete the line starting with :62M: together with the following lines starting with :20:, :25:, :28C: and :60M:.
Or, find :62M: and replace (and delete) until the line starting with :61:.
Each line has fixed length of 80 characters followed by newline (CR LF).
Is this really possible with regex?
I know how to find a string and replace the line that contains it. But here multiple lines have to be removed, which is quite hard for me.
Could someone please help me out, if this is possible with regex?
Here it is. First I'm finding the text to delete using a regex (note that I'm using [^]* instead of .* to match all the lines, because [^] also matches newlines). Then I'm replacing it with a newline.
var regex = /:62M:.*([^]*):61:.*/;
var text = `some string is here and above this line
:62M:C111111EUR1211498,00
:20:0000/11111000000
:25:1111111111
:28C:00001/00002
:60M:C170926EUR1211498,06
:61:1710050926C167,XXNCHKXXXXX 11111//111111/111111`;
var textToDelete = regex.exec(text)[1];
var result = text.replace(textToDelete, '\n');
console.log(result);
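Note that the snippet above captures only the text after the :62M: line, so that line itself stays in the result. A possible variant that removes the :62M: line as well is sketched below. It assumes the block always ends right before a line starting with :61:, and removeBlock is a hypothetical name:

```javascript
// Hypothetical helper: deletes everything from the line starting with ":62M:"
// up to (but not including) the next line starting with ":61:".
// Assumption: a ":61:" line follows; if none is found, the text is returned unchanged.
function removeBlock(text) {
  // m makes ^ match at line starts; [\s\S]*? lazily spans line breaks;
  // the lookahead stops the match just before the ":61:" line.
  return text.replace(/^:62M:[\s\S]*?(?=^:61:)/m, "");
}
```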

Regex to match only when certain characters follow a string

I need to find a string that contains "script" with any number of characters before or after it, enclosed in < and >. I can do this with: <*script.*>
I also want to match only when that string is NOT followed by a <
The closest I've come, so far, is with this: (<*script.*>)([^=?<*]*)$
However, that will fail for something like <script></script> because the last > isn't followed by a < (so it doesn't match).
How can I check whether only the first > is followed by < or not?
For example,
<script> abc () ; </script> MATCH
<< ScriPT >abc (”XXX”);//<</ ScriPT > MATCH
<script></script> DON'T MATCH
And, a case that I still am working on:
<script/script> DON'T MATCH
Thanks!
You were close with your Regex. You just needed to make your first query non-greedy using a ? after the second *. Try this out:
(?i)<*\s*script.*?>[^<]+<*[^>]+>
There is an app called Expresso that really helps with designing Regex strings. Give it a shot.
Explanation: Without the ? non-greedy argument, your second * before the first > makes the search go all the way to the end of the string and grab the > at the end right at that point. None of the other stuff in your query was even being looked at.
EDIT: Added (?i) at the beginning for case-insensitivity. If you want a javascript specific case-insensitive regex, you would do that like this:
/<*\s*script.*?>[^<]+<*[^>]+>/i
I noticed you have parentheses in your regex to make groups, but you didn't specifically say you were trying to capture groups. Do you want to capture what's between the <script> and </script>? If so, that would be:
/<*\s*script.*?>([^<]+)<*[^>]+>/i
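The greedy versus non-greedy behaviour described in this answer can be checked with a small sketch (the sample string and function names are illustrative only):

```javascript
// Hypothetical sample input.
const html = "<script>a</script>";

// Greedy: .* runs to the end of the string and backs off only to the last ">".
function firstTagGreedy(text) {
  const m = text.match(/<.*>/);
  return m ? m[0] : null;
}

// Non-greedy: .*? stops at the first ">" it can reach.
function firstTagLazy(text) {
  const m = text.match(/<.*?>/);
  return m ? m[0] : null;
}
```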
If I understand what you are looking for give this a try:
regex = "<\s*script\s*>([^<]+)<"
Here is an example in Python:
import re
textlist = ["<script>show this</script>", "<script></script>"]
regex = r"<\s*script\s*>([^<]+)"  # raw string so \s reaches the regex engine unchanged
for text in textlist:
    thematch = re.search(regex, text, re.IGNORECASE)
    if thematch:
        print("match found:")
        print(thematch.group(1))
    else:
        print("no match sir!")
Explanation:
start with < then possible spaces, the word script, possible spaces, a >
then capture all (at least 1) non < and make sure that's followed by a <
Hope that helps!
This would be better solved by using the substring() and/or indexOf() JavaScript methods.
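A rough sketch of what an indexOf()-based check might look like, under the assumption that "match" means: the first > after the word script exists, is not the last character, and is not immediately followed by <. The name scriptHasBody is hypothetical:

```javascript
// Hypothetical helper: true when the first ">" after "script" is followed
// by something other than "<" (case-insensitive search for "script").
function scriptHasBody(input) {
  const lower = input.toLowerCase();
  const start = lower.indexOf("script");
  if (start === -1) return false;                          // no "script" at all
  if (lower.lastIndexOf("<", start) === -1) return false;  // no "<" before it
  const close = input.indexOf(">", start);
  if (close === -1 || close === input.length - 1) return false; // nothing after ">"
  return input.charAt(close + 1) !== "<";
}
```

With the examples from the question, this returns true for the first two strings and false for <script></script> and <script/script>.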

Match only the line which end with specific char

How can I match the lines that do not end with a dot (full stop/period), in order to add it afterwards?
Someword someword someword.
Someword someword someword
Someword someword someword.
These are my unsuccessful attempts:
.+(?=\.)
.+[^.]
--- update
This works for me:
.+\w+(?:\n)
https://regex101.com/r/sR0aD7/1
The following should match a string that ends with anything but dot: [^.]$ - "anything but dot" and end-of-text marker.
How to match the line which does not contain the final dot (full stop/period),
You can use a negative lookahead, anchored at the start of the line, like this:
/^(?!.*\.$).+$/gm
Or else you can invert the test:
if (!/\.$/.test(input)) { console.log("line is not ending with dot"); }
A regular expression is one way; I think you can also use this method:
function lastCharacter(sentence){
    var length = sentence.length;
    return sentence.charAt(length-1);
}
Example :-
Input ---> Hey JavaScript is damm good.
Use ---> lastCharacter('Hey JavaScript is damm good.');
Output ---> '.'
You can then compare the result with '.' in an if condition.
Just use something like this: [^\.]$
$ - Indicates end of line.
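[^...] - A negated character class: it matches any single character that is not listed inside the brackets.
\. - This is the escaped "." character. Outside a character class it must be escaped because . matches any character; inside one the backslash is optional.
Pulling this together, you get the regular expression .+[^\.]$, which will match your line. You will need the m (multiline) flag if the input contains several lines, so that $ matches at the end of each line.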
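Since the goal in the question is to add the missing dot afterwards, the same idea can be combined with replace(). This is a sketch only: addMissingDots is a made-up name, and excluding whitespace from the last character is my own assumption so that empty lines and \r\n line breaks are left alone.

```javascript
// Hypothetical helper: appends "." to every line whose last character
// is neither a dot nor whitespace.
function addMissingDots(text) {
  // g: every line, m: ^ and $ work per line; $1 is the captured line.
  return text.replace(/^(.*[^.\s])$/gm, "$1.");
}
```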

Remove line breaks from start and end of string

I noticed that trim() does not remove new line characters from the start and end of a string, so I am trying to accomplish this with the following regex:
return str.replace(/^\s\n+|\s\n+$/g,'');
This does not remove the new lines, and I fear I am out of my depth here.
EDIT
The string is being generated with ejs like so
go = ejs.render(data, {
    locals: {
        format() {
            //
        }
    }
});
And this is what go is, but with a few empty lines before. When I use go.trim() I still get the new lines before.
<?xml version="1.0"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
<fo:layout-master-set>
<fo:simple-page-master master-name="Out" page-width="8.5in" page-height="11in" margin-top="1in" margin-bottom="0.5in" margin-left="0.75in" margin-right="0.75in">
<fo:region-body margin-top="1in" margin-bottom="0.25in"/>
<fo:region-before extent="1in"/>
<fo:region-after extent="0.25in"/>
<fo:region-start extent="0in"/>
<fo:region-end extent="0in"/>
</fo:simple-page-master>
</fo:layout-master-set>
<fo:page-sequence master-reference="Out" initial-page-number="1" force-page-count="no-force">
<fo:static-content flow-name="xsl-region-before">
<fo:block font-size="14pt" text-align="center">ONLINE APPLICATION FOR SUMMARY ADVICE</fo:block>
<fo:block font-size="13pt" font-weight="bold" text-align="center">Re:
SDF, SDF
</fo:block>
</fo:static-content>
<fo:flow flow-name="xsl-region-body" font="10pt Helvetica">
.. removed this content
</fo:flow>
</fo:page-sequence>
</fo:root>
Try this:
str = str.replace(/^\s+|\s+$/g, '');
String.trim() does in fact remove newlines (and all other whitespace). Maybe it didn't use to? It definitely does at the time of writing. From the linked documentation (emphasis added):
The trim() method removes whitespace from both ends of a string. Whitespace in this context is all the whitespace characters (space, tab, no-break space, etc.) and all the line terminator characters (LF, CR, etc.).
If you want to trim all newlines plus other potential whitespace, you can use the following:
return str.trim();
If you want to only trim newlines, you can use a solution that targets newlines specifically.
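One way to target newlines specifically, as a minimal sketch (trimNewlines is a hypothetical name; it treats \r and \n as newline characters and leaves spaces and tabs in place):

```javascript
// Hypothetical helper: strips only leading and trailing \r / \n characters.
function trimNewlines(str) {
  // No m flag here, so ^ and $ refer to the start and end of the whole string.
  return str.replace(/^[\r\n]+|[\r\n]+$/g, "");
}
```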
/^\s+|\s+$/g should catch anything. Your current regex may have the problem that if your linebreaks contain \r characters they wouldn't be matched.
Try this (note that it removes every \n in the string, not only those at the start and end):
str.split('\n').join('');

jQuery's $.trim(), bug or poorly written?

$.trim() uses the following RegExp to trim a string:
/^(\s|\u00A0)+|(\s|\u00A0)+$/g
As it turns out, this can get pretty ugly. Example:
var mystr = ' some test -- more text new test xxx';
mystr = mystr.replace(/^(\s|\u00A0)+|(\s|\u00A0)+$/g, "");
This code hangs Firefox and Chrome; it seems to take forever. "mystr" contains whitespace but mostly hex 160 (A0) characters. The problem only occurs if the whitespace/A0 run is not at the start of the string but somewhere within it. I have no clue why this happens.
This expression:
/^[\n\r\t \xA0]+|[\n\r\t \xA0]+$/g
works fine in all tested scenarios. Is there maybe a better pattern for that?
Source: http://code.jquery.com/jquery-1.4.2.js
UPDATE
It looks like you can't copy and paste this example string; at some point those A0 characters get replaced. The Firebug console will also replace the characters on pasting, so you have to create your own string in a separate HTML file/editor to test this.
This is a known bug, as said in comments, and Crescent is right that it's this way in 1.4.2, but it's already fixed for the next release.
You can test the speed of String.prototype.trim on your string here: http://jsfiddle.net/dLLVN/
I get around 79ms in Chrome and 117ms in Firefox for a million runs... so this will fix the hanging issue :)
As for the fix, take a look at the current source that'll be in 1.4.3, the native trimming is now used.
There were 2 commits in March for this:
http://github.com/jquery/jquery/commit/141ad3c3e21e7734e67e37b5fb39782fe11b3c18
http://github.com/jquery/jquery/commit/ba8938d444b9a49bdfb27213826ba108145c2e50
1.4.2 $.trim() function:
trim: function( text ) {
    return (text || "").replace( rtrim, "" );
},
1.4.3 $.trim() function:
//earlier:
trim = String.prototype.trim
//new trim here
trim: trim ?
    function( text ) {
        return text == null ?
            "" :
            trim.call( text );
    } :
    // Otherwise use our own trimming functionality
    function( text ) {
        return text == null ?
            "" :
            text.toString().replace( trimLeft, "" ).replace( trimRight, "" );
    }
The trimLeft and trimRight vary, depending on whether you're in IE or not, like this:
trimLeft = /^\s+/,
trimRight = /\s+$/,
// Verify that \s matches non-breaking spaces
// (IE fails on this test)
if ( !/\s/.test( "\xA0" ) ) {
    trimLeft = /^[\s\xA0]+/;
    trimRight = /[\s\xA0]+$/;
}
Normally an expression like ^\s+|\s+$ should be enough for trimming, since \s is supposed to match all space characters, even \xA0 non-breaking spaces [1]. This expression should run without causing any problems.
Now, some browser that jQuery wants to support probably doesn't match \xA0 with \s, and to work around this problem jQuery added the alternative (\s|\xA0), to trim away non-breaking spaces on that browser too.
With this change, the second part of the regex looks like (\s|\xA0)+$, which leads to problems in browsers where \xA0 is also matched by \s. In a string containing a long sequence of \xA0 characters, each character can be matched by either \s or \xA0, leading to lots of alternative matches and exponentially many ways in which they can be combined. If this sequence of \xA0 characters is not at the end of the string, the trailing $ condition can never be fulfilled, no matter which characters are matched by \s and which by \xA0, but the browser doesn't know this and tries all combinations, potentially searching for a very long time.
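To illustrate the difference, here is a sketch of a trim that uses a character class instead of the (\s|\xA0) alternation, similar in spirit to the trimLeft/trimRight patterns quoted above. The name trimWithClass is hypothetical. Inside a character class each character can match in only one way, so the engine has no exponential set of combinations to try when $ fails.

```javascript
// Hypothetical helper: trims whitespace and non-breaking spaces from both ends.
// [\s\u00A0] is a single character class, so there is no ambiguity between
// "matched by \s" and "matched by \u00A0" for the engine to explore.
function trimWithClass(text) {
  return String(text)
    .replace(/^[\s\u00A0]+/, "")
    .replace(/[\s\u00A0]+$/, "");
}
```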
The simplified expression you suggest will not be sufficient, since \s is supposed to match all Unicode space characters, not just the well-known ASCII ones.
[1] According to MDC, \s is equivalent to [\t\n\v\f\r \u00a0\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u3000]
As it turned out, this behavior was posted on jQuery's bug tracker one month ago:
http://dev.jquery.com/ticket/6605
Thanks to Andrew for pointing me to that.
