Replacing carriage returns (?) after compiling HTML - javascript

After parsing HTML I get the following object:
I would like to strip all the "↵" except one. How can I do this? I tried something like this:
weirdString.replace(/(\r\n|\n|\r)/gm, "");
However, this replaces all the "↵", but as I've already mentioned, I want to replace all of those except the first.

You may capture it and restore with a backreference:
weirdString.replace(/^([^\S\r\n]*(?:\r\n?|\n))|(?:\r\n?|\n)/g, "$1");
There is no need for the m modifier here.
Details:
^ - start of a string
([^\S\r\n]*(?:\r\n?|\n)) - Capturing group 1:
[^\S\r\n]* - any 0+ whitespaces other than CR and LF
(?:\r\n?|\n) - any style line break
| - or
(?:\r\n?|\n) - any style line break.
With $1, only the contents captured into Group 1 are put back in the replacement result.
var weirdString = " \r\n\r\n\n\rSome text";
console.log(weirdString.replace(/^([^\S\r\n]*(?:\r\n?|\n))|(?:\r\n?|\n)/g, "$1"));

A little bit tricky, but why don't you first replace your first carriage return with something else, e.g. %#% or any other marker that you are not using in your text? Then replace all the other carriage returns, and finally turn your %#% marker back into a carriage return.
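The placeholder idea above could be sketched roughly as follows. The helper name keepFirstLineBreak and the default %#% marker are only illustrative, and the sketch assumes it is acceptable for the surviving line break to come back as a plain \n.

```javascript
// Hypothetical helper: keeps the first line break and strips the rest,
// using a temporary placeholder as suggested in the answer above.
function keepFirstLineBreak(input, placeholder = "%#%") {
  const firstLineBreak = /\r\n?|\n/;  // no g flag: only the first match is replaced
  const allLineBreaks = /\r\n?|\n/g;  // g flag: every remaining match is replaced
  if (input.includes(placeholder)) {
    // The trick only works if the marker never occurs in the text itself.
    throw new Error("placeholder already occurs in the input");
  }
  return input
    .replace(firstLineBreak, placeholder) // 1. protect the first line break
    .replace(allLineBreaks, "")           // 2. remove all the others
    .replace(placeholder, "\n");          // 3. restore the protected one as \n
}

console.log(JSON.stringify(keepFirstLineBreak(" \r\n\r\n\n\rSome text")));
// prints " \nSome text"
```

Compared with the single-regex answer, this takes three passes and depends on picking a marker that cannot appear in the input, which is why the sketch checks for it up front.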

The exact matching regexp must cope with some things you have not accounted for:
First, there can be whitespace between two such line ends; that intervening whitespace should be treated as part of the run.
Second, the \r in front of \n should be considered optional, since it appears in text that comes from network socket connections (most protocols require \r\n, but it can be optional).
Third, a sequence of two or more newlines of this type should be collapsed to one \n (or one \r\n, if you prefer).
If you do a pattern match and substitute with multiple flag enabled you'll get the desired effect with this pattern:
([ \t]*\r*\n)+
as seen in the following demo. I have substituted the newlines with [<--']\r\n to make the effect visible. It also deletes all trailing whitespace at line ends (normally invisible) but doesn't touch the leading whitespace at the beginning of lines (removing that could affect how your text looks).
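A small sketch of how that pattern behaves in JavaScript, assuming the g flag is used and each run is replaced with a single \n. The function name and sample string are made up for illustration.

```javascript
// Collapse every run of "optional trailing blanks + optional \r + \n"
// into one \n. Leading whitespace on the following line is left alone.
const collapseLineBreaks = (text) => text.replace(/([ \t]*\r*\n)+/g, "\n");

const messy = "first  \r\n\r\n  \t\n\nsecond\t\n  third";
console.log(JSON.stringify(collapseLineBreaks(messy)));
// prints "first\nsecond\n  third"
```

Note that the pattern requires a \n in every line end, so a lone \r (old Mac style) would not be treated as a line break by this sketch.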

Related

Is it possible to replace only in a match group - REGEX

I have always tried to avoid regex because I simply can't get my head around how it really works. Most of the time I manage to get the expected result by luck more than actual skill.
However, I am trying to replace any whitespace character in a bundled webpack source with the string-replace-loader or the String-Replace-Plugin (whichever turns out easier). But before I try to do this on the actual source, I want to understand the regex which I am trying to perform.
The problem
I have query strings which always start with dqlParse followed by \n then maybe some \t and other whitespace characters. I have already managed to get my whitespace characters removed in a test string if I match this
/\s+\s/g
and simply replace it with " ".
Since I don't have control over all the strings within my bundle, I thought I could indicate which string is set for replacement by adding dqlParse in front of the string and then match and replace by groups. Unfortunately, no luck so far.
What I have tried
So far I have tried something like this
/(^dqlParse)(.*)/g
which basically does what it should since match group $1 is dqlParse and match group $2 is the rest of the string where I would like to do the replacement.
Is it possible to replace only in the second match group?
Thanks! Any help appreciated!
Yes, you can do that with String#replace:
text = text.replace(/^(dqlParse)(.*)/g, function(_, x, y) {return x + y.replace(/\s{2,}/g, ' ');})
This will match and capture dqlParse into Group 1 (x variable in the callback function), and the rest of the line will get captured into Group 2 (y in the callback function). So, once the match is found, the replacement will be the concatenation of x and y with all two or more whitespace chunks replaced with a single space.
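One caveat worth checking against real input: in JavaScript the dot does not match line breaks, so if dqlParse is directly followed by \n (as the question describes), (.*) would capture nothing. A hedged variation that uses [\s\S]* so the second group spans the line breaks might look like this; the function name and sample query are invented for illustration.

```javascript
// Variation on the answer above: [\s\S]* lets group 2 run across line
// breaks, and the callback rewrites only that group.
function collapseDqlWhitespace(source) {
  return source.replace(/^(dqlParse)([\s\S]*)/, (match, prefix, rest) =>
    // prefix is "dqlParse"; rest is everything after it
    prefix + rest.replace(/\s{2,}/g, " ")
  );
}

const query = "dqlParse\n\t\tSELECT  a,\n\t\t  b FROM   c";
console.log(JSON.stringify(collapseDqlWhitespace(query)));
// prints "dqlParse SELECT a, b FROM c"
```

Strings that do not start with dqlParse are returned unchanged, because the outer pattern never matches them.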

The second pattern of a regex not replacing apostrophe

I'm creating a regex that matches straight apostrophes and replaces them with curly ones. Sometimes an apostrophe goes in the middle of two characters. Other times it goes at the end of a character/word (e.g. ellipsis').
So I have two regexes that handle both situations (separated by an or statement).
However, only the first case is being replaced, not the second. In other words, this:
"Wor'd word'".replace(/(?<=\w)\'(?=\w)|(?<=\w)\'(?=\s)/, '’')
Becomes this:
"Wor’d word'"
This confuses me because both types of apostrophes are matching: https://regexr.com/4td7p
Why is this, and how to fix it?
Update: I figured the problem was that there's no space after the last apostrophe, so I changed the second part of the regex to this: (?<=\w)\'(?!\w) (don't match if there's a character after the apostrophe). But I'm getting the same result.
If you want to match (?<=\w)\' followed by a character and also match (?<=\w)\' not followed by a character, why not just drop the logic after it altogether and just use (?<=\w)'? (no need to escape 's in a regex)
You also need the global flag to replace more than one thing at a time:
console.log(
"Wor'd word'".replace(/(?<=\w)'/g, '’')
);
updated
var str = "Wor'd word' that's a good thing'";
var afterReplace = str.replace(/\b'/g, '’'); // \b before the apostrophe: it must follow a word character
console.log(afterReplace);
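Lookbehind is not available in some older JavaScript engines. If that matters, a capture group can stand in for it; this is only a sketch of that alternative, with an invented function name.

```javascript
// Capture the word character before the apostrophe and put it back with $1,
// so no lookbehind is needed. \u2019 is the right single quotation mark.
const curlApostrophes = (text) => text.replace(/(\w)'/g, "$1\u2019");

console.log(curlApostrophes("Wor'd word' that's a good thing'"));
// prints Wor’d word’ that’s a good thing’
```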

Single regex to remove empty lines and double spaces from multiline input

I would like to combine two regex functions to clean up some textarea input. I wonder if it is even possible, or if I should keep it two separate ones (which work fine but aren't looking as pretty or clean).
I have adjusted either so that they utilize global and multiline (/gm) and are replaced by nothing (''). I tried with brackets and vertical/or lines in any position, but it never ends up giving the expected result, so I can only assume there is a way that I have overlooked or that I should keep it as is.
Regex 1: /^\s+[\r\n]/gm
Regex 2: /^\s+| +(?= )|\s+$/gm
Currently in JavaScript: string.replace(/^\s+[\r\n]/gm,'').replace(/^\s+| +(?= )|\s+$/gm,'')
The goal is to remove:
Empty spaces in the beginning and end of each line
Empty lines (including any in the very beginning and end)
Double spaces
Without it ending up on one and the same line. The single line breaks (\r\n) should still be there in the end.
Regex 1 is to remove any empty line (^\s+[\r\n]), Regex 2 does the trimming of whitespace at the beginning (^\s+) and end (\s+$), and removes double (and triple, quadruple, etc.) spaces in between ( +(?= )).
Input:
Let's
make this
look
a little
nicer
and
more
readible
Output:
Let's
make this
look
a little
nicer
and
more
readible
Edit: Many thanks to Wiktor Stribiżew and his comment for this complete solution:
/^\s*$[\r\n]*|^[^\S\r\n]+|[^\S\r\n]+$|([^\S\r\n]){2,}|\s+$(?![^])/gm
I'd suggest the following expression with a substitution template "$1$2" (demo):
/^\s*|\s*$|\s*(\r?\n)\s*|(\s)\s+/g
Explanation:
^\s* - matches whitespace from the text beginning
\s*$ - matches whitespace from the text ending
\s*(\r?\n)\s* - matches whitespace between two words located on different lines, captures one line break to group $1
(\s)\s+ - captures the first whitespace char in a sequence of 2+ whitespace chars to group $2
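To see the substitution in action, here is a short sketch using the expression and the "$1$2" template from the answer. The sample input is made up, and the function name is only illustrative.

```javascript
// Each alternative either captures what should survive ($1: one line break,
// $2: one whitespace character) or captures nothing, in which case the
// unmatched groups expand to empty strings and the match is simply removed.
const tidy = (text) =>
  text.replace(/^\s*|\s*$|\s*(\r?\n)\s*|(\s)\s+/g, "$1$2");

const rough = "  Let's  make\n\n   this   look \n nicer  ";
console.log(JSON.stringify(tidy(rough)));
// prints "Let's make\nthis look\nnicer"
```

One thing to be aware of: because the leading \s* is greedy and \r is itself whitespace, the \r of a \r\n pair tends to be consumed before the group, so line breaks appear to come out as plain \n.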

Cannot get a regex to work in JavaScript that allows whitespace and backslash

I have a regular expression as below. It should allow alphabets, digits, round brackets, square brackets, backslash and following punctuation marks: period, comma, semi-colon, full colon, exclamation, percentage and dash.
^[(a-z)(A-Z) .,;:!'%\-(0-9)(\\)\(\)[\]\s]+$
Question : I have tried this regular expression with some text at this online tester: https://regex101.com/r/kO5tW2/2, but it always comes up with no matches. What is causing the expression to fail in above case? To me, the string being tested should come back as valid, but it's not.
Your spec does not mention a question mark. However, the test text you give does include a question mark. You could have tested this easily enough by removing one character at a time from the test text until you got a match, which would have happened when you removed the question mark.
Either add the question mark to the regexp, or remove it from your test text.
Also, you do not need to (and should not) enclose ranges in parentheses.
In the below, I've also removed escaping for characters which do not need to be escaped:
^[a-zA-Z .,;:!'%\-0-9\\()[\]\s?]+$
                              ^
https://regex101.com/r/kO5tW2/4
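A quick way to check this locally is to run both patterns against a few strings. The sample strings below are invented, since the original test text is only available behind the link.

```javascript
// Pattern from the question (no question mark in the character class).
const originalPattern = /^[(a-z)(A-Z) .,;:!'%\-(0-9)(\\)\(\)[\]\s]+$/;
// Pattern from the answer (parentheses around ranges dropped, ? added).
const correctedPattern = /^[a-zA-Z .,;:!'%\-0-9\\()[\]\s?]+$/;

const samples = ["Hello, world!", "Is this valid?", "path\\to [file] (50%)"];
for (const sample of samples) {
  console.log(
    JSON.stringify(sample),
    originalPattern.test(sample),
    correctedPattern.test(sample)
  );
}
// Only "Is this valid?" differs: false for the original, true for the corrected one.
```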
Try adding m (multiline) modifier to regex
If you have a string consisting of multiple lines, like first line\nsecond line (where \n indicates a line break), it is often desirable to work with lines, rather than the entire string. Therefore, all the regex engines discussed in this tutorial have the option to expand the meaning of both anchors. ^ can then match at the start of the string (before the f in the above string), as well as after each line break (between \n and s). Likewise, $ still matches at the end of the string (after the last e), and also before every line break (between e and \n). Source
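The effect of the m flag on the anchors described in that quote can be seen with the same example string. This sketch only illustrates the anchors and says nothing about whether the flag fixes the pattern in the question.

```javascript
const twoLines = "first line\nsecond line";

// Without m, ^ matches only at the very start of the string.
const startsWithoutM = twoLines.match(/^\w+/g);
// With m, ^ also matches right after each line break.
const startsWithM = twoLines.match(/^\w+/gm);

console.log(startsWithoutM); // [ 'first' ]
console.log(startsWithM);    // [ 'first', 'second' ]
```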

Make regex find a text inside tags with line breaks in the content - JavaScript

Thanks for coming here. I've got this little piece of code:
while(/\[del\](.*?)\[\/del\]/i.exec(text) != null)
text = text.replace(/\[del\](.*?)\[\/del\]/i, "<s>$1</s>");
but when there are line breaks, it won't match.
Example:
[del]asdsadasdaasdadsadsadsadasdsadsa[/del] - this won't be matched
I'm really new to regex, so what am I doing wrong?
By default, in many regex flavors, the dot doesn't match the newline character. JavaScript historically had no single-line modifier to change this behaviour (ES2018 added the s flag, also called dotAll, which does). The most common trick for matching all characters including newlines, and one that works in every engine, is to use [\s\S], which matches anything that is a whitespace character or is not a whitespace character.
As an aside, you don't need to put the replace call in a while loop, since replace only performs a replacement if something is found. If you want to replace all occurrences, just add the g flag at the end of the pattern.
text = text.replace(/\[del\]([\s\S]*?)\[\/del\]/ig, "<s>$1</s>");
Note that for this specific replacement, since your del tag doesn't seem to have parameters, you can simply write:
text = text.replace(/\[(\/?)del\]/ig, "<$1s>");
(it avoids a lot of work)
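The two replacements above can be compared side by side on input that contains a line break. The third variant uses the s (dotAll) flag and assumes an ES2018-capable runtime; the sample string is made up.

```javascript
const bbcode = "[del]first line\nsecond line[/del] and [DEL]more[/del]";

// 1. Capture the content with [\s\S]*? so line breaks are included.
const viaCapture = bbcode.replace(/\[del\]([\s\S]*?)\[\/del\]/ig, "<s>$1</s>");
// 2. Rewrite only the tags themselves; the content is never matched.
const viaTags = bbcode.replace(/\[(\/?)del\]/ig, "<$1s>");
// 3. Same as 1, but with the s flag so the dot matches line breaks too.
const viaDotAll = bbcode.replace(/\[del\](.*?)\[\/del\]/gis, "<s>$1</s>");

console.log(JSON.stringify(viaCapture));
// prints "<s>first line\nsecond line</s> and <s>more</s>"
console.log(viaCapture === viaTags && viaTags === viaDotAll); // true
```

The tag-only version also converts unbalanced tags, which may or may not be what you want.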
