How to split on white spaces not between quotes? [duplicate] - javascript

This question already has answers here:
Javascript split by spaces but not those in quotes
(3 answers)
Closed 2 years ago.
I am trying to split a string on whitespace (\s), but only where the whitespace is not inside a "quoted" section.
I am matching all text in between these quoted sections in the following manner:
(['"`]).*?\1
However, when I try to add this as a negative lookahead, to only split on white spaces outside of those quotes, I can't get it to work:
\s(?!(['"`]).*?\1)
How can I only split on the white spaces that are not in "quotes"?

\s(?=(?:[^'"`]*(['"`])[^'"`]*\1)*[^'"`]*$)
You can split on this regex, which uses a lookahead. See the demo:
https://regex101.com/r/5I209k/4
Or, if the string mixes quote types:
https://regex101.com/r/5I209k/7
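One detail to watch for in JavaScript: the regex above contains a capture group, and String.prototype.split inserts each capture group's value into the result array between the pieces. The following is a minimal sketch of one way to handle that. The helper name splitOutsideQuotes and the sample input are made up for illustration.

```javascript
// Match a whitespace character only if the rest of the string consists of
// complete quoted sections, i.e. the whitespace is not inside quotes.
const OUTSIDE_QUOTES = /\s(?=(?:[^'"`]*(['"`])[^'"`]*\1)*[^'"`]*$)/;

// Hypothetical helper: split() interleaves the value of capture group 1
// between the pieces, so keep only the even-indexed entries.
function splitOutsideQuotes(text) {
  return text.split(OUTSIDE_QUOTES).filter((_, index) => index % 2 === 0);
}

console.log(splitOutsideQuotes('a "b c" d'));
// [ 'a', '"b c"', 'd' ]
```

Filtering by index is used here instead of filtering out undefined values because the capture group can hold a quote character when a quoted section follows the split point.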

The problem is that you need to exclude entries within the group. Instead of using a negative lookahead you could do it like this:
(\S*(?:(['"`]).*?\2)\S*)\s?|\s
Basically, what it does is:
capture any non-whitespace characters (possibly none)
followed by a quoted string
that is optionally directly followed by more non-whitespace characters (e.g. a comma after the quote),
then match an optional trailing whitespace character
OR
match a single whitespace character
Capture group 1 will then contain the longest possible sequence of non-whitespace characters around a quoted section (whitespace inside the quotes is kept). This can thus be used with the replacement \1\n (written $1\n in JavaScript) to replace the desired whitespace with a newline.
Regex101: https://regex101.com/r/A4HswJ/1
JSFiddle: http://jsfiddle.net/u1kjudmg/1/
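As a rough sketch of the replacement described above, here is how it might look in JavaScript. The helper name spacesToNewlines and the sample input are assumptions for illustration only.

```javascript
// Alternative 1 captures a token that contains a quoted section (group 1);
// alternative 2 matches a single whitespace character outside such a token.
const TOKEN_OR_SPACE = /(\S*(?:(['"`]).*?\2)\S*)\s?|\s/g;

// Hypothetical helper. $1 is JavaScript's spelling of \1 in a replacement;
// when alternative 2 matches, group 1 is unset and $1 expands to nothing.
function spacesToNewlines(text) {
  return text.replace(TOKEN_OR_SPACE, '$1\n');
}

console.log(JSON.stringify(spacesToNewlines('a "b c" d')));
// "a\n\"b c\"\nd"
```

Because the replacement always appends a newline, a quoted token at the very end of the input leaves a trailing newline behind, so the result may need a final trim.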

I'd use a simpler approach, no need for advanced features:
'([^']|\\.)*'|"([^"]|\\.)*"|`([^`]|\\.)*`|\S*
meaning:
a single-quoted section '([^']|\\.)*'
or | a double-quoted section "([^"]|\\.)*"
or | a back-quoted section (can't place it inline in SO markdown)
or | an un-quoted section \S*
This will also separate quoted parts from adjacent text. If this is not wanted, you can instead use
('([^']|\\.)*'|"([^"]|\\.)*"|`([^`]|\\.)*`|\S)+
i.e. find sequences of tokens where each token is either a non-whitespace or a quoted section.
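A minimal sketch of the token-matching idea, using match with the g flag instead of split. The function name tokenize and the sample strings are hypothetical.

```javascript
// Each token is a quoted section or a single non-whitespace character;
// the outer + glues adjacent tokens together into one match.
const TOKENS = /('([^']|\\.)*'|"([^"]|\\.)*"|`([^`]|\\.)*`|\S)+/g;

function tokenize(text) {
  // With the g flag, match() returns only the full matches, or null if none.
  return text.match(TOKENS) || [];
}

console.log(tokenize('say "hello world" now'));
// [ 'say', '"hello world"', 'now' ]
console.log(tokenize('key="a b" done'));
// [ 'key="a b"', 'done' ]
```

One caveat, stated as an observation rather than a fix: since [^"] also matches a backslash, the \\. branch is only reached on backtracking, so input containing backslash-escaped quotes may not be tokenized as intended without reordering the branches.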

Related

Regular expression for allowing only SINGLE white space and single hyphen in between alpha numeric string [duplicate]

I have a text box in an ASP.NET application, and I need a regular expression to validate the user's input string. The requirements for the regex are:
It should allow only one space between words. That is, total number of spaces between words or characters should only be one.
It should ignore leading and trailing spaces.
Matches:
Test
Test abc
Non Matches:
Test abc def
Test abc --> I wanted to include multiple spaces between the 2 words. However the editor ignores these extra spaces while posting a question.
Assuming there must be either one or two 'words' (i.e. sequences of non-space characters)
"\s*\S+(\s\S+)?\s*"
Change \S to [A-Za-z] if you want to allow only letters.
Pretty straightforward:
/^ *(\w+ ?)+ *$/
Fiddle: http://refiddle.com/gls
Maybe this one will do?
\s*\S+?\s?\S*\s*
Edit: It's a server-encoded regex, meaning that you might need to remove one of those escaping slashes.
How about:
^\s*(\w+\s)*\w+\s*$
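To see how the last pattern behaves, here is a small check written in JavaScript (the same pattern syntax also works in .NET). The sample strings are assumptions. Whether inputs with three or more words should be accepted is not clear from the question; this pattern accepts any number of words as long as each gap is a single whitespace character.

```javascript
// Words separated by exactly one whitespace character; leading and
// trailing whitespace is allowed.
const SINGLE_SPACED = /^\s*(\w+\s)*\w+\s*$/;

const samples = ['Test', 'Test abc', 'Test  abc', '  Test abc  '];
for (const sample of samples) {
  console.log(JSON.stringify(sample), SINGLE_SPACED.test(sample));
}
// "Test" true
// "Test abc" true
// "Test  abc" false
// "  Test abc  " true
```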

matching multiple lines beginning with whitespaces

I have a simple regex syntax to match lines that begin with exactly 4 spaces.
/^(\s{4}).*/g
The problem is that the . token matches everything except a newline, so when multiple lines begin with 4 spaces, only the first line is matched. I've tried explicitly matching \n tokens, but I haven't been able to get the results I need. I've been testing this on regexr.com. I can't use any syntax that isn't supported by JavaScript.
The ^ anchor can denote two things: the beginning of the string or the beginning of a line. To make it denote the latter, you need to specify the m (multiline) flag:
/^(\s{4}).*/gm
Or - to only match literal regular spaces (note that \s can also match newlines):
/^( {4}).*/gm
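A short sketch of the effect of the m flag, with made-up sample text. The second pattern is an assumption on my part: if "exactly 4 spaces" is meant strictly, a negative lookahead is one way to reject lines with a fifth space.

```javascript
const text = ['    first', 'second', '    third', '     five spaces'].join('\n');

// m makes ^ match at the start of every line; g collects all matches.
const atLeastFour = text.match(/^( {4}).*/gm);
console.log(atLeastFour);
// [ '    first', '    third', '     five spaces' ]

// Assumption: (?! ) rejects a line whose fifth character is also a space.
const exactlyFour = text.match(/^ {4}(?! ).*/gm);
console.log(exactlyFour);
// [ '    first', '    third' ]
```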

using regex to see if string contains this word only, not within another word [duplicate]

This question already has an answer here:
Match and replace whole words in javascript
(1 answer)
Closed 8 years ago.
I have a regex that matches these words in a string; however, it also matches parts of words.
For example, city is matched because it contains it. However, I want it to be matched only when it is the only characters between whitespace. So it or he would match, but not city or where.
Here is the regex (pretty basic and simple): they|he|she|her|him|them|it.
How can I get it to match only when the whole word is one of these?
Use word boundaries to denote the beginning and ending of a word.
http://www.regular-expressions.info/wordboundaries.html
So your regex would become something on the order of:
\b(they|he|she|her|him|them|it)\b
It should be noted that \b treats an apostrophe as a non-word character, so this approach does not handle words containing apostrophes well, e.g. can't, won't, etc. For a discussion of this, see the following Stack Overflow post:
How do you use the Java word boundary with apostrophes?
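For illustration, a quick check of the word-boundary version in JavaScript. The constant name PRONOUNS and the sample sentences are made up.

```javascript
// \b on both sides means the alternatives only match as whole words.
const PRONOUNS = /\b(they|he|she|her|him|them|it)\b/g;

console.log('the city where it is'.match(PRONOUNS));
// [ 'it' ]
console.log('he said it to them'.match(PRONOUNS));
// [ 'he', 'it', 'them' ]
```

Neither "the", "city" nor "where" produces a match, because the candidate substrings inside them are not surrounded by word boundaries.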
Try putting a word boundary before and after each word:
(?:\bthey\b|\bhe\b|\bshe\b|\bher\b|\bhim\b|\bthem\b|\bit\b)
Explanation:
(?:...) # Non-capturing group
\b # Word boundary (it matches between a word character and a non-word character)

Regex to allow special characters

I need a regex that will allow letters, hyphen (-), quote ('), dot (.), comma (,) and space. This is what I have now:
^[A-Za-z\s\-]$
Thanks
I removed \s from your regex since you said space, not whitespace. Feel free to put it back by replacing the space at the end with \s. Otherwise it's pretty simple:
^[A-Za-z\-'., ]+$
It matches the start of the string, then any character in the set one or more times, then the end of the string. You don't have to escape . in a set, in case you were wondering.
You probably tried new RegExp("^[A-Za-z\s\-\.\'\"\,]$"). Yet, you have a string literal there, and the backslashes just escape the following characters - necessary only for the delimiting quote (and for backslashes).
"^[A-Za-z\s\-\.\'\"\,]$" === "^[A-Za-zs-.'\",]$" === '^[A-Za-zs-.\'",]$'
Yet, the range s-. is invalid. So you would need to escape the backslash to pass a string with a backslash in the RegExp constructor:
new RegExp("^[A-Za-z\\s\\-\\.\\'\\\"\\,]$")
Instead, regex literals are easier to read and write, as you do not need to string-escape regex escape characters. Also, they are parsed only once during script "compilation" - nothing needs to be executed each time the line is evaluated. The RegExp constructor only needs to be used if you want to build regexes dynamically. So use
/^[A-Za-z\s\-\.\'\"\,]$/
and it will work. Also, you don't need to escape any of these chars in a character class - so it's just
/^[A-Za-z\s\-.'",]$/
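The escaping behaviour described above can be checked directly. This is only a sketch; the variable names are mine.

```javascript
// In a string literal, "\s" is just "s" and "\-" is just "-", so RegExp
// receives ^[A-Za-zs-.'",]$ and rejects the out-of-order range s-.
let error = null;
try {
  new RegExp("^[A-Za-z\s\-\.\'\"\,]$");
} catch (e) {
  error = e;
}
console.log(error instanceof SyntaxError); // true

// Doubling the backslashes yields the same pattern as the regex literal.
const fromString = new RegExp("^[A-Za-z\\s\\-.'\",]$");
const literal = /^[A-Za-z\s\-.'",]$/;
console.log(fromString.source === literal.source); // true
```

Note that both patterns here, like the ones in the answer, match exactly one character because the character class has no quantifier.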
You are pretty close, try the following:
^[A-Za-z\s\-'.,]+$
Note that I assumed that you want to match strings that contain one or more of any of these characters, so I added + after the character class which mean "repeat the previous element one or more times".
Note that this will currently also allow tabs and line breaks in addition to spaces because \s will match any whitespace character. If you only want to allow spaces, change it to ^[A-Za-z \-'.,]+$ (just replaced \s with a space).
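A brief check of the space-only variant, with sample inputs chosen for illustration:

```javascript
// Letters, space, hyphen, apostrophe, dot and comma, one or more times.
const ALLOWED = /^[A-Za-z \-'.,]+$/;

console.log(ALLOWED.test("O'Neil-Smith, Jr.")); // true
console.log(ALLOWED.test('Tab\there'));          // false: tab is not allowed
console.log(ALLOWED.test('R2-D2'));              // false: digits are not allowed
console.log(ALLOWED.test(''));                   // false: + requires one character
```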

regex remove white space after text [duplicate]

This question already has answers here:
Trim string in JavaScript
(20 answers)
Closed 4 years ago.
From this regex,
text.replace(/^\s+|\s+$/g,"").replace(/ +/g,' ')
how do I remove just the part of the regex that handles trailing white space?
I am new to regex and did some research but I'm not able to understand the pattern.
/^\s+|\s+$/g means
^ // match the beginning of the string
\s+ // match one or more whitespace characters
| // OR if the previous expression does not match (i.e. alternation)
\s+ // match one or more whitespace characters
$ // match the end of the string
The g modifier indicates to repeat the matching until no match is found anymore.
So if you want to remove the part that matches whitespace characters at the end of the string, remove the |\s+$ part (and the g flag, since ^\s+ can only match at one position anyway - at the beginning of the string).
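A small before-and-after sketch of that change, using a made-up sample string:

```javascript
const text = '  hello   world  ';

// Original: trim both ends, then collapse runs of spaces.
const both = text.replace(/^\s+|\s+$/g, '').replace(/ +/g, ' ');
console.log(JSON.stringify(both)); // "hello world"

// Without the |\s+$ part: only leading whitespace is removed.
const leadingOnly = text.replace(/^\s+/, '').replace(/ +/g, ' ');
console.log(JSON.stringify(leadingOnly)); // "hello world "
```

The trailing whitespace survives the first replace, but the second replace still collapses it to a single space.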
Useful resources to learn regular expressions:
http://www.regular-expressions.info/
Regex in JavaScript (since this seems to be JavaScript).
