Regex to match middle string inside optional whitespace [duplicate] - javascript

This question already has answers here:
How to remove leading and trailing white spaces from a given html string?
(7 answers)
Closed 5 years ago.
I'm trying to create a regular expression that will match any given string (text or whitespace) inside of arbitrary, optional whitespace. The string itself could have whitespace in it;
I'm just trying to cut whitespace off the beginning and end, if it exists.
Example strings:
one
t
three
four
five
Expected output:
one
t
three
four
five
I've been testing on regextester.com but so far haven't been able to get it quite right.
[^\s][\w\W]*[^\s] will match cases 1, 3, 4, and 5, but it fails for single-character strings.
[^\s]*[\w\W]*[^\s] gets 1, 2, and 4, but it includes the leading whitespace from 3 and 5.
Is there a regular expression that can handle this task? I'd also settle for using option 2 above and then trimming off the leading whitespace afterwards, but I'm not sure how to do that.

You don't need regex to strip whitespace. In Python, just use the .strip method of any text object. I am sure other languages have an equally convenient tool.
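Since the question is tagged javascript, here is a minimal sketch of the same idea in that language. The built-in String.prototype.trim does the job on its own; the helper names trimWithRegex and matchMiddle are made up for illustration and show what a regex-only version might look like.

```javascript
// Built-in: removes leading and trailing whitespace, keeps inner whitespace.
const trimmed = "  three  words here \t".trim();

// Regex alternative (hypothetical helper): delete the whitespace runs
// anchored at either end of the string.
function trimWithRegex(s) {
  return s.replace(/^\s+|\s+$/g, "");
}

// Match, rather than replace, the middle part (hypothetical helper):
// one non-whitespace character, optionally followed by anything that
// ends in a non-whitespace character. Making the group optional is what
// lets single-character strings match.
function matchMiddle(s) {
  const m = s.match(/\S(?:[\s\S]*\S)?/);
  return m ? m[0] : "";
}
```

The optional group addresses the single-character failure described in the question: the first pattern there requires two non-whitespace characters, so a one-character string can never match it.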

In Java you can use the .trim() method on any String. This will remove leading and trailing whitespace:
" spaces at front and end. ".trim() -> "spaces at front and end"

Related

No white space at beginning + allow space in the middle [duplicate]

This question already has answers here:
Regular expression: match start or whitespace
(8 answers)
Closed 1 year ago.
I have this regex for detecting #xxx
/(?:#)(.*[a-zA-Z0-9]*)/
it matches even when the #xxx is not separated from another string from the left (when it's typed in the middle of an input line).
xxx#xxx will match too, so I added \s to require a space at the beginning. Now it's
/\s(?:#)(.*[a-zA-Z0-9]*)/
But the problem is that there is no match when the #xxx is typed at the beginning of a line (the whitespace is still required), and I need it to match in that case.
I tried to take inspiration from https://stackoverflow.com/a/19973707/170592, so I added ^[^-\s] at the beginning of the regex to make it
/^[^-\s](?:#)(.*[a-zA-Z0-9]*)/
But that didn't work either.
I think what you are looking for is /\S+/, which checks for any non-whitespace, and I don't think you need the ^ at the beginning.
[-\S+](?:#)(.*[a-zA-Z0-9]*)
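The linked duplicate is about matching "start or whitespace", which can be written as the alternation (?:^|\s). The following is a rough sketch of that idea, not the answerer's pattern. It assumes a tag consists only of letters and digits, so the original .*[a-zA-Z0-9]* is narrowed to [a-zA-Z0-9]+; the names HASHTAG and findTag are hypothetical.

```javascript
// (?:^|\s) accepts either the start of the input or one whitespace
// character before the '#'. The capture group holds the tag text
// without the '#'. Assumption: tags contain only letters and digits.
const HASHTAG = /(?:^|\s)#([a-zA-Z0-9]+)/;

// Hypothetical helper: returns the first tag, or null if there is none.
function findTag(input) {
  const m = input.match(HASHTAG);
  return m ? m[1] : null;
}
```

With this pattern, "#xxx" at the start of a line and "hello #xxx" both yield the tag, while "xxx#xxx" does not match because the '#' is preceded by a non-whitespace character.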

How to split on white spaces not between quotes? [duplicate]

This question already has answers here:
Javascript split by spaces but not those in quotes
(3 answers)
Closed 2 years ago.
I am trying to split a string on white spaces only (\s), but that are not between a "quoted" section.
I am matching all text in between these quoted sections in the following manner:
(['"`]).*?\1
Regex101
However, when I try to add this as a negative lookahead, to only split on white spaces outside of those quotes, I can't get it to work:
\s(?!(['"`]).*?\1)
Regex101
How can I only split on the white spaces that are not in "quotes"?
\s(?=(?:[^'"`]*(['"`])[^'"`]*\1)*[^'"`]*$)
You can use this regex with a lookahead to split on. See demo.
https://regex101.com/r/5I209k/4
or if mixed tick types.
https://regex101.com/r/5I209k/7
The problem is that you need to exclude entries within the group. Instead of using a negative lookahead you could do it like this:
(\S*(?:(['"`]).*?\2)\S*)\s?|\s
Basically what it does is to:
captures any non-whitespace characters
that may contain a quoted string
and is optionally directly followed by any non-whitespace (e.g a comma after the quote).
then matches an optional trailing whitespace
OR
matches a single whitespace
Capture group 1 will then contain the longest possible sequence of non-whitespace characters (unless they are within quotes). This can thus be used with the replacement \1\n to replace the desired whitespace with a newline.
Regex101: https://regex101.com/r/A4HswJ/1
JSFiddle: http://jsfiddle.net/u1kjudmg/1/
I'd use a simpler approach, no need of advanced features:
'([^']|\\.)*'|"([^"]|\\.)*"|`([^`]|\\.)*`|\S*
meaning:
a single-quoted section '([^']|\\.)*'
or | a double-quoted section "([^"]|\\.)*"
or | a back-quoted section (can't place it inline in SO markdown)
or | an un-quoted section \S*
This will separate also quoted parts. If this is not wanted you can instead use
('([^']|\\.)*'|"([^"]|\\.)*"|`([^`]|\\.)*`|\S)+
i.e. find sequences of tokens where each token is either a non-whitespace or a quoted section.
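As a hedged sketch of this token-matching idea in JavaScript: instead of splitting on whitespace, collect the tokens with match(). The pattern below is a variant of the one above, not a copy. It tries the escape alternative first and excludes the backslash from the plain-character class so that an escaped quote does not end the quoted section. The names TOKEN and splitOutsideQuotes are made up for illustration.

```javascript
// Each token is a run of quoted sections and/or non-whitespace characters.
// Inside quotes, a backslash escapes the next character.
const TOKEN = /(?:'(?:\\.|[^'\\])*'|"(?:\\.|[^"\\])*"|`(?:\\.|[^`\\])*`|\S)+/g;

// Hypothetical helper: returns the pieces that a "split on whitespace
// outside quotes" would produce.
function splitOutsideQuotes(input) {
  return input.match(TOKEN) || [];
}
```

An unmatched quote character falls through to the \S alternative, so input such as it's ok is still split on the space.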

Using Javascript RegEx to find the last whitespace before inserting linebreak except for some punctuation [duplicate]

This question already has an answer here:
Split long string into text chunks with jQuery
(1 answer)
Closed 7 years ago.
Here's the statement I have:
text.replace(/(.{35})/g, "$1\n");
It works, it inserts a new line every 35 characters. However, I don't want a word to be cut in half. How would I find the last whitespace BEFORE that 35th character? Is it possible to do with RegEx? The block it executes on should insert up to 6 linebreaks because the overall character limit is 210 characters.
This is the current output:
This is an example of current output:
this is some text that has been for
matted by that statement.
This is what I want:
This is an example of current output:
this is some text that has been
formatted by that statement.
It is being executed on a text field.
You can use the following regex:
text.replace(/.{0,35}\b/g, "$&\n");
See demo
Capturing groups are redundant here since we can access the matched text with $&. \b ensures that the match ends at a word boundary, so whole words are kept together. {0,35} is a greedy limiting quantifier (that is, it tries to match as many characters as it can), but the match will end before the 35th character if the 35th character is not at a word boundary and there is a word boundary earlier.
EDIT:
So as not to insert the linebreak at the end of the string, and also keep a punctuation symbol in the character class on the current line, use
.{1,35}(?:[.,:;–—-]|\b)
See another demo
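A related way to avoid a trailing newline is to collect the lines with match() and join them, rather than appending "\n" to every match. This is a sketch of an alternative, not the pattern given above: it breaks only at whitespace (not at punctuation), and the function name wrapLines and its width parameter are hypothetical.

```javascript
// Hypothetical helper: greedy word wrap at whitespace.
function wrapLines(text, width) {
  // \S            every line starts on a non-whitespace character
  // .{0,width-1}  then up to width-1 more characters, greedily
  // (?=\s|$)      and the line must end right before whitespace or the end
  // |\S+          fallback: a single word longer than width stays whole
  const line = new RegExp(`\\S.{0,${width - 1}}(?=\\s|$)|\\S+`, "g");
  return (text.match(line) || []).join("\n");
}
```

Called with the sample sentence from the question and a width of 35, this puts the break before "formatted" instead of inside it. Whitespace at the break points is dropped rather than carried onto the next line.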

Regex pattern to match this string [duplicate]

This question already has answers here:
regex pattern to match a type of strings
(4 answers)
Closed 8 years ago.
I need to match the below type of strings using a regex pattern in javascript.
E.g. /this/<one or more than one word with hyphen>/<one or more than one word with hyphen>/<one or more than one word with hyphen>/<one or more than one word with hyphen>
So this single pattern should match both these strings:
1. /this/is/single-word
2. /this/is more-than/single/word-patterns/to-be-matched
Only the slash (/) and the 'this' at the beginning are constant; the rest contains only alphabetic characters.
Try this -
^\/this(?:\/[\w\- ]+)+$
Demo here
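To check the pattern against the two sample strings from the question, a small test harness might look like the following. The pattern is the one given above; the names PATH and matchesPath are made up for illustration.

```javascript
// ^\/this          literal "/this" at the start
// (?:\/[\w\- ]+)+  one or more segments: a slash followed by word
//                  characters, hyphens, or spaces
// $                nothing may follow the last segment
const PATH = /^\/this(?:\/[\w\- ]+)+$/;

// Hypothetical helper: true if the whole string has the expected shape.
function matchesPath(s) {
  return PATH.test(s);
}
```

Both sample strings pass, while "/this" on its own and strings that start with anything other than "/this" are rejected.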
There are some inconsistencies in your question, and it's not quite clear exactly what you want to match.
That being said, the following regex will provide a loose starting point for the exact strings that you want.
/this/(?:[\w|-]+/?){1,10}
This assumes the ' ' in your url was not intentional. This example will match a url with '/this/' + 1 to 10 additional '/' chunks.
(?:) -> non-capturing group
[\w|-]+ -> one or more word characters, hyphens, or literal | characters (inside a character class, | is not alternation)
/? -> zero or one slash
{1,10} -> 1 to 10 repetitions of the previous element, the non-capturing group

JavaScript: \\d{4} RegExp allows more than 4 digits [duplicate]

This question already has answers here:
Match exact string
(3 answers)
Closed 4 years ago.
Have following validation for year value from text input:
if (!year.match(new RegExp('\\d{4}'))){
...
}
match() returns null when the input has 0 to 3 digits. That's OK.
With 4 digits it returns a value. That's OK.
With more than 4 digits it returns a value again, which is NOT OK.
The documentation says {n} means an exact count, but it behaves like:
exact+
With this ugly validation it works fine:
if (!year.match(new RegExp('\\d{4}')) || year.length>4){
...
}
I wish to utilize RegExp object only.
Yes, it allows more than 4 digits because a partial match is enough. Use ^ and $ to mark the beginning and the end of the string.
if (!year.match(new RegExp('^\\d{4}$'))){
...
}
If you include ^ in your regex it matches the beginning of the string, while $ matches the end, so all up:
^\d{4}$
Will match only against beginning-of-string plus four digits plus end-of-string.
Note that regex literal syntax is generally a bit simpler than saying new RegExp():
/^\d{4}$/
// is the equivalent of
new RegExp('^\\d{4}$')
Note that in the literal syntax you don't have to escape backslashes like with the string you pass to the new RegExp(). The forward slashes are not part of the expression itself, you can think of them like quotation marks for regexes.
Also, if you just want to check if a string matches a pattern (yes or no) without extracting what actually matched you should use the .test() method as follows:
if (!/^\d{4}$/.test(year)) {
...
}
It's matching the first four digits, and the fact that there are remaining digits is neither here nor there. You need to change your regex so it stops after these four digits, say, by using the string termination anchors:
^\d{4}$
Try instead:
'^\\d{4}$'
What you had will match anything with 4 digits anywhere, such as asd1234asd or 123456789
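The difference between the unanchored and anchored patterns can be seen side by side in a short sketch. The variable names and the isFourDigitYear helper are hypothetical.

```javascript
// Unanchored: succeeds if four consecutive digits appear anywhere.
const unanchored = /\d{4}/;

// Anchored: the whole string must be exactly four digits.
const anchored = /^\d{4}$/;

// Hypothetical helper for the year validation from the question.
function isFourDigitYear(value) {
  return anchored.test(value);
}
```

Here unanchored.test("asd1234asd") and unanchored.test("123456789") are both true, while isFourDigitYear accepts only inputs such as "1999".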
