Regex validation space in the middle of a string - javascript

I'm looking for an expression that requires a space in a string, it doesn't have to be dead in the middle just not at the end (or start).
I've had a look on google and stack-overflow, there are quite a few but I haven't found one that does what I need.
Here's what I have at the moment
var re = /^[A-Z]\'?[- a-zA-Z]( [a-zA-Z])*$/igm;

Based on the limited requirements you specified, this will do it. It requires a string to contain ONE space, anywhere but at the start or end.
/^[^ ]+ [^ ]+$/
Explanation: anchoring to the beginning of the string, allow one or more non-space characters, followed by a single space, followed by, again, one or more non-space characters, to the end of the string.
[^ ] is a negated character class. That is, it says "anything but the characters inside [ and ].

Your regex should be: /^[A-Z]\'?[-\sa-zA-Z](\s[a-zA-Z])*$/igm;. According to my idea, regex doesn't recognize a whitespace that why I replace those with \s.

Related

Finding all words ending in "ion" with regex in JavaScript [duplicate]

I need help putting together a regex that will match word that ends with "Id" with case sensitive match.
Try this regular expression:
\w*Id\b
\w* allows word characters in front of Id and the \b ensures that Id is at the end of the word (\b is word boundary assertion).
Gumbo gets my vote, however, the OP doesn't specify whether just "Id" is an allowable word, which means I'd make a minor modification:
\w+Id\b
1 or more word characters followed by "Id" and a breaking space. The [a-zA-Z] variants don't take into account non-English alphabetic characters. I might also use \s instead of \b as a space rather than a breaking space. It would depend if you need to wrap over multiple lines.
This may do the trick:
\b\p{L}*Id\b
Where \p{L} matches any (Unicode) letter and \b matches a word boundary.
How about \A[a-z]*Id\z? [This makes characters before Id optional. Use \A[a-z]+Id\z if there needs to be one or more characters preceding Id.]
I would use
\b[A-Za-z]*Id\b
The \b matches the beginning and end of a word i.e. space, tab or newline, or the beginning or end of a string.
The [A-Za-z] will match any letter, and the * means that 0+ get matched. Finally there is the Id.
Note that this will match words that have capital letters in the middle such as 'teStId'.
I use http://www.regular-expressions.info/ for regex reference
Regex ids = new Regex(#"\w*Id\b", RegexOptions.None);
\b means "word break" and \w means any word character. So \w*Id\b means "{stuff}Id". By not including RegexOptions.IgnoreCase, it will be case sensitive.

Regex for a valid hashtag

I need regular expression for validating a hashtag. Each hashtag should starts with hashtag("#").
Valid inputs:
1. #hashtag_abc
2. #simpleHashtag
3. #hashtag123
Invalid inputs:
1. #hashtag#
2. #hashtag#hashtag
I have been trying with this regex /#[a-zA-z0-9]/ but it is accepting invalid inputs also.
Any suggestions for how to do it?
The current accepted answer fails in a few places:
It accepts hashtags that have no letters in them (i.e. "#11111", "#___" both pass).
It will exclude hashtags that are separated by spaces ("hey there #friend" fails to match "#friend").
It doesn't allow you to place a min/max length on the hashtag.
It doesn't offer a lot of flexibility if you decide to add other symbols/characters to your valid input list.
Try the following regex:
/(^|\B)#(?![0-9_]+\b)([a-zA-Z0-9_]{1,30})(\b|\r)/g
It'll close up the above edge cases, and furthermore:
You can change {1,30} to your desired min/max
You can add other symbols to the [0-9_] and [a-zA-Z0-9_] blocks if you wish to later
Here's a link to the demo.
To answer the current question...
There are 2 issues:
[A-z] allows more than just letter chars ([, , ], ^, _, ` )
There is no quantifier after the character class and it only matches 1 char
Since you are validating the whole string, you also need anchors (^ and $)to ensure a full string match:
/^#\w+$/
See the regex demo.
If you want to extract specific valid hashtags from longer texts...
This is a bonus section as a lot of people seek to extract (not validate) hashtags, so here are a couple of solutions for you. Just mind that \w in JavaScript (and a lot of other regex libraries) equal to [a-zA-Z0-9_]:
#\w{1,30}\b - a # char followed with one to thirty word chars followed with a word boundary
\B#\w{1,30}\b - a # char that is either at the start of string or right after a non-word char, then one to thirty word (i.e. letter, digit, or underscore) chars followed with one to thirty word chars followed with a word boundary
\B#(?![\d_]+\b)(\w{1,30})\b - # that is either at the start of string or right after a non-word char, then one to thirty word (i.e. letter, digit, or underscore) chars (that cannot be just digits/underscores) followed with a word boundary
And last but not least, here is a Twitter hashtag regex from https://github.com/twitter/twitter-text/tree/master/js... Sorry, too long to paste in the SO post, here it is: https://gist.github.com/stribizhev/715ee1ee2dc1439ffd464d81d22f80d1.
You could try the this : /#[a-zA-Z0-9_]+/
This will only include letters, numbers & underscores.
A regex code that matches any hashtag.
In this approach any character is accepted in hashtags except main signs !##$%^&*()
(?<=(\s|^))#[^\s\!\#\#\$\%\^\&\*\(\)]+(?=(\s|$))
Usage Notes
Turn on "g" and "m" flags when using!
It is tested for Java and JavaScript languages via https://regex101.com and VSCode tools.
It is available on this repo.
Unicode general categories can help with that task:
/^#[\p{L}\p{Nd}_]+$/gu
I use \p{L} and \p{Nd} unicode categories to match any letter or decimal digit number. You can add any necessary category for your regex. The complete list of categories can be found here: https://unicode.org/reports/tr18/#General_Category_Property
Regex live demo:
https://regexr.com/5tvmo
useful and tested regex for detecting hashtags in the text
/(^|\s)(#[a-zA-Z\d_]+)/ig
examples of valid matching hashtag:
#abc
#ab_c
#ABC
#aBC
/\B(?:#|#)((?![\p{N}_]+(?:$|\b|\s))(?:[\p{L}\p{M}\p{N}_]{1,60}))/ug
allow any language characters or characters with numbers or _.
numbers alone or numbers with _ are not allowed.
It's unicode regex, so if you are using Python, you may need to install regex.
to test it https://regex101.com/r/NLHUQh/1

Regex expression to capitalise first character

For the address field, I need first character of every word to be uppercase. I have been using /\b./g which has eventually resulted in a problem where first character after special characters such as !#*&;' and so on are also capitalised. ie. King'S Street instead of King's Street.
Is there a way to adjust that expression to exclude that behaviour or is changing the entire expression more optimal?
replace \b with (^|[ ])
Your regex will be: /(^|[ ])./g
Explanation:
\b by definition: is used to find a match at the beginning or end of a word.
(^|[ ]) will match with the beginning of the string or any space characters
(^|[ ]). will match every space followed by a character and the first character of the string.
Side note:
Use (^|\s) to match every blank spaces.
Your regex will be: /(^|\s)./g
You could use a lookahead:
\b[a-z](?=\w+)
See a demo on regex101.com.

How to extract the last word in a string with a JavaScript regex?

I need is the last match. In the case below the word test without the $ signs or any other special character:
Test String:
$this$ $is$ $a$ $test$
Regex:
\b(\w+)\b
The $ represents the end of the string, so...
\b(\w+)$
However, your test string seems to have dollar sign delimiters, so if those are always there, then you can use that instead of \b.
\$(\w+)\$$
var s = "$this$ $is$ $a$ $test$";
document.body.textContent = /\$(\w+)\$$/.exec(s)[1];
If there could be trailing spaces, then add \s* before the end.
\$(\w+)\$\s*$
And finally, if there could be other non-word stuff at the end, then use \W* instead.
\b(\w+)\W*$
In some cases a word may be proceeded by non-word characters, for example, take the following sentence:
Marvelous Marvin Hagler was a very talented boxer!
If we want to match the word boxer all previous answers will not suffice due the fact we have an exclamation mark character proceeding the word. In order for us to ensure a successful capture the following expression will suffice and in addition take into account extraneous whitespace, newlines and any non-word character.
[a-zA-Z]+?(?=\s*?[^\w]*?$)
https://regex101.com/r/D3bRHW/1
We are informing upon the following:
We are looking for letters only, either uppercase or lowercase.
We will expand only as necessary.
We leverage a positive lookahead.
We exclude any word boundary.
We expand that exclusion,
We assert end of line.
The benefit here are that we do not need to assert any flags or word boundaries, it will take into account non-word characters and we do not need to reach for negate.
var input = "$this$ $is$ $a$ $test$";
If you use var result = input.match("\b(\w+)\b") an array of all the matches will be returned next you can get it by using pop() on the result or by doing: result[result.length]
Your regex will find a word, and since regexes operate left to right it will find the first word.
A \w+ matches as many consecutive alphanumeric character as it can, but it must match at least 1.
A \b matches an alphanumeric character next to a non-alphanumeric character. In your case this matches the '$' characters.
What you need is to anchor your regex to the end of the input which is denoted in a regex by the $ character.
To support an input that may have more than just a '$' character at the end of the line, spaces or a period for instance, you can use \W+ which matches as many non-alphanumeric characters as it can:
\$(\w+)\W+$
Avoid regex - use .split and .pop the result. Use .replace to remove the special characters:
var match = str.split(' ').pop().replace(/[^\w\s]/gi, '');
DEMO

regular expression using javascript

I have a regular expression ^(?=.*?[A-Za-z])\S*$ which indicates that the input should contain alphabets and can contain special characters or digits along with the alphabets. But it is not allowing white spaces since i have used \S.
Can some one suggest me a reg exp which should contain alphabets and it can contain digits or special characters and white space but alphabets are must and the last character should not end with a white space
Quite simply:
^(?=.*?[A-Za-z]).*$
Note that in JavaScript . doesn't match new lines, and there is no dot-all flag (/s). You can use something like [\s\S] instead if that is an issue:
^(?=[\s\S]*?[A-Za-z])[\s\S]*$
Since you only have a single lookahead, you can simplify the pattern to:
^.*[A-Za-z].*$
Or, even simpler:
[A-Za-z]
[A-Za-z] will match if it finds a letter anywhere in the string, you don't really need to search the rest from start to end.
To also validate the last character isn't a whitespace, it is probably easiest to use the lookahead again (as it basically means AND in regular expressions:
^(?=.*?[A-Za-z]).*\S$

Categories

Resources