Regex for a valid hashtag - javascript

I need a regular expression for validating a hashtag. Each hashtag should start with a hash sign ("#").
Valid inputs:
1. #hashtag_abc
2. #simpleHashtag
3. #hashtag123
Invalid inputs:
1. #hashtag#
2. #hashtag#hashtag
I have been trying the regex /#[a-zA-z0-9]/, but it also accepts the invalid inputs.
Any suggestions for how to do it?

The current accepted answer fails in a few places:
It accepts hashtags that have no letters in them (i.e. "#11111", "#___" both pass).
It will exclude hashtags that are separated by spaces ("hey there #friend" fails to match "#friend").
It doesn't allow you to place a min/max length on the hashtag.
It doesn't offer a lot of flexibility if you decide to add other symbols/characters to your valid input list.
Try the following regex:
/(^|\B)#(?![0-9_]+\b)([a-zA-Z0-9_]{1,30})(\b|\r)/g
It'll close up the above edge cases, and furthermore:
You can change {1,30} to your desired min/max
You can add other symbols to the [0-9_] and [a-zA-Z0-9_] blocks if you wish to later
Here's a link to the demo.
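For what it's worth, here is a minimal sketch of how the pattern above could be used to pull hashtags out of free text. The helper name extractHashtags and the sample sentence are my own illustrative assumptions, not part of the original answer.

```javascript
// Pattern from the answer above; capture group 2 holds the tag without the "#".
const HASHTAG_RE = /(^|\B)#(?![0-9_]+\b)([a-zA-Z0-9_]{1,30})(\b|\r)/g;

// Hypothetical helper: returns the body of every hashtag found in `text`.
function extractHashtags(text) {
  // matchAll works on a copy of the regex, so the shared global regex is not left mid-scan.
  return Array.from(text.matchAll(HASHTAG_RE), (m) => m[2]);
}

console.log(extractHashtags("hey there #friend, skip #11111 and #___ but keep #ok_1"));
// logs [ 'friend', 'ok_1' ]
```

Tags made only of digits/underscores are skipped by the negative lookahead, while tags preceded by a space are still found.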

To answer the current question...
There are 2 issues:
[A-z] allows more than just letter chars (it also matches [, \, ], ^, _ and `)
There is no quantifier after the character class and it only matches 1 char
Since you are validating the whole string, you also need anchors (^ and $) to ensure a full string match:
/^#\w+$/
See the regex demo.
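A quick sketch to check the anchored pattern against the inputs from the question. The helper name isValidHashtag is just an assumption for illustration.

```javascript
// Whole-string validation: "#" followed by one or more word characters.
const VALID_HASHTAG_RE = /^#\w+$/;

// Hypothetical helper; the inputs below are the examples from the question.
function isValidHashtag(input) {
  return VALID_HASHTAG_RE.test(input);
}

for (const input of ["#hashtag_abc", "#simpleHashtag", "#hashtag123", "#hashtag#", "#hashtag#hashtag"]) {
  console.log(input, isValidHashtag(input)); // first three: true, last two: false
}

// The [A-z] range also covers the punctuation between "Z" and "a" in ASCII.
console.log(/^[A-z]$/.test("^")); // true
```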
If you want to extract specific valid hashtags from longer texts...
This is a bonus section as a lot of people seek to extract (not validate) hashtags, so here are a couple of solutions for you. Just mind that \w in JavaScript (and in a lot of other regex libraries) is equal to [a-zA-Z0-9_]:
#\w{1,30}\b - a # char followed by one to thirty word chars and then a word boundary
\B#\w{1,30}\b - a # char that is either at the start of the string or right after a non-word char, followed by one to thirty word (i.e. letter, digit, or underscore) chars and then a word boundary
\B#(?![\d_]+\b)(\w{1,30})\b - a # char that is either at the start of the string or right after a non-word char, followed by one to thirty word chars (that cannot be just digits/underscores) and then a word boundary
And last but not least, here is a Twitter hashtag regex from https://github.com/twitter/twitter-text/tree/master/js... Sorry, too long to paste in the SO post, here it is: https://gist.github.com/stribizhev/715ee1ee2dc1439ffd464d81d22f80d1.
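To make the difference between the three extraction patterns above concrete, here is a small sketch that runs them on the same text. The function name compareHashtagPatterns and the sample string are assumptions made up for this illustration.

```javascript
// Runs the three extraction patterns from the list above on the same text.
// The g flag makes match() return every full match (or null if there is none).
function compareHashtagPatterns(text) {
  return {
    anywhere: text.match(/#\w{1,30}\b/g),
    afterNonWord: text.match(/\B#\w{1,30}\b/g),
    notOnlyDigits: text.match(/\B#(?![\d_]+\b)(\w{1,30})\b/g),
  };
}

const result = compareHashtagPatterns("mail#box #123 #tag");
console.log(result.anywhere);      // [ '#box', '#123', '#tag' ]
console.log(result.afterNonWord);  // [ '#123', '#tag' ]
console.log(result.notOnlyDigits); // [ '#tag' ]
```

The \B drops the # glued to a preceding word ("mail#box"), and the lookahead additionally drops the digits-only tag.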

You could try this: /#[a-zA-Z0-9_]+/
This will only match letters, numbers & underscores.

A regex that matches any hashtag.
In this approach any character is accepted in hashtags except the main signs !@#$%^&*()
(?<=(\s|^))#[^\s\!\@\#\$\%\^\&\*\(\)]+(?=(\s|$))
Usage Notes
Turn on "g" and "m" flags when using!
It is tested for Java and JavaScript languages via https://regex101.com and VSCode tools.
It is available on this repo.

Unicode general categories can help with that task:
/^#[\p{L}\p{Nd}_]+$/gu
The \p{L} and \p{Nd} Unicode categories match any letter or decimal digit. You can add any other category your regex needs. The complete list of categories can be found here: https://unicode.org/reports/tr18/#General_Category_Property
Regex live demo:
https://regexr.com/5tvmo
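A hedged sketch of using that pattern for validation in JavaScript. I dropped the g flag here on purpose: a global regex remembers lastIndex between test() calls, which can make repeated validation give alternating results. The helper name isUnicodeHashtag is an assumption.

```javascript
// Same character class as above, with the u flag (required for \p{...}) and no g flag.
const UNICODE_HASHTAG_RE = /^#[\p{L}\p{Nd}_]+$/u;

// Hypothetical helper for whole-string validation.
function isUnicodeHashtag(input) {
  return UNICODE_HASHTAG_RE.test(input);
}

console.log(isUnicodeHashtag("#арбуз"));      // true
console.log(isUnicodeHashtag("#hashtag123")); // true
console.log(isUnicodeHashtag("#hashtag#"));   // false

// Why g is risky with test(): the second call starts at the stored lastIndex.
const statefulRe = /^#\w+$/g;
console.log(statefulRe.test("#abc"), statefulRe.test("#abc")); // true false
```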

A useful, tested regex for detecting hashtags in text:
/(^|\s)(#[a-zA-Z\d_]+)/ig
Examples of hashtags it matches:
#abc
#ab_c
#ABC
#aBC
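Note that the full match of this pattern includes the leading whitespace, so the hashtag itself lives in capture group 2. A minimal sketch, with findSpacedHashtags as an assumed helper name:

```javascript
const SPACED_HASHTAG_RE = /(^|\s)(#[a-zA-Z\d_]+)/ig;

// Hypothetical helper: group 1 is the leading whitespace (or empty at the start),
// group 2 is the hashtag itself.
function findSpacedHashtags(text) {
  return Array.from(text.matchAll(SPACED_HASHTAG_RE), (m) => m[2]);
}

console.log(findSpacedHashtags("#abc and#no #ab_c\n#ABC"));
// logs [ '#abc', '#ab_c', '#ABC' ]
```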

/\B(?:#|#)((?![\p{N}_]+(?:$|\b|\s))(?:[\p{L}\p{M}\p{N}_]{1,60}))/ug
It allows characters from any language, optionally combined with numbers or _.
Numbers alone, or numbers combined only with _, are not allowed.
It's a Unicode regex, so if you are using Python, you may need to install the regex package.
To test it: https://regex101.com/r/NLHUQh/1

Related

Regex creation to allow, disallow few characters

I am new to regex, and I have this use case:
Allow characters, numbers.
Zero or one question mark allowed. (? - valid, consecutive question marks are not allowed (??)).
test - valid
?test - valid
??test - invalid
?test?test - valid
???test - invalid
test??test - invalid
Exclude the $ sign.
[a-zA-Z0-9?] - seems this doesn't work
Thanks.
Try the following regular expression: ^(?!.*\?\?)[a-zA-Z0-9?]+$
First, a negative lookahead, (?!.*\?\?), rejects any string that contains two consecutive question marks anywhere (a negative lookahead does not consume characters).
Since question mark has special meaning in regular expressions (Quantifier — Matches between zero and one times), each question mark is escaped using backslash.
The plus sign at the end is a Quantifier — Matches between one and unlimited times, as many times as possible
You can test it here
Your description can be broken down into the regex:
^(\??[a-zA-Z0-9])+\??$
You say characters, and your description shows letters and numbers only, but \w (word characters) could be used instead; note that it also includes the underscore.
The pattern sits between ^ and $, meaning the whole field must match (no partial matches, although you can remove the anchors if you want those). The + means there must be at least one repetition (so the empty string won't match). The capturing group (\??[a-zA-Z0-9]) says each repetition is a letter or digit, optionally preceded by a single question mark, and the final \?? allows the string to end with a single question mark.
You probably don't want a capturing group here, so we can start the group with ?: to prevent capture, leading to:
^(?:\??[a-zA-Z0-9])+\??$
Matches
test
?test
?test?test
test?
Doesn't match
??test
???test
test??test
test??
<empty string>
?
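Here is a small sketch that runs both suggested patterns over the cases listed in this thread. The helper name classifyQuestionMarks is an assumption for illustration.

```javascript
// Lookahead-based pattern from the first answer.
const LOOKAHEAD_RE = /^(?!.*\?\?)[a-zA-Z0-9?]+$/;
// Group-based pattern from the second answer.
const GROUPED_RE = /^(?:\??[a-zA-Z0-9])+\??$/;

// Hypothetical helper: reports what each pattern says about one input.
function classifyQuestionMarks(input) {
  return { lookahead: LOOKAHEAD_RE.test(input), grouped: GROUPED_RE.test(input) };
}

const cases = ["test", "?test", "?test?test", "test?", "??test", "???test", "test??test", "test??", "", "?"];
for (const input of cases) {
  console.log(JSON.stringify(input), classifyQuestionMarks(input));
}
```

The two patterns agree on every case except a lone "?": the lookahead version accepts it, while the grouped version requires at least one letter or digit.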

JavaScript Regex to match 2 words and a whitespace character with length limitations

I currently have the regex ^[A-zÀ-ÿ ]{3,50}$ (or ^[A-zÀ-ÿ\s]{3,50}$), which matches 3 to 50 characters from this specific alphabet.
I need a new regex that accepts exactly one whitespace character (\s) while maintaining the {3,50} limit.
I tried ^[A-zÀ-ÿ]+\s[A-zÀ-ÿ]{3,50}$ but it applies the 3-50 limit to the last part only, not to the whole string.
Any help would be appreciated,
Thank you
Actually, to match ASCII letters, you need to use [A-Za-z], not [A-z] (see this SO thread).
As for the single obligatory whitespace, it can be added as in your attempt, and the length limitation can be added in the form of a lookahead:
/^(?=.{3,50}$)[A-Za-zÀ-ÿ]+\s[A-Za-zÀ-ÿ]+$/
  ^^^^^^^^^^^^
See the regex demo.
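A quick sketch of how the lookahead keeps the length limit on the whole string while the rest of the pattern enforces the single whitespace. The helper name isTwoWords and the sample inputs are assumptions.

```javascript
// (?=.{3,50}$) checks the total length; the rest requires word, one whitespace, word.
const TWO_WORDS_RE = /^(?=.{3,50}$)[A-Za-zÀ-ÿ]+\s[A-Za-zÀ-ÿ]+$/;

// Hypothetical helper for whole-string validation.
function isTwoWords(input) {
  return TWO_WORDS_RE.test(input);
}

console.log(isTwoWords("Ana Lima"));                 // true
console.log(isTwoWords("Jos\u00e9 Mu\u00f1oz"));     // true (accented letters are in À-ÿ)
console.log(isTwoWords("AnaLima"));                  // false, no whitespace
console.log(isTwoWords("Ana  Lima"));                // false, two whitespace chars
console.log(isTwoWords("a".repeat(25) + " " + "b".repeat(25))); // false, 51 chars
```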

Regex for matching HashTags in any language

I have a field in my application where users can enter a hashtag.
I want to validate their entry and make sure they enter what would be a proper HashTag.
It can be in any language and it should NOT be preceded by the # sign.
I am writing in JavaScript.
So the following are GOOD examples:
Abcde45454_fgfgfg (good because: only letters, numbers and _)
2014_is-the-year (good because: only letters, numbers, _ and -)
בר_רפאלי (good because: only letters and _)
арбуз (good because: only letters)
And the following are BAD examples:
Dan Brown (Bad because has a space)
OMG!!!!! (Bad because has !)
בר רפ#לי (Bad because has # and a space)
We had a regex that matched only a-zA-Z0-9, we needed to add language support so we changed it to ignore white spaces and forgot to ignore special characters, so here I am.
Some other StackOverflow examples I saw but didn't work for me:
Other languages don't work
Again, English only
[edit]
Added explanation why bad is bad and good is good
I don't want a preceding # character, but if I were to add a # at the beginning, it should be a valid hashtag
Basically I don't want to allow any special characters like !@#$%^&*()=+./,[{]};:'"?><
If your disallowed characters list is thorough (!@#$%^&*()=+./,[{]};:'"?><), then the regex is:
^#?[^\s!@#$%^&*()=+./,\[{\]};:'"?><]+$
Demo
This allows an optional leading # sign: #?. It disallows the special characters using a negative character class. I just added \s to the list (spaces), and also I escaped [ and ].
When this answer was written, JavaScript regexes did not support constructs like \p{P} (Unicode punctuation), so you basically had to blacklist characters or take a different approach if the regex solution wasn't good enough for your needs. Since ES2018, Unicode property escapes such as \p{P} and \p{L} are available when the u flag is set.
I don't understand why this question does not get more votes. Hashtag detection for multiple languages is a problem. The only working option I could find is posted by Lucas above (all other ones do not work so well).
It needs a modification though:
#[^\s!@#$%^&*()=+.\/,\[{\]};:'"?><]+
DEMO
This detects all the hashtags, not only at the beginning of the string, fixes an unescaped character, and removes the unnecessary $ at the end.
First, excluding every symbol is not a practical solution, because the available symbols depend on the keyboard layout and there are hundreds of math symbols and so on. So use this instead (in JavaScript, \p{...} requires the u flag):
[\p{sc=Bengali}\p{L}_\p{N}]+
1. If a language needs extra care, include its script explicitly, e.g. \p{sc=Bengali}. Bangla has dependent vowel signs such as া, ে and ৌ, which are not letters in the \p{L} sense, so Bangla has to be recognized separately via \p{sc=Bengali}. (Inside a character class, | is a literal pipe rather than alternation, so the classes are simply listed next to each other.)
2. Then use \p{L}, which matches a single code point in the category "letter": a-z as well as letters like é, ü, ğ, i, ç.
3. The underscore _ is allowed.
4. \p{N} matches any kind of numeric character in any script. (\d matches only a digit in [0-9], so \p{N} is the option to use for Unicode digits, because it works with any digit code point.)
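As a rough sketch of the whitelist idea applied to the good/bad examples from the question: the exact character class below (letters, combining marks, numbers, _ and -, with an optional leading #) is my assumption about what the asker wants, not a pattern taken from any of the answers, and the helper name isAnyLanguageTag is made up.

```javascript
// Assumed whitelist: letters, combining marks, numbers, "_" and "-", optional leading "#".
// The u flag is required for \p{...} property escapes.
const ANY_LANGUAGE_TAG_RE = /^#?[\p{L}\p{M}\p{N}_-]+$/u;

// Hypothetical helper for whole-string validation.
function isAnyLanguageTag(input) {
  return ANY_LANGUAGE_TAG_RE.test(input);
}

// Good examples from the question.
console.log(isAnyLanguageTag("Abcde45454_fgfgfg")); // true
console.log(isAnyLanguageTag("2014_is-the-year"));  // true
console.log(isAnyLanguageTag("בר_רפאלי"));          // true
console.log(isAnyLanguageTag("арбуз"));             // true

// Bad examples from the question.
console.log(isAnyLanguageTag("Dan Brown")); // false
console.log(isAnyLanguageTag("OMG!!!!!"));  // false
console.log(isAnyLanguageTag("בר רפ#לי"));  // false
```

Whitelisting by Unicode category avoids having to enumerate every disallowed symbol, at the cost of rejecting characters (emoji, for instance) that a blacklist would let through.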

Regex - how to ignore order of the matched groups? [duplicate]

This question already has answers here:
Password REGEX with min 6 chars, at least one letter and one number and may contain special characters
(10 answers)
Closed 2 years ago.
I'm trying to create a regex validation for a password which is meant to be:
6+ characters long
Has at least one a-z
Has at least one A-Z
Has at least one 0-9
So, in other words, the match will have :
at least one a-z, A-Z, 0-9
at least 3 any other characters
I've come up with:
((.*){3,}[a-z]{1,}[A-Z]{1,}[0-9]{1,})
it seems pretty simple and logical to me, but 2 things go wrong:
The quantifier {3,} on (.*) somehow doesn't work and breaks the whole regex. At first I had {6,} at the end, but then it affected the quantifiers of the inner groups, so it required [A-Z]{6,} instead of [A-Z]{1,}.
When I remove {3,} the regex works, but it matches only if the groups appear in order, so it matches aaBB11 but not BBaa11.
This is a use case where I wouldn't use a single regular expression, but multiple simpler ones.
Still, to answer your question: If you only want to validate that the password matches those criteria, you could use lookaheads:
^(?=.{6})(?=.*?[a-z])(?=.*?[A-Z])(?=.*?[0-9])
You're basically looking for a position from which you look at
6 characters (and maybe more to follow, doesn't matter): (?=.{6})
maybe something, then a lowercase letter: (?=.*?[a-z])
maybe something, then an uppercase letter: (?=.*?[A-Z])
maybe something, then a digit: (?=.*?[0-9])
The order of appearance is arbitrary due to the maybe something parts.
(Note that I've interpreted 6 characters long as at least 6 characters long.)
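A minimal sketch of both routes mentioned above, the single lookahead regex and the "multiple simpler ones" alternative. The helper names are assumptions, and the two are only meant to agree for ordinary single-line input (the dot does not match line breaks).

```javascript
// Single regex: four lookaheads evaluated from the start of the string.
const PASSWORD_RE = /^(?=.{6})(?=.*?[a-z])(?=.*?[A-Z])(?=.*?[0-9])/;

function passesSingleRegex(password) {
  return PASSWORD_RE.test(password);
}

// Alternative: one simple check per rule, which is easier to read and extend.
function passesSimpleChecks(password) {
  return (
    password.length >= 6 &&
    /[a-z]/.test(password) &&
    /[A-Z]/.test(password) &&
    /[0-9]/.test(password)
  );
}

for (const password of ["aaBB11", "BBaa11", "11aaBB", "aaBB1", "aabb11"]) {
  console.log(password, passesSingleRegex(password), passesSimpleChecks(password));
}
// the first three pass both checks regardless of order; the last two fail both
```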
I believe this is what you want:
^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])[!-~]{6,}$
If we follow your spec to the letter, your password validation regex looks like this:
^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9]).{6,}$
However, we need to improve on this, because apart from the number, lower-case and upper-case letter, are you really willing to accept any character? For instance, can the user use a character in the Thai language? A space character? A tab? Didn't think so. :)
If you want to allow all the printable ASCII characters apart from space, instead of a dot, we can use this character range: [!-~]
How does it work?
The ^ anchor makes sure we start the match at the start of the string
The (?=.*[a-z]) lookahead ensures we have a lower-case character
The (?=.*[A-Z]) lookahead ensures we have an upper-case character
The (?=.*[0-9]) lookahead ensures we have a digit
The [!-~]{6,} matches six or more printable ASCII characters that are not space.
The $ ensures we have reached the end of the string (otherwise, the password could contain more characters that are not allowed).
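To see what the [!-~] restriction changes in practice, here is a short sketch comparing the two patterns from this answer. The helper name comparePasswordPatterns and the sample passwords are assumptions.

```javascript
// "To the letter" version: any character is allowed.
const ANY_CHAR_RE = /^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9]).{6,}$/;
// Restricted version: only printable ASCII characters other than space.
const PRINTABLE_ASCII_RE = /^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])[!-~]{6,}$/;

// Hypothetical helper: reports what each pattern says about one password.
function comparePasswordPatterns(password) {
  return { anyChar: ANY_CHAR_RE.test(password), printableAscii: PRINTABLE_ASCII_RE.test(password) };
}

console.log(comparePasswordPatterns("aB1xyz"));                  // both true
console.log(comparePasswordPatterns("aB1 xyz"));                 // anyChar true, printableAscii false (space)
console.log(comparePasswordPatterns("aB1\u0E44\u0E17\u0E22"));   // anyChar true, printableAscii false (Thai letters)
```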
You could use this pattern: ^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9]).{6,}

Regex to match card code input

How can I write a regex to match strings following these rules?
1 letter followed by 4 letters or numbers, then
5 letters or numbers, then
3 letters or numbers followed by a number and one of the following signs: ! & # ?
I need to allow input as a 15-character string or as 3 groups of 5 chars separated by one space.
I'm implementing this in JavaScript.
I'm not going to write out the whole regex for you since this is homework, but here are some hints which should help you out:
Use character classes. [A-Z] matches all uppercase. [a-z] matches all lowercase. [0-9] matches numbers. You can combine them like so [A-Za-z0-9].
Use quantifiers like {n} so [A-Z]{3} gives you 3 uppercase letters.
You can put other characters in character classes. Let's say you wanted to match % or @ or #, you could do [%@#] which would match any of those characters.
Some meta-characters (characters which have special meaning in the context of regular expressions) will need to be escaped like so: \$ (since $ matches the end of a line)
^ and $ match the beginning and end of the line respectively.
\s matches white-space, but if you sanitize your input, you shouldn't need to use this.
Flags after the regex do special things. For example in /[a-z]/i, the i ignores case.
This should be it:
/^[a-z][a-z0-9]{4} ?[a-z0-9]{5} ?[a-z0-9]{3}[0-9][!&#?]$/i
Feel free to change 0-9 and [0-9] with \d if you see fit.
The regex is simple and readable enough. ^ and $ make sure this is a whole match, so there aren't extra characters before or after the code, and the /i flag allows upper or lower case letters.
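One thing to be aware of: with two independent " ?" parts, the pattern above also accepts input with only one of the two spaces. If that matters, a possible variation (my own assumption, not part of the answer) is to capture the first separator and reuse it with a backreference. The helper name checkCardCode and the sample codes are made up for illustration.

```javascript
// Pattern from the answer above: each space is independently optional.
const CARD_CODE_RE = /^[a-z][a-z0-9]{4} ?[a-z0-9]{5} ?[a-z0-9]{3}[0-9][!&#?]$/i;
// Assumed stricter variant: ( ?) captures the first separator and \1 forces the second to be identical.
const CARD_CODE_STRICT_RE = /^[a-z][a-z0-9]{4}( ?)[a-z0-9]{5}\1[a-z0-9]{3}[0-9][!&#?]$/i;

// Hypothetical helper: reports what each pattern says about one input.
function checkCardCode(input) {
  return { loose: CARD_CODE_RE.test(input), strict: CARD_CODE_STRICT_RE.test(input) };
}

console.log(checkCardCode("A1234B5678CDE9!"));   // { loose: true, strict: true }
console.log(checkCardCode("A1234 B5678 CDE9!")); // { loose: true, strict: true }
console.log(checkCardCode("A1234 B5678CDE9!"));  // { loose: true, strict: false }
console.log(checkCardCode("A1234B5678CDEF!"));   // { loose: false, strict: false }
```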
I would start with a tutorial.
Pay attention to the quantifiers (like {N}) and character classes (like [a-zA-Z])
^[a-zA-Z][a-zA-Z0-9]{4} ?[a-zA-Z0-9]{5} ?[a-zA-Z0-9]{3}[0-9][\!\&\#\?]$
