Javascript Regex to avoid same character twice consecutively - javascript

I need a regular expression that rejects the same character (# in my case) appearing twice consecutively, while still allowing it at multiple places.
For example:
someword#someword is ok
someword##someword is not ok
someword#someword#someword is ok too.
So basically this is my existing regular expression /^([a-zA-Z0-9'\-\x80-\xff\*\+ ]+),([a-zA-Z0-9'\-\x80-\xff\*\+\# ]+)$/ where the first group is the last name and second group is the first name. I have introduced a magical character # in the first name group which I will replace with a space when saving. The problem is I cannot have consecutive # symbols.

Looks for any repeated characters (repeated once):
/(.)\1/.test(string) // returns true if repeated characters are found
Looks for a repeated #:
string.indexOf('##') !== -1 // returns true if ## is found

str.replace(/(#{2,})/g,'#')
This works for any number of occurrences: ##, ###, ####, etc.

str.replace(/##/g,'#')
finds and replaces all non-overlapping instances of '##' with '#'. Runs of more than 2 consecutive '#' signs are only shortened, not collapsed ('####' becomes '##'); use /#{2,}/g if every run should become a single '#'. Doesn't replace single # signs or things that aren't # signs.
Edit: if you don't have to replace but just want to test for it:
/##/.test(str)
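To compare how the two replacement patterns above behave on longer runs, here is a small sketch. The helper names replacePairs and collapseRuns are made up for illustration and are not part of either answer.

```javascript
// Hypothetical helpers, named here only for illustration.
function replacePairs(str) {
  // Replaces each non-overlapping "##" with a single "#".
  return str.replace(/##/g, '#');
}

function collapseRuns(str) {
  // Replaces every run of two or more "#" with a single "#".
  return str.replace(/#{2,}/g, '#');
}

console.log(replacePairs('a##b'));   // "a#b"
console.log(replacePairs('a####b')); // "a##b" (the run is halved, not collapsed)
console.log(collapseRuns('a####b')); // "a#b"
console.log(/##/.test('a#b#c'));     // false, no consecutive "#"
```

If the goal is validation rather than cleanup, the plain /##/.test(str) check is probably all that is needed.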

Try this regex
/^(?!.*(.)\1)[a-zA-Z][a-zA-Z\d#]*$/
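This regex rejects any character that appears twice consecutively, not only #. To reject only consecutive # characters, the lookahead can be narrowed to (?!.*##).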
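As a sketch of how a lookahead could be combined with the pattern from the question, the version below inserts (?!.*##) in front of the first-name group. This is one possible arrangement, not the asker's final code; isValidName and firstNameForSaving are hypothetical names.

```javascript
// Assumption: the asker's original pattern with a (?!.*##) lookahead added
// right after the comma, so it only inspects the first-name part.
var namePattern = /^([a-zA-Z0-9'\-\x80-\xff*+ ]+),(?!.*##)([a-zA-Z0-9'\-\x80-\xff*+# ]+)$/;

// Hypothetical helper: true when the input is "lastname,firstname"
// and the first name never contains two # in a row.
function isValidName(input) {
  return namePattern.test(input);
}

// Hypothetical helper: returns the first name with # turned into spaces,
// or null when the input does not validate.
function firstNameForSaving(input) {
  var m = namePattern.exec(input);
  return m ? m[2].replace(/#/g, ' ') : null;
}

console.log(isValidName('Smith,John#Paul'));        // true
console.log(isValidName('Smith,John##Paul'));       // false
console.log(firstNameForSaving('Smith,John#Paul')); // "John Paul"
```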

I didn't find any pure regex solution that worked for your use-case
So here is mine:
^([a-z0-9]#?)*[a-z0-9]$
 |____________||______|
        |          |-> Makes sure your string ends with an alphanumeric character
        |-> Makes sure the start of the string is a mix of single # or alphanumeric characters
            It has to start with an alphanumeric character
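A quick way to check this pattern against the examples from the question is sketched below. Note that, as written, it only accepts lowercase letters and digits; the i flag or a wider character class would be needed for anything else. The function name hasOnlySingleHashes is an assumption made for this example.

```javascript
var singleHashPattern = /^([a-z0-9]#?)*[a-z0-9]$/;

// Hypothetical helper wrapping the pattern above.
function hasOnlySingleHashes(str) {
  return singleHashPattern.test(str);
}

console.log(hasOnlySingleHashes('someword#someword'));          // true
console.log(hasOnlySingleHashes('someword##someword'));         // false
console.log(hasOnlySingleHashes('someword#someword#someword')); // true
console.log(hasOnlySingleHashes('#someword'));                  // false, starts with #
console.log(hasOnlySingleHashes('someword#'));                  // false, ends with #
```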

Related

exclude full word with javascript regex word boundary

I'm looking to exclude matches that contain a specific word or phrase. For example, how could I match only lines 1 and 3? The \b word boundary does not work intuitively like I expected.
foo.js # match
foo_test.js # do not match
foo.ts # match
fun_tset.js # match
fun_tset_test.ts # do not match
UPDATE
What I want to exclude is strings ending explicitly with _test before the extension. At first I had something like [^_test], but that also excludes any combination of those characters (like line 3).
Regex: ^(?!.*_test\.).*$
Working examples: https://regex101.com/r/HdGom7/1
Why it works: uses negative lookahead to check if _test. exists somewhere in the string, and if so doesn't match it.
Adding to @pretzelhammer's answer, it looks like you want to grab strings that are file names ending in ts or js:
^(?!.*_test)(.*\.[jt]s)
The expression in the first parentheses is a negative lookahead that excludes any strings with _test, the second parentheses matches any strings that end in a period, followed by [jt] (j or t), followed by s.
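To try the negative lookahead against the file names from the question, a minimal sketch follows. The variable names are illustrative only.

```javascript
// Pattern from the first answer: reject anything containing "_test."
var notTestFile = /^(?!.*_test\.).*$/;

var files = ['foo.js', 'foo_test.js', 'foo.ts', 'fun_tset.js', 'fun_tset_test.ts'];

// Keep only the names the pattern accepts.
var kept = files.filter(function (name) {
  return notTestFile.test(name);
});

console.log(kept); // ["foo.js", "foo.ts", "fun_tset.js"]
```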

Regex consume a character if it matches, but not otherwise

I am trying to write a regex expression which will capture all instances of the '#' character, except when two such characters appear in succession (essentially, an escape sequence). For example:
abd#ajk: # should be matched
abd##ajk: No matches
abd###ajk: The final # should match.
abd####ajk: No matches
This almost works with the negative lookahead expression #(?!#), except that because the second # is not consumed, the last of two # symbols will still be matched. What I think I want to do is to lookahead for an # but consume the character if it is there; otherwise, do not consume it. Is this possible?
Edit: I'm using Javascript which unfortunately rules out several good approaches :(
In JavaScript, to split strings at an unescaped #, you can instead match chunks of text consisting of either ## (an escaped #) or any chars other than #:
var strs = ['abd#ajk','abd##ajk','abd###ajk','abd####ajk'];
var rx = /(?:[^#]|##)+/g;
for (var s of strs) {
console.log(s, "=>", s.match(rx))
}
The regex is
/(?:[^#]|##)+/g
Details
(?: - start of a non-capturing group that matches either of the 2 alternatives:
[^#] - any char other than #
| - or
## - 2 #s
)+ - end of the group, repeated 1 or more times.
The g modifier finds all matching occurrences inside the input string.
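If the positions of the unescaped # characters are needed rather than the chunks between them, the same "consume the pair first" idea can be sketched with an alternation that tries ## before #. This is an alternative to the pattern above, not the answerer's code, and unescapedHashIndexes is a hypothetical name.

```javascript
// Hypothetical helper: returns the indexes of every unescaped "#".
function unescapedHashIndexes(str) {
  // "##" is tried first, so an escaped pair is always consumed as a whole.
  var rx = /##|#/g;
  var indexes = [];
  var m;
  while ((m = rx.exec(str)) !== null) {
    // Only a lone "#" counts; "##" matches are skipped.
    if (m[0] === '#') {
      indexes.push(m.index);
    }
  }
  return indexes;
}

console.log(unescapedHashIndexes('abd#ajk'));    // [3]
console.log(unescapedHashIndexes('abd##ajk'));   // []
console.log(unescapedHashIndexes('abd###ajk'));  // [5]
console.log(unescapedHashIndexes('abd####ajk')); // []
```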
Since you didn't tag a programming language to your question here is my 2 cents for Java:
(?<=(?<!#)(?:##){0,999})#(?!#)
Java doesn't support unbounded lookbehinds, only bounded ones, so here I explicitly set the maximum number of ## pairs to 999.
JavaScript
Lookbehinds are not yet implemented in many JavaScript engines and browsers. If you are trying to do this in JS, this would be a working solution:
Method 1
((?:[^#]*(?:##)+[^#]*)+)|#
(?:[^#]*(?:##)+[^#]*)+ Match ## occurrences and all its leading / trailing characters
|# Or a single #
JS Code:
str.split(/((?:[^#]*(?:##)+[^#]*)+)|#/).filter(Boolean);
Method 2 (Recommended)
Or if you don't have a problem with using match(), this is much cleaner and of course faster:
(?:[^#]*(?:##)+[^#]*)+|[^#]+
JS Code:
console.log(
"aaaa#######bbb#aa###cccc##ddddd#".match(/(?:[^#]*(?:##)+[^#]*)+|[^#]+/g)
);

Regex delimit the start of a string and the end

I've been having trouble with regex, which I don't understand at all.
I have a string '#anything#that#i#say' and want the regex to detect one word per #, so the result will be [#anything, #that, #i, #say].
It needs to work with spaces too :(
The closest I came is [#\w]+, but this only gets one word and I want them separated.
You're close; [#\w] will match anything that is either a # or a word character. But what you want is to match a single # followed by any number of word characters, like this: #\w+ without the brackets
var str = "#anything#that#i#say";
var regexp = /#\w+/gi;
console.log(str.match(regexp));
It's possible to have this deal with spaces as well, but I'd need to see an example of what you mean to tell you how; there are lots of ways that "need to work with spaces" can be interpreted, and I'd rather not guess.
Use the expression /#\s*(\w+)/g
\s* allows zero or more spaces between the # and the word.
This will match 4 words in your string '#anything#that#i#say',
even if your string contains spaces, as in '#anything# that#i# say'.
sample to check: http://www.regextester.com/?fam=97638
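To pull out just the captured words with that expression, a loop over exec() is one option. The sketch below assumes a helper named extractTags, which is not part of the answer.

```javascript
// Hypothetical helper: collects the word captured after each "#".
function extractTags(str) {
  var rx = /#\s*(\w+)/g;
  var words = [];
  var m;
  while ((m = rx.exec(str)) !== null) {
    words.push(m[1]); // group 1 is the word without "#" or spaces
  }
  return words;
}

console.log(extractTags('#anything#that#i#say'));   // ["anything", "that", "i", "say"]
console.log(extractTags('#anything# that#i# say')); // ["anything", "that", "i", "say"]
```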

regular expression to replace with ','

I have one RegExp, could anyone explain exactly what it does?
Regexp
b=b.replace(/(\d{1,3}(?=(?:\d\d\d)+(?!\d)))/g,"$1 ")
I think it is replacing with a space (' ').
If I'm right, I want to replace it with a comma (,) instead of a space.
To explain the regex, let's break it down:
( # Match and capture in group number 1:
\d{1,3} # one to three digits (as many as possible),
(?= # but only if it's possible to match the following afterwards:
(?: # A (non-capturing) group containing
\d\d\d # exactly three digits
)+ # once or more (so, three/six/nine/twelve/... digits)
(?!\d) # but only if there are no further digits ahead.
) # End of (?=...) lookahead assertion
) # End of capturing group
Actually, the outer parentheses are unnecessary if you use $& instead of $1 for the replacement string ($& contains the entire match).
The regex (\d{1,3}(?=(?:\d\d\d)+(?!\d))) matches any 1-3 digits (\d{1,3}) that are followed by a multiple of 3 digits ((?:\d\d\d)+) that isn't followed by another digit ((?!\d)). It replaces the match with "$1 ". $1 is replaced by the first capture group. The space behind it is... a space.
See the regular expressions documentation on MDN for more information about the different syntaxes.
If you want to separate the numbers with a comma instead of a space, you'll need to replace it with "$1," instead.
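Putting the two suggestions together (dropping the outer parentheses and using a comma), a minimal sketch could look like this. The function name groupThousands is made up for the example.

```javascript
// Hypothetical helper: inserts a comma after each digit group that is
// followed by a multiple of three digits and then a non-digit (or the end).
function groupThousands(text) {
  // "$&" stands for the whole match, so no capturing group is needed.
  return text.replace(/\d{1,3}(?=(?:\d\d\d)+(?!\d))/g, '$&,');
}

console.log(groupThousands('1234567'));  // "1,234,567"
console.log(groupThousands('999'));      // "999"
console.log(groupThousands('x 1000 y')); // "x 1,000 y"
```

One caveat of this pattern: it treats every digit run the same way, so the fractional part of a number such as 0.12345 would also receive a comma.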
Don't try to solve everything by using regular expressions.
Regular expressions are meant for matching, not to fix non-text-encoded-as-text formatting.
If you want to format numbers differently, extract them and use format strings to reformat them on a character processing level. Doing the formatting with a regex replacement is just an ugly hack.
It is okay to use regular expressions to find the numbers in the text, e.g. \d{4,} but trying to do the actual formatting with regexp is a crazy abuse.
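One way to follow that advice in JavaScript is to let the regex only find the digit runs and hand the formatting to Number.prototype.toLocaleString. This is a sketch under the assumption that the 'en-US' locale gives the desired comma grouping; formatNumbersIn is a hypothetical name.

```javascript
// Hypothetical helper: the regex only locates the numbers,
// the built-in number formatter does the grouping.
function formatNumbersIn(text) {
  return text.replace(/\d{4,}/g, function (digits) {
    // Assumption: en-US style grouping (commas) is what is wanted.
    return Number(digits).toLocaleString('en-US');
  });
}

console.log(formatNumbersIn('Total 1234567 items')); // "Total 1,234,567 items"
```

Converting through Number drops leading zeros and loses precision above Number.MAX_SAFE_INTEGER, so this only suits ordinary quantities.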

Regular expression for excluding some characters with multiline matching

I want to ensure that the user input doesn't contain characters like <, > or &#, whether it is text input or textarea. My pattern:
var pattern = /^((?!&#|<|>).)*$/m;
The problem is, that it still matches multiline strings from a textarea like
this text matches
though this should not, because of this character <
EDIT:
To be more clear, I need exclude &# combination only, not & or #.
Please suggest the solution. Very grateful.
You're probably not looking for the m (multiline) switch but the s (DOTALL) switch. Unfortunately, s doesn't exist in JavaScript engines that predate ES2018.
The good news is that DOTALL can be simulated using [\s\S]. Try the following regex:
/^(?![\s\S]*?(&#|<|>))[\s\S]*$/
OR:
/^((?!&#|<|>)[\s\S])*$/
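A small check of the first pattern against multi-line input, including the textarea example from the question, could look like the sketch below. The name isSafeInput is an assumption for this example.

```javascript
// First pattern from the answer: no m flag, [\s\S] also matches newlines.
var safePattern = /^(?![\s\S]*?(&#|<|>))[\s\S]*$/;

// Hypothetical helper: true when the text contains none of &#, < or >.
function isSafeInput(text) {
  return safePattern.test(text);
}

var textarea = 'this text matches\nthough this should not, because of this character <';

console.log(isSafeInput(textarea));     // false, "<" on the second line is caught
console.log(isSafeInput('a & b\n# c')); // true, & and # alone are allowed
console.log(isSafeInput('x &#38; y'));  // false, contains "&#"
```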
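I don't think you need a lookaround assertion in this case. Simply use a negated character class, without the m flag: with m, the anchors also match at line breaks, so a single clean line is enough for the pattern to match. Note that this rejects any & or # on its own, not just the &# combination.
var pattern = /^[^<>&#]*$/;
If you're also disallowing the characters -, [ and ], make sure to escape them or put them in proper order. In JavaScript, ] must be escaped even when it comes first, because [^] on its own matches any character:
var pattern = /^[^\][<>&#-]*$/;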
Alternate answer to specific question:
anubhava's solution works accurately, but is slow because it must perform a negative lookahead at each and every character position in the string. A simpler approach is to use reverse logic, i.e. instead of verifying that /^((?!&#|<|>)[\s\S])*$/ does match, verify that /[<>]|&#/ does NOT match. To illustrate this, let's create a function, hasSpecial(), which tests whether a string contains one of the special chars. Here are two versions; the first uses anubhava's second regex:
function hasSpecial_1(text) {
    // If regex matches, then string does NOT contain special chars.
    return !/^((?!&#|<|>)[\s\S])*$/.test(text);
}
function hasSpecial_2(text) {
    // If regex matches, then string contains (at least) one special char.
    return /[<>]|&#/.test(text);
}
These two functions are functionally equivalent, but the second one is probably quite a bit faster.
Note that when I originally read this question, I misinterpreted it to really want to exclude HTML special chars (including HTML entities). If that were the case, then the following solution will do just that.
Test if a string contains HTML special Chars:
It appears that the OP wants to ensure a string does not contain any special HTML characters, including <, >, as well as decimal and hex HTML entities such as &#160;, &#xA0;, etc. If this is the case, then the solution should probably also exclude the other (named) type of HTML entities such as &amp;, &lt;, etc. The solution below excludes all three forms of HTML entities as well as the <> tag delimiters.
Here are two approaches: (Note that both approaches do allow the sequence: &# if it is not part of a valid HTML entity.)
FALSE test using positive regex:
function hasHtmlSpecial_1(text) {
/* Commented regex:
# Match string having no special HTML chars.
^ # Anchor to start of string.
[^<>&]* # Zero or more non-[<>&] (normal*).
(?: # Unroll the loop. ((special normal*)*)
& # Allow a & but only if
(?! # not an HTML entity (3 valid types).
(?: # One from 3 types of HTML entities.
[a-z\d]+ # either a named entity,
| \#\d+ # or a decimal entity,
| \#x[a-f\d]+ # or a hex entity.
) # End group of HTML entity types.
; # All entities end with ";".
) # End negative lookahead.
[^<>&]* # More (normal*).
)* # End unroll the loop.
$ # Anchor to end of string.
*/
var re = /^[^<>&]*(?:&(?!(?:[a-z\d]+|#\d+|#x[a-f\d]+);)[^<>&]*)*$/i;
// If regex matches, then string does NOT contain HTML special chars.
return re.test(text) ? false : true;
}
Note that the above regex utilizes Jeffrey Friedl's "Unrolling-the-Loop" efficiency technique and will run very quickly for both matching and non-matching cases. (See his regex masterpiece: Mastering Regular Expressions (3rd Edition))
TRUE test using negative regex:
function hasHtmlSpecial_2(text) {
/* Commented regex:
# Match string having one special HTML char.
[<>] # Either a tag delimiter
| & # or a & if start of
(?: # one of 3 types of HTML entities.
[a-z\d]+ # either a named entity,
| \#\d+ # or a decimal entity,
| \#x[a-f\d]+ # or a hex entity.
) # End group of HTML entity types.
; # All entities end with ";".
*/
var re = /[<>]|&(?:[a-z\d]+|#\d+|#x[a-f\d]+);/i;
// If regex matches, then string contains (at least) one special HTML char.
return re.test(text) ? true : false;
}
Note also that I have included a commented version of each of these (non-trivial) regexes in the form of a JavaScript comment.
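For a quick sanity check of the second (negative) regex on a few inputs, something like the following sketch can be used. It repeats the regex under a different, hypothetical name, containsHtmlSpecial, so that the snippet runs on its own.

```javascript
// Same regex as in hasHtmlSpecial_2, repeated so this snippet is self-contained.
function containsHtmlSpecial(text) {
  return /[<>]|&(?:[a-z\d]+|#\d+|#x[a-f\d]+);/i.test(text);
}

console.log(containsHtmlSpecial('Tom & Jerry')); // false, a bare & is allowed
console.log(containsHtmlSpecial('a &# b'));      // false, &# without a full entity
console.log(containsHtmlSpecial('&amp;'));       // true, named entity
console.log(containsHtmlSpecial('x &#160; y'));  // true, decimal entity
console.log(containsHtmlSpecial('&#xA0;'));      // true, hex entity
console.log(containsHtmlSpecial('1 < 2'));       // true, tag delimiter
```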
