Javascript Regex to avoid same character twice consecutively

I need a regular expression to avoid having the same character (# is the character) twice consecutively, but it can appear at multiple places.
For example:
someword#someword is ok
someword##someword is not ok
someword#someword#someword is ok too.
So basically this is my existing regular expression: /^([a-zA-Z0-9'\-\x80-\xff\*\+ ]+),([a-zA-Z0-9'\-\x80-\xff\*\+\# ]+)$/ where the first group is the last name and the second group is the first name. I have introduced a magic character, #, in the first-name group, which I will replace with a space when saving. The problem is that I cannot allow consecutive # symbols.

Looks for any repeated characters (repeated once):
/(.)\1/.test(string) // returns true if repeated characters are found
Looks for a repeated #:
string.indexOf('##') !== -1 // returns true if ## is found
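A minimal sketch of how the ## check could be layered on top of the OP's existing pattern: keep the original regex for the allowed characters and reject the input separately if it contains ##. The helper name isValidName is hypothetical. The last two lines show the difference between the two checks: /(.)\1/ also flags ordinary doubled letters.

```javascript
// The OP's original pattern: last name, a comma, then a first name that may contain #.
const NAME_PATTERN = /^([a-zA-Z0-9'\-\x80-\xff\*\+ ]+),([a-zA-Z0-9'\-\x80-\xff\*\+\# ]+)$/;

// Hypothetical helper: valid if the pattern matches and "##" appears nowhere.
function isValidName(str) {
  return NAME_PATTERN.test(str) && !str.includes('##');
}

console.log(isValidName('Smith,some#word'));      // true
console.log(isValidName('Smith,some##word'));     // false
console.log(isValidName('Smith,some#word#word')); // true

// /(.)\1/ flags ANY doubled character, not just "#".
console.log(/(.)\1/.test('Anna'));  // true
console.log('Anna'.includes('##')); // false
```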

str.replace(/(#{2,})/g,'#')
works for any number of occurrences: ##, ###, ####, etc.
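Since the OP plans to turn each # into a space when saving, one possible save step collapses runs of # first, so that a stray ## cannot turn into a double space. This is a sketch; toStoredFirstName is a hypothetical name.

```javascript
// Hypothetical save step: collapse runs of "#" to one, then turn "#" into a space.
function toStoredFirstName(firstName) {
  return firstName
    .replace(/#{2,}/g, '#') // "##", "###", ... -> "#"
    .replace(/#/g, ' ');    // "#" -> " "
}

console.log(toStoredFirstName('Mary#Ann'));   // Mary Ann
console.log(toStoredFirstName('Mary###Ann')); // Mary Ann
```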

str.replace(/##/g,'#')
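finds and replaces all instances of '##' with '#'. Note that this handles non-overlapping pairs in a single pass, so longer runs are not fully collapsed: '###' becomes '##' and '####' becomes '##'; use /#{2,}/g for that. It doesn't replace single # signs or things that aren't # signs.
Edit: if you don't need to replace but just want to test for it: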
/##/.test(str)
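A quick comparison of the pairwise replace and the run-based replace on runs of two, three and four # characters. Because a global replace resumes scanning after each replacement, /##/g collapses each pair only once.

```javascript
const samples = ['a##b', 'a###b', 'a####b'];

const pairwise = samples.map((s) => s.replace(/##/g, '#'));  // non-overlapping pairs
const runs = samples.map((s) => s.replace(/#{2,}/g, '#'));   // whole runs

console.log(JSON.stringify(pairwise)); // ["a#b","a##b","a##b"]
console.log(JSON.stringify(runs));     // ["a#b","a#b","a#b"]

// For testing only, /##/ is enough: any run of two or more "#" contains "##".
console.log(/##/.test('a###b')); // true
```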

Try this regex
/^(?!.*(.)\1)[a-zA-Z][a-zA-Z\d#]*$/
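This regex will not allow # consecutively. Note that the (.)\1 lookahead rejects any character repeated consecutively, not only # (e.g. the nn in Anna).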
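If only ## should be forbidden, the literal can go into the lookahead instead. A sketch (an adaptation, not the answerer's code) comparing the two patterns on a few inputs:

```javascript
// Answer as written: rejects ANY character repeated twice in a row.
const anyRepeat = /^(?!.*(.)\1)[a-zA-Z][a-zA-Z\d#]*$/;
// Adapted variant: rejects only "##"; other doubled letters are allowed.
const hashRepeat = /^(?!.*##)[a-zA-Z][a-zA-Z\d#]*$/;

for (const s of ['some#word', 'some##word', 'Anna#Lee']) {
  console.log(s, anyRepeat.test(s), hashRepeat.test(s));
}
```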

I didn't find any pure regex solution that worked for your use case, so here is mine:
^([a-z0-9]#?)*[a-z0-9]$
([a-z0-9]#?)* - makes sure the start of the string is a mix of alphanumeric characters, each optionally followed by a single #; it has to start with an alphanumeric character
[a-z0-9]$ - makes sure your string ends with an alphanumeric character
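A quick check of this pattern against the OP's examples. The answer's class is lowercase only; this sketch adds the i flag (an assumption) so that capitalized names are accepted too.

```javascript
// i flag added (assumption) so "Someword" would also pass.
const pattern = /^([a-z0-9]#?)*[a-z0-9]$/i;

const inputs = [
  'someword#someword',          // ok
  'someword##someword',         // not ok: consecutive #
  'someword#someword#someword', // ok
  '#someword',                  // not ok: must start with an alphanumeric
  'someword#',                  // not ok: must end with an alphanumeric
];

for (const s of inputs) {
  console.log(s, pattern.test(s));
}
```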

Related

exclude full word with javascript regex word boundary

I'm looking to exclude matches that contain a specific word or phrase. For example, how could I match only lines 1 and 3? The \b word boundary does not work the way I intuitively expected.
foo.js # match
foo_test.js # do not match
foo.ts # match
fun_tset.js # match
fun_tset_test.ts # do not match
UPDATE
What I want to exclude is strings ending explicitly with _test before the extension. At first I had something like [^_test], but that also excludes any combination of those characters (like line 3).

Regex: ^(?!.*_test\.).*$
Working examples: https://regex101.com/r/HdGom7/1
Why it works: uses negative lookahead to check if _test. exists somewhere in the string, and if so doesn't match it.
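Applying the negative-lookahead pattern to the five sample lines (a quick check, not part of the original answer):

```javascript
const files = ['foo.js', 'foo_test.js', 'foo.ts', 'fun_tset.js', 'fun_tset_test.ts'];
// Reject any string containing "_test." anywhere.
const notTest = /^(?!.*_test\.).*$/;

const kept = files.filter((f) => notTest.test(f));
console.log(JSON.stringify(kept)); // ["foo.js","foo.ts","fun_tset.js"]
```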

Adding to @pretzelhammer's answer, it looks like you want to grab strings that are file names ending in ts or js:
^(?!.*_test)(.*\.[jt]s)$
The expression in the first parentheses is a negative lookahead that excludes any strings containing _test; the second group matches any string that ends in a period, followed by [jt] (j or t), followed by s. The trailing $ is what enforces that the string actually ends there.
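Without an end anchor, .*\.[jt]s only has to find .js or .ts somewhere in the string, so a name like foo.json also matches. A sketch comparing the pattern with and without a trailing $:

```javascript
const unanchored = /^(?!.*_test)(.*\.[jt]s)/;
const anchored = /^(?!.*_test)(.*\.[jt]s)$/;

for (const f of ['foo.js', 'foo_test.js', 'foo.json']) {
  console.log(f, unanchored.test(f), anchored.test(f));
}
```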

Regex consume a character if it matches, but not otherwise

I am trying to write a regex expression which will capture all instances of the '#' character, except when two such characters appear in succession (essentially, an escape sequence). For example:
abd#ajk: # should be matched
abd##ajk: No matches
abd###ajk: The final # should match.
abd####ajk: No matches
This almost works with the negative lookahead expression #(?!#), except that because the second # is not consumed, the last of two # symbols will still be matched. What I think I want to do is to lookahead for an # but consume the character if it is there; otherwise, do not consume it. Is this possible?
Edit: I'm using Javascript which unfortunately rules out several good approaches :(

In JavaScript, to split strings at an unescaped #, you may actually match chunks of text that consist of ## (an escaped #) and/or any chars other than #:
var strs = ['abd#ajk','abd##ajk','abd###ajk','abd####ajk'];
var rx = /(?:[^#]|##)+/g;
for (var s of strs) {
  console.log(s, "=>", s.match(rx));
}
The regex is
/(?:[^#]|##)+/g
Details
(?: - start of a non-capturing group that matches either of the 2 alternatives:
[^#] - any char other than #
| - or
## - 2 #s
)+ - repeat matching 1 or more times.
The g modifier finds all matching occurrences inside the input string.
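Another way to get the "consume it if it matches" behaviour the question asks for (a sketch, not from the answer above): put ## first in an alternation so escaped pairs are consumed as a unit, and capture only a lone #. The helper name unescapedHashIndices is hypothetical; String.prototype.matchAll needs ES2020.

```javascript
// Escaped pairs ("##") are consumed first; a lone "#" is captured in group 1.
function unescapedHashIndices(str) {
  return [...str.matchAll(/##|(#)/g)]
    .filter((m) => m[1] !== undefined) // keep only the lone "#" matches
    .map((m) => m.index);
}

for (const s of ['abd#ajk', 'abd##ajk', 'abd###ajk', 'abd####ajk']) {
  console.log(s, JSON.stringify(unescapedHashIndices(s)));
}
// abd#ajk [3]
// abd##ajk []
// abd###ajk [5]
// abd####ajk []
```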

Since you didn't tag a programming language in your question, here are my 2 cents for Java:
(?<=(?<!#)(?:##){0,999})#(?!#)
Java doesn't support unbounded lookbehinds, only bounded ones, so here I explicitly specified a maximum number of ## pairs before the #: 999.
JavaScript
At the time of writing, lookbehinds were not supported in JavaScript by many browsers (they were added to the language in ES2018). If you are trying to do this in JS, then this would be your working solution:
Method 1
((?:[^#]*(?:##)+[^#]*)+)|#
(?:[^#]*(?:##)+[^#]*)+ Match ## occurrences and all its leading / trailing characters
|# Or a single #
JS Code:
str.split(/((?:[^#]*(?:##)+[^#]*)+)|#/).filter(Boolean);
Method 2 (Recommended)
Or, if you don't mind using match(), this is much cleaner and, of course, faster:
(?:[^#]*(?:##)+[^#]*)+|[^#]+
JS Code:
console.log(
"aaaa#######bbb#aa###cccc##ddddd#".match(/(?:[^#]*(?:##)+[^#]*)+|[^#]+/g)
);
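In engines that support lookbehind (current versions of the major browsers and Node.js), a JavaScript lookbehind may be of unbounded length, so the Java pattern above can be used without its {0,999} bound. A sketch, assuming such an engine:

```javascript
// A "#" not followed by "#" and preceded by an even-length run of "#" (possibly empty).
const loneHash = /(?<=(?<!#)(?:##)*)#(?!#)/g;

function loneHashIndices(str) {
  return [...str.matchAll(loneHash)].map((m) => m.index);
}

for (const s of ['abd#ajk', 'abd##ajk', 'abd###ajk', 'abd####ajk']) {
  console.log(s, JSON.stringify(loneHashIndices(s)));
}
// abd#ajk [3]
// abd##ajk []
// abd###ajk [5]
// abd####ajk []
```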

Regex delimit the start of a string and the end

I've been having trouble with regex, which I don't understand at all.
I have a string '#anything#that#i#say' and want the regex to detect one word per #, so the result would be [#anything, #that, #i, #say].
It needs to work with spaces too :(
The closest I came is [#\w]+, but this gets only one match for the whole string and I want the words separated.

You're close; [#\w] will match anything that is either a # or a word character. But what you want is to match a single # followed by one or more word characters, like this: #\w+ (without the brackets).
var str = "#anything#that#i#say";
var regexp = /#\w+/gi;
console.log(str.match(regexp));
It's possible to have this deal with spaces as well, but I'd need to see an example of what you mean to tell you how; there are lots of ways that "need to work with spaces" can be interpreted, and I'd rather not guess.

Use the expression /#\s*(\w+)/g
\s* : checks for zero or more spaces between # and the word
This will match the 4 words in your string '#anything#that#i#say',
even if your string contains spaces between # and a word, as in '#anything# that#i# say'
sample to check: http://www.regextester.com/?fam=97638
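If only the words are wanted, without the # and any spaces, the capture group of /#\s*(\w+)/g can be pulled out with matchAll (ES2020). A sketch:

```javascript
const str = '#anything# that#i# say';

// matchAll exposes capture group 1, i.e. just the word.
const words = [...str.matchAll(/#\s*(\w+)/g)].map((m) => m[1]);
console.log(JSON.stringify(words)); // ["anything","that","i","say"]

// With the g flag, match() returns whole matches, so spaces after # are kept.
const whole = str.match(/#\s*(\w+)/g);
console.log(JSON.stringify(whole)); // ["#anything","# that","#i","# say"]
```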

regular expression to replace with ','

I have a RegExp; could anyone explain exactly what it does?
Regexp
b=b.replace(/(\d{1,3}(?=(?:\d\d\d)+(?!\d)))/g,"$1 ")
I think it is inserting a space (' ').
If I'm right, I want to insert a comma (,) instead of a space (' ').

To explain the regex, let's break it down:
( # Match and capture in group number 1:
\d{1,3} # one to three digits (as many as possible),
(?= # but only if it's possible to match the following afterwards:
(?: # A (non-capturing) group containing
\d\d\d # exactly three digits
)+ # once or more (so, three/six/nine/twelve/... digits)
(?!\d) # but only if there are no further digits ahead.
) # End of (?=...) lookahead assertion
) # End of capturing group
Actually, the outer parentheses are unnecessary if you use $& instead of $1 for the replacement string ($& contains the entire match).

The regex (\d{1,3}(?=(?:\d\d\d)+(?!\d))) matches any 1-3 digits (\d{1,3}) that are followed by a multiple of 3 digits ((?:\d\d\d)+) that isn't followed by another digit ((?!\d)). It replaces them with "$1 ". $1 is replaced by the first capture group. The space behind it is... a space.
See Regular expressions on MDN for more information about the different syntax elements.
If you want to separate the numbers with a comma instead of a space, you'll need to replace it with "$1," instead.
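A short sketch of the comma version, showing that the "$1," replacement and the "$&," variant (outer parentheses dropped, whole match reused) give the same result. The sample number is arbitrary.

```javascript
const input = '1234567';

// Original regex; "$1," inserts the capture group followed by a comma.
const withGroup = input.replace(/(\d{1,3}(?=(?:\d\d\d)+(?!\d)))/g, '$1,');
// Same regex without the outer group; "$&" is the whole match.
const withMatch = input.replace(/\d{1,3}(?=(?:\d\d\d)+(?!\d))/g, '$&,');

console.log(withGroup); // 1,234,567
console.log(withMatch); // 1,234,567
```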

Don't try to solve everything by using regular expressions.
Regular expressions are meant for matching, not for reformatting non-text data (such as numbers) that happens to be encoded as text.
If you want to format numbers differently, extract them and use format strings to reformat them; doing the formatting at the character-processing level is just an ugly hack.
It is okay to use regular expressions to find the numbers in the text, e.g. \d{4,}, but trying to do the actual formatting with a regexp is a crazy abuse.
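In the spirit of this answer, a sketch that uses the regex only to find the numbers and leaves the formatting to Intl.NumberFormat. The 'en-US' locale (comma as thousands separator) and the restriction to plain integers are assumptions; formatNumbersInText is a hypothetical name.

```javascript
// Assumed locale: en-US uses "," as the thousands separator.
const formatter = new Intl.NumberFormat('en-US');

// Hypothetical helper: find runs of 4+ digits and reformat each as a number.
// Assumes plain integers (no decimals, no leading zeros, below 2^53).
function formatNumbersInText(text) {
  return text.replace(/\d{4,}/g, (digits) => formatter.format(Number(digits)));
}

console.log(formatNumbersInText('Total: 1234567 items, 42 boxes'));
// Total: 1,234,567 items, 42 boxes
```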

Regular expression for excluding some characters with multiline matching

I want to ensure that the user input doesn't contain characters like <, > or &#, whether it is text input or textarea. My pattern:
var pattern = /^((?!&#|<|>).)*$/m;
The problem is that it still matches multiline strings from a textarea like
this text matches
though this should not, because of this character <
EDIT:
To be clearer: I only need to exclude the &# combination, not & or # on their own.
Please suggest a solution. Very grateful.

You're probably not looking for the m (multiline) flag but for the s (DOTALL) flag. Unfortunately, at the time of writing s didn't exist in JavaScript (it was added in ES2018).
The good news is that DOTALL can be simulated using [\s\S]. Try the following regex:
/^(?![\s\S]*?(&#|<|>))[\s\S]*$/
OR:
/^((?!&#|<|>)[\s\S])*$/
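In current engines, which support the s flag, the question's own pattern works with s in place of m. A sketch comparing the question's m version with the s version and the [\s\S] version on the sample textarea value:

```javascript
const withMultiline = /^((?!&#|<|>).)*$/m; // the question's pattern
const withDotAll = /^((?!&#|<|>).)*$/s;
const withClass = /^((?!&#|<|>)[\s\S])*$/;

const text = 'this text matches\nthough this should not, because of this character <';

console.log(withMultiline.test(text)); // true: the first line matches on its own
console.log(withDotAll.test(text));    // false
console.log(withClass.test(text));     // false
console.log(withDotAll.test('a & b # c')); // true: a lone & or # is allowed
```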

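I don't think you need a lookaround assertion in this case. Simply use a negated character class, without the m flag (with m, a single clean line is enough for a match):
var pattern = /^[^<>&#]*$/;
If you're also disallowing the characters -, [ and ], make sure to escape them or put them in the proper order. In JavaScript, [^] on its own means "any character", so the ] has to be escaped:
var pattern = /^[^\][<>&#-]*$/;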
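Because a negated character class such as [^<>&#] also matches newline characters, this approach covers a multi-line textarea value as long as the m flag is left off; with m, ^ and $ match at each line break. Note also that the class rejects a lone & or #, which the question's edit says should be allowed. A sketch:

```javascript
const text = 'this text matches\nthough this should not, because of this character <';

const perLine = /^[^<>&#]*$/m; // ^ and $ match at line breaks
const whole = /^[^<>&#]*$/;    // the entire value must be free of < > & #

console.log(perLine.test(text));   // true: the first line matches by itself
console.log(whole.test(text));     // false
console.log(whole.test('a & b'));  // false: a lone & is rejected too
```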

Alternate answer to specific question:
anubhava's solution works accurately, but is slow because it must perform a negative lookahead at each and every character position in the string. A simpler approach is to use reverse logic, i.e. instead of verifying that /^((?!&#|<|>)[\s\S])*$/ does match, verify that /[<>]|&#/ does NOT match. To illustrate this, let's create a function, hasSpecial(), which tests whether a string contains one of the special chars. Here are two versions; the first uses anubhava's second regex:
function hasSpecial_1(text) {
// If regex matches, then string does NOT contain special chars.
return /^((?!&#|<|>)[\s\S])*$/.test(text) ? false : true;
}
function hasSpecial_2(text) {
// If regex matches, then string contains (at least) one special char.
return /[<>]|&#/.test(text) ? true : false;
}
These two functions are functionally equivalent, but the second one is probably quite a bit faster.
Note that when I originally read this question, I misinterpreted it as wanting to exclude HTML special chars (including HTML entities). If that were the case, the following solution would do just that.
Test if a string contains HTML special Chars:
It appears that the OP wants to ensure a string does not contain any special HTML characters, including <, > as well as decimal and hex HTML entities such as &#160;, &#xA0;, etc. If this is the case, then the solution should probably also exclude the other (named) type of HTML entities such as &amp;, &lt;, etc. The solution below excludes all three forms of HTML entities as well as the <> tag delimiters.
Here are two approaches: (Note that both approaches do allow the sequence: &# if it is not part of a valid HTML entity.)
FALSE test using positive regex:
function hasHtmlSpecial_1(text) {
/* Commented regex:
# Match string having no special HTML chars.
^ # Anchor to start of string.
[^<>&]* # Zero or more non-[<>&] (normal*).
(?: # Unroll the loop. ((special normal*)*)
& # Allow a & but only if
(?! # not an HTML entity (3 valid types).
(?: # One from 3 types of HTML entities.
[a-z\d]+ # either a named entity,
| \#\d+ # or a decimal entity,
| \#x[a-f\d]+ # or a hex entity.
) # End group of HTML entity types.
; # All entities end with ";".
) # End negative lookahead.
[^<>&]* # More (normal*).
)* # End unroll the loop.
$ # Anchor to end of string.
*/
var re = /^[^<>&]*(?:&(?!(?:[a-z\d]+|#\d+|#x[a-f\d]+);)[^<>&]*)*$/i;
// If regex matches, then string does NOT contain HTML special chars.
return re.test(text) ? false : true;
}
Note that the above regex utilizes Jeffrey Friedl's "Unrolling-the-Loop" efficiency technique and will run very quickly for both matching and non-matching cases. (See his regex masterpiece: Mastering Regular Expressions (3rd Edition))
TRUE test using negative regex:
function hasHtmlSpecial_2(text) {
/* Commented regex:
# Match string having one special HTML char.
[<>] # Either a tag delimiter
| & # or a & if start of
(?: # one of 3 types of HTML entities.
[a-z\d]+ # either a named entity,
| \#\d+ # or a decimal entity,
| \#x[a-f\d]+ # or a hex entity.
) # End group of HTML entity types.
; # All entities end with ";".
*/
var re = /[<>]|&(?:[a-z\d]+|#\d+|#x[a-f\d]+);/i;
// If regex matches, then string contains (at least) one special HTML char.
return re.test(text) ? true : false;
}
Note also that I have included a commented version of each of these (non-trivial) regexes in the form of a JavaScript comment.

Develop Reference

JavaScript is the programming language of the Web.
