Removing Special Character in JavaScript - javascript

string = string.replace(/[^a-zA-Z0-9]/g, '');
Can anyone explain this snippet, (/[^a-zA-Z0-9]/g, '')?
What does it mean?
I know what it does, but how does it work?

The snippet
string = string.replace(/[^a-zA-Z0-9]/g, '');
uses a regex (regular expression) to find a pattern and then replace it.
This is how it works:
The .replace function replaces a specific character or pattern in a string (first argument) with another string (second argument).
The /[^a-zA-Z0-9]/g is the regex. a-z means lowercase characters from a to z, A-Z means uppercase characters from A to Z, and 0-9 means digit characters from 0 to 9. The ^ at the start of the brackets means "not", so the class matches anything that is not a lowercase letter, uppercase letter, or digit. The g at the end stands for global, meaning it will find not just the first match for the pattern but all the matches.
The empty string is what to replace the pattern with. So any character that is not an uppercase/lowercase letter or a digit is replaced with nothing.
Read about RegEx here
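As a minimal sketch of the explanation above, the same replace call can be wrapped in a function and tried on a sample string. The function name stripSpecialChars and the sample inputs are made up for illustration.

```javascript
// Hypothetical helper: remove everything that is not a letter or a digit.
function stripSpecialChars(input) {
  // [^a-zA-Z0-9] matches any character outside the listed ranges;
  // the g flag makes replace() act on every match, not just the first.
  return input.replace(/[^a-zA-Z0-9]/g, '');
}

console.log(stripSpecialChars('Hello, World! 123')); // HelloWorld123

// Without the g flag only the first non-alphanumeric character is removed.
console.log('a-b-c'.replace(/[^a-zA-Z0-9]/, '')); // ab-c
```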

Related

Regular expression to validate a string starts with a char and contains allowed chars

I need help with my regular expression written in javascript.
I have tried using an online regular expression generator, and the best I can come up with is this:
^[a-z.-]{0,50}$
The expression must validate the following
String first char MUST start with a-z (no alpha)
String can contain any char in range a-z (no alpha), 0-9 and the characters dash "-" and dot "."
String can be of max length 50 chars
Examples of success strings
username1
username.lastname
username-anotherstring1
this.is.also.ok
No good strings
1badusername
.verbad
-bad
also very bad has spaces
// Thanks
Almost (assuming "no alpha" means no uppercase letters)
https://regex101.com/r/O9hvLP/3
^[a-z]{1}[a-z0-9\.-]{0,49}$
The {1} is optional, I put it there for descriptive reasons
I think this should cover what you want
^[a-z][a-z0-9.-]{0,49}$
That is, it starts with a-z and is then followed by 0 to 49 characters from a-z, 0-9, . or -
Live example: https://regexr.com/5k8eu
Edit: Not sure if you intended to allow both upper and lowercase, but if you did, both character classes could add A-Z as well!
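A quick way to check the pattern against the examples from the question is sketched below. The helper name isValidUsername is hypothetical, and lowercase-only input is assumed, as in the answers above.

```javascript
// Pattern from the answers above: one lowercase letter, then up to 49 more
// characters drawn from a-z, 0-9, dot, or dash (50 characters in total).
const usernamePattern = /^[a-z][a-z0-9.-]{0,49}$/;

// Hypothetical helper name, used only for this sketch.
function isValidUsername(name) {
  return usernamePattern.test(name);
}

const good = ['username1', 'username.lastname', 'username-anotherstring1', 'this.is.also.ok'];
const bad = ['1badusername', '.verbad', '-bad', 'also very bad has spaces'];

console.log(good.every(isValidUsername)); // true
console.log(bad.some(isValidUsername)); // false
console.log(isValidUsername('a'.repeat(51))); // false, one character too long
```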
If the . and - can not be at the end, and there can not be consecutive ones, another option could be:
^[a-z](?=[a-z0-9.-]{0,49}$)[a-z0-9]*(?:[.-][a-z0-9]+)*$
Explanation
^ Start of string
[a-z] Match a single char a-z
(?=[a-z0-9.-]{0,49}$) Assert 0-49 chars to the right to the end of string
[a-z0-9]* Match optional chars a-z0-9
(?:[.-][a-z0-9]+)* Optionally match either . or - and 1+ times a char a-z0-9
$ End of string
Regex demo
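The stricter variant can be exercised the same way. This is only a sketch with made-up inputs, showing that a trailing or doubled separator is rejected while the length limit still applies.

```javascript
// Stricter pattern from the answer above: no trailing "." or "-",
// no consecutive separators, at most 50 characters overall.
const strictPattern = /^[a-z](?=[a-z0-9.-]{0,49}$)[a-z0-9]*(?:[.-][a-z0-9]+)*$/;

console.log(strictPattern.test('this.is.also.ok')); // true
console.log(strictPattern.test('user..name')); // false, consecutive dots
console.log(strictPattern.test('username.')); // false, trailing dot
console.log(strictPattern.test('a'.repeat(51))); // false, too long
```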

How to convert a camelcased string to sentence cased without excluding any special characters?

Can anyone suggest a regex for converting a camelCased string with special characters and numbers to sentence case? For instance:
const string = `includes:SummaryFromDetailHistory1990-AsAbstract`
Expected outcome:
Includes : Summary From Detail History 1990 - As Abstract
Currently I'm using lodash startCase to convert camelCase to sentence case. The issue with this approach is that it removes special characters like brackets, numbers, parentheses, hyphens, colons, etc. (most of the special characters).
So the idea is to convert camelCased strings to sentence case while preserving the string's identity.
For example:
const anotherString = `thisIsA100CharactersLong:SampleStringContaining-SpecialChar(s)10&20*`
const expectedReturn = `This Is A 100 Characters : Long Sample String Containing - Special Char(s) 10 & 20 *`
Is that possible with regex?
You'll have to deal with all the cases yourself:
[a-z](?=[A-Z]): lowercase followed by uppercase
[a-zA-Z](?=[0-9]): letter followed by digit
[0-9](?=[a-zA-Z]): digit followed by letter
[a-zA-Z0-9](?=[^a-zA-Z0-9]): letter or digit followed by neither letter nor digit (\w and \W could be used, but they cover _ too, so up to you)
[^a-zA-Z0-9](?=[a-zA-Z0-9]): neither letter nor digit followed by either letter or digit
etc.
Then, you can or them together:
([a-z](?=[A-Z])|[a-zA-Z](?=[0-9])|[0-9](?=[a-zA-Z])|[a-zA-Z0-9](?=[^a-zA-Z0-9])|[^a-zA-Z0-9](?=[a-zA-Z0-9]))
And replace by:
$1
(see the space after $1).
See https://regex101.com/r/4AVbAs/1 for instance.
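In JavaScript, the capture-and-replace approach might look like the sketch below. The function name toSentenceCase is hypothetical, and capitalizing the first character is an extra step added here to match the expected outcome in the question; it is not part of the regex itself.

```javascript
// The five alternatives from above, or-ed together in one capturing group.
const boundary = /([a-z](?=[A-Z])|[a-zA-Z](?=[0-9])|[0-9](?=[a-zA-Z])|[a-zA-Z0-9](?=[^a-zA-Z0-9])|[^a-zA-Z0-9](?=[a-zA-Z0-9]))/g;

// Hypothetical helper: insert a space after every captured character,
// then upper-case the first character (an assumption of this sketch).
function toSentenceCase(input) {
  const spaced = input.replace(boundary, '$1 ');
  return spaced.charAt(0).toUpperCase() + spaced.slice(1);
}

console.log(toSentenceCase('includes:SummaryFromDetailHistory1990-AsAbstract'));
// Includes : Summary From Detail History 1990 - As Abstract

// Parentheses are split as well, which is the edge case to watch for.
console.log(toSentenceCase('Char(s)10')); // Char ( s ) 10
```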
You will hit edge cases though, e.g. Char(s), so you'll need special rules for the parens for instance (see the following section about lookbehinds that can help for that). A bit of a tough job, quite error prone too and hardly maintainable I'm afraid.
If lookbehinds were allowed, you would not need to capture the first char in each group, but wrap the left patterns in (?<=...) and replace by a simple space directly:
(?<=[a-z])(?=[A-Z]): preceded by lowercase, followed by uppercase
(?<=[a-zA-Z])(?=[0-9]): preceded by letter, followed by digit
(?<=[0-9])(?=[a-zA-Z]): preceded by digit, followed by letter
(?<=[a-zA-Z0-9])(?=[^a-zA-Z0-9])(?!(?:\(s)?\)): preceded by letter or digit, followed by not letter nor digit, as well as not followed by (s) nor )
(?<=[^a-zA-Z0-9])(?<!\()(?=[a-zA-Z0-9]): preceded by not letter nor digit, as well as not preceded by (, followed by letter or digit
or-ed together:
(?<=[a-z])(?=[A-Z])|(?<=[a-zA-Z])(?=[0-9])|(?<=[0-9])(?=[a-zA-Z])|(?<=[a-zA-Z0-9])(?=[^a-zA-Z0-9])(?!(?:\(s)?\))|(?<=[^a-zA-Z0-9])(?<!\()(?=[a-zA-Z0-9])
Replace with a single space, see https://regex101.com/r/DB91DE/1.
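A sketch of the lookbehind version in JavaScript follows, assuming an engine that supports lookbehind assertions (current Node.js and most modern browsers do). The function name insertSpaces is made up for this example.

```javascript
// Zero-width alternatives from above: each one matches a position, not a
// character, so the replacement is just a space.
const gap = /(?<=[a-z])(?=[A-Z])|(?<=[a-zA-Z])(?=[0-9])|(?<=[0-9])(?=[a-zA-Z])|(?<=[a-zA-Z0-9])(?=[^a-zA-Z0-9])(?!(?:\(s)?\))|(?<=[^a-zA-Z0-9])(?<!\()(?=[a-zA-Z0-9])/g;

// Hypothetical helper name.
function insertSpaces(input) {
  return input.replace(gap, ' ');
}

console.log(insertSpaces('thisIsA100CharactersLong:SampleStringContaining-SpecialChar(s)10&20*'));
// this Is A 100 Characters Long : Sample String Containing - Special Char(s) 10 & 20 *
```

Note that the first letter stays lowercase; capitalizing it would be a separate step.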
The wanted result doesn't seem to be regular: some special characters are supposed to be preceded by a space and some are not. Treating the parentheses the way you want is a bit tricky. You can use a function to handle the parentheses, like this:
let parenth = 0;
const str = `thisIsA100CharactersLong:SampleStringContaining-SpecialChar(s)10&20*`,
  spaced = str.replace(/[A-Z]|\d+|\W/g, (m) => {
    if (m === '(') {
      parenth = 1;
      return m;
    }
    if (parenth || m === ')') {
      parenth = 0;
      return m;
    }
    return ` ${m}`;
  });
console.log(spaced);
If the data can contain other brackets, instead of just checking parentheses, use a RegExp to test for any opening bracket: if (/[({[]/.test(m)) ..., and test for closing brackets: if (/[)}\]]/.test(m)) ....
You can test the snippet with different data at jsFiddle.
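Put together, the bracket-aware variant could look like the following sketch. The function name spaceOut and the second sample string are assumptions. The flag is kept inside the function so that repeated calls do not share state.

```javascript
// Hypothetical helper: same callback idea as above, generalized to (), {}, [].
function spaceOut(str) {
  let insideBracket = false; // true right after an opening bracket
  return str.replace(/[A-Z]|\d+|\W/g, (m) => {
    if (/[({[]/.test(m)) {
      insideBracket = true;
      return m; // no space before an opening bracket
    }
    if (insideBracket || /[)}\]]/.test(m)) {
      insideBracket = false;
      return m; // no space inside or before a closing bracket
    }
    return ` ${m}`;
  });
}

console.log(spaceOut('thisIsA100CharactersLong:SampleStringContaining-SpecialChar(s)10&20*'));
// this Is A 100 Characters Long : Sample String Containing - Special Char(s) 10 & 20 *
console.log(spaceOut('itemList[s]And{x}')); // item List[s] And{x}
```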
This is impossible. You cannot do this in regex. You will have to consider exceptions...

How to extract the last word in a string with a JavaScript regex?

What I need is the last match. In the case below, that is the word test, without the $ signs or any other special character:
Test String:
$this$ $is$ $a$ $test$
Regex:
\b(\w+)\b
The $ represents the end of the string, so...
\b(\w+)$
However, your test string seems to have dollar sign delimiters, so if those are always there, then you can use that instead of \b.
\$(\w+)\$$
var s = "$this$ $is$ $a$ $test$";
document.body.textContent = /\$(\w+)\$$/.exec(s)[1];
If there could be trailing spaces, then add \s* before the end.
\$(\w+)\$\s*$
And finally, if there could be other non-word stuff at the end, then use \W* instead.
\b(\w+)\W*$
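The most general pattern above can be wrapped in a small function, sketched here with a hypothetical name (lastWord) and a null return for input that contains no word at all.

```javascript
// Hypothetical helper: capture the last run of word characters, allowing
// any non-word characters between it and the end of the string.
function lastWord(text) {
  const match = /\b(\w+)\W*$/.exec(text);
  return match ? match[1] : null;
}

console.log(lastWord('$this$ $is$ $a$ $test$')); // test
console.log(lastWord('$this$ $is$ $a$ $test$   ')); // test
console.log(lastWord('$$$')); // null
```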
In some cases a word may be followed by non-word characters. For example, take the following sentence:
Marvelous Marvin Hagler was a very talented boxer!
If we want to match the word boxer, a pattern that anchors the word directly to the end of the string will not suffice, because an exclamation mark follows the word. The following expression captures the word and also tolerates extraneous whitespace, newlines, and any trailing non-word characters.
[a-zA-Z]+?(?=\s*?[^\w]*?$)
https://regex101.com/r/D3bRHW/1
The pattern breaks down as follows:
We are looking for letters only, either uppercase or lowercase ([a-zA-Z]).
We expand only as far as necessary (the lazy +?).
We use a positive lookahead ((?=...)).
Inside it, we allow optional whitespace (\s*?) and exclude any word character ([^\w]).
We let that exclusion repeat as needed (*?).
We assert the end of the input ($).
The benefits here are that we do not need to set any flags or word boundaries, it takes non-word characters into account, and we do not need to reach for negation.
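A short sketch of how this expression behaves in JavaScript; the function name lastLetters is made up. Because the match itself is the word, no capture group is needed.

```javascript
// Hypothetical helper: the match is a run of letters that is followed only
// by whitespace and non-word characters up to the end of the input.
function lastLetters(text) {
  const match = text.match(/[a-zA-Z]+?(?=\s*?[^\w]*?$)/);
  return match ? match[0] : null;
}

console.log(lastLetters('Marvelous Marvin Hagler was a very talented boxer!')); // boxer
console.log(lastLetters('a very talented boxer!  \n')); // boxer
console.log(lastLetters('$this$ $is$ $a$ $test$')); // test
```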
var input = "$this$ $is$ $a$ $test$";
If you use var result = input.match(/\b(\w+)\b/g), an array of all the matches will be returned. Next, you can get the last one by calling pop() on the result or by reading result[result.length - 1].
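As a runnable sketch of this idea: match needs a regex literal with the g flag to return every word, and the last element sits at index length - 1. The function name lastMatch is hypothetical.

```javascript
// Hypothetical helper: collect every word, then take the last one.
function lastMatch(text) {
  // With the g flag, match() returns all matches (or null if there are none).
  const words = text.match(/\b\w+\b/g) || [];
  return words.length > 0 ? words[words.length - 1] : null;
}

const input = '$this$ $is$ $a$ $test$';
console.log(input.match(/\b\w+\b/g)); // [ 'this', 'is', 'a', 'test' ]
console.log(lastMatch(input)); // test
```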
Your regex will find a word, and since regexes operate left to right it will find the first word.
A \w+ matches as many consecutive word characters (letters, digits, or underscore) as it can, but it must match at least one.
A \b matches the position between a word character and a non-word character; it consumes no characters. In your case it matches at the boundaries between the '$' characters and the words.
What you need is to anchor your regex to the end of the input which is denoted in a regex by the $ character.
To support an input that may have more than just a '$' character at the end of the line, spaces or a period for instance, you can use \W+ which matches as many non-alphanumeric characters as it can:
\$(\w+)\W+$
Avoid regex - use .split and .pop the result. Use .replace to remove the special characters:
var match = str.split(' ').pop().replace(/[^\w\s]/gi, '');
DEMO

How do I need to write this RegEx to match the given test case? (don't match the ending period)

regex:
/#([\S]*?(?=\s)(?!\. ))/g
given string:
'this string has #var.thing.me two strings to be #var. replaced'.replace(/#([\S]*?(?=\s)(?!\. ))/g,function(){return '7';})
expected result:
'this string has 7 two strings to be 7. replaced'
In case you want to make it "better": I'm trying to match Razor HTML-encoded expressions, while taking care not to match an ending period followed by a space. The test case above shows that with the second (shorter) #var, whereas the first should be captured as #var.thing.me
Try with following regex:
var input = 'this string has #var.thing.me two strings to be #var. replaced';
input.replace(/(#[a-z][a-z.]+[a-z])/gi, function(){
  return '7';
});
This regex, (#[a-z][a-z.]+[a-z]), matches #, then a letter (so that a dot cannot directly follow #), then one or more letters or dots, and finally a letter again at the end.
The i modifier makes the regex case-insensitive.
Your pattern is not restrictive enough i.e., it captures too much. The last #var. (including the dot) in your example string is captured because it is followed by a space (as required by the positive lookahead) which, in addition, is not followed by a dot and a space (as required by the negative lookahead). You can try this pattern:
/#([\S]*?)(?=[.]?\s)/g
It will match the #something substring (which can contain dot characters) both when it is followed by a space (as it happens in the first match of your string) and when it is followed by a dot and a space (as it happens in the second match of your string). Testing it in the chromium browser console it seems to work fine:
> 'this string has #var.thing.me two strings to be #var. replaced'.replace(/#([\S]*?)(?=[.]?\s)/g,function(){return '7';})
"this string has 7 two strings to be 7. replaced"
Try this
#((?!\. )\S)+
See it here at regexr
This matches a # followed by non whitespace characters \S. But it matches the next non whitespace only, if it is not a dot followed by a space. This is ensured by the negative lookahead assertion (?!\. ) before the \S.
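To compare the patterns side by side, here is a small sketch using the test string from the question. The variable names are made up for illustration.

```javascript
const sample = 'this string has #var.thing.me two strings to be #var. replaced';

// Pattern from the question: the trailing dot of the second #var is swallowed.
const original = sample.replace(/#([\S]*?(?=\s)(?!\. ))/g, '7');

// Negative lookahead before each \S: stop at a dot that is followed by a space.
const negativeLookahead = sample.replace(/#((?!\. )\S)+/g, '7');

// Lazy match that stops before an optional dot plus whitespace.
const positiveLookahead = sample.replace(/#([\S]*?)(?=[.]?\s)/g, '7');

console.log(original); // this string has 7 two strings to be 7 replaced
console.log(negativeLookahead); // this string has 7 two strings to be 7. replaced
console.log(positiveLookahead); // this string has 7 two strings to be 7. replaced
```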

Why is 'B' matched by [a-z]?

A very simple and naive question:
why is this true?
new RegExp('^[a-z]+$', 'i').test('B')
Surely 'B' is outside [a-z]?
Yes, but you have the i flag, which tells the regex to ignore case.
From the MDN documentation for RegExp:
Parameters
pattern
The text of the regular expression.
flags
If specified, flags can have any combination of the following values:
...
i
ignore case
It's defining a class, which is to say [a-z] is symbolic of "any character, from a to z."
Regex is, by nature, case SensAtiVe as well, so [a-z] varies from [A-Z] (unless you use the i (case insensitive) flag, like you've demonstrated).
e.g.
/[a-z]/ -- Any single lowercase letter, a through z
/[A-Z]/ -- Any single uppercase letter, A through Z
/[a-zA-Z]/ -- Any single upper or lowercase letter, a through z
/[a-z]/i or /[A-Z]/i -- (note the i) Any upper or lowercase letter, a through z
Summary
The [a-z] means a character set containing characters a-z.
The ^ is an anchor, which means the match must begin at the start of the input.
The + means you must match one or more characters from the character set.
The $ is an end anchor, meaning the match must end at the end of the input.
The i means to ignore case on your input letters.
It means any character between a and z.
As you specified the i flag (case insensitive), it contains also B.
The whole regexp checks that the string contains at least one character and that all characters are in a-z or A-Z.
You can check that new RegExp('^[a-z]+$', 'i').test('B') returns true.
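The effect of the flag is easy to see by running the same pattern with and without it. This is a minimal sketch; the helper name onlyLetters is hypothetical.

```javascript
// Hypothetical helper: build the pattern with or without the i flag.
function onlyLetters(text, ignoreCase) {
  const pattern = new RegExp('^[a-z]+$', ignoreCase ? 'i' : '');
  return pattern.test(text);
}

console.log(onlyLetters('B', false)); // false, [a-z] alone is case-sensitive
console.log(onlyLetters('B', true)); // true, the i flag ignores case
console.log(onlyLetters('B1', true)); // false, a digit is not in the class
console.log(onlyLetters('', true)); // false, + requires at least one character
```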
