Regex expression to capitalise first character - javascript

For an address field, I need the first character of every word to be uppercase. I have been using /\b./g, which causes a problem: the first character after special characters such as !#*&;' is also capitalised, e.g. King'S Street instead of King's Street.
Is there a way to adjust that expression to exclude that behaviour, or would it be better to change the entire expression?

Replace \b with (^|[ ]).
Your regex will be: /(^|[ ])./g
Explanation:
\b matches at a word boundary, i.e. at the beginning or end of a word. An apostrophe is not a word character, so \b also matches right after it, which is why the S in King'S gets capitalised.
(^|[ ]) matches the beginning of the string or a space character.
(^|[ ]). matches the first character of the string and every character that follows a space.
Side note:
Use (^|\s) to match any whitespace character rather than just a space.
Your regex will be: /(^|\s)./g
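As a rough sketch of how that pattern could be applied (the function name capitaliseWords is made up for illustration), the whole match can be passed through toUpperCase, since uppercasing the leading whitespace leaves it unchanged:

```javascript
// Hypothetical helper: capitalise the first character of the string
// and every character that directly follows whitespace.
function capitaliseWords(text) {
  // The match is either the first character, or whitespace plus one character.
  return text.replace(/(^|\s)./g, (match) => match.toUpperCase());
}

console.log(capitaliseWords("king's street")); // King's Street
```

The s after the apostrophe is left alone because it is neither at the start of the string nor preceded by whitespace.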

You could use a lookahead:
\b[a-z](?=\w+)
See a demo on regex101.com.
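A small sketch of the lookahead variant in use (capitaliseWithLookahead is a hypothetical name). Note one side effect of requiring at least one more word character: single-letter words are not capitalised.

```javascript
// Hypothetical helper: uppercase a lowercase letter at a word boundary,
// but only when at least one more word character follows it.
function capitaliseWithLookahead(text) {
  return text.replace(/\b[a-z](?=\w+)/g, (ch) => ch.toUpperCase());
}

console.log(capitaliseWithLookahead("king's street")); // King's Street
console.log(capitaliseWithLookahead("flat a"));        // Flat a
```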

Related

Finding all words ending in "ion" with regex in JavaScript [duplicate]

I need help putting together a regex that will match word that ends with "Id" with case sensitive match.
Try this regular expression:
\w*Id\b
\w* allows word characters in front of Id and the \b ensures that Id is at the end of the word (\b is word boundary assertion).
Gumbo gets my vote, however, the OP doesn't specify whether just "Id" is an allowable word, which means I'd make a minor modification:
\w+Id\b
This requires one or more word characters followed by "Id" and a word boundary. The [a-zA-Z] variants don't take non-English alphabetic characters into account. I might also use \s instead of \b to require whitespace rather than any word boundary. It would depend on whether you need to match across multiple lines.
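To make the difference between the two patterns concrete, here is a quick JavaScript check (the sample text is invented for illustration):

```javascript
// Sample text (made up): a word ending in Id, the bare word Id,
// a word starting with Id, and a lowercase "id" ending.
const idSample = "UserId Id Idle orderid";

// \w* allows zero characters before Id, so the bare word "Id" matches too.
const zeroOrMore = idSample.match(/\w*Id\b/g);

// \w+ requires at least one character before Id.
const oneOrMore = idSample.match(/\w+Id\b/g);

console.log(zeroOrMore); // [ 'UserId', 'Id' ]
console.log(oneOrMore);  // [ 'UserId' ]
```

Neither pattern matches Idle (Id is not at the end of the word) or orderid (the match is case sensitive).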
This may do the trick:
\b\p{L}*Id\b
Where \p{L} matches any (Unicode) letter and \b matches a word boundary.
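If you are trying this in JavaScript rather than .NET, \p{L} is only recognised when the regex has the u flag. A minimal sketch with an invented sample string:

```javascript
// Invented sample: one ASCII word and one word containing non-ASCII letters.
const unicodeSample = "UserId größeId";

// The u flag enables Unicode property escapes such as \p{L}.
const unicodeMatches = unicodeSample.match(/\b\p{L}*Id\b/gu);

console.log(unicodeMatches); // [ 'UserId', 'größeId' ]
```

One caveat: in JavaScript \b is still based on the ASCII definition of \w, so a word that begins with a non-ASCII letter would not get a boundary at its start.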
How about \A[a-z]*Id\z? [This makes characters before Id optional. Use \A[a-z]+Id\z if there needs to be one or more characters preceding Id.]
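JavaScript has no \A or \z anchors, so a JavaScript version of this idea would have to use ^ and $ without the m flag instead. A sketch under that assumption:

```javascript
// Assumption: ^ and $ (without the m flag) stand in for \A and \z.
const optionalPrefix = /^[a-z]*Id$/; // characters before Id are optional
const requiredPrefix = /^[a-z]+Id$/; // at least one character before Id

console.log(optionalPrefix.test("userId"));  // true
console.log(optionalPrefix.test("Id"));      // true
console.log(requiredPrefix.test("Id"));      // false
console.log(optionalPrefix.test("userIds")); // false
```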
I would use
\b[A-Za-z]*Id\b
The \b matches at the beginning and end of a word, i.e. at the position between a word character and a space, tab, newline or other non-word character, or at the beginning or end of the string.
The [A-Za-z] will match any letter, and the * means that 0+ get matched. Finally there is the Id.
Note that this will match words that have capital letters in the middle such as 'teStId'.
I use http://www.regular-expressions.info/ for regex reference
Regex ids = new Regex(@"\w*Id\b", RegexOptions.None);
\b means "word boundary" and \w means any word character. So \w*Id\b means "{stuff}Id". Because RegexOptions.IgnoreCase is not included, the match is case sensitive.

Regex validation space in the middle of a string

I'm looking for an expression that requires a space in a string. It doesn't have to be dead in the middle, just not at the end (or start).
I've had a look on Google and Stack Overflow. There are quite a few, but I haven't found one that does what I need.
Here's what I have at the moment
var re = /^[A-Z]\'?[- a-zA-Z]( [a-zA-Z])*$/igm;
Based on the limited requirements you specified, this will do it. It requires a string to contain ONE space, anywhere but at the start or end.
/^[^ ]+ [^ ]+$/
Explanation: anchoring to the beginning of the string, allow one or more non-space characters, followed by a single space, followed by, again, one or more non-space characters, to the end of the string.
[^ ] is a negated character class. That is, it says "anything but the characters inside [ and ]".
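A quick way to check that pattern against a few inputs (the sample strings are invented for illustration):

```javascript
// Exactly one space, and not at the start or end of the string.
const singleInnerSpace = /^[^ ]+ [^ ]+$/;

console.log(singleInnerSpace.test("John Smith")); // true
console.log(singleInnerSpace.test(" John"));      // false (leading space)
console.log(singleInnerSpace.test("John "));      // false (trailing space)
console.log(singleInnerSpace.test("JohnSmith"));  // false (no space)
console.log(singleInnerSpace.test("a b c"));      // false (two spaces)
```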
Your regex should be: /^[A-Z]\'?[-\sa-zA-Z](\s[a-zA-Z])*$/igm. I replaced the literal spaces with \s so that any whitespace character is recognised, not only a plain space.

How to extract the last word in a string with a JavaScript regex?

What I need is the last match. In the case below, that is the word test without the $ signs or any other special character:
Test String:
$this$ $is$ $a$ $test$
Regex:
\b(\w+)\b
The $ represents the end of the string, so...
\b(\w+)$
However, your test string seems to have dollar sign delimiters, so if those are always there, then you can use that instead of \b.
\$(\w+)\$$
var s = "$this$ $is$ $a$ $test$";
document.body.textContent = /\$(\w+)\$$/.exec(s)[1];
If there could be trailing spaces, then add \s* before the end.
\$(\w+)\$\s*$
And finally, if there could be other non-word stuff at the end, then use \W* instead.
\b(\w+)\W*$
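Wrapped in a small function (lastWord is a hypothetical name), that last pattern could be used like this:

```javascript
// Hypothetical helper: return the last run of word characters,
// ignoring any non-word characters that trail it.
function lastWord(text) {
  const match = /\b(\w+)\W*$/.exec(text);
  return match ? match[1] : null; // null when there is no word at all
}

console.log(lastWord("$this$ $is$ $a$ $test$")); // test
console.log(lastWord("$test$  "));               // test
console.log(lastWord("!!!"));                    // null
```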
In some cases a word may be followed by non-word characters. For example, take the following sentence:
Marvelous Marvin Hagler was a very talented boxer!
If we want to match the word boxer, the simpler expressions above will not suffice because an exclamation mark follows the word. To ensure a successful capture, the following expression will do, and it also takes extraneous whitespace, newlines and any other non-word characters into account.
[a-zA-Z]+?(?=\s*?[^\w]*?$)
https://regex101.com/r/D3bRHW/1
The expression does the following:
We are looking for letters only, either uppercase or lowercase.
We will expand only as necessary.
We leverage a positive lookahead.
We exclude any word boundary.
We expand that exclusion.
We assert end of line.
The benefits here are that we do not need any flags or word boundaries, the expression takes non-word characters into account, and we do not need to reach for negation.
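For anyone who wants to try the expression outside regex101, here is a minimal sketch (lastLetters is a made-up name, and the trailing space and newline in the sample are added for illustration):

```javascript
// Hypothetical helper: return the last run of letters that is followed
// only by whitespace and non-word characters up to the end of the input.
function lastLetters(text) {
  const match = text.match(/[a-zA-Z]+?(?=\s*?[^\w]*?$)/);
  return match ? match[0] : null;
}

const sentence = "Marvelous Marvin Hagler was a very talented boxer! \n";
console.log(lastLetters(sentence)); // boxer
```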
var input = "$this$ $is$ $a$ $test$";
If you use var result = input.match(/\b\w+\b/g), an array of all the matches is returned. You can then get the last one by calling pop() on the result or by reading result[result.length - 1].
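A short sketch of the match-everything-then-take-the-last approach, assuming a regex literal with the g flag so that every word is returned:

```javascript
const input = "$this$ $is$ $a$ $test$";

// With the g flag, match returns every match (or null if there are none).
const allWords = input.match(/\b\w+\b/g) || [];

// The last element is at index length - 1.
const lastMatch = allWords[allWords.length - 1];

console.log(allWords);  // [ 'this', 'is', 'a', 'test' ]
console.log(lastMatch); // test
```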
Your regex will find a word, and since regexes operate left to right it will find the first word.
A \w+ matches as many consecutive alphanumeric characters as it can, but it must match at least 1.
A \b matches the position between an alphanumeric character and a non-alphanumeric character. In your case that is the position next to each '$' character.
What you need is to anchor your regex to the end of the input which is denoted in a regex by the $ character.
To support an input that may have more than just a '$' character at the end of the line, spaces or a period for instance, you can use \W+ which matches as many non-alphanumeric characters as it can:
\$(\w+)\W+$
Avoid regex - use .split and .pop the result. Use .replace to remove the special characters:
var match = str.split(' ').pop().replace(/[^\w\s]/gi, '');

String replace exact match in cyrillic

I want to use a regex for string replacement with Cyrillic characters, and I want an exact match. My string replace works with Latin characters and looks like this:
'Edin'.replace(/\Edin\b/gi, ''); // Output is ""
The same expression is not working with Cyrillic characters
'Един'.replace(/\Един\b/gi, ''); // Output is still 'Един'
The problem here is the \b word boundary character, which matches a position at a word boundary. A word boundary is defined as (^\w|\w$|\W\w|\w\W). In turn, the word character \w is the set of ASCII characters [A-Za-z0-9_]. Obviously Cyrillic characters don't fall into this set.
For the same reason, the regular expression /\w+/ will not match a Cyrillic string.
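This is easy to confirm in a console. The snippet below is only an illustration and uses \b on both sides of the word:

```javascript
// \w only covers [A-Za-z0-9_], so Cyrillic letters are not word characters.
const latinHasWordChars = /\w+/.test("Edin");
const cyrillicHasWordChars = /\w+/.test("Един");

// Without word characters there is no word boundary for \b to match.
const latinResult = "Edin".replace(/\bEdin\b/gi, "");
const cyrillicResult = "Един".replace(/\bЕдин\b/gi, "");

console.log(latinHasWordChars);    // true
console.log(cyrillicHasWordChars); // false
console.log(latinResult === "");   // true
console.log(cyrillicResult);       // Един
```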
As dfsq wrote, the problem is with the word boundary.
If you remove \b you will get the desired output, but it is quite a different regex: it will also replace Един where it is part of a longer word. To avoid that, you can use a negative lookahead and define which letters must not appear after it, because they could be part of a word.
'Един'.replace(/\Един(?![A-я])/gi, '');
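As a variation on the same idea (not the pattern from this answer), a newer JavaScript engine that supports lookbehind and Unicode property escapes could check both sides with \p{L}. This is a sketch under that assumption, and the name wholeWord is made up:

```javascript
// Assumption: the engine supports lookbehind and the u flag (ES2018+).
// The word must not be preceded or followed by any Unicode letter.
const wholeWord = /(?<!\p{L})Един(?!\p{L})/giu;

const removedAlone = "Един".replace(wholeWord, "");
const keptInsideWord = "Единство".replace(wholeWord, "");
const mixedSentence = "един и Единство".replace(wholeWord, "");

console.log(removedAlone === ""); // true
console.log(keptInsideWord);      // Единство
console.log(mixedSentence);       // " и Единство"
```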

Regex not working as expected

What's wrong with this regular expression?
/^[a-zA-Z\d\s&#-\('"]{1,7}$/;
When I enter the following valid input, it fails:
a&'-#"2
The input should also be checked for 2 consecutive spaces.
The dash needs to be either escaped (\-) or placed at the end of the character class, or it will signify a range (as in A-Z), not a literal dash:
/^[A-Z\d\s&#('"-]{1,7}$/i
would be a better regex.
N.B.: [#-\(] would have matched #, $, %, &, ' or (.
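The range behaviour can be checked directly. The two small classes below are cut down from the full pattern purely for illustration:

```javascript
// A hyphen between two characters defines a range: # through (.
const accidentalRange = /[#-\(]/;

// A hyphen at the end of the class is a literal hyphen.
const literalHyphen = /[#(-]/;

console.log(accidentalRange.test("$")); // true  ($ lies between # and ( )
console.log(accidentalRange.test("-")); // false (the hyphen itself is not in the range)
console.log(literalHyphen.test("-"));   // true
console.log(literalHyphen.test("$"));   // false
```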
To address the added requirement of not allowing two consecutive spaces, use a lookahead assertion:
/^(?!.*\s{2})[A-Z\d\s&#('"-]{1,7}$/i
(?!.*\s{2}) means "Assert that it's impossible to match (from the current position) any string followed by two whitespace characters". One caveat: The dot doesn't match newline characters.
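Putting the original pattern and the corrected one side by side with the input from the question (the variable names are made up for illustration):

```javascript
const originalPattern = /^[a-zA-Z\d\s&#-\('"]{1,7}$/;
const fixedPattern = /^(?!.*\s{2})[A-Z\d\s&#('"-]{1,7}$/i;

// Template literal so that both quote characters can appear unescaped.
const questionInput = `a&'-#"2`;

console.log(originalPattern.test(questionInput)); // false (the hyphen is not allowed)
console.log(fixedPattern.test(questionInput));    // true
console.log(fixedPattern.test("a b"));            // true  (single space)
console.log(fixedPattern.test("a  b"));           // false (two consecutive spaces)
console.log(fixedPattern.test("abcdefgh"));       // false (more than 7 characters)
```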
The - (hyphen) has a special meaning inside a character class, used for specifying ranges. Did you mean to escape it?:
/^[a-zA-Z\d\s&#\-\('"]{1,7}$/;
This RegExp matches your input.
You have an unescaped - in the middle of your character class. This means that you're actually searching for all characters between and including # and ( (which are #, $, %, &, ', and (). Either move it to the end or escape it with a backslash. Your regex should read:
/^[a-zA-Z\d\s&#\('"-]{1,7}$/
or
/^[a-zA-Z\d\s&#\-\('"]{1,7}$/
Remove the ; at the end and use:
^[a-zA-Z\d\s\&\#\-\(\'\"]+$
Your input does not match the regular expression. The problem here is the hyphen in your regexp. If you move it from its position after the '#' character to the start of the character class, like so:
/^[-a-zA-Z\d\s&#\('"]{1,7}$/;
everything is fine and dandy.
You can always use Rubular for checking your regular expressions. I use it on a regular (no pun intended) basis.
