Regex breaks words instead of matching entire words - javascript

I use the function below to linkify usernames that:
start with a letter a-z or number 0-9
contain letters a-z, numbers 0-9 and "-"
have a length of 2-50 characters.
function linkifyUsernames(text){
return text.replace(/#\b([0-9a-z-]{2,49})\b/ig,
"<a href='/profile/$1' target='_blank'>#$1</a>");
}
The function above works OK, but the only problem is that it breaks words. For example,
#abcdéíú
The function linkifies only the first part of the word (#abcd) and leaves the rest (éíú) as plain text.
I need a function that does not convert to links any words that start with # but contain characters other than a-z, 0-9 and -. So the word #abcdéíú must stay untouched.
Word boundaries for some reason don't help.

The accented characters don't fall within the [0-9a-z] range. Try the range [a-z\u00E0-\u00FC], as shown in the following example:
https://regex101.com/r/vW2mR9/1

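function linkifyUsernames(text) {
var pattern = /^(#)(\w{2,12})/ig;
// match() returns null when nothing matches, so check before indexing
var matches = text.match(pattern);
// link only when the match covers the whole input
if(matches && matches[0] === text)
return text.replace(pattern, "<a href='/profile/$2' target='_blank'>#$2</a>");
else return text;
}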
$('div').append(linkifyUsernames('#abcd'))
$('div').append('<br/>')
$('div').append(linkifyUsernames('#abcdéáú'))
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.0/jquery.min.js"></script>
<div>
</div>
The a-z will not cover the special characters like à,é.
You can simplify using \w notation like:
/^(#)(\w{2,12})/ig
Here is an example:
https://regex101.com/r/nZ7uI7/1

As stated on: http://www.regular-expressions.info/wordboundaries.html
There are three different positions that qualify as word boundaries:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
If you try your sample string
#abcdéíú
with just a \w, which is meant to match any word character, you'll see that é, í and ú are not word characters as far as the regex engine is concerned. So even if you use [a-z\u00E0-\u00FC] from the previous answer, the regex fails because of \b.
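One way to get the behaviour asked for is to drop \b and state explicitly what may come before and after the username. The sketch below is not taken from either answer. It assumes a username only counts when it is preceded by the start of the text or whitespace and followed by whitespace or the end of the text, so a token with trailing punctuation such as "#abcd," would also be left alone.

```javascript
// Sketch: link "#name" only when the whole token is a valid username.
// Assumption: tokens are delimited by whitespace or the string edges.
function linkifyUsernames(text) {
  // (^|\s)                   start of text or a whitespace character (kept via $1)
  // [0-9a-z][0-9a-z-]{1,49}  first char is a letter or digit, 2-50 chars in total
  // (?=\s|$)                 must be followed by whitespace or the end of the text
  var pattern = /(^|\s)#([0-9a-z][0-9a-z-]{1,49})(?=\s|$)/gi;
  return text.replace(pattern, "$1<a href='/profile/$2' target='_blank'>#$2</a>");
}

console.log(linkifyUsernames("#abcd"));     // becomes a link
console.log(linkifyUsernames("#abcdéíú"));  // left untouched
```

Because the trailing condition is a lookahead, the whitespace after one username is not consumed and can still serve as the leading delimiter of the next one.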

Related

Convert string to spinal case when words in string are not separated by spaces

Spinal case separates words with dashes. I have the following code, which works if words are separated by spaces, but not for a string whose words are NOT separated by spaces, such as "ThisIsSpinalCase", which should return "this-is-spinal-case". I can't think of a way to recognize every new word in a string. Suggestions?
function spinalCase(str) {
return str.replace(/[\s\W_]/g, "-").toLowerCase();
}
spinalCase('This_is spinal case'); // returns this-is-spinal-case
Edit: I realize I can probably check for each new uppercase letter, but this would require adding a space between the previous word and the next word.
The process is like this:
The first letter never needs a dash in front of it, so leave it out.
In the rest of your string, you have two conditions:
You just have a space or an underscore, then you should replace it with a dash.
You have a capital letter, then you should replace it with a dash followed by that letter.
However, you can handle both conditions with a single call: replace(/(([A-Z])|[\s_])+/g, "-$2"). This means that:
If the regex matches a space or underscore, it is replaced by just a dash ($2 is empty, because no capital letter was captured).
If the regex matches a capital letter, it is replaced by a dash followed by that letter ($2 holds that letter).
We concatenate the first letter with the rest (after replacing).
We lower-case the result.
Done!
Example:
function spinalCase(str) {
return (str[0] + str.substr(1).replace(/(([A-Z])|[\s_])+/g, "-$2")).toLowerCase();
}
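A variation on the same idea, shown only as a sketch and not as the answer's method: split the work into two replace calls, one for the lowercase-to-capital boundary and one for the separators. This avoids indexing str[0], which is undefined for an empty string and would turn the answer's version into the text "undefined".

```javascript
// Sketch: two-pass spinal case (a swapped-in alternative to the one-regex version).
function spinalCase(str) {
  return str
    // put a dash between a lowercase letter or digit and the capital after it
    .replace(/([a-z0-9])([A-Z])/g, "$1-$2")
    // turn every run of whitespace or underscores into a single dash
    .replace(/[\s_]+/g, "-")
    .toLowerCase();
}

console.log(spinalCase("ThisIsSpinalCase"));    // this-is-spinal-case
console.log(spinalCase("This_is spinal case")); // this-is-spinal-case
```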

Character matching

I need to match words in a string but not numbers. Special characters such as $!:{}_ should be matched if they are part of a word and ignored otherwise.
I have a regex that matches words and ignores numbers, but I cannot work out how to handle the special characters.
Here is what I have currently - /^d\s+/
Any help would be appreciated.
Allow words and letters only, and ignore any numbers. Special
characters such as (-_‘[]{}“£$&%!:;/) should either be ignored or
treated as part of the word they sit within.
Try using String.prototype.replace() with the RegExp /\s\d+.*\d|\s+[^a-z]/ig. It removes either a space followed by digits, then any characters, up to a final digit, or one or more spaces followed by a character that is not a-z (case-insensitive):
var str = "This is a test of 1,2,3 word-count - test.";
str = str.replace(/\s\d+.*\d|\s+[^a-z]/ig, "");
document.body.textContent = str;
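If the goal is to collect or count the words rather than to clean the string, a different route is to split on whitespace and filter the tokens. This is only a sketch: extractWords is a hypothetical helper, and it assumes a "word" is any whitespace-delimited token that contains at least one letter, which the question does not spell out.

```javascript
// Sketch: keep tokens that contain at least one letter (assumed definition
// of a "word"). Special characters inside a token stay part of it.
function extractWords(str) {
  return str
    .split(/\s+/)
    .filter(function (token) {
      return /[a-z]/i.test(token);
    });
}

var words = extractWords("This is a test of 1,2,3 word-count - test.");
console.log(words.length);    // 7
console.log(words.join(" ")); // This is a test of word-count test.
```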

How to extract the last word in a string with a JavaScript regex?

What I need is the last match. In the case below, that is the word test without the $ signs or any other special character:
Test String:
$this$ $is$ $a$ $test$
Regex:
\b(\w+)\b
The $ represents the end of the string, so...
\b(\w+)$
However, your test string seems to have dollar sign delimiters, so if those are always there, then you can use that instead of \b.
\$(\w+)\$$
var s = "$this$ $is$ $a$ $test$";
document.body.textContent = /\$(\w+)\$$/.exec(s)[1];
If there could be trailing spaces, then add \s* before the end.
\$(\w+)\$\s*$
And finally, if there could be other non-word stuff at the end, then use \W* instead.
\b(\w+)\W*$
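The last pattern can be wrapped in a small helper, sketched here under the assumption that "no word at all" should give null instead of throwing. lastWord is a hypothetical name, not part of the answer.

```javascript
// Sketch: return the last run of word characters, or null if there is none.
function lastWord(text) {
  var match = /\b(\w+)\W*$/.exec(text);
  // exec() returns null when nothing matches, so guard before indexing
  return match ? match[1] : null;
}

console.log(lastWord("$this$ $is$ $a$ $test$")); // test
console.log(lastWord("$test$   "));              // test
console.log(lastWord("$$$"));                    // null
```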
In some cases a word may be followed by non-word characters. For example, take the following sentence:
Marvelous Marvin Hagler was a very talented boxer!
If we want to match the word boxer, answers that anchor the word directly to the end of the string will not suffice, because an exclamation mark follows the word. The following expression captures it and also takes into account extraneous whitespace, newlines and any other non-word characters.
[a-zA-Z]+?(?=\s*?[^\w]*?$)
https://regex101.com/r/D3bRHW/1
The expression works as follows:
We look for letters only, either uppercase or lowercase ([a-zA-Z]).
We expand only as necessary (+?).
We use a positive lookahead ((?=...)).
Inside it, we skip any whitespace (\s*?).
We extend that to any other non-word characters ([^\w]*?).
We assert the end of the line ($).
The benefit here is that we do not need any flags or word boundaries, trailing non-word characters are taken into account, and we do not need to reach for negation.
var input = "$this$ $is$ $a$ $test$";
If you use var result = input.match(/\b(\w+)\b/g), an array of all the matches is returned. You can then get the last one with result.pop() or with result[result.length - 1].
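A minimal sketch of the match-all idea, assuming a regex literal with the g flag (without g, match() returns only the first match). lastMatch is a hypothetical helper name. The \b anchors are left out here because a greedy \w+ already starts and stops at word boundaries.

```javascript
// Sketch: collect every word with a global match, then take the last one.
function lastMatch(input) {
  // with the g flag, match() returns all matches, or null when there are none
  var result = input.match(/\w+/g);
  return result ? result[result.length - 1] : null;
}

console.log(lastMatch("$this$ $is$ $a$ $test$")); // test
```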
Your regex will find a word, and since regexes operate left to right it will find the first word.
A \w+ matches as many consecutive word characters (letters, digits and underscore) as it can, but it must match at least 1.
A \b matches the position between a word character and a non-word character. In your case these are the positions next to the '$' characters.
What you need is to anchor your regex to the end of the input which is denoted in a regex by the $ character.
To support an input that may have more than just a '$' character at the end of the line, spaces or a period for instance, you can use \W+ which matches as many non-alphanumeric characters as it can:
\$(\w+)\W+$
Avoid regex - use .split and .pop the result. Use .replace to remove the special characters:
var match = str.split(' ').pop().replace(/[^\w\s]/gi, '');

JS & Regex: how to replace punctuation pattern properly?

Given an input text in which all spaces have been replaced by n _ characters:
Hello_world_?. Hello_other_sentenc3___. World___________.
I want to keep the _ between words, but I want to attach each punctuation mark to the last word of its sentence, without any space between the last word and the punctuation. I want to use the punctuation as the pivot of my regex.
I wrote the following JS-Regex:
str = str.replace(/(_| )*([:punct:])*( |_)/g, "$2$3");
This fails, since it returns :
Hello_world_?. Hello_other_sentenc3_. World_._
Why doesn't it work? How can I delete all "_" between the last word and the punctuation?
http://jsfiddle.net/9c4z5/
Try the following regex, which makes use of a positive lookahead:
str = str.replace(/_+(?=\.)/g, "");
It replaces all underscores that are immediately followed by a period with the empty string, thus removing them.
If you want to match other punctuation characters than just the period, replace the \. part with an appropriate character class.
JavaScript doesn't have :punct: in its regex implementation. I believe you'd have to list out the punctuation characters you care about, perhaps something like this:
str = str.replace(/(_| )+([.,?])/g, "$2");
That is, replace any group of _ or space that is immediately followed by punctuation with just the punctuation.
Demo: http://jsfiddle.net/9c4z5/2/
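The two answers can be combined into one small sketch: a lookahead, as in the first answer, with an explicit punctuation class, as in the second. The helper name tidyPunctuation and the exact set of punctuation characters are assumptions; adjust the class to the marks you care about.

```javascript
// Sketch: drop runs of "_" or spaces that sit directly before punctuation.
// The punctuation set [.,?!;:] is an assumption, not taken from the question.
function tidyPunctuation(str) {
  return str.replace(/[_ ]+(?=[.,?!;:])/g, "");
}

var input = "Hello_world_?. Hello_other_sentenc3___. World___________.";
console.log(tidyPunctuation(input));
// Hello_world?. Hello_other_sentenc3. World.
```

The spaces that follow a period are kept, because the lookahead only fires when the character after the run is itself punctuation.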

Regular Expression for alphabets with spaces

I need help with a regular expression. I need an expression that allows only letters and spaces, e.g. for a college name.
I am using :
var regex = /^[a-zA-Z][a-zA-Z\\s]+$/;
but it's not working.
Just add the space to the [ ] (in a regex literal, \\s inside the brackets matches a backslash or the letter s, not whitespace):
var regex = /^[a-zA-Z ]*$/;
This is the better solution as it forces the input to start with an alphabetic character. The accepted answer is buggy as it does not force the input to start with an alphabetic character.
[a-zA-Z][a-zA-Z ]+
This will allow space between the characters and not allow numbers or special characters. It will also not allow the space at the start and end.
^[a-zA-Z][a-zA-Z ]+[a-zA-Z]$
This will accept input with alphabets with spaces in between them but not only spaces. Also it works for taking single character inputs.
[a-zA-Z]+([\s][a-zA-Z]+)*
Special Characters & digits Are Not Allowed.
Spaces are only allowed between two words.
Only one space is allowed between two words.
Spaces at the start or at the end are considered to be invalid.
Single word name is also valid : ^[a-zA-Z]+([\s][a-zA-Z]+)*$
Single word name is invalid : ^[a-zA-Z]+([\s][a-zA-Z]+)+$
A regular expression that must start with a lower-case or upper-case letter (not a space) and may contain spaces between the letters is the following:
/^[a-zA-Z][a-zA-Z ]*$/
This worked for me
/[^a-zA-Z, ]/
This will work too,
it will accept only letters and whitespace, without any symbols or numbers.
^[a-zA-Z\s]+$
^ asserts position at the start of the string.
[a-zA-Z\s] matches a single character present in the list below:
a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive),
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive),
\s matches any whitespace character (such as [\r\n\t\f\v ]).
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy).
$ asserts position at the end of the string.
This worked for me in JavaScript regex validation (anchored so that the whole input has to match):
/^[A-Za-z ]+$/
This one "^[a-zA-Z ]*$" is wrong because it allows space as a first character and also allows only space as a name.
This will work perfectly. It will not allow space as a first character.
pattern = "^[A-Za-z]+[A-Za-z ]*$"
This works for me
function validate(text) {
let reg = /^[A-Za-z ]+$/; // valid alphabet with space
return reg.test(text);
}
console.log(validate('abcdef')); //true
console.log(validate('abcdef xyz')); //true
console.log(validate('abc def xyz')); //true
console.log(validate('abcdef123')); //false
console.log(validate('abcdef!.')); //false
console.log(validate('abcdef#12 3')); //false
This will restrict space as first character
FilteringTextInputFormatter.allow(RegExp('^[a-zA-Z][a-zA-Z ]*')),
This will work for not allowing spaces at beginning and accepts characters, numbers, and special characters
/(^\w+)\s?/
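Two details decide whether patterns like the ones in this thread really validate the input: the ^ and $ anchors, and the spelling of the upper-case range. A minimal sketch follows, assuming ASCII letters only and exactly one space between words. isValidName is a hypothetical helper name.

```javascript
// Sketch: letters only, words separated by exactly one space,
// no leading or trailing space. ASCII letters only (an assumption).
function isValidName(text) {
  return /^[a-zA-Z]+(?: [a-zA-Z]+)*$/.test(text);
}

console.log(isValidName("Oxford College"));  // true
console.log(isValidName(" Oxford"));         // false
console.log(isValidName("Oxford  College")); // false (two spaces)

// Pitfall: a range typed as A-z (lowercase z) also covers [ \ ] ^ _ and `
console.log(/^[a-zA-z]+$/.test("abc_def")); // true
console.log(/^[a-zA-Z]+$/.test("abc_def")); // false
```

Without the anchors, test() succeeds as soon as any part of the input matches, so a value such as "123 abc" would pass an unanchored letter pattern.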
