Regular expression to replace words preserving spaces - javascript

I'm trying to develop a function in JavaScript that takes a phrase and processes each word, preserving whitespace. It would be something like this:
properCase(' hi,everyone just testing') => ' Hi,Everyone Just Testing'
I tried a couple of regular expressions but I couldn't find the way to get just the words, apply a function, and replace them without touching the spaces.
I'm trying with
' hi,everyone just testing'.match(/([^\w]*(\w*)[^\w]*)?/g, 'x')
[" hi,", "everyone ", "just ", "testing", ""]
But I can't understand why the spaces are being captured. I just want to capture the (\w*) group. I also tried /(?:[^\w]*(\w*)[^\w]*)?/g and it's the same...

What about something like
' hi,everyone just testing'.replace(/\b[a-z]/g, function(letter) {
return letter.toUpperCase();
});
If you want to process each word, you can use
' hi,everyone just testing'.replace(/\w+/g, function(word) {
// do something with each word like
return word.toUpperCase();
});
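Putting the callback form together for the original goal, here is a minimal sketch of properCase. It assumes a "word" is a run of \w characters (letters, digits and underscore), so input like it's would come out as It'S. The text between matches is never touched, which is what preserves the whitespace.

```javascript
// Capitalize the first character of each run of word characters and
// lower-case the rest. Only the matched runs are rewritten; spaces,
// commas and other characters between them are left exactly as they were.
function properCase(phrase) {
  return phrase.replace(/\w+/g, (word) =>
    word.charAt(0).toUpperCase() + word.slice(1).toLowerCase()
  );
}

// JSON.stringify makes the preserved leading/trailing spaces visible.
console.log(JSON.stringify(properCase(" hi,everyone just testing"))); // " Hi,Everyone Just Testing"
console.log(JSON.stringify(properCase("  HELLO   wORLD ")));          // "  Hello   World "
```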

When you use the global flag (g), String.prototype.match ignores the capture groups: the returned array contains every match of the whole expression. It looks like you just want to match consecutive word characters, in which case \w+ suffices:
>>> ' hi,everyone just testing'.match(/\w+/g)
["hi", "everyone", "just", "testing"]
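If you do need the capture group from every match, String.prototype.matchAll (ES2020) returns an iterator of full match objects, groups included. A minimal sketch, assuming words are runs of \w characters:

```javascript
const phrase = " hi,everyone just testing";

// match() with the g flag returns whole matches only, but matchAll()
// yields one match object per match, with capture groups at m[1], m[2], ...
// (matchAll() throws a TypeError if the regex has no g flag.)
const words = Array.from(phrase.matchAll(/(\w+)\W*/g), (m) => m[1]);

console.log(JSON.stringify(words)); // ["hi","everyone","just","testing"]
```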

function capitaliseFirstLetter(match)
{
return match.charAt(0).toUpperCase() + match.slice(1);
}
var myRe = /\b(\w+)\b/g;
var result = "hi everyone, just testing".replace(myRe,capitaliseFirstLetter);
alert(result);
Matches each word and capitalizes it.

I'm unclear about what you're really after. Why is your regex not working? Or how to get it to work? Here's a way to extract words and spaces in your sentence:
var str = ' hi,everyone just testing';
var words = str.split(/\b/); // [" ", "hi", ",", "everyone", " ", "just", " ", "testing"]
words = words.map(function properCase(word){
return word.charAt(0).toUpperCase() + word.slice(1).toLowerCase();
});
var sentence = words.join(''); // " Hi,Everyone Just Testing" (original spacing kept)
Note: for string manipulation like this, replace will usually be faster, but split/join allows for cleaner, more descriptive code.

Related

How to check if a string contains a WORD in javascript? [duplicate]

So, you can easily check if a string contains a particular substring using the .includes() method.
I'm interested in finding if a string contains a word.
For example, if I apply a search for "on" for the string, "phones are good", it should return false. And, it should return true for "keep it on the table".
You first need to convert it into an array using split() and then use includes():
string.split(" ").includes("on")
Just pass a space " " to split() to get all the words.
This is called a regex (regular expression).
You can use the regex101 website when you need to work with them (it helps). This version also handles words with custom separators.
function checkWord(word, str) {
// Separator characters allowed around the word: whitespace , ; " ' |
const allowedSeparator = '\\s,;"\'|';
const regex = new RegExp(
`(^.*[${allowedSeparator}]${word}$)|(^${word}[${allowedSeparator}].*)|(^${word}$)|(^.*[${allowedSeparator}]${word}[${allowedSeparator}].*$)`,
// Case insensitive
'i',
);
return regex.test(str);
}
[
'phones are good',
'keep it on the table',
'on',
'keep iton the table',
'keep it on',
'on the table',
'the,table,is,on,the,desk',
'the,table,is,on|the,desk',
'the,table,is|the,desk',
].forEach((x) => {
console.log(`Check: ${x} : ${checkWord('on', x)}`);
});
Explanation:
I create one capturing group for each possibility:
(^.*\son$): on is the last word
(^on\s.*): on is the first word
(^on$): on is the only word
(^.*\son\s.*$): on is an in-between word
\s matches any whitespace character (a space, a tab, a newline, ...)
const regex = /(^.*\son$)|(^on\s.*)|(^on$)|(^.*\son\s.*$)/i;
console.log(regex.test('phones are good'));
console.log(regex.test('keep it on the table'));
console.log(regex.test('on'));
console.log(regex.test('keep iton the table'));
console.log(regex.test('keep it on'));
console.log(regex.test('on the table'));
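The four alternatives above all say the same thing: the word must have a separator, or the start/end of the string, on each side. Assuming your engine supports lookbehind (ES2018; older Safari versions lack it), that can be written as a single pattern. A sketch, where hasDelimitedWord and escapeRegExp are hypothetical helper names; unlike checkWord above, it also escapes the word so that characters like + are matched literally:

```javascript
// Escape characters that have a special meaning in a RegExp pattern,
// so that a word such as "c++" is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true if `word` occurs in `str` with a separator
// (or the start/end of the string) on both sides. `separators` is the
// body of a character class, so "]", "^" and "-" would need escaping.
function hasDelimitedWord(word, str, separators = "\\s,;\"'|") {
  const pattern =
    `(?<![^${separators}])` + // not preceded by a non-separator
    escapeRegExp(word) +
    `(?![^${separators}])`; // not followed by a non-separator
  return new RegExp(pattern, "i").test(str); // case-insensitive, like checkWord
}

[
  "phones are good",          // false
  "keep it on the table",     // true
  "keep iton the table",      // false
  "the,table,is,on|the,desk", // true
].forEach((s) => console.log(s, hasDelimitedWord("on", s)));
```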
You can .split() your string on whitespace (\s+) into an array, and then use .includes() to check whether the array of strings contains your word:
const hasWord = (str, word) =>
str.split(/\s+/).includes(word);
console.log(hasWord("phones are good", "on"));
console.log(hasWord("keep it on the table", "on"));
If you are worried about punctuation, you can remove it first using .replace() (as shown in this answer) and then split():
const hasWord = (str, word) =>
str.replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g,"").split(/\s+/).includes(word);
console.log(hasWord("phones are good son!", "on"));
console.log(hasWord("keep it on, the table", "on"));
You can split and then try to find:
const str = 'keep it on the table';
const res = str.split(/[\s,?.!]+/).some(f => f === 'on');
console.log(res);
In addition, the some method is efficient, as it returns true as soon as the predicate is true for any element.
You can use .includes() to check for the word. To make sure it is a whole word and not part of another word, verify that the place where you found it is followed by a space, comma, period, etc., and is also preceded by one of those.
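The neighbour check described above needs the position of each occurrence, which includes() does not give you, so this sketch uses indexOf instead. containsWholeWord and the DELIMITERS set are hypothetical; adjust the set to your input:

```javascript
// Characters accepted as word boundaries. This set is an assumption;
// extend it for your input (quotes, brackets, dashes, ...).
const DELIMITERS = new Set([" ", "\t", "\n", ",", ".", "!", "?", ";", ":"]);

// Hypothetical helper: find each occurrence of `word` with indexOf and
// accept it only if the characters on both sides are delimiters (or the
// start/end of the string, where str[i] is undefined).
function containsWholeWord(str, word) {
  if (word === "") return false;
  for (let i = str.indexOf(word); i !== -1; i = str.indexOf(word, i + 1)) {
    const before = str[i - 1];
    const after = str[i + word.length];
    const okBefore = before === undefined || DELIMITERS.has(before);
    const okAfter = after === undefined || DELIMITERS.has(after);
    if (okBefore && okAfter) return true;
  }
  return false;
}

console.log(containsWholeWord("phones are good", "on"));       // false
console.log(containsWholeWord("keep it on, the table", "on")); // true
```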
A simple version could just be splitting on the whitespace and looking through the resulting array for the word:
"phones are good".split(" ").find(word => word === "on") // undefined
"keep it on the table".split(" ").find(word => word === "on") // "on"
This only splits on single spaces, though. When you need to parse text (depending on your input), you'll encounter more word delimiters than whitespace. In that case, you could use a regex to account for these characters.
Something like:
"Phones are good, aren't they? They are. Yes!".split(/[\s,?.!]+/)
I would go with the following assumptions:
Words at the start of a sentence always have a trailing space.
Words at the end of a sentence always have a preceding space.
Words in the middle of a sentence always have both a preceding and a trailing space.
Therefore, I would write my code as follows:
function containsWord(word, sentence) {
return (
sentence.startsWith(word.trim() + " ") ||
sentence.endsWith(" " + word.trim()) ||
sentence.includes(" " + word.trim() + " "));
}
console.log(containsWord("test", "This is a test of the containsWord function."));
Try the following -
var mainString = 'codehandbook'
var substr = /hand/
var found = substr.test(mainString)
if(found){
console.log('Substring found !!')
} else {
console.log('Substring not found !!')
}

How to replace found regex sub string with spaces with equal length in javascript?

In javascript if I have something like
string.replace(new RegExp(regex, "ig"), " ")
this replaces every match with a single space. But how would I do it if I wanted to replace each match with the same number of spaces as its length?
So if the regex was \d+ and the string was
"123hello4567"
it would change to
"   hello    "
Thanks
The replacement argument (the 2nd) to .replace can be a function: it is called once for every match, with the matched text as its first argument.
Knowing the length of the match, you can return the same number of spaces as the replacement value.
In the code below I use . as the replacement character to make the result easy to see.
Note: this uses String#repeat, which is not available in IE11 (but then, neither are arrow functions); you can always use a polyfill and a transpiler.
let regex = "\\d+";
console.log("123hello4567".replace(new RegExp(regex, "ig"), m => '.'.repeat(m.length)));
Internet Explorer-friendly version
var regex = "\\d+";
console.log("123hello4567".replace(new RegExp(regex, "ig"), function (m) {
return Array(m.length+1).join('.');
}));
Thanks to @nnnnnn for the shorter IE-friendly version.
"123hello4567".replace(new RegExp(/[\d]/, "ig"), " ")
1 => " "
2 => " "
3 => " "
"   hello    "
"123hello4567".replace(new RegExp(/[\d]+/, "ig"), " ")
123 => " "
4567 => " "
" hello "
If you just want to replace every digit with a space, keep it simple:
var str = "123hello4567";
var res = str.replace(/\d/g,' ');
"   hello    "
This answers your example, but not exactly your question. What if the regex could match different numbers of characters depending on the string, or isn't as simple as \d+? You could do something like this:
var str = "123hello456789goodbye12and456hello12345678again123";
var regex = /(\d+)/;
var match = regex.exec(str);
while (match != null) {
// Create string of spaces of same length
var replaceSpaces = match[0].replace(/./g,' ');
str = str.replace(regex, replaceSpaces);
match = regex.exec(str);
}
"   hello      goodbye  and   hello        again   "
This loops, executing the regex repeatedly (instead of using the g flag for a global replace).
Performance-wise, this could likely be sped up by building the string of spaces directly from the length of match[0], which would remove the regex replace inside the loop. If performance isn't a high priority, this should work fine.
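The speed-up suggested above (building the spaces from the match length instead of running a second regex) also fits the function-replacement approach from the first answer, which does everything in one pass. A minimal sketch, where blankOut is a hypothetical helper name. Because there is no loop, it also avoids rescanning the string from the start on each iteration, and it cannot loop forever if the regex happens to match spaces:

```javascript
// Hypothetical helper: replace every match of `regex` with a run of spaces
// of the same length, in a single replace() call. The g flag is added when
// missing so that all matches are replaced, not only the first one.
function blankOut(str, regex) {
  const flags = regex.flags.includes("g") ? regex.flags : regex.flags + "g";
  const globalRegex = new RegExp(regex.source, flags);
  return str.replace(globalRegex, (match) => " ".repeat(match.length));
}

const sample = "123hello456789goodbye12and456hello12345678again123";
console.log(JSON.stringify(blankOut(sample, /\d+/)));
```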

Separating words with Regex

I am trying to get this result: 'Summer-is-here'. Why does the code below generate extra spaces? (Current result: '-Summer--Is- -Here-').
function spinalCase(str) {
var newA = str.split(/([A-Z][a-z]*)/).join("-");
return newA;
}
spinalCase("SummerIs Here");
You are using the form of split where the regexp contains a capturing group (inside parentheses), which has a specific meaning: the text matched by the group is included in the result. So your result becomes:
["", "Summer", "", "Is", " ", "Here", ""]
Joining that with - gives you the result you see. But you can't just remove the capture group from the regexp, because then the split would give you
["", "", " ", ""]
because the words themselves are the separators and are dropped, leaving only the (mostly empty) strings between them. So this doesn't really work.
If you want to use split, try splitting on zero-width or whitespace-only matches that look ahead to an uppercase letter:
> "SummerIs Here".split(/\s*(?=[A-Z])/)
                            ^^^^^^^^^ LOOK-AHEAD
< ["Summer", "Is", "Here"]
Now you can join that to get the result you want, except for lower-casing the later words, which you could do with:
"SummerIs Here" .
split(/\s*(?=[A-Z])/) .
map(function(elt, i) { return i ? elt.toLowerCase() : elt; }) .
join('-')
which gives you what you want.
Using replace as suggested in another answer is also a perfectly viable solution. In terms of best practices, consider the following code from Ember:
var DECAMELIZE_REGEXP = /([a-z\d])([A-Z])/g;
var DASHERIZE_REGEXP = /[ _]/g;
function decamelize(str) {
return str.replace(DECAMELIZE_REGEXP, '$1_$2').toLowerCase();
}
function dasherize(str) {
return decamelize(str).replace(DASHERIZE_REGEXP, '-');
}
First, decamelize puts an underscore _ between each lower-case letter (or digit) and a following upper-case letter, then lower-cases the string. Then, dasherize replaces underscores and spaces with dashes. This works well, except that it also lower-cases the first word in the string. You can roughly combine decamelize and dasherize here with
var SPINALIZE_REGEXP = /([a-z\d])\s*([A-Z])/g;
function spinalCase(str) {
return str.replace(SPINALIZE_REGEXP, '$1-$2').toLowerCase();
}
You want to separate capitalized words, but you are splitting the string on the capitalized words themselves; that's why you get those empty strings and spaces.
I think you are looking for this:
var newA = str.match(/[A-Z][a-z]*/g).join("-");
([A-Z][a-z]*) *(?!$|[a-z])
You can simply replace with $1-. See demo:
https://regex101.com/r/nL7aZ2/1
var re = /([A-Z][a-z]*) *(?!$|[a-z])/g;
var str = 'SummerIs Here';
var subst = '$1-';
var result = str.replace(re, subst);
var newA = str.split(/ |(?=[A-Z])/).join("-");
You can change the regex like:
/ |(?=[A-Z])/ or /\s*(?=[A-Z])/
Result:
Summer-Is-Here

Regex to replace all but the last non-breaking space if multiple words are joined?

Using javascript (including jQuery), I’m trying to replace all but the last non-breaking space if multiple words are joined.
For example:
Replace A&nbsp;String&nbsp;of&nbsp;Words with A String of&nbsp;Words
I think you want something like this:
> "A&nbsp;String&nbsp;of&nbsp;Words".replace(/&nbsp;(?=.*?&nbsp;)/g, " ")
'A String of&nbsp;Words'
The above regex matches every &nbsp; except the last one.
Assuming your string looks like this, you can use a negative lookahead to do it:
var r = 'A&nbsp;String&nbsp;of&nbsp;Words'.replace(/&nbsp;(?![^&]*$)/g, ' ');
//=> "A String of&nbsp;Words"
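The answers here assume the non-breaking spaces are written as the &nbsp; entity. If your string instead contains the actual U+00A0 character (for example, text read from the DOM via textContent), the same lookahead idea applies to the character itself. A minimal sketch under that assumption:

```javascript
// Build a sample string that uses real U+00A0 characters between words.
const nbspText = ["A", "String", "of", "Words"].join("\u00A0");

// Replace every non-breaking space that has another one somewhere after it
// (on the same line, since . does not match line breaks).
const joinedLast = nbspText.replace(/\u00A0(?=.*\u00A0)/g, " ");

console.log(joinedLast === "A String of\u00A0Words"); // true
```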
Alternative to regex, easier to understand:
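var fn = function(input, sep) {
var parts = input.split(sep);
if (parts.length < 2) return input; // separator not found: nothing to replace
var last = parts.pop();
return parts.join(" ") + sep + last;
};
> fn("A&nbsp;String&nbsp;of&nbsp;Words", "&nbsp;")
"A String of&nbsp;Words"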

Javascript Regular Expression for multi-line string enclosed in quotes, possibly containing quotes

I would like to write a JavaScript regular expression that will match multi-line strings contained in quotes that possibly also contain quotes. The closing quote will be followed by a comma.
For example:
"some text between the quotes including " characters",
This string starts with a ", ends with ", and contains " characters.
How do I get this to work?
I guess the real question is: how do I match a multi-line string that starts with " and ends with ",?
Doesn't a simple match() work? You also need the [\s\S] trick in place of the dot so that line breaks are matched too (it matches every single character):
var str = "bla bla \"some text between the quotes \n including \" characters\", bla bla";
var result = str.match(/"([\s\S]+)",/);
if (result == null) {
// no result was found
} else {
result = result[1];
// some text between the quotes
// including " characters
}
Match any number of characters that are either not ", or a " not followed by a comma:
/"((?:[^"]|"(?!,))*)",/
or use a lazy quantifier:
/"([\0-\uffff]*?)",/
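The difference between the greedy pattern in the first answer and the two patterns above shows up when the input contains more than one ",. A small comparison on a made-up sample; [\s\S]*? here behaves like [\0-\uffff]*? (both match any UTF-16 code unit without the u flag):

```javascript
// Two quoted sections; the first spans a line break and contains quotes.
const text = 'a "first\n"quoted" part", b "second part", c';

// Greedy: [\s\S]+ runs on to the LAST `",` in the input.
const greedy = text.match(/"([\s\S]+)",/)[1];
// Lazy: [\s\S]*? stops at the FIRST `",`.
const lazy = text.match(/"([\s\S]*?)",/)[1];
// Lookahead: any non-quote, or a quote not followed by a comma.
const withLookahead = text.match(/"((?:[^"]|"(?!,))*)",/)[1];

console.log(JSON.stringify(greedy)); // "first\n\"quoted\" part\", b \"second part"
console.log(JSON.stringify(lazy));   // "first\n\"quoted\" part"
console.log(lazy === withLookahead); // true
```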
Using a regular expression will be really tricky; I would try something like this:
var getQuotation = function(s) {
var i0 = s.indexOf('"')
, i1 = s.indexOf('",', i0 + 1);
// indexOf returns -1 when nothing is found, and 0 is a valid position
return (i0 !== -1 && i1 !== -1) ? s.slice(i0+1, i1) : undefined;
};
var s = "He said, \"this is a test of\n" +
"the \"emergency broadcast\" system\", obviously.";
getQuotation(s); // => 'this is a test of
// the "emergency broadcast" system'
