I am trying to add the correct white space to data I am receiving. Currently it shows like this:
NotStarted
ReadyforPPPDReview
This is the code I am using:
.replace(/([A-Z])/g, ' $1')
"NotStarted" correctly shows as "Not Started", but "ReadyforPPPDReview" shows as "Readyfor P P P D Review" when it should look like this: "Ready for PPPD Review".
What is the best way to handle both of these using one regex or function?
You would need an NLP engine to handle this properly. Here are two approaches with simple regex, both have limitations:
1. Use list of stop words
We blindly add spaces before and after the stop words:
var str = 'NotStarted, ReadyforPPPDReview';
var wordList = 'and, for, in, on, not, review, the'; // stop words
var wordListRe = new RegExp('(' + wordList.replace(/, */g, '|') + ')', 'gi');
var result1 = str
.replace(wordListRe, ' $1 ') // add space before and after stop words
.replace(/([a-z])([A-Z])/g, '$1 $2') // add space between lower case and upper case chars
.replace(/ +/g, ' ') // remove excessive spaces
.trim(); // remove spaces at start and end
console.log('str: ' + str);
console.log('result1: ' + result1);
As you can imagine, the stop words approach has some severe limitations. For example, the words "formula input" would come out as "for mula in put".
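To make that limitation concrete, here is a minimal sketch that wraps the same stop-word steps in a function and runs it on both kinds of input. The function name addSpacesAroundStopWords is hypothetical and not part of the original answer.

```javascript
// Hypothetical helper: the stop-word steps from above, wrapped in a function.
function addSpacesAroundStopWords(str) {
  var wordList = 'and, for, in, on, not, review, the'; // stop words
  var wordListRe = new RegExp('(' + wordList.replace(/, */g, '|') + ')', 'gi');
  return str
    .replace(wordListRe, ' $1 ')          // add space before and after stop words
    .replace(/([a-z])([A-Z])/g, '$1 $2')  // add space between lower and upper case
    .replace(/ +/g, ' ')                  // collapse runs of spaces
    .trim();                              // drop leading and trailing spaces
}

console.log(addSpacesAroundStopWords('ReadyforPPPDReview')); // Ready for PPPD Review
console.log(addSpacesAroundStopWords('formula input'));      // for mula in put
```

The second call shows the false positive: "for" and "in" are matched inside longer words because nothing tells the regex where a word really begins.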
2. Use a mapping table
The mapping table lists words that need to be spaced out (no drugs involved), as in this code snippet:
var str = 'NotStarted, ReadyforPPPDReview';
var spaceWordMap = {
NotStarted: 'Not Started',
Readyfor: 'Ready for',
PPPDReview: 'PPPD Review'
// add more as needed
};
var spaceWordMapRe = new RegExp('(' + Object.keys(spaceWordMap).join('|') + ')', 'g'); // case-sensitive, so every match is an exact key of spaceWordMap
var result2 = str
.replace(spaceWordMapRe, function(m, p1) { // m: matched snippet, p1: first group
return spaceWordMap[p1] // replace key in spaceWordMap with its value
})
.replace(/([a-z])([A-Z])/g, '$1 $2') // add space between lower case and upper case chars
.replace(/ +/g, ' ') // remove excessive spaces
.trim(); // remove spaces at start and end
console.log('str: ' + str);
console.log('result2: ' + result2);
This approach is suitable if you have a deterministic list of words as input.
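If the only goal is to keep runs of capitals such as "PPPD" together, a second case-boundary regex may be enough on its own. The following is a sketch under that assumption; splitOnCaseBoundaries is a hypothetical name, and the function cannot split all-lowercase joins such as "Readyfor", which still need a word list or mapping table.

```javascript
// Hypothetical helper: split on case boundaries while keeping acronyms intact.
function splitOnCaseBoundaries(str) {
  return str
    // lower case followed by upper case: "NotStarted" -> "Not Started"
    .replace(/([a-z])([A-Z])/g, '$1 $2')
    // acronym followed by a capitalized word: "PPPDReview" -> "PPPD Review"
    .replace(/([A-Z]+)([A-Z][a-z])/g, '$1 $2');
}

console.log(splitOnCaseBoundaries('NotStarted'));         // Not Started
console.log(splitOnCaseBoundaries('ReadyforPPPDReview')); // Readyfor PPPD Review
```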
I am working on code to transform a string of text into sentence case while retaining acronyms. I explored similar posts on StackOverflow, but I couldn't find one that suits my requirement.
I have already handled the acronyms and the first letter of the sentence. However, I ran into other issues: some letters in the sentence are still in uppercase, especially text in and after double quotes (" ") and camelcase text.
Below is the code I am currently working on. I would like help optimizing the code and fixing these issues.
String.prototype.toSentenceCase = function() {
var i, j, str, lowers, uppers;
str = this.replace(/(^\w{1}|\.\s*\w{1})/gi, function(txt) {
return txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase();
});
// Certain words such as initialisms or acronyms should be left uppercase
uppers = ['Id', 'Tv', 'Nasa', 'Acronyms'];
for (i = 0, j = uppers.length; i < j; i++)
str = str.replace(new RegExp('\\b' + uppers[i] + '\\b', 'g'),
uppers[i].toUpperCase());
// To remove special characters like ':' and '?'
str = str.replace(/[""]/g,'');
str = str.replace(/[?]/g,'');
str = str.replace(/[:]/g,' - ');
return str;
}
Input: play around: This is a "String" Of text, which needs to be cONVERTED to Sentence Case at the same time keeping the Acronyms as it is like Nasa.
Current Output: Play around - This is a String Of text, which needs to be cONVERTED to Sentence Case at the same time keeping the ACRONYMS as it is like NASA.
Expected Output: Play around - this is a string of text, which needs to be converted to sentence case at the same time keeping the ACRONYMS as it is like NASA.
Here's a runnable version of the initial code (I have slightly modified the input string):
String.prototype.toSentenceCase = function() {
var i, j, str, lowers, uppers;
str = this.replace(/(^\w{1}|\.\s*\w{1})/gi, function(txt) {
return txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase();
});
// Certain words such as initialisms or acronyms should be left uppercase
uppers = ['Id', 'Tv', 'Nasa', 'Acronyms'];
for (i = 0, j = uppers.length; i < j; i++)
str = str.replace(new RegExp('\\b' + uppers[i] + '\\b', 'g'),
uppers[i].toUpperCase());
// To remove special characters like ':' and '?'
str = str.replace(/[""]/g,'');
str = str.replace(/[?]/g,'');
str = str.replace(/[:]/g,' - ');
return str;
}
const input = `play around: This is a "String" Of text, which needs to be cONVERTED to Sentence Case at the same time keeping the Acronyms as it is like Nasa. another sentence. "third" sentence starting with a quote.`
const result = input.toSentenceCase()
console.log(result)
I ran into other issues like some letters in the sentence are still in Uppercase, especially texts in and after Double Quotes (" ") and camelcase texts.
Some letters remain uppercased because you are not calling .toLowerCase() anywhere in your code, except in the first replacer function. But that regex targets only the initial letters of sentences, not the other letters.
It can be helpful to first lowercase all letters, and then uppercase some letters (acronyms and initial letters of sentences). So, let's call .toLowerCase() in the beginning:
String.prototype.toSentenceCase = function() {
var i, j, str, lowers, uppers;
str = this.toLowerCase();
// ...
return str;
}
Next, let's take a look at this regex:
/(^\w{1}|\.\s*\w{1})/gi
The parentheses are unnecessary, because the capturing group is not used in the replacer function. The {1} quantifiers are also unnecessary, because by default \w matches only one character. So we can simplify the regex like so:
/^\w|\.\s*\w/gi
This regex finds two matches from the input string:
p
. a
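The two matches can be checked directly with String.prototype.match. This is only a quick verification sketch using the modified input string from above; the variable names are illustrative.

```javascript
const sampleInput = `play around: This is a "String" Of text, which needs to be cONVERTED to Sentence Case at the same time keeping the Acronyms as it is like Nasa. another sentence. "third" sentence starting with a quote.`;

// With the g flag, match() returns every match of the whole pattern.
const sentenceStarts = sampleInput.match(/^\w|\.\s*\w/gi);
console.log(sentenceStarts); // [ 'p', '. a' ]
```

The third sentence produces no match because the character after the period and space is a quote, which \w does not match.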
Both matches contain only one letter (\w), so in the replacer function, we can safely call txt.toUpperCase() instead of the current, more complex expression (txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase()). We can also use an arrow function:
String.prototype.toSentenceCase = function() {
var i, j, str, lowers, uppers;
str = this.toLowerCase();
str = str.replace(/^\w|\.\s*\w/gi, (txt) => txt.toUpperCase());
// ...
return str;
}
However, the initial letter of the third sentence is not uppercased because the sentence starts with a quote. Because we are anyway going to remove quotes and question marks, let's do it at the beginning.
Let's also simplify and combine the regexes:
// Before
str = str.replace(/[""]/g,'');
str = str.replace(/[?]/g,'');
str = str.replace(/[:]/g,' - ');
// After
str = str.replace(/["?]/g,'');
str = str.replace(/:/g,' - ');
So:
String.prototype.toSentenceCase = function() {
var i, j, str, lowers, uppers;
str = this;
str = str.toLowerCase();
str = str.replace(/["?]/g,'');
str = str.replace(/:/g,' - ');
str = str.replace(/^\w|\.\s*\w/gi, (txt) => txt.toUpperCase());
// ...
return str;
}
Now the initial letter of the third sentence is correctly uppercased. That's because when we are uppercasing the initial letters, the third sentence doesn't start with a quote anymore (because we have removed the quote).
What's left is to uppercase acronyms. In your regex, you probably want to use the i flag as well for case-insensitive matches.
Instead of using a for loop, it's possible to use a single regex to look for all matches and uppercase them. This allows us to get rid of most of the variables as well. Like so:
String.prototype.toSentenceCase = function() {
var str;
str = this;
str = str.toLowerCase();
str = str.replace(/["?]/g,'');
str = str.replace(/:/g,' - ');
str = str.replace(/^\w|\.\s*\w/gi, (txt) => txt.toUpperCase());
str = str.replace(/\b(id|tv|nasa|acronyms)\b/gi, (txt) => txt.toUpperCase());
return str;
}
And it looks like we are now getting correct results!
Three more things, though:
1. Instead of creating and mutating the str variable, we can modify this and chain the method calls.
2. It might make sense to rename the txt variables to match variables, since they are regex matches.
3. Modifying a built-in object's prototype is a bad idea. Creating a new function is a better idea.
Here's the final code:
function convertToSentenceCase(str) {
return str
.toLowerCase()
.replace(/["?]/g, '')
.replace(/:/g, ' - ')
.replace(/^\w|\.\s*\w/gi, (match) => match.toUpperCase())
.replace(/\b(id|tv|nasa|acronyms)\b/gi, (match) => match.toUpperCase())
}
const input = `play around: This is a "String" Of text, which needs to be cONVERTED to Sentence Case at the same time keeping the Acronyms as it is like Nasa. another sentence. "third" sentence starting with a quote.`
const result = convertToSentenceCase(input)
console.log(result)
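If the acronym list needs to change over time, one option is to pass it in and build the regex from it. This is a sketch, not part of the answer above: the function name toSentenceCase and its acronyms parameter are assumptions, and the list is assumed to contain only plain letters and digits (no regex special characters).

```javascript
// Hypothetical variant: the acronym list is a parameter instead of being hard-coded.
// Assumes every entry consists of letters/digits only, so no regex escaping is done.
function toSentenceCase(str, acronyms) {
  const acronymRe = new RegExp('\\b(' + acronyms.join('|') + ')\\b', 'gi');
  return str
    .toLowerCase()
    .replace(/["?]/g, '')                                     // drop quotes and question marks
    .replace(/:/g, ' - ')                                     // colons become dashes
    .replace(/^\w|\.\s*\w/g, (match) => match.toUpperCase())  // sentence starts
    .replace(acronymRe, (match) => match.toUpperCase());      // acronyms
}

console.log(toSentenceCase('the tv showed "nasa" news. another one.', ['id', 'tv', 'nasa']));
// The TV showed NASA news. Another one.
```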
I have a combination of replace methods. How can I convert them to one:
.replace(/\s+/g, " ").replace(/\,|\?|\!|\:|\./g,'').replace("'", "_")
Is there any solution?
It's possible with a replacer function which alternates between the different possibilities, captures the matching subpattern, and checks which subpattern was matched in the replacer function, but it's really ugly. Your current solution is much easier to read.
const string = ' here is multiple spaces consolidated, punctuation removed!! and apostrophes don\'t exist! ';
const result = string
.replace(
/(\s+)|(\,|\?|\!|\:|\.)|(')/g,
(match, g1, g2, g3) => (
g1 ? ' ' :
g2 ? '' :
'_'
)
);
console.log(result);
I'd use your original version with a slight tweak: use a character set in the second replace instead, it'll be easier to read.
const string = ' here is multiple spaces consolidated, punctuation removed!! and apostrophes don\'t exist! ';
const result = string
.replace(/\s+/g, " ")
.replace(/[,?!:.]/g,'')
.replace("'", "_");
console.log(result);
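One difference between the two versions is worth noting: replace with a string pattern only replaces the first occurrence, while the single-regex version uses the g flag and replaces every apostrophe. If replacing all apostrophes is what is wanted (an assumption, since the question does not say), a regex with the g flag can be used in the chained version too. The function name normalizeText is hypothetical.

```javascript
// Hypothetical helper: chained version that replaces every apostrophe.
function normalizeText(text) {
  return text
    .replace(/\s+/g, ' ')      // collapse whitespace runs
    .replace(/[,?!:.]/g, '')   // remove punctuation
    .replace(/'/g, '_');       // replace ALL apostrophes, not just the first
}

console.log(normalizeText("don't  stop, can't!")); // don_t stop can_t
```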
I want to replace second space occurrence of the sentence with a br.
I have tried this but it is deleting the rest.
var title = "My Title Needs Brace".split(" ").slice(0, 2).join(" ");
That will do the trick:
"My Title Needs Brace"
.split(' ')
.reduce(function (str, part, i) {
return str + (i === 2 ? '<br/>' : ' ') + part
});
// "My Title<br/>Needs Brace"
Let's break it and see how it works:
First, we take the string and split it, using " " as our separator:
"My Title Needs Brace".split(' ')
// ["My", "Title", "Needs", "Brace"]
Second, we'll use reduce to combine the array back into one string
["My", "Title", "Needs", "Brace"]
.reduce(function (str, part) { return str + ' ' + part }); // no initial value, so the first part starts the string
// "My Title Needs Brace"
Why reduce and not join?
The advantage of reduce over join is that it lets us pass a function, which gives us fine-grained control over how each part of the string is joined back.
Now all that's left is to replace the 2nd space with <br/>.
For that, we'll use the 3rd argument of the reduce callback, which is the index, and ask:
Is this the 3rd part? Use <br/>.
Otherwise, use " ".
"My Title Needs Brace"
.split(' ')
.reduce(function (str, part, i) {
return str + (i === 2 ? '<br/>' : ' ') + part
});
// "My Title<br/>Needs Brace"
Note that this is the index of the string "part", not the spaces between them so the index is 2, not 1.
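The same idea can be generalized to any space, not just the second. This is a sketch; replaceNthSpace and its parameters are hypothetical names, and n is 1-based.

```javascript
// Hypothetical helper: replace the n-th space (1-based) with `replacement`.
function replaceNthSpace(str, n, replacement) {
  return str
    .split(' ')
    .reduce(function (acc, part, i) {
      // Without an initial value, reduce starts at i === 1 with acc = first part,
      // so index i is also the number of the space that precedes `part`.
      return acc + (i === n ? replacement : ' ') + part;
    });
}

console.log(replaceNthSpace('My Title Needs Brace', 2, '<br/>')); // My Title<br/>Needs Brace
console.log(replaceNthSpace('My Title Needs Brace', 1, '<br/>')); // My<br/>Title Needs Brace
```

If the string has fewer than n spaces, it is returned unchanged.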
More about:
split
reduce
join
Try the following:
var title = "My Title Needs Brace".split(" ");
title.forEach(function(item, i, title){
if(i==1)
title[i] += "<br/>";
else
title[i] += ' ';
})
console.log(title.join(''));
I want to replace second space occurrence of the sentence with a br.
The simple way to do that is to add "<br/>" to the second element.
Here is the code:
$(document).ready(function(){
var title = "My Title Needs Brace".split(" ");
title[1] = title[1]+"<br/>";
var newstr = title.join(" ");
$("#textd").html(newstr);
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="textd">
</div>
Maybe this will help:
var title = "My Title Needs Brace".split(" ");
var t1 = title[0]; // My
var t2 = title[1]; // Title
var t3 = title[2]; // Needs
var t4 = title[3]; // Brace
You can then build any markup you like:
var htmlString = t1 + ' ' + t2 + '<br />' + t3 + ' ' + t4;
$('Anywhere').append(htmlString); // 'Anywhere' is a placeholder selector
You can do this without splitting the string:
var title = 'My Title Needs Brace'.replace(/( .*?) /, '$1<br>');
Here, String.replace takes a RegExp and a string as arguments. The regex matches everything from the first space up through the second space, keeping everything except the second space in a capturing group. The string replaces the entire match with the contents of the capturing group, followed by '<br>'. Since the capturing group doesn't include the second space, this effectively only replaces the second space.
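A different way to target the n-th occurrence without splitting is a replacer function that counts matches. This is a sketch rather than the approach described above; replaceNthMatch is a hypothetical name, and the pattern is assumed to carry the g flag.

```javascript
// Hypothetical helper: replace only the n-th match (1-based) of a global regex.
function replaceNthMatch(str, pattern, n, replacement) {
  let count = 0;
  // The replacer runs once per match; every match except the n-th is returned unchanged.
  return str.replace(pattern, (match) => (++count === n ? replacement : match));
}

console.log(replaceNthMatch('My Title Needs Brace', / /g, 2, '<br>')); // My Title<br>Needs Brace
```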
In javascript if I have something like
string.replace(new RegExp(regex, "ig"), " ")
this replaces all found regexes with a single space. But how would I do it if I wanted to replace all found regexes with spaces that matched in length?
so if regex was \d+, and the string was
"123hello4567"
it changes to
"   hello    "
Thanks
The replacement argument (the 2nd) to .replace can be a function. This function is called in turn with every matching part as its first argument.
Knowing the length of the matching part, you can return the same number of spaces as the replacement value.
In the code below I use . as the replacement character to make the result easy to see.
Note: this uses String#repeat, which is not available in IE11 (but then, neither are arrow functions), but you can always use a polyfill and a transpiler.
let regex = "\\d+";
console.log("123hello4567".replace(new RegExp(regex, "ig"), m => '.'.repeat(m.length)));
Internet Exploder friendly version
var regex = "\\d+";
console.log("123hello4567".replace(new RegExp(regex, "ig"), function (m) {
return Array(m.length+1).join('.');
}));
thanks to #nnnnnn for the shorter IE friendly version
"123hello4567".replace(new RegExp(/[\d]/, "ig"), " ")
1 => " "
2 => " "
3 => " "
"   hello    "
"123hello4567".replace(new RegExp(/[\d]+/, "ig"), " ")
123 => " "
4567 => " "
" hello "
If you just want to replace every digit with a space, keep it simple:
var str = "123hello4567";
var res = str.replace(/\d/g,' ');
"   hello    "
This answers your example, but not exactly your question. What if the regex could match a different number of characters depending on the string, or it isn't as simple as \d repeated? You could do something like this:
var str = "123hello456789goodbye12and456hello12345678again123";
var regex = /(\d+)/;
var match = regex.exec(str);
while (match != null) {
// Create string of spaces of same length
var replaceSpaces = match[0].replace(/./g,' ');
str = str.replace(regex, replaceSpaces);
match = regex.exec(str);
}
"   hello      goodbye  and   hello        again   "
This loops, executing the regex repeatedly (instead of using /g for global).
Performance-wise, this could likely be sped up by creating the string of spaces directly with the same length as match[0]. This would remove the regex replace within the loop. If performance isn't a high priority, this should work fine.
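The suggested speed-up could look like the sketch below, which builds the spaces with String#repeat instead of a nested regex replace. The function name blankOutMatches is hypothetical. It assumes the regex is not global and never matches spaces or the empty string, otherwise the loop would not terminate.

```javascript
// Hypothetical helper: replace each match with the same number of spaces.
// Assumes `regex` has no g flag and never matches spaces or an empty string.
function blankOutMatches(str, regex) {
  let match = regex.exec(str);
  while (match !== null) {
    const spaces = ' '.repeat(match[0].length); // same length as the match
    str = str.replace(regex, spaces);           // replaces the first (current) match
    match = regex.exec(str);                    // look for the next remaining match
  }
  return str;
}

console.log(JSON.stringify(blankOutMatches('123hello4567', /\d+/))); // "   hello    "
```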
I'm trying to develop a function in javascript that get a phrase and processes each word, preserving whiteSpaces. It would be something like this:
properCase(' hi,everyone just testing') => ' Hi,Everyone Just Testing'
I tried a couple of regular expressions but I couldn't find the way to get just the words, apply a function, and replace them without touching the spaces.
I'm trying with
' hi,everyone just testing'.match(/([^\w]*(\w*)[^\w]*)?/g, 'x')
[" hi,", "everyone ", "just ", "testing", ""]
But I can't understand why are the spaces being captured. I just want to capture the (\w*) group. also tried with /(?:[^\w]*(\w*)[^\w]*)?/g and it's the same...
What about something like
' hi,everyone just testing'.replace(/\b[a-z]/g, function(letter) {
return letter.toUpperCase();
});
If you want to process each word, you can use
' hi,everyone just testing'.replace(/\w+/g, function(word) {
// do something with each word like
return word.toUpperCase();
});
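Applied to the properCase example from the question, the word-by-word replace might look like the sketch below. Only the matched words are rewritten, so spaces and punctuation between them are left untouched. Lowercasing the rest of each word is an assumption about what "proper case" should mean here.

```javascript
// Sketch of properCase using replace with a per-word callback.
function properCase(phrase) {
  return phrase.replace(/\w+/g, function (word) {
    // Upper-case the first character, lower-case the rest (assumed behavior).
    return word.charAt(0).toUpperCase() + word.slice(1).toLowerCase();
  });
}

console.log(JSON.stringify(properCase(' hi,everyone  just testing')));
// " Hi,Everyone  Just Testing"
```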
When you use the global modifier (g), then the capture groups are basically ignored. The returned array will contain every match of the whole expression. It looks like you just want to match consecutive word characters, in which case \w+ suffices:
>>> ' hi,everyone just testing'.match(/\w+/g)
["hi", "everyone", "just", "testing"]
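If the capture groups are actually needed together with the g flag, String.prototype.matchAll (ES2020) returns one result per match with its groups included. This is a small sketch; the pattern that splits each word into its first letter and the rest is only an illustration.

```javascript
const phrase = ' hi,everyone just testing';

// Each item is a match array: [whole match, group 1, group 2].
const matches = Array.from(phrase.matchAll(/(\w)(\w*)/g));

const firstLetters = matches.map((m) => m[1]);
console.log(firstLetters); // [ 'h', 'e', 'j', 't' ]
```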
See here: jsfiddle
function capitaliseFirstLetter(match)
{
return match.charAt(0).toUpperCase() + match.slice(1);
}
var myRe = /\b(\w+)\b/g;
var result = "hi everyone, just testing".replace(myRe,capitaliseFirstLetter);
alert(result);
This matches each word and capitalizes its first letter.
I'm unclear about what you're really after. Why is your regex not working? Or how to get it to work? Here's a way to extract words and spaces in your sentence:
var str = ' hi,everyone just testing';
var words = str.split(/\b/); // [" ", "hi", ",", "everyone", " ", "just", " ", "testing"]
words = words.map(function properCase(word){
return word.substr(0,1).toUpperCase() + word.substr(1).toLowerCase();
});
var sentence = words.join(''); // rejoined, with the original spacing and punctuation preserved
Note: For string manipulation, replace is usually the more direct (and often faster) option, but split/join can make for cleaner, more descriptive code.