JS str replace Unicode aware

I am sure there is probably a dupe of this here somewhere, but if so I cannot seem to find it, nor can I correctly glue together the pieces I did find to get what I need. I am using JavaScript and need the following:
1) Replace the first character of a string with its Unicode-aware capitalization UNLESS the next (second) character is a - OR ` or ' (minus/dash, backtick, or single quote).
I have come close with what I could find, except for getting the backtick and single quote included (I assume they need to be escaped somehow), and for what I believe to be a scope issue in the following code, because first returns undefined. I am also not sure which JS String functions are Unicode-aware:
autoCorrect = (str) => {
  return str.replace(/^./, function(first) {
    // if next char is not - OR ` OR ' <- not sure how to handle backtick and quote
    if (str.charAt(1) != '-') {
      return first.toUpperCase(); // first is undefined here - scope??
    }
  });
}
Any help is appreciated!

Internally, JavaScript strings are sequences of UTF-16 code units (often loosely described as UCS-2), not UTF-8.
Handling Unicode in JavaScript isn't particularly beautiful, but it is possible. It becomes particularly ugly with surrogate pairs such as "🐱", which take up two code units, but a for...of loop iterates by code point and handles them correctly. Never use indices on strings that may contain such characters, as you might get only one half of a surrogate pair (which breaks the character).
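To illustrate the surrogate-pair point, a small sketch: the cat emoji is one code point but two UTF-16 code units, so `length` and index access see two units, while for...of and the spread operator (both driven by the string iterator) see a single symbol.

```javascript
const cat = "🐱";

console.log(cat.length);        // 2: two UTF-16 code units
console.log([...cat].length);   // 1: the string iterator yields code points

// Indexing returns a lone high surrogate, which is not a valid character by itself.
const unit = cat.charCodeAt(0);
console.log(unit >= 0xd800 && unit <= 0xdbff); // true

// for...of uses the same iterator as the spread operator.
for (const symbol of cat) {
  console.log(symbol === cat);  // true: the whole emoji in one step
}
```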
This should handle Unicode well and do what you want:
function autoCorrect(string) {
  let i = 0, firstSymbol;
  const blacklist = ["-", "`", "'"];
  for (const symbol of string) {
    if (i === 0) {
      firstSymbol = symbol;
    }
    else if (i === 1 && blacklist.some(char => char === symbol)) {
      return string;
    }
    else {
      const rest = string.substring(firstSymbol.length);
      return firstSymbol.toUpperCase() + rest;
    }
    ++i;
  }
  return string.toUpperCase();
}
Tests
console.assert(autoCorrect("δα") === "Δα");
console.assert(autoCorrect("🐱") === "🐱");
console.assert(autoCorrect("d") === "D");
console.assert(autoCorrect("t-minus-one") === "t-minus-one");
console.assert(autoCorrect("t`minus`one") === "t`minus`one");
console.assert(autoCorrect("t'minus'one") === "t'minus'one");
console.assert(autoCorrect("t^minus^one") === "T^minus^one");
console.assert(autoCorrect("t_minus_one") === "T_minus_one");
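The question's own replace() approach can also be made to work. A minimal sketch, not the answer's method: `first` is in fact defined inside the callback; the problem is that the callback returns nothing when the second character is blacklisted, and replace() converts that missing return value into the string "undefined". Returning `first` unchanged in that case fixes it, and keeping the three characters in an array means the backtick and quote need no escaping. The `u` flag makes `.` match a whole code point. The names `autoCorrectWithReplace` and `BLACKLIST` are illustrative.

```javascript
// Sketch of the question's replace()-based approach with the bug fixed.
// BLACKLIST is an illustrative name; the characters come from the question.
const BLACKLIST = ["-", "`", "'"];

const autoCorrectWithReplace = (str) =>
  // The u flag makes "." match a whole code point, so an astral symbol
  // such as "🐱" is passed to the callback in one piece.
  str.replace(/^./u, (first) => {
    // All blacklisted characters are single UTF-16 code units, so reading one
    // code unit right after `first` is enough for this comparison.
    const next = str.charAt(first.length);
    // Always return a string: returning nothing would insert "undefined".
    return BLACKLIST.includes(next) ? first : first.toUpperCase();
  });

console.log(autoCorrectWithReplace("t-minus-one")); // t-minus-one
console.log(autoCorrectWithReplace("δα"));          // Δα
```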

Related

Convert camel case to sentence case in javascript

I found myself needing to convert camel case strings to sentence case with sane acronym support, and a Google search for ideas led me to the following SO post:
Convert camelCaseText to Sentence Case Text
That question is actually asking about title case, not sentence case, so I came up with the following solution, which others may find helpful or be able to improve. It uses ES6, which is acceptable for me and can easily be polyfilled if there's some horrible IE requirement.

The code below uses capitalised notation for acronyms. I don't agree with Microsoft's recommendation of capitalising only the first letter of acronyms longer than two characters, so this expects the whole acronym to be capitalised, even at the start of the string (which technically means it's not camel case, but it gives sane, controllable output). Multiple consecutive acronyms can be separated with _ (e.g. parseDBM_XML -> Parse DBM XML).
function camelToSentenceCase(str) {
  return str.split(/([A-Z]|\d)/).map((v, i, arr) => {
    // If first block then capitalise 1st letter regardless
    if (!i) return v.charAt(0).toUpperCase() + v.slice(1);
    // Skip empty blocks
    if (!v) return v;
    // Underscore substitution
    if (v === '_') return " ";
    // We have a capital or number
    if (v.length === 1 && v === v.toUpperCase()) {
      const previousCapital = !arr[i-1] || arr[i-1] === '_';
      const nextWord = i+1 < arr.length && arr[i+1] && arr[i+1] !== '_';
      const nextTwoCapitalsOrEndOfString = i+3 > arr.length || !arr[i+1] && !arr[i+3];
      // Insert space
      if (!previousCapital || nextWord) v = " " + v;
      // Start of word or single letter word
      if (nextWord || (!previousCapital && !nextTwoCapitalsOrEndOfString)) v = v.toLowerCase();
    }
    return v;
  }).join("");
}
// ----------------------------------------------------- //
var testSet = [
  'camelCase',
  'camelTOPCase',
  'aP2PConnection',
  'JSONIsGreat',
  'thisIsALoadOfJSON',
  'parseDBM_XML',
  'superSimpleExample',
  'aGoodIPAddress'
];
testSet.forEach(function(item) {
  console.log(item, '->', camelToSentenceCase(item));
});

JavaScript - Regex to remove code / special characters / numbers etc

Edit: answer suggested by Wiktor Stribiżew:
function myValidate(word) {
  return (word.length === 1 || /[^A-Z]/i.test(word)) ? true : false;
}
Hello, during the creation of an array I have a function that prevents words with certain characters etc. from being added to the array:
function myValidate(word) {
  // No one letter words
  if (word.length === 1) {
    return true;
  }
  if (word.indexOf('^') > -1 || word.indexOf('$') > -1) {
    return true;
  }
  return false;
}
This doesn't seem like the proper way of going about it. I've been looking into a regex that would handle it but haven't managed to implement one; I've tried numerous things like:
if (word.match('/[^A-Za-z]+/g') ) {
  return true;
}
Can someone shed some light on the proper way of handling this?

I suggest using a simpler solution:
function myValidate(word) {
  return !(word.length === 1 || /[^A-Z]/i.test(word));
}
var words = ["Fat", "Gnat", "x3-2741996", "1996", "user[50]", "definitions(edit)", "synopsis)"];
document.body.innerHTML = JSON.stringify(words.filter(x => myValidate(x)));
Where:
word.length === 1 checks for the string length
/[^A-Z]/i.test(word) checks if there is a non-ASCII-letter symbol in the string
If any of the above condition is met, the word is taken out of the array. The rest remains.

EDIT: using test instead of match
You want to use test() because it returns a boolean telling you whether the regex matches. match(), on the other hand, returns an array of matches, or null when there are none, so it only works as a condition through truthiness coercion. This is not what you want.
To sum it all up, you can just use this one-liner (no if needed and no quotes either; it cannot get any simpler):
return /^[a-zA-Z][a-zA-Z]+$/.test(word); // letters only, at least two
You should whitelist characters instead of blacklisting. That's one of the principles in security. In your case, don't tell what is wrong, but tell what is right:
if (/^[a-zA-Z][a-zA-Z]+$/.test(word)) { // letters only, at least two
  return false;
}
This will return false for all words of at least two characters that contain ONLY [a-zA-Z] characters. I guess this is what you want.
Your regex, instead, looked for illegal characters by negating the character group with the leading ^.
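A short sketch of the points above, using the question's failed attempt: a quoted pattern passed to match() is compiled as regex source, so the slashes and the g become literal characters. match() returns an array or null, while test() returns a boolean.

```javascript
const word = "user[50]";

// The question's attempt: the quoted string becomes the pattern "/[^A-Za-z]+/g",
// which only matches a literal "/" followed by non-letters and then "/g".
console.log(word.match('/[^A-Za-z]+/g'));   // null

// A regex literal finds the run of non-letters; match() returns an array.
console.log(word.match(/[^A-Za-z]+/g));     // an array containing "[50]"

// test() answers the yes/no question directly with a boolean.
console.log(/[^A-Za-z]/.test(word));        // true
console.log(/[^A-Za-z]/.test("Gnat"));      // false
```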
Two recommendations:
Just use regex in a positive way (without negation) and it'll be a lot easier to understand.
Also, validation functions normally return true for good data and false for bad data.
It is more readable this way:
if (validate(data))
{
// that's some good data we have here!
}
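Putting both recommendations together, a sketch of a whitelist-style validator that returns true for good data. The name `isValidWord` and the two-letter minimum (taken from the question's "no one letter words" rule) are assumptions.

```javascript
// Hypothetical helper: true for good data, i.e. only ASCII letters and at
// least two of them (the question rejects one-letter words).
const isValidWord = (word) => /^[A-Za-z]{2,}$/.test(word);

const words = ["Fat", "Gnat", "x3-2741996", "1996", "user[50]", "definitions(edit)", "synopsis)", "a"];

// filter() keeps the words the validator accepts.
const kept = words.filter(isValidWord);
console.log(kept); // Fat and Gnat only
```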

Counting a word and returning whether it is symmetric or not in JavaScript

My goal was to write a loop that takes a string, counts the letters and returns one of two responses: "this word is symmetric" or "this word is not symmetric". However, the code I wrote doesn't log anything to the console. Here's the code:
var arya = function(arraycount){
  for (arraycount.length >= 1; arraycount.length <= 100; arraycount++) {
    while (arraycount.length%2 === 0) {
      console.log("This is a symmetric word and its length is " + " " arraycount.length " units.");
      arraycount.length%2 != 0
      console.log("Not a symmetric word");
    }
  }
}
arya("Michael");

There are many ways to accomplish your goal, but here are a few. The first is a somewhat naïve approach using a for loop, and the second uses recursion. The third asks whether the string equals the reverse of the string.
iterative (for loop) function
var isPalindromeIteratively = function(string) {
  if (string.length <= 1) {
    return true;
  }
  for (var i = 0; i <= Math.floor(string.length / 2); i++) {
    if (string[i] !== string[string.length - 1 - i]) {
      return false;
    }
  }
  return true;
};
This function begins by asking whether your input string is a single character or an empty string, in which case the string is a trivial palindrome. Then the for loop is set up: starting from 0 (the first character of the string) and going to the middle character, the loop asks whether a given character is identical to its partner at the other end of the string. If the partner character is not identical, the function returns false. If the for loop finishes, every character has an identical partner, so the function returns true.
recursive function
var isPalindromeRecursively = function(string) {
  if (string.length <= 1) {
    console.log('<= 1');
    return true;
  }
  var firstChar = string[0];
  var lastChar = string[string.length - 1];
  var substring = string.substring(1, string.length - 1);
  console.log('first character: ' + firstChar);
  console.log('last character: ' + lastChar);
  console.log('substring: ' + substring);
  return (firstChar === lastChar) ? isPalindromeRecursively(substring) : false;
};
This function begins the same way as the first, by getting the trivial case out of the way. Then it tests whether the first character of the string is equal to the last character. Using the ternary operator, the function returns false if that test fails. If the test passes, the function calls itself on a substring, and everything starts over again. This substring is the original string without the first and last characters.
'reflecting' the string
var reflectivePalindrome = function(string) {
  return string === string.split('').reverse().join('');
};
This one just reverses the string and sees if it equals the input string. It relies on the reverse() method of Array, and although it's the most expressive and compact way of doing it, it's probably not the most efficient.
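One caveat, going slightly beyond the original answer: split('') splits a string into UTF-16 code units, so reversing breaks surrogate pairs such as emoji. A variant that spreads the string into code points avoids that (combining marks and other multi-code-point characters can still be reordered). The name `reflectivePalindromeByCodePoint` is illustrative.

```javascript
// Same idea as reflectivePalindrome, but [...string] splits by code point,
// so astral symbols such as "🐱" stay intact when the array is reversed.
var reflectivePalindromeByCodePoint = function(string) {
  return string === [...string].reverse().join('');
};

console.log("🐱a🐱".split('').reverse().join('') === "🐱a🐱"); // false: surrogate halves swapped
console.log(reflectivePalindromeByCodePoint("🐱a🐱"));          // true
console.log(reflectivePalindromeByCodePoint("racecar"));       // true
console.log(reflectivePalindromeByCodePoint("Michael"));       // false
```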
usage
These will return true or false, telling you whether string is a palindrome. I assumed that is what you mean when you say "symmetric." I included some debugging statements so you can trace this recursive function as it works.
The Mozilla Developer Network (MDN) offers a comprehensive guide to the JavaScript language, including how for loops and while loops work.

Convert String with Dot or Comma as decimal separator to number in JavaScript

An input element contains numbers where a comma or dot is used as the decimal separator and spaces may be used to group thousands, like this:
'1,2'
'110 000,23'
'100 1.23'
How would one convert them to a float number in the browser using JavaScript?
jQuery and jQuery UI are used. Number(string) returns NaN and parseFloat() stops at the first space or comma.

Do a replace first:
parseFloat(str.replace(',', '.').replace(/\s/g, ''))

I realise I'm late to the party, but I wanted a solution for this that properly handled digit grouping as well as different decimal separators for currencies. As none of these fully covered my use case I wrote my own solution which may be useful to others:
function parsePotentiallyGroupedFloat(stringValue) {
  stringValue = stringValue.trim();
  var result = stringValue.replace(/[^0-9]/g, '');
  if (/[,\.]\d{2}$/.test(stringValue)) {
    result = result.replace(/(\d{2})$/, '.$1');
  }
  return parseFloat(result);
}
This should strip out any non-digits and then check whether there was a decimal point (or comma) followed by two digits and insert the decimal point if needed.
It's worth noting that I aimed this specifically for currency and as such it assumes either no decimal places or exactly two. It's pretty hard to be sure about whether the first potential decimal point encountered is a decimal point or a digit grouping character (e.g., 1.542 could be 1542) unless you know the specifics of the current locale, but it should be easy enough to tailor this to your specific use case by changing \d{2}$ to something that will appropriately match what you expect to be after the decimal point.
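Following the suggestion to tailor \d{2}$, a sketch that takes the expected number of decimal places as a parameter. `parseGroupedFloat` and `decimals` are illustrative names, and the ambiguity described above remains: with decimals = 3, "1,234,567" is read as 1234.567.

```javascript
// Variant of parsePotentiallyGroupedFloat with the expected number of decimal
// places as a parameter. `parseGroupedFloat` and `decimals` are illustrative
// names, not part of the original answer. Like the original, it drops any sign.
function parseGroupedFloat(stringValue, decimals = 2) {
  const trimmed = stringValue.trim();
  // Keep digits only; grouping and decimal separators are removed here.
  let result = trimmed.replace(/[^0-9]/g, "");
  // Treat the input as having a fractional part only when a "," or "." is
  // followed by exactly `decimals` digits at the end of the string.
  const fractionAtEnd = new RegExp(`[,.]\\d{${decimals}}$`);
  if (decimals > 0 && fractionAtEnd.test(trimmed)) {
    // Re-insert a "." before the last `decimals` digits.
    result = result.replace(new RegExp(`(\\d{${decimals}})$`), ".$1");
  }
  return parseFloat(result);
}

console.log(parseGroupedFloat("110 000,23"));   // 110000.23
console.log(parseGroupedFloat("1.542"));        // 1542
console.log(parseGroupedFloat("1.542", 3));     // 1.542
```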

The perfect solution:
accounting.js is a tiny JavaScript library for number, money and currency formatting; its accounting.unformat() function parses formatted strings back into numbers.

You could replace all spaces with an empty string and the comma with a dot, and then parse it.
var str = "110 000,23";
var num = parseFloat(str.replace(/\s/g, "").replace(",", "."));
console.log(num);
I used a regex in the first one to be able to match all spaces, not just the first one.

This is the best solution:
http://numeraljs.com/
numeral().unformat('0.02'); // = 0.02

What about:
parseFloat(str.replace(/\s/g, '').replace(/\./g, '').replace(',', '.'));

All the other solutions require you to know the format in advance. I needed to detect(!) the format in every case, and this is what I ended up with:
function detectFloat(source) {
  let float = accounting.unformat(source);
  let posComma = source.indexOf(',');
  if (posComma > -1) {
    let posDot = source.indexOf('.');
    if (posDot > -1 && posComma > posDot) {
      let germanFloat = accounting.unformat(source, ',');
      if (Math.abs(germanFloat) > Math.abs(float)) {
        float = germanFloat;
      }
    } else {
      // source = source.replace(/,/g, '.');
      float = accounting.unformat(source, ',');
    }
  }
  return float;
}
This was tested with the following cases:
const cases = {
  "0": 0,
  "10.12": 10.12,
  "222.20": 222.20,
  "-222.20": -222.20,
  "+222,20": 222.20,
  "-222,20": -222.20,
  "-2.222,20": -2222.20,
  "-11.111,20": -11111.20,
};
Suggestions welcome.

Here's a self-sufficient JS function that solves this (and other) problems for most European/US locales (primarily between US/German/Swedish number chunking and formatting ... as in the OP). I think it's an improvement on (and inspired by) Slawa's solution, and has no dependencies.
function realParseFloat(s)
{
  s = s.replace(/[^\d,.-]/g, ''); // strip everything except numbers, dots, commas and negative sign
  if (navigator.language.substring(0, 2) !== "de" && /^-?(?:\d+|\d{1,3}(?:,\d{3})+)(?:\.\d+)?$/.test(s)) // if not in German locale and matches #,###.######
  {
    s = s.replace(/,/g, ''); // strip out commas
    return parseFloat(s); // convert to number
  }
  else if (/^-?(?:\d+|\d{1,3}(?:\.\d{3})+)(?:,\d+)?$/.test(s)) // either in German locale or doesn't match #,###.######, and now matches #.###,########
  {
    s = s.replace(/\./g, ''); // strip out dots
    s = s.replace(/,/g, '.'); // replace comma with dot
    return parseFloat(s);
  }
  else // try #,###.###### anyway
  {
    s = s.replace(/,/g, ''); // strip out commas
    return parseFloat(s); // convert to number
  }
}

Here is my solution that doesn't have any dependencies:
return value
  .replace(/[^\d\-.,]/g, "") // Basic sanitization. Allows '-' for negative numbers
  .replace(/,/g, ".") // Change all commas to periods
  .replace(/\.(?=.*\.)/g, ""); // Remove all periods except the last one
(I left out the conversion to a number - that's probably just a parseFloat call if you don't care about JavaScript's precision problems with floats.)
The code assumes that:
Only commas and periods are used as decimal separators. (I'm not sure if locales exist that use other ones.)
The decimal part of the string does not use any separators.
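To try the snippet above, it can be wrapped in a function and followed by parseFloat; the function name `parseNumberString` is hypothetical. Note that, under the stated assumptions, an input with only a thousands separator such as "1,000" is read as 1.

```javascript
// Hypothetical wrapper around the snippet above.
function parseNumberString(value) {
  const normalized = value
    .replace(/[^\d\-.,]/g, "")   // basic sanitization, keeps '-' for negatives
    .replace(/,/g, ".")          // change all commas to periods
    .replace(/\.(?=.*\.)/g, ""); // remove all periods except the last one
  return parseFloat(normalized);
}

console.log(parseNumberString("110 000,23"));  // 110000.23
console.log(parseNumberString("1.000.000,5")); // 1000000.5
console.log(parseNumberString("-1,234.5"));    // -1234.5
console.log(parseNumberString("1,000"));       // 1 (ambiguous input, see assumptions)
```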

try this...
var withComma = "23,3";
var withFloat = "23.3";
var compareValue = function(str){
  var fixed = parseFloat(str.replace(',', '.'));
  if (fixed > 0) {
    console.log(true);
  } else {
    console.log(false);
  }
}
compareValue(withComma);
compareValue(withFloat);

This answer accepts some edge cases that others don't:
Only thousands separators: 1.000.000 => 1000000
Exponentials: 1.000,5e3 => 1000.5e3 (1,000,500)
The test cases below cover these and other inputs; the output for each one is rendered into an HTML table.
const REGEX_UNWANTED_CHARACTERS = /[^\d\-.,]/g
const REGEX_DASHES_EXCEPT_BEGINNING = /(?!^)-/g
const REGEX_PERIODS_EXCEPT_LAST = /\.(?=.*\.)/g
export function formatNumber(number) {
  // Handle exponentials
  if ((number.match(/e/g) ?? []).length === 1) {
    const numberParts = number.split('e')
    return `${formatNumber(numberParts[0])}e${formatNumber(numberParts[1])}`
  }
  const sanitizedNumber = number
    .replace(REGEX_UNWANTED_CHARACTERS, '')
    .replace(REGEX_DASHES_EXCEPT_BEGINNING, '')
  // Handle only thousands separator
  if (
    ((sanitizedNumber.match(/,/g) ?? []).length >= 2 && !sanitizedNumber.includes('.')) ||
    ((sanitizedNumber.match(/\./g) ?? []).length >= 2 && !sanitizedNumber.includes(','))
  ) {
    return sanitizedNumber.replace(/[.,]/g, '')
  }
  // Treat commas as periods, then keep only the last period as the decimal point
  return sanitizedNumber.replace(/,/g, '.').replace(REGEX_PERIODS_EXCEPT_LAST, '')
}
function formatNumberToNumber(number) {
  return Number(formatNumber(number))
}
const testCases = [
'1',
'1.',
'1,',
'1.5',
'1,5',
'1,000.5',
'1.000,5',
'1,000,000.5',
'1.000.000,5',
'1,000,000',
'1.000.000',
'-1',
'-1.',
'-1,',
'-1.5',
'-1,5',
'-1,000.5',
'-1.000,5',
'-1,000,000.5',
'-1.000.000,5',
'-1,000,000',
'-1.000.000',
'1e3',
'1e-3',
'1e',
'-1e',
'1.000e3',
'1,000e-3',
'1.000,5e3',
'1,000.5e-3',
'1.000,5e1.000,5',
'1,000.5e-1,000.5',
'',
'a',
'a1',
'a-1',
'1a',
'-1a',
'1a1',
'1a-1',
'1-',
'-',
'1-1'
]
document.getElementById('tbody').innerHTML = testCases.reduce((total, input) => {
return `${total}<tr><td>${input}</td><td>${formatNumber(input)}</td></tr>`
}, '')
<table>
<thead><tr><th>Input</th><th>Output</th></tr></thead>
<tbody id="tbody"></tbody>
</table>

Going from a number to a currency string is easy with Number.prototype.toLocaleString. However, the reverse seems to be a common problem: JavaScript has no built-in inverse of toLocaleString.
In this particular question the thousands separator is a space " ", but in many cases it can be a period "." and the decimal point can be a comma ",", as in 1 000 000,00 or 1.000.000,00. This is how I convert such a string into a proper floating point number:
var price = "1 000.000,99",
    value = +price.replace(/(\.|\s)|(\,)/g, (m, p1, p2) => p1 ? "" : ".");
console.log(value);
So the replacer callback takes "1.000.000,00" and converts it into "1000000.00". After that + in the front of the resulting string coerces it into a number.
This function is actually quite handy. For instance, if you change the callback to return "," instead of "" when p1 matches, an input of 1.000.000,00 results in 1,000,000.00.
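On the question of separators: engines with ES2018 Intl support do expose a locale's group and decimal separators through Intl.NumberFormat.prototype.formatToParts. A sketch that uses them, assuming the input is well formed for the given locale and uses exactly the locale's characters (French, for example, groups with a no-break space character, so an ordinary typed space would not be stripped). `parseLocaleNumber` is a hypothetical name, and the de-DE result assumes the runtime ships full locale data.

```javascript
// Hypothetical helper: parse a localized number string using the separators
// that Intl reports for `locale`.
function parseLocaleNumber(text, locale) {
  // Format a sample number and read the separator characters back out.
  const parts = new Intl.NumberFormat(locale).formatToParts(1234567.8);
  const group = parts.find((p) => p.type === "group")?.value ?? "";
  const decimal = parts.find((p) => p.type === "decimal")?.value ?? ".";
  const normalized = text
    .split(group).join("")  // remove every grouping separator
    .replace(decimal, "."); // use "." as the decimal point for parseFloat
  return parseFloat(normalized);
}

console.log(parseLocaleNumber("1.000.000,99", "de-DE")); // 1000000.99
console.log(parseLocaleNumber("1,000,000.99", "en-US")); // 1000000.99
```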

Count number of matches of a regex in Javascript

I wanted to write a regex to count the number of spaces/tabs/newlines in a chunk of text, so I naively wrote the following:
numSpaces : function(text) {
  return text.match(/\s/).length;
}
For some unknown reason it always returns 1. What is the problem with the above statement? I have since solved the problem with the following:
numSpaces : function(text) {
  return (text.split(/\s/).length - 1);
}

tl;dr: Generic Pattern Counter
// THIS IS WHAT YOU NEED
const count = (str) => {
  const re = /YOUR_PATTERN_HERE/g
  return ((str || '').match(re) || []).length
}
For those that arrived here looking for a generic way to count the number of occurrences of a regex pattern in a string, and don't want it to fail if there are zero occurrences, this code is what you need. Here's a demonstration:
/*
 * Example
 */
const count = (str) => {
  const re = /[a-z]{3}/g
  return ((str || '').match(re) || []).length
}
const str1 = 'abc, def, ghi'
const str2 = 'ABC, DEF, GHI'
console.log(`'${str1}' has ${count(str1)} occurrences of pattern '/[a-z]{3}/g'`)
console.log(`'${str2}' has ${count(str2)} occurrences of pattern '/[a-z]{3}/g'`)
Original Answer
The problem with your initial code is that you are missing the global flag:
>>> 'hi there how are you'.match(/\s/g).length;
4
Without the g flag, match() returns only the first match (plus any capture groups), which is why the length is always 1.
Also note that your regex counts each whitespace character separately, so a run of consecutive spaces is counted more than once:
>>> 'hi  there'.match(/\s/g).length;
2
If that is not desirable, you could do this:
>>> 'hi  there'.match(/\s+/g).length;
1

As mentioned in my earlier answer, you can use RegExp.exec() to iterate over all matches and count each occurrence; the only advantage is lower memory use, because overall it's about 20% slower than using String.match().
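function numSpaces(text) {
  var re = /\s/g,
      count = 0;
  // exec() advances re.lastIndex on each call and returns null after the last match
  while (re.exec(text) !== null) {
    ++count;
  }
  return count;
}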
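A related option in ES2020 and later is String.prototype.matchAll, which returns an iterator over all matches (it throws a TypeError if the regex lacks the g flag) and, unlike match(), never returns null, so no fallback array is needed. A sketch; `countMatchesAll` is a hypothetical name.

```javascript
// Hypothetical helper built on String.prototype.matchAll (ES2020).
// `re` must have the g flag, otherwise matchAll throws a TypeError.
function countMatchesAll(text, re) {
  let count = 0;
  for (const _match of text.matchAll(re)) {
    ++count; // one iteration per match; no array of all matches is built
  }
  return count;
}

console.log(countMatchesAll("hi there how are you", /\s/g)); // 4
console.log(countMatchesAll("foobarbaz", /\s/g));            // 0
```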

(('a a a').match(/b/g) || []).length; // 0
(('a a a').match(/a/g) || []).length; // 3
Based on https://stackoverflow.com/a/48195124/16777 but fixed to actually work in zero-results case.

Here is a similar solution to Paolo Bergantino's answer, but with modern operators. I'll explain below.
const matchCount = (str, re) => {
  return str?.match(re)?.length ?? 0;
};
// usage
let numSpaces = matchCount(undefined, /\s/g);
console.log(numSpaces); // 0
numSpaces = matchCount("foobarbaz", /\s/g);
console.log(numSpaces); // 0
numSpaces = matchCount("foo bar baz", /\s/g);
console.log(numSpaces); // 2
?. is the optional chaining operator. It allows you to chain calls as deep as you want without having to worry about whether there is an undefined/null along the way. Think of str?.match(re) as
if (str !== undefined && str !== null) {
  return str.match(re);
} else {
  return undefined;
}
This is slightly different from Paolo Bergantino's. Theirs is written like this: (str || ''). That means if str is falsy, return ''. 0 is falsy. document.all is falsy. In my opinion, if someone were to pass those into this function as a string, it would probably be because of programmer error. Therefore, I'd rather be told I'm doing something nonsensical than troubleshoot why I keep getting a length of 0.
?? is the nullish coalescing operator. Think of it as || but more specific. If the left hand side of || evaluates to falsy, it executes the right-hand side. But ?? only executes if the left-hand side is undefined or null.
Keep in mind, the nullish coalescing operator in ?.length ?? 0 will return the same thing as using ?.length || 0. The difference is, if length returns 0, it won't execute the right-hand side... but the result is going to be 0 whether you use || or ??.
Honestly, in this situation I would probably change it to || because more JavaScript developers are familiar with that operator. Maybe someone could enlighten me on benefits of ?? vs || in this situation, if any exist.
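To the question of ?? versus ||: the two only differ when the left-hand side is falsy but not null or undefined, such as 0, "" or false. match() returns null rather than an empty array when nothing matches, so match(re)?.length is either a positive number or undefined, and both operators give the same result here. A quick illustration of where they do differ:

```javascript
// Both operators fall back to the right-hand side for undefined (and null).
const orUndefined = undefined || 0;      // 0
const nullishUndefined = undefined ?? 0; // 0
// They differ for other falsy values: || replaces them, ?? keeps them.
const orZero = 0 || 10;                  // 10
const nullishZero = 0 ?? 10;             // 0
const orEmpty = "" || "default";         // "default"
const nullishEmpty = "" ?? "default";    // ""
console.log(orUndefined, nullishUndefined, orZero, nullishZero, orEmpty, nullishEmpty);
```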
Lastly, I changed the signature so the function can be used for any regex.
Oh, and here is a TypeScript version:
const matchCount = (str: string | null | undefined, re: RegExp): number => {
  return str?.match(re)?.length ?? 0;
};

('my string'.match(/\s/g) || []).length;

This is certainly something that has a lot of traps. I was working with Paolo Bergantino's answer, and realising that even that has some limitations. I found working with string representations of dates a good place to quickly find some of the main problems. Start with an input string like this:
'12-2-2019 5:1:48.670'
and set up Paolo's function like this:
function count(re, str) {
  if (typeof re !== "string") {
    return 0;
  }
  re = (re === '.') ? ('\\' + re) : re;
  var cre = new RegExp(re, 'g');
  return ((str || '').match(cre) || []).length;
}
I wanted the regular expression to be passed in so that the function is more reusable. Secondly, I wanted the parameter to be a string, so that the client doesn't have to build the regex but can simply match on a string, like a standard string utility class method.
Now, here you can see that I'm dealing with issues with the input. With the following:
if (typeof re !== "string") {
  return 0;
}
I am ensuring that the input isn't anything like the literal 0, false, undefined, or null, none of which are strings. Since these literals are not in the input string, there should be no matches, but it should match '0', which is a string.
With the following:
re = (re === '.') ? ('\\' + re) : re;
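I am dealing with the fact that the RegExp constructor interprets the string '.' as the match-any-character metacharacter rather than a literal dot, so it has to be escaped as \. (correct regex behaviour, but not what a caller of a string utility expects).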
Finally, because I am using the RegExp constructor, I need to give it the global 'g' flag so that it counts all matches, not just the first one, similar to the suggestions in other posts.
I realise that this is an extremely late answer, but it might be helpful to someone stumbling along here. BTW here's the TypeScript version:
function count(re: string, str: string): number {
  if (typeof re !== 'string') {
    return 0;
  }
  re = (re === '.') ? ('\\' + re) : re;
  const cre = new RegExp(re, 'g');
  return ((str || '').match(cre) || []).length;
}
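Escaping only "." still leaves the other metacharacters as traps: with the function above, count('+', str) throws a SyntaxError ("nothing to repeat"), as do '*' and '?'. A sketch that escapes every regex metacharacter, using the escape pattern given in MDN's regular expressions guide. `countLiteral` is a hypothetical name, and it returns 0 for an empty needle rather than counting the empty matches between characters.

```javascript
// Hypothetical helper: count literal occurrences of `needle` in `haystack`.
function countLiteral(needle, haystack) {
  if (typeof needle !== "string" || needle === "") {
    return 0; // non-strings never match; an empty needle is treated as no match
  }
  // Prefix every regex metacharacter with a backslash so it is matched literally.
  const escaped = needle.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp(escaped, "g");
  return ((haystack || "").match(re) || []).length;
}

const date = "12-2-2019 5:1:48.670";
console.log(countLiteral(".", date));     // 1
console.log(countLiteral("-", date));     // 2
console.log(countLiteral(":", date));     // 2
console.log(countLiteral("+", "1+1+1"));  // 2
```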

Using modern syntax avoids the need to create a dummy array just to get a length of 0 when there are no matches:
const countMatches = (exp, str) => str.match(exp)?.length ?? 0;
exp should be a RegExp with the g flag (without it, match() returns only the first match) and str must be a string.

How about something like this:
function isint(str){
  var digits = str.match(/\d/g) || []; // match() returns null when there are no digits
  return str.length > 0 && digits.length === str.length;
}
