Check if a single character is a whitespace? - javascript

What is the best way to check if a single character is a whitespace?
I know how to check this through a regex.
But I am not sure if this is the best way if I only have a single character.
Isn't there a better way (concerning performance) for checking if it's a whitespace?
If I do something like this, I guess I would miss whitespace characters such as tabs?
if (ch == ' ') {
...
}

If you only want to test for certain whitespace characters, do so manually; otherwise, use a regular expression, i.e.
/\s/.test(ch)
Keep in mind that different browsers match different characters, e.g. in Firefox, \s is equivalent to (source)
[ \f\n\r\t\v\u00A0\u2028\u2029]
whereas in Internet Explorer, it should be (source)
[ \f\n\r\t\v]
The MSDN page actually forgot the space ;)
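To see what \s matches in the engine you are actually running, a quick probe like the following may help. The sample list and the helper name describeWhitespace are illustrative assumptions, not part of the answer above, and older browsers may report different results, as described.

```javascript
// Hypothetical helper: reports whether /\s/ matches each sample character.
const samples = {
  'space': ' ',
  'tab': '\t',
  'line feed': '\n',
  'no-break space': '\u00A0',
  'line separator': '\u2028',
  'byte order mark': '\uFEFF',
  'zero width space': '\u200B',
  'letter a': 'a',
};

function describeWhitespace(chars) {
  const result = {};
  for (const [name, ch] of Object.entries(chars)) {
    result[name] = /\s/.test(ch);
  }
  return result;
}

console.log(describeWhitespace(samples));
```

In a current ECMAScript engine, the no-break space and the byte order mark should be reported as whitespace, while the zero width space (U+200B) should not.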

The regex approach is a solid way to go. But here's what I do when I'm lazy and forget the proper regex syntax:
str.trim() === '' ? alert('just whitespace') : alert('not whitespace');

I have referenced the set of whitespace characters matched by PHP's trim function without shame (minus the null byte, I have no idea how well browsers will handle that).
if (' \t\n\r\v'.indexOf(ch) > -1) {
// ...
}
This looks like premature optimization to me though.

This covers spaces, tabs and newlines:
if ((ch == ' ') || (ch == '\t') || (ch == '\n'))
This should be best for performance. Put the whitespace character you expect to be most likely first.
If performance is really important, it is probably best to consider the bigger picture rather than individual operations like this...

While it's not entirely correct, I use this pragmatic and fast solution:
if (ch.charCodeAt(0) <= 32) {...

@jake's answer above, using the trim() method, is the best option. If you have a single character ch as a numeric character code (for example, a hex number):
String.fromCharCode(ch).trim() === ""
will return true for all whitespace characters.
Unfortunately, a comparison like <= 32 will not catch all whitespace characters. For example, 0xA0 (non-breaking space) is treated as whitespace in JavaScript and yet it is > 32. Searching using indexOf() with a string like "\t\n\r\v" will be incorrect for the same reason.
Here's a short JS snippet that illustrates this: https://repl.it/#saleemsiddiqui/JavascriptStringTrim
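As a rough illustration of the point above, here is a sketch that runs three of the single-character checks from this thread side by side. The names byRegex, byTrim and byCharCode and the sample characters are assumptions made for this example.

```javascript
// Three single-character checks discussed in this thread, side by side.
const byRegex = (ch) => /\s/.test(ch);
const byTrim = (ch) => ch.trim() === '';
const byCharCode = (ch) => ch.charCodeAt(0) <= 32;

// Space, tab, no-break space, NUL, and an ordinary letter.
const cases = [' ', '\t', '\u00A0', '\u0000', 'a'];
for (const ch of cases) {
  const code = ch.charCodeAt(0).toString(16).padStart(4, '0');
  console.log(`U+${code}`, byRegex(ch), byTrim(ch), byCharCode(ch));
}
```

The char-code check misses U+00A0 and wrongly accepts control characters such as NUL. Note also that the trim() check reports true for an empty string, so it is only safe when you know you have exactly one character.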

Based on this benchmark, it appears the following method would be most performant:
For Performance:
function isWhitespace(c) {
return c === ' '
|| c === '\n'
|| c === '\t'
|| c === '\r'
|| c === '\f'
|| c === '\v'
|| c === '\u00a0'
|| c === '\u1680'
|| c === '\u2000'
|| c === '\u200a'
|| c === '\u2028'
|| c === '\u2029'
|| c === '\u202f'
|| c === '\u205f'
|| c === '\u3000'
|| c === '\ufeff'
}
There are, no doubt, some cases where you might want this level of performance (I'm working on a markdown converter and am trying to squeeze out as much performance as possible). However, in most cases, this level of optimization is unnecessary. In such cases, I would recommend something like this:
For Simplicity:
const whitespaceRe = /\s/
function isWhitespace(c) {
return whitespaceRe.test(c)
}
This is more readable, and less likely to have a typo and, therefore, less likely to have a bug.
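Because a hand-written list is easy to get subtly wrong, it may be worth checking it against /\s/ once. The sketch below does that for every UTF-16 code unit. Both function names are hypothetical, and isWhitespaceByRange is a range-based variant written for this example (it covers U+2000 through U+200A as a range), not the benchmarked function above.

```javascript
// Range-based variant (an assumption for this example, not the benchmarked code).
function isWhitespaceByRange(c) {
  return c === ' '
    || (c >= '\t' && c <= '\r')          // \t \n \v \f \r (U+0009..U+000D)
    || c === '\u00a0'
    || c === '\u1680'
    || (c >= '\u2000' && c <= '\u200a')  // en quad through hair space
    || c === '\u2028'
    || c === '\u2029'
    || c === '\u202f'
    || c === '\u205f'
    || c === '\u3000'
    || c === '\ufeff';
}

// Collect every UTF-16 code unit where the predicate and /\s/ disagree.
function findMismatches(predicate) {
  const mismatches = [];
  for (let code = 0; code <= 0xffff; code++) {
    const ch = String.fromCharCode(code);
    if (predicate(ch) !== /\s/.test(ch)) {
      mismatches.push('U+' + code.toString(16).padStart(4, '0'));
    }
  }
  return mismatches;
}

console.log(findMismatches(isWhitespaceByRange));
```

An empty array means the predicate agrees with the engine's \s; any listed code points are the ones to look at.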

var testWhite = function (x) {
    // Matches a string that consists of exactly one whitespace character
    var white = /^\s$/;
    return white.test(x.charAt(0));
};
This small function will allow you to enter a string of variable length as an argument and it will report "true" if the first character is white space or "false" otherwise. You can easily put any character from a string into the function using the indexOf or charAt methods. Examples:
var str = "Today I wish I were not in Afghanistan.";
testWhite(str.charAt(9)); // This would test character "i" and would return false.
testWhite(str.charAt(str.indexOf("I") + 1)); // This would return true.

function hasWhiteSpace(s) {
return /\s/g.test(s);
}
This will work. Alternatively, you can use indexOf(), although this version only detects the space character:
function hasWhiteSpace(s) {
return s.indexOf(' ') >= 0;
}

How about this one (note that the L suffixes are Java long literals, so this is not valid JavaScript as written):
((1L << ch) & ((ch - 64) >> 31) & 0x100002600L) != 0L

Related

Javascript - is there a better way to check an string instead of indexOf

I use this code, and my question is whether there is a better way to check a string than indexOf:
if(documentFile.ending.indexOf('pdf') > -1 || documentFile.ending.indexOf('PDF') > -1 || documentFile.ending.indexOf('docx') > -1)
ES6 added String.prototype.includes, which returns a boolean. Use:
if ( documentFile.ending.includes('pdf') ) { }
Or for regex:
if ( documentFile.ending.match(/your-regex/) ) { }
Example spec: https://developer.mozilla.org/nl/docs/Web/JavaScript/Reference/Global_Objects/String/includes
If you are using ES6 then you may want to look at String.prototype.includes
var str = 'To be, or not to be, that is the question.';
console.log(str.includes('To be')); // true
In ES6, the better option is to use "includes";
otherwise, use a regex:
if(/pdf/i.test(documentFile.ending))
Well, indexOf is really fast, a lot faster than using a regular expression. But something like /pdf$/i.test(str) lets you test the end as well as giving you case-insensitivity. But you could be more precise:
function endsWith(str, ending) {
return str != null
&& ending != null
&& ending.length <= str.length
&& str.lastIndexOf(ending) === str.length - ending.length;
}
Note the ending.length <= str.length which is there so that you don't do something like endsWith("", "a") and get true. :)
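If ES2015 is available, the built-in String.prototype.endsWith can replace the hand-rolled helper. The sketch below combines it with lowercasing for a case-insensitive extension check. The function name hasExtension and the assumption that the value is a file name containing a dot are mine, not the question's.

```javascript
// Hypothetical helper: true if `name` ends with one of `extensions`,
// ignoring case. Uses the built-in String.prototype.endsWith (ES2015).
function hasExtension(name, extensions) {
  if (typeof name !== 'string') return false;
  const lower = name.toLowerCase();
  return extensions.some((ext) => lower.endsWith('.' + ext.toLowerCase()));
}

console.log(hasExtension('report.PDF', ['pdf', 'docx']));    // true
console.log(hasExtension('pdf-notes.txt', ['pdf', 'docx'])); // false
```

Unlike indexOf or includes, this does not match "pdf" appearing in the middle of the name.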

JS str replace Unicode aware

I am sure there is probably a dupe of this here somewhere, but if so I cannot seem to find it, nor can I glue the pieces together correctly from what I could find to get what I need. I am using JavaScript and need the following:
1) Replace the first character of a string with its Unicode-aware capitalization UNLESS the next (second) character is a - OR ` or ' (minus/dash, backtick, or single quote).
I have come close with what I could find, except for getting the backtick and single quote included (assuming they need to be escaped somehow) and what I believe to be a scope issue with the following, because first returns undefined. I am also not positive which JS/String functions are Unicode-aware:
autoCorrect = (str) => {
return str.replace(/^./, function(first) {
// if next char is not - OR ` OR ' <- not sure how to handle caret and quote
if(str.charAt(1) != '-' ) {
return first.toUpperCase(); // first is undefined here - scope??
}
});
}
Any help is appreciated!
Internally, JavaScript strings are sequences of UTF-16 code units, not UTF-8.
Handling Unicode in JavaScript isn't particularly beautiful, but it is possible. It becomes particularly ugly with surrogate pairs such as "🐱", but the for...of loop can handle that. Never index directly into Unicode strings, as you might get only one half of a surrogate pair (which breaks the character).
This should handle Unicode well and do what you want:
function autoCorrect(string) {
let i = 0, firstSymbol;
const blacklist = ["-", "`", "'"];
for (const symbol of string) {
if (i === 0) {
firstSymbol = symbol;
}
else if (i === 1 && blacklist.some(char => char === symbol)) {
return string;
}
else {
const rest = string.substring(firstSymbol.length);
return firstSymbol.toUpperCase() + rest;
}
++i;
}
return string.toUpperCase();
}
Tests
console.assert(autoCorrect("δα") === "Δα");
console.assert(autoCorrect("🐱") === "🐱");
console.assert(autoCorrect("d") === "D");
console.assert(autoCorrect("t-minus-one") === "t-minus-one");
console.assert(autoCorrect("t`minus`one") === "t`minus`one");
console.assert(autoCorrect("t'minus'one") === "t'minus'one");
console.assert(autoCorrect("t^minus^one") === "T^minus^one");
console.assert(autoCorrect("t_minus_one") === "T_minus_one");
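A shorter variant of the same idea, as a sketch: array destructuring on a string uses the string iterator, which yields whole code points, so it can stand in for the explicit for...of loop. The function name capitalizeFirst is hypothetical.

```javascript
// Sketch: destructuring a string iterates by code point, so a surrogate
// pair such as an emoji is never split in half.
function capitalizeFirst(str) {
  const [first, second] = str;
  if (first === undefined) return str; // empty string
  if (second === '-' || second === '`' || second === "'") return str;
  // first.length is 2 for a surrogate pair, 1 otherwise
  return first.toUpperCase() + str.slice(first.length);
}

console.log(capitalizeFirst('δα'));          // Δα
console.log(capitalizeFirst('t-minus-one')); // t-minus-one
```

It is meant to behave like autoCorrect on the test inputs above; it has not been checked beyond those.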

if .html() has specific value

this might be a very basic question, but I would like to know how I can find out if .html() has a particular value (in this case a string). An example:
<p id="text">Hello this is a long text with numbers like 01234567</p>
and I would like to ask
var $text = $('#text');
if ($text.html() == '01234567')
Of course this would not work. But how can I chain another method onto .html() that asks something like
if($text.html().contains() == '01234567');
It is important to say that in my case I will definitely search for things that are separated with a space, not like withnumberslike01234567, but it would be interesting if that worked as well.
Thanks in advance!
(' ' + document.getElementById('text').textContent + ' ').indexOf(' 01234567 ') != -1
Fixes problem with the text at the beginning, doesn't abuse regex, and hooray for vanilla.js!
You can use indexOf:
var text = $('#text').html();
if(text.indexOf(" 01234567") != -1) {
// your logic
}
Your HTML might start with 01234567, though; in that case, you can do this:
if((' ' + text).indexOf(" 01234567") != -1) {
// your logic
}
Thanks, bjb568 and Felix Kling.
As I understand from OP, these are the test cases:
hello12348hello // false
hello 1234hello // false
hello012348 hello // false
hello 1234 hello // TRUE
1234hello // false
hello1234 // false
1234 hello // TRUE
hello 1234 // TRUE
// false
1234 // TRUE
1234 // TRUE
** Replacing " " with any other whitespace character (e.g. \t, \n, ...) should give the same results.
As OP said:
for things who are separated with a space, not like withnumberslike01234567
So, hello 01234567withnumberslike is also wrong!!!
Creating the function:
function contains(value, searchString){
    // Option 1: split on whitespace and look for an exact word match
    var words = value.split(/\s+/);
    for (var i = 0; i < words.length; i++){
        if (words[i] === searchString){
            return true;
        }
    }
    return false;
    // The alternatives below are unreachable as written;
    // use one of them in place of option 1 if you prefer.
    // Option 1a (IE9+):
    // return value.split(/\s+/).indexOf(searchString) > -1;
    // Option 2: RegExp (assumes searchString contains no regex metacharacters)
    // return (new RegExp("\\b" + searchString + "\\b")).test(value);
    // return (new RegExp("(^|\\s)" + searchString + "($|\\s)")).test(value); // this also works
    // Option 3: hardcoded RegExp
    // return /\b1234\b/.test(value);
}
See case tests here in jsFiddle
It will also accept tabs as well as spaces.
NOTE: I wouldn't worry about using RegEx. It isn't as fast as indexOf, but it is still really fast. It shouldn't be an issue unless you iterate millions of times. If that is the case, perhaps you'll need to rethink your approach, because something is probably wrong.
I would suggest thinking about compatibility: there are a lot of users still using IE8, IE7, even IE6 (almost 10% right now - April, 2014). -- No longer an issue in 2016.
Also, it's preferred to maintain code standards.
Since you are using jQuery, you can also use .text() to find the string:
var element = $(this);
var elementText = element.text();
if (contains(elementText, "1234")) {
element.text(elementText.replace("1234", "$ 1234.00"))
.addClass("matchedString");
$('#otherElement').text("matched: 1234");
}
Thanks to @Karl-AndréGagnon for the tips.
\b: a word boundary (including the start/end of the string)
^: start of the string
\s: Any whitespace character
$: end of the string
http://rubular.com/r/Ul6Ci4pcCf
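One caveat with building a RegExp from a search string is that regex metacharacters in it (such as . or $) are interpreted as pattern syntax. A minimal sketch of the whitespace-delimited check with escaping added follows. The names escapeRegExp and containsToken are hypothetical helpers, not jQuery or built-in functions.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// True if `token` occurs in `text` delimited by whitespace or string ends.
function containsToken(text, token) {
  const re = new RegExp('(^|\\s)' + escapeRegExp(token) + '(\\s|$)');
  return re.test(text);
}

console.log(containsToken('hello 1234 hello', '1234')); // true
console.log(containsToken('hello 1234hello', '1234'));  // false
console.log(containsToken('price 1234', '1.34'));       // false, "." is literal
```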
You can use the String.indexOf method in JavaScript to determine whether or not one string is contained in another. If the string passed into indexOf is not in the string, then -1 is returned. This is the behavior you should utilize.
if ($test.html().indexOf("1234567890") != -1)
//Do Something
if($text.html().indexOf('01234567') != -1) {
}
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/indexOf

Regex for a valid numeric with optional commas & dot

I am trying to allow only numerals and special characters like '.' and ',' in my text string. For that, I have tried the following code:
var pattern = /[A-Za-z]/g;
var nospecial=/[\(#\$\%_+~=*!|\":<>[\]{}`\\)';#&?$]/g;
if (!ev.ctrlKey && charCode!=9 && charCode!=8 && charCode!=36 && charCode!=37 && charCode!=38 && (charCode!=39 || (charCode==39 && text=="'")) && charCode!=40) {
console.log(text);
if (!pattern.test(text) && !nospecial.test(text)) {
console.log('if');
return true;
} else {
console.log('else');
return false;
}
}
But I am not getting the desired output. Tell me where I am wrong.
Forget trying to blacklist, just do this to allow what you want:
var pattern = /^[0-9.,]*$/;
Edit: Also, rather than just checking for numbers, commas, and dots, I'm assuming something like this does even more than you were hoping for:
var pattern = /^(0|[1-9][0-9]{0,2}(?:(,[0-9]{3})*|[0-9]*))(\.[0-9]+){0,1}$/;
So why don't you try /^[0-9,.]*$/ instead of negating the test?
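To make the difference between the two whitelist patterns concrete, here is a small comparison. The variable names and the sample inputs are assumptions chosen for illustration.

```javascript
// The permissive character whitelist versus the stricter grouped-number pattern.
const permissive = /^[0-9.,]*$/;
const strict = /^(0|[1-9][0-9]{0,2}(?:(,[0-9]{3})*|[0-9]*))(\.[0-9]+){0,1}$/;

const inputs = ['1,234.56', '1234.56', '12,34', '1,,2..3', '', 'abc'];
for (const input of inputs) {
  console.log(JSON.stringify(input), permissive.test(input), strict.test(input));
}
```

Both patterns accept '1,234.56' and '1234.56'. The permissive one also accepts '12,34', '1,,2..3' and the empty string, which the strict one rejects. Both reject 'abc'.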
You can try this:
/([0-9]+[.,]*)+/
It will match numbers with or without commas or dots.
^(?!.*[^0-9.,\n]).*$
Not sure what you mean by efficient, but this fails faster, though it takes longer to match a correct string. See demo.
http://regex101.com/r/aK2zV7/1
You could also just use the solution from this answer:
parseFloat(text.replace(/,/g, '')); // the /g flag removes every comma, not just the first

Count number of matches of a regex in Javascript

I wanted to write a regex to count the number of spaces/tabs/newline in a chunk of text. So I naively wrote the following:-
numSpaces : function(text) {
return text.match(/\s/).length;
}
For some unknown reasons it always returns 1. What is the problem with the above statement? I have since solved the problem with the following:-
numSpaces : function(text) {
return (text.split(/\s/).length -1);
}
tl;dr: Generic Pattern Counter
// THIS IS WHAT YOU NEED
const count = (str) => {
const re = /YOUR_PATTERN_HERE/g
return ((str || '').match(re) || []).length
}
For those that arrived here looking for a generic way to count the number of occurrences of a regex pattern in a string, and don't want it to fail if there are zero occurrences, this code is what you need. Here's a demonstration:
/*
* Example
*/
const count = (str) => {
const re = /[a-z]{3}/g
return ((str || '').match(re) || []).length
}
const str1 = 'abc, def, ghi'
const str2 = 'ABC, DEF, GHI'
console.log(`'${str1}' has ${count(str1)} occurrences of pattern '/[a-z]{3}/g'`)
console.log(`'${str2}' has ${count(str2)} occurrences of pattern '/[a-z]{3}/g'`)
Original Answer
The problem with your initial code is that you are missing the global flag:
>>> 'hi there how are you'.match(/\s/g).length;
4
Without the g part of the regex it will only match the first occurrence and stop there.
Also note that your regex will count two successive spaces as two matches:
>>> 'hi  there'.match(/\s/g).length;
2
If that is not desirable, you could do this:
>>> 'hi  there'.match(/\s+/g).length;
1
As mentioned in my earlier answer, you can use RegExp.exec() to iterate over all matches and count each occurrence; the advantage is limited to memory only, because on the whole it's about 20% slower than using String.match().
var re = /\s/g,
count = 0;
while (re.exec(text) !== null) {
++count;
}
return count;
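In engines that support ES2020, String.prototype.matchAll offers a similar iterate-and-count approach without managing lastIndex by hand. The following is only a sketch. The function name countMatches is hypothetical, and the helper adds the g flag itself because matchAll throws a TypeError for a non-global regex.

```javascript
// Count matches by iterating lazily instead of building an array of results.
function countMatches(text, re) {
  // matchAll requires the global flag, so add it if the caller left it out.
  const flags = re.flags.includes('g') ? re.flags : re.flags + 'g';
  const globalRe = new RegExp(re.source, flags);
  let count = 0;
  for (const _match of text.matchAll(globalRe)) {
    count++;
  }
  return count;
}

console.log(countMatches('hi there how are you', /\s/)); // 4
console.log(countMatches('a a a', /b/g));                // 0
```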
(('a a a').match(/b/g) || []).length; // 0
(('a a a').match(/a/g) || []).length; // 3
Based on https://stackoverflow.com/a/48195124/16777 but fixed to actually work in zero-results case.
Here is a similar solution to @Paolo Bergantino's answer, but with modern operators. I'll explain below.
const matchCount = (str, re) => {
return str?.match(re)?.length ?? 0;
};
// usage
let numSpaces = matchCount(undefined, /\s/g);
console.log(numSpaces); // 0
numSpaces = matchCount("foobarbaz", /\s/g);
console.log(numSpaces); // 0
numSpaces = matchCount("foo bar baz", /\s/g);
console.log(numSpaces); // 2
?. is the optional chaining operator. It allows you to chain calls as deep as you want without having to worry about whether there is an undefined/null along the way. Think of str?.match(re) as
if (str !== undefined && str !== null) {
return str.match(re);
} else {
return undefined;
}
This is slightly different from @Paolo Bergantino's. Theirs is written like this: (str || ''). That means if str is falsy, return ''. 0 is falsy. document.all is falsy. In my opinion, if someone were to pass those into this function as a string, it would probably be because of programmer error. Therefore, I'd rather be informed I'm doing something non-sensible than troubleshoot why I keep on getting a length of 0.
?? is the nullish coalescing operator. Think of it as || but more specific. If the left hand side of || evaluates to falsy, it executes the right-hand side. But ?? only executes if the left-hand side is undefined or null.
Keep in mind, the nullish coalescing operator in ?.length ?? 0 will return the same thing as using ?.length || 0. The difference is that with ??, if length is 0, the right-hand side is not evaluated, but the result is going to be 0 whether you use || or ??.
Honestly, in this situation I would probably change it to || because more JavaScript developers are familiar with that operator. Maybe someone could enlighten me on benefits of ?? vs || in this situation, if any exist.
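For what it's worth, the two operators only diverge when the left-hand side is falsy but not nullish (0, '', false, NaN). A short sketch follows. The helper names lengthOrZeroA and lengthOrZeroB are made up for this comparison.

```javascript
// Left operand falsy but not nullish: the two operators differ.
console.log(0 || 'fallback');  // 'fallback'
console.log(0 ?? 'fallback');  // 0
console.log('' || 'fallback'); // 'fallback'
console.log('' ?? 'fallback'); // ''

// Left operand nullish: both operators use the right-hand side.
console.log(null || 'fallback');      // 'fallback'
console.log(undefined ?? 'fallback'); // 'fallback'

// In the counting helper the left operand is a number or undefined, and the
// fallback for 0 is 0 anyway, so both spellings give the same result.
const lengthOrZeroA = (m) => m?.length ?? 0;
const lengthOrZeroB = (m) => m?.length || 0;
console.log(lengthOrZeroA(null), lengthOrZeroB(null)); // 0 0
```

So ?? matters mainly when 0 or '' is a legitimate value that must not be replaced by a different default.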
Lastly, I changed the signature so the function can be used for any regex.
Oh, and here is a typescript version:
const matchCount = (str: string, re: RegExp) => {
return str?.match(re)?.length ?? 0;
};
('my string'.match(/\s/g) || []).length;
This is certainly something that has a lot of traps. I was working with Paolo Bergantino's answer, and realising that even that has some limitations. I found working with string representations of dates a good place to quickly find some of the main problems. Start with an input string like this:
'12-2-2019 5:1:48.670'
and set up Paolo's function like this:
function count(re, str) {
if (typeof re !== "string") {
return 0;
}
re = (re === '.') ? ('\\' + re) : re;
var cre = new RegExp(re, 'g');
return ((str || '').match(cre) || []).length;
}
I wanted the pattern to be passed in so that the function is more reusable. Secondly, I wanted the parameter to be a string, so that the client doesn't have to build the regex but can simply match on the string, like a standard string utility method.
Now, here you can see that I'm dealing with issues with the input. With the following:
if (typeof re !== "string") {
return 0;
}
I am ensuring that the input isn't anything like the literal 0, false, undefined, or null, none of which are strings. Since these literals are not in the input string, there should be no matches, but it should match '0', which is a string.
With the following:
re = (re === '.') ? ('\\' + re) : re;
I am dealing with the fact that the RegExp constructor will (I think, wrongly) interpret the string '.' as the match-any-character pattern, so I escape it as \.
Finally, because I am using the RegExp constructor, I need to give it the global 'g' flag so that it counts all matches, not just the first one, similar to the suggestions in other posts.
I realise that this is an extremely late answer, but it might be helpful to someone stumbling along here. BTW here's the TypeScript version:
function count(re: string, str: string): number {
if (typeof re !== 'string') {
return 0;
}
re = (re === '.') ? ('\\' + re) : re;
const cre = new RegExp(re, 'g');
return ((str || '').match(cre) || []).length;
}
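The count function above escapes only '.', so other metacharacters such as '+' or '(' would still be treated as regex syntax. If the goal is purely to count occurrences of a literal substring, one alternative is to avoid RegExp entirely. The sketch below uses split. The name countLiteral is hypothetical, and it counts non-overlapping occurrences.

```javascript
// Count non-overlapping occurrences of a literal substring without RegExp.
function countLiteral(str, needle) {
  if (typeof str !== 'string' || typeof needle !== 'string' || needle === '') {
    return 0; // non-string input or empty needle: nothing to count
  }
  return str.split(needle).length - 1;
}

const stamp = '12-2-2019 5:1:48.670';
console.log(countLiteral(stamp, '.')); // 1
console.log(countLiteral(stamp, '-')); // 2
console.log(countLiteral(stamp, ':')); // 2
```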
Using modern syntax avoids the need to create a dummy array to count length 0
const countMatches = (exp, str) => str.match(exp)?.length ?? 0;
Must pass exp as RegExp and str as String.
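How about something like this:
function isint(str){
    // match() returns null when there are no digits, so fall back to []
    var digits = str.match(/\d/g) || [];
    // every character must be a digit, and the string must not be empty
    return str.length > 0 && digits.length == str.length;
}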
