Fastest way in JS to detect number of leading zeros


I am working on a project where I need to quickly check whether a string has a certain number of leading zeros. I have had success using a regex:
var regex = new RegExp('^[0]{' + difficulty + '}.+');
if (regex.test(hash))
Also with substring and repeat:
if (hash.substring(0, difficulty) === '0'.repeat(difficulty))
For my specific purpose, speed is the most important element. I must find the fastest way to check if the number of leading zeros matches the difficulty. I have run benchmarks on both methods, but the results fluctuate so much that I cannot tell which one is better. Also, if there is another, better method, please let me know. Thanks in advance.

if (Number(n).toString() !== n){
console.log("leading 0 detected")
}
This may not be the fastest, and it only reports that a purely numeric string has leading zeros rather than how many, but it is simpler than writing a function.

function detect(hash, difficulty) {
for (var i = 0, b = hash.length; i < b; i ++) {
if (hash[i] !== '0') {
break;
}
}
return i === difficulty;
}
Your methods have the drawback that they construct intermediate objects and do a lot of extra work (especially the regexp, but also the substring and the full string comparison). This one should be quite fast.
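A minimal sketch of the same loop idea, under one assumption: that "matches the difficulty" means at least `difficulty` leading zeros, which is what the regex and substring checks in the question test (the `detect` function above returns true only for exactly that many). The name `hasLeadingZeros` is made up for illustration. It stops at the first non-zero character and never reads past index `difficulty - 1`.

```javascript
// Hypothetical helper: true when `hash` starts with at least `difficulty` zeros.
function hasLeadingZeros(hash, difficulty) {
  // A string shorter than `difficulty` cannot have enough leading zeros.
  if (hash.length < difficulty) return false;
  for (let i = 0; i < difficulty; i++) {
    // 48 is the char code of '0'; comparing numbers avoids creating strings.
    if (hash.charCodeAt(i) !== 48) return false;
  }
  return true;
}

console.log(hasLeadingZeros('000a3f', 3)); // true
console.log(hasLeadingZeros('00a03f', 3)); // false
```

Whether this actually beats the regex or substring check depends on the engine, so it is still worth benchmarking on your own data.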

Related

How to stop the replace function in JavaScript?

I have a very big string (more than 2 million characters).
I need to replace text in this string, but it is very slow.
How can I stop the replacer function once offset > 65535?
let inReg = MyRegExp();
let str = "very big string....";
str = str.replace(inReg, (match, p1, p2, offset, string) => {
if (offset > 65535) {
return myConvertFunc(match);
} else {
//How i can stop replacer?
}
return match; //this did not stop the replacer function
});

You can't. You could test the offset and just return the original text if it was > 65535, but the search would still continue past that point and your callback would still get called.
Instead, isolate the part you want to do the change in:
const segmentLength = 65536;
str = str.substring(0, segmentLength).replace(inReg, myConvertFunc) +
str.substring(segmentLength);
On the face of it, that seems insane: creating two new strings, one of them a copy of 2M characters! But modern JavaScript engines are very smart about strings (which, being immutable, offer a lot of opportunity for optimization). It's likely any good JavaScript engine would reuse, not copy, the contents of the two strings created via substring above, sharing the underlying character array with the original.
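As a rough sketch of that idea, the helper below (the name `replaceInPrefix` and its parameters are invented here) runs the replacement only on the first `segmentLength` characters and appends the rest untouched. One caveat, assuming the pattern can match text that spans the cut: a match straddling the boundary will not be found.

```javascript
// Hypothetical helper: apply `replacer` to matches of `pattern`
// only within the first `segmentLength` characters of `str`.
function replaceInPrefix(str, pattern, replacer, segmentLength) {
  const head = str.substring(0, segmentLength);
  const tail = str.substring(segmentLength);
  return head.replace(pattern, replacer) + tail;
}

console.log(replaceInPrefix('aaaaaa', /a/g, (m) => m.toUpperCase(), 3)); // "AAAaaa"
```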

How to reverse a string that contains complicated emojis?

Input:
Hello world👩‍🦰👩‍👩‍👦‍👦
Desired Output:
👩‍👩‍👦‍👦👩‍🦰dlrow olleH
I tried several approaches but none gave me correct answer.
This failed miserably:
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';
const reversed = text.split('').reverse().join('');
console.log(reversed);
This kinda works but it breaks 👩‍👩‍👦‍👦 into 4 different emojis:
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';
const reversed = [...text].reverse().join('');
console.log(reversed);
I also tried every answer in this question but none of them works.
Is there a way to get the desired output?

If you're able to, use the _.split() function provided by lodash. From version 4.0 onwards, _.split() is capable of splitting Unicode emojis.
Reversing the resulting 'characters' with the native .reverse().join('') should then work just fine with emojis containing zero-width joiners:
function reverse(txt) { return _.split(txt, '').reverse().join(''); }
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';
console.log(reverse(text));
<script src="https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.20/lodash.min.js" integrity="sha512-90vH1Z83AJY9DmlWa8WkjkV79yfS2n2Oxhsi2dZbIv0nC4E6m5AbH8Nh156kkM7JePmqD6tcZsfad1ueoaovww==" crossorigin="anonymous"></script>

I took TKoL's idea of using the \u200d character and used it to attempt to create a smaller script.
Note: Not all compositions use a zero width joiner so it will be buggy with other composition characters.
It uses a traditional for loop because we skip some iterations when we find combined emoji. Within the for loop there is a while loop that checks whether a \u200d character follows. As long as there is one, we add the next 2 characters as well and advance the for loop by 2 iterations, so combined emoji are not reversed.
To easily use it on any string I made it as a new prototype function on the string object.
String.prototype.reverse = function() {
let textArray = [...this];
let reverseString = "";
for (let i = 0; i < textArray.length; i++) {
let char = textArray[i];
while (textArray[i + 1] === '\u200d') {
char += textArray[i + 1] + textArray[i + 2];
i = i + 2;
}
reverseString = char + reverseString;
}
return reverseString;
}
const text = "Hello world👩‍🦰👩‍👩‍👦‍👦";
console.log(text.reverse());
//Fun fact, you can chain them to double reverse :)
//console.log(text.reverse().reverse());

Reversing Unicode text is tricky for a lot of reasons.
First, depending on the programming language, strings are represented in different ways: as a list of bytes, as a list of UTF-16 code units (16 bits wide, often called "characters" in the API), or as UCS-4 code points (4 bytes wide).
Second, different APIs reflect that inner representation to different degrees. Some work on the abstraction of bytes, some on UTF-16 characters, some on code points. When the representation uses bytes or UTF-16 characters, there are usually parts of the API that give you access to the elements of this representation, as well as parts that perform the necessary logic to get from bytes (via UTF-8) or from UTF-16 characters to the actual code points.
Often, the parts of the API that perform that logic, and thus give you access to the code points, were added later: first there was 7-bit ASCII, then a bit later everybody thought 8 bits were enough (using different code pages), and later still that 16 bits were enough for Unicode. The notion of code points as integers that are independent of any fixed-width encoding was historically the fourth common way of logically encoding text.
Using an API that gives you access to the actual code points seems like that's it. But...
Third, there are a lot of modifier code points that affect a neighbouring code point. E.g. a combining diaeresis (U+0308) turns the a before it into an ä, an e into an ë, etc. Reverse the code points naively and the mark ends up on a different letter, or on none at all. There is a direct representation of e.g. ä as its own code point, but using the modifier is just as valid.
Fourth, everything is in constant flux. There are also a lot of modifiers among the emoji, as used in the example, and more are added every year. Therefore, if an API gives you access to the information whether a code point is a modifier, the version of the API will determine whether it already knows a specific new modifier.
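To make the layers above concrete, here is a small sketch of how one JavaScript string looks at each level. The variable names are only illustrative, and the emoji are written as escapes so the joiners are visible.

```javascript
const woman = '\u{1F469}'; // 👩, one code point outside the 16-bit range
// 👩‍👩‍👦‍👦: four emoji glued together by three zero-width joiners (U+200D)
const family = '\u{1F469}\u200D\u{1F469}\u200D\u{1F466}\u200D\u{1F466}';

console.log(woman.length);       // 2  (UTF-16 code units)
console.log([...woman].length);  // 1  (code points)
console.log(family.length);      // 11 (4 surrogate pairs + 3 joiners)
console.log([...family].length); // 7  (4 emoji + 3 joiners)

// A combining diaeresis (U+0308) follows the letter it modifies.
const word = 'ae\u0308'; // renders as "aë"
const naive = [...word].reverse().join('');
console.log(naive === '\u0308ea'); // true: the mark no longer follows the "e"
```

So even iterating by code point, as the spread operator does, is not enough to keep a visible "character" in one piece.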
Unicode provides a hacky trick, though, for when it's only about the visual appearance:
There are writing direction modifiers. In the case of the example, left-to-right writing direction is used. Just add a right-to-left writing direction modifier at the beginning of the text and depending on the version of the API / browser, it will look correctly reversed 😎
'\u202e' is called right-to-left override; it is the strongest version of the right-to-left marker.
See this explanation by w3.org
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦'
console.log('\u202e' + text)
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦'
let original = document.getElementById('original')
original.appendChild(document.createTextNode(text))
let result = document.getElementById('result')
result.appendChild(document.createTextNode('\u202e' + text))
body {
font-family: sans-serif
}
<p id="original"></p>
<p id="result"></p>

I know! I'll use RegExp. What could go wrong? (Answer left as an exercise for the reader.)
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';
const reversed = text.match(/.(\u200d.)*/gu).reverse().join('');
console.log(reversed);

An alternative would be to use the runes library, a small but effective solution:
https://github.com/dotcypress/runes
const runes = require('runes')
// String.substring
'👨‍👨‍👧‍👧a'.substring(1) // => '�‍👨‍👧‍👧a'
// Runes
runes.substr('👨‍👨‍👧‍👧a', 1) // => 'a'
// Pass '' to join(); with no argument the pieces are joined with commas.
runes('12👩‍👩‍👦‍👦3🍕✓').reverse().join('');
// results in: "✓🍕3👩‍👩‍👦‍👦21"

You don't just have trouble with emoji, but also with other combining characters.
These things that feel like individual letters but are actually one or more Unicode code points are called "extended grapheme clusters".
Breaking a string into these clusters is tricky (for example see these Unicode docs). I would not rely on implementing it myself but use an existing library. Google pointed me at the grapheme-splitter library. The docs for this library contain some nice examples that will trip up most implementations.
Using this you should be able to write:
var splitter = new GraphemeSplitter();
var graphemes = splitter.splitGraphemes(string);
var reversed = graphemes.reverse().join('');
ASIDE: For visitors from the future, or those willing to live on the bleeding edge:
There is a proposal to add a grapheme segmenter to the JavaScript standard. (It actually provides other segmenting options too.)
It is in stage 3 review for acceptance at the moment and is currently implemented in JSC and V8 (see https://github.com/tc39/proposal-intl-segmenter/issues/114).
Using this the code would look like:
var segmenter = new Intl.Segmenter("en", {granularity: "grapheme"})
var segment_iterator = segmenter.segment(string)
var graphemes = []
for (let {segment} of segment_iterator) {
graphemes.push(segment)
}
var reversed = graphemes.reverse().join('');
You can probably make this neater if you know more modern javascript than me...
There is an implementation here - but I don't know what it requires.
Note: This points out a fun issue that other answers haven't addressed yet. Segmentation can depend upon the locale that you are using - not just the characters in the string.
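For what it's worth, a more compact sketch of the same Intl.Segmenter approach, assuming a runtime that ships it. The function name `reverseGraphemes` and the default locale are choices made for this example, and exactly where cluster boundaries fall depends on the Unicode data built into the engine.

```javascript
// Hypothetical helper; requires a runtime with Intl.Segmenter.
function reverseGraphemes(str, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  // segment() returns an iterable of records; each has a `segment` string.
  return Array.from(segmenter.segment(str), (s) => s.segment)
    .reverse()
    .join('');
}

// 👩‍👩‍👦‍👦 written with escapes so the zero-width joiners are visible.
const family = '\u{1F469}\u200D\u{1F469}\u200D\u{1F466}\u200D\u{1F466}';
console.log(reverseGraphemes('ab' + family) === family + 'ba'); // true
```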

I just decided to do it for fun, was a good challenge. Not sure it's correct in all cases, so use at your own risk, but here it is:
function run() {
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';
const newText = reverseText(text);
console.log(newText);
}
function reverseText(text) {
// first, create an array of characters
let textArray = [...text];
let lastCharConnector = false;
textArray = textArray.reduce((acc, char, index) => {
if (char.charCodeAt(0) === 8205) {
const lastChar = acc[acc.length-1];
if (Array.isArray(lastChar)) {
lastChar.push(char);
} else {
acc[acc.length-1] = [lastChar, char];
}
lastCharConnector = true;
} else if (lastCharConnector) {
acc[acc.length-1].push(char);
lastCharConnector = false;
} else {
acc.push(char);
lastCharConnector = false;
}
return acc;
}, []);
console.log('initial text array', textArray);
textArray = textArray.reverse();
console.log('reversed text array', textArray);
textArray = textArray.map((item) => {
if (Array.isArray(item)) {
return item.join('');
} else {
return item;
}
});
return textArray.join('');
}
run();

You can use:
yourstring.split('').reverse().join('')
It should turn your string into a list, reverse it then make it a string again.

const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';
const reversed = text.split('').reverse().join('');
console.log(reversed);

Is there a cleaner way to remove non-alphanumeric chars and replace spaces?

I would like to remove all non-alphanumeric characters and replace spaces with hyphens. So far I've come up with this using multiple regexes, which works, but is there a more 'efficient' way?
"Well Done!".toLowerCase().replace(/\s/g, '-').replace(/[^\w-]/gi, '');
// => "well-done"

At least in other languages, invoking the regular expressions engine is expensive. I'm not sure if that's true of JavaScript, but here's how you'd do it "C-style". I'm sure benchmarking its performance yourself will be a valuable learning experience.
var x = "Well Done!";
var y = "";
var c;
for (var i = 0; i < x.length; i++)
{
c = x.charCodeAt(i);
if (c >= 48 && c <= 57 || c >= 97 && c <= 122)
{
y += x[i];
}
else if (c >= 65 && c <= 90)
{
y += String.fromCharCode(c+32);
}
else if (c == 32 || c >= 9 && c <= 13)
{
y += '-';
}
}
$('#output').html(y);
See http://www.asciitable.com/ for ASCII codes. Here's a jsFiddle. Note that I've also implemented your toLowerCase() simply by adding 32 to the uppercase letters.
Disclaimer
Personally of course, I prefer readable code, and therefore prefer regular expressions, or using some kind of a strtr function if one exists in JavaScript. This answer is purely to educate.

Note: I thought I could come up with a faster solution with a single regex, but I couldn't. Below is my failed method (you can learn from failure), and the results of a performance test, and my conclusion.
Efficiency can be measured many ways. If you wanted to reduce the number of functions called, then you could use a single regex and a function to handle the replacement.
([A-Z])|(\s)|([^a-z\d])
The first group will have toLowerCase() applied, the second will be replaced with a - and the third will return nothing. I originally used the + quantifier for groups 1 and 3, but given the expected nature of the text, removing it resulted in faster execution. (Thanks, acheong87.)
'Well Done!'.replace(/([A-Z])|(\s)|([^a-z\d])/g, function (match, $0, $1) {
if ($0) return String.fromCharCode($0.charCodeAt(0) + 32);
else if ($1) return '-';
return '';
});
jsFiddle
Performance
My method was the worst performing:
Acheong87 fastest
Original 16% slower
Mine 53% slower
jsPerf
Conclusion
Your method is the most efficient in terms of code development time, and the performance penalty versus acheong87's method is offset by code maintainability, readability, and complexity reduction. I would use your version unless speed was of the utmost importance.
The more optional matches I added to the regular expression, the greater the performance penalty. I can't think of any advantages to my method except for the function reduction, but that is offset by the if statements and increase in complexity.

Fastest way to check if a JS variable starts with a number

I am using an object as a hash table and I have stuffed both regular properties and integers as keys into it.
I am now interested in counting the number of keys in this object which are numbers, though obviously a for (x in obj) { if (typeof x === "number") { ... } } will not produce the result I want because all keys are strings.
Therefore I determined that it is sufficient for my purposes to assume that if a key's first character is a number then it must be a number so I am not concerned if key "3a" is "wrongly" determined to be a number.
Given this relaxation I think I can just check it like this:
for (x in obj) {
var charCode = x.charCodeAt(0);
if (charCode < 58 && charCode > 47) { // ascii digits check
...
}
}
thereby avoiding a regex and parseInt and such.
Will this work? charCodeAt is JS 1.2 so this should be bullet-proof, yes?
Hint: I would love to see a jsperf comparing my function with what everyone comes up with. :) I'd do it myself but jsperf confuses me
Update: Thanks for starting up the JSPerf, it confirms my hope that the charCodeAt function would be executing a very quick piece of code reading out the int value of a character. The other approaches involve parsing.
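A self-contained sketch of the check described above, wrapped in a counting function. The name `countDigitKeys` is invented for this example, and it uses Object.keys (own enumerable keys only) rather than for...in, which is an assumption about what should be counted.

```javascript
// Hypothetical helper: counts keys whose first character is an ASCII digit.
function countDigitKeys(obj) {
  let count = 0;
  for (const key of Object.keys(obj)) {
    // charCodeAt(0) is NaN for the empty string, which fails both comparisons.
    const code = key.charCodeAt(0);
    if (code > 47 && code < 58) count++; // '0' is 48, '9' is 57
  }
  return count;
}

console.log(countDigitKeys({ 1: 'a', 22: 'b', foo: 'c', '3a': 'd', '': 'e' })); // 3
```

As in the question, a key such as "3a" is deliberately counted as numeric.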

parseInt(x, 10) will correctly parse a leading positive or negative number from a string, so try this:
function startsWithNumber(x) {
return !isNaN(parseInt(x, 10));
}
startsWithNumber('123abc'); // true
startsWithNumber('-123abc'); // true
startsWithNumber('123'); // true
startsWithNumber('-123'); // true
startsWithNumber(123); // true
startsWithNumber(-123); // true
startsWithNumber('abc'); // false
startsWithNumber('-abc'); // false
startsWithNumber('abc123'); // false
startsWithNumber('-abc123'); // false

Why speculate when you can measure? On Chrome, your method appears to be the fastest. The proposed alternatives all come in about 60% behind on my test runs.

The question title is misleading: it asks about a variable's name, but in the example you're dealing with object properties (which are some kind of variables, of course...). In this case, if you only need to know whether a key starts with a number, probably the best choice is parseInt. It will return NaN for any string that doesn't start with a number.

You could also use isNaN(x) or isFinite(x) - see this SO question

Finding an element in an array

4,5,6,7];
pin=3;
We have to search for pin in hay.
Conventionally we loop through hay and check for pin (assume there is no native function called array.indexOf).
How about,
hay=hay.join(",");
pin=","+pin+",";
index=hay.indexOf(pin);
Any Suggestions please?

Consider hay of [2,3,4] and a pin of 2... you'll be looking for ",2," in a string "2,3,4". Now you could add commas to the start and end of hay as well, but it's a bit ugly, isn't it?
There's then the problem of strings having variable lengths: consider an array of [1,222222,3,4]. When you look for 3, you'll end up with an inappropriate index because of the length of "222222". (Even in the case of only single digit values, you'll need to divide by 2, since each element takes up a digit plus a comma.)
You've then potentially got problems when you start moving from integers to decimal values, which may be formatted differently in different cultures - possibly using a comma as the decimal separator. I don't know whether JavaScript uses the culture-specific separator by default, but that's part of the problem - you're suddenly having to consider aspects of the language/platform which have nothing to do with the task at hand.
Generally speaking, converting data into a string format in order to do something which doesn't really depend on the string format is a bad idea. It would be better to write a general-purpose indexOf method to perform the looping for you. (I'd be surprised if such a method didn't already exist, to be honest, but it's easy enough to write once and reuse if you need to.)
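A minimal sketch of such a general-purpose helper, assuming strict equality is the comparison you want. The name `indexOfValue` is made up so it is not mistaken for the built-in method. The last line shows the string-offset problem described above.

```javascript
// Hypothetical stand-in for a native indexOf: returns the index of the
// first element strictly equal to `value`, or -1 if there is none.
function indexOfValue(array, value) {
  for (let i = 0; i < array.length; i++) {
    if (array[i] === value) return i;
  }
  return -1;
}

console.log(indexOfValue([1, 222222, 3, 4], 3)); // 2
console.log(indexOfValue([2, 3, 4], 2));         // 0
console.log(indexOfValue([2, 3, 4], 9));         // -1

// The join approach returns an offset into the string, not an array index.
console.log([1, 222222, 3, 4].join(',').indexOf(',3,')); // 8
```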

Heck, assume there is no string indexOf, either.
var A=[11,7,9,1,17,13,19,18,10,6,3,8,2,5,4,14,20,15,16,12],
L=A.length, n=3;
while(L>-1 && A[--L]!==n);
alert(L)

You don't need to use string in the middle, you can just loop through your array, if I understand your question right.
var hay = [1, 2, 3, 'Whoa', 'wheee', 5, 'needle'], needle = 'needle';
for ( var i = 0, len = hay.length; i < len; i += 1 ) {
if ( hay[ i ] === needle ) {
alert( "hay’s element number " + i + " is the needle!" );
break;
}
}
