How to stop the replace function in JavaScript?

I have a very large string (more than 2 million characters).
I need to replace text in this string, but it is very slow.
How can I stop the replacer function once offset > 65535?
let inReg = MyRegExp();
let str = "very big string....";
str = str.replace(inReg, (match, p1, p2, offset, string) => {
if (offset > 65535) {
return myConvertFunc(match);
} else {
//How i can stop replacer?
}
return match; //this did not stop the replacer function
});

You can't. You could test the offset and just return the original text if it was > 65535, but the search would still continue past that point and your callback would still get called.
Instead, isolate the part you want to do the change in:
const segmentLength = 65536;
str = str.substring(0, segmentLength).replace(inReg, (match) => myConvertFunc(match)) +
str.substring(segmentLength);
On the face of it, that seems insane: creating two new strings, one of them a copy of 2M characters! But modern JavaScript engines are very smart about strings (which, being immutable, offer a lot of opportunity for optimization). It's likely any good JavaScript engine would reuse, not copy, the contents of the two strings created via substring above, sharing the underlying character array with the original.
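A minimal sketch of the same idea wrapped in a function, in case it helps. The name replaceInPrefix and the limit parameter are made up for illustration; they are not part of the question's code. One thing to keep in mind: a match that straddles the cut point is not seen by the regex, because the head is searched on its own.

```javascript
// Hypothetical helper: run the replacement only on the first `limit`
// UTF-16 code units and append the rest of the string untouched.
function replaceInPrefix(str, regex, replacer, limit = 65536) {
  const head = str.substring(0, limit); // the only part the regex ever scans
  const tail = str.substring(limit);    // never searched, so no callbacks here
  return head.replace(regex, replacer) + tail;
}

// Small demonstration with a limit of 7 instead of 65536.
const input = 'a'.repeat(5) + 'b'.repeat(5);
const output = replaceInPrefix(input, /[ab]/g, (m) => m.toUpperCase(), 7);
console.log(output); // "AAAAABBbbb"
```

Only the first 7 characters are upper-cased; the replacer is never called for the tail.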

Related

Removing a String from an Array, but adding the Number amount back in Javascript

I am trying to create a FEN notation converter for chess. I've decided the best way to go about this is to remove any unnecessary values from the string that I don't need. Here is what it looks like.
rnbqkbnr/pppppppp/8/
These are all strings in my code. I've already found solutions for the end of each rank, which is represented by the / in the string. My new problem is removing the numbers and adding blank squares in their place.
When you see a number in FEN, that is how many empty squares there are on the board before the next piece. My idea, to keep it simple for myself, was to convert a number like 8 into 11111111, a series of 1s each representing one empty square.
I assumed I could just splice the 8 out and fill in 1s from that index onwards. This seems like a viable option, but in practice it is quite buggy when the number appears in different places.
Would anyone have a better option for this? Yes, I know that there are already libraries that accomplish this, but what's the fun of coding if you don't attempt to reinvent the wheel a couple of times?
correctFen function - maps over the string looking for any numbers
const correctFen = () => {
newFen.map((pos, index) => {
if (Number(pos) && pos !== '/'){
for (let i = 0; i < Number(pos); i++) {
console.log('firing')
setNewFen(addAfter(newFen, index + i, '1'))
}
}
})
console.log(newFen)
figureOutPos()
}
After looking at this, it's not really removing the index that I want, which could be the problem here. It's adding the 1s after the last /.
function addAfter(array, index, newItem) {
return [
...array.slice(0, index),
newItem,
...array.slice(index)
];
}
Looks like you're modifying newFen from the map callback. This would indeed mess up indices because everything after the current index shifts around.
Instead, return the new character(s) from the map callback. In most cases this will be the original character, but if it's a digit, you'll return a string of multiple characters. Then join the array together into a string again.
Something like this:
// Assuming newFen is a string in some outer scope.
const correctFen = () => {
const correctedFen = newFen
// Split the string into an array of single characters.
.split('')
// Apply a function to each character and gather the results in an array.
.map((char, _index) => {
// Try to parse `char` as an integer.
const number = parseInt(char, 10)
if (isNaN(number)) {
// `char` is not a digit. Return the character itself.
return char
} else {
// `char` is a digit. Return a string of that many 1s.
return '1'.repeat(number)
}
})
// Join the array back into a string.
.join('')
setNewFen(correctedFen)
}
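If the FEN piece placement is kept as a plain string, the same transformation can also be sketched without any React state, using String.prototype.replace with a callback. The function name expandFenRanks is hypothetical, and the sketch assumes only the piece-placement field is passed in (digits 1 to 8, no side-to-move or move counters).

```javascript
// Hypothetical helper: expand every digit in a FEN piece-placement
// string into that many '1' characters (one per empty square).
function expandFenRanks(placement) {
  return placement.replace(/[1-8]/g, (digit) => '1'.repeat(Number(digit)));
}

console.log(expandFenRanks('rnbqkbnr/pppppppp/8/4P3'));
// "rnbqkbnr/pppppppp/11111111/1111P111"
```

Because the replacement builds a new string instead of inserting into the one being iterated, no indices shift along the way.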

How to reverse a string that contains complicated emojis?

Input:
Hello world👩‍🦰👩‍👩‍👦‍👦
Desired Output:
👩‍👩‍👦‍👦👩‍🦰dlrow olleH
I tried several approaches but none gave me correct answer.
This failed miserably:
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';
const reversed = text.split('').reverse().join('');
console.log(reversed);
This kind of works, but it breaks 👩‍👩‍👦‍👦 into 4 different emojis:
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';
const reversed = [...text].reverse().join('');
console.log(reversed);
I also tried every answer in this question but none of them works.
Is there a way to get the desired output?
If you're able to, use the _.split() function provided by lodash. From version 4.0 onwards, _.split() is capable of splitting unicode emojis.
Using the native .reverse().join('') on the resulting array to reverse the 'characters' should then work just fine with emojis containing zero-width joiners:
function reverse(txt) { return _.split(txt, '').reverse().join(''); }
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';
console.log(reverse(text));
<script src="https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.20/lodash.min.js" integrity="sha512-90vH1Z83AJY9DmlWa8WkjkV79yfS2n2Oxhsi2dZbIv0nC4E6m5AbH8Nh156kkM7JePmqD6tcZsfad1ueoaovww==" crossorigin="anonymous"></script>
I took TKoL's idea of using the \u200d character and used it to attempt to create a smaller script.
Note: Not all compositions use a zero width joiner so it will be buggy with other composition characters.
It uses a traditional for loop because we skip some iterations when we find combined emoji. Within the for loop there is a while loop that checks whether the next character is \u200d. As long as there is one, we append the next 2 characters as well and advance the for loop by 2 iterations, so combined emoji are not reversed.
To easily use it on any string I made it as a new prototype function on the string object.
String.prototype.reverse = function() {
let textArray = [...this];
let reverseString = "";
for (let i = 0; i < textArray.length; i++) {
let char = textArray[i];
while (textArray[i + 1] === '\u200d') {
char += textArray[i + 1] + textArray[i + 2];
i = i + 2;
}
reverseString = char + reverseString;
}
return reverseString;
}
const text = "Hello world👩‍🦰👩‍👩‍👦‍👦";
console.log(text.reverse());
//Fun fact, you can chain them to double reverse :)
//console.log(text.reverse().reverse());
Reversing Unicode text is tricky for a lot of reasons.
First, depending on the programming language, strings are represented in different ways: as a list of bytes, as a list of UTF-16 code units (16 bits wide, often called "characters" in the API), or as UTF-32 (UCS-4) code points (4 bytes wide).
Second, different APIs reflect that inner representation to different degrees. Some work on the abstraction of bytes, some on UTF-16 characters, some on code points. When the representation uses bytes or UTF-16 characters, there are usually parts of the API that give you access to the elements of this representation, as well as parts that perform the necessary logic to get from bytes (via UTF-8) or from UTF-16 characters to the actual code points.
Often, the parts of the API that perform that logic and give you access to the code points were added later: first there was 7-bit ASCII, then a bit later everybody thought 8 bits with different code pages were enough, and later still that 16 bits were enough for Unicode. The notion of code points as integers independent of any fixed storage width was historically added as the fourth common way of logically encoding text.
Using an API that gives you access to the actual code points seems like that's it. But...
Third, there are a lot of combining code points that modify a neighboring code point. E.g. the combining diaeresis (U+0308) turns the a before it into an ä, an e into ë, &c. Turn the code points around, and äe becomes ëa, made of different letters. There is a direct representation of e.g. ä as its own code point, but using the combining mark is just as valid.
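To make the combining-mark problem concrete, here is a small sketch. It relies only on the standard string iterator and String.prototype.normalize; the variable names are just for illustration. The combining diaeresis U+0308 attaches to the letter before it, so after a naive code-point reversal it ends up on a different letter.

```javascript
// "äe" written as a + COMBINING DIAERESIS (U+0308) + e.
const decomposed = 'a\u0308e';

// Naive reversal by code point: e + U+0308 + a.
const reversedDecomposed = [...decomposed].reverse().join('');

// Normalizing to NFC shows which letter now carries the diaeresis.
console.log(decomposed.normalize('NFC') === '\u00E4e');         // true: "äe"
console.log(reversedDecomposed.normalize('NFC') === '\u00EBa'); // true: "ëa"

// The precomposed form (ä as the single code point U+00E4) survives.
const precomposed = '\u00E4e';
console.log([...precomposed].reverse().join('') === 'e\u00E4'); // true: "eä"
```

The same text therefore reverses differently depending on which of the two equally valid representations it happens to use.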
Fourth, everything is in constant flux. There are also a lot of modifiers among the emoji, as used in the example, and more are added every year. Therefore, if an API gives you access to the information whether a code point is a modifier, the version of the API will determine whether it already knows a specific new modifier.
Unicode provides a hacky trick, though, for when it's only about the visual appearance:
There are writing direction modifiers. In the case of the example, left-to-right writing direction is used. Just add a right-to-left writing direction modifier at the beginning of the text and, depending on the version of the API / browser, it will look correctly reversed 😎
'\u202e' is called right-to-left override; it is the strongest version of the right-to-left marker.
See this explanation by w3.org
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦'
console.log('\u202e' + text)
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦'
let original = document.getElementById('original')
original.appendChild(document.createTextNode(text))
let result = document.getElementById('result')
result.appendChild(document.createTextNode('\u202e' + text))
body {
font-family: sans-serif
}
<p id="original"></p>
<p id="result"></p>
I know! I'll use RegExp. What could go wrong? (Answer left as an exercise for the reader.)
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';
const reversed = text.match(/.(\u200d.)*/gu).reverse().join('');
console.log(reversed);
Alternative solution would be to use runes library, small but effective solution:
https://github.com/dotcypress/runes
const runes = require('runes')
// String.substring
'πŸ‘¨β€πŸ‘¨β€πŸ‘§β€πŸ‘§a'.substring(1) => 'οΏ½β€πŸ‘¨β€πŸ‘§β€πŸ‘§a'
// Runes
runes.substr('πŸ‘¨β€πŸ‘¨β€πŸ‘§β€πŸ‘§a', 1) => 'a'
runes('12πŸ‘©β€πŸ‘©β€πŸ‘¦β€πŸ‘¦3πŸ•βœ“').reverse().join();
// results in: "βœ“πŸ•3πŸ‘©β€πŸ‘©β€πŸ‘¦β€πŸ‘¦21"
You don't just have trouble with emoji, but also with other combining characters.
These things that feel like individual letters but are actually one or more Unicode code points are called "extended grapheme clusters".
Breaking a string into these clusters is tricky (for example see these Unicode docs). I would not rely on implementing it myself but use an existing library. Google pointed me at the grapheme-splitter library. The docs for this library contain some nice examples that will trip up most implementations.
Using this you should be able to write:
var splitter = new GraphemeSplitter();
var graphemes = splitter.splitGraphemes(string);
var reversed = graphemes.reverse().join('');
ASIDE: For visitors from the future, or those willing to live on the bleeding edge:
There is a proposal to add a grapheme segmenter to the JavaScript standard. (It actually provides other segmenting options too.)
It is in stage 3 review for acceptance at the moment and is currently implemented in JSC and V8 (see https://github.com/tc39/proposal-intl-segmenter/issues/114).
Using this the code would look like:
var segmenter = new Intl.Segmenter("en", {granularity: "grapheme"})
var segment_iterator = segmenter.segment(string)
var graphemes = []
for (let {segment} of segment_iterator) {
graphemes.push(segment)
}
var reversed = graphemes.reverse().join('');
You can probably make this neater if you know more modern javascript than me...
There is an implementation here - but I don't know what it requires.
Note: This points out a fun issue that other answers haven't addressed yet. Segmentation can depend upon the locale that you are using - not just the characters in the string.
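For what it's worth, a more compact sketch of the Intl.Segmenter approach, assuming a runtime that ships Intl.Segmenter with full ICU data. The function name reverseGraphemes and the default locale 'en' are my own choices for the example. The emoji are written as escapes so the zero-width joiners are visible in the source.

```javascript
// Hypothetical helper: reverse a string by extended grapheme cluster.
function reverseGraphemes(text, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  // segment() returns an iterable of { segment, index, input } objects.
  return Array.from(segmenter.segment(text), ({ segment }) => segment)
    .reverse()
    .join('');
}

const redHairWoman = '\u{1F469}\u200D\u{1F9B0}';                          // woman + ZWJ + red hair
const family = '\u{1F469}\u200D\u{1F469}\u200D\u{1F466}\u200D\u{1F466}'; // woman, woman, boy, boy
const sample = 'Hello world' + redHairWoman + family;

console.log(reverseGraphemes(sample) === family + redHairWoman + 'dlrow olleH');
```

Array.from accepts any iterable plus a mapping function, which replaces the explicit loop and push.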
I just decided to do it for fun, was a good challenge. Not sure it's correct in all cases, so use at your own risk, but here it is:
function run() {
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';
const newText = reverseText(text);
console.log(newText);
}
function reverseText(text) {
// first, create an array of characters
let textArray = [...text];
let lastCharConnector = false;
textArray = textArray.reduce((acc, char, index) => {
if (char.charCodeAt(0) === 8205) {
const lastChar = acc[acc.length-1];
if (Array.isArray(lastChar)) {
lastChar.push(char);
} else {
acc[acc.length-1] = [lastChar, char];
}
lastCharConnector = true;
} else if (lastCharConnector) {
acc[acc.length-1].push(char);
lastCharConnector = false;
} else {
acc.push(char);
lastCharConnector = false;
}
return acc;
}, []);
console.log('initial text array', textArray);
textArray = textArray.reverse();
console.log('reversed text array', textArray);
textArray = textArray.map((item) => {
if (Array.isArray(item)) {
return item.join('');
} else {
return item;
}
});
return textArray.join('');
}
run();
You can use:
yourstring.split('').reverse().join('')
It turns your string into an array of UTF-16 code units, reverses it, then makes it a string again, so it only works for text without surrogate pairs or combining sequences.
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';
const reversed = text.split('').reverse().join('');
console.log(reversed);

Fastest way in JS to detect number of leading zeros

I am working on a project where I need to quickly check if there are a certain number of leading zeros in string. I have had success using regex:
var regex = new RegExp('^[0]{' + difficulty + '}.+');
if (regex.test(hash))
Also with substring and repeat:
if (hash.substring(0, difficulty) === '0'.repeat(difficulty))
For my specific purpose, speed is the most important element. I must find the fastest way to check if the number of leading zeros matches the difficulty. I have run benchmark tests on both methods, but the results fluctuate so much that I cannot tell which one is better. Also, if there is another, better method, please let me know. Thanks in advance.
if (Number(n).toString() !== n){
console.log("leading 0 detected")
}
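This may not be the fastest, but it is simpler than writing a function. Note that it only detects whether a numeric string has any leading zero; it does not count them.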
function detect(hash, difficulty) {
for (var i = 0, b = hash.length; i < b; i ++) {
if (hash[i] !== '0') {
break;
}
}
return i === difficulty;
}
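Both of your methods have the drawback that they construct intermediate objects and do more work than necessary (especially the regexp, but also the substring plus full string comparison). A plain loop over the leading characters avoids that and should be quite fast. Note that this function checks for exactly difficulty leading zeros, whereas your regexp and substring checks also accept more.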
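If "at least difficulty leading zeros" is what is wanted (which is what the regex and substring checks in the question test), a variant can stop after difficulty characters instead of scanning the whole run of zeros. This is only a sketch; hasLeadingZeros is a made-up name, and whether it actually beats the other approaches is something to benchmark in the target engine.

```javascript
// Hypothetical helper: true if `hash` starts with at least
// `difficulty` '0' characters. Allocates no intermediate strings.
function hasLeadingZeros(hash, difficulty) {
  if (hash.length < difficulty) return false;
  for (let i = 0; i < difficulty; i++) {
    // 48 is the UTF-16 code of '0'; bail out at the first mismatch.
    if (hash.charCodeAt(i) !== 48) return false;
  }
  return true;
}

console.log(hasLeadingZeros('000abc', 3));  // true
console.log(hasLeadingZeros('00abc', 3));   // false
console.log(hasLeadingZeros('0000abc', 3)); // true
```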

Is there an eval() alternative for this expression?

I have a string expression like such: "1+2+3", and it must stay as a string. In other words, looping and casting the digits to perform the operation isn't an option, so my solution is eval("1+2+3"). But eval() is slow and has all these issues associated with it. Is there an alternative to evaluate my string?
Evaluating a string is not only slow, it's dangerous. What if, by malicious user intent or error, you end up evaluating code that crashes your program, destroys your data or opens a security hole?
No, you should not eval() the string. You should split it, cast the operands to numbers, and sum them.
You can keep the string around if you like (you said you needed the string), but using the string to actually perform this operation is a Really Bad Idea.
var string = "1+2+3"
var numbers = string.split('+').map(function(x) { return parseInt(x, 10) })
var sum = numbers.reduce(function(total, x) { return total + x }, 0)
This is a silly question:
var reducer = function (a, b) {
return +a + +b;
};
"1+2+3".match(/[+-]?\d+/g).reduce(reducer); // 6
// or addition only
"1+2+3".split(/\D/).reduce(reducer); // 6
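A slightly more defensive sketch along the same lines: it accepts only integers joined by + or -, rejects everything else up front, and then sums the signed terms. The name sumExpression and the choice to throw a SyntaxError are assumptions for the example, not something from the question.

```javascript
// Hypothetical helper: evaluate strings like "1+2+3" or "10 - 4 + 1"
// without eval(). Only integers, '+', '-' and whitespace are accepted.
function sumExpression(expr) {
  const valid = /^\s*[+-]?\s*\d+(\s*[+-]\s*\d+)*\s*$/;
  if (!valid.test(expr)) {
    throw new SyntaxError('Unsupported expression: ' + expr);
  }
  // Strip whitespace, then pull out each term together with its sign.
  const terms = expr.replace(/\s+/g, '').match(/[+-]?\d+/g);
  return terms.reduce((total, term) => total + Number(term), 0);
}

console.log(sumExpression('1+2+3'));      // 6
console.log(sumExpression('10 - 4 + 1')); // 7
```

Validating first means input such as "1+alert(1)" fails loudly instead of being partially summed.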

Calculate real length of a string, like we do with the caret

What I want is to calculate how many times the caret will move from the beginning to the end of the string.
Explanations:
Look this string "" in this fiddle: http://jsfiddle.net/RFuQ3/
If you put the caret before the first quote and then press the right arrow ►, you have to press it 3 times to arrive after the second quote (instead of 2 times for an empty string).
The first way, and the easiest to calculate the length of a string is <string>.length.
But here, it returns 2.
The second way, from JavaScript Get real length of a string (without entities) gives 2 too.
How can I get 1?
1 - I thought of putting the string in a text input and then doing a while loop with a try{setCaret}catch(){}
2 - It's just for fun
The character in your question "󠀁" is the
Unicode Character 'LANGUAGE TAG' (U+E0001).
From the following Stack Overflow questions,
"Expressing UTF-16 unicode characters in JavaScript"
"How can I tell if a string contains multibyte characters in Javascript?"
we learn that
JavaScript strings are UCS-2 encoded but can represent Unicode code points outside the Basic Multilingual Plane (U+0000-U+D7FF and U+E000-U+FFFF) using two 16 bit numbers (a UTF-16 surrogate pair), the first of which must be in the range U+D800-U+DBFF and the second in the range U+DC00-U+DFFF.
The UTF-16 surrogate pair representing "󠀁" is U+DB40 and U+DC01. In decimal U+DB40 is 56128, and U+DC01 is 56321.
console.log("󠀁".length); // 2
console.log("󠀁".charCodeAt(0)); // 56128
console.log("󠀁".charCodeAt(1)); // 56321
console.log("\uDB40\uDC01" === "󠀁"); // true
console.log(String.fromCharCode(0xDB40, 0xDC01) === "󠀁"); // true
Adapting the code from https://stackoverflow.com/a/4885062/788324, we just need to count the number of code points to arrive at the correct answer:
var getNumCodePoints = function(str) {
var numCodePoints = 0;
for (var i = 0; i < str.length; i++) {
var charCode = str.charCodeAt(i);
if ((charCode & 0xF800) == 0xD800) {
i++;
}
numCodePoints++;
}
return numCodePoints;
};
console.log(getNumCodePoints("󠀁")); // 1
jsFiddle Demo
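In engines with ES2015 support, the same count can be sketched with the built-in string iterator, which walks a string by code point rather than by UTF-16 code unit. The function name countCodePoints is made up for the example. Note that this counts code points, which is not always the same as the number of caret stops when combining sequences are involved.

```javascript
// Hypothetical helper: count Unicode code points using for...of,
// which yields one item per code point (surrogate pairs stay together).
function countCodePoints(str) {
  let count = 0;
  for (const _codePoint of str) {
    count++;
  }
  return count;
}

const languageTag = '\u{E0001}'; // LANGUAGE TAG, stored as \uDB40\uDC01
console.log(languageTag.length);                  // 2
console.log(countCodePoints(languageTag));        // 1
console.log(countCodePoints('a\u{1F600}b'));      // 3
```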
function realLength(str) {
// Count code points instead of UTF-16 code units:
// the string iterator yields one item per code point.
var count = 0;
for (var ch of str) count++;
return count;
}
Didn't try the code, but it should work I think.
JavaScript doesn't really support Unicode here: string length and indexing work on UTF-16 code units, not on code points.
You can try
yourstring.replace(/[\uD800-\uDFFF]{2}/g, "0").length
for what it's worth
