Why is String.indexOf functioning like this? [duplicate]

This question already has answers here:
Seemingly identical strings fail comparison
(4 answers)
Closed 2 years ago.
I'm trying to match some text based on a query that the user inputs. After running into some issues, I found this rather odd behaviour of String.indexOf that I simply cannot understand:
If I try to match a query without diacritics against a string with diacritics, it works (not sure why):
"brezzel cu brânză".indexOf("bra")
11
But matching the same query with one more letter after it doesn't work:
"brezzel cu brânză".indexOf("bran")
-1
(tested both in Chrome & Firefox, same behaviour)
Is this a documented behaviour that I'm unaware of or what exactly is happening here?

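JavaScript strings are sequences of 16-bit (2-byte) UTF-16 code units, and one visible character can take more than one of them. In your string, â is stored in decomposed form, as two code units: a plain a followed by the combining circumflex accent U+0302. That is why searching for "bra" succeeds while "bran" fails: the code unit after the a is U+0302, not n. Use the escape function (deprecated, but handy for inspection) to see this:
escape("brezzel cu brânză")
"brezzel%20cu%20bra%u0302nza%u0306"
Note that %20 is a space, followed by bra and then %u0302, which combines with the preceding a to render â. The same applies to the final ă (a followed by %u0306).
You can probably tell the rest. Test it if you want to:
'a' + String.fromCharCode(0x0302) // â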
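One possible way to make the search behave as expected is to normalize both strings before comparing. The sketch below is only an illustration: stripDiacritics is a hypothetical helper name, and it assumes that dropping the combining marks in the range U+0300 to U+036F is acceptable for your matching rules.

```javascript
// The string from the question, written with explicit escapes so the
// decomposed form (base letter + combining mark) is unambiguous.
const text = "brezzel cu bra\u0302nza\u0306";

// Hypothetical helper: decompose (NFD), then drop combining marks.
function stripDiacritics(s) {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

console.log(text.length);                  // 19: the two marks are extra code units
console.log(text.indexOf("bran"));         // -1: U+0302 sits between "a" and "n"
console.log(text.normalize("NFC").length); // 17: â and ă become single code units

const plain = stripDiacritics(text);
console.log(plain);                        // "brezzel cu branza"
console.log(plain.indexOf("bran"));        // 11
```

Applying the same helper to the user's query as well keeps the comparison consistent whichever form the input arrives in.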

Related

Split numbers from a string (can also be decimal number) [duplicate]

This question already has answers here:
how to extract floating numbers from strings in javascript
(3 answers)
Closed 2 years ago.
Lately I've been trying to find ways to manipulate a string (for a project of mine) and I'm having a hard time finding something that matches my case.
Usually the string will include 3 numbers (they can also be decimal, which is what makes it more complicated), separated by 1 or 2 signs ("-", "x", "*" and so on).
I did some research online and found this solution (which I thought was good):
.match(/\d+/g)
When I tried it on some cases the result was good:
var word = "9-6x3"
word = word.match(/\d+/g)
It gave me an array with 3 elements, each holding a number: ['9', '6', '3'] (which is good). But if the string contains a dot (a decimal number), this regex does not treat the dot as part of the number.
I need a regex that can handle dots in the string and still achieve the same result.
Example:
var word = "9.5-9.3x7"
expected output = ['9.5', '9.3', '7']

Try this regular expression to allow for an optional decimal part:
word.match(/\d+([\.]\d+)?/g)
This says:
\d+ - one or more digits
([\.]\d+)? - optionally, one decimal point followed by one or more digits

Here is a simple regex that suits your requirement,
/\d+\.?\d*/g
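For comparison, here is a small sketch that runs both suggested patterns on the input from the question. The variable names and the extra "3.x4" input are only illustrative; the latter shows the one case where the two patterns differ, a digit run followed by a dot with no fractional digits.

```javascript
const optionalFraction = /\d+([\.]\d+)?/g; // pattern from the first answer
const optionalDot = /\d+\.?\d*/g;          // pattern from the second answer

console.log("9.5-9.3x7".match(optionalFraction)); // [ '9.5', '9.3', '7' ]
console.log("9.5-9.3x7".match(optionalDot));      // [ '9.5', '9.3', '7' ]

// Edge case: a dot that is not followed by digits.
console.log("3.x4".match(optionalFraction)); // [ '3', '4' ]
console.log("3.x4".match(optionalDot));      // [ '3.', '4' ]

// The matches are strings; convert them if numbers are needed.
const numbers = "9.5-9.3x7".match(optionalFraction).map(Number);
console.log(numbers); // [ 9.5, 9.3, 7 ]
```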

Why does a method like `toString` require two dots after a number? [duplicate]

This question already has answers here:
Why does 10..toString() work, but 10.toString() does not? [duplicate]
(3 answers)
Closed 5 years ago.
What is the logic behind 42..toString() with ..?
The double dot works and returns the string "42", whereas 42.toString() with a single dot fails.
Similarly, 42...toString() with three dots also fails.
Can anyone explain this behavior?
console.log(42..toString());
console.log(42.toString());

When you write 42.toString(), the parser reads 42. as a numeric literal (the dot is taken as the decimal point), so toString follows the number directly with no property-access dot, which is of course illegal. 42..toString() is simply a short version of 42.0.toString(), which is fine. To get the first form to work, put parentheses around the number: (42).toString().

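It is read like 42.0.toString(), so the first dot is the decimal point. You can also use (42).toString() or 42 .toString() (with a space between 42 and the dot); both work. This comes up because in JavaScript almost everything behaves like an object, including number literals, so a dot after a number is ambiguous between a decimal point and property access.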

With just 42.toString(); the parser tries to read a number with a decimal point, and it fails.
When we write 42..toString(); it is read as 42.0.toString();
We can get the correct output with:
(42).toString();
(42.).toString();
See the linked reference for .toString() usage.
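The forms discussed in the answers above can be checked side by side. This is a minimal sketch; parses is a hypothetical helper that compiles a snippet from a string, because the single-dot form is a syntax error and cannot appear literally in a script that is expected to run.

```javascript
// Every form below yields the string "42".
const forms = [
  42..toString(),  // "42." is the number literal, the second dot is property access
  42.0.toString(), // the same thing, written out in full
  (42).toString(), // parentheses end the literal
  42 .toString(),  // so does a space
];
console.log(forms); // [ '42', '42', '42', '42' ]

// Hypothetical helper: does the snippet parse as a function body?
function parses(source) {
  try {
    new Function(source); // compiles the body without running it
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(parses("return 42.toString()"));   // false
console.log(parses("return 42..toString()"));  // true
console.log(parses("return 42...toString()")); // false
```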

How to generate a UID with only 7 characters and exclude repetition like imgur.com

Take a look at this URL: http://imgur.com/JLhIuYt
You will see that the URL contains a seemingly random string, which is built from 7 characters drawn from
small letters (26 characters)
big letters (26 characters)
numbers (10 numbers)
n = (26+26+10) = 62
I would like to know how it is possible to generate a random string of only 7 characters that works as a GUID.
With only 7 characters, imgur can generate 3,521,614,606,208 variations (62 to the power of 7). The question is how imgur makes sure each variation is used as an identifier only once, since the strings seem to be generated randomly.
Is there a way to find out how it is possible to use 7 characters as a UID while making sure they don't repeat?
One solution could be to generate them in chunks and use them one after the other, but that doesn't seem very good.
Any kind of help is appreciated!
By the way, the random string should preferably be generated in PHP.

You can use this one-liner to generate a random string:
substr(
    str_shuffle(
        str_repeat('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890', 7)
    ),
    0,
    7
)
Then you can check in a database whether it is already used, for example like this:
while (
    existsInTheDB(
        $str = substr(
            str_shuffle(
                str_repeat(
                    'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890',
                    7
                )
            ),
            0,
            7
        )
    )
) {}
// now you can use $str
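The same generate-and-retry idea can be sketched in JavaScript. This is an illustration under assumptions, not a port of the PHP above: randomId and generateUniqueId are hypothetical names, the Set stands in for the database lookup (existsInTheDB), and the characters are picked independently with Math.random(), which is not cryptographically secure.

```javascript
const ALPHABET =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";

// Build one candidate: 7 independent picks from the 62-character alphabet.
function randomId(length = 7) {
  let id = "";
  for (let i = 0; i < length; i++) {
    id += ALPHABET[Math.floor(Math.random() * ALPHABET.length)];
  }
  return id;
}

// usedIds is an in-memory stand-in for the "already in the database?" check.
function generateUniqueId(usedIds) {
  let id;
  do {
    id = randomId();
  } while (usedIds.has(id)); // collision: draw again
  usedIds.add(id);
  return id;
}

const usedIds = new Set();
console.log(generateUniqueId(usedIds)); // a 7-character string, different on each run
```

With a real database, a unique index on the identifier column would typically be needed as well, so that two concurrent requests cannot both claim the same value between the check and the insert.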

I assume they have a database in the backend and handle the collisions: you generate an identifier, check whether it already exists, and if it does, generate another one until you get one that doesn't.
Or they simply increment a 64-bit integer and generate the string from it.
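The counter-based alternative could look roughly like the sketch below. Nothing here is known about how imgur actually does it; encodeBase62 is a hypothetical helper that writes a non-negative integer in base 62 using the same alphabet as above, padded to 7 characters. A plain Number is enough because 62^7 is well below 2^53.

```javascript
const ALPHABET =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";

// Encode a counter value as a fixed-width base-62 string.
function encodeBase62(n, width = 7) {
  if (!Number.isSafeInteger(n) || n < 0 || n >= 62 ** width) {
    throw new RangeError("counter out of range for " + width + " characters");
  }
  let out = "";
  do {
    out = ALPHABET[n % 62] + out; // least significant digit first
    n = Math.floor(n / 62);
  } while (n > 0);
  return out.padStart(width, ALPHABET[0]);
}

console.log(encodeBase62(0));           // "aaaaaaa"
console.log(encodeBase62(62));          // "aaaaaba"
console.log(encodeBase62(62 ** 7 - 1)); // "0000000"
```

Distinct counter values always give distinct strings, so no collision check is needed, but consecutive identifiers are easy to guess. If that matters, the counter would have to be scrambled in some reversible way first, which this sketch does not attempt.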

Performance of regex within jQuery data selector: dependance on certain string length

The setup: I have a div with a bunch of radio buttons, each of which has been associated with a custom attribute and value using $(element).data(attr_name,attr_value);. When an underlying data structure is changed, I iterate over the fields and set the appropriate buttons to checked:true by using the ':data' selector found here: https://stackoverflow.com/a/2895933/1214731
$($('#style-options').find(':radio').filter(':data('+key+'=='+value+')'))
.prop('checked',true).button('refresh');
This works great: it finds the appropriate elements, even with floating-point values.
Performance depends on value:
I noticed that when I clicked on certain buttons, the page took fractionally longer to respond (for most buttons there was no noticeable delay). Digging a little deeper, this seems to be occurring when certain floating point values are being searched for.
Using chrome dev tools, I logged the following:
> key='fill-opacity';
"fill-opacity"
> value=.2*2;
0.4
> console.time('find data'); for(var i=0;i<100;++i){$('#style-options').find(':radio').filter(':data('+key+'=='+value+')')} console.timeEnd('find data');
find data: 43.352ms undefined
> value=.2*3;
0.6000000000000001
> console.time('find data'); for(var i=0;i<100;++i){$('#style-options').find(':radio').filter(':data('+key+'=='+value+')')} console.timeEnd('find data');
find data: 10322.866ms undefined
The difference in speed is a factor of >200!
Next, I tried typing the number in manually (e.g. decimal place, six, 14x zeros, one) - same speed. All numbers with the same number of digits were the same speed. I then reduced the number of digits progressively:
# of digits    time (ms)
16             10300
15             5185
14             2665
13             1314
12             673
11             359
10             202
9              116
8              77
7              60
6              50
5              41
4              39
I quickly ruled out the equality check between numeric and string - no dependence on string length there.
The regex execution is strongly dependent on string length
In the linked answer above, the regex that parses the data string is this:
var matcher = /\s*(?:((?:(?:\\\.|[^.,])+\.?)+)\s*([!~><=]=|[><])\s*("|')?((?:\\\3|.)*?)\3|(.+?))\s*(?:,|$)/g;
The string passed in is of the form [name operator value]. The length of name doesn't seem to make much difference; the length of value has a big impact on speed however.
Specific questions:
1) Why does the length of name have minimal effect on performance, while the length of value has a large effect?
2) Doubling the execution time with each additional character in value seems excessive - is this just a characteristic of the particular regex the linked solution uses, or is it a more general feature?
3) How can I improve performance without sacrificing a lot of flexibility? I'd like to still be able to pass arguments as a single string to a jQuery selector so type checking up front seems difficult, though I'm open to suggestions.
Basic test code for regex matching speeds:
matcher = /\s*(?:((?:(?:\\\.|[^.,])+\.?)+)\s*([!~><=]=|[><])\s*("|')?((?:\\\3|.)*?)\3|(.+?))\s*(?:,|$)/g;
console.time('regex'); for(var i=0;i<1000;++i){matcher.lastIndex=0; matcher.exec('x=='+.1111111111111)}; console.timeEnd('regex')
regex: 538.018ms
//add an extra digit - doubles duration of test
console.time('regex'); for(var i=0;i<1000;++i){matcher.lastIndex=0; matcher.exec('x=='+.11111111111111)}; console.timeEnd('regex')
regex: 1078.742ms
//add a bunch to the length of 'name' - minimal effect
console.time('regex'); for(var i=0;i<1000;++i){matcher.lastIndex=0; matcher.exec('xxxxxxxxxxxxxxxxxxxx=='+.11111111111111)}; console.timeEnd('regex')
regex: 1084.367ms

A characteristic of regexp quantifiers is that they are greedy by default. If you try to match the expression a.*b against the string abcd, it happens in these steps:
1) the first "a" matches
2) the .* matches the second char, then the third, and so on to the end of the string
3) at the end of the string there is still a "b" to be matched, so the match fails
4) the regexp engine starts to backtrack
5) the last char is "unmatched" and the engine tries to match "b" against "d". It fails again. More backtracking
6) it tries to match "b" against "c". Fail. Backtrack.
7) it matches "b" against "b". Success. Matching ends.
Although the match covers only a couple of chars, the engine went through the whole string. If a pattern has more than one greedy operator, it is easy to construct an input string whose matching takes exponential time.
Understanding backtracking will prevent a lot of errors and performance problems. For example, 'a.*b' will match the whole string 'abbbbbbb', instead of just the first 'ab'.
The easiest way to prevent this kind of error in modern regexp engines is to use the non-greedy versions of the operators * and +. They are usually written as the same operators followed by a question mark: *? and +?.
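The difference between the greedy and non-greedy operators is easy to see in a JavaScript console. A minimal sketch using the strings from the explanation above:

```javascript
const input = "abbbbbbb";

// Greedy: .* runs to the end of the string, then backtracks to the last "b".
console.log(input.match(/a.*b/)[0]);  // "abbbbbbb"

// Non-greedy: .*? takes as little as possible, stopping at the first "b".
console.log(input.match(/a.*?b/)[0]); // "ab"

// The "abcd" walkthrough: greedy matching still ends up with "ab",
// but only after backtracking from the end of the string.
console.log("abcd".match(/a.*b/)[0]); // "ab"
```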
I confess that I didn't really stop to debug the complicated regexp that you posted, but I believe the problem lies before the '=' symbol is matched. The greedy operator is in this subexpression:
(?:(?:\\\.|[^.,])+\.?)+
I'd try changing it to a non-greedy version:
(?:(?:\\\.|[^.,])+\.?)+?
but this is really just a wild guess. I'm using pattern recognition to solve the problem :-) It makes sense because the engine backtracks over each character of the "value" until it matches the operator. The name is matched linearly.
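To get a feel for how a quantified group that itself contains a quantifier behaves, here is a rough timing sketch. It uses a simplified stand-in pattern (nested, an assumption made for illustration), not the original matcher, and it forces the match to fail so that the engine has to try every way of splitting the digits between the inner and outer +. Absolute timings depend on the engine and the machine, so none are shown.

```javascript
// Simplified stand-in: a repeated group containing a repeated class,
// followed by something ("==") that the failing input never contains.
const nested = /^(?:[^.,]+\.?)+==/;

function timeFailure(digits) {
  const input = "1".repeat(digits); // no "==", so the match must fail
  const start = Date.now();
  const matched = nested.test(input);
  return { digits, matched, ms: Date.now() - start };
}

// Each extra pair of digits should multiply the time by roughly four
// on a backtracking engine.
for (const digits of [14, 16, 18, 20]) {
  console.log(timeFailure(digits));
}

// When the operator is present, the same pattern matches quickly.
console.log(nested.test("x==0.4")); // true
```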
This regular expression is just too complex for my taste. I love regular expressions, but it looks like this one matches this famous quotation:
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.

Javascript + Emoji strangeness

I'm trying to use some string methods on text that has emoji embedded in it.
However, I have noticed something very strange:
"🌑".length == 2
I'm wondering why something that appears as 1 character to me is actually counted as 2.

In JavaScript, a string is a sequence of 16-bit code units. Since most emoji are encoded above the BMP, each of them is represented by a pair of code units, also known as a surrogate pair.
So for instance, 0x1F600, which is 😀, is represented by:
"\uD83D\uDE00"
If you want to go further on this, you can read this article: Emojis in Javascript - Parsing emoji in Javascript is… not easy.
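A short sketch of how this shows up in practice. toSurrogatePair is a hypothetical helper that applies the standard UTF-16 arithmetic for code points above 0xFFFF, and the emoji are written as escapes so the snippet does not depend on the file's encoding.

```javascript
const moon = "\u{1F311}"; // the new moon emoji from the question

console.log(moon.length);      // 2: counted in 16-bit code units
console.log([...moon].length); // 1: string iteration goes by code point
console.log(moon.codePointAt(0).toString(16)); // "1f311"

// Hypothetical helper: split a code point above 0xFFFF into its two
// UTF-16 code units (high surrogate, low surrogate).
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

console.log(toSurrogatePair(0x1f600).map((u) => u.toString(16))); // [ 'd83d', 'de00' ]
console.log(String.fromCodePoint(0x1f600) === "\uD83D\uDE00");    // true
```

Counting by code point still differs from counting what a user sees as one character, for example when an emoji is built from several code points joined together.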
