Javascript string.length does not equal Python len()

Imagine the following text entered in an HTML textarea (123456, two empty lines, then 7):
123456


7
If one calculates the length of this text via javascript, i.e. string.length, that comes out to 10.
Now if that input's length is measured in python, i.e. via len(string), it is 13.
It does not look like 13 to the human eye, but if one runs print repr(string) in Python, we get 123456\r\n\r\n\r\n7. That is 13 characters, not 10. For reference, this test was carried out on Ubuntu.
Is there any way for python to report the string length via a mechanism that imitates javascript's string.length's result? I.e. in simpler terms, how do I get 10 in python?
I understand I can manually iterate and collapse \r\n into a single character, but I wonder if there is a more robust - even inbuilt - way to do it? In any case, an illustrative example would be great!

You can use a regular expression, which is more elegant than iterating: replacing each \r\n pair with a single \n does the trick.
Use Python's re module:
import re
x = '123456\r\n\r\n\r\n7'
y = re.sub(r'\r\n','\n',x)
print(len(y)) #Answer will be 10
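For further reference, check out the Python docs for the re module. For a fixed substring like this one, the built-in str.replace('\r\n', '\n') works just as well.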
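The 10 vs 13 difference comes from how browsers treat line breaks: per the HTML specification, a textarea's value as read from script has its line breaks normalized to \n, while the value sent on form submission uses \r\n. The sketch below assumes the text was posted through an ordinary HTML form; it reproduces both counts in JavaScript and applies the same CRLF-to-LF normalization as the Python answer. One caveat: JavaScript's length counts UTF-16 code units while Python 3's len() counts code points, so text containing emoji can still differ after this step.

```javascript
// Text as script sees it via textarea.value: line breaks are "\n".
const inBrowser = "123456\n\n\n7";

// The same text as it arrives on the server after a form post: "\r\n" breaks.
const submitted = inBrowser.replace(/\n/g, "\r\n");

// Collapse each CRLF pair to a single LF, mirroring re.sub(r'\r\n', '\n', x).
function normalizeNewlines(s) {
  return s.replace(/\r\n/g, "\n");
}

console.log(inBrowser.length); // 10
console.log(submitted.length); // 13
console.log(normalizeNewlines(submitted).length); // 10
```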

Related

Can this numeric range regex be refactored?

I need to match a number range:
-9223372036854775808 to 9223372036854775807
^(?:922337203685477580[0-7]|9223372036854775[0-7]\d{2}|922337203685477[0-4]\d{3}|92233720368547[0-6]\d{4}|9223372036854[0-6]\d{5}|922337203685[0-3]\d{6}|92233720368[0-4]\d{7}|9223372036[0-7]\d{8}|922337203[0-5]\d{9}|92233720[0-2]\d{10}|922337[0-1]\d{12}|92233[0-6]\d{13}|9223[0-2]\d{14}|922[0-2]\d{15}|92[0-1]\d{16}|9[01]\d{17}|[1-8]\d{18}|\d{0,18}|-(?:922337203685477580[0-8]|9223372036854775[0-7]\d{2}|922337203685477[0-4]\d{3}|92233720368547[0-6]\d{4}|9223372036854[0-6]\d{5}|922337203685[0-3]\d{6}|92233720368[0-4]\d{7}|9223372036[0-7]\d{8}|922337203[0-5]\d{9}|92233720[0-2]\d{10}|922337[0-1]\d{12}|92233[0-6]\d{13}|9223[0-2]\d{14}|922[0-2]\d{15}|92[0-1]\d{16}|9[01]\d{17}|[1-8]\d{18}|\d{0,18}))?$
// space for easier copy and paste
Yes, I know it sounds crazy, but there's a long story behind this. I can't figure out how to do this in JavaScript by just checking a range, because of the size of the number, and this must be accurate.
Here's the thought process in breaking this thing down. I just started with the max number and worked my way down, then handled the negative range by just adding the - in the regex. You'll obviously have to copy and paste this thing somewhere to see it all. Also, there could be mistakes. It made my head nearly explode.
9,223,372,036,854,775,807
922337203685477580[0-7]
9223372036854775[0-7][0-9]{2}
922337203685477[0-4][0-9]{3}
92233720368547[0-6][0-9]{4}
9223372036854[0-6][0-9]{5}
922337203685[0-3][0-9]{6}
92233720368[0-4][0-9]{7}
9223372036[0-7][0-9]{8}
922337203[0-5][0-9]{9}
92233720[0-2][0-9]{10}
922337[0-1][0-9]{12}
92233[0-6][0-9]{13}
9223[0-2][0-9]{14}
922[0-2][0-9]{15}
92[0-1][0-9]{16}
9[01][0-9]{17}
[1-8][0-9]{18}
[0-9]{0,18}
There's a single digit different in the negative vs. positive, so you'll see where I had to basically duplicate most of this.
So, a few questions:
Did I do this right?
If not, what's a better way?
Can this be done without regular expressions considering the size of the number? I need to validate client-side.
Can it be refactored and still retain strict rules?
Suggestions appreciated :)
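The difficulty with a plain range check is that a JavaScript Number is a 64-bit float, so integers above Number.MAX_SAFE_INTEGER (2^53 - 1) cannot all be represented exactly, and 9223372036854775807 is far beyond that. In environments that support BigInt (ES2020 and later), a sketch like the following does the range check exactly; the function name isInt64String is illustrative only.

```javascript
// Exact bounds of a signed 64-bit integer, as BigInt values.
const INT64_MIN = -(2n ** 63n); // -9223372036854775808
const INT64_MAX = 2n ** 63n - 1n; // 9223372036854775807

// Hypothetical helper: true if s is a decimal integer within the int64 range.
function isInt64String(s) {
  // BigInt() also accepts surrounding whitespace and "0x"/"0o"/"0b" prefixes,
  // so first restrict the input to an optional minus sign followed by digits.
  if (!/^-?\d+$/.test(s)) return false;
  const n = BigInt(s);
  return n >= INT64_MIN && n <= INT64_MAX;
}

console.log(isInt64String("9223372036854775807")); // true
console.log(isInt64String("9223372036854775808")); // false
console.log(isInt64String("-9223372036854775808")); // true
```

Changing the bounds later only means changing two constants. Like the regex above, this accepts leading zeros such as "007".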

Can this be done without regular expressions considering the size of the number?
It can be done in a series of if statements using only string operations (no need to convert to numbers).
all strings that don't match [0-9]{1,19} are out
all candidates that are of length 18 or less are good
for length 19 you can work with string comparison to see if they are numerically less than your upper limit
tweak the above to take care of negative numbers
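The four steps above could be sketched as follows, assuming the only allowed sign is a leading "-" (fitsInt64 is a hypothetical name). Step 3 relies on the fact that for two digit strings of the same length, JavaScript's string comparison gives the same order as numeric comparison.

```javascript
// Digits of the largest positive value and of the magnitude of the most
// negative value of a signed 64-bit integer.
const MAX_POSITIVE = "9223372036854775807";
const MAX_NEGATIVE = "9223372036854775808";

// Hypothetical helper following the steps above, without converting to numbers.
function fitsInt64(s) {
  // Step 4: strip an optional leading minus sign and pick the matching limit.
  const negative = s.startsWith("-");
  const digits = negative ? s.slice(1) : s;
  const limit = negative ? MAX_NEGATIVE : MAX_POSITIVE;

  // Step 1: anything that is not 1 to 19 ASCII digits is out.
  if (!/^[0-9]{1,19}$/.test(digits)) return false;

  // Step 2: 18 digits or fewer always fits.
  if (digits.length <= 18) return true;

  // Step 3: same length as the limit, so lexicographic order is numeric order.
  return digits <= limit;
}

console.log(fitsInt64("9223372036854775807")); // true
console.log(fitsInt64("9223372036854775808")); // false
console.log(fitsInt64("-9223372036854775808")); // true
```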

Your regex is correct.
This is a shorter version
^(?:-9223372036854775808|-?(?:\d{0,18}|(?!922337203685477580[8-9]|92233720368547758[1-9]|92233720368547759|922337203685477[6-9]|92233720368547[8-9]|9223372036854[8-9]|922337203685[5-9]|92233720368[6-9]|92233720369|922337203[7-9]|92233720[4-9]|9223372[1-9]|922337[3-9]|92233[8-9]|9223[4-9]|922[4-9]|92[3-9]|9[3-9])\d{19}))$
Regex demo
How to generate that regex without mistakes:
Input max number:
9223372036854775807
Output:
9223372036854775807
922337203685477580
92233720368547758
9223372036854775
922337203685477
92233720368547
9223372036854
922337203685
92233720368
9223372036
922337203
92233720
9223372
922337
92233
9223
922
92
9
Replace the last digit of each line:
9->remove the whole line
8->9
7->[8-9]
6->[7-9]
5->[6-9]
4->[5-9]
3->[4-9]
2->[3-9]
1->[2-9]
0->[1-9]
Output:
922337203685477580[8-9]
92233720368547758[1-9]
92233720368547759
922337203685477[6-9]
92233720368547[8-9]
9223372036854[8-9]
922337203685[5-9]
92233720368[6-9]
92233720369
922337203[7-9]
92233720[4-9]
9223372[1-9]
922337[3-9]
92233[8-9]
9223[4-9]
922[4-9]
92[3-9]
9[3-9]
Regex [Output]
922337203685477580[8-9]|92233720368547758[1-9]|92233720368547759|922337203685477[6-9]|92233720368547[8-9]|9223372036854[8-9]|922337203685[5-9]|92233720368[6-9]|92233720369|922337203[7-9]|92233720[4-9]|9223372[1-9]|922337[3-9]|92233[8-9]|9223[4-9]|922[4-9]|92[3-9]|9[3-9]
Insert [output] into this template:
(?!output)\d{19}
This becomes [output2]:
(?!922337203685477580[8-9]|92233720368547758[1-9]|92233720368547759|922337203685477[6-9]|92233720368547[8-9]|9223372036854[8-9]|922337203685[5-9]|92233720368[6-9]|92233720369|922337203[7-9]|92233720[4-9]|9223372[1-9]|922337[3-9]|92233[8-9]|9223[4-9]|922[4-9]|92[3-9]|9[3-9])\d{19}
Matches \d{19} <= 9223372036854775807
Finally, insert [output2] into the full pattern:
^(?:-9223372036854775808|-?(?:\d{0,18}|[output2]))$
^(?:-9223372036854775808|-?(?:\d{0,18}|(?!922337203685477580[8-9]|92233720368547758[1-9]|92233720368547759|922337203685477[6-9]|92233720368547[8-9]|9223372036854[8-9]|922337203685[5-9]|92233720368[6-9]|92233720369|922337203[7-9]|92233720[4-9]|9223372[1-9]|922337[3-9]|92233[8-9]|9223[4-9]|922[4-9]|92[3-9]|9[3-9])\d{19}))$
This will match:
-9223372036854775808 or
+/- \d{0,18} or
+/- \d{19} <= 9223372036854775807
Demo
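Because the generation steps above are mechanical, they can also be scripted. The following is a sketch (buildAtMostLookahead is a hypothetical name) that assumes the maximum is given as a plain digit string; for 9223372036854775807 it produces the same alternatives, in the same order, as the [output2] lookahead above.

```javascript
// Builds "(?!...)\d{N}", which matches N-digit strings that are <= max.
// For each prefix of max (longest first), the last digit d is replaced by
// the digits greater than d; prefixes ending in 9 are dropped.
function buildAtMostLookahead(max) {
  const alternatives = [];
  for (let len = max.length; len >= 1; len--) {
    const head = max.slice(0, len - 1);
    const d = Number(max[len - 1]);
    if (d === 9) continue; // no digit is greater than 9
    const greater = d === 8 ? "9" : `[${d + 1}-9]`;
    alternatives.push(head + greater);
  }
  const digitsPart = `\\d{${max.length}}`;
  // An all-9s maximum allows every N-digit string, so no lookahead is needed.
  return alternatives.length === 0
    ? digitsPart
    : `(?!${alternatives.join("|")})${digitsPart}`;
}

const lookahead = buildAtMostLookahead("9223372036854775807");
const int64Re = new RegExp(
  `^(?:-9223372036854775808|-?(?:\\d{0,18}|${lookahead}))$`
);

console.log(int64Re.test("9223372036854775807")); // true
console.log(int64Re.test("9223372036854775808")); // false
console.log(int64Re.test("-9223372036854775808")); // true
console.log(int64Re.test("-9223372036854775809")); // false
```

As with the regex in the answer, the empty string and a lone "-" are also accepted, because \d{0,18} can match zero digits; using \d{1,18} instead rules them out.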

Reassembling negative Python marshal int's into Javascript numbers

I'm writing a client-side Python bytecode interpreter in Javascript (specifically Typescript) for a class project. Parsing the bytecode was going fine until I tried out a negative number.
In Python, marshal.dumps(2) gives 'i\x02\x00\x00\x00' and marshal.dumps(-2) gives 'i\xfe\xff\xff\xff'. This makes sense as Python represents integers using two's complement with at least 32 bits of precision.
In my Typescript code, I use the equivalent of Node.js's Buffer class (via a library called BrowserFS, instead of ArrayBuffers and etc.) to read the data. When I see the character 'i' (i.e. buffer.readUInt8(offset) == 105, signalling that the next thing is an int), I then call readInt32LE on the next offset to read a little-endian signed long (4 bytes). This works fine for positive numbers but not for negative numbers: for 1 I get '1', but for '-1' I get something like '-272777233'.
I guess that Javascript represents numbers in 64-bit (floating point?). So, it seems like the following should work:
var longval = buffer.readInt32LE(offset); // reads a 4-byte long, gives -272777233
var low32Bits = longval & 0xffff0000; //take the little endian 'most significant' 32 bits
var newval = ~low32Bits + 1; //invert the bits and add 1 to negate the original value
//but now newval = 272826368 instead of -2
I've tried a lot of different things and I've been stuck on this for days. I can't figure out how to recover the original value of the Python integer from the binary marshal string using Javascript/Typescript. Also I think I deeply misunderstand how bits work. Any thoughts would be appreciated here.
Some more specific questions might be:
Why would buffer.readInt32LE work for positive ints but not negative?
Am I using the correct method to get the 'most significant' or 'lowest' 32 bits (i.e. does & 0xffff0000 work how I think it does?)
Separate but related: in an actual 'long' number (i.e. longer than '-2'), I think there is a sign bit and a magnitude, and I think this information is stored in the 'highest' 2 bits of the number (i.e. at number & 0x000000ff?) -- is this the correct way of thinking about this?

The sequence ef bf bd is the UTF-8 sequence for the "Unicode replacement character", which Unicode encoders use to represent invalid encodings.
It sounds like whatever method you're using to download the data is getting accidentally run through a UTF-8 decoder and corrupting the raw datastream. Be sure you're using blob instead of text, or whatever the equivalent is for the way you're downloading the bytecode.
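This corrupted only the negative values because the bytes of a small positive int (02 00 00 00 for 2) are all below 0x80, i.e. plain ASCII, which passes through a UTF-8 decode and re-encode unchanged. The bytes fe and ff of a negative value are never valid in UTF-8, so each one is replaced.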
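The mechanism described in this answer can be reproduced with a short sketch. It uses the standard DataView, TextDecoder and TextEncoder APIs in place of BrowserFS's Buffer, and assumes an environment that provides them (a modern browser or Node.js). The input bytes are those of marshal.dumps(-2) from the question, minus the leading 'i'.

```javascript
// marshal.dumps(-2) payload after the 'i' type byte: little-endian int32 -2.
const original = new Uint8Array([0xfe, 0xff, 0xff, 0xff]);

// Reading the raw bytes as a signed little-endian int32 (what readInt32LE does).
const good = new DataView(original.buffer, original.byteOffset, original.byteLength)
  .getInt32(0, true);

// Simulated bug: the bytes are decoded as UTF-8 text and encoded back.
// 0xfe and 0xff are never valid in UTF-8, so each becomes U+FFFD,
// which re-encodes as the three bytes ef bf bd.
const text = new TextDecoder("utf-8").decode(original);
const mangled = new TextEncoder().encode(text);
const bad = new DataView(mangled.buffer, mangled.byteOffset, mangled.byteLength)
  .getInt32(0, true);

console.log(good); // -2
console.log(bad); // -272777233 (bytes ef bf bd ef)
```

The second value is exactly the one reported in the question, which suggests readInt32LE itself was fine and only the data was damaged.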

Javascript + Emoji strangeness

I'm trying to do some string methods with some text that has Emoji embedded inside of it.
However, this is a very strange thing I have seen:
"🌑".length == 2
I'm just wondering how something that appears as 1 character to me is actually 2.

In JavaScript, a string is a sequence of 16-bit code units (UTF-16). Since most emoji are encoded above the Basic Multilingual Plane (BMP), each one is represented by a pair of code units, known as a surrogate pair.
So for instance, 0x1F600, which is 😀, is represented by:
"\uD83D\uDE00"
If you're willing to go further on this, you can read this article: Emojis in Javascript - Parsing emoji in Javascript is… not easy.
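The surrogate-pair arithmetic can be sketched as follows; toSurrogatePair is a hypothetical helper, and the emoji from the question is written as the escape "\u{1F311}" so the example does not depend on the source file's encoding.

```javascript
// "\u{1F311}" is the new-moon emoji from the question.
const moon = "\u{1F311}";

// Splits a code point above U+FFFF into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000; // 20-bit value
  const high = 0xd800 + (offset >> 10); // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

console.log(moon.length); // 2: UTF-16 code units
console.log([...moon].length); // 1: string iteration goes by code point
console.log(moon.codePointAt(0).toString(16)); // 1f311
console.log(toSurrogatePair(0x1f600).map((u) => u.toString(16))); // d83d, de00
```

Counting code points is still not the same as counting what the user sees as characters: flags and some family emoji combine several code points. Intl.Segmenter, where available, splits text into such grapheme clusters.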

How to subtract 2 chars in javascript to get a difference in ascii

alert('g' - 'a') shows NaN (Not a Number).
But I expect to get the difference between the ASCII codes, as with alert(103 - 97), i.e. alert(6), so 6 should be output.
In C, int i = 'g' - 'a' gives i = 6.
How can I achieve this subtraction of two characters in JavaScript easily, with less effort than the following?
alert("g".charCodeAt(0) - "a".charCodeAt(0)) gives 6.
Application: I am using this in a chess program.

The only practicable way to do as you want is the way you've already suggested:
alert('g'.charCodeAt(0) - 'a'.charCodeAt(0));
As you know, this retrieves the character code of the character at index 0 of each string (for ASCII characters, this is the ASCII code) and subtracts the second from the first.
Unfortunately, there is no shorter built-in way to get a character's code. Wrapping it in a function would make call sites somewhat simpler, though given how short the charCodeAt() solution already is, not by much.
References:
String.charCodeAt().

JavaScript doesn't treat characters as numbers; they are single-character strings instead. So the subtraction operator computes Number('g') - Number('a'), which is NaN - NaN, i.e. NaN.
You should do 'g'.charCodeAt(0) - 'a'.charCodeAt(0) (there is no better way, but you can wrap it in a function)

You can write yourself a custom function. Something like this:
function asciiDif(a, b) {
  return a.charCodeAt(0) - b.charCodeAt(0);
}
And then:
alert(asciiDif('g','a'));
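For the chess use case mentioned in the question, the same subtraction maps board squares to array indices. The sketch below assumes a board indexed 0 to 7 in both directions; squareToIndices and indicesToSquare are hypothetical names.

```javascript
// Hypothetical helpers for a chess board indexed 0..7 in both directions.
const FILE_A = "a".charCodeAt(0); // 97
const RANK_1 = "1".charCodeAt(0); // 49

// "e4" -> [4, 3]
function squareToIndices(square) {
  if (square.length !== 2) throw new RangeError(`not a square: ${square}`);
  const file = square.charCodeAt(0) - FILE_A;
  const rank = square.charCodeAt(1) - RANK_1;
  if (file < 0 || file > 7 || rank < 0 || rank > 7) {
    throw new RangeError(`not a square: ${square}`);
  }
  return [file, rank];
}

// [4, 3] -> "e4": the reverse direction uses String.fromCharCode.
function indicesToSquare(file, rank) {
  return String.fromCharCode(FILE_A + file, RANK_1 + rank);
}

console.log(squareToIndices("g1")); // [6, 0]: 'g' - 'a' is 6, as in the question
```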

Javascript percentage validation

I am after a regular expression that validates a percentage from 0 to 100 and allows two decimal places.
Does anyone know how to do this, or know of a good website with examples of common regular expressions used for client-side validation in JavaScript?
#Tom - Thanks for the questions. Ideally there would be no leading zeros or other trailing characters.
Thanks to all those who have replied so far. I have found the comments really interesting.

Rather than using regular expressions for this, I would simply convert the user's entered number to a floating point value, and then check for the range you want (0 to 100). Trying to do numeric range validation with regular expressions is almost always the wrong tool for the job.
var x = parseFloat(str);
if (isNaN(x) || x < 0 || x > 100) {
// value is out of range
}
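parseFloat reads as many leading numeric characters as it can, so parseFloat('15x') is 15 and the range check above would accept it. The following sketch keeps the numeric range check but validates the shape of the string first, assuming the question's rules (at most two decimals, no leading zeros); isPercentage is a hypothetical name.

```javascript
// Hypothetical helper: 0 to 100, at most two decimals, no leading zeros.
function isPercentage(str) {
  // Shape check first, so inputs like "15x", "", " 5" or "1e2" are rejected
  // instead of being partially parsed the way parseFloat("15x") gives 15.
  if (!/^(0|[1-9]\d{0,2})(\.\d{1,2})?$/.test(str)) return false;
  // The shape rules out signs, so only the upper bound needs checking.
  return Number(str) <= 100;
}

console.log(parseFloat("15x")); // 15
console.log(isPercentage("15x")); // false
console.log(isPercentage("99.99")); // true
console.log(isPercentage("100.01")); // false
```

The regex only describes the format; the range itself stays a plain comparison, so moving the limit to 315 later means changing one number.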

I propose this one:
(^100(\.0{1,2})?$)|(^([1-9]([0-9])?|0)(\.[0-9]{1,2})?$)
It matches 100, 100.0 and 100.00 using this part
^100(\.0{1,2})?$
and numbers like 0, 15, 99, 3.1, 21.67 using
^([1-9]([0-9])?|0)(\.[0-9]{1,2})?$
Note that leading zeros are prohibited, but trailing zeros are allowed (though no more than two decimal places).
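A quick way to confirm the behaviour described in this answer is to run the pattern, unchanged, against a few sample inputs; the sample lists are illustrative choices, not part of the original answer.

```javascript
// The pattern from this answer, unchanged.
const percentRe = /(^100(\.0{1,2})?$)|(^([1-9]([0-9])?|0)(\.[0-9]{1,2})?$)/;

const shouldMatch = ["0", "15", "99", "3.1", "21.67", "50.10", "100", "100.0", "100.00"];
const shouldNotMatch = ["05", "100.5", "101", "3.141", "7.", ""];

for (const s of shouldMatch) console.log(JSON.stringify(s), percentRe.test(s)); // all true
for (const s of shouldNotMatch) console.log(JSON.stringify(s), percentRe.test(s)); // all false
```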

This reminds me of an old blog entry by Alex Papadimoulis (of The Daily WTF fame) where he tells the following story:
"A client has asked me to build and install a custom shelving system. I'm at the point where I need to nail it, but I'm not sure what to use to pound the nails in. Should I use an old shoe or a glass bottle?"
How would you answer the question?
It depends. If you are looking to pound a small (20lb) nail in something like drywall, you'll find it much easier to use the bottle, especially if the shoe is dirty. However, if you are trying to drive a heavy nail into some wood, go with the shoe: the bottle will shatter in your hand.
There is something fundamentally wrong with the way you are building; you need to use real tools. Yes, it may involve a trip to the toolbox (or even to the hardware store), but doing it the right way is going to save a lot of time, money, and aggravation through the lifecycle of your product. You need to stop building things for money until you understand the basics of construction.
This is the kind of question where most people see it as a challenge to come up with the correct regular expression, but it would be much better to just say that using regular expressions is the wrong tool for the job.
The problem with using a regex to validate numeric ranges is that it is hard to change when the requirements for the allowed range change. Today the requirement may be to validate numbers between 0 and 100, and it is possible to write a regex for that which doesn't make your eyes bleed. But next week the requirement may change so that values between 0 and 315 are allowed. Good luck altering your regex.
The solution given by Greg Hewgill is probably better - even though it would validate "99fxx" as "99". But given the circumstances that might actually be ok.

Given that your value is in str:
str.match(/^(100(\.0{1,2})?|([0-9]?[0-9](\.[0-9]{1,2})?))$/)

^100(\.(0){0,2})?$|^([1-9]?[0-9])(\.(\d{0,2}))?\%$
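This would match:
100, optionally followed by a dot and up to two zeros (e.g. 100.00)
or an optional "1-9" followed by a digit (this makes the integer part), optionally followed by a dot and up to two digits, then a required % sign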
From what I see, Greg Hewgill's example doesn't really work that well because parseFloat('15x') would simply return 15 which would match the 0<x<100 condition. Using parseFloat is clearly wrong because it doesn't validate the percentage value, it tries to force a validation. Some people around here are complaining about leading zeroes and some are ignoring trailing invalid characters. Maybe the author of the question should edit it and make clear what he needs.

I recommend this if you are not exclusively developing for English-speaking users:
[0-9]{1,2}((,|\.)[0-9]{1,10})?%?
You can simply replace the 10 with a 2 to allow at most two decimal places.
My example will match:
15.5
5.4366%
1,43
50,55%
34
45%
Of course, the output of this one is harder to convert to a number, but something like this will do (Java code):
private static Double getMyVal(String myVal) {
    if (myVal.contains("%")) {
        myVal = myVal.replace("%", "");
    }
    if (myVal.contains(",")) {
        myVal = myVal.replace(',', '.');
    }
    return Double.valueOf(myVal);
}
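Since this question is about client-side validation, a JavaScript counterpart of the Java helper may be useful. This is a sketch, not a direct port: Java's String.replace replaces every occurrence while JavaScript's replace with a string argument replaces only the first, so global regexes are used, and instead of throwing like Double.valueOf, it returns NaN for unparseable input.

```javascript
// Hypothetical JavaScript counterpart of the Java getMyVal helper above.
function getMyVal(myVal) {
  // Drop percent signs and turn decimal commas into dots, as the Java code does.
  const normalized = myVal.replace(/%/g, "").replace(/,/g, ".");
  // Number("") is 0, so treat an empty result as invalid instead.
  return normalized === "" ? NaN : Number(normalized);
}

console.log(getMyVal("50,55%")); // 50.55
console.log(getMyVal("45%")); // 45
console.log(getMyVal("1,43")); // 1.43
```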

None of the above solutions worked for me, as I needed my regex to allow values with digits and a trailing decimal point while the user is typing, e.g. '18.'
This solution allows for an empty string so the user can delete their entire input, and accounts for the other rules articulated above.
/(^$)|(^100(\.0{1,2})?$)|(^([1-9]([0-9])?|0)\.$)|(^([1-9]([0-9])?|0)(\.[0-9]{1,2})?$)/
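Another way to get the same while-typing behaviour is to keep two patterns: a lenient one for partial input and a strict one for the final value. The sketch below assumes the same rules as this answer (0 to 100, at most two decimals, no leading zeros); the names partialPercent and completePercent are hypothetical.

```javascript
// While typing: also allow "", a trailing dot ("18.", "100.") and one decimal so far.
const partialPercent = /^(|100(\.0{0,2})?|([1-9][0-9]?|0)(\.[0-9]{0,2})?)$/;

// On blur or submit: require a complete value (not empty, no trailing dot).
const completePercent = /^(100(\.0{1,2})?|([1-9][0-9]?|0)(\.[0-9]{1,2})?)$/;

console.log(partialPercent.test("18.")); // true
console.log(completePercent.test("18.")); // false
console.log(partialPercent.test("18..5")); // false
```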

^(100(\.0{1,2})?|[0-9]{1,2}(\.[0-9]{1,2})?)$
That should be the regex you want. I suggest you read Mastering Regular Expressions and download RegexBuddy or The Regex Coach.

#mlarsen:
It's not that a regex can't do the job here.
Remember that validation must be done on both the client and the server side, so something like:
100|(([1-9][0-9])|[0-9])(\.(([0-9][1-9])|[1-9]))?
would be a cross-language check. Since it is not anchored, just be sure to compare the length of the input with the length of the match.

(100(\.(0){1,2})?|([1-9]{1}|[0-9]{2})(\.[0-9]{1,2})?)

Develop Reference

JavaScript is the programming language of the Web.