Count bytes in textarea using javascript

I need to count how long a textarea's content is in bytes when UTF-8 encoded, using JavaScript. Any idea how I would do this?
Thanks!

encodeURIComponent(text).replace(/%[A-F\d]{2}/g, 'U').length

Combining various answers, the following method should be fast and accurate, and avoids issues with invalid surrogate pairs that can cause errors in encodeURIComponent():
function getUTF8Length(s) {
  var len = 0;
  for (var i = 0; i < s.length; i++) {
    var code = s.charCodeAt(i);
    if (code <= 0x7f) {
      len += 1;
    } else if (code <= 0x7ff) {
      len += 2;
    } else if (code >= 0xd800 && code <= 0xdfff) {
      // Surrogate pair: These take 4 bytes in UTF-8 and 2 chars in UCS-2
      // (Assume next char is the other [valid] half and just skip it)
      len += 4; i++;
    } else {
      // Remaining BMP code units (0x800-0xffff, surrogates excluded) take 3 bytes;
      // charCodeAt() never returns a value above 0xffff
      len += 3;
    }
  }
  return len;
}
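As a variation on the function above, here is a minimal sketch that iterates by code point instead of by UTF-16 code unit, so it does not have to assume that a surrogate is followed by its valid other half. The name utf8LengthByCodePoint is made up for this illustration, and counting a lone surrogate as 3 bytes is an assumption chosen to match TextEncoder, which replaces lone surrogates with U+FFFD.

```javascript
// Hypothetical helper: counts UTF-8 bytes by walking code points.
function utf8LengthByCodePoint(s) {
  let len = 0;
  // for...of iterates a string by code point; a lone surrogate is
  // yielded on its own as a single code unit.
  for (const ch of s) {
    const cp = ch.codePointAt(0);
    if (cp <= 0x7f) {
      len += 1;
    } else if (cp <= 0x7ff) {
      len += 2;
    } else if (cp <= 0xffff) {
      // Includes lone surrogates (assumption: count them as U+FFFD, 3 bytes).
      len += 3;
    } else {
      len += 4;
    }
  }
  return len;
}

console.log(utf8LengthByCodePoint('foo€')); // 6
```

This trades a little speed for not skipping a character after an unpaired surrogate.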

[June 2020: The previous answer has been replaced because it returned incorrect results.]
Most modern JS environments (browsers and Node) now support the TextEncoder API, which may be used as follows to count UTF-8 bytes:
const textEncoder = new TextEncoder();
textEncoder.encode('⤀⦀⨀').length; // => 9
This is not quite as fast as the getUTF8Length() function mentioned in other answers, below, but should suffice for all but the most demanding use cases. Moreover, it has the benefit of leveraging a standard API that is well-tested, well-maintained, and portable.
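Applied to the original question, a minimal sketch might wrap the TextEncoder call in a helper and hook it up to a textarea. The function name utf8ByteLength and the element ids message and byteCount are assumptions made for this example, not part of any API.

```javascript
// Hypothetical helper around the standard TextEncoder API.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// Browser-only wiring; the element ids are made up for this sketch.
if (typeof document !== 'undefined') {
  const textarea = document.getElementById('message');
  const counter = document.getElementById('byteCount');
  if (textarea && counter) {
    textarea.addEventListener('input', () => {
      counter.textContent = String(utf8ByteLength(textarea.value));
    });
  }
}

console.log(utf8ByteLength('foo€')); // 6
```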

If you have non-BMP characters in your string, it's a little more complicated...
Because JavaScript strings are UTF-16 encoded and a "character" is a 16-bit code unit, characters outside the Basic Multilingual Plane (4 bytes in UTF-8) are stored as two code units, a surrogate pair:
<script type="text/javascript">
var nonBmpString = "foo😀";
console.log( nonBmpString.length );
// will output 5
</script>
The character "😀" (U+1F600) is a single character that takes 4 bytes in UTF-8. JavaScript counts it as 2 characters, because in JS a character is a 16-bit block.
So to correctly get the byte size of a mixed string, we have to write our own function, fixedCharCodeAt():
function fixedCharCodeAt(str, idx) {
  idx = idx || 0;
  var code = str.charCodeAt(idx);
  var hi, low;
  if (0xD800 <= code && code <= 0xDBFF) { // High surrogate (could change last hex to 0xDB7F to treat high private surrogates as single characters)
    hi = code;
    low = str.charCodeAt(idx + 1);
    if (isNaN(low)) {
      throw 'Not a valid character or memory error!';
    }
    return ((hi - 0xD800) * 0x400) + (low - 0xDC00) + 0x10000;
  }
  if (0xDC00 <= code && code <= 0xDFFF) { // Low surrogate
    // We return false to allow loops to skip this iteration, since the high surrogate should already have been handled in the previous iteration
    return false;
    /*hi = str.charCodeAt(idx-1);
    low = code;
    return ((hi - 0xD800) * 0x400) + (low - 0xDC00) + 0x10000;*/
  }
  return code;
}
Now we can count the bytes...
function countUtf8(str) {
  var result = 0;
  for (var n = 0; n < str.length; n++) {
    var charCode = fixedCharCodeAt(str, n);
    if (typeof charCode === "number") {
      if (charCode < 128) {
        result = result + 1;
      } else if (charCode < 2048) {
        result = result + 2;
      } else if (charCode < 65536) {
        result = result + 3;
      } else if (charCode < 2097152) {
        result = result + 4;
      } else if (charCode < 67108864) {
        result = result + 5;
      } else {
        result = result + 6;
      }
    }
  }
  return result;
}
By the way...
You should not use the encodeURI-method, because, it's a native browser function ;)
Cheers
frankneff.ch / #frank_neff

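Add a byte-length counting function to String:
String.prototype.Blength = function() {
  var arr = this.match(/[^\x00-\xff]/g);
  return arr == null ? this.length : this.length + arr.length;
}
Then you can use .Blength() to get the size. Note that this counts every character outside \x00-\xff as 2 bytes, which matches double-byte encodings rather than UTF-8 (where "é" takes 2 bytes and "€" takes 3).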

I have been asking myself the same thing. This is the best answer I have stumbled upon:
http://www.inter-locale.com/demos/countBytes.html
Here is the code snippet:
<script type="text/javascript">
function checkLength() {
  var countMe = document.getElementById("someText").value;
  var escapedStr = encodeURI(countMe);
  var count;
  if (escapedStr.indexOf("%") != -1) {
    count = escapedStr.split("%").length - 1; // number of %XX escapes, one byte each
    if (count == 0) count++; //perverse case; can't happen with real UTF-8
    var tmp = escapedStr.length - (count * 3); // unescaped characters, one byte each
    count = count + tmp;
  } else {
    count = escapedStr.length;
  }
  alert(escapedStr + ": size is " + count);
}
</script>
The link contains a live example of it to play with. encodeURI(STRING) is the building block here, but also look at encodeURIComponent(STRING) (as already pointed out in the previous answer) to see which one fits your needs.
Regards

encodeURI(text).split(/%..|./).length - 1

How about simple:
unescape(encodeURIComponent(utf8text)).length
The trick is that encodeURIComponent seems to work on characters while unescape works on bytes.
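To make the trick concrete, here is a small sketch of the intermediate steps. Note that unescape() is deprecated, although it is still widely available; the variable names are only illustrative.

```javascript
const text = 'foo€';

// Step 1: encodeURIComponent turns each UTF-8 byte of a non-ASCII
// character into a %XX escape.
const percentEncoded = encodeURIComponent(text);
console.log(percentEncoded); // foo%E2%82%AC

// Step 2: unescape turns every %XX escape back into ONE character,
// so the resulting string has one character per UTF-8 byte.
const byteString = unescape(percentEncoded);
console.log(byteString.length); // 6
```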

Try the following:
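function b(c) {
  var n = 0;
  for (var i = 0; i < c.length; i++) {
    var p = c.charCodeAt(i);
    if (p < 128) {
      n++;
    } else if (p < 2048) {
      n += 2;
    } else if (p >= 0xD800 && p <= 0xDBFF) {
      // high surrogate: the pair encodes one 4-byte character, so skip the low half
      n += 4; i++;
    } else {
      n += 3;
    }
  }
  return n;
}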

Just set the meta charset to UTF-8 and it's OK:
<meta charset="UTF-8">
<meta http-equiv="content-type" content="text/html;charset=utf-8">
and the JS:
if ($mytext.length > 10) {
  // it's OK :)
  // (note: .length counts UTF-16 code units, not UTF-8 bytes)
}

Related

Caesar Cipher technique and reverse case in javascript

I am a beginner and want to write my own function.
I want to hash the password by shifting every character by a given x
positions and reversing lowercase/uppercase.
I think the code below should return "EFGH7654", but it returns 55 with no error message.
How can I fix it? Is it because I put a function inside a function?
Or did I type something wrong?
function hashPassword(password, x) {
// password is a string, x is a number
// return a string
// (ex. password = 'ab1By', x = 3 so it should return "DE4eB")
function shift(text, s) {
result = "";
for (let i = 0; i < text.length; i++) {
let char = text[i];
if (char.toUpperCase(text[i])) {
let ch = String.fromCharCode((char.charCodeAt(0) + s - 65) % 26 + 65);
result += ch;
} else {
let ch = String.fromCharCode((char.charCodeAt(0) + s - 97) % 26 + 97);
result += ch;
}
}
return result;
}
function reversecase(x) {
var output = '';
for (var i = 0, len = x.length; i < len; i++) {
var character = x[i];
if (character == character.toLowerCase()) {
// The character is lowercase
output = output + character.toUpperCase();
} else {
// The character is uppercase
output = output + character.toLowerCase();
}
}
return output
}
var str = "";
var result = "";
var charcode = "";
for (var i = 0; i < password.length; i++) {
if (typeof password[i] === typeof str) {
char = shift(password[i], x)
charcode = reversecase(char)
result += charcode;
} else {
num = password[i] + x
number = num % 10
result += number.toString()
}
}
return result
};
console.log(hashPassword("abcd4321", 4))

There are quite a few problems in your code.
The first problem is not only the nesting, but the fact that you declare the result variable in the outer function scope using the var keyword, and then read and write that same variable in different places:
- in function shift() (including its return statement)
- in the outer function (including its return statement)
The thing you have to understand is that you're referring to the same variable result every time. To ensure that your variables are scoped, i.e. are only valid within a block (if statement, function body, etc.), you should use the let or const keywords. This makes your code a lot safer.
The second problem is some of the assumptions you make regarding data types. If you have a string let s = "my string 123", the expression typeof s[x] === 'string' will be true for every valid index x in s.
Another problem is the algorithm itself. The outer function hashPassword() iterates over all characters of the input string. Within that loop you call function shift(password[i], x), passing a single character. The first parameter of shift() is called text - and there is another for loop (which is confusing and does not make sense).
To make things short, please have a look at this simplified version:
function shift(char, x) {
  let result;
  const code = char.charCodeAt(0);
  if (code >= 65 && code < 91) {
    result = String.fromCharCode((code + x - 65) % 26 + 65);
  }
  else if (code >= 48 && code <= 57) {
    result = String.fromCharCode((code + x - 48) % 10 + 48);
  }
  else {
    result = String.fromCharCode((code + x - 97) % 26 + 97);
  }
  return result;
}
function reverseCase(character) {
  if (character === character.toLowerCase()) {
    return character.toUpperCase();
  }
  else {
    return character.toLowerCase();
  }
}
function hashPassword(password, x) {
  let result = "";
  for (let i = 0; i < password.length; i++) {
    const char = shift(password[i], x);
    result += reverseCase(char);
  }
  return result;
}
console.log(hashPassword("abcd4321", 4)); // Output: EFGH8765
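As a small illustration of the data-type point made in this answer: indexing a string always yields a one-character string, so typeof cannot tell digits from letters. The helper isDigit below is a made-up name for this sketch; it compares the character itself instead.

```javascript
const s = "ab1By";

// Every element of a string is itself a string, including "1".
const types = Array.from(s, (ch) => typeof ch);
console.log(types.every((t) => t === "string")); // true

// Hypothetical helper: test the character's value, not its type.
function isDigit(ch) {
  return ch >= "0" && ch <= "9";
}
console.log(isDigit(s[2])); // true
console.log(isDigit(s[0])); // false
```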

JS Caesar Cipher - Need help reviewing my code for repeating letters

I created a function to encode or decode messages, shown below, but I am struggling to find the error in my code. The function caesar(str, num) shifts each letter of str by num places in the alphabet. For example, when I input caesar('Aaa', 1) I expect 'Bbb' in return, but with my function I am getting 'BBbb'. And when I input caesar('AAAAaaaa',1) I get 'BBBBBBBBbbbb'. I'm not sure why the uppercase letters are repeated and printed twice while the lowercase ones are fine. Thanks for any help.
const caesar = function(str, num) {
let secret = '';
for ( let i = 0; i < str.length; i++) {
let index = str.charCodeAt(i);
if (index >= 65 && index <= 90) {
secret += String.fromCharCode(index + num);
} else (index >= 97 && index <= 122)
secret += String.fromCharCode(index + num);
}
return secret;
}
console.log(caesar('Aaa', 1));
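A likely cause, offered as a sketch since this thread has no answer here: `else (index >= 97 && index <= 122)` is not `else if (...)`. The parenthesized expression becomes the whole else branch and is simply evaluated and discarded, so the following `secret += ...` line runs on every iteration. Uppercase letters are appended once by the if branch and once more by that unconditional line. The version below uses `else if`; the wrap-around past 'z' and the pass-through of non-letters are additions of this sketch, not something the question's code does.

```javascript
const caesar = function (str, num) {
  let secret = '';
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code >= 65 && code <= 90) {
      // Uppercase: shift within A-Z (double modulo handles negative num).
      secret += String.fromCharCode(((code - 65 + num) % 26 + 26) % 26 + 65);
    } else if (code >= 97 && code <= 122) {
      // Lowercase: shift within a-z.
      secret += String.fromCharCode(((code - 97 + num) % 26 + 26) % 26 + 97);
    } else {
      // Assumption: leave everything else unchanged.
      secret += str[i];
    }
  }
  return secret;
};

console.log(caesar('Aaa', 1)); // Bbb
```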

Encryption... almost works

I wrote a simple script for a website called Codewars (here: https://www.codewars.com/kata/57814d79a56c88e3e0000786). The purpose of the function was to encrypt a string such that every second character would appear first, and then the rest of them. I tested many random strings of text; it worked for a while. But then, I tested a specific case with 17 characters: "maybe do i really", and it resulted in a character being dropped (notably a space). Initially, I thought the issue was that the .join method didn't allow a double space in a row, so I attempted to make my own function to mimic its functionality: it did not solve the problem. Could anyone answer why this specific string loses a character and returns a wrong encryption? My jsfiddle is here: https://jsfiddle.net/MCBlastoise/fwz62j2g/
Edit: I neglected to mention that it runs a certain number of times based on parameter n, encrypting the string multiple times per that value.
And my code is here:
function encrypt(text, n) {
if (n <= 0 || isNaN(n) === true || text === "" || text === null) {
return text;
}
else {
for (i = 1; i <= n; i++) {
if (i > 1) {
text = encryptedString;
}
var evenChars = [];
var oddChars = [];
for (j = 0; j < text.length; j++) {
if (j % 2 === 0) {
evenChars.push(text.charAt(j));
}
else {
oddChars.push(text.charAt(j));
}
}
var encryptedString = oddChars.join("") + evenChars.join("");
}
return encryptedString;
}
}
function decrypt(encryptedText, n) {
if (n <= 0 || encryptedText === "" || encryptedText === null) {
return encryptedText;
}
else {
for (i = 1; i <= n; i++) {
if (i > 1) {
encryptedText = decryptedString;
}
var oddChars = [];
var evenChars = [];
for (j = 0; j < encryptedText.length; j++) {
if (j < Math.floor(encryptedText.length / 2)) {
oddChars.push(encryptedText.charAt(j));
}
else {
evenChars.push(encryptedText.charAt(j));
}
}
var convertedChars = []
for (k = 0; k < evenChars.length; k++) {
convertedChars.push(evenChars[k]);
convertedChars.push(oddChars[k]);
}
var decryptedString = convertedChars.join("");
}
return decryptedString;
}
}
document.getElementById("text").innerHTML = encrypt("maybe do i really", 1);
document.getElementById("text2").innerHTML = decrypt("ab oiralmyed ely", 1)
<p id="text"></p>
<p id="text2"></p>

There is nothing wrong with the code itself. When rendering, HTML collapses runs of two or more spaces into a single space, so the character is only missing on screen. You can use a <pre> tag for a case like this.
document.getElementById("text").innerHTML = "<pre>" + encrypt("maybe do i really", 1) + "</pre>";
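To check that the string itself is intact, a minimal sketch can inspect the value before it reaches the DOM. The helper encryptOnce is a condensed, single-pass stand-in for the question's encrypt(text, 1), written for this example. The browser part assumes the same "text" element id and uses textContent with the CSS white-space: pre-wrap setting as an alternative to <pre>.

```javascript
// Condensed single-pass version of the question's algorithm:
// characters at odd indices first, then those at even indices.
function encryptOnce(text) {
  let odd = '';
  let even = '';
  for (let j = 0; j < text.length; j++) {
    if (j % 2 === 0) {
      even += text.charAt(j);
    } else {
      odd += text.charAt(j);
    }
  }
  return odd + even;
}

const encrypted = encryptOnce("maybe do i really");
// JSON.stringify makes the double space visible in the console.
console.log(JSON.stringify(encrypted), encrypted.length); // "ab oiralmyed  ely" 17

// Browser-only: preserve whitespace when displaying the result.
if (typeof document !== 'undefined') {
  const el = document.getElementById('text');
  if (el) {
    el.style.whiteSpace = 'pre-wrap';
    el.textContent = encrypted;
  }
}
```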

How do I check if an input contains an isbn using javascript

I need a script that will test an input field's contents to see if it contains an ISBN. I found a few examples of this, but none of them strip the dashes. I need this to happen or my search results don't work. I have the else part of the script working if the field doesn't have an ISBN, but can't get the ISBN test to work. Thank you in advance for any help!
function search() {
var textboxdata = $('#search').val();
if (textboxdata contains an ISBN number, strip it of dashes and) {
// perform ISBN search
document.location.href = "http://myurl?search=" + textboxdata;
}
else { //perform other search
}
}

Based on the algorithms given in the Wikipedia article, here's a simple javascript function for validating 10- and 13-digit ISBNs:
var isValidIsbn = function(str) {
  var sum,
      weight,
      digit,
      check,
      i;
  str = str.replace(/[^0-9X]/gi, '');
  if (str.length != 10 && str.length != 13) {
    return false;
  }
  if (str.length == 13) {
    sum = 0;
    for (i = 0; i < 12; i++) {
      digit = parseInt(str[i]);
      if (i % 2 == 1) {
        sum += 3*digit;
      } else {
        sum += digit;
      }
    }
    check = (10 - (sum % 10)) % 10;
    return (check == str[str.length-1]);
  }
  if (str.length == 10) {
    weight = 10;
    sum = 0;
    for (i = 0; i < 9; i++) {
      digit = parseInt(str[i]);
      sum += weight*digit;
      weight--;
    }
    check = (11 - (sum % 11)) % 11;
    if (check == 10) {
      check = 'X';
    }
    return (check == str[str.length-1].toUpperCase());
  }
}
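To connect this back to the question, which also needs the dashes stripped before searching, here is a self-contained sketch. The function normalizeIsbn and its return convention (the stripped ISBN, or null when the input is not a valid ISBN) are assumptions made for this example. The validator is a condensed rewrite of the same two checksum rules, not the code above verbatim.

```javascript
// Condensed checksum validation for already-stripped input.
function isValidIsbn(s) {
  if (/^\d{9}[\dX]$/.test(s)) {
    let sum = 0;
    for (let i = 0; i < 9; i++) {
      sum += (10 - i) * Number(s[i]); // weights 10 down to 2
    }
    const check = (11 - (sum % 11)) % 11;
    return s[9] === (check === 10 ? 'X' : String(check));
  }
  if (/^\d{13}$/.test(s)) {
    let sum = 0;
    for (let i = 0; i < 12; i++) {
      sum += Number(s[i]) * (i % 2 === 1 ? 3 : 1); // weights 1,3,1,3,...
    }
    return Number(s[12]) === (10 - (sum % 10)) % 10;
  }
  return false;
}

// Hypothetical helper: strip dashes and spaces, then validate.
function normalizeIsbn(input) {
  const stripped = input.replace(/[\s-]/g, '').toUpperCase();
  return isValidIsbn(stripped) ? stripped : null;
}

console.log(normalizeIsbn('978-0-306-40615-7')); // 9780306406157
console.log(normalizeIsbn('some book title'));   // null
```

In the question's search() function, a non-null result could be appended to the search URL, while null would fall through to the other search branch.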

There is also a js library available for checking ISBN10 and ISBN13 formatting: isbnjs as well as isbn-verify
Edit 2/2/17 - previous link was to Google Code, some updated current links:
- npm for isbn-verify
- npm for isbnjs
- Github project

Take a look at this Wikipedia article:
http://en.wikipedia.org/wiki/International_Standard_Book_Number
Should give you some insight into how to validate an ISBN number.

Derek's code fails for this ISBN: "0756603390"
It's because the check digit will end up as 11.
incorrect: check = 11 - (sum % 11);
correct: check = (11 - (sum % 11)) % 11;
I tested the new code against 500 ISBN10s.

Is there a good javascript snippet anyone knows for formatting "abbreviated" numbers?

The key is abbreviated. For example, 1m instead of 1000000, and 12k instead of 12000 etc. - much like on StackOverflow!
I'm not sure what else to add other than I've tried:
format numbers abbreviated javascript
format numbers short javascript
And a few other searches and scoured the results with no luck. I feel like someone must have done this before, hate reinventing wheels and all that!
Cheers
Edit: I'm looking to take a number, i.e. 12345 and turn it into 12k
Sorry I wasn't very clear!

Here's some code I wrote quite some time ago, and it still works fine. It goes in the other direction, parsing strings like 1.5M or 10k into numbers, and it even supports decimals.
function is_numeric(string) {
for(var i = 0; i < string.length; i++) {
if(string.charAt(i) < '0' || string.charAt(i) > '9') {
return false;
}
}
return true;
}
function charValueMultiplier(letter) {
switch(letter) {
case 'M':
case 'm': return 1000000;
case 'k':
case 'K': return 1000;
default: return 0;
}
}
// parse string like 1.5M or 10k and return the number
function parseNumber(string) {
string = string.replace(/ /g, ''); // remove spaces
var total = 0;
var partial = 0;
var partialFraction = 0;
var fractionLength = 0;
var isFraction = false;
// process the string; update total if we find a unit character
for(var i = 0; i < string.length; i++) {
var c = string.substr(i, 1);
if(c == '.' || c == ',') {
isFraction = true;
}
else if(is_numeric(c)) {
if(isFraction) {
partialFraction = partialFraction * 10 + parseInt(c, 10);
fractionLength++;
}
else {
partial = partial * 10 + parseInt(c, 10);
}
}
else {
total += charValueMultiplier(c) * partial + charValueMultiplier(c) * partialFraction / Math.pow(10, fractionLength);
partial = 0;
partialFraction = 0;
fractionLength = 0;
isFraction = false;
}
}
return Math.round(total + partial + partialFraction / Math.pow(10, fractionLength));
}

I made an npm package that can do this for you: https://www.npmjs.com/package/approximate-number
Usage for Node.js (or browsers via Browserify):
npm install --save approximate-number
and then in your code:
var approx = require('approximate-number');
approx(123456); // "123k"
Or, for usage in browsers via Bower:
bower install approximate-number
And then
window.approximateNumber(123456); // "123k"

If I understand correctly, you have a number n and want to format it to a string. Then
// n being the number to be formatted
var s = "" + n; // cast as string
if (n >= 1000000) {
s = s.substring(0, s.length - 6) + "M";
} else if (n >= 1000) {
s = s.substring(0, s.length - 3) + "k";
}
should do the job. You can of course extend it to your needs or even include numbers < 1.
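The same idea can be sketched with arithmetic instead of substring operations, which also copes with non-integer and negative input. The function name abbreviateNumber is made up for this example, the lowercase "m" follows the question's wording rather than the "M" used above, and truncating instead of rounding is an assumption based on the question's 12345 to 12k example.

```javascript
// Hypothetical helper: 12345 -> "12k", 1000000 -> "1m".
function abbreviateNumber(n) {
  const abs = Math.abs(n);
  if (abs >= 1e6) {
    return Math.trunc(n / 1e6) + 'm';
  }
  if (abs >= 1e3) {
    return Math.trunc(n / 1e3) + 'k';
  }
  return String(n);
}

console.log(abbreviateNumber(12345));   // 12k
console.log(abbreviateNumber(1000000)); // 1m
```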
