Regex for extracting separate letters in a loop with javascript - javascript

I'm working on a script to create metrics for online author identification. One of the things I came across in the literature is counting the frequency of each letter (how many a's, how many b's, etc.) regardless of upper or lower case. Since I don't want to write a separate statement for each letter, I'm trying to do it in a loop, but I can't figure it out. The best I have come up with is converting the ASCII letter code into hex, and then... hopefully a miracle happens.
So far, I've got
element = id.toLowerCase();
var hex = 0;
for (k=97; k<122; k++){
hex = k.toString(16); //gets me to hex
letter = element.replace(/[^\hex]/g, "")//remove everything but the current letter I'm looking for
return letter.length // the length of the resulting string is how many times the letter came up
}
but of course, when I do that, it interprets hex as the letters h e x, not the hex code for the letter I want.
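A regex literal such as /[^\hex]/ is fixed when the code is written, so the variable never gets into the pattern. A minimal sketch of the same loop using the RegExp constructor, which builds the pattern from a string at run time. The function name letterCounts is hypothetical, and the loop uses <= 122 because the original k<122 stops before "z":

```javascript
// Hypothetical helper: counts each letter a-z, ignoring case,
// by building one regex per letter with the RegExp constructor.
function letterCounts(text) {
  const lower = text.toLowerCase();
  const counts = {};
  // 97 is "a" and 122 is "z"; <= makes sure "z" is included
  for (let code = 97; code <= 122; code++) {
    const letter = String.fromCharCode(code);
    // Matches every character that is NOT the current letter
    const notLetter = new RegExp("[^" + letter + "]", "g");
    // What remains after deleting everything else is just this letter
    counts[letter] = lower.replace(notLetter, "").length;
  }
  return counts;
}

console.log(letterCounts("Hello World").l); // 3
```

This scans the string 26 times; a single pass that tallies every character does the same job in one scan.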

Not sure why you'd want to convert to hex, but you could loop through the string's characters and keep track of how many times each one has appeared with an object used as a hash:
var element = id.toLowerCase();
var keys = {};
for(var i = 0, len = element.length; i<len; i++) {
if(keys[element.charAt(i)]) keys[element.charAt(i)]++;
else keys[element.charAt(i)] = 1;
}
You could also use an array to do the same thing, but an object keyed by the character itself saves you from converting characters to indexes.
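For the author-identification use case only the letters a-z matter, so a fixed array of 26 counters indexed by charCode - 97 is one way to do the same tally in a single pass. A sketch; the function name letterFrequencies is hypothetical, and anything outside a-z (digits, spaces, punctuation, accented letters) is simply skipped:

```javascript
// Hypothetical helper: single-pass, case-insensitive count of the letters a-z.
// Index 0 holds the count for "a", index 25 the count for "z".
function letterFrequencies(text) {
  const counts = new Array(26).fill(0);
  const lower = text.toLowerCase();
  for (let i = 0; i < lower.length; i++) {
    const index = lower.charCodeAt(i) - 97; // 97 is the code for "a"
    if (index >= 0 && index < 26) counts[index]++; // ignore non-letters
  }
  return counts;
}

const freq = letterFrequencies("Hello World");
console.log(freq[11]); // 3 (index 11 is "l")
```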

Related

Sort all characters in a string

I'm trying to solve this problem where I want to sort the characters in a string.
Problem:
Sort an array of characters (ASCII only, not UTF8).
Input: A string of characters, like a full English sentence, delimited by a newline or NULL. Duplicates are okay.
eg: This is easy
Output: A string of characters, in sorted order of their ASCII values. You can overwrite the existing array.
eg: Taehiisssy
Solution Complexity: Aim for linear time and constant additional space.
I know that in JavaScript you can do something like
const sorted = str.split('').sort().join('')
But this would be O(n log n), not linear (plus O(n) extra space for the split).
So how would we sort an array of characters in constant space?
EDIT: I'm trying to see whether I can get anything out of the charCodeAt(i) method.
Count the occurrences of each character, then rebuild the string from the counts (a counting sort):
const s = "This is easy";
// Create an array which will hold the counts of each character, from 0 to 255 (although strictly speaking ASCII is only up to 127)
const count = Array(256).fill(0);
// Look at each character in the input and increment the count for that character in the array.
for (let i = 0; i < s.length; i++) {
  const c = s.charCodeAt(i);
  count[c]++;
}
let out = "";
// Now scan through the character count array ...
for (let i = 0; i <= 255; i++) {
  // ... and for each character, e.g. "T", output it as many times as it appeared in the input
  for (let rep = 0; rep < count[i]; rep++) {
    out += String.fromCharCode(i);
  }
}
console.log(out);
This only uses a constant-size table, 256 numbers long (or however many distinct symbols you wish to allow).
The time it takes is O(n + 256): linear in the number of characters in the input string, plus one fixed scan of the table (the inner loop does no work for characters whose count is zero).
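The problem statement also allows overwriting the existing array. A sketch of the same counting sort that writes the sorted codes back into an existing array of character codes, so the only extra storage is the fixed 256-entry table. The name sortCodesInPlace is hypothetical, it assumes every code is below 256, and building the code array from a string (the Array.from call below) costs O(n) on its own:

```javascript
// Hypothetical helper: counting sort of character codes (each assumed < 256),
// overwriting the input array; extra space is only the fixed-size count table.
function sortCodesInPlace(codes) {
  const count = new Array(256).fill(0);
  for (let i = 0; i < codes.length; i++) count[codes[i]]++;
  let write = 0; // next position to overwrite
  for (let c = 0; c < 256; c++) {
    for (let n = 0; n < count[c]; n++) codes[write++] = c;
  }
  return codes;
}

const codes = Array.from("This is easy", (ch) => ch.charCodeAt(0));
sortCodesInPlace(codes);
console.log(JSON.stringify(String.fromCharCode(...codes))); // "  Taehiisssy"
```

Note the two leading spaces: a space (32) sorts before "T" (84), which the example output in the question leaves out.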

List of random numbers separated by commas

Okay, so I am making a Mario-inspired game with randomly generated terrain. It is all working fine; however, the array of random numbers that randomises the terrain must be the same each time, so that the user can enter a seed, which is then merged with the larger list to provide a set of random numbers based on the seed. I cannot think of any way to make this array the same each time without writing it out, and even then, writing out an array of 1000 numbers would take a long time. Can anyone suggest a fast way? (Online number generators don't format the output as a single line of numbers separated by commas, so I cannot use them.)
Or could someone provide me with a list on a single line, separated by commas, that I can easily copy and paste into an array? Thanks! :)
The following code in Javascript will generate 1000 random numbers separated by commas.
var string = "";
var numberOfRandomNumbers = 1000;
for (var i = 0; i < numberOfRandomNumbers; i++) {
var randomNumber = Math.floor((Math.random() * 1000) + 1); //Will generate random number between 1 and 1000
string += randomNumber+",";
}
console.log(string.substring(0, string.length - 1)); //Print string to console and remove last comma
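Since the goal is a list that is identical on every run for a given seed, a seeded pseudo-random generator avoids pasting 1000 numbers into the source at all. A sketch using mulberry32, a small, widely circulated 32-bit generator (swapped in here; it is not part of the answer above, and it is not suitable for anything security-related). seededNumbers is a hypothetical helper name:

```javascript
// mulberry32: returns a function that yields floats in [0, 1),
// producing the same sequence every time for the same 32-bit seed.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: `count` integers between 1 and 1000, determined by `seed`
function seededNumbers(seed, count) {
  const random = mulberry32(seed);
  const numbers = [];
  for (let i = 0; i < count; i++) {
    numbers.push(Math.floor(random() * 1000) + 1);
  }
  return numbers;
}

console.log(seededNumbers(42, 10).join(","));
```

Calling seededNumbers(seed, 1000) with the user's seed gives the same 1000 numbers every time, so nothing needs to be hard-coded.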

JavaScript: Non-standard ways to convert characters to codes, or strings to binary arrays (arrays of numbers)

Is there a trick in JavaScript to convert a character to its code, other than .charCodeAt(i)?
I want to convert a string to a binary array in the fastest way possible (i.e. faster than charCodeAt can do it).
But I'm also interested in slower methods.
You could do a lookup table, but I doubt that it would be any faster. Something like:
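function lookupCharCodes(str) { // illustrative name
  // Build a lookup table from character to code for codes 0-255
  var codes = {},
      output = [];
  for (var i = 0; i < 256; i++) {
    codes[String.fromCharCode(i)] = i;
  }
  // Look up each character; anything outside 0-255 comes back undefined
  for (var i = 0; i < str.length; i++) {
    output.push(codes[str[i]]);
  }
  return output;
}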
Generally speaking though if you can go with native code, go with native code.
I doubt you will find something faster than this code to accomplish what you want:
function getCharArray(str)
{
var chars = [];
for (var i = 0, n = str.length; i != n; ++i) {
chars.push(str.charCodeAt(i));
}
return chars;
}
You have to do it this way because JavaScript doesn't have a separate type for a single character; even a single character is still a string and therefore you need String.charCodeAt().
Strings in JavaScript are Unicode (sequences of 16-bit UTF-16 code units), not ASCII, meaning that the code of a single character might need more than one byte.
"€".charCodeAt(0) === 8364
This fits into two bytes. If you write it out as binary and read it back, those bytes could be interpreted either as one 2-byte character or as two 1-byte characters; there is no way to know. Knowing the length of a Unicode character is tricky. This might help you see why this problem is hard to solve: http://www.joelonsoftware.com/articles/Unicode.html
Of course I'm not saying you can't convert Unicode strings to binary; I'm saying that this is the task you need to solve, not just converting single characters to bytes.
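When what is actually wanted is the bytes of a Unicode string, a standard encoding has to be chosen. A short sketch using the built-in TextEncoder / TextDecoder (available in modern browsers and as globals in current Node.js), which encode to and decode from UTF-8; in UTF-8 the "€" above takes three bytes:

```javascript
// Encode a string to UTF-8 bytes and decode it back.
const bytes = new TextEncoder().encode("a€"); // a Uint8Array
console.log(Array.from(bytes)); // [ 97, 226, 130, 172 ]

const text = new TextDecoder().decode(bytes); // TextDecoder defaults to UTF-8
console.log(text === "a€"); // true
```

UTF-8 avoids the ambiguity described above because each lead byte states how many bytes the character uses.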

Calculate real length of a string, like we do with the caret

What I want is to calculate how many times the caret moves from the beginning to the end of the string.
Explanations:
Look this string "" in this fiddle: http://jsfiddle.net/RFuQ3/
If you put the caret before the first quote then push the right arrow ► you will push 3 times to arrive after the second quote (instead of 2 times for an empty string).
The first and easiest way to calculate the length of a string is <string>.length.
But here, it returns 2.
The second way, from JavaScript Get real length of a string (without entities) gives 2 too.
How can I get 1?
1. I thought of putting the string in a text input and then running a while loop with try{setCaret}catch(){}.
2. It's just for fun.
The character in your question "󠀁" is the
Unicode Character 'LANGUAGE TAG' (U+E0001).
From the following Stack Overflow questions,
" Expressing UTF-16 unicode characters in JavaScript"
" How can I tell if a string contains multibyte characters in Javascript?"
we learn that
JavaScript strings are UCS-2 encoded but can represent Unicode code points outside the Basic Multilingual Plane (U+0000-U+D7FF and U+E000-U+FFFF) using two 16-bit numbers (a UTF-16 surrogate pair), the first of which must be in the range U+D800-U+DBFF and the second in U+DC00-U+DFFF.
The UTF-16 surrogate pair representing "󠀁" is U+DB40 and U+DC01. In decimal U+DB40 is 56128, and U+DC01 is 56321.
console.log("󠀁".length); // 2
console.log("󠀁".charCodeAt(0)); // 56128
console.log("󠀁".charCodeAt(1)); // 56321
console.log("\uDB40\uDC01" === "󠀁"); // true
console.log(String.fromCharCode(0xDB40, 0xDC01) === "󠀁"); // true
Adapting the code from https://stackoverflow.com/a/4885062/788324, we just need to count the number of code points to arrive at the correct answer:
var getNumCodePoints = function(str) {
var numCodePoints = 0;
for (var i = 0; i < str.length; i++) {
var charCode = str.charCodeAt(i);
if ((charCode & 0xF800) == 0xD800) {
i++;
}
numCodePoints++;
}
return numCodePoints;
};
console.log(getNumCodePoints("󠀁")); // 1
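In engines that support ES2015, iterating a string already walks code points rather than 16-bit units, so the surrogate-pair check can be left to the language. A sketch (countCodePoints is a hypothetical name). A caret can also move over whole grapheme clusters, such as a letter plus a combining accent, which counting code points does not capture; newer engines provide Intl.Segmenter for that case.

```javascript
// Hypothetical helper: Array.from splits a string by code point,
// so a surrogate pair such as U+DB40 U+DC01 becomes a single element.
function countCodePoints(str) {
  return Array.from(str).length;
}

console.log(countCodePoints("\uDB40\uDC01")); // 1 (U+E0001 LANGUAGE TAG)
console.log(countCodePoints("abc")); // 3
```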
function realLength(str) {
var i = 1;
while (str.substring(i,i+1) != "") i++;
return (i-1);
}
Didn't try the code, but it should work I think.
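JavaScript doesn't really support Unicode beyond the Basic Multilingual Plane: .length and charCodeAt work on 16-bit code units, so a surrogate pair counts as two.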
You can try
yourstring.replace(/[\uD800-\uDFFF]{2}/g, "0").length
for what it's worth

Javascript split string at 160 characters and add counter?

I am looking to split a string into multiple strings of 160 characters. I thought it would be easy enough with var splits = myString.split(160);
but apparently that doesn't work.
The other point is that I want to add a counter (Android SMS style).
So let's say we have this example string (237 characters):
hi there, this message is 237 characters long, which makes it much to
long for a single message, this string will be split up into multiple
messages... this is actually for an sms application hence the reason
we need to split the string.
The final output should be
hi there, this message is 237 characters long, which makes it much to long for a single
message, this string will be split up into multiple messages... thi(1/2)
s string will be split up into multiple
messages... this is actually for an sms application hence the reason
we need to split the string.(2/2)
If there were always going to be 9 or fewer messages, I could just do
// split() doesn't cut a string into fixed-length pieces, so match() with {1,155} is used to get chunks of up to 155 characters
var splits = mystring.match(/[\s\S]{1,155}/g);
var s = splits.length;
for (var i = 0; i < s; i++)
  splits[i] += '(' + (i + 1) + '/' + s + ')';
but the issue is that there could be anywhere up to 15 messages, so the number of characters appended at the end is inconsistent (we want to keep the character count as low as possible, so zero-padding numbers less than 10 is not an option).
How can I do this?
http://jsfiddle.net/Lobstrosity/nwaYe/
It will cut off at 160 * 15 characters (since you implied that 15 was the limit). It sets both of those numbers as variables up top so you can fiddle with either one.
Update
New fiddle: http://jsfiddle.net/Lobstrosity/nwaYe/2/
It's ugly...but it works...
It figures out if the y in (x/y) is going to be one digit or two.
It uses a placeholder in the indicators while it builds up the actual message.
It replaces the placeholders.
Finally, it splits on charLimit.
I hope someone else figures out a cleaner way to do this...
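The fiddle's trick of deciding whether y has one or two digits can also be done by repeating the split until the part count stops changing. A minimal sketch under stated assumptions: splitSms is a hypothetical name, the limit is measured in JavaScript string length (UTF-16 code units, not SMS encoding units), and there is no 15-part cap:

```javascript
// Hypothetical helper: splits text into parts of at most `limit` characters,
// each ending with an "(x/y)" counter whose own length is included in the limit.
function splitSms(text, limit = 160) {
  let total = 1; // first guess at y
  for (;;) {
    const parts = [];
    let pos = 0;
    let index = 1;
    while (pos < text.length) {
      const counter = "(" + index + "/" + total + ")";
      const room = limit - counter.length; // characters left for the message
      parts.push(text.slice(pos, pos + room) + counter);
      pos += room;
      index++;
    }
    // A longer y shrinks the room per part, which can add parts, so retry
    // with the new count until it no longer changes.
    if (parts.length === total) return parts;
    total = parts.length;
  }
}

const parts = splitSms("a".repeat(2000));
console.log(parts.length); // 14
```

For a 237-character message this gives two parts, the first holding 155 characters of text plus "(1/2)".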
I think your best choice for getting optimal messages (i.e. max length) may be to handle each class of max message length separately.
Handle the cases where you have less than ten messages with your current code, then extend that method to cover cases where you have up to 99 messages. If there's a chance of going over 100 messages, then you could extend further, but that sounds unlikely.
Here's a code sample:
var splits;
if (mystring.length <= 155 * 9) {
  // Up to 9 parts: the suffix "(x/y)" is 5 characters, leaving 155 for text
  splits = mystring.match(/[\s\S]{1,155}/g) || [];
  var s = splits.length;
  for (var i = 0; i < s; i++)
    splits[i] += '(' + (i + 1) + '/' + s + ')';
} else if (mystring.length <= (154 * 9) + (153 * 90)) {
  // 10 to 99 parts: parts 1-9 carry "(x/yy)" (6 characters, 154 left for text),
  // parts 10-99 carry "(xx/yy)" (7 characters, 153 left for text)
  var splits1 = mystring.slice(0, 154 * 9).match(/[\s\S]{1,154}/g);
  var splits2 = mystring.slice(154 * 9).match(/[\s\S]{1,153}/g);
  splits = splits1.concat(splits2);
  var s = splits.length;
  for (var i = 0; i < s; i++)
    splits[i] += '(' + (i + 1) + '/' + s + ')';
}
