Text Grouping Algorithm

Given an arbitrary string of text, the task is to group the text into separate sections of a template. Each section has different min length and max length parameters. A solution can be considered optimal for a section as long as it falls within those bounds. A greedy solution might result in some sections not meeting their minimums, which means the solution as a whole is not acceptable.
I'm having trouble efficiently constructing an algorithm to do this. It seems that a dynamic programming approach might help, but thus far, I haven't been able to couch it in dynamic programming terms. Does anyone have any leads on solving this problem?
function groupText(str, template)
Inputs:
str: a string of text
template: array of JavaScript objects.
One object per section that describes the min/max amount of text allowed
Output:
array: each element corresponds to one section.
The value of the element is the text that is in the section.
As an example, let's define a string str that is equal to "This is a test." We also have a template t. t consists of several sections. Each section s has a minimum and maximum amount of characters allowed. Let's say for this example there are only two sections: s1 and s2. s1 has a minimum of 1 character and a maximum of 100. s2 has a minimum of 10 characters and a maximum of 15. We pass our string str and our template t to a function groupText. groupText must return an array, with each element i corresponding to a section. For example, element 0 will correspond to s1. The value of the element will be the text that has been assigned to the section.
In this example, a solution might be:
s1text = "This "
s2text = "is a test."

If I understood the problem correctly, there's no need for any search: subtract the sum of the minimum lengths from the total length, and what remains is the amount to be distributed. Then distribute this amount to each section, up to its maximum, until nothing is left. In code:
function sectionLengths(text, sections) {
    // Every section must receive at least its minimum.
    var minsum = 0;
    for (var i = 0; i < sections.length; i++)
        minsum += sections[i].min_size;
    var extra = text.length - minsum;
    if (extra < 0) return null; // text too short: no solution
    var solution = [];
    for (var i = 0; i < sections.length; i++) {
        // Give this section as much of the surplus as its maximum allows.
        var x = sections[i].min_size + extra;
        if (x > sections[i].max_size)
            x = sections[i].max_size;
        solution.push(x); // length assigned to section i
        extra -= x - sections[i].min_size;
    }
    if (extra > 0) return null; // text too long: no solution
    return solution;
}
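The lengths computed by this answer can be turned into the array of text the question asks for by slicing the string. Here is a minimal sketch of that, assuming each template entry carries `min_size` and `max_size` properties as in the answer's code (the question itself does not name them):

```javascript
// Sketch: hand out the minimums first, then the surplus from left to right,
// slicing the text as we go. Property names min_size / max_size are assumed.
function groupText(str, template) {
    var minsum = 0;
    for (var i = 0; i < template.length; i++)
        minsum += template[i].min_size;
    var extra = str.length - minsum;
    if (extra < 0) return null; // text too short for the minimums

    var result = [];
    var start = 0;
    for (var j = 0; j < template.length; j++) {
        var room = template[j].max_size - template[j].min_size;
        var take = Math.min(extra, room); // surplus given to this section
        var length = template[j].min_size + take;
        result.push(str.slice(start, start + length));
        start += length;
        extra -= take;
    }
    return extra > 0 ? null : result; // leftover text means it was too long
}

var t = [{ min_size: 1, max_size: 100 }, { min_size: 10, max_size: 15 }];
console.log(groupText("This is a test.", t)); // [ 'This ', 'is a test.' ]
```

Because the surplus goes to the earliest sections first, later sections stay at their minimums unless the earlier ones are full. If breaks should fall on word boundaries, that would need extra logic not shown here.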

OK, so here's an ad-hoc, untested algorithm. If it's no good, perhaps it's good enough to goad someone else into a better answer;
Let's have some trial data. Suppose your template comprises 6 sections, which have min,max limits as:
1 - 12
13 - 25
5 - 7
6 - 7
5 - 5
10 - 25
which means that you're going to need a string of at least 40 and at most 81 characters to satisfy your constraints. And therein lies the solution. First, compute a table like this:
40 - 81
39 - 69
26 - 44
21 - 37
15 - 30
10 - 25
in which each row gives the total length of string that can still be partitioned across that slot and all the slots after it. Into slot 1 you put text so that you still have between 39 and 69 characters left for the rest of the slots. Into slot 2 you put text so that you still have between 26 and 44 characters. And so on.
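The table is a pair of running sums taken from the last slot backwards. The sketch below computes it, assuming each section is an object with `min` and `max` properties (hypothetical names). The `slotRange` helper is also hypothetical: it gives the lengths slot `i` may take so that the remainder still fits the next row.

```javascript
// Row i holds the min/max total length that slots i..end can absorb.
function remainingBounds(sections) {
    const bounds = new Array(sections.length);
    let minTotal = 0;
    let maxTotal = 0;
    for (let i = sections.length - 1; i >= 0; i--) {
        minTotal += sections[i].min;
        maxTotal += sections[i].max;
        bounds[i] = { min: minTotal, max: maxTotal };
    }
    return bounds;
}

// Lengths slot i may take when `remaining` characters are still unassigned.
function slotRange(sections, bounds, i, remaining) {
    const next = i + 1 < bounds.length ? bounds[i + 1] : { min: 0, max: 0 };
    return {
        min: Math.max(sections[i].min, remaining - next.max),
        max: Math.min(sections[i].max, remaining - next.min)
    };
}

const sections = [
    { min: 1, max: 12 }, { min: 13, max: 25 }, { min: 5, max: 7 },
    { min: 6, max: 7 }, { min: 5, max: 5 }, { min: 10, max: 25 }
];
const bounds = remainingBounds(sections);
console.log(bounds.map(b => b.min + " - " + b.max));
console.log(slotRange(sections, bounds, 0, 50)); // { min: 1, max: 11 }
```

Any length inside the range returned by `slotRange` keeps the rest of the string partitionable. An empty range (min greater than max) means no solution exists for that remaining length.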

Related

Splitting an array into columns, where the total child elements is roughly equal

I'm trying to achieve some tricky functionality within Javascript for creating a directory page. The directory page works by having items assigned to a letter (which is the first letter of the item).
For example:
{
A: [Alfie, Amelia, Ava, Alex, Aaron],
B: [Ben, Bella, Blake, Bailey, Bradley]
...
}
Essentially, the output should be 4 sub-arrays. These sub-arrays should be an array of letters. However, the functionality should decide how many letters are in each array by the number of items that are assigned to that letter.
So, the first array could contain only 3 letters, as each letter has 5 items assigned (totalling 15). The second array could contain 7 letters, as the sum of their children is roughly 15.
For example:
[
[A (5), B (5), C (5)],
[E (4), F (1), G (1), H (2), I (3), J (2), K (1)]
...
]
It's worth noting that it isn't important for each of the 4 arrays to contain EXACTLY the same number of items, because it's likely that this won't be possible.
I'm not sure how I'd begin to achieve this functionality. Any points in the right direction would be much appreciated

Here is a simple approach for the question asked.
First, given a maximum height for any column, it is easy to figure out how many columns you will need: just attempt to fill them.
Next, the maximum height must lie in the range from the largest number of items assigned to a single letter to the sum of all items assigned to all letters.
Now we can do a binary search to find the smallest height that needs at most 4 columns. The key loop should look something like:
// lower starts at the largest single-letter count, upper at the total count
while (lower < upper) {
    let mid = Math.floor((lower + upper) / 2);
    if (divideIntoColumns(items, mid).length <= 4) {
        upper = mid; // mid works, so the answer is mid or smaller
    }
    else {
        lower = mid + 1; // mid needs too many columns
    }
}
As @trincot pointed out, titles also take space. The easiest way to take that into account is to make your divideIntoColumns function work with the estimated height of the column in a convenient unit (e.g. pixels). Even if that multiplies the size of your numbers by a factor of 20, that's just 4-5 extra rounds of binary search, which will be fine.
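Here is one possible end-to-end sketch of this idea. It assumes the input is simply an array of item counts per letter, in alphabetical order, and that columns are filled greedily from left to right. The body of `divideIntoColumns` and the name `smallestHeight` are illustrative, since the answer does not give them.

```javascript
// Greedily pack counts into columns, starting a new column whenever the
// next letter would push the current one past maxHeight.
function divideIntoColumns(counts, maxHeight) {
    const columns = [];
    let current = [];
    let height = 0;
    for (const c of counts) {
        if (current.length > 0 && height + c > maxHeight) {
            columns.push(current);
            current = [];
            height = 0;
        }
        current.push(c);
        height += c;
    }
    if (current.length > 0) columns.push(current);
    return columns;
}

// Binary search for the smallest height that needs at most maxColumns.
// Assumes counts is non-empty.
function smallestHeight(counts, maxColumns) {
    let lower = Math.max(...counts);               // a column must fit the largest letter
    let upper = counts.reduce((a, b) => a + b, 0); // one column holding everything
    while (lower < upper) {
        const mid = Math.floor((lower + upper) / 2);
        if (divideIntoColumns(counts, mid).length <= maxColumns) {
            upper = mid;
        } else {
            lower = mid + 1;
        }
    }
    return lower;
}

const counts = [5, 5, 5, 4, 1, 1, 2, 3, 2, 1];
const height = smallestHeight(counts, 4);
console.log(height, divideIntoColumns(counts, height));
```

The search works because the number of columns needed never increases as the allowed height grows, so the predicate "fits in at most 4 columns" flips from false to true exactly once.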

Writing a text scrambling algorithm backwards

Explanation
I'm building a simple word scrambler/encoder for fun. Given a seed and the text to convert, the algorithm to make the original string illegible is to:
Loop through the characters of a user-given seed, say "lkj"
Loop through the characters of the text to convert, say "hello"
Get the index of both according to a list of accepted characters; so if [a-z] were the characters that could be scrambled, and the loop was on index 0 for both the seed and the conversion text, I'd get l = index 11 and h = index 7
Add those two indices together: 11 + 7 = 18. If the new index runs past the end of the accepted list, subtract the length of the list from it (e.g. 33 - 26 = 7).
Get the character corresponding to index 18 on the accepted list: s is at index 18
Repeat until all the conversion characters have been looped through, returning the current seed index to 0 whenever the text runs past the end of the seed
We end up with "souwy".
The algorithm to decode text (with the same seed) is just to do everything backwards. Start at the end of the string, loop through the seed backwards from the index at which encoding would have stopped (derived from text.length % seed.length), subtract indices instead of adding, then reverse the resulting string. So if we input "lkj" as the seed and "souwy" as the conversion text, we'll get "hello" back upon decoding.
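For comparison, here is a minimal sketch of the scheme that decodes in the forward direction instead of walking backwards. This is a different approach from the reverse traversal described above: because the seed index for character `i` is always `i % seed.length`, decoding can reuse the same loop with subtraction. The accepted list is assumed to be lowercase a-z only, and the seed is assumed to contain only accepted characters. The function names are illustrative.

```javascript
const ALPHABET = "abcdefghijklmnopqrstuvwxyz";

// direction = +1 scrambles, -1 unscrambles.
function shiftText(text, seed, direction) {
    let out = "";
    for (let i = 0; i < text.length; i++) {
        const t = ALPHABET.indexOf(text[i]);
        if (t === -1) {        // characters outside the list pass through
            out += text[i];
            continue;
        }
        const s = ALPHABET.indexOf(seed[i % seed.length]);
        // Adding the alphabet length keeps the value non-negative before %.
        const index = (t + direction * s + ALPHABET.length) % ALPHABET.length;
        out += ALPHABET[index];
    }
    return out;
}

const encode = (text, seed) => shiftText(text, seed, 1);
const decode = (text, seed) => shiftText(text, seed, -1);

console.log(encode("hello", "lkj")); // "souwy"
console.log(decode("souwy", "lkj")); // "hello"
```

Since no "where did encoding stop" index has to be computed, the case where the text length is a multiple of the seed length needs no special handling.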
Problem
The decoding seems to work only sometimes. With some combination of the seed and conversion text, the algorithm fails--but I have no idea what it could be. For example, using the following information:
Seed: lkj
Input: Hey guys! My email is yay#someDomain.com, but don't send me anything U_U
Decoding fails. However, if an 'x' is added to the end of the input, it works. What could be going wrong?
Fiddle

Looks like you have an off by one error when the length of your message is a multiple of the length of your seed.
The problem line was:
var is_currSeed = (numLastWords > 0) ? (numLastWords - 1) : 0
should be changed to:
var is_currSeed = (numLastWords > 0) ? (numLastWords - 1) : seed.length - 1;
Here's a working version.

Convert arbitrary string to number between 0 and 1 in Javascript

I'm working with Youtube videos which have ids like 8DnKOc6FISU, rNsrl86inpo, 5qcmCUsw4EQ (i.e. 11 characters in the set A-Za-z0-9_-)
The goal is to convert each id to a colour (represented by the range 0-1), so they can be reliably charted.
According to this question these are 64 bit numbers. Given that:
I want to make full use of the colour space for any given set of videos
Colour perception isn't that accurate anyway
...it seems sensible to base this off the last 2-3 characters of the id.
My lead approach is a function I borrowed from here, which converts each character into a binary number like so:
function toBin(str) {
    var i, j, d;
    var arr = [];
    var len = str.length;
    for (i = 1; i <= len; i++) {
        d = str.charCodeAt(len - i); // walk the string from its last character
        for (j = 0; j < 8; j++) {
            arr.push(d % 2); // lowest bit of the character code first
            d = Math.floor(d / 2);
        }
    }
    return arr; // 8 bits per character
}
But this leaves the question of how to convert this back to a float.
Any ideas for an elegant solution?
Many thanks for your help!

Knowing that the system is base 64 (26+26+10+2), just read each symbol in turn: multiply the running total by 64 and add the symbol's index. In the end you will have an integer. Divide it by the maximum possible value to normalize it to the 0-1 range.
You'll start losing precision when the integer approaches 2^53 though, as that's the limit up to which JS can represent integers exactly.
Alternatively you can work in floats from the very start by adding (letter_index / 64^letter_position) for each position in the string, but you'd be losing more precision in the process.
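Here is a minimal sketch of the base-64 reading described above, applied to the last few characters of the id as the question suggests. The ordering of symbols in `ID_ALPHABET` is an assumption (the question only gives the character set, not its order), and `idToUnit` is a hypothetical name. Dividing by 64^n rather than by the largest value gives a result in [0, 1), which is a small deviation from the answer's wording.

```javascript
// Assumed symbol order: A-Z, a-z, 0-9, '-', '_' (64 symbols in total).
const ID_ALPHABET =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Map the last `chars` characters of an id to a number in [0, 1).
function idToUnit(id, chars = 3) {
    const tail = id.slice(-chars);
    let total = 0;
    for (const ch of tail) {
        const index = ID_ALPHABET.indexOf(ch);
        if (index === -1) throw new Error("Unexpected character: " + ch);
        total = total * 64 + index; // shift one base-64 digit, add the next
    }
    return total / Math.pow(64, tail.length);
}

console.log(idToUnit("8DnKOc6FISU"));
console.log(idToUnit("rNsrl86inpo"));
```

With three characters the integer never exceeds 64^3 - 1 = 262143, so the precision limit mentioned above is not a concern here.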

Chunk a string every odd and even position

I know nothing about javascript.
Assuming the string "3005600008000", I need to find a way to multiply all the digits in the odd numbered positions by 2 and the digits in the even numbered positions by 1.
This pseudo code I wrote outputs (I think) TRUE for the odd numbers (i.e. "0"),
var camid;
var LN= camid.length;
var mychar = camid.charAt(LN%2);
var arr = new Array(camid);
for(var i=0; i<arr.length; i++) {
var value = arr[i]%2;
Alert(i =" "+value);
}
I am not sure this is right: I don't believe it's chunking/splitting the string at odd (And later even) positions.
How do I do that? Can you please provide some hints?
/=================================================/
My goal is to implement in a web page a validation routine for a smartcard id number.
The logic I am trying to implement is as follows:
· 1) Starting from the left, multiply all the digits in the odd numbered positions by 2 and the digits in the even numbered positions by 1.
· 2) If the result of a multiplication of a single digit by 2 results in a two-digit number (say "7 x 2 = 14"), add the digits of the result together to produce a new single-digit result ("1+4=5").
· 3) Add all single-digit results together.
· 4) The check digit is the amount you must add to this result in order to reach the next highest multiple of ten. For instance, if the sum in step #3 is 22, to reach the next highest multiple of 10 (which is 30) you must add 8 to 22. Thus the check digit is 8.
That is the whole idea. Google searches on smartcard id validation returned nothing and I am beginning to think this is overkill to do this in Javascript...
Any input welcome.
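The four steps above can be sketched as follows. Positions are counted from the left exactly as the steps describe. The function name `checkDigit` is illustrative, and the input is assumed to be a string of digits only.

```javascript
// Compute the check digit following steps 1-4, counting positions from the left.
function checkDigit(digits) {
    let sum = 0;
    for (let i = 0; i < digits.length; i++) {
        let value = parseInt(digits.charAt(i), 10);
        if (isNaN(value)) throw new Error("Not a digit: " + digits.charAt(i));
        if (i % 2 === 0) {             // index 0 is position 1 (odd position)
            value *= 2;                // step 1: double odd positions
            if (value > 9) value -= 9; // step 2: 14 -> 1 + 4 = 5, same as 14 - 9
        }
        sum += value;                  // step 3: add the single-digit results
    }
    return (10 - (sum % 10)) % 10;     // step 4: distance to next multiple of 10
}

console.log(checkDigit("3005600008000")); // 8
```

For the sample string the sum in step 3 comes to 22, so the check digit is 8, matching the worked example in step 4. The trailing `% 10` returns 0 when the sum is already a multiple of ten, which is an interpretation of step 4 rather than something the steps state.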

var theArray = camid.split(''); // create an array entry for each digit in camid
var size = theArray.length, i, eachValue;
for (i = 0; i < size; i++) { // iterate over each digit
    eachValue = parseInt(theArray[i], 10); // convert the digit character to an integer
    if (!isNaN(eachValue)) {
        // index 0 is position 1, so even indices are the odd-numbered positions:
        // double those and leave the digits in even-numbered positions unchanged
        alert((i % 2 === 0) ? eachValue * 2 : eachValue);
    }
}

I discovered that what I am trying to do is called a Luhn validation.
I found an algorithm right here.
http://sites.google.com/site/abapexamples/javascript/luhn-validation
Thanks for taking the time to help me out. Much appreciated.

It looks like you might be building to a Luhn validation. If so, notice that you need to count odd/even from the RIGHT not the left of the string.

Influence Math.random()

I'm looking for a way to influence Math.random().
I have this function to generate a number from min to max:
var rand = function(min, max) {
return Math.floor(Math.random() * (max - min + 1)) + min;
}
Is there a way to make it more likely to get a low and high number than a number in the middle?
For example; rand(0, 10) would return more of 0,1,9,10 than the rest.

Is there a way to make it more likely to get a low and high number than a number in the middle?
Yes. You want to change the distribution of the numbers generated.
http://en.wikipedia.org/wiki/Random_number_generation#Generation_from_a_probability_distribution

One simple solution would be to generate an array with say, 100 elements.
In those 100 elements represent the numbers you are interested in more frequently.
As a simple example, say you wanted the numbers 1 and 10 to show up more frequently: you could overrepresent them in the array, i.e. have the number 1 in the array 20 times, the number 10 in the array 20 times, and the rest of the numbers distributed evenly over the remaining slots. Then use a random integer from 0 to 99 as the array index. This will increase your probability of getting a 1 or a 10 versus the other numbers.

You need a distribution map. Mapping from random output [0,1] to your desired distribution outcome. like [0,.3] will yield 0, [.3,.5] will yield 1, and so on.

Sure. It's not entirely clear whether you want a smooth rolloff so (for example) 2 and 8 are returned more often than 5 or 6, but the general idea works either way.
The typical way to do this is to generate a larger range of numbers than you'll output. For example, let's start with 5 as the baseline, occurring with frequency N. Let's assume that you want 4 or 6 to occur at frequency 2N, 3 or 7 at frequency 3N, 2 or 8 at frequency 4N, 1 or 9 at frequency 5N, and 0 or 10 at frequency 6N.
Adding those up, we need values from 1 to 41 (or 0 to 40, or whatever) from the generator. Any of the first 6 gives an output of 0. Any of the next 5 gives an output of 1. Any of the next 4 gives an output of 2, and so on.
Of course, this doesn't change the values returned by the original generator -- it just lets us write a generator of our own that produces numbers following the distribution we've chosen.
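Here is a small sketch of such a generator for outputs 0 to 10, assuming integer weights 6, 5, 4, 3, 2, 1, 2, 3, 4, 5, 6 (chosen for illustration, with 41 tickets in total). The name `weightedRand` and the optional `random` parameter, which allows a deterministic source to be swapped in, are assumptions of this sketch.

```javascript
// weights[v] is how many "tickets" output v owns.
function weightedRand(weights, random = Math.random) {
    const total = weights.reduce((sum, w) => sum + w, 0);
    let ticket = Math.floor(random() * total); // uniform integer in 0..total-1
    for (let value = 0; value < weights.length; value++) {
        if (ticket < weights[value]) return value;
        ticket -= weights[value];              // skip past this value's tickets
    }
    return weights.length - 1;                 // not reached while random() < 1
}

const U_SHAPED = [6, 5, 4, 3, 2, 1, 2, 3, 4, 5, 6];

// Tally a few thousand draws to see the U shape.
const tally = new Array(U_SHAPED.length).fill(0);
for (let i = 0; i < 4100; i++) tally[weightedRand(U_SHAPED)]++;
console.log(tally);
```

Changing the shape of the distribution only requires editing the weights array. The underlying uniform generator is left untouched.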

Not really. There is a sequence of numbers that are generated based off the seed. Your random numbers come from the sequence. When you call random, you are grabbing the next element of the sequence.

Can you influence the output of Math.random in javascript (which runs client side)?
No. At least not in any feasible/practical manner.
But what you could do is create your own random number generator that produces numbers in the distribution that you need.

There are probably an infinite number of ways of doing it, and you might want to think about the exact shape/curvature of the probability function.
It can probably be done in one line, but here is a multi-line approach that uses your existing function definition (named rand, here):
var dd = rand(1,5) + rand(0,5);
var result;
if (dd > 5)
result = dd - 5;
else result = 6 - dd;

One basic result is that if U is a random variable with a uniform distribution on [0, 1] and F is the cumulative distribution function you want to sample from, then Y = G(U), where G is the inverse of F, has F as its cumulative distribution. This is not necessarily the most efficient way of doing it, and generating random numbers from all sorts of distributions is a research subfield in and of itself. But for a simple transformation it might just do the trick. In your case, F(x) could be 4*(x-.5)^3+.5: it seems to satisfy all the constraints and is easy to invert and use as a transformation of the basic random number generator.
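Inverting the suggested F gives G(u) = 0.5 + cbrt((u - 0.5) / 4). Below is a minimal sketch of using it, where `inverseCdf`, `randEdges` and the optional `random` parameter are illustrative names. Scaling the result to an integer range with `Math.floor` is an assumption carried over from the `rand` function in the question.

```javascript
// Inverse of F(x) = 4 * (x - 0.5)^3 + 0.5 on [0, 1].
function inverseCdf(u) {
    return 0.5 + Math.cbrt((u - 0.5) / 4);
}

// Integer in [min, max], with low and high values more likely than middle ones.
function randEdges(min, max, random = Math.random) {
    const y = inverseCdf(random());   // in [0, 1], dense near 0 and 1
    const n = max - min + 1;
    const k = Math.floor(y * n);
    // Clamp to guard against rounding at the two ends of the range.
    return min + Math.min(n - 1, Math.max(0, k));
}

const tally = new Array(11).fill(0);
for (let i = 0; i < 10000; i++) tally[randEdges(0, 10)]++;
console.log(tally);
```

The density of this F is 12 * (x - 0.5)^2, which is zero in the exact middle and largest at both ends, so middle values come out rarely. A gentler curve would need a different F.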
