Bra size validation with RegExp (US, EU, Japan, Australia) - javascript

I want to check if an input is a valid bra measurement. In the US, bra sizes are written with an even number from 28 to 48 and a letter A-I, or AAA, AA, DD, DDD, HH or HHH. The EU, Japan and Australia use different numbers and patterns, e.g. 90C, C90 and DD6.
I want to split the letters and digits, check that the letter is A-I or AA, AAA, DD, DDD, HH or HHH, and that the number is 28-48 (even numbers only), 60-115 (increments of 5, so 65, 70, 75, etc.) or 6-28 (even numbers only).
var input = $("#form_input").val();
var bust = input.match(/[\d\.]+|\D+/g);
var vol = bust[0];
var band = bust[1];
I can write a long test condition:
if ((vol >= 28 && vol <= 48) && (band == "AAA" || band == "AA" || band == "A" || band == "B" /* || etc. */)) {
  // some code
} else {
  // error message
}
How do I shorten this and do the above things using regex?

The pattern is a bit long because of the alternatives, but you can easily adjust the ranges if something is missing or if it matches too much.
You can first check whether the input matches using test. To get vol and band, one option is to extract the digits and the uppercase characters separately from the matched string, since the letters can come before or after the number (for example 90C and C90).
^(?:(?:28|3[02468]|4[02468])(?:AA?|[BC]|D{1,4}|[E-I])|(?:[6-9][05]|1[01][05])(?:AA?|[BC]|DD?|[E-I])|[A-I](?:[6-9][05]|1[01][05])|(?:[68]|1[02468]|2[0246])(?:AA?|[BC]|DD?|[E-I]))$
Explanation
^ Start of string
(?: Non capture group for the alternatives
(?:28|3[02468]|4[02468]) Match from 28 - 48 in steps of 2
(?:AA?|[BC]|D{1,4}|[E-I]) Match AA, A, B, C, 1-4 times a D or a range E-I
| Or
(?:[6-9][05]|1[01][05]) Match from 60 - 115 in steps of 5
(?:AA?|[BC]|DD?|[E-I]) Match AA, A, B, C, DD, D or a range E-I
| Or
[A-I](?:[6-9][05]|1[01][05]) Match a range A-I and a number 60 - 115 in steps of 5
| Or
(?:[68]|1[02468]|2[0246]) Match from 6 - 26 in steps of 2
(?:AA?|[BC]|DD?|[E-I]) Match AA, A, B, C, DD, D or a range E-I
) Close alternation
$ End of string
Regex demo
const pattern = /^(?:(?:28|3[02468]|4[02468])(?:AA?|[BC]|D{1,4}|[E-I])|(?:[6-9][05]|1[01][05])(?:AA?|[BC]|DD?|[E-I])|[A-I](?:[6-9][05]|1[01][05])|(?:[68]|1[02468]|2[0246])(?:AA?|[BC]|DD?|[E-I]))$/;
const str = `28A
28AA
30B
34AA
36DDDD
D70
I115
A70
H80
6AA
26I
`;
str.split('\n').forEach(s => {
  if (pattern.test(s)) {
    console.log(`Match: ${s}`);
    let vol = s.match(/\d+/)[0];
    let band = s.match(/[A-Z]+/)[0];
    console.log(`vol: ${vol}`);
    console.log(`band: ${band}`);
    console.log("---------------------------------------");
  }
})
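Since the answer notes that the ranges are meant to be adjusted, it may help to assemble the same pattern from named pieces. The following is a minimal sketch of that idea; the names usBand, euBand, auBand, usCup, cup, sizePattern and parseSize are illustrative and not part of the answer above.

```javascript
// Named building blocks (illustrative names, same sub-patterns as the answer).
const usBand = '(?:28|3[02468]|4[02468])';   // 28-48, even numbers
const euBand = '(?:[6-9][05]|1[01][05])';     // 60-115, steps of 5
const auBand = '(?:[68]|1[02468]|2[0246])';   // 6-26, even numbers
const usCup = '(?:AA?|[BC]|D{1,4}|[E-I])';    // letters allowed after a US number
const cup = '(?:AA?|[BC]|DD?|[E-I])';         // letters allowed elsewhere

// Assemble the four alternatives into one anchored pattern.
const sizePattern = new RegExp(
  `^(?:${usBand}${usCup}|${euBand}${cup}|[A-I]${euBand}|${auBand}${cup})$`
);

// Returns { number, letters } for a valid size, or null otherwise.
function parseSize(input) {
  if (!sizePattern.test(input)) return null;
  return {
    number: Number(input.match(/\d+/)[0]),
    letters: input.match(/[A-Z]+/)[0],
  };
}

console.log(parseSize('34AA'));
console.log(parseSize('C90'));
console.log(parseSize('29B'));
```

To widen or narrow a range, only the corresponding building block needs to change; the assembled pattern follows automatically.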

^(((([0-4])(0|2|4|6|8))|(6|8))|(((6|7|8|9)(0|5))|(1[01][05])))((AAA)|(AA)|(DD)|(DDD)|(HH)|(HHH)|[A-I])$
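Proof that all valid sizes match, while none of the 100_464 sample invalid sizes do (the sample numbers are generated without leading zeros, so inputs with a leading zero such as 04A, which this pattern also accepts, are not covered):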
const validNumbers = Array
.from({ length: 22 }, (_, i) => 6 + (i * 2))
.concat(Array.from({ length: 12 }, (_, i) => 60 + (i * 5)));
const validLetters = [
'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I',
'AAA', 'AA', 'DD', 'DDD', 'HH', 'HHH'
];
const validSizes = validNumbers.map((number) => validLetters
.map((letter) => number + letter))
.flat();
const invalidNumbers = Array
.from({ length: 1_000 }, (_, i) => i)
.filter((n) => !validNumbers.includes(n))
const invalidLetters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.split('')
.map((letter) => Array.from({ length: 4 }, (_, i) => letter.repeat(i + 1)))
.flat();
const invalidSizes = invalidNumbers.map((number) => invalidLetters
.map((letter) => number + letter))
.flat();
const regex = /^(((([0-4])(0|2|4|6|8))|(6|8))|(((6|7|8|9)(0|5))|(1[01][05])))((AAA)|(AA)|(DD)|(DDD)|(HH)|(HHH)|[A-I])$/;
const falsePositives = invalidSizes.filter((size) => regex.test(size));
console.log({ falsePositives });
console.log({ validSizes: validSizes.map((size) => ({ size, isValid: regex.test(size) })) });
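The pattern above allows a leading zero in its two-digit branch ([0-4] followed by an even digit), so strings like 04A pass. If that matters, a variant along the following lines might be used instead. It is only a sketch: the name strictSize is made up for this example, and it assumes the same numbers and letters as validNumbers and validLetters above.

```javascript
// Numbers: 6, 8, 10-48 (even) or 60-115 (steps of 5), without leading zeros.
// Letters: A-I, plus AA, AAA, DD, DDD, HH and HHH.
const strictSize = /^(?:[68]|[1-3][02468]|4[02468]|[6-9][05]|1[01][05])(?:A{1,3}|D{1,3}|H{1,3}|[BCEFGI])$/;

console.log(strictSize.test('34DD'));   // true
console.log(strictSize.test('115HHH')); // true
console.log(strictSize.test('04A'));    // false
```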


How to divide a DocumentFragment based on character offset

I have a string that (potentially) contains HTML tags.
I want to split it into smaller valid HTML strings based on (text) character length. The use case is essentially pagination. I know the length of text that can fit on a single page. So I want to divide the target string into "chunks" or pages based on that character length. But I need each of the resulting pages to contain valid HTML without unclosed tags, etc.
So for example:
const pageCharacterSize = 10
const testString = 'some <strong>text with HTML</strong> tags'
function paginate(string, pageSize) { /* TODO */ }
const pages = paginate(testString, pageCharacterSize)
console.log(pages)
// ['some <strong>text </strong>', '<strong>with HTML</strong> ', 'tags']
I think this is possible to do with a DocumentFragment or Range but I can't figure out how to slice the pages based on character offsets.
This MDN page has a demo that does something close to what I need. But it uses caretPositionFromPoint() which takes X, Y coordinates as arguments.
Update
For the purposes of clarity, here are the tests I'm working with:
import { expect, test } from 'vitest'
import paginate from './paginate'
// 1
test('it should chunk plain text', () => {
// a
const testString = 'aa bb cc dd ee';
const expected = ['aa', 'bb', 'cc', 'dd', 'ee']
expect(paginate(testString, 2)).toStrictEqual(expected)
// b
const testString2 = 'a a b b c c';
const expected2 = ['a a', 'b b', 'c c']
expect(paginate(testString2, 3)).toStrictEqual(expected2)
// c
const testString3 = 'aa aa bb bb cc cc';
const expected3 = ['aa aa', 'bb bb', 'cc cc']
expect(paginate(testString3, 5)).toStrictEqual(expected3)
// d
const testString4 = 'aa bb cc';
const expected4 = ['aa', 'bb', 'cc']
expect(paginate(testString4, 4)).toStrictEqual(expected4)
// e
const testString5 = 'a b c d e f g';
const expected5 = ['a b c', 'd e f', 'g']
expect(paginate(testString5, 5)).toStrictEqual(expected5)
// f
const testString6 = 'aa bb cc';
const expected6 = ['aa bb', 'cc']
expect(paginate(testString6, 7)).toStrictEqual(expected6)
})
// 2
test('it should chunk an HTML string without stranding tags', () => {
const testString = 'aa <strong>bb</strong> <em>cc dd</em>';
const expected = ['aa', '<strong>bb</strong>', '<em>cc</em>', '<em>dd</em>']
expect(paginate(testString, 3)).toStrictEqual(expected)
})
// 3
test('it should handle tags that straddle pages', () => {
const testString = '<strong>aa bb cc</strong>';
const expected = ['<strong>aa</strong>', '<strong>bb</strong>', '<strong>cc</strong>']
expect(paginate(testString, 2)).toStrictEqual(expected)
})
Here is a solution that assumes and supports the following:
tags without attributes (you could tweak the regex to support that)
well formed tags assumed, e.g. not: <b><i>wrong nesting</b></i>, missing <b>end tag, missing start</b> tag
tags may be nested
tags are removed before splitting and restored afterwards, so that only text characters count toward the page size
pages are split at the last whitespace that still fits within the page size
function paginate(html, pageSize) {
  let splitRegex = new RegExp('\\s*[\\s\\S]{1,' + pageSize + '}(?!\\S)', 'g');
  let tagsInfo = []; // saved tags
  let tagOffset = 0; // running offset of tag in plain text
  let pageOffset = 0; // page offset in plain text
  let openTags = []; // open tags carried over to next page
  let pages = html.replace(/<\/?[a-z][a-z0-9]*>/gi, (tag, pos) => {
    let obj = { tag: tag, pos: pos - tagOffset };
    tagsInfo.push(obj);
    tagOffset += tag.length;
    return '';
  }).match(splitRegex).map(page => {
    let nextOffset = pageOffset + page.length;
    let prefix = openTags.join('');
    tagsInfo.slice().reverse().forEach(obj => {
      if(obj.pos >= pageOffset && obj.pos < nextOffset) {
        // restore tags in reverse order to maintain proper position
        page = page.substring(0, obj.pos - pageOffset) + obj.tag + page.substring(obj.pos - pageOffset);
      }
    });
    tagsInfo.forEach(obj => {
      let tag = obj.tag;
      if(obj.pos >= pageOffset && obj.pos < nextOffset) {
        if(tag.match(/<\//)) {
          // remove tag from openTags list
          tag = tag.replace(/<\//, '<');
          let index = openTags.indexOf(tag);
          if(index >= 0) {
            openTags.splice(index, 1);
          }
        } else {
          // add tag to openTags list
          openTags.push(tag);
        }
      }
    });
    pageOffset = nextOffset;
    let postfix = openTags.slice().reverse().map(tag => tag.replace(/</, '</')).join('');
    page = prefix + page.trim() + postfix;
    return page.replace(/<(\w+)><\/\1>/g, ''); // remove tags with empty content
  });
  return pages;
}
[
{ str: 'some <strong>text <i>with</i> HTML</strong> tags, and <i>some <b>nested tags</b> sould be <b>supported</b> as well</i>.', size: 16 },
{ str: 'a a b b c c', size: 3 },
{ str: 'aa aa bb bb cc cc', size: 5 },
{ str: 'aa bb cc', size: 4 },
{ str: 'aa <strong>bb</strong> <em>cc dd</em>', size: 3 },
{ str: '<strong>aa bb cc</strong>', size: 2 }
].forEach(o => {
let pages = paginate(o.str, o.size);
console.log(pages);
});
Output:
[
"some <strong>text <i>with</i></strong>",
"<strong> HTML</strong> tags, and",
"<i>some <b>nested tags</b></i>",
"<i> sould be</i>",
"<i><b>supported</b> as</i>",
"<i>well</i>."
]
[
"a a",
"b b",
"c c"
]
[
"aa aa",
"bb bb",
"cc cc"
]
[
"aa",
"bb",
"cc"
]
[
"aa",
"<strong>bb</strong>",
" <em>cc</em>",
"<em>dd</em>"
]
[
"<strong>aa</strong>",
"<strong>bb</strong>",
"<strong>cc</strong>"
]
Update
Based on a new request in the comments, I changed the split regex from '[\\s\\S]{1,' + pageSize + '}(?!\\S)' to '\\s*[\\s\\S]{1,' + pageSize + '}(?!\\S)', i.e. I added \\s* to catch leading spaces. I also added page.trim() to remove leading spaces. Finally, I added a few of the OP's examples.
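The tag bookkeeping that paginate relies on can be looked at in isolation. The sketch below pulls out just the first step: remove the tags and remember where each one sat in the plain text. The function name stripTags is hypothetical, and the same restriction applies as in the answer (tags without attributes).

```javascript
// Remove simple tags and record each tag's offset in the resulting plain text.
function stripTags(html) {
  const tags = [];
  let removed = 0; // total length of the tags removed so far
  const text = html.replace(/<\/?[a-z][a-z0-9]*>/gi, (tag, pos) => {
    // pos is the offset in the original string; subtracting the characters
    // already removed gives the offset in the plain text.
    tags.push({ tag, pos: pos - removed });
    removed += tag.length;
    return '';
  });
  return { text, tags };
}

const { text, tags } = stripTags('aa <strong>bb</strong> <em>cc dd</em>');
console.log(text); // "aa bb cc dd"
console.log(tags);
```

Because the offsets refer to the plain text, the page size can be applied to text characters only, and the tags can be put back afterwards page by page.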

How to make pseudo-random BigInt generator convert to string of particular length of characters?

Along the lines of How to get this PRNG to generate numbers within the range? , I am this far (parts is an array of 32 2-character "symbols"):
const parts = `mi
ma
mo
ne
nu
di
da
do
be
bu
ti
te
ta
to
tu
ki
ke
ka
ko
ku
si
sa
so
ze
zu
fi
fa
fo
ve
vu
xe
xu`.trim().split(/\n+/)
const fetch = (x, o) => {
if (x >= o) {
return x
} else {
const v = (x * x) % o
return (x <= (o / 2n)) ? v : o - v
}
}
const fetchLarge = (x) => fetch(x, 41223334444555556666667777777888888889999999997n)
// the last number can be anything.
const buildLarge = (x, o) => fetchLarge((fetchLarge(x) + o) % BigInt(Math.pow(32, 31)) ^ 2030507011013017019023n)
const createArray = (n, mod = 32n) => {
if (!n) return [0];
let arr = [];
while (n) {
arr.push(Number(n % mod));
n /= mod;
}
return arr;
}
const write = (i) => {
  const x = buildLarge(i, 272261127249452727280272961627319532734291n)
  return createArray(x).map(x => parts[x]).join('')
}
let i = 1n
while (i < 10000n) {
  // increment here; incrementing the parameter inside write() never advances the loop
  console.log(write(i++))
}
I am generating results along the lines of:
kitekutefaxunetotuzezumatotamabukidimasoxumoxudofasositinu,6038940986212279582529645303138677298679151
sokiketufikefotekakidotetotesamizununetefokixefitetisovene,5431347628569519336817719657935192515363318
xudamituzesimixuxemixudakedatetutununekobuzexesozuxedinenu,5713969289948157459645315228321450728816863
dazenenemovudadikukatatakibekaxexemovubedivusidatafisasine,5082175912370834928186684152014555456835302
xufotidosokabunudomimibefisimakusimokedamomazexekofomokane,4925740069222414438181195472381794794580863
sodozekadakuzemaxetexukuzumisikitazufitizexekatetotuxusone,5182433137814021540565892366585827483507958
kikokasatudatidatufikizesadimatakakatudisibumofotuzutaze,1019165422643074024784461594259815846823503
dakikinetofonexesimavufafisaxefosafisikofotasanekovetevu,1279315636939618596561544978621478602915302
kinunebebuzukokemidatekobusofokikozukobedodakesisikunuki,659622269329577207976266617866288582888591
sozesifamoxebusitotesisasizekudasomitatavudidizukadimate,480714979099063166920265752208932468511478
xumakikofakumixefotisikunumovudafasofikimozenudafosidaka,749508057657951412178315361964670398839871
dazedokutituzufakebutifokekusobuzutemanesadafadatetitamo,103886097260879003150138325027254855900902
xukemizukozefaxetudizukedimotevubesitekitavukakevutisibe,376136321524704717800574424940622855799327
dozexedivenudifabuvedavebukeketozukumasimakuvetuketomafaxe,42948292938975784099596927092482269526555367
mimasatukidisodifikekutovumazefikefonemofimotesonusazexuxe,43196343143305047528500657761292227037320224
zedafimasobukudizedozefoketuzekisadotufikudadokisakedofoxe,43000150124549846140482724444846720574088407
kisafimosotuvuvuzuzukodibevutemidazusisamokososikomofavuma,2692423943832809210699522830552769656612527
soxutokonebusidaketesomoxemibesonubudibekunumatifokokanemo,2942202721014541374299446744441542204274678
xusikematetemititafafakuxusinekefoketonebetokudonesomosama,2312137916289687577537008913213461971911327
How can I make it so all strings are of length 31 "symbols" (since symbols are 2 characters in this example, that is a total of 62 characters), like this:
xusikematetemititafafakuxusinekexusikematetemititafafakuxusine
That is: What should the 3 large bigint numbers be above in the algorithm? Also, what should they be so the distribution appears random? I noticed that using large numbers close to the boundary resulted in much better apparently-randomized results, compared to smaller numbers. Also, you can't just prefix the bigint x with 0's, which would result in mamamamamama.... Finally, there can at most be 2 pairs of same letters in a sequence, which I assume you can only really solve by just skipping over the results that don't fit that constraint (unless there is some math magic that can somehow tell if more than two of these 32 "symbols" appear next to each other).
Regarding the last part, these are valid results:
mamavumamavumama...
nanavumamavuvuma...
These are not valid:
mamamavumamavuma...
mavuvuvumamavuma...
Because there are 3 pairs in a row that are the same.
To summarize:
How to make it so all strings are 62 characters in length, without padding with zeroes? That means it must fit within some range of BigInts I'm not too sure about.
So that the distribution appears enormously random (i.e. so we don't get just the tail tip of the sequence changing slowly, but instead the entire number seems to completely change, as the examples show).
So that no more than two pairs are similar in a sequence? This part can just be solved by skipping results we find in the pseudo-randomized sequence, unless there is some magic to accomplish it that I'm not possibly fathoming :) For example, maybe there is some magic to do with multiples of similar 5-bit chunks or something, I don't know. But don't need to get fancy, skipping the results that match a regex is fine too.
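The "skip the results that break the rule" option from the last point could be sketched as a small filter. This is an illustration only; hasRunLongerThan is a made-up name, and it assumes every symbol has the same fixed length (2 characters here).

```javascript
// True if any symbol occurs more than maxRepeat times in a row.
function hasRunLongerThan(str, maxRepeat = 2, symbolLength = 2) {
  // Split on symbol boundaries so that "ma" + "am" is not mistaken for a repeat.
  const symbols = str.match(new RegExp(`.{${symbolLength}}`, 'g')) || [];
  let run = 1;
  for (let i = 1; i < symbols.length; i++) {
    run = symbols[i] === symbols[i - 1] ? run + 1 : 1;
    if (run > maxRepeat) return true;
  }
  return false;
}

console.log(hasRunLongerThan('mamavumamavumama')); // false
console.log(hasRunLongerThan('mamamavumamavuma')); // true
```

A generator loop would then simply move on to the next value whenever this returns true.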
Here we use Base 32 (hard-coded, but could be parts.length) for the 2 (maxRepeat) least significant digits and Base 31 (hard-coded, but could be parts.length-1) for the remaining digits. This gives the maximum range of values for the length.
All values from 0n to getMax(), inclusive, can be encoded to 31 (minLength) symbols.
The magic for preventing repeats longer than maxRepeat is to compare the ith digit with the already-adjusted digit at position i - maxRepeat, and to increment the ith digit by one if it is greater than or equal to that earlier digit. While this produces valid encodings (ones that follow the rules), not every symbol sequence that follows the rules can be produced this way. For example, the sequence mimami would never be generated and wouldn't be decodable.
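The size of the encodable range follows from a simple count: the first maxRepeat positions are free, and every later position has one symbol fewer to choose from. The sketch below checks that closed form against brute-force enumeration for tiny parameters. The names capacity and countByEnumeration are made up for this example, and gap plays the role of maxRepeat.

```javascript
// Closed form: `gap` free positions, then (base - 1) choices per position.
const capacity = (base, gap, length) =>
  BigInt(base) ** BigInt(gap) * BigInt(base - 1) ** BigInt(length - gap);

// Count sequences in which no symbol equals the one `gap` positions earlier.
// Only feasible for very small parameters.
function countByEnumeration(base, gap, length) {
  let count = 0;
  const seq = [];
  const walk = () => {
    if (seq.length === length) { count++; return; }
    for (let s = 0; s < base; s++) {
      if (seq.length >= gap && seq[seq.length - gap] === s) continue;
      seq.push(s);
      walk();
      seq.pop();
    }
  };
  walk();
  return count;
}

console.log(capacity(3, 2, 5), countByEnumeration(3, 2, 5)); // 72n 72
console.log(capacity(32, 2, 31).toString()); // number of values encodable in 31 symbols
```

With base 32, gap 2 and length 31 the closed form is 32^2 * 31^29, which is one more than the largest encodable value.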
const split = new RegExp(`.{2}`, 'g');
const parts = 'mimamonenudidadobebutitetatotukikekakokusisasozezufifafovevuxexu'.match(split);
const partsMap = Object.fromEntries(parts.map((v,i) => ([v,BigInt(i)])));
const encode = (value, maxRepeat = 2, minLength = 31) => {
  value = BigInt(value);
  const digits = [];
  // convert the value to digits
  // the first least significant `maxRepeat` digits use base 32, the rest use base 31
  while(value > 0) {
    const radix = digits.length < maxRepeat ? 32n : 31n;
    digits.push(value % radix);
    value /= radix;
  }
  // add 0 padding
  while(digits.length < minLength) {
    digits.push(0n);
  }
  // adjust digits to prevent sequences longer than `maxRepeat`
  const symbols = []
  digits.forEach((v,i) => {
    symbols.push((i < maxRepeat || v < symbols[i-maxRepeat]) ? v : v+1n);
  });
  // map to symbols and return string
  const str = symbols.map(v => parts[v]).join('');
  return str;
};
const decode = (str, maxRepeat = 2) => {
  // split string into array of symbols
  const symbols = str.match(split);
  // convert symbols to digits
  const digits = symbols.map(v => partsMap[v]).map((v,i,a) => {
    if(i < maxRepeat || v < a[i-maxRepeat]) return v;
    return v-1n;
  });
  // compute the threshold where we transition from base 31 to base 32
  const threshold = digits.length - maxRepeat;
  // convert digits to BigInt
  const results = digits.reverse().reduce(
    (s,v,i) => (s * (i >= threshold ? 32n : 31n) + v)
    , 0n);
  return results;
};
// compute the maximum value that can be encoded using `minLength` number of symbols
const getMax = (maxRepeat = 2, minLength = 31) => 32n ** BigInt(maxRepeat) * 31n ** BigInt(minLength - maxRepeat) - 1n;
// Consoles will print BigInt but Stackoverflow's interpreter
// doesn't understand them yet so we use `.toString()`.
console.log('limit:', getMax().toString());
console.log(encode(getMax()));
const n1 = 6038940986212279582529645303138677298679151n;
console.log(encode(n1)); // 'kitefitifomazekosaxubezutatudofotudimidanemadanumasisivebumimi'
const n2 = 0n;
console.log(encode(n2)); // 'mimimamamimimamamimimamamimimamamimimamamimimamamimimamamimima'
const s1 = 'kitefitifomazekosaxubezutatudofotudimidanemadanumasisivebumimi';
console.log(decode(s1).toString()); // 6038940986212279582529645303138677298679151
const s2 = 'mimimamamimimamamimimamamimimamamimimamamimimamamimimamamimima';
console.log(decode(s2).toString()); // 0
console.log(decode(encode(0n)) == 0n);
console.log(decode(encode(6038940986212279582529645303138677298679151n)) == 6038940986212279582529645303138677298679151n);

Reorder Data in Log Files - Javascript

I'm trying to solve the Reorder Data in Log Files algorithm.
You have an array of logs. Each log is a space delimited string of words.
For each log, the first word in each log is an alphanumeric identifier. Then, either:
Each word after the identifier will consist only of lowercase letters, or;
Each word after the identifier will consist only of digits.
We will call these two varieties of logs letter-logs and digit-logs. It is guaranteed that each log has at least one word after its identifier.
Reorder the logs so that all of the letter-logs come before any digit-log. The letter-logs are ordered lexicographically ignoring identifier, with the identifier used in case of ties. The digit-logs should be put in their original order.
Return the final order of the logs.
Example:
Input: logs = ["dig1 8 1 5 1","let1 art can","dig2 3 6","let2 own kit dig","let3 art zero"]
Output: ["let1 art can","let3 art zero","let2 own kit dig","dig1 8 1 5 1","dig2 3 6"]
My idea is having a map for the digits and one for the letters. I have done it. Then, I would need to sort the digits and letters and add all the sorted letters to my answer array and all the sorted digits to my answer array.
var reorderLogFiles = function(logs) {
if(!logs || logs.length === 0)
return [];
let numbers = {
'0': 0, '1': 1, '2': 2, '3': 3, '4': 4, '5': 5, '6': 6,
'7': 7, '8': 8, '9': 9
};
let digits = new Map();
let letters = new Map();
for(let i=0; i<logs.length; i++) {
const log = logs[i].split(" ");
if(numbers[log[1]] !== undefined)
digits.set(log[0], log.splice(1, log.length));
else
letters.set(log[0], log.splice(1, log.length));
}
// How can I sort letter and digits?
let ans = [];
for(const [key, value] of sortedLetters) {
const temp = key + " " + value.join(" ");
ans.push(temp);
}
for(const [key, value] of sortedDigits) {
const temp = key + " " + value.join(" ");
ans.push(temp);
}
return ans;
};
I think you can simplify your code somewhat. First, create the digits and letters groups by filtering the original logs; this can be made easier by first splitting all the values in logs. Next, sort the letters based on the second value in the array and add the digits to the end of the sorted array. Finally, join the strings back together:
const reorderLogFiles = logs => {
  // split values on first space
  logs = logs.map(v => v.split(/\s+(.*)/).filter(Boolean));
  // filter into digits and letters
  let digits = logs.filter(v => v[1].match(/^[\s\d]+$/));
  let letters = logs.filter(v => v[1].match(/^[a-z\s]+$/));
  // sort the letters by content, falling back to the identifier on ties
  letters.sort((a, b) => a[1].localeCompare(b[1]) || a[0].localeCompare(b[0]));
  // reassemble the list
  let result = letters.concat(digits);
  // and convert back to strings
  result = result.map(a => a.join(' '));
  return result;
}
let logs = ["dig1 8 1 5 1", "let1 art can", "dig2 3 6", "let2 own kit dig", "let3 art zero"];
console.log(reorderLogFiles(logs));
logs = ["a1 9 2 3 1", "g1 act car", "zo4 4 7", "ab1 off key dog", "a8 act zoo", "a2 act car"];
console.log(reorderLogFiles(logs));
Note this code can be written more compactly by chaining operations but I've written it out more fully to make it easier to follow.
If you don't want to use regex, you can test the first character of each substring to see if it's a digit or letter. For example:
let digits = logs.filter(v => v[1][0] >= '0' && v[1][0] <= '9');
let letters = logs.filter(v => v[1][0] >= 'a' && v[1][0] <= 'z');
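The same ordering can also be expressed as a single comparator, which may be easier to follow than filtering into two groups. This is a sketch rather than the method above: the name reorderLogs is made up, it compares strings with plain < and > instead of localeCompare, and it relies on Array.prototype.sort being stable (guaranteed since ES2019) to keep digit-logs in their original order.

```javascript
function reorderLogs(logs) {
  // Split each log once into identifier and body.
  const parsed = logs.map(log => {
    const i = log.indexOf(' ');
    const body = log.slice(i + 1);
    return { log, id: log.slice(0, i), body, isDigit: /^\d/.test(body) };
  });
  parsed.sort((a, b) => {
    // Digit-logs go after letter-logs; two digit-logs compare equal,
    // so the stable sort leaves them in input order.
    if (a.isDigit || b.isDigit) return a.isDigit - b.isDigit;
    if (a.body !== b.body) return a.body < b.body ? -1 : 1;
    if (a.id !== b.id) return a.id < b.id ? -1 : 1;
    return 0;
  });
  return parsed.map(p => p.log);
}

console.log(reorderLogs(["dig1 8 1 5 1", "let1 art can", "dig2 3 6", "let2 own kit dig", "let3 art zero"]));
```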

JavaScript Regex to find UOM in a string

I have a list of products that contain a UOM (unit of measure) in the product title. I need to detect the UOM in the title automatically using a regex.
Expectations
Banana Yogurt 70ml returns ml
Fish Nuggets 200G returns g
Potato Wedges 200 G returns g
I have this function below
detectMetricUnit = (title) => {
let unit,
regex = new RegExp(/(?:\d)/mg),
measurement = title.match(regex) && title.match(regex)[0],
matches = measurement && title.split(measurement)[1];
if(matches) {
if(/millilitre|milliliter|ml/.test(matches.toLowerCase())){
unit = 'ml';
} else if(/litre|liter|l/.test(matches.toLowerCase())){
unit = 'l';
} else if (/kilogram|kg/.test(matches.toLowerCase())) {
unit = 'kg';
} else if (/gram|g/.test(matches.toLowerCase())) {
unit = 'g';
}
}
return unit;
}
However I have some problematic strings such as
Chocolate Drink 330ML X 24 matches 3 and returns a null UOM,
whereas I am expecting to get ml.
I would appreciate it if someone could point out the mistake in my regex. How do I get the full integer and find the UOM next to it, even when they are separated by a space?
You may define a dictionary of possible UOMs you want to detect and then build a regex similar to
/(\d+(?:\.\d+)?)\s?(millilitre|milliliter|ml|litre|liter|l|kilogram|kg|gram|g)\b/i
See the regex demo. The (\d+(?:\.\d+)?) part will capture an integer or float value into Group 1, then \s? match an optional whitespace (change to \s* to match 0 or more whitespaces), and then (millilitre|milliliter|ml|litre|liter|l|kilogram|kg|gram|g)\b will capture UOM unit into Group 2 as a whole word (due to \b word boundary).
Here is the JS implementation to get the first UOM from string:
let strs = ['Banana Yogurt 70ml', 'Fish Nuggets 200G', 'Potato Wedges 200 G', 'Chocolate Drink 330ML X 24']
let dct = {millilitre: 'ml', milliliter: 'ml', ml: 'ml', litre:'l', liter: 'l', l: 'l', kilogram: 'kg', kg: 'kg', gram: 'g', g: 'g'}
const detectMetricUnit = (title) => {
  let unit, match, val,
    regex = new RegExp("(\\d+(?:\\.\\d+)?)\\s?(" + Object.keys(dct).join("|") + ")\\b", "i");
  match = title.match(regex);
  if (match) {
    val = match[1];
    unit = dct[match[2].toLowerCase()]
  }
  return [val, unit];
}
strs.forEach(x => console.log(detectMetricUnit(x)) )
To get all of them, multiple occurrences:
let strs = ['Banana Yogurt 70ml and Fish Nuggets 200G', 'Potato Wedges 200 G and Chocolate Drink 330ML X 24']
let dct = {millilitre: 'ml', milliliter: 'ml', ml: 'ml', litre:'l', liter: 'l', l: 'l', kilogram: 'kg', kg: 'kg', gram: 'g', g: 'g'}
const detectMetricUnit = (title) => {
  let match, results = [],
    regex = new RegExp("(\\d+(?:\\.\\d+)?)\\s?(" + Object.keys(dct).join("|") + ")\\b", "ig");
  while (match=regex.exec(title)) {
    results.push([ match[1], dct[match[2].toLowerCase()] ]);
  }
  return results;
}
strs.forEach(x => console.log(x, detectMetricUnit(x)) )
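A variant of the multi-match version using named groups and matchAll might read a little more clearly. This is a sketch under a few assumptions: the names UNIT_ALIASES, UNIT_REGEX and findMeasurements are made up here, \s* is used so that any amount of whitespace is allowed between number and unit, and the value is converted to a Number.

```javascript
// Alias dictionary (same entries as dct in the answer).
const UNIT_ALIASES = {
  millilitre: 'ml', milliliter: 'ml', ml: 'ml',
  litre: 'l', liter: 'l', l: 'l',
  kilogram: 'kg', kg: 'kg',
  gram: 'g', g: 'g',
};

// Number (integer or decimal), optional whitespace, then a unit as a whole word.
const UNIT_REGEX = new RegExp(
  `(?<value>\\d+(?:\\.\\d+)?)\\s*(?<unit>${Object.keys(UNIT_ALIASES).join('|')})\\b`,
  'gi'
);

// Returns every { value, unit } pair found in the title, in order.
function findMeasurements(title) {
  return Array.from(title.matchAll(UNIT_REGEX), m => ({
    value: Number(m.groups.value),
    unit: UNIT_ALIASES[m.groups.unit.toLowerCase()],
  }));
}

console.log(findMeasurements('Chocolate Drink 330ML X 24'));
console.log(findMeasurements('Banana Yogurt 70ml and 1.5 litre'));
```

Since matchAll works on a copy of the regex, the shared UNIT_REGEX can be reused across calls without resetting lastIndex.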

Picking A, C and M for Linear congruential generator

I am looking to implement a simple pseudorandom number generator (PRNG) that has a specified period and guaranteed no collisions for the duration of that period. After doing some research I came across the very famous LCG which is perfect. The problem is, I am having trouble understanding how to properly configure it. Here is my current implementation:
function LCG (state)
{
var a = ?;
var c = ?;
var m = ?;
return (a * state + c) % m;
}
It says that in order to have a full period for all seed values the following conditions must be met:
c and m are relatively prime
a-1 is divisible by all prime factors of m
a-1 is a multiple of 4 if m is a multiple of 4
1 and 3 are simple to understand and test for. However what about 2, I don't quite understand what that means or how to check for it. And what about C, can it be zero? what if it's non-zero?
Overall I need to select A, C and M in such a way that I have a period of 48^5 - 1. M is equal to the period, I am not sure about A and C.
From Wikipedia:
Provided that c is nonzero, the LCG will have a full period for all seed values if and only if:
c and m are relatively prime,
a-1 is divisible by all prime factors of m,
a-1 is a multiple of 4 if m is a multiple of 4.
You said you want a period of 48^5-1, so you must choose m≥48^5-1. Let's try choosing m=48^5-1 and see where that takes us. The conditions from the Wikipedia article prohibit you from choosing c=0 if you want the period to be m.
Note that 11, 47, 541, and 911 are the prime factors of 48^5-1, since they're all prime and 11*47*541*911 = 48^5-1.
Let's go through each of those conditions:
For c and m to be relatively prime, c and m must have no common prime factors. So, pick any prime numbers other than 11, 47, 541, and 911, then multiply them together to choose your c.
You'll need to choose a such that a-1 is divisible by all the prime factors of m, i.e., a = x*11*47*541*911 + 1 for any x of your choosing.
Your m is not a multiple of 4, so you can ignore the third condition.
In summary:
m = 48^5-1,
c = any product of primes other than 11, 47, 541, and 911 (also, c must be less than m),
a = x*11*47*541*911 + 1, for any nonnegative x of your choice (also, a must be less than m).
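The three conditions can also be checked mechanically, which answers the "how do I test for condition 2" part of the question. The following is a sketch, not a library API; primeFactors, gcd and meetsFullPeriodConditions are names chosen for this example, and plain Numbers are assumed to be large enough for the values involved.

```javascript
// Distinct prime factors of n by trial division.
function primeFactors(n) {
  const factors = new Set();
  for (let d = 2; d * d <= n; d++) {
    while (n % d === 0) {
      factors.add(d);
      n /= d;
    }
  }
  if (n > 1) factors.add(n);
  return factors;
}

// Greatest common divisor (Euclid).
function gcd(a, b) {
  while (b) [a, b] = [b, a % b];
  return a;
}

// Checks the three full-period conditions quoted from Wikipedia.
function meetsFullPeriodConditions(a, c, m) {
  if (gcd(c, m) !== 1) return false;                       // 1: c and m coprime (fails for c = 0)
  for (const p of primeFactors(m)) {
    if ((a - 1) % p !== 0) return false;                   // 2: a-1 divisible by every prime factor of m
  }
  if (m % 4 === 0 && (a - 1) % 4 !== 0) return false;      // 3: multiple-of-4 rule
  return true;
}

console.log([...primeFactors(48 ** 5 - 1)]);               // [ 11, 47, 541, 911 ]
console.log(meetsFullPeriodConditions(330, 100, 48 ** 2 - 1)); // true
```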
Here's a smaller test case (in Python) using a period of 48^2-1 (which has prime factors 7 and 47):
def lcg(state):
    x = 1
    a = x*7*47 + 1
    c = 100
    m = 48**2 - 1
    return (a * state + c) % m
expected_period = 48**2 - 1
seeds = [5]
for i in range(expected_period):
    seeds.append(lcg(seeds[-1]))
print(len(set(seeds)) == expected_period)
It outputs True, as it should. (If you have any trouble reading Python, let me know and I can translate it to JavaScript.)
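For readers who prefer JavaScript, a rough equivalent of the Python check could look like this. It uses the same constants; the variable names seen, state and expectedPeriod are chosen here and are not from the answer.

```javascript
function lcg(state) {
  const x = 1;
  const a = x * 7 * 47 + 1; // a - 1 is divisible by both prime factors of m
  const c = 100;            // shares no prime factor with m
  const m = 48 ** 2 - 1;
  return (a * state + c) % m;
}

const expectedPeriod = 48 ** 2 - 1;
const seen = new Set([5]);
let state = 5;
for (let i = 0; i < expectedPeriod; i++) {
  state = lcg(state);
  seen.add(state);
}
// Every value in the cycle is distinct, so the set holds exactly m entries.
console.log(seen.size === expectedPeriod); // true
```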
Based on Snowball's answer and the comments I've created a complete example. You can use the set == list comparison for smaller numbers. I could not fit 48^5-1 into memory.
To circumvent the a < m problem, I'm incrementing the target a few times to find a number where a is able to be < m (where m has duplicated prime factors). Surprisingly +2 is enough for a lot of numbers. The few extra numbers are later skipped while iterating.
import random
def __prime_factors(n):
    """
    https://stackoverflow.com/a/412942/6078370
    Returns all the prime factors of a positive integer
    """
    factors = []
    d = 2
    while n > 1:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
        if d * d > n:
            if n > 1: factors.append(n)
            break
    return factors
def __multiply_numbers(numbers):
    """multiply all numbers in array"""
    result = 1
    for n in numbers:
        result *= n
    return result
def __next_good_number(start):
    """
    https://en.wikipedia.org/wiki/Linear_congruential_generator#c%E2%89%A00
    some conditions apply for good/easy rotation
    """
    number = start
    factors = __prime_factors(number)
    while len(set(factors)) == len(factors) or number % 4 == 0:
        number += 1
        factors = __prime_factors(number)
    return number, set(factors)
# primes < 100 for coprime calculation. add more if your target is large
PRIMES = set([2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
              43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97])
def create_new_seed(target):
    """be aware, m might become > target"""
    m, factors = __next_good_number(target)
    a = __multiply_numbers(factors) + 1
    # https://en.wikipedia.org/wiki/Coprime_integers
    otherPrimes = [p for p in PRIMES if p not in factors]
    # the actual random part to get different results
    random.shuffle(otherPrimes)
    # I just used 3 arbitrary other primes
    c = __multiply_numbers(otherPrimes[:3])
    # first number
    state = random.randint(0, target-1)
    return state, m, a, c
def next_number(state, m, a ,c, limit):
    newState = (a * state + c) % m
    # skip out of range (__next_good_number increases original target)
    while newState >= limit:
        newState = (a * newState + c) % m
    return newState
if __name__ == "__main__":
    target = 48**5-1
    state, m, a, c = create_new_seed(target)
    print(state, m, a, c, 'target', target)
    # list and set can't fit into 16GB of memory
    checkSum = sum(range(target))
    randomSum = 0
    for i in range(target):
        state = newState = next_number(state, m, a ,c, target)
        randomSum += newState
    print(checkSum == randomSum) # True
LCG is quite fascinating and usable in things like games.
You can iterate over a giant list of things in a deterministic random order. Shuffling and saving the whole list is not required:
def riter(alist):
    """ iterate list using LCG """
    target = len(alist)
    state, m, a, c = create_new_seed(target)
    for _ in range(target):
        yield alist[state]
        state = next_number(state, m, a ,c, target)
It is easy to save the state in between iteration steps:
savedState = '18:19:25:6:12047269:20'
print('loading:', savedState)
i, state, m, a, c, target = (int(i) for i in savedState.split(':'))
state = next_number(state, m, a, c, target)
i += 1
print('i:', i, 'is random number:', state, 'list done:', i+1 == target)
print('saving:', '{}:{}:{}:{}:{}:{}'.format(i, state, m, a, c, target))
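The "visit every index once in a scrambled order" idea behind riter can be sketched in JavaScript as well. Note that this sketch swaps in a different, simpler parameter choice than the answer's: m is the next power of two at or above n, a is 5 and c is odd, which satisfies the three full-period conditions for a power-of-two modulus. The name lcgOrder and its parameters are made up for this example, and values at or above n are skipped just as next_number does.

```javascript
// Yields each index 0..n-1 exactly once, in an LCG-determined order.
// Assumption: c must be odd so that it is coprime with the power-of-two m.
function* lcgOrder(n, seed = 0, c = 1) {
  let m = 1;
  while (m < n) m *= 2;      // smallest power of two >= n
  const a = 5;               // a - 1 = 4 is divisible by 2 and by 4
  let state = seed % m;
  for (let produced = 0; produced < n; ) {
    if (state < n) {         // skip states outside the list
      yield state;
      produced++;
    }
    state = (a * state + c) % m;
  }
}

const items = ['a', 'b', 'c', 'd', 'e', 'f'];
for (const i of lcgOrder(items.length, 3)) {
  console.log(items[i]);
}
```

Only seed, c and the current state need to be stored to resume the iteration later; the order itself is never materialised.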
