Finding volume and item count using regular expressions - javascript

I am currently building a JavaScript web scraper for a grocery store that processes a title of a product and then returns the item count, volume and price per litre of a product. Most of the product titles look something like this:
Coca cola (vanilla flavour) 12 x 330 mL
In order to obtain metadata about this product, I have written a regular expression. It looks for a word boundary followed by a 1- or 2-digit number, whitespace, the character 'x', another whitespace and finally a 1-, 2- or 3-digit number:
const filter = new RegExp(/\b\d{1,2}\sx\s\d{1,3}/);
I then test each result for a match with the Regular Expression and then calculate the item count, item volume, volume in litres and then the price per litre.
if (result.title.match(filter)) {
result.itemCount = parseInt(result.title.match(/\d{1}\s/));
result.itemVolume = parseInt(result.title.match(/\d{2,3}\s/));
result.litreVolume = (result.itemCount * result.itemVolume) / 1000;
result.pricePerLitre = +(result.price / result.litreVolume).toFixed(2);
} else {
result.itemCount = 1;
result.itemVolume = parseInt(result.title.match(/\d{2,3}\s/));
result.litreVolume = result.itemVolume / 1000;
result.pricePerLitre = +(result.price / result.litreVolume).toFixed(2);
}
90% of the results look good, but sometimes I get unexpected results. For example:
an item count of NaN, which may have to do with the fact that some titles contain several more numbers (for example, Coca Cola (4-Way) 12 x 330 mL)
a volume of Infinity
a price per litre that is way too high
Clearly I am doing something wrong with my approach to calculating the desired meta data. What would be a better way of doing calculations with RegEx? Am I missing something that would make my calculations less prone to errors?

If I understand correctly, the filter \b\d{1,2}\sx\s\d{1,3} works, but your sub-filters (such as \d{1}\s) do not.
I'm only used to using regex in C#, but I saw you can use groups in Java as well.
Change your pattern to (\b\d{1,2})\sx\s(\d{1,3}). When you put parentheses in your regex, that part becomes a group that you can access afterwards.
As I said, I haven't used Java in a few years, but I picked this code snippet from the web. It shows how to use groups in Java. As the pattern you should use (\b\d{1,2})\sx\s(\d{1,3}). If it works the same as in C#, group(0) is the whole match, group(1) is your first actual group, and group(2) is the second.
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find( )) {
System.out.println("Found value: " + m.group(0) );
System.out.println("Found value: " + m.group(1) );
}
I think you can write it with less code than stated above, but you get the picture ;-)
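Since the question is about JavaScript rather than Java, here is a minimal sketch of the same capture-group idea using String.prototype.match. The function name parseTitle, the singlePattern fallback for titles without a pack count, and the sample prices are assumptions made for illustration, not part of the original code.

```javascript
// Pack titles such as "12 x 330 mL": group 1 is the count, group 2 the volume.
const packPattern = /\b(\d{1,2})\sx\s(\d{1,3})/;
// Assumed fallback for single items such as "Fanta 500 mL".
const singlePattern = /\b(\d{2,3})\s?mL\b/i;

function parseTitle(title, price) {
  let itemCount;
  let itemVolume;
  const pack = title.match(packPattern);
  if (pack) {
    // pack[0] is the whole match; pack[1] and pack[2] are the captured groups.
    itemCount = Number(pack[1]);
    itemVolume = Number(pack[2]);
  } else {
    const single = title.match(singlePattern);
    if (!single) return null; // no volume found, so nothing to compute
    itemCount = 1;
    itemVolume = Number(single[1]);
  }
  const litreVolume = (itemCount * itemVolume) / 1000;
  const pricePerLitre = +(price / litreVolume).toFixed(2);
  return { itemCount, itemVolume, litreVolume, pricePerLitre };
}

console.log(parseTitle("Coca Cola (4-Way) 12 x 330 mL", 7.92));
console.log(parseTitle("Bananas", 1.5)); // null
```

Because the count and volume come from the same match, the extra "4" in "(4-Way)" can no longer be picked up as the item count, and returning null for unparseable titles avoids the NaN and Infinity results.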

Related

How to generate a GUID with a custom alphabet, that behaves similar to an MD5 hash (in JavaScript)?

I am wondering how to generate a GUID given an input string, such that the same input string results in the same GUID (sort of like an MD5 hash). The problem with MD5 hashes is they just guarantee low collision rate, rather than uniqueness. Instead I would like something like this:
guid('v1.0.0') == 1231231231123123123112312312311231231231
guid('v1.0.1') == 6154716581615471658161547165816154716581
guid('v1.0.2') == 1883939319188393931918839393191883939319
How would you go about implementing this sort of thing (ideally in JavaScript)? Is it even possible to do? I am not sure where to start. Things like the uuid module don't take a seed string, and they don't let you use a custom format/alphabet.
I am not looking for the canonical UUID format, but rather a GUID, ideally one made up of just integers.
What you would need is to define a one-to-one mapping of text strings (such as "v1.0.0") onto 40-digit strings (such as "123123..."). This is also known as a bijection, although in your case an injection (a simple one-to-one mapping from inputs to outputs, not necessarily onto) may be enough. As you note, hash functions don't necessarily ensure this mapping, but there are other possibilities, such as full-period linear congruential generators (if they take a seed that you can map one-to-one onto input string values), or other reversible functions.
However, if the set of possible input strings is larger than the set of possible output strings, then you can't map all input strings one-to-one with all output strings (without creating duplicates), due to the pigeonhole principle.
For example, you can't generally map all 120-character strings one-to-one with all 40-digit strings unless you restrict the format of the 120-character strings in some way. However, your problem of creating 40-digit output strings can be solved if you can accept limiting input strings to no more than 10^40 values (about 132 bits), or if you can otherwise exploit redundancy in the input strings so that they are guaranteed to compress losslessly to 40 decimal digits (about 132 bits) or less, which may or may not be possible. See also this question.
The algorithm involves two steps:
First, transform the string to a BigInt by building up the string's charCodeAt() values similarly to the stringToInt method given in another answer. Throw an error if any charCodeAt() is 0x80 or greater, or if the resulting BigInt is equal to or greater than BigInt(alphabet_length)**BigInt(output_length).
Then, transform the integer to another string by taking the mod of the BigInt and the output alphabet's size and replacing each remainder with the corresponding character in the output alphabet, until the BigInt reaches 0.
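The two steps above can be sketched roughly as follows. The function name stringToId, the base-128 encoding, the leading 1 (added so that strings starting with a NUL character stay distinct), and the left-padding with the first alphabet character are all assumptions of this sketch rather than details given in the answer.

```javascript
// Maps an ASCII string one-to-one onto a fixed-length string over `alphabet`.
function stringToId(input, alphabet = "0123456789", outputLength = 40) {
  const base = BigInt(alphabet.length);
  const limit = base ** BigInt(outputLength); // number of possible outputs

  // Step 1: read the input as a base-128 integer, starting from 1.
  let value = 1n;
  for (let i = 0; i < input.length; i++) {
    const code = input.charCodeAt(i);
    if (code >= 0x80) throw new RangeError("Only ASCII input is supported");
    value = value * 128n + BigInt(code);
    if (value >= limit) throw new RangeError("Input too long for the output size");
  }

  // Step 2: write the integer out in the output alphabet.
  let out = "";
  while (value > 0n) {
    out = alphabet[Number(value % base)] + out;
    value /= base; // BigInt division truncates
  }
  return out.padStart(outputLength, alphabet[0]);
}

console.log(stringToId("v1.0.0"));
console.log(stringToId("v1.0.1"));
```

With a 10-character alphabet and 40 output digits, this sketch accepts ASCII inputs of up to roughly 18 characters before the range check rejects them.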
One approach would be to use the method from that answer:
/*
* uuid-timestamp (emitter)
* UUID v4 based on timestamp
*
* Created by tarkh
* tarkh.com (C) 2020
* https://stackoverflow.com/a/63344366/1261825
*/
const uuidEmit = () => {
// Get now time
const n = Date.now();
// Generate random
const r = Math.random(); // <- swap this
// Stringify now time and generate additional random number
const s = String(n) + String(~~(r*9e4)+1e4);
// Form UUID and return it
return `${s.slice(0,8)}-${s.slice(8,12)}-4${s.slice(12,15)}-${[8,9,'a','b'][~~(r*3)]}${s.slice(15,18)}-${s.slice(s.length-12)}`;
};
// Generate 5 UUIDs
console.log(`${uuidEmit()}
${uuidEmit()}
${uuidEmit()}
${uuidEmit()}
${uuidEmit()}`);
And simply swap out the Math.random() call to a different random function which can take your seed value. (There are numerous algorithms out there for creating a seedable random method, so I won't try prescribing a particular one).
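Most seedable random functions expect a numeric seed, so you could convert a seed string to an integer by adding up the character values, multiplying each by a power of 10 that depends on its position. This makes different short strings very likely to map to different numbers, although it is not a strict guarantee, because character codes can be up to three digits long and so can overlap: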
const stringToInt = str =>
Array.prototype.slice.call(str).reduce((result, char, index) => result += char.charCodeAt(0) * (10**(str.length - index)), 0);
console.log(stringToInt("v1.0.0"));
console.log(stringToInt("v1.0.1"));
console.log(stringToInt("v1.0.2"));
If you want to generate the exact same string every time, you can take a similar approach to tarkh's uuidEmit() method but get rid of the bits that change:
const strToInt = str =>
Array.prototype.slice.call(str).reduce((result, char, index) => result += char.charCodeAt(0) * (10**(str.length - index)), 0);
const strToId = (str, len = 40) => {
// Generate random
const r = strToInt(str);
// Multiply the number by some things to get it to the right number of digits
const rLen = `${r}`.length; // length of r as a string
// If you want to avoid any chance of collision, you can't provide too long of a string
// If a small chance of collision is okay, you can instead just truncate the string to
// your desired length
if (rLen > len) throw new Error('String too long');
// our string length is n * (r+m) + e = len, so we'll do some math to get n and m
const mMax = 9; // maximum for the exponent, too much longer and it might be represented as an exponent. If you discover "e" showing up in your string, lower this value
let m = Math.floor(Math.min(mMax, len / rLen)); // exponent
let n = Math.floor(len / (m + rLen)); // number of times we repeat r and m
let e = len - (n * (rLen + m)); // extra to pad us to the right length
return (new Array(n)).fill(0).map((_, i) => String(r * (i * 10**m))).join('')
+ String(10**e);
};
console.log(strToId("v1.0.0"));
console.log(strToId("v1.0.1"));
console.log(strToId("v1.0.2"));
console.log(strToId("v1.0.0") === strToId("v1.0.0")); // check they are the same
console.log(strToId("v1.0.0") === strToId("v1.0.1")); // check they are different
Note that this only works with short strings (probably about 10 characters at most). Within that limit collisions should be rare, though the weighted sum in stringToInt does not strictly rule them out. You could tweak it to handle longer strings (remove the multiplying bit from stringToInt), but then collisions become much more likely.
I suggest using MD5...
Following the classic birthday problem, all things being equal, the odds of 2 people sharing a birthday out of a group of 23 people is ( see https://en.wikipedia.org/wiki/Birthday_problem )...
For estimating MD5 collisions, I'm going to simplify the birthday problem formula, erring in the favor of predicting a higher chance of a collision...
Note, though, that whereas in the birthday problem a collision is a positive result, in the MD5 problem a collision is a negative result, so overstating the collision odds gives a conservative estimate of the chance of an MD5 collision. This higher predicted chance can also be treated as a fudge factor for any uneven distribution in the MD5 output, although I do not believe there is any way to quantify this without a God computer...
An MD5 hash is 16 bytes long, giving a range of 256^16 possible values. Assuming that the MD5 algorithm is generally uniform in its results, let's suppose we create one quadrillion (i.e., a million billion, or 10^15) unique strings to run through the hash algorithm. Then using the modified formula (to ease the collision calculations and to add a conservative fudge factor), the odds of a collision are...
So, after 10^15 or one quadrillion unique input strings, the estimated odds of a hash collision are on par with the odds of winning the Powerball or the Mega Millions Jackpot (which are on order of 1 in ~300,000,000 per https://www.engineeringbigdata.com/odds-winning-powerball-grand-prize-r/ ).
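As a rough sanity check of that order of magnitude, here is a small sketch using the standard birthday-bound approximation p ≈ n^2 / (2N), which holds when n is much smaller than sqrt(N). The answer's own simplified formula is not reproduced above, so the function name and the choice of approximation are assumptions; a formula that deliberately overestimates would give somewhat higher odds.

```javascript
// Approximate probability of at least one collision among n random values
// drawn uniformly from a space of 2^bits possibilities (valid for n << 2^(bits/2)).
function approxCollisionProbability(n, bits) {
  const space = 2 ** bits; // N, exactly representable as a double for bits = 128
  return (n * n) / (2 * space);
}

const p = approxCollisionProbability(1e15, 128);
console.log(p);     // on the order of 1e-9
console.log(1 / p); // "1 in ..." odds, in the hundreds of millions
```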
Note too that 256^16 is 340282366920938463463374607431768211456, which is 39 digits, falling within the desired range of 40 digits.
So, suggest using the MD5 hash ( converting to BigInt ), and if you do run into a collision, I will be more than glad to spot you a lottery ticket, just to have a chance to tap into your luck and split the proceeds...
( Note: I used https://keisan.casio.com/calculator for the calculations. )
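For what it's worth, the "MD5 hash converted to BigInt" suggestion might look like this in Node.js. This is only a sketch: it assumes Node's built-in crypto module is available, and the function name md5Id and the zero-padding to 40 digits are choices made here, not part of the answer.

```javascript
const crypto = require("crypto");

// Hash the input with MD5, read the 128-bit digest as an integer,
// and print it as a zero-padded 40-digit decimal string.
function md5Id(input) {
  const hex = crypto.createHash("md5").update(input, "utf8").digest("hex");
  const asInteger = BigInt("0x" + hex); // at most 39 decimal digits
  return asInteger.toString().padStart(40, "0");
}

console.log(md5Id("v1.0.0"));
console.log(md5Id("v1.0.0") === md5Id("v1.0.0")); // same input, same id
```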
While UUID v4 is just used for random ID generation, UUID v5 is more like a hash for a given input string and namespace. It's perfect for what you describe.
As you already mentioned, you can use this npm package:
npm install uuid
And it's pretty easy to use.
import {v5 as uuidv5} from 'uuid';
// use a UUIDV4 as a unique namespace for your application.
// you can generate one here: https://www.uuidgenerator.net/version4
const UUIDV5_NAMESPACE = '...';
// Finally, provide the input string (for example 'v1.0.0') and the namespace to get your unique id.
const uniqueId = uuidv5(input, UUIDV5_NAMESPACE);

Syntax for a regex for numbers prefixed by a particular string [duplicate]

I really didn't want to ask this here as I'm sure this will get me downvoted, but I truly am stuck.
I have used multiple tools to test regex expressions but the syntax is very confusing.
What I have tried
I am going to be using javascript for this, I have the following string for example:
Product ID: 4381 - Fanta Berry cans 355ml x 24
This is a search result from an autocomplete dropdown, it will always have the format:
Product ID: product_id - Product Name
Now I need to get the product_id, the number between the : and the -
I have tried
/[\d]/g
But that simply selects all the numbers in the string.
I also tried:
[(:\b)-]
And that selects the : and - which are the characters between the number I want to get. But I can't seem to figure out the syntax to get the number between them. I feel like I'm very close but after hours of searching I can't seem to crack it, I know this isn't a place for people to do the work for you, but I assure you I have tried and am really at a loss! If anyone can tell me the little bit of syntax that's missing to get that number I would be very appreciative.
You can use a positive lookbehind:
(?<=(Product ID: ))(\d*)
Mind that this method is supported by most, but not all, browsers.
Try specifying the product id prefix:
/Product ID: (\d+)/
Here's full JavaScript code for you so that you don't get confused:
var str = "Product ID: 4381 - Fanta Berry cans 355ml x 24";
var ret = str.match(/Product ID: (\d+)/);
var product_id = ret[1];
console.log(product_id);
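One thing to watch for: match returns null when the pattern is not found, so ret[1] would throw on a string that does not follow the expected format. A guarded variant could look like this sketch, where the function name extractProductId and the stricter anchoring on the start of the string and the " - " separator are assumptions for illustration.

```javascript
// Returns the product id as a number, or null when the text
// does not follow the "Product ID: <id> - <name>" format.
function extractProductId(text) {
  const match = /^Product ID: (\d+) - /.exec(text);
  return match ? Number(match[1]) : null;
}

console.log(extractProductId("Product ID: 4381 - Fanta Berry cans 355ml x 24")); // 4381
console.log(extractProductId("Fanta Berry cans 355ml x 24")); // null
```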
You can use /\d+/ just fine:
const string = "Product ID: 4381 - Fanta Berry cans 355ml x 24";
const match = string.match( /\d+/ )[0];
console.log( match );
Just make sure to select the first element of the resulting array.
Also note that without the global flag, the regex only matches the first result.

JavaScript - Convert Long Number to String

"" + 237498237498273908472390847239084710298374901823749081237409273492374098273904872398471298374
> '2.3749823749827392e+92'
I calculate IDs in a beautiful and arcane way:
time = new Date().getTime()
pid = process.pid
host = 0; (host +=s.charCodeAt(0) for s in os.hostname())
counter = MIPS.unique_id()
"#{host}#{pid}#{time}#{counter}"
Unfortunately, somewhere along the way the IDs (for example 11207648813339434616800) get treated as numbers, which means they sometimes turn into 1.1207648813339434e+22.
UPDATE:
It seems to be a "bug/feature" of redis. never expected that.
# Bug with Big Numbers on zadd
redis = require 'redis'
r = redis.createClient()
r.zadd 'zset', '342490809809999998098', 'key', ->
r.zscore 'zset', 'key', (_, results) ->
console.log typeof results # string
console.log results # 3.4249080981000002e+20
JavaScript stores numbers as 8-byte doubles, which have 53 bits of precision. Your number is far beyond 53 bits, so you should use a big-number library, which can store large numbers exactly. Try javascript-bignum.
Your number gets converted to 2.3749823749827392e+92 before you concatenate the number with the string to convert it.
The only solution is to use a container format that accepts an arbitrary number of digits, which is either a string or an array.
Can you provide us with a few more details as to how you are obtaining this number?
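To illustrate the point about keeping the digits in a string, here is a small sketch in plain JavaScript. The sample parts (host, pid, time, counter) are made-up values, and the use of the built-in BigInt type is an assumption of this sketch; it is a newer language feature that the answers above do not mention.

```javascript
// A numeric literal this large is rounded to the nearest double.
const asNumber = 11207648813339434616800;
console.log(String(asNumber)); // exponential notation, low digits are lost

// Concatenating the parts as strings never goes through one big Number.
const host = 1120, pid = 7648, time = 1333943461680, counter = 0; // made-up sample parts
const idString = `${host}${pid}${time}${counter}`;
console.log(idString); // every digit preserved

// If arithmetic on the id is needed, BigInt holds large integers exactly.
const idBigInt = BigInt(idString) + 1n;
console.log(idBigInt.toString());
```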

Chunk a string every odd and even position

I know nothing about javascript.
Assuming the string "3005600008000", I need to find a way to multiply all the digits in the odd numbered positions by 2 and the digits in the even numbered positions by 1.
This pseudo code I wrote outputs (I think) TRUE for the odd numbers (i.e. "0"),
var camid;
var LN= camid.length;
var mychar = camid.charAt(LN%2);
var arr = new Array(camid);
for(var i=0; i<arr.length; i++) {
var value = arr[i]%2;
Alert(i =" "+value);
}
I am not sure this is right: I don't believe it's chunking/splitting the string at odd (And later even) positions.
How do I do that? Can you please provide some hints?
/=================================================/
My goal is to implement in a web page a validation routine for a smartcard id number.
The logic I am trying to implement is as follows:
· 1) Starting from the left, multiply all the digits in the odd numbered positions by 2 and the digits in the even numbered positions by 1.
· 2) If the result of a multiplication of a single digit by 2 results in a two-digit number (say "7 x 2 = 14"), add the digits of the result together to produce a new single-digit result ("1+4=5").
· 3) Add all single-digit results together.
· 4) The check digit is the amount you must add to this result in order to reach the next highest multiple of ten. For instance, if the sum in step #3 is 22, to reach the next highest multiple of 10 (which is 30) you must add 8 to 22. Thus the check digit is 8.
That is the whole idea. Google searches on smartcard id validation returned nothing and I am beginning to think this is overkill to do this in Javascript...
Any input welcome.
var camid = "3005600008000"; // example id from the question
var theArray = camid.split(''); // create an array entry for each digit in camid
var size = theArray.length, i, eachValue;
for(i = 0; i < size; i++) { // iterate over each digit
eachValue = parseInt(theArray[i], 10); // parse each character as a base-10 integer
if(!isNaN(eachValue)) {
// index 0 is position 1, so even indexes are the odd-numbered positions:
// multiply those digits by 2 and leave the others unchanged
alert((i % 2 === 0) ? eachValue * 2 : eachValue);
}
}
I discovered that what I am trying to do is called a Luhn validation.
I found an algorithm right here.
http://sites.google.com/site/abapexamples/javascript/luhn-validation
Thanks for taking the time to help me out. Much appreciated.
It looks like you might be building to a Luhn validation. If so, notice that you need to count odd/even from the RIGHT not the left of the string.
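For reference, the four steps listed in the question can be sketched as below. This follows the question's own description (counting positions from the left); the function name checkDigit is an assumption, as is the choice to return 0 when the sum is already a multiple of ten. For a standard Luhn check, count from the right as noted above.

```javascript
// Computes the check digit for a string of digits, following the
// steps in the question: double the digits in odd-numbered positions
// (counted from the left), reduce two-digit products to one digit, and sum.
function checkDigit(id) {
  if (!/^\d+$/.test(id)) throw new TypeError("id must contain digits only");
  let sum = 0;
  for (let i = 0; i < id.length; i++) {
    // Index 0 is position 1, so even indexes are the odd positions.
    let value = Number(id[i]) * (i % 2 === 0 ? 2 : 1);
    if (value > 9) value -= 9; // same as adding the two digits, e.g. 14 -> 1 + 4 = 5
    sum += value;
  }
  // Amount needed to reach the next multiple of ten.
  return (10 - (sum % 10)) % 10;
}

console.log(checkDigit("3005600008000")); // 8
```

For the sample id the single-digit results add up to 22, so the check digit is 8, which matches the worked example in step 4.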

Counting rounds in a tournament

I've written a huge page in JavaScript for a tournament I'm hosting on a game. I've gotten everything I really need worked out into arrays, but I want to add rounds. The whole script adjusts to tournament settings (for more in the future) and I'd like this to adjust itself as well. So, let's say the tournament settings are [game,teamsize,entrylimit]. The entrylimit will be the key to finding the solution, because that decides the rounds. It works in a tree system (or however it's called). Let's say the entrylimit is 8. That means the first round will consist of 4 matches, and the second will consist of 2. If the entrylimit were 16, then the first round would consist of 8 matches, the second would consist of 4, and the third would consist of 2. I want to find a way to stick this into my loop where matches are written, and use the entrylimit and match number to generate the round number. All I need is a formula that can use those two variables to get my desired result. Also I apologize for the excessive amount of detail.
If I understand the problem, here's an example of how the entrylimit can get the number of rounds and the number of matches in each round.
Calculations:
var entrylimit=16;
var amount_of_rounds = Math.log(entrylimit) / Math.log(2);
for(i=amount_of_rounds; i>0; i--)
{
s = 'Round '+(amount_of_rounds-i+1)+' of '+amount_of_rounds+' consist of '+Math.pow(2, i-1)+' matches';
alert(s);
}
Try this:
var entryInfo = [];
function populateEntryInfo(entryLimit)
{
entryInfo = [];
var i = entryLimit;
while(i>1)
{
i = i/2; // each round has half as many matches as the previous round had entries
entryInfo.push(i);
}
}
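Neither snippet directly gives the round number for a given match, which is what the question asks for. Here is one possible sketch, assuming matches are numbered sequentially from 1 across the whole bracket and that the entry limit is a power of two; the function name roundForMatch is hypothetical.

```javascript
// Returns the 1-based round that a match belongs to, given the
// sequential match number and the entry limit of the tournament.
function roundForMatch(matchNumber, entryLimit) {
  const isPowerOfTwo =
    Number.isInteger(entryLimit) && entryLimit >= 2 && (entryLimit & (entryLimit - 1)) === 0;
  if (!isPowerOfTwo) throw new RangeError("entryLimit must be a power of two");
  // A single-elimination bracket has entryLimit - 1 matches in total.
  if (!Number.isInteger(matchNumber) || matchNumber < 1 || matchNumber > entryLimit - 1) {
    throw new RangeError("matchNumber out of range");
  }
  let matchesInRound = entryLimit / 2; // the first round has half as many matches as entries
  let remaining = matchNumber;
  let round = 1;
  while (remaining > matchesInRound) {
    remaining -= matchesInRound; // skip past the matches of this round
    matchesInRound /= 2;         // the next round has half as many matches
    round++;
  }
  return round;
}

// With 8 entries: matches 1-4 are round 1, 5-6 are round 2, 7 is the final.
for (let m = 1; m <= 7; m++) {
  console.log("Match " + m + " is in round " + roundForMatch(m, 8));
}
```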
