Test for invalid text format in text area - javascript

A continuation of my previous question...
After testing the text format, if it is not the correct format I would like to figure out which pairs of hex values are incorrect (i.e. any pair that contains characters other than [0-9A-Fa-f]).
if( validFormat ) {
// do processing
}
else {
// find invalid hex value pairs
}
What is the most efficient way to obtain a list of the invalid hex pairs, so that I can report the errors and their associated hex pairs back to the user?
Edit for additional question
Also, how would I go about testing that there is no "double space" anywhere? That also counts as an invalid format, even though the hex pairs themselves may be valid.
Thanks!

The easiest is to find all values and scan for those that are not valid:
var isHexPair = /^[0-9a-f]{2}$/i;
var allPairs = myTextArea.value.split(/\s+/);
var notHex = [];
for (var i=allPairs.length;i--;){
if (!isHexPair.test(allPairs[i])){
notHex.push(allPairs[i]);
}
}
That regex says:
^ starting at the beginning of the string
[0-9a-f] find any character that is a digit or a-f
{2} find exactly two of them
$ making sure that we are now at the end of the string
i and make it case-insensitive (allow A-F as well as a-f)
With the above you can then do:
if (notHex.length){
// There is at least one invalid entry
}else{
// all is well
}
Edit: If you explicitly want to test that the string contains nothing but single-byte hex strings separated by a single space, the simplest test would just be:
if (/^([0-9a-f]{2} )+[0-9a-f]{2}$/i.test(myStr)){ /* valid! */ }
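Both parts of the question (listing the bad pairs and catching a double space) can be combined in one helper. The following is only a minimal sketch: `checkHexInput` and the shape of its return value are hypothetical names introduced for illustration, not part of the answer above.

```javascript
// Hypothetical helper (not from the original answer).
function checkHexInput(text) {
  var isHexPair = /^[0-9a-f]{2}$/i;
  // Split on any whitespace run so a double space does not create empty tokens.
  var tokens = text.trim().split(/\s+/);
  var invalidPairs = tokens.filter(function (token) {
    return !isHexPair.test(token);
  });
  return {
    invalidPairs: invalidPairs,
    // Two or more consecutive spaces count as a format error on their own.
    hasDoubleSpace: / {2,}/.test(text)
  };
}

var report = checkHexInput("0A ff  zz 1");
console.log(report.invalidPairs);   // ["zz", "1"]
console.log(report.hasDoubleSpace); // true
```

Keeping the two checks separate lets you report "bad pair" and "bad spacing" as different errors instead of a single pass/fail result.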

Take the value of the text area and store it in a variable. Since the pairs are space-separated, calling .split(" ") on it (which splits on each space character) gives you an array of hex pairs. Then iterate through the array, test each element against the regex from your last question, collect the invalid pairs in a new array, and show that to the user.

Related

how to find comma or other symbols in a string using javascript?

Let's assume I have a string and I want to convert its letters to numbers. The conversion only works for letters of the alphabet,
so if my string contains a comma or a dot, I would like to skip it.
I'm working word by word, so if the string is--
"Hi, let's play!"
it should be converted to--
"4510, 584578'52 69775246!"
how can I do it?
function hasNumber(gottenWord) {
return (/\d/.test(gottenWord));
}
const numberTrue = hasNumber(gottenWord);
I was able to search for numbers but not sure how to search for symbols.. and I even have some custom symbols to search.
If all you want is to avoid anything that's not in abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ then the easiest way is to change \d to [^a-zA-Z]. [] indicates a character set, and starting a set with ^ means "not these".
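As a rough illustration of the negated character set, the sketch below tests a word for non-letters and also splits a sentence into letter runs and everything else, so that only the letter runs need converting. The function names `hasNonLetter` and `splitLettersAndSymbols` are made up for this example.

```javascript
// Hypothetical helpers (names are not from the original post).

// True if the word contains anything other than a-z or A-Z.
function hasNonLetter(word) {
  return /[^a-zA-Z]/.test(word);
}

// Splits text into alternating runs of letters and non-letters.
function splitLettersAndSymbols(text) {
  return text.match(/[a-zA-Z]+|[^a-zA-Z]+/g) || [];
}

console.log(hasNonLetter("Hi,"));  // true
console.log(hasNonLetter("play")); // false
console.log(splitLettersAndSymbols("Hi, let's play!"));
// ["Hi", ", ", "let", "'", "s", " ", "play", "!"]
```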

How to get the correct element from a unicode string?

I want to get specific letters from a Unicode string by index. However, it doesn't work as expected.
Example:
var handwriting = `๐–†๐–‡๐–ˆ๐–‰๐–Š๐–‹๐–Œ๐–๐–Ž๐–๐–๐–‘๐–’๐–“๐–”๐–•๐––๐–—๐–˜๐–™๐–š๐–›๐–œ๐–๐–ž๐–Ÿ๐•ฌ๐•ญ๐•ฎ๐•ฏ๐•ฐ๐•ฑ๐•ฒ๐•ณ๐•ด๐•ต๐•ถ๐•ท๐•ธ๐•น๐•บ๐•ป๐•ผ๐•ฝ๐•พ๐•ฟ๐–€๐–๐–‚๐–ƒ๐–„๐–…1234567890`
var normal = `abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890`
console.log(normal[3]) // gives 'd' but
console.log(handwriting[3]) // gives '๏ฟฝ' instead of '๐–‰'
Also, length doesn't work as expected: normal.length gives the correct value of 62, but handwriting.length gives 114.
Indexing doesn't work as expected either. How can I access the elements of a Unicode string?
I tried this in Python and it works perfectly, but in JavaScript it does not.
I need the exact characters from the Unicode string, e.g. an expected output of 'd' and '𝖉' for index 3.
In JavaScript, a string is a sequence of 16-bit code units. Since these characters lie outside the Basic Multilingual Plane, each of them is represented by a pair of code units, also known as a surrogate pair.
Reference
The Unicode code point of 𝖆 is U+1D586, and 0x1D586 is greater than 0xFFFF (2^16 - 1). So 𝖆 is represented by a surrogate pair:
console.log("𝖆".length) // 2
console.log("𝖆" === "\uD835\uDD86") // true
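The arithmetic behind that surrogate pair can be worked through in a few lines. This is a sketch of the standard UTF-16 encoding rule for code points in the range 0x10000 to 0x10FFFF; the function name `toSurrogatePair` is hypothetical and it does no range checking.

```javascript
// Hypothetical helper: UTF-16 encoding of one supplementary code point.
// Only meaningful for 0x10000 <= codePoint <= 0x10FFFF.
function toSurrogatePair(codePoint) {
  var offset = codePoint - 0x10000;    // 20-bit value
  var high = 0xD800 + (offset >> 10);  // top 10 bits
  var low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits
  return [high, low];
}

var pair = toSurrogatePair(0x1D586);
console.log(pair.map(function (unit) { return unit.toString(16); }));
// ["d835", "dd86"]
console.log(String.fromCharCode(pair[0], pair[1]) === String.fromCodePoint(0x1D586));
// true
```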
One way is to create an array of characters using the spread syntax or Array.from() and then get the index you need
var handwriting = `𝖆𝖇𝖈𝖉𝖊𝖋𝖌𝖍𝖎𝖏𝖐𝖑𝖒𝖓𝖔𝖕𝖖𝖗𝖘𝖙𝖚𝖛𝖜𝖝𝖞𝖟𝕬𝕭𝕮𝕯𝕰𝕱𝕲𝕳𝕴𝕵𝕶𝕷𝕸𝕹𝕺𝕻𝕼𝕽𝕾𝕿𝖀𝖁𝖂𝖃𝖄𝖅1234567890`
console.log([...handwriting][3])
console.log(Array.from(handwriting)[3])
Characters outside the Basic Multilingual Plane are stored as two UTF-16 code units each, so it is normal for length to be larger than the number of visible characters.
To get the real length of such a string, you have to convert it to an array:
let charArray = [...handwriting]
console.log(charArray.length) //=62
Each item of the array is one character of your string.
charArray[3] will return the character '𝖉'.
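If this comes up often, the array conversion can be hidden behind two small helpers. The sketch below uses hypothetical names (`codePointLength`, `charAtCodePoint`) and writes the sample characters as `\u{...}` escapes so it does not depend on file encoding. Note that this splits by code point, not by user-perceived character, so combining marks and multi-code-point emoji sequences would still count as several items.

```javascript
// Hypothetical helpers built on Array.from, which iterates by code point.
function codePointLength(str) {
  return Array.from(str).length;
}

function charAtCodePoint(str, index) {
  return Array.from(str)[index];
}

// First four letters of the "handwriting" alphabet (U+1D586 to U+1D589).
var sample = "\u{1D586}\u{1D587}\u{1D588}\u{1D589}";

console.log(sample.length);                              // 8 (code units)
console.log(codePointLength(sample));                    // 4 (code points)
console.log(charAtCodePoint(sample, 3) === "\u{1D589}"); // true
console.log(sample.codePointAt(0).toString(16));         // "1d586"
```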

Regex to match only certain characters or strings and only one instance of each?

I feel like I know just enough about Regexes to get stuck. That said, I have an input field which will allow users to enter their currency symbol. I'm only wanting to allow said currency symbol and disallow anything else from being entered into that field. Some countries don't actually have a single symbol, but are just two or three characters, e.g., "Kr" for Krona. So the field has a max length of 3. Given it needs a max length of three to accommodate some currencies, I also don't want to allow three dollar signs to be entered, e.g., "$$$". I would only want to allow one single dollar, pound, euro, etc. sign.
Here's my basic code for allowing only these symbols in the input:
$('#id_currency_symbol').on('input',function (){
var value = $(this).val().toString();
var newvalue = value.replace(/[^$£€¥₣₩￦￥₽₺₹Rkr]+/g,'');
$(this).val(newvalue);
});
This works for only allowing these symbols/letters, but like I said above, I don't want to allow users to enter more than a single instance of some symbols, i.e. dollar sign ($). In addition, I want to match exact strings for cases where the "symbol" is actually just two or three characters. In the case of Krona, the "symbol" is Kr. Given the above, users could in theory enter "rK" and it would be perfectly valid according to the regex, but I would ONLY want to allow the exact match of "Kr." Is that possible?
Thanks
I would suggest forgetting regex and going for an O(1) lookup instead:
var allowedCurrencyCodes = {
"$":true,
"¢":true,
"£":true,
"INR":true,
"Kr":true,
.....,
.....,
.....
}
$(this).val(allowedCurrencyCodes[$(this).val()]?$(this).val():"");
You need to perform the check on the blur event, or once the user has entered at least 3 characters; otherwise it becomes buggy, because it will keep wiping the data right after the first character.
If you want the check to be real-time, i.e. responsive while the user is typing, then you need to change the structure of allowedCurrencyCodes and convert it to a nested object for multi-character currency codes. For example, $ and £ would stay exactly the same, but INR or Kr would be defined like
"I":{
"N":{
"R":true
}
},
"K":{
"r":true
}
A minor change in the lookup logic is then needed: capture the input, split it into an array of characters, and descend into allowedCurrencyCodes one input character at a time, like
allowedCurrencyCodes[inputChar[0]][inputChar[1]]
or
allowedCurrencyCodes[inputChar[0]][inputChar[1]][inputChar[2]]
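One way to make that nested lookup safe against missing keys is to walk the object character by character and report whether the text typed so far is a complete code, a valid prefix, or invalid. This is a sketch under assumptions: the table below is only an illustrative subset, and `currencyInputState` and its three return strings are hypothetical.

```javascript
// Illustrative subset only; the real table would hold every allowed code.
var allowedCurrencyCodes = {
  "$": true,
  "\u00A3": true, // pound sign
  "I": { "N": { "R": true } },
  "K": { "r": true }
};

// Hypothetical helper: classifies the text typed so far.
function currencyInputState(input) {
  var node = allowedCurrencyCodes;
  for (var i = 0; i < input.length; i++) {
    // A complete code cannot be extended any further.
    if (node === true) return "invalid";
    node = node[input.charAt(i)];
    if (node === undefined) return "invalid";
  }
  return node === true ? "complete" : "prefix";
}

console.log(currencyInputState("K"));  // "prefix"
console.log(currencyInputState("Kr")); // "complete"
console.log(currencyInputState("rK")); // "invalid"
console.log(currencyInputState("$$")); // "invalid"
```

In an input handler you would keep the value for "prefix" and "complete" and reject or trim it for "invalid", which avoids wiping the field after the first character of a multi-character code.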
You may find the first occurrence of a currency symbol or acronym using a regex and then replace the whole input with the matched string. Single character currencies can be listed in [...] and any longer string may be added by alternation:
var checkInput = function(input) {
var regex = /[$£€¥₣₩￦￥₽₺₹]|kr/i;
input = regex.exec(input);
return input == null ? "" : input[0];
}
console.log(checkInput("lkjahfkdshfjsdf Kr asdasda"));
console.log(checkInput("kr"));
console.log(checkInput("rk"));
console.log(checkInput("$$$"));
console.log(checkInput("₣₩￦"));
console.log(checkInput("ABC"));
For completeness:
The "Regex to match only certain characters or strings and only one instance of each":
^(?:[$£€¥₣₩￦￥₽₺₹]|kr)$
Demo: https://regex101.com/r/w9p9d9/1
Regex to strip off anything but "certain characters or strings", and these characters too if they appear more than once (for use within newvalue = value.replace(...,'$1');, so that only the first captured symbol is kept):
^(?=.*?([$£€¥₣₩￦￥₽₺₹]|kr)|).*
Demo: https://regex101.com/r/qocsv5/1
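For reference, here is how the two patterns might be used from JavaScript. This is a sketch with several assumptions: only a subset of the symbols is listed (written as `\u` escapes), the case-insensitive `i` flag is added so that "Kr" matches, the replacement string is assumed to be `'$1'`, and the helper names are hypothetical.

```javascript
// Subset of symbols: $, pound, euro, yen, rupee. The i flag is an assumption.
var exactCurrency = /^(?:[$\u00A3\u20AC\u00A5\u20B9]|kr)$/i;
var keepFirstCurrency = /^(?=.*?([$\u00A3\u20AC\u00A5\u20B9]|kr)|).*/i;

// True only if the whole value is exactly one allowed symbol or acronym.
function isCurrencySymbol(value) {
  return exactCurrency.test(value);
}

// Replaces the whole value with the first allowed symbol found, or "".
function keepFirstSymbol(value) {
  return value.replace(keepFirstCurrency, "$1");
}

console.log(isCurrencySymbol("Kr"));     // true
console.log(isCurrencySymbol("rK"));     // false
console.log(isCurrencySymbol("$$$"));    // false
console.log(keepFirstSymbol("$$$"));     // "$"
console.log(keepFirstSymbol("xxKr yy")); // "Kr"
console.log(keepFirstSymbol("ABC"));     // ""
```

Because of the `i` flag, "KR" and "kr" are accepted as well; drop the flag and spell the alternation as `Kr` if only that exact casing should pass.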

parsing key/value pairs from string

I'm parsing the body text from incoming emails, looking for key/value pairs.
Example Email Body
First Name: John
Last Name:Smith
Email : john#example.com
Comments = Just a test comment that
may span multiple lines.
I tried using a RegEx ([\w\d\s]+)\s?[=|:]\s?(.+) in multiline mode. This works for most emails, but fails when there's a line break that should be part of the value. I don't know enough about RegEx to go any further.
I have another parser that goes line-by-line looking for the key/value pairs and simply folds a line into the last matched value if a key/value pair is NOT found. It's implemented in Scala.
val lines = text.split("\\r?\\n").toList
var lastLabelled: Int = -1
val linesBuffer = mutable.ListBuffer[(String, String)]()
// only parse lines until the first blank line
// the null_? method checks for empty strings and nulls
lines.takeWhile(!_.null_?).foreach(line => {
line.splitAt(delimiter) match {
case Nil if line.nonEmpty => {
val l = linesBuffer(lastLabelled)
linesBuffer(lastLabelled) = (l._1, l._2 + "\n" + line)
}
case pair :: Nil => {
lastLabelled = linesBuffer.length
linesBuffer += pair
}
case _ => // skip this line
}
})
I'm trying to use RegEx so that I can save the parser to the db and change it on a per-sender basis at runtime (implement different parsers for different senders).
Can my RegEx be modified to match values that contain newlines?
Do I need to just forget about using RegEx and use some JavaScript? I already have a JavaScript parser that lets me store the JS in the DB and essentially do everything that I want to do with the RegEx parser.
I think this should work...
((.+?)((\s*)(:|=)(\s*)))(((.|\n)(?!((.+?)(:|=))))+)
...as tested here http://regexpal.com/. If you loop through the matches you should be able to pull out the key and value.
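Since the question mentions a JavaScript parser as an alternative, the line-folding approach from the Scala snippet could look roughly like this in JavaScript. This is a sketch, not the asker's parser: `parseKeyValues` is a hypothetical name, keys are assumed to consist of word characters and spaces, and a continuation line that itself looks like "words: text" would be misread as a new key.

```javascript
// Hypothetical line-by-line parser that folds unlabelled lines
// into the previous value.
function parseKeyValues(text) {
  var pairs = [];
  var lines = text.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var line = lines[i];
    // Only parse until the first blank line, as in the Scala version.
    if (line.trim() === "") break;
    // Key: word characters and spaces; delimiter: ":" or "=".
    var match = /^([\w ]+?)\s*[:=]\s*(.*)$/.exec(line);
    if (match) {
      pairs.push([match[1], match[2]]);
    } else if (pairs.length > 0) {
      // No key found: treat the line as a continuation of the last value.
      pairs[pairs.length - 1][1] += "\n" + line;
    }
  }
  return pairs;
}

var body = "First Name: John\nLast Name:Smith\n" +
  "Comments = Just a test comment that\nmay span multiple lines.";
console.log(parseKeyValues(body));
// [["First Name", "John"], ["Last Name", "Smith"],
//  ["Comments", "Just a test comment that\nmay span multiple lines."]]
```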

Using Regular Expressions with Javascript replace method

Friends,
I'm new to both Javascript and Regular Expressions and hope you can help!
Within a Javascript function I need to check to see if a comma(,) appears 1 or more times. If it does then there should be one or more numbers either side of it.
e.g.
1,000.00 is ok
1,000,00 is ok
,000.00 is not ok
1,,000.00 is not ok
If these conditions are met I want the comma to be removed so 1,000.00 becomes 1000.00
What I have tried so far is:
var x = '1,000.00';
var regex = new RegExp("[0-9]+,[0-9]+", "g");
var y = x.replace(regex,"");
alert(y);
When run the alert shows ".00" Which is not what I was expecting or want!
Thanks in advance for any help provided.
Edit
Thanks all for the input so far and the 3 answers given. Unfortunately I don't think I explained my question well enough.
What I am trying to achieve is:
If there is a comma in the text and there are one or more numbers either side of it then remove the comma but leave the rest of the string as is.
If there is a comma in the text and there is not at least one number either side of it then do nothing.
So using my examples from above:
1,000.00 becomes 1000.00
1,000,00 becomes 100000
,000.00 is left as ,000.00
1,,000.00 is left as 1,,000.00
Apologies for the confusion!
Your regex isn't going to be very flexible with higher orders than 1000 and it has a problem with inputs which don't have the comma. More problematically you're also matching and replacing the part of the data you're interested in!
Better to have a regex which matches the forms which are a problem and remove them.
The following matches (in order) commas at the beginning of the input, at the end of the input, preceded by a number of non digits, or followed by a number of non digits.
var y = x.replace(/^,|,$|[^0-9]+,|,[^0-9]+/g,'');
As an aside, all of this is much easier if you are able to use lookbehind, but older JS engines don't support it (it was only added to the language in ES2018).
Edit based on question update:
Ok, I won't attempt to understand why your rules are as they are, but the regex gets simpler to solve it:
var y = x.replace(/(\d),(\d)/g, '$1$2');
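One caveat with consuming the digit after the comma: when groups are a single digit wide (for example "1,2,3"), the second comma is skipped because its left-hand digit was already consumed by the previous match. A variant that uses a lookahead for the right-hand digit avoids this. The function name `removeDigitCommas` is made up for this sketch.

```javascript
// Hypothetical helper: removes a comma only when it sits between two digits.
// The lookahead (?=\d) checks the following digit without consuming it.
function removeDigitCommas(input) {
  return input.replace(/(\d),(?=\d)/g, "$1");
}

console.log(removeDigitCommas("1,000.00"));  // "1000.00"
console.log(removeDigitCommas("1,000,00"));  // "100000"
console.log(removeDigitCommas(",000.00"));   // ",000.00"
console.log(removeDigitCommas("1,,000.00")); // "1,,000.00"
console.log(removeDigitCommas("1,2,3"));     // "123"

// For comparison, the two-capture version leaves one comma behind:
console.log("1,2,3".replace(/(\d),(\d)/g, "$1$2")); // "12,3"
```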
I would use something like the following:
^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?$
[0-9]{1,3}: 1 to 3 digits
(,[0-9]{3})*: [Optional] More digit triplets separated by a comma
(\.[0-9]+)?: [Optional] Dot + more digits
If this regex matches, you know that your number is valid. Just replace all commas with the empty string afterwards.
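Put together, the validate-then-strip idea might look like the sketch below. It assumes the decimal part is meant to be optional (hence the trailing `?` on that group), and `stripGroupingCommas` is a hypothetical name. Note that this pattern is stricter than the rules in the question: it rejects "1,000,00" because the last group is not three digits, and it rejects ungrouped numbers of four or more digits such as "1234".

```javascript
// Assumes the decimal part is optional, as the description states.
var groupedNumber = /^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?$/;

// Hypothetical helper: returns the number without commas,
// or null if the input is not a well-formed grouped number.
function stripGroupingCommas(input) {
  if (!groupedNumber.test(input)) {
    return null;
  }
  return input.replace(/,/g, "");
}

console.log(stripGroupingCommas("1,000.00"));  // "1000.00"
console.log(stripGroupingCommas("12,345,678")); // "12345678"
console.log(stripGroupingCommas("1,000,00"));  // null
console.log(stripGroupingCommas("1,,000.00")); // null
```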
It seems to me you have three error conditions
",1000"
"1000,"
"1,,000"
If any one of these is true, then you should reject the field. If they are all false, then you can strip the commas in the normal way and move on. This can be a simple alternation:
^,|,,|,$
I would just remove anything except digits and the decimal separator ([^0-9.]) and send the output through parseFloat():
var y = parseFloat(x.replace(/[^0-9.]+/g, ""));
// invalid cases:
// - standalone comma at the beginning of the string
// - comma next to another comma
// - standalone comma at the end of the string
var i,
inputs = ['1,000.00', '1,000,00', ',000.00', '1,,000.00'],
invalid_cases = /(^,)|(,,)|(,$)/;
for (i = 0; i < inputs.length; i++) {
if (inputs[i].match(invalid_cases) === null) {
// wipe out everything but digits and the dot
inputs[i] = inputs[i].replace(/[^\d.]+/g, '');
}
}
console.log(inputs); // ["1000.00", "100000", ",000.00", "1,,000.00"]
