I was writing a simple JS program to get a list of character codes for all French phoneme symbols in the International Phonetic Alphabet, and I realised that characters like ɔ̃ are actually treated as two separate characters: ɔ and ~.
My code:
var s = "iuyaɑãoɔɔ̃eεɛ̃øœœ̃əfvszʃʒlrpbmtdnkgɲjwɲ"
for (let i = 0; i < s.length; i++) {
console.log(s.charCodeAt(i))
}
console.log(s.length)
The expected output is a list of character codes, one for each character in the string.
So is there any charset that has tilde accents in it?
It really doesn't seem like there is a way to get ɔ̃ as a single character without a library... One simple solution is to check the next character whenever there is a ɔ, and if it is a ~ (the combining tilde), then I know this means ɔ̃. Thanks everyone for the answers and comments!
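For what it's worth, here is a minimal sketch of that "look at the next character" idea, generalised with a regular expression. It assumes an engine that supports Unicode property escapes (the u flag with \p{M}, which matches combining marks). The helper names splitClusters and codePointsOf are made up for this example, and the sample string is written with escapes so the combining tilde (U+0303) is visible.

```javascript
// Split a string into "base character + any combining marks that follow it".
// \P{M} is one non-mark character, \p{M}* is zero or more combining marks.
function splitClusters(text) {
  return text.match(/\P{M}\p{M}*/gu) || [];
}

// List the code points that make up one cluster.
function codePointsOf(cluster) {
  return Array.from(cluster, function (ch) {
    return ch.codePointAt(0);
  });
}

// "a", "ɔ", "ɔ" + combining tilde, "e"
var sample = "a\u0254\u0254\u0303e";
var clusters = splitClusters(sample);

console.log(sample.length);            // 5 (UTF-16 code units)
console.log(clusters.length);          // 4 (visible characters)
console.log(codePointsOf(clusters[2])); // [ 596, 771 ]
```

This only groups a base character with the marks directly after it; it is not a full grapheme segmenter.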
I want to be clear that I'm not looking for solutions. I'm really trying to understand what is being done. That said, all pointers and recommendations are welcome. I am working through the freecodecamp.com task Check for Palindromes. Below is the description.
Return true if the given string is a palindrome. Otherwise, return
false.
A palindrome is a word or sentence that's spelled the same way both
forward and backward, ignoring punctuation, case, and spacing.
Note You'll need to remove all non-alphanumeric characters
(punctuation, spaces and symbols) and turn everything lower case in
order to check for palindromes.
We'll pass strings with varying formats, such as "racecar", "RaceCar",
and "race CAR" among others.
We'll also pass strings with special symbols, such as "2A3*3a2", "2A3
3a2", and "2_A3*3#A2".
This is the code I have right now. Again, I'm working through this and using Chrome dev tools to figure out what works and what doesn't.
function palindrome(str) {
// Good luck!
str = str.toLowerCase();
//str = str.replace(/\D\S/i);
str = str.replace(/\D\s/g, "");
for (var i = str.length -1; i >= 0; i--)
str += str[i];
}
palindrome("eye");
What I do not understand is why, when the code below is run in dev tools on "race car", the "e" is missing.
str = str.replace(/\D\s/g, "");
"raccar"
So my question is: what part of the regex am I misunderstanding? From my understanding, the regex should only be getting rid of spaces and integers.
/\D\s/g replaces any character that is not a digit, followed by a whitespace character, with "".
So, in race car, the regex matches "e " and replaces it with "", making the string raccar.
For a digit, you need to use \d (lowercase). I think using an OR would get you what you want. So, you may try something like /\d|\s/g to match a digit or a whitespace character.
Hope this helps in some way in your understanding!
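To see the difference in dev tools, a small sketch like the one below may help. The strip helper is just something made up for this comparison, and the last pattern, /[^a-z0-9]/g, is my own assumption about what "remove all non-alphanumeric characters" would look like; it is not part of the original code.

```javascript
// Lowercase the input, then remove everything the pattern matches.
function strip(input, pattern) {
  return input.toLowerCase().replace(pattern, "");
}

// A non-digit followed by whitespace: removes "e " as a pair.
console.log(strip("race CAR", /\D\s/g));       // "raccar"

// A digit OR a whitespace character: removes only the space here.
console.log(strip("race CAR", /\d|\s/g));      // "racecar"

// Same pattern on an input with symbols: digits go, symbols stay.
console.log(strip("2_A3*3#A2", /\d|\s/g));     // "_a*#a"

// Anything that is not a lowercase letter or digit (assumed alternative).
console.log(strip("2_A3*3#A2", /[^a-z0-9]/g)); // "2a33a2"
```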
I'm new to regex, and what I want to do is parse my input as explained below using JavaScript:
There are 3 types of inputs that I might get:
someEmail#domain.com;anotherEmail#domain.com;
and
some name<someEmail#domain.com>;another name<anotherEmail#domain.com>;
or it might be like
someEmail#domain.com;another name<anotherEmail#domain.com>;
What I'm trying to do is split the whole input on ; which will give me an array of entries, then check whether each of those array items:
has < and >, in which case retrieve the text between < and > as the value.
doesn't have < and >, in which case take the whole text as the value.
I'm already trying to learn regex. If anyone gives me the regex, I would appreciate if it comes with an explanation so I can understand and learn.
Cheers
Try something like this as a starter - avoid the complicated regex - it's not required if your inputs are in the form stated:
str = 'someEmail#domain.com;another name<anotherEmail#domain.com>;someEmail#domain.com;anotherEmail#domain.com;some name<someEmail#domain.com>;another name<anotherEmail#domain.com>;test#test.com';
var splits = str.split(';');
for (var i = 0; i < splits.length; i++) {
if (splits[i].indexOf('<') == -1) {
$('#output').append(splits[i] + '<br>');
} else {
var address = splits[i].match(/<(.*?)>/)[1];
$('#output').append(address + '<br>');
}
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id='output'></div>
Splitting is not a bad idea. You can even go further and avoid regular expressions altogether as mentioned in another answer, but since you specifically asked about them:
You could then check each entry in your array with a regular expression like
/^([^<]+)$/
which will match an entry that consists only of characters that are not <, and
/^.*<(.*)>$/
which will match anything of the second form your entries may have. (Once you have split on ; the entries no longer contain it, so the patterns anchor on the end of the entry instead.)
You can combine these into a single regular expression using |, but I suggest you simply test twice to avoid having to deal with too many capturing groups. You can even avoid the splitting part by using the global modifier, but again, it would make matters a lot more complicated, especially if you're new to regular expressions.
Please note that these examples will match a lot more than email addresses, but checking if they are actually valid is not easy. If you want to look into it, there are plenty of questions on SO about it.
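As a rough sketch of the split-then-test approach without jQuery, assuming the input always uses ; as the separator and at most one <...> pair per entry (the function name extractAddresses is made up for this example):

```javascript
// Split on ";" and return the address part of each non-empty entry.
function extractAddresses(input) {
  return input
    .split(";")
    .map(function (entry) { return entry.trim(); })
    .filter(function (entry) { return entry.length > 0; })
    .map(function (entry) {
      // If the entry ends with <...>, capture what is inside the brackets.
      var bracketed = entry.match(/<([^<>]*)>\s*$/);
      return bracketed ? bracketed[1].trim() : entry;
    });
}

var mixed = "someEmail#domain.com;another name<anotherEmail#domain.com>;";
console.log(extractAddresses(mixed));
// [ 'someEmail#domain.com', 'anotherEmail#domain.com' ]
```

Like the patterns above, this does not check that the extracted text is a valid address.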
I'm working on a pretty crude sanitizer for string input in Node(express):
I have glanced at some plugins and libraries, but it seems most of them are either too complex or too heavy. Therefore I decided to write a couple of simple sanitizer functions on my own.
One of them is this one, for hard-sanitizing most strings (not numbers...):
function toSafeString( str ){
str = str.replace(/[^a-öA-Ö0-9\s]+/g, '');
return str;
}
I'm from Sweden, therefore I need the åäö letters. And I have noticed that this regex also accepts other characters as well... for example á or é....
Question 1)
Is there some kind of list or similar where I can see WHICH characters are actually accepted by, say, this regex: /[^a-ö]+/g
Question 2)
I'm working in Node and Express... I'm thinking this simple function is going to stop attacks through input fields. Am I wrong?
Question 1: Find out. :)
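var accepted = [];
for (var i = 0; i <= 0xFFFF /* the Unicode BMP */; i++) {
    var s = String.fromCharCode(i);
    // test one character at a time against the range
    if (/[a-ö]/.test(s)) accepted.push(s);
}
// print the collected characters, not the last one tested
console.log(accepted.join(""));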
outputs
abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬®¯°±²³
´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö
on my system.
Question 2: What attacks are you looking to stop? Either way, the answer is "No, probably not".
Instead of mangling user data (I'm sure your, say, French or Japanese customers will have some beef with your validation), make sure to escape your data at the point where it goes into a view or leaves your application (HTML escaping, SQL parameter escaping, etc.).
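To illustrate the "escape on output" idea, here is a minimal HTML-escaping sketch. The function name escapeHtml is just for this example; in a real Express app you would more likely rely on your template engine's built-in escaping, and this sketch says nothing about SQL, where parameterized queries are the usual tool.

```javascript
// Replace the five characters that are special in HTML text and attributes.
// The ampersand must be handled first so existing entities are not
// double-processed in the wrong order.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(escapeHtml("<b>Åsa & Örjan</b>"));
// &lt;b&gt;Åsa &amp; Örjan&lt;/b&gt;
```

The user's text, including å, ä and ö, is stored untouched; only the characters that HTML treats specially are rewritten when rendering.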
[x-y] matches characters whose character codes lie between that of x and that of y, inclusive:
charsBetween = function(a, b) {
var a = a.charCodeAt(0), b = b.charCodeAt(0), r = "";
while(a <= b)
r += String.fromCharCode(a++);
return r
}
charsBetween("a", "ö")
> "abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö"
See character tables for the reference.
For your validation, you probably want something like this instead:
[^a-zA-Z0-9ÅÄÖåäö\s]
This matches ranges of latin letters and digits + individual characters from a list.
There are a lot of characters that we tend not to think about, such as Japanese or Russian letters, and many more.
So to take them into account we need to use Unicode ranges rather than ASCII ranges in regular expressions.
I came up with this regular expression, which covers almost all written letters of the whole Unicode table, plus a bit more, like numbers and a few other characters for punctuation (Chinese punctuation is already included in the Unicode ranges).
It is hard to cover everything, and these ranges probably include too many characters, including "exotic" ones (symbols):
/^[\u0040-\u1FE0\u2C00-\uFFC00-9 ',.?!]+$/i
So I was using it this way to test (the string must not be empty):
function validString(str) {
return str && typeof(str) == 'string' && /^[\u0040-\u1FE0\u2C00-\uFFC00-9 ',.?!]+$/i.test(str);
}
Bear in mind that this is missing characters like:
:*()&#'\-:%
and many others.
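As an alternative to hand-picked ranges, a sketch using Unicode property escapes could look like the following. It assumes an engine that supports the u flag with \p{L} (letters), \p{M} (combining marks) and \p{N} (numbers); the function name isPlainText and the exact punctuation list are choices made for this example only.

```javascript
// Accept non-empty strings made of letters, marks, numbers, spaces
// and a small set of punctuation characters.
function isPlainText(str) {
  return typeof str === "string" && /^[\p{L}\p{M}\p{N} ',.?!]+$/u.test(str);
}

console.log(isPlainText("Åsa Öberg"));     // true
console.log(isPlainText("Привет, мир!"));  // true
console.log(isPlainText("日本語"));          // true
console.log(isPlainText("<script>"));      // false
console.log(isPlainText(""));              // false
```

It still rejects the punctuation listed above, so the allowed set would need extending for real input.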
Can anyone tell me how to write a regex for the following scenario? The input should contain only numbers, - (hyphen) or , (comma). The input can be given as any of the following:
23
23,26
1-23
1-23,24
24,25-56,58-40,45
Also, when numbers are given as a range, the second number should be greater than the first one: 23-1 should not be allowed. If a number has already been entered, it should not be allowed again: 1-23,23 should not be allowed.
I'm not going to quibble with "I think" or "maybe" -- you can not do this with a Regex.
Matching against a regex can validate that the form of the input is correct, and can also be used to extract pieces of the input, but it can not do value comparisons, or duplicate elimination (except in limited well defined circumstances), or range checking.
What you have as input I interpret as a comma-separated list of values or ranges of values; in BNFish notation:
value :: number
range :: value '-' value
term :: value | range
list :: term [','term]*
A regex can be built that will match this to verify correct structure, but you'll have to do other validation for the value comparisons and to prevent the duplicate numbers.
The most straightforward regex I can think of (on short notice) is this:
([0-9]+|[0-9]+-[0-9]+)(, *([0-9]+|[0-9]+-[0-9]+))*
You have digits or digits-digits, optionally followed by comma[optional space](digits or digits-digits) - repeated zero or more times.
I tested this regex at http://www.fileformat.info/tool/regex.htm with the input 3,4-12,6,2,90-221
Of course you can replace the [0-9] with \d for regex dialects that allow it.
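A quick sketch of that format check in JavaScript, with ^ and $ anchors added (my addition) so the whole input has to match rather than just part of it. The variable name listPattern is made up for this example.

```javascript
// digits or digits-digits, then zero or more ", term" repetitions
var listPattern = /^(\d+|\d+-\d+)(, *(\d+|\d+-\d+))*$/;

console.log(listPattern.test("3,4-12,6,2,90-221")); // true
console.log(listPattern.test("23-1"));              // true: shape only, no value check
console.log(listPattern.test("1-,2"));              // false
console.log(listPattern.test("1,a"));               // false
```

Note that 23-1 passes: the regex only checks the shape, which is exactly why the value comparisons need separate code.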
var str = "24,25-56,24, 58- 40,a 45",
    trimmed = str.replace(/\s+/g, '');
// test for correct characters
if (trimmed.match(/[^,\-\d]/)) alert("Please use only digits and hyphens, separated by commas.");
// test for duplicates (after sorting, equal entries end up next to each other)
var split = trimmed.split(/-|,/);
split.sort();
for (var i = 0; i < split.length - 1; i++) {
    if (split[i + 1] == split[i]) alert("Please avoid duplicate numbers.");
}
// test for ascending range: compare both ends as numbers instead of using eval
split = trimmed.split(/,/);
for (var j = 0; j < split.length; j++) {
    var ends = split[j].split("-");
    if (ends.length == 2 && Number(ends[0]) >= Number(ends[1])) alert("Please use an ascending range.");
}
I don't think you will be able to do this with a RegEx. Especially not the part about set logic - number already used, valid sequential range.
My suggestion would be to have a regex verify the format (at the least: hyphens, numbers and commas). Then use the split method on commas and loop over the input to verify the set. Something like:
var number_ranges = numbers.split(',');
for (var i = 0; i < number_ranges.length; ++i) {
// verify number ranges in set
}
That logic is not exactly trivial.
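Here is one possible sketch of that loop. It assumes that a range a-b stands for every whole number from a to b, so a later value that falls inside an earlier range counts as a duplicate; that interpretation, the function name validateRanges and the shape of the result object are all assumptions made for this example.

```javascript
// Returns { ok: true, reason: "" } or { ok: false, reason: "..." }.
function validateRanges(input) {
  var text = input.replace(/\s+/g, "");
  // Step 1: format check only (numbers, hyphens, commas in the right places).
  if (!/^\d+(-\d+)?(,\d+(-\d+)?)*$/.test(text)) {
    return { ok: false, reason: "format" };
  }
  // Step 2: walk the terms, checking range order and remembering seen numbers.
  var seen = {};
  var terms = text.split(",");
  for (var i = 0; i < terms.length; i++) {
    var bounds = terms[i].split("-").map(Number);
    var start = bounds[0];
    var end = bounds.length === 2 ? bounds[1] : start;
    if (bounds.length === 2 && end <= start) {
      return { ok: false, reason: "descending range: " + terms[i] };
    }
    for (var n = start; n <= end; n++) {
      if (seen[n]) return { ok: false, reason: "duplicate number: " + n };
      seen[n] = true;
    }
  }
  return { ok: true, reason: "" };
}

console.log(validateRanges("1-23,24").ok);     // true
console.log(validateRanges("23-1").reason);    // "descending range: 23-1"
console.log(validateRanges("1-23,23").reason); // "duplicate number: 23"
```

Expanding every range number by number is fine for small inputs; for very large ranges you would want to compare intervals instead.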
I think with regular expressions it is better to take the time to learn them than to throw someone else's script into yours without knowing exactly what it is doing. There are excellent resources out there to help you.
Try these sites:
regular-expressions.info
w3schools.com
evolt.org
Those are the first three results from a Google search. All are good resources. Good luck. Remember to double check what your regex is actually matching by outputting it to the screen; don't assume you know (that has bitten me more than once).
Greetings JavaScript and regular expression gurus,
I want to return all matches in an input string that are 6-digit hexadecimal numbers with any amount of white space in between. For example, "333333 e1e1e1 f4f435" should return an array:
array[0] = 333333
array[1] = e1e1e1
array[2] = f4f435
Here is what I have, but it isn't quite right-- I'm not clear how to get the optional white space in there, and I'm only getting one match.
colorValuesArray = colorValues.match(/[0-9A-Fa-f]{6}/);
Thanks for your help,
-NorthK
Use the g flag to match globally:
/[0-9A-Fa-f]{6}/g
Another good enhancement would be adding word boundaries:
/\b[0-9A-Fa-f]{6}\b/g
If you like you could also set the i flag for case insensitive matching:
/\b[0-9A-F]{6}\b/gi
Alternatively to the answer above, a more direct approach might be:
/\p{Hex_Digit}{6}/ug
You can read more about Unicode Properties here.
It depends on the situation, but I usually want to make sure my code can't silently accept (and ignore, or misinterpret) incorrect input. So I would normally do something like this.
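// split on runs of whitespace (split() with no argument would return the whole string)
var arr = s.trim().split(/\s+/);
for (var i = 0; i < arr.length; i++) {
    // reject anything that is not exactly six hex digits
    if (!arr[i].match(/^[0-9A-Fa-f]{6}$/))
        throw new Error("unexpected junk in string: " + arr[i]);
    arr[i] = parseInt(arr[i], 16);
}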
try:
colorValues.match(/[0-9A-Fa-f]{6}/g);
Note the g flag to Globally match.
result = subject.match(/\b[0-9A-Fa-f]{6}\b/g);
gives you an array of all 6-digit hexadecimal numbers in the given string subject.
The \b word boundaries are necessary to avoid matching parts of longer hexadecimal numbers.
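A small sketch showing the effect of the g flag and the word boundaries together. The wrapper findHexColors is made up for this example, and the `|| []` is there because match returns null when nothing matches.

```javascript
// Return every standalone 6-digit hexadecimal number in the text.
function findHexColors(text) {
  return text.match(/\b[0-9a-f]{6}\b/gi) || [];
}

console.log(findHexColors("333333 e1e1e1   f4f435"));
// [ '333333', 'e1e1e1', 'f4f435' ]

// A 7-digit run is not matched, because there is no word boundary
// after its first six digits.
console.log(findHexColors("1234567 zzzzzz")); // []
```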
For people who are looking for a hex color with an optional alpha code, the following regex works:
/\b[0-9A-Fa-f]{6}(?:[0-9A-Fa-f]{2})?\b/g
It matches hex colors both with and without the two-digit alpha code.