file and directory regex - javascript

I am trying to create a regex for file and directory path validation.
I have implemented this, but it's failing one of the conditions: it should not allow multiple slashes together.
Also, no other special characters should be allowed.
var x = /^(\\|\/){1}([a-zA-Z0-9\s\-_\#\-\^!#$%&]*?(\\|\/)?)+(\.[a-z\/\/]+)?$/i
test 1 -> / (should pass)
test 2 -> /asdf (should pass)
test 3 -> /asdf/scd.csv (should pass)
test 4 -> //asdf (should fail, currently passing)
test 5 -> /asd/ads/c.csv/ (should pass)
test 6 -> asd/asfd/a (should fail)
Can anyone suggest how to solve this?

The path //asdf is valid on Linux, Unix, iOS, and Android, so your code already works. However, if it is important for some reason to reject that particular set of valid paths, simply substitute a plus sign for the asterisk after the [a-z...] character group. That will reject multiple path separators with no intervening characters.
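A quick way to see the effect of this change is to run the question's six test paths through both versions. This sketch assumes the only edit is turning the lazy `*?` after the character group into `+?`. Note that with this change test 1 (`/`) also stops passing, because every segment after the leading slash must now contain at least one character.

```javascript
// Regex from the question, unchanged
const original = /^(\\|\/){1}([a-zA-Z0-9\s\-_\#\-\^!#$%&]*?(\\|\/)?)+(\.[a-z\/\/]+)?$/i;
// Same regex with "*?" replaced by "+?": each segment needs at least one character
const withPlus = /^(\\|\/){1}([a-zA-Z0-9\s\-_\#\-\^!#$%&]+?(\\|\/)?)+(\.[a-z\/\/]+)?$/i;

const paths = ['/', '/asdf', '/asdf/scd.csv', '//asdf', '/asd/ads/c.csv/', 'asd/asfd/a'];
console.log(paths.map(p => original.test(p))); // [ true, true, true, true, true, false ]
console.log(paths.map(p => withPlus.test(p))); // [ false, true, true, false, true, false ]
```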
It is probably useful to comment on larger issues with the regex approach and details.
1) You can use [\\/] instead of (\\|\/); however, both will allow false positives on every combination of operating system and file system. (Those that require forward slashes should exclude backslashes as a separator, and vice versa.)
2) The character range [a-zA-Z0-9\s\-_\#\-\^!#$%&] in the question is not the permissible character range for directory path elements on any known combination of operating system and file system. For instance, a period is valid in directory names on most.
3) Permissible character ranges are not portable. (The most reliable way to test path validation is to touch the file name on the actual file system, meaning actually create an empty file and capture any indication that the creation failed.)
4) You don't want or need a question mark after your asterisk or after your second (\\|\/) group. They don't create a bug, but they waste either compilation or run time, and they obscure your regex's purpose.
5) You also need to repeat the character range just before the extension, or rearrange it as in the example below.
6) You don't need to add the A-Z range to the a-z range if you use the i flag at the end of the regex.
7) It appears from the list of desired results that relative paths are to be filtered out, but there is no explicit mention of that as a rule for the solution.
With hesitation, this code is provided to demonstrate a few of the above improvements.
// This code is not production worthy
// for reasons (1) through (3) given
// above and is provided only for the
// purpose of clarifying points made.
var re = /^([\\/][a-z0-9\s\-_\#\-\^!#$%&]*)+(\.[a-z][a-z0-9]+)?$/i
console.log(
[
'/',
'/asdf',
'/asdf/scd.csv',
'//asdf',
'/asd/ads/c.csv/',
'asd/asfd/a'
].map(RegExp.prototype.test, re))

Try using /^(\/|([\\/][\w\s#^!#$%&-]+)+(\.[a-z]+[\\/]?)?)$/i instead, which forces at least one character to match between each slash:
var regex = /^(\/|([\\/][\w\s#^!#$%&-]+)+(\.[a-z]+[\\/]?)?)$/i
console.log([
'/',
'/asdf',
'/asdf/scd.csv',
'//asdf',
'/asd/ads/c.csv/',
'asd/asfd/a'
].map(RegExp.prototype.test, regex))

((\/[\w\s\.#^!#$%&-]+)+\/?)|\/[\w\.\s#^!#$%&-]*
This was tested against your sample input,
BUT in np++ (Notepad++, i.e. a Perl-style regex flavor), because I have no experience with JavaScript.
Therefore, here is the same pattern in flavor-independent prose:
"(slash and character many times, followed by optional slash)
or
slash and zero or more characters".
Note1: I added explicit "." to allowed characters.
Note2: I assume your "\/" means, "explicit slash, not backslash".
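If you port this to JavaScript, the alternation needs anchors: `RegExp.prototype.test` succeeds on any matching substring, so the unanchored pattern accepts every one of the sample paths. A minimal sketch, assuming the only change is wrapping both alternatives in a non-capturing group between `^` and `$`:

```javascript
// np++ pattern from the answer, unchanged (no anchors)
const unanchored = /((\/[\w\s\.#^!#$%&-]+)+\/?)|\/[\w\.\s#^!#$%&-]*/;
// Same two alternatives inside (?: ) and anchored at both ends
const anchored = /^(?:(?:\/[\w\s\.#^!#$%&-]+)+\/?|\/[\w\.\s#^!#$%&-]*)$/;

const paths = ['/', '/asdf', '/asdf/scd.csv', '//asdf', '/asd/ads/c.csv/', 'asd/asfd/a'];
console.log(paths.map(p => unanchored.test(p))); // every entry is true: a matching substring is enough
console.log(paths.map(p => anchored.test(p)));   // [ true, true, true, false, true, false ]
```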

Related

JS regex to get domain name from an email [duplicate]

How can I extract only top-level and second-level domain from a URL using regex? I want to skip all lower level domains. Any ideas?
Here's my idea,
Match anything that isn't a dot, three times, from the end of the line using the $ anchor.
The last match from the end of the string should be optional to allow for .com.au or .co.nz type of domains.
Both the last and second last matches will only match 2-3 characters, so that it doesn't confuse it with a second-level domain name.
Regex:
[^.]*\.[^.]{2,3}(?:\.[^.]{2,3})?$
Demonstration:
Regex101 Example
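A small JavaScript check of this pattern (the helper name `registrableGuess` is purely illustrative). It also shows the limitation that a TLD longer than three characters, such as `.travel`, produces no match at all.

```javascript
// Pattern from the answer: a 2-3 character last label, optionally preceded by
// another 2-3 character label (for endings such as .com.au or .co.nz).
const domainRe = /[^.]*\.[^.]{2,3}(?:\.[^.]{2,3})?$/;

// Hypothetical helper: returns the matched tail of the host, or null.
function registrableGuess(host) {
  const m = host.match(domainRe);
  return m ? m[0] : null;
}

console.log(registrableGuess('www.google.com'));      // google.com
console.log(registrableGuess('www.example.com.au'));  // example.com.au
console.log(registrableGuess('mail.example.travel')); // null
```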
Updated 2019
This is an old question, and the challenge here is a lot more complicated as we start adding new vanity TLDs and more ccTLD second-level domains (e.g. .co.uk, .org.uk); so much so that a regular expression is almost guaranteed to return false positives or negatives.
The only way to reliably get the primary host is to call out to a service that knows about them, like the Public Suffix List.
There are several open-source libraries out there that you can use, like psl, or you can write your own.
Usage for psl is quite intuitive. From their docs:
var psl = require('psl');
// Parse domain without subdomain
var parsed = psl.parse('google.com');
console.log(parsed.tld); // 'com'
console.log(parsed.sld); // 'google'
console.log(parsed.domain); // 'google.com'
console.log(parsed.subdomain); // null
// Parse domain with subdomain
var parsed = psl.parse('www.google.com');
console.log(parsed.tld); // 'com'
console.log(parsed.sld); // 'google'
console.log(parsed.domain); // 'google.com'
console.log(parsed.subdomain); // 'www'
// Parse domain with nested subdomains
var parsed = psl.parse('a.b.c.d.foo.com');
console.log(parsed.tld); // 'com'
console.log(parsed.sld); // 'foo'
console.log(parsed.domain); // 'foo.com'
console.log(parsed.subdomain); // 'a.b.c.d'
Old answer
You could use this:
(\w+\.\w+)$
Without more details (a sample file, the language you're using), it's hard to discern exactly whether this will work.
Example: http://regex101.com/r/wD8eP2
Also, you can likely do that with some expression similar to,
^(?:https?:\/\/)(?:w{3}\.)?.*?([^.\r\n\/]+\.)([^.\r\n\/]+\.[^.\r\n\/]{2,6}(?:\.[^.\r\n\/]{2,6})?).*$
and add as many capturing groups as you need to capture the components of a URL.
Demo
If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also watch there how it matches against some sample inputs.
For anyone using JavaScript and wanting a simple way to extract the top and second level domains, I ended up doing this:
'example.aus.com'.match(/\.\w{2,3}\b/g).join('')
This matches anything with a period followed by two or three characters and then a word boundary.
Here are some example outputs:
'example.aus.com' // .aus.com
'example.austin.com' // .com (the six-letter "austin" fails \w{2,3}\b; the \w* version in the edit below gives .austin.com)
'example.aus.com/howdy' // .aus.com
'example.co.uk/howdy' // .co.uk
Some people might need something a bit cleverer, but this was enough for me with my particular dataset.
Edit
I've realised there are actually quite a few second-level domains which are longer than 3 characters (and allowed). So, again for simplicity, I just removed the character counting element of my regex:
'example.aus.com'.match(/\.\w*\b/g).join('')
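A sketch comparing the two versions in JavaScript. `String.prototype.match` with a global regex returns `null` rather than an empty array when nothing matches, so the hypothetical helper `joinDotParts` falls back to `[]` before calling `join`; calling `.join('')` directly on the result would throw for a host without any dot.

```javascript
// Hypothetical helper: joins every ".label" piece found by a global regex.
// match() returns null when there is no match, so fall back to an empty array.
function joinDotParts(host, re) {
  return (host.match(re) || []).join('');
}

const twoOrThree = /\.\w{2,3}\b/g; // first version from the answer
const anyLength = /\.\w*\b/g;      // edited version from the answer

console.log(joinDotParts('example.austin.com', twoOrThree)); // .com
console.log(joinDotParts('example.austin.com', anyLength));  // .austin.com
console.log(joinDotParts('example.co.uk/howdy', anyLength)); // .co.uk
console.log(joinDotParts('localhost', anyLength));           // (empty string)
```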
Since TLDs now include names with more than three characters, like .wang and .travel, here's a regex that handles these new TLDs:
([^.\s]+\.[^.\s]+)$
Strategy: starting at the end of the string, look for one or more characters that aren't periods or whitespace, followed by a single period, followed by one or more characters that aren't periods or whitespace.
http://regexr.com/3bmb3
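A quick check of this pattern in JavaScript (the helper `lastTwo` is illustrative). As the 2019 update above warns, it takes the last two labels literally, so a second-level ccTLD such as `.co.uk` comes back as `co.uk` rather than `bbc.co.uk`.

```javascript
// Pattern from the answer: the last two dot-separated labels at the end of the string.
const lastTwoLabels = /([^.\s]+\.[^.\s]+)$/;

// Hypothetical helper: returns capture group 1, or null when nothing matches.
function lastTwo(host) {
  const m = host.match(lastTwoLabels);
  return m ? m[1] : null;
}

console.log(lastTwo('sub.example.travel')); // example.travel
console.log(lastTwo('shop.example.wang'));  // example.wang
console.log(lastTwo('www.bbc.co.uk'));      // co.uk
```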
With capturing groups you can achieve some magic.
For example, consider the following javascript:
let hostname = 'test.something.else.be';
let domain = hostname.replace(/^.+\.([^\.]+\.[^\.]+)$/, '$1');
document.write(domain);
This results in the string 'else.be'. The regex matches the complete string, and the capturing group is mapped to $1, so the complete string 'test.something.else.be' is replaced with the contents of $1, which is 'else.be'.
The regex isn't pretty and can probably be made more dynamic with things like {3} for defining how many levels deep you want to look for subdomains, but this is just an illustration.
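The same replacement wrapped in a function that returns the result instead of writing it to the page (the name `secondLevel` is hypothetical). When the hostname has only two labels, the pattern does not match and `replace` returns the input unchanged.

```javascript
// ^.+\. consumes everything up to the dot before the last two labels;
// group 1 captures those last two labels and replaces the whole string.
function secondLevel(hostname) {
  return hostname.replace(/^.+\.([^\.]+\.[^\.]+)$/, '$1');
}

console.log(secondLevel('test.something.else.be')); // else.be
console.log(secondLevel('example.com'));            // example.com (no match, returned unchanged)
```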
If you want to match specific top-level domain names, you can write a regular expression like this:
[RegularExpression("^(https?:\\/\\/)?(([\\w]+)?\\.?(\\w+\\.((za|zappos|zara|zero|zip|zippo|zm|zone|zuerich|zw))))\\/?$", ErrorMessage = "Is not a valid fully-qualified URL.")]
You can add more domain names from this list:
https://www.icann.org/resources/pages/tlds-2012-02-25-en
The following regex matches a domain with root and tld extractions (named capture groups) from a url or domain string:
(?:\w+:\/{2})?(?<cs_domain>(?<cs_domain_sub>(?:[\w\-]+\.)*?)(?<cs_domain_root>[\w\-]+(?<cs_domain_tld>(?:\.\w{2})?(?:\.\w{2,3}|\.xn-+\w+|\.site|\.club))))\|
It's hard to say if it is perfect, but it works on all the test data sets that I have put it against including .club, .xn-1234, .co.uk, and other odd endings. And it does it in 5556 steps against 40k chars of logs, so the efficiency seems reasonable too.
If you need to be more specific:
/\.(?:nl|se|no|es|ru|fr|uk|ca|de|jp|au|us|ch|it|io|org|com|net|int|edu|mil|arpa)/
Based on http://www.seobythesea.com/2006/01/googles-most-popular-and-least-popular-top-level-domains/

Regex in Google Apps Script practical issue. Forms doesn't read regex as it should

I hope it's just something I'm not doing right.
I've been using a simple script to create a form from a spreadsheet. The script seems to be working fine. The output form is going to collect some inputs from third parties so I can analyze them in my consulting activity.
Creating the form was not a big deal; the structure is good to go. However, after getting the form creator script working, I started working on its validations, and that's where I'm stuck.
For text validations, I will need to use specific regexes. Many of the inputs my clients need to give me are going to be places' and/or people's names; therefore, I should only allow A-Z, single spaces, apostrophes and dashes.
My resulting regexes are:
//Regex allowing a **single name** with the first letter capitalized and the occasional use of "apostrophes" or "dashes".
const reg1stName = /^[A-Z]([a-z\'\-])+/
//Should allow (a single name/surname) like Paul, D'urso, Mac'arthur, Saint-Germaine etc.
//Regex allowing **composite names and places names** with the first letter capitalized and the occasional use of "apostrophes" or "dashes". It must avoid double spaces, however.
const regNamesPlaces = /^[^\s]([A-Z]|[a-z]|\b[\'\- ])+[^\s]$/
//This should allow (names/surnames/places' names) like Giulius Ceasar, Joanne D'arc, Cosimo de'Medici, Cosimo de Medici, Jean-jacques Rousseau, Firenze, Friuli Venezia-giulia, L'aquila etc.
Further in the script, these regexes are used as validation patterns for the form's text items, as appropriate for each case.
//Validation for single names
var val1stName = FormApp.createTextValidation()
.setHelpText("Only the person First Name Here! Use only (A-Z), a single apostrophe (') or a single dash (-).")
.requireTextMatchesPattern(reg1stName)
.build();
//Validation for composite names and places names
var valNamesPlaces = FormApp.createTextValidation()
.setHelpText("Careful with double spaces, ok? Use only (A-Z), a single apostrophe (') or a single dash (-).")
.requireTextMatchesPattern(regNamesPlaces)
.build();
Further on, I have a "for" loop that creates the form based on the spreadsheet's fields. Up to this point, things are working just fine.
for(var i=0;i<numberRows;i++){
var questionType = data[i][0];
if (questionType==''){
continue;
}
else if(questionType=='TEXTNamesPlaces'){
form.addTextItem()
.setTitle(data[i][1])
.setHelpText(data[i][2])
.setValidation(valNamesPlaces)
.setRequired(false);
}
else if(questionType=='TEXT1stName'){
form.addTextItem()
.setTitle(data[i][1])
.setHelpText(data[i][2])
.setValidation(val1stName)
.setRequired(false);
}
}
The problem is when I run the script and test the resulting form.
Both validation types get imported just fine (as can be seen in the form's edit mode), but when testing it in preview mode I get an error, as if the regex wasn't matching (sorry, the error message is in Portuguese; I forgot to translate it as I did with the code above):
A screenshot of the form in edit mode
A screenshot of the form in preview mode
However, if I manually remove the slashes ("/") around this regex, it starts working!
A screenshot of the form in edit mode, regex without slashes
A screenshot of the form in preview mode, regex without slashes
What am I doing wrong? I'm no professional dev, but in my understanding, it makes no sense to write a regex without slashes.
If this is just how Google Forms reads regexes, I still need all of this to be read by the Apps Script that creates the form, after all. If I even try to pass the regex without the slashes there, the script cannot read it.
const reg1stName = ^[A-Z]([a-z\'])+
const regNamesPlaces = ^[^\s]([A-Z]|[a-z]|\b[\'\- ])+[^\s]$
//Can't even be saved. Returns: SyntaxError: Unexpected token '^' (line 29, file "Code.gs")
Passing manually all the validations is not an option. Can anybody help me?
Thanks so much
This
/^[A-Z]([a-z\'\-])+/
will not work because your / delimiters end up being treated as literal characters in the pattern the form tries to match.
This
^[A-Z]([a-z\'\-])+
also will not work, because if the name is hyphenated, you will only match up to the hyphen. This will match the 'Some-' in 'Some-Name', for example. Also, perhaps you want a name like 'Saint John' to pass also?
I recommend the following :)
^[A-Z][a-z]*[-\.' ]?[A-Z]?[a-z]*
^ anchors to the start of the string
[A-Z] matches exactly 1 capital letter
[a-z]* matches zero or more lowercase letters (this enables you to match a name like D'Urso)
[-\.' ]? matches zero or one instance of - (hyphen), . (period), ' (apostrophe) or a single space (inside a character class the period is already literal, so the backslash before it is harmless but optional)
[A-Z]? matches zero or one capital letter (in case there's a second capital in the name, like D'Urso, St John, Saint-Germaine)
[a-z]* matches zero or more trailing lowercase letters
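A short JavaScript check of the pattern above, plus one possible way to hand it to Forms. Assumption: because `requireTextMatchesPattern` takes a pattern string, passing a RegExp object converts it to its `/.../` string form, slashes included; the RegExp `source` property gives the pattern without the delimiters, which is the form that worked when typed in manually. Note also that the pattern has no trailing `$`, so it only checks how the text begins.

```javascript
// Regex recommended in the answer (note: no trailing $ anchor)
const nameRe = /^[A-Z][a-z]*[-\.' ]?[A-Z]?[a-z]*/;

console.log(["D'Urso", 'Saint-Germaine', 'paul'].map(n => nameRe.test(n))); // [ true, true, false ]
console.log(nameRe.test('Paul123')); // true: only the beginning of the text is checked

// String(nameRe) keeps the slashes; .source drops them.
console.log(String(nameRe)); // /^[A-Z][a-z]*[-\.' ]?[A-Z]?[a-z]*/
console.log(nameRe.source);  // ^[A-Z][a-z]*[-\.' ]?[A-Z]?[a-z]*

// In Apps Script (assumption, not runnable outside it), the string form could be passed as:
// FormApp.createTextValidation().requireTextMatchesPattern(nameRe.source).build();
```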

How to check if string is a valid Figma link?

I'm building an app on Node.js that uses the Figma API, and I need to check whether the string passed by a user is a valid Figma link. I'm currently using this simple regex to check the string:
/^https\:\/\/www.figma.com\/.*/i
However, it matches all links from figma.com, even the home page, not only links to the files and prototypes. Here is an example Figma link that should match:
https://www.figma.com/file/OoYmkiTlusAzIjYwAgSbv8wy/Test-File?node-id=0%3A1
Also the match should be positive if this is a prototype link, with proto instead of file in the path.
Moreover, since I'm using the Figma API, it would be useful to extract necessary parts of the URL such as the file ID and node ID at the same time.
TL;DR
✅ Use this expression to capture the four most important groups (type, file ID, file name and URL parameters) and work from there.
/^(?:https:\/\/)?(?:www\.)?figma\.com\/(file|proto)\/([0-9a-zA-Z]{22,128})(?:\/?([^\?]+)?(.*))?$/
From the docs
This is the regex expression code provided by Figma on their developer documentation page about embeds:
/https://([w.-]+.)?figma.com/(file|proto)/([0-9a-zA-Z]{22,128})(?:/.*)?$/
🛑 However, it doesn't work in JS as the documentation is currently wrong and this expression has multiple issues:
Slashes and dots are not escaped with backslashes.
It doesn't match from the start of the string. I added the start-of-string anchor ^ after VLAZ pointed it out in the comments. This way we avoid matching strings that don't start with https, for example malicious.site/?link=https://figma.com/...
It will match not only the www. subdomain but any other number of w's (e.g. wwwww.), which is not great; it can be fixed by replacing the character class with a simpler expression. It is also an unnecessary capturing group, so I'll make it non-capturing.
It would be nice if the link matched even when it doesn't begin with https://, as some platforms (e.g. Twitter) strip this part for brevity, and if a person copies a link from there, it should still be valid.
After applying all the improvements, we are left with the following expression:
/^(?:https:\/\/)?(?:www\.)?figma\.com\/(file|proto)\/([0-9a-zA-Z]{22,128})(?:\/.*)?$/
There is also a dedicated NPM package that simply checks the URL against a similar pattern. However, it contains some of the flaws listed above, so I don't advise using it, especially for just one line of code.
Extracting parts of the URL
This expression is extremely useful with the Figma API, as it also extracts the necessary parts of the URL, such as the type of link (proto/file) and the file key. You can access them by index.
You can also add a piece of regex to match specific keys in the query such as node-id:
/^(?:https:\/\/)?(?:www\.)?figma\.com\/(file|proto)\/([0-9a-zA-Z]{22,128})(?:\/.*)?node-id=([^&]*)$/
Now you can use it in code and get all the parts of the URL separately:
var pattern = /^(?:https:\/\/)?(?:www\.)?figma\.com\/(file|proto)\/([0-9a-zA-Z]{22,128})(?:\/.*)?node-id=([^&]*)$/
var matched = 'https://www.figma.com/file/OoYmkiTlusAzIjYwAgSbv8wy/Test-File?node-id=0%3A1'.match(pattern)
console.log('url:', matched[0]) // whole matched string
console.log('type:', matched[1]) // group 1
console.log('file key:', matched[2]) // group 2
console.log('node id:', matched[3]) // group 3
Digging deeper
I spent some time recreating this expression almost from scratch so it would match as many Figma file/prototype URLs as possible without breaking things. Here are three similar versions of it that work for different cases.
✅ This version captures the URL parameters and the name of the file separately for easier processing. You can check it here. I added it at the beginning of the answer, because I think it's the cleanest and most useful solution.
/^(?:https:\/\/)?(?:www\.)?figma\.com\/(file|proto)\/([0-9a-zA-Z]{22,128})(?:\/?([^\?]+)?(.*))?$/
The groups in it are as follows:
Group 1: file/proto
Group 2: file key/id
Group 3: file name (optional)
Group 4: url parameters (optional)
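A minimal sketch of pulling those four groups into named fields (the function `parseFigmaUrl` and the field names are illustrative, not part of any Figma API). The leading `^` also means a string like `malicious.site/?link=https://figma.com/...` is rejected.

```javascript
// Expression from the top of this answer: type, file key, file name, URL parameters.
const figmaRe = /^(?:https:\/\/)?(?:www\.)?figma\.com\/(file|proto)\/([0-9a-zA-Z]{22,128})(?:\/?([^\?]+)?(.*))?$/;

// Hypothetical helper: returns the groups as named fields, or null if the string does not match.
function parseFigmaUrl(url) {
  const m = url.match(figmaRe);
  if (!m) return null;
  const [, type, fileKey, fileName, params] = m;
  return { type, fileKey, fileName, params };
}

console.log(parseFigmaUrl('https://www.figma.com/file/OoYmkiTlusAzIjYwAgSbv8wy/Test-File?node-id=0%3A1'));
// { type: 'file', fileKey: 'OoYmkiTlusAzIjYwAgSbv8wy', fileName: 'Test-File', params: '?node-id=0%3A1' }
console.log(parseFigmaUrl('https://www.figma.com/'));                                               // null
console.log(parseFigmaUrl('malicious.site/?link=https://figma.com/file/OoYmkiTlusAzIjYwAgSbv8wy')); // null
```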
✅ Next up, I wanted to do the same but separate out the /duplicate part that can be added at the end of any Figma URL to create a duplicate of the file upon opening.
/^(?:https:\/\/)?(?:www\.)?figma\.com\/(file|proto)\/([0-9a-zA-Z]{22,128})(?:\/?([^\?]+)?([^\/]*)(\/duplicate)?)?$/
✅ And back to the node-id parameter. The following regex finds and captures multiple URLs inside a multiline string. The only downside I found is that it (like all the previous ones) doesn't check whether the URL contains unencoded special characters, which can potentially break things; this can be avoided by manually encoding all parameters with the encodeURI() function.
/^(?:https:\/\/)?(?:www\.)?figma\.com\/(file|proto)\/([0-9a-zA-Z]{22,128})(?:\/([^\?\n\r\/]+)?((?:\?[^\/]*?node-id=([^&\n\r\/]+))?[^\/]*?)(\/duplicate)?)?$/gm
There are six groups that can be captured by this expression:
Group 1: file/proto
Group 2: file key/id
Group 3: file name (optional)
Group 4: url parameters (optional)
Group 5: node-id (optional; only present when group 4 is present)
Group 6: /duplicate
And, finally, here is the example of a match and its groups (or try it yourself):
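This JavaScript sketch runs the multiline expression over a small sample text with `String.prototype.matchAll`, which requires the `g` flag the pattern already has. The second file key and the `/duplicate` URL are made up for illustration.

```javascript
// Multiline expression from the answer (g and m flags)
const figmaMultiRe = /^(?:https:\/\/)?(?:www\.)?figma\.com\/(file|proto)\/([0-9a-zA-Z]{22,128})(?:\/([^\?\n\r\/]+)?((?:\?[^\/]*?node-id=([^&\n\r\/]+))?[^\/]*?)(\/duplicate)?)?$/gm;

// Made-up sample text: one URL per line, with a non-URL line in between
const text = [
  'https://www.figma.com/file/OoYmkiTlusAzIjYwAgSbv8wy/Test-File?node-id=0%3A1',
  'not a figma link',
  'https://www.figma.com/file/AbCdEfGhIjKlMnOpQrStUvWx/Name/duplicate',
].join('\n');

// Each match array holds the whole line plus groups 1-6 (undefined when a group did not take part)
for (const m of text.matchAll(figmaMultiRe)) {
  const [url, type, fileKey, fileName, params, nodeId, duplicate] = m;
  console.log({ url, type, fileKey, fileName, params, nodeId, duplicate });
}
```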

Email Validation RegEx username/local name length check not running

I've debugged for a few hours now and have hit a wall; regex has never been my strong suit. I have been able to alter the following regex to restrict the domain to 255 characters just fine; however, in trying to restrict the local/username portion of an email address, I'm running into issues implementing a 64-character limit. I've gone through regex101 replacing +s and *s and attempting to understand what each pass is doing. However, even when I add a check against all non-whitespace characters with a limit of 64, it seems like the other checks pass and take precedence, although I'm not sure. Below is my current regex, without any of the 64-character checks that I've broken it with:
var emailCheck = new RegExp(/^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.{0,1}([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]){1,255}([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]){1,255}([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.*$/i);
What I have so far can be seen at http://jsfiddle.net/mtqx0tz1/ , I've made other slight alterations (e.g. not allowing consecutive dots) but for the most part this regex comes from another stack post without the character limits.
Lastly, I'm aware this isn't the 'standard', so to speak, and emails are checked server-side; however, I would like to be safe rather than sorry, as well as work on my regex skills. Sorry if this question isn't worthy of an actual post; I'm simply not seeing where in my passes {1,64} is failing. At this point I'm thinking about just substringing the portion of the string up to the @ sign and checking its length that way, but it would be nice to include it in this statement, since all the checks are done here to begin with.
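A minimal sketch of the substring idea from the question, kept outside the big regex (the helper name `localPartLengthOk` is hypothetical). It uses the last `@`, since a quoted local part may itself contain `@`, and relies on the fact that the index of that `@` equals the local part's length.

```javascript
// Hypothetical helper: true when the local part (text before the last "@")
// is between 1 and max characters long.
function localPartLengthOk(email, max = 64) {
  const at = email.lastIndexOf('@');
  if (at < 1) return false; // no "@" at all, or nothing before it
  return at <= max;         // the index of "@" equals the local part's length
}

console.log(localPartLengthOk('a'.repeat(64) + '@example.com')); // true
console.log(localPartLengthOk('a'.repeat(65) + '@example.com')); // false
console.log(localPartLengthOk('@example.com'));                 // false
```

It would then run alongside the existing check, e.g. `emailCheck.test(value) && localPartLengthOk(value)`.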
I have used this regex validation and it works well.
The e-mail address is in the variable strIn:
using System;
using System.Text.RegularExpressions;
static bool IsValidEmail(string strIn)
{
try
{
return Regex.IsMatch(strIn,
@"^(?("")("".+?(?<!\\)""@)|(([0-9a-z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-z])@))" +
@"(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9]))$",
RegexOptions.IgnoreCase, TimeSpan.FromMilliseconds(250));
}
catch (RegexMatchTimeoutException)
{
return false;
}
}

Phone number validation - excluding non repeating separators

I have the following regex for phone number validation
function validatePhonenumber(phoneNum) {
var regex = /^[1-9]{3}[-\s\.]{0,1}[0-9]{3}[-\s\.]{0,1}[0-9]{4}$/;
return regex.test(phoneNum);
}
However, I would like to make sure it doesn't pass when different separators are used, such as in
111-222.3333
Any ideas how to make sure the separators are the same always?
Just make sure beforehand that there is at most one kind of separator, then pass the string through the regex as you were doing.
function validatePhonenumber(phoneNum) {
var separators = extractSeparators(phoneNum);
if(separators.length > 1) return false;
var regex = /^[1-9]{3}[-\s\.]{0,1}[0-9]{3}[-\s\.]{0,1}[0-9]{4}$/;
return regex.test(phoneNum);
}
function extractSeparators(str){
// Return an array with all the distinct chars
// that are present in the passed string
// and are not numeric (0-9)
return Array.from(new Set(str.replace(/[0-9]/g, '')));
}
You can use the following regex instead:
\d{3}([-\s\.])?\d{3}\1?\d{4}
Here is a working example:
http://regex101.com/r/nN9nT7/1
As a result, it gives the following:
111-222-3333 --> ok
111.222.3333 --> ok
111 222 3333 --> ok
111-222.3333 --> no match
111.222-3333 --> no match
111-222 3333 --> no match
111 222-3333 --> no match
EDIT: after Alan Moore's suggestion:
Also matches 111-2223333. That's because you made the \1 optional,
which isn't necessary. One of JavaScript's stranger quirks is that a
backreference to a group that did not participate in the match
succeeds anyway. So if there's no first separator, ([-\s.])? succeeds
because the ? made it optional, and \1 succeeds because it's
JavaScript. But I would have used ([-\s.]?) to capture the first
separator (which might be nothing), and \1 to match the same thing
again. This works in any flavor, including JavaScript.
We can improve the regex to:
^\d{3}([-\s\.]?)\d{3}\1\d{4}$
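A quick check of the improved pattern against examples from this answer. Because the separator group can capture an empty string, a number with no separators at all still passes, while mixed or half-present separators fail.

```javascript
// Improved pattern: capture the first separator (possibly empty) and require the same one again
const phoneRe = /^\d{3}([-\s\.]?)\d{3}\1\d{4}$/;

const samples = ['111-222-3333', '111.222.3333', '111 222 3333', '1112223333', '111-222.3333', '111-2223333'];
console.log(samples.map(s => phoneRe.test(s))); // [ true, true, true, true, false, false ]
```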
You'll need at least two passes to keep this maintainable and extensible.
JS RegEx doesn't let you create named variables (named capture groups) for use later in the RegEx if you want to support older browsers.
If you are only supporting modern browsers, Fede's answer is just fine...
As such, with older-browser support, you aren't going to be able to reliably check that one separator is the same value every time, without writing a really, really, really, stupidly-long RegEx, using | to basically write out the RegEx 3 times.
A better way might be to grab all of the separators, and use a reduction or a filter to check that they all have the same value.
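// Assumption: numRegEx is the regex from the question
var numRegEx = /^[1-9]{3}[-\s\.]{0,1}[0-9]{3}[-\s\.]{0,1}[0-9]{4}$/;
var userEnteredNumber = "999.231 3055";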
var validNumber = numRegEx.test(userEnteredNumber);
var separators = userEnteredNumber.replace(/\d+/g, "").split("");
var firstSeparator = separators[0];
var uniformSeparators = separators.every(function (separator) { return separator === firstSeparator; });
if (!uniformSeparators) { /* also not valid */ }
You could make that a little neater, using closures and some applied functions, but that's the idea.
Alternatively, here's the big, ugly RegEx that would allow you to test exactly what the user entered.
var separatorTest = /^(?:([0-9]{3}\.[0-9]{3}\.[0-9]{3,4})|([0-9]{3}-[0-9]{3}-[0-9]{3,4})|([0-9]{3} [0-9]{3} [0-9]{3,4})|([0-9]{9,10}))$/;
Notice I had to include the exact same number-test three times, wrap each one in parens (to be treated as a single group), and then separate each group with an | to check each group, like an if, else if, else... ...and then plug in a separate special case for having no separator at all... The whole alternation also needs an outer (?: ) group; otherwise ^ would only apply to the first branch and $ only to the last.
...not pretty.
I'm also not using \d, just because it's easy to forget exactly what it covers when trying to maintain one of these abominations (for the record, in JavaScript \d matches only 0-9, so - and . are not accepted as "digits").
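Why the outer group matters: `|` has the lowest precedence in a regex, so without a group around the whole alternation, `^` attaches only to the first branch and `$` only to the last. A small JavaScript comparison (both variable names are illustrative):

```javascript
// Without an outer group, ^ applies only to the first branch and $ only to the last
const ungrouped = /^([0-9]{3}\.[0-9]{3}\.[0-9]{3,4})|([0-9]{3}-[0-9]{3}-[0-9]{3,4})|([0-9]{3} [0-9]{3} [0-9]{3,4})|([0-9]{9,10})$/;
// A non-capturing group around the alternation anchors every branch
const grouped = /^(?:[0-9]{3}\.[0-9]{3}\.[0-9]{3,4}|[0-9]{3}-[0-9]{3}-[0-9]{3,4}|[0-9]{3} [0-9]{3} [0-9]{3,4}|[0-9]{9,10})$/;

const input = 'call 111-222-3333 now';
console.log(ungrouped.test(input)); // true: the hyphen branch matches a substring
console.log(grouped.test(input));   // false
console.log(grouped.test('111-222-3333'), grouped.test('111.222-3333')); // true false
```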
Now, a word or two of warning:
People are liable to enter all kinds of crap; if this is for a commercial site, it's likely better to just strip separators entirely and validate the number is the right size, and conforms to some specifics (eg: doesn't start with /^555555/).
If not given any instruction about number format, people will happily use either no separator or a formal number, like (555) 555-5555 (or +1 (555) 555-5555 for the really pedantic), which is obviously going to fail hard, in this system (see point #1).
Be prepared to trim what you get, before validating.
Depending on your country/region/etc laws about data-security and consumer-vs-transaction record-keeping (again, may or may not be more important in a commercial setting), it's likely better to store both a "user-given" ugly number, and a system-usable number, which you either clean on the back-end, or submit along with the user-entered text.
From a user-interaction perspective, either forcing the number to conform, explicitly (placeholders showing them xxx-xxx-xxxx right above the input, in bold), or accepting any text, and prepping it yourself, is going to be 1000x better than accepting certain forms, but not bothering to tell the user up-front, and instead telling them what they did was wrong, after they try.
It's not cool for relationships; it's equally not cool, here.
You've got 9-digit and 10-digit numbers, so if you're trying for an international solution, be prepared to deal with all international separators (, \.\-\(\)\+) etc... again, why stripping is more useful, because THAT RegEx would be insane.
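A minimal sketch of the "strip first, then validate" advice above. Assumptions: only 10-digit numbers are valid, and the `/^555555/` check stands in for whatever business rules actually apply; the raw input is returned alongside the cleaned digits so both can be stored. The name `normalizePhone` is hypothetical.

```javascript
// Hypothetical helper: keep the raw text, strip every non-digit, then validate.
// Assumptions: only 10-digit numbers are accepted, and the 555555 prefix rule
// stands in for whatever business rules actually apply.
function normalizePhone(raw) {
  const digits = raw.replace(/\D/g, ''); // removes spaces, dots, dashes, parens, "+"
  const valid = digits.length === 10 && !/^555555/.test(digits);
  return { raw, digits: valid ? digits : null };
}

console.log(normalizePhone('(212) 555-0100'));    // { raw: '(212) 555-0100', digits: '2125550100' }
console.log(normalizePhone('555.555.1234'));      // { raw: '555.555.1234', digits: null }
console.log(normalizePhone('+1 (212) 555-0100')); // { raw: '+1 (212) 555-0100', digits: null }
```

Under these assumptions the last input strips to 11 digits and is rejected; accepting a leading country code would be a further rule.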
