JavaScript regular expression to catch kanji - javascript

I can't get this javascript function to work the way I want...
// matches a String that contains kanji and/or kana character(s)
String.prototype.isKanjiKana = function(){
return !!this.match(/^[\u4E00-\u9FAF|\u3040-\u3096|\u30A1-\u30FA|\uFF66-\uFF9D|\u31F0-\u31FF]+$/);
}
it does return TRUE if the string is made of kanji and/or kana characters, FALSE if alphabet or other chars are present.
I would like it to return if at least 1 kanji and/or kana characters are present instead that if all of them are.
thank you in advance for any help!

The right answer is not to hardcode ranges. Never ever put magic numbers in your code! That is a maintenance nightmare. It is hard to read, hard to write, hard to debug, hard to maintain. How do you know you got the numbers right? What happens when they add new ones? No, do not use magic numbers. Please.
The right answer is to use named Unicode scripts, which are a fundemental aspect of every Unicode code point:
[\p{Han}\p{Hiragana}\p{Katakana}]
That requires the XRegExp plugin for Javascript.
The real problem is that Javascript regexes on their own are too primitive to support Unicode properties — and therefore, to support Unicode. Maybe that was once an acceptable compromise 15 years ago, but today it is nothing less than intolerably negligent, as you yourself have discovered.
You will also miss a few Common code points specified as kana in the new Script Extensions property, but probably no matter. You could just add \p{Common} to the set above.

Now that Unicode property escapes are part of the ES (2018) spec, the following regex can be used natively if the JS engine supports this feature (expanding on #tchrist's answer):
/[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]/u
If you want to exclude punctuation from being matched:
/(?!\p{Punctuation})[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]/u

/[\u3000-\u303f]|[\u3040-\u309f]|[\u30a0-\u30ff]|[\uff00-\uffef]|[\u4e00-\u9faf]|[\u3400-\u4dbf]/
Japanese style punctuation: [\u3000-\u303f]
Hiragana: [\u3040-\u309f]
Katakana: [\u30a0-\u30ff]
Roman characters + half-width katakana: [\uff00-\uffef]
Kanji: [\u4e00-\u9faf]|[\u3400-\u4dbf]

String.prototype.isKanjiKana = function(){
return !!this.match(/[\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF]/);
}
Don't anchor it to beginning and end of string with $^ and the + is useless in this case.

/[\u4E00-\u9FAF|\u3040-\u3096|\u30A1-\u30FA|\uFF66-\uFF9D|\u31F0-\u31FF]/

Why not just this? It will return true when it contains at least one Kanji.
/[一-龯]/.test(str)

Related

How do i allow only one (dash or dot or underscore) in a user form input using regular expression in javascript?

I'm trying to implement a username form validation in javascript where the username
can't start with numbers
can't have whitespaces
can't have any symbols but only One dot or One underscore or One dash
example of a valid username: the_user-one.123
example of invalid username: 1----- user
i've been trying to implement this for awhile but i couldn't figure out how to have only one of each allowed symbol:-
const usernameValidation = /(?=^[\w.-]+$)^\D/g
console.log(usernameValidation.test('1username')) //false
console.log(usernameValidation.test('username-One')) //true
How about using a negative lookahead at the start:
^(?!\d|.*?([_.-]).*\1)[\w.-]+$
This will check if the string
neither starts with digit
nor contains two [_.-] by use of capture and backreference
See this demo at regex101 (more explanation on the right side)
Preface: Due to my severe carelessness, I assumed the context was usage of the HTML pattern attribute instead of JavaScript input validation. I leave this answer here for posterity in case anyone really wants to do this with regex.
Although regex does have functionality to represent a pattern occuring consecutively within a certain number of times (via {<lower-bound>,<upper-bound>}), I'm not aware of regex having "elegant" functionality to enforce a set of patterns each occuring within a range of number of times but in any order and with other patterns possibly in between.
Some workarounds I can think of:
Make a regex that allows for one of each permutation of ordering of special characters (note: newlines added for readability):
^(?:
(?:(?:(?:[A-Za-z][A-Za-z0-9]*\.?)|\.)[A-Za-z0-9]*-?[A-Za-z0-9]*_?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*\.?)|\.)[A-Za-z0-9]*_?[A-Za-z0-9]*-?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*-?)|-)[A-Za-z0-9]*\.?[A-Za-z0-9]*_?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*-?)|-)[A-Za-z0-9]*_?[A-Za-z0-9]*\.?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*_?)|_)[A-Za-z0-9]*\.?[A-Za-z0-9]*-?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*_?)|_)[A-Za-z0-9]*-?[A-Za-z0-9]*\.?)
)[A-Za-z0-9]*$
Note that the above regex can be simplified if you don't want usernames to start with special characters either.
Friendly reminder to also make sure you use the HTML attributes to enforce a minimum and maximum input character length where appropriate.
If you feel that regex isn't well suited to your use-case, know that you can do custom validation logic using javascript, which gives you much more control and can be much more readable compared to regex, but may require more lines of code to implement. Seeing the regex above, I would personally seriously consider the custom javascript route.
Note: I find https://regex101.com/ very helpful in learning, writing, and testing regex. Make sure to set the "flavour" to "JavaScript" in your case.
I have to admit that Bobble bubble's solution is the better fit. Here ia a comparison of the different cases:
console.log("Comparison between mine and Bobble Bubble's solution:\n\nusername mine,BobbleBubble");
["valid-usrId1","1nvalidUsrId","An0therVal1d-One","inva-lid.userId","anot-her.one","test.-case"].forEach(u=>console.log(u.padEnd(20," "),chck(u)));
function chck(s){
return [!!s.match(/^[a-zA-Z][a-zA-Z0-9._-]*$/) && ( s.match(/[._-]/g) || []).length<2, // mine
!!s.match(/^(?!\d|.*?([_.-]).*\1)[\w.-]+$/)].join(","); // Bobble bulle
}
The differences can be seen in the last three test cases.

Detect Russian / cyrillic in Javascript string?

I'm trying to detect if a string contains Russian (cyrillic) characters or not. I'm using this code:
term.match(/[\wа-я]+/ig);
but it doesn't work – or in fact it just returns the string back as it is.
Can somebody help with the right code?
Thanks!
Use pattern /[\u0400-\u04FF]/ to cover more cyrillic characters:
// http://jrgraphix.net/r/Unicode/0400-04FF
const cyrillicPattern = /^[\u0400-\u04FF]+$/;
console.log('Привіт:', cyrillicPattern.test('Привіт'));
console.log('Hello:', cyrillicPattern.test('Hello'));
UPDATE:
In some new browsers, you can use Unicode property escapes.
The Cyrillic script uses the same range as described above: U+0400..U+04FF
const cyrillicPattern = /^\p{Script=Cyrillic}+$/u;
console.log('Привіт:', cyrillicPattern.test('Привіт'));
console.log('Hello:', cyrillicPattern.test('Hello'));
Perhaps you meant to use the RegExp test method instead?
/[а-яА-ЯЁё]/.test(term)
Note that JavaScript regexes are not really Unicode-aware, which means the i flag will have no effect on anything that's not ASCII. Hence the need for spelling out lower- and upper-case ranges separately.

What is the difference of operators " and ' in javascript [duplicate]

Consider the following two alternatives:
console.log("double");
console.log('single');
The former uses double quotes around the string, whereas the latter uses single quotes around the string.
I see more and more JavaScript libraries out there using single quotes when handling strings.
Are these two usages interchangeable? If not, is there an advantage in using one over the other?
The most likely reason for use of single vs. double in different libraries is programmer preference and/or API consistency. Other than being consistent, use whichever best suits the string.
Using the other type of quote as a literal:
alert('Say "Hello"');
alert("Say 'Hello'");
This can get complicated:
alert("It's \"game\" time.");
alert('It\'s "game" time.');
Another option, new in ECMAScript 6, is template literals which use the backtick character:
alert(`Use "double" and 'single' quotes in the same string`);
alert(`Escape the \` back-tick character and the \${ dollar-brace sequence in a string`);
Template literals offer a clean syntax for: variable interpolation, multi-line strings, and more.
Note that JSON is formally specified to use double quotes, which may be worth considering depending on system requirements.
If you're dealing with JSON, it should be noted that strictly speaking, JSON strings must be double quoted. Sure, many libraries support single quotes as well, but I had great problems in one of my projects before realizing that single quoting a string is in fact not according to JSON standards.
There is no one better solution; however, I would like to argue that double quotes may be more desirable at times:
Newcomers will already be familiar with double quotes from their language. In English, we must use double quotes " to identify a passage of quoted text. If we were to use a single quote ', the reader may misinterpret it as a contraction. The other meaning of a passage of text surrounded by the ' indicates the 'colloquial' meaning. It makes sense to stay consistent with pre-existing languages, and this may likely ease the learning and interpretation of code.
Double quotes eliminate the need to escape apostrophes (as in contractions). Consider the string: "I'm going to the mall", vs. the otherwise escaped version: 'I\'m going to the mall'.
Double quotes mean a string in many other languages. When you learn a new language like Java or C, double quotes are always used. In Ruby, PHP and Perl, single-quoted strings imply no backslash escapes while double quotes support them.
JSON notation is written with double quotes.
Nonetheless, as others have stated, it is most important to remain consistent.
Section 7.8.4 of the specification describes literal string notation. The only difference is that DoubleStringCharacter is "SourceCharacter but not double-quote" and SingleStringCharacter is "SourceCharacter but not single-quote". So the only difference can be demonstrated thusly:
'A string that\'s single quoted'
"A string that's double quoted"
So it depends on how much quote escaping you want to do. Obviously the same applies to double quotes in double quoted strings.
I'd like to say the difference is purely stylistic, but I'm really having my doubts. Consider the following example:
/*
Add trim() functionality to JavaScript...
1. By extending the String prototype
2. By creating a 'stand-alone' function
This is just to demonstrate results are the same in both cases.
*/
// Extend the String prototype with a trim() method
String.prototype.trim = function() {
return this.replace(/^\s+|\s+$/g, '');
};
// 'Stand-alone' trim() function
function trim(str) {
return str.replace(/^\s+|\s+$/g, '');
};
document.writeln(String.prototype.trim);
document.writeln(trim);
In Safari, Chrome, Opera, and Internet Explorer (tested in Internet Explorer 7 and Internet Explorer 8), this will return the following:
function () {
return this.replace(/^\s+|\s+$/g, '');
}
function trim(str) {
return str.replace(/^\s+|\s+$/g, '');
}
However, Firefox will yield a slightly different result:
function () {
return this.replace(/^\s+|\s+$/g, "");
}
function trim(str) {
return str.replace(/^\s+|\s+$/g, "");
}
The single quotes have been replaced by double quotes. (Also note how the indenting space was replaced by four spaces.) This gives the impression that at least one browser parses JavaScript internally as if everything was written using double quotes. One might think, it takes Firefox less time to parse JavaScript if everything is already written according to this 'standard'.
Which, by the way, makes me a very sad panda, since I think single quotes look much nicer in code. Plus, in other programming languages, they're usually faster to use than double quotes, so it would only make sense if the same applied to JavaScript.
Conclusion: I think we need to do more research on this.
This might explain Peter-Paul Koch's test results from back in 2003.
It seems that single quotes are sometimes faster in Explorer Windows (roughly 1/3 of my tests did show a faster response time), but if Mozilla shows a difference at all, it handles double quotes slightly faster. I found no difference at all in Opera.
2014: Modern versions of Firefox/Spidermonkey don’t do this anymore.
If you're doing inline JavaScript (arguably a "bad" thing, but avoiding that discussion) single quotes are your only option for string literals, I believe.
E.g., this works fine:
<a onclick="alert('hi');">hi</a>
But you can't wrap the "hi" in double quotes, via any escaping method I'm aware of. Even " which would have been my best guess (since you're escaping quotes in an attribute value of HTML) doesn't work for me in Firefox. " won't work either because at this point you're escaping for HTML, not JavaScript.
So, if the name of the game is consistency, and you're going to do some inline JavaScript in parts of your application, I think single quotes are the winner. Someone please correct me if I'm wrong though.
Technically there's no difference. It's only matter of style and convention.
Douglas Crockford1 recommends using double quotes.
I personally follow that.
Strictly speaking, there is no difference in meaning; so the choice comes down to convenience.
Here are several factors that could influence your choice:
House style: Some groups of developers already use one convention or the other.
Client-side requirements: Will you be using quotes within the strings? (See Ady's answer.)
Server-side language: VB.NET people might choose to use single quotes for JavaScript so that the scripts can be built server-side (VB.NET uses double-quotes for strings, so the JavaScript strings are easy to distinguished if they use single quotes).
Library code: If you're using a library that uses a particular style, you might consider using the same style yourself.
Personal preference: You might think one or other style looks better.
Just keep consistency in what you use. But don't let down your comfort level.
"This is my string."; // :-|
"I'm invincible."; // Comfortable :)
'You can\'t beat me.'; // Uncomfortable :(
'Oh! Yes. I can "beat" you.'; // Comfortable :)
"Do you really think, you can \"beat\" me?"; // Uncomfortable :(
"You're my guest. I can \"beat\" you."; // Sometimes, you've to :P
'You\'re my guest too. I can "beat" you too.'; // Sometimes, you've to :P
ECMAScript 6 update
Using template literal syntax.
`Be "my" guest. You're in complete freedom.`; // Most comfort :D
I hope I am not adding something obvious, but I have been struggling with Django, Ajax, and JSON on this.
Assuming that in your HTML code you do use double quotes, as normally should be, I highly suggest to use single quotes for the rest in JavaScript.
So I agree with ady, but with some care.
My bottom line is:
In JavaScript it probably doesn't matter, but as soon as you embed it inside HTML or the like you start to get troubles. You should know what is actually escaping, reading, passing your string.
My simple case was:
tbox.innerHTML = tbox.innerHTML + '<div class="thisbox_des" style="width:210px;" onmouseout="clear()"><a href="/this/thislist/'
+ myThis[i].pk +'"><img src="/site_media/'
+ myThis[i].fields.thumbnail +'" height="80" width="80" style="float:left;" onmouseover="showThis('
+ myThis[i].fields.left +','
+ myThis[i].fields.right +',\''
+ myThis[i].fields.title +'\')"></a><p style="float:left;width:130px;height:80px;"><b>'
+ myThis[i].fields.title +'</b> '
+ myThis[i].fields.description +'</p></div>'
You can spot the ' in the third field of showThis.
The double quote didn't work!
It is clear why, but it is also clear why we should stick to single quotes... I guess...
This case is a very simple HTML embedding, and the error was generated by a simple copy/paste from a 'double quoted' JavaScript code.
So to answer the question:
Try to use single quotes while within HTML. It might save a couple of debug issues...
It's mostly a matter of style and preference. There are some rather interesting and useful technical explorations in the other answers, so perhaps the only thing I might add is to offer a little worldly advice.
If you're coding in a company or team, then it's probably a good idea to follow the "house style".
If you're alone hacking a few side projects, then look at a few prominent leaders in the community. For example, let's say you getting into Node.js. Take a look at core modules, for example, Underscore.js or express and see what convention they use, and consider following that.
If both conventions are equally used, then defer to your personal preference.
If you don't have any personal preference, then flip a coin.
If you don't have a coin, then beer is on me ;)
I am not sure if this is relevant in today's world, but double quotes used to be used for content that needed to have control characters processed and single quotes for strings that didn't.
The compiler will run string manipulation on a double quoted string while leaving a single quoted string literally untouched. This used to lead to 'good' developers choosing to use single quotes for strings that didn't contain control characters like \n or \0 (not processed within single quotes) and double quotes when they needed the string parsed (at a slight cost in CPU cycles for processing the string).
If you are using JSHint, it will raise an error if you use a double quoted string.
I used it through the Yeoman scafflholding of AngularJS, but maybe there is somehow a manner to configure this.
By the way, when you handle HTML into JavaScript, it's easier to use single quote:
var foo = '<div class="cool-stuff">Cool content</div>';
And at least JSON is using double quotes to represent strings.
There isn't any trivial way to answer to your question.
There isn't any difference between single and double quotes in JavaScript.
The specification is important:
Maybe there are performance differences, but they are absolutely minimum and can change any day according to browsers' implementation. Further discussion is futile unless your JavaScript application is hundreds of thousands lines long.
It's like a benchmark if
a=b;
is faster than
a = b;
(extra spaces)
today, in a particular browser and platform, etc.
Examining the pros and cons
In favor of single quotes
Less visual clutter.
Generating HTML: HTML attributes are usually delimited by double quotes.
elem.innerHTML = 'Hello';
However, single quotes are just as legal in HTML.
elem.innerHTML = "<a href='" + url + "'>Hello</a>";
Furthermore, inline HTML is normally an anti-pattern. Prefer templates.
Generating JSON: Only double quotes are allowed in JSON.
myJson = '{ "hello world": true }';
Again, you shouldn’t have to construct JSON this way. JSON.stringify() is often enough. If not, use templates.
In favor of double quotes
Doubles are easier to spot if you don't have color coding. Like in a console log or some kind of view-source setup.
Similarity to other languages: In shell programming (Bash etc.), single-quoted string literals exist, but escapes are not interpreted inside them. C and Java use double quotes for strings and single quotes for characters.
If you want code to be valid JSON, you need to use double quotes.
In favor of both
There is no difference between the two in JavaScript. Therefore, you can use whatever is convenient at the moment. For example, the following string literals all produce the same string:
"He said: \"Let's go!\""
'He said: "Let\'s go!"'
"He said: \"Let\'s go!\""
'He said: \"Let\'s go!\"'
Single quotes for internal strings and double for external. That allows you to distinguish internal constants from strings that are to be displayed to the user (or written to disk etc.). Obviously, you should avoid putting the latter in your code, but that can’t always be done.
Talking about performance, quotes will never be your bottleneck. However, the performance is the same in both cases.
Talking about coding speed, if you use ' for delimiting a string, you will need to escape " quotes. You are more likely to need to use " inside the string. Example:
// JSON Objects:
var jsonObject = '{"foo":"bar"}';
// HTML attributes:
document.getElementById("foobar").innerHTML = '<input type="text">';
Then, I prefer to use ' for delimiting the string, so I have to escape fewer characters.
There are people that claim to see performance differences: old mailing list thread. But I couldn't find any of them to be confirmed.
The main thing is to look at what kind of quotes (double or single) you are using inside your string. It helps to keep the number of escapes low. For instance, when you are working with HTML content inside your strings, it is easier to use single quotes so that you don't have to escape all double quotes around the attributes.
When using CoffeeScript I use double quotes. I agree that you should pick either one and stick to it. CoffeeScript gives you interpolation when using the double quotes.
"This is my #{name}"
ECMAScript 6 is using back ticks (`) for template strings. Which probably has a good reason, but when coding, it can be cumbersome to change the string literals character from quotes or double quotes to backticks in order to get the interpolation feature. CoffeeScript might not be perfect, but using the same string literals character everywhere (double quotes) and always be able to interpolate is a nice feature.
`This is my ${name}`
I would use double quotes when single quotes cannot be used and vice versa:
"'" + singleQuotedValue + "'"
'"' + doubleQuotedValue + '"'
Instead of:
'\'' + singleQuotedValue + '\''
"\"" + doubleQuotedValue + "\""
There is strictly no difference, so it is mostly a matter of taste and of what is in the string (or if the JavaScript code itself is in a string), to keep number of escapes low.
The speed difference legend might come from PHP world, where the two quotes have different behavior.
The difference is purely stylistic. I used to be a double-quote Nazi. Now I use single quotes in nearly all cases. There's no practical difference beyond how your editor highlights the syntax.
You can use single quotes or double quotes.
This enables you for example to easily nest JavaScript content inside HTML attributes, without the need to escape the quotes.
The same is when you create JavaScript with PHP.
The general idea is: if it is possible use such quotes that you won't need to escape.
Less escaping = better code.
In addition, it seems the specification (currently mentioned at MDN) doesn't state any difference between single and double quotes except closing and some unescaped few characters. However, template literal (` - the backtick character) assumes additional parsing/processing.
A string literal is 0 or more Unicode code points enclosed in single or double quotes. Unicode code points may also be represented by an escape sequence. All code points may appear literally in a string literal except for the closing quote code points, U+005C (REVERSE SOLIDUS), U+000D (CARRIAGE RETURN), and U+000A (LINE FEED). Any code points may appear in the form of an escape sequence. String literals evaluate to ECMAScript String values...
Source: https://tc39.es/ecma262/#sec-literals-string-literals

Phone number validation - excluding non repeating separators

I have the following regex for phone number validation
function validatePhonenumber(phoneNum) {
var regex = /^[1-9]{3}[-\s\.]{0,1}[0-9]{3}[-\s\.]{0,1}[0-9]{4}$/;
return regex.test(phoneNum);
}
However, I would liek to make sure it doesn;t pass for different separators such as in
111-222.3333
Any ideas how to make sure the separators are the same always?
Just make sure beforehand that there is at most one kind of separator, then pass the string through the regex as you were doing.
function validatePhonenumber(phoneNum) {
var separators = extractSeparators(phoneNum);
if(separators.length > 1) return false;
var regex = /^[1-9]{3}[-\s\.]{0,1}[0-9]{3}[-\s\.]{0,1}[0-9]{3}$/;
return regex.test(phoneNum);
}
function extractSeparators(str){
// Return an array with all the distinct chars
// that are present in the passed string
// and are not numeric (0-9)
}
You can use the following regex instead:
\d{3}([-\s\.])?\d{3}\1?\d{4}
Here is a working example:
http://regex101.com/r/nN9nT7/1
As result it will match the following result:
111-222-3333 --> ok
111.222.3333 --> ok
111 222 3333 --> ok
111-222.3333
111.222-3333
111-222 3333
111 222-3333
EDIT: after Alan Moore's suggestion:
Also matches 111-2223333. That's because you made the \1 optional,
which isn't necessary. One of JavaScript's stranger quirks is that a
backreference to a group that did not participate in the match,
succeeds anyway. So if there's no first separator, ([-\s.])? succeeds
because the ? made it optional, and \1 succeeds because it's
JavaScript. But I would have used ([-\s.]?) to capture the first
separator (which might be nothing), and \1 to match the same thing
again. This works in any flavor, including JavaScript.
We can improve the regex to:
^\d{3}([-\s\.]?)\d{3}\1\d{4}$
You'll need at least two passes to keep this maintainable and extensible.
JS' RegEx doesn't allow for creating variables for use later in the RegEx, if you want to support older browsers.
If you are only supporting modern browsers, Fede's answer is just fine...
As such, with ghetto-support, you aren't going to be able to reliably check that one separator is the same value every time, without writing a really, really, really, stupidly-long RegEx, using | to basically write out the RegEx 3 times.
A better way might be to grab all of the separators, and use a reduction or a filter to check that they all have the same value.
var userEnteredNumber = "999.231 3055";
var validNumber = numRegEx.test(userEnteredNumber);
var separators = userEnteredNumber.replace(/\d+/g, "").split("");
var firstSeparator = separators[0];
var uniformSeparators = separators.every(function (separator) { return separator === firstSeparator; });
if (!uniformSeparators) { /* also not valid */ }
You could make that a little neater, using closures and some applied functions, but that's the idea.
Alternatively, here's the big, ugly RegEx that would allow you to test exactly what the user entered.
var separatorTest = /^([0-9]{3}\.[0-9]{3}\.[0-9]{3,4})|([0-9]{3}-[0-9]{3}-[0-9]{3,4})|([0-9]{3} [0-9]{3} [0-9]{3,4})|([0-9]{9,10})$/;
Notice I had to include the exact same number-test three times, wrap each one in parens (to be treated as a single group), and then separate each group with an | to check each group, like an if, else if, else... ...and then plug in a separate special case for having no separator at all...
...not pretty.
I'm also not using \d, just because it's easy to forget that - and . are both accepted "digit"s, when trying to maintain one of these abominations.
Now, a word or two of warning:
People are liable to enter all kinds of crap; if this is for a commercial site, it's likely better to just strip separators entirely and validate the number is the right size, and conforms to some specifics (eg: doesn't start with /^555555/).
If not given any instruction about number format, people will happily use either no separator or a formal number, like (555) 555-5555 (or +1 (555) 555-5555 for the really pedantic), which is obviously going to fail hard, in this system (see point #1).
Be prepared to trim what you get, before validating.
Depending on your country/region/etc laws about data-security and consumer-vs-transaction record-keeping (again, may or may not be more important in a commercial setting), it's likely better to store both a "user-given" ugly number, and a system-usable number, which you either clean on the back-end, or submit along with the user-entered text.
From a user-interaction perspective, either forcing the number to conform, explicitly (placeholders showing them xxx-xxx-xxxx right above the input, in bold), or accepting any text, and prepping it yourself, is going to be 1000x better than accepting certain forms, but not bothering to tell the user up-front, and instead telling them what they did was wrong, after they try.
It's not cool for relationships; it's equally not cool, here.
You've got 9-digit and 10-digit numbers, so if you're trying for an international solution, be prepared to deal with all international separators (, \.\-\(\)\+) etc... again, why stripping is more useful, because THAT RegEx would be insane.

How to remove non alphanumeric characters and space, but keep foreign language in JavaScript

I want to remove signs like:
!##$%^&*()_+`-=[]\|{};':",./<>?。,‘“”’;【】『』
and many more.
But ensuring all the foreign characters are kept, such as Chinese, French, Greece, etc.
In Ruby, I'm able to do it with regex
/[^\p{Alnum}]/
It doesn't work in JS. Thanks in advance.
Answer
Thank Jordan Gray. The 8400-letter regex is awesome. It is a short solution as it is only 8.4kb compare to 63kb of XRegExp.
For the record, the full answer I use is
/[^1-9\u0041-\u005A\u0061-\u007A\u00AA\u00B5\u00BA\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02C1\u02C6-\u02D1\u02E0-\u02E4\u02EC\u02EE\u0370-\u0374\u0376\u0377\u037A-\u037D\u0386\u0388-\u038A\u038C\u038E-\u03A1\u03A3-\u03F5\u03F7-\u0481\u048A-\u0527\u0531-\u0556\u0559\u0561-\u0587\u05D0-\u05EA\u05F0-\u05F2\u0620-\u064A\u066E\u066F\u0671-\u06D3\u06D5\u06E5\u06E6\u06EE\u06EF\u06FA-\u06FC\u06FF\u0710\u0712-\u072F\u074D-\u07A5\u07B1\u07CA-\u07EA\u07F4\u07F5\u07FA\u0800-\u0815\u081A\u0824\u0828\u0840-\u0858\u08A0\u08A2-\u08AC\u0904-\u0939\u093D\u0950\u0958-\u0961\u0971-\u0977\u0979-\u097F\u0985-\u098C\u098F\u0990\u0993-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09BD\u09CE\u09DC\u09DD\u09DF-\u09E1\u09F0\u09F1\u0A05-\u0A0A\u0A0F\u0A10\u0A13-\u0A28\u0A2A-\u0A30\u0A32\u0A33\u0A35\u0A36\u0A38\u0A39\u0A59-\u0A5C\u0A5E\u0A72-\u0A74\u0A85-\u0A8D\u0A8F-\u0A91\u0A93-\u0AA8\u0AAA-\u0AB0\u0AB2\u0AB3\u0AB5-\u0AB9\u0ABD\u0AD0\u0AE0\u0AE1\u0B05-\u0B0C\u0B0F\u0B10\u0B13-\u0B28\u0B2A-\u0B30\u0B32\u0B33\u0B35-\u0B39\u0B3D\u0B5C\u0B5D\u0B5F-\u0B61\u0B71\u0B83\u0B85-\u0B8A\u0B8E-\u0B90\u0B92-\u0B95\u0B99\u0B9A\u0B9C\u0B9E\u0B9F\u0BA3\u0BA4\u0BA8-\u0BAA\u0BAE-\u0BB9\u0BD0\u0C05-\u0C0C\u0C0E-\u0C10\u0C12-\u0C28\u0C2A-\u0C33\u0C35-\u0C39\u0C3D\u0C58\u0C59\u0C60\u0C61\u0C85-\u0C8C\u0C8E-\u0C90\u0C92-\u0CA8\u0CAA-\u0CB3\u0CB5-\u0CB9\u0CBD\u0CDE\u0CE0\u0CE1\u0CF1\u0CF2\u0D05-\u0D0C\u0D0E-\u0D10\u0D12-\u0D3A\u0D3D\u0D4E\u0D60\u0D61\u0D7A-\u0D7F\u0D85-\u0D96\u0D9A-\u0DB1\u0DB3-\u0DBB\u0DBD\u0DC0-\u0DC6\u0E01-\u0E30\u0E32\u0E33\u0E40-\u0E46\u0E81\u0E82\u0E84\u0E87\u0E88\u0E8A\u0E8D\u0E94-\u0E97\u0E99-\u0E9F\u0EA1-\u0EA3\u0EA5\u0EA7\u0EAA\u0EAB\u0EAD-\u0EB0\u0EB2\u0EB3\u0EBD\u0EC0-\u0EC4\u0EC6\u0EDC-\u0EDF\u0F00\u0F40-\u0F47\u0F49-\u0F6C\u0F88-\u0F8C\u1000-\u102A\u103F\u1050-\u1055\u105A-\u105D\u1061\u1065\u1066\u106E-\u1070\u1075-\u1081\u108E\u10A0-\u10C5\u10C7\u10CD\u10D0-\u10FA\u10FC-\u1248\u124A-\u124D\u1250-\u1256\u1258\u125A-\u125D\u1260-\u1288\u128A-\u128D\u1290-\u12B0\u12B2-\u12B5\u12B8-\u12BE\u12C0\u12C2-\u12C5\u12C8-\u12D6\u12D8-\u1310\u1312-\u1315\u1318-\u135A\u1380-\u138F\u13A0-\u13F4\u1401-\u166C\u166F-\u167F\u1681-\u169A\u16A0-\u16EA\u1700-\u170C\u170E-\u1711\u1720-\u1731\u1740-\u1751\u1760-\u176C\u176E-\u1770\u1780-\u17B3\u17D7\u17DC\u1820-\u1877\u1880-\u18A8\u18AA\u18B0-\u18F5\u1900-\u191C\u1950-\u196D\u1970-\u1974\u1980-\u19AB\u19C1-\u19C7\u1A00-\u1A16\u1A20-\u1A54\u1AA7\u1B05-\u1B33\u1B45-\u1B4B\u1B83-\u1BA0\u1BAE\u1BAF\u1BBA-\u1BE5\u1C00-\u1C23\u1C4D-\u1C4F\u1C5A-\u1C7D\u1CE9-\u1CEC\u1CEE-\u1CF1\u1CF5\u1CF6\u1D00-\u1DBF\u1E00-\u1F15\u1F18-\u1F1D\u1F20-\u1F45\u1F48-\u1F4D\u1F50-\u1F57\u1F59\u1F5B\u1F5D\u1F5F-\u1F7D\u1F80-\u1FB4\u1FB6-\u1FBC\u1FBE\u1FC2-\u1FC4\u1FC6-\u1FCC\u1FD0-\u1FD3\u1FD6-\u1FDB\u1FE0-\u1FEC\u1FF2-\u1FF4\u1FF6-\u1FFC\u2071\u207F\u2090-\u209C\u2102\u2107\u210A-\u2113\u2115\u2119-\u211D\u2124\u2126\u2128\u212A-\u212D\u212F-\u2139\u213C-\u213F\u2145-\u2149\u214E\u2183\u2184\u2C00-\u2C2E\u2C30-\u2C5E\u2C60-\u2CE4\u2CEB-\u2CEE\u2CF2\u2CF3\u2D00-\u2D25\u2D27\u2D2D\u2D30-\u2D67\u2D6F\u2D80-\u2D96\u2DA0-\u2DA6\u2DA8-\u2DAE\u2DB0-\u2DB6\u2DB8-\u2DBE\u2DC0-\u2DC6\u2DC8-\u2DCE\u2DD0-\u2DD6\u2DD8-\u2DDE\u2E2F\u3005\u3006\u3031-\u3035\u303B\u303C\u3041-\u3096\u309D-\u309F\u30A1-\u30FA\u30FC-\u30FF\u3105-\u312D\u3131-\u318E\u31A0-\u31BA\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FCC\uA000-\uA48C\uA4D0-\uA4FD\uA500-\uA60C\uA610-\uA61F\uA62A\uA62B\uA640-\uA66E\uA67F-\uA697\uA6A0-\uA6E5\uA717-\uA71F\uA722-\uA788\uA78B-\uA78E\uA790-\uA793\uA7A0-\uA7AA\uA7F8-\uA801\uA803-\uA805\uA807-\uA80A\uA80C-\uA822\uA840-\uA873\uA882-\uA8B3\uA8F2-\uA8F7\uA8FB\uA90A-\uA925\uA930-\uA946\uA960-\uA97C\uA984-\uA9B2\uA9CF\uAA00-\uAA28\uAA40-\uAA42\uAA44-\uAA4B\uAA60-\uAA76\uAA7A\uAA80-\uAAAF\uAAB1\uAAB5\uAAB6\uAAB9-\uAABD\uAAC0\uAAC2\uAADB-\uAADD\uAAE0-\uAAEA\uAAF2-\uAAF4\uAB01-\uAB06\uAB09-\uAB0E\uAB11-\uAB16\uAB20-\uAB26\uAB28-\uAB2E\uABC0-\uABE2\uAC00-\uD7A3\uD7B0-\uD7C6\uD7CB-\uD7FB\uF900-\uFA6D\uFA70-\uFAD9\uFB00-\uFB06\uFB13-\uFB17\uFB1D\uFB1F-\uFB28\uFB2A-\uFB36\uFB38-\uFB3C\uFB3E\uFB40\uFB41\uFB43\uFB44\uFB46-\uFBB1\uFBD3-\uFD3D\uFD50-\uFD8F\uFD92-\uFDC7\uFDF0-\uFDFB\uFE70-\uFE74\uFE76-\uFEFC\uFF21-\uFF3A\uFF41-\uFF5A\uFF66-\uFFBE\uFFC2-\uFFC7\uFFCA-\uFFCF\uFFD2-\uFFD7\uFFDA-\uFFDC]+/g
This is funny, isn't it?
Someone has already suggested XRegExp. That's a great package, but if all you want is a way of matching letters you can grab it straight from the source.
In this case, what you're looking for is a way to match the whole Unicode Letter category, which is defined in the source for the XRegExp Unicode Base package. With XRegExp, you could match these with \p{L} or \p{Letter}. A JS-compatible expression for matching characters not in this category would work out to (brace yourself):
var NonLetters = /[^\u0041-\u005A\u0061-\u007A\u00AA\u00B5\u00BA\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02C1\u02C6-\u02D1\u02E0-\u02E4\u02EC\u02EE\u0370-\u0374\u0376\u0377\u037A-\u037D\u0386\u0388-\u038A\u038C\u038E-\u03A1\u03A3-\u03F5\u03F7-\u0481\u048A-\u0527\u0531-\u0556\u0559\u0561-\u0587\u05D0-\u05EA\u05F0-\u05F2\u0620-\u064A\u066E\u066F\u0671-\u06D3\u06D5\u06E5\u06E6\u06EE\u06EF\u06FA-\u06FC\u06FF\u0710\u0712-\u072F\u074D-\u07A5\u07B1\u07CA-\u07EA\u07F4\u07F5\u07FA\u0800-\u0815\u081A\u0824\u0828\u0840-\u0858\u08A0\u08A2-\u08AC\u0904-\u0939\u093D\u0950\u0958-\u0961\u0971-\u0977\u0979-\u097F\u0985-\u098C\u098F\u0990\u0993-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09BD\u09CE\u09DC\u09DD\u09DF-\u09E1\u09F0\u09F1\u0A05-\u0A0A\u0A0F\u0A10\u0A13-\u0A28\u0A2A-\u0A30\u0A32\u0A33\u0A35\u0A36\u0A38\u0A39\u0A59-\u0A5C\u0A5E\u0A72-\u0A74\u0A85-\u0A8D\u0A8F-\u0A91\u0A93-\u0AA8\u0AAA-\u0AB0\u0AB2\u0AB3\u0AB5-\u0AB9\u0ABD\u0AD0\u0AE0\u0AE1\u0B05-\u0B0C\u0B0F\u0B10\u0B13-\u0B28\u0B2A-\u0B30\u0B32\u0B33\u0B35-\u0B39\u0B3D\u0B5C\u0B5D\u0B5F-\u0B61\u0B71\u0B83\u0B85-\u0B8A\u0B8E-\u0B90\u0B92-\u0B95\u0B99\u0B9A\u0B9C\u0B9E\u0B9F\u0BA3\u0BA4\u0BA8-\u0BAA\u0BAE-\u0BB9\u0BD0\u0C05-\u0C0C\u0C0E-\u0C10\u0C12-\u0C28\u0C2A-\u0C33\u0C35-\u0C39\u0C3D\u0C58\u0C59\u0C60\u0C61\u0C85-\u0C8C\u0C8E-\u0C90\u0C92-\u0CA8\u0CAA-\u0CB3\u0CB5-\u0CB9\u0CBD\u0CDE\u0CE0\u0CE1\u0CF1\u0CF2\u0D05-\u0D0C\u0D0E-\u0D10\u0D12-\u0D3A\u0D3D\u0D4E\u0D60\u0D61\u0D7A-\u0D7F\u0D85-\u0D96\u0D9A-\u0DB1\u0DB3-\u0DBB\u0DBD\u0DC0-\u0DC6\u0E01-\u0E30\u0E32\u0E33\u0E40-\u0E46\u0E81\u0E82\u0E84\u0E87\u0E88\u0E8A\u0E8D\u0E94-\u0E97\u0E99-\u0E9F\u0EA1-\u0EA3\u0EA5\u0EA7\u0EAA\u0EAB\u0EAD-\u0EB0\u0EB2\u0EB3\u0EBD\u0EC0-\u0EC4\u0EC6\u0EDC-\u0EDF\u0F00\u0F40-\u0F47\u0F49-\u0F6C\u0F88-\u0F8C\u1000-\u102A\u103F\u1050-\u1055\u105A-\u105D\u1061\u1065\u1066\u106E-\u1070\u1075-\u1081\u108E\u10A0-\u10C5\u10C7\u10CD\u10D0-\u10FA\u10FC-\u1248\u124A-\u124D\u1250-\u1256\u1258\u125A-\u125D\u1260-\u1288\u128A-\u128D\u1290-\u12B0\u12B2-\u12B5\u12B8-\u12BE\u12C0\u12C2-\u12C5\u12C8-\u12D6\u12D8-\u1310\u1312-\u1315\u1318-\u135A\u1380-\u138F\u13A0-\u13F4\u1401-\u166C\u166F-\u167F\u1681-\u169A\u16A0-\u16EA\u1700-\u170C\u170E-\u1711\u1720-\u1731\u1740-\u1751\u1760-\u176C\u176E-\u1770\u1780-\u17B3\u17D7\u17DC\u1820-\u1877\u1880-\u18A8\u18AA\u18B0-\u18F5\u1900-\u191C\u1950-\u196D\u1970-\u1974\u1980-\u19AB\u19C1-\u19C7\u1A00-\u1A16\u1A20-\u1A54\u1AA7\u1B05-\u1B33\u1B45-\u1B4B\u1B83-\u1BA0\u1BAE\u1BAF\u1BBA-\u1BE5\u1C00-\u1C23\u1C4D-\u1C4F\u1C5A-\u1C7D\u1CE9-\u1CEC\u1CEE-\u1CF1\u1CF5\u1CF6\u1D00-\u1DBF\u1E00-\u1F15\u1F18-\u1F1D\u1F20-\u1F45\u1F48-\u1F4D\u1F50-\u1F57\u1F59\u1F5B\u1F5D\u1F5F-\u1F7D\u1F80-\u1FB4\u1FB6-\u1FBC\u1FBE\u1FC2-\u1FC4\u1FC6-\u1FCC\u1FD0-\u1FD3\u1FD6-\u1FDB\u1FE0-\u1FEC\u1FF2-\u1FF4\u1FF6-\u1FFC\u2071\u207F\u2090-\u209C\u2102\u2107\u210A-\u2113\u2115\u2119-\u211D\u2124\u2126\u2128\u212A-\u212D\u212F-\u2139\u213C-\u213F\u2145-\u2149\u214E\u2183\u2184\u2C00-\u2C2E\u2C30-\u2C5E\u2C60-\u2CE4\u2CEB-\u2CEE\u2CF2\u2CF3\u2D00-\u2D25\u2D27\u2D2D\u2D30-\u2D67\u2D6F\u2D80-\u2D96\u2DA0-\u2DA6\u2DA8-\u2DAE\u2DB0-\u2DB6\u2DB8-\u2DBE\u2DC0-\u2DC6\u2DC8-\u2DCE\u2DD0-\u2DD6\u2DD8-\u2DDE\u2E2F\u3005\u3006\u3031-\u3035\u303B\u303C\u3041-\u3096\u309D-\u309F\u30A1-\u30FA\u30FC-\u30FF\u3105-\u312D\u3131-\u318E\u31A0-\u31BA\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FCC\uA000-\uA48C\uA4D0-\uA4FD\uA500-\uA60C\uA610-\uA61F\uA62A\uA62B\uA640-\uA66E\uA67F-\uA697\uA6A0-\uA6E5\uA717-\uA71F\uA722-\uA788\uA78B-\uA78E\uA790-\uA793\uA7A0-\uA7AA\uA7F8-\uA801\uA803-\uA805\uA807-\uA80A\uA80C-\uA822\uA840-\uA873\uA882-\uA8B3\uA8F2-\uA8F7\uA8FB\uA90A-\uA925\uA930-\uA946\uA960-\uA97C\uA984-\uA9B2\uA9CF\uAA00-\uAA28\uAA40-\uAA42\uAA44-\uAA4B\uAA60-\uAA76\uAA7A\uAA80-\uAAAF\uAAB1\uAAB5\uAAB6\uAAB9-\uAABD\uAAC0\uAAC2\uAADB-\uAADD\uAAE0-\uAAEA\uAAF2-\uAAF4\uAB01-\uAB06\uAB09-\uAB0E\uAB11-\uAB16\uAB20-\uAB26\uAB28-\uAB2E\uABC0-\uABE2\uAC00-\uD7A3\uD7B0-\uD7C6\uD7CB-\uD7FB\uF900-\uFA6D\uFA70-\uFAD9\uFB00-\uFB06\uFB13-\uFB17\uFB1D\uFB1F-\uFB28\uFB2A-\uFB36\uFB38-\uFB3C\uFB3E\uFB40\uFB41\uFB43\uFB44\uFB46-\uFBB1\uFBD3-\uFD3D\uFD50-\uFD8F\uFD92-\uFDC7\uFDF0-\uFDFB\uFE70-\uFE74\uFE76-\uFEFC\uFF21-\uFF3A\uFF41-\uFF5A\uFF66-\uFFBE\uFFC2-\uFFC7\uFFCA-\uFFCF\uFFD2-\uFFD7\uFFDA-\uFFDC]/
jsFiddle demo.
Pretty hefty, but what you'd expect if you aim to match uppercase, lowercase, title case, modifier and other letters in every language in the basic multilingual plane¹, including all ligatures and accented characters. Don't forget to add 0-9 to the beginning if you want to match numbers as well!
If you trust your Unicode not to get mangled, you could use this shorter version:
var NonLetters = /[^A-Za-zªµºÀ-ÖØ-öø-ˁˆ-ˑˠ-ˤˬˮͰ-ʹͶͷͺ-ͽΆΈ-ΊΌΎ-ΡΣ-ϵϷ-ҁҊ-ԧԱ-Ֆՙա-ևא-תװ-ײؠ-يٮٯٱ-ۓەۥۦۮۯۺ-ۼۿܐܒ-ܯݍ-ޥޱߊ-ߪߴߵߺࠀ-ࠕࠚࠤࠨࡀ-ࡘࢠࢢ-ࢬऄ-हऽॐक़-ॡॱ-ॷॹ-ॿঅ-ঌএঐও-নপ-রলশ-হঽৎড়ঢ়য়-ৡৰৱਅ-ਊਏਐਓ-ਨਪ-ਰਲਲ਼ਵਸ਼ਸਹਖ਼-ੜਫ਼ੲ-ੴઅ-ઍએ-ઑઓ-નપ-રલળવ-હઽૐૠૡଅ-ଌଏଐଓ-ନପ-ରଲଳଵ-ହଽଡ଼ଢ଼ୟ-ୡୱஃஅ-ஊஎ-ஐஒ-கஙசஜஞடணதந-பம-ஹௐఅ-ఌఎ-ఐఒ-నప-ళవ-హఽౘౙౠౡಅ-ಌಎ-ಐಒ-ನಪ-ಳವ-ಹಽೞೠೡೱೲഅ-ഌഎ-ഐഒ-ഺഽൎൠൡൺ-ൿඅ-ඖක-නඳ-රලව-ෆก-ะาำเ-ๆກຂຄງຈຊຍດ-ທນ-ຟມ-ຣລວສຫອ-ະາຳຽເ-ໄໆໜ-ໟༀཀ-ཇཉ-ཬྈ-ྌက-ဪဿၐ-ၕၚ-ၝၡၥၦၮ-ၰၵ-ႁႎႠ-ჅჇჍა-ჺჼ-ቈቊ-ቍቐ-ቖቘቚ-ቝበ-ኈኊ-ኍነ-ኰኲ-ኵኸ-ኾዀዂ-ዅወ-ዖዘ-ጐጒ-ጕጘ-ፚᎀ-ᎏᎠ-Ᏼᐁ-ᙬᙯ-ᙿᚁ-ᚚᚠ-ᛪᜀ-ᜌᜎ-ᜑᜠ-ᜱᝀ-ᝑᝠ-ᝬᝮ-ᝰក-ឳៗៜᠠ-ᡷᢀ-ᢨᢪᢰ-ᣵᤀ-ᤜᥐ-ᥭᥰ-ᥴᦀ-ᦫᧁ-ᧇᨀ-ᨖᨠ-ᩔᪧᬅ-ᬳᭅ-ᭋᮃ-ᮠᮮᮯᮺ-ᯥᰀ-ᰣᱍ-ᱏᱚ-ᱽᳩ-ᳬᳮ-ᳱᳵᳶᴀ-ᶿḀ-ἕἘ-Ἕἠ-ὅὈ-Ὅὐ-ὗὙὛὝὟ-ώᾀ-ᾴᾶ-ᾼιῂ-ῄῆ-ῌῐ-ΐῖ-Ίῠ-Ῥῲ-ῴῶ-ῼⁱⁿₐ-ₜℂℇℊ-ℓℕℙ-ℝℤΩℨK-ℭℯ-ℹℼ-ℿⅅ-ⅉⅎↃↄⰀ-Ⱞⰰ-ⱞⱠ-ⳤⳫ-ⳮⳲⳳⴀ-ⴥⴧⴭⴰ-ⵧⵯⶀ-ⶖⶠ-ⶦⶨ-ⶮⶰ-ⶶⶸ-ⶾⷀ-ⷆⷈ-ⷎⷐ-ⷖⷘ-ⷞⸯ々〆〱-〵〻〼ぁ-ゖゝ-ゟァ-ヺー-ヿㄅ-ㄭㄱ-ㆎㆠ-ㆺㇰ-ㇿ㐀-䶵一-鿌ꀀ-ꒌꓐ-ꓽꔀ-ꘌꘐ-ꘟꘪꘫꙀ-ꙮꙿ-ꚗꚠ-ꛥꜗ-ꜟꜢ-ꞈꞋ-ꞎꞐ-ꞓꞠ-Ɦꟸ-ꠁꠃ-ꠅꠇ-ꠊꠌ-ꠢꡀ-ꡳꢂ-ꢳꣲ-ꣷꣻꤊ-ꤥꤰ-ꥆꥠ-ꥼꦄ-ꦲꧏꨀ-ꨨꩀ-ꩂꩄ-ꩋꩠ-ꩶꩺꪀ-ꪯꪱꪵꪶꪹ-ꪽꫀꫂꫛ-ꫝꫠ-ꫪꫲ-ꫴꬁ-ꬆꬉ-ꬎꬑ-ꬖꬠ-ꬦꬨ-ꬮꯀ-ꯢ가-힣ힰ-ퟆퟋ-ퟻ豈-舘並-龎ff-stﬓ-ﬗיִײַ-ﬨשׁ-זּטּ-לּמּנּסּףּפּצּ-ﮱﯓ-ﴽﵐ-ﶏﶒ-ﷇﷰ-ﷻﹰ-ﹴﹶ-ﻼA-Za-zヲ-하-ᅦᅧ-ᅬᅭ-ᅲᅳ-ᅵ]/
This gives a little idea of the sheer diversity of this category. Personally, I would use the first expression since it is far less likely to be invisibly mucked up in a way that you will probably never find out about.
¹ This doesn't include characters from the so-called astral planes such as emoji, ancient/historic scripts like Egyptian hieroglyphs and the many many CJK ideographs. It's your call if you think these should count as "letters" in some very extended sense and thus want to include them.
As #Barney suggested, I would go with XRegExp (v2.0.0 is about 63kb minimized). Here is an example, which I believe does what you are asking for? Using require.js to load the library.
HTML
<input id="entry" type="text"></input>
<div id="display"></div>
Javascript
require.config({
paths: {
'XRegExp': 'https://raw.github.com/slevithan/xregexp/v2.0.0/min/xregexp-all-min'
}
})
require(['XRegExp'], function() {
var display = document.getElementById('display'),
alnum = XRegExp('[^\\p{L}\\p{Nd}]', 'g');
document.getElementById('entry').addEventListener('keyup', function(evt) {
display.textContent = XRegExp.replace(evt.target.value, alnum, '');
}, false);
});
On jsFiddle

Categories

Resources