What Javascript constructs does JsLex incorrectly lex? - javascript

JsLex is a Javascript lexer I've written in Python. It does a good job for a day's work (or so), but I'm sure there are cases it gets wrong. In particular, it doesn't understand anything about semicolon insertion, and there are probably ways that's important for lexing. I just don't know what they are.
What Javascript code does JsLex lex incorrectly? I'm especially interested in valid Javascript source where JsLex incorrectly identifies regex literals.
Just to be clear, by "lexing" I mean identifying tokens in a source file. JsLex makes no attempt to parse Javascript, much less execute it. I've written JsLex to do full lexing, though to be honest I would be happy if it merely was able to successfully find all the regex literals.

Interestingly enough I tried your lexer on the code of my lexer/evaluator written in JS ;) You're right, it is not always doing well with regular expressions. Here some examples:
rexl.re = {
NAME: /^(?!\d)(?:\w)+|^"(?:[^"]|"")+"/,
UNQUOTED_LITERAL: /^#(?:(?!\d)(?:\w|\:)+|^"(?:[^"]|"")+")\[[^\]]+\]/,
QUOTED_LITERAL: /^'(?:[^']|'')*'/,
NUMERIC_LITERAL: /^[0-9]+(?:\.[0-9]*(?:[eE][-+][0-9]+)?)?/,
SYMBOL: /^(?:==|=|<>|<=|<|>=|>|!~~|!~|~~|~|!==|!=|!~=|!~|!|&|\||\.|\:|,|\(|\)|\[|\]|\{|\}|\?|\:|;|#|\^|\/\+|\/|\*|\+|-)/
};
This one is mostly fine - only UNQUITED_LITERAL is not recognized, otherwise all is fine. But now let's make a minor addition to it:
rexl.re = {
NAME: /^(?!\d)(?:\w)+|^"(?:[^"]|"")+"/,
UNQUOTED_LITERAL: /^#(?:(?!\d)(?:\w|\:)+|^"(?:[^"]|"")+")\[[^\]]+\]/,
QUOTED_LITERAL: /^'(?:[^']|'')*'/,
NUMERIC_LITERAL: /^[0-9]+(?:\.[0-9]*(?:[eE][-+][0-9]+)?)?/,
SYMBOL: /^(?:==|=|<>|<=|<|>=|>|!~~|!~|~~|~|!==|!=|!~=|!~|!|&|\||\.|\:|,|\(|\)|\[|\]|\{|\}|\?|\:|;|#|\^|\/\+|\/|\*|\+|-)/
};
str = '"';
Now all after the NAME's regexp messes up. It makes 1 big string. I think the latter problem is that String token is too greedy. The former one might be too smart regexp for the regex token.
Edit: I think I've fixed the regexp for the regex token. In your code replace lines 146-153 (the whole 'following characters' part) with the following expression:
([^/]|(?<!\\)(?<=\\)/)*
The idea is to allow everything except /, also allow \/, but not allow \\/.
Edit: Another interesting case, passes after the fix, but might be interesting to add as the built-in test case:
case 'UNQUOTED_LITERAL':
case 'QUOTED_LITERAL': {
this._js = "e.str(\"" + this.value.replace(/\\/g, "\\\\").replace(/"/g, "\\\"") + "\")";
break;
}
Edit: Yet another case. It appears to be too greedy about keywords as well. See the case:
var clazz = function() {
if (clazz.__) return delete(clazz.__);
this.constructor = clazz;
if(constructor)
constructor.apply(this, arguments);
};
It lexes it as: (keyword, const), (id, ructor). The same happens for an identifier inherits: in and herits.

Example: The first occurrence of / 2 /i below (the assignment to a) should tokenize as Div, NumericLiteral, Div, Identifier, because it is in a InputElementDiv context. The second occurrence (the assignment to b) should tokenize as RegularExpressionLiteral, because it is in a InputElementRegExp context.
i = 1;
var a = 1 / 2 /i;
console.info(a); // ⇒ 0.5
console.info(typeof a); // number
var b = 1 + / 2 /i;
console.info(b); // ⇒ 1/2/i
console.info(typeof b); // ⇒ string
Source:
There are two goal symbols for the lexical grammar. The InputElementDiv symbol is used in those syntactic grammar contexts where a division (/) or division-assignment (/=) operator is permitted. The InputElementRegExp symbol is used in other syntactic grammar contexts.
Note that contexts exist in the syntactic grammar where both a division and a RegularExpressionLiteral are permitted by the syntactic grammar; however, since the lexical grammar uses the InputElementDiv goal symbol in such cases, the opening slash is not recognised as starting a regular expression literal in such a context. As a workaround, one may enclose the regular expression literal in parentheses.
— Standard ECMA-262 3rd Edition - December 1999, p. 11

The simplicity of your solution for handling this hairy problem is very cool, but I noticed that it doesn't quite handle a change in something.property syntax for ES5, which allows reserved words following a .. I.e., a.if = 'foo'; (function () {a.if /= 3;});, is a valid statement in some recent implementations.
Unless I'm mistaken there is only one use of . anyway for properties, so the fix could be adding an additional state following the . which only accepts the identifierName token (which is what identifier uses, but it doesn't reject reserved words) would probably do the trick. (Obviously the div state follows that as per usual.)

I've been thinking about the problems of writing a lexer for JavaScript myself, and I just came across your implementation in my search for good techniques. I found a case where yours doesn't work that I thought I'd share if you're still interested:
var g = 3, x = { valueOf: function() { return 6;} } /2/g;
The slashes should both be parsed as division operators, resulting in x being assigned the numeric value 1. Your lexer thinks that it is a regexp. There is no way to handle all variants of this case correctly without maintaining a stack of grouping contexts to distinguish among the end of a block (expect regexp), the end of a function statement (expect regexp), the end of a function expression (expect division), and the end of an object literal (expect division).

Does it work properly for this code (this shouldn't have a semicolon; it produces an error when lexed properly)?
function square(num) {
var result;
var f = function (x) {
return x * x;
}
(result = f(num));
return result;
}
If it does, does it work properly for this code, that relies on semicolon insertion?
function square(num) {
var f = function (x) {
return x * x;
}
return f(num);
}

Related

Capitalize first character of a template literal [duplicate]

How do I make the first character of a string uppercase if it's a letter, but not change the case of any of the other letters?
For example:
"this is a test" → "This is a test"
"the Eiffel Tower" → "The Eiffel Tower"
"/index.html" → "/index.html"
function capitalizeFirstLetter(string) {
return string.charAt(0).toUpperCase() + string.slice(1);
}
Some other answers modify String.prototype (this answer used to as well), but I would advise against this now due to maintainability (hard to find out where the function is being added to the prototype and could cause conflicts if other code uses the same name/a browser adds a native function with that same name in future).
Edited to add this DISCLAIMER: please read the comments to understand the risks of editing JS basic types.
Here's a more object-oriented approach:
Object.defineProperty(String.prototype, 'capitalize', {
value: function() {
return this.charAt(0).toUpperCase() + this.slice(1);
},
enumerable: false
});
You'd call the function, like this:
"hello, world!".capitalize();
With the expected output being:
"Hello, world!"
In CSS:
p::first-letter {
text-transform:capitalize;
}
Here is a shortened version of the popular answer that gets the first letter by treating the string as an array:
function capitalize(s)
{
return s[0].toUpperCase() + s.slice(1);
}
Update
According to the comments below this doesn't work in IE 7 or below.
Update 2:
To avoid undefined for empty strings (see #njzk2's comment below), you can check for an empty string:
function capitalize(s)
{
return s && s[0].toUpperCase() + s.slice(1);
}
ES version
const capitalize = s => s && s[0].toUpperCase() + s.slice(1)
// to always return type string event when s may be falsy other than empty-string
const capitalize = s => (s && s[0].toUpperCase() + s.slice(1)) || ""
If you're interested in the performance of a few different methods posted:
Here are the fastest methods based on this jsperf test (ordered from fastest to slowest).
As you can see, the first two methods are essentially comparable in terms of performance, whereas altering the String.prototype is by far the slowest in terms of performance.
// 10,889,187 operations/sec
function capitalizeFirstLetter(string) {
return string[0].toUpperCase() + string.slice(1);
}
// 10,875,535 operations/sec
function capitalizeFirstLetter(string) {
return string.charAt(0).toUpperCase() + string.slice(1);
}
// 4,632,536 operations/sec
function capitalizeFirstLetter(string) {
return string.replace(/^./, string[0].toUpperCase());
}
// 1,977,828 operations/sec
String.prototype.capitalizeFirstLetter = function() {
return this.charAt(0).toUpperCase() + this.slice(1);
}
I didn’t see any mention in the existing answers of issues related to astral plane code points or internationalization. “Uppercase” doesn’t mean the same thing in every language using a given script.
Initially I didn’t see any answers addressing issues related to astral plane code points. There is one, but it’s a bit buried (like this one will be, I guess!)
Overview of the hidden problem and various approaches to it
Most of the proposed functions look like this:
function capitalizeFirstLetter(str) {
return str[0].toUpperCase() + str.slice(1);
}
However, some cased characters fall outside the BMP (basic multilingual plane, code points U+0 to U+FFFF). For example take this Deseret text:
capitalizeFirstLetter("𐐶𐐲𐑌𐐼𐐲𐑉"); // "𐐶𐐲𐑌𐐼𐐲𐑉"
The first character here fails to capitalize because the array-indexed properties of strings don’t access “characters” or code points*. They access UTF-16 code units. This is true also when slicing — the index values point at code units.
It happens to be that UTF-16 code units are 1:1 with USV code points within two ranges, U+0 to U+D7FF and U+E000 to U+FFFF inclusive. Most cased characters fall into those two ranges, but not all of them.
From ES2015 on, dealing with this became a bit easier. String.prototype[##iterator] yields strings corresponding to code points**. So for example, we can do this:
function capitalizeFirstLetter([ first='', ...rest ]) {
return [ first.toUpperCase(), ...rest ].join('');
}
capitalizeFirstLetter("𐐶𐐲𐑌𐐼𐐲𐑉") // "𐐎𐐲𐑌𐐼𐐲𐑉"
For longer strings, this is probably not terribly efficient*** — we don’t really need to iterate the remainder. We could use String.prototype.codePointAt to get at that first (possible) letter, but we’d still need to determine where the slice should begin. One way to avoid iterating the remainder would be to test whether the first codepoint is outside the BMP; if it isn’t, the slice begins at 1, and if it is, the slice begins at 2.
function capitalizeFirstLetter(str) {
if (!str) return '';
const firstCP = str.codePointAt(0);
const index = firstCP > 0xFFFF ? 2 : 1;
return String.fromCodePoint(firstCP).toUpperCase() + str.slice(index);
}
capitalizeFirstLetter("𐐶𐐲𐑌𐐼𐐲𐑉") // "𐐎𐐲𐑌𐐼𐐲𐑉"
You could use bitwise math instead of > 0xFFFF there, but it’s probably easier to understand this way and either would achieve the same thing.
We can also make this work in ES5 and below by taking that logic a bit further if necessary. There are no intrinsic methods in ES5 for working with codepoints, so we have to manually test whether the first code unit is a surrogate****:
function capitalizeFirstLetter(str) {
if (!str) return '';
var firstCodeUnit = str[0];
if (firstCodeUnit < '\uD800' || firstCodeUnit > '\uDFFF') {
return str[0].toUpperCase() + str.slice(1);
}
return str.slice(0, 2).toUpperCase() + str.slice(2);
}
capitalizeFirstLetter("𐐶𐐲𐑌𐐼𐐲𐑉") // "𐐎𐐲𐑌𐐼𐐲𐑉"
Deeper into internationalization (whose capitalization?)
At the start I also mentioned internationalization considerations. Some of these are very difficult to account for because they require knowledge not only of what language is being used, but also may require specific knowledge of the words in the language. For example, the Irish digraph "mb" capitalizes as "mB" at the start of a word. Another example, the German eszett, never begins a word (afaik), but still helps illustrate the problem. The lowercase eszett (“ß”) capitalizes to “SS,” but “SS” could lowercase to either “ß” or “ss” — you require out-of-band knowledge of the German language to know which is correct!
The most famous example of these kinds of issues, probably, is Turkish. In Turkish Latin, the capital form of i is İ, while the lowercase form of I is ı — they’re two different letters. Fortunately we do have a way to account for this:
function capitalizeFirstLetter([ first='', ...rest ], locale) {
return [ first.toLocaleUpperCase(locale), ...rest ].join('');
}
capitalizeFirstLetter("italy", "en") // "Italy"
capitalizeFirstLetter("italya", "tr") // "İtalya"
In a browser, the user’s most-preferred language tag is indicated by navigator.language, a list in order of preference is found at navigator.languages, and a given DOM element’s language can be obtained (usually) with Object(element.closest('[lang]')).lang || YOUR_DEFAULT_HERE in multilanguage documents.
In agents which support Unicode property character classes in RegExp, which were introduced in ES2018, we can clean stuff up further by directly expressing what characters we’re interested in:
function capitalizeFirstLetter(str, locale=navigator.language) {
return str.replace(/^\p{CWU}/u, char => char.toLocaleUpperCase(locale));
}
This could be tweaked a bit to also handle capitalizing multiple words in a string with fairly good accuracy for at least some languages, though outlying cases will be hard to avoid completely if doing so no matter what the primary language is.
The CWU or Changes_When_Uppercased character property matches all code points which change when uppercased in the generic case where specific locale data is absent. There are other interesting case-related Unicode character properties that you may wish to play around with. It’s a cool zone to explore but we’d go on all day if we enumerated em all here. Here’s something to get your curiosity going if you’re unfamiliar, though: \p{Lower} is a larger group than \p{LowercaseLetter} (aka \p{Ll}) — conveniently illustrated by the default character set comparison in this tool provided by Unicode. (NB: not everything you can reference there is also available in ES regular expressions, but most of the stuff you’re likely to want is).
Alternatives to case-mapping in JS (Firefox & CSS love the Dutch!)
If digraphs with unique locale/language/orthography capitalization rules happen to have a single-codepoint “composed” representation in Unicode, these might be used to make one’s capitalization expectations explicit even in the absence of locale data. For example, we could prefer the composed i-j digraph, ij / U+133, associated with Dutch, to ensure a case-mapping to uppercase IJ / U+132:
capitalizeFirstLetter('ijsselmeer'); // "IJsselmeer"
On the other hand, precomposed digraphs and similar are sometimes deprecated (like that one, it seems!) and may be undesirable in interchanged text regardless due to the potential copypaste nuisance if that’s not the normal way folks type the sequence in practice. Unfortunately, in the absence of the precomposition “hint,” an explicit locale won’t help here (at least as far as I know). If we spell ijsselmeer with an ordinary i + j, capitalizeFirstLetter will produce the wrong result even if we explicitly indicate nl as the locale:
capitalizeFirstLetter('ijsselmeer', 'nl'); // "Ijsselmeer" :(
(I’m not entirely sure whether there are some such cases where the behavior comes down to ICU data availability — perhaps someone else could say.)
If the point of the transformation is to display textual content in a web browser, though, you have an entirely different option available that will likely be your best bet: leveraging features of the web platform’s other core languages, HTML and CSS. Armed with HTML’s lang=... and CSS’s text-transform:..., you’ve got a (pseudo-)declarative solution that leaves extra room for the user agent to be “smart.” A JS API needs to have predictable outcomes across all browsers (generally) and isn’t free to experiment with heuristics. The user-agent itself is obligated only to its user, though, and heuristic solutions are fair game when the output is for a human being. If we tell it “this text is Dutch, but please display it capitalized,” the particular outcome might now vary between browsers, but it’s likely going to be the best each of them could do. Let’s see:
<!DOCTYPE html>
<dl>
<dt>Untransformed
<dd>ijsselmeer
<dt>Capitalized with CSS and <code>lang=en</code>
<dd lang="en" style="text-transform: capitalize">ijsselmeer
<dt>Capitalized with CSS and <code>lang=nl</code>
<dd lang="nl" style="text-transform: capitalize">ijsselmeer
In Chromium at the time of writing, both the English and Dutch lines come out as Ijsselmeer — so it does no better than JS. But try it in current Firefox! The element that we told the browser contains Dutch will be correctly rendered as IJsselmeer there.
This solution is purpose-specific (it’s not gonna help you in Node, anyway) but it was silly of me not to draw attention to it previously given some folks might not realize they’re googling the wrong question. Thanks #paul23 for clarifying more about the nature of the IJ digraph in practice and prompting further investigation!
As of January 2021, all major engines have implemented the Unicode property character class feature, but depending on your target support range you may not be able to use it safely yet. The last browser to introduce support was Firefox (78; June 30, 2020). You can check for support of this feature with the Kangax compat table. Babel can be used to compile RegExp literals with property references to equivalent patterns without them, but be aware that the resulting code can sometimes be enormous. You probably would not want to do this unless you’re certain the tradeoff is justified for your use case.
In all likelihood, people asking this question will not be concerned with Deseret capitalization or internationalization. But it’s good to be aware of these issues because there’s a good chance you’ll encounter them eventually even if they aren’t concerns presently. They’re not “edge” cases, or rather, they’re not by-definition edge cases — there’s a whole country where most people speak Turkish, anyway, and conflating code units with codepoints is a fairly common source of bugs (especially with regard to emoji). Both strings and language are pretty complicated!
* The code units of UTF-16 / UCS2 are also Unicode code points in the sense that e.g. U+D800 is technically a code point, but that’s not what it “means” here ... sort of ... though it gets pretty fuzzy. What the surrogates definitely are not, though, is USVs (Unicode scalar values).
** Though if a surrogate code unit is “orphaned” — i.e., not part of a logical pair — you could still get surrogates here, too.
*** maybe. I haven’t tested it. Unless you have determined capitalization is a meaningful bottleneck, I probably wouldn’t sweat it — choose whatever you believe is most clear and readable.
**** such a function might wish to test both the first and second code units instead of just the first, since it’s possible that the first unit is an orphaned surrogate. For example the input "\uD800x" would capitalize the X as-is, which may or may not be expected.
For another case I need it to capitalize the first letter and lowercase the rest. The following cases made me change this function:
//es5
function capitalize(string) {
return string.charAt(0).toUpperCase() + string.slice(1).toLowerCase();
}
capitalize("alfredo") // => "Alfredo"
capitalize("Alejandro")// => "Alejandro
capitalize("ALBERTO") // => "Alberto"
capitalize("ArMaNdO") // => "Armando"
// es6 using destructuring
const capitalize = ([first,...rest]) => first.toUpperCase() + rest.join('').toLowerCase();
This is the 2018 ECMAScript 6+ Solution:
const str = 'the Eiffel Tower';
const newStr = `${str[0].toUpperCase()}${str.slice(1)}`;
console.log('Original String:', str); // the Eiffel Tower
console.log('New String:', newStr); // The Eiffel Tower
If you're already (or considering) using Lodash, the solution is easy:
_.upperFirst('fred');
// => 'Fred'
_.upperFirst('FRED');
// => 'FRED'
_.capitalize('fred') //=> 'Fred'
See their documentation: https://lodash.com/docs#capitalize
_.camelCase('Foo Bar'); //=> 'fooBar'
https://lodash.com/docs/4.15.0#camelCase
_.lowerFirst('Fred');
// => 'fred'
_.lowerFirst('FRED');
// => 'fRED'
_.snakeCase('Foo Bar');
// => 'foo_bar'
Vanilla JavaScript for first upper case:
function upperCaseFirst(str){
return str.charAt(0).toUpperCase() + str.substring(1);
}
There is a very simple way to implement it by replace. For ECMAScript 6:
'foo'.replace(/^./, str => str.toUpperCase())
Result:
'Foo'
Capitalize the first letter of all words in a string:
function ucFirstAllWords( str )
{
var pieces = str.split(" ");
for ( var i = 0; i < pieces.length; i++ )
{
var j = pieces[i].charAt(0).toUpperCase();
pieces[i] = j + pieces[i].substr(1);
}
return pieces.join(" ");
}
CSS only
If the transformation is needed only for displaying on a web page:
p::first-letter {
text-transform: uppercase;
}
Despite being called "::first-letter", it applies to the first character, i.e. in case of string %a, this selector would apply to % and as such a would not be capitalized.
In IE9+ or IE5.5+ it's supported in legacy notation with only one colon (:first-letter).
ES2015 one-liner
const capitalizeFirstChar = str => str.charAt(0).toUpperCase() + str.substring(1);
Remarks
In the benchmark I performed, there was no significant difference between string.charAt(0) and string[0]. Note however, that string[0] would be undefined for an empty string, so the function would have to be rewritten to use "string && string[0]", which is way too verbose, compared to the alternative.
string.substring(1) is faster than string.slice(1).
Benchmark between substring() and slice()
The difference is rather minuscule nowadays (run the test yourself):
21,580,613.15 ops/s ±1.6% for substring(),
21,096,394.34 ops/s ±1.8% (2.24% slower) for slice().
It's always better to handle these kinds of stuff using CSS first, in general, if you can solve something using CSS, go for that first, then try JavaScript to solve your problems, so in this case try using :first-letter in CSS and apply text-transform:capitalize;
So try creating a class for that, so you can use it globally, for example: .first-letter-uppercase and add something like below in your CSS:
.first-letter-uppercase:first-letter {
text-transform:capitalize;
}
Also the alternative option is JavaScript, so the best gonna be something like this:
function capitalizeTxt(txt) {
return txt.charAt(0).toUpperCase() + txt.slice(1); //or if you want lowercase the rest txt.slice(1).toLowerCase();
}
and call it like:
capitalizeTxt('this is a test'); // return 'This is a test'
capitalizeTxt('the Eiffel Tower'); // return 'The Eiffel Tower'
capitalizeTxt('/index.html'); // return '/index.html'
capitalizeTxt('alireza'); // return 'Alireza'
capitalizeTxt('dezfoolian'); // return 'Dezfoolian'
If you want to reuse it over and over, it's better attach it to javascript native String, so something like below:
String.prototype.capitalizeTxt = String.prototype.capitalizeTxt || function() {
return this.charAt(0).toUpperCase() + this.slice(1);
}
and call it as below:
'this is a test'.capitalizeTxt(); // return 'This is a test'
'the Eiffel Tower'.capitalizeTxt(); // return 'The Eiffel Tower'
'/index.html'.capitalizeTxt(); // return '/index.html'
'alireza'.capitalizeTxt(); // return 'Alireza'
String.prototype.capitalize = function(allWords) {
return (allWords) ? // If all words
this.split(' ').map(word => word.capitalize()).join(' ') : // Break down the phrase to words and then recursive
// calls until capitalizing all words
this.charAt(0).toUpperCase() + this.slice(1); // If allWords is undefined, capitalize only the first word,
// meaning the first character of the whole string
}
And then:
"capitalize just the first word".capitalize(); ==> "Capitalize just the first word"
"capitalize all words".capitalize(true); ==> "Capitalize All Words"
Update November 2016 (ES6), just for fun:
const capitalize = (string = '') => [...string].map( // Convert to array with each item is a char of
// string by using spread operator (...)
(char, index) => index ? char : char.toUpperCase() // Index true means not equal 0, so (!index) is
// the first character which is capitalized by
// the `toUpperCase()` method
).join('') // Return back to string
then capitalize("hello") // Hello
SHORTEST 3 solutions, 1 and 2 handle cases when s string is "", null and undefined:
s&&s[0].toUpperCase()+s.slice(1) // 32 char
s&&s.replace(/./,s[0].toUpperCase()) // 36 char - using regexp
'foo'.replace(/./,x=>x.toUpperCase()) // 31 char - direct on string, ES6
let s='foo bar';
console.log( s&&s[0].toUpperCase()+s.slice(1) );
console.log( s&&s.replace(/./,s[0].toUpperCase()) );
console.log( 'foo bar'.replace(/./,x=>x.toUpperCase()) );
We could get the first character with one of my favorite RegExp, looks like a cute smiley: /^./
String.prototype.capitalize = function () {
return this.replace(/^./, function (match) {
return match.toUpperCase();
});
};
And for all coffee-junkies:
String::capitalize = ->
#replace /^./, (match) ->
match.toUpperCase()
...and for all guys who think that there's a better way of doing this, without extending native prototypes:
var capitalize = function (input) {
return input.replace(/^./, function (match) {
return match.toUpperCase();
});
};
Here is a function called ucfirst()(short for "upper case first letter"):
function ucfirst(str) {
var firstLetter = str.substr(0, 1);
return firstLetter.toUpperCase() + str.substr(1);
}
You can capitalise a string by calling ucfirst("some string") -- for example,
ucfirst("this is a test") --> "This is a test"
It works by splitting the string into two pieces. On the first line it pulls out firstLetter and then on the second line it capitalises firstLetter by calling firstLetter.toUpperCase() and joins it with the rest of the string, which is found by calling str.substr(1).
You might think this would fail for an empty string, and indeed in a language like C you would have to cater for this. However in JavaScript, when you take a substring of an empty string, you just get an empty string back.
Use:
var str = "ruby java";
console.log(str.charAt(0).toUpperCase() + str.substring(1));
It will output "Ruby java" to the console.
If you use Underscore.js or Lodash, the underscore.string library provides string extensions, including capitalize:
_.capitalize(string) Converts first letter of the string to
uppercase.
Example:
_.capitalize("foo bar") == "Foo bar"
If you're ok with capitalizing the first letter of every word, and your usecase is in HTML, you can use the following CSS:
<style type="text/css">
p.capitalize {text-transform:capitalize;}
</style>
<p class="capitalize">This is some text.</p>
This is from CSS text-transform Property (at W3Schools).
var capitalized = yourstring[0].toUpperCase() + yourstring.substr(1);
If you are wanting to reformat all-caps text, you might want to modify the other examples as such:
function capitalize (text) {
return text.charAt(0).toUpperCase() + text.slice(1).toLowerCase();
}
This will ensure that the following text is changed:
TEST => Test
This Is A TeST => This is a test
String.prototype.capitalize = function(){
return this.replace(/(^|\s)([a-z])/g,
function(m, p1, p2) {
return p1 + p2.toUpperCase();
});
};
Usage:
capitalizedString = someString.capitalize();
This is a text string => This Is A Text String
function capitalize(s) {
// returns the first letter capitalized + the string from index 1 and out aka. the rest of the string
return s[0].toUpperCase() + s.substr(1);
}
// examples
capitalize('this is a test');
=> 'This is a test'
capitalize('the Eiffel Tower');
=> 'The Eiffel Tower'
capitalize('/index.html');
=> '/index.html'
yourString.replace(/\w/, c => c.toUpperCase())
I found this arrow function easiest. Replace matches the first letter character (\w) of your string and converts it to uppercase. Nothing fancier is necessary.
var str = "test string";
str = str.substring(0,1).toUpperCase() + str.substring(1);
𝗔 𝗦𝗼𝗹𝘂𝘁𝗶𝗼𝗻 𝗧𝗵𝗮𝘁 𝗪𝗼𝗿𝗸𝘀 𝗙𝗼𝗿 𝗔𝗹𝗹 𝗨𝗻𝗶𝗰𝗼𝗱𝗲 𝗖𝗵𝗮𝗿𝗮𝗰𝘁𝗲𝗿𝘀
57 81 different answers for this question, some off-topic, and yet none of them raise the important issue that none of the solutions listed will work with Asian characters, emoji's, and other high Unicode-point-value characters in many browsers. Here is a solution that will:
const consistantCapitalizeFirstLetter = "\uD852\uDF62".length === 1 ?
function(S) {
"use-strict"; // Hooray! The browser uses UTF-32!
return S.charAt(0).toUpperCase() + S.substring(1);
} : function(S) {
"use-strict";
// The browser is using UCS16 to store UTF-16
var code = S.charCodeAt(0)|0;
return (
code >= 0xD800 && code <= 0xDBFF ? // Detect surrogate pair
S.slice(0,2).toUpperCase() + S.substring(2) :
S.charAt(0).toUpperCase() + S.substring(1)
);
};
const prettyCapitalizeFirstLetter = "\uD852\uDF62".length === 1 ?
function(S) {
"use-strict"; // Hooray! The browser uses UTF-32!
return S.charAt(0).toLocaleUpperCase() + S.substring(1);
} : function(S) {
"use-strict";
// The browser is using UCS16 to store UTF-16
var code = S.charCodeAt(0)|0;
return (
code >= 0xD800 && code <= 0xDBFF ? // Detect surrogate pair
S.slice(0,2).toLocaleUpperCase() + S.substring(2) :
S.charAt(0).toLocaleUpperCase() + S.substring(1)
);
};
Do note that the above solution tries to account for UTF-32. However, the specification officially states that browsers are required to do everything in UTF-16 mapped into UCS2. Nevertheless, if we all come together, do our part, and start preparing for UTF32, then there is a chance that the TC39 may allow browsers to start using UTF-32 (like how Python uses 24-bits for each character of the string). This must seem silly to an English speaker: no one who uses only latin-1 has ever had to deal with Mojibake because Latin-I is supported by all character encodings. But, users in other countries (such as China, Japan, Indonesia, etc.) are not so fortunate. They constantly struggle with encoding problems not just from the webpage, but also from the JavaScript: many Chinese/Japanese characters are treated as two letters by JavaScript and thus may be broken apart in the middle, resulting in � and � (two question-marks that make no sense to the end user). If we could start getting ready for UTF-32, then the TC39 might just allow browsers do what Python did many years ago which had made Python very popular for working with high Unicode characters: using UTF-32.
consistantCapitalizeFirstLetter works correctly in Internet Explorer 3+ (when the const is changed to var). prettyCapitalizeFirstLetter requires Internet Explorer 5.5+ (see the top of page 250 of this document). However, these fact are more of just jokes because it is very likely that the rest of the code on your webpage will not even work in Internet Explorer 8 - because of all the DOM and JScript bugs and lack of features in these older browsers. Further, no one uses Internet Explorer 3 or Internet Explorer 5.5 any more.
Check out this solution:
var stringVal = 'master';
stringVal.replace(/^./, stringVal[0].toUpperCase()); // Returns Master
Only because this is really a one-liner I will include this answer. It's an ES6-based interpolated string one-liner.
let setStringName = 'the Eiffel Tower';
setStringName = `${setStringName[0].toUpperCase()}${setStringName.substring(1)}`;
with arrow function
let fLCapital = s => s.replace(/./, c => c.toUpperCase())
fLCapital('this is a test') // "This is a test"
with arrow function, another solution
let fLCapital = s => s = s.charAt(0).toUpperCase() + s.slice(1);
fLCapital('this is a test') // "This is a test"
with array and map()
let namesCapital = names => names.map(name => name.replace(/./, c => c.toUpperCase()))
namesCapital(['james', 'robert', 'mary']) // ["James", "Robert", "Mary"]

Javascript regular expression to capture every possible mathematical operation between parenthesis

I am trying to capture mathematical expressions between parenthesis in a string with javascript. I need to capture parenthesis that ONLY include numbers and mathematical operators [0-9], +, - , *, /, % and the decimal dot. The examples below demonstrate what I am after. I managed to get close to the desired result but the nested parenthesis always screw my regex up so I need help! I also need it to look globally, not for first occurence only.
let string = "If(2>1,if(a>100, (-2*(3-5)(8-2)), (1+2)), (3(1+2)) )";
What I want to do if possible is manage to transform this syntax
if(condition, iftrue, iffalse)
to this syntax
if(condition) { iftrue } else { iffalse }
so that it can be evaluated by javascript and previewed in the browser. I have done it so far but if the iftrue or iffalse blocks contain parenthesis, everything blows up! So I m trying to capture that parenthesis and calculate before transforming the syntax. Any advice is appreciated.
The closest i got was this /[\d()+-*/.]/g which gets whats i want but in this example
(1+2) (1 < 1) sdasdasd (1*(2+3))
instead of dismissing the (1<1) group entirelly it matches (1 and 1). My ideal scenario would be
(1+2) (1<1) sdasdasd (1*(2+3))
Another example:
let codeToEval = "if(a>10, 2, 2*(b+4))";
codeToEval is the passed in a function that replaces a and b with the correct values so it ends up like this.
codeToEvalAfterReplacement = "if(5>10,2,2*(5+4))";
And now I want to transform this in
if(5>10) {
2
} else {
2*(5+4)
}
so it can be evaluated by javascript eval() and eventually previewed to the users.
Your current regex /[\d()+-*/.]/g will match single characters from the class
but multiple times because of the g flag, this is why (1 and 1) are still matched
in (1 < 1).
Based on your pattern requirements I would change it to /\([-+*/%.0-9()]+\)/g.
This will match parentheses containing one or more of the characters you describe within them.
Note that your current pattern has a - somewhere in the middle of a class which can lead to weird behaviours because some regex engines will treat +-* within a class as a range (plus through asterisk, which is a stange range). Notice I put - at the start of the class in the new pattern so it matches an actual -.
I've assumed there will be no empty parentheses (), if there are you can change + (one or more) after ] to * (zero or more)
The g flag is still added so you match every one of such expressions.
I can't say with 100% certainty that the new regex will allow you to robustly transform the syntax you state, as it depends on the complexity of the 'iftrue' and 'iffalse' code blocks. See if you can make it work with the new pattern, otherwise you may want to look into other solutions for parsing code.
Call function in if parenthesis and all conditions in that function.
if(test()){
// if code
}else{
// else code
}
function test(){
// check both cases here
if(case 1 && case 2){
return true
}
return false;
}

What are the actual uses of ES6 Raw String Access?

What are the actual uses of String.raw Raw String Access introduced in ECMAScript 6?
// String.raw(callSite, ...substitutions)
function quux (strings, ...values) {
strings[0] === "foo\n"
strings[1] === "bar"
strings.raw[0] === "foo\\n"
strings.raw[1] === "bar"
values[0] === 42
}
quux `foo\n${ 42 }bar`
String.raw `foo\n${ 42 }bar` === "foo\\n42bar"
I went through the below docs.
http://es6-features.org/#RawStringAccess
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/template_strings
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/raw
http://www.2ality.com/2015/01/es6-strings.html
https://msdn.microsoft.com/en-us/library/dn889830(v=vs.94).aspx
The only the thing that I understand, is that it is used to get the raw string form of template strings and used for debugging the template string.
When this can be used in real time development? They were calling this a tag function. What does that mean?
What concrete use cases am I missing?
The best, and very nearly only, use case for String.raw I can think of is if you're trying to use something like Steven Levithan's XRegExp library that accepts text with significant backslashes. Using String.raw lets you write something semantically clear rather than having to think in terms of doubling your backslashes, just like you can in a regular expression literal in JavaScript itself.
For instance, suppose I'm doing maintenance on a site and I find this:
var isSingleUnicodeWord = /^\w+$/;
...which is meant to check if a string contains only "letters." Two problems: A) There are thousands of "word" characters across the realm of human language that \w doesn't recognize, because its definition is English-centric; and B) It includes _, which many (including the Unicode consortium) would argue is not a "letter."
So if we're using XRegExp on the site, since I know it supports \pL (\p for Unicode categories, and L for "letter"), I might quickly swap this in:
var isSingleUnicodeWord = XRegExp("^\pL+$"); // WRONG
Then I wonder why it didn't work, facepalm, and go back and escape that backslash, since it's being consumed by the string literal.
Easy enough in that simple regex, but in something complicated, remembering to double all those backslashes is a maintenance pain. (Just ask Java programmers trying to use Pattern.)
Enter String.raw:
let isSingleUnicodeWord = XRegExp(String.raw`^\pL+$`);
Example:
let isSingleUnicodeWord = XRegExp(String.raw`^\pL+$`); // L: Letter
console.log(isSingleUnicodeWord.test("Русский")); // true
console.log(isSingleUnicodeWord.test("日本語")); // true
console.log(isSingleUnicodeWord.test("العربية")); // true
console.log(isSingleUnicodeWord.test("foo bar")); // false
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>
Now I just kick back and write what I mean. I don't even really have to worry about ${...} constructs used in template literals to do substitution, because the odds of my wanting to apply a quantifier {...} to the end-of-line assertion ($) are...low. So I can happily use substitutions and still not worry about backslashes. Lovely.
Having said that, though, if I were doing it a lot, I'd probably want to write a function and use a tagged template instead of String.raw itself. But it's surprisingly awkward to do correctly:
// My one-time tag function
function xrex(strings, ...values) {
let raw = strings.raw;
let max = Math.max(raw.length, values.length);
let result = "";
for (let i = 0; i < max; ++i) {
if (i < raw.length) {
result += raw[i];
}
if (i < values.length) {
result += values[i];
}
}
console.log("Creating with:", result);
return XRegExp(result);
}
// Using it, with a couple of substitutions to prove to myself they work
let category = "L"; // L: Letter
let maybeEol = "$";
let isSingleUnicodeWord = xrex`^\p${category}+${maybeEol}`;
console.log(isSingleUnicodeWord.test("Русский")); // true
console.log(isSingleUnicodeWord.test("日本語")); // true
console.log(isSingleUnicodeWord.test("العربية")); // true
console.log(isSingleUnicodeWord.test("foo bar")); // false
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>
Maybe the hassle is worth it if you're using it in lots of places, but for a couple of quick ones, String.raw is the simpler option.
First, a few things:
Template strings is old name for template literals.
A tag is a function.
String.raw is a method.
String.raw `foo\n${ 42 }bar\` is a tagged template literal.
Template literals are basically fancy strings.
Template literals can interpolate.
Template literals can be multi-line without using \.
String.raw is required to escape the escape character \.
Try putting a string that contains a new-line character \n through a function that consumes newline character.
console.log("This\nis\nawesome"); // "This\nis\nawesome"
console.log(String.raw`This\nis\nawesome`); // "This\\nis\\nawesome"
If you are wondering, console.log is not one of them. But alert is. Try running these through http://learnharmony.org/ .
alert("This\nis\nawesome");
alert(String.raw`This\nis\nawesome`);
But wait, that's not the use of String.raw.
Possible uses of String.raw method:
To show string without interpretation of backslashed characters (\n, \t) etc.
To show code for the output. (As in example below)
To be used in regex without escaping \.
To print windows director/sub-directory locations without using \\ to much. (They use \ remember. Also, lol)
Here we can show output and code for it in single alert window:
alert("I printed This\nis\nawesome with " + Sring.raw`This\nis\nawesome`);
Though, it would have been great if It's main use could have been to get back the original string. Like:
var original = String.raw`This is awesome.`;
where original would have become: This\tis \tawesome.. This isn't the case sadly.
References:
http://exploringjs.com/es6/ch_template-literals.html
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/raw
Template strings can be useful in many situations which I will explain below. Considering this, the String.raw prevents escapes from being interpreted. This can be useful in any template string in which you want to contain the escape character but do not want to escape it. A simple example could be the following:
var templateWithBackslash = String.raw `someRegExp displayed in template /^\//`
There are a few things inside that are nice to note with template strings.
They can contain unescaped line breaks without problems.
They can contain "${}". Inside these curly braces the javascript is interpreted instead.
(Note: running these will output the result to your console [in browser dev tools])
Example using line breaks:
var myTemplate = `
<div class="myClass">
<pre>
My formatted text
with multiple lines
{
asdf: "and some pretty printed json"
}
</pre>
</div>
`
console.log(myTemplate)
If you wanted to do the above with a normal string in Javascript it would look like the following:
var myTemplate = "\
<div class="myClass">\
<pre>\
My formatted text\
with multiple lines\
{\
asdf: "and some pretty printed json"\
}\
</pre>\
</div>"
console.log(myTemplate)
You will notice the first probably looks much nicer (no need to escape line breaks).
For the second I will use the same template string but also insert the some pretty printed JSON.
var jsonObj = {asdf: "and some pretty printed json", deeper: {someDeep: "Some Deep Var"}}
var myTemplate = `
<div class="myClass">
<pre>
My formatted text
with multiple lines
${JSON.stringify(jsonObj, null, 2)}
</pre>
</div>
`
console.log(myTemplate)
In NodeJS it is extremely handy when it comes to filepath handling:
var fs=require('fs');
var s = String.raw`C:\Users\<username>\AppData\Roaming\SomeApp\someObject.json`;
var username = "bob"
s=s.replace("<username>",username)
fs.readFile(s,function(err,result){
if (err) throw error;
console.log(JSON.parse(result))
})
It improves readability of filepaths on Windows. \ is also a fairly common separator, so I can definitely see why it would be useful in general. However it is pretty stupid how \ still escapes `... So ultimately:
String.raw`C:\Users\` //#==> C:\Users\`
console.log(String.raw`C:\Users\`) //#==> SyntaxError: Unexpected end of input.
In addition to its use as a tag, String.raw is also useful in implementing new tag functions as a tool to do the interleaving that most people do with a weird loop. For example, compare:
function foo(strs, ...xs) {
let result = strs[0];
for (let i = 0; i < xs.length; ++i) {
result += useFoo(xs[i]) + strs[i + 1];
}
return result;
}
with
function foo(strs, ...xs) {
return String.raw({raw: strs}, ...xs.map(useFoo));
}
The Use
(Requisite knowledge: tstring §.)
Instead of:
console.log(`\\a\\b\\c\\n\\z\\x12\\xa9\\u1234\\u00A9\\u{1234}\\u{00A9}`);
.you can:
console.log(String.raw`\a\b\c\n\z\x12\xa9\u1234\u00A9\u{1234}\u{00A9}`);
"Escaping"
<\\u> is fine, yet <\u> needs "escaping", eg:
console.log(String.raw`abc${'\\u'}abc`);
.Dit <\\x>, <\x>,
<console.log(String.raw`abc${`\\x`}abc`)>;
.<\`>, <`>, <console.log(String.raw`abc${`\``}abc`)>;
.<\${>, <${&>, <console.log(String.raw`abc${`$\{`}abc`)>;
.<\\1> (till <\\7>), <\1>, <console.log(String.raw`abc${`\\1`}abc`)>;
.<\\>, endunit <\>, <console.log(String.raw`abc${`\\`}`)>.
Nb
There's also a new "latex" string. Cf §.
I've found it to be useful for testing
my RegExps. Say I have a RegExp which
should match end-of-line comments because
I want to remove them. BUT, it must not
match source-code for a regexp like /// .
If your code contains /// it is not the
start of an EOL comment but a RegExp, as
per the rules of JavaScript syntax.
I can test whether my RegExp in variable patEOLC
matches or doesn't /// with:
String.raw`/\//` .match (patEOLC)
In other words it is a way to let my
code "see" code the way it exists in
source-code, not the way it exists
in memory after it has been read
into memory from source-code, with
all backslashes removed.
It is a way to "escape escaping" but
without having to do it separately
for every backslash in a string, but
for all of them at the same time.
It is a way to say that in a given
(back-quoted) string backslash
shall behave just like any other
character, it has no special
meaning or interpretation.

Converting lowercase character for every word to uppercase javascript [duplicate]

How do I make the first character of a string uppercase if it's a letter, but not change the case of any of the other letters?
For example:
"this is a test" → "This is a test"
"the Eiffel Tower" → "The Eiffel Tower"
"/index.html" → "/index.html"
function capitalizeFirstLetter(string) {
return string.charAt(0).toUpperCase() + string.slice(1);
}
Some other answers modify String.prototype (this answer used to as well), but I would advise against this now due to maintainability (hard to find out where the function is being added to the prototype and could cause conflicts if other code uses the same name/a browser adds a native function with that same name in future).
Edited to add this DISCLAIMER: please read the comments to understand the risks of editing JS basic types.
Here's a more object-oriented approach:
Object.defineProperty(String.prototype, 'capitalize', {
value: function() {
return this.charAt(0).toUpperCase() + this.slice(1);
},
enumerable: false
});
You'd call the function, like this:
"hello, world!".capitalize();
With the expected output being:
"Hello, world!"
In CSS:
p::first-letter {
text-transform:capitalize;
}
Here is a shortened version of the popular answer that gets the first letter by treating the string as an array:
function capitalize(s)
{
return s[0].toUpperCase() + s.slice(1);
}
Update
According to the comments below this doesn't work in IE 7 or below.
Update 2:
To avoid undefined for empty strings (see #njzk2's comment below), you can check for an empty string:
function capitalize(s)
{
return s && s[0].toUpperCase() + s.slice(1);
}
ES version
const capitalize = s => s && s[0].toUpperCase() + s.slice(1)
// to always return type string event when s may be falsy other than empty-string
const capitalize = s => (s && s[0].toUpperCase() + s.slice(1)) || ""
If you're interested in the performance of a few different methods posted:
Here are the fastest methods based on this jsperf test (ordered from fastest to slowest).
As you can see, the first two methods are essentially comparable in terms of performance, whereas altering the String.prototype is by far the slowest in terms of performance.
// 10,889,187 operations/sec
function capitalizeFirstLetter(string) {
return string[0].toUpperCase() + string.slice(1);
}
// 10,875,535 operations/sec
function capitalizeFirstLetter(string) {
return string.charAt(0).toUpperCase() + string.slice(1);
}
// 4,632,536 operations/sec
function capitalizeFirstLetter(string) {
return string.replace(/^./, string[0].toUpperCase());
}
// 1,977,828 operations/sec
String.prototype.capitalizeFirstLetter = function() {
return this.charAt(0).toUpperCase() + this.slice(1);
}
I didn’t see any mention in the existing answers of issues related to astral plane code points or internationalization. “Uppercase” doesn’t mean the same thing in every language using a given script.
Initially I didn’t see any answers addressing issues related to astral plane code points. There is one, but it’s a bit buried (like this one will be, I guess!)
Overview of the hidden problem and various approaches to it
Most of the proposed functions look like this:
function capitalizeFirstLetter(str) {
return str[0].toUpperCase() + str.slice(1);
}
However, some cased characters fall outside the BMP (basic multilingual plane, code points U+0 to U+FFFF). For example take this Deseret text:
capitalizeFirstLetter("𐐶𐐲𐑌𐐼𐐲𐑉"); // "𐐶𐐲𐑌𐐼𐐲𐑉"
The first character here fails to capitalize because the array-indexed properties of strings don’t access “characters” or code points*. They access UTF-16 code units. This is true also when slicing — the index values point at code units.
It happens to be that UTF-16 code units are 1:1 with USV code points within two ranges, U+0 to U+D7FF and U+E000 to U+FFFF inclusive. Most cased characters fall into those two ranges, but not all of them.
From ES2015 on, dealing with this became a bit easier. String.prototype[##iterator] yields strings corresponding to code points**. So for example, we can do this:
function capitalizeFirstLetter([ first='', ...rest ]) {
return [ first.toUpperCase(), ...rest ].join('');
}
capitalizeFirstLetter("𐐶𐐲𐑌𐐼𐐲𐑉") // "𐐎𐐲𐑌𐐼𐐲𐑉"
For longer strings, this is probably not terribly efficient*** — we don’t really need to iterate the remainder. We could use String.prototype.codePointAt to get at that first (possible) letter, but we’d still need to determine where the slice should begin. One way to avoid iterating the remainder would be to test whether the first codepoint is outside the BMP; if it isn’t, the slice begins at 1, and if it is, the slice begins at 2.
function capitalizeFirstLetter(str) {
if (!str) return '';
const firstCP = str.codePointAt(0);
const index = firstCP > 0xFFFF ? 2 : 1;
return String.fromCodePoint(firstCP).toUpperCase() + str.slice(index);
}
capitalizeFirstLetter("𐐶𐐲𐑌𐐼𐐲𐑉") // "𐐎𐐲𐑌𐐼𐐲𐑉"
You could use bitwise math instead of > 0xFFFF there, but it’s probably easier to understand this way and either would achieve the same thing.
We can also make this work in ES5 and below by taking that logic a bit further if necessary. There are no intrinsic methods in ES5 for working with codepoints, so we have to manually test whether the first code unit is a surrogate****:
function capitalizeFirstLetter(str) {
if (!str) return '';
var firstCodeUnit = str[0];
if (firstCodeUnit < '\uD800' || firstCodeUnit > '\uDFFF') {
return str[0].toUpperCase() + str.slice(1);
}
return str.slice(0, 2).toUpperCase() + str.slice(2);
}
capitalizeFirstLetter("𐐶𐐲𐑌𐐼𐐲𐑉") // "𐐎𐐲𐑌𐐼𐐲𐑉"
Deeper into internationalization (whose capitalization?)
At the start I also mentioned internationalization considerations. Some of these are very difficult to account for because they require knowledge not only of what language is being used, but also may require specific knowledge of the words in the language. For example, the Irish digraph "mb" capitalizes as "mB" at the start of a word. Another example, the German eszett, never begins a word (afaik), but still helps illustrate the problem. The lowercase eszett (“ß”) capitalizes to “SS,” but “SS” could lowercase to either “ß” or “ss” — you require out-of-band knowledge of the German language to know which is correct!
The most famous example of these kinds of issues, probably, is Turkish. In Turkish Latin, the capital form of i is İ, while the lowercase form of I is ı — they’re two different letters. Fortunately we do have a way to account for this:
function capitalizeFirstLetter([ first='', ...rest ], locale) {
return [ first.toLocaleUpperCase(locale), ...rest ].join('');
}
capitalizeFirstLetter("italy", "en") // "Italy"
capitalizeFirstLetter("italya", "tr") // "İtalya"
In a browser, the user’s most-preferred language tag is indicated by navigator.language, a list in order of preference is found at navigator.languages, and a given DOM element’s language can be obtained (usually) with Object(element.closest('[lang]')).lang || YOUR_DEFAULT_HERE in multilanguage documents.
In agents which support Unicode property character classes in RegExp, which were introduced in ES2018, we can clean stuff up further by directly expressing what characters we’re interested in:
function capitalizeFirstLetter(str, locale=navigator.language) {
return str.replace(/^\p{CWU}/u, char => char.toLocaleUpperCase(locale));
}
This could be tweaked a bit to also handle capitalizing multiple words in a string with fairly good accuracy for at least some languages, though outlying cases will be hard to avoid completely if doing so no matter what the primary language is.
The CWU or Changes_When_Uppercased character property matches all code points which change when uppercased in the generic case where specific locale data is absent. There are other interesting case-related Unicode character properties that you may wish to play around with. It’s a cool zone to explore but we’d go on all day if we enumerated em all here. Here’s something to get your curiosity going if you’re unfamiliar, though: \p{Lower} is a larger group than \p{LowercaseLetter} (aka \p{Ll}) — conveniently illustrated by the default character set comparison in this tool provided by Unicode. (NB: not everything you can reference there is also available in ES regular expressions, but most of the stuff you’re likely to want is).
Alternatives to case-mapping in JS (Firefox & CSS love the Dutch!)
If digraphs with unique locale/language/orthography capitalization rules happen to have a single-codepoint “composed” representation in Unicode, these might be used to make one’s capitalization expectations explicit even in the absence of locale data. For example, we could prefer the composed i-j digraph, ij / U+133, associated with Dutch, to ensure a case-mapping to uppercase IJ / U+132:
capitalizeFirstLetter('ijsselmeer'); // "IJsselmeer"
On the other hand, precomposed digraphs and similar are sometimes deprecated (like that one, it seems!) and may be undesirable in interchanged text regardless due to the potential copypaste nuisance if that’s not the normal way folks type the sequence in practice. Unfortunately, in the absence of the precomposition “hint,” an explicit locale won’t help here (at least as far as I know). If we spell ijsselmeer with an ordinary i + j, capitalizeFirstLetter will produce the wrong result even if we explicitly indicate nl as the locale:
capitalizeFirstLetter('ijsselmeer', 'nl'); // "Ijsselmeer" :(
(I’m not entirely sure whether there are some such cases where the behavior comes down to ICU data availability — perhaps someone else could say.)
If the point of the transformation is to display textual content in a web browser, though, you have an entirely different option available that will likely be your best bet: leveraging features of the web platform’s other core languages, HTML and CSS. Armed with HTML’s lang=... and CSS’s text-transform:..., you’ve got a (pseudo-)declarative solution that leaves extra room for the user agent to be “smart.” A JS API needs to have predictable outcomes across all browsers (generally) and isn’t free to experiment with heuristics. The user-agent itself is obligated only to its user, though, and heuristic solutions are fair game when the output is for a human being. If we tell it “this text is Dutch, but please display it capitalized,” the particular outcome might now vary between browsers, but it’s likely going to be the best each of them could do. Let’s see:
<!DOCTYPE html>
<dl>
<dt>Untransformed
<dd>ijsselmeer
<dt>Capitalized with CSS and <code>lang=en</code>
<dd lang="en" style="text-transform: capitalize">ijsselmeer
<dt>Capitalized with CSS and <code>lang=nl</code>
<dd lang="nl" style="text-transform: capitalize">ijsselmeer
In Chromium at the time of writing, both the English and Dutch lines come out as Ijsselmeer — so it does no better than JS. But try it in current Firefox! The element that we told the browser contains Dutch will be correctly rendered as IJsselmeer there.
This solution is purpose-specific (it’s not gonna help you in Node, anyway) but it was silly of me not to draw attention to it previously given some folks might not realize they’re googling the wrong question. Thanks #paul23 for clarifying more about the nature of the IJ digraph in practice and prompting further investigation!
As of January 2021, all major engines have implemented the Unicode property character class feature, but depending on your target support range you may not be able to use it safely yet. The last browser to introduce support was Firefox (78; June 30, 2020). You can check for support of this feature with the Kangax compat table. Babel can be used to compile RegExp literals with property references to equivalent patterns without them, but be aware that the resulting code can sometimes be enormous. You probably would not want to do this unless you’re certain the tradeoff is justified for your use case.
In all likelihood, people asking this question will not be concerned with Deseret capitalization or internationalization. But it’s good to be aware of these issues because there’s a good chance you’ll encounter them eventually even if they aren’t concerns presently. They’re not “edge” cases, or rather, they’re not by-definition edge cases — there’s a whole country where most people speak Turkish, anyway, and conflating code units with codepoints is a fairly common source of bugs (especially with regard to emoji). Both strings and language are pretty complicated!
* The code units of UTF-16 / UCS2 are also Unicode code points in the sense that e.g. U+D800 is technically a code point, but that’s not what it “means” here ... sort of ... though it gets pretty fuzzy. What the surrogates definitely are not, though, is USVs (Unicode scalar values).
** Though if a surrogate code unit is “orphaned” — i.e., not part of a logical pair — you could still get surrogates here, too.
*** maybe. I haven’t tested it. Unless you have determined capitalization is a meaningful bottleneck, I probably wouldn’t sweat it — choose whatever you believe is most clear and readable.
**** such a function might wish to test both the first and second code units instead of just the first, since it’s possible that the first unit is an orphaned surrogate. For example the input "\uD800x" would capitalize the X as-is, which may or may not be expected.
For another case I need it to capitalize the first letter and lowercase the rest. The following cases made me change this function:
//es5
function capitalize(string) {
return string.charAt(0).toUpperCase() + string.slice(1).toLowerCase();
}
capitalize("alfredo") // => "Alfredo"
capitalize("Alejandro")// => "Alejandro
capitalize("ALBERTO") // => "Alberto"
capitalize("ArMaNdO") // => "Armando"
// es6 using destructuring
const capitalize = ([first,...rest]) => first.toUpperCase() + rest.join('').toLowerCase();
This is the 2018 ECMAScript 6+ Solution:
const str = 'the Eiffel Tower';
const newStr = `${str[0].toUpperCase()}${str.slice(1)}`;
console.log('Original String:', str); // the Eiffel Tower
console.log('New String:', newStr); // The Eiffel Tower
If you're already (or considering) using Lodash, the solution is easy:
_.upperFirst('fred');
// => 'Fred'
_.upperFirst('FRED');
// => 'FRED'
_.capitalize('fred') //=> 'Fred'
See their documentation: https://lodash.com/docs#capitalize
_.camelCase('Foo Bar'); //=> 'fooBar'
https://lodash.com/docs/4.15.0#camelCase
_.lowerFirst('Fred');
// => 'fred'
_.lowerFirst('FRED');
// => 'fRED'
_.snakeCase('Foo Bar');
// => 'foo_bar'
Vanilla JavaScript for first upper case:
function upperCaseFirst(str){
return str.charAt(0).toUpperCase() + str.substring(1);
}
There is a very simple way to implement it by replace. For ECMAScript 6:
'foo'.replace(/^./, str => str.toUpperCase())
Result:
'Foo'
Capitalize the first letter of all words in a string:
function ucFirstAllWords( str )
{
var pieces = str.split(" ");
for ( var i = 0; i < pieces.length; i++ )
{
var j = pieces[i].charAt(0).toUpperCase();
pieces[i] = j + pieces[i].substr(1);
}
return pieces.join(" ");
}
CSS only
If the transformation is needed only for displaying on a web page:
p::first-letter {
text-transform: uppercase;
}
Despite being called "::first-letter", it applies to the first character, i.e. in case of string %a, this selector would apply to % and as such a would not be capitalized.
In IE9+ or IE5.5+ it's supported in legacy notation with only one colon (:first-letter).
ES2015 one-liner
const capitalizeFirstChar = str => str.charAt(0).toUpperCase() + str.substring(1);
Remarks
In the benchmark I performed, there was no significant difference between string.charAt(0) and string[0]. Note however, that string[0] would be undefined for an empty string, so the function would have to be rewritten to use "string && string[0]", which is way too verbose, compared to the alternative.
string.substring(1) is faster than string.slice(1).
Benchmark between substring() and slice()
The difference is rather minuscule nowadays (run the test yourself):
21,580,613.15 ops/s ±1.6% for substring(),
21,096,394.34 ops/s ±1.8% (2.24% slower) for slice().
It's always better to handle these kinds of stuff using CSS first, in general, if you can solve something using CSS, go for that first, then try JavaScript to solve your problems, so in this case try using :first-letter in CSS and apply text-transform:capitalize;
So try creating a class for that, so you can use it globally, for example: .first-letter-uppercase and add something like below in your CSS:
.first-letter-uppercase:first-letter {
text-transform:capitalize;
}
Also the alternative option is JavaScript, so the best gonna be something like this:
function capitalizeTxt(txt) {
return txt.charAt(0).toUpperCase() + txt.slice(1); //or if you want lowercase the rest txt.slice(1).toLowerCase();
}
and call it like:
capitalizeTxt('this is a test'); // return 'This is a test'
capitalizeTxt('the Eiffel Tower'); // return 'The Eiffel Tower'
capitalizeTxt('/index.html'); // return '/index.html'
capitalizeTxt('alireza'); // return 'Alireza'
capitalizeTxt('dezfoolian'); // return 'Dezfoolian'
If you want to reuse it over and over, it's better attach it to javascript native String, so something like below:
String.prototype.capitalizeTxt = String.prototype.capitalizeTxt || function() {
return this.charAt(0).toUpperCase() + this.slice(1);
}
and call it as below:
'this is a test'.capitalizeTxt(); // return 'This is a test'
'the Eiffel Tower'.capitalizeTxt(); // return 'The Eiffel Tower'
'/index.html'.capitalizeTxt(); // return '/index.html'
'alireza'.capitalizeTxt(); // return 'Alireza'
String.prototype.capitalize = function(allWords) {
return (allWords) ? // If all words
this.split(' ').map(word => word.capitalize()).join(' ') : // Break down the phrase to words and then recursive
// calls until capitalizing all words
this.charAt(0).toUpperCase() + this.slice(1); // If allWords is undefined, capitalize only the first word,
// meaning the first character of the whole string
}
And then:
"capitalize just the first word".capitalize(); ==> "Capitalize just the first word"
"capitalize all words".capitalize(true); ==> "Capitalize All Words"
Update November 2016 (ES6), just for fun:
const capitalize = (string = '') => [...string].map( // Convert to array with each item is a char of
// string by using spread operator (...)
(char, index) => index ? char : char.toUpperCase() // Index true means not equal 0, so (!index) is
// the first character which is capitalized by
// the `toUpperCase()` method
).join('') // Return back to string
then capitalize("hello") // Hello
SHORTEST 3 solutions, 1 and 2 handle cases when s string is "", null and undefined:
s&&s[0].toUpperCase()+s.slice(1) // 32 char
s&&s.replace(/./,s[0].toUpperCase()) // 36 char - using regexp
'foo'.replace(/./,x=>x.toUpperCase()) // 31 char - direct on string, ES6
let s='foo bar';
console.log( s&&s[0].toUpperCase()+s.slice(1) );
console.log( s&&s.replace(/./,s[0].toUpperCase()) );
console.log( 'foo bar'.replace(/./,x=>x.toUpperCase()) );
We could get the first character with one of my favorite RegExp, looks like a cute smiley: /^./
String.prototype.capitalize = function () {
return this.replace(/^./, function (match) {
return match.toUpperCase();
});
};
And for all coffee-junkies:
String::capitalize = ->
#replace /^./, (match) ->
match.toUpperCase()
...and for all guys who think that there's a better way of doing this, without extending native prototypes:
var capitalize = function (input) {
return input.replace(/^./, function (match) {
return match.toUpperCase();
});
};
Here is a function called ucfirst()(short for "upper case first letter"):
function ucfirst(str) {
var firstLetter = str.substr(0, 1);
return firstLetter.toUpperCase() + str.substr(1);
}
You can capitalise a string by calling ucfirst("some string") -- for example,
ucfirst("this is a test") --> "This is a test"
It works by splitting the string into two pieces. On the first line it pulls out firstLetter and then on the second line it capitalises firstLetter by calling firstLetter.toUpperCase() and joins it with the rest of the string, which is found by calling str.substr(1).
You might think this would fail for an empty string, and indeed in a language like C you would have to cater for this. However in JavaScript, when you take a substring of an empty string, you just get an empty string back.
Use:
var str = "ruby java";
console.log(str.charAt(0).toUpperCase() + str.substring(1));
It will output "Ruby java" to the console.
If you use Underscore.js or Lodash, the underscore.string library provides string extensions, including capitalize:
_.capitalize(string) Converts first letter of the string to
uppercase.
Example:
_.capitalize("foo bar") == "Foo bar"
If you're ok with capitalizing the first letter of every word, and your usecase is in HTML, you can use the following CSS:
<style type="text/css">
p.capitalize {text-transform:capitalize;}
</style>
<p class="capitalize">This is some text.</p>
This is from CSS text-transform Property (at W3Schools).
var capitalized = yourstring[0].toUpperCase() + yourstring.substr(1);
If you are wanting to reformat all-caps text, you might want to modify the other examples as such:
function capitalize (text) {
return text.charAt(0).toUpperCase() + text.slice(1).toLowerCase();
}
This will ensure that the following text is changed:
TEST => Test
This Is A TeST => This is a test
String.prototype.capitalize = function(){
return this.replace(/(^|\s)([a-z])/g,
function(m, p1, p2) {
return p1 + p2.toUpperCase();
});
};
Usage:
capitalizedString = someString.capitalize();
This is a text string => This Is A Text String
function capitalize(s) {
// returns the first letter capitalized + the string from index 1 and out aka. the rest of the string
return s[0].toUpperCase() + s.substr(1);
}
// examples
capitalize('this is a test');
=> 'This is a test'
capitalize('the Eiffel Tower');
=> 'The Eiffel Tower'
capitalize('/index.html');
=> '/index.html'
yourString.replace(/\w/, c => c.toUpperCase())
I found this arrow function easiest. Replace matches the first letter character (\w) of your string and converts it to uppercase. Nothing fancier is necessary.
var str = "test string";
str = str.substring(0,1).toUpperCase() + str.substring(1);
𝗔 𝗦𝗼𝗹𝘂𝘁𝗶𝗼𝗻 𝗧𝗵𝗮𝘁 𝗪𝗼𝗿𝗸𝘀 𝗙𝗼𝗿 𝗔𝗹𝗹 𝗨𝗻𝗶𝗰𝗼𝗱𝗲 𝗖𝗵𝗮𝗿𝗮𝗰𝘁𝗲𝗿𝘀
57 81 different answers for this question, some off-topic, and yet none of them raise the important issue that none of the solutions listed will work with Asian characters, emoji's, and other high Unicode-point-value characters in many browsers. Here is a solution that will:
const consistantCapitalizeFirstLetter = "\uD852\uDF62".length === 1 ?
function(S) {
"use-strict"; // Hooray! The browser uses UTF-32!
return S.charAt(0).toUpperCase() + S.substring(1);
} : function(S) {
"use-strict";
// The browser is using UCS16 to store UTF-16
var code = S.charCodeAt(0)|0;
return (
code >= 0xD800 && code <= 0xDBFF ? // Detect surrogate pair
S.slice(0,2).toUpperCase() + S.substring(2) :
S.charAt(0).toUpperCase() + S.substring(1)
);
};
const prettyCapitalizeFirstLetter = "\uD852\uDF62".length === 1 ?
function(S) {
"use-strict"; // Hooray! The browser uses UTF-32!
return S.charAt(0).toLocaleUpperCase() + S.substring(1);
} : function(S) {
"use-strict";
// The browser is using UCS16 to store UTF-16
var code = S.charCodeAt(0)|0;
return (
code >= 0xD800 && code <= 0xDBFF ? // Detect surrogate pair
S.slice(0,2).toLocaleUpperCase() + S.substring(2) :
S.charAt(0).toLocaleUpperCase() + S.substring(1)
);
};
Do note that the above solution tries to account for UTF-32. However, the specification officially states that browsers are required to do everything in UTF-16 mapped into UCS2. Nevertheless, if we all come together, do our part, and start preparing for UTF32, then there is a chance that the TC39 may allow browsers to start using UTF-32 (like how Python uses 24-bits for each character of the string). This must seem silly to an English speaker: no one who uses only latin-1 has ever had to deal with Mojibake because Latin-I is supported by all character encodings. But, users in other countries (such as China, Japan, Indonesia, etc.) are not so fortunate. They constantly struggle with encoding problems not just from the webpage, but also from the JavaScript: many Chinese/Japanese characters are treated as two letters by JavaScript and thus may be broken apart in the middle, resulting in � and � (two question-marks that make no sense to the end user). If we could start getting ready for UTF-32, then the TC39 might just allow browsers do what Python did many years ago which had made Python very popular for working with high Unicode characters: using UTF-32.
consistantCapitalizeFirstLetter works correctly in Internet Explorer 3+ (when the const is changed to var). prettyCapitalizeFirstLetter requires Internet Explorer 5.5+ (see the top of page 250 of this document). However, these fact are more of just jokes because it is very likely that the rest of the code on your webpage will not even work in Internet Explorer 8 - because of all the DOM and JScript bugs and lack of features in these older browsers. Further, no one uses Internet Explorer 3 or Internet Explorer 5.5 any more.
Check out this solution:
var stringVal = 'master';
stringVal.replace(/^./, stringVal[0].toUpperCase()); // Returns Master
Only because this is really a one-liner I will include this answer. It's an ES6-based interpolated string one-liner.
let setStringName = 'the Eiffel Tower';
setStringName = `${setStringName[0].toUpperCase()}${setStringName.substring(1)}`;
with arrow function
let fLCapital = s => s.replace(/./, c => c.toUpperCase())
fLCapital('this is a test') // "This is a test"
with arrow function, another solution
let fLCapital = s => s = s.charAt(0).toUpperCase() + s.slice(1);
fLCapital('this is a test') // "This is a test"
with array and map()
let namesCapital = names => names.map(name => name.replace(/./, c => c.toUpperCase()))
namesCapital(['james', 'robert', 'mary']) // ["James", "Robert", "Mary"]

Regular expressions - javascript

I know this is a simple thing. but i just cant make it work.
Req: A word which contain at least one number, alphabets (can be both cases) and at least one symbol (special character).
In c# (?=.[0-9])(?=.[a-zA-z])(?=.*[!##$%_]) worked. But in javascript its not working.
Seems like it always look for number at the beginning since my condition starts with number in the regexp.f
Can anyone give me a regexp that can be used in javascript?
-Rakesh
JavaScript does support lookaheads. However, your groups expect that there's at least on character before the number and letter (because they start with just a dot .). Try adding a * to those two dots:
var pattern = /(?=.*[0-9])(?=.*[a-zA-z])(?=.*[!##$%_])/;
pattern.test('xxx'); // false
pattern.test('111'); // false
pattern.test('!!!'); // false
pattern.test('x1!'); // true
I'm seeing the same problem with this regular expression in C#, too.
Just to cover the obvious answer, given the requirements as stated I would use separate tests.
/[0-9]/.test(string) && /[a-z]/i.test(string) && /[!##$%_]/.test(string)
If you're interested in abstracting this away, one way is to store the tests in an array.
var tests = [ /[0-9]/, /[a-z]/i, /[!##$%_]/ ];
And one way to evaluate multiple tests without modifying the scope of surrounding code, simply shoehorning this into a closure, follows.
var passes = (function(){
for (var i=0; i<tests.length; i++)
if (!tests[i].test(string)) return false;
return true;
})();
I don't think javascript supports those lookaheads. Try
/(.*[a-zA-Z].*[0-9].*[!##$%_].*|.*[a-zA-Z].*[!##$%_].*[0-9].*|.*[0-9].*[a-zA-Z].*[!##$%_].*|.*[!##$%_].*[a-zA-Z].*[0-9].*|.*[0-9].*[!##$%_].*[a-zA-Z].*|.*[!##$%_].*[0-9].*[a-zA-Z].*)/
Not expecting any points for elegance...
Edit: As bdukes pointed out, js does support lookaheads. However, this (ugly) expression does work.
You can have a very long reg exp, with the three character classes repeated in differtent order, or use more than one test-
function teststring(s){
return /^[\da-zA-Z!##$%_]+$/.test(s) &&
/\d/.test(s) && /[a-zA-Z]/.test(s) && /[!##$%_]/.test(s);
}

Categories

Resources