Line length for lines with strings - javascript

Having been obsessed with neatness in JavaScript lately, I was curious whether there is a common practice for dealing with lines that run past 80 columns because of string length. With innerHTML I can mark line breaks with a backslash and the indentation spaces won't show up in the content of the element, but that doesn't seem to hold for, e.g., console.log().
Are there any conventions for this or should I just learn to live with lines longer than 80 cols? :)

There's no universal convention. With modern high-res monitors you can easily fit 160 columns and still have room for IDE toolbars without needing to scroll, so I wouldn't be concerned about sticking to 80 columns.
Some people go out of their way to never have any line of code go past n columns, where n might be 80, or 160, or some other arbitrary number based on what fits for their preferred font and screen resolution. Some people I work with don't care and have lines that go way off to the right regardless of whether it is due to a long string or a function with lots of parameters or whatever.
I try to avoid any horizontal scrolling but I don't obsess about it so if I have a string constant that is particularly long I will probably put it all on one line. If I have a string that is built up by concatenating constants and variables I will split it over several lines, because that statement will already have several + operators that are a natural place to add line breaks. If I have a function with lots of parameters, more than would fit without scrolling, I will put each parameter on a newline. For an if statement with a lot of conditions I'd probably break that over several lines.
Regarding what you mentioned about innerHTML versus console.log(): if you break a string constant across lines in your source code by including a backslash and newline then any indenting spaces you put on the second line will become part of the string:
var myString1 = "This has been broken\
 into two lines.";
// equivalent to
var myString2 = "This has been broken into two lines.";
// NOT equivalent to
var myString3 = "This has been broken\
                 into two lines.";
If you use that string for innerHTML the spaces will be treated the same as spaces in your HTML source, i.e., the browser will display it with multiple spaces compressed down to a single space. But for any other uses of the string in your code including console.log() the space characters will all be included.
If horizontal scrolling really bothers you and you have a long string the following method lets you have indenting without extra spaces in the string:
var myString3 = "Hi there, this has been broken"
              + " into several lines by concatenating"
              + " a number of shorter strings.";
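To see the difference between the two approaches side by side, here is a small sketch (the variable names `continued` and `concatenated` are made up for illustration) that compares a backslash-continued string with a concatenated one:

```javascript
// Backslash continuation: the indenting spaces on the second line
// end up inside the string.
var continued = "broken\
    here";

// Concatenation: the indentation sits outside the quotes, so only the
// single space written inside the second literal is kept.
var concatenated = "broken"
    + " here";

console.log(JSON.stringify(continued));
console.log(JSON.stringify(concatenated));
console.log(continued === concatenated); // false
```

Printing through JSON.stringify makes the embedded spaces visible, which plain console.log output can hide.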

Related

mark part of a plain text

I want to somehow mark a part of a plain text to emphasize that without putting extra characters around that.
I figured out that I can use the combining characters and used \u0332, but when I tried to emphasize a string including numbers and white spaces, I realized that the character doesn't combine with them (will be combined with the next character):
console.log("8 o'clock".replace(/./g, a => a + "\u{0332}"));
I'm trying to find an appropriate combining character, or a way to make \u0332 combine with all other characters, or any similar idea for marking plain text.

Remove all parentheses, commas etc from a string and then extract the first 5 words

I have a string, eg:
Lenovo K6 Power (Silver, 32 GB)(4 GB RAM), smart phone
I want to remove all parentheses and contents within the parentheses, commas etc and then extract the first five words only so that I get the result as
Lenovo K6 Power smart phone
Is there any method to apply regex to get this result?
Here's one way of doing it:
var str = 'Lenovo K6 Power (Silver, 32 GB)(4 GB RAM)';
document.write(str.match(/\w+/g).slice(0,5).join(' '));
It gets all words into an array (match(/\w+/g)), then takes the first five (slice(0,5)), and joins them back into a string separated by spaces (join(' ')).
(And... Considering the question is tagged with regex, I believe a word could be defined as consisting of regex word characters, i.e. \w.)
Edit
The question has changed, so the answer isn't correct anymore. Here's an updated snippet that works with the new criteria:
var str = 'Lenovo K6 Power (Silver, 32 GB)(4 GB RAM), smart phone';
document.write(str.split(/(?:\W*\([^)]*\))*\W+/).slice(0,5).join(' '));
This one splits the string instead, using the regex (?:\W*\([^)]*\))*\W+, which matches runs of non-word characters (\W) together with any parenthesised groups, contents included.
Splitting on that gives an array with only the desired words. From there the logic is the same.
var s1 = "Lenovo K6 Power (Silver, 32 GB)(4 GB RAM), smart phone";
var s2 = s1.replace(/\([^)]*\)|, /g,'')
console.log(s2) //Output : "Lenovo K6 Power smart phone"
var myString = "Lenovo K6 Power (Silver, 32 GB)(4 GB RAM), smart phone";
while (/\(.*\)/.test(myString)) {
    myString = myString.replace(/\(.*?\)/.exec(myString)[0],'');
}
console.log(myString.match(/\w+/g));
The first snippet matches and removes pairs of parentheses for as long as any remain, then it matches all the remaining words.
Output: Obj... ["Lenovo", "K6", "Power", "smart", "phone"]
This is a general solution. To get only the first 5 elements, change the console.log to
var obj = myString.match(/\w+/g);
for (var i = 0; i < 5; i++) {
    console.log(obj[i]);
}
Your question is extremely trivial and can be answered with the most basic JavaScript skills. I strongly suggest you go back and review your tutorials and intros, and try solving your problem yourself.
To remove something, you simply do
string.replace(WHAT, '')
In other words, you replace something with nothing (the empty string ''). In the case you mentioned, a simple Google search for something like "javascript remove regexp" will give you plenty of pointers. In fact, one of the first results is actually about removing parentheses.
In your case, I guess you finally decided you want to remove parentheses and what's inside them. In about the first five minutes of learning regexp, you should have learned to write
/\(.*?\)/g
 ^^          an actual left paren
   ^^^       any number of characters
      ^^     an actual right paren
         ^   match this over and over again
If you need help with this, try an online regexp tester such as regex101.com. It will also give you a readable version of your regexp.
The only thing moderately advanced about this is the .*?, where the ? means "non-greedy"--in other words, take characters up only to the next right paren.
I'm sure you already learned why you have to write \( to match a left parenthesis, right? The \ escapes the parentheses, because by itself the parentheses would have a special meaning to regexp. You know the g flag too, right? That means replace all the matches.
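A quick way to convince yourself of the difference between greedy and non-greedy matching is to run both against the same input. This is only a sketch; the sample string and variable names are made up.

```javascript
var sample = 'x (a) y (b) z';

// Greedy: .* runs to the LAST right paren, so "(a) y (b)" goes in one match.
var greedy = sample.replace(/\(.*\)/g, '');

// Non-greedy: .*? stops at the NEXT right paren, so "(a)" and "(b)" are
// removed separately and " y " survives.
var lazy = sample.replace(/\(.*?\)/g, '');

console.log(JSON.stringify(greedy)); // "x  z"
console.log(JSON.stringify(lazy));   // "x  y  z"
```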
To find the first five tokens, you first need to split your string into tokens. I'm sure you recall from your studies the basic String methods, including--drum roll--split! Split your string with string.split(' '). That will split on single spaces. If you want to split on any whitespace, you could try string.split(/\s+/).
Now go back and read the documentation for split real carefully, although I know you already have. Look carefully at the second argument, called limit. It does exactly what you want. It splits into segments, but no more than specified by limit.
The solution to your problem, which you could easily have come up with if you had spent about five minutes studying the documentation and experimenting, is
input.replace(/\(.*?\)/g, '').split(/\s+/, 5)
Unfortunately, your approach of posting on Stack Overflow is not going to scale well at all. You can't post here every time there is some minor problem you cannot figure out yourself. To be perfectly frank, if you cannot learn how to learn, then you better give up on being a programmer and try brick-laying instead. You need to learn how to figure out things yourself. Before anything else, you need to learn how to read (and digest) the documentation. Very early in your career, you're also going to need to learn how to debug your programs, since Stack Overflow is no better a way to get your programs debugged than it is to get them written for you in the first place. If you simply cannot bring yourself to read documents or learn by yourself, and can only work by asking other people how to do every little thing, then find a chat room or forum where there are people with nothing better to do than answer such questions. That is not what Stack Overflow is.
console.log('Lenovo K6 Power (Silver, 32 GB)(4 GB RAM), smart phone'.replace(/\(.*?\)/g, '').split(/\s+/, 5));
Fixing this code so as to remove the comma is left as an exercise for you to use your new-found learning powers on. Hint: you may want to use the regexp feature called "alternation", which is represented by the vertical bar or pipe |. You may also find yourself needing to use character classes, which is another thing you should have learned about very early in your regexp studies.
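For readers who want to check their attempt at the exercise, one possible solution is sketched below. The function name `firstTokens` and its parameters are invented for this example, and it assumes that commas are the only punctuation that needs removing.

```javascript
function firstTokens(input, limit) {
    return input
        .replace(/\(.*?\)|,/g, '') // alternation: a parenthesised group OR a comma
        .trim()                     // avoid an empty first token on leading whitespace
        .split(/\s+/, limit);       // split on whitespace, keep at most `limit` tokens
}

var tokens = firstTokens('Lenovo K6 Power (Silver, 32 GB)(4 GB RAM), smart phone', 5);
console.log(tokens.join(' ')); // Lenovo K6 Power smart phone
```

Removing the comma before splitting matters: otherwise the lone "," left behind counts as one of the five tokens.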
None of this has anything to do with TypeScript, or Angular, as you seem to have thought when you initially posted the question. It's a little concerning that you seem to think that doing basic regexp or string or array manipulation would somehow be a TypeScript or Angular issue. TypeScript is merely a typing layer on top of JavaScript. Angular is a framework for building web apps. Neither replaces JavaScript, or provides any new basic language capability. In fact, to use either effectively, you must know JavaScript well.

JavaScript + RegEx Complications- Searching Strings Not Containing SubString

I am trying to use a RegEx to search through a long string, and I am having trouble coming up with an expression. I am trying to search through some HTML for a set of tags beginning with a tag containing a certain value and ending with a different tag containing another value. The code I am currently using to attempt this is as follows:
matcher = new RegExp(".*(<[^>]+" + startText + "((?!" + endText + ").)*" + endText + ")", 'g');
data.replace(matcher, "$1");
The strangeness around the middle ( ((\\?\\!endText).)* ) is borrowed from another thread, found here, that seems to describe my problem. The issue I am facing is that the expression matches the beginning tag, but it does not find the ending tag and instead includes the remainder of the data. Also, the lookaround in the middle slowed the expression down a lot. Any suggestions as to how I can get this working?
EDIT: I understand that parsing HTML in RegEx isn't the best option (makes me feel dirty), but I'm in a time-crunch and any other alternative I can think of will take too long. It's hard to say what exactly the markup I will be parsing will look like, as I am creating it on the fly. The best I can do is to say that I am looking at a large table of data that is collected for a range of items on a range of dates. Both of these ranges can vary, and I am trying to select a certain range of dates from a single row. The approximate value of startText and endText are \\#\\#ASSET_ID\\#\\#_<YYYY_MM_DD>. The idea is to find the code that corresponds to this range of cells. (This edit could quite possibly have made this even more confusing, but I'm not sure how much more information I could really give without explaining the entire application).
EDIT: Well, this was a stupid question. Apparently, I just forgot to add .* after the last paren. Can't believe I spent so long on this! Thanks to those of you that tried to help!
First of all, why is there a .* Dot Asterisk in the beginning? If you have text like the following:
This is my Text
And you want "my Text" pulled out, you do my\sText. You don't have to do the .*.
That being said, since all you'll be matching now is what you need, you don't need the main Capture Group around "Everything". This: .*(xxx) is a huge no-no, and can almost always be replaced with this: xxx. In other words, your regex could be replaced with:
<[^>]+xxx((?!zzz).)*zzz
From there I examine what it's doing.
You are looking for an HTML opening delimiter <. You consume it.
You consume at least one character that is NOT a closing HTML delimiter, but can consume many. This is important, because if your tag is <table border=2>, then you have, at minimum, so far consumed <t, if not more.
You are now looking for a StartText. If that StartText is table, you'll never find it, because you have consumed the t. So replace that + with a *.
The regex still succeeds as long as what follows is NOT the closing text, but it starts from the VERY END of the document, because the asterisk is greedy. I suggest making it lazy by adding a ?.
When the backtracking fails, it will look for the closing text and gather it successfully.
The result of that logic:
<[^>]*xxx((?!zzz).)*?zzz
If you're going to use a dot anyway, which is okay for new Regex writers, but not suggested for seasoned, I'd go with this:
<[^>]*xxx.*?zzz
So for Javascript, your code would say:
matcher = new RegExp("<[^>]*" + startText + ".*?" + endText, 'gi');
I put the IgnoreCase "i" in there for good measure, but you may or may not want that.
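As a rough check of that expression, here it is run against a made-up fragment. The markup and the values of `startText` and `endText` are assumptions for illustration, not the asker's real data. Note also that `.` does not match line breaks in JavaScript by default, so this only works when the whole range sits on one line.

```javascript
var data = '<td id="A_2020_01_01">1</td><td id="A_2020_01_02">2</td><td id="A_2020_01_03">3</td>';
var startText = 'A_2020_01_01';
var endText = 'A_2020_01_02';

// Opening "<", any non-">" characters, the start text, then lazily
// anything up to the first occurrence of the end text.
var matcher = new RegExp('<[^>]*' + startText + '.*?' + endText, 'gi');
var found = data.match(matcher);

console.log(found.length); // 1
console.log(found[0]);     // <td id="A_2020_01_01">1</td><td id="A_2020_01_02
```

If the start or end text can contain regex metacharacters, they would need escaping before being concatenated into the pattern.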

Why does Closure Compiler insist on adding more bytes?

If I give Closure Compiler something like this:
window.array = '0123456789'.split('');
It "compiles" it to this:
window.array="0,1,2,3,4,5,6,7,8,9".split(",");
Now as you can tell, that's bigger. Is there any reason why Closure Compiler is doing this?
I think this is what's going on, but I am by no means certain...
The code that causes the insertion of commas is tryMinimizeStringArrayLiteral in PeepholeSubstituteAlternateSyntax.java.
That method contains a list of characters that are likely to have a low Huffman encoding, and are therefore preferable to split on than other characters. You can see the result of this if you try something like this:
"a b c d e f g".split(" "); //Uncompiled, split on spaces
"a,b,c,d,e,f,g".split(","); //Compiled, split on commas (same size)
The compiler will replace the character you try to split on with one it thinks is favourable. It does so by iterating over the characters of the string and finding the most favourable splitting character that does not occur within the string:
// These delimiters are chars that appears a lot in the program therefore
// probably have a small Huffman encoding.
NEXT_DELIMITER: for (char delimiter : new char[]{',', ' ', ';', '{', '}'}) {
  for (String cur : strings) {
    if (cur.indexOf(delimiter) != -1) {
      continue NEXT_DELIMITER;
    }
  }
  String template = Joiner.on(delimiter).join(strings);
  //...
}
In the above snippet you can see the array of characters the compiler claims to be optimal to split on. The comma is first (which is why in my space example above, the spaces have been replaced by commas).
I believe the insertion of commas in the case where the string to split on is the empty string may simply be an oversight. There does not appear to be any special treatment of this case, so it's treated like any other split call and each character is joined with the first appropriate character from the array shown in the above snippet.
Another example of how the compiler deals with the split method:
"a,;b;c;d;e;f;g".split(";"); //Uncompiled, split on semi-colons
"a, b c d e f g".split(" "); //Compiled, split on spaces
This time, since the original string already contains a comma (and we don't want to split on the comma character), the comma can't be chosen from the array of low-Huffman-encoded characters, so the next best choice is selected (the space).
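The selection logic in the Java excerpt can be mimicked in JavaScript to experiment with it. This is a loose port for illustration only: `chooseDelimiter` and `CANDIDATES` are hypothetical names, and the real compiler does more than this (the excerpt above is truncated).

```javascript
// Candidate delimiters, in the same order as the Java excerpt.
var CANDIDATES = [',', ' ', ';', '{', '}'];

// Return the first candidate that appears in none of the strings,
// or null if every candidate is already used somewhere.
function chooseDelimiter(strings) {
    for (var i = 0; i < CANDIDATES.length; i++) {
        var delimiter = CANDIDATES[i];
        var clash = strings.some(function (s) {
            return s.indexOf(delimiter) !== -1;
        });
        if (!clash) {
            return delimiter;
        }
    }
    return null;
}

console.log(JSON.stringify(chooseDelimiter(['a', 'b', 'c'])));  // ","
console.log(JSON.stringify(chooseDelimiter(['a,', 'b', 'c']))); // " "
```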
Update
Following some further research into this, it is definitely not a bug. This behaviour is actually by design, and in my opinion it's a very clever little optimisation, when you bear in mind that the Closure compiler tends to favour the speed of the compiled code over size.
Above I mentioned Huffman encoding a couple of times. The Huffman coding algorithm, explained very simply, assigns a weight to each character appearing in the text to be encoded. The weight is based on the frequency with which each character appears. These frequencies are used to build a binary tree in which the most common characters sit closest to the root. That means the most common characters get the shortest codes, so text that reuses them compresses better.
The Huffman algorithm is a large part of the DEFLATE algorithm used by gzip, so if your web server is configured to use gzip, your users will be benefiting from this clever optimisation.
This issue was fixed on Apr 20, 2012; see revision:
https://code.google.com/p/closure-compiler/source/detail?r=1267364f742588a835d78808d0eef8c9f8ba8161
Ironically, split in the compiled code has nothing to do with split in the source. Consider:
Source : a = ["0","1","2","3","4","5"]
Compiled: a="0,1,2,3,4,5".split(",")
Here, split is just a way to represent long arrays (long enough for the sum of all quotes + commas to be longer than split(",") ). So, what's going on in your example? First, the compiler sees a string function applied to a constant and evaluates it right away:
'0123456789'.split('') => ["0","1","2","3","4","5","6","7","8","9"]
At some later point, when generating output, the compiler considers this array to be "long" and writes it in the above "split" form:
["0","1","2","3","4","5","6","7","8","9"] => "0,1,2,3,4,5,6,7,8,9".split(",")
Note that all information about split('') in the source is already lost at this point.
If the source string were shorter, it would be generated in the array form, without extra splitting:
Source : a = '0123'.split('')
Compiled: a=["0","1","2","3"]
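The length argument can be checked with a little arithmetic. The sketch below builds both textual forms and compares their lengths; `arrayForm` and `splitForm` are names invented here, and the compiler's actual decision rule may take other factors into account.

```javascript
// Text of a plain array literal, e.g. ["0","1","2"]
function arrayForm(items) {
    return '[' + items.map(function (s) { return JSON.stringify(s); }).join(',') + ']';
}

// Text of the equivalent split form, e.g. "0,1,2".split(",")
function splitForm(items) {
    return JSON.stringify(items.join(',')) + '.split(",")';
}

var four = '0123'.split('');
var six = '012345'.split('');

console.log(arrayForm(four).length, splitForm(four).length); // 17 20
console.log(arrayForm(six).length, splitForm(six).length);   // 25 24
```

With four one-character items the literal is shorter; with six, the split form wins by one byte, which is consistent with the two examples above.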

Truncate strings that have no whitespace

I'm having an issue where some strings are overflowing their container. I can, of course use overflow:hidden, but it doesn't look that great as I would prefer text-overflow: ellipsis. However, the cross-platform support for that is iffy at best, and the existing solutions I've found don't allow the string to wrap if it contains spaces.
So, my workaround is to truncate the strings in Javascript if the string doesn't contain any spaces for the first N characters. Obviously it won't work as intended if the user doesn't have the correct font installed or if they're zoomed in, but in that case, overflow:hidden will kick in, so I'm okay with that.
Now my question is, how can I do this in as efficient a manner as possible? Will I have to loop through the first N characters and see if each one is a space, and if none of them are, do a string.substring(0, N-3) and then append an ellipsis to it?
Shouldn't have to loop to do simple "is there a space" detection:
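var shortened = yourstring.substr(0, N);
// only truncate when the string is actually longer than N and has no space to wrap on
if (yourstring.length > N && shortened.indexOf(' ') === -1) {
    shortened = shortened.substr(0, N - 3) + '...';
}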
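The same idea can be wrapped in a reusable function. This is a sketch under a few assumptions: the name `truncateUnbroken` is made up, a string that fits within the limit or has a space in its first N characters is returned unchanged so the browser can wrap it, and N is assumed to be larger than 3.

```javascript
function truncateUnbroken(text, maxLength) {
    // Short enough already: nothing to do.
    if (text.length <= maxLength) {
        return text;
    }
    // A space within the first maxLength characters lets the text wrap.
    if (text.slice(0, maxLength).indexOf(' ') !== -1) {
        return text;
    }
    // No break opportunity: cut and leave room for the three dots.
    return text.slice(0, maxLength - 3) + '...';
}

console.log(truncateUnbroken('abcdefghijklmnop', 10));     // abcdefg...
console.log(truncateUnbroken('abc', 10));                  // abc
console.log(truncateUnbroken('hello world and more', 10)); // hello world and more
```

A single indexOf call on the first N characters is all the "loop" that is needed; the engine does the scanning.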
