var paragraph is giving me "unterminated string constant" error - javascript

I have this file
var paragraph = "Abandon| give up or over| yield| surrender| leave| cede| let go| deliver| turn over| relinquish| depart from| leave| desert| quit| go away from| desert| forsake| jilt| walk out on | give up| renounce| discontinue| forgo| drop| desist| abstain from|
recklessness| intemperance| wantonness| lack of restraint| unrestraint|
abandoned |left alone| forlorn| forsaken| deserted| neglected| rejected| shunned| cast off | jilted| dropped| ";
with a lot of spacing, so it's giving me that error at the spacings.
then running a loop and alerting the output
var sentences = paragraph.split("|");
var newparagraph = "";
for (i = 0; i < sentences.length; i++) {
var words = sentences[i].split(" ");
if (words.length < 4) {
newparagraph += sentences[i] + "|";
}
}
alert(newparagraph);
how do I read from a file that doesn't get errors from spacing?

As the other answers noted, JavaScript automatically inserts a semicolon at the end of what it thinks is a statement. More about it here:
http://en.wikipedia.org/wiki/JavaScript_syntax#Whitespace_and_semicolons
A quoted string literal cannot contain raw line breaks. You can end each line with a '\' character to continue the literal on the next line (the line break itself does not become part of the string).
var text = "this is\
a very long\
sentence";
But the above practice is generally frowned upon. Best bet, define strings in one line or use concatenation (+) to break your strings into multiple lines. If your text must contain line-breaks, use '\n' character.
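A minimal sketch of the concatenation approach, using a shortened version of the word list from the question (the variable name withBreaks is only for illustration):

```javascript
// Each piece is a complete one-line string literal; + joins them into one string.
var paragraph = "Abandon| give up or over| yield| surrender|" +
    " recklessness| intemperance| wantonness|" +
    " abandoned |left alone| forlorn| ";

// If the text itself must contain line breaks, write them as \n.
var withBreaks = "first line\n" +
    "second line\n" +
    "third line";

var entries = paragraph.split("|");
console.log(entries.length);                 // number of "|"-separated entries
console.log(withBreaks.split("\n").length);  // number of lines
```

The source file can still be wrapped for readability; only the string literals themselves have to start and end on the same line.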

I can't see a quote character at the end of var paragraph = "... line.

You're missing a terminating quote and semi-colon at the end of the var paragraph assignment. You can use a tool like jslint (http://www.jslint.com/) to check your syntax, if in doubt.

Related

Regex split comma except escaped [duplicate]

I have this string:
a\,bcde,fgh,ijk\,lmno,pqrst\,uv
I need a JavaScript function that will split the string by every , but only those that don't have a \ before them
How can this be done?
Here's the shortest thing I could come up with:
'a\\,bcde,fgh,ijk\\,lmno,pqrst\\,uv'.replace(/([^\\]),/g, '$1\u000B').split('\u000B')
The idea is to find every comma that isn't prefixed with a backslash, replace each one with a string that is unlikely to appear in your data, and then split on that string.
Note that the backslashes before the commas have to be escaped with another backslash. Otherwise, JavaScript treats \, as an escaped comma and produces just a comma. In other words, if you don't escape the backslash, JavaScript sees a\,bcde,fgh,ijk\,lmno,pqrst\,uv as a,bcde,fgh,ijk,lmno,pqrst,uv.
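The escaping rule can be checked directly. This is a small sketch; the variable names are only for illustration, and String.raw needs an ES2015-capable engine:

```javascript
// In a normal string literal, \, is just an escaped comma: the backslash vanishes.
var unescaped = 'a\,b';
// Doubling the backslash keeps one real backslash in the string.
var escaped = 'a\\,b';
// String.raw keeps backslashes as typed, so no doubling is needed.
var raw = String.raw`a\,b`;

console.log(unescaped.length); // 3
console.log(escaped.length);   // 4
console.log(raw === escaped);  // true
```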
Since regular expressions in JavaScript do not support lookbehinds (this was true before ES2018), I'm not going to cook up a giant hack to mimic this behavior. Instead, you can just split() on all commas (,) and then glue back the pieces that shouldn't have been split in the first place.
Quick 'n' dirty demo:
var str = 'a\\,bcde,fgh,ijk\\,lmno,pqrst\\,uv'.split(','), // Split on all commas
    out = []; // Output
for (var i = 0; i < str.length; i++) { // Iterate all pieces, including the last
    var curr = str[i]; // This piece
    while (curr.charAt(curr.length - 1) == '\\' && i + 1 < str.length) { // While it ends with \ ...
        curr += ',' + str[++i]; // ... glue with next and skip next (increment i)
    }
    out.push(curr); // Add to output
}
Another ugly hack around the lack of look-behinds:
function rev(s) {
return s.split('').reverse().join('');
}
var s = 'a\\,bcde,fgh,ijk\\,lmno,pqrst\\,uv';
// Enter bizarro world...
var r = rev(s);
// Split with a look-ahead
var rparts = r.split(/,(?!\\)/);
// And put it back together with double reversing.
var sparts = [ ];
while(rparts.length)
sparts.push(rev(rparts.pop()));
for(var i = 0; i < sparts.length; ++i)
$('#out').append('<pre>' + sparts[i] + '</pre>');
Demo: http://jsfiddle.net/ambiguous/QbBfw/1/
I don't think I'd do this in real life but it works even if it does make me feel dirty. Consider this a curiosity rather than something you should really use.
In case you also need to remove the backslashes:
var test='a\\.b.c';
var result = test.replace(/\\?\./g, function (t) { return t == '.' ? '\u000B' : '.'; }).split('\u000B');
//result: ["a.b", "c"]
In 2022, most browsers support lookbehinds:
https://caniuse.com/js-regexp-lookbehind
Safari should be your only concern.
With a lookbehind you can split your string this way:
"a\\,bcde,fgh,ijk\\,lmno,pqrst\\,uv".split(/(?<!\\),/)
// => ['a\\,bcde', 'fgh', 'ijk\\,lmno', 'pqrst\\,uv']
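If the backslashes should also be dropped from the result, the lookbehind split can be combined with a replace. This is only a sketch: the function name splitOnUnescapedCommas is made up for illustration, it needs an engine with lookbehind support, and it does not try to handle an escaped backslash in front of a comma.

```javascript
// Split on commas that are not preceded by a backslash,
// then turn each remaining "\," into a plain ",".
function splitOnUnescapedCommas(text) {
    return text.split(/(?<!\\),/).map(function (part) {
        return part.replace(/\\,/g, ',');
    });
}

var parts = splitOnUnescapedCommas('a\\,bcde,fgh,ijk\\,lmno,pqrst\\,uv');
console.log(parts); // ['a,bcde', 'fgh', 'ijk,lmno', 'pqrst,uv']
```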
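You can use a regex to do the split.
Here is a link to regex in JavaScript: http://www.w3schools.com/jsref/jsref_obj_regexp.asp
Here is a link to another post where the author used a regex for split: Javascript won't split using regex
From the first link, note that you can create a regular expression using a negative lookahead:
?!n Matches any string that is not followed by a specific string n
A lookahead only tests what comes after the match, though. Here the backslash comes before the comma, so something like [,]!\\ will not work; this case needs a lookbehind or one of the workarounds above.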

Splitting a string by white space and a period when not surrounded by quotes

I know that similar questions have been asked many times, but my regular expression knowledge is pretty bad and I can't get it to work for my case.
So here is what I am trying to do:
I have a text and I want to separate the sentences. Each sentence ends with some white space and a period (there can be one or many spaces before the period, but there is always at least one).
At the beginning I used /\s+\./ and it worked great for separating the sentences, but then I noticed that there are cases such as this one:
"some text . some text".
Now, I don't want to separate the text in quotes. I searched and found a lot of solutions that work great for spaces (for example: /(".*?"|[^"\s]+)+(?=\s*|\s*$)/), but I was not able to modify them to separate by white space and a period.
Here is the code that I am using at the moment.
var regex = /\s+\./;
var result = regex.exec(fullText);
if(result == null) {
break;
}
var length = result[0].length;
var startingPoint = result.index;
var currentSentence = fullText.substring(0,startingPoint).trim();
fullText = fullText.substring(startingPoint+length);
I am separating the sentences one by one and removing them from the full text.
The length variable holds the size of the portion that needs to be removed, and startingPoint is the position at which that portion starts. The code is part of a larger while loop.
Instead of splitting you may try and match all sentences between delimiters. In this case it will be easier to skip delimiters in quotes. The respective regex is:
(.*?(?:".*?".*?)?|.*?)(?: \.|$)
Demo: https://regex101.com/r/iS9fN6/1
The sentences then may be retrieved in this loop:
while (match = regex.exec(input)) {
console.log(match[1]); // each next sentence is in match[1]
}
BUT! With the g flag, this particular expression makes regex.exec(input) keep returning a match forever: once the end of the input is reached, the pattern matches an empty string at $, lastIndex does not advance, and the loop never ends. (Looks like a candidate for one more SO question.)
So I can only suggest a workaround: remove the $ from the expression. The regex then misses the last part, which can be extracted afterwards as the trailer not matched by the regex:
var input = "some text . some text . \"some text . some text\" some text . some text";
//var regex = /(.*?(?:".*?".*?)?|.*?)(?: \.|$)/g;
var regex = /(.*?(?:".*?".*?)?|.*?) \./g;
var trailerPos = 0;
while (match = regex.exec(input)) {
console.log(match[1]); // each next sentence is in match[1]
trailerPos = match.index + match[0].length;
}
if (trailerPos < input.length) {
console.log(input.substring(trailerPos)); // the last sentence in
// input.substring(trailerPos)
}
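Another way around the endless loop, sketched here as an alternative and not as part of the answer above, is to keep the $ and stop as soon as exec() returns an empty match. The names sentences and match are only for illustration:

```javascript
var input = 'some text . some text . "some text . some text" some text . some text';
var regex = /(.*?(?:".*?".*?)?|.*?)(?: \.|$)/g;
var sentences = [];
var match;

while ((match = regex.exec(input)) !== null) {
    if (match[0].length === 0) {
        // An empty match at the end of the input would not advance lastIndex,
        // so the loop would repeat forever; stop here instead.
        break;
    }
    sentences.push(match[1]);
}

sentences.forEach(function (s) {
    console.log("Sentence: -->%s<--", s);
});
```

With this guard the last sentence is picked up by the $ branch, so no separate trailer handling is needed.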
Update:
If sentences span multiple lines, the regex won't work since . pattern does not match the newline character. In this case just use [\s\S] instead of .:
var input = "some \ntext . some text . \"some\n text . some text\" some text . so\nm\ne text";
var regex = /([\s\S]*?(?:"[\s\S]*?"[\s\S]*?)?|[\s\S]*?) \./g;
var trailerPos = 0;
var sentences = []
while (match = regex.exec(input)) {
sentences.push(match[1]);
trailerPos = match.index + match[0].length;
}
if (trailerPos < input.length) {
sentences.push(input.substring(trailerPos));
}
sentences.forEach(function(s) {
console.log("Sentence: -->%s<--", s);
});
Use the encode and decode of javascript while sending and receiving.

check whether string contains a line break

So, I have to get the HTML of a textarea and check whether it contains a line break. How can I see whether it contains \n? The string returned by val() does not contain \n and I am not able to detect it. I tried using .split("\n") but it gave the same result. How can it be done?
One more thing: I don't know why, when I add \n to the textarea as its value, it breaks and moves to the next line.
Line breaks in HTML aren't represented by \n or \r. They can be represented in lots of ways, including the <br> element, or any block element following another (<p></p><p></p>, for instance).
If you're using a textarea, you may find \n or \r (or \r\n) for line breaks, so:
var text = $("#theTextArea").val();
var match = /\r|\n/.exec(text);
if (match) {
// Found one, look at `match` for details, in particular `match.index`
}
...but that's just textareas, not HTML elements in general.
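The same check can be written as a small function that works on any string, for example the value read from a textarea. This is a sketch; firstLineBreakIndex is a made-up helper name:

```javascript
// Returns the index of the first \r or \n in the text, or -1 if there is none.
function firstLineBreakIndex(text) {
    return text.search(/\r|\n/);
}

console.log(firstLineBreakIndex("one line"));        // -1
console.log(firstLineBreakIndex("first\nsecond"));   // 5
console.log(firstLineBreakIndex("first\r\nsecond")); // 5
```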
var text = $('#total-number').text();
var eachLine = text.split('\n');
alert('Lines found: ' + eachLine.length);
for(var i = 0, l = eachLine.length; i < l; i++) {
alert('Line ' + (i+1) + ': ' + eachLine[i]);
}
You can simply use this
var numberOfLineBreaks = (enteredText.match(/\n/g)||[]).length;

How to detect line breaks in a text area input?

What is the best way to check the text area value for line breaks and then calculate the number of occurrences, if any?
I have a text area on a form on my webpage. I am using JavaScript to grab the value of the text area and then checking its length.
Example
enteredText = textareaVariableName.val();
characterCount = enteredText.length; // One line break entered returns 1
If a user enters a line break in the text area my calculation above gives the line break a length of 1. However I need to give line breaks a length of 2. Therefore I need to check for line breaks and the number of occurrences and then add this onto the total length.
Example of what I want to achieve
enteredText = textareaVariableName.val();
characterCount = enteredText.length + numberOfLineBreaks;
My solution before asking this question was the following:
enteredText = textareaVariableName.val();
enteredTextEncoded = escape(enteredText);
linebreaks = enteredTextEncoded.match(/%0A/g);
(linebreaks != null) ? numberOfLineBreaks = linebreaks.length : numberOfLineBreaks = 0;
I could see that encoding the text and checking for %0A was a bit long-winded, so I was after some better solutions. Thank you for all the suggestions.
You can use match on the string containing the line breaks, and the number of elements in that array should correspond to the number of line breaks.
enteredText = textareaVariableName.val();
numberOfLineBreaks = (enteredText.match(/\n/g)||[]).length;
characterCount = enteredText.length + numberOfLineBreaks;
/\n/g is a regular expression meaning 'look for the character \n (line break), and do it globally (across the whole string)'.
The ||[] part is just in case there are no line breaks. Match will return null, so we test the length of an empty array instead to avoid errors.
Here's one way:
var count = text.length + text.replace(/[^\n]/g, '').length;
Alternatively, you could replace all the "naked" \n characters with \r\n and then use the overall length.
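The replace-then-measure idea could look like the sketch below. The function name lengthWithCrlf is made up, and the pattern \r?\n is an assumption chosen so that an existing \r\n is not expanded a second time:

```javascript
// Count every line break as two characters by normalising it to \r\n first.
function lengthWithCrlf(text) {
    return text.replace(/\r?\n/g, '\r\n').length;
}

console.log(lengthWithCrlf("a\nb"));   // 4
console.log(lengthWithCrlf("a\r\nb")); // 4
console.log(lengthWithCrlf("ab"));     // 2
```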
I'd do this using a regular expression:
var inTxt = document.getElementById('txtAreaId').value;
var charCount = inTxt.length + inTxt.match(/\n/gm).length;
where /\n/ matches line breaks (obviously) and g is the global flag. m stands for multi-line; it only changes how ^ and $ match, so it is not strictly needed here. Alternatively, though as I recall this is a tad slower:
var charCount = inTxt.length + (inTxt.split("\n").length - 1);
Edit
Just realized that, if no line breaks are matched, the match version will throw an error (match returns null), so best do:
charCount = inTxt.length + (inTxt.match(/\n/) !== null ? inTxt.match(/\n/gm).length : 0);
Or something similar...
For new JS use encodeURI(), because escape() is deprecated (since JavaScript 1.5).
Instead use:
enteredText = textareaVariableName.val();
enteredTextEncoded = encodeURI(enteredText);
linebreaks = enteredTextEncoded.match(/%0A/g);
(linebreaks != null) ? numberOfLineBreaks = linebreaks.length : numberOfLineBreaks = 0;
You can split the text based on new lines:
let textArray = text.split(/^/gm)
console.log(textArray.length)

Regex matching JS source that's not in a string or regex literal

Do there exist comprehensive regular expressions that, when applied to JavaScript source code, will match all valid string literals (such as "say \"Hello\"") and regex literals (such as /and\/or/)? The expressions would have to cover all edge cases, including line breaks and escape sequences.
Alternatively, does anyone know of regexes for matching patterns outside of string and regex literals?
My goal is to implement a simple JavaScript syntax extension that allows macros in delimiters (e.g. {{#foo.bar}} or ##foo.bar#) to be expanded by a preprocessor. However, I'd like the macros to be processed only outside of literals.
For now, I'm trying to accomplish this using just string replacement, without having to augment an existing JavaScript lexer/parser.
This JavaScript preprocessor will itself be implemented in JavaScript.
This is the regex that I've been using to match quoted strings. It is pretty good in that it should work with almost all engines, since it does not require backtracking or backreferences or any of that voodoo. This will match all text INSIDE literals.
"(\\.|[^"])*"
Depending on the engine, it might support non capturing groups. In that case you can use
"(?:\\.|[^"])*"
and it should be faster.
I think this is too much for regexes.
Consider var foo = "//" // /"(?:\\.|[^"])*"/. Where do the strings, comments and regex literals start and end? You would need to write a complete JavaScript parser to cover all edge cases. Of course, the parser will be using regexes...
I would probably go about doing something like the following. It will need to be improved for certain possible conditions, though.
var str = '"aaa \"sss \\t bbb" sss #3 ss# ((t sdsds)) ff ';
str += '/gg sdfd \/dsds/ {aaa bbb} {{ss}} {#sdsd#}';
var repeating = ['"','\\\'','/','\\~','\\#'];
// "example" 'example' /example/ ~example~ #example#
var enclosing = [];
enclosing.push(['\\{','\\}']);
enclosing.push(['\\{\\{','\\}\\}']);
enclosing.push(['\\[','\\]']);
enclosing.push(['\\(\\(','\\)\\)']);
// {example} {{example}} [example] ((example))
for (var forEnclosing='',i = 0 ; i < enclosing.length; i++) {
var e = enclosing[i];
var r = e[0]+'(\\\\['+e[0]+e[1]+']|[^'+e[0]+e[1]+'])*'+e[1];
forEnclosing += r + (i < enclosing.length-1 ? '|' : '');
}
for (var forRepeating='',i = 0; i < repeating.length; i++) {
var e = repeating[i];
var r = e+'(\\'+e+'|[^'+e+'])*'+e;
forRepeating += r + (i < repeating.length-1 ? '|' : '');
}
var rx = new RegExp('('+forEnclosing+'|'+forRepeating+')','g');
var m = str.match(rx);
try { for (var i = 0; i < m.length; i++) console.log(m[i]) }
catch(e) {}
Outputs:
"aaa "sss \t bbb"
#3 ss#
((t sdsds))
/gg sdfd /dsds/
{aaa bbb}
{{ss}}
{#sdsd#}
The closest you can get with a regex is to have one regex that matches EITHER a string literal (single- or double-quoted) OR a regex OR a comment (OR whatever else might contain bogus matches) OR one of your macro thingies:
"[^"\\]*(?:\\.[^"\\]*)*"
|
'[^'\\]*(?:\\.[^'\\]*)*'
|
/[^/\\]*(?:\\.[^/\\]*)*/[gim]*
|
/\*[^*]*(?:\*(?!/)[^*]*)*\*/
|
##(\w+\.\w+)#
If group #1 contains anything after the match, it must be what you're looking for. Otherwise, ignore this match and go on to the next one.
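A rough sketch of how that alternation could drive a preprocessor in JavaScript. The names tokenRx, expandMacros and the lookup callback are hypothetical, the comment branch is tried before the regex-literal branch here, and, as noted elsewhere in this thread, a plain regex will still misread cases such as division operators:

```javascript
// One pattern: double-quoted string | single-quoted string | block comment |
// regex literal | macro. Only the macro branch has a capturing group.
var tokenRx = /"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|\/\*[^*]*(?:\*(?!\/)[^*]*)*\*\/|\/[^\/\\]*(?:\\.[^\/\\]*)*\/[gim]*|##(\w+\.\w+)#/g;

// Replace macros outside literals and comments; leave everything else untouched.
function expandMacros(source, lookup) {
    return source.replace(tokenRx, function (whole, macroName) {
        // macroName is undefined when a literal or comment matched.
        return macroName === undefined ? whole : lookup(macroName);
    });
}

var source = 'var a = "##foo.bar#"; var b = ##foo.bar#; /* ##x.y# */';
var expanded = expandMacros(source, function (name) {
    return name === 'foo.bar' ? '42' : 'undefined';
});
console.log(expanded); // var a = "##foo.bar#"; var b = 42; /* ##x.y# */
```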
