Rendering Plaintext as HTML maintaining whitespace – without <pre> - javascript

Given any arbitrary text file full of printable characters, how can this be converted to HTML that would be rendered exactly the same (with the following requirements)?
Does not rely on any but the default HTML whitespace rules
No <pre> tag
No CSS white-space rules
<p> tags are fine, but not required (<br />s and/or <div>s are fine)
Whitespace is maintained exactly.
Given the following lines of input:
Line one
Line two, indented four spaces
A browser should render the output exactly the same, maintaining the indentation of the second line and the gap between "indented" and "spaces". Of course, I am not actually looking for monospaced output, and the font is orthogonal to the algorithm/markup.
Given the two lines as a complete input file, example correct output would be:
Line one<br /> Line two,
indented four spaces
Soft wrapping in the browser is desirable. That is, the resulting HTML should not force the user to scroll, even when input lines are wider than their viewport (assuming individual words are still narrower than said viewport).
I’m looking for a fully defined algorithm. Bonus points for an implementation in Python or JavaScript.
(Please do not just answer that I should be using <pre> tags or a CSS white-space rule, as my requirements render those options untenable. Please also don’t post untested and/or naïve suggestions such as “replace all spaces with &nbsp;.” After all, I’m positive a solution is technically possible; it’s an interesting problem, don’t you think?)

To do that while still allowing the browser to wrap long lines, replace each sequence of two spaces with a space and a non-breaking space (&nbsp;).
The browser will render all spaces (normal and non-breaking ones), while still wrapping long lines at the normal spaces.
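JavaScript:
function html_escape(s) { // minimal stand-in for the original dummy function
return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}
text = html_escape(text);
text = text.replace(/\t/g, '    ') // tab -> four spaces
.replace(/  /g, '&nbsp; ') // each pair of spaces -> "&nbsp; "
.replace(/  /g, ' &nbsp;') // second pass
// handles an odd number of spaces, where we
// end up with "&nbsp;" + " " + " "
.replace(/\r\n|\n|\r/g, '<br />');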

Use a zero-width space (&#8203;) to preserve whitespace and allow the text to wrap. The basic idea is to pair each space or sequence of spaces with a zero-width space, then replace each space with a non-breaking space. You'll also want to HTML-encode the text and add line breaks.
If you don't care about unicode characters, it's trivial. You can just use string.replace():
function textToHTML(text)
{
return ((text || "") + "") // make sure it is a string;
.replace(/&/g, "&amp;")
.replace(/</g, "&lt;")
.replace(/>/g, "&gt;")
.replace(/\t/g, "    ")
.replace(/ /g, "&#8203;&nbsp;&#8203;")
.replace(/\r\n|\r|\n/g, "<br />");
}
If it's ok for the white space to wrap, pair each space with a zero-width space as above. Otherwise, to keep white space together, pair each sequence of spaces with a zero-width space:
.replace(/ /g, "&nbsp;")
.replace(/((&nbsp;)+)/g, "&#8203;$1&#8203;")
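Assembled into one chain, the keep-together variant looks like the sketch below. The name textToHtmlKeepTogether and the four-space tab width are assumptions for illustration, not part of the answer.

```javascript
// Hypothetical name: the fragments above chained into one function.
// Each run of spaces becomes &nbsp;s wrapped in zero-width spaces (&#8203;),
// so the run stays together but the browser may break before or after it.
function textToHtmlKeepTogether(text) {
  return String(text == null ? "" : text)
    .replace(/&/g, "&amp;") // escape & first so the entities added later survive
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/\t/g, "    ") // assumption: a tab becomes four spaces
    .replace(/ /g, "&nbsp;")
    .replace(/((&nbsp;)+)/g, "&#8203;$1&#8203;")
    .replace(/\r\n|\r|\n/g, "<br />");
}

console.log(textToHtmlKeepTogether("a  b\nx<y"));
// prints "a&#8203;&nbsp;&nbsp;&#8203;b<br />x&lt;y"
```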
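To encode unicode characters, it's a little more complex. You need to iterate over the string:
var charEncodings = {
"\t": "&nbsp;&nbsp;&nbsp;&nbsp;",
" ": "&nbsp;",
"&": "&amp;",
"<": "&lt;",
">": "&gt;",
"\n": "<br />",
"\r": "<br />"
};
var space = /[\t ]/;
var noWidthSpace = "&#8203;";
function textToHTML(text)
{
text = (text || "") + ""; // make sure it is a string;
text = text.replace(/\r\n/g, "\n"); // avoid adding two <br /> tags
var html = "";
var lastChar = "";
// numeric index: with for...in, i would be a string and i + 1 would concatenate
for (var i = 0; i < text.length; i++)
{
var char = text[i];
var charCode = text.codePointAt(i); // whole code point, even outside the BMP
// put a zero-width space at both ends of each run of spaces/tabs,
// so the run stays together but the browser can wrap around it
if (lastChar !== "" && space.test(char) !== space.test(lastChar))
{
html += noWidthSpace;
}
if (charCode > 0xFFFF)
{
i++; // skip the low surrogate half of this code point
}
html += char in charEncodings ? charEncodings[char] :
charCode > 127 ? "&#" + charCode + ";" : char;
lastChar = char;
}
return html;
}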
Now, just a comment. Without using monospace fonts, you'll lose some formatting. Consider how these lines of text with a monospace font form columns:
ten       seven spaces
eleven    four spaces
Without the monospaced font, you will lose the columns:
ten seven spaces
eleven four spaces
It seems that the algorithm to fix that would be very complex.

While this doesn't quite meet all your requirements (for one thing, it doesn't handle tabs), I've used the following gem, which adds a wordWrap() method to JavaScript strings, on a couple of occasions to do something similar to what you're describing, so it might be a good starting point for something that also does the additional things you want.
//+ Jonas Raoni Soares Silva
//# http://jsfromhell.com/string/wordwrap [rev. #2]
// String.wordWrap(maxLength: Integer,
// [breakWith: String = "\n"],
// [cutType: Integer = 0]): String
//
// Returns a string with the extra characters/words "broken".
//
// maxLength maximum amount of characters per line
// breakWith string that will be added whenever one is needed to
// break the line
// cutType 0 = words longer than "maxLength" will not be broken
// 1 = words will be broken when needed
// 2 = any word that trespasses the limit will be broken
String.prototype.wordWrap = function(m, b, c){
var i, j, l, s, r;
if(m < 1)
return this;
for(i = -1, l = (r = this.split("\n")).length; ++i < l; r[i] += s)
for(s = r[i], r[i] = ""; s.length > m; r[i] += s.slice(0, j) + ((s = s.slice(j)).length ? b : ""))
j = c == 2 || (j = s.slice(0, m + 1).match(/\S*(\s)?$/))[1] ? m : j.input.length - j[0].length
|| c == 1 && m || j.input.length + (j = s.slice(m).match(/^\S*/)).input.length;
return r.join("\n");
};
I'd also like to point out that, in general, you'd want to use a monospaced font if tabs are involved, because the width of words varies with the proportional font used (making the results of using tab stops very font-dependent).
Update: Here's a slightly more readable version courtesy of an online javascript beautifier:
String.prototype.wordWrap = function(m, b, c) {
var i, j, l, s, r;
if (m < 1)
return this;
for (i = -1, l = (r = this.split("\n")).length; ++i < l; r[i] += s)
for (s = r[i], r[i] = ""; s.length > m; r[i] += s.slice(0, j) + ((s =
s.slice(j)).length ? b : ""))
j = c == 2 || (j = s.slice(0, m + 1).match(/\S*(\s)?$/))[1] ? m :
j.input.length - j[0].length || c == 1 && m || j.input.length +
(j = s.slice(m).match(/^\S*/)).input.length;
return r.join("\n");
};
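If you do switch to a monospaced font as suggested above, tabs can be expanded to real tab stops instead of a fixed run of spaces, so text after a tab lines up in columns. A minimal sketch; expandTabs and its default stop width of 4 are assumptions, and every character is assumed to occupy one column.

```javascript
// Hypothetical helper: expand each tab to the next tab stop (a multiple of
// tabSize columns), counting columns from the start of every line.
// Assumes each code point occupies exactly one column.
function expandTabs(text, tabSize = 4) {
  return text.split("\n").map((line) => {
    let out = "";
    let col = 0;
    for (const ch of line) {
      if (ch === "\t") {
        const n = tabSize - (col % tabSize); // spaces needed to reach the next stop
        out += " ".repeat(n);
        col += n;
      } else {
        out += ch;
        col += 1;
      }
    }
    return out;
  }).join("\n");
}

console.log(JSON.stringify(expandTabs("a\tb\nabcd\te"))); // "a   b\nabcd    e"
```

The expansion would run before any space-to-&nbsp; conversion, so the resulting spaces are then handled like any others.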

It is very simple if you use the jQuery library in your project.
Just one line: add an asHtml extension to the String class, and then:
var plain='<a> i am text plain </a>'
plain.asHtml();
/* '&lt;a&gt; i am text plain &lt;/a&gt;' */
Demo: http://jsfiddle.net/abdennour/B6vGG/3/
Note: you do not need access to the DOM. Just use jQuery's builder pattern, $('<tagName />').
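Without jQuery, the same escaping can be done with a few string replacements. A minimal sketch, not the author's extension: escapeHtml is a hypothetical name, and it covers only &, < and >, which is enough for text placed inside an element but not inside attribute values.

```javascript
// Hypothetical plain-JavaScript stand-in for the jQuery asHtml() extension:
// escape the characters that are special in HTML element content.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // first, so the entities added below are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

console.log(escapeHtml("<a> i am text plain </a>")); // &lt;a&gt; i am text plain &lt;/a&gt;
```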

Related

Javascript - Can't copy ASCII code 13 (Carriage Return)

(Q) How to copy (thousands of) carriage returns (not new lines) from one textarea to another?
I used the script below to check for "broken characters".
And apparently the carriage return is the only character that won't copy correctly.
I would like to avoid substituting.
(Edit 1)
Carriage returns (code 13) automatically get converted to "\n".
"\n".charCodeAt(0); returns 10.
I need it to return 13.
(Q) Is there a way to convert all carriage returns that got converted to new lines back to carriage returns, without converting new lines that are not converted from a carriage return?
(Edit 2)
It seems like I will have to use a substitute for carriage returns.
(Q) Any suggestions?
function getListOfChars()
{
let arrayOfChars = [];
for(let charCode = 0; charCode < 65536 /*1114112*/; charCode++)
{
arrayOfChars.push(String.fromCharCode(charCode));
//arrayOfChars.push(String.fromCodePoint(charCode));
}
return arrayOfChars;
}
function getBrokenChars()
{
let listOfChars = getListOfChars();
let listOfBrokenChars = [];
let char;
let textareaValue;
let textareaValueCharCode;
for(let x = 0; x < listOfChars.length; x++)
{
char = listOfChars[x];
document.getElementById('textarea').value = char;
textareaValue = document.getElementById('textarea').value;
textareaValueCharCode = textareaValue.charCodeAt(0);
//textareaValueCharCode = textareaValue.codePointAt(0);
if(x !== textareaValueCharCode)
{
listOfBrokenChars.push(char);
console.log("\"" + char + "\"" + " (" + x + ")" + " -> " + "\"" + textareaValue + "\"" + " (" + textareaValueCharCode + ")");
}
}
return listOfBrokenChars;
}
let brokenChars = getBrokenChars();
<!DOCTYPE html>
<html>
<head>
<title>Fix Bug Char Codes</title>
</head>
<body>
<textarea id="textarea">a</textarea>
</body>
<script src="fixBugCharCodes.js"></script>
</html>
You can't copy \r into a textarea (at least in Firefox, where I tested it). Because browsers are cross-platform and some operating systems use a carriage return as the newline character, browsers replace it with a newline character, '\n', so JavaScript sees it as a newline.
Similarly, if you put an MS-DOS newline pair CRLF ('\r\n') into a textarea, browsers convert the pair to a single newline character.
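If you do go the substitution route from Edit 2, one option, assuming you control both the code that fills the textarea and the code that reads it back, is to swap each CR for a placeholder the textarea leaves alone and restore it afterwards. The helper names and the placeholder U+E00D (a Private Use Area code point) are arbitrary choices, and normalizeLikeTextarea only models the conversion described above; it does not touch a real textarea.

```javascript
// Hypothetical placeholder: any code point that cannot occur in your data works.
const CR_PLACEHOLDER = "\uE00D";

// Before writing to the textarea: hide every CR behind the placeholder.
function protectCarriageReturns(text) {
  return text.replace(/\r/g, CR_PLACEHOLDER);
}

// After reading .value back: turn the placeholders into CRs again.
function restoreCarriageReturns(text) {
  return text.split(CR_PLACEHOLDER).join("\r");
}

// Model of the browser behaviour described above:
// CRLF pairs and lone CRs both become a single LF.
function normalizeLikeTextarea(text) {
  return text.replace(/\r\n|\r/g, "\n");
}

const sample = "line1\rline2\r\nline3";
const stored = protectCarriageReturns(sample);    // what you put in the textarea
const readBack = normalizeLikeTextarea(stored);   // stand-in for reading .value
console.log(restoreCarriageReturns(readBack) === sample); // true
```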

Finding the index to a non-specified character

Let's say for example I have a string
thisIsThisTuesday Day
I want to find the index of all the capital letters, test if there is a space before it, and if not insert one. I would need the index of each one.
At least from what I can see, indexOf(String) will only produce the index of the first occurrence of the character T/t.
This :
for(i=0;i<str.length;i++){
let char=str[i];
if(isNaN(char*1)&&char==char.toUpperCase()){
y=str.indexOf(char);
console.log(char,y)
}
}
would produce the capital letters and their indexes, but will only display the first occurrence of the character in question. I feel pretty confident that the part I am missing is a for() loop to advance the index iteration, but it escapes me.
Thank you in advance!
You can use a regex:
It matches any non-whitespace character followed by a capital letter and replaces it by the two characters with a space between.
const str = "thisIsThisTuesday Day";
const newstr = str.replace(/([^ ])([A-Z])/g, "$1 $2");
console.log(newstr);
You can use the following regular expression:
/(?<=\S)(?=[A-Z])/g
The replace inserts a space at every position that is preceded by a non-space character and followed by a capital letter.
See example below:
let str = "thisIsThisTuesday Day";
const res = str.replace(/(?<=\S)(?=[A-Z])/g, ' ');
console.log(res);
Note: as pointed out, ?<= (positive lookbehind) is not available in all browsers; it was added in ES2018, so older engines reject it.
Actually, String.prototype.indexOf takes an optional second argument, fromIndex, the position at which to start searching. Take a look at: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/indexOf
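A sketch of that idea: restart indexOf() one position past each hit to collect every index of a character. indicesOf is a hypothetical helper name.

```javascript
// Hypothetical helper: collect every index of `ch` in `str` by restarting
// indexOf() one position past the previous hit (the second argument is fromIndex).
function indicesOf(str, ch) {
  const found = [];
  if (ch === "") return found; // an empty search string would match everywhere
  let i = str.indexOf(ch);
  while (i !== -1) {
    found.push(i);
    i = str.indexOf(ch, i + 1);
  }
  return found;
}

console.log(indicesOf("thisIsThisTuesday Day", "T")); // [6, 10]
```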
But if you just want to find all capital letters and prefix each with a space when one is not already there, there are many approaches, for example:
var str = "thisIsThisTuesday Day";
var ret = '';
for (var i=0; i<str.length; i++) {
var ch = str.charAt(i);
// ch is a capital letter only if lowercasing changes it; this skips spaces
// and digits, which a plain toUpperCase() comparison would also match
if (ch !== ch.toLowerCase()) {
if ((i > 0) && (str.charAt(i - 1) != " "))
ret += " ";
}
ret += ch;
}
After running this, ret will hold the value "this Is This Tuesday Day"
You could iterate over the string and check whether each character is a capital that is not already preceded by a space. Something like this:
const s = 'thisIsThisTuesday Day';
const format = (s) => {
let string = '';
for (let c of s) {
// add a space before a capital unless it starts the string or a space already precedes it
if (/[A-Z]/.test(c) && string !== '' && !string.endsWith(' ')) string += ' ';
string += c;
}
return string;
};
console.log(format(s));
Or, alternatively, with reduce:
const s = 'thisIsThisTuesday Day';
const format = (s) => s.split('').reduce((acc, c) =>
/[A-Z]/.test(c) && acc !== '' && !acc.endsWith(' ') ? acc + ` ${c}` : acc + c, '');
console.log(format(s));

Limit number of characters per line and wrap while preserving leading white space

My goal is:
Select a block of text (for this example, just a string).
limit number of characters per line.
preserve leading whitespace for each line and reapply it after text is wrapped.
I'm able to limit the number of characters per line correctly, but I'm having trouble with the white space, etc...
Any help would be appreciated
var str = `i am a string that has new lines and whitespace. I need to preserve the leading whitespace and add it back on after the string has been broken up after n characters.
This line has leading whitespace. Tttttt rrrrrr
ttgvgggjjj. Gyjfry bh jkkfrtuj hhdt iihdrtttg.
Here is another line. Hjkkl gggdetu jcfgjbfftt.
This line has no leading whitespace, so i dont need any reapplied. Jjjxsrg bjlkdetyhk llhfftt`;
function addNewlines(str) {
var result = '';
while(str.length > 0) {
result += str.substring(0, 25) + '<br />';
str = str.substring(25);
}
return result;
}
var newStr = addNewlines(str).toString();
document.getElementById("result").innerHTML = newStr;
Should end up looking something like this:
i am a string that has ne
w lines and whitespace. I
need to preserve the lea
ding whitespace and add i
t back on after the strin
g has been broken up afte
r n characters.
This line has leading
whitespace. Tttttt rr
rrrr ttgvgggjjj. Gyjf
ry bh jkkfrtuj hhdt i
ihdrtttg. Here is ano
ther line. Hjkkl gggd
etu jcfgjbfftt.
This line has no leading
whitespace, so i dont n
eed any reapplied. Jjjx
srg bjlkdetyhk llhfftt
Sometimes, when dealing with a new algorithm, it is much easier to use two or more passes, so you think about how it works in steps instead of everything at once.
I have a working implementation with two passes: first I join the paragraph lines, and in the second pass I perform the actual splitting.
I'm sure there's a better approach, but this works and is fully commented.
I wasn't sure which version of ES you target, so I made it ES5-compatible.
https://jsfiddle.net/2ngtj3aj/
// Divides a string into chunks of specific size
function chunk(str, size) {
var chunks = [];
while(str) {
chunks.push(str.substring(0, size));
str = str.substring(size);
}
return chunks;
}
// Removes all spaces from the left of a string
function trimLeft(str) {
while(str.substr(0,1) == " ") {
str = str.substr(1);
}
return str;
}
// Repeats a character n times
function repeat(c, n) {
return Array(n + 1).join(c);
}
function addNewlines(str) {
var MAX_COLS = 25; // maximum columns of the text
var DEFAULT_LEADING = 3; // default leading to reapply
var MIN_LEADING = 1; // minimum amount of spacing to be considered a paragraph
var CR = "\n";
var result = '';
var leading = 0;
var chunks = [];
var formattedLines = []; // store the intermediary lines
var startLeadingSpaceLine = -1; // where a paragraph starts
var i, l; // counters
var lines = str.split(CR); // input lines
// In the first pass, we join the paragraph lines
for (i = 0; i < lines.length; i++) {
l = lines[i];
// If line is empty, we don't use it
if (l.trim() == "") continue;
if (l.substr(0, MIN_LEADING) == repeat(" ", MIN_LEADING)) {
// If line has leading whitespace, remove the leading space
l = trimLeft(l);
if (startLeadingSpaceLine > -1) {
// If we are already on a paragraph,
// we don't overwrite the flag
} else {
// But if this is the first line of a paragraph,
// we set a flag to allow joining this line with the next one
// if that contains indentation as well
startLeadingSpaceLine = i;
}
// If we are on a paragraph, we don't add this line to the array,
// first we need to wait to see if we have more lines in the paragraph
// We also update the line in the array with the whitespace removed
lines[i] = l;
continue;
} else {
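// If the line has no leading whitespace, we check whether we have just
// finished a paragraph
if (startLeadingSpaceLine > -1) {
// If we have, we need to add the previous lines to the array
// Note: if we want to leave a space between lines, we need to use
// join(' ') instead of join('')
var paragraphLines = lines.slice(startLeadingSpaceLine, i).join('');
// We add the whitespace we like
paragraphLines = repeat(" ", DEFAULT_LEADING) + paragraphLines;
formattedLines.push(paragraphLines);
// Reset the flag so the next indented block starts a new paragraph
startLeadingSpaceLine = -1;
}
}
formattedLines.push(l);
}
// A paragraph that runs to the end of the input still needs to be added
if (startLeadingSpaceLine > -1) {
formattedLines.push(repeat(" ", DEFAULT_LEADING) + lines.slice(startLeadingSpaceLine).join(''));
}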
// Now we parse again the lines, this time we will divide
// the lines into chunks
for (i = 0; i < formattedLines.length; i++) {
l = formattedLines[i];
// Now check against DEFAULT_LEADING since we have already changed
// the indentation
if (l.substr(0, DEFAULT_LEADING) == repeat(" ", DEFAULT_LEADING)) {
// If line has leading whitespace, remove the leading space
// We added it before just to be able to detect the paragraph.
l = trimLeft(l);
// Divide the line into chunks. We take into account the space
// we have removed, otherwise the paragraph will bleed to the
// right.
l = chunk(l, MAX_COLS - DEFAULT_LEADING);
// We add leading space to all paragraph lines
for(var j = 0; j < l.length; j++) {
l[j] = repeat(" ", DEFAULT_LEADING) + l[j];
}
// Optional: we add blank lines between paragraphs
l = [" "].concat(l).concat([" "]);
} else {
// If we have a simple line, just divide it into chunks
l = chunk(l, MAX_COLS);
}
// Join the lines with newlines and add to the result
l = l.join(CR);
result += l + CR;
}
return result;
}
var process = function() {
var newStr = addNewlines(input.value).toString();
document.getElementById("result").innerHTML = newStr;
}
var input = document.getElementById("input");
input.addEventListener("change", process);
input.addEventListener("keyup", process);
process();
<h3>RESULTS</h3>
<textarea id="input" rows="10" cols="80">i am a string that has new lines and whitespace. I need to preserve the leading whitespace and add it back on after the string has been broken up after n characters.
This line has leading whitespace. Tttttt rrrrrr
ttgvgggjjj. Gyjfry bh jkkfrtuj hhdt iihdrtttg.
Here is another line. Hjkkl gggdetu jcfgjbfftt.
This line has no leading whitespace, so i dont need any reapplied. Jjjxsrg bjlkdetyhk llhfftt</textarea>
<pre id="result"></pre>
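If indented lines do not need to be joined into paragraphs first, a single pass per line is enough: measure the line's leading whitespace, cut the rest into chunks of the width minus that indent, and reapply the indent to each chunk. A minimal sketch under that assumption; wrapPreservingIndent is a hypothetical name, and like the question's code it cuts mid-word.

```javascript
// Hypothetical per-line wrapper: every chunk cut from a line is prefixed
// with that line's own leading whitespace, and the indent counts toward `width`.
// Tabs count as one character; expand them first if that matters.
function wrapPreservingIndent(text, width) {
  const out = [];
  for (const line of text.split("\n")) {
    const indent = line.match(/^\s*/)[0]; // leading whitespace of this line
    const body = line.slice(indent.length);
    const room = Math.max(1, width - indent.length); // columns left for text
    if (body === "") {
      out.push(""); // keep blank lines as blank lines
      continue;
    }
    for (let i = 0; i < body.length; i += room) {
      out.push(indent + body.slice(i, i + room));
    }
  }
  return out.join("\n");
}

console.log(wrapPreservingIndent("abcdefg\n  xyzuvw", 4));
```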
General Logic
Your string already contains all whitespaces. If you add console.log(newStr) to your script and look into your console, you'll see that the whitespaces are already there.
You might want to remove all trailing whitespace (whitespace right before a newline character). You can do that by using replace with a regex: var newStr = addNewlines(str).toString().replace(/\s+(?=\n)/g, "");.
Additionally, since a tab character ("\t") counts as only one character but takes up much more space than the others, you might want to replace each tab with 3 or 4 spaces instead. Something like .replace(/\t/g, "    ")
Another thing to take into consideration are newlines that are already present before. You'll want to stop counting there and start a new counter after the already present newline.
Displaying inside a Textarea
var str = `i am a string that has new lines and whitespace. I need to preserve the leading whitespace and add it back on after the string has been broken up after n characters.
This line has leading whitespace. Tttttt rrrrrr
ttgvgggjjj. Gyjfry bh jkkfrtuj hhdt iihdrtttg.
Here is another line. Hjkkl gggdetu jcfgjbfftt.
This line has no leading whitespace, so i dont need any reapplied. Jjjxsrg bjlkdetyhk llhfftt`;
function addNewlines(str) {
var result = '';
str = str.replace(/\t/g, "    ");
while(str.length > 0) {
var nPos = str.indexOf("\n");
// break at an existing newline if it comes before the 25-character limit
var len = nPos >= 0 && nPos < 25 ? nPos + 1 : 25;
result += str.substring(0, len) + '\n';
str = str.substring(len);
}
return result;
}
var newStr = addNewlines(str).toString().replace(/\s+(?=\n)/g, "");
document.getElementById("result").value = newStr;
<textarea id="result"></textarea>
Displaying in HTML
If you want to display those whitespaces in HTML, then you can make use of the CSS Property white-space: pre.
var str = `i am a string that has new lines and whitespace. I need to preserve the leading whitespace and add it back on after the string has been broken up after n characters.
This line has leading whitespace. Tttttt rrrrrr
ttgvgggjjj. Gyjfry bh jkkfrtuj hhdt iihdrtttg.
Here is another line. Hjkkl gggdetu jcfgjbfftt.
This line has no leading whitespace, so i dont need any reapplied. Jjjxsrg bjlkdetyhk llhfftt`;
function addNewlines(str) {
var result = '';
str = str.replace(/\t/g, "    ");
while(str.length > 0) {
var nPos = str.indexOf("\n"); // str holds plain newlines, not "<br />"
var len = nPos >= 0 && nPos < 25 ? nPos + 1 : 25;
result += str.substring(0, len) + '\n';
str = str.substring(len);
}
return result;
}
var newStr = addNewlines(str).toString().replace(/\s+(?=\n)/g, "");
console.log(newStr);
document.getElementById("result1").innerHTML = newStr;
document.getElementById("result2").innerHTML = newStr;
document.getElementById("result3").innerHTML = newStr;
document.getElementById("result4").innerHTML = newStr;
document.getElementById("result5").innerHTML = newStr;
div {
font-family: monospace;
}
<h1>normal</h1>
<div id="result1" style="white-space: normal"></div>
<h1>pre</h1>
<div id="result2" style="white-space: pre"></div>
<h1>nowrap</h1>
<div id="result3" style="white-space: nowrap"></div>
<h1>pre-wrap</h1>
<div id="result4" style="white-space: pre-wrap"></div>
<h1>pre-line</h1>
<div id="result5" style="white-space: pre-line"></div>
Also, in your example you're using the tab character to indent your lines. If you want those removed as well, you'd have to remove all occurrences of them. You can do that with another regex and the replace method, like this: var newStr = addNewlines(str).toString().replace(/\s+(?=\n)/g, "").replace(/\t/g, "");.

adding a space to every space in a string, then cycling back around until length is met

I have the following while loop as part of my text justify function. The idea is that I have text strings (str) that need to be justified (spaces added to existing spaces in between words) to equal to a given length (len)
The catch is I can only add one space to an existing space at a time before I iterate over to the next space in the string and add another space there. If that's it for all spaces in the string and it's still not at the required length, I cycle back over to the original space (now two spaces) and add another. Then it goes to the next space between words and so on and so on. The idea is that any spaces between words in the string should not have a differential of more than one space (i.e. Lorem---ipsum--dolor--sit, not Lorem----ipsum--dolor-sit)
From my research, I decided to use substring on the original string to add that first extra space, then increment the index, move to the next space in the string, and repeat the add. Here's my code:
var indexOf = str.indexOf(" ", 0);
if ( indexOf > -1 ) {
while ( indexOf > -1 && str.length < len ) {
//using a regexp to find a space before a character
var space = /\s(?=\b)/.exec(str);
str = str.substring(0, indexOf + 1) + " " + str.substring(indexOf + 1);
//go to next space in string
indexOf = str.indexOf(space, indexOf + 2);
if ( indexOf === -1 ) {
//loops back to beginning of string
indexOf = str.indexOf(space, 0);
}
}
}
finalResults.push(str);
This code works most of the time, but I noticed that there are instances where the cycle of spacing is not correct. For example, it generates the following string:
sit----amet,--blandit
when the correct iteration would be
sit---amet,---blandit
Any assistance in making this code properly iterate over every space (to add one space) in the string once, then cycling back around to the beginning of the string to start over until the desired length is achieved would be most appreciated.
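Why the loop above drifts: once the first gap has been widened to three spaces, str.indexOf(space, indexOf + 2) starts inside that same run and finds its third space, so the first gap gets a fourth space before the second gap gets its third. Keeping the round-robin idea but tracking gaps by position avoids this. A minimal sketch; justifyRoundRobin is a hypothetical name, and it assumes the whitespace between words can first be collapsed to single spaces.

```javascript
// Hypothetical helper: the question's round-robin idea, but tracking gaps
// by position instead of searching the string for " " each time.
function justifyRoundRobin(str, len) {
  const words = str.trim().split(/\s+/);
  if (words.length < 2) return words.join(""); // no gap to widen
  const gaps = new Array(words.length - 1).fill(1); // one space per gap to start
  let total = words.join(" ").length;
  // Visit gaps left to right, adding one space each, and wrap around to the
  // first gap until the target length is reached.
  for (let g = 0; total < len; g = (g + 1) % gaps.length) {
    gaps[g]++;
    total++;
  }
  let out = words[0];
  for (let i = 1; i < words.length; i++) {
    out += " ".repeat(gaps[i - 1]) + words[i];
  }
  return out;
}

console.log(justifyRoundRobin("sit amet, blandit", 21).replace(/ /g, "-")); // sit---amet,---blandit
```

Because every gap lives in the gaps array and is visited in turn, no gap can end up more than one space wider than another.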
I think it's more efficient to compute the number of spaces required up front.
var s = "today is a friday";
var totalLength = 40;
var tokens = s.split(/\s+/);
var noSpaceLength = s.replace(/\s+/g,'').length;
var minSpace = Math.floor((totalLength - noSpaceLength)/(tokens.length-1));
var remainder = (totalLength - noSpaceLength) % (tokens.length-1);
var out = tokens[0];
for (var i = 1; i < tokens.length; i++) {
var spaces = (i <= remainder ? minSpace+1 : minSpace);
out += "-".repeat(spaces) + tokens[i];
}
$('#out').text(out);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="out"></div>
This solution
splits the string (s) into words in an array (a)
finds the number of spaces to be added between all words (add)
finds the remainder of spaces to be added between first words (rem)
then sticks the words with add spaces + one if rem is not exhausted
Code
var s = "Caballo sin Nombre"; // assume one space between words
var len = 21; // desired length
var need = len - s.length;
var a = s.split(/ /); // split s
// need>0 and at least two words
if (need > 0 && a.length>1) {
var add = Math.floor(need / (a.length-1)) + 1; // all spaces need that (+existing)
var rem = need % (a.length-1); // remainder
var sp = '';
while (add-- > 0) sp += ' ';
// replace
var i,res = ''; // result
for (i=0 ; i<a.length-1 ; i++) {
res += a[i] + sp;
if (rem-- > 0) res += ' '; // remainder
}
res += a[i];
s = res;
}
console.log("'" + s + "' is " + s.length + " chars long.");
This function adds the spaces using a global replace, carefully limiting the text size.
function expand (txt, colwidth) {
txt = txt.replace (/\s\s+/g, ' '); // Ensure no multiple spaces in txt (g: every run, not just the first)
if (txt.indexOf (' ') < 0) return txt; // no gaps to widen; avoids an endless loop
for (var spaces = ' ', // Spaces to check for
limit = colwidth - txt.length; // number of additional spaces required
limit > 0; // do while limit is positive
spaces += ' ') // add 1 to spaces to search for
txt = txt.replace (RegExp (spaces, 'g'),
function (r) {
// While spaces are still needed, widen this gap by one and count it
if (limit > 0) {
limit--;
return r + ' ';
}
return r;
});
return txt;
}
for (var w = 21; w--;) console.log (expand ('this is a test.', w));
Shows this on console (widths 20 down to 0):
this   is   a  test.
this   is  a  test.
this  is  a  test.
this  is  a test.
this  is a test.
this is a test.   (16 times, for widths 15 down to 0)

Regexp search not surrounded by

I want to find all occurences of % that are not within quotation characters.
Example> "test% testing % '% hello' " would return ["%","%"]
Looking at another stack overflow thread this is what I found:
var patt = /!["'],*%,*!['"]/g
var str = "testing 123 '%' % '% ' "
var res = str.match(patt);
However, this gives me null. Do you have any tips on what I should do?
You could try the below positive lookahead assertion based regex.
> var s = "test% testing % '% hello' "
> s.match(/%(?=(?:[^']*'[^']*')*[^']*$)/g)
[ '%', '%' ]
> var str = "testing %"
undefined
> str.match(/%(?=(?:[^']*'[^']*')*[^']*$)/g)
[ '%' ]
> var str1 = "testing '%'"
undefined
> str1.match(/%(?=(?:[^']*'[^']*')*[^']*$)/g)
null
Try this:
var patt= /[^"'].*?(%).*?[^'"]/g ;
var str = "testing 123 '%' % '% ' "
var res = str.match(patt);
console.dir(res[1]); // note: with the g flag, match() returns whole matches, not capture groups
Explanation:
[^"'] - any character except " or '
.*? - any characters (except newline), zero or more times, non-greedy.
Update
Actually, you must check that there are no quotes behind or ahead of %.
But:
JavaScript regular expressions did not support lookbehinds when this was written (they were added in ES2018)
So you have no way to identify " or ' preceding the % sign unless more restrictions are applied.
I'd suggest doing the search in PHP or another language (where lookbehind is supported), or imposing more conditions.
Since I'm not a big fan of regular expressions, here's my approach.
What is important about my answer: if there is an unclosed trailing quote in the string, the answers above won't work. In other words, my answer also works when there is an odd number of quotes.
function countUnquoted(str, charToCount) {
var i = 0,
len = str.length,
count = 0,
suspects = 0,
char,
flag = false;
for (; i < len; i++) {
char = str.substr(i, 1);
if ("'" === char) {
flag = !flag;
suspects = 0;
} else if (charToCount === char && !flag) {
count++;
} else if (charToCount === char) {
suspects++;
}
}
//this way we can also count occurences in such situation
//that the quotation mark has been opened but not closed till the end of string
if (flag) {
count += suspects;
}
return count;
}
As far as I believe, you wanted to count those percent signs, so there's no need to put them in an array.
In case you really, really need to fill this array, you can do it like that:
function matchUnquoted(str, charToMatch) {
var res = [],
i = 0,
count = countUnquoted(str, charToMatch);
for (; i < count; i++) {
res.push(charToMatch);
}
return res;
}
matchUnquoted("test% testing % '% hello' ", "%");
Trailing quote
Here's a comparison of a case when there is a trailing ' (not closed) in the string.
> var s = "test% testing % '% hello' asd ' asd %"
> matchUnquoted(s, '%')
['%', '%', '%']
>
> // Avinash Raj's answer
> s.match(/%(?=(?:[^']*'[^']*')*[^']*$)/g)
['%', '%']
Use this regex: (['"]).*?\1|(%) and the second capture group will have all the % signs that are not inside single or double quotes.
Breakdown:
(['"]).*?\1 captures a single or double quote, followed by anything (lazy) up to a matching single or double quote
|(%) captures a % only if it wasn't slurped up by the first part of the alternation (i.e., if it's not in quotes)
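In JavaScript, the second capture group of that regex can be collected with String.prototype.matchAll (ES2020). A minimal sketch; the function name unquotedPercents is hypothetical.

```javascript
// Hypothetical helper: the first alternative consumes any quoted span, so
// capture group 2 is only set for a % that is outside quotes.
function unquotedPercents(str) {
  const re = /(['"]).*?\1|(%)/g;
  return Array.from(str.matchAll(re), (m) => m[2]).filter((g) => g !== undefined);
}

console.log(unquotedPercents("test% testing % '% hello' ")); // [ '%', '%' ]
```

With an unclosed trailing quote, as in the comparison above, the quoted alternative finds no closing quote and fails, so a % after it is still counted: for "test% testing % '% hello' asd ' asd %" this returns three % signs.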
