Differentiate normal text from text in quotes

For a project that embeds shortened JS code in a webpage, I want to know whether a piece of text, taken from the value of a textarea on the page, is in quotes or not.
I already have this RegExp:
/(?:^|")([^"]*)(?:$|")/
It behaves weirdly when I run .exec() on it in about:blank with something like "\"console\" console \"asdf\" asdf \"consolea\" consolea" (the result contains only "\"" and ""), but I think that's because I don't really understand what the resulting data means, am not using it correctly, or don't have the right expression.
In the abstract, this is what I want my code to do:
1. [Completed] Get the stringified value of the textarea on the page by its ID.
2. If console, without any extra characters, appears before the quoted text, get the quoted text minus its quotes (inside a string, e.g. "text" instead of "\"text\"") just after it, regardless of new-lines, provided that its starting quote comes before anything else after console.
3. Log the refined string to the console via console.log.
Code:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Custom Programming Language</title>
</head>
<body>
<textarea id="code"></textarea>
<br>
<button id="run">Run!</button>
<script>
var code = document.getElementById("code").value.toString();
// etc.
</script>
</body>
</html>

If you already have an algorithm in mind, just map it to a regular expression. Let's break step 2 down:
"If console without any extra characters" - match (?:^|\s+)console\s+ ("console" at the start of a line or preceded by one or more whitespace characters, and followed by whitespace).
"before the quoted text minus its quotes" - match \\?"(.+?)\\?" (anything wrapped in quotes as a capturing group, quantified lazily so that it stops at the first closing quote). If you only allow escaped quotes, remove the ? quantifier after each \\.
"regardless of new-lines" - \s+ already matches line breaks between console and the opening quote; set the m flag so that ^ matches at the start of every line.
All of the above combined, plus the g flag to find every occurrence, yields /(?:^|\s+)console\s+\\?"(.+?)\\?"/gm
(() => {
const code = document.querySelector("#code");
const btn = document.querySelector("#run");
code.value = `console \"test\"
some other code here
console \"another test\"
`;
const regex = /(?:^|\s+)console\s+\\?"(.+?)\\?"/gm;
btn.addEventListener("click", () => {
const { value } = code;
[...value.matchAll(regex)].forEach(m => console.log(m[1]));
});
})();
textarea {
width: 50vw;
height: 50vh;
}
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Custom Programming Language</title>
</head>
<body>
<textarea id="code"></textarea>
<br>
<button id="run">Run!</button>
</body>
</html>
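The pattern from this answer can also be tried without a page. The following is a minimal sketch, assuming the textarea value has already been read into a string; the helper name extractConsoleStrings is hypothetical and not part of the original answer.

```javascript
// Hypothetical helper: collects every quoted string that directly follows
// a standalone `console` token, using the regex from the answer above.
function extractConsoleStrings(source) {
  const regex = /(?:^|\s+)console\s+\\?"(.+?)\\?"/gm;
  // matchAll requires the g flag; m[1] is the text between the quotes.
  return [...source.matchAll(regex)].map((m) => m[1]);
}

// The test string from the question: only the quoted text that follows the
// bare word `console` is picked up.
const sample = '"console" console "asdf" asdf "consolea" consolea';
console.log(extractConsoleStrings(sample)); // logs an array containing only "asdf"
```

Note that .+? does not cross line breaks, so a quoted string that itself spans several lines would not be captured by this pattern.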

Related

Javascript innerText - carriage return - regex not working

I am trying to parse some text, and innerText is not outputting the newline characters. I have set white-space on the element, so I'm not sure why it's not working.
The parts variable should contain 3 strings in this case, but I only get one.
I am sure I'm missing something trivial.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Document</title>
</head>
<style>
#test1 {
white-space: pre-wrap;
}
</style>
<body>
<div id='test1'>1
00:00:13,513 --> 00:00:16,607
a

2
00:00:18,218 --> 00:00:20,516
b

3
00:00:22,355 --> 00:00:24,880
c
</div>
</body>
<script>
var test1 = document.getElementById('test1').innerText;
<!-- This is not working, parts should have 3 elements, but it cannot find newline character so only has one element -->
var parts = test1.split(/\r?\n\s+\r?\n/g);
console.log(parts)
</script>
</html>
Update
Thanks for the answers, but my string is a little more complicated than a b c. I updated the code with a more realistic example. The regex is taken from some SRT file parsing code, and it works if I upload the file, but not when I paste in the text. What's wrong with the HTML? I am looking at the regex101 site to see if I can figure this out.

Your regular expression doesn't match what is in the text. \r?\n\s+\r?\n means:
\r? - Optionally match a carriage return
\n - Match a newline (line feed)
\s+ - Match one or more whitespace characters
\r? - Optionally match a carriage return
\n - Match a newline (line feed)
It requires a newline, followed by at least one whitespace character, followed by another newline. But since the input text has no blank lines between its lines, nothing gets split.
To match full lines, I'd just split by \n instead, trim each string, and filter out the empty ones:
const text = `
a
b
c
`;
const result = text
.split('\n')
.map(str => str.trim())
.filter(Boolean);
console.log(result);
If you wanted to do this with a single regular expression, match \S (non-space), followed by as many characters as you can until getting to the end of the line:
const text = `
a
b
c
`;
const result = text.match(/\S(?:.*\S)?/g);
console.log(result);
Given the changed text, if you want to split it into its blocks instead, remove the \s+ from your regex, since there are no whitespace characters between the two consecutive newlines:
const text = `
1
00:00:13,513 --> 00:00:16,607
a

2
00:00:18,218 --> 00:00:20,516
b

3
00:00:22,355 --> 00:00:24,880
c
`;
console.log(
text.split(/(?:\r?\n){2}/)
);
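To see why the original pattern can work on an uploaded file but not on pasted text, it may help to run it against both kinds of line endings. The sketch below assumes the file uses CRLF line endings and the pasted text uses LF only; the sample strings and the splitBlocks name are made up for illustration.

```javascript
const original = /\r?\n\s+\r?\n/;

// Assumed inputs: the same two blocks with CRLF and with LF line endings.
const crlf = "1\r\na\r\n\r\n2\r\nb";
const lf = "1\na\n\n2\nb";

// With CRLF, \s+ can consume the extra \r between the two \n characters.
console.log(crlf.split(original).length); // 2
// With LF only, nothing is left for \s+ to match, so there is no split.
console.log(lf.split(original).length); // 1

// Hypothetical helper: splits on an empty line with either line ending.
function splitBlocks(text) {
  return text.trim().split(/(?:\r?\n){2,}/);
}

console.log(splitBlocks(lf).length); // 2
console.log(splitBlocks(crlf).length); // 2
```

This sketch only treats completely empty lines as separators; a line containing spaces or tabs would need a looser pattern.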

Just use
var parts = test1.split(/\s+/g).filter(n => n);
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Document</title>
</head>
<style>
#test1 {
white-space: pre-wrap;
}
</style>
<body>
<div id='test1'>
a b c
</div>
</body>
<script>
var test1 = document.getElementById('test1').innerText;
// Split on any run of whitespace and drop the empty strings
var parts = test1.split(/\s+/g).filter(n => n);
console.log(parts)
</script>
</html>

I found out that, with the SRT subtitle file format, this regex needs a CR (carriage return) in the text to work.
When you put text in a div, the CR characters are dropped (line endings are normalized to LF), so innerText never contains them, and that's why this regex doesn't work.
When you do:
var parts = test1.split('\r')
It finds no separator and returns the whole string as a single element, because the HTML hides the carriage return characters.
I decided to encode my string in base64 and store it in an input, instead of storing it in the div as is.

unicode chars give "unterminated string literal" in js

This error is generated when my HTML has some weird characters seen as a whitespace.
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title></title>
</head>
<body>
<p>Some  Text</p>
</body>
</html>
Note that there is a character between Some and Text, but it is not visible here. I need to pass this to a function, toJson(), but it fails with an error saying unterminated string literal.
Everything works fine when I use plain text instead, for example:
Some<space>Text works fine.
I've tried all the replacement functions I found while searching for this problem:
1) var re = /(?![\x00-\x7F]|[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF7][\x80-\xBF]{3})./g;
params.body_html = html.replace(re, '');
angular.toJson(params); // gives error
2) params.body_html.replace(/\uFFFD/g, '');
angular.toJson(params); // gives error
I don't know what character this is (maybe Unicode). When I copy it into an Emacs file, it shows up as �򠠨.
Note: You see this character as a red dot when you edit this question and open the snippet editor for the above HTML.
Any hints or ideas on how I can make this work?

Got this working with:
params.body_html = params.body_html.replace(/\u2028/g, '');
angular.toJson(params); //works fine.
Thanks to @Gothdo for providing the character link.
But the problem is that this only helps if the HTML contains just this particular Unicode character. Is there a function that replaces or trims all such Unicode characters?
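U+2028 (line separator) and its sibling U+2029 (paragraph separator) were treated as line terminators inside JavaScript string literals before ES2019, which is why embedding them in a literal can trigger this error. As a sketch, the replacement can cover both characters; stripping every non-ASCII character is also possible, but it discards legitimate text such as accented letters. Both helper names below are hypothetical.

```javascript
// Hypothetical helper: removes only the two Unicode line terminators that
// older engines reject inside string literals.
function stripLineSeparators(text) {
  return text.replace(/[\u2028\u2029]/g, "");
}

// Hypothetical helper: removes every character outside the ASCII range.
// This is a blunt tool and also removes characters such as accented letters.
function stripNonAscii(text) {
  return text.replace(/[^\x00-\x7F]/g, "");
}

// Assumed input: the paragraph from the question with U+2028 in the middle.
const html = "<p>Some\u2028Text</p>";
console.log(stripLineSeparators(html)); // <p>SomeText</p>
console.log(stripNonAscii(html)); // <p>SomeText</p>
```

If the character should act as a word break, replacing it with a space instead of an empty string keeps "Some" and "Text" apart.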

Display path with backslash (javascript)

I am trying to display a path in a simple JavaScript alert:
<html xmlns="http://www.w3.org/1999/xhtml" lang="fr" xml:lang="fr">
<head>
</head>
<body>
<div onClick=myFunction('D:\user\myself\dos')>
clic here
</div>
<SCRIPT LANGUAGE = "JAVASCRIPT">
function myFunction(p) {
alert(p);
}
</SCRIPT>
But it does not display the backslashes.
I suppose I should replace every "\" with "\\", but I can't find a way to do it.
(I tried p = p.replace(/\\/g, '\\\\'); and a lot of other syntaxes, but none of them worked.)
Do you have any idea how to deal with that?
EDIT :
The path comes out from a function and I can't edit it directly in "onClick"

The backslash '\' itself is used as the escape character in JavaScript strings.
So add one more backslash before every backslash you want to display.
If you cannot modify the path, add a new attribute to hold it and read that attribute inside the onClick handler.
Try the working snippet below:
function myFunction(elem) {
alert(elem.getAttribute('data-url'));
}
<html xmlns="http://www.w3.org/1999/xhtml" lang="fr" xml:lang="fr">
<head>
</head>
<body>
<div data-url="D:\user\myself\dos" onClick=myFunction(this)>
clic here
</div>
Update: Code snippet updated to allow displaying url without modifying string.

You just need to call your function with double the backslashes to escape the escape character:
myFunction('D:\\user\\myself\\dos')
Will this work in your case?
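If the path is produced by a function and then written into an inline handler, one option is to double the backslashes at the point where the markup is generated. A minimal sketch, assuming a hypothetical helper named escapeBackslashes:

```javascript
// Hypothetical helper: doubles every backslash so the result can be placed
// inside a quoted JavaScript string literal.
function escapeBackslashes(path) {
  return path.replace(/\\/g, "\\\\");
}

// In source code the path itself must already be written with escapes.
const path = "D:\\user\\myself\\dos";
console.log(path); // D:\user\myself\dos
console.log(escapeBackslashes(path)); // D:\\user\\myself\\dos

// Building the call for the inline handler from the question.
const handler = "myFunction('" + escapeBackslashes(path) + "')";
console.log(handler); // myFunction('D:\\user\\myself\\dos')
```

This only handles backslashes; a path containing quote characters would need those escaped as well.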

Why can't I get this entity code to display correctly in a browser?

I'm trying to have JavaScript write a UK pound symbol to a document, but its entity code is not being translated and is instead displayed as entered.
See this JSBin http://jsbin.com/orocox/1/edit
This is the JavaScript:
$("#price").text('&pound; 1.99');
This is the html:
<!DOCTYPE html>
<html>
<head>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.8/jquery.min.js"></script>
<meta charset=utf-8 />
<title>JS Bin</title>
</head>
<body>
<span id="price"></span>
</body>
</html>
This is the result:
'&pound(;) 1.99'
*Note that the parenthesis around the ';' are added by me to prevent the entity code from being translated on StackOverflow, but they do not exist in the actual output.
The result I want is:
'£ 1.99'.

Use a Unicode escape instead:
$("#price").text('\u00A3 1.99');
Explanation: &pound; is an HTML entity and is only decoded when the string is parsed as HTML. Since you are using .text(), the string is inserted as plain text, not HTML, so the entity is shown as typed. A Unicode escape is resolved by JavaScript itself, so it works in any string.
check this page's encoding reference : here

Try $("#price").html('&pound; 1.99'); instead.

Use the character itself:
$("#price").text('£ 1.99');
This is good for the readability of your code. How you type “£” depends on your editing environment. E.g., on Windows, you can produce it by typing Alt 0163 if you cannot find any more convenient way (depending on keyboard, keyboard layout, and editor being used).
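The escape sequence and the literal character produce the same string, so either form can be passed to .text(). A small sketch of that equivalence; the formatPrice helper is hypothetical and only there for illustration:

```javascript
// Hypothetical helper: prefixes an amount with the pound sign, written as a
// Unicode escape so the source file's encoding does not matter.
function formatPrice(amount) {
  return "\u00A3 " + amount.toFixed(2);
}

// The escape and the code point 163 (0xA3) denote the same character.
console.log("\u00A3" === String.fromCharCode(163)); // true
console.log(formatPrice(1.99)); // £ 1.99
```

Typing the character directly is equivalent as long as the file is saved and served with an encoding that matches the page's charset.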

regex - replace multi line breaks with single in javascript

this is some kind of variable content in javascript:
<meta charset="utf-8">
<title>Some Meep meta, awesome</title>
<-- some comment here -->
<meta name="someMeta, yay" content="meep">
</head>
I want to reduce runs of multiple line breaks (of unknown length) to a single line break while the rest of the formatting is maintained. This should be done in JavaScript with a regex.
I have trouble handling the tabs and keeping the formatting.

Try this:
text.replace(/\n\s*\n/g, '\n');
This basically looks for two line breaks with only whitespace in between. And then it replaces those by a single line break. Due to the global flag g, this is repeated for every possible match.
Edit, in reply to the comment "Is it possible to leave a double line break instead of a single one?":
Sure, the simplest way would be to look for three line breaks and replace them with two:
text.replace(/\n\s*\n\s*\n/g, '\n\n');
If you want to maintain the whitespace on one of the lines (for whatever reason), you could also do it like this:
text.replace(/(\n\s*?\n)\s*\n/g, '$1');
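As a quick check of the first pattern from this answer, here is a minimal sketch that wraps it in a function and runs it on a tab-indented sample. The function name collapseBlankLines and the sample string are assumptions made for illustration.

```javascript
// Hypothetical helper: collapses any run of line breaks that contains only
// whitespace in between into a single line break.
function collapseBlankLines(text) {
  return text.replace(/\n\s*\n/g, "\n");
}

// Assumed sample: blank lines (one of them containing spaces) between
// tab-indented lines.
const input =
  '<head>\n\n\n\t<title>Some title</title>\n  \n\t<meta charset="utf-8">\n</head>';

console.log(collapseBlankLines(input));
```

Because the match has to end on a line break, the indentation at the start of the following line is left in place.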

myText = myText.replace(/\n{2,}/g, '\n');

Given the following (remember to encode HTML entities such as <, > and (among others, obviously) &):
<pre>
<head>
<meta charset="utf-8">
<title>Some Meep meta, awesome</title>
<-- some comment here -->
<meta name="someMeta, yay" content="meep">
</head>
</pre>
<pre>
</pre>
The following JavaScript works:
var nHTML = document.getElementsByTagName('pre')[0].textContent.replace(/[\r\n]{2,}/g,'\r\n');
document.getElementsByTagName('pre')[1].appendChild(document.createTextNode(nHTML));

To replace all the extra line breaks and leave only one use:
myText = myText.replace(/\n\n*/g,'\r\n');
