Counting the number of lines in a Google Document

Problem:
I'd like to be able to count the number of lines in a Google Document. For example, the script should return 6 for the following text.
However, there doesn't seem to be any reliable way of extracting '\n' or '\r' characters from the text:
text.findText(/\r/g) //OR
text.findText(/\n/g)
The second line is not expected to work anyway because, according to the GAS documentation, "new line characters are automatically converted to \r".

If you are still looking for a solution, how about this answer? Unfortunately, I couldn't find a built-in method for retrieving the number of lines in a Google Document, so how about this workaround?
If the end of each line can be detected, the number of lines can be counted, so I tried to add an end marker to each line using OCR. There are probably several workarounds for this issue, so please think of this as one of them.
In Google Docs, when a sentence is wider than the page, it wraps to the next line automatically, but that wrap does not insert \r\n or \n. Only line breaks entered with the Enter key contain \r\n or \n. Therefore, the text retrieved from the document contains only the line breaks entered by the user. In your case, it seems that your document has line breaks only after "incididunt" and "consequat.", so counting them does not give 6.
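A small sketch of that consequence in plain JavaScript: counting break characters in the retrieved text only counts the lines the user ended with Enter, never the automatic wraps. It assumes the text has already been retrieved as a string (for example with the document body's getText()); the function name is hypothetical and the sample text is made up.

```javascript
// Counts lines separated by explicit breaks (\r\n, \r or \n).
// Automatic wraps are not characters in the text, so they are never counted.
function countExplicitLines(text) {
  if (text.length === 0) return 0;
  // Normalize every break style to "\n" before counting.
  const normalized = text.replace(/\r\n?/g, "\n");
  // Ignore one trailing break so "a\n" counts as a single line.
  const trimmed = normalized.endsWith("\n") ? normalized.slice(0, -1) : normalized;
  return trimmed.split("\n").length;
}

// Made-up text: one paragraph long enough to wrap on the page, then a short one.
const sample =
  "This first paragraph is long enough to wrap across several visual lines on the page." +
  "\r" +
  "Second paragraph.";
console.log(countExplicitLines(sample)); // 2
```

However many visual lines the first paragraph wraps into, the result stays 2, which is why an approach that sees the rendered layout (such as OCR) is needed.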
I thought OCR could be used in this situation. The flow is as follows.
1. Convert the Google Document to PDF.
2. Convert the PDF to text using OCR.
   - I selected "ocr.space" for OCR.
   - If you already know another OCR API, you can try that instead.
   - When I used the OCR of the Drive API, line breaks (\r\n or \n) were not added to the converted text, so I used ocr.space, which does add them.
3. Count the \n characters in the converted text.
   - This number is the number of lines.
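Step 3 can be sketched as a pure function over the parsed OCR response. It assumes the response shape used in the sample script (a ParsedResults array whose entries carry ParsedText, assumed here to be one entry per page); the function name is hypothetical and the mock response is made up for illustration.

```javascript
// Sums the "\n" characters over every page of an OCR response.
// A page with no line breaks contributes 0 (match() returns null in that case).
function countOcrLines(ocrResponse) {
  return ocrResponse.ParsedResults.reduce((total, page) => {
    const breaks = page.ParsedText.match(/\n/g);
    return total + (breaks ? breaks.length : 0);
  }, 0);
}

// Made-up two-page response in which every line ends with "\r\n".
const mock = {
  ParsedResults: [
    { ParsedText: "line one\r\nline two\r\n" },
    { ParsedText: "line three\r\n" },
  ],
};
console.log(countOcrLines(mock)); // 3
```

Summing over all pages rather than reading only the first entry keeps the count correct for documents longer than one page.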
The sample script for the above flow is as follows. To use it, please get an API key from "ocr.space": when you submit your information and email address in the form, you will receive an email containing the API key. Use that key in this sample script, and please check the API quota. I tested this with the Free plan.
Sample script:
var apikey = "### Your API key for using ocr.space ###";
var id = DocumentApp.getActiveDocument().getId();
// Export the active document as a PDF.
var url = "https://docs.google.com/feeds/download/documents/export/Export?id=" + id + "&format=pdf&access_token=" + ScriptApp.getOAuthToken();
var blob = UrlFetchApp.fetch(url).getBlob();
// Send the PDF to ocr.space; the API key goes in the "apikey" header.
var options = {method: "post", headers: {apikey: apikey}, payload: {file: blob}};
var ocrRes = JSON.parse(UrlFetchApp.fetch("https://api.ocr.space/Parse/Image", options).getContentText());
// ParsedResults has one entry per page: count the "\n" in each page and add them up.
var result = ocrRes.ParsedResults.reduce(function(total, e) {
  var breaks = e.ParsedText.match(/\n/g);
  return total + (breaks ? breaks.length : 0);
}, 0);
Logger.log(result);
Result:
When your sentences are used, the script returns 6.
Note:
Even if the last line of the document has no \r\n or \n, the converted text has \r\n at the end of every line.
In this case, the accuracy of the OCR is not important; what matters is that the line breaks are retrieved.
I tested this script with several documents, and in my environment it returned the correct number of lines. However, I'm not sure whether it works in your environment. If it doesn't, I'm sorry.

As you noted in the comments, there is no API to retrieve the number of lines in Google Docs. This is because the document is rendered dynamically on the client side, so the server doesn't know this number.
One possible solution is scraping the HTML of the Google Doc, because each line is rendered in its own div with the "kix-lineview" class. However, you will need to actually open the page in an iframe or headless browser and scroll through it page by page so that the lines render; only then can you count the divs.
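Once the lines have rendered, the counting step of this idea comes down to counting elements that carry the kix-lineview class. In a live page that would typically be document.querySelectorAll(".kix-lineview").length in the browser; the sketch below does the same over an HTML string you have already captured (for example from a headless browser). It assumes the class name quoted in the answer is still what the editor emits, which Google can change at any time; the function name and the sample markup are made up.

```javascript
// Counts elements whose class attribute contains the exact token "kix-lineview".
// A regex is only a rough stand-in for a real DOM query on this kind of markup.
function countLineViews(html) {
  let count = 0;
  // Capture the value of every class="..." attribute.
  for (const match of html.matchAll(/class\s*=\s*"([^"]*)"/g)) {
    // Compare whole tokens so "kix-lineview-content" is not counted.
    if (match[1].split(/\s+/).includes("kix-lineview")) count++;
  }
  return count;
}

const html =
  '<div class="kix-lineview"><span class="kix-lineview-content">A</span></div>' +
  '<div class="kix-lineview other">B</div>';
console.log(countLineViews(html)); // 2
```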

After publishing your Google Doc with «Publish to the web» in the «File» menu, use its URL in the following script:
var url = "https://docs.google.com/document/d/e/2PACX-1vSElK...iwUhaFo/pub";
var text = UrlFetchApp.fetch(url).getContentText();
var count = (text.match(/<\/br>/g) || []).length;
Logger.log(count.toString());
This only works if every line of your document ends in </br>, but you can add other variants as well:
var url = "https://docs.google.com/document/d/e/2PACX-1vSElK...iwUhaFo/pub";
var text = UrlFetchApp.fetch(url).getContentText();
// Count each kind of line-ending tag; "|| []" handles no matches (match() returns null).
var count1 = (text.match(/<\/br>/g) || []).length;
var count2 = (text.match(/<\/p>/g) || []).length;
var count3 = (text.match(/<hr>/g) || []).length;
var count = count1 + count2 + count3;
Logger.log(count);
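If you want to see how many lines each delimiter contributes, the three counts can also be taken in a single pass. This sketch keeps the answer's assumption that every line of the published HTML ends in one of these tags and uses the answer's tag list; the function name and the sample HTML are made up, and the page would still be fetched with UrlFetchApp.fetch(url).getContentText().

```javascript
// Tallies each line-ending tag from the answer in one regex pass (case-sensitive).
function countLineEndings(html) {
  const counts = { "</br>": 0, "</p>": 0, "<hr>": 0 };
  for (const [tag] of html.matchAll(/<\/br>|<\/p>|<hr>/g)) {
    counts[tag]++;
  }
  const total = counts["</br>"] + counts["</p>"] + counts["<hr>"];
  return { counts, total };
}

const page = "<p>one</p><p>two</br>three</p><hr>";
const result = countLineEndings(page);
console.log(result.total); // 4
```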

Related

How to perform normal and regex search on huge text lines using jquery

I have a huge amount of log data to show in the browser. To keep it looking clean, each line is rendered in its own pre tag, like this:
<div id="log_block">
<pre class="logs" id="line_1">00:00:00.001 INFO Current Directory C:/DATA</pre>
<pre class="logs" id="line_2">00:00:00.001 INFO No image info file found</pre>
<pre class="logs" id="line_3">00:00:00.001 INFO Command line: ls -l</pre>
.
.
<pre class="logs" id="line_10000">00:20:10.001 INFO Command line</pre>
</div>
The application has a search box where users can run either a normal phrase search or a regex search.
When a user searches for text, I use the following find function to show the searched word, hiding all the lines that do not contain it.
function find_searched_word() {
var val = $('#search_bar').val();
// Hide every line that does not contain the searched text.
$('#log_block pre:not(:contains("' + val + '"))').hide();
}
For regex search, I loop through each pre line, check whether it matches, accumulate the matching lines in a variable, and then render them.
var html_content = '';
$("#log_block pre").each(function(i){
// `this` is a plain DOM element, so wrap it in $() to call .html().
if ($(this).html().match(regex))
html_content = html_content + this.outerHTML;
});
For logs of up to about 20,000 lines the process is fast, but once they exceed 100,000 lines the search becomes very slow and the page sometimes freezes.
What is the best alternative for performing phrase and regex searches over 100,000+ lines of text rendered in the browser?
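The regex loop described in the question, run over plain strings instead of DOM nodes, might look like the sketch below. It assumes the log lines are available as an array of strings (for example kept in memory before rendering); filterLines is a hypothetical name. The RegExp is built without the g flag on purpose: with g, test() keeps state in lastIndex between calls and can skip matches.

```javascript
// Returns the indices of lines matching a plain phrase or a regex source string.
function filterLines(lines, query, useRegex) {
  // For phrase search, escape regex metacharacters so the query is matched literally.
  const source = useRegex ? query : query.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp(source); // no "g" flag: test() stays stateless
  const hits = [];
  for (let i = 0; i < lines.length; i++) {
    if (re.test(lines[i])) hits.push(i);
  }
  return hits;
}

const lines = [
  "00:00:00.001 INFO Current Directory C:/DATA",
  "00:00:00.001 INFO No image info file found",
  "00:00:00.001 INFO Command line: ls -l",
];
console.log(filterLines(lines, "Command line", false)); // [ 2 ]
console.log(filterLines(lines, "INFO (No|Current)", true)); // [ 0, 1 ]
```

Returning indices rather than HTML leaves the decision of how many matching lines to render (for example only the visible ones) to the caller.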

String from API response can't display line breaks, but a hard-coded string can

Good day,
I am using Angular 8 for my frontend and a Java API service for my backend.
I need to retrieve a string from the backend, and the string contains \n in between, for example:
"Instructions:\n1. Key in 122<16 digit \nreload pin># to reload.\n2. Press SEND/CALL from"
In my .ts file, I am setting this string value as follows:
this.str = this.apiRes.responseMsg1;
console.log("this.str : " + this.str);
This gives me Instructions:\n1. Key in 122<16 digit \nreload pin># to reload.\n2. Press SEND/CALL, so when I display it in HTML, it shows as a single line.
If I hard-code this string in a variable, for example:
this.str = "Instructions:\n1. Key in 122<16 digit \nreload pin># to reload.\n2. Press SEND/CALL from";
console.log("this.str : " + this.str);
It gives me:
Instructions:
1. Key in 122<16 digit
reload pin># to reload.
2. Press SEND/CALL from
Which is what I want.
I am not very familiar with Angular. I tried searching Google for an answer, but couldn't find anything related.
Why does this happen, and how can I display the API message with its line breaks?

HTML ignores \n line breaks. You basically have two options to fix this:
1. Replace the line breaks with <br> elements.
2. Add the CSS rule white-space: pre-wrap; to the HTML element that displays your string.
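One detail worth checking first: console.log prints real newline characters as line breaks (as the hard-coded case shows), so the API output in the question, which prints \n literally, most likely contains a backslash followed by n rather than a newline. Both options above only act on real newlines. A minimal sketch under that assumption converts those two-character sequences and then applies option 1, escaping HTML first so the message's own < and > show as text. The function name is hypothetical; in Angular the result would typically be bound with [innerHTML], and for option 2 only the first replace is needed.

```javascript
// Converts an API message into HTML with <br> line breaks.
// Assumption: the API sends the two characters "\" + "n" instead of a newline.
function toDisplayHtml(apiText) {
  // Turn literal "\n" sequences into real newline characters.
  const text = apiText.replace(/\\n/g, "\n");
  // Escape HTML special characters so "<16 digit ...>" is shown as text.
  const escaped = text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  // Replace each newline with a <br> element.
  return escaped.replace(/\n/g, "<br>");
}

const fromApi = "Instructions:\\n1. Key in 122<16 digit \\nreload pin># to reload.";
console.log(toDisplayHtml(fromApi));
// Instructions:<br>1. Key in 122&lt;16 digit <br>reload pin&gt;# to reload.
```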

How can I use a Mirth-Javascript to remove line breaks in HL7 messages?

An HL7 message comes into Mirth and throws a "processing" error. At the very bottom of the message, in Raw format, there is a partial line that has been separated from the line above it. I have to correct this manually every time. I am hoping to use Mirth JavaScript as a message filter that can fix this, so that everything flows without human intervention.
The message snippet below triggers the error. In this example, it is the very last line of the HL7 message.
OBX|68|FT|PT6663&IMP^PET/CT Imaging Whole Body||
||||||F|||202254836969552|||
Currently, my only fix is to open the HL7 message, go to the line break, and manually move the broken-off text back up onto the line above it, which is the segment it belongs to.
The HL7 message should look like this:
OBX|68|FT|PT1103&IMP^PET/CT Imaging Whole Body||||||||F|||20190327101958|||

This worked. Put the following into the preprocessor.
message = message.replace(/[\r\n]+(?![A-Z][A-Z][A-Z0-9]\|)/g, "");
return message;
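To see what that preprocessor line does, here it is wrapped in a function and run on a shortened, made-up message. The regex removes every run of line breaks that is not followed by something shaped like a segment header (two capital letters, a capital letter or digit, then |). The sketch assumes segments are separated by a single \r as HL7 specifies; with \r\n separators the regex can backtrack and remove only the \r before a real segment, so normalize line endings first in that case. joinBrokenSegments is a hypothetical name.

```javascript
// Removes line breaks that are not followed by an HL7 segment header such as "OBX|".
function joinBrokenSegments(message) {
  return message.replace(/[\r\n]+(?![A-Z][A-Z][A-Z0-9]\|)/g, "");
}

// Made-up message: a header, then the OBX segment from the question split in two.
const broken =
  "MSH|^~\\&|SENDER\r" +
  "OBX|68|FT|PT6663&IMP^PET/CT Imaging Whole Body||\r" +
  "||||||F|||202254836969552|||";
const joined = joinBrokenSegments(broken);
console.log(joined.split("\r").length); // 2
```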

Remove all line breaks in the channel's preprocessor or attachment script, and then insert them back based on the segment names.
The best way, though, would be to stop the system that generates the message from inserting line breaks into the OBX.5 field.
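The remove-then-reinsert approach can be sketched as follows. The list of segment names is an assumption you would adapt to your feed, and rebuildSegments is a hypothetical name. Because it splits before any occurrence of NAME|, it would also split inside field data that happens to contain, say, "PID|", which is one more reason fixing the sender is preferable.

```javascript
// Removes every line break, then starts a new segment (with \r) before each
// known segment name followed by "|", except at the very start of the message.
function rebuildSegments(message, segmentNames) {
  const flat = message.replace(/[\r\n]+/g, "");
  const boundary = new RegExp("(?!^)(?=(?:" + segmentNames.join("|") + ")\\|)", "g");
  return flat.replace(boundary, "\r");
}

// Made-up message with one stray "\n" inside the OBX segment.
const input = "MSH|^~\\&|SENDER\rPID|1\rOBX|68|FT|X||\n||||||F|||";
console.log(rebuildSegments(input, ["MSH", "PID", "OBX"]).split("\r").length); // 3
```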

Removing all line breaks is one approach, but it could cause problems later on. Instead, you could set up a replace script that, rather than '\n', searches for '|\n|' or a similar string. That would fix this particular problem as well as any other unwanted line breaks between vertical separators, though it wouldn't help if the line broke anywhere else, so keep that in mind.
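A sketch of that targeted replace, assuming the stray break always sits between two field separators as in the question's snippet; a break anywhere else is left alone, which is the limitation mentioned above. joinBetweenSeparators is a hypothetical name.

```javascript
// Joins line breaks that sit between two "|" characters, keeping both separators.
function joinBetweenSeparators(message) {
  return message.replace(/\|[\r\n]+\|/g, "||");
}

const msg = "OBX|68|FT|PT6663&IMP^PET/CT Imaging Whole Body||\n||||||F|||202254836969552|||";
console.log(joinBetweenSeparators(msg));
// OBX|68|FT|PT6663&IMP^PET/CT Imaging Whole Body||||||||F|||202254836969552|||
```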

Put this code snippet in your preprocessor script. It worked for me.
// Matches a line break followed by text that does not look like a segment header (three letters/digit + "|").
var strayBreak = /(\r\n|\r|\n)([^A-Z]|[A-Z][^A-Z]|[A-Z]{2}[^A-Z\d]|[A-Z]{2}[\d][^|]|[A-Z]{3}[^|])/i;
// Drop trailing line breaks at the very end of the message.
var newmessage = message.replace(/[\r\n]+$/, "");
while (strayBreak.test(newmessage)) {
// Replace the stray break with the HL7 formatted-text line break \.br\ and keep the characters after it ($2).
newmessage = newmessage.replace(strayBreak, "\\.br\\$2");
}
return newmessage;

Mirth's processor expects the first three characters of every line to be a valid HL7 segment name; otherwise, Mirth throws an error.
To remove the invalid line breaks from the HL7 message, follow these steps:
1. Go to Channel --> Scripts --> Preprocessor.
2. Paste the code below above the "return message;" statement:
message = message.replace(/[\r\n]+(?![A-Z][A-Z][A-Z0-9]\|)/g, ""); // Removes invalid line breaks so the broken-off text is appended to the previous segment.
3. Save the changes and deploy the channel so that they take effect.

From your question, the HL7 field that contains line breaks is OBX(5,1), which should hold the Observation Value.
The observation value may contain line breaks as part of its data. A line break (<CR>, ASCII 13) is the segment separator by default, so if one is received as part of the data, there will be problems parsing the message. This is the root cause of the problem you mentioned in the question.
The segment separator is not negotiable; it is always a carriage return. I have explained this in more detail in this answer.
Ideally, those line breaks should be replaced with their escape sequence while building the HL7 message. More details about this are given in one of my earlier answers here.
So, your inbound message
OBX|68|FT|PT6663&IMP^PET/CT Imaging Whole Body||
||||||F|||202254836969552|||
should be actually
OBX|68|FT|PT6663&IMP^PET/CT Imaging Whole Body||\X0D\\X0D\||||||F|||202254836969552|||
As for your actual question of how to do this with Mirth/JavaScript, it should not be needed in your use case. This conversion should be done before the message is sent to Mirth, so whoever sends you this message should build it like this.
When actually displaying the observation value in a UI, you need to do the reverse.
Edit:
If the line break is something other than <CR> (ASCII 13), use its hex value in place of 0D in \X0D\ (for example, \X0A\ for <LF>, ASCII 10). Details are in my linked answer; I am not repeating them here.
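A minimal sketch of the escaping described above, for the sender side and for the reverse when displaying. It only handles CR, LF and the escape character itself, which has to become \E\ so the result can be reversed unambiguously; a complete HL7 encoder would also escape the field, component, subcomponent and repetition separators. It assumes the default \ escape character, and the function names are hypothetical.

```javascript
// Maps each character that needs escaping to its HL7 escape sequence.
const ESCAPES = { "\\": "\\E\\", "\r": "\\X0D\\", "\n": "\\X0A\\" };
const UNESCAPES = { E: "\\", X0D: "\r", X0A: "\n" };

// Escapes the escape character and line breaks in a single pass.
function escapeHl7Text(value) {
  return value.replace(/[\\\r\n]/g, (ch) => ESCAPES[ch]);
}

// Reverses escapeHl7Text, also in a single left-to-right pass.
function unescapeHl7Text(value) {
  return value.replace(/\\(E|X0D|X0A)\\/g, (match, code) => UNESCAPES[code]);
}

const observation = "Line one\rLine two";
const escaped = escapeHl7Text(observation);
console.log(escaped); // Line one\X0D\Line two
console.log(unescapeHl7Text(escaped) === observation); // true
```

Doing each direction in one pass matters: unescaping \X0D\ first and \E\ afterwards would corrupt text that originally contained a literal "\X0D\".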

I had a similar issue with blank lines between the segments, and I solved it like this:
content = content.replace(/^\s*\n/gm, '');
Note: this only removes blank lines. You still need to figure out how to move the next line onto the current line.
You can try a regex that removes every '\n' not followed by a segment name.

reformat characters in json data

I am retrieving data from Reddit's JSON, and some of the data looks like this:
The actual resolution of this image is 3067x2276, not 4381x3251. See [this](https://www.reddit.com/r/EarthPorn/wiki/index#wiki_resolution.3F_what_is_that_and_how_can_i_find_it.3F) page for information on how to find out what the resolution of an image is.
I want to insert the data into a <p></p> on my page, but the link stays as it is above (not clickable).
I noticed that when I post it on Stack Overflow, it is nicely reformatted into a clickable link.
Reformatted by Stack Overflow:
The actual resolution of this image is 3067x2276, not 4381x3251. See this page for information on how to find out what the resolution of an image is.
How do I achieve that?

I feel like I cheated, but inspecting the OP in my browser, I get...
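<p>The actual resolution of this image is 3067x2276, not 4381x3251. See <a href="https://www.reddit.com/r/EarthPorn/wiki/index#wiki_resolution.3F_what_is_that_and_how_can_i_find_it.3F">this</a> page for information on how to find out what the resolution of an image is.</p>
In other words, if you find [words](URL), replace it with:
<a href="URL">words</a>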
This little regex tries to capture the contents of [] followed by (). Checking for http may be insufficient depending on the sort of links you expect...
let regex = /\[(.*?)\]\(([^)]+)\)/g;
let matches = regex.exec(line);
// matches (if not null) holds the link text in matches[1] and a potential URL in matches[2]
if (matches && /^https?:\/\//.test(matches[2])) {
// matches[2] is probably a URL, so build an anchor from it...
let replace = `<a href="${matches[2]}">${matches[1]}</a>`;
// ...
}
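Building on that regex, a sketch that converts every [text](url) in a string at once by passing a replacer function to String.prototype.replace. It accepts only http and https URLs (as the answer notes, adjust this to the links you expect) and escapes the text first, since it comes from an external API; linkify and escapeHtml are hypothetical names.

```javascript
// Escapes characters that are special in HTML text and in quoted attributes.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Replaces every Markdown-style [text](url) whose url starts with http(s)://
// by an <a> element; other bracket/paren pairs are left as they are.
function linkify(line) {
  // Escaping first leaves [, ], ( and ) untouched, so the link regex still applies.
  return escapeHtml(line).replace(/\[(.*?)\]\(([^)]+)\)/g, (match, text, url) =>
    /^https?:\/\//.test(url) ? `<a href="${url}">${text}</a>` : match
  );
}

const comment =
  "See [this](https://www.reddit.com/r/EarthPorn/wiki/index#wiki_resolution.3F_what_is_that_and_how_can_i_find_it.3F) page.";
console.log(linkify(comment));
```

A URL that itself contains ")" is cut short by [^)]+, the same limitation as the regex above.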

Start with regular expressions: basically, wildcards on steroids.
/\[.*\]\(.*\)/, while it looks weird, will find [*](*), where * can be a string of any length. On its own, all it can do is find the first place where this appears. I tried looking further, but I'm not the best with JS.
https://www.w3schools.com/js/js_regexp.asp
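One caveat with that pattern: .* is greedy, so on a line with two links it matches from the first [ to the last ). A lazy .*? together with the g flag finds each link separately. A small sketch with a made-up line:

```javascript
const line = "[a](https://a.example) and [b](https://b.example)";

// Greedy: one match spanning both links.
const greedy = line.match(/\[.*\]\(.*\)/);
console.log(greedy[0]); // [a](https://a.example) and [b](https://b.example)

// Lazy with the g flag: one match per link.
const lazy = [...line.matchAll(/\[(.*?)\]\((.*?)\)/g)].map((m) => m[0]);
console.log(lazy); // two separate matches
```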

How do I prevent spaces from being URL encoded for a `javascript:` URL?

I'm trying to use the following as a URL that executes JavaScript:
javascript:var field = document.getElementsByName("actions[hide]"); + for (i = 0; i < field.length; i++)field[i].click();
However, the spaces get URL encoded when I bookmark it, replaced with %20, which (for a reason unknown to me) causes the JS code not to work.
javascript:var%20field%20=%20unescape%20document.getElementsByName("actions[hide]");%20+%20for%20(i%20=%200;%20i%20<%20field.length;%20i++)field[i].click();

If you want to create a bookmarklet, I would suggest this site:
http://benalman.com/code/test/jquery-run-code-bookmarklet/
It says it is meant for jQuery code, but you can also convert plain JavaScript with this generator. Alternatively, you can simply use jQuery and convert your code from:
var field = document.getElementsByName("actions[hide]"); + for (i = 0; i < field.length; i++)field[i].click();
to
$('[name="actions[hide]"]').each(function() { $(this).click(); });
I use this script every time I create a new bookmarklet, and I love it.
EDIT: When you enter your code, you must paste it without the "javascript:" prefix.

I didn't quite understand what you intend to use that javascript: URL scheme for.
Anyway, if you put that string into the browser's address bar expecting it to work on the current web page, it probably won't.
You can try calling an anonymous function:
Click me

Encoding spaces in javascript: URIs shouldn't (and doesn't in my experience) break scripts. The problem is most likely your + which is also a special character in URIs (it also means a space) but isn't being automatically converted by the browser as the character is allowed at that point in a URI.
You need to encode the + character as %2B (along with any other special characters you might have in the JS).
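A quick way to check whether the stray + is a problem on its own, independently of URL encoding: a javascript: URL is percent-decoded before it runs, so %20 turns back into spaces, and the decoded text must still parse as JavaScript. The Node sketch below, with a hypothetical parses helper, compiles the code with the Function constructor without running it. Under that reading, encoding the + as %2B would decode back to the same +, so removing the stray + (the statements are already separated by ;) is what makes the code valid.

```javascript
// Returns true if `code` compiles as a function body; the code is never run.
function parses(code) {
  try {
    new Function(code);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const original =
  'var field = document.getElementsByName("actions[hide]"); + for (i = 0; i < field.length; i++)field[i].click();';
const encoded = original.replace(/ /g, "%20"); // roughly what the bookmark stores
const decoded = decodeURIComponent(encoded); // what actually runs

console.log(decoded === original); // true
console.log(parses(decoded)); // false: "+ for" is a syntax error
console.log(parses(decoded.replace("; + for", "; for"))); // true
```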
