Render file text to html and be able to search its content


I'm working with Angular 4, and one feature of my application renders a plain-text file (not parsed into any structure) as HTML. I need to be able to search for part of the content in that text. Currently I have the following solution:
Solution 1:
I would like to analyze the text: all of it is wrapped in a div element, and that div contains multiple paragraph (<p>) elements, one for each line of the file (split on line breaks). Then I search for the keyword, find the <p> elements that contain it, and apply a CSS highlight to the matching text within each <p>. I have searched Google but have not found any sample for this.
You may ask why I don't put all the content in one element. I think that would make CSS styling, and scrolling to the position where the keyword occurs, more complicated.
So I need some help with this case. Thank you in advance.
UPDATE:
Should I use the browser's built-in find feature instead?
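A minimal sketch of Solution 1, assuming the file text is already loaded as a string: split it on line breaks, escape each line, and wrap every case-insensitive match of the keyword in a <mark> element. The function name renderWithHighlight and the data-line attribute are hypothetical, chosen for illustration. Matching is done on the raw text and each slice is escaped separately, so a keyword such as "&" still matches. In the browser, after inserting the HTML into the container, something like container.querySelector("mark").scrollIntoView() could bring the first match into view; how Angular sanitizes bound HTML is not covered here.

```javascript
// Escape HTML-significant characters so the file text is rendered literally.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Escape RegExp metacharacters so the keyword is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Render plain text as one <p> per line inside a <div>, wrapping each
// case-insensitive keyword match in <mark>. Also returns the indexes of
// the lines that contain at least one match (useful for scrolling).
function renderWithHighlight(fileText, keyword) {
  const pattern = keyword ? new RegExp(escapeRegExp(keyword), "gi") : null;
  const matchedLines = [];
  const paragraphs = fileText.split(/\r?\n/).map((line, index) => {
    let html = "";
    let last = 0;
    let found = false;
    if (pattern) {
      for (const match of line.matchAll(pattern)) {
        found = true;
        html += escapeHtml(line.slice(last, match.index));
        html += `<mark>${escapeHtml(match[0])}</mark>`;
        last = match.index + match[0].length;
      }
    }
    html += escapeHtml(line.slice(last));
    if (found) matchedLines.push(index);
    return `<p data-line="${index}">${html}</p>`;
  });
  return { html: `<div class="file-text">${paragraphs.join("")}</div>`, matchedLines };
}
```

Keeping one <p> per line means the matched line indexes map directly to paragraphs, which is the structure the question describes for styling and scrolling.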

Related

WYSIWYG Text Length Limitation Without Breaking HTML Tags

I am using CKEditor as a WYSIWYG rich text editor with Vue3.
The editor generates its output inside an iframe, so in the end I believe it produces an HTML body containing the text the user entered, wrapped in HTML tags. I access the iframe content with document.querySelector('iframe').contentDocument, which gives me innerText and innerHTML, so everything is accessible.
I am trying to limit the text length, so if a user writes this text:
I am an example text and this part is bold. and the limit is 23 characters, the "." character should be deleted automatically. By the way, this is the output generated by the editor:
<p>I am an example text and <b>this part is bold.</b></p>
So I am able to get the stripped text by removing all HTML tags, storing the "clean" text in a variable and then getting its length. I am also able to delete the extra letter on a keyup trigger.
The issue is that whenever a letter is deleted, the HTML tags are modified too and their structure breaks. So I get this: <p>I am an example text and <b>this part is bold instead of this: <p>I am an example text and <b>this part is bold</b></p>.
Is there a better way to plan this customization so that ONLY the inner text gets trimmed?

The way I worked around this problem was to:
- delete the text, obtaining <p>I am an example text and <b>this part is bold
- check for unclosed tags (does the number of <x> equal the number of </x>?) and close them if needed.
There may be a better way, but I wrote that piece of code about 10 years ago and it does the job just fine.
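A sketch of the second step, assuming the editor output is simple markup (no comments, and no ">" inside attribute values). Instead of comparing the counts of <x> and </x>, it keeps a stack of open elements, which also gets the closing order right; it first drops a tag that was cut off mid-way (for example a trailing "<b"). The name closeUnclosedTags is hypothetical.

```javascript
// Elements that never take a closing tag in HTML.
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img", "input",
  "link", "meta", "source", "track", "wbr",
]);

// Append closing tags for any elements left open, innermost first.
function closeUnclosedTags(html) {
  // Drop a tag that truncation cut off before its ">".
  const trimmed = html.replace(/<[^>]*$/, "");
  const tagPattern = /<\/?([a-zA-Z][a-zA-Z0-9-]*)[^>]*>/g;
  const open = [];
  for (const match of trimmed.matchAll(tagPattern)) {
    const [whole, rawName] = match;
    const name = rawName.toLowerCase();
    // Void and self-closing tags never need a closer.
    if (VOID_ELEMENTS.has(name) || whole.endsWith("/>")) continue;
    if (whole.startsWith("</")) {
      // Pop back to the matching opener; stray closers are ignored.
      const at = open.lastIndexOf(name);
      if (at !== -1) open.length = at;
    } else {
      open.push(name);
    }
  }
  return trimmed + open.reverse().map((name) => `</${name}>`).join("");
}
```

For the example above, closeUnclosedTags("<p>I am an example text and <b>this part is bold") appends "</b></p>".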

How to select all tags except anchors (including anchors inside other elements) with document.querySelectorAll?

edit: Is it possible to get all the inner text from the tags in an HTML document, except the text from anchor tags <a> (including the text from <a> anchors nested inside other elements), with the document.querySelectorAll method?
My program has an input field that allows users to insert some selector to get the text for certain tags in a given site page.
So, if I want to insert a selector that gets text from all nodes except <a> tags, how can I accomplish that?
I mean, *:not(a) does not work, because it selects tags that may have <a> descendants, and the :not() selector does not accept complex selectors, so *:not(* a) does not work either.
I know I could delete those nodes from document first, but is it possible to accomplish this task only selecting those nodes I want with the document.querySelectorAll method?
Example:
<html>
<... lots of other tags with text inside>
<div>
<p> one paragraph </p>
<a> one link </a>
</div>
</...>
</html>
I want all the text in the html except "one link"
edit:
If you do document.querySelectorAll('*:not(a)'), you select the div, which contains an a element. So the innerText of this div includes the text of the a element.
Thank you

Your question is how to allow users to extract information from arbitrary hypertext [documents]. This means that solving the problem of "which elements to scrape" is just part of it. The other part is "how to transform the set of elements to scrape into a data set that the user ultimately is interested in".
Meaning that CSS selectors alone won't do. You need data transformation, which will deal with the set of elements as input and yield the data set of interest as output. In your question, this is illustrated by the case of just wanting the text content of some elements, or entire document, but as if the a elements were not there. That is your transformation procedure in this particular case.
You do state, however, that you want to allow users to specify what they want to scrape. This translates to your transformation procedure having other variables and possibly being general with respect to the kind of transformations it can do.
With this in mind, I would suggest you look into technologies like XSLT. XSLT, for one, is designed for these things -- transforming data.
Depending on how computer literate you expect your users to be, you might need to encapsulate the raw power and complexity of XSLT, giving users a simple UI which translates their queries to XSLT and then feeds the resulting XSL stylesheets to an XSLT processor, for example. In any case, XSLT itself will be able to carry a lot of load. You also won't need both XSLT and CSS selectors -- the former uses XPath which you can utilize and even expose to users.
Let's consider the following short example of an HTML document you want scraped:
<html>
<body>
<p>I think the document you are looking for is at <a>example.com</a>.</p>
</body>
</html>
If you want all text extracted but not a elements, the following XSL stylesheet will configure an XSLT processor to yield exactly that:
<?xml version="1.0" encoding="utf-8" ?>
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text" />
<template match="a" />
</stylesheet>
The result of transforming the HTML document with the above XSL stylesheet document is the following text:
I think the document you are looking for is at .
Note how the a element is "stripped" leaving an empty space between "at" and the sentence punctuation ("."). The template element, being empty, configures the XSLT processor to not produce any text when transforming a elements ("a" is a valid, if very simple, XPath expression, by the way -- it selects all a elements). This is all part of XSLT, of course.
I have tested this with Free Online XSL Transformer which uses the very potent SAX library.
Of course, you can cover one particular use case -- yours -- with JavaScript, without XSLT. But how are you going to let your users express what they want scraped? You will probably need to invent some [simple] language -- which might as well be [the already invented] XSLT.
XSLT isn't readily available across different user agents or JavaScript runtimes, not out of the box -- native XSLT 1.0 implementations are indeed provided by both Firefox and Chrome (with the XSLTProcessor class) but are not specified by any standards body and so may be missing in your particular runtime environment. You may be able to find a suitable JavaScript implementation though, but in any case you can invoke the scraper on the server side.
Whether to encapsulate the XSLT language behind some simpler query language and user interface is something you will need to decide on -- if you're going to give your users the kind of possibilities you say you want them to have, they need to express their queries somehow, whether through a WYSIWYG form or with text.

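Clone the top node, remove the a elements from the clone, then get its text.
// Deep-clone <body> so the live page is left untouched
const bodyClone = document.body.cloneNode(true);
// Remove every <a>, including anchors nested inside other elements
bodyClone.querySelectorAll("a").forEach(e => e.remove());
// textContent now holds all the remaining text
const { textContent } = bodyClone;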
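If cloning the whole body is undesirable, a recursive walk can collect the same text without copying or modifying the page. This sketch (textExcluding is a hypothetical name) reads only the standard nodeType, nodeName, nodeValue and childNodes properties, so it works on DOM nodes and, for testing, on plain objects of the same shape. Like the clone approach, it does not turn the task into a single querySelectorAll selector.

```javascript
// nodeType values defined by the DOM standard.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Concatenate the text under `node`, skipping every subtree whose root
// element's tag name is in `excludedTags` (case-insensitive).
function textExcluding(node, excludedTags = ["A"]) {
  const skip = new Set(excludedTags.map((tag) => tag.toUpperCase()));
  const parts = [];
  (function walk(current) {
    if (current.nodeType === TEXT_NODE) {
      parts.push(current.nodeValue);
      return;
    }
    if (current.nodeType === ELEMENT_NODE && skip.has(current.nodeName.toUpperCase())) {
      return; // the whole excluded subtree is ignored, nested anchors included
    }
    for (const child of current.childNodes || []) walk(child);
  })(node);
  return parts.join("");
}
```

In a page this would be called as textExcluding(document.body), or textExcluding(document.body, ["A", "SCRIPT", "STYLE"]) to leave out script and style contents as well.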

You can use
document.querySelectorAll('*:not(a)')
Hope it works.

Adding html/any tags to either side of selection - Javascript

The problem:
After creating a textarea box in my PHP/HTML file, I wanted to add a little more functionality and decided to make a textarea that supports formatting, for example:
<textarea>
This is text that was inserted. <b>this text was selected and applied a style
via a button</b>
</textarea>
It doesn't matter what the tags are (they could be bubbles for all I care), because the PHP script, on receiving the $_POST data, will automatically apply the correct tags with the tag as the style ID. That part is not relevant.
The Question/s
How can I create this feature using javascript?
Are there any links that may help?
And if there is information on this, can you explain it?
EDIT: Another close (but not quite matching) example is Stack Overflow's editor. Note that I do not wish to use 3rd-party scripts; this is a learning process for me.
The tags that are inserted in the text are saved to a database, and when the page is requested the PHP replaces the tags with the style ID. If there is a workaround that does not involve 3rd-party scripts, please suggest it.
For the skeptics who think I didn't research this: a Google search found little that made sense. Previous research on SO:
- https://stackoverflow.com/questions/8752123/how-to-make-an-online-html-editor
- Adding tags to selection
Thanks in advance

<textarea> elements cannot contain markup, only plain-text values, so you can't style individual parts of the text inside a textarea.
What you'll need to do is fake everything that a text box would normally do, including drawing a cursor. This is a lot of work, as hackattack said.
You can do a lot if you grab jQuery and start poking around. Toss a <div> tag out there with an ID for ease and start hacking away.
I've never made one personally, but there is a lot to it. HTML5's contentEditable can maybe get you a good chunk of the way there: http://html5demos.com/contenteditable/
If you want to pass this data back to the server, you'll need to grab the innerHTML of the container and slap that into a hidden input upon submission of your form.
Here are some other things you can check out if you're just messing around:
- the tabindex HTML attribute, to get focus in your box by tabbing
- jQuery.focus() http://api.jquery.com/focus/, to determine when someone clicks in your box
- cursor: text in CSS for looks http://wap.w3schools.com/cssref/pr_class_cursor.asp
- jQuery.keypress() http://api.jquery.com/keypress/, or similar, for grabbing keystrokes
Edit: I think I completely misunderstood the question.
If you're not looking for a rich text editor and just want some helper buttons for code, maybe selectionStart and selectionEnd are what you're after. I don't know what the browser support is, but it works in Chrome:
http://jsfiddle.net/5yXsd/
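A sketch of the selectionStart/selectionEnd approach, written independently of the linked fiddle: the string manipulation is a pure function (wrapSelection, a hypothetical name), and the commented browser wiring assumes hypothetical element ids "editor" and "bold-button". After wrapping, setSelectionRange keeps the originally selected text selected.

```javascript
// Wrap value.slice(start, end) in an opening/closing tag pair and return the
// new text plus a selection range covering the originally selected text.
function wrapSelection(value, start, end, tagName) {
  const open = `<${tagName}>`;
  const close = `</${tagName}>`;
  const selected = value.slice(start, end);
  return {
    value: value.slice(0, start) + open + selected + close + value.slice(end),
    selectionStart: start + open.length,
    selectionEnd: start + open.length + selected.length,
  };
}

// Browser wiring (hypothetical ids "editor" and "bold-button"):
// document.getElementById("bold-button").addEventListener("click", () => {
//   const area = document.getElementById("editor");
//   const result = wrapSelection(area.value, area.selectionStart, area.selectionEnd, "b");
//   area.value = result.value;
//   area.setSelectionRange(result.selectionStart, result.selectionEnd);
//   area.focus();
// });
```

Since the tag name is a parameter, the same function covers any placeholder tags the PHP side later replaces with style IDs.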

You cannot do anything besides basic formatting inside a textarea. If you want complex formatting, look into setting a div's contentEditable attribute to true. Or you can build a WYSIWYG editor, but that is a big project. I strongly suggest using 3rd-party code for this one.

I suggest using an iframe to implement the WYSIWYG effect.
The iframe's document has a property called designMode; setting it to "on" makes the document editable.
See here for more
https://developer.mozilla.org/en/Rich-Text_Editing_in_Mozilla
There is also a lightweight example you may want to look at:
http://code.google.com/p/rte-light/source/browse/trunk/jquery.rte.js

How to detect where the word is wrapped

I would like to add a 'span' tag at the beginning and a '/span' at the end of each line of text as it is presented on the website, and update them dynamically when the div containing the text is resized. The problem is that I don't know how to detect where the text wraps; if I had that information it would be easy. So my question is: is there a way to determine where the text is wrapped using JavaScript?
I have found a JavaScript library that hyphenates the text on a site, but I'm not sure how it detects line wraps. The working example is here and its source can be found here.

Yes, this is similar to searching for and highlighting text.
You can learn how to do it with JavaScript here:
Search For and Highlight Text
or here:
how-can-i-use-jquery-to-style-parts-of-all-instances-of-a-specific-word
Just download/save the web page there and study it.
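The linked pages cover highlighting rather than line wraps. One commonly used approach to the original question, sketched here under the assumption that the container holds plain text in a single font size, is to wrap every word in its own span, read each span's offsetTop, and start a new line whenever that offset changes. The grouping step is a pure function (groupIntoLines, a hypothetical name); the commented wrapLines shows the DOM side, discards the original whitespace and any nested markup, and would need to run again after a resize (for example from a ResizeObserver callback).

```javascript
// Group consecutive words into visual lines by their vertical offset.
// `words` and `tops` are parallel arrays; a new line starts whenever the
// top offset differs from the previous word's.
function groupIntoLines(words, tops) {
  const lines = [];
  let currentTop = null;
  words.forEach((word, i) => {
    if (lines.length === 0 || tops[i] !== currentTop) {
      lines.push([]);
      currentTop = tops[i];
    }
    lines[lines.length - 1].push(word);
  });
  return lines.map((line) => line.join(" "));
}

// Browser side (sketch, plain text only):
// function wrapLines(container) {
//   const words = container.textContent.trim().split(/\s+/);
//   const appendSpans = (texts, className) => {
//     container.textContent = "";
//     return texts.map((text) => {
//       const span = document.createElement("span");
//       if (className) span.className = className;
//       span.textContent = text;
//       container.append(span, " ");
//       return span;
//     });
//   };
//   const tops = appendSpans(words).map((span) => span.offsetTop);
//   appendSpans(groupIntoLines(words, tops), "line");
// }
```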

Replace selected text with jquery/javascript

I am trying to build a specialized WYSIWYG text editor in the browser, and have a very limited set of functionality, but the biggest part of that is wrapping certain text in span tags.
I can find many resources explaining standard stuff (execCommand and whatnot), but have looked and looked and can't find anything to do what I need.
Basically, it's as simple as it sounds: user selects some text, clicks a button or whatever, and the text gets replaced with some other text (the initial case is that same text wrapped in some HTML tags).
I can find ways to do this in a textarea, but I'm just in regular HTML land, with the content in question inside a div with contentEditable marked as true.
I have also found ways to replace all occurrences of a text, or the first occurrence, but not a specific one. Most solutions I find fail when trying to replace anything but the first occurrence.
I'm hoping jQuery can do this in some way.

Have you tried the jQuery wrapSelection plugin?

This is pretty similar to this question. It might help.
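As an alternative to a jQuery plugin, the standard Selection and Range APIs can wrap exactly the occurrence the user selected, so there is no need to search for the text at all. A minimal sketch (wrapSelectionInSpan is a hypothetical name); it uses extractContents() plus insertNode() rather than surroundContents(), because surroundContents() throws when the selection only partially covers an element, for example when it starts inside a <b> and ends outside it.

```javascript
// Wrap the current selection (e.g. inside a contentEditable div) in a <span>
// with the given class. Returns the new span, or null if nothing is selected.
function wrapSelectionInSpan(doc, className) {
  const selection = doc.getSelection();
  if (!selection || selection.rangeCount === 0 || selection.isCollapsed) return null;
  const range = selection.getRangeAt(0);
  const span = doc.createElement("span");
  span.className = className;
  // Move the selected nodes (splitting partially selected elements) into the span...
  span.appendChild(range.extractContents());
  // ...then put the span where the extracted content used to be.
  range.insertNode(span);
  return span;
}
```

A button handler could call wrapSelectionInSpan(document, "highlight"); to replace the selected text instead of wrapping it, set the returned span's textContent.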
