How to get web content before visiting a web page - javascript

How can I get the description/content of a web page for a given URL?
(Something like the short description Google gives for each result link.)
I want to do this in my JSP page.
Thanks in advance!

Idea: open the URL as a stream, then use an HTML parser to read the content attribute of its description meta tag.
Grab URL content:
URL url = new URL("http://www.url-to-be-parsed.com/page.html");
BufferedReader in = new BufferedReader(
new InputStreamReader(
url.openStream()));
You will need to tweak the above code depending on what your HTML parser library requires (a stream, a string, etc.).
HTML-Parse the tags:
<meta name="description" content="This is a place where webmasters can put a description about this web page" />
You might also be interested in grabbing the title of that page:
<title>This is the title of the page!</title>
Caution: regular expressions do not work reliably on HTML documents, so an HTML parser is the better choice.
An example with HTML Parser:
Use HasAttributeFilter to select tags that have a name="description" attribute.
Cast the resulting Node to MetaTag.
Get the content using MetaTag.getAttribute().
Code:
import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.tags.MetaTag;
public class HTMLParserTest {
    public static void main(String... args) {
        Parser parser = new Parser();
        // <meta name="description" content="Some text about the site." />
        HasAttributeFilter filter = new HasAttributeFilter("name", "description");
        try {
            parser.setResource("http://www.youtube.com");
            NodeList list = parser.parse(filter);
            Node node = list.elementAt(0);
            if (node instanceof MetaTag) {
                MetaTag meta = (MetaTag) node;
                String description = meta.getAttribute("content");
                System.out.println(description);
                // Prints: "YouTube is a place to discover, watch, upload and share videos."
            }
        } catch (ParserException e) {
            e.printStackTrace();
        }
    }
}
Considerations:
If this is done in a JSP each time the page is loaded, you may see a slowdown due to the network I/O to the URL. It is even worse if you do this on the fly for a page that contains many links, because the n URLs are fetched sequentially and the slowdown could be massive. Consider storing this information in a database and refreshing it as needed instead of fetching it on the fly in the JSPs.
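The caching idea above could look roughly like the sketch below. It is written in JavaScript to match the question's tag; the same pattern applies to a server-side cache in Java. The names createDescriptionCache, loadDescription, and maxAgeMs are hypothetical, and the loader (whatever actually fetches and parses the page) is passed in rather than implemented here.

```javascript
// Minimal time-based cache: each URL is loaded at most once per maxAgeMs.
// loadDescription is a hypothetical function supplied by the caller.
// now is injectable so the expiry logic can be exercised without waiting.
function createDescriptionCache(loadDescription, maxAgeMs, now = Date.now) {
  const entries = new Map(); // url -> { value, storedAt }

  return function getDescription(url) {
    const hit = entries.get(url);
    if (hit && now() - hit.storedAt < maxAgeMs) {
      return hit.value; // still fresh: no network I/O
    }
    const value = loadDescription(url);
    entries.set(url, { value, storedAt: now() });
    return value;
  };
}
```

If loadDescription returns a Promise, the Promise itself is cached, so callers asking for the same URL share a single request.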


How to open a preview file (pdf, docx, txt, etc.) in another page of the browser using Angular and Java

I am developing a web application using Angular 7 and Java 8. I am uploading a file (pdf, docx, txt, etc.), but the problem is that I can't open it in another browser page through a RESTful web service. I am getting the error 415 Unsupported Media Type. I have tried both the POST and GET methods without any success. These are the front-end and back-end code snippets:
Angular component (method called by a button, passing the path and filename, for example C/doc/foo.pdf)
download(doc) {
this.service.downloadFile(doc.path).subscribe(response => {
if(response) {
let blob = new Blob([response], { type: 'text/json; charset=utf-8' });
const url= window.URL.createObjectURL(blob);
window.open(url);
}
});
}
Angular service
downloadFile(path): Observable<Blob> {
const url = '/download';
return this.http.post<Blob>(url, path, { responseType: 'blob' as 'json' });
}
Java Controller
@PostMapping(value = "download", produces = { MediaType.APPLICATION_JSON_UTF8_VALUE })
@ResponseBody
public byte[] download(@RequestBody String path) {
    try {
        return this.provvedimentoService.getDownload(path);
    } catch (IOException | EgesException e) {
        e.printStackTrace();
        return null;
    }
}
Java Service
public byte[] getDownload(String pathFile) throws EgesException, IOException {
Path path = Paths.get(pathFile);
if(path.toFile().exists())
return Files.readAllBytes(path);
else
throw new EgesException(EgesExceptionConstants.WARNING_ACT_NOTFOUND_EXCEPTION);
}
You can't. Browsers don't have any built-in way to view Word docs so unless the user has configured their browser to open it with some plugin (which 99% of the world hasn't done), the browser will prompt them to download the file.
No browsers currently have the code necessary to render Word Documents, and as far as I know, there are no client-side libraries that currently exist for rendering them either.
However, if you only need to display the Word document and don't need to edit it, you can use Google Documents' Viewer via an <iframe> to display a remotely hosted .pdf, .doc, .docx, .txt, etc.:
<iframe src="https://docs.google.com/gview?url=http://remote.url.tld/path/to/document.doc&embedded=true"></iframe>
Many people use this method to view their document files. See: How to display a word document using fancybox
<iframe width="100%" height="300px" src="https://docs.google.com/gview?url=http://index-of.co.uk/Google/googlehacking.pdf&embedded=true"></iframe>
You can supply your own document URL like this:
https://docs.google.com/viewer?url=<url for your file>&embedded=true
Replace <url for your file> with the URL of your file.
Another option, similar to the Google iframe approach, is Microsoft's Office document viewer web app.
You can use it like this:
<iframe src='https://view.officeapps.live.com/op/embed.aspx?src=http://writing.engr.psu.edu/workbooks/formal_report_template.doc' width='80%' height='800px' frameborder='0'></iframe>
A solution adapted from "How do I render a Word document (.doc, .docx) in the browser using JavaScript?".
In the src attribute, put your document's URL after embed.aspx?src=, in place of <your_will_be_your_document_file_url>:
embed.aspx?src=<your_will_be_your_document_file_url>' width='80%'
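When the document URL is filled in programmatically, it is safer to encode it first so that characters such as & or ? in the file URL are not read as viewer parameters. A minimal sketch, assuming both viewers accept a URL-encoded parameter and that the file is publicly reachable; the helper names are hypothetical:

```javascript
// Hypothetical helpers that build the iframe src for the two hosted viewers
// mentioned above. encodeURIComponent escapes ':', '/', '&', '?' and similar.
function googleViewerSrc(fileUrl) {
  return 'https://docs.google.com/viewer?url=' +
    encodeURIComponent(fileUrl) + '&embedded=true';
}

function officeViewerSrc(fileUrl) {
  return 'https://view.officeapps.live.com/op/embed.aspx?src=' +
    encodeURIComponent(fileUrl);
}
```

The result can then be assigned to the src attribute of the <iframe>.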

Creating .eml file from dynamic web components (React/Vue/Angular) [string of compiled html]

The title may be confusing, so let me expand a little more:
My goal is to have a front-end framework (or library), like React, Vue, or Angular, that provides the normal user interface features, such as the user entering data or uploading an image to a server.
I then want the web app to produce an HTML email. I'm thinking the best way is to create a text file of HTML, but with the .eml extension instead of .txt, so that it's easy to open in a mail client and send.
My question:
- How can I create a string of dynamic HTML that is then saved as a file for the user to download? By dynamic I mean that sometimes there may be just 1 or 2 items and sometimes 15; the variable will change, and a loop will run once per object so that the appropriate amount of HTML is created.
I'm asking because we all know how to display a view in React and the others, but how can we get a programmatic pseudo-view in the logic code? That is, how do we get a string representation of the view's resulting HTML, and then create an .eml file holding that HTML for the user to download?
Is this even possible with today's popular frameworks?
====
EDIT
Just an idea I had from research, for generating the file it seems a Blob might be best.
var file = new Blob([html_string], {type: 'text/plain'})
So, for React, the code would be something like the following (thanks to Chris's answer to this SO question):
class MyApp extends React.Component {
_downloadTxtFile = () => {
var element = document.createElement("a");
var file = new Blob([document.getElementById('myInput').value], {type: 'text/plain'});
element.href = URL.createObjectURL(file);
element.download = "myFile.txt";
element.click();
}
render() {
return (
<div>
<input id="myInput" />
<button onClick={this._downloadTxtFile}>Download txt</button>
</div>
);
}
}
ReactDOM.render(<MyApp />, document.getElementById("myApp"));
Which leaves the question of how to create the string of HTML. Maybe ES6 template literals with embedded expressions? That wouldn't be JSX exactly, though, so I'm not sure how to put a for loop in there. I'll continue researching, unless someone knows how to put all of this together.
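One way the template-literal idea could work is sketched below: map() takes the place of the for loop and join('') glues the per-item fragments together. The function names and the item fields (name, price) are hypothetical, the headers are a bare minimum, and the subject is assumed to be plain ASCII.

```javascript
// Escape text before placing it inside HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build the HTML body; works the same for 1 item or 15.
function buildHtml(items) {
  const rows = items
    .map(item => `<li>${escapeHtml(item.name)}: ${escapeHtml(item.price)}</li>`)
    .join('');
  return `<html><body><ul>${rows}</ul></body></html>`;
}

// Wrap the HTML in minimal message headers so it can be saved as .eml.
// Headers and body are separated by an empty line; lines end with CRLF.
function buildEml(subject, html) {
  return [
    `Subject: ${subject}`,
    'MIME-Version: 1.0',
    'Content-Type: text/html; charset=utf-8',
    '',
    html,
  ].join('\r\n');
}
```

The string returned by buildEml could be passed to the Blob approach above, for example new Blob([eml], {type: 'message/rfc822'}) with a download name ending in .eml. In React specifically, ReactDOMServer.renderToStaticMarkup from react-dom/server is another way to obtain the HTML string from a component.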

get start and end of each page in docx file with javascript [duplicate]

Using OpenXML, can I read the document content by page number?
wordDocument.MainDocumentPart.Document.Body gives content of full document.
public void OpenWordprocessingDocumentReadonly()
{
string filepath = @"C:\...\test.docx";
// Open a WordprocessingDocument based on a filepath.
using (WordprocessingDocument wordDocument =
WordprocessingDocument.Open(filepath, false))
{
// Assign a reference to the existing document body.
Body body = wordDocument.MainDocumentPart.Document.Body;
int pageCount = 0;
if (wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text != null)
{
pageCount = Convert.ToInt32(wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text);
}
for (int i = 1; i <= pageCount; i++)
{
//Read the content by page number
}
}
}
MSDN Reference
Update 1:
it looks like page breaks are set as below
<w:p w:rsidR="003328B0" w:rsidRDefault="003328B0">
<w:r>
<w:br w:type="page" />
</w:r>
</w:p>
So now I need to split the XML at that element and take the InnerText of each part, which will give me the text page by page.
Now the question becomes: how can I split the XML on that check?
Update 2:
Page break elements exist only where explicit page breaks were inserted. If text simply flows from one page to the next, no page break XML element is set, so I am back to the same challenge: how to identify the page separations.
You cannot reference OOXML content via page numbering at the OOXML data level alone.
Hard page breaks are not the problem; hard page breaks can be counted.
Soft page breaks are the problem. These are calculated according to line break and pagination algorithms which are implementation dependent; they are not intrinsic to the OOXML data. There is nothing to count.
What about w:lastRenderedPageBreak, which is a record of the position of a soft page break at the time the document was last rendered? No, w:lastRenderedPageBreak does not help in general either because:
- By definition, the w:lastRenderedPageBreak position is stale when content has been changed since it was last opened by a program that paginates its content.
- In MS Word's implementation, w:lastRenderedPageBreak is known to be unreliable in various circumstances, including:
  - when a table spans two pages
  - when the next page starts with an empty paragraph
  - for multi-column layouts with text boxes starting a new column
  - for large images or long sequences of blank lines
If you're willing to accept a dependence on Word Automation, with all of its inherent licensing and server operation limitations, then you have a chance of determining page boundaries, page numberings, page counts, etc.
Otherwise, the only real answer is to move beyond page-based referencing frameworks that are dependent upon proprietary, implementation-specific pagination algorithms.
This is how I ended up doing it.
public void OpenWordprocessingDocumentReadonly()
{
string filepath = @"C:\...\test.docx";
// Open a WordprocessingDocument based on a filepath.
Dictionary<int, string> pageviseContent = new Dictionary<int, string>();
int pageCount = 0;
using (WordprocessingDocument wordDocument =
WordprocessingDocument.Open(filepath, false))
{
// Assign a reference to the existing document body.
Body body = wordDocument.MainDocumentPart.Document.Body;
if (wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text != null)
{
pageCount = Convert.ToInt32(wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text);
}
int i = 1;
StringBuilder pageContentBuilder = new StringBuilder();
foreach (var element in body.ChildElements)
{
if (element.InnerXml.IndexOf("<w:br w:type=\"page\" />", StringComparison.OrdinalIgnoreCase) < 0)
{
pageContentBuilder.Append(element.InnerText);
}
else
{
pageviseContent.Add(i, pageContentBuilder.ToString());
i++;
pageContentBuilder = new StringBuilder();
}
if (body.LastChild == element && pageContentBuilder.Length > 0)
{
pageviseContent.Add(i, pageContentBuilder.ToString());
}
}
}
}
Downside: this won't work in all scenarios. It works only where there is an explicit page break; if text simply flows from page 1 to page 2, there is no marker to tell you that you are on page two.
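Since the question title asks about JavaScript, the same hard-break splitting can be sketched there as well. This is only an illustration under several assumptions: the contents of word/document.xml have already been read into a string, paragraphs are not nested (no text boxes), and the text is not entity-decoded. It uses regular expressions instead of a real XML parser purely for brevity, and the function name is hypothetical. It shares the limitation described above: soft page breaks are not detected.

```javascript
// Split the text of word/document.xml into pages at hard page breaks only.
// Returns an array of pages, each page being an array of paragraph texts.
function splitOnHardPageBreaks(documentXml) {
  const paragraphRe = /<w:p[ >][\s\S]*?<\/w:p>/g;          // <w:p> but not <w:pPr>
  const textRe = /<w:t(?: [^>]*)?>([^<]*)<\/w:t>/g;         // text runs
  const pageBreakRe = /<w:br\b[^>]*w:type="page"[^>]*\/>/;  // hard page break

  const pages = [[]];
  for (const paragraph of documentXml.match(paragraphRe) || []) {
    const text = Array.from(paragraph.matchAll(textRe), m => m[1]).join('');
    if (text) {
      pages[pages.length - 1].push(text);
    }
    // Simplification: a paragraph containing a break ends the current page.
    if (pageBreakRe.test(paragraph)) {
      pages.push([]);
    }
  }
  return pages;
}
```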
Unfortunately, as the answers to "Why only some page numbers stored in XML of docx file?" explain, docx does not contain reliable page number information. The XML files carry no page numbers until Microsoft Word opens the document and renders it dynamically. Reading the Open XML documentation, such as https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.wordprocessing.pagenumber?view=openxml-2.8.1 , does not change this.
You can unzip some docx files and search for "page" or "pg" to see for yourself. I did this with different kinds of docx files in my situation, and they all showed the same thing. Glad if this helps.
List<Paragraph> Allparagraphs = wp.MainDocumentPart.Document.Body.OfType<Paragraph>().ToList();
List<Paragraph> PageParagraphs = Allparagraphs.Where (x=>x.Descendants<LastRenderedPageBreak>().Count() ==1) .Select(x => x).Distinct().ToList();
Rename the docx file to .zip.
Open the docProps\app.xml file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties" xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
<Template>Normal</Template>
<TotalTime>0</TotalTime>
<Pages>1</Pages>
<Words>141</Words>
<Characters>809</Characters>
<Application>Microsoft Office Word</Application>
<DocSecurity>0</DocSecurity>
<Lines>6</Lines>
<Paragraphs>1</Paragraphs>
<ScaleCrop>false</ScaleCrop>
<HeadingPairs>
<vt:vector size="2" baseType="variant">
<vt:variant>
<vt:lpstr>Название</vt:lpstr>
</vt:variant>
<vt:variant>
<vt:i4>1</vt:i4>
</vt:variant>
</vt:vector>
</HeadingPairs>
<TitlesOfParts>
<vt:vector size="1" baseType="lpstr">
<vt:lpstr/>
</vt:vector>
</TitlesOfParts>
<Company/>
<LinksUpToDate>false</LinksUpToDate>
<CharactersWithSpaces>949</CharactersWithSpaces>
<SharedDoc>false</SharedDoc>
<HyperlinksChanged>false</HyperlinksChanged>
<AppVersion>14.0000</AppVersion>
</Properties>
The OpenXML library reads wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text from the <Pages>1</Pages> property. These properties are written only by the Word application (winword). If the document has been changed by other means, wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text is not up to date. If the document was created programmatically, wordDocument.ExtendedFilePropertiesPart is often null.

Conditional redirect URL java

This is about conditionally redirecting the current URLs to new URLs. I want to write ONE script and copy it into all of the old HTML pages. It would help users who have bookmarked an old URL by redirecting them to the new URL.
I have old web pages like
www.oldsite.com/a1.html
www.oldsite.com/a1/5698.html
www.oldsite.com/x1.html
www.oldsite.com/YY1.html
and I want to get the user's current URL using location.path and redirect to newsite.com based on the path (suffix),
i.e.
www.oldsite.com/a1.html to www.newsite.com/b1.html
www.oldsite.com/a1/5698.html to www.newsite.com/b1/5698.html
www.oldsite.com/x1.html to www.newsite.com/99.html
www.oldsite.com/YY1.html to www.newsite.com/ZZ1.html
I have tried many times but failed!
Since I am a newcomer to HTML, can anyone help me?
My site holds more than a hundred old HTML pages!
Here is my draft:
<html>
<head>
<javascript>
if window.location.path='/al.html'
{
window.location="http://www.newsite.com/b1.html"}
else if window.location.path='/al/5698.html'
{
window.location="http://www.newsite.com/b1/5698.html"}
.........
</script>
</head>
Given
import java.net.URI;
URI oldUri;
you can do that translation by creating a new uri from the old.
URI newUri = new URI(
oldUri.getScheme(),
"www.newsite.com",
translatePath(oldUri.getPath()),
oldUri.getQuery(),
oldUri.getFragment());
You haven't provided enough examples for me to know exactly what translatePath should do, but given a string like "/a1.html" it should return "/b1.html".
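For a script pasted into the old pages themselves, the same translation can be sketched in JavaScript. The lookup table below is an assumption built from the four examples in the question (a1 to b1, x1 to 99, YY1 to ZZ1); PATH_MAP and translateUrl are hypothetical names, and paths that are not in the table are left unchanged. The URL class used here is not available in very old browsers.

```javascript
// Hypothetical mapping from the first part of the old path to the new one.
const PATH_MAP = { a1: 'b1', x1: '99', YY1: 'ZZ1' };

// "/a1/5698.html" -> "/b1/5698.html"; unknown paths are returned unchanged.
function translatePath(oldPath) {
  const match = /^\/([^\/.]+)(.*)$/.exec(oldPath);
  if (!match || !Object.prototype.hasOwnProperty.call(PATH_MAP, match[1])) {
    return oldPath;
  }
  return '/' + PATH_MAP[match[1]] + match[2];
}

// Rebuild the full address on the new host, keeping query string and fragment.
function translateUrl(oldHref) {
  const url = new URL(oldHref);
  url.hostname = 'www.newsite.com';
  url.pathname = translatePath(url.pathname);
  return url.href;
}
```

In each old page this could be used as window.location.replace(translateUrl(window.location.href)).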

How to decode a file from base64 encoding with JavaScript

My company has a very strict intranet for work-related use; the network has a single doorway that lets files in and out. The doorway's security only allows certain kinds of files (*.txt, *.doc, etc.), and even for those kinds it searches for patterns that confirm the file really is of that kind. (You can't simply disguise a *.zip file as a *.doc file.)
As a security project, I was told to find a way to bypass this system and insert a single C-language .exe file that says 'Hello World'.
My idea was to change the extension to .txt and base64-encode the file so that it would be more acceptable to the system. The problem is how to decode it once it's in. It's very easy on the outside; PHP or any other decent language can do it for me. However, in there, the only real language I have access to is JavaScript (on IE6 and maybe, MAYBE, on IE8).
So the question is: can I use JavaScript to read a file from the file system, decode it, and write it back? Or at least display the result for me?
Note that I'm not asking about decoding/encoding a message, which is easy; I'm looking to decode and encode a file.
JSON might be the answer you are looking for. It can actually do the trick.
Encode your txt file in JSON format. It is very likely to pass your company's doorway security.
var myJsonData = { "text" : "SGVsbG8sIHdvcmxkIQ==" }; // <-- base64 for "Hello, world!"
Import your txt file using plain html script syntax
<script src="hello.txt" type="text/javascript"> </script>
That's it! Now you can access a JSON object using the Syntax:
alert(myJsonData.text);
To complete your job, get this simple Javascript base64 decoder.
You're done. Here's the (very simple) code I've used:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=windows-1250">
<meta name="generator" content="PSPad editor, www.pspad.com">
<title></title>
<script src="base64utils.js" type="text/javascript"> </script>
<script src="hello.txt" type="text/javascript"> </script>
<script type="text/javascript">
function helloFunction() {
document.getElementById("hello").innerHTML = decode64(myJsonData.text);
}
</script>
</head>
<body onload="helloFunction();">
<p id="hello"></p>
</body>
</html>
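The decode64 function called in the page above comes from the linked decoder script, which is not reproduced here. A minimal stand-in might look like the sketch below; it is not the original library, and it assumes the standard base64 alphabet and returns a "binary string" with one character per decoded byte.

```javascript
var B64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decode a base64 string into a binary string (one character per byte).
// Written in plain ES3 syntax so it should also run in old browsers.
function decode64(input) {
  // Drop padding, line breaks and anything else outside the alphabet.
  var clean = input.replace(/[^A-Za-z0-9+\/]/g, '');
  var output = '';
  var buffer = 0; // bits collected so far
  var bits = 0;   // number of valid bits in buffer
  for (var i = 0; i < clean.length; i++) {
    buffer = (buffer << 6) | B64_ALPHABET.indexOf(clean.charAt(i));
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      output += String.fromCharCode((buffer >> bits) & 0xff);
      buffer &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return output;
}
```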
Using only javascript (i.e. no plugins like AIR etc), browsers don't allow access to the file system. Not only is it not possible to write a file to the disk, it's not possible to even read it - browsers are very strict on that sort of thing, thank goodness.
You cannot do this with straight JS in the browser, security context and the DOM do not allow filesystem access.
You cannot do this with current versions of flash, older versions (pre 7 IIRC) had some security flaws that allowed filesystem access.
You could do this with a custom plugin, and possibly a signed Java applet, or COM (ActiveX component, IE only).
I would suggest working with IT regarding your intranet to open up the context/permissions needed in this case, as that may be the shortest path to what you want here. Alternatively, you could create a command-line utility to easily encrypt/decrypt given files signed by a common key.
It all depends on how you can get the file in. If you have the base-64 encoded exe as a .txt, you could easily use Flash!
I'm not quite sure how you would implement this, but you can load a file into flash and as3 using flex.
<?xml version="1.0" encoding="utf-8"?>
<mx:Application xmlns:mx="http://www.adobe.com/2006/mxml" layout="absolute">
<mx:Script>
<![CDATA[
import flash.net.FileReference;
import flash.net.FileFilter;
import flash.events.IOErrorEvent;
import flash.events.Event;
import flash.utils.ByteArray;
//FileReference class we will use to load data
private var fr:FileReference;
//File types which we want the user to open
private static const FILE_TYPES:Array = [new FileFilter("Text File", "*.txt;*.text")];
//called when the user clicks the load file button
private function onLoadFileClick():void
{
//create the FileReference instance
fr = new FileReference();
//listen for when they select a file
fr.addEventListener(Event.SELECT, onFileSelect);
//listen for when they cancel out of the browse dialog
fr.addEventListener(Event.CANCEL,onCancel);
//open a native browse dialog that filters for text files
fr.browse(FILE_TYPES);
}
/************ Browse Event Handlers **************/
//called when the user selects a file from the browse dialog
private function onFileSelect(e:Event):void
{
//listen for when the file has loaded
fr.addEventListener(Event.COMPLETE, onLoadComplete);
//listen for any errors reading the file
fr.addEventListener(IOErrorEvent.IO_ERROR, onLoadError);
//load the content of the file
fr.load();
}
//called when the user cancels out of the browser dialog
private function onCancel(e:Event):void
{
trace("File Browse Canceled");
fr = null;
}
/************ Select Event Handlers **************/
//called when the file has completed loading
private function onLoadComplete(e:Event):void
{
//get the data from the file as a ByteArray
var data:ByteArray = fr.data;
//read the bytes of the file as a string and put it in the
//textarea
outputField.text = data.readUTFBytes(data.bytesAvailable);
//clean up the FileReference instance
fr = null;
}
//called if an error occurs while loading the file contents
private function onLoadError(e:IOErrorEvent):void
{
trace("Error loading file : " + e.text);
}
]]>
</mx:Script>
<mx:Button label="Load Text File" right="10" bottom="10" click="onLoadFileClick()"/>
<mx:TextArea right="10" left="10" top="10" bottom="40" id="outputField"/>
</mx:Application>
To decode it, look into http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/mx/utils/Base64Decoder.html
If the security system scans for patterns in files, it is very unlikely that it will overlook a base64-encoded file or base64-encoded contents in files. E-mail attachments are base64-encoded, and if the system is any good it will scan for potentially harmful e-mail attachments even if they are named .txt. The base64-encoded start of an EXE file is almost certainly recognized by it. So ISTM you are asking the wrong question.
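To illustrate why the encoded start is so recognizable: Windows executables begin with the two bytes "MZ", and base64 always turns those two bytes into "TV" followed by one of o, p, q, or r. A scanner could flag that with a check as small as the following sketch (the function name is hypothetical, and a real product presumably does far more than this):

```javascript
// Returns true if the text looks like it starts with a base64-encoded
// "MZ" header, ignoring leading whitespace.
function looksLikeBase64Exe(text) {
  return /^\s*TV[opqr]/.test(text);
}
```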
