Effective way of getting an article's published date/author dynamically? - javascript

I'm working on a referencing web app as part of a course I'm studying. The aim is to let students quickly and easily reference the materials they find information in, and I'm running into a couple of issues.
The first is getting an article/site's published date. When dealing with static HTML sites this is easy, as I can simply use document.lastModified to pull in the time it was last modified. Issues arise when dealing with the much more common CMS powered website, as pages are dynamically generated which causes document.lastModified to always return the equivalent of 'now'... which isn't accurate at all.
There are steps that site developers can take to make this a bit easier with HTML5, namely the addition of the <time> element, which can have additional attributes set to define it as the time a post was published. Sites like these are fine, but the vast majority of sites aren't using HTML5 and I don't really see this changing any time soon. Has anyone got ideas on how to accurately identify when a post was created?
The second is accurately identifying the author of a post or page. There are a couple of ways to identify this. The first is if a site has used the hAtom microformat to identify elements of the site, which makes things easy... but as with post dates isn't common.
The next is looking at the metadata of a site and identifying the author based on content stored there. This is uncommon, and the author listed there is generally the owner of the site or another person not responsible for the post, which makes it somewhat unreliable as a resource.
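A rough sketch of how the markup conventions mentioned above (the HTML5 `<time>` element, hAtom classes, and author metadata) could be checked in order of reliability. The function names are hypothetical, the Open Graph `article:published_time` property is an extra assumption not mentioned in the question, and `doc` is assumed to be the browser's `document` or anything else that exposes `querySelector`.

```javascript
// Return the first non-empty value produced by a list of [selector, reader] pairs.
function firstMatch(doc, candidates) {
  for (const [selector, read] of candidates) {
    const el = doc.querySelector(selector);
    const value = el ? read(el) : null;
    if (value && value.trim()) return value.trim();
  }
  return null; // nothing recognisable on this page
}

// Look for a machine-readable publication date.
function findPublishedDate(doc) {
  return firstMatch(doc, [
    // HTML5: <time datetime="2011-03-04">4 March</time>
    ['time[datetime]', el => el.getAttribute('datetime')],
    // hAtom: <abbr class="published" title="2011-03-04">...</abbr>
    ['.hentry .published', el => el.getAttribute('title') || el.textContent],
    // Assumption: Open Graph article metadata, if the site emits it.
    ['meta[property="article:published_time"]', el => el.getAttribute('content')],
  ]);
}

// Look for an author, preferring per-post markup over site-wide metadata.
function findAuthor(doc) {
  return firstMatch(doc, [
    // hAtom: the author is an hCard inside the entry.
    ['.hentry .author .fn', el => el.textContent],
    // Site-wide <meta name="author">; often the site owner, so least trusted.
    ['meta[name="author"]', el => el.getAttribute('content')],
  ]);
}
```

In a page script this would be called as `findPublishedDate(document)`. A `null` result simply means none of these conventions was found, and a page with several `<time>` elements (comments, sidebars) can still yield the wrong one, so the value is best treated as a suggestion for the student to confirm.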

If the website has an RSS feed and the article is recent enough to be included in it you could extract metadata about the article from it.
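As a minimal sketch of that idea, the functions below look up an article by URL in an RSS 2.0 feed that has already been downloaded as text. The names `tagText` and `lookupInFeed` are hypothetical, the `dc:creator` fallback is an assumption about how some feeds name the author, and the regular expressions are a simplification; in a browser, `DOMParser` would be a more robust way to read the XML.

```javascript
// Pull the text of the first <tag>...</tag> inside a chunk of XML, unwrapping CDATA.
function tagText(xml, tag) {
  const match = xml.match(new RegExp('<' + tag + '[^>]*>([\\s\\S]*?)</' + tag + '>', 'i'));
  if (!match) return null;
  return match[1].replace(/^\s*<!\[CDATA\[([\s\S]*?)\]\]>\s*$/, '$1').trim();
}

// Find the <item> whose <link> equals articleUrl and report its date and author.
function lookupInFeed(feedXml, articleUrl) {
  const items = feedXml.match(/<item[\s>][\s\S]*?<\/item>/gi) || [];
  for (const item of items) {
    if (tagText(item, 'link') === articleUrl) {
      return {
        published: tagText(item, 'pubDate'),
        // Assumption: some feeds use <dc:creator>, others <author>.
        author: tagText(item, 'dc:creator') || tagText(item, 'author'),
      };
    }
  }
  return null; // article is not (or no longer) in the feed
}
```

The exact-match comparison on `<link>` is deliberately strict; a real tool would probably need to normalise trailing slashes and tracking parameters before comparing.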

Sounds like a pretty tough thing to make, only because there is absolutely no standardization for this information that I know of. Some sites might put it in their keywords, others not.
I did some scraping as part of a media criticism class, and I found that pretty much each CMS has to be processed individually. Overall, making something that would find the author info on a random web page sounds very difficult.
You might be able to make something specifically for capturing this info from WordPress blogs, since those have so many commonalities. But something designed to just hit up any site and grab specific pieces of info, that's pretty tough.
Not trying to discourage you at all, just saying that you've set a pretty high goal, imho.

Sorry I can't help very much, but what about using regex to scan the page for 'By ___' or 'Source: ___' to get the author / source of the information?
As for the date last modified, as far as I know there's no easy way to grab this, as regex'ing for a date would also return the dates of recent articles in sidebars, links, etc. And yeah, as you said, document.lastModified wouldn't work. You could consider replacing this with a "date added" field in your referencer, or similar.
Hope this helps you at least a little bit, and if not, gives you an idea or two.
Of course, if there's any API / RSS available, you could scan it for the last updated / posted date, and use that?
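The 'By ___' / 'Source: ___' idea could look roughly like the sketch below. The function name and the exact pattern are assumptions made for illustration: it takes the page's visible text and returns up to four capitalised words following the marker, so it will miss lowercase bylines and can produce false positives on ordinary sentences.

```javascript
// Guess an author or source from plain page text using a "By ..." / "Source: ..." marker.
function guessByline(text) {
  // Marker, whitespace, then one to four capitalised words (letters, digits, ' and -).
  const pattern = /\b(?:By|Source:)\s+((?:[A-Z][\w'-]*\s?){1,4})/;
  const match = text.match(pattern);
  return match ? match[1].trim() : null;
}
```

In a browser the input could be something like `document.body.textContent`. Because the match is only a guess, it is probably best shown to the user as an editable suggestion.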

Related

Will Google search results reflect website text that I replace with a script? [duplicate]

I am just wondering if Google or other search engines execute JavaScript on your web page. For example, if you set the title tag using JavaScript, does the Google search engine see that?
There have been some experiments performed for SEO purposes which indicate that at least the big players (Google, for example) can and do follow some simple JavaScript. They avoid sneaky redirects and such, but some basic content manipulation does seem to get through. (I don't have a link handy for Google themselves confirming or denying this, it's just various posts I've come across when dealing with this before.)
However, this is generally considered unreliable. If SEO is being done for any important purpose, don't rely on the spiders indexing much dynamic content.
There's actually a very good (in my opinion, anyway) answer here to a very similar question. What I like about that answer is how it breaks down the steps for generating good, indexable, and best of all maintainable web pages with concerns properly separated. Adhering as much as possible to this process generally results in good SEO, good accessibility, and good design skills in general.
Yes, Google executes Javascript. How much is a moving target.
Google executed some Javascript as early as 2011: http://searchengineland.com/google-can-now-execute-ajax-javascript-for-indexing-99518
This article from 2012 documents some experiments on what Javascript Google did and did not run at the time: http://moz.com/ugc/can-google-really-access-content-in-javascript-really
In May 2014, Google said publicly that they execute Javascript: http://googlewebmastercentral.blogspot.com/2014/05/understanding-web-pages-better.html Although that post says that Google has been getting better, there are no publicly available details on what Javascript Google does and does not execute -- but presumably they are at least as good at it as they were in 2012.
I'm pretty sure they don't. However, you can see for yourself: Google has a tool that will show you your page as Google sees it, at http://www.google.com/webmasters/
If the text is in the on-page JavaScript, Google will see the text, but it will not be seen as the text of the title element.
But hey, this is quite easy to test. Just do it and wait two days. Then google your site with site:.... and look at what's in the headline. If it's in there, the answer is yes, Google sees it; if not, no, Google doesn't.
(P.S.: my money is on no.)
We need to remember that JavaScript is a client-side language and always executes on the client. If all of your titles or content are produced by JavaScript, they are generated on the client, and I doubt they'll show up in Google search (whereas if they are output in the .html itself, then yes).
As far as I know, meta tags are still "fuel for search engines" and tie into SEO, since robots are commonly scripted to crawl your site.

Display Comic Book files on a webpage?

I'm thinking about creating a webpage for comic book files, and I'm trying to brainstorm some ways to display them in the page.
If I wanted to get dirty and create everything myself, I think I could do it with HTML5, CSS3, and JavaScript/jQuery. Just do some kind of page buttons with an image tag and maybe get into some more detailed stuff as it comes up (I don't know how I would do zooming and multiple pages).
But what I really want to know is whether there is already some way to do this. I've looked around for a bit and can't seem to find any sort of plugin that would read a CBZ file or display a set of images with 'e-reader' type tools in mind. Just wondering if anyone knows of anything?
Thanks
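For the do-it-yourself route described in the question (page buttons driving a single image tag), the paging logic itself is small. Below is a minimal sketch, assuming the page images have already been extracted and are available as a list of URLs; `createPager` and its method names are hypothetical.

```javascript
// Keep track of the current page in a list of image URLs, never running off either end.
function createPager(pageUrls) {
  let index = 0;
  const clamp = i => Math.min(Math.max(i, 0), pageUrls.length - 1);
  return {
    current: () => pageUrls[index],
    next() { index = clamp(index + 1); return pageUrls[index]; },
    prev() { index = clamp(index - 1); return pageUrls[index]; },
    goTo(i) { index = clamp(i); return pageUrls[index]; },
  };
}

// Browser wiring (assumed element ids, shown as comments so the sketch stays DOM-free):
//   const pager = createPager(['p1.jpg', 'p2.jpg', 'p3.jpg']);
//   const img = document.getElementById('page');
//   document.getElementById('next').onclick = () => { img.src = pager.next(); };
//   document.getElementById('prev').onclick = () => { img.src = pager.prev(); };
```

Keeping the page state separate from the DOM like this makes it easier to add keyboard shortcuts or history support later without touching the paging rules.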
I used to use an online reader for a long time so I started an experiment to build one myself a while back: netcomix
It's open source so you can see if you find anything appealing in what I did. I figured I'd do all the real UI work client side with HTML, CSS, and JavaScript and the server was strictly responsible for acting as a service (for example, to supply a list of comics or a list of all the pages in a particular issue) and serving up the individual JPG/PNG/GIF files. That compartmentalized things nicely and I was very pleased with how jQuery BBQ gave me a history that I could back through even though I stayed on one page the whole time.
Now if I were to do the same experiment again, I'd use Backbone.js to give some structure to the client side and obviously it needs a lot of love because the server side really does nothing at the moment. Early versions were strictly hard coded although I started putting in some simple SQL stuff in there in the latest version. It's nothing more than an experiment though and should be treated as such. It's there for ideas and little else. If you find it interesting and want some more ideas contact me and I'll be happy to let you know all my wacky ideas for such a program.
I know this is an old question. But web technologies have gotten better in the last few years. There are several comic book readers that can work in the browser using pure HTML and JavaScript. I wrote one called: http://comic-book-reader.com .
If you want to see a very simple example of how to read CBR and CBZ files in the browser, you should check out http://workhorsy.github.io/uncompress.js/examples/simple/index.html which uses the JavaScript library https://github.com/workhorsy/uncompress.js

Replace flash with jquery/html5

I have a project that uses Flash for most pages. Now the client wants to replace the Flash with jQuery/HTML. Where should I start?
The project has a SWF file, and it is embedded with swfobject (JavaScript).
Can someone help by giving me an idea of the steps for converting the SWF to JavaScript/HTML?
If the Flash doesn't contain a huge amount of animation/dynamic interaction then I'd say you should follow a process along these lines:
1) Create HTML documents where the content is structured in a simple, consistent format. If you're not sure about how to do this then I'd suggest you find out some of the basic principles associated with semantic markup (Google is your friend here, there are plenty of great resources - but here's a starter I just came across). Don't worry about elements which involve animation or special user interaction at this stage - just set aside a part of your HTML where such elements need to go for now.
2) Use CSS to layout the document structure with the desired appearance. Do this one step at a time - starting from the largest elements in the page and working your way down to the smallest ones (I find this the best way to build your UI reliably though others may approach it differently).
3) Once your basic pages are structured correctly and looking good it's time to focus on the animated/interactive aspects. The easiest way to do this is to use other people's work: jQuery plugins. Identify what functionality you need and find plugins that already provide that functionality (i.e. you mentioned a slideshow function - Google for "jQuery slideshow" and play around with the options - there will be plenty).
Realistically, if you're not particularly familiar with HTML/CSS/JS this will not be a simple task for you. Here's a few other thoughts that might help you:
Focus initially on the content structure: if you get this right everything else builds on top of it with much less pain. In addition, Google really likes good document structure so this will go a very small way to getting the site better search result rankings.
Don't worry too much about HTML5: aside from the fact that you have to do extra work to make it fully browser compatible (at least, if you have need of a great deal of browser compatibility), it just isn't really necessary to take advantage of elements like nav or video yet (unless your client won't allow the use of Flash video - but that's for another discussion). Don't bite off more than you can chew.
Be consistent: this is related to the first point above. If you apply structure to your elements consistently across all of your documents, you can then take advantage of the same CSS and JS much more easily.
As for the w3schools comments elsewhere on this page, I would say don't use them for tutorials - but they can be a useful reference source for learning HTML elements and attributes and for CSS rules (although there are many other sites who could help here).
Well, I hope this helps you out. Sorry I can't be more specific but I'd need a much more detailed description of your problem before I could give you much more... Good luck
1) Learn Flash (hopefully you've achieved that)
2) Learn Javascript. Learn html5. There are a lot of resources and tutorials.
3) Take the source of flash project. Read it and slowly rewrite to javascript.
4) Once you have translated the project, it is probably suboptimal. Think in JavaScript to change some Flash-like constructions into JavaScript-like ones. Do minor optimizations until you're happy with the results.
Have you tried Wallaby or Swiffy? Adobe is reacting to the demise of excessive Flash usage in other ways too.

Where can I get advice on how to build completely ajax web apps?

I am building a completely ajax web app (this is the first web app I have ever created). I am not exactly sure if I am going about it the right way. Any suggestions or places where I can go to find suggestions?
Update:
I currently am using jQuery. I am working on fully learning that. I have designed a UI almost completely. I am struggling in some parts trying to balance a good UX, good design and fitting all the options I want to fit in it.
I have started with the design. I am currently struggling with whether to use absolute positioning or not and if not how do I use float etc. to do the same type of thing. I am trying to make it have a liquid layout (I hate fixed-layout pages) and am trying to figure out what I should use to make it look the same in most screen sizes.
Understand JavaScript. Know what a closure is, how JavaScript's event handling works, how JavaScript interacts with the DOM (beyond simply using jQuery), prototypal inheritance, and other things. It will help you when your code doesn't work and you need to fix it.
Maintain usability. All the AJAX magic you add is useless if users cannot figure out how to use it. Keep things simple, don't overload the user by giving him information he doesn't need to know (hide less important information, allowing the user to click a link to show it), and if possible, test your app with actual users to make sure that the interface is intuitive to them.
Code securely. Do not allow your server to get hacked. There are many different types of security flaws in web apps, including cross-site scripting (XSS), cross-site request forgery (CSRF), and SQL injection. You need to be well aware of these and other pitfalls and how to avoid them.
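To make the XSS point concrete, here is a minimal sketch of escaping untrusted text before inserting it into HTML. The helper name `escapeHtml` is hypothetical, and this covers only text placed in element content or quoted attribute values; it is not a complete defence, and it does nothing for CSRF or SQL injection, which have to be handled on the server.

```javascript
// Replace the characters that are significant in HTML with their entity equivalents.
function escapeHtml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(text).replace(/[&<>"']/g, ch => entities[ch]);
}
```

When building markup from an AJAX response, something like `'<li>' + escapeHtml(comment) + '</li>'` keeps user-supplied text from being interpreted as tags. With jQuery, setting content through `.text()` rather than `.html()` avoids the problem for plain text.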
One starting point is to look at the Javascript Libraries and decide which one to use:
http://code.google.com/apis/libraries/
http://en.wikipedia.org/wiki/Comparison_of_JavaScript_frameworks
You probably don't want to write raw JavaScript without any library. Once you decide on a library to use, you can look at its documentation online or at books about using it. jQuery does have pretty good documentation.
Define "right way."
There are many "right ways" to code an app.
Things to keep in mind are trying to design a nice interface. The interface can make or break an application and studies show that it can even make it seem faster if you do it right. jQuery is good for this.
Another thing to consider going in is which browsers you want to support. Firefox is really doing well and Google Chrome's market share is growing, so you will want to support those for sure. IE is a tough one as it doesn't have the best support for standards, but if you are selling a product you will really want this.
One of the best articles that I've ever come across about the structure of an ajax web application is this one. It's a little outdated because it refers to XML as the primary data-interchange format, where JSON is now the norm. jQuery, a JavaScript framework, contains excellent functionality for both DOM manipulation and AJAX calls. Both are a must in any AJAX-driven web app.

What's the best way of stopping users from copying and pasting text from a web app?

The site I'm working on displays some proprietary 3rd party data that's quite valuable. As such they want to stop people copying and pasting their information. They understand that, of course, there's nothing we can do to stop users just writing down info or printing it off, but they want to make it as difficult as possible for their data to be taken. The other big concern is performance. The site sees a healthy amount of activity, so keeping it snappy is a big deal.
I was hoping to get a bit of feedback from you guys on the best way of accomplishing this
Some potential solutions that have been suggested:
Use a bit of javascript to stop users hitting ctrl / right clicking (irritating and won't stop more advanced users)
Use Flex (very slow, but very safe since the data is binary)
Create or find some funky html to image converter and display the data as images
Your thoughts and opinions are very welcome.
Thanks in advance!
Charge the users for access to the information.
You can try all sorts of code workarounds, but you really aren't going to stop anyone who is determined. By charging, you limit access to people who really need the information and if they copy it, then at least you've been reimbursed. It also filters out a lot of the people who would use it maliciously. Also, put a legal notice on the information detailing how it can be used so that you can follow up copiers with legal action if necessary.
This really sounds like a serious problem with the origins of the question. If this is something that shouldn't be easy to copy, why is it visible at all?
If it's really proprietary, why is it a good idea to post it on the web?
Seems that an internal webpage would be more appropriate.
It is a tricky situation, since this is the web...
You could use a very small bit of flash to display the sensitive data, which you'd have complete control over, and if it's small, shouldn't hurt your download times. This would probably be my preferred method.
Option #3 would stop people from copying and pasting, but it wouldn't stop them from downloading the image. I'm not sure if that matters to you.
Do you need to serve audiences that have javascript turned off? If not, you could use AJAX to pull the sensitive information in the first place, then use a script to stop them from copying that div or whatever.
You might want to check out Tynt Tracer. It doesn't prevent copying, but at least allows you to track where it's going...in part anyway.
You might look at the option 1, as a "bare minimum" way of doing it, but admittedly it isn't a great option, as simply disabling JS gets around it.
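For reference, the "bare minimum" script approach amounts to cancelling a few events. This is a minimal sketch with a hypothetical function name; as noted above, disabling JavaScript (or using the browser's developer tools) gets around it, so it only deters casual copying.

```javascript
// Cancel copy, cut and right-click on the given element; returns a function that undoes it.
function discourageCopying(target) {
  const types = ['copy', 'cut', 'contextmenu'];
  const cancel = event => event.preventDefault();
  for (const type of types) target.addEventListener(type, cancel);
  return () => {
    for (const type of types) target.removeEventListener(type, cancel);
  };
}

// Browser usage (assumed element id):
//   discourageCopying(document.getElementById('proprietary-data'));
```

Scoping the handlers to the element that holds the sensitive data, rather than the whole document, leaves the rest of the page behaving normally.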
Your third idea would also work, but images can actually make the data easier to save: users can save the image directly, or pull it from their temporary internet files.
Also, as a side note, to prevent printing you might want to specify print-only CSS that hides all content:
@media print { body { display: none; } }
It isn't perfect, but again stops the casual user from printing.
Charging money for the content is a good answer, but I'm guessing you're already charging for the content.
#2 is clearly the most secure option, and the most flexible, allowing you to really punish yourself as much as possible as well (do things like implement over the wire encryption etc...) So it should come as no surprise it is also the most expensive to implement.
Granted, someone can just decompile your code and inspect memory, but at that point, it is doubtful you are going to stop anyone.
Offer the information for download in password protected pdf, where the only thing that they can do is to view it, no printing, copy paste, etc. Although you can't stop a print screen. Primo PDF can do that for you and is free. http://www.primopdf.com/
The key here is that the effort it takes to bypass any solution you choose is greater than the value of the information you are trying to protect from being copied.
