I have a web app that displays productivity figures for the vessels currently working in our container terminal. The data is somewhat sensitive, but not really top secret: it is just moves pending, moves total, productivity in moves per hour, and the time of the first move plus an estimate of when the vessel will finish operations. Still, if it were made public, shipping lines could see how their competitors' vessels are being attended, which might cause issues with our marketing department.
Now, I created the web app for our smartphone users, so they can have the real-time productivity board at hand. That helps them assess the operation as it is happening, and take corrective action to speed up the lifting equipment or fix any issues on the fly.
The app runs on an internal web server, and users must log into the VPN to view the app data; it is not accessible from the outside. Some of our customers have recently asked to have the data available to them, segregated so that each sees only their own vessels. That is no problem, and I can do it easily, but I don't want to give VPN access to each and every customer that wants to use the app.
The app works this way:
a) A Pentaho ETL job queries our databases and produces an XML file, which is saved in the Apache webroot.
b) The XML file is read by the web app, written in HTML5, JS and jQuery, and also using bootstrap.js, datatables.js, realgauge.js and some other frameworks.
My idea is to copy the app resource files to the public web server and have a cron job FTP over the XML files, which are updated every minute, since the public server is reachable from the LAN. That way our smartphone users will no longer have to log into the VPN to access the app.
But there are security concerns, since the HTML, JS and XML files will be exposed to the public. The app will not be publicized, but I'm afraid that an attacker, just by browsing the web root directory, might pinpoint the files and extract the data.
So, my question is a request for a recommendation on which path I should take:

1. XML encryption. I've done some research on it, but I would need to provide some kind of token to be used as a seed for the encryption algorithm, and I'm not quite sure how secure that can be.
2. User/password authentication implemented in the app itself. It might be complicated to maintain a database of users and passwords for everyone who will access the app, and I'm worried about the administrative overhead of lost passwords and the like, although I haven't researched the subject fully yet. I looked into hello.js, and it seems promising; I would like to hear your opinions on it.
3. Joomla. We use Joomla 3 as the CMS for our website, so maybe there is something we can use on the Joomla side, such as its user/password authentication system, to control access to the app.
4. Any other option you think I should research.
Our main goal is: have the app available to our mobile and other external users, while not exposing the plain XML file with all the data.
Many thanks to all for the help.
UPDATE
I've been researching a Joomla template called "Blank". It turns out there is even a Bootstrap version, so if I can fit my code into the template, I can use Joomla's access control to publish my content to logged-in users and apply the customized template. With this I'll be fixing two issues:
I can publish customized customer data
I can also publish our mobile site to every one of our own mobile users, and I'll be saving tons of $ on VPN licenses.
Thanks all for your help.
I'm assuming goals of: (1) pretty good security, and (2) minimal development work necessary.
Then I prefer your approach #2. I would guess from the situation you describe that there isn't a huge need to change passwords, so you can just generate user/password combinations yourself and share them with clients. You could update them once a year if necessary. Then it's straightforward to either secure access to your app using a user/password login, or to encrypt each XML file for a client using that client's password.
If you found there really was a major need for clients to change their passwords, the question would be how to store and update the passwords instead of just having the app read a flat file.
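For illustration, here is a minimal sketch of the second idea (encrypting each client's XML with a key derived from that client's password), assuming Node.js and its built-in crypto module; the file names and password are placeholders:

```js
// Minimal sketch: encrypt each client's XML with a key derived from
// that client's password, using Node's built-in crypto module.
// File names and the password are placeholders.
const crypto = require('crypto');
const fs = require('fs');

function encryptXmlForClient(xmlPath, outPath, clientPassword) {
  const salt = crypto.randomBytes(16);
  const iv = crypto.randomBytes(12);
  const key = crypto.scryptSync(clientPassword, salt, 32); // 256-bit key
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const encrypted = Buffer.concat([cipher.update(fs.readFileSync(xmlPath)), cipher.final()]);
  // Prepend salt, IV and auth tag so the client app can derive the
  // same key from the password and verify integrity when decrypting.
  fs.writeFileSync(outPath, Buffer.concat([salt, iv, cipher.getAuthTag(), encrypted]));
}

encryptXmlForClient('vessels-clientA.xml', 'vessels-clientA.xml.enc', 'clientA-password');
```

That said, decrypting in the browser means the password (and hence the data) is still exposed to anyone the client shares it with, so a server-side login that serves each client only their own XML is usually the simpler choice.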
The company I work for has a requirement to protect an area where articles are rendered. I've implemented some measures against automated web scraping, but the problem remains for manual scraping.
The anti-scraping bot protection mechanism seems to be working well so far, but I see clients attempting manual scraping.
What I have tried to protect the article contents:
Set a copy event handler on the article's wrapper element to prevent copying (a minimal sketch of this handler appears after this list).
-> Clients can use userscripts (Greasemonkey, etc.) to bypass this efficiently, by removing the event handler or simply writing scripts that copy the contents and save them to a file
Developer console protection -> useless
Redirect if F12 pressed -> useless
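For reference, the copy handler from the first item above is typically just a few lines like the following ('#article' is a placeholder selector), which is why a userscript can neutralize it so easily:

```js
// Block the 'copy' event on the article wrapper ('#article' is a
// placeholder selector). Easily bypassed: a userscript can remove the
// listener or just read the element's textContent directly.
document.querySelector('#article').addEventListener('copy', (event) => {
  event.preventDefault();
});
```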
It seems like protecting HTML is not doable (unless someone tells me otherwise), so I'd like to know other ways to display text that render it totally UNABLE to be copied.
Things I've thought:
JS detection mechanisms to diagnose whether the user has any sort of userscript running, in other words, to verify that no malicious JS code is being injected and executed to extract the text
Transforming the article's HTML into a PDF and displaying it inline with some sort of anti select/copy protection (if that even exists).
Transforming the article's HTML into chunks of base64 images, which would render the text completely impossible to select and copy.
Are there any good ways to prevent my content from being stolen without interfering much with the user experience? Unfortunately Flash applets are no longer supported; they used to work like a charm in that era.
EDIT: C'mon folks, I just need ideas to at least make the end user's effort a bit harder; e.g. you can't select text if it's displayed as images, you can only select the images themselves.
Thanks!
As soon as you ship HTML out of your machine, whoever gets it can mangle it at leisure. You can make it harder, but not impossible.
Rethink your approach. "Give information out" and "forbid its use" somewhat clash...
No, You Can't
Once the browser has loaded your page, you can't protect the content from being copied or downloaded.
Whether it's text, images or videos, you can protect it from unauthorized access, but you can't protect it from being scraped by an authorized person.
But you can make it harder using the steps you mentioned in your question, and by enforcing copyright law.
This issue still exists on many sites, especially e-learning platforms such as Udemy. On those sites, the premium courses still get copied and leaked by the people who bought them.
From the Udemy FAQ:
For a motivated Pirate, however, any content that appears on a computer screen is vulnerable to theft. This is unavoidable and a problem across the industry. Giants like Netflix, Youtube, Amazon, etc. all have the same issue, and as an industry, we continue to work on new technology solutions to limit Piracy.
Because pirating techniques currently outpace protection, we hired a company who is specifically dedicated to enforcing the DMCA laws on your behalf and target violating individuals, hosting sites, and DNS servers in an attempt to get any unauthorized content removed.
Over the last couple of days I began teaching myself how to create a website from scratch.
I bought some webspace and fooled around with HTML, CSS and JavaScript, and when I wanted to build an online chess game I learned about Node.js.
But I don't understand what Node.js is used for, because the documentation shows how to install it and create a fresh server(!) with Node.js and handle requests.
Do I no longer have to use an Apache installation on my server?
Do I create the whole website and all its pages with Node.js, like the index or about page?
If I use Node.js just for a web application, how can I add the web app to a page of an already existing Apache website?
I think I'm really confused and need some help understanding Node.js better, since so many people are using it.
Do I no longer have to use an Apache installation on my server?
Correct. You create your whole web server in node.js. You generally don't use or need Apache with it.
Do I create the whole website and all its pages with Node.js, like the index or about page?
Yes, you create the whole web server in node.js and use it to serve all your web pages. Typically one might use a number of libraries with node.js, such as Express for mapping all the routes in your web app, and your favorite template engine to help fill in data in HTML pages before serving them to the client. At very high scale, one might consider using other infrastructure (like nginx) to offload static resources from your node.js server to increase scalability, but that is not necessary at small or medium scale, as node.js scales really well.
If I use Node.js just for a web application, how can I add the web app to a page of an already existing Apache website?
You can run the two web servers on different ports and have both be part of your website, linking to each other as needed. But typically you would move everything you currently have in Apache over to your node.js app. You don't have to do that, but most people wouldn't start out with the objective of building a website out of both node.js and Apache.
One thing to keep in mind is that node.js/Express is conceptually a bit different from Apache in how you build a simple website. A node.js/Express web server serves NO content at all by default, so you don't just drop a directory hierarchy of web pages on your node.js server and expect it to serve those pages. You can do that pretty easily with express.static() (a feature of the Express library) if that's part of your site design, but you have to consciously configure it in node.js (it takes just two lines of code to do so).
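For illustration, the "two lines" mentioned above look roughly like this minimal sketch (the 'public' directory name is an assumption):

```js
// Minimal Express server: consciously opt in to serving static files.
const express = require('express');
const app = express();

app.use(express.static('public')); // serve everything under ./public
app.listen(3000, () => console.log('Listening on http://localhost:3000'));
```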
If you want to write a "simple" chess game, your best bet is to start learning Canvas. This tutorial by the Mozilla Foundation is one I personally used and enjoyed a lot. If you want the computer to play as a CPU opponent, then you would likely need a server (node.js!).
You can use Node.js to build a simple website, but that would be like using a screwdriver as a hammer. Node.js is used for making desktop apps and for programming servers. Servers (the backend) analyze user input and provide some sort of feedback. Let's use Stack Overflow as an example. The frontend (HTML, CSS, JavaScript) is what you see; all of that runs in the browser. The way that code gets to someone's computer so their browser can render your website is via a server. Servers do other cool things, too: when you do a search on a website or save a post, the server stores that data or finds you the right results. If you want to build an API, like the ones used for Google Maps or Yahoo Finance, you'd use a server.
If you want to run your own Node.js server, I'd recommend Digital Ocean or Heroku. They are beginner-friendly and respected in the industry. Heroku has a free tier and is owned by Salesforce, if that makes a difference. However, this is unnecessary for beginners; I recommend using a free or low-cost hosting platform that deals with that for you.
The thing about Node.js is that you can use it to create websites via template engines, but I wouldn't recommend that. You are essentially writing server-side JavaScript to create HTML, which is foolish most of the time when you can just write the HTML itself and have node.js serve it.
I have a few questions which I'd appreciate some answers to.
So I've created a backend Node server with Express & Mongo which runs specific tasks on the net in a loop and saves the results in the database. I've also added an admin page with Express & Bootstrap, and that works fine. What I needed then was a frontend page, and for this I chose VueJS. I started that project separately for multiple reasons: I felt it would be easier to get started, since I didn't have any frontend framework experience before, and the backend project was written in TypeScript while I'd rather use normal ES6 JS for now.
Right now the site has already made some pretty decent progress and is at the point where I need to establish a connection with the database and also use some of the functions already implemented in the backend project.
And this raised the question:
Should I create new functions and/or create and use APIs? Would there be any problem with MongoDB being accessed and written to by two different processes? Would there be security issues if I created "public" APIs from my already existing backend logic? (I haven't written any APIs yet.)
Or should I use the time to merge the frontend project into the backend (meaning either translating the new code to TypeScript or switching everything to normal ES6 JS)? Would this be a security risk, since I'd rather not have the backend logic in my frontend site?
I appreciate any answer to that!
Thank you :)
This is a question of whether you can afford to run two servers. Separating your frontend from your backend is actually a good move, considering all things microservices, since it allows you to scale them separately in the future: your backend may need more resources once you start catering to mobile users as well, or once you get more API calls, while your frontend server need only serve the UI and assets, nothing more. The clear downside is the increase in cost, since you need to run two servers instead of one, which is difficult when you are just starting out.
Should I create new functions and/or create and use API's?
For your backend? Yes. APIs are the way to do things now in the web space, as they future-proof you and allow a more controlled and uniform way to access your backend (everything goes through the API). So if your frontend isn't accessing your database through APIs yet, I suggest you refactor it to do so.
As for your concerns about Mongo, I'm pretty sure it already has features in place to avoid deadlocks.
As for the security of your API, I suggest checking out JWT (JSON Web Tokens).
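A minimal sketch of that idea, assuming the jsonwebtoken package (the secret and payload are placeholders):

```js
// Minimal JWT sketch using the jsonwebtoken package
// (npm install jsonwebtoken). Secret and payload are placeholders.
const jwt = require('jsonwebtoken');

const SECRET = process.env.JWT_SECRET || 'change-me';

// Issue a token after a successful login.
const token = jwt.sign({ userId: 42 }, SECRET, { expiresIn: '1h' });

// Verify the token on each API request.
try {
  const payload = jwt.verify(token, SECRET);
  console.log('Authenticated request for user', payload.userId);
} catch (err) {
  console.log('Reject the request: invalid or expired token');
}
```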
should I use the time to merge the frontend project into the backend
Should you go down this path instead due to cost concerns, I would suggest rewriting one of the codebases to comply with the other for uniformity's sake, though do that at your leisure (we can't have you wasting all your precious time rewriting code that already works just fine). This isn't really much of a security issue, since backend code isn't sent to the frontend for all your users to see.
Let me start by saying I've never used Vue. However, whenever I use React, I always make separate projects for the frontend and the backend. I find it "cleaner" to keep the two separate.
I see no reason for you to translate your entire project out of TypeScript. Simply have your frontend make requests to your backend.
If you're looking to brush up on your web security, I recommend you look into the Open Web Application Security Project (OWASP).
I'm a newbie developing a small web application that uses HTML, JavaScript (Angular 1.5.0) & CSS. I am using Grunt to process, lint, and minify my JavaScript and CSS files. This frontend communicates via HTTP REST calls to a Django application that is also running locally (and has its own database).
This webapp currently lives only on my laptop (MacBook Pro) and I use the PyCharm IDE to edit the files. When I want to test out the app, I simply go to http://localhost:63342/myapp/index.html#. PyCharm's built-in webserver serves it up for me and I can play with it there.
However, I want to allow a select few people to also access the webapp from other locations on the internet. When they try to access http://MyPublicIpAddress:63342/myapp/index.html, they get connection denied.
What is the quickest/easiest/simplest way I can share my webapp with those other people? I do not know much about setting up and configuring web servers, so simple/easy instructions (or a pointer to a doc!) would be most appreciated.
I posted this question to the PyCharm community forum here, but got no response.
Edit
Many answers say I need a hosting service. Yes, if I want to deploy my website to a fixed IP address. But is there no way to simply let them briefly visit my webapp while a toy web server runs temporarily on my laptop? I understand this is not a long-term solution; it's just to give them a peek. If possible I would like to avoid the effort and learning curve involved in pushing it to a hosting service, where I would have to set up the backend API, database, etc. (which are all currently running locally).
There are many services that allow you to host your project online.
For small projects
CodePen: http://codepen.io/
Plunker: http://plnkr.co/
kodeWeave: http://kodeweave.sourceforge.net/
For large projects
Cloud9IDE: https://c9.io/
Koding: http://koding.com/
Github: https://pages.github.com/
Sourceforge: https://sourceforge.net/
Heroku: https://www.heroku.com/
BTW: kodeWeave is my project. It uses GitHub Gists to save and retrieve your weaves online, so nothing is actually saved on the site, and Gists are a very reliable host for small projects like these. (Inspired by Dabblet.)
It's being made as a kind of JSFiddle alternative for mobile devices, except without all the HTTP requests.
It has many libraries built in (such as jQuery, Angular, Font Awesome, etc.); in addition, when you export as a zip file you get all those libraries (hence the "except without all the HTTP requests" comment). You can also export your weave as a Windows, Linux, Mac, or Chrome application, and/or as a Chrome popup extension.
You can watch this video I made that explains how to use kodeWeave for desktop exportation.
I've listed services I use and recommend. I will not list something I haven't tried without warning.
If you have a spare laptop you can use it as a web server. I've never done it myself because it's not worth it for me, but it's something you may want to look into.
Lastly, you can read Creating a Local Server Configuration with PyCharm, which may be the option you're looking for.
Use localtunnel to expose your localhost: https://github.com/localtunnel/localtunnel
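A minimal sketch based on localtunnel's documented Node API; the port matches PyCharm's from the question, and the public URL is assigned by the localtunnel service:

```js
// Minimal localtunnel sketch (npm install localtunnel). Exposes the
// local PyCharm server on a temporary public URL; port 63342 is the
// one PyCharm uses in the question.
const localtunnel = require('localtunnel');

(async () => {
  const tunnel = await localtunnel({ port: 63342 });
  console.log('Share this URL:', tunnel.url);
  tunnel.on('close', () => console.log('Tunnel closed'));
})();
```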
You need hosting, or try codepen.io for small projects.
Change the configuration in PyCharm to host at 0.0.0.0. You will also need to set up port forwarding on your router... I would strongly suggest not using this as any sort of long-, medium- or short-term solution.
I was wondering what would be the most ethical way to consume some bytes (386, precisely) of content from a given Site A with an application (e.g. on Google App Engine) on some Site B, doing it right. No scraping intended; I really just need to check the status of a public service, and they currently don't provide any API. The markup on Site A has a JavaScript array with the info I need, and being able to access it, say, once every five minutes would suffice.
Any advice will be much appreciated.
UPDATE:
First of all, thanks very much for the feedback. Site A is basically the website of the company that currently runs our public subway network. I'm planning to develop a tiny free Android app that gives anyone not only a map of the whole network and its stations but also updated information about the availability of the service (those are the bytes I will eventually be consuming), etc.
There will be some very different points of view, but hopefully here is some food for thought:
1. Ask the site owner first; if they know ahead of time, they are less likely to be annoyed.
2. Is the content on Site A accessible on a public part of the site, i.e. without the need to log in?
3. If the answer to #2 is that it is public content, then I wouldn't see an issue, as scraping the site for that information is really no different from pointing your browser at the site and reading it yourself.
4. Of course, the answer to #3 depends on how the site is monetised. If Site A serves advertisements to generate revenue, then it might not be a good idea to start scraping content, as you would be bypassing how the site makes money.
I think the most important thing to do is talk to the site owner first, and determine straight from them:
Is it OK for me to scrape content from their site?
Do they have an API in the pipeline? (Simply highlighting the desire may prompt them to consider it.)
Just my point of view...
Update (4 years later): The question specifically embraces the ethical side of the problem. That's why this old answer is written in this way.
Typically in such situation you contact them.
If they don't like it, then ethically you can't do it (legally is another story, depending on whether the site provides a license or not, what login/anonymity or other restrictions they place on access, whether you have to use test/fake data, etc.).
If they allow it, they may provide an API (which might involve costs; it will be up to you to determine how much the feature is worth to your app), or promise some sort of expected behavior for you, which might itself be scraping, or whatever other option they decide.
If they allow it but aren't ready to help make it easier, then scraping (with its other downsides still applicable) will be all right, at least ethically.
I would not touch it, save for emailing the site admin and getting their written permission.

That being said: if you're consuming the content yet not extracting value beyond what a single user gets when observing the data you need from them, it's arguable that any TOU they have wouldn't find you in violation. If, however, you get noteworthy value beyond what a single user would get from the data on their site (let's say you use the data and your results end up providing value to 100x as many of your own site's users), I'd say you need express permission to do that, to sleep well at night.

All that's off, however, if the info is already in the public domain (and you can prove it), or the data you need from them is under some type of "open license", such as from GNU.

Then again, the web is nothing without links to others' content. We all capture and re-post stuff on various forums; say we read an article on CNN, then comment on it in an online forum, maybe quote the article, and provide a link back to it. It just depends, I guess, on how flexible and open-minded the site's admin and owner are. But really, to avoid being sued (if push comes to shove), I'd get permission.
Use a user-agent header which identifies your service.
Check their robots.txt (and re-check it at regular intervals, e.g. daily).
Respect any Disallow in a record that matches your user agent (be liberal in interpreting the name). If there is no record for your user-agent, use the record for User-agent: *.
Respect the (non-standard) Crawl-delay, which tells you how many seconds you should wait before requesting a resource from that host again.
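A minimal sketch of a polite client along these lines, assuming Node 18+ for the global fetch; the URLs and User-Agent string are placeholders, and a real client should fully parse robots.txt records rather than use the crude check shown:

```js
// Minimal sketch of a polite fetch: identify the service via the
// User-Agent header and re-check robots.txt before fetching.
// URLs and the UA string are placeholders.
const USER_AGENT = 'SubwayStatusBot/1.0 (contact@example.com)';

async function politeGet(url) {
  const res = await fetch(url, { headers: { 'User-Agent': USER_AGENT } });
  return res.text();
}

(async () => {
  const robots = await politeGet('https://site-a.example/robots.txt');
  // Crude placeholder check; a real client should parse records per
  // user agent and honor Crawl-delay as described above.
  if (/^Disallow:\s*\/status/mi.test(robots)) {
    console.log('Disallowed by robots.txt; not fetching.');
    return;
  }
  const page = await politeGet('https://site-a.example/status');
  console.log('Fetched', page.length, 'characters');
})();
```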
"no scraping intended" - You are intending to scrape. =)
The only reasonable ethics-based reasons one should not take it from their website are:
They may wish to display advertisements or important security notices to users
This may make their statistics inaccurate
In terms of hammering their site, it is probably not an issue. But if it is:
You probably wish to scrape the minimal amount necessary (e.g. make the minimal number of HTTP requests), and not hammer the server too often.
You probably do not wish to have all your apps query the website directly; you could have your own website query it via a cron job, as sketched below. This gives you better control in case they change their formatting, and lets you throw "service currently unavailable" errors to your users just by changing your website. It introduces another point of failure, but it's probably worth it: this way, if there's a bug, people don't need to update their apps.
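A minimal sketch of that cron-style approach, assuming Node 18+ and Express; the URL and the extraction regex are placeholders, and the five-minute interval comes from the question:

```js
// Minimal sketch of the "poll from your own server" idea: fetch Site A
// every five minutes, cache the extracted status, and let the apps hit
// this cache instead of Site A directly. The URL and regex are
// placeholders; the regex also assumes the embedded JavaScript array
// happens to be JSON-compatible.
const express = require('express');

const SOURCE_URL = 'https://site-a.example/status'; // placeholder
let cached = { status: null, updatedAt: null };

async function poll() {
  try {
    const html = await (await fetch(SOURCE_URL)).text();
    const match = html.match(/var serviceStatus = (\[.*?\]);/s); // placeholder pattern
    if (match) {
      cached = { status: JSON.parse(match[1]), updatedAt: new Date().toISOString() };
    }
  } catch (err) {
    // Keep serving the last known status if Site A is unreachable.
    console.error('Poll failed:', err.message);
  }
}

setInterval(poll, 5 * 60 * 1000);
poll();

const app = express();
app.get('/status', (req, res) => res.json(cached));
app.listen(3000);
```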
But the best thing you can do is to talk to the website owners and ask them what is best. They may have a hidden API they would allow you to use, and perhaps have allowed others to use as well.