I've read through quite a few pages of documentation and other StackOverflow questions/answers but can't seem to come across anything that can help me with my scenario.
I'm hosting a public, static site in an S3 bucket. This site makes some calls to an API that I have hosted on an EC2 instance in a VPC. Because of this, my API can only be called by other instances and services in the same VPC.
But I'm not sure how to allow the S3 Bucket site access to my API.
I've tried creating VPC Endpoints and going down that route, but all that did was restrict access to my S3 site to only the instances within my VPC (which is not what I want).
I would appreciate any help with this, thank you so much.
Hopefully my question is clear.
No, S3 static websites are 100% client-side code. So it's basically just HTML + CSS + JavaScript being delivered, as-is, from S3. If you want to get dynamic content into your website, you need to look at calling an API accessible from your user's browser, i.e. from the internet.
AWS API Gateway with Private Integrations could be used to accept the incoming REST call and send it on to your EC2 Server in your VPC.
My preferred solution to adding dynamic data to S3 static websites is using API Gateway with AWS Lambda to create a serverless website. This minimises running costs and maintenance, and allows for quick deployments. See The Serverless Framework for getting up and running with this solution.
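For illustration, a minimal sketch of what a Lambda function behind API Gateway could return to the static site, assuming the Node.js runtime and the Lambda proxy integration response format (the payload and CORS origin are placeholders):

// Minimal Lambda handler (Node.js) returning JSON for the static site to fetch
exports.handler = async (event) => {
  return {
    statusCode: 200,
    headers: {
      "Access-Control-Allow-Origin": "*", // or restrict to the S3 website's origin
      "Content-Type": "application/json"
    },
    body: JSON.stringify({ message: "dynamic data for the static site" })
  };
};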
A static site doesn't run on a server. It runs entirely in the web browser of each site visitor. The computer it is running on would be the laptop of your end-user. None of your code runs in the S3 bucket. The S3 bucket simply stores the files and serves them to the end-user's browser which then runs the code. The route you are going down to attempt to give the S3 bucket access to the VPC resource is not going to work.
You will need to make your API publicly accessible in order for code running in your static site (running in web browsers, on end-users' laptops/tablets/phones/etc.) to be able to access it. You should look into something like API keys or JWT tokens to provide security for your public API.
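For example, once the API is public, the static site's JavaScript might call it roughly like this; the URL, the jwtToken/apiKey variables, and the renderItems function are placeholders, assuming an API secured with a bearer token and/or an API key:

// Browser-side call from the static site to the (now public) API
fetch("https://api.example.com/items", {      // hypothetical public API endpoint
  headers: {
    "Authorization": `Bearer ${jwtToken}`,    // JWT obtained after the user signs in
    "x-api-key": apiKey                       // or an API Gateway API key
  }
})
  .then(res => res.json())
  .then(data => renderItems(data))            // hypothetical render function
  .catch(err => console.error(err));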
Please help me rephrase the question if it doesn't fully reflect what I want.
I have an app that consists of:
1) A Go API server
2) A Go frontend server that consumes the API's endpoints
3) A Docker container that runs both servers
During development I had localhost set statically in multiple places, such as:
1) In the API server, for enabling CORS so the frontend can communicate with the API
Example:
// enableCors sets the headers that allow the frontend origin to call the API with credentials
func enableCors(w *http.ResponseWriter) {
	(*w).Header().Set("Access-Control-Allow-Origin", "http://localhost:8080")
	(*w).Header().Set("Access-Control-Allow-Credentials", "true")
}
2) In the API server, for redirecting to and from the API in order to authenticate the user
3) In the frontend server's JavaScript, to access the API
4) In the Google API authorized redirect URIs, as:
http://localhost:8001/oauth/authenticate?method=google
For the first time, I want to make my application operate in a more production-like way.
If every time someone downloads my application and builds a Docker image from it the resulting container ends up with a different external IP address, how should I structure my code to account for this dynamic IP (especially with the Google Cloud APIs redirect for OAuth)? Or am I fundamentally wrong, and is this not possible or not wanted behavior in the first place, since in the real world there is usually one server on which an application is hosted and its IP is always known and static?
The only thing I came up with, if this is achievable at all, is to somehow get the container IP inside the Dockerfile and then set it as an environment variable that both servers would use, but that still doesn't solve the problem of Google's OAuth requiring a static allowed redirect URI. Hopefully I was able to convey what the problem is, because I have had trouble phrasing it.
Progress:
So far I have managed to set an environment variable through a shell script during the Docker build like so:
export SERVER_IP=$(hostname -I | xargs)
and for the JavaScript I used the webpack plugin Dotenv with the systemvars parameter set to true to capture system environment variables and make them available in the JS.
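For reference, a minimal sketch of the relevant webpack configuration, assuming the dotenv-webpack plugin; with systemvars set to true, the SERVER_IP exported by the shell script becomes available as process.env.SERVER_IP in the bundled JavaScript:

// webpack.config.js (sketch)
const Dotenv = require("dotenv-webpack");

module.exports = {
  // ...the rest of your webpack config...
  plugins: [
    new Dotenv({ systemvars: true }) // copy system environment variables (e.g. SERVER_IP) into the bundle
  ]
};

The frontend code can then build its API URLs from process.env.SERVER_IP instead of a hardcoded localhost.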
It is enough to ask the users of your app to fill in a config file. They must provide their own API key at least, and they will configure their own Google API credentials and can easily supply their IP (or just their static domain name). You only need to explain this to users in the README.
I have been working on a web application where I scrape data using Scrapy and store the data on S3. Now I want to fetch the data into my React project. This works well if I set the data to be public.
axios
  .get(`https://s3-eu-west-1.amazonaws.com/bucket/data.json`)
  .then(res => {
    console.log("Data: ", res.data);
    this.setState({ events: res.data });
  })
  .catch(err => console.log("error", err)); // pass a function to catch; the original invoked console.log immediately
Question
I don't want the data I'm scraping to be public; it should only be available to my web application. Is this even possible?
I'm assuming you're talking about a client-side web application that runs in the user's browser? As far as I know, you need at least some server-side component to control or allow access to private S3 resources. This could be a lambda function or an actual server, but AFAIK there is no safe way to do this from the client only.
There are two ways that I'm aware of to expose private S3 resources to a client-side app:
If there is a server under your control (e.g. a NodeJS server that delivers your app, or perhaps provides API services), you can connect to S3 securely from the server side and deliver whatever you need to the client side. This could also be done from a lambda function. Whatever you choose, you still need a way to make sure the client/app requesting the content should have access to it, e.g. the user should have a valid session.
You could allow access to a private S3 object by generating a pre-signed URL that gives the client app a fixed amount of time to download the content. This is probably an endpoint on your server (or lambda) that your client-side app calls only after making sure the user that requested it is authorized; there's a small sketch of this after the links below.
Here's a tutorial on Medium that explains both options: https://blog.fyle.in/sharing-files-using-s3-pre-signed-urls-e05d4603e067
Here's a StackOverflow answer with example code for Node: Nodejs AWS SDK S3 Generate Presigned URL
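As a rough sketch of that second option, assuming the AWS SDK for JavaScript (v2) on the server; the bucket name, key, and expiry are placeholders, and your endpoint should verify the user's session before returning the URL:

// Server-side sketch: generate a time-limited pre-signed URL for a private object
const AWS = require("aws-sdk");
const s3 = new AWS.S3({ region: "eu-west-1" });

function getDataUrl() {
  // URL expires after 60 seconds; the client then fetches data.json directly from S3 with it
  return s3.getSignedUrl("getObject", {
    Bucket: "my-private-bucket", // placeholder bucket
    Key: "data.json",
    Expires: 60
  });
}

The React app would then request this URL from your endpoint and pass it to axios just like the public URL in the snippet above.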
Per my review of how to set up secure access to Amazon S3 buckets, it looks like we first create an IAM user and then attach a security policy allowing S3 access to that user. After that we can generate API keys for the bucket, which can authenticate requests for bucket access. That's my understanding at this point; please correct me if I missed something.
I assume the API keys should be server-side only (the Secret Access Key); in other words, it's not safe to place these directly inside the web app? Hence we would first have to send the data to our server, and then once there we can send it to the bucket using the API key?
Is there any way to secure access directly from a web app to an Amazon S3 bucket?
Approach Summary
Per the discussion with @CaesarKabalan, it sounds like the approach that would allow this is:
1) Create an IAM user that can create identities that can be authenticated via Amazon Cognito - let's call the credentials assigned from this step Cognito credentials.
2) The user signs in to the webapp with, for example, Google.
3) The webapp makes a request to the webapp's server (this could be a Lambda function) to sign up the user with Amazon Cognito.
4) The webapp now obtains credentials for the user directly from Amazon Cognito and uses these to send the data to the S3 bucket.
I think that's where we are conceptually. Now it's time to test!
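A minimal browser-side sketch of steps 2-4, assuming an existing Cognito identity pool federated with Google and the AWS SDK for JavaScript (v2); the pool ID, bucket name, and the googleIdToken/userId/payload variables are placeholders:

// After the Google sign-in completes, exchange the Google ID token for temporary AWS credentials
AWS.config.region = "eu-west-1";
AWS.config.credentials = new AWS.CognitoIdentityCredentials({
  IdentityPoolId: "eu-west-1:00000000-0000-0000-0000-000000000000", // placeholder identity pool ID
  Logins: { "accounts.google.com": googleIdToken }                  // ID token from the Google sign-in
});

// Use the temporary credentials to write directly to the bucket
const s3 = new AWS.S3();
s3.putObject(
  { Bucket: "my-app-bucket", Key: `users/${userId}/data.json`, Body: JSON.stringify(payload) },
  (err, data) => (err ? console.error(err) : console.log("uploaded", data))
);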
From your question I'm not sure what portions of your application are in AWS or what your security policies are, but you basically have three options:
(Bad) Store your keys on the client. Depending on the scope of your deployment this might be ok. For example, if each client has its own dedicated user and bucket there probably isn't much risk, especially if this is for a private organization where you control all aspects of the access. This is the easiest but least secure option. You should not use this if your app is multi-tenant. Probably move along...
(Great) Use an API endpoint to move this data into your bucket. This would involve some sort of infrastructure to receive the file securely from the client then move it into S3 with the security keys stored locally. This would be similar to a traditional web app doing IO into a database. All data into S3 goes through this tier of your app. Downsides are you have to write that service, host it, and pay for bandwidth costs.
(Best) Use Amazon Cognito to assign each app/user their own access key. I haven't done this personally, but my understanding is you can provision each entity their own short-lived access credentials that can be renewed, and you can give them access to write data straight to S3. The hard part here will be structuring your S3 buckets and properly designing the IAM credentials for your app users to ONLY be able to do exactly what you want. The upside here is that users write directly to the S3 bucket, you're using all native AWS services, and you're writing very little custom code. This I would consider the best, most secure, and enterprise-class solution. Here is an example: Amazon S3: Allows Amazon Cognito Users to Access Objects in Their Bucket
Happy to answer any more questions or clarify.
What is the current best practice if I want to allow a user to have a local HTML/JS website that can upload and download files from an Amazon S3 bucket, as an anonymous user?
I.e., I would like to load a local version of index.html in my browser and access a previously set up (public?) S3 bucket, without requiring the user to be logged in to Amazon or any other identity service. (I am aware of the pitfalls of using a public S3 bucket.)
Is my only option to use the AWS javascript SDK, and an Unauthenticated user as mentioned here?
https://aws.amazon.com/blogs/developer/authentication-with-amazon-cognito-in-the-browser/
That is the only reasonably safe, reasonably viable solution... assuming it will work with a local file.
It's technically possible to write to a bucket using ordinary HTTP PUT requests by setting the bucket policy accordingly, but... here be dragons. If your bucket allows anonymous access, then it's relatively trivial for anyone to upload objects to your bucket that you cannot access at all, other than to delete them.
Usually, when someone asks about anonymous access, they're trying to save some effort... but the effort saved is not likely to be worth the potential cost.
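If the unauthenticated-Cognito route from that blog post works for your local file, the browser-side code might look roughly like this (AWS SDK for JavaScript v2; the identity pool ID and bucket name are placeholders, and the pool's unauthenticated IAM role should be scoped to only the keys you want writable):

// Anonymous (unauthenticated) Cognito identity -> temporary, tightly-scoped credentials
AWS.config.region = "us-east-1";
AWS.config.credentials = new AWS.CognitoIdentityCredentials({
  IdentityPoolId: "us-east-1:00000000-0000-0000-0000-000000000000" // placeholder identity pool ID
});

const s3 = new AWS.S3({ params: { Bucket: "my-upload-bucket" } }); // placeholder bucket
const file = document.querySelector("input[type=file]").files[0];

s3.upload({ Key: `uploads/${file.name}`, Body: file }, (err, data) =>
  err ? console.error(err) : console.log("uploaded to", data.Location)
);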
I am using Google Cloud Storage. To upload to Cloud Storage I have looked at different methods. The method I find most common is that the file is sent to the server, and from there it is sent to Google Cloud Storage.
I want to move the file directly from the user's web browser to Google Cloud Storage. I can't find any tutorials related to this. I have read through the Google API Client SDK for JavaScript.
Going through the Google API reference, it states that files can be transferred using an HTTP request. But I am confused about how to do it using the API client library for JavaScript.
People here usually expect you to share some code, but I haven't written any yet; I have failed to find a method that does the job.
EDIT 1: Untested Sample Code
So I got really interested in this, and had a few minutes to throw some code together. I decided to build a tiny Express server to get the access token, but still do the upload from the client. I used fetch to do the upload instead of the client library.
I don't have a Google cloud account, and thus have not tested this, so I can't confirm that it works, but I can't see why it shouldn't. Code is on my GitHub here.
Please read through it and make the necessary changes before attempting to run it. Most notably, you need to specify the location of the private key file, as well as ensure that it's there, and you need to set the bucket name in index.html.
End of edit 1
Disclaimer: I've only ever used the Node.js Google client library for sending emails, but I think I have a basic grasp of Google's APIs.
In order to use any Google service, we need access tokens to verify our identity; however, since we are looking to allow any user to upload to our own Cloud Storage bucket, we do not need to go through the standard OAuth process.
Google provides what they call a service account, which is an account that we use to identify instances of our own apps accessing our own resources. Whereas in a standard OAuth process we'd need to identify our app to the service, have the user consent to using our app (and thus grant us permission), get an access token for that specific user, and then make requests to the service; with a service account, we can skip the user consent process, since we are, in a sense, our own user. Using a service account enables us to simply use our credentials generated from the Google API console to generate a JWT (JSON web token), which we then use to get an access token, which we use to make requests to the cloud storage service. See here for Google's guide on this process.
In the past, I've used packages like this one to generate JWTs, but I couldn't find any client libraries for encoding JWTs, mostly because they are generated almost exclusively on servers. However, I found this tutorial, which, at a cursory glance, seems sufficient for writing our own encoding algorithm.
I'd like to point out here that opening an app to allow the public free access to your Google resources may prove detrimental to you or your organization in the future, as I'm sure you've considered. This is a major security risk, which is why all the tutorials you've seen so far have implemented two consecutive uploads.
If it were me, I would at least do the first part of the authentication process on my server: when the user is ready to upload, I would send a request to my server to generate the access token for Google services using my service account's credentials, and then I would send each user a new access token that my server generated. This way, I have an added layer of security between the outside world and my Google account, as the burden of the authentication lies with my server, and only the uploading gets done by the client.
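A rough sketch of that server-side step, assuming an Express server and the google-auth-library package (which takes care of the JWT signing described above); the key file path, scope, and route are assumptions:

// Express endpoint that mints a short-lived access token from a service account
const express = require("express");
const { GoogleAuth } = require("google-auth-library");

const app = express();
const auth = new GoogleAuth({
  keyFile: "service-account.json",                                   // path to your downloaded key (assumption)
  scopes: ["https://www.googleapis.com/auth/devstorage.read_write"]  // Cloud Storage read/write scope
});

app.get("/token", async (req, res) => {
  // TODO: verify the user's session here before handing out a token
  const client = await auth.getClient();
  const { token } = await client.getAccessToken();
  res.json({ token });
});

app.listen(3000);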
Anyways, once we have the access token, we can utilize the CORS feature that Google provides to upload files to our bucket. This feature allows us to use standard XHR 2 requests to use Google's services, and is essentially designed to be used in place of the JavaScript client library. I would prefer to use the CORS feature over the client library only because I think it's a little more straightforward and slightly more flexible in its implementation. (I haven't tested this, but I think fetch would work here just as well as XHR 2.)
From here, we'd need to get the file from the user, as well as any information we want from them regarding the file (read: file name), and then make a POST request to https://www.googleapis.com/upload/storage/v1/b/<BUCKET_NAME_HERE>/o (replacing with the name of your bucket, of course) with the access token added to the URL as per the Making authenticated requests section of the CORS feature page and whatever other parameters in the body/query string that you wish to include, as per the Cloud Storage API documentation on inserting an object. An API listing for the Cloud Storage service can be found here for reference.
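To make that concrete, here is an untested sketch of the browser-side upload using fetch against the simple (uploadType=media) endpoint; the bucket name, accessToken variable, and file input are assumptions, and the token could equally be passed as an access_token query parameter as described on the CORS page:

// Simple media upload: the object name goes in the query string, the file bytes in the body
const file = document.querySelector("input[type=file]").files[0];
const bucket = "YOUR_BUCKET_NAME"; // placeholder

fetch(
  `https://www.googleapis.com/upload/storage/v1/b/${bucket}/o` +
    `?uploadType=media&name=${encodeURIComponent(file.name)}`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${accessToken}`,              // token fetched from your server
      "Content-Type": file.type || "application/octet-stream"
    },
    body: file
  }
)
  .then(res => res.json())
  .then(obj => console.log("created object", obj.name))
  .catch(err => console.error(err));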
As I've never done this before, and I don't have the ability to test this out, I don't have any sample code to include with my answer, but I hope that my post is clear enough that putting together the code should be relatively straightforward from here.
Just to set the record straight, I've always found OAuth to be pretty confusing, and have generally shied away from playing with it due to my fear of its unknowns. However, I think I've finally mastered it, especially after this post, so I can't wait to get a free hour to play around with it.
Please let me know if anything I said is not clear or coherent.