I am new to GAE, so I am struggling to understand a few things.
I am trying to build a Python web app that processes videos uploaded by users (through the web app) and displays some visualizations (built using d3.js) once the processing is done. The artifacts created during processing are saved locally and later uploaded to user-specific GCS buckets (they are not publicly accessible).
I want to be able to display the visualization (using the processed video artifacts) when a user requests it. As per my understanding, since these are dynamically generated, I cannot store the artifacts in the static folder for JavaScript to access. So, it seems that I have to save the processed video artifacts in a /tmp folder.
How do I ensure that JavaScript is able to fetch files from this external /tmp folder?
Or is there a better way to do this using GCS itself? How do I access buckets from JavaScript without making them public?
Please suggest some resources or ideas to solve this. Thanks!
I think you've got it backwards.
You have a private bucket, and that's great for security. In order for the client JavaScript (browser, mobile app) to download an object, you need to either:
Have an HTTP handler in your Python GAE app that retrieves the file from GCS and sends it to the client (Flask pseudo code):
from flask import Flask, abort
from google.cloud import storage

app = Flask(__name__)

@app.route('/private/<name>')
def hello_name(name):
    # if the user is not authorized, abort(401) here
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)  # bucket_name: your private bucket
    blob = bucket.blob(name)
    bts = blob.download_as_bytes()
    return bts
Give the client a Signed URL from GCS so they can download the file directly.
# same Flask and google.cloud.storage setup as above, plus:
import datetime

@app.route('/private/<name>')
def hello_name(name):
    # if the user is not authorized, abort(401) here
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(name)
    url = blob.generate_signed_url(
        version="v4",
        # This URL is valid for 15 minutes
        expiration=datetime.timedelta(minutes=15),
        # Allow GET requests using this URL.
        method="GET",
    )
    return url
As a note, in the second case the client-side JavaScript will need to first access /private/your_file_name to retrieve the signed URL, and then it will need to download the actual file from GCS using that signed URL.
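A minimal client-side sketch of that two-step flow (the route name matches the Flask example above; error handling beyond a thrown exception is up to you):

// 1) Ask the backend for a signed URL, 2) download the object straight from GCS with it.
async function downloadPrivateFile(name) {
  const res = await fetch(`/private/${encodeURIComponent(name)}`);
  if (!res.ok) throw new Error(`Could not get signed URL: ${res.status}`);
  const signedUrl = await res.text();

  const fileRes = await fetch(signedUrl); // plain GET; the signature in the URL authorizes it
  if (!fileRes.ok) throw new Error(`Download failed: ${fileRes.status}`);
  return fileRes.blob(); // e.g. hand this to d3 or a <video> element
}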
I store profile pictures in an S3 bucket and store the AWS URL in my database. When I need a profile picture, I put the URL from the database into an image tag. For this, I set the S3 bucket policy to public read access. Is this a good idea, or is there another way to do this?
One way to avoid making the bucket publicly accessible is to:
put all your image files under one 'folder'
create a CloudFront distribution that serves only from that folder
only allow read access to the Origin Access Identity that the CloudFront wizard generates (a sample bucket-policy sketch follows below)
On the back end you should be able to infer the final location of the assets, as you should know the CF endpoint at this point.
Note: you should set the CF endpoint as an env var for your backend and not hardcode it.
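To make the third step concrete, here is a hedged sketch of granting read access only to the distribution's Origin Access Identity; the bucket name, folder, and OAI ID are placeholders, and the CloudFront wizard can also add this policy for you:

import json
import boto3

# Placeholders: replace the bucket, folder, and OAI ID with your own values.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowCloudFrontOAIReadOnly",
        "Effect": "Allow",
        "Principal": {
            "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity EXAMPLE_OAI_ID"
        },
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::your-bucket/images/*",
    }],
}
boto3.client("s3").put_bucket_policy(Bucket="your-bucket", Policy=json.dumps(policy))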
Generating a Dropbox token gives write access to the project folder, which is why I need a read-only token to use on the client side.
I haven't been able to do anything because I haven't found any option on the developer dashboard to see the individual tokens and their permissions.
var fetch = require('isomorphic-fetch');
var Dropbox = require('dropbox').Dropbox;

const dbx = new Dropbox({
  accessToken: 'yourAccessTokenHere',
  fetch: fetch,
});
Dropbox has released "scopes" functionality on the Dropbox API, which you can use to configure an app or access token to allow only a limited set of functionality, such as the ability to read but not write files.
You can find more information about the release in our blog post here:
https://dropbox.tech/developers/now-available--scoped-apps-and-enhanced-permissions
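As an illustrative sketch only (the scope name and paths below are assumptions; scopes themselves are configured in the Dropbox App Console): with a token from an app limited to read scopes such as files.content.read, read calls succeed while write calls are rejected by the API.

var fetch = require('isomorphic-fetch');
var Dropbox = require('dropbox').Dropbox;

// Token generated for an app that only has read scopes (e.g. files.content.read).
const dbx = new Dropbox({ accessToken: 'yourReadOnlyTokenHere', fetch: fetch });

// Reading works with a read-only scoped token...
dbx.filesDownload({ path: '/project-folder/report.txt' })
  .then((res) => console.log('download ok', res))
  .catch((err) => console.error('download failed', err));

// ...while a write attempt should be rejected with a missing-scope error.
dbx.filesUpload({ path: '/project-folder/new.txt', contents: 'hello' })
  .catch((err) => console.error('expected failure (no write scope):', err));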
We have a bucket on AWS S3 that we use with the new AWS SDK API. We uploaded lots of files and folders and tagged them.
How can we filter on a tag key-value pair, or on just the key? I'd like to find all the objects with key = "temp", or with key = "temp" and value = "lol".
Thanks!
I also hoped that AWS would eventually support "search files by tags", because that would open up possibilities like e.g. having a photo storage with the names, descriptions, and locations stored in tags, so I wouldn't need a separate database.
But apparently AWS explicitly does not support this, and probably never will. Quoting from their storage service white paper:
Amazon S3 doesn’t suit all storage situations. [...] some storage needs for which you should consider other AWS storage options [...]
Amazon S3 doesn’t offer query capabilities to retrieve specific objects. When you use Amazon S3 you need to know the exact bucket name and key for the files you want to retrieve from the service. Amazon S3 can’t be used as a database or search engine by itself.
Instead, you can pair Amazon S3 with Amazon DynamoDB, Amazon CloudSearch, or Amazon Relational Database Service (Amazon RDS) to index and query metadata about Amazon S3 buckets and objects.
AWS suggests using DynamoDB, RDS or CloudSearch instead.
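For instance, a minimal sketch of the DynamoDB pairing, assuming a hypothetical table named s3-tag-index with tag_key as partition key and object_key as sort key, which you populate whenever you tag an object:

import boto3
from boto3.dynamodb.conditions import Key, Attr

# Hypothetical index table: partition key "tag_key", sort key "object_key".
table = boto3.resource("dynamodb").Table("s3-tag-index")

def index_object_tag(bucket, key, tag_key, tag_value):
    # Record the tag in DynamoDB at upload/tagging time so it can be queried later.
    table.put_item(Item={
        "tag_key": tag_key,
        "tag_value": tag_value,
        "object_key": f"{bucket}/{key}",
    })

def find_objects_by_tag(tag_key, tag_value=None):
    # Query by tag key; optionally narrow down to a specific value (single page only).
    kwargs = {"KeyConditionExpression": Key("tag_key").eq(tag_key)}
    if tag_value is not None:
        kwargs["FilterExpression"] = Attr("tag_value").eq(tag_value)
    return table.query(**kwargs)["Items"]

With that in place, find_objects_by_tag("temp", "lol") covers the lookup asked about in the question.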
There seems to be one way to achieve what you're looking for, although it's not ideal or particularly user-friendly.
The AWS S3 tagging documentation says that you can grant accounts permissions for objects with a given tag. If you created a new account with the right permissions then you could probably get the filtered list.
Not particularly useful on an ongoing basis, though.
AFAIK, Resource Groups don't support tags at the S3 object level, only at the bucket level.
Source: https://aws.amazon.com/blogs/aws/new-aws-resource-tagging-api/ (scroll down the page to the table).
This is now possible using AWS Resource Tagging API and S3 Select (SQL). See this post: https://aws.amazon.com/blogs/architecture/how-to-efficiently-extract-and-query-tagged-resources-using-the-aws-resource-tagging-api-and-s3-select-sql/.
However, the Resource Tagging API supports only tags on buckets for the S3 service, not on objects: New – AWS Resource Tagging API
There's no way to filter/search by tags. But you can implement this yourself using S3.
You can create a special prefix in a bucket, e.g. /tags/. Then, for each actual object that you add and want to tag (e.g. Department=67), you add a new object under /tags/, e.g. /tags/XXXXXXXXX_YYYYYYYYY_ZZZZZZZZZ, where
XXXXXXXXX = hash('Department')
YYYYYYYYY = hash('67')
ZZZZZZZZZ = actualObjectKey
Then when you want to get all objects that have a particular tag assigned (e.g. Department), you have to execute the ListObjectsV2 S3 API for prefix /tags/XXXXXXXXX_. If you want objects that have particular tag value (e.g. Department=67), you have to execute the ListObjectsV2 S3 API for prefix /tags/XXXXXXXXX_YYYYYYYYY_
https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html
It's not that fast but still does the job.
The obvious downside is that you have to maintain and remove these tag marker objects yourself. For example, you can do all of the above with S3 triggers and a Lambda function. A sketch of the scheme follows.
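A rough sketch of that scheme with boto3 (the tags/ layout and the hash choice are this answer's own convention, not an S3 feature; the bucket name is a placeholder):

import hashlib
import boto3

s3 = boto3.client("s3")
BUCKET = "your-bucket"

def _h(value):
    # Any stable hash works; a short hex digest keeps the marker keys readable.
    return hashlib.sha1(value.encode("utf-8")).hexdigest()[:16]

def tag_object(object_key, tag_key, tag_value):
    # Create an empty marker object under tags/ for the (tag key, tag value, object) triple.
    marker = f"tags/{_h(tag_key)}_{_h(tag_value)}_{object_key}"
    s3.put_object(Bucket=BUCKET, Key=marker, Body=b"")

def find_by_tag(tag_key, tag_value=None):
    # List marker objects by prefix, then strip the prefix to recover the real object keys.
    prefix = f"tags/{_h(tag_key)}_"
    if tag_value is not None:
        prefix += f"{_h(tag_value)}_"
    keys = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            keys.append(obj["Key"].split("_", 2)[2])
    return keys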
You should be able to query the tags and values that you added using the Resource Groups resource query console:
https://${region}.console.aws.amazon.com/resource-groups/resources
There are many ways to get a filtered list of S3 buckets by tag. This is what I used in my code:

import boto3
from botocore.exceptions import ClientError

# Note: these helpers filter buckets by their bucket-level tags, not individual objects.
def get_tag_value(tags, key):
    for tag in tags:
        if tag["Key"] == key:
            return tag["Value"]
    return ""

def filter_s3_by_tag_value(tag_key, tag_value):
    s3 = boto3.client('s3')
    response = s3.list_buckets()
    s3_list = []
    for bucket in response["Buckets"]:
        try:
            response_tags = s3.get_bucket_tagging(Bucket=bucket["Name"])
            if get_tag_value(response_tags["TagSet"], tag_key) == tag_value:
                s3_list.append(bucket["Name"])
        except ClientError as e:
            # buckets without any tags raise an error; log and skip them
            print(e.response["Error"]["Code"])
    return s3_list

def filter_s3_by_tag_key(tag_key):
    s3 = boto3.client('s3')
    response = s3.list_buckets()
    s3_list = []
    for bucket in response["Buckets"]:
        try:
            response_tags = s3.get_bucket_tagging(Bucket=bucket["Name"])
            if get_tag_value(response_tags["TagSet"], tag_key) != "":
                s3_list.append(bucket["Name"])
        except ClientError as e:
            print(e.response["Error"]["Code"])
    return s3_list

print(filter_s3_by_tag_value("temp", "lol"))
print(filter_s3_by_tag_key("temp"))
AWS now supports tagging of S3 objects.
They have APIs to add/remove tags.
Amazon S3 Select and Amazon Athena can be used to search S3 resources by tags.
Currently the max number of tags per resource is 10 (thanks Kyle Bridenstine for pointing out the correct number).
I would like to upload directly to a Google Cloud Storage bucket from my client-side JavaScript code, although the bucket should not be publicly writeable. The server side code is Node.js. In the past I have been able to upload directly to Amazon S3, by generating temporary authorization for the client on the server. What is the procedure for uploading a file to a write-protected Google Cloud Storage bucket, i.e. which requires authorization?
I've found that I can use getSignedUrl, which is supposed to allow certain temporary access (e.g. for writing) to a file:
// assumes gcs is an authenticated Storage client and moment.js has been required
const bucket = gcs.bucket('aBucket')
bucket.file('aFile').getSignedUrl({
  action: 'write',
  expires: moment.utc().add(1, 'days').format(),
}, (error, signedUrl) => {
  if (error == null) {
    console.log(`Signed URL is ${signedUrl}`)
  }
})
In order to upload a file, issue a PUT request to the obtained signed URL.
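For example, a minimal browser-side sketch of that PUT, assuming the signed URL from the snippet above has been handed to the client (if the URL was signed with a contentType option, the Content-Type header must match it):

async function uploadWithSignedUrl(signedUrl, file) {
  // `file` is e.g. a File picked from an <input type="file"> element.
  const res = await fetch(signedUrl, {
    method: 'PUT',
    headers: { 'Content-Type': file.type || 'application/octet-stream' },
    body: file,
  });
  if (!res.ok) throw new Error(`Upload failed: ${res.status}`);
  return res;
}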