Object storage basics - Simple Storage Service (S3) can store objects up to 5 TB in size; it was one of the earliest services available on the AWS public cloud;
- Highly durable & consistent, data is replicated into multiple devices across multiple facilities
- Object attributes
- key = name of the object
- value = object payload
- version id = identifies a specific version of the object when versioning is enabled
- bucket name + key + version id uniquely identify an S3 object
Buckets - referred to using ARN (Amazon Resource Name)
- 100 buckets by default
- buckets can be configured with sub-resources (within S3)
- namespaces are universal; path style URL v/s virtual OR bucket based URL [path style URL will be deprecated]
- virtual or bucket based URL can optionally contain region name in the namespace URL
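The two URL styles above can be sketched as simple string templates (bucket / key names here are hypothetical examples):

```python
def path_style_url(bucket: str, key: str, region: str = "us-east-1") -> str:
    # Path-style: the bucket name appears in the URL path; slated for deprecation.
    return f"https://s3.{region}.amazonaws.com/{bucket}/{key}"

def virtual_hosted_url(bucket: str, key: str, region: str = "") -> str:
    # Virtual-hosted style: the bucket name is part of the hostname;
    # the region may optionally appear in the endpoint.
    host = f"{bucket}.s3.{region}.amazonaws.com" if region else f"{bucket}.s3.amazonaws.com"
    return f"https://{host}/{key}"

print(path_style_url("my-bucket", "logs/app.log"))
# https://s3.us-east-1.amazonaws.com/my-bucket/logs/app.log
print(virtual_hosted_url("my-bucket", "logs/app.log", "eu-west-1"))
# https://my-bucket.s3.eu-west-1.amazonaws.com/logs/app.log
```

Since path-style URLs are being deprecated, new code should prefer the virtual-hosted form.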
Pricing - using Intelligent-Tiering can help reduce cost; its frequent-access tier costs about the same as S3 Standard, and it automatically moves objects to Glacier / Deep Archive tiers when access patterns allow, thereby saving cost;
------------------------------------------------------------------------------------------------------------------------
S3 security - is applicable at object level, bucket level and user level (IAM);
- S3 offers versioning and ability to delete / restore a given object version;
- with versioning, object lifecycle can be managed on S3 - to store / delete objects;
- optionally MFA can be enabled for additional security;
- S3 also supports cross-region replication
- S3 Encryption at Rest
- SSE: AES-256 with keys generated by AWS (SSE-S3) OR a customer-provided key (SSE-C)
- SSE-KMS: keys generated & managed through the AWS KMS key management service
- encrypt data from client side before uploading to S3
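The server-side options map to request headers that S3 recognizes on upload; a minimal sketch of the header selection only (not a full upload):

```python
def sse_headers(mode: str, kms_key_id: str = "") -> dict:
    # Map an encryption choice to the request header(s) S3 expects.
    if mode == "SSE-S3":
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:  # hypothetical KMS key id supplied by the caller
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        # SSE-C additionally requires the customer key & its MD5 on every request;
        # only the algorithm header is sketched here.
        return {"x-amz-server-side-encryption-customer-algorithm": "AES256"}
    raise ValueError(f"unknown mode: {mode}")

print(sse_headers("SSE-S3"))  # {'x-amz-server-side-encryption': 'AES256'}
```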
Bucket policies - by default, access to all objects is private: only the resource owner, the account that created the resource, has access
- policies are categorized as resource policies & user policies
- resource policies are applied to S3 resources & user policies are applied to IAM users in your account
Resource based access policies - 2 types
- access control lists (ACLs) can grant access to AWS accounts & pre-defined groups
- control bucket-level or object-level access grants via XML schema configuration
- can choose resource-level access within a list of grants
- classified as legacy
- bucket policies grant permissions on buckets & objects to AWS accounts AND IAM USERS
- classified as NEW; configured via JSON
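A minimal bucket policy in the JSON form described above (bucket name & account ID are hypothetical examples), granting another AWS account read access:

```python
import json

# Hypothetical bucket & account; the Principal grants account 111122223333 read access.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowCrossAccountRead",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::my-example-bucket/*",
    }],
}
print(json.dumps(bucket_policy, indent=2))
```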
Choice of resource based policies v/s ROLES
- when you assume a role, you give up your original permissions and take on the permissions assigned to the new role
- with resource based policies, the user retains their current access to existing objects while also gaining the access granted by the policy
- resource based policies are supported with S3, SQS & SNS
User policies - applied to users, groups OR roles using AWS IAM
- applied via AWS IAM service, and not using S3 console
- expressed using JSON
- directly applied to users, hence no anonymous access
- with user policies, following elements come into play
- Principal: account to allow/deny access to actions & resources
- Effect: whether to Allow or Deny when the user requests an action
- Action: list of permissions to allow/deny
- Resource: bucket or object to which the policies apply to, specified in ARN (Amazon resource name)
- SID: not required for S3; a general description of the policy statement
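The elements above can be seen in a minimal user policy sketch (bucket name hypothetical); note there is no Principal element, since the policy is attached directly to an IAM user / group / role:

```python
import json

# Hypothetical IAM user policy: list the bucket & read its objects.
user_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowListAndRead",          # optional for S3, descriptive only
        "Effect": "Allow",
        "Action": ["s3:ListBucket", "s3:GetObject"],
        "Resource": [
            "arn:aws:s3:::my-example-bucket",    # bucket-level action (ListBucket)
            "arn:aws:s3:::my-example-bucket/*",  # object-level action (GetObject)
        ],
    }],
}
print(json.dumps(user_policy, indent=2))
```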
S3 Object locks - are used to store objects using a WORM model
WORM = write once, read many model; prevents objects from being deleted or modified for a fixed amount of time OR indefinitely;
Object lock modes
- Governance mode: users can't overwrite, delete or alter an object version without special permissions
- Compliance mode: a protected object version cannot be overwritten OR deleted by any user, including the root user of the account; the object retention can't be modified / shortened
- related terms --> retention period & legal hold;
------------------------------------------------------------------------------------------------------------------------
S3 integration with CloudWatch - CloudWatch logs event details for S3 objects, EC2 instances, VPC flow logs, etc. --> CloudWatch event monitoring can be enabled on many AWS services;
CloudWatch captures a record of each event moving in & out of S3 (and similar services); CloudWatch Events can be used to monitor actions on S3 objects and trigger appropriate corrective actions;
With CloudTrail, API calls made against S3 can be monitored; this is useful to analyze when and by whom an API call was made - to troubleshoot root causes, and to detect & protect applications on AWS; CloudTrail logs API calls in JSON format, whereas CloudWatch uses a flat format; think of CloudTrail as similar to APIC or APIGEE logs
CloudTrail operates at the API level, capturing the API operations related to the bucket OR object; S3 server access logging operates at the level of object operations - for example, requests to access an object;
------------------------------------------------------------------------------------------------------------------------
Encryption for data-at-rest
server side encryption - S3 encrypts the data before writing it to disk and decrypts data before reading from the disk, options:
- SSE-S3 --> server side encryption with S3 managed keys, AES-256 encryption, master key rotated automatically
- SSE-KMS --> server side encryption with KMS-managed keys (AES-256), managed by AWS KMS; creation & control of master keys + data keys stays in our control; AWS operators do not have access to the keys needed to decrypt the data; controlling access to master keys is the job of the account owner / administrator;
- SSE-C --> server side encryption with customer managed keys
client side encryption - data is encrypted before being uploaded into S3
- in the case of client side encryption, client encrypts the data locally
- another option is for the customer to use AWS KMS or services to manage the keys
- data uploaded to S3 is already encrypted
Object versioning - multiple versions of an object increase the storage cost: each retained version is billed as a full object, so cost is a multiple of the number of versions; versioning & snapshots should therefore be used with careful consideration of cost;
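The cost multiplication can be sketched as simple arithmetic:

```python
def versioned_storage_gb(object_size_gb: float, num_versions: int) -> float:
    # Every retained version is billed as a full object of the same size.
    return object_size_gb * num_versions

# e.g. a 2 GB object overwritten 5 times with versioning on keeps 5 full copies
print(versioned_storage_gb(2, 5))  # 10
```

Lifecycle rules that expire noncurrent versions are the usual way to cap this growth.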
when versioning is enabled and encryption is then turned on --> only new versions of objects are created encrypted; older existing versions all remain unencrypted;
------------------------------------------------------------------------------------------------------------------------
Object replication - during replication, objects are encrypted in transit over SSL; cross account replication is possible; storage class & owner can be changed post copy;
same region replication - use cases - log aggregation, data sovereignty, replication between AWS accounts
cross region replication - use cases - compliance, latency, disaster recovery;
In order for replication to work- versioning should be enabled on source & destination buckets
- must have permissions to replicate
- AWS account should have read access to source bucket
- if object lock is enabled on the source, target should also have object lock enabled
- can only replicate to a single target bucket
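Assuming the prerequisites above are met, a replication rule is configured roughly in this shape (a sketch of the structure boto3's `put_bucket_replication` accepts; the role ARN & bucket names are hypothetical):

```python
# Minimal replication configuration sketch; all ARNs/names are hypothetical.
replication_config = {
    "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
    "Rules": [{
        "ID": "replicate-everything",
        "Status": "Enabled",
        "Priority": 1,
        "Filter": {},  # empty filter = apply the rule to all objects
        "Destination": {"Bucket": "arn:aws:s3:::my-destination-bucket"},
        "DeleteMarkerReplication": {"Status": "Disabled"},
    }],
}
print(replication_config["Rules"][0]["ID"])  # replicate-everything
```

The Destination block is also where a different storage class or owner override would be specified.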
What is replicated
- objects along with object metadata & tags are replicated
- unencrypted objects are replicated
- lock retention info is replicated
What is not replicated
- existing objects in the buckets before setting up replication are not replicated
- bucket sub-level resources such as lifecycle mgmt, etc. are not replicated
- objects encrypted using "CUSTOMER PROVIDED KEYS" (SSE-C) are not replicated
- objects encrypted using KMS are not replicated by default
- objects that are replicated by another rule are not re-replicated
- objects in s3 GLACIER OR DEEP ARCHIVE are not replicated
S3 RTC (replication time control)
- configure replication time control to replicate objects within a specified time frame
- has an SLA of 99.99% to complete replication within minutes
- Cloudwatch events can be setup to monitor SLA breach for replication
- without S3 RTC, replication is usually asynchronous
- takes several hours to replicate
S3 Glacier & Deep Archive
- a 30-day minimum applies before objects can AUTO-transition from S3 Standard into S3-IA / S3-1Zone-IA; lifecycle rules can also transition objects into S3-Intelligent-Tiering, S3-Glacier & S3-Deep Archive storage classes;
- objects < 128 KB cannot directly AUTO-transition to S3-IA / S3-1Zone-IA / S3-Intelligent-Tiering;
- for every record stored in glacier / deep archive, 40KB of extra storage is added for object name, index details & metadata; This increases the storage cost for a large number of smaller files;
- for transition into S3-IA, the object should be at least 128 KB; otherwise the transition into IA won't occur;
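The 40 KB per-object overhead can dominate for small files, as a quick calculation shows:

```python
def glacier_overhead_ratio(object_size_kb: float, overhead_kb: float = 40) -> float:
    # Fraction of billed Glacier storage that is pure index/metadata overhead,
    # given the ~40 KB added per archived object (per the notes above).
    return overhead_kb / (object_size_kb + overhead_kb)

# For 10 KB objects, 80% of the billed storage is overhead:
print(round(glacier_overhead_ratio(10), 2))  # 0.8
# For 10 MB objects, the overhead is negligible:
print(round(glacier_overhead_ratio(10 * 1024), 4))  # 0.0039
```

This is why aggregating many small files into larger archives before moving them to Glacier / Deep Archive is a common cost optimization.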
------------------------------------------------------------------------------------------------------------------------
S3 performance - S3 uses read-after-write consistency model, so object written will immediately be available for read access. NO DELAYS;
- by default, S3 serves up to 3,500 requests / second for PUT / POST / DELETE / COPY calls, per prefix
- by default, S3 serves up to 5,500 requests / second for GET / HEAD calls, per prefix
- in order to scale horizontally for higher performance, increase the number of prefixes
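The per-prefix limits scale linearly with the number of prefixes; a quick sketch:

```python
def max_requests_per_second(prefixes: int, writes: bool) -> int:
    # Per-prefix baselines from the notes above:
    # 3,500/s for PUT/POST/DELETE/COPY, 5,500/s for GET/HEAD.
    per_prefix = 3500 if writes else 5500
    return prefixes * per_prefix

# Spreading reads across 4 prefixes raises the ceiling to 22,000 GETs/s:
print(max_requests_per_second(4, writes=False))  # 22000
```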
Minimize latency for performance - regional considerations - S3 bucket placed near to end users / in that region;
For a newly uploaded object, S3 responds to a 'GET' with the object right away (read-after-write); updates & deletes are however eventually consistent - they complete locally first and are then replicated to other locations, converging to an eventually consistent state;
KMS request rates - server side encryption / decryption has a limit; quota is region specific; quota increase for KMS is not allowed;
Transfer acceleration --> used to reduce data-transfer latency when the S3 bucket is far from the end user's region; other options to bring content closer to users / cache frequently accessed S3 content:
- use CDN [content distribution network] such as CloudFront
- using cache solutions such as elastic cache
- use AWS Elemental MediaStore --> to cache video content
- use Geo-proximity routing policies to route requests to the users' closest location
- transfer acceleration is used to reduce latency & improve rendering speed
- cross-region replication does not use transfer acceleration
Multi-part upload [PUT] / download [GET] - introduces parallelism;
multi-part uploads are recommended for objects > 100 MB; a large object is split into multiple chunks that are uploaded / downloaded in parallel, improving performance; even without introducing multiple prefixes OR scaling the number of EC2 instances, this parallelism gives similar results;
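The chunking arithmetic can be sketched as follows (S3's documented multipart limits: parts of 5 MiB to 5 GiB, at most 10,000 parts, with only the last part allowed to be smaller):

```python
import math

def plan_multipart(total_bytes: int, part_size: int = 100 * 1024 * 1024):
    # Return (part_number, part_bytes) pairs for a multipart upload plan.
    assert 5 * 1024 * 1024 <= part_size <= 5 * 1024 ** 3, "part size out of S3 range"
    parts = math.ceil(total_bytes / part_size)
    assert parts <= 10_000, "increase part_size for very large objects"
    return [(i + 1, min(part_size, total_bytes - i * part_size))
            for i in range(parts)]

# A 250 MiB object with 100 MiB parts -> 3 parts of 100, 100 and 50 MiB:
print(plan_multipart(250 * 1024 * 1024))
```

Each part can then be transferred by an independent worker, which is where the parallelism comes from.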
CloudFront CDN - data in the CDN expires by default after 24 hrs (TTL / time to live); CloudFront can front-end OR point to:
- an EC2 instance
- an S3 bucket
- Route 53 endpoint
- Elastic load balancer (ELB)
- OR A COMPLETELY DIFFERENT EXTERNAL SYSTEM
Geo restriction is supported with CloudFront; the CDN exists at an Edge location, and a "Distribution" is a collection of Edge locations; CloudFront CDN supports caching both static content & dynamic content (dynamic content meaning pages generated by Java / Groovy / JSP / etc.); there are 2 types of distribution:
- web-distribution: for websites
- RTMP: for media streaming
Transfer Acceleration - enables fast, easy & secure transfer of files over long distances between the client & the S3 bucket; leverages CloudFront's globally distributed edge locations; as data arrives at an edge location, it is routed to the S3 bucket over an optimized network path; used to optimize PUTs / GETs / LISTs; Inventory buckets - created in the same region as the S3 bucket;
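The accelerate endpoint simply swaps the regular hostname for a dedicated one; a sketch (bucket name hypothetical, and acceleration must already be enabled on the bucket, which also requires a DNS-compliant bucket name):

```python
def accelerate_url(bucket: str, key: str) -> str:
    # Transfer Acceleration uses the dedicated s3-accelerate endpoint;
    # requests to it are routed to the nearest CloudFront edge location.
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{key}"

print(accelerate_url("my-bucket", "video.mp4"))
# https://my-bucket.s3-accelerate.amazonaws.com/video.mp4
```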
Simple notification service (SNS) - supports pub/sub - distribution pattern; notification is published with a "topic" on the channel; messages are further received by the subscriber to the "topic";
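The pub/sub distribution pattern itself can be illustrated with a toy in-memory topic bus (this is NOT the SNS API, just the pattern it implements):

```python
from collections import defaultdict

class MiniTopicBus:
    """Toy sketch of topic-based pub/sub: publishers and subscribers
    are decoupled; every subscriber to a topic receives each message."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # Register a callback to receive messages published on `topic`.
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Fan the message out to every subscriber of the topic.
        for callback in self.subscribers[topic]:
            callback(message)

bus = MiniTopicBus()
received = []
bus.subscribe("s3-events", received.append)
bus.publish("s3-events", "ObjectCreated:Put")
print(received)  # ['ObjectCreated:Put']
```

In AWS the same roles are played by an SNS topic, its subscriptions (SQS queues, Lambda functions, email, etc.), and the publishing service such as S3.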
Elastic Map Reduce - managed Hadoop framework for processing huge amounts of data; supports Apache Spark, HBase, Presto & Flink; contains a master node, core nodes & task nodes - the core nodes host HDFS;
------------------------------------------------------------------------------------------------------------------------
S3 analytics - S3 supports analytics via data lake [Athena, Redshift spectrum, Quicksight]; IoT streaming with Kinesis Firehose writing to S3 buckets - is another option; ML & AI storage, Rekognition, Lex, MXNet, storage class analysis, S3 management analytics;
More Nifty S3 tricks
- provides data transfer acceleration using CloudFront in reverse;
- "requester pays" billing - the requester, rather than the bucket owner, pays for access;
- "TAGS" are always useful for costing, billing, security, ALSO FOR document classification / ARCHIVED objects;
- static web hosting, static content, media / content simple & massively scalable
- "BitTorrent" support
------------------------------------------------------------------------------------------------------------------------