Techno-Scribble: AWS Relational Database Services (RDS)

RDS-VPC association: Traditionally, hosting an AWS RDS instance in a VPC was optional; instances launched outside a VPC ran on the EC2-Classic platform --> known as 'DB Classic' instances;

New database instances are associated with a VPC by default; the association to a VPC / subnet [private or public] can be modified later from the AWS console; AWS also provides a way to move traditional instances over from Classic into a VPC-based setup;

Cost optimization factors for RDS

region, instance type, RDS engine & license
data transfer costs - typically outbound data transfers from database; inbound data transfers are free
storage costs - (General Purpose SSD / Provisioned IOPS SSD / magnetic)
backup storage costs - backup storage up to the size of the provisioned database storage is usually included at no extra charge; usage beyond that is billed
replication costs [in terms of IOPS chosen]
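
The factors above can be combined into a rough monthly estimate. The sketch below is illustrative only: the `RdsRates` class, the `estimate_monthly_cost` function and every rate in it are hypothetical placeholders, not AWS prices. It assumes inbound transfer is free, as noted above, and takes the billable backup volume as an input.

```python
from dataclasses import dataclass


@dataclass
class RdsRates:
    """Hypothetical unit prices; real prices vary by region, engine and license."""
    instance_per_hour: float
    storage_per_gb_month: float
    backup_per_gb_month: float
    transfer_out_per_gb: float


def estimate_monthly_cost(rates, hours, storage_gb, billable_backup_gb, transfer_out_gb):
    """Add up compute, storage, backup and outbound transfer for one month."""
    compute = rates.instance_per_hour * hours
    storage = rates.storage_per_gb_month * storage_gb
    backup = rates.backup_per_gb_month * billable_backup_gb
    transfer = rates.transfer_out_per_gb * transfer_out_gb  # inbound is not charged
    return round(compute + storage + backup + transfer, 2)


# Placeholder rates, chosen only to make the arithmetic easy to follow.
rates = RdsRates(instance_per_hour=0.10, storage_per_gb_month=0.10,
                 backup_per_gb_month=0.05, transfer_out_per_gb=0.09)
monthly = estimate_monthly_cost(rates, hours=730, storage_gb=100,
                                billable_backup_gb=20, transfer_out_gb=50)
print(monthly)
```

Swapping in real rates for the chosen region and instance type turns this into a quick what-if tool for comparing configurations.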

Database scaling related

horizontal scaling & read replication are selectively supported by database engines
horizontal scaling (adding read replicas) involves no downtime for the source instance, within a region OR availability zone
vertical scaling has minimal downtime with multi-AZ, since the change is applied to the secondary instance first, which then takes over as primary through a fail-over; a single-AZ instance will be unavailable during the scaling operation;
licensing cost directly relates to CPU sockets / cores
storage size remains same irrespective of scaling

RDS Multi AZ

Used for high availability; availability zones are connected via low latency link;
provides an enterprise grade, fault-tolerant solution; leverages another AZ for secondary host;
multi-AZ assures synchronous writes between primary & secondary host; this means every write is replicated synchronously to the secondary host before it is acknowledged to the client;
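
The synchronous write path can be pictured with a toy model. This is a minimal sketch, not how RDS is implemented: `Host` and `MultiAzPair` are hypothetical names, and "applying" a record is just appending it to an in-memory list.

```python
class Host:
    """In-memory stand-in for a database host."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def apply(self, record):
        self.log.append(record)
        return True


class MultiAzPair:
    """Primary plus secondary host in another AZ."""

    def __init__(self):
        self.primary = Host("primary")
        self.secondary = Host("secondary")

    def write(self, record):
        """Acknowledge the client only after both hosts have applied the record."""
        self.primary.apply(record)
        if not self.secondary.apply(record):
            raise RuntimeError("secondary did not confirm the write")
        return "ack"


pair = MultiAzPair()
ack = pair.write("INSERT 1")
print(ack, pair.primary.log, pair.secondary.log)
```

Because the acknowledgement waits for the secondary, a fail-over never loses an acknowledged write; the trade-off is a little extra write latency.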

Cross AZ connectivity

DNS <--> app instances <--> database (primary / secondary, if multi-AZ enabled)
DNS resolution for database happens via CNAME resolution for the database primary host
On a fail-over, DNS entry is updated to the secondary host when the primary host becomes unavailable
Once AWS RDS resolves issues with primary host and it becomes available, RDS triggers data synchronization from secondary to the primary host
Requests are routed to & remain with the secondary host until the synchronization is complete
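
The fail-over flow above boils down to one stable name whose target changes. The sketch below is a toy model under that assumption: `DbEndpoint` is a hypothetical class, the host names are made up, and no real DNS lookup takes place.

```python
class DbEndpoint:
    """Toy stand-in for the CNAME that application instances resolve."""

    def __init__(self, cname, primary, secondary):
        self.cname = cname
        self.active = primary
        self.passive = secondary

    def resolve(self):
        """Return the host currently behind the endpoint name."""
        return self.active

    def fail_over(self):
        """Point the endpoint at the other host; the old one becomes passive."""
        self.active, self.passive = self.passive, self.active


endpoint = DbEndpoint("mydb.example.internal", "host-az1", "host-az2")
before = endpoint.resolve()
endpoint.fail_over()          # primary in AZ1 became unavailable
after = endpoint.resolve()
print(before, "->", after)
```

Applications keep connecting to the same endpoint name throughout, which is why they should not cache the resolved address for long.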

Read replica - read replication is different from multi-AZ failover; multi-AZ failover has a primary & a secondary; updates to the source database are asynchronously replicated to the read replica; read replicas, managed by AWS RDS, can be in the same AZ OR a different AZ OR a different region altogether;

replication can have lag - eventually consistent; latest snapshot from primary host is written into the read replica; hence queries to read replica MAY NOT return the latest results; REPLICATION LAG METRIC CAN INDICATE HOW STALE / INCONSISTENT THE DATA COULD BE;
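
A small simulation shows why a replica read can be stale. It is a sketch only: `Source`, `ReadReplica` and `replica_lag` are hypothetical names, and lag is counted here as the number of changes not yet shipped, whereas a real replication lag metric is normally reported as time.

```python
from collections import deque


class Source:
    """Source database that queues changes for asynchronous shipping."""

    def __init__(self):
        self.data = {}
        self.pending = deque()  # changes not yet applied on the replica

    def write(self, key, value):
        self.data[key] = value          # the client is acknowledged here
        self.pending.append((key, value))


class ReadReplica:
    def __init__(self):
        self.data = {}

    def apply_from(self, source, max_changes):
        """Apply at most max_changes queued changes, oldest first."""
        for _ in range(min(max_changes, len(source.pending))):
            key, value = source.pending.popleft()
            self.data[key] = value


def replica_lag(source):
    """Lag expressed as the count of unshipped changes (simplification)."""
    return len(source.pending)


source, replica = Source(), ReadReplica()
source.write("balance", 100)
source.write("balance", 120)
replica.apply_from(source, max_changes=1)
stale_read = replica.data["balance"]   # the replica has seen only the first write
lag = replica_lag(source)
replica.apply_from(source, max_changes=10)
print(stale_read, lag, replica.data["balance"])
```

Reads that must see the latest write should go to the source; reads that tolerate some staleness can go to the replica.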

When to use read replica

Scaling - load balancing, redirect excess traffic for read / SELECT operations;
re-route traffic similar to fail-over when source database isn't available;
disaster recovery, DR cross-regions, with RPO until the last primary snapshot;
data analytics / warehouse reporting;
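
For the scaling use case, the application (or a proxy in front of it) decides where each statement goes. Below is a deliberately naive sketch: `QueryRouter` and the host names are hypothetical, and classifying a statement by its leading `SELECT` keyword ignores cases such as `SELECT ... FOR UPDATE`.

```python
from itertools import cycle


class QueryRouter:
    """Send reads to replicas in round-robin order, everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = cycle(replicas) if replicas else None

    def route(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = QueryRouter("primary.example", ["replica-1.example", "replica-2.example"])
targets = [router.route(q) for q in (
    "SELECT * FROM orders",
    "UPDATE orders SET paid = 1",
    "select count(*) from orders",
    "SELECT 1",
)]
print(targets)
```

With no replicas configured, the router falls back to the primary for every statement, so the same code path works before and after replicas are added.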

Multi-AZ v/s read replicas

read replicas, as the name suggests, replicate data onto a different instance in the same AZ or another AZ or another region;
with read replicas, DNS name re-route is NOT AUTOMATIC, it's not a fail-over scenario; more of a load balancing scenario for performance;

Backup & Recovery

Backup can be taken automated OR manual
first snapshot written into S3 contains all data from EBS volumes
subsequent snapshots are incremental, only differences from previous snapshots are stored
Enable point-in-time recovery to achieve an RPO of less than 10 minutes; with it enabled, the RPO is typically up to 5 minutes; RTO depends on the amount of data to restore;
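
The "full first, incremental afterwards" snapshot behaviour listed above can be sketched with dictionaries standing in for EBS blocks. This is an assumption-laden toy: `take_snapshot` and `restore` are hypothetical helpers, and deleted blocks are not tracked.

```python
def take_snapshot(volume_blocks, previous_view=None):
    """Return all blocks for the first snapshot, only changed blocks afterwards."""
    if previous_view is None:
        return dict(volume_blocks)
    return {block: data for block, data in volume_blocks.items()
            if previous_view.get(block) != data}


def restore(snapshots):
    """Rebuild the volume by replaying snapshots oldest to newest."""
    view = {}
    for snapshot in snapshots:
        view.update(snapshot)
    return view


volume = {0: "a", 1: "b", 2: "c"}
first = take_snapshot(volume)                       # full copy
volume[1] = "B"                                     # one block changes
volume[3] = "d"                                     # one block is added
second = take_snapshot(volume, restore([first]))    # differences only
print(first, second, restore([first, second]))
```

The second snapshot stores two blocks instead of four, which is where the storage saving comes from.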

Automated backups

backup window - when the backup should be taken; retention period - how long the data should be retained [1-35 days, 7 days default];
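
The two settings can be validated before they are sent anywhere. The sketch below only builds a dictionary and never calls AWS; the key names follow the boto3 `modify_db_instance` parameters as I understand them and should be checked against the current documentation. `backup_settings`, the instance name and the default window are hypothetical.

```python
import re

# hh:mm-hh:mm, 24-hour clock
WINDOW_PATTERN = re.compile(r"([01]\d|2[0-3]):[0-5]\d-([01]\d|2[0-3]):[0-5]\d")


def backup_settings(instance_id, retention_days=7, window="03:00-03:30"):
    """Validate the backup window and retention, then return them as keyword arguments."""
    if not 1 <= retention_days <= 35:
        raise ValueError("retention must be between 1 and 35 days")
    if not WINDOW_PATTERN.fullmatch(window):
        raise ValueError("window must look like hh:mm-hh:mm")
    return {
        "DBInstanceIdentifier": instance_id,      # hypothetical instance name
        "BackupRetentionPeriod": retention_days,
        "PreferredBackupWindow": window,
    }


settings = backup_settings("my-db")
print(settings)
```

The returned dictionary could then be passed as keyword arguments to an RDS client call, assuming the parameter names above match the client in use.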

in multi-AZ configuration, backups are taken daily - from EBS volumes OF THE SECONDARY RDS HOST; saved into S3 during the backup window set in the backup configuration;

backup are stored in AWS RDS owned S3 buckets; objects in S3 are not visible OR accessible;
transaction logs are written to S3 every 5 minutes; hence in case of failure, the database can be restored UP TO the last 5 minutes before the failure occurred (RPO);
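
The 5-minute log interval translates directly into the worst-case data loss. A minimal sketch, assuming logs are uploaded on a fixed 5-minute cadence starting at a known time; `latest_restorable_time` and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta

LOG_UPLOAD_INTERVAL = timedelta(minutes=5)


def latest_restorable_time(failure_time, first_upload):
    """Time of the last log upload at or before the failure.

    Assumes failure_time is not earlier than first_upload.
    """
    completed_uploads = (failure_time - first_upload) // LOG_UPLOAD_INTERVAL
    return first_upload + completed_uploads * LOG_UPLOAD_INTERVAL


first_upload = datetime(2024, 1, 1, 0, 0)
failure = datetime(2024, 1, 1, 10, 13, 20)
restorable = latest_restorable_time(failure, first_upload)
data_loss = failure - restorable
print(restorable, data_loss)
```

Whatever the failure time, the gap between the failure and the restorable point stays below the upload interval.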

in single AZ configuration, all similar steps, except for EBS -> S3 write happening in single AZ only from primary host;

automated backups involve daily snapshots + 5 min transaction logs into S3; hence database may experience slight performance impact during backup window;
EBS volumes supports RAID configurations - what level of RAID is required, is to be determined with calculations;

Manual backups - are triggered from mgmt console, CLI or APIs; taken from secondary host EBS volumes similar to automated backups;

Database security, encryption - basically restrict access to the database by applying policies; use IAM to define RBAC & the principle of least privilege to provision database access;

restrict public access to database, make it private, as far as possible encrypt your data at rest;
use VPC peering for access from other VPCs, Direct Connect to connect to on-premises hosts; allow only authenticated access for inbound DB traffic;
to encrypt data-at-rest, AES-256 algorithm based encryption keys are used to encrypt your data in DB instances;
once encrypted, data is encrypted across regions, AZs, read replicas & fail-over instances; including backups & snapshots;
encryption is enabled at volume level, so no impact to the application;

Two-tier encryption -- master keys are created by the customer and each DB instance has its own data key; benefits: lower risk of compromising a key; easier to maintain a smaller number of master keys; better performance when encrypting large data; supports key rotation, thereby reducing the risk of a compromised master key;

Aurora & Aurora server-less models - starts with 10GB & scales in steps of 10GB up to 64TB (storage auto-scaling)

compute resources can scale up to 32 vCPUs & 244GB of memory
2 copies of data in each AZ, minimum of 3 AZs
Aurora server-less provides a relatively simple, cost-effective option for infrequent, intermittent, or unpredictable workloads; this service offers automatic scalability for unknown or unpredictable workloads;
Aurora server-less has a "writer" node and a "reader" node when you turn on 'read-replication'; the two nodes sit in different availability zones, so read replication spans availability zones;
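
The storage auto-scaling noted at the start of this section (10GB start, 10GB steps, 64TB ceiling) reduces to simple arithmetic. The sketch below uses those figures as stated in these notes and treats 1TB as 1024GB, which is an assumption; `allocated_storage_gb` is a hypothetical helper, not an AWS API.

```python
import math

STEP_GB = 10
MIN_GB = 10
MAX_GB = 64 * 1024  # 64TB, taking 1TB = 1024GB (assumption)


def allocated_storage_gb(data_gb):
    """Round the data size up to the next 10GB step, within the stated limits."""
    if data_gb > MAX_GB:
        raise ValueError("data size exceeds the maximum cluster volume")
    stepped = math.ceil(data_gb / STEP_GB) * STEP_GB
    return min(max(MIN_GB, stepped), MAX_GB)


sizes = [allocated_storage_gb(gb) for gb in (3, 10, 11, 95)]
print(sizes)
```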
