Multi-tenancy is one of the core challenges in building & delivering SaaS solutions; several strategies exist to partition data, but complexity arises in securing it - controlling access across tenants, ensuring data confidentiality & integrity, and providing seamless access to SaaS consumers;
Since SaaS, by definition, involves no consumer management of network, infrastructure, database & application [all are managed by the cloud provider], it's a challenge for any cloud provider to t-shirt-size the hardware & software components, optimize cost on the shared environment, and put security measures in place with due consideration to non-functional attributes such as performance, availability & reliability - so as to win consumers' confidence & help them build their business on the cloud;
Partitioning models - silo, bridge & pool: the silo model has a separate database instance for each tenant; the bridge model has a single database with a separate schema for each tenant; the pool model has a shared (database + schema), partitioned at the record / row level by a tenant ID. In the pool model, the partitioning key is used to segregate access to tenant data;
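The pool-model keying idea can be sketched in a few lines: every record carries the tenant ID as the leading part of its key, and all access is scoped by it. Attribute and ID names below are illustrative, not from any specific product.

```python
# Sketch of pool-model partitioning: every record carries the tenant ID
# as its partition key, and all access is scoped by that key.

def make_item(tenant_id: str, order_id: str) -> dict:
    """Compose a pool-model record: tenant ID is the partition key."""
    return {"tenant_id": tenant_id, "order_id": order_id}

def scoped(items: list[dict], tenant_id: str) -> list[dict]:
    """Simulate partition-key scoping: only this tenant's rows are visible."""
    return [it for it in items if it["tenant_id"] == tenant_id]

items = [
    make_item("tenant-a", "o-1"),
    make_item("tenant-b", "o-2"),
    make_item("tenant-a", "o-3"),
]
print([it["order_id"] for it in scoped(items, "tenant-a")])  # → ['o-1', 'o-3']
```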
Silo model tradeoffs:
- applicable when customers need to comply with strict regulatory & security requirements;
- outages are limited since availability is managed at tenant level;
- cross-tenant impacts are limited, hence less vulnerable to security attacks;
- priced comparatively higher than the other models;
Pool model tradeoffs:
- one-stop monitoring across all tenants, managing & monitoring health checks, troubleshooting & problem resolution is at one place; on-boarding new clients onto the shared platform is easier;
- provides operational agility to cloud service provider, while disruptions impact all consumers on-boarded to the platform;
- outages are comparatively more frequent given the shared environment, and the platform is more vulnerable to attacks - hence due diligence is required to introduce:
- monitoring mechanisms, incident management & resolution process
- intelligent / automated issue resolution (where applies)
- dynamic scaling to onboard new consumers, provision capacity at runtime, when needed - de-provision when demand reduces;
- managing multi-tenant data in a shared data model offers less isolation, so consistency & integrity become trade-offs; data size & distribution also influence the data management strategy in the pool model;
Multi-tenancy on Dynamo DB - being a NoSQL database, Dynamo DB has no notion of a database instance; instead, all tables are scoped to an account within a region - hence table names must be unique per account per region;
- uses eventually consistent READs by default: a read may not reflect a write made within roughly the last SECOND, as replicas typically converge within about one second of the write (the so-called one-second rule)
- also supports strongly consistent READs, which return the most up-to-date data reflecting all prior successful writes;
- data is replicated across multiple facilities within a region and stored on SSD storage, hence faster;
- also has the ability to conduct ACID-compliant transactions;
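The consistency choice above is made per request in DynamoDB's GetItem API via the ConsistentRead flag. The sketch below only builds the request parameters (table and key names are illustrative assumptions), without calling AWS.

```python
def get_item_params(table: str, key: dict, strong: bool = False) -> dict:
    """Build DynamoDB GetItem parameters; ConsistentRead=True requests a
    strongly consistent read (the default is eventually consistent)."""
    return {"TableName": table, "Key": key, "ConsistentRead": strong}

# Request a strongly consistent read of one tenant's record.
params = get_item_params("Orders", {"tenant_id": {"S": "tenant-a"}}, strong=True)
print(params["ConsistentRead"])  # → True
```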
- the silo model on Dynamo DB requires grouping the tables associated with a specific tenant
- the approach is also to create a secure, controlled view of the tables - preventing cross-account access;
- IAM policies are used to control access to dynamo DB tables;
- Cloudwatch metrics can be captured at the table level, hence simplifies aggregation of client metrics;
- table read & write throughput (read/write capacity units, akin to IOPS) is applied at the table level, hence distinct scaling policies can be created per table;
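IAM policies (noted above) can enforce tenant isolation down to the item level via DynamoDB's documented dynamodb:LeadingKeys condition key, which restricts a caller to items whose partition key matches its tenant ID. A sketch that builds such a policy document - the account ID and table ARN are made up:

```python
import json

def tenant_scoped_policy(table_arn: str, tenant_id: str) -> dict:
    """Build an IAM policy allowing item access only where the leading
    partition-key value equals this tenant's ID, using DynamoDB's
    dynamodb:LeadingKeys condition key."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
            "Resource": table_arn,
            "Condition": {
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": [tenant_id]
                }
            },
        }],
    }

# Illustrative ARN; in practice this comes from the provisioned table.
policy = tenant_scoped_policy(
    "arn:aws:dynamodb:us-east-1:123456789012:table/Orders", "tenant-a")
print(json.dumps(policy, indent=2))
```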
With dynamo DB:
- the primary key consists of a partition key and an optional sort key, which together act like an index;
- two kinds of secondary indexes can be created: local & global;
- a local secondary index queries by the same partition key as the table, but the sort key can be different
- a global secondary index has no dependency on the table's partition / sort key - it can define its own
- in Dynamo DB, we can use a global secondary index to aggregate on a specific field, giving a feature similar to MapReduce views in Cloudant
- concept of sparse indexes: an index entry exists only for those records where the indexed field exists; other database records are not indexed
- Dynamo DB is best suited to store structured & consistent JSON documents;
- a local secondary index must be created when the table is created
- global secondary indexes can be created anytime after the table is created
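The sparse-index behaviour described above can be simulated in a few lines: only items carrying the indexed attribute appear in the index at all. Attribute names are illustrative.

```python
def build_sparse_index(items: list[dict], indexed_attr: str) -> list[dict]:
    """A sparse GSI contains only items that have the indexed attribute;
    items missing it are simply absent, keeping the index small."""
    return [it for it in items if indexed_attr in it]

items = [
    {"id": "1", "status": "FAILED"},   # has the attribute → indexed
    {"id": "2"},                       # attribute absent → not indexed
    {"id": "3", "status": "FAILED"},
]
index = build_sparse_index(items, "status")
print([it["id"] for it in index])  # → ['1', '3']
```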
- Advanced dynamo DB offers on-demand backup & restore, operating within the same region as the source table;
- another advanced Dynamo DB offering is point-in-time recovery, a continuous incremental backup feature enabled per table; the latest restorable point is typically within the last five minutes;
auto-scaling with Dynamo DB doesn't always behave as expected: it scales out readily, but scale-in can lag when workload decreases; use on-demand capacity mode if the required throughput or number of reads/writes is genuinely unknown;
for known workloads, provision Dynamo DB tables by the throughput required; the downside here is that if you don't enable auto scaling, requests may be throttled when the provisioned capacity is exceeded;
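The two provisioning options above correspond to DynamoDB's billing modes at table creation: PAY_PER_REQUEST for unknown or spiky workloads, and PROVISIONED with explicit capacity units for known, steady ones. A parameter-building sketch (table and key names are illustrative; no AWS call is made):

```python
def create_table_params(table: str, on_demand: bool,
                        read_units: int = 5, write_units: int = 5) -> dict:
    """Build CreateTable parameters for either DynamoDB billing mode."""
    params = {
        "TableName": table,
        "KeySchema": [{"AttributeName": "tenant_id", "KeyType": "HASH"}],
        "AttributeDefinitions": [
            {"AttributeName": "tenant_id", "AttributeType": "S"}
        ],
    }
    if on_demand:
        # Unknown workload: pay per request, no capacity planning needed.
        params["BillingMode"] = "PAY_PER_REQUEST"
    else:
        # Known workload: provision explicit read/write capacity units.
        params["BillingMode"] = "PROVISIONED"
        params["ProvisionedThroughput"] = {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        }
    return params

print(create_table_params("Orders", on_demand=True)["BillingMode"])  # → PAY_PER_REQUEST
```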
The bridge model & silo model operate similarly on Dynamo DB; the only difference here is that table-level access policies are a bit more relaxed;
With the pool model, the challenge is that data in a multi-tenant SaaS environment typically doesn't have uniform distribution; it's very common for a few tenants to consume a large portion of the available data footprint, so keying partitions directly on tenant ID can cause partition "hot spots" - which in turn impacts the cost & performance of the solution; the design approach to resolve this is to distribute the keys (spreading a hot tenant's items across multiple key values) and provision throughput to match, distributing the workload;
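One common form of the key-distribution approach above is write sharding: append a calculated suffix to a hot tenant's partition key so its items spread across several partitions, with reads fanning out across the suffixes. A sketch - the shard count and key format are assumptions:

```python
import hashlib

SHARDS = 8  # assumed shard count; tuned per tenant's write volume

def sharded_partition_key(tenant_id: str, item_id: str) -> str:
    """Spread one tenant's items across SHARDS partition-key values
    by hashing the item ID into a deterministic suffix."""
    digest = hashlib.sha256(item_id.encode()).hexdigest()
    shard = int(digest, 16) % SHARDS
    return f"{tenant_id}#{shard}"

def all_shard_keys(tenant_id: str) -> list[str]:
    """Reads must fan out: query every shard key and merge the results."""
    return [f"{tenant_id}#{n}" for n in range(SHARDS)]

print(sharded_partition_key("tenant-a", "order-42") in all_shard_keys("tenant-a"))  # → True
```

The trade-off is that point reads stay cheap (the shard is recomputable from the item ID), while tenant-wide queries cost one query per shard.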
DAX with Dynamo DB - DAX (DynamoDB Accelerator) is a fully managed, highly available in-memory cache [notice it's an in-memory cache]; for performance improvement, it reduces request latency from milliseconds to microseconds, even under load; it fails over across multiple availability zones, and being API-compatible with Dynamo DB, only minimal code changes (swapping in the DAX client) are needed; effectively a managed caching layer in front of Dynamo DB;
Dynamo DB streams - an ordered (FIFO) sequence of item-level change events [inserts, updates, deletes], retained for up to 24hrs; remember, stream records are retained for 24hrs only;
Multi-tenancy on RDS - follows a natural mapping to the silo, bridge & pool models;
- silo model - a separate database instance is created & maintained for each tenant
- bridge model - achieved by creating a different schema for each tenant
- different tenants, using the bridge model, can run different versions of the product at a given point in time and gradually migrate schemas on a per-tenant basis
- introducing schema changes is one of the main challenges with bridge models
- pool model - moving all data into a shared infrastructure model; tenant data is stored in a single RDS instance and tenants share common tables; the primary theme is trading management & provisioning complexity for agility
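The pool model's shared-table idea can be illustrated with plain SQL: one table, a tenant_id column, and every query scoped by it. SQLite stands in for RDS here; the schema is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One shared table for all tenants; tenant_id discriminates the rows.
conn.execute("CREATE TABLE orders (tenant_id TEXT, order_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("tenant-a", "o-1", 10.0), ("tenant-b", "o-2", 99.0), ("tenant-a", "o-3", 5.5)],
)

def tenant_orders(conn, tenant_id: str) -> list[tuple]:
    """Every pool-model query must filter on tenant_id - forgetting the
    WHERE clause is the classic cross-tenant data-leak bug."""
    return conn.execute(
        "SELECT order_id, amount FROM orders WHERE tenant_id = ? ORDER BY order_id",
        (tenant_id,),
    ).fetchall()

print(tenant_orders(conn, "tenant-a"))  # → [('o-1', 10.0), ('o-3', 5.5)]
```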
Amazon Redshift - a fully managed, clustered, petabyte-scale data warehouse; extremely cost effective compared with on-premises data warehouses such as Teradata or Netezza;
- it's PostgreSQL-compatible, with JDBC & ODBC drivers available; features parallel processing & columnar data storage, which are optimized for complex queries;
- Redshift also supports an option to QUERY DIRECTLY FROM S3, a feature called Redshift Spectrum; the concept of a data lake is to LAND UNPROCESSED DATA INTO A LARGE STORE, APPLY A FRAMEWORK AND query it;
- by default, the snapshot retention period is 1 day, with a maximum of up to 35 days; data compression is offered by default and 3 copies of data are maintained: the original & a backup on the compute nodes + a backup on Amazon S3;
- for disaster recovery, Redshift can ASYNCHRONOUSLY replicate snapshots to another region; a Redshift cluster runs within only ONE availability zone;