AWS offers a wide variety of services and service types; which service applies in a given context is something architects and designers must search, research, test and adapt. This blog covers the AWS Well-Architected Framework lenses, focusing on the High-Performance Computing (HPC) lens and how AWS supports HPC workloads.
For high-performance computing, AWS offers specific EC2 instance types designed for highly performant compute loads:
- supported instance families include the larger M5, R4, R5, and X sizes, as well as bare-metal (*.metal) variants;
- the maximum transmission unit (MTU) ranges from the standard 1500 bytes up to 9001-byte jumbo frames;
- network transmission speeds of up to 100 Gbps are achievable for high-performance computing;
- choose an appropriate EC2 instance type and grouping (placement groups within an AZ / VPC) to maximize network throughput for HPC workloads.
Network capabilities vary with the instance type:
- the Elastic Network Adapter (ENA) supports network speeds of up to 100 Gbps
- the Intel 82599 Virtual Function (VF) interface supports network speeds of up to 10 Gbps
- network traffic within a cluster placement group can use up to 10 Gbps for single-flow traffic
- traffic from a cluster placement group over AWS DX (Direct Connect) to on-premises is limited to 5 Gbps
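As a rough sketch of why jumbo frames matter, the fraction of each frame that carries application payload can be compared for both MTUs. The sketch assumes 20-byte IPv4 and 20-byte TCP headers with no options; real overhead varies with encapsulation.

```python
# Fraction of each frame carrying application payload, assuming a fixed
# 40 bytes of IPv4 + TCP header overhead per frame (an approximation).

def payload_efficiency(mtu: int, header_bytes: int = 40) -> float:
    """Payload bytes divided by total frame bytes at the given MTU."""
    return (mtu - header_bytes) / mtu

standard = payload_efficiency(1500)  # ~97.3% of each frame is payload
jumbo = payload_efficiency(9001)     # ~99.6% of each frame is payload
```

Jumbo frames also reduce the per-packet processing cost, which matters more than the small payload-efficiency gain for most HPC traffic.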
High-performance computing lens - cloud platforms can host HPC workloads, and the natural ebb and flow and bursting characteristics of HPC workloads make them well suited to pay-as-you-go infrastructure. Some terms and practices associated with HPC workloads:
- vCPUs - correspond to threads / hyper-threads rather than physical cores
- align the procurement model to the workload - use the pay-as-you-go model for flexible loads, e.g. a peak / stress / high workload runs for a duration, after which resource utilization drops; elasticity is the key here, helping meet varying workload requirements
- AWS ParallelCluster - use it to experiment with the workload and optimize the architecture for performance and cost; use AMIs and EBS snapshots, S3, and CloudFormation templates along with AWS ParallelCluster configuration templates;
- test with real-world workloads - application requirements vary with algorithm complexity, the mathematical methods applied (finite element methods, extrapolations, moving averages, calculus, predictive analysis, etc.), the size and complexity of the models used, user-interface and simulation requirements, visual graphics, complex 3-D image rendering, etc.; extrapolate the anticipated real-world load for performance tests
- use Spot Instances - the least expensive option for non-critical workloads; good for research-oriented workload simulation
- select the storage solution that best aligns with the requirement; e.g. create a RAID 0 array to achieve higher levels of I/O performance where performance matters more than fault tolerance;
- for storage solutions with a fixed size, such as EBS volumes or FSx file systems - monitor the amount of storage used versus the overall storage size, and automate scaling of storage resources based on a threshold limit;
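The threshold check in the last point can be sketched as follows; the 80% threshold and the sizes in the examples are illustrative assumptions, not AWS defaults:

```python
# Sketch of a utilization check for fixed-size storage (e.g. an FSx
# file system). In practice this would be driven by CloudWatch metrics
# and trigger an automated resize.

def should_scale(used_gb: float, total_gb: float, threshold: float = 0.8) -> bool:
    """Return True when utilization crosses the scale-out threshold."""
    return used_gb / total_gb >= threshold

should_scale(850, 1000)  # True  - 85% used, time to grow the volume
should_scale(500, 1000)  # False - 50% used, no action needed
```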
Scenarios where applicable: genomics, computational chemistry, risk modeling, computer-aided design & mechanics, weather prediction & seismic imaging, machine learning & deep learning, autonomous driving, etc.
Loosely coupled scenarios - entail processing a large number of smaller, independent jobs (each job may still use shared-memory parallelization (SMP) within a single node); e.g. Monte Carlo simulations, image processing, genomics analysis and electronic design automation (EDA);
- compute - driven by application's memory-to-compute ratio; GPUs / FPGA accelerators on EC2 instances
- network - workload performance is not sensitive to bandwidth or network latency between instances, so a cluster placement group is not necessary
- storage - driven by data size, I/O (read / write) & data transfers
- deployment - can be distributed across AZs with no impact on performance; can be run with AWS Batch, AWS ParallelCluster, or a combination of AWS services
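To make the loosely coupled pattern concrete, here is a minimal sketch: many independent jobs with no inter-job communication, merged only at the end. A Monte Carlo estimate of pi stands in for a real simulation; the job count and sample sizes are arbitrary.

```python
import random

def monte_carlo_job(samples: int, seed: int) -> int:
    """One independent job: count random points inside the unit quarter-circle."""
    rng = random.Random(seed)
    return sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0 for _ in range(samples))

def estimate_pi(jobs: int = 8, samples_per_job: int = 50_000) -> float:
    # Each job could run on its own instance (e.g. submitted via AWS Batch);
    # here they run in a simple loop for illustration. Results are only
    # combined at the end - no communication happens between jobs.
    hits = sum(monte_carlo_job(samples_per_job, seed=job) for job in range(jobs))
    return 4.0 * hits / (jobs * samples_per_job)
```

Because the jobs share nothing, they can be scheduled anywhere, retried on Spot interruption, and scaled out without regard to network latency.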
Tightly coupled scenarios - a single large problem spread across nodes that must communicate with each other frequently:
- compute - a homogeneous cluster of similar compute nodes; the per-core instance size can be chosen from the available memory-optimized instance types; prefer the largest memory per core proportionate to the workload
- network - a cluster placement group applies here; the AWS Elastic Fabric Adapter (EFA) can be used to support tightly coupled workloads with high inter-node communication at scale;
- storage - driven by data size, I/O (read / write) & data transfers
- deployment - can be deployed via AWS Batch, an AWS ParallelCluster configuration, AWS CloudFormation, or EC2 Fleet
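The inter-node communication that defines tight coupling can be illustrated with a toy 1-D halo exchange: the domain is split into chunks (one per "node"), and on every step each chunk needs its neighbours' edge values. This is a pure-Python stand-in; a real cluster would perform the exchange with MPI over EFA.

```python
# Toy 1-D diffusion with domain decomposition. The per-step halo exchange
# between neighbouring chunks is what makes such workloads tightly coupled
# and latency-sensitive.

def step_chunk(chunk, left_halo, right_halo):
    """Average each cell with its neighbours, using halo cells at the edges."""
    padded = [left_halo] + chunk + [right_halo]
    return [(padded[i - 1] + padded[i + 1]) / 2 for i in range(1, len(padded) - 1)]

def step_domain(chunks):
    """One global step: every 'node' receives boundary cells from its neighbours."""
    new = []
    for i, chunk in enumerate(chunks):
        left = chunks[i - 1][-1] if i > 0 else chunk[0]                  # fixed boundary
        right = chunks[i + 1][0] if i < len(chunks) - 1 else chunk[-1]   # fixed boundary
        new.append(step_chunk(chunk, left, right))
    return new

chunks = [[0.0, 0.0], [4.0, 4.0], [0.0, 0.0]]  # hot middle "node"
chunks = step_domain(chunks)                   # heat spreads to the neighbours
```

Since every node blocks on its neighbours every step, per-message latency directly bounds scaling - hence cluster placement groups and EFA.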
-----------------------------------------------------------------------------------------------------------------------
- Address data protection and data loss prevention using storage services such as S3 / EBS / EFS; understand the availability and durability requirements that apply to HPC scenarios
- failure tolerance can be improved by deploying across multiple AZs / Regions;
- a trade-off between the reliability and cost pillars is needed, with considerations around clustered placement of compute instances based on latency requirements (tight / loose coupling)
- choose an appropriate instance family for compute-, memory- or GPU-intensive workloads
- each instance family comes in a range of instance sizes - large, xlarge, and up - allowing capacity to scale vertically; choose an appropriately large instance type for the HPC workload at hand (tightly or loosely coupled)
- choose current-generation Nitro-based instance types for HPC workloads; the Nitro system offers enhanced high-speed I/O acceleration and delivers performance close to bare metal;
- When choosing underlying hardware, look for:
- advanced processing features
- hyper-threading technology
- advanced vector extensions, processor affinity
- processor state controls (C-states and P-states) - a choice between reduced latency on one or two boosted cores versus consistent performance at lower frequencies across all cores
- FSx for Lustre natively integrates with S3, presenting the entire contents of an S3 bucket as a file system - this helps optimize storage costs
Approaches to scaling compute resources:
- Demand-based approach
- Buffer-based approach
- Time-based approach
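A buffer-based approach can be sketched as sizing the fleet from queue backlog. The per-worker throughput, drain target, and fleet cap below are assumed values for illustration, not AWS parameters:

```python
import math

# Buffer-based scaling sketch: jobs land in a queue (the buffer), and the
# worker fleet is sized so the backlog drains within a target window.

def desired_workers(queue_depth: int, jobs_per_worker_per_min: int = 10,
                    target_drain_minutes: int = 5, max_workers: int = 100) -> int:
    """Workers needed to drain the current backlog within the target window."""
    needed = math.ceil(queue_depth / (jobs_per_worker_per_min * target_drain_minutes))
    return min(max(needed, 1), max_workers)

desired_workers(0)     # 1   - keep a floor of one worker
desired_workers(2500)  # 50  - 2500 jobs / (10 jobs/min * 5 min) per worker
```

Demand-based scaling would instead react to a live utilization metric, and time-based scaling to a known schedule.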
Networking considerations - tightly coupled applications benefit from the Elastic Fabric Adapter (EFA), a network device that can be attached to an Amazon EC2 instance. EFA provides lower and more consistent latency and higher throughput than the TCP transport traditionally used in cloud-based HPC systems.
- EFA supports an OS-bypass access model via the Libfabric API - HPC applications communicate directly with the network interface hardware.
- EFA enhances inter-instance communication and is optimized to work on the existing AWS network infrastructure; it is critical for scaling tightly coupled applications
Partition calculations (DynamoDB):
(1) number of partitions by throughput = (RCU / 3000) + (WCU / 1000), e.g. (2000 RCU / 3000) + (2000 WCU / 1000)
where RCU = read capacity units and WCU = write capacity units
(2) number of partitions by size = x / 10 GB
where x = the size of the stored data
total number of partitions = CEILING(MAX(result of (1), result of (2)))
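The two formulas can be checked with a few lines of arithmetic (per-partition limits of 3000 RCU, 1000 WCU, and 10 GB):

```python
import math

# DynamoDB partition estimate following the formulas above.

def partitions(rcu: int, wcu: int, data_gb: float) -> int:
    by_throughput = (rcu / 3000) + (wcu / 1000)  # formula (1)
    by_size = data_gb / 10                       # formula (2)
    return math.ceil(max(by_throughput, by_size))

partitions(2000, 2000, 8)   # throughput-bound: ceil(0.67 + 2.0) = 3
partitions(2000, 2000, 50)  # size-bound: ceil(50 / 10) = 5
```

Whichever dimension (throughput or size) demands more partitions wins, and provisioned throughput is spread evenly across them.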
A table has a partition key and an optional sort key: the partition key acts like a primary key that determines which partition an item lands in, and the sort key orders related items within that partition - loosely analogous to the grouping step in the map / reduce concept.
DAX - DynamoDB Accelerator - an in-memory cache that sits in front of the DynamoDB table
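The idea can be sketched with a toy write-through cache in front of a table. Real DAX is a managed cluster accessed through its own SDK client, so this dict-based stand-in only illustrates the read/write path:

```python
# Toy write-through cache in the spirit of DAX. The dicts stand in for
# the DynamoDB table and the DAX in-memory cache respectively.

class CachedTable:
    def __init__(self):
        self._table = {}   # stand-in for the DynamoDB table
        self._cache = {}   # stand-in for the in-memory cache

    def put(self, key, value):
        self._table[key] = value
        self._cache[key] = value      # write-through: cache updated on write

    def get(self, key):
        if key in self._cache:        # cache hit: no table read needed
            return self._cache[key]
        value = self._table.get(key)  # cache miss: read from the table
        self._cache[key] = value
        return value
```

Repeated reads of the same item are served from memory, which is where DAX gets its microsecond-level read latency.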
ENI - elastic network interface - a virtual network card attached to an EC2 instance:
- with an ENI, we get one private and one public IPv4 address, one or more IPv6 addresses, one MAC address, a source/destination check flag, and a description
- used to create low-budget high-availability solutions and networks connecting subnets
- used by network and security appliances in your VPC
Enhanced networking:
- provides higher I/O performance, higher networking bandwidth, and higher packets-per-second (PPS) performance
- there is no additional charge for enhanced networking
- choose the Elastic Network Adapter (ENA) over plain ENIs when you need speeds in the order of 50 to 100 Gbps
- rather than adding more ENIs, configure ENA or the Intel VF interface for high-speed networking
EFA - elastic fabric adapter - a network device that can be attached to an Amazon EC2 instance to accelerate high-performance computing (HPC) and machine learning applications
- EFA attaches to an EC2 instance and provides the networking needed for high-performance computing
- use EFA for HPC and machine-learning style use cases