Redfire75369’s avatarRedfire75369’s Twitter Archive—№ 2,239

              1. Nvidia Data Center GPU MCM Approach: Composable On-Package Architecture (COPA) Divergence of HPC and DL GPUs by modifying I/O and cache, depending on the workload, using chiplets. (1/x)
            1. …in reply to @Redfire75369
              DL Bottlenecks (2/x)
          1. …in reply to @Redfire75369
            Ballooning LLC Requirements for DL Workloads (3/x)
        1. …in reply to @Redfire75369
          Bandwidth and Efficiency of 2.5D and 3D Packaging 28.7 mm² of silicon area is used for 14.7 TB/s with 3.5D Packaging. Conservative Estimate of 960 MB L3 in 826 mm² (4/x)
      1. …in reply to @Redfire75369
        Effect of increasing BW in DL and HPC Workloads (5/x)
        oh my god twitter doesn’t include alt text from images in their APIoh my god twitter doesn’t include alt text from images in their APIoh my god twitter doesn’t include alt text from images in their API
    1. …in reply to @Redfire75369
      Effect of various COPA-GPUs in DL Workloads (6/x)
      oh my god twitter doesn’t include alt text from images in their API
  1. …in reply to @Redfire75369
    The mysterious GPU-N Note: All benchmarks are run in NVArchSim, simulating GPU-N High Accuracy for MLPerf Workloads Comparison with other DL ASICs (7/x)
    oh my god twitter doesn’t include alt text from images in their API
    1. …in reply to @Redfire75369
      GPU-N and COPA-GPU Configurations SMs: 134 Frequency: 1.4 GHz TFLOPs: 24.2 FP32, 779 FP16 L2: 60 MiB (to 1920 MiB) DRAM Bandwidth: 2.687 TB/s (to 6.3 TB/s) DRAM Capacity: 100 GiB (to 233 GiB) (8/x)
      oh my god twitter doesn’t include alt text from images in their APIoh my god twitter doesn’t include alt text from images in their API
      1. …in reply to @Redfire75369
        Now, it's time for my favourite, speculation. I believe GPU-N is a single-die GH100, with a total of 144 SMs, with 134 enabled. It seems to have 6144b HBM2e @ 3.5 Gb/s. GPU-N is the HPC version of the GPU, while the various COPA-GPUs (CG) are DL-optimised. (9/x)
        1. …in reply to @Redfire75369
          CG-1 and CG-3 have 6144b HBM2e CG-2 and CG-4 have 12288b HBM2e CG-5 has 14336b HBM2e I don't expect all of these CGs to exist, since it would require both 2.5D and 3D packaging capable GH100 dies. (10/x)
          1. …in reply to @Redfire75369
            As I was worried, it seems Nvidia has decided to focus on DL performance, given the FP32 scaling is only due to increased SM count, and presumably FP64 (HPC) as well. This isn't really conducive to the 3x performance rumour. (11/x)
            1. …in reply to @Redfire75369
              Source: GPU Domain Specialization via Composable On-Package Architecture By: Nvidia, USA (12/x) dl.acm.org/doi/10.1145/3484505