HBeonLabs

Committed to excellence!!!

Training: HBCS102 - Introduction to GPGPU using CUDA

 NVIDIA's CUDA: Supercomputing on your desktop!

The graphics cards that we use for gaming and visual enhancement have two basic components: a Graphics Processing Unit (GPU) and off-chip DRAM. GPUs are designed for compute-intensive, data-parallel jobs, where CPUs are comparatively slow. CPUs, on the other hand, are designed for data caching and flow control, where GPUs are comparatively weak.

GPUs in general have a highly parallel architecture, and some of NVIDIA's GPUs have 240 cores per processor (compare this with modern CPUs: 2, 4 or 8 cores). With such a parallel architecture, GPUs provide an excellent computational platform, not only for graphical applications but for any application with significant data parallelism. GPUs are thus not limited to use as graphics engines; they are parallel computing architectures capable of performing floating-point operations at rates on the order of a teraflop (10^12 floating-point operations per second). People have realized the potential of GPUs for computationally heavy tasks and have been working on general-purpose computation on GPUs (GPGPU) for a long time. However, life before NVIDIA's Compute Unified Device Architecture (CUDA) was extremely difficult for the programmer, who had to express every computation through a graphics API such as OpenGL or Direct3D. Learning to do so was also slow. CUDA solved these problems by providing a hardware abstraction that hides the inner details of the GPU and frees the programmer from the burden of learning graphics programming.

CUDA is the C language with some extensions for processing on GPUs. The user writes C code, and the compiler (nvcc) separates it into two portions. One portion runs on the CPU (because the CPU is best suited to such tasks), while the other portion, involving extensive calculations, is delivered to the GPU(s), which execute it in parallel. Because C is a familiar programming language, CUDA has a gentle learning curve, and hence it is becoming a favorite tool for accelerating various applications. NVIDIA's CUDA SDK is being employed in a wide range of fields, from computational finance to neural networks and fuzzy logic to simulations for nanotechnology.

CUDA has several advantages over traditional general purpose computation on GPUs (GPGPU) using graphics APIs.
  • Scattered reads – code can read from arbitrary addresses in memory.
  • It is high level: basically an extension to the C language, so it is much quicker to learn than traditional GPGPU.
  • Shared memory – CUDA exposes a fast shared memory region (16 KB in size) that can be shared amongst threads. This can be used as a user-managed cache, enabling higher bandwidth than is possible using texture lookups.
  • Faster downloads and readbacks to and from the GPU
  • Full support for integer and bit wise operations
In short, CUDA lets you exploit the GPUs that ship with your graphics cards as tiny supercomputers and accelerate your applications significantly, sometimes by a factor of 100 or more, depending on how well you exploit the resources of the GPU.

Course Overview:

This course (HBCS102) is offered face-to-face/on-site as well as online (in real time). It is divided into three modules: Level "A", Level "B" and Level "C". Level "C" is the most advanced and consists mainly of practical applications.

Level "A" is an introductory course on parallel programming, with about 20% of the time devoted to CUDA programming. It does not require any prior knowledge of parallel computing; a data-structures-level course is the only requirement. The module also gives some exposure to image processing. The course starts from the C programming language and covers the details of graphics card hardware (GPU architecture, DRAM, PCIe, etc.), followed by elementary CUDA programming in Windows and Linux environments. The aim is for the trainee to be able to write simple CUDA programs, such as squaring the first 10000 integers, and to understand the basic hardware and software details without yet worrying about performance.
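As an illustration of the kind of program Level "A" aims at, here is a minimal sketch of squaring the first 10000 integers on the GPU. It is not taken from the course material: the names square and square_first_n, the block size of 256 threads and the error handling are assumptions made for this example. It shows the host/device split: the function marked __global__ runs on the GPU, the rest runs on the CPU.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Device code: each thread squares exactly one element.
__global__ void square(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // the last block may be partly idle
        data[i] = data[i] * data[i];
}

// Host code (hypothetical helper): fills out[] with 1..n, squares it on the
// GPU and copies the result back. Returns 0 on success, -1 on a CUDA error.
int square_first_n(int n, int *out)
{
    size_t bytes = (size_t)n * sizeof(int);
    for (int i = 0; i < n; ++i)
        out[i] = i + 1;

    int *d_data = NULL;
    if (cudaMalloc((void **)&d_data, bytes) != cudaSuccess)
        return -1;
    cudaMemcpy(d_data, out, bytes, cudaMemcpyHostToDevice);

    int threads = 256;                          // assumed block size
    int blocks = (n + threads - 1) / threads;   // round up to cover all n elements
    square<<<blocks, threads>>>(d_data, n);

    cudaError_t err = cudaGetLastError();       // catches launch failures
    if (err == cudaSuccess)                     // blocking copy also waits for the kernel
        err = cudaMemcpy(out, d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return err == cudaSuccess ? 0 : -1;
}

int main(void)
{
    const int n = 10000;
    int *h = (int *)malloc(n * sizeof(int));
    if (h == NULL)
        return 1;
    if (square_first_n(n, h) != 0) {
        fprintf(stderr, "CUDA error (is a CUDA-capable GPU present?)\n");
        free(h);
        return 1;
    }
    printf("%d squared = %d\n", n, h[n - 1]);
    free(h);
    return 0;
}
```

Saved as, say, square.cu, this should build with nvcc square.cu -o square. The bounds check in the kernel matters because 10000 is not a multiple of 256, so the last block contains threads with no element to process.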

Level "B" discusses parallel programming concepts in detail, with a specific focus on CUDA programming. It covers the following topics:
  • Performance metrics: speedup, utilization, efficiency and scalability
  • Models of parallel computation: SIMD (Single Instruction, Multiple Data) and MIMD (Multiple Instruction, Multiple Data)
  • GPU compute architecture and CUDA
  • Memory organization in CUDA, memory optimization and coalesced access
  • Occupancy and transparent scalability
  • Performance guidelines
Finally, the trainee learns different algorithms for fast matrix multiplication and implements them in CUDA, obtaining significant performance benefits.
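To give a flavor of the matrix multiplication exercise, the following is a sketch of one common approach, tiled multiplication through shared memory. It is an assumption that the course uses this particular variant; the names matmul_tiled and matmul_gpu, the tile width of 16 and the restriction to square row-major matrices are choices made for this example only.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <cuda_runtime.h>

#define TILE 16  // assumed tile width; two 16x16 float tiles use 2 KB of shared memory

// C = A * B for square n x n row-major matrices.
__global__ void matmul_tiled(const float *A, const float *B, float *C, int n)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float sum = 0.0f;

    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        int a_col = t * TILE + threadIdx.x;
        int b_row = t * TILE + threadIdx.y;
        // Neighbouring threads (consecutive threadIdx.x) read neighbouring
        // addresses, so these global loads can be coalesced.
        As[threadIdx.y][threadIdx.x] = (row < n && a_col < n) ? A[row * n + a_col] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (b_row < n && col < n) ? B[b_row * n + col] : 0.0f;
        __syncthreads();               // tile fully loaded before anyone uses it

        for (int k = 0; k < TILE; ++k)
            sum += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();               // everyone done before the tile is overwritten
    }
    if (row < n && col < n)
        C[row * n + col] = sum;
}

// Host wrapper (hypothetical helper). Returns 0 on success, -1 on a CUDA error.
int matmul_gpu(const float *A, const float *B, float *C, int n)
{
    size_t bytes = (size_t)n * n * sizeof(float);
    float *dA = NULL, *dB = NULL, *dC = NULL;
    if (cudaMalloc((void **)&dA, bytes) != cudaSuccess ||
        cudaMalloc((void **)&dB, bytes) != cudaSuccess ||
        cudaMalloc((void **)&dC, bytes) != cudaSuccess) {
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        return -1;
    }
    cudaMemcpy(dA, A, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B, bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    matmul_tiled<<<grid, block>>>(dA, dB, dC, n);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(C, dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return err == cudaSuccess ? 0 : -1;
}

int main(void)
{
    const int n = 100;                 // deliberately not a multiple of TILE
    float *A = (float *)malloc(n * n * sizeof(float));
    float *B = (float *)malloc(n * n * sizeof(float));
    float *C = (float *)malloc(n * n * sizeof(float));
    if (!A || !B || !C) return 1;
    for (int i = 0; i < n * n; ++i) { A[i] = (float)(i % 7); B[i] = (float)(i % 5); }

    if (matmul_gpu(A, B, C, n) != 0) { fprintf(stderr, "CUDA error\n"); return 1; }

    float max_err = 0.0f;              // compare against a plain CPU triple loop
    for (int r = 0; r < n; ++r)
        for (int c = 0; c < n; ++c) {
            float ref = 0.0f;
            for (int k = 0; k < n; ++k) ref += A[r * n + k] * B[k * n + c];
            max_err = fmaxf(max_err, fabsf(ref - C[r * n + c]));
        }
    printf("max abs error vs CPU: %g\n", max_err);
    free(A); free(B); free(C);
    return max_err < 1e-3f ? 0 : 1;
}
```

Each block stages one tile of A and one tile of B in shared memory, so every value fetched from global memory is reused TILE times. This is the user-managed cache idea, and it is typically where the speedup over a naive kernel that reads every operand from global memory comes from.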

Level "C" is the advanced course and is mainly concerned with practical implementations. It is a hands-on course involving significant parallel programming on many-core GPUs, principally CUDA-compatible NVIDIA GPUs. Specifically, we will be working on accelerating a NIDS (Network Intrusion Detection System) on GPUs. This requires core knowledge of networking fundamentals as well as CUDA programming skills.

_____________________________________________________________________________

Target Audience:

Professionals, researchers and students with background in Mathematics, Computer Science, IT, Electrical, Electronics & Communications and similar fields can enroll for this course.

___________________________________________________________________

Prerequisites: 

For Level "A", the trainee should be familiar with the concepts of the C programming language. Parallel programming is taught from the basics in Level "A", but some prior exposure to it will help you grasp the concepts quickly.

___________________________________________________________________

Reference Books

Introduction to Algorithms, Third Edition, by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein

Introduction to Parallel Computing by Ananth Grama, George Karypis, Vipin Kumar and Anshul Gupta (Pearson)

CUDA Programming Guide and CUDA Best Practices Guide (download from nvidia.com)

GPU Gems 3, edited by Hubert Nguyen

Reading material from the Internet (CUDA drivers / CUDA download / CUDA SDK / CUDA VS Wizard)
_____________________________________________________________________________

For any specific information or query, contact us at info@hbeonlabs.com
