Welcome to Vivek Kale's Homepage.

[Work Webpage] [ResearchGate] [Google Scholar] [GitHub] [bitbucket] [LinkedIn] [twitter]


I am a Principal Member of Technical Staff at Sandia National Laboratories - California. I primarily work on research and development for HPC Software Technology, which enables Scientific Software relevant to the U.S. Department of Energy to run efficiently on supercomputing platforms whose nodes have heterogeneous devices, including GPUs. I am also interested in AI/ML for automated performance tuning and automated testing in HPC (including modern generative AI techniques), AI/ML-guided Scientific Software, and hybrid supercomputer-cloud platforms. Additionally, my work can be used to optimize AI/ML workloads on modern data center platforms having GPUs. Key topics in my work are Kokkos, C++ Parallel STL, C++ Executors, OpenMP, MPI, loop scheduling and loop transformations, LLVM, adaptive runtime systems, tools for profiling and debugging, and performance modeling.

I completed my PhD in Computer Science at the University of Illinois at Urbana-Champaign in May 2015. During this time, I also worked with and held positions at U.S. Department of Energy laboratories, particularly Lawrence Livermore National Laboratory and Argonne National Laboratory. Since earning my doctorate, I have continued research on combining loop scheduling with inter-node load balancing, so that the two work synergistically to improve the performance of scientific and engineering simulations run on supercomputers.

A highlight of my work is the development of multi-GPU programming capabilities in LLVM's OpenMP, including mechanisms for maintaining data locality when scheduling computation and data across multiple GPUs. This is increasingly important for scaling HPC applications involving AI/ML, bioinformatics, and drug discovery on multi-GPU platforms such as NVIDIA's DGX.

Representative Work
Vivek Kale, Hanru Yan, Shyamali Mukherjee, Jackson Mayo, Keita Teranishi, Richard Rutledge and Alessandro Orso. Toward Automated Detection of Portability Bugs in Kokkos Parallel Programs.   8th International Workshop on Software Correctness for HPC Applications, SC24. November 18, 2024.   [paper] [code]

Mathialakan Thavappiragasam and Vivek Kale. CPU-GPU Performance Tuning for Improving Performance of Modern Scientific Applications on Exascale Supercomputers.   IEEE's International Conference on High-Performance Computing (HiPC) 2023. Goa, India. December 18-21, 2023.   [paper] [code]

Shravan Kale, Kevin Huck, David Boehme, Vanessa Surjadidjaja, and Vivek Kale. Performance Analysis and Auto-tuning Tools for Performance Portable Parallel Programs.   2023 ACM/IEEE International Conference for High Performance Computing Networking, Storage, and Analysis. Denver, CO, USA. November 12-17, 2023.   [paper] [code]

Vivek Kale, Vanessa Surjadidjaja, Christian Trott, and James Brandt. Data Order Reduction for Performance Monitoring of Supercomputers via the Kokkos Tools Sampler Utility.   LDMSCon 2023. Boston, MA, USA. June 13-15, 2023.   [paper] [code]

Vivek Kale and Shyamali Mukherjee. Tools to Rapidly Develop Sophisticated HPC Software Libraries.   SIAM Computational Science and Engineering Conference 2023. Amsterdam, Netherlands. March 2, 2023.   [paper] [code]

Mathialakan Thavappiragasam and Vivek Kale. OpenMP’s Asynchronous Offloading for All-pairs Shortest Path Graph Algorithms on GPUs.   HiPar 2022 Workshop at The 2022 International Conference for High Performance Computing Networking, Storage, and Analysis. November 16, 2022. Dallas, Texas, USA.   [paper] [code]

Mathialakan Thavappiragasam, Vivek Kale, Oscar Hernandez and Ada Sedova. Addressing Load Imbalance in Bioinformatics and Biomedical Applications: Efficient Scheduling across Multiple GPUs.   In Proceedings of 12th International Workshop on High Performance Bioinformatics and Biomedicine. December 9th, 2021. Houston, Texas (virtual).   [paper] [code]

Raul Torres, Vivek Kale, Abid Malik, Tom Scogland, Roger Ferrer and Barbara M. Chapman. Support in OpenMP for Multi-GPU Parallelism.   The International Conference for High Performance Computing Networking, Storage, and Analysis. Extended Abstract and Poster. November 19, 2021. St. Louis, Missouri, USA.   [paper] [code]

Seonmyeong Bak, Colleen Bertoni, Swen Boehm, Reuben Budiardja, Barbara M. Chapman, Johannes Doerfert, Markus Eisenbach, Hal Finkel, Oscar Hernandez, Joseph Huber, Shintaro Iwasaki, Vivek Kale, Paul R.C. Kent, JaeHyuk Kwack, Meifeng Lin, Piotr Luszczek, Ye Luo, Buu Pham and P.K. Yeung. OpenMP Application Experiences: Porting to Accelerated Nodes.   In Journal of Parallel Computing. October 23rd, 2021.   [paper]

Vivek Kale, Wenbin Lu, Anthony Curtis, Abid Malik, Barbara Chapman and Oscar Hernandez. Toward Supporting MultiGPU targets via taskloop and User-defined Schedules.   Proceedings of the 2020 International Workshop of OpenMP. September 23-25, 2020. Austin, USA. (virtual)   [paper] [code]

Jonas H. Muller Korndorfer, Florina Ciorba, Christian Iwainsky, Johannes Doerfert, Hal Finkel, Vivek Kale and Michael Klemm. A Runtime Approach for Dynamic Load Balancing of OpenMP Parallel Loops in LLVM.   Extended Abstract (Poster). Proceedings of the 2019 ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis.   [paper]

Vivek Kale, Christian Iwainsky, Michael Klemm, Jonas H. Muller Korndorfer and Florina Ciorba. Toward a Standard Interface for User-defined Scheduling in OpenMP. Fifteenth International Workshop on OpenMP. September 2019. Auckland, New Zealand.   [paper]

Vivek Kale and Oscar Hernandez. User-defined Schedules in OpenMP for Improved Performance Portability. Department of Energy Performance, Portability, and Productivity Workshop. Poster. April 2019. Denver, USA.  [paper] [code]

Vivek Kale and Martin Kong. Locality-aware Loop Scheduling Strategies in OpenMP.   Extended Abstract. OpenMPCon 2018. September 2018. Barcelona, Spain.   [paper]

Vivek Kale, Harshitha Menon, Karthik Senthil. Adaptive Loop Scheduling with Charm++ to Improve Performance of Scientific Applications. SC 2017 Poster. Denver, USA. November 2017. (Selected as a Candidate for the Best Poster) [pdf]

Vivek Kale and William D. Gropp. A User-defined Schedule for OpenMP. OpenMPCon 2017.  September 2017. New York, USA. [paper]

Vivek Kale and William D. Gropp. Composing Low-overhead Scheduling Strategies for Improving Performance of Scientific Applications. IWOMP 2015. October 2015. Aachen, Germany. [paper]

Simplice Donfack, Laura Grigori, William D. Gropp, Vivek Kale. Hybrid Static/Dynamic Scheduling for Already Optimized Dense Matrix Factorization. IPDPS 2012. May 2012. Shanghai, China. [paper]

Vivek Kale, Todd Gamblin, Torsten Hoefler, Bronis R. de Supinski, William D. Gropp. Slack-conscious Lightweight Loop Scheduling for Scaling Past the Noise Amplification Problem. SC 2012 Poster. November 2012. Salt Lake City, USA.  [pdf]

Torsten Hoefler, James Dinan, Darius Buntinas, Pavan Balaji, Brian Barrett, Ron Brightwell, William Gropp, Vivek Kale, Rajeev Thakur. MPI+MPI: A New Hybrid Approach to Parallel Programming with MPI Plus Shared Memory. EuroMPI 2012. September 2012. Madrid, Spain. [paper]

Vivek Kale and William Gropp. Load Balancing Regular Meshes on SMPs with MPI. EuroMPI 2010. September 2010. Stuttgart, Germany. (Selected as a Best Paper) [pdf]

Vivek Kale and Edgar Solomonik. Parallel Sorting Pattern. ParaPLoP 2010. March 2010. Carefree, USA. [pdf]

This page was last updated on October 17, 2024.