Senior HPC ML Applications Engineer - GPU & CPU
Location: Austin, Texas, US
Company: Advanced Micro Devices
What you do at AMD changes everything
We care deeply about transforming lives with AMD technology to enrich our industry, our communities and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence, while being direct, humble, collaborative and inclusive of diverse perspectives. This is who we are at our best. One Company. One Team.
AMD together we advance_
The Role:
We are looking for our next team member to join our growing HPC Data Center GPU (DCGPU) team. You will enable and optimize HPC applications and provide performance and systems expertise to our internal partners and customers, from before first silicon through production, for systems and solutions based on our EPYC processors and Instinct (MI) GPU accelerators.
This is a ‘hands-on’ role in which you work independently and with other AMD engineers to tackle technical HPC functional and performance issues, collaborating with our customer-facing organizations, our internal R&D and other key engineering groups. You will work across a variety of partners on the bring-up, design, debug and performance of the world’s largest HPC systems, making a significant impact at a global level, including work with the ‘Mega Datacenters’ and HPC cloud providers. You will also help grow the success and market penetration of the AMD GPU as it applies to HPC.
The Person:
Very strong solution-oriented mindset
Expertise in HPC application performance testing and debug on CPU and/or GPU
Strong technical ownership and ability to lead technical relationships with both customers and HPC partners
Ability to independently prioritize opportunities to deliver results on time
Proven success establishing relationships internally and across a network of customers and partners
Excellent verbal and written communication skills
Key Responsibilities:
Seek maximum HPC performance while achieving the highest quality on AMD EPYC plus Instinct systems through a combination of performance optimization, HPC workload debug and characterization, compilers, math libraries and lower-level AMD-internal toolsets
Feed back performance bottlenecks and functional issues to the relevant engineering groups during bring-up to improve quality and performance
Partner with our internal development and validation teams, supporting them with a deeper level of HPC application and system-level expertise
Attend and lead high-value technical HPC discussions to present the general AMD GPU proposition and its application to HPC
Technically own and resolve customer and partner issues, submitting JIRA tickets and driving them to resolution
Collaborate on future architectures, functional validation and performance testing
Attend internal working groups to resolve engineering issues; contribute to the debug and testing of unreleased GPU-based solutions and their readiness for HPC workloads
Document and publish system health and performance results, as well as the procedures you have generated and their automation
Preferred Experience:
Proven HPC application experience balanced with partner or customer-facing experience
HPC application functional bring-up and triage, performance profiling and monitoring tools, and software performance optimization
Expertise building large codes from source with appropriately linked math libraries and optimized compiler flags, working with different compilers and MPI libraries
Understanding of system-level hardware and the effect of its configuration on performance, such as InfiniBand and shared parallel filesystems
Proven understanding of baseline testing of synthetic codes: HPL, STREAM, DGEMM, HPCG, HPCC
Linux administration; understanding setup for HPC middleware
Nice to Haves:
Experience working on very large codes, such as weather codes, and the associated tuning for greater scalability
Any experience understanding/inspecting/writing assembly
Understanding of memory and cache hierarchy and methods to query performance/latency at each level
Understanding of HPC dataflow down to the register level
Location:
Austin, Texas
Requisition Number: 175741
Country: United States
State: Texas
City: Austin
Job Function: Design
Benefits offered are described here.
AMD does not accept unsolicited resumes from headhunters, recruitment agencies or fee based recruitment services. AMD and its subsidiaries are equal opportunity employers. We consider candidates regardless of age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status. Please click here for more information.