AI Research Computing Infrastructure Engineer

Company: Frederick National Laboratory for Cancer Research
Location: Frederick
Posted on: February 22, 2026

Job Description:

Job ID: req4426
Employee Type: exempt full-time
Division: Enterprise Information Technology
Facility: Frederick: Ft Detrick
Location: PO Box B, Frederick, MD 21702 USA

The Frederick National Laboratory is operated by Leidos Biomedical Research, Inc. The lab addresses some of the most urgent and intractable problems in the biomedical sciences in cancer and AIDS, drug development and first-in-human clinical trials, applications of nanotechnology in medicine, and rapid response to emerging threats of infectious diseases.

Accountability, Compassion, Collaboration, Dedication, Integrity and Versatility; it's the FNL way.

PROGRAM DESCRIPTION

The mission of Enterprise Information Technology (EIT) is to develop an enterprise-level, consolidated information technology infrastructure that provides exceptional IT capabilities to the Frederick National Labs for Cancer Research (NCI-Frederick/FNLCR) in support of basic, translational, and clinical cancer and AIDS research.

The IT Operations Group (ITOG) is a part of Enterprise Information Technology (EIT) within Leidos Biomedical Research, Inc. ITOG is responsible for computational servers, storage servers, virtual machine infrastructure, and the FNLCR network. ITOG focuses on implementing enterprise IT best practices in the areas of computational services, storage, backup, and archiving; batch and application support; server consolidation and virtualization; network infrastructure; unification of voice, teleconferencing, and video communication technologies; and improved infrastructure for collocation of dedicated servers.

KEY ROLES/RESPONSIBILITIES:

The Research Computing Infrastructure Engineer will design, build, and operate next-generation high-performance computing (HPC) environments that support container-based workflows and GPU-accelerated research computing. The position will play a key role in evaluating, implementing, and maintaining scalable and secure computing architectures for advanced data analysis, AI/ML model training, and simulation workloads. The engineer will collaborate closely with researchers, IT professionals, and external partners to translate scientific requirements into reliable, high-performance computing solutions.

Design and implement next-generation high-performance computing (HPC) environments that leverage container-driven workflows for GPU-accelerated research.
Build and maintain container orchestration systems for batch and distributed workloads.
Integrate containerized job workflows with existing HPC schedulers and storage systems.
Develop and maintain job templates for batch GPU training and multi-node distributed computing.
Automate deployment, configuration, and scaling through infrastructure-as-code and CI/CD practices.
Monitor, benchmark, and optimize system performance, reliability, and resource utilization.
Collaborate with researchers to containerize and optimize legacy workflows for scalable execution.
Lead evaluation of emerging tools (e.g., Prefect, Ray, Airflow, Dagster) for workflow orchestration and distributed computing.
Contribute to the development of tools and bridges between orchestration frameworks and traditional HPC environments.

BASIC QUALIFICATIONS

To be considered for this position, you must minimally meet the knowledge, skills, and abilities listed below:

Possession of a Bachelor’s degree from an accredited college/university according to the Council for Higher Education Accreditation (CHEA) or four (4) years relevant experience in lieu of degree. Foreign degrees must be evaluated for U.S. equivalency.
In addition to the education requirement, a minimum of eight (8) years of related experience.
Strong Linux systems engineering and administration experience.
Hands-on experience with container orchestration tools such as Kubernetes, Nomad, Run:AI, etc.
Hands-on scripting/programming experience (Python, Bash, or Go) for automation, monitoring, and job orchestration.
Experience with infrastructure-as-code / automation tooling (Terraform, Ansible, Packer, or equivalent).
Familiarity with system performance analysis, monitoring, and tuning.
Comfortable with small-team environments and taking end-to-end ownership of compute infrastructure.
Ability to obtain and maintain a security clearance.

PREFERRED QUALIFICATIONS

Candidates with these desired skills will be given preferential consideration:

Experience with multi-node distributed ML frameworks (PyTorch DDP, Ray, Horovod, TensorFlow, etc.).
Familiarity with pipeline orchestration tools (Prefect, Airflow, Dagster, Kubeflow).
Understanding of resource management and scheduling concepts (queues, allocations, GPU device plugins, gang scheduling, multi-node coordination).
Understanding of storage integration with high-performance clusters (POSIX object storage, VAST or similar).
Familiarity with cloud GPU environments (AWS, GCP, Azure) and hybrid workflows.
Familiarity with workflow orchestration/pipeline tools (Argo, Kubeflow, Ray, MLFlow).
Good communication and documentation skills, including the ability to make complex infrastructure understandable to researchers and other engineers.

EXPECTED COMPETENCIES:

Expertise in Kubernetes, Nomad, or equivalent container orchestration systems for large-scale computing.
Deep knowledge of Linux systems administration, performance tuning, and automation.
Ability to translate research computing needs into scalable, reliable infrastructure designs.
Commitment to documentation, reproducibility, and open science principles.
Collaborative mindset and willingness to mentor peers in containerization and HPC best practices.

Commitment to Non-Discrimination

All qualified applicants will receive consideration for employment without regard to sex, race, ethnicity, color, age, national origin, citizenship, religion, physical or mental disability, medical condition, genetic information, pregnancy, family structure, marital status, ancestry, domestic partner status, sexual orientation, gender identity or expression, veteran or military status, or any other basis prohibited by law. Leidos will also consider for employment qualified applicants with criminal histories consistent with relevant laws.

Pay and Benefits

Pay and benefits are fundamental to any career decision. That's why we craft compensation packages that reflect the importance of the work we do for our customers. Employment benefits include competitive compensation, Health and Wellness programs, Income Protection, Paid Leave and Retirement.

Pay range: 123,800.00 - 207,125.00 USD

The posted pay range for this job is a general guideline and not a guarantee of compensation or salary. Additional factors considered in extending an offer include, but are not limited to, responsibilities of the job, education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data. The salary range posted is a full-time equivalent salary and will vary depending on scheduled hours for part-time positions.



