
Principal Performance Engineer

Cornelis Networks
๐Ÿ“ USA ๐Ÿ’ผ full_time ๐Ÿ’ฐ competitive compensation package including equity and cash
Apply Now ๐Ÿ“… 16 hours ago
๐Ÿง  python

Job Description

Join Cornelis Networks: Pioneer the Future of AI and HPC Networking as a Principal Performance Engineer

Are you a seasoned performance engineer passionate about pushing the limits of high-performance computing (HPC) and artificial intelligence (AI)? Cornelis Networks is seeking a driven and innovative Principal Performance Engineer to spearhead end-to-end performance optimization for our next-generation networking solutions. As a key member of our team, you’ll play a pivotal role in shaping the future of AI and HPC infrastructure, enabling breakthroughs across diverse industries.

About Cornelis Networks

Cornelis Networks is a leading provider of cutting-edge, scalable networking solutions designed for AI and HPC datacenters. Our unique architecture seamlessly integrates hardware, software, and system-level technologies to maximize the efficiency of GPU, CPU, and accelerator-based compute clusters, regardless of scale. We empower our customers to tackle the world’s most demanding computational challenges, driving innovation in areas like cloud computing, autonomous systems, aerospace and defense, manufacturing, life sciences, and climate research. Backed by top-tier venture capital and strategic investors, we are committed to performance, innovation, and scalability.

The Opportunity

As a Principal Performance Engineer, you will be an individual contributor with technical leadership scope, driving performance strategy and leading investigations across the networking stack. Your work will directly impact the performance and scalability of our products, enabling our customers to achieve unprecedented results. This is a highly visible role with the opportunity to work on cutting-edge technologies and collaborate with leading experts in the field.

Key Responsibilities

  • Performance Ownership: Lead pre- and post-launch performance validation, debugging, and optimization for adapters, switches, and fabric software, from lab environments to large-scale production deployments.
  • Post-Silicon Leadership: Drive performance optimization and characterization for networking ASICs and end-products, correlating pre-silicon models with silicon performance.
  • Customer Enablement: Provide white-glove customer support, reproduce field issues, co-debug in shared/onsite labs, and deliver solutions through mitigations, fixes, and per-customer tuning guides.
  • Workload Innovation: Research and enable forward-looking workloads, including AI inference, distributed AI training, and traditional HPC applications.
  • Multi-Fabric Expertise: Evaluate and tune Cornelis/Omni-Path, Ethernet/RoCEv2, and InfiniBand across diverse topologies, routing protocols, and congestion control mechanisms.
  • Platform Optimization: Explore platform designs and tunings end-to-end, including CPU/GPU NUMA placement, PCIe/GPU-Direct, BIOS/firmware, switch/NIC QoS & scheduling, and more.
  • Experiment Design: Create credible experiments, synthesize representative traffic, replay workload traces, and conduct on-cluster A/B tests with statistically sound comparisons.

Required Qualifications

  • 10+ years of experience in performance engineering, post-silicon validation, or systems performance for high-speed networking or HPC/AI products.
  • Deep expertise in post-silicon bring-up and performance validation of networking ASICs/systems.
  • Proven ability to debug networking hardware and software for performance tuning and issue resolution in production-scale deployments.
  • Hands-on experience with multiple fabric technologies, including Cornelis/Omni-Path, Ethernet/RoCEv2, and/or InfiniBand.
  • Strong understanding of AI/HPC workloads, including NCCL/RCCL collectives, UCX/libfabric/MPI, and optimization techniques for training and inference.
  • Expertise in experimentation and analysis, including workload modeling, on-cluster A/B tests, and tail-latency analysis.
  • Proficiency in automation using Python and Linux, with experience in data pipelines, dashboards, and CI hooks.
  • Excellent cross-functional communication skills and the ability to lead without authority.
  • BS/MS in CE/EE/CS or equivalent experience.

Preferred Qualifications

  • Experience supporting customer-facing performance optimization or field application engineering.
  • Experience building or leading a white-glove performance support program.
  • Familiarity with inference stacks such as NVIDIA Triton, TensorRT-LLM, and vLLM.
  • Background in benchmarking, including MLPerf exposure and HPC application tuning.
  • Contributions to UCX, libfabric, NCCL/RCCL, or kernel networking.
  • Deep understanding of networking and memory data flows.

Location and Benefits

This role supports remote work for employees residing within the United States, with occasional travel to our Chesterbrook Corporate Center in Wayne, PA. We offer a competitive compensation package, including equity, cash, and incentives, as well as comprehensive health and retirement benefits.

Join Our Team

At Cornelis Networks, you’ll find a dynamic and flexible work environment where you can collaborate with some of the most influential names in the semiconductor industry. We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
