Infrastructure & DevOps Engineer | Italy (Remote/Hybrid)

A leading European innovator in high-performance computing and AI infrastructure is seeking an Infrastructure & DevOps Engineer to architect the foundation of its compute capabilities. The company operates at the cutting edge of multi-GPU cluster management, bridging the gap between sophisticated hardware provisioning and seamless cloud-native orchestration.

The Role

The focus of this position is the end-to-end reliability of a heterogeneous compute environment. The engineer will be responsible for making large-scale deployments reproducible and for ensuring that developers have frictionless access to high-performance resources.

Key Responsibilities:

* Provision and maintain high-performance CPU/GPU clusters across multiple physical locations.
* Implement dynamic compute and storage scaling to meet fluctuating workload demands.
* Design hardware- and software-level storage solutions, including distributed filesystems and storage tiering.
* Manage container orchestration through Kubernetes and Docker for both production and R&D workloads.
* Develop infrastructure as code (IaC) using Terraform and Ansible.
* Optimise job scheduling and resource allocation via Slurm and Kubernetes.
* Establish robust observability using Prometheus, Grafana, and IPMI.
* Conduct system-level performance profiling, focusing on GPU utilisation and I/O throughput.
* Oversee secure networking, VPN management, and disaster recovery protocols.

Technical Profile

The ideal candidate brings a deep understanding of Linux system administration and of the unique challenges of managing bare-metal and virtualised hardware.

Essential Experience:

* Advanced Linux administration and a solid grasp of networking principles.
* Proven expertise with Docker and Kubernetes orchestration.
* Hands-on experience with IaC tools (Terraform or Ansible).
* Background in HPC environments and job scheduling via Slurm.
* Experience in hardware infrastructure management (IPMI, BMC) and server maintenance.
* Ability to design storage systems based on network or distributed filesystems such as NFS or Ceph.

Preferred Skills:

* Familiarity with bare-metal provisioning tools such as MAAS.
* Experience working in cloud service environments (AWS or similar).

Why Join?

The company offers the opportunity to work on highly complex, large-scale infrastructure projects that directly power the next generation of AI development. This is a chance to move beyond standard cloud DevOps and dive into the intricacies of hardware-level performance and global compute distribution.

Interested? Apply directly through LinkedIn, or send your CV to [email protected].

By applying to this role you understand that we may collect your personal data and store and process it on our systems. For more information, please see our Privacy Notice (https://eu-recruit.com/about-us/privacy-notice/).
Infrastructure / DevOps Engineer
EUROPEAN TECH RECRUIT
Rome, Lazio
Posted 24 days ago