Job description

IT

Site Reliability Expert (SRE)

Quebec

Simons Campus - IT

Full time

Are you looking to join our Information Technology team in a unique role that contributes to the optimal maintenance of our production environment? Join the Simons family as a Site Reliability Engineer (SRE).

The person in this role plays a key part in ensuring the smooth operation of our production environment by adopting a proactive, software-engineering-oriented approach. Reporting to the Director of Solution Architecture and Software Engineering, the SRE is responsible for ensuring the continuous availability of large-scale distributed software applications while maintaining high levels of performance and reliability. 


Key Responsibilities:

  • Provide primary operational support for multiple large-scale distributed software applications.
  • Collect and analyze metrics from operating systems and applications to support performance optimization and incident troubleshooting.
  • Measure and optimize system performance.
  • Deliver infrastructure services using Infrastructure as Code (IaC).
  • Maintain services that use the Operator Framework.
  • Maintain and enhance continuous integration and continuous deployment (CI/CD) tools using ArgoCD and GitHub Actions.
  • Automate IT operations tasks using Ansible.
  • Participate in system design consultations, platform management, and capacity planning.
  • Balance feature development velocity and reliability with well-defined service-level objectives.
  • Collaborate with development teams to improve services through rigorous testing procedures.
  • Build sustainable systems and services through automation and continuous improvement.
  • Develop software and systems to manage platform infrastructure and applications. 


Desired Profile:

  • Bachelor’s degree in computer science, software engineering, IT engineering, electrical engineering, or any other relevant field.
  • At least two (2) years of experience in a role related to DevOps, SRE, platform engineering, or software engineering.
  • Experience with Kubernetes, preferably Red Hat OpenShift.
  • Experience with full-stack observability platforms such as Datadog or New Relic.
  • Practical coding knowledge beyond simple scripting.
  • Strong understanding of cloud-native approaches.
  • Advanced programming skills (structured and object-oriented) in one or more high-level languages such as Java, Python, C/C++, Go, or JavaScript.
  • Proactive approach to identifying issues, performance bottlenecks, and areas for improvement.
  • Strong teamwork abilities and communication skills to work effectively with diverse stakeholders in a constantly evolving environment.
  • Ability to communicate effectively in both French and English, spoken and written; English is required to use systems and tools and to carry out various tasks.


Benefits Available:  

  • Telemedicine service and an Employee and Family Assistance Program.
  • Group insurance plan and RRSP.  
  • Up to 40% off Simons purchases.  
  • Fitness area with changing rooms, group classes, and kinesiology services. 
  • Cafeteria service offering an extensive and affordable menu.