Azure Site Reliability Engineer
Start Date: 19/05/2025
End Date: 31/03/2026
Location: Brussels
Regime: Full-time
Application Deadline: 23/04
Description:
We are seeking an experienced Azure Site Reliability Engineer to join our Engineering chapter team. You will play a critical role in ensuring the reliability, scalability, monitoring, and performance of our cloud-based services in the Consumer Centricity product organization. Your responsibilities will include designing and managing our infrastructure and implementing best practices. You will work within cross-functional teams to improve systems and processes and to ensure uptime and efficiency.
Key Responsibilities:
- Automation and CI/CD: Design, create, and maintain automation frameworks for deploying, scaling, and managing production environments.
- System Monitoring and Maintenance: Implement and manage monitoring tools to ensure system health and performance. Proactively identify and fix issues before they impact users.
- Incident Management: Respond to and resolve incidents in a timely manner, perform root cause analysis, and implement measures to prevent recurrence.
- Performance Optimization: Analyze system performance and implement improvements to ensure scalability and efficiency.
- Capacity Planning: Conduct capacity planning assessments to predict system needs and ensure resources are in place to handle growth.
- Collaboration: Work closely with development teams to integrate system reliability into the development lifecycle through continuous integration and deployment practices.
- Documentation: Create and maintain comprehensive documentation related to systems architecture, configuration, and operational procedures.
- Tool Development: Develop and maintain internal tools to streamline processes and improve system reliability.
- Security: Ensure that security controls are implemented, monitored, and maintained across all systems.
- Service Level Objectives (SLOs): Define and track SLOs to ensure reliability metrics meet business requirements.
- On-call Support: Participate in on-call rotations to provide 24/7 support for critical systems and infrastructure.
Requirements:
- Experience: Minimum of 5 years in a Site Reliability Engineer or DevOps role with extensive experience in Microsoft Azure.
- Proficiency in scripting languages and command-line tooling (Python, PowerShell, Azure CLI).
- Experience with containerization technologies (Docker, Kubernetes).
- Proficiency in Azure Cloud services (VMs, Storage, Networking, etc.).
- Experience in Infrastructure as Code (IaC) tools such as Terraform, ARM templates, or Bicep to automate secure provisioning and configuration of Azure resources.
- Strong experience with monitoring, logging, and alerting tools such as Azure Monitor, Application Insights, or Log Analytics, and with third-party solutions like Grafana, Splunk, or Elastic Stack.
- Experience in Azure governance and cost management using Azure Cost Management, Azure Policy, and management groups.
- Strong understanding of cloud networking, hybrid cloud, and virtual networking concepts (e.g., VPNs, subnets, NSGs, load balancing, hub & spoke).
- Fluent in English.
Not a must, but advantageous:
- Microsoft Azure certifications, such as Azure Solutions Architect Expert or DevOps Engineer Expert.
- Experience with the following technologies: Kong, Event Hubs, Dapr.
- Additional languages: French (B1), Dutch (B1).