Site Reliability Engineer

Technology
Malta

We are looking for Site Reliability Engineers to ensure that our systems and services are resilient, scalable and efficient, with stellar uptime records. As a Site Reliability Engineer at Catena Media, you are part software engineer, part systems engineer and part efficiency expert, with a keen eye for scalable solutions. You are passionate about getting code into production that brings efficiency and quality gains to our technology ecosystem.

SCOPE OF THE ROLE

  • As an SRE, your focus will be to ensure that our systems are reliable, scalable, secure and monitored, and that support operations are automated. This role bridges the gap between Software Developers and Tech Ops staff and requires you to be an excellent communicator.

YOUR CHALLENGE

  • You build and maintain a distributed infrastructure which is resilient, secure and scalable
  • You mentor and coach other members of your team
  • You create automatic problem detection and resolution procedures
  • You are keen to troubleshoot issues across the entire stack (hardware, software, application and network), whether it runs on physical hardware or in cloud-based environments
  • You work closely with the R&D teams to make sure changes are continuously built, tested and deployed
  • You strive to drive standardization efforts across multiple disciplines and services
  • You identify and drive opportunities to improve automation for the company to aid in scaling and security
  • You manage the timely resolution of all critical and/or complex problems, meeting SLA requirements
  • You collaborate with stakeholders to serve, observe, own and solve problems through innovation, making production deployments more efficient and reducing friction around them, with the ultimate goal of sustaining or increasing our availability
  • You develop, configure and optimize service and application monitoring
  • You participate in an on-call rotation as a 2nd line escalation from S/NOC

TO DO IT, YOU WILL NEED

  • You are keen on efficiency and take performance & scalability very seriously
  • You are able to quickly assess and address problems, escalating as necessary
  • You are passionate about automation and have experience with configuration management tools such as Ansible, Salt or Puppet
  • You have experience with containerization technologies such as Docker and orchestration technologies such as Kubernetes
  • You have experience with cloud platforms such as AWS, Azure or GCP
  • You are comfortable with CI/CD pipelines and have managed integrated workflows using tooling such as Jenkins and Git
  • You have profound experience with Linux-based operating systems & LAMP/LEMP stacks
  • You are comfortable with the TCP & UDP networking stacks
  • You are enthusiastic about getting optimum performance out of the HTTP protocol
  • You are comfortable with monitoring and alerting tools such as OP5, Nagios, PRTG or Zabbix, or APMs such as Instana, Dynatrace, Prometheus, Grafana and PagerDuty, and you treat monitoring as a first-class citizen
  • Ideally, you have database administration experience (MySQL, MongoDB)
  • You have experience in one or more of the following: Python, Go, Perl, Ruby or shell scripting