Senior Site Reliability Engineer

Posted 09 January 2023
Location: United Kingdom
Reference: 9485
Contact: Barry Twohig

Job description

Work with us in creating our groundbreaking AI cyber-security platform! This new platform is being built from the ground up using new research into machine learning and a reactive distributed architecture. You will work with the senior engineering team to deploy, troubleshoot, monitor and scale our services, along with the infrastructure that manages the models used to classify unstructured data and protect customers' sensitive and critical company information.

What You'll Do

- Automate build, test, deploy and release management processes
- Implement CI/CD according to GitOps best practices
- Set up and scale our infrastructure using IaC (Infrastructure as Code)
- Build monitoring that alerts on symptoms rather than on outages
- Work with software developers and software engineers to ensure that development follows established processes and works as intended
- Document every action so your findings turn into repeatable actions and then into automation
- Participate in scrum meetings, stand-ups and sprint planning
- Plan the growth of our infrastructure
- Improve operational processes (such as deployments and upgrades) to make them as boring as possible
- Debug production issues across services and levels of the stack

What You'll Need

- Experience with GitOps best practices
- Experience with AWS services such as EC2, ECR, EKS and S3
- Experience with tools such as Rancher, K3s, Kubernetes, Terraform and Helm, or similar technologies
- Experience with modern monitoring tools such as Prometheus, Grafana, ELK and Loki
- Strong programming skills: Shell, and Python and/or Go
- Knowledge of messaging services such as Kafka
- Decision-making, research and problem-solving skills
- Time management, with the ability to organise, assess and prioritise multiple tasks, projects and demands
- Experience working in an agile scrum team and strong interpersonal communication skills
- Think about systems: edge cases, failure modes, behaviours, specific implementations
- Know your way around Linux and the Unix shell
- Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it

Bonus points

- Experience designing/configuring highly scalable on-premise solutions
- Experience working on security-related software solutions
- Experience securing systems against cybersecurity threats (DevSecOps)