Caplin is the market leader in web trading technology and single-dealer platforms. Its award-winning software enables its customers to build high-performance web trading apps and to deliver real-time information, including live prices, securely and reliably to those apps.
We have an immediate vacancy for a Site Reliability Engineer who can help us in our move into the cloud. This is an outstanding opportunity for someone who wants to bring their skills to an organisation at the leading edge of web-based trading technology, as it begins its journey towards offering fully managed services to its customers.
Caplin’s head office is in London, and we are currently working remotely.
As a Site Reliability Engineer you will work within our Managed Services team to implement and launch our first fully managed deployments in AWS and ensure high levels of security, availability and observability.
Once the deployments are live, the managed services, support and engineering teams will work together to monitor the system 24 hours a day and respond immediately to outages and issues. When not resolving production issues, the team will be responsible for the continuous improvement of the system as a whole.
Key responsibilities:
- Assist with the design and implementation of our cloud offering, from continuous integration pipelines through to AWS configuration and deployments
- Work with the support team to quickly respond to production events
- Continuously improve our ability to detect, resolve and avoid issues in the production environment, either by using third-party software or by building custom tools where necessary
- Ensure the deployment adheres to the highest security standards in collaboration with the Information Security Manager
- Take part in our on-call, out-of-hours rota together with members of the support, engineering and management teams
You should apply if you have the following skills and experience:
- Proven track record of delivering managed services in a cloud environment
- Hands-on experience configuring and managing AWS (Amazon Web Services), making use of services such as Fargate, CloudWatch, ELB, IAM and VPC
- Experience with observability tooling such as Prometheus, Grafana or the ELK stack
- Experience managing and monitoring systems and responding to production alerts
- Experience with Docker and container orchestration
- Comfortable working with Linux distributions and the command line
- Strong communication skills
- Detail-oriented
- Proactive and self-improving
Other desirable skills:
- AWS certifications at the Foundational, Associate, Professional or Specialty tiers
- Kubernetes certifications such as Certified Kubernetes Application Developer or Certified Kubernetes Administrator
- Development experience with web technologies and the ability to create custom tooling as required
- Experience with Terraform or other infrastructure-as-code tools
- Experience deploying services on Kubernetes
- Understanding of, and experience implementing, public cloud best practices
- Knowledge of data protection regulations, including GDPR
- Experience of working with regulated financial institutions