Site Reliability Engineer (SRE) - Monitoring
Full time, Permanent, Hybrid Job in Sofia, Bulgaria
Remote IT World helps tech and blockchain professionals get hired for 100% remote jobs.
We are a first-choice staffing partner of high-growth startups and scale-ups worldwide.
Ready to embrace freedom and flexibility?
We’re building a new technical support / SRE monitoring team and are looking to hire 6 smart, reliable, and ambitious people to join as
Site Reliability Engineer - Monitoring
Join an innovative, dynamic software company based in Sofia, Bulgaria. We provide B2B services to airlines, passenger service systems, and a variety of travel companies, and we are very good at it. Our solutions are the most technologically advanced in the travel technology market, and our clients are major global companies.
The company culture promotes innovation, initiative, and streamlined communication and decision-making. If you have great ideas, you will have the opportunity to research them, get them approved, and implement them quickly.
We are seeking a few highly motivated Site Reliability Engineers (SREs) to join our team. As an SRE - Monitoring, you will work closely with all members of the Infra team to ensure that our systems are monitored and meet our Service Level Agreements (SLAs).
Responsibilities
- Assist in the development and maintenance of monitoring systems to track our systems' health and performance.
- Work with development teams to ensure that applications are designed with monitoring in mind.
- Help build and maintain dashboards that provide real-time visibility into system performance and availability.
- Respond to alerts and incidents, troubleshoot issues, and work with cross-functional teams to resolve them.
- Analyze trends in system performance and proactively identify potential issues before they occur.
- Assist in developing and maintaining Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for our systems.
- Monitor and report on our SLA compliance, and work with teams to identify areas for improvement.
- Develop and maintain runbooks and documentation for incident response and resolution.
- Participate in on-call rotations to ensure 24/7 availability of our systems.
- Work closely with Senior SREs to continuously evaluate and improve our monitoring and incident response processes.
Requirements
- Hands-on skills in a Linux environment.
- Knowledge of monitoring tools, e.g. Prometheus and Grafana.
- Understanding of networking concepts, e.g. TCP/IP and/or DNS.
- Experience with automation and scripting languages such as Python or Bash is considered a plus.
- Previous commercial experience in system administration and/or technical support is considered a big plus.
You will be an ideal candidate if you have:
- Familiarity with cloud technologies
- Ansible experience
- Understanding of Git
- Knowledge of container orchestrators, e.g. Kubernetes
Skills and Attributes
- Proactive attitude and responsible personality
- Excellent communication and collaboration skills.
- Ability to resolve incidents while directly working with clients.
- Willingness to work in shifts.
- Advanced spoken and written English.
What We Offer
- Opportunity to expand knowledge and skills in the DevOps methodology
- Chance to be among the first team members
- Attractive compensation package
- Company provided equipment
- Private health insurance
- Access to Multisport card
Interview Process
- Preliminary interview with HR
- Technical Interview with the Monitoring Team Lead
- Final round with the C-level
If you are passionate about monitoring and meeting SLAs, and have the skills and experience to excel in this role, we would love to hear from you. Apply today and join our team of dedicated SREs!
Only shortlisted candidates for the Site Reliability Engineer - Monitoring role will be contacted.
Your job search is strictly confidential.