Cloud Operations Engineer at Nexthink
Boston, MA, US
Nexthink is a global leader in Digital Employee Experience. Our product allows enterprises to create highly productive digital workplaces for their employees by delivering optimal end-user experience. Through a unique combination of real-time analytics, automation and employee feedback across all endpoints, Nexthink helps IT teams meet the needs of the modern digital workplace.
Headquartered in Switzerland, Nexthink also has offices in France, UK, Germany, Spain, UAE, Saudi Arabia, Australia and the US. Our growing team of Nexthinkers is proud to be making the digital work lives of seven million employees across 1,000 customers more productive.
Thanks to our fantastic growth, we are looking for new rock stars!
Nexthink is looking for passionate and innovative professionals who are keen to join a newly formed and fast-growing Cloud Operations team in Boston. The team is being built to ensure that our Cloud platform is operated using best-in-class methodologies and tools, allowing us to delight our clients with the best cloud experience.
The team is responsible for maintaining our Cloud solutions with top performance, availability and service levels, and for ensuring that they run cost-efficiently. The Cloud Operations Engineer will also use her/his software engineering skills to prototype and deliver tools and products that help reach those goals, and will participate in the operational requirements process.
Finally, you will be part of a fast-growing, international company, with the opportunity to join the Cloud team, a strategic initiative that will help accelerate this growth.
We are interested in every qualified candidate who is eligible to work in the United States. However, we are not able to sponsor visas.
- Monitoring: Use and own the specifications of our tooling related to monitoring, telemetry, reliability and automation for end-to-end service.
- Incident management and response: Detect, diagnose and fix incidents, finding solutions (rollback, backup restore, etc.) that meet the required service levels. Own the post-mortem process for such incidents, writing technical content for both customers and internal stakeholders.
- Operations: Define or build automation mechanisms for cloud operations: build, deploy, update, patch, back up, restore, scale, extend, protect, etc. Use past experience to proactively solve the most relevant issues, either by writing product or platform specifications or by building the automation required to prevent those issues from resurfacing.
- Change control: Own the product update process for live client instances.
- Reliability: Manage the availability of the production instances of our cloud services. Understand and be able to communicate the scale, capacity, security, redundancy and performance attributes and requirements of the cloud services.
- Subject matter expert: Be the ultimate escalation point for major platform-related incidents.
- Minimum 5 years of experience in software development, with knowledge of best practices for professional software development, deployment and lifecycle management in general.
- Experience with monitoring solutions such as Azure Analytics, Grafana and others.
- Experience administering and deploying on cloud-based platforms (Azure, AWS, Google Cloud and/or others), using infrastructure as code (CloudFormation, Terraform, etc.), configuration management tools (Ansible, Puppet) and pipeline creation tools (such as Jenkins).
- Solid understanding of the network stack (TCP/IP, VPN, HTTP, SSL, routing, etc.), cloud topologies (VPC, virtual subnets, NACLs, NSG, ILB, ELB, etc.) and storage (S3, EBS, Azure Files, etc.).
- At ease operating and managing production systems, solving issues while striking the right balance between urgency and methodology.
- Strong problem solving and analytical skills
- Experience coordinating teams and individuals to maintain an SLA.
- Excellent written and verbal communication skills in English.