About the Company: We are seeking an experienced Manager or Director of Production Operations to manage the Production Operations team and to oversee and improve monitoring, platform performance, infrastructure, systems architecture, and 24x7 support. This role requires rock-solid systems administration experience and a hands-on attitude.
Job Description:
- Build and maintain a state-of-the-art monitoring and alerting system
- Monitor stability and performance of the production systems
- Troubleshoot issues with applications, OS, hardware, and the network
- Plan capacity for datacenter migration and expansion
- Configure new servers, virtual servers, and network equipment
- Automate processes and monitoring that are currently done manually
- Establish, document, and maintain processes and policies
- Maintain and improve the current change control process
- Manage and participate in 24x7 coverage and on-call troubleshooting
- Prioritize tasks for the team while coordinating with the engineering team
Requirements:
- Bachelor's degree in computer science or a related engineering field
- Experience with web metrics and reporting solutions
- Experience troubleshooting and resolving server and application performance issues
- Experience with different monitoring systems (OpenNMS, Nagios, Cacti, etc.)
- Experience in at least two relevant scripting or programming languages (shell and a major scripting language like Perl or Python)
- Experience with NFS, DNS, BIND, and DHCP
- Experience with storage systems (SAN, NAS, etc.)
- Demonstrable experience with SQL databases (preferably MS SQL Server or MySQL)
- Excellent verbal and written communication skills
- Ability to produce system documentation, including requirements, operational specifications, system architecture, test plans, and as-built documentation, all with attention to detail
- Well organized, detail-oriented, and able to work effectively and efficiently in a fast-paced environment
- Able to travel internationally
Location: Silicon Valley, USA