*Only local candidates will be considered for this role
As a Sr. Site Reliability Engineer (SRE, DevOps, Cloud) at PeopleConnect, you will help define and build the cloud strategy that supports all of classmates.com's needs. You will be central to the future of classmates.com as we modernize our infrastructure platform. We are at a critical juncture in migrating from hosted datacenters toward cloud native computing, while continuing to run the business in a hybrid fashion during the transition. This role must provide innovative, cost-effective cloud native solutions and also steer the direction of our hosted environment, ensuring performance and stability. You will work closely with a talented group of application admins, system admins, and developers who work hard and play harder!
The position requires a good mix of steadfast persistence, innovative thinking, the ability to interpret performance data, and good people skills. If you can do all that while having fun, even better!
Current projects include:
- Re-think our approach to CI/CD and the use of infrastructure
- Re-imagine the infrastructure for running our sites and services
- Develop Canary and Blue/Green deployment strategies and implement them
- Provide thought leadership, mentorship and technical vision related to site reliability, DevOps and a ‘cloud-first’ culture that’s focused on data excellence
- Analyze and recommend the right cloud services to build technical solutions that meet business needs
- Research and implement new and better ways to interact with Cloud Services
- Work across the aisle (Business, Technology) to ensure project scope is known and results are implemented.
- Bring awareness to cost optimizations, efficiencies, and how best to leverage cloud services
- Drive orchestration efforts concerning all things “Cloud”.
- Design, implement, and support self-service aspects for Cloud Native applications.
- Develop and implement Proactive Security aspects for Cloud Native applications
- Implement logic surrounding cloud best practices, aka Well-Architected Frameworks for Cloud Adoption.
- A focus on continual improvement and collaboration: drive technical innovation in operations via automation; creation of processes that enhance operational excellence and workflow
Qualifications:
- 5+ years of Linux/Unix experience, along with the tools and languages commonly used on those platforms
- 5+ years building and running systems designed for high availability Internet-facing services, including performance engineering or capacity management
- Solid understanding of how commodity compute can be leveraged within a Cloud Native Organization.
- Deep and relevant experience with Cloud Services within AWS, especially Data & Security services.
- Relevant experience automating cloud assets with Terraform or, alternatively, CloudFormation.
- 3+ years developing systems management and administration automation in Perl, Python, Ruby, Bash/Shell, Java, C, or your go-to language of choice
- Experience building scripts, tooling, and automation that not only function well but also scale to meet the requirements of tomorrow.
- Excellent troubleshooting and problem analysis skills in 24x7 production environments.
- Prior experience with logging or monitoring services or infrastructure a plus
- Experience with distributed systems (small, large, or anywhere in between), including system performance testing, profiling, and tuning
- Working knowledge of SQL and database administration basics
- Knowledge of TCP/IP networking, architecture, and core technologies (such as DNS, DHCP, HTTP, Routing, VPN) a plus!
- Four-year degree in Information Systems Management, Computer Science, or equivalent experience
- 5+ years supporting highly available Linux infrastructure
- 5+ years of automation, orchestration, or relevant cloud experience
- Previous experience migrating to cloud and/or containerization
- Previous experience designing systems and services for 24x7x365 operational stability
PeopleConnect, Inc. is an equal opportunity employer