Cloud Service Site Reliability Engineer

Bridgewater, NJ 08807

Employment Type: Direct Hire
Area of Specialty: Other Area(s)
Job Number: 7512

Job Description

The Sr. Cloud Service Reliability Engineer is responsible for the effective design, implementation, operation and maintenance of infrastructure on premises and in the cloud.

You will contribute as a core team member in the design, development, testing, and support of data analytics systems. This position requires evaluation, implementation and management of software tools and practices to mitigate risks and introduce operational efficiencies. 


  • Ensure that our hybrid cloud environment, specifically Google Cloud Platform (GCP), Amazon Web Services (AWS) and Microsoft Azure, meets requirements for redundancy, scalability, performance and security.

  • Expert-level knowledge of Microsoft technologies including Active Directory, Azure Active Directory, Office 365, and Windows Server.

  • Lead projects related to cloud infrastructure and help define the current and future directions by collaborating with server, storage, network and applications teams.

  • Provide hands-on technical expertise to design, deploy, secure and optimize cloud services.

  • Familiarity with container solutions (Kubernetes, Docker, etc.).

  • Be proficient in one or more configuration management tools including Puppet, Chef, Fabric, Ansible, and/or Salt.

  • Working knowledge of infrastructure as code (IaC) software tools such as Terraform/Ansible with a demonstrated implementation.

  • Design and implement DevOps best practices, and establish standards and policies for managing source code and continuous integration/delivery using Jenkins and GitHub.

  • Manage multi-tenant infrastructure and data analytics systems built on technologies such as Hadoop, MapR, Informatica and other data-related technologies.

  • Collaborate with product managers, lead engineers and data scientists on all facets of the Big Data ecosystem.

  • In-depth understanding of networking, distributed systems, cloud design patterns, APIs, and security.

  • Engage in service capacity planning and demand forecasting, performance analysis and system tuning. 

  • Investigate, evaluate, test and recommend technical solutions for future systems.

  • Participate in a 24x7 on-call rotation to handle product availability issues as well as urgent customer support escalations.

  • Ability to work on complicated, multi-stage projects and convert long-term strategy into short- and long-term objectives.

  • Participate in architecture reviews.

  • Provide coaching and mentoring to junior staff members.



  • Minimum of 8 years of experience in site reliability engineering, Linux, Windows, DevOps, and maintaining infrastructure on premises and in cloud environments.

  • At least 3 years of experience managing a multi-tenant production Hadoop or other data analysis environment.


  • Bachelor’s Degree in Information Technology

  • Cloud Systems Administrator or Developer certification considered a plus


  • A deep understanding of operating systems and computer architecture

  • Well versed in DevOps and SRE practices

  • Strong knowledge and understanding of microservices-based architectures, APIs, etc.

  • Ability to write scripts from scratch using Python, Perl or Ruby

  • Strong analytical and troubleshooting skills 

  • Experience with Splunk, SolarWinds and other operational monitoring tools

  • Highly collaborative with effective written and verbal communication skills

  • Ability to concentrate on a wide range of loosely defined complex situations, which require creativity and originality, where guidance and counsel may be unavailable.

  • Leadership and management skills in mentoring, evaluation, and development following a servant-leader mentality

This position will be required to work with technical resources through the leadership level (up to the VP level) of the Corporate Applications organization that is responsible for database and business analytics services.
Must be able to liaise between the multiple organizations within the Infrastructure & Operations team, often coordinating project and BAU efforts across the systems, network and operations teams. Must be able to present issues and suggestions to Infrastructure & Operations management up to and including the SVP responsible for the area.

This position will also be required to interact with multiple external service providers and companies, including managed services and support providers in the big data technology space. It is responsible for coordinating activities around platform health, stability, architecture, design and disaster recovery. The position will also be required to engage external public cloud service providers and discuss cloud services, service levels, and the costs associated with their services. These interactions will span both technical and account management vendor resources.


  • Monitor and review technology and project budgets.


  • Recommend best methods for technical resolutions.

  • Develop infrastructure standards.

  • Recommend and evaluate vendors for projects.

  • Delegate day-to-day problems and/or issues that may arise.

    Enjoy technology. Ability to adapt as business needs evolve. Keep pace with the rapid change in the cloud computing space.


Ability to work in a fast-paced environment where change is the norm. 

This is a global role.

Meet Your Recruiter

Nyree Anderson
Senior Technical Recruiter

Dynamic and results-oriented, Nyree has a proven track record in technical recruiting. She possesses a strong understanding of technical requirements and an exceptional ability to source passive candidates and assess their suitability for the opportunities at hand. Along with being resourceful and solutions-focused, Nyree’s innate ability to build fruitful relationships has brought her success across many industries.
