SITE RELIABILITY MANAGER WITH SECURITY CLEARANCE
Company: Karsun Solutions LLC
Location: Washington
Posted on: October 14, 2024
Job Description:
Find Your Next at Karsun Solutions and transform your career
with the company transforming possible for the US Government. At
Karsun, collaboration drives our community. We're committed to
building an environment where team members from diverse backgrounds
can innovate, learn and grow with us. Here at Karsun, the only
limit to your potential is the limit of your curiosity. And because
we know well-being empowers us to thrive, we offer robust and
comprehensive benefits including:
* Health, Life & Disability Insurance - Medical, Dental, Life and
Disability coverage is paid for by Karsun for full-time employees.
* Paid Parental Leave
* 401(k) Retirement Plan - with pre-tax and post-tax Roth
contribution offerings and immediate vesting with a per-pay-period
match
* Generous time off programs including 11 paid holidays per
year
* Supplemental plans such as Vision, Pet Insurance and 529 Savings
Plan
* Employee Assistance Program with behavioral health, physical
wellness and financial advice
* Employee Discounts & Perks
* In-house Technical/Skills Training
Join Team Karsun and Find Your Next. Karsun Solutions is an Equal
Employment Opportunity (EEO) employer. It is the policy of the
Company to provide equal employment opportunities to all qualified
applicants without regard to race, color, religion, sex, sexual
orientation, gender identity, national origin, age, protected
veteran or disabled status, or genetic information. Karsun does not
accept unsolicited resumes through or from search firms or staffing
agencies. All unsolicited resumes will be considered the property
of Karsun and Karsun will not be obligated to pay a placement fee.
We are seeking a highly skilled and experienced Site Reliability
Manager to join our team. The ideal candidate will be responsible
for ensuring the reliability, scalability, and performance of our
systems and services. They will lead a team of engineers in
designing, implementing, and maintaining robust infrastructure and
automation solutions. The ideal candidate must reside in the
Washington DC area and be available to work on site in downtown
Washington DC as required.
* Lead a service delivery team of 8-20 people (service support
specialists, DevSecOps engineers, and site reliability engineers).
* Define and implement best practices for infrastructure
as code, deployment automation, and monitoring.
* Collaborate with cross-functional teams to design scalable and
fault-tolerant architectures.
* Develop and maintain service level objectives (SLOs) and key
performance indicators (KPIs) to measure system reliability and
performance.
* Conduct post-mortems and root cause analyses for incidents and
implement preventive measures to mitigate future incidents.
* Drive continuous improvement initiatives to enhance the
reliability, scalability, and efficiency of our systems and
services.
* Mentor and coach team members to foster a culture of learning and
innovation.
Required:
* Bachelor's degree in Computer Science, Engineering, or a related
field; Master's degree preferred.
* 10+ years of experience in a similar role, managing a team of site
reliability engineers and delivering on the AWS cloud platform.
* Proven track record of managing high-performance teams.
* 5+ years of experience supporting operations and maintenance for
cloud-native applications in production that are fault-tolerant,
self-healing, scalable, and highly available.
* Deep understanding of cloud computing platforms (e.g., AWS,
Azure, GCP) and containerization technologies (e.g., Docker,
Kubernetes).
* Strong knowledge of infrastructure as code tools (e.g.,
Terraform, Ansible, ArgoCD) and CI/CD pipelines.
* Experience with monitoring, logging, and observability tools such as
Datadog, AWS CloudWatch, ELK, Prometheus, and Splunk.
* Excellent communication and interpersonal skills, with the ability to
collaborate effectively with cross-functional teams.
* Strong problem-solving and analytical skills, with a keen
attention to detail.
* Certifications such as AWS Certified DevOps Engineer or Google
Professional Cloud DevOps Engineer are a plus.
* Ability to obtain and maintain a Public Trust clearance.
Preferred:
* Understanding of modern architectures (e.g., microservices, EDA)
and caution against overcomplexity and overengineering
* Experience with monitoring and metrics platforms (e.g., New Relic,
Prometheus, InfluxDB, Grafana, Splunk)
* Experience designing and operating distributed systems and cloud
infrastructure at scale
In accordance with pay transparency
guidelines, the proposed salary range for this position is
$140,000.00 to $180,000.00. Final salary will be determined based
on various factors such as relevant skills, experience and
certifications.