Site Reliability Developer at SSENSE

Dallas, TX
Full Time
3y ago

Company Description

This is a remote role; employees are welcome to work near any of our principal location hubs: Toronto, Montreal, Vancouver, Dallas, and NYC.

SSENSE (pronounced [es-uhns]) is a global technology platform operating at the intersection of culture, community, and commerce. Headquartered in Montreal, it features a mix of established and emerging luxury brands across womenswear, menswear, kidswear, and Everything Else.

SSENSE has garnered critical acclaim as both an e-commerce engine and a producer of cultural content, generating an average of 100 million monthly page views. Approximately 80% of its audience is between the ages of 18 and 40. It is privately held and has achieved high double-digit annual growth and profitability since its inception.

Job Description

SSENSE is looking for a Site Reliability Developer to join our rapidly growing technology team. The successful candidate will join a squad and be responsible for keeping all user-facing services and other production systems running smoothly. The SRE will be accountable for the reliability, scalability, and resilience of infrastructure components in terms of quality assurance and production.

RESPONSIBILITIES 

Production Operations - 60%

  • Respond to production emergencies: troubleshoot, monitor, and escalate until successful resolution

  • Collaborate with the Infrastructure team to implement and improve automation, reduce toil, and help deploy and improve our cloud environment faster

  • Continuously monitor cloud efficiency and capacity, and reduce system resource usage to optimize cloud cost

  • Improve existing documentation for site reliability measures, applications, and cloud components. Create and maintain runbooks to aid understanding and improve recovery and resolution time

Maintain Service Level Objectives (SLOs) / Service Level Indicators (SLIs) - 30%

  • Collaborate with Software Engineering teams, helping them adopt an SRE approach to the availability, reliability, scalability, disaster recovery, and performance of production services

  • Help identify Service Level Indicators (SLIs) and contribute to solutions for improving service defensiveness, reducing alert noise, improving monitoring, and helping our services reach their Service Level Objectives (SLOs)

Knowledge sharing and coaching - 5%

  • Join, organize or participate in SSENSE University sessions to ramp up on various technologies

  • Participate in the buddy and onboarding programs for new engineers

Recruiting - 5%

  • Participate in HR recruiting events, helping to identify and recruit top talent

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or a related technical field; a Master’s degree is an asset

  • Minimum 3 years of experience working as an SRE or in DevOps

  • Experience in Unix/Linux environments with a good understanding of operating system internals

  • Experience with service-oriented architectures, microservices, and CI/CD

  • Experience with state configuration tools (Ansible, Puppet, Chef, etc.)

  • Proficiency with any infrastructure automation tool (Terraform, CloudFormation, Serverless, CDK)

  • Experience with distributed logging and monitoring (ELK, Datadog, Sumo Logic, CloudWatch, Prometheus)

  • Practical knowledge of caching technologies (Redis, Varnish) with the ability to identify opportunities for improvement

  • Experience with RDBMS (MySQL, Postgres) and NoSQL (DynamoDB, MongoDB)

  • Practical experience working with AWS; certification is preferred

  • Ability to use container-based environments and tools such as Kubernetes, Docker, and Helm

  • Amazon EKS or ECS experience is an asset

SKILLS

  • Willingness and ability to learn quickly

  • Strong work ethic and results-oriented approach

  • High sense of accountability and ownership

  • Solution-oriented mindset and can-do attitude to overcome challenges

  • Team player with a natural ability to build relationships

  • Ability to thrive in a fast-paced environment and master frequently changing web technologies and techniques

Additional Information

WORLD CLASS TECHNOLOGY 

Technology is at the core of everything we do at SSENSE. Driven by an engineering mindset and a problem-solving attitude, we blend fashion with technology to deliver an unparalleled experience to our customers as we build seamless, custom solutions to deliver the SSENSE offering. 

WORLD CLASS TEAM

The SSENSE tech team is responsible for an international headless commerce platform. Working in an agile environment, our squads are made up of experienced innovators in Product Management, QA, Design, DevOps, Software Development, Machine Learning, Data Engineering, and Security. Headquartered in Montreal, our technology organization has been growing at a rate of 2X year-over-year and is doubling once again in 2021 as we expand across Canada, the US, and Europe.

WORLD CLASS PLATFORM 

The SSENSE platform runs on Amazon Web Services, making use of serverless microservices across web, mobile, and app. Our event-source architecture already handles over 10,000 requests per second and is growing at an unmatched pace, currently unseen across the industry. Our data-driven culture of innovation empowers every product team across the tech organization to explore building, testing, and learning with the latest Machine Learning techniques. Our automated continuous improvement DevOps model (making use of both blue/green and canary deployments) results in an average of 50 production releases every day.

Read more about us on our SSENSE Tech Blog.

By: Onkar