Senior Site Reliability Developer - Technical Lead

at

SSENSE

Waterloo, Canada
Full Time
3y ago

Company Description

This is a remote role; employees are welcome to work near any of our principal location hubs: Toronto, Montreal, Vancouver, Dallas, and NYC.

We are a purpose-driven, creative community that believes challenging convention moves culture forward, so we use our platform to amplify the voices that are changing the way we see the world. When you join us, you will become part of an industry-leading fashion platform with a global reach.

Founded in 2003, we pace the vanguard of directional retail with a mix of luxury, streetwear, and avant-garde labels. We produce cutting-edge original content and take pride in building our own technology solutions and systems from scratch. Our field of focus has grown beyond that of a typical e-commerce entity as we explore the nexus of content, commerce, and culture. 

Our team is currently 1,100 strong, serving 150 countries, generating an average of 76 million monthly page views, and achieving high double-digit annual growth since inception. Our work is varied and ambitious. We have a roadmap full of rewarding projects to keep you motivated and engaged, with a leadership ethos based on transparency, collaboration, and enabling performance.

Job Description

SSENSE is looking for a Senior Site Reliability Developer - Technical Lead to join our rapidly growing technology team. The Senior SRE-TL will join the SRE squad and will be responsible for keeping all user-facing services and other production systems running smoothly. The Senior SRE-TL will be accountable for the reliability, scalability, and resilience of complex infrastructure components in both quality assurance and production. The ideal candidate will actively contribute to knowledge dissemination within the organization and participate in the recruiting and onboarding of new employees.

RESPONSIBILITIES

Team leadership, knowledge sharing & coaching - 25%

  • Enforce an effective and efficient scrum process where all team members work in the same direction
  • Guide SRE engineers, when needed, to break down user stories into manageable tasks
  • Propose and drive a development process that emphasizes quality through code reviews, automated testing, continuous integration pipelines and documentation
  • Develop a deep understanding of the team’s roadmap and influence it with fact-based technical arguments
  • Ensure proper documentation of team activities
  • Ensure that demos of developed features are well prepared and presented to stakeholders
  • Review pull requests and documentation with the objective of guiding and upskilling junior developers on various technical and SRE topics
  • Provide fact-based technical feedback on each squad member to managers as part of the evaluation cycle
  • Actively contribute to SSENSE University, the internal peer learning platform, to promote continuous learning
  • Participate in the onboarding of new developers
  • Mentor junior SREs in all areas, and other SREs in your areas of deep knowledge
  • Set an example for a team of SREs through positive, inclusive leadership and discussion of work
  • Be trusted to de-escalate conflicts within the team

Production Operations - 20%

  • Handle emergency response, either by being on call or by reacting to symptoms surfaced by monitoring, and escalate when needed
  • Be accountable for maintaining and improving documentation on site reliability measures, in application documentation or in runbooks, explaining the issues encountered and the solutions implemented
  • Actively identify and implement opportunities to improve the availability and performance of the system by applying learnings from monitoring and observation
  • Identify parts of the system that do not scale, provide immediate palliative measures, and drive long-term resolution of these incidents
  • Improve the SSENSE codebase by resolving issues
  • Optimize cloud costs and reduce system resource usage by setting clear requirements through efficiency and capacity planning

Maintain Service Level Objectives (SLOs) / Service Level Indicators (SLIs) - 20%

  • Plan, design, and execute solutions within the infrastructure team to reach specific, agreed-upon goals
  • Share learnings publicly, either by creating issues that give anyone the context needed to understand them or by writing blog posts
  • Propose ideas and solutions within the infrastructure team to reduce workload through automation
  • Identify Service Level Indicators (SLIs) that will align the team to meet the availability and latency objectives
  • Run blameless root cause analyses (RCAs) on incidents and outages, aggressively looking for answers that will prevent the incident from ever happening again

Product delivery - 15%

  • Anticipate the technical challenges the squad will face when delivering solutions, and propose and implement technical solutions to those issues
  • Write testable, efficient, and reusable code that is suitable for continuous integration and automated deployments and that respects best practices and SSENSE development standards
  • Raise the bar for professional SRE engineers, lead by example, and help others learn the craft through rigorous code reviews and coaching

Ownership and accountability - 10%

  • Be accountable for performance, reliability, scalability and resilience of complex and critical infrastructure components (web servers, data stores, hosted services, load balancers, etc.) through the proper use of replication, sharding, load balancing, monitoring, SLAs, alerting, and auto-scaling
  • Be an active participant in the incident escalation chain and in the prompt resolution of incidents
  • Upgrade and patch systems as required while ensuring availability of service
  • Contribute to cross-squad initiatives, acting as a change agent amongst peers to foster adoption of new processes or technical solutions

Recruiting - 10%

  • Contribute to the hiring process by reviewing applications or by being part of the interview team that qualifies SRE candidates

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or a related technical field; a Master’s degree is an asset
  • Minimum of 8 years of experience working as an SRE
  • Minimum of 8 years of experience administering Linux-based environments (Red Hat, CentOS, Debian, or Ubuntu)
  • Minimum of 8 years of experience with service-oriented architectures and microservices
  • Must have at least 2 years of experience working in an Agile development life cycle
  • Minimum of 8 years of experience practicing continuous integration and continuous delivery
  • Minimum of 5 years of experience with infrastructure automation frameworks in at least two of these technologies: SaltStack, Terraform, or Cloud Foundation engine
  • Expertise in infrastructure that supports a microservice architecture
  • Minimum of 4 years of experience with infrastructure as code, specifically with Terraform
  • Strong knowledge of caching technologies (Fastly, Redis), with the ability to identify opportunities for improvement
  • Expertise with RDBMS (MySQL, Postgres) and NoSQL (DynamoDB, DocumentDB, MongoDB) databases at scale
  • Proficiency with cloud resources (AWS) and the ability to operate them for the components owned; certification preferred
  • Ability to use containers and orchestration frameworks (Kubernetes, Docker, container registries, etc.)
  • Proficiency in Git
  • Must have at least 4 years of experience with Kubernetes; Amazon EKS and ECS experience is a nice-to-have

SKILLS

  • Willingness and ability to learn fast
  • Strong work ethic and results-oriented
  • High sense of accountability and ownership
  • Solution-oriented mindset and can-do attitude to overcome challenges
  • Team player with a natural ability to build relationships
  • Ability to thrive in a fast-paced environment and master frequently changing Web technologies and techniques

Additional Information

WORLD CLASS TECHNOLOGY 

Technology is at the core of everything we do at SSENSE. Driven by an engineering mindset and a problem-solving attitude, we blend fashion with technology to deliver an unparalleled experience to our customers, building seamless, custom solutions that power the SSENSE offering.

WORLD CLASS TEAM

The SSENSE tech team is responsible for an international headless commerce platform. Working in an agile environment, our squads are made up of experienced innovators in Product Management, QA, Design, DevOps, Software Development, Machine Learning, Data Engineering, and Security. Headquartered in Montreal, our technology organization has been growing at a rate of 2X year-over-year and is doubling once again in 2021 as we expand across Canada, the US, and Europe.

WORLD CLASS PLATFORM 

The SSENSE platform runs on Amazon Web Services, making use of serverless microservices across web, mobile, and app. Our event-sourced architecture already handles over 10,000 requests per second and is growing at an unmatched pace, currently unseen across the industry. Our data-driven culture of innovation empowers every product team across the tech organization to explore building, testing, and learning with the latest Machine Learning techniques. Our automated, continuous-improvement DevOps model (making use of both blue/green and canary deployments) results in an average of 50 production releases every day.

Read more about us on our SSENSE Tech Blog.

By: Onkar