Company Description

This is a remote role; employees are welcome to work near any of our principal location hubs: Toronto, Montreal, Vancouver, Dallas and NYC.

We are a purpose-driven, creative community who believes challenging convention moves culture forward, so we use our platform to amplify the voices that are changing the way we see the world. When you join us, you will become a part of an industry-leading fashion platform with a global reach.

Founded in 2003, we pace the vanguard of directional retail with a mix of luxury, streetwear, and avant-garde labels. We produce cutting-edge original content and take pride in building our own technology solutions and systems from scratch. Our field of focus has grown beyond that of a typical e-commerce entity as we explore the nexus of content, commerce, and culture.

Our team is currently 1,100 strong, serves 150 countries, generates an average of 76 million monthly page views, and has achieved high double-digit annual growth since inception. Our work is varied and ambitious. We have a roadmap full of rewarding projects to keep you motivated and engaged, with a leadership ethos based on transparency, collaboration and enabling performance.

Job Description

SSENSE is looking for a Senior Site Reliability Developer - Technical Lead to join our rapidly growing technology team. The Senior SRE - Technical Lead will join the SRE squad and will be responsible for keeping all user-facing services and other production systems running smoothly. They will be accountable for the reliability, scalability and resilience of complex infrastructure components in both quality assurance and production environments. The ideal candidate will actively contribute to knowledge dissemination within the organization and participate in the recruiting and onboarding of new employees.

RESPONSIBILITIES

Team leadership, knowledge sharing & coaching - 25%

Enforce an effective and efficient Scrum process in which all team members work in the same direction
Guide SRE engineers, when needed, to break down user stories into manageable tasks
Propose and drive a development process that emphasizes quality through code reviews, automated testing, continuous integration pipelines and documentation
Develop a deep understanding of the team’s roadmap and influence it with fact-based technical arguments
Ensure proper documentation of team activities
Ensure that demos of developed features are well prepared and presented to stakeholders
Review pull requests and documentation with the objective of guiding and upskilling junior developers on various technical/SRE topics
Provide fact-based technical feedback on each squad member to managers as part of the evaluation cycle
Actively contribute to SSENSE University, the internal peer learning platform, to promote continuous learning
Participate in the onboarding of new developers
Mentor junior SREs in all areas, and other SREs in your areas of deep knowledge
Set an example for the SRE team through positive and inclusive leadership and discussion of work
Act as a trusted party to de-escalate conflicts within the team

Production Operations - 20%

Handle emergency response, either while on call or by reacting to symptoms surfaced by monitoring, and escalate when needed
Be accountable for ensuring and improving documentation on site reliability measures, either in application documentation or in runbooks, explaining the issues encountered and the solutions implemented
Actively identify and implement opportunities to improve the availability and performance of the system by applying the learnings from monitoring and observation
Identify parts of the system that do not scale, provide immediate palliative measures and drive long-term resolution of these issues
Improve the SSENSE codebase by resolving issues
Optimize cloud cost and reduce system resource usage by setting clear requirements through efficiency and capacity planning

Maintain Service Level Objectives (SLOs) / Service Level Indicators (SLIs) - 20%

Plan, design and execute solutions within the infrastructure team to reach specific, agreed-upon goals
Share learnings publicly, either by creating issues that give anyone enough context to understand them or by writing blog posts
Propose ideas and solutions within the infrastructure team to reduce workload through automation
Identify Service Level Indicators (SLIs) that will align the team to meet the availability and latency objectives
Run blameless root cause analyses (RCAs) on incidents and outages, aggressively looking for answers that will prevent them from ever happening again

Product delivery - 15%

Anticipate the technical challenges the squad will face when delivering solutions, and propose and implement technical solutions to them
Write testable, efficient, and reusable code suitable for continuous integration and automated deployments, that respects best practices and SSENSE development standards
Raise the bar for professional SRE engineers, lead by example, and help others learn the craft through rigorous code reviews and coaching

Ownership and accountability - 10%

Be accountable for performance, reliability, scalability and resilience of complex and critical infrastructure components (web servers, data stores, hosted services, load balancers, etc.) through the proper use of replication, sharding, load balancing, monitoring, SLAs, alerting, and auto-scaling
Be an active participant in the incident escalation chain and drive prompt resolution
Upgrade and patch systems as required while ensuring availability of service
Contribute to cross-squad initiatives, acting as a change agent amongst peers to foster adoption of new processes or technical solutions

Recruiting - 10%

Contribute to the hiring process by reviewing applications or joining the interview team to qualify SRE candidates

Qualifications

Bachelor’s degree in Computer Science, Engineering, or a related technical field; a Master’s degree is an asset
Minimum 8 years of experience working as an SRE
A minimum of 8 years of experience administering Linux-based environments (Red Hat, CentOS, Debian or Ubuntu)
A minimum of 8 years of experience with service-oriented architectures and microservices
Must have at least 2 years of experience working within an Agile development life cycle
A minimum of 8 years of experience practicing continuous integration and continuous delivery
Minimum 5 years of experience with infrastructure automation frameworks in at least two of these technologies: SaltStack, Terraform, or Cloud Foundation engine
Expertise in infrastructure to support a microservice architecture
A minimum of 4 years of experience with infrastructure as code, specifically with Terraform
Strong knowledge of caching technologies (Fastly, Redis) with the ability to identify opportunities for improvement
Expertise with RDBMS (MySQL, PostgreSQL) and NoSQL (DynamoDB, DocumentDB, MongoDB) databases at scale
Proficiency with cloud resources (AWS) and the ability to operate them for the components owned; certification preferred
Ability to use containers and orchestration frameworks (Kubernetes, Docker, container registries, etc.)
Proficiency in Git
Must have at least 4 years of experience with Kubernetes; Amazon EKS or ECS experience is a plus

SKILLS

Willingness and ability to learn fast
Strong work ethic and results-oriented
High sense of accountability and ownership
Solution-oriented mindset and can-do attitude to overcome challenges
Team player with a natural ability to build relationships
Ability to thrive in a fast-paced environment and master frequently changing Web technologies and techniques

Additional Information

WORLD CLASS TECHNOLOGY

Technology is at the core of everything we do at SSENSE. Driven by an engineering mindset and a problem-solving attitude, we blend fashion with technology to deliver an unparalleled experience to our customers as we build seamless, custom solutions that power the SSENSE offering.

WORLD CLASS TEAM

The SSENSE tech team is responsible for an international headless commerce platform. Working in an agile environment, our squads are made up of experienced innovators in Product Management, QA, Design, DevOps, Software Development, Machine Learning, Data Engineering, and Security. Headquartered in Montreal, our technology organization has been growing at a rate of 2X year-over-year and is doubling once again in 2021 as we expand across Canada, the US, and Europe.

WORLD CLASS PLATFORM

The SSENSE platform runs on Amazon Web Services, making use of serverless microservices across web, mobile and app. Our event-sourced architecture already handles over 10,000 requests per second and is growing at a pace unmatched across the industry. Our data-driven culture of innovation empowers every product team across the tech organization to explore building, testing and learning with the latest Machine Learning techniques. Our automated, continuous-improvement DevOps model (using both blue/green and canary deployments) results in an average of 50 production releases every day.

Read more about us on our SSENSE Tech Blog.
