Site Reliability Engineer

About the position

Join us in building the best eCommerce SaaS platform to revolutionize how retailers engage with shoppers for post-purchase activities.

Linc is looking for a Site Reliability Engineer who is passionate about their craft, who thrives on challenge and who believes in success through collaboration, someone who would complement our existing team of world-class developers!

We offer a fun, dynamic and results-driven environment. If scale and complexity in a high-growth startup excite you, Linc is the right place to be.

About Linc:

Linc’s Conversational AI platform delivers automated experiences at scale across webchat, email, SMS, Facebook, Google Business Messaging and voice-connected platforms such as Alexa. From brand awareness to consultative buying assistance, post-purchase and retention, Linc’s platform provides over 85% out-of-the-box automation for customers’ most commonly asked questions and services. The end result is higher shopper retention rates and a higher lifetime value with those clients.

As a fast-growing startup, Linc has delighted tens of millions of shoppers, and its customer list includes world-class brands like Levi’s, Carter’s, PacSun, Venus, Kimberly-Clark and LampsPlus. Learn more at letslinc.com.

About the role:

The Site Reliability Engineer will monitor and maintain excellent availability and performance of our platform, which consists of AI systems, API backend servers, serverless applications, and SQL and NoSQL data stores running in the cloud. The role also involves troubleshooting, triaging issues to development teams based in the US and Taiwan, and rolling back problematic releases to restore normal system functionality.

Responsibilities

Collaborate with engineering teams to define and maintain service SLAs
Monitor metrics, alerts and logs across infrastructure and applications
Create and maintain tools to monitor the platform, which consists of AI systems, API backend servers, serverless applications, and SQL and NoSQL data stores running in the cloud
Respond to incidents, troubleshoot, investigate root causes, resolve and capture incident details
Triage and escalate issues to the appropriate development teams
Roll back problematic releases to restore normal system functionality
Conduct post-incident investigations and write up the findings

Key Qualifications

B.S. in Computer Science or a related field
1+ years of site reliability engineering experience
Familiarity with at least one cloud service provider, preferably AWS
Familiarity with basic SQL commands and Internet protocols
Proficiency in cloud application orchestration tools like Kubernetes and Helm
Experience with monitoring stacks, preferably Datadog
Experience with data stores like MongoDB, Postgres, Redis and Elasticsearch
Excellent verbal and written communication skills

Nice to Have

AWS Certified DevOps Engineer - Professional
Certified Kubernetes Application Developer
Experience with microservices, preferably on the Python stack
Experience with load and performance testing and tuning

