Santa Clara, CA | Remote
Innovation

Site Reliability Engineer

Linc is looking for Site Reliability Engineers who are passionate about their craft, who thrive on challenge and who believe in success through collaboration – people who would complement our existing team of world-class developers!

About the position

Join us in building the best eCommerce SaaS platform to revolutionize how retailers engage with shoppers for post-purchase activities.

We offer a fun, dynamic and results-driven environment. If scale and complexity in a high-growth startup excite you, Linc is the right place to be.

About Linc:

Linc’s Conversational AI platform delivers automated experiences at scale across webchat, email, SMS, Facebook, Google Business Messaging and voice-connected platforms such as Alexa. From brand awareness to consultative buying assistance, post-purchase and retention, Linc’s platform provides over 85% out-of-the-box automation for customers’ most commonly asked questions and services. The end result is higher shopper retention rates and a higher lifetime value for those clients.

As a fast-growing startup, Linc has delighted tens of millions of shoppers, and its customer list includes world-class brands like Levi’s, Carter’s, PacSun, Venus, Kimberly-Clark and LampsPlus. Learn more at letslinc.com.

About the role:

The Site Reliability Engineer will monitor and maintain the availability and performance of our platform, which consists of AI systems, API backend servers, serverless applications, and SQL and NoSQL data stores running in the cloud. The role also involves troubleshooting, triaging issues to development teams based in the US and Taiwan, and rolling back problematic releases to restore normal system functionality.

Responsibilities

  • Collaborate with engineering teams to define and maintain service SLAs
  • Monitor metrics, alerts, logs across infrastructure and applications
  • Create and maintain tools to monitor the platform, which consists of AI systems, API backend servers, serverless applications, and SQL and NoSQL data stores running in the cloud
  • Respond to incidents, troubleshoot, investigate root causes, resolve and capture incident details
  • Triage and escalate issues to the appropriate development teams
  • Roll back problematic releases in order to restore normal system functionality
  • Conduct post-incident investigations and report the findings

Key Qualifications

  • B.S. in Computer Science or a related field
  • 1+ years of site reliability engineering experience
  • Familiarity with at least one cloud service provider, preferably AWS
  • Familiarity with basic SQL commands and Internet protocols
  • Proficiency with cloud application orchestration tools like Kubernetes and Helm
  • Experience with monitoring stacks, preferably Datadog
  • Experience with data stores like MongoDB, Postgres, Redis and Elasticsearch
  • Excellent verbal and written communication skills

Nice to Have

  • AWS Certified DevOps Engineer - Professional
  • Certified Kubernetes Application Developer
  • Experience with microservices, preferably on the Python stack
  • Experience with load and performance testing and tuning
