Lead Site Reliability Engineer

OmniTI is looking for a Lead Site Reliability Engineer to help us grow our team!

The OmniTI Ops team is a flexible and progressive group. We work closely with developers, DBAs, and client groups to help them manage availability and performance in the midst of constant changes. We are not risk-averse; instead, we strive to understand why things fail and the true impact of those failures so that we can empower others. As team lead, you will help reinforce that collaboration is a cornerstone, and that being friendly and outgoing are keys to making that work.

About The Job

The SRE team lead doesn't just fix websites. You will oversee all systems operations initiatives, including designing company-wide systems layout and network architecture and building out a multi-datacenter managed hosting platform. On top of that, you'll lead your team into the field and help our clients take their broken, failing infrastructures and turn them into something that "just works". You will have direct input into the business you work on, and you'll be responsible for mentoring the people working on your team.

About You:

  • Familiarity with agile methodologies, including Kanban
  • Excellent communication skills, both written and verbal
  • Ability to remain comfortable and calm in the midst of chaos
  • Ability to translate technical needs into business plans, and vice versa

Requirements & Education

  • Experience with cloud and virtualization technologies: AWS, VirtualBox, KVM, zones/containers, Vagrant, Docker
  • In-depth understanding of Unix-oriented operating systems including illumos, Linux, Solaris 10+, or *BSD
  • Excellent troubleshooting skills with the ability to dive deep into all aspects of the stack to identify and fix problems
  • Strong background in web server technologies such as Apache, HAProxy, nginx
  • Familiarity with technologies such as Apache Traffic Server or Varnish, and a good working knowledge of the issues when implementing web caching
  • Strong knowledge of IP networking protocols
  • Programming/scripting experience in Ruby, Python, bash, Perl and/or JavaScript
  • Experience with configuration management tools such as Chef, Puppet, or Ansible
  • Familiarity with version control systems such as Git/Subversion, from both an end user and administrator perspective

Bonus:

  • Experience in "continuous deployment" environments
  • Experience working on multiple layers of the stack (OS, DB, programming, etc.)
  • A history of working and sharing with external / OSS communities

You must be willing to share in an on-call rotation and work to eliminate sources of operational disruption. We work on systems both very large and small, but all of them are mission-critical to our clients, and they span a wide array of technologies. You should be comfortable getting very hands-on in helping make things go.

Note: This position is located in Fulton, Maryland; however, remote work is also an option. If you contribute to an open source project, have a GitHub account, have a blog, or are involved in technology in some other way, we would love to hear about it when you write to us!

Interested? Apply here.

At OmniTI we believe in diversity as a core asset. From the tools we use to the technologies we choose to the people we work with, diversity in approach has always led us to better success. We take pride in the diversity of our staff, and seek diversity in our applicants.

Staff Thoughts

This is a great place to be exposed to a wide variety of technologies and to be mentored by some of the brightest minds in the business. Knowledge is shared openly, and the amount is limited only by your ability to absorb it.

~ Eric Sproul, Systems Administrator.

Where else can you work with people from whose books you learned to program?

~ Leon Fayer, Vice President.