Create and maintain a continuous testing framework that observes, records, and trends real-time availability data for all of our clients
Develop and maintain on-premises and cloud capacity plans that ensure we are delivering a BlackLine service that is performant and cost-effective
Improve the BlackLine SaaS service experience by discovering and highlighting optimization opportunities with existing code to address application availability, performance, observability, efficiency, and security challenges
Lead the development of requirement definitions, capacity planning, and process refinement
Develop tools and systems to automate the identification, analysis, and remediation of application events, infrastructure issues, or requests
Establish and maintain Key Performance Indicators for the overall health of the service and build tools to exercise and evaluate if these KPIs are being met
Work cross-functionally to surface common pain points, architect solutions, establish conventions, and evangelize application development and operations best practices
Regularly learn new systems and tools as the BlackLine platform and ecosystem evolve
Contribute knowledge, skills, and personal qualities to a dedicated team of top engineers through mentorship and training, solving real-life problems in a bleeding-edge, high-performance, and high-traffic environment
Assess, test, track, predict, and report on the performance, responsiveness, capacity, and availability of a suite of production applications
Serve as technical lead for large projects, determining objectives and approaches to critical assignments; may oversee multiple projects concurrently
Publish performance result findings, conclusions, and recommendations
Support integration of performance data into customer experience analytics tools and reporting
Participate in our on-call rotation and conduct incident reviews
Other duties as assigned
Qualifications:
BS or MS in Computer Science (or equivalent diploma and/or certifications) with 7+ years of related experience
Advanced knowledge of at least two of the following programming languages: C#, Visual Basic, PowerShell, Java, Go, Linux Shell, Ruby
Demonstrated history of developing or operating production web applications and a solid understanding of HTTP(S), HTML, JavaScript, CSS, and XML
Significant experience in a lead role on a software development team
Baseline understanding of project management processes and procedures, with experience in both agile and waterfall methodologies; experience managing one or more small to medium projects
Experience deploying high availability systems and software
Experience troubleshooting distributed web applications in a production environment
Advanced level knowledge of IIS and Windows Server or Linux and Apache
Experience with infrastructure as code and platform as a service
Experience with configuration management tools (e.g., Chef, Ansible, Puppet) or container orchestration platforms such as Kubernetes or Docker Swarm
Advanced knowledge of deploying and managing open source observability tools such as Prometheus, Grafana, and Jaeger, or commercial equivalents
Capable of producing clean, readable code in a multi-developer team environment
Extensive knowledge of managing cloud platforms and cloud native tools
Must possess the ability to handle multiple goals concurrently and function in a fast-paced, demanding, ever-changing, high-growth environment
Must maintain the highest level of integrity, courtesy, and respect while interacting with internal customers, employees, and business contacts
Ability to communicate clearly and directly, both orally and in writing, across all business relationships and levels of management
Ability to interface with internal technical experts using professional interpersonal skills
Experience analyzing datasets to draw conclusions and graphing the data to support those conclusions
Intermediate level proficiency in application load balancing methods (F5 LTM, Windows NLB, etc.)
Working knowledge of TCP/IP and networking concepts
Proficiency with statistical concepts: confidence intervals, hypothesis testing, and sampling
Understanding of operating system concepts such as CPU, memory, and disk queues, and experience graphing and analyzing these metrics over time
Must possess strong organizational skills and be able to work with minimal oversight
Ability to understand new technologies quickly and adapt these into daily work and goals