Senior Software Engineer - Hail Team

Broad Institute
United States, Massachusetts, Cambridge
Jan 25, 2025

Description & Requirements
At the Broad Institute broadly and within the Neale Lab specifically, we leverage statistical and software techniques to understand the mechanisms of disease from extremely large datasets generated by scalable sequencing technologies. The lab and Institute are entering an age of one million sequences, millions of transcriptomes, tens of thousands of medical images, and complete medical records. The development of scalable scientific assays has transformed biological engineering problems into software engineering ones. We seek a senior software engineer to help solve those problems.
This team develops, maintains, and operates Hail, a suite of libraries, data systems, and services for analyzing the world's largest genome sequencing datasets. Hail supports scientists at every stage, from individual sequences through the production of a sequencing matrix, the calculation of per-row and per-column statistics, distributed matrix multiplications to search for genetic relatedness, the preparation of thousands of phenotypes per sequence, regression to search for genetic associations with phenotypes, and subsetting and export for distribution to collaborators. Hail also serves as a data store for web-based data browsers and rare disease diagnostic support systems.
The team faces three major challenges in the coming years. First, the largest sequencing callset has doubled every year since 2003 and the next doubling is anticipated in 2025. Second, the phenotypes have grown from binary disease status tables to medical records, medical images, and cellular assays. Third, the project must adapt to the changing hardware landscape, new scientific-analytical techniques, and new analytical databases.
Hail's two core products are Query and Batch, both of which are open source and openly developed. We are seeking a Senior Software Engineer to focus primarily on Batch. Batch is a cost-metered, multi-tenant, spot-tolerant, elastic, horizontally-scalable compute engine. The team operates an installation of Batch as a Software-as-a-Service for a community of hundreds of scientists within the Broad Institute.
Batch is implemented in Python. Its control plane is deployed on Kubernetes, and its compute plane is a directly managed set of VMs. Batch relies on many technologies, including OCI container images, crun, Google and Azure cloud storage, Google and Azure VM APIs, Google and Azure container registry APIs, Grafana, Prometheus, OAuth2, MySQL, Envoy, and asyncio.
Responsibilities
  • Work with scientists and software engineers to realize transformative scientific goals.
  • Design, implement, test, tune, document, deploy, operate, maintain, and support new features, analysis methods, and infrastructure.
  • Operate and maintain computing infrastructure and software deployments.
  • Participate in constructive code reviews and share best practices with team members.
  • Mentor junior engineers and interns.
  • Contribute to system architecture and design.
  • Refine software development processes and best practices.
Requirements
  • Expertise in Python, our primary programming language.
  • B.S. or B.A. in Computer Science or related field.
  • 5+ years industry experience working as part of a software team.
  • Experience designing and developing at least one of the following: compilers, query planners, or distributed systems.
  • Understanding of computer science fundamentals.
  • Facility with "tools of the trade", e.g., Unix system administration, shell scripting, build and deployment tools, version control, etc.
  • Ability to meet deadlines and work cooperatively in a small, collaborative team with limited formal processes.

In addition to Python, our current technology stack also includes the JVM, Scala, GCP, Azure, and C++. Our domain knowledge includes machine learning, bioinformatics, statistical genetics, compilers, and theoretical math. Hires need not have experience with every aspect of our technologies and domains.
Our website: https://hail.is. Our GitHub: https://github.com/hail-is/hail.