Data Engineer

About the Peterson Center on Healthcare

The Peterson Center on Healthcare is a non-profit organization dedicated to making higher quality, more affordable healthcare a reality for all Americans. The organization is working to transform U.S. healthcare into a high-performance system by finding innovative solutions that improve quality and lower costs, and accelerating their adoption on a national scale. Established by the Peter G. Peterson Foundation, the Center collaborates with stakeholders across the healthcare system and engages in grant-making, partnerships, and research.

The Position

The Center is looking for a Data Engineer responsible for creating and managing the data pipelines that enable us to scale our technology-enabled service for transforming primary care practices. The Data Engineer's primary focus will be working with a variety of healthcare data sources and setting up data pipelines that surface these data in usable formats to our team and users at scale. This infrastructure-building work will enable physicians and their colleagues to surface the data they need to make informed changes to their practice, and will enable Center staff to add new health systems and primary care practices to the platform with minimal effort. The Data Engineer reports to the Director of Product Development. This is a full-time position located in New York, N.Y.

The Data Engineer will be part of a development team building products that will have the opportunity to positively impact the quality of care delivered to millions of patients, improve the quality of life for thousands of physicians, and significantly reduce the cost burden of U.S. healthcare. The Center is taking a model of high-performance primary care transformation identified through our work with Stanford University’s Clinical Excellence Research Center and advancing it in several primary care practices through an integrated product and service effort.

The successful candidate will be the voice of back end engineering at the Center and the go-to person on every aspect of data aggregation, automation and pipelines, with the ability to quickly translate team and user needs into working internal products. The Data Engineer will work with a diverse team of engineers, data leads, designers and healthcare improvers.

The main responsibilities of the job include:

  • Leading data projects that automate data ingestion, aggregate the data into data warehouses, set up pipelines to surface the data automatically, and feed it to a variety of users and internal staff in usable formats (structured data) and interfaces
  • Working with product and data teams to ensure user and staff needs are met
  • Working with partners to access data and feed it into our pipelines
  • Owning technical partner and customer relationships

Some initial projects may include:

  • Creating a complete ETL pipeline, ingesting electronic health record (EHR) and other clinical data, and making the data available to the web application through API endpoints, using messaging systems like RabbitMQ and AWS microinstance workers
  • Developing interim data services into which users can upload CSV data reports for parsing, mapping, analysis and visualization as PDF reports or web application-level data visualizations
  • Setting up a mobile data acquisition system at the primary care practice level to collect and analyze informally gathered data in support of workflow improvement experiments
  • Developing metadata annotation systems and analytics for user-generated content to enable recommendation systems for personalized content delivery, helping primary care practices move even faster in transforming their workflows

The Person

The ideal candidate will have:

  • Significant experience with data/back end engineering in Ruby on Rails or Python, and SQL databases such as PostgreSQL
  • Significant experience setting up ETL pipelines for multiple enterprise partners, navigating the relationships required to get the job done, and automating data aggregation, validation, visualization and analysis
  • Experience working on a software team with a production application and building data pipeline linkages to web applications (the current app is built on Ruby on Rails, JavaScript and PostgreSQL)
  • Experience creating internal tools for non-technical users that surface data in a usable format for analysis
  • Experience working with healthcare data—electronic health record, claims, survey and other operational/process data
  • Ability to work with a variety of unstructured and structured data sources and create structured data formats and pipelines
  • Ability to quickly identify pain points in user and staff workflow, prototype and deploy solutions and rapidly iterate
  • Ability to quickly learn new languages, frameworks, and data schemas
  • Proficiency in and understanding of integrating data with modern web applications
  • Preferred: experience building data visualizations using frameworks such as d3.js, matplotlib, ggplot and tools like Shiny, Tableau, etc.


  • 2–3 years of data/back end engineering experience utilizing Python or Ruby on Rails, and SQL databases
  • Bachelor's degree in a quantitative field (e.g. computer science, engineering, mathematics, physics) preferred
  • Deep interest in data-enabling technologies, with multiple side projects and interests within the open source community preferred
  • Passion for improving the healthcare system and working within a mission-driven culture

To Apply

We are a dynamic, growing organization that embraces critical thinking, problem solving and innovative ideas. If you have relevant experience and qualifications, please send your resume to