Data engineer

Find out what a data engineer does and the skills you need to do the job.

Updated: July 22, 2022

Introduction to the role of data engineer

Data engineers work closely with product managers, researchers, designers, software engineers, and data scientists to design ways to ingest, optimize, synthesize, and analyze data to help users make informed decisions. They build software and infrastructure to process data from a variety of sources and store it in a way that supports analytical and other needs of end users.

Below you’ll find the full list of skills for becoming a data engineer at Skylight and a description of the skills required for each level. These descriptions offer insight into the scope of work someone at each level should be capable of doing on a consistent basis. We use these role descriptions both as a guide during the hiring process and as a springboard for discussing career progression at Skylight.

Required skills

Data pipelines

You’re familiar with multiple data formats, and can ingest, transform, and enrich data to correct issues and enhance analysis capabilities. You understand how to expose and process data from batch and streaming processes. You have experience with data processing tools, languages, and libraries. You can load data into an array of target databases and structures for further analysis.

Data platforms

You’re familiar with various data storage technologies and performance optimization strategies. You understand that data can be stored and processed on-premises, in the cloud, or in a combination of the two, and you understand the advantages and disadvantages of each approach. You can optimize data storage, memory utilization, and compute resources for both performance and cost.

Data analysis and visualization

You can develop important insights about a collection of data to answer analytical questions and inform decisions about data storage and use. You understand analytic tools and can identify and apply tools and techniques appropriate for the data and use case. You record and analyze problems with the data and can identify appropriate solutions to address them. You use appropriate visualization tools to highlight actionable insights, trends, and recommendations from data.

Coding

You’re comfortable using at least one programming language with common data science-oriented libraries. You employ good coding practices to produce working, readable, reusable, and performant code. You use test-driven development practices when appropriate.

Security, privacy, and compliance

You know how to build secure data pipelines and storage platforms and can work with DevOps, security, and software engineers to implement appropriate safeguards to avoid data breaches. You preserve privacy and security during development, for example by protecting personally identifiable information (PII) and protected health information (PHI). You take steps to protect the security and integrity of this information at every stage of analysis. When appropriate, you use or develop tools to generate representative synthetic data or de-identify real data for analysis.

Strategic and policy thinking

You’re able to engage with stakeholders to identify appropriate and impactful questions to ask of data. You can build and present visual representations of data sets and models to communicate a “story” about data. You use data to communicate ideas and to help create services, business models, workflows, and strategies. You use data to drive decisions. You constantly look for opportunities where data analysis can add value, and you can effectively work with strategists, operations, and leadership to identify and pursue opportunities to make a data-driven impact.

Data engineer career pathway

  1. Associate
  2. I
  3. II
  4. Senior
  5. Staff
  6. Principal

Associate data engineer

As an associate data engineer, you’ll need to have an understanding of the role and show potential. You’ll need (and receive) guidance and training from more experienced team members to produce good work and develop your skills. You’re familiar with data engineering best practices and can articulate them to others.

Skills needed for this level

Data pipelines

You understand the basics of extracting, transforming, and loading data into a data processing pipeline, and you’re learning more. With instruction and support, you’re able to transform data from its initial raw state to an analysis-ready state.

Data platforms

You’re comfortable learning to use tools, frameworks, and languages for data storage, manipulation, and analysis.

Data analysis and visualization

You’re familiar with several basic analysis methods and are learning about others. You’re learning to evaluate how well specific data models support analytic methods based on the characteristics of the models and methods.

Coding

You have experience coding in at least one language or framework. With instruction and support, you can modify basic functionality in an existing environment. You understand the importance of proper code documentation, as well as writing and updating unit tests that validate that your code works. You’re learning how to use version control tools and appreciate their importance.

Security, privacy, and compliance

You understand the importance of protecting privacy when handling data and can follow instructions to ensure private data stays protected while doing your work. You have an awareness of what kinds of information should be protected.

Strategic and policy thinking

You understand the importance of stakeholder buy-in. You’re aware of how the storage, processing, and accessibility of data are essential for achieving goals. You know how data analysis can help inform services and workflows, as well as how it impacts mission outcomes. You work with other team members to deliver insights that answer policy questions.

Data engineer I

At this level, you’ll be expected to have some practical experience, but you’ll still need regular guidance and training to produce your best work and develop your skills. You’re able to complete simple tasks independently and work alongside more senior colleagues on complex tasks.

Skills needed for this level

Data pipelines

You understand how the components of a modern data pipeline fit together. You can interpret and compose data in common data formats such as XML, JSON, and CSV. You can navigate and update data pipelines to perform basic data transformations on specific fields without changing the underlying data schema or structure.

Data platforms

You can use SQL queries to efficiently answer basic questions about data. You understand standard database performance optimization concepts such as query structure, query constraints, and indexes.

Data analysis and visualization

You can apply standard analytic methods (e.g., aggregation, linear regression), and are building proficiency with more sophisticated techniques (e.g., clustering). Additionally, you’re comfortable using visualization tools to communicate your analytical processes and results to technical and non-technical audiences. You’re able to combine these data analysis and visualization skills to answer stakeholder questions about data.

Coding

You can complete simple enhancements and fixes independently and generally deliver working code. You can upload code to a shared repository, respond to reviews, and merge features in a version control system.

Security, privacy, and compliance

You understand how data can be encrypted in transit and at rest and ensure that sensitive data is always sufficiently protected. You understand that sensitive data must be protected during development as well as in distributed solutions. You’re able to use tools that generate synthetic or anonymized data from sensitive data.

Strategic and policy thinking

You understand the importance of stakeholder buy-in. You use data to communicate ideas and to help create services and workflows. You understand how data can impact mission outcomes. You work with other team members to deliver insights to answer policy questions.

Data engineer II

A data engineer II is usually embedded in a multidisciplinary team and is responsible for transforming, storing, and analyzing data. At this level, you’re able to work independently to complete simple and intermediate tasks.

Skills needed for this level

Data pipelines

You’re able to ingest data from a variety of sources and can work independently to perform moderately complex transformations.

Data platforms

You can define and create database schemas and load data into a database. You can compose and run efficient SQL queries, including complex joins and aggregations. You can also query data programmatically using languages and libraries such as R, pandas, and Spark.

Data analysis and visualization

You know most standard analytic methods, and you have some familiarity with advanced methods (e.g., neural nets, transformers, convolution). You can apply, implement, and modify a variety of standard methods to fit most types of data problems you encounter. You’re able to build collections of interactive visualizations and dashboards that tell stories about data and analysis.

Coding

You have experience with multiple programming languages and frameworks. You consistently deliver high-quality code and are comfortable tackling simple and intermediate features without guidance. Your code is self-documenting and easy for others to understand. You can identify problems and opportunities for improvement when reviewing other people’s code and recommend adjustments to correct or improve it. You’re able to estimate how long it will take you to perform basic and intermediate tasks.

Security, privacy, and compliance

You understand how data can be encrypted in transit and at rest and can enable data encryption in a variety of environments and contexts. You’re able to effectively use tools that generate synthetic or anonymized data from sensitive data.

Strategic and policy thinking

You work with stakeholders to cooperatively identify questions that deliver impactful analyses. You can effectively articulate ideas, recommendations, and conclusions. You implement solutions that efficiently and effectively answer key questions.

Senior data engineer

You’re experienced in all major aspects of data engineering and can synthesize a broad set of inputs, constraints, and required outcomes to design and implement data pipelines. You’re able to work independently on complex projects as well as provide guidance and mentorship to junior engineers.

Skills needed for this level

Data pipelines

You can ingest data from a variety of sources and work independently to perform complex transformations. You can move data from one data structure to another and perform filtering and data augmentation by incorporating external data (e.g., API results).

Data platforms

You’re well versed in a variety of data storage formats, such as row-oriented, columnar, and object storage, and can explain the advantages and disadvantages of each format. You understand the differences and tradeoffs between on-premises and cloud infrastructure.

Data analysis and visualization

You’re broadly fluent in standard and advanced methods. You can design or modify standard methods to adapt to the nuances of problems you encounter. You advocate for the simplest and most interpretable model that can solve the problem. You’re able to build complex and performant visualizations that are well tailored for their target audiences. You keep up with journals, publications, and conferences that increase your exposure to new and emerging methods.

Coding

You understand and use software best practices and can teach them to others. You write understandable and well-documented code. You’re adept at providing kind and actionable feedback to other engineers through regular code reviews and 1:1s. You’re able to provide effort estimates not just for yourself, but for the broader team. You know how to build quality in from the start. You write tests, follow DevOps best practices, and understand infrastructure as code.

Security, privacy, and compliance

You can implement end-to-end security, including authentication, authorization, and encryption. You follow and enforce good privacy practices and protect PII in your daily work. You can identify security gaps in your product and related processes and work with other engineers and leaders to close them. You can create and use data synthesis and anonymization tools to minimize the use of PII during development and testing.

Strategic and policy thinking

You’re an expert at using data to communicate ideas and to help create new services, business models, workflows, and strategies. You look for ways to increase data literacy, and you can effectively communicate how data-driven decisions might provide increased benefits.

Staff data engineer

A staff data engineer leads and aligns data engineering activities across several teams. You’re a prominent voice on your teams and within the organization, advocating for, exemplifying, and ensuring good data engineering practices. You can succinctly convey advanced data engineering concepts and are instrumental in helping your team grow and overcome challenges.

Skills needed for this level

Data pipelines

You’re able to troubleshoot and solve most challenging data ingestion and transformation problems. You can design, configure, and document efficient data pipelines from the ground up. You help project teams solve complex data pipeline efficiency and scalability problems.

Data platforms

You can synthesize the details and peculiarities of complex data sets to design highly efficient systems that meet budgetary and functional requirements. You apply knowledge in a broad array of established and modern data storage platforms and tools. You can propose and persuasively explain choices around data platforms to team members and stakeholders.

Data analysis and visualization

You can identify issues in standard and advanced data analysis and modeling approaches, as well as develop customized solutions to address these challenges. You support colleagues across your project in solving analysis problems and can clearly communicate suggestions to help them with their work. You guide your team’s data communication and visualization strategy to ensure your work is represented accurately and is accessible to stakeholders.

Coding

You produce exceptional code that is working, well tested, and performant. You provide comprehensive and valuable feedback to other engineers on incremental changes. You monitor the broader codebase and ensure that it stays clean and well organized.

Security, privacy, and compliance

You’re responsible for data security across multiple projects and can mentor other engineers as needed. You know current security best practices around data management and can effectively communicate to your product and engineering teams how and why to implement them. You stay up to date with data-related vulnerabilities that could impact your solutions, and work with DevOps, security, and software engineers to address them. You define and promote processes within your team that ensure PII is protected during development and testing.

Strategic and policy thinking

You work with stakeholders to cooperatively identify meaningful questions that can deliver impactful analysis. You use data to communicate ideas and to help create new services, business models, workflows, and strategies. You can effectively communicate how data-driven decisions might provide increased benefits.

Principal data engineer

A principal data engineer leads data engineers in an organization and attracts and builds talent. You’ll define and ensure best practices, influence organizational strategy and priorities, and provide support for all levels of data engineers and other staff.

Skills needed for this level

Data pipelines

You work across teams to design and implement highly efficient data pipelines. You’re well versed in a broad spectrum of data pipeline tools, technologies, and strategies. You use this expertise to provide mentorship and guidance to the rest of Skylight.

Data platforms

You stay up to date with cutting-edge data storage technologies and approaches. You develop these solutions across projects regardless of their technical composition. Your expertise helps the company and engineering teams make sound decisions regarding storage platforms and best practices. You can quickly identify performance bottlenecks and come up with options to remediate them. You’re comfortable working with infrastructure both in and outside cloud environments.

Data analysis and visualization

You demonstrate mastery of standard, advanced, and custom models. You know how to tailor any model to fit the problem, and you can explain the comparative advantages and disadvantages of choosing one model over another. You have mastered the technical and non-technical expression of modeling details. You can clearly communicate analytical problems and succinctly convey and deliver analytic recommendations within the context of client or policy outcomes. You establish best practices for data modeling, visualization, and communication, and encourage adherence across the company by working with individual team leads.

Coding

You make sure that Skylight’s data engineers are aware of emerging and established practices and technologies. You can analyze tradeoffs and give sound guidance across Skylight teams to apply the most appropriate approach. You define organizational standards for code quality and best practices.

Security, privacy, and compliance

You’re an expert in data engineering-related security best practices and keep up to date with current guidance and emerging vulnerabilities. You make sure that project teams across the organization are aware of critical security vulnerabilities and follow up with project leads and Skylight leadership as appropriate.

Strategic and policy thinking

You use data to proactively ask meaningful questions to project leads and stakeholders across the company. You work with stakeholders to determine priorities, questions, and outcomes that can be addressed with data. You collaborate with leadership, strategists, and policymakers to incorporate data at every project stage. You identify, pursue, and deliver on new opportunities to leverage data-driven thinking to achieve mission outcomes.
