Updated: October 26, 2022
Introduction to the role of data engineer
Data engineers work closely with product managers, researchers, designers, software engineers, and data scientists to design ways to ingest, optimize, synthesize, and analyze data to help users make informed decisions. They build software and infrastructure to process data from a variety of sources and store it in a way that supports the analytical and other needs of end users.
Below you’ll find the full list of skills for becoming a data engineer at Skylight and a description of the skills required for each level. These descriptions offer insight into the scope of work someone at each level should be capable of doing on a consistent basis. We use these role descriptions both as a guide during the hiring process and as a springboard for discussing career progression at Skylight.
Required skills
Data pipelines
You’re familiar with multiple data formats and ingest, transform, and enrich data to correct issues and enhance analysis capabilities. You expose and process data through both batch and streaming processes. You have experience with data processing tools, languages, and libraries. You load data into a variety of target databases and structures for further analysis.
Data platforms
You’re familiar with various data storage technologies and performance optimization strategies. You understand that data must be stored and processed on computers, either on-premises, in the cloud, or a combination thereof, and you understand the advantages and disadvantages of each approach. You optimize data storage, memory utilization, and compute resources for both performance and cost.
Data analysis and visualization
You develop important insights from a collection of data to answer analytical questions and inform decisions about data storage and use. You understand analytic tools, and you identify and apply the tools and techniques appropriate for the data and use case. You record and analyze problems with the data and identify appropriate solutions to address them. You use appropriate visualization tools to highlight actionable insights, trends, and recommendations from data.
Coding
You’re comfortable using at least one programming language with common data science-oriented libraries. You employ good coding practices to produce working, readable, reusable, and performant code. You use test-driven development practices when appropriate.
Security, privacy, and compliance
You know how to build secure data pipelines and storage platforms and work with DevOps, security, and software engineers to implement appropriate safeguards against data breaches. You preserve privacy and security during development, for example by protecting personally identifiable information (PII) and protected health information (PHI). You take steps to protect the security and integrity of this information at every stage of analysis. When appropriate, you use or develop tools to generate representative synthetic data or de-identify real data for analysis.
Strategic and policy thinking
You engage with stakeholders to identify appropriate and impactful questions to ask of data. You build and present visual representations of data sets and models to communicate a “story” about data. You use data to communicate ideas and to help create services, business models, workflows, and strategies. You use data to drive decisions. You constantly look for opportunities where data analysis adds value, and you effectively work with strategists, operations, and leadership to identify and pursue opportunities to make a data-driven impact.
Data engineer career pathway
Associate data engineer
As an associate data engineer, you’ll need to have an understanding of the role and show potential. You’ll need (and receive) guidance and training from more experienced team members to produce good work and develop your skills. You’re familiar with data engineering best practices and articulate them to others.
Skills needed for this level
Data pipelines
You understand the basics of, and are learning more about, how to extract, transform, and load data into a data processing pipeline. With instruction and support, you transform data from its initial raw state into an analysis-ready state.
Data platforms
You’re comfortable learning to use tools, frameworks, and languages for data storage, manipulation, and analysis.
Data analysis and visualization
You’re familiar with several basic analysis methods and are learning about others. You’re learning to evaluate how well specific data models support analytic methods based on the characteristics of the models and methods.
Coding
You have experience coding in at least one language or framework. With instruction and support, you modify basic functionality in an existing environment. You understand the importance of proper code documentation, as well as of writing and updating unit tests that validate that your code works. You’re learning how to use version control tools and appreciate their importance.
Security, privacy, and compliance
You understand the importance of protecting privacy when handling data and follow instructions to ensure private data stays protected while doing your work. You’re aware of what kinds of information should be protected.
Strategic and policy thinking
You understand the importance of stakeholder buy-in. You’re aware of how the storage, processing, and accessibility of data are essential for achieving goals. You know how data analysis informs services and workflows, as well as how it impacts mission outcomes. You work with other team members to deliver insights that address policy considerations.
Data engineer I
At this level, you’ll be expected to have some practical experience but still need regular guidance and training to produce your best work and develop your skills. You complete simple tasks independently and work alongside more senior colleagues on complex tasks.
Skills needed for this level
Data pipelines
You understand how the components of a modern data pipeline fit together. You interpret and compose data in common data formats such as XML, JSON, and CSV. You navigate and update data pipelines to perform basic data transformations on specific fields without changing the underlying data schema or structure.
Data platforms
You use SQL queries to efficiently answer basic questions about data. You understand standard database performance optimization concepts such as query structure, query constraints, and indexes.
Data analysis and visualization
You apply standard analytic methods (e.g., aggregation, linear regression), and are building proficiency with more sophisticated techniques (e.g., clustering). Additionally, you’re comfortable using visualization tools to communicate your analytical processes and results to technical and non-technical audiences. You combine these data analysis and visualization skills to answer stakeholder questions about data.
Coding
You complete simple enhancements and fixes independently and generally deliver working code. You upload code to a shared repository, respond to reviews, and merge features in a version control system.
Security, privacy, and compliance
You encrypt data in transit and at rest and ensure that sensitive data is always sufficiently protected. You understand that sensitive data must be protected during development as well as in distributed solutions. You use tools that generate synthetic or anonymized data from sensitive data.
Strategic and policy thinking
You understand the importance of stakeholder buy-in. You use data to communicate ideas and to help create services and workflows. You understand how data impacts mission outcomes. You work with other team members to deliver insights to answer policy questions.
Data engineer II
A data engineer II is usually embedded in a multidisciplinary team and is responsible for transforming, storing, and analyzing data. At this level, you work independently to complete simple and intermediate tasks.
Skills needed for this level
Data pipelines
You ingest data from a variety of sources and work independently to perform moderately complex transformations.
Data platforms
You define and create database schemas and load data into a database. You compose and run efficient SQL queries, including complex joins and aggregations. You also perform queries in code (e.g., with R, pandas, or Spark).
Data analysis and visualization
You know most standard analytic methods, and you have some familiarity with advanced methods (e.g., neural nets, transformers, convolution). You apply, implement, and modify a variety of standard methods to fit most types of data problems you encounter. You build collections of interactive visualizations and dashboards that tell stories about data and analysis.
Coding
You have experience with multiple programming languages and frameworks. You consistently deliver high-quality code and are comfortable tackling simple and intermediate features without guidance. Your code is self-documenting and easy for others to understand. You identify problems and opportunities for improvement when reviewing other people’s code and recommend adjustments to correct or improve it. You estimate how long it will take you to perform simple and intermediate tasks.
Security, privacy, and compliance
You encrypt data in transit and at rest and enable data encryption in a variety of environments and contexts. You effectively use tools that generate synthetic or anonymized data from sensitive data.
Strategic and policy thinking
You work with stakeholders to cooperatively identify questions that deliver impactful analyses. You effectively articulate ideas, recommendations, and conclusions. You implement solutions that efficiently and effectively answer key questions.
Senior data engineer
You’re experienced in all major aspects of data engineering and synthesize a broad set of inputs, constraints, and required outcomes to design and implement data pipelines. You work independently on complex projects as well as provide guidance and mentorship to junior engineers.
Skills needed for this level
Data pipelines
You ingest data from a variety of sources and work independently to perform complex transformations. You move data from one data structure to another and perform filtering and data augmentation by incorporating external data (e.g., API results).
Data platforms
You’re well versed in a variety of data storage formats, such as row-oriented, columnar, and object storage, and explain the advantages and disadvantages of each format. You understand the differences and tradeoffs between on-premises and cloud infrastructure.
Data analysis and visualization
You’re broadly fluent in standard and advanced methods. You design or modify standard methods to adapt to the nuances of problems you encounter. You advocate for the simplest and most interpretable model that solves the problem. You build complex and performant visualizations that are well tailored for their target audiences. You keep up with journals, publications, and conferences that increase your exposure to new and emerging methods.
Coding
You understand and use software best practices and teach them to others. You write understandable and well-documented code. You’re adept at providing kind and actionable feedback to other engineers through regular code reviews and 1:1s. You provide effort estimates not just for yourself, but for the broader team. You build quality in from the start. You write tests, follow DevOps best practices, and understand infrastructure as code.
Security, privacy, and compliance
You implement end-to-end security, including authentication, authorization, and encryption. You follow and enforce good privacy practices and protect PII in your daily work. You identify security gaps in your product and related processes and work with other engineers and leaders to close them. You create and use data synthesis and anonymization tools to minimize the use of PII during development and testing.
Strategic and policy thinking
You’re an expert at using data to communicate ideas and to help create new services, business models, workflows, and strategies. You look for ways to increase data literacy, and you effectively communicate how data-driven decisions might provide increased benefits.
Staff data engineer
A staff data engineer leads and aligns data engineering activities across several teams. You’re a prominent voice on your teams and within the organization, advocating for, exemplifying, and ensuring good data engineering practices. You succinctly convey advanced data engineering concepts and are instrumental in helping your team grow and overcome challenges.
Skills needed for this level
Data pipelines
You troubleshoot and solve most of the challenging data ingestion and transformation problems you encounter. You design, configure, and document efficient data pipelines from the ground up. You help project teams solve complex data pipeline efficiency and scalability problems.
Data platforms
You synthesize the details and peculiarities of complex data sets to design highly efficient systems that meet budgetary and functional requirements. You apply knowledge of a broad array of established and modern data storage platforms and tools. You propose and persuasively explain choices around data platforms to team members and stakeholders.
Data analysis and visualization
You identify issues in standard and advanced data analysis and modeling approaches, as well as develop customized solutions to address these challenges. You support colleagues across your project in solving analysis problems and clearly communicate suggestions to help them with their work. You guide your team’s data communication and visualization strategy to ensure your work is represented accurately and is accessible to stakeholders.
Coding
You produce exceptional code that is working, well-tested, and performant. You provide comprehensive and valuable feedback to other engineers on incremental changes. You monitor the broader codebase and ensure that it stays clean and well-organized.
Security, privacy, and compliance
You’re responsible for data security across multiple projects and mentor other engineers as needed. You know current security best practices around data management and effectively communicate to your product and engineering teams how and why to implement them. You stay up to date with data-related vulnerabilities that could impact your solutions, and work with DevOps, security, and software engineers to address them. You define and promote processes within your team that ensure PII is protected during development and testing.
Strategic and policy thinking
You work with stakeholders to cooperatively identify meaningful questions that deliver impactful analysis. You use data to communicate ideas and to help create new services, business models, workflows, and strategies. You effectively communicate how data-driven decisions might provide increased benefits.
Principal data engineer
A principal data engineer leads data engineers in an organization and attracts and builds talent. You’ll define and ensure best practices, influence organizational strategy and priorities, and provide support for all levels of data engineers and other staff.
Skills needed for this level
Data pipelines
You work across teams to design and implement highly efficient data pipelines. You’re well-versed in a broad spectrum of data pipeline tools, technologies, and strategies. You use this expertise to provide mentorship and guidance to the rest of Skylight.
Data platforms
You stay up to date with cutting-edge data storage technologies and approaches. You develop these solutions across projects regardless of their technical composition. Your expertise helps the company and engineering teams make sound decisions regarding storage platforms and best practices. You quickly identify performance bottlenecks and come up with options to remediate them. You’re comfortable working with infrastructure both in and outside cloud environments.
Data analysis and visualization
You demonstrate mastery over standard, advanced, and custom models. You tailor any model to fit the problem, and you explain the comparative advantages and disadvantages of choosing one model over another. You have mastered both the technical and non-technical expression of modeling details. You clearly communicate analytical problems and succinctly convey and deliver analytic recommendations within the context of client or policy outcomes. You establish best practices for data modeling, visualization, and communication approaches, and encourage adherence across the company by working with individual team leads.
Coding
You make sure that Skylight’s data engineers are aware of emerging and established practices and technologies. You analyze tradeoffs and give sound guidance across Skylight teams to apply the most appropriate approach. You define organizational standards for code quality and best practices.
Security, privacy, and compliance
You’re an expert in data engineering-related security best practices and keep up to date with current guidance and emerging vulnerabilities. You make sure that project teams across the organization are aware of critical security vulnerabilities and follow up with project leads and Skylight leadership as appropriate.
Strategic and policy thinking
You use data to proactively ask meaningful questions of project leads and stakeholders across the company. You work with stakeholders to determine priorities, questions, and outcomes that can be addressed with data. You collaborate with leadership, strategists, and policymakers to incorporate data at every project stage. You identify, pursue, and deliver on new opportunities to leverage data-driven thinking to achieve mission outcomes.