Data scientist

Find out what a data scientist does and the skills you need to do the job.

Updated: February 18, 2021

Introduction to the role of data scientist

Data scientists organize, synthesize, and analyze data to help users make informed decisions. They apply a variety of analytical models to find meaningful insights in both structured and unstructured data, and they communicate these insights in an easily digestible format.

Below you’ll find the full list of skills for becoming a data scientist at Skylight and a description of the skills required for each level. These descriptions offer insight into the scope of work someone at each level should be capable of doing on a consistent basis. We use these role descriptions both as a guide during the hiring process and as a springboard for discussing career progression at Skylight.

Required skills

Strategic and policy thinking

You’re able to engage with stakeholders to identify appropriate and impactful questions to ask of data. You use data to communicate ideas and to help create services, business models, workflows, and strategies. You use data to drive decisions. You constantly look for opportunities where data analysis can add value, and you can effectively work with strategists, operations, and leadership to identify and pursue opportunities to make a data-driven impact.

Data engineering

You’re familiar with numerous techniques to clean and pre-process data from a variety of sources in a variety of forms, whether it’s structured or unstructured, textual or numeric. You’re comfortable with commonly used packages for cleaning and manipulating data. You’re proficient at data modeling. You understand the appropriate level of granularity at which to analyze a data set.

Anonymizing for analysis

You preserve privacy and security when performing data analysis (e.g., protecting personally identifiable information (PII) and protected health information (PHI)). You take steps to protect the security and integrity of this information at every stage of analysis. When appropriate, you “de-identify” any and all data or analysis before presenting or sharing it.

Analysis and model development

You’re widely versed in a variety of analytic methods and can choose appropriate techniques to analyze a data set. You actively work to stay informed of the latest modeling algorithms and data science trends.

You can design, prototype, and build credible, effective, and accessible statistical models. You apply your broad understanding of data science algorithms, including machine learning and artificial intelligence (AI) algorithms, to develop solutions to complex data problems.

Presenting analysis and results

You can articulately communicate your data, models, and findings to technical and non-technical audiences. You put aside the jargon to distill modeling problems to their essence, and you’re able to succinctly explain the problem, its solution, and the valuable insights you gained.

Visualizing a data narrative

You can build and present visual illustrations of data sets and models to communicate a “story” about data. You use appropriate visualization tools to highlight actionable insights, trends, and recommendations from data.

Data championing

You’re the first and strongest advocate for your data. You champion leading practices within data science, including transparency and accessibility, in your team and others. You evangelize data-driven decision-making and look for ways to make data more democratic.

Data scientist career pathway

Associate data scientist

  1. Associate
  2. I
  3. II
  4. Senior
  5. Staff
  6. Principal

As an associate data scientist in an entry-level role, you’ll need to have an understanding of the role and show potential. You’ll need (and receive) guidance and training from more experienced data scientists to produce good work and develop your skills.

Skills needed for this level

Strategic and policy thinking

You understand the importance of stakeholder buy-in to data science. You’re aware of how data analysis can help inform services and workflows, as well as how it impacts mission outcomes. You work with other team members to deliver insights to answer policy considerations.

Data engineering

You understand some methods of pre-processing data to create the most appropriate form for analysis. You can recognize structured vs. unstructured data. You have an awareness of the level of granularity a particular analysis may require.

Anonymizing for analysis

You understand the importance of protecting privacy when analyzing data. You have an awareness of what kinds of information should be protected.

Analysis and model development

You know many standard analytic methods (e.g., regression, clustering), and have an awareness of some advanced methods. You know how to pick the right standard models to match the problem. You can use analytic methods to answer stakeholder questions about data. You can implement typical out-of-the-box algorithms, such as K-Means, using appropriate tools and packages.

Presenting analysis and results

You can articulate the technical decisions that went into a model’s design and provide statistical interpretation of the results you obtained.

Visualizing a data narrative

You have a basic ability to use fundamental visualization tools to generate charts or other information about your models. You primarily use tables, line/bar graphs, and other simple visualizations to communicate analysis results.

Data championing

You’re familiar with data science best practices and can articulate them to others. You practice transparency and accessibility in your analytic work.

Data scientist I

A data scientist I is embedded in a multidisciplinary team. At this level, you’ll be expected to have some practical experience, but you’ll still need regular guidance and training to produce your best work and develop your skills. You’ll work alongside a more senior data scientist.

Skills needed for this level

Strategic and policy thinking

You understand the importance of stakeholder buy-in to data science. You use data to communicate ideas and to help create services and workflows. You understand how data can impact mission outcomes. You work with other team members to deliver insights to answer policy questions.

Data engineering

You understand some methods of pre-processing data to create the most appropriate form for analysis. You can recognize structured vs. unstructured data. You have an awareness of what level of granularity an analysis may require.

Anonymizing for analysis

You know when to anonymize data by removing PII and other protected demographic information, such as date of birth, name, or unique identifiers (e.g., Social Security numbers), from the data to be analyzed.
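As a rough illustration of the step described above, a minimal Python sketch might drop direct identifiers from each record before analysis. The field names (name, date_of_birth, ssn) and the drop_pii helper are hypothetical, chosen only for this example. Removing columns is usually just one part of de-identification.

```python
# Hypothetical field names; a real data set needs its own review of what counts as PII.
PII_FIELDS = {"name", "date_of_birth", "ssn"}


def drop_pii(records, pii_fields=PII_FIELDS):
    """Return copies of the records with the listed identifier fields removed."""
    return [
        {key: value for key, value in record.items() if key not in pii_fields}
        for record in records
    ]


# Made-up example records, not real data.
records = [
    {"name": "A. Example", "date_of_birth": "1980-01-01", "ssn": "000-00-0000", "visits": 3},
    {"name": "B. Example", "date_of_birth": "1975-06-15", "ssn": "000-00-0001", "visits": 5},
]

cleaned = drop_pii(records)
print(cleaned)
```

The original records are left untouched, so the identified data can stay in a restricted location while only the cleaned copy moves on to analysis.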

Analysis and model development

You know most standard analytic methods, and you have some familiarity with some advanced methods (e.g., neural nets, transformers, convolution). You can apply, implement, and modify a variety of standard methods to fit most types of data problems you encounter.

You understand how to chain models and customize hyperparameters to improve performance without sacrificing interpretability.

Presenting analysis and results

You can clearly communicate the rationale behind modeling decisions you’ve made as well as explain broad trends or patterns you discovered in the data. You’re able to express these in plain language.

You can use analytic methods to answer stakeholder questions about data.

Visualizing a data narrative

You can use common data visualization tools to communicate a basic story about data. You have an awareness of complicated visualization tools, such as geospatial representations.

Data championing

You advocate for broadly used best practices for data collection and analysis within and across teams. You understand the value of data to the modern organization and can effectively advocate for the use of data to drive decision-making.

Data scientist II

A data scientist II is usually embedded in a multidisciplinary team and is responsible for analyzing and presenting data. At this level, you’ll be expected to work independently on a team.

Skills needed for this level

Strategic and policy thinking

You work with stakeholders to cooperatively identify meaningful questions that can deliver impactful analysis. You use data to communicate ideas and to help create new services, business models, workflows, and strategies. You know the role that data plays in various project ecosystems. You look for ways to increase data literacy, and you can effectively communicate how data-driven decisions might provide increased benefits.

Data engineering

You know how to apply appropriate cleaning methods to a variety of input data formats. You know how to pre-process in robust, fault-tolerant ways that create streamlined downstream data.

Anonymizing for analysis

You’re able to anonymize data by removing PII and other protected demographic information, such as date of birth, name, or unique identifiers (e.g., Social Security numbers), from the data to be analyzed.

Analysis and model development

You’re well versed in both standard and advanced methods. You’re able to design, tweak, or customize standard methods to change nuances like objective function or minimization criteria to fit most problems you encounter.

You understand how to chain models and customize hyperparameters to improve performance without sacrificing interpretability.

Presenting analysis and results

You can distill modeling problems down to salient highlights and use these as a framework for communication. You can effectively describe your model and your results to most audiences.

You can use technical and non-technical language to get to the heart of a data-driven question, and you can effectively translate this essence into an actionable outcome.

Visualizing a data narrative

You know how to choose and combine visualizations to best tell a story. You can leverage features like annotations, tool tips, and various visual design strategies to highlight important points in the data narrative.

Data championing

You advocate for broadly used best practices for data collection and analysis within and across teams. You understand the value of data to the modern organization and can effectively advocate for the use of data to drive decision-making.

Senior data scientist

A senior data scientist is an experienced practitioner who’s able to synthesize a broad suite of data from complex cases into models that create actionable insights. At this level, you’ll be expected to:

  • build collection or analysis systems that operate as part of an organization’s data ecosystem
  • align data science activities with wider plans to inform a service proposition
  • supervise and develop other data scientists to improve data practice

Skills needed for this level

Strategic and policy thinking

You work with stakeholders to cooperatively identify meaningful questions that can deliver impactful analysis. You’re an expert at using data to communicate ideas and to help create new services, business models, workflows, and strategies. You know the role that data plays in various project ecosystems. You look for ways to increase data literacy, and you can effectively communicate how data-driven decisions might provide increased benefits.

Data engineering

You’re an expert at cleaning and manipulating structured and unstructured data while also integrating and unifying different data formats. You’re comfortable building complex data models. You use the desired outcome of a research question to guide granularity.

Anonymizing for analysis

You de-identify data by masking or eliminating PII and personal demographic information. You understand the nuances of how summary views of data can be used to re-identify individuals and their features, and you take steps to suppress or mask this information.
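One common way to reduce re-identification risk in summary views is small-cell suppression: counts below a chosen threshold are masked so that rare combinations cannot be traced back to individuals. The sketch below is only an illustration under stated assumptions. The suppressed_counts helper, the ZIP code values, and the threshold of 5 are hypothetical, and the right threshold depends on the data and any applicable policy.

```python
from collections import Counter


def suppressed_counts(values, min_cell_size=5):
    """Count each category, replacing counts below min_cell_size with None."""
    counts = Counter(values)
    return {
        category: (count if count >= min_cell_size else None)
        for category, count in counts.items()
    }


# Made-up example values: one category is rare enough to single people out.
zip_codes = ["20001"] * 12 + ["20002"] * 7 + ["20003"] * 2

summary = suppressed_counts(zip_codes)
print(summary)  # {'20001': 12, '20002': 7, '20003': None}
```

Masking a single cell is not always enough. If a grand total is published alongside the table, a lone suppressed count can be back-calculated, so additional cells or the total itself may also need to be masked.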

Analysis and model development

You’re broadly fluent in standard and advanced methods. You opt for the simplest, most interpretable model that can solve the problem, and you aim for the smallest amount of customization necessary to achieve results. You keep up with journals, publications, and conferences that increase your exposure to new and emerging methods.

You can design models that optimize for different criteria based on the desired outcome. You understand how to balance statistical performance with communicability and interpretability of a model. You can implement, modify, or customize tailor-made models to address most data problems, including those requiring neural networks or AI.

Presenting analysis and results

You can distill modeling problems down to salient highlights and use these as a framework for communication. You can effectively describe your model and your results to most audiences.

You can use technical and non-technical language to get to the heart of a data-driven question, and you can effectively translate this essence into an actionable outcome.

Visualizing a data narrative

You know how to choose and combine visualizations to best tell a story, leveraging features like annotations, tool tips, and various visual design strategies to highlight important points.

Data championing

You champion best practices for data science within and across teams, leading by example and pushing for growth. You look for ways to engage senior leadership or decision-makers in data-driven culture.

Staff data scientist

A staff data scientist is an expert practitioner, leading and aligning data scientist activities across several teams. At this level, you’ll be expected to:

  • ensure that organizations take a data-driven, analytical approach to service design and delivery, as well as policy conversations
  • develop and assure good data science practice

Skills needed for this level

Strategic and policy thinking

You use data to proactively ask meaningful questions of stakeholders. You can engage stakeholders in technical and non-technical dialogue to determine priorities, questions, and outcomes that can be addressed with data. You lead efforts to incorporate data analytics at every level of a project. You can effectively communicate to client leadership how data will make an impact. You can identify opportunities for future growth in data-driven practices.

Data engineering

You can coach teams through data engineering and are an expert at cleaning and manipulating messy data. You use the desired outcome of a research question to guide granularity.

Anonymizing for analysis

You de-identify data by masking or eliminating PII and personal demographic information. You understand the nuances of how summary views of data can be used to re-identify individuals and their features, and you take steps to suppress or mask this information.

Analysis and model development

You’re broadly fluent in standard and advanced methods. You opt for the simplest, most interpretable model that can solve the problem, and you understand the smallest degree of tweaks or hand-spinning necessary to achieve your desired result. You actively consume journals, publications, and conferences to stay abreast of the latest in data science methods.

You can design models that optimize for different criteria based on the desired outcome. You understand how to balance statistical performance with communicability and interpretability of a model. You can implement, modify, or customize tailor-made models to address most data problems, including those requiring neural networks or AI.

Presenting analysis and results

You have mastered the technical and non-technical expression of modeling details. You can motivate a data problem, succinctly convey a solution, and deliver analytic recommendations within the context of client or policy outcomes.

Visualizing a data narrative

You’re fluent in data-driven visual design and apply these skills at every level of visualization. You can build dashboards to present summary views of data or give an exploratory deep-dive. You’re adept at visualizing complex geospatial and time-series data using more advanced visualization tooling.

Data championing

You champion best practices for data science within and across teams, leading by example and pushing for growth. You look for ways to engage senior leadership or decision-makers in data-driven culture.

Principal data scientist

A principal data scientist leads data scientists in an organization and attracts and builds talent. At this level, you’ll be expected to be an expert practitioner who can define and assure best practice, influence organizational strategy and priorities, and collaborate with colleagues across government.

Skills needed for this level

Strategic and policy thinking

You use data to proactively ask meaningful questions of stakeholders. You can engage stakeholders in technical and non-technical dialogue to determine priorities, questions, and outcomes that can be addressed with data. You collaborate with leadership, strategists, and policy-makers alike to incorporate data at every step of the way. You can identify, pursue, and deliver on new opportunities to leverage data-driven thinking to make an impact on mission outcomes.

Data engineering

You have a thorough mastery of data processing techniques suitable for data in any structure and input format. You know how to apply these techniques in a robust, fault-tolerant way. You employ data-integration best practices to choose appropriate analytic granularity.

Anonymizing for analysis

Protecting data is at the core of your analyses. You anonymize data by eliminating PII and PHI, and you mask the ability to back-calculate individual records from summary views. You ensure at every level of analysis that private information is suppressed, masked, or eliminated as may be appropriate.

Analysis and model development

You have demonstrable mastery over standard, advanced, and custom models. You know how to tailor any model to fit the problem, and you can explain the comparative advantages and disadvantages of choosing one model over another. You advise teams on recommendations for the most suitable ways to analyze data. You continually strive to understand more about the field of data science.

You’re an expert in the theory and practice of model design, creating blends of off-the-shelf and custom models to address any data problem. You can articulate tradeoffs in model design decisions, and you can scale modeling to even the deepest learning problems.

Presenting analysis and results

You have mastered the technical and non-technical expression of modeling details. You can motivate a data problem, succinctly convey a solution, and deliver analytic recommendations within the context of client or policy outcomes.

Visualizing a data narrative

You’re fluent in data-driven visual design and apply these skills at every level of visualization. You can build dashboards to present summary views of data or give an exploratory deep-dive. You understand and can work with complex visualization engines, including those that involve geospatial or time-series data.

Data championing

You’re a true evangelist for data-driven decision-making, analysis, and culture. You use data science best practices to create opportunities to learn and grow within organizations, leading to digital transformation.
