What do you think being a data scientist is about?

I think being a data scientist involves analyzing data with statistics in order to make recommendations to companies or scientists about their research or goals. Doing so requires a strong background in both statistics and programming.

The roles of data scientist, machine learning engineer, statistician, and data analyst have overlapping and often confusing definitions, but their goals are connected and their work benefits one another. Data scientists create models that machine learning engineers modify so that they can handle large amounts of real-time data (Zola, 2019). Data scientists and statisticians rely on each other because data scientists need to understand statistics while statisticians need to know how to model data (Data Scientists Versus Statisticians, 2019). Lastly, data scientists pose questions that a data analyst may find answers to (Sasikumar, 2022). This overlap can make it difficult to define the differences between these fields.

What differences/similarities do you see between data scientists and statisticians?

There is some confusion about the definitions of “data scientist” and “statistician” because both roles require a strong background in statistics in order to work with complex data, and both share the same goal: to learn from data (Data Scientists Versus Statisticians, 2019). In fact, even people applying for data scientist jobs may not understand what a data scientist is (Megahan, 2022).

Data scientists use models and machine learning to analyze large datasets, while statisticians start with simple models and build on them by checking and addressing their assumptions, typically to analyze smaller, more traditional datasets such as the results of an experiment. In addition, the machine learning that data scientists use does not rely on measurements of uncertainty, unlike the modeling statisticians use (Data Scientists Versus Statisticians, 2019). Statisticians also focus more on predicting the relationships between variables. In practice, data scientists may be more useful to companies that rely on algorithms to reach their customers, like Facebook, because they can make data-based recommendations with the company’s goals and products in mind (Megahan, 2022). In terms of public opinion, the work that data scientists do is easier to understand than the work of statisticians because data scientists follow a logical process that tells a story about the data, whereas statisticians rely on complex ideas like margins of error (Data Scientists Versus Statisticians, 2019).

How do you view yourself in relation to these two areas?

I currently use R in my job as an ORISE research intern at the U.S. Environmental Protection Agency to analyze and visualize data. I feel that I am more of a data scientist because I work with multiple datasets (i.e., cell-based in vitro data, zebrafish developmental toxicity data, transcriptomic data, and literature mining data) to generate hypotheses about the potential mechanisms of chemicals with selective toxicity, that is, chemicals that produce developmental effects at concentrations lower than those at which cell stress is produced.

Bibliography:

Medium. 2019. Data Scientists Versus Statisticians. [online] Available at: https://medium.com/odscjournal/data-scientists-versus-statisticians-8ea146b7a47f [Accessed 21 May 2022].

Megahan, J., 2022. What is data science vs. statistics? - The Signal. [online] Mixpanel. Available at: https://mixpanel.com/blog/this-is-the-difference-between-statistics-and-data-science/ [Accessed 21 May 2022].

Zola, A., 2019. [online] Springboard.com. Available at: https://www.springboard.com/blog/data-science/machine-learning-engineer-vs-data-scientist/ [Accessed 21 May 2022].

Sasikumar, S., 2022. Data Science vs. Data Analytics vs. Machine Learning: Expert Talk. [online] Available at: https://www.simplilearn.com/data-science-vs-data-analytics-vs-machine-learning-article [Accessed 21 May 2022].

