Data Science Interviews and Possible Questions

İrem Kömürcü
May 23, 2022

In January 2022, I went through a fast job search and gained plenty of interview experience. Because so many people in the community asked, I wrote this article to help with your own job search and to answer questions about my career journey.

For those wondering before we start: in February I began working as a Data Scientist at Deloitte, one of the companies known as the “Big Four” ❤

I interviewed at more than 10 companies for Data Scientist positions, as well as Machine Learning Engineer and Computer Vision Engineer positions. They included big global brands, startups, and freelance jobs.

Photo by Maranda Vandergriff on Unsplash

Is Every Interview Process the Same?

Across all the interviews I attended, the processes differed from one another, but the questions and general expectations were similar.

Some companies start with an HR/behavioral phase and continue with a technical interview and a take-home assignment; others start directly with a technical interview and then continue with an HR interview. Some recruitment processes have 6 stages, while others end after 1 or 2. Not every company sends a take-home assignment; some prefer live coding instead.

A seemingly easy question: Can you tell me a little about yourself?

The first question that came up in all the interviews I attended was “Can you tell me a little about yourself?” Sounds easy, right? In my experience, this question sets the course of the whole interview.

Interview questions largely build on your previous experience: What are your areas of interest? Have you launched a project in this area? How many people did you work with? Which programming language or framework did you use? Is the project live now? This is the first stage of the interview.

In other words, the key information you give when describing yourself and your work can determine the questions that follow.

I suggest introducing yourself confidently, using technical terms when you talk about your projects, and explaining them fluently and clearly.

I would also point out that command of the subject you describe and ownership of your past projects and work are important for making a good impression at this first stage.

Technical Interview Questions

In the first part of the technical questions, I was asked about the technical details of the projects I had described when answering “Can you tell me a little about yourself?” Be realistic and honest about your projects, because you can be sure that questions will come from them.

For example, I interviewed for a Data Scientist position at a company that works only with structured data. I was asked many Computer Vision questions because my previous projects were based on computer vision. Even though I would not be working on Computer Vision there, I’m sure they asked in order to test the accuracy of what I had said.

In all the interviews, there were many questions testing whether I understood the math behind the work. Many more questions than the ones listed below were asked, in different scenarios and at levels from easy to hard, so I recommend researching these topics in more detail. Here are some sample questions:

  • Why don’t we use the ReLU activation function in the output layer?
  • Given a particular scenario, which activation function would you prefer in the output layer? Why?
  • Which metrics did you use in your projects?
  • Can you distinguish between metrics such as Precision, Recall, and F1 Score? What are the differences? What metrics would you use in the XXX scenario?
  • What is dropout? How is the output affected when dropout is always applied to the same value?
  • How does our Confusion Matrix change when we increase the threshold from 50% to 80%?
  • Do you know ML/DL algorithms? (You are not expected to know all the algorithms in this section, but knowing the basic algorithms, in particular, earns plus points. I received questions about algorithms such as PCA, K-Means, SVM, Decision Tree, XGBoost)
  • Optimization algorithm selection and optimization logic
  • Have you worked with pre-trained models before? Which ones did you use? How did it benefit your output?
  • What is transfer learning? Have you used it before?
  • What is Augmentation? What augmentation techniques would you use in a scenario where our goal is XXX?
  • Which frameworks did you use? For example, do you know the TFRecord format and the tf.data API in TensorFlow, and how did you use them?
  • What is normalization, and what are normalization techniques? Why do we use normalization?
  • What are overfitting and underfitting, and when do they occur? How would you intervene? (These questions sometimes come indirectly, without the words overfitting or underfitting. For example, they can be framed as a scenario: this much success on the training data but this much on the test dataset.)
  • How would you split the data into training and test sets? Why is a validation set necessary?
  • What is cross-validation?
  • What is regularization, what are the differences between the techniques, and in which situations is it used?
  • Recommendation systems and system scenario questions
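
To make the threshold question above more concrete, here is a minimal sketch in plain Python. The labels and scores are made up for illustration and the function names are my own; the point is only to show how the confusion matrix, precision, recall, and F1 move when the decision threshold goes from 50% to 80%.

```python
def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, fn, tn), predicting positive when score >= threshold."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1, returning 0.0 where undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: true labels and the model's predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
scores = [0.95, 0.85, 0.70, 0.55, 0.60, 0.40, 0.30, 0.65, 0.45, 0.10]

for threshold in (0.5, 0.8):
    tp, fp, fn, tn = confusion_counts(y_true, scores, threshold)
    precision, recall, f1 = precision_recall_f1(tp, fp, fn)
    print(f"threshold={threshold}: TP={tp} FP={fp} FN={fn} TN={tn} "
          f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Raising the threshold can only move examples from predicted positive to predicted negative, so true positives and false positives cannot increase, while false negatives and true negatives cannot decrease. Recall therefore never goes up. Precision often improves, as it does on this toy data, but that is not guaranteed.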

Python, R, SQL, and more

Python is a very important programming language for Data Scientist positions. Interviewers want to know whether you can read data and actively use libraries such as NumPy, Pandas, and scikit-learn to analyze it and answer their questions.

You can find the article where I listed the resources for the Python programming language here.

Reading, analyzing, shaping, and preprocessing data is very important for Data Science. That’s why being able to work with many kinds of data in a programming language is a default requirement.

Although SQL does not come up in every interview, it is sometimes a required skill for Data Science positions. Part of your job may involve SQL, so be prepared for questions about your SQL experience.

R also came up in Data Science interviews, but I was not asked R questions because I have no experience with it.
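
As a rough idea of the level an SQL question might sit at, here is a small sketch using Python’s built-in sqlite3 module. The `orders` table, its columns, and the rows are all hypothetical, invented for this example; it is not a question I was actually asked.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ada", 120.0), ("ada", 80.0), ("bob", 40.0), ("cem", 300.0), ("bob", 10.0)],
)

# Number of orders and total spend per customer, keeping only customers
# who spent at least 100 in total, highest spender first.
query = """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) >= 100
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for customer, n_orders, total in rows:
    print(customer, n_orders, total)
```

Being able to explain the difference between WHERE (filters rows before grouping) and HAVING (filters groups after aggregation) is the kind of detail such a question tends to probe.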

You can find my article here, where I list the resources you can use to learn programming languages and improve your algorithm skills.

Photo by Scott Graham on Unsplash

Cloud Experience: Google Cloud, AWS, and more

Cloud experience is something I mentioned in my applications and listed on my CV, and I think it put me a few steps ahead in the interviews.

In Data Science, most of the work is done in the cloud rather than locally, for reasons such as data volume, model complexity, and setup costs. It is therefore important to be comfortable working with notebooks, to know Anaconda products, and to be familiar with the services that cloud providers offer. Most of the time, using existing cloud tools rather than writing everything yourself significantly reduces costs. Cloud products are useful not only for their hardware but also for their tooling.

Therefore, going into Data Science interviews with knowledge of cloud products, and hands-on experience if you can get it, will put you ahead.

Unexpected Questions

There were also problem-solving questions in the interviews. Unlike the technical questions, these measure your analytical skills and way of thinking. For such questions, it is very important to think aloud, not hesitate to comment, and give a reason for what you say.

  • How many families with children are there in Istanbul?
  • How many liters of olive oil are used in a year in Turkey?
  • If we have 8 balls and a balance scale, and only 1 of the balls is heavier, in how many weighings can we find the heavy ball?
  • How much shampoo is consumed in a year in the world?

I’m sure these questions surprised you too. Remember that they are asked to see how you evaluate and analyze different situations, which is one of the important competencies in Data Science. Accordingly, working through each problem in the language of data will earn you plus points.

Take the shampoo question as an example. Asking how many times a week a person showers on average, and how much shampoo they use each time, can be a good starting point. Thinking out loud about the details and calling out unusual cases in the data will have a positive effect: people who use soap instead of shampoo, people with long hair who shampoo 2–3 times in a single shower, and people without hair who use little or no shampoo.
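
The shampoo reasoning can be written down as a small back-of-the-envelope calculation. Every number below is an assumption I made up for illustration (a round world population, the share of people who use shampoo, washes per week, milliliters per wash); what matters in the interview is stating such assumptions out loud, not the final figure.

```python
WEEKS_PER_YEAR = 52


def yearly_shampoo_liters(population, share_using_shampoo,
                          washes_per_week, ml_per_wash):
    """Estimate yearly shampoo consumption in liters from stated assumptions."""
    users = population * share_using_shampoo
    ml_per_year = users * washes_per_week * WEEKS_PER_YEAR * ml_per_wash
    return ml_per_year / 1000  # milliliters to liters


# All inputs are illustrative guesses, not measured data.
POPULATION = 8_000_000_000  # rough round figure for the world population

scenarios = {
    "low": dict(share_using_shampoo=0.5, washes_per_week=2, ml_per_wash=5),
    "mid": dict(share_using_shampoo=0.7, washes_per_week=3, ml_per_wash=10),
    "high": dict(share_using_shampoo=0.9, washes_per_week=4, ml_per_wash=15),
}

estimates = {
    name: yearly_shampoo_liters(POPULATION, **params)
    for name, params in scenarios.items()
}

for name, liters in estimates.items():
    print(f"{name}: about {liters / 1e9:.1f} billion liters per year")
```

Giving a low, middle, and high scenario instead of a single number shows that you know how sensitive the answer is to each assumption, which is usually what the interviewer is listening for.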

Conclusion

The field of Data Science is broad and companies can ask different questions, but questions on basic ML and Python knowledge come up in almost every interview.

Do not be afraid of going into an interview or reaching the technical stage. Most of the time, the questions are shaped by you and your answers.

Using LinkedIn or other job search sites, you can check the requirements in companies’ Data Scientist postings, note the technologies you do not know, and prepare for interviews by filling those gaps.

It is very important to be positive, confident, and knowledgeable about your past projects in the interview. Understanding your projects end to end, working through scenarios, analyzing them, and thinking aloud are among my suggestions that can positively affect the course of your interview :)

I wanted to share some sample questions I was asked, along with some suggestions on how to answer them. I hope you build a great career.

I regularly try to write resource recommendations and technical articles. You can follow my Medium account, and if you like the article, you can show your appreciation with claps. Your comments and interaction will make me happy.

If you want to reach my social media accounts, contact me, or keep up with my work, you can visit my website. You can follow and contact me on social media, especially Twitter. Thanks!

İrem Kömürcü

Google Developer Expert on Machine Learning | Data Scientist @Deloitte | iremkomurcu.com