Artificial Intelligence Interview Questions and Answers

1. What is Artificial Intelligence?

A: AI is the simulation of human intelligence in machines that are programmed to think, learn, and solve problems like humans.

2. What are the types of AI?

  • Narrow AI (Weak AI): Performs a specific task (e.g., recommendation engines).
  • General AI (Strong AI): Machines with human-like cognitive abilities.
  • Super AI: Hypothetical AI surpassing human intelligence.

3. What is the difference between AI and Machine Learning?

  • AI: Broader concept where machines can perform smart tasks.
  • ML: Subset of AI in which machines learn from data without being explicitly programmed.

4. What is the Turing Test?

A: A test proposed by Alan Turing to determine whether a machine can exhibit conversational behavior indistinguishable from a human's.

5. What is the difference between supervised, unsupervised, and reinforcement learning?

  • Supervised: Learn from labeled data (classification, regression).
  • Unsupervised: Find patterns in unlabeled data (clustering).
  • Reinforcement: Learn by trial and error with rewards.

6. What is overfitting?

A: When a model learns the noise in the training data, so it performs well on the training set but poorly on unseen data.

7. How can you reduce overfitting?

  • Cross-validation
  • Pruning (in trees)
  • Regularization (L1, L2)
  • Dropout (in NN)
  • More data

8. What is underfitting?

A: Model too simple to capture underlying trend, performs poorly on both train and test data.

9. What is bias-variance tradeoff?

A:

  • Bias: Error from wrong assumptions (underfitting).
  • Variance: Error from sensitivity to small fluctuations (overfitting).
  • Goal: balance both.

10. Explain the difference between classification and regression.

  • Classification: Predicts discrete labels (spam vs not spam).
  • Regression: Predicts continuous output (house price).

11. What is gradient descent?

A: Optimization algorithm that minimizes a cost function by iteratively stepping in the direction of steepest descent (the negative gradient).
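
As a minimal sketch of the idea, assume a hypothetical one-dimensional cost f(x) = (x - 3)^2, whose gradient is 2(x - 3). The function name `gradient_descent` and its default values are illustrative choices, not part of any library.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimize a 1-D function given its gradient (illustrative sketch)."""
    x = x0
    for _ in range(steps):
        # Step against the gradient, i.e. in the direction of steepest descent.
        x -= learning_rate * grad(x)
    return x


# Hypothetical cost f(x) = (x - 3)^2 with gradient f'(x) = 2 * (x - 3).
def grad_f(x):
    return 2.0 * (x - 3.0)


x_min = gradient_descent(grad_f, x0=0.0)
print(round(x_min, 4))  # close to the true minimizer x = 3
```

On this particular function, a learning rate above 1.0 makes each step overshoot by more than it corrects, so the iterates move away from 3 instead of converging.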

12. What is learning rate?

A: Size of steps taken during gradient descent. Too high: overshoot; too low: slow convergence.

13. What is stochastic gradient descent (SGD)?

A: Computes the gradient on a single random example (or, in the mini-batch variant, a small random subset of the data) instead of the full dataset. Each step is faster but noisier.

14. What is a confusion matrix?

A: Table showing true positives, false positives, true negatives, false negatives for classification.
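
A small sketch of how the four cells might be counted for binary labels. The helper name `confusion_counts` and the sample labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for binary labels (illustrative helper)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted positive: correct (TP) or a false alarm (FP).
            counts["tp" if truth == positive else "fp"] += 1
        else:
            # Predicted negative: a miss (FN) or correct (TN).
            counts["fn" if truth == positive else "tn"] += 1
    return counts


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)
```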

15. What is precision, recall, F1-score?

  • Precision: TP / (TP + FP) – how many selected items are relevant.
  • Recall: TP / (TP + FN) – how many relevant items are selected.
  • F1: Harmonic mean of precision and recall, 2 * P * R / (P + R).
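
The three formulas above can be sketched directly from confusion-matrix counts. The function name `precision_recall_f1` and the example counts are illustrative assumptions. Returning 0.0 when a denominator is zero is one common convention, not the only one.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from raw counts (illustrative sketch)."""
    # Guard against empty denominators; returning 0.0 is a convention choice.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(round(p, 3), round(r, 3), round(f1, 3))
```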

16. What is deep learning?

A: Subset of ML using neural networks with many layers to model complex patterns.

17. What is a neural network?

A: Computational model inspired by the brain, made of interconnected nodes (neurons).

18. What is backpropagation?

A: Algorithm to update weights in a NN by computing gradient of loss wrt weights.

19. What is dropout?

A: Regularization technique where randomly selected neurons are ignored during training.
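
A rough sketch of the "inverted dropout" variant, which rescales the surviving activations so their expected value is unchanged. The function name `dropout` and its parameters are illustrative; deep learning frameworks provide their own layers for this.

```python
import random


def dropout(activations, p_drop=0.5, training=True, rng=random):
    """Inverted dropout on a list of activations (illustrative sketch)."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        # At inference time every neuron is kept and nothing is rescaled.
        return list(activations)
    keep = 1.0 - p_drop
    # Keep each activation with probability `keep`, scaling it by 1 / keep.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


out = dropout([1.0, 2.0, 3.0], p_drop=0.5, rng=random.Random(0))
print(out)
```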

20. What is activation function?

A: Introduces non-linearity. Examples:

  • ReLU: max(0,x)
  • Sigmoid: outputs between 0-1
  • Tanh: outputs between -1 and 1
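
The three examples above, written out as plain scalar functions. This is a sketch for intuition only: real frameworks apply them element-wise to tensors, and this `sigmoid` is not hardened against overflow for very large negative inputs.

```python
import math


def relu(x):
    # max(0, x): passes positive values through, zeroes out negatives.
    return max(0.0, x)


def sigmoid(x):
    # Squashes any real number into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    # Squashes any real number into the interval (-1, 1).
    return math.tanh(x)


for value in (-2.0, 0.0, 2.0):
    print(value, relu(value), round(sigmoid(value), 3), round(tanh(value), 3))
```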

21. What is CNN?

A: Convolutional Neural Network, specialized for image data, uses filters to capture spatial hierarchies.

22. What is RNN?

A: Recurrent Neural Network, processes sequential data by maintaining a hidden state.

23. What are vanishing gradients?

A: In deep nets, gradients can shrink exponentially as they are propagated back through many layers, which makes the early layers learn slowly (common in RNNs).

24. How does batch normalization help?

A: Normalizes layer inputs, speeds up training and can reduce overfitting.

25. What is NLP?

A: Field of AI focused on interaction between computers and human language.

26. What is tokenization?

A: Breaking text into smaller units (words or subwords).

27. What is stemming vs lemmatization?

  • Stemming: Strips suffixes by rule to reach a root (plays → play, playing → play).
  • Lemmatization: Maps to dictionary base form (better → good).

28. What is bag-of-words?

A: Text is represented as a bag (multiset) of word counts, ignoring grammar and word order.
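
A minimal sketch using only the standard library. It assumes naive lowercasing and whitespace tokenization, which a real pipeline would replace with a proper tokenizer. The name `bag_of_words` is illustrative.

```python
from collections import Counter


def bag_of_words(text):
    """Map each word to its count, discarding order (illustrative sketch)."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    return Counter(text.lower().split())


bow = bag_of_words("the cat sat on the mat")
print(bow)
```

Because only counts are kept, "the cat sat" and "sat the cat" produce the same representation.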

29. What is TF-IDF?

A: Term Frequency-Inverse Document Frequency, weighs words by how important they are to a doc relative to a corpus.
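
A sketch of one textbook variant, assuming tf = count / document length and idf = log(N / document frequency). Libraries commonly use smoothed or normalized variants, so their numbers will differ. The function name `tf_idf` and the toy documents are illustrative.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document (unsmoothed textbook variant)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical two-document corpus.
scores = tf_idf(["the cat sat", "the dog barked"])
print(scores[0])
```

Here "the" appears in every document, so its idf is log(1) = 0 and it scores zero, while "cat" is specific to the first document and scores higher.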

30. What is word embedding?

A: Vector representation of words capturing context. Examples: Word2Vec, GloVe.

31. What are transformers?

A: Attention-based architecture (e.g., BERT, GPT) that handles sequence data without recurrence.

32. What is attention mechanism?

A: Allows models to focus on relevant parts of input sequence when predicting.

33. What is fairness in AI?

A: Ensuring models do not discriminate against protected groups.

34. What is model interpretability?

A: Ability to understand why a model made a prediction. Tools: SHAP, LIME.

35. What is adversarial example?

A: Small perturbations to input that fool a model (e.g., misclassifying images).

36. Give examples of AI applications.

  • Self-driving cars
  • Chatbots
  • Fraud detection
  • Medical imaging

37. What is reinforcement learning used for?

A: Games (AlphaGo), robotics, recommendation optimization.

38. What is recommendation system?

A: Predicts user preferences based on past behavior or similar users/items.

39. What is hyperparameter tuning?

A: Process of finding best hyperparameters (e.g., learning rate, depth).

40. Techniques for hyperparameter tuning?

  • Grid search
  • Random search
  • Bayesian optimization

41. What is early stopping?

A: Stops training when performance on validation set stops improving.
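
A sketch of the usual patience-based rule, assuming a list of per-epoch validation losses is already available. The function name `early_stopping_epoch` and the `patience` default are illustrative.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch index at which training would stop (illustrative sketch)."""
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # New best validation loss: reset the patience counter.
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    # Never triggered: training ran through the last recorded epoch.
    return len(val_losses) - 1


# Hypothetical validation-loss history.
losses = [0.9, 0.7, 0.6, 0.65, 0.66, 0.5]
stop = early_stopping_epoch(losses, patience=2)
print(stop)  # 4: two epochs in a row without beating the best loss of 0.6
```

In practice the weights from the best epoch (index 2 here) are usually restored rather than the weights at the stopping epoch.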

42. What is cross-validation?

A: Technique to assess model performance by partitioning data into folds.
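
A sketch of how k-fold splitting might be done by hand, without shuffling. The function name `k_fold_indices` is illustrative; libraries such as scikit-learn ship their own splitters.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, val))
        start += size
    return folds


folds = k_fold_indices(n_samples=10, k=3)
for train, val in folds:
    print("train:", train, "validate:", val)
```

Each sample lands in exactly one validation fold, and the model is trained k times on the remaining data.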

43. What is the gradient?

A: Vector of partial derivatives indicating direction of steepest increase.

44. What is softmax?

A: Converts vector of scores to probabilities summing to 1.
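
A minimal sketch in plain Python. Subtracting the maximum score before exponentiating is a standard trick to avoid overflow and does not change the result.

```python
import math


def softmax(scores):
    """Convert a list of scores into probabilities that sum to 1."""
    m = max(scores)
    # Shift by the max for numerical stability; the ratios are unchanged.
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
```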

45. What is entropy?

A: Measure of uncertainty or randomness.
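
For a discrete distribution, this is usually made concrete as Shannon entropy, H = -sum(p * log2(p)), measured in bits. A short sketch, with the function name `entropy` chosen for illustration:

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    # Terms with p == 0 contribute nothing, by the convention 0 * log(0) = 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)


print(entropy([0.5, 0.5]))   # a fair coin: maximal uncertainty for two outcomes
print(entropy([0.9, 0.1]))   # a biased coin: less uncertain
```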

46. What is ROC curve?

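A: Graph of true positive rate (TPR) vs false positive rate (FPR) at different classification thresholds. AUC (area under the curve) summarizes performance in a single number.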

47. What is TensorFlow?

A: Open-source deep learning framework by Google.

48. What is PyTorch?

A: Popular deep learning library by Facebook (now Meta), known for a Pythonic API and dynamic computation graphs.

49. What is scikit-learn used for?

A: Classical ML models, preprocessing, cross-validation.

50. What is Hugging Face Transformers?

A: Library for state-of-the-art NLP models.

51. What is a GAN?

A: Generative Adversarial Network. Consists of two nets — generator & discriminator — competing in a zero-sum game to produce realistic data.

52. Example application of GANs?

  • Image synthesis (deepfakes)
  • Super-resolution
  • Style transfer

53. What is LSTM?

A: Long Short-Term Memory, type of RNN that mitigates vanishing gradient with gates controlling information flow.

54. What is GRU?

A: Gated Recurrent Unit, simpler variant of LSTM with fewer gates.

55. What is an autoencoder?

A: Neural net trained to compress (encode) data and then reconstruct (decode), learning efficient representations.

56. What is a Variational Autoencoder (VAE)?

A: Adds probabilistic elements to autoencoders, allowing sampling from latent space.

57. What is transfer learning?

A: Using a pretrained model (like on ImageNet) and fine-tuning it on a new task.

58. What is fine-tuning vs feature extraction?

  • Feature extraction: freeze base layers, train head.
  • Fine-tuning: unfreeze some/all layers, train with low LR.

59. What is one-hot encoding?

A: Represent categorical variables as binary vectors.
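
A small sketch in plain Python. The function name `one_hot` and the alphabetical column order are illustrative choices; libraries provide their own encoders.

```python
def one_hot(values):
    """Return (categories, encoded rows) for a list of categorical values."""
    # Assumption: columns are ordered alphabetically by category name.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one position is "hot" per row
        encoded.append(row)
    return categories, encoded


categories, encoded = one_hot(["red", "green", "red", "blue"])
print(categories)
print(encoded)
```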

60. Why is normalization important?

A: Puts input features on a comparable scale, which speeds up convergence. E.g., standardize each feature to mean=0, std=1.
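
A sketch of standardization (z-scoring) for a single feature, using the standard library. The function name `standardize` is illustrative, and the population standard deviation is an assumption; some tools use the sample version instead.

```python
import statistics


def standardize(values):
    """Rescale a list of numbers to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


z = standardize([2.0, 4.0, 6.0, 8.0])
print([round(v, 3) for v in z])
```

The mean and standard deviation should be computed on the training set only and then reused for validation and test data.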

61. What is Q-learning?

A: Model-free RL algorithm that learns value of actions in states to maximize cumulative reward.
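
A sketch of the tabular update rule, Q(s, a) ← Q(s, a) + α [r + γ max Q(s', a') − Q(s, a)]. The function name `q_update`, the state and action labels, and the default α and γ are illustrative assumptions.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)            # all Q-values start at 0.0
actions = ["left", "right"]       # hypothetical action set
new_value = q_update(q, state=0, action="right", reward=1.0,
                     next_state=1, actions=actions)
print(new_value)
```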

62. What is Bellman Equation?

A: Recursive equation that describes relationship between value of state and values of successor states.

63. What is epsilon-greedy policy?

A: With probability ε, explore random action; otherwise exploit best known action.
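
A minimal sketch, assuming the Q-values for the current state are given as a list indexed by action. The function name `epsilon_greedy` is illustrative.

```python
import random


def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Pick an action index: random with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        # Explore: choose uniformly among all actions.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1, rng=random.Random(42))
print(action)
```

A common refinement is to start with a large ε and decay it as the value estimates become more reliable.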

64. What is a policy gradient?

A: Directly optimizes the policy (probability distribution over actions) via gradient ascent.

65. What is the difference between value-based and policy-based RL?

  • Value-based: learns value functions (Q-learning, DQN).
  • Policy-based: directly optimizes policy (REINFORCE).

66. What is PPO?

A: Proximal Policy Optimization, stable policy gradient algorithm used in RL.

67. What is BERT?

A: Bidirectional Encoder Representations from Transformers, pretrained on masked LM + next sentence prediction.

68. What is GPT?

A: Generative Pretrained Transformer, autoregressive model trained to predict next token.

69. What is token embedding vs positional embedding?

  • Token embedding: maps words/subwords to vectors
  • Positional embedding: injects order info into input.

70. What is attention score formula?

A: Attention(Q,K,V) = softmax(QK^T / sqrt(dk)) * V
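
The formula can be sketched in plain Python with lists of lists standing in for matrices. This is for intuition only: it is single-head, has no masking, and real implementations use tensor libraries. The toy inputs are made up.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, V))
                       for j in range(len(V[0]))])
    return output


# Toy example: one query, two identical keys, so both values get weight 0.5.
result = attention(Q=[[1.0, 0.0]],
                   K=[[1.0, 0.0], [1.0, 0.0]],
                   V=[[2.0, 0.0], [4.0, 0.0]])
print(result)
```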

71. What is multi-head attention?

A: Runs attention multiple times in parallel with different projections, captures various relationships.

72. What is learning rate scheduler?

A: Adjusts learning rate during training. E.g., reduce LR on plateau.

73. What is gradient clipping?

A: Limits gradient norms to avoid exploding gradients.
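
A sketch of clipping by global L2 norm on a flat list of gradient values. The function name `clip_by_norm` is illustrative; frameworks expose their own utilities for this.

```python
import math


def clip_by_norm(grads, max_norm):
    """Rescale gradients so their L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    # Shrink every component by the same factor, preserving the direction.
    scale = max_norm / norm
    return [g * scale for g in grads]


clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)
print(clipped)
```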

74. What is data augmentation?

A: Artificially increases dataset by modifying inputs (rotate, flip, noise).

75. What is early stopping?

A: Stop training when validation loss stops improving.

76. What is ONNX?

A: Open Neural Network Exchange, format for exporting models to run on different platforms.

77. What is TensorRT?

A: NVIDIA library for optimizing and compiling models for fast inference on GPUs.

78. What is MLflow?

A: Platform for tracking experiments, packaging code, deploying models.

79. What is Kubeflow?

A: Kubernetes-native platform for deploying scalable ML pipelines.

80. How to serve a model in production?

  • REST API (FastAPI, Flask)
  • TensorFlow Serving
  • TorchServe

81. PyTorch vs TensorFlow main differences?

  • PyTorch: dynamic graphs, easier debugging.
  • TensorFlow: historically static graphs (TF2 runs eagerly by default, with graph compilation via tf.function), strong production tooling.

82. What is Keras?

A: High-level API for building NN, now part of TensorFlow.

83. What is scikit-learn pipeline?

A: Chains preprocessing and modeling into one workflow.

84. What is CatBoost?

A: Gradient boosting library that handles categorical features automatically.

85. What is LightGBM?

A: Fast gradient boosting implementation by Microsoft, uses leaf-wise tree growth.

86. What is RMSE?

A: Root Mean Squared Error, common regression metric.
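
A short sketch of the computation: square the errors, average them, then take the square root. The function name `rmse` and the sample values are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(round(error, 4))
```

Because errors are squared before averaging, one large miss raises RMSE more than several small ones. The result is in the same units as the target.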

87. What is log loss?

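A: Negative log-likelihood of the true labels under the predicted probabilities (also called cross-entropy loss). It heavily penalizes confident wrong predictions.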
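
A sketch of the binary case, assuming labels are 0 or 1 and predictions are probabilities of class 1. The clipping constant `eps` is an illustrative choice to avoid log(0).

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities away from 0 and 1 so the logarithm stays finite.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


print(round(log_loss([1, 0], [0.9, 0.1]), 4))  # confident and right: low loss
print(round(log_loss([1, 0], [0.1, 0.9]), 4))  # confident and wrong: high loss
```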

88. What is K-means?

A: Clustering algorithm that partitions data into k groups by minimizing within-cluster variance (the squared distance of points to their cluster centroid).

89. What is silhouette score?

A: Measures how similar a point is to its own cluster vs other clusters.

90. What is PCA?

A: Principal Component Analysis, reduces dimensionality by projecting onto principal components.

91. What is SHAP?

A: SHapley Additive exPlanations, interprets contributions of each feature.

92. What is LIME?

A: Local Interpretable Model-agnostic Explanations, perturbs input to see effect on prediction.

93. What is differential privacy?

A: Adds calibrated noise to data, queries, or model training so that the output reveals very little about whether any individual data point was included.

94. What is fairness through unawareness?

A: Avoid using protected attributes in model. But proxies may still introduce bias.

95. Why might validation loss increase while train loss decreases?

A: Overfitting.

96. What to do if gradients are exploding?

  • Lower learning rate
  • Use gradient clipping
  • Try batch normalization

97. What to check if accuracy doesn’t improve at all?

  • Learning rate too high or too low
  • Wrong labels
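  • Data pipeline bugs (e.g., inputs and labels misaligned); note that data leakage usually shows up as suspiciously high accuracy instead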

98. How to debug training instability?

A: Plot learning curves and try a smaller architecture first.

99. How to deploy low-latency models?

  • Quantization
  • Pruning
  • Use GPUs or edge accelerators.

100. What is your favorite recent advancement in AI?

(Open-ended — could mention diffusion models, ChatGPT, AlphaFold, generative video.)
