Artificial Intelligence Interview Questions and Answers
1. What is Artificial Intelligence?
A: AI is the simulation of human intelligence in machines that are programmed to think, learn, and solve problems like humans.
2. What are the types of AI?
- Narrow AI (Weak AI): Performs a specific task (e.g., recommendation engines).
- General AI (Strong AI): Machines with human-like cognitive abilities.
- Super AI: Hypothetical AI surpassing human intelligence.
3. What is the difference between AI and Machine Learning?
- AI: Broader concept where machines can perform smart tasks.
- ML: Subset of AI, machines learn from data without being explicitly programmed.
4. What is the Turing Test?
A: A test proposed by Alan Turing to determine whether a machine can exhibit behavior indistinguishable from a human's.
5. What is the difference between supervised, unsupervised, and reinforcement learning?
- Supervised: Learn from labeled data (classification, regression).
- Unsupervised: Find patterns in unlabeled data (clustering).
- Reinforcement: Learn by trial and error with rewards.
6. What is overfitting?
A: When a model learns the noise in the training data; it performs well on the training set but poorly on unseen data.
7. How can you reduce overfitting?
- Cross-validation
- Pruning (in trees)
- Regularization (L1, L2)
- Dropout (in NN)
- More data
8. What is underfitting?
A: The model is too simple to capture the underlying trend, so it performs poorly on both training and test data.
9. What is bias-variance tradeoff?
- Bias: Error from wrong assumptions (underfitting).
- Variance: Error from sensitivity to small fluctuations (overfitting).
- Goal: balance both.
10. Explain the difference between classification and regression.
- Classification: Predicts discrete labels (spam vs not spam).
- Regression: Predicts continuous output (house price).
11. What is gradient descent?
A: Optimization algorithm that minimizes a cost function by iteratively stepping in the direction of steepest descent, i.e., along the negative gradient.
12. What is the learning rate?
A: Size of steps taken during gradient descent. Too high: overshoot; too low: slow convergence.
13. What is stochastic gradient descent (SGD)?
A: Computes the gradient on a single random example (classically) or a small mini-batch instead of the full dataset; faster per step but noisier.
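To make Q11-13 concrete, here is a minimal NumPy sketch (not from the source) of mini-batch SGD fitting a line; the learning rate and batch size are assumed values:

```python
import numpy as np

# Toy data: y = 3x + 2 plus noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.1, size=200)

w, b = 0.0, 0.0           # parameters to learn
lr, batch_size = 0.1, 32  # learning rate and mini-batch size (assumed)

for epoch in range(100):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = X[batch, 0], y[batch]
        err = w * xb + b - yb
        # Gradients of the mean squared error w.r.t. w and b
        grad_w = 2 * np.mean(err * xb)
        grad_b = 2 * np.mean(err)
        w -= lr * grad_w  # step opposite the gradient (steepest descent)
        b -= lr * grad_b

print(round(w, 2), round(b, 2))  # should approach 3 and 2
```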
14. What is a confusion matrix?
A: Table showing true positives, false positives, true negatives, false negatives for classification.
15. What are precision, recall, and F1-score?
- Precision: TP / (TP + FP) – how many selected items are relevant.
- Recall: TP / (TP + FN) – how many relevant items are selected.
- F1: Harmonic mean of precision & recall.
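A quick sketch of these metrics, computed both by hand from confusion-matrix counts and via scikit-learn's built-in `f1_score` (the toy labels are made up):

```python
from sklearn.metrics import confusion_matrix, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# For binary 0/1 labels, ravel() flattens the 2x2 matrix to (tn, fp, fn, tp)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, f1)
print(f1_score(y_true, y_pred))  # matches the manual F1
```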
16. What is deep learning?
A: Subset of ML using neural networks with many layers to model complex patterns.
17. What is a neural network?
A: Computational model inspired by the brain, made of interconnected nodes (neurons).
18. What is backpropagation?
A: Algorithm to update weights in a neural network by computing the gradient of the loss with respect to the weights.
19. What is dropout?
A: Regularization technique where randomly selected neurons are ignored during training.
20. What is an activation function?
A: Introduces non-linearity into the network. Examples:
- ReLU: max(0, x)
- Sigmoid: outputs between 0 and 1
- Tanh: outputs between -1 and 1
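A minimal NumPy sketch of these three activations (illustrative only):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)      # zero for negatives, identity otherwise

def sigmoid(x):
    return 1 / (1 + np.exp(-x))  # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)            # squashes to (-1, 1)

x = np.array([-2.0, 0.0, 2.0])
print(relu(x), sigmoid(x), tanh(x))
```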
21. What is CNN?
A: Convolutional Neural Network, specialized for image data, uses filters to capture spatial hierarchies.
22. What is RNN?
A: Recurrent Neural Network, processes sequential data by maintaining a hidden state.
23. What are vanishing gradients?
A: In deep networks, gradients shrink exponentially as they propagate backward through layers, so early layers learn very slowly (common in RNNs).
24. How does batch normalization help?
A: Normalizes layer inputs, speeds up training and can reduce overfitting.
25. What is NLP?
A: Field of AI focused on interaction between computers and human language.
26. What is tokenization?
A: Breaking text into smaller units (words or subwords).
27. What is stemming vs lemmatization?
- Stemming: Crudely cuts words to a root form (playing → play, studies → studi).
- Lemmatization: Maps to dictionary base form (better → good).
28. What is bag-of-words?
A: Represents text as a vector of word counts, ignoring grammar & word order.
29. What is TF-IDF?
A: Term Frequency-Inverse Document Frequency, weighs words by how important they are to a doc relative to a corpus.
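A short scikit-learn sketch contrasting bag-of-words counts with TF-IDF weights (assumes a recent scikit-learn with `get_feature_names_out`; the toy corpus is made up):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the cat sat", "the cat ran", "dogs ran fast"]

bow = CountVectorizer()          # bag-of-words: raw term counts
counts = bow.fit_transform(docs)
print(bow.get_feature_names_out())
print(counts.toarray())

tfidf = TfidfVectorizer()        # downweights terms common across all docs
weights = tfidf.fit_transform(docs)
print(weights.toarray().round(2))
```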
30. What is word embedding?
A: Vector representation of words capturing context. Examples: Word2Vec, GloVe.
31. What are transformers?
A: Attention-based architecture (e.g., BERT, GPT) that handles sequence data without recurrence.
32. What is the attention mechanism?
A: Allows models to focus on relevant parts of input sequence when predicting.
33. What is fairness in AI?
A: Ensuring models do not discriminate against protected groups.
34. What is model interpretability?
A: Ability to understand why a model made a prediction. Tools: SHAP, LIME.
35. What is an adversarial example?
A: An input with small, often imperceptible perturbations that fool a model (e.g., into misclassifying an image).
36. Give examples of AI applications.
- Self-driving cars
- Chatbots
- Fraud detection
- Medical imaging
37. What is reinforcement learning used for?
A: Games (AlphaGo), robotics, recommendation optimization.
39. What is a recommendation system?
A: Predicts user preferences based on past behavior or similar users/items.
39. What is hyperparameter tuning?
A: Process of finding best hyperparameters (e.g., learning rate, depth).
40. Techniques for hyperparameter tuning?
- Grid search
- Random search
- Bayesian optimization
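As a sketch, grid search with scikit-learn's `GridSearchCV` (the parameter grid and 5-fold CV are assumed choices; this also illustrates cross-validation from Q42):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Candidate hyperparameters to try exhaustively (assumed grid)
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,  # 5-fold cross-validation for each combination
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```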
41. What is early stopping?
A: Stops training when performance on validation set stops improving.
42. What is cross-validation?
A: Technique to assess model performance by partitioning data into folds.
43. What is the gradient?
A: Vector of partial derivatives indicating direction of steepest increase.
44. What is softmax?
A: Converts vector of scores to probabilities summing to 1.
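A minimal, numerically stable softmax in NumPy (subtracting the max before exponentiating is a standard stability trick and does not change the result):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shift by max to avoid overflow
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # probabilities summing to 1
```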
45. What is entropy?
A: Measure of uncertainty or randomness.
46. What is a ROC curve?
A: Graph of the true positive rate vs the false positive rate at different classification thresholds; AUC summarizes overall performance.
47. What is TensorFlow?
A: Open-source deep learning framework by Google.
48. What is PyTorch?
A: Popular deep learning library from Meta (Facebook); more Pythonic, with dynamic computation graphs.
49. What is scikit-learn used for?
A: Classical ML models, preprocessing, cross-validation.
50. What is Hugging Face Transformers?
A: Library for state-of-the-art NLP models.
51. What is a GAN?
A: Generative Adversarial Network. Consists of two nets — generator & discriminator — competing in a zero-sum game to produce realistic data.
52. Example applications of GANs?
- Image synthesis (deepfakes)
- Super-resolution
- Style transfer
53. What is LSTM?
A: Long Short-Term Memory, type of RNN that mitigates vanishing gradient with gates controlling information flow.
54. What is GRU?
A: Gated Recurrent Unit, simpler variant of LSTM with fewer gates.
55. What is an autoencoder?
A: Neural net trained to compress (encode) data and then reconstruct (decode), learning efficient representations.
56. What is a Variational Autoencoder (VAE)?
A: Adds probabilistic elements to autoencoders, allowing sampling from latent space.
57. What is transfer learning?
A: Using a pretrained model (like on ImageNet) and fine-tuning it on a new task.
58. What is fine-tuning vs feature extraction?
- Feature extraction: freeze base layers, train head.
- Fine-tuning: unfreeze some/all layers, train with low LR.
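A hedged PyTorch/torchvision sketch of both modes (assumes torchvision ≥ 0.13 for the `weights` API; the 10-class head is an assumed example):

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ResNet-18 (weights download on first use)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Feature extraction: freeze every pretrained parameter
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head for a new 10-class task (assumed size);
# the new layer's parameters are trainable by default
model.fc = nn.Linear(model.fc.in_features, 10)

# For fine-tuning instead: unfreeze some/all layers and train with a low LR
```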
59. What is one-hot encoding?
A: Represent categorical variables as binary vectors.
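A quick scikit-learn sketch (assumes scikit-learn ≥ 1.2, where the `sparse_output` flag exists; the color column is made up):

```python
from sklearn.preprocessing import OneHotEncoder

colors = [["red"], ["green"], ["blue"], ["green"]]

enc = OneHotEncoder(sparse_output=False)  # dense array for readability
print(enc.fit_transform(colors))
# Each row becomes a binary vector with a single 1 marking its category
print(enc.categories_)
```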
60. Why is normalization important?
A: Scales input data, speeds up convergence. E.g., mean=0, std=1.
61. What is Q-learning?
A: Model-free RL algorithm that learns value of actions in states to maximize cumulative reward.
62. What is Bellman Equation?
A: Recursive equation describing the relationship between the value of a state and the values of its successor states.
63. What is epsilon-greedy policy?
A: With probability ε, explore random action; otherwise exploit best known action.
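A minimal sketch tying Q61-63 together: tabular Q-learning with an epsilon-greedy policy (the state/action sizes and hyperparameters are assumed; no real environment is included):

```python
import random

n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, eps = 0.1, 0.99, 0.1  # assumed hyperparameters

def choose_action(state):
    # Epsilon-greedy: explore with probability eps, otherwise exploit
    if random.random() < eps:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[state][a])

def update(state, action, reward, next_state):
    # Q-learning (Bellman) update toward reward + discounted best next value
    best_next = max(Q[next_state])
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
```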
64. What is a policy gradient?
A: Directly optimizes the policy (probability distribution over actions) via gradient ascent.
65. What is the difference between value-based and policy-based RL?
- Value-based: learns value functions (Q-learning, DQN).
- Policy-based: directly optimizes policy (REINFORCE).
66. What is PPO?
A: Proximal Policy Optimization, stable policy gradient algorithm used in RL.
67. What is BERT?
A: Bidirectional Encoder Representations from Transformers, pretrained on masked LM + next sentence prediction.
68. What is GPT?
A: Generative Pretrained Transformer, autoregressive model trained to predict next token.
69. What is token embedding vs positional embedding?
- Token embedding: maps words/subwords to vectors
- Positional embedding: injects order info into input.
70. What is attention score formula?
A: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
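A direct NumPy translation of this formula (illustrative; the shapes are made up):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of queries to keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V               # weighted sum of the values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```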
71. What is multi-head attention?
A: Runs attention multiple times in parallel with different projections, captures various relationships.
72. What is a learning rate scheduler?
A: Adjusts learning rate during training. E.g., reduce LR on plateau.
73. What is gradient clipping?
A: Limits gradient norms to avoid exploding gradients.
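In PyTorch this is one call, `torch.nn.utils.clip_grad_norm_`, placed between `backward()` and `step()` (the threshold of 1.0 and the toy model are assumptions):

```python
import torch

# Toy model and optimizer (assumed, for illustration only)
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

loss = model(torch.randn(8, 10)).pow(2).mean()
loss.backward()

# Rescale all gradients so their combined norm is at most 1.0
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```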
74. What is data augmentation?
A: Artificially increases dataset by modifying inputs (rotate, flip, noise).
75. What is early stopping?
A: Stop training when validation loss stops improving.
76. What is ONNX?
A: Open Neural Network Exchange, format for exporting models to run on different platforms.
77. What is TensorRT?
A: NVIDIA SDK that optimizes and compiles trained models for high-performance inference on NVIDIA GPUs.
78. What is MLflow?
A: Platform for tracking experiments, packaging code, deploying models.
79. What is Kubeflow?
A: Kubernetes-native platform for deploying scalable ML pipelines.
80. How to serve a model in production?
- REST API (FastAPI, Flask)
- TensorFlow Serving
- TorchServe
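A minimal FastAPI sketch of the REST-API option (the placeholder "model" is hypothetical; a real service would load a trained model at startup):

```python
from typing import List

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Features(BaseModel):
    values: List[float]  # input schema for the request body

@app.post("/predict")
def predict(features: Features):
    # Placeholder: a real service would call model.predict(...) here
    score = sum(features.values)
    return {"prediction": score}

# Run with: uvicorn main:app --reload  (assuming this file is main.py)
```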
81. PyTorch vs TensorFlow main differences?
- PyTorch: dynamic graphs, easier debugging.
- TensorFlow: historically static graphs (TF2 is eager by default), strong production/deployment tooling.
82. What is Keras?
A: High-level API for building NN, now part of TensorFlow.
83. What is a scikit-learn pipeline?
A: Chains preprocessing and modeling into one workflow.
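A short example (the scaler/classifier choice is just an illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scaling and modeling chained so both fit together and reuse as one unit
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)
print(pipe.predict(X[:3]))
```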
84. What is CatBoost?
A: Gradient boosting library that handles categorical features automatically.
85. What is LightGBM?
A: Fast gradient boosting implementation by Microsoft, uses leaf-wise tree growth.
86. What is RMSE?
A: Root Mean Squared Error, common regression metric.
87. What is log loss?
A: Measures uncertainty of predictions for classification.
88. What is K-means?
A: Clustering algorithm that partitions data into k groups by minimizing variance.
89. What is silhouette score?
A: Measures how similar a point is to its own cluster vs other clusters.
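A sketch combining both: fit K-means on synthetic blobs and score the clustering with silhouette (the data and k are assumed):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)

# Near 1: well separated; near 0: overlapping; negative: likely misassigned
print(round(silhouette_score(X, labels), 2))
```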
90. What is PCA?
A: Principal Component Analysis, reduces dimensionality by projecting onto principal components.
91. What is SHAP?
A: SHapley Additive exPlanations, interprets contributions of each feature.
92. What is LIME?
A: Local Interpretable Model-agnostic Explanations, perturbs input to see effect on prediction.
93. What is differential privacy?
A: Adds noise to data/models to protect individual data points.
94. What is fairness through unawareness?
A: Avoiding the use of protected attributes in the model; however, correlated proxy features may still introduce bias.
95. Why might validation loss increase while train loss decreases?
A: Overfitting: the model has started memorizing the training set instead of generalizing.
96. What to do if gradients are exploding?
- Lower learning rate
- Use gradient clipping
- Try batch normalization
97. What to check if accuracy doesn’t improve at all?
- Learning rate too high or too low
- Wrong labels
- Data leakage
98. How to debug training instability?
A: Plot learning curves; try a smaller architecture first.
99. How to deploy low-latency models?
- Quantization
- Pruning
- Use GPUs or edge accelerators
100. What is your favorite recent advancement in AI?
(Open-ended — could mention diffusion models, ChatGPT, AlphaFold, generative video.)