ML Overview

Roles

  • Data Analysis

    • Data Scientist
  • Model Development

    • AI Researcher
    • AI Specialist
    • AI Engineer
  • Service (Production)

    • AI Engineer
    • MLOps

AI Product Development Cycle

  1. Clearly define the problem and goals
    • Identify the target audience, key stakeholders, and the available data and resources.
  2. Collect and analyze data
    • Acquire high-quality and relevant data.
    • Perform exploratory data analysis (EDA) to understand the distribution, patterns, and anomalies in the data.
    • Determine the data split between training, validation, and testing.
  3. Prepare the dataset
    • Clean, preprocess, and transform the data; handle missing or incomplete values and class imbalance.
    • Ensure that the data is properly formatted and normalized.
  4. Choose and train an appropriate model
    • Consider the trade-off between model complexity and interpretability.
    • Ensure that the model is scalable and can handle large datasets.
    • Select a loss function and an optimization algorithm.
    • Monitor training and adjust the model as needed.
  5. Evaluate the model’s performance with appropriate metrics
  6. Refine and optimize the model
    • Compare the model’s performance on the test data with its performance on the validation data.
  7. Deploy the model
    • Make sure that it is integrated with the other systems and processes as needed.
  8. Monitor and maintain the model’s performance
    • Monitor the model’s performance in real-world conditions, and adjust as necessary.
    • Continuously evaluate the model’s performance and make improvements as needed.
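
The data split mentioned in step 2 can be sketched as below. This is a minimal illustration using only the standard library, not a prescribed method: the function name `split_dataset` and the 80/10/10 ratios are assumptions made for the example.

```python
import random

def split_dataset(samples, val_ratio=0.1, test_ratio=0.1, seed=0):
    """Shuffle samples and split them into train/validation/test lists."""
    if not 0 <= val_ratio + test_ratio < 1:
        raise ValueError("val_ratio + test_ratio must be in [0, 1)")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # local RNG keeps the split reproducible
    n_test = int(len(shuffled) * test_ratio)
    n_val = int(len(shuffled) * val_ratio)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```

The validation set is used for model selection and tuning (steps 4 to 6); the test set is held back for the final comparison.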

Main Challenges

Data Preprocessing

  • Insufficient Quantity of Data
  • Nonrepresentative Data
  • Poor-Quality Data
  • Imbalanced data
    • Oversampling
    • Undersampling
    • Generating synthetic data
  • Irrelevant Features
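
As one way to picture the oversampling option listed under imbalanced data, here is a minimal sketch of random oversampling. The function name `random_oversample` and the toy data are illustrative assumptions; real projects often rely on a dedicated library instead.

```python
import random
from collections import Counter, defaultdict

def random_oversample(features, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

X = [[0.1], [0.2], [0.3], [0.4], [5.0]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # Counter({0: 4, 1: 4})
```

Resampling is normally applied to the training split only, so that validation and test data keep the original class distribution.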

Training

  • Overfitting
    • Dropout
    • Monte Carlo (MC) Dropout
    • Regularization
  • Underfitting
  • The Vanishing/Exploding Gradients Problems
    • Glorot and He Initialization
    • Better Activation Functions
    • Batch Normalization
    • Gradient Clipping
  • Hyperparameter tuning
    • Grid search
    • Random search
    • Bayesian optimization
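
The difference between grid search and random search can be sketched as follows. This is a toy illustration: `toy_loss` is a hypothetical stand-in for a validation loss, and the hyperparameter names `lr` and `dropout` are assumptions chosen for the example.

```python
import itertools
import random

def grid_search(objective, grid):
    """Evaluate every combination in the grid; return (best_params, best_score)."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = objective(params)
        if best is None or score < best[1]:
            best = (params, score)
    return best

def random_search(objective, space, n_trials=50, seed=0):
    """Sample each hyperparameter uniformly from its (low, high) range."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {n: rng.uniform(lo, hi) for n, (lo, hi) in space.items()}
        score = objective(params)
        if best is None or score < best[1]:
            best = (params, score)
    return best

# Hypothetical validation loss, minimized at lr=0.1 and dropout=0.3.
def toy_loss(p):
    return (p["lr"] - 0.1) ** 2 + (p["dropout"] - 0.3) ** 2

print(grid_search(toy_loss, {"lr": [0.01, 0.1, 1.0], "dropout": [0.1, 0.3, 0.5]}))
print(random_search(toy_loss, {"lr": (0.0, 1.0), "dropout": (0.0, 0.6)}))
```

Grid search cost grows multiplicatively with each added hyperparameter, while random search keeps a fixed trial budget. Bayesian optimization goes further by using earlier results to choose the next trial, which this sketch does not attempt.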

Evaluation

  • Explainable AI (XAI)
  • Data Mismatch in Testing and Validating

Math

  • Softmax
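
A minimal sketch of softmax, which maps a list of scores (logits) to probabilities that sum to 1. Subtracting the maximum before exponentiating is a common stability trick and does not change the result; the function name and inputs are illustrative.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# A naive math.exp(1000.0) would raise OverflowError; the shifted version is fine.
probs = softmax([1000.0, 1001.0, 1002.0])
print([round(p, 4) for p in probs])  # [0.09, 0.2447, 0.6652]
```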

Linear Algebra

Vector, Matrix (Tensor)

Probability

Statistics


Framework

  • Dataset and Dataloader

  • Optimizer

  • Multi-GPU

  • Monitoring

    • Weights & Biases
    • TensorBoard
  • Neural Network

    • PyTorch: PyG
    • JAX: Flax, Jraph
    • TensorFlow
  • Accelerator

    • TPU: XLA, ?
    • GPU: CUDA, Triton

Basics of ML

$$\Theta^* \leftarrow \Theta - \eta \nabla_{\Theta} \mathcal{L}$$

  • Gradient Descent

  • Constraints?
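
The gradient descent update (new parameters = old parameters minus learning rate times the gradient of the loss) can be sketched on a toy problem. The quadratic loss and its hand-derived gradient are assumptions made for the example; frameworks compute the gradient automatically.

```python
def gradient_descent(grad_fn, theta, lr=0.1, n_steps=100):
    """Repeatedly apply theta <- theta - lr * grad(theta)."""
    for _ in range(n_steps):
        grad = grad_fn(theta)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

# Toy loss L(theta) = (theta0 - 3)^2 + (theta1 + 1)^2, gradient derived by hand.
def toy_grad(theta):
    return [2 * (theta[0] - 3), 2 * (theta[1] + 1)]

theta = gradient_descent(toy_grad, [0.0, 0.0])
print([round(t, 4) for t in theta])  # [3.0, -1.0]
```

Here `lr` plays the role of the learning rate in the update rule; on this loss, a value above 1.0 would make the iterates diverge.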

Activation Functions

Weight Initializers

Metric

  • Confusion Matrices
    • Decision Boundaries
  • Precision and Recall, and Their Trade-off
  • ROC Curve
  • F1
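
To connect the confusion matrix with precision, recall, and F1, here is a minimal sketch for binary labels. The function name `binary_metrics` and the sample predictions are illustrative assumptions.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived metrics for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "confusion": [[tn, fp], [fn, tp]],  # rows: actual class, columns: predicted class
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
m = binary_metrics(y_true, y_pred)
print(m["confusion"], m["precision"], m["recall"], m["f1"])
# [[3, 1], [1, 3]] 0.75 0.75 0.75
```

Raising the decision threshold of a classifier typically trades recall for precision, which is the trade-off the ROC and precision/recall curves visualize.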

Loss (Cost)

  • MSE, MAE
  • Cross Entropy
    • KL divergence
    • Focal
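
The classification losses listed above are closely related, as this minimal sketch tries to show: cross entropy equals the entropy of the target distribution plus the KL divergence, and focal loss is cross entropy scaled down for well-classified examples. The function names are illustrative, and the focal loss omits the optional class-weighting term.

```python
import math

def entropy(p):
    """H(p) = -sum_i p_i log p_i (terms with p_i = 0 contribute 0)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i log q_i; requires q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def focal_loss(q_true, gamma=2.0):
    """Focal loss for the predicted probability of the true class:
    -(1 - q)^gamma * log(q). With gamma = 0 it is plain cross entropy."""
    return -((1 - q_true) ** gamma) * math.log(q_true)

p = [0.5, 0.5, 0.0]     # target distribution
q = [0.25, 0.25, 0.5]   # predicted distribution
print(round(cross_entropy(p, q), 4))               # 1.3863
print(round(entropy(p) + kl_divergence(p, q), 4))  # 1.3863
print(round(focal_loss(0.9, gamma=0.0), 4))        # 0.1054
print(round(focal_loss(0.9), 4))                   # 0.0011
```

For one-hot targets the entropy term is zero, so minimizing cross entropy and minimizing KL divergence are the same objective.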

Backpropagation

  • Autodiff (AutoGrad)

Optimizers

  • Momentum, Nesterov Accelerated Gradient, AdaGrad, RMSProp, Adam, AdaMax, Nadam, AdamW
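
As an illustration of how these optimizers extend plain gradient descent, here is a minimal sketch of the Adam update (moving averages of the gradient and its square, with bias correction). The function name `adam_minimize`, the toy loss, and the step count are assumptions; the default `beta1`, `beta2`, and `eps` follow commonly used values.

```python
import math

def adam_minimize(grad_fn, theta, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, n_steps=500):
    """Minimize a function with Adam, given a function returning its gradient."""
    theta = list(theta)
    m = [0.0] * len(theta)  # first-moment (mean) estimate
    v = [0.0] * len(theta)  # second-moment (uncentered variance) estimate
    for t in range(1, n_steps + 1):
        g = grad_fn(theta)
        for i in range(len(theta)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction for zero init
            v_hat = v[i] / (1 - beta2 ** t)
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Toy loss L(theta) = (theta0 - 3)^2 + (theta1 + 1)^2, gradient derived by hand.
def toy_grad(theta):
    return [2 * (theta[0] - 3), 2 * (theta[1] + 1)]

print(adam_minimize(toy_grad, [0.0, 0.0], n_steps=1))  # first step is about lr per parameter
print(adam_minimize(toy_grad, [0.0, 0.0]))
```

Because the step is normalized by the gradient's running magnitude, the first update moves every parameter by roughly `lr` regardless of gradient scale. AdamW differs by applying weight decay directly to the parameters rather than through the gradient.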

Learning Rate Scheduler

Computational Complexity

Regularization

  • The Normal Equation

  • Regularized Models

    • Ridge Regression
    • Lasso Regression
    • Elastic Net Regression
    • Early Stopping
  • $\ell_1$ and $\ell_2$ Regularization

  • Max-Norm Regularization

  • Estimating Probabilities ?

Architectures

  • 1-Stage Detector

  • 2-Stage Detector

  • AutoML

Layers

  • Linear Layers

  • Convolutional Layers

    • CNN
  • Recurrent Neurons

    • LSTM, GRU
  • Attention Mechanisms

    • Transformer
      • Attention Is All You Need: The Original Transformer Architecture
    • Vision Transformers
  • Pooling Layers

    • Avg, Max
  • Normalization Layers

  • Dropout

  • Neural architecture search (NAS)

  • Optimization

    • Tensor Decomposition
    • Quantization
    • Model Compression
      • Knowledge distillation

Techniques

  • Using GPUs to Speed Up Computations

    • Getting Your Own GPU
    • Managing the GPU RAM
    • Placing Operations and Variables on Devices
    • Parallel Execution Across Multiple Devices
  • Training Models Across Multiple Devices

    • Model Parallelism
    • Data Parallelism

Tasks

  • Supervised, Unsupervised, and Semi-supervised Learning

  • Instance-Based vs. Model-Based Learning

  • Classification

  • Regression

  • Annotation

  • Computer Vision (CV)

    • Classification and Localization
    • Object Detection
    • Object Tracking
    • Semantic Segmentation
    • Optical Character Recognition (OCR)
  • Natural Language Processing (NLP)

    • Bag of Words & Word Embedding
    • Forecasting a Time Series
    • Handling Long Sequences
      • Fighting the Unstable Gradients Problem
      • Tackling the Short-Term Memory Problem
    • Sentiment Analysis
      • Masking
      • Reusing Pretrained Embeddings and Language Models
    • An Encoder-Decoder Network for Neural Machine Translation
      • Bidirectional RNNs
      • Beam Search
    • KLUE
    • MRC
    • Summarization
    • Generative
      • GPT
  • RecSys

  • Multi-modal Learning

Models

  • Explainable AI (XAI)

Support Vector Machines

  • Linear SVM Classification
    • Soft Margin Classification
  • Nonlinear SVM Classification
    • Polynomial Kernel
    • Similarity Features
    • Gaussian RBF Kernel
    • SVM Classes and Computational Complexity
  • SVM Regression
  • Under the Hood of Linear SVM Classifiers
  • The Dual Problem
    • Kernelized SVMs

Decision Trees

  • The CART Training Algorithm
  • Gini Impurity or Entropy?
  • Regularization Hyperparameters
  • Sensitivity to Axis Orientation
  • Decision Trees Have a High Variance
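
The two impurity measures a tree can use to score a node can be compared with a minimal sketch. The function names and the toy nodes are illustrative assumptions.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2, where p_k is the share of class k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum_k p_k log2 p_k."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

balanced = ["a"] * 5 + ["b"] * 5
skewed = ["a"] * 9 + ["b"] * 1
print(gini(balanced), entropy(balanced))                  # 0.5 1.0
print(round(gini(skewed), 3), round(entropy(skewed), 3))  # 0.18 0.469
```

Both measures are zero for a pure node and largest for an evenly mixed one, so they usually lead to similar trees; Gini avoids the logarithm and is slightly cheaper to compute.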

Ensemble Learning and Random Forests

  • Voting Classifiers
  • Bagging and Pasting
    • Out-of-Bag Evaluation
  • Random Forests
    • Feature Importance
  • Boosting
  • Stacking

Dimensionality Reduction

  • Projection
  • Manifold Learning
  • PCA
  • LLE

Clustering

  • k-means and DBSCAN
  • Gaussian Mixtures
    • Using Gaussian Mixtures for Anomaly Detection

Autoencoder

  • Efficient Data Representations
  • Performing PCA with an Undercomplete Linear Autoencoder
  • Stacked Autoencoders
  • Convolutional Autoencoders
  • Denoising Autoencoders
  • Sparse Autoencoders
  • Variational Autoencoders

Generative Adversarial Networks

  • The Difficulties of Training GANs
  • Deep Convolutional GANs
  • Progressive Growing of GANs
  • StyleGANs

Reinforcement Learning

  • Rewards
  • Policy Search
  • Neural Network Policies
  • Evaluating Actions: The Credit Assignment Problem
  • Policy Gradients
  • Markov Decision Processes
  • Temporal Difference Learning
  • Q-Learning
    • Exploration Policies
    • Approximate Q-Learning and Deep Q-Learning
  • Implementing Deep Q-Learning
  • Deep Q-Learning Variants
    • Fixed Q-value Targets
    • Double DQN
    • Prioritized Experience Replay
    • Dueling DQN

Diffusion Models

Service

Cloud Service

  • Microsoft Azure

  • Google Cloud services: Vertex AI

  • AWS

  • Cloudflare

Drug Discovery Service

  • Isomorphic Labs, DeepMind

  • Insilico Medicine


Reference:

  1. Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (HOML), 2nd ed.