Deep Learning for Vision (DLV)


This course studies how to learn visual representations for common computer vision tasks, including matching, retrieval, classification, and object detection. It also covers related problems such as indexing, nearest neighbor search, clustering, and dimensionality reduction. The course first discusses well-known methods, from low-level description to intermediate representation, and how they depend on the end task. It then studies a data-driven approach in which the entire pipeline is optimized jointly in a supervised fashion, according to a task-dependent objective. Deep learning models are studied in detail and interpreted in relation to conventional models. The focus of the course is on recent, state-of-the-art methods and large-scale applications.


Visual representations, convolutional neural networks, classification, regression, matching, image retrieval, object detection


Basic knowledge of Linear Algebra, Calculus, Probability, Machine Learning, Python, C++


Part 1 - Conventional methods
  • Global and local visual descriptors, dense and sparse representation, feature detectors. Encoding and pooling methods, vocabularies, bag-of-words. Match kernels, high-dimensional embedding, Fisher vectors, vectors of locally aggregated descriptors.
  • Spatial matching, geometric models, RANSAC, Hough transform. Pyramid matching, spatial and Hough pyramids. Object localization and detection, subwindow search, constellation model, Hough model, deformable part model.
  • Indexing and approximate nearest neighbor search. Tree-based methods, inverted index and multi-index. Hashing, product quantization, optimized methods. Clustering, dimensionality reduction, density estimation, expectation-maximization.
  • Naive Bayes and nearest neighbor classification. Linear regression and classification, logistic regression, support vector machines, neural networks. Activation functions and loss functions, stochastic gradient descent.
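
As an illustration of the encoding and pooling step listed in Part 1, the following is a minimal sketch of bag-of-words encoding with hard assignment, not material from the course itself. The names `bag_of_words` and `squared_distance`, the toy two-dimensional descriptors, and the three-word vocabulary are assumptions made for the example; in practice the vocabulary would be learned by clustering (e.g. k-means) on real local descriptors.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bag_of_words(descriptors, vocabulary):
    """Assign each descriptor to its nearest visual word and return
    the L1-normalized histogram of word counts."""
    counts = Counter()
    for d in descriptors:
        nearest = min(
            range(len(vocabulary)),
            key=lambda k: squared_distance(d, vocabulary[k]),
        )
        counts[nearest] += 1
    total = len(descriptors)
    if total == 0:
        # No descriptors: return an all-zero histogram.
        return [0.0] * len(vocabulary)
    return [counts[k] / total for k in range(len(vocabulary))]


# Toy data (hypothetical): three visual words, four local descriptors.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (3.7, 0.2)]
histogram = bag_of_words(descriptors, vocabulary)
```

The histogram discards the spatial layout of the descriptors, which is why the spatial matching and pyramid methods listed above are typically used alongside it.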
Part 2 - Deep learning approach
  • Back-propagation, computational graphs, automatic differentiation, Jacobian and Hessian calculation.
  • Convolution, pooling, strided convolution, dilated convolution. Convolutional neural networks, deep learning. Deconvolution, fully convolutional networks. Efficient implementations.
  • Parameter initialization, data-dependent initialization, normalization, regularization. Optimization methods, second-order methods, Hessian-free methods.
  • Convolutional networks for object localization and detection. Class-agnostic region proposals, bounding box regression, non-maximum suppression, part-based models, spatial transformers, attention networks.
  • Recurrent networks, context modeling. Residual networks, architecture learning. Network visualization, image synthesis, adversarial learning. Transfer learning, weakly supervised, semi-supervised and unsupervised learning.
  • Convolutional networks for image retrieval. Siamese, triplet, and batch-wise loss functions. Embedding, pooling, dimensionality reduction and manifold learning. Region proposals, partial matching, spatial matching, quantization and diffusion.
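
As an illustration of one detection post-processing step listed in Part 2, the following is a minimal sketch of greedy suppression of overlapping detections, based on intersection over union (IoU). The function names, the `(x1, y1, x2, y2)` box format, and the threshold of 0.5 are assumptions chosen for the example, not the course's reference implementation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def suppress_overlaps(boxes, scores, iou_threshold=0.5):
    """Visit boxes in decreasing score order and keep a box only if it
    does not overlap an already kept box by more than the threshold.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Toy detections (hypothetical): the first two boxes overlap heavily.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = suppress_overlaps(boxes, scores)
```

Here the second box is discarded because its IoU with the higher-scoring first box is 81/119, above the threshold, while the third box does not overlap either and is kept.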

Learning outcomes

  • Understand conventional visual representations including local features, descriptors, vocabularies, encoding.
  • Understand geometric models, transformations, invariant matching, robust estimation.
  • Be able to turn existing algorithms and pipelines into parameterized, differentiable functions that can be learned from data in an end-to-end fashion.
  • Understand deep learning essentials including backpropagation, initialization, regularization, activation and loss functions for different tasks.
  • Understand different models and architectures including residual and recurrent networks; siamese, unsupervised, adversarial and transfer learning.
  • Become familiar with existing deep learning libraries and models, and be able to use them.
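
To make the outcome about differentiable pipelines more concrete, here is a minimal sketch of one possible relaxation, given as an assumption rather than the course's own method: the hard nearest-word assignment of a bag-of-words encoder is replaced by a softmax over negative squared distances, so the output varies smoothly with both the descriptor and the vocabulary. The name `soft_assign` and the `temperature` parameter are hypothetical.

```python
import math


def soft_assign(descriptor, vocabulary, temperature=1.0):
    """Soft assignment weights of a descriptor to each visual word:
    a softmax over negative squared distances, scaled by temperature."""
    logits = [
        -sum((x - y) ** 2 for x, y in zip(descriptor, word)) / temperature
        for word in vocabulary
    ]
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy data (hypothetical): three visual words and one descriptor.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
weights = soft_assign((0.9, 1.2), vocabulary)
sharp = soft_assign((0.9, 1.2), vocabulary, temperature=0.01)
```

As the temperature decreases, the weights approach the one-hot hard assignment; at higher temperatures they spread over several words, which keeps gradients with respect to the vocabulary non-zero.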


Ewa Kijak (responsible), Yannis Avrithis