MULTIMEDIA DATABASES (Basi di Dati Multimediali) 2013-14
Multimedia Recognition and Indexing
Professor: Alberto del Bimbo
Course program
Week 1
Section 1. Introduction to recognition and indexing of visual data
(Professor: Alberto del Bimbo)
Week 2
Section 2. Global image features (Recall of image analysis)
(Professor: Alberto del Bimbo)
Global image features: Color; Texture; Edges and Lines
Dimensionality reduction: PCA, LDA, Eigenfaces
References
[A] Richard Szeliski, Computer Vision: Algorithms and Applications, Springer 2010, Chapter 4
[B] Alberto del Bimbo, Visual Information Retrieval, Morgan Kaufmann, 1999, Chapters 2-4
Week 3
Section 3. The MPEG7 standard
(Professor: Alberto del Bimbo)
MPEG7 holistic descriptors [1]
References
[1] ISO/IEC TR 15938-8:2002, Information technology -- Multimedia content description interface -- Part 8: Extraction and use of MPEG-7 descriptions, http://www.iso.org/iso/
Week 4
Laboratory 1: MPEG7
(Assistant: Marco Bertini)
Week 5 - 6
Section 4. Local image features
(Professor: Alberto del Bimbo)
Rotation invariant Harris corner detector
Scale invariant keypoint detectors: Harris-Laplace [1], SIFT Scale Invariant Feature Transform [2], SURF Speeded-Up Robust Features [3]
Affine invariant region detectors: Harris affine, Intensity Extrema Regions, MSER Maximally Stable Extremal Regions [4]
Local descriptors: SIFT [2], Color SIFT, SURF [3], GLOH Gradient Location and Orientation Histogram
References
[A] Richard Szeliski, Computer Vision: Algorithms and Applications, Springer 2010, Chapter 4
[1] Krystian Mikolajczyk and Cordelia Schmid, A Performance Evaluation of Local Descriptors, IEEE TPAMI 2005
[2] David Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, 2004
[3] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool, Speeded-Up Robust Features (SURF), Computer Vision and Image Understanding, Elsevier, 2008
[4] J. Matas, O. Chum, M. Urban, T. Pajdla, Robust Wide Baseline Stereo from Maximally Stable Extremal Regions, British Machine Vision Conference (BMVC), 2002
Week 7
Section 5. Visual words and bag of words representation
(Professor: Alberto del Bimbo)
Visual Words and Bag of Words model: vocabulary formation by K-means, Radius-based clustering [1]
References
[A] Richard Szeliski, Computer Vision: Algorithms and Applications, Springer 2010, Chapter 14
[1] Gabriella Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski, Cédric Bray, Visual Categorization with Bags of Keypoints, ECCV Int. Workshop on Statistical Learning in Computer Vision, 2004
Week 8
Section 6. Object instance recognition
(Professor: Alberto del Bimbo)
Nearest Neighbour Matching
Geometric alignment and outlier rejection: Random Sample Consensus (RANSAC)
Video Google [1]
References
[A] Richard Szeliski, Computer Vision: Algorithms and Applications, Springer 2010, Chapters 4, 5, 6
[1] Josef Sivic, Andrew Zisserman, Video Google: A Text Retrieval Approach to Object Matching in Videos, ICCV 2003
Week 9 - 10
Section 7. Object categorization
(Professors: Alberto del Bimbo, Andy Bagdanov°, Lorenzo Seidenari*)
Bayes classification (Recall of statistical principles) °
Support Vector Machines discriminative classifier °
Partial matching of sets of features: Pyramid Match Kernel [1], Spatial Pyramid Matching
HOG Histogram of Oriented Gradients people detector [2]
Boosting classifier, AdaBoost
Viola and Jones face detector [3]
Probabilistic Latent Semantic Analysis generative classifier [4] *
Expectation maximization (Recall of statistical principles) *
References
[A] Richard Szeliski, Computer Vision: Algorithms and Applications, Springer 2010, Chapters 4, 5, 6
[B] Christopher Bishop, Pattern Recognition and Machine Learning, Springer 2006, Chapter 2
[1] Kristen Grauman and Trevor Darrell, The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features, IEEE ICCV Int. Conference, 2005
[2] Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, IEEE CVPR Int. Conference, 2005
[3] Paul Viola and Michael Jones, Robust Real-time Object Detection, Int. Workshop on Statistical and Computational Theories of Vision, 2001
[4] Florent Monay, Daniel Gatica-Perez, PLSA-based Image Auto-Annotation: Constraining the Latent Space, ACM Multimedia 2004
Week 11
Laboratory 2: Bag of Visual Words
(Assistants: Marco Bertini, Lamberto Ballan, Lorenzo Seidenari)
Week 12
Section 8. Working with image sequences
(Professors: Lorenzo Seidenari °, Andy Bagdanov*)
Spatio-temporal features: holistic features; local features: STIP Spatio-Temporal Interest Point detector [1], Dollár's spatio-temporal detector; local descriptors °
Action and Event recognition [2] °
Detection in video sequences *
References
[1] Ivan Laptev, On Space-Time Interest Points, International Journal of Computer Vision, 2005
[2] L. Ballan, M. Bertini, A. Del Bimbo, L. Seidenari, and G. Serra, "Effective Codebooks for Human Action Categorization," IEEE ICCV Int. Workshop on Video-oriented Object and Event Classification (VOEC), 2009.
Week 13
Laboratory 3: Detection and tracking
(Assistants: Andy Bagdanov, Giuseppe Lisanti)
Week 14
Section 9. Matching at large scale
(Professor: Alberto del Bimbo)
Hashing [2][3][4]
References
[A] Richard Szeliski, Computer Vision: Algorithms and Applications, Springer 2010, Chapter 14
[1] David Nistér and Henrik Stewénius, Scalable Recognition with a Vocabulary Tree, IEEE CVPR Int. Conference, 2006
[2] Aristides Gionis, Piotr Indyk, Rajeev Motwani, Similarity Search in High Dimensions via Hashing, VLDB Int. Conference, 1999
[3] Brian Kulis and Kristen Grauman, Kernelized Locality-Sensitive Hashing for Scalable Image Search, IEEE ICCV Int. Conference, 2009
[4] Mohamed Aly, Peter Welinder, Mario Munich, Pietro Perona, Scaling Object Recognition: Benchmark of Current State of the Art Techniques, IEEE ICCV Int. Conference, 2009
Week 15
Section 10. Exploiting human knowledge
(Professors: Giuseppe Serra °, Marco Bertini *)
WordNet and ontologies [1] °
RDF, OWL, SWRL °
Data from Social Networks *
References
[1] John Davies, Dieter Fensel, Frank van Harmelen, Towards the Semantic Web: Ontology-driven Knowledge Management, 2002
Course slides
Free pdf copy downloadable at: http://www.micc.unifi.it/delbimbo/teaching/multimedia-databases
(password protected)
Reference textbooks
[A] Richard Szeliski, Computer Vision: Algorithms and Applications, Springer 2010
Free copy downloadable at: http://szeliski.org/Book/
for details on algorithms and solutions
[B] Alberto del Bimbo, Visual Information Retrieval, Morgan Kaufmann, 1999
for details on algorithms and solutions