Video segmentation has always been a major topic in computer vision and multimedia. Indeed, partitioning the frames of a video into groups of pixels that share certain visual characteristics (color, intensity, textures, etc.) is the first step towards the “understanding” of the captured scene. In other words, the goal of segmentation is to transform the images into a set of adjacent regions that are more meaningful and easier to analyse.
This crucial step has so many applications that we will cite only a subset:
- Object detection: because many objects have a characteristic color/intensity/texture, photometric segmentation makes it possible to isolate such objects in an image/video. For example, the car industry is pushing forward on the automatic detection and recognition of traffic signs/lights, which relies on color segmentation.
- Data compression: because not all parts of an image are equally important (e.g. face vs background in a portrait), segmentation makes it possible to partition an image and compress the resulting groups independently.
- Medical imaging: tumours and other pathologies sometimes have distinctive shapes/intensities. By computing relevant separations, segmentation makes it possible to delineate such shapes.
- 3D reconstruction: if a relation between the 2D domain (image) and the 3D domain (scene) is known, the resulting contours after image segmentation can be used to create a 3D model of the segmented object. This technique is for example widely used in [[ViewInterpolation | model-based rendering]].
- Content-based image retrieval: given an image (a query), similar images can be found automatically. Instead of defining this similarity at the pixel level (which is both computationally expensive and sensitive to noise), research tools typically compare the segmented regions.
- Tracking: video segmentation can be used to extract color features which, in turn, can be exploited by multiple object tracking to disambiguate between targets.
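As a rough illustration of the color segmentation mentioned for traffic signs above, the sketch below marks strongly red pixels by thresholding in HSV space. It is only a minimal example under stated assumptions: the function name `red_mask`, the image representation (nested lists of RGB tuples), and all threshold values are hypothetical and untuned, and a practical system would need to handle lighting changes and noise.

```python
import colorsys

def red_mask(image, min_saturation=0.5, min_value=0.3, hue_tolerance=0.05):
    """Return a binary mask marking strongly red pixels.

    `image` is a list of rows, each row a list of (r, g, b) tuples in 0..255.
    All thresholds are illustrative assumptions, not tuned values.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            # Red sits at both ends of the hue circle (near 0 and near 1).
            is_red_hue = h <= hue_tolerance or h >= 1.0 - hue_tolerance
            # Require enough saturation and brightness to reject grey/dark pixels.
            mask_row.append(is_red_hue and s >= min_saturation and v >= min_value)
        mask.append(mask_row)
    return mask

# Tiny 2x2 test image: red, dark grey, green, red-orange.
image = [
    [(200, 10, 10), (30, 30, 30)],
    [(20, 180, 40), (220, 30, 20)],
]
mask = red_mask(image)
```

Working in HSV rather than RGB separates hue from brightness, which is why the grey pixel is rejected by the saturation test even though its hue is nominally zero.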
Our group works on efficient and effective methods to segment videos in a way that simplifies their analysis.
For example, statistics and summaries of sport events could be produced automatically by detecting and recognizing the players and their actions on the field. This recognition requires extracting representative features that distinguish players from one another, such as the number printed on their jerseys.
Once the plausible digit regions have been segmented, they are recognized using feature-based classification.
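The text does not detail how candidate digit regions are obtained. One common generic approach, shown here purely as an assumed sketch, is to label the connected components of a binary mask and keep those whose bounding box has a digit-like shape. The function names `connected_regions` and `plausible_digit_boxes`, and the size and aspect-ratio limits, are hypothetical and are not taken from the group's method.

```python
from collections import deque

def connected_regions(mask):
    """Return bounding boxes (top, left, bottom, right) of 4-connected True regions."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def plausible_digit_boxes(boxes, min_height=3, min_aspect=1.0, max_aspect=4.0):
    """Keep boxes that are tall enough and at least as tall as wide (assumed heuristic)."""
    kept = []
    for top, left, bottom, right in boxes:
        box_height = bottom - top + 1
        box_width = right - left + 1
        if box_height >= min_height and min_aspect <= box_height / box_width <= max_aspect:
            kept.append((top, left, bottom, right))
    return kept

# Toy mask: a vertical stroke (digit-like) and a short horizontal blob (rejected).
mask = [
    [1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
mask = [[bool(v) for v in row] for row in mask]
boxes = connected_regions(mask)
candidates = plausible_digit_boxes(boxes)
```

The surviving boxes would then be cropped and passed to the feature-based classifier; the geometric filter only serves to discard regions that are clearly not digits before that more expensive step.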