Image Segmentation Fundamentals

Image segmentation is a fundamental process in computer vision that subdivides an image into its constituent regions or objects. It is commonly defined as the process of grouping together pixels that share similar attributes, partitioning the image into non-intersecting, homogeneous regions. Segmentation is typically the first phase in pattern recognition and object isolation problems.

The typical image classification cycle follows a distinct pipeline: the image is acquired and preprocessed, segmented into regions, and the resulting regions are described by features that drive the final classification.


Approaches to Image Segmentation

The primary goal of segmentation is to isolate individual objects within an image. There are two fundamental approaches to achieving this:

  1. Discontinuity (Boundary-based): This approach partitions an image based on abrupt changes in intensity, such as points, lines, and edges.

  2. Similarity (Region-based): This approach partitions an image into regions that are similar according to predefined criteria, using techniques like thresholding, region growing, and region splitting and merging.


Detection of Discontinuities

To find discontinuities (points, lines, or edges), the most common method involves passing a small mask (or filter) over the image. The mask determines the specific type of discontinuity being targeted. The response of the mask at any given point is the sum of the products of the mask coefficients and the corresponding image pixels: $R = \sum_{k=1}^{9} w_k z_k$, where $w_k$ are the mask coefficients and $z_k$ are the gray levels of the pixels under the mask.

Point Detection

Point detection utilizes a mask whose coefficients sum to zero, ensuring that the response is zero in areas of constant gray level. A point is detected if the absolute value of the response is greater than or equal to a non-negative threshold: $|R| \ge T$.
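As a sketch of how such a mask is applied (plain NumPy; `mask_response` and `detect_points` are illustrative names, not from the source):

```python
import numpy as np

def mask_response(image, mask):
    """Response R at each interior pixel: the sum of mask coefficients
    times the corresponding pixels (borders are left at zero)."""
    m, n = image.shape
    R = np.zeros((m, n), dtype=float)
    for i in range(1, m - 1):
        for j in range(1, n - 1):
            R[i, j] = np.sum(mask * image[i-1:i+2, j-1:j+2])
    return R

def detect_points(image, T):
    """A point is flagged where |R| >= T."""
    mask = np.array([[-1, -1, -1],
                     [-1,  8, -1],
                     [-1, -1, -1]], dtype=float)
    return (np.abs(mask_response(image, mask)) >= T).astype(np.uint8)

# A flat patch with one bright pixel: only that pixel responds strongly,
# because the coefficients sum to zero over constant gray levels.
img = np.full((5, 5), 10.0)
img[2, 2] = 100.0
```

With `T = 200`, the response at the isolated pixel is 8·100 − 8·10 = 720, well above the threshold, while every neighbour's response stays below it.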

Line Detection

Line detection uses specific masks designed to extract lines that are one pixel thick in a particular direction. In digital images, straight lines are typically evaluated in horizontal, vertical, or diagonal directions.
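The four directional masks can be applied in exactly the same way; this sketch (the dictionary and helper names are my own) correlates a chosen mask over the image interior:

```python
import numpy as np

# Directional line masks for one-pixel-thick lines; coefficients sum to zero.
LINE_MASKS = {
    "horizontal": np.array([[-1, -1, -1], [ 2,  2,  2], [-1, -1, -1]], dtype=float),
    "vertical":   np.array([[-1,  2, -1], [-1,  2, -1], [-1,  2, -1]], dtype=float),
    "+45":        np.array([[ 2, -1, -1], [-1,  2, -1], [-1, -1,  2]], dtype=float),
    "-45":        np.array([[-1, -1,  2], [-1,  2, -1], [ 2, -1, -1]], dtype=float),
}

def line_response(image, direction):
    """Correlate the chosen directional mask over the interior of the image."""
    mask = LINE_MASKS[direction]
    m, n = image.shape
    R = np.zeros((m, n), dtype=float)
    for i in range(1, m - 1):
        for j in range(1, n - 1):
            R[i, j] = np.sum(mask * image[i-1:i+2, j-1:j+2])
    return R

# A one-pixel-thick horizontal line responds strongly to the horizontal
# mask and not at all to the vertical one.
img = np.zeros((5, 5))
img[2, :] = 10.0
```

On this image the horizontal response at the line's center is 2·(10+10+10) = 60, while the vertical mask's response there cancels to zero.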

Edge Detection

An edge is a set of connected pixels lying on the boundary between two regions. Detecting edges can be challenging because derivative-based edge detectors are extremely sensitive to noise, and actual edges often resemble a "ramp" profile rather than an ideal step profile due to blurring.

First-Order Derivatives (Gradient)

The first derivative of an image identifies edges by finding the maximum rate of change in intensity. The gradient of an image $f(x,y)$ is a vector $\nabla f$ containing the partial derivatives in the x and y directions.

Several operators are used to compute these gradients, including the Roberts, Prewitt, and Sobel masks summarized in the table below.

To output a binary segmentation matrix, a threshold is applied to the final gradient magnitude. Due to the high level of detail sometimes captured by these masks, images are often smoothed prior to edge detection.
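A minimal sketch of this thresholding step with the standard Sobel masks (the helper name `sobel_edges` is mine):

```python
import numpy as np

# Standard Sobel masks: Gx responds to vertical intensity change,
# Gy to horizontal change.
SOBEL_X = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)
SOBEL_Y = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)

def sobel_edges(image, T):
    """Gradient magnitude sqrt(Gx^2 + Gy^2), thresholded to a binary map."""
    m, n = image.shape
    gx = np.zeros((m, n)); gy = np.zeros((m, n))
    for i in range(1, m - 1):
        for j in range(1, n - 1):
            patch = image[i-1:i+2, j-1:j+2]
            gx[i, j] = np.sum(SOBEL_X * patch)
            gy[i, j] = np.sum(SOBEL_Y * patch)
    mag = np.sqrt(gx**2 + gy**2)
    return (mag >= T).astype(np.uint8)

# A vertical step edge between a dark half and a bright half.
img = np.zeros((5, 6))
img[:, 3:] = 10.0
```

The binary output marks the columns straddling the step (magnitude 40 there) and leaves the flat regions at zero.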

Second-Order Derivatives (Laplacian)

The second derivative finds edges by searching for "zero crossings": the points where the second derivative changes sign, indicating the midpoint of an edge ramp. The Laplacian of a 2D function is the sum of its unmixed second partial derivatives: $\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$.

Laplacian of a Gaussian (LoG)

Because the Laplacian is highly sensitive to noise, it is rarely used alone. Instead, it is combined with a Gaussian smoothing filter, creating the Laplacian of a Gaussian, also known as the Mexican hat function. This method uses the Gaussian component for noise removal and the Laplacian component for edge detection.
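Assuming SciPy is available, `scipy.ndimage.gaussian_laplace` combines both steps in one call; the zero-crossing scan below is a simple sign-change test between neighbouring pixels (the helper name is mine):

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_edges(image, sigma):
    """Gaussian smoothing + Laplacian (LoG), then mark zero crossings."""
    L = gaussian_laplace(image.astype(float), sigma)
    edges = np.zeros(L.shape, dtype=bool)
    # A zero crossing exists where the LoG response changes sign between
    # a pixel and its right or lower neighbour (product of the two < 0).
    edges[:, :-1] |= (L[:, :-1] * L[:, 1:]) < 0
    edges[:-1, :] |= (L[:-1, :] * L[1:, :]) < 0
    return edges.astype(np.uint8)

# A vertical step edge: the LoG response is positive on the dark side and
# negative on the bright side, so zero crossings line up along the boundary.
img = np.zeros((8, 8))
img[:, 4:] = 10.0
```

In practice a small magnitude threshold is often added to the sign test so that numerical noise in flat regions does not produce spurious crossings.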


Edge Linking and Boundary Detection

After detecting edge points, local processing is used to link them together to form continuous boundaries. Adjacent edge points are linked if they share similar properties within a local neighborhood.

Two primary criteria must be met to link a pixel at $(x_0, y_0)$ to a neighboring pixel at $(x, y)$: their gradient magnitudes must be similar, $|\nabla f(x,y) - \nabla f(x_0,y_0)| \le E$, and their gradient angles must be similar, $|\alpha(x,y) - \alpha(x_0,y_0)| < A$, for non-negative thresholds $E$ and $A$.

Masks below are written row by row, with rows separated by semicolons; e.g., [-1 -1 -1; 2 2 2; -1 -1 -1] is a 3×3 kernel.

| Technique | Category | Kernel / Mask | Key Characteristics & Methodology |
|---|---|---|---|
| Point Detection | Discontinuity | [-1 -1 -1; -1 8 -1; -1 -1 -1] | Detects isolated points. The coefficients sum to zero; a point is detected if the absolute response meets the threshold (abs(R) ≥ T). |
| Horizontal Line | Discontinuity | [-1 -1 -1; 2 2 2; -1 -1 -1] | Extracts straight lines that are one pixel thick in the horizontal direction. |
| Vertical Line | Discontinuity | [-1 2 -1; -1 2 -1; -1 2 -1] | Extracts straight lines that are one pixel thick in the vertical direction. |
| Diagonal Line (+45°) | Discontinuity | [2 -1 -1; -1 2 -1; -1 -1 2] | Extracts straight lines that are one pixel thick in the positive diagonal direction. |
| Diagonal Line (-45°) | Discontinuity | [-1 -1 2; -1 2 -1; 2 -1 -1] | Extracts straight lines that are one pixel thick in the negative diagonal direction. |
| Roberts Cross | Edge (1st Order) | Gx = [1 0; 0 -1], Gy = [0 1; -1 0] | Computes the 2D spatial gradient using a simple 2×2 neighborhood to find maximum rates of intensity change. |
| Prewitt Operator | Edge (1st Order) | Gx = [-1 -1 -1; 0 0 0; 1 1 1], Gy = [-1 0 1; -1 0 1; -1 0 1] | Calculates gradient magnitude and direction. It is highly sensitive to noise, often requiring prior image smoothing. |
| Sobel Operator | Edge (1st Order) | Gx = [-1 -2 -1; 0 0 0; 1 2 1], Gy = [-1 0 1; -2 0 2; -1 0 1] | Provides a localized smoothing effect by giving higher weight (2 and -2) to the center pixels, making it slightly more robust to noise than Prewitt. |
| Laplacian (8-neighbor) | Edge (2nd Order) | [1 1 1; 1 -8 1; 1 1 1] | Detects edges by locating "zero crossings" (where the 2nd derivative changes sign). Extremely sensitive to noise and rarely used alone. |
| Laplacian of a Gaussian (LoG) | Edge (2nd Order) | Gaussian filter followed by Laplacian | Uses the Gaussian component to remove noise and blur the image before the Laplacian component detects the edges via zero crossings (Mexican hat function). |
| Local Processing | Edge Linking | Local neighborhood analysis (e.g., 3×3 or 5×5) | Links detected edge pixels if they have similar gradient magnitudes and similar gradient directions. |

Worked Example: Sobel Gradients and Edge Linking

(Pasted image 20260413233224.png: the input image for this worked example)

1. Compute Gradient Matrices (dx and dy)

First, we apply the horizontal and vertical Sobel masks to the image using zero-padding for the boundaries.

According to the slides, the Sobel masks used are:

$$d_x = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} \qquad d_y = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}$$

Applying these masks to the image yields the following gradient matrices:

dx Matrix:

$$\begin{bmatrix} 12 & 14 & 18 & 12 \\ 15 & 9 & 11 & 2 \\ 14 & 12 & 18 & 12 \end{bmatrix}$$

dy Matrix:

$$\begin{bmatrix} 6 & 0 & 6 & 10 \\ 9 & 5 & 1 & 24 \\ 12 & 8 & 6 & 22 \end{bmatrix}$$

2. Compute Gradient Magnitude

The gradient magnitude is calculated using the formula $|\nabla f| = \sqrt{d_x^2 + d_y^2}$. The slides provide the computed matrix as:

Magnitude Matrix:

$$\begin{bmatrix} 13.4 & 14.0 & 18.9 & 15.6 \\ 12.72 & 15.8 & 11.04 & 24.08 \\ 16.97 & 15.23 & 19.69 & 25.05 \end{bmatrix}$$

3. Compute Gradient Angle

The angle of the gradient is perpendicular to the edge direction and is calculated using the formula $\alpha(x,y) = \tan^{-1}\!\left(\frac{d_y}{d_x}\right)$.

For example, calculating the angle for the top-left pixel (where $d_x = 12$ and $d_y = 6$):

$$\alpha(0,0) = \tan^{-1}\!\left(\frac{6}{12}\right) = 0.464 \text{ radians}$$
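That single-pixel computation can be checked directly with Python's math module:

```python
import math

dx, dy = 12.0, 6.0  # gradient components of the top-left pixel
magnitude = math.hypot(dx, dy)   # sqrt(dx^2 + dy^2) = sqrt(180)
angle = math.atan(dy / dx)       # tan^-1(6/12)
print(round(magnitude, 1), round(angle, 3))  # 13.4 0.464
```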

Applying this to every pixel gives us the Angle Matrix (in radians):

$$\begin{bmatrix} 0.464 & 0.000 & 0.322 & 0.695 \\ 0.540 & 0.507 & 0.090 & 1.488 \\ 0.709 & 0.588 & 0.322 & 1.071 \end{bmatrix}$$

4. Apply Local Edge Linking Criteria

To link edges, we evaluate a local neighborhood against a specific "seed" pixel (x0,y0). A neighboring pixel (x,y) is linked to the seed pixel if both of the following criteria are met:

  1. Magnitude Similarity: $|\nabla f(x,y) - \nabla f(x_0,y_0)| \le E$ (where $E$ is a non-negative magnitude threshold).

  2. Angle Similarity: $|\alpha(x,y) - \alpha(x_0,y_0)| < A$ (where $A$ is a non-negative angle threshold).

Because the problem statement in Example-3 does not specify the seed pixel or the thresholds $E$ and $A$, simply assume values for both thresholds, compute the magnitude and angle differences between each neighbor and the seed, and, wherever both differences fall within the thresholds, mark that pixel as linked (change its entry from 0 to 1).
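Under assumed thresholds (E = 5 and A = 0.5 below; both are my choice, since the example leaves them open), the linking rule can be sketched as follows, using the top-left corner of the magnitude and angle matrices above as sample data:

```python
import numpy as np

def link_edges(mag, ang, seed, E, A):
    """Mark the 8-neighbours of the seed that satisfy both linking criteria."""
    m, n = mag.shape
    y0, x0 = seed
    linked = np.zeros((m, n), dtype=np.uint8)
    linked[y0, x0] = 1
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            y, x = y0 + dy, x0 + dx
            if (dy, dx) != (0, 0) and 0 <= y < m and 0 <= x < n:
                # Both magnitude and angle must be close to the seed's values.
                if (abs(mag[y, x] - mag[y0, x0]) <= E and
                        abs(ang[y, x] - ang[y0, x0]) < A):
                    linked[y, x] = 1
    return linked

# Top-left 2x2 corner of the magnitude and angle matrices from the slides.
mag = np.array([[13.4, 14.0], [12.72, 15.8]])
ang = np.array([[0.464, 0.000], [0.540, 0.507]])
linked = link_edges(mag, ang, seed=(0, 0), E=5.0, A=0.5)
```

With these thresholds all three neighbours of the seed are linked; tightening the angle threshold to A = 0.3 drops the (0, 1) neighbour, whose angle differs from the seed's by 0.464.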