Gaze Processing Module

Developer reference for the gaze estimation and processing pipeline in MindSight.

1. Overview

The gaze processing subsystem is spread across five key locations:

| File / Directory | Purpose |
| --- | --- |
| `gaze_factory.py` | Selects and instantiates the active gaze backend |
| `gaze_processing.py` (~1,000 lines) | Core processing classes: smoothing, lock-on, snap hysteresis, ray geometry |
| `gaze_pipeline.py` | Per-frame coordinator that wires detection, estimation, and post-processing together |
| `pitchyaw_pipeline.py` | Pitch/yaw-specific pipeline utilities |
| `Backends/` | Plugin backends (MGaze, L2CS, UniGaze, Gazelle) |

2. Module Architecture

flowchart TD
    A[gaze_pipeline.py] -->|coordinates| B[face_det]
    B --> C[faces]
    C --> D{gaze_eng has custom run_pipeline?}
    D -- yes --> E[gaze_eng.run_pipeline]
    D -- no --> F[_default_scene_pipeline]
    E --> G[apply_tip_snapping]
    F --> G
    G --> H[apply_lock_on]
    H --> I[compute_ray_intersections]
    I --> J[ctx writes]

3. Gaze Factory

File: gaze_factory.py

create_gaze_engine(plugin_args) -> GazePlugin

Selects and instantiates the active gaze backend based on CLI flags:

Checks gaze_registry for installed plugins first.
Falls back to built-in backends if no plugin matches.

The returned engine conforms to the GazePlugin interface, which every backend must implement.
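
The registry-first lookup described above can be sketched as follows. This is a minimal illustration, not the code in gaze_factory.py: the two backend classes, the `BUILTIN_BACKENDS` table, and the `backend_name` argument are hypothetical stand-ins, and the real function receives `plugin_args` derived from CLI flags.

```python
from typing import Callable, Dict


# Hypothetical stand-ins; the real backend classes are not shown in this reference.
class BuiltinMGaze:
    name = "mgaze"


class PluginUniGaze:
    name = "unigaze"


# Assumed shape: both tables map a backend name to a zero-argument constructor.
gaze_registry: Dict[str, Callable[[], object]] = {"unigaze": PluginUniGaze}
BUILTIN_BACKENDS: Dict[str, Callable[[], object]] = {"mgaze": BuiltinMGaze}


def create_gaze_engine_sketch(backend_name: str):
    """Look in the plugin registry first, then fall back to built-in backends."""
    factory = gaze_registry.get(backend_name) or BUILTIN_BACKENDS.get(backend_name)
    if factory is None:
        raise ValueError(f"unknown gaze backend: {backend_name!r}")
    return factory()
```

Checking the registry first means an installed plugin takes precedence over a built-in backend registered under the same name.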

4. Gaze Pipeline Coordinator

File: gaze_pipeline.py

Entry point: run_gaze_step(ctx, face_det, gaze_eng, gaze_cfg)

Execution order

Face detection -- Run RetinaFace on detection_frame, then rescale coordinates back to the original frame space using inverse_scale.
Plugin delegation -- If gaze_eng exposes a custom run_pipeline(), delegate to it; otherwise call _default_scene_pipeline().
Post-processing chain -- apply_tip_snapping -> apply_lock_on -> compute_ray_intersections.
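
The first two steps can be sketched as below. This is an illustration under stated assumptions rather than the actual run_gaze_step: it assumes `inverse_scale` is a single scalar multiplier, passes bare bounding boxes instead of a FrameContext, and uses placeholder engine classes and a placeholder default pipeline.

```python
def rescale_bbox(bbox, inverse_scale):
    """Map a box from detection_frame coordinates back to the original frame.

    Assumes inverse_scale is one scalar applied to every coordinate.
    """
    return tuple(v * inverse_scale for v in bbox)


def default_scene_pipeline(faces):
    # Placeholder for _default_scene_pipeline; just tags each face.
    return [("default", f) for f in faces]


def run_gaze_step_sketch(detected_bboxes, inverse_scale, gaze_eng):
    # Step 1: bring detections back into original-frame coordinates.
    faces = [rescale_bbox(b, inverse_scale) for b in detected_bboxes]
    # Step 2: delegate to the engine's own pipeline when it provides one.
    custom = getattr(gaze_eng, "run_pipeline", None)
    if callable(custom):
        return custom(faces)
    return default_scene_pipeline(faces)


class PerFaceEngine:
    """Hypothetical engine without a custom pipeline."""


class SceneEngine:
    """Hypothetical engine that overrides the pipeline."""

    def run_pipeline(self, faces):
        return [("custom", f) for f in faces]
```

With these stand-ins, `run_gaze_step_sketch([(10, 20, 30, 40)], 2.0, PerFaceEngine())` takes the default path, while passing `SceneEngine()` takes the custom one.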

FrameContext reads

frame, detection_frame, inverse_scale, objects, cached_faces, smoother, locker, snap_hysteresis

FrameContext writes

persons_gaze, face_confs, face_bboxes, face_track_ids, all_targets, hits, hit_events, lock_info, ray_snapped, ray_extended

5. Core Processing Classes

File: gaze_processing.py

GazeSmootherReID

Temporal EMA smoothing combined with re-identification across frames.

Tracks faces using position proximity and colour histogram matching.
Lost tracks stay in the buffer for a grace period of reid_grace_seconds before being discarded.

smooth_and_track(detections, gaze, face_crops, bboxes)
    -> (persons_gaze, track_ids)
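
The combination of EMA smoothing and a grace period can be sketched as follows. This is a simplified illustration, not the GazeSmootherReID implementation: it matches faces by centre distance only (the colour histogram term is omitted), handles one face per call, and the names `EmaTrackerSketch`, `alpha`, `max_dist`, and `grace_seconds` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    center: tuple      # last known face centre (x, y)
    gaze: tuple        # smoothed (pitch, yaw)
    last_seen: float   # timestamp in seconds


class EmaTrackerSketch:
    def __init__(self, alpha=0.5, max_dist=50.0, grace_seconds=1.0):
        self.alpha = alpha
        self.max_dist = max_dist
        self.grace_seconds = grace_seconds
        self.tracks = {}
        self._next_id = 0

    def update(self, center, gaze, now):
        # Discard tracks whose grace period has expired.
        self.tracks = {tid: t for tid, t in self.tracks.items()
                       if now - t.last_seen <= self.grace_seconds}
        # Re-identify by the nearest surviving track within max_dist.
        best_id, best_d = None, self.max_dist
        for tid, t in self.tracks.items():
            d = math.dist(center, t.center)
            if d <= best_d:
                best_id, best_d = tid, d
        if best_id is None:
            # No match: start a new track seeded with the raw gaze.
            best_id = self._next_id
            self._next_id += 1
            self.tracks[best_id] = Track(center, gaze, now)
            return best_id, gaze
        # Match: blend the new sample with the stored estimate (EMA).
        a = self.alpha
        prev = self.tracks[best_id].gaze
        smoothed = tuple(a * g + (1 - a) * s for g, s in zip(gaze, prev))
        self.tracks[best_id] = Track(center, smoothed, now)
        return best_id, smoothed
```

A larger `alpha` follows new measurements more closely; a smaller one smooths more but lags behind real gaze shifts.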

GazeLockTracker

Fixation lock-on mechanism.

When a participant gazes near the same object for >= dwell_frames consecutive frames, their gaze is locked to that object.

update(persons_gaze, face_bboxes, objects, hit_events)
    -> (lock_info, updated_gaze)
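
The dwell rule can be sketched with a per-person counter. This is a minimal illustration, not the GazeLockTracker code: it assumes the caller has already decided which object (if any) the gaze is near, and `DwellLockSketch` and its `update(person_id, near_object)` signature are hypothetical.

```python
class DwellLockSketch:
    def __init__(self, dwell_frames=3):
        self.dwell_frames = dwell_frames
        self._candidate = {}  # person_id -> object currently being dwelt on
        self._count = {}      # person_id -> consecutive frames on that object

    def update(self, person_id, near_object):
        """Return the locked object id, or None when no lock is active.

        near_object is the id of the object the gaze is near this frame,
        or None when the gaze is near nothing.
        """
        if near_object is None or near_object != self._candidate.get(person_id):
            # Gaze moved to a different object (or to nothing): restart the count.
            self._candidate[person_id] = near_object
            self._count[person_id] = 1 if near_object is not None else 0
        else:
            self._count[person_id] += 1
        if near_object is not None and self._count[person_id] >= self.dwell_frames:
            return near_object
        return None
```

With `dwell_frames=3`, the third consecutive frame near the same object is the first one that returns a lock; any frame near a different object resets the count.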

SnapHysteresisTracker

Adaptive snap with hysteresis to prevent rapid switching between snap targets.

Weighted scoring combines three factors:
snap_w_dist -- distance from ray to target
snap_w_size -- angular size of target
snap_w_intersect -- ray-bbox intersection depth
switch_frames sets the minimum number of frames before the tracker will change its snap target.
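
One way to combine the weighted score with a switch_frames delay is sketched below. The sign conventions (smaller distance is better, larger size and intersection depth are better), the default weights, and the streak-based switching rule are assumptions made for illustration; the actual SnapHysteresisTracker may score and switch differently.

```python
class SnapHysteresisSketch:
    def __init__(self, w_dist=1.0, w_size=0.5, w_intersect=0.5, switch_frames=3):
        self.w_dist = w_dist
        self.w_size = w_size
        self.w_intersect = w_intersect
        self.switch_frames = switch_frames
        self.current = None       # active snap target
        self._challenger = None   # target currently out-scoring the active one
        self._streak = 0          # consecutive frames the challenger has won

    def score(self, dist, size, intersect):
        # Assumed convention: distance counts against a target, the others for it.
        return -self.w_dist * dist + self.w_size * size + self.w_intersect * intersect

    def update(self, candidates):
        """candidates maps target_id -> (dist, size, intersect). Returns the active target."""
        if not candidates:
            self._challenger, self._streak = None, 0
            return self.current
        best = max(candidates, key=lambda t: self.score(*candidates[t]))
        if self.current is None:
            self.current = best
        elif best == self.current:
            self._challenger, self._streak = None, 0
        else:
            if best == self._challenger:
                self._streak += 1
            else:
                self._challenger, self._streak = best, 1
            # Switch only after the challenger has won switch_frames frames in a row.
            if self._streak >= self.switch_frames:
                self.current, self._challenger, self._streak = best, None, 0
        return self.current
```

The streak counter is what provides the hysteresis: a single frame in which another target scores higher does not change the snap target.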

6. Ray Geometry

Files: gaze_processing.py, utils/geometry.py

| Function | Signature | Description |
| --- | --- | --- |
| `pitch_yaw_to_2d` | `(pitch, yaw) -> ndarray` | Converts pitch/yaw angles to a 2D direction vector |
| `ray_hits_box` | `(origin, endpoint, x1, y1, x2, y2) -> bool` | Liang-Barsky ray-box intersection test |
| `ray_hits_cone` | `(origin, direction, half_angle, x1, y1, x2, y2) -> bool` | Cone-box intersection test |
| `extend_ray` | `(origin, endpoint, length) -> ndarray` | Extends a ray to a new endpoint at the given length |
| `bbox_center` | `(x1, y1, x2, y2) -> ndarray` | Returns the center point of a bounding box |
| `bbox_diagonal` | `(x1, y1, x2, y2) -> float` | Returns the diagonal length of a bounding box |
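
Three of these helpers can be sketched in plain Python. The function names and signatures follow the table, but the bodies are illustrative rather than the module's code: they return tuples instead of ndarrays, and the pitch/yaw projection uses an assumed image-space convention (x right, y down) that may differ from the one MindSight uses.

```python
import math


def pitch_yaw_to_2d(pitch, yaw):
    """Project pitch/yaw (radians) to a 2D direction. Sign convention is an assumption."""
    dx = -math.sin(yaw) * math.cos(pitch)
    dy = -math.sin(pitch)
    return (dx, dy)


def ray_hits_box(origin, endpoint, x1, y1, x2, y2):
    """Liang-Barsky test: does the segment origin->endpoint touch the box?"""
    ox, oy = origin
    dx, dy = endpoint[0] - ox, endpoint[1] - oy
    t0, t1 = 0.0, 1.0  # surviving parameter range along the segment
    # One (p, q) pair per box edge: left, right, top, bottom.
    for p, q in ((-dx, ox - x1), (dx, x2 - ox), (-dy, oy - y1), (dy, y2 - oy)):
        if p == 0:
            if q < 0:
                return False  # parallel to this edge and outside it
            continue
        t = q / p
        if p < 0:
            if t > t1:
                return False
            t0 = max(t0, t)   # segment enters the half-plane at t
        else:
            if t < t0:
                return False
            t1 = min(t1, t)   # segment leaves the half-plane at t
    return True


def extend_ray(origin, endpoint, length):
    """Return the point at `length` from origin along the origin->endpoint direction."""
    ox, oy = origin
    dx, dy = endpoint[0] - ox, endpoint[1] - oy
    norm = math.hypot(dx, dy)
    if norm == 0:
        return (ox, oy)  # degenerate ray: no direction to extend along
    scale = length / norm
    return (ox + dx * scale, oy + dy * scale)
```

Because `ray_hits_box` takes an endpoint rather than a direction, it tests a finite segment; a ray that would reach the box only beyond its endpoint counts as a miss until `extend_ray` lengthens it.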

7. Post-Processing Functions

apply_tip_snapping

apply_tip_snapping(persons_gaze, ray_snapped, ray_extended, gaze_eng, gaze_cfg)

Operates in extend/snap mode. Extends gaze rays toward detected objects and snaps ray tips when within the configured threshold.

apply_lock_on

apply_lock_on(persons_gaze, locker, objects)

Applies fixation lock using the GazeLockTracker. If a participant has been fixating on an object long enough, overrides the raw gaze with the locked target.

compute_ray_intersections

compute_ray_intersections(persons_gaze, face_confs, track_ids, face_objs, objects, gaze_cfg)

Tests ray-bbox or ray-cone intersection for every (person, object) pair. Two settings in gaze_cfg shape the result: hit_conf_gate sets the minimum face-detection confidence a person needs for their hits to count, and detect_extend controls whether rays that miss all objects are extended.
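
The pairwise loop with a confidence gate can be sketched as below. This is an illustration only: the argument list is reduced compared with compute_ray_intersections, detect_extend is not modelled, and the default hit test (ray tip inside the box) is a deliberately simple placeholder for the ray-bbox and ray-cone tests.

```python
def tip_in_box(ray, box):
    """Placeholder hit test: is the ray's endpoint inside the box?"""
    (_, (ex, ey)) = ray
    x1, y1, x2, y2 = box
    return x1 <= ex <= x2 and y1 <= ey <= y2


def compute_hits_sketch(rays, face_confs, boxes, hit_conf_gate=0.5, hit_test=tip_in_box):
    """Return (person_index, object_index) pairs for every gated hit."""
    hits = []
    for pid, (ray, conf) in enumerate(zip(rays, face_confs)):
        if conf < hit_conf_gate:
            continue  # face detection too uncertain to trust this person's ray
        for oid, box in enumerate(boxes):
            if hit_test(ray, box):
                hits.append((pid, oid))
    return hits
```

Passing the hit test as a parameter keeps the gating logic independent of whether a ray-bbox or ray-cone test is in use.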

8. Global Motion Compensation

When the camera is handheld or mounted on a moving platform, global scene motion can cause false gaze shifts. The gaze pipeline includes an optional global motion compensation step that estimates inter-frame camera motion (via sparse optical flow) and subtracts it from gaze angle deltas before temporal smoothing. This prevents the smoother from integrating camera jitter into gaze tracks.

9. GazeToolkit

Extensible toolkit class designed for plugins to add custom processing steps. Plugin authors can subclass GazeToolkit and override or add methods to inject behaviour at any stage of the pipeline.

10. Backends

| Backend | Model type | Granularity | Notes |
| --- | --- | --- | --- |
| MGaze | ONNX or PyTorch (auto-detected) | Per-face | Default gaze estimation backend |
| L2CS | PyTorch | Per-face | L2CS-Net architecture |
| UniGaze | ViT | Per-face | Vision Transformer backbone |
| Gazelle | DINOv2 | Scene-level | Processes the full scene rather than individual faces |

All backends implement the GazePlugin interface, which requires at minimum an estimate() method and optionally a run_pipeline() override for scene-level models.