# Architecture Deep Dive

## Overview
MindSight is a thin orchestrator (MindSight.py, ~718 lines) that wires together four pipeline stage modules. All stages communicate through a shared FrameContext object created once per frame. The orchestrator itself contains no domain logic -- it simply sequences the stages, manages the video capture loop, and holds run-level state such as smoothers, trackers, and output paths.
The four stages are:
- Object Detection -- detect people and objects in the frame.
- Gaze Tracking -- estimate gaze rays for each detected face.
- Phenomena Analysis -- derive higher-level social gaze phenomena.
- Data Collection -- accumulate metrics and write outputs.
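The sequencing described above can be sketched as follows. This is an illustrative stand-in, not the real implementation: the actual FrameContext and stage signatures live in pipeline_config.py and the four stage modules.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class FrameContext:
    """Illustrative stand-in for the shared per-frame context."""
    frame: Any
    frame_no: int
    data: dict = field(default_factory=dict)  # stage outputs accumulate here

    def __setitem__(self, key, value):
        self.data[key] = value

    def __getitem__(self, key):
        return self.data[key]

# Stub stage bodies; the real ones live in their respective pipeline modules.
def run_detection_step(ctx): ctx["all_dets"] = []        # stage 1: detection
def run_gaze_step(ctx): ctx["persons_gaze"] = []         # stage 2: gaze
def update_phenomena_step(ctx): ctx["confirmed_objs"] = []  # stage 3: phenomena
def collect_frame_data(ctx): ctx["csv_rows"] = []        # stage 4: data collection

def process_frame(ctx):
    # The orchestrator only sequences the stages; no domain logic here.
    run_detection_step(ctx)
    run_gaze_step(ctx)
    update_phenomena_step(ctx)
    collect_frame_data(ctx)
    return ctx
```

Because every stage reads from and writes to the same context object, stages stay decoupled from one another and only couple to the context's key registry.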
## Module Dependency Graph

```mermaid
graph TD
    MS[MindSight.py] --> DP[ObjectDetection/detection_pipeline]
    MS --> GP[GazeTracking/gaze_pipeline]
    MS --> PP[Phenomena/phenomena_pipeline]
    MS --> DC[DataCollection/data_pipeline]
    DP --> OD[ObjectDetection/object_detection]
    DP --> MF[ObjectDetection/model_factory]
    GP --> GPR[GazeTracking/gaze_processing]
    GP --> GI[GazeTracking/__init__]
    GP --> GBE[GazeTracking/Backends/*]
    PP --> PT[Phenomena/phenomena_tracking]
    PP --> PC[Phenomena/phenomena_config]
    PP --> PH[Phenomena/helpers]
    PP --> PD[Phenomena/Default/*]
    DC --> CSV[DataCollection/csv_output]
    DC --> HM[DataCollection/heatmap_output]
    DC --> DB[DataCollection/dashboard_output]
    MS --> CFG[pipeline_config.py]
    MS --> PLG[Plugins/__init__.py]
    PLG --> GREG[gaze_registry]
    PLG --> OREG[object_detection_registry]
    PLG --> PREG[phenomena_registry]
```
Plugins/__init__.py provides base classes and registries that each plugin subdirectory hooks into. pipeline_config.py defines FrameContext and the configuration dataclasses consumed by every stage.
## The Orchestrator: MindSight.py

### main()
Entry point. Parses CLI arguments (each module registers its own flags), loads an optional pipeline YAML file, and dispatches into either single-video mode or project mode (batch processing a directory of videos).
### run(video_path, args, ...)
Opens the video capture and creates the per-run tracker objects that persist across frames:
- GazeSmootherReID -- temporal smoothing of gaze rays with re-identification.
- GazeLockTracker -- fixation lock-on / dwell detection.
- SnapHysteresisTracker -- hysteresis-based snap-to-object logic.
- ObjectPersistenceCache -- short-term memory for disappeared objects.
These are bundled into run_ctx_base and seeded into every FrameContext. The function then iterates frames, calling process_frame() for each one.
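A hedged sketch of how those per-run trackers might be bundled: the four class names come from this page, but their constructors (which in reality take thresholds and window sizes) and the dict keys shown here are assumptions.

```python
# Tracker class names are from this page; real constructors take parameters
# (smoothing windows, lock thresholds, etc.) that are omitted in this sketch.
class GazeSmootherReID: ...
class GazeLockTracker: ...
class SnapHysteresisTracker: ...
class ObjectPersistenceCache: ...

def make_run_ctx_base():
    # Build per-run state once, before the frame loop starts.
    # The dict keys here are illustrative; the real run_ctx_base keys
    # are defined in MindSight.py. Each frame is then created as
    # FrameContext(frame, frame_no, **run_ctx_base), so the same tracker
    # instances persist across all frames of a run.
    return {
        "smoother": GazeSmootherReID(),
        "lock_tracker": GazeLockTracker(),
        "snap_tracker": SnapHysteresisTracker(),
        "persistence": ObjectPersistenceCache(),
    }
```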
### process_frame(ctx, ...)
Calls the four pipeline stages in order:
1. run_detection_step(ctx, ...)
2. run_gaze_step(ctx, ...)
3. update_phenomena_step(ctx)
4. collect_frame_data(ctx, ...)
After the stages complete, it handles display rendering and dashboard updates.
### _build_from_args(args)
Factory function that reads the argparse namespace and instantiates the correct model backends (e.g., YOLO variant, Gazelle variant) via the model factory and plugin registries.
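The registry-lookup pattern behind this factory can be sketched as follows. The decorator, the registry contents, and the argument name gaze_backend are illustrative assumptions, not the real plugin API.

```python
# Hypothetical sketch of a name -> backend-class registry with a
# registration decorator; the real registries live in Plugins/__init__.py.
gaze_registry = {}

def register_gaze(name):
    def decorator(cls):
        gaze_registry[name] = cls
        return cls
    return decorator

@register_gaze("gazelle")
class GazelleBackend:
    """Stand-in for a registered gaze estimation backend."""

def build_gaze_backend(args):
    # Look up the backend class named on the CLI and instantiate it.
    try:
        return gaze_registry[args.gaze_backend]()
    except KeyError:
        raise SystemExit(f"unknown gaze backend: {args.gaze_backend!r}")
```

Keeping construction behind a registry means _build_from_args never needs to import concrete backends directly; new backends become selectable just by registering themselves.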
## Pipeline Stages

| Stage | Entry Point | Module |
|---|---|---|
| Detection | run_detection_step(ctx, ...) | ObjectDetection/detection_pipeline.py |
| Gaze | run_gaze_step(ctx, ...) | GazeTracking/gaze_pipeline.py |
| Phenomena | update_phenomena_step(ctx) | Phenomena/phenomena_pipeline.py |
| Data Collection | collect_frame_data(ctx, ...) | DataCollection/data_pipeline.py |
Each stage reads from and writes to the FrameContext. See FrameContext Reference for the full key registry.
## Configuration Hierarchy
All configuration dataclasses follow the same pattern: they are constructed via a from_namespace(args) class method that pulls values from the argparse namespace.
- GazeConfig -- ray parameters, snap distance, cone angle, smoothing window, gaze lock thresholds.
- DetectionConfig -- confidence threshold, COCO class IDs, blacklist labels, detection scale factor.
- TrackerConfig -- gaze lock frames, dwell frame count, skip frames, re-ID parameters (IoU, feature distance).
- OutputConfig -- save directory paths, PID map file, anonymization mode, video writer settings.
- PhenomenaConfig -- per-phenomenon enable/disable toggles and their individual thresholds (e.g., mutual gaze angle, joint attention confirmation frames).
All of these are defined in pipeline_config.py or in their respective module configs (e.g., Phenomena/phenomena_config.py).
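The shared from_namespace(args) pattern might look like the sketch below. The field names shown are placeholders, not the actual DetectionConfig fields.

```python
from argparse import Namespace
from dataclasses import dataclass, fields

@dataclass
class DetectionConfig:
    # Field names are illustrative; the real dataclass is in pipeline_config.py.
    conf_threshold: float = 0.5
    detection_scale: float = 1.0

    @classmethod
    def from_namespace(cls, args: Namespace) -> "DetectionConfig":
        # Pull only the fields this config owns out of the shared namespace;
        # anything absent from the namespace keeps its dataclass default.
        kwargs = {f.name: getattr(args, f.name)
                  for f in fields(cls) if hasattr(args, f.name)}
        return cls(**kwargs)
```

Because every config class follows the same pattern, each stage can be handed a small typed object instead of the whole argparse namespace.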
## CLI Argument Registration
Each module owns its CLI flags through an add_arguments(parser) function. During startup, MindSight.py calls each module's registration function in turn:
- ObjectDetection → add_arguments(parser)
- GazeTracking → add_arguments(parser)
- Phenomena → add_arguments(parser)
- DataCollection → add_arguments(parser)
- Plugins (each) → add_arguments(parser)
This keeps flag definitions co-located with the code that consumes them. Plugins also participate: each discovered plugin can register its own flags, which appear alongside the built-in ones.
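A minimal sketch of the registration pattern, with illustrative flag names (the real flags are defined by each module):

```python
import argparse

# Each module exposes add_arguments(parser); flag names here are assumed.
def detection_add_arguments(parser):
    group = parser.add_argument_group("ObjectDetection")
    group.add_argument("--conf-threshold", type=float, default=0.5)

def gaze_add_arguments(parser):
    group = parser.add_argument_group("GazeTracking")
    group.add_argument("--cone-angle", type=float, default=30.0)

# The orchestrator calls each module's registration function in turn.
parser = argparse.ArgumentParser()
for register in (detection_add_arguments, gaze_add_arguments):
    register(parser)

# Parse an explicit argv list (avoids reading sys.argv in this sketch).
args = parser.parse_args(["--cone-angle", "25"])
```

Using argument groups keeps each module's flags visually grouped in `--help` output while still landing in a single shared namespace.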
## Per-Frame Sequence

```mermaid
sequenceDiagram
    participant MS as MindSight.py (run loop)
    participant DP as detection_pipeline
    participant GP as gaze_pipeline
    participant PP as phenomena_pipeline
    participant DC as data_pipeline
    participant DB as dashboard_output
    MS->>MS: ctx = FrameContext(frame, frame_no, **run_ctx_base)
    MS->>DP: run_detection_step(ctx, ...)
    DP-->>MS: ctx now has all_dets, persons, objects
    MS->>GP: run_gaze_step(ctx, ...)
    GP-->>MS: ctx now has face_bboxes, persons_gaze, hits, hit_events
    MS->>PP: update_phenomena_step(ctx)
    PP-->>MS: ctx now has confirmed_objs, phenomenon events
    MS->>DC: collect_frame_data(ctx, ...)
    DC-->>MS: metrics accumulated, CSV rows buffered
    MS->>DB: update dashboard / render overlays
```
## GUI Architecture
The GUI is built with PyQt and lives in the GUI/ directory.
main_window.py creates the application window with three tabs:

- GazeTab (gaze_tab.py) -- live video view with gaze overlay, controls for starting/stopping tracking.
- VPBuilderTab (vp_builder_tab.py) -- visual pipeline builder for constructing and editing pipeline YAML configurations.
- ProjectTab (project_tab.py) -- project configuration and batch processing. Two-panel layout with a configuration panel (pipeline, participants, conditions, output settings) and a monitoring panel (sources, preview, progress, log). Supports importing pipeline settings from the Gaze Tab, visual editing of participant labels and condition tags, and custom output directories.

Supporting modules:

- workers.py contains threading.Thread-based workers that run the tracking pipeline in the background so the UI remains responsive. ProjectWorker accepts a ProjectConfig and uses it for per-video metadata (condition tags, participant labels) and post-processing (global and per-condition CSV generation).
- pipeline_dialog.py provides import/export for pipeline YAML files. _namespace_to_yaml_dict() converts a namespace to a structured YAML dict.
- widgets.py and phenomena_panel.py supply reusable UI components.
- plugin_panel.py renders plugin configuration controls dynamically from registered plugins.
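The background-worker pattern can be sketched with the standard library alone; the class name and queue-message shapes below are illustrative, not the actual workers.py API.

```python
import queue
import threading

class TrackingWorker(threading.Thread):
    """Sketch of a worker: runs the pipeline off the UI thread and posts
    progress messages to a queue that the UI thread drains."""

    def __init__(self, frames, ui_queue):
        super().__init__(daemon=True)
        self.frames = frames
        self.ui_queue = ui_queue
        self._stop_event = threading.Event()

    def run(self):
        for i, frame in enumerate(self.frames):
            if self._stop_event.is_set():
                break
            # ... run process_frame(ctx) for this frame here ...
            self.ui_queue.put(("progress", i))
        self.ui_queue.put(("done", None))

    def stop(self):
        # Cooperative cancellation: the loop checks the event each frame.
        self._stop_event.set()
```

The UI side polls the queue on a timer and updates widgets from its own thread, which keeps all widget access on the GUI thread.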
## Plugin Integration Points
Plugins hook into three registries defined in Plugins/__init__.py:
| Registry | Purpose | Example |
|---|---|---|
| gaze_registry | Alternative gaze estimation backends | Gazelle |
| object_detection_registry | Alternative detection backends | Custom YOLO variants |
| phenomena_registry | New social gaze phenomena | Custom attention metrics |
Auto-discovery scans Plugins/ subdirectories at startup. The gaze registry also scans GazeTracking/Backends/ for built-in gaze backends (MGaze, L2CS, UniGaze). Each plugin package exposes a registration function that inserts itself into the appropriate registry. Plugins can also provide add_arguments(parser) to register their own CLI flags.
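Auto-discovery of this kind can be sketched with pkgutil; the register() hook name is an assumption here, standing in for whatever registration function each plugin package actually exposes.

```python
import importlib
import pkgutil

def discover_plugins(package):
    """Scan a package's submodules, import each one, and call its
    register() hook if it defines one. Returns the registered names.
    The hook name 'register' is an illustrative assumption."""
    found = []
    prefix = package.__name__ + "."
    for info in pkgutil.iter_modules(package.__path__, prefix):
        module = importlib.import_module(info.name)
        if hasattr(module, "register"):
            module.register()
            found.append(info.name)
    return found
```

Because discovery only requires a module-level hook, dropping a new subdirectory into Plugins/ is enough to make it selectable at startup.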
## Auxiliary Video Streams
MindSight supports optional per-participant auxiliary video feeds (e.g., eye cameras, first-person views) that are frame-synchronised with the main source. Auxiliary streams are configured via --aux-stream CLI flags or the aux_streams YAML section and are parsed into AuxStreamConfig instances. Each frame, the run loop reads one frame from every auxiliary capture and stores them in FrameContext['aux_frames'] keyed by "PID:TYPE". Auxiliary frames are not processed by any built-in pipeline stage but are passed to plugins (gaze backends, phenomena trackers) for consumption.
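A sketch of how an aux-stream spec could be parsed into a config object. The "PID:TYPE:path" flag syntax and the field names are assumptions; only the "PID:TYPE" key format is stated on this page.

```python
from dataclasses import dataclass

@dataclass
class AuxStreamConfig:
    """Illustrative stand-in for the real AuxStreamConfig."""
    pid: str
    stream_type: str
    path: str

    @property
    def key(self) -> str:
        # Matches the "PID:TYPE" key used in FrameContext['aux_frames'].
        return f"{self.pid}:{self.stream_type}"

def parse_aux_stream(spec: str) -> AuxStreamConfig:
    # Split on the first two colons only, so the path may itself
    # contain colons (e.g. device URLs).
    pid, stream_type, path = spec.split(":", 2)
    return AuxStreamConfig(pid, stream_type, path)
```

Each run, one AuxStreamConfig per `--aux-stream` flag would open its own capture, and the run loop would store the latest frame under `cfg.key` in the context's aux_frames mapping for plugins to consume.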