MindSight

Unified Object Detection and Gaze Intersection Tracker for Cognitive Sciences Research

MindSight is an open-source toolkit that detects where people look in video, images, and live camera feeds, then maps those gaze vectors onto detected objects to identify social-cognitive phenomena such as joint attention, mutual gaze, and gaze following -- all from a single configurable pipeline.

Beta Notice

MindSight v0.3.0-beta is a beta release. APIs, configuration formats, and output schemas may change between releases. Pin your version and check the Changelog before upgrading.

How It Works

MindSight processes each frame through a four-stage pipeline:

flowchart LR
    A["Input\nCamera / Video / Image"] --> B["Object Detection\n(YOLO / YOLOE)"]
    A --> C["Face Detection\n(RetinaFace)"]
    B --> D["Object\nBounding Boxes"]
    C --> E["Face\nBounding Boxes"]
    E --> F["Gaze Estimation\n(MGaze / L2CS / UniGaze / Gazelle)"]
    F --> G["Pitch + Yaw\nper Face"]
    D --> H["Ray-BBox\nIntersection"]
    G --> H
    H --> I["Hit List"]
    I --> J["Phenomena Detection\n(JA, Mutual Gaze, etc.)"]
    J --> K["Data Collection\n(CSV, Heatmaps, Dashboard)"]

Object & Face Detection -- locates people, faces, and objects of interest in every frame.
Gaze Estimation -- predicts a 3-D gaze direction (pitch and yaw) for each detected face.
Ray-BBox Intersection -- casts each gaze ray and determines which bounding boxes it hits.
Phenomena & Data Collection -- classifies social-gaze events and writes structured output.
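
The gaze-estimation and ray-intersection stages above can be illustrated with a minimal 2-D sketch. This is not MindSight's implementation. The function names, the pitch/yaw sign convention, and the `(x1, y1, x2, y2)` box layout are all assumptions chosen for illustration. It projects a pitch/yaw pair onto the image plane, then uses the standard slab test to decide which boxes the ray crosses, ordered by distance.

```python
import math

def gaze_direction_2d(pitch, yaw):
    """Project pitch/yaw (radians) onto the image plane.

    Assumed convention (hypothetical, varies by gaze model): x grows to the
    right, y grows downward, positive yaw points toward image left and
    positive pitch points up.
    """
    dx = -math.sin(yaw) * math.cos(pitch)
    dy = -math.sin(pitch)
    return dx, dy

def ray_hits_bbox(origin, direction, bbox):
    """Slab test: return the entry distance t >= 0 if the ray hits, else None."""
    x1, y1, x2, y2 = bbox
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in ((origin[0], direction[0], x1, x2),
                         (origin[1], direction[1], y1, y2)):
        if abs(d) < 1e-12:
            # Ray is parallel to this slab: it must already lie inside it.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near

def hit_list(origin, direction, boxes):
    """Return labels of all boxes the ray crosses, nearest first."""
    hits = []
    for label, bbox in boxes.items():
        t = ray_hits_bbox(origin, direction, bbox)
        if t is not None:
            hits.append((t, label))
    return [label for _, label in sorted(hits)]

boxes = {"cup": (10, -5, 20, 5), "book": (30, -5, 40, 5), "lamp": (10, 10, 20, 20)}
print(hit_list((0, 0), (1, 0), boxes))
```

A real pipeline would likely also cap the ray length and decide whether only the nearest hit counts; those choices are left open here.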

Feature Highlights

Core Functionality

Frame-by-frame gaze-to-object intersection via ray casting
Swappable object-detection backends (YOLO, YOLOE with visual prompts)
Four swappable gaze-estimation backends (MGaze, L2CS-Net, UniGaze, Gazelle)
Face anonymization for privacy-sensitive recordings
Auxiliary video stream support for multi-camera setups
CLI and GUI interfaces for flexible workflows
YAML-driven pipeline configuration

Phenomena Tracking

Joint Attention -- two or more people attending to the same object
Mutual Gaze -- two people looking at each other
Social Referencing -- gaze shifts toward a reference person after an event
Gaze Following -- one person shifting their gaze to where another person is looking
Gaze Aversion -- active avoidance of eye contact
Scanpath Analysis -- sequential fixation patterns over time
Gaze Leadership -- identifying who initiates gaze shifts in a group
Attention Span -- sustained fixation duration on targets
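
As a rough illustration of how two of the definitions above could be derived from a per-frame hit list, the sketch below checks for joint attention and mutual gaze. It assumes a hypothetical data shape, a dict mapping each person ID to the labels their gaze ray hits, with people and objects sharing one label namespace. MindSight's actual data structures and detection rules may differ.

```python
from collections import defaultdict

def joint_attention(hits_by_person):
    """Return {target: [people]} for targets looked at by two or more people."""
    viewers = defaultdict(set)
    for person, targets in hits_by_person.items():
        for target in targets:
            viewers[target].add(person)
    return {t: sorted(p) for t, p in viewers.items() if len(p) >= 2}

def mutual_gaze(hits_by_person):
    """Return sorted (a, b) pairs where a's gaze hits b and b's gaze hits a."""
    pairs = set()
    for a, targets in hits_by_person.items():
        for b in targets:
            if b != a and a in hits_by_person.get(b, ()):
                pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)

# One frame: P1 and P2 both look at the cup, while P2 and P3 look at each other.
frame = {"P1": ["cup"], "P2": ["cup", "P3"], "P3": ["P2"]}
print(joint_attention(frame))
print(mutual_gaze(frame))
```

Time-dependent phenomena such as gaze following or attention span would need state carried across frames, which this single-frame sketch does not attempt.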

Extensibility

Plugin architecture for custom gaze backends, detectors, and phenomena
Drop-in plugin discovery -- add a folder, register in YAML, run
Base classes and hooks for every pipeline stage

Research Tools

Per-frame CSV export with full gaze and detection metadata
Aggregated heatmap generation over configurable time windows
Live dashboard with real-time gaze overlay
Project mode for batch processing of multiple videos
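
To make the idea of heatmaps aggregated over time windows concrete, here is a small sketch that bins gaze points into a coarse grid, one grid per window. The input shape `(t_seconds, x, y)`, the function name, and the grid parameters are illustrative assumptions, not MindSight's export schema or API.

```python
def aggregate_heatmaps(samples, window_s, frame_size, grid=(4, 4)):
    """Count gaze points per grid cell, grouped into fixed-length time windows.

    samples: iterable of (t_seconds, x, y) gaze points in pixel coordinates.
    Returns {window_index: rows x cols list of counts}.
    """
    width, height = frame_size
    cols, rows = grid
    heatmaps = {}
    for t, x, y in samples:
        if not (0 <= x < width and 0 <= y < height):
            continue  # ignore points that fall outside the frame
        window = int(t // window_s)
        cells = heatmaps.setdefault(window, [[0] * cols for _ in range(rows)])
        cells[int(y * rows / height)][int(x * cols / width)] += 1
    return heatmaps

samples = [(0.1, 10, 10), (0.5, 15, 12), (1.2, 90, 90), (1.4, 150, 20)]
maps = aggregate_heatmaps(samples, window_s=1.0, frame_size=(100, 100), grid=(2, 2))
print(maps)
```

Smaller windows show how attention moves over time, while larger windows give a smoother overall picture. The appropriate window length depends on the study design.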

Where to Start

I'm a researcher

Get MindSight running, process your first video, and explore the phenomena it can detect.
- Getting Started -- installation and first run
- User Guide -- pipeline configuration and workflows
- Phenomena -- detailed descriptions of each tracked phenomenon

I'm a developer

Understand the internals, write plugins, and extend the pipeline.
- Architecture Deep Dive -- how the pipeline fits together
- Plugin System -- extension points and base classes
- Developer Guide -- module references and contribution guidelines