Data Collection Module¶

Overview¶

The DataCollection/ module is responsible for all output generation in MindSight: CSV summaries, dashboard video overlays, heatmaps, time-series charts, and project-level CSV aggregation. It contains seven files:

File	Purpose
`data_pipeline.py`	Pipeline step coordinator (`collect_frame_data`, `finalize_run`)
`csv_output.py`	Summary CSV writer
`global_csv.py`	Project-level CSV aggregation and per-condition splitting
`dashboard_output.py`	Frame overlay + dashboard compositor
`heatmap_output.py`	Per-participant heatmap generation
`chart_output.py`	Time-series chart generation
`dashboard_matplotlib.py`	Matplotlib-based dashboard rendering for headless/CLI runs

Data Pipeline¶

File: data_pipeline.py

This file coordinates all data collection during and after a run.

`collect_frame_data(ctx, log_csv, frame_no, hit_events, face_track_ids, persons_gaze)`¶

Called once per frame. Responsibilities:

Accumulates the look_counts dictionary, mapping (face_idx, obj_cls) pairs to frame counts.
If a log_csv writer is provided, writes per-hit rows to the open log CSV.
In project mode, prepends video_name and conditions columns to each CSV row (read from ctx).
If heatmap_path is set on the context, accumulates gaze endpoint coordinates for later heatmap generation.

`finalize_run(ctx)`¶

Called once at the end of a run. Responsibilities:

Prints run statistics to the console (total frames processed, hit event count).
Writes the summary CSV via csv_output.write_summary_csv(), passing video_name and conditions from context for project mode.
Generates heatmaps via heatmap_output.save_heatmaps().
Generates charts via chart_output.generate_run_charts().

Summary CSV¶

File: csv_output.py

`resolve_summary_path(summary_arg, source)`¶

Returns a concrete file path or None.

If summary_arg is True, an automatic path is derived from source.
If summary_arg is a string, it is used as-is.

`write_summary_csv(path, total_frames, look_counts, all_trackers, pid_map, video_name, conditions)`¶

Writes the summary CSV with multiple sections:

object_look_time section -- built-in, not contributed by any tracker. Contains per-object gaze duration data derived from look_counts.
Tracker sections -- iterates all_trackers and calls tracker.csv_rows(total_frames, pid_map=pid_map) on each. Every tracker contributes its own section, separated from the previous one by a blank row.

When video_name is not None (project mode), every data row is prepended with video_name and conditions values, and every header row is prepended with the corresponding column names. Comment rows (starting with #) are left unchanged.

Global CSV Aggregation¶

File: global_csv.py

This module handles project-level CSV aggregation, called after all per-video processing is complete.

`generate_global_csv(csv_dir, csv_type)`¶

Combines all per-video CSVs of the given type ("summary" or "events") into a single file (Global_Summary.csv or Global_Events.csv). Comment lines are stripped, and duplicate header rows are deduplicated by comparing against the header from the first file.

Returns the path to the written file, or None if no source files were found.

`generate_condition_csvs(global_csv_path, condition_dir, csv_type)`¶

Splits a global CSV by the conditions column. Each unique tag gets its own CSV. A video with multiple pipe-delimited tags (e.g., "Emotional|Group A") appears in both Emotional_Summary.csv and Group A_Summary.csv. Tag names are sanitized for filesystem safety.

Dashboard Output¶

File: dashboard_output.py

`draw_overlay(ctx, gaze_cfg)`¶

Annotates the current frame with visual indicators:

Gaze rays
Object bounding boxes
Joint attention markers
Lock badges
Convergence markers
Dwell arcs

When lite-overlay mode is active, expensive visuals are skipped for performance.

`compose_dashboard(ctx)`¶

Composes the final display image from the annotated frame and side panels:

Queries each tracker's dashboard_data() method for panel content.
Trackers declare which side they appear on via the dashboard_panel attribute ("left" or "right").
Left and right panels are assembled independently and composited alongside the frame.

`open_video_writer(save_arg, source, cap)`¶

Opens a cv2.VideoWriter for saving the dashboard output to a video file.

`apply_face_anonymization(frame, face_bboxes, mode, padding, ...)`¶

Applies face anonymization to the frame. Supported modes:

blur -- Gaussian blur over face regions.
black -- Solid black rectangles over face regions.

`AnonSmoother`¶

Temporal smoothing class for anonymization bounding boxes. Prevents flickering when face detection is intermittent across frames.

`_draw_panel_section(panel, y, title, colour, rows, line_h)`¶

Internal helper used by trackers that implement the legacy dashboard_section() interface.

Heatmap Output¶

File: heatmap_output.py

`extract_mid_frame(source)`¶

Extracts a single reference frame from the midpoint of the source video, used as the background for heatmap overlays.

`save_heatmaps(path, source, bg, heatmap_gaze, pid_map)`¶

Generates per-participant heatmap images:

Takes the accumulated gaze endpoint coordinates from heatmap_gaze.
Applies Gaussian blur (sigma defined in constants) to produce a density map.
Overlays the density map onto the reference frame.
Saves one PNG file per participant.

`resolve_heatmap_path(heatmap_arg, source)`¶

Returns a concrete directory path or None, following the same convention as resolve_summary_path.

Chart Output¶

File: chart_output.py

`generate_run_charts(path, all_trackers, total_frames, fps, pid_map, data_plugins)`¶

Generates time-series charts for the completed run:

Iterates all trackers and calls time_series_data() on each.
Creates matplotlib subplots for each returned metric.
Supported chart types: area, step, line.

`resolve_chart_path(charts_arg, source)`¶

Returns a concrete directory path or None, following the same convention as the other resolve functions.

Matplotlib Dashboard¶

File: dashboard_matplotlib.py

Provides a matplotlib-based dashboard renderer used in headless and CLI modes (when a Qt display is unavailable). Queries each tracker's dashboard_data() method and renders the panels to a static image that is composited alongside the annotated frame, mirroring the layout of the Qt live dashboard. This module is selected automatically when the GUI is not running.

How Plugins Contribute Data¶

Phenomena trackers and plugins extend data collection by implementing any combination of the following methods:

Method	Return type	Used by
`csv_rows(total_frames, pid_map)`	List of rows	`csv_output.write_summary_csv()`
`dashboard_data(pid_map)`	Dict with `title`, `colour`, `rows`	`dashboard_output.compose_dashboard()`
`time_series_data()`	Dict of metric name to series data	`chart_output.generate_run_charts()`
`console_summary(total_frames, pid_map)`	String	`data_pipeline.finalize_run()`

Each method is optional. A tracker that only cares about CSV output can implement csv_rows alone and ignore the rest.

Extending Data Collection¶

For fully custom output (e.g. writing to a database, generating a PDF report), subclass DataCollectionPlugin and override its hooks. The plugin system will discover and invoke your subclass automatically.

See Plugin System for registration details and the full plugin lifecycle.

Data Collection Module¶

Overview¶

Data Pipeline¶

collect_frame_data(ctx, log_csv, frame_no, hit_events, face_track_ids, persons_gaze)¶

finalize_run(ctx)¶

Summary CSV¶

resolve_summary_path(summary_arg, source)¶

write_summary_csv(path, total_frames, look_counts, all_trackers, pid_map, video_name, conditions)¶

Global CSV Aggregation¶

generate_global_csv(csv_dir, csv_type)¶

generate_condition_csvs(global_csv_path, condition_dir, csv_type)¶

Dashboard Output¶

draw_overlay(ctx, gaze_cfg)¶

compose_dashboard(ctx)¶

open_video_writer(save_arg, source, cap)¶

apply_face_anonymization(frame, face_bboxes, mode, padding, ...)¶

AnonSmoother¶

_draw_panel_section(panel, y, title, colour, rows, line_h)¶