Venue Distributors
Topic: Distributors
Paths: configs/2021/distributors/school_distributor.yaml, university_distributor.yaml, company_distributor.yaml, hospital_distributor.yaml
Overview
A venue distributor allocates agents to venues and records the assignment on the agent's activity_map. Each YAML file targets one venue_type (school, university, company, hospital). All share the same top-level schema; only the values differ.
Allocation proceeds in phases:
- Special cases: mandatory overrides handled before anything else (e.g. boarding-school students matched to their named school).
- Priority groups: defined sub-populations allocated in order, optionally with per-area probabilities. Each group may permit overflow.
- Normal allocation: remaining eligible agents allocated by the configured strategy and distance rules.
- Fallback: agents still unallocated after all phases are handled according to fallback.strategy.
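The phase ordering above can be sketched roughly as follows. This is an illustrative sketch only, not the engine's code: agents are assumed to be plain dicts, and `run_phases` and the per-phase callables are hypothetical names.

```python
from typing import Callable, List, Optional

# Hypothetical phase signature: takes an agent, returns a venue name or None.
Phase = Callable[[dict], Optional[str]]


def run_phases(agents: List[dict], phases: List[Phase], activity_map_key: str) -> List[dict]:
    """Run phases in order over the agents still unassigned.

    Returns the agents that no phase could place (the fallback's input).
    """
    remaining = list(agents)
    for phase in phases:
        still_unassigned = []
        for agent in remaining:
            venue = phase(agent)
            if venue is None:
                still_unassigned.append(agent)
            else:
                # Record the assignment on the agent's activity_map.
                agent.setdefault("activity_map", {})[activity_map_key] = venue
        remaining = still_unassigned
    return remaining
```

Each phase sees only the agents that earlier phases left unassigned, which is why special cases and priority groups take precedence over normal allocation.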
Keys
| Key | Description |
|---|---|
| distributor_name | Arbitrary label used in logs |
| venue_type | Must match a key in venues_config.yaml |
| activity_map_key | Key written to person.activity_map on assignment |
| subset_key | Subset the agent is added to within the venue |
| special_cases | Priority overrides applied before eligibility filters |
| eligibility | Who is eligible and how they are prioritised |
| venue_selection | How candidate venues are found for each agent |
| allocation | How a venue is chosen from candidates and capacity managed |
| settings | Execution order, logging, performance |
| fallback | Behaviour when no eligible venue is found |
| validation | Required attributes checked before allocation |
| exports | Optional CSV reports written after allocation |
distributor_name, venue_type, activity_map_key, subset_key
distributor_name: "school_distributor"
venue_type: "school"
activity_map_key: "primary_activity"
subset_key: "student"
venue_type determines which loaded venues are candidates. activity_map_key is the key under which the assigned venue is stored on person.activity_map; the hospital distributor uses "medical_facility" rather than "primary_activity". subset_key is the subset name added to the venue; omit to skip subset assignment.
special_cases
special_cases:
  - name: "boarding_school_students"
    priority: 1
    condition:
      person_residence_type: "boarding_school"
    allocation_rule:
      match_by:
        - field: "name"
          source: "person.residence.name"
          target: "venue.name"
          match_type: "exact"
      mandatory: true
      if_no_match: "error" # "error" | "warn" | "skip" | "fallback_to_normal"
Special cases are checked before any eligibility filter. A matched agent bypasses the normal pipeline entirely. mandatory: true means the match must succeed; if_no_match controls what happens when no venue matches the rule — "error" halts, "warn" logs and continues, "skip" silently leaves the agent unassigned, "fallback_to_normal" re-enters the agent into normal allocation.
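The four if_no_match outcomes can be illustrated with a small sketch. It assumes venues are dicts with a "name" field and uses exact string matching; `resolve_special_case` and its return shape are hypothetical, not the engine's API.

```python
import logging
from typing import List, Optional, Tuple


def resolve_special_case(
    person_value: str, venues: List[dict], if_no_match: str = "error"
) -> Tuple[Optional[dict], bool]:
    """Exact-match person_value against venue names.

    Returns (venue, reenter_normal): the matched venue or None, and whether
    the agent should be handed back to normal allocation.
    """
    for venue in venues:
        if venue["name"] == person_value:
            return venue, False
    if if_no_match == "error":
        raise LookupError(f"no venue named {person_value!r}")
    if if_no_match == "warn":
        logging.warning("no venue named %r; agent left unassigned", person_value)
        return None, False
    if if_no_match == "skip":
        return None, False
    if if_no_match == "fallback_to_normal":
        return None, True
    raise ValueError(f"unknown if_no_match value: {if_no_match!r}")
```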
eligibility
eligibility:
  require_unassigned: true
  global_filters:
    - attribute: "age"
      type: "numerical"
      min: 5
      max: 19
    - attribute: "residence.type"
      type: "categorical"
      values: ["household", "boarding_school"]
  exclude:
    households:
      original_pattern: "0 >=0 0 0"
  attributes:
    - name: "sex"
      type: "categorical"
      venue_column: "Gender"
      matching_rules:
        "Mixed": ["male", "female"]
        "Boys": ["male"]
        "Girls": ["female"]
      assume_if_missing: "Mixed"
      case_sensitive: false
  priority_allocation:
    enabled: true
    priority_order: "age_desc"
    groups:
      - name: "high_priority_school_age"
        priority: 1
        allow_overflow: true
        search_limits: [20, 70]
        filters:
          - attribute: "age"
            type: "numerical"
            min: 5
            max: 17
        probability_config:
          type: "file"
          file_path: "data/activities/university/university_probabilities.csv"
          lookup_column: "geo_unit"
          lookup_attribute: "geographical_unit.name"
          probability_column: "prob_uni_18_22"
          default: 0.35
require_unassigned — when true, skips anyone who already has activity_map_key assigned.
global_filters apply to all phases. Each filter has an attribute path, a type ("numerical" or "categorical"), and type-specific bounds (min/max) or allowed values. Filters are checked in order; list more restrictive filters first for efficiency. Attribute paths may traverse nested objects using dot notation (e.g. residence.type, properties.workplace_sgu).
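As a rough illustration of filter evaluation with dot-notation paths, the sketch below assumes agents are nested dicts, that min and max are inclusive, and that a missing attribute fails the filter. None of these choices is confirmed by the configuration itself; `resolve_path` and `passes_filters` are hypothetical helpers.

```python
from typing import Any, List


def resolve_path(obj: Any, path: str) -> Any:
    """Follow a dot-separated path through nested dicts; None if a part is missing."""
    for part in path.split("."):
        if not isinstance(obj, dict) or part not in obj:
            return None
        obj = obj[part]
    return obj


def passes_filters(agent: dict, filters: List[dict]) -> bool:
    """Check filters in order and stop at the first failure."""
    for f in filters:
        value = resolve_path(agent, f["attribute"])
        if value is None:
            return False  # assumption: a missing attribute fails the filter
        if f["type"] == "numerical":
            if "min" in f and value < f["min"]:
                return False
            if "max" in f and value > f["max"]:
                return False
        elif f["type"] == "categorical":
            if value not in f["values"]:
                return False
        else:
            raise ValueError(f"unknown filter type: {f['type']!r}")
    return True
```

Because evaluation stops at the first failing filter, placing the most restrictive filter first avoids unnecessary checks.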
exclude.households.original_pattern removes agents from households whose original_pattern property matches the given string.
attributes matches agent properties against venue CSV columns. Each entry names a venue_column and a matching_rules dict mapping CSV values to lists of valid agent values. assume_if_missing supplies a default if the column is absent from the venue data.
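A minimal sketch of how a matching_rules entry might be applied to one venue row. The venue row is assumed to be a dict keyed by CSV column, and `venue_accepts` is a hypothetical name.

```python
def venue_accepts(agent_value: str, venue_row: dict, rule: dict) -> bool:
    """Return True if the venue's column value admits the agent's value."""
    # Fall back to assume_if_missing when the column is absent from the row.
    venue_value = venue_row.get(rule["venue_column"], rule["assume_if_missing"])
    table = rule["matching_rules"]
    if not rule.get("case_sensitive", True):
        # Normalise both sides and the rule table to lower case.
        venue_value = venue_value.lower()
        agent_value = agent_value.lower()
        table = {k.lower(): [v.lower() for v in vals] for k, vals in table.items()}
    return agent_value in table.get(venue_value, [])
```

With the school rule above, a venue whose Gender column reads "Girls" accepts only agents with sex "female", and a venue without that column is treated as "Mixed".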
priority_allocation.groups are processed in priority order before normal allocation.
allow_overflow: true permits the group to exceed venue capacity whilst still respecting attributes constraints.
search_limits is a list of candidate counts tried in sequence (e.g. [20, 70]: try the 20 closest, then the 70 closest).
probability_config optionally samples agents stochastically: type: "file" loads a CSV, matches rows by lookup_column against the agent attribute named by lookup_attribute, and reads the allocation probability from probability_column; default is used when the agent's geo unit is not found in the file.
priority_order: "age_desc" processes older agents first within each group.
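The probability lookup for a priority group could look roughly like this. The sketch reads the CSV from a string for self-containment; `load_probabilities` and `is_sampled` are hypothetical helpers, and the Bernoulli draw is an assumption about how "samples agents stochastically" is realised.

```python
import csv
import io
import random
from typing import Dict


def load_probabilities(csv_text: str, lookup_column: str, probability_column: str) -> Dict[str, float]:
    """Build a geo-unit -> probability table from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[lookup_column]: float(row[probability_column]) for row in reader}


def is_sampled(geo_unit: str, table: Dict[str, float], default: float, rng: random.Random) -> bool:
    """Include the agent with the probability for their geo unit, or the default."""
    probability = table.get(geo_unit, default)
    return rng.random() < probability
```

Passing a seeded `random.Random` instance keeps the sampling reproducible between runs.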
venue_selection
venue_selection:
  consider_by: "count" # "count" | "distance" | "geo_unit"
  count: 10
  criteria: "closest" # "closest" | "random" | "largest_capacity"
  search_limits: [20, 50]
  max_distance: 10
  max_distance_unit: "km" # "km" | "miles" | "meters"
  venue_geo_level: "SGU" # "SGU" | "MGU" | "LGU"
  person_location_source: "geographical_unit.coordinates"
  venue_location_source: "coordinates"
  distance_metric: "haversine" # "haversine" | "euclidean"
  filter_by_geography: true
  respect_capacity: true
consider_by controls how candidate venues are identified. "count" selects count venues according to criteria (for example, the 10 closest when count is 10 and criteria is "closest"). "distance" selects all venues within max_distance. "geo_unit" restricts candidates to venues sharing the agent's geo unit (used by the company distributor, which matches by workplace_sgu rather than residence).
venue_geo_level declares the geography level at which venue coordinates are stored; the engine traverses the hierarchy when agent and venue levels differ.
person_location_source is the attribute path used to read the agent's location. The company distributor sets this to "properties.workplace_sgu" to match agents to companies near their work location rather than their home.
distance_metric: "haversine" computes great-circle distance from (latitude, longitude) pairs; "euclidean" uses projected coordinates.
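For reference, the standard haversine formula can be written as below. This is the textbook formula rather than the engine's own implementation, and the mean Earth radius constant is an assumption.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

One degree of longitude along the equator comes out at roughly 111.2 km with this radius.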
search_limits gives a fallback candidate sequence for the global pipeline (individual priority groups may specify their own).
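The widening behaviour of search_limits might be sketched as follows, assuming the venues are already sorted nearest-first and that a caller-supplied predicate decides whether a venue is usable. `find_with_limits` is a hypothetical helper.

```python
from typing import Callable, List, Sequence


def find_with_limits(
    ranked_venues: Sequence, search_limits: List[int], is_acceptable: Callable
) -> list:
    """Try the nearest n venues for each n in search_limits until some are acceptable."""
    for limit in search_limits:
        candidates = [v for v in ranked_venues[:limit] if is_acceptable(v)]
        if candidates:
            return candidates
    return []  # nothing usable even at the widest limit
```

With search_limits [20, 50], the wider set of 50 is consulted only when none of the 20 nearest venues is acceptable.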
allocation
allocation:
  strategy: "random" # "random" | "closest" | "proportional"
  capacity_column: "SchoolCapacity"
  capacity_handling:
    if_missing: "ignore" # "ignore" | "skip" | "default"
    default_capacity: 1000
    if_zero: "ignore" # "ignore" | "skip"
  track_capacity: true
  when_full: "exclude" # "exclude" | "overflow"
  overflow_behavior:
    distribute_evenly: true
    max_overflow_per_venue: null
  enforce_no_empty_venues: false
  batch_by: "geo_unit" # "geo_unit" | "none"
  batch_location_source: "centroid"
strategy selects from the candidate set: "random" picks uniformly; "closest" always picks the nearest; "proportional" weights by inverse distance.
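The three strategies can be illustrated with a short sketch. Candidates are assumed to be (venue_name, distance) pairs, and the small epsilon guarding against division by zero is an addition for the example; `choose_venue` is a hypothetical name.

```python
import random
from typing import List, Optional, Tuple


def choose_venue(
    candidates: List[Tuple[str, float]], strategy: str, rng: random.Random
) -> Optional[str]:
    """Pick one venue name from (name, distance) candidates."""
    if not candidates:
        return None
    if strategy == "random":
        return rng.choice(candidates)[0]
    if strategy == "closest":
        return min(candidates, key=lambda c: c[1])[0]
    if strategy == "proportional":
        # Weight each candidate by inverse distance; epsilon avoids dividing by zero.
        weights = [1.0 / (distance + 1e-9) for _, distance in candidates]
        return rng.choices(candidates, weights=weights, k=1)[0][0]
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Under "proportional", a venue 1 km away is five times as likely to be chosen as one 5 km away.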
capacity_column names the CSV column holding venue capacity. capacity_handling controls what happens when the column is absent or zero. track_capacity enables runtime tracking so full venues are excluded from subsequent candidates. when_full: "exclude" removes a full venue from the candidate set; "overflow" allows over-capacity assignment. Priority groups with allow_overflow: true may always exceed capacity, whatever when_full is set to.
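Runtime capacity tracking could be modelled along these lines. `CapacityTracker` is a hypothetical class written for illustration; it ignores overflow_behavior and the if_missing / if_zero handling.

```python
from typing import Dict


class CapacityTracker:
    """Track occupancy per venue and decide whether another agent fits."""

    def __init__(self, capacities: Dict[str, int], when_full: str = "exclude"):
        self.capacities = dict(capacities)
        self.occupancy = {name: 0 for name in capacities}
        self.when_full = when_full

    def has_room(self, venue: str, allow_overflow: bool = False) -> bool:
        # Overflow is permitted globally ("overflow") or per priority group.
        if allow_overflow or self.when_full == "overflow":
            return True
        return self.occupancy[venue] < self.capacities[venue]

    def assign(self, venue: str, allow_overflow: bool = False) -> bool:
        """Record one assignment; return False if the venue is full."""
        if not self.has_room(venue, allow_overflow):
            return False
        self.occupancy[venue] += 1
        return True
```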
enforce_no_empty_venues: when true, a post-allocation step moves one agent from the most-populated venue to each empty venue, minimising the number of venues with zero occupancy.
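A sketch of the rebalancing step, under the assumption that a donor venue must keep at least one agent (otherwise moving its last agent would simply create a new empty venue). `fill_empty_venues` is a hypothetical name, and which agent is moved is arbitrary here.

```python
from typing import Dict, List


def fill_empty_venues(assignments: Dict[str, List]) -> Dict[str, List]:
    """Move one agent from the currently most-populated venue into each empty venue."""
    for venue, members in assignments.items():
        if members:
            continue
        donor = max(assignments, key=lambda v: len(assignments[v]))
        if len(assignments[donor]) < 2:
            break  # no venue can spare an agent without becoming empty itself
        members.append(assignments[donor].pop())
    return assignments
```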
batch_by: "geo_unit" groups agents sharing a geo unit and performs a single spatial query for the batch, reducing the number of distance calculations significantly.
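The saving from batching can be seen in a small sketch: agents are grouped by geo unit and the candidate lookup runs once per group instead of once per agent. Agents are assumed to be dicts with "id" and "geo_unit" keys, and `find_candidates` stands in for the spatial query.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def batch_candidates(agents: List[dict], find_candidates: Callable[[str], list]) -> Dict[str, list]:
    """Return candidate venues per agent id, querying once per geo unit."""
    groups = defaultdict(list)
    for agent in agents:
        groups[agent["geo_unit"]].append(agent)

    result = {}
    for geo_unit, members in groups.items():
        candidates = find_candidates(geo_unit)  # single query shared by the batch
        for agent in members:
            result[agent["id"]] = candidates
    return result
```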
settings
settings:
  priority: 10
  max_allocations: null
  verbose: true
  log_summary: true
  use_spatial_index: true
priority controls execution order across all distributors; higher runs first. use_spatial_index builds a KD-tree over venue coordinates for fast nearest-neighbour queries — disable only for very small venue sets or debugging.
fallback
fallback:
  strategy: "skip" # "skip" | "relax_distance" | "relax_capacity" | "assign_closest"
  relax_params:
    max_iterations: 3
    distance_multiplier: 2.0
The fallback applies to any agent still unassigned after all phases. "skip" leaves the agent unassigned. "relax_distance" retries with an expanded search radius, multiplying it by distance_multiplier (2.0 above, i.e. doubling) on each iteration, for up to max_iterations attempts. "relax_capacity" retries ignoring capacity limits. "assign_closest" assigns the nearest venue regardless of all constraints.
The hospital distributor uses "assign_closest" — every agent must be assigned a medical facility.
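The "relax_distance" loop might look like the sketch below. It assumes the radius starts at max_distance and is multiplied before each retry, and that the nearest venue inside the widened radius is taken; `relax_distance` and its inputs are hypothetical.

```python
from typing import Dict, Optional, Tuple


def relax_distance(
    venue_distances: Dict[str, float],
    max_distance: float,
    max_iterations: int = 3,
    distance_multiplier: float = 2.0,
) -> Tuple[Optional[str], float]:
    """Widen the search radius step by step; return (venue or None, final radius)."""
    radius = max_distance
    for _ in range(max_iterations):
        radius *= distance_multiplier
        in_range = [(d, name) for name, d in venue_distances.items() if d <= radius]
        if in_range:
            return min(in_range)[1], radius  # nearest venue within the widened radius
    return None, radius
```

With max_distance 10 and the defaults, the radii tried are 20, 40 and 80; an agent whose nearest venue is further than 80 away remains unassigned.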
validation
validation:
  required_person_attributes:
    - "age"
    - "geographical_unit"
  required_venue_columns:
    - "geo_unit"
    - "Latitude"
    - "Longitude"
  optional_venue_columns:
    - "StatutoryLowAge"
    - "StatutoryHighAge"
required_person_attributes — agents missing any of these are skipped before allocation. required_venue_columns — venues missing these raise an error. optional_venue_columns — missing values trigger a warning only.
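These three checks can be sketched as a single pre-allocation pass. The sketch treats venue checks as column-level (a column is either present in the venue table or not), which is an assumption; `validate_inputs` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def validate_inputs(
    agents: List[dict],
    venue_columns: Sequence[str],
    required_person_attributes: Sequence[str],
    required_venue_columns: Sequence[str],
    optional_venue_columns: Sequence[str],
) -> Tuple[List[dict], List[str]]:
    """Return (agents with all required attributes, warnings for missing optional columns)."""
    missing = [c for c in required_venue_columns if c not in venue_columns]
    if missing:
        raise ValueError(f"venue data is missing required columns: {missing}")
    warnings = [
        f"optional venue column {c!r} not found"
        for c in optional_venue_columns
        if c not in venue_columns
    ]
    # Agents lacking a required attribute are skipped rather than rejected.
    valid = [
        a for a in agents
        if all(a.get(attr) is not None for attr in required_person_attributes)
    ]
    return valid, warnings
```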
exports
exports:
  venue_summary: "output/school_summary.csv"
  unallocated_report: "output/school_unallocated.csv"
Optional post-allocation CSV reports. venue_summary writes per-venue occupancy statistics. unallocated_report lists agents that remained unassigned after fallback.
Key differences by venue type
| Config | consider_by | allocation.strategy | fallback.strategy | Notes |
|---|---|---|---|---|
| school_distributor | count | random | skip | Age + gender constraints; boarding-school special case; priority groups by age band |
| university_distributor | count | proportional | skip | Stochastic priority groups with per-SGU probability file |
| company_distributor | geo_unit | random | skip | Matches by workplace_sgu; work_sector must match industry_code |
| hospital_distributor | count (1) | closest | assign_closest | No eligibility filters; respect_capacity: false; everyone assigned |