Attribute Assignment
Assigns categorical or list-type properties to agents using probabilistic strategies driven by data files.
Topic: Attributes
Paths: configs/2021/attributes/attribute_assignment.yaml, comorbidity_assignment.yaml
Overview
The attribute assigner runs against every agent in the population. It supports two assignment modes controlled by attribute.assignment_level:
person_by_residence— processes each residence venue, classifies it by household structure, assigns roles to members, then applies per-role strategies. Used byattribute_assignment.yaml(ethnicity).person— processes each person independently with no household context. Used bycomorbidity_assignment.yaml(comorbidities).
All keys are parsed by the same AttributeAssignmentConfig class; keys irrelevant to the chosen mode are ignored.
Keys
| Key | Description |
|---|---|
attribute |
Attribute name, data type, and assignment mode |
required_attributes |
Attributes that must already be assigned before this runs |
region_mapping |
Maps geo unit names to names used in data files |
categories |
Value bands (e.g. age ranges) used for data lookups |
roles |
Household roles mapped to residence subsets |
household_structures |
Composition patterns that classify each household |
data_sources |
Probability tables loaded from CSV |
assignment_rules |
Per-structure or per-person assignment strategies |
venue_assignment_rules |
Fallback rules for non-household residence venues |
settings |
Execution options, error handling, logging |
attribute
# Person-by-residence mode (attribute_assignment.yaml)
attribute:
name: "ethnicity"
data_type: "categorical"
assignment_level: "person_by_residence"
household_venue_types: ["household"]
# Person mode (comorbidity_assignment.yaml)
attribute:
name: "comorbidities"
data_type: "list"
assignment_level: "person"
name is the key written to person.properties. data_type is "categorical" for a single value or "list" for multiple conditions assigned simultaneously.
assignment_level: "person_by_residence" enables the household structure and role pipeline. household_venue_types lists which residence venue types are processed through that pipeline; residents of other venue types (e.g. care_home, boarding_school) are handled by venue_assignment_rules instead.
assignment_level: "person" bypasses household context entirely — every person is processed individually.
required_attributes
required_attributes:
- name: "ethnicity"
required: true
error_if_missing: true
mapping:
W: "W"
O: "CO"
Lists attributes that must already be present on the person before this assigner runs. error_if_missing: true halts on any person lacking the attribute; false skips them. mapping translates internal attribute values to the codes used in data file columns (e.g. internal "O" → CSV column "CO").
Accepts both list and dict formats.
region_mapping
region_mapping:
"East of England": "East"
"Yorkshire and The Humber": "Yorkshire and The Humber"
Maps geo unit names (as stored on the GeographicalUnit) to the names used in data file rows or columns. Referenced by a data source entry with mapping: "region_mapping". Omit if data files use the same names as the geography.
categories
categories:
- name: "Adults (30-49)"
symbol: "A3049"
csv_value: "30-49"
attribute: "age"
type: "numerical"
numerical:
min: 30
max: 49
Defines value bands used when a data source maps a continuous person attribute (e.g. age) to a discrete CSV row. csv_value is the string looked up in the data file. max: null means no upper bound. type: "categorical" is also supported, with categorical.allowed_values: [...] listing matching values.
Categories are matched in definition order; the first match wins.
roles
roles:
primary_adult:
type: primary
subsets: ["Adults"]
secondary_adult:
type: secondary
subsets: ["Adults"]
children:
subsets: ["Kids", "Young Adults"]
Roles map household subsets (defined in households_config.yaml) to assignment order and inheritance behaviour. type controls when a role is assigned:
| Type | Condition |
|---|---|
primary |
First person in the subset |
secondary |
After a primary exists for the same subset |
extra |
After both primary and secondary exist |
general |
Any matching person, regardless of count |
A role may list multiple subsets. Roles are defined globally and referenced by name in assignment_rules.
household_structures
household_structures:
Family:
description: "Households with children"
inheritance: true
matching_rules:
- actual:
- ">=1 >=0 >=0 >=0"
description: "Any household with kids"
- actual:
- "0 >=1 1 <=2"
original:
- ">=2 >=0 1 0"
description: "Demoted family"
Independents:
inheritance: false
matching_rules:
- actual:
- "0 >=0 >=0 >=0"
description: "Catch-all for no-kid households"
Each named structure is tested in definition order; the first match wins. inheritance: true enables child-inherits-from-parent logic in the inheritance and reverse_inheritance strategies.
Pattern format is "Kids YoungAdults Adults OldAdults" with operators N (exact), >=N, <=N. When a rule specifies both actual and original, both must match. When only one is given, only that is checked.
data_sources
data_sources:
geo_distribution:
type: "csv_lookup"
files:
- path: "data/population/ethnicity/ethnicity_5groups_by_OA.csv"
key_column: "geo_unit"
value_columns:
W: "W"
A: "A"
total_column: "total"
fallback:
W: 0.81
A: 0.09
comorbidity_probabilities:
type: "csv_lookup"
files:
- path: "data/population/comorbidities/comorbidities_by_region.csv"
key_columns:
sex:
attribute: "sex"
age_band_min:
attribute: "age"
type: "category_lookup"
region:
attribute: "geographical_unit"
type: "ancestor_lookup"
level: "XLGU"
property: "name"
mapping: "region_mapping"
value_columns:
cvd: "has_had_cvd_diagnosis_count_midpoint_rounded"
crd: "has_had_crd_diagnosis_count_midpoint_rounded"
fallback:
cvd: 0.01
crd: 0.01
key_column (single string) or key_columns (dict) defines the lookup key. In key_columns, each entry maps a column name to a person attribute path plus an optional lookup type:
| Lookup type | Behaviour |
|---|---|
| (omitted) | Direct value from person.attribute |
"category_lookup" |
Maps continuous value to a category csv_value via categories |
"ancestor_lookup" |
Traverses the geo hierarchy to the named level and reads property |
mapping references a top-level region_mapping dict to translate the looked-up value before matching.
value_columns maps internal names to CSV column names. fallback supplies defaults when the key is not found; use fallback: "uniform" to draw uniformly, or name another data source to chain.
assignment_rules
Person-by-residence mode (attribute_assignment.yaml) — rules keyed by structure name:
assignment_rules:
Family:
rules:
- role: "primary_adult"
priority: 1
assignment:
strategy: "probabilistic"
data_source: "geo_distribution"
context: "household.geo_unit"
- role: "secondary_adult"
priority: 2
assignment:
strategy: "partnership"
data_source: "pair_probabilities"
partner_role: "primary_adult"
context: ["household.geo_unit", "primary_adult.ethnicity"]
fallback:
strategy: "probabilistic"
data_source: "geo_distribution"
context: "household.geo_unit"
- role: "children"
priority: 4
assignment:
strategy: "inheritance"
inherit_from:
roles: ["primary_adult", "secondary_adult"]
logic:
- when: "count(unique_values) == 1"
then: "values[0]"
- when: "count(unique_values) > 1"
then: "M"
fallback:
strategy: "probabilistic"
data_source: "geo_distribution"
context: "household.geo_unit"
- role: "primary_elder"
priority: 5
assignment:
strategy: "reverse_inheritance"
inherit_from:
role: "primary_adult"
logic:
- when: "primary_adult.ethnicity in ['W', 'A', 'B', 'O']"
then: "primary_adult.ethnicity"
- when: "primary_adult.ethnicity == 'M'"
then:
strategy: "probabilistic"
data_source: "geo_distribution"
role may be a string or a list (all listed roles receive the same rule). priority controls order within a structure; lower runs first.
Assignment strategies:
| Strategy | Behaviour |
|---|---|
probabilistic |
Sample from data_source keyed by context |
partnership |
Sample conditional on partner_role's already-assigned value |
inheritance |
Derive value from parent roles using logic conditions |
reverse_inheritance |
Infer parent value from an already-assigned child role |
constant |
Assign a fixed value |
logic is a list of when/then pairs evaluated in order. inherit_from.roles collects values from multiple roles (forward); inherit_from.role names a single role (reverse).
Person mode (comorbidity_assignment.yaml) — rules keyed by the literal string person:
assignment_rules:
person:
rules:
- priority: 1
assignment:
strategy: "probabilistic_conditions"
data_source: "comorbidity_probabilities"
selection_method: "independent_bernoulli"
conditions:
- name: "cvd"
label: "Cardiovascular Disease"
- name: "crd"
label: "Chronic Respiratory Disease"
probabilistic_conditions assigns a list of conditions independently. independent_bernoulli samples each condition as a separate Bernoulli trial. Each entry in conditions must correspond to a key in data_source.value_columns.
venue_assignment_rules
venue_assignment_rules:
- venue_types: ["care_home", "boarding_school"]
assignment:
strategy: "probabilistic"
data_source: "geo_distribution"
context: "venue.geo_unit"
Applied to persons residing in venue types not listed in attribute.household_venue_types. Uses the same assignment block as household rules. Omit if all residence types are covered by household_venue_types.
settings
The engine reads exactly two keys, both optional logging switches:
settings:
logging:
detailed_assignment_logging: false # default false — very verbose per-person log
show_attribute_distribution: true # default true — summary table at end of run
detailed_assignment_logging: true logs every per-person assignment (debug only).
show_attribute_distribution: false suppresses the end-of-run value-distribution table.
Nothing else under settings: is read. Earlier configs carried random_seed,
normalize_probabilities, cache_lookups, error_handling, and an
assignment_order map — none of which the engine ever consulted (probabilities
are always normalised; member processing order is determined by role
dependencies, with person id as the stable tie-breaker). Those keys have been
removed; omit them.