World Creation Pipeline
create_world.py builds a synthetic population world — geography, venues, people, households, and social networks — and serialises it to HDF5 for simulation.
Running the script
python create_world.py --config <path/to/config.yaml> [--filename <output.h5>]
--config selects the master config; --filename overrides the output path set therein.
Pipeline overview
| # | Stage | Description | Config |
|---|---|---|---|
| 1 | Initialisation | Fix random seeds for reproducibility | — |
| 2 | Config & CLI | Load master YAML; apply CLI overrides | Master config |
| 3 | Geography | Load geographical unit hierarchy and coordinates | Master config |
| 4 | Venues | Load venue type catalogue; instantiate all venues | Venues config |
| 5 | Population | Load demographics; generate individual population | Master config |
| 6 | Households | Allocate population to household venues | Households config |
| 7 | World assembly | Aggregate all components into a single World object | — |
| 8 | Timeline pipeline | Run ordered attribute, distributor, and child-creator steps | Attributes, Distributors, Child creators |
| 9 | Relationship pipeline | Build social networks | Social networks |
| 10 | Romantic relationships | Assign sexual orientation and romantic partnerships | Romantic relationships |
| 11 | HDF5 export | Serialise world to HDF5 | Serialisation config |
Stages
1. Initialisation
Sets PYTHONHASHSEED and the global random seed to 0 before any other step. This makes runs reproducible: with the same config and software environment, the script yields the same world every time.
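Here is a minimal sketch of this step. It assumes the script seeds Python's standard `random` module. Whether it also seeds NumPy is not stated here, and the function name `fix_seeds` is hypothetical. One caveat: CPython reads PYTHONHASHSEED only when an interpreter starts. Assigning it from inside a running script therefore affects child processes started afterwards, not the current process.

```python
import os
import random


def fix_seeds(seed: int = 0) -> None:
    """Hypothetical helper: pin the sources of randomness before world creation."""
    # PYTHONHASHSEED is read at interpreter start-up, so setting it here
    # only influences interpreters launched after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Seed the stdlib global RNG. A NumPy-based pipeline would also need
    # np.random.seed(seed) or an explicitly seeded Generator.
    random.seed(seed)


# Re-seeding with the same value replays exactly the same draws.
fix_seeds(0)
first = [random.random() for _ in range(3)]
fix_seeds(0)
second = [random.random() for _ in range(3)]
```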
2. Config & CLI
Parses --config and --filename from the command line, then loads the master config YAML. All subsequent stages draw their settings from this file. --filename overrides whichever output path the config specifies.
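The following sketch shows how the two flags and the override rule fit together. It uses only `argparse`. The master config is represented as a plain dict, where the real script would presumably obtain it from a YAML loader such as PyYAML's `yaml.safe_load`. The config key `output_path` and both function names are hypothetical.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Build a synthetic population world.")
    parser.add_argument("--config", required=True, help="Path to the master config YAML.")
    parser.add_argument("--filename", default=None, help="Override the output HDF5 path.")
    return parser.parse_args(argv)


def resolve_output_path(config: dict, cli_filename: Optional[str]) -> str:
    # The CLI value wins when given; otherwise use the path from the config.
    # "output_path" is a hypothetical key name.
    if cli_filename is not None:
        return cli_filename
    return config["output_path"]
```

Keeping the override in one small function makes the precedence rule explicit and easy to test without touching the filesystem.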
3. Geography
Constructs the geographical hierarchy — large, medium, and small geographical units (LGU/MGU/SGU) — from the CSV files declared in the master config. Coordinates are loaded at this stage and attached to each unit.
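The sketch below builds an LGU > MGU > SGU hierarchy and attaches coordinates. For brevity it reads a single CSV with hypothetical columns `name`, `level`, `parent`, `lat`, `lon`, and it assumes parents are listed before their children. The real pipeline reads several files declared in the master config, and their layout is not shown here.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class GeoUnit:
    name: str
    level: str  # "LGU", "MGU" or "SGU"
    coordinates: Tuple[float, float]  # (latitude, longitude)
    # Excluded from repr/eq to avoid recursing through parent <-> children links.
    parent: Optional["GeoUnit"] = field(default=None, repr=False, compare=False)
    children: List["GeoUnit"] = field(default_factory=list, repr=False, compare=False)


def build_hierarchy(csv_text: str) -> Dict[str, GeoUnit]:
    """Create units row by row and link each one to its (already created) parent."""
    units: Dict[str, GeoUnit] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        parent = units[row["parent"]] if row["parent"] else None
        unit = GeoUnit(row["name"], row["level"], (float(row["lat"]), float(row["lon"])), parent)
        if parent is not None:
            parent.children.append(unit)
        units[unit.name] = unit
    return units


SAMPLE = """name,level,parent,lat,lon
L1,LGU,,54.97,-1.61
M1,MGU,L1,54.98,-1.60
S1,SGU,M1,54.99,-1.59
"""
units = build_hierarchy(SAMPLE)
```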
4. Venues
Creates a VenueManager and loads the venue type catalogue from the venues config. All venue instances — schools, workplaces, hospitals, and the like — are populated here before any person is assigned to them.
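To illustrate "instantiate every venue before anyone is assigned", here is a toy stand-in. It is not the project's VenueManager API. It assumes a hypothetical catalogue shape `{venue_type: {sgu_name: count}}` and gives every venue a sequential id.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Venue:
    venue_id: int
    venue_type: str
    sgu: str  # small geographical unit the venue sits in
    members: List[int] = None  # filled later by distributors; empty at creation

    def __post_init__(self) -> None:
        if self.members is None:
            self.members = []


def instantiate_venues(catalogue: Dict[str, Dict[str, int]]) -> List[Venue]:
    """Create every venue in the (hypothetical) catalogue, with no people attached yet."""
    venues: List[Venue] = []
    for venue_type, counts in catalogue.items():
        for sgu, count in counts.items():
            for _ in range(count):
                venues.append(Venue(len(venues), venue_type, sgu))
    return venues
```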
5. Population
Loads population demographics from the files named in the master config. In matrix mode, individual Person objects are generated from the demographic matrix at this stage; in explicit mode they are loaded directly.
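A minimal sketch of matrix mode follows. It assumes a hypothetical matrix layout keyed by `(sgu, sex, age_min, age_max)` with head counts as values, and it draws each person's age uniformly within the band from a seeded RNG. The real demographic files and sampling rule may differ.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Person:
    person_id: int
    sgu: str
    sex: str
    age: int
    attributes: Dict[str, object] = field(default_factory=dict)  # filled by later stages


def people_from_matrix(matrix: Dict[Tuple[str, str, int, int], int], seed: int = 0) -> List[Person]:
    """Expand each matrix cell into `count` individual Person objects."""
    rng = random.Random(seed)  # local seeded RNG keeps the draw reproducible
    people: List[Person] = []
    for (sgu, sex, age_min, age_max), count in matrix.items():
        for _ in range(count):
            people.append(Person(len(people), sgu, sex, rng.randint(age_min, age_max)))
    return people
```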
6. Households
If household distribution is enabled, allocates people to household venues following the households config and the allocation strategy it specifies. Household composition and relationship rules are applied at this stage.
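The shape of an allocation strategy can be sketched with a deliberately simple toy rule: residents of each SGU are packed, in order, into households of at most `max_size` people. This is an assumption for illustration only. It ignores the composition and relationship rules the households config actually applies, and `allocate_households` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def allocate_households(people: List[Tuple[int, str]], max_size: int) -> List[List[int]]:
    """Toy strategy over (person_id, sgu) pairs: households never span two SGUs."""
    by_sgu: Dict[str, List[int]] = defaultdict(list)
    for person_id, sgu in people:
        by_sgu[sgu].append(person_id)
    households: List[List[int]] = []
    for members in by_sgu.values():
        # Slice each SGU's residents into consecutive chunks of up to max_size.
        for start in range(0, len(members), max_size):
            households.append(members[start:start + max_size])
    return households
```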
7. World assembly
Combines geography, population, venues, and the household distributor into a single World object. No computation happens here; this stage is pure aggregation before the pipeline stages begin.
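Because this stage only aggregates, a plain container captures it well. The frozen dataclass below is a hypothetical sketch, not the project's World class. It stores references to the components built earlier and does no work of its own.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class World:
    """Container only: construction stores references and computes nothing."""
    geography: Any
    people: List[Any]
    venues: List[Any]
    household_distributor: Optional[Any] = None  # present only if households are enabled
```

Freezing the attribute bindings prevents components from being swapped out after assembly. The objects they point to (such as the people list) remain mutable, so later pipeline stages can still enrich them.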
8. Timeline pipeline
Executes an ordered sequence of steps defined in the timeline section of the master config. Each step is one of three kinds:
- Attribute assignment — assigns properties (e.g. ethnicity, comorbidities) to people. Configured via attribute-assignment.
- Venue distributor — allocates people to non-household venues (schools, workplaces, care homes, etc.). Configured via venue distributor and its variants.
- Venue child creator — generates sub-venues within a parent venue (e.g. classrooms within a school). Configured via venue child creators.
Order matters: steps run sequentially, so later distributors can depend on attributes assigned earlier.
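The sequential dispatch described above can be sketched as a loop over steps, where each step goes to a handler chosen by its kind. The step schema (`kind`, `attribute`, `rule`, `eligible`, and so on) and the dict-based world are hypothetical. The example shows why order matters: the distributor reads an attribute that an earlier step assigns.

```python
from typing import Any, Callable, Dict, List

# Hypothetical world state: people are dicts; venues map a name to member ids.
WorldState = Dict[str, Any]
Step = Dict[str, Any]


def run_attribute(world: WorldState, step: Step) -> None:
    # Attribute assignment: compute a property for every person.
    for person in world["people"]:
        person[step["attribute"]] = step["rule"](person)


def run_distributor(world: WorldState, step: Step) -> None:
    # Venue distributor: add every eligible person to the named venue.
    members = world["venues"].setdefault(step["venue"], [])
    for person in world["people"]:
        if step["eligible"](person):
            members.append(person["id"])


def run_child_creator(world: WorldState, step: Step) -> None:
    # Child creator: split a parent venue's members into fixed-size sub-venues.
    parent, size = world["venues"][step["parent"]], step["size"]
    for i in range(0, len(parent), size):
        world["venues"][f"{step['parent']}/{step['child']}{i // size}"] = parent[i:i + size]


HANDLERS: Dict[str, Callable[[WorldState, Step], None]] = {
    "attribute": run_attribute,
    "distributor": run_distributor,
    "child_creator": run_child_creator,
}


def run_timeline(world: WorldState, timeline: List[Step]) -> None:
    for step in timeline:  # strictly sequential: config order is execution order
        HANDLERS[step["kind"]](world, step)


world = {"people": [{"id": i, "age": a} for i, a in enumerate([6, 8, 10, 35])], "venues": {}}
timeline = [
    {"kind": "attribute", "attribute": "is_pupil", "rule": lambda p: 5 <= p["age"] <= 18},
    {"kind": "distributor", "venue": "school", "eligible": lambda p: p["is_pupil"]},
    {"kind": "child_creator", "parent": "school", "child": "class", "size": 2},
]
run_timeline(world, timeline)
```

If the first two steps were swapped, the distributor would fail with a `KeyError` because `is_pupil` would not exist yet.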
9. Relationship pipeline
Builds social networks amongst the population using the configs listed under relationships in the master config. Each network is constructed independently; see social networks for the full schema.
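The sketch below illustrates the independence property. Each named network is built only from its own config entry, so no network depends on another. The schema (a network name mapped to the venue type it is derived from) and the "link everyone who shares a venue" rule are hypothetical simplifications, not the social-networks schema.

```python
from itertools import combinations
from typing import Dict, List, Set

Network = Dict[int, Set[int]]  # adjacency sets keyed by person id


def shared_venue_network(venues: Dict[str, List[int]]) -> Network:
    """Toy builder: connect every pair of people who share a venue."""
    graph: Network = {}
    for members in venues.values():
        for a, b in combinations(members, 2):
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def build_networks(configs: Dict[str, str], venues: Dict[str, Dict[str, List[int]]]) -> Dict[str, Network]:
    # Each network reads only its own config entry, so build order is irrelevant.
    return {name: shared_venue_network(venues[vtype]) for name, vtype in configs.items()}
```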
10. Romantic relationships
If enabled, assigns a sexual orientation to each person and then forms romantic partnerships. Configured via the romantic relationships config.
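The two-phase order (orientation first, then pairing) can be sketched as follows. Orientations are drawn from hypothetical weights, and a simple greedy matching, swapped in here for illustration, pairs each unpaired person with the first unpaired person who is mutually compatible. The sketch ignores age and any other constraints the real config may impose.

```python
import random
from typing import Dict, List, Tuple

ORIENTATIONS = ("heterosexual", "homosexual", "bisexual")


def attracted(sex_a: str, orient_a: str, sex_b: str) -> bool:
    """Whether a person of sex_a with orientation orient_a is attracted to sex_b."""
    if orient_a == "bisexual":
        return True
    if orient_a == "heterosexual":
        return sex_a != sex_b
    return sex_a == sex_b  # homosexual


def assign_orientations(people: List[Dict], weights: Tuple[float, float, float], rng: random.Random) -> None:
    for person in people:
        person["orientation"] = rng.choices(ORIENTATIONS, weights=weights)[0]


def form_partnerships(people: List[Dict]) -> List[Tuple[int, int]]:
    """Greedy matching: nobody appears in more than one pair."""
    paired = set()
    pairs: List[Tuple[int, int]] = []
    for i, a in enumerate(people):
        if a["id"] in paired:
            continue
        for b in people[i + 1:]:
            if b["id"] in paired:
                continue
            if attracted(a["sex"], a["orientation"], b["sex"]) and attracted(b["sex"], b["orientation"], a["sex"]):
                paired.update((a["id"], b["id"]))
                pairs.append((a["id"], b["id"]))
                break
    return pairs


rng = random.Random(0)
people = [{"id": i, "sex": s} for i, s in enumerate("FMFMMF")]
assign_orientations(people, (0.9, 0.05, 0.05), rng)
pairs = form_partnerships(people)
```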
11. HDF5 export
Serialises the completed world to an HDF5 file. The serialisation config controls which fields are written; omitting a field reduces file size. The output path is set in the master config and may be overridden with --filename.
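A minimal sketch of config-driven export follows. It assumes the serialisation config boils down to a list of field names, and it writes each kept field as one dataset under a hypothetical `people` group. `h5py` is a third-party package and is imported lazily, so the field selection runs without it. The real file layout is not documented here.

```python
from typing import Dict, List, Sequence


def select_fields(people: List[Dict[str, float]], fields: Sequence[str]) -> Dict[str, List[float]]:
    """Turn per-person records into columns, keeping only the configured fields."""
    return {name: [person[name] for person in people] for name in fields}


def write_people(path: str, columns: Dict[str, List[float]]) -> None:
    import h5py  # third-party; imported here so select_fields works without it

    with h5py.File(path, "w") as f:
        group = f.create_group("people")  # hypothetical group name
        for name, values in columns.items():
            group.create_dataset(name, data=values)  # one dataset per kept field
```

Fields that are left out of the list are never materialised as datasets, which is where the file-size saving comes from.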