Methodology
Transparency is core to Pay Lens. This page documents every step between the raw government data and the numbers you see on screen.
Data Source
Pay Lens is built on data published under the Ontario Public Sector Salary Disclosure Act, 1996. Every year, the Government of Ontario requires public sector organizations to disclose the names, positions, salaries, and taxable benefits of all employees earning $100,000 or more in a calendar year. This disclosure is commonly known as the “Sunshine List.”
The data is published annually, typically in late March, and covers the preceding calendar year. It spans all major public sector categories: provincial ministries, municipalities, hospitals, universities, colleges, school boards, police services, Crown agencies, and more.
Each record in the raw data includes:
- Employer — the organization name
- Surname & Given Name
- Position / Job Title
- Salary Paid — gross salary for the calendar year
- Taxable Benefits — employer-paid taxable benefits
The data is available from the Ontario Public Sector Salary Disclosure page.
ETL Pipeline
Raw CSV files pass through a five-stage extract-transform-load pipeline before reaching the front-end. Each stage is deterministic and reproducible.
Download
Fetch the latest annual CSV files from the Ontario government open data catalogue.
Normalize
Clean names (capitalization, whitespace), standardize titles, and parse salary and benefit amounts into consistent numeric formats.
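As a rough illustration of this stage, the sketch below cleans a name field and parses a currency string. The function names `clean_name` and `parse_amount` are hypothetical, and the title-casing rule is a deliberate simplification (it would mishandle names such as "McDonald"); the actual pipeline may treat these cases differently.

```python
import re
from decimal import Decimal


def clean_name(raw: str) -> str:
    """Collapse runs of whitespace and apply naive title-casing."""
    collapsed = " ".join(raw.split())
    return collapsed.title()


def parse_amount(raw: str) -> Decimal:
    """Parse a currency string such as '$123,456.78' into a Decimal."""
    # Drop everything except digits, the decimal point, and a minus sign.
    digits = re.sub(r"[^\d.\-]", "", raw)
    if not digits:
        raise ValueError(f"no numeric content in {raw!r}")
    return Decimal(digits)
```

Decimal is used here instead of float so that dollar amounts are stored exactly; that is a design choice of this sketch, not something the source confirms.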
Enrich
Attach region tags, assign role families via fuzzy matching, and compute CPI-adjusted salaries in constant 2025 dollars.
Validate
Run automated assertions on record counts, salary ranges, and referential integrity. The pipeline halts on any failure.
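A minimal sketch of what such assertions could look like follows. The `Record` shape, the `validate` function, and the `max_salary` sanity bound are assumptions made for illustration; only the $100,000 floor comes from the disclosure rules described above.

```python
from dataclasses import dataclass


class ValidationError(Exception):
    """Raised to halt the pipeline when an assertion fails."""


@dataclass(frozen=True)
class Record:
    employer_id: str
    year: int
    salary: float


def validate(records, known_employers, min_rows,
             min_salary=100_000.0, max_salary=5_000_000.0):
    """Check record count, salary range, and employer references."""
    if len(records) < min_rows:
        raise ValidationError(
            f"expected at least {min_rows} rows, got {len(records)}")
    for i, rec in enumerate(records):
        # Salary range: the floor is the disclosure threshold; the
        # ceiling is a hypothetical sanity bound.
        if not (min_salary <= rec.salary <= max_salary):
            raise ValidationError(f"row {i}: salary {rec.salary} out of range")
        # Referential integrity: every record must point at a known employer.
        if rec.employer_id not in known_employers:
            raise ValidationError(
                f"row {i}: unknown employer {rec.employer_id!r}")
```

Raising on the first failure mirrors the halt-on-failure behaviour described above; a real pipeline might instead collect all failures before stopping.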
Load
Export validated data to Parquet (analytics) and JSON (web) formats ready for the front-end application.
Job Title Normalization
The raw Sunshine List contains thousands of free-text job titles. Because each employer enters titles independently, the same role may appear under many different spellings and abbreviations. Without normalization, aggregating or comparing salaries by role is unreliable.
Pay Lens uses rapidfuzz, a high-performance fuzzy string matching library, to map raw titles to a curated set of canonical role names. The matching process:
- Strip punctuation, normalize whitespace, and lower-case the raw title.
- Compare against the canonical dictionary using token-set ratio scoring.
- Accept matches above a configurable confidence threshold.
- Flag unmatched titles for manual review and tag them as “Other” until resolved.
The current target is coverage of 85% or more: at least 85% of all records should map to a recognized canonical title. The remaining records keep their original free-text titles and are grouped under “Other”.
| Raw Title | Canonical Title |
|---|---|
| Sr. Manager, Operations | Senior Manager |
| Snr Manager Operations | Senior Manager |
| Assoc. Professor | Associate Professor |
| Assoc Prof | Associate Professor |
| Reg. Nurse | Registered Nurse |
| RN - Emergency Dept | Registered Nurse |
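The clean, score, and threshold steps can be sketched as below. This is not the project's code: the scorer is a standard-library stand-in built on `difflib`, which is cruder than the token-set ratio named above, and the names `clean_title`, `match_title`, `coverage`, and the threshold of 80 are all assumptions. Any callable that takes two strings and returns a 0 to 100 score, such as `rapidfuzz.fuzz.token_set_ratio`, could be passed as `scorer` instead.

```python
import re
from difflib import SequenceMatcher

# Illustrative subset of a canonical dictionary.
CANONICAL_TITLES = ["Senior Manager", "Associate Professor", "Registered Nurse"]


def clean_title(raw: str) -> str:
    """Strip punctuation, normalize whitespace, and lower-case."""
    no_punct = re.sub(r"[^\w\s]", " ", raw)
    return " ".join(no_punct.lower().split())


def stand_in_score(a: str, b: str) -> float:
    """0-100 similarity on sorted tokens (stdlib stand-in, not token-set ratio)."""
    a_sorted = " ".join(sorted(a.split()))
    b_sorted = " ".join(sorted(b.split()))
    return 100.0 * SequenceMatcher(None, a_sorted, b_sorted).ratio()


def match_title(raw, canonical=CANONICAL_TITLES,
                scorer=stand_in_score, threshold=80.0):
    """Return (canonical title or 'Other', best score)."""
    cleaned = clean_title(raw)
    best_title, best_score = "Other", 0.0
    for title in canonical:
        score = scorer(cleaned, clean_title(title))
        if score > best_score:
            best_title, best_score = title, score
    if best_score < threshold:
        return "Other", best_score  # would be flagged for manual review
    return best_title, best_score


def coverage(raw_titles) -> float:
    """Fraction of titles that map to a canonical title."""
    if not raw_titles:
        return 0.0
    matched = sum(1 for t in raw_titles if match_title(t)[0] != "Other")
    return matched / len(raw_titles)
```

Heavily abbreviated titles such as "Sr. Manager, Operations" are where the choice of scorer and threshold matters most, so the stand-in should not be expected to reproduce the table above.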
Employer Deduplication
Employer names in the raw data are not standardized. The same organization may appear under slightly different names across years or even within a single year's disclosure. Common variations include differences in capitalization, punctuation, the use of “The” prefixes, French/English variants, and name changes following mergers.
Pay Lens maintains a curated alias table that maps variant names to a single canonical employer identifier. This table is stored in version control alongside the ETL code, making every change transparent and auditable.
The alias table is updated each year when new data is released, and community contributions to identify missed aliases are encouraged.
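An alias lookup of this kind might look like the sketch below. The employer names, the identifier format, and the functions `alias_key` and `canonical_employer` are hypothetical; the real table's schema is not described here.

```python
import re

# Hypothetical alias table: normalized variant -> canonical employer id.
ALIASES = {
    "city of exampleville": "exampleville-city",
    "corporation of the city of exampleville": "exampleville-city",
    "ville d exampleville": "exampleville-city",
}


def alias_key(name: str) -> str:
    """Normalize case and punctuation, and drop a leading 'The'."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    if tokens and tokens[0] == "the":
        tokens = tokens[1:]
    return " ".join(tokens)


def canonical_employer(name: str, aliases=ALIASES):
    """Return the canonical id, or None if the name has no alias yet."""
    return aliases.get(alias_key(name))
```

Returning `None` for unknown names, rather than guessing, keeps missed aliases visible so they can be added to the table by hand.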
Region Tagging
The raw Sunshine List data does not include any geographic metadata. To enable geographic analysis — such as the pay map, regional salary benchmarks, and regional comparisons — Pay Lens assigns each employer to a Statistics Canada Census Division.
This mapping is maintained as a hand-curated lookup table: each employer is associated with the Census Division where its headquarters or primary office is located. The Census Division boundaries are defined by Statistics Canada and provide a consistent, well-documented geographic framework across the province.
This approach enables features like geographic heatmaps and regional benchmarking, but comes with an important caveat: the region reflects the employer's administrative location, not necessarily where every individual employee works (see Limitations).
Inflation Adjustment
Comparing a salary from 1996 to one from 2025 in raw (nominal) dollars is misleading because the purchasing power of a dollar changes over time. Pay Lens converts all historical salaries to constant 2025 Canadian dollars using the Consumer Price Index (CPI), which is produced by Statistics Canada and retrieved here from the Bank of Canada.
The adjustment formula is:
adjusted = nominal × (CPI_2025 / CPI_year)
In practice, each year has a pre-computed multiplier. The table below shows sample values for key years:
| Year | Multiplier | $100K Nominal | In 2025 Dollars |
|---|---|---|---|
| 1996 | 1.670 | $100,000 | $167,000 |
| 2000 | 1.558 | $100,000 | $155,800 |
| 2010 | 1.275 | $100,000 | $127,500 |
| 2020 | 1.081 | $100,000 | $108,100 |
| 2025 | 1.000 | $100,000 | $100,000 |
CPI data is sourced from the Bank of Canada via their Valet API. The full adjustment table covering every year from 1996 to 2025 is embedded in the application source code.
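The adjustment can be sketched as follows, using only the sample multipliers from the table above. The function names are hypothetical, and rounding to whole dollars is an assumption of this sketch.

```python
from decimal import Decimal, ROUND_HALF_UP

# Sample multipliers copied from the table above (not the full 1996-2025 set).
MULTIPLIERS = {
    1996: Decimal("1.670"),
    2000: Decimal("1.558"),
    2010: Decimal("1.275"),
    2020: Decimal("1.081"),
    2025: Decimal("1.000"),
}


def multiplier_from_cpi(cpi_2025: Decimal, cpi_year: Decimal) -> Decimal:
    """Multiplier for one year: CPI_2025 / CPI_year."""
    return cpi_2025 / cpi_year


def to_2025_dollars(nominal: Decimal, year: int) -> Decimal:
    """Convert a nominal salary to constant 2025 dollars, rounded to $1."""
    if year not in MULTIPLIERS:
        raise KeyError(f"no multiplier for year {year}")
    adjusted = nominal * MULTIPLIERS[year]
    return adjusted.quantize(Decimal("1"), rounding=ROUND_HALF_UP)
```

For example, `to_2025_dollars(Decimal("100000"), 1996)` gives 167000, matching the first row of the table.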
Multi-Source Income
Some individuals appear on the Sunshine List under multiple employers in the same disclosure year. This can occur when a person holds concurrent appointments — for example, a physician who is salaried by both a hospital and a university, or an executive on multiple public agency boards.
The raw government data publishes each employer relationship as a separate row and does not aggregate or cross-reference them. Pay Lens identifies these multi-employer appearances and flags them in the interface so that users can see the full picture of an individual's publicly funded compensation.
Multi-employer records are matched by name and year. Because the data does not include unique personal identifiers, there is a small risk of false matches where two different people share the same name (see Limitations).
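Matching by name and year could be sketched as a simple grouping. The tuple layout and the `find_multi_employer` function are assumptions, the names in any usage are placeholders, and the case-insensitive comparison is one possible choice rather than the documented rule.

```python
from collections import defaultdict


def find_multi_employer(records):
    """Group rows by (surname, given name, year); keep groups with 2+ employers.

    `records` is an iterable of (surname, given_name, year, employer_id).
    """
    employers = defaultdict(set)
    for surname, given, year, employer_id in records:
        # Names are the only available key, so two different people who
        # share a name in the same year will be merged (a false match).
        key = (surname.strip().lower(), given.strip().lower(), year)
        employers[key].add(employer_id)
    return {k: sorted(v) for k, v in employers.items() if len(v) >= 2}
```

Using a set per key means duplicate rows for the same employer do not trigger a flag on their own.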
Anomaly Detection
Pay Lens automatically flags records that match a small set of rule-based checks, most of them on year-over-year salary change. The goal is to surface records that merit closer inspection, not to imply wrongdoing.
A record is flagged if it meets any of these criteria:
- Large increase: year-over-year salary increase exceeding 40%
- Large decrease: year-over-year salary decrease exceeding 30%
- High-value new entry: first appearance on the list with a salary above $200,000
- Multi-employer: the individual appears under two or more employers in the same year
There are many legitimate explanations for flagged records, including:
- Promotions or role changes
- Partial-year employment (start/end mid-year)
- Retroactive pay adjustments or settlements
- Transition between part-time and full-time status
- Sabbatical or leave in the prior year
Flagged records are collected on the Anomalies page, where they can be filtered and explored in detail.
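The four flagging criteria listed in this section can be expressed as a short function. The name `flag_record`, its parameters, and the flag labels are hypothetical; the sketch also assumes that `prior_salary=None` means the person has not appeared on the list before.

```python
def flag_record(salary, prior_salary=None, employer_count=1):
    """Return the list of anomaly flags that apply to one record."""
    flags = []
    if prior_salary is None:
        # First appearance on the list.
        if salary > 200_000:
            flags.append("high_value_new_entry")
    else:
        change = (salary - prior_salary) / prior_salary
        if change > 0.40:
            flags.append("large_increase")
        elif change < -0.30:
            flags.append("large_decrease")
    if employer_count >= 2:
        flags.append("multi_employer")
    return flags
```

The comparisons are strict, following the "exceeding" and "above" wording of the criteria, so a change of exactly 40% or a first salary of exactly $200,000 is not flagged.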
Limitations
No dataset is perfect. Understanding the limitations of the Sunshine List data is essential for interpreting it responsibly.
$100K threshold only
The Sunshine List only captures employees earning $100,000 or more. Entry-level, mid-career, and most part-time salaries are invisible in this dataset.
Eroding threshold
The $100,000 threshold set in 1996 is equivalent to approximately $167,000 in 2025 dollars. The list has grown from roughly 4,500 records in 1996 to over 260,000 today, largely because inflation has pushed more employees above a fixed nominal line.
No private sector comparison
This dataset covers the public sector only. Pay Lens does not include private sector salary data, so direct public-vs-private comparisons cannot be made within the platform.
Title normalization is approximate
The fuzzy matching process targets at least 85% coverage, so up to roughly 15% of records retain their original free-text titles. Among the matched records, some misclassifications are inevitable.
Region reflects employer HQ
Geographic tagging is based on the employer's headquarters or primary office, not the employee's actual work location. A hospital system headquartered in Toronto may employ staff across multiple cities.
Taxable benefits only
The "benefits" column captures taxable benefits only (e.g., car allowances, housing). Non-taxable benefits like health insurance premiums and pension contributions are not disclosed.
Contractors excluded
Self-employed contractors and consultants paid by public sector organizations are not covered by the Salary Disclosure Act, even if they earn above $100,000.
Name-based matching
Multi-employer and year-over-year tracking relies on name matching. Common names may produce false positives, and name changes (e.g., marriage) may break linkage.
For broader Canadian wage context, Statistics Canada publishes detailed labour force data through the Labour Force Survey.
Open Source
Transparency extends to the code itself. The entire ETL pipeline is open source and available for inspection, reproduction, and contribution.
ETL Pipeline
Python-based data pipeline for downloading, cleaning, and enriching Sunshine List data. Licensed under the MIT License.
View on GitHub
Data License
Processed datasets are released under the Creative Commons Attribution 4.0 (CC BY 4.0) license. You are free to share and adapt the data with attribution.
CC BY 4.0 License
Contributions are welcome, whether that means fixing a misclassified title, adding a missing employer alias, improving region mappings, or enhancing the pipeline logic. Open an issue or pull request on GitHub.