Survey-based EPCs are scalable, but they remain assumption-heavy. HEM improves the quality of the model, but it does not remove the gap between modelled and in-use performance.
Measurements anchor assessments in reality, while models remain more explainable and predictive. H3 is intended to bridge those strengths.
H3 is an open-source Python CLI toolkit built by Vulcan and Knauf Energy Solutions. It helps practitioners:
- Calculate a robust output-based HTC from HEM's dynamic outputs.
- Calibrate a HEM model against measured HTC in a transparent, reproducible workflow.
- Inspect results quickly through JSON outputs and an interactive HTML report.
Why measurement matters
The methodology behind Energy Performance Certificates (EPCs) is changing from SAP ('Standard Assessment Procedure') to HEM ('Home Energy Model'). Compared with SAP's approximate monthly calculation, HEM's sub-hourly simulation provides a much richer representation of building performance, including estimates of internal temperatures, peak space-heating and cooling demand, heating-system efficiency, and unmet demand.
But even with HEM, survey-based assessments still rely on assumptions about real buildings. That is practical and scalable, but it can hide poorly performing homes that appear "typical" on paper.
That matters because better retrofit decisions, stronger quality assurance, and any future move toward measured EPCs all depend on being able to check model outputs against real performance. This concern is reflected in EPC reform consultations, with growing emphasis on fabric-focused metrics and increasing attention to heat-loss indicators such as HTC ('Heat Transfer Coefficient'). Fabric-focused measures matter because they say more about the thermal quality of the building itself, and less about short-term occupant behaviour, controls, or fuel choice.
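HTC is the rate of heat loss per degree of temperature difference between inside and outside (W/K), in effect a single number that summarises how thermally leaky a home is. For example, a home that loses 2,000 W while it is 10 K warmer indoors than outdoors has an HTC of 200 W/K. That makes it a useful basis for whole-home heat-loss comparison.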
A growing range of methods can estimate whole-home HTC, from co-heating and shorter dynamic tests such as QUB to in-use smart-meter approaches such as SMETER. More rapid or lower-cost methods are more scalable, but typically come at the cost of greater measurement uncertainty. Measurements can show what a home is really doing, but they are not self-explanatory: results still depend on method, timing, weather, and how the data is analysed.
SAP calculates an 'Inputs-Based' HTC from fabric and ventilation assumptions. HEM reports a static HTC in the results_static.csv output file, but that value is not directly interchangeable with the SAP figure. Neither metric is equivalent to a dynamic, output-based HTC derived under changing weather, solar gains, ventilation, and thermal mass.
H3 is designed to derive that output-based HTC from HEM in a way that is more comparable with in-use measurement. That creates a practical basis for calibration and validation: measured performance can challenge the model, while the calibrated model still helps explain and predict building behaviour.
How H3 calculates HTC
HEM exposes timestep heat-balance terms, but a naive per-timestep "heat loss divided by delta-T" calculation is unstable when:
- the difference between internal and external temperatures approaches zero;
- dynamic effects such as solar gains, thermal mass, and ventilation behaviour shift the gradient in a seasonal way; and
- HEM models window opening in warmer periods to manage overheating risk.
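The first of these failure modes can be illustrated with a few made-up numbers. The sketch below is not H3 code: the function name, the 200 W/K "true" HTC, and the constant 50 W residual (a stand-in for dynamic effects the simple ratio ignores) are all assumptions chosen for illustration.

```python
def naive_htc(heat_loss_w, t_internal_c, t_external_c):
    """Per-timestep heat loss divided by delta-T, in W/K."""
    return [
        q / (ti - te)
        for q, ti, te in zip(heat_loss_w, t_internal_c, t_external_c)
    ]


# Illustrative values only: a "true" HTC of 200 W/K plus a constant 50 W
# residual standing in for dynamic effects that the simple ratio ignores.
TRUE_HTC = 200.0
RESIDUAL_W = 50.0
t_int = [20.0, 20.0, 20.0, 20.0]
t_ext = [0.0, 10.0, 19.0, 19.5]
heat_loss = [TRUE_HTC * (ti - te) + RESIDUAL_W for ti, te in zip(t_int, t_ext)]

estimates = naive_htc(heat_loss, t_int, t_ext)
for te, est in zip(t_ext, estimates):
    print(f"external {te:5.1f} C -> naive HTC {est:7.1f} W/K")
```

With these numbers the naive estimate drifts from 202.5 W/K at a 20 K temperature difference to 300 W/K at 0.5 K, even though the underlying heat-loss coefficient never changes.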
H3 therefore uses a perturbation-regression approach. It reruns the same model with small external temperature offsets (by default -1 K and +1 K, alongside an unperturbed base run) and derives HTC from the gradient of the heat-loss response.
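A minimal sketch of the idea, assuming each run is summarised as a period-mean heat loss and that the gradient is fitted by ordinary least squares. The function name, the sign convention, and the example figures are hypothetical; H3's actual aggregation and fitting choices may differ.

```python
def htc_from_perturbations(offsets_k, mean_heat_loss_w):
    """Estimate HTC (W/K) as the negative least-squares slope of mean
    heat loss (W) against external temperature offset (K)."""
    n = len(offsets_k)
    mean_x = sum(offsets_k) / n
    mean_y = sum(mean_heat_loss_w) / n
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(offsets_k, mean_heat_loss_w)
    )
    sxx = sum((x - mean_x) ** 2 for x in offsets_k)
    # Warmer outside means less heat loss, so the slope is negative;
    # flip the sign to report HTC as a positive number.
    return -sxy / sxx


# Hypothetical period-mean heat loss from three runs: -1 K, base, +1 K.
offsets = [-1.0, 0.0, 1.0]
losses = [2260.0, 2050.0, 1840.0]
htc = htc_from_perturbations(offsets, losses)
print(f"output-based HTC: {htc:.1f} W/K")
```

Because only the gradient is used, any term that is the same in every run (such as a constant residual) falls into the intercept and does not bias the estimate, and the calculation never divides by a near-zero temperature difference.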
To keep comparisons fair, H3 uses controlled heating assumptions by default and makes key analysis choices explicit, including masking and time-window controls when tighter comparability is needed.
The result is a stable output-based HTC metric intended to align more closely with measured in-use HTC.
Today, H3 relies on a patched Rust HEM engine. The intent is not to maintain a permanent fork, but to upstream the minimal required changes for H3 over time.
Calibrating models with measurement
HEM offers useful advantages for calibration. Internal temperatures and other dynamic outputs give more evidence than trying to match a single number, reducing the risk of getting a good fit for the wrong reasons. And because HEM represents solar gains and thermal-mass behaviour directly, it can test whether measured HTC estimates remain consistent under stated boundary conditions.
In the initial release, H3 uses a single-scalar calibration against a single measured HTC figure. A two-scalar approach for fabric and ventilation was considered, but not pursued because one measurement does not provide enough information to identify those effects robustly and independently.
Instead, H3 solves for one shared heat-loss scalar, applied across both fabric and ventilation control points:
- Fabric control point: scaling relevant fabric and thermal-bridging inputs.
- Ventilation control point: scaling the aggregated ACH (Air Changes per Hour) term used in ventilation heat-loss calculations.
Calibration targets the same output-based HTC metric described above. The workflow runs in sequence: a baseline run, scalar calculation, a calibrated run, and then optional iterative refinement, with the comparison window held fixed throughout for consistency.
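The loop below sketches how such a workflow could look. It is an illustration under stated assumptions, not H3's solver: the ratio-based update, the 0.5 W/K tolerance, the iteration cap, and the toy model standing in for a HEM run are all hypothetical.

```python
from typing import Callable


def calibrate_scalar(
    model_htc: Callable[[float], float],
    measured_htc: float,
    tol_w_per_k: float = 0.5,
    max_iter: int = 10,
) -> float:
    """Find one shared heat-loss scalar so that the modelled output-based
    HTC matches the measured HTC to within tol_w_per_k."""
    scalar = 1.0
    modelled = model_htc(scalar)  # baseline run
    for _ in range(max_iter):
        if abs(modelled - measured_htc) <= tol_w_per_k:
            break
        scalar *= measured_htc / modelled  # scalar calculation
        modelled = model_htc(scalar)  # calibrated run / refinement
    return scalar


def toy_model(scalar: float) -> float:
    """Stand-in for a HEM run plus HTC derivation (illustrative numbers).
    Fabric and ventilation terms scale; a small fixed term does not."""
    fabric, ventilation, unscaled = 150.0, 40.0, 10.0
    return scalar * (fabric + ventilation) + unscaled


scalar = calibrate_scalar(toy_model, measured_htc=250.0)
print(f"scalar {scalar:.4f} -> modelled HTC {toy_model(scalar):.1f} W/K")
```

The refinement step matters whenever the modelled HTC does not respond exactly proportionally to the scalar: here the first ratio update lands at 247.5 W/K, and one further iteration brings the model within tolerance of the 250 W/K target.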
Fairly comparing measurement methods
The value of any measurement depends on two things: the boundary conditions under which it is generated, and the trade-offs each test design makes between disruption, duration, and confidence.
That means different methods can produce different HTC estimates for the same dwelling. H3 therefore encourages calibration inputs to carry structured metadata such as method, time window, provenance, and uncertainty, so that results remain auditable and comparable over time.
Not all of that metadata is used directly in optimisation today, but it still matters for traceability and for future uncertainty-aware calibration, especially where uncertainty is higher or not directly comparable across methods or providers.
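As a rough illustration of what such a record might contain, the sketch below models a measured HTC input as a small dataclass. The class name, field names, and example values are hypothetical and do not describe H3's actual input schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class MeasuredHTC:
    """Hypothetical calibration input: one measured HTC plus its context."""

    value_w_per_k: float
    method: str  # e.g. "co-heating", "QUB", "SMETER"
    window_start: date
    window_end: date
    provenance: str  # who produced the figure, and from what data
    uncertainty_w_per_k: Optional[float] = None  # not every method reports one

    def to_json(self) -> str:
        """Serialise to JSON with ISO-format dates for audit trails."""
        record = asdict(self)
        record["window_start"] = self.window_start.isoformat()
        record["window_end"] = self.window_end.isoformat()
        return json.dumps(record, sort_keys=True)


# Illustrative record only; the values are not real measurements.
example = MeasuredHTC(
    value_w_per_k=250.0,
    method="SMETER",
    window_start=date(2024, 1, 1),
    window_end=date(2024, 2, 1),
    provenance="example provider, illustrative record",
    uncertainty_w_per_k=25.0,
)
print(example.to_json())
```

Keeping uncertainty optional reflects the point above: a record can still be traceable when a method or provider does not report a directly comparable uncertainty figure.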
What comes next
Next, we want to make H3 easier to adopt, make calibration more sophisticated, and build a reusable evidence base for comparing methods, conventions, providers, and calibrated outputs. That means clearer standalone access and documentation, richer calibration objectives, and a structured artefact database for transparent QA and cross-method comparison.
Measured EPCs will only be credible at scale if model outputs can be compared with real-world performance in a transparent and repeatable way. H3 is our attempt to make that practical with HEM.
We plan to make H3 publicly available after a short pre-release period for validation and documentation.
If you work on EPC reform, measured HTC methods, or HEM-based assessment workflows, we would be very interested to hear from you.
