Status: public benchmark explainer for @bilig/headless
This page is the short, shareable version of the WorkPaper benchmark claim. It turns the checked-in artifact into a plain-English evaluation guide without inflating what the benchmark can prove.
The current checked-in WorkPaper-vs-HyperFormula artifact records 46/46 WorkPaper mean-latency wins on scorecard-eligible comparable workloads:
| Lane | Comparable Workloads | WorkPaper Mean Wins | HyperFormula Mean Wins |
|---|---|---|---|
| Overall | 46 | 46 | 0 |
| Public | 38 | 38 | 0 |
| Holdout | 8 | 8 | 0 |
The artifact is
packages/benchmarks/baselines/workpaper-vs-hyperformula.json,
generated at 2026-05-06T14:54:57.091Z.
The overall directional mean-ratio geomean is 0.521767150331573, and the
overall directional p95-ratio geomean is 0.5359737705859149. Ratios below
1.0 mean WorkPaper is faster on that metric.
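For readers who want to sanity-check what a directional ratio geomean means, here is a minimal sketch of the arithmetic. The per-workload ratios below are invented for illustration; only the formula reflects how a geomean is normally computed, not the benchmark harness itself.

```ts
// Sketch of how a directional mean-ratio geomean is formed from per-workload
// ratios (WorkPaper mean latency divided by HyperFormula mean latency).
// Ratios below 1.0 favor WorkPaper. These numbers are illustrative only and
// are not taken from the checked-in artifact.
const meanRatios = [0.48, 0.55, 0.51];

const geomean = Math.exp(
  meanRatios.reduce((sum, ratio) => sum + Math.log(ratio), 0) / meanRatios.length,
);

console.log(geomean.toFixed(2)); // "0.51" for these made-up ratios
```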
The benchmark proves that the checked-in WorkPaper runtime is faster on mean latency across the current scorecard of directly comparable headless spreadsheet-engine workloads.
The covered families include workbook build and rebuild paths, runtime restore from snapshot, sheet lifecycle, named expressions, dirty execution, batch edits, structural row and column edits, range reads, aggregations, conditional aggregation, and lookup workloads.
It also proves that the benchmark claim is auditable from the repository. The expected scorecard shape is checked by:
pnpm workpaper:bench:competitive:check
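For a manual spot-check of the artifact itself, a few lines of Node are enough. This is only a sketch: the field names `comparisons` and `meanWinner` are assumptions made for illustration, not the artifact's documented schema, and the check command above remains the supported path.

```ts
// Sketch: recount WorkPaper mean wins straight from the checked-in artifact.
// The property names (comparisons, meanWinner) are assumed for illustration;
// read the JSON itself for the real schema.
import { readFileSync } from "node:fs";

const artifact = JSON.parse(
  readFileSync(
    "packages/benchmarks/baselines/workpaper-vs-hyperformula.json",
    "utf8",
  ),
);

const rows: Array<{ meanWinner?: string }> = artifact.comparisons ?? [];
const workpaperMeanWins = rows.filter((row) => row.meanWinner === "workpaper").length;

console.log(`${workpaperMeanWins}/${rows.length} WorkPaper mean wins in the artifact`);
```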
It does not prove that bilig is a complete Excel clone.
It does not prove full formula parity with Excel, Google Sheets, or HyperFormula.
It does not prove that every p95 row is faster. The known p95 holdout is
lookup-approximate-duplicates, where the current WorkPaper-to-HyperFormula
p95 ratio is 1.043096403103571. The honest claim is 46/46 mean wins plus an
overall p95 geomean lead, not “faster on every p95 row.”
It does not prove that browser-grid rendering, import/export, collaboration, or every user workload is faster. This benchmark is about the headless WorkPaper runtime path.
It does not prove future results. If the artifact is regenerated and the scorecard changes, the public claim must change with it.
Mean latency answers: “what is the normal cost of this workload?”
p95 latency answers: “what happens near the slow end of this workload’s sample set?”
A workload can win on mean while losing one p95 row when a small number of slower samples move the tail. That is why bilig keeps both the headline mean claim and the p95 caveat visible.
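If the distinction feels abstract, computing both statistics over one invented sample set makes it concrete: two slow runs barely move the mean but set the p95.

```ts
// Illustration of why a workload can win on mean while a p95 row slips:
// a couple of slow samples dominate the tail. Values are invented, in ms.
const samplesMs = [4.1, 4.2, 4.0, 4.3, 4.1, 4.2, 4.0, 4.1, 9.5, 10.2];

const mean = samplesMs.reduce((sum, v) => sum + v, 0) / samplesMs.length;

const sorted = [...samplesMs].sort((a, b) => a - b);
const p95 = sorted[Math.min(sorted.length - 1, Math.ceil(0.95 * sorted.length) - 1)];

console.log({ meanMs: mean.toFixed(2), p95Ms: p95 }); // mean ≈ 5.27, p95 = 10.2
```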
For the benchmark evidence, start with docs/headless-workpaper-benchmark-evidence.md and packages/benchmarks/baselines/workpaper-vs-hyperformula.json. For the API surface, run the published package or maintained example:
pnpm add @bilig/headless
cd examples/headless-workpaper
npm install
npm start
Short:
bilig’s WorkPaper benchmark currently records
46/46 mean wins against HyperFormula-style headless workloads, with the p95 caveat documented instead of hidden.
Reply-sized:
the useful part is the audit trail: a checked-in benchmark artifact, a verify command, and an explicit p95 caveat. the claim is
46/46 mean wins for the current comparable headless WorkPaper workloads, not “we are faster at everything.”