Do not trust stale XLSX cached formula values

An .xlsx file can contain formula text and a cached result from the last app that saved it. That cached result is convenient for previewing a workbook without recalculating it. It is not proof that the formula still returns that value.

For example, a stale cached XLSX value can differ from a fresh Excel recalculation even when the underlying formula is correct.

That distinction matters when you are evaluating @bilig/headless for a Node.js service, an agent tool, or a workbook automation job. A stale cache can make a correct engine look wrong. It can also make a wrong engine look correct.

Use two reports:

cache diagnostic: compare bilig with embedded XLSX cached values
Excel oracle: compare bilig with a workbook copy freshly recalculated and saved by Microsoft Excel

Only the second report is an accuracy verdict.

The rule

Do not call something a Bilig accuracy bug unless the expected value came from a fresh recalculation oracle.

For this harness, the preferred oracle is Microsoft Excel:

Open the workbook.
Force a full recalculation.
Save a recalculated copy into a local output folder.
Compare Bilig output with that recalculated copy’s formula results.

OpenPyXL, SheetJS, and similar libraries are useful for extracting formulas and cached values. They are not used here as formula-calculation oracles.

If Excel is unavailable, the harness marks cells as missing_excel_oracle. Cache-only mismatches stay diagnostic.

Why stale caches happen

XLSX cached formula values can drift for ordinary reasons:

a workbook was saved before dependent cells changed
formulas were edited by a tool that did not recalculate
the workbook uses manual calculation mode
an external-link value was saved into the file
an app wrote formula text but left an old cached result behind
Excel rewrote a function into an unsupported UDF wrapper during recalculation

The last case is subtle. If the recalculated workbook no longer contains the same formula meaning, the harness does not use that cell as an oracle. It is reported as missing_excel_oracle instead of being promoted into a fake correctness bug.

Run the Excel oracle harness

The Excel oracle harness belongs in a workspace package, not as a root-level one-off script. Use the packaged harness once it is installed in this repo.

The commands preserve your original files. All derived workbooks and reports go under the output folder.

The output is deliberately split:

cache-diagnostic.json: Bilig compared with embedded XLSX cached values; useful, but non-authoritative
excel-oracle-report.json: Bilig compared with fresh Excel-recalculated formula results
summary.md: human-readable counts and sanitized true-mismatch samples
github-issues/: optional sanitized issue drafts, written only for true Excel-oracle mismatches

What the summary tells you

The summary separates import, execution, cache freshness, and accuracy:

total workbooks evaluated
import/parser failures
timeout failures
total formula cells
comparable formula cells
Bilig vs fresh Excel match rate
embedded-cache freshness rate
stale-cache false positives
real Bilig mismatches
top formula/function families for true mismatches
sanitized formula samples with expected Excel value and actual Bilig value

Bilig vs fresh Excel match rate is the primary accuracy metric. Embedded-cache freshness rate is a cache-quality metric.

What counts as a real mismatch

A correctness issue needs all of this:

original formula text
fresh Excel expected value
Bilig actual value
workbook/cell repro notes that can be shared safely

The harness only writes github-issues/ drafts for those true Excel-oracle mismatches. File paths, customer names, organization identifiers, and private metadata are redacted from GitHub-ready output.

Cache diagnostic still has value

The older corpus checker is still useful when you understand its boundary:

pnpm workpaper:xlsx-corpus:check -- /path/to/workbooks

It answers this narrower question:

For formula cells that have embedded cached values, does Bilig currently calculate the same values?

That is helpful for triage and regression reduction. It is not enough to claim Excel accuracy.

Public Corpus Timing Budgets

The public workbook corpus verifier records elapsedMs, phaseTimings, and isolated worker peakRssBytes on each scorecard case. Use those fields to keep slow real workbooks visible in the JSON artifact, not only in a progress log.

The regression budget for workbook-364f955dd990c3d4 (command-manning-summary-as-of-21-mar-2025.xlsx, 394 KB, 60,738 cells, 2,219 formula cells) is 30 seconds for the current headless verification path. Enable the optional focused test with:

BILIG_COMMAND_MANNING_MANIFEST=/path/to/manifest-business-recent.json \
BILIG_COMMAND_MANNING_CACHE_DIR=/path/to/recent-workbook-corpus \
pnpm exec vitest run scripts/__tests__/public-workbook-corpus.test.ts -t command-manning

The scorecard phase split identifies whether time is spent in cache reads, footprint inspection, XLSX import, formula oracle comparison, round-trip, or structural smoke work before changing runtime code.

Recent Complex Public Corpus

The 2025-2026 recent-complex lane tracks public workbooks separately from the checked-in reduction corpus:

pnpm public-workbook-corpus:recent-complex:plan
pnpm public-workbook-corpus:discover-recent-complex-github
pnpm public-workbook-corpus:discover-recent-complex-zenodo
pnpm public-workbook-corpus:discover-recent-complex-figshare
pnpm public-workbook-corpus:fetch-recent-complex
pnpm public-workbook-corpus:verify-recent-complex
pnpm public-workbook-corpus:headless-recent-complex

The default CKAN discovery set includes national and regional open-data portals that have produced qualifying recent workbook evidence, including Ontario, Alberta, British Columbia, and HDX, alongside the broader GitHub and Zenodo discovery lanes. The Figshare lane uses public article search and article file metadata, requires usable license evidence, and prioritizes result/analysis/model queries before broad .xlsx searches.

Latest local evidence from May 20, 2026 UTC, using local source @bilig/headless 0.37.2 rebased on origin/main commit 969ecd119:

{
  "targetWorkbookCount": 500,
  "headlessPackageVersion": "0.37.2",
  "sourceCommit": "969ecd119",
  "manifestSourceCount": 6210,
  "manifestArtifactCount": 4531,
  "publicScorecardCaseCount": 4531,
  "publicPassingRecentComplexCount": 533,
  "headlessFileCount": 500,
  "headlessOkFileCount": 500,
  "headlessComparableFormulaFileCount": 500,
  "endToEndPassingWorkbookCount": 500,
  "remainingToTarget": 0,
  "failedErrors": 0,
  "failedTimeouts": 0,
  "formulaCells": 428311,
  "comparableFormulaCells": 427264,
  "matchingFormulaCells": 427264,
  "mismatchedFormulaCells": 0,
  "skippedFormulaCells": 1047,
  "elapsedMs": 439179.77
}

The end-to-end count intentionally requires at least one comparable headless formula cell per selected workbook. Workbooks that only produce stale-cache-risk formula audit evidence without comparable headless formulas remain useful compatibility signals, but they do not count toward the 500-workbook target. The verifier also records worksheet formulas found by bilig’s XLSX formula audit when SheetJS drops empty-cache formula cells, so those files are reported as skipped formula coverage instead of being misread as formula-free workbooks.

Checked-in fixture result

The current checked-in reduction corpus returns:

{
  "summary": {
    "totalFiles": 1,
    "filesProcessed": 1,
    "failedErrors": 0,
    "failedTimeouts": 0,
    "formulaCells": 14,
    "comparableFormulaCells": 14,
    "matchingFormulaCells": 14,
    "mismatchedFormulaCells": 0,
    "ok": 1,
    "skippedFormulaCells": 0,
    "matchRate": 1
  },
  "mismatches": []
}

Read that as cache-diagnostic evidence. It says Bilig matches the fixture’s embedded cached values. It does not replace the Excel oracle harness above.

Put it in CI

For a service that depends on workbook logic, keep a small private corpus in the repo or in a CI-only fixture bundle:

pnpm workpaper:xlsx-corpus:check -- ./fixtures/workbooks

Five to ten representative workbooks are usually better than hundreds of files nobody understands. Include at least one workbook for every formula family your service depends on.

Turn a miss into a contribution

Formula compatibility work is one of the easiest first contributions because a good mismatch already gives you the formula, the expected value, the actual value, and a small repro path.

Useful links: