OfferWise is built on a proprietary ML pipeline trained on real California disclosure findings. Here's what's in the corpus, how the models work, and what they can and can't do.
Every prediction OfferWise makes leans on a labeled training corpus. We didn't scrape a generic real estate dataset — we built ours from the ground up, focused on the California buyer.
The corpus combines three data sources:
We do not buy or scrape inspection reports from any third-party provider. Findings in the corpus come from publicly available documents, code-enforcement records, and user-uploaded reports analyzed with the user's permission.
An OfferWise analysis is not a single model. It's a pipeline of specialized models, each doing one job well. Their outputs are then synthesized into the buyer report.
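To make the pipeline idea concrete, here is a minimal sketch of stages run in sequence, with a final step that combines their outputs. It is an illustration only, not OfferWise's actual code: the stage names (`extract_findings`, `score_findings`, `synthesize`), the keyword rule, and the sample text are all hypothetical stand-ins for trained models and real documents.

```python
from typing import Callable, Dict, List, Tuple

# Each stage reads the shared state and returns its own output.
Stage = Callable[[Dict[str, object]], object]


def extract_findings(state: Dict[str, object]) -> List[str]:
    # Hypothetical stand-in for a trained extraction model: keep non-empty lines.
    text = str(state["text"])
    return [line.strip() for line in text.splitlines() if line.strip()]


def score_findings(state: Dict[str, object]) -> List[Tuple[str, int]]:
    # Hypothetical stand-in for a trained severity model: a toy keyword rule.
    findings = state["extract"]
    return [(f, 2 if "leak" in f.lower() else 1) for f in findings]


def synthesize(state: Dict[str, object]) -> str:
    # Final step: combine the earlier outputs into one summary string.
    scored = state["score"]
    if not scored:
        return "no findings"
    worst = max(scored, key=lambda pair: pair[1])
    return f"{len(scored)} findings; highest severity: {worst[0]}"


def run_pipeline(text: str, stages: List[Tuple[str, Stage]]) -> Dict[str, object]:
    state: Dict[str, object] = {"text": text}
    for name, stage in stages:
        state[name] = stage(state)  # later stages can read earlier outputs
    return state


STAGES = [
    ("extract", extract_findings),
    ("score", score_findings),
    ("report", synthesize),
]

# Made-up sample input, for illustration only.
result = run_pipeline("Roof leak noted in 2019\nWater heater replaced", STAGES)
print(result["report"])
```

Keeping each stage behind the same small interface means any one of them can be retrained, swapped, or tested without touching the others.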
Each model is trained separately and evaluated against held-out portions of the corpus. We don't ask a single language model to do everything — and we don't trust a single language model with any of it.
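Held-out evaluation means setting part of the labeled data aside before training and scoring the model only on that part. The sketch below shows the general technique under simple assumptions; the helper names (`split_holdout`, `accuracy`), the 20% holdout fraction, and the toy examples are hypothetical and do not describe OfferWise's real evaluation setup.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (finding text, label)


def split_holdout(
    examples: Sequence[Example], holdout_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Example], List[Example]]:
    # Shuffle a copy with a fixed seed so the split is reproducible.
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    # The first n_holdout items are held out; the rest are for training.
    return shuffled[n_holdout:], shuffled[:n_holdout]


def accuracy(predict: Callable[[str], str], held_out: Sequence[Example]) -> float:
    # Fraction of held-out examples the model labels correctly.
    correct = sum(1 for text, label in held_out if predict(text) == label)
    return correct / len(held_out)


# Made-up labeled examples, for illustration only.
EXAMPLES: List[Example] = [
    ("roof leak near chimney", "water"),
    ("slab leak under kitchen", "water"),
    ("cracked window pane", "other"),
    ("unpermitted garage conversion", "other"),
    ("leak at water heater valve", "water"),
]

train_set, held_out_set = split_holdout(EXAMPLES)


def toy_model(text: str) -> str:
    # A real model would be fit on train_set; this fixed rule is a placeholder.
    return "water" if "leak" in text else "other"


score = accuracy(toy_model, held_out_set)
```

Because the held-out examples never reach training, the resulting score estimates how the model behaves on documents it has not seen.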
These are the numbers from our most recent training cycle, measured on held-out data:
The numbers we are most proud of, and the numbers we are honest about:
We retrain the models on a rolling schedule. As more properties are analyzed, the training corpus grows, and we expect accuracy to improve with it. The numbers above reflect the most recent training cycle.
OfferWise is a strong tool for what it does. We try to be clear about what it doesn't do.