We did not build a test prep platform. We built a measurement system — one that begins with alignment, sharpens through item analysis, and ultimately learns to predict whether you will walk out of that exam with a passing score.
This is not a product roadmap. This is our methodology — each phase building on the last, each one impossible without what came before.
Every question is mapped directly to the BACB Test Content Outline, 6th Edition. The foundation upon which everything else is built.
As response data accumulates, we analyze each question's difficulty index, discrimination index, and distractor quality. The weak are refined. The strong remain.
Machine learning models trained on user behavior will aim to answer the question that matters most: Are you ready for exam day?
The First Question Every Test Must Answer: Is It Valid?
Before a single question reaches our platform, it must answer one question of its own: does it actually measure what it claims to measure?
Content validity evidence is established by inspecting test questions to confirm they correspond to what users should know. But a test can touch every domain and still fail — if it over-samples easy content and ignores high-weight areas, it looks valid while measuring something else entirely.
That is why we went further than alignment. Every question on our platform is mapped to a specific task list item in the BACB Test Content Outline, 6th Edition — and the proportion of questions in each domain mirrors the actual exam weighting. This is what measurement specialists call content representativeness: not just covering the right topics, but sampling them in the right distribution.
The result is a table of specifications that maps directly to the BACB Test Content Outline, 6th Edition blueprint.
We have opened our doors. Three distinct mock exams — each proportioned to the TCO — are live. With every response submitted, we accumulate the raw material of something far more powerful. We are listening. We are waiting for the data to speak.
Every question is mapped to a specific task list item across all domains — from Behaviorism and Philosophical Foundations through Selecting and Implementing Interventions.
The number of questions per domain in each mock exam reflects the actual weighting of the BCBA exam — what measurement specialists call a table of specifications. This is content representativeness, not just content coverage.
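The allocation logic behind a table of specifications is simple to sketch. The Python below uses placeholder domain labels and weights (the real TCO percentages are deliberately not reproduced here) and largest-remainder rounding so the per-domain counts always sum to the exam length:

```python
# Sketch: allocating mock-exam items per domain from blueprint weights.
# The domain labels and weights below are illustrative placeholders,
# NOT the actual BACB Test Content Outline percentages.

def table_of_specifications(weights, total_items):
    """Allocate items proportionally using the largest-remainder method."""
    raw = {d: w * total_items for d, w in weights.items()}
    counts = {d: int(r) for d, r in raw.items()}
    leftover = total_items - sum(counts.values())
    # Give each remaining item to the domain with the largest fractional part.
    by_fraction = sorted(raw, key=lambda d: raw[d] - counts[d], reverse=True)
    for d in by_fraction[:leftover]:
        counts[d] += 1
    return counts

weights = {"A": 0.09, "B": 0.13, "C": 0.20, "D": 0.24, "E": 0.34}  # placeholders
blueprint = table_of_specifications(weights, 75)
```

Largest-remainder rounding matters here: naive rounding of each domain independently can produce an exam that is one or two items short or long, which quietly breaks content representativeness.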
We cast a wide net. Different question pools, each proportioned to the TCO. Multiple attempts to expose knowledge gaps — and to generate the response data our item analysis will need.
Each question is reviewed before it reaches our platform, with each item matched to the objective it is meant to measure.
“Content validity evidence is more a minimum requirement for a useful test than it is a guarantee of a good test. A test can sometimes look valid but measure something entirely different — guessing ability, reading level, or skills acquired before instruction.”
This is why content validity is only where we begin. Our platform is open. Users are studying. And every response is a data point in an experiment that will reveal which questions are doing their job — and which ones are not.
When the Data Finally Speaks
Once sufficient response data accumulates, a question stops being just a question. It becomes a subject of scientific scrutiny. Quantitative item analysis allows us to inspect the performance of every item — not through intuition, but through the unambiguous language of statistics.
We use three measures that determine whether a question earns its place on our platform: difficulty, discrimination, and distractor quality. Together, they tell the complete story of an item's quality.
Difficulty index (p): the proportion of users who answered correctly. Not a reflection of how the question was written, but of how users actually perform. An item too easy teaches nothing. An item too hard measures nothing. The optimal range is the middle ground, where discrimination power is maximized.
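That proportion is trivial to compute once responses are scored. A minimal sketch, assuming each response is stored as 1 (correct) or 0 (incorrect), with made-up data:

```python
# Sketch: difficulty index for a single item. The response vector is
# illustrative, not real platform data.

def difficulty_index(responses):
    """p = proportion of users who answered the item correctly."""
    return sum(responses) / len(responses)

p = difficulty_index([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])  # p = 0.7
```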
Discrimination index (D): does this question separate users who understand the material from those who do not? We divide users into upper and lower halves by total score, then measure how much more likely the upper group is to answer the item correctly. A positively discriminating item is a precision instrument. A negatively discriminating item is a liability.
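The upper/lower comparison can be sketched with a simple median split. The (total_score, item_correct) pairs below are made up, and this schema is illustrative rather than our actual data model:

```python
def discrimination_index(records):
    """
    D = p_upper - p_lower after splitting users into halves by total score.
    `records` is a list of (total_score, item_correct) pairs.
    """
    ranked = sorted(records, key=lambda pair: pair[0])
    half = len(ranked) // 2
    lower, upper = ranked[:half], ranked[-half:]
    p_lower = sum(correct for _, correct in lower) / half
    p_upper = sum(correct for _, correct in upper) / half
    return p_upper - p_lower

records = [(90, 1), (85, 1), (80, 1), (75, 0),
           (40, 0), (35, 1), (30, 0), (25, 0)]
d = discrimination_index(records)  # D = 0.5: the upper group is far more likely correct
```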
Distractor quality: a well-written wrong answer is one of the most sophisticated tools in assessment design. It should attract users who do not know the material and leave the prepared untempted. When no one chooses a distractor, it is invisible. When too many high-scorers choose it, something is broken. Item analysis reveals both.
| Observation | Signal | Action |
|---|---|---|
| No one selected the distractor | Too obviously wrong — the option is invisible to users | Revise or replace |
| Upper group favors a distractor over the key | Possible misidentification — correct answer may be wrong | Review answer key immediately |
| Responses spread equally across all options | Guessing — content likely not covered in instruction | Eliminate or remap to TCO |
| Upper group splits equally between key and one distractor | Ambiguity — two defensible answers exist | Rewrite stem or revise options |
| More lower-group users choose each distractor | Distractors functioning correctly | No action needed |
Keep: D > 0, p in the optimal range, all distractors attracting more lower-group than upper-group users. The item is earning its place in the question bank.
Revise: weak distractors, low but positive D, or an ambiguous stem. The item has potential; it needs sharpening, not removal.
Retire: negative D means users who know the material overall are getting this item wrong more often than those who don't. This item is actively undermining the test. It must go.
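The observations in the table above can be checked mechanically. A minimal sketch, assuming responses are stored as (total_score, chosen_option) pairs (an illustrative schema, with made-up data):

```python
from collections import Counter

def distractor_counts(responses, key):
    """
    Split users into upper/lower halves by total score, then count how often
    each option was chosen by each group. Healthy pattern: the key attracts
    the upper group; every distractor attracts more lower-group users.
    """
    ranked = sorted(responses, key=lambda pair: pair[0])
    half = len(ranked) // 2
    lower = Counter(option for _, option in ranked[:half])
    upper = Counter(option for _, option in ranked[-half:])
    options = sorted(set(lower) | set(upper) | {key})
    return {opt: {"upper": upper[opt], "lower": lower[opt]} for opt in options}

# Made-up responses; "B" is the keyed answer.
responses = [(95, "B"), (90, "B"), (85, "B"), (80, "C"),
             (40, "C"), (35, "D"), (30, "A"), (25, "C")]
table = distractor_counts(responses, key="B")
```

Here every distractor draws more lower-group than upper-group users and the key draws only the upper group, so each row of the observation table can be read straight off the counts.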
“Item analysis is useful but limited. It points out items that have problems but doesn't tell us what the problems are. To do a thorough job, one must use a combination of quantitative and qualitative analysis — and not rely solely on one or the other.”
When the items are refined and the questions are trustworthy, a new question emerges — one that no single score can answer. Not how did you do? But will you pass? That is where phase three begins.
The Answer to the Question That Actually Matters
Scores tell you where you have been. Predictions give you signal about where you are going.
When we have accumulated enough validated, item-analyzed response data, we will train machine learning models on the features we have curated in our platform.
The goal is not to tell you your score. The goal is to tell you — with statistical confidence — whether you are ready to sit for the BCBA® exam. And if you are not, to show you exactly where to focus.
Every exam you take, every question you answer, every distractor that tempts you away from the correct response — all of it is signal. We are building the model that will learn to read it.
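The model itself is still ahead of us, so the sketch below is purely illustrative: a logistic classifier mapping per-user features to a pass probability. Every feature name, weight, and the intercept here is hypothetical, invented for the example rather than taken from our platform or any real training run.

```python
import math

# HYPOTHETICAL weights and intercept, made up for illustration only.
# A real model would learn these from validated response data.
HYPOTHETICAL_WEIGHTS = {
    "overall_accuracy": 4.0,
    "weak_domain_accuracy": 2.5,
    "mock_exams_completed": 0.3,
}
BIAS = -4.0  # made-up intercept

def readiness_probability(features):
    """Logistic model: p(pass) = sigmoid(w . x + b)."""
    z = BIAS + sum(HYPOTHETICAL_WEIGHTS[name] * value
                   for name, value in features.items())
    return 1 / (1 + math.exp(-z))

p = readiness_probability({
    "overall_accuracy": 0.82,
    "weak_domain_accuracy": 0.70,
    "mock_exams_completed": 3,
})  # ~0.87 under these made-up weights
```

The point of the sketch is the shape of the output: not a raw score, but a calibrated probability that can be paired with per-feature attributions (as in a SHAP beeswarm) to show a user exactly where to focus.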
SHAP beeswarm — projected ML model output · illustrative
Every platform that promises to prepare you for an exam is making a claim. We are in the business of proving ours. We began with alignment — every question mapped to the BACB Test Content Outline. We are collecting the data. And when the statistical weight of user responses reaches critical mass, we will do what few platforms dare — subject every question to the scrutiny of item analysis and let the data decide which ones survive.
The people using this platform today are not just studying. They are contributing to the scientific foundation upon which the next generation of exam intelligence will be built. Every response matters. Every exam taken brings us closer to the answer that matters most: not “how did you score?” but “are you ready?”
We use the BCBA Test Content Outline as our reference for the domains.
©Behavior Analyst Certification Board®. All rights reserved. Reprinted and/or displayed by permission granted in 2025. The most current version of this document is available at www.BACB.com. Contact the BACB for permission to reprint and/or display this material.