How Soultrace Works: A Technical Deep-Dive (v3.0)
Most personality tests are glorified surveys. Fixed questions, fixed order, dump answers into a spreadsheet, calculate percentages. Boring. Statistically lazy.
Soultrace uses a two-stage latent trait model with adaptive question selection. We infer underlying psychological traits from your answers, then compute your color profile as a learned transformation of those traits. This post explains the actual math and code behind it.
The Problem with Direct Classification
Traditional personality tests (and our v2.x system) update archetype probabilities directly from answers. This has issues:
- No principled trait modeling: Colors are computed from raw answer patterns, not grounded psychological constructs
- Response style bias: Someone who always answers "strongly agree" gets different results than someone who answers "agree" - even if they have the same underlying personality
- Calibration nightmare: You need to calibrate likelihood tables for every question × color × score combination
Our solution: a latent trait layer that mediates between answers and colors.
The Architecture
```
Answers → Trait Updates (Bayesian) → Weight Matrix × Traits → Softmax → Colors
                   ↑
            ERS Conditioning
```
Instead of updating 5 color probabilities directly, we update 8 psychological trait probabilities. Colors emerge as a learned function of traits.
The Eight Latent Traits
We model eight binary traits, each representing a psychologically grounded dimension with established research backing:
| Trait | Cluster | Description | Key Research |
|---|---|---|---|
| Conscientiousness | White | Organization, dependability, self-discipline | Costa & McCrae (1992) |
| Need for Cognition | Blue | Enjoying effortful cognitive activity | Cacioppo & Petty (1982) |
| Analytical Thinking | Blue | Preference for systematic reasoning | Frederick (2005) |
| Agency Motivation | Black | Drive for achievement and power | Bakan (1966); Wiggins (1991) |
| Promotion Focus | Black | Orientation toward gains and aspirations | Higgins (1997) |
| Sensation Seeking | Red | Need for novel, intense experiences | Zuckerman (1979) |
| Emotional Expressivity | Red | Comfort with displaying emotions | Kring et al. (1994) |
| Communion Motivation | Green | Drive for connection and belonging | Bakan (1966) |
Each trait is modeled as P(trait = true) ∈ [0, 1], starting at 0.5 (maximum uncertainty).
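In code, the prior state might look like the sketch below. The type and function names here are illustrative, not necessarily the production API:

```ts
type TraitId =
  | 'conscientiousness' | 'needForCognition' | 'analyticalThinking'
  | 'agencyMotivation' | 'promotionFocus' | 'sensationSeeking'
  | 'emotionalExpressivity' | 'communionMotivation'

const TRAIT_IDS: TraitId[] = [
  'conscientiousness', 'needForCognition', 'analyticalThinking',
  'agencyMotivation', 'promotionFocus', 'sensationSeeking',
  'emotionalExpressivity', 'communionMotivation',
]

interface TraitState {
  traits: Record<TraitId, number> // P(trait = true) for each trait
  ers: number                     // P(extreme response style), see Step 2
}

function createInitialTraitState(): TraitState {
  const traits = Object.fromEntries(
    TRAIT_IDS.map(id => [id, 0.5]) // 0.5 = maximum uncertainty
  ) as Record<TraitId, number>
  return { traits, ers: 0.5 }
}
```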
Step 1: Bayesian Trait Updates
Question-Trait Mappings
Each question specifies which traits it measures and how:
```ts
QUESTION_TRAIT_UPDATES[questionId] = {
  conscientiousness: 'POSITIVE',       // Agreement → trait likely true
  sensationSeeking: 'NEGATIVE',        // Disagreement → trait likely true
  analyticalThinking: 'WEAK_POSITIVE', // Modest positive signal
  // Unlisted traits use NON_INFORMATIVE (no update)
}
```
Templates define the likelihood P(score | trait = true):
| Template | Effect |
|---|---|
| POSITIVE | High scores (6,7) indicate trait presence |
| NEGATIVE | Low scores (1,2) indicate trait presence |
| WEAK_* | Same direction, half the update strength |
| NON_INFORMATIVE | Uniform - no information |
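The post doesn't print the actual likelihood tables, so here is a hypothetical POSITIVE template to make the idea concrete. The numbers below are invented for illustration (they sum to 1 and peak at high scores); a NEGATIVE template then follows by reversing the score axis:

```ts
type LikertScore = 1 | 2 | 3 | 4 | 5 | 6 | 7

// Illustrative P(score | trait = true) for a POSITIVE template.
// Values are made up; the real tables live in UPDATE_TEMPLATES.
const POSITIVE: Record<LikertScore, number> = {
  1: 0.02, 2: 0.04, 3: 0.08, 4: 0.14, 5: 0.20, 6: 0.26, 7: 0.26,
}

// NEGATIVE is POSITIVE with the score axis reversed (s ↔ 8 − s).
const NEGATIVE: Record<LikertScore, number> = Object.fromEntries(
  Object.entries(POSITIVE).map(([s, p]) => [8 - Number(s), p])
) as Record<LikertScore, number>
```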
Symmetric Likelihood
Here's the key insight: we derive P(score | trait = false) by reversing the score:

P(score | trait = false) = P(8 − score | trait = true)

so 1↔7, 2↔6, 3↔5, and crucially 4↔4. This ensures neutral answers (score = 4) produce zero posterior shift. If you're ambivalent, you don't bias the estimate.
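The reversal itself is a one-liner (assuming `LikertScore` is the 1–7 union type):

```ts
type LikertScore = 1 | 2 | 3 | 4 | 5 | 6 | 7

// 1↔7, 2↔6, 3↔5, 4↔4 — the midpoint maps to itself.
function reverseScore(s: LikertScore): LikertScore {
  return (8 - s) as LikertScore
}
```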
The Update
For each trait with a non-uniform template, we apply Bayes' rule:

P(trait | score) = P(score | trait=true) · P(trait) / [P(score | trait=true) · P(trait) + P(score | trait=false) · (1 − P(trait))]

In code:

```ts
function updateTrait(
  currentProb: number,
  templateName: UpdateTemplateName,
  answer: LikertScore,
  ersProb: number
): number {
  const template = UPDATE_TEMPLATES[templateName]
  // P(score | trait=true) - mixture conditioned on ERS
  const pScoreGivenTrue = getMixtureLikelihood(template, answer, ersProb)
  // P(score | trait=false) = P(reversed_score | trait=true)
  const reversedAnswer = reverseScore(answer)
  const pScoreGivenFalse = getMixtureLikelihood(template, reversedAnswer, ersProb)
  // Bayes update
  const pScore = pScoreGivenTrue * currentProb + pScoreGivenFalse * (1 - currentProb)
  return (pScoreGivenTrue * currentProb) / pScore
}
```
Step 2: Extreme Response Style (ERS) Conditioning
The Problem
Some people systematically choose extreme answers (1, 7). Others prefer moderate responses (3, 4, 5). This response style is independent of actual traits, but naive models conflate them.
The Solution
We model ERS as a separate binary latent variable, updated by every answer:
```ts
const ERS_LIKELIHOOD = {
  true:  { 1: 0.22, 2: 0.16, 3: 0.08, 4: 0.08, 5: 0.08, 6: 0.16, 7: 0.22 }, // Extremes
  false: { 1: 0.04, 2: 0.10, 3: 0.20, 4: 0.32, 5: 0.20, 6: 0.10, 7: 0.04 }, // Moderate
}
```
Each update template has two variants (extreme/moderate). The actual likelihood is a mixture weighted by the current ERS estimate:

P(score | trait) = P(ERS) · P_extreme(score | trait) + (1 − P(ERS)) · P_moderate(score | trait)

The extreme variant is flatter (less peaked), so extreme responders don't get inflated trait estimates just because they pick endpoints.
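That mixture is a few lines of code. The `UpdateTemplate` shape here is an assumption (one likelihood table per response style), as is the function name:

```ts
type LikertScore = 1 | 2 | 3 | 4 | 5 | 6 | 7

// Assumed template shape: one likelihood table per response style.
interface UpdateTemplate {
  extreme: Record<LikertScore, number>  // flatter
  moderate: Record<LikertScore, number> // more peaked
}

// P(score | trait = true), marginalized over the ERS variable.
function getMixtureLikelihood(
  template: UpdateTemplate,
  answer: LikertScore,
  ersProb: number
): number {
  return ersProb * template.extreme[answer] +
    (1 - ersProb) * template.moderate[answer]
}
```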
Update Order
1. Update P(ERS) based on answer extremity
2. Update all traits using the new P(ERS)
This cascading update means the first few answers calibrate our response style estimate, which then conditions all subsequent trait updates.
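Step 1 of the cascade is a plain Bayes update from the ERS likelihood table. A minimal illustrative version (`updateERS` is my name for it, not necessarily the codebase's):

```ts
const ERS_LIKELIHOOD: Record<'true' | 'false', Record<number, number>> = {
  true:  { 1: 0.22, 2: 0.16, 3: 0.08, 4: 0.08, 5: 0.08, 6: 0.16, 7: 0.22 },
  false: { 1: 0.04, 2: 0.10, 3: 0.20, 4: 0.32, 5: 0.20, 6: 0.10, 7: 0.04 },
}

// Bayes update of P(ERS) from the answer's extremity. The returned
// probability then conditions every trait update for this answer.
function updateERS(ersProb: number, answer: number): number {
  const pTrue = ERS_LIKELIHOOD.true[answer] * ersProb
  const pFalse = ERS_LIKELIHOOD.false[answer] * (1 - ersProb)
  return pTrue / (pTrue + pFalse)
}
```

An endpoint answer pushes P(ERS) up sharply, while a midpoint answer pulls it down, which is exactly the calibration behavior described above.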
Step 3: Trait-to-Color Transformation
The Weight Matrix
Colors are computed via a learned linear transformation followed by a softmax:

colors = softmax(W · traits + b)

where traits is the 8-vector of trait probabilities, b is a bias vector, and W is a 5×8 matrix:

```ts
const TRAIT_TO_COLOR_WEIGHTS = [
  // [consc, NFC,  analyt, agency, promo, sens, emot, comm]
  [ 0.9,   0.1,   0.3,  -0.2,  -0.1,  -0.4,  -0.2,   0.2], // White
  [ 0.3,   0.8,   0.7,   0.1,   0.1,  -0.2,   0.0,   0.0], // Blue
  [ 0.2,   0.2,   0.2,   0.75,  0.65,  0.2,  -0.1,  -0.4], // Black
  [-0.4,  -0.15, -0.3,   0.2,   0.3,   0.8,   0.7,  -0.1], // Red
  [ 0.2,   0.1,   0.0,  -0.4,  -0.2,  -0.2,   0.3,   0.9], // Green
]
```
Positive weights mean the trait increases that color. The bias vector ensures uniform colors when all traits are at 0.5.
```ts
function computeColors(state: TraitState): ColorDistribution {
  const traitVector = TRAIT_IDS.map(id => state.traits[id])
  const logits: number[] = []
  for (let c = 0; c < 5; c++) {
    let sum = TRAIT_TO_COLOR_BIAS[c]
    for (let t = 0; t < 8; t++) {
      sum += TRAIT_TO_COLOR_WEIGHTS[c][t] * traitVector[t]
    }
    logits.push(sum)
  }
  return softmax(logits)
}
```
Why This Works
The weight matrix encodes domain knowledge about how traits map to colors:
- High conscientiousness + low sensation seeking → White
- High NFC + high analytical → Blue
- High agency + high promotion focus → Black
- High sensation + high emotional expressivity → Red
- High communion + low agency → Green
The softmax ensures valid probabilities. The bias ensures calibration (all traits at 0.5 → 20% each color).
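The calibration claim is easy to verify. The post doesn't show the bias vector, but one plausible construction is b = −W · 0.5: then every logit is exactly zero at the all-0.5 neutral state, and the softmax is exactly uniform:

```ts
const TRAIT_TO_COLOR_WEIGHTS = [
  [ 0.9,  0.1,  0.3, -0.2, -0.1, -0.4, -0.2,  0.2], // White
  [ 0.3,  0.8,  0.7,  0.1,  0.1, -0.2,  0.0,  0.0], // Blue
  [ 0.2,  0.2,  0.2,  0.75, 0.65, 0.2, -0.1, -0.4], // Black
  [-0.4, -0.15,-0.3,  0.2,  0.3,  0.8,  0.7, -0.1], // Red
  [ 0.2,  0.1,  0.0, -0.4, -0.2, -0.2,  0.3,  0.9], // Green
]

// Assumed construction: b[c] = -0.5 * Σ_t W[c][t], so that
// W · (all-0.5 trait vector) + b = 0 for every color.
const TRAIT_TO_COLOR_BIAS = TRAIT_TO_COLOR_WEIGHTS.map(
  row => -0.5 * row.reduce((a, w) => a + w, 0)
)

// Numerically stable softmax (subtract the max before exponentiating).
function softmax(logits: number[]): number[] {
  const m = Math.max(...logits)
  const exps = logits.map(x => Math.exp(x - m))
  const z = exps.reduce((a, b) => a + b, 0)
  return exps.map(e => e / z)
}

const neutral = Array(8).fill(0.5)
const logits = TRAIT_TO_COLOR_WEIGHTS.map(
  (row, c) => TRAIT_TO_COLOR_BIAS[c] + row.reduce((s, w, t) => s + w * neutral[t], 0)
)
const colors = softmax(logits) // each entry ≈ 0.2
```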
Step 4: Adaptive Question Selection
Information Gain
We select questions to maximize expected information gain across traits. For each candidate question:
```ts
function computeInformationGain(questionId: number, state: TraitState): number {
  const updates = QUESTION_TRAIT_UPDATES[questionId] ?? {}
  let totalIG = 0
  for (const [traitId, templateName] of Object.entries(updates)) {
    const currentProb = state.traits[traitId]
    // Current entropy
    const hBefore = binaryEntropy(currentProb)
    // Expected entropy after observing answer
    let hAfterExpected = 0
    for (let s = 1; s <= 7; s++) {
      const pScore = predictScoreProbability(s, traitId, templateName, state)
      const pPosterior = bayesianUpdate(currentProb, s, templateName)
      hAfterExpected += pScore * binaryEntropy(pPosterior)
    }
    // Weight by template strength
    const weight = templateName.startsWith('WEAK_') ? 0.5 : 1.0
    totalIG += weight * (hBefore - hAfterExpected)
  }
  return totalIG
}
```
High information gain = asking this question will reduce trait uncertainty significantly.
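`binaryEntropy` is the standard Bernoulli entropy in bits:

```ts
// H(p) for a Bernoulli variable, in bits: 0 at certainty, 1 at p = 0.5.
function binaryEntropy(p: number): number {
  if (p <= 0 || p >= 1) return 0
  return -(p * Math.log2(p) + (1 - p) * Math.log2(1 - p))
}
```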
Coverage Bonus
To ensure all traits get measured, we add a bonus for under-probed traits:

bonus(q) = Σ over traits measured by q of 1 / (1 + count[trait])

where count[trait] is how many questions have already measured this trait.
Combined Score
The final score blends the two terms:

score(q) = IG(q) + β · bonus(q)

With β = 0.3, we favor informative questions while ensuring coverage.
Softmax Sampling
Questions are selected via temperature-controlled softmax over the combined scores:

P(q) = exp(score(q) / T) / Σ over q' of exp(score(q') / T)

With T = 0.2, the low temperature favors high-scoring questions while retaining some stochasticity.
```ts
function selectNextQuestion(
  traitState: TraitState,
  selectionState: QuestionSelectionState,
  allQuestionIds: number[]
): number | null {
  const scored = scoreQuestions(traitState, selectionState, allQuestionIds)
  // Sample from softmax distribution
  const r = Math.random()
  let cumulative = 0
  for (const q of scored) {
    cumulative += q.probability
    if (r <= cumulative) {
      return q.questionId
    }
  }
  return scored[scored.length - 1].questionId
}
```
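The `probability` values sampled above come from the temperature softmax. A minimal standalone version (the name `softmaxWithTemperature` is mine):

```ts
// Softmax with temperature T. Max-subtraction keeps exp() from overflowing;
// low T concentrates probability on the top-scoring entries.
function softmaxWithTemperature(scores: number[], T: number): number[] {
  const m = Math.max(...scores)
  const exps = scores.map(s => Math.exp((s - m) / T))
  const z = exps.reduce((a, b) => a + b, 0)
  return exps.map(e => e / z)
}
```

At T = 0.2 the distribution is sharply peaked on the best question but never deterministic, which keeps repeated assessments from feeling identical.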
The Complete Flow
```ts
async function runAssessment(questionPool: Question[]): Promise<ColorDistribution> {
  let traitState = createInitialTraitState() // All traits at 0.5, ERS at 0.5
  let selectionState = createQuestionSelectionState()
  const allQuestionIds = questionPool.map(q => q.id)
  for (let i = 0; i < MAX_QUESTIONS; i++) {
    // Select next question (first one at random to avoid cold-start bias)
    const questionId = i === 0
      ? randomChoice(allQuestionIds)
      : selectNextQuestion(traitState, selectionState, allQuestionIds)
    if (questionId === null) break // pool exhausted
    // Get answer (1-7)
    const answer = await presentQuestion(questionId)
    // Update trait state (ERS first, then all traits)
    traitState = updateOnAnswer(traitState, questionId, answer)
    // Track coverage
    selectionState = recordAskedQuestion(selectionState, questionId)
  }
  // Compute final colors from traits
  return computeColors(traitState)
}
```
Why This Actually Works
Psychologically Grounded
Unlike opaque "color likelihoods," our traits have established psychological foundations. Need for Cognition, Agency/Communion, Regulatory Focus - these are real constructs with decades of research. This makes the model interpretable and testable.
Response Style Invariant
ERS conditioning means someone who always picks "strongly agree" gets similar trait estimates to someone who picks "agree" - only the ERS probability differs. This eliminates a major source of bias in self-report assessments.
Calibrated by Design
The weight matrix and bias ensure:
- All traits at 0.5 → uniform color distribution (20% each)
- Random answers → approximately uniform colors
We verified this with Monte Carlo simulations.
Efficient Convergence
Trait-based information gain focuses questions on high-uncertainty traits. We converge to stable estimates in ~24 questions instead of 50-100.
Limitations
Myopic selection: We pick questions greedily based on current state. A globally optimal strategy might sacrifice short-term information for better long-term probing.
Weight matrix is hand-tuned: We derived weights from psychological theory and calibrated via simulation. A learned matrix from user data might perform better.
Independence assumption: We treat answers as conditionally independent given traits. In reality, there may be order effects or fatigue.
Try It
That's the actual methodology. No hand-waving, no "proprietary AI" bullshit.
Take the test and see if the math holds up for you: Start Assessment
References
- Bakan, D. (1966). The duality of human existence. Basic Books.
- Cacioppo, J. T., & Petty, R. E. (1982). The need for cognition. Journal of Personality and Social Psychology, 42(1), 116-131.
- Costa, P. T., & McCrae, R. R. (1992). Revised NEO Personality Inventory (NEO-PI-R). Psychological Assessment Resources.
- Frederick, S. (2005). Cognitive reflection and decision making. Journal of Economic Perspectives, 19(4), 25-42.
- Higgins, E. T. (1997). Beyond pleasure and pain. American Psychologist, 52(12), 1280-1300.
- Kring, A. M., Smith, D. A., & Neale, J. M. (1994). Individual differences in dispositional expressiveness. Journal of Personality and Social Psychology, 66(5), 934-949.
- Wiggins, J. S. (1991). Agency and communion as conceptual coordinates for the understanding and measurement of interpersonal behavior. In D. Cicchetti & W. M. Grove (Eds.), Thinking clearly about psychology (Vol. 2, pp. 89-113). University of Minnesota Press.
- Zuckerman, M. (1979). Sensation seeking: Beyond the optimal level of arousal. Lawrence Erlbaum Associates.