What shouldi Actually Does
You type a question. Something like: "Should I move to a new city?" The system doesn't give you a yes-or-no answer. Instead, it detects the domain of your decision (career, finance, health, relationships, real estate…), generates a tailored survey with questions specific to your situation, and runs your answers through a probabilistic outcome model.
It produces three scenarios: best case, likely case, worst case – each with confidence intervals. Then it runs 1,000 Monte Carlo simulations to generate a full probability distribution, and gives you a composite decision score, a 90% confidence range, and risk/opportunity percentages. All of this happens in under 5 seconds.
The result isn't "you should do X." It's a map of possible futures with numbers attached.
The Architecture
The most important architectural decision was: the LLM is not the simulation engine. It's one component in a pipeline.
The Gemini LLM generates survey questions, outcome scenarios, confidence levels, and numerical simulation parameters (probability, impact, volatility). It runs on the server via server actions.
A skills system detects the decision domain and injects specialized context, evaluation criteria, and risk frameworks into the prompt. The Monte Carlo engine takes the LLM's numerical parameters and runs 1,000 weighted random simulations with Gaussian noise. The visualization layer handles histogram rendering, confidence charts, and composite scoring.
The LLM generates the parameters. The math happens separately. This means the simulation is reproducible and auditable, you can see why the model thinks what it thinks, and the distribution isn't a hallucination – it's computed from structured inputs.
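Those structured inputs can be sketched as a typed contract between the LLM and the simulation engine. This is a hypothetical shape, not the actual schema – field names like `OutcomeParams` and `validateParams` are illustrative:

```typescript
// Hypothetical shape of the parameters the LLM returns per outcome scenario.
// Field names are illustrative; the production schema may differ.
interface OutcomeParams {
  label: "best" | "likely" | "worst";
  probability: number; // sampling weight in [0, 1]; weights sum to ~1
  impact: number;      // outcome score on a 0-100 scale
  volatility: number;  // std-dev of Gaussian noise added per iteration
}

// Sanity-check parsed LLM output before handing it to the simulation:
// weights roughly sum to 1, impacts in range, volatility non-negative.
function validateParams(outcomes: OutcomeParams[]): boolean {
  const totalP = outcomes.reduce((s, o) => s + o.probability, 0);
  return (
    outcomes.length > 0 &&
    Math.abs(totalP - 1) < 0.05 &&
    outcomes.every(
      (o) =>
        o.probability >= 0 &&
        o.impact >= 0 &&
        o.impact <= 100 &&
        o.volatility >= 0
    )
  );
}
```

Because the engine only ever consumes this validated structure, a malformed LLM response fails loudly at the boundary instead of silently corrupting the distribution.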
The Skills System: Modular Domain Advisors
When you ask "Should I invest in real estate?", the system doesn't just pipe your question to a generic LLM. It first detects the domain (in this case, real-estate) using a keyword-based scoring system (no API call). Then it loads a specialized skill module that injects a domain persona, evaluation criteria specific to that domain, risk frameworks, and benchmarks.
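A keyword-scoring detector like the one described can be sketched in a few lines. The keyword lists and domain names below are illustrative, not the production set:

```typescript
// Minimal sketch of keyword-based domain detection (no API call).
// Keyword lists are illustrative; the real system covers 8 domains.
const DOMAIN_KEYWORDS: Record<string, string[]> = {
  "real-estate": ["real estate", "house", "mortgage", "property"],
  finance: ["invest", "stock", "savings", "debt", "portfolio"],
  career: ["job", "promotion", "career", "salary", "employer"],
};

function detectDomain(question: string): string {
  const q = question.toLowerCase();
  let best = "general"; // fallback when nothing matches
  let bestScore = 0;
  for (const [domain, keywords] of Object.entries(DOMAIN_KEYWORDS)) {
    // Score = number of domain keywords present in the question.
    const score = keywords.filter((k) => q.includes(k)).length;
    if (score > bestScore) {
      bestScore = score;
      best = domain;
    }
  }
  return best;
}
```

The appeal of this approach is that detection is instant and free: no round trip to the model just to decide which advisor should answer.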
There are currently 8 specialized advisors: finance, career, health, relationships, education, real estate, lifestyle, and business. The system is modular – adding a new advisor means creating one file and registering it. The prompt builder automatically weaves the domain context into the Gemini request.
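The "one file plus a registration" pattern might look like this sketch. The `Skill` interface, registry, and `buildPrompt` helper are assumptions about the design, not the actual API:

```typescript
// Sketch of a modular advisor registry; names are illustrative.
interface Skill {
  domain: string;
  persona: string;              // injected as the system persona
  evaluationCriteria: string[]; // domain-specific lenses
  riskFramework: string;        // how this advisor weighs downside
}

const registry = new Map<string, Skill>();

function registerSkill(skill: Skill): void {
  registry.set(skill.domain, skill);
}

// The prompt builder weaves the matched skill's context into the request.
function buildPrompt(question: string, domain: string): string {
  const skill = registry.get(domain);
  if (!skill) return question; // generic prompt when no advisor matches
  return [
    skill.persona,
    `Evaluate against: ${skill.evaluationCriteria.join(", ")}`,
    `Risk framework: ${skill.riskFramework}`,
    `Decision: ${question}`,
  ].join("\n");
}
```

Adding a ninth advisor is then one `registerSkill` call in one new file; nothing else in the pipeline changes.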
What's interesting is that different advisors frame the same decision differently. A career advisor and a finance advisor will evaluate "Should I go back to school?" through completely different lenses. That tension is where the most useful insights come from.
Monte Carlo: Why Distributions Beat Single Answers
This was the feature that changed how the product feels. Before Monte Carlo, the output was: "Likely case: moderate confidence, 60–70%." Useful, but abstract. After Monte Carlo, the output is a histogram. You can see the spread of 1,000 possible outcomes. You can see that your decision has a 23% risk of a poor outcome and a 41% chance of a strong one. You can see the 90% confidence range is 34–78.
For each of 1,000 iterations, the engine selects an outcome using weighted random sampling, takes that outcome's impact score, adds Gaussian noise scaled by volatility (Box-Muller transform), clamps to 0–100, and records the result. Then it computes the composite score (mean), p5/p95, and builds a 20-bin histogram.
The entire simulation runs client-side in under 5 milliseconds. Zero API calls. The LLM provides the parameters; the math is deterministic from there.
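The loop described above can be sketched end to end. This is a minimal illustration under assumed names (`Outcome`, `simulate`), not the production engine:

```typescript
// Sketch of the client-side Monte Carlo loop: weighted sampling,
// Box-Muller Gaussian noise scaled by volatility, clamp to 0-100,
// then composite mean, p5/p95, and a 20-bin histogram.
interface Outcome {
  probability: number; // sampling weight
  impact: number;      // 0-100 score
  volatility: number;  // noise std-dev
}

// Standard-normal sample via the Box-Muller transform.
function gaussian(): number {
  const u = 1 - Math.random(); // shift to (0, 1] so log(0) never occurs
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

function simulate(outcomes: Outcome[], iterations = 1000) {
  const total = outcomes.reduce((s, o) => s + o.probability, 0);
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    // Weighted random selection of one outcome.
    let r = Math.random() * total;
    let chosen = outcomes[outcomes.length - 1];
    for (const o of outcomes) {
      if ((r -= o.probability) <= 0) { chosen = o; break; }
    }
    // Impact plus volatility-scaled noise, clamped to the score range.
    const score = chosen.impact + gaussian() * chosen.volatility;
    samples.push(Math.min(100, Math.max(0, score)));
  }
  samples.sort((a, b) => a - b);
  const composite = samples.reduce((s, x) => s + x, 0) / samples.length;
  const p5 = samples[Math.floor(0.05 * samples.length)];
  const p95 = samples[Math.floor(0.95 * samples.length)];
  // 20 bins of width 5 over the 0-100 range.
  const histogram = new Array(20).fill(0);
  for (const x of samples) histogram[Math.min(19, Math.floor(x / 5))]++;
  return { composite, p5, p95, histogram };
}
```

With only arithmetic and `Math.random` involved, 1,000 iterations are trivially cheap in the browser, which is why the whole thing fits in a few milliseconds with zero network calls.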
What We Learned Building This
People don't struggle with a lack of options – they struggle with uncertainty between options. Nobody asks shouldi: "What should I do with my life?" They ask: "Should I take Job A or Job B?" The options are clear. The uncertainty between them is what paralyzes.
LLMs are surprisingly good at generating structured simulation parameters. With explicit prompt instructions and a clear schema, Gemini reliably returns probability, impact scores, and volatility values in JSON format.
The "likely case" is where all the value lives. Best case and worst case are easy to imagine. The realistic middle – what will probably happen, with specific trade-offs and timelines – is what people actually need and can't generate on their own.
Small assumption changes create wildly different distributions. Changing one survey answer from "somewhat prepared" to "very prepared" can shift the composite score by 15 points and dramatically narrow the confidence range. Showing users this sensitivity is more powerful than any single recommendation.
What's Next
shouldi.io is live and actively evolving. Areas being explored include: comparative mode (run the same decision with different assumptions side by side), historical calibration (how well did the model's predictions match actual outcomes?), more advisor specializations (legal, immigration, parenting), and collaborative decisions (share a scenario with a partner and see how your inputs change the outcome).