{ A place for our AI-based experiments and applied research. }
Generative Strategic is the applied-research arm of General Strategic. We build small, strange, useful things with language models — to learn where they help, where they don't, and what a practice around them should look like when the stakes are real.
§ 02 · The log
Experiments, in various states of usefulness.
A conversational archive of Robert Menzies.
Decades of speeches, cabinet papers, and broadcasts — indexed, cited, and queryable in plain English. Built to test whether a model can serve as a research assistant for political history without making things up.
Strategy tools for people who don't want a deck.
A small suite of working tools — framings, pre-mortems, stakeholder maps — that run in the browser and keep their working on the page, not behind a login.
Fine-tuning research availability and outcomes.
Synthetic respondents, calibrated to your real ones. We fine-tune a customised model on a research library (quant survey waves, qual transcripts) so you can pre-test messaging against the audience you actually have. Not to replace fieldwork. To stop wasting it on questions you can answer in minutes.
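As a toy illustration only (the actual pipeline and provider format are not described above), one common way to turn a research library into fine-tuning data is a JSONL file of prompt/completion pairs. The field names and the sample row here are assumptions, not our production schema.

```python
import json

# Hypothetical survey rows: a question, the audience segment, and the
# verbatim answer from a real respondent in that segment.
survey_rows = [
    {
        "question": "Does this message feel credible?",
        "segment": "outer-suburban renters",
        "answer": "No, it sounds like spin.",
    },
]

# One record per row: the segment is folded into the prompt so the
# fine-tuned model learns to answer *as* that audience.
records = [
    {"prompt": f"[{r['segment']}] {r['question']}", "completion": r["answer"]}
    for r in survey_rows
]

# JSONL: one JSON object per line, the shape many fine-tuning APIs accept.
jsonl = "\n".join(json.dumps(r) for r in records)
print(jsonl)
```

The point of the format, not the model: every synthetic answer traces back to a real survey row.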
A canonical corpus of Australia's public record.
The most extensive working corpus of Australia's entire public record: Hansard, speeches, announcements, every act of legislation, every regulation, every committee report, every court judgment. Over 4 million vectors: the record chunked, embedded, indexed, cited, and accessible for RAG via API.
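A miniature sketch of the retrieval shape, not the production system: documents are chunked, each chunk is embedded, and every retrieved chunk carries its citation. The embeddings here are toy bag-of-words counts (the real corpus uses a learned embedding model), and the index entries are invented examples.

```python
import math

def embed(text: str) -> dict:
    """Toy embedding: lowercase term counts standing in for a real model."""
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Index: (chunk text, citation) pairs; the citation travels with the chunk.
index = [
    ("the parliament passed the act on water rights", "Hansard, 12 Mar 1968"),
    ("the committee reported on broadcasting licences", "Committee report 44/1971"),
    ("the court held the regulation invalid", "High Court judgment, 1975"),
]
embedded = [(embed(text), text, citation) for text, citation in index]

def retrieve(query: str, k: int = 1):
    """Return the top-k chunks with their citations, never a bare answer."""
    q = embed(query)
    scored = sorted(embedded, key=lambda e: cosine(q, e[0]), reverse=True)
    return [(text, citation) for _, text, citation in scored[:k]]

print(retrieve("committee report broadcasting"))
```

The design point is the tuple: a chunk without its citation never leaves the index.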
Autonomous AI agent as a team member.
A long-running agent and controller on the GS team. Persistent identity, three-tier memory, scheduled work, and presence across the channels the team uses. Under test: long-horizon continuity, self-directed scheduling, a self-maintaining improvement rubric, coordination with paired secondary agents, and correction loops. Running continuously. Field notes record what the architecture predicts, what it does, and where the two diverge.
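Illustrative only: the note names "three-tier memory" without defining the tiers, so the tier names here (working, episodic, long-term) and the sizes are assumptions about one plausible shape, not the agent's actual architecture.

```python
from collections import deque

class AgentMemory:
    """A minimal three-tier memory sketch: a small working buffer for current
    context, a bounded episodic log of recent events, and an unbounded
    long-term store of facts the agent chooses to keep."""

    def __init__(self, working_size: int = 5, episodic_size: int = 50):
        self.working = deque(maxlen=working_size)    # current context window
        self.episodic = deque(maxlen=episodic_size)  # recent events, oldest dropped
        self.long_term = {}                          # durable facts, keyed by topic

    def observe(self, event: str) -> None:
        """New events enter both working and episodic memory."""
        self.working.append(event)
        self.episodic.append(event)

    def commit(self, key: str, fact: str) -> None:
        """Promote something worth keeping past the episodic horizon."""
        self.long_term[key] = fact

mem = AgentMemory(working_size=2)
mem.observe("standup notes posted")
mem.observe("ticket triaged")
mem.observe("weekly report drafted")
mem.commit("report_cadence", "weekly, drafted Fridays")
```

The interesting failure mode to watch is the promotion step: what the agent decides is worth committing is where continuity succeeds or quietly degrades.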
A diagnostic system for persuasion.
A rubric tool for messaging and persuasion. It tests messages across multiple layers (surface, implication, trigger, and frame) under a consistent scoring system. Variants can be quantified, modelled, compared across waves, and cross-referenced against human quant and qual research. Designed to work alongside fieldwork, not replace it.
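A sketch of the rubric's shape, not the production tool: each variant is scored per layer and a weighted total makes variants comparable across waves. The layer names come from the description above; the weights, the 0-5 scale, and the sample variants are illustrative assumptions.

```python
LAYERS = ("surface", "implication", "trigger", "frame")

# Illustrative weights; a real rubric would calibrate these against fieldwork.
WEIGHTS = {"surface": 0.2, "implication": 0.3, "trigger": 0.25, "frame": 0.25}

def total_score(scores: dict) -> float:
    """Weighted rubric total for one message variant; scores are 0-5 per layer."""
    assert set(scores) == set(LAYERS), "every layer must be scored"
    return sum(WEIGHTS[layer] * scores[layer] for layer in LAYERS)

# Two hypothetical message variants scored by the same rubric.
variant_a = {"surface": 4, "implication": 2, "trigger": 3, "frame": 5}
variant_b = {"surface": 3, "implication": 4, "trigger": 4, "frame": 3}

print(total_score(variant_a), total_score(variant_b))
```

Because every variant is scored under the same weights, the totals can be modelled across waves and set against human quant and qual results, which is the cross-referencing the tool exists for.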
§ 03 · House rules
Six things we try to remember.
A working set of principles. Not a manifesto. We edit them as we go; the list on this page is always the current one.
Tools are not answers.
The tools and models themselves must be tested, compared, and corrected, not merely presented. Anything that leaves the lab must go through human judgement and a method that isn't the model.
Publish what we learned.
Every experiment that teaches us something gets a field note. We don't always show the prompts or the data. We always try to show the conclusions, the failures, and what we'd do differently.
The method survives the model.
Build for the discipline first and the model second. When a better model arrives, the method should still hold. If the method requires a specific model to work, it isn't fit for purpose.
Every claim has a chain of custody and source.
For any output that ends up in front of a client or in a public note, we can retrace the path: which document, which prompt, which model, which edit. Models hallucinate. Every answer must clearly carry its source.
Humans decide. Always.
The model drafts, compares, sorts, and forgets. The decision sits with a person. The interface is built so the line is obvious.
Measure against the floor, not the ceiling.
Evaluations describe what a tool does on its worst representative day, not its best. The number that matters is the one a tired user gets at four o'clock on a Friday.
§ 05 · Field notes
What we've been learning. How we've been failing.
The prompt is the product, until it isn't
For the first generation of LLM applications, the differentiator was the prompt. A clever instruction template was the entire moat. This was true for about eighteen months. It is no longer true, and most teams who built their product on prompt cleverness are now discovering it.
Against the demo. For the shipping experiment.
A model that confidently lies about budget dates.
A working vocabulary for 'AI strategy' that isn't embarrassing.
When a model should say 'I don't know' and how we force it to.
§ 06 · Collaborate
{ Have a hard problem you'd like us to sit with? }
We take on engagements for organisations wrestling with how to use these tools responsibly, and we welcome the occasional collaborator on an experiment of our own.
hello@generativestrategic.com