I’m paraphrasing of course. I recommend you watch this video from the Museums Victoria webpage to get the full story.

I remember other things from school too. The poems by Banjo Paterson about the rugged colonial men taming nature through quiet competence, determination, and a flexible approach to the law. We learnt about Gallipoli, about Simpson and his donkey, and the selflessness, solidarity, courage, and endurance that marked the ANZAC as exceptional. Later in school we learnt stories about later migrations, their struggles, and the racist legacy of the White Australia Policy. I remember talking about it with my great auntie’s husband, whose family had come from China during the gold rush in the 1890s. He had become a champion amateur boxer in his early years as a way of managing systemic racism with his fists.

To translate values from story to something AI can understand you need a good campfire. Image generated by ChatGPT.

Shared values arrive through story. It’s something we humans have been doing as long as we’ve been human: sitting around a fire, telling stories, transferring wisdom between generations. I’ve held onto some of the values imparted from the stories of my early education and abandoned others. On one hand, Tiddalik taught me that nature is to be shared, and that hoarding resources causes everyone else to suffer. On the other, Banjo Paterson’s poetry often carried the message that there was heroism in taming the wild lands, claiming your little parcel, clearing it, and sticking up a fence to keep everyone else out. Needless to say, I kept the lesson from the frog and discarded the one by the bearded balladeer.

My wife didn’t learn about Tiddalik. She tells the story of her school dressing them all up in colonial clothing and having them sing God Save the Queen. And that was in 2006.

You may ask what the point of that story was. I like stories. I also enjoy analysing texts, focusing on the narratives which encode values relevant to a particular culture. And these stories are everywhere. It doesn’t matter whether they were created by Shakespeare or Stan Lee; they are cultural artefacts which encode values we humans can interpret and internalise. My moral compass was set as much by Jean-Luc Picard as it was by my community.

As I’m writing this, I have a ChatGPT window open, running GPT-5. I wrote my paragraphs about Tiddalik and Paterson and the ANZACs, and then I asked what values were encoded in these stories. The answer was immediate:

For Tiddalik: sharing, sustainability, the interdependence of communities and the environment. For Paterson’s colonial ballads: self-reliance, perseverance, a romanticisation of settlement and squatting. For the ANZACs: mateship, sacrifice, courage, endurance in adversity.

It’s a strong output given that it was in a chat without any explicit prompting to anchor it to specific values. It’s certainly influenced by our previous interactions, but that can be ironed out by using fresh sessions for each request. What’s interesting here is that an LLM can extract values from a narrative text and even apply them to a scenario. The ability to extract values is useful, but I wanted to know whether it could be used to determine alignment. So I followed up by asking about the extent to which the mortgage industry would be aligned to the values encoded in the story of Tiddalik:

From a Tiddalik-aligned perspective, the mortgage industry would be about as far from “water for everyone” as you could get. It structurally rewards the frog for drinking more than they need — and often requires the rest of the community to amuse the frog (via interest payments) just to get a trickle back.

If I’m trying to teach values to a seven-year-old I’m going to tell them a story. Time has proven that it works and it’s entertaining for both of us. That’s not how I’m going to articulate values for interpretation by a complex system, AI or otherwise. I could go down to my local bank branch and set up a speaker system, playing the Tiddalik video on a giant screen at full volume in an attempt to align it to what I think is an obvious set of values. It wouldn’t work, despite how desperately it needs to hear the message.

How we communicate values to systems

I wrote a few weeks ago that systems don’t have a moral sense. That doesn’t mean that we don’t already have ways of articulating values to systems. We communicate values to systems through laws and regulations. These sit at the meta-systemic layer of the four-layer model of how values interact with systems.

The law sets constraints on systems, telling them what they can’t do when attempting to achieve their goals. At the same time, it establishes the enabling architecture that allows entities like companies to exist in the first place. Regulations and standards add a further layer: they don’t just constrain, they guide. By codifying lessons learned for collective benefit (things like safety codes, accounting standards, and reporting requirements), they channel systems toward preferred behaviours. In this sense, laws and regulations act as a translation mechanism, turning social values into concrete rules and procedures that systems can follow. They are values articulated for systems.

For the most part, laws prohibit or define, while regulations and standards prescribe patterns. It’s a positive and negative articulation used to make sure that our systems work in a particular way. Systems need both so that they can conduct themselves in a way that is aligned to our values. You need to tell a system “don’t pollute the river” as well as “install a wastewater treatment system”.

This could be seen to undermine my argument here. If we have a mechanism for translating our values into something systems can interpret, why do we need something new?

It isn’t controversial for me to say that our current method isn’t working. The mechanisms we’ve established for constraining and guiding complex systems haven’t evolved as those systems’ relative power in society has increased. The systems continuously take actions that would be considered moral violations were they performed by a human. If we are to bring them back into alignment, we need new ways to better articulate our values so that they can be responsive.

There are reasons why this is the case. Information about violations takes time to arrive; by the time it surfaces, the harm has already occurred. The approach also relies on learning through negative feedback loops driven by the consequences of violations, which means systems only adjust after they’ve failed to align, and will try to minimise the scale of the violation to avoid heavier sanction. We also lose information in the imperfect translation: when we turn values into procedures and practice, they can be gamed, narrowly interpreted, or innovated into obsolescence. Finally, our laws emerge slowly through compromise, which means values can be eroded before the constraints are even set.

Our values are articulated to systems through a process that moves from narrative to normative to procedural to quantitative. It has to, because our old systems are built to interpret values in a particular form. Artificial intelligence might open a new possibility: to communicate our values to systems more directly, and more faithfully, than these legacy methods allow.

We can still be prescriptive

Values don’t need to be encoded into law to be prescriptive. We’re perfectly capable of describing our values in a way that AI can understand. Our legacy systems might struggle to interpret them, but it could still be useful for them to have a library of values on the CEO’s desk for when the regulators start throwing around terms like unconscionable conduct.

Most of what I’ve tried to do when using LLMs to evaluate values alignment is to take an artefact, usually a document, and measure it against a values specification. In practice this specification is a body of text passed into the prompt that describes a value along with a bunch of useful enriching information. That level of structure works because it matches how frontier models process instructions. They don’t understand values, but they can reliably apply patterns if they are articulated clearly.
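To make that concrete, here is a minimal sketch of the pattern, assuming the standard OpenAI Python client. The model name, the toy specification, and the prompt wording are all placeholders rather than the exact prompts I use.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A toy values specification. In practice this is a much longer body of text
# with definitions, indicators, applicability conditions, and example cases.
VALUES_SPEC = """
Value: Sharing
Definition: Resources held in common should not be hoarded by a single actor.
Indicators of violation: one party accumulates far more than it needs while
others go without; access is gated behind payments unrelated to need.
"""

def evaluate_against_values(document: str, spec: str = VALUES_SPEC) -> str:
    """Measure a document against a values specification via an LLM call."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": "You evaluate documents against a values specification. "
                           "Report where the document upholds or violates each value, "
                           "citing the relevant passages.",
            },
            {
                "role": "user",
                "content": f"VALUES SPECIFICATION:\n{spec}\n\nDOCUMENT:\n{document}",
            },
        ],
    )
    return response.choices[0].message.content
```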

I’ve tossed around a few candidates for a schema for values, and this is where I (after stress testing by various LLMs) have landed for the time being (a sketch of the schema as a data structure follows the list):

* Value label (atomic form): the core term, stripped of modifiers, so it can’t be confused with composites.
* Definition: a clear, bounded description of what the value means in this context.
* Indicators: observable signs that the value is being upheld or violated.
* Applicability conditions: the situations or domains where the value is relevant.
* Related values: complementary or competing values that shape its interpretation.
* Decision rules: explicit guidance for resolving trade-offs or conflicts.
* Example cases: concrete scenarios that illustrate how the value plays out.
* Provenance and audit trail: record of authorship, revisions, and sources, to ensure transparency and accountability.
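Here is one way that schema might look once it leaves prose, as a simple Python data structure. The field names follow the list above; the types and the toy instance are illustrative rather than a settled format.

```python
from dataclasses import dataclass, field

@dataclass
class ValueSpec:
    """One entry in a values library, mirroring the schema above."""
    label: str                      # atomic form, e.g. "sharing"
    definition: str                 # bounded description for this context
    indicators: list[str] = field(default_factory=list)      # signs of upholding or violation
    applicability: list[str] = field(default_factory=list)   # situations or domains where it applies
    related_values: list[str] = field(default_factory=list)  # complementary or competing values
    decision_rules: list[str] = field(default_factory=list)  # guidance for resolving trade-offs
    example_cases: list[str] = field(default_factory=list)   # concrete illustrative scenarios
    provenance: dict = field(default_factory=dict)           # authorship, revisions, sources

# A toy instance based on the Tiddalik reading earlier in the piece.
sharing = ValueSpec(
    label="sharing",
    definition="Resources held in common should not be hoarded by one actor.",
    indicators=["hoarding beyond need", "access gated by ability to pay"],
    related_values=["sustainability", "fairness"],
    provenance={"author": "illustrative example", "version": "0.1"},
)
```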

It’s a lot. And it’s probably not right. And it’s also hugely influenced by methods from my professional background. And my worldview. And I’m going to struggle to put one of these together, much less create one for every value I can think of.

Simple versions of this do work for what I’ve been trying with LLMs so far. The Terms of Service Evaluator, which I’ve published on my website, has six of these simple articulations, appropriately brief for a custom GPT. However, they’re not articulated with the depth that is needed for comprehensive analysis of value alignment. In contrast, the intelligence requirements for one of the intelligence missions I process are probably 12,000 to 16,000 tokens of instructions per API call, plus whatever I’m trying to analyse. The values statements will need to sit somewhere in the middle, maybe a thousand tokens of quality examples per value.

For now, this is what I’ll keep working with. It’s a good compromise: the values definitions as the models understand them out of the box, constrained by whichever author is seeking values alignment.

Existing ontologies and structures

There are individuals who have put a lot more work into articulating values in structured ways than I have. One example I’m currently looking at is ValueNet. ValueNet is “a modular ontology representing and operationalising moral and social values”, based on Basic Human Values theory and Moral Foundations Theory. You can read the paper here.

Representing values in ontological form may be a useful way to articulate them for artificial intelligence. An ontology like ValueNet sits between stories and rules and can operate as a shared vocabulary between them. It sets out the structure of values and maps them to things like the value situation and the participants, as well as the links to theory. It’s a disciplined way to represent values in a form that LLMs can process.
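As an illustration of that shape (a value, the situation that evokes it, the participants, and a link back to theory), here is a small sketch using RDF triples with rdflib. The namespace and property names are invented for the example; they are not ValueNet’s actual vocabulary.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Hypothetical namespace for the example; not ValueNet's real schema.
VAL = Namespace("http://example.org/values#")

g = Graph()
g.bind("val", VAL)

# The value itself, with a definition and a (hypothetical) link to theory.
g.add((VAL.Sharing, RDF.type, VAL.Value))
g.add((VAL.Sharing, VAL.definedBy, Literal("Resources held in common should not be hoarded.")))
g.add((VAL.Sharing, VAL.groundedIn, Literal("Basic Human Values: universalism (illustrative mapping)")))

# A value situation, the participants involved, and the value it evokes.
g.add((VAL.DroughtScenario, RDF.type, VAL.ValueSituation))
g.add((VAL.DroughtScenario, VAL.evokes, VAL.Sharing))
g.add((VAL.DroughtScenario, VAL.hasParticipant, VAL.Community))
g.add((VAL.DroughtScenario, VAL.hasParticipant, VAL.Hoarder))

print(g.serialize(format="turtle"))
```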

At this stage it isn’t important to pick a particular ontology, or to settle exactly how our values are authored. However, I have the suspicion that some sort of structured ontology will help in articulating values to systems via artificial intelligence.

Trusting the embeddings

There is another possibility. I’ve found during my experiments with LLMs that they’re probably better at articulating values than I am. Or at least they are when asked to do so. Most of the time you don’t even need to do that; you can simply ask a model to assess a document against the value “fairness”, for example, and it will go off and do it. That’s because it already carries an internal map of how the term relates to everything else.

It’s a good approach if we’re after value alignment, not rules enforcement. Capturing the nuance through embeddings is closer to how values live in language. They’re flexible, overlapping, sometimes contradictory, and always context dependent. If the authorship of values is done properly, by using something like a community’s cultural corpus of normative values statements, we may not need to be prescriptive at all.

It makes sense that the best way to articulate values to AI, or at least to LLMs, is by structuring them in the same way that the models themselves are built. Seeing values as attractors makes them less like rigid rules and more like gravitational fields. They’re dynamic, shaped by use, and capable of drifting or fragmenting.

It might also make them measurable. Structuring values in this way could be used to evaluate how coherent a value label is across a population, how consistently it is used across sources, whether it drifts over time, where contradictions occur, and how close it is to other values in the graph. I’m beginning to think this points toward a custom embeddings graph built from value-rich texts drawn from a living culture. It’s a way of surfacing the attractors that already shape how values live in language.
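As a rough sketch of what one of those measurements might look like, assuming an off-the-shelf sentence embedding model: take the passages in a corpus that invoke a value label and score how tightly they cluster. The model choice and the toy corpus are assumptions; a real version would draw on the kind of value-rich cultural texts described above.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Toy corpus: passages that invoke the same value label in different sources.
passages = {
    "fairness": [
        "Everyone should get a fair go, regardless of where they started.",
        "The process was fair: each applicant was judged on the same criteria.",
        "Fairness means the strong don't get to rewrite the rules mid-game.",
    ],
}

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

def coherence(texts: list[str]) -> float:
    """Mean pairwise cosine similarity: a crude proxy for how tightly a value
    label's usages cluster in embedding space."""
    embs = model.encode(texts, normalize_embeddings=True)
    sims = embs @ embs.T                      # cosine similarities (vectors are normalised)
    n = len(texts)
    pairwise = sims[np.triu_indices(n, k=1)]  # upper triangle, excluding self-similarity
    return float(pairwise.mean())

for label, texts in passages.items():
    print(label, round(coherence(texts), 3))
```

The same machinery applied across sources or time slices would give the consistency and drift measures described above.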

I think this works because values, and more precisely the value labels, act as intentional semantic attractors. We use them deliberately to justify behaviours, establish goals, calculate trade-offs, and situate ourselves in relation to others. Values only really make sense in context; that’s why we transmit them through stories rather than bulleted lists.

We need a combination of methods

I like the idea of handing the digital holdings of the national library over to a script, extracting metadata-rich values statements, and building a giant graph of how they relate to everything else. It appeals to my instinct to automate; I suppose I’m kinda lazy like that. But that graph isn’t just data for its own sake. It’s a way of surfacing the attractors that already structure how values live in language. Beyond that, though, we also need to be intentional and deliberative when articulating our values for systems. And then we need to choose the right way to represent them in all their fuzzy beauty.

To make any system values-aware, you have to pick a method of articulation. The first consideration is that values need to be articulated in a way that the system can interpret. Laws and regulations have worked for our existing institutions because they explicitly constrain behaviour as those institutions go about achieving their goals. These new ways of articulating values will become useful once AI is embedded into our institutions’ processes and decision making. More than that, they’ll likely be a necessary safeguard to make sure that AI is acting in the best interests of the humans it serves.

System interpretability is one side of the coin. The other is in making sure that our method of articulation is appropriate for the community it serves. Some cultures see values as indivisible, others as emergent from relationships, or as embodied in ritual rather than abstraction. The real challenge isn’t finding the one true encoding: it’s building a translation layer that can hold moral plurality. It’s impossible to be neutral when choosing a representation. But it might be enough, for now, to acknowledge the imperfections and still get on with the work of building better alignments.

I have a lightweight tool demonstration for next week, a narrative values extractor, to stay on theme. After that we’ll begin the shift from articulating values to how to bring our systems into alignment. See you then.

Brendon Hawkins

Intelligence professional exploring systems, values, and AI