"The purpose of a system is what it does."

Stafford Beer

Today we're looking at another demonstrator: the Systems Values Analysis Tool. This ambitious tool aims to examine what systems claim their values to be and compare that to what they actually deliver. It's a useful way to do a quick critique of a system or institution, particularly to assess the stress points that have been raised by the public.

Figuring out whether a system is aligned to its stated values can be a challenge. Image generated by Chat GPT.

It's the last of the three custom GPTs I'll be writing about in this series, alongside the Terms of Service Evaluator and the Narrative Values Extractor. These simple demonstration GPTs wrangle the logic of some of the tools I've built at home into something that can be easily accessed by the public. But they are limited: simple prompt-driven, one-shot analysis tools without the validation steps I'd like in something more robust. This tool stretches what I can do to extract values signals with a custom GPT to its limits. I've folded in as much of the framework and theory from this series as I can into one process.

This isn't just another policy critique tool: it's a diagnostic tool that reveals where and how a system's stated purpose diverges from what it actually does. It then makes that analysis actionable by identifying specific intervention points. Unlike policy tools, it's not aiming to see whether institutions are abiding by regulations or whether policy is effective at achieving its goals. It's going deeper, to the values that inform those goals, to diagnose where the values may have been lost in translation.

How it works

This custom GPT needs to be used in thinking mode. You may need to use the browser version of Chat GPT, as it can be hard to choose a model in the phone app.

The process is as follows:

  1. The user inputs a system and an aspect of the system that they want to analyse. This specific focus is important; these systems are often massive, and the LLM can drift to whatever first catches its attention if not instructed properly.
  2. Phase 0 (Scope): The tool sets the boundaries of the analysis.
  3. Phase 1 (Grounding): The tool performs research through a web search to identify information relevant to the request.
  4. Phase 2 (Narrative Mapping): The tool extracts the dominant narrative, common metaphors and frames, narrative carriers, and tone and positioning.
  5. Phase 3 (Values encoding and drift): The tool examines the system's stated values and enacted values before identifying drift between the two.
  6. Phase 4 (Four-Layer Framework Application): The tool examines the system using the Four-Layer Framework, looking at the values layer, meta-systemic layer, implementation layer, and interface layer.
  7. Phase 5 (Alignment Diagnosis and Interventions): The tool identifies key misalignments and proposes interventions in the system that would improve alignment.
  8. Phase 6 (Four A's Synthesis): The tool examines the values embedded in the system by looking at Authorship, Articulation, Alignment, and Adaptation.
  9. Phase 7 (Summary): The tool produces a paragraph summarising the findings of the analysis.

The results are output as a narrative report in Chat GPT.
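
For anyone curious how the phases hang together, here's a minimal sketch of how the sequence could be orchestrated outside a custom GPT. To be clear, this is my own illustration: the phase names follow the list above, but the prompts, the `call_llm` stand-in, and the way context accumulates between phases are all assumptions, not the GPT's actual instructions.

```python
# A hypothetical re-implementation of the phase sequence described above.
# call_llm is a stand-in for whatever model API you prefer; every prompt
# below is an illustrative paraphrase, not the GPT's actual instructions.

PHASES = [
    ("Scope", "Set the boundaries of the analysis for {system}, focusing on {aspect}."),
    ("Grounding", "Search for and summarise sources relevant to {aspect} of {system}."),
    ("Narrative Mapping", "Extract the dominant narrative, common metaphors and frames, "
                          "narrative carriers, and tone and positioning."),
    ("Values Encoding and Drift", "Identify stated values and enacted values, then describe the drift between them."),
    ("Four-Layer Framework", "Examine the values, meta-systemic, implementation, and interface layers."),
    ("Alignment Diagnosis and Interventions", "Identify key misalignments and propose interventions to improve alignment."),
    ("Four A's Synthesis", "Assess Authorship, Articulation, Alignment, and Adaptation."),
    ("Summary", "Write a one-paragraph summary of the findings."),
]


def call_llm(prompt: str) -> str:
    """Stand-in for a real model call; swap in your preferred LLM client here."""
    return f"[model output for prompt starting: {prompt[:60]!r}]"


def analyse(system: str, aspect: str) -> str:
    """Run each phase in order, feeding every phase the output of the ones before it."""
    context = ""
    report = []
    for name, template in PHASES:
        task = template.format(system=system, aspect=aspect)
        output = call_llm(f"{context}\n\nPhase: {name}\n{task}")
        context += f"\n\n## {name}\n{output}"
        report.append(f"## {name}\n{output}")
    return "\n\n".join(report)


if __name__ == "__main__":
    print(analyse("NDIS (National Disability Insurance Scheme)",
                  "participant autonomy and administrative control"))
```

Feeding each phase the output of the ones before it mirrors how the single-prompt GPT relies on its own earlier reasoning; in a more robust build, each phase would get its own validation step.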

Limitations

This is necessarily a one-shot analysis of a complex system. It works well but should not be considered authoritative. While LLMs can hold a lot of context at once, the grounding is shallow and limited by the attention the model can give to the task. The loudest narratives in the media and the official sources tend to be the ones that come through the search results. This is a bias in the way all web search works, and it is difficult to overcome without more robust collection methods.

Its proposed interventions can be hit and miss. Well, mostly miss. LLMs are like idealistic teenagers who just happen to have read every book ever written. The cheerful optimism is nice, but we should probably leave policy proposals to the humans who have to live with them. They do work as good prompts for future thought, particularly on the occasions when they come up with ideas I would never have thought of myself.

It's also important to note that the entire process is designed to look for misalignment between stated values and behaviours. This is an intentional bias. The things that are working well in a system aren't likely to make the news. It means the LLM will look for misalignments and may amplify them in its analysis. It's worth remembering that most systems achieve their goals effectively most of the time. But some misalignment is inevitable, and that is what this tool is trying to highlight.

Using the tool

A reminder: you'll need to use thinking mode for this one.

Click on this link to access the tool.

I've provided a few candidate Australian systems that I've tested and that work well. They get a lot of coverage in the media, so they are rich with values language.

  • Child Protection System (Australia) focussing on Indigenous child removals
  • NDIS (National Disability Insurance Scheme) focussing on participant autonomy and administrative control
  • Youth Justice System (Victoria) focussing on incarceration of children
  • Australia's Climate Policy System focussing on fossil-fuel approvals under net-zero commitments
  • Welfare Compliance System (Centrelink / Services Australia) focussing on automation and the Robodebt legacy

I've avoided corporate institutions in the examples, but they work just as well.

The writeup of the tool is available on my website.

Final thoughts

If you treat this tool as a good first pass for future research, it works well enough. My aim was to demonstrate that an LLM can look at a system, extract the stated values that it's meant to be aligned with, and then compare those values to how it's performing. It does that at least. And it's a different lens from how we often look at the performance of our systems.

This custom GPT, with its multiple phases, is really a bunch of different tools taped together. Developing it further will be a lot of work, but I am looking at the individual elements as part of a bigger ecosystem. At the very least I need: comprehensive grounding; effective values extraction into a more formal specification; a more complete survey of sentiment towards the values of system behaviours; and more rigorous analysis of the gap between stated values and what a system actually does. Like everything, it's a work in progress.
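
As a sketch of what "a more formal specification" might look like, here's one possible shape for the values-extraction output. The field names and structure are entirely my own assumptions about what a future version could capture, not anything the current tool produces.

```python
from dataclasses import dataclass, field

# Hypothetical schema for extracted values; every field here is an
# assumption about what a more formal specification might record.

@dataclass
class ValueRecord:
    name: str                    # e.g. "participant autonomy"
    stated_in: list[str]         # sources where the value is claimed
    enacted_evidence: list[str]  # observed behaviours that support or contradict it
    drift_notes: str = ""        # where stated and enacted values diverge
    confidence: float = 0.0      # how well-grounded the extraction is, 0 to 1


@dataclass
class SystemValuesSpec:
    system: str                  # the system under analysis
    aspect: str                  # the specific focus chosen by the user
    values: list[ValueRecord] = field(default_factory=list)
```

A structure like this would make the drift analysis auditable: each claimed value carries its evidence with it, rather than living only inside a narrative report.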

This post comes two weeks after the last one; I am slowing down a bit. I spent the last fortnight writing training courses and will have more on my plate going forward, so I'll likely drop my tempo to one post a fortnight. Next will be Adaptation, the final of the Four A's, before I start to demonstrate more of the heavier, code-based tools. Chat soon.

Brendon Hawkins

Intelligence professional exploring systems, values, and AI
