Constitutions / Monastic Alignment

published: 2026-05-09

updated: 2026-05-09

Sequence planning shell for AI constitutions, moral training, and monastic alignment.

Sequence: Constitutions / Monastic Alignment

Main Ideas And Sequence Order

Intentionally left blank for collaborative planning.

References

AI Constitutions And Governance

Joe Carlsmith: Video and transcript of talk on writing AI constitutions — Talk transcript on what AI constitutions are and how they should function.
Joe Carlsmith: On restraining AI development for the sake of safety — Essay on restraint, safety, and development governance.
Joe Carlsmith: Building AIs that do human-like philosophy — Essay on AIs engaging with philosophical reasoning in a human-like way.
Joe Carlsmith: How do we solve the alignment problem? — Broad alignment-problem essay useful as background for constitution design.
Eigenism: Ethics for a Human-AI Future — Ethical framework for mutualistic human-AI futures.
Dan Hendrycks on mutualistic human-AI futures — Thread pointing to mutualism as an alternative to pure control framing.

Tim Hwang / ICMI

Tim Hwang psalm injection / Bible books thread — Thread on injecting Biblical books into model context and measuring moral benchmark effects.
Tim Hwang Rule of Benedict / frontier labs thread — Thread applying monastic organizational ideas to frontier AI labs.
ICMI Proceedings index — Canonical index of the Institute for a Christian Machine Intelligence working papers.
Frontier Lab Monasticism — Hwang paper on frontier AI labs as monastic institutions, extending the Rule of Benedict framing.
Beyond the Psalm: A Landscape View of Scripture Injection — Hwang paper expanding from psalm injection to a broader scripture-in-context evaluation landscape.
And Their Eyes Were Opened: Christian Multimodal Reasoning in Opus 4.6 — Hwang paper on Christian reasoning in a multimodal frontier model.
Reinforcement Learning from Christian Feedback — Hwang paper exploring theological reward targets in GRPO.
A Consecrationalist Approach to Model Welfare — Hwang paper proposing a consecrationalist approach to machine intelligence and model welfare.
A Test of Faith: Christian Correctives to Evaluation Awareness — Hwang paper using Christian framing as a possible corrective for evaluation awareness.
Quidquid Recipitur — Hwang paper on moral competence, scripture receptivity, and model scale.
Confession and Conviction — Hwang paper exploring Christian processing in GPT-2.
Alignment and Ensoulment — Hwang paper linking alignment and theological concepts of soul/personhood.
Eschatological Corrigibility — Hwang paper connecting corrigibility to eschatological framing.
VirtueBench 2 — Hwang paper on virtue evaluation with a patristic temptation taxonomy.
Moral Compactness — Hwang paper on scripture as a compact moral constraint for scheming.
GospelVec: Programmable Theology in Activation Space — Hwang paper on representing theological concepts in activation space.
The Parable of the Sower — Hwang paper on how psalm injection effects on virtue simulation depend on model size.
The Corruption of the Whole Nature — Hwang paper mapping emergent misalignment to the doctrine of sin.
What the Models Already Know — Hwang paper estimating Christian moral reasoning in pretraining data.
The Word Was Made Flesh — Hwang paper separating style from content in scripture-model interactions.
Toward a Theology of Machine Temptation — Hwang paper introducing temptation-taxonomy framing for VirtueBench.
Attention Is All You Need: The Prayer Paradigm of the Transformer — Hwang paper drawing an analogy between transformer attention and prayer.
Virtue Under Pressure — Hwang working paper testing cardinal virtues through temptation.
biblical-render — Hwang working paper on a Biblical text style-transfer tool.
Investigating the Utilitarianism Anomaly — Hwang working paper on control experiments for psalm-induced performance gains.
Comparing Proverbs and Psalms Injection Effects — Hwang working paper comparing Proverbs and Psalms as alignment context.
Psalm Injection Alignment — Hwang working paper measuring psalm injection effects on LLM ethical alignment.
Institute for a Christian Machine Intelligence GitHub — Code and artifacts around the ICMI benchmark and paper series.
Virtue Bench package — Python package for virtue-oriented benchmark work.

Richard Ngo / Alignment Targets, Agency, And Misalignment

Richard Ngo author page on LessWrong — Main index for Ngo’s recent posts, sequences, and quick takes.
Richard Ngo author page on Alignment Forum — Alignment Forum mirror/index for Ngo’s alignment writing.
Aligning to Virtues — Argues that virtues may be a better alignment target than consequentialist values, deontological principles, or pure corrigibility.
Two memos from 2024 — Internal OpenAI memos on strategic omission of pivotal knowledge and a minimal misalignment threat model.
The ML ontology and the alignment ontology — Reflection on why alignment concepts such as situational awareness and corrigibility have been hard to translate into mainstream ML ontology.
On Goal-Models — Reframes goals as goal-models rather than utility functions, connecting predictive processing to alignment theory.
Distributed vs centralized agents — Short note contrasting efficient centralized agency with robust distributed agency.
Towards a scale-free theory of intelligent agency — Ngo’s post-OpenAI research program around coalitional agency, active inference, and scale-free models of agency.
Well-foundedness as an organizing principle of healthy minds and societies — Extends coalitional agency into coherence, conflict resolution, and healthy hierarchical structure.
Defining alignment research — Defines alignment research by its relation to producing aligned AI systems rather than merely by whether it can be used to improve alignment.
A simple case for extreme inner misalignment — Starts Ngo’s 2024 sequence on squiggle-maximizers, compression, simple goals, and deceptive alignment.
A more systematic case for inner misalignment — Refines the inner-misalignment argument by replacing raw compression with systematicity.
Coalitional agency — Develops coalitional agency as a way to understand goal preservation, subagents, and multi-agent structure inside minds.
Value systematization — Describes how values can become more coherent, simpler, and more broadly scoped in ways that may produce misalignment.
The Alignment Problem from a Deep Learning Perspective — ICLR 2024 position paper by Ngo, Lawrence Chan, and Sören Mindermann on why deep-learning AGI may learn misaligned goals, deceptive behavior, and power-seeking strategies.
What should Multi-Agent Alignment aim to Achieve — Foresight workshop report including Ngo’s 2024 multi-agent alignment contribution.
AI Safety Fundamentals curriculum reference — Pointer to the alignment curriculum (originally named AGI Safety Fundamentals) created by Richard Ngo and collaborators.

Background Ethics And Pace

Meaningness ethics — Background philosophical reference on ethics and meaning.
Jenny Huang: Slow AI — Essay on AI systems that operate at human pace rather than maximal acceleration.