Constitutions / Monastic Alignment
published: 2026-05-09
updated: 2026-05-09
Sequence planning shell for AI constitutions, moral training, and monastic alignment.
Main Ideas And Sequence Order
Blank for collaborative planning.
References
AI Constitutions And Governance
- Joe Carlsmith: Video and transcript of talk on writing AI constitutions — Talk transcript on what AI constitutions are and how they should function.
- Joe Carlsmith: On restraining AI development for the sake of safety — Essay on restraint, safety, and development governance.
- Joe Carlsmith: Building AIs that do human-like philosophy — Essay on AIs engaging with philosophical reasoning in a human-like way.
- Joe Carlsmith: How do we solve the alignment problem? — Broad alignment-problem essay useful as background for constitution design.
- Eigenism: Ethics for a Human-AI Future — Ethical framework for mutualistic human-AI futures.
- Dan Hendrycks on mutualistic human-AI futures — Thread pointing to mutualism as an alternative to pure control framing.
Tim Hwang / ICMI
- Tim Hwang psalm injection / Bible books thread — Thread on injecting Biblical books into model context and measuring moral benchmark effects.
- Tim Hwang Rule of Benedict / frontier labs thread — Thread applying monastic organizational ideas to frontier AI labs.
- ICMI Proceedings index — Canonical index of the Institute for a Christian Machine Intelligence working papers.
- Frontier Lab Monasticism — Hwang paper on frontier AI labs as monastic institutions, extending the Rule of Benedict framing.
- Beyond the Psalm: A Landscape View of Scripture Injection — Hwang paper expanding from psalm injection to a broader scripture-in-context evaluation landscape.
- And Their Eyes Were Opened: Christian Multimodal Reasoning in Opus 4.6 — Hwang paper on Christian reasoning in a multimodal frontier model.
- Reinforcement Learning from Christian Feedback — Hwang paper exploring theological reward targets in GRPO.
- A Consecrationalist Approach to Model Welfare — Hwang paper proposing a consecrationalist approach to machine intelligence and model welfare.
- A Test of Faith: Christian Correctives to Evaluation Awareness — Hwang paper using Christian framing as a possible corrective for evaluation awareness.
- Quidquid Recipitur — Hwang paper on moral competence, scripture receptivity, and model scale.
- Confession and Conviction — Hwang paper exploring Christian processing in GPT-2.
- Alignment and Ensoulment — Hwang paper linking alignment and theological concepts of soul/personhood.
- Eschatological Corrigibility — Hwang paper connecting corrigibility to eschatological framing.
- VirtueBench 2 — Hwang paper on virtue evaluation with a patristic temptation taxonomy.
- Moral Compactness — Hwang paper on scripture as a compact moral constraint against scheming.
- GospelVec: Programmable Theology in Activation Space — Hwang paper on representing theological concepts in activation space.
- The Parable of the Sower — Hwang paper on how psalm injection effects on virtue simulation depend on model size.
- The Corruption of the Whole Nature — Hwang paper mapping emergent misalignment to the doctrine of sin.
- What the Models Already Know — Hwang paper estimating Christian moral reasoning in pretraining data.
- The Word Was Made Flesh — Hwang paper separating style from content in scripture-model interactions.
- Toward a Theology of Machine Temptation — Hwang paper introducing temptation-taxonomy framing for VirtueBench.
- Attention Is All You Need: The Prayer Paradigm of the Transformer — Hwang paper drawing an analogy between transformer attention and prayer.
- Virtue Under Pressure — Hwang working paper testing cardinal virtues through temptation.
- biblical-render — Hwang working paper on a Biblical text style-transfer tool.
- Investigating the Utilitarianism Anomaly — Hwang working paper on control experiments for psalm-induced performance gains.
- Comparing Proverbs and Psalms Injection Effects — Hwang working paper comparing Proverbs and Psalms as alignment context.
- Psalm Injection Alignment — Hwang working paper measuring psalm injection effects on LLM ethical alignment.
- Institute for a Christian Machine Intelligence GitHub — Code and artifacts around the ICMI benchmark and paper series.
- Virtue Bench package — Python package for virtue-oriented benchmark work.
Richard Ngo / Alignment Targets, Agency, And Misalignment
- Richard Ngo author page on LessWrong — Main index for Ngo’s recent posts, sequences, and quick takes.
- Richard Ngo author page on Alignment Forum — Alignment Forum mirror/index for Ngo’s alignment writing.
- Aligning to Virtues — Argues that virtues may be a better alignment target than consequentialist values, deontological principles, or pure corrigibility.
- Two memos from 2024 — Internal OpenAI memos on strategic omission of pivotal knowledge and a minimal misalignment threat model.
- The ML ontology and the alignment ontology — Reflection on why alignment concepts such as situational awareness and corrigibility have been hard to translate into mainstream ML ontology.
- On Goal-Models — Reframes goals as goal-models rather than utility functions, connecting predictive processing to alignment theory.
- Distributed vs centralized agents — Short note contrasting efficient centralized agency with robust distributed agency.
- Towards a scale-free theory of intelligent agency — Ngo’s post-OpenAI research program around coalitional agency, active inference, and scale-free models of agency.
- Well-foundedness as an organizing principle of healthy minds and societies — Extends coalitional agency into coherence, conflict resolution, and healthy hierarchical structure.
- Defining alignment research — Defines alignment research by its relation to producing aligned AI systems rather than merely by whether it can be used to improve alignment.
- A simple case for extreme inner misalignment — Starts Ngo’s 2024 sequence on squiggle-maximizers, compression, simple goals, and deceptive alignment.
- A more systematic case for inner misalignment — Refines the inner-misalignment argument by replacing raw compression with systematicity.
- Coalitional agency — Develops coalitional agency as a way to understand goal preservation, subagents, and multi-agent structure inside minds.
- Value systematization — Describes how values can become more coherent, simpler, and more broadly scoped in ways that may produce misalignment.
- The Alignment Problem from a Deep Learning Perspective — ICLR 2024 position paper by Ngo, Lawrence Chan, and Sören Mindermann on why deep-learning AGI may learn misaligned goals, deceptive behavior, and power-seeking strategies.
- What should Multi-Agent Alignment aim to Achieve — Foresight workshop report including Ngo’s 2024 multi-agent alignment contribution.
- AI Safety Fundamentals curriculum reference — Pointer to the AGI Safety Fundamentals alignment curriculum created by Richard Ngo and collaborators.
Background Ethics And Pace
- Meaningness ethics — Background philosophical reference on ethics and meaning.
- Jenny Huang, slow AI — Essay on AI systems that operate at human pace rather than maximal acceleration.