Deriving the spirit of the law

I define two approaches to rule-based AI Safety. The letter-based approach simply constrains an agent’s behavior to satisfy a set of static conditions. The spirit-based approach somehow lets the agent act in accordance with what those rules were intended to achieve. I explore the conditions under which a letter-based approach is insufficient. I then describe one prominent letter-based approach to AI Safety and how it represents rules in STIT ("seeing to it that") logic. I also offer a mechanism for inferring a generalization from those rules that aims to approximate their intention, using a version space learning algorithm. I finish with a small experiment.
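The abstract names version space learning as the mechanism for generalizing from rules, but it does not give the hypothesis language. As a rough illustration of the general technique only, the sketch below uses Mitchell's textbook List-Then-Eliminate algorithm over conjunctive attribute-value hypotheses. It does not reproduce the thesis's STIT-based representation. Several things here are assumptions: the toy attributes in DOMAINS, the EXAMPLES, the treatment of rule-compliant or rule-violating situations as labelled examples, and every function name. The version space holds every generalization consistent with the examples. When all remaining hypotheses agree on a new situation, the inferred "spirit" is unambiguous. When they disagree, the classifier returns None.

```python
from itertools import product

# Hypothetical toy domain: each situation is a tuple of discrete attribute values.
DOMAINS = [
    ("day", "night"),        # time of day
    ("road", "sidewalk"),    # location
    ("fast", "slow"),        # speed
]
WILDCARD = "?"

def all_hypotheses(domains):
    """Enumerate every conjunctive hypothesis: each attribute is a value or '?'."""
    return list(product(*[values + (WILDCARD,) for values in domains]))

def covers(hypothesis, instance):
    """A hypothesis covers an instance if every non-wildcard attribute matches."""
    return all(h == WILDCARD or h == x for h, x in zip(hypothesis, instance))

def list_then_eliminate(examples, domains):
    """Keep only the hypotheses consistent with every labelled example."""
    space = all_hypotheses(domains)
    for instance, label in examples:
        space = [h for h in space if covers(h, instance) == label]
    return space

def classify(version_space, instance):
    """Return True/False if all remaining hypotheses agree, else None (ambiguous)."""
    votes = {covers(h, instance) for h in version_space}
    return votes.pop() if len(votes) == 1 else None

# Hypothetical labelled situations: True = permitted, False = forbidden.
EXAMPLES = [
    (("day", "road", "slow"), True),
    (("night", "road", "slow"), True),
    (("day", "sidewalk", "fast"), False),
]

if __name__ == "__main__":
    vs = list_then_eliminate(EXAMPLES, DOMAINS)
    print(vs)
    print(classify(vs, ("night", "road", "fast")))  # hypotheses disagree
```

The enumeration is exponential in the number of attributes, so it only suits toy domains. Candidate elimination, which tracks just the most specific and most general boundaries of the space, is the usual scalable alternative.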

Keywords

AI safety, AI alignment

URI

https://studenttheses.uu.nl/handle/20.500.12932/33698
