Deriving the spirit of the law
Publication date
Authors
DOI
Document Type
Bachelor Thesis
License
CC-BY-NC-ND
Abstract
I define two approaches to rule-based AI safety: the letter-based approach, which simply constrains an agent’s behavior to satisfy a set of static conditions, and the spirit-based approach, which aims to let the agent act in accordance with the intent behind those rules. I explore the conditions under which a letter-based approach is insufficient. I then describe one prominent letter-based approach to AI safety, show how it represents rules in STIT logic, and offer a mechanism, based on a version space learning algorithm, for inferring a generalization from those rules that aims to approximate their intention. I finish with a small experiment.
Keywords
AI safety, AI alignment