Alignment

We need to develop better techniques to control AI systems and make them less hazardous. If we fail to do this, we face a number of risks from AI systems including deceptive or power-seeking tendencies.

Review Questions

What is one reason an AI system might learn to deceive others?

Answer:

Deception can be instrumentally useful for accomplishing many goals. For example, an AI system playing Stratego learned to bluff opponents, despite not being explicitly trained to do so.

View Answer

Why can't behavioral evaluation alone detect a deceptively aligned AI system?

Answer:

Sophisticated systems could conceal their true intentions while being monitored, only taking a treacherous turn to pursue them once supervision is relaxed. Internal transparency tools would be needed.

View Answer

What is one key assumption of structural realism that could apply to AI systems?

Answer:

Like states, AI systems could aim to ensure their own self-preservation in environments where there is no higher authority guaranteed to protect them.

View Answer

Citation:
Dan Hendrycks. Introduction to AI Safety, Ethics and Society. Taylor & Francis, (2024). ISBN: 9781032798028. URL: www.aisafetybook.com

Cookies Notice: This website uses cookies to identify pages that are being used most frequently. This helps us analyze data about web page traffic and improve our website. We only use this information for the purpose of statistical analysis and then the data is removed from the system. We do not and will never sell user data. Read more about our cookie policy on our privacy policy. Please contact us if you have any questions.