Single Agent Safety

There are several dynamics that may cause machine learning systems to act contrary to the intentions of their designers. ML systems lack transparency regarding how they make decisions and are vulnerable to adversarial attacks. They are also prone to pursuing goals in unintended and harmful ways, and can prove difficult to control adequately. Researchers looking to advance the safety of AI systems often inadvertently accelerate progress in their capabilities, thereby increasing overall risks.

Summary

ML systems have grown more competent and general as the field of deep learning has matured. Reasoning about the behavior and internal structure of such systems can be challenging, especially since some failure modes arise only once an AI system is sufficiently sophisticated. We discuss some of the fundamental technical challenges around monitoring, robustness and control of AI systems. Current AI systems lack transparency and can exhibit surprising emergent capabilities. They are vulnerable to adversarial examples, Trojans and other attacks. These challenges in turn may make it hard to control AI systems and prevent unintended behavior such as deception. When conducting research to advance AI safety, it is important to consider the risk of inadvertently accelerating AI capabilities in a way that undermines the overall goal of better understanding and controlling AI systems.

Discussion Questions

Choose one of the sections from this chapter and summarize briefly what you see as the key safety challenges described there. What are some questions that would be interesting to explore further to go deeper on this topic?
What links do you see between the different safety challenges discussed in this chapter?

Review Questions

Citation:
Dan Hendrycks. Introduction to AI Safety, Ethics and Society. Taylor & Francis (2024). ISBN: 9781032798028. URL: www.aisafetybook.com

