ML systems have grown more competent and general as the field of deep learning has matured. Reasoning about the behavior and internal structure of such systems can be challenging, especially since some failure modes arise only once an AI system is sufficiently sophisticated. We discuss some of the fundamental technical challenges around monitoring, robustness and control of AI systems. Current AI systems lack transparency and can exhibit surprising emergent capabilities. They are vulnerable to adversarial examples, Trojans and other attacks. These challenges, in turn, can make it hard to control AI systems and to prevent unintended behavior such as deception. When conducting research to advance AI safety, it is important to consider the risk of inadvertently accelerating AI capabilities in a way that undermines the overall goal of better understanding and controlling AI systems.
Citation:
Dan Hendrycks. Introduction to AI Safety, Ethics and Society. Taylor & Francis, 2024. ISBN: 9781032798028. URL: www.aisafetybook.com