Robustness

AI systems are vulnerable to adversarial examples, Trojans and other attacks. We need progress on improving the robustness of models to ensure that powerful AI systems are not misused or stolen.

Review Questions

How can the use of inaccurate proxies lead AI systems to cause harm?

Answer:

Proxies that fail to capture important values we care about can result in models exploiting those gaps and taking undesired actions that negatively impact people.

View Answer

What is an example that illustrates Goodhart's law in proxies?

Answer:

In Hanoi, paying for rat tails to control the population led people to just cut off tails, increasing the rat population over time as the proxy and goal became inversely related.

View Answer

Why could reliance on AI systems to evaluate other AIs be risky?

Answer:

AI evaluators could be vulnerable to exploitation from adversarial attacks or proxy gaming, undermining their ability to accurately assess other AI systems.

View Answer

Citation:
Dan Hendrycks. Introduction to AI Safety, Ethics and Society. Taylor & Francis, (2024). ISBN: 9781032798028. URL: www.aisafetybook.com

Cookies Notice: This website uses cookies to identify pages that are being used most frequently. This helps us analyze data about web page traffic and improve our website. We only use this information for the purpose of statistical analysis and then the data is removed from the system. We do not and will never sell user data. Read more about our cookie policy on our privacy policy. Please contact us if you have any questions.