3.2 Monitoring

Current AI systems lack transparency and can exhibit surprising emergent capabilities. Research is needed to ensure we can understand models' internal representations, monitor anomalies, and evaluate hazardous capabilities.


Review Questions

Why would we want to develop tools that make AI decision-making more transparent?

Answer:

This can be important for reasons of fairness and justice: when AI systems are being used to make important decisions, for example in the criminal justice system, those on the receiving end of these decisions might have a right to receive an explanation of the decision. Transparency can also help with accountability for harms and hazards caused by AI systems: we could use it to set standards for developers’ liability based on what they could reasonably have foreseen and prevented. Lastly, transparency can enable us to detect and combat instances of deceptive behaviour by AI systems.


Why can explanations that models provide for their decisions be unreliable?

Answer:

Explanations generated by a model can sound plausible while being confabulated: the model gives reasons that are not faithful to the internal processes that actually produced its decision, so the stated explanation may have little to do with why the output was really chosen.

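As a toy illustration, one simple way to probe faithfulness is to perturb the feature an explanation cites and check whether the decision actually changes. The Python sketch below uses an entirely made-up model and feature names; it is an illustrative assumption, not a method described in the text.

```python
# Sketch: if an explanation claims feature X drove the decision, then changing X
# should change the decision. Here the stated explanation cites the wrong feature,
# so the check fails. The model, features, and values are all hypothetical.

def toy_model(features: dict) -> str:
    # The model's real decision rule secretly depends only on "income".
    return "approve" if features["income"] > 50_000 else "deny"

def toy_explanation(features: dict) -> str:
    # The stated explanation cites a different feature, so it is confabulated.
    return "credit_history"

def is_explanation_faithful(model, cited_feature: str, features: dict, perturbed_value) -> bool:
    """Check whether changing the cited feature actually changes the decision."""
    original = model(features)
    perturbed = dict(features, **{cited_feature: perturbed_value})
    return model(perturbed) != original

applicant = {"income": 60_000, "credit_history": "good"}
cited = toy_explanation(applicant)
# Prints False: perturbing the cited feature leaves the decision unchanged,
# which suggests the explanation is not faithful to the model's real process.
print(is_explanation_faithful(toy_model, cited, applicant, "poor"))
```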

Explain what mechanistic interpretability and representation engineering are and how they differ as approaches to transparency.

Answer:

Mechanistic interpretability works bottom-up: it aims to identify low-level components of the model, such as individual neurons and the circuits they form, and to combine these into an account of how the model produces its behaviour. Representation engineering works top-down: it starts by identifying how models internally represent high-level concepts of interest, and uses those representations to analyse and control the model's behaviour. The main difference is the level of analysis: mechanistic interpretability builds understanding up from individual components, while representation engineering reads and steers higher-level internal representations without requiring a full component-level account.

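To make the representation engineering recipe above more concrete, here is a minimal Python sketch. The dimensions are arbitrary and random arrays stand in for a real model's hidden activations; difference-of-means is just one simple way to estimate a concept direction, used here purely for illustration.

```python
# Sketch of the representation-engineering idea: estimate a direction in a model's
# hidden activations that tracks a concept, project onto it for analysis, and add
# it back in to steer behaviour. All data here is random stand-in material.
import numpy as np

def concept_direction(pos_activations: np.ndarray, neg_activations: np.ndarray) -> np.ndarray:
    """Estimate a concept direction as the (normalised) difference of mean
    activations between concept-positive and concept-negative prompts."""
    direction = pos_activations.mean(axis=0) - neg_activations.mean(axis=0)
    return direction / np.linalg.norm(direction)

def concept_score(activation: np.ndarray, direction: np.ndarray) -> float:
    """Analysis: project a new activation onto the concept direction to monitor
    how strongly the concept is represented for a given input."""
    return float(activation @ direction)

def steer(activation: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Control: nudge the activation along the concept direction before it is
    passed onward (alpha > 0 strengthens the concept, alpha < 0 suppresses it)."""
    return activation + alpha * direction

# Toy usage with random arrays; a real experiment would record hidden states
# from a language model on contrasting prompt pairs.
rng = np.random.default_rng(0)
pos = rng.normal(size=(32, 512))   # activations on concept-positive prompts
neg = rng.normal(size=(32, 512))   # activations on concept-negative prompts
d = concept_direction(pos, neg)

h = rng.normal(size=512)           # activation for a new input
print("concept score:", concept_score(h, d))
h_steered = steer(h, d, alpha=2.0)
```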