What is superposition in interpretability?

Superposition refers to the concept that a neural network can sometimes use the same set of parameters to encode the recognition of multiple different disparate features. Since these features might not be related to each other this concept of superposition makes the task of interpretability of neural networks considerably harder. This concept is closely related to the concept of polysemantic neurons.



AISafety.info

AISafety.info is a project founded by Rob Miles. The website is maintained by a global team of specialists and volunteers from various backgrounds who want to ensure that the effects of future AI are beneficial rather than catastrophic.

© AISafety.info, 2022—1970

Aisafety.info is an Ashgro Inc Project. Ashgro Inc (EIN: 88-4232889) is a 501(c)(3) Public Charity incorporated in Delaware.