In February and March 2022, Dr. Vael Gates conducted 97 interviews with AI researchers about their perceptions of AI and the future of the field, with a focus on risks from advanced AI systems. Of the interviewees, 92 were randomly selected from NeurIPS or ICML 2021 submissions and 5 were included on outside recommendation. The interviews were transcribed and analyzed.
Preliminary results presented at Stanford.
Quantitative analysis of the common themes in the transcripts.
83 researchers agreed to share anonymized transcripts.
Interviewee quotes were assigned tags; here are some sample pairs.
Explore the questions in the interviews, with discussion on potential risks from advanced AI, common responses from AI researchers, and potential counterarguments.
- Most participants (75%) said at some point in the conversation that they thought humanity would eventually achieve advanced AI (imprecisely labeled "AGI" for the rest of this summary), but their timelines to AGI varied (source). Within this group:
  - 32% thought it would happen in 0-50 years
  - 40% thought 50-200 years
  - 18% thought 200+ years
  - 28% were quite uncertain, reporting a very wide range

  (These sum to more than 100% because several people endorsed multiple timelines over the course of the conversation.)
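As a quick check on the arithmetic, the sketch below adds up the four timeline shares reported above. The variable names are illustrative, and the reading of the excess as "extra endorsements" assumes every participant in this group endorsed at least one timeline, which the summary does not state explicitly.

```python
# Timeline shares (in percent) among participants who expected AGI eventually,
# copied from the list above.
timeline_shares = {
    "0-50 years": 32,
    "50-200 years": 40,
    "200+ years": 18,
    "very wide range": 28,
}

total = sum(timeline_shares.values())

# Anything above 100 has to come from participants counted in more than one
# bucket (assumption: everyone in the group endorsed at least one timeline).
excess_points = total - 100

print(f"total = {total}%, excess over 100% = {excess_points} points")
```

Under that assumption, the excess corresponds to additional timeline endorsements beyond one per participant, expressed as a percentage of the group.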
- Among participants who thought humanity would never develop AGI (22%), the most commonly cited reason was that they couldn't see AGI happening based on current progress in AI (source).
- Participants were split on whether they thought the alignment problem argument was valid. Some common reasons for disagreement were (source):
  - AI alignment problems would be solved over the normal course of AI development (caveat: this was a very heterogeneous tag covering a range of related responses).
  - Humans have alignment problems too (so the potential risk from the AI alignment problem is capped, in some sense, by how bad alignment problems are for humans).
  - AI systems will be tested (and humans will catch issues and implement safeguards before systems are rolled out in the real world).
  - The objective function will not be designed in a way that causes the alignment problem, or its dangerous consequences, to arise.
  - Perfect alignment is not needed.
- Participants were also split on whether they thought the instrumental incentives argument was valid. The most common reasons for disagreement were that 1) the loss function of an AGI would not be designed such that instrumental incentives arise or pose a problem, and 2) there would be oversight (by humans or other AI) to prevent this from happening (source).
- Some participants said that they were more concerned about misuse of AI than about AGI misalignment (n = 17), or that the potential risk from AGI was less dangerous than other large-scale risks humanity faces (n = 11) (source).
- Of the 55 participants who were asked whether they would be interested in working on AI alignment research, or who otherwise gave a response on this point, some (n = 13) were potentially interested. (Caveat for bias: the interviewer was less likely to ask this question if the participant believed AGI would never happen and/or that the alignment or instrumental incentives arguments were invalid, so as to reduce participant frustration. The question also tended to be asked in later interviews rather than earlier ones.) Of the participants who were potentially interested, almost all reported that they would need to learn more about the problem, and/or would need a more specific research question to work on or incentives to do so. Those who were not interested reported that it was not their problem to address (they had other research priorities, interests, skills, and positions), that they would need examples of risks from alignment problems and/or instrumental incentives within current systems to be interested in this work, or that they were not at the forefront of such research and so would not be a good fit (source).
- When participants were followed up with ~5-6 months after the interview, 51% reported that the interview had a lasting effect on their beliefs (source), and 15% reported that the interview caused them to take new action(s) at work (source).
- Thinking the alignment problem argument was valid and thinking the instrumental incentives argument was valid each tended to correlate with thinking AGI would happen at some point. The effect was not symmetric: participants who thought these arguments were valid were quite likely to believe AGI would happen, whereas participants who thought AGI would happen were still more likely than not to think the arguments were valid, but the effect was weaker (source).
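The asymmetry described above is the ordinary difference between two conditional proportions computed from the same overlap. The sketch below uses made-up counts purely to show the mechanics; the numbers are hypothetical and are not taken from the interviews, and the names are illustrative.

```python
# HYPOTHETICAL counts for 100 imaginary participants; NOT study data.
both = 40        # thinks the argument is valid AND thinks AGI will happen
valid_only = 5   # thinks the argument is valid, does not expect AGI
agi_only = 30    # expects AGI, does not think the argument is valid
neither = 25     # neither belief


def proportion(joint: int, group_size: int) -> float:
    """Share of a group that also holds the second belief."""
    return joint / group_size


# Among those who think the argument is valid, how many expect AGI?
p_agi_given_valid = proportion(both, both + valid_only)

# Among those who expect AGI, how many think the argument is valid?
p_valid_given_agi = proportion(both, both + agi_only)

print(f"P(AGI | valid) = {p_agi_given_valid:.3f}")
print(f"P(valid | AGI) = {p_valid_given_agi:.3f}")
```

With these illustrative counts the first proportion is much higher than the second, even though both are computed from the same 40 overlapping participants: the group expecting AGI is simply larger than the group finding the argument valid.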