How can I do machine learning programming work to help with AI alignment?

Jobs in this area tend to be competitive. The advice from the article about software engineering for AI alignment applies here as well.

Much of the work in this area likely increases existential risk by advancing AI capabilities. If you're working in machine learning (ML) and want to help reduce risk, beware of continuing to do net-harmful work on the strength of a vague plan to stay in the field. If you don't see a path toward applying your skills to alignment rather than capabilities, consider saving up money so you can take time off to work on your own projects, or donating that money. If you're already at an AI organization, consider trying to move onto its alignment team (DeepMind, for example, tends to let people switch from capabilities to alignment work) or finding another way to help with existential risk.

If you have the runway to quit your job, one way to get into this area is to contribute to open-source ML work at an organization like Eleuther. Eleuther has projects you can join, which can be a good way to build skills and prove yourself for future job opportunities. (But be careful to pick projects that don't increase existential risk; you can ask people at Eleuther whether and how different projects contribute to alignment versus capabilities.)

AISafety.info

We’re a global team of specialists and volunteers from various backgrounds who want to ensure that the effects of future AI are beneficial rather than catastrophic.