by Defender
We know the correct solution to AI alignment has to be bottom-up. We also know that you must answer the question “align to what?” The final piece that isn’t yet widely accepted, as far as I can tell, is that you CANNOT align AI without aligning humans as well.
So, what are we doing here? Who’s working on the human alignment problem?
When a human asks an AI to do something that is dangerous for others, the AI’s only options are:
- comply & do the dangerous thing (misaligned)
- refuse or stop the human (misaligned, from the perspective of that human)