A Few Thoughts on AI Security and AGI

TL;DR

  • AI security crucial for future development; advocates for precautions
  • Seeks balance between AI capabilities and safety measures
  • AGI definition debatable; suggests exceeding average human ability in most domains
  • Emphasizes importance of AI understanding context-dependent language
  • Considers implications of advanced AI and hopes for beneficial outcomes

On AI Security

I think security is very important to the future of AI development. I have argued about this with someone for quite a long time: he thinks the current AI systems are not capable enough to pose a threat to us, while I think we should take precautions and be fully prepared for anything that could happen in the future. I mean, I didn’t believe in the “catastrophic consequences” shown in the Terminator movies in the past, but today I do fear the bad things that could come along with such rapid development. I really believe that work like alignment and security post-training is the lifeguard of humanity. It can set a guardrail around a model’s capabilities, around the “monster” inside; like locking the model in a cage so that we humans can study it from the outside without letting it harm us.

But this does not mean that I’m denying or rejecting the development of AI systems; I think the current ones are great. GPT-4o, Claude 3 Opus, and the upcoming Llama 3 400B are all quite awesome. I just want them to be more secure while staying capable, which means finding the right balance between the two. OpenAI also just announced that a new Safety and Security Committee will be formed; I hope it works ;)

By the way, I *really* love Anthropic; I love their models, and I love their research, especially the recent work on model interpretability[1]. I think the ability to “enable” features inside the model really paves a new way for discovering model capabilities. I was surprised when I first read the blog post.
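To make that idea concrete, here is a minimal sketch of what “enabling” a feature might look like, as I understand it from the blog post: a sparse autoencoder learns interpretable directions in the model’s activations, and you can steer the model by adding one of those directions back into the residual stream. Every name and shape below is hypothetical and for illustration only; this is not Anthropic’s actual code.

```python
import torch

def steer(residual: torch.Tensor,
          feature_direction: torch.Tensor,
          strength: float = 10.0) -> torch.Tensor:
    """Hypothetical feature steering: add a scaled feature direction
    to a residual-stream activation so the corresponding feature
    fires more strongly on the forward pass."""
    return residual + strength * feature_direction

# Toy usage with made-up shapes: one activation vector of width 4096
# and a unit-norm feature direction (in practice the direction would
# come from a trained sparse autoencoder's decoder).
residual = torch.randn(1, 4096)
direction = torch.randn(4096)
direction = direction / direction.norm()
steered = steer(residual, direction)
```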

On AGI

It’s a pretty abstract concept. I mean, how “general” is “general”? Even Anthropic’s ASL (AI Safety Level) classification is kinda fuzzy. I think achieving “AGI” does not necessarily mean we need an AI system that exceeds humans in all domains; we just need one that exceeds the average human’s ability in most domains. But before that, I think we should make sure the AI system can *really* understand:

  • Riddles
  • Jokes
  • Memes
  • Idioms
  • Stuff tied to specific cultural contexts

Although these may not seem so important, I think they reflect a model’s basic yet crucial language ability. These systems predict the next word, so unless they really understand the patterns beneath the words, they won’t be able to nail the points I mentioned. Conversely, if a model is strong at next-word prediction, i.e. it really understands the hidden structure between words, it will probably be quite capable overall. (I learnt this from a podcast with Ilya, hahaha)
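For what it’s worth, next-word prediction itself is easy to see in code. Below is a minimal sketch using the Hugging Face transformers library, with GPT-2 as a small stand-in model (my choice for illustration, not one of the models mentioned above): the model assigns a probability to every candidate next token given the context.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 as a small stand-in; any causal language model works the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Time flies like an"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# The logits at the last position score every candidate next token.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx)!r}: {p:.3f}")
```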

But then again, what would happen if an AI system really exceeded humans not just on average but in all domains? Would we be replaced? Only time will tell. The only thing we can do is get fully prepared and make sure the frontier of the future reaches a balance.


I hope AGI can really benefit humanity in the future.

Read More:

[1] Anthropic, “Mapping the Mind of a Large Language Model”, May 2024.