IBM's SELF-ALIGN: The Next Step in AI Learning Beyond ChatGPT
Step into the future of AI with IBM's latest breakthrough, SELF-ALIGN. This innovative method is changing how AI learns, requiring minimal human supervision. By teaching an AI to guide itself using a small set of principles, IBM researchers have made real strides in AI efficiency and control. Curious about how IBM is setting new standards in the world of AI? I will try to explain it.
For me it started with a text message from Fillip Hammerstad, a friend of mine:
"You should write about this tweet!" https://twitter.com/generatorman_ai/status/1655941986627772419?s=46&t=aXh4A79odS_B6vwL-Q6spw
So I followed it down the rabbit hole.
IBM researchers have a project homepage here: https://github.com/IBM/Dromedary
The complete research paper can be found here:
https://arxiv.org/abs/2305.03047
What follows is my reading of it, based on the tweet and my understanding of the project homepage. The research paper itself I need more time to digest.
Today, most AI chatbots, like ChatGPT, are trained using a lot of human input. Humans provide example conversations (annotations) and give feedback on the AI's responses (reinforcement learning from human feedback). This helps to shape the AI's output, ensuring it is useful, ethical, and reliable. However, this process is expensive and time-consuming. It also introduces potential problems, like inconsistent responses and biases. As far as I can tell, most AI apps and websites use this approach.
SELF-ALIGN tries to solve these problems by reducing the amount of human supervision needed. As I understand it, the model essentially trains itself as it grows, a bit like a person studying on their own. It does this through four main steps:
1. Generating synthetic prompts: The AI generates its own practice questions (prompts) to learn from. A special method is used to make sure these questions cover a wide range of topics.
2. Guiding the AI with principles: Instead of needing lots of human input, the AI uses a small set of human-written principles to guide its responses. It learns to apply these principles through a handful of example demonstrations.
3. Fine-tuning the AI with self-aligned responses: Once it has learned from these principles, the AI is fine-tuned on its own responses. This means the AI can then answer queries without needing the principle set and demonstrations in its prompt.
4. Refining responses: Lastly, the AI undergoes a refinement step to improve its responses. This step helps to address issues like overly short or indirect answers.
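To make the four steps concrete, here is a minimal Python sketch of the pipeline as I understand it. This is purely illustrative: `base_model` is a stub standing in for a real language model, and all the names (`TOPICS`, `PRINCIPLES`, the helper functions) are my own assumptions, not IBM's actual code or API.

```python
import random

def base_model(prompt):
    """Stand-in for an untuned base LLM: returns a canned completion."""
    return f"(model completion for: {prompt[:40]}...)"

# Step 1: topic-guided synthetic prompt generation.
# A topic list (hypothetical) spreads the practice questions across domains.
TOPICS = ["health", "law", "coding", "travel", "history"]

def generate_synthetic_prompts(n, seed=0):
    rng = random.Random(seed)
    return [base_model(f"Write a question about {rng.choice(TOPICS)}:")
            for _ in range(n)]

# Step 2: principle-driven responses. The small human-written principle
# set is placed in the prompt so the model's answer is guided by it.
PRINCIPLES = [
    "1. Be helpful and answer the question directly.",
    "2. Refuse harmful or unethical requests.",
    "3. Admit uncertainty instead of guessing.",
]

def principled_response(question):
    context = "\n".join(PRINCIPLES) + f"\nQuestion: {question}\nAnswer:"
    return base_model(context)

# Step 3: build (prompt, self-aligned answer) fine-tuning pairs. After
# fine-tuning on these, the model no longer needs the principles in-context.
def build_finetune_data(prompts):
    return [(p, principled_response(p)) for p in prompts]

# Step 4: refinement pass, e.g. expanding overly short answers.
def refine(answer, min_len=20):
    if len(answer) < min_len:
        return base_model(f"Expand this answer in more detail: {answer}")
    return answer

data = [(q, refine(a))
        for q, a in build_finetune_data(generate_synthetic_prompts(3))]
print(len(data))  # 3 (question, answer) pairs ready for fine-tuning
```

The key design idea the sketch tries to capture is step 3: the principles guide the answers during data generation, but are deliberately dropped from the fine-tuning inputs, so the tuned model internalizes them rather than reading them at inference time.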
The researchers tested SELF-ALIGN on a new AI system they developed, called Dromedary. They found that with fewer than 300 lines of human input, Dromedary outperformed other AI systems on a variety of benchmark tests.
Watson Dromedary, anyone? A nice thing about IBM is that the research is public, and I am sure experts will figure out some applications for this.