Testing the New Claude-3.5 Sonnet

Anthropic just released Claude-3.5 Sonnet (see the announcement here) and promised to release the rest of the models in the family later this year (I hate this 🫠).

Tl;dr

  • 3.5 Sonnet is more capable than 3 Opus yet cheaper;
  • 3.5 Sonnet is really good at reasoning tasks (it will ask you for "elaboration" frequently);
  • 3.5 Sonnet is good at vision;
  • 3.5 Sonnet has an updated knowledge cutoff date;
  • and more...

Current AI Development Path

Accelerate faster please.🥺

Performance

As you can see, the new Claude-3.5 Sonnet is much more capable than Claude-3 Sonnet. What's more, it even outperforms Claude-3 Opus, which is quite amazing. We now have a smaller model with better capability, and I can't wait for 3.5 Opus and Haiku.

On text and vision, the new Claude-3.5 Sonnet also surpasses Claude-3 Opus and even GPT-4o. On reasoning, its score is much higher than GPT-4o's. (So where is the next frontier model from OpenAI???)

Better Reasoning

After rerunning some questions that I had previously tried on Claude-3 Opus, I found a big leap in reasoning ability.

So as you can see, the new model is a lot better. One small detail: the new Claude-3.5 Sonnet keeps asking whether the user needs elaboration. It's great; at least it makes the conversation more interactive (I guess?).

BUT for complex tasks, 3.5 Sonnet may still hallucinate slightly. Still, thanks to Constitutional AI, it's a lot better than GPT-4o.

Better Vision

I haven't done any further testing on this, but according to Anthropic, the new model is a lot better at vision. They even have a demo:

Seems better than GPT-4o and Claude-3 Opus...

New Knowledge Cutoff Date

The knowledge cutoff date for the new model has also been updated to April 2024.

I haven't done deep testing on the new knowledge. However, I ran one specific test: asking the model about the new TOEFL writing format (starting from August 2023, so it should be in the training data). But...
It's still referring to the old one. Well, at least GPT-4o is also wrong...🤦‍♂️

What's next?

Claude-3.5 Opus and Haiku! Yeah! But,

bruh...🤦‍♂️🤦‍♂️

In conclusion

The new Claude-3.5 Sonnet is really great, and I will run more experiments on it! Stay tuned. 🫡

To finish, I'll just leave a meme...