A Few Thoughts on Compute Scaling

To scale, or not to scale? It's a genuinely interesting question. Scaling laws are among the most famous ideas in AI and ML, and opinions on them vary widely. So I want to share some thoughts on compute scaling, which is one piece of the broader scaling-law picture.
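For readers who haven't seen them written out, these laws are usually stated as a power law: loss keeps dropping as you grow parameters and training tokens together. Below is a minimal sketch of that Chinchilla-style form; the constants are illustrative placeholders I chose for the example, not fitted values from any paper.

```python
# A Chinchilla-style scaling law: loss falls off as a power law in model size
# N (parameters) and data size D (training tokens). The constants below are
# illustrative placeholders, not fitted values from any paper.

def scaling_loss(n_params: float, n_tokens: float,
                 E: float = 1.7, A: float = 400.0, B: float = 410.0,
                 alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted loss L(N, D) = E + A / N**alpha + B / D**beta."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Compute only helps if parameters and tokens grow together; compare a
# 70B-parameter model trained on 1.4T tokens versus 2.8T tokens.
print(scaling_loss(70e9, 1.4e12))   # fewer tokens -> higher predicted loss
print(scaling_loss(70e9, 2.8e12))   # more tokens  -> lower predicted loss
```

The point of the functional form is simply that more compute, spent in a balanced way, keeps buying lower loss, which is why the scaling debate is really a debate about how far that curve extends.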
I believe there's still significant scaling potential for LLM training in terms of compute. It requires simultaneous commitments of funding and hardware, and the current trend looks promising. We should be cautious about compute scaling turning into an arms race between companies, but I think competition is exactly what we need right now. The key is to stay focused on training efficiency and avoid the "paperclip" failure mode, where we pour in resources without getting anything of real value back. Otherwise, even if we threw all of humanity's resources at it, we wouldn't see meaningful results, and the fallout could be catastrophic for the global ecosystem, completely contradicting the vision of building an AGI system that benefits all of humanity.

To be honest, even with high training efficiency, model training is incredibly expensive. A couple of months ago, Microsoft and OpenAI were reported to be planning a roughly $100 billion investment in a massive compute center. In an interview last month, Anthropic CEO Dario Amodei said that their current funding is sufficient for training the next generation of models, but that he's unsure about next year; he expects that training runs next year could cost tens to hundreds of billions of dollars.
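To get a feel for where numbers like that come from, here's a rough, hedged estimate of the compute cost of a single training run. The 6·N·D FLOPs rule of thumb is standard for dense transformers; the accelerator throughput, utilization, and price per GPU-hour are assumptions I picked purely for illustration.

```python
# Back-of-the-envelope training cost, using the common ~6 * N * D FLOPs rule of
# thumb for dense transformer training (N = parameters, D = training tokens).
# The hardware and price figures are assumptions picked for illustration only.

def training_cost_usd(n_params: float, n_tokens: float,
                      peak_flops: float = 1e15,   # assumed ~1 PFLOP/s per accelerator
                      utilization: float = 0.4,   # assumed fraction of peak actually sustained
                      price_per_gpu_hour: float = 3.0) -> float:
    total_flops = 6 * n_params * n_tokens
    gpu_hours = total_flops / (peak_flops * utilization) / 3600
    return gpu_hours * price_per_gpu_hour

# A hypothetical 1T-parameter model trained on 20T tokens:
print(f"${training_cost_usd(1e12, 20e12):,.0f}")   # ~$250 million under these assumptions
```

And that's only the raw compute for one run under friendly assumptions; experiments, failed runs, inference, and the data centers themselves are what push the totals toward the figures above.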

Some might argue that instead of focusing so much on scaling up compute to improve model capabilities, we should be researching more efficient model architectures. However, I think that kind of bet only works if it can deliver results in the short term, with enormous potential and clear feasibility. Otherwise, for the big players, it's essentially a gamble, and the reality is they can't afford to gamble: once you fall behind, it's very difficult to catch up. Now it's just a matter of seeing who gets the last laugh in this long race.