Text Details
By dynamically updating the bias of each expert according to its recent load, Loss-Free Balancing can consistently maintain a balanced distribution of expert load. In addition, since Loss-Free Balancing does not produce any interference gradients, it also elevates the upper bound of model performance gained from MoE training.
Loss-Free Balancing (other) by AUXILIARY-LOSS-FREE LOAD BALANCING STRATEGY FOR MIXTURE-OF-EXPERTS
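
The bias update described in the passage can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the sign-based update rule, the `update_rate` value, the synthetic skewed scores, and all function names (`route_top_k`, `update_bias`, `simulate`) are hypothetical choices made for the example.

```python
import numpy as np


def route_top_k(scores, bias, k):
    """Select k experts per token from biased scores.

    The bias only influences which experts are selected; it is not
    part of any loss, so it adds no gradient term (assumed setup).
    Returns an index array of shape (tokens, k).
    """
    biased = scores + bias
    return np.argsort(-biased, axis=1)[:, :k]


def update_bias(bias, expert_idx, num_experts, update_rate=0.01):
    """Nudge each expert's bias toward balanced load.

    Overloaded experts get a lower bias, underloaded experts a higher
    one. The fixed-step sign update is an assumption of this sketch.
    """
    load = np.bincount(expert_idx.ravel(), minlength=num_experts)
    error = load.mean() - load  # positive when the expert is underloaded
    return bias + update_rate * np.sign(error)


def simulate(num_steps=200, tokens=512, num_experts=8, k=2, seed=0):
    """Run routing on synthetic, deliberately skewed scores."""
    rng = np.random.default_rng(seed)
    skew = np.linspace(0.0, 1.0, num_experts)  # later experts score higher
    bias = np.zeros(num_experts)
    loads = []
    for _ in range(num_steps):
        scores = rng.random((tokens, num_experts)) + skew
        idx = route_top_k(scores, bias, k)
        loads.append(np.bincount(idx.ravel(), minlength=num_experts))
        bias = update_bias(bias, idx, num_experts)
    return bias, loads[0], loads[-1]


if __name__ == "__main__":
    final_bias, first_load, last_load = simulate()
    print("load at first step:", first_load)
    print("load at last step: ", last_load)
    print("final bias:        ", np.round(final_bias, 2))
```

Running the script prints the per-expert load before and after the bias has had time to adapt, which makes it easy to compare how evenly tokens are spread in this toy setting.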
| Language: | English |
|---|---|

This text has been typed 4 times:

| Avg. speed: | 60 WPM |
|---|---|
| Avg. accuracy: | 96.2% |