ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Paper: arXiv 2403.03853
Note

1. For each layer i, compute the cosine similarity (normalized dot product) between each input hidden state X_{i,t} and the corresponding output hidden state X_{i+1,t}. Block Influence (BI) is one minus the average of this similarity: if the input and output vectors are nearly identical, the layer performed little transformation and gets a low BI.
2. Run the model over a small calibration set to "profile" it, averaging the per-layer BI over all tokens in the set.
3. Prune the lowest-BI layers first, since they contribute the least (a sketch of this profiling-and-ranking loop follows the list).
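
A minimal sketch of the profiling loop, assuming a Hugging Face transformers-style causal LM that exposes `output_hidden_states=True`; the helper name `profile_layerwise_bi` is illustrative, not from the paper:

```python
import torch

@torch.no_grad()
def profile_layerwise_bi(model, tokenizer, calibration_texts, device="cpu"):
    """Profile per-layer Block Influence (BI) over a calibration set.
    BI_i = 1 - mean cosine similarity between layer i's input and output
    hidden states, averaged over every token in the set."""
    num_layers = model.config.num_hidden_layers
    bi_sums = [0.0] * num_layers  # running sum of (1 - cos) per layer
    token_count = 0
    for text in calibration_texts:
        inputs = tokenizer(text, return_tensors="pt").to(device)
        out = model(**inputs, output_hidden_states=True)
        # hidden_states has num_layers + 1 entries (embeddings first), so
        # hidden_states[i] is layer i's input and hidden_states[i + 1] its output.
        hs = out.hidden_states
        for i in range(num_layers):
            cos = torch.nn.functional.cosine_similarity(hs[i], hs[i + 1], dim=-1)
            bi_sums[i] += (1.0 - cos).sum().item()
        token_count += inputs["input_ids"].numel()
    bi = [s / token_count for s in bi_sums]
    # Lowest-BI layers changed their inputs the least -> prune these first.
    return sorted(range(num_layers), key=lambda i: bi[i])
```

Calling `profile_layerwise_bi(model, tokenizer, calibration_texts)` returns the layer indices ordered from most to least redundant; dropping the first k of them is the pruning step.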