Accurate Sublayer Pruning for Large Language Models by Exploiting Latency and Tunability Information