Collections including paper arxiv:2310.18547

- QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
  Paper • 2310.16795 • Published • 26
- Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
  Paper • 2308.12066 • Published • 4
- Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
  Paper • 2303.06182 • Published • 1
- EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate
  Paper • 2112.14397 • Published • 1

- LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery
  Paper • 2310.18356 • Published • 22
- LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
  Paper • 2310.08659 • Published • 22
- ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
  Paper • 2309.16119 • Published • 1
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 44

- A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies
  Paper • 2302.06218 • Published • 1
- ZeRO++: Extremely Efficient Collective Communication for Giant Model Training
  Paper • 2306.10209 • Published • 2
- SE-MoE: A Scalable and Efficient Mixture-of-Experts Distributed Training and Inference System
  Paper • 2205.10034 • Published • 1
- A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training
  Paper • 2303.06318 • Published • 1

- S^3: Increasing GPU Utilization during Generative Inference for Higher Throughput
  Paper • 2306.06000 • Published • 1
- Fast Distributed Inference Serving for Large Language Models
  Paper • 2305.05920 • Published • 1
- Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline
  Paper • 2305.13144 • Published • 1
- Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
  Paper • 2303.06182 • Published • 1

- LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
  Paper • 2310.08659 • Published • 22
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 44
- ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
  Paper • 2309.16119 • Published • 1
- LoRA ensembles for large language model fine-tuning
  Paper • 2310.00035 • Published • 2