Llama3-8B-1.58 Collection A trio of powerful models fine-tuned from Llama3-8B-Instruct with the BitNet architecture! • 3 items • Updated 6 days ago • 8
AIMO Progress Prize Collection Models and datasets used in the winning solution to the AIMO 1st Progress Prize • 7 items • Updated Jul 19 • 9
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 84
Parallel Sentences Datasets Collection These parallel datasets all have "english" and "non_english" columns. They can be used to make embedding models multilingual. • 13 items • Updated Jun 18 • 7
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 67 items • Updated Jul 3 • 61
MS MARCO Mined Triplets Collection These datasets contain MS MARCO Triplets gathered by mining hard negatives using various models. Each dataset has various subsets. • 14 items • Updated May 21 • 7
Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training Paper • 2405.06932 • Published May 11 • 16
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published Apr 29 • 68
A Careful Examination of Large Language Model Performance on Grade School Arithmetic Paper • 2405.00332 • Published May 1 • 30
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression Paper • 2306.03078 • Published Jun 5, 2023 • 3
The case for 4-bit precision: k-bit Inference Scaling Laws Paper • 2212.09720 • Published Dec 19, 2022 • 3
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models Paper • 2404.07839 • Published Apr 11 • 41
Zeroshot Classifiers Collection These are my current best zeroshot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer and better. • 11 items • Updated Apr 3 • 103
Preference Datasets for KTO Collection This collection contains curated preference datasets for KTO fine-tuning, aligning LLMs with user intent through binary feedback signals. • 5 items • Updated Jul 30 • 14
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 182
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts Paper • 2401.04081 • Published Jan 8 • 70
LLM Leaderboard best models ❤️🔥 Collection A daily updated list of the best-evaluated models on the LLM leaderboard. • 264 items • Updated Jun 22 • 392
FlashDecoding++: Faster Large Language Model Inference on GPUs Paper • 2311.01282 • Published Nov 2, 2023 • 35
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling Paper • 2311.00430 • Published Nov 1, 2023 • 56
Retentive Network: A Successor to Transformer for Large Language Models Paper • 2307.08621 • Published Jul 17, 2023 • 170
CoEdIT: Text Editing by Task-Specific Instruction Tuning Paper • 2305.09857 • Published May 17, 2023 • 7
LongNet: Scaling Transformers to 1,000,000,000 Tokens Paper • 2307.02486 • Published Jul 5, 2023 • 80
TART: A plug-and-play Transformer module for task-agnostic reasoning Paper • 2306.07536 • Published Jun 13, 2023 • 11
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning Paper • 2306.07967 • Published Jun 13, 2023 • 24