The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published 24 days ago • 36
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Paper • 2408.13359 • Published 28 days ago • 21
Improving Hugging Face Training Efficiency Through Packing with Flash Attention Article • about 1 month ago • 19
Power-LM Collection Dense & MoE LLMs trained with the power learning rate scheduler. • 3 items • Updated 8 days ago • 13
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 152
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23 • 67
Enhancing Training Efficiency Using Packing with Flash Attention Paper • 2407.09105 • Published Jul 12 • 12
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention Paper • 2407.02490 • Published Jul 2 • 23
Dolomite Engine Sample Collection This collection contains a sample dataset and model trained via dolomite-engine. Repo: https://github.com/ibm-granite/dolomite-engine/ • 2 items • Updated Jun 30 • 1
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention Paper • 2405.12981 • Published May 21 • 28
Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization Paper • 2404.03605 • Published Apr 4 • 1
Granite Code Models: A Family of Open Foundation Models for Code Intelligence Paper • 2405.04324 • Published May 7 • 20
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent Article • Apr 22 • 78
ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming Paper • 2404.08676 • Published Apr 6 • 3
Granite Code Models Collection A series of code models trained by IBM and released under the Apache 2.0 license. We release both the base pretrained and instruct models. • 23 items • Updated 21 days ago • 155
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models Paper • 2404.05567 • Published Apr 8 • 10
BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback Paper • 2402.02479 • Published Feb 4 • 2
Saving Memory Using Padding-Free Transformer Layers during Finetuning Article • By mayank-mishra • Jun 11 • 12
Aurora-M: The First Open Source Biden-Harris Executive Order Red teamed Multilingual Language Model Article • By mayank-mishra • Apr 2 • 6
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30 • 40
Teaching Large Language Models to Reason with Reinforcement Learning Paper • 2403.04642 • Published Mar 7 • 46
Aurora-M models Collection Aurora-M models (base, Biden-Harris red-teamed, and instruct) • 5 items • Updated May 6 • 17
Variational Inference with Latent Space Quantization for Adversarial Resilience Paper • 1903.09940 • Published Mar 24, 2019 • 1
Adversarial Approximate Inference for Speech to Electroglottograph Conversion Paper • 1903.12248 • Published Mar 28, 2019 • 1
Variational Learning for Unsupervised Knowledge Grounded Dialogs Paper • 2112.00653 • Published Nov 23, 2021 • 1
Joint Reasoning on Hybrid-knowledge sources for Task-Oriented Dialog Paper • 2210.07295 • Published Oct 13, 2022 • 1
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022 • 28
ModuleFormer: Learning Modular Large Language Models From Uncurated Data Paper • 2306.04640 • Published Jun 7, 2023 • 7