The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published 24 days ago • 36
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Paper • 2408.13359 • Published 28 days ago • 21
Improving Hugging Face Training Efficiency Through Packing with Flash Attention Article • about 1 month ago • 19
Power-LM Collection Dense & MoE LLMs trained with the power learning rate scheduler. • 3 items • Updated 8 days ago • 13
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 152
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23 • 67
Enhancing Training Efficiency Using Packing with Flash Attention Paper • 2407.09105 • Published Jul 12 • 12
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention Paper • 2407.02490 • Published Jul 2 • 23
Dolomite Engine Sample Collection This collection contains a sample dataset and model trained via dolomite-engine. Repo: https://github.com/ibm-granite/dolomite-engine/ • 2 items • Updated Jun 30 • 1
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention Paper • 2405.12981 • Published May 21 • 28
Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization Paper • 2404.03605 • Published Apr 4 • 1
Granite Code Models: A Family of Open Foundation Models for Code Intelligence Paper • 2405.04324 • Published May 7 • 20
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent Article • Apr 22 • 78
ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming Paper • 2404.08676 • Published Apr 6 • 3
Granite Code Models Collection A series of code models trained by IBM and released under the Apache 2.0 license. We release both the base pretrained and instruct models. • 23 items • Updated 21 days ago • 155
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models Paper • 2404.05567 • Published Apr 8 • 10
BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback Paper • 2402.02479 • Published Feb 4 • 2
Saving Memory Using Padding-Free Transformer Layers during Finetuning Article • By mayank-mishra • Jun 11 • 12
Aurora-M: The First Open Source Biden-Harris Executive Order Red teamed Multilingual Language Model Article • By mayank-mishra • Apr 2 • 6
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30 • 40
Teaching Large Language Models to Reason with Reinforcement Learning Paper • 2403.04642 • Published Mar 7 • 46
Aurora-M models Collection Aurora-M models (base, Biden-Harris red-teamed, and instruct) • 5 items • Updated May 6 • 17
Variational Inference with Latent Space Quantization for Adversarial Resilience Paper • 1903.09940 • Published Mar 24, 2019 • 1
Adversarial Approximate Inference for Speech to Electroglottograph Conversion Paper • 1903.12248 • Published Mar 28, 2019 • 1
Variational Learning for Unsupervised Knowledge Grounded Dialogs Paper • 2112.00653 • Published Nov 23, 2021 • 1
Joint Reasoning on Hybrid-knowledge sources for Task-Oriented Dialog Paper • 2210.07295 • Published Oct 13, 2022 • 1
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022 • 28
ModuleFormer: Learning Modular Large Language Models From Uncurated Data Paper • 2306.04640 • Published Jun 7, 2023 • 7