Andrew Reed

andrewrreed

AI & ML interests

Applied ML, Practical AI, Inference & Deployment, LLMs, Multi-modal Models, RAG

andrewrreed's activity

reacted to clem's post with 🚀🔥 16 days ago
This is no Woodstock AI but will be fun nonetheless haha. I’ll be hosting a live workshop with team members next week about the Enterprise Hugging Face hub.

1,000 spots available, first come, first served, with some surprises during the stream!

You can register and add to your calendar here: https://streamyard.com/watch/JS2jHsUP3NDM
·
reacted to melisa's post with 🔥 2 months ago
🔥 Introducing "Writing in the Margins (WiM)" - better inference pattern for long context LLMs that solves the Lost-in-the-Middle problem 🔥

Paper page: Writing in the Margins: Better Inference Pattern for Long Context Retrieval (2408.14906)

TL;DR
Make your model write "margin notes" as you chunk-prefill the KV cache. Then ask it to reread all the notes before it speaks up.
Works with humans, works with AI 🤖

WiM leverages the chunked prefill of the key-value cache, which concurrently generates query-based extractive summaries at each step of the prefill that are subsequently reintegrated at the end of the computation. We term these intermediate outputs “margins”, drawing inspiration from the practice of making margin notes for improved comprehension of long contexts in human reading. We show that this technique, which adds only minimal additional computation, significantly improves LLMs' long-context reasoning capabilities.

Think: Every chunk has a chance to be attended to / be at the end of the context at least once. 🎉

📊 Results:
- An average accuracy boost of 7.5% in multi-hop reasoning tasks like HotpotQA and MultiHop-RAG.
- Even a 30% increase in F1-score for summarisation-like tasks (CWE).

Plus, WiM fits seamlessly into interactive applications (think: progress bar!). It can provide real-time progress updates during data retrieval and integration, making it user-friendly and transparent - a stark contrast to feeding 1M tokens to an LLM and waiting 6 min for the first token. 🤯

👩‍💻🧑‍💻 Check it out and contribute to our open-source project here: https://github.com/writer/writing-in-the-margins

🧠 More about chunked prefill: https://docs.vllm.ai/en/latest/models/performance.html#chunked-prefill
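
The margin-note pattern described in this post can be sketched at the text level as follows. This is only an illustrative simplification, not the authors' implementation: the real method works on the KV cache during chunked prefill, whereas this sketch just loops over text chunks. The names `split_into_chunks`, `answer_with_margins`, `write_margin`, and `answer` are hypothetical, and the two callables stand in for whatever LLM calls you would actually use.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def answer_with_margins(
    context: str,
    query: str,
    write_margin: Callable[[str, str], str],
    answer: Callable[[List[str], str], str],
    chunk_size: int = 200,
) -> str:
    """Text-level sketch of the margin-note idea (hypothetical helper).

    write_margin(chunk, query) is assumed to return a short extractive
    note, or the string "NONE" when the chunk is irrelevant.
    answer(margins, query) is assumed to produce the final response
    from the collected notes.
    """
    margins: List[str] = []
    for chunk in split_into_chunks(context, chunk_size):
        note = write_margin(chunk, query).strip()
        # Keep only notes that the model judged relevant to the query.
        if note and note.upper() != "NONE":
            margins.append(note)
    # The model "rereads all notes" before giving the final answer.
    return answer(margins, query)
```

Because a note is produced after every chunk, a caller could also report progress (for example, chunks processed so far) while the loop runs, which is the progress-bar idea mentioned above.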
·
reacted to m-ric's post with 🔥 3 months ago
๐—Ÿ๐—น๐—ฎ๐—บ๐—ฎ-๐Ÿฏ.๐Ÿญ ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น๐˜€ ๐—ณ๐—ถ๐—ป๐—ฎ๐—น๐—น๐˜† ๐—ด๐—ฒ๐˜ ๐˜๐—ต๐—ฒ๐—ถ๐—ฟ ๐—–๐—ต๐—ฎ๐˜๐—ฏ๐—ผ๐˜ ๐—”๐—ฟ๐—ฒ๐—ป๐—ฎ ๐—ฟ๐—ฎ๐—ป๐—ธ๐—ถ๐—ป๐—ด ๐ŸŽ–๏ธ

Given the impressive benchmarks published by Meta for their Llama-3.1 models, I was curious to see how these models would compare to top proprietary models on Chatbot Arena.

Now we've got the results! LMSys released the Elo ratings derived from thousands of user votes for the new models, and here are the rankings:

💥 405B Model ranks 5th overall, in front of GPT-4-turbo! But behind GPT-4o, Claude-3.5 Sonnet and Gemini-advanced.
👏 70B Model climbs up to 9th rank! From 1206 ➡️ 1244.
👏 8B Model improves from 1152 ➡️ 1170.

✅ This confirms that Llama-3.1 is a good contender for any task: each of its 3 model sizes is much cheaper to run than equivalent proprietary models!

For instance, here are the inference prices for the top models:
➤ GPT-4-Turbo inference price from OpenAI: $5/M input tokens, $15/M output tokens
➤ Llama-3.1-405B from HF API (for testing only): $3/M for input or output tokens (Source linked in the first comment)
➤ Llama-3.1-405B from HF API (for testing only): free ✨

Get a head start on the HF API (resource by @andrewrreed ) 👉 https://huggingface.co/learn/cookbook/enterprise_hub_serverless_inference_api
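
To make the price comparison concrete, here is a rough sketch of the per-workload arithmetic using only the per-million-token prices quoted in this post. The function name `inference_cost` and the example workload (2M input tokens, 0.5M output tokens) are assumptions chosen for illustration; actual provider pricing may differ and changes over time.

```python
def inference_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Return the cost in dollars for a workload, given $/M-token prices."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Prices as quoted in the post: (input $/M tokens, output $/M tokens).
PRICES = {
    "GPT-4-Turbo": (5.0, 15.0),
    "Llama-3.1-405B": (3.0, 3.0),
}

# Hypothetical workload, used only for illustration.
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 500_000

for model, (in_price, out_price) in PRICES.items():
    cost = inference_cost(INPUT_TOKENS, OUTPUT_TOKENS, in_price, out_price)
    print(f"{model}: ${cost:.2f}")
```

Under these assumed token counts, the quoted prices work out to $17.50 for GPT-4-Turbo versus $7.50 for Llama-3.1-405B; the gap widens for output-heavy workloads because the output price differs most.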
·
reacted to dvilasuero's post with 🤗❤️🚀🔥 5 months ago
Today is a huge day in Argilla’s history. We couldn’t be more excited to share this with the community: we’re joining Hugging Face!

We’re embracing a larger mission, becoming part of a brilliant and kind team and a shared vision about the future of AI.

Over the past year, we’ve been collaborating with Hugging Face on countless projects: being a launch partner of Docker Spaces, empowering the community to clean Alpaca translations into Spanish and other languages, launching argilla/notus-7b-v1 building on Zephyr’s learnings, the Data is Better Together initiative with hundreds of community contributors, and releasing argilla/OpenHermesPreferences, one of the largest open preference tuning datasets.

After more than 2,000 Slack messages and over 60 people collaborating for over a year, it already felt like we were part of the same team, pushing in the same direction. After a week of the smoothest transition you can imagine, we’re now the same team.

To those of you who’ve been following us, this won’t be a huge surprise, but it will be a big deal in the coming months. This acquisition means we’ll double down on empowering the community to build and collaborate on high quality datasets, we’ll bring full support for multimodal datasets, and we’ll be in a better place to collaborate with the Open Source AI community. For enterprises, this means that the Enterprise Hub will unlock highly requested features like single sign-on and integration with Inference Endpoints.

As a founder, I am proud of the Argilla team. We're now part of something bigger and a larger team but with the same values, culture, and goals. Grateful to have shared this journey with my beloved co-founders Paco and Amélie.

Finally, huge thanks to the Chief Llama Officer @osanseviero for sparking this and being such a great partner during the acquisition process.

Would love to answer any questions you have so feel free to add them below!
·
reacted to lunarflu's post with ❤️ 5 months ago
cooking up something....anyone interested in a daily activity tracker for HF?
·
reacted to tomaarsen's post with 🚀🔥 6 months ago
NuMind has just released 3 new state-of-the-art GLiNER models for Named Entity Recognition/Information Extraction. These GLiNER models allow you to specify any label that you want, and they'll find spans in the text corresponding to your label. They've been shown to work quite well on unusual domains, e.g. celestial entities in my picture.

There are 3 models released:
- numind/NuNER_Zero:
The primary model, SOTA & can detect really long entities.
- numind/NuNER_Zero-span:
Slightly better performance than NuNER Zero, but can't detect entities longer than 12 tokens.
- numind/NuNER_Zero-4k:
Slightly worse than NuNER Zero, but has a context length of 4k tokens.

Some more details about these models in general:
- They are *really* small, orders of magnitude smaller than LLMs, which don't reach this level of performance.
- Because they're small - they're fast: <1s per sentence on free GPUs.
- They have an MIT license: free commercial usage.

Try out the demo here: https://huggingface.co/spaces/numind/NuZero
Or check out all of the models here: numind/nunerzero-zero-shot-ner-662b59803b9b438ff56e49e2

If there's ever a need for me to extract some information from any text: I'll be using these. Great work @Serega6678 !
·
reacted to alvarobartt's post with ❤️🔥 6 months ago
🔥 Prometheus 2 was recently released by KAIST AI as an alternative evaluator that closely mirrors both human and GPT-4 evaluation and surpasses the former Prometheus!

prometheus-eval/prometheus-7b-v2.0
prometheus-eval/prometheus-8x7b-v2.0

๐ŸŒฌ๏ธFine-tuned on top of mistralai/Mistral-7B-Instruct-v0.2 and mistralai/Mixtral-8x7B-Instruct-v0.1
๐Ÿ—‚๏ธThe datasets used for fine-tuning have been publicly released i.e. prometheus-eval/Feedback-Collection and prometheus-eval/Preference-Collection
๐Ÿค๐ŸปUnified LM evaluator for absolute (a single prompt-completion pair) and relative (two completions for a given prompt) due to model merging
โŒNo longer needs a mandatory reference / golden answer, but can still be provided optionally
๐Ÿ”Surpasses the former version of Prometheus, and has a high correlation with human, GPT-4, and Claude 3 Opus scores when evaluating LMs
๐Ÿ“Apache 2.0 license

Long-story short, an amazing job from Kaist AI bridging the gap with LLM evaluators other than proprietary and bigger models!

This week at Argilla, we decided to add a new task to use Prometheus 2 as an LLM evaluator using distilabel, so we implemented PrometheusEval.

😱 Using PrometheusEval running their 7B variant with vLLM on a single L40 on top of HuggingFaceH4/instruction-dataset, we got the 327 existing prompt-completion pairs evaluated and pushed to the Hub in less than 2 minutes!

Find the generated dataset and the code at distilabel-internal-testing/instruction-dataset-prometheus
·
reacted to davanstrien's post with 🔥 6 months ago
Introducing CosmoChat, a multiturn chat dataset based on Cosmopedia that I'm working on in the open on the Hub.

🎯 Goals:
💬 Create multi-turn chats seeded from Cosmopedia
🎓 Customize questions for different audience levels
🔍 Evaluate the model's ability to elaborate and clarify
🤓 (I want to learn more about creating valuable synthetic datasets, and I learn best by doing stuff rather than reading stuff).

CosmoChat is created using the excellent distilabel library.

🔗 Explore the current version of the dataset: davanstrien/cosmochat
📝 Read more: https://huggingface.co/blog/davanstrien/cosmochat
·
replied to their post 6 months ago

Thanks! And yes, several people have pointed out the light mode color issue... will push a fix when I get the chance

posted an update 6 months ago
🔬 Open LLM Progress Tracker 🔬

Inspired by the awesome work from @mlabonne , I created a Space to monitor the narrowing gap between open and proprietary LLMs as scored by the LMSYS Chatbot Arena ELO ratings 🤗

The goal is to have a continuously updated place to easily visualize these rapidly evolving industry trends 🚀

🔗 Open LLM Progress Tracker: andrewrreed/closed-vs-open-arena-elo
🔗 Source of Inspiration: https://www.linkedin.com/posts/maxime-labonne_arena-elo-graph-updated-with-new-models-activity-7187062633735368705-u2jB/
·
reacted to Pclanglais's post with 🔥 7 months ago
Announcing that we are on our way to solving a long-standing issue of document processing: correction of OCR mistakes. Pleias publishes the largest dataset to date with automated OCR correction, 1 billion words in English, French, German and Italian.

OCR quality is a long-standing issue of digitization. Cultural heritage texts are especially affected because the primary sources are old documents (with many artifacts, blots, degradation) and because of the limitations of OCR technology for historical scripts. When we released Common Corpus, a 500-billion-word corpus in the public domain, this was the primary criticism.

This recent breakthrough in post-OCR correction has been made possible by progress in open LLM research and several months of dedicated training and alignment by Pleias, as well as the HPC resources from GENCI–IDRIS (Grant 2023-AD011014736) on Jean-Zay.

Announcement: https://huggingface.co/blog/Pclanglais/post-ocr-correction

Post-OCR-Correction dataset: https://huggingface.co/datasets/PleIAs/Post-OCR-Correction
·
reacted to fdaudens's post with 🔥 7 months ago
It's been only a week since I joined 🤗 and the community has released a constant flow of content!

Notable models:
- Apple OpenELM apple/openelm-instruct-models-6619ad295d7ae9f868b759ca + apple/openelm-pretrained-models-6619ac6ca12a10bd0d0df89e
- HuggingFaceM4 Idefics2 HuggingFaceM4/idefics2-8b
- Meta Llama 3 meta-llama/meta-llama-3-66214712577ca38149ebb2b6
- Microsoft Phi-3 microsoft/phi-3-6626e15e9585a200d2d761e3
- Snowflake Arctic Snowflake/arctic-66290090abe542894a5ac520

Great datasets:
- HuggingFaceFW FineWeb HuggingFaceFW/fineweb
- HuggingFaceM4/the_cauldron HuggingFaceM4/the_cauldron
- PleIAs/YouTube-Commons PleIAs/YouTube-Commons

Fascinating Spaces:
- InstantMesh TencentARC/InstantMesh
- Chat with Llama 3 8B ysharma/Chat_with_Meta_llama3_8b
- Parler-TTS https://huggingface.co/spaces/parler-tts/parler_tts_mini
- AI Jukebox enzostvs/ai-jukebox
- CosXL multimodalart/cosxl
- Singing songstarter nateraw/singing-songstarter
- Play with Idefics2 8B HuggingFaceM4/idefics-8b
- CodeQwen1.5-7B-Chat Bot 👾 Qwen/CodeQwen1.5-7b-Chat-demo

I expected to be at the center of AI development. I'm not disappointed!
·
posted an update 7 months ago
IMO, the "grounded generation" feature from Cohere's CommandR+ has flown under the radar...

For RAG use cases, responses directly include inline citations, making source attribution an inherent part of generation rather than an afterthought 😎

Who's working on an open dataset with this for the HF community to fine-tune with??

🔗 CommandR+ Docs: https://docs.cohere.com/docs/retrieval-augmented-generation-rag

🔗 Model on the 🤗 Hub: CohereForAI/c4ai-command-r-plus
·
reacted to VictorSanh's post with ❤️ 7 months ago
New open multimodal model in town: Idefics2!

💪 Strong 8B-parameter model: often on par with open 30B counterparts.
🔓 Open license: Apache 2.0.
🚀 Strong improvement over Idefics1: +12 points on VQAv2, +30 points on TextVQA while having 10x fewer parameters.
📚 Better data: boosting OCR capabilities with 6TB of documents to transcribe, and improving QA capabilities on charts/figures/diagrams.
🕵️‍♀️ Transparent training data: inspect and build upon all the data (10s of TB of data) we trained on.
🔲 More natural image processing: incorporating strategies to treat images in their native resolution and native aspect ratio.
📸 High-resolution images: image resolutions up to 980 x 980, with strategies that allow trading computational efficiency for performance.
😎 2 checkpoints: releasing both a base checkpoint and an instruction fine-tuned checkpoint. Chat version to come.

Resources: HuggingFaceM4/idefics2-661d1971b7c50831dd3ce0fe
Blogpost: https://huggingface.co/blog/idefics2