Jaward Sesay

Jaward

AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Articles

Organizations

Jaward's activity

posted an update 6 days ago
It's work like this that in some way signals the eventual “dominance” of AI over all the sciences.

“We train our model on the six-dimensional N-body phase space, predicting particle velocities as the time derivative of the model’s displacement outputs”

The emulator is capable of predicting the nonlinear displacement and velocity fields for 128^3 particles in half a second on a single GPU🤯
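The "velocity as the time derivative of displacement" idea can be sketched with a toy stand-in for the emulator (the `toy_displacement` function and the finite-difference step are illustrative assumptions, not the paper's model):

```python
import numpy as np

def toy_displacement(q0, t):
    # Toy stand-in for the emulator: a displacement field that grows
    # linearly with time (Zel'dovich-like), shape (N, 3).
    return np.sin(q0) * t

def velocity_from_displacement(model, q0, t, h=1e-4):
    # Velocity as the time derivative of the displacement output,
    # approximated here with a central finite difference.
    return (model(q0, t + h) - model(q0, t - h)) / (2.0 * h)

q0 = np.random.default_rng(0).uniform(0, 2 * np.pi, size=(8, 3))
v = velocity_from_displacement(toy_displacement, q0, t=0.5)
```

In the paper the derivative comes from the model itself rather than a finite difference, but the shape of the idea is the same: one network output yields both displacement and velocity.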
posted an update 21 days ago
Triton nanoGPT now has a custom cross entropy loss kernel 🚀
Next: matmul, gradually overthrowing all major PyTorch ops:)

Simplified pseudocode for the parallel cross-entropy loss computation:
- init program: get pid, compute offsets, load targets.
- init row_max and row_sum.
- for-loop1 (find max logits): update row_max with max logits.
- for-loop2 (compute softmax and loss): compute row_sum, update loss.
- add log(row_sum) and store loss.
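The steps above can be mirrored in plain NumPy as a reference (a per-row, two-pass sketch of the computation, not the Triton kernel itself):

```python
import numpy as np

def row_cross_entropy(logits, targets):
    # Each Triton "program" handles one row; mirrored here as a Python loop.
    losses = np.empty(len(logits))
    for pid, (row, tgt) in enumerate(zip(logits, targets)):
        # loop 1: running max over the row (for numerical stability)
        row_max = -np.inf
        for x in row:
            row_max = max(row_max, x)
        # loop 2: accumulate the softmax denominator
        row_sum = 0.0
        for x in row:
            row_sum += np.exp(x - row_max)
        # loss = -(logit[target] - row_max - log(row_sum))
        losses[pid] = -(row[tgt] - row_max - np.log(row_sum))
    return losses

logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
targets = np.array([0, 1])
loss = row_cross_entropy(logits, targets)
```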

Code: https://github.com/Jaykef/ai-algorithms/blob/main/triton_nanoGPT.ipynb
reacted to fdaudens's post with 🔥 about 1 month ago
The Nobel Prize background for Hopfield and Hinton's work on neural networks is pure gold. It's a masterclass in explaining AI basics.

Key takeaways from the conclusion:
- ML applications are expanding rapidly. We're still figuring out which will stick.
- Ethical discussions are crucial as the tech develops.
- Physics 🤝 AI: A two-way street of innovation.

Some mind-blowing AI applications in physics:
- Discovering the Higgs particle
- Cleaning up gravitational wave data
- Hunting exoplanets
- Predicting molecular structures
- Designing better solar cells

We're just scratching the surface. The interplay between AI and physics is reshaping both fields.

Bonus: The illustrations accompanying the background document are really neat. (Credit: Johan Jarnestad/The Royal Swedish Academy of Sciences)

#AI #MachineLearning #Physics #Ethics #Innovation
reacted to clem's post with 👍 about 1 month ago
Open-source AI creates healthy competition in a field where natural tendencies lead to extreme concentration of power. Imagine a world where only one or two companies could build software. This is the biggest risk and ethical challenge of them all IMO. Let's fight this!
reacted to clem's post with 👍 about 1 month ago
Very few people realize that most of the successful AI startups got successful because they were focused on open science and open source for at least their first few years. To name but a few: OpenAI (GPT and GPT-2 were open-source), Runway & Stability (Stable Diffusion), Cohere, Mistral and of course Hugging Face!

The reasons are not just altruistic, it's also because sharing your science and your models pushes you to build AI faster (which is key in a fast-moving domain like AI), attracts the best scientists & engineers and generates much more visibility, usage and community contributions than if you were 100% closed-source. The same applies to big tech companies as we're seeing with Meta and Google!

More startups and companies should release research & open-source AI, it's not just good for the world but also increases their probability of success!
posted an update about 1 month ago
New hobby: creating AI research paper arts lol - using PyMuPDF to extract text, add a background, then animate with Runway:) code coming soon…
posted an update about 1 month ago
Triton-accelerated nanoGPT🤕
The WHY behind this ordeal: after practicing Triton for about two weeks, I challenged myself to implement custom Triton kernels for Karpathy's nanoGPT. Quite an ordeal it was, but I somehow got something working. Not perfect, but getting there:) Contributions are welcome.

Code: https://github.com/Jaykef/Triton-nanoGPT
posted an update about 1 month ago
This is supercool!!
LLaVA-3D: adds 3D awareness to LVMs without compromising 2D understanding capabilities.

Method: they developed a unified architecture that maps 2D CLIP patch features to their corresponding positions in 3D space, enabling joint 2D and 3D vision-language instruction tuning.
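A minimal sketch of the core idea as I read it - enriching 2D patch features with an embedding of their 3D positions so the 2D pathway stays intact. All shapes and the linear position map below are hypothetical stand-ins, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: N patch features from a 2D encoder (CLIP-style),
# each patch assumed back-projected to an (x, y, z) location in the scene.
N, D = 16, 32
patch_feats = rng.normal(size=(N, D))   # 2D patch features
patch_xyz = rng.uniform(size=(N, 3))    # their 3D positions

# Minimal 3D position encoding: a linear map standing in for a learned
# embedding of (x, y, z) into the feature dimension.
W_pos = rng.normal(size=(3, D)) * 0.02
pos_emb = patch_xyz @ W_pos

# "3D patches": 2D features enriched with 3D position information,
# added rather than replaced, so the original 2D features survive.
patches_3d = patch_feats + pos_emb
```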

Project: https://zcmax.github.io/projects/LLaVA-3D/
posted an update about 2 months ago
Some interesting findings in this paper:
- They consider o1 a Large Reasoning Model (LRM) with a different arch from SOTA LLMs.
- Creative justifications: “It is almost as if o1 has gone from hallucinating to gaslighting!”. So true - I also noticed it can “hallucinate” its chain of thought lol.
- Accuracy/Cost Tradeoffs: o1 provides high accuracy but at significant computational and monetary costs due to hidden "reasoning tokens."
Paper: https://www.arxiv.org/abs/2409.13373
posted an update about 2 months ago
nanoGPT with Sigmoid Self-Attention
I couldn’t resist, had to give it a try:)

Some observations on M2:
SSA was ~5-10% faster in training with similar final loss values, slightly less coherent text generation, marginally higher perplexity, and lower memory usage compared to softmax.
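A minimal NumPy sketch of sigmoid self-attention (not the notebook's Triton/PyTorch code; the -log(n) score bias follows the sigmoid-attention paper so total attention mass starts near 1 per query):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_attention(Q, K, V, causal=True):
    # Same scaled scores as standard attention, but each score is
    # squashed independently with a sigmoid instead of being
    # normalized across the row with softmax.
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d) - np.log(n)
    weights = sigmoid(scores)
    if causal:
        weights = np.tril(weights)  # zero out attention to future tokens
    return weights @ V
```

Because each weight is independent, no row-wise normalization (and no row-max reduction) is needed, which is one reason it can train slightly faster than softmax attention.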

Code: https://github.com/Jaykef/ai-algorithms/blob/main/sigmoid_attn.ipynb
replied to their post about 2 months ago

I used to think this way, but as it turns out these models don't just model a probability distribution - they actually learn features underlying these distributions, and using those features during inference requires some "reasoning". Capable models (GPT-4, GPT-3, Claude 3) prior to OpenAI o1 could barely reason through tasks; o1 now utilizes RL to boost reasoning during inference. Scaling at inference has been a huge challenge, but somehow OpenAI figured it out with RL. Obviously we are at an early stage of this breakthrough; proof of reasoning will become clearer in subsequent versions of o1.

Geoffrey Hinton gave a talk on this topic: https://www.youtube.com/watch?v=N1TEjTeQeg0

posted an update about 2 months ago
The breakthrough in OpenAI’s release goes way beyond just another family of capable models - it’s a monumental leap in LLM reasoning capabilities, one in which the limitations of pre-training matter less and the dream of scaling during inference becomes a reality.

Once again reinforcement learning (when rightly done) proves to be the ultimate “tool” that drives reasoning in AI models. OpenAI o1 (aka strawberry 🍓) can think and learn while thinking before giving a response. This is how we humans approach solving difficult problems.

In technical terms, o1 is trained with an RL algorithm to think productively using its chain of thought. In other words “the longer it thinks, the better it does on reasoning tasks”. Similar to how AlphaGo was able to beat the world champion at Go.

Read more: https://openai.com/index/learning-to-reason-with-llms/
posted an update about 2 months ago
Free research tip:
Get used to writing the first draft of your paper in markdown using vscode’s jupyter notebook extension - it lets you do quick sanity checks with code and maths - an absolute AAA experience:)
posted an update 2 months ago
The Forward-Forward Algorithm🤖

FFA replaces the forward and backward passes of backpropagation with two forward passes - one with positive (real) data and another with negative data. Each layer has its own objective function: to increase or decrease a “goodness” metric. The positive pass uses real data and adjusts weights to increase “goodness” in every hidden layer. The negative pass does the opposite.
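A toy single-layer sketch of the two passes and the local "goodness" update (the data, threshold, and learning rate are illustrative choices, not the notebook's MNIST setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def goodness(h):
    # "Goodness" of a layer: sum of squared activities per sample.
    return (h ** 2).sum(axis=1)

def ff_layer_step(W, x_pos, x_neg, theta=2.0, lr=0.01):
    # One local update for a single layer - no backprop through others.
    h_pos = np.maximum(x_pos @ W, 0.0)   # forward pass on real data
    h_neg = np.maximum(x_neg @ W, 0.0)   # forward pass on negative data
    g_pos, g_neg = goodness(h_pos), goodness(h_neg)
    p_pos = 1 / (1 + np.exp(-(g_pos - theta)))  # P("real" | activations)
    p_neg = 1 / (1 + np.exp(-(g_neg - theta)))
    # Gradients of -log(p) for positives and -log(1-p) for negatives;
    # d(goodness)/dW = x outer 2h (the ReLU mask is implicit in h).
    gW = (x_pos.T @ (2 * h_pos * -(1 - p_pos)[:, None]) +
          x_neg.T @ (2 * h_neg * p_neg[:, None])) / len(x_pos)
    return W - lr * gW

# Toy positive/negative data: two shifted Gaussian blobs.
W = rng.normal(size=(10, 16)) * 0.3
x_pos = rng.normal(loc=1.0, size=(64, 10))
x_neg = rng.normal(loc=-1.0, size=(64, 10))
for _ in range(300):
    W = ff_layer_step(W, x_pos, x_neg)
# After training, goodness is high on positive data, low on negative.
```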

I must say, reading & implementing a godfather paper feels quite fulfilling:)
Thank you Prof. Geoffrey Hinton.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/mnist_the_forward_forward_algorithm.ipynb
posted an update 2 months ago
Simplified implementation of “Neural Networks are Decision Trees”.

Showing that any neural network with any activation function can be represented as a decision tree. Since decision trees are inherently interpretable, their equivalence helps us understand how the network makes decisions.

In this implementation, we trained a simple neural network for 1k epochs on makemoons, saved the trained weights (state dicts), extracted the decision-tree equivalent from the trained weights, then visualized and evaluated it.
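The equivalence is easy to verify for one hidden ReLU layer: the activation pattern is the path through the tree, and each leaf stores an affine function. A sketch with hypothetical weights (stand-ins for the notebook's trained makemoons state dicts):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights for a tiny 2-16-1 ReLU net.
W1, b1 = rng.normal(size=(2, 16)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 1)), rng.normal(size=1)

def net(x):
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

def tree_path_and_leaf(x):
    # Each hidden unit's on/off decision is one branch; the activation
    # pattern picks a leaf, and every leaf holds an affine function
    # (the network restricted to that linear region).
    pattern = (x @ W1 + b1 > 0).astype(float)   # the path taken
    W_eff = (W1 * pattern) @ W2                 # leaf's affine map
    b_eff = (b1 * pattern) @ W2 + b2
    return pattern, W_eff, b_eff

x = rng.normal(size=(2,))
pattern, W_eff, b_eff = tree_path_and_leaf(x)
# The leaf's affine function reproduces the network output exactly.
```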

Code: https://github.com/Jaykef/ai-algorithms/blob/main/nns_are%20decision_trees.ipynb
posted an update 3 months ago
Alan Turing's mind-bender "Can machines think?" in its glorified form. This 74-year-old paper laid the foundation for how we think about AI and machine intelligence today. The level of detail, clarity and foresight is just phenomenal - he was way ahead of his time 🧠🤖

Original copy: https://archive.org/details/MIND--COMPUTING-MACHINERY-AND-INTELLIGENCE
posted an update 3 months ago
Cooked up a cool & much faster AI voice assistant space that also supports speech translation (with seamless-expressive). Start with the phrase "Please translate" followed by the speech you'd like to translate, to activate speech translation mode. Using opensource LLMs (Llama 3, Mistral etc) with edge tts for voice assistant and seamless-expressive for speech translation.

Give it a try: Jaward/optimus
posted an update 3 months ago
Supercool Weekend Read🤖
Nvidia researchers achieved SOTA LLM compression metrics using pruning and knowledge distillation techniques.

Details on Techniques (Simplified):
They started off with a large pre-trained language model (15B params), then:

1. Estimated the importance of different parts of the model (neurons, attention heads, layers) using activation-based metrics on a small calibration dataset.

2. Pruned (removed) less important parts of the model to reduce its size.

3. Retrained the pruned model using knowledge distillation, where the original large model acts as a teacher for the smaller pruned model.

4. Used a lightweight neural architecture search to find the best configuration for the pruned model.

5. Repeated this process iteratively to create even smaller models.
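Steps 1-3 can be sketched on a toy MLP layer (all sizes and the mean-absolute-activation scoring are illustrative assumptions, not Nvidia's exact recipe):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for one MLP block of the large "teacher" model.
d_in, d_hidden = 32, 64
W_in = rng.normal(size=(d_in, d_hidden))
W_out = rng.normal(size=(d_hidden, d_in))

# Step 1: activation-based importance on a small calibration set -
# score each hidden neuron by its mean absolute activation.
calib = rng.normal(size=(256, d_in))
acts = np.maximum(calib @ W_in, 0.0)
importance = np.abs(acts).mean(axis=0)

# Step 2: prune the least important half of the hidden neurons.
keep = np.sort(np.argsort(importance)[-d_hidden // 2:])
W_in_small, W_out_small = W_in[:, keep], W_out[keep, :]

# Step 3 (schematically): the pruned "student" would now be retrained
# with distillation to match the teacher, e.g. min ||student - teacher||^2.
teacher_out = acts @ W_out
student_out = np.maximum(calib @ W_in_small, 0.0) @ W_out_small
```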

Cool, giving it a try this weekend 😎
Code: https://github.com/NVlabs/Minitron
Paper: https://arxiv.org/abs/2407.14679
Demo: nvidia/minitron