This model is currently being trained. Do not use it yet.
Lowered the learning rate from 1e-4 to 5e-5 because training showed some instability. Restarting on Nov 6 and training for only 500k steps.