saishf/Extended-Mega-Mash-262K-8B-L3
	---
	base_model:
	- saishf/Long-Neural-SOVLish-Devil-8B-L3-262K
	- saishf/Merge-Mayhem-L3-V2
	- saishf/Neural-SOVLish-Devil-8B-L3
	- saishf/SOVLish-Maid-L3-8B
	- saishf/Merge-Mayhem-L3-V2.1
	library_name: transformers
	tags:
	- mergekit
	- merge
	license: cc-by-nc-4.0
	---
	# merge

	This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

	## Merge Details
	Experimental.

	This model is an attempt to push [saishf/SOVL-Mega-Mash-V2-L3-8B](https://huggingface.co/saishf/SOVL-Mega-Mash-V2-L3-8B) (my personal favourite model) to support 32K+ context.

	### Merge Method

	This model was merged with the [Model Stock](https://arxiv.org/abs/2403.19522) merge method, using [saishf/Long-Neural-SOVLish-Devil-8B-L3-262K](https://huggingface.co/saishf/Long-Neural-SOVLish-Devil-8B-L3-262K) as the base.
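
For intuition, here is a toy sketch of the interpolation described in the Model Stock paper, applied to plain Python lists instead of real weight tensors. It assumes the ratio `t = k*cos(theta) / (1 + (k-1)*cos(theta))`, where `k` is the number of fine-tuned models and `theta` is the angle between their offsets from the base. Averaging the pairwise cosines and the function names are illustrative choices, not mergekit's actual implementation, which works per tensor.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def model_stock_merge(base, finetuned):
    """Toy Model Stock merge of flat weight vectors (hypothetical helper).

    Assumes at least two fine-tuned models, each differing from the base.
    Returns the merged vector and the interpolation ratio t.
    """
    k = len(finetuned)
    # Offsets of each fine-tuned model from the base weights.
    deltas = [[w - b for w, b in zip(model, base)] for model in finetuned]
    # Assumption: estimate cos(theta) as the mean pairwise cosine of the offsets.
    pairs = list(combinations(deltas, 2))
    cos_theta = sum(cosine(u, v) for u, v in pairs) / len(pairs)
    # Interpolation ratio between the fine-tuned average and the base.
    t = k * cos_theta / (1 + (k - 1) * cos_theta)
    average = [sum(ws) / k for ws in zip(*finetuned)]
    merged = [t * a + (1 - t) * b for a, b in zip(average, base)]
    return merged, t
```

When the offsets agree (cosine near 1), `t` approaches 1 and the result is close to the plain average of the fine-tuned models. When they are orthogonal, `t` drops to 0 and the merge falls back to the base, which here is the long-context model.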

	### Models Merged

	The following models were included in the merge:
	* [saishf/Merge-Mayhem-L3-V2](https://huggingface.co/saishf/Merge-Mayhem-L3-V2)
	* [saishf/Neural-SOVLish-Devil-8B-L3](https://huggingface.co/saishf/Neural-SOVLish-Devil-8B-L3)
	* [saishf/SOVLish-Maid-L3-8B](https://huggingface.co/saishf/SOVLish-Maid-L3-8B)
	* [saishf/Merge-Mayhem-L3-V2.1](https://huggingface.co/saishf/Merge-Mayhem-L3-V2.1)

	### Configuration

	The following YAML configuration was used to produce this model:

	```yaml
	models:
	- model: saishf/Neural-SOVLish-Devil-8B-L3
	- model: saishf/Merge-Mayhem-L3-V2
	- model: saishf/Merge-Mayhem-L3-V2.1
	- model: saishf/SOVLish-Maid-L3-8B
	merge_method: model_stock
	base_model: saishf/Long-Neural-SOVLish-Devil-8B-L3-262K
	dtype: bfloat16
	```
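
To reproduce the merge, the configuration above can be saved to a file and passed to mergekit's `mergekit-yaml` command. The sketch below does that from Python. The `prepare_merge` helper, the `config.yml` file name, and the `merged` output directory are hypothetical choices for illustration, and it assumes mergekit is installed with `mergekit-yaml` on your `PATH`.

```python
import shutil
import subprocess
from pathlib import Path

# The merge configuration from this model card, verbatim.
MERGE_CONFIG = """\
models:
- model: saishf/Neural-SOVLish-Devil-8B-L3
- model: saishf/Merge-Mayhem-L3-V2
- model: saishf/Merge-Mayhem-L3-V2.1
- model: saishf/SOVLish-Maid-L3-8B
merge_method: model_stock
base_model: saishf/Long-Neural-SOVLish-Devil-8B-L3-262K
dtype: bfloat16
"""


def prepare_merge(workdir, run=False):
    """Write the config into workdir and return the mergekit command.

    The merge itself only runs when run=True, because it downloads
    all five models and needs a lot of disk space and memory.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    config_path = workdir / "config.yml"
    config_path.write_text(MERGE_CONFIG, encoding="utf-8")
    # Usage: mergekit-yaml <config file> <output directory>
    command = ["mergekit-yaml", str(config_path), str(workdir / "merged")]
    if run:
        if shutil.which(command[0]) is None:
            raise RuntimeError("mergekit-yaml not found on PATH; install mergekit first")
        subprocess.run(command, check=True)
    return command
```

Calling `prepare_merge("merge-work")` only writes the config and returns the command, so you can inspect it first. Pass `run=True` to start the merge.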