metadata
license: cc-by-nc-nd-4.0
inference:
  parameters:
    num_beams: 3
    num_beam_groups: 3
    num_return_sequences: 1
    repetition_penalty: 10
    diversity_penalty: 3.01
    no_repeat_ngram_size: 2
    temperature: 0.8
    max_length: 128
widget:
  - text: >-
      Data scientists need to be able to communicate their findings to others in
      a clear and concise way.
    example_title: Data scientists
  - text: >-
      Search engine optimization (SEO) is the practice of getting targeted
      traffic to a website from a search engine's organic rankings.
    example_title: SEO

Text Rewriter Paraphraser

This repository contains a text-rewriting model fine-tuned from T5-Base (223M parameters).

Key Features:

  • Fine-tuned on t5-base: Builds on a pre-trained text-to-text transfer model, which suits paraphrasing as a sequence-to-sequence task.
  • Large Dataset (430k examples): Trained on a dataset that combines three open-source datasets and was cleaned using various techniques before training.
  • High Quality Paraphrases: Generates paraphrases that substantially alter sentence structure while preserving the meaning and factual content of the input.
  • Non-AI Detectable: Aims to produce paraphrases that read naturally and are hard to distinguish from human-written text.

Model Performance:

  • Train Loss: 1.0645
  • Validation Loss: 0.8761
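
For readers who prefer perplexity, the losses above can be converted with a one-line calculation. This is a minimal sketch that assumes the reported values are mean token-level cross-entropy in nats, which the card does not state explicitly; `perplexity` is a hypothetical helper name.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


# Loss values taken from the "Model Performance" list in this card.
print(f"train perplexity:      {perplexity(1.0645):.2f}")
print(f"validation perplexity: {perplexity(0.8761):.2f}")
```

Under that assumption the losses correspond to perplexities of roughly 2.90 (train) and 2.40 (validation).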

Getting Started:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Replace 'YOUR_TOKEN' with your actual Hugging Face access token
tokenizer = AutoTokenizer.from_pretrained("Ateeqq/Text-Rewriter-Paraphraser", token='YOUR_TOKEN')
model = AutoModelForSeq2SeqLM.from_pretrained("Ateeqq/Text-Rewriter-Paraphraser", token='YOUR_TOKEN')

text = "Data science is a field that deals with extracting knowledge and insights from data."

# Tokenize the input and return PyTorch tensors
inputs = tokenizer(text, return_tensors="pt")

# Generate a paraphrase of at most 50 tokens
output = model.generate(**inputs, max_length=50)

# Decode the first sequence, dropping special tokens such as <pad> and </s>
print(tokenizer.decode(output[0], skip_special_tokens=True))
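
The metadata of this card lists decoding parameters for the hosted inference widget (diverse beam search with 3 beams in 3 groups). The sketch below shows one way those same values might be passed to `generate` locally. `GENERATION_KWARGS`, `paraphrase`, and `demo` are hypothetical names introduced for illustration, not part of the model or the library. `temperature` is left out on the assumption that sampling is disabled, in which case it has no effect. Newer `transformers` releases may handle group beam search differently, so treat this as a starting point and check the documentation for your installed version.

```python
# Decoding settings copied from the `inference.parameters` metadata of this card.
# `temperature` is omitted: it only matters when sampling, which is assumed off here.
GENERATION_KWARGS = {
    "num_beams": 3,
    "num_beam_groups": 3,       # num_beams must be divisible by num_beam_groups
    "num_return_sequences": 1,
    "repetition_penalty": 10.0,
    "diversity_penalty": 3.01,  # only used when num_beam_groups > 1
    "no_repeat_ngram_size": 2,
    "max_length": 128,
}


def paraphrase(text, tokenizer, model, **overrides):
    """Hypothetical helper: return the decoded paraphrases for one input text."""
    settings = {**GENERATION_KWARGS, **overrides}
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, **settings)
    # Drop special tokens such as <pad> and </s> from the decoded text.
    return [tokenizer.decode(seq, skip_special_tokens=True) for seq in outputs]


def demo():
    """Load the model and print paraphrases (downloads weights on first run)."""
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    name = "Ateeqq/Text-Rewriter-Paraphraser"
    # Replace 'YOUR_TOKEN' with your actual Hugging Face access token
    tokenizer = AutoTokenizer.from_pretrained(name, token="YOUR_TOKEN")
    model = AutoModelForSeq2SeqLM.from_pretrained(name, token="YOUR_TOKEN")
    text = "Data science is a field that deals with extracting knowledge and insights from data."
    for candidate in paraphrase(text, tokenizer, model):
        print(candidate)
```

Calling `demo()` runs the example end to end. Any setting can be overridden per call, for example `paraphrase(text, tokenizer, model, max_length=64)`.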

Disclaimer:

This model is intended for research and creative writing purposes. It is essential to use the paraphrased text responsibly and ethically, with proper attribution of the original source.

Further Development:

Please mention any ongoing development or areas for future improvement in the repository's Discussions.