Is the dataset available?

#1
by xzxy - opened

Excellent work! I am working to build one myself for our internal security team (offensive, defensive, and compliance). However, I have yet to find a decent dataset to build from. Do you mind sharing yours? I thought of building one myself by feeding text documents into Mistral and having it output input/output pairs, but a head start on a dataset would be appreciated :)

Hello, good job. I have the same request. Thank you so much.

Hi there! 🤗

Great work on the dataset! Could you share insights into how the data pairs were collected? Also, any plans to release the dataset publicly? I'm currently working on building a cybersecurity chatbot similar to Lily and would find this data incredibly useful. Thanks!

Sego Lily Labs org

Thanks for the comments. I am working on cleaning this dataset so I can release it. I am also in the process of creating a new model and dataset that uses about 3 million pairs.

Great work! I am also interested in the dataset. Thanks

Really nice work!!! Is the dataset available? Thank you.

Hi @unshadow,

I appreciate your work on creating the fine-tuned LLM for cyber security. Can you please let me know the size of the dataset used for fine-tuning? Also, is the dataset available, and when are you planning to release it? I appreciate your time and response. Thank you!

Hey there, I'm new to this and I have been assigned to make a model like this. I downloaded the Lexi Llama 3 uncensored model to test whether it could help, and it did. But how can I fine-tune it on free Google Colab? Which model should I use: Llama 3 directly, even though it is censored, or some other model? Also, it would be lovely if y'all could guide me through this. Please! I don't really know what the dataset should look like or what to do!

Hey, I was also wondering the same thing, is the dataset available? Thank you for creating this tool!

+1 from me too.

Life moves on, but could you release the dataset as is? We will do the cleaning for you :)
