openai
langchain
beautifulsoup4
chromadb
tiktoken
pypdf
gradio
PyMuPDF
gdown
docx2txt