Uploading Guide
Supported file types
- Documents: PDF, DOCX, TXT, MD, HTML, JSON.
- Folders: Drag-and-drop folders are supported; the files they contain are traversed and ingested.
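A minimal TypeScript sketch of the type filter and folder walk follows. It assumes extensions are matched case-insensitively. It models a dropped folder as a plain tree so the logic runs outside a browser; the EntryNode interface and both function names are hypothetical. In a browser, the tree would come from DataTransferItem.webkitGetAsEntry(). FileSystemDirectoryReader.readEntries() must be called repeatedly until it returns an empty batch, because it may not return all children in one call.

```typescript
// Extensions from the "Supported file types" list (assumed case-insensitive).
const SUPPORTED_EXTENSIONS: string[] = ["pdf", "docx", "txt", "md", "html", "json"];

function isSupported(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return false; // no extension at all
  return SUPPORTED_EXTENSIONS.indexOf(fileName.slice(dot + 1).toLowerCase()) !== -1;
}

// Hypothetical stand-in for a dropped FileSystemEntry tree.
interface EntryNode {
  name: string;
  isDirectory: boolean;
  children?: EntryNode[]; // only meaningful when isDirectory is true
}

// Depth-first walk returning the relative paths of supported files.
function collectSupportedPaths(entry: EntryNode, parentPath: string = ""): string[] {
  const path = parentPath ? parentPath + "/" + entry.name : entry.name;
  if (!entry.isDirectory) {
    return isSupported(entry.name) ? [path] : [];
  }
  const found: string[] = [];
  for (const child of entry.children || []) {
    found.push(...collectSupportedPaths(child, path));
  }
  return found;
}
```

This sketch silently skips unsupported files; a real implementation might instead list them as per-file errors.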
Flow
1) Drag and drop files or folders into the Uploads view.
2) Configure chunk size and overlap, and choose the embedding model.
3) Optionally enable Qdrant sync to upsert embeddings immediately.
4) Monitor progress and errors inline; processed chunks feed the Embeddings/Vector DB flows.
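Step 2 can be illustrated with a sliding-window chunker. This sketch assumes chunk size and overlap are measured in characters (the Uploads view may use tokens instead). The names chunkText and ChunkSettings are hypothetical, not the application's API. Each chunk starts chunkSize - overlap characters after the previous one, so consecutive chunks share overlap characters.

```typescript
// Hypothetical settings object mirroring step 2 of the flow.
interface ChunkSettings {
  chunkSize: number; // characters per chunk (assumed unit)
  overlap: number; // characters shared by consecutive chunks
}

function chunkText(text: string, settings: ChunkSettings): string[] {
  const { chunkSize, overlap } = settings;
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be at least 0 and smaller than chunkSize");
  }
  const step = chunkSize - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once this window already reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

For example, a 10-character string with chunkSize 4 and overlap 1 yields three chunks starting at offsets 0, 3 and 6. Slicing by UTF-16 code units can split a surrogate pair at a boundary. A production chunker would likely also prefer to break on whitespace or sentence boundaries.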
Implementation details (reliability)
- Streaming reads: Files are read via streams where possible to avoid blocking the main thread and to surface progress.
- Chunking: Text is chunked in a web worker when available; falls back to main thread if workers are unavailable or CSP-blocked.
- Sanitization: Control characters are stripped and whitespace normalized before embedding to reduce tokenization issues.
- PDF handling: Uses pdfjs-dist with a strict CSP configuration that disables eval.
- DOCX handling: Uses mammoth for reliable text extraction.
- Error isolation: Failures on one file do not halt others; per-file progress and errors are reported in the UI.
- Vector sync: Embedding results can be pushed to Qdrant; dimension hints help avoid mismatches between the embedding model's output size and the vector size configured on the Qdrant collection.
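The sanitization, error-isolation and dimension-check behaviors above can be approximated as follows. This is a sketch under stated assumptions: the exact characters stripped, the sequential per-file loop, and all names (sanitizeText, processFiles, assertDimension) are hypothetical. The dimension check reflects how Qdrant works: a collection's vector size is fixed when it is created, so vectors from a model with a different output size cannot be upserted into it.

```typescript
// Strip control characters (keeping tab and newline) and normalize whitespace.
function sanitizeText(raw: string): string {
  return raw
    .replace(/\r\n?/g, "\n") // unify line endings
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, "") // drop control chars
    .replace(/[ \t]+/g, " ") // collapse runs of spaces and tabs
    .replace(/\n{3,}/g, "\n\n") // cap consecutive blank lines
    .trim();
}

// Hypothetical per-file outcome shown in the UI.
interface FileOutcome {
  name: string;
  status: "done" | "error";
  message?: string;
}

// Each file gets its own try/catch, so one failure does not stop the rest.
async function processFiles<T extends { name: string }>(
  files: T[],
  handle: (file: T) => Promise<void>,
): Promise<FileOutcome[]> {
  const outcomes: FileOutcome[] = [];
  for (const file of files) {
    try {
      await handle(file);
      outcomes.push({ name: file.name, status: "done" });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      outcomes.push({ name: file.name, status: "error", message });
    }
  }
  return outcomes;
}

// Guard before upserting: every vector must match the collection's size.
function assertDimension(vectors: number[][], expected: number): void {
  vectors.forEach((vector, i) => {
    if (vector.length !== expected) {
      throw new Error(`vector ${i} has dimension ${vector.length}, expected ${expected}`);
    }
  });
}
```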
Tips for large batches
- Prefer Chromium-based browsers for better streaming and worker stability.
- Keep the tab in the foreground during long uploads, because browsers may throttle work in background tabs.
- Start with smaller batches if you see memory pressure, then scale up.
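One way to apply the "smaller batches first" advice is to split the file list into fixed-size batches and process one batch at a time. You can then raise the batch size once memory use looks stable. toBatches and runInBatches are hypothetical helpers, not part of the Uploads view. This sketch assumes the work function handles its own errors, since Promise.all rejects as soon as any item in a batch fails.

```typescript
// Split a list into consecutive batches of at most batchSize items.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Batches run one after another; items within a batch run concurrently.
// `work` should catch its own errors so one failure does not reject the batch.
async function runInBatches<T>(
  items: T[],
  batchSize: number,
  work: (item: T) => Promise<void>,
): Promise<void> {
  for (const batch of toBatches(items, batchSize)) {
    await Promise.all(batch.map(work));
  }
}
```

A smaller batchSize bounds how many files are being read and embedded at once. It trades throughput for lower peak memory.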
File size limits
- No hard file size limit is enforced, because files are processed in chunks regardless of their overall size. Even so, keeping individual files moderately sized is recommended, since very large files take longer to process and increase memory use.
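Chunked reading of a large text file can be sketched with Blob.stream() and a streaming TextDecoder. This avoids decoding the whole file in one call; memory use then depends on what the callback does with each piece. The sketch assumes UTF-8 input in a text format such as TXT, MD or JSON, and readTextInPieces is a hypothetical name. Passing { stream: true } matters because a multi-byte character can be split across two stream chunks.

```typescript
// Reads a Blob (a File is a Blob) as UTF-8 text, one stream chunk at a time.
// Returns the number of bytes read.
async function readTextInPieces(blob: Blob, onText: (piece: string) => void): Promise<number> {
  const reader = blob.stream().getReader();
  const decoder = new TextDecoder("utf-8");
  let bytes = 0;
  for (;;) {
    const result = await reader.read();
    if (result.done) break;
    const chunk = result.value;
    bytes += chunk.byteLength;
    // stream: true buffers an incomplete multi-byte sequence until the next chunk.
    const piece = decoder.decode(chunk, { stream: true });
    if (piece) onText(piece);
  }
  const tail = decoder.decode(); // flush anything still buffered
  if (tail) onText(tail);
  return bytes;
}
```

Each piece could then be sanitized and fed to the chunker, so the full text never needs to exist as a single string.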