Uploading Guide

Supported file types

  • Documents: PDF, DOCX, TXT, MD, HTML, JSON.
  • Folders: Drag-and-drop of folders is supported; the files they contain are traversed and ingested.

Flow

1) Drag and drop files or folders into the Uploads view.
2) Configure chunk size and overlap, and choose the embedding model.
3) Optionally enable Qdrant sync to upsert embeddings immediately.
4) Monitor progress and errors inline; processed chunks feed the Embeddings/Vector DB flows.
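
To make the chunk size and overlap settings from step 2 concrete, here is a minimal sketch of overlapping chunking. It assumes sizes are measured in characters (the app may count tokens instead), and `chunkText` is a hypothetical helper name, not the app's actual function.

```typescript
// Hypothetical helper: split text into fixed-size chunks where each chunk
// repeats the last `overlap` characters of the previous one.
// Assumption: sizes are in UTF-16 code units, not tokens.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  if (!Number.isInteger(overlap) || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be an integer in [0, chunkSize)");
  }

  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each time

  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the window has reached the end, so no chunk is emitted
    // that consists only of already-covered overlap.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

With a chunk size of 4 and an overlap of 1, the text "abcdefghij" becomes "abcd", "defg", "ghij". A larger overlap preserves more context across chunk boundaries at the cost of producing more chunks to embed.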

Implementation details (reliability)

  • Streaming reads: Files are read via streams where possible to avoid blocking the main thread and to surface progress.
  • Chunking: Text is chunked in a web worker when available; chunking falls back to the main thread if workers are unavailable or blocked by CSP.
  • Sanitization: Control characters are stripped and whitespace normalized before embedding to reduce tokenization issues.
  • PDF handling: Uses pdfjs-dist, configured for strict CSP with eval disabled.
  • DOCX handling: Uses mammoth for reliable text extraction.
  • Error isolation: Failures on one file do not halt others; per-file progress and errors are reported in the UI.
  • Vector sync: Embedding results can be pushed to Qdrant; dimension hints help avoid vector-size mismatches with the target collection.
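
Two of the steps above, sanitization and the dimension check before vector sync, can be sketched as follows. This is an illustration under stated assumptions, not the app's actual code: the exact set of stripped control characters, the whitespace rules, and the names `sanitizeText` and `findDimensionMismatches` are all hypothetical.

```typescript
// Hypothetical sanitizer: strips control characters and normalizes whitespace.
// Assumption: tabs and newlines are kept as whitespace; other C0 controls
// and DEL are removed.
function sanitizeText(input: string): string {
  return input
    .replace(/\r\n?/g, "\n") // normalize CRLF and bare CR to LF
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, "") // drop control chars
    .replace(/[ \t]+/g, " ") // collapse runs of spaces and tabs
    .replace(/ *\n */g, "\n") // trim spaces around line breaks
    .replace(/\n{3,}/g, "\n\n") // keep at most one blank line
    .trim();
}

// Hypothetical pre-sync check: returns the indices of vectors whose length
// differs from the dimension the target collection expects.
function findDimensionMismatches(vectors: number[][], expectedDim: number): number[] {
  const mismatched: number[] = [];
  vectors.forEach((vector, index) => {
    if (vector.length !== expectedDim) mismatched.push(index);
  });
  return mismatched;
}
```

Running a check like `findDimensionMismatches` before upserting lets the UI report which chunks carry the wrong vector size instead of the whole sync failing at the database.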

Tips for large batches

  • Prefer Chromium-based browsers for better streaming and worker stability.
  • Keep the tab focused during long uploads to prevent background throttling.
  • Start with smaller batches if you see memory pressure, then scale up.
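
The advice to start with smaller batches can be combined with per-file error isolation, as in the sketch below. It is a minimal illustration, not the app's implementation: `processInBatches`, the `FileResult` type, and the `processOne` callback are hypothetical names, and files are represented by their names only.

```typescript
// Hypothetical result type: one entry per file, success or failure.
type FileResult<T> =
  | { name: string; ok: true; value: T }
  | { name: string; ok: false; error: string };

// Processes files in sequential batches of `batchSize`. Files inside a batch
// run concurrently; a failure on one file is recorded and does not stop
// the others.
async function processInBatches<T>(
  names: string[],
  batchSize: number,
  processOne: (name: string) => Promise<T>,
): Promise<FileResult<T>[]> {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }

  const results: FileResult<T>[] = [];
  for (let i = 0; i < names.length; i += batchSize) {
    const batch = names.slice(i, i + batchSize);
    // Each file catches its own error, so Promise.all never rejects here.
    const settled = await Promise.all(
      batch.map(async (name): Promise<FileResult<T>> => {
        try {
          return { name, ok: true, value: await processOne(name) };
        } catch (err) {
          const message = err instanceof Error ? err.message : String(err);
          return { name, ok: false, error: message };
        }
      }),
    );
    results.push(...settled);
  }
  return results;
}
```

A smaller `batchSize` limits how many files are held in memory at once; if memory pressure is not an issue, it can be raised to improve throughput.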

File size limits

  • No hard file size limit is enforced: files are processed in chunks regardless of their overall size. Keeping individual files reasonably small is still recommended for performance.