swiss-ai/mmore

GitHub
4 updates · last 90 days · 1 watcher · Open source

Last release: 2026-05-08

MMORE (Massive Multimodal Open RAG & Extraction) is an open-source, end-to-end pipeline that ingests heterogeneous files like PDFs, office documents, spreadsheets, emails, images, audio, video, and web pages, then processes, indexes, and retrieves knowledge from them. It standardizes content into a unified multimodal format, supports distributed CPU/GPU processing, and provides hybrid dense plus sparse retrieval with an integrated RAG service (CLI and APIs).
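
To make "hybrid dense plus sparse retrieval" concrete, here is a minimal sketch that merges two ranked result lists with reciprocal rank fusion. This is an illustration only: the summary does not say which fusion method MMORE uses, and the function name, the document IDs, and the `k=60` constant are assumptions rather than parts of MMORE's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not an MMORE setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense (embedding) and a sparse (keyword) retriever.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that both retrievers rank highly rise to the top, while documents found by only one retriever still appear in the fused list.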

Project status

  • Actively maintained: The last upstream push was on 2026-05-20, and a run of versioned releases in 2026 culminated in v1.2.3, with ongoing changes to the pipeline, APIs, and operational behavior.
  • Update cadence (recent): v1.2 on 2026-03-27, v1.2.1 on 2026-03-28, v1.2.2 on 2026-04-20, and v1.2.3 on 2026-05-08. Apart from the next-day v1.2.1 patch, recent releases are spaced roughly 2.5 to 3.5 weeks apart.
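
As a quick check on the cadence, the gaps between the 2026 releases can be computed from the dates listed under Recent updates. This is a small sketch using only the standard library; the variable names are illustrative.

```python
from datetime import date

# Release dates as listed under "Recent updates" on this page.
releases = [
    ("v1.2", date(2026, 3, 27)),
    ("v1.2.1", date(2026, 3, 28)),
    ("v1.2.2", date(2026, 4, 20)),
    ("v1.2.3", date(2026, 5, 8)),
]

# Days elapsed between each release and the one before it.
gaps = [
    (later[0], (later[1] - earlier[1]).days)
    for earlier, later in zip(releases, releases[1:])
]

for version, days in gaps:
    print(f"{version}: {days} days after the previous release")
```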

AI summary generated 2026-05-21

AI-generated from public sources. May be inaccurate.

Recent updates

  • v1.2.3

    2026-05-08

    Release v1.2.3 focuses on citation support improvements (chunk range metadata and a /chunks endpoint), internal refactors (shared dataclasses for processor metadata), and operational/storage improvements (caching ML model downloads). It also removes the old processing dashboard, improves documentation, and adds tests and type fixes.

    Breaking · Features
  • v1.2.2

    2026-04-20

    v1.2.2 focuses on improving incremental reprocessing by detecting file changes and reusing cached results, plus some pipeline optimizations. The code diff shows the incremental behavior was implemented via new cache logic based on a processed_at timestamp, and it also changes how outputs are written and cleared.

    Breaking · Features
  • v1.2.1

    2026-03-28

    Release v1.2.1 updates the markdown table row regex used by the post-processor chunker utilities, and adds a regression test to ensure certain malformed lines are rejected quickly. It also updates the project version metadata and extends the dev dependency set.

    Security
  • v1.2

    2026-03-27

    Release v1.2 contains a mix of bug fixes, typing improvements, and substantial RAG pipeline work, including a ColPali-based PDF pipeline and reranker additions. It also includes several dependency bumps and CI workflow updates. The release notes list user-visible features like a new list files endpoint and exposing page/paragraph numbers in the retrieval API, but the code diff shows additional CLI and runtime behavior changes that are not explicitly documented.

    Features
  • v1.1.1

    2025-09-29

    The release notes for v1.1.1 do not enumerate any specific changes. The code diff shows a package version bump in pyproject.toml and a README image URL change.

  • v1.1

    2025-09-29

    v1.1 adds new capabilities around web and RAG workflows, including a dedicated websearch CLI and a RAG CLI, plus Google Drive ingestion support. It also includes a major rewrite of the HTML processor and multiple CI and production deployment updates. The release notes list many dependency bumps, but the code diff also shows some internal API and behavior changes that are not called out.

    Features
  • v1.0.1

    2025-06-24

    v1.0.1 primarily includes a fix for PDF ingestion (KeyError: encoder) along with several dependency bumps and some documentation/docker changes. The release notes also claim new Live retrieval API support (index API and retriever API). The code changes shown indicate additional behavioral and configuration/schema changes that are not mentioned in the release notes.

    Features
  • v1.0.0

    2025-06-12

    v1.0.0 introduces a broad set of new document processing capabilities, including an .eml processor, an HTML processor, and new RAG evaluation, post-processing, and filtering features. It also includes multiple refactors around the processing pipeline, configuration/loading strategy for post-processing (PP) modules, and several API-related fixes (notably for RAG, retriever, and indexer). Expect potential upgrade friction if you rely on older configuration arguments or the prior process engine behavior.

    Breaking · Features
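
The v1.2.2 entry above describes incremental reprocessing driven by a processed_at timestamp. A minimal sketch of that idea follows, assuming change detection compares each file's modification time against the stored timestamp. The function names, the in-memory cache layout, and the use of modification times are assumptions for illustration, not MMORE's actual implementation.

```python
import os
import tempfile
import time


def needs_reprocessing(path, cache):
    """Return True if the file is new or was modified after it was last processed."""
    entry = cache.get(path)
    if entry is None:
        return True
    return os.path.getmtime(path) > entry["processed_at"]


def process_incrementally(paths, cache, process):
    """Process only new or changed files; reuse cached results for the rest."""
    results = {}
    for path in paths:
        if needs_reprocessing(path, cache):
            cache[path] = {"processed_at": time.time(), "result": process(path)}
        results[path] = cache[path]["result"]
    return results


calls = []  # records every real (non-cached) processing run


def fake_process(path):
    """Hypothetical stand-in for an expensive extraction step."""
    calls.append(path)
    with open(path) as handle:
        return handle.read().upper()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")
    with open(path, "w") as handle:
        handle.write("hello")

    cache = {}
    first = process_incrementally([path], cache, fake_process)   # processed
    second = process_incrementally([path], cache, fake_process)  # cache hit

    # Simulate a later edit by moving the modification time past processed_at.
    later = cache[path]["processed_at"] + 1
    os.utime(path, (later, later))
    third = process_incrementally([path], cache, fake_process)   # reprocessed

print(len(calls))
```

A timestamp comparison is cheap but can miss edits that preserve the modification time; a content hash is a common alternative when that matters.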