Skip to content

Releases: docling-project/docling

v2.31.0

25 Apr 08:28
Compare
Choose a tag to compare

Feature

  • Add tutorial using Milvus and Docling for RAG pipeline (#1449) (a2fbbba)

Fix

  • html: Handle address, details, and summary tags (#1436) (ed20124)
  • Treat overflowing -v flags as DEBUG (#1419) (8012a3e)
  • codecov: Fix codecov argument and yaml file (#1399) (fa7fc9e)

Documentation

v2.30.0

14 Apr 08:20
Compare
Choose a tag to compare

Feature

  • cli: Add option for html with split-page mode (#1355) (c0ba88e)
  • xlsx: Create a page for each worksheet in XLSX backend (#1332) (eef2bde)
  • OllamaVlmModel for Granite Vision 3.2 (#1337) (c605edd)

Fix

  • deps: Widen typer upper bound (#1375) (7e40ad3)
  • Auto-recognize .xlsx, .docx and .pptx files (#1340) (0de70e7)
  • docx: Declare image_data variable when handling pictures (#1359) (415b877)
  • Implement PictureDescriptionApiOptions.bitmap_area_threshold (#1248) (2503999)
  • Properly address page in pipeline _assemble_document when page_range is provided (#1334) (6b696b5)

v2.29.0

10 Apr 12:24
Compare
Choose a tag to compare

Feature

  • Handle <code> tags as code blocks (#1320) (0499cd1)
  • docx: Add text formatting and hyperlink support (#630) (bfcab3d)

Fix

  • docx: Adding new latex symbols, simplifying how equations are added to text (#1295) (14e9c0c)
  • pptx: Check if picture shape has an image attached (#1316) (dc3bf9c)
  • docx: Improve text parsing (#1268) (d2d6874)
  • Tesseract OCR CLI can't process images composed with numbers only (#1201) (b3d111a)

Documentation

v2.28.4

29 Mar 11:56
Compare
Choose a tag to compare

Fix

v2.28.3

28 Mar 18:30
Compare
Choose a tag to compare

Fix

v2.28.2

26 Mar 16:52
Compare
Choose a tag to compare

Fix

  • Improve HTML layer detection, various MD fixes (#1241) (9210812)
  • html: Fix HTML parsed heading level (#1244) (85c4df8)

v2.28.1

25 Mar 18:20
Compare
Choose a tag to compare

Fix

  • converter: Cache same pipeline class with different options (#1152) (825b226)
  • debug: Missing translation of bbox to to_bounding_box (#1220) (6df8827)
  • docx: Identifying numbered headers (#1231) (f739d0e)

Documentation

  • examples: Batch conversion doc raises_on_error (#1147) (0974ba4)

v2.28.0

19 Mar 15:18
Compare
Choose a tag to compare

Feature

  • SmolDocling: Support MLX acceleration in VLM pipeline (#1199) (1c26769)
  • Add PPTX notes slides (#474) (b454aa1)
  • Updated vlm pipeline (with latest changes from docling-core) (#1158) (2f72167)

Fix

  • Determine correct page size in DoclingParseV4Backend (#1196) (f5adfb9)
  • msword: Fixing function return in equations handling (#1194) (0b707d0)

Documentation

v2.27.0

18 Mar 13:37
Compare
Choose a tag to compare

Feature

  • Add factory for ocr engines via plugins (#1010) (6eaae3c)
  • Add DoclingParseV4 backend, using high-level docling-parse API (#905) (3960b19)
  • actor: Docling Actor on Apify infrastructure (#875) (772487f)
  • Equations to latex in MSWord backend (with inline groups) (#1114) (6eb718f)

Fix

Documentation

v2.26.0

11 Mar 11:12
Compare
Choose a tag to compare

Feature

  • Use new TableFormer model weights and default to accurate model version (#1100) (eb97357)

Fix

Documentation

  • Add description of DOCLING_ARTIFACTS_PATH env var (#1124) (e1c49ad)

Performance

  • New revision code formula model and document picture classifier (#1140) (5e30381)