Docling for IBM watsonx: A Managed Service, Built on Open Source

Jun 15, 2026

Putting Docling into production has, until now, meant running the infrastructure yourself: deploy the services, configure the pipeline, plan for capacity, and keep it all healthy under load. That is very doable, but it is work that sits between a team and the actual goal of turning documents into AI-ready data.

Docling for IBM watsonx removes that step. It is a managed Docling-as-a-Service offering that lets teams put Docling into production without deploying, configuring, and maintaining the underlying infrastructure themselves. It is designed for a quicker path to value: out-of-the-box configuration, automatic scaling based on workload demand, and high-throughput processing without having to plan and manage dedicated capacity.

You can use it through a simple user interface for experimentation, inspection, and quick document processing. At scale, the same capabilities embed directly into production applications, automation pipelines, and enterprise AI workflows through easy-to-use APIs.

The most important thing to say about it, though, is what it means for the open-source project. So let us start there.

What this means for Docling as open source

Docling is open source first, and Docling for IBM watsonx does not change that. The managed service is not a fork and not a proprietary re-implementation. It runs the same open stack everyone else uses: docling, docling-core, docling-parse, docling-jobkit, and docling-serve.

That ordering is the whole point. The work required to run a production-grade, high-throughput managed service does not disappear into a private branch. It lands in the open repositories as fixes and features that every Docling user benefits from. Running Docling at scale is what surfaces the sharp edges, and across the first half of 2026 you can watch those edges getting filed down, release after release. The dominant theme of the year so far has not been new headline features. It has been hardening: stability, memory control, safe parallelism, and security.

A few concrete examples make the point.

A thread-safe, memory-bounded parser

The largest single investment has been in docling-parse, the C++/Python PDF engine at the bottom of the stack. Its 2026 releases read almost like a hardening checklist.

Early in the year the work was about not crashing: replacing fixed-size buffers with a safer append strategy to prevent segfaults, and robustifying parsing of broken PDFs. Then it was about not hanging: fixing an infinite loop in table-of-contents extraction triggered by circular references inside a PDF. Then it was about doing more at once, safely: making the parser thread-safe by default, adding a regexp fast path for stream decoding, and shipping memory-management improvements coordinated with the upstream docling library.

That arc culminated in the v6 major release: a public threaded PDF parser with a worker thread pool and a max_concurrent_results cap that explicitly bounds how many results are buffered in order to limit memory. Results stream out one at a time, each carrying its own success flag, error message, and timing information, so a single bad page no longer takes down a whole batch. The same release upgraded pillow, requests, pygments, and cryptography to close known vulnerabilities, and later updates added a flag to skip bitmap byte extraction entirely when a caller does not need it, trimming memory further.

This work targets a problem the community knew well. The older v4 parser could accumulate memory on very long documents, climbing well past 20 GB where other backends held steady around 4 GB. A bounded-memory, parallel, error-isolating parser is the proper fix, and it is sitting in the open repository for anyone to use.

Hardening all the way up the stack

The parser is not the only place this shows up.

In docling itself, 2026 brought a steady stream of reliability fixes across the backends. PPTX conversion now skips malformed shapes instead of aborting the whole file. The Markdown backend gained a fix that prevents a RecursionError on certain inputs. The LaTeX handler added path-traversal prevention, and input validation was strengthened for METS-GBS and USPTO XML sources. The library adopted the threaded docling-parse v6 backend and introduced a modular docling-slim package for a lighter dependency footprint.

In docling-core, the focus was data-model integrity: validation to catch duplicate references, repair of table hierarchies when rich cells break the structure, and a fix to prevent numeric precision loss when serializing tables to Markdown.

In docling-jobkit, the distributed execution layer, the theme was resilience against real infrastructure. It can now recover orphaned job status when a worker pod is killed mid-execution, cap the maximum number of Redis connections to avoid pool exhaustion, and run a hardened Ray dispatcher with execution leases and a watchdog that cleans up stuck jobs.

In docling-serve, the API service, operational hardening took the lead: readiness endpoints for clean Kubernetes rollouts, metrics on a separate port, removal of an expensive forced garbage collection from the status-poll hot path, control over how much error detail public API responses expose, failing fast on a bad config file, and server-side page slicing for very long PDFs.

None of this is glamorous, and that is the point. It is exactly the kind of unglamorous reliability work you want underneath a service you are going to trust with a production workflow, and all of it is open source.

Processing data at scale with the batch endpoint

Built on that hardened foundation, one of the most requested capabilities is now available: a batch endpoint for processing large volumes of documents in a single job.

Instead of submitting documents one request at a time, you can point Docling at a source, for example an S3 bucket, and have it process the entire collection and write the converted results back to a target location. This sits on top of the docling-jobkit orchestration work that landed throughout 2026, so the same resilience guarantees that protect a single conversion also apply to a million-page run.

For teams with large document estates, archives, scanned back-catalogs, or regulatory filings, this turns "convert everything" from a bespoke pipeline you have to build and babysit into a single managed job.

The new Docling Service Client

Whether you run your own open-source docling-serve deployment or use the managed SaaS, you can now talk to it through the new Docling Service Client.

Two things make it worth adopting. First, it is ultra lightweight. It installs without the heavy AI packages, with no torch, no model weights, and none of the inference dependencies. It is just the parts you need to submit work and collect results, which makes it trivial to drop into an application, a serverless function, or an automation step without bloating the environment.

Second, it is efficient by design. It handles the submission of conversion tasks for you, including the polling and waiting loop and efficient batching, so you are not hand-rolling retry logic and status checks around the API.

The same client works against either backend. You can prototype against a local docling-serve, then point the exact same code at Docling for IBM watsonx when you are ready to scale, with no rewrite required.

Installation

The Service Client is available in two ways. If you already have the full Docling package installed, the client is included and ready to use. If you want to keep your environment minimal, you can install just the client through the slim package with the service-client extra:

# option 1) included in the full Docling install
pip install docling

# option 2) slim/modular install with minimal dependencies
pip install "docling-slim[service-client]"

Usage

Here is a simple example showing how to use the Service Client:

from pathlib import Path
from docling.service_client import DoclingServiceClient
import os

# Setup required endpoint details
SERVICE_URL = os.getenv("DOCLING_SERVICE_URL")
API_KEY = os.getenv("DOCLING_API_KEY")

# Initialize the client
with DoclingServiceClient(url=SERVICE_URL, api_key=API_KEY) as client:
    # Convert a document as you usually do
    result = client.convert(
        source=Path("path/to/doc.pdf")
    )

    markdown = result.document.export_to_markdown()
    print(markdown)

For more examples including batch processing, custom export formats, and advanced configuration options, see the Service Client examples in the Docling repository.

Getting started with Docling for IBM watsonx

Docling for IBM watsonx is generally available now as a SaaS offering. You can try it for free and start processing documents through an easy-to-use interface, then move to API-based integration when you are ready to embed document intelligence into production workflows.

The service is available through the IBM Marketplace or the AWS Marketplace with a pay-as-you-go pricing.

The result is a practical and cost-effective way to turn complex documents into reusable, AI-ready data, while reducing the manual effort, cost, and operational complexity that can slow AI initiatives down.

Start a free trial Learn more

Recap

A managed service and a healthy open-source project are not in tension here. Docling for IBM watsonx exists because the open Docling stack is good enough to run in production, and the open stack is good enough to run in production because the work of building the managed service flows straight back into it.

If you have been waiting for Docling to feel production-ready, the 2026 hardening releases are the signal. Pull the latest, point the new Service Client at it, and tell us what you build.