Arconia Docling

Arconia integrates with Docling, an AI-powered document conversion service that transforms documents into structured formats such as Markdown. The integration is based on the Docling Java project and provides an auto-configured DoclingServeApi that Spring Boot applications can use to call a Docling Serve API and convert a variety of document formats, including PDFs, Word documents, and web pages.

Quick Start

Let’s see how you can get started with Arconia Docling in your Spring Boot application.

Dependencies

To add Docling support to your Spring Boot application, include the Arconia Docling Spring Boot Starter dependency in your project.

  • Gradle

dependencies {
    implementation 'io.arconia:arconia-docling-spring-boot-starter'
}

  • Maven

<dependency>
    <groupId>io.arconia</groupId>
    <artifactId>arconia-docling-spring-boot-starter</artifactId>
</dependency>

Arconia publishes a BOM (Bill of Materials) that you can use to manage the version of the Arconia libraries. While not required, it is recommended to use the BOM to ensure that all dependencies are compatible.

  • Gradle

dependencyManagement {
    imports {
        mavenBom "io.arconia:arconia-bom:0.19.0"
    }
}

  • Maven

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>io.arconia</groupId>
            <artifactId>arconia-bom</artifactId>
            <version>0.19.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

Dev Services

Arconia Dev Services provide zero-code integrations for services your application depends on, both at development and test time, relying on the power of Testcontainers and Spring Boot.

When working with Docling, you can use the Docling Dev Service to automatically start a Docling Serve instance during development and testing, so you can convert documents without setting one up manually.

To enable the Docling Dev Service, add the following dependency to your project:

  • Gradle

dependencies {
    testAndDevelopmentOnly "io.arconia:arconia-dev-services-docling"
}

  • Maven

<dependency>
    <groupId>io.arconia</groupId>
    <artifactId>arconia-dev-services-docling</artifactId>
    <scope>runtime</scope>
    <optional>true</optional>
</dependency>

<build>
    <plugins>
        <plugin>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-maven-plugin</artifactId>
            <configuration>
                <includeOptional>false</includeOptional>
            </configuration>
        </plugin>
    </plugins>
</build>

By default, the Dev Service exposes the Docling Serve UI on a port of its own. The application logs show the URL where you can access it:

... Docling Serve UI: http://localhost:<port>/ui

Running the Application

When using the Arconia Dev Services, you can keep running your application as you normally would. The Dev Services will automatically start when you run your application.

  • CLI

arconia dev

  • Gradle

./gradlew bootRun

  • Maven

./mvnw spring-boot:run

Unlike the lower-level Testcontainers support in Spring Boot, Arconia requires neither special tasks to run your application with Dev Services (./gradlew bootTestRun or ./mvnw spring-boot:test-run) nor a separate @SpringBootApplication class for configuring Testcontainers.

The application logs will show you the URL where you can access the Docling Serve UI for interactive document conversion.

Configuration

The Arconia Docling integration provides sensible defaults for connecting to a Docling Serve API. You can customize the connection settings and timeouts via configuration properties.

Table 1. Docling Client Configuration Properties

Property                          Default                 Description
arconia.docling.url               http://localhost:5001   Base URL for the Docling Serve API.
arconia.docling.connect-timeout   5s                      Timeout to establish a connection to the Docling Serve API.
arconia.docling.read-timeout      30s                     Timeout for receiving a response from the Docling Serve API.
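
As an illustration, the properties above could be set in application.properties as in the following sketch. The host name and the timeout values are made up for the example, and the duration syntax (10s, 2m) assumes the timeouts bind to Spring Boot's standard Duration format, which the 5s and 30s defaults suggest but this page does not state.

```properties
# Hypothetical Docling Serve location; replace with your own instance
arconia.docling.url=http://docling.internal.example:5001
# Allow more time to connect than the 5s default (illustrative value)
arconia.docling.connect-timeout=10s
# Large documents can take a while to convert (illustrative value)
arconia.docling.read-timeout=2m
```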

Actuator

Health Indicator

When Spring Boot Actuator is present on the classpath, Arconia automatically configures a health indicator for the Docling integration. This health indicator checks the connectivity to the Docling Serve API by calling its health endpoint. You can customize it via configuration properties.

Table 2. Health Configuration Properties

Property                            Default   Description
management.health.docling.enabled   true      Whether the Docling health indicator should be enabled.

When enabled, the health status will be included in the actuator /health endpoint response, showing whether the Docling Serve API is reachable and operational.
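
A minimal configuration sketch for the health indicator follows. The first property comes from the table above. The second is the standard Spring Boot Actuator setting for showing per-component details in the /health response, included here on the assumption that you want the Docling entry to be visible there.

```properties
# Set to false to turn the Docling health indicator off
management.health.docling.enabled=true
# Standard Spring Boot Actuator property, not specific to Arconia
management.endpoint.health.show-details=always
```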

Using the Docling Client

Once you have added the dependency and optionally configured the connection settings, you can autowire and use the auto-configured DoclingServeApi in your Spring components.

Basic Usage

@Component
public class DocumentService {

    private final DoclingServeApi doclingClient;

    public DocumentService(DoclingServeApi doclingClient) {
        this.doclingClient = doclingClient;
    }

    public String convertWebPage(String url) {
        ConvertDocumentRequest request = ConvertDocumentRequest.builder()
                .source(HttpSource.builder().url(URI.create(url)).build())
                .build();

        ConvertDocumentResponse response = doclingClient.convertSource(request);
        return response.getDocument().getMarkdownContent();
    }
}

Converting HTTP Sources

You can convert web pages or documents accessible via HTTP/HTTPS URLs:

ConvertDocumentRequest request = ConvertDocumentRequest.builder()
        .source(HttpSource.builder()
            .url(URI.create("https://example.com/document.pdf"))
            .build())
        .build();

ConvertDocumentResponse response = doclingClient.convertSource(request);
String markdownContent = response.getDocument().getMarkdownContent();
String filename = response.getDocument().getFilename();
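
URI.create throws an IllegalArgumentException for malformed input and accepts any scheme, so when the URL comes from user input it may be worth validating it before building the request. The following is a sketch using only the JDK. SourceUrls and parseHttpUrl are hypothetical names and are not part of Arconia or Docling Java.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper, not part of Arconia or Docling Java.
final class SourceUrls {

    private SourceUrls() {
    }

    // Parses the value and accepts it only if it is an absolute http or https URL with a host.
    static URI parseHttpUrl(String value) {
        URI uri;
        try {
            uri = new URI(value.trim());
        } catch (URISyntaxException ex) {
            throw new IllegalArgumentException("Not a valid URI: " + value, ex);
        }
        String scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
        boolean httpScheme = scheme.equals("http") || scheme.equals("https");
        if (!httpScheme || uri.getHost() == null) {
            throw new IllegalArgumentException("Expected an absolute http(s) URL: " + value);
        }
        return uri;
    }
}
```

The returned URI can then be passed to HttpSource.builder().url(...) in place of the URI.create call shown above.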

Converting File Sources

You can also convert local files by encoding them as Base64:

byte[] fileContent = new ClassPathResource("document.pdf").getContentAsByteArray();
String base64Content = Base64.getEncoder().encodeToString(fileContent);

ConvertDocumentRequest request = ConvertDocumentRequest.builder()
        .source(FileSource.builder()
            .filename("document.pdf")
            .base64String(base64Content)
            .build())
        .build();

ConvertDocumentResponse response = doclingClient.convertSource(request);
String markdownContent = response.getDocument().getMarkdownContent();
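
The example above reads the file from the classpath. For a file on the local filesystem, the same Base64 step can be sketched with the JDK alone. Base64Files and its methods are hypothetical names, not part of Arconia or Docling Java, and the sketch loads the whole file into memory, which is only reasonable for files of moderate size.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper, not part of Arconia or Docling Java.
final class Base64Files {

    private Base64Files() {
    }

    // Encodes raw bytes with the standard (RFC 4648) Base64 alphabet.
    static String encode(byte[] content) {
        return Base64.getEncoder().encodeToString(content);
    }

    // Reads the whole file into memory and returns its Base64 encoding.
    static String encodeFile(Path file) {
        try {
            return encode(Files.readAllBytes(file));
        } catch (IOException ex) {
            throw new UncheckedIOException("Could not read " + file, ex);
        }
    }
}
```

The result of Base64Files.encodeFile(Path.of("document.pdf")) could then be passed to base64String(...) on the FileSource builder, as in the example above.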

Conversion Options

You can customize the conversion process using ConvertDocumentOptions:

ConvertDocumentOptions options = ConvertDocumentOptions.builder()
        .includeImages(true)
        .doOcr(true)
        .build();

ConvertDocumentRequest request = ConvertDocumentRequest.builder()
        .source(HttpSource.builder()
            .url(URI.create("https://example.com/document.pdf"))
            .build())
        .options(options)
        .build();

ConvertDocumentResponse response = doclingClient.convertSource(request);

Error Handling

The DoclingServeApi throws runtime exceptions for the different error conditions, as raised by the underlying RestClient: for example, HttpClientErrorException for 4xx responses and HttpServerErrorException for 5xx responses.

try {
    ConvertDocumentRequest request = ConvertDocumentRequest.builder()
            .source(HttpSource.builder()
                .url(URI.create("https://invalid-url.com/document.pdf"))
                .build())
            .build();
    ConvertDocumentResponse response = doclingClient.convertSource(request);
} catch (HttpClientErrorException.NotFound ex) {
    log.warn("Document not found: {}", ex.getMessage());
} catch (HttpClientErrorException ex) {
    log.error("Client error during conversion: {}", ex.getMessage());
} catch (HttpServerErrorException ex) {
    log.error("Server error during conversion: {}", ex.getMessage());
}

Health Check

You can also programmatically check the health of the Docling Serve instance:

HealthCheckResponse health = doclingClient.health();
if ("ok".equals(health.getStatus())) {
    // Docling server is healthy
} else {
    // Handle unhealthy server
}
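
If the application should wait for Docling Serve to become available, for example right after startup, the check above can be wrapped in a small polling loop. The following is a sketch under stated assumptions: HealthPoller and waitUntilOk are hypothetical names, the status supplier stands in for a call such as doclingClient.health().getStatus(), and any runtime exception from that call is treated as "not healthy yet".

```java
import java.time.Duration;
import java.util.function.Supplier;

// Hypothetical helper, not part of Arconia or Docling Java.
final class HealthPoller {

    private HealthPoller() {
    }

    // Calls statusSupplier up to maxAttempts times, pausing for delay between attempts.
    // Returns true as soon as the supplier reports "ok", false otherwise.
    static boolean waitUntilOk(Supplier<String> statusSupplier, int maxAttempts, Duration delay) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if ("ok".equals(statusSupplier.get())) {
                    return true;
                }
            } catch (RuntimeException ex) {
                // Assumption: a failed call (for example, connection refused) means "not healthy yet".
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException ex) {
                    // Restore the interrupt flag and give up waiting.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

A call could look like HealthPoller.waitUntilOk(() -> doclingClient.health().getStatus(), 10, Duration.ofSeconds(2)). The attempt count and delay are illustrative values.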