Arconia Docling
Arconia integrates with Docling, an AI-powered document conversion service that transforms documents into structured formats such as Markdown. The integration is based on the Docling Java project and auto-configures a DoclingServeApi that Spring Boot applications can use to call a Docling Serve API and convert PDFs, Word documents, web pages, and other document formats.
Quick Start
Let’s see how you can get started with Arconia Docling in your Spring Boot application.
Dependencies
To add Docling support to your Spring Boot application, include the Arconia Docling Spring Boot Starter dependency in your project.
Gradle:
dependencies {
    implementation 'io.arconia:arconia-docling-spring-boot-starter'
}
Maven:
<dependency>
    <groupId>io.arconia</groupId>
    <artifactId>arconia-docling-spring-boot-starter</artifactId>
</dependency>
Arconia publishes a BOM (Bill of Materials) that you can use to manage the version of the Arconia libraries. While not required, it is recommended to use the BOM to ensure that all dependencies are compatible.
Gradle:
dependencyManagement {
    imports {
        mavenBom "io.arconia:arconia-bom:0.19.0"
    }
}
Maven:
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>io.arconia</groupId>
            <artifactId>arconia-bom</artifactId>
            <version>0.19.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
Dev Services
Arconia Dev Services provide zero-code integrations for services your application depends on, both at development and test time, relying on the power of Testcontainers and Spring Boot.
When working with Docling, you can use the Docling Dev Service to start a Docling Serve instance automatically during development and testing, so you can convert documents without setting up an instance manually.
To enable the Docling Dev Service, add the following dependency to your project:
Gradle:
dependencies {
    testAndDevelopmentOnly "io.arconia:arconia-dev-services-docling"
}
Maven:
<dependency>
    <groupId>io.arconia</groupId>
    <artifactId>arconia-dev-services-docling</artifactId>
    <scope>runtime</scope>
    <optional>true</optional>
</dependency>
<build>
    <plugins>
        <plugin>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-maven-plugin</artifactId>
            <configuration>
                <includeOptional>false</includeOptional>
            </configuration>
        </plugin>
    </plugins>
</build>
By default, the Dev Service exposes the Docling Serve UI on a port of its own. The application logs show the URL where you can access it.
... Docling Serve UI: http://localhost:<port>/ui
Running the Application
When using the Arconia Dev Services, you can keep running your application as you normally would. The Dev Services will automatically start when you run your application.
CLI:
arconia dev
Gradle:
./gradlew bootRun
Maven:
./mvnw spring-boot:run
Unlike the lower-level Testcontainers support in Spring Boot, Arconia doesn’t require special tasks to run your application when using Dev Services (./gradlew bootTestRun or ./mvnw spring-boot:test-run), nor does it require you to define a separate @SpringBootApplication class for configuring Testcontainers.
The application logs will show you the URL where you can access the Docling Serve UI for interactive document conversion.
Configuration
The Arconia Docling integration provides sensible defaults for connecting to a Docling Serve API. You can customize the connection settings and timeouts via configuration properties.
| Property | Default | Description |
|---|---|---|
| | | Base URL for the Docling Serve API. |
| | | Timeout to establish a connection to the Docling Serve API. |
| | | Timeout for receiving a response from the Docling Serve API. |
Actuator
Health Indicator
When Spring Boot Actuator is present on the classpath, Arconia automatically configures a health indicator for the Docling integration. This health indicator checks the connectivity to the Docling Serve API by calling its health endpoint. You can customize it via configuration properties.
| Property | Default | Description |
|---|---|---|
| | | Whether the Docling health indicator should be enabled. |
When enabled, the health status will be included in the actuator /health endpoint response, showing whether the Docling Serve API is reachable and operational.
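To make the idea of a connectivity check concrete, here is a minimal sketch of a probe written with the JDK HTTP client only. It is not Arconia's implementation: the class and method names are hypothetical, the /health path is an assumption about the service being probed, and a throwaway local stub server stands in for Docling Serve so the example runs on its own.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical illustration; not part of Arconia or Docling Java.
final class HealthProbeSketch {

    /** Reports "UP" when GET {baseUrl}/health answers 200, "DOWN" on any other status or I/O failure. */
    static String probe(URI baseUrl, Duration connectTimeout, Duration readTimeout) {
        HttpClient client = HttpClient.newBuilder().connectTimeout(connectTimeout).build();
        HttpRequest request = HttpRequest.newBuilder(baseUrl.resolve("/health")) // assumed path
                .timeout(readTimeout)
                .GET()
                .build();
        try {
            HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() == 200 ? "UP" : "DOWN";
        } catch (IOException ex) {
            return "DOWN"; // connection refused, unreachable host, or timeout
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return "DOWN";
        }
    }

    /** Starts a local stub answering /health with the given status, probes it, then stops it. */
    static String probeAgainstStub(int status) {
        try {
            HttpServer stub = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            stub.createContext("/health", exchange -> {
                exchange.sendResponseHeaders(status, -1); // -1 means no response body
                exchange.close();
            });
            stub.start();
            try {
                URI baseUrl = URI.create("http://127.0.0.1:" + stub.getAddress().getPort());
                return probe(baseUrl, Duration.ofSeconds(2), Duration.ofSeconds(5));
            } finally {
                stub.stop(0);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(probeAgainstStub(200));
        System.out.println(probeAgainstStub(503));
    }
}
```

The two durations play the same roles as the connection and response timeouts described in the Configuration section: the first bounds how long the client waits to connect, the second how long it waits for an answer.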
Using the Docling Client
Once you have added the dependency and optionally configured the connection settings, you can autowire and use the auto-configured DoclingServeApi in your Spring components.
Basic Usage
@Component
public class DocumentService {

    private final DoclingServeApi doclingClient;

    public DocumentService(DoclingServeApi doclingClient) {
        this.doclingClient = doclingClient;
    }

    // Converts the document at the given URL and returns its Markdown content.
    public String convertWebPage(String url) {
        ConvertDocumentRequest request = ConvertDocumentRequest.builder()
                .source(HttpSource.builder().url(URI.create(url)).build())
                .build();
        ConvertDocumentResponse response = doclingClient.convertSource(request);
        return response.getDocument().getMarkdownContent();
    }
}
Converting HTTP Sources
You can convert web pages or documents accessible via HTTP/HTTPS URLs:
ConvertDocumentRequest request = ConvertDocumentRequest.builder()
        .source(HttpSource.builder()
                .url(URI.create("https://example.com/document.pdf"))
                .build())
        .build();
ConvertDocumentResponse response = doclingClient.convertSource(request);
String markdownContent = response.getDocument().getMarkdownContent();
String filename = response.getDocument().getFilename();
Converting File Sources
You can also convert local files by encoding them as Base64:
byte[] fileContent = new ClassPathResource("document.pdf").getContentAsByteArray();
String base64Content = Base64.getEncoder().encodeToString(fileContent);
ConvertDocumentRequest request = ConvertDocumentRequest.builder()
        .source(FileSource.builder()
                .filename("document.pdf")
                .base64String(base64Content)
                .build())
        .build();
ConvertDocumentResponse response = doclingClient.convertSource(request);
String markdownContent = response.getDocument().getMarkdownContent();
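The encoding step can be tried in isolation, without Spring or a Docling Serve instance. The sketch below uses only the JDK and reads the file from the filesystem rather than the classpath; the class name and the temporary file are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical illustration of the Base64 step only; no Docling types involved.
final class Base64FileSketch {

    /** Encodes raw bytes with the standard Base64 alphabet, padding included. */
    static String encode(byte[] content) {
        return Base64.getEncoder().encodeToString(content);
    }

    /** Reads a file from the filesystem and returns its content as Base64 text. */
    static String encodeFile(Path file) throws IOException {
        return encode(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        // A temporary text file stands in for a real document.
        Path temp = Files.createTempFile("document", ".txt");
        try {
            Files.writeString(temp, "hello");
            System.out.println(encodeFile(temp));
        } finally {
            Files.delete(temp);
        }
    }
}
```

The resulting string is what the example above passes to base64String. Base64 output is roughly a third larger than the input, which is worth keeping in mind for large documents.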
Conversion Options
You can customize the conversion process using ConvertDocumentOptions:
ConvertDocumentOptions options = ConvertDocumentOptions.builder()
        .includeImages(true)
        .doOcr(true)
        .build();
ConvertDocumentRequest request = ConvertDocumentRequest.builder()
        .source(HttpSource.builder()
                .url(URI.create("https://example.com/document.pdf"))
                .build())
        .options(options)
        .build();
ConvertDocumentResponse response = doclingClient.convertSource(request);
Error Handling
The DoclingServeApi will throw appropriate runtime exceptions for different error conditions, as managed by the underlying RestClient.
try {
    ConvertDocumentRequest request = ConvertDocumentRequest.builder()
            .source(HttpSource.builder()
                    .url(URI.create("https://invalid-url.com/document.pdf"))
                    .build())
            .build();
    ConvertDocumentResponse response = doclingClient.convertSource(request);
} catch (HttpClientErrorException.NotFound ex) {
    // Most specific case first: NotFound is a subclass of HttpClientErrorException.
    log.warn("Document not found: {}", ex.getMessage());
} catch (HttpClientErrorException ex) {
    log.error("Client error during conversion: {}", ex.getMessage());
} catch (HttpServerErrorException ex) {
    log.error("Server error during conversion: {}", ex.getMessage());
}
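The catch blocks above run from the most specific case to the most general one. HttpClientErrorException.NotFound is a subclass of HttpClientErrorException, so it has to come first. The same decision order can be sketched on plain HTTP status codes; the class, enum, and method names below are hypothetical and serve only to illustrate the ordering.

```java
// Hypothetical illustration; mirrors the catch order above on raw status codes.
final class ConversionErrorSketch {

    enum Outcome { SUCCESS, NOT_FOUND, CLIENT_ERROR, SERVER_ERROR, OTHER }

    /** Maps an HTTP status code to an outcome, checking the most specific case first. */
    static Outcome classify(int status) {
        if (status == 404) {
            return Outcome.NOT_FOUND;     // corresponds to HttpClientErrorException.NotFound
        }
        if (status >= 400 && status < 500) {
            return Outcome.CLIENT_ERROR;  // corresponds to HttpClientErrorException
        }
        if (status >= 500 && status < 600) {
            return Outcome.SERVER_ERROR;  // corresponds to HttpServerErrorException
        }
        if (status >= 200 && status < 300) {
            return Outcome.SUCCESS;
        }
        return Outcome.OTHER;             // informational and redirect codes
    }

    public static void main(String[] args) {
        for (int status : new int[] {200, 404, 422, 503}) {
            System.out.println(status + " -> " + classify(status));
        }
    }
}
```

If the 404 check came after the general 4xx check, it would never be reached. In the try/catch form the Java compiler rejects that ordering outright, because the subclass would already have been caught.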