Work with PostgreSQL Database with Vector Extension Using Spring JPA and JdbcClient

Introduction

Vector databases have become essential for modern AI applications, enabling semantic search, recommendation systems, and similarity-based queries. PostgreSQL’s pgvector extension brings vector capabilities to the familiar relational database, and Spring Framework provides excellent API to work with it.

This guide demonstrates how to use Spring Data JPA and Spring JdbcClient to work effectively with PostgreSQL tables with vector columns, showing you when to use each approach and how they complement each other.

All concepts in this guide are demonstrated in the accompanying GitHub repository: https://github.com/theautonomy/spring-jpa-jdbcclient-pgvector-example.git

Why Vectors in PostgreSQL?

Vector embeddings from AI models (like OpenAI, Hugging Face, or custom models) can be stored directly in PostgreSQL alongside your traditional data:

Unified data model: Keep vectors with related business data
Leverage PostgreSQL: No need for separate vector database
ACID guarantees: Transaction support for vector operations
Production-ready: PostgreSQL’s reliability and performance
Cost-effective: Unified infrastructure

JPA vs JdbcClient for Vector Operations

Both approaches have their place:

JPA: Best for standard CRUD operations and basic entity mapping
JdbcClient: Ideal when you need distance values in results or custom DTOs

This guide shows you how to use both effectively.

Getting Started

Dependencies

Add these dependencies to your Spring Boot project:

<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>3.5.6</version>
</parent>

<dependencies>
    <!-- Spring Boot Starters -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-jpa</artifactId>
    </dependency>

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <!-- PostgreSQL Driver -->
    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
    </dependency>

    <!-- Vector Support -->
    <dependency>
        <groupId>com.pgvector</groupId>
        <artifactId>pgvector</artifactId>
        <version>0.1.6</version>
    </dependency>

    <dependency>
        <groupId>org.hibernate.orm</groupId>
        <artifactId>hibernate-vector</artifactId>
        <version>6.6.5.Final</version>
    </dependency>
</dependencies>

Ensure you’re using Java 21 or later for the best experience with modern Spring Boot features.

PostgreSQL Setup

You need PostgreSQL with the pgvector extension. You can set it up with Docker:

docker run -d \
  --name pgvector-demo \
  -e POSTGRES_DB=myvectordb \
  -e POSTGRES_USER=myuser \
  -e POSTGRES_PASSWORD=mypassword \
  -p 5432:5432 \
  pgvector/pgvector:pg17

Connect to the database and execute the following SQL to enable the vector extension:

CREATE EXTENSION IF NOT EXISTS vector;

Database Schema

Create a table with a vector column:

CREATE TABLE items (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    category TEXT,
    price NUMERIC(10, 2),
    embedding vector(4),  -- 4-dimensional vector
    created_at TIMESTAMP DEFAULT NOW()
);

The number in vector(4) specifies the dimensionality. In production, you’d typically use 384, 768, or 1536 dimensions depending on your embedding model.

Application Configuration

Configure your database connection in application.properties:

spring.datasource.url=jdbc:postgresql://localhost:5432/myvectordb
spring.datasource.username=myuser
spring.datasource.password=mypassword

JPA Approach

Defining the Entity

Create a JPA entity with vector support using Hibernate’s vector types:

import jakarta.persistence.*;
import org.hibernate.annotations.JdbcTypeCode;
import org.hibernate.type.SqlTypes;

@Entity
@Table(name = "items")
public class Item {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column(nullable = false)
    private String name;

    private String category;

    private BigDecimal price;

    @JdbcTypeCode(SqlTypes.VECTOR)
    @Column(columnDefinition = "vector(4)")
    private float[] embedding;

    @Column(name = "created_at")
    private LocalDateTime createdAt;

    // Constructors, getters, and setters

    public Item() {
        this.createdAt = LocalDateTime.now();
    }

    public Item(String name, String category, BigDecimal price, float[] embedding) {
        this.name = name;
        this.category = category;
        this.price = price;
        this.embedding = embedding;
        this.createdAt = LocalDateTime.now();
    }

    // Standard getters and setters...
}

The @JdbcTypeCode(SqlTypes.VECTOR) annotation tells Hibernate to handle this field as a PostgreSQL vector type.

Creating a Repository

Define a Spring Data JPA repository with native queries for vector operations:

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;
import org.springframework.stereotype.Repository;

import java.util.List;

@Repository
public interface ItemRepository extends JpaRepository<Item, Long> {

    // Standard JPA methods work as expected
    List<Item> findByCategory(String category);

    // Vector similarity search using L2 distance (Euclidean)
    @Query(value = """
        SELECT * FROM items
        ORDER BY embedding <-> CAST(:vector AS vector)
        LIMIT :limit
        """, nativeQuery = true)
    List<Item> findSimilarByL2(@Param("vector") String vector,
                                @Param("limit") int limit);

    // Vector similarity using cosine distance
    @Query(value = """
        SELECT * FROM items
        ORDER BY embedding <=> CAST(:vector AS vector)
        LIMIT :limit
        """, nativeQuery = true)
    List<Item> findSimilarByCosine(@Param("vector") String vector,
                                    @Param("limit") int limit);

    // Vector similarity using inner product
    @Query(value = """
        SELECT * FROM items
        ORDER BY embedding <#> CAST(:vector AS vector)
        LIMIT :limit
        """, nativeQuery = true)
    List<Item> findSimilarByInnerProduct(@Param("vector") String vector,
                                          @Param("limit") int limit);

    // Filtered vector search
    @Query(value = """
        SELECT * FROM items
        WHERE category = :category
        ORDER BY embedding <-> CAST(:vector AS vector)
        LIMIT :limit
        """, nativeQuery = true)
    List<Item> findSimilarInCategory(@Param("vector") String vector,
                                      @Param("category") String category,
                                      @Param("limit") int limit);
}

Understanding Distance Operators

PostgreSQL pgvector provides four distance operators:

Operator Name Best For

Operator	Name	Best For
`<→`	L2 Distance	Euclidean distance - measures straight-line distance between vectors
`<⇒`	Cosine Distance	Directional similarity - best for normalized vectors
`<#>`	Inner Product	Magnitude-aware similarity - useful for recommendation systems
`<+>`	L1 Distance	Manhattan distance - sum of absolute differences

<→

L2 Distance

Euclidean distance - measures straight-line distance between vectors

<⇒

Cosine Distance

Directional similarity - best for normalized vectors

<#>

Inner Product

Magnitude-aware similarity - useful for recommendation systems

<+>

L1 Distance

Manhattan distance - sum of absolute differences

For most semantic search applications, cosine distance (<⇒) is the preferred metric.

Vector Casting in Queries

PostgreSQL requires explicit casting when working with vector parameters. Two approaches work equally well:

-- SQL Standard CAST() function
SELECT * FROM items
ORDER BY embedding <-> CAST('[1.0, 0.5, 0.2, 0.1]' AS vector)
LIMIT 10;

-- PostgreSQL :: operator (more common)
SELECT * FROM items
ORDER BY embedding <-> '[1.0, 0.5, 0.2, 0.1]'::vector
LIMIT 10;

Both produce identical performance and query plans. Choose one style and be consistent throughout your codebase.

JdbcClient Approach

Why Use JdbcClient for Vectors?

While JPA handles basic vector operations, JdbcClient shines when you need:

Distance values in results: Know how similar items actually are
Custom DTOs: Return exactly the fields you need
Complex aggregations: Compute category centroids or cross-item distances
Dynamic queries: Build queries based on runtime conditions

Creating a Service with JdbcClient

import org.springframework.jdbc.core.simple.JdbcClient;
import org.springframework.stereotype.Service;

import java.math.BigDecimal;
import java.util.List;

@Service
public class ItemJdbcService {

    private final JdbcClient jdbcClient;

    public ItemJdbcService(JdbcClient jdbcClient) {
        this.jdbcClient = jdbcClient;
    }

    public List<ItemWithDistance> findSimilarWithDistances(
            float[] queryVector,
            int limit) {

        String vectorString = vectorToString(queryVector);

        return jdbcClient.sql("""
                SELECT
                    id,
                    name,
                    category,
                    price,
                    embedding <-> :vector::vector AS distance
                FROM items
                ORDER BY distance
                LIMIT :limit
                """)
                .param("vector", vectorString)
                .param("limit", limit)
                .query((rs, rowNum) -> new ItemWithDistance(
                        rs.getLong("id"),
                        rs.getString("name"),
                        rs.getString("category"),
                        rs.getBigDecimal("price"),
                        rs.getDouble("distance")
                ))
                .list();
    }

    public List<ItemWithDistance> findSimilarInCategoryWithDistances(
            float[] queryVector,
            String category,
            int limit) {

        String vectorString = vectorToString(queryVector);

        return jdbcClient.sql("""
                SELECT
                    id,
                    name,
                    category,
                    price,
                    embedding <-> :vector::vector AS distance
                FROM items
                WHERE category = :category
                ORDER BY distance
                LIMIT :limit
                """)
                .param("vector", vectorString)
                .param("category", category)
                .param("limit", limit)
                .query((rs, rowNum) -> new ItemWithDistance(
                        rs.getLong("id"),
                        rs.getString("name"),
                        rs.getString("category"),
                        rs.getBigDecimal("price"),
                        rs.getDouble("distance")
                ))
                .list();
    }

    // Helper method to convert float array to PostgreSQL vector format
    private String vectorToString(float[] vector) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < vector.length; i++) {
            if (i > 0) sb.append(",");
            sb.append(vector[i]);
        }
        sb.append("]");
        return sb.toString();
    }
}

Creating a DTO for Results

Define a record to hold results with distance information:

public record ItemWithDistance(
    Long id,
    String name,
    String category,
    BigDecimal price,
    Double distance
) {
    public String getDistanceFormatted() {
        return String.format("%.4f", distance);
    }
}

Java records are perfect for DTOs - immutable, concise, and type-safe.

Advanced Vector Operations

Computing Category Centroids

Find the "average" vector for each category:

public record CategoryCentroid(
    String category,
    String centroidVector,
    Long itemCount
) {}

public List<CategoryCentroid> computeCategoryCentroids() {
    return jdbcClient.sql("""
            SELECT
                category,
                AVG(embedding)::text as centroid_vector,
                COUNT(*) as item_count
            FROM items
            WHERE category IS NOT NULL
            GROUP BY category
            """)
            .query((rs, rowNum) -> new CategoryCentroid(
                    rs.getString("category"),
                    rs.getString("centroid_vector"),
                    rs.getLong("item_count")
            ))
            .list();
}

Finding Items Closest to Category Center

public List<ItemWithDistance> findItemsClosestToCategoryCenter(
        String category,
        int limit) {

    return jdbcClient.sql("""
            WITH category_centroid AS (
                SELECT AVG(embedding) as centroid
                FROM items
                WHERE category = :category
            )
            SELECT
                i.id,
                i.name,
                i.category,
                i.price,
                i.embedding <-> c.centroid AS distance
            FROM items i, category_centroid c
            WHERE i.category = :category
            ORDER BY distance
            LIMIT :limit
            """)
            .param("category", category)
            .param("limit", limit)
            .query((rs, rowNum) -> new ItemWithDistance(
                    rs.getLong("id"),
                    rs.getString("name"),
                    rs.getString("category"),
                    rs.getBigDecimal("price"),
                    rs.getDouble("distance")
            ))
            .list();
}

Cross-Item Distance Matrix

Calculate distances between all pairs of items:

public record ItemPairDistance(
    Long item1Id,
    String item1Name,
    Long item2Id,
    String item2Name,
    Double distance
) {}

public List<ItemPairDistance> computeItemDistances() {
    return jdbcClient.sql("""
            SELECT
                i1.id as item1_id,
                i1.name as item1_name,
                i2.id as item2_id,
                i2.name as item2_name,
                i1.embedding <-> i2.embedding AS distance
            FROM items i1
            CROSS JOIN items i2
            WHERE i1.id < i2.id
            ORDER BY distance
            """)
            .query((rs, rowNum) -> new ItemPairDistance(
                    rs.getLong("item1_id"),
                    rs.getString("item1_name"),
                    rs.getLong("item2_id"),
                    rs.getString("item2_name"),
                    rs.getDouble("distance")
            ))
            .list();
}

Building a REST API

Controller Implementation

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

import java.util.List;
import java.util.Optional;

@RestController
@RequestMapping("/api/items")
public class ItemController {

    private final ItemRepository itemRepository;
    private final ItemJdbcService itemJdbcService;

    public ItemController(ItemRepository itemRepository,
                         ItemJdbcService itemJdbcService) {
        this.itemRepository = itemRepository;
        this.itemJdbcService = itemJdbcService;
    }

    @GetMapping
    public List<Item> getAllItems() {
        return itemRepository.findAll();
    }

    @GetMapping("/{id}")
    public ResponseEntity<Item> getItemById(@PathVariable Long id) {
        return itemRepository.findById(id)
                .map(ResponseEntity::ok)
                .orElse(ResponseEntity.notFound().build());
    }

    @PostMapping
    public Item createItem(@RequestBody ItemRequest request) {
        Item item = new Item(
                request.name(),
                request.category(),
                request.price(),
                request.embedding()
        );
        return itemRepository.save(item);
    }

    @PostMapping("/search/similar")
    public List<ItemWithDistance> searchSimilar(
            @RequestBody VectorSearchRequest request) {

        return itemJdbcService.findSimilarWithDistances(
                request.vector(),
                request.limit()
        );
    }

    @PostMapping("/search/similar-category")
    public List<ItemWithDistance> searchSimilarInCategory(
            @RequestBody CategoryVectorSearchRequest request) {

        return itemJdbcService.findSimilarInCategoryWithDistances(
                request.vector(),
                request.category(),
                request.limit()
        );
    }

    @GetMapping("/stats/categories")
    public List<CategoryCentroid> getCategoryStats() {
        return itemJdbcService.computeCategoryCentroids();
    }
}

// Request DTOs
record ItemRequest(
    String name,
    String category,
    BigDecimal price,
    float[] embedding
) {}

record VectorSearchRequest(
    float[] vector,
    int limit
) {}

record CategoryVectorSearchRequest(
    float[] vector,
    String category,
    int limit
) {}

Sample API Usage

Search for similar items:

curl -X POST http://localhost:8080/api/items/search/similar \
  -H "Content-Type: application/json" \
  -d '{
    "vector": [1.0, 0.5, 0.2, 0.1],
    "limit": 5
  }'

Response:

[
  {
    "id": 3,
    "name": "Apple",
    "category": "fruit",
    "price": 1.99,
    "distance": 0.0234
  },
  {
    "id": 7,
    "name": "Orange",
    "category": "fruit",
    "price": 2.49,
    "distance": 0.0891
  }
]

Recommendations

When to Use JPA vs JdbcClient

Use JPA when:

Performing standard CRUD operations
Working with entity relationships
You don’t need distance values in results
Leveraging JPA features (caching, lazy loading)

Use JdbcClient when:

You need distance values in results
Creating custom DTOs with specific fields
Performing complex aggregations
Building dynamic queries
Maximum performance is critical

Vector Dimension Consistency

Always validate that vectors have the correct dimensionality:

@Service
public class ItemService {

    private static final int EXPECTED_DIMENSIONS = 4;

    public Item createItem(ItemRequest request) {
        validateVectorDimensions(request.embedding());
        // Create and save item
    }

    private void validateVectorDimensions(float[] vector) {
        if (vector == null || vector.length != EXPECTED_DIMENSIONS) {
            throw new IllegalArgumentException(
                String.format("Vector must have exactly %d dimensions",
                             EXPECTED_DIMENSIONS)
            );
        }
    }
}

Indexing for Performance

For production systems with large datasets, create an index on the vector column:

-- IVFFlat index (faster build, good recall)
CREATE INDEX items_embedding_idx ON items
USING ivfflat (embedding vector_l2_ops)
WITH (lists = 100);

-- HNSW index (slower build, better recall)
CREATE INDEX items_embedding_idx ON items
USING hnsw (embedding vector_l2_ops);

Start with IVFFlat for development. Consider HNSW for production when you have sufficient data and need better recall.

Parameterized Queries for Security

Always use parameterized queries to prevent SQL injection:

// Good - parameterized
jdbcClient.sql("""
        SELECT * FROM items
        WHERE category = :category
        ORDER BY embedding <-> :vector::vector
        """)
        .param("category", category)
        .param("vector", vectorString)
        .query(Item.class)
        .list();

// Bad - string concatenation
String sql = "SELECT * FROM items " +
             "WHERE category = '" + category + "' " +
             "ORDER BY embedding <-> '" + vectorString + "'::vector";
// NEVER DO THIS!

Use Connection Pooling

Configure HikariCP for optimal performance:

spring.datasource.hikari.maximum-pool-size=10
spring.datasource.hikari.minimum-idle=5
spring.datasource.hikari.connection-timeout=20000
spring.datasource.hikari.idle-timeout=300000
spring.datasource.hikari.max-lifetime=1200000

Testing

Integration Testing with Testcontainers

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.DynamicPropertyRegistry;
import org.springframework.test.context.DynamicPropertySource;
import org.testcontainers.containers.PostgreSQLContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

import java.math.BigDecimal;
import java.util.List;

import static org.assertj.core.api.Assertions.assertThat;

@SpringBootTest
@Testcontainers
class ItemVectorSearchTest {

    @Container
    static PostgreSQLContainer<?> postgres =
        new PostgreSQLContainer<>("pgvector/pgvector:pg17")
            .withDatabaseName("testdb")
            .withUsername("test")
            .withPassword("test")
            .withInitScript("init-vector.sql");

    @DynamicPropertySource
    static void configureProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.datasource.url", postgres::getJdbcUrl);
        registry.add("spring.datasource.username", postgres::getUsername);
        registry.add("spring.datasource.password", postgres::getPassword);
    }

    @Autowired
    private ItemRepository itemRepository;

    @Autowired
    private ItemJdbcService itemJdbcService;

    @Test
    void shouldFindSimilarItems() {
        // Given
        Item apple = new Item("Apple", "fruit",
                             BigDecimal.valueOf(1.99),
                             new float[]{1.0f, 0.5f, 0.2f, 0.1f});
        itemRepository.save(apple);

        Item orange = new Item("Orange", "fruit",
                              BigDecimal.valueOf(2.49),
                              new float[]{1.1f, 0.4f, 0.3f, 0.1f});
        itemRepository.save(orange);

        // When
        float[] queryVector = {1.0f, 0.5f, 0.2f, 0.1f};
        List<ItemWithDistance> results =
            itemJdbcService.findSimilarWithDistances(queryVector, 2);

        // Then
        assertThat(results).hasSize(2);
        assertThat(results.get(0).name()).isEqualTo("Apple");
        assertThat(results.get(0).distance()).isLessThan(0.01);
    }
}

Create src/test/resources/init-vector.sql:

CREATE EXTENSION IF NOT EXISTS vector;

Common Pitfalls

Incorrect Vector Format

// Wrong - missing brackets
String vector = "1.0, 0.5, 0.2, 0.1";

// Wrong - using parentheses
String vector = "(1.0, 0.5, 0.2, 0.1)";

// Correct - square brackets
String vector = "[1.0, 0.5, 0.2, 0.1]";

Forgetting Vector Casting

-- Wrong - no cast
SELECT * FROM items
ORDER BY embedding <-> '[1.0, 0.5, 0.2, 0.1]'
LIMIT 10;

-- Correct - explicit cast
SELECT * FROM items
ORDER BY embedding <-> '[1.0, 0.5, 0.2, 0.1]'::vector
LIMIT 10;

Dimension Mismatch

// If table has vector(4), this will fail:
float[] wrongVector = {1.0f, 0.5f, 0.2f}; // Only 3 dimensions

// Correct:
float[] correctVector = {1.0f, 0.5f, 0.2f, 0.1f}; // 4 dimensions

Conclusion

Spring Framework provides excellent support for working with PostgreSQL vector columns through both JPA and JdbcClient:

JPA excels at standard CRUD operations and maintaining entity relationships
JdbcClient shines when you need distance values, custom DTOs, or complex queries
Together, they provide a complete toolkit for building vector-powered applications

The key insights:

Use the right tool for each job - JPA and JdbcClient complement each other
Always validate vector dimensions before saving to the database
Leverage parameterized queries for security
Consider indexing strategies for production workloads
Test with Testcontainers for realistic integration tests

Whether you’re building semantic search, recommendation systems, or AI-powered features, Spring and PostgreSQL’s pgvector extension provide a robust, production-ready foundation.

About the Author

Wei Li is a passionate software developer and technical writer specializing in Java and Spring Framework. With years of experience in building enterprise applications, Wei Li enjoys sharing knowledge through detailed tutorials and practical guides.

For questions or feedback, feel free to reach out at weili.mail@gmail.com.

Work with PostgreSQL Database with Vector Extension Using Spring JPA and JdbcClient

Introduction

Why Vectors in PostgreSQL?

JPA vs JdbcClient for Vector Operations

Getting Started

Dependencies

PostgreSQL Setup

Database Schema

Application Configuration

JPA Approach

Defining the Entity

Creating a Repository

Understanding Distance Operators

Vector Casting in Queries

JdbcClient Approach

Why Use JdbcClient for Vectors?

Creating a Service with JdbcClient

Creating a DTO for Results

Advanced Vector Operations

Computing Category Centroids

Finding Items Closest to Category Center

Cross-Item Distance Matrix

Building a REST API

Controller Implementation

Sample API Usage

Recommendations

When to Use JPA vs JdbcClient

Vector Dimension Consistency

Indexing for Performance

Parameterized Queries for Security

Use Connection Pooling

Testing

Integration Testing with Testcontainers

Common Pitfalls

Incorrect Vector Format

Forgetting Vector Casting

Dimension Mismatch

Conclusion

Further Reading

About the Author