Introduction
Vector databases have become essential for modern AI applications, enabling semantic search, recommendation systems, and similarity-based queries. PostgreSQL’s pgvector extension brings vector capabilities to the familiar relational database, and Spring Framework provides excellent API to work with it.
This guide demonstrates how to use Spring Data JPA and Spring JdbcClient to work effectively with PostgreSQL tables with vector columns, showing you when to use each approach and how they complement each other.
| All concepts in this guide are demonstrated in the accompanying GitHub repository: https://github.com/theautonomy/spring-jpa-jdbcclient-pgvector-example.git |
Why Vectors in PostgreSQL?
Vector embeddings from AI models (like OpenAI, Hugging Face, or custom models) can be stored directly in PostgreSQL alongside your traditional data:
-
Unified data model: Keep vectors with related business data
-
Leverage PostgreSQL: No need for separate vector database
-
ACID guarantees: Transaction support for vector operations
-
Production-ready: PostgreSQL’s reliability and performance
-
Cost-effective: Unified infrastructure
Getting Started
Dependencies
Add these dependencies to your Spring Boot project:
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>3.5.6</version>
</parent>
<dependencies>
<!-- Spring Boot Starters -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- PostgreSQL Driver -->
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
</dependency>
<!-- Vector Support -->
<dependency>
<groupId>com.pgvector</groupId>
<artifactId>pgvector</artifactId>
<version>0.1.6</version>
</dependency>
<dependency>
<groupId>org.hibernate.orm</groupId>
<artifactId>hibernate-vector</artifactId>
<version>6.6.5.Final</version>
</dependency>
</dependencies>
| Ensure you’re using Java 21 or later for the best experience with modern Spring Boot features. |
PostgreSQL Setup
You need PostgreSQL with the pgvector extension. You can set it up with Docker:
docker run -d \
--name pgvector-demo \
-e POSTGRES_DB=myvectordb \
-e POSTGRES_USER=myuser \
-e POSTGRES_PASSWORD=mypassword \
-p 5432:5432 \
pgvector/pgvector:pg17
Connect to the database and execute the following SQL to enable the vector extension:
CREATE EXTENSION IF NOT EXISTS vector;
Database Schema
Create a table with a vector column:
CREATE TABLE items (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
category TEXT,
price NUMERIC(10, 2),
embedding vector(4), -- 4-dimensional vector
created_at TIMESTAMP DEFAULT NOW()
);
The number in vector(4) specifies the dimensionality. In production, you’d typically use 384, 768, or 1536 dimensions depending on your embedding model.
|
JPA Approach
Defining the Entity
Create a JPA entity with vector support using Hibernate’s vector types:
import jakarta.persistence.*;
import org.hibernate.annotations.JdbcTypeCode;
import org.hibernate.type.SqlTypes;
@Entity
@Table(name = "items")
public class Item {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
@Column(nullable = false)
private String name;
private String category;
private BigDecimal price;
@JdbcTypeCode(SqlTypes.VECTOR)
@Column(columnDefinition = "vector(4)")
private float[] embedding;
@Column(name = "created_at")
private LocalDateTime createdAt;
// Constructors, getters, and setters
public Item() {
this.createdAt = LocalDateTime.now();
}
public Item(String name, String category, BigDecimal price, float[] embedding) {
this.name = name;
this.category = category;
this.price = price;
this.embedding = embedding;
this.createdAt = LocalDateTime.now();
}
// Standard getters and setters...
}
The @JdbcTypeCode(SqlTypes.VECTOR) annotation tells Hibernate to handle this field as a PostgreSQL vector type.
|
Creating a Repository
Define a Spring Data JPA repository with native queries for vector operations:
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;
import org.springframework.stereotype.Repository;
import java.util.List;
@Repository
public interface ItemRepository extends JpaRepository<Item, Long> {
// Standard JPA methods work as expected
List<Item> findByCategory(String category);
// Vector similarity search using L2 distance (Euclidean)
@Query(value = """
SELECT * FROM items
ORDER BY embedding <-> CAST(:vector AS vector)
LIMIT :limit
""", nativeQuery = true)
List<Item> findSimilarByL2(@Param("vector") String vector,
@Param("limit") int limit);
// Vector similarity using cosine distance
@Query(value = """
SELECT * FROM items
ORDER BY embedding <=> CAST(:vector AS vector)
LIMIT :limit
""", nativeQuery = true)
List<Item> findSimilarByCosine(@Param("vector") String vector,
@Param("limit") int limit);
// Vector similarity using inner product
@Query(value = """
SELECT * FROM items
ORDER BY embedding <#> CAST(:vector AS vector)
LIMIT :limit
""", nativeQuery = true)
List<Item> findSimilarByInnerProduct(@Param("vector") String vector,
@Param("limit") int limit);
// Filtered vector search
@Query(value = """
SELECT * FROM items
WHERE category = :category
ORDER BY embedding <-> CAST(:vector AS vector)
LIMIT :limit
""", nativeQuery = true)
List<Item> findSimilarInCategory(@Param("vector") String vector,
@Param("category") String category,
@Param("limit") int limit);
}
Understanding Distance Operators
PostgreSQL pgvector provides four distance operators:
| Operator | Name | Best For |
|---|---|---|
|
L2 Distance |
Euclidean distance - measures straight-line distance between vectors |
|
Cosine Distance |
Directional similarity - best for normalized vectors |
|
Inner Product |
Magnitude-aware similarity - useful for recommendation systems |
|
L1 Distance |
Manhattan distance - sum of absolute differences |
For most semantic search applications, cosine distance (<⇒) is the preferred metric.
|
Vector Casting in Queries
PostgreSQL requires explicit casting when working with vector parameters. Two approaches work equally well:
-- SQL Standard CAST() function
SELECT * FROM items
ORDER BY embedding <-> CAST('[1.0, 0.5, 0.2, 0.1]' AS vector)
LIMIT 10;
-- PostgreSQL :: operator (more common)
SELECT * FROM items
ORDER BY embedding <-> '[1.0, 0.5, 0.2, 0.1]'::vector
LIMIT 10;
Both produce identical performance and query plans. Choose one style and be consistent throughout your codebase.
JdbcClient Approach
Why Use JdbcClient for Vectors?
While JPA handles basic vector operations, JdbcClient shines when you need:
-
Distance values in results: Know how similar items actually are
-
Custom DTOs: Return exactly the fields you need
-
Complex aggregations: Compute category centroids or cross-item distances
-
Dynamic queries: Build queries based on runtime conditions
Creating a Service with JdbcClient
import org.springframework.jdbc.core.simple.JdbcClient;
import org.springframework.stereotype.Service;
import java.math.BigDecimal;
import java.util.List;
@Service
public class ItemJdbcService {
private final JdbcClient jdbcClient;
public ItemJdbcService(JdbcClient jdbcClient) {
this.jdbcClient = jdbcClient;
}
public List<ItemWithDistance> findSimilarWithDistances(
float[] queryVector,
int limit) {
String vectorString = vectorToString(queryVector);
return jdbcClient.sql("""
SELECT
id,
name,
category,
price,
embedding <-> :vector::vector AS distance
FROM items
ORDER BY distance
LIMIT :limit
""")
.param("vector", vectorString)
.param("limit", limit)
.query((rs, rowNum) -> new ItemWithDistance(
rs.getLong("id"),
rs.getString("name"),
rs.getString("category"),
rs.getBigDecimal("price"),
rs.getDouble("distance")
))
.list();
}
public List<ItemWithDistance> findSimilarInCategoryWithDistances(
float[] queryVector,
String category,
int limit) {
String vectorString = vectorToString(queryVector);
return jdbcClient.sql("""
SELECT
id,
name,
category,
price,
embedding <-> :vector::vector AS distance
FROM items
WHERE category = :category
ORDER BY distance
LIMIT :limit
""")
.param("vector", vectorString)
.param("category", category)
.param("limit", limit)
.query((rs, rowNum) -> new ItemWithDistance(
rs.getLong("id"),
rs.getString("name"),
rs.getString("category"),
rs.getBigDecimal("price"),
rs.getDouble("distance")
))
.list();
}
// Helper method to convert float array to PostgreSQL vector format
private String vectorToString(float[] vector) {
StringBuilder sb = new StringBuilder("[");
for (int i = 0; i < vector.length; i++) {
if (i > 0) sb.append(",");
sb.append(vector[i]);
}
sb.append("]");
return sb.toString();
}
}
Creating a DTO for Results
Define a record to hold results with distance information:
public record ItemWithDistance(
Long id,
String name,
String category,
BigDecimal price,
Double distance
) {
public String getDistanceFormatted() {
return String.format("%.4f", distance);
}
}
| Java records are perfect for DTOs - immutable, concise, and type-safe. |
Advanced Vector Operations
Computing Category Centroids
Find the "average" vector for each category:
public record CategoryCentroid(
String category,
String centroidVector,
Long itemCount
) {}
public List<CategoryCentroid> computeCategoryCentroids() {
return jdbcClient.sql("""
SELECT
category,
AVG(embedding)::text as centroid_vector,
COUNT(*) as item_count
FROM items
WHERE category IS NOT NULL
GROUP BY category
""")
.query((rs, rowNum) -> new CategoryCentroid(
rs.getString("category"),
rs.getString("centroid_vector"),
rs.getLong("item_count")
))
.list();
}
Finding Items Closest to Category Center
public List<ItemWithDistance> findItemsClosestToCategoryCenter(
String category,
int limit) {
return jdbcClient.sql("""
WITH category_centroid AS (
SELECT AVG(embedding) as centroid
FROM items
WHERE category = :category
)
SELECT
i.id,
i.name,
i.category,
i.price,
i.embedding <-> c.centroid AS distance
FROM items i, category_centroid c
WHERE i.category = :category
ORDER BY distance
LIMIT :limit
""")
.param("category", category)
.param("limit", limit)
.query((rs, rowNum) -> new ItemWithDistance(
rs.getLong("id"),
rs.getString("name"),
rs.getString("category"),
rs.getBigDecimal("price"),
rs.getDouble("distance")
))
.list();
}
Cross-Item Distance Matrix
Calculate distances between all pairs of items:
public record ItemPairDistance(
Long item1Id,
String item1Name,
Long item2Id,
String item2Name,
Double distance
) {}
public List<ItemPairDistance> computeItemDistances() {
return jdbcClient.sql("""
SELECT
i1.id as item1_id,
i1.name as item1_name,
i2.id as item2_id,
i2.name as item2_name,
i1.embedding <-> i2.embedding AS distance
FROM items i1
CROSS JOIN items i2
WHERE i1.id < i2.id
ORDER BY distance
""")
.query((rs, rowNum) -> new ItemPairDistance(
rs.getLong("item1_id"),
rs.getString("item1_name"),
rs.getLong("item2_id"),
rs.getString("item2_name"),
rs.getDouble("distance")
))
.list();
}
Building a REST API
Controller Implementation
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import java.util.List;
import java.util.Optional;
@RestController
@RequestMapping("/api/items")
public class ItemController {
private final ItemRepository itemRepository;
private final ItemJdbcService itemJdbcService;
public ItemController(ItemRepository itemRepository,
ItemJdbcService itemJdbcService) {
this.itemRepository = itemRepository;
this.itemJdbcService = itemJdbcService;
}
@GetMapping
public List<Item> getAllItems() {
return itemRepository.findAll();
}
@GetMapping("/{id}")
public ResponseEntity<Item> getItemById(@PathVariable Long id) {
return itemRepository.findById(id)
.map(ResponseEntity::ok)
.orElse(ResponseEntity.notFound().build());
}
@PostMapping
public Item createItem(@RequestBody ItemRequest request) {
Item item = new Item(
request.name(),
request.category(),
request.price(),
request.embedding()
);
return itemRepository.save(item);
}
@PostMapping("/search/similar")
public List<ItemWithDistance> searchSimilar(
@RequestBody VectorSearchRequest request) {
return itemJdbcService.findSimilarWithDistances(
request.vector(),
request.limit()
);
}
@PostMapping("/search/similar-category")
public List<ItemWithDistance> searchSimilarInCategory(
@RequestBody CategoryVectorSearchRequest request) {
return itemJdbcService.findSimilarInCategoryWithDistances(
request.vector(),
request.category(),
request.limit()
);
}
@GetMapping("/stats/categories")
public List<CategoryCentroid> getCategoryStats() {
return itemJdbcService.computeCategoryCentroids();
}
}
// Request DTOs
record ItemRequest(
String name,
String category,
BigDecimal price,
float[] embedding
) {}
record VectorSearchRequest(
float[] vector,
int limit
) {}
record CategoryVectorSearchRequest(
float[] vector,
String category,
int limit
) {}
Sample API Usage
Search for similar items:
curl -X POST http://localhost:8080/api/items/search/similar \
-H "Content-Type: application/json" \
-d '{
"vector": [1.0, 0.5, 0.2, 0.1],
"limit": 5
}'
Response:
[
{
"id": 3,
"name": "Apple",
"category": "fruit",
"price": 1.99,
"distance": 0.0234
},
{
"id": 7,
"name": "Orange",
"category": "fruit",
"price": 2.49,
"distance": 0.0891
}
]
Recommendations
When to Use JPA vs JdbcClient
Use JPA when:
-
Performing standard CRUD operations
-
Working with entity relationships
-
You don’t need distance values in results
-
Leveraging JPA features (caching, lazy loading)
Use JdbcClient when:
-
You need distance values in results
-
Creating custom DTOs with specific fields
-
Performing complex aggregations
-
Building dynamic queries
-
Maximum performance is critical
Vector Dimension Consistency
Always validate that vectors have the correct dimensionality:
@Service
public class ItemService {
private static final int EXPECTED_DIMENSIONS = 4;
public Item createItem(ItemRequest request) {
validateVectorDimensions(request.embedding());
// Create and save item
}
private void validateVectorDimensions(float[] vector) {
if (vector == null || vector.length != EXPECTED_DIMENSIONS) {
throw new IllegalArgumentException(
String.format("Vector must have exactly %d dimensions",
EXPECTED_DIMENSIONS)
);
}
}
}
Indexing for Performance
For production systems with large datasets, create an index on the vector column:
-- IVFFlat index (faster build, good recall)
CREATE INDEX items_embedding_idx ON items
USING ivfflat (embedding vector_l2_ops)
WITH (lists = 100);
-- HNSW index (slower build, better recall)
CREATE INDEX items_embedding_idx ON items
USING hnsw (embedding vector_l2_ops);
| Start with IVFFlat for development. Consider HNSW for production when you have sufficient data and need better recall. |
Parameterized Queries for Security
Always use parameterized queries to prevent SQL injection:
// Good - parameterized
jdbcClient.sql("""
SELECT * FROM items
WHERE category = :category
ORDER BY embedding <-> :vector::vector
""")
.param("category", category)
.param("vector", vectorString)
.query(Item.class)
.list();
// Bad - string concatenation
String sql = "SELECT * FROM items " +
"WHERE category = '" + category + "' " +
"ORDER BY embedding <-> '" + vectorString + "'::vector";
// NEVER DO THIS!
Testing
Integration Testing with Testcontainers
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.DynamicPropertyRegistry;
import org.springframework.test.context.DynamicPropertySource;
import org.testcontainers.containers.PostgreSQLContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import java.math.BigDecimal;
import java.util.List;
import static org.assertj.core.api.Assertions.assertThat;
@SpringBootTest
@Testcontainers
class ItemVectorSearchTest {
@Container
static PostgreSQLContainer<?> postgres =
new PostgreSQLContainer<>("pgvector/pgvector:pg17")
.withDatabaseName("testdb")
.withUsername("test")
.withPassword("test")
.withInitScript("init-vector.sql");
@DynamicPropertySource
static void configureProperties(DynamicPropertyRegistry registry) {
registry.add("spring.datasource.url", postgres::getJdbcUrl);
registry.add("spring.datasource.username", postgres::getUsername);
registry.add("spring.datasource.password", postgres::getPassword);
}
@Autowired
private ItemRepository itemRepository;
@Autowired
private ItemJdbcService itemJdbcService;
@Test
void shouldFindSimilarItems() {
// Given
Item apple = new Item("Apple", "fruit",
BigDecimal.valueOf(1.99),
new float[]{1.0f, 0.5f, 0.2f, 0.1f});
itemRepository.save(apple);
Item orange = new Item("Orange", "fruit",
BigDecimal.valueOf(2.49),
new float[]{1.1f, 0.4f, 0.3f, 0.1f});
itemRepository.save(orange);
// When
float[] queryVector = {1.0f, 0.5f, 0.2f, 0.1f};
List<ItemWithDistance> results =
itemJdbcService.findSimilarWithDistances(queryVector, 2);
// Then
assertThat(results).hasSize(2);
assertThat(results.get(0).name()).isEqualTo("Apple");
assertThat(results.get(0).distance()).isLessThan(0.01);
}
}
Create src/test/resources/init-vector.sql:
CREATE EXTENSION IF NOT EXISTS vector;
Common Pitfalls
Incorrect Vector Format
// Wrong - missing brackets
String vector = "1.0, 0.5, 0.2, 0.1";
// Wrong - using parentheses
String vector = "(1.0, 0.5, 0.2, 0.1)";
// Correct - square brackets
String vector = "[1.0, 0.5, 0.2, 0.1]";
Conclusion
Spring Framework provides excellent support for working with PostgreSQL vector columns through both JPA and JdbcClient:
-
JPA excels at standard CRUD operations and maintaining entity relationships
-
JdbcClient shines when you need distance values, custom DTOs, or complex queries
-
Together, they provide a complete toolkit for building vector-powered applications
The key insights:
-
Use the right tool for each job - JPA and JdbcClient complement each other
-
Always validate vector dimensions before saving to the database
-
Leverage parameterized queries for security
-
Consider indexing strategies for production workloads
-
Test with Testcontainers for realistic integration tests
Whether you’re building semantic search, recommendation systems, or AI-powered features, Spring and PostgreSQL’s pgvector extension provide a robust, production-ready foundation.
About the Author
Wei Li is a passionate software developer and technical writer specializing in Java and Spring Framework. With years of experience in building enterprise applications, Wei Li enjoys sharing knowledge through detailed tutorials and practical guides. For questions or feedback, feel free to reach out at weili.mail@gmail.com. |