Fixing PostgreSQL Large Object Leaks & Hibernate Bloat

A while ago, a former client asked me to have a look at their PostgreSQL database that had swollen to nearly 30 GB on disk when the actual data should have needed far less space. What began as a quick lunch-break review soon turned into an interesting quest: why did the database keep growing at an alarming, almost uncontrollable rate, with no obvious explanation?

The stack was a standard, robust setup: a React frontend communicating with a Java/Spring Boot backend, using Hibernate for ORM and PostgreSQL as the data store. On the surface, everything looked fine, but the disk usage told a different story.

When I looked into it, I found that their application stored uploads in PostgreSQL Large Objects (LOBs). LOBs are a reasonable way to hold binary data, but they explained the runaway footprint. The real problem was a disconnect in behavior: on delete, Hibernate removed the row from the application table but never called unlink on the Large Object. The app was "deleting" the pointer while the bytes stayed behind in pg_largeobject as "ghosts."

With that diagnosis clear, I explained the mechanics to my client and laid out ways to fix it: short-term relief for the bloat they already had, optional Java-side guardrails, and a long-term safeguard in the database. What follows is a concise write-up of that same story.

Why Does This Happen? (The Ghost in the Machine)

The core issue is a lack of Referential Integrity. In PostgreSQL, a Large Object is stored in a central system table (pg_largeobject), and your table only stores an OID (a numeric ID).

To the database, that OID is just a number. It has no native "Foreign Key" relationship with the LOB storage. When you delete your row, PostgreSQL doesn't know that the number was a reference to a file. It clears the row and forgets the file ever existed, leaving it "orphaned."

Why Doesn't Hibernate Handle It?

Hibernate follows a "Safety First" philosophy, and this behavior is very much by design.

Because an OID is stored as a simple numeric value (a Long or bigint), Hibernate treats it as basic data rather than a managed association. Since the database itself lacks native referential integrity for Large Objects, Hibernate cannot be certain whether other rows, or even entirely different applications, are referencing that same OID. To prevent accidental data loss, the maintainers prioritize safety; Hibernate will never assume it is safe to unlink a LOB that might be used elsewhere, leaving lifecycle management as an explicit task for the developer or the database.

The Solution Journey: From Reactive to Robust

As with most engineering challenges, there are several ways to solve the problem, each with its own trade-offs. I grouped the choices into three layers: immediate cleanup (short-term), application-level guardrails in Java (medium-term), and robust triggers in PostgreSQL (long-term). This structure helps you decide where the logic best belongs in your specific stack.

1. The "Band-Aid": `vacuumlo` and `VACUUM FULL`

Short term: run vacuumlo to find orphaned Large Objects, then VACUUM FULL to give disk space back to the OS.

That addresses symptoms, not the root cause. VACUUM FULL can require downtime because it takes heavy locks, and neither step stops new orphans from appearing if deletes still skip unlink.

2. The Java Approach: Lifecycle Callbacks

Medium term: keep cleanup in the app if you want it there. At the application layer, you can use JPA lifecycle callbacks such as @PreRemove and manually invoke PostgreSQL's LargeObjectManager to unlink the file.

Here are two Java/Spring Boot patterns that are easy to read and maintain. The snippets assume Spring Boot 3.x (Jakarta Persistence: jakarta.*) and the PostgreSQL JDBC driver (org.postgresql:postgresql) on the classpath. On Spring Boot 2.x, use javax.persistence for the same annotations (@Entity, @PreRemove, and so on); only the package name changes.

Option A: Lifecycle callback (`@PreRemove`)

FileEntity.java:

import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import jakarta.persistence.PreRemove;
import jakarta.persistence.Table;

@Entity
@Table(name = "your_table_name")
public class FileEntity {
    @Id
    private Long id;

    @Column(name = "file_oid")
    private Long fileOid;

    @PreRemove
    public void beforeDelete() {
        // Delegate to a Spring-managed helper.
        LargeObjectCleanupSupport.unlinkIfPresent(fileOid);
    }
}

LargeObjectCleanupSupport.java:

import org.springframework.stereotype.Component;

@Component
public class LargeObjectCleanupSupport {
    private static FileLobService service;

    public LargeObjectCleanupSupport(FileLobService service) {
        LargeObjectCleanupSupport.service = service;
    }

    public static void unlinkIfPresent(Long oid) {
        if (oid != null) {
            service.unlink(oid);
        }
    }
}

A Note on Implementation: The static reference above is a simplified sketch to illustrate the flow. In a production-grade Spring application, using a static bridge is often frowned upon.

A cleaner approach is to use @EntityListeners. However, since JPA Entity Listeners are not managed by Spring by default, you would typically use HibernatePropertiesCustomizer or AutowireCapableBeanFactory to inject your services into the listener, or simply stick to Option B for a more straightforward, Spring-native experience.

Option B: Service-level delete (alternative solution)

FileService.java:

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class FileService {
    private final FileRepository fileRepository;
    private final FileLobService fileLobService;

    public FileService(
        FileRepository fileRepository,
        FileLobService fileLobService
    ) {
        this.fileRepository = fileRepository;
        this.fileLobService = fileLobService;
    }

    @Transactional
    public void deleteFile(Long id) {
        FileEntity entity = fileRepository.findById(id).orElseThrow();
        fileLobService.unlink(entity.getFileOid());
        fileRepository.delete(entity);
    }
}

FileLobService.java:

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;

@Service
public class FileLobService {
    private final JdbcTemplate jdbcTemplate;

    public FileLobService(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public void unlink(Long oid) {
        if (oid == null) {
            return;
        }
        jdbcTemplate.update("SELECT lo_unlink(CAST(? AS oid))", oid);
    }
}

Choose Option A when you want lightweight entity-level cleanup with minimal service code. Choose Option B when you prefer explicit, testable business flow and clearer control over delete behavior.

This keeps cleanup logic in Java, but it has blind spots: if rows are deleted through SQL scripts, maintenance jobs, external tools, or bulk JPQL/SQL delete operations that bypass entity lifecycle callbacks, your Java callback never runs.

3. The Gold Standard: Database Triggers

Long term: move cleanup into PostgreSQL. With a PL/pgSQL trigger, every delete or update is enforced at the source of truth, whether the change came from Hibernate, a script, or another client.

That is what I recommended they adopt for durable prevention, and it is what the next section walks through in detail.

Important ownership rule: this trigger pattern is correct only when each Large Object is owned by exactly one row. If multiple rows (or other tables/apps) can point to the same OID, blindly unlinking on delete/update can remove still-in-use data. In that case, use reference counting (or another ownership model) before calling lo_unlink.

The Solution: Automated Prevention with Triggers

To solve this permanently, you move the responsibility from the application layer to the database itself. With a PL/pgSQL function and a Trigger, the database can stay "self-cleaning."

1. The Logic: A Smart Cleanup Function

The function needs to handle three scenarios:

Deletion: The row is gone; the file must go too.
Update: A file is replaced by a new one; the old file must be unlinked.
Insertion: A new file is added; no cleanup needed.

CREATE OR REPLACE FUNCTION handle_lob_cleanup()
RETURNS TRIGGER AS $$
BEGIN
    -- Scenario 1: The row is being deleted
    IF (TG_OP = 'DELETE') THEN
        IF OLD.file_oid IS NOT NULL THEN
            BEGIN
                PERFORM lo_unlink(OLD.file_oid::oid);
            EXCEPTION
                WHEN undefined_object THEN
                    -- Already missing; keep row delete idempotent.
                    NULL;
            END;
        END IF;
        RETURN OLD;

    -- Scenario 2: The row is being updated (file is replaced)
    ELSIF (TG_OP = 'UPDATE') THEN
        IF (OLD.file_oid IS DISTINCT FROM NEW.file_oid)
           AND OLD.file_oid IS NOT NULL THEN
            BEGIN
                PERFORM lo_unlink(OLD.file_oid::oid);
            EXCEPTION
                WHEN undefined_object THEN
                    NULL;
            END;
        END IF;
        RETURN NEW;
    END IF;

    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

The ::oid casts match a common mapping where the app stores the Large Object ID in a numeric column. If your schema permits it, prefer storing the column as type oid directly to avoid accidental out-of-range values.

2. The Event: Activating the Trigger

Next, attach this logic to the relevant table. An AFTER trigger ensures that the file is only deleted if the database operation on the main table succeeds.

CREATE TRIGGER trg_cleanup_files
AFTER DELETE OR UPDATE ON your_table_name
FOR EACH ROW
EXECUTE FUNCTION handle_lob_cleanup();

EXECUTE FUNCTION is the preferred spelling in modern PostgreSQL (introduced in PostgreSQL 11). On older releases, use EXECUTE PROCEDURE with the same function name; it is equivalent (the keyword PROCEDURE here is legacy wording for a trigger function).

Choosing the Right Path

Each approach has its place depending on your infrastructure and team preferences. Here is a quick comparison of how the three strategies stack up against each other:

Solution	Implementation	Pros	Cons
Option A	JPA `@PreRemove`	Localized to the entity, easy to read.	Requires static bridges or complex listener wiring in Spring.
Option B	Service Layer	Explicit, testable, and follows standard Spring patterns.	Must be manually called; easy to miss if new delete paths are added.
The Trigger	Database (PL/pgSQL)	Most robust; works regardless of the client (Java, SQL scripts, etc.).	Logic is "hidden" in the database; ensure OID ownership rules are clear.

Under the Hood: What's Actually Happening?

With the choice of a trigger-based approach made, it helps to look under the hood. To make this practical, let's examine exactly what happens in each scenario when Hibernate and PostgreSQL interact with our new trigger.

Scenario 1: Full Row Deletion

When the Java app calls repository.delete(entity), Hibernate sends a DELETE command to PostgreSQL. After the row is removed, the trigger fires, reads the OLD record, finds the OID, and calls lo_unlink. The file is removed from pg_largeobject immediately.

Scenario 2: Replacing a File (The Update)

If a user uploads a new file to an existing record, the app sends an UPDATE. The trigger compares OLD.file_oid with NEW.file_oid. If they are different, it knows the old OID is no longer needed and unlinks it while leaving the new one intact. This prevents "version bloat."

Scenario 3: Uploading a New File

When a new row is created (INSERT), this trigger does not fire because INSERT was not included in CREATE TRIGGER. There is no OLD data to clean up, so no unlink is attempted. The new file is stored as expected.

Why "AFTER" and Not "BEFORE"?

Using an AFTER trigger is intentional. If deleting or updating the row fails for any reason, you do not want to remove the file first. Cleanup runs only once the row operation has succeeded.

This turns a "dumb" OID column into a self-managing, integrity-aware system. It bridges the gap between Hibernate's high-level abstractions and PostgreSQL's low-level storage realities.

Reclaiming Your Space

If you are already facing a bloated database, simply adding the trigger isn't enough to get your disk space back immediately; it only prevents new orphans from appearing. To clear the existing backlog and shrink your storage, you will need a one-time cleanup.

Run vacuumlo: This utility finds and removes existing orphans created before your trigger was live. Once your trigger is active and covering all delete/update paths, you should theoretically never need to run vacuumlo again for these tables.

Note: vacuumlo is not always included in a standard PostgreSQL client installation. Depending on your distribution, you may need to install the postgresql-contrib package (or a similar naming convention like postgresql-client-common) to get access to this binary.
Run VACUUM FULL: Even after the data is "deleted," PostgreSQL keeps the file size on disk the same to reuse the space later. To physically shrink the database file and return space to the OS, a VACUUM FULL on pg_largeobject is required.

Warning: VACUUM FULL takes a heavy lock on the table, so ensure you have a maintenance window if you are operating on a large production database.

By combining the trigger (for future prevention) with this one-time maintenance (for past bloat), you move from a reactive to a proactive storage strategy.

The Results

In my client's case, the results were dramatic. After they put the trigger in place and finished the one-time cleanup we had discussed:

Their backup size dropped from 30GB to 1.25GB.
They reclaimed over 28GB of actual disk space.
The application felt significantly "snappier" as database overhead decreased, and the time required for maintenance tasks was slashed.

Conclusion

Don't let your database become a graveyard for deleted files. In this case, bridging Hibernate's abstractions and PostgreSQL's storage behavior with a trigger turned file handling from a recurring crisis into something quiet and automatic. The same idea applies wherever you need deletes in the application to stay honest about Large Object storage.

The Case of the Bloating Database: Solving the PostgreSQL Large Object Leak

Why Does This Happen? (The Ghost in the Machine)

Why Doesn't Hibernate Handle It?

The Solution Journey: From Reactive to Robust

1. The "Band-Aid": `vacuumlo` and `VACUUM FULL`

2. The Java Approach: Lifecycle Callbacks

Option A: Lifecycle callback (`@PreRemove`)

Option B: Service-level delete (alternative solution)

3. The Gold Standard: Database Triggers

The Solution: Automated Prevention with Triggers

1. The Logic: A Smart Cleanup Function

2. The Event: Activating the Trigger

Choosing the Right Path

Under the Hood: What's Actually Happening?

Scenario 1: Full Row Deletion

Scenario 2: Replacing a File (The Update)

Scenario 3: Uploading a New File

Why "AFTER" and Not "BEFORE"?

Reclaiming Your Space

The Results

Conclusion

Comments

Command Palette

Why Does This Happen? (The Ghost in the Machine)

Why Doesn't Hibernate Handle It?

The Solution Journey: From Reactive to Robust

1. The "Band-Aid": vacuumlo and VACUUM FULL

2. The Java Approach: Lifecycle Callbacks

Option A: Lifecycle callback (@PreRemove)

Option B: Service-level delete (alternative solution)

3. The Gold Standard: Database Triggers

The Solution: Automated Prevention with Triggers

1. The Logic: A Smart Cleanup Function

2. The Event: Activating the Trigger

Choosing the Right Path

Under the Hood: What's Actually Happening?

Scenario 1: Full Row Deletion

Scenario 2: Replacing a File (The Update)

Scenario 3: Uploading a New File

Why "AFTER" and Not "BEFORE"?

Reclaiming Your Space

The Results

Conclusion

Comments

1. The "Band-Aid": `vacuumlo` and `VACUUM FULL`

Option A: Lifecycle callback (`@PreRemove`)