Fixing Hibernate HHH000104 firstResult maxResults warning using Spring Data JPA Specification and Criteria API

1. OVERVIEW

Whenever you use pagination and SQL joins to retrieve entities and their associations to prevent the N+1 select queries problem you’ll most-likely run into this Hibernate’s HHH000104 warning message.

HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!

This warning is bad and will affect your application’s performance once your dataset grows. Let’s see why.

Let’s start with these tables relashionship:

Tables relationships

It helps us to write or generate our domain model, and we would endup with these relevant JPA associated entities:

Tip: You might want to take a look at the recent Code and Configuration Snippets / Recipes / Tips section, which helps you with common tasks without reading through a full blog post.

Film.java:

@Entity
@Table(name = "film", schema = "public")
public class Film implements Serializable {

  @Id
  @Column(name = "film_id", unique = true, nullable = false)
  private int filmId;

  @ManyToOne(fetch = FetchType.LAZY)
  @JoinColumn(name = "language_id", nullable = false)
  private Language language;

  @OneToMany(cascade = CascadeType.ALL, fetch = FetchType.LAZY, mappedBy = "film")
  private Set<FilmActor> filmActors;
// ...
}

Language.java:

@Entity
@Table(name = "language", schema = "public")
public class Language implements Serializable {

  @Id
  @Column(name = "language_id", unique = true, nullable = false)
  private int languageId;

  @OneToMany(cascade = CascadeType.ALL, fetch = FetchType.LAZY, mappedBy = "language")
  private Set<Film> films;
// ...
}

FilmActor.java:

@Entity
@Table(name = "film_actor", schema = "public")
public class FilmActor implements Serializable {

  @EmbeddedId
  @AttributeOverrides({
    @AttributeOverride(name = "actorId", column = @Column(name = "actor_id", nullable = false)),
    @AttributeOverride(name = "filmId", column = @Column(name = "film_id", nullable = false))
  })
  private FilmActorId id;

  @ManyToOne(fetch = FetchType.LAZY)
  @JoinColumn(name = "film_id", nullable = false, insertable = false, updatable = false)
  private Film film;

  @ManyToOne(fetch = FetchType.LAZY)
  @JoinColumn(name = "actor_id", nullable = false, insertable = false, updatable = false)
  private Actor actor;
// ...
}

Actor.java:

@Entity
@Table(name = "actor", schema = "public")
public class Actor implements Serializable {

  @Id
  @Column(name = "actor_id", unique = true, nullable = false)
  private int actorId;

  @OneToMany(cascade = CascadeType.ALL, fetch = FetchType.LAZY, mappedBy = "actor")
  private Set<FilmActor> filmActors;
// ...
}

These JPA entities helps you to understand the SQL query we are about to analyze.

Let’s take a look at the logs after sending a GET request to /api/films?page=3&size=20:

WARN 57364 --- [nio-8080-exec-2] o.h.h.internal.ast.QueryTranslatorImpl   : HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!
Hibernate: 
  select
    film0_.film_id as film_id1_8_0_,
    language1_.language_id as language1_13_1_,
    filmactors2_.actor_id as actor_id1_9_2_,
    filmactors2_.film_id as film_id2_9_2_,
    actor3_.actor_id as actor_id1_0_3_,
    film0_.description as descript2_8_0_,
    film0_.fulltext as fulltext3_8_0_,
    film0_.language_id as languag13_8_0_,
    film0_.last_update as last_upd4_8_0_,
    film0_.length as length5_8_0_,
    film0_.rating as rating6_8_0_,
    film0_.release_year as release_7_8_0_,
    film0_.rental_duration as rental_d8_8_0_,
    film0_.rental_rate as rental_r9_8_0_,
    film0_.replacement_cost as replace10_8_0_,
    film0_.special_features as special11_8_0_,
    film0_.title as title12_8_0_,
    language1_.last_update as last_upd2_13_1_,
    language1_.name as name3_13_1_,
    filmactors2_.last_update as last_upd3_9_2_,
    filmactors2_.film_id as film_id2_9_0__,
    filmactors2_.actor_id as actor_id1_9_0__,
    actor3_.first_name as first_na2_0_3_,
    actor3_.last_name as last_nam3_0_3_,
    actor3_.last_update as last_upd4_0_3_ 
  from
    public.film film0_ 
  left outer join
    public.language language1_ 
      on film0_.language_id=language1_.language_id 
  left outer join
    public.film_actor filmactors2_ 
      on film0_.film_id=filmactors2_.film_id 
  left outer join
    public.actor actor3_ 
      on filmactors2_.actor_id=actor3_.actor_id
Hibernate: 
  select
    count(film0_.film_id) as col_0_0_ 
  from
    public.film film0_

First notice the HHH000104 warning log.

If you take a closer look at the SQL query, there is not pagination keywords included.

H2 supports limit and offset keywords for pagination purposes, but Hibernate didn’t include them.

There is no pagination in place. Hibernate’s retrieving all the Film entities, their actors, languages, etc. and doing the pagination in-memory. Again, this is bad. For datasets with around one million rows such a query can take between 10 to 15 seconds.

@vlad_mihalcea suggests two approaches in his piece The best way to fix the Hibernate “HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!” warning message:

One SQL query using the Windows Function DENSE_RANK().
Two SQL queries. The first one retrieves the entities’ ID. The second retrieves the entities using an IN predicate including the IDs from the first query.

This blog bost discusses a generic and reusable implementation using the two queries approach with Spring Data JPA Specification.

2. EXTENDING THE SPRING DATA JPA SPECIFICATION EXECUTOR

Let’s extend Spring Data JPA functionality implementing a custom JpaSpecificationExecutor.

Tip: Read on to learn how to extend Spring Data JPA’s JpaSpecificationExecutor to pad IN SQL predicates.

AsimioJpaSpecificationExecutor.java:

@NoRepositoryBean
public interface AsimioJpaSpecificationExecutor<E, ID extends Serializable> extends JpaSpecificationExecutor<E> {

  Page<ID> findEntityIds(Pageable pageable);
}

It includes one method: Page<ID> findEntityIds(Pageable pageable), to keep this post simple.

Note:
If you need to write dynamic SQL queries using Spring Data JPA Specification you would need to implement:
Page<ID> findEntityIds(Specification<E> spec, Pageable pageable);

If you need to pad IN SQL predicates to optimize Hibernate’s QueryPlanCache you would need to implement:
Page<ID> findEntityIds(NamedQueryParametersSpecification<E> parametrizedSpec, Pageable pageable);

@NoRepositoryBean prevents AsimioJpaSpecificationExecutor intermediate interface from being proxied. It’s used when providing a base interface with new and common methods for your repositories along with the custom base repository implementation.

Let’s implement it now.

AsimioSimpleJpaRepository.java:

public class AsimioSimpleJpaRepository<E, ID extends Serializable> extends SimpleJpaRepository<E, ID>
        implements AsimioJpaSpecificationExecutor<E, ID> {

  private final EntityManager entityManager;
  private final JpaEntityInformation<E, ID> entityInformation;

  public AsimioSimpleJpaRepository(JpaEntityInformation<E, ID> entityInformation, EntityManager entityManager) {
    super(entityInformation, entityManager);
    this.entityManager = entityManager;
    this.entityInformation = entityInformation;
  }

  @Override
  public Page<ID> findEntityIds(Pageable pageable) {
    CriteriaBuilder criteriaBuilder = this.entityManager.getCriteriaBuilder();
    CriteriaQuery<ID> criteriaQuery = criteriaBuilder.createQuery(this.entityInformation.getIdType());
    Root<E> root = criteriaQuery.from(this.getDomainClass());

    // Get the entities ID only
    criteriaQuery.select((Path<ID>) root.get(this.entityInformation.getIdAttribute()));

    // Update Sorting
    Sort sort = pageable.isPaged() ? pageable.getSort() : Sort.unsorted();
    if (sort.isSorted()) {
      criteriaQuery.orderBy(toOrders(sort, root, criteriaBuilder));
    }

    TypedQuery<ID> typedQuery = this.entityManager.createQuery(criteriaQuery);

    // Update Pagination attributes
    if (pageable.isPaged()) {
      typedQuery.setFirstResult((int) pageable.getOffset());
      typedQuery.setMaxResults(pageable.getPageSize());
    }

    return PageableExecutionUtils.getPage(typedQuery.getResultList(), pageable,
      () -> executeCountQuery(this.getCountQuery(null, this.getDomainClass())));
  }

  protected static long executeCountQuery(TypedQuery<Long> query) {
    Assert.notNull(query, "TypedQuery must not be null!");

    List<Long> totals = query.getResultList();
    long total = 0L;

    for (Long element : totals) {
      total += element == null ? 0 : element;
    }

    return total;
  }
}

The relevant method to discuss is Page<ID> findEntityIds(Pageable pageable).

It uses JPA’s Criteria to SELECT the JPA entities’ ID only. It adds sorting, and pagination clauses if they are found in the pageable method argument. It then returns a Page with the entities IDs along with Page metadata.

executeCountQuery() method is a copy & paste from Spring Data JPA’s SimpleJpaRepository because its scope is private and couldn’t reuse it.

One of the pros of this approach is that you can now have multiple DAO / Repository classes taking advantage of it because it’s included in a base class.

Let’s discuss just one in the next section to keep this post simple.

3. SPRING DATA JPA REPOSITORIES

Next, let’s associate the custom JpaSpecificationExecutor with the DAO / Repository classes.

FilmDao.java:

@Repository
public interface FilmDao extends JpaRepository<Film, Integer>, AsimioJpaSpecificationExecutor<Film, Integer> {

  @EntityGraph(
    type = EntityGraphType.FETCH,
    attributePaths = {
      "language", 
      "filmActors", "filmActors.actor"
    }
  )
  List<Film> findAll(@Nullable Specification<Film> spec);
}

@Repository-annotated interface extends from AsimioJpaSpecificationExecutor interface.
Annotates findAll(Specification<Film> spec) method with @EntityGraph to prevent the N+1 queries problems. If you are sure the code won’t access any children object, you don’t need to include findAll() in FilmDao. It’ll use the one defined in a parent interface.
You don’t need to include findEntityIds(Pageable pageable) because:
- There is no need to eager-load children objects (via @EntityGraph annotation). It only selects the entity IDs.
- This method doesn’t have any query hint.

Note: If you were to add QueryHint annotations to findEntityIds(Pageable pageable), you first need to implement support for it in AsimioSimpleJpaRepository.

The idea is to first retrieve the Film identifiers and then execute a second query with an IN predicate including the IDs from the first query.

You would still need to let know Spring Data JPA to use your custom JPA repository class: AsimioSimpleJpaRepository.

Application.java:

@SpringBootApplication
@EnableJpaRepositories(
  repositoryBaseClass = AsimioSimpleJpaRepository.class
)
@EnableTransactionManagement
@EnableSpringDataWebSupport
public class Application {
// ...
}

4. SERVICE CLASS

DefaultDvdRentalService.java:

@Service
@Transactional(readOnly = true)
public class DefaultDvdRentalService implements DvdRentalService {

  private final FilmDao filmDao;

  @Override
  public Page<Film> retrieveFilms(Pageable page) {
    // Getting film ids and page data to prevent:
    // HHH000104: firstResult/maxResults specified with collection fetch; applying in memory! 
    // which affects application's performance
    Page<Integer> filmIdsPage = this.filmDao.findEntityIds(page);

    List<Film> result;
    List<Integer> filmIds = filmIdsPage.getContent();
    if (CollectionUtils.isEmpty(filmIds)) {
      result = Lists.newArrayList();
    } else {
      // Retrieve films using IN predicate
      Specification<Film> fimlIdInSpecification = FilmSpecifications.idIn(Sets.newHashSet(filmIds));
      result = this.filmDao.findAll(fimlIdInSpecification);
    }
    return PageableExecutionUtils.getPage(result, page, () -> filmIdsPage.getTotalElements());
  }
}

retrieveFilms() first retrieves a Page of Film IDs.
Then executes a second SQL query via findAll() with a Spring Data JPA Specification-based IN predicate, which includes the IDs retrieved in the first query.

Tip:
Read on to learn:
How to write dynamic SQL queries using Spring Data JPA Specification
Padding IN SQL predicates to optimize Hibernate’s QueryPlanCache .

5. SPRING DATA JPA SPECIFICATIONS

FilmSpecifications.java:

public final class FilmSpecifications {

  public static Specification<Film> idIn(Set<Integer> filmIds) {
    if (CollectionUtils.isEmpty(filmIds)) {
     return null;
    }
    return (root, query, builder) -> root.get(Film_.filmId).in(filmIds);
  }
// ...
}

A Spring Data JPA Specification to retrieve Film entities by ID using a Criteria query.

6. REST CONTROLLER

FilmController.java:

@RestController
@RequestMapping(value = "/api/films", produces = MediaType.APPLICATION_JSON_VALUE)
public class FilmController {

  private final DvdRentalService dvdRentalService;

  @GetMapping(path = "")
  public ResponseEntity<List<FilmResource>> retrieveFilms(
    @PageableDefault(page = DEFAULT_PAGE_NUMBER, size = DEFAULT_PAGE_SIZE) Pageable page) {

    Page<Film> films = this.dvdRentalService.retrieveFilms(page);
    List<FilmResource> result = FilmResourceMapper.INSTANCE.map(films.getContent());
    return new ResponseEntity<>(result, HttpStatus.OK);
  }
}

Spring MVC handles setting Pageable page with the page size and offset values from request parameters from URLs like /api/films?page=0&size=10.

This is available to Spring MVC Controllers via Application.java’s configuration with @EnableSpringDataWebSupport.

7. PAGINATION IN ACTION

Let’s try sending another GET request to /api/films?page=3&size=20 and analyze the logs:

Hibernate: 
  select
    film0_.film_id as col_0_0_ 
  from
    public.film film0_ limit ? offset ?
Hibernate: 
  select
    count(film0_.film_id) as col_0_0_ 
  from
    public.film film0_
Hibernate: 
  select
    film0_.film_id as film_id1_8_0_,
    filmactors1_.actor_id as actor_id1_9_1_,
    filmactors1_.film_id as film_id2_9_1_,
    actor2_.actor_id as actor_id1_0_2_,
    language3_.language_id as language1_13_3_,
    film0_.description as descript2_8_0_,
    film0_.fulltext as fulltext3_8_0_,
    film0_.language_id as languag13_8_0_,
    film0_.last_update as last_upd4_8_0_,
    film0_.length as length5_8_0_,
    film0_.rating as rating6_8_0_,
    film0_.release_year as release_7_8_0_,
    film0_.rental_duration as rental_d8_8_0_,
    film0_.rental_rate as rental_r9_8_0_,
    film0_.replacement_cost as replace10_8_0_,
    film0_.special_features as special11_8_0_,
    film0_.title as title12_8_0_,
    filmactors1_.last_update as last_upd3_9_1_,
    filmactors1_.film_id as film_id2_9_0__,
    filmactors1_.actor_id as actor_id1_9_0__,
    actor2_.first_name as first_na2_0_2_,
    actor2_.last_name as last_nam3_0_2_,
    actor2_.last_update as last_upd4_0_2_,
    language3_.last_update as last_upd2_13_3_,
    language3_.name as name3_13_3_ 
  from
    public.film film0_ 
  left outer join
    public.film_actor filmactors1_ 
      on film0_.film_id=filmactors1_.film_id 
  left outer join
    public.actor actor2_ 
      on filmactors1_.actor_id=actor2_.actor_id 
  left outer join
    public.language language3_ 
      on film0_.language_id=language3_.language_id 
  where
    film0_.film_id in (
      64 , 65 , 66 , 67 , 68 , 69 , 70 , 71 , 72 , 73 , 74 , 75 , 76 , 77 , 78 , 79 , 80 , 61 , 62 , 63
    )

It’s executing three queries:

Retrieves Film identifiers using pagination. Notice limit, and offset keywords.
count() query, which you could remove from AsimioSimpleJpaRepository implementation if you don’t need it.
Retrieve Film entities via an IN predicate with the identifiers from the first query.

For a dataset with a million or two rows these three queries perform better than the one that logs:

HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!

8. CONCLUSION

The HHH000104 warning message is bad. There is no SQL pagination. It’s done in memory which will most-likely degrade the application’s performance, specially in a multi-user environment.

There a a couple of approaches to to fix the pagination in memory. One of them is to execute two SQL queries. The first query retrieves just the entities’ ID using pagination and the second query retrieves the entities using an IN predicate with the IDs from the first query.

This blog covered pagination using the two queries approach with a generic solution using Spring Data JPA Specification.

Stay tuned, I might cover how to fix the HHH000104 pagination problem using a one query approach via the DENSE_RANK() Function in a follow-up post.

Thanks for reading and as always, feedback is very much appreciated. If you found this post helpful and would like to receive updates when content like this gets published, sign up to the newsletter.