添加链接
link之家
链接快照平台
  • 输入网页链接,自动生成快照
  • 标签化管理网页链接
Collectives™ on Stack Overflow

Find centralized, trusted content and collaborate around the technologies you use most.

Learn more about Collectives

Teams

Q&A for work

Connect and share knowledge within a single location that is structured and easy to search.

Learn more about Teams

I am using Spring Boot and Spring Batch and JPA to load data from One database into another. In a single batch Job I've created 10 steps to run steps in sequence and each step reads almost 1 millions records (I can't run in parallel, because data that I've doesn't load in parallel).

I've used GenerationType.IDENTITY and look like because of this batch job taking hell lot of time. If I say to load 100 records it takes 2 min . The target Postgres database has sequences implemented by DBA and we must need to follow those, but its eating up all the performance.

@GeneratedValue(strategy = GenerationType.IDENTITY) @Column(name = "EMP_ID", nullable = false, updatable = false, insertable = false) private Long id;

How can I improve the performance of this batch job ?

I kept spring.jpa.properties.hibernate.jdbc.batch_size=1000 and chunkSize=1000 as well.

Hibernate cannot batch insert entities if the entity is using IDENTITY to generate its ID (Also mentioned in the docs at here ).

So you have to change to use SEQUENCE to generate the ID. And according to this , choose to use "pooled" or "pooled-lo" algorithm to get the new ID from the sequence in order to further improve the performance by reducing the round trips to get the ID.

So the ID mapping looks like :

@GeneratedValue(strategy = GenerationType.SEQUENCE, generator="emp_sequence") @SequenceGenerator(name="emp_sequence", sequenceName = "emp_id_seq", allocationSize = 100) private Long id;

And the hibernate settings :

spring.jpa.properties.hibernate.order_inserts = true
spring.jpa.properties.hibernate.order_updates = true
spring.jpa.properties.hibernate.jdbc.batch_size = 1000
spring.jpa.properties.hibernate.jdbc.batch_versioned_data = true
# For using "pool-lo" optimiser for generating ID when using JPA @SequenceGenerator
spring.jpa.properties.hibernate.id.optimizer.pooled.preferred = pooled-lo

Also , you have to make sure the corresponding ID sequence in PostreSQL is aligned with the configuration in @SequenceGenerator :

alter sequence emp_id_seq increment by 100;

Another tip is to add reWriteBatchedInserts=true in the JDBC connection string which will provides 2-3x performance improvement as said from the docs.

Why there is a gap between batch_size = 1000 and allocationSize = 100? Should we keep this same ? – Jeff Cook Jun 17, 2020 at 6:53 no need , they can be different values. allocationSize control how many IDs will be reversed per each ID generation request. batch_size controls how many records will be batched into one single inserted SQL. – Ken Chan Jun 17, 2020 at 8:13 For anyone wondering why property in settings didn't work, it had a typo spring.jpa.properties.hibernate.hibernate.id.optimizer.pooled.preferred = pooled-lo to spring.jpa.properties.hibernate.id.optimizer.pooled.preferred = pooled-lo – Jacek Lipiec Mar 12, 2022 at 14:25 My entity uses UUID. @Id @GeneratedValue(generator = "UUID") @GenericGenerator(name = "UUID", strategy = "org.hibernate.id.UUIDGenerator") @Column(updatable = false, nullable = false) private UUID UUID; Is it still eligible for batch processing? – Ashish Singh Oct 8, 2022 at 7:29

I Jeff

Las year my team and I came across a situation where a job was taking like forever to perform DB write operations.

What we did to improove performance was:

  • Use repositories and managed entities just for DB read operations, but never alter those entities, to avoid spring jpa to write the changes in database.
  • Create a store procedure for the insert/update operation.
  • Call the SP with JDBCBatchItemWriter using index parameters
  • Set the parameters with ItemPreparedStatementSetter
  • That way we coulm improove performance reducing DB calls!

    Hope it helps

    Thanks for contributing an answer to Stack Overflow!

    • Please be sure to answer the question. Provide details and share your research!

    But avoid

    • Asking for help, clarification, or responding to other answers.
    • Making statements based on opinion; back them up with references or personal experience.

    To learn more, see our tips on writing great answers.