Spring Data: a new perspective of data operations

Spring Data is an umbrella project from the SpringSource community that aims to provide a more generic abstraction of data operations for RDBMS, NoSQL, and cloud-based data and indexing services.

The status of Spring Data

Spring Data evolved out of an open source project named Hades, which mainly provided a GenericDao interface for Hibernate/JPA. Spring Data extends this core concept to other domains and APIs.

According to the Spring Data project page, the official Spring Data project currently provides the following subprojects:

  • JPA
  • REST
  • JDBC Extensions
  • Apache Hadoop
  • GemFire
  • Redis
  • MongoDB
  • Neo4j
  • Commons

Besides these, some community-based projects are available:

  • Spring Data Solr, a Spring Data repositories abstraction for Apache Solr
  • Spring Data Elasticsearch, a Spring Data repositories abstraction for Elasticsearch
  • Spring Data Couchbase, a Spring Data repositories abstraction for Couchbase
  • Spring Data FuzzyDB, a Spring Data repositories abstraction for FuzzyDB

Spring Data Commons (Maven artifact id spring-data-commons) is the base for the other subprojects.

Overview of Spring Data Commons

The most attractive feature of Commons is that it provides a generic Repository interface and two extended variants, CrudRepository for CRUD operations and PagingAndSortingRepository for pagination, which can be used in all the other subprojects.

public interface Repository<T, ID extends Serializable> {
}

Repository is an empty marker interface that takes a domain class and a serializable primary key type as type parameters. CrudRepository extends Repository and adds common methods for CRUD operations; PagingAndSortingRepository extends CrudRepository and adds pagination and sorting capability.
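To make the layering concrete, here is a minimal, self-contained sketch in plain Java. SimpleRepository, SimpleCrudRepository and InMemoryRepository are hypothetical names written only for illustration: they mirror the shape of Repository and CrudRepository (a few method names are borrowed from the Commons 1.x CrudRepository, but only a subset), they are not the Spring Data API, and the in-memory map merely plays the role of a real data store.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical marker interface mirroring Repository<T, ID>: no methods, only type parameters.
interface SimpleRepository<T, ID extends Serializable> {
}

// Hypothetical CRUD layer mirroring the idea of CrudRepository (simplified method set).
interface SimpleCrudRepository<T, ID extends Serializable> extends SimpleRepository<T, ID> {
    T save(T entity);
    T findOne(ID id);
    List<T> findAll();
    long count();
    void delete(ID id);
}

// Hypothetical in-memory implementation. Spring Data would instead supply a
// store-specific implementation at runtime; you never write this class yourself.
class InMemoryRepository<T, ID extends Serializable> implements SimpleCrudRepository<T, ID> {
    private final Map<ID, T> store = new LinkedHashMap<>();
    private final Function<T, ID> idOf; // extracts the primary key from an entity

    InMemoryRepository(Function<T, ID> idOf) {
        this.idOf = idOf;
    }

    @Override
    public T save(T entity) {
        store.put(idOf.apply(entity), entity); // insert, or overwrite an entity with the same key
        return entity;
    }

    @Override
    public T findOne(ID id) {
        return store.get(id); // null when no entity has this key
    }

    @Override
    public List<T> findAll() {
        return new ArrayList<>(store.values());
    }

    @Override
    public long count() {
        return store.size();
    }

    @Override
    public void delete(ID id) {
        store.remove(id);
    }
}
```

The point of the design is that callers only see the interface; swapping the storage behind it does not change the calling code.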

You can extend any one of them in your project. In a real-world project, however, you should extend the variant provided by the specific subproject, which offers more features. For example, if you are using Spring Data JPA, it is better to make your Repository extend JpaRepository, which itself extends PagingAndSortingRepository.

Whichever subproject you use, Spring Data Commons must be on the project classpath (the subprojects depend on it), and you must specify the package that contains your Repository interfaces in the Spring configuration. Spring Data will then discover the Repository beans automatically.

For example, if you are using Spring Data JPA, add the following XML configuration

<jpa:repositories base-package="com.hantsylabs.example.spring.jpa"></jpa:repositories>

or use @EnableJpaRepositories in Spring Java configuration

@EnableJpaRepositories(basePackages = { "com.hantsylabs.example.spring.jpa" })

to activate your own repositories.

NOTE: If you are using the Repository API with another data store, such as MongoDB, add the Spring Data MongoDB dependency to the project classpath and activate the repositories via the mongo namespace.

So far, you have not seen any of the Repository API's magic. Be patient!

Query by convention

Assume there is a JPA entity class defined as:

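import java.util.Date;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Temporal;
import javax.persistence.TemporalType;
import javax.persistence.Version;

import org.springframework.format.annotation.DateTimeFormat;

@Entity
public class Conference {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    @Column(name = "id")
    private Long id;

    @Version
    @Column(name = "version")
    private Integer version;

    private String name;

    private String description;

    @Temporal(TemporalType.TIMESTAMP)
    @DateTimeFormat(style = "M-")
    private Date startedDate;

    @Temporal(TemporalType.TIMESTAMP)
    @DateTimeFormat(style = "M-")
    private Date endedDate;

    private String slug;

    // getters and setters ...
}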

Then declare a Repository interface for the Conference class like this:

import java.util.Date;

import org.springframework.data.repository.PagingAndSortingRepository;

public interface ConferenceRepository extends
        PagingAndSortingRepository<Conference, Long> {

    Conference findBySlug(String slug);

    Conference findByName(String name);

    Conference findByDescriptionLike(String desc);

    Conference findByStartedDateBetween(Date started, Date ended);
}

Now you can call these methods directly from your services or controllers. You do not need to implement them at all: Spring Data builds the query from the method name and fetches the results for you.

With JPA, findBySlug is equivalent to the following JPQL:

select c from Conference c where c.slug = ?1
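The following toy sketch illustrates the idea behind deriving a query from a method name, for the simplest case only: it strips the findBy prefix, splits the remainder on And, lower-cases the first letter of each part to get a property name, and emits a JPQL string with positional parameters. QueryDerivationSketch and deriveJpql are hypothetical names; Spring Data's real query parser is far more complete (Like, Between, Or, OrderBy and many other keywords), so treat this as a mental model rather than its implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of deriving JPQL from a method name such as "findBySlug".
// Limitations: only "findBy" prefixes and equality conditions joined by "And";
// a property whose name contains "And" (e.g. "Brand") would be split incorrectly.
class QueryDerivationSketch {

    static String deriveJpql(String entity, String methodName) {
        String prefix = "findBy";
        if (!methodName.startsWith(prefix) || methodName.length() == prefix.length()) {
            throw new IllegalArgumentException("Unsupported method: " + methodName);
        }
        String criteria = methodName.substring(prefix.length());
        // Use the lower-cased first letter of the entity name as the alias, e.g. "c".
        String alias = String.valueOf(Character.toLowerCase(entity.charAt(0)));
        List<String> conditions = new ArrayList<>();
        int position = 1;
        for (String part : criteria.split("And")) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty property in: " + methodName);
            }
            // "StartedDate" -> "startedDate"
            String property = Character.toLowerCase(part.charAt(0)) + part.substring(1);
            conditions.add(alias + "." + property + " = ?" + position++);
        }
        return "select " + alias + " from " + entity + " " + alias
                + " where " + String.join(" and ", conditions);
    }
}
```

For example, deriveJpql("Conference", "findBySlug") returns select c from Conference c where c.slug = ?1.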

With MongoDB, you do not need the JPA entity mapping, and the method is converted into a MongoDB-specific query expression.

In this example, the methods return a single object; in most cases, they should return a collection of objects.
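To see why a collection is the natural return type, here is an in-memory sketch of what findByStartedDateBetween means: every conference whose start date lies within the range matches (both bounds inclusive, like the SQL/JPQL BETWEEN operator), so there may be zero, one, or many results. ConferenceStub and BetweenSketch are hypothetical classes written only for this illustration; with Spring Data you would simply declare the method with a List<Conference> return type and let the store do the filtering.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Hypothetical minimal stand-in for the Conference entity.
class ConferenceStub {
    final String slug;
    final Date startedDate;

    ConferenceStub(String slug, Date startedDate) {
        this.slug = slug;
        this.startedDate = startedDate;
    }
}

class BetweenSketch {
    // In-memory equivalent of "startedDate between ?1 and ?2": both bounds inclusive.
    static List<ConferenceStub> findByStartedDateBetween(List<ConferenceStub> all, Date started, Date ended) {
        List<ConferenceStub> result = new ArrayList<>();
        for (ConferenceStub c : all) {
            if (!c.startedDate.before(started) && !c.startedDate.after(ended)) {
                result.add(c);
            }
        }
        return result;
    }
}
```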

Ideally, this query derivation works the same way in all Spring Data subprojects, but the method naming conventions may differ slightly between data stores.

Common annotations

If you have used Spring Data in projects, you will notice that it tries to provide a series of common annotations for all data stores, such as @Id, @Version, @Persistent, @Transient, and @PersistenceConstructor. Explore the package org.springframework.data.annotation for more details.

However, they currently behave a little differently across the subprojects, and they cannot be used interchangeably with the annotations of existing APIs such as JPA.

Personally, I do not think this is a good idea. If you keep an eye on the world outside of Spring, you will find some better solutions.

EclipseLink and Hibernate are starting to support NoSQL in their newer versions, based on the JPA API (plus some extensions). DataNucleus is a mature solution for multiple kinds of storage (RDBMS, NoSQL, even the filesystem), yet it supports the standard JPA and JDO APIs, and it is also used by Google App Engine for Java. If you choose one of these solutions, the learning curve of switching between different data stores drops dramatically.

Audit API

Commons provides some interfaces (such as Auditable, Persistable, and AuditorAware) and annotations (@CreatedDate, @CreatedBy, @LastModifiedBy, @LastModifiedDate) that let you add simple audit capabilities to your domain classes.
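The bookkeeping these annotations describe is simple: before an entity is saved, record who created or last modified it, and when. The following plain-Java sketch shows that logic without Spring. CurrentAuditor, AuditedDocument, and AuditingStamper are hypothetical names (CurrentAuditor only plays the role that AuditorAware plays in Commons), "new entity" is assumed to mean "createdDate not yet set", and the current time is passed in so the behavior is deterministic.

```java
import java.util.Date;

// Hypothetical counterpart of AuditorAware: supplies the current user.
interface CurrentAuditor {
    String currentAuditor();
}

// Hypothetical entity holding the four fields that @CreatedBy, @CreatedDate,
// @LastModifiedBy and @LastModifiedDate would mark.
class AuditedDocument {
    String createdBy;
    Date createdDate;
    String lastModifiedBy;
    Date lastModifiedDate;
}

class AuditingStamper {
    private final CurrentAuditor auditor;

    AuditingStamper(CurrentAuditor auditor) {
        this.auditor = auditor;
    }

    // Call before every save: creation fields are set only once,
    // modification fields are refreshed on each save.
    void beforeSave(AuditedDocument doc, Date now) {
        String user = auditor.currentAuditor();
        if (doc.createdDate == null) {
            doc.createdBy = user;
            doc.createdDate = now;
        }
        doc.lastModifiedBy = user;
        doc.lastModifiedDate = now;
    }
}
```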

Pagination capability

Pagination is a basic feature of web applications, and Commons provides some APIs for this purpose.

PagingAndSortingRepository provides methods that accept a Pageable object as a parameter and return a Page object.

The Pageable interface carries the essential data for pagination, such as page size, offset, sort field, and direction. Commons provides a default implementation class, PageRequest.

Page wraps all the information about a paginated query result: the data on the current page, the total count of all data, and so on.
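The arithmetic behind a page request is straightforward. The sketch below uses hypothetical, simplified stand-ins (SimplePageRequest and SimplePage, not the Commons classes). It assumes zero-based page numbers, as PageRequest uses, computes the offset as page * size, and derives the total page count by rounding total / size up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified page request: zero-based page index plus page size.
class SimplePageRequest {
    final int page;
    final int size;

    SimplePageRequest(int page, int size) {
        if (page < 0 || size < 1) {
            throw new IllegalArgumentException("page must be >= 0 and size >= 1");
        }
        this.page = page;
        this.size = size;
    }

    long offset() {
        return (long) page * size; // index of the first element on this page
    }
}

// Hypothetical, simplified page: the current slice plus the total element count.
class SimplePage<T> {
    final List<T> content;
    final long totalElements;
    final int size;

    SimplePage(List<T> content, long totalElements, int size) {
        this.content = content;
        this.totalElements = totalElements;
        this.size = size;
    }

    int totalPages() {
        return (int) ((totalElements + size - 1) / size); // ceiling division
    }

    // Slices an in-memory list the way a data store would apply offset and limit.
    static <T> SimplePage<T> of(List<T> all, SimplePageRequest request) {
        int from = (int) Math.min(request.offset(), all.size());
        int to = Math.min(from + request.size, all.size());
        return new SimplePage<>(new ArrayList<>(all.subList(from, to)), all.size(), request.size);
    }
}
```

With 23 elements and a page size of 10, page index 2 holds the last three elements, and totalPages() is 3.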

Commons provides some pagination integration with Spring MVC.

We will discuss these in detail in a future post.

2019-03-27 01:22



