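GitLab Compatibility on CockroachDB and YugabyteDB (Part 1): System Initialization
About the Author
He Ao is a senior development engineer at Digital China Group. He has long worked in technology R&D, database operations, and system architecture, is a distributed-systems enthusiast, and serves as a technical evangelist in the TiDB community. He currently focuses on the research and application of distributed databases and is responsible for project delivery, technical consulting, ecosystem development, and team management for Digital China's TiDB team.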
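I. Test Background
GitLab is a source code management tool that is popular worldwide. Early versions let users choose between MySQL and PostgreSQL, but starting with version 12.1.0 the project dropped MySQL support entirely.
Many features in newer GitLab versions are built on PostgreSQL-specific capabilities, which makes GitLab a benchmark among products that use PostgreSQL as their underlying data store.
Consider the following scenario: a large corporate group is split into many business units, and each unit, or even each small team, may maintain its own GitLab instance. Managing all of these repositories at the group level becomes a thorny problem. For example: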
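- Version differences (open-source vs. commercial editions, newer vs. older releases)
- Fine-grained permission control
- Data backup
- Infrastructure utilization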
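A single, unified GitLab environment that also offers good scalability and high availability would clearly be the best solution. A traditional single-node PostgreSQL database cannot meet these requirements, so could GitLab run on a distributed database instead?
CockroachDB and YugabyteDB are two of the better-known open-source distributed databases that implement the PostgreSQL protocol. According to their official websites:
CockroachDB supports the PostgreSQL wire protocol and the majority of PostgreSQL syntax. This means that existing applications built on PostgreSQL can often be migrated to CockroachDB without changing application code. (See the References for the source.)
YugabyteDB is a high-performance, cloud-native distributed SQL database that aims to support all PostgreSQL features. (See the References for the source.)
CockroachDB claims to support the majority of PostgreSQL syntax, while YugabyteDB aims to support all PostgreSQL features. This series of evaluation articles compares how well the two databases support GitLab, which to some extent reflects their compatibility with standard PostgreSQL.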
II. Test Environment
1. CockroachDB
defaultdb=# select version();
version
-----------------------------------------------------------------------------------------
CockroachDB CCL v21.2.2 (x86_64-unknown-linux-gnu, built 2021/12/01 14:35:45, go1.16.6)
(1 row)
2. YugabyteDB
postgres=# select version();
version
------------------------------------------------------------------------------------------------------------
PostgreSQL 11.2-YB-2.9.1.0-b0 on x86_64-pc-linux-gnu, compiled by gcc (Homebrew gcc 5.5.0_4) 5.5.0, 64-bit
(1 row)
3. GitLab
GitLab information
Version: 12.1.0-ee
Revision: 1f2e6f3f6d8
Directory: /home/git/gitlab
DB Adapter: PostgreSQL
When GitLab is deployed on standard PostgreSQL, the database schema contains the following objects:
gitlab_production=# select C.relkind,count(C.relname) from pg_class C left join pg_namespace n on n.oid = C.relnamespace where n.nspname = 'public' group by C.relkind;
relkind | count
---------+-------
r | 249
i | 903
S | 231
(3 rows)
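The relkind codes in this output follow the PostgreSQL pg_class catalog: r is an ordinary table, i is an index, and S is a sequence. As a minimal sketch, the Python snippet below parses psql output of this shape into a dictionary so that later runs can be compared against this baseline. The helper name parse_relkind_counts and the sample text layout are illustrative assumptions, not part of the original test.

```python
# relkind codes as documented for PostgreSQL's pg_class catalog.
RELKIND_NAMES = {"r": "ordinary table", "i": "index", "S": "sequence"}


def parse_relkind_counts(psql_output):
    """Parse 'relkind | count' rows from psql text output into a dict.

    Hypothetical helper for illustration; header, separator, and footer
    lines are skipped because they do not have a numeric second column.
    """
    counts = {}
    for line in psql_output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


# Output copied from the standard PostgreSQL deployment above.
BASELINE_OUTPUT = """
 relkind | count
---------+-------
 r       |   249
 i       |   903
 S       |   231
(3 rows)
"""

baseline = parse_relkind_counts(BASELINE_OUTPUT)
for kind, count in baseline.items():
    print(f"{RELKIND_NAMES.get(kind, kind)}: {count}")
print("total objects:", sum(baseline.values()))
```

A complete GitLab 12.1.0 schema on standard PostgreSQL therefore holds 249 tables, 903 indexes, and 231 sequences, which is the target the two distributed databases are measured against.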
III. CockroachDB Startup Process
1. Database Initialization
Run the GitLab setup task to generate the required databases and tables:
dc@dc-virtual-machine:/home/git/gitlab$ sudo -u git -H bundle exec rake gitlab:setup RAILS_ENV=production
This will create the necessary database tables and seed the database.
You will lose any previous data stored in the database.
Do you want to continue (yes/no)? yes
Dropped database 'gitlab'
Created database 'gitlab'
-- enable_extension("pg_trgm")
rake aborted!
ActiveRecord::StatementInvalid: PG::FeatureNotSupported: ERROR: unimplemented: extension "pg_trgm" is not yet supported
HINT: You have attempted to use a feature that is not yet implemented.
See: https://go.crdb.dev/issue-v/51137/v21.2
: CREATE EXTENSION IF NOT EXISTS "pg_trgm"
/home/git/gitlab/config/initializers/peek.rb:18:in `async_exec_params'
/home/git/gitlab/config/initializers/peek.rb:18:in `exec_params'
/home/git/gitlab/vendor/bundle/ruby/2.6.0/gems/activerecord-5.2.3/lib/active_record/connection_adapters/postgresql_adapter.rb:611:in `block (2 levels) in exec_no_cache'
....
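The output above shows that GitLab initialization depends on PostgreSQL extensions (here, pg_trgm), which CockroachDB unfortunately does not yet support. Setup fails at the very first step, and at this point no objects have been created in the database: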
gitlab=# select C.relkind,count(C.relname) from pg_class C left join pg_namespace n on n.oid = C.relnamespace where n.nspname = 'public' group by C.relkind;
Empty set
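When triaging failures like this across several databases, it can help to pull the offending extension name out of the error text automatically. The following is a minimal sketch that matches only the CockroachDB message format quoted in the log above; the function name unsupported_extension is hypothetical, and other versions may word the error differently.

```python
import re
from typing import Optional

# Matches the CockroachDB error text seen in the setup log above.
# Assumption: other releases may phrase this message differently.
UNSUPPORTED_EXTENSION = re.compile(
    r'unimplemented: extension "([^"]+)" is not yet supported'
)


def unsupported_extension(error_message: str) -> Optional[str]:
    """Return the extension name from an 'unimplemented extension' error, or None."""
    match = UNSUPPORTED_EXTENSION.search(error_message)
    return match.group(1) if match else None


log_line = (
    "ActiveRecord::StatementInvalid: PG::FeatureNotSupported: ERROR: "
    'unimplemented: extension "pg_trgm" is not yet supported'
)
print(unsupported_extension(log_line))
```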
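2. Accessing GitLab
Visiting the GitLab home page returns a 502 error.
According to the logs, the cause is a SQL statement that fails because its target table does not exist: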
ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR: relation "geo_nodes" does not exist
: SELECT a.attname, format_type(a.atttypid, a.atttypmod),
pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod,
c.collname, col_description(a.attrelid, a.attnum) AS comment
FROM pg_attribute a
LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
LEFT JOIN pg_type t ON a.atttypid = t.oid
LEFT JOIN pg_collation c ON a.attcollation = c.oid AND a.attcollation <> t.typcollation
WHERE a.attrelid = '"geo_nodes"'::regclass
AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum
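3. Upgrading the Database Version
The CockroachDB version under test is not the latest, so a newer release might already support extensions. We upgraded to latest-v22.1 to find out: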
defaultdb=# select version();
version
------------------------------------------------------------------------------------
CockroachDB CCL v22.1.0 (x86_64-pc-linux-gnu, built 2022/05/23 16:27:47, go1.17.6)
(1 row)
Running setup again to create the database produces the same error, ActiveRecord::StatementInvalid: PG::FeatureNotSupported: ERROR: unimplemented: extension "pg_trgm" is not yet supported, which shows that the new version still does not support this extension.
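IV. YugabyteDB Startup Process
1. Database Initialization
Edit the GitLab configuration file to point the database connection at YugabyteDB, then initialize a new database in the same way: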
dc@dc-virtual-machine:/home/git/gitlab$ sudo -u git -H bundle exec rake gitlab:setup RAILS_ENV=production
This will create the necessary database tables and seed the database.
You will lose any previous data stored in the database.
Do you want to continue (yes/no)? yes
Dropped database 'gitlab'
Created database 'gitlab'
-- enable_extension("pg_trgm")
-> 2.5496s
-- enable_extension("plpgsql")
-> 0.1143s
-- create_table("abuse_reports", {:id=>:serial, :force=>:cascade})
-> 0.3709s
-- create_table("appearances", {:id=>:serial, :force=>:cascade})
-> 0.3022s
-- create_table("issue_tracker_data", {:force=>:cascade})
-> 3.7627s
-- create_table("issues", {:id=>:serial, :force=>:cascade})
rake aborted!
ActiveRecord::StatementInvalid: PG::InternalError: ERROR: index method "ybgin" not supported yet
HINT: See https://github.com/YugaByte/yugabyte-db/issues/1337. Click '+' on the description to raise its priority
: CREATE INDEX "index_issues_on_description_trigram" ON "issues" USING gin ("description" gin_trgm_ops)
/home/git/gitlab/vendor/bundle/ruby/2.6.0/gems/peek-pg-1.3.0/lib/peek/views/pg.rb:17:in `async_exec'
/home/git/gitlab/vendor/bundle/ruby/2.6.0/gems/peek-pg-1.3.0/lib/peek/views/pg.rb:17:in `async_exec'
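The output shows that setup initially ran normally and created the extensions and tables without trouble. After roughly 20 minutes it failed while creating an index: the statement requests a "gin" index, and the error message indicates that YugabyteDB handles it with its own "ybgin" index method, which this version does not yet support.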
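To gauge how many statements are exposed to this limitation, one option is to scan the schema DDL for indexes that request the gin access method. The snippet below is a rough sketch under that assumption: find_gin_indexes is a hypothetical helper, the regular expression covers only simple CREATE INDEX statements of the form shown in the log, and the second sample statement is invented for contrast.

```python
import re

# Matches simple statements like: CREATE INDEX "name" ON "table" USING gin (...)
GIN_INDEX = re.compile(
    r'CREATE\s+(?:UNIQUE\s+)?INDEX\s+"?(\w+)"?\s+ON\s+"?(\w+)"?\s+USING\s+gin\b',
    re.IGNORECASE,
)


def find_gin_indexes(statements):
    """Return (index_name, table_name) pairs for statements that create GIN indexes."""
    found = []
    for sql in statements:
        match = GIN_INDEX.search(sql)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


ddl = [
    # Statement taken from the failing setup log above.
    'CREATE INDEX "index_issues_on_description_trigram" ON "issues" '
    'USING gin ("description" gin_trgm_ops)',
    # Hypothetical btree index, included only to show a non-matching case.
    'CREATE INDEX "index_example_on_id" ON "example" USING btree ("id")',
]
print(find_gin_indexes(ddl))
```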
Here are the objects the database contains at this point:
gitlab=# select C.relkind,count(C.relname) from pg_class C left join pg_namespace n on n.oid = C.relnamespace where n.nspname = 'public' group by C.relkind;
relkind | count
---------+-------
S | 113
i | 391
r | 117
(3 rows)
This looks somewhat better than CockroachDB, but it is still far from the complete schema.
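The gap can be quantified by comparing these counts with the standard PostgreSQL baseline from Section II. The sketch below simply divides one set of counts by the other; the coverage function and variable names are illustrative.

```python
# Object counts reported in this article, keyed by pg_class.relkind.
BASELINE = {"r": 249, "i": 903, "S": 231}  # standard PostgreSQL deployment
PARTIAL = {"r": 117, "i": 391, "S": 113}   # YugabyteDB 2.9.1.0 after the failed setup


def coverage(partial, baseline):
    """Percentage of baseline objects present, per relkind, rounded to one decimal."""
    return {
        kind: round(100 * partial.get(kind, 0) / total, 1)
        for kind, total in baseline.items()
    }


for kind, pct in coverage(PARTIAL, BASELINE).items():
    print(f"{kind}: {pct}% of baseline")
```

By this measure, the interrupted run created a little under half of the tables, indexes, and sequences that a full installation contains.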
2. Accessing GitLab
The GitLab home page is still inaccessible at this point. The logs show that the error is again caused by a missing table:
source=rack-timeout id=7gatOugcqB8 timeout=60000ms state=ready
Started GET "/" for 10.3.74.126 at 2022-05-27 16:05:31 +0800
Processing by RootController#index as HTML
Completed 500 Internal Server Error in 78ms (ActiveRecord: 58.8ms | Elasticsearch: 0.0ms)
ActiveRecord::StatementInvalid (PG::UndefinedTable: ERROR: relation "projects" does not exist
LINE 8: WHERE a.attrelid = '"projects"'::regclass
: SELECT a.attname, format_type(a.atttypid, a.atttypmod),
pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod,
c.collname, col_description(a.attrelid, a.attnum) AS comment
FROM pg_attribute a
LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
LEFT JOIN pg_type t ON a.atttypid = t.oid
LEFT JOIN pg_collation c ON a.attcollation = c.oid AND a.attcollation <> t.typcollation
WHERE a.attrelid = '"projects"'::regclass
AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum
):
3. Upgrading the Database Version
Likewise, we upgraded YugabyteDB to the latest version to see whether GIN index compatibility had been completed:
postgres=# select version();
version
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 11.2-YB-2.13.2.0-b0 on x86_64-pc-linux-gnu, compiled by clang version 12.0.1 (https://github.com/yugabyte/llvm-project.git bdb147e675d8c87cee72cc1f87c4b82855977d94), 64-bit
(1 row)
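Running setup again went smoothly this time: after about 30 minutes the program exited normally with no errors. Here are the objects now present in the database: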
gitlab=# select C.relkind,count(C.relname) from pg_class C left join pg_namespace n on n.oid = C.relnamespace where n.nspname = 'public' group by C.relkind;