GitLab在docker和Kubernetes之间折腾

GitLab在docker和Kubernetes之间折腾

[TOC]

1. 概述

最近用上了Kubernetes,刚好又要求Gitlab AutoDev配合Kubernetes,所以将旧的Gitlab升级下,并迁移成了helm版本。
但是在使用过程中,发现并不如docker版本稳定,特别是pod在重新分配后,在节点上pull image失败问题,即使配置了镜像加速,虽然有办法解决(我是多个节点去pull,某个成功后,tag,push到私有仓库,再到pod分配节点pull,tag)。
另外,helm版本的gitlab组件全部分离,相互依赖,非常复杂。在不熟悉gitlab内部原理和通信的情况下,排查问题非常困难。
于是,又将helm版本迁移到docker版本,反而是遇到了坑。本想放弃迁移回docker版本,谁想在2018年12月11日,git突然push失败了,而且问题排查半天,也没发现有用的信息。转而解决在docker版本下恢复helm版本的gitlab数据对象存储问题,好在解决了,虽然也有些问题,至少能用了。

2. Gitlab从docker迁移到Kubernetes

Gitlab docker: GitLab-CE 9.5.4
Gitlab Kubernetes: GitLab-CE 11.4.3

2.1 备份恢复过程

进入gitlab docker:

docker exec -it mygitlab /bin/bash

执行备份:

gitlab-rake gitlab:backup:create & # 避免超时自动退出

helm安装gitlab:

helm upgrade --install gitlab --timeout 600 --namespace devops . --set gitlab.migrations.image.repository=registry.gitlab.com/gitlab-org/build/cng/gitlab-rails-ce --set gitlab.sidekiq.image.repository=registry.gitlab.com/gitlab-org/build/cng/gitlab-sidekiq-ce --set gitlab.unicorn.image.repository=registry.gitlab.com/gitlab-org/build/cng/gitlab-unicorn-ce --set gitlab.unicorn.workhorse.image=registry.gitlab.com/gitlab-org/build/cng/gitlab-workhorse-ce --set gitlab.task-runner.image.repository=registry.gitlab.com/gitlab-org/build/cng/gitlab-task-runner-ce --set certmanager-issuer.email=29ygq@sina.com --set gitlab.migrations.initialRootPassword.key="Git@domain.com" --set glabal.initialRootPassword.key="Git@domain.com" --set global.hosts.domain=domain.com --set global.smtp.password.secret=163mail --set global.time_zone="Asia/Shanghai" --set gitlab.gitaly.persistence.storageClass=ceph-rbd --set gitlab.gitaly.persistence.size=20Gi --set gitlab.gitaly.persistence.storageClass=ceph-rbd --set postgresql.persistence.size=5Gi --set postgresql.persistence.storageClass=ceph-rbd --set minio.persistence.size=5Gi --set minio.persistence.storageClass=ceph-rbd --set redis.persistence.size=1Gi --set redis.persistence.storageClass=ceph-rbd --set nginx-ingress.enabled=false --set prometheus.install=false --set certmanager.install=false --set gitlab.gitlab-shell.service.externalPort=2222 --set gitlab.gitlab-shell.service.internalPort=2222 --set gitlab.gitlab-runner.rbac.clusterWideAccess=true --set gitlab.gitlab-runner.rbac.create=true --set gitlab.gitlab-runner.runners.privileged=true 

将备份文件拷出来,并同步到Kubernetes master节点:

docker cp mygitlab:/var/opt/gitlab/backups/1541148206_2018_11_02_9.5.4_gitlab_backup.tar /tmp/rsync -avzP /tmp/1541148206_2018_11_02_9.5.4_gitlab_backup.tar root@master_ip:/tmp/

进入Kubernetes上的gitlab task-runner pod中:

kubectl exec <task-runner pod name> -it /bin/bash 

执行恢复:

backup-utility --restore -f file:///tmp/1541148206_2018_11_02_9.5.4_gitlab_backup.tar &

2.2 恢复失败解决

恢复中断:

[root@lab1 tmp]# kubectl describe pod gitlab-task-runner-6cfc59d65d-zg64k -n devopsName: gitlab-task-runner-6cfc59d65d-zg64kNamespace: devopsPriority: 0PriorityClassName: <none>Node: lab1/Start Time: Wed, 31 Oct 2018 17:04:30 +0800Labels: app=task-runner pod-template-hash=6cfc59d65d release=gitlabAnnotations: checksum/config: dcbb3fa285b4af4cc7d76913713f826335d0826672270a02beeda74bdac8f528 cluster-autoscaler.kubernetes.io/safe-to-evict: trueStatus: FailedReason: EvictedMessage: The node was low on resource: ephemeral-storage. Container task-runner was using 11874956Ki, which exceeds its request of 0. 

Gitlab上有issue说明,但是当前官方版本还没有修复。
https://gitlab.com/charts/gitlab/issues/705
https://github.com/kubernetes/enhancements/issues/361

我们按他的想法,给task-runner添加持久化存储。

pvc.yaml

apiVersion: v1kind: PersistentVolumeClaimmetadata: name: gitlab-task-runner namespace: devopsspec: accessModes: - ReadWriteOnce storageClassName: ceph-rbd resources: requests: storage: 20Gi

kubectl apply -f pvc.yaml

kubectl edit deploy gitlab-task-runner -n devops

找到相关地方,添加挂载点和存储点:

 volumeMounts: - mountPath: /srv/gitlab/tmp name: data-storage-tmp volumes: - name: data-storage-tmp persistentVolumeClaim: claimName: gitlab-task-runner 

待POD重新启来,再将备份文件拷进去,重新恢复:

kubectl cp /tmp/1541148206_2018_11_02_9.5.4_gitlab_backup.tar devops/gitlab-task-runner-5d859b8c8d-gbggx:/srv/gitlab/tmp/kubectl exec -it gitlab-task-runner-5d859b8c8d-gbggx -n devops -- backup-utility --restore -f file:///srv/gitlab/tmp/1541148206_2018_11_02_9.5.4_gitlab_backup.tar

恢复后,登录系统,发现仓库为空。

尝试将目录repositories同步过去恢复。

docker目录为/var/opt/gitlab/git-data/repositories
k8s里目录为/home/git/repositories

恢复后数据可用。

3. Gitlab从Kubernetes迁移到docker

Gitlab docker: GitLab-CE 11.5.2
Gitlab chart: 1.3.2 GitLab-CE 11.5.2

3.1 备份恢复过程

因为有好几G的数据量了,为了加快备份速度,不备份repositories,而是手动打包repositories。

kubectl exec &lt;task-runner pod name&gt; -i /bin/bash backup-utility --skip repositories

备份好的文件类似下面文件名:
1544529772_2018_12_11_11.5.2_gitlab_backup.tar

提前安装docker版gitlab,当时我装的时候版本为11.5.2。

sudo docker run --detach --hostname gitlab.domain.com --publish 443:443 --publish 80:80 --publish 10022:22 --name gitlab --restart always --volume /data/gitlab/config:/etc/gitlab --volume /data/gitlab/logs:/var/log/gitlab --volume /data/gitlab/data:/var/opt/gitlab gitlab/gitlab-ce:latest

因为恢复过程中,它会寻找/data/gitlab/data/backups/目录下的备份打包文件,因此,我们可以手动准备已经解压好的备份文件,然后打包一个小文件为1544529772_2018_12_11_11.5.2_gitlab_backup.tar以减少恢复过程中的解压时间(特别对于多次恢复)。

恢复命令:
docker exec -it gitlab gitlab-rake gitlab:backup:restore

直接将备份恢复后,在页面上随便都会报500错误,日志提示‘Object Storage is not enabled‘,需要按下文开启lfs功能再恢复数据。

###3.2 500错误

  • 情况一:

上文提到的‘Object Storage is not enabled‘gitlab.rb需要开启lfs功能。

### Job Artifacts# gitlab_rails[‘artifacts_enabled‘] = true# gitlab_rails[‘artifacts_path‘] = "/var/opt/gitlab/gitlab-rails/shared/artifacts"####! Job artifacts Object Store####! Docs: https://docs.gitlab.com/ee/administration/job_artifacts.html#using-object-storagegitlab_rails[‘artifacts_object_store_enabled‘] = truegitlab_rails[‘artifacts_object_store_direct_upload‘] = truegitlab_rails[‘artifacts_object_store_background_upload‘] = truegitlab_rails[‘artifacts_object_store_proxy_download‘] = truegitlab_rails[‘artifacts_object_store_remote_directory‘] = "artifacts"gitlab_rails[‘artifacts_object_store_connection‘] = { ‘provider‘ => ‘AWS‘, ‘region‘ => ‘eu-west-1‘, ‘aws_access_key_id‘ => ‘Z1Xh28dpKo0Oc9Xjjq35n0lCceGYxHmGwpibz2WQ9acLtiUTBHftVTKxcLiISSld‘, ‘aws_secret_access_key‘ => ‘ebRmMNRHh9R9ve869SkspkC3xMOyPBmo0FGhud4JqBZu7zjuiMCu36xn7aEVNEeT‘, # # The below options configure an S3 compatible host instead of AWS ‘aws_signature_version‘ => 4, # For creation of signed URLs. Set to 2 if provider does not support v4. ‘endpoint‘ => ‘http://127.0.0.1:9000‘, # default: nil - Useful for S3 compliant services such as DigitalOcean Spaces ‘host‘ => ‘localhost‘, ‘path_style‘ => true # Use ‘host/bucket_name/object‘ instead of ‘bucket_name.host/object‘}### Git LFSgitlab_rails[‘lfs_enabled‘] = truegitlab_rails[‘lfs_storage_path‘] = "/var/opt/gitlab/gitlab-rails/shared/lfs-objects"gitlab_rails[‘lfs_object_store_enabled‘] = truegitlab_rails[‘lfs_object_store_direct_upload‘] = truegitlab_rails[‘lfs_object_store_background_upload‘] = truegitlab_rails[‘lfs_object_store_proxy_download‘] = truegitlab_rails[‘lfs_object_store_remote_directory‘] = "lfs-objects"gitlab_rails[‘lfs_object_store_connection‘] = { ‘provider‘ => ‘AWS‘, ‘region‘ => ‘eu-west-1‘, ‘aws_access_key_id‘ => ‘Z1Xh28dpKo0Oc9Xjjq35n0lCceGYxHmGwpibz2WQ9acLtiUTBHftVTKxcLiISSld‘, ‘aws_secret_access_key‘ => ‘ebRmMNRHh9R9ve869SkspkC3xMOyPBmo0FGhud4JqBZu7zjuiMCu36xn7aEVNEeT#‘, # # The below options configure an S3 compatible host instead of AWS ‘aws_signature_version‘ => 4, # For creation of signed URLs. Set to 2 if provider does not support v4. # ‘endpoint‘ => ‘https://s3.amazonaws.com‘, # default: nil - Useful for S3 compliant services such as DigitalOcean Spaces ‘host‘ => ‘localhost‘, ‘endpoint‘ => ‘http://127.0.0.1:9000‘, ‘path_style‘ => true# # ‘path_style‘ => false # Use ‘host/bucket_name/object‘ instead of ‘bucket_name.host/object‘}### GitLab uploads###! Docs: https://docs.gitlab.com/ee/administration/uploads.htmlgitlab_rails[‘uploads_storage_path‘] = "/var/opt/gitlab/gitlab-rails/public"gitlab_rails[‘uploads_base_dir‘] = "uploads/-/system"gitlab_rails[‘uploads_object_store_enabled‘] = truegitlab_rails[‘uploads_object_store_direct_upload‘] = truegitlab_rails[‘uploads_object_store_background_upload‘] = truegitlab_rails[‘uploads_object_store_proxy_download‘] = truegitlab_rails[‘uploads_object_store_remote_directory‘] = "uploads"gitlab_rails[‘uploads_object_store_connection‘] = { ‘provider‘ => ‘AWS‘, ‘region‘ => ‘eu-west-1‘, ‘aws_access_key_id‘ => ‘Z1Xh28dpKo0Oc9Xjjq35n0lCceGYxHmGwpibz2WQ9acLtiUTBHftVTKxcLiISSld‘, ‘aws_secret_access_key‘ => ‘ebRmMNRHh9R9ve869SkspkC3xMOyPBmo0FGhud4JqBZu7zjuiMCu36xn7aEVNEeT‘,# # # The below options configure an S3 compatible host instead of AWS ‘host‘ => ‘localhost‘, ‘aws_signature_version‘ => 4, # For creation of signed URLs. Set to 2 if provider does not support v4. ‘endpoint‘ => ‘http://127.0.0.1:9000‘, # default: nil - Useful for S3 compliant services such as DigitalOcean Spaces ‘path_style‘ => true # Use ‘host/bucket_name/object‘ instead of ‘bucket_name.host/object‘}

恢复备份数据后,gitlab的docker中执行以下命令迁移数据至对象存储:

 # Avatars gitlab-rake "gitlab:uploads:migrate[AvatarUploader, Project, :avatar]" gitlab-rake "gitlab:uploads:migrate[AvatarUploader, Group, :avatar]" gitlab-rake "gitlab:uploads:migrate[AvatarUploader, User, :avatar]" # Attachments gitlab-rake "gitlab:uploads:migrate[AttachmentUploader, Note, :attachment]" gitlab-rake "gitlab:uploads:migrate[AttachmentUploader, Appearance, :logo]" gitlab-rake "gitlab:uploads:migrate[AttachmentUploader, Appearance, :header_logo]" # Markdown gitlab-rake "gitlab:uploads:migrate[FileUploader, Project]" gitlab-rake "gitlab:uploads:migrate[PersonalFileUploader, Snippet]" gitlab-rake "gitlab:uploads:migrate[NamespaceFileUploader, Snippet]" gitlab-rake "gitlab:uploads:migrate[FileUploader, MergeRequest]"

然后再将repositories覆盖到/data/gitlab/data/git-data/repositories/,并进gitlab docker把目录属主改为git。
在gitlab UI界面,部分项目可能提示为空项目,可以用高权限的维护者,把仓库pull下来,修改下,再push,即可解决项目为空问题。

  • 情况二:
    有可能为DB数据关系错误,需要升级数据库关系

输入以下指令查看数据升级状态

gitlab-rake db:migrate:status

果然发现有一些显示为Down,显示为Up即表示正常同,再执行数据库关系升级

gitlab-rake db:migrate

执行完成再重复重建、重启命令,问题解决。

4. helm版本问题记录

最后,记录下helm版本,push失败问题。日后再研究。

$ git pushwarning: redirecting to https://gitlab.domain.com/devops/docs.git/Enumerating objects: 10, done.Counting objects: 100% (10/10), done.Compressing objects: 100% (6/6), done.Writing objects: 100% (7/7), 3.65 KiB | 1.22 MiB/s, done.Total 7 (delta 2), reused 0 (delta 0)remote: GitLab: Failed to authorize your Git request: internal API unreachableTo https://gitlab.domain.com/devops/docs ! [remote rejected] master -> master (pre-receive hook declined)error: failed to push some refs to ‘https://gitlab.domain.com/devops/docs‘

参考资料:
[1] https://docs.gitlab.com/omnibus/README.html
[2] https://docs.gitlab.com/ce/raketasks/backup_restore.html
[3] https://docs.gitlab.com/omnibus/settings/backups.html
[4] https://docs.gitlab.com/ce/workflow/lfs/lfs_administration.html
[5] https://docs.gitlab.com/ce/administration/uploads.html
[6] https://docs.gitlab.com/ce/administration/raketasks/check.html

相关文章