浅言碎语x86_64 架构,直接从官网下载二进制文件就可以了hadoop下载地址maven下载地址jdk下载地址oracle 的 jdk 下载需要注册 oracle 的账号arm64 架构官方没有二进制文件,只能自己编译当然,也可以在 linux 服务上编译,利用 docker 其实是为了不'玷污'本地环境因为发行版是 麒麟V10 ,编译异常困难,还是要借助 docker 运行一个 centos为什么是 centos ?因为 hadoop-2.7.7 编译的时候要求 protoc 的版本是 2.5.0麒麟 V10 默认下载的版本已经是 3.x 了,而且没有 2.x 的历史版本尝试过去编译 protoc, 反正没成功docker hub 上拉取的 centos 7 的镜像,yum 安装的就是 2.5.0 的 protocdocker 镜像制作完成后,一定要先去容器里面执行 protoc --version,确认版本是不是2.5.0的,否则编译过程中会报错报错内容protoc version is 'libprotoc 3.9.0', expected version is '2.5.0'为什么用 oracle 的 jdk?因为 openjdk 在编译的时候,会出现 cannot find symbol 的报错,编译就进行不下去了用了 oracle 的 jdk1.8.0_321 就没有问题,百度云上传了一份链接:https://pan.baidu.com/s/1dY20MskH40KOscq0dXM61A提取码:vuuh编译过程还是比较依赖网络的,需要从 apache 的 maven 仓库里面获取 java 依赖当然,也可以提前下载好 maven 仓库,但是挺累的准备环境查看发行版cat /etc/os-releaseNAME="Kylin Linux Advanced Server" VERSION="V10 (Tercel)" ID="kylin" VERSION_ID="V10" PRETTY_NAME="Kylin Linux Advanced Server V10 (Tercel)" ANSI_COLOR="0;31"查看架构uname -iaarch64目录结构. ├── apache-maven-3.6.2.tar.gz ├── Dockerfile ├── hadoop-2.7.7-src.tar.gz └── jdk1.8.0_321.tar.gz编写 dockerfile# 定义一个 centos 7 的初始镜像,方便自定义编译环境 FROM centos:7 # 定义变量 ## 定义工作目录 ARG work_dir=/usr/local # 定义容器的环境变量 ENV JAVA_HOME=${work_dir}/jdk1.8.0_321 ## 我这里用的是 maven 3.6.2 版本,如果下载的版本和我的不一样,这里要修改 ENV MAVEN_HOME=${work_dir}/apache-maven-3.6.2 ENV PATH=${PATH}:${JAVA_HOME}/bin:${MAVEN_HOME}/bin # 定义进入容器的默认目录 WORKDIR ${work_dir} # 配置 yum 源为 阿里源 ## 安装编译 hadoop 所需的工具,清理安装包和缓存 RUN curl -O /etc/yum.repos.d/ http://mirrors.aliyun.com/repo/Centos-7.repo && \ yum install -y gcc gcc-c++ make cmake protobuf-* automake libtool zlib-devel openssl-devel && \ yum clean all # 复制 tar 包到镜像内 ADD ./jdk1.8.0_321.tar.gz ./ ADD ./apache-maven-3.6.2.tar.gz ./ ADD ./hadoop-2.7.7-src.tar.gz ./ # 整个脚本,让他睡十年 ## docker 容器想要在后台常驻,需要有一个前台常驻进程 RUN echo '/usr/bin/sleep 315360000' > start.sh && \ chmod +x start.sh CMD ["/usr/bin/bash","start.sh"]创建一个镜像docker build -t hadoop:2.7.7 .开始编译先让容器在后台跑着docker run -d --network host hadoop:2.7.7进入容器docker exec -it <容器id> bash开始编译cd hadoop-2.7.7-src mvn package -e -X -Pdist,native -DskipTests -Dtar整个编译时间长达 1小时22分钟(反正肯定一小时起步,至于时长,还是和机器性能以及网络有关)[INFO] ------------------------------------------------------------------------ [INFO] Reactor Summary for Apache Hadoop Main 2.7.7: [INFO] [INFO] Apache Hadoop Main ................................. SUCCESS [14:57 min] [INFO] Apache Hadoop Build Tools .......................... SUCCESS [11:03 min] [INFO] Apache Hadoop Project POM .......................... SUCCESS [03:02 min] [INFO] Apache Hadoop Annotations .......................... SUCCESS [ 54.759 s] [INFO] Apache Hadoop Assemblies ........................... SUCCESS [ 0.407 s] [INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [02:16 min] [INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [03:00 min] [INFO] Apache Hadoop MiniKDC .............................. SUCCESS [07:35 min] [INFO] Apache Hadoop Auth ................................. SUCCESS [03:06 min] [INFO] Apache Hadoop Auth Examples ........................ SUCCESS [01:01 min] [INFO] Apache Hadoop Common ............................... SUCCESS [08:37 min] [INFO] Apache Hadoop NFS .................................. SUCCESS [ 3.666 s] [INFO] Apache Hadoop KMS .................................. SUCCESS [01:01 min] [INFO] Apache Hadoop Common Project ....................... SUCCESS [ 0.204 s] [INFO] Apache Hadoop HDFS ................................. SUCCESS [02:13 min] [INFO] Apache Hadoop HttpFS ............................... 
SUCCESS [ 29.157 s] [INFO] Apache Hadoop HDFS BookKeeper Journal .............. SUCCESS [02:04 min] [INFO] Apache Hadoop HDFS-NFS ............................. SUCCESS [ 2.859 s] [INFO] Apache Hadoop HDFS Project ......................... SUCCESS [ 0.205 s] [INFO] hadoop-yarn ........................................ SUCCESS [ 0.205 s] [INFO] hadoop-yarn-api .................................... SUCCESS [ 23.612 s] [INFO] hadoop-yarn-common ................................. SUCCESS [03:06 min] [INFO] hadoop-yarn-server ................................. SUCCESS [ 0.207 s] [INFO] hadoop-yarn-server-common .......................... SUCCESS [ 6.124 s] [INFO] hadoop-yarn-server-nodemanager ..................... SUCCESS [ 10.998 s] [INFO] hadoop-yarn-server-web-proxy ....................... SUCCESS [ 2.448 s] [INFO] hadoop-yarn-server-applicationhistoryservice ....... SUCCESS [ 4.626 s] [INFO] hadoop-yarn-server-resourcemanager ................. SUCCESS [ 12.354 s] [INFO] hadoop-yarn-server-tests ........................... SUCCESS [ 3.492 s] [INFO] hadoop-yarn-client ................................. SUCCESS [ 3.898 s] [INFO] hadoop-yarn-server-sharedcachemanager .............. SUCCESS [ 2.501 s] [INFO] hadoop-yarn-applications ........................... SUCCESS [ 0.190 s] [INFO] hadoop-yarn-applications-distributedshell .......... SUCCESS [ 2.112 s] [INFO] hadoop-yarn-applications-unmanaged-am-launcher ..... SUCCESS [ 1.968 s] [INFO] hadoop-yarn-site ................................... SUCCESS [ 0.202 s] [INFO] hadoop-yarn-registry ............................... SUCCESS [ 3.535 s] [INFO] hadoop-yarn-project ................................ SUCCESS [ 3.945 s] [INFO] hadoop-mapreduce-client ............................ SUCCESS [ 0.360 s] [INFO] hadoop-mapreduce-client-core ....................... SUCCESS [ 12.012 s] [INFO] hadoop-mapreduce-client-common ..................... SUCCESS [ 9.888 s] [INFO] hadoop-mapreduce-client-shuffle .................... SUCCESS [ 2.641 s] [INFO] hadoop-mapreduce-client-app ........................ SUCCESS [ 6.284 s] [INFO] hadoop-mapreduce-client-hs ......................... SUCCESS [ 3.903 s] [INFO] hadoop-mapreduce-client-jobclient .................. SUCCESS [ 17.598 s] [INFO] hadoop-mapreduce-client-hs-plugins ................. SUCCESS [ 1.778 s] [INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [ 3.951 s] [INFO] hadoop-mapreduce ................................... SUCCESS [ 2.957 s] [INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [ 14.252 s] [INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [01:42 min] [INFO] Apache Hadoop Archives ............................. SUCCESS [ 1.917 s] [INFO] Apache Hadoop Rumen ................................ SUCCESS [ 3.753 s] [INFO] Apache Hadoop Gridmix .............................. SUCCESS [ 3.136 s] [INFO] Apache Hadoop Data Join ............................ SUCCESS [ 2.143 s] [INFO] Apache Hadoop Ant Tasks ............................ SUCCESS [ 1.645 s] [INFO] Apache Hadoop Extras ............................... SUCCESS [ 2.321 s] [INFO] Apache Hadoop Pipes ................................ SUCCESS [ 6.265 s] [INFO] Apache Hadoop OpenStack support .................... SUCCESS [ 3.007 s] [INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [03:45 min] [INFO] Apache Hadoop Azure support ........................ SUCCESS [ 24.236 s] [INFO] Apache Hadoop Client ............................... 
SUCCESS [ 7.014 s] [INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [ 0.844 s] [INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [ 3.275 s] [INFO] Apache Hadoop Tools Dist ........................... SUCCESS [ 7.215 s] [INFO] Apache Hadoop Tools ................................ SUCCESS [ 0.199 s] [INFO] Apache Hadoop Distribution ......................... SUCCESS [ 29.403 s] [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 01:22 h [INFO] Finished at: 2022-04-17T04:51:48Z [INFO] ------------------------------------------------------------------------编译完成后的 tar 包文件在hadoop-dist/target/目录下
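Before kicking off the full Maven build, it can save an hour to confirm the image really ships protoc 2.5.0, as stressed above. Below is a minimal sketch of such a pre-flight check plus copying the finished tarball back to the host; the tarball file name hadoop-2.7.7.tar.gz is an assumption, adjust it to whatever actually lands in hadoop-dist/target/.

```bash
#!/usr/bin/env bash
# Sketch only: verify protoc inside the image, run the build, copy the result out.
set -euo pipefail

# 1. the build fails late if protoc != 2.5.0, so check it up front
ver=$(docker run --rm hadoop:2.7.7 protoc --version)      # e.g. "libprotoc 2.5.0"
[[ "${ver}" == *"2.5.0"* ]] || { echo "unexpected protoc: ${ver}" >&2; exit 1; }

# 2. run the build in a long-lived container (same commands as above)
cid=$(docker run -d --network host hadoop:2.7.7)
docker exec "${cid}" bash -c \
  'cd /usr/local/hadoop-2.7.7-src && mvn package -Pdist,native -DskipTests -Dtar'

# 3. copy the distribution tarball back to the host (file name assumed)
docker cp "${cid}:/usr/local/hadoop-2.7.7-src/hadoop-dist/target/hadoop-2.7.7.tar.gz" .
```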
mysql 各大版本下载地址环境准备NAME="Kylin Linux Advanced Server" VERSION="V10 (Tercel)" ID="kylin" VERSION_ID="V10" PRETTY_NAME="Kylin Linux Advanced Server V10 (Tercel)" ANSI_COLOR="0;31"内核版本4.19.90-17.ky10.aarch64准备编译环境yum install -y gcc gcc-c++ make cmake ncurses ncurses-devel bison useradd mysql -s /sbin/nologin下载解压安装包wget https://downloads.mysql.com/archives/get/p/23/file/mysql-boost-5.7.29.tar.gz tar xvf mysql-boost-5.7.29.tar.gz cd mysql-5.7.29cmake 参数mysql 编译参数FormatsDescriptionDefaultIntroducedRemovedBUILD_CONFIG使用与官方版本相同的构建选项 CMAKE_BUILD_TYPE要生产的构建类型RelWithDebInfo CMAKE_CXX_FLAGSC++ 编译器的标志 CMAKE_C_FLAGSC 编译器的标志 CMAKE_INSTALL_PREFIX安装目录/usr/local/mysql COMPILATION_COMMENT编译与评论 CPACK_MONOLITHIC_INSTALL包构建是否产生单个文件OFF DEFAULT_CHARSET默认服务器字符集latin1 DEFAULT_COLLATION默认服务器排序规则latin1_swedish_ci DISABLE_PSI_COND排除性能模式条件检测OFF DISABLE_PSI_FILE排除性能模式文件检测OFF DISABLE_PSI_IDLE排除性能模式空闲检测OFF DISABLE_PSI_MEMORY排除性能模式内存检测OFF DISABLE_PSI_METADATA排除性能模式元数据检测OFF DISABLE_PSI_MUTEX排除性能模式互斥检测OFF DISABLE_PSI_PS排除性能模式准备好的语句OFF DISABLE_PSI_RWLOCK排除性能模式 rwlock 检测OFF DISABLE_PSI_SOCKET排除性能模式套接字检测OFF DISABLE_PSI_SP排除 Performance Schema 存储程序检测OFF DISABLE_PSI_STAGE排除 Performance Schema 阶段检测OFF DISABLE_PSI_STATEMENT排除性能模式语句检测OFF DISABLE_PSI_STATEMENT_DIGEST排除性能模式语句_digest 检测OFF DISABLE_PSI_TABLE排除性能模式表检测OFF DISABLE_PSI_THREAD排除性能模式线程检测OFF DISABLE_PSI_TRANSACTION排除性能模式事务检测OFF DOWNLOAD_BOOST是否下载Boost库OFF DOWNLOAD_BOOST_TIMEOUT下载 Boost 库的超时时间(以秒为单位)600 ENABLED_LOCAL_INFILE是否为 LOAD DATA 启用 LOCALOFF ENABLED_PROFILING是否启用查询分析代码ON ENABLE_DOWNLOADS是否下载可选文件OFF ENABLE_DTRACE是否包含 DTrace 支持 ENABLE_GCOV是否包含 gcov 支持 ENABLE_GPROF启用 gprof(仅限优化的 Linux 版本)OFF FORCE_UNSUPPORTED_COMPILER是否允许不支持的编译器OFF IGNORE_AIO_CHECK使用-DBUILD_CONFIG=mysql_release,忽略libaio检查OFF INSTALL_BINDIR用户可执行文件目录PREFIX/bin INSTALL_DOCDIR文档目录PREFIX/docs INSTALL_DOCREADMEDIR自述文件目录PREFIX INSTALL_INCLUDEDIR头文件目录PREFIX/include INSTALL_INFODIR信息文件目录PREFIX/docs INSTALL_LAYOUT选择预定义的安装布局STANDALONE INSTALL_LIBDIR库文件目录PREFIX/lib INSTALL_MANDIR手册页目录PREFIX/man INSTALL_MYSQLKEYRINGDIRkeyring_file 插件数据文件的目录platform specific5.7.11 INSTALL_MYSQLSHAREDIR共享数据目录PREFIX/share INSTALL_MYSQLTESTDIRmysql测试目录PREFIX/mysql-test INSTALL_PKGCONFIGDIRmysqlclient.pc pkg-config 文件的目录INSTALL_LIBDIR/pkgconfig INSTALL_PLUGINDIR插件目录PREFIX/lib/plugin INSTALL_SBINDIR服务器可执行目录PREFIX/bin INSTALL_SCRIPTDIR脚本目录PREFIX/scripts INSTALL_SECURE_FILE_PRIVDIRsecure_file_priv 默认值platform specific INSTALL_SECURE_FILE_PRIV_EMBEDDEDDIRlibmysqld 的secure_file_priv 默认值 INSTALL_SHAREDIRaclocal/mysql.m4 安装目录PREFIX/share INSTALL_SUPPORTFILESDIR额外的支持文件目录PREFIX/support-files MAX_INDEXES每个表的最大索引64 MEMCACHED_HOME内存缓存路径;过时的[none] 5.7.33MUTEX_TYPEInnoDB 互斥类型event MYSQLX_TCP_PORTX Plugin 使用的 TCP/IP 端口号330605.7.17 MYSQLX_UNIX_ADDRX Plugin 使用的 Unix 套接字文件/tmp/mysqlx.sock5.7.15 MYSQL_DATADIR数据目录 MYSQL_MAINTAINER_MODE是否启用 MySQL 维护者专用开发环境OFF MYSQL_PROJECT_NAMEWindows/macOS 项目名称MySQL MYSQL_TCP_PORTTCP/IP 端口号3306 MYSQL_UNIX_ADDRUnix 套接字文件/tmp/mysql.sock ODBC_INCLUDESODBC 包括目录 ODBC_LIB_DIRODBC 库目录 OPTIMIZER_TRACE是否支持优化器跟踪 REPRODUCIBLE_BUILD特别注意创建独立于构建位置和时间的构建结果 5.7.19 SUNPRO_CXX_LIBRARYSolaris 10+ 上的客户端链接库 SYSCONFDIR选项文件目录 SYSTEMD_PID_DIRsystemd 下 PID 文件的目录/var/run/mysqld SYSTEMD_SERVICE_NAMEsystemd 下 MySQL 服务的名称mysqld TMPDIRtmpdir 默认值 WIN_DEBUG_NO_INLINE是否禁用函数内联OFF WITHOUT_xxx_STORAGE_ENGINE从构建中排除存储引擎 xxx WITH_ASAN启用 AddressSanitizerOFF WITH_ASAN_SCOPE启用 AddressSanitizer -fsanitize-address-use-after-scope Clang 标志OFF5.7.21 WITH_AUTHENTICATION_LDAPLDAP认证插件无法构建是否报错OFF5.7.19 WITH_AUTHENTICATION_PAM构建 PAM 身份验证插件OFF WITH_AWS_SDKAmazon Web Services 
软件开发工具包的位置 5.7.19 WITH_BOOSTBoost 库源的位置 WITH_BUNDLED_LIBEVENT构建 ndbmemcache 时使用捆绑的 libevent;过时的ON 5.7.33WITH_BUNDLED_MEMCACHED构建ndbmemcache时使用捆绑的memcached;过时的ON 5.7.33WITH_CLASSPATH构建 MySQL Cluster Connector for Java 时使用的类路径。默认为空字符串。`` WITH_CLIENT_PROTOCOL_TRACING构建客户端协议跟踪框架ON WITH_CURLcurl库的位置 5.7.19 WITH_DEBUG是否包含调试支持OFF WITH_DEFAULT_COMPILER_OPTIONS是否使用默认编译器选项ON WITH_DEFAULT_FEATURE_SET是否使用默认功能集ON WITH_EDITLINE使用哪个 libedit/editline 库bundled WITH_EMBEDDED_SERVER是否搭建嵌入式服务器OFF WITH_EMBEDDED_SHARED_LIBRARY是否构建共享嵌入式服务器库OFF WITH_ERROR_INSERT在 NDB 存储引擎中启用错误注入。不应用于构建用于生产的二进制文件。OFF WITH_EXTRA_CHARSETS要包括哪些额外的字符集all WITH_GMOCKgooglemock 分发路径 WITH_INNODB_EXTRA_DEBUG是否包括对 InnoDB 的额外调试支持。OFF WITH_INNODB_MEMCACHED是否生成 memcached 共享库。OFF WITH_KEYRING_TEST构建密钥环测试程序OFF5.7.11 WITH_LDAP限内部使用 5.7.29 WITH_LIBEVENT使用哪个 libevent 库bundled WITH_LIBWRAP是否包含 libwrap(TCP 包装器)支持OFF WITH_LZ4LZ4 库支持的类型bundled5.7.14 WITH_MECAB编译 MeCab WITH_MSAN启用 MemorySanitizerOFF WITH_MSCRT_DEBUG启用 Visual Studio CRT 内存泄漏跟踪OFF WITH_NDBAPI_EXAMPLES构建 API 示例程序OFF WITH_NDBCLUSTER构建 NDB 存储引擎ON WITH_NDBCLUSTER_STORAGE_ENGINE供内部使用;可能无法在所有情况下都按预期工作;用户应该改用 WITH_NDBCLUSTERON WITH_NDBMTD构建多线程数据节点。ON WITH_NDB_BINLOGmysqld 默认启用二进制日志。ON WITH_NDB_DEBUG生成用于测试或故障排除的调试版本。OFF WITH_NDB_JAVA启用 Java 和 ClusterJ 支持的构建。默认启用。仅在 MySQL 集群中支持。ON WITH_NDB_PORT使用此选项构建的管理服务器使用的默认端口。如果未使用此选项构建它,则管理服务器的默认端口为 1186。[none] WITH_NDB_TEST包括 NDB API 测试程序。OFF WITH_NUMA设置 NUMA 内存分配策略 5.7.17 WITH_PROTOBUF使用哪个 Protocol Buffers 包bundled5.7.12 WITH_RAPID是否构建快速开发周期插件ON5.7.12 WITH_SASL限内部使用 5.7.29 WITH_SSLSSL 支持的类型system WITH_SYSTEMD启用 systemd 支持文件的安装OFF WITH_TEST_TRACE_PLUGIN构建测试协议跟踪插件OFF WITH_UBSAN启用未定义的行为清理程序OFF WITH_UNIT_TESTS使用单元测试编译 MySQLON WITH_UNIXODBC启用 unixODBC 支持OFF WITH_VALGRIND是否在 Valgrind 头文件中编译OFF WITH_ZLIBzlib 支持的类型bundled WITH_xxx_STORAGE_ENGINE将存储引擎 xxx 静态编译到服务器中 开始编译定义变量,根据自己环境情况修改export boost_home=/usr/local/src/mysql-5.7.29/boost export mysql_home=/opt/mysqlmake -j $(nproc) 表示使用所有的 cpu 线程进行编译,如果机器有业务使用,不建议使用 $(nproc)cmake \ -DCMAKE_INSTALL_PREFIX=${mysql_home} \ -DSYSTEMD_PID_DIR=${mysql_home} \ -DMYSQL_UNIX_ADDR=${mysql_home}/mysql.sock \ -DMYSQL_DATADIR=${mysql_home}/data \ -DSYSCONFDIR=/etc \ -DDEFAULT_CHARSET=utf8 \ -DDEFAULT_COLLATION=utf8_general_ci \ -DWITH_INNOBASE_STORAGE_ENGINE=1 \ -DWITH_ARCHIVE_STORAGE_ENGINE=1 \ -DWITH_BLACKHOLE_STORAGE_ENGINE=1 \ -DWITH_PERFSCHEMA_STORAGE_ENGINE=1 \ -DWITH_BOOST=${boost_home}/boost_1_59_0 \ -DWITH_SYSTEMD=1 && \ make -j $(nproc) && \ make install报错1CMake Error at rapid/plugin/group_replication/rpcgen.cmake:100 (MESSAGE): Could not find rpcgenCall Stack (most recent call first): rapid/plugin/group_replication/CMakeLists.txt:36 (INCLUDE)-- Configuring incomplete, errors occurred!See also "/usr/local/src/mysql-5.7.29/CMakeFiles/CMakeOutput.log".See also "/usr/local/src/mysql-5.7.29/CMakeFiles/CMakeError.log".wget https://github.com/thkukuk/rpcsvc-proto/releases/download/v1.4/rpcsvc-proto-1.4.tar.gz tar xvf rpcsvc-proto-1.4.tar.gz cd rpcsvc-proto-1.4 ./configure && make && make install 配置 mysql等会配置文件会定义使用的用户为 mysql如果定义的是其他用户,就赋权其他的用户,以自己实际为准chown -R mysql.mysql ${mysql_home}修改配置文件vim /etc/my.cnf[mysqld] user=mysql basedir=/opt/mysql datadir=/opt/mysql/data socket=/opt/mysql/mysql.sock port=3306 character_set_server=utf8 # lower_case_table_names 让MYSQL不区分表名大小写 lower_case_table_names=1 # Disabling symbolic-links is recommended to prevent assorted security risks symbolic-links=0 # Settings user and group are ignored when systemd is used. 
# If you need to run mysqld under a different user or group, # customize your systemd unit file for mysql according to the # instructions in http://fedoraproject.org/wiki/Systemd [mysqld_safe] log-error=/var/log/mysql/mysql.log pid-file=/var/run/mysql/mysql.pid # include all files from the config directory !includedir /etc/my.cnf.d数据库初始化更新环境变量echo "export PATH=\$PATH:${mysql_home}/bin:${mysql_home}/lib" >> /etc/profile source /etc/profilemysqld --initialize-insecure \ --user=mysql \ --basedir=${mysql_home} \ --datadir=${mysql_home}/dataroot@localhost is created with an empty password ! Please consider switching off the --initialize-insecure option.初始化完成后,会出现上面的输出,创建的 root 用户仅本机可以访问,并且没有配置密码--initialize-insecure - 创建一个密码为空的超级用户--initialize - 创建一个随机密码的超级用户,并将其存储到日志中一般都使用空密码启动 mysqlcp ${mysql_home}/usr/lib/systemd/system/mysqld.service /usr/lib/systemd/system systemctl daemon-reload systemctl enable mysqld.service systemctl start mysqld.service
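Since --initialize-insecure leaves root with an empty password, a sensible follow-up is to set one as soon as the service is up. A minimal sketch, assuming the socket path /opt/mysql/mysql.sock from the my.cnf above; the password value is a placeholder.

```bash
# set a real root password over the local socket (placeholder value, change it)
mysql -uroot -S /opt/mysql/mysql.sock \
      -e "ALTER USER 'root'@'localhost' IDENTIFIED BY 'ChangeMe_123';"

# verify the build and the new credentials
mysql -uroot -p'ChangeMe_123' -S /opt/mysql/mysql.sock -e "SELECT VERSION();"
```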
先准备一个编译环境压压惊首先要有一个 docker部署 dockerwget https://download.docker.com/linux/static/stable/aarch64/docker-19.03.11.tgz tar xvf docker-19.03.11.tgz mv docker/* /usr/bin/docker.servicecat <<EOF> /usr/lib/systemd/system/docker.service [Unit] Description=Docker Application Container Engine Documentation=https://docs.docker.com BindsTo=containerd.service After=network-online.target firewalld.service containerd.service Wants=network-online.target Requires=docker.socket [Service] Type=notify # the default is not to use systemd for cgroups because the delegate issues still # exists and systemd currently does not support the cgroup feature set required # for containers run by docker ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock ExecReload=/bin/kill -s HUP \$MAINPID TimeoutSec=0 RestartSec=2 Restart=always # Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229. # Both the old, and new location are accepted by systemd 229 and up, so using the old location # to make them work for either version of systemd. StartLimitBurst=3 # Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230. # Both the old, and new name are accepted by systemd 230 and up, so using the old name to make # this option work for either version of systemd. StartLimitInterval=60s # Having non-zero Limit*s causes performance problems due to accounting overhead # in the kernel. We recommend using cgroups to do container-local accounting. LimitNOFILE=infinity LimitNPROC=infinity LimitCORE=infinity # Comment TasksMax if your systemd version does not support it. # Only systemd 226 and above support this option. TasksMax=infinity # set delegate yes so that systemd does not reset the cgroups of docker containers Delegate=yes # kill only the docker process, not all processes in the cgroup KillMode=process [Install] WantedBy=multi-user.target EOFdocker.socketcat <<EOF> /usr/lib/systemd/system/docker.socket [Unit] Description=Docker Socket for the API PartOf=docker.service [Socket] ListenStream=/var/run/docker.sock SocketMode=0660 SocketUser=root SocketGroup=docker [Install] WantedBy=sockets.targe EOFcontainerd.servicecat <<EOF> /usr/lib/systemd/system/containerd.service # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # http://www.apache.org/licenses/LICENSE-2.0 # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. [Unit] Description=containerd container runtime Documentation=https://containerd.io After=network.target [Service] ExecStartPre=-/sbin/modprobe overlay ExecStart=/usr/bin/containerd KillMode=process Delegate=yes LimitNOFILE=1048576 # Having non-zero Limit*s causes performance problems due to accounting overhead # in the kernel. We recommend using cgroups to do container-local accounting. 
LimitNPROC=infinity LimitCORE=infinity TasksMax=infinity [Install] WantedBy=multi-user.target EOFdaemon.jsongroupadd docker mkdir /etc/docker/ cat > /etc/docker/daemon.json <<EOF "live-restore": true, "exec-opts": ["native.cgroupdriver=systemd"], "log-driver": "json-file", "registry-mirrors": ["https://v16stybc.mirror.aliyuncs.com"], "log-opts": { "max-size": "100m" "storage-driver": "overlay2", "storage-opts": [ "overlay2.override_kernel_check=true" EOF准备镜像FROM centos:7 ARG work_dir=/usr/local ENV PS1='\[\e[7;34m\]\u@\h\[\e[0m\]\[\e[0;35m\]:$(pwd) \[\e[0m\]\[\e[0;35m\]\t\[\e[0m\]\n\[\e[0;32m\]> \[\e[0m\]' ENV LANG=en_US.UTF-8 ENV TZ="Asia/Shanghai" ENV GOPATH=/usr/local/src ENV GOPROXY=https://goproxy.cn ENV GO_HOME=${work_dir}/go ENV PATH=${PATH}:${GO_HOME}/bin WORKDIR ${work_dir} ADD ./go1.17.9.linux-arm64.tar.gz ./ RUN curl -o /etc/yum.repos.d/ http://mirrors.aliyun.com/repo/Centos-altarch-7.repo && \ yum install -y wget vim unzip bash-completion git gcc gcc-c++ make cmake protobuf-* automake libtool zlib-devel openssl-devel && \ yum clean all && \ echo '/usr/bin/sleep 315360000' > start.sh && \ chmod +x start.sh CMD ["/usr/bin/bash","start.sh"]以 host 网络模式来 build docker 镜像nat 模式局限性比较大,很多网络会超时docker build --network host -t go_make:arm64_1.17.9 .把容器放后台启动,方便进进出出docker run -d --network host --name make_some_thing go_make:arm64_1.17.9进入容器docker exec -it make_some_thing bash修改 go 模块的下载地址为国内go env -w GOPROXY=https://goproxy.cn编译 redisredis 官网下载地址wget https://download.redis.io/releases/redis-4.0.14.tar.gz tar xvf redis-4.0.14.tar.gz cd redis-4.0.14/ make install查看 redis 版本,验证编译是否成功./src/redis-cli --version其他细节`GLIBC_2.28' not found编译好的 redis-server 拿到其他环境,可能会出现的问题redis-server: /lib64/libc.so.6: version `GLIBC_2.28' not foundGLIBC 2.8 对于 make 的版本有要求,不能低于 4.x ,我们这边用 4.2 版本的不升级 make ,编译的时候会报错 These critical programs are missing or too old: make bison compiler编译 makewget https://ftp.gnu.org/gnu/make/make-4.2.tar.gz --no-check-certificate tar xvf make-4.2.tar.gz cd make-4.2/ ./configure && make -j $(nproc) && make install mv /usr/bin/make{,-$(make --version | head -1 | awk '{print $NF}')} mv make /usr/bin/查看 make 版本,验证编译是否成功make --version编译 glibc 2.8升级 gccThese critical programs are missing or too old: bison compiler编译的时候会有上面的报错表示 gcc 版本太低了,最少要 gcc-7yum install -y centos-release-scl bison yum install -y devtoolset-7-gcc devtoolset-7-gcc-c++ devtoolset-7-binutils # 使环境生效,通过 gcc -v 可以查看版本已经变成 7.x了 scl enable devtoolset-7 bash # 永久生效 echo "source /opt/rh/devtoolset-7/enable" >>/etc/profile source /etc/profile编译 glibc 2.8切记,glibc 2.8 不要在 /usr/local 目录下编译,编译的时候会有下面这样的提示,要使用 --prefix 指定路径*** On GNU/Linux systems the GNU C Library should not be installed into *** /usr/local since this might make your system totally unusable. *** We strongly advise to use a different prefix. For details read the FAQ. *** If you really mean to do this, run configure again using the extra *** parameter `--disable-sanity-checks'.wget https://ftp.gnu.org/gnu/glibc/glibc-2.28.tar.xz --no-check-certificate tar xvf glibc-2.28.tar.xz cd glibc-2.28 # 为什么要建一个新目录? 
## 因为直接执行'./configure',会报错'configure: error: you must configure in a separate build directory' mkdir build cd build/ mkdir /lib64/glibc-2.28/etc # 可能会遇到这个报错:Warning: ignoring configuration file that cannot be opened: /lib64/glibc-2.28/etc/ld.so.conf: No such file or directory ## find 查找一下文件,然后做个软连接就可以了 ln -s $(find / -name "ld.so.conf") /lib64/glibc-2.28/etc/ld.so.conf ../configure --prefix=/lib64/glibc-2.28 && make -j $(nproc) && make install编译 filebeatbeats githubwget https://github.com/elastic/beats/archive/refs/tags/v7.7.0.tar.gz tar xvf v7.7.0.tar.gz cd beats-7.7.0/filebeat make执行的时候会有一段时间终端没有任务输出,耐心等待就可以了,输出类似如下内容,表示编译完成fatal: Not a git repository (or any of the parent directories): .git go build -ldflags "-X github.com/elastic/beats/libbeat/version.buildTime=2022-04-29T08:21:45Z -X github.com/elastic/beats/libbeat/version.commit=" go: downloading github.com/imdario/mergo v0.3.6 go: downloading github.com/urso/go-bin v0.0.0-20180220135811-781c575c9f0e go: downloading github.com/google/gofuzz v1.0.0 go: downloading github.com/davecgh/go-spew v1.1.1 go: downloading github.com/google/go-cmp v0.4.0 go: downloading github.com/json-iterator/go v1.1.7 go: downloading github.com/modern-go/reflect2 v1.0.1 go: downloading google.golang.org/grpc v1.27.1 go: downloading github.com/containerd/containerd v1.3.3 go: downloading github.com/sirupsen/logrus v1.4.2 go: downloading github.com/docker/distribution v2.7.1+incompatible go: downloading github.com/eapache/queue v1.1.0 go: downloading github.com/pierrec/lz4 v2.2.6+incompatible go: downloading github.com/eapache/go-xerial-snappy v0.0.0-20180814174437-776d5712da21 go: downloading github.com/eapache/go-resiliency v1.1.0 go: downloading github.com/hashicorp/go-uuid v1.0.1 go: downloading gopkg.in/jcmturner/dnsutils.v1 v1.0.1 go: downloading gopkg.in/jcmturner/aescts.v1 v1.0.1 go: downloading gopkg.in/jcmturner/rpc.v1 v1.1.0 go: downloading github.com/golang/snappy v0.0.1 go: downloading github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd go: downloading google.golang.org/genproto v0.0.0-20191230161307-f3c370f40bfb查看 filebeat 版本,验证编译是否成功./filebeat version从容器内打包的时候,需要注意一个细节,在宿主机上执行 file filebeat 命令会看到有 lib 文件依赖,在其他环境使用的时候,也要注意把 lib 文件复制过去通过 file 命令可以看到,依赖的 lib 文件的路径和名称为 /lib/ld-linux-aarch64.so.1filebeat-7.7.0-linux-arm64/filebeat: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, for GNU/Linux 3.7.0, Go BuildID=2ZFlNlPqOIS1j0hZOMLt/zmkml0JZODmFUTnWldZe/wMGZe6JzkJmCcs5EeF-6/ID26ze0hYOJZMk5gMX19, BuildID[sha1]=438f66d9c4384ccde3a4a395034160565df6280e, not stripped安装 airflowyum install -y python3 python3-devel python3-pip pip3 install --upgrade pip pip3 install apache-airflow==2.2.3升级 sqlite3airflow 2.2.3 环境要求sqlite 下载地址官方要求 airflow 2.0+ 版本的环境, python 的 sqlite3 版本不能低于 3.15.0arm 环境下载的 python 3.6 使用的 sqlite3 版本是 3.7.17 的,运行 airflow --version 会报错 error: sqlite C library version too old (< 3.15.0)Python 3.6.8 (default, Nov 16 2020, 16:33:14) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux Type "help", "copyright", "credits" or "license" for more information. 
>>> import sqlite3 >>> sqlite3.sqlite_version '3.7.17'wget --no-check-certificate https://sqlite.org/2022/sqlite-autoconf-3380400.tar.gz tar xvf sqlite-autoconf-3380400.tar.gz cd sqlite-autoconf-3380400/ export CFLAGS="-DSQLITE_ENABLE_FTS3 \ -DSQLITE_ENABLE_FTS3_PARENTHESIS \ -DSQLITE_ENABLE_FTS4 \ -DSQLITE_ENABLE_FTS5 \ -DSQLITE_ENABLE_JSON1 \ -DSQLITE_ENABLE_LOAD_EXTENSION \ -DSQLITE_ENABLE_RTREE \ -DSQLITE_ENABLE_STAT4 \ -DSQLITE_ENABLE_UPDATE_DELETE_LIMIT \ -DSQLITE_SOUNDEX \ -DSQLITE_TEMP_STORE=3 \ -DSQLITE_USE_URI \ -O2 \ -fPIC" export PREFIX="/usr/local/sqlite3" LIBS="-lm" ./configure --disable-tcl --enable-shared --enable-tempstore=always --prefix="$PREFIX" make -j $(nproc) make install这里的 /usr/local 要和上面编译的时候 --prefix 路径一致export LD_LIBRARY_PATH=/usr/local/sqlite3/lib:$LD_LIBRARY_PATH查看 airflow 版本,验证编译是否成功airflow version编译 hue安装相关依赖编译 hue 需要安装 python-devel 否则 make 的时候会报错:/usr/local/hue-release-4.7.1/Makefile.vars:65: *** "Error: must have python development packages for python2.7. Could not find Python.h. Please install python2.7-devel". Stop不安装 libffi-devel ,pip 安装 cffi 的时候会报错:fatal error: ffi.h: No such file or directory不安装 MySQL-python,pip 安装 mysql-python 会报错:EnvironmentError: mysql_config not found不安装 mysql-devel,pip 安装 mysql-python 会报错:EnvironmentError: mysql_config not found不安装 sqlite-devel ,编译的时候会报错:fatal error: sqlite3.h: No such file or directory不安装 cyrus-sasl-devel,编译的时候会报错:fatal error: sasl/sasl.h: No such file or directory不安装 openldap-devel,编译的时候会报错:fatal error: lber.h: No such file or directory不安装 libxslt-devel,编译的时候会报错:fatal error: libxml/xmlversion.h: No such file or directoryyum install -y python-devel \ libffi-devel \ MySQL-python \ mysql-devel \ sqlite-devel \ cyrus-sasl-devel \ openldap-devel \ libxslt-devel安装 nodejs如何确定 nodejs 的版本下载好 hue 的安装包,解压后进入 hue 的路径下,执行下面的命令grep setup tools/container/base/hue/Dockerfilecurl -sL https://rpm.nodesource.com/setup_10.x 这里就可以看到,用到的是 10.x 的版本wget https://nodejs.org/dist/v10.19.0/node-v10.19.0-linux-arm64.tar.gz tar xvf node-v10.19.0-linux-arm64.tar.gz echo 'export NODE_HOME=/usr/local/node-v10.19.0-linux-arm64' >> /etc/profile echo 'export PATH=$PATH:$NODE_HOME/bin' >> /etc/profile source /etc/profile # 验证环境变量是否生效 node -v npm -v开始编译wget https://github.com/cloudera/hue/archive/refs/tags/release-4.7.1.tar.gz tar xvf release-4.7.1.tar.gz cd hue-release-4.7.1/ build/env/bin/python2.7 -m pip install cffi \ mysql-python \ traitlets \ backports.shutil-get-terminal-size \ pathlib2 \ pexpect \ pickleshare \ simplegeneric==0.8.1 \ prompt-toolkit==1.0.4 make apps通过 cat tools/container/base/hue/Dockerfile 可以看到 hue 的编译需要解决的依赖,但是我为了让镜像更小,能不装的就不装,所以在一步一步的试错报错1distutils.errors.DistutilsError: Command '['/usr/local/hue-release-4.7.1/build/env/bin/python2.7', '-m', 'pip', '--disable-pip-version-check', 'wheel', '--no-deps', '-w', '/usr/local/hue-release-4.7.1/desktop/core/ext-py/cryptography-2.9/temp/tmpojt9TZ', '--quiet', 'cffi!=1.11.3,>=1.8']' returned non-zero exit status 1build/env/bin/python2.7 -m pip install cffi报错2sh: mysql_config: command not foundbuild/env/bin/python2.7 -m pip install mysql-python报错3` File "/tmp/easy_install-_GDIGO/traitlets-5.1.1/setup.py", line 41print(error, file=sys.stderr)`build/env/bin/python -m pip install traitlets报错4` ERROR: pip's legacy dependency resolver does not consider dependency conflicts when selecting packages. 
This behaviour is the source of the following dependency conflicts.
ipython 5.2.0 requires backports.shutil-get-terminal-size, which is not installed.
ipython 5.2.0 requires pathlib2, which is not installed.
ipython 5.2.0 requires pexpect, which is not installed.
ipython 5.2.0 requires pickleshare, which is not installed.
ipython 5.2.0 requires prompt-toolkit<2.0.0,>=1.0.4, which is not installed.
ipython 5.2.0 requires simplegeneric>0.8, which is not installed.`
build/env/bin/python -m pip install backports.shutil-get-terminal-size \
    pathlib2 \
    pexpect \
    pickleshare \
    prompt-toolkit==1.0.4 \
    simplegeneric==0.8.1
Building the alertmanager image
Fetch the binaries
alertmanager github
wget https://github.com/prometheus/alertmanager/releases/download/v0.14.0/alertmanager-0.14.0.linux-arm64.tar.gz
tar xvf alertmanager-0.14.0.linux-arm64.tar.gz
mv alertmanager-0.14.0.linux-arm64/a* .
vim config.yml
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://127.0.0.1:5001/'
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'dev', 'instance']
Use busybox as the base to build the alertmanager image (because busybox is tiny)
FROM busybox:latest
COPY amtool /bin/amtool
COPY alertmanager /bin/alertmanager
COPY config.yml /etc/alertmanager/config.yml
RUN mkdir -p /alertmanager && \
    chown -R nobody:nobody /etc/alertmanager /alertmanager
USER nobody
EXPOSE 9093
VOLUME [ "/alertmanager" ]
WORKDIR /alertmanager
ENTRYPOINT [ "/bin/alertmanager" ]
CMD [ "--config.file=/etc/alertmanager/config.yml", \
      "--storage.path=/alertmanager" ]
docker build -t prom/alertmanager:v0.14.0 .
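To check that the image actually works, the sketch below starts it and probes the web port; the container name, the host volume path and the use of a published port instead of host networking are arbitrary choices for the example.

```bash
# run the freshly built image and keep its data on the host
docker run -d --name alertmanager \
  -p 9093:9093 \
  -v /data/alertmanager:/alertmanager \
  prom/alertmanager:v0.14.0

# a 200 from the web UI means the process came up with the baked-in config
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:9093/
```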
检查 openssh 版本如果需要使用 ChrootDirectory 参数配置用户访问的默认路径,openssh 的版本不能低于 4.8ssh -VOpenSSH_7.4p1, OpenSSL 1.0.2k-fips 26 Jan 2017创建 sftp 用户Linux 系统内已存在的用户,都是可以直接使用 sftp 服务,创建用户是为了一些特殊的场景使用,做一些区分将 for-sftp 用户的登录 shell 指定为 /sbin/nologin ,不让 for-sftp 用户可以通过 ssh 命令登录到服务器useradd for-sftp -s /sbin/nologin为 for-sftp 用户设置密码echo for-sftp | passwd --stdin for-sftp配置 sftp多备份,少跑路cp /etc/ssh/sshd_config{,.bak}可以直接过滤 sftp 关键字# 注释掉下面的这行内容,因为配置的 for-sftp 用户没有登录 shell 的权限 ## 不注释的话,登录的时候会返回报错:Received message too long 1416128883 # Subsystem sftp /usr/libexec/openssh/sftp-server # 修改成下面的方式来打开 sftp 服务 Subsystem sftp internal-sftp # Match User 后面的配置内容,只对 Match User 指定的用户生效,多个用户以逗号分隔 ## 也可以配置成 Match Group,对指定的组生效,同样,多个组以逗号分隔 Match User for-sftp # 指定 sftp 登录的默认路径 ## 目录必须存在,否则 sftp 连接会报错 ChrootDirectory /data/sftp # 指定 sftp 命令 ForceCommand internal-sftp配置了 ChrootDirectory 时,sftp 对于 ChrootDirectory 目录的权限要求比较死不允许 ChrootDirectory 的属组有写入权限,也就是最高只支持 750 和 755 两种权限ChrootDirectory 的属主必须是 root 用户不满足以上两种条件时,用 for-sftp 用户登录 sftp 会报错:packet_write_wait: Connection to <ip地址> port 22: Broken pipeCouldn't read packet: Connection reset by peer重启 sshd 服务重启 sshd 服务才能使 sftp 配置生效systemctl restart sshd如果出现了以下报错Directive 'Protocol' is not allowed within a Match block那说明 /etc/ssh/sshd_config 配置文件内开了 Protocol 这个配置,需要把 sftp 相关的配置,移到 Protocol 后面几行就行,可以直接放到配置文件最后的地方,然后重启 sshd 服务就可以解决创建 ChrootDirectory-m 750 - 指定目录创建时的权限-m 只会指定创建时的最后一级目录,不影响前面的父级目录-p - 父级目录不存在时,创建父级目录要给 ChrootDirectory 配置一个 for-sftp 组的权限,否则 for-sftp 用户也没权限进入到自己有权限的路径下搞事情mkdir /data/sftp -m 750 -p chown root.for-sftp /data/sftpsftp 配置了 ChrootDirectory 后,就只能查看文件,没办法上传文件,所以要在 ChrootDirectory 目录下在创建一个 for-sftp 用户有权限的路径,就可以上传文件了mkdir /data/sftp/for-sftp chown for-sftp.for-sftp /data/sftp/for-sftp登录 sftp 搞事情图省事,我就直接使用 localhost 了sftp for-sftp@localhostfor-sftp@localhost's password: Connected to localhost. 
sftp> ls for-sftp sftp> cd for-sftp/ sftp> ls sftp> put anaconda-ks.cfg Uploading anaconda-ks.cfg to /for-sftp/anaconda-ks.cfg anaconda-ks.cfg 100% 1526 2.8MB/s 00:00 sftp> ls anaconda-ks.cfg然后我们就把本地的 anaconda-ks.cfg 文件上传到 sftp 服务器上了,当我们查看 ChrootDirectory 目录的 for-sftp 目录下就会有 anaconda-ks.cfg 这个文件了sftp 常用命令登录到 sftp 服务器之后,输入 help 就可以查看所有的命令了Available commands: # 退出 sftp bye Quit sftp # 进入到 sftp 内指定的路径 cd path Change remote directory to 'path' # 修改 sftp 内指定路径的属租 chgrp grp path Change group of file 'path' to 'grp' # 修改 sftp 内指定路径的权限 [ ugo 权限] chmod mode path Change permissions of file 'path' to 'mode' # 修改 sftp 内指定路径的所有者 chown own path Change owner of file 'path' to 'own' # 查看 sftp 内指定路径的统计信息 df [-hi] [path] Display statistics for current directory or filesystem containing 'path' # 退出 sftp exit Quit sftp # 从 sftp 下载文件到本地 get [-afPpRr] remote [local] Download file # 恢复下载 reget [-fPpRr] remote [local] Resume download file # 恢复上传 reput [-fPpRr] [local] remote Resume upload file # 查看帮助 help Display this help text # 修改本地所在路径 lcd path Change local directory to 'path' # 查看本地路径下的文件详情 lls [ls-options [path]] Display local directory listing # 本地创建路径 lmkdir path Create local directory # 生成连接文件 ln [-s] oldpath newpath Link remote file (-s for symlink) # 显示本地所在路径 lpwd Print local working directory # 同 linux 查看指定路径下有哪些文件 ls [-1afhlnrSt] [path] Display remote directory listing # 设置本地 umask lumask umask Set local umask to 'umask' # sftp 内创建目录 mkdir path Create remote directory # 切换进度表的显示 progress Toggle display of progress meter # 本地文件上传到 sftp 内 put [-afPpRr] local [remote] Upload file # 显示 sftp 内当前所在路径 pwd Display remote working directory # 退出 sftp quit Quit sftp # sftp 内文件重命名 rename oldpath newpath Rename remote file # sftp 内删除文件 rm path Delete remote file # sftp 内删除目录 rmdir path Remove remote directory # 生成连接文件 symlink oldpath newpath Symlink remote file # 查看 sftp 版本 version Show SFTP version # 在本地执行命令 !command Execute 'command' in local shell # 逃到本地 [ 其实就是退出 sftp ] ! Escape to local shell # 显示帮助 ? Synonym for help
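For unattended transfers against this setup, sftp also offers a batch mode. A minimal sketch, assuming the for-sftp account and the writable for-sftp sub-directory created above; batch mode cannot prompt for a password, so key-based authentication (or a wrapper such as sshpass) is assumed here.

```bash
# commands to run on the server, one per line
cat > /tmp/sftp_batch.txt <<'EOF'
cd for-sftp
put /etc/hostname uploaded-by-batch.txt
ls -l
EOF

# -b aborts on the first failing command and returns a non-zero exit code
sftp -b /tmp/sftp_batch.txt for-sftp@localhost
```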
使用 shell 部署二进制 k8s 集群的好处在于时间消耗比较小传统手动部署,在熟练的情况下,也会消耗半天左右的时间,并且操作过程中,也容易出现误操作的情况,非常耗费时间,使用脚本部署,在网络和磁盘性能好的情况下,只需要几分钟即可完成部署,只需要前期配置好配置文件,接杯水的功夫就完成了脚本下载路径压缩包 292MB,解压完,大约 0.98G关于脚本请使用root用户执行此脚本脚本执行前,请先关闭 firewall 以及 selinux(脚本内不做处理)参考命令:关闭防火墙:systemctl disable firewalld --now关闭 selinux(重启后生效):sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config临时关闭 selinux:setenforce 0脚本执行前,请提前做好主机之间的免密操作以及时间同步免密脚本可以参考 bin/ssh_key.sh [脚本依赖 expect 命令,需要提前安装]使用方法:在 bin/ssh_list.txt 文件内填写所有主机的信息,主机信息格式: <用户名> <ip> <用户密码>执行免密脚本 bash bin/ssh_key.sh集群部署:脚本执行前,请修改 conf/install_conf.sh 文件,填写需要部署的节点 ip、数据存储路径(注意检查磁盘是否有足够的空间)、服务端口是否被占用(若被占用,可修改配置文件)、相关的服务ip(cluster_cidr、service_cidr、cluster_svc_ip、cluster_dns_ip)是否和宿主机网段有冲突执行 bash bin/00.install.sh 开始部署 kubernetes 集群节点扩容:修改 conf/install_conf.sh 文件内的 work_nodes 变量值,将 ip 修改为需要扩容的节点 ip执行 bash bin/03.deploy_node.sh 即可脚本执行完成后,kubectl命令会找不到,此时执行source /etc/profile命令即可(因为当前终端没有重新加载PATH变量,所以只需要通过source重新加载变量即可)此脚本基于 kubernetes v1.19.7 编写,如若需要安装高版本或者低版本,需要注意 service 文件内的启动参数是否需要修改,因为版本迭代,会导致一些参数不再被使用,或者被其他参数替代,如果不修改,会影响服务启动,导致 kubernetes 集群部署失败关于二进制文件二进制文件都存放在 packages 目录下,带有目录的,不要变动目录的名称或删除目录,会影响服务的部署关于镜像镜像都存放在 images 目录下镜像的 tag ,可以在 var_list.sh 文件内修改镜像文件的名称,可以在 var_list.sh 文件内修改关于 var_list.sh 文件有很多定义的变量存放在 var_list.sh 文件里面,比如 ssh 的端口等等,有特殊需求的时候,可以修改使用目录结构. ├── bin # 脚本存放路径 │ ├── 00.install.sh # 总安装脚本 │ ├── 01.deploy_system.sh # 环境初始化脚本 │ ├── 02.deploy_master.sh # master 节点部署脚本 │ ├── 03.deploy_node.sh # node 节点部署脚本 │ ├── deploy_cert.sh # 生成证书脚本 [ 会用到 cfssl 和 kubectl 命令 ] │ ├── print_log.sh # 终端输出内容模板 │ ├── ssh_key.sh # ssh 免密脚本 │ ├── ssh_list.txt # ssh 免密脚本调用的主机清单 │ └── var_list.sh # 一些变量的维护,需要自定义的情况下可以修改这个文件 ├── conf # 配置文件存放路径 │ ├── install_conf.sh # 安装使用的配置文件 │ └── template # 存放的模板文件 │ ├── cert # k8s 证书模板 │ │ ├── admin-csr.json.template │ │ ├── ca-config.json.template │ │ ├── ca-csr.json.template │ │ ├── etcd-csr.json.template │ │ ├── kube-controller-manager-csr.json.template │ │ ├── kubelet-csr.json.template │ │ ├── kube-proxy-csr.json.template │ │ ├── kubernetes-csr.json.template │ │ ├── kube-scheduler-csr.json.template │ │ └── metrics-server-csr.json.template │ ├── service # systemctl service 文件模板 │ │ ├── 10-flannel.conflist.template │ │ ├── cni-default.conf.template │ │ ├── config.toml.template │ │ ├── containerd.service.template │ │ ├── crictl.yaml.template │ │ ├── daemon.json.template │ │ ├── docker.service.template │ │ ├── kube-apiserver.service.template │ │ ├── kube-controller-manager.service.template │ │ ├── kube-etcd.service.template │ │ ├── kubelet.service.template │ │ ├── kube-nginx.conf.template │ │ ├── kube-nginx.service.template │ │ ├── kube-proxy.service.template │ │ └── kube-scheduler.service.template │ ├── system # 系统服务使用的一些模板 │ │ ├── history.sh.template │ │ ├── kubernetes_journald.conf.template │ │ ├── kubernetes_limits.conf.template │ │ ├── kubernetes_sysctl.conf.template │ │ └── rc.local.template │ └── yaml # yaml 文件模板 │ ├── coredns.yaml.template │ ├── flannel.yaml.template │ ├── kubelet-config.yaml.template │ └── kube-proxy-config.yaml.template ├── images # 镜像存放路径 │ ├── coredns-v1.7.0.tar │ ├── flannel-v0.15.1.tar │ └── pause-v3.2.tar ├── packages # 二进制文件存放路径 │ ├── cfssl │ │ ├── cfssl │ │ └── cfssljson │ ├── cni │ │ ├── bridge │ │ ├── flannel │ │ ├── host-local │ │ ├── loopback │ │ └── portmap │ ├── conntrack │ ├── containerd │ │ └── bin │ │ ├── containerd │ │ ├── containerd-shim │ │ ├── containerd-shim-runc-v1 │ │ ├── containerd-shim-runc-v2 │ │ ├── crictl │ │ ├── ctr │ │ └── runc │ ├── docker │ │ ├── containerd │ │ ├── containerd-shim 
│ │ ├── ctr │ │ ├── docker │ │ ├── dockerd │ │ ├── docker-init │ │ ├── docker-proxy │ │ └── runc │ ├── etcd │ │ ├── etcd │ │ └── etcdctl │ ├── kubernetes │ │ ├── kubeadm │ │ ├── kube-apiserver │ │ ├── kube-controller-manager │ │ ├── kubectl │ │ ├── kubelet │ │ ├── kube-proxy │ │ └── kube-scheduler │ └── nginx │ └── nginx └── README.md
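For reference, a conf/install_conf.sh filled in before running bin/00.install.sh might look roughly like the sketch below. Only the variable names work_nodes, cluster_cidr, service_cidr, cluster_svc_ip and cluster_dns_ip are taken from the README above; the master-node and data-directory variable names, the IPs and the paths are illustrative guesses, so check the shipped file for the real names.

```bash
# illustrative values only -- adapt to your own hosts and address plan
master_nodes="192.168.1.11 192.168.1.12 192.168.1.13"   # hypothetical variable name
work_nodes="192.168.1.21 192.168.1.22"                  # nodes to deploy / scale out

data_dir="/data/kubernetes"     # hypothetical name; make sure the disk behind it is large enough

cluster_cidr="172.20.0.0/16"    # pod network, must not overlap the host network
service_cidr="10.96.0.0/16"     # service network
cluster_svc_ip="10.96.0.1"      # kubernetes service IP inside service_cidr
cluster_dns_ip="10.96.0.2"      # coredns IP inside service_cidr
```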
Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper Target: x86_64-redhat-linux Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux Thread model: posix gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)[root@localhost ~]# cat /etc/redhat-release CentOS Linux release 7.6.1810 (Core) [root@localhost ~]# uname -r 3.10.0-957.el7.x86_642、下载gcc-5.4.0源码包[root@localhost ~]# wget http://mirrors.nju.edu.cn/gnu/gcc/gcc-5.4.0/gcc-5.4.0.tar.gz [root@localhost ~]# tar xf gcc-5.4.0.tar.gz [root@localhost ~]# cd gcc-5.4.0/ [root@localhost gcc-5.4.0]# ./contrib/download_prerequisites "./contrib/download_prerequisites会下载mpfr-2.4.2.tar.bz2 gmp-4.3.2.tar.bz2 mpc-0.8.1.tar.gz isl-0.14.tar.bz2这四个文件" "下方的tar包里面已经创建好了gcc-build-5.4.0,执行./contrib/download_prerequisites所下载的依赖包也都下载好了,解压后,可以直接开始编译,包的大小为132.97MB" "链接:https://pan.baidu.com/s/1bEQNC20SLQ3-psEirAzfVg" "提取码:bfwg"3、编译安装gcc[root@localhost gcc-5.4.0]# mkdir gcc-build-5.4.0 [root@localhost gcc-5.4.0]# cd gcc-build-5.4.0/ [root@localhost gcc-build-5.4.0]# ../configure --enable-checking=release \ --enable-languages=c,c++ \ --with-arch_32=x86-64 \ --build=x86_64-redhat-linux \ --disable-multilib [root@localhost gcc-build-5.4.0]# make && make install "make && make install执行了大约1小时,可以喝喝茶,看看手机,站起来转两圈"4、验证gcc版本[root@localhost ~]# /usr/local/bin/gcc -v Using built-in specs. COLLECT_GCC=/usr/local/bin/gcc COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-redhat-linux/5.4.0/lto-wrapper Target: x86_64-redhat-linux Configured with: ../configure --enable-checking=release --enable-languages=c,c++ --with-arch_32=x86-64 --build=x86_64-redhat-linux --disable-multilib Thread model: posix gcc version 5.4.0 (GCC) "编译的gcc在/usr/local/bin目录下,/usr/bin/gcc的版本依旧还是4.8.5的"5、更新gcc连接"保留一下4.8.5的gcc,后续需要回退的时候,可以方便很多" [root@localhost ~]# mv /usr/bin/gcc{,-4.8.5} [root@localhost ~]# mv /usr/lib64/libstdc++.so.6{,-4.8.5} [root@localhost ~]# mv /usr/bin/g++{,-4.8.5} [root@localhost ~]# ln -s /usr/local/bin/gcc /usr/bin/gcc [root@localhost ~]# ln -s /usr/local/lib64/libstdc++.so.6 /usr/lib64/libstdc++.so.6 [root@localhost ~]# ln -s /usr/local/bin/g++ /usr/bin/g++ "这个时候gcc -v就可以看到gcc的版本变成5.4了" [root@localhost ~]# gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-redhat-linux/5.4.0/lto-wrapper Target: x86_64-redhat-linux Configured with: ../configure --enable-checking=release --enable-languages=c,c++ --with-arch_32=x86-64 --build=x86_64-redhat-linux --disable-multilib Thread model: posix gcc version 5.4.0 (GCC)
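A quick way to confirm the symlink swap took effect is to check both the compiler and the runtime library it now points at; the sketch below is a minimal check, with the tiny C++11 test program being an arbitrary example.

```bash
gcc -v 2>&1 | tail -1          # should report "gcc version 5.4.0"
g++ -v 2>&1 | tail -1

# gcc 5.4 ships a libstdc++ with symbol versions up to GLIBCXX_3.4.21
strings /usr/lib64/libstdc++.so.6 | grep '^GLIBCXX_' | sort -V | tail -3

# trivial C++11 smoke test through the new toolchain
cat > /tmp/hello.cpp <<'EOF'
#include <iostream>
int main() { auto msg = "built with gcc 5.4"; std::cout << msg << std::endl; }
EOF
g++ -std=c++11 /tmp/hello.cpp -o /tmp/hello && /tmp/hello
```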
一、环境介绍`openssh版本` [root@localhost ~]# openssl version OpenSSL 1.0.2k-fips 26 Jan 2017 [root@localhost ~]# ssh -V OpenSSH_7.4p1, OpenSSL 1.0.2k-fips 26 Jan 2017`linux发行版和内核` [root@localhost ~]# cat /etc/os-release NAME="CentOS Linux" VERSION="7 (Core)" ID="centos" ID_LIKE="rhel fedora" VERSION_ID="7" PRETTY_NAME="CentOS Linux 7 (Core)" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:centos:centos:7" HOME_URL="https://www.centos.org/" BUG_REPORT_URL="https://bugs.centos.org/" CENTOS_MANTISBT_PROJECT="CentOS-7" CENTOS_MANTISBT_PROJECT_VERSION="7" REDHAT_SUPPORT_PRODUCT="centos" REDHAT_SUPPORT_PRODUCT_VERSION="7" [root@localhost ~]# uname -r 3.10.0-957.el7.x86_64二、安装配置telnet2.1、安装telnet-server[root@localhost ~]# yum -y install xinetd telnet-server2.2、配置telnet`先看一下xinetd.d目录下是否有telnet文件` [root@localhost ~]# ll /etc/xinetd.d/telnet ls: cannot access /etc/xinetd.d/telnet: No such file or directory `如果有,则将文件里面的disable = no改成disable = yes` `如果没有,就进行下面的操作` [root@localhost ~]# cat > /etc/xinetd.d/telnet <<EOF service telnet disable = yes flags = REUSE socket_type = stream wait = no user = root server = /usr/sbin/in.telnetd log_on_failure += USERID EOF2.3、配置telnet登录的终端类型[root@localhost ~]# cat >> /etc/securetty <<EOF pts/0 pts/1 pts/2 pts/3 EOF2.4、启动telnet服务[root@localhost ~]# systemctl enable xinetd --now [root@localhost ~]# systemctl enable telnet.socket --now [root@localhost ~]# ss -nltp | grep 23 LISTEN 0 128 :::23 :::* users:(("systemd",pid=1,fd=46)) `23端口起来了,表示telnet服务正常运行`三、切换登录方式为telnet后面的操作都是在telnet链接的方式下进行,避免ssh中断导致升级失败以telnet方式登录的时候,注意选择协议和端口,协议为telnet,端口为23四、开始升级OpenSSH4.1、下载升级所需依赖包[root@localhost ~]# yum -y install gcc gcc-c++ glibc make autoconf openssl openssl-devel pcre-devel pam-devel4.2、下载OpenSSL和OpenSSHopenssl官网:https://www.openssl.org/openssh官网:http://www.openssh.com/[root@localhost ~]# wget https://www.openssl.org/source/openssl-1.1.1i.tar.gz [root@localhost ~]# wget http://ftp.openbsd.org/pub/OpenBSD/OpenSSH/portable/openssh-8.4p1.tar.gz [root@localhost ~]# tar xf openssl-1.1.1i.tar.gz [root@localhost ~]# tar xf openssh-8.4p1.tar.gz4.3、编译安装OpenSSL`开始之前,先备份一下原有的OpenSSL文件` [root@localhost ~]# mv /usr/bin/openssl{,.bak} [root@localhost ~]# mv /usr/include/openssl{,.bak}[root@localhost ~]# cd openssl-1.1.1i/ [root@localhost openssl-1.1.1i]# ./config shared && make && make install`编译完成后,可以在/usr/local目录下找到openssl的二进制文件和目录` [root@localhost ~]# ll /usr/local/bin/openssl -rwxr-xr-x 1 root root 749136 Jan 14 14:25 /usr/local/bin/openssl [root@localhost ~]# ll -d /usr/local/include/openssl/ drwxr-xr-x 2 root root 4096 Jan 14 14:25 /usr/local/include/openssl/`建立软连接` [root@localhost ~]# ln -s /usr/local/bin/openssl /usr/bin/openssl [root@localhost ~]# ln -s /usr/local/include/openssl/ /usr/include/openssl [root@localhost ~]# ll /usr/bin/openssl lrwxrwxrwx 1 root root 22 Jan 14 14:32 /usr/bin/openssl -> /usr/local/bin/openssl [root@localhost ~]# ll -d /usr/include/openssl lrwxrwxrwx 1 root root 27 Jan 14 14:33 /usr/include/openssl -> /usr/local/include/openssl/`重新加载配置,验证openssl版本` [root@localhost ~]# echo "/usr/local/lib64" >> /etc/ld.so.conf [root@localhost ~]# /sbin/ldconfig [root@localhost ~]# openssl version OpenSSL 1.1.1i 8 Dec 20204.3.1、可能会有的一些报错和解决方法[root@localhost ~]# openssl version openssl: error while loading shared libraries: libssl.so.1.1: cannot open shared object file: No such file or directory "这是因为libssl.so.1.1文件找不到,执行find / -name 'libssl.so.1.1',将/etc/ld.so.conf里面的lib64改成find出来的路径即可"[root@localhost ~]# find / -name "openssl" 
"编译完,可以用上面的find命令看一下openssl所在的路径,以及include/openssl所在的路径"4.4、编译安装OpenSSH`备份原有的ssh目录` [root@localhost ~]# mv /etc/ssh{,.bak} [root@localhost ~]# mkdir /usr/local/openssh [root@localhost ~]# cd openssh-8.4p1/ [root@localhost openssh-8.4p1]# ./configure --prefix=/usr/local/openssh \ --sysconfdir=/etc/ssh \ --with-openssl-includes=/usr/local/include \ --with-ssl-dir=/usr/local/lib64 \ --with-zlib \ --with-md5-passwords \ --with-pam && \ make && \ make install4.4.1、配置sshd_config文件[root@localhost ~]# echo "UseDNS no" >> /etc/ssh/sshd_config [root@localhost ~]# echo 'PermitRootLogin yes' >> /etc/ssh/sshd_config [root@localhost ~]# echo 'PubkeyAuthentication yes' >> /etc/ssh/sshd_config [root@localhost ~]# echo 'PasswordAuthentication yes' >> /etc/ssh/sshd_config`如果是图形化界面,需要x11的话,需要配置如下` [root@localhost ~]# echo "X11Forwarding yes" >> /etc/ssh/sshd_config [root@localhost ~]# echo "X11UseLocalhost no" >> /etc/ssh/sshd_config [root@localhost ~]# echo "XAuthLocation /usr/bin/xauth" >> /etc/ssh/sshd_config4.4.2、创建新的sshd二进制文件[root@localhost ~]# mv /usr/sbin/sshd{,.bak} [root@localhost ~]# mv /usr/bin/ssh{,.bak} [root@localhost ~]# mv /usr/bin/ssh-keygen{,.bak} [root@localhost ~]# ln -s /usr/local/openssh/bin/ssh /usr/bin/ssh [root@localhost ~]# ln -s /usr/local/openssh/bin/ssh-keygen /usr/bin/ssh-keygen [root@localhost ~]# ln -s /usr/local/openssh/sbin/sshd /usr/sbin/sshd `查看openssh当前版本` [root@localhost ~]# ssh -V OpenSSH_8.4p1, OpenSSL 1.1.1i 8 Dec 20204.4.3、重新启动openssh服务[root@localhost ~]# systemctl disable sshd --now [root@localhost ~]# mv /usr/lib/systemd/system/sshd.service{,.bak} [root@localhost ~]# systemctl daemon-reload [root@localhost ~]# cp -a openssh-8.4p1/contrib/redhat/sshd.init /etc/init.d/sshd [root@localhost ~]# cp -a openssh-8.4p1/contrib/redhat/sshd.pam /etc/pam.d/sshd.pam [root@localhost ~]# chkconfig --add sshd [root@localhost ~]# systemctl enable sshd --now4.5、ssh链接成功后的处理[root@localhost ~]# ssh root@192.168.145.130 `成功连接上之后,可以关闭telnet服务,当然,也可以不关闭` [root@localhost ~]# systemctl disable xinetd.service --now [root@localhost ~]# systemctl disable telnet.socket --now
1. Edit the NIC configuration file and set the DEVICE= parameter to eth0
[root@ansheng ~]# cd /etc/sysconfig/network-scripts/
[root@ansheng network-scripts]# vi ifcfg-eno16777728
TYPE=Ethernet
BOOTPROTO=dhcp
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=eno16777728
UUID=8a3eade8-005c-46df-81f2-6e2598457bac
#DEVICE=eno16777728
DEVICE=eth0
ONBOOT=yes
2. Rename the configuration file to ifcfg-eth0
[root@ansheng network-scripts]# mv ifcfg-eno16777728 ifcfg-eth0
3. CentOS 7 boots with grub2, so grub2 also has to be adjusted: edit /etc/default/grub and append net.ifnames=0 biosdevname=0 to the GRUB_CMDLINE_LINUX parameter
[root@ansheng network-scripts]# vi /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet net.ifnames=0 biosdevname=0"
GRUB_DISABLE_RECOVERY="true"
4. Regenerate the GRUB configuration with grub2-mkconfig
[root@ansheng network-scripts]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-327.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-327.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-4dd6b54f74c94bff9e92c61d669fc195
Found initrd image: /boot/initramfs-0-rescue-4dd6b54f74c94bff9e92c61d669fc195.img
5. Reboot the system
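After the reboot, the rename can be verified with a couple of commands; a minimal sketch, assuming the machine has only the single NIC configured above.

```bash
grep -o 'net.ifnames=0 biosdevname=0' /proc/cmdline   # kernel got the new parameters
ip link show                                          # eth0 listed, eno16777728 gone
ip addr show eth0                                     # and it still has its address
```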
rpm升级# rpm的方式升级内核 1.载入内核公钥 [root@localhost ~]# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org 2.安装内核 ELRepo [root@localhost ~]# rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm 3.载入elrepo-kernel元数据 [root@localhost ~]# yum --disablerepo=\* --enablerepo=elrepo-kernel repolist 4.查看可用的rpm包 [root@localhost ~]# yum --disablerepo=\* --enablerepo=elrepo-kernel list kernel* # 产品需求是4.14的内核,这里没有,只好去官方找安装包,进行编译升级 Loaded plugins: fastestmirror Loading mirror speeds from cached hostfile * elrepo-kernel: mirrors.neusoft.edu.cn Installed Packages kernel.x86_64 3.10.0-957.el7 @anaconda kernel-tools.x86_64 3.10.0-957.el7 @anaconda kernel-tools-libs.x86_64 3.10.0-957.el7 @anaconda Available Packages kernel-lt.x86_64 4.4.235-1.el7.elrepo elrepo-kernel kernel-lt-devel.x86_64 4.4.235-1.el7.elrepo elrepo-kernel kernel-lt-doc.noarch 4.4.235-1.el7.elrepo elrepo-kernel kernel-lt-headers.x86_64 4.4.235-1.el7.elrepo elrepo-kernel kernel-lt-tools.x86_64 4.4.235-1.el7.elrepo elrepo-kernel kernel-lt-tools-libs.x86_64 4.4.235-1.el7.elrepo elrepo-kernel kernel-lt-tools-libs-devel.x86_64 4.4.235-1.el7.elrepo elrepo-kernel kernel-ml.x86_64 5.8.7-1.el7.elrepo elrepo-kernel kernel-ml-devel.x86_64 5.8.7-1.el7.elrepo elrepo-kernel kernel-ml-doc.noarch 5.8.7-1.el7.elrepo elrepo-kernel kernel-ml-headers.x86_64 5.8.7-1.el7.elrepo elrepo-kernel kernel-ml-tools.x86_64 5.8.7-1.el7.elrepo elrepo-kernel kernel-ml-tools-libs.x86_64 5.8.7-1.el7.elrepo elrepo-kernel kernel-ml-tools-libs-devel.x86_64 5.8.7-1.el7.elrepo elrepo-kernel # lt:long term support,长期支持版本; # ml:mainline,主线版本; 5.安装最新版本的kernel [root@localhost ~]# yum --disablerepo=\* --enablerepo=elrepo-kernel -y install kernel-ml.x86_64 6.删除旧版本工具包 [root@localhost ~]# yum -y remove kernel-tools-libs.x86_64 kernel-tools.x86_64 7.安装新版本工具包 [root@localhost ~]# yum --disablerepo=\* --enablerepo=elrepo-kernel -y install kernel-ml-tools.x86_64编译升级升级前[root@localhost ~]# uname -r 3.10.0-957.el7.x86_64 [root@localhost ~]# cat /etc/redhat-release CentOS Linux release 7.6.1810 (Core)下载安装包linux内核官网:https://www.kernel.org/ [root@localhost ~]# wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.14.196.tar.xz 链接:https://pan.baidu.com/s/1QSc_PeVsj6olE6vIrRScYQ 提取码:w1g2 # 有百度会员的,可以用网盘下载,没有的话,只能wget了,官方的服务器下载会比较慢,文件不大,只有97MB不到编译内核选择配置项的方式有以下几种,选择哪种都可以。 make config (基于文本的配置界面) make menuconfig (基于文本菜单的配置界面) make xconfig (基于图形窗口的配置界面) make oldconfig (基于原来内核配置的基础上修改) 以上几种: # make xconfig 最为友好,基于窗口操作,但是需要 Xwindow 的支持,CentOS 还好,若是使用其它没有图形界面的发行版就 GG 了。 # make menuconfig 相对比较友好,又是基于文本菜单,所有的发行版都可以使用,所以这里推荐使用 make menuconfig。 使用 make menuconfig 需要 ncurses-devel 的支持,如果之前没装过,需要执行下面的命令安装一下。 [root@localhost ~]# yum -y install ncurses-devel[root@localhost ~]# tar xf linux-4.14.196.tar.xz -C /usr/local/ [root@localhost linux-4.14.196]# yum -y install gcc make elfutils-libelf-devel openssl-devel bc # 安装编译内核所需的依赖 [root@localhost ~]# cd /usr/local/linux-4.14.196/ [root@localhost linux-4.14.196]# make menuconfig # 执行成功后,会显示下面的界面Linux 内核所有的配置项都在这里,内核的编译分为两个部分,核心和模块,对于核心的部分,要编译进核心,可能以后会用到的部分,尽量编译成模块。 文本菜单选择界面,使用左(←)、右(→)箭头切换底部菜单,上(↑)、下(↓)箭头切换中间的配置项,空格键 选择配置项,部分配置项右边有 —> 标识,代表有下级子项,可以使用 Enter 进去选择。 同时每一项的前面都有以下标识,可以根据需要选择。 - <*>[*] 表示编译进核心 - <M> 表示编译成模块 - 空格 表示不选中此项 如果你只是看一下整个编译过程,不想深究每一项,执行上一步 make menuconfig 之后,直接保存退出就可以了,它会使用 CentOS 内部的配置文件作为这次编译的配置文件# 配置项选完,config 配置文件生成之后,就可以开始编译了 # 编译时间比较长,如果上面你是自定义配置项,把不需要的配置都关闭,编译会快的多。我这使用的 CentOS 内部的配置文件,CentOS 为了大多数人的使用,开的配置项比较多,所以编译的时间比较长,也和你的电脑配置有关。我make了两个小时。 [root@localhost linux-4.14.196]# make [root@localhost linux-4.14.196]# make 
modules_install # 安装模块 [root@localhost linux-4.14.196]# make install # 安装核心 [root@localhost linux-4.14.196]# ll /boot/ # 安装完成后,就可以看到4.14的内核文件了 total 201364 -rw-r--r--. 1 root root 151918 Nov 9 2018 config-3.10.0-957.el7.x86_64 drwxr-xr-x. 3 root root 17 Jun 30 18:04 efi drwxr-xr-x. 2 root root 4096 Sep 9 14:06 extlinux drwxr-xr-x. 2 root root 27 Jun 30 18:05 grub drwx------. 5 root root 97 Sep 9 19:07 grub2 -rw-------. 1 root root 57430086 Jun 30 18:08 initramfs-0-rescue-502ad5c8bfc847fea2cacceff257adae.img -rw-------. 1 root root 22417877 Jun 30 18:09 initramfs-3.10.0-957.el7.x86_64.img -rw-------. 1 root root 98006427 Sep 9 19:07 initramfs-4.14.196.img -rw-r--r--. 1 root root 314036 Nov 9 2018 symvers-3.10.0-957.el7.x86_64.gz lrwxrwxrwx. 1 root root 25 Sep 9 19:04 System.map -> /boot/System.map-4.14.196 -rw-------. 1 root root 3543471 Nov 9 2018 System.map-3.10.0-957.el7.x86_64 -rw-r--r--. 1 root root 3498834 Sep 9 19:04 System.map-4.14.196 lrwxrwxrwx. 1 root root 22 Sep 9 19:04 vmlinuz -> /boot/vmlinuz-4.14.196 -rwxr-xr-x. 1 root root 6639904 Jun 30 18:08 vmlinuz-0-rescue-502ad5c8bfc847fea2cacceff257adae -rwxr-xr-x. 1 root root 6639904 Nov 9 2018 vmlinuz-3.10.0-957.el7.x86_64 -rw-r--r--. 1 root root 7517472 Sep 9 19:04 vmlinuz-4.14.196更新启动引导[root@localhost linux-4.14.196]# awk -F \' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg # 查看内核插入顺序,默认新内核是从头插入 0 : CentOS Linux (4.14.196) 7 (Core) 1 : CentOS Linux (3.10.0-957.el7.x86_64) 7 (Core) 2 : CentOS Linux (0-rescue-502ad5c8bfc847fea2cacceff257adae) 7 (Core) [root@localhost linux-4.14.196]# grub2-mkconfig -o /boot/grub2/grub.cfg Generating grub configuration file ... Found linux image: /boot/vmlinuz-4.14.196 Found initrd image: /boot/initramfs-4.14.196.img Found linux image: /boot/vmlinuz-3.10.0-957.el7.x86_64 Found initrd image: /boot/initramfs-3.10.0-957.el7.x86_64.img Found linux image: /boot/vmlinuz-0-rescue-502ad5c8bfc847fea2cacceff257adae Found initrd image: /boot/initramfs-0-rescue-502ad5c8bfc847fea2cacceff257adae.img done修改默认启动内核[root@localhost ~]# grub2-editenv list # 查看默认启动的内核 saved_entry=CentOS Linux (3.10.0-957.el7.x86_64) 7 (Core) [root@localhost ~]# awk -F \' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg # 查看内核插入顺序 0 : CentOS Linux (4.14.196) 7 (Core) 1 : CentOS Linux (3.10.0-957.el7.x86_64) 7 (Core) 2 : CentOS Linux (0-rescue-502ad5c8bfc847fea2cacceff257adae) 7 (Core) [root@localhost ~]# grub2-set-default 'CentOS Linux (4.14.196) 7 (Core)' # 设置默认启动的内核 [root@localhost ~]# grub2-editenv list # 查看默认启动的内核 saved_entry=CentOS Linux (4.14.196) 7 (Core)重启之后验证[root@localhost ~]# reboot [root@localhost ~]# uname -r 4.14.196
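Once the new kernel is booted, a short check confirms the modules and the default GRUB entry are in place; the sketch below mostly reuses commands already shown in this section.

```bash
uname -r                                   # 4.14.196
ls /lib/modules/"$(uname -r)" | head       # modules_install populated this directory
grub2-editenv list                         # default entry should be the 4.14 kernel
awk -F \' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg   # all boot entries
```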
注意:环境要求`阿里源的centos6和centos7各14G不到,注意磁盘空间` '环境准备,修改hostname,关闭防火墙,disabled selinux' [root@localhost ~]# hostnamectl set-hostname --static yum-server [root@yum-server ~]# systemctl disable firewalld --now [root@yum-server ~]# sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/sysconfig/selinux1、配置服务器端yum1.1、安装yum源工具[root@yum-server ~]# yum -y install epel-release.noarch # nginx需要epel源 [root@yum-server ~]# yum -y install nginx # 安装nginx [root@yum-server ~]# yum -y install createrepo yum-utils # 安装repository管理工具1.2、配置nginx[root@yum-server nginx]# cd /etc/nginx/ [root@yum-server nginx]# cp nginx.conf{,.bak} # 备份!备份!备份! [root@yum-server nginx]# vim nginx.conf server { listen 80; server_name localhost; root /usr/share/nginx/html; # Load configuration files for the default server block. include /etc/nginx/default.d/*.conf; location / { # 在server段加入以下三段内容 autoindex on; # 表示:自动在index.html的索引打开 autoindex_exact_size on; # 表示:如果有文件,则显示文件的大小 autoindex_localtime on; # 表示:显示更改时间,以当前系统的时间为准 error_page 404 /404.html; location = /40x.html { error_page 500 502 503 504 /50x.html; location = /50x.html { [root@yum-server nginx]# nginx -t # 检测一下nginx语法是否有错 nginx: the configuration file /etc/nginx/nginx.conf syntax is ok nginx: configuration file /etc/nginx/nginx.conf test is successful [root@yum-server nginx]# systemctl enable nginx.service --now # 启动nginx,设为开机自启 Created symlink from /etc/systemd/system/multi-user.target.wants/nginx.service to /usr/lib/systemd/system/nginx.service. [root@yum-server nginx]# curl -I http://localhost # 访问本地,状态码返回200,服务正常 HTTP/1.1 200 OK Server: nginx/1.16.1 Date: Sun, 05 Jul 2020 09:48:05 GMT Content-Type: text/html Content-Length: 4833 Last-Modified: Fri, 16 May 2014 15:12:48 GMT Connection: keep-alive ETag: "53762af0-12e1" Accept-Ranges: bytes1.2.1、配置nginx页面目录[root@yum-server nginx]# cd /usr/share/nginx/html/ [root@yum-server html]# mkdir -p CentOS-YUM/Aliyun/{version_6,version_7}/64bit [root@yum-server html]# tree /usr/share/nginx/html/CentOS-YUM/ /usr/share/nginx/html/CentOS-YUM/ └── Aliyun ├── version_6 │ └── 64bit └── version_7 └── 64bit 5 directories, 0 files[root@yum-server html]# cd CentOS-YUM/ [root@yum-server CentOS-YUM]# vim index.html <p style="font-weight:bolder;color:green;font-size:30px;">ALL of the packages in the below:</p> <br/> <a href="http://192.168.57.133/CentOS-YUM/Aliyun">version_6</a><br/> These packagers using for Centos 6<br/> <a href="http://192.168.57.133/CentOS-YUM/Aliyun">version_7</a><br/> These packagers using for Centos 7<br/> <p style="font-weight:bolder;color:red;font-size:18px;">Please replace the file and fill in the f ollowing content:</p> <p style="font-weight:bolder;color:blue;font-size:15px;">Way: /etc/yum.repos.d/CentOS-Base.repo</ p>1.3、替换yum源文件# 备份原来的官方yum源 [root@yum-server CentOS-YUM]# cd /etc/yum.repos.d/ [root@yum-server yum.repos.d]# mv ./* /tmp/[root@yum-server yum.repos.d]# wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo # 下载aliyun的centos7的yum源 [root@yum-server yum.repos.d]# vim yum.reposync.sh # 同步脚本 #!/usr/bin/bash reposync -p /usr/share/nginx/html/CentOS-YUM/Aliyun/version_7/64bit/ # 同步/etc/yum.repos.d/CentOS-Base.repo内的rpm包 /usr/bin/sed -i "s/7/6/g" /etc/yum.repos.d/CentOS-Base.repo # 将CentOS-Base.repo内的7改成6,后续同步centos6的rpm包使用 reposync -p /usr/share/nginx/html/CentOS-YUM/Aliyun/version_6/64bit/ # 同步/etc/yum.repos.d/CentOS-Base.repo内的rpm包(此时是centos6的包) /usr/bin/sed -i "s/6/7/g" /etc/yum.repos.d/CentOS-Base.repo # 重新将CentOS-Base.repo内的6改回成7,下次同步的时候,不会出错 [root@yum-server yum.repos.d]# chmod +x 
yum.reposync.sh # 要给执行权限 [root@yum-server yum.repos.d]# ll total 8 -rw-r--r-- 1 root root 2523 Jun 16 2018 CentOS-Base.repo -rwxr-xr-x 1 root root 303 Jul 5 19:02 yum.reposync.sh [root@yum-server yum.repos.d]# sh yum.reposync.sh # 等待同步完成 # 同步完成,查看文件大小,合计27G [root@yum-server CentOS-YUM]# du -ch Aliyun/ 9.0G Aliyun/version_7/64bit/base/Packages 9.0G Aliyun/version_7/64bit/base 616M Aliyun/version_7/64bit/extras/Packages 616M Aliyun/version_7/64bit/extras 3.6G Aliyun/version_7/64bit/updates/Packages 3.6G Aliyun/version_7/64bit/updates 14G Aliyun/version_7/64bit 14G Aliyun/version_7 9.0G Aliyun/version_6/64bit/base/Packages 9.0G Aliyun/version_6/64bit/base 616M Aliyun/version_6/64bit/extras/Packages 616M Aliyun/version_6/64bit/extras 3.6G Aliyun/version_6/64bit/updates/Packages 3.6G Aliyun/version_6/64bit/updates 14G Aliyun/version_6/64bit 14G Aliyun/version_6 27G Aliyun/ 27G total1.4、建立yum源仓库'因为建仓最终的目的也是可供client来进行检索的,所以得每个Packages目录都要建成仓库,所以建仓的时候,目录指到最底层的Packages,而-np更新的时候只用指定到64bit的目录就可以了,否则会重复建立base、extras、updates三个目录进行下载 [root@yum-server ~]# createrepo -p /usr/share/nginx/html/CentOS-YUM/Aliyun/version_7/64bit/base/Packages/ Spawning worker 0 with 10070 pkgs Workers Finished Saving Primary metadata Saving file lists metadata Saving other metadata Generating sqlite DBs Sqlite DBs complete [root@yum-server ~]# createrepo -p /usr/share/nginx/html/CentOS-YUM/Aliyun/version_7/64bit/extras/Packages/ Spawning worker 0 with 397 pkgs Workers Finished Saving Primary metadata Saving file lists metadata Saving other metadata Generating sqlite DBs Sqlite DBs complete [root@yum-server ~]# createrepo -p /usr/share/nginx/html/CentOS-YUM/Aliyun/version_7/64bit/updates/Packages/ Spawning worker 0 with 884 pkgs Workers Finished Saving Primary metadata Saving file lists metadata Saving other metadata Generating sqlite DBs Sqlite DBs complete [root@yum-server ~]# createrepo -p /usr/share/nginx/html/CentOS-YUM/Aliyun/version_6/64bit/base/Packages/ Spawning worker 0 with 10070 pkgs Workers Finished Saving Primary metadata Saving file lists metadata Saving other metadata Generating sqlite DBs Sqlite DBs complete [root@yum-server ~]# createrepo -p /usr/share/nginx/html/CentOS-YUM/Aliyun/version_6/64bit/updates/Packages/ Spawning worker 0 with 884 pkgs Workers Finished Saving Primary metadata Saving file lists metadata Saving other metadata Generating sqlite DBs Sqlite DBs complete [root@yum-server ~]# createrepo -p /usr/share/nginx/html/CentOS-YUM/Aliyun/version_6/64bit/extras/Packages/ Spawning worker 0 with 397 pkgs Workers Finished Saving Primary metadata Saving file lists metadata Saving other metadata Generating sqlite DBs Sqlite DBs complete[root@yum-server ~]# tree -d /usr/share/nginx/html/CentOS-YUM/Aliyun/ # 建仓完成后,会自动生成一个repodata目录 /usr/share/nginx/html/CentOS-YUM/Aliyun/ ├── version_6 │ └── 64bit │ ├── base │ │ └── Packages │ │ └── repodata │ ├── extras │ │ └── Packages │ │ └── repodata │ └── updates │ └── Packages │ └── repodata └── version_7 └── 64bit ├── base │ └── Packages │ └── repodata ├── extras │ └── Packages │ └── repodata └── updates └── Packages └── repodata 22 directories'可以写一个更新yum源的脚本,然后写一个计划任务,定期更新yum源(reposync -np 就是更新新的rpm包) #!/usr/bin/bash reposync -np /usr/share/nginx/html/CentOS-YUM/Aliyun/version_7/64bit/ echo "centos7 is sync complate" /usr/bin/sed -i "s/7/6/g" /etc/yum.repos.d/CentOS-Base.repo` reposync -np /usr/share/nginx/html/CentOS-YUM/Aliyun/version_6/64bit/ echo "centos6 is sync complate" /usr/bin/sed -i "s/6/7/g" /etc/yum.repos.d/CentOS-Base.repo2、配置客户端yum# 备份原来的yum源 
[root@localhost ~]# cd /etc/yum.repos.d/ [root@localhost yum.repos.d]# ls CentOS-Base.repo CentOS-Debuginfo.repo CentOS-Media.repo CentOS-Vault.repo epel-testing.repo CentOS-CR.repo CentOS-fasttrack.repo CentOS-Sources.repo epel.repo [root@localhost yum.repos.d]# mkdir back [root@localhost yum.repos.d]# mv *.repo back/ [root@localhost yum.repos.d]# ls back[root@localhost yum.repos.d]# vim CentOS-Base.repo # 需要6,就使用6,需要7,就使用7,也可以使用yum-plugin-priorities工具来控制优先级,加上priority=1(2|3|4都可以)来控制优先级 [Aliyun_7_base] name=source_from_localserver baseurl=http://192.168.57.133/CentOS-YUM/Aliyun/version_7/64bit/base/Packages gpgcheck=0 enable=1 [Aliyun_7_extras] name=source_from_localserver baseurl=http://192.168.57.133/CentOS-YUM/Aliyun/version_7/64bit/extras/Packages gpgcheck=0 enable=1 [Aliyun_7_updates] name=source_from_localserver baseurl=http://192.168.57.133/CentOS-YUM/Aliyun/version_7/64bit/updates/Packages gpgcheck=0 enable=1 # [Aliyun_6_base] # name=source_from_localserver # baseurl=http://192.168.57.133/CentOS-YUM/Aliyun/version_6/64bit/base/Packages # gpgcheck=0 # enable=1 # [Aliyun_6_extras] # name=source_from_localserver # baseurl=http://192.168.57.133/CentOS-YUM/Aliyun/version_6/64bit/extras/Packages # gpgcheck=0 # enable=1 # [Aliyun_6_updates] # name=source_from_localserver # baseurl=http://192.168.57.133/CentOS-YUM/Aliyun/version_6/64bit/updates/Packages # gpgcheck=0 # enable=1 [root@localhost yum.repos.d]# yum clean all [root@localhost yum.repos.d]# yum makecache# 安装软件来测试一下 [root@localhost yum.repos.d]# yum -y install net-tools Loaded plugins: fastestmirror, priorities Loading mirror speeds from cached hostfile Resolving Dependencies --> Running transaction check ---> Package net-tools.x86_64 0:2.0-0.25.20131004git.el7 will be installed --> Finished Dependency Resolution Dependencies Resolved ===================================================================================================== Package Arch Version Repository Size ===================================================================================================== Installing: net-tools x86_64 2.0-0.25.20131004git.el7 Aliyun_7_base 306 k Transaction Summary ===================================================================================================== Install 1 Package Total download size: 306 k Installed size: 917 k Downloading packages: net-tools-2.0-0.25.20131004git.el7.x86_64.rpm | 306 kB 00:00:00 Running transaction check Running transaction test Transaction test succeeded Running transaction Installing : net-tools-2.0-0.25.20131004git.el7.x86_64 1/1 Verifying : net-tools-2.0-0.25.20131004git.el7.x86_64 1/1 Installed: net-tools.x86_64 0:2.0-0.25.20131004git.el7 Complete! 安装完成,Repository里面显示,是从Aliyun_7_base内获取的,到此,yum源仓库(阿里源)部署完成
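一个小提醒:yum 仓库配置里的标准键名是 enabled 而不是 enable,写成 enable 时 yum 一般会直接忽略这个键,而仓库默认就是启用状态,所以上面的配置恰好能正常工作。建议还是按标准写法来,下面是修正后的一个仓库段示例(其余仓库段同理):
[Aliyun_7_base]
name=source_from_localserver
baseurl=http://192.168.57.133/CentOS-YUM/Aliyun/version_7/64bit/base/Packages
gpgcheck=0
enabled=1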
说明书rsyncrsync 是一个开源的实用程序,可提供快速增量文件传输rsync 官网和 sync 命令是完全两个玩意,sync 是将内存 buff 中的资料强制写入磁盘,rsync 是增量文件传输lsyncdLsyncd监视本地目录树事件监视器接口( inotify 或 fsevents )它聚合并组合事件几秒钟,然后生成一个(或多个)进程来同步更改,(默认情况下是 rsync )lsyncd 2.2.1 要求所有源计算机和目标计算机上的 rsync >= 3.1lsyncd githublsyncd + rsync 可以用来做数据的备份,也可以代替 nfs 做 web 服务器的共享根目录192.168.16.100 这个节点用来充当服务端的角色,192.168.16.107 和 192.168.16.108 用来充当客户端的角色(当服务端指定目录内发生修改[增删改]操作后,将修改的操作同步给客户端)需要整理一下场景和思路192.168.16.100需要部署 lsyncd 和 rsync (因为 lsyncd 是一个采用 linux 内核的 inotify 触发机制去调用 rsync 做增量文件传输)lsyncd 需要编写配置文件rsync 不需要编写配置文件192.168.16.107 和 192.168.16.108只需要部署 rsync (用来接收 192.168.16.100 传输过来的增量文件)rsync 需要编写配置文件,指定接收哪个节点传输过来的增量文件IPSERVICE/ROLEOS_VERSION192.168.16.100lsyncd & rsync/serverCentOS-7.6.1810192.168.16.107rsync/clientCentOS-7.6.1810192.168.16.108rsync/clientCentOS-7.6.1810部署客户端安装 rsync 服务192.168.16.107 和 192.168.16.108 两个节点做一样的操作yum install -y rsync编写配置文件配置文件模板 [过滤EXAMPLES]mv /etc/rsyncd.conf{,.bak} vim /etc/rsyncd.confuid = rsync gid = rsync use chroot = no max connections = 4 pid file = /var/run/rsyncd.pid lock file = /var/run/rsyncd.lock log file = /var/log/rsyncd.log timeout = 300 [data] path = /data/ read only = false list = false hosts allow = 192.168.16.100 hosts deny = 0.0.0.0/32 auth users = rsync secrets file = /etc/rsyncd.secrets参数解释uid: 运行 rsync 服务使用的用户gid: 运行 rsync 服务使用的用户组use chroot: 安全相关,需要 root 权限,默认为 truemax connections: 最大链接数(默认值为 0,表示没有限制。负值禁用模块。)pid file: pid 文件存放路径lock file: rsync 守护进程在此文件上使用记录锁定,以确保共享锁定文件的模块不会超过最大连接限制log file: 指定日志存储路径,默认为 syslogtimeout: 超时时间[data]: 模块名称(自定义,服务端同步文件的时候指定的名称)path: 服务端的同步目录read only: 客户端是否只读,若不是只读,客户端也可以同步文件到服务端list: 是否列出模块hosts allow: rsync 的服务端地址,多个主机用逗号或者空格分隔,也可以写网段(不写的话,服务端链接不上客户端,无法传输增量文件)hosts deny: 拒绝的链接的ip(不写这个参数,表示谁都可以连)auth users: 用户以及权限secrets file: 密码文件创建 rsync 用户useradd rsync -s /sbin/nologin创建目录并赋权mkdir /data chown -R rsync.rsync /data创建认证用户和密码文件密码文件的权限必须是 600,不然后期会报错无法认证echo 'rsync:rsync' > /etc/rsyncd.secrets chmod 600 /etc/rsyncd.secrets启动 rsyncrsync --daemon --config=/etc/rsyncd.conf查看 rsync 是否启动成功cat /var/log/rsyncd.logrsyncd 的版本是 3.1.2rsyncd 的进程 pid 是 21654rsyncd 的端口是 8732022/03/28 15:27:50 [21654] rsyncd version 3.1.2 starting, listening on port 873部署服务端安装 rsync 服务yum install -y rsync创建 rsync 用户useradd rsync -s /sbin/nologin创建目录并赋权顺便造点数据mkdir /data for i in $(seq 1 10);do mkdir /data/test_$i;echo "this is no.$i" > /data/test_$i/test.log;done chown -R rsync.rsync /data创建认证用户和密码文件密码文件的权限必须是 600,不然后期会报错无法认证后面执行 rsync 命令的时候会指定用户,这里就只写入密码就可以了echo 'rsync' > /etc/rsyncd.secrets chmod 600 /etc/rsyncd.secrets测试 rsync 文件同步注意格式/data/:服务端的目录rsync@192.168.16.107:用户@需要同步的客户端主机ip::data:双冒号是格式规定,data是模块的名称,和客户端配置的有关不存在会报错:@ERROR: Unknown module 'test'rsync -avz /data/ rsync@192.168.16.107::data --password-file=/etc/rsyncd.secrets输出传输的进度sending incremental file list test_1/ test_1/test.log test_10/ test_10/test.log test_2/ test_2/test.log test_3/ test_3/test.log test_4/ test_4/test.log test_5/ test_5/test.log test_6/ test_6/test.log test_7/ test_7/test.log test_8/ test_8/test.log test_9/ test_9/test.log sent 1,022 bytes received 261 bytes 2,566.00 bytes/sec total size is 131 speedup is 0.10客户端查看文件登录到 192.168.16.107 服务器查看for i in $(seq 1 10);do cat /data/test_$i/test.log;done可以查看得到内容this is no.1 this is no.2 this is no.3 this is no.4 this is no.5 this is no.6 this is no.7 this is no.8 this is no.9 this is no.10安装 lsyncd 服务lsyncd 需要 epel 源yum install -y epel* && yum install -y lsyncd编写配置文件lsyncd settings 层配置lsyncd sync 层配置mv /etc/lsyncd.conf{,.bak} vim /etc/lsyncd.conflua 语法中 -- 表示注释多个 sync 表示配置多个同步作业-- 全局配置 
settings { -- 定义日志文件路径和名称 logfile = "/var/log/lsyncd/lsyncd.log", -- 定义状态文件路径和名称 statusFile = "/var/log/lsyncd/lsyncd.status", -- 指定inotify监控的事件 -- 默认是"CloseWrite",还可以是"Modify" inotifyMode = "CloseWrite", -- 最大进程数 maxProcesses = 8, -- 累计到多少所监控的事件激活一次同步,即使后面的sync配置的delay延迟时间还未到 maxDelays = 1, -- true 表示不启用守护进程模式(默认是true) nodaemon = false, -- 定义同步的配置 sync { -- 使用 rsync 进行目录同步 default.rsync, -- 源目录 source = "/data/", -- 虚拟用户和远程主机ip以及模块名称 -- 如果是 default.direct ,target 直接写同步到哪个目录即可,不需要写虚拟用户和主机ip以及模块名称 target = "rsync@192.168.16.107::data", -- 排除选项 -- excludeFrom = "/etc/lsyncd.exclude" 指定列表文件 -- exclude = { LIST } 指定规则 exclude = { '.**', '.git/**', '*.bak', '*.tmp', 'runtime/**', 'cache/**' -- 累计事件,默认15秒 -- 15s内两次修改了同一文件,最后只同步最新的文件 delay = 15, -- rsync 配置 rsync = { -- rsync 二进制文件绝对路径 [使用'whereis rsync'命令可以查看 rsync 二进制文件的绝对路径] binary = "/usr/bin/rsync", -- 指定密码文件 password_file = "/etc/rsyncd.secrets", -- 是否归档 archive = true, -- 是否压缩传输 -- 默认是 true ,根据文件大小等因素决定是否开启压缩 compress = false, verbose = false, -- 其他参数 -- bwlimit 限速,单位kb/s _extra = {"--bwlimit=200", "--omit-link-times"} sync { default.rsync, source = "/data/", target = "rsync@192.168.16.108::data", exclude = { '.**', '.git/**', '*.bak', '*.tmp', 'runtime/**', 'cache/**' delay = 15, rsync = { binary = "/usr/bin/rsync", password_file = "/etc/rsyncd.secrets", archive = true, compress = false, verbose = false, _extra = {"--bwlimit=200", "--omit-link-times"} }参数解释 settingsinotifyModeCloseWrite 和 ModifyCloseWrite 包含了以下 inotify 事件IN_ATTRIB 文件属性被修改,如 chmod、chown、touch 等IN_CLOSE_WRITE 可写文件被关闭IN_CREATE创建新文件IN_DELETE 文件/目录已在监控目录中删除IN_DELETE_SELF 监控的项目本身已删除IN_MOVED_FROM 文件被移出监控目录,如 mvIN_MOVED_TO 文件被移动到监控目录,如 mv、cpIN_DONT_FOLLOW 不追踪符号链接的真实路径IN_ONLYDIR 仅监视目录Modify 是在 CloseWrite 的基础上增加了IN_MODIFY 文件已被修改删除了IN_CLOSE_WRITE 可写文件被关闭参数解释 syncrsync、rsyncssh、direct三种模式default.rsync :使用 rsync 命令完成本地目录间同步,也可以达到使用 ssh 形式的远程 rsync 效果,或 daemon 方式连接远程 rsyncd 进程default.direct :使用 cp、rm 等命令完成本地目录间差异文件同步default.rsyncssh :同步到远程主机目录,rsync 的 ssh 模式,需要使用 key 来认证;启动 lsyncd 服务同时设置为开机自启systemctl enable lsyncd systemctl start lsyncd测试 lsyncd 功能查看服务端监听目录下的文件和目录ssh 192.168.16.100 "ls /data/"得到了如下的输出test_1 test_10 test_2 test_3 test_4 test_5 test_6 test_7 test_8 test_9同时也查看 192.168.16.107 和 192.168.16.108 两个客户端是否也是存在这些文件和目录(前面手动同步过,所以是存在的)for i in 107 108;do ssh 192.168.16.$i "ls /data/";done预期是返回两次内容test_1 test_10 test_2 test_3 test_4 test_5 test_6 test_7 test_8 test_9 test_1 test_10 test_2 test_3 test_4 test_5 test_6 test_7 test_8 test_9现在我们删除服务端(192.168.16.100)上的 /data/test_1 目录,验证客户端(192.168.16.107 和 192.168.16.108)是否也会删除这个目录ssh 192.168.16.100 "rm -rf /data/test_1"此时查看服务端和客户端是否还存在 test_1 这个目录for i in 100 107 108;do ssh 192.168.16.$i "ls /data/";done此时得到如下的返回说明 lsyncd 配和 rsync 成功同步test_10 test_2 test_3 test_4 test_5 test_6 test_7 test_8 test_9 test_10 test_2 test_3 test_4 test_5 test_6 test_7 test_8 test_9 test_10 test_2 test_3 test_4 test_5 test_6 test_7 test_8 test_9
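附:上面贴的 /etc/lsyncd.conf 因为排版原因丢了不少右大括号,这里再给一份结构完整的最小示意,方便对照(只保留一个同步目标,参数含义和上文一致,路径、IP 沿用上文,第二个目标照抄一份 sync 段即可):
settings {
    -- 日志和状态文件
    logfile     = "/var/log/lsyncd/lsyncd.log",
    statusFile  = "/var/log/lsyncd/lsyncd.status",
    -- inotify 监控的事件类型
    inotifyMode = "CloseWrite",
    -- 最大进程数
    maxProcesses = 8,
    maxDelays = 1,
}
sync {
    default.rsync,
    source = "/data/",
    target = "rsync@192.168.16.107::data",
    exclude = { '.**', '*.bak', '*.tmp' },
    -- 累计事件的延迟,单位秒
    delay = 15,
    rsync = {
        binary = "/usr/bin/rsync",
        password_file = "/etc/rsyncd.secrets",
        archive = true,
        compress = false,
        _extra = { "--bwlimit=200", "--omit-link-times" }
    }
}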
获取当前时间的时间戳
不加时间的情况下,默认输出当前时间的时间戳
Linux:~ # date +%s
实验效果
为了更好的展现效果,使用如下命令的方式来展示
Linux:~ # date ; date +%s
Tue Mar 16 23:44:16 CST 2021
1615909456
将时间戳转换成时间显示
命令格式: date -d '@时间戳'
Linux:~ # date -d '@1615909456'
Tue Mar 16 23:44:16 CST 2021
获取已知时间的时间戳
unix时间戳是从1970年1月1日(UTC/GMT的午夜)开始所经过的秒数,不考虑闰秒
Linux:~ # date -d '1970-01-01 00:00:00' +%s
-28800
验证时间戳转换的效果
Linux:~ # date -d '@-28800'
Thu Jan  1 00:00:00 CST 1970
以指定格式输出时间
获取到时间戳后,将转换出来的时间,按照指定的格式输出,依旧使用上面的时间戳(-28800)
Linux:~ # date -d '@-28800' '+%F %T'
1970-01-01 00:00:00
%F same as %Y-%m-%d,显示完整的年月日,分隔符默认为 -
%T same as %H:%M:%S,显示完整的时间,分隔符默认为 :
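顺带解释一下上面的 -28800:因为系统时区是 CST(UTC+8),本地的 1970-01-01 00:00:00 比 UTC 的纪元起点早了 8 个小时,8*3600 正好是 28800。想直接按 UTC 计算的话,可以临时指定 TZ 变量,示意如下:
Linux:~ # TZ=UTC date -d '1970-01-01 00:00:00' +%s
0
Linux:~ # TZ=UTC date -d '@0'
Thu Jan  1 00:00:00 UTC 1970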
pdsh是一个多线程远程shell客户机,它在多个远程主机上并行执行命令pdsh可以使用几种不同的远程shell服务,包括标准的 rsh、Kerberos IV 和 ssh在使用pdsh之前,必须保证本地主机和要管理远程主机之间的单向信任pdsh还附带了pdcp命令,该命令可以将本地文件批量复制到远程的多台主机上,这在大规模的文件分发环境下非常有用github:https://github.com/grondo/pdsh安装CentOS系列可以使用yum安装,pdsh需要epel源Linux:~ # wget https://github.com/grondo/pdsh/archive/pdsh-2.31.tar.gz Linux:~ # tar xf pdsh-2.31.tar.gz -C /usr/local/src/ Linux:~ # cd /usr/local/src/pdsh-pdsh-2.31/ Linux:/usr/local/src/pdsh-pdsh-2.31 # ./configure \ --prefix=/usr/local/pdsh \ --with-ssh \ --with-machines=/usr/local/pdsh/machines \ --with-dshgroups=/usr/local/pdsh/group \ --with-rcmd-rank-list=ssh \ --with-exec && \ make && \ make install--with-ssh ssh模块(支持ssh)--with-rcmd-rank-list=ssh 指定默认模式为ssh--with-dshgroups= 指定默认主机组路径--with-machines= 指定默认主机列表在该文件中写入主机地址(或主机名,需要在hosts中写好主机解析),每行一个存在machines文件,使用pdsh执行时若不指定主机,则默认对machines文件中所有主机执行该命令--with-exec exec模块其他模块参数可以在pdsh-pdsh-2.31目录下使用 ./configure --help 命令查看Linux:~ # ll /usr/local/pdsh/bin/ total 516 -rwxr-xr-x 1 root root 8638 Jan 29 22:15 dshbak -rwxr-xr-x 1 root root 171664 Jan 29 22:15 pdcp -rwxr-xr-x 1 root root 171664 Jan 29 22:15 pdsh -rwxr-xr-x 1 root root 171664 Jan 29 22:15 rpdcp Linux:~ # echo 'export PATH=/usr/local/pdsh/bin:$PATH' >> /etc/profile Linux:~ # source /etc/profile "将pdsh的所有命令追加到环境变量中" Linux:~ # pdsh -V pdsh-2.31 rcmd modules: ssh,rsh,exec (default: ssh) misc modules: machines,dshgroup使用语法:pdsh <参数> <需要并行执行的命令>如果只输入前面两部分,回车后可进入pdsh交互式命令行(若是编译安装需要启用--with-readline),再输入并行执行的命令部分常用参数:-w 指定主机 -x 排除指定的主机目标主机可以使用Ip地址或主机名(确保该主机名已经在/etc/hosts中存在解析)多个主机之间可以使用逗号分隔,可重复使用该参数指定多个主机;可以使用简单的正则-g 指定主机组 -G 排除指定主机组-l 目标主机的用户名如果不指定用户名,默认以当前用户名作为在目标主机上执行命令的用户名-N 用来关闭目标主机所返回值前的主机名显示示例-w 指定主机Linux:~ # pdsh -w ssh:192.168.72.12,192.168.72.13,192.168.72.14 date 192.168.72.12: Sun Jan 31 12:35:36 CST 2021 192.168.72.14: Sun Jan 31 12:35:36 CST 2021 192.168.72.13: Sun Jan 31 12:35:36 CST 2021 "pdsh -w ssh:192.168.72.[12-14] date 也可以"-l 指定用户Linux:~ # pdsh -w ssh:192.168.72.[12-14] -l linux date 192.168.72.12: Sun Jan 31 12:36:32 CST 2021 192.168.72.13: Sun Jan 31 12:36:32 CST 2021 192.168.72.14: Sun Jan 31 12:36:32 CST 2021-g指定用户组Linux:~ # mkdir /usr/local/pdsh/group Linux:~ # cat > /usr/local/pdsh/group/test1 <<EOF 192.168.72.12 192.168.72.13 192.168.72.14 Linux:~ # pdsh -g test1 'uname -r' 192.168.72.12: 4.4.73-5-default 192.168.72.14: 4.4.73-5-default 192.168.72.13: 4.4.73-5-default主机列表Linux:~ # cat > /usr/local/pdsh/machines <<EOF 192.168.72.12 192.168.72.13 192.168.72.14 Linux:~ # pdsh -a uptime 192.168.72.12: 12:37pm up 0:08, 2 users, load average: 0.08, 0.13, 0.09 192.168.72.13: 12:37pm up 0:07, 1 user, load average: 0.12, 0.05, 0.01 192.168.72.14: 12:37pm up 0:07, 1 user, load average: 0.00, 0.01, 0.00交互式界面"有exec模块即可,或者readline模块" Linux:~ # pdsh -a pdsh> date 192.168.72.14: Sun Jan 31 12:38:05 CST 2021 192.168.72.13: Sun Jan 31 12:38:05 CST 2021 192.168.72.12: Sun Jan 31 12:38:05 CST 2021 pdsh> whoami 192.168.72.12: root 192.168.72.14: root 192.168.72.13: root pdsh> exit "退出交互式界面"
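开头提到的 pdcp(批量分发文件)和安装目录里的 dshbak(整理输出)也很常用,这里给两个简单示意(主机范围沿用上文,分发的文件只是举例;注意 pdcp 一般要求远端主机上也安装了 pdcp):
Linux:~ # pdcp -w ssh:192.168.72.[12-14] /etc/hosts /etc/hosts    # 把本机 /etc/hosts 分发到三台主机的同名路径
Linux:~ # pdsh -a 'uname -r' | dshbak -c                          # 输出相同的主机合并显示,结果更直观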
Dockerfile 之 volume定义镜像启动时容器内需要持久化的路径docker run 之 -v 参数启动镜像时指定需要持久化的路径乍一看,没啥区别,请听我一一道来敲黑板如果 Dockerfile 内指定了 volume,并且 docker run -v 参数指向了和 volume 配置的路径一致时,-v 参数会将宿主机路径下的文件覆盖掉 volume 配置的路径下的文件如果 Dockerfile 内指定了 volume,并且 docker run -v 参数没有指向和 volume 配置的路径一致时,-v 参数会将容器内的文件映射到宿主机上,而 Dockerfile 指定的 volume 仍然被指向 docker 数据存储路径下的 volumes 路径下如何确定镜像内是否有指定 volumedocker inspect <镜像id> | grep Volumes -A 1如果有指定 volume ,则会返回容器内的路径,例如: "Volumes": { "/var/log": {} "Volumes": { "/var/log": {}如果没有指定 volume ,则会返回如下内容 "Volumes": null, "WorkingDir": "", "Volumes": null, "WorkingDir": "",实践出真知docker 安装时,默认的数据存储路径在 /var/lib/docker ,如果自己的 docker 在安装时,设置了数据存储路径,需要替换掉文档内展示的 /var/lib/docker ,否则会找不到目录准备一个 DockerfileFROM centos:7 VOLUME /var/log RUN echo '/usr/bin/sleep 315360000' > start.sh && \ chmod +x start.sh CMD ["/usr/bin/bash","start.sh"]生成新的镜像docker build -t centos:volume_test .有 volume 参数,docker run 时不加 -v 参数启动带有 volume 的镜像docker run -d --name test_volume centos:volume_test查看 volume 默认挂载的路径docker inspect test_volume | grep Source有一个单独的 volumes 目录来提供给容器使用"Source": "/var/lib/docker/volumes/05e5ce2c4e8fbdd13313c2ea643bb3a6732308085686fb58a2a885872fe54b88/_data",查看目录下的文件ls -l /var/lib/docker/volumes/05e5ce2c4e8fbdd13313c2ea643bb3a6732308085686fb58a2a885872fe54b88/_data可以看到,默认的 centos 容器的 /var/log 目录下有这些内容,等下后面我们验证一下 -v 参数的小细节,验证是否会覆盖容器内的内容total 40 -rw------- 1 root utmp 0 Nov 13 2020 btmp -rw-r--r-- 1 root root 193 Nov 13 2020 grubby_prune_debug -rw-r--r-- 1 root root 23944 Nov 13 2020 lastlog -rw------- 1 root root 5248 Nov 13 2020 tallylog -rw-rw-r-- 1 root utmp 0 Nov 13 2020 wtmp -rw------- 1 root root 1430 Nov 13 2020 yum.log删除容器,查看目录是否还存在下面的方式比较暴力,练习环境无所谓,重要环境需要 one two three 然后再 go [ 三思而后行 ]docker rm -f test_volume查看之前的 volume 映射的路径,可以看到,文件信息都还存在ls -l /var/lib/docker/volumes/05e5ce2c4e8fbdd13313c2ea643bb3a6732308085686fb58a2a885872fe54b88/_data有 volume 参数,docker run 时加 -v 参数-v 挂载的路径和 volume 的路径一致时启动带有 volume 的镜像mkdir -p /data/log docker run -d -v /data/log:/var/log --name test_volume_v centos:volume_test造点数据,方便验证echo 'test' > /data/log/test.log再次查看,可以看到,映射到我们指定的路径了docker inspect test_volume_v | grep Source验证一下我们之前的敲黑板docker exec -it test_volume_v ls -l /var/log此时,只剩下我们前面造的数据了,之前的那些文件都被宿主机的 /data/log 给覆盖了total 4 -rw-r--r-- 1 root root 5 Jul 24 06:36 test.log-v 挂载的路径和 volume 的路径不一致时启动容器之前,我们先去查看 /var/lib/docker/volume 目录下的情况ls -l /var/lib/docker/volumes/因为我的是新环境,所以目录下很干净,只有一个前面实验生成的数据目录total 24 drwx-----x 3 root root 19 Jul 24 14:42 05e5ce2c4e8fbdd13313c2ea643bb3a6732308085686fb58a2a885872fe54b88 brw------- 1 root root 253, 0 Jul 24 14:18 backingFsBlockDev drwx-----x 3 root root 19 Jul 24 14:34 log -rw------- 1 root root 32768 Jul 24 14:42 metadata.db启动一个新的容器,这里图省事,就挂在给容器内的 /etc 目录了,实际生产不可取!!!mkdir -p /data/etc docker run -d -v /data/etc:/etc --name test_volume_vo centos:volume_test查看挂载的路径docker inspect test_volume_vo | grep Source可以看到,有两个路径,一个是 -v 参数挂载的,一个是 volume 指定的"Source": "/data/etc", "Source": "/var/lib/docker/volumes/364e98d44e8a3a4f5bef17a8206e3b3920db85113d45f1d7db6485a932d1c1bf/_data",可以看到,和第一个实验展现的结果是一样的ls -l /var/lib/docker/volumes/364e98d44e8a3a4f5bef17a8206e3b3920db85113d45f1d7db6485a932d1c1bf/_data查看容器内的 /etc 目录docker exec -it test_volume_vo ls -l /etc可以看到有三个文件 [ 如果本地造了数据,也会出现在容器内,容器内的文件和宿主机文件是共存的 ]total 12 -rw-r--r-- 1 0 0 13 Jul 24 06:51 hostname -rw-r--r-- 1 0 0 174 Jul 24 06:51 hosts -rw-r--r-- 1 0 0 73 Jul 24 06:51 resolv.conf看一下宿主机的路径,验证文件是否和容器内的一致ls -l /data/etc/删除容器,验证两个不同的目录映射的数据是否都还存在下面同样是暴力执法,未成年请在家长的陪同下验证docker rm -f test_volume_vo最终的结果就不用多展示了,两个数据映射目录都是存在的总结书写 Dockerfile 
的时候,最好提前考虑是否有数据持久化的场景,是否需要配置 volume 来保证数据安全;启动一个容器之前,也最好先用 docker inspect 确认镜像是否配置了 volume,避免 -v 参数把容器内已有的文件覆盖掉
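另外补充一点:除了 Dockerfile 里的 VOLUME(匿名卷)和 -v 绑定宿主机目录,还可以使用命名卷。命名卷第一次挂载到容器内已有文件的目录时,docker 会把镜像里该目录的内容复制进卷里,不会出现绑定目录把容器内文件"盖掉"的情况。一个简单示意(镜像沿用上文构建的 centos:volume_test,卷名只是举例):
docker volume create log_data
docker run -d -v log_data:/var/log --name test_volume_named centos:volume_test
docker volume inspect log_data                        # 查看卷在宿主机上的实际存储路径
docker exec -it test_volume_named ls -l /var/log      # 镜像里原有的 btmp、lastlog 等文件都还在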
浅言碎语什么叫 Docker-ComposeDocker-Compose 项目是 Docker 官方的开源项目,负责实现对 Docker 容器集群的快速编排Docker-Compose 将所管理的容器分为三层,分别是:工程(project)服务(service)容器(container)Docker-Compose 运行目录下的所有文件(Docker-Compose.yml,extends 文件或环境变量文件等)组成一个工程,若无特殊指定工程名即为当前目录名一个工程当中可包含多个服务,每个服务中定义了容器运行的镜像,参数,依赖一个服务当中可包括多个容器实例Docker-Compose 并没有解决负载均衡的问题,因此需要借助其它工具实现服务发现及负载均衡Docker-Compose 的工程配置文件默认为 Docker-Compose.yml可通过环境变量 COMPOSE_FILE 或 -f 参数自定义配置文件,其定义了多个有依赖关系的服务及每个服务运行的容器Docker-Compose 允许用户通过一个单独的 Docker-Compose.yml 模板文件(YAML 格式)来定义一组相关联的应用容器为一个项目(project)Docker-Compose 项目由 Python 编写,调用 Docker 服务提供的 API 来对容器进行管理因此,只要所操作的平台支持 Docker API,就可以在其上利用 Compose 来进行编排管理请给我一个 Docker-Composecompose github官方提供的安装方法yum 安装yum install -y epel-release && \ yum install -y docker-composepip 安装yum install -y epel-release && \ yum install -y python-pip pip install --upgrade pip pip install docker-compose二进制文件curl -L "https://github.com/docker/compose/releases/download/1.24.1/docker-compose-$(uname -s)-$(uname -m)" \ -o /usr/local/bin/docker-compose chmod +x /usr/local/bin/docker-compose安装完成后,可以用 docker-compose -version 命令来验证是否安装成功Docker-Compose 常用命令docker-compose 命令只能管理 docker-compose 启动的容器,无法管理 docker run 启动的容器docker-compose -h 可以查看帮助文档Define and run multi-container applications with Docker. Usage: # [options] 类型的参数,必须出现在 [COMMAND] 类型的参数前面 # [COMMAND] 类型的参数默认是找当前所在路径下的 docker-compose.yaml 文件 ## 如果想要在任何路径执行 [COMMAND] 类型的参数 ## 需要加上 -f 参数指定 docker-compose.yaml 文件的路径 docker-compose [-f <arg>...] [options] [COMMAND] [ARGS...] docker-compose -h|--help Options: # 指定 Compose 模板文件,默认为 docker-compose.yml,可以多次指定 -f, --file FILE Specify an alternate compose file (default: docker-compose.yml) # 指定项目名称,默认将使用所在目录名称作为项目名 -p, --project-name NAME Specify an alternate project name (default: directory name) # 输出更多调试信息 --verbose Show more output # 不要打印ANSI控制字符 --no-ansi Do not print ANSI control characters # 打印版本并退出 -v, --version Print version and exit # 要连接到的守护进程套接字 -H, --host HOST Daemon socket to connect to # 使用 tls 证书,需要包含 --tlsverify 参数 --tls Use TLS; implied by --tlsverify # CA 证书路径 --tlscacert CA_PATH Trust certs signed only by this CA # TLS证书文件的路径 --tlscert CLIENT_CERT_PATH Path to TLS certificate file # TLS密钥文件的路径 --tlskey TLS_KEY_PATH Path to TLS key file # 使用TLS并验证远程 --tlsverify Use TLS and verify the remote # 不要根据客户端证书中指定的名称检查守护程序的主机名(例如,docker 主机是IP地址) --skip-hostname-check Don't check the daemon's hostname against the name specified in the client certificate (for example if your docker host is an IP address) # 指定工作目录,默认为 Compose 所在目录 --project-directory PATH Specify an alternate working directory (default: the path of the Compose file) Commands: # [COMMAND] 类型的参数默认针对 docker-compose.yaml 文件内的所有容器执行操作 ## 如果需要针对某个指定的容器操作,可以在 [COMMAND] 类型的参数后面加上指定的容器名称 build Build or rebuild services bundle Generate a Docker bundle from the Compose file config Validate and view the Compose file create Create services down Stop and remove containers, networks, images, and volumes events Receive real time events from containers exec Execute a command in a running container help Get help on a command images List images kill Kill containers logs View output from containers pause Pause services port Print the public port for a port binding ps List containers pull Pull service images push Push service images restart Restart services rm Remove stopped containers run Run a one-off command scale Set number of containers for a service start Start services stop Stop services top Display the running processes unpause Unpause services up Create and start containers version Show the 
Docker-Compose version informationdocker-compose up创建并启动容器Usage: up [options] [--scale SERVICE=NUM...] [SERVICE...] Options: # 在后台运行容器,打印容器名称,不能和 --abort-on-container-exit 以及 --timeout 同时使用 -d Detached mode: Run containers in the background, print new container names. Incompatible with --abort-on-container-exit and --timeout. # 不使用颜色来区分不同的服务的控制输出 --no-color Produce monochrome output. # 不启动服务所链接的容器 --no-deps Don't start linked services. # 强制重新创建容器,不能与 -–no-recreate 同时使用 --force-recreate Recreate containers even if their configuration and image haven't changed. Incompatible with --no-recreate. # 如果容器已经存在,则不重新创建,不能与 -–force-recreate 同时使用 --no-recreate If containers already exist, don't recreate them. Incompatible with --force-recreate. # 不自动构建缺失的服务镜像 --no-build Don't build an image, even if it's missing. # 创建服务后不要启动它们 --no-start Don't start the services after creating them. # 在启动容器前构建服务镜像 --build Build images before starting containers. # 如果任何一个容器被停止,就停止所有容器。不能与 -d 同时使用 --abort-on-container-exit Stops all containers if any container was stopped. Incompatible with -d. # 停止容器时的超时间[默认单位:秒](默认为10秒)。不能与 -d 同时使用 -t, --timeout TIMEOUT Use this timeout in seconds for container shutdown when attached or when containers are already. Incompatible with -d. running. (default: 10) # 删除服务中没有在 compose 文件中定义的容器 --remove-orphans Remove containers for services not defined in the Compose file # 返回所选服务容器的退出代码。不能与 --abort-on-container-exit 同时使用 --exit-code-from SERVICE Return the exit code of the selected service container. Implies --abort-on-container-exit. # 设置服务运行容器的个数,将覆盖在 compose 中通过 scale 指定的参数 --scale SERVICE=NUM Scale SERVICE to NUM instances. Overrides the `scale` setting in the Compose file if present.docker-compose create创建容器,但不运行容器docker-compose create 要被弃用了,官方建议可以使用 docker-compose up --no-startUsage: create [options] [SERVICE...] Options: # 强制重新创建容器,不能与 -–no-recreate 同时使用 --force-recreate Recreate containers even if their configuration and image haven't changed. Incompatible with --no-recreate. # 如果容器已经存在,不需要重新创建,不能与 -–force-recreate 同时使用 --no-recreate If containers already exist, don't recreate them. Incompatible with --force-recreate. # 不创建镜像,即使缺失 --no-build Don't build an image, even if it's missing. # 创建容器前,生成镜像 --build Build images before creating containers.docker-compose scale设置指定名称的容器启动的数量docker-compose scale 要被弃用了,官方建议可以使用 docker-compose up --scale docker-compose down停止并删除容器、网络、卷、镜像Usage: down [options] Options: # 删除镜像 ## all 删除 compose 文件中定义的所有镜像 ## local 删除镜像名为空的镜像 --rmi type Remove images. Type must be one of: 'all': Remove all images used by any service. 'local': Remove only images that don't have a custom tag set by the `image` field. # 删除已经在 compose 文件中定义的和匿名的附在容器上的数据卷 -v, --volumes Remove named volumes declared in the `volumes` section of the Compose file and anonymous volumes attached to containers. # 删除服务中没有在 compose 中定义的容器 --remove-orphans Remove containers for services not defined in the Compose file # 停止容器时的超时间[默认单位:秒](默认为10秒)。 -t, --timeout TIMEOUT Specify a shutdown timeout in seconds. (default: 10)docker-compose build构建或重构容器需要 docker-compose.yml 文件中使用了 build 选项容器一旦构建后,将会带上一个标记名。可以随时在项目目录下运行 docker-compose build 来重新构建Usage: build [options] [--build-arg key=val...] [SERVICE...] Options: # 删除构建过程中的临时容器 --force-rm Always remove intermediate containers. # 构建镜像过程中不使用缓存 --no-cache Do not use cache when building the image. # 始终尝试通过拉取操作来获取更新版本的镜像 --pull Always attempt to pull a newer version of the image. # 为构建的容器设置内存大小 -m, --memory MEM Sets memory limit for the bulid container. 
# 为服务设置 build-time 变量 --build-arg key=val Set build-time variables for one service.dokcer-compose config验证并查看 compose 文件语法格式Usage: config [options] Options: # 将镜像标签标记为摘要 --resolve-image-digests Pin image tags to digests. # 只验证配置,不输出。 ## 当配置正确时,不输出任何内容 ## 当文件配置错误,输出错误信息 -q, --quiet Only validate the configuration, don't print anything. # 打印服务名,一行一个 --services Print the service names, one per line. # 打印数据卷名,一行一个 --volumes Print the volume names, one per line.docker-compose pull拉取 docker-compose.yaml 文件内的镜像Usage: pull [options] [SERVICE...] Options: # 忽略拉取镜像过程中的错误 --ignore-pull-failures Pull what it can and ignores images with pull failures. # 多个镜像同时拉取 --parallel Pull multiple images in parallel. # 拉取镜像过程中不打印进度信息 --quiet Pull without printing progress informationdocker-compose push推送 docker-compose.yaml 文件内的镜像Usage: push [options] [SERVICE...] Options: # 忽略推送镜像过程中的错误 --ignore-push-failures Push what it can and ignores images with push failures.docker-compose top查看正在运行的 compose 项目进程可以看到启动的容器名称(web1、web2)和相关的进程信息(UID、PID、PPID、C、STIME、TTY 、TIME、CMD)web1 UID PID PPID C STIME TTY TIME CMD -------------------------------------------------------------------------------------------- root 7658 7605 0 14:57 ? 00:00:00 nginx: master process nginx -g daemon off; 101 7851 7658 0 14:57 ? 00:00:00 nginx: worker process 101 7852 7658 0 14:57 ? 00:00:00 nginx: worker process UID PID PPID C STIME TTY TIME CMD -------------------------------------------------------------------------------------------- root 7649 7571 0 14:57 ? 00:00:00 nginx: master process nginx -g daemon off; 101 7864 7649 0 14:57 ? 00:00:00 nginx: worker process 101 7865 7649 0 14:57 ? 00:00:00 nginx: worker processdocker-compose ps列出项目中的所有容器docker-compose stop停止正在运行的容器Usage: stop [options] [SERVICE...] Options: # 停止容器时的超时间[默认单位:秒](默认为10秒)。 -t, --timeout TIMEOUT Specify a shutdown timeout in seconds. (default: 10)docker-compose start启动已经存在的容器docker-compose restart重启已经存在的容器docker-compose kill强制停止容器Usage: kill [options] [SERVICE...] Options: # 通过发送SIGKILL信号来强制停止服务容器 -s SIGNAL SIGNAL to send to the container. Default signal is SIGKILL.docker-compose pause暂停一个容器docker-compose unpause恢复暂停状态的容器docker-compose rm删除 stoped 状态下的容器docker-compose kill 和 docker-compose stop 命令执行过的容器,都可以被 docker-compose rm 命令删除Usage: rm [options] [SERVICE...] Options: # 强制删除,不需要用户确认 -f, --force Don't ask to confirm removal # 如果需要,在移除之前停止容器 -s, --stop Stop the containers, if required, before removing # 删除容器所挂载的数据卷 -v Remove any anonymous volumes attached to containers # 已弃用 - 无效参数 -a, --all Deprecated - no effect.删除一个非 stopped 状态的容器,会返回 No stopped containersdocker-compose logs查看容器的输出Usage: logs [options] [SERVICE...] Options: # 默认不同的容器用不同的颜色区分,可以选择不区分颜色 --no-color Produce monochrome output. # 动态加载 -f, --follow Follow log output. # 显示时间戳 -t, --timestamps Show timestamps. # 看尾部指定行数的输出 --tail="all" Number of lines to show from the end of the logs for each container.docker-compose run对服务运行一次性命令Usage: run [options] [-v VOLUME...] [-p PORT...] [-e KEY=VAL...] [-l KEY=VALUE...] SERVICE [COMMAND] [ARGS...] Options: # 分离模式:在后台运行容器,打印新的容器名称 -d Detached mode: Run container in the background, print new container name. # 给 run 的容器分配一个名字 --name NAME Assign a name to the container # 覆盖镜像的 entrypoint --entrypoint CMD Override the entrypoint of the image. 
# 设置环境变量 (可多次使用) -e KEY=VAL Set an environment variable (can be used multiple times) # 增加或覆盖一个 label (可多次使用) -l, --label KEY=VAL Add or override a label (can be used multiple times) # 指定一个用户名或者 uid 执行 run -u, --user="" Run as specified username or uid # 不启动关联的服务 --no-deps Don't start linked services. # run 执行完成后删除 run 的镜像 (分离模式下被忽略) --rm Remove container after run. Ignored in detached mode. # 端口映射 -p, --publish=[] Publish a container's port(s) to the host # 在启用服务端口并映射到主机的情况下运行命令 --service-ports Run command with the service's ports enabled and mapped to the host. # 挂载卷 -v, --volume=[] Bind mount a volume (default []) # 禁用伪tty分配。默认情况下,“docker compose run”分配TTY -T Disable pseudo-tty allocation. By default `docker-compose run` allocates a TTY. # 容器内的操作路径 -w, --workdir="" Working directory inside the containerdocker-compose exec进入指定名称的容器Usage: exec [options] [-e KEY=VAL...] SERVICE COMMAND [ARGS...] Options: # 分离模式:在后台运行命令 -d Detached mode: Run command in the background. # 为进程授予扩展权限 --privileged Give extended privileges to the process. # 使用指定的用户执行命令 -u, --user USER Run the command as this user. # 禁用伪tty分配。默认情况下,“docker compose run”分配TTY -T Disable pseudo-tty allocation. By default `docker-compose exec` allocates a TTY. # 如果服务有多个实例,使用 --index 指定实例 [默认:1] --index=index index of the container if there are multiple instances of a service [default: 1] # 设置环境变量(可以多次使用,API < 1.25 不支持) -e, --env KEY=VAL Set environment variables (can be used multiple times, not supported in API < 1.25)docker-compose port查看容器内指定端口映射了宿主机的哪个端口Usage: port [options] SERVICE PRIVATE_PORT Options: # 指定协议, tcp 或者 udp [默认:tcp] --protocol=proto tcp or udp [default: tcp] # 如果服务有多个实例,使用 --index 指定实例 [默认:1] --index=index index of the container if there are multiple instances of a service [default: 1]docker-compose version查看 docker-compose 命令的详细版本docker-compose version 1.18.0, build 8dd22a9 docker-py version: 2.6.1 CPython version: 3.6.8 OpenSSL version: OpenSSL 1.0.2k-fips 26 Jan 2017Docker-Compose 编排文件YAML 的布尔值(true, false, yes, no, on, off)必须要使用引号引起来(单引号、双引号均可),否则会当成字符串解析官方建议 docker-compose.yml 文件内的路径使用相对路径,官方认为这样可移植性会更好具体其实还是根据实际的规划来决定使用相对路径还是绝对路径Docker-Compose 标准模板文件应该包含三大部分versionservicesnetworks 最关键的是 services 和 networks 两个部分docker-compose.yaml 模板# 指定使用的格式版本 version: '3' # 定义服务 services: # 服务的名称(容器启动后的名称) # 使用的镜像 image: dockercloud/hello-world # 端口映射 ports: - 8080:80 # 使用的网络名称 networks: - front-tier - back-tier image: dockercloud/haproxy ports: - 80:80 # 链接到其他服务的容器中 links: - web networks: - front-tier - back-tier # 目录持久化 volumes: - /var/run/docker.sock:/var/run/docker.sock # 定义网络 networks: # 网络名称 front-tier: # 网络模式 driver: bridge back-tier: driver: bridgeversiondocker-compose.yaml 文件格式的版本version:'1' (已弃用)Compose 1.6.x 以下的版本可以使用 version: '1'version: '1' 不能申明 volumes、networks、构建参数version: '1' 默认每个容器都是 bridge 网络只能通过容器内的 ip 访问容器之间的服务发现需要用 link 模块version:'2'Compose 1.6.x 以上,Docker 1.10.0 以上的版本可以使用 version:'2'version: '2' 可以申明 volumes、networks、构建参数version:'3'Compose 1.10.x 以上,Docker 1.13.0 以上的版本可以使用 version:'3'为了在 Compose 和 Docker 引擎的swarm 模式之间交叉兼容 ,version:'3' 删除了一些选项,增加了更多的选项不同版本的 version 对应的 Docker 版本,详细的历史,可以看官方文档docker 官方关于 version 的解释编写文件格式Docker 引擎发布撰写规范19.03.0+3.819.03.0+3.718.06.0+3.618.02.0+3.517.12.0+3.417.09.0+3.317.06.0+3.217.04.0+3.11.13.1+3.01.13.0+2.417.12.0+2.317.06.0+2.21.13.0+2.11.12.0+2.01.10.0+image指定服务的镜像名称或镜像ID如果镜像在本地不存在,Compose 将会尝试拉取镜像services: image: dockercloud/hello-worldbuild服务除了可以基于指定的镜像,还可以基于一份Dockerfile在使用up启动时执行构建任务,构建标签是 build可以指定 Dockerfile 文件所在的路径Compose 将会利用 Dockerfile 
自动构建镜像,然后使用构建的镜像启动服务容器Dockerfile 文件的名称必须是 Dockerfile一个服务里面,image 和 buid 只能存在一个如果都存在,Compose 会构建镜像并且把镜像命名为 image 指向的那个名字build 生成的镜像名称格式:<项目名称_服务名称>:latest如果需要对于生成的镜像做统一管理,可以配合 image 选择来定义生成的镜像名称build: /path/to/build/dircontextcontext 选项可以是 Dockerfile 的存放路径,也可以是到链接到 git仓库的 urlcontext 选项默认使用指定路径下以 Dockerfile 命名的 Dockerfile 文件build: context: /path/to/build/dirdockerfile自定义 Dockerfile 文件名称,需要使用 dockerfile 选项使用 dockerfile 选项来构建,必须指定构建路径(context)dockerfile 指令不能跟 image 同时使用,否则 Compose 将不确定根据哪个指令来生成最终的服务镜像build: context: /path/to/build/dir # 名称要 context 指向的路径下存在 dockerfile: Dockerfile-buildargs与 Dockerfile 的 ARG 一样,在构建前后使用相关的可以看我另一篇博客:Dockerfile 从入门到放弃有两种写法,都支持build: context: /path/to/build/dir dockerfile: Dockerfile-build args: os_version: 7 use_user: workbuild: context: /path/to/build/dir dockerfile: Dockerfile-build args: - os_version=7 - use_user=workcommand覆盖容器启动后默认执行的命令有两种写法,都支持version: '3' services: nginx: image: nginx:mainline-alpine command: nginx -g 'daemon off;'version: '3' services: nginx: image: nginx:mainline-alpine command: [nginx, -g, 'daemon off;']container_name指定容器名称compose 的容器名称默认格式:<项目名称_服务名称_序号>需要自定义容器名称时,可以使用 container_nameversion: '3' services: nginx: image: nginx:mainline-alpine container_name: web_staticlinks链接到其它服务中的容器links: # 服务名称 - web # 服务名称:别名 - web:staticlinks 会在容器内创建 hosts 解析172.17.2.186 web 172.17.2.186 staticdepends_on解决容器之间的依赖和启动顺序只有在 redis 和 db 启动的情况下,才会启动 webversion: '3' services: depends_on: - redis redis: image: redis image: mariadbrestart定义容器终止时使用的策略# 任何情况下都不重启(默认策略) restart: "no" # 始终重启,直到容器被删除 restart: always # 退出代码提示错误时重启 restart: on-failure # 不在乎退出代码都会重启,直到服务停止或删除 restart: unless-stoppedpull_policy镜像拉取策略# 始终都会拉取 pull_policy: always # 从不拉取,如果本地不存在则报错 pull_policy: never # 本地不存在时拉取 pull_policy: missing # 如果已存在,重构镜像 pull_policy: buildulimits覆盖容器内默认的 ulimits 参数ulimits: nproc: 65535 nofile: soft: 20000 hard: 40000user容器运行时使用的用户默认是镜像指定的用户,没有配置的情况下,默认使用 root 用户extra_hosts给容器添加 hosts 解析 [只有 linux 可以]格式必须是:<域名>:<IP>extra_hosts: - "somehost:162.242.195.82" - "otherhost:50.31.209.229"external_links链接 docker-compose.yaml 以外的容器,比如一些单独使用 docker run 命令启动的容器链接的外部容器,必须和当前 docker-compose.yaml 的容器处于同一个网络中比如:docker-compose.yaml 内定义了使用名为 net-web 的网络,链接的外部容器 --net 参数也必须时 net-webexternal_links: # redis 为外包容器的 NAME - redis # database 为外包容器的 NAME # mysql 为服务别名 - database:mysqlpid设置容器 pid 模式为主机 pid 模式和宿主机共享进程命名空间,容器使用 pid 标签能够访问和操纵其他容器和宿主机的命名空间pid: "host"ports用于映射端口使用 <宿主机端口>:<容器内端口> 的格式(类似 docker run -p)或者单独指定容器内的端口,宿主机将会随机映射端口 (类似 docker run -P)ports: - "8080" - "80:8080"volumes挂载一个目录或者一个已存在的数据卷容器如果挂在的目录是给单一的服务使用,只需要在对应的服务部分使用 volumes如果是多个服务公用的目录,需要在顶级 volumes 中申明一个卷宿主机路径可以是相对路径services: backend: image: awesome/backend volumes: # type: 挂载类型 'volume'、'bind'、'tmpfs'、'npipe' - type: volume # 挂载的来源(宿主机路径,或者下方顶级 volumes 定义的卷),不适用于 tmpfs 挂载 source: db-data # 映射到容器内的路径 target: /data # 配置额外的选项 volume: # 禁止从容器复制数据 nocopy: true - type: bind source: /var/run/postgres/postgres.sock target: /var/run/postgres/postgres.sock # 定义一个名为 db-data 的卷,可以给多个服务挂载 volumes: db-data:可以直接使用 <宿主机路径>:<容器内路径> 或者 <宿主机路径>:<容器内路径>:<访问模式>访问模式:rw:可读可写(默认的权限)ro:只读模式(read only)z: SELinux 选项表示绑定挂载主机内容在多个容器之间共享Z: SELinux 选项表示绑定挂载主机内容是私有的,对其他容器不共享在没有 SELinux 的平台上,SELinux 重新标记绑定挂载选项会被忽略。volumes: # 使用绝对路径挂载数据卷 - /opt/data:/var/lib/mysql # 使用绝对路径挂载数据卷,并配置访问模式 - ~/configs:/etc/configs/:rovolumes_from从另一个服务或者容器挂载数据卷volumes_from: # 指定服务 - service_name # 指定服务,并配置访问模式 - service_name:ro # container: 是固定格式, container_name 指定外部容器的名称 - container:container_name # 指定外部容器,并配置访问模式 - container:container_name:rwdns配置 dnsdns: - 8.8.8.8 - 
114.114.114.114dns_searchdns 搜索域dns_search: - dc1.example.com - dc2.example.comentrypoint指定容器运行时执行的命令,会覆盖 Dockerfile 的 ENTRYPOINTentrypoint: - php - zend_extension=/usr/local/lib/php/extensions/no-debug-non-zts-20100525/xdebug.so - memory_limit=-1 - vendor/bin/phpunitenv_file以文件的形式,在构建的时候将变量写入到容器内的 env 里面env_file: - ./common.env - ./apps/web.env - /opt/secrets.envenv_file 格式每一行必须是 变量[=[变量值]]# 和 空格 表示注释只写 变量,没有=变量值,表示 unset 取消变量# Set Rails/Rack environment RACK_ENV=development VAR="quoted"environment定义容器内的环境变量environment: - RACK_ENV=development - SHOW=true如果同时设置了 env_file 和 environment ,以 environment 的为准devices指定设备映射关系devices: - "/dev/ttyUSB1:/dev/ttyUSB0expose只能指定容器内暴露的端口expose: - "3000" - "8000"extends调用其他模板文件有一个基础模板,名称为 common.ymlwebapp: build: ./webapp environment: - DEBUG=false - SEND_EMAILS=false进行调用web: extends: file: common.yml service: webapp ports: - "8000:8000" links: environment: - DEBUG=true image: mysqlextends 使用限制要避免出现循环依赖extends 不会继承 links 和 volumes_from 中定义的容器和数据卷资源推荐在基础模板中只定义一些可以共享的镜像和环境变量在扩展模板中具体指定应用变量、链接、数据卷等信息labels为容器添加元数据labels: - "com.example.description=Accounting webapp" - "com.example.department=Finance" - "com.example.label-with-empty-value"logging配置日志服务logging: driver: syslog options: syslog-address: "tcp://192.168.0.42:123"network_mode指定容器的网络模式# 和宿主机相同的模式 network_mode: "host" # 禁用网络 network_mode: "none" # 只能访问指定的服务 network_mode: "service:[service name]"networks定义服务容器使用的网络配置,引用顶级 networks 的配置services: frontend: image: awesome/webapp networks: - front-tier - back-tier monitoring: image: awesome/monitoring networks: - admin backend: image: awesome/backend networks: back-tier: aliases: - database admin: # 声明网络上 admin 服务的备用主机名为 mysql # 同一网络上的其他容器可以使用服务名称或此别名连接到服务的容器 aliases: - mysql # 顶级 networks networks: front-tier: driver: bridge back-tier: admin:静态 ip顶级 networks 配置必须包含一个 ipam 子网配置可以配置 ipv4 或者 ipv6services: frontend: image: awesome/webapp networks: front-tier: ipv4_address: 172.16.238.10 ipv6_address: 2001:3984:3989::10 networks: front-tier: ipam: driver: default config: - subnet: "172.16.238.0/24" - subnet: "2001:3984:3989::/64"未声明顶级networks例如:项目所在目录的名称为:appdocker-compose.yaml 内容如下version: '3' services: image: nginx:latest container_name: web depends_on: ports: - "9090:80" links: image: mysql:5.7 volumes: - /data/db_data:/var/lib/mysql restart: always environment: MYSQL_ROOT_PASSWORD: 1234.com MYSQL_DATABASE: web MYSQL_USER: web MYSQL_PASSWORD: 1234.com执行 docker-compose up -d 命令后,会生成一个名称为 app_default 的网络,可以通过 docker network ls 命令查看resources容器使用的资源闲置limits:容器最高使用资源reservations:容器最低资源要求services: frontend: image: awesome/webapp deploy: resources: limits: # CPU 最多可用核心数 cpus: '0.50' memory: 50M # 限制容器内进程的数量(必须是整数) pids: 1 reservations: # 宿主机最少要有的空闲 cpu 核心数 cpus: '0.25' # 宿主机最少要有的空闲内存大小 memory: 20Mmemory 单位格式:b(bytes)、k 或者 kb、m 或者 mb、g 或者 gb全剧终compose 可配置的选项是相当的多,尤其是 version: '3'甚至可以限制容器使用的磁盘io,配置 cpu cfs 配额等许多许多的功能,具体的,有兴趣的可以参考官方的文档Compose specification还是 Dockerfile 从入门到放弃 里面的那句总结 (留点头发)
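附:一个把常用选项串起来的最小 docker-compose.yml 示意(镜像、端口、数据卷、自定义网络、重启策略都是上文提到过的选项,文件名、路径和环境变量均为举例,按需修改):
version: '3'
services:
  web:
    image: nginx:mainline-alpine
    container_name: demo_web
    restart: unless-stopped
    ports:
      - "8080:80"
    volumes:
      - ./html:/usr/share/nginx/html:ro
    environment:
      - TZ=Asia/Shanghai
    networks:
      - demo-net
networks:
  demo-net:
    driver: bridge
在该文件所在目录执行 docker-compose up -d 启动,docker-compose ps 查看状态,不用了再 docker-compose down 清理。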
Docker Registry 官网Docker Registry 需要 Docker 版本高于等于 1.6.0Registry是一个无状态、高度可扩展的服务器侧应用程序,用于存储和允许您分发Docker镜像内网环境下,可以使用 Docker Registry 来解决k8s集群的镜像拉取问题,当然,公网情况下, Docker Registry 私密性更高,比共有仓库更适合如果需要 Docker Registry 开启认证功能,可以直接看配置 Docker Registry 认证搭建 Docker Registry创建本地映射目录这个目录可以自定义,根据自身实际磁盘空间情况进行创建,将容器内的文件映射到本地,以此来达到持久化的效果# mkdir /var/lib/registry启动 Docker Registrydocker命令中,冒号前面的为本地路径或端口,冒号后面的为容器内部的路径或端口-p:将本地5000端口映射给容器内的5000端口(Docker Registry默认端口),本地端口可以自定义,只要是空闲的端口即可--restart:容器的重启策略--name:启动的容器名称-v:将本地目录映射到容器内的/var/lib/registry目录-d:将容器放到后台运行registry:镜像名,不加tag,默认拉取latest,如果本地不存在,启动容器前,会自动拉取# docker run -p 5000:5000 \ --restart=always \ --name registry \ -v /var/lib/registry:/var/lib/registry \ -d registry配置 Docker Registry# vim /etc/docker/daemon.json注意json语法格式如果重启docker失败,日志有如下输出,表示daemon.json文件的格式有错误,注意最后是否需要加上逗号unable to configure the Docker daemon with file /etc/docker/daemon.json: invalid character '"' after object key:value pair"insecure-registries": ["ip:端口"]重启docker# systemctl daemon-reload # systemctl restart docker配置 Docker Registry 认证创建 Docker Registry 认证文件目录# mkdir /var/lib/registry_auth创建 Docker Registry 认证文件使用 Apache 的 htpasswd 来创建加密文件# yum install -y httpd-tools # htpasswd -Bbn admin admin > /var/lib/registry_auth/htpasswd启动带认证的 Docker RegistryREGISTRY_AUTH=htpasswd # 以 htpasswd 的方式认证REGISTRY_AUTH_HTPASSWD_REALM=Registry Realm # 注册认证REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd # 认证的用户密码# docker run -p 5000:5000 \ --restart=always \ --name registry \ -v /var/lib/registry:/var/lib/registry \ -v /var/lib/registry_auth/:/auth/ \ -e "REGISTRY_AUTH=htpasswd" \ -e "REGISTRY_AUTH_HTPASSWD_REALM=Registry Realm" \ -e "REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd" \ -d registry配置 Docker Registry# vim /etc/docker/daemon.json注意json语法格式如果重启docker失败,日志有如下输出,表示daemon.json文件的格式有错误,注意最后是否需要加上逗号unable to configure the Docker daemon with file /etc/docker/daemon.json: invalid character '"' after object key:value pair"insecure-registries": ["ip:端口"]重启docker# systemctl daemon-reload # systemctl restart docker登录 Docker Registry登录可以是免交互式,也可以是交互式的docker login -u 用户名 -p 密码 ip:端口 # 一般不建议使用明文密码docker login -u 用户名 -p ip:端口 # 不输入密码,回车后,使用交互式输入密码(输入的密码不会显示)docker login ip:端口 # 不输入密码和用户名,回车后,使用交互式输入用户名和密码(输入的密码不会显示)# docker login ip:端口 Username: admin Password: WARNING! Your password will be stored unencrypted in /root/.docker/config.json. Configure a credential helper to remove this warning. See https://docs.docker.com/engine/reference/commandline/login/#credentials-store Login Succeeded退出登录# docker logout ip:端口测试 Docker Registry给镜像打上 Docker Registry 的仓库标签# docker tag centos:7 ip:端口/centos:7将新打标签的镜像上传镜像到仓库# docker push ip:端口/centos:7查看镜像,可以看到我们上传的 centos 7 这个镜像了# curl ip:端口/v2/_catalog -u admin Enter host password for user 'admin': {"repositories":["centos"]}查看镜像包含的tag需要先使用_catalog查看镜像的名称# curl ip:端口/v2/centos/tags/list -u admin Enter host password for user 'admin': {"name":"centos","tags":["8","7","7.1"]}需要先使用_catalog查看镜像的名称# curl ip:端口/v2/centos/tags/list -u admin Enter host password for user 'admin': {"name":"centos","tags":["8","7","7.1"]}
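再补一个从另一台 docker 主机拉取私有仓库镜像的示意,用来验证仓库确实可用(那台机器同样要先在 /etc/docker/daemon.json 里加上 insecure-registries 并重启 docker,ip:端口按实际情况替换):
# docker login ip:端口                 # 开启了认证的话,先登录
# docker pull ip:端口/centos:7         # 从私有仓库拉取镜像
# docker images | grep centos          # 确认镜像已经拉取到本地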
当然,你需要先有一个 VMware,还要有一个 SUSE 的镜像。SUSE 镜像可以直接去官网获取,只需要注册一个 SUSE 账号就可以了;官网下载会有点慢,可以使用迅雷下载。
F2 可以设置安装界面的语言,可以设置为中文;F3 设置安装界面的显示模式(分辨率)。选择 Installation 回车,开始安装。
有网络的情况下,安装程序会去 SUSE 官网下载更新,速度极慢,也容易失败,直接点击 Abort 取消,然后点击 Skip 跳过即可。
Language 选择系统默认语言,最好用 English,中文环境有时候复制报错信息排查会比较尴尬(英语装X他不香嘛)。Agree 同意协议,点击 Next 进入下一个环节。
这里会让你填一些企业信息一类的内容,直接跳过即可。
接着是安装一些服务,既然要最小化,当然是不装了,直接下一步。
系统的角色,选默认的就行;如果用来做 KVM,可以选择第二个,第三个没了解过,有兴趣的可以了解一下,然后下一步。
这里是磁盘分区,没有特殊要求、自己学习的话,可以直接下一步;有需求的话,可以选择 Create Partition Setup,选择自己的磁盘,按照自己的需求进行分区。
选择时区,可以直接点地图。Hardware Clock Set to UTC 可以取消,也可以不取消,有兴趣的话可以了解一下 UTC 和 CST 时间的一些区别;然后选择 Other Setting,在这里选择时间同步服务器,否则创建出来的系统时间会不对(当然,前提是有网络)。
设置用户和密码,可以勾选 Use this password for system administrator,使 root 用户和设置的用户密码一样;也可以不勾选,进入系统后再创建 root 用户的密码。
最后是一些系统安装前的设置:关闭 Kdump,关闭 Firewall,选择要安装的 Software。如果想安装最小化系统,Base System、GNOME Desktop、X Window System 这三个不安装就可以了,选中后右键点击 Do Not Install 即可。
点击 OK,开始安装。重启后,就是一个最小化的 SUSE 12 SP3 环境了,在 VMware 中部署出来的这套系统只有 1.55G,很好地减少了磁盘的占用。
其实,咱也不知道为啥写了这篇博客,咱就是想学一学suse,咱也不会,咱也只能学,只能查 [# 上个月部署公司新版本产品,使用的是ansible部署的,由于suse的一些特殊性(暂时还没有去研究ansible的zypper部署方式,最终是用python的方式部署的ansible),想起之前用过yum的缓存方式去创建本地源,就想着给suse也整一个,最近整完之后,在思考几个问题,如下: 1、'虽然机器是客户的,无法添加光驱,但是我可以把iso镜像里面的文件压缩一下,上传到服务器就可以了(iso里面的源已经压缩了,但是文件比较大,要3.7G,上传到客户环境不是狠方便,暂且搁置,毕竟需要啥上传啥挺好的,各有优缺点)' 2、'虽然解决了本地源的情况,目前还有一些问题,比如一些开源软件在centos上,可以利用repo源进行yum安装,不知道suse上是否可以利用zypper实现,还需要自己实践,毕竟这样配合本地源,才有本地源的意义,否则不如直接编译安装(如果有大佬知道,希望可以赐教)' 3、'虽然可以使用编译的方式去部署一些服务,但是从脚本编写的角度考虑,还是觉得rpm包安装的方式,写脚本更舒服一点把(来自菜鸡的错觉~~~)']zypper-cache:~ # cat /etc/issue Welcome to SUSE Linux Enterprise Server 12 SP3 (x86_64) - Kernel \r (\l).服务端获取添加源1、# 添加网络源,之所以写repo文件,是因为zypper ar添加的repo源,没有gpgcheck=0这一项配置,安装软件的时候会报错,在加上还要写keeppackages=1,所以还是觉得写repo文件更舒服把,后期自己需要复制使用也狠方便 cat > /etc/zypp/repos.d/opensuse.repo << EOF [opensuse-non-oss] # 以下都是suse12的国内源(中国科技大学镜像站)(国内源速度快) name=opensuse-non-oss enabled=1 autorefresh=0 baseurl=http://mirrors.ustc.edu.cn/opensuse/distribution/openSUSE-current/repo/non-oss gpgcheck=0 keeppackages=1 [opensuse-oss] name=opensuse-oss enabled=1 autorefresh=0 baseurl=http://mirrors.ustc.edu.cn/opensuse/distribution/openSUSE-current/repo/oss gpgcheck=0 keeppackages=1 [opensuse-update-non-oss] name=opensuse-update-non-oss enabled=1 autorefresh=0 baseurl=http://mirrors.ustc.edu.cn/opensuse/update/openSUSE-non-oss-current/ gpgcheck=0 keeppackages=1 [opensuse-update] name=opensuse-update enabled=1 autorefresh=0 baseurl=http://mirrors.ustc.edu.cn/opensuse/update/openSUSE-current/ gpgcheck=0 keeppackages=1 zypper-cache:~ # zypper refresh 2、# 添加本地源 zypper-cache:~ # zypper ar /root/suse12-dvd suse12-dvd # 本地源需要自己准备rpm目录 zypper-cache:~ # echo "gpgcheck=0" >> /etc/zypp/repos.d/suse12-dvd.repo zypper-cache:~ # echo "keeppackages=1" >> /etc/zypp/repos.d/suse12-dvd.repo刷新源zypper-cache:~ # vim /etc/zypp/zypp.conf packagesdir = /var/cache/zypp/packages zypper-cache:~ # zypper refresh # 刷新一下源 zypper-cache:~ # zypper lr Repository priorities are without effect. All enabled repositories share the same priority. # | Alias | Name | Enabled | GPG Check | Refresh --+-------------------------+-------------------------+---------+-----------+-------- 1 | SLES12-SP3-12.3-0 | SLES12-SP3-12.3-0 | No | ---- | ---- # 虚拟机安装的suse12,会自带镜像里面的源(需要光驱开机自启,生产环境不一定有,所以为了测试,吧这个源禁用了,可以不操作) 2 | opensuse-non-oss | opensuse-non-oss | Yes | ( ) No | No # repo文件没有问题的情况下,这些源是可以被 zypper lr 查看的 3 | opensuse-oss | opensuse-oss | Yes | ( ) No | No 4 | opensuse-update | opensuse-update | Yes | ( ) No | No 5 | opensuse-update-non-oss | opensuse-update-non-oss | Yes | ( ) No | No 6 | suse12-dvd | suse12-dvd | Yes | ( ) No | No清除缓存zypper-cache:~ # zypper clean All repositories have been cleaned up.安装软件zypper-cache:~ # zypper in sl获取rpm包zypper-cache:~ # mkdir rpmcache/sl zypper-cache:~ # find /var/cache/zypp/ -name "*.rpm" -exec mv {} /root/rpmcache/sl/ \; zypper-cache:~ # scp -r rpmcache/sl/ 192.168.10.158:/root客户端测试linux-oz6w:~ # zypper ar /root/sl/ sl # 创建本地源 linux-oz6w:~ # echo "gpgcheck=0" >> /etc/zypp/repos.d/sl.repo linux-oz6w:~ # zypper in sl # 成功安装即可zypper --helpRepository Management: # zypper 后面可以带简写,比如lr ar ref 具体可以参考zypper --help repos, lr List all defined repositories. addrepo, ar Add a new repository. removerepo, rr Remove specified repository. renamerepo, nr Rename specified repository. modifyrepo, mr Modify specified repository. refresh, ref Refresh all repositories. clean Clean local caches. Software Management: install, in Install packages. remove, rm Remove packages. verify, ve Verify integrity of package dependencies. 
source-install, si Install source packages and their build dependencies. install-new-recommends, inr Install newly added packages recommended by installed packages. Update Management: update, up Update installed packages with newer versions. list-updates, lu List available updates. patch Install needed patches. list-patches, lp List needed patches. dist-upgrade, dup Perform a distribution upgrade. patch-check, pchk Check for patches.
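另外,zypper 本身也支持只下载不安装,很多场景下比 keeppackages=1 再去翻缓存目录更直接,示意如下(包名还是以 sl 为例):
zypper-cache:~ # zypper in --download-only sl                      # 只下载 sl 及其依赖,不安装
zypper-cache:~ # find /var/cache/zypp/packages -name "*.rpm"       # 下载的 rpm 同样落在缓存目录里,按上文的方式拷走即可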
方法一安装setup-toolslinux-oz6w:~ # wget https://pypi.python.org/packages/source/s/setuptools/setuptools-11.3.tar.gz linux-oz6w:~ # tar xf setuptools-11.3.tar.gz linux-oz6w:~ # cd setuptools-11.3/ linux-oz6w:~ # python setup.py install安装piplinux-oz6w:~ # easy_install https://mirrors.aliyun.com/pypi/packages/0b/f5/be8e741434a4bf4ce5dbc235aa28ed0666178ea8986ddc10d035023744e6/pip-20.2.4.tar.gz#sha256=85c99a857ea0fb0aedf23833d9be5c40cf253fe24443f0829c7b472e23c364a1 .......... creating /usr/lib/python2.7/site-packages/pip-20.2.4-py2.7.egg Extracting pip-20.2.4-py2.7.egg to /usr/lib/python2.7/site-packages Adding pip 20.2.4 to easy-install.pth file Installing pip script to /usr/bin Installing pip2.7 script to /usr/bin Installing pip2 script to /usr/bin Installed /usr/lib/python2.7/site-packages/pip-20.2.4-py2.7.egg Processing dependencies for pip==20.2.4 Finished processing dependencies for pip==20.2.4 # https://mirrors.aliyun.com/pypi/simple/pip/ 这是阿里云上面pip的tar包,python官方很多时候会受限,速度也不稳定,经常超时,没办法搞定方法二linux-oz6w:~ # wget https://github.com/imcxsen/python/blob/master/get_pip.py linux-oz6w:~ # python get_pip.py DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality. Collecting pip Downloading pip-20.3.1-py2.py3-none-any.whl (1.5 MB) |████████████████████████████████| 1.5 MB 3.5 MB/s Collecting setuptools Downloading setuptools-44.1.1-py2.py3-none-any.whl (583 kB) |████████████████████████████████| 583 kB 6.0 MB/s Collecting wheel Downloading wheel-0.36.0-py2.py3-none-any.whl (34 kB) Installing collected packages: pip, setuptools, wheel Successfully installed pip-20.3.1 setuptools-44.1.1 wheel-0.36.0配置阿里云pip源linux-oz6w:~ # mkdir ~/.pip linux-oz6w:~ # cat > ~/.pip/pip.conf << EOF [global] trusted-host=mirrors.aliyun.com index-url=https://mirrors.aliyun.com/pypi/simple/ EOFpip安装pyotplinux-oz6w:~ # pip install pyotp DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality. Looking in indexes: https://mirrors.aliyun.com/pypi/simple/ Collecting pyotp Downloading https://mirrors.aliyun.com/pypi/packages/6f/5f/b8b985153df5516386e2918ab97ac836abfe88dc420cd3211d7b9e30814e/pyotp-2.4.1-py2.py3-none-any.whl (11 kB) Installing collected packages: pyotp Successfully installed pyotp-2.4.1 # 安装成功
1、通过 pip download 下载安装包linux-oz6w:~ # mkdir pip-ansible linux-oz6w:~ # pip download ansible -d /root/pip-ansible Looking in indexes: https://mirrors.aliyun.com/pypi/simple/ # 可以看到,这里是使用阿里云的pip源下载ansible的tar包和whl文件 linux-oz6w:~ # ll pip-ansible/ # 这是pip download下载下来的所有依赖的安装包 total 37156 -rw-r--r-- 1 root root 125774 Jan 6 22:44 Jinja2-2.11.2-py2.py3-none-any.whl -rw-r--r-- 1 root root 24348 Jan 6 22:44 MarkupSafe-1.1.1-cp27-cp27mu-manylinux1_x86_64.whl -rw-r--r-- 1 root root 269377 Jan 6 22:44 PyYAML-5.3.1.tar.gz -rw-r--r-- 1 root root 28622006 Jan 6 22:44 ansible-2.10.4.tar.gz -rw-r--r-- 1 root root 5708083 Jan 6 22:44 ansible-base-2.10.4.tar.gz -rw-r--r-- 1 root root 389322 Jan 6 22:44 cffi-1.14.4-cp27-cp27mu-manylinux1_x86_64.whl -rw-r--r-- 1 root root 2626135 Jan 6 22:44 cryptography-3.3.1-cp27-cp27mu-manylinux2010_x86_64.whl -rw-r--r-- 1 root root 11223 Jan 6 22:44 enum34-1.1.10-py2-none-any.whl -rw-r--r-- 1 root root 18159 Jan 6 22:44 ipaddress-1.0.23-py2.py3-none-any.whl -rw-r--r-- 1 root root 39857 Jan 6 22:44 packaging-20.8-py2.py3-none-any.whl -rw-r--r-- 1 root root 112041 Jan 6 22:44 pycparser-2.20-py2.py3-none-any.whl -rw-r--r-- 1 root root 67842 Jan 6 22:44 pyparsing-2.4.7-py2.py3-none-any.whl -rw-r--r-- 1 root root 10963 Jan 6 22:44 six-1.15.0-py2.py3-none-any.whl2、利用 pip install --no-index 离线安装linux-oz6w:~ # pip install --no-index --find-links=/root/pip-ansible/ --ignore-installed /root/pip-ansible/* Looking in links: /root/pip-ansible/ # 这里就按照指定的目录去找包安装了 ...... Successfully installed MarkupSafe-1.1.1 PyYAML-5.3.1 ansible-2.10.4 ansible-base-2.10.4 cffi-1.14.4 cryptography-3.3.1 enum34-1.1.10 ipaddress-1.0.23 jinja2-2.11.2 packaging-20.8 pycparser-2.20 pyparsing-2.4.7 six-1.15.0 ----------------------------------------------------------------------------------- # 命令解析 --no-index # 忽略包索引(只查看--find-links) --find-links # --find-links 指向URL,html文件,tar.gz,whl或者目录,不支持指向VCS项目URL的链接 --ignore-installed # 忽略已安装的软件包,覆盖它们 /root/pip-ansible/ansible-2.10.4.tar.gz # 需要安装的软件
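如果要离线安装的不止 ansible 一个包,更常见的做法是配合 requirements.txt,思路完全一样,示意如下(文件名和目录均为举例):
linux-oz6w:~ # echo 'ansible==2.10.4' > requirements.txt            # 在能联网的机器上整理好依赖清单
linux-oz6w:~ # pip download -r requirements.txt -d /root/pip-pkgs   # 按清单下载所有包及依赖
# 把 requirements.txt 和 pip-pkgs 目录一起拷到离线机器后执行
linux-oz6w:~ # pip install --no-index --find-links=/root/pip-pkgs -r requirements.txt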
效果展示===================== 2022/05/20-09:12:11+0000 ===================== ===================== check system ===================== [INFO] [2022/05/20-09:12:11+0000] Hostname: test-master-01 [INFO] [2022/05/20-09:12:11+0000] Ipaddress: ## 和谐了,不给看 ## [INFO] [2022/05/20-09:12:11+0000] Os-release: CentOS Linux 7 (Core) GNU/Linux [INFO] [2022/05/20-09:12:11+0000] Kernel: Linux 3.10.0-1127.19.1.el7.x86_64 x86_64 GNU/Linux [INFO] [2022/05/20-09:12:11+0000] Up Days: 143 days [INFO] [2022/05/20-09:12:11+0000] Os Language: en_US.UTF-8 ===================== check cpu ===================== [INFO] [2022/05/20-09:12:11+0000] CPU Model: AMD EPYC 7571 [INFO] [2022/05/20-09:12:11+0000] Physical CPUS: 1 [INFO] [2022/05/20-09:12:11+0000] Processor CPUS: 4 [INFO] [2022/05/20-09:12:11+0000] CPU Cores: 2 [INFO] [2022/05/20-09:12:11+0000] Load Average: 0.55 , 0.44 , 0.51 [INFO] [2022/05/20-09:12:11+0000] CPU Usage: 15.66% ===================== check memory ===================== [INFO] [2022/05/20-09:12:11+0000] Mem Total: 15.14GiB [INFO] [2022/05/20-09:12:11+0000] Mem Used: 12.07GiB [INFO] [2022/05/20-09:12:11+0000] Mem Available: 3.07GiB [INFO] [2022/05/20-09:12:11+0000] Mem Usage: 79.73% ===================== check disk ===================== [INFO] [2022/05/20-09:12:11+0000] Disk Info: [INFO] [2022/05/20-09:12:11+0000] /dev/nvme0n1p1 xfs 100G 55G 46G 55% / [INFO] [2022/05/20-09:12:11+0000] Disk Inode Info: [INFO] [2022/05/20-09:12:11+0000] /dev/nvme0n1p1 xfs 50M 722K 50M 2% / ===================== check kubernetes ===================== [INFO] [2022/05/20-09:12:11+0000] Apiserver Cert Not After: Mar 16 07:45:06 2023 GMT [INFO] [2022/05/20-09:12:11+0000] Node Status: test-master-01 is Ready [INFO] [2022/05/20-09:12:11+0000] Node Status: test-master-02 is Ready [INFO] [2022/05/20-09:12:11+0000] Node Status: test-master-03 is Ready [INFO] [2022/05/20-09:12:11+0000] Node Status: test-node-01 is Ready [INFO] [2022/05/20-09:12:11+0000] Node Status: test-node-02 is Ready [INFO] [2022/05/20-09:12:11+0000] Node Status: test-node-03 is Ready [INFO] [2022/05/20-09:12:11+0000] Node Status: test-node-04 is Ready [INFO] [2022/05/20-09:12:11+0000] Node Status: test-node-05 is Ready [INFO] [2022/05/20-09:12:11+0000] Top Nodes: test-master-01 552m 13% 12590Mi 81% [INFO] [2022/05/20-09:12:11+0000] Top Nodes: test-master-02 399m 9% 9644Mi 62% [INFO] [2022/05/20-09:12:11+0000] Top Nodes: test-master-03 534m 13% 10336Mi 67% [INFO] [2022/05/20-09:12:11+0000] Top Nodes: test-node-01 679m 16% 21175Mi 67% [INFO] [2022/05/20-09:12:11+0000] Top Nodes: test-node-02 591m 14% 21119Mi 66% [INFO] [2022/05/20-09:12:11+0000] Top Nodes: test-node-03 674m 16% 23677Mi 75% [INFO] [2022/05/20-09:12:11+0000] Top Nodes: test-node-04 564m 14% 23123Mi 73% [INFO] [2022/05/20-09:12:11+0000] Top Nodes: test-node-05 558m 13% 22760Mi 72%目录结构├── config │ └── conf.sh └── inspection.shconfig/conf.sh#!/usr/bin/env bash # 需要检查的目录 ## 有的场景下,数据目录是单独挂载的磁盘,也要巡检 ## / 根目录也要巡检,不要删除 disk_lists=' /data # CPU 使用率告警上线 cpu_limit='85%' # 内存使用率告警上线 mem_limit='85%' # 磁盘使用率告警上线 ## kubelet 的默认驱逐条件是磁盘使用率超过85% ## 如果有kubelet 服务,建议设定在 70% - 80% 之间 disk_limit='75%' # 磁盘 inode 使用率告警上线 disk_inode_limit='85%' # apiserver 证书的绝对路径 ## kubeadm 默认为 /etc/kubernetes/pki/apiserver.crt api_cert_file='/etc/kubernetes/pki/apiserver.crt' # 证书剩余多少天到期时间提醒 cert_expires='30' # kubectl 命令证书路径 kube_config='/root/.kube/config'inspection.sh#!/usr/bin/env bash # 定义脚本当前所在路径 base_dir=$(cd `dirname "$0"`; pwd) # 定义配置文件的路径和名称 conf_file="${base_dir}/config/conf.sh" # 定义日志存储目录 log_dir="${base_dir}/logs" # 定义标准日志文件名称 
log_file="${log_dir}/$(date +%Y-%m-%d)-INFO.log" # 定义告警日志文件名称 warn_log="${log_dir}/$(date +%Y-%m-%d)-WARN.log" # 定义时间格式 time_style="$(date +%Y/%m/%d-%T%z)" # 定义 df 命令的参数,可以根据实际情况进行修改 df_cmd="df -Th -x devtmpfs -x tmpfs -x debugfs -x aufs -x overlay -x fuse.glusterfs" # 定义日志压缩时间,数字表示多少天 tar_time=7 # 定义日志压缩路径 tar_dir=$(date +%Y-%m-%d -d "${tar_time} days ago") # 定义 tar 包名称 tar_name="${tar_dir}.tgz" function check_config () { # 检查配置文件是否存在 if [[ -f "${conf_file}" ]];then # 调用配置文件内的变量 source ${conf_file} # disk_lists 变量值为空,则 disk_lists 变量值默认为 / disk_lists=${disk_lists:-'/'} # cpu_limit 变量值为空,则 cpu_limit 变量值默认为 85% cpu_limit=${cpu_limit:-'85%'} # mem_limit 变量值为空,则 mem_limit 变量值默认为 85% mem_limit=${mem_limit:-'85%'} # disk_limit 变量值为空,则 disk_limit 变量值默认为 75% ## 因为 kubelet 默认的驱逐机制是磁盘使用率超过 85% disk_limit=${disk_limit:-'75%'} # disk_inode_limit 变量值为空,则 disk_inode_limit 变量值默认为 85% disk_inode_limit=${disk_inode_limit:-'85%'} # api_cert_file 变量值为空,则 api_cert_file 变量值默认为 /etc/kubernetes/pki/apiserver.crt api_cert_file=${api_cert_file:-'/etc/kubernetes/pki/apiserver.crt'} # cert_expires 变量值为空,则 cert_expires 变量值默认为 30 cert_expires=${cert_expires:-'30'} # kube_config 变量值为空,则 kube_config 变量值默认为 /root/.kube/config kube_config=${kube_config:-'/root/.kube/config'} kube_cmd="kubectl --kubeconfig ${kube_config}" # 配置文件不存在则退出脚本,并告知配置文件不存在 echo "${conf_file} is not found, please check it !" exit 0 function check_user () { local wai=$(id -u -n) # 当前用户不是 root 则退出脚本,并告知需要使用 root 用户执行 if [[ "${wai}"x != "root"x ]];then printf "\e[1;31mPlease use the root to execute this shell !\e[0m\n" exit 0 function print_terminal () { printf "\e[1;34m[INFO] [${time_style}] ${*}\e[0m\n" function print_info_title () { if [[ ! -f "${log_file}" ]];then echo "===================== ${*} =====================" >> ${log_file} echo " " >> ${log_file} echo "===================== ${*} =====================" >> ${log_file} function print_warn_title () { if [[ ! -f "${warn_log}" ]];then echo "===================== ${*} =====================" >> ${warn_log} echo " " >> ${warn_log} echo "===================== ${*} =====================" >> ${warn_log} function check_warn_title () { grep "${*}" ${warn_log} &> /dev/null || print_warn_title "${*}" function print_info () { # 标准日志输出格式 echo "[INFO] [${time_style}] ${*}" >> ${log_file} function print_warn () { # 告警日志输出格式 echo "[WARN] [${time_style}] ${*}" >> ${warn_log} function check_log_dir () { # 检查日志目录是否存在 [[ -d ${log_dir} ]] || mkdir -p ${log_dir} # 检查当天巡检日志文件是否存在 [[ ! -f ${log_file} ]] || mv ${log_file}{,-$(date +%T%z)} [[ ! -f ${warn_log} ]] || mv ${warn_log}{,-$(date +%T%z)} print_info_title "${time_style}" print_warn_title "${time_style}" function check_tar () { # 判断指定时间之前是否存在日志文件,存在日志文件则对文件进行压缩 ## 修改 tar_time 变量可以指定天数 local check_num=$(find ${log_dir} -mtime +${tar_time} -name *.log* | wc -l) # 判断指定时间之前是否存在打包文件,存在则删除 local check_tarnum=$(find ${log_dir} -mtime +${tar_time} -name *.tar.gz | wc -l) # 判断指定天数前的文件数量,大于等于 1 的情况下才做处理 if [[ "${check_num}" > 0 ]];then [[ -d "${log_dir}/${tar_dir}" ]] || mkdir -p "${log_dir}/${tar_dir}" [[ ! 
-f "${log_dir}/${tar_dir}/${tar_name}" ]] || mv ${log_dir}/${tar_dir}/${tar_name}{,-$(date +%T%z)} find ${log_dir} -mtime +${tar_time} -name *.log* -exec mv {} ${log_dir}/${tar_dir} \; &> /dev/null cd ${log_dir} && tar czf ${tar_name} ${tar_dir}/* && rm -rf ${tar_dir} # 判断指定天数之前的打包文件梳理,大于等于 1 的情况下才做处理 if [[ "${check_tarnum}" > 0 ]];then find ${log_dir} -mtime +${tar_time} -name *.tar.gz -exec rm -f {} \; print_terminal "check logs done" function check_system () { # 系统相关信息检查 print_info_title 'check system' # 主机名 get_hostname="$(cat /etc/hostname)" print_info "Hostname: ${get_hostname}" # ip 地址 [银联有双网卡的情况,并且无法使用 hostname -i 命令获取 ip 地址] ## k8s 全部使用的主机名,因此改用过滤 hosts 解析文件的方式来获取 ip 地址 local get_host_ip=$(hostname -i) print_info "Ipaddress: ${get_host_ip}" # 发行版 local get_os_release="$(awk -F '"' '/PRETTY_NAME/ {print $2}' /etc/os-release)" print_info "Os-release: ${get_os_release} $(uname -o)" local get_kernel="$(uname -srmo)" print_info "Kernel: ${get_kernel}" # 服务器启动时长 local get_up_secs="$(awk -F '.' '{print $1}' /proc/uptime)" local get_days="$(( ${get_up_secs} / 60 / 60 / 24 ))" print_info "Up Days: ${get_days} days" local os_lang=$(echo $LANG) print_info "Os Language: ${os_lang}" # swap 是否关闭 local chech_swap=$(grep -iv size /proc/swaps | wc -l) if [[ "${chech_swap}" == "0" ]];then print_info "Swap Status: off" check_warn_title 'check system' swapoff -a print_info "Swap Status: manual off" # firewalld 是否关闭 local firewalld_status=$(systemctl is-active firewalld) local firewalld_enable=$(systemctl is-enabled firewalld) if [[ "${firewalld_status}"x == "inactive"x ]];then print_info "Firewalld Status: dead" check_warn_title 'check system' systemctl stop firewalld print_warn "Firewalld Status: manual dead" if [[ "${firewalld_enable}"x == "disabled"x ]];then print_info "Firewalld Enabled: disabled" check_warn_title 'check system' systemctl disable firewalld print_warn "Firewalld Enabled: manual disabled" print_terminal "check system done" function check_cpu () { print_info_title "check cpu" # cpu 信息 local physical_cpus="$(grep "^physical id" /proc/cpuinfo | sort | uniq | wc -l)" local process_cpus="$(grep -c "^processor" /proc/cpuinfo)" local core_cpus="$(grep '^cpu cores' /proc/cpuinfo | tail -1 | awk '{print $NF}')" local cpu_model="$(grep "^model name" /proc/cpuinfo | awk -F ': ' '{print $2}' | sort | uniq)" print_info "CPU Model: ${cpu_model}" print_info "Physical CPUS: ${physical_cpus}" print_info "Processor CPUS: ${process_cpus}" print_info "CPU Cores: ${core_cpus}" # cpu 负载 local one_min="$(awk '{print $1}' /proc/loadavg)" local five_min="$(awk '{print $2}' /proc/loadavg)" local fif_min="$(awk '{print $3}' /proc/loadavg)" print_info "Load Average: ${one_min} , ${five_min} , ${fif_min}" # 检查 cpu 使用率 local cpu_util="$(awk '/cpu / {util=($2+$4)*100/($2+$4+$5); printf ("%.2f%"), util}' /proc/stat)" print_info "CPU Utilization: ${cpu_util}" # cpu 使用率超过 cpu_limit 配置的数值,打印 WARN 日志 if [[ "${cpu_util%%.*}" -ge "${cpu_limit%%%}" ]];then local top_cpu_use="$(ps -eo user,pid,pcpu,args --sort=-pcpu | head -n 10)" check_warn_title 'check cpu' print_warn "CPU utilization is ${cpu_util} , it's greater equal ${cpu_limit}, should be check !" 
# CPU 使用前十进程 print_warn "Top 10 CPU Use: " echo "${top_cpu_use}" >> ${warn_log} print_terminal "check cpu done" function check_mem () { print_info_title "check memory" # 检查内存使用率 local get_mem_info="$(awk '/MemTotal:/{total=$2/1024/1024;next} /MemAvailable:/{available=$2/1024/1024;use=total-available; printf("%.2fGiB %.2fGiB %.2fGiB %.2f%"),total,use,available,(use/total)*100}' /proc/meminfo)" # 内存总大小 local mem_total="$(awk '{print $1}' <<< ${get_mem_info})" # 已使用的内存大小 local mem_used="$(awk '{print $2}' <<< ${get_mem_info})" # 可以内存的大小 local mem_available="$(awk '{print $3}' <<< ${get_mem_info})" # 使用中内存的大小 local mem_util="$(awk '{print $4}' <<< ${get_mem_info})" # 内存使用率最高的十个进程 local top_mem_use="$(ps -eo user,pid,pmem,args --sort=-pmem | head -n 10)" print_info "Mem Total: ${mem_total}" print_info "Mem Used: ${mem_used}" print_info "Mem Available: ${mem_available}" print_info "Mem Utilization: ${mem_util}" # 内存使用率超过 mem_limit 配置的数值,打印 WARN 日志 if [[ "${mem_util%%.*}" -ge "${mem_limit%%%}" ]];then check_warn_title 'check memory' print_warn "Mem utilization is ${mem_util}, it's greater equal ${mem_limit}, should be check !" # 内存使用前十进程 print_warn "Top 10 Mem Use: " echo "${top_mem_use}" >> ${warn_log} print_terminal "check memory done" function check_disk () { print_info_title "check disk" print_info "Disk Info: " # 检查磁盘使用率 local disk_lists_array=($(printf "%q\n" ${disk_lists})) for (( i=0; i<${#disk_lists_array[@]}; i++ )) local disk_info=$(${df_cmd} | egrep "${disk_lists_array[i]}$") # df 使用了 -T 参数,因此使用率是第 6 列,如果有修改 df 参数,注意确认使用率的列数,并修改下面的位置变量 local disk_util="$(awk '{print $6}' <<< ${disk_info})" local disk_name="$(awk '{print $NF}' <<< ${disk_info})" [[ "${disk_info}"x != ""x ]] || break print_info "${disk_info}" # 磁盘使用率超过 disk_limit 配置的数值,打印 WARN 日志 if [[ "${disk_util%%%}" -ge "${disk_limit%%%}" ]];then check_warn_title 'check disk' print_warn "Disk ${disk_name} utilization is ${disk_util}, it's greater equal ${disk_limit}, should be check !" # 检查 inode 使用率 print_info '---' print_info "Disk Inode Info: " for (( i=0; i<${#disk_lists_array[@]}; i++ )) local disk_inode_info=$(${df_cmd} -i | egrep "${disk_lists_array[i]}$") # df 使用了 -T 参数,因此使用率是第 6 列,如果有修改 df 参数,注意确认使用率的列数,并修改下面的位置变量 local disk_inode_util="$(awk '{print $6}' <<< ${disk_inode_info})" local disk_inode_name="$(awk '{print $NF}' <<< ${disk_inode_info})" [[ "${disk_inode_info}"x != ""x ]] || break print_info "${disk_inode_info}" # 磁盘 inode 使用率超过 disk_limit 配置的数值,打印 WARN 日志 if [[ "${disk_inode_util%%%}" -ge "${disk_inode_limit%%%}" ]];then check_warn_title 'check disk' print_warn "Disk ${disk_inode_name} utilization is ${disk_inode_util}, it's greater equal ${disk_inode_limit}, should be check !" print_terminal "check disk done" function check_kubernetes () { print_info_title "check kubernetes" if [[ -f ${api_cert_file} ]];then # apiserver 证书到期时间 local cert_info="$(openssl x509 -in ${api_cert_file} -noout -text | awk -F ': ' '/Not After/ {print $2}')" local cert_time_stamp=$(date -d "${cert_info}" +%s) local cert_not_after="$(( (${cert_time_stamp} - $(date +%s)) / 86400 ))" print_info "Apiserver Cert Not After: ${cert_info}" if [[ "${cert_not_after}" -le "${cert_expires}" ]];then check_warn_title 'check kubernetes' print_warn "The apiserver cert will expire in ${cert_expires} days, please renewal !" 
if [[ -f "${kube_config}" ]];then # 节点是否都为 Ready 状态 local k8s_nodes_lists=$(${kube_cmd} get node --no-headers=true | awk '{print $1}') local k8s_lists_array=($(printf "%q\n" ${k8s_nodes_lists})) for (( h=0; h<${#k8s_lists_array[@]}; h++ )) local node_status=$(${kube_cmd} get nodes | awk "/${k8s_lists_array[h]}/ {print \$2}") if [[ "${node_status}"x == "Ready"x ]];then print_info "Node Status: ${k8s_lists_array[h]} is Ready" check_warn_title 'check kubernetes' print_warn "Node: ${k8s_lists_array[h]} is NotReady , please check !" # top node 查看 k8s 集群资源使用情况 ${kube_cmd} top node &> /dev/null if [[ "$?" -eq '0' ]];then for (( tn=0; tn<${#k8s_lists_array[@]}; tn++ )) local k_top_node=$(${kube_cmd} top node | awk "/${k8s_lists_array[tn]}/ {print \$0}") local node_cpu_usage="$(awk '{print $3}' <<< ${k_top_node})" local node_mem_usage="$(awk '{print $5}' <<< ${k_top_node})" print_info "Top Nodes: ${k_top_node}" if [[ "${node_cpu_usage%%%}" -ge "${cpu_limit%%%}" ]];then check_warn_title 'check kubernetes' print_warn "${k8s_lists_array[tn]} top node check: cpu usage is ${node_cpu_usage}, it's greater equal ${cpu_limit}, should be check !" if [[ "${node_mem_usage%%%}" -ge "${mem_limit%%%}" ]];then check_warn_title 'check kubernetes' print_warn "${k8s_lists_array[tn]} top node check: cpu usage is ${node_mem_usage}, it's greater equal ${mem_limit}, should be check !" print_info "This node's role is the work for kubernetes cluster" check_config check_user check_log_dir check_tar check_system check_cpu check_mem check_disk check_kubernetes
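巡检脚本一般不会手动去跑,下面给一个配合 crontab 定时执行的简单参考,/opt/inspection 这个存放路径是我假设的,以自己实际放脚本的路径为准:

# crontab -e 添加如下内容:每天早上 8 点执行一次巡检
# 标准输出和报错都追加到 cron.log,方便排查定时任务本身的问题
0 8 * * * /usr/bin/bash /opt/inspection/inspection.sh >> /opt/inspection/logs/cron.log 2>&1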
前言:由于客户的机器都是suse的,并且uid为0的用户有 root、sysop、appadmin 三个用户,导致有的时候远程连接,即使是以 root 用户的身份登录,也会出现当前用户不是 root 的情况,以至于部署和免密脚本会失败以下是通过注释 /etc/passwd 文件的方式,来暂时注销 sysop 和 appadmin 这两个用户,以此来达到当前uid为0的用户只有 root登陆 Linux 系统时,虽然输入的是自己的用户名和密码,但其实 Linux 并不认识你的用户名称,它只认识用户名对应的 ID 号(也就是一串数字)Linux 系统将所有用户的名称与 ID 的对应关系都存储在 /etc/passwd 文件中ip:192.168.72.12192.168.72.13192.168.72.14user:rootappadminsysop创建用户-m 创建用户的家目录(suse默认不会创建)-U 创建用户的基本组(suse默认不会创建)-o 允许创建重复的用户-u 指定用户uiduseradd -mUo appadmin -u 0 useradd -mUo sysop -u 0创建用户密码创建用户密码,由于suse的passwd命令没有--stdin参数,没法使用echo '<user password>' | passwd --stdin <user name>这个方式创建用户密码但是可以通过echo '<user name>:<user password>' | chpasswd的方式来创建用户密码这两种方式,是为了免交互为用户创建密码,直接使用passwd或者chpasswd命令,是需要输入两次密码,对于写脚本不友好echo 'appadmin:123.com' | chpasswd echo 'sysop:123.com' | chpasswd快速注释,方便快速回到root用户for i in $(awk -F : '{if($3==0){print$1}}' /etc/passwd | grep -v root);do sed -i "/$i/s/^/#/g" /etc/passwd;done快速取消注释for i in $(awk -F : '{if($3==0){print$1}}' /etc/passwd);do sed -i "/$i/s/^#//g" /etc/passwd;done免密脚本我的用户密码全部设置为123.com了,需要使用下面的脚本,记得修改密码为自己的用户密码#!/usr/bin/env bash ips=' 192.168.72.12 192.168.72.13 192.168.72.14 # 将上面的ip变量格式化成数组的形式 ip_arry=($(printf "%q\n" ${ips})) # 备份root用户的.ssh目录 [[ -d "/root/.ssh" ]] && mv /root/.ssh{,-$(date +"%F_%T")} # 生成新的公钥和私钥 ssh-keygen -t rsa -P "" -f /root/.ssh/id_rsa -q for (( i=0; i<${#ip_arry[@]}; i++ )) expect -c " spawn ssh-copy-id -i /root/.ssh/id_rsa.pub root@${ip_arry[i]} -o \"StrictHostKeyChecking no\" expect { \"*assword*\" {send \"123.com\r\"; exp_continue} \"*assword*\" {send \"123.com\r\";} done当前主机用户不是root通过whoami命令,确保当前的用户不是root,可以使用ssh sysop@ip来切换用户此时,执行上面的脚本会报错,如果-f指定的目录不是用户的家目录,ssh-keygen无法自动创建,需要手动创建/usr/bin/ssh-copy-id: ERROR: failed to open ID file '/root/.ssh/id_rsa.pub': No such file or directory当前主机用户是root,远程主机多个用户uid为0执行上面的脚本没有问题,可以实现root用户免密但是连接远程主机,查看当前用户,显示为非root用户ssh root@192.168.72.13 "whoami"利用shell脚本实现远程主机uid为0的用户只有root一个#!/usr/bin/env bash ips=' 192.168.72.12 192.168.72.13 192.168.72.14 ip_arry=($(printf "%q\n" ${ips})) user_name=$(awk -F : '{if($3==0){print$1}}' /etc/passwd | grep -v root) for (( i=0; i<${#ip_arry[@]}; i++ )) for name in ${user_name} j=($(printf "%q\n" ${name})) ssh root@${ip_arry[i]} "sed -i \"/${name[j]}/s/^/#/g\" /etc/passwd" done当然,服务器是人家的,所以,自己的服务搞定以后,要取消sysop和appadmin的注释,不能影响客户的使用,使用一下脚本实现了取消注释#!/usr/bin/env bash ips=' 192.168.72.12 192.168.72.13 192.168.72.14 ip_arry=($(printf "%q\n" ${ips})) user_name=$(awk -F : '{if($3==0){print$1}}' /etc/passwd | sed 's/#//') for (( i=0; i<${#ip_arry[@]}; i++ )) for name in ${user_name} j=($(printf "%q\n" ${name})) ssh root@${ip_arry[i]} "sed -i \"/${name[j]}/s/^#//g\" /etc/passwd" done仅以此篇来记录自己的学习和成长的过程,虽然并不适用与大家的场景,不过用来练习shell脚本,还是可以的---------------------------------更新时间:2021年11月22日---------------------------------多个UID=0用户之间的免密由于会有新场景上线,因此会有新机器需要做免密的情况,此时就要对所有UID=0的用户都完成免密才可以保证服务部署过程中,避免出现需要密码的情况,因此有了下面的脚本脚本需要通过位置变量引入一个清单文件,清单文件名称以自己本地实际的为准,我这里使用的是list.txt来做参考脚本执行方式: bash same_uid_ssh.sh list.txtlist.txt文件内的格式 [以一个空格为分隔符,依次为IP地址、用户名、用户密码‘]:192.168.70.26 sysop 123.com192.168.70.88 sysop 234.com192.168.70.89 sysop 345.com脚本依赖 expect 和 doc2unix 两个命令,执行前,建议先检查环境是否有这两个命令#!/bin/bash base_dir=$(cd `dirname $0`; pwd) check_sanme_uid=$(awk -F ':' '{if ($3==0) {print $1}}' /etc/passwd | wc -l) # 以数组的形式输出UID=0的所有用户 same_uid_user=($(awk -F ':' '{if ($3==0) {print $1}}' /etc/passwd)) # 脚本执行的时候没有带参数,则返回脚本执行方式 # list.txt非固定名称,只需要文件存在即可,文件内容以一个空格为分割,内容格式为: <ip地址> <用户名称> <用户密码> if [[ "$#" == 0 ]];then echo "Usage: bash $0 list.txt" exit 0 hosts_list=$1 user_host=($(awk 
'{print $1}' ${hosts_list})) user_name=($(awk '{print $2}' ${hosts_list})) user_pass=($(awk '{print $3}' ${hosts_list})) if [[ "${check_sanme_uid}" > 1 ]];then echo "system have ${check_sanme_uid} same uid users" # 生成ssh公钥 function make_ssh_pub () { # 判断脚本所在路径下是否有 authorized_keys 文件,如果存在则清空文件内容 [[ ! -f "${base_dir}/authorized_keys" ]] || > ${base_dir}/authorized_keys for (( i=0; i<${#same_uid_user[@]}; i++ )) # 输当前循环的用户名 echo "now is ${same_uid_user[i]}" # 通过for循环逐一注释 /etc/passwd 文件来达到当前UID=0的用户是唯一的 for change in $(awk -F ':' '{if ($3==0) {print $1}}' /etc/passwd | grep -v ${same_uid_user[i]}) sed -i "/${change}/s/^/#/g" /etc/passwd # 判断用户是否为root来区分用户的家目录 # [只是注释 /etc/passwd 文件无法达到切换环境变量的效果,无法使用系统变量 $HOME 来指定用户的家目录] if [[ "${same_uid_user[i]}"x == "root"x ]];then user_home="/root" user_home="/home/${same_uid_user[i]}" # 判断用户家目录下是否存在 .ssh 目录,存在则备份,后缀为: 年月日-时:分 [[ ! -d "${user_home}/.ssh" ]] || cp -r ${user_home}/.ssh{,.$(date +%Y%m%d-%H:%M)} # 为了保证环境干净,删除用户家目录的 .ssh 目录 rm -rf ${user_home}/.ssh # 静默生成用户ssh公钥,公钥格式为rsa ssh-keygen -t rsa -P "" -f ${user_home}/.ssh/id_rsa -q # 将用户的公钥追加到脚本所在路径下的 authorized_keys 文件内 # 后续只需要将脚本所在路径下的 authorized_keys 文件分发到其他节点指定用户家目录下的 .ssh 目录 # 以此来达到免密的效果 cat ${user_home}/.ssh/id_rsa.pub >> ${base_dir}/authorized_keys # 取消之前的 /etc/passwd 文件的注释,进入下一层循环时,会重新注释其他用户,避免漏注释,造成公钥缺失 for change in $(awk -F ':' '{if ($3==0) {print $1}}' /etc/passwd | grep -v ${same_uid_user[i]}) sed -i "/${change}/s/^#//g" /etc/passwd # 受到umask的影响,默认生成的文件权限为644,authorized_keys 文件默认权限为600,此处做一个赋权 chmod 600 ${base_dir}/authorized_keys function make_ssh_auth () { for (( host=0; host<${#user_host[@]}; host++ )) # 判断脚本所在路径下是否有 user_list.txt 这个文件,有则清空文件内容 [[ ! -f "${base_dir}/user_list.txt" ]] || > ${base_dir}/user_list.txt # 通过expect远程登录其他节点,查看UID=0的用户有哪些,避免环境差异,导致免密失败 # log_file将expect的输出重定向到指定文件中 [expect的log_file输出的文件格式不是unix的] # expect内如果需要ssh到其他节点使用awk命令,需要使用花括号来代替双引号,并且$符号前面需要加上转义符(\) expect -c " spawn ssh ${user_name[host]}@${user_host[host]} {awk -F ':' '{if (\$3==0) {print \$1}}' /etc/passwd} log_file ${base_dir}/user_list.txt expect { \"*es/no*\" {send \"yes\r\"; exp_continue} \"*assword*\" {send \"${user_pass[host]}\r\"; exp_continue} \"*assword*\" {send \"${user_pass[host]}\r\"; exp_continue} # 将dos格式的文件转换成unix格式的文件 [否则输出的内容会有dos文件的字符] dos2unix -o ${base_dir}/user_list.txt # 定义same_uid变量,以数组的形式定义从log_file文件内获取到的UID=0的用户 same_uid=($(printf "%q " `egrep -v 'spawn|assword' ${base_dir}/user_list.txt`)) for (( n=0; n<${#same_uid[@]}; n++ )) # 定义用户的家目录,root用户与其他用户的家目录不同 if [[ "${same_uid[n]}"x == "root"x ]];then user_home="/root" user_home="/home/${same_uid[n]}" # 通过expect解决交互问题 # 在用户的家目录下创建.ssh目录,并赋权700 expect -c " spawn ssh ${user_name[host]}@${user_host[host]} \"mkdir ${user_home}/.ssh -m 700\" expect { \"*es/no*\" {send \"yes\r\"; exp_continue} \"*assword*\" {send \"${user_pass[host]}\r\"; exp_continue} \"*assword*\" {send \"${user_pass[host]}\r\";} # 有些用户家目录下可能已经存在.ssh目录,这里重新赋权700,避免权限问题影响免密 expect -c " spawn ssh ${user_name[host]}@${user_host[host]} \"chmod 700 ${user_home}/.ssh\" expect { \"*es/no*\" {send \"yes\r\"; exp_continue} \"*assword*\" {send \"${user_pass[host]}\r\"; exp_continue} \"*assword*\" {send \"${user_pass[host]}\r\";} # 判断用户家目录下的.ssh目录下是否已有 authorized_keys 免密文件,有则备份 # expect内如果用到'[]'来判断目录或文件是否存在,也需要用花括号来代替双引号 expect -c " spawn ssh ${user_name[host]}@${user_host[host]} {[ ! 
-f ${user_home}/.ssh/authorized_keys ] || mv ${user_home}/.ssh/authorized_keys{,.bak}} expect { \"*es/no*\" {send \"yes\r\"; exp_continue} \"*assword*\" {send \"${user_pass[host]}\r\"; exp_continue} \"*assword*\" {send \"${user_pass[host]}\r\";} # 将脚本生成的 authorized_keys 免密文件分发到其他节点 expect -c " spawn scp ${base_dir}/authorized_keys ${user_name[host]}@${user_host[host]}:${user_home}/.ssh/authorized_keys expect { \"*es/no*\" {send \"yes\r\"; exp_continue} \"*assword*\" {send \"${user_pass[host]}\r\"; exp_continue} \"*assword*\" {send \"${user_pass[host]}\r\";} make_ssh_pub make_ssh_auth
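免密脚本跑完之后,可以用下面这个小循环简单验证一下效果,清单文件还是用上面的 list.txt;加上 -o BatchMode=yes 是为了在免密没生效的时候直接报错,而不是卡在输密码的交互上,这个写法只是个参考:

#!/bin/bash
# 逐行读取清单文件,远程执行 hostname 和 whoami,能直接返回结果说明免密成功
# ssh 加 -n 是为了避免 ssh 把 while 循环的标准输入吃掉
while read ip user pass
do
    ssh -n -o BatchMode=yes -o ConnectTimeout=5 ${user}@${ip} "hostname; whoami" \
        && echo "${user}@${ip} 免密成功" \
        || echo "${user}@${ip} 免密失败,需要检查"
done < list.txt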
环境准备IPHOSTNAMESERVICESYSTEM192.168.131.129mysql-master1mysqlCentOS7.6192.168.131.130mysql-slave1mysqlCentOS7.6192.168.131.131mysql-slave2mysqlCentOS7.6[root@localhost ~]# sestatus SELinux status: disabled [root@localhost ~]# systemctl status firewalld ● firewalld.service - firewalld - dynamic firewall daemon Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled) Active: inactive (dead) Docs: man:firewalld(1) [root@localhost ~]# cat /etc/redhat-release CentOS Linux release 7.6.1810 (Core) [root@localhost ~]# hostnamectl --static set-hostname mysql-master1 [root@localhost ~]# hostnamectl --static set-hostname mysql-slave1 [root@localhost ~]# hostnamectl --static set-hostname mysql-master2 [root@localhost ~]# hostnamectl --static set-hostname mysql-slave2部署mysql# mysql-master和mysql-slave都需要部署 [root@mysql-master ~]# wget https://repo.mysql.com/mysql57-community-release-el7-11.noarch.rpm [root@mysql-master ~]# yum -y install mysql57-community-release-el7-11.noarch.rpm [root@mysql-master ~]# yum -y install yum-utils # 安装yum管理工具 [root@mysql-master ~]# yum-config-manager --disable mysql80-community # 禁用8.0版本 [root@mysql-master ~]# yum-config-manager --enable mysql57-community # 启用5.7版本 [root@mysql-master ~]# yum repolist enabled | grep mysql # 检查一下,确保只有一个版本 mysql-connectors-community/x86_64 MySQL Connectors Community 165 mysql-tools-community/x86_64 MySQL Tools Community 115 mysql57-community/x86_64 MySQL 5.7 Community Server 444 [root@mysql-master ~]# yum -y install mysql-community-server [root@mysql-master ~]# systemctl enable mysqld --now # 设为开机自启,并立即启动# 修改默认密码(MySQL从5.7开始不允许首次安装后使用空密码进行登录!为了加强安全性,系统会随机生成一个密码以供管理员首次登录使用,这个密码记录在/var/log/mysqld.log文件中) [root@mysql-master1 ~]# grep "temporary password" /var/log/mysqld.log 2020-08-11T01:38:32.872421Z 1 [Note] A temporary password is generated for root@localhost: pHj_Agoyi3of [root@mysql-master1 ~]# mysql -uroot -p'pHj_Agoyi3of' mysql> alter user 'root'@'localhost' identified by 'Test123.com';主从复制配置master1[root@mysql-master1 ~]# cp /etc/my.cnf{,.bak} [root@mysql-master1 ~]# > /etc/my.cnf [root@mysql-master1 ~]# vim /etc/my.cnf [mysqld] datadir = /var/lib/mysql socket = /var/lib/mysql/mysql.sock symbolic-links = 0 log-error = /var/log/mysqld.log pid-file = /var/run/mysqld/mysqld.pid #GTID: server_id = 1 gtid_mode = on enforce_gtid_consistency = on #binlog log_bin = mysql-bin log-slave-updates = 1 binlog_format = row sync-master-info = 1 sync_binlog = 1 #relay log skip_slave_start = 1[root@mysql-master1 ~]# systemctl restart mysqld[root@mysql-master1 ~]# mysql -uroot -p Enter password: mysql> show master status; # 查看master状态, 发现多了一项"Executed_Gtid_Set " +------------------+----------+--------------+------------------+-------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +------------------+----------+--------------+------------------+-------------------+ | mysql-bin.000001 | 154 | | | | +------------------+----------+--------------+------------------+-------------------+ 1 row in set (0.00 sec) mysql> show global variables like '%uuid%'; +---------------+--------------------------------------+ | Variable_name | Value | +---------------+--------------------------------------+ | server_uuid | 5d02c99a-db73-11ea-a39a-000c294ec5c2 | +---------------+--------------------------------------+ 1 row in set (0.00 sec) mysql> show global variables like '%gtid%'; # 查看确认gtid功能打开 +----------------------------------+-------+ | Variable_name | Value | 
+----------------------------------+-------+ | binlog_gtid_simple_recovery | ON | | enforce_gtid_consistency | ON | | gtid_executed | | | gtid_executed_compression_period | 1000 | | gtid_mode | ON | | gtid_owned | | | gtid_purged | | | session_track_gtids | OFF | +----------------------------------+-------+ 8 rows in set (0.00 sec) mysql> show variables like 'log_bin'; # 查看确认binlog日志功能打开 +---------------+-------+ | Variable_name | Value | +---------------+-------+ | log_bin | ON | +---------------+-------+ 1 row in set (0.00 sec) mysql> grant replication slave,replication client on *.* to slave@'192.168.%' identified by "Slave@123"; Query OK, 0 rows affected, 1 warning (0.04 sec) mysql> show grants for slave@'192.168.%'; +---------------------------------------------------------------------------+ | Grants for slave@192.168.% | +---------------------------------------------------------------------------+ | GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'slave'@'192.168.%' | +---------------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> show master status; # 查看master状态 +------------------+----------+--------------+------------------+----------------------------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +------------------+----------+--------------+------------------+----------------------------------------+ | mysql-bin.000001 | 466 | | | 5d02c99a-db73-11ea-a39a-000c294ec5c2:1 | +------------------+----------+--------------+------------------+----------------------------------------+ 1 row in set (0.00 sec)配置slave1[root@mysql-master1 ~]# cp /etc/my.cnf{,.bak} [root@mysql-master1 ~]# > /etc/my.cnf [root@mysql-master1 ~]# vim /etc/my.cnf [mysqld] datadir = /var/lib/mysql socket = /var/lib/mysql/mysql.sock symbolic-links = 0 log-error = /var/log/mysqld.log pid-file = /var/run/mysqld/mysqld.pid #GTID: server_id = 2 gtid_mode = on enforce_gtid_consistency = on #binlog log_bin = mysql-bin log-slave-updates = 1 binlog_format = row sync-master-info = 1 sync_binlog = 1 #relay log skip_slave_start = 1[root@mysql-master1 ~]# mysql -uroot -p Enter password: mysql> stop slave; Query OK, 0 rows affected, 1 warning (0.00 sec) mysql> change master to master_host='192.168.131.129',master_user='slave',master_password='Slave@123',master_auto_position=1; Query OK, 0 rows affected, 2 warnings (0.00 sec) mysql> start slave; Query OK, 0 rows affected (0.01 sec) mysql> show slave status \G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.131.129 Master_User: slave Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000001 Read_Master_Log_Pos: 466 Relay_Log_File: mysql-slave1-relay-bin.000002 Relay_Log_Pos: 679 Relay_Master_Log_File: mysql-bin.000001 Slave_IO_Running: Yes Slave_SQL_Running: Yes ......... ......... 
Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec) # mysql-slave1节点已经和mysql-master1节点配置完成主从同步关系并行复制一般Mysql主从复制有三个线程参与,都是单线程:Binlog Dump(主) -> IO Thread (从) -> SQL Thread(从)。复制出现延迟一般出在两个地方: - SQL线程忙不过来 (可能需要应用数据量较大,可能和从库本身的一些操作有锁和资源的冲突;主库可以并发写,SQL线程不可以;主要原因) - 网络抖动导致IO线程复制延迟(次要原因)。 MySQL主从复制延迟的解决办法:MySQL从5.6开始有了SQL Thread多个的概念,可以并发还原数据,即并行复制技术。并行复制的机制,是MySQL的一个非常重要的特性,可以很好的解决MySQL主从延迟问题!MySQL 5.6版本支持所谓的并行复制,但是其并行只是基于schema的,也就是基于库的。 如果用户的MySQL数据库实例中存在多个schema,对于从机复制的速度的确可以有比较大的帮助。 但是基于schema的并行复制存在两个问题: 1) crash safe功能不好做,因为可能之后执行的事务由于并行复制的关系先完成执行,那么当发生crash的时候,这部分的处理逻辑是比较复杂的。 2) 最为关键的问题是这样设计的并行复制效果并不高,如果用户实例仅有一个库,那么就无法实现并行回放,甚至性能会比原来的单线程更差。而 单库多表是比多库多表更为常见的一种情形 。 注意:mysql 5.6 的MTS是基于库级别的并行,当有多个数据库时,可以将slave_parallel_workers设置为数据库的数量,为了避免新建库后来回修改,也可以将该参数设置的大一些。设置为库级别的事务时,不允许这样做,会报错。 在MySQL 5.7 中,引入了基于组提交的并行复制(官方称为Enhanced Multi-threaded Slaves,即MTS),设置参数 slave_parallel_workers>0 并且 global.slave_parallel_type='LOGICAL_CLOCK',即可支持一个 schema 下, slave_parallel_workers 个的 worker 线程并发执行 relay log 中主库提交的事务。 其核心思想:一个组提交的事务都是可以并行回放(配合 binary log group commit );slave 机器的relay log 中 last_committed 相同的事务( sequence_num 不同)可以并发执行。其中,变量 slave-parallel-type 可以有两个值: 1 )DATABASE 默认值,基于库的并行复制方式; 2 )LOGICAL_CLOCK,基于组提交的并行复制方式; MySQL 5.7是基于组提交的并行复制,并且是支持"真正"的并行复制功能,这其中最为主要的原因:就是slave服务器的回放与主机是一致的, 即master服务器上是怎么并行执行的slave上就怎样进行并行回放。不再有库的并行复制限制,对于二进制日志格式也无特殊的要求(基于库的并行复制也没有要求)。 MySQL5.7的并行复制,期望最大化还原主库的并行度,实现方式是在binlog event中增加必要的信息,以便slave节点根据这些信息实现并行复制。MySQL5.7的并行复制建立在group commit的基础上,所有在主库上能够完成prepared的语句表示没有数据冲突,就可以在slave节点并行复制。配置master1[root@mysql-master1 ~]# cp /etc/my.cnf{,.bak} [root@mysql-master1 ~]# vim /etc/my.cnf # 基于GTID主从复制结构,加入并行复制的配置 [mysqld] datadir = /var/lib/mysql socket = /var/lib/mysql/mysql.sock symbolic-links = 0 log-error = /var/log/mysqld.log pid-file = /var/run/mysqld/mysqld.pid #GTID: server_id = 1 gtid_mode = on enforce_gtid_consistency = on #binlog log_bin = mysql-bin log-slave-updates = 1 binlog_format = row sync-master-info = 1 sync_binlog = 1 #relay log skip_slave_start = 1 #不配置binlog_group_commit从库无法做到基于事物的并行复制 binlog_group_commit_sync_delay = 100 binlog_group_commit_sync_no_delay_count = 10 #为了数据安全再配置 sync_binlog=1 innodb_flush_log_at_trx_commit =1 #这个参数控制binlog写入 磁盘的方式。设置为1时,表示每次commit;都写入磁盘。这个刷新的是redo log 即ib_logfile0,而不是binlog[root@mysql-master1 ~]# systemctl restart mysqld[root@mysql-master1 ~]# mysql -uroot -p Enter password: mysql> show variables like 'binlog_group_commit_%'; +-----------------------------------------+-------+ | Variable_name | Value | +-----------------------------------------+-------+ | binlog_group_commit_sync_delay | 100 | | binlog_group_commit_sync_no_delay_count | 10 | +-----------------------------------------+-------+ 2 rows in set (0.01 sec) # 设置binlog_group_commit的上面两个参数,否则从库无法做到基于事物的并行复制! 这两个参数共同决定了是否触发组提交操作! # 第二个参数表示该事务组提交之前总共等待累积到多少个事务(如上要累计到10个事务); # 第一个参数则表示该事务组总共等待多长时间后进行提交(如上要总共等待100毫秒的时间),任何一个条件满足则进行后续操作。 # 因为有这个等待,可以让更多事务的binlog通过一次写binlog文件磁盘来完成提交,从而获得更高的吞吐量。配置slave1'记住:只要主数据库的mysqld服务重启,那么从数据库上就要重启slave,以恢复主从同步状态!!! 
[root@mysql-slave1 ~]# cp /etc/my.cnf{,.bak} [root@mysql-slave1 ~]# vim /etc/my.cnf [mysqld] datadir = /var/lib/mysql socket = /var/lib/mysql/mysql.sock symbolic-links = 0 log-error = /var/log/mysqld.log pid-file = /var/run/mysqld/mysqld.pid #GTID: server_id = 2 gtid_mode = on enforce_gtid_consistency = on #binlog log_bin = mysql-bin log-slave-updates = 1 binlog_format = row sync-master-info = 1 sync_binlog = 1 #relay log skip_slave_start = 1 read_only = on slave-parallel-type = LOGICAL_CLOCK #开启逻辑时钟的复制 slave-parallel-workers = 4 #这里设置线程数为4 (最大线程数不能超过16,即最大线程为16) master_info_repository = TABLE relay_log_info_repository = TABLE relay_log_recovery = on[root@mysql-slave1 ~]# systemctl restart mysqld[root@mysql-slave1 ~]# mysql -uroot -p Enter password: mysql> stop slave; Query OK, 0 rows affected, 1 warning (0.00 sec) mysql> start slave; Query OK, 0 rows affected (0.10 sec) mysql> show slave status \G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.131.129 Master_User: slave Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000001 Read_Master_Log_Pos: 466 Relay_Log_File: mysql-slave1-relay-bin.000002 Relay_Log_Pos: 679 Relay_Master_Log_File: mysql-bin.000001 Slave_IO_Running: Yes Slave_SQL_Running: Yes ......... ......... Retrieved_Gtid_Set: 5d02c99a-db73-11ea-a39a-000c294ec5c2:1 Executed_Gtid_Set: 5d02c99a-db73-11ea-a39a-000c294ec5c2:1 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec) # mysql-slave1从数据库恢复了与mysql-master1主数据库的同步关系了# 查看线程数,这个跟在my.cnf文件里配置的是一样的 mysql> show variables like '%slave_para%'; +------------------------+---------------+ | Variable_name | Value | +------------------------+---------------+ | slave_parallel_type | LOGICAL_CLOCK | | slave_parallel_workers | 4 | +------------------------+---------------+ 2 rows in set (0.01 sec)半同步复制 默认情况下MySQL的复制是异步的,master将新生成的binlog发送给各slave后,无需等待slave的ack回复(slave将接收到的binlog写进relay log后才会回复ack),直接就认为这次DDL/DML成功了; 半同步复制(semi-synchronous replication)是指master将新生成的binlog发送给各slave时, 只需等待一个(默认)slave返回的ack信息就返回成功。 MySQL 5.7对半同步复制作了大改进,新增了一个master线程。 在MySQL 5.7以前,master上的binlog dump线程负责两件事:dump日志给slave的io_thread;接收来自slave的ack消息。它们是串行方式工作的。 在MySQL 5.7中,新增了一个专门负责接受ack消息的线程ack collector thread。这样master上有两个线程独立工作,可以同时发送binlog到slave和接收slave的ack。还新增了几个变量,其中最重要的是 rpl_semi_sync_master_wait_point ,它使得MySQL半同步复制有两种工作模型。半同步复制的两种类型 从MySQL 5.7.2开始,MySQL支持两种类型的半同步复制。这两种类型由变量 rpl_semi_sync_master_wait_point (MySQL 5.7.2之前没有该变量)控制,它有两种值:AFTER_SYNC和AFTER_COMMIT。在MySQL 5.7.2之后,默认值为AFTER_SYNC,在此版本之前,等价的类型为AFTER_COMMIT。这个变量控制的是master何时提交、何时接收ack以及何时回复成功信息给客户端的时间点。 - AFTER_SYNC模式:master将新事务写进binlog(buffer)后发送给slave,再sync到自己的binlog file(disk), 之后才允许接收slave的ack回复,接收到ack之后才会提交事务,并返回成功信息给客户端。 - AFTER_COMMIT模式:master将新事务写进binlog(buffer)后发送给slave,再sync到自己的binlog file(disk),然后直接提交事务。之后才允许接收slave的ack回复,然后再返回成功信息给客户端。AFTER_SYNC和AFTER_COMMIT的优缺点 AFTER_SYNC - 对于所有客户端来说,它们看到的数据是一样的,因为它们看到的数据都是在接收到slave的ack后提交后的数据。 - 这种模式下,如果master突然故障,不会丢失数据,因为所有成功的事务都已经写进slave的relay log中了,slave的数据是最新的。 AFTER_COMMIT - 不同客户端看到的数据可能是不一样的。对于发起事务请求的那个客户端,它只有在master提交事务且收到slave的ack后才能看到提交的数据。但对于那些非本次事务的请求客户端,它们在master提交后就能看到提交后的数据,这时候master可能还没收到slave的ack。 - 如果master收到ack回复前,slave和master都故障了,那么将丢失这个事务中的数据。 在MySQL 5.7.2之前,等价的模式是 AFTER_COMMIT ,在此版本之后,默认的模式为 AFTER_SYNC ,该模式能最大程度地保证数据安全性,且性能上并不比 AFTER_COMMIT 差。配置master1[root@mysql-master1 ~]# cp /etc/my.cnf{,.bak} [root@mysql-master1 ~]# vim /etc/my.cnf # 基于GTID主从复制和并行复制,加入半同步复制 [mysqld] datadir = /var/lib/mysql 
socket = /var/lib/mysql/mysql.sock symbolic-links = 0 log-error = /var/log/mysqld.log pid-file = /var/run/mysqld/mysqld.pid #GTID: server_id = 1 gtid_mode = on enforce_gtid_consistency = on #binlog log_bin = mysql-bin log-slave-updates = 1 binlog_format = row sync-master-info = 1 sync_binlog = 1 #relay log skip_slave_start = 1 #不配置binlog_group_commit从库无法做到基于事物的并行复制 binlog_group_commit_sync_delay = 100 binlog_group_commit_sync_no_delay_count = 10 #开启半同步复制 (超时时间为1s) plugin-load=rpl_semi_sync_master=semisync_master.so rpl_semi_sync_master_enabled = 1 rpl_semi_sync_master_timeout = 1000[root@mysql-master1 ~]# systemctl restart mysqld[root@mysql-master1 ~]# mysql -uroot -p Enter password: mysql> install plugin rpl_semi_sync_master soname 'semisync_master.so'; ERROR 1125 (HY000): Function 'rpl_semi_sync_master' already exists # 在mysql-master主数据库上加载 (前提是/usr/lib64/mysql/plugin/semisync_master.so 文件存在。 一般mysql安装后就默认产生),我的已经默认带有这个function mysql> select plugin_name, -> plugin_status from information_schema.plugins -> where plugin_name like '%semi%'; # 查看插件是否加载成功 +----------------------+---------------+ | plugin_name | plugin_status | +----------------------+---------------+ | rpl_semi_sync_master | ACTIVE | +----------------------+---------------+ 1 row in set (0.01 sec) mysql> show status like 'Rpl_semi_sync_master_status'; # 查看半同步是否在运行 +-----------------------------+-------+ | Variable_name | Value | +-----------------------------+-------+ | Rpl_semi_sync_master_status | ON | +-----------------------------+-------+ 1 row in set (0.00 sec)配置slave1[root@mysql-slave1 ~]# cp /etc/my.cnf{,.bak} [root@mysql-slave1 ~]# vim /etc/my.cnf [mysqld] datadir = /var/lib/mysql socket = /var/lib/mysql/mysql.sock symbolic-links = 0 log-error = /var/log/mysqld.log pid-file = /var/run/mysqld/mysqld.pid #GTID: server_id = 1 gtid_mode = on enforce_gtid_consistency = on #binlog log_bin = mysql-bin log-slave-updates = 1 binlog_format = row sync-master-info = 1 sync_binlog = 1 #relay log skip_slave_start = 1 #不配置binlog_group_commit从库无法做到基于事物的并行复制 binlog_group_commit_sync_delay = 100 binlog_group_commit_sync_no_delay_count = 10 #为了数据安全再配置 sync_binlog=1 innodb_flush_log_at_trx_commit =1 #这个参数控制binlog写入 磁盘的方式。设置为1时,表示每次commit;都写入磁盘。这个刷新的是redo log 即ib_logfile0,而不是binlog # 开启半同步复制 plugin-load=rpl_semi_sync_slave=semisync_slave.so rpl_semi_sync_slave_enabled=1[root@mysql-slave1 ~]# systemctl restart mysqld[root@mysql-slave1 ~]# mysql -uroot -p Enter password: mysql> install plugin rpl_semi_sync_master soname 'semisync_master.so'; Query OK, 0 rows affected (0.00 sec) # 在mysql-slave1从数据库上加载 (前提是/usr/lib64/mysql/plugin/semisync_slave.so 文件存在。 一般mysql安装后就默认产生) mysql> select plugin_name, -> plugin_status from information_schema.plugins -> where plugin_name like '%semi%'; # 查看插件是否加载成功 +----------------------+---------------+ | plugin_name | plugin_status | +----------------------+---------------+ | rpl_semi_sync_slave | ACTIVE | | rpl_semi_sync_master | ACTIVE | +----------------------+---------------+ 2 rows in set (0.00 sec) mysql> show status like 'Rpl_semi_sync_slave_status'; +----------------------------+-------+ | Variable_name | Value | +----------------------------+-------+ | Rpl_semi_sync_slave_status | OFF | +----------------------------+-------+ 1 row in set (0.00 sec) # 发现是OFF,这是因为此时还没有生效,必须从数据库上的IO线程才能生产!! 
mysql> stop slave IO_THREAD; Query OK, 0 rows affected, 1 warning (0.00 sec) mysql> start slave IO_THREAD; Query OK, 0 rows affected (0.00 sec) mysql> show status like 'Rpl_semi_sync_slave_status'; # 然后再查看mysql-slave1的半同步状态,发现就已经开启了! +----------------------------+-------+ | Variable_name | Value | +----------------------------+-------+ | Rpl_semi_sync_slave_status | ON | +----------------------------+-------+ 1 row in set (0.00 sec) mysql> show slave status \G # 再次查看主从同步状态,发现主从同步出现异常,这个时候再重启下slave即可! *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.131.129 Master_User: slave Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000002 Read_Master_Log_Pos: 194 Relay_Log_File: mysql-slave1-relay-bin.000005 Relay_Log_Pos: 4 Relay_Master_Log_File: mysql-bin.000002 Slave_IO_Running: Yes Slave_SQL_Running: No mysql> stop slave; Query OK, 0 rows affected (0.01 sec) mysql> start slave; Query OK, 0 rows affected (0.01 sec) mysql> show slave status \G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.131.129 Master_User: slave Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000002 Read_Master_Log_Pos: 194 Relay_Log_File: mysql-slave1-relay-bin.000007 Relay_Log_Pos: 367 Relay_Master_Log_File: mysql-bin.000002 Slave_IO_Running: Yes Slave_SQL_Running: Yesmaster1查看Rpl_semimysql> show status like '%Rpl_semi%'; +--------------------------------------------+-------+ | Variable_name | Value | +--------------------------------------------+-------+ | Rpl_semi_sync_master_clients | 1 | | Rpl_semi_sync_master_net_avg_wait_time | 0 | | Rpl_semi_sync_master_net_wait_time | 0 | | Rpl_semi_sync_master_net_waits | 0 | | Rpl_semi_sync_master_no_times | 0 | | Rpl_semi_sync_master_no_tx | 0 | | Rpl_semi_sync_master_status | ON | | Rpl_semi_sync_master_timefunc_failures | 0 | | Rpl_semi_sync_master_tx_avg_wait_time | 0 | | Rpl_semi_sync_master_tx_wait_time | 0 | | Rpl_semi_sync_master_tx_waits | 0 | | Rpl_semi_sync_master_wait_pos_backtraverse | 0 | | Rpl_semi_sync_master_wait_sessions | 0 | | Rpl_semi_sync_master_yes_tx | 0 | +--------------------------------------------+-------+ 14 rows in set (0.00 sec) # 从上面信息,发现Rpl_semi_sync_master_clients的数值为1,说明此时mysql-master主数据库已经有一个半同步复制的从机,即mysql-slave1节点。 # Rpl_semi_sync_master_yes_tx的数值为0, 说明此时还没有半同步复制的sql语句被执行。主库写入数据后,Rpl_semi_sync_master_yes_tx的数值为sql语句的数量slave2加入主从复制&并行复制&半同步复制[root@mysql-slave2 ~]# cp /etc/my.cnf{,.bak} [root@mysql-slave2 ~]# > /etc/my.cnf [root@mysql-slave2 ~]# vim /etc/my.cnf [mysqld] datadir = /var/lib/mysql socket = /var/lib/mysql/mysql.sock symbolic-links = 0 log-error = /var/log/mysqld.log pid-file = /var/run/mysqld/mysqld.pid #GTID: server_id = 3 gtid_mode = on enforce_gtid_consistency = on #binlog log_bin = mysql-bin log-slave-updates = 1 binlog_format = row sync-master-info = 1 sync_binlog = 1 #relay log skip_slave_start = 1 read_only = on slave-parallel-type = LOGICAL_CLOCK #开启逻辑时钟的复制 slave-parallel-workers = 4 #最大线程16 master_info_repository = TABLE relay_log_info_repository = TABLE relay_log_recovery = on # 开启半同步复制 plugin-load=rpl_semi_sync_slave=semisync_slave.so rpl_semi_sync_slave_enabled=1[root@mysql-slave2 ~]# systemctl restart mysqld[root@mysql-slave2 ~]# mysql -uroot -p Enter password: mysql> show global variables like 'gtid_%'; +----------------------------------+-------+ | Variable_name | Value | +----------------------------------+-------+ | 
gtid_executed | | | gtid_executed_compression_period | 1000 | | gtid_mode | ON | | gtid_owned | | | gtid_purged | | +----------------------------------+-------+ 5 rows in set (0.01 sec) mysql> stop slave; Query OK, 0 rows affected, 1 warning (0.00 sec) mysql> change master to master_host='192.168.131.129',master_user='slave',master_password='Slave@123',master_auto_position=1; # 开启主从复制 Query OK, 0 rows affected, 2 warnings (0.00 sec) mysql> start slave; Query OK, 0 rows affected (0.02 sec) mysql> show slave status \G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.131.129 Master_User: slave Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000002 Read_Master_Log_Pos: 194 Relay_Log_File: mysql-slave2-relay-bin.000003 Relay_Log_Pos: 407 Relay_Master_Log_File: mysql-bin.000002 Slave_IO_Running: Yes Slave_SQL_Running: Yes mysql> show variables like '%slave_para%'; # 查看并行复制 +------------------------+---------------+ | Variable_name | Value | +------------------------+---------------+ | slave_parallel_type | LOGICAL_CLOCK | | slave_parallel_workers | 4 | +------------------------+---------------+ 2 rows in set (0.00 sec) mysql> install plugin rpl_semi_sync_master soname 'semisync_master.so'; # 开启半同步复制 Query OK, 0 rows affected (0.04 sec) mysql> select plugin_name, -> plugin_status from information_schema.plugins -> where plugin_name like '%semi%'; +----------------------+---------------+ | plugin_name | plugin_status | +----------------------+---------------+ | rpl_semi_sync_slave | ACTIVE | | rpl_semi_sync_master | ACTIVE | +----------------------+---------------+ 2 rows in set (0.00 sec) mysql> stop slave IO_THREAD; Query OK, 0 rows affected (0.00 sec) mysql> start slave IO_THREAD; Query OK, 0 rows affected (0.00 sec) mysql> show status like 'Rpl_semi_sync_slave_status'; +----------------------------+-------+ | Variable_name | Value | +----------------------------+-------+ | Rpl_semi_sync_slave_status | ON | +----------------------------+-------+ 1 row in set (0.00 sec)# 回到mysql-master1主数据库查看 mysql> show slave hosts; +-----------+------+------+-----------+--------------------------------------+ | Server_id | Host | Port | Master_id | Slave_UUID | +-----------+------+------+-----------+--------------------------------------+ | 3 | | 3306 | 1 | 5d138126-db73-11ea-988b-000c29bef1e6 | | 2 | | 3306 | 1 | 5d0fecd7-db73-11ea-b20e-000c2986ee9d | +-----------+------+------+-----------+--------------------------------------+ 2 rows in set (0.00 sec) # mysql-master1主数据库现在有两个从数据库,分别为mysql-slave1 和 mysql-slave2 mysql> show status like '%Rpl_semi%'; +--------------------------------------------+-------+ | Variable_name | Value | +--------------------------------------------+-------+ | Rpl_semi_sync_master_clients | 2 | | Rpl_semi_sync_master_net_avg_wait_time | 0 | | Rpl_semi_sync_master_net_wait_time | 0 | | Rpl_semi_sync_master_net_waits | 0 | | Rpl_semi_sync_master_no_times | 0 | | Rpl_semi_sync_master_no_tx | 0 | | Rpl_semi_sync_master_status | ON | | Rpl_semi_sync_master_timefunc_failures | 0 | | Rpl_semi_sync_master_tx_avg_wait_time | 0 | | Rpl_semi_sync_master_tx_wait_time | 0 | | Rpl_semi_sync_master_tx_waits | 0 | | Rpl_semi_sync_master_wait_pos_backtraverse | 0 | | Rpl_semi_sync_master_wait_sessions | 0 | | Rpl_semi_sync_master_yes_tx | 0 | +--------------------------------------------+-------+ 14 rows in set (0.01 sec) # mysql-master1主数据库现在有两个半同步复制的从库,即mysql-slave1 和mysql-slave2
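到这里整个链路就搭完了,最后可以在 mysql-master1 上随便造一点数据,验证半同步是否真的生效,下面的 semi_test 库和 t1 表都是我随手起的名字,仅作演示:

# 在主库建一个测试库表,并写入一条数据
mysql -uroot -p'Test123.com' -e "create database if not exists semi_test;"
mysql -uroot -p'Test123.com' -e "create table if not exists semi_test.t1 (id int primary key, note varchar(20));"
mysql -uroot -p'Test123.com' -e "insert into semi_test.t1 values (1,'hello');"

# 主库上 Rpl_semi_sync_master_yes_tx 会随着写入增长,说明事务确实拿到了从库的 ack
mysql -uroot -p'Test123.com' -e "show status like 'Rpl_semi_sync_master_yes_tx';"

# 到 mysql-slave1 / mysql-slave2 上查询,确认数据已经同步过来
mysql -uroot -p'Test123.com' -e "select * from semi_test.t1;"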
故事前景接了个私活,需要安装canal,canal需要mysql开启binlog功能,查看了mysql的配置文件,看到已经写了log_bin参数,此时进入mysql,执行sql语句确认binlog功能是否为ON [sql语句:show variables like 'log_bin';],结果显示为OFF,于是开启了排查之路查看docker启动时挂载了哪些目录docker inspect 9e33b294e948 | grep Binds -A 4预期出现类似如下的输出,以本地实际环境为准docker run启动的时候,-v参数所挂载的目录,会在docker inspect的Binds这块找到"Binds": [ "/etc/localtime:/etc/localtime:ro", "/data/mysql-test/conf:/etc/mysql", "/data/mysql-test/data:/var/lib/mysql" ],这时,查看一下本地持久化配置文件的目录,发现,只有一个my.cnf文件问题就出现在这一块:本地直接使用yum安装的mysql,默认的配置文件存储路径是/etc/mysql/my.cnf但是docker容器其实并非如此# tree /data/mysql-test/conf /data/mysql-test/conf └── my.cnf使用相同镜像启动一个mysql因为只是查看一下mysql的配置文件情况,就简单的启动mysql即可如果不给-e MYSQL_ROOT_PASSWORD=root参数,容器无法在后台运行,就无法把配置文件获取到宿主机docker run -d -e MYSQL_ROOT_PASSWORD=root mysql:5.7新建一个目录用来存放容器内的mysql配置文件mkdir -p /data/mysql-new/conf复制容器内的mysql配置文件到本地docker cp <容器ID>:/etc/mysql/ /data/mysql-new/conf/查看mysql配置文件目录结构为什么要拿到本地?反正也要拿到本地重新挂载,早晚都要拿,总不能手撸配置文件吧# tree /data/mysql-new/conf/ /data/mysql-new/conf/ ├── conf.d │ ├── docker.cnf │ ├── mysql.cnf │ └── mysqldump.cnf ├── my.cnf -> /etc/alternatives/my.cnf ├── my.cnf.fallback ├── mysql.cnf └── mysql.conf.d └── mysqld.cnf那么问题来了,这么多文件,到底哪个才是默认的配置文件呢,那就一个个看吧conf/conf.d/docker.cnf[mysqld] skip-host-cache skip-name-resolveconf/conf.d/mysql.cnf[mysql]conf/conf.d/mysqldump.cnf[mysqldump] quick quote-names max_allowed_packet = 16Mconf/my.cnf这个文件在本地看不了,因为他是一个软连接文件,文件链接的路径是/etc/alternatives/my.cnf而/etc/alternatives/my.cnf这个文件也是一个软连接文件,文件的连接路径是/etc/mysql/mysql.cnf咱也不知道官方为啥要这样套娃,咱也不敢问conf/my.cnf.fallback# # The MySQL database server configuration file. # You can copy this to one of: # - "/etc/mysql/my.cnf" to set global options, # - "~/.my.cnf" to set user-specific options. # One can use all long options that the program supports. # Run program with --help to get a list of available options and with # --print-defaults to see which it would actually understand and use. # For explanations see # http://dev.mysql.com/doc/mysql/en/server-system-variables.html # This will be passed to all mysql clients # It has been reported that passwords should be enclosed with ticks/quotes # escpecially if they contain "#" chars... # Remember to edit /etc/mysql/debian.cnf when changing the socket location. # Here is entries for some specific programs # The following values assume you have at least 32M ram # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License, version 2.0, # as published by the Free Software Foundation. # This program is also distributed with certain software (including # but not limited to OpenSSL) that is licensed under separate terms, # as designated in a particular file or component or in included license # documentation. The authors of MySQL hereby grant you an additional # permission to link the program and your derivative works with the # separately licensed software that they have included with MySQL. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License, version 2.0, for more details. 
# You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA !includedir /etc/mysql/conf.d/ # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License, version 2.0, # as published by the Free Software Foundation. # This program is also distributed with certain software (including # but not limited to OpenSSL) that is licensed under separate terms, # as designated in a particular file or component or in included license # documentation. The authors of MySQL hereby grant you an additional # permission to link the program and your derivative works with the # separately licensed software that they have included with MySQL. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License, version 2.0, for more details. # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA # The MySQL Server configuration file. # For explanations see # http://dev.mysql.com/doc/mysql/en/server-system-variables.html [mysqld] pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock datadir = /var/lib/mysql #log-error = /var/log/mysql/error.log # By default we only accept connections from localhost #bind-address = 127.0.0.1 # Disabling symbolic-links is recommended to prevent assorted security risks symbolic-links=0真假配置文件已经显而易见了docker容器启动的mysql默认的配置文件其实是/etc/mysql/mysql.conf.d/mysqld.conf因此,如果需要将本地配置文件挂载到容器里面,只需要挂载这一个文件即可,此时我们修改本地的mysql.conf.d/mysqld.conf文件,开启binlog,并验证是否修改成功启动mysql容器精简一下本地mysql配置文件目录,就保留一个mysqld.cnf文件即可# tree /data/mysql-new/conf/ /data/mysql-new/conf/ └── mysqld.cnf在mysqld.cnf文件最后加上这两行,用来开启binlog日志log_bin=mysql-bin server_id=33091启动mysql容器docker run -d \ -e MYSQL_ROOT_PASSWORD=root \ -v /etc/localtime:/etc/localtime \ -v /data/mysql-new/conf/mysqld.cnf:/etc/mysql/mysql.conf.d/mysqld.cnf \ -v /data/mysql-new/data:/var/lib/mysql \ -p 3309:3306 \ --name mysql-new \ mysql:5.7数据库就不进去了,直接使用-e参数将结果返回到终端页面# mysql -uroot -p -P3309 -h192.168.100.200 -e "show variables like 'log_bin';" Enter password: +---------------+-------+ | Variable_name | Value | +---------------+-------+ | log_bin | ON | +---------------+-------+此时,找到了为何已经启动的mysql容器加载不到配置文件的原因了同时,也学到了一个新的经验,当容器需要持久化的时候,最好是简单启动一下这个容器,查看一下持久化目录的结构以及是否存在依赖的情况,根据实际来选择到底是目录挂载,还是单配置文件挂载,避免本地错误目录结构覆盖了容器内的目录结构,当一些配置没有更新的时候,排查真的很头疼后续将会在头脑清醒的时候去修复已经启动的mysql环境,预知后事如何,请看下集 [填别人留下的坑,真的难顶]
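补充一个小技巧,其实可以直接问 mysqld 它会按什么顺序读哪些配置文件,不用一个个文件去翻;下面的 mysql-new 是上面启动的容器名称,如果提示不能用 root 运行 mysqld,可以再加上 --user=mysql 试试:

# mysqld --verbose --help 的输出里会有一段 "Default options are read from the following files in the given order"
# 后面跟着的就是配置文件的读取顺序
docker exec -it mysql-new sh -c "mysqld --verbose --help 2>/dev/null | grep -A 1 'Default options'"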
frm文件和ibd文件简介在MySQL中,使用默认的存储引擎innodb创建一张表,那么在库名文件夹下面就会出现表名.frm和表名.ibd两个文件ibd文件是innodb的表数据文件frm文件是innodb的表结构文件需要注意的是,frm文件和ibd文件都是不能直接打开的恢复数据之前,需要先恢复表结构在有建表语句的前提下,可以直接跳到ibd文件恢复表数据,不需要使用frm文件恢复表结构frm文件恢复表结构前提是已经备份了对应的frm文件建议重新启动一个MySQL实例,待数据恢复后,通过mysqldump备份数据,再重新恢复到需要使用的数据库里在新启动的实例上创建一个同名的表,例如study.frm,表示表名称为study在不知道表结构的情况下,可以先定义一个字段,稍后可以通过mysql.err日志内查看表字段的数量create table study (id int);创建完表后,在对应的数据目录下就会生成study.frm和study.ibd文件,然后使用之前备份的study.frm来替换现有的study.frm,切记,不要着急替换study.ibd文件,这个文件在恢复表结构后再使用注意替换文件后的study.frm文件的权限,确保和其他文件的属主和属组是一样的重启mysql数据库查看日志grep study mysql.err | grep columns容器启动的MySQL,直接使用docker restart <容器id>来重启MySQL服务如果是容器启动的MySQL,可以使用下面的命令在容器外查看日志docker logs <容器id> | grep study | grep columns通过日志,我们可以看到,study这个表,之前有5个字段,但是我们现在只有1个字段[Warning] InnoDB: Table hello@002dworld/study contains 1 user defined columns in InnoDB, but 5 columns in MySQL. Please check INFORMATION_SCHEMA.INNODB_SYS_COLUMNS and http://dev.mysql.com/doc/refman/5.7/en/innodb-troubleshooting.html for how to resolve the issue.这个时候,我们可以把原来的表删掉drop table study;然后重新创建一个和原来的表相同字段的表,切记,表名称要一样,字段内容不重要,只需要字段数量一致create table study (id1 int,id2 int,id3 int,id4 int,id5 int);现在可以看到我们的建表语句了,当然,这个是上面使用的建表语句,咱们继续往下show create table study\G *************************** 1. row *************************** Table: study Create Table: CREATE TABLE `study` ( `id1` int(11) DEFAULT NULL, `id2` int(11) DEFAULT NULL, `id3` int(11) DEFAULT NULL, `id4` int(11) DEFAULT NULL, `id5` int(11) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin 1 row in set (0.00 sec)确认是否开启了innodb_force_recovery参数,正常情况下,如果不是为了恢复数据是不会开启这个参数的innodb_force_recovery 参数需要配置到 my.cnf 中的 [mysqld] 模块下,取值范围是0-6, 默认是01: (SRV_FORCE_IGNORE_CORRUPT): 忽略检查到的corrupt页2: (SRV_FORCE_NO_BACKGROUND): 阻止主线程的运行,如主线程需要执行full purge操作,会导致crash3: (SRV_FORCE_NO_TRX_UNDO): 不执行事务回滚操作4: (SRV_FORCE_NO_IBUF_MERGE): 不执行插入缓冲的合并操作5: (SRV_FORCE_NO_UNDO_LOG_SCAN): 不查看重做日志,InnoDB存储引擎会将未提交的事务视为已提交6: (SRV_FORCE_NO_LOG_REDO): 不执行前滚的操作当设置参数值大于0后,可以对表进行select、create、drop操作,但insert、update或者delete这类操作是不允许的grep 'innodb_force' my.cnf插入配置到my.cnf配置文件中,然后再次替换study.frm文件,并重启MySQL服务注意替换文件后的study.frm文件的权限,确保和其他文件的属主和属组是一样的sed -i '/\[mysqld\]/a\innodb_force_recovery=6' my.cnf重启完成后,再次查看建表语句show create table study\G *************************** 1. 
row *************************** Table: study Create Table: CREATE TABLE `study` ( `id` int(11) DEFAULT NULL, `name` varchar(20) COLLATE utf8_bin DEFAULT NULL, `age` int(11) DEFAULT NULL, `time` int(11) DEFAULT NULL, `lang` varchar(20) COLLATE utf8_bin DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin 1 row in set (0.00 sec)到这里,我们已经成功找回之前的建表语句了,通过这个语句,就可以恢复之前的表了复制获取的建表语句,注释掉之前的innodb_force_recovery参数,并且重启MySQL服务sed -i '/innodb_force_recovery/s/^\(.*\)$/#\1/g' my.cnf再次删掉study这个表drop table study;然后使用上面获取到的建表语句重新建表,注意最后加上一个分号,这是SQL的语法格式CREATE TABLE `study` ( `id` int(11) DEFAULT NULL, `name` varchar(20) COLLATE utf8_bin DEFAULT NULL, `age` int(11) DEFAULT NULL, `time` int(11) DEFAULT NULL, `lang` varchar(20) COLLATE utf8_bin DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;ibd文件恢复表数据在有建表语句的情况下,使用idb文件恢复数据,相比使用frm文件恢复表数据要简单方便很多删除当前的ibd文件alter table study discard tablespace;将之前备份的study.ibd文件复制到对应的数据目录下,使用下面的命令将数据加载到MySQL数据库里注意替换文件后的study.ibd文件的权限,确保和其他文件的属主和属组是一样的alter table study import tablespace;再次查看数据表,发现之前的数据也回来了select * from study;+------+------+------+----------+---------+ | id | name | age | time | lang | +------+------+------+----------+---------+ | 1 | tom | 26 | 20211024 | chinese | +------+------+------+----------+---------+记得备份数据,数据是无价的通过脚本利用ibd文件恢复数据前提是表结构是存在的注意自己的数据库是否区分大小写,以及表名称是否有大小写,如果表名称有大小写,新启动的mysql一定要开启大小写[开启大小写参数:lower_case_table_names = 0]mysql_user变量的值为mysql数据目录的属主和属组根据实际场景修改mysql_cmd变量的值,修改成自己用户名,用户密码,主机ipmysql_data_dir变量的值为mysql数据存储路径back_data_dir变量的值为备份下来的ibd文件存储路径#!/bin/bash base_dir=$(cd `dirname $0`; pwd) mysql_user='mysql' mysql_cmd="mysql -N -uroot -proot -h192.168.70.49" databases_list=($(${mysql_cmd} -e 'SHOW DATABASES;' | egrep -v 'information_schema|mysql|performance_schema|sys')) mysql_data_dir='/var/lib/mysql' back_data_dir='/tmp/back-data' for (( i=0; i<${#databases_list[@]}; i++ )) tables_list=($(${mysql_cmd} -e "SELECT table_name FROM information_schema.tables WHERE table_schema=\"${databases_list[i]}\";")) database_name=${databases_list[i]/-/@002d} for (( table=0; table<${#tables_list[@]}; table++ )) ${mysql_cmd} -e "alter table \`${databases_list[i]}\`.${tables_list[table]} discard tablespace;" rm -f ${mysql_data_dir}/${database_name}/${tables_list[table]}.ibd cp ${back_data_dir}/${database_name}/${tables_list[table]}.ibd ${mysql_data_dir}/${database_name}/ chown -R ${mysql_user}.${mysql_user} ${mysql_data_dir}/${database_name}/ ${mysql_cmd} -e "alter table \`${databases_list[i]}\`.${tables_list[table]} import tablespace;" sleep 5 done通过shell脚本导出mysql所有库的所有表的表结构mysql_cmd和dump_cmd的变量值根据实际环境修改,修改成自己用户名,用户密码,主机ipdatabases_list只排除了mysql的系统库,如果需要排除其他库,可以修改egrep -v后面的值导出的表结构以库名来命名,并且加入了CREATE DATABASE IF NOT EXISTS语句#!/bin/bash base_dir=$(cd `dirname $0`; pwd) mysql_cmd="mysql -N -uroot -proot -h192.168.70.49" dump_cmd="mysqldump -uroot -proot -h192.168.70.49" databases_list=($(${mysql_cmd} -e 'SHOW DATABASES;' | egrep -v 'information_schema|mysql|performance_schema|sys')) for (( i=0; i<${#databases_list[@]}; i++ )) tables_list=($(${mysql_cmd} -e "SELECT table_name FROM information_schema.tables WHERE table_schema=\"${databases_list[i]}\";")) [[ ! 
-f "${base_dir}/${databases_list[i]}.sql" ]] || rm -f ${base_dir}/${databases_list[i]}.sql echo "CREATE DATABASE IF NOT EXISTS \`${databases_list[i]}\`;" >> ${base_dir}/${databases_list[i]}.sql echo "USE \`${databases_list[i]}\`;" >> ${base_dir}/${databases_list[i]}.sql for (( table=0; table<${#tables_list[@]}; table++ )) ${dump_cmd} -d ${databases_list[i]} ${tables_list[table]} >> ${base_dir}/${databases_list[i]}.sql
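顺手再补一个物理文件备份的小例子,前面不管是 frm 恢复表结构还是 ibd 恢复数据,前提都是手里有备份;下面的库名、路径都是假设的,以实际环境为准,库名里如果带中划线,磁盘上的目录名会像前面脚本里处理的那样变成 @002d 的形式:

#!/bin/bash
# 简单的冷备示意:把指定库目录下的 .frm 和 .ibd 复制一份出来
# 注意:直接拷贝 ibd 属于冷备思路,最好在停写或者业务低峰的时候执行
db_name='study_db'
mysql_data_dir='/var/lib/mysql'
back_dir="/tmp/back-data/$(date +%F)"

mkdir -p ${back_dir}/${db_name}
cp -a ${mysql_data_dir}/${db_name}/*.frm ${back_dir}/${db_name}/
cp -a ${mysql_data_dir}/${db_name}/*.ibd ${back_dir}/${db_name}/
echo "backup done: ${back_dir}/${db_name}"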
此脚本的初衷是因为,KVM创建的桥接网卡的虚拟机,无法使用virsh domifaddr命令获取IP,而创建的nat网卡的虚拟机,则可以直接使用virsh domifaddr命令来获取IP

此脚本是个人学习所写的,关于KVM的管理方式有很多,可以直接使用virt-manager图形化管理配置KVM虚拟机,本着传统手艺不能丢的原则,写下了此脚本,和大佬们互相学习

此脚本最终生成的log文件,也可以当成是资产管理清单,只要路子野,shell也很强

脚本说明

执行脚本之前,需要修改脚本内的IP_HEAD变量,改成自己的IP网段,只需要前三段即可,结尾不要有.,否则后面内容执行会报错

脚本使用方法:
sh 脚本名称 KVM虚拟机名称(例如:sh virsh-ip.sh centos8.3.3)(虚拟机名称可以使用virsh list命令获取)
sh 脚本名称 all(例如:sh virsh-ip.sh all)(获取所有KVM虚拟机的IP)

此脚本获取IP的方式:通过ping整个网段的所有IP,从1ping到255,创建arp缓存表,再通过过滤mac地址来获取IP

此脚本开启了并发,可以适当减少ping的范围,减少脚本的运行时间,修改PING_ALL_IPADDR函数内的seq命令参数即可

此脚本会用到arp命令,需要安装net-tools,否则会获取不到IP,并且会报错

此脚本最终会将内容追加到脚本所在目录的virsh-ip.log文件内,脚本完成后,会有如下回显:Get IP complete,Use command: cat /root/virsh-ip.log,直接复制cat命令和参数,执行后,即可查看到虚拟机对应的IP地址

脚本展示

#!/bin/bash
BASE_DIR=$(cd $(dirname $0); pwd)
VIRSH_NAME=$1
IP_HEAD=192.168.72

# 参数为 all 时,获取全部虚拟机名称,否则只处理指定的虚拟机
if [ "$1"x == "all"x ];then
    VIRSH_NAME=$(virsh list | egrep -v "^$|Name|-----" | awk '{print $2}')
else
    VIRSH_NAME=$1
fi

function PING_ALL_IPADDR () {
    # 并发 ping 整个网段,目的是把 arp 缓存表刷出来
    for i in $(seq 1 255)
    do
        {
            ping ${IP_HEAD}.${i} -c 1 -w 1 > /dev/null 2>&1
        } &
    done
    # 等待所有后台 ping 结束,再去查 arp 表,避免漏掉还没返回的 IP
    wait
}

function FIND_VIRSH () {
    VIRSH_NAME_ARRAY=($(printf "%q\n" ${VIRSH_NAME}))
    for (( n=0 ; n<${#VIRSH_NAME_ARRAY[@]} ; n++ ))
    do
        # 拿到虚拟机网卡的 mac 地址,再去 arp 表里反查对应的 IP
        VIRSH_MAC=$(virsh domiflist ${VIRSH_NAME_ARRAY[n]} | egrep -v "MAC|-----|^$" | awk '{print $NF}')
        echo "${VIRSH_NAME_ARRAY[n]}:" >> ${BASE_DIR}/virsh-ip.log
        arp -n | grep -i ${VIRSH_MAC} | awk '{print "ip:"$1 "\t" "mac:"$3}' >> ${BASE_DIR}/virsh-ip.log
        echo " " >> ${BASE_DIR}/virsh-ip.log
    done
    printf "\e[1;35m Get IP complete,Use command: cat ${BASE_DIR}/virsh-ip.log\e[0m\n"
}

function main () {
    printf "\e[1;35m I'm just coming!\e[0m\n"
    PING_ALL_IPADDR
    FIND_VIRSH
}

main

效果展示

执行脚本,获取全部KVM虚拟机的IP

sh virsh-ip.sh all

脚本执行后的回显 I'm just coming!
执行成功后的回显 Get IP complete,Use command: cat /root/virsh-ip.log

查看日志内容

cat /root/virsh-ip.log

centos8.3.3-ks:
ip:192.168.72.85	mac:52:54:00:d1:34:fb

centos8.3.1-ks:
ip:192.168.72.87	mac:52:54:00:87:bf:a9

centos8.3.2-ks:
ip:192.168.72.86	mac:52:54:00:f5:3c:c0

大家如果有建议,可以评论告诉我,或者私信我,我可以修改脚本,让它变得更好
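再补一个对比:如果虚拟机用的是 nat 网卡(default 网络),就不用这么折腾,直接循环调用 virsh domifaddr 就能拿到 IP,下面这个写法仅供参考:

#!/bin/bash
# 针对 nat 网络的虚拟机,直接用 virsh domifaddr 获取 IP 和 mac
for vm in $(virsh list | egrep -v "^$|Name|-----" | awk '{print $2}')
do
    echo "${vm}:"
    virsh domifaddr ${vm} | egrep -v "^$|Name|-----" | awk '{print "ip:"$4 "\t" "mac:"$2}'
    echo " "
done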
KVM的组件① kvm.ko:模块API 应用程序编程接口② qemu-kvm:用户空间的工具程序;qemu-KVM 是一种开源虚拟器,它为KVM管理程序提供硬件仿真。运行中的一个 kvm 虚拟机就是一个 qemu-kvm 进程,运行 qemu-kvm 程序并传递给它合适的选项及参数即能完成虚拟机启动,终止此进程即能关闭虚拟机;③ libvirt 虚拟化库:Libvirt是C工具包,libvirt可以与最近版本的Linux(以及其他操作系统)的虚拟化功能进行交互。主包包含了导出虚拟化支持的libvirtd服务器。libvirt 包含 C/S:Client:libvirt-clientvirt-managerDaemon:libvirt-daemonKVM模块load进内存之后,系统的运行模式内核模式:GuestOS 执行 IO 类的操作时,或其它的特殊指令操作时的模式;它也被称为"Guest-Kernel"模式;用户模式:Host OS的用户空间,用于代为GuestOS发出IO请求;来宾模式:GuestOS 的用户模式;所有的非IO类请求部署KVM下面开始套娃,我vmware开的虚拟机,给了4C16G,KVM的最低要求是内存不能低于4G基础配置必须跑在 x86 系统的架构上必须支持硬件级虚拟化vmx: Intel VT-xsvm: AMD AMD-v虚拟机上再虚拟化,需开启虚拟化 Intel VT-x/EPT判断CPU是否支持硬件虚拟化# egrep -i 'vmx|svm|lm' /proc/cpuinfo注意:vmx 或 svm 必须出现一个,表示是支持的vmx: Intel VT-xsvm: AMD AMD-vflags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq 'vmx' ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm ssbd ibrs ibpb stibp tpr_shadow vnmi ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid xsaveopt arat spec_ctrl intel_stibp flush_l1d arch_capabilities检测 kvm 模块是否装载# lsmod | grep kvm kvm_intel 183621 0 kvm 586948 1 kvm_intel irqbypass 13503 1 kvm如果没有装载 kvm 模块,执行 modprobe kvm 命令即可安装用户端工具 qemu-kvm# yum install -y libvirt* virt-* qemu-kvm*libvirt 虚拟机管理virt 虚拟机安装克隆qemu-kvm 管理虚拟机磁盘安装的比较多,有300多个包启动服务# systemctl start libvirtd.service查看网卡# virsh net-list Name State Autostart Persistent ---------------------------------------------------------- default active yes yes配置桥接网卡br0这一步,根据实际情况操作,启动KVM后,会生成virbr0网卡,供KVM创建的虚拟机使用,KVM默认虚拟机的网络为NAT模式先备份一下网卡配置文件,万一有问题,还能恢复如果是CentOS发行版,可以关掉NetworkManager服务,免得他捣乱# systemctl disable NetworkManager --now# virsh iface-bridge eth0 br0把自己的物理网卡 eth0 作为交换机,把 br0 当网卡,提供IP(切记,别复制直接用,将eth0改为自己的网卡名称,使用ip a命令可以查看自己的网卡名称)注意:命令可能会卡死或出错,终端被强制退出;等一会,在登录就OK 了# cp /etc/libvirt/qemu/networks/default.xml /etc/libvirt/qemu/networks/br0.xml # vim /etc/libvirt/qemu/networks/br0.xml<network> <name>br0</name> <forward mode='bridge'/> <bridge name='br0'/> </network>启动br0网卡# virsh net-define /etc/libvirt/qemu/networks/br0.xml # virsh net-autostart br0 # virsh net-start br0 # virsh net-list Name State Autostart Persistent ---------------------------------------------------------- br0 active yes yes default active yes yes使用KVM创建虚拟机图形化界面,可以使用 virt-manager 来创建虚拟机,都是点点点的操作这里就使用命令行的方式创建虚拟机了注意:需要先上传一个系统镜像文件到KVM服务器上使用VNC的方式安装VNC# yum install -y tightvnc使用命令创建虚拟机# mkdir /opt/kvm # virt-install --virt-type kvm \ --name suse12-sp3 \ --memory 2048 \ --vcpus 1 \ --disk /opt/kvm/suse12-sp3.qcow2,format=qcow2,size=30 \ --cdrom /opt/kvm/SLE-12-SP3-Server-DVD-x86_64-GM-DVD1.iso \ --network network=default \ --graphics vnc,listen=0.0.0.0,port=5900 \ --noautoconsole打开VNC,输入本机的IP加上创建的时候指定的端口5900,即可开始安装虚拟机端口是自定义的,并非固定的5900# vncviewer # 打开 VNC使用VNC部署的虚拟机,当虚拟机启动的时候,会自动启用创建时所指定的端口,可以再次使用VNC远程连接虚拟机,当虚拟机关机的时候,端口也会自动关闭# 关机状态下的虚拟机,需要加上--all参数才会看得到 # virsh list --all Id Name State ---------------------------------------------------- - suse12-sp3 shut off # ss -nltp | grep 5900# virsh start suse12-sp3 Domain suse12-sp3 started # virsh list --all Id Name State ---------------------------------------------------- 6 suse12-sp3 running # ss -nltp | grep 5900 # 虚拟机启动后,端口就出来了 LISTEN 0 1 *:5900 *:* users:(("qemu-kvm",pid=31317,fd=19))参数说明参数说明--virt-type要使用的虚拟化名称(kvm, qemu, xen, ...)--name虚拟机的名称--memory配置内存大小,默认单位为MiB--vcpus配置虚拟 CPU(vcpu) 
数量--disk指定存储的各种选项--cdrom安装的介质--network虚拟机使用的网络接口,可以使用 virsh net-list 命令查看当前拥有的网络接口--graphics配置虚拟机的显示设置,有vnc、none、spice--noautoconsole不要自动尝试连接到客户端控制台不使用VNC的方式# virt-install --virt-type=kvm \ --name=centos7.7 \ --vcpus=1 \ --memory=2048 \ --location /opt/kvm/CentOS-7.7-x86_64-DVD-1908.iso \ --disk /opt/kvm/centos7.7.qcow2,format=qcow2,size=30 \ --network network=default \ --graphics none \ --extra-args='console=ttyS0'使用kickstart文件的方式# centos系统安装完成后,在/root目录下会有一个cfg后缀的kickstart文件 # cp anaconda-ks.cfg /opt/kvm/ # cd /opt/kvm/ # vim anaconda-ks.cfg由于kvm安装的虚拟机,磁盘都是vda,所以需要将文件内的sda全部修改为vda由于虚拟机分配的是100G,但是KVM创建的虚拟机,咱给的是30G,所以需要修改分区,为了省事,直接使用了自动分区autopart --type=lvmkickstart文件之前没接触过,就没有过多的修改了,以后有时间再去研究一下#version=DEVEL # System authorization information auth --enableshadow --passalgo=sha512 # Use graphical install # graphical # Run the Setup Agent on first boot firstboot --enable ignoredisk --only-use=vda # Keyboard layouts keyboard --vckeymap=us --xlayouts='us' # System language lang en_US.UTF-8 # Network information network --bootproto=dhcp --device=eth0 --ipv6=auto --activate network --hostname=localhost.localdomain # Root password rootpw --iscrypted $6$3IMre6QwQrXPP1tr$2t6ACeLAG/Ogg.nSdX.iNwxZLkrpN.sC6u/e6GYqV.GOvsmA1zu9rA7ceYZmgUvWgPy2NyuM8q4S75Kk9cjKn. # System services services --enabled="chronyd" services --disabled="NetworkManager" services --disabled="firewall" # System timezone timezone Asia/Shanghai --isUtc --nontp # System bootloader configuration bootloader --location=mbr --boot-drive=vda # Partition clearing information clearpart --all --initlabel --drives=vda # Disk partitioning information autopart --type=lvm %packages @^minimal @core @development @system-admin-tools chrony %addon com_redhat_kdump --disable --reserve-mb='auto' %end# virt-install --virt-type=kvm \ --name=centos7.7-ks \ --vcpus=1 \ --memory=2048 \ --location /opt/kvm/CentOS-7.7-x86_64-DVD-1908.iso \ --disk /opt/kvm/centos7.7-ks.qcow2,format=qcow2,size=30 \ --network network=default \ --graphics none \ --initrd-inject=/opt/kvm/anaconda-ks.cfg \ --extra-args='ks=file:/anaconda-ks.cfg console=ttyS0'
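虚拟机装好之后,日常管理基本都是围绕 virsh 命令来做的,下面列几个常用操作当作备忘,虚拟机名称以自己实际创建的为准:

virsh list --all                  # 查看所有虚拟机,包括关机状态的
virsh start centos7.7-ks          # 启动虚拟机
virsh console centos7.7-ks        # 进入串口控制台,前面 --extra-args 里配置了 console=ttyS0 才能用,退出按 ctrl+]
virsh autostart centos7.7-ks      # 设置随宿主机开机自启
virsh shutdown centos7.7-ks       # 正常关机
virsh destroy centos7.7-ks        # 强制断电,慎用
virsh undefine centos7.7-ks       # 删除虚拟机定义,qcow2 磁盘文件需要另外手动删除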
The k8s apiserver component failed to restart. Checking its logs with journalctl -xeu kube-apiserver turned up the following error:

etcdserver: mvcc: database space exceeded

Check the node status

This needs the etcdctl tool. A binary-deployed k8s usually ships with it; if not, download the matching etcd release binaries from GitHub and use the etcdctl inside.

Check the current API version with etcdctl version. The commands below require API version 3; if that is not your default, prefix each command with the variable, for example: ETCDCTL_API=3 etcdctl endpoint status

If etcd's --listen-client-urls includes http://127.0.0.1:2379, the commands below can drop the --endpoints parameter; if --endpoints is used, the certificate paths must be supplied as well. systemctl status etcd -l shows the parameters etcd was started with, including the certificate paths. The paths below are from my environment; use your own rather than copy-pasting blindly.

ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/kubernetes/cert/etcd.pem \
  --key=/etc/kubernetes/cert/etcd-key.pem \
  --endpoints=https://172.31.243.179:2379 \
  endpoint status --write-out="table"

--write-out="table" only controls the output format (json, table, or the default simple) and can be omitted.

ENDPOINT:           https://172.31.243.179:2379
ID:                 f0a399bcc03bea5f
VERSION:            3.4.12
DB SIZE:            6.4GB
IS LEADER:          true
IS LEARNER:         false
RAFT TERM:          5
RAFT INDEX:         29659523
RAFT APPLIED INDEX: 29659523
ERRORS:

The db size has reached 6.4GB. If --quota-backend-bytes is not set when etcd starts, the quota defaults to only 2GB, which is why the apiserver could no longer write to etcd.

Get the old revision number

ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/kubernetes/cert/etcd.pem \
  --key=/etc/kubernetes/cert/etcd-key.pem \
  --endpoints=https://172.31.243.179:2379 \
  endpoint status --write-out="json" | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*'

15151255

This value is the current revision; once we compact against it, everything before it becomes the old revisions.

Compact the old revisions

ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/kubernetes/cert/etcd.pem \
  --key=/etc/kubernetes/cert/etcd-key.pem \
  --endpoints=https://172.31.243.179:2379 \
  compact 15151255

Defragment

ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/kubernetes/cert/etcd.pem \
  --key=/etc/kubernetes/cert/etcd-key.pem \
  --endpoints=https://172.31.243.179:2379 \
  defrag

Check the node status again

ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/kubernetes/cert/etcd.pem \
  --key=/etc/kubernetes/cert/etcd-key.pem \
  --endpoints=https://172.31.243.179:2379 \
  endpoint status

https://172.31.243.179:2379, f0a399bcc03bea5f, 3.4.12, 1.0 MB, true, false, 5, 29659523, 29659523,

The db size is now down to 1.0MB.

Clear the alarm

List the alarms:

ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/kubernetes/cert/etcd.pem \
  --key=/etc/kubernetes/cert/etcd-key.pem \
  --endpoints=https://172.31.243.179:2379 \
  alarm list

memberID:f0a399bcc03bea5f alarm:NOSPACE

Clear the alarm:

ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/kubernetes/cert/etcd.pem \
  --key=/etc/kubernetes/cert/etcd-key.pem \
  --endpoints=https://172.31.243.179:2379 \
  alarm disarm

Then restart the apiserver. It's back, it's back, it's alive again. Not having to run for the hills feels pretty good.
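To avoid hitting the quota again, the manual steps above can be wrapped into a small script and run periodically, from cron for example. This is only a rough sketch using the endpoint and certificate paths from this environment, so adjust them to your own; longer term it is also worth setting --quota-backend-bytes and --auto-compaction-retention on etcd itself.

#!/usr/bin/env bash
# rough sketch: periodic compact + defrag + alarm disarm against a single etcd endpoint
# the endpoint and certificate paths are the ones from this environment; change them to yours
set -euo pipefail
export ETCDCTL_API=3

flags=(
  --cacert=/etc/kubernetes/cert/ca.pem
  --cert=/etc/kubernetes/cert/etcd.pem
  --key=/etc/kubernetes/cert/etcd-key.pem
  --endpoints=https://172.31.243.179:2379
)

# current revision, extracted the same way as the manual step above
rev=$(etcdctl "${flags[@]}" endpoint status --write-out="json" | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*')

etcdctl "${flags[@]}" compact "${rev}"   # drop history older than the current revision
etcdctl "${flags[@]}" defrag             # give the freed space back to the filesystem
etcdctl "${flags[@]}" alarm disarm       # clear a NOSPACE alarm if one was raised
etcdctl "${flags[@]}" endpoint status --write-out="table"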
Due to a mis-operation on my part, node A's etcd backup data was copied into node B's etcd backup directory. After restoring the etcd snapshot, node B's etcd data was corrupted and it could no longer join the original cluster.

In theory, as long as one etcd node is still alive, the damaged node can be re-joined to the original cluster.

Check the current cluster state

etcdctl \
  --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/kubernetes/cert/etcd.pem \
  --key=/etc/kubernetes/cert/etcd-key.pem \
  --endpoints=https://172.31.243.179:2379 \
  member list

f0a399bcc03bea5f, started, k8s-02, https://172.31.243.179:2380, https://172.31.243.179:2379, false
8262a810106df86c, started, k8s-03, https://172.31.243.180:2380, https://172.31.243.180:2379, false

This was originally a three-node cluster. One node can no longer join because its data is corrupted, and its member information is already gone from the list. If the damaged node does still show up here, first run member remove <member id>, then continue with the steps below.

Delete the damaged etcd node's data

systemctl status etcd -l shows the parameters etcd was started with; the data directory is given by --data-dir. The path below is from my environment; use your own rather than copy-pasting blindly.

cd /opt/k8s/server/etcd
mv data{,-bak$(date +%F)}
mkdir data

Re-join the damaged node to the cluster

etcdctl \
  --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/kubernetes/cert/etcd.pem \
  --key=/etc/kubernetes/cert/etcd-key.pem \
  --endpoints=https://172.31.243.179:2379 \
  member add k8s-03 --peer-urls=https://172.31.243.178:2380

Adjust the etcd start parameters and restart etcd

Change etcd's --initial-cluster-state start parameter to --initial-cluster-state=existing, then:

systemctl daemon-reload
systemctl restart etcd
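After the restart it is worth confirming that the re-added member actually catches up. A quick check using the same certificates and endpoint as above; a freshly added member may show as unstarted for a moment until its local etcd comes up.

# membership as a table; the rebuilt node should end up in the started state
ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/kubernetes/cert/etcd.pem \
  --key=/etc/kubernetes/cert/etcd-key.pem \
  --endpoints=https://172.31.243.179:2379 \
  member list --write-out="table"

# --cluster expands the endpoint list to every member, so health is checked on all nodes
ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/kubernetes/cert/etcd.pem \
  --key=/etc/kubernetes/cert/etcd-key.pem \
  --endpoints=https://172.31.243.179:2379 \
  endpoint health --cluster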
上集回顾etcd 集群出现了故障,节点启动会有如下报错,一般是 member 信息不匹配,导致了集群异常,无法重启 etcd 集群member count is unequal本集预告如果 etcd 有做过 snapshot 快照,可以直接新建一套集群,直接还原快照即可,有时候没有快照,或没来的及快照,集群已经出现了问题,此时可以通过保留的 etcd data 数据目录下的文件,配合新 etcd 集群,使用 member update 可以在保留 etcd 数据的情况下初始化集群数据,重新构建 etcd 集群无论是 kubeadm 还是 二进制 的 k8s ,都可以通过这种方式来灾难恢复 etcd 集群,并且早期发现,在 pod 没有重启的情况下,也不会影响正在运行的 pod [ 会影响 master 组件的运行,因为 etcd 集群宕机,apiserver 服务会挂,也就无法使用 kubectl 命令管理 k8s 集群,当然,也无法使用 api 的方式,毕竟 apiserver 的端口也关了 ]L(老)B(b)Z(真)S(帅) - 来吧展示 二进制部署的,担心配置文件修改不好的,可以看这篇文章 感谢这篇文章让我救活了我生产环境的 etcd备份数据目录kubeadm 安装的,默认 etcd 数据目录在 /var/lib/etcd 目录下,二进制部署的,要查看 service 文件指定的 --data-dir 参数指向的路径对比一下,哪个目录最大,就备份哪个,目录最大的,数据最新cp -r /var/lib/etcd/member /var/lib/member-bak停止所有 etcd 服务如果是 二进制 部署的,只需要 systemctl stop etcd 即可,当然,以自己实际的环境为准,也许你的 service 文件配置的不是 etcd.servicekubeadm 部署的,只需要把 etcd.yaml 从 manifests 目录下移走即可kubeadm 默认部署的路径是 /etc/kubernetes,记得检查自己的环境是否也是这个路径,不要只管 cv,啥也不看mv /etc/kubernetes/manifests/etcd.yaml /etc/kubernetes/检查 etcd 是否都已经关闭# 这个方法只适合 kubeadm 部署的 k8s 环境,二进制的,需要通过 'ps -ef | grep etcd' 命令查看 etcd 进程是否还存在来验证 docker ps -a | grep etcd创建新 etcd 节点切记,先配置并启动一个节点的 etcd,要等这个单节点 etcd 启动成功后,再做后面的节点加入操作如果是修复数据,可以在原节点操作,并且不需要使用 member update 命令如果是做 etcd 数据迁移,配置好新节点的一个 etcd 节点即可,数据迁移需要使用 member update 命令单节点 etcd如果是 二进制 etcd,需要修改 service 文件如果是 kubeadm 部署的 k8s ,需要修改前面移走的 etcd.yaml 文件,然后放到 manifests 目录下来启动 etcd注意以下配置,修改之前先备份一份, service 文件也记得备份,我的环境是 kubeadm 的,懒得搞一个二进制环境了,理解思路才是最重要的cp etcd.yaml{,.bak} vim etcd.yaml增加 --force-new-cluster 参数,这个参数,在 etcd 单节点启动成功后,再删掉,然后重启 etcd这个参数的意义是强制删除集群中的所有成员,并把自己加入到集群成员中增加 --initial-cluster-state=new 参数new - 成员设置为 new,而不是加入已有集群existing - 成员加入已有 etcd 集群spec: containers: - command: - etcd - --advertise-client-urls=https://192.168.11.135:2379 - --cert-file=/etc/kubernetes/pki/etcd/server.crt - --client-cert-auth=true - --data-dir=/var/lib/etcd - --initial-advertise-peer-urls=https://192.168.11.135:2380 - --initial-cluster=master-01=https://192.168.11.135:2380 - --key-file=/etc/kubernetes/pki/etcd/server.key - --listen-client-urls=https://127.0.0.1:2379,https://192.168.11.135:2379 - --listen-metrics-urls=http://127.0.0.1:2381 - --listen-peer-urls=https://192.168.11.135:2380 - --name=master-01 - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt - --peer-client-cert-auth=true - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt - --snapshot-count=10000 - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt # 主要是这两个参数,上面的以自己环境的为准,都配置为当前节点的 ip,和自己的证书路径 - --initial-cluster-state=new - --force-new-cluster启动 etcd 服务,切记,就启动一个节点,其他节点,等这个单节点启动成功后,需要修改配置才可以加入到这个单节点的 etcd ,然后才能组成一套集群cp /etc/kubernetes/etcd.yaml /etc/kubernetes/manifests/查看 etcd 是否启动成功二进制部署的,可以看进程或者端口,kubeadm 部署的,使用下面的方式docker ps -a | grep etcd查看 member 信息本地如果没有 etcdctl 命令,可以从容器内复制一个 [ 二进制部署的,一般都自带 etcdctl 命令,如果没有的话,下载一个二进制文件就可以了 ]docker cp $(docker ps | awk '/etcd --/ {print $1}'):/usr/local/bin/etcdctl /etc/kubernetes/pki/etcd/etcdctl chmod +x /etc/kubernetes/pki/etcd/etcdctl cd /etc/kubernetes/pki/etcd/# ip 和证书改为自己环境的配置,切莫纯 cv ./etcdctl \ --cacert ./ca.crt \ --cert ./server.crt \ --key ./server.key \ --endpoints https://192.168.11.135:2379 \ member list可以看到,peerurl 的 ip 和 server 的 ip 是不一样的cde66358cffbaacf, started, master-01, https://192.168.11.131:2380, https://192.168.11.135:2379, false执行 member update 命令来更新 peerurl 的地址# ip 和证书改为自己环境的配置,member id 也已自己上面的命令获取的为准,切莫纯 cv ./etcdctl \ --cacert ./ca.crt \ --cert ./server.crt \ --key ./server.key \ --endpoints https://192.168.11.135:2379 \ member update cde66358cffbaacf 
--peer-urls=https://192.168.11.135:2380

Then delete the --force-new-cluster parameter, so the cluster membership does not get re-initialized the next time etcd restarts. I won't leave my own footprints here, but do remember to remove that parameter and then restart etcd.

Add the remaining members to the etcd cluster

The following is still run on the single-node etcd machine that is already up; it can also be run on any other node, as long as the certificates and the etcdctl command are available.

# change the ip and certificates to your own environment; do not blindly copy-paste
./etcdctl \
  --cacert ./ca.crt \
  --cert ./server.crt \
  --key ./server.key \
  --endpoints https://192.168.11.135:2379 \
  member add master-02 --peer-urls=https://192.168.11.133:2380

On the node about to join, make sure etcd's data directory contains no data

The steps below are done on the node that is joining as a new etcd member.

Make sure the data directory is empty:

rm -rf /var/lib/etcd/member/*
vim etcd.yaml

Add the --initial-cluster-state parameter; for a node joining an existing cluster it has to be existing:
new - the member is set up as a brand-new cluster instead of joining an existing one
existing - the member joins an existing etcd cluster

spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://192.168.11.133:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://192.168.11.133:2380
    - --initial-cluster=master-02=https://192.168.11.133:2380,master-01=https://192.168.11.135:2380
    # this parameter has to be added; remember it must be existing rather than new, or etcd will not start
    - --initial-cluster-state=existing
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.11.133:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://192.168.11.133:2380
    - --name=master-02
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt

Restart etcd and check whether it is healthy

# change the ip and certificates to your own environment; do not blindly copy-paste
./etcdctl \
  --cacert ./ca.crt \
  --cert ./server.crt \
  --key ./server.key \
  --endpoints https://192.168.11.135:2379 \
  member list

Both nodes now show as started. If more nodes still need to join, just repeat the member-add steps above for each of them.

47785658ea2ca7f0, started, master-02, https://192.168.11.133:2380, https://192.168.11.133:2379, false
cde66358cffbaacf, started, master-01, https://192.168.11.135:2380, https://192.168.11.135:2379, false
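With the cluster rebuilt, the first thing I would do is take a snapshot, so the next incident can be handled with a plain restore instead of all of the above. A minimal sketch using the same certificates and endpoint as before; the backup directory and file name are just examples.

mkdir -p /var/lib/etcd-backup
SNAP=/var/lib/etcd-backup/snapshot-$(date +%F-%H%M).db

# save a snapshot from the current endpoint
ETCDCTL_API=3 ./etcdctl \
  --cacert ./ca.crt \
  --cert ./server.crt \
  --key ./server.key \
  --endpoints https://192.168.11.135:2379 \
  snapshot save "${SNAP}"

# sanity-check the snapshot file (hash, revision, total keys, size)
ETCDCTL_API=3 ./etcdctl snapshot status "${SNAP}" --write-out="table"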
原因是客户环境为双网卡环境,对内和对外有两个不同的网段,因为前期的部署 [那肯定不是我部署的,是我部署,我也不一定注意的到],因为本机路由不对,没有走对外的网卡,而加入控制节点的时候,没有指定 ip,导致走的默认路由,后期发现了问题,现在需要重新生成证书来修复 etcd 和 apiserver 因为修改 ip 而引发的一系列问题正片开始证书的修改,必须要 apiserver 服务可用备份 kubernetes 目录cp -r /etc/kubernetes{,-bak}查看证书内的 ipfor i in $(find /etc/kubernetes/pki -type f -name "*.crt");do echo ${i} && openssl x509 -in ${i} -text | grep 'DNS:';done可以看到,只有 apiserver 和 etcd 的证书里面是包含了 ip 的/etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/front-proxy-ca.crt /etc/kubernetes/pki/etcd/ca.crt /etc/kubernetes/pki/etcd/server.crt DNS:master-03, DNS:localhost, IP Address:192.168.11.135, IP Address:127.0.0.1, IP Address:0:0:0:0:0:0:0:1 /etc/kubernetes/pki/etcd/healthcheck-client.crt /etc/kubernetes/pki/etcd/peer.crt DNS:master-03, DNS:localhost, IP Address:192.168.11.135, IP Address:127.0.0.1, IP Address:0:0:0:0:0:0:0:1 /etc/kubernetes/pki/apiserver.crt DNS:master-03, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:lb-vip, IP Address:10.96.0.1, IP Address:192.168.11.135 /etc/kubernetes/pki/apiserver-kubelet-client.crt /etc/kubernetes/pki/front-proxy-client.crt /etc/kubernetes/pki/apiserver-etcd-client.crt生成集群配置kubeadm config view > /root/kubeadm.yaml增加 ipvim kubeadm.yamlapiServer: extraArgs: authorization-mode: Node,RBAC timeoutForControlPlane: 4m0s # 增加下面的配置 certSANs: - 192.168.11.131 - 192.168.11.134 - 192.168.11.136 # 增加上面的配置 apiVersion: kubeadm.k8s.io/v1beta2 certificatesDir: /etc/kubernetes/pki clusterName: kubernetes controlPlaneEndpoint: lb-vip:6443 controllerManager: {} type: CoreDNS etcd: local: dataDir: /var/lib/etcd # 增加下面的配置 serverCertSANs: - 192.168.11.131 - 192.168.11.135 - 192.168.11.136 peerCertSANs: - 192.168.11.131 - 192.168.11.135 - 192.168.11.136 # 增加上面的配置 imageRepository: registry.aliyuncs.com/google_containers kind: ClusterConfiguration kubernetesVersion: v1.17.3 networking: dnsDomain: cluster.local podSubnet: 172.10.0.0/16 serviceSubnet: 10.96.0.0/12 scheduler: {}删除原有的证书需要保留 ca ,sa,front-proxy 这三个证书rm -rf /etc/kubernetes/pki/{apiserver*,front-proxy-client*} rm -rf /etc/kubernetes/pki/etcd/{healthcheck*,peer*,server*}重新生成证书kubeadm init phase certs all --config /root/kubeadm.yaml再次查看证书内的 ipfor i in $(find /etc/kubernetes/pki -type f -name "*.crt");do echo ${i} && openssl x509 -in ${i} -text | grep 'DNS:';done这里可以得到验证,不会覆盖之前证书内已经有的 ip,会将新的 ip 追加到后面/etc/kubernetes/pki/etcd/ca.crt /etc/kubernetes/pki/etcd/server.crt DNS:master-02, DNS:localhost, IP Address:192.168.11.134, IP Address:127.0.0.1, IP Address:0:0:0:0:0:0:0:1, IP Address:192.168.11.131, IP Address:192.168.11.134, IP Address:192.168.11.136 /etc/kubernetes/pki/etcd/peer.crt DNS:master-02, DNS:localhost, IP Address:192.168.11.134, IP Address:127.0.0.1, IP Address:0:0:0:0:0:0:0:1, IP Address:192.168.11.131, IP Address:192.168.11.134, IP Address:192.168.11.136 /etc/kubernetes/pki/etcd/healthcheck-client.crt /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/front-proxy-ca.crt /etc/kubernetes/pki/apiserver.crt DNS:master-02, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:lb-vip, IP Address:10.96.0.1, IP Address:192.168.11.134, IP Address:192.168.11.131, IP Address:192.168.11.134, IP Address:192.168.11.136 /etc/kubernetes/pki/apiserver-kubelet-client.crt /etc/kubernetes/pki/front-proxy-client.crt /etc/kubernetes/pki/apiserver-etcd-client.crt将配置更新到 configmap 中这样,以后有升级,或者增加其他 ip 时,也会将配置的 CertSANs 的 ip 保留下来,方便以后删减kubeadm init phase upload-config kubeadm --config kubeadm.yaml
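One thing to keep in mind after regenerating the certificates: the control-plane static pods keep serving the old ones until they are restarted. The sketch below is one way I would check the new SAN list and nudge the pods on a kubeadm control-plane node; the 20-second pause is arbitrary, and on a multi-master cluster this should be done one node at a time.

# confirm the regenerated apiserver certificate now carries the extra SANs
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A1 'Subject Alternative Name'

# move the static pod manifests out and back so the kubelet recreates the pods with the new certs
mv /etc/kubernetes/manifests/kube-apiserver.yaml /etc/kubernetes/manifests/etcd.yaml /tmp/
sleep 20
mv /tmp/kube-apiserver.yaml /tmp/etcd.yaml /etc/kubernetes/manifests/

# make sure the control plane came back
kubectl get pods -n kube-system | egrep 'kube-apiserver|etcd'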
Official docs: Quick Start, Kubernetes Deployment

A few words up front

The official documentation offers several deployment methods, but unfortunately the k8s one is Helm-based, which does not fit how our product is delivered, so it has to be converted into plain yaml (crossing the river by feeling for the stones).

About DolphinScheduler

Apache DolphinScheduler is a distributed, easily extensible, visual DAG workflow task scheduling system. It tackles the tangled dependencies of data-engineering ETL and the lack of an intuitive view of task health. DolphinScheduler assembles Tasks into DAG flows, monitors task state in real time, and supports retries, recovery from a specified failed node, pausing and killing tasks.

Simple and easy to use: a DAG monitoring UI where every process definition is visual and built by dragging tasks onto the DAG; integrates with third-party systems over APIs; one-click deployment.
Highly reliable: decentralized multi-Master and multi-Worker design with built-in HA; a task queue prevents overload so machines are not dragged down.
Rich usage scenarios: supports pause and resume; multi-tenancy for big-data scenarios; more task types such as spark, hive, mr, python, sub_process and shell.
Highly scalable: supports custom task types; scheduling is distributed and its capacity grows linearly with the cluster; Masters and Workers can be brought online and offline dynamically.

Default ports

MasterServer - 5678
WorkerServer - 1234
ApiApplicationServer - 12345

Module overview

dolphinscheduler-alert - alerting module, provides the AlertServer service
dolphinscheduler-api - web application module, provides the ApiServer service
dolphinscheduler-common - shared constants, enums, utility classes, data structures and base classes
dolphinscheduler-dao - database access and related operations
dolphinscheduler-remote - netty-based client and server
dolphinscheduler-server - the MasterServer and WorkerServer services
dolphinscheduler-service - service module containing Quartz, Zookeeper and log client access, used by the server and api modules
dolphinscheduler-ui - the front-end module

Building the image

DolphinScheduler stores its metadata in a relational database, currently PostgreSQL or MySQL. With MySQL, the mysql-connector-java (8.0.16) driver has to be downloaded manually and placed in DolphinScheduler's lib directory.

Download the mysql driver mysql-connector-java-8.0.16.jar (version >= 8.0.1 is required).

Prepare a Debian Aliyun mirror list named sources.list and put it next to the downloaded mysql driver:

deb http://mirrors.cloud.aliyuncs.com/debian stable main contrib non-free
deb http://mirrors.cloud.aliyuncs.com/debian stable-proposed-updates main contrib non-free
deb http://mirrors.cloud.aliyuncs.com/debian stable-updates main contrib non-free
deb-src http://mirrors.cloud.aliyuncs.com/debian stable main contrib non-free
deb-src http://mirrors.cloud.aliyuncs.com/debian stable-proposed-updates main contrib non-free
deb-src http://mirrors.cloud.aliyuncs.com/debian stable-updates main contrib non-free
deb http://mirrors.aliyun.com/debian stable main contrib non-free
deb http://mirrors.aliyun.com/debian stable-proposed-updates main contrib non-free
deb http://mirrors.aliyun.com/debian stable-updates main contrib non-free
deb-src http://mirrors.aliyun.com/debian stable main contrib non-free
deb-src http://mirrors.aliyun.com/debian stable-proposed-updates main contrib non-free
deb-src http://mirrors.aliyun.com/debian stable-updates main contrib non-free

Add a few tools the data gurus need on top of the official image:

FROM apache/dolphinscheduler:2.0.6
ENV PIP_CMD='pip3 install --no-cache-dir -i https://pypi.tuna.tsinghua.edu.cn/simple'
COPY mysql-connector-java-8.0.16.jar /opt/apache-dolphinscheduler-2.0.6-bin/lib/mysql-connector-java-8.0.16.jar
COPY ./sources.list /tmp/
RUN cat /tmp/sources.list > /etc/apt/sources.list && \
    apt-get update && \
    apt-get install -y libsasl2-dev python3-pip && \
    apt-get autoclean
RUN ${PIP_CMD} \
    pyhive \
    thrift \
    thrift-sasl \
    pymysql \
    pandas \
    faker \
    sasl \
    setuptools_rust \
    wheel \
    rust \
    oss2

Build the image:

docker build -t dolphinscheduler_mysql:2.0.6 .

Preparing the yaml files

The yaml files below were exported from a dolphinscheduler started via helm. Many parameters were left unmodified, so adjust them to your own scenario before using them; they are for reference only.

All of the yaml files below use the bigdata namespace and assume mysql and zookeeper already exist there. The default mysql username/password is dolphinscheduler/dolphinscheduler.

dolphinscheduler-master.yaml

--- apiVersion: v1 data: LOGGER_SERVER_OPTS: -Xms512m -Xmx512m -Xmn256m MASTER_DISPATCH_TASK_NUM: "3" MASTER_EXEC_TASK_NUM: "20" MASTER_EXEC_THREADS: "100" MASTER_FAILOVER_INTERVAL: "10" MASTER_HEARTBEAT_INTERVAL: "10" MASTER_HOST_SELECTOR: LowerWeight MASTER_KILL_YARN_JOB_WHEN_HANDLE_FAILOVER: "true" MASTER_MAX_CPULOAD_AVG: "-1" MASTER_PERSIST_EVENT_STATE_THREADS: "10" MASTER_RESERVED_MEMORY: "0.3" MASTER_SERVER_OPTS: -Xms1g -Xmx1g -Xmn512m MASTER_TASK_COMMIT_INTERVAL: "1000"
MASTER_TASK_COMMIT_RETRYTIMES: "5" ORG_QUARTZ_SCHEDULER_BATCHTRIGGERACQUISTITIONMAXCOUNT: "1" ORG_QUARTZ_THREADPOOL_THREADCOUNT: "25" kind: ConfigMap metadata: labels: app.kubernetes.io/name: dolphinscheduler-master app.kubernetes.io/version: 2.0.6 name: dolphinscheduler-master namespace: bigdata apiVersion: v1 data: DATA_BASEDIR_PATH: /tmp/dolphinscheduler DATASOURCE_ENCRYPTION_ENABLE: "false" DATASOURCE_ENCRYPTION_SALT: '!@#$%^&*' DATAX_HOME: /opt/soft/datax DOLPHINSCHEDULER_OPTS: "" HADOOP_CONF_DIR: /opt/soft/hadoop/etc/hadoop HADOOP_HOME: /opt/soft/hadoop HADOOP_SECURITY_AUTHENTICATION_STARTUP_STATE: "false" HDFS_ROOT_USER: hdfs HIVE_HOME: /opt/soft/hive JAVA_HOME: /usr/local/openjdk-8 LOGIN_USER_KEYTAB_USERNAME: hdfs@HADOOP.COM ORG_QUARTZ_SCHEDULER_BATCHTRIGGERACQUISTITIONMAXCOUNT: "1" ORG_QUARTZ_THREADPOOL_THREADCOUNT: "25" PYTHON_HOME: /usr/bin/python RESOURCE_MANAGER_HTTPADDRESS_PORT: "8088" RESOURCE_STORAGE_TYPE: HDFS RESOURCE_UPLOAD_PATH: /dolphinscheduler SESSION_TIMEOUT_MS: "60000" SPARK_HOME1: /opt/soft/spark1 SPARK_HOME2: /opt/soft/spark2 SUDO_ENABLE: "true" YARN_APPLICATION_STATUS_ADDRESS: http://ds1:%s/ws/v1/cluster/apps/%s YARN_JOB_HISTORY_STATUS_ADDRESS: http://ds1:19888/ws/v1/history/mapreduce/jobs/%s YARN_RESOURCEMANAGER_HA_RM_IDS: "" kind: ConfigMap metadata: labels: app.kubernetes.io/name: dolphinscheduler-common app.kubernetes.io/version: 2.0.6 name: dolphinscheduler-common namespace: bigdata apiVersion: v1 kind: Service metadata: labels: app.kubernetes.io/instance: dolphinscheduler app.kubernetes.io/managed-by: Helm app.kubernetes.io/name: dolphinscheduler-master-headless app.kubernetes.io/version: 2.0.6 name: dolphinscheduler-master-svc namespace: bigdata spec: ports: - name: master-port port: 5678 protocol: TCP selector: app.kubernetes.io/name: dolphinscheduler-master app.kubernetes.io/version: 2.0.6 apiVersion: apps/v1 kind: StatefulSet metadata: labels: app.kubernetes.io/name: dolphinscheduler-master app.kubernetes.io/version: 2.0.6 name: dolphinscheduler-master namespace: bigdata spec: replicas: 1 selector: matchLabels: app.kubernetes.io/name: dolphinscheduler-master app.kubernetes.io/version: 2.0.6 serviceName: dolphinscheduler-master-svc template: metadata: creationTimestamp: null labels: app.kubernetes.io/name: dolphinscheduler-master app.kubernetes.io/version: 2.0.6 spec: containers: - args: - master-server - name: TZ value: Asia/Shanghai - name: DATABASE_TYPE value: mysql # 官方要求使用的 jdbc 包是 8.0 的 ## 因此 driver 需要写成 com.mysql.cj.jdbc.Driver ## 如果是 5.x 以下的版本,需要写成 com.mysql.jdbc.Driver - name: DATABASE_DRIVER value: com.mysql.cj.jdbc.Driver # 根据自己的实际场景修改 value ## 我的 mysql 是 k8s 内的,因此直接写的 svc 地址 - name: DATABASE_HOST value: mysql-svc.bigdata.svc.cluster.local - name: DATABASE_PORT value: "3306" # 如果创建的 mysql 用户不是 dolphinscheduler ## 需要修改这里的 value 的值 - name: DATABASE_USERNAME value: dolphinscheduler # 同上,密码不同,需要修改 value 的值 - name: DATABASE_PASSWORD value: dolphinscheduler # 同上,库名不同,需要修改 value 的值 - name: DATABASE_DATABASE value: dolphinscheduler # jdbc 6.x 以上版本,都需要增加 useSSL=false&serverTimezone=Asia/Shanghai # jdbc 5.x 以下版本,不需要增加 useSSL=false&serverTimezone=Asia/Shanghai - name: DATABASE_PARAMS value: useSSL=false&serverTimezone=Asia/Shanghai&useUnicode=true&characterEncoding=UTF-8 - name: REGISTRY_PLUGIN_NAME value: zookeeper # 同 mysql 地址,zk 是 k8s 内部署,这里写的 svc - name: REGISTRY_SERVERS value: zk-svc.bigdata.svc.cluster.local:2181 envFrom: - configMapRef: name: dolphinscheduler-common - configMapRef: name: dolphinscheduler-master image: dolphinscheduler_mysql:2.0.6 
imagePullPolicy: IfNotPresent livenessProbe: exec: command: - bash - /root/checkpoint.sh - MasterServer failureThreshold: 3 initialDelaySeconds: 30 periodSeconds: 30 successThreshold: 1 timeoutSeconds: 5 name: dolphinscheduler-master ports: - containerPort: 5678 name: master-port protocol: TCP readinessProbe: exec: command: - bash - /root/checkpoint.sh - MasterServer failureThreshold: 3 initialDelaySeconds: 30 periodSeconds: 30 successThreshold: 1 timeoutSeconds: 5 volumeMounts: - mountPath: /opt/dolphinscheduler/logs name: dolphinscheduler-master dnsPolicy: ClusterFirst restartPolicy: Always volumes: - emptyDir: {} name: dolphinscheduler-masterdolphinscheduler-alert.yaml--- apiVersion: v1 data: ALERT_SERVER_OPTS: -Xms512m -Xmx512m -Xmn256m kind: ConfigMap metadata: labels: app.kubernetes.io/name: dolphinscheduler-alert app.kubernetes.io/version: 2.0.6 name: dolphinscheduler-alert namespace: bigdata apiVersion: v1 kind: Service metadata: labels: app.kubernetes.io/name: dolphinscheduler-alert app.kubernetes.io/version: 2.0.6 name: dolphinscheduler-alert namespace: bigdata spec: ports: - name: alert-port port: 50052 protocol: TCP selector: app.kubernetes.io/component: alert app.kubernetes.io/name: dolphinscheduler-alert app.kubernetes.io/version: 2.0.6 apiVersion: apps/v1 kind: Deployment metadata: annotations: deployment.kubernetes.io/revision: "1" labels: app.kubernetes.io/name: dolphinscheduler-alert app.kubernetes.io/version: 2.0.6 name: dolphinscheduler-alert namespace: bigdata spec: progressDeadlineSeconds: 600 replicas: 1 revisionHistoryLimit: 10 selector: matchLabels: app.kubernetes.io/name: dolphinscheduler-alert app.kubernetes.io/version: 2.0.6 strategy: rollingUpdate: maxSurge: 25% maxUnavailable: 25% type: RollingUpdate template: metadata: creationTimestamp: null labels: app.kubernetes.io/name: dolphinscheduler-alert app.kubernetes.io/version: 2.0.6 spec: containers: - args: - alert-server - name: TZ value: Asia/Shanghai - name: DATABASE_TYPE value: mysql # 官方要求使用的 jdbc 包是 8.0 的 ## 因此 driver 需要写成 com.mysql.cj.jdbc.Driver ## 如果是 5.x 以下的版本,需要写成 com.mysql.jdbc.Driver - name: DATABASE_DRIVER value: com.mysql.cj.jdbc.Driver # 根据自己的实际场景修改 value ## 我的 mysql 是 k8s 内的,因此直接写的 svc 地址 - name: DATABASE_HOST value: mysql-svc.bigdata.svc.cluster.local - name: DATABASE_PORT value: "3306" # 如果创建的 mysql 用户不是 dolphinscheduler ## 需要修改这里的 value 的值 - name: DATABASE_USERNAME value: dolphinscheduler # 同上,密码不同,需要修改 value 的值 - name: DATABASE_PASSWORD value: dolphinscheduler # 同上,库名不同,需要修改 value 的值 - name: DATABASE_DATABASE value: dolphinscheduler # jdbc 6.x 以上版本,都需要增加 useSSL=false&serverTimezone=Asia/Shanghai # jdbc 5.x 以下版本,不需要增加 useSSL=false&serverTimezone=Asia/Shanghai - name: DATABASE_PARAMS value: useSSL=false&serverTimezone=Asia/Shanghai&useUnicode=true&characterEncoding=UTF-8 envFrom: - configMapRef: name: dolphinscheduler-common - configMapRef: name: dolphinscheduler-alert image: dolphinscheduler_mysql:2.0.6 imagePullPolicy: IfNotPresent livenessProbe: exec: command: - bash - /root/checkpoint.sh - AlertServer failureThreshold: 3 initialDelaySeconds: 30 periodSeconds: 30 successThreshold: 1 timeoutSeconds: 5 name: dolphinscheduler-alert ports: - containerPort: 50052 name: alert-port protocol: TCP readinessProbe: exec: command: - bash - /root/checkpoint.sh - AlertServer failureThreshold: 3 initialDelaySeconds: 30 periodSeconds: 30 successThreshold: 1 timeoutSeconds: 5 resources: {} terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: 
/opt/dolphinscheduler/logs name: dolphinscheduler-alert dnsPolicy: ClusterFirst restartPolicy: Always schedulerName: default-scheduler securityContext: {} terminationGracePeriodSeconds: 30 volumes: - emptyDir: {} name: dolphinscheduler-alertdolphinscheduler-worker.yaml--- apiVersion: v1 data: LOGGER_SERVER_OPTS: -Xms512m -Xmx512m -Xmn256m WORKER_EXEC_THREADS: "100" WORKER_GROUPS: default WORKER_HEARTBEAT_INTERVAL: "10" WORKER_HOST_WEIGHT: "100" WORKER_MAX_CPULOAD_AVG: "-1" WORKER_RESERVED_MEMORY: "0.3" WORKER_RETRY_REPORT_TASK_STATUS_INTERVAL: "600" WORKER_SERVER_OPTS: -Xms1g -Xmx1g -Xmn512m kind: ConfigMap metadata: labels: app.kubernetes.io/name: dolphinscheduler-worker app.kubernetes.io/version: 2.0.6 name: dolphinscheduler-worker namespace: bigdata apiVersion: v1 kind: Service metadata: labels: app.kubernetes.io/name: dolphinscheduler-worker-headless app.kubernetes.io/version: 2.0.6 name: dolphinscheduler-worker-headless namespace: bigdata spec: ports: - name: worker-port port: 1234 protocol: TCP - name: logger-port port: 50051 protocol: TCP selector: app.kubernetes.io/component: worker app.kubernetes.io/instance: dolphinscheduler app.kubernetes.io/managed-by: Helm app.kubernetes.io/name: dolphinscheduler-worker app.kubernetes.io/version: 2.0.6 apiVersion: apps/v1 kind: StatefulSet metadata: labels: app.kubernetes.io/component: worker app.kubernetes.io/instance: dolphinscheduler app.kubernetes.io/managed-by: Helm app.kubernetes.io/name: dolphinscheduler-worker app.kubernetes.io/version: 2.0.6 name: dolphinscheduler-worker namespace: bigdata spec: replicas: 3 revisionHistoryLimit: 10 selector: matchLabels: app.kubernetes.io/component: worker app.kubernetes.io/instance: dolphinscheduler app.kubernetes.io/managed-by: Helm app.kubernetes.io/name: dolphinscheduler-worker app.kubernetes.io/version: 2.0.6 serviceName: dolphinscheduler-worker-headless template: metadata: creationTimestamp: null labels: app.kubernetes.io/component: worker app.kubernetes.io/instance: dolphinscheduler app.kubernetes.io/managed-by: Helm app.kubernetes.io/name: dolphinscheduler-worker app.kubernetes.io/version: 2.0.6 spec: containers: - args: - worker-server - name: TZ value: Asia/Shanghai - name: ALERT_LISTEN_HOST value: dolphinscheduler-alert - name: DATABASE_TYPE value: mysql # 官方要求使用的 jdbc 包是 8.0 的 ## 因此 driver 需要写成 com.mysql.cj.jdbc.Driver ## 如果是 5.x 以下的版本,需要写成 com.mysql.jdbc.Driver - name: DATABASE_DRIVER value: com.mysql.cj.jdbc.Driver # 根据自己的实际场景修改 value ## 我的 mysql 是 k8s 内的,因此直接写的 svc 地址 - name: DATABASE_HOST value: mysql-svc.bigdata.svc.cluster.local - name: DATABASE_PORT value: "3306" # 如果创建的 mysql 用户不是 dolphinscheduler ## 需要修改这里的 value 的值 - name: DATABASE_USERNAME value: dolphinscheduler # 同上,密码不同,需要修改 value 的值 - name: DATABASE_PASSWORD value: dolphinscheduler # 同上,库名不同,需要修改 value 的值 - name: DATABASE_DATABASE value: dolphinscheduler # jdbc 6.x 以上版本,都需要增加 useSSL=false&serverTimezone=Asia/Shanghai # jdbc 5.x 以下版本,不需要增加 useSSL=false&serverTimezone=Asia/Shanghai - name: DATABASE_PARAMS value: useSSL=false&serverTimezone=Asia/Shanghai&useUnicode=true&characterEncoding=UTF-8 - name: REGISTRY_PLUGIN_NAME value: zookeeper # 同 mysql 地址,zk 是 k8s 内部署,这里写的 svc - name: REGISTRY_SERVERS value: zk-svc.bigdata.svc.cluster.local:2181 envFrom: - configMapRef: name: dolphinscheduler-common - configMapRef: name: dolphinscheduler-worker - configMapRef: name: dolphinscheduler-alert image: dolphinscheduler_mysql:2.0.6 imagePullPolicy: IfNotPresent livenessProbe: exec: command: - bash - /root/checkpoint.sh - WorkerServer 
failureThreshold: 3 initialDelaySeconds: 30 periodSeconds: 30 successThreshold: 1 timeoutSeconds: 5 name: dolphinscheduler-worker ports: - containerPort: 1234 name: worker-port protocol: TCP - containerPort: 50051 name: logger-port protocol: TCP readinessProbe: exec: command: - bash - /root/checkpoint.sh - WorkerServer failureThreshold: 3 initialDelaySeconds: 30 periodSeconds: 30 successThreshold: 1 timeoutSeconds: 5 resources: {} terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /tmp/dolphinscheduler name: dolphinscheduler-worker-data - mountPath: /opt/dolphinscheduler/logs name: dolphinscheduler-worker-logs dnsPolicy: ClusterFirst restartPolicy: Always schedulerName: default-scheduler securityContext: {} terminationGracePeriodSeconds: 30 volumes: # 对 data 目录做一个持久化,以 hostpath 的形式创建 - hostPath: path: /data/k8s_data/dolphinscheduler type: DirectoryOrCreate name: dolphinscheduler-worker-data - emptyDir: {} name: dolphinscheduler-worker-logsdolphinscheduler-api.yaml--- apiVersion: v1 data: API_SERVER_OPTS: -Xms512m -Xmx512m -Xmn256m kind: ConfigMap metadata: labels: app.kubernetes.io/name: dolphinscheduler-api app.kubernetes.io/version: 2.0.6 name: dolphinscheduler-api namespace: bigdata apiVersion: v1 kind: Service metadata: labels: app.kubernetes.io/name: dolphinscheduler-api app.kubernetes.io/version: 2.0.6 name: dolphinscheduler-api namespace: bigdata spec: ports: - name: api-port port: 12345 protocol: TCP selector: app.kubernetes.io/name: dolphinscheduler-api app.kubernetes.io/version: 2.0.6 apiVersion: apps/v1 kind: Deployment metadata: annotations: deployment.kubernetes.io/revision: "1" labels: app.kubernetes.io/name: dolphinscheduler-api app.kubernetes.io/version: 2.0.6 name: dolphinscheduler-api namespace: bigdata spec: progressDeadlineSeconds: 600 replicas: 1 revisionHistoryLimit: 10 selector: matchLabels: app.kubernetes.io/component: api app.kubernetes.io/instance: dolphinscheduler app.kubernetes.io/managed-by: Helm app.kubernetes.io/name: dolphinscheduler-api app.kubernetes.io/version: 2.0.6 strategy: rollingUpdate: maxSurge: 25% maxUnavailable: 25% type: RollingUpdate template: metadata: creationTimestamp: null labels: app.kubernetes.io/component: api app.kubernetes.io/instance: dolphinscheduler app.kubernetes.io/managed-by: Helm app.kubernetes.io/name: dolphinscheduler-api app.kubernetes.io/version: 2.0.6 spec: containers: - args: - api-server - name: TZ value: Asia/Shanghai - name: DATABASE_TYPE value: mysql # 官方要求使用的 jdbc 包是 8.0 的 ## 因此 driver 需要写成 com.mysql.cj.jdbc.Driver ## 如果是 5.x 以下的版本,需要写成 com.mysql.jdbc.Driver - name: DATABASE_DRIVER value: com.mysql.cj.jdbc.Driver # 根据自己的实际场景修改 value ## 我的 mysql 是 k8s 内的,因此直接写的 svc 地址 - name: DATABASE_HOST value: mysql-svc.bigdata.svc.cluster.local - name: DATABASE_PORT value: "3306" # 如果创建的 mysql 用户不是 dolphinscheduler ## 需要修改这里的 value 的值 - name: DATABASE_USERNAME value: dolphinscheduler # 同上,密码不同,需要修改 value 的值 - name: DATABASE_PASSWORD value: dolphinscheduler # 同上,库名不同,需要修改 value 的值 - name: DATABASE_DATABASE value: dolphinscheduler # jdbc 6.x 以上版本,都需要增加 useSSL=false&serverTimezone=Asia/Shanghai # jdbc 5.x 以下版本,不需要增加 useSSL=false&serverTimezone=Asia/Shanghai - name: DATABASE_PARAMS value: useSSL=false&serverTimezone=Asia/Shanghai&useUnicode=true&characterEncoding=UTF-8 - name: REGISTRY_PLUGIN_NAME value: zookeeper # 同 mysql 地址,zk 是 k8s 内部署,这里写的 svc - name: REGISTRY_SERVERS value: zk-svc.bigdata.svc.cluster.local:2181 envFrom: - configMapRef: name: dolphinscheduler-common - 
configMapRef: name: dolphinscheduler-api image: dolphinscheduler_mysql:2.0.6 imagePullPolicy: IfNotPresent livenessProbe: exec: command: - bash - /root/checkpoint.sh - ApiApplicationServer failureThreshold: 3 initialDelaySeconds: 30 periodSeconds: 30 successThreshold: 1 timeoutSeconds: 5 name: dolphinscheduler-api ports: - containerPort: 12345 name: api-port protocol: TCP readinessProbe: exec: command: - bash - /root/checkpoint.sh - ApiApplicationServer failureThreshold: 3 initialDelaySeconds: 30 periodSeconds: 30 successThreshold: 1 timeoutSeconds: 5 resources: {} terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /opt/dolphinscheduler/logs name: dolphinscheduler-api dnsPolicy: ClusterFirst restartPolicy: Always schedulerName: default-scheduler securityContext: {} terminationGracePeriodSeconds: 30 volumes: - emptyDir: {} name: dolphinscheduler-apidolphinscheduler-ingress.yamlapiVersion: extensions/v1beta1 kind: Ingress metadata: generation: 1 labels: app.kubernetes.io/name: dolphinscheduler app.kubernetes.io/version: 2.0.6 name: dolphinscheduler namespace: bigdata spec: rules: - host: dolphinscheduler.org http: paths: - backend: serviceName: dolphinscheduler-api servicePort: api-port path: /dolphinschedulermysql 初始化创建用户GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dolphinscheduler'@'%' IDENTIFIED BY 'dolphinscheduler'; FLUSH PRIVILEGES;建库建表/* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * http://www.apache.org/licenses/LICENSE-2.0 * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
SET FOREIGN_KEY_CHECKS=0; -- ---------------------------- -- Database of dolphinscheduler -- ---------------------------- CREATE DATABASE IF NOT EXISTS dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci; USE dolphinscheduler; -- ---------------------------- -- Table structure for QRTZ_BLOB_TRIGGERS -- ---------------------------- DROP TABLE IF EXISTS `QRTZ_BLOB_TRIGGERS`; CREATE TABLE `QRTZ_BLOB_TRIGGERS` ( `SCHED_NAME` varchar(120) NOT NULL, `TRIGGER_NAME` varchar(200) NOT NULL, `TRIGGER_GROUP` varchar(200) NOT NULL, `BLOB_DATA` blob, PRIMARY KEY (`SCHED_NAME`,`TRIGGER_NAME`,`TRIGGER_GROUP`), KEY `SCHED_NAME` (`SCHED_NAME`,`TRIGGER_NAME`,`TRIGGER_GROUP`), CONSTRAINT `QRTZ_BLOB_TRIGGERS_ibfk_1` FOREIGN KEY (`SCHED_NAME`, `TRIGGER_NAME`, `TRIGGER_GROUP`) REFERENCES `QRTZ_TRIGGERS` (`SCHED_NAME`, `TRIGGER_NAME`, `TRIGGER_GROUP`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of QRTZ_BLOB_TRIGGERS -- ---------------------------- -- ---------------------------- -- Table structure for QRTZ_CALENDARS -- ---------------------------- DROP TABLE IF EXISTS `QRTZ_CALENDARS`; CREATE TABLE `QRTZ_CALENDARS` ( `SCHED_NAME` varchar(120) NOT NULL, `CALENDAR_NAME` varchar(200) NOT NULL, `CALENDAR` blob NOT NULL, PRIMARY KEY (`SCHED_NAME`,`CALENDAR_NAME`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of QRTZ_CALENDARS -- ---------------------------- -- ---------------------------- -- Table structure for QRTZ_CRON_TRIGGERS -- ---------------------------- DROP TABLE IF EXISTS `QRTZ_CRON_TRIGGERS`; CREATE TABLE `QRTZ_CRON_TRIGGERS` ( `SCHED_NAME` varchar(120) NOT NULL, `TRIGGER_NAME` varchar(200) NOT NULL, `TRIGGER_GROUP` varchar(200) NOT NULL, `CRON_EXPRESSION` varchar(120) NOT NULL, `TIME_ZONE_ID` varchar(80) DEFAULT NULL, PRIMARY KEY (`SCHED_NAME`,`TRIGGER_NAME`,`TRIGGER_GROUP`), CONSTRAINT `QRTZ_CRON_TRIGGERS_ibfk_1` FOREIGN KEY (`SCHED_NAME`, `TRIGGER_NAME`, `TRIGGER_GROUP`) REFERENCES `QRTZ_TRIGGERS` (`SCHED_NAME`, `TRIGGER_NAME`, `TRIGGER_GROUP`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of QRTZ_CRON_TRIGGERS -- ---------------------------- -- ---------------------------- -- Table structure for QRTZ_FIRED_TRIGGERS -- ---------------------------- DROP TABLE IF EXISTS `QRTZ_FIRED_TRIGGERS`; CREATE TABLE `QRTZ_FIRED_TRIGGERS` ( `SCHED_NAME` varchar(120) NOT NULL, `ENTRY_ID` varchar(200) NOT NULL, `TRIGGER_NAME` varchar(200) NOT NULL, `TRIGGER_GROUP` varchar(200) NOT NULL, `INSTANCE_NAME` varchar(200) NOT NULL, `FIRED_TIME` bigint(13) NOT NULL, `SCHED_TIME` bigint(13) NOT NULL, `PRIORITY` int(11) NOT NULL, `STATE` varchar(16) NOT NULL, `JOB_NAME` varchar(200) DEFAULT NULL, `JOB_GROUP` varchar(200) DEFAULT NULL, `IS_NONCONCURRENT` varchar(1) DEFAULT NULL, `REQUESTS_RECOVERY` varchar(1) DEFAULT NULL, PRIMARY KEY (`SCHED_NAME`,`ENTRY_ID`), KEY `IDX_QRTZ_FT_TRIG_INST_NAME` (`SCHED_NAME`,`INSTANCE_NAME`), KEY `IDX_QRTZ_FT_INST_JOB_REQ_RCVRY` (`SCHED_NAME`,`INSTANCE_NAME`,`REQUESTS_RECOVERY`), KEY `IDX_QRTZ_FT_J_G` (`SCHED_NAME`,`JOB_NAME`,`JOB_GROUP`), KEY `IDX_QRTZ_FT_JG` (`SCHED_NAME`,`JOB_GROUP`), KEY `IDX_QRTZ_FT_T_G` (`SCHED_NAME`,`TRIGGER_NAME`,`TRIGGER_GROUP`), KEY `IDX_QRTZ_FT_TG` (`SCHED_NAME`,`TRIGGER_GROUP`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of QRTZ_FIRED_TRIGGERS -- ---------------------------- -- ---------------------------- -- Table structure for QRTZ_JOB_DETAILS -- ---------------------------- DROP TABLE IF EXISTS 
`QRTZ_JOB_DETAILS`; CREATE TABLE `QRTZ_JOB_DETAILS` ( `SCHED_NAME` varchar(120) NOT NULL, `JOB_NAME` varchar(200) NOT NULL, `JOB_GROUP` varchar(200) NOT NULL, `DESCRIPTION` varchar(250) DEFAULT NULL, `JOB_CLASS_NAME` varchar(250) NOT NULL, `IS_DURABLE` varchar(1) NOT NULL, `IS_NONCONCURRENT` varchar(1) NOT NULL, `IS_UPDATE_DATA` varchar(1) NOT NULL, `REQUESTS_RECOVERY` varchar(1) NOT NULL, `JOB_DATA` blob, PRIMARY KEY (`SCHED_NAME`,`JOB_NAME`,`JOB_GROUP`), KEY `IDX_QRTZ_J_REQ_RECOVERY` (`SCHED_NAME`,`REQUESTS_RECOVERY`), KEY `IDX_QRTZ_J_GRP` (`SCHED_NAME`,`JOB_GROUP`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of QRTZ_JOB_DETAILS -- ---------------------------- -- ---------------------------- -- Table structure for QRTZ_LOCKS -- ---------------------------- DROP TABLE IF EXISTS `QRTZ_LOCKS`; CREATE TABLE `QRTZ_LOCKS` ( `SCHED_NAME` varchar(120) NOT NULL, `LOCK_NAME` varchar(40) NOT NULL, PRIMARY KEY (`SCHED_NAME`,`LOCK_NAME`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of QRTZ_LOCKS -- ---------------------------- -- ---------------------------- -- Table structure for QRTZ_PAUSED_TRIGGER_GRPS -- ---------------------------- DROP TABLE IF EXISTS `QRTZ_PAUSED_TRIGGER_GRPS`; CREATE TABLE `QRTZ_PAUSED_TRIGGER_GRPS` ( `SCHED_NAME` varchar(120) NOT NULL, `TRIGGER_GROUP` varchar(200) NOT NULL, PRIMARY KEY (`SCHED_NAME`,`TRIGGER_GROUP`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of QRTZ_PAUSED_TRIGGER_GRPS -- ---------------------------- -- ---------------------------- -- Table structure for QRTZ_SCHEDULER_STATE -- ---------------------------- DROP TABLE IF EXISTS `QRTZ_SCHEDULER_STATE`; CREATE TABLE `QRTZ_SCHEDULER_STATE` ( `SCHED_NAME` varchar(120) NOT NULL, `INSTANCE_NAME` varchar(200) NOT NULL, `LAST_CHECKIN_TIME` bigint(13) NOT NULL, `CHECKIN_INTERVAL` bigint(13) NOT NULL, PRIMARY KEY (`SCHED_NAME`,`INSTANCE_NAME`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of QRTZ_SCHEDULER_STATE -- ---------------------------- -- ---------------------------- -- Table structure for QRTZ_SIMPLE_TRIGGERS -- ---------------------------- DROP TABLE IF EXISTS `QRTZ_SIMPLE_TRIGGERS`; CREATE TABLE `QRTZ_SIMPLE_TRIGGERS` ( `SCHED_NAME` varchar(120) NOT NULL, `TRIGGER_NAME` varchar(200) NOT NULL, `TRIGGER_GROUP` varchar(200) NOT NULL, `REPEAT_COUNT` bigint(7) NOT NULL, `REPEAT_INTERVAL` bigint(12) NOT NULL, `TIMES_TRIGGERED` bigint(10) NOT NULL, PRIMARY KEY (`SCHED_NAME`,`TRIGGER_NAME`,`TRIGGER_GROUP`), CONSTRAINT `QRTZ_SIMPLE_TRIGGERS_ibfk_1` FOREIGN KEY (`SCHED_NAME`, `TRIGGER_NAME`, `TRIGGER_GROUP`) REFERENCES `QRTZ_TRIGGERS` (`SCHED_NAME`, `TRIGGER_NAME`, `TRIGGER_GROUP`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of QRTZ_SIMPLE_TRIGGERS -- ---------------------------- -- ---------------------------- -- Table structure for QRTZ_SIMPROP_TRIGGERS -- ---------------------------- DROP TABLE IF EXISTS `QRTZ_SIMPROP_TRIGGERS`; CREATE TABLE `QRTZ_SIMPROP_TRIGGERS` ( `SCHED_NAME` varchar(120) NOT NULL, `TRIGGER_NAME` varchar(200) NOT NULL, `TRIGGER_GROUP` varchar(200) NOT NULL, `STR_PROP_1` varchar(512) DEFAULT NULL, `STR_PROP_2` varchar(512) DEFAULT NULL, `STR_PROP_3` varchar(512) DEFAULT NULL, `INT_PROP_1` int(11) DEFAULT NULL, `INT_PROP_2` int(11) DEFAULT NULL, `LONG_PROP_1` bigint(20) DEFAULT NULL, `LONG_PROP_2` bigint(20) DEFAULT NULL, `DEC_PROP_1` decimal(13,4) DEFAULT NULL, `DEC_PROP_2` decimal(13,4) DEFAULT NULL, `BOOL_PROP_1` 
varchar(1) DEFAULT NULL, `BOOL_PROP_2` varchar(1) DEFAULT NULL, PRIMARY KEY (`SCHED_NAME`,`TRIGGER_NAME`,`TRIGGER_GROUP`), CONSTRAINT `QRTZ_SIMPROP_TRIGGERS_ibfk_1` FOREIGN KEY (`SCHED_NAME`, `TRIGGER_NAME`, `TRIGGER_GROUP`) REFERENCES `QRTZ_TRIGGERS` (`SCHED_NAME`, `TRIGGER_NAME`, `TRIGGER_GROUP`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of QRTZ_SIMPROP_TRIGGERS -- ---------------------------- -- ---------------------------- -- Table structure for QRTZ_TRIGGERS -- ---------------------------- DROP TABLE IF EXISTS `QRTZ_TRIGGERS`; CREATE TABLE `QRTZ_TRIGGERS` ( `SCHED_NAME` varchar(120) NOT NULL, `TRIGGER_NAME` varchar(200) NOT NULL, `TRIGGER_GROUP` varchar(200) NOT NULL, `JOB_NAME` varchar(200) NOT NULL, `JOB_GROUP` varchar(200) NOT NULL, `DESCRIPTION` varchar(250) DEFAULT NULL, `NEXT_FIRE_TIME` bigint(13) DEFAULT NULL, `PREV_FIRE_TIME` bigint(13) DEFAULT NULL, `PRIORITY` int(11) DEFAULT NULL, `TRIGGER_STATE` varchar(16) NOT NULL, `TRIGGER_TYPE` varchar(8) NOT NULL, `START_TIME` bigint(13) NOT NULL, `END_TIME` bigint(13) DEFAULT NULL, `CALENDAR_NAME` varchar(200) DEFAULT NULL, `MISFIRE_INSTR` smallint(2) DEFAULT NULL, `JOB_DATA` blob, PRIMARY KEY (`SCHED_NAME`,`TRIGGER_NAME`,`TRIGGER_GROUP`), KEY `IDX_QRTZ_T_J` (`SCHED_NAME`,`JOB_NAME`,`JOB_GROUP`), KEY `IDX_QRTZ_T_JG` (`SCHED_NAME`,`JOB_GROUP`), KEY `IDX_QRTZ_T_C` (`SCHED_NAME`,`CALENDAR_NAME`), KEY `IDX_QRTZ_T_G` (`SCHED_NAME`,`TRIGGER_GROUP`), KEY `IDX_QRTZ_T_STATE` (`SCHED_NAME`,`TRIGGER_STATE`), KEY `IDX_QRTZ_T_N_STATE` (`SCHED_NAME`,`TRIGGER_NAME`,`TRIGGER_GROUP`,`TRIGGER_STATE`), KEY `IDX_QRTZ_T_N_G_STATE` (`SCHED_NAME`,`TRIGGER_GROUP`,`TRIGGER_STATE`), KEY `IDX_QRTZ_T_NEXT_FIRE_TIME` (`SCHED_NAME`,`NEXT_FIRE_TIME`), KEY `IDX_QRTZ_T_NFT_ST` (`SCHED_NAME`,`TRIGGER_STATE`,`NEXT_FIRE_TIME`), KEY `IDX_QRTZ_T_NFT_MISFIRE` (`SCHED_NAME`,`MISFIRE_INSTR`,`NEXT_FIRE_TIME`), KEY `IDX_QRTZ_T_NFT_ST_MISFIRE` (`SCHED_NAME`,`MISFIRE_INSTR`,`NEXT_FIRE_TIME`,`TRIGGER_STATE`), KEY `IDX_QRTZ_T_NFT_ST_MISFIRE_GRP` (`SCHED_NAME`,`MISFIRE_INSTR`,`NEXT_FIRE_TIME`,`TRIGGER_GROUP`,`TRIGGER_STATE`), CONSTRAINT `QRTZ_TRIGGERS_ibfk_1` FOREIGN KEY (`SCHED_NAME`, `JOB_NAME`, `JOB_GROUP`) REFERENCES `QRTZ_JOB_DETAILS` (`SCHED_NAME`, `JOB_NAME`, `JOB_GROUP`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of QRTZ_TRIGGERS -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_access_token -- ---------------------------- DROP TABLE IF EXISTS `t_ds_access_token`; CREATE TABLE `t_ds_access_token` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'key', `user_id` int(11) DEFAULT NULL COMMENT 'user id', `token` varchar(64) DEFAULT NULL COMMENT 'token', `expire_time` datetime DEFAULT NULL COMMENT 'end time of token ', `create_time` datetime DEFAULT NULL COMMENT 'create time', `update_time` datetime DEFAULT NULL COMMENT 'update time', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of t_ds_access_token -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_alert -- ---------------------------- DROP TABLE IF EXISTS `t_ds_alert`; CREATE TABLE `t_ds_alert` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'key', `title` varchar(64) DEFAULT NULL COMMENT 'title', `content` text COMMENT 'Message content (can be email, can be SMS. 
Mail is stored in JSON map, and SMS is string)', `alert_status` tinyint(4) DEFAULT '0' COMMENT '0:wait running,1:success,2:failed', `log` text COMMENT 'log', `alertgroup_id` int(11) DEFAULT NULL COMMENT 'alert group id', `create_time` datetime DEFAULT NULL COMMENT 'create time', `update_time` datetime DEFAULT NULL COMMENT 'update time', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of t_ds_alert -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_alertgroup -- ---------------------------- DROP TABLE IF EXISTS `t_ds_alertgroup`; CREATE TABLE `t_ds_alertgroup`( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'key', `alert_instance_ids` varchar (255) DEFAULT NULL COMMENT 'alert instance ids', `create_user_id` int(11) DEFAULT NULL COMMENT 'create user id', `group_name` varchar(255) DEFAULT NULL COMMENT 'group name', `description` varchar(255) DEFAULT NULL, `create_time` datetime DEFAULT NULL COMMENT 'create time', `update_time` datetime DEFAULT NULL COMMENT 'update time', PRIMARY KEY (`id`), UNIQUE KEY `t_ds_alertgroup_name_un` (`group_name`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of t_ds_alertgroup -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_command -- ---------------------------- DROP TABLE IF EXISTS `t_ds_command`; CREATE TABLE `t_ds_command` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'key', `command_type` tinyint(4) DEFAULT NULL COMMENT 'Command type: 0 start workflow, 1 start execution from current node, 2 resume fault-tolerant workflow, 3 resume pause process, 4 start execution from failed node, 5 complement, 6 schedule, 7 rerun, 8 pause, 9 stop, 10 resume waiting thread', `process_definition_code` bigint(20) NOT NULL COMMENT 'process definition code', `process_definition_version` int(11) DEFAULT '0' COMMENT 'process definition version', `process_instance_id` int(11) DEFAULT '0' COMMENT 'process instance id', `command_param` text COMMENT 'json command parameters', `task_depend_type` tinyint(4) DEFAULT NULL COMMENT 'Node dependency type: 0 current node, 1 forward, 2 backward', `failure_strategy` tinyint(4) DEFAULT '0' COMMENT 'Failed policy: 0 end, 1 continue', `warning_type` tinyint(4) DEFAULT '0' COMMENT 'Alarm type: 0 is not sent, 1 process is sent successfully, 2 process is sent failed, 3 process is sent successfully and all failures are sent', `warning_group_id` int(11) DEFAULT NULL COMMENT 'warning group', `schedule_time` datetime DEFAULT NULL COMMENT 'schedule time', `start_time` datetime DEFAULT NULL COMMENT 'start time', `executor_id` int(11) DEFAULT NULL COMMENT 'executor id', `update_time` datetime DEFAULT NULL COMMENT 'update time', `process_instance_priority` int(11) DEFAULT NULL COMMENT 'process instance priority: 0 Highest,1 High,2 Medium,3 Low,4 Lowest', `worker_group` varchar(64) COMMENT 'worker group', `environment_code` bigint(20) DEFAULT '-1' COMMENT 'environment code', `dry_run` tinyint(4) DEFAULT '0' COMMENT 'dry run flag:0 normal, 1 dry run', PRIMARY KEY (`id`), KEY `priority_id_index` (`process_instance_priority`,`id`) USING BTREE ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of t_ds_command -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_datasource -- ---------------------------- DROP TABLE IF EXISTS `t_ds_datasource`; CREATE TABLE `t_ds_datasource` ( 
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'key', `name` varchar(64) NOT NULL COMMENT 'data source name', `note` varchar(255) DEFAULT NULL COMMENT 'description', `type` tinyint(4) NOT NULL COMMENT 'data source type: 0:mysql,1:postgresql,2:hive,3:spark', `user_id` int(11) NOT NULL COMMENT 'the creator id', `connection_params` text NOT NULL COMMENT 'json connection params', `create_time` datetime NOT NULL COMMENT 'create time', `update_time` datetime DEFAULT NULL COMMENT 'update time', PRIMARY KEY (`id`), UNIQUE KEY `t_ds_datasource_name_un` (`name`, `type`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of t_ds_datasource -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_error_command -- ---------------------------- DROP TABLE IF EXISTS `t_ds_error_command`; CREATE TABLE `t_ds_error_command` ( `id` int(11) NOT NULL COMMENT 'key', `command_type` tinyint(4) DEFAULT NULL COMMENT 'command type', `executor_id` int(11) DEFAULT NULL COMMENT 'executor id', `process_definition_code` bigint(20) NOT NULL COMMENT 'process definition code', `process_definition_version` int(11) DEFAULT '0' COMMENT 'process definition version', `process_instance_id` int(11) DEFAULT '0' COMMENT 'process instance id: 0', `command_param` text COMMENT 'json command parameters', `task_depend_type` tinyint(4) DEFAULT NULL COMMENT 'task depend type', `failure_strategy` tinyint(4) DEFAULT '0' COMMENT 'failure strategy', `warning_type` tinyint(4) DEFAULT '0' COMMENT 'warning type', `warning_group_id` int(11) DEFAULT NULL COMMENT 'warning group id', `schedule_time` datetime DEFAULT NULL COMMENT 'scheduler time', `start_time` datetime DEFAULT NULL COMMENT 'start time', `update_time` datetime DEFAULT NULL COMMENT 'update time', `process_instance_priority` int(11) DEFAULT NULL COMMENT 'process instance priority, 0 Highest,1 High,2 Medium,3 Low,4 Lowest', `worker_group` varchar(64) COMMENT 'worker group', `environment_code` bigint(20) DEFAULT '-1' COMMENT 'environment code', `message` text COMMENT 'message', `dry_run` tinyint(4) DEFAULT '0' COMMENT 'dry run flag: 0 normal, 1 dry run', PRIMARY KEY (`id`) USING BTREE ) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC; -- ---------------------------- -- Records of t_ds_error_command -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_process_definition -- ---------------------------- DROP TABLE IF EXISTS `t_ds_process_definition`; CREATE TABLE `t_ds_process_definition` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'self-increasing id', `code` bigint(20) NOT NULL COMMENT 'encoding', `name` varchar(255) DEFAULT NULL COMMENT 'process definition name', `version` int(11) DEFAULT '0' COMMENT 'process definition version', `description` text COMMENT 'description', `project_code` bigint(20) NOT NULL COMMENT 'project code', `release_state` tinyint(4) DEFAULT NULL COMMENT 'process definition release state:0:offline,1:online', `user_id` int(11) DEFAULT NULL COMMENT 'process definition creator id', `global_params` text COMMENT 'global parameters', `flag` tinyint(4) DEFAULT NULL COMMENT '0 not available, 1 available', `locations` text COMMENT 'Node location information', `warning_group_id` int(11) DEFAULT NULL COMMENT 'alert group id', `timeout` int(11) DEFAULT '0' COMMENT 'time out, unit: minute', `tenant_id` int(11) NOT NULL DEFAULT '-1' COMMENT 'tenant id', `execution_type` tinyint(4) DEFAULT '0' COMMENT 'execution_type 0:parallel,1:serial wait,2:serial 
discard,3:serial priority', `create_time` datetime NOT NULL COMMENT 'create time', `update_time` datetime NOT NULL COMMENT 'update time', PRIMARY KEY (`id`,`code`), UNIQUE KEY `process_unique` (`name`,`project_code`) USING BTREE ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of t_ds_process_definition -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_process_definition_log -- ---------------------------- DROP TABLE IF EXISTS `t_ds_process_definition_log`; CREATE TABLE `t_ds_process_definition_log` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'self-increasing id', `code` bigint(20) NOT NULL COMMENT 'encoding', `name` varchar(200) DEFAULT NULL COMMENT 'process definition name', `version` int(11) DEFAULT '0' COMMENT 'process definition version', `description` text COMMENT 'description', `project_code` bigint(20) NOT NULL COMMENT 'project code', `release_state` tinyint(4) DEFAULT NULL COMMENT 'process definition release state:0:offline,1:online', `user_id` int(11) DEFAULT NULL COMMENT 'process definition creator id', `global_params` text COMMENT 'global parameters', `flag` tinyint(4) DEFAULT NULL COMMENT '0 not available, 1 available', `locations` text COMMENT 'Node location information', `warning_group_id` int(11) DEFAULT NULL COMMENT 'alert group id', `timeout` int(11) DEFAULT '0' COMMENT 'time out,unit: minute', `tenant_id` int(11) NOT NULL DEFAULT '-1' COMMENT 'tenant id', `execution_type` tinyint(4) DEFAULT '0' COMMENT 'execution_type 0:parallel,1:serial wait,2:serial discard,3:serial priority', `operator` int(11) DEFAULT NULL COMMENT 'operator user id', `operate_time` datetime DEFAULT NULL COMMENT 'operate time', `create_time` datetime NOT NULL COMMENT 'create time', `update_time` datetime NOT NULL COMMENT 'update time', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Table structure for t_ds_task_definition -- ---------------------------- DROP TABLE IF EXISTS `t_ds_task_definition`; CREATE TABLE `t_ds_task_definition` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'self-increasing id', `code` bigint(20) NOT NULL COMMENT 'encoding', `name` varchar(200) DEFAULT NULL COMMENT 'task definition name', `version` int(11) DEFAULT '0' COMMENT 'task definition version', `description` text COMMENT 'description', `project_code` bigint(20) NOT NULL COMMENT 'project code', `user_id` int(11) DEFAULT NULL COMMENT 'task definition creator id', `task_type` varchar(50) NOT NULL COMMENT 'task type', `task_params` longtext COMMENT 'job custom parameters', `flag` tinyint(2) DEFAULT NULL COMMENT '0 not available, 1 available', `task_priority` tinyint(4) DEFAULT NULL COMMENT 'job priority', `worker_group` varchar(200) DEFAULT NULL COMMENT 'worker grouping', `environment_code` bigint(20) DEFAULT '-1' COMMENT 'environment code', `fail_retry_times` int(11) DEFAULT NULL COMMENT 'number of failed retries', `fail_retry_interval` int(11) DEFAULT NULL COMMENT 'failed retry interval', `timeout_flag` tinyint(2) DEFAULT '0' COMMENT 'timeout flag:0 close, 1 open', `timeout_notify_strategy` tinyint(4) DEFAULT NULL COMMENT 'timeout notification policy: 0 warning, 1 fail', `timeout` int(11) DEFAULT '0' COMMENT 'timeout length,unit: minute', `delay_time` int(11) DEFAULT '0' COMMENT 'delay execution time,unit: minute', `resource_ids` text COMMENT 'resource id, separated by comma', `create_time` datetime NOT NULL COMMENT 'create time', `update_time` datetime NOT NULL COMMENT 
'update time', PRIMARY KEY (`id`,`code`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Table structure for t_ds_task_definition_log -- ---------------------------- DROP TABLE IF EXISTS `t_ds_task_definition_log`; CREATE TABLE `t_ds_task_definition_log` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'self-increasing id', `code` bigint(20) NOT NULL COMMENT 'encoding', `name` varchar(200) DEFAULT NULL COMMENT 'task definition name', `version` int(11) DEFAULT '0' COMMENT 'task definition version', `description` text COMMENT 'description', `project_code` bigint(20) NOT NULL COMMENT 'project code', `user_id` int(11) DEFAULT NULL COMMENT 'task definition creator id', `task_type` varchar(50) NOT NULL COMMENT 'task type', `task_params` longtext COMMENT 'job custom parameters', `flag` tinyint(2) DEFAULT NULL COMMENT '0 not available, 1 available', `task_priority` tinyint(4) DEFAULT NULL COMMENT 'job priority', `worker_group` varchar(200) DEFAULT NULL COMMENT 'worker grouping', `environment_code` bigint(20) DEFAULT '-1' COMMENT 'environment code', `fail_retry_times` int(11) DEFAULT NULL COMMENT 'number of failed retries', `fail_retry_interval` int(11) DEFAULT NULL COMMENT 'failed retry interval', `timeout_flag` tinyint(2) DEFAULT '0' COMMENT 'timeout flag:0 close, 1 open', `timeout_notify_strategy` tinyint(4) DEFAULT NULL COMMENT 'timeout notification policy: 0 warning, 1 fail', `timeout` int(11) DEFAULT '0' COMMENT 'timeout length,unit: minute', `delay_time` int(11) DEFAULT '0' COMMENT 'delay execution time,unit: minute', `resource_ids` text DEFAULT NULL COMMENT 'resource id, separated by comma', `operator` int(11) DEFAULT NULL COMMENT 'operator user id', `operate_time` datetime DEFAULT NULL COMMENT 'operate time', `create_time` datetime NOT NULL COMMENT 'create time', `update_time` datetime NOT NULL COMMENT 'update time', PRIMARY KEY (`id`), KEY `idx_code_version` (`code`,`version`), KEY `idx_project_code` (`project_code`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Table structure for t_ds_process_task_relation -- ---------------------------- DROP TABLE IF EXISTS `t_ds_process_task_relation`; CREATE TABLE `t_ds_process_task_relation` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'self-increasing id', `name` varchar(200) DEFAULT NULL COMMENT 'relation name', `project_code` bigint(20) NOT NULL COMMENT 'project code', `process_definition_code` bigint(20) NOT NULL COMMENT 'process code', `process_definition_version` int(11) NOT NULL COMMENT 'process version', `pre_task_code` bigint(20) NOT NULL COMMENT 'pre task code', `pre_task_version` int(11) NOT NULL COMMENT 'pre task version', `post_task_code` bigint(20) NOT NULL COMMENT 'post task code', `post_task_version` int(11) NOT NULL COMMENT 'post task version', `condition_type` tinyint(2) DEFAULT NULL COMMENT 'condition type : 0 none, 1 judge 2 delay', `condition_params` text COMMENT 'condition params(json)', `create_time` datetime NOT NULL COMMENT 'create time', `update_time` datetime NOT NULL COMMENT 'update time', PRIMARY KEY (`id`), KEY `idx_code` (`project_code`,`process_definition_code`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Table structure for t_ds_process_task_relation_log -- ---------------------------- DROP TABLE IF EXISTS `t_ds_process_task_relation_log`; CREATE TABLE `t_ds_process_task_relation_log` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'self-increasing id', `name` varchar(200) DEFAULT NULL 
COMMENT 'relation name', `project_code` bigint(20) NOT NULL COMMENT 'project code', `process_definition_code` bigint(20) NOT NULL COMMENT 'process code', `process_definition_version` int(11) NOT NULL COMMENT 'process version', `pre_task_code` bigint(20) NOT NULL COMMENT 'pre task code', `pre_task_version` int(11) NOT NULL COMMENT 'pre task version', `post_task_code` bigint(20) NOT NULL COMMENT 'post task code', `post_task_version` int(11) NOT NULL COMMENT 'post task version', `condition_type` tinyint(2) DEFAULT NULL COMMENT 'condition type : 0 none, 1 judge 2 delay', `condition_params` text COMMENT 'condition params(json)', `operator` int(11) DEFAULT NULL COMMENT 'operator user id', `operate_time` datetime DEFAULT NULL COMMENT 'operate time', `create_time` datetime NOT NULL COMMENT 'create time', `update_time` datetime NOT NULL COMMENT 'update time', PRIMARY KEY (`id`), KEY `idx_process_code_version` (`process_definition_code`,`process_definition_version`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Table structure for t_ds_process_instance -- ---------------------------- DROP TABLE IF EXISTS `t_ds_process_instance`; CREATE TABLE `t_ds_process_instance` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'key', `name` varchar(255) DEFAULT NULL COMMENT 'process instance name', `process_definition_code` bigint(20) NOT NULL COMMENT 'process definition code', `process_definition_version` int(11) DEFAULT '0' COMMENT 'process definition version', `state` tinyint(4) DEFAULT NULL COMMENT 'process instance Status: 0 commit succeeded, 1 running, 2 prepare to pause, 3 pause, 4 prepare to stop, 5 stop, 6 fail, 7 succeed, 8 need fault tolerance, 9 kill, 10 wait for thread, 11 wait for dependency to complete', `recovery` tinyint(4) DEFAULT NULL COMMENT 'process instance failover flag:0:normal,1:failover instance', `start_time` datetime DEFAULT NULL COMMENT 'process instance start time', `end_time` datetime DEFAULT NULL COMMENT 'process instance end time', `run_times` int(11) DEFAULT NULL COMMENT 'process instance run times', `host` varchar(135) DEFAULT NULL COMMENT 'process instance host', `command_type` tinyint(4) DEFAULT NULL COMMENT 'command type', `command_param` text COMMENT 'json command parameters', `task_depend_type` tinyint(4) DEFAULT NULL COMMENT 'task depend type. 0: only current node,1:before the node,2:later nodes', `max_try_times` tinyint(4) DEFAULT '0' COMMENT 'max try times', `failure_strategy` tinyint(4) DEFAULT '0' COMMENT 'failure strategy. 0:end the process when node failed,1:continue running the other nodes when node failed', `warning_type` tinyint(4) DEFAULT '0' COMMENT 'warning type. 0:no warning,1:warning if process success,2:warning if process failed,3:warning if success', `warning_group_id` int(11) DEFAULT NULL COMMENT 'warning group id', `schedule_time` datetime DEFAULT NULL COMMENT 'schedule time', `command_start_time` datetime DEFAULT NULL COMMENT 'command start time', `global_params` text COMMENT 'global parameters', `flag` tinyint(4) DEFAULT '1' COMMENT 'flag', `update_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, `is_sub_process` int(11) DEFAULT '0' COMMENT 'flag, whether the process is sub process', `executor_id` int(11) NOT NULL COMMENT 'executor id', `history_cmd` text COMMENT 'history commands of process instance operation', `process_instance_priority` int(11) DEFAULT NULL COMMENT 'process instance priority. 
0 Highest,1 High,2 Medium,3 Low,4 Lowest', `worker_group` varchar(64) DEFAULT NULL COMMENT 'worker group id', `environment_code` bigint(20) DEFAULT '-1' COMMENT 'environment code', `timeout` int(11) DEFAULT '0' COMMENT 'time out', `tenant_id` int(11) NOT NULL DEFAULT '-1' COMMENT 'tenant id', `var_pool` longtext COMMENT 'var_pool', `dry_run` tinyint(4) DEFAULT '0' COMMENT 'dry run flag:0 normal, 1 dry run', `next_process_instance_id` int(11) DEFAULT '0' COMMENT 'serial queue next processInstanceId', `restart_time` datetime DEFAULT NULL COMMENT 'process instance restart time', PRIMARY KEY (`id`), KEY `process_instance_index` (`process_definition_code`,`id`) USING BTREE, KEY `start_time_index` (`start_time`,`end_time`) USING BTREE ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of t_ds_process_instance -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_project -- ---------------------------- DROP TABLE IF EXISTS `t_ds_project`; CREATE TABLE `t_ds_project` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'key', `name` varchar(100) DEFAULT NULL COMMENT 'project name', `code` bigint(20) NOT NULL COMMENT 'encoding', `description` varchar(200) DEFAULT NULL, `user_id` int(11) DEFAULT NULL COMMENT 'creator id', `flag` tinyint(4) DEFAULT '1' COMMENT '0 not available, 1 available', `create_time` datetime NOT NULL COMMENT 'create time', `update_time` datetime DEFAULT NULL COMMENT 'update time', PRIMARY KEY (`id`), KEY `user_id_index` (`user_id`) USING BTREE ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of t_ds_project -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_queue -- ---------------------------- DROP TABLE IF EXISTS `t_ds_queue`; CREATE TABLE `t_ds_queue` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'key', `queue_name` varchar(64) DEFAULT NULL COMMENT 'queue name', `queue` varchar(64) DEFAULT NULL COMMENT 'yarn queue name', `create_time` datetime DEFAULT NULL COMMENT 'create time', `update_time` datetime DEFAULT NULL COMMENT 'update time', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of t_ds_queue -- ---------------------------- INSERT INTO `t_ds_queue` VALUES ('1', 'default', 'default', null, null); -- ---------------------------- -- Table structure for t_ds_relation_datasource_user -- ---------------------------- DROP TABLE IF EXISTS `t_ds_relation_datasource_user`; CREATE TABLE `t_ds_relation_datasource_user` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'key', `user_id` int(11) NOT NULL COMMENT 'user id', `datasource_id` int(11) DEFAULT NULL COMMENT 'data source id', `perm` int(11) DEFAULT '1' COMMENT 'limits of authority', `create_time` datetime DEFAULT NULL COMMENT 'create time', `update_time` datetime DEFAULT NULL COMMENT 'update time', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of t_ds_relation_datasource_user -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_relation_process_instance -- ---------------------------- DROP TABLE IF EXISTS `t_ds_relation_process_instance`; CREATE TABLE `t_ds_relation_process_instance` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'key', `parent_process_instance_id` int(11) DEFAULT NULL COMMENT 'parent process instance id', `parent_task_instance_id` int(11) DEFAULT NULL COMMENT 'parent 
process instance id', `process_instance_id` int(11) DEFAULT NULL COMMENT 'child process instance id', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of t_ds_relation_process_instance -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_relation_project_user -- ---------------------------- DROP TABLE IF EXISTS `t_ds_relation_project_user`; CREATE TABLE `t_ds_relation_project_user` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'key', `user_id` int(11) NOT NULL COMMENT 'user id', `project_id` int(11) DEFAULT NULL COMMENT 'project id', `perm` int(11) DEFAULT '1' COMMENT 'limits of authority', `create_time` datetime DEFAULT NULL COMMENT 'create time', `update_time` datetime DEFAULT NULL COMMENT 'update time', PRIMARY KEY (`id`), KEY `user_id_index` (`user_id`) USING BTREE ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of t_ds_relation_project_user -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_relation_resources_user -- ---------------------------- DROP TABLE IF EXISTS `t_ds_relation_resources_user`; CREATE TABLE `t_ds_relation_resources_user` ( `id` int(11) NOT NULL AUTO_INCREMENT, `user_id` int(11) NOT NULL COMMENT 'user id', `resources_id` int(11) DEFAULT NULL COMMENT 'resource id', `perm` int(11) DEFAULT '1' COMMENT 'limits of authority', `create_time` datetime DEFAULT NULL COMMENT 'create time', `update_time` datetime DEFAULT NULL COMMENT 'update time', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of t_ds_relation_resources_user -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_relation_udfs_user -- ---------------------------- DROP TABLE IF EXISTS `t_ds_relation_udfs_user`; CREATE TABLE `t_ds_relation_udfs_user` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'key', `user_id` int(11) NOT NULL COMMENT 'userid', `udf_id` int(11) DEFAULT NULL COMMENT 'udf id', `perm` int(11) DEFAULT '1' COMMENT 'limits of authority', `create_time` datetime DEFAULT NULL COMMENT 'create time', `update_time` datetime DEFAULT NULL COMMENT 'update time', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Table structure for t_ds_resources -- ---------------------------- DROP TABLE IF EXISTS `t_ds_resources`; CREATE TABLE `t_ds_resources` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'key', `alias` varchar(64) DEFAULT NULL COMMENT 'alias', `file_name` varchar(64) DEFAULT NULL COMMENT 'file name', `description` varchar(255) DEFAULT NULL, `user_id` int(11) DEFAULT NULL COMMENT 'user id', `type` tinyint(4) DEFAULT NULL COMMENT 'resource type,0:FILE,1:UDF', `size` bigint(20) DEFAULT NULL COMMENT 'resource size', `create_time` datetime DEFAULT NULL COMMENT 'create time', `update_time` datetime DEFAULT NULL COMMENT 'update time', `pid` int(11) DEFAULT NULL, `full_name` varchar(128) DEFAULT NULL, `is_directory` tinyint(4) DEFAULT NULL, PRIMARY KEY (`id`), UNIQUE KEY `t_ds_resources_un` (`full_name`,`type`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of t_ds_resources -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_schedules -- ---------------------------- DROP TABLE IF EXISTS `t_ds_schedules`; CREATE TABLE `t_ds_schedules` ( `id` int(11) NOT NULL 
AUTO_INCREMENT COMMENT 'key', `process_definition_code` bigint(20) NOT NULL COMMENT 'process definition code', `start_time` datetime NOT NULL COMMENT 'start time', `end_time` datetime NOT NULL COMMENT 'end time', `timezone_id` varchar(40) DEFAULT NULL COMMENT 'schedule timezone id', `crontab` varchar(255) NOT NULL COMMENT 'crontab description', `failure_strategy` tinyint(4) NOT NULL COMMENT 'failure strategy. 0:end,1:continue', `user_id` int(11) NOT NULL COMMENT 'user id', `release_state` tinyint(4) NOT NULL COMMENT 'release state. 0:offline,1:online ', `warning_type` tinyint(4) NOT NULL COMMENT 'Alarm type: 0 is not sent, 1 process is sent successfully, 2 process is sent failed, 3 process is sent successfully and all failures are sent', `warning_group_id` int(11) DEFAULT NULL COMMENT 'alert group id', `process_instance_priority` int(11) DEFAULT NULL COMMENT 'process instance priority:0 Highest,1 High,2 Medium,3 Low,4 Lowest', `worker_group` varchar(64) DEFAULT '' COMMENT 'worker group id', `environment_code` bigint(20) DEFAULT '-1' COMMENT 'environment code', `create_time` datetime NOT NULL COMMENT 'create time', `update_time` datetime NOT NULL COMMENT 'update time', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of t_ds_schedules -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_session -- ---------------------------- DROP TABLE IF EXISTS `t_ds_session`; CREATE TABLE `t_ds_session` ( `id` varchar(64) NOT NULL COMMENT 'key', `user_id` int(11) DEFAULT NULL COMMENT 'user id', `ip` varchar(45) DEFAULT NULL COMMENT 'ip', `last_login_time` datetime DEFAULT NULL COMMENT 'last login time', PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of t_ds_session -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_task_instance -- ---------------------------- DROP TABLE IF EXISTS `t_ds_task_instance`; CREATE TABLE `t_ds_task_instance` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'key', `name` varchar(255) DEFAULT NULL COMMENT 'task name', `task_type` varchar(50) NOT NULL COMMENT 'task type', `task_code` bigint(20) NOT NULL COMMENT 'task definition code', `task_definition_version` int(11) DEFAULT '0' COMMENT 'task definition version', `process_instance_id` int(11) DEFAULT NULL COMMENT 'process instance id', `state` tinyint(4) DEFAULT NULL COMMENT 'Status: 0 commit succeeded, 1 running, 2 prepare to pause, 3 pause, 4 prepare to stop, 5 stop, 6 fail, 7 succeed, 8 need fault tolerance, 9 kill, 10 wait for thread, 11 wait for dependency to complete', `submit_time` datetime DEFAULT NULL COMMENT 'task submit time', `start_time` datetime DEFAULT NULL COMMENT 'task start time', `end_time` datetime DEFAULT NULL COMMENT 'task end time', `host` varchar(135) DEFAULT NULL COMMENT 'host of task running on', `execute_path` varchar(200) DEFAULT NULL COMMENT 'task execute path in the host', `log_path` varchar(200) DEFAULT NULL COMMENT 'task log path', `alert_flag` tinyint(4) DEFAULT NULL COMMENT 'whether alert', `retry_times` int(4) DEFAULT '0' COMMENT 'task retry times', `pid` int(4) DEFAULT NULL COMMENT 'pid of task', `app_link` longtext COMMENT 'yarn app id', `task_params` longtext COMMENT 'job custom parameters', `flag` tinyint(4) DEFAULT '1' COMMENT '0 not available, 1 available', `retry_interval` int(4) DEFAULT NULL COMMENT 'retry interval when task failed ', `max_retry_times` int(2) DEFAULT NULL COMMENT 'max retry 
times', `task_instance_priority` int(11) DEFAULT NULL COMMENT 'task instance priority:0 Highest,1 High,2 Medium,3 Low,4 Lowest', `worker_group` varchar(64) DEFAULT NULL COMMENT 'worker group id', `environment_code` bigint(20) DEFAULT '-1' COMMENT 'environment code', `environment_config` text COMMENT 'this config contains many environment variables config', `executor_id` int(11) DEFAULT NULL, `first_submit_time` datetime DEFAULT NULL COMMENT 'task first submit time', `delay_time` int(4) DEFAULT '0' COMMENT 'task delay execution time', `var_pool` longtext COMMENT 'var_pool', `dry_run` tinyint(4) DEFAULT '0' COMMENT 'dry run flag: 0 normal, 1 dry run', PRIMARY KEY (`id`), KEY `process_instance_id` (`process_instance_id`) USING BTREE, KEY `idx_code_version` (`task_code`, `task_definition_version`) USING BTREE, CONSTRAINT `foreign_key_instance_id` FOREIGN KEY (`process_instance_id`) REFERENCES `t_ds_process_instance` (`id`) ON DELETE CASCADE ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of t_ds_task_instance -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_tenant -- ---------------------------- DROP TABLE IF EXISTS `t_ds_tenant`; CREATE TABLE `t_ds_tenant` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'key', `tenant_code` varchar(64) DEFAULT NULL COMMENT 'tenant code', `description` varchar(255) DEFAULT NULL, `queue_id` int(11) DEFAULT NULL COMMENT 'queue id', `create_time` datetime DEFAULT NULL COMMENT 'create time', `update_time` datetime DEFAULT NULL COMMENT 'update time', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of t_ds_tenant -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_udfs -- ---------------------------- DROP TABLE IF EXISTS `t_ds_udfs`; CREATE TABLE `t_ds_udfs` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'key', `user_id` int(11) NOT NULL COMMENT 'user id', `func_name` varchar(100) NOT NULL COMMENT 'UDF function name', `class_name` varchar(255) NOT NULL COMMENT 'class of udf', `type` tinyint(4) NOT NULL COMMENT 'Udf function type', `arg_types` varchar(255) DEFAULT NULL COMMENT 'arguments types', `database` varchar(255) DEFAULT NULL COMMENT 'data base', `description` varchar(255) DEFAULT NULL, `resource_id` int(11) NOT NULL COMMENT 'resource id', `resource_name` varchar(255) NOT NULL COMMENT 'resource name', `create_time` datetime NOT NULL COMMENT 'create time', `update_time` datetime NOT NULL COMMENT 'update time', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of t_ds_udfs -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_user -- ---------------------------- DROP TABLE IF EXISTS `t_ds_user`; CREATE TABLE `t_ds_user` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'user id', `user_name` varchar(64) DEFAULT NULL COMMENT 'user name', `user_password` varchar(64) DEFAULT NULL COMMENT 'user password', `user_type` tinyint(4) DEFAULT NULL COMMENT 'user type, 0:administrator,1:ordinary user', `email` varchar(64) DEFAULT NULL COMMENT 'email', `phone` varchar(11) DEFAULT NULL COMMENT 'phone', `tenant_id` int(11) DEFAULT NULL COMMENT 'tenant id', `create_time` datetime DEFAULT NULL COMMENT 'create time', `update_time` datetime DEFAULT NULL COMMENT 'update time', `queue` varchar(64) DEFAULT NULL COMMENT 'queue', `state` tinyint(4) DEFAULT '1' COMMENT 'state 0:disable 
1:enable', PRIMARY KEY (`id`), UNIQUE KEY `user_name_unique` (`user_name`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of t_ds_user -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_worker_group -- ---------------------------- DROP TABLE IF EXISTS `t_ds_worker_group`; CREATE TABLE `t_ds_worker_group` ( `id` bigint(11) NOT NULL AUTO_INCREMENT COMMENT 'id', `name` varchar(255) NOT NULL COMMENT 'worker group name', `addr_list` text NULL DEFAULT NULL COMMENT 'worker addr list. split by [,]', `create_time` datetime NULL DEFAULT NULL COMMENT 'create time', `update_time` datetime NULL DEFAULT NULL COMMENT 'update time', PRIMARY KEY (`id`), UNIQUE KEY `name_unique` (`name`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; -- ---------------------------- -- Records of t_ds_worker_group -- ---------------------------- -- ---------------------------- -- Table structure for t_ds_version -- ---------------------------- DROP TABLE IF EXISTS `t_ds_version`; CREATE TABLE `t_ds_version` ( `id` int(11) NOT NULL AUTO_INCREMENT, `version` varchar(200) NOT NULL, PRIMARY KEY (`id`), UNIQUE KEY `version_UNIQUE` (`version`) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8 COMMENT='version'; -- ---------------------------- -- Records of t_ds_version -- ---------------------------- INSERT INTO `t_ds_version` VALUES ('1', '2.0.6'); -- ---------------------------- -- Records of t_ds_alertgroup -- ---------------------------- INSERT INTO `t_ds_alertgroup`(alert_instance_ids, create_user_id, group_name, description, create_time, update_time) VALUES ("1,2", 1, 'default admin warning group', 'default admin warning group', '2018-11-29 10:20:39', '2018-11-29 10:20:39'); -- ---------------------------- -- Records of t_ds_user -- ---------------------------- INSERT INTO `t_ds_user` VALUES ('1', 'admin', '7ad2410b2f4c074479a8937a28a22b8f', '0', 'xxx@qq.com', '', '0', '2018-03-27 15:48:50', '2018-10-24 17:40:22', null, 1); -- ---------------------------- -- Table structure for t_ds_plugin_define -- ---------------------------- SET sql_mode=(SELECT REPLACE(@@sql_mode,'ONLY_FULL_GROUP_BY','')); DROP TABLE IF EXISTS `t_ds_plugin_define`; CREATE TABLE `t_ds_plugin_define` ( `id` int NOT NULL AUTO_INCREMENT, `plugin_name` varchar(100) NOT NULL COMMENT 'the name of plugin eg: email', `plugin_type` varchar(100) NOT NULL COMMENT 'plugin type . alert=alert plugin, job=job plugin', `plugin_params` text COMMENT 'plugin params', `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`id`), UNIQUE KEY `t_ds_plugin_define_UN` (`plugin_name`,`plugin_type`) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8; -- ---------------------------- -- Table structure for t_ds_alert_plugin_instance -- ---------------------------- DROP TABLE IF EXISTS `t_ds_alert_plugin_instance`; CREATE TABLE `t_ds_alert_plugin_instance` ( `id` int NOT NULL AUTO_INCREMENT, `plugin_define_id` int NOT NULL, `plugin_instance_params` text COMMENT 'plugin instance params. 
Also contain the params value which user input in web ui.',
  `create_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `instance_name` varchar(200) DEFAULT NULL COMMENT 'alert instance name',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- ----------------------------
-- Table structure for t_ds_environment
-- ----------------------------
DROP TABLE IF EXISTS `t_ds_environment`;
CREATE TABLE `t_ds_environment` (
  `id` bigint(11) NOT NULL AUTO_INCREMENT COMMENT 'id',
  `code` bigint(20) DEFAULT NULL COMMENT 'encoding',
  `name` varchar(100) NOT NULL COMMENT 'environment name',
  `config` text NULL DEFAULT NULL COMMENT 'this config contains many environment variables config',
  `description` text NULL DEFAULT NULL COMMENT 'the details',
  `operator` int(11) DEFAULT NULL COMMENT 'operator user id',
  `create_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `environment_name_unique` (`name`),
  UNIQUE KEY `environment_code_unique` (`code`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

-- ----------------------------
-- Table structure for t_ds_environment_worker_group_relation
-- ----------------------------
DROP TABLE IF EXISTS `t_ds_environment_worker_group_relation`;
CREATE TABLE `t_ds_environment_worker_group_relation` (
  `id` bigint(11) NOT NULL AUTO_INCREMENT COMMENT 'id',
  `environment_code` bigint(20) NOT NULL COMMENT 'environment code',
  `worker_group` varchar(255) NOT NULL COMMENT 'worker group id',
  `operator` int(11) DEFAULT NULL COMMENT 'operator user id',
  `create_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `environment_worker_group_unique` (`environment_code`,`worker_group`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

Start dolphinscheduler
kubectl apply -f dolphinscheduler-master.yaml
kubectl apply -f dolphinscheduler-alert.yaml
kubectl apply -f dolphinscheduler-worker.yaml
kubectl apply -f dolphinscheduler-api.yaml
kubectl apply -f dolphinscheduler-ingress.yaml
Once all pods are Running, open dolphinscheduler.org/dolphinscheduler. If you changed the ingress, use the host you configured instead; if you have no DNS server or DNS record, add a matching hosts entry on your local machine first.
Default username/password: admin/dolphinscheduler123
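Before logging in, it is worth confirming that the workloads and the ingress rule actually came up. The following is a minimal verification sketch: the grep pattern and ingress host are taken from the manifests above, the namespace is assumed to be the default one, and the node IP in the hosts entry is a placeholder — adjust all of them to whatever your dolphinscheduler-*.yaml files define.

# all dolphinscheduler pods should report Running
kubectl get pod -o wide | grep dolphinscheduler
# confirm the host/path rule picked up by the ingress controller
kubectl get ingress
# without DNS, point the ingress host at one of your nodes first (placeholder IP)
# echo '<node-ip> dolphinscheduler.org' >> /etc/hosts
# expect an HTTP response (not a connection error) from the web UI
curl -I http://dolphinscheduler.org/dolphinscheduler/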
前言为什么用 containerd ?因为 k8s 早在2021年就说要取消 docker-shim ,相关的资料可以查看下面的链接弃用 Dockershim 的常见问题迟早都要接受的,不如早点接受k8s 组件Kubernetes 组件master 节点组件名称组件作用etcd兼具一致性和高可用性的键值数据库,可以作为保存 Kubernetes 所有集群数据的后台数据库。kube-apiserver提供了资源操作的唯一入口,各组件协调者,并提供认证、授权、访问控制、API注册和发现等机制; 以 HTTP API 提供接口服务,所有对象资源的增删改查和监听操作都交给 apiserver 处理后再提交给 etcd 存储。kube-controller-manager负责维护集群的状态,比如故障检测、自动扩展、滚动更新等;处理集群中常规后台任务,一个资源对应一个控制器,而 controllermanager 就是负责管理这些控制器的。kube-scheduler负责资源的调度,按照预定的调度策略将 pod 调度到相应的机器上。work 节点组件名称组件作用kubeletkubelet 是 master 在 work 节点上的 agent,管理本机运行容器的生命周期,比如创建容器、pod 挂载数据卷、下载 secret 、获取容器和节点状态等工作。 kubelet 将每个 pod 转换成一组容器。负责维护容器的生命周期,同时也负责 volume(CVI)和网络(CNI)的管理kube-proxy负责为 service 提供 cluster 内部的服务发现和负载均衡; 在 work 节点上实现 pod 网络代理,维护网络规则和四层负载均衡工作。container runtime负责镜像管理以及Pod和容器的真正运行(CRI) 目前用的比较多的有 docker 、 containerdcluster networking集群网络系统目前用的比较多的有 flannel 、calicocoredns负责为整个集群提供DNS服务ingress controller为服务提供外网入口metrics-server提供资源监控dashboard提供 GUI 界面环境准备IP角色内核版本192.168.91.19master/workcentos7.6/3.10.0-957.el7.x86_64192.168.91.20workcentos7.6/3.10.0-957.el7.x86_64serviceversionetcdv3.5.1kubernetesv1.23.3cfsslv1.6.1containerdv1.5.9pausev3.6flannelv0.15.1corednsv1.8.6metrics-serverv0.5.2dashboardv2.4.0cfssl githubetcd githubk8s githubcontainerd githubrunc github本次部署用到的安装包和镜像都上传到csdn了master节点的配置不能小于2c2g,work节点可以给1c1g节点之间需要完成免密操作,这里就不体现操作步骤了因为懒...所以就弄了一个master节点以下的操作,只需要选一台可以和其他节点免密的 master 节点就好网络条件好的情况下,镜像可以让他自己拉取,如果镜像经常拉取失败,可以从本地上传镜像包然后导入到 containerd,文章后面的镜像导入一类的操作不是必须要操作的创建目录根据自身实际情况创建指定路径,此路径用来存放k8s二进制文件以及用到的镜像文件mkdir -p /approot1/k8s/{bin,images,pkg,tmp/{ssl,service}}关闭防火墙for i in 192.168.91.19 192.168.91.20;do \ ssh $i "systemctl disable firewalld"; \ ssh $i "systemctl stop firewalld"; \ done关闭selinux临时关闭for i in 192.168.91.19 192.168.91.20;do \ ssh $i "setenforce 0"; \ done永久关闭for i in 192.168.91.19 192.168.91.20;do \ ssh $i "sed -i '/SELINUX/s/enforcing/disabled/g' /etc/selinux/config"; \ done关闭swap临时关闭for i in 192.168.91.19 192.168.91.20;do \ ssh $i "swapoff -a"; \ done永久关闭for i in 192.168.91.19 192.168.91.20;do \ ssh $i "sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab"; \ done开启内核模块临时开启for i in 192.168.91.19 192.168.91.20;do \ ssh $i "modprobe ip_vs"; \ ssh $i "modprobe ip_vs_rr"; \ ssh $i "modprobe ip_vs_wrr"; \ ssh $i "modprobe ip_vs_sh"; \ ssh $i "modprobe nf_conntrack"; \ ssh $i "modprobe nf_conntrack_ipv4"; \ ssh $i "modprobe br_netfilter"; \ ssh $i "modprobe overlay"; \ done永久开启vim /approot1/k8s/tmp/service/k8s-modules.confip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack nf_conntrack_ipv4 br_netfilter overlay分发到所有节点for i in 192.168.91.19 192.168.91.20;do \ scp /approot1/k8s/tmp/service/k8s-modules.conf $i:/etc/modules-load.d/; \ done启用systemd自动加载模块服务for i in 192.168.91.19 192.168.91.20;do \ ssh $i "systemctl enable systemd-modules-load"; \ ssh $i "systemctl restart systemd-modules-load"; \ ssh $i "systemctl is-active systemd-modules-load"; \ done返回active表示 自动加载模块服务 启动成功配置系统参数以下的参数适用于3.x和4.x系列的内核vim /approot1/k8s/tmp/service/kubernetes.conf建议编辑之前,在 vim 里面先执行 :set paste ,避免复制进去的内容和文档的不一致,比如多了注释,或者语法对齐异常# 开启数据包转发功能(实现vxlan) net.ipv4.ip_forward=1 # iptables对bridge的数据进行处理 net.bridge.bridge-nf-call-iptables=1 net.bridge.bridge-nf-call-ip6tables=1 net.bridge.bridge-nf-call-arptables=1 # 关闭tcp_tw_recycle,否则和NAT冲突,会导致服务不通 net.ipv4.tcp_tw_recycle=0 # 不允许将TIME-WAIT sockets重新用于新的TCP连接 net.ipv4.tcp_tw_reuse=0 # socket监听(listen)的backlog上限 net.core.somaxconn=32768 # 最大跟踪连接数,默认 nf_conntrack_buckets * 4 net.netfilter.nf_conntrack_max=1000000 # 禁止使用 swap 空间,只有当系统 OOM 时才允许使用它 vm.swappiness=0 # 计算当前的内存映射文件数。 
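# vm.max_map_count: upper limit on the number of memory map areas a single process may use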
vm.max_map_count=655360 # 内核可分配的最大文件数 fs.file-max=6553600 # 持久连接 net.ipv4.tcp_keepalive_time=600 net.ipv4.tcp_keepalive_intvl=30 net.ipv4.tcp_keepalive_probes=10分发到所有节点for i in 192.168.91.19 192.168.91.20;do \ scp /approot1/k8s/tmp/service/kubernetes.conf $i:/etc/sysctl.d/; \ done加载系统参数for i in 192.168.91.19 192.168.91.20;do \ ssh $i "sysctl -p /etc/sysctl.d/kubernetes.conf"; \ done清空iptables规则for i in 192.168.91.19 192.168.91.20;do \ ssh $i "iptables -F && iptables -X && iptables -F -t nat && iptables -X -t nat"; \ ssh $i "iptables -P FORWARD ACCEPT"; \ done配置 PATH 变量for i in 192.168.91.19 192.168.91.20;do \ ssh $i "echo 'PATH=$PATH:/approot1/k8s/bin' >> $HOME/.bashrc"; \ source $HOME/.bashrc下载二进制文件其中一台节点操作即可github下载会比较慢,可以从本地上传到 /approot1/k8s/pkg/ 目录下wget -O /approot1/k8s/pkg/kubernetes.tar.gz \ https://dl.k8s.io/v1.23.3/kubernetes-server-linux-amd64.tar.gz wget -O /approot1/k8s/pkg/etcd.tar.gz \ https://github.com/etcd-io/etcd/releases/download/v3.5.1/etcd-v3.5.1-linux-amd64.tar.gz解压并删除不必要的文件cd /approot1/k8s/pkg/ for i in $(ls *.tar.gz);do tar xvf $i && rm -f $i;done mv kubernetes/server/bin/ kubernetes/ rm -rf kubernetes/{addons,kubernetes-src.tar.gz,LICENSES,server} rm -f kubernetes/bin/*_tag kubernetes/bin/*.tar rm -rf etcd-v3.5.1-linux-amd64/Documentation etcd-v3.5.1-linux-amd64/*.md部署 master 节点创建 ca 根证书wget -O /approot1/k8s/bin/cfssl https://github.com/cloudflare/cfssl/releases/download/v1.6.1/cfssl_1.6.1_linux_amd64 wget -O /approot1/k8s/bin/cfssljson https://github.com/cloudflare/cfssl/releases/download/v1.6.1/cfssljson_1.6.1_linux_amd64 chmod +x /approot1/k8s/bin/*vim /approot1/k8s/tmp/ssl/ca-config.json{ "signing": { "default": { "expiry": "87600h" "profiles": { "kubernetes": { "usages": [ "signing", "key encipherment", "server auth", "client auth" "expiry": "876000h" }vim /approot1/k8s/tmp/ssl/ca-csr.json{ "CN": "kubernetes", "key": { "algo": "rsa", "size": 2048 "names": [ "C": "CN", "ST": "ShangHai", "L": "ShangHai", "O": "k8s", "OU": "System" "ca": { "expiry": "876000h" }cd /approot1/k8s/tmp/ssl/ cfssl gencert -initca ca-csr.json | cfssljson -bare ca部署 etcd 组件创建 etcd 证书vim /approot1/k8s/tmp/ssl/etcd-csr.json这里的192.168.91.19需要改成自己的ip,不要一股脑的复制黏贴注意json的格式{ "CN": "etcd", "hosts": [ "127.0.0.1", "192.168.91.19" "key": { "algo": "rsa", "size": 2048 "names": [ "C": "CN", "ST": "ShangHai", "L": "ShangHai", "O": "k8s", "OU": "System" }cd /approot1/k8s/tmp/ssl/ cfssl gencert -ca=ca.pem \ -ca-key=ca-key.pem \ -config=ca-config.json \ -profile=kubernetes etcd-csr.json | cfssljson -bare etcd配置 etcd 为 systemctl 管理vim /approot1/k8s/tmp/service/kube-etcd.service.192.168.91.19这里的192.168.91.19需要改成自己的ip,不要一股脑的复制黏贴etcd 参数[Unit] Description=Etcd Server After=network.target After=network-online.target Wants=network-online.target Documentation=https://github.com/coreos [Service] Type=notify WorkingDirectory=/approot1/k8s/data/etcd ExecStart=/approot1/k8s/bin/etcd \ --name=etcd-192.168.91.19 \ --cert-file=/etc/kubernetes/ssl/etcd.pem \ --key-file=/etc/kubernetes/ssl/etcd-key.pem \ --peer-cert-file=/etc/kubernetes/ssl/etcd.pem \ --peer-key-file=/etc/kubernetes/ssl/etcd-key.pem \ --trusted-ca-file=/etc/kubernetes/ssl/ca.pem \ --peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \ --initial-advertise-peer-urls=https://192.168.91.19:2380 \ --listen-peer-urls=https://192.168.91.19:2380 \ --listen-client-urls=https://192.168.91.19:2379,http://127.0.0.1:2379 \ --advertise-client-urls=https://192.168.91.19:2379 \ --initial-cluster-token=etcd-cluster-0 \ 
--initial-cluster=etcd-192.168.91.19=https://192.168.91.19:2380 \ --initial-cluster-state=new \ --data-dir=/approot1/k8s/data/etcd \ --wal-dir= \ --snapshot-count=50000 \ --auto-compaction-retention=1 \ --auto-compaction-mode=periodic \ --max-request-bytes=10485760 \ --quota-backend-bytes=8589934592 Restart=always RestartSec=15 LimitNOFILE=65536 OOMScoreAdjust=-999 [Install] WantedBy=multi-user.target分发证书以及创建相关路径如果是多节点,只需要在192.168.91.19后面加上对应的ip即可,以空格为分隔,注意将192.168.91.19修改为自己的ip,切莫一股脑复制对应的目录也要确保和自己规划的一致,如果和我的有不同,注意修改,否则服务会启动失败for i in 192.168.91.19;do \ ssh $i "mkdir -p /etc/kubernetes/ssl"; \ ssh $i "mkdir -m 700 -p /approot1/k8s/data/etcd"; \ ssh $i "mkdir -p /approot1/k8s/bin"; \ scp /approot1/k8s/tmp/ssl/{ca*.pem,etcd*.pem} $i:/etc/kubernetes/ssl/; \ scp /approot1/k8s/tmp/service/kube-etcd.service.$i $i:/etc/systemd/system/kube-etcd.service; \ scp /approot1/k8s/pkg/etcd-v3.5.1-linux-amd64/etcd* $i:/approot1/k8s/bin/; \ done启动 etcd 服务如果是多节点,只需要在192.168.91.19后面加上对应的ip即可,以空格为分隔,注意将192.168.91.19修改为自己的ip,切莫一股脑复制for i in 192.168.91.19;do \ ssh $i "systemctl daemon-reload"; \ ssh $i "systemctl enable kube-etcd"; \ ssh $i "systemctl restart kube-etcd --no-block"; \ ssh $i "systemctl is-active kube-etcd"; \ done返回 activating 表示 etcd 还在启动中,可以稍等一会,然后再执行 for i in 192.168.91.19;do ssh $i "systemctl is-active kube-etcd";done 返回active表示 etcd 启动成功,如果是多节点 etcd ,其中一个没有返回active属于正常的,可以使用下面的方式来验证集群如果是多节点,只需要在192.168.91.19后面加上对应的ip即可,以空格为分隔,注意将192.168.91.19修改为自己的ip,切莫一股脑复制for i in 192.168.91.19;do \ ssh $i "ETCDCTL_API=3 /approot1/k8s/bin/etcdctl \ --endpoints=https://${i}:2379 \ --cacert=/etc/kubernetes/ssl/ca.pem \ --cert=/etc/kubernetes/ssl/etcd.pem \ --key=/etc/kubernetes/ssl/etcd-key.pem \ endpoint health"; \ donehttps://192.168.91.19:2379 is healthy: successfully committed proposal: took = 7.135668ms返回以上信息,并显示 successfully 表示节点是健康的部署 apiserver 组件创建 apiserver 证书vim /approot1/k8s/tmp/ssl/kubernetes-csr.json这里的192.168.91.19需要改成自己的ip,不要一股脑的复制黏贴注意json的格式10.88.0.1 是 k8s 的服务 ip,千万不要和现有的网络一致,避免出现冲突{ "CN": "kubernetes", "hosts": [ "127.0.0.1", "192.168.91.19", "10.88.0.1", "kubernetes", "kubernetes.default", "kubernetes.default.svc", "kubernetes.default.svc.cluster", "kubernetes.default.svc.cluster.local" "key": { "algo": "rsa", "size": 2048 "names": [ "C": "CN", "ST": "ShangHai", "L": "ShangHai", "O": "k8s", "OU": "System" }cd /approot1/k8s/tmp/ssl/ cfssl gencert -ca=ca.pem \ -ca-key=ca-key.pem \ -config=ca-config.json \ -profile=kubernetes kubernetes-csr.json | cfssljson -bare kubernetes创建 metrics-server 证书vim /approot1/k8s/tmp/ssl/metrics-server-csr.json{ "CN": "aggregator", "hosts": [ "key": { "algo": "rsa", "size": 2048 "names": [ "C": "CN", "ST": "ShangHai", "L": "ShangHai", "O": "k8s", "OU": "System" }cd /approot1/k8s/tmp/ssl/ cfssl gencert -ca=ca.pem \ -ca-key=ca-key.pem \ -config=ca-config.json \ -profile=kubernetes metrics-server-csr.json | cfssljson -bare metrics-server配置 apiserver 为 systemctl 管理vim /approot1/k8s/tmp/service/kube-apiserver.service.192.168.91.19这里的192.168.91.19需要改成自己的ip,不要一股脑的复制黏贴--service-cluster-ip-range 参数的 ip 网段要和 kubernetes-csr.json 里面的 10.88.0.1 是一个网段的--etcd-servers 如果 etcd 是多节点的,这里要写上所有的 etcd 节点apiserver 参数[Unit] Description=Kubernetes API Server Documentation=https://github.com/GoogleCloudPlatform/kubernetes After=network.target [Service] ExecStart=/approot1/k8s/bin/kube-apiserver \ --allow-privileged=true \ --anonymous-auth=false \ --api-audiences=api,istio-ca \ --authorization-mode=Node,RBAC \ --bind-address=192.168.91.19 \ --client-ca-file=/etc/kubernetes/ssl/ca.pem \ 
--endpoint-reconciler-type=lease \ --etcd-cafile=/etc/kubernetes/ssl/ca.pem \ --etcd-certfile=/etc/kubernetes/ssl/kubernetes.pem \ --etcd-keyfile=/etc/kubernetes/ssl/kubernetes-key.pem \ --etcd-servers=https://192.168.91.19:2379 \ --kubelet-certificate-authority=/etc/kubernetes/ssl/ca.pem \ --kubelet-client-certificate=/etc/kubernetes/ssl/kubernetes.pem \ --kubelet-client-key=/etc/kubernetes/ssl/kubernetes-key.pem \ --secure-port=6443 \ --service-account-issuer=https://kubernetes.default.svc \ --service-account-signing-key-file=/etc/kubernetes/ssl/ca-key.pem \ --service-account-key-file=/etc/kubernetes/ssl/ca.pem \ --service-cluster-ip-range=10.88.0.0/16 \ --service-node-port-range=30000-32767 \ --tls-cert-file=/etc/kubernetes/ssl/kubernetes.pem \ --tls-private-key-file=/etc/kubernetes/ssl/kubernetes-key.pem \ --requestheader-client-ca-file=/etc/kubernetes/ssl/ca.pem \ --requestheader-allowed-names= \ --requestheader-extra-headers-prefix=X-Remote-Extra- \ --requestheader-group-headers=X-Remote-Group \ --requestheader-username-headers=X-Remote-User \ --proxy-client-cert-file=/etc/kubernetes/ssl/metrics-server.pem \ --proxy-client-key-file=/etc/kubernetes/ssl/metrics-server-key.pem \ --enable-aggregator-routing=true \ --v=2 Restart=always RestartSec=5 Type=notify LimitNOFILE=65536 [Install] WantedBy=multi-user.target分发证书以及创建相关路径如果是多节点,只需要在192.168.91.19后面加上对应的ip即可,以空格为分隔,注意将192.168.91.19修改为自己的ip,切莫一股脑复制对应的目录也要确保和自己规划的一致,如果和我的有不同,注意修改,否则服务会启动失败for i in 192.168.91.19;do \ ssh $i "mkdir -p /etc/kubernetes/ssl"; \ ssh $i "mkdir -p /approot1/k8s/bin"; \ scp /approot1/k8s/tmp/ssl/{ca*.pem,kubernetes*.pem,metrics-server*.pem} $i:/etc/kubernetes/ssl/; \ scp /approot1/k8s/tmp/service/kube-apiserver.service.$i $i:/etc/systemd/system/kube-apiserver.service; \ scp /approot1/k8s/pkg/kubernetes/bin/kube-apiserver $i:/approot1/k8s/bin/; \ done启动 apiserver 服务如果是多节点,只需要在192.168.91.19后面加上对应的ip即可,以空格为分隔,注意将192.168.91.19修改为自己的ip,切莫一股脑复制for i in 192.168.91.19;do \ ssh $i "systemctl daemon-reload"; \ ssh $i "systemctl enable kube-apiserver"; \ ssh $i "systemctl restart kube-apiserver --no-block"; \ ssh $i "systemctl is-active kube-apiserver"; \ done返回 activating 表示 apiserver 还在启动中,可以稍等一会,然后再执行 for i in 192.168.91.19;do ssh $i "systemctl is-active kube-apiserver";done 返回active表示 apiserver 启动成功curl -k --cacert /etc/kubernetes/ssl/ca.pem \ --cert /etc/kubernetes/ssl/kubernetes.pem \ --key /etc/kubernetes/ssl/kubernetes-key.pem \ https://192.168.91.19:6443/api正常返回如下信息,说明 apiserver 服务运行正常{ "kind": "APIVersions", "versions": [ "serverAddressByClientCIDRs": [ "clientCIDR": "0.0.0.0/0", "serverAddress": "192.168.91.19:6443" }查看 k8s 的所有 kind (对象类别)curl -s -k --cacert /etc/kubernetes/ssl/ca.pem \ --cert /etc/kubernetes/ssl/kubernetes.pem \ --key /etc/kubernetes/ssl/kubernetes-key.pem \ https://192.168.91.19:6443/api/v1/ | grep kind | sort -u "kind": "APIResourceList", "kind": "Binding", "kind": "ComponentStatus", "kind": "ConfigMap", "kind": "Endpoints", "kind": "Event", "kind": "Eviction", "kind": "LimitRange", "kind": "Namespace", "kind": "Node", "kind": "NodeProxyOptions", "kind": "PersistentVolume", "kind": "PersistentVolumeClaim", "kind": "Pod", "kind": "PodAttachOptions", "kind": "PodExecOptions", "kind": "PodPortForwardOptions", "kind": "PodProxyOptions", "kind": "PodTemplate", "kind": "ReplicationController", "kind": "ResourceQuota", "kind": "Scale", "kind": "Secret", "kind": "Service", "kind": "ServiceAccount", "kind": "ServiceProxyOptions", "kind": "TokenRequest",配置 kubectl 管理创建 admin 证书vim 
/approot1/k8s/tmp/ssl/admin-csr.json{ "CN": "admin", "hosts": [ "key": { "algo": "rsa", "size": 2048 "names": [ "C": "CN", "ST": "ShangHai", "L": "ShangHai", "O": "system:masters", "OU": "System" }cd /approot1/k8s/tmp/ssl/ cfssl gencert -ca=ca.pem \ -ca-key=ca-key.pem \ -config=ca-config.json \ -profile=kubernetes admin-csr.json | cfssljson -bare admin创建 kubeconfig 证书设置集群参数--server 为 apiserver 的访问地址,修改成自己的 ip 地址和 service 文件里面指定的 --secure-port 参数的端口,切记,一定要带上https:// 协议,否则生成的证书,kubectl 命令访问不到 apiservercd /approot1/k8s/tmp/ssl/ /approot1/k8s/pkg/kubernetes/bin/kubectl config set-cluster kubernetes \ --certificate-authority=ca.pem \ --embed-certs=true \ --server=https://192.168.91.19:6443 \ --kubeconfig=kubectl.kubeconfig设置客户端认证参数cd /approot1/k8s/tmp/ssl/ /approot1/k8s/pkg/kubernetes/bin/kubectl config set-credentials admin \ --client-certificate=admin.pem \ --client-key=admin-key.pem \ --embed-certs=true \ --kubeconfig=kubectl.kubeconfig设置上下文参数cd /approot1/k8s/tmp/ssl/ /approot1/k8s/pkg/kubernetes/bin/kubectl config set-context kubernetes \ --cluster=kubernetes \ --user=admin \ --kubeconfig=kubectl.kubeconfig设置默认上下文cd /approot1/k8s/tmp/ssl/ /approot1/k8s/pkg/kubernetes/bin/kubectl config use-context kubernetes --kubeconfig=kubectl.kubeconfig分发 kubeconfig 证书到所有 master 节点如果是多节点,只需要在192.168.91.19后面加上对应的ip即可,以空格为分隔,注意将192.168.91.19修改为自己的ip,切莫一股脑复制for i in 192.168.91.19;do \ ssh $i "mkdir -p /etc/kubernetes/ssl"; \ ssh $i "mkdir -p /approot1/k8s/bin"; \ ssh $i "mkdir -p $HOME/.kube"; \ scp /approot1/k8s/pkg/kubernetes/bin/kubectl $i:/approot1/k8s/bin/; \ ssh $i "echo 'source <(kubectl completion bash)' >> $HOME/.bashrc" scp /approot1/k8s/tmp/ssl/kubectl.kubeconfig $i:$HOME/.kube/config; \ done部署 controller-manager 组件创建 controller-manager 证书vim /approot1/k8s/tmp/ssl/kube-controller-manager-csr.json这里的192.168.91.19需要改成自己的ip,不要一股脑的复制黏贴注意json的格式{ "CN": "system:kube-controller-manager", "key": { "algo": "rsa", "size": 2048 "hosts": [ "127.0.0.1", "192.168.91.19" "names": [ "C": "CN", "ST": "ShangHai", "L": "ShangHai", "O": "system:kube-controller-manager", "OU": "System" }cd /approot1/k8s/tmp/ssl/ cfssl gencert -ca=ca.pem \ -ca-key=ca-key.pem \ -config=ca-config.json \ -profile=kubernetes kube-controller-manager-csr.json | cfssljson -bare kube-controller-manager创建 kubeconfig 证书设置集群参数--server 为 apiserver 的访问地址,修改成自己的 ip 地址和 service 文件里面指定的 --secure-port 参数的端口,切记,一定要带上https:// 协议,否则生成的证书,kubectl 命令访问不到 apiservercd /approot1/k8s/tmp/ssl/ /approot1/k8s/pkg/kubernetes/bin/kubectl config set-cluster kubernetes \ --certificate-authority=ca.pem \ --embed-certs=true \ --server=https://192.168.91.19:6443 \ --kubeconfig=kube-controller-manager.kubeconfig设置客户端认证参数cd /approot1/k8s/tmp/ssl/ /approot1/k8s/pkg/kubernetes/bin/kubectl config set-credentials system:kube-controller-manager \ --client-certificate=kube-controller-manager.pem \ --client-key=kube-controller-manager-key.pem \ --embed-certs=true \ --kubeconfig=kube-controller-manager.kubeconfig设置上下文参数cd /approot1/k8s/tmp/ssl/ /approot1/k8s/pkg/kubernetes/bin/kubectl config set-context system:kube-controller-manager \ --cluster=kubernetes \ --user=system:kube-controller-manager \ --kubeconfig=kube-controller-manager.kubeconfig设置默认上下文cd /approot1/k8s/tmp/ssl/ /approot1/k8s/pkg/kubernetes/bin/kubectl config \ use-context system:kube-controller-manager \ --kubeconfig=kube-controller-manager.kubeconfig配置 controller-manager 为 systemctl 管理vim /approot1/k8s/tmp/service/kube-controller-manager.service这里的192.168.91.19需要改成自己的ip,不要一股脑的复制黏贴--service-cluster-ip-range 参数的 ip 
网段要和 kubernetes-csr.json 里面的 10.88.0.1 是一个网段的--cluster-cidr 为 pod 运行的网段,要和 --service-cluster-ip-range 参数的网段以及现有的网络不一致,避免出现冲突controller-manager 参数[Unit] Description=Kubernetes Controller Manager Documentation=https://github.com/GoogleCloudPlatform/kubernetes [Service] ExecStart=/approot1/k8s/bin/kube-controller-manager \ --bind-address=0.0.0.0 \ --allocate-node-cidrs=true \ --cluster-cidr=172.20.0.0/16 \ --cluster-name=kubernetes \ --cluster-signing-cert-file=/etc/kubernetes/ssl/ca.pem \ --cluster-signing-key-file=/etc/kubernetes/ssl/ca-key.pem \ --kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \ --leader-elect=true \ --node-cidr-mask-size=24 \ --root-ca-file=/etc/kubernetes/ssl/ca.pem \ --service-account-private-key-file=/etc/kubernetes/ssl/ca-key.pem \ --service-cluster-ip-range=10.88.0.0/16 \ --use-service-account-credentials=true \ --v=2 Restart=always RestartSec=5 [Install] WantedBy=multi-user.target分发证书以及创建相关路径如果是多节点,只需要在192.168.91.19后面加上对应的ip即可,以空格为分隔,注意将192.168.91.19修改为自己的ip,切莫一股脑复制对应的目录也要确保和自己规划的一致,如果和我的有不同,注意修改,否则服务会启动失败for i in 192.168.91.19;do \ ssh $i "mkdir -p /etc/kubernetes/ssl"; \ ssh $i "mkdir -p /approot1/k8s/bin"; \ scp /approot1/k8s/tmp/ssl/kube-controller-manager.kubeconfig $i:/etc/kubernetes/; \ scp /approot1/k8s/tmp/ssl/ca*.pem $i:/etc/kubernetes/ssl/; \ scp /approot1/k8s/tmp/service/kube-controller-manager.service $i:/etc/systemd/system/; \ scp /approot1/k8s/pkg/kubernetes/bin/kube-controller-manager $i:/approot1/k8s/bin/; \ done启动 controller-manager 服务如果是多节点,只需要在192.168.91.19后面加上对应的ip即可,以空格为分隔,注意将192.168.91.19修改为自己的ip,切莫一股脑复制for i in 192.168.91.19;do \ ssh $i "systemctl daemon-reload"; \ ssh $i "systemctl enable kube-controller-manager"; \ ssh $i "systemctl restart kube-controller-manager --no-block"; \ ssh $i "systemctl is-active kube-controller-manager"; \ done返回 activating 表示 controller-manager 还在启动中,可以稍等一会,然后再执行 for i in 192.168.91.19;do ssh $i "systemctl is-active kube-controller-manager";done 返回active表示 controller-manager 启动成功部署 scheduler 组件创建 scheduler 证书vim /approot1/k8s/tmp/ssl/kube-scheduler-csr.json这里的192.168.91.19需要改成自己的ip,不要一股脑的复制黏贴注意json的格式{ "CN": "system:kube-scheduler", "key": { "algo": "rsa", "size": 2048 "hosts": [ "127.0.0.1", "192.168.91.19" "names": [ "C": "CN", "ST": "ShangHai", "L": "ShangHai", "O": "system:kube-scheduler", "OU": "System" }cd /approot1/k8s/tmp/ssl/ cfssl gencert -ca=ca.pem \ -ca-key=ca-key.pem \ -config=ca-config.json \ -profile=kubernetes kube-scheduler-csr.json | cfssljson -bare kube-scheduler创建 kubeconfig 证书设置集群参数--server 为 apiserver 的访问地址,修改成自己的 ip 地址和 service 文件里面指定的 --secure-port 参数的端口,切记,一定要带上https:// 协议,否则生成的证书,kubectl 命令访问不到 apiservercd /approot1/k8s/tmp/ssl/ /approot1/k8s/pkg/kubernetes/bin/kubectl config set-cluster kubernetes \ --certificate-authority=ca.pem \ --embed-certs=true \ --server=https://192.168.91.19:6443 \ --kubeconfig=kube-scheduler.kubeconfig设置客户端认证参数cd /approot1/k8s/tmp/ssl/ /approot1/k8s/pkg/kubernetes/bin/kubectl config set-credentials system:kube-scheduler \ --client-certificate=kube-scheduler.pem \ --client-key=kube-scheduler-key.pem \ --embed-certs=true \ --kubeconfig=kube-scheduler.kubeconfig设置上下文参数cd /approot1/k8s/tmp/ssl/ /approot1/k8s/pkg/kubernetes/bin/kubectl config set-context system:kube-scheduler \ --cluster=kubernetes \ --user=system:kube-scheduler \ --kubeconfig=kube-scheduler.kubeconfig设置默认上下文cd /approot1/k8s/tmp/ssl/ /approot1/k8s/pkg/kubernetes/bin/kubectl config \ use-context system:kube-scheduler \ --kubeconfig=kube-scheduler.kubeconfig配置 scheduler 为 systemctl 
管理vim /approot1/k8s/tmp/service/kube-scheduler.servicescheduler 参数[Unit] Description=Kubernetes Scheduler Documentation=https://github.com/GoogleCloudPlatform/kubernetes [Service] ExecStart=/approot1/k8s/bin/kube-scheduler \ --authentication-kubeconfig=/etc/kubernetes/kube-scheduler.kubeconfig \ --authorization-kubeconfig=/etc/kubernetes/kube-scheduler.kubeconfig \ --bind-address=0.0.0.0 \ --kubeconfig=/etc/kubernetes/kube-scheduler.kubeconfig \ --leader-elect=true \ --v=2 Restart=always RestartSec=5 [Install] WantedBy=multi-user.target分发证书以及创建相关路径如果是多节点,只需要在192.168.91.19后面加上对应的ip即可,以空格为分隔,注意将192.168.91.19修改为自己的ip,切莫一股脑复制对应的目录也要确保和自己规划的一致,如果和我的有不同,注意修改,否则服务会启动失败for i in 192.168.91.19;do \ ssh $i "mkdir -p /etc/kubernetes/ssl"; \ ssh $i "mkdir -p /approot1/k8s/bin"; \ scp /approot1/k8s/tmp/ssl/{ca*.pem,kube-scheduler.kubeconfig} $i:/etc/kubernetes/; \ scp /approot1/k8s/tmp/service/kube-scheduler.service $i:/etc/systemd/system/; \ scp /approot1/k8s/pkg/kubernetes/bin/kube-scheduler $i:/approot1/k8s/bin/; \ done启动 scheduler 服务如果是多节点,只需要在192.168.91.19后面加上对应的ip即可,以空格为分隔,注意将192.168.91.19修改为自己的ip,切莫一股脑复制for i in 192.168.91.19;do \ ssh $i "systemctl daemon-reload"; \ ssh $i "systemctl enable kube-scheduler"; \ ssh $i "systemctl restart kube-scheduler --no-block"; \ ssh $i "systemctl is-active kube-scheduler"; \ done返回 activating 表示 scheduler 还在启动中,可以稍等一会,然后再执行 for i in 192.168.91.19;do ssh $i "systemctl is-active kube-scheduler";done 返回active表示 scheduler 启动成功部署 work 节点部署 containerd 组件下载二进制文件github 下载 containerd 的时候,记得选择cri-containerd-cni 开头的文件,这个包里面包含了 containerd 以及 crictl 管理工具和 cni 网络插件,包括 systemd service 文件、config.toml 、 crictl.yaml 以及 cni 配置文件都是配置好的,简单修改一下就可以使用了虽然 cri-containerd-cni 也有 runc ,但是缺少依赖,所以还是要去 runc github 重新下载一个wget -O /approot1/k8s/pkg/containerd.tar.gz \ https://github.com/containerd/containerd/releases/download/v1.5.9/cri-containerd-cni-1.5.9-linux-amd64.tar.gz wget -O /approot1/k8s/pkg/runc https://github.com/opencontainers/runc/releases/download/v1.0.3/runc.amd64 mkdir /approot1/k8s/pkg/containerd cd /approot1/k8s/pkg/ for i in $(ls *containerd*.tar.gz);do tar xvf $i -C /approot1/k8s/pkg/containerd && rm -f $i;done chmod +x /approot1/k8s/pkg/runc mv /approot1/k8s/pkg/containerd/usr/local/bin/{containerd,containerd-shim*,crictl,ctr} /approot1/k8s/pkg/containerd/ mv /approot1/k8s/pkg/containerd/opt/cni/bin/{bridge,flannel,host-local,loopback,portmap} /approot1/k8s/pkg/containerd/ rm -rf /approot1/k8s/pkg/containerd/{etc,opt,usr}配置 containerd 为 systemctl 管理vim /approot1/k8s/tmp/service/containerd.service注意二进制文件存放路径如果 runc 二进制文件不在 /usr/bin/ 目录下,需要有 Environment 参数,指定 runc 二进制文件的路径给 PATH ,否则当 k8s 启动 pod 的时候会报错 exec: "runc": executable file not found in $PATH: unknown[Unit] Description=containerd container runtime Documentation=https://containerd.io After=network.target [Service] Environment="PATH=$PATH:/approot1/k8s/bin" ExecStartPre=-/sbin/modprobe overlay ExecStart=/approot1/k8s/bin/containerd Restart=always RestartSec=5 Delegate=yes KillMode=process OOMScoreAdjust=-999 LimitNOFILE=1048576 # Having non-zero Limit*s causes performance problems due to accounting overhead # in the kernel. We recommend using cgroups to do container-local accounting. 
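# Keep the cgroup driver consistent across the stack: config.toml below sets
# SystemdCgroup = true and the kubelet config.yaml later uses cgroupDriver: systemd.
# runc is installed under /approot1/k8s/bin rather than /usr/bin, so it is resolved
# through the Environment="PATH=..." line above; without it pods fail with
# 'exec: "runc": executable file not found in $PATH'.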
LimitNPROC=infinity LimitCORE=infinity [Install] WantedBy=multi-user.target配置 containerd 配置文件vim /approot1/k8s/tmp/service/config.tomlroot 容器存储路径,修改成磁盘空间充足的路径bin_dir containerd 服务以及 cni 插件存储路径sandbox_image pause 镜像名称以及镜像tagdisabled_plugins = [] imports = [] oom_score = 0 plugin_dir = "" required_plugins = [] root = "/approot1/data/containerd" state = "/run/containerd" version = 2 [cgroup] path = "" [debug] address = "" format = "" gid = 0 level = "" uid = 0 [grpc] address = "/run/containerd/containerd.sock" gid = 0 max_recv_message_size = 16777216 max_send_message_size = 16777216 tcp_address = "" tcp_tls_cert = "" tcp_tls_key = "" uid = 0 [metrics] address = "" grpc_histogram = false [plugins] [plugins."io.containerd.gc.v1.scheduler"] deletion_threshold = 0 mutation_threshold = 100 pause_threshold = 0.02 schedule_delay = "0s" startup_delay = "100ms" [plugins."io.containerd.grpc.v1.cri"] disable_apparmor = false disable_cgroup = false disable_hugetlb_controller = true disable_proc_mount = false disable_tcp_service = true enable_selinux = false enable_tls_streaming = false ignore_image_defined_volumes = false max_concurrent_downloads = 3 max_container_log_line_size = 16384 netns_mounts_under_state_dir = false restrict_oom_score_adj = false sandbox_image = "k8s.gcr.io/pause:3.6" selinux_category_range = 1024 stats_collect_period = 10 stream_idle_timeout = "4h0m0s" stream_server_address = "127.0.0.1" stream_server_port = "0" systemd_cgroup = false tolerate_missing_hugetlb_controller = true unset_seccomp_profile = "" [plugins."io.containerd.grpc.v1.cri".cni] bin_dir = "/approot1/k8s/bin" conf_dir = "/etc/cni/net.d" conf_template = "/etc/cni/net.d/cni-default.conf" max_conf_num = 1 [plugins."io.containerd.grpc.v1.cri".containerd] default_runtime_name = "runc" disable_snapshot_annotations = true discard_unpacked_layers = false no_pivot = false snapshotter = "overlayfs" [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime] base_runtime_spec = "" container_annotations = [] pod_annotations = [] privileged_without_host_devices = false runtime_engine = "" runtime_root = "" runtime_type = "" [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options] [plugins."io.containerd.grpc.v1.cri".containerd.runtimes] [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc] base_runtime_spec = "" container_annotations = [] pod_annotations = [] privileged_without_host_devices = false runtime_engine = "" runtime_root = "" runtime_type = "io.containerd.runc.v2" [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options] BinaryName = "" CriuImagePath = "" CriuPath = "" CriuWorkPath = "" IoGid = 0 IoUid = 0 NoNewKeyring = false NoPivotRoot = false Root = "" ShimCgroup = "" SystemdCgroup = true [plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime] base_runtime_spec = "" container_annotations = [] pod_annotations = [] privileged_without_host_devices = false runtime_engine = "" runtime_root = "" runtime_type = "" [plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime.options] [plugins."io.containerd.grpc.v1.cri".image_decryption] key_model = "node" [plugins."io.containerd.grpc.v1.cri".registry] config_path = "" [plugins."io.containerd.grpc.v1.cri".registry.auths] [plugins."io.containerd.grpc.v1.cri".registry.configs] [plugins."io.containerd.grpc.v1.cri".registry.headers] [plugins."io.containerd.grpc.v1.cri".registry.mirrors] [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"] endpoint = ["https://docker.mirrors.ustc.edu.cn", 
"http://hub-mirror.c.163.com"] [plugins."io.containerd.grpc.v1.cri".registry.mirrors."gcr.io"] endpoint = ["https://gcr.mirrors.ustc.edu.cn"] [plugins."io.containerd.grpc.v1.cri".registry.mirrors."k8s.gcr.io"] endpoint = ["https://gcr.mirrors.ustc.edu.cn/google-containers/"] [plugins."io.containerd.grpc.v1.cri".registry.mirrors."quay.io"] endpoint = ["https://quay.mirrors.ustc.edu.cn"] [plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming] tls_cert_file = "" tls_key_file = "" [plugins."io.containerd.internal.v1.opt"] path = "/opt/containerd" [plugins."io.containerd.internal.v1.restart"] interval = "10s" [plugins."io.containerd.metadata.v1.bolt"] content_sharing_policy = "shared" [plugins."io.containerd.monitor.v1.cgroups"] no_prometheus = false [plugins."io.containerd.runtime.v1.linux"] no_shim = false runtime = "runc" runtime_root = "" shim = "containerd-shim" shim_debug = false [plugins."io.containerd.runtime.v2.task"] platforms = ["linux/amd64"] [plugins."io.containerd.service.v1.diff-service"] default = ["walking"] [plugins."io.containerd.snapshotter.v1.aufs"] root_path = "" [plugins."io.containerd.snapshotter.v1.btrfs"] root_path = "" [plugins."io.containerd.snapshotter.v1.devmapper"] async_remove = false base_image_size = "" pool_name = "" root_path = "" [plugins."io.containerd.snapshotter.v1.native"] root_path = "" [plugins."io.containerd.snapshotter.v1.overlayfs"] root_path = "" [plugins."io.containerd.snapshotter.v1.zfs"] root_path = "" [proxy_plugins] [stream_processors] [stream_processors."io.containerd.ocicrypt.decoder.v1.tar"] accepts = ["application/vnd.oci.image.layer.v1.tar+encrypted"] args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"] env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"] path = "ctd-decoder" returns = "application/vnd.oci.image.layer.v1.tar" [stream_processors."io.containerd.ocicrypt.decoder.v1.tar.gzip"] accepts = ["application/vnd.oci.image.layer.v1.tar+gzip+encrypted"] args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"] env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"] path = "ctd-decoder" returns = "application/vnd.oci.image.layer.v1.tar+gzip" [timeouts] "io.containerd.timeout.shim.cleanup" = "5s" "io.containerd.timeout.shim.load" = "5s" "io.containerd.timeout.shim.shutdown" = "3s" "io.containerd.timeout.task.state" = "2s" [ttrpc] address = "" gid = 0 uid = 0配置 crictl 管理工具vim /approot1/k8s/tmp/service/crictl.yamlruntime-endpoint: unix:///run/containerd/containerd.sock配置 cni 网络插件vim /approot1/k8s/tmp/service/cni-default.confsubnet 参数要和 controller-manager 的 --cluster-cidr 参数一致{ "name": "mynet", "cniVersion": "0.3.1", "type": "bridge", "bridge": "mynet0", "isDefaultGateway": true, "ipMasq": true, "hairpinMode": true, "ipam": { "type": "host-local", "subnet": "172.20.0.0/16" }分发配置文件以及创建相关路径for i in 192.168.91.19 192.168.91.20;do \ ssh $i "mkdir -p /etc/containerd"; \ ssh $i "mkdir -p /approot1/k8s/bin"; \ ssh $i "mkdir -p /etc/cni/net.d"; \ scp /approot1/k8s/tmp/service/containerd.service $i:/etc/systemd/system/; \ scp /approot1/k8s/tmp/service/config.toml $i:/etc/containerd/; \ scp /approot1/k8s/tmp/service/cni-default.conf $i:/etc/cni/net.d/; \ scp /approot1/k8s/tmp/service/crictl.yaml $i:/etc/; \ scp /approot1/k8s/pkg/containerd/* $i:/approot1/k8s/bin/; \ scp /approot1/k8s/pkg/runc $i:/approot1/k8s/bin/; \ done启动 containerd 服务for i in 192.168.91.19 192.168.91.20;do \ ssh $i "systemctl daemon-reload"; \ ssh $i "systemctl enable containerd"; \ 
ssh $i "systemctl restart containerd --no-block"; \ ssh $i "systemctl is-active containerd"; \ done返回 activating 表示 containerd 还在启动中,可以稍等一会,然后再执行 for i in 192.168.91.19 192.168.91.20;do ssh $i "systemctl is-active containerd";done 返回active表示 containerd 启动成功导入 pause 镜像ctr 导入镜像有一个特殊的地方,如果导入的镜像想要 k8s 可以使用,需要加上 -n k8s.io 参数,而且必须是ctr -n k8s.io image import <xxx.tar> 这样的格式,如果是 ctr image import <xxx.tar> -n k8s.io 就会报错 ctr: flag provided but not defined: -n 这个操作确实有点骚气,不太适应如果镜像导入的时候没有加上 -n k8s.io ,启动 pod 的时候 kubelet 会重新去拉取 pause 容器,如果配置的镜像仓库没有这个 tag 的镜像就会报错for i in 192.168.91.19 192.168.91.20;do \ scp /approot1/k8s/images/pause-v3.6.tar $i:/tmp/ ssh $i "ctr -n=k8s.io image import /tmp/pause-v3.6.tar && rm -f /tmp/pause-v3.6.tar"; \ done查看镜像for i in 192.168.91.19 192.168.91.20;do \ ssh $i "ctr -n=k8s.io image list | grep pause"; \ done部署 kubelet 组件创建 kubelet 证书vim /approot1/k8s/tmp/ssl/kubelet-csr.json.192.168.91.19这里的192.168.91.19需要改成自己的ip,不要一股脑的复制黏贴,有多少个node节点就创建多少个json文件,json文件内的 ip 也要修改为 work 节点的 ip,别重复了{ "CN": "system:node:192.168.91.19", "key": { "algo": "rsa", "size": 2048 "hosts": [ "127.0.0.1", "192.168.91.19" "names": [ "C": "CN", "ST": "ShangHai", "L": "ShangHai", "O": "system:nodes", "OU": "System" }for i in 192.168.91.19 192.168.91.20;do \ cd /approot1/k8s/tmp/ssl/; \ cfssl gencert -ca=ca.pem \ -ca-key=ca-key.pem \ -config=ca-config.json \ -profile=kubernetes kubelet-csr.json.$i | cfssljson -bare kubelet.$i; \ done创建 kubeconfig 证书设置集群参数--server 为 apiserver 的访问地址,修改成自己的 ip 地址和 service 文件里面指定的 --secure-port 参数的端口,切记,一定要带上https:// 协议,否则生成的证书,kubectl 命令访问不到 apiserverfor i in 192.168.91.19 192.168.91.20;do \ cd /approot1/k8s/tmp/ssl/; \ /approot1/k8s/pkg/kubernetes/bin/kubectl config set-cluster kubernetes \ --certificate-authority=ca.pem \ --embed-certs=true \ --server=https://192.168.91.19:6443 \ --kubeconfig=kubelet.kubeconfig.$i; \ done设置客户端认证参数for i in 192.168.91.19 192.168.91.20;do \ cd /approot1/k8s/tmp/ssl/; \ /approot1/k8s/pkg/kubernetes/bin/kubectl config set-credentials system:node:$i \ --client-certificate=kubelet.$i.pem \ --client-key=kubelet.$i-key.pem \ --embed-certs=true \ --kubeconfig=kubelet.kubeconfig.$i; \ done设置上下文参数for i in 192.168.91.19 192.168.91.20;do \ cd /approot1/k8s/tmp/ssl/; \ /approot1/k8s/pkg/kubernetes/bin/kubectl config set-context default \ --cluster=kubernetes \ --user=system:node:$i \ --kubeconfig=kubelet.kubeconfig.$i; \ done设置默认上下文for i in 192.168.91.19 192.168.91.20;do \ cd /approot1/k8s/tmp/ssl/; \ /approot1/k8s/pkg/kubernetes/bin/kubectl config \ use-context default \ --kubeconfig=kubelet.kubeconfig.$i; \ done配置 kubelet 配置文件vim /approot1/k8s/tmp/service/config.yamlclusterDNS 参数的 ip 注意修改,和 apiserver 的 --service-cluster-ip-range 参数一个网段,和 k8s 服务 ip 要不一样,一般 k8s 服务的 ip 取网段第一个ip, clusterdns 选网段的第二个ipkind: KubeletConfiguration apiVersion: kubelet.config.k8s.io/v1beta1 address: 0.0.0.0 authentication: anonymous: enabled: false webhook: cacheTTL: 2m0s enabled: true x509: clientCAFile: /etc/kubernetes/ssl/ca.pem authorization: mode: Webhook webhook: cacheAuthorizedTTL: 5m0s cacheUnauthorizedTTL: 30s cgroupDriver: systemd cgroupsPerQOS: true clusterDNS: - 10.88.0.2 clusterDomain: cluster.local configMapAndSecretChangeDetectionStrategy: Watch containerLogMaxFiles: 3 containerLogMaxSize: 10Mi enforceNodeAllocatable: - pods eventBurst: 10 eventRecordQPS: 5 evictionHard: imagefs.available: 15% memory.available: 300Mi nodefs.available: 10% nodefs.inodesFree: 5% evictionPressureTransitionPeriod: 5m0s failSwapOn: true fileCheckFrequency: 40s hairpinMode: hairpin-veth 
healthzBindAddress: 0.0.0.0 healthzPort: 10248 httpCheckFrequency: 40s imageGCHighThresholdPercent: 85 imageGCLowThresholdPercent: 80 imageMinimumGCAge: 2m0s kubeAPIBurst: 100 kubeAPIQPS: 50 makeIPTablesUtilChains: true maxOpenFiles: 1000000 maxPods: 110 nodeLeaseDurationSeconds: 40 nodeStatusReportFrequency: 1m0s nodeStatusUpdateFrequency: 10s oomScoreAdj: -999 podPidsLimit: -1 port: 10250 # disable readOnlyPort readOnlyPort: 0 resolvConf: /etc/resolv.conf runtimeRequestTimeout: 2m0s serializeImagePulls: true streamingConnectionIdleTimeout: 4h0m0s syncFrequency: 1m0s tlsCertFile: /etc/kubernetes/ssl/kubelet.pem tlsPrivateKeyFile: /etc/kubernetes/ssl/kubelet-key.pem配置 kubelet 为 systemctl 管理vim /approot1/k8s/tmp/service/kubelet.service.192.168.91.19这里的192.168.91.19需要改成自己的ip,不要一股脑的复制黏贴,有多少个node节点就创建多少个service文件,service 文件内的 ip 也要修改为 work 节点的 ip,别重复了--container-runtime 参数默认是 docker ,如果使用 docker 以外的,需要配置为 remote ,并且要配置 --container-runtime-endpoint 参数来指定 sock 文件的路径kubelet 参数[Unit] Description=Kubernetes Kubelet Documentation=https://github.com/GoogleCloudPlatform/kubernetes [Service] WorkingDirectory=/approot1/k8s/data/kubelet ExecStart=/approot1/k8s/bin/kubelet \ --config=/approot1/k8s/data/kubelet/config.yaml \ --cni-bin-dir=/approot1/k8s/bin \ --cni-conf-dir=/etc/cni/net.d \ --container-runtime=remote \ --container-runtime-endpoint=unix:///run/containerd/containerd.sock \ --hostname-override=192.168.91.19 \ --image-pull-progress-deadline=5m \ --kubeconfig=/etc/kubernetes/kubelet.kubeconfig \ --network-plugin=cni \ --pod-infra-container-image=k8s.gcr.io/pause:3.6 \ --root-dir=/approot1/k8s/data/kubelet \ --v=2 Restart=always RestartSec=5 [Install] WantedBy=multi-user.target分发证书以及创建相关路径如果是多节点,只需要在192.168.91.19后面加上对应的ip即可,以空格为分隔,注意将192.168.91.19修改为自己的ip,切莫一股脑复制对应的目录也要确保和自己规划的一致,如果和我的有不同,注意修改,否则服务会启动失败for i in 192.168.91.19 192.168.91.20;do \ ssh $i "mkdir -p /approot1/k8s/data/kubelet"; \ ssh $i "mkdir -p /approot1/k8s/bin"; \ ssh $i "mkdir -p /etc/kubernetes/ssl"; \ scp /approot1/k8s/tmp/ssl/ca*.pem $i:/etc/kubernetes/ssl/; \ scp /approot1/k8s/tmp/ssl/kubelet.$i.pem $i:/etc/kubernetes/ssl/kubelet.pem; \ scp /approot1/k8s/tmp/ssl/kubelet.$i-key.pem $i:/etc/kubernetes/ssl/kubelet-key.pem; \ scp /approot1/k8s/tmp/ssl/kubelet.kubeconfig.$i $i:/etc/kubernetes/kubelet.kubeconfig; \ scp /approot1/k8s/tmp/service/kubelet.service.$i $i:/etc/systemd/system/kubelet.service; \ scp /approot1/k8s/tmp/service/config.yaml $i:/approot1/k8s/data/kubelet/; \ scp /approot1/k8s/pkg/kubernetes/bin/kubelet $i:/approot1/k8s/bin/; \ done启动 kubelet 服务for i in 192.168.91.19 192.168.91.20;do \ ssh $i "systemctl daemon-reload"; \ ssh $i "systemctl enable kubelet"; \ ssh $i "systemctl restart kubelet --no-block"; \ ssh $i "systemctl is-active kubelet"; \ done返回 activating 表示 kubelet 还在启动中,可以稍等一会,然后再执行 for i in 192.168.91.19 192.168.91.20;do ssh $i "systemctl is-active kubelet";done 返回active表示 kubelet 启动成功查看节点是否 Readykubectl get node预期出现类似如下输出,STATUS 字段为 Ready 表示节点正常NAME STATUS ROLES AGE VERSION 192.168.91.19 Ready <none> 20m v1.23.3 192.168.91.20 Ready <none> 20m v1.23.3部署 proxy 组件创建 proxy 证书vim /approot1/k8s/tmp/ssl/kube-proxy-csr.json{ "CN": "system:kube-proxy", "key": { "algo": "rsa", "size": 2048 "hosts": [], "names": [ "C": "CN", "ST": "ShangHai", "L": "ShangHai", "O": "system:kube-proxy", "OU": "System" }cd /approot1/k8s/tmp/ssl/; \ cfssl gencert -ca=ca.pem \ -ca-key=ca-key.pem \ -config=ca-config.json \ -profile=kubernetes kube-proxy-csr.json | cfssljson -bare kube-proxy创建 kubeconfig 证书设置集群参数--server 为 apiserver 
的访问地址,修改成自己的 ip 地址和 service 文件里面指定的 --secure-port 参数的端口,切记,一定要带上https:// 协议,否则生成的证书,kubectl 命令访问不到 apiservercd /approot1/k8s/tmp/ssl/ /approot1/k8s/pkg/kubernetes/bin/kubectl config set-cluster kubernetes \ --certificate-authority=ca.pem \ --embed-certs=true \ --server=https://192.168.91.19:6443 \ --kubeconfig=kube-proxy.kubeconfig设置客户端认证参数cd /approot1/k8s/tmp/ssl/ /approot1/k8s/pkg/kubernetes/bin/kubectl config set-credentials kube-proxy \ --client-certificate=kube-proxy.pem \ --client-key=kube-proxy-key.pem \ --embed-certs=true \ --kubeconfig=kube-proxy.kubeconfig设置上下文参数cd /approot1/k8s/tmp/ssl/ /approot1/k8s/pkg/kubernetes/bin/kubectl config set-context default \ --cluster=kubernetes \ --user=kube-proxy \ --kubeconfig=kube-proxy.kubeconfig设置默认上下文cd /approot1/k8s/tmp/ssl/ /approot1/k8s/pkg/kubernetes/bin/kubectl config \ use-context default \ --kubeconfig=kube-proxy.kubeconfig配置 kube-proxy 配置文件vim /approot1/k8s/tmp/service/kube-proxy-config.yaml.192.168.91.19这里的192.168.91.19需要改成自己的ip,不要一股脑的复制黏贴,有多少个node节点就创建多少个service文件,service 文件内的 ip 也要修改为 work 节点的 ip,别重复了clusterCIDR 参数要和 controller-manager 的 --cluster-cidr 参数一致hostnameOverride 要和 kubelet 的 --hostname-override 参数一致,否则会出现 node not found 的报错kind: KubeProxyConfiguration apiVersion: kubeproxy.config.k8s.io/v1alpha1 bindAddress: 0.0.0.0 clientConnection: kubeconfig: "/etc/kubernetes/kube-proxy.kubeconfig" clusterCIDR: "172.20.0.0/16" conntrack: maxPerCore: 32768 min: 131072 tcpCloseWaitTimeout: 1h0m0s tcpEstablishedTimeout: 24h0m0s healthzBindAddress: 0.0.0.0:10256 hostnameOverride: "192.168.91.19" metricsBindAddress: 0.0.0.0:10249 mode: "ipvs"配置 proxy 为 systemctl 管理vim /approot1/k8s/tmp/service/kube-proxy.service[Unit] Description=Kubernetes Kube-Proxy Server Documentation=https://github.com/GoogleCloudPlatform/kubernetes After=network.target [Service] # kube-proxy 根据 --cluster-cidr 判断集群内部和外部流量 ## 指定 --cluster-cidr 或 --masquerade-all 选项后 ## kube-proxy 会对访问 Service IP 的请求做 SNAT WorkingDirectory=/approot1/k8s/data/kube-proxy ExecStart=/approot1/k8s/bin/kube-proxy \ --config=/approot1/k8s/data/kube-proxy/kube-proxy-config.yaml Restart=always RestartSec=5 LimitNOFILE=65536 [Install] WantedBy=multi-user.target分发证书以及创建相关路径如果是多节点,只需要在192.168.91.19后面加上对应的ip即可,以空格为分隔,注意将192.168.91.19修改为自己的ip,切莫一股脑复制对应的目录也要确保和自己规划的一致,如果和我的有不同,注意修改,否则服务会启动失败for i in 192.168.91.19 192.168.91.20;do \ ssh $i "mkdir -p /approot1/k8s/data//kube-proxy"; \ ssh $i "mkdir -p /approot1/k8s/bin"; \ ssh $i "mkdir -p /etc/kubernetes/ssl"; \ scp /approot1/k8s/tmp/ssl/kube-proxy.kubeconfig $i:/etc/kubernetes/; \ scp /approot1/k8s/tmp/service/kube-proxy.service $i:/etc/systemd/system/; \ scp /approot1/k8s/tmp/service/kube-proxy-config.yaml.$i $i:/approot1/k8s/data/kube-proxy/kube-proxy-config.yaml; \ scp /approot1/k8s/pkg/kubernetes/bin/kube-proxy $i:/approot1/k8s/bin/; \ done启动 kube-proxy 服务for i in 192.168.91.19 192.168.91.20;do \ ssh $i "systemctl daemon-reload"; \ ssh $i "systemctl enable kube-proxy"; \ ssh $i "systemctl restart kube-proxy --no-block"; \ ssh $i "systemctl is-active kube-proxy"; \ done返回 activating 表示 kubelet 还在启动中,可以稍等一会,然后再执行 for i in 192.168.91.19 192.168.91.20;do ssh $i "systemctl is-active kubelet";done 返回active表示 kubelet 启动成功部署 flannel 组件flannel github配置 flannel yaml 文件vim /approot1/k8s/tmp/service/flannel.yamlnet-conf.json 内的 Network 参数需要和 controller-manager 的 --cluster-cidr 参数一致--- apiVersion: policy/v1beta1 kind: PodSecurityPolicy metadata: name: psp.flannel.unprivileged annotations: seccomp.security.alpha.kubernetes.io/allowedProfileNames: 
docker/default seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default spec: privileged: false volumes: - configMap - secret - emptyDir - hostPath allowedHostPaths: - pathPrefix: "/etc/cni/net.d" - pathPrefix: "/etc/kube-flannel" - pathPrefix: "/run/flannel" readOnlyRootFilesystem: false # Users and groups runAsUser: rule: RunAsAny supplementalGroups: rule: RunAsAny fsGroup: rule: RunAsAny # Privilege Escalation allowPrivilegeEscalation: false defaultAllowPrivilegeEscalation: false # Capabilities allowedCapabilities: ['NET_ADMIN', 'NET_RAW'] defaultAddCapabilities: [] requiredDropCapabilities: [] # Host namespaces hostPID: false hostIPC: false hostNetwork: true hostPorts: - min: 0 max: 65535 # SELinux seLinux: # SELinux is unused in CaaSP rule: 'RunAsAny' kind: ClusterRole apiVersion: rbac.authorization.k8s.io/v1 metadata: name: flannel rules: - apiGroups: ['policy'] resources: ['podsecuritypolicies'] verbs: ['use'] resourceNames: ['psp.flannel.unprivileged'] - apiGroups: resources: - pods verbs: - get - apiGroups: resources: - nodes verbs: - list - watch - apiGroups: resources: - nodes/status verbs: - patch kind: ClusterRoleBinding apiVersion: rbac.authorization.k8s.io/v1 metadata: name: flannel roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: flannel subjects: - kind: ServiceAccount name: flannel namespace: kube-system apiVersion: v1 kind: ServiceAccount metadata: name: flannel namespace: kube-system kind: ConfigMap apiVersion: v1 metadata: name: kube-flannel-cfg namespace: kube-system labels: tier: node app: flannel data: cni-conf.json: | "name": "cbr0", "cniVersion": "0.3.1", "plugins": [ "type": "flannel", "delegate": { "hairpinMode": true, "isDefaultGateway": true "type": "portmap", "capabilities": { "portMappings": true net-conf.json: | "Network": "172.20.0.0/16", "Backend": { "Type": "vxlan" apiVersion: apps/v1 kind: DaemonSet metadata: name: kube-flannel-ds namespace: kube-system labels: tier: node app: flannel spec: selector: matchLabels: app: flannel template: metadata: labels: tier: node app: flannel spec: affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: kubernetes.io/os operator: In values: - linux hostNetwork: true priorityClassName: system-node-critical tolerations: - operator: Exists effect: NoSchedule serviceAccountName: flannel initContainers: - name: install-cni image: quay.io/coreos/flannel:v0.15.1 command: args: - /etc/kube-flannel/cni-conf.json - /etc/cni/net.d/10-flannel.conflist volumeMounts: - name: cni mountPath: /etc/cni/net.d - name: flannel-cfg mountPath: /etc/kube-flannel/ containers: - name: kube-flannel image: quay.io/coreos/flannel:v0.15.1 command: - /opt/bin/flanneld args: - --ip-masq - --kube-subnet-mgr resources: requests: cpu: "100m" memory: "50Mi" limits: cpu: "100m" memory: "50Mi" securityContext: privileged: false capabilities: add: ["NET_ADMIN", "NET_RAW"] - name: POD_NAME valueFrom: fieldRef: fieldPath: metadata.name - name: POD_NAMESPACE valueFrom: fieldRef: fieldPath: metadata.namespace volumeMounts: - name: run mountPath: /run/flannel - name: flannel-cfg mountPath: /etc/kube-flannel/ volumes: - name: run hostPath: path: /run/flannel - name: cni hostPath: path: /etc/cni/net.d - name: flannel-cfg configMap: name: kube-flannel-cfg配置 flannel cni 网卡配置文件vim /approot1/k8s/tmp/service/10-flannel.conflist{ 
"name": "cbr0", "cniVersion": "0.3.1", "plugins": [ "type": "flannel", "delegate": { "hairpinMode": true, "isDefaultGateway": true "type": "portmap", "capabilities": { "portMappings": true }导入 flannel 镜像for i in 192.168.91.19 192.168.91.20;do \ scp /approot1/k8s/images/flannel-v0.15.1.tar $i:/tmp/ ssh $i "ctr -n=k8s.io image import /tmp/flannel-v0.15.1.tar && rm -f /tmp/flannel-v0.15.1.tar"; \ done查看镜像for i in 192.168.91.19 192.168.91.20;do \ ssh $i "ctr -n=k8s.io image list | grep flannel"; \ done分发 flannel cni 网卡配置文件for i in 192.168.91.19 192.168.91.20;do \ ssh $i "rm -f /etc/cni/net.d/10-default.conf"; \ scp /approot1/k8s/tmp/service/10-flannel.conflist $i:/etc/cni/net.d/; \ done分发完 flannel cni 网卡配置文件后,节点会出现暂时的 NotReady 状态,需要等到节点都变回 Ready 状态后,再运行 flannel 组件在 k8s 中运行 flannel 组件kubectl apply -f /approot1/k8s/tmp/service/flannel.yaml检查 flannel pod 是否运行成功kubectl get pod -n kube-system | grep flannel预期输出类似如下结果flannel 属于 DaemonSet ,属于和节点共存亡类型的 pod ,k8s 有多少 node ,flannel 就有多少 pod ,当 node 被删除的时候, flannel pod 也会随之删除kube-flannel-ds-86rrv 1/1 Running 0 8m54s kube-flannel-ds-bkgzx 1/1 Running 0 8m53ssuse 12 发行版会出现 Init:CreateContainerError 的情况,此时需要 kubectl describe pod -n kube-system <flannel_pod_name> 查看报错原因,Error: failed to create containerd container: get apparmor_parser version: exec: "apparmor_parser": executable file not found in $PATH 出现这个报错,只需要使用 which apparmor_parser 找到 apparmor_parser 所在路径,然后做一个软连接到 kubelet 命令所在目录即可,然后重启 pod ,注意,所有 flannel 所在节点都需要执行这个软连接操作部署 coredns 组件配置 coredns yaml 文件vim /approot1/k8s/tmp/service/coredns.yamlclusterIP 参数要和 kubelet 配置文件的 clusterDNS 参数一致apiVersion: v1 kind: ServiceAccount metadata: name: coredns namespace: kube-system labels: kubernetes.io/cluster-service: "true" addonmanager.kubernetes.io/mode: Reconcile apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: labels: kubernetes.io/bootstrapping: rbac-defaults addonmanager.kubernetes.io/mode: Reconcile name: system:coredns rules: - apiGroups: resources: - endpoints - services - pods - namespaces verbs: - list - watch - apiGroups: resources: - nodes verbs: - get - apiGroups: - discovery.k8s.io resources: - endpointslices verbs: - list - watch apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: annotations: rbac.authorization.kubernetes.io/autoupdate: "true" labels: kubernetes.io/bootstrapping: rbac-defaults addonmanager.kubernetes.io/mode: EnsureExists name: system:coredns roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: system:coredns subjects: - kind: ServiceAccount name: coredns namespace: kube-system apiVersion: v1 kind: ConfigMap metadata: name: coredns namespace: kube-system labels: addonmanager.kubernetes.io/mode: EnsureExists data: Corefile: | .:53 { errors health { lameduck 5s ready kubernetes cluster.local in-addr.arpa ip6.arpa { pods insecure fallthrough in-addr.arpa ip6.arpa ttl 30 prometheus :9153 forward . 
/etc/resolv.conf { max_concurrent 1000 cache 30 reload loadbalance apiVersion: apps/v1 kind: Deployment metadata: name: coredns namespace: kube-system labels: k8s-app: kube-dns kubernetes.io/cluster-service: "true" addonmanager.kubernetes.io/mode: Reconcile kubernetes.io/name: "CoreDNS" spec: replicas: 1 strategy: type: RollingUpdate rollingUpdate: maxUnavailable: 1 selector: matchLabels: k8s-app: kube-dns template: metadata: labels: k8s-app: kube-dns spec: securityContext: seccompProfile: type: RuntimeDefault priorityClassName: system-cluster-critical serviceAccountName: coredns affinity: podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 100 podAffinityTerm: labelSelector: matchExpressions: - key: k8s-app operator: In values: ["kube-dns"] topologyKey: kubernetes.io/hostname tolerations: - key: "CriticalAddonsOnly" operator: "Exists" nodeSelector: kubernetes.io/os: linux containers: - name: coredns image: docker.io/coredns/coredns:1.8.6 imagePullPolicy: IfNotPresent resources: limits: memory: 300Mi requests: cpu: 100m memory: 70Mi args: [ "-conf", "/etc/coredns/Corefile" ] volumeMounts: - name: config-volume mountPath: /etc/coredns readOnly: true ports: - containerPort: 53 name: dns protocol: UDP - containerPort: 53 name: dns-tcp protocol: TCP - containerPort: 9153 name: metrics protocol: TCP livenessProbe: httpGet: path: /health port: 8080 scheme: HTTP initialDelaySeconds: 60 timeoutSeconds: 5 successThreshold: 1 failureThreshold: 5 readinessProbe: httpGet: path: /ready port: 8181 scheme: HTTP securityContext: allowPrivilegeEscalation: false capabilities: - NET_BIND_SERVICE drop: - all readOnlyRootFilesystem: true dnsPolicy: Default volumes: - name: config-volume configMap: name: coredns items: - key: Corefile path: Corefile apiVersion: v1 kind: Service metadata: name: kube-dns namespace: kube-system annotations: prometheus.io/port: "9153" prometheus.io/scrape: "true" labels: k8s-app: kube-dns kubernetes.io/cluster-service: "true" addonmanager.kubernetes.io/mode: Reconcile kubernetes.io/name: "CoreDNS" spec: selector: k8s-app: kube-dns clusterIP: 10.88.0.2 ports: - name: dns port: 53 protocol: UDP - name: dns-tcp port: 53 protocol: TCP - name: metrics port: 9153 protocol: TCP导入 coredns 镜像for i in 192.168.91.19 192.168.91.20;do \ scp /approot1/k8s/images/coredns-v1.8.6.tar $i:/tmp/ ssh $i "ctr -n=k8s.io image import /tmp/coredns-v1.8.6.tar && rm -f /tmp/coredns-v1.8.6.tar"; \ done查看镜像for i in 192.168.91.19 192.168.91.20;do \ ssh $i "ctr -n=k8s.io image list | grep coredns"; \ done在 k8s 中运行 coredns 组件kubectl apply -f /approot1/k8s/tmp/service/coredns.yaml检查 coredns pod 是否运行成功kubectl get pod -n kube-system | grep coredns预期输出类似如下结果因为 coredns yaml 文件内的 replicas 参数是 1 ,因此这里只有一个 pod ,如果改成 2 ,就会出现两个 podcoredns-5fd74ff788-cddqf 1/1 Running 0 10s部署 metrics-server 组件配置 metrics-server yaml 文件vim /approot1/k8s/tmp/service/metrics-server.yamlapiVersion: v1 kind: ServiceAccount metadata: labels: k8s-app: metrics-server name: metrics-server namespace: kube-system apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: labels: k8s-app: metrics-server rbac.authorization.k8s.io/aggregate-to-admin: "true" rbac.authorization.k8s.io/aggregate-to-edit: "true" rbac.authorization.k8s.io/aggregate-to-view: "true" name: system:aggregated-metrics-reader rules: - apiGroups: - metrics.k8s.io resources: - pods - nodes verbs: - get - list - watch apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: labels: k8s-app: metrics-server name: system:metrics-server rules: - 
apiGroups: resources: - pods - nodes - nodes/stats - namespaces - configmaps verbs: - get - list - watch apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: labels: k8s-app: metrics-server name: metrics-server-auth-reader namespace: kube-system roleRef: apiGroup: rbac.authorization.k8s.io kind: Role name: extension-apiserver-authentication-reader subjects: - kind: ServiceAccount name: metrics-server namespace: kube-system apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: labels: k8s-app: metrics-server name: metrics-server:system:auth-delegator roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: system:auth-delegator subjects: - kind: ServiceAccount name: metrics-server namespace: kube-system apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: labels: k8s-app: metrics-server name: system:metrics-server roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: system:metrics-server subjects: - kind: ServiceAccount name: metrics-server namespace: kube-system apiVersion: v1 kind: Service metadata: labels: k8s-app: metrics-server name: metrics-server namespace: kube-system spec: ports: - name: https port: 443 protocol: TCP targetPort: https selector: k8s-app: metrics-server apiVersion: apps/v1 kind: Deployment metadata: labels: k8s-app: metrics-server name: metrics-server namespace: kube-system spec: selector: matchLabels: k8s-app: metrics-server strategy: rollingUpdate: maxUnavailable: 0 template: metadata: labels: k8s-app: metrics-server spec: containers: - args: - --cert-dir=/tmp - --secure-port=4443 - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname - --kubelet-insecure-tls - --kubelet-use-node-status-port - --metric-resolution=15s image: k8s.gcr.io/metrics-server/metrics-server:v0.5.2 imagePullPolicy: IfNotPresent livenessProbe: failureThreshold: 3 httpGet: path: /livez port: https scheme: HTTPS periodSeconds: 10 name: metrics-server ports: - containerPort: 4443 name: https protocol: TCP readinessProbe: failureThreshold: 3 httpGet: path: /readyz port: https scheme: HTTPS initialDelaySeconds: 20 periodSeconds: 10 resources: requests: cpu: 100m memory: 200Mi securityContext: readOnlyRootFilesystem: true runAsNonRoot: true runAsUser: 1000 volumeMounts: - mountPath: /tmp name: tmp-dir nodeSelector: kubernetes.io/os: linux priorityClassName: system-cluster-critical serviceAccountName: metrics-server volumes: - emptyDir: {} name: tmp-dir apiVersion: apiregistration.k8s.io/v1 kind: APIService metadata: labels: k8s-app: metrics-server name: v1beta1.metrics.k8s.io spec: group: metrics.k8s.io groupPriorityMinimum: 100 insecureSkipTLSVerify: true service: name: metrics-server namespace: kube-system version: v1beta1 versionPriority: 100导入 metrics-server 镜像for i in 192.168.91.19 192.168.91.20;do \ scp /approot1/k8s/images/metrics-server-v0.5.2.tar $i:/tmp/ ssh $i "ctr -n=k8s.io image import /tmp/metrics-server-v0.5.2.tar && rm -f /tmp/metrics-server-v0.5.2.tar"; \ done查看镜像for i in 192.168.91.19 192.168.91.20;do \ ssh $i "ctr -n=k8s.io image list | grep metrics-server"; \ done在 k8s 中运行 metrics-server 组件kubectl apply -f /approot1/k8s/tmp/service/metrics-server.yaml检查 metrics-server pod 是否运行成功kubectl get pod -n kube-system | grep metrics-server预期输出类似如下结果metrics-server-6c95598969-qnc76 1/1 Running 0 71s验证 metrics-server 功能查看节点资源使用情况kubectl top node预期输出类似如下结果metrics-server 启动会偏慢,速度取决于机器配置,如果输出 is not yet 或者 is not ready 就等一会再执行一次 kubectl top nodeNAME CPU(cores) CPU% MEMORY(bytes) MEMORY% 
192.168.91.19 285m 4% 2513Mi 32% 192.168.91.20 71m 3% 792Mi 21%查看指定 namespace 的 pod 资源使用情况kubectl top pod -n kube-system预期输出类似如下结果NAME CPU(cores) MEMORY(bytes) coredns-5fd74ff788-cddqf 11m 18Mi kube-flannel-ds-86rrv 4m 18Mi kube-flannel-ds-bkgzx 6m 22Mi kube-flannel-ds-v25xc 6m 22Mi metrics-server-6c95598969-qnc76 6m 22Mi部署 dashboard 组件配置 dashboard yaml 文件vim /approot1/k8s/tmp/service/dashboard.yaml--- apiVersion: v1 kind: ServiceAccount metadata: name: admin-user namespace: kube-system apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: admin-user roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: cluster-admin subjects: - kind: ServiceAccount name: admin-user namespace: kube-system apiVersion: v1 kind: ServiceAccount metadata: name: dashboard-read-user namespace: kube-system apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: dashboard-read-binding roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: dashboard-read-clusterrole subjects: - kind: ServiceAccount name: dashboard-read-user namespace: kube-system apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: name: dashboard-read-clusterrole rules: - apiGroups: resources: - configmaps - endpoints - nodes - persistentvolumes - persistentvolumeclaims - persistentvolumeclaims/status - pods - replicationcontrollers - replicationcontrollers/scale - serviceaccounts - services - services/status verbs: - get - list - watch - apiGroups: resources: - bindings - events - limitranges - namespaces/status - pods/log - pods/status - replicationcontrollers/status - resourcequotas - resourcequotas/status verbs: - get - list - watch - apiGroups: resources: - namespaces verbs: - get - list - watch - apiGroups: - apps resources: - controllerrevisions - daemonsets - daemonsets/status - deployments - deployments/scale - deployments/status - replicasets - replicasets/scale - replicasets/status - statefulsets - statefulsets/scale - statefulsets/status verbs: - get - list - watch - apiGroups: - autoscaling resources: - horizontalpodautoscalers - horizontalpodautoscalers/status verbs: - get - list - watch - apiGroups: - batch resources: - cronjobs - cronjobs/status - jobs - jobs/status verbs: - get - list - watch - apiGroups: - extensions resources: - daemonsets - daemonsets/status - deployments - deployments/scale - deployments/status - ingresses - ingresses/status - replicasets - replicasets/scale - replicasets/status - replicationcontrollers/scale verbs: - get - list - watch - apiGroups: - policy resources: - poddisruptionbudgets - poddisruptionbudgets/status verbs: - get - list - watch - apiGroups: - networking.k8s.io resources: - ingresses - ingresses/status - networkpolicies verbs: - get - list - watch - apiGroups: - storage.k8s.io resources: - storageclasses - volumeattachments verbs: - get - list - watch - apiGroups: - rbac.authorization.k8s.io resources: - clusterrolebindings - clusterroles - roles - rolebindings verbs: - get - list - watch apiVersion: v1 kind: ServiceAccount metadata: labels: k8s-app: kubernetes-dashboard name: kubernetes-dashboard namespace: kube-system kind: Service apiVersion: v1 metadata: labels: k8s-app: kubernetes-dashboard kubernetes.io/cluster-service: "true" name: kubernetes-dashboard namespace: kube-system spec: ports: - port: 443 targetPort: 8443 selector: k8s-app: kubernetes-dashboard type: NodePort apiVersion: v1 kind: Secret metadata: labels: k8s-app: kubernetes-dashboard name: kubernetes-dashboard-certs namespace: 
kube-system type: Opaque apiVersion: v1 kind: Secret metadata: labels: k8s-app: kubernetes-dashboard name: kubernetes-dashboard-csrf namespace: kube-system type: Opaque data: csrf: "" apiVersion: v1 kind: Secret metadata: labels: k8s-app: kubernetes-dashboard name: kubernetes-dashboard-key-holder namespace: kube-system type: Opaque kind: ConfigMap apiVersion: v1 metadata: labels: k8s-app: kubernetes-dashboard name: kubernetes-dashboard-settings namespace: kube-system kind: Role apiVersion: rbac.authorization.k8s.io/v1 metadata: labels: k8s-app: kubernetes-dashboard name: kubernetes-dashboard namespace: kube-system rules: # Allow Dashboard to get, update and delete Dashboard exclusive secrets. - apiGroups: [""] resources: ["secrets"] resourceNames: ["kubernetes-dashboard-key-holder", "kubernetes-dashboard-certs", "kubernetes-dashboard-csrf"] verbs: ["get", "update", "delete"] # Allow Dashboard to get and update 'kubernetes-dashboard-settings' config map. - apiGroups: [""] resources: ["configmaps"] resourceNames: ["kubernetes-dashboard-settings"] verbs: ["get", "update"] # Allow Dashboard to get metrics. - apiGroups: [""] resources: ["services"] resourceNames: ["heapster", "dashboard-metrics-scraper"] verbs: ["proxy"] - apiGroups: [""] resources: ["services/proxy"] resourceNames: ["heapster", "http:heapster:", "https:heapster:", "dashboard-metrics-scraper", "http:dashboard-metrics-scraper"] verbs: ["get"] kind: ClusterRole apiVersion: rbac.authorization.k8s.io/v1 metadata: labels: k8s-app: kubernetes-dashboard name: kubernetes-dashboard rules: # Allow Metrics Scraper to get metrics from the Metrics server - apiGroups: ["metrics.k8s.io"] resources: ["pods", "nodes"] verbs: ["get", "list", "watch"] apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: labels: k8s-app: kubernetes-dashboard name: kubernetes-dashboard namespace: kube-system roleRef: apiGroup: rbac.authorization.k8s.io kind: Role name: kubernetes-dashboard subjects: - kind: ServiceAccount name: kubernetes-dashboard namespace: kube-system apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: kubernetes-dashboard roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: kubernetes-dashboard subjects: - kind: ServiceAccount name: kubernetes-dashboard namespace: kube-system kind: Deployment apiVersion: apps/v1 metadata: labels: k8s-app: kubernetes-dashboard name: kubernetes-dashboard namespace: kube-system spec: replicas: 1 revisionHistoryLimit: 10 selector: matchLabels: k8s-app: kubernetes-dashboard template: metadata: labels: k8s-app: kubernetes-dashboard spec: containers: - name: kubernetes-dashboard image: kubernetesui/dashboard:v2.4.0 imagePullPolicy: IfNotPresent ports: - containerPort: 8443 protocol: TCP args: - --auto-generate-certificates - --namespace=kube-system - --token-ttl=1800 - --sidecar-host=http://dashboard-metrics-scraper:8000 # Uncomment the following line to manually specify Kubernetes API server Host # If not specified, Dashboard will attempt to auto discover the API server and connect # to it. Uncomment only if the default does not work. 
# - --apiserver-host=http://my-address:port volumeMounts: - name: kubernetes-dashboard-certs mountPath: /certs # Create on-disk volume to store exec logs - mountPath: /tmp name: tmp-volume livenessProbe: httpGet: scheme: HTTPS path: / port: 8443 initialDelaySeconds: 30 timeoutSeconds: 30 securityContext: allowPrivilegeEscalation: false readOnlyRootFilesystem: true runAsUser: 1001 runAsGroup: 2001 volumes: - name: kubernetes-dashboard-certs secret: secretName: kubernetes-dashboard-certs - name: tmp-volume emptyDir: {} serviceAccountName: kubernetes-dashboard nodeSelector: "kubernetes.io/os": linux # Comment the following tolerations if Dashboard must not be deployed on master tolerations: - key: node-role.kubernetes.io/master effect: NoSchedule kind: Service apiVersion: v1 metadata: labels: k8s-app: dashboard-metrics-scraper name: dashboard-metrics-scraper namespace: kube-system spec: ports: - port: 8000 targetPort: 8000 selector: k8s-app: dashboard-metrics-scraper kind: Deployment apiVersion: apps/v1 metadata: labels: k8s-app: dashboard-metrics-scraper name: dashboard-metrics-scraper namespace: kube-system spec: replicas: 1 revisionHistoryLimit: 10 selector: matchLabels: k8s-app: dashboard-metrics-scraper template: metadata: labels: k8s-app: dashboard-metrics-scraper spec: securityContext: seccompProfile: type: RuntimeDefault containers: - name: dashboard-metrics-scraper image: kubernetesui/metrics-scraper:v1.0.7 imagePullPolicy: IfNotPresent ports: - containerPort: 8000 protocol: TCP livenessProbe: httpGet: scheme: HTTP path: / port: 8000 initialDelaySeconds: 30 timeoutSeconds: 30 volumeMounts: - mountPath: /tmp name: tmp-volume securityContext: allowPrivilegeEscalation: false readOnlyRootFilesystem: true runAsUser: 1001 runAsGroup: 2001 serviceAccountName: kubernetes-dashboard nodeSelector: "kubernetes.io/os": linux # Comment the following tolerations if Dashboard must not be deployed on master tolerations: - key: node-role.kubernetes.io/master effect: NoSchedule volumes: - name: tmp-volume emptyDir: {}导入 dashboard 镜像for i in 192.168.91.19 192.168.91.20;do \ scp /approot1/k8s/images/dashboard-*.tar $i:/tmp/ ssh $i "ctr -n=k8s.io image import /tmp/dashboard-v2.4.0.tar && rm -f /tmp/dashboard-v2.4.0.tar"; \ ssh $i "ctr -n=k8s.io image import /tmp/dashboard-metrics-scraper-v1.0.7.tar && rm -f /tmp/dashboard-metrics-scraper-v1.0.7.tar"; \ done查看镜像for i in 192.168.91.19 192.168.91.20;do \ ssh $i "ctr -n=k8s.io image list | egrep 'dashboard|metrics-scraper'"; \ done在 k8s 中运行 dashboard 组件kubectl apply -f /approot1/k8s/tmp/service/dashboard.yaml检查 dashboard pod 是否运行成功kubectl get pod -n kube-system | grep dashboard预期输出类似如下结果dashboard-metrics-scraper-799d786dbf-v28pm 1/1 Running 0 2m55s kubernetes-dashboard-9f8c8b989-rhb7z 1/1 Running 0 2m55s查看 dashboard 访问端口在 service 当中没有指定 dashboard 的访问端口,所以需要自己获取,也可以修改 yaml 文件指定访问端口预期输出类似如下结果我这边是将 30210 端口映射给 pod 的 443 端口kubernetes-dashboard NodePort 10.88.127.68 <none> 443:30210/TCP 5m30s根据得到的端口访问 dashboard 页面,例如: https://192.168.91.19:30210查看 dashboard 登录 token获取 token 文件名称kubectl get secrets -n kube-system | grep admin预期输出类似如下结果admin-user-token-zvrst kubernetes.io/service-account-token 3 9m2s获取 token 内容kubectl get secrets -n kube-system admin-user-token-zvrst -o jsonpath={.data.token}|base64 
-d预期输出类似如下结果eyJhbGciOiJSUzI1NiIsImtpZCI6InA4M1lhZVgwNkJtekhUd3Vqdm9vTE1ma1JYQ1ZuZ3c3ZE1WZmJhUXR4bUUifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJhZG1pbi11c2VyLXRva2VuLXp2cnN0Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImFkbWluLXVzZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJhYTE3NTg1ZC1hM2JiLTQ0YWYtOWNhZS0yNjQ5YzA0YThmZWYiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06YWRtaW4tdXNlciJ9.K2o9p5St9tvIbXk7mCQCwsZQV11zICwN-JXhRv1hAnc9KFcAcDOiO4NxIeicvC2H9tHQBIJsREowVwY3yGWHj_MQa57EdBNWMrN1hJ5u-XzpzJ6JbQxns8ZBrCpIR8Fxt468rpTyMyqsO2UBo-oXQ0_ZXKss6X6jjxtGLCQFkz1ZfFTQW3n49L4ENzW40sSj4dnaX-PsmosVOpsKRHa8TPndusAT-58aujcqt31Z77C4M13X_vAdjyDLK9r5ZXwV2ryOdONwJye_VtXXrExBt9FWYtLGCQjKn41pwXqEfidT8cY6xbA7XgUVTr9miAmZ-jf1UeEw-nm8FOw9Bb5v6A到此,基于 containerd 二进制部署 k8s v1.23.3 就结束了
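As a final sanity check on the cluster built above, the short sequence below (a sketch that only uses commands already shown in this guide; the kubernetes-dashboard Service and the admin-user ServiceAccount are the ones defined in the YAML earlier) confirms nodes, system pods, metrics and the dashboard endpoint in one pass:
kubectl get node -o wide
kubectl get pod -n kube-system -o wide
kubectl top node
kubectl get svc -n kube-system kubernetes-dashboard
# the dashboard login token lives in an auto-generated secret, so list it first
kubectl get secrets -n kube-system | grep admin-user-token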
前言kubeadm 和 二进制 部署的区别kubeadm优点:部署很方便,两个参数就可以完成集群的部署和节点的加入kubeadm init 初始化节点kubeadm join 节点加入集群缺点:集群证书有效期只有一年,要么破解,要么升级 k8s 版本二进制部署优点:可以自定义集群证书有效期(一般都是十年)所有组件的细节,可以在部署前定制部署过程中,能更好的理解 k8s 各个组件之间的关联缺点:部署相对 kubeadm 会复杂很多人生苦短,我选二进制部署环境准备IP角色内核版本192.168.91.8mastercentos7.6/3.10.0-957.el7.x86_64192.168.91.9workcentos7.6/3.10.0-957.el7.x86_64答应我,所有节点都要关闭防火墙systemctl disable firewalld systemctl stop firewalld答应我,所有节点都要关闭selinuxsetenforce 0 sed -i '/SELINUX/s/enforcing/disabled/g' /etc/selinux/config答应我,所有节点都要关闭swapswapoff -a sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab答应我,所有节点都要开启内核模块modprobe ip_vs modprobe ip_vs_rr modprobe ip_vs_wrr modprobe ip_vs_sh modprobe nf_conntrack modprobe nf_conntrack_ipv4 modprobe br_netfilter modprobe overlay答应我,所有节点都要开启模块自动加载服务cat > /etc/modules-load.d/k8s-modules.conf <<EOF ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack nf_conntrack_ipv4 br_netfilter overlay EOF答应我,记得重启服务,并设置为开机自启systemctl enable systemd-modules-load systemctl restart systemd-modules-load答应我,所有节点都要做内核优化cat <<EOF > /etc/sysctl.d/kubernetes.conf # 开启数据包转发功能(实现vxlan) net.ipv4.ip_forward=1 # iptables对bridge的数据进行处理 net.bridge.bridge-nf-call-iptables=1 net.bridge.bridge-nf-call-ip6tables=1 net.bridge.bridge-nf-call-arptables=1 # 关闭tcp_tw_recycle,否则和NAT冲突,会导致服务不通 net.ipv4.tcp_tw_recycle=0 # 不允许将TIME-WAIT sockets重新用于新的TCP连接 net.ipv4.tcp_tw_reuse=0 # socket监听(listen)的backlog上限 net.core.somaxconn=32768 # 最大跟踪连接数,默认 nf_conntrack_buckets * 4 net.netfilter.nf_conntrack_max=1000000 # 禁止使用 swap 空间,只有当系统 OOM 时才允许使用它 vm.swappiness=0 # 计算当前的内存映射文件数。 vm.max_map_count=655360 # 内核可分配的最大文件数 fs.file-max=6553600 # 持久连接 net.ipv4.tcp_keepalive_time=600 net.ipv4.tcp_keepalive_intvl=30 net.ipv4.tcp_keepalive_probes=10 EOF答应我,让配置生效sysctl -p /etc/sysctl.d/kubernetes.conf答应我,所有节点都要清空 iptables 规则iptables -F && iptables -X && iptables -F -t nat && iptables -X -t nat iptables -P FORWARD ACCEPT安装 containerd所有节点都需要安装配置 docker 源 (docker 源里面有 containerd)wget -O /etc/yum.repos.d/docker.repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo查找 containerd 安装包的名称yum search containerd安装 containerdyum install -y containerd.io修改 containerd 配置文件root 容器存储路径,修改成磁盘空间充足的路径sandbox_image pause 镜像名称以及镜像tag(一定要可以拉取到 pause 镜像的,否则会导致集群初始化的时候 kubelet 重启失败)bin_dir cni 插件存放路径,yum 安装的 containerd 默认存放在 /opt/cni/bin 目录下cat <<EOF > /etc/containerd/config.toml disabled_plugins = [] imports = [] oom_score = 0 plugin_dir = "" required_plugins = [] root = "/approot1/data/containerd" state = "/run/containerd" version = 2 [cgroup] path = "" [debug] address = "" format = "" gid = 0 level = "" uid = 0 [grpc] address = "/run/containerd/containerd.sock" gid = 0 max_recv_message_size = 16777216 max_send_message_size = 16777216 tcp_address = "" tcp_tls_cert = "" tcp_tls_key = "" uid = 0 [metrics] address = "" grpc_histogram = false [plugins] [plugins."io.containerd.gc.v1.scheduler"] deletion_threshold = 0 mutation_threshold = 100 pause_threshold = 0.02 schedule_delay = "0s" startup_delay = "100ms" [plugins."io.containerd.grpc.v1.cri"] disable_apparmor = false disable_cgroup = false disable_hugetlb_controller = true disable_proc_mount = false disable_tcp_service = true enable_selinux = false enable_tls_streaming = false ignore_image_defined_volumes = false max_concurrent_downloads = 3 max_container_log_line_size = 16384 netns_mounts_under_state_dir = false restrict_oom_score_adj = false sandbox_image = "registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6" selinux_category_range = 1024 stats_collect_period = 10 stream_idle_timeout = "4h0m0s" 
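# (annotation added for readability) keys in this config.toml that most often need adjusting:
#   root             - container data path near the top of the file; put it on a disk with enough space
#   sandbox_image    - must be pullable from this host, otherwise kubelet keeps failing to create pods
#   SystemdCgroup    - set to true further down so containerd's cgroup driver matches kubelet's "cgroupDriver: systemd"
#   registry.mirrors - the mirror endpoints used when pulling from docker.io / k8s.gcr.io / quay.io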
stream_server_address = "127.0.0.1" stream_server_port = "0" systemd_cgroup = false tolerate_missing_hugetlb_controller = true unset_seccomp_profile = "" [plugins."io.containerd.grpc.v1.cri".cni] bin_dir = "/opt/cni/bin" conf_dir = "/etc/cni/net.d" conf_template = "/etc/cni/net.d/cni-default.conf" max_conf_num = 1 [plugins."io.containerd.grpc.v1.cri".containerd] default_runtime_name = "runc" disable_snapshot_annotations = true discard_unpacked_layers = false no_pivot = false snapshotter = "overlayfs" [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime] base_runtime_spec = "" container_annotations = [] pod_annotations = [] privileged_without_host_devices = false runtime_engine = "" runtime_root = "" runtime_type = "" [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options] [plugins."io.containerd.grpc.v1.cri".containerd.runtimes] [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc] base_runtime_spec = "" container_annotations = [] pod_annotations = [] privileged_without_host_devices = false runtime_engine = "" runtime_root = "" runtime_type = "io.containerd.runc.v2" [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options] BinaryName = "" CriuImagePath = "" CriuPath = "" CriuWorkPath = "" IoGid = 0 IoUid = 0 NoNewKeyring = false NoPivotRoot = false Root = "" ShimCgroup = "" SystemdCgroup = true [plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime] base_runtime_spec = "" container_annotations = [] pod_annotations = [] privileged_without_host_devices = false runtime_engine = "" runtime_root = "" runtime_type = "" [plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime.options] [plugins."io.containerd.grpc.v1.cri".image_decryption] key_model = "node" [plugins."io.containerd.grpc.v1.cri".registry] config_path = "" [plugins."io.containerd.grpc.v1.cri".registry.auths] [plugins."io.containerd.grpc.v1.cri".registry.configs] [plugins."io.containerd.grpc.v1.cri".registry.headers] [plugins."io.containerd.grpc.v1.cri".registry.mirrors] [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"] endpoint = ["https://docker.mirrors.ustc.edu.cn", "http://hub-mirror.c.163.com"] [plugins."io.containerd.grpc.v1.cri".registry.mirrors."gcr.io"] endpoint = ["https://gcr.mirrors.ustc.edu.cn"] [plugins."io.containerd.grpc.v1.cri".registry.mirrors."k8s.gcr.io"] endpoint = ["https://gcr.mirrors.ustc.edu.cn/google-containers/"] [plugins."io.containerd.grpc.v1.cri".registry.mirrors."quay.io"] endpoint = ["https://quay.mirrors.ustc.edu.cn"] [plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming] tls_cert_file = "" tls_key_file = "" [plugins."io.containerd.internal.v1.opt"] path = "/opt/containerd" [plugins."io.containerd.internal.v1.restart"] interval = "10s" [plugins."io.containerd.metadata.v1.bolt"] content_sharing_policy = "shared" [plugins."io.containerd.monitor.v1.cgroups"] no_prometheus = false [plugins."io.containerd.runtime.v1.linux"] no_shim = false runtime = "runc" runtime_root = "" shim = "containerd-shim" shim_debug = false [plugins."io.containerd.runtime.v2.task"] platforms = ["linux/amd64"] [plugins."io.containerd.service.v1.diff-service"] default = ["walking"] [plugins."io.containerd.snapshotter.v1.aufs"] root_path = "" [plugins."io.containerd.snapshotter.v1.btrfs"] root_path = "" [plugins."io.containerd.snapshotter.v1.devmapper"] async_remove = false base_image_size = "" pool_name = "" root_path = "" [plugins."io.containerd.snapshotter.v1.native"] root_path = "" 
[plugins."io.containerd.snapshotter.v1.overlayfs"] root_path = "" [plugins."io.containerd.snapshotter.v1.zfs"] root_path = "" [proxy_plugins] [stream_processors] [stream_processors."io.containerd.ocicrypt.decoder.v1.tar"] accepts = ["application/vnd.oci.image.layer.v1.tar+encrypted"] args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"] env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"] path = "ctd-decoder" returns = "application/vnd.oci.image.layer.v1.tar" [stream_processors."io.containerd.ocicrypt.decoder.v1.tar.gzip"] accepts = ["application/vnd.oci.image.layer.v1.tar+gzip+encrypted"] args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"] env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"] path = "ctd-decoder" returns = "application/vnd.oci.image.layer.v1.tar+gzip" [timeouts] "io.containerd.timeout.shim.cleanup" = "5s" "io.containerd.timeout.shim.load" = "5s" "io.containerd.timeout.shim.shutdown" = "3s" "io.containerd.timeout.task.state" = "2s" [ttrpc] address = "" gid = 0 uid = 0 EOF启动 containerd 服务,并设置为开机启动systemctl enable containerd systemctl restart containerd配置 kubernetes 源所有节点都需要配置cat <<EOF > /etc/yum.repos.d/kubernetes.repo [kubernetes] name=Kubernetes baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/ enabled=1 gpgcheck=0 repo_gpgcheck=0 gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg EOF通过 yum list 命令可以查看当前源的稳定版本,目前的稳定版本是 1.23.3-0yum list kubeadm kubelet安装 kubeadm 以及 kubelet所有节点都需要安装yum install 不带版本,就会安装当前稳定版本,为了后面文档通用,我这里就在安装的时候带上了版本yum install -y kubelet-1.23.3-0 kubeadm-1.23.3-0配置命令参数自动补全功能所有节点都需要安装yum install -y bash-completion echo 'source <(kubectl completion bash)' >> $HOME/.bashrc echo 'source <(kubeadm completion bash)' >> $HOME/.bashrc source $HOME/.bashrc启动 kubelet 服务所有节点都要操作systemctl enable kubelet systemctl restart kubeletkubeadm 部署 master 节点注意在 master 节点上操作查看 kubeadm init 默认配置kubeadm config print init-defaults vim kubeadm.yamlkubeadm 配置 (v1beta3)advertiseAddress 参数需要修改成当前 master 节点的 ipbindPort 参数为 apiserver 服务的访问端口,可以自定义 criSocket 参数定义 容器运行时 使用的套接字,默认是 dockershim ,这里需要修改为 contained 的套接字文件,在 conf.toml 里面可以找到imagePullPolicy 参数定义镜像拉取策略,IfNotPresent 本地没有镜像则拉取镜像;Always 总是重新拉取镜像;Never 从不拉取镜像,本地没有镜像,kubelet 启动 pod 就会报错 (注意驼峰命名,这里的大写别改成小写)certificatesDir 参数定义证书文件存储路径,没特殊要求,可以不修改controlPlaneEndpoint 参数定义稳定访问 ip ,高可用这里可以填 vipdataDir 参数定义 etcd 数据持久化路径,默认 /var/lib/etcd ,部署前,确认路径所在磁盘空间是否足够imageRepository 参数定义镜像仓库名称,默认 k8s.gcr.io ,如果要修改,需要注意确定镜像一定是可以拉取的到,并且所有的镜像都是从这个镜像仓库拉取的kubernetesVersion 参数定义镜像版本,和镜像的 tag 一致podSubnet 参数定义 pod 使用的网段,不要和 serviceSubnet 以及本机网段有冲突serviceSubnet 参数定义 k8s 服务 ip 网段,注意是否和本机网段有冲突cgroupDriver 参数定义 cgroup 驱动,默认是 cgroupfsmode 参数定义转发方式,可选为iptables 和 ipvsname 参数定义节点名称,如果是主机名需要保证可以解析(kubectl get nodes 命令查看到的节点名称)apiVersion: kubeadm.k8s.io/v1beta3 bootstrapTokens: - groups: - system:bootstrappers:kubeadm:default-node-token token: abcdef.0123456789abcdef ttl: 24h0m0s usages: - signing - authentication kind: InitConfiguration localAPIEndpoint: advertiseAddress: 192.168.91.8 bindPort: 6443 nodeRegistration: criSocket: /run/containerd/containerd.sock imagePullPolicy: IfNotPresent name: 192.168.91.8 taints: null apiServer: timeoutForControlPlane: 4m0s apiVersion: kubeadm.k8s.io/v1beta3 certificatesDir: /etc/kubernetes/pki clusterName: kubernetes controlPlaneEndpoint: 192.168.91.8:6443 controllerManager: {} dns: {} etcd: local: dataDir: /var/lib/etcd 
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers kind: ClusterConfiguration kubernetesVersion: 1.23.3 networking: dnsDomain: cluster.local serviceSubnet: 10.96.0.0/12 podSubnet: 172.22.0.0/16 scheduler: {} apiVersion: kubelet.config.k8s.io/v1beta1 kind: KubeletConfiguration cgroupDriver: systemd cgroupsPerQOS: true apiVersion: kubeproxy.config.k8s.io/v1alpha1 kind: KubeProxyConfiguration mode: ipvs集群初始化kubeadm init --config kubeadm.yaml以下是 kubeadm init 的过程,[init] Using Kubernetes version: v1.23.3 [preflight] Running pre-flight checks [WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service' [preflight] Pulling images required for setting up a Kubernetes cluster [preflight] This might take a minute or two, depending on the speed of your internet connection [preflight] You can also perform this action in beforehand using 'kubeadm config images pull' [certs] Using certificateDir folder "/etc/kubernetes/pki" [certs] Generating "ca" certificate and key [certs] Generating "apiserver" certificate and key [certs] apiserver serving cert is signed for DNS names [192.168.91.8 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.91.8] [certs] Generating "apiserver-kubelet-client" certificate and key [certs] Generating "front-proxy-ca" certificate and key [certs] Generating "front-proxy-client" certificate and key [certs] Generating "etcd/ca" certificate and key [certs] Generating "etcd/server" certificate and key [certs] etcd/server serving cert is signed for DNS names [192.168.91.8 localhost] and IPs [192.168.91.8 127.0.0.1 ::1] [certs] Generating "etcd/peer" certificate and key [certs] etcd/peer serving cert is signed for DNS names [192.168.91.8 localhost] and IPs [192.168.91.8 127.0.0.1 ::1] [certs] Generating "etcd/healthcheck-client" certificate and key [certs] Generating "apiserver-etcd-client" certificate and key [certs] Generating "sa" key and public key [kubeconfig] Using kubeconfig folder "/etc/kubernetes" [kubeconfig] Writing "admin.conf" kubeconfig file [kubeconfig] Writing "kubelet.conf" kubeconfig file [kubeconfig] Writing "controller-manager.conf" kubeconfig file [kubeconfig] Writing "scheduler.conf" kubeconfig file [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env" [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml" [kubelet-start] Starting the kubelet [control-plane] Using manifest folder "/etc/kubernetes/manifests" [control-plane] Creating static Pod manifest for "kube-apiserver" [control-plane] Creating static Pod manifest for "kube-controller-manager" [control-plane] Creating static Pod manifest for "kube-scheduler" [etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests" [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s [apiclient] All control plane components are healthy after 12.504586 seconds [upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace [kubelet] Creating a ConfigMap "kubelet-config-1.23" in namespace kube-system with the configuration for the kubelets in the cluster NOTE: The "kubelet-config-1.23" naming of the kubelet ConfigMap is deprecated. Once the UnversionedKubeletConfigMap feature gate graduates to Beta the default name will become just "kubelet-config". 
Kubeadm upgrade will handle this transition transparently. [upload-certs] Skipping phase. Please see --upload-certs [mark-control-plane] Marking the node 192.168.91.8 as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers] [mark-control-plane] Marking the node 192.168.91.8 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule] [bootstrap-token] Using token: abcdef.0123456789abcdef [bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles [bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes [bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials [bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token [bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster [bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace [kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key [addons] Applied essential addon: CoreDNS [addons] Applied essential addon: kube-proxy Your Kubernetes control-plane has initialized successfully! To start using your cluster, you need to run the following as a regular user: mkdir -p $HOME/.kube sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config sudo chown $(id -u):$(id -g) $HOME/.kube/config Alternatively, if you are the root user, you can run: export KUBECONFIG=/etc/kubernetes/admin.conf You should now deploy a pod network to the cluster. 
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at: https://kubernetes.io/docs/concepts/cluster-administration/addons/ You can now join any number of control-plane nodes by copying certificate authorities and service account keys on each node and then running the following as root: kubeadm join 192.168.91.8:6443 --token abcdef.0123456789abcdef \ --discovery-token-ca-cert-hash sha256:5e2387403e698e95b0eab7197837f2425f7b8610e7b400e54d81c27f3c6f1964 \ --control-plane Then you can join any number of worker nodes by running the following on each as root: kubeadm join 192.168.91.8:6443 --token abcdef.0123456789abcdef \ --discovery-token-ca-cert-hash sha256:5e2387403e698e95b0eab7197837f2425f7b8610e7b400e54d81c27f3c6f1964以下操作二选一kubectl 不加 --kubeconfig 参数,默认找的是 $HOME/.kube/config ,如果不创建目录,并且将证书复制过去,就要生成环境变量,或者每次使用 kubectl 命令的时候,都要加上 --kubeconfig 参数指定证书文件,否则 kubectl 命令就找不到集群了 mkdir -p $HOME/.kube sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config sudo chown $(id -u):$(id -g) $HOME/.kube/configecho 'export KUBECONFIG=/etc/kubernetes/admin.conf' >> $HOME/.bashrc source ~/.bashrc查看 k8s 组件运行情况kubectl get pods -n kube-systemNAME READY STATUS RESTARTS AGE coredns-65c54cc984-cglz9 0/1 Pending 0 12s coredns-65c54cc984-qwd5b 0/1 Pending 0 12s etcd-192.168.91.8 1/1 Running 0 27s kube-apiserver-192.168.91.8 1/1 Running 0 21s kube-controller-manager-192.168.91.8 1/1 Running 0 21s kube-proxy-zwdlm 1/1 Running 0 12s kube-scheduler-192.168.91.8 1/1 Running 0 27s因为还没有网络组件,coredns 没有运行成功安装 flannel 组件在 master 节点操作即可Network 参数的 ip 段要和上面 kubeadm 配置文件的 podSubnet 一样cat <<EOF> flannel.yaml | kubectl apply -f flannel.yaml apiVersion: policy/v1beta1 kind: PodSecurityPolicy metadata: name: psp.flannel.unprivileged annotations: seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default spec: privileged: false volumes: - configMap - secret - emptyDir - hostPath allowedHostPaths: - pathPrefix: "/etc/cni/net.d" - pathPrefix: "/etc/kube-flannel" - pathPrefix: "/run/flannel" readOnlyRootFilesystem: false # Users and groups runAsUser: rule: RunAsAny supplementalGroups: rule: RunAsAny fsGroup: rule: RunAsAny # Privilege Escalation allowPrivilegeEscalation: false defaultAllowPrivilegeEscalation: false # Capabilities allowedCapabilities: ['NET_ADMIN', 'NET_RAW'] defaultAddCapabilities: [] requiredDropCapabilities: [] # Host namespaces hostPID: false hostIPC: false hostNetwork: true hostPorts: - min: 0 max: 65535 # SELinux seLinux: # SELinux is unused in CaaSP rule: 'RunAsAny' kind: ClusterRole apiVersion: rbac.authorization.k8s.io/v1 metadata: name: flannel rules: - apiGroups: ['policy'] resources: ['podsecuritypolicies'] verbs: ['use'] resourceNames: ['psp.flannel.unprivileged'] - apiGroups: resources: - pods verbs: - get - apiGroups: resources: - nodes verbs: - list - watch - apiGroups: resources: - nodes/status verbs: - patch kind: ClusterRoleBinding apiVersion: rbac.authorization.k8s.io/v1 metadata: name: flannel roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: flannel subjects: - kind: ServiceAccount name: flannel namespace: kube-system apiVersion: v1 kind: ServiceAccount metadata: name: flannel namespace: kube-system kind: ConfigMap apiVersion: v1 metadata: name: kube-flannel-cfg namespace: kube-system labels: tier: node app: flannel data: 
cni-conf.json: | "name": "cbr0", "cniVersion": "0.3.1", "plugins": [ "type": "flannel", "delegate": { "hairpinMode": true, "isDefaultGateway": true "type": "portmap", "capabilities": { "portMappings": true net-conf.json: | "Network": "172.22.0.0/16", "Backend": { "Type": "vxlan" apiVersion: apps/v1 kind: DaemonSet metadata: name: kube-flannel-ds namespace: kube-system labels: tier: node app: flannel spec: selector: matchLabels: app: flannel template: metadata: labels: tier: node app: flannel spec: affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: kubernetes.io/os operator: In values: - linux hostNetwork: true priorityClassName: system-node-critical tolerations: - operator: Exists effect: NoSchedule serviceAccountName: flannel initContainers: - name: install-cni image: quay.io/coreos/flannel:v0.15.1 command: args: - /etc/kube-flannel/cni-conf.json - /etc/cni/net.d/10-flannel.conflist volumeMounts: - name: cni mountPath: /etc/cni/net.d - name: flannel-cfg mountPath: /etc/kube-flannel/ containers: - name: kube-flannel image: quay.io/coreos/flannel:v0.15.1 command: - /opt/bin/flanneld args: - --ip-masq - --kube-subnet-mgr resources: requests: cpu: "100m" memory: "50Mi" limits: cpu: "100m" memory: "50Mi" securityContext: privileged: false capabilities: add: ["NET_ADMIN", "NET_RAW"] - name: POD_NAME valueFrom: fieldRef: fieldPath: metadata.name - name: POD_NAMESPACE valueFrom: fieldRef: fieldPath: metadata.namespace volumeMounts: - name: run mountPath: /run/flannel - name: flannel-cfg mountPath: /etc/kube-flannel/ volumes: - name: run hostPath: path: /run/flannel - name: cni hostPath: path: /etc/cni/net.d - name: flannel-cfg configMap: name: kube-flannel-cfg EOF稍等 2-3 分钟,等待 flannel pod 成为 running 状态 (具体时间视镜像下载速度)NAME READY STATUS RESTARTS AGE coredns-65c54cc984-cglz9 1/1 Running 0 2m7s coredns-65c54cc984-qwd5b 1/1 Running 0 2m7s etcd-192.168.91.8 1/1 Running 0 2m22s kube-apiserver-192.168.91.8 1/1 Running 0 2m16s kube-controller-manager-192.168.91.8 1/1 Running 0 2m16s kube-flannel-ds-26drg 1/1 Running 0 100s kube-proxy-zwdlm 1/1 Running 0 2m7s kube-scheduler-192.168.91.8 1/1 Running 0 2m22swork 节点加入集群在 master 节点初始化完成的时候,已经给出了加入集群的参数只需要复制一下,到 work 节点执行即可--node-name 参数定义节点名称,如果是主机名需要保证可以解析(kubectl get nodes 命令查看到的节点名称)kubeadm join 192.168.91.8:6443 --token abcdef.0123456789abcdef \ --discovery-token-ca-cert-hash sha256:5e2387403e698e95b0eab7197837f2425f7b8610e7b400e54d81c27f3c6f1964 \ --node-name 192.168.91.9如果忘记记录了,或者以后需要增加节点怎么办?执行下面的命令就可以了kubeadm token create --print-join-command --ttl=0输出也很少,这个时候只需要去 master 节点执行 kubectl get nodes 命令就可以查看节点的状态了[preflight] Running pre-flight checks [WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service' [preflight] Reading configuration from the cluster... [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml' [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml" [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env" [kubelet-start] Starting the kubelet [kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap... This node has joined the cluster: * Certificate signing request was sent to apiserver and a response was received. * The Kubelet was informed of the new secure connection details. 
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
How long it takes for the node to become Ready depends on how fast the work node pulls the flannel image
You can check whether flannel is already Running with kubectl get pod -n kube-system
NAME           STATUS   ROLES                  AGE     VERSION
192.168.91.8   Ready    control-plane,master   9m34s   v1.23.3
192.168.91.9   Ready    <none>                 6m11s   v1.23.3
Adding another master node to the cluster
You first need to get the CA key hash from one of the existing master nodes; this value was also printed to the terminal when kubeadm init finished
If certificatesDir was changed at kubeadm init time, remember to check and adjust the /etc/kubernetes/pki/ca.crt path here accordingly
The hash you obtain is used in the form sha256:<hash>
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
You can also simply create a new token: the command prints a ready-made join command with the hash already filled in, like the one below; you only need to add the --certificate-key and --control-plane parameters to it
kubeadm token create --print-join-command --ttl=0
kubeadm join 192.168.91.8:6443 --token 352obx.dw7rqphzxo6cvz9r --discovery-token-ca-cert-hash sha256:5e2387403e698e95b0eab7197837f2425f7b8610e7b400e54d81c27f3c6f1964
Re-upload the control-plane certificates as a secret and print the key used to decrypt it; this value is what the --certificate-key parameter of kubeadm join expects
kubeadm init phase upload-certs --upload-certs
Run the kubeadm join command on the master node you want to add
--node-name sets the node name (the name shown by kubectl get nodes); if it is a hostname it must be resolvable
kubeadm join 192.168.91.8:6443 --token abcdef.0123456789abcdef \
  --discovery-token-ca-cert-hash sha256:5e2387403e698e95b0eab7197837f2425f7b8610e7b400e54d81c27f3c6f1964 \
  --certificate-key a7a12fb565bf94c768f0097898926e4d0805eb7ecc1477b48fdaaf4d27eb26b0 \
  --control-plane \
  --node-name 192.168.91.10
Check the nodes
kubectl get nodes
NAME            STATUS   ROLES                  AGE    VERSION
192.168.91.10   Ready    control-plane,master   96m    v1.23.3
192.168.91.8    Ready    control-plane,master   161m   v1.23.3
192.168.91.9    Ready    <none>                 158m   v1.23.3
Check the master components
kubectl get pod -n kube-system | egrep -v 'flannel|dns'
NAME                                    READY   STATUS    RESTARTS   AGE
etcd-192.168.91.10                      1/1     Running   0          97m
etcd-192.168.91.8                       1/1     Running   0          162m
kube-apiserver-192.168.91.10            1/1     Running   0          97m
kube-apiserver-192.168.91.8             1/1     Running   0          162m
kube-controller-manager-192.168.91.10   1/1     Running   0          97m
kube-controller-manager-192.168.91.8    1/1     Running   0          162m
kube-proxy-6cczc                        1/1     Running   0          158m
kube-proxy-bfmzz                        1/1     Running   0          97m
kube-proxy-zwdlm                        1/1     Running   0          162m
kube-scheduler-192.168.91.10            1/1     Running   0          97m
kube-scheduler-192.168.91.8             1/1     Running   0          162m
Renewing the k8s component certificates
Check the current expiry time of the component certificates
kubeadm certs check-expiration
The root CAs are actually valid for 10 years; only the component certificates are limited to 1 year
[check-expiration] Reading configuration from the cluster...
```
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Feb 17, 2023 02:45 UTC   364d            ca                      no
apiserver                  Feb 17, 2023 02:45 UTC   364d            ca                      no
apiserver-etcd-client      Feb 17, 2023 02:45 UTC   364d            etcd-ca                 no
apiserver-kubelet-client   Feb 17, 2023 02:45 UTC   364d            ca                      no
controller-manager.conf    Feb 17, 2023 02:45 UTC   364d            ca                      no
etcd-healthcheck-client    Feb 17, 2023 02:45 UTC   364d            etcd-ca                 no
etcd-peer                  Feb 17, 2023 02:45 UTC   364d            etcd-ca                 no
etcd-server                Feb 17, 2023 02:45 UTC   364d            etcd-ca                 no
front-proxy-client         Feb 17, 2023 02:45 UTC   364d            front-proxy-ca          no
scheduler.conf             Feb 17, 2023 02:45 UTC   364d            ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Feb 15, 2032 02:45 UTC   9y              no
etcd-ca                 Feb 15, 2032 02:45 UTC   9y              no
front-proxy-ca          Feb 15, 2032 02:45 UTC   9y              no
```

Renewing for one more year with the kubeadm command

The scenario here assumes the certificates have already expired; the system clock is moved forward with date -s 2023-2-18 to simulate that (kubeadm certs renew also works before the certificates actually expire).

```bash
kubectl get nodes --kubeconfig /etc/kubernetes/admin.conf
```

```
Unable to connect to the server: x509: certificate has expired or is not yet valid: current time 2023-02-18T00:00:15+08:00 is after 2023-02-17T05:34:40Z
```

Once the certificates have expired you get the output above. Use the commands below to renew them for another year, then restart kubelet and restart the etcd, kube-apiserver, kube-controller-manager and kube-scheduler components. Do this on every master node, or do it on one master node and then distribute /etc/kubernetes/admin.conf to the other master nodes, replacing the old file.

```bash
cp -r /etc/kubernetes/pki{,.old}
kubeadm certs renew all
systemctl restart kubelet
```

Run kubeadm certs check-expiration again and you can see the expiration dates have moved to 2024:

```
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Feb 17, 2024 16:01 UTC   364d            ca                      no
apiserver                  Feb 17, 2024 16:01 UTC   364d            ca                      no
apiserver-etcd-client      Feb 17, 2024 16:01 UTC   364d            etcd-ca                 no
apiserver-kubelet-client   Feb 17, 2024 16:01 UTC   364d            ca                      no
controller-manager.conf    Feb 17, 2024 16:01 UTC   364d            ca                      no
etcd-healthcheck-client    Feb 17, 2024 16:01 UTC   364d            etcd-ca                 no
etcd-peer                  Feb 17, 2024 16:01 UTC   364d            etcd-ca                 no
etcd-server                Feb 17, 2024 16:01 UTC   364d            etcd-ca                 no
front-proxy-client         Feb 17, 2024 16:01 UTC   364d            front-proxy-ca          no
scheduler.conf             Feb 17, 2024 16:01 UTC   364d            ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Feb 15, 2032 02:45 UTC   8y              no
etcd-ca                 Feb 15, 2032 02:45 UTC   8y              no
front-proxy-ca          Feb 15, 2032 02:45 UTC   8y              no
```

Compiling kubeadm to seal a ten-year deal

Compiling kubeadm requires a Go toolchain, so install Go first (binaries are on the official Go download page; I also uploaded a copy to CSDN).

```bash
wget https://go.dev/dl/go1.17.7.linux-amd64.tar.gz
tar xvf go1.17.7.linux-amd64.tar.gz -C /usr/local/
echo 'PATH=$PATH:/usr/local/go/bin' >> $HOME/.bashrc
source $HOME/.bashrc
go version
```

Download the Kubernetes source package; it must match the version of the running cluster (available on GitHub; I also uploaded a copy to CSDN).

```bash
wget https://github.com/kubernetes/kubernetes/archive/refs/tags/v1.23.3.tar.gz
tar xvf v1.23.3.tar.gz
```

```bash
cd kubernetes-1.23.3/
vim staging/src/k8s.io/client-go/util/cert/cert.go
```

Change duration365d * 10 to duration365d * 100:

```go
now.Add(duration365d * 100).UTC(),
```

```bash
vim cmd/kubeadm/app/constants/constants.go
```

Change CertificateValidity = time.Hour * 24 * 365 to CertificateValidity = time.Hour * 24 * 3650:

```go
CertificateValidity = time.Hour * 24 * 3650
```

Compile kubeadm:

```bash
make WHAT=cmd/kubeadm GOFLAGS=-v
```
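Before renewing anything with the freshly built binary, it does not hurt to confirm that the build produced a working kubeadm of the expected version. A quick hedged check (_output/bin is where the make target above places the binary):

```bash
# Print the version of the locally built kubeadm; it should report something like v1.23.3
_output/bin/kubeadm version -o short
```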
Renew the certificates with the new binary:

```bash
cp -r /etc/kubernetes/pki{,.old}
_output/bin/kubeadm certs renew all
systemctl restart kubelet
```

Check the expiration dates:

```bash
_output/bin/kubeadm certs check-expiration
```

Ten years now:

```
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Feb 15, 2032 07:08 UTC   9y              ca                      no
apiserver                  Feb 15, 2032 07:08 UTC   9y              ca                      no
apiserver-etcd-client      Feb 15, 2032 07:08 UTC   9y              etcd-ca                 no
apiserver-kubelet-client   Feb 15, 2032 07:08 UTC   9y              ca                      no
controller-manager.conf    Feb 15, 2032 07:08 UTC   9y              ca                      no
etcd-healthcheck-client    Feb 15, 2032 07:08 UTC   9y              etcd-ca                 no
etcd-peer                  Feb 15, 2032 07:08 UTC   9y              etcd-ca                 no
etcd-server                Feb 15, 2032 07:08 UTC   9y              etcd-ca                 no
front-proxy-client         Feb 15, 2032 07:08 UTC   9y              front-proxy-ca          no
scheduler.conf             Feb 15, 2032 07:08 UTC   9y              ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Feb 15, 2032 02:45 UTC   9y              no
etcd-ca                 Feb 15, 2032 02:45 UTC   9y              no
front-proxy-ca          Feb 15, 2032 02:45 UTC   9y              no
```

Replace the kubeadm binary; if there are multiple master nodes, distribute the new binary to them and replace it there as well (see the sketch below).

```bash
mv /usr/bin/kubeadm{,-oneyear}
cp _output/bin/kubeadm /usr/bin/
```

If kubectl reads its credentials from $HOME/.kube/config, that file also needs to be replaced with the renewed admin.conf; if you point at the config through an exported environment variable (KUBECONFIG), no replacement is needed.

```bash
mv $HOME/.kube/config{,-oneyear}
cp /etc/kubernetes/admin.conf $HOME/.kube/config
```
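A minimal sketch of distributing the rebuilt binary to the other masters, assuming the extra master from the example above (192.168.91.10) and root SSH access; adjust the node list to your environment:

```bash
# Back up the old kubeadm on each remote master, then copy the ten-year build over
for node in 192.168.91.10; do
    ssh root@${node} 'mv /usr/bin/kubeadm /usr/bin/kubeadm-oneyear'
    scp _output/bin/kubeadm root@${node}:/usr/bin/kubeadm
done
```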
Listing the images in a private registry

If the private registry has authentication enabled, the curl commands need the -u parameter.

Usage: curl -XGET -u <registry user>:<password> http://<registry ip>:<registry port>/v2/_catalog

```bash
curl -XGET -u admin:admin http://192.168.91.18:5000/v2/_catalog
```

The output is JSON:

```
{"repositories":["centos","debian","mysql","nginx","php"]}
```

If there are a lot of images, you can pretty-print the JSON with python to make it easier to read:

```bash
curl -s -XGET -u admin:admin http://192.168.91.18:5000/v2/_catalog | python -m json.tool
```

This is much easier on the eyes:

```
{
    "repositories": [
        "centos",
        "debian",
        "mysql",
        "nginx",
        "php"
    ]
}
```

Listing the tags of an image in the private registry

Usage: curl -XGET -u <registry user>:<password> http://<registry ip>:<registry port>/v2/<image name>/tags/list

```bash
curl -XGET -u admin:admin http://192.168.91.18:5000/v2/centos/tags/list
```

The output is JSON:

```
{"name":"centos","tags":["latest","7"]}
```

If there are many tags, pretty-print the JSON with python again:

```bash
curl -s -XGET -u admin:admin http://192.168.91.18:5000/v2/centos/tags/list | python -m json.tool
```

```
{
    "name": "centos",
    "tags": [
        "latest",
        "7"
    ]
}
```

Deleting a specific image from the private registry

First confirm that the delete feature is enabled. If it is not, a delete request returns one of the following two responses:

```
{"errors":[{"code":"UNSUPPORTED","message":"The operation is unsupported."}]}
```

```
HTTP/1.1 405 Method Not Allowed
Content-Type: application/json; charset=utf-8
Docker-Distribution-Api-Version: registry/2.0
X-Content-Type-Options: nosniff
Date: Fri, 18 Mar 2022 04:12:22 GMT
Content-Length: 78
```

Find the registry container:

```bash
docker ps | grep registry
```

Use whatever your own output shows:

```
3745255afa90   registry   "/entrypoint.sh /etc…"   About an hour ago   Up About an hour   0.0.0.0:5000->5000/tcp, :::5000->5000/tcp   registry
```

Enter the container (the registry container's shell is sh):

```bash
docker exec -it 3745255afa90 sh
```

The config file is usually /etc/docker/registry/config.yml. The registry image ships with vi, not vim:

```bash
vi /etc/docker/registry/config.yml
```

The registry image I pulled does not have the delete feature enabled by default:

```yaml
version: 0.1
log:
  fields:
    service: registry
storage:
  cache:
    blobdescriptor: inmemory
  filesystem:
    rootdirectory: /var/lib/registry
  # Add the delete/enabled keys here, minding the YAML indentation.
  # If delete is present and enabled is true, deletion is already switched on.
  delete:
    enabled: true
http:
  addr: :5000
  headers:
    X-Content-Type-Options: [nosniff]
health:
  storagedriver:
    enabled: true
    interval: 10s
    threshold: 3
```

After editing, restart the registry container:

```bash
docker restart 3745255afa90
```

Getting the hash (digest) of a specific image

Usage: curl -I -XGET --header "Accept:application/vnd.docker.distribution.manifest.v2+json" -u <registry user>:<password> http://<registry ip>:<registry port>/v2/<image name>/manifests/<image tag>

```bash
curl -I -XGET --header "Accept:application/vnd.docker.distribution.manifest.v2+json" \
    -u admin:admin http://192.168.91.18:5000/v2/centos/manifests/latest
```

The Docker-Content-Digest header contains the image's hash:

```
HTTP/1.1 200 OK
Content-Length: 529
Content-Type: application/vnd.docker.distribution.manifest.v2+json
Docker-Content-Digest: sha256:a1801b843b1bfaf77c501e7a6d3f709401a1e0c83863037fa3aab063a7fdb9dc
Docker-Distribution-Api-Version: registry/2.0
Etag: "sha256:a1801b843b1bfaf77c501e7a6d3f709401a1e0c83863037fa3aab063a7fdb9dc"
X-Content-Type-Options: nosniff
Date: Fri, 18 Mar 2022 04:06:42 GMT
```

Deleting the image from the private registry

Usage: curl -I -XDELETE -u <registry user>:<password> http://<registry ip>:<registry port>/v2/<image name>/manifests/<hash obtained above>

```bash
curl -I -XDELETE -u admin:admin \
    http://192.168.91.18:5000/v2/centos/manifests/sha256:a1801b843b1bfaf77c501e7a6d3f709401a1e0c83863037fa3aab063a7fdb9dc
```

The returned status code is 202:

```
HTTP/1.1 202 Accepted
Docker-Distribution-Api-Version: registry/2.0
X-Content-Type-Options: nosniff
Date: Fri, 18 Mar 2022 04:24:23 GMT
Content-Length: 0
```

Check the tag list of the centos image again:

```bash
curl -XGET -u admin:admin http://192.168.91.18:5000/v2/centos/tags/list
```

Only the 7 tag is left now:

```
{"name":"centos","tags":["7"]}
```
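Note that the DELETE request above only removes the manifest reference; the underlying blobs stay on disk until the registry's garbage collector runs. A hedged sketch of triggering it, reusing the container ID from the example above:

```bash
# Reclaim the disk space of unreferenced blobs inside the registry container
docker exec -it 3745255afa90 registry garbage-collect /etc/docker/registry/config.yml
```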
Kubernetes drops the dockershim component after version 1.24, and the container runtime moves from docker to containerd. The ctr command that ships with containerd is not very pleasant to use and cannot build images on its own, unlike docker, where you can build images with docker build.

containerd has a sub-project, nerdctl, that is compatible with the docker CLI, so you can manage local images and containers just like with the docker command (nerdctl on GitHub: https://github.com/containerd/nerdctl).

When downloading with wget, add the --no-check-certificate parameter, otherwise you may get an error like Unable to establish SSL connection.

The minimal build (10.22 MB) contains only the nerdctl command. nerdctl build does not work with it; running it produces the following error:

```
ERRO[0000] buildctl needs to be installed and buildkitd needs to be running
```

The full build (221.6 MB) contains not only the nerdctl command but also buildkitd, buildctl, ctr, runc and the other containerd-related commands, plus the CNI plugin binaries.

Using the minimal nerdctl build

After downloading the minimal binary release, just put the extracted files under /usr/bin; if you have special requirements you can also extract them to a custom path and append it to PATH.

By default nerdctl connects to the containerd.sock at /run/containerd/containerd.sock. If the containerd.sock configured in your containerd config file differs, add the -a parameter to nerdctl to point at it.

```bash
tar xf nerdctl-1.0.0-linux-amd64.tar.gz -C /usr/bin
```

After that you can inspect containers and images just like with the docker command.

Enabling nerdctl shell completion

Shell completion requires the bash-completion.noarch package to be installed on the system.

```bash
echo 'source <(nerdctl completion bash)' >> /etc/profile
# Reload /etc/profile
source /etc/profile
```

Verifying with the nerdctl command

The difference between containerd and docker as the Kubernetes container runtime is that with containerd the images have to live in the k8s.io namespace.

Pull an image:

```bash
nerdctl -n k8s.io image pull centos:7
```

Verify with ctr:

```bash
ctr -n k8s.io image ls
```

Using the full nerdctl build

The point of the full build is the nerdctl build command, which under the hood uses buildctl to build images.

The lib directory of the full build contains a ready-made buildkit.service file. Mind the path of the buildkitd command: the unit file defaults to /usr/local/bin/buildkitd, so either put the binary there or change the path in the file.

```bash
cp lib/systemd/system/buildkit.service /lib/systemd/system/
```

buildkitd - the buildkit server

- buildkitd supports two workers, runc and containerd; runc (oci-worker in the buildkitd parameters) is the default.
- To use containerd as the worker, add the --oci-worker=false --containerd-worker=true parameters.
- Like nerdctl, it talks to /run/containerd/containerd.sock by default; if your path differs, add --containerd-worker-addr to point at the right containerd.sock.
- With containerd as the worker, a buildkit namespace is created automatically; you can see it with nerdctl namespace ls.
- If you build with nerdctl build and want the resulting image in the buildkit namespace, use nerdctl -n buildkit build to specify the namespace.
- If you build with buildctl, the image is placed in the buildkit namespace automatically.

buildctl - the buildkit client

Running nerdctl build requires the buildctl command to be findable in the system PATH.

Start buildkit:

```bash
systemctl enable buildkit.service --now
```

Building an image with nerdctl

Write a simple Dockerfile to verify it:

```dockerfile
FROM alpine:3.16.3
ENV LANG=en_US.UTF-8
ENV TZ="Asia/Shanghai"
RUN echo '/bin/sleep 315360000' > start.sh
CMD ["sh","start.sh"]
```

Build the image:

```bash
nerdctl build -t alpine:3.16.3-test .
```

Building the image with buildctl instead:

```bash
buildctl build --frontend dockerfile.v0 \
    --local context=. \
    --local dockerfile=. \
    --output type=image,name=alpine:3.16.3-buildctl
```

- --frontend - use dockerfile.v0 as the frontend; gateway.v0 is also available
- --local context= - the build context path, e.g. where files are copied into the image from
- --local dockerfile= - the directory containing the Dockerfile
- --output name= - the name of the resulting image

Quite a mouthful; I'll stick with the nerdctl command.
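To close the loop, a quick hedged check that the image built above actually works, using the tag from the example (the nerdctl subcommands mirror their docker counterparts):

```bash
# List the freshly built image and start a throwaway container from it
nerdctl images | grep alpine
nerdctl run -d --name alpine-test alpine:3.16.3-test
nerdctl ps | grep alpine-test
```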