MySQL Shell 8.3.0 adds a feature to the dump and load utilities that generates checksums of the data, so you can verify that it was imported correctly.
https://dev.mysql.com/doc/relnotes/mysql-shell/8.3/en/news-8-3-0.html
It is now possible to generate a checksum for a dump to enable you to validate your data on import. The option checksum: [true | false] is added to the following utilities:
util.dumpInstance(), util.dumpSchemas(), and util.dumpTables(): a metadata file, @.checksums.json is generated containing the details of the dumped data and a checksum for each.
util.loadDump(): validates the checksums after the data is loaded.
util.copyInstance(), util.copySchemas(), and util.copyTables(): generate the metadata file, @.checksums.json, and validate the data.
(WL #15947)
It is an unassuming feature, but one that many people have likely been waiting for, so in this post I will put it to the test.
About the CHECKSUM TABLE command
MySQL has long provided the CHECKSUM TABLE command as a way to check whether table data is identical.
```
mysql> checksum table sakila.actor;
+--------------+------------+
| Table        | Checksum   |
+--------------+------------+
| sakila.actor | 2197499348 |
+--------------+------------+
```
As shown above, it produces a checksum for the whole table, but the result also reflects differences such as the table's ROW_FORMAT and internal differences in column data types between versions. So it could not be used when you wanted to compare purely the data contained in a table, for example to compare values before and after an upgrade.
The checksum verification feature in MySQL Shell's dump and load utilities, by contrast, can compute checksums that focus purely on the data itself.
Data migration with the dump and load utilities
For an upgrade, if you can afford a maintenance window, it is tempting to get it over with quickly using a logical backup.
In such cases MySQL Shell's dump and load utilities can export and import at high speed, and because they take logical backups, they are less sensitive to version differences than physical backups.
For example, as shown below, as long as you exclude the system schema (mysql), you can jump straight from 5.6 to an 8.x import in one step.
```
$ mysqlsh u@mysql56 -- util dump-schemas sakila \
    --output-url=/tmp/dump --threads=4 --show-progress=true \
    --default-character-set=utf8 --consistent=true
NOTE: Backup lock is not supported in MySQL 5.6 and DDL changes will not be blocked. The dump may fail with an error if schema changes are made while dumping.
Acquiring global read lock
Global read lock acquired
Initializing - done
1 schemas will be dumped and within them 16 tables, 7 views, 6 routines, 6 triggers.
Gathering information - done
All transactions have been started
Global read lock has been released
Writing global DDL files
Running data dump using 4 threads.
NOTE: Progress information uses estimated values and may not be accurate.
NOTE: Table statistics not available for `sakila`.`store`, chunking operation may be not optimal. Please consider running 'ANALYZE TABLE `sakila`.`store`;' first.
NOTE: Table statistics not available for `sakila`.`staff`, chunking operation may be not optimal. Please consider running 'ANALYZE TABLE `sakila`.`staff`;' first.
NOTE: Table statistics not available for `sakila`.`rental`, chunking operation may be not optimal. Please consider running 'ANALYZE TABLE `sakila`.`rental`;' first.
NOTE: Table statistics not available for `sakila`.`language`, chunking operation may be not optimal. Please consider running 'ANALYZE TABLE `sakila`.`language`;' first.
NOTE: Table statistics not available for `sakila`.`payment`, chunking operation may be not optimal. Please consider running 'ANALYZE TABLE `sakila`.`payment`;' first.
Writing schema metadata - done
Writing DDL - done
Writing table metadata - done
Starting data dump
311% (47.27K rows / ~15.17K rows), 0.00 rows/s, 0.00 B/s uncompressed, 0.00 B/s compressed
Dump duration: 00:00:00s
Total duration: 00:00:00s
Schemas dumped: 1
Tables dumped: 16
Uncompressed data size: 3.01 MB
Compressed data size: 701.42 KB
Compression ratio: 4.3
Rows written: 47268
Bytes written: 701.42 KB
Average uncompressed throughput: 3.01 MB/s
Average compressed throughput: 701.42 KB/s

$ mysqlsh root@mysql83 -- util load-dump /tmp/dump \
    --threads=4 --show-progress=true --ignore-version \
    --update-gtid-set=off --analyze-tables=on
Loading DDL and Data from '/tmp/dump' using 4 threads.
Opening dump...
Target is MySQL 8.3.0. Dump was produced from MySQL 5.6.51
WARNING: Destination MySQL version is newer than the one where the dump was created. Source and destination have non-consecutive major MySQL versions. The 'ignoreVersion' option is enabled, so loading anyway.
Scanning metadata - done
Checking for pre-existing objects...
Executing common preamble SQL
Executing DDL - done
Executing view DDL - done
Starting data load
WARNING: sakila@staff@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@film_text@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@film@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@address@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@film_actor@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@inventory@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@city@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@customer@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@actor@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@country@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@category@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@film_category@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@store@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@language@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
2 thds loading - 1 thds indexing - 100% (3.01 MB / 3.01 MB), 3.01 MB/s, 14 / 16 tables done
Recreating indexes - done
Analyzing tables - done
WARNING: sakila@payment@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@rental@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
Executing common postamble SQL
16 chunks (47.27K rows, 3.01 MB) for 16 tables in 1 schemas were loaded in 3 sec (avg throughput 3.01 MB/s)
16 warnings were reported during the load.
```
Some rather scary-looking WARNINGs showed up there.
Even in a case like this, if we can take checksums of the data before and after the import, then whatever the WARNINGs say, we can be confident that the data is consistent across the dump-and-load between versions.
How to generate checksums
The checksum option of the dump utilities (util.dumpInstance(), util.dumpSchemas(), and so on) is documented as follows.
https://dev.mysql.com/doc/mysql-shell/8.3/en/mysql-shell-utilities-dump-instance-schema.html
checksum: [ true | false ]
If enabled, a metadata file, @.checksums.json is generated with the dump. This file contains the checksum data for the dump, enabling data verification when loading the dump. See Options for Load Control. The following conditions apply if checksum: true:
If ddlOnly:false and chunking:true, a checksum is generated for each dumped table and partition chunk.
If ddlOnly:false and chunking:false, a checksum is generated for each dumped table and table partition.
If ddlOnly:true, a checksum is generated for each dumped table and table partition.
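The conditions above can be restated as a small decision function. The sketch below is just an illustrative paraphrase of the documented behavior (not MySQL Shell's actual code): only when data is dumped with chunking enabled do individual chunks get their own checksums; otherwise checksums cover whole tables and table partitions.

```python
def checksum_granularity(ddl_only: bool, chunking: bool) -> str:
    """Restates the documented conditions for checksum: true."""
    if not ddl_only and chunking:
        # Data is dumped in chunks, so each chunk gets its own checksum.
        return "per table/partition chunk"
    # With ddlOnly: true, or with chunking disabled, checksums are
    # generated per dumped table and table partition instead.
    return "per table and table partition"
```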
The checksum option of the load utility (util.loadDump()) is documented as follows.
https://dev.mysql.com/doc/mysql-shell/8.3/en/mysql-shell-utilities-load-dump.html
checksum: [true|false]
If enabled, the loadDump utility checks the checksum data generated by the dump utility after the corresponding data is loaded. The verification is limited to data which was dumped, ignoring generated data such as invisible primary keys added by the loadDump utility. Errors are returned if a checksum does not match or if a table is missing and cannot be verified.
If checksum: true but no data was loaded, either due to loadData: false or no data being dumped, the utility verifies the dump’s checksum information against the current contents of the affected tables.
If a table does not exist, an error is displayed for each missing table.
If checksum: true and dryRun: true, the checksum is not verified. A message is displayed stating that no verification took place.
To compare checksums, the checksum option must be enabled when the dump utility is run.
```
$ mysqlsh u@mysql56 -- util dump-schemas sakila \
    --checksum=true --output-url=/tmp/dump-checksum \
    --threads=4 --show-progress=true --default-character-set=utf8 \
    --consistent=true
:
Running data dump using 4 threads.
Checksumming enabled.
NOTE: Progress information uses estimated values and may not be accurate.
Writing schema metadata - done
Writing DDL - done
Writing table metadata - done
Starting data dump
4 thds chksum - 99% (47.27K rows / ~47.27K rows), 0.00 rows/s, 0.00 B/s uncompressed, 0.00 B/s compressed
Computing checksum - done
:
```
The checksums appear to be computed by each of the threads specified with `--threads`. The checksums that were actually generated are written to `@.checksums.json` under the dump directory.
```
:
"actor": {
  "columns": [
    "actor_id",
    "first_name",
    "last_name",
    "last_update"
  ],
  "indexColumns": [
    "actor_id"
  ],
  "partitions": {
    "": {
      "0": {
        "checksum": "CB750133771479CA3B281BBDF56290331B3F6A86D8F7314823D94A708DBB82C5",
        "count": 200,
        "boundary": "(`actor_id` BETWEEN 1 AND 200)"
      }
    }
  }
}
:
```
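If you want to script against this file, the per-chunk checksums can be pulled out with a few lines of Python. The sketch below assumes only the per-table layout excerpted above (partitions → chunk → checksum/count/boundary); the exact top-level wrapper of @.checksums.json is not shown here, so treat this as illustrative. In practice you would `json.load()` the file first and pass the table dictionary in.

```python
def iter_chunk_checksums(tables: dict):
    """Yield (table, partition, chunk_id, checksum, row_count) from a
    dict shaped like the @.checksums.json excerpt above."""
    for table, info in tables.items():
        for partition, chunks in info.get("partitions", {}).items():
            for chunk_id, chunk in chunks.items():
                yield table, partition, chunk_id, chunk["checksum"], chunk["count"]

# Example using the "actor" entry shown above:
doc = {
    "actor": {
        "columns": ["actor_id", "first_name", "last_name", "last_update"],
        "indexColumns": ["actor_id"],
        "partitions": {
            "": {
                "0": {
                    "checksum": "CB750133771479CA3B281BBDF56290331B3F6A86D8F7314823D94A708DBB82C5",
                    "count": 200,
                    "boundary": "(`actor_id` BETWEEN 1 AND 200)",
                }
            }
        },
    }
}
for row in iter_chunk_checksums(doc):
    print(row)
```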
Now let's import the dump that carries the checksums. The `--checksum` option needs to be specified here as well.
```
$ mysqlsh root@mysql83 -- util load-dump /tmp/dump-checksum \
    --threads=4 --show-progress=true --ignore-version \
    --update-gtid-set=off --checksum=true
Loading DDL and Data from '/tmp/dump-checksum' using 4 threads.
Opening dump...
Target is MySQL 8.3.0. Dump was produced from MySQL 5.6.51
WARNING: Destination MySQL version is newer than the one where the dump was created. Source and destination have non-consecutive major MySQL versions. The 'ignoreVersion' option is enabled, so loading anyway.
Scanning metadata - done
Checking for pre-existing objects...
Executing common preamble SQL
Executing DDL - done
Executing view DDL - done
Starting data load
WARNING: sakila@staff@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@film_text@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@address@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@film@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@film_actor@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@inventory@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@city@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@customer@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@actor@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@country@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@category@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@film_category@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@language@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@store@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
2 thds loading - 1 thds indexing - 1 thds chksum | 100% (3.01 MB / 3.01 MB), 1.20 MB/s, 14 / 16 tables done
Recreating indexes - done
WARNING: sakila@payment@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@rental@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
Verifying checksum information - done
Executing common postamble SQL
16 chunks (47.27K rows, 3.01 MB) for 16 tables in 1 schemas were loaded in 6 sec (avg throughput 1.20 MB/s)
16 warnings were reported during the load.
16 checksums were verified in 1 sec.
```
The line `Verifying checksum information - done` suggests the data is consistent, but let's deliberately tamper with the data and see what happens.
```
$ zstd -d /tmp/dump-checksum/sakila@actor@@0.tsv.zst
/tmp/dump-checksum/sakila@actor@@0.tsv.zst: 7399 bytes
$ vi /tmp/dump-checksum/sakila@actor@@0.tsv
1       AAAAAAAA  <- changed here    GUINESS 2006-02-15 04:34:33
2       NICK    WAHLBERG        2006-02-15 04:34:33
3       ED      CHASE   2006-02-15 04:34:33
4       JENNIFER        DAVIS   2006-02-15 04:34:33
$ zstd /tmp/dump-checksum/sakila@actor@@0.tsv
zstd: /tmp/dump-checksum/sakila@actor@@0.tsv.zst already exists; overwrite (y/n) ? y
/tmp/dump-checksum/sakila@actor@@0.tsv : 26.00% (  7.23 KiB =>   1.88 KiB, /tmp/dump-checksum/sakila@actor@@0.tsv
```
Let's drop the imported sakila database and run the load again.
```
$ mysqlsh root@mysql80 --sql -e "drop database sakila"
$ mysqlsh root@localhost -- util load-dump /tmp/dump-checksum \
    --threads=4 --show-progress=true --ignore-version \
    --update-gtid-set=off --checksum=true --reset-progress=true
Loading DDL and Data from '/tmp/dump-checksum' using 4 threads.
Opening dump...
Target is MySQL 8.3.0. Dump was produced from MySQL 5.6.51
WARNING: Destination MySQL version is newer than the one where the dump was created. Source and destination have non-consecutive major MySQL versions. The 'ignoreVersion' option is enabled, so loading anyway.
NOTE: Load progress file detected for the instance but 'resetProgress' option was enabled. Load progress will be discarded and the whole dump will be reloaded.
Scanning metadata - done
Checking for pre-existing objects...
Executing common preamble SQL
Executing DDL - done
Executing view DDL - done
Starting data load
:
2 thds loading - 1 thds indexing - 1 thds chksum - 100% (3.01 MB / 3.01 MB), 3.01 MB/s, 14 / 16 tables done
ERROR: Checksum verification failed for: `sakila`.`actor` (chunk 0) (boundary: (`actor_id` BETWEEN 1 AND 200)).
Recreating indexes - done
WARNING: sakila@payment@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
WARNING: sakila@rental@@0.tsv.zst error 3719: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
Executing common postamble SQL
Verifying checksum information - done
16 chunks (47.27K rows, 3.01 MB) for 16 tables in 1 schemas were loaded in 3 sec (avg throughput 3.01 MB/s)
16 warnings were reported during the load.
16 checksums were verified in 1 sec.
ERROR: 1 checksum verification errors were reported during the load.
NOTE: Warnings reported during the load may provide some information on source of these errors.
ERROR: Checksum verification failed
```
We can confirm that the checksum verification did indeed fail.
It would be even better if the offending row could be pinpointed, but since checksums appear to be computed per chunk, the user has to investigate what exactly differs by querying with the reported condition, `boundary: (actor_id BETWEEN 1 AND 200)`. Still, being able to narrow the problem down to a chunk should make the investigation considerably easier.
Performance impact of enabling the checksum feature
Let's measure the performance impact of enabling the checksum feature, using TPC-H data. The test data set is about 13 GB.
```
$ du -sh /var/lib/mysql/tpch
13G     /var/lib/mysql/tpch
```
The test environment is a KVM virtual machine with 4 CPU cores and 8 GiB of memory, with the following specs.
```
$ mysql -e "select @@version"
+-----------+
| @@version |
+-----------+
| 8.3.0     |
+-----------+
[root@blog-mysql-80 ~]# lscpu ; lsmem
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Red Hat
  Model name:            Intel Xeon Processor (Cascadelake)
    BIOS Model name:     RHEL 7.6.0 PC (i440FX + PIIX, 1996)
    CPU family:          6
    Model:               85
    Thread(s) per core:  1
    Core(s) per socket:  1
    Socket(s):           4
    Stepping:            5
    BogoMIPS:            3791.09
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear arch_capabilities
Virtualization features:
  Virtualization:        VT-x
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):
  L1d:                   128 KiB (4 instances)
  L1i:                   128 KiB (4 instances)
  L2:                    16 MiB (4 instances)
  L3:                    64 MiB (4 instances)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-3
Vulnerabilities:
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Retbleed:              Mitigation; Enhanced IBRS
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
  Srbds:                 Not affected
  Tsx async abort:       Mitigation; TSX disabled
RANGE                                  SIZE  STATE REMOVABLE BLOCK
0x0000000000000000-0x00000000bfffffff    3G online       yes  0-23
0x0000000100000000-0x000000023fffffff    5G online       yes 32-71
Memory block size:       128M
Total online memory:       8G
Total offline memory:      0B
```
Without checksums, util.dumpSchemas() took 3 minutes 38 seconds to complete the dump.
```
Dump duration: 00:03:38s
Total duration: 00:03:39s
```
With checksum=true, the dump and the checksum computation appear to run in parallel to some extent, but the total time was roughly twice as long.
```
Dump duration: 00:04:24s
Checksum duration: 00:08:25s
Total duration: 00:08:51s
```
Without checksums, the import with util.loadDump() took 14 minutes 42 seconds.
```
216 chunks (86.59M rows, 11.33 GB) for 8 tables in 1 schemas were loaded in 14 min 42 sec (avg throughput 12.95 MB/s)
216 warnings were reported during the load.
```
With checksums enabled, the import with util.loadDump() took 21 minutes 43 seconds, of which 7 minutes 57 seconds were spent on checksum verification.
```
216 chunks (86.59M rows, 11.33 GB) for 8 tables in 1 schemas were loaded in 21 min 43 sec (avg throughput 13.69 MB/s)
216 warnings were reported during the load.
64 checksums were verified in 7 min 57 sec.
```
During the checksum phase of util.dumpSchemas(), the following query was being executed. Since every column is cast to binary before being hashed, differences in character sets and the like should not affect the result.
```
SELECT count(*),
       hex(bit_xor(unhex(sha2(concat_ws('#',
           convert(`l_orderkey` using binary),
           convert(`l_partkey` using binary),
           ...repeated for every column...),
       256))))
  FROM `tpch`.`lineitem`
/* mysqlsh dumpSchemas, checksumming table `tpch`.`lineitem`, ID: whole table split in chunks */
```
Because lineitem was created without a primary key in this test, the query has no WHERE clause, but for tables with a PK or UK each chunk should be evaluated separately with its own WHERE condition.
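To get a feel for what that query computes, here is a rough Python re-implementation of the per-chunk aggregation: SHA2-256 over each row's `'#'`-joined column values, XOR-folded across rows. This is an illustrative sketch under simplifying assumptions (real column values go through CONVERT(... USING binary), and NULL handling is not shown), not MySQL Shell's actual code:

```python
import hashlib
from functools import reduce

def row_digest(row) -> int:
    # concat_ws('#', col1, col2, ...) followed by sha2(..., 256),
    # interpreted as a 256-bit integer (the unhex() step).
    data = "#".join(str(v) for v in row).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def chunk_checksum(rows) -> str:
    # bit_xor() aggregate over the per-row digests. XOR is commutative,
    # so the result is independent of row order and no ORDER BY is needed.
    acc = reduce(lambda a, b: a ^ b, (row_digest(r) for r in rows), 0)
    return format(acc, "064X")  # 64 hex chars = 256 bits, like @.checksums.json
```

The order-independence is what makes this approach safe for parallel, chunked scans: two tables holding the same rows yield the same checksum regardless of physical row order.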
Also, chunk splitting allows the checksum work to make full use of the number of threads specified with `--threads`, so from a performance standpoint it is worth eliminating tables without a PK (especially ones with many rows).
One caveat: if you import into a different schema using the schema option, the checksum comparison apparently cannot be performed and results in an error.
```
Verifying checksum information - done
ERROR: Unable to find schema tpch whose existence is being checked
```
That concludes the verification. To be able to say with confidence that your data is consistent before and after a migration, consider making use of this feature in your next upgrade or data migration.