반응형

마리아 그녀를 믿지 마세요..

 

MariaDB 10.2.15 to 10.4.17 로 올린 후 DB가 shutdown 되고 있다.
정확하게 얘기하면 내려갔다 다시 올라오고 있다.

로그를 확인해보면 
[FATAL] InnoDB: Semaphore wait has lasted > 600 seconds 에러가 나고 자체적으로 복구하고 재시작을 하고 있다.

 

2021-02-19 11:47:06 153730 [Warning] Aborted connection 153730 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got timeout reading communication packets)
2021-02-19 11:47:07 153996 [Warning] Aborted connection 153996 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:47:07 153995 [Warning] Aborted connection 153995 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:47:08 153735 [Warning] Aborted connection 153735 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got timeout reading communication packets)
2021-02-19 11:47:10 154001 [Warning] Aborted connection 154001 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:47:10 154000 [Warning] Aborted connection 154000 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:47:10 153741 [Warning] Aborted connection 153741 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got timeout reading communication packets)
2021-02-19 11:47:12 154007 [Warning] Aborted connection 154007 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:47:12 154006 [Warning] Aborted connection 154006 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:47:12 154005 [Warning] Aborted connection 154005 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:47:12 154004 [Warning] Aborted connection 154004 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:47:12 154003 [Warning] Aborted connection 154003 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:47:13 153747 [Warning] Aborted connection 153747 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got timeout reading communication packets)
2021-02-19 11:47:14 154012 [Warning] Aborted connection 154012 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:47:14 154011 [Warning] Aborted connection 154011 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:47:15 153753 [Warning] Aborted connection 153753 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got timeout reading communication packets)
InnoDB: ###### Diagnostic info printed to the standard error stream
2021-02-19 11:47:16 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
210219 11:47:16 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.

Server version: 10.4.17-MariaDB-log
key_buffer_size=33554432
read_buffer_size=16777216
max_used_connections=142
max_threads=1002
thread_count=109
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 24682647 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x40000
/data/mariadb/bin/mysqld(my_print_stacktrace+0x2e)[0x5635b663de1e]
/data/mariadb/bin/mysqld(handle_fatal_signal+0x30f)[0x5635b604b3af]
/lib64/libpthread.so.0(+0x330de0f7e0)[0x7f0e427e47e0]
/lib64/libc.so.6(gsignal+0x35)[0x7f0e416a54f5]
/lib64/libc.so.6(abort+0x175)[0x7f0e416a6cd5]
2021-02-19 11:47:17 154018 [Warning] Aborted connection 154018 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:47:17 154017 [Warning] Aborted connection 154017 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:47:17 154016 [Warning] Aborted connection 154016 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
/data/mariadb/bin/mysqld(+0xc382f0)[0x5635b63412f0]
/data/mariadb/bin/mysqld(+0xbf07c2)[0x5635b62f97c2]
/lib64/libpthread.so.0(+0x330de07aa1)[0x7f0e427dcaa1]
2021-02-19 11:47:18 153758 [Warning] Aborted connection 153758 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got timeout reading communication packets)
/lib64/libc.so.6(clone+0x6d)[0x7f0e4175bc4d]
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /data/mariadb_data
Fatal signal 11 while backtracing
2021-02-19 11:47:27 0 [Note] InnoDB: Using Linux native AIO
2021-02-19 11:47:27 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2021-02-19 11:47:27 0 [Note] InnoDB: Uses event mutexes
2021-02-19 11:47:27 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2021-02-19 11:47:27 0 [Note] InnoDB: Number of pools: 1
2021-02-19 11:47:27 0 [Note] InnoDB: Using SSE2 crc32 instructions
2021-02-19 11:47:27 0 [Note] InnoDB: Initializing buffer pool, total size = 96G, instances = 8, chunk size = 128M
2021-02-19 11:47:31 0 [Note] InnoDB: Completed initialization of buffer pool
2021-02-19 11:47:32 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2021-02-19 11:47:33 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=12397809980512
2021-02-19 11:47:42 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 861 row operations to undo
2021-02-19 11:47:42 0 [Note] InnoDB: Trx id counter is 119829882
2021-02-19 11:47:42 0 [Note] InnoDB: Starting final batch to recover 170415 pages from redo log.
2021-02-19 11:47:47 0 [Note] InnoDB: To recover: 168236 pages from log
2021-02-19 11:48:02 0 [Note] InnoDB: To recover: 154841 pages from log
2021-02-19 11:48:17 0 [Note] InnoDB: To recover: 146890 pages from log
2021-02-19 11:48:32 0 [Note] InnoDB: To recover: 139733 pages from log
2021-02-19 11:48:47 0 [Note] InnoDB: To recover: 132848 pages from log
2021-02-19 11:49:02 0 [Note] InnoDB: To recover: 125675 pages from log
2021-02-19 11:49:17 0 [Note] InnoDB: To recover: 118673 pages from log
2021-02-19 11:49:32 0 [Note] InnoDB: To recover: 111753 pages from log
2021-02-19 11:49:47 0 [Note] InnoDB: To recover: 104255 pages from log
2021-02-19 11:50:02 0 [Note] InnoDB: To recover: 97230 pages from log
2021-02-19 11:50:17 0 [Note] InnoDB: To recover: 89903 pages from log
2021-02-19 11:50:32 0 [Note] InnoDB: To recover: 82002 pages from log
2021-02-19 11:50:47 0 [Note] InnoDB: To recover: 74631 pages from log
2021-02-19 11:51:02 0 [Note] InnoDB: To recover: 68116 pages from log
2021-02-19 11:51:17 0 [Note] InnoDB: To recover: 61849 pages from log
2021-02-19 11:51:32 0 [Note] InnoDB: To recover: 54565 pages from log
2021-02-19 11:51:47 0 [Note] InnoDB: To recover: 43859 pages from log
2021-02-19 11:52:02 0 [Note] InnoDB: To recover: 37005 pages from log
2021-02-19 11:52:17 0 [Note] InnoDB: To recover: 30065 pages from log
2021-02-19 11:52:32 0 [Note] InnoDB: To recover: 23362 pages from log
2021-02-19 11:52:47 0 [Note] InnoDB: To recover: 16618 pages from log
2021-02-19 11:53:02 0 [Note] InnoDB: To recover: 9739 pages from log
2021-02-19 11:53:17 0 [Note] InnoDB: To recover: 1741 pages from log
2021-02-19 11:53:22 0 [Note] InnoDB: Last binlog file './mysql-bin.000001', position 7587824
2021-02-19 11:53:22 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
2021-02-19 11:53:22 0 [Note] InnoDB: Starting in background the rollback of recovered transactions
2021-02-19 11:53:22 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2021-02-19 11:53:22 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2021-02-19 11:53:22 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2021-02-19 11:53:22 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2021-02-19 11:53:22 0 [Note] InnoDB: Waiting for purge to start
2021-02-19 11:53:22 0 [Note] InnoDB: 10.4.17 started; log sequence number 12397810010280; transaction id 119829883
2021-02-19 11:53:22 0 [Note] InnoDB: Loading buffer pool(s) from /data/mariadb_data/ib_buffer_pool
2021-02-19 11:53:22 0 [Note] Plugin 'FEEDBACK' is disabled.
2021-02-19 11:53:22 0 [Warning] Plugin 'FEDERATED' is of maturity level gamma while the server is stable
2021-02-19 11:53:22 0 [Note] Server socket created on IP: '::'.
2021-02-19 11:53:22 0 [Warning] 'user' entry 'root@privacy-db' ignored in --skip-name-resolve mode.
2021-02-19 11:53:22 0 [Warning] 'proxies_priv' entry '@% root@privacy-db' ignored in --skip-name-resolve mode.
2021-02-19 11:53:22 0 [Note] Reading of all Master_info entries succeeded
2021-02-19 11:53:22 0 [Note] Added new Master_info '' to hash table
2021-02-19 11:53:22 0 [Note] /data/mariadb/bin/mysqld: ready for connections.
Version: '10.4.17-MariaDB-log'  socket: '/tmp/mysql.sock'  port: 3306  MariaDB Server
2021-02-19 11:53:26 0 [Note] InnoDB: Rolled back recovered transaction 119825411
2021-02-19 11:53:26 0 [Note] InnoDB: Rollback of non-prepared transactions completed
2021-02-19 11:53:30 34 [Warning] Aborted connection 34 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:53:30 33 [Warning] Aborted connection 33 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:53:30 32 [Warning] Aborted connection 32 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:53:30 31 [Warning] Aborted connection 31 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:53:30 30 [Warning] Aborted connection 30 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:53:30 28 [Warning] Aborted connection 28 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:53:32 35 [Warning] Aborted connection 35 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:53:32 36 [Warning] Aborted connection 36 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:53:32 49 [Warning] Aborted connection 49 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:53:32 38 [Warning] Aborted connection 38 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)
2021-02-19 11:53:32 37 [Warning] Aborted connection 37 to db: 'xxxDB' user: 'xxxuser' host: '192.168.10.77' (Got an error reading communication packets)'

발생로그를 확인해보니 특정 테이블에 insert를 할 때마다 DB가 요동치다가 결국 죽는 것으로 확인했다.
이 테이블은 파티션테이블로 전체데이터는 약 30억건 정도 있는 테이블인데..
인덱스는 깨진 것을 확인했다.

--> DB 업그레이드하다 인덱스가 깨진 것 같다. MyISAM에서는 흔한 증상인데 innoDB에서 발생하니 좀 의외이긴하다.

일단 기도하는 마음으로 count(*)를 날려본다. 조회가 안된다면 PK가 깨진것이다. --> 조회가 안되면... 아.. 오늘이 퇴사하는 날이네? 라고 생각하면 된다

(innodb엔진 테이블의 모든데이터는 pk를 기준으로 데이터를 쌓고 모든 데이터에 pk의 시퀀스가 있어 복구하기 쉽지 않다)

 

MariaDB [xxxDB]> select count(*) from TB_xxx partition (p202102);
+-----------+
| count(*)  |
+-----------+
| 295327224 |
+-----------+
1 row in set (11 min 40.089 sec)

 

다행 PK는 깨지지 않았다 ㅠ
그럼 second index 가 깨진 것이고 이것을 복구하는 방법은 
1) shutdown이 가능하다면 innodb 복구 1~6 중에 해서 복구가가능하고, 
2) shutdown이 불가능하면 테이블스키마를 새로파고 insert into~로 데이터만 가져다 복구하는 방법
3) .ibd 파일만가지고가서 tablespace 교체로 복구하는 방법이 있다.

 

데이터가 거의 3억건이라서 원래는 3)번을 해야하는데 파티션 테이블이고 생각보다 복잡해질 거 같아 2)으로 진행했다.
테이블 스키마를 새로생성하고 insert 를 진행하는데
그런데?... insert 가 되지 않았다;
어떤 문제인지 테이블에 데이터를 하나하나씩 검토중에 datetime 형식의 파일이 이상하게 들어가 있는 것을 확인했다;

 

5033-11-13 23:34:31 8675-08-18 08:29:33

 

데이터가 깨졌다. 
DB 버전업하고 데이터가 꺠진것이다.ㅠ 하지만 다행이다. 이미 통계쪽 백업은 배치로 만들어놓아서 당장 전전월의 raw 데이터가 필요하진 않다.
한달치만 살리면 된다.

일단 테이블을 교체해서 새로 들어오는 데이터는 정상적으로 쌓이는 거을 확인했다. 복구는 개발쪽과 협의해서 raw 파일은 있으니 월말 전까지만 다시 insert 하면 된다.

추가로 확인해보니 DB 10.4.17로 업그레이드후.. 나머지 테이블은 1000만건, 3000만건 등등.. 멀쩡한데, 

30억건 6억건 되는 테이블 2개만 datatime 컬럼 데이터가 깨졌다.ㅠ
이 정도 큰 대용량 테이블은 제대로 업그레이드 지원이 되지 않은 거 같다.

아.. 새벽 5시네 이제 별일 없기를..
1차 대응 끝.

 

반응형
블로그 이미지

dung beetle

취미는 데이터 수집 직업은 MYSQL과 함께 일하는 DBA의 소소한 일상 이야기

,