Mroonga\n2016

: subtitle

Super fast Japanese full text search for MariaDB

: author

Kouhei Sutou

: institution

ClearCode Inc.

: content-source

MariaDB Community Event in Tokyo

: date

2016-07-21

: allotted-time

30m

: theme

.

Mroonga

* Pronunciation: むるんが (múlúnɡά)
* Storage engine
* (('wait'))Bundled in MariaDB
  * No need to install Mroonga separately

Characteristics

* (('wait'))
  Super fast Japanese full text search(('note:(works for all languages)'))
* (('wait'))
  Super fast processing by column store architecture
* (('wait'))
  Easy to use even for full text search beginners
* (('wait'))
  Full text search specialists can get more out of it

Super fast Japanese full text search

(1) Benchmark
(2) Why Mroonga is fast

Benchmark environment

* Target: Japanese Wikipedia
* Number of records: about 1.85 million
* Data size: about 7GB
* Memory: 4GB, SSD: 250GB(('note:(ConoHa)'))

Supplement

* MySQL 5.7 is used
  * InnoDB in MariaDB doesn't support Japanese full text search yet
* Treat benchmark results by others as a reference only
  * When evaluating, run benchmarks with real data in your real environment!

(('note:Details:'))\n (('note:github.com/groonga/wikipedia-search/issues/4'))

Search 1

(('tag:center')) Keyword: テレビアニメ\n (('note:(TV animation; about 23K hits)'))

# RT
delimiter = [|]

InnoDB ngram | 3m2s
InnoDB MeCab | 6m20s
Mroonga:((*1*)) | 0.11s

Search 2

(('tag:center')) Keyword: データベース\n (('note:(Database; about 17K hits)'))

# RT
delimiter = [|]

InnoDB ngram | 36s
InnoDB MeCab:((*1*)) | 0.03s
Mroonga:((*2*)) | 0.09s

Search 3

(('tag:center')) Keyword: PostgreSQL OR MySQL\n (('note:(About 400 hits)'))

# RT
delimiter = [|]

InnoDB ngram | N/A(Error)
InnoDB MeCab:((*1*)) | 0.005s
Mroonga:((*2*)) | 0.028s

Search 4

(('tag:center')) Keyword: 日本\n (('note:(Japan; about 630K hits)'))

# RT
delimiter = [|]

InnoDB ngram | 1.3s
InnoDB MeCab | 1.3s
Mroonga:((*1*)) | 0.21s

Search wrap-up

* (('wait'))Mroonga: Always fast
* (('wait'))InnoDB FTS MeCab
  * Fast only when the query fits\n
    (('note:Fast only for one-token queries'))
* (('wait'))InnoDB FTS ngram
  * Always slow

Why Mroonga is fast

* Optimized inverted index implementation
  * (('wait'))Two-level data compression
  * (('wait'))Fast posting list search
  * (('wait'))Not only search but also update is fast

(('wait')) (('note:Uses Groonga, a full text search engine that has been developed for over 11 years'))

More reasons why Mroonga is fast

* Optimizations based on column store architecture
  * Point 1: Reduce needless I/O
  * Point 2: Localize I/O

Column store

# image
# src = images/column-store.svg
# relative_height = 100

Access only the needed columns

# coderay sql
-- Only a is read for the result
SELECT a
  FROM table
-- Only c is read for the condition
 WHERE c = XXX;
-- b isn't accessed at all

Reduced I/O

# image
# src = images/not-access-to-needless-columns.svg
# relative_height = 100

Row count

# coderay sql
-- No column values are needed
SELECT COUNT(*)
  FROM table
-- Only the full text search index of c is accessed
 WHERE MATCH(c)
     AGAINST('+keyword' IN BOOLEAN MODE);
-- a, b and c aren't accessed

Reduced I/O

# image
# src = images/count-star.svg
# relative_height = 100
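
One way to check whether the row count optimization was actually applied is to watch a Mroonga status variable. The sketch below assumes the (({Mroonga_count_skip})) status variable described in the Mroonga documentation, and reuses the placeholder names (({table})), (({c})) and (({keyword})) from the slide above.

# coderay sql
-- Note the current counter value
SHOW STATUS LIKE 'Mroonga_count_skip';
-- Run the count query (placeholder table and column names)
SELECT COUNT(*)
  FROM table
 WHERE MATCH(c)
     AGAINST('+keyword' IN BOOLEAN MODE);
-- If the counter increased, the count was answered
-- without reading column values
SHOW STATUS LIKE 'Mroonga_count_skip';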

(({ORDER BY LIMIT}))

# coderay sql
SELECT *
  FROM table
 WHERE MATCH(c)
     AGAINST('+keyword' IN BOOLEAN MODE)
-- Mroonga processes ORDER BY LIMIT
-- instead of MariaDB
-- → Mroonga returns only 10 records
--    to MariaDB instead of all matched records
 ORDER BY a LIMIT 10;

Optimized (({ORDER BY LIMIT}))

* (('wait'))Search by Mroonga
  * Localize I/O by per column processing\n
    (('note:(when no index is used)'))
* (('wait'))Sort by Mroonga
  * Localize I/O by per column processing
* (('wait'))(({OFFSET}))/(({LIMIT})) by Mroonga
Per column processing is fast

# image
# src = images/per-column-processing.svg
# relative_height = 100

Optimization wrap-up

* The inverted index implementation is fast
  * Both search and update are fast
* The column store architecture makes it fast
  * Points: Reduce I/O and localize I/O

Easy to use even for beginners

* (('wait'))Easy to install
* (('wait'))Usable with only MySQL standard features

Easy to install

* (('wait'))Bundled in MariaDB
* (('wait'))Apt/Yum repositories
* (('wait'))Windows binaries that include MariaDB
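
With a MariaDB build that bundles Mroonga, the engine usually still has to be loaded once before use. The following sketch follows the plugin loading statement given in the MariaDB documentation; whether it is needed depends on how your package was built and configured.

# coderay sql
-- Load the bundled Mroonga plugin (once per server)
INSTALL SONAME 'ha_mroonga';
-- Mroonga should now be listed as an available engine
SHOW ENGINES;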

Usable with only MySQL standard features

# coderay sql
-- Create
CREATE TABLE table (
  -- ...,
  FULLTEXT INDEX (column)
) ENGINE=Mroonga;

Usable with only MySQL standard features

# coderay sql
-- Convert
ALTER TABLE table
  ADD FULLTEXT INDEX (column),
  ENGINE=Mroonga;

Usable with only MySQL standard features

# coderay sql
SELECT * FROM table
  WHERE
    MATCH(column)
    AGAINST('+keyword'
            IN BOOLEAN MODE);
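
The three steps above can be put together into one small session. This is only a sketch: the table (({diaries})), its columns and the sample rows are hypothetical names chosen for illustration.

# coderay sql
-- Hypothetical table for illustration
CREATE TABLE diaries (
  id INT PRIMARY KEY AUTO_INCREMENT,
  content TEXT,
  FULLTEXT INDEX (content)
) ENGINE=Mroonga DEFAULT CHARSET=utf8mb4;

INSERT INTO diaries (content) VALUES
  ('It will be rainy today.'),
  ('It will be fine tomorrow.');

-- Should return only the row that contains "fine"
SELECT * FROM diaries
  WHERE
    MATCH(content)
    AGAINST('+fine' IN BOOLEAN MODE);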

Features for specialists

* (('wait'))
  Customizable
  * The default values are already suitable\n
    (('note:→ Beginners don't need to customize anything'))
* (('wait'))
  Specialists can use more Groonga features\n
  (('note:(Fast and feature-rich)'))

Change normalizer

# coderay sql
CREATE TABLE table (
  -- ...,
  FULLTEXT INDEX (column)
    -- Specify a parameter as an index comment
    COMMENT 'normalizer "NormalizerAuto"'
) ENGINE=Mroonga;

Change normalizer

# coderay sql
CREATE TABLE table (
  -- ...,
  FULLTEXT INDEX (column)
    -- MariaDB:
    -- Custom parameters can be used
    NORMALIZER='NormalizerAuto'
) ENGINE=Mroonga;
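
The tokenizer can be customized in the same comment-based way. This sketch assumes the (({tokenizer})) comment parameter and the (({TokenMecab})) tokenizer described in the Mroonga documentation; (({TokenMecab})) is only available when Groonga was built with MeCab support, and (({table})) and (({column})) are placeholders as before.

# coderay sql
CREATE TABLE table (
  -- ...,
  FULLTEXT INDEX (column)
    -- Assumption: MeCab based tokenizer is available
    COMMENT 'tokenizer "TokenMecab"'
) ENGINE=Mroonga;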

Use full Groonga search features

# coderay sql
SELECT * FROM table
  WHERE
    -- "c1" is meaningless with the "*SS" pragma
    MATCH(c1)
    -- "*SS" is a pragma to use
    -- full Groonga search features
    -- Multiple indexes can be used in a query
    AGAINST('*SS c1 @ "keyword" && c2 < 100'
            IN BOOLEAN MODE);
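
(({*SS})) is not the only pragma. As a further sketch, the Mroonga documentation also describes a (({*D+})) pragma that changes the default operator of a boolean mode query; (({keyword1})) and (({keyword2})) are placeholders.

# coderay sql
SELECT * FROM table
  WHERE
    MATCH(column)
    -- "*D+" makes AND the default operator:
    -- both keyword1 and keyword2 must match
    AGAINST('*D+ keyword1 keyword2'
            IN BOOLEAN MODE);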

Future

* (('wait'))
  Support the latest features
  * Full text search against JSON\n
    (('note:(Storing/fetching JSON type data is already supported)'))
  * virtual column/generated column
* (('wait'))
  Bundle the latest Mroonga in MariaDB

Bundle the latest Mroonga

* (('wait'))
  Mroonga is released monthly
* (('wait'))
  MariaDB 10.2.1 bundles Mroonga ((*5.04*))
  * The latest Mroonga is 6.06
  * Mroonga supports MariaDB 10.2 since ((*6.03*))
  * How can we improve this?

Wrap-up 1

* (('wait'))
  Super fast Japanese full text search(('note:(works for all languages)'))
* (('wait'))
  Super fast processing by column store architecture
* (('wait'))
  Easy to use even for full text search beginners
* (('wait'))
  Full text search specialists can get more out of it

Wrap-up 2

* (('wait'))
  We will continue to improve Mroonga
* (('wait'))
  MariaDB will bundle the latest Mroonga

(('wait')) Mroonga is the best for full text search on MariaDB!