State Key Laboratory of Intelligent Technology and Systems

Information Retrieval Group

THUIRDB: High Performance Key-Value DB

Overview:

THUIRDB is a C++ library for high-performance persistent key-value storage and high-speed queries on a single machine.

For example, consider the following corpus file (corpus_file):

Penny <-> liang

tsinghua <-> university

...

google <-> search engine

On each line, the left-hand side is the key and the right-hand side is the value.

Once the storage operation is complete, enter any key (e.g., tsinghua) and the system quickly returns the corresponding value (e.g., university).

 

Advantages:

High index compression (the index consumes 1 to 2 bits per key-value pair on average)

Scales well with computing resources

Supports large data volumes (in theory, with 4 GB of memory, a database of 10 billion records can be created and queried quickly, with at most one disk read per lookup)

Easy to use (no dependency on other specific libraries)
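
As a rough consistency check of the figures above, the following sketch computes how much memory an index of 1 to 2 bits per record would need for 10 billion records. It covers the index only, not the stored keys and values, and it reads "4G" as 4 GiB; both the function name index_bytes and that reading are assumptions made for illustration.

```cpp
#include <cstdint>

// Index size in bytes for `records` key-value pairs at `bits_per_record`
// index bits each, rounded up to a whole byte.
constexpr std::uint64_t index_bytes(std::uint64_t records,
                                    std::uint64_t bits_per_record) {
    return (records * bits_per_record + 7) / 8;
}

constexpr std::uint64_t kRecords = 10000000000ULL;             // 10 billion records
constexpr std::uint64_t kMemory = 4ULL * 1024 * 1024 * 1024;   // assumed: 4 GiB

// 1 bit per record  -> 1.25 GB of index
static_assert(index_bytes(kRecords, 1) == 1250000000ULL, "1 bit per record");
// 2 bits per record -> 2.5 GB of index
static_assert(index_bytes(kRecords, 2) == 2500000000ULL, "2 bits per record");
// Even the upper bound fits in the assumed 4 GiB of memory.
static_assert(index_bytes(kRecords, 2) < kMemory, "index fits in memory");
```

Under these assumptions the whole index stays in memory, which is consistent with the claim that a lookup needs at most one disk read to fetch the value.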

Features:

THUIRDB focus:

Read-only scenarios: a large data set (100 million to 10 billion records) is created once and not modified afterwards.

Its main advantages are fast database creation and fast queries.

Main functions:

Batch database creation

Queries and concurrent queries

Sequential database scans
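
The three functions above fit a build-once, read-only design. The sketch below illustrates that pattern with a sorted in-memory vector and binary search; it is an illustrative stand-in, not THUIRDB's data structure or interface, and the class name ReadOnlyStore and its methods are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Illustrative stand-in (not THUIRDB's implementation): a store that is
// built once from a batch of records and only read afterwards.
class ReadOnlyStore {
public:
    using Record = std::pair<std::string, std::string>;

    // Batch creation: take all records at once and sort them by key.
    // stable_sort keeps the input order among records with equal keys.
    explicit ReadOnlyStore(std::vector<Record> records)
        : records_(std::move(records)) {
        std::stable_sort(records_.begin(), records_.end(),
                         [](const Record& a, const Record& b) {
                             return a.first < b.first;
                         });
    }

    // Query: binary search by key. Returns nullptr if the key is absent.
    const std::string* find(const std::string& key) const {
        const auto it = std::lower_bound(
            records_.begin(), records_.end(), key,
            [](const Record& r, const std::string& k) { return r.first < k; });
        if (it == records_.end() || it->first != key) return nullptr;
        return &it->second;
    }

    // Sequential scan: visit every record in key order.
    template <typename Visitor>
    void scan(Visitor visit) const {
        for (const Record& r : records_) visit(r.first, r.second);
    }

private:
    std::vector<Record> records_;
};
```

Because nothing is modified after construction and both find and scan are const, several threads can query the same ReadOnlyStore concurrently without locking. That is the property a read-only workload makes available.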

Restrictions:

THUIRDB is a foundational library meant to serve as a base on which more features can be built; many features are not yet supported or are still being validated.

Other restrictions:

No SQL support

Only a C-language interface is provided

Key: maximum 512 bytes (can be raised if larger keys are needed)

Value: maximum 4K bytes (can be raised if larger values are needed)
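
Callers may want to validate records against these default limits before a batch build. The following is a minimal sketch of such a pre-check; fits_default_limits is a hypothetical helper, not a THUIRDB function, and "4K" is assumed here to mean 4096 bytes.

```cpp
#include <cstddef>
#include <string>

// Default limits quoted in the restrictions above.
constexpr std::size_t kMaxKeyBytes = 512;
constexpr std::size_t kMaxValueBytes = 4 * 1024;  // assumption: "4K" = 4096 bytes

// Hypothetical pre-check: true if both key and value are within the limits.
bool fits_default_limits(const std::string& key, const std::string& value) {
    return key.size() <= kMaxKeyBytes && value.size() <= kMaxValueBytes;
}
```

Records that fail the check can be reported or skipped before database creation starts.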

Academic communications:

2011-04-19 invited by the Natural Language Processing group at the Institute of Computing to give a talk on the principles of THUIRDB and hold a technical exchange.

2011-05-31 invited by the Information Retrieval group at the Institute of Computing to give a talk on the principles of THUIRDB and hold a technical exchange.

2011-06-14 invited to the NetEase Hangzhou Institute for a talk on the principles of THUIRDB and a technical exchange.

2011-06-17 invited to the Shanghai Stock Exchange for a talk on the principles of THUIRDB and a technical exchange.

2011-06-21 invited to Taobao in Beijing for a talk on the principles of THUIRDB and a technical exchange.

2011-06-24 invited by the database group at Tsinghua University to present how THUIRDB works.

2011-06-28 invited to Should Search Technology to present how THUIRDB works.

2011-06-29 invited to Sina Weibo to present how THUIRDB works.

2011-07-10 invited to take part in the Taobao technical carnival and give a guest talk.

2011-07-10 invited to Thailand for the principles of science and technology for THUIRDB reports and technical exchanges.

2011-11-09 invited to People Search for a talk on the principles of THUIRDB and a technical exchange.

2011-11-14 invited to Microsoft Research Asia to give a talk on the principles of THUIRDB and hold a technical exchange.

2011-12-03 invited to participate in Hadoop and present how THUIRDB works.

2012-11-30 took part in the Chinese information retrieval conference and gave a talk.

Copyright:

Copyright 2007.8 State Key Laboratory of Intelligent Technology and Systems, All Rights Reserved

Software Downloads:

Download link: download (6.1KB). See the ReadMe.txt included in the download to learn how to use the library. New users can run sh help.sh for a one-step trial.

Published papers:

Download Link: pdf

Online systems currently supported:

1) Weibo people search online system: xunren.thuir.org

2) GoOnReading System: duxiaqu.com

3) Multi-language Wordbank: cikuapi.com