Introduction
HBase is a column-oriented database that is an open-source implementation of Google's BigTable storage architecture. It can manage structured and semi-structured data and has built-in features such as scalability, versioning, compression and garbage collection. Since it uses write-ahead logging and distributed configuration, it can provide fault tolerance and quick recovery from individual server failures. HBase is built on top of Hadoop/HDFS, and the data stored in HBase can be manipulated using Hadoop's MapReduce capabilities. Let's now take a look at how HBase (a column-oriented database) differs from some other data structures and concepts that we are familiar with, starting with row-oriented vs. column-oriented data stores. In a row-oriented data store, a row is a unit of data that is read or written together. In a column-oriented data store, the data in a column is stored together and can hence be retrieved quickly.
Go to HBase mode:
$ hbase shell
List all the tables:
hbase> list
Create an HBase table 'cars' with one column family 'vi':
hbase> create 'cars', 'vi'
Let's insert 3 column qualifiers (make, model, year) and the associated values into the first row (row1):
hbase> put 'cars', 'row1', 'vi:make', 'BMW'
hbase> put 'cars', 'row1', 'vi:model', '5 series'
hbase> put 'cars', 'row1', 'vi:year', '2012'
Now let's add a second row:
hbase> put 'cars', 'row2', 'vi:make', 'Ferrari'
hbase> put 'cars', 'row2', 'vi:model', 'e series'
hbase> put 'cars', 'row2', 'vi:year', '2012'
Now let's add a third row:
hbase> put 'cars', 'row3', 'vi:make', 'Honda'
hbase> put 'cars', 'row3', 'vi:model', 'f series'
hbase> put 'cars', 'row3', 'vi:year', '2013'
Scan the table:
hbase> scan 'cars'
The next scan we'll run limits the results to the make column qualifier:
hbase> scan 'cars', {COLUMNS => ['vi:make']}
Limit the scan to 1 row to demonstrate how LIMIT works:
hbase> scan 'cars', {COLUMNS => ['vi:make'], LIMIT => 1}
We'll start by getting all columns in row1:
hbase> get 'cars', 'row1'
You should see output similar to:
COLUMN                CELL
 vi:make              timestamp=1344817012999, value=BMW
 vi:model             timestamp=1344817020843, value=5 series
 vi:year              timestamp=1344817033611, value=2012
3 row(s) in 0.0080 seconds
Disable and drop the table:
hbase> disable 'cars'
hbase> drop 'cars'
Exit the shell:
hbase> exit
Row-oriented data stores:
- Data is stored and retrieved one row at a time and hence could read unnecessary data if only some of the data in a row is required.
- Easy to read and write records
- Well suited for OLTP systems
- Not efficient in performing operations applicable to the entire dataset and hence aggregation is an expensive operation
- Typical compression mechanisms provide less effective results than those on column-oriented data stores
Column-oriented data stores:
- Data is stored and retrieved in columns and hence can read only relevant data if only some data is required
- Read and Write are typically slower operations
- Well suited for OLAP systems
- Can efficiently perform operations applicable to the entire dataset and hence enables aggregation over many rows and columns
- Permits high compression rates due to few distinct values in columns
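To make the difference concrete, here is a small Python sketch (illustrative only, not HBase code) that stores the same records both row-wise and column-wise, and shows why a column-wise layout makes aggregating one field cheap:

```python
# Illustrative sketch: the same records in a row-oriented and a
# column-oriented layout (plain Python, not HBase itself).
records = [
    {"make": "BMW", "model": "5 series", "year": 2012},
    {"make": "Ferrari", "model": "e series", "year": 2012},
    {"make": "Honda", "model": "f series", "year": 2013},
]

# Row-oriented: each row is stored (and read) as one unit.
row_store = list(records)

# Column-oriented: each column's values are stored together.
col_store = {key: [r[key] for r in records] for key in records[0]}

# Aggregating one field from the row store touches every full row...
avg_year_rows = sum(r["year"] for r in row_store) / len(row_store)

# ...while the column store reads only the 'year' column.
avg_year_cols = sum(col_store["year"]) / len(col_store["year"])

assert avg_year_rows == avg_year_cols
print(avg_year_cols)
```

The column store also groups many similar values together (e.g. all years), which is why it compresses so much better than a row layout.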
Relational Databases vs. HBase
When talking of data
stores, we first think of Relational Databases with structured data
storage and a sophisticated query engine. However, a Relational Database
incurs a big penalty to improve performance as the data size increases.
HBase, on the other hand, is designed from the ground up to provide
scalability and partitioning to enable efficient data structure
serialization, storage and retrieval.
Differences between a Relational Database and HBase are:
Relational Database:
- Is Based on a Fixed Schema
- Is a Row-oriented datastore
- Is designed to store Normalized Data
- Contains thin tables
- Has no built-in support for partitioning.
HBase:
- Is Schema-less
- Is a Column-oriented datastore
- Is designed to store Denormalized Data
- Contains wide and sparsely populated tables
- Supports Automatic Partitioning
HDFS vs. HBase
HDFS is a distributed
file system that is well suited for storing large files. It’s designed
to support batch processing of data but doesn’t provide fast individual
record lookups. HBase is built on top of HDFS and is designed to provide
access to single rows of data in large tables.
Differences between HDFS and HBase are
HDFS:
- Is suited for high-latency batch processing operations
- Data is primarily accessed through MapReduce
- Is designed for batch processing and hence doesn’t have a concept of random reads/writes
HBase:
- Is built for Low Latency operations
- Provides access to single rows from billions of records
- Data is accessed through shell commands, Client APIs in Java, REST, Avro or Thrift
HBase Architecture
The HBase Physical
Architecture consists of servers in a Master-Slave relationship as shown
below. Typically, the HBase cluster has one Master node, called HMaster
and multiple Region Servers called HRegionServer. Each Region Server
contains multiple Regions – HRegions.
Just like in a
Relational Database, data in HBase is stored in Tables and these Tables
are stored in Regions. When a Table becomes too big, the Table is
partitioned into multiple Regions. These Regions are assigned to Region
Servers across the cluster. Each Region Server hosts roughly the same
number of Regions.
The HMaster in HBase is responsible for:
- Performing Administration
- Managing and Monitoring the Cluster
- Assigning Regions to the Region Servers
- Controlling the Load Balancing and Failover
On the other hand, the HRegionServers perform the following work:
- Hosting and managing Regions
- Splitting the Regions automatically
- Handling the read/write requests
- Communicating with the Clients directly
Each Region Server
contains a Write-Ahead Log (called HLog) and multiple Regions. Each
Region in turn is made up of a MemStore and multiple StoreFiles (HFile).
The data lives in these StoreFiles in the form of Column Families
(explained below). The MemStore holds in-memory modifications to the
Store (data).
The mapping of Regions
to Region Server is kept in a system table called .META. When trying to
read or write data from HBase, the clients read the required Region
information from the .META table and directly communicate with the
appropriate Region Server. Each Region is identified by the start key
(inclusive) and the end key (exclusive).
HBase Data Model
The Data Model in HBase
is designed to accommodate semi-structured data that could vary in field
size, data type and columns. Additionally, the layout of the data model
makes it easier to partition the data and distribute it across the
cluster. The Data Model in HBase is made of different logical components
such as Tables, Rows, Column Families, Columns, Cells and Versions.
Tables – The
HBase Tables are more like logical collection of rows stored in separate
partitions called Regions. As shown above, every Region is then served
by exactly one Region Server. The figure above shows a representation of
a Table.
Rows – A row is
one instance of data in a table and is identified by a rowkey. Rowkeys
are unique in a Table and are always treated as a byte[].
Column Families –
Data in a row are grouped together as Column Families. Each Column
Family has one or more Columns, and the Columns in a family are stored
together in a low-level storage file known as an HFile. Column Families
form the basic unit of physical storage to which certain HBase features
like compression are applied. Hence it's important that proper care be
taken when designing the Column Families in a table. The table above shows
Customer and Sales Column Families. The Customer Column Family is made
up of 2 columns – Name and City, whereas the Sales Column Family is made
up of 2 columns – Product and Amount.
Columns – A
Column Family is made of one or more columns. A Column is identified by a
Column Qualifier that consists of the Column Family name concatenated
with the Column name using a colon – example: columnfamily:columnname.
There can be multiple Columns within a Column Family and Rows within a
table can have varied number of Columns.
Cell – A Cell
stores data and is essentially a unique combination of rowkey, Column
Family and the Column (Column Qualifier). The data stored in a Cell is
called its value and the data type is always treated as byte[].
Version – The
data stored in a cell is versioned and versions of data are identified
by the timestamp. The number of versions of data retained in a column
family is configurable and this value by default is 3.
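The versioning behavior described above can be sketched in plain Python (an illustrative model, not HBase internals): each cell keeps a list of (timestamp, value) pairs, newest first, trimmed to the configured maximum number of versions.

```python
# Illustrative model of HBase cell versioning (not actual HBase code).
MAX_VERSIONS = 3  # HBase's default number of retained versions

# A cell is a list of (timestamp, value), kept in decreasing timestamp order.
cell = []

def put(cell, timestamp, value, max_versions=MAX_VERSIONS):
    cell.append((timestamp, value))
    cell.sort(key=lambda tv: tv[0], reverse=True)  # newest first
    del cell[max_versions:]                        # drop the oldest extras

def get(cell, timestamp=None):
    """Return the newest value, or the newest value at/before timestamp."""
    for ts, value in cell:
        if timestamp is None or ts <= timestamp:
            return value
    return None

for ts, v in [(1, "a"), (2, "b"), (3, "c"), (4, "d")]:
    put(cell, ts, v)

print(get(cell))   # the newest value
print(len(cell))   # at most MAX_VERSIONS versions are retained
```

As in HBase, reading without a timestamp returns the latest version, and writing a fourth version silently evicts the oldest one.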
Apache HBase
HBase is an open source, non-relational, distributed database
modeled after Google's BigTable and is written in Java. It is developed
as part of Apache Software Foundation's Apache Hadoop project and runs
on top of HDFS (Hadoop Distributed Filesystem), providing BigTable-like
capabilities for Hadoop.
HBase features compression, in-memory operation, and Bloom filters on a per-column basis as outlined in the original BigTable paper. Tables in HBase can serve as the input and output for MapReduce jobs run in Hadoop, and may be accessed through the Java API but also through REST, Avro or Thrift gateway APIs.
What is HBase?
HBase is a column-oriented database management system that runs on top of HDFS. It is well suited for sparse data sets, which are common in many big data use cases. Unlike relational database systems, HBase does not support a structured query language like SQL; in fact, HBase isn’t a relational data store at all. HBase applications are written in Java much like a typical MapReduce application. HBase does support writing applications in Avro, REST, and Thrift.
An HBase system comprises a set of tables. Each table contains rows and columns, much like a traditional database. Each table must have an element defined as a Primary Key, and all access attempts to HBase tables must use this Primary Key. An HBase column represents an attribute of an object; for example, if the table is storing diagnostic logs from servers in your environment, where each row might be a log record, a typical column in such a table would be the timestamp of when the log record was written, or perhaps the server name where the record originated. In fact, HBase allows for many attributes to be grouped together into what are known as column families, such that the elements of a column family are all stored together. This is different from a row-oriented relational database, where all the columns of a given row are stored together. With HBase you must predefine the table schema and specify the column families. However, it’s very flexible in that new columns can be added to families at any time, making the schema flexible and therefore able to adapt to changing application requirements.
Just as HDFS has a NameNode and slave nodes, and MapReduce has JobTracker and TaskTracker slaves, HBase is built on similar concepts. In HBase a master node manages the cluster and region servers store portions of the tables and perform the work on the data. In the same way that HDFS has some enterprise concerns due to the availability of the NameNode, HBase is also sensitive to the loss of its master node.
What is a NoSQL Database?
A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Motivations for this approach include simplicity of design, horizontal scaling and finer control over availability. NoSQL databases are often highly optimized key–value stores intended primarily for simple retrieval and appending operations, whereas an RDBMS is intended as a general purpose data store. There will thus be some operations where NoSQL is faster and some where an RDBMS is faster. NoSQL databases are finding significant and growing industry use in big data and real-time web applications.[1] NoSQL systems are also referred to as "Not only SQL" to emphasize that they may in fact allow SQL-like query languages to be used.
HBase Tables and Regions
A table is made up of any number of regions.
A region is specified by its startKey and endKey.
- Empty table: (Table, NULL, NULL)
- Two-region table: (Table, NULL, "com.ABC.www") and (Table, "com.ABC.www", NULL)
Each region may live on a different node and is made up of several HDFS
files and blocks, each of which is replicated by Hadoop. HBase uses HDFS
as its reliable storage layer; HDFS handles checksums, replication and
failover.
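The start/end-key scheme above can be illustrated with a small Python sketch (hypothetical helper names, not the HBase client API): given regions sorted by start key, a row key is routed to the region whose half-open [startKey, endKey) range contains it.

```python
import bisect

# Illustrative region map for the two-region table above
# ("" stands for the NULL start key, None for the NULL end key).
regions = [
    {"start": "", "end": "com.ABC.www", "server": "rs1"},
    {"start": "com.ABC.www", "end": None, "server": "rs2"},
]

def find_region(regions, row_key):
    """Route a row key to its region (start key inclusive, end key exclusive)."""
    starts = [r["start"] for r in regions]
    idx = bisect.bisect_right(starts, row_key) - 1
    region = regions[idx]
    # Sanity check: the key must fall before the region's exclusive end key.
    assert region["end"] is None or row_key < region["end"]
    return region

print(find_region(regions, "com.AAA.www")["server"])
print(find_region(regions, "com.XYZ.www")["server"])
```

This is essentially what an HBase client does after reading the region boundaries from the .META. table: a binary search over start keys, then a direct request to the owning Region Server.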
HBase Tables:
- Tables are sorted by Row in lexicographical order
- Table schema only defines its column families
- Each family consists of any number of columns
- Each column consists of any number of versions
- Columns only exist when inserted, NULLs are free
- Columns within a family are sorted and stored together
- Everything except table names is a byte[]
- HBase table format: (Row, Family:Column, Timestamp) -> Value
HBase consists of:
- A Java API and gateways for REST, Thrift and Avro
- A Master that manages the cluster
- RegionServers that manage data
- ZooKeeper, which serves as the "neural network" and coordinates the cluster
Data is stored in memory and flushed to disk at regular intervals or based on size
- Small flushes are merged in the background to keep the number of files small
- Reads check the memory stores first and then the disk-based files
- Deletes are handled with "tombstone" markers
MemStores:
After data is written to the WAL, the RegionServer saves KeyValues in the memory store
- Flushed to disk based on size; see hbase.hregion.memstore.flush.size
- The default flush size is 64MB
- Uses a snapshot mechanism to write the flush to disk while still serving reads from it and accepting new data at the same time
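A minimal sketch of this flush-on-size behavior in Python (an illustrative model with invented names, not HBase internals): writes accumulate in an in-memory map and are snapshotted out to an immutable "store file" once a size threshold is crossed, so new writes can continue into a fresh memstore.

```python
# Illustrative MemStore model (invented names, not HBase internals).
class MemStore:
    def __init__(self, flush_size=4):
        self.flush_size = flush_size   # flush threshold (entry count here;
                                       # real HBase uses bytes, 64MB default)
        self.active = {}               # in-memory KeyValues
        self.store_files = []          # flushed, immutable "HFiles"

    def put(self, key, value):
        self.active[key] = value
        if len(self.active) >= self.flush_size:
            self.flush()

    def flush(self):
        # Snapshot the active store and write it out as a sorted file,
        # then start over with an empty memstore.
        snapshot, self.active = self.active, {}
        self.store_files.append(dict(sorted(snapshot.items())))

    def get(self, key):
        # Reads check the memstore first, then store files, newest first.
        if key in self.active:
            return self.active[key]
        for sf in reversed(self.store_files):
            if key in sf:
                return sf[key]
        return None

ms = MemStore(flush_size=3)
for i in range(5):
    ms.put(f"row{i}", i)
print(len(ms.store_files), ms.get("row0"), ms.get("row4"))
```

Note how a read can be served from either side of the flush boundary: row0 now lives in a flushed file while row4 is still in memory.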
Compactions:
Two types: Minor and Major Compactions
Minor Compactions:
- Combine the last "few" flushes
- Triggered by the number of storage files
Major Compactions:
- Rewrite all storage files
- Drop deleted data and values exceeding the TTL and/or the number of versions
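A major compaction can be sketched as merging every store file into one, dropping tombstoned cells along the way (illustrative Python, not HBase internals; TOMBSTONE is an invented marker standing in for HBase's delete marker):

```python
# Illustrative major compaction (invented marker, not HBase internals).
TOMBSTONE = object()  # stand-in for HBase's "tombstone" delete marker

def major_compact(store_files):
    """Merge all store files into one; newer files win, tombstones are dropped."""
    merged = {}
    for sf in store_files:            # oldest first, newer values overwrite
        merged.update(sf)
    return {k: v for k, v in sorted(merged.items()) if v is not TOMBSTONE}

files = [
    {"row1": "a", "row2": "b"},       # oldest flush
    {"row2": "b2", "row3": "c"},      # newer flush updates row2
    {"row1": TOMBSTONE},              # delete of row1 via tombstone
]
compacted = major_compact(files)
print(compacted)
```

A minor compaction would do the same merge over only the last few files and would keep the tombstones, since an older version of the deleted cell might still exist in a file outside the merge.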
Key Cardinality:
The best performance is gained from queries that use the row key, since row keys determine how data is partitioned and looked up.
Fold, Store, and Shift:
All values are stored with their full coordinates, including: Row Key, Column Family, Column Qualifier, and Timestamp
- Folds columns into a "row per column" layout
- NULLs are cost free, as nothing is stored
- Versions are multiple "rows" in the folded table
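This folding can be sketched in Python (illustrative only): each logical row becomes one physical entry per populated column, keyed by the full (row, family:qualifier, timestamp) coordinates, so absent columns store nothing at all.

```python
# Illustrative folding of logical rows into coordinate-keyed entries.
logical_rows = {
    "row1": {"cf:name": "Alice", "cf:city": "Paris"},
    "row2": {"cf:name": "Bob"},   # no city: nothing is stored for it
}

TS = 1344817012999  # example timestamp

# Folded form: (row, family:qualifier, timestamp) -> value
folded = {
    (row, column, TS): value
    for row, columns in logical_rows.items()
    for column, value in columns.items()
}

print(len(folded))  # one entry per populated column only
```

A second version of a cell would simply be another entry with the same row and column but a different timestamp, which is why versions behave like extra "rows" in the folded table.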
HBase Data Model
These six concepts form the foundation of HBase.
Table: HBase organizes data into tables. Table names are Strings and composed of characters that are safe for use in a file system path.
Row: Within a table, data is stored according to its row. Rows are identified uniquely by their rowkey. Rowkeys don't have a data type and are always treated as a byte[].
Column family: Data within a row is grouped by column family. Column families also impact the physical arrangement of data stored in HBase. For this reason, they must be defined up front and aren't easily modified. Every row in a table has the same column families, although a row need not store data in all its families. Column family names are Strings and composed of characters that are safe for use in a file system path.
Column qualifier: Data within a column family is addressed via its column qualifier, or column. Column qualifiers need not be specified in advance. Column qualifiers need not be consistent between rows. Like rowkeys, column qualifiers don't have a data type and are always treated as a byte[].
Cell: A combination of rowkey, column family, and column qualifier uniquely identifies a cell. The data stored in a cell is referred to as that cell's value. Values also don't have a data type and are always treated as a byte[].
Version: Values within a cell are versioned. Versions are identified by their timestamp, a long. When a version isn't specified, the current timestamp is used as the basis for the operation. The number of cell value versions retained by HBase is configured via the column family. The default number of cell versions is three. Versions are stored in decreasing order of timestamp.
HBase Shell Commands
whoami:
Show the current HBase user. Example:
hbase> whoami
alter:
Alter column family schema; pass a table name and a dictionary specifying the new column family schema. The dictionary must include the name of the column family to alter. For example:
To change or add the 'f1' column family in table 't1' from defaults to instead keep a maximum of 5 cell VERSIONS, do:
hbase> alter 't1', {NAME => 'f1', VERSIONS => 5}
To delete the 'f1' column family in table 't1', do:
hbase> alter 't1', {NAME => 'f1', METHOD => 'delete'}
You can also change table-scope attributes like MAX_FILESIZE, MEMSTORE_FLUSHSIZE and READONLY.
For example, to change the max file size of a region to 128MB, do:
hbase> alter 't1', {METHOD => 'table_att', MAX_FILESIZE => '134217728'}
count:
Count the number of rows in a table. This operation may take a LONG time (Run '$HADOOP_HOME/bin/hadoop jar hbase.jar rowcount' to run a counting mapreduce job). Current count is shown every 1000 rows by default. Count interval may be optionally specified. Examples:
hbase> count 't1'
hbase> count 't1', 100000
The same command can be run on a table reference. Suppose t is a reference to table 't1'; examples:
hbase> t.count INTERVAL => 100000
hbase> t.count CACHE => 1000
hbase> t.count INTERVAL => 10, CACHE => 1000
create:
Create table; pass table name, a dictionary of specifications per column
family, and optionally a dictionary of table configuration.
Dictionaries are described below in the GENERAL NOTES section.
Examples:
hbase> create 't1', {NAME => 'f1', VERSIONS => 5}
hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
hbase> # The above in shorthand would be the following:
hbase> create 't1', 'f1', 'f2', 'f3'
hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, \
BLOCKCACHE => true}
describe:
Describe the named table
Example:
hbase> describe 't1'
delete:
Put a delete cell value at specified table/row/column and optionally
timestamp coordinates. Deletes must match the deleted cell's
coordinates exactly. When scanning, a delete cell suppresses older
versions. Takes arguments like the 'put' command described below
Example:
hbase> delete 't1', 'r1', 'c1', ts1
deleteall:
Delete all cells in a given row; pass a table name, row, and optionally a column and timestamp. Examples:
hbase> deleteall 't1', 'r1'
hbase> deleteall 't1', 'r1', 'c1'
hbase> deleteall 't1', 'r1', 'c1', ts1
The same commands can also be run on a table reference. Suppose you had a reference t to table 't1'; the corresponding commands would be:
hbase> t.deleteall 'r1'
hbase> t.deleteall 'r1', 'c1'
hbase> t.deleteall 'r1', 'c1', ts1
disable:
Disable the named table.
Example:
hbase> disable 't1'
disable_all:
Disable all tables matching the given regex. Example:
hbase> disable_all 't.*'
drop:
Drop the named table. Table must first be disabled.
Example:
hbase> drop 't1'
drop_all:
Drop all tables matching the given regex. Example:
hbase> drop_all 't.*'
enable:
Enable the named table.
Example:
hbase> enable 't1'
enable_all:
Enable all tables matching the given regex. Example:
hbase> enable_all 't.*'
is_enabled:
Verifies whether the named table is enabled. Example:
hbase> is_enabled 't1'
exists:
Does the named table exist?
Example:
hbase> exists 't1'
exit:
Type "hbase> exit" to leave the HBase Shell
get:
Get row or cell contents; pass table name, row, and optionally a dictionary of column(s), timestamp and versions.
Examples:
hbase> get 't1', 'r1'
hbase> get 't1', 'r1', {COLUMN => 'c1'}
hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, \
VERSIONS => 4}
list:
List all tables in hbase
Example:
hbase> list
hbase> list 'abc.*'
put:
Put a cell 'value' at specified table/row/column and optionally
timestamp coordinates. To put a cell value into table 't1' at row 'r1'
under column 'c1' marked with the time 'ts1', do:
Example:
hbase> put 't1', 'r1', 'c1', 'value', ts1
tools:
Listing of hbase surgery tools
scan:
Scan a table; pass table name and optionally a dictionary of scanner
specifications. Scanner specifications may include one or more of the
following: LIMIT, STARTROW, STOPROW, TIMESTAMP, or COLUMNS. If no
columns are specified, all columns will be scanned. To scan all members
of a column family, leave the qualifier empty as in 'col_family:'.
Examples:
hbase> scan '.META.'
hbase> scan '.META.', {COLUMNS => 'info:regioninfo'}
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, \
STARTROW => 'xyz'}
For experts, there is an additional option -- CACHE_BLOCKS -- which
switches block caching for the scanner on (true) or off (false). By
default it is enabled.
Examples:
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}
status:
Show cluster status. Can be 'summary', 'simple', or 'detailed'. The default is 'summary'.
Examples:
hbase> status
hbase> status 'simple'
hbase> status 'summary'
hbase> status 'detailed'
shutdown:
Shut down the cluster.
truncate:
Disables, drops and recreates the specified table.
Example:
hbase>truncate ‘t1′
hbase>truncate ‘t1′
version:
Output this HBase version
Example:
hbase> version
show_filters:
Show all the filters in hbase. Example:
hbase> show_filters
hbase> version
show_filters:
Show all the filters in hbase. Example:
hbase> show_filters
alter_status:
Get the status of the alter command. Indicates the number of regions of
the table that have received the updated schema Pass table name.
Example:
hbase> alter_status ‘t1′
flush:
Flush all regions in passed table or pass a region row to flush an individual region.
Example:
hbase> flush ‘TABLENAME’
hbase> flush ‘REGIONNAME’
major_compact:
Run major compaction on passed table or pass a region row to major
compact an individual region. To compact a single column family within a
region specify the region name followed by the column family name.
Examples:
Compact all regions in a table:
hbase> major_compact ‘t1′
Compact an entire region:
hbase> major_compact ‘r1′
Compact a single column family within a region:
hbase> major_compact ‘r1′, ‘c1′
Compact a single column family within a table:
hbase> major_compact ‘t1′, ‘c1′
split:
Split entire table or pass a region to split individual region. With the
second parameter, you can specify an explicit split key for the region.
Examples:
hbase>split ‘tableName’
hbase>split ‘regionName’ # format: ‘tableName,startKey,id’
hbase>split ‘tableName’, ‘splitKey’
hbase>split ‘regionName’, ‘splitKey’
zk_dump:
Dump status of HBase cluster as seen by ZooKeeper.
Example:
hbase>zk_dump
start_replication:
Restarts all the replication features. The state in which each stream starts in is undetermined.
WARNING: start/stop replication is only meant to be used in critical load situations.
Examples:
hbase> start_replication
stop_replication:
Stops all the replication features. The state in which each stream stops in is undetermined.
WARNING: start/stop replication is only meant to be used in critical load situations.
Examples:
hbase> stop_replication
grant:
Grant users specific rights.
Syntax : grant permissions is either zero or more letters from the set “RWXCA”.
READ(‘R’), WRITE(‘W’), EXEC(‘X’), CREATE(‘C’), ADMIN(‘A’)
Example:
hbase> grant ‘bobsmith’, ‘RWXCA’
hbase> grant ‘bobsmith’, ‘RW’, ‘t1′, ‘f1′, ‘col1′
revoke:
Revoke a user’s access rights.
Syntax : revoke
Example:
hbase> revoke ‘bobsmith’, ‘t1′, ‘f1′, ‘col1′
user_permission:
Show all permissions for the particular user.
Syntax : user_permission
Example:
hbase> user_permission
hbase> user_permission ‘table1′
HBase Shell Commands
Show the current hbase user. Example:
hbase> whoami
Alter column family schema; pass table name and a dictionary specifying new column family schema. Dictionaries are described below in the GENERAL NOTES section. Dictionary must include name
of column family to alter. For example,
To change or add the 'f1' column family in table 't1' from defaults to instead keep a maximum of 5 cell VERSIONS, do:
hbase> alter 't1', {NAME => 'f1', VERSIONS => 5}
To delete the 'f1' column family in table 't1', do:
hbase> alter 't1', {NAME => 'f1', METHOD => 'delete'}
You can also change table-scope attributes like MAX_FILESIZE,
MEMSTORE_FLUSHSIZE and READONLY.
For example, to change the table's maximum file size to 128 MB, do:
hbase> alter 't1', {METHOD => 'table_att', MAX_FILESIZE => '134217728'}
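As a sanity check, the MAX_FILESIZE value above is just 128 MB written out in bytes:

```python
# 128 MB expressed in bytes, as used in the MAX_FILESIZE example above
max_filesize = 128 * 1024 * 1024
print(max_filesize)  # 134217728
```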
count:
Count the number of rows in a table. This operation may take a LONG time (Run '$HADOOP_HOME/bin/hadoop jar hbase.jar rowcount' to run a counting mapreduce job). Current count is shown every 1000 rows by default. Count interval may be optionally specified. Examples:
hbase> count 't1'
hbase> count 't1', 100000
The same command can also be run on a table reference. Suppose you had a reference t to table 't1':
hbase> t.count INTERVAL => 100000
hbase> t.count CACHE => 1000
hbase> t.count INTERVAL => 10, CACHE => 1000
create:
Create table; pass table name, a dictionary of specifications per column
family, and optionally a dictionary of table configuration.
Dictionaries are described below in the GENERAL NOTES section.
Examples:
hbase> create 't1', {NAME => 'f1', VERSIONS => 5}
hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
hbase> # The above in shorthand would be the following:
hbase> create 't1', 'f1', 'f2', 'f3'
hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, \
BLOCKCACHE => true}
describe:
Describe the named table
Example:
hbase> describe 't1'
delete:
Put a delete cell value at specified table/row/column and optionally
timestamp coordinates. Deletes must match the deleted cell's
coordinates exactly. When scanning, a delete cell suppresses older
versions. Takes arguments like the 'put' command described below
Example:
hbase> delete 't1', 'r1', 'c1', ts1
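The note above that a delete cell "suppresses older versions" can be modeled as a tombstone check: on read, any version at or below the delete marker's timestamp is hidden. This is a simplified Python sketch of that semantics, illustrative only, not HBase internals:

```python
def visible_versions(versions, delete_ts=None):
    """Return the timestamp->value pairs still visible after a delete marker.

    A delete marker at delete_ts hides every version whose timestamp is
    <= delete_ts (a simplified model of HBase delete semantics).
    """
    if delete_ts is None:
        return dict(versions)
    return {ts: v for ts, v in versions.items() if ts > delete_ts}

versions = {100: 'a', 200: 'b', 300: 'c'}
print(visible_versions(versions, delete_ts=200))  # {300: 'c'}
```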
deleteall:
Delete all cells in a given row; pass a table name, row, and optionally a column and timestamp. Examples:
hbase> deleteall 't1', 'r1'
hbase> deleteall 't1', 'r1', 'c1'
hbase> deleteall 't1', 'r1', 'c1', ts1
The same commands can also be run on a table reference. Suppose you had a reference t to table 't1'; the corresponding commands would be:
hbase> t.deleteall 'r1'
hbase> t.deleteall 'r1', 'c1'
hbase> t.deleteall 'r1', 'c1', ts1
disable:
Disable the named table.
Example:
hbase> disable 't1'
disable_all:
Disable all of the tables matching the given regex. Example:
hbase> disable_all 't.*'
drop:
Drop the named table. Table must first be disabled.
Example:
hbase> drop 't1'
drop_all:
Drop all of the tables matching the given regex. Example:
hbase> drop_all 't.*'
enable:
Enable the named table.
Example:
hbase> enable 't1'
enable_all:
Enable all of the tables matching the given regex. Example:
hbase> enable_all 't.*'
is_enabled:
Verifies whether the named table is enabled. Example:
hbase> is_enabled 't1'
exists:
Does the named table exist?
Example:
hbase> exists 't1'
exit:
Type "hbase> exit" to leave the HBase Shell
get:
Get row or cell contents; pass table name, row, and optionally a dictionary of column(s), timestamp and versions.
Examples:
hbase> get 't1', 'r1'
hbase> get 't1', 'r1', {COLUMN => 'c1'}
hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, \
VERSIONS => 4}
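HBase cells are versioned by timestamp, which is what the TIMESTAMP and VERSIONS options above select on: an exact-timestamp lookup returns one version, while VERSIONS => n returns up to the n most recent. A minimal Python sketch of that lookup logic (the VersionedCell class and its methods are illustrative, not an HBase API):

```python
class VersionedCell:
    """Toy model of an HBase cell: multiple values keyed by timestamp."""

    def __init__(self):
        self.versions = {}  # timestamp -> value

    def put(self, value, ts):
        self.versions[ts] = value

    def get(self, timestamp=None, max_versions=1):
        if timestamp is not None:
            # Exact-timestamp lookup, like {TIMESTAMP => ts1}
            return self.versions.get(timestamp)
        # Newest-first, up to max_versions, like {VERSIONS => n}
        newest = sorted(self.versions, reverse=True)[:max_versions]
        return [self.versions[ts] for ts in newest]

cell = VersionedCell()
cell.put('v1', 100)
cell.put('v2', 200)
cell.put('v3', 300)
print(cell.get(max_versions=2))  # ['v3', 'v2']
print(cell.get(timestamp=100))   # 'v1'
```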
list:
List all tables in HBase. An optional regular expression parameter can be used to filter the output.
Examples:
hbase> list
hbase> list 'abc.*'
put:
Put a cell 'value' at specified table/row/column and optionally
timestamp coordinates. To put a cell value into table 't1' at row 'r1'
under column 'c1' marked with the time 'ts1', do:
Example:
hbase> put 't1', 'r1', 'c1', 'value', ts1
tools:
Listing of hbase surgery tools
scan:
Scan a table; pass table name and optionally a dictionary of scanner
specifications. Scanner specifications may include one or more of the
following: LIMIT, STARTROW, STOPROW, TIMESTAMP, or COLUMNS. If no
columns are specified, all columns will be scanned. To scan all members
of a column family, leave the qualifier empty as in 'col_family:'.
Examples:
hbase> scan '.META.'
hbase> scan '.META.', {COLUMNS => 'info:regioninfo'}
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, \
STARTROW => 'xyz'}
For experts, there is an additional option -- CACHE_BLOCKS -- which
switches block caching for the scanner on (true) or off (false). By
default it is enabled.
Examples:
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}
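STARTROW and STOPROW operate on the lexicographic order of row keys: the start row is inclusive and the stop row exclusive, and LIMIT caps the number of rows returned. A simplified in-memory sketch of those scanner semantics (illustrative only, not HBase internals):

```python
def scan(rows, startrow=None, stoprow=None, limit=None):
    """Simplified scan over row keys in lexicographic order.

    startrow is inclusive and stoprow exclusive, mirroring HBase's
    scanner semantics; limit caps the number of rows returned.
    """
    result = []
    for key in sorted(rows):
        if startrow is not None and key < startrow:
            continue
        if stoprow is not None and key >= stoprow:
            break
        result.append((key, rows[key]))
        if limit is not None and len(result) == limit:
            break
    return result

rows = {'r1': 'a', 'r2': 'b', 'r3': 'c', 'r4': 'd'}
print(scan(rows, startrow='r2', stoprow='r4'))  # [('r2', 'b'), ('r3', 'c')]
print(scan(rows, limit=2))                      # [('r1', 'a'), ('r2', 'b')]
```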
status:
Show cluster status. Can be 'summary', 'simple', or 'detailed'. The default is 'summary'.
Examples:
hbase> status
hbase> status 'simple'
hbase> status 'summary'
hbase> status 'detailed'
shutdown:
Shut down the cluster.
truncate:
Disables, drops and recreates the specified table.
Example:
hbase> truncate 't1'
version:
Output this HBase version
Example:
hbase> version
show_filters:
Show all the filters in HBase. Example:
hbase> show_filters
alter_status:
Get the status of the alter command. Indicates the number of regions of
the table that have received the updated schema. Pass table name.
Example:
hbase> alter_status 't1'
flush:
Flush all regions in passed table or pass a region row to flush an individual region.
Example:
hbase> flush 'TABLENAME'
hbase> flush 'REGIONNAME'
major_compact:
Run major compaction on passed table or pass a region row to major
compact an individual region. To compact a single column family within a
region specify the region name followed by the column family name.
Examples:
Compact all regions in a table:
hbase> major_compact 't1'
Compact an entire region:
hbase> major_compact 'r1'
Compact a single column family within a region:
hbase> major_compact 'r1', 'c1'
Compact a single column family within a table:
hbase> major_compact 't1', 'c1'
split:
Split an entire table, or pass a region to split an individual region. With the
second parameter, you can specify an explicit split key for the region.
Examples:
hbase> split 'tableName'
hbase> split 'regionName' # format: 'tableName,startKey,id'
hbase> split 'tableName', 'splitKey'
hbase> split 'regionName', 'splitKey'
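Conceptually, a split partitions a region's row-key range at the split key: keys below it go to one daughter region, keys at or above it to the other. A toy illustration of that partitioning (not HBase internals):

```python
def split_region(row_keys, split_key):
    """Partition sorted row keys at split_key: the left daughter gets
    keys < split_key, the right gets keys >= split_key (a simplified
    model of an HBase region split)."""
    ordered = sorted(row_keys)
    left = [k for k in ordered if k < split_key]
    right = [k for k in ordered if k >= split_key]
    return left, right

print(split_region(['a1', 'b2', 'c3', 'd4'], 'c'))
# (['a1', 'b2'], ['c3', 'd4'])
```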
zk_dump:
Dump status of HBase cluster as seen by ZooKeeper.
Example:
hbase> zk_dump
start_replication:
Restarts all the replication features. The state in which each stream starts is undetermined.
WARNING: start/stop replication is only meant to be used in critical load situations.
Examples:
hbase> start_replication
stop_replication:
Stops all the replication features. The state in which each stream stops is undetermined.
WARNING: start/stop replication is only meant to be used in critical load situations.
Examples:
hbase> stop_replication
grant:
Grant users specific rights.
Syntax: grant <user>, <permissions> [, <table> [, <column family> [, <column qualifier>]]]
Permissions is zero or more letters from the set 'RWXCA':
READ('R'), WRITE('W'), EXEC('X'), CREATE('C'), ADMIN('A')
Example:
hbase> grant 'bobsmith', 'RWXCA'
hbase> grant 'bobsmith', 'RW', 't1', 'f1', 'col1'
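Since a permission string is just a combination of the letters R, W, X, C and A, validating and expanding one is straightforward. A small sketch (the helper function is ours for illustration, not an HBase API):

```python
def parse_permissions(perms):
    """Expand a permission string like 'RW' into named rights,
    rejecting unknown letters (illustrative helper, not an HBase API)."""
    names = {'R': 'READ', 'W': 'WRITE', 'X': 'EXEC',
             'C': 'CREATE', 'A': 'ADMIN'}
    unknown = set(perms) - set(names)
    if unknown:
        raise ValueError("unknown permission letters: %s" % sorted(unknown))
    return [names[letter] for letter in perms]

print(parse_permissions('RWXCA'))
# ['READ', 'WRITE', 'EXEC', 'CREATE', 'ADMIN']
```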
revoke:
Revoke a user's access rights.
Syntax: revoke <user> [, <table> [, <column family> [, <column qualifier>]]]
Example:
hbase> revoke 'bobsmith', 't1', 'f1', 'col1'
user_permission:
Show all permissions for the particular user.
Syntax: user_permission [<table>]
Example:
hbase> user_permission
hbase> user_permission 'table1'
HBase Examples
$ hbase shell
List all the tables
hbase> list
Create an HBase table named 'cars' with one column family, 'vi'
hbase> create 'cars', 'vi'
Let's insert 3 column qualifiers (make, model, year) and the associated values into the first row (row1).
hbase> put 'cars', 'row1', 'vi:make', 'BMW'
hbase> put 'cars', 'row1', 'vi:model', '5 series'
hbase> put 'cars', 'row1', 'vi:year', '2012'
Now let's add a second row
hbase> put 'cars', 'row2', 'vi:make', 'Ferrari'
hbase> put 'cars', 'row2', 'vi:model', 'e series'
hbase> put 'cars', 'row2', 'vi:year', '2012'
Now let's add a third row
hbase> put 'cars', 'row3', 'vi:make', 'Honda'
hbase> put 'cars', 'row3', 'vi:model', 'f series'
hbase> put 'cars', 'row3', 'vi:year', '2013'
Scan the table
hbase> scan 'cars'
The next scan we’ll run will limit our results to the make column qualifier.
hbase> scan 'cars', {COLUMNS => ['vi:make']}
Limit the scan to 1 row to demonstrate how LIMIT works.
hbase> scan 'cars', {COLUMNS => ['vi:make'], LIMIT => 1}
We’ll start by getting all columns in row1.
hbase> get 'cars', 'row1'
You should see output similar to:
COLUMN CELL
vi:make timestamp=1344817012999, value=BMW
vi:model timestamp=1344817020843, value=5 series
vi:year timestamp=1344817033611, value=2012
To get one specific column include the COLUMN option.
hbase> get 'cars', 'row1', {COLUMNS => 'vi:make'}
You can also get two or more columns by passing an array of columns.
hbase> get 'cars', 'row1', {COLUMNS => ['vi:make', 'vi:year']}
Delete a cell (value)
hbase> delete 'cars', 'row2', 'vi:year'
Let's check that our delete worked.
hbase> get 'cars', 'row2'
You should see output that shows 2 columns.
COLUMN CELL
vi:make timestamp=1344817104923, value=Ferrari
vi:model timestamp=1344817115463, value=e series
2 row(s) in 0.0080 seconds
Disable and drop the table
hbase> disable 'cars'
hbase> drop 'cars'
Exit the HBase shell
hbase> exit