The phrase "unstructured data" usually refers to information that
doesn't reside in a traditional row-column database. As you might
expect, it's the opposite of structured data -- the data stored in
fields in a database.
Unstructured data files often include text and multimedia content.
Examples include e-mail
messages, word processing documents, videos, photos, audio files,
presentations, webpages and many other kinds of business documents. Note
that while these sorts of files may have an internal structure, they
are still considered "unstructured" because the data they contain
doesn't fit neatly in a database.
Experts estimate that 80 to 90 percent of the data in a typical
organization is unstructured. And the
amount of unstructured data in enterprises is growing significantly --
often many times faster than structured databases are growing.
Mining Unstructured Data
Many organizations believe that their unstructured data stores include
information that could help them make better business decisions.
Unfortunately, unstructured data is often very difficult to analyze.
To help with the problem, organizations have turned to software tools
designed to search unstructured data and extract important information.
The primary benefit of these tools is the ability to glean actionable
information that can help a business succeed in a competitive
environment.
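
To make the idea of "extracting important information" concrete, here is a minimal sketch in Python that counts the most frequent terms across a few free-text customer comments. The sample comments, the stop-word list and the `top_terms` function are all made up for illustration; commercial text-mining products use far more sophisticated techniques than simple word counting.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real tools ship much larger ones.
STOP_WORDS = {"the", "a", "and", "to", "is", "of", "in", "it", "was", "but", "i"}

def top_terms(documents, n=3):
    """Return the n most frequent non-stop-words across all documents."""
    counts = Counter()
    for text in documents:
        # Lowercase the text and split it into simple word tokens.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

# Made-up free-text comments standing in for unstructured data.
reviews = [
    "The delivery was late and the packaging was damaged.",
    "Great product, but delivery took too long.",
    "Delivery was fast; the product is great.",
]

if __name__ == "__main__":
    for term, count in top_terms(reviews):
        print(term, count)
```

Even this crude count surfaces a recurring theme (delivery) that no database field captured, which is the kind of signal these tools are meant to find at much larger scale.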
Because the volume of unstructured data is growing so rapidly, many
enterprises also turn to hardware or software products that help them
manage and store it while making the most efficient use of their
available storage space.
Unstructured Data and 'Big Data'
As mentioned above, unstructured data is the opposite of structured
data. Structured data generally resides in a relational database, and as
a result, it is sometimes called "relational data." This type of data
can be easily mapped into pre-designed fields. For example, a database
designer may set up fields for phone numbers, zip codes and credit card
numbers that accept a certain number of digits. Structured data either
already sits in fields like these or can readily be placed in them. By
contrast, unstructured data is not relational and doesn't fit into these
sorts of pre-defined data models.
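
The difference can be sketched in a few lines of Python. The field names and digit counts below (10-digit phone, 5-digit zip code, 16-digit card number) are assumptions chosen for illustration, not rules taken from any particular database.

```python
import re

# Hypothetical pre-designed fields, each accepting a fixed number of digits.
FIELD_PATTERNS = {
    "phone": re.compile(r"\d{10}"),
    "zip_code": re.compile(r"\d{5}"),
    "credit_card": re.compile(r"\d{16}"),
}

def fits_schema(record):
    """Return True only if the record has exactly the expected fields
    and every value matches its field's digit pattern."""
    if set(record) != set(FIELD_PATTERNS):
        return False
    return all(
        FIELD_PATTERNS[name].fullmatch(str(value)) is not None
        for name, value in record.items()
    )

# A record that maps cleanly onto the fields.
structured = {
    "phone": "5551234567",
    "zip_code": "90210",
    "credit_card": "4111111111111111",
}

# Free text: it contains a phone number, but not in any field.
unstructured = {"body": "Hi, call me at 555-123-4567 when you get to the office."}

if __name__ == "__main__":
    print(fits_schema(structured))
    print(fits_schema(unstructured))
```

The e-mail body does contain a phone number, but nothing in the data says where it is or what it means, so it cannot be loaded into the fields without first being interpreted.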
In addition to structured and unstructured data, there's also a third
category: semi-structured data. Semi-structured data is information that
doesn't reside in a relational database but that does have some
organizational properties that make it easier to analyze. Examples of
semi-structured data include XML documents and the records stored in
many NoSQL databases.
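
As a rough illustration of why those organizational properties help, the Python sketch below pulls one tagged element out of a small XML document using the standard library. The document contents and the `extract_senders` helper are invented for this example.

```python
import xml.etree.ElementTree as ET

# Made-up XML document: the tags give the data some organization,
# even though the message bodies are still free text.
XML_DOC = """
<messages>
  <message id="1">
    <sender>alice@example.com</sender>
    <body>Quarterly numbers look strong. Let's discuss on Friday.</body>
  </message>
  <message id="2">
    <sender>bob@example.com</sender>
    <body>Can you resend the slides?</body>
  </message>
</messages>
"""

def extract_senders(xml_text):
    """Return the text of every <sender> element, in document order."""
    root = ET.fromstring(xml_text.strip())
    return [message.findtext("sender") for message in root.findall("message")]

if __name__ == "__main__":
    print(extract_senders(XML_DOC))
```

The sender can be read directly because it is tagged, while the `body` text would still need the kind of analysis applied to unstructured data.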
The term "big data" is closely associated with unstructured data. "Big
data" refers to extremely large datasets that are difficult to analyze
with traditional tools. Big data can include both structured and
unstructured data, but IDC estimates that 90 percent of big data is
unstructured data. Many of the tools designed to analyze big data can
handle unstructured data.
Unstructured Data Vendors
Numerous vendors offer products designed to help companies analyze and
manage their unstructured data. They include the following:
- Attensity
- Clarabridge
- Evernote
- Greenplum
- Hitachi Data Systems
- HP
- IBM
- Infosys
- Intel
- Microsoft
- Oracle
- Parity Computing
- Pingar
- Provalis Research
- SAP
- SAS
- Sysomos
- Teradata
- Vertica
The open source community has been particularly active in developing
software that can manage unstructured data, and many vendors offer paid
products and
services related to these open source projects. Open source projects and
vendors related to the storage, management and analysis of unstructured
data include the following:
- CloverETL
- Gluster
- Hadoop
- HPCC
- Jaspersoft
- Palo BI Suite/Jedox
- Lucene
- MapReduce
- Pentaho
- RapidMiner/RapidAnalytics
- Solr
- SpagoBI
- Talend


