Types of data
Summary
Structured Data
Definition: Highly organized data, typically stored in tables with rows and columns. It follows a strict schema.
Characteristics:
- Data is stored in a predefined format (a schema) that fixes the columns and their types.
- Tables are linked through explicit relationships (e.g., primary and foreign keys).
- Easily searchable using Structured Query Language (SQL).
Examples:
- Employee records (e.g., names, roles, salaries).
- Sales data in an e-commerce platform.
Use Cases:
- Banking systems to manage transactions.
- Inventory management in retail.
Storage:
- Stored in relational database management systems (RDBMS) like MySQL, PostgreSQL, or SQLite.
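To make the schema and key ideas above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are assumptions made up for illustration; they are not taken from any particular system.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("""
    CREATE TABLE departments (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE employees (
        id            INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        role          TEXT NOT NULL,
        salary        REAL NOT NULL,
        department_id INTEGER NOT NULL REFERENCES departments(id)
    )
""")

conn.execute("INSERT INTO departments (id, name) VALUES (1, 'Engineering')")
conn.executemany(
    "INSERT INTO employees (name, role, salary, department_id) VALUES (?, ?, ?, ?)",
    [("Ada", "Engineer", 95000.0, 1), ("Grace", "Manager", 120000.0, 1)],
)

# The join follows the foreign key from employees to departments.
rows = conn.execute("""
    SELECT e.name, d.name
    FROM employees AS e
    JOIN departments AS d ON d.id = e.department_id
    WHERE e.salary > ?
    ORDER BY e.name
""", (100000,)).fetchall()

# A row pointing at a department that does not exist is rejected.
try:
    conn.execute(
        "INSERT INTO employees (name, role, salary, department_id) VALUES (?, ?, ?, ?)",
        ("Linus", "Engineer", 90000.0, 99),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rows, rejected)
```

The same SQL would look broadly similar in MySQL or PostgreSQL, although the exact types and the way foreign-key enforcement is configured differ between systems.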
Semi-Structured Data
Definition: Data with a flexible structure that uses tags or markers (as in XML or JSON) to organize information, but without a strict schema.
Characteristics:
- Partially organized but not as rigid as relational data.
- Can be nested or hierarchical, meaning objects inside objects (e.g., nested JSON objects).
- Often queried with format-specific tools (e.g., XPath for XML, JSONPath for JSON).
Examples:
- XML or JSON payloads returned in API responses.
- Documents stored in NoSQL databases like MongoDB.
Use Cases:
- Storing configuration files for software.
- Managing social media comments or chat logs.
Storage:
- Commonly stored in NoSQL databases or in files such as XML and JSON documents.
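As a small illustration of nested, loosely structured data, the sketch below parses a JSON document with Python's standard json module. The payload and its field names are hypothetical, invented only to show that records in the same list need not share the same fields.

```python
import json

# Hypothetical API response; every field name here is an assumption.
payload = """
{
  "user": {"id": 7, "name": "Ada"},
  "comments": [
    {"text": "Great post", "likes": 3},
    {"text": "Thanks!", "reply_to": {"id": 12}}
  ]
}
"""

data = json.loads(payload)

# Nested access: an object inside an object.
user_name = data["user"]["name"]

# The second comment has no "likes" field, so read it with a default.
likes = [comment.get("likes", 0) for comment in data["comments"]]

print(user_name, likes)
```

Because no schema guarantees that a field is present, code that reads semi-structured data usually has to handle missing keys explicitly, as the `.get` call does here.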
Unstructured Data
Definition: Data without any predefined schema or structure, making it more complex to process and analyze.
Characteristics:
- No uniform format.
- Often includes large binary objects (e.g., images, videos).
- Often requires specialized techniques for analysis (e.g., machine learning or natural language processing).
Examples:
- Multimedia files (e.g., photos, music, videos).
- Raw text files or documents.
Use Cases:
- Analyzing customer reviews for sentiment.
- Content management systems for websites.
Storage:
- Stored in data lakes, distributed file systems, or object storage like Amazon S3.
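To show why unstructured data needs extra processing before it can be analyzed, here is a deliberately simple sketch that scores a raw review by counting words from two hand-made lists. The review text and both word lists are assumptions for illustration; this is a toy keyword count, not the machine-learning approach a real sentiment system would likely use.

```python
import re
from collections import Counter

# Raw text has no fields or columns; structure must be extracted first.
review = (
    "Great phone, great battery. The camera is bad in low light, "
    "but I love the screen!"
)

# Tiny illustrative word lists (assumptions, not a real sentiment lexicon).
POSITIVE = {"great", "love", "good"}
NEGATIVE = {"bad", "poor", "hate"}

# Step 1: impose minimal structure by splitting the text into lowercase words.
tokens = re.findall(r"[a-z']+", review.lower())
counts = Counter(tokens)

# Step 2: count matches against each list; missing words count as zero.
positive_hits = sum(counts[word] for word in POSITIVE)
negative_hits = sum(counts[word] for word in NEGATIVE)
score = positive_hits - negative_hits

print(positive_hits, negative_hits, score)
```

A keyword count like this misses negation, sarcasm, and context, which is one reason analysis of unstructured text tends to rely on trained models rather than fixed rules.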