DATABASE AND STORAGE
What is data vs database vs DBMS? Explain in simpler terms
1. Data
- Meaning: Data is just raw information — facts, numbers, or text that may or may not have meaning.
Example: John, 25, Bangalore.
On its own, that’s just some text — data.
2. Database
- Meaning: A database is an organized collection of data — stored in a structured way so it’s easy to find and manage.
- Think of it like: A digital filing cabinet where related data is grouped into tables.
Example: A “Students” database might have a table like this:
| Name | Age | City |
|---|---|---|
| John | 25 | Bangalore |
| Roy | 28 | Chennai |
3. DBMS (Database Management System)
- Meaning: A DBMS is the software that helps you create, manage, and use the database.
- It acts as a bridge between you (the user) and the data.
- It makes sure data is:
- Saved correctly
- Easy to retrieve
- Secure and consistent
Examples: MySQL, Oracle, SQL Server, PostgreSQL, MongoDB
🪄 Simple Analogy
| Concept | Real-world Example |
|---|---|
| Data | Individual books |
| Database | A well-organized library of books |
| DBMS | The librarian who helps you store, find, and manage the books |
🧠 In short:
DBMS = software that helps you manage the database efficiently
Data = raw facts
Database = organized storage of data
What exactly is a database, and why do products need one?
A database is a structured way to store and organize data so it can be retrieved, updated, or deleted quickly when needed.
Think of it as a giant, well-organised spreadsheet that applications use behind the scenes.
Example:
- When you log into Netflix, your username, password, and viewing history are fetched from a database.
- When you place an order on Swiggy, your delivery address, order items, and payment info are stored in one.
Why it matters for PMs:
Understanding what’s stored where helps you define what data is needed for features, reporting, and personalization.
Why use a database when we can just save information in files?
Saving data in simple files (like text or Excel) works for small projects, but it breaks down quickly as your app grows.
Databases, on the other hand, are optimized for speed, relationships, and scale.
| Files | Databases |
|---|---|
| Data stored linearly in folders. | Data stored in structured format (tables/collections). |
| Hard to search or update. | Can instantly search, filter, and update. |
| Not scalable for large data. | Can handle millions of users or records. |
Example: If you want to see all users from Bangalore — a file system would need to open and read each file manually.
A database can return the result in milliseconds.
How is data organized inside a database?
In a SQL database, data is organized in tables (like an Excel sheet).
Each row represents one record (e.g., one user), and each column represents a property (e.g., name, email).
In a NoSQL database, data is stored as collections of flexible “documents” — more like JSON objects — which don’t need fixed columns.
Example:
- SQL:
Userstable → columns:user_id,name,email,signup_date - NoSQL: User document →
{ "user_id": 1, "name": "Sujith", "city": "Bangalore" }
Why it matters for PMs:
If your product data is complex and changes frequently, NoSQL can be more flexible. If data is structured and interrelated, SQL works best.
What is a Database Management System (DBMS)?
A Database Management System (DBMS) is software that is used to manage, organize, and interact with databases. It provides an interface between the user (or an application) and the underlying data, making it easier to store, retrieve, update, and manipulate information efficiently.
A DBMS ensures data integrity, security, and consistency, while also managing concurrent access — meaning multiple users or systems can work with the same data without conflict.
In simple terms:
It acts as the “brain” behind how applications handle and access data.
Examples:
MySQL, PostgreSQL, Oracle, and Microsoft SQL Server are widely used DBMSs across industries.
What are the different types of DBMS?
DBMSs come in several types based on how they organize and relate data.
| Type | Description | Example Use Case |
|---|---|---|
| Hierarchical DBMS | Data is stored in a tree-like structure (parent-child relationships). | Old banking or file systems. |
| Network DBMS | Data can have multiple parent-child relationships (more flexible than hierarchical). | Telecom billing systems. |
| Relational DBMS (RDBMS) | Data stored in tables with rows and columns; uses SQL for operations. | Most modern apps — e-commerce, payments, social platforms. |
| Object-Oriented DBMS (OODBMS) | Stores data as objects (used in object-oriented programming). | CAD/CAM systems, multimedia databases. |
| NoSQL DBMS | Stores unstructured or semi-structured data (documents, key-value pairs). | Scalable apps, social media feeds, IoT systems. |
Note: Most modern applications use RDBMS or NoSQL because they are robust, scalable, and developer-friendly.
What is the difference between DBMS and RDBMS?
| Aspect | DBMS | RDBMS |
|---|---|---|
| Structure | Stores data as files or simple records. | Stores data in tables (rows & columns). |
| Relationships | No relationships between data. | Relationships are maintained using keys (primary & foreign). |
| Query Language | Doesn’t always use SQL. | Uses SQL to manage and query data. |
| Data Integrity | Harder to enforce. | Ensures integrity using constraints. |
| Examples | Microsoft Access, file-based systems. | MySQL, PostgreSQL, Oracle, SQL Server. |
In simple terms: RDBMS is a more advanced and structured form of DBMS, designed to handle complex data relationships efficiently.
When would you use SQL vs NoSQL for a new feature?
SQL vs NoSQL — The Core Idea
| Feature | SQL | NoSQL |
|---|---|---|
| Data structure | Structured (tables with rows & columns) | Flexible / unstructured (JSON, key-value, documents, graphs, etc.) |
| Schema | Fixed schema — predefined structure | Dynamic schema — can change easily |
| Scalability | Scales vertically (bigger server) | Scales horizontally (more servers) |
| Transactions | Strong consistency (ACID) | Often prioritizes performance and flexibility (BASE) |
| Examples | MySQL, PostgreSQL, Oracle | MongoDB, Cassandra, Firebase, DynamoDB |
When to Use SQL
Use SQL when your feature needs:
- Structured and relational data
→ e.g., Users, orders, products, payments
(Data fits neatly in tables with relationships between them.) - Strong consistency
→ e.g., Banking transactions, e-commerce checkout, inventory updates - Complex queries and joins
→ When you’ll need to combine or filter data from multiple tables - Clear, stable data model
→ Your data structure won’t change often
✅ Example feature:
Building a “Payment tracking system” — every transaction must be accurate and consistent → SQL.
When to Use NoSQL
Use NoSQL when your feature needs:
- Flexibility in data structure
→ Data formats vary or evolve often (e.g., different product attributes) - High scalability and speed
→ Need to handle large amounts of user activity or content quickly - Unstructured or semi-structured data
→ Like JSON, logs, user activity, or sensor data - Rapid development
→ Schema changes shouldn’t slow you down
✅ Example feature:
Building a “User activity feed” or “Product recommendation system” where data formats vary and must scale fast → NoSQL.
🧠 Simple Analogy
- SQL → Organized like an Excel sheet (rows/columns) — great when structure matters.
- NoSQL → Organized like a collection of folders or JSON files — great when flexibility and speed matter.
⚖️ In short:
| Scenario | Choose |
|---|---|
| Consistent, structured, relational data | ✅ SQL |
| Flexible, fast-changing, large-scale data | ✅ NoSQL |
What is a query?
A query is a request for data or information from a database.
Example:
SELECT * FROM Employees WHERE Department = 'HR';
What are the Primary Components of a DBMS?
A DBMS isn’t just a storage system — it has multiple components working together:
| Component | Purpose |
|---|---|
| Database Engine | Core service that stores, retrieves, and updates data. |
| Database Schema | Blueprint or structure defining tables, relationships, and data types. |
| Query Processor | Interprets and executes database queries (like SQL). |
| Transaction Manager | Ensures data accuracy and integrity during multiple simultaneous operations. |
| Storage Manager | Handles how data is physically stored and retrieved from disk. |
| Metadata Catalog | Stores information about data (like table names, columns, and types). |
| Database Access Layer / API | Allows applications to interact with the database (like through SQL queries or APIs). |
In simple words: The DBMS combines all these parts to ensure that data is stored efficiently, retrieved quickly, and remains safe and consistent.
What is a schema?
A schema is the blueprint or structure of a database that defines how data is organized — tables, fields, relationships, views, etc.
What is a Table in DBMS?
A table is a structured way of storing related data in rows and columns — like an Excel sheet.
Each table represents one entity or object type.
Example: A “Products” table might look like this:
| Product_ID | Name | Category | Price |
|---|---|---|---|
| 1 | Laptop | Electronics | ₹60,000 |
| 2 | Shoes | Fashion | ₹3,000 |
Why it matters: Tables make data easy to organize, query, and link with other tables (like linking “Products” with “Orders”).
What is a record in DBMS?
A record (or row) is a single entry in a table that contains related data items.
Example: A record in a “Students” table might include Name, Roll No, and Age.
What are Rows and Columns in a DBMS?
Rows (Records): Each row represents one specific item or entry.
Example: In a “Users” table, each row = one user.
Columns (Fields/Attributes): Each column represents a property or attribute of that item.
Example: “Name”, “Email”, and “Join Date” are columns in the “Users” table.
Visualization:
| ID | Name | |
|---|---|---|
| 1 | Sujith | sujith@email.com |
| 2 | Keerthana | keerthana@email.com |
Here,
Each column = user attribute.
Each row = a user record.
What is metadata in a database?
Metadata is “data about data.” It describes the structure — like table names, column names, data types, and relationships.
What is SQL, and what are its main types of commands?
SQL (Structured Query Language) is used to interact with relational databases.
Main types:
- DDL – Data Definition Language (CREATE, ALTER, DROP)
- DML – Data Manipulation Language (SELECT, INSERT, UPDATE, DELETE)
- TCL – Transaction Control Language (COMMIT, ROLLBACK, SAVEPOINT)
- DCL – Data Control Language (GRANT, REVOKE)
What’s the real difference between SQL and NoSQL, and when would you use each?
The main difference between SQL and NoSQL databases lies in how they structure, store, and scale data — and this directly affects how your product handles growth, flexibility, and performance.
SQL (Structured Query Language) Databases
- Structure: Data is stored in tables with predefined columns (like spreadsheets). Every record must follow the same schema.
- Relationships: Designed to handle complex relationships using joins (linking data across tables).
- Consistency: Strong consistency — great for data accuracy and transactional reliability.
- Scaling: Typically vertical scaling (add more power to one server).
- Use cases:
- Banking or payment systems (where accuracy matters).
- Order management or inventory tracking.
- Products requiring complex analytics or reports.
- Examples: MySQL, PostgreSQL, Oracle, SQL Server.
NoSQL (Not Only SQL) Databases
- Structure: Data is stored as documents, key-value pairs, wide-column, or graphs — meaning it can vary in structure.
- Flexibility: No fixed schema — you can easily add new fields without changing the entire structure.
- Scalability: Built for horizontal scaling (add more servers instead of upgrading one).
- Speed: Optimized for fast reads/writes in large-scale, dynamic systems.
- Use cases:
- Social media feeds, chat apps, or IoT data.
- Recommendation engines or product catalogs with flexible data.
- Systems that prioritize performance over strict data consistency.
- Examples: MongoDB, Firebase, Cassandra, DynamoDB.
In simple terms:
- Choose SQL when your data is structured, consistent, and needs strong relationships.
- Choose NoSQL when your data is flexible, rapidly changing, or needs to scale for millions of users.
Analogy: Think of SQL as a well-organized library — every book has a fixed place and follows a strict cataloging system.
NoSQL is more like a modern digital bookshelf — flexible, easy to add new content, and scales effortlessly as your collection grows.
What are CRUD operations?
CRUD stands for:
- Create → INSERT
- Read → SELECT
- Update → UPDATE
- Delete → DELETE
These are the basic data manipulation operations in a database.
What is database scaling, and how do vertical and horizontal scaling differ?
Database scaling is the process of modifying a database’s architecture or resources to handle an increasing load of data, users, and queries while maintaining performance, availability, and reliability.
As an application grows, the number of transactions, reads/writes, and stored data all increase — database scaling ensures the system continues to operate efficiently under this growth.
Why Do We Need Database Scaling?
Over time, your product’s usage can surge due to:
- More users signing up or interacting simultaneously.
- Increased volume of stored records (e.g., orders, messages, logs).
- Complex analytics or real-time queries.
- Global expansion requiring low latency for different regions.
Without scaling, databases can experience:
- Slow response times
- Query timeouts or failures
- Server crashes
Scaling ensures that as demand increases, performance remains consistent.
Two Main Types of Database Scaling
1. Vertical Scaling (Scale Up)
Vertical scaling means increasing the capacity of a single database server — by adding more CPU, memory (RAM), faster storage (SSD), or network bandwidth.
Example: Upgrading from a 4-core, 16GB RAM server to a 16-core, 64GB RAM machine.
✅ Advantages
- Simple to implement.
- No application code changes required.
- Works well for small to medium workloads.
❌ Disadvantages
- Hardware has physical limits.
- Expensive at large scales.
- Creates a single point of failure — if that machine goes down, your database goes offline.
Use Case: Early-stage startups or low-traffic applications.
2. Horizontal Scaling (Scale Out)
Horizontal scaling means adding more database servers to distribute data and queries across multiple machines.
This can be done in two main ways — replication and sharding.
Key Horizontal Scaling Strategies
A. Replication
Replication involves copying data from one database server (the “primary” or “master”) to one or more others (called “replicas” or “slaves”).
- Writes happen on the primary database.
- Reads can be distributed across replicas.
Example: A website with high read traffic (e.g., an e-commerce site showing product pages) can offload read queries to replicas, while the primary handles order transactions.
✅ Benefits
- Improved read performance.
- Fault tolerance — replicas can take over if the primary fails.
- Useful for geographically distributed users (read from the nearest replica).
❌ Challenges
- Data synchronization delays (replication lag).
- Writes are still limited to a single primary.
- Handling failover and consistency adds complexity.
B. Sharding (Data Partitioning)
Sharding means splitting a large database into smaller, independent pieces (shards), each containing a portion of the data. Each shard runs on its own server.
Example:
- Shard 1 stores users with IDs 1–100,000
- Shard 2 stores users with IDs 100,001–200,000
When a query comes in, it’s routed to the correct shard.
✅ Benefits
- Improves write scalability — each shard handles fewer writes.
- Reduces query load per server.
- Enables parallel processing across shards.
❌ Challenges
- Complex to design and maintain.
- Cross-shard queries are harder.
- Re-sharding (when data distribution changes) can be painful.
Use Case: Large-scale systems like Facebook, Twitter, or Amazon that handle massive data volumes.
Supporting Techniques for Scaling
- Caching:
- Store frequently accessed data in-memory using Redis or Memcached.
- Reduces read load on the database.
- Load Balancing:
- Distribute queries across replicas to avoid overloading one server.
- Connection Pooling:
- Reuse database connections to handle many simultaneous requests efficiently.
- Database Indexing & Query Optimization:
- Ensure queries are efficient before scaling horizontally.
SQL concepts & queries
What is SQL?
SQL (Structured Query Language) is used to store, manipulate, and retrieve data from relational databases.
📘 Types of SQL Commands
| Type | Full Form | Examples | Purpose |
|---|---|---|---|
| DDL | Data Definition Language | CREATE, ALTER, DROP, TRUNCATE | Defines structure of tables |
| DML | Data Manipulation Language | SELECT, INSERT, UPDATE, DELETE | Manages data in tables |
| DCL | Data Control Language | GRANT, REVOKE | Controls access and permissions |
| TCL | Transaction Control Language | COMMIT, ROLLBACK, SAVEPOINT | Manages transactions |
| DQL | Data Query Language | SELECT | Retrieves data from database |
Let’s assume you have a table called Employees:
| EmpID | Name | Department | Salary | City |
|---|---|---|---|---|
| 1 | Keerthana | HR | 50000 | Bangalore |
| 2 | Sujith | IT | 60000 | Chennai |
| 3 | Arjun | IT | 55000 | Hyderabad |
| 4 | Priya | Finance | 70000 | Delhi |
With this table, lets have a look at SQL queries.
SELECT — Fetch data
→ Shows all columns and rows in the table.
SELECT * FROM Employees;
→ Fetch specific columns
SELECT Name, City FROM Employees;
WHERE — Filter data
The WHERE clause is used in SQL to filter rows from a table based on specific conditions.
Only the rows that satisfy the condition are returned — all others are ignored.
Basic Syntax
SELECT column_names
FROM table_name
WHERE condition;
1. Using Equality (=)
SELECT * FROM Employees
WHERE Department = 'IT';
Meaning: Fetch all employees whose Department is exactly equal to “IT”.
✅ Output:
| EmpID | Name | Department | Salary | City |
|---|---|---|---|---|
| 2 | Sujith | IT | 60000 | Chennai |
| 3 | Arjun | IT | 55000 | Hyderabad |
2. Using Greater Than (>)
SELECT * FROM Employees
WHERE Salary > 55000;
Meaning: Returns all employees with a Salary greater than 55,000.
✅ Output:
| EmpID | Name | Salary |
|---|---|---|
| 2 | Sujith | 60000 |
| 4 | Priya | 70000 |
3. Using Less Than (<)
SELECT * FROM Employees
WHERE Salary < 55000;
Meaning: Returns employees with Salary less than 55,000.
✅ Output:
| EmpID | Name | Salary |
|---|---|---|
| 1 | Keerthana | 50000 |
| 5 | Kiran | 48000 |
4. Using Greater Than or Equal To (>=)
SELECT * FROM Employees
WHERE Salary >= 55000;
Meaning: Includes employees earning 55,000 or more.
✅ Output:
| EmpID | Name | Salary |
|---|---|---|
| 2 | Sujith | 60000 |
| 3 | Arjun | 55000 |
| 4 | Priya | 70000 |
5. Using Less Than or Equal To (<=)
SELECT * FROM Employees
WHERE Salary <= 50000;
Meaning: Includes employees earning 50,000 or less.
✅ Output:
| EmpID | Name | Salary |
|---|---|---|
| 1 | Keerthana | 50000 |
| 5 | Kiran | 48000 |
6. Using Not Equal (!= or <>)
Both != and <> mean “not equal to” (depends on the SQL dialect — both are supported in most systems).
SELECT * FROM Employees
WHERE Department != 'IT';
or
SELECT * FROM Employees
WHERE Department <> 'IT';
Meaning: Return employees not in the IT department.
✅ Output:
| EmpID | Name | Department |
|---|---|---|
| 1 | Keerthana | HR |
| 4 | Priya | Finance |
| 5 | Kiran | HR |
7. Combining Multiple Conditions
a) Using AND
SELECT * FROM Employees
WHERE Department = 'IT' AND Salary > 55000;
Meaning: Show employees who are in the IT department and earn more than 55,000.
→ Both conditions must be true.
✅ Output:
| Name | Department | Salary |
|---|---|---|
| Sujith | IT | 60000 |
b) Using OR
SELECT * FROM Employees
WHERE Department = 'IT' OR Salary > 65000;
Meaning: Show employees who are either in the IT department or earn more than 65,000.
→ Only one condition needs to be true.
✅ Output:
| Name | Department | Salary |
|---|---|---|
| Sujith | IT | 60000 |
| Arjun | IT | 55000 |
| Priya | Finance | 70000 |
c) Using NOT
SELECT * FROM Employees
WHERE NOT Department = 'HR';
Meaning: Exclude HR employees from the result.
✅ Output:
| Name | Department |
|---|---|
| Sujith | IT |
| Arjun | IT |
| Priya | Finance |
8. Using Multiple Comparisons Together
You can chain multiple operators:
SELECT * FROM Employees
WHERE Salary > 50000 AND Salary < 70000 AND Department = 'IT';
Meaning: Employees in the IT department earning between 50,000 and 70,000.
✅ Output:
| Name | Department | Salary |
|---|---|---|
| Sujith | IT | 60000 |
| Arjun | IT | 55000 |
NOTE: WHERE with Strings and Numbers
- For text values, use quotes:
'IT','Bangalore' - For numbers, no quotes:
Salary > 50000 - For dates, use quotes with date format:
WHERE JoiningDate > '2024-01-01'
🧠 In summary:
| Operator | Meaning | Example | Result |
|---|---|---|---|
= | Equal to | Department = 'IT' | Only IT employees |
> | Greater than | Salary > 55000 | Salary above 55k |
< | Less than | Salary < 55000 | Salary below 55k |
>= | Greater or equal | Salary >= 55000 | Salary 55k or more |
<= | Less or equal | Salary <= 50000 | Salary 50k or less |
!= / <> | Not equal | Department <> 'HR' | All except HR |
AND | Both must be true | Salary > 50000 AND Department='IT' | Filters by both |
OR | Either condition true | Department='HR' OR City='Delhi' | Broader filter |
NOT | Opposite of condition | NOT City='Pune' | Excludes Pune |
