DATABASE AND STORAGE

What is data vs database vs DBMS? Explain in simpler terms

1. Data

  • Meaning: Data is just raw information — facts, numbers, or text that may or may not have meaning.

Example: John, 25, Bangalore.

On its own, that’s just some text — data.


2. Database

  • Meaning: A database is an organized collection of data — stored in a structured way so it’s easy to find and manage.
  • Think of it like: A digital filing cabinet where related data is grouped into tables.

Example: A “Students” database might have a table like this:

NameAgeCity
John25Bangalore
Roy28Chennai

3. DBMS (Database Management System)

  • Meaning: A DBMS is the software that helps you create, manage, and use the database.
  • It acts as a bridge between you (the user) and the data.
  • It makes sure data is:
    • Saved correctly
    • Easy to retrieve
    • Secure and consistent

Examples: MySQL, Oracle, SQL Server, PostgreSQL, MongoDB


🪄 Simple Analogy

ConceptReal-world Example
DataIndividual books
DatabaseA well-organized library of books
DBMSThe librarian who helps you store, find, and manage the books

🧠 In short:

DBMS = software that helps you manage the database efficiently

Data = raw facts

Database = organized storage of data

What exactly is a database, and why do products need one?

A database is a structured way to store and organize data so it can be retrieved, updated, or deleted quickly when needed.
Think of it as a giant, well-organised spreadsheet that applications use behind the scenes.

Example:

  • When you log into Netflix, your username, password, and viewing history are fetched from a database.
  • When you place an order on Swiggy, your delivery address, order items, and payment info are stored in one.

Why it matters for PMs:
Understanding what’s stored where helps you define what data is needed for features, reporting, and personalization.

Why use a database when we can just save information in files?

Saving data in simple files (like text or Excel) works for small projects, but it breaks down quickly as your app grows.
Databases, on the other hand, are optimized for speed, relationships, and scale.

FilesDatabases
Data stored linearly in folders.Data stored in structured format (tables/collections).
Hard to search or update.Can instantly search, filter, and update.
Not scalable for large data.Can handle millions of users or records.

Example: If you want to see all users from Bangalore — a file system would need to open and read each file manually.
A database can return the result in milliseconds.

How is data organized inside a database?

In a SQL database, data is organized in tables (like an Excel sheet).
Each row represents one record (e.g., one user), and each column represents a property (e.g., name, email).

In a NoSQL database, data is stored as collections of flexible “documents” — more like JSON objects — which don’t need fixed columns.

Example:

  • SQL: Users table → columns: user_id, name, email, signup_date
  • NoSQL: User document → { "user_id": 1, "name": "Sujith", "city": "Bangalore" }

Why it matters for PMs:
If your product data is complex and changes frequently, NoSQL can be more flexible. If data is structured and interrelated, SQL works best.

What is a Database Management System (DBMS)?

A Database Management System (DBMS) is software that is used to manage, organize, and interact with databases. It provides an interface between the user (or an application) and the underlying data, making it easier to store, retrieve, update, and manipulate information efficiently.

A DBMS ensures data integrity, security, and consistency, while also managing concurrent access — meaning multiple users or systems can work with the same data without conflict.

In simple terms:
It acts as the “brain” behind how applications handle and access data.

Examples:
MySQL, PostgreSQL, Oracle, and Microsoft SQL Server are widely used DBMSs across industries.

What are the different types of DBMS?

DBMSs come in several types based on how they organize and relate data.

TypeDescriptionExample Use Case
Hierarchical DBMSData is stored in a tree-like structure (parent-child relationships).Old banking or file systems.
Network DBMSData can have multiple parent-child relationships (more flexible than hierarchical).Telecom billing systems.
Relational DBMS (RDBMS)Data stored in tables with rows and columns; uses SQL for operations.Most modern apps — e-commerce, payments, social platforms.
Object-Oriented DBMS (OODBMS)Stores data as objects (used in object-oriented programming).CAD/CAM systems, multimedia databases.
NoSQL DBMSStores unstructured or semi-structured data (documents, key-value pairs).Scalable apps, social media feeds, IoT systems.

Note: Most modern applications use RDBMS or NoSQL because they are robust, scalable, and developer-friendly.

What is the difference between DBMS and RDBMS?
AspectDBMSRDBMS
StructureStores data as files or simple records.Stores data in tables (rows & columns).
RelationshipsNo relationships between data.Relationships are maintained using keys (primary & foreign).
Query LanguageDoesn’t always use SQL.Uses SQL to manage and query data.
Data IntegrityHarder to enforce.Ensures integrity using constraints.
ExamplesMicrosoft Access, file-based systems.MySQL, PostgreSQL, Oracle, SQL Server.

In simple terms: RDBMS is a more advanced and structured form of DBMS, designed to handle complex data relationships efficiently.

When would you use SQL vs NoSQL for a new feature?

SQL vs NoSQL — The Core Idea

FeatureSQLNoSQL
Data structureStructured (tables with rows & columns)Flexible / unstructured (JSON, key-value, documents, graphs, etc.)
SchemaFixed schema — predefined structureDynamic schema — can change easily
ScalabilityScales vertically (bigger server)Scales horizontally (more servers)
TransactionsStrong consistency (ACID)Often prioritizes performance and flexibility (BASE)
ExamplesMySQL, PostgreSQL, OracleMongoDB, Cassandra, Firebase, DynamoDB

When to Use SQL

Use SQL when your feature needs:

  1. Structured and relational data
    → e.g., Users, orders, products, payments
    (Data fits neatly in tables with relationships between them.)
  2. Strong consistency
    → e.g., Banking transactions, e-commerce checkout, inventory updates
  3. Complex queries and joins
    → When you’ll need to combine or filter data from multiple tables
  4. Clear, stable data model
    → Your data structure won’t change often

Example feature:
Building a “Payment tracking system” — every transaction must be accurate and consistent → SQL.


When to Use NoSQL

Use NoSQL when your feature needs:

  1. Flexibility in data structure
    → Data formats vary or evolve often (e.g., different product attributes)
  2. High scalability and speed
    → Need to handle large amounts of user activity or content quickly
  3. Unstructured or semi-structured data
    → Like JSON, logs, user activity, or sensor data
  4. Rapid development
    → Schema changes shouldn’t slow you down

Example feature:
Building a “User activity feed” or “Product recommendation system” where data formats vary and must scale fast → NoSQL.


🧠 Simple Analogy

  • SQL → Organized like an Excel sheet (rows/columns) — great when structure matters.
  • NoSQL → Organized like a collection of folders or JSON files — great when flexibility and speed matter.

⚖️ In short:

ScenarioChoose
Consistent, structured, relational data✅ SQL
Flexible, fast-changing, large-scale data✅ NoSQL
What is a query?

A query is a request for data or information from a database.

Example:

SELECT * FROM Employees WHERE Department = 'HR';

What are the Primary Components of a DBMS?

A DBMS isn’t just a storage system — it has multiple components working together:

ComponentPurpose
Database EngineCore service that stores, retrieves, and updates data.
Database SchemaBlueprint or structure defining tables, relationships, and data types.
Query ProcessorInterprets and executes database queries (like SQL).
Transaction ManagerEnsures data accuracy and integrity during multiple simultaneous operations.
Storage ManagerHandles how data is physically stored and retrieved from disk.
Metadata CatalogStores information about data (like table names, columns, and types).
Database Access Layer / APIAllows applications to interact with the database (like through SQL queries or APIs).

In simple words: The DBMS combines all these parts to ensure that data is stored efficiently, retrieved quickly, and remains safe and consistent.

What is a schema?

A schema is the blueprint or structure of a database that defines how data is organized — tables, fields, relationships, views, etc.

What is a Table in DBMS?

A table is a structured way of storing related data in rows and columns — like an Excel sheet.
Each table represents one entity or object type.

Example: A “Products” table might look like this:

Product_IDNameCategoryPrice
1LaptopElectronics₹60,000
2ShoesFashion₹3,000

Why it matters: Tables make data easy to organize, query, and link with other tables (like linking “Products” with “Orders”).

What is a record in DBMS?

A record (or row) is a single entry in a table that contains related data items.

Example: A record in a “Students” table might include Name, Roll No, and Age.

What are Rows and Columns in a DBMS?

Rows (Records): Each row represents one specific item or entry.
Example: In a “Users” table, each row = one user.

Columns (Fields/Attributes): Each column represents a property or attribute of that item.
Example: “Name”, “Email”, and “Join Date” are columns in the “Users” table.

Visualization:

IDNameEmail
1Sujithsujith@email.com
2Keerthanakeerthana@email.com

Here,

Each column = user attribute.

Each row = a user record.

What is metadata in a database?

Metadata is “data about data.” It describes the structure — like table names, column names, data types, and relationships.

What is SQL, and what are its main types of commands?

SQL (Structured Query Language) is used to interact with relational databases.
Main types:

  • DDL – Data Definition Language (CREATE, ALTER, DROP)
  • DML – Data Manipulation Language (SELECT, INSERT, UPDATE, DELETE)
  • TCL – Transaction Control Language (COMMIT, ROLLBACK, SAVEPOINT)
  • DCL – Data Control Language (GRANT, REVOKE)
What’s the real difference between SQL and NoSQL, and when would you use each?

The main difference between SQL and NoSQL databases lies in how they structure, store, and scale data — and this directly affects how your product handles growth, flexibility, and performance.

SQL (Structured Query Language) Databases

  • Structure: Data is stored in tables with predefined columns (like spreadsheets). Every record must follow the same schema.
  • Relationships: Designed to handle complex relationships using joins (linking data across tables).
  • Consistency: Strong consistency — great for data accuracy and transactional reliability.
  • Scaling: Typically vertical scaling (add more power to one server).
  • Use cases:
    • Banking or payment systems (where accuracy matters).
    • Order management or inventory tracking.
    • Products requiring complex analytics or reports.
  • Examples: MySQL, PostgreSQL, Oracle, SQL Server.

NoSQL (Not Only SQL) Databases

  • Structure: Data is stored as documents, key-value pairs, wide-column, or graphs — meaning it can vary in structure.
  • Flexibility: No fixed schema — you can easily add new fields without changing the entire structure.
  • Scalability: Built for horizontal scaling (add more servers instead of upgrading one).
  • Speed: Optimized for fast reads/writes in large-scale, dynamic systems.
  • Use cases:
    • Social media feeds, chat apps, or IoT data.
    • Recommendation engines or product catalogs with flexible data.
    • Systems that prioritize performance over strict data consistency.
  • Examples: MongoDB, Firebase, Cassandra, DynamoDB.

In simple terms:

  • Choose SQL when your data is structured, consistent, and needs strong relationships.
  • Choose NoSQL when your data is flexible, rapidly changing, or needs to scale for millions of users.

Analogy: Think of SQL as a well-organized library — every book has a fixed place and follows a strict cataloging system.
NoSQL is more like a modern digital bookshelf — flexible, easy to add new content, and scales effortlessly as your collection grows.

What are CRUD operations?

CRUD stands for:

  • Create → INSERT
  • Read → SELECT
  • Update → UPDATE
  • Delete → DELETE

These are the basic data manipulation operations in a database.

What is database scaling, and how do vertical and horizontal scaling differ?

Database scaling is the process of modifying a database’s architecture or resources to handle an increasing load of data, users, and queries while maintaining performance, availability, and reliability.

As an application grows, the number of transactions, reads/writes, and stored data all increase — database scaling ensures the system continues to operate efficiently under this growth.

Why Do We Need Database Scaling?

Over time, your product’s usage can surge due to:

  • More users signing up or interacting simultaneously.
  • Increased volume of stored records (e.g., orders, messages, logs).
  • Complex analytics or real-time queries.
  • Global expansion requiring low latency for different regions.

Without scaling, databases can experience:

  • Slow response times
  • Query timeouts or failures
  • Server crashes

Scaling ensures that as demand increases, performance remains consistent.

Two Main Types of Database Scaling

1. Vertical Scaling (Scale Up)

Vertical scaling means increasing the capacity of a single database server — by adding more CPU, memory (RAM), faster storage (SSD), or network bandwidth.

Example: Upgrading from a 4-core, 16GB RAM server to a 16-core, 64GB RAM machine.

✅ Advantages

  • Simple to implement.
  • No application code changes required.
  • Works well for small to medium workloads.

❌ Disadvantages

  • Hardware has physical limits.
  • Expensive at large scales.
  • Creates a single point of failure — if that machine goes down, your database goes offline.

Use Case: Early-stage startups or low-traffic applications.

2. Horizontal Scaling (Scale Out)

Horizontal scaling means adding more database servers to distribute data and queries across multiple machines.

This can be done in two main ways — replication and sharding.

Key Horizontal Scaling Strategies

A. Replication

Replication involves copying data from one database server (the “primary” or “master”) to one or more others (called “replicas” or “slaves”).

  • Writes happen on the primary database.
  • Reads can be distributed across replicas.

Example: A website with high read traffic (e.g., an e-commerce site showing product pages) can offload read queries to replicas, while the primary handles order transactions.

✅ Benefits

  • Improved read performance.
  • Fault tolerance — replicas can take over if the primary fails.
  • Useful for geographically distributed users (read from the nearest replica).

❌ Challenges

  • Data synchronization delays (replication lag).
  • Writes are still limited to a single primary.
  • Handling failover and consistency adds complexity.

B. Sharding (Data Partitioning)

Sharding means splitting a large database into smaller, independent pieces (shards), each containing a portion of the data. Each shard runs on its own server.

Example:

  • Shard 1 stores users with IDs 1–100,000
  • Shard 2 stores users with IDs 100,001–200,000

When a query comes in, it’s routed to the correct shard.

✅ Benefits

  • Improves write scalability — each shard handles fewer writes.
  • Reduces query load per server.
  • Enables parallel processing across shards.

❌ Challenges

  • Complex to design and maintain.
  • Cross-shard queries are harder.
  • Re-sharding (when data distribution changes) can be painful.

Use Case: Large-scale systems like Facebook, Twitter, or Amazon that handle massive data volumes.

Supporting Techniques for Scaling

  1. Caching:
    • Store frequently accessed data in-memory using Redis or Memcached.
    • Reduces read load on the database.
  2. Load Balancing:
    • Distribute queries across replicas to avoid overloading one server.
  3. Connection Pooling:
    • Reuse database connections to handle many simultaneous requests efficiently.
  4. Database Indexing & Query Optimization:
    • Ensure queries are efficient before scaling horizontally.

SQL concepts & queries

What is SQL?

SQL (Structured Query Language) is used to store, manipulate, and retrieve data from relational databases.

📘 Types of SQL Commands

TypeFull FormExamplesPurpose
DDLData Definition LanguageCREATE, ALTER, DROP, TRUNCATEDefines structure of tables
DMLData Manipulation LanguageSELECT, INSERT, UPDATE, DELETEManages data in tables
DCLData Control LanguageGRANT, REVOKEControls access and permissions
TCLTransaction Control LanguageCOMMIT, ROLLBACK, SAVEPOINTManages transactions
DQLData Query LanguageSELECTRetrieves data from database

Let’s assume you have a table called Employees:

EmpIDNameDepartmentSalaryCity
1KeerthanaHR50000Bangalore
2SujithIT60000Chennai
3ArjunIT55000Hyderabad
4PriyaFinance70000Delhi

With this table, lets have a look at SQL queries.

SELECT — Fetch data

→ Shows all columns and rows in the table.

SELECT * FROM Employees;

→ Fetch specific columns

SELECT Name, City FROM Employees;
WHERE — Filter data

The WHERE clause is used in SQL to filter rows from a table based on specific conditions.
Only the rows that satisfy the condition are returned — all others are ignored.

Basic Syntax

SELECT column_names
FROM table_name
WHERE condition;

1. Using Equality (=)

SELECT * FROM Employees
WHERE Department = 'IT';

Meaning: Fetch all employees whose Department is exactly equal to “IT”.

Output:

EmpIDNameDepartmentSalaryCity
2SujithIT60000Chennai
3ArjunIT55000Hyderabad

2. Using Greater Than (>)

SELECT * FROM Employees
WHERE Salary > 55000;

Meaning: Returns all employees with a Salary greater than 55,000.

Output:

EmpIDNameSalary
2Sujith60000
4Priya70000

3. Using Less Than (<)

SELECT * FROM Employees
WHERE Salary < 55000;

Meaning: Returns employees with Salary less than 55,000.

Output:

EmpIDNameSalary
1Keerthana50000
5Kiran48000

4. Using Greater Than or Equal To (>=)

SELECT * FROM Employees
WHERE Salary >= 55000;

Meaning: Includes employees earning 55,000 or more.

Output:

EmpIDNameSalary
2Sujith60000
3Arjun55000
4Priya70000

5. Using Less Than or Equal To (<=)

SELECT * FROM Employees
WHERE Salary <= 50000;

Meaning: Includes employees earning 50,000 or less.

Output:

EmpIDNameSalary
1Keerthana50000
5Kiran48000

6. Using Not Equal (!= or <>)

Both != and <> mean “not equal to” (depends on the SQL dialect — both are supported in most systems).

SELECT * FROM Employees
WHERE Department != 'IT';

or

SELECT * FROM Employees
WHERE Department <> 'IT';

Meaning: Return employees not in the IT department.

Output:

EmpIDNameDepartment
1KeerthanaHR
4PriyaFinance
5KiranHR

7. Combining Multiple Conditions

a) Using AND

SELECT * FROM Employees
WHERE Department = 'IT' AND Salary > 55000;

Meaning: Show employees who are in the IT department and earn more than 55,000.
→ Both conditions must be true.

Output:

NameDepartmentSalary
SujithIT60000

b) Using OR

SELECT * FROM Employees
WHERE Department = 'IT' OR Salary > 65000;

Meaning: Show employees who are either in the IT department or earn more than 65,000.
→ Only one condition needs to be true.

Output:

NameDepartmentSalary
SujithIT60000
ArjunIT55000
PriyaFinance70000

c) Using NOT

SELECT * FROM Employees
WHERE NOT Department = 'HR';

Meaning: Exclude HR employees from the result.

Output:

NameDepartment
SujithIT
ArjunIT
PriyaFinance

8. Using Multiple Comparisons Together

You can chain multiple operators:

SELECT * FROM Employees
WHERE Salary > 50000 AND Salary < 70000 AND Department = 'IT';

Meaning: Employees in the IT department earning between 50,000 and 70,000.

Output:

NameDepartmentSalary
SujithIT60000
ArjunIT55000

NOTE: WHERE with Strings and Numbers

  • For text values, use quotes: 'IT', 'Bangalore'
  • For numbers, no quotes: Salary > 50000
  • For dates, use quotes with date format: WHERE JoiningDate > '2024-01-01'

🧠 In summary:

OperatorMeaningExampleResult
=Equal toDepartment = 'IT'Only IT employees
>Greater thanSalary > 55000Salary above 55k
<Less thanSalary < 55000Salary below 55k
>=Greater or equalSalary >= 55000Salary 55k or more
<=Less or equalSalary <= 50000Salary 50k or less
!= / <>Not equalDepartment <> 'HR'All except HR
ANDBoth must be trueSalary > 50000 AND Department='IT'Filters by both
OREither condition trueDepartment='HR' OR City='Delhi'Broader filter
NOTOpposite of conditionNOT City='Pune'Excludes Pune