Database & Cloud Roadmap — PostgreSQL · NoSQL · AWS

01

PostgreSQL

SQL Foundations — SELECT, Filtering, Sorting, Aggregates

⏱ 3 days · Skip if you're comfortable with daily SQL

You said you know basics — use the ⚡ skip if you write GROUP BY and HAVING confidently. This covers the pure fundamentals before we go deep.

⚡ Know this already — mark all done

SELECT, WHERE, ORDER BY, LIMIT, OFFSET, DISTINCT

Foundations

▸

SELECT * vs explicit columns — always prefer explicit columns in production (bandwidth + clarity)
WHERE clauses: =, !=, <, >, <=, >=, BETWEEN, IN, NOT IN, LIKE (%), ILIKE (case-insensitive), IS NULL, IS NOT NULL
AND / OR / NOT — precedence: NOT > AND > OR. Use parentheses to be explicit
ORDER BY col ASC|DESC NULLS FIRST|LAST — multiple column sorts: ORDER BY dept, salary DESC
LIMIT + OFFSET — page 3, size 10: LIMIT 10 OFFSET 20. Performance degrades on large offsets — prefer cursor-based pagination
DISTINCT — remove duplicate rows. DISTINCT ON (col) — keep first row per distinct value of col (PostgreSQL-specific)
Aliases: SELECT first_name AS name, table aliases in JOINs: FROM users u

📖 PostgreSQL SELECT Tutorial 💻 pgexercises Basic

JOINs — INNER, LEFT, RIGHT, FULL OUTER, CROSS, SELF

Must Know

▸

INNER JOIN — only rows matching on both sides. Most common INTERVIEW
LEFT JOIN — all rows from left table + matching from right. Non-matches → NULL on right side
RIGHT JOIN — mirror of LEFT JOIN. Rarely used (swap tables to use LEFT instead)
FULL OUTER JOIN — all rows from both, NULLs where no match
CROSS JOIN — cartesian product. N × M rows. Used for generating combinations
SELF JOIN — join table to itself. Classic: employees and their managers in same table
JOIN on multiple conditions: ON a.id = b.a_id AND a.type = b.type
Anti-join pattern: LEFT JOIN ... WHERE b.id IS NULL — rows in A with no match in B

🎯 Interview

"What's the difference between INNER JOIN and LEFT JOIN?" — INNER returns only matched rows; LEFT returns all from the left table. Be able to explain with a concrete example: find all users who have never placed an order.

💻 pgexercises JOIN Exercises 💻 LeetCode SQL 50

Aggregates, GROUP BY, HAVING, Subqueries, CTEs

Must Know

▸

Aggregate functions: COUNT(*), COUNT(DISTINCT col), SUM(), AVG(), MAX(), MIN(), BOOL_AND(), BOOL_OR(), STRING_AGG(col, ','), ARRAY_AGG(col)
GROUP BY — every non-aggregate column in SELECT must be in GROUP BY
HAVING — filter groups AFTER aggregation. WHERE filters rows BEFORE aggregation INTERVIEW
Correlated subquery — references outer query's columns: WHERE salary > (SELECT AVG(salary) FROM employees e2 WHERE e2.dept = e1.dept)
CTE (WITH clause): WITH active_users AS (SELECT * FROM users WHERE active = true) SELECT * FROM active_users WHERE age > 25
Recursive CTE — for hierarchical data (org charts, file trees): WITH RECURSIVE. Base case + recursive case
GROUPING SETS, ROLLUP, CUBE — multiple GROUP BY combinations in one query

💻 pgexercises Aggregates 📖 PostgreSQL: WITH Queries

02

PostgreSQL

Advanced Queries — Window Functions, CASE, String/Date Functions

⏱ 3 days · Window functions are asked in every senior SQL interview

⚡ Know this already — mark all done

Window Functions — ROW_NUMBER, RANK, LAG, LEAD, Running Totals

Interview Critical

▸

Syntax

function() OVER (PARTITION BY col ORDER BY col ROWS/RANGE BETWEEN ...)
PARTITION BY — like GROUP BY but doesn't collapse rows. Each row keeps its own identity
ORDER BY in OVER() — defines ordering within each partition

Ranking Functions

ROW_NUMBER() — unique sequential number. No ties. 1,2,3,4...
RANK() — ties get same rank, next rank skipped. 1,1,3,4...
DENSE_RANK() — ties get same rank, no gaps. 1,1,2,3... INTERVIEW
NTILE(n) — divide into N buckets (quartiles, deciles, etc.)
PERCENT_RANK(), CUME_DIST() — relative position as percentage

Value / Lead-Lag Functions

LAG(col, offset, default) — access previous row's value. Great for day-over-day comparison
LEAD(col, offset, default) — access next row's value
FIRST_VALUE(col) — first value in the window partition
LAST_VALUE(col) — last value (watch out for default frame — use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
NTH_VALUE(col, n) — Nth value in window

Aggregate Window Functions

SUM() OVER(ORDER BY date) — running/cumulative total
AVG() OVER(PARTITION BY dept) — dept average alongside each row
COUNT() OVER() — total count without collapsing rows

🎯 Interview

Classic: "Find the second highest salary per department" — ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) then filter WHERE rn = 2. Also: "What's the difference between RANK and DENSE_RANK?"

📖 PostgreSQL Window Functions 💻 pgexercises Date/Window ▶ Window Functions Explained

CASE WHEN, String Functions, Date/Time Functions, COALESCE

Must Know

▸

CASE WHEN: CASE WHEN age < 18 THEN 'minor' WHEN age < 65 THEN 'adult' ELSE 'senior' END
COALESCE(a, b, c) — first non-NULL value. Use for default values
NULLIF(a, b) — returns NULL if a = b, else returns a. Avoid division by zero: total / NULLIF(count, 0)
String: LOWER(), UPPER(), TRIM(), LTRIM(), RTRIM(), CONCAT(), ||, LENGTH(), SUBSTRING(str FROM n FOR len), REPLACE(), REGEXP_REPLACE(), SPLIT_PART(), POSITION(), INITCAP()
Date/Time: NOW(), CURRENT_DATE, CURRENT_TIMESTAMP, EXTRACT(YEAR FROM date), DATE_PART(), DATE_TRUNC('month', ts), AGE(ts1, ts2), date + INTERVAL '7 days'
AT TIME ZONE: convert between time zones. Always store timestamps as UTC
GENERATE_SERIES(start, stop, step) — generate sequences, date ranges
Type casting: ::integer, ::text, CAST(col AS type)

📖 PostgreSQL Functions Reference

03

PostgreSQL

Schema Design — Tables, Constraints, Normalization, Data Types

⏱ 3 days

⚡ Know this already — mark all done

Data Types — Integer, Text, UUID, Numeric, Boolean, Array, Timestamps

Must Know

▸

Integer types: SMALLINT (2B), INTEGER (4B), BIGINT (8B), SERIAL/BIGSERIAL (auto-increment)
Exact numeric: NUMERIC(p, s) / DECIMAL — for money, never use FLOAT (precision issues)
Text: VARCHAR(n) (capped), TEXT (unlimited, preferred in Postgres — same performance as VARCHAR)
UUID: gen_random_uuid() (pg 13+). Use as PK for distributed systems
Boolean: true/false. Also: 't', 'f', '1', '0', 'yes', 'no'
Timestamps: TIMESTAMP (no tz), TIMESTAMPTZ (with timezone, preferred). DATE (date only), TIME
ARRAY types: tags TEXT[] — PostgreSQL-native arrays. Use with ANY(), @>, unnest()
ENUM: CREATE TYPE status AS ENUM ('PENDING','ACTIVE','CLOSED') — type-safe, but hard to alter
BYTEA — binary data (file content). Usually better to store files in S3, path in DB

📖 PostgreSQL Data Types

Constraints — PK, FK, UNIQUE, NOT NULL, CHECK, EXCLUSION

Critical

▸

PRIMARY KEY — unique + not null. Single or composite. Creates B-tree index automatically
FOREIGN KEY — referential integrity. ON DELETE CASCADE / RESTRICT / SET NULL / SET DEFAULT
UNIQUE — column or multi-column composite unique. Allows one NULL (NULLs not considered equal)
NOT NULL — application-level enforcement. Always add where data is required
CHECK constraint: CHECK (age >= 0 AND age < 150), CHECK (end_date >= start_date)
EXCLUSION constraint — advanced: no two rows can have overlapping ranges (using GiST index)
Deferrable constraints — check at end of transaction instead of per-statement. DEFERRABLE INITIALLY DEFERRED
Named constraints — CONSTRAINT fk_user FOREIGN KEY — easier to identify in error messages

📖 PostgreSQL Constraints

Normalization (1NF–3NF), Denormalization, Relationships

Interview Heavy

▸

1NF — atomic values (no arrays/repeating groups), unique rows, single-valued columns
2NF — 1NF + no partial dependency (all non-key columns depend on FULL primary key, relevant for composite PKs)
3NF — 2NF + no transitive dependency (non-key column depends on another non-key column) INTERVIEW
BCNF (Boyce-Codd) — stronger 3NF. Every determinant must be a candidate key
One-to-Many — FK in the "many" side table. Most common relationship
Many-to-Many — junction/bridge table with two FKs. Can add extra columns to junction (e.g. user_roles with assigned_at)
One-to-One — FK in either table (usually in the "optional" side). Can also use same PK
Denormalization — intentional duplication for read performance. Cache column (e.g. post_count on users), reporting tables, materialized views
UUID vs BIGSERIAL PK: UUID = globally unique, no coordination needed, bigger (16B), random → index fragmentation; BIGSERIAL = sequential, smaller (8B), faster inserts/index INTERVIEW

🎯 Interview

"Explain 1NF, 2NF, 3NF with examples." Be ready with a concrete scenario — e.g. an orders table with product names repeated (violates 3NF). Also: "When would you denormalize?"

▶ Database Design Full Course

04

PostgreSQL

Indexes, EXPLAIN ANALYZE & Query Performance

⏱ 3 days · This is where junior/senior gap shows most

Knowing when and how to index is the single most impactful database skill. A missing index can bring a production system to its knees. A bad index makes writes slow for no gain.

⚡ Know this already — mark all done

Index Types — B-tree, Hash, GIN, GiST, BRIN, Partial, Composite

Critical

▸

B-tree (default) — balanced tree. Best for: equality, range (<, >, BETWEEN), ORDER BY, LIKE 'prefix%'. 90% of your indexes will be B-tree
Hash index — equality only (=). Slightly faster than B-tree for pure equality, but no range support and not WAL-logged before Pg 10
GIN (Generalized Inverted Index) — for JSONB, arrays, full-text search. Indexes each element. Slow to build, fast to query
GiST (Generalized Search Tree) — for geometric types, ranges, PostGIS. Supports custom operators (overlap, contains)
BRIN (Block Range Index) — tiny index for naturally ordered data (timestamps, IDs). Very fast insert, approximate search. Great for append-only time-series tables
Partial index: CREATE INDEX ON orders(user_id) WHERE status = 'PENDING' — smaller index, used only when WHERE matches INTERVIEW
Composite index: CREATE INDEX ON orders(user_id, created_at) — leftmost prefix rule: index usable as (user_id) or (user_id, created_at) but NOT just (created_at)
Covering index (INCLUDE): CREATE INDEX ON users(email) INCLUDE (name, status) — index-only scan, no heap access
When to add index: high-cardinality WHERE clause columns, JOIN ON columns, ORDER BY on large tables, foreign key columns
Index overhead: every index slows INSERT/UPDATE/DELETE. Don't over-index write-heavy tables
CREATE INDEX CONCURRENTLY — add index without locking table. Slower but production-safe

📖 PostgreSQL Indexes 📖 Index Types

EXPLAIN ANALYZE — Reading Query Plans, Seq Scan vs Index Scan

Must Know

▸

EXPLAIN — show estimated query plan (no execution). EXPLAIN ANALYZE — actually runs query, shows actual times
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) — full info for pgbadger/explain.dalibo.com
Seq Scan — reads every row. Fine for small tables or large % of data. Bad on large tables with selective filter
Index Scan — uses index to find rows, then fetches from heap. Good for selective queries
Index Only Scan — data found entirely in index, no heap fetch. Fastest
Bitmap Heap Scan — batches index results, then fetches heap pages. Good for medium selectivity
Hash Join vs Nested Loop vs Merge Join: Hash (large tables, equality), Nested Loop (small inner, with index), Merge (pre-sorted large tables) INTERVIEW
Cost numbers: (startup cost..total cost) in planner units. Actual time in ms. Rows = estimated vs actual
Row estimate way off? Run ANALYZE table_name to update statistics. Or increase default_statistics_target
pg_stat_statements extension — track slow queries across time. Best way to find production bottlenecks
auto_explain — log slow query plans automatically

🎯 Interview

"How would you investigate a slow query?" — EXPLAIN ANALYZE, look for Seq Scan on large tables, check row estimates vs actuals, verify indexes exist and are being used, check for missing statistics, consider query rewrite.

📖 PostgreSQL EXPLAIN 🔍 explain.dalibo.com — Visual EXPLAIN

Views, Materialized Views, Functions, Stored Procedures

Important

▸

VIEW — named SELECT query. No data stored. Always fresh but no performance gain. Use for: security (expose limited columns), DRY SQL
MATERIALIZED VIEW — stores query result on disk. Fast reads. Must refresh: REFRESH MATERIALIZED VIEW CONCURRENTLY mv_name. Use for: expensive aggregations, reporting
Updatable views — simple views on single table can be updated/inserted/deleted
PL/pgSQL function — procedural language in PostgreSQL. Returns value or set of rows. Used for complex business logic, triggers
PROCEDURE — like function but no return value, can use COMMIT/ROLLBACK inside (Pg 11+)
TRIGGER — auto-execute function on INSERT/UPDATE/DELETE. BEFORE or AFTER. Row-level or statement-level. Use for: audit logging, auto-update updated_at column
RETURNING clause — INSERT INTO users (...) VALUES (...) RETURNING id, created_at — get inserted row data without a second SELECT

📖 PostgreSQL Views 📖 Materialized Views

05

PostgreSQL

Transactions, ACID, Isolation Levels & Locking

⏱ 2 days · Top interview topic

⚡ Know this already — mark all done

ACID Properties, Isolation Levels, Read Phenomena

Interview Heavy

▸

Atomicity — all or nothing. BEGIN → operations → COMMIT or ROLLBACK on error
Consistency — transaction takes DB from one valid state to another. Constraints always hold
Isolation — concurrent transactions behave as if serial. Degree controlled by isolation level
Durability — committed data survives crashes. Written to WAL (Write-Ahead Log) before confirming
PostgreSQL default isolation: READ COMMITTED — sees committed data at time of each statement INTERVIEW
Isolation levels: READ UNCOMMITTED (dirty reads — Pg treats same as READ COMMITTED) → READ COMMITTED → REPEATABLE READ → SERIALIZABLE
Dirty read — read uncommitted data. Can't happen in PostgreSQL (MVCC prevents it)
Non-repeatable read — same row returns different data in same transaction. Prevented by REPEATABLE READ
Phantom read — new rows appear in same query within transaction. Prevented by SERIALIZABLE
SAVEPOINT — partial rollback within transaction: SAVEPOINT sp1; ... ROLLBACK TO SAVEPOINT sp1;

📖 PostgreSQL Transaction Isolation ▶ ACID Explained (Hussein Nasser)

MVCC, Row Locking, Advisory Locks, Deadlocks

Must Know

▸

MVCC (Multi-Version Concurrency Control) — PostgreSQL keeps multiple versions of rows. Readers never block writers, writers never block readers INTERVIEW
SELECT FOR UPDATE — lock selected rows for update. Other transactions block until lock released. Pessimistic locking
SELECT FOR UPDATE SKIP LOCKED — skip already-locked rows. Great for job queues
SELECT FOR SHARE — shared read lock. Multiple readers, blocks writers
NOWAIT — fail immediately if lock not available: SELECT FOR UPDATE NOWAIT
Advisory locks — application-level locks: pg_try_advisory_lock(key). Not tied to rows. Use for distributed coordination
Deadlock — T1 holds A waits B, T2 holds B waits A. PostgreSQL auto-detects and aborts one (deadlock_timeout = 1s default) INTERVIEW
Deadlock prevention: always acquire locks in same order. Keep transactions short
VACUUM — reclaim space from dead row versions (MVCC creates dead rows). AUTOVACUUM runs automatically
Table bloat — too many dead tuples. Monitor with pg_stat_user_tables

🎯 Interview

"How does MVCC work in PostgreSQL?" — Each transaction sees a snapshot of data. Rows have xmin/xmax to track visibility. New versions created on update, old versions cleaned by VACUUM. Readers and writers don't block each other.

📖 PostgreSQL MVCC 📖 PostgreSQL Locking

06

PostgreSQL

JSONB, Full-Text Search, Arrays & PostgreSQL-Specific Power Features

⏱ 3 days · Makes Postgres stand out from other SQL databases

⚡ Know this already — mark all done

JSONB — Operators, Indexing, When to Use

Important

▸

JSON vs JSONB — JSON stores text, JSONB is binary (parsed, indexed-able). Always use JSONB
Operators: -> (get JSON object), ->> (get as text), #> (path array), #>> (path as text)
data->'user'->>'email' — nested access
Containment: @> — does left contain right? data @> '{"status": "active"}'
Key existence: ? (key exists), ?| (any key exists), ?& (all keys exist)
GIN index on JSONB: CREATE INDEX ON products USING GIN (attributes) — indexes all keys/values
jsonb_set() — update nested key in place
jsonb_object_keys(), jsonb_each() — expand to rows
When to use JSONB: semi-structured data with varying schema (product attributes, event metadata, config). Not for core business data that you'll query often by specific fields

📖 PostgreSQL JSON Types 📖 JSON Functions

Full-Text Search, Arrays, Range Types, Extensions (pgcrypto, uuid-ossp, PostGIS)

Important

▸

Full-Text Search

tsvector — document: normalized lexemes. tsquery — search query
to_tsvector('english', title || ' ' || body) — parse text to tsvector
to_tsquery('english', 'web & search') — query with AND/OR/NOT
plainto_tsquery() — natural language query (auto-AND)
@@ operator: tsvector @@ tsquery — matches?
ts_rank(tsvector, tsquery) — relevance score for ordering
GIN index on tsvector column — fast full-text search

Arrays

tags TEXT[] — array column. ARRAY['a', 'b'] or '{a, b}'::text[]
@> contains, <@ contained by, && overlaps
unnest(tags) — expand array to rows. ANY(tags) — check membership
GIN index on array column for fast containment queries

Range Types

daterange, tstzrange, int4range — represent a range of values
GiST index for overlap/contains queries: WHERE period @> CURRENT_DATE

Key Extensions

uuid-ossp / pgcrypto — UUID generation
pg_trgm — trigram similarity for fuzzy search (LIKE with GIN index)
PostGIS — geospatial data (geography, geometry types, distance queries)
pg_stat_statements — query performance tracking
TimescaleDB — time-series extension for Postgres

📖 Full-Text Search Docs

07

PostgreSQL

Partitioning, Replication & High Availability

⏱ 3 days · Production and system design topics

⚡ Know this already — mark all done

Table Partitioning — Range, List, Hash

Important

▸

Why partition — manage very large tables (100M+ rows). Pruning: queries only scan relevant partitions. Easier archival (drop old partitions)
Range partitioning — by date range. Most common: PARTITION BY RANGE (created_at). Create: PARTITION OF orders FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
List partitioning — by discrete values: PARTITION BY LIST (region)
Hash partitioning — distribute rows evenly by hash: PARTITION BY HASH (user_id)
Partition pruning — query planner excludes irrelevant partitions when WHERE clause matches partition key
Partitioned indexes — each partition has its own index. Global indexes limited
Default partition — catches rows not matching any partition
Declarative partitioning vs inheritance-based — use declarative (Pg 10+)

📖 PostgreSQL Partitioning

Replication — Streaming, Logical, Read Replicas, Failover

Important

▸

Streaming replication (physical) — binary WAL stream from primary to standby. Hot standby: standby is readable. Used for HA and read replicas
Synchronous vs asynchronous replication — sync: primary waits for standby ACK (no data loss, slower). Async: WAL sent without waiting (faster, tiny data loss possible) INTERVIEW
Read replicas — route SELECT queries to replica, writes to primary. Reduce primary load. Eventual consistency: replication lag
Logical replication — replicate specific tables at SQL level. Allows filtering, transformation. Can replicate to different Postgres major versions
Replication slot — tracks replica's WAL position. Prevents WAL cleanup. Danger: lagging slot causes disk fill
Patroni / pg_auto_failover — automatic failover: promotes standby to primary if primary fails
Connection pooling with PgBouncer — thousand app connections → dozens of DB connections. Transaction mode vs session mode
RTO/RPO: Recovery Time Objective (how fast back up), Recovery Point Objective (how much data loss acceptable) INTERVIEW

📖 PostgreSQL Standby 📖 Logical Replication

08

PostgreSQL

Backup, Security, Configuration & DBA Basics

⏱ 2 days

⚡ Know this already — mark all done

Backup (pg_dump, PITR), Roles, Row-Level Security, Config Tuning

Important

▸

Backup & Recovery

pg_dump — logical backup of database/table to SQL or custom format. pg_dump -Fc dbname > backup.dump
pg_restore — restore from pg_dump custom format
pg_dumpall — dump all databases + roles + tablespaces
PITR (Point-In-Time Recovery) — continuous WAL archiving + base backup. Restore to any point in time
pgBackRest / Barman — enterprise backup tools with compression, encryption, differential backups

Security & Roles

Roles — CREATE ROLE app_user LOGIN PASSWORD '...'. GRANT/REVOKE privileges
Least privilege: app should have only CONNECT, SELECT/INSERT/UPDATE/DELETE on specific tables. Never SUPERUSER in production
Row-Level Security (RLS) — filter rows per user: CREATE POLICY user_policy ON orders USING (user_id = current_user_id()). Enable: ALTER TABLE orders ENABLE ROW LEVEL SECURITY
pg_hba.conf — host-based authentication. Controls who can connect from where
SSL/TLS — encrypt connections. sslmode=require in JDBC

Key Config Parameters

shared_buffers — RAM for caching data pages. Start at 25% of RAM
work_mem — RAM per sort/hash operation. Too high → OOM; too low → disk sorts
max_connections — use PgBouncer instead of bumping this high
effective_cache_size — hint to planner about OS cache available
checkpoint_completion_target, wal_buffers — tune for write performance

📖 PostgreSQL Backup 📖 Row Level Security 🔧 PGTune — Config Generator

09

NoSQL

NoSQL Fundamentals — CAP Theorem, Types, When to Use What

⏱ 1 day · Foundation before touching any specific NoSQL DB

CAP Theorem, BASE vs ACID, NoSQL Types, Data Modeling Philosophy

Critical Foundation

▸

CAP Theorem — in a distributed system you can only guarantee 2 of 3: Consistency, Availability, Partition Tolerance. Networks always partition → choice is C vs A INTERVIEW
CP systems — consistent + partition tolerant. Cassandra, HBase. May be unavailable during partition
AP systems — available + partition tolerant. CouchDB, DynamoDB. Eventually consistent
CA systems — not realistic in distributed systems. Traditional RDBMS on single node is CA
BASE — Basically Available, Soft state, Eventually consistent. Most NoSQL follow this vs ACID INTERVIEW
Eventual consistency — all replicas will converge to same state, but not immediately. Client may read stale data
NoSQL types:
📄 Document (MongoDB, CouchDB) — JSON documents. Flexible schema. Good for content, catalogs
🔑 Key-Value (Redis, DynamoDB) — simple fast lookup. Caching, sessions, leaderboards
📊 Wide-Column (Cassandra, HBase) — sparse table, partition key + clustering. Time-series, IoT
🕸️ Graph (Neo4j, Amazon Neptune) — nodes + edges. Social networks, recommendations
🔍 Search (Elasticsearch, OpenSearch) — full-text inverted index. Log analytics, search
NoSQL data modeling philosophy — model for your queries (denormalize), not for relationships. "Know your access patterns first"

🎯 Interview

"When would you choose NoSQL over SQL?" — No fixed schema, massive horizontal scale, high write throughput, specific data model (graph, timeseries). And: "Explain the CAP theorem with an example."

10

NoSQL

MongoDB — Document Model, CRUD, Indexing

⏱ 4 days · Most widely used NoSQL DB

⚡ Know this already — mark all done

Document Model, BSON, Collections, CRUD Operations

Must Know

▸

Document — JSON-like BSON object. Each document has _id (ObjectId by default)
Collection — group of documents (like a table, but no fixed schema)
ObjectId — 12-byte unique id: 4B timestamp + 5B random + 3B increment. Sortable by creation time
insertOne(), insertMany() — insert documents
findOne({filter}), find({filter}) — query. find() returns cursor (iterate with .toArray(), .forEach())
Query operators: $eq, $ne, $gt, $lt, $gte, $lte, $in, $nin, $exists, $type, $regex, $and, $or, $not, $nor
Projection: find({}, {name: 1, email: 1, _id: 0}) — include/exclude fields
updateOne({filter}, {$set: {field: val}}), updateMany(), replaceOne()
Update operators: $set, $unset, $inc, $push, $pull, $addToSet, $pop, $rename
deleteOne(), deleteMany()
findOneAndUpdate(), findOneAndDelete() — atomic find + modify
upsert: true — insert if not found, update if found
Dot notation — query nested: {"address.city": "NYC"}
Array queries: $elemMatch for array element conditions, $size for array length

📖 MongoDB CRUD Docs 💻 MongoDB University (Free)

MongoDB Indexes, Explain, Schema Design Patterns

Must Know

▸

Single field index: db.users.createIndex({email: 1}) — 1 = ascending, -1 = descending
Compound index: createIndex({status: 1, createdAt: -1}) — ESR rule: Equality → Sort → Range fields
Multikey index — automatically created when indexing array field. Indexes each element
Text index — full-text search: createIndex({title: "text", body: "text"})
Wildcard index — index all fields: createIndex({"$**": 1}). Flexible but large
Sparse index — only index documents that have the field (skip nulls)
TTL index — auto-delete documents after expiry: createIndex({createdAt: 1}, {expireAfterSeconds: 3600})
explain("executionStats") — check if index used. Look for COLLSCAN (bad) vs IXSCAN (good)
Embedding vs Referencing — embed when data is always accessed together and 1-to-few. Reference when data is large, shared, or 1-to-many INTERVIEW
16MB document limit — design around it. Never unbounded arrays
Schema design patterns: Extended Reference, Subset, Computed, Bucket (for time-series), Polymorphic

🎯 Interview

"When would you embed vs reference in MongoDB?" — Embed for data always accessed together (user + address), reference for large/shared data (posts + author). Also: "How would you model a blog with posts and comments?"

📖 MongoDB Indexes 📖 MongoDB Schema Design Rules

11

NoSQL

MongoDB Advanced — Aggregation Pipeline, Transactions, Atlas

⏱ 3 days

⚡ Know this already — mark all done

Aggregation Pipeline, $lookup (JOIN), Change Streams, Transactions

Must Know

▸

Aggregation Pipeline Stages

$match — filter documents (like WHERE). Put early to reduce data processed
$group — group and aggregate: {_id: "$dept", total: {$sum: "$salary"}, count: {$sum: 1}}
$project — reshape: include/exclude/compute fields
$sort — sort documents
$limit, $skip — pagination
$unwind — deconstruct array into separate documents (1 doc per element)
$lookup — LEFT OUTER JOIN between collections: {from: "products", localField: "productId", foreignField: "_id", as: "product"}
$addFields / $set — add computed fields
$replaceRoot — replace document with a nested field
$facet — run multiple sub-pipelines in parallel (great for faceted search)
$bucket, $bucketAuto — histogram/bucketing
$out, $merge — write aggregation results to a collection

Transactions & Change Streams

Multi-document transactions — ACID across collections/databases. Requires replica set. Added in MongoDB 4.0. Use only when needed (document model often avoids need)
Change streams — real-time event stream of collection changes. Uses replica set oplog. Build event-driven apps
MongoDB Atlas — managed MongoDB cloud. Atlas Search (Lucene), Atlas Vector Search, Atlas Data Federation

📖 MongoDB Aggregation 📖 Pipeline Stages Reference

12

NoSQL

Redis — In-Memory Store, Data Structures, Patterns

⏱ 4 days · Used in nearly every production backend

⚡ Know this already — mark all done

Core Data Structures, TTL, Pub/Sub, Streams, Persistence

Must Know

▸

Data Structures

String — bytes, integers, binary. GET/SET, INCR, DECR, APPEND, GETSET, SETNX (set if not exists), GETDEL
Hash — field-value map. HGET/HSET, HMGET/HMSET, HGETALL, HDEL, HEXISTS. Use for: user sessions, object attributes
List — linked list. LPUSH/RPUSH, LPOP/RPOP, LRANGE, LLEN, BLPOP (blocking). Use for: queues, stacks, recent activity
Set — unordered unique strings. SADD, SMEMBERS, SISMEMBER, SUNION, SINTER, SDIFF, SCARD. Use for: tags, unique visitors
Sorted Set (ZSet) — unique members with float score. ZADD, ZRANGE, ZREVRANGE, ZRANK, ZSCORE, ZRANGEBYSCORE. Use for: leaderboards, rate limiting, scheduling INTERVIEW
HyperLogLog — approximate distinct count. PFADD, PFCOUNT. 12KB memory for billions of items
Streams — append-only log with consumer groups. Like Kafka-lite. XADD, XREAD, XGROUP
Geospatial — GEOADD, GEODIST, GEOPOS, GEORADIUS

Key Concepts

TTL/EXPIRE — EXPIRE key 3600, EXPIREAT key timestamp, PERSIST (remove TTL), TTL (check remaining)
Pub/Sub — PUBLISH channel msg, SUBSCRIBE channel. Not durable — use Streams for persistence
Transactions — MULTI/EXEC block. Atomic but not rollback-on-error. WATCH for optimistic locking
Lua scripts — EVAL for atomic multi-operation sequences
Persistence: RDB (snapshots), AOF (append-only log), RDB+AOF (both). Trade-off: RDB = smaller/faster restart, AOF = less data loss INTERVIEW
Redis Cluster — horizontal sharding. 16384 hash slots distributed across nodes
Redis Sentinel — HA for single shard: monitors, notifies, automatic failover
Eviction policies — when maxmemory reached: noeviction, allkeys-lru, volatile-lru, allkeys-lfu, etc.

🎯 Interview

Classic patterns: "How would you implement a rate limiter?" (INCR + EXPIRE or Sorted Set sliding window). "How to implement a leaderboard?" (Sorted Set ZADD/ZREVRANK). "How to implement distributed locking?" (SET nx px with Lua for release).

📖 Redis Data Types 💻 Redis Get Started ▶ Redis Crash Course

13

NoSQL

Apache Cassandra — Wide-Column, Partition Key, CQL

⏱ 4 days · Used at Netflix, Apple, Discord for massive scale

⚡ Know this already — mark all done

Cassandra Architecture, Data Model, CQL, Replication, Consistency Levels

Important

▸

Architecture

Masterless (peer-to-peer) — no single point of failure. Any node can accept any request. Uses consistent hashing ring INTERVIEW
Partition key — determines which node stores the data. Choose key that distributes data evenly and matches query patterns
Clustering key — sort order within a partition. Enables range queries within a partition
Compound primary key: PRIMARY KEY (partition_key, clustering_key1, clustering_key2)
Vnodes (virtual nodes) — each physical node owns multiple token ranges. Easier to add/remove nodes
Gossip protocol — nodes share state information with each other (membership, load)
SSTable + Memtable — writes go to memtable (in-memory), flushed to SSTable on disk. Compaction merges SSTables

CQL (Cassandra Query Language)

Keyspace — like a database. Specify replication: SimpleStrategy (dev) or NetworkTopologyStrategy (prod)
CREATE TABLE messages (user_id UUID, created_at TIMESTAMP, content TEXT, PRIMARY KEY (user_id, created_at))
CQL queries MUST filter by partition key — otherwise full table scan (ALLOW FILTERING is dangerous)
INSERT = UPSERT — Cassandra always upserts. No update anomalies. Use TTL on insert
Lightweight transactions (IF NOT EXISTS / IF condition) — compare-and-swap using Paxos. Heavy — use rarely
Collections in CQL: list, set, map as column types

Replication & Consistency

Replication factor (RF) — how many replicas per piece of data. RF=3 is typical
Consistency levels: ONE, TWO, QUORUM (majority), LOCAL_QUORUM, ALL. QUORUM reads + QUORUM writes = strong consistency INTERVIEW
Tunable consistency — per-query consistency. Trade availability for consistency
Hinted handoff — if replica is down, coordinator stores hint, delivers when replica recovers
Read repair — during read, fix inconsistent replicas automatically

🎯 Interview

"How does Cassandra achieve high availability?" — Masterless ring, tunable consistency, replication factor. Also: "How do you design a table in Cassandra?" — Start with query, design partition key to support that query, no JOINs, denormalize aggressively.

📖 Apache Cassandra Docs ▶ Cassandra Fundamentals

14

NoSQL

Elasticsearch — Search Engine, Inverted Index, Aggregations

⏱ 4 days · The go-to search + analytics database

⚡ Know this already — mark all done

Inverted Index, Mapping, Query DSL, Aggregations, Cluster

Important

▸

Architecture

Inverted index — maps terms to documents containing them. Core data structure behind fast full-text search INTERVIEW
Index — like a database. Contains documents of similar type
Shard — horizontal unit. Primary shards + replica shards. Can't change primary shard count after creation
Node types: Master (cluster state), Data (stores/searches), Coordinating (routes requests), Ingest (pre-processing)
Analyzer pipeline: Character filter → Tokenizer → Token filters. Determines how text is indexed and searched

Mapping & Indexing

Dynamic mapping — Elasticsearch auto-detects types. Risky in production (mapping explosion)
Explicit mapping — define field types: text (analyzed), keyword (exact), integer, date, nested, object, boolean, geo_point
text vs keyword: text = analyzed/full-text searchable; keyword = exact match, aggregatable, sortable INTERVIEW
Nested type — for arrays of objects where you need to query across fields of same object
Index templates + ILM — auto-apply settings to new indices. Index Lifecycle Management for time-based indices

Query DSL

match — full-text search on analyzed fields. Relevance scored
term / terms — exact match on keyword fields. No analysis. Used in filters
bool query — combine: must (AND, scored), should (OR, boosts score), must_not, filter (AND, no score, cached)
range — date/number ranges: {"range": {"date": {"gte": "2024-01-01"}}}
multi_match — search across multiple fields with different boosts
fuzzy — typo-tolerant search via Levenshtein distance
highlight — return matched snippets for display

Aggregations

Bucket aggregations — terms (group by value), date_histogram, range, filters
Metric aggregations — avg, sum, min, max, stats, percentiles
Pipeline aggregations — work on results of other aggregations
Nested aggregations — sub-aggregations within buckets (powerful faceted search)

📖 Elasticsearch Guide 📖 Query DSL Reference ▶ Elasticsearch Tutorial

15

NoSQL

Neo4j & Graph Databases — Nodes, Edges, Cypher, Use Cases

⏱ 3 days · Unique use cases — fraud detection, social graphs, recommendations

⚡ Know this already — mark all done

Graph Model, Cypher Query Language, Graph Algorithms, Use Cases

Know Conceptually

▸

Nodes — entities with labels (like types) and properties. (:User {name: 'Alice'})
Relationships (edges) — typed, directional, with properties. [:FOLLOWS {since: '2023'}]
Property graph model — both nodes and edges have properties. Richer than RDF
Index-free adjacency — traversal is O(1) per hop (pointer to next node). SQL JOINs are O(n) per hop INTERVIEW
Cypher (MATCH pattern) — MATCH (u:User)-[:FOLLOWS]->(other:User) WHERE u.name = 'Alice' RETURN other.name
CREATE, MERGE, SET, DELETE, DETACH DELETE — write operations
Variable length paths: MATCH (a)-[:FOLLOWS*1..3]->(b) — follow up to 3 hops
OPTIONAL MATCH — like LEFT JOIN in graphs
Graph algorithms: PageRank (influence), Community Detection, Shortest Path (DIJKSTRA), Centrality, Node Similarity
When to use graph DB: social networks, fraud detection (suspicious patterns), recommendation engines, knowledge graphs, network/IT topology, supply chain
DynamoDB vs Neo4j for relationships — DynamoDB can model graphs but traversal requires multiple round trips; Neo4j is native

📖 Cypher Manual 💻 Neo4j GraphAcademy (Free)

16

NoSQL

Database Selection Guide — Choosing the Right Tool

⏱ 1 day · Synthesis — this is what system design interviews test

When to Use PostgreSQL, MongoDB, Redis, Cassandra, Elasticsearch, Neo4j

System Design

▸

PostgreSQL — default choice. Complex queries, transactions, strict consistency, relational data, JOINs needed, data integrity critical
MongoDB — flexible/evolving schema, document-like data (products with varying attributes), rapid prototyping, content management
Redis — caching, session storage, rate limiting, pub/sub, leaderboards, job queues, distributed locks. NOT primary datastore
Cassandra — massive write throughput, time-series data, globally distributed, always available (AP), IoT sensor data, activity streams, 100M+ rows
Elasticsearch — full-text search, log aggregation, analytics dashboards, faceted search, autocomplete, ELK stack
Neo4j — highly connected data, relationship traversal, fraud detection, social graphs, recommendation engines
DynamoDB (AWS) — managed key-value/document, auto-scaling, serverless, single-digit ms at any scale. But expensive and limited query patterns
Polyglot persistence — use multiple databases for different purposes in same system. E.g. PostgreSQL for orders + Elasticsearch for search + Redis for caching INTERVIEW

🏗 Build — DB Phase Checkpoint

Build a product catalog: PostgreSQL (users, orders), MongoDB (product catalog with varying attributes), Elasticsearch (product search with facets), Redis (session + cache). Connect all via a REST API.

🔍 DB-Engines Ranking

17

AWS

AWS Foundations — Regions, AZs, Global Infrastructure, Free Tier

⏱ 1 day · Setup + orientation before building anything

⚡ Know this already — mark all done

Regions, Availability Zones, Edge Locations, Shared Responsibility Model

Foundations

▸

Region — geographic area with 2+ AZs. Independent. ~34 regions. Choose closest to users for latency
Availability Zone (AZ) — one or more data centers within a region. Physically separate, connected with low-latency fiber. Deploy across 2+ AZs for HA INTERVIEW
Edge Locations — CloudFront CDN PoPs. Closer to users than regions. ~400+ worldwide
Local Zones — extend AWS infrastructure to metro areas for ultra-low latency
Shared Responsibility Model — AWS owns "security OF the cloud" (hardware, network, hypervisor). You own "security IN the cloud" (OS, app, data, IAM) INTERVIEW
Free Tier — EC2 t2.micro 750 hrs/mo, RDS 750 hrs/mo, S3 5GB, Lambda 1M requests. Set $10 billing alarm immediately!
AWS CLI setup: aws configure — access key, secret, region, output format
AWS console vs CLI vs SDK vs CDK vs CloudFormation — understand when to use each
Well-Architected Framework 6 pillars: Operational Excellence, Security, Reliability, Performance, Cost, Sustainability

📖 AWS Documentation Hub 💻 AWS Free Digital Training ▶ AWS Certified Cloud Practitioner (Free)

18

AWS

IAM — Users, Roles, Policies, MFA, STS

⏱ 2 days · Most important security topic in AWS

⚡ Know this already — mark all done

IAM Users, Groups, Roles, Policies, STS, Organizations

Critical

▸

Root account — never use for daily work. Enable MFA. Create admin IAM user instead
IAM User — has access key + secret (CLI/SDK) or password (console). Belongs to groups
IAM Group — collection of users. Attach policies to group, not individual users
IAM Role — identity that can be assumed by users, services, or other accounts. No static credentials — gets temporary tokens via STS. USE ROLES FOR EC2/ECS/Lambda INTERVIEW
IAM Policy — JSON document: Effect (Allow/Deny), Action (s3:GetObject), Resource (ARN), Condition
Policy types: Identity-based (attached to user/role), Resource-based (attached to S3 bucket, Lambda), Permission boundary, SCP (Service Control Policy)
Least privilege principle — grant minimum permissions needed. Start restrictive, add as needed
STS (Security Token Service) — issues temporary credentials. AssumeRole, GetSessionToken
Instance Profile — IAM role for EC2. App calls IMDS to get temp credentials. No keys in code
AWS Organizations — multi-account management. SCPs restrict what accounts can do
AWS SSO / IAM Identity Center — single sign-on across multiple AWS accounts
Secrets Manager vs Parameter Store — Secrets Manager: auto-rotation, higher cost. SSM Parameter Store: cheaper, manual rotation

🎯 Interview

"How do EC2 instances access other AWS services securely?" — IAM Role attached to instance via Instance Profile. App uses AWS SDK which auto-fetches temp credentials from IMDS endpoint. NO static keys needed or wanted.

📖 IAM User Guide 📖 IAM Policies

19

AWS

Compute — EC2, Auto Scaling, Load Balancers, Lambda

⏱ 4 days

⚡ Know this already — mark all done

EC2 — Instance Types, Pricing, AMI, Security Groups, User Data, ASG, ALB

Must Know

▸

Instance families: t (burstable, cheapest), m (general), c (compute), r (memory), p/g (GPU). Size: nano, micro, small, medium, large, xlarge, 2xlarge...
Pricing models: On-Demand (pay by hour, no commitment), Reserved (1-3yr commitment, 60-75% discount), Spot (bid on unused capacity, up to 90% discount, can be interrupted), Savings Plans INTERVIEW
AMI (Amazon Machine Image) — OS + installed software snapshot. Create custom AMIs for faster launch
Security Group — stateful firewall at instance level. Inbound + outbound rules. Reference SGs in rules (not IP ranges) for internal communication
Key pair — SSH access. Keep private key safe — can't recover
User Data — bootstrap script on first launch. Install software, start services
EBS-optimized — dedicated bandwidth between EC2 and EBS
Auto Scaling Group (ASG) — maintain min/max/desired EC2 count. Scale out/in based on CloudWatch metrics (CPU, custom) INTERVIEW
Launch Template — defines what to launch in ASG (AMI, instance type, SG, IAM role)
Application Load Balancer (ALB) — HTTP/HTTPS, path-based routing, host-based routing, targets: EC2, containers, Lambda. Sticky sessions
Network Load Balancer (NLB) — TCP/UDP, ultra-high performance, static IP. For gaming, IoT
Target Group — ALB routes to target groups. Health checks per target
EC2 metadata service (IMDS) — get instance metadata + IAM credentials: http://169.254.169.254

📖 EC2 User Guide 📖 Auto Scaling Guide

AWS Lambda — Serverless Functions, Triggers, Layers, Cold Starts

Must Know

▸

Lambda — run code without servers. Pay per invocation + duration (ms). Max 15 min, 10GB memory, 512MB disk
Triggers: API Gateway, S3 events, DynamoDB Streams, SQS, SNS, EventBridge, Kinesis, Cognito, ALB
Cold start — first invocation initializes execution environment. Adds 100-1000ms latency. JVM languages (Java) have high cold start. Mitigate: Provisioned Concurrency, keep-warm pings, SnapStart (Java) INTERVIEW
Execution environment — reused across invocations. Init code outside handler runs once. DB connections should be global
Concurrency — each request = one execution environment. 1000 concurrent default limit. Reserved vs provisioned concurrency
Lambda Layers — shared code/dependencies (like shared JARs). Up to 5 layers. Reduce deployment package size
Function URL — HTTPS endpoint for Lambda without API Gateway
Lambda@Edge / CloudFront Functions — run Lambda at CDN edge
Dead Letter Queue (DLQ) — async invocation failures sent to SQS/SNS
Event Source Mapping — polling SQS/Kinesis/DynamoDB Streams. Batch size, parallelization factor

📖 Lambda Developer Guide ▶ Lambda Complete Guide

20

AWS

Storage — S3, EBS, EFS, Storage Classes

⏱ 3 days

⚡ Know this already — mark all done

S3 — Buckets, Storage Classes, Versioning, Lifecycle, Security, Pre-signed URLs

Must Know

▸

Object storage — key-value: key = full path, value = bytes. Not filesystem. Flat structure. Max 5TB per object
Bucket globally unique name — lowercase, 3-63 chars. Tied to a region
Storage classes — Standard (frequent access), Standard-IA (infrequent access, cheaper), One Zone-IA, Intelligent-Tiering (auto moves), Glacier Instant/Flexible/Deep Archive (archive) INTERVIEW
Lifecycle rules — auto-transition between storage classes or delete after N days
Versioning — keep multiple versions of objects. Enables recovery from accidental deletes
MFA Delete — require MFA to permanently delete versioned objects
Bucket policy vs ACL — use bucket policies (JSON). ACLs are legacy
Block Public Access — 4 settings to prevent public access. Enable all by default
Pre-signed URLs — temp URL with expiry for direct client upload/download. Avoid proxying files through your server INTERVIEW
S3 Events — trigger Lambda/SQS/SNS on object create/delete. For file processing pipelines
Transfer Acceleration — upload via CloudFront edge → faster global uploads
Multipart Upload — for files >100MB. Parallel parts, resume on failure
S3 Select / Glacier Select — run SQL on individual objects. Reduce data transfer
Cross-Region Replication (CRR) — replicate objects to another region
EBS — block storage attached to single EC2. Like a hard drive. gp3 (general), io2 (high IOPS), st1 (throughput)
EFS — managed NFS. Multi-AZ, multi-instance shared filesystem. More expensive than EBS

📖 S3 User Guide

21

AWS

Networking — VPC, Subnets, Route Tables, NAT, Security Groups, NACLs

⏱ 3 days · Foundation for everything in AWS

⚡ Know this already — mark all done

VPC Architecture — Subnets, IGW, NAT, Route Tables, VPN, Direct Connect, VPC Peering

Critical

▸

VPC — private virtual network. CIDR block (e.g. 10.0.0.0/16). Per region
Public subnet — has route to Internet Gateway. Resources can have public IPs. Load balancers, bastion hosts go here
Private subnet — no route to IGW. App servers, databases go here. Outbound via NAT Gateway
Internet Gateway (IGW) — enables internet access for VPC. Attach to VPC, add route in public subnet's route table
NAT Gateway — in public subnet. Allows private subnet outbound to internet (software updates, API calls). No inbound. Costs money! INTERVIEW
Route Table — rules for where traffic goes. Each subnet associated with one route table
Security Group — stateful. Instance-level. Allow rules only. Both inbound AND outbound. Reference other SGs
NACL (Network ACL) — stateless. Subnet-level. Allow AND deny rules. Must configure both inbound AND outbound INTERVIEW
VPC Peering — connect two VPCs. Non-transitive (A↔B, B↔C doesn't mean A↔C). Same or cross-account
Transit Gateway — hub-and-spoke for many VPCs. Transitive routing. Replaces complex peering meshes
VPN Gateway — IPSec VPN to on-premises. Encrypted over internet
Direct Connect — dedicated physical network connection to AWS. Low latency, high bandwidth, not over internet
Bastion host — jump server in public subnet for SSH into private instances
VPC Endpoints — access AWS services (S3, DynamoDB) without going through internet. Gateway (S3/DynamoDB, free) vs Interface (powered by PrivateLink, $)

🎯 Interview

"Design a VPC for a 3-tier web app." — Public subnets (ALB, NAT GW), private subnets (EC2 app servers), isolated subnets (RDS). 2 AZs for HA. SG between each tier allows only necessary traffic.

📖 VPC User Guide ▶ VPC Explained (TechWorld Nana)

22

AWS

AWS Databases — RDS, Aurora, DynamoDB, ElastiCache

⏱ 4 days

⚡ Know this already — mark all done

RDS, Aurora, DynamoDB, ElastiCache — Managed DB Services

Must Know

▸

RDS

RDS — managed relational DB: PostgreSQL, MySQL, MariaDB, Oracle, SQL Server
Multi-AZ deployment — synchronous standby in another AZ. Auto failover in 1-2 min. For HA, not read scaling INTERVIEW
Read Replicas — async replication. For read scaling. Up to 15 replicas. Can promote to standalone
RDS Proxy — connection pooler. Reduces connection overhead for Lambda/serverless
Automated backups — point-in-time restore within retention period (up to 35 days)
Parameter Group — configure DB parameters (like postgresql.conf but managed)

Aurora

Aurora — AWS-built MySQL/PostgreSQL compatible engine. 5x PostgreSQL performance claim
Aurora storage — auto-grows up to 128TB, 6 copies across 3 AZs automatically
Aurora Serverless v2 — auto-scales capacity in fractions of ACU. Great for variable workloads
Aurora Global Database — cross-region with <1s replication lag. For global HA

DynamoDB

DynamoDB — fully managed key-value + document store. Single-digit ms at any scale
Primary key — Partition key (hash) or Partition key + Sort key (range) INTERVIEW
GSI (Global Secondary Index) — alternate partition key. Separate RCU/WCU. Eventual consistency
LSI (Local Secondary Index) — same partition key, different sort key. Must be created at table creation
Capacity modes — Provisioned (specify RCU/WCU, autoscaling) vs On-Demand (pay per request, no planning)
Single-table design — store multiple entity types in one table using generic PK/SK. Reduces cost, enables complex access patterns
DynamoDB Streams — change data capture. Trigger Lambda on changes
DAX (DynamoDB Accelerator) — in-memory cache for DynamoDB. Microsecond latency

ElastiCache

ElastiCache for Redis — managed Redis cluster/sentinel. For caching, sessions, queues
ElastiCache for Memcached — simple key-value caching. Multi-threaded. No persistence
Cluster Mode — Redis Cluster mode for sharding across multiple nodes

📖 RDS User Guide 📖 DynamoDB Developer Guide 📖 DynamoDB Guide (Community)

23

AWS

Containers — ECR, ECS Fargate, EKS (Kubernetes on AWS)

⏱ 4 days

⚡ Know this already — mark all done

ECR, ECS Fargate, Task Definitions, Services, EKS Basics

Must Know

▸

ECR (Elastic Container Registry)

Private Docker registry. aws ecr get-login-password | docker login, docker push
Image scanning, lifecycle policies, cross-region replication

ECS (Elastic Container Service)

Task Definition — JSON spec: container image, CPU, memory, environment variables, IAM task role, port mappings, log config
Service — maintain N running copies of a task. Integrates with ALB for load balancing. Auto-replaces failed tasks
Cluster — logical group of services/tasks. ECS on Fargate or EC2 launch type
Fargate — serverless compute for containers. No EC2 management. Pay per task CPU/memory. Per-task IAM role INTERVIEW
EC2 launch type — you manage EC2 instances. More control, more work
Service Auto Scaling — scale task count based on CloudWatch metrics (CPU, custom)
Service Connect / Cloud Map — service discovery between ECS services

EKS (Elastic Kubernetes Service)

EKS — managed Kubernetes control plane. You manage worker nodes (EC2 or Fargate)
Core concepts: Pod, Deployment, Service, Ingress, ConfigMap, Secret, Namespace, Node, HPA
kubectl — CLI tool. Configure kubeconfig: aws eks update-kubeconfig
Helm — Kubernetes package manager. Charts for common deployments
EKS vs ECS: ECS = simpler, AWS-native. EKS = Kubernetes-compatible, more complex, portable INTERVIEW
Fargate for EKS — serverless Kubernetes pods. No EC2 nodes
AWS Load Balancer Controller — creates ALB/NLB from Kubernetes Ingress/Service

📖 ECS Developer Guide 📖 EKS User Guide ▶ ECS + Fargate Complete Tutorial

24

AWS

Messaging — SQS, SNS, EventBridge, Kinesis

⏱ 3 days

⚡ Know this already — mark all done

SQS, SNS, EventBridge, Kinesis — AWS Messaging Services

Must Know

▸

SQS (Simple Queue Service) — managed message queue. At-least-once delivery. Up to 256KB messages. 14-day retention. Visibility timeout (processing window)
SQS Standard — nearly unlimited throughput, at-least-once, best-effort ordering
SQS FIFO — exactly-once processing, strict ordering. Up to 3000 msg/s with batching INTERVIEW
Dead Letter Queue (DLQ) — failed messages after max receives. Monitor DLQ size for failures
Long polling — WaitTimeSeconds=20. Reduces empty responses, cheaper
SNS (Simple Notification Service) — pub/sub push model. Topic → subscribers (SQS, Lambda, HTTP, email, SMS)
Fan-out pattern — SNS → multiple SQS queues. Different services process same event independently INTERVIEW
SNS message filtering — subscribers receive only matching messages (filter by attributes)
EventBridge — serverless event bus. Routes events from AWS services, custom apps, SaaS. Event patterns + rules + targets. Schema registry
EventBridge vs SNS — EventBridge: richer routing, schema discovery, 3rd-party integrations. SNS: simpler, higher throughput fan-out
Kinesis Data Streams — real-time streaming. Multiple consumers, data retention 1-365 days, ordered within shard
Kinesis vs SQS — Kinesis: replayable, multiple consumers, ordered. SQS: simpler, auto-deletes, one consumer per message
Kinesis Data Firehose — load streaming data to S3, Redshift, Elasticsearch. Serverless

📖 SQS Developer Guide 📖 EventBridge Guide

25

AWS

Serverless — API Gateway, Step Functions, SAM, CDK

⏱ 3 days

⚡ Know this already — mark all done

API Gateway, Step Functions, SAM/CDK Infrastructure as Code

Important

▸

API Gateway REST API — create managed REST APIs. Integrate with Lambda, HTTP backend, AWS services. Auth: Cognito, Lambda Authorizer, API Keys
API Gateway HTTP API — newer, simpler, cheaper. JWT auth built-in. No usage plans
WebSocket API — persistent connections. For chat, real-time notifications
API Gateway caching — cache responses for TTL. Reduces Lambda invocations
API Gateway throttling — default 10K req/s per account. Per-stage, per-client throttling
Step Functions — serverless workflow orchestration. State machines: Task, Choice, Parallel, Wait, Map states. Visual workflow editor. For complex multi-Lambda orchestration INTERVIEW
Step Functions Standard vs Express — Standard: at-most-once, long-running, auditable. Express: at-least-once, high throughput, <5min
AWS SAM — Serverless Application Model. CloudFormation extension with shorthand for Lambda, API Gateway, DynamoDB
AWS CDK — Cloud Development Kit. Define infrastructure in TypeScript/Python/Java. Compiles to CloudFormation. Higher-level constructs
CloudFormation — declarative YAML/JSON infrastructure as code. Stack = group of resources. Change sets for updates
Terraform — multi-cloud IaC. HCL language. State management. More portable than CDK

📖 API Gateway Docs 📖 AWS CDK Guide

26

AWS

Monitoring — CloudWatch, X-Ray, CloudTrail, Config

⏱ 2 days

⚡ Know this already — mark all done

CloudWatch Logs/Metrics/Alarms, X-Ray Tracing, CloudTrail, GuardDuty

Must Know

▸

CloudWatch Logs — log groups + log streams. Lambda/ECS auto-sends logs. Retention policies. Log Insights (SQL-like query on logs)
CloudWatch Metrics — time-series metrics. Built-in from AWS services. Custom metrics via PutMetricData API
CloudWatch Alarms — alert on metric threshold → SNS notification → action. Billing alarms = critical
CloudWatch Dashboards — aggregate metrics from multiple services into one view
CloudWatch Contributor Insights — find top contributors to high traffic/errors
AWS X-Ray — distributed tracing. Trace ID propagated across services. Service map visualization. Find latency bottlenecks INTERVIEW
CloudTrail — audit log of every AWS API call. Who did what, when, from where. 90-day event history free. Send to S3 for long-term retention
AWS Config — resource configuration history. Compliance rules (e.g. "all S3 buckets must have encryption")
GuardDuty — threat detection using ML. Analyzes VPC flow logs, CloudTrail, DNS logs. Alerts on suspicious activity
AWS Security Hub — aggregate security findings from GuardDuty, Inspector, Config

📖 CloudWatch Docs 📖 X-Ray Developer Guide

27

AWS

AWS Advanced — CloudFront, Route 53, Cognito, WAF, Cost

⏱ 3 days

⚡ Know this already — mark all done

CloudFront CDN, Route 53 DNS, Cognito Auth, WAF, Cost Optimization

Important

▸

CloudFront — CDN. Cache content at 400+ edge locations. Distributions point to origins (S3, ALB, API GW). TTL-based caching. HTTPS termination
CloudFront Functions / Lambda@Edge — run logic at edge. Viewer request/response, origin request/response
Route 53 — DNS service. Routing policies: Simple, Weighted (A/B testing), Latency-based, Failover, Geolocation, Multivalue INTERVIEW
Route 53 Health Checks — monitor endpoints, trigger failover automatically
ACM (Certificate Manager) — free SSL/TLS certificates. Auto-renew. Use with ALB/CloudFront
Cognito User Pools — managed user directory. Sign-up, login, MFA, social login, JWT tokens
Cognito Identity Pools — exchange token for temporary AWS credentials (access S3, DynamoDB directly from app)
WAF (Web Application Firewall) — protect ALB/API Gateway/CloudFront from OWASP Top 10, rate limiting, geo-blocking
AWS Shield — DDoS protection. Standard (free, auto). Advanced (24/7 DRT, financial protection)
Cost optimization: rightsizing EC2 (Compute Optimizer), Savings Plans, Spot instances for batch, S3 Intelligent Tiering, CloudWatch billing alarm, AWS Cost Explorer, Trusted Advisor

🏗 Build — AWS Checkpoint

Deploy the Java backend from the other roadmap on AWS: ECS Fargate + RDS PostgreSQL + ElastiCache Redis + S3 for file uploads + ALB + CloudFront + Route 53 + Cognito for auth. Set up CI/CD with GitHub Actions pushing to ECR → ECS. Add CloudWatch logging + alarms.

📖 Route 53 Guide 📖 Well-Architected Tool

28

Azure

Azure Foundations — Portal, CLI, Resource Groups, Subscriptions

⏱ 1 day

⚡ Know this already — mark all done

Azure Organization — Subscriptions, Resource Groups, Regions, ARM, Bicep

Foundations

▸

Management hierarchy: Management Groups → Subscriptions → Resource Groups → Resources
Subscription — billing boundary. One subscription per department/project/environment is common
Resource Group — logical container for related resources. Resources in same lifecycle. Apply policies, RBAC, cost tracking at RG level
Azure Regions — 60+ regions. Each has paired region for geo-redundancy. Choose closest to users
Availability Zones — physically separate datacenters within a region. Deploy across 3 AZs for HA INTERVIEW
Azure Portal — web UI. Good for exploration but not for automation
Azure CLI: az login, az group create, az vm create
Azure PowerShell — PowerShell module. More common in Windows-heavy shops
ARM (Azure Resource Manager) — all Azure resources managed via ARM. JSON templates for IaC
Bicep — domain-specific language that compiles to ARM. Cleaner than raw JSON. Preferred over ARM templates now
Azure Free Account — $200 credit for 30 days + 12 months of popular free services

📖 Azure ARM Overview 💻 Microsoft Learn (Free)

29

Azure

Azure IAM — Entra ID (AAD), RBAC, Managed Identity

⏱ 2 days

⚡ Know this already — mark all done

Entra ID, RBAC, Service Principals, Managed Identity, Key Vault

Critical

▸

Microsoft Entra ID (formerly Azure AD) — cloud identity platform. Users, groups, apps, B2B, B2C. Different from on-prem AD
Azure RBAC — Role-Based Access Control. Assign roles (Owner, Contributor, Reader, custom) to: users, groups, service principals, managed identities at: subscription, resource group, or resource scope INTERVIEW
Built-in roles: Owner (full access + manage access), Contributor (full access, no RBAC), Reader (read-only), specific roles (Storage Blob Data Contributor, etc.)
Service Principal — app identity in Entra ID. Like AWS IAM User for apps. Client ID + secret or certificate
Managed Identity — auto-managed SP for Azure resources. System-assigned (tied to resource lifecycle) or User-assigned (reusable). No credentials to manage. Azure equivalent of EC2 Instance Role INTERVIEW
Azure Key Vault — store secrets, keys, certificates. Access via Managed Identity. Soft-delete and purge protection
Key Vault references in App Service — reference Key Vault secrets in app settings without storing values
Conditional Access — policies: require MFA from outside network, block legacy auth, require compliant device
Privileged Identity Management (PIM) — just-in-time privileged access. Request → approve → time-limited elevation
Azure Policy — enforce organizational standards. Audit or deny non-compliant resources (e.g. require tags, allowed regions)

📖 Azure RBAC Docs 📖 Azure Key Vault

30

Azure

Azure Compute — VMs, App Service, Azure Functions, Scale Sets

⏱ 3 days

⚡ Know this already — mark all done

Azure VMs, VMSS, App Service (PaaS), Azure Functions (Serverless)

Must Know

▸

Azure Virtual Machines

VM Series: B (burstable), D (general), E (memory), F (compute). Equivalent to EC2 families
Availability Sets — protect from hardware failure within a datacenter (fault domain + update domain)
VMSS (Virtual Machine Scale Sets) — auto-scale fleet of identical VMs. Uniform (same config) or Flexible (mix). Equivalent to AWS ASG
Proximity Placement Groups — colocate VMs in same datacenter for low latency between them
Spot VMs — use excess Azure capacity at 60-90% discount. Can be evicted. For fault-tolerant batch workloads
Azure Bastion — browser-based SSH/RDP without public IP. No bastion host to manage

Azure App Service

App Service — PaaS for web apps. Supports Java, .NET, Node, Python, PHP, Ruby. Deploy from Git, Docker, GitHub Actions. Built-in TLS, custom domains
App Service Plan — defines compute (Free, Basic, Standard, Premium, Isolated). Multiple apps share one plan
Deployment slots — staging, production slots. Swap with zero downtime (traffic gradually shifted)
Auto-scaling — manual (scale out/in manually) or autoscale (rule-based or predictive)
App Service Environment (ASE) — dedicated infrastructure. For compliance/isolation

Azure Functions

Azure Functions — serverless. Triggers: HTTP, Timer (cron), Queue, Blob, Event Hub, Service Bus, CosmosDB INTERVIEW
Hosting plans: Consumption (pay-per-use, 5min timeout), Premium (pre-warmed instances, no timeout), App Service Plan (dedicated)
Durable Functions — stateful serverless workflows. Orchestrator + Activity functions. Like AWS Step Functions
Cold start — same issue as Lambda. Premium plan keeps instances warm
Function App — container for multiple functions sharing config, storage, scaling

📖 App Service Docs 📖 Azure Functions Docs

31

Azure

Azure Storage — Blob, File, Queue, Table, Data Lake

⏱ 2 days

⚡ Know this already — mark all done

Blob Storage, Storage Tiers, SAS Tokens, Azure Files, Data Lake

Must Know

▸

Azure Blob Storage — object storage. Equivalent to S3. Storage Account → Container → Blob
Blob types: Block (most files, up to 4.75TB), Append (log files), Page (VHD disks)
Access tiers: Hot (frequent access, higher storage cost), Cool (infrequent, lower storage), Cold, Archive (cheapest, hours to retrieve) INTERVIEW
Lifecycle Management — auto-move blobs between tiers or delete based on rules
SAS (Shared Access Signature) — time-limited token for specific operations. Service SAS (single resource), Account SAS (multiple), User Delegation SAS (uses Entra ID). Equivalent to AWS pre-signed URLs
Storage redundancy: LRS (3 copies, 1 datacenter), ZRS (3 AZs), GRS (LRS + async to paired region), GZRS (ZRS + GRS) INTERVIEW
Azure Files — managed SMB/NFS file share. Mount on Windows/Linux/macOS VMs
Azure Queue Storage — simple message queue. Up to 64KB messages. 7-day retention
Azure Table Storage — key-value NoSQL. Cheaper than CosmosDB. Limited query capability
Azure Data Lake Storage Gen2 — Blob Storage + hierarchical namespace. HDFS-compatible. For big data analytics with Spark/Databricks
Storage Firewall — restrict to specific VNets/IPs. Service Endpoints or Private Endpoints

📖 Azure Storage Docs

32

Azure

Azure Databases — Azure SQL, PostgreSQL, CosmosDB, Cache for Redis

⏱ 3 days

⚡ Know this already — mark all done

Azure SQL, PostgreSQL Flexible Server, CosmosDB, Azure Cache for Redis

Must Know

▸

Relational Databases

Azure SQL Database — managed SQL Server. Serverless tier (auto-pause/scale), Hyperscale (up to 100TB), Business Critical (in-memory OLTP)
Azure Database for PostgreSQL Flexible Server — managed PostgreSQL. Choose PostgreSQL version. High availability with standby in same or different AZ. Burstable, General Purpose, Memory Optimized tiers
Azure Database for MySQL Flexible Server — same managed model as PostgreSQL
Elastic Pool — share DTU/vCores across multiple databases. Cost-efficient for variable per-DB usage

CosmosDB

CosmosDB — globally distributed multi-model database. Single-digit ms latency globally INTERVIEW
APIs: NoSQL (native, JSON), MongoDB (wire-compatible), Cassandra, Table, Gremlin (graph)
Consistency levels — 5 levels from Strong to Eventual. Bounded Staleness, Session (default), Consistent Prefix, Eventual
Request Units (RU) — currency for reads/writes. 1 RU = 1KB read. Budget RUs for cost control
Partition key — choose for even distribution AND aligns with common queries
Global distribution — add/remove regions with one click. Multi-region writes (conflict resolution)
Change Feed — ordered log of changes. For event sourcing, caching, Kafka-like patterns
Autoscale — automatically scale RU/s based on demand

Azure Cache for Redis

Managed Redis. Tiers: Basic (single node), Standard (replicated), Premium (cluster, persistence, Geo-replication), Enterprise (Redis Enterprise)
Same use cases as ElastiCache for Redis

📖 CosmosDB Docs 📖 Azure PostgreSQL Docs

33

Azure

Azure Networking — VNet, NSG, Load Balancer, Application Gateway, Front Door

⏱ 2 days

⚡ Know this already — mark all done

VNet, NSG, Azure Load Balancer, Application Gateway, Front Door, DNS

Must Know

▸

VNet (Virtual Network) — isolated private network. CIDR block. Contains subnets. Per region. Equivalent to AWS VPC
Subnets — subdivide VNet. Some subnets delegated to specific services (App Service, AKS, etc.)
NSG (Network Security Group) — stateful firewall at subnet or NIC level. Inbound/outbound rules. Priority-based (lower = higher priority) INTERVIEW
Application Security Groups (ASG) — logical grouping of VMs for NSG rules. Like SG references in AWS
VNet Peering — connect VNets. Low latency. Non-transitive. Global VNet Peering across regions
Azure Load Balancer — Layer 4 (TCP/UDP). Internal or public. Zone-redundant. Health probes
Application Gateway — Layer 7 (HTTP/HTTPS). Path-based routing, SSL termination, WAF. Equivalent to AWS ALB + WAF
Azure Front Door — global CDN + load balancer + WAF. Anycast routing, URL-based routing, caching. Equivalent to AWS CloudFront + Global Accelerator
Azure DNS — host DNS zones in Azure. Private DNS zones for VNet resolution
Private Endpoint — private IP in VNet for Azure service (SQL, Storage, etc.). Traffic stays on Azure backbone
Service Endpoint — route VNet traffic to Azure service directly (without internet). Less secure than Private Endpoint but free
VPN Gateway — site-to-site VPN to on-prem. Point-to-site VPN for developers
ExpressRoute — dedicated private circuit to Azure. Like AWS Direct Connect

📖 Azure VNet Docs 📖 Application Gateway

34

Azure

Containers — ACR, ACI, AKS, Container Apps

⏱ 3 days

⚡ Know this already — mark all done

ACR, ACI, AKS, Azure Container Apps — Container Services on Azure

Must Know

▸

ACR (Azure Container Registry) — managed Docker registry. Geo-replication, image scanning (Defender for Containers), Webhooks. Equivalent to AWS ECR
ACI (Azure Container Instances) — run containers without orchestration. Pay-per-second. Fast startup. Good for: batch jobs, CI/CD, simple tasks. No auto-scaling
AKS (Azure Kubernetes Service) — managed Kubernetes. Microsoft manages control plane (free). You pay for worker nodes
AKS node pools — system (critical pods) and user (workload pods) pools. Mix instance types. Spot node pools for cost
AKS autoscaler — Cluster Autoscaler (add/remove nodes) + HPA (scale pods)
Azure CNI vs Kubenet — Azure CNI assigns VNet IPs to pods. Kubenet uses separate pod CIDR. Azure CNI needed for Windows nodes, AKS Private Cluster
Workload Identity — pods get Azure Managed Identity. No secrets needed for Azure service access
Azure Container Apps — serverless containers. Built on Kubernetes (Dapr, KEDA). Scale to zero. Traffic splitting. Easier than AKS for microservices. Like AWS App Runner or ECS with KEDA INTERVIEW
Dapr (Distributed App Runtime) — sidecar for microservices. Service invocation, pub/sub, state, secrets via portable API. Works with Container Apps
KEDA (Kubernetes Event-Driven Autoscaling) — scale pods based on event sources (queue depth, HTTP traffic)

📖 AKS Documentation 📖 Azure Container Apps ▶ AKS Complete Tutorial

35

Azure

Azure Messaging — Service Bus, Event Hubs, Event Grid, Storage Queue

⏱ 2 days

⚡ Know this already — mark all done

Azure Service Bus, Event Hubs, Event Grid — Messaging Trio

Must Know

▸

Azure Service Bus — enterprise message broker. Queues (point-to-point) and Topics/Subscriptions (pub-sub). Guaranteed delivery, ordering, DLQ, sessions. Equivalent to AWS SQS + SNS combined INTERVIEW
Service Bus features: Message sessions (ordered processing per session), Scheduled messages, Message deferral, Transactions, Auto-forwarding, DLQ
Service Bus tiers: Basic (queues only), Standard, Premium (dedicated, VNet, larger messages)
Azure Event Hubs — event streaming. High throughput (millions/sec). Partitioned. Consumer groups. 1-7 day retention. Kafka-compatible protocol. Equivalent to AWS Kinesis
Event Hubs Capture — auto-archive to Blob Storage / Data Lake
Azure Event Grid — reactive event routing. HTTP push model. Routes events from Azure services (Blob, CosmosDB) to handlers (Functions, Logic Apps, Webhooks). At-least-once delivery. Equivalent to AWS EventBridge
When to use which: Service Bus = business messages, ordering, DLQ needed. Event Hubs = high-throughput streaming/telemetry. Event Grid = lightweight reactive events from Azure services INTERVIEW
Azure Storage Queue — basic simple queue. 64KB max, 7-day visibility. Free tier. Use for simple decoupling when Service Bus features not needed

📖 Service Bus Docs 📖 Event Hubs Docs 📖 Event Grid Docs

36

Azure

Azure DevOps — Pipelines, Repos, Boards, Artifacts

⏱ 2 days

⚡ Know this already — mark all done

Azure Pipelines, Repos, Artifacts, GitHub Actions on Azure

Must Know

▸

Azure DevOps — suite: Boards (Jira-like), Repos (Git), Pipelines (CI/CD), Test Plans, Artifacts (package registry)
Azure Repos — Git repositories. Unlimited free private repos. Branch policies (require PR, min reviewers)
Azure Pipelines — CI/CD. YAML-based pipelines. Self-hosted agents (your VMs) or Microsoft-hosted agents
Pipeline YAML structure: trigger → stages → jobs → steps (tasks and scripts)
Service connections — connect pipeline to Azure subscription, GitHub, Docker Hub, Kubernetes
Variable groups — shared variables across pipelines. Link to Key Vault for secrets
Environments — staging, production with approval gates. Required approvers before deploy
Azure Artifacts — private Maven, npm, NuGet, Python package registry
GitHub Actions → Azure — deploy to Azure from GitHub Actions using Azure service principals or OIDC (federated identity, no stored secrets)
Deployment strategies: Rolling, Blue-Green, Canary in Azure Pipelines and AKS

📖 Azure Pipelines Docs 📖 GitHub Actions + Azure

37

Azure

Monitoring — Azure Monitor, Log Analytics, App Insights, Defender

⏱ 2 days

⚡ Know this already — mark all done

Azure Monitor, Log Analytics, Application Insights, Defender for Cloud

Must Know

▸

Azure Monitor — umbrella monitoring platform. Collects metrics (Azure resources), logs (Azure Monitor Logs / Log Analytics)
Log Analytics Workspace — central repository for logs. Query with KQL (Kusto Query Language)
KQL basics: requests | where timestamp > ago(1h) | summarize count() by resultCode. Powerful SQL-like language for log analysis
Azure Metrics — time-series data for Azure resources. Metrics Explorer for visualization. Equivalent to CloudWatch Metrics
Alerts — metric alerts, log search alerts, activity log alerts. Action Groups: email, SMS, webhook, ITSM
Application Insights — APM for apps. SDK for Java (Java agent, Spring Boot starter), Node, .NET. Auto-collects: request rates, latency, exceptions, dependencies, live metrics INTERVIEW
Distributed tracing in App Insights — correlation ID propagated across services. End-to-end transaction tracking
Application Map — visual topology of microservices with failure rates and latency
Smart Detection — ML-based anomaly detection on request rates, latency, failures
Defender for Cloud — Cloud Security Posture Management (CSPM) + Cloud Workload Protection (CWPP). Security score. Vulnerability assessment. Multi-cloud (AWS, GCP too)
Azure Sentinel (Microsoft Sentinel) — SIEM/SOAR. AI-powered threat detection and response across enterprise

📖 Azure Monitor Docs 📖 Application Insights

38

Synthesis

AWS vs Azure — Service Mapping, Certifications, Multi-Cloud Strategy

⏱ 1 day · Synthesis and interview prep

Service Comparison, Certifications Path, Multi-Cloud & Hybrid

System Design

▸

AWS → Azure Service Mapping

EC2 → Azure VMs | Lambda → Azure Functions | ECS/EKS → AKS/Container Apps
S3 → Blob Storage | EBS → Managed Disk | EFS → Azure Files
RDS → Azure SQL / PostgreSQL Flexible | Aurora → Azure SQL Hyperscale | DynamoDB → CosmosDB
VPC → VNet | Security Groups → NSG | ALB → Application Gateway | CloudFront → Front Door
IAM → Entra ID + RBAC | Secrets Manager → Key Vault | CloudTrail → Azure Monitor Activity Log
SQS → Service Bus Queues | SNS → Service Bus Topics / Event Grid | Kinesis → Event Hubs
CloudWatch → Azure Monitor | X-Ray → Application Insights | GuardDuty → Defender for Cloud
Route 53 → Azure DNS | ACM → App Service Managed Certs | WAF → Azure WAF
ECR → ACR | Cognito → Entra External ID (B2C/B2B) | Step Functions → Durable Functions

Certifications Path

AWS: Cloud Practitioner (CLF-C02) → Solutions Architect Associate (SAA-C03) → Developer Associate → Solutions Architect Pro → DevOps Pro
Azure: AZ-900 Fundamentals → AZ-104 Administrator → AZ-204 Developer → AZ-305 Architecture → AZ-400 DevOps
Start with: AWS SAA-C03 if targeting backend/architecture roles. AZ-204 if company is Microsoft/Azure-heavy

Multi-Cloud & When to Choose

Choose AWS — largest service catalog, most mature, most startups, best Lambda ecosystem
Choose Azure — Microsoft shops (.NET, Office 365, Active Directory), enterprise compliance needs, hybrid cloud with on-prem Windows
Multi-cloud — use Terraform for portable IaC. Kubernetes for portable compute. Service mesh (Istio) for portable networking. But resist premature multi-cloud — it adds operational complexity

🏗 Final Capstone

Deploy the same app on BOTH AWS and Azure using Terraform. Use PostgreSQL on RDS (AWS) and PostgreSQL Flexible Server (Azure). Blob/S3 for file storage. Service Bus (Azure) / SQS (AWS) for messaging. GitHub Actions for CI/CD to both clouds. This proves cloud-agnostic skills.

📖 Azure for AWS Professionals 📖 AWS vs Azure Full Comparison 💻 Practice Exams

Database & CloudComplete Roadmap

Welcome back

Database & Cloud
Complete Roadmap