Database & Cloud
Complete Roadmap
PostgreSQL deep dive, all major NoSQL databases, AWS from basics to advanced, and Azure from scratch. In exact order with docs, videos, practice links, and interview tips. Skip what you know using ⚡.
- SELECT * vs explicit columns — always prefer explicit columns in production (bandwidth + clarity)
- WHERE clauses: =, !=, <, >, <=, >=, BETWEEN, IN, NOT IN, LIKE (%), ILIKE (case-insensitive), IS NULL, IS NOT NULL
- AND / OR / NOT — precedence: NOT > AND > OR. Use parentheses to be explicit
- ORDER BY col ASC|DESC NULLS FIRST|LAST — multiple column sorts: ORDER BY dept, salary DESC
-
LIMIT + OFFSET — page 3, size 10:
LIMIT 10 OFFSET 20. Performance degrades on large offsets — prefer cursor-based pagination - DISTINCT — remove duplicate rows. DISTINCT ON (col) — keep first row per distinct value of col (PostgreSQL-specific)
-
Aliases:
SELECT first_name AS name, table aliases in JOINs:FROM users u
- INNER JOIN — only rows matching on both sides. Most common INTERVIEW
- LEFT JOIN — all rows from left table + matching from right. Non-matches → NULL on right side
- RIGHT JOIN — mirror of LEFT JOIN. Rarely used (swap tables to use LEFT instead)
- FULL OUTER JOIN — all rows from both, NULLs where no match
- CROSS JOIN — cartesian product. N × M rows. Used for generating combinations
- SELF JOIN — join table to itself. Classic: employees and their managers in same table
-
JOIN on multiple conditions:
ON a.id = b.a_id AND a.type = b.type -
Anti-join pattern:
LEFT JOIN ... WHERE b.id IS NULL— rows in A with no match in B
"What's the difference between INNER JOIN and LEFT JOIN?" — INNER returns only matched rows; LEFT returns all from the left table. Be able to explain with a concrete example: find all users who have never placed an order.
- Aggregate functions: COUNT(*), COUNT(DISTINCT col), SUM(), AVG(), MAX(), MIN(), BOOL_AND(), BOOL_OR(), STRING_AGG(col, ','), ARRAY_AGG(col)
- GROUP BY — every non-aggregate column in SELECT must be in GROUP BY
- HAVING — filter groups AFTER aggregation. WHERE filters rows BEFORE aggregation INTERVIEW
-
Correlated subquery — references outer
query's columns:
WHERE salary > (SELECT AVG(salary) FROM employees e2 WHERE e2.dept = e1.dept) -
CTE (WITH clause):
WITH active_users AS (SELECT * FROM users WHERE active = true) SELECT * FROM active_users WHERE age > 25 - Recursive CTE — for hierarchical data (org charts, file trees): WITH RECURSIVE. Base case + recursive case
- GROUPING SETS, ROLLUP, CUBE — multiple GROUP BY combinations in one query
-
function() OVER (PARTITION BY col ORDER BY col ROWS/RANGE BETWEEN ...) - PARTITION BY — like GROUP BY but doesn't collapse rows. Each row keeps its own identity
- ORDER BY in OVER() — defines ordering within each partition
- ROW_NUMBER() — unique sequential number. No ties. 1,2,3,4...
- RANK() — ties get same rank, next rank skipped. 1,1,3,4...
- DENSE_RANK() — ties get same rank, no gaps. 1,1,2,3... INTERVIEW
- NTILE(n) — divide into N buckets (quartiles, deciles, etc.)
- PERCENT_RANK(), CUME_DIST() — relative position as percentage
- LAG(col, offset, default) — access previous row's value. Great for day-over-day comparison
- LEAD(col, offset, default) — access next row's value
- FIRST_VALUE(col) — first value in the window partition
- LAST_VALUE(col) — last value (watch out for default frame — use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
- NTH_VALUE(col, n) — Nth value in window
- SUM() OVER(ORDER BY date) — running/cumulative total
- AVG() OVER(PARTITION BY dept) — dept average alongside each row
- COUNT() OVER() — total count without collapsing rows
Classic: "Find the second highest salary per department" —
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary
DESC)
then filter WHERE rn = 2. Also: "What's the difference
between RANK and DENSE_RANK?"
-
CASE WHEN:
CASE WHEN age < 18 THEN 'minor' WHEN age < 65 THEN 'adult' ELSE 'senior' END - COALESCE(a, b, c) — first non-NULL value. Use for default values
-
NULLIF(a, b) — returns NULL if a = b, else
returns a. Avoid division by zero:
total / NULLIF(count, 0) - String: LOWER(), UPPER(), TRIM(), LTRIM(), RTRIM(), CONCAT(), ||, LENGTH(), SUBSTRING(str FROM n FOR len), REPLACE(), REGEXP_REPLACE(), SPLIT_PART(), POSITION(), INITCAP()
- Date/Time: NOW(), CURRENT_DATE, CURRENT_TIMESTAMP, EXTRACT(YEAR FROM date), DATE_PART(), DATE_TRUNC('month', ts), AGE(ts1, ts2), date + INTERVAL '7 days'
- AT TIME ZONE: convert between time zones. Always store timestamps as UTC
- GENERATE_SERIES(start, stop, step) — generate sequences, date ranges
-
Type casting:
::integer,::text,CAST(col AS type)
- Integer types: SMALLINT (2B), INTEGER (4B), BIGINT (8B), SERIAL/BIGSERIAL (auto-increment)
- Exact numeric: NUMERIC(p, s) / DECIMAL — for money, never use FLOAT (precision issues)
- Text: VARCHAR(n) (capped), TEXT (unlimited, preferred in Postgres — same performance as VARCHAR)
- UUID: gen_random_uuid() (pg 13+). Use as PK for distributed systems
- Boolean: true/false. Also: 't', 'f', '1', '0', 'yes', 'no'
- Timestamps: TIMESTAMP (no tz), TIMESTAMPTZ (with timezone, preferred). DATE (date only), TIME
-
ARRAY types:
tags TEXT[]— PostgreSQL-native arrays. Use with ANY(), @>, unnest() -
ENUM:
CREATE TYPE status AS ENUM ('PENDING','ACTIVE','CLOSED')— type-safe, but hard to alter - BYTEA — binary data (file content). Usually better to store files in S3, path in DB
- PRIMARY KEY — unique + not null. Single or composite. Creates B-tree index automatically
- FOREIGN KEY — referential integrity. ON DELETE CASCADE / RESTRICT / SET NULL / SET DEFAULT
- UNIQUE — column or multi-column composite unique. Allows one NULL (NULLs not considered equal)
- NOT NULL — application-level enforcement. Always add where data is required
-
CHECK constraint:
CHECK (age >= 0 AND age < 150),CHECK (end_date >= start_date) - EXCLUSION constraint — advanced: no two rows can have overlapping ranges (using GiST index)
- Deferrable constraints — check at end of transaction instead of per-statement. DEFERRABLE INITIALLY DEFERRED
-
Named constraints —
CONSTRAINT fk_user FOREIGN KEY— easier to identify in error messages
- 1NF — atomic values (no arrays/repeating groups), unique rows, single-valued columns
- 2NF — 1NF + no partial dependency (all non-key columns depend on FULL primary key, relevant for composite PKs)
- 3NF — 2NF + no transitive dependency (non-key column depends on another non-key column) INTERVIEW
- BCNF (Boyce-Codd) — stronger 3NF. Every determinant must be a candidate key
- One-to-Many — FK in the "many" side table. Most common relationship
- Many-to-Many — junction/bridge table with two FKs. Can add extra columns to junction (e.g. user_roles with assigned_at)
- One-to-One — FK in either table (usually in the "optional" side). Can also use same PK
- Denormalization — intentional duplication for read performance. Cache column (e.g. post_count on users), reporting tables, materialized views
- UUID vs BIGSERIAL PK: UUID = globally unique, no coordination needed, bigger (16B), random → index fragmentation; BIGSERIAL = sequential, smaller (8B), faster inserts/index INTERVIEW
"Explain 1NF, 2NF, 3NF with examples." Be ready with a concrete scenario — e.g. an orders table with product names repeated (violates 3NF). Also: "When would you denormalize?"
- B-tree (default) — balanced tree. Best for: equality, range (<, >, BETWEEN), ORDER BY, LIKE 'prefix%'. 90% of your indexes will be B-tree
- Hash index — equality only (=). Slightly faster than B-tree for pure equality, but no range support and not WAL-logged before Pg 10
- GIN (Generalized Inverted Index) — for JSONB, arrays, full-text search. Indexes each element. Slow to build, fast to query
- GiST (Generalized Search Tree) — for geometric types, ranges, PostGIS. Supports custom operators (overlap, contains)
- BRIN (Block Range Index) — tiny index for naturally ordered data (timestamps, IDs). Very fast insert, approximate search. Great for append-only time-series tables
-
Partial index:
CREATE INDEX ON orders(user_id) WHERE status = 'PENDING'— smaller index, used only when WHERE matches INTERVIEW -
Composite index:
CREATE INDEX ON orders(user_id, created_at)— leftmost prefix rule: index usable as (user_id) or (user_id, created_at) but NOT just (created_at) -
Covering index (INCLUDE):
CREATE INDEX ON users(email) INCLUDE (name, status)— index-only scan, no heap access - When to add index: high-cardinality WHERE clause columns, JOIN ON columns, ORDER BY on large tables, foreign key columns
- Index overhead: every index slows INSERT/UPDATE/DELETE. Don't over-index write-heavy tables
- CREATE INDEX CONCURRENTLY — add index without locking table. Slower but production-safe
- EXPLAIN — show estimated query plan (no execution). EXPLAIN ANALYZE — actually runs query, shows actual times
- EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) — full info for pgbadger/explain.dalibo.com
- Seq Scan — reads every row. Fine for small tables or large % of data. Bad on large tables with selective filter
- Index Scan — uses index to find rows, then fetches from heap. Good for selective queries
- Index Only Scan — data found entirely in index, no heap fetch. Fastest
- Bitmap Heap Scan — batches index results, then fetches heap pages. Good for medium selectivity
- Hash Join vs Nested Loop vs Merge Join: Hash (large tables, equality), Nested Loop (small inner, with index), Merge (pre-sorted large tables) INTERVIEW
- Cost numbers: (startup cost..total cost) in planner units. Actual time in ms. Rows = estimated vs actual
- Row estimate way off? Run ANALYZE table_name to update statistics. Or increase default_statistics_target
- pg_stat_statements extension — track slow queries across time. Best way to find production bottlenecks
- auto_explain — log slow query plans automatically
"How would you investigate a slow query?" — EXPLAIN ANALYZE, look for Seq Scan on large tables, check row estimates vs actuals, verify indexes exist and are being used, check for missing statistics, consider query rewrite.
- VIEW — named SELECT query. No data stored. Always fresh but no performance gain. Use for: security (expose limited columns), DRY SQL
- MATERIALIZED VIEW — stores query result on disk. Fast reads. Must refresh: REFRESH MATERIALIZED VIEW CONCURRENTLY mv_name. Use for: expensive aggregations, reporting
- Updatable views — simple views on single table can be updated/inserted/deleted
- PL/pgSQL function — procedural language in PostgreSQL. Returns value or set of rows. Used for complex business logic, triggers
- PROCEDURE — like function but no return value, can use COMMIT/ROLLBACK inside (Pg 11+)
- TRIGGER — auto-execute function on INSERT/UPDATE/DELETE. BEFORE or AFTER. Row-level or statement-level. Use for: audit logging, auto-update updated_at column
-
RETURNING clause —
INSERT INTO users (...) VALUES (...) RETURNING id, created_at— get inserted row data without a second SELECT
- Atomicity — all or nothing. BEGIN → operations → COMMIT or ROLLBACK on error
- Consistency — transaction takes DB from one valid state to another. Constraints always hold
- Isolation — concurrent transactions behave as if serial. Degree controlled by isolation level
- Durability — committed data survives crashes. Written to WAL (Write-Ahead Log) before confirming
- PostgreSQL default isolation: READ COMMITTED — sees committed data at time of each statement INTERVIEW
- Isolation levels: READ UNCOMMITTED (dirty reads — Pg treats same as READ COMMITTED) → READ COMMITTED → REPEATABLE READ → SERIALIZABLE
- Dirty read — read uncommitted data. Can't happen in PostgreSQL (MVCC prevents it)
- Non-repeatable read — same row returns different data in same transaction. Prevented by REPEATABLE READ
- Phantom read — new rows appear in same query within transaction. Prevented by SERIALIZABLE
- SAVEPOINT — partial rollback within transaction: SAVEPOINT sp1; ... ROLLBACK TO SAVEPOINT sp1;
- MVCC (Multi-Version Concurrency Control) — PostgreSQL keeps multiple versions of rows. Readers never block writers, writers never block readers INTERVIEW
- SELECT FOR UPDATE — lock selected rows for update. Other transactions block until lock released. Pessimistic locking
- SELECT FOR UPDATE SKIP LOCKED — skip already-locked rows. Great for job queues
- SELECT FOR SHARE — shared read lock. Multiple readers, blocks writers
-
NOWAIT — fail immediately if lock not
available:
SELECT FOR UPDATE NOWAIT - Advisory locks — application-level locks: pg_try_advisory_lock(key). Not tied to rows. Use for distributed coordination
- Deadlock — T1 holds A waits B, T2 holds B waits A. PostgreSQL auto-detects and aborts one (deadlock_timeout = 1s default) INTERVIEW
- Deadlock prevention: always acquire locks in same order. Keep transactions short
- VACUUM — reclaim space from dead row versions (MVCC creates dead rows). AUTOVACUUM runs automatically
- Table bloat — too many dead tuples. Monitor with pg_stat_user_tables
"How does MVCC work in PostgreSQL?" — Each transaction sees a snapshot of data. Rows have xmin/xmax to track visibility. New versions created on update, old versions cleaned by VACUUM. Readers and writers don't block each other.
- JSON vs JSONB — JSON stores text, JSONB is binary (parsed, indexed-able). Always use JSONB
-
Operators:
->(get JSON object),->>(get as text),#>(path array),#>>(path as text) -
data->'user'->>'email'— nested access -
Containment:
@>— does left contain right?data @> '{"status": "active"}' -
Key existence:
?(key exists),?|(any key exists),?&(all keys exist) -
GIN index on JSONB:
CREATE INDEX ON products USING GIN (attributes)— indexes all keys/values - jsonb_set() — update nested key in place
- jsonb_object_keys(), jsonb_each() — expand to rows
- When to use JSONB: semi-structured data with varying schema (product attributes, event metadata, config). Not for core business data that you'll query often by specific fields
- tsvector — document: normalized lexemes. tsquery — search query
-
to_tsvector('english', title || ' ' || body)— parse text to tsvector -
to_tsquery('english', 'web & search')— query with AND/OR/NOT -
plainto_tsquery()— natural language query (auto-AND) -
@@ operator:
tsvector @@ tsquery— matches? -
ts_rank(tsvector, tsquery)— relevance score for ordering - GIN index on tsvector column — fast full-text search
-
tags TEXT[]— array column.ARRAY['a', 'b']or'{a, b}'::text[] -
@>contains,<@contained by,&&overlaps -
unnest(tags)— expand array to rows. ANY(tags) — check membership - GIN index on array column for fast containment queries
- daterange, tstzrange, int4range — represent a range of values
-
GiST index for overlap/contains queries:
WHERE period @> CURRENT_DATE
- uuid-ossp / pgcrypto — UUID generation
- pg_trgm — trigram similarity for fuzzy search (LIKE with GIN index)
- PostGIS — geospatial data (geography, geometry types, distance queries)
- pg_stat_statements — query performance tracking
- TimescaleDB — time-series extension for Postgres
- Why partition — manage very large tables (100M+ rows). Pruning: queries only scan relevant partitions. Easier archival (drop old partitions)
-
Range partitioning — by date range. Most
common:
PARTITION BY RANGE (created_at). Create:PARTITION OF orders FOR VALUES FROM ('2024-01-01') TO ('2025-01-01') -
List partitioning — by discrete values:
PARTITION BY LIST (region) -
Hash partitioning — distribute rows evenly
by hash:
PARTITION BY HASH (user_id) - Partition pruning — query planner excludes irrelevant partitions when WHERE clause matches partition key
- Partitioned indexes — each partition has its own index. Global indexes limited
- Default partition — catches rows not matching any partition
- Declarative partitioning vs inheritance-based — use declarative (Pg 10+)
- Streaming replication (physical) — binary WAL stream from primary to standby. Hot standby: standby is readable. Used for HA and read replicas
- Synchronous vs asynchronous replication — sync: primary waits for standby ACK (no data loss, slower). Async: WAL sent without waiting (faster, tiny data loss possible) INTERVIEW
- Read replicas — route SELECT queries to replica, writes to primary. Reduce primary load. Eventual consistency: replication lag
- Logical replication — replicate specific tables at SQL level. Allows filtering, transformation. Can replicate to different Postgres major versions
- Replication slot — tracks replica's WAL position. Prevents WAL cleanup. Danger: lagging slot causes disk fill
- Patroni / pg_auto_failover — automatic failover: promotes standby to primary if primary fails
- Connection pooling with PgBouncer — thousand app connections → dozens of DB connections. Transaction mode vs session mode
- RTO/RPO: Recovery Time Objective (how fast back up), Recovery Point Objective (how much data loss acceptable) INTERVIEW
-
pg_dump — logical backup of database/table
to SQL or custom format.
pg_dump -Fc dbname > backup.dump - pg_restore — restore from pg_dump custom format
- pg_dumpall — dump all databases + roles + tablespaces
- PITR (Point-In-Time Recovery) — continuous WAL archiving + base backup. Restore to any point in time
- pgBackRest / Barman — enterprise backup tools with compression, encryption, differential backups
- Roles — CREATE ROLE app_user LOGIN PASSWORD '...'. GRANT/REVOKE privileges
- Least privilege: app should have only CONNECT, SELECT/INSERT/UPDATE/DELETE on specific tables. Never SUPERUSER in production
-
Row-Level Security (RLS) — filter rows per
user:
CREATE POLICY user_policy ON orders USING (user_id = current_user_id()). Enable: ALTER TABLE orders ENABLE ROW LEVEL SECURITY - pg_hba.conf — host-based authentication. Controls who can connect from where
- SSL/TLS — encrypt connections. sslmode=require in JDBC
- shared_buffers — RAM for caching data pages. Start at 25% of RAM
- work_mem — RAM per sort/hash operation. Too high → OOM; too low → disk sorts
- max_connections — use PgBouncer instead of bumping this high
- effective_cache_size — hint to planner about OS cache available
- checkpoint_completion_target, wal_buffers — tune for write performance
- CAP Theorem — in a distributed system you can only guarantee 2 of 3: Consistency, Availability, Partition Tolerance. Networks always partition → choice is C vs A INTERVIEW
- CP systems — consistent + partition tolerant. Cassandra, HBase. May be unavailable during partition
- AP systems — available + partition tolerant. CouchDB, DynamoDB. Eventually consistent
- CA systems — not realistic in distributed systems. Traditional RDBMS on single node is CA
- BASE — Basically Available, Soft state, Eventually consistent. Most NoSQL follow this vs ACID INTERVIEW
- Eventual consistency — all replicas will converge to same state, but not immediately. Client may read stale data
- NoSQL types:
- 📄 Document (MongoDB, CouchDB) — JSON documents. Flexible schema. Good for content, catalogs
- 🔑 Key-Value (Redis, DynamoDB) — simple fast lookup. Caching, sessions, leaderboards
- 📊 Wide-Column (Cassandra, HBase) — sparse table, partition key + clustering. Time-series, IoT
- 🕸️ Graph (Neo4j, Amazon Neptune) — nodes + edges. Social networks, recommendations
- 🔍 Search (Elasticsearch, OpenSearch) — full-text inverted index. Log analytics, search
- NoSQL data modeling philosophy — model for your queries (denormalize), not for relationships. "Know your access patterns first"
"When would you choose NoSQL over SQL?" — No fixed schema, massive horizontal scale, high write throughput, specific data model (graph, timeseries). And: "Explain the CAP theorem with an example."
- Document — JSON-like BSON object. Each document has _id (ObjectId by default)
- Collection — group of documents (like a table, but no fixed schema)
- ObjectId — 12-byte unique id: 4B timestamp + 5B random + 3B increment. Sortable by creation time
- insertOne(), insertMany() — insert documents
- findOne({filter}), find({filter}) — query. find() returns cursor (iterate with .toArray(), .forEach())
- Query operators: $eq, $ne, $gt, $lt, $gte, $lte, $in, $nin, $exists, $type, $regex, $and, $or, $not, $nor
-
Projection:
find({}, {name: 1, email: 1, _id: 0})— include/exclude fields - updateOne({filter}, {$set: {field: val}}), updateMany(), replaceOne()
- Update operators: $set, $unset, $inc, $push, $pull, $addToSet, $pop, $rename
- deleteOne(), deleteMany()
- findOneAndUpdate(), findOneAndDelete() — atomic find + modify
- upsert: true — insert if not found, update if found
-
Dot notation — query nested:
{"address.city": "NYC"} - Array queries: $elemMatch for array element conditions, $size for array length
-
Single field index:
db.users.createIndex({email: 1})— 1 = ascending, -1 = descending -
Compound index:
createIndex({status: 1, createdAt: -1})— ESR rule: Equality → Sort → Range fields - Multikey index — automatically created when indexing array field. Indexes each element
-
Text index — full-text search:
createIndex({title: "text", body: "text"}) -
Wildcard index — index all fields:
createIndex({"$**": 1}). Flexible but large - Sparse index — only index documents that have the field (skip nulls)
-
TTL index — auto-delete documents after
expiry:
createIndex({createdAt: 1}, {expireAfterSeconds: 3600}) - explain("executionStats") — check if index used. Look for COLLSCAN (bad) vs IXSCAN (good)
- Embedding vs Referencing — embed when data is always accessed together and 1-to-few. Reference when data is large, shared, or 1-to-many INTERVIEW
- 16MB document limit — design around it. Never unbounded arrays
- Schema design patterns: Extended Reference, Subset, Computed, Bucket (for time-series), Polymorphic
"When would you embed vs reference in MongoDB?" — Embed for data always accessed together (user + address), reference for large/shared data (posts + author). Also: "How would you model a blog with posts and comments?"
- $match — filter documents (like WHERE). Put early to reduce data processed
-
$group — group and aggregate:
{_id: "$dept", total: {$sum: "$salary"}, count: {$sum: 1}} - $project — reshape: include/exclude/compute fields
- $sort — sort documents
- $limit, $skip — pagination
- $unwind — deconstruct array into separate documents (1 doc per element)
-
$lookup — LEFT OUTER JOIN between
collections:
{from: "products", localField: "productId", foreignField: "_id", as: "product"} - $addFields / $set — add computed fields
- $replaceRoot — replace document with a nested field
- $facet — run multiple sub-pipelines in parallel (great for faceted search)
- $bucket, $bucketAuto — histogram/bucketing
- $out, $merge — write aggregation results to a collection
- Multi-document transactions — ACID across collections/databases. Requires replica set. Added in MongoDB 4.0. Use only when needed (document model often avoids need)
- Change streams — real-time event stream of collection changes. Uses replica set oplog. Build event-driven apps
- MongoDB Atlas — managed MongoDB cloud. Atlas Search (Lucene), Atlas Vector Search, Atlas Data Federation
- String — bytes, integers, binary. GET/SET, INCR, DECR, APPEND, GETSET, SETNX (set if not exists), GETDEL
- Hash — field-value map. HGET/HSET, HMGET/HMSET, HGETALL, HDEL, HEXISTS. Use for: user sessions, object attributes
- List — linked list. LPUSH/RPUSH, LPOP/RPOP, LRANGE, LLEN, BLPOP (blocking). Use for: queues, stacks, recent activity
- Set — unordered unique strings. SADD, SMEMBERS, SISMEMBER, SUNION, SINTER, SDIFF, SCARD. Use for: tags, unique visitors
- Sorted Set (ZSet) — unique members with float score. ZADD, ZRANGE, ZREVRANGE, ZRANK, ZSCORE, ZRANGEBYSCORE. Use for: leaderboards, rate limiting, scheduling INTERVIEW
- HyperLogLog — approximate distinct count. PFADD, PFCOUNT. 12KB memory for billions of items
- Streams — append-only log with consumer groups. Like Kafka-lite. XADD, XREAD, XGROUP
- Geospatial — GEOADD, GEODIST, GEOPOS, GEORADIUS
- TTL/EXPIRE — EXPIRE key 3600, EXPIREAT key timestamp, PERSIST (remove TTL), TTL (check remaining)
- Pub/Sub — PUBLISH channel msg, SUBSCRIBE channel. Not durable — use Streams for persistence
- Transactions — MULTI/EXEC block. Atomic but not rollback-on-error. WATCH for optimistic locking
- Lua scripts — EVAL for atomic multi-operation sequences
- Persistence: RDB (snapshots), AOF (append-only log), RDB+AOF (both). Trade-off: RDB = smaller/faster restart, AOF = less data loss INTERVIEW
- Redis Cluster — horizontal sharding. 16384 hash slots distributed across nodes
- Redis Sentinel — HA for single shard: monitors, notifies, automatic failover
- Eviction policies — when maxmemory reached: noeviction, allkeys-lru, volatile-lru, allkeys-lfu, etc.
Classic patterns: "How would you implement a rate limiter?" (INCR + EXPIRE or Sorted Set sliding window). "How to implement a leaderboard?" (Sorted Set ZADD/ZREVRANK). "How to implement distributed locking?" (SET nx px with Lua for release).
- Masterless (peer-to-peer) — no single point of failure. Any node can accept any request. Uses consistent hashing ring INTERVIEW
- Partition key — determines which node stores the data. Choose key that distributes data evenly and matches query patterns
- Clustering key — sort order within a partition. Enables range queries within a partition
-
Compound primary key:
PRIMARY KEY (partition_key, clustering_key1, clustering_key2) - Vnodes (virtual nodes) — each physical node owns multiple token ranges. Easier to add/remove nodes
- Gossip protocol — nodes share state information with each other (membership, load)
- SSTable + Memtable — writes go to memtable (in-memory), flushed to SSTable on disk. Compaction merges SSTables
- Keyspace — like a database. Specify replication: SimpleStrategy (dev) or NetworkTopologyStrategy (prod)
-
CREATE TABLE messages (user_id UUID, created_at TIMESTAMP, content TEXT, PRIMARY KEY (user_id, created_at)) - CQL queries MUST filter by partition key — otherwise full table scan (ALLOW FILTERING is dangerous)
- INSERT = UPSERT — Cassandra always upserts. No update anomalies. Use TTL on insert
- Lightweight transactions (IF NOT EXISTS / IF condition) — compare-and-swap using Paxos. Heavy — use rarely
- Collections in CQL: list, set, map as column types
- Replication factor (RF) — how many replicas per piece of data. RF=3 is typical
- Consistency levels: ONE, TWO, QUORUM (majority), LOCAL_QUORUM, ALL. QUORUM reads + QUORUM writes = strong consistency INTERVIEW
- Tunable consistency — per-query consistency. Trade availability for consistency
- Hinted handoff — if replica is down, coordinator stores hint, delivers when replica recovers
- Read repair — during read, fix inconsistent replicas automatically
"How does Cassandra achieve high availability?" — Masterless ring, tunable consistency, replication factor. Also: "How do you design a table in Cassandra?" — Start with query, design partition key to support that query, no JOINs, denormalize aggressively.
- Inverted index — maps terms to documents containing them. Core data structure behind fast full-text search INTERVIEW
- Index — like a database. Contains documents of similar type
- Shard — horizontal unit. Primary shards + replica shards. Can't change primary shard count after creation
- Node types: Master (cluster state), Data (stores/searches), Coordinating (routes requests), Ingest (pre-processing)
- Analyzer pipeline: Character filter → Tokenizer → Token filters. Determines how text is indexed and searched
- Dynamic mapping — Elasticsearch auto-detects types. Risky in production (mapping explosion)
- Explicit mapping — define field types: text (analyzed), keyword (exact), integer, date, nested, object, boolean, geo_point
- text vs keyword: text = analyzed/full-text searchable; keyword = exact match, aggregatable, sortable INTERVIEW
- Nested type — for arrays of objects where you need to query across fields of same object
- Index templates + ILM — auto-apply settings to new indices. Index Lifecycle Management for time-based indices
- match — full-text search on analyzed fields. Relevance scored
- term / terms — exact match on keyword fields. No analysis. Used in filters
- bool query — combine: must (AND, scored), should (OR, boosts score), must_not, filter (AND, no score, cached)
-
range — date/number ranges:
{"range": {"date": {"gte": "2024-01-01"}}} - multi_match — search across multiple fields with different boosts
- fuzzy — typo-tolerant search via Levenshtein distance
- highlight — return matched snippets for display
- Bucket aggregations — terms (group by value), date_histogram, range, filters
- Metric aggregations — avg, sum, min, max, stats, percentiles
- Pipeline aggregations — work on results of other aggregations
- Nested aggregations — sub-aggregations within buckets (powerful faceted search)
-
Nodes — entities with labels (like types)
and properties.
(:User {name: 'Alice'}) -
Relationships (edges) — typed, directional,
with properties.
[:FOLLOWS {since: '2023'}] - Property graph model — both nodes and edges have properties. Richer than RDF
- Index-free adjacency — traversal is O(1) per hop (pointer to next node). SQL JOINs are O(n) per hop INTERVIEW
-
Cypher (MATCH pattern) —
MATCH (u:User)-[:FOLLOWS]->(other:User) WHERE u.name = 'Alice' RETURN other.name - CREATE, MERGE, SET, DELETE, DETACH DELETE — write operations
-
Variable length paths:
MATCH (a)-[:FOLLOWS*1..3]->(b)— follow up to 3 hops - OPTIONAL MATCH — like LEFT JOIN in graphs
- Graph algorithms: PageRank (influence), Community Detection, Shortest Path (DIJKSTRA), Centrality, Node Similarity
- When to use graph DB: social networks, fraud detection (suspicious patterns), recommendation engines, knowledge graphs, network/IT topology, supply chain
- DynamoDB vs Neo4j for relationships — DynamoDB can model graphs but traversal requires multiple round trips; Neo4j is native
- PostgreSQL — default choice. Complex queries, transactions, strict consistency, relational data, JOINs needed, data integrity critical
- MongoDB — flexible/evolving schema, document-like data (products with varying attributes), rapid prototyping, content management
- Redis — caching, session storage, rate limiting, pub/sub, leaderboards, job queues, distributed locks. NOT primary datastore
- Cassandra — massive write throughput, time-series data, globally distributed, always available (AP), IoT sensor data, activity streams, 100M+ rows
- Elasticsearch — full-text search, log aggregation, analytics dashboards, faceted search, autocomplete, ELK stack
- Neo4j — highly connected data, relationship traversal, fraud detection, social graphs, recommendation engines
- DynamoDB (AWS) — managed key-value/document, auto-scaling, serverless, single-digit ms at any scale. But expensive and limited query patterns
- Polyglot persistence — use multiple databases for different purposes in same system. E.g. PostgreSQL for orders + Elasticsearch for search + Redis for caching INTERVIEW
Build a product catalog: PostgreSQL (users, orders), MongoDB (product catalog with varying attributes), Elasticsearch (product search with facets), Redis (session + cache). Connect all via a REST API.
- Region — geographic area with 2+ AZs. Independent. ~34 regions. Choose closest to users for latency
- Availability Zone (AZ) — one or more data centers within a region. Physically separate, connected with low-latency fiber. Deploy across 2+ AZs for HA INTERVIEW
- Edge Locations — CloudFront CDN PoPs. Closer to users than regions. ~400+ worldwide
- Local Zones — extend AWS infrastructure to metro areas for ultra-low latency
- Shared Responsibility Model — AWS owns "security OF the cloud" (hardware, network, hypervisor). You own "security IN the cloud" (OS, app, data, IAM) INTERVIEW
- Free Tier — EC2 t2.micro 750 hrs/mo, RDS 750 hrs/mo, S3 5GB, Lambda 1M requests. Set $10 billing alarm immediately!
-
AWS CLI setup:
aws configure— access key, secret, region, output format - AWS console vs CLI vs SDK vs CDK vs CloudFormation — understand when to use each
- Well-Architected Framework 6 pillars: Operational Excellence, Security, Reliability, Performance, Cost, Sustainability
- Root account — never use for daily work. Enable MFA. Create admin IAM user instead
- IAM User — has access key + secret (CLI/SDK) or password (console). Belongs to groups
- IAM Group — collection of users. Attach policies to group, not individual users
- IAM Role — identity that can be assumed by users, services, or other accounts. No static credentials — gets temporary tokens via STS. USE ROLES FOR EC2/ECS/Lambda INTERVIEW
- IAM Policy — JSON document: Effect (Allow/Deny), Action (s3:GetObject), Resource (ARN), Condition
- Policy types: Identity-based (attached to user/role), Resource-based (attached to S3 bucket, Lambda), Permission boundary, SCP (Service Control Policy)
- Least privilege principle — grant minimum permissions needed. Start restrictive, add as needed
- STS (Security Token Service) — issues temporary credentials. AssumeRole, GetSessionToken
- Instance Profile — IAM role for EC2. App calls IMDS to get temp credentials. No keys in code
- AWS Organizations — multi-account management. SCPs restrict what accounts can do
- AWS SSO / IAM Identity Center — single sign-on across multiple AWS accounts
- Secrets Manager vs Parameter Store — Secrets Manager: auto-rotation, higher cost. SSM Parameter Store: cheaper, manual rotation
"How do EC2 instances access other AWS services securely?" — IAM Role attached to instance via Instance Profile. App uses AWS SDK which auto-fetches temp credentials from IMDS endpoint. NO static keys needed or wanted.
- Instance families: t (burstable, cheapest), m (general), c (compute), r (memory), p/g (GPU). Size: nano, micro, small, medium, large, xlarge, 2xlarge...
- Pricing models: On-Demand (pay by hour, no commitment), Reserved (1-3yr commitment, 60-75% discount), Spot (bid on unused capacity, up to 90% discount, can be interrupted), Savings Plans INTERVIEW
- AMI (Amazon Machine Image) — OS + installed software snapshot. Create custom AMIs for faster launch
- Security Group — stateful firewall at instance level. Inbound + outbound rules. Reference SGs in rules (not IP ranges) for internal communication
- Key pair — SSH access. Keep private key safe — can't recover
- User Data — bootstrap script on first launch. Install software, start services
- EBS-optimized — dedicated bandwidth between EC2 and EBS
- Auto Scaling Group (ASG) — maintain min/max/desired EC2 count. Scale out/in based on CloudWatch metrics (CPU, custom) INTERVIEW
- Launch Template — defines what to launch in ASG (AMI, instance type, SG, IAM role)
- Application Load Balancer (ALB) — HTTP/HTTPS, path-based routing, host-based routing, targets: EC2, containers, Lambda. Sticky sessions
- Network Load Balancer (NLB) — TCP/UDP, ultra-high performance, static IP. For gaming, IoT
- Target Group — ALB routes to target groups. Health checks per target
- EC2 metadata service (IMDS) — get instance metadata + IAM credentials: http://169.254.169.254
- Lambda — run code without servers. Pay per invocation + duration (ms). Max 15 min, 10GB memory, 512MB disk
- Triggers: API Gateway, S3 events, DynamoDB Streams, SQS, SNS, EventBridge, Kinesis, Cognito, ALB
- Cold start — first invocation initializes execution environment. Adds 100-1000ms latency. JVM languages (Java) have high cold start. Mitigate: Provisioned Concurrency, keep-warm pings, SnapStart (Java) INTERVIEW
- Execution environment — reused across invocations. Init code outside handler runs once. DB connections should be global
- Concurrency — each request = one execution environment. 1000 concurrent default limit. Reserved vs provisioned concurrency
- Lambda Layers — shared code/dependencies (like shared JARs). Up to 5 layers. Reduce deployment package size
- Function URL — HTTPS endpoint for Lambda without API Gateway
- Lambda@Edge / CloudFront Functions — run Lambda at CDN edge
- Dead Letter Queue (DLQ) — async invocation failures sent to SQS/SNS
- Event Source Mapping — polling SQS/Kinesis/DynamoDB Streams. Batch size, parallelization factor
- Object storage — key-value: key = full path, value = bytes. Not filesystem. Flat structure. Max 5TB per object
- Bucket globally unique name — lowercase, 3-63 chars. Tied to a region
- Storage classes — Standard (frequent access), Standard-IA (infrequent access, cheaper), One Zone-IA, Intelligent-Tiering (auto moves), Glacier Instant/Flexible/Deep Archive (archive) INTERVIEW
- Lifecycle rules — auto-transition between storage classes or delete after N days
- Versioning — keep multiple versions of objects. Enables recovery from accidental deletes
- MFA Delete — require MFA to permanently delete versioned objects
- Bucket policy vs ACL — use bucket policies (JSON). ACLs are legacy
- Block Public Access — 4 settings to prevent public access. Enable all by default
- Pre-signed URLs — temp URL with expiry for direct client upload/download. Avoid proxying files through your server INTERVIEW
- S3 Events — trigger Lambda/SQS/SNS on object create/delete. For file processing pipelines
- Transfer Acceleration — upload via CloudFront edge → faster global uploads
- Multipart Upload — for files >100MB. Parallel parts, resume on failure
- S3 Select / Glacier Select — run SQL on individual objects. Reduce data transfer
- Cross-Region Replication (CRR) — replicate objects to another region
- EBS — block storage attached to single EC2. Like a hard drive. gp3 (general), io2 (high IOPS), st1 (throughput)
- EFS — managed NFS. Multi-AZ, multi-instance shared filesystem. More expensive than EBS
- VPC — private virtual network. CIDR block (e.g. 10.0.0.0/16). Per region
- Public subnet — has route to Internet Gateway. Resources can have public IPs. Load balancers, bastion hosts go here
- Private subnet — no route to IGW. App servers, databases go here. Outbound via NAT Gateway
- Internet Gateway (IGW) — enables internet access for VPC. Attach to VPC, add route in public subnet's route table
- NAT Gateway — in public subnet. Allows private subnet outbound to internet (software updates, API calls). No inbound. Costs money! INTERVIEW
- Route Table — rules for where traffic goes. Each subnet associated with one route table
- Security Group — stateful. Instance-level. Allow rules only. Both inbound AND outbound. Reference other SGs
- NACL (Network ACL) — stateless. Subnet-level. Allow AND deny rules. Must configure both inbound AND outbound INTERVIEW
- VPC Peering — connect two VPCs. Non-transitive (A↔B, B↔C doesn't mean A↔C). Same or cross-account
- Transit Gateway — hub-and-spoke for many VPCs. Transitive routing. Replaces complex peering meshes
- VPN Gateway — IPSec VPN to on-premises. Encrypted over internet
- Direct Connect — dedicated physical network connection to AWS. Low latency, high bandwidth, not over internet
- Bastion host — jump server in public subnet for SSH into private instances
- VPC Endpoints — access AWS services (S3, DynamoDB) without going through internet. Gateway (S3/DynamoDB, free) vs Interface (powered by PrivateLink, $)
"Design a VPC for a 3-tier web app." — Public subnets (ALB, NAT GW), private subnets (EC2 app servers), isolated subnets (RDS). 2 AZs for HA. SG between each tier allows only necessary traffic.
- RDS — managed relational DB: PostgreSQL, MySQL, MariaDB, Oracle, SQL Server
- Multi-AZ deployment — synchronous standby in another AZ. Auto failover in 1-2 min. For HA, not read scaling INTERVIEW
- Read Replicas — async replication. For read scaling. Up to 15 replicas. Can promote to standalone
- RDS Proxy — connection pooler. Reduces connection overhead for Lambda/serverless
- Automated backups — point-in-time restore within retention period (up to 35 days)
- Parameter Group — configure DB parameters (like postgresql.conf but managed)
- Aurora — AWS-built MySQL/PostgreSQL compatible engine. 5x PostgreSQL performance claim
- Aurora storage — auto-grows up to 128TB, 6 copies across 3 AZs automatically
- Aurora Serverless v2 — auto-scales capacity in fractions of ACU. Great for variable workloads
- Aurora Global Database — cross-region with <1s replication lag. For global HA
- DynamoDB — fully managed key-value + document store. Single-digit ms at any scale
- Primary key — Partition key (hash) or Partition key + Sort key (range) INTERVIEW
- GSI (Global Secondary Index) — alternate partition key. Separate RCU/WCU. Eventual consistency
- LSI (Local Secondary Index) — same partition key, different sort key. Must be created at table creation
- Capacity modes — Provisioned (specify RCU/WCU, autoscaling) vs On-Demand (pay per request, no planning)
- Single-table design — store multiple entity types in one table using generic PK/SK. Reduces cost, enables complex access patterns
- DynamoDB Streams — change data capture. Trigger Lambda on changes
- DAX (DynamoDB Accelerator) — in-memory cache for DynamoDB. Microsecond latency
- ElastiCache for Redis — managed Redis cluster/sentinel. For caching, sessions, queues
- ElastiCache for Memcached — simple key-value caching. Multi-threaded. No persistence
- Cluster Mode — Redis Cluster mode for sharding across multiple nodes
-
Private Docker registry.
aws ecr get-login-password | docker login, docker push - Image scanning, lifecycle policies, cross-region replication
- Task Definition — JSON spec: container image, CPU, memory, environment variables, IAM task role, port mappings, log config
- Service — maintain N running copies of a task. Integrates with ALB for load balancing. Auto-replaces failed tasks
- Cluster — logical group of services/tasks. ECS on Fargate or EC2 launch type
- Fargate — serverless compute for containers. No EC2 management. Pay per task CPU/memory. Per-task IAM role INTERVIEW
- EC2 launch type — you manage EC2 instances. More control, more work
- Service Auto Scaling — scale task count based on CloudWatch metrics (CPU, custom)
- Service Connect / Cloud Map — service discovery between ECS services
- EKS — managed Kubernetes control plane. You manage worker nodes (EC2 or Fargate)
- Core concepts: Pod, Deployment, Service, Ingress, ConfigMap, Secret, Namespace, Node, HPA
-
kubectl — CLI tool. Configure kubeconfig:
aws eks update-kubeconfig - Helm — Kubernetes package manager. Charts for common deployments
- EKS vs ECS: ECS = simpler, AWS-native. EKS = Kubernetes-compatible, more complex, portable INTERVIEW
- Fargate for EKS — serverless Kubernetes pods. No EC2 nodes
- AWS Load Balancer Controller — creates ALB/NLB from Kubernetes Ingress/Service
- SQS (Simple Queue Service) — managed message queue. At-least-once delivery. Up to 256KB messages. 14-day retention. Visibility timeout (processing window)
- SQS Standard — nearly unlimited throughput, at-least-once, best-effort ordering
- SQS FIFO — exactly-once processing, strict ordering. Up to 3000 msg/s with batching INTERVIEW
- Dead Letter Queue (DLQ) — failed messages after max receives. Monitor DLQ size for failures
- Long polling — WaitTimeSeconds=20. Reduces empty responses, cheaper
- SNS (Simple Notification Service) — pub/sub push model. Topic → subscribers (SQS, Lambda, HTTP, email, SMS)
- Fan-out pattern — SNS → multiple SQS queues. Different services process same event independently INTERVIEW
- SNS message filtering — subscribers receive only matching messages (filter by attributes)
- EventBridge — serverless event bus. Routes events from AWS services, custom apps, SaaS. Event patterns + rules + targets. Schema registry
- EventBridge vs SNS — EventBridge: richer routing, schema discovery, 3rd-party integrations. SNS: simpler, higher throughput fan-out
- Kinesis Data Streams — real-time streaming. Multiple consumers, data retention 1-365 days, ordered within shard
- Kinesis vs SQS — Kinesis: replayable, multiple consumers, ordered. SQS: simpler, auto-deletes, one consumer per message
- Kinesis Data Firehose — load streaming data to S3, Redshift, Elasticsearch. Serverless
- API Gateway REST API — create managed REST APIs. Integrate with Lambda, HTTP backend, AWS services. Auth: Cognito, Lambda Authorizer, API Keys
- API Gateway HTTP API — newer, simpler, cheaper. JWT auth built-in. No usage plans
- WebSocket API — persistent connections. For chat, real-time notifications
- API Gateway caching — cache responses for TTL. Reduces Lambda invocations
- API Gateway throttling — default 10K req/s per account. Per-stage, per-client throttling
- Step Functions — serverless workflow orchestration. State machines: Task, Choice, Parallel, Wait, Map states. Visual workflow editor. For complex multi-Lambda orchestration INTERVIEW
- Step Functions Standard vs Express — Standard: at-most-once, long-running, auditable. Express: at-least-once, high throughput, <5min
- AWS SAM — Serverless Application Model. CloudFormation extension with shorthand for Lambda, API Gateway, DynamoDB
- AWS CDK — Cloud Development Kit. Define infrastructure in TypeScript/Python/Java. Compiles to CloudFormation. Higher-level constructs
- CloudFormation — declarative YAML/JSON infrastructure as code. Stack = group of resources. Change sets for updates
- Terraform — multi-cloud IaC. HCL language. State management. More portable than CDK
- CloudWatch Logs — log groups + log streams. Lambda/ECS auto-sends logs. Retention policies. Log Insights (SQL-like query on logs)
- CloudWatch Metrics — time-series metrics. Built-in from AWS services. Custom metrics via PutMetricData API
- CloudWatch Alarms — alert on metric threshold → SNS notification → action. Billing alarms = critical
- CloudWatch Dashboards — aggregate metrics from multiple services into one view
- CloudWatch Contributor Insights — find top contributors to high traffic/errors
- AWS X-Ray — distributed tracing. Trace ID propagated across services. Service map visualization. Find latency bottlenecks INTERVIEW
- CloudTrail — audit log of every AWS API call. Who did what, when, from where. 90-day event history free. Send to S3 for long-term retention
- AWS Config — resource configuration history. Compliance rules (e.g. "all S3 buckets must have encryption")
- GuardDuty — threat detection using ML. Analyzes VPC flow logs, CloudTrail, DNS logs. Alerts on suspicious activity
- AWS Security Hub — aggregate security findings from GuardDuty, Inspector, Config
- CloudFront — CDN. Cache content at 400+ edge locations. Distributions point to origins (S3, ALB, API GW). TTL-based caching. HTTPS termination
- CloudFront Functions / Lambda@Edge — run logic at edge. Viewer request/response, origin request/response
- Route 53 — DNS service. Routing policies: Simple, Weighted (A/B testing), Latency-based, Failover, Geolocation, Multivalue INTERVIEW
- Route 53 Health Checks — monitor endpoints, trigger failover automatically
- ACM (Certificate Manager) — free SSL/TLS certificates. Auto-renew. Use with ALB/CloudFront
- Cognito User Pools — managed user directory. Sign-up, login, MFA, social login, JWT tokens
- Cognito Identity Pools — exchange token for temporary AWS credentials (access S3, DynamoDB directly from app)
- WAF (Web Application Firewall) — protect ALB/API Gateway/CloudFront from OWASP Top 10, rate limiting, geo-blocking
- AWS Shield — DDoS protection. Standard (free, auto). Advanced (24/7 DRT, financial protection)
- Cost optimization: rightsizing EC2 (Compute Optimizer), Savings Plans, Spot instances for batch, S3 Intelligent Tiering, CloudWatch billing alarm, AWS Cost Explorer, Trusted Advisor
Deploy the Java backend from the other roadmap on AWS: ECS Fargate + RDS PostgreSQL + ElastiCache Redis + S3 for file uploads + ALB + CloudFront + Route 53 + Cognito for auth. Set up CI/CD with GitHub Actions pushing to ECR → ECS. Add CloudWatch logging + alarms.
- Management hierarchy: Management Groups → Subscriptions → Resource Groups → Resources
- Subscription — billing boundary. One subscription per department/project/environment is common
- Resource Group — logical container for related resources. Resources in same lifecycle. Apply policies, RBAC, cost tracking at RG level
- Azure Regions — 60+ regions. Each has paired region for geo-redundancy. Choose closest to users
- Availability Zones — physically separate datacenters within a region. Deploy across 3 AZs for HA INTERVIEW
- Azure Portal — web UI. Good for exploration but not for automation
-
Azure CLI:
az login,az group create,az vm create - Azure PowerShell — PowerShell module. More common in Windows-heavy shops
- ARM (Azure Resource Manager) — all Azure resources managed via ARM. JSON templates for IaC
- Bicep — domain-specific language that compiles to ARM. Cleaner than raw JSON. Preferred over ARM templates now
- Azure Free Account — $200 credit for 30 days + 12 months of popular free services
- Microsoft Entra ID (formerly Azure AD) — cloud identity platform. Users, groups, apps, B2B, B2C. Different from on-prem AD
- Azure RBAC — Role-Based Access Control. Assign roles (Owner, Contributor, Reader, custom) to: users, groups, service principals, managed identities at: subscription, resource group, or resource scope INTERVIEW
- Built-in roles: Owner (full access + manage access), Contributor (full access, no RBAC), Reader (read-only), specific roles (Storage Blob Data Contributor, etc.)
- Service Principal — app identity in Entra ID. Like AWS IAM User for apps. Client ID + secret or certificate
- Managed Identity — auto-managed SP for Azure resources. System-assigned (tied to resource lifecycle) or User-assigned (reusable). No credentials to manage. Azure equivalent of EC2 Instance Role INTERVIEW
- Azure Key Vault — store secrets, keys, certificates. Access via Managed Identity. Soft-delete and purge protection
- Key Vault references in App Service — reference Key Vault secrets in app settings without storing values
- Conditional Access — policies: require MFA from outside network, block legacy auth, require compliant device
- Privileged Identity Management (PIM) — just-in-time privileged access. Request → approve → time-limited elevation
- Azure Policy — enforce organizational standards. Audit or deny non-compliant resources (e.g. require tags, allowed regions)
- VM Series: B (burstable), D (general), E (memory), F (compute). Equivalent to EC2 families
- Availability Sets — protect from hardware failure within a datacenter (fault domain + update domain)
- VMSS (Virtual Machine Scale Sets) — auto-scale fleet of identical VMs. Uniform (same config) or Flexible (mix). Equivalent to AWS ASG
- Proximity Placement Groups — colocate VMs in same datacenter for low latency between them
- Spot VMs — use excess Azure capacity at 60-90% discount. Can be evicted. For fault-tolerant batch workloads
- Azure Bastion — browser-based SSH/RDP without public IP. No bastion host to manage
- App Service — PaaS for web apps. Supports Java, .NET, Node, Python, PHP, Ruby. Deploy from Git, Docker, GitHub Actions. Built-in TLS, custom domains
- App Service Plan — defines compute (Free, Basic, Standard, Premium, Isolated). Multiple apps share one plan
- Deployment slots — staging, production slots. Swap with zero downtime (traffic gradually shifted)
- Auto-scaling — manual (scale out/in manually) or autoscale (rule-based or predictive)
- App Service Environment (ASE) — dedicated infrastructure. For compliance/isolation
- Azure Functions — serverless. Triggers: HTTP, Timer (cron), Queue, Blob, Event Hub, Service Bus, CosmosDB INTERVIEW
- Hosting plans: Consumption (pay-per-use, 5min timeout), Premium (pre-warmed instances, no timeout), App Service Plan (dedicated)
- Durable Functions — stateful serverless workflows. Orchestrator + Activity functions. Like AWS Step Functions
- Cold start — same issue as Lambda. Premium plan keeps instances warm
- Function App — container for multiple functions sharing config, storage, scaling
- Azure Blob Storage — object storage. Equivalent to S3. Storage Account → Container → Blob
- Blob types: Block (most files, up to 4.75TB), Append (log files), Page (VHD disks)
- Access tiers: Hot (frequent access, higher storage cost), Cool (infrequent, lower storage), Cold, Archive (cheapest, hours to retrieve) INTERVIEW
- Lifecycle Management — auto-move blobs between tiers or delete based on rules
- SAS (Shared Access Signature) — time-limited token for specific operations. Service SAS (single resource), Account SAS (multiple), User Delegation SAS (uses Entra ID). Equivalent to AWS pre-signed URLs
- Storage redundancy: LRS (3 copies, 1 datacenter), ZRS (3 AZs), GRS (LRS + async to paired region), GZRS (ZRS + GRS) INTERVIEW
- Azure Files — managed SMB/NFS file share. Mount on Windows/Linux/macOS VMs
- Azure Queue Storage — simple message queue. Up to 64KB messages. 7-day retention
- Azure Table Storage — key-value NoSQL. Cheaper than CosmosDB. Limited query capability
- Azure Data Lake Storage Gen2 — Blob Storage + hierarchical namespace. HDFS-compatible. For big data analytics with Spark/Databricks
- Storage Firewall — restrict to specific VNets/IPs. Service Endpoints or Private Endpoints
- Azure SQL Database — managed SQL Server. Serverless tier (auto-pause/scale), Hyperscale (up to 100TB), Business Critical (in-memory OLTP)
- Azure Database for PostgreSQL Flexible Server — managed PostgreSQL. Choose PostgreSQL version. High availability with standby in same or different AZ. Burstable, General Purpose, Memory Optimized tiers
- Azure Database for MySQL Flexible Server — same managed model as PostgreSQL
- Elastic Pool — share DTU/vCores across multiple databases. Cost-efficient for variable per-DB usage
- CosmosDB — globally distributed multi-model database. Single-digit ms latency globally INTERVIEW
- APIs: NoSQL (native, JSON), MongoDB (wire-compatible), Cassandra, Table, Gremlin (graph)
- Consistency levels — 5 levels from Strong to Eventual. Bounded Staleness, Session (default), Consistent Prefix, Eventual
- Request Units (RU) — currency for reads/writes. 1 RU = 1KB read. Budget RUs for cost control
- Partition key — choose for even distribution AND aligns with common queries
- Global distribution — add/remove regions with one click. Multi-region writes (conflict resolution)
- Change Feed — ordered log of changes. For event sourcing, caching, Kafka-like patterns
- Autoscale — automatically scale RU/s based on demand
- Managed Redis. Tiers: Basic (single node), Standard (replicated), Premium (cluster, persistence, Geo-replication), Enterprise (Redis Enterprise)
- Same use cases as ElastiCache for Redis
- VNet (Virtual Network) — isolated private network. CIDR block. Contains subnets. Per region. Equivalent to AWS VPC
- Subnets — subdivide VNet. Some subnets delegated to specific services (App Service, AKS, etc.)
- NSG (Network Security Group) — stateful firewall at subnet or NIC level. Inbound/outbound rules. Priority-based (lower = higher priority) INTERVIEW
- Application Security Groups (ASG) — logical grouping of VMs for NSG rules. Like SG references in AWS
- VNet Peering — connect VNets. Low latency. Non-transitive. Global VNet Peering across regions
- Azure Load Balancer — Layer 4 (TCP/UDP). Internal or public. Zone-redundant. Health probes
- Application Gateway — Layer 7 (HTTP/HTTPS). Path-based routing, SSL termination, WAF. Equivalent to AWS ALB + WAF
- Azure Front Door — global CDN + load balancer + WAF. Anycast routing, URL-based routing, caching. Equivalent to AWS CloudFront + Global Accelerator
- Azure DNS — host DNS zones in Azure. Private DNS zones for VNet resolution
- Private Endpoint — private IP in VNet for Azure service (SQL, Storage, etc.). Traffic stays on Azure backbone
- Service Endpoint — route VNet traffic to Azure service directly (without internet). Less secure than Private Endpoint but free
- VPN Gateway — site-to-site VPN to on-prem. Point-to-site VPN for developers
- ExpressRoute — dedicated private circuit to Azure. Like AWS Direct Connect
- ACR (Azure Container Registry) — managed Docker registry. Geo-replication, image scanning (Defender for Containers), Webhooks. Equivalent to AWS ECR
- ACI (Azure Container Instances) — run containers without orchestration. Pay-per-second. Fast startup. Good for: batch jobs, CI/CD, simple tasks. No auto-scaling
- AKS (Azure Kubernetes Service) — managed Kubernetes. Microsoft manages control plane (free). You pay for worker nodes
- AKS node pools — system (critical pods) and user (workload pods) pools. Mix instance types. Spot node pools for cost
- AKS autoscaler — Cluster Autoscaler (add/remove nodes) + HPA (scale pods)
- Azure CNI vs Kubenet — Azure CNI assigns VNet IPs to pods. Kubenet uses separate pod CIDR. Azure CNI needed for Windows nodes, AKS Private Cluster
- Workload Identity — pods get Azure Managed Identity. No secrets needed for Azure service access
- Azure Container Apps — serverless containers. Built on Kubernetes (Dapr, KEDA). Scale to zero. Traffic splitting. Easier than AKS for microservices. Like AWS App Runner or ECS with KEDA INTERVIEW
- Dapr (Distributed App Runtime) — sidecar for microservices. Service invocation, pub/sub, state, secrets via portable API. Works with Container Apps
- KEDA (Kubernetes Event-Driven Autoscaling) — scale pods based on event sources (queue depth, HTTP traffic)
- Azure Service Bus — enterprise message broker. Queues (point-to-point) and Topics/Subscriptions (pub-sub). Guaranteed delivery, ordering, DLQ, sessions. Equivalent to AWS SQS + SNS combined INTERVIEW
- Service Bus features: Message sessions (ordered processing per session), Scheduled messages, Message deferral, Transactions, Auto-forwarding, DLQ
- Service Bus tiers: Basic (queues only), Standard, Premium (dedicated, VNet, larger messages)
- Azure Event Hubs — event streaming. High throughput (millions/sec). Partitioned. Consumer groups. 1-7 day retention. Kafka-compatible protocol. Equivalent to AWS Kinesis
- Event Hubs Capture — auto-archive to Blob Storage / Data Lake
- Azure Event Grid — reactive event routing. HTTP push model. Routes events from Azure services (Blob, CosmosDB) to handlers (Functions, Logic Apps, Webhooks). At-least-once delivery. Equivalent to AWS EventBridge
- When to use which: Service Bus = business messages, ordering, DLQ needed. Event Hubs = high-throughput streaming/telemetry. Event Grid = lightweight reactive events from Azure services INTERVIEW
- Azure Storage Queue — basic simple queue. 64KB max, 7-day visibility. Free tier. Use for simple decoupling when Service Bus features not needed
- Azure DevOps — suite: Boards (Jira-like), Repos (Git), Pipelines (CI/CD), Test Plans, Artifacts (package registry)
- Azure Repos — Git repositories. Unlimited free private repos. Branch policies (require PR, min reviewers)
- Azure Pipelines — CI/CD. YAML-based pipelines. Self-hosted agents (your VMs) or Microsoft-hosted agents
- Pipeline YAML structure: trigger → stages → jobs → steps (tasks and scripts)
- Service connections — connect pipeline to Azure subscription, GitHub, Docker Hub, Kubernetes
- Variable groups — shared variables across pipelines. Link to Key Vault for secrets
- Environments — staging, production with approval gates. Required approvers before deploy
- Azure Artifacts — private Maven, npm, NuGet, Python package registry
- GitHub Actions → Azure — deploy to Azure from GitHub Actions using Azure service principals or OIDC (federated identity, no stored secrets)
- Deployment strategies: Rolling, Blue-Green, Canary in Azure Pipelines and AKS
- Azure Monitor — umbrella monitoring platform. Collects metrics (Azure resources), logs (Azure Monitor Logs / Log Analytics)
- Log Analytics Workspace — central repository for logs. Query with KQL (Kusto Query Language)
-
KQL basics:
requests | where timestamp > ago(1h) | summarize count() by resultCode. Powerful SQL-like language for log analysis - Azure Metrics — time-series data for Azure resources. Metrics Explorer for visualization. Equivalent to CloudWatch Metrics
- Alerts — metric alerts, log search alerts, activity log alerts. Action Groups: email, SMS, webhook, ITSM
- Application Insights — APM for apps. SDK for Java (Java agent, Spring Boot starter), Node, .NET. Auto-collects: request rates, latency, exceptions, dependencies, live metrics INTERVIEW
- Distributed tracing in App Insights — correlation ID propagated across services. End-to-end transaction tracking
- Application Map — visual topology of microservices with failure rates and latency
- Smart Detection — ML-based anomaly detection on request rates, latency, failures
- Defender for Cloud — Cloud Security Posture Management (CSPM) + Cloud Workload Protection (CWPP). Security score. Vulnerability assessment. Multi-cloud (AWS, GCP too)
- Azure Sentinel (Microsoft Sentinel) — SIEM/SOAR. AI-powered threat detection and response across enterprise
- EC2 → Azure VMs | Lambda → Azure Functions | ECS/EKS → AKS/Container Apps
- S3 → Blob Storage | EBS → Managed Disk | EFS → Azure Files
- RDS → Azure SQL / PostgreSQL Flexible | Aurora → Azure SQL Hyperscale | DynamoDB → CosmosDB
- VPC → VNet | Security Groups → NSG | ALB → Application Gateway | CloudFront → Front Door
- IAM → Entra ID + RBAC | Secrets Manager → Key Vault | CloudTrail → Azure Monitor Activity Log
- SQS → Service Bus Queues | SNS → Service Bus Topics / Event Grid | Kinesis → Event Hubs
- CloudWatch → Azure Monitor | X-Ray → Application Insights | GuardDuty → Defender for Cloud
- Route 53 → Azure DNS | ACM → App Service Managed Certs | WAF → Azure WAF
- ECR → ACR | Cognito → Entra External ID (B2C/B2B) | Step Functions → Durable Functions
- AWS: Cloud Practitioner (CLF-C02) → Solutions Architect Associate (SAA-C03) → Developer Associate → Solutions Architect Pro → DevOps Pro
- Azure: AZ-900 Fundamentals → AZ-104 Administrator → AZ-204 Developer → AZ-305 Architecture → AZ-400 DevOps
- Start with: AWS SAA-C03 if targeting backend/architecture roles. AZ-204 if company is Microsoft/Azure-heavy
- Choose AWS — largest service catalog, most mature, most startups, best Lambda ecosystem
- Choose Azure — Microsoft shops (.NET, Office 365, Active Directory), enterprise compliance needs, hybrid cloud with on-prem Windows
- Multi-cloud — use Terraform for portable IaC. Kubernetes for portable compute. Service mesh (Istio) for portable networking. But resist premature multi-cloud — it adds operational complexity
Deploy the same app on BOTH AWS and Azure using Terraform. Use PostgreSQL on RDS (AWS) and PostgreSQL Flexible Server (Azure). Blob/S3 for file storage. Service Bus (Azure) / SQS (AWS) for messaging. GitHub Actions for CI/CD to both clouds. This proves cloud-agnostic skills.