Prepare for your interview: the questions and answers below cover API design, PostgreSQL, Redis, and Kafka.
API Design Questions
What are the key principles of RESTful API design?
RESTful API design is based on the Representational State Transfer (REST) architectural style, introduced by Roy Fielding in 2000. The core principles include:
- Statelessness: Each request from a client to the server must contain all the information needed to understand and process it. The server doesn’t store any client context between requests, which enhances scalability and reliability. For example, authentication tokens (like JWT) are sent with every request rather than relying on server-side sessions.
- Client-Server Separation: The client handles the user interface, while the server manages data and logic. This separation allows independent evolution of each side.
- Uniform Interface: Achieved through standardized methods (GET, POST, PUT, DELETE), resource identification via URIs (e.g., /users/{id}), self-descriptive messages (using HTTP headers and status codes), and HATEOAS (Hypermedia as the Engine of Application State), where responses include links to related resources for discoverability.
- Cacheability: Responses should indicate if they can be cached (via headers like Cache-Control) to reduce server load and improve performance.
- Layered System: The architecture can include intermediaries like proxies or load balancers without affecting client-server interaction.
- Code on Demand (Optional): Servers can send executable code (e.g., JavaScript) to clients for extended functionality.
These principles ensure APIs are scalable, maintainable, and interoperable. In practice, violating statelessness (e.g., using sessions) can lead to issues in distributed systems.
How do you handle versioning in an API to avoid breaking changes for clients?
API versioning prevents disruptions when evolving the API. Common strategies include:
- URI Versioning: Embed the version in the URL, e.g., /v1/users vs /v2/users. This is simple and allows side-by-side deployment but can lead to URL proliferation.
- Header Versioning: Use a custom HTTP header like Accept: application/vnd.myapi.v2+json. This keeps URLs clean and supports content negotiation but requires clients to set headers correctly.
- Query Parameter Versioning: Append ?version=2 to the URL. It’s flexible but can complicate caching.
- Media Type Versioning: Similar to header, using MIME types like application/json;version=2.
Best practices: Support multiple versions during transition (e.g., deprecate v1 after 6-12 months), document changes clearly (using tools like OpenAPI), use semantic versioning (e.g., MAJOR.MINOR.PATCH for breaking/non-breaking changes), and avoid versioning if possible by designing extensible resources (e.g., optional fields). For backward compatibility, introduce new fields without removing old ones, and use feature flags for gradual rollouts. Tools like API gateways (e.g., Kong) can route based on version.
What are the best practices for implementing authentication and authorization in APIs, such as JWT vs OAuth?
Authentication verifies identity, while authorization checks permissions. Best practices:
- Use HTTPS Always: Encrypt traffic to prevent interception.
- Token-Based Auth: Prefer stateless tokens over sessions for scalability.
- JWT (JSON Web Tokens): Self-contained tokens signed with a secret (HS256) or public/private key (RS256). Structure: Header.Payload.Signature. Pros: Stateless, scalable; cons: Larger size, no easy revocation (use short expiry + refresh tokens). Ideal for microservices.
- OAuth 2.0: Framework for delegated authorization, not authentication (pair with OpenID Connect for that). Flows: Authorization Code (secure for web apps), Implicit (for SPAs, but deprecated), Client Credentials (machine-to-machine). Pros: Secure delegation, scopes for fine-grained access; cons: Complex setup. Use PKCE for public clients to prevent code interception.
- API Keys: Simple for server-to-server, but insecure if exposed (use with rate limiting).
- Authorization: Implement RBAC (Role-Based), ABAC (Attribute-Based), or PBAC (Policy-Based) using middleware (e.g., in Express.js or Spring Security). Store policies in a database or use tools like OPA (Open Policy Agent).
- Other: Multi-factor auth (MFA), rate limiting per user, audit logs. Avoid basic auth over HTTP. Libraries: Passport.js for Node, Spring Security for Java. Regularly rotate secrets and validate tokens server-side.
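A minimal sketch of stateless token verification, assuming PyJWT and Flask; the `SECRET_KEY`, the `/profile` route, and the `sub` claim are illustrative, and a real deployment would load the key from a secret manager and rotate it.

```python
# Sketch: validate an HS256-signed JWT on every request (stateless auth).
from functools import wraps

import jwt  # PyJWT
from flask import Flask, g, jsonify, request

app = Flask(__name__)
SECRET_KEY = "change-me"  # illustrative; load from a secret manager in practice

def require_jwt(view):
    @wraps(view)
    def wrapper(*args, **kwargs):
        auth = request.headers.get("Authorization", "")
        if not auth.startswith("Bearer "):
            return jsonify({"error": "missing bearer token"}), 401
        try:
            # jwt.decode verifies the signature and the exp claim if present
            claims = jwt.decode(auth.split(" ", 1)[1], SECRET_KEY, algorithms=["HS256"])
        except jwt.InvalidTokenError:
            return jsonify({"error": "invalid or expired token"}), 401
        g.user = claims.get("sub")  # expose the subject to the view
        return view(*args, **kwargs)
    return wrapper

@app.route("/profile")
@require_jwt
def profile():
    return jsonify({"user": g.user})
```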
How can you optimize a GET endpoint for maximum speed, considering caching, database queries, and network latency?
Optimizing a GET endpoint involves multiple layers:
- Database Queries: Use indexing on frequently queried fields (e.g., B-tree indexes), avoid N+1 problems with joins or eager loading, write efficient SQL (use EXPLAIN to analyze plans), and paginate results (cursor-based for large sets). Denormalize data if read-heavy.
- Caching: Implement multi-level caching—application-level (Redis for hot data, TTL based on volatility), database-level (PostgreSQL materialized views), and CDN-level (CloudFront for static responses). Use ETags or Last-Modified headers for conditional requests to avoid full responses. Cache invalidation: Event-driven (e.g., pub/sub on updates) or write-through.
- Network Latency: Minimize payload size (JSON minification, selective fields via GraphQL or ?fields= param), enable HTTP/2 or HTTP/3 for multiplexing, compress responses (Gzip/Brotli). Use edge computing (serverless like Lambda@Edge) to reduce round-trips.
- Server-Side: Asynchronous processing (e.g., Node.js event loop), connection pooling, load balancing. Profile with tools like New Relic.
- Client-Side: Pre-fetching or optimistic updates. Benchmarks: Aim for <200ms p95 latency. Example: For /users/{id}, cache user profile in Redis with 5-min expiry, query DB only on miss.
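A cache-aside sketch for the /users/{id} example above, assuming redis-py and psycopg2; the table and column names are illustrative.

```python
# Sketch: check Redis first, fall back to Postgres on a miss, then populate the cache.
import json

import psycopg2
import redis

r = redis.Redis(host="localhost", port=6379, db=0)
conn = psycopg2.connect("dbname=app user=app")

def get_user(user_id, ttl_seconds=300):
    cache_key = f"user:{user_id}"
    cached = r.get(cache_key)
    if cached is not None:
        return json.loads(cached)          # cache hit: no database round trip
    with conn.cursor() as cur:             # cache miss: single primary-key lookup
        cur.execute("SELECT id, name, email FROM users WHERE id = %s", (user_id,))
        row = cur.fetchone()
    if row is None:
        return None
    user = {"id": row[0], "name": row[1], "email": row[2]}
    r.set(cache_key, json.dumps(user), ex=ttl_seconds)  # 5-minute expiry as in the text
    return user
```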
What strategies can be used for rate limiting in an API server to prevent abuse?
Rate limiting controls request volume per client to ensure fairness and availability. Strategies:
- Fixed Window: Count requests in fixed intervals (e.g., 100/min per IP). Simple but prone to bursts at window edges.
- Sliding Window: Track requests over rolling time (e.g., using Redis sorted sets). Smoother but more complex.
- Token Bucket: Clients have a bucket of tokens refilled at a fixed rate (e.g., 10/sec). Allows bursts up to the bucket size. In Redis this is typically implemented with a small Lua script, or approximated with INCR + EXPIRE counters.
- Leaky Bucket: Similar, but processes at constant rate, queuing excess. Good for smoothing traffic.
- Key Selection: Per IP, API key, user ID, or combination to handle proxies/VPNs. Use distributed counters (Redis) for microservices.
- Implementation: Middleware (e.g., express-rate-limit in Node, Guava RateLimiter in Java). Return 429 Too Many Requests with Retry-After header. Adaptive limiting: Adjust based on load (e.g., circuit breakers). Global vs per-endpoint limits. Monitor with Prometheus. Exemptions for internal traffic.
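A fixed-window counter sketch built on Redis INCR + EXPIRE, assuming redis-py; the 100-requests-per-minute limit and the key naming are illustrative.

```python
# Sketch: per-client fixed-window rate limiting with an atomic Redis counter.
import redis

r = redis.Redis()

def allow_request(client_id, limit=100, window_seconds=60):
    key = f"ratelimit:{client_id}"
    count = r.incr(key)                  # atomic increment; creates the key at 1
    if count == 1:
        r.expire(key, window_seconds)    # start the window on the first request
    return count <= limit

# In the API layer, a denied request should return 429 with a Retry-After header.
```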
How do you design error handling in APIs, including standard HTTP status codes and custom error messages?
Effective error handling provides clear, actionable feedback without exposing internals.
- HTTP Status Codes: Use 2xx for success (200 OK, 201 Created), 4xx for client errors (400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 422 Unprocessable Entity for validation), 5xx for server errors (500 Internal, 503 Service Unavailable). Avoid 200 for errors.
- Error Response Format: Consistent JSON structure, e.g., { "error": { "code": "VALIDATION_ERROR", "message": "Invalid email format", "details": [{ "field": "email", "issue": "Must be valid" }] }, "request_id": "uuid" }. Include correlation IDs for tracing.
- Custom Errors: Define enums for codes (e.g., ERR_001 for duplicates). Log full details server-side but sanitize for clients (no stack traces).
- Global Handler: Use middleware to catch exceptions and map to responses (e.g., @ControllerAdvice in Spring). Graceful degradation for partial failures.
- Best Practices: Internationalize messages, rate-limit error responses to prevent info leaks, document errors in API specs. For async errors, use webhooks with status updates.
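A minimal sketch of the error envelope and a global handler, using Flask for illustration; the error codes and routes are assumptions, and full details stay in server-side logs only.

```python
# Sketch: consistent JSON error envelope plus a catch-all handler.
import uuid

from flask import Flask, jsonify

app = Flask(__name__)

def error_body(code, message, details=None):
    return {
        "error": {"code": code, "message": message, "details": details or []},
        "request_id": str(uuid.uuid4()),   # correlation ID for tracing
    }

@app.errorhandler(404)
def not_found(_):
    return jsonify(error_body("NOT_FOUND", "Resource not found")), 404

@app.errorhandler(Exception)
def internal_error(exc):
    app.logger.exception(exc)              # log full details server-side only
    return jsonify(error_body("INTERNAL_ERROR", "Unexpected server error")), 500
```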
What are the advantages and disadvantages of using GraphQL over traditional REST APIs?
GraphQL is a query language for APIs, allowing clients to request exact data.
- Advantages:
- Flexibility: Clients specify fields, reducing over/under-fetching (e.g., one query for user + posts vs multiple REST calls).
- Strong Typing: Schema defines types, queries, mutations—enables tools like GraphiQL for introspection and autocomplete.
- Efficiency: Single endpoint /graphql, batched requests, real-time with subscriptions (WebSockets). Reduces network overhead.
- Versioning: Evolve schema without URI changes (deprecate fields).
- Ecosystem: Libraries like Apollo Client for caching, federation for microservices.
- Disadvantages:
- Complexity: Steeper learning curve; N+1 problem if not using DataLoader for batching.
- Caching Challenges: Harder than REST (no standard HTTP caching; use persisted queries or client-side).
- Security Risks: Introspection can expose schema; limit with depth limits, query complexity analysis (e.g., Apollo Server’s cost analyzer).
- Overhead: Parsing queries adds CPU; not ideal for simple CRUD.
- Debugging: Tracing errors across resolvers is trickier.
Overall, GraphQL shines in client-driven apps (mobile/SPAs); REST for uniform resources.
How can you ensure API security, including protection against common vulnerabilities like SQL injection or XSS?
API security follows OWASP guidelines. Key measures:
- Input Validation/Sanitization: Validate all inputs (e.g., schemas with Joi or JSON Schema), use prepared statements for DB to prevent SQL injection. Escape outputs for XSS (though APIs are backend, consider if rendering HTML).
- Authentication/Authorization: As above, implement properly.
- Injection Prevention: Parameterized queries, ORMs (e.g., Sequelize). For NoSQL, validate structures.
- XSS/CSRF: APIs are less prone (JSON), but use CORS properly (restrict origins), CSRF tokens for state-changing ops.
- Other Vulnerabilities: Rate limiting (DoS), input size limits, secure headers (CSP, HSTS). Encrypt sensitive data (TLS 1.3).
- API-Specific: Validate JWTs, use HTTPS, OWASP API Security Top 10 (e.g., broken auth, excessive data exposure—use field selection). Tools: API scanners like Postman or OWASP ZAP. Regular audits, secret management (Vault), zero-trust model.
What techniques can be applied to scale an API server horizontally, such as load balancing and microservices?
Horizontal scaling adds instances rather than beefing up one. Techniques:
- Load Balancing: Distribute traffic with round-robin, least connections, or IP hash (e.g., NGINX, HAProxy). Layer 7 for path-based routing. Health checks to route away from failures.
- Microservices: Break monolith into services (e.g., user service, order service) communicating via APIs/gRPC. Use service mesh (Istio) for traffic management, observability. Containerize with Docker, orchestrate with Kubernetes for auto-scaling.
- Stateless Design: Ensure servers are stateless (externalize sessions to Redis) for easy replication.
- Database Scaling: Sharding, read replicas (PostgreSQL), or NoSQL for writes.
- Caching/CDN: Offload static assets, cache API responses.
- Async Processing: Queue heavy tasks (Kafka/RabbitMQ) to non-blocking endpoints.
- Monitoring/Auto-Scaling: Use Prometheus/Grafana, cloud auto-scalers (AWS ASG). Circuit breakers (Hystrix/Resilience4j) for fault tolerance. Start with benchmarks to identify bottlenecks.
How do you implement pagination in APIs for large datasets, and what are the trade-offs between offset-based and cursor-based approaches?
Pagination breaks large results into pages.
- Offset-Based: Use ?page=2&limit=10 (offset = (page-1)*limit). Simple, intuitive. Trade-offs: Inefficient for deep pages (DB scans all prior rows), inconsistent if data changes (duplicates/skips). Good for small datasets.
- Cursor-Based (Keyset): Use ?after=last_id&limit=10. Cursor is the last item’s unique key (e.g., ID or timestamp). WHERE id > last_id ORDER BY id. Pros: Constant time, handles inserts/deletes without gaps, efficient for sorted data. Cons: Opaque cursors (clients can’t jump pages), requires sortable unique keys.
- Implementation: Always include total count if feasible (or approximate), links for next/prev (HATEOAS). For search, combine with filters. Libraries: In SQL, use LIMIT/OFFSET or window functions. Avoid for real-time (use streams). Hybrid: Offset for UI, cursor for backend.
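A keyset-pagination sketch of the WHERE id > cursor pattern above, assuming psycopg2 and an `items` table with a unique, sortable `id` column (both illustrative).

```python
# Sketch: cursor-based (keyset) pagination with constant cost per page.
import psycopg2

conn = psycopg2.connect("dbname=app user=app")

def fetch_page(after_id=0, limit=10):
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, name FROM items WHERE id > %s ORDER BY id LIMIT %s",
            (after_id, limit),
        )
        rows = cur.fetchall()
    next_cursor = rows[-1][0] if rows else None  # opaque cursor returned to the client
    return rows, next_cursor
```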
What role does API documentation play, and how can tools like Swagger or OpenAPI help?
Documentation ensures usability, reduces support queries, and aids compliance. Role: Describes endpoints, params, responses, errors, auth—acts as contract. Interactive docs enable self-service testing.
- Swagger/OpenAPI: YAML/JSON spec for describing APIs (version 3.0+). Auto-generates docs (Swagger UI), client SDKs (Swagger Codegen), server stubs. Workflow: Annotate code (e.g., @ApiOperation in Spring), generate spec, host UI. Pros: Standardized, machine-readable, supports validation. Cons: Maintenance overhead if not automated. Alternatives: RAML, GraphQL SDL. Best: Embed examples, update with CI/CD, version docs.
How can you optimize API responses by reducing payload size, such as through compression or selective field inclusion?
Smaller payloads reduce bandwidth and latency. Techniques:
- Compression: Enable Gzip/Brotli (saves 70-90% for JSON). Set via server config (e.g., NGINX gzip on). Client accepts via Accept-Encoding.
- Selective Fields: REST: ?fields=name,email; GraphQL: Query only needed fields. Pros: Tailored data; cons: More client logic.
- Data Formats: JSON is verbose—use JSON:API (includes relationships efficiently) or Protobuf for binary (faster parsing, 3-10x smaller).
- Remove Noise: No nulls/whitespace, use short keys (e.g., “id” not “userIdentifier”).
- Pagination/Aggregation: Limit rows, summarize data.
- Caching/Profiling: Use tools like Wireshark to measure. Edge case: Mobile networks benefit most.
What are common patterns for handling asynchronous operations in APIs, like webhooks or polling?
Async ops (e.g., file uploads) avoid blocking. Patterns:
- Polling: Client checks status via GET /jobs/{id}. Pros: Simple; cons: Inefficient (many requests). Use exponential backoff.
- Webhooks: Server pushes updates to client URL on events (POST with payload). Secure with signatures (HMAC), retries (idempotent). Pros: Real-time; cons: Client must expose endpoint, firewall issues. Use for notifications.
- Server-Sent Events (SSE): Unidirectional stream from server (e.g., /events). Simpler than WebSockets for one-way.
- WebSockets: Bidirectional for interactive (chat). Libraries: Socket.io.
- Message Queues: Internal async (SQS for jobs). Return 202 Accepted with job ID. Track state in DB/Redis. Best: Combine (webhook + polling fallback).
How do you design idempotent endpoints to handle retries safely?
Idempotency ensures repeated requests have the same effect (e.g., no duplicate charges). Design:
- For GET/DELETE: Inherently idempotent.
- For POST/PUT: Use client-generated idempotency keys (UUID in header, store in DB with expiry). On retry, check key—if processed, return cached result.
- Database: Use unique constraints (e.g., on order_id), UPSERT (INSERT ON CONFLICT).
- Distributed Systems: Use Redis for key tracking (SETNX). Time-bound (24h).
- Errors: Retry on 5xx/transients, not 4xx. Libraries: Stripe’s API uses this. Benefits: Safe retries in unreliable networks.
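A sketch of the idempotency-key pattern using Redis SET NX, assuming redis-py; `handler` stands in for the real operation (e.g., creating a charge), and a production system would persist results durably rather than only in Redis.

```python
# Sketch: reserve the idempotency key atomically, replay the stored result on retries.
import json

import redis

r = redis.Redis()

def process_once(idempotency_key, handler, ttl_seconds=86400):
    result_key = f"idem:{idempotency_key}"
    # NX: only set if absent, so a retry sees the stored result instead of re-running.
    if not r.set(result_key, "IN_PROGRESS", nx=True, ex=ttl_seconds):
        stored = r.get(result_key)
        if stored and stored != b"IN_PROGRESS":
            return json.loads(stored)      # replay the original response
        return {"status": "processing"}    # first attempt still in flight
    result = handler()                     # hypothetical side-effecting operation, run once
    r.set(result_key, json.dumps(result), ex=ttl_seconds)
    return result
```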
What considerations are important for designing APIs that support mobile clients, such as offline capabilities?
Mobile APIs must handle intermittency, battery, data costs. Considerations:
- Offline Support: Design for local storage (e.g., return sync tokens for delta updates). Use optimistic UI with conflict resolution (last-write-wins or OT).
- Efficiency: Small payloads, compression, fewer requests (batch via GraphQL). Push notifications (FCM/APNs) over polling.
- Auth: Refresh tokens for long sessions, biometric integration.
- Error Handling: Graceful offline errors, retry queues (WorkManager on Android).
- Performance: Low-latency endpoints, CDN for assets. Test on slow networks (3G). Security: App-bound tokens. Patterns: Sync APIs (e.g., /changes?since=token).
PostgreSQL Questions
What are the main advantages of using PostgreSQL over other relational databases like MySQL?
PostgreSQL (Postgres) is an open-source ORDBMS emphasizing standards and extensibility. Advantages:
- Standards Compliance: Strong SQL-standard conformance—window functions, CTEs, JSON querying—generally ahead of MySQL.
- Data Types: Rich support for JSON/JSONB (indexed queries), arrays, hstore (key-value), geospatial (PostGIS), UUIDs—ideal for complex data.
- Concurrency: MVCC allows high read/write without locks, better than MySQL’s InnoDB for read-heavy.
- ACID Compliance: Strict transactions, foreign keys, constraints.
- Extensibility: Custom functions (PL/pgSQL, Python), extensions (e.g., TimescaleDB for time-series).
- Reliability/Scalability: Point-in-time recovery, logical replication, partitioning. Vs MySQL: Postgres handles complex queries and concurrent writes better; MySQL is simpler and often faster for basic OLTP. Use Postgres for analytics-heavy apps.
What are the disadvantages of using PostgreSQL, such as in terms of performance or complexity?
Despite strengths, Postgres has drawbacks:
- Performance Overhead: MVCC creates bloat (dead tuples), requiring regular VACUUM—this can slow writes versus MySQL's simpler locking, and very write-heavy workloads need tuning to keep up.
- Complexity: Steeper learning curve for advanced features (e.g., custom types). Configuration (postgresql.conf) is intricate; defaults not optimized for production.
- Resource Usage: Higher memory/CPU for features like full-text search or GIS. Replication setup more involved than MySQL’s async.
- Replication: Streaming replication is async by default; synchronous adds latency. No built-in multi-master (need extensions like BDR).
- Ecosystem: Fewer managed services than MySQL (though AWS RDS helps). For simple web apps, MySQL’s speed/simplicity wins. Mitigation: Tune for workload, use connection poolers like PgBouncer.
How can you optimize SQL queries in PostgreSQL, including the use of EXPLAIN ANALYZE?
Query optimization minimizes execution time/cost. Steps:
- Analyze Plans: Use EXPLAIN (shows plan without running) or EXPLAIN ANALYZE (executes and times). Look for Seq Scan (bad for large tables—add indexes), high cost nodes, loops.
- Indexing: Create on WHERE/JOIN/ORDER BY columns (B-tree default; GIN for JSON/full-text). Partial/composite indexes for selectivity. Monitor with pg_stat_user_indexes.
- Query Rewriting: Use CTEs for readability, avoid SELECT * (specify columns), limit rows early. Rewrite correlated subqueries as joins.
- Statistics: Run ANALYZE to update planner stats; increase default_statistics_target for better estimates.
- Configuration: Tune work_mem for sorts/hashes, effective_cache_size for planner. Use materialized views for complex aggregations.
- Tools: pgBadger for log analysis, auto_explain extension. Example: For slow JOIN, add index on join key. Benchmark: Aim for <100ms queries.
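A sketch of the EXPLAIN ANALYZE workflow described above, run through psycopg2; the `orders` table, the `customer_id` filter, and the index name are illustrative.

```python
# Sketch: inspect a query plan, then add an index if a sequential scan shows up.
import psycopg2

conn = psycopg2.connect("dbname=app user=app")

with conn.cursor() as cur:
    # EXPLAIN ANALYZE actually runs the query and reports real timings per plan node
    cur.execute("EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = %s", (42,))
    for (line,) in cur.fetchall():
        print(line)   # look for 'Seq Scan' on large tables, high actual times, loop counts

# If the plan shows a sequential scan on the filter column, an index usually helps:
with conn.cursor() as cur:
    cur.execute("CREATE INDEX IF NOT EXISTS idx_orders_customer ON orders (customer_id)")
conn.commit()
```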
What types of indexes are available in PostgreSQL, and when should you use each (e.g., B-tree vs GiST)?
Indexes speed lookups but add write overhead. Types:
- B-tree: Default, balanced tree for =, <, >, BETWEEN, IN. Use for equality/range on scalars (e.g., id, date). Supports NULLs.
- Hash: Equality only (=), no range support. Competitive with B-tree for pure equality lookups; not WAL-logged (and so not crash-safe) before PostgreSQL 10, so use with care on older versions.
- GiST (Generalized Search Tree): For complex types like geometry (PostGIS), full-text (tsvector). Supports proximity searches. Use for spatial or custom ops.
- GIN (Generalized Inverted Index): For arrays, JSONB, tsvector—fast containment (@>, ?). Ideal for full-text or many-to-many.
- BRIN (Block Range Index): For sorted large tables (e.g., time-series), stores min/max per block. Low overhead, use for append-only data.
- SP-GiST: For space-partitioned data like quadtrees (points).
- Others: Partial (WHERE clause), expression (on computed values), multi-column. Choose by query: Analyze with EXPLAIN; drop unused (pg_stat_user_indexes). Maintenance: REINDEX periodically.
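Index-creation sketches matching the types above, executed via psycopg2; the tables (`orders`, `events`, `logs`) and column names are illustrative, and each statement could equally be run in psql.

```python
# Sketch: one example per common index type.
import psycopg2

conn = psycopg2.connect("dbname=app user=app")
statements = [
    # B-tree (default): equality and range lookups on scalar columns
    "CREATE INDEX IF NOT EXISTS idx_orders_created ON orders (created_at)",
    # GIN: containment queries on a JSONB column
    "CREATE INDEX IF NOT EXISTS idx_events_payload ON events USING GIN (payload)",
    # BRIN: huge append-only table naturally ordered by time
    "CREATE INDEX IF NOT EXISTS idx_logs_ts ON logs USING BRIN (logged_at)",
    # Partial: index only the rows queries actually touch
    "CREATE INDEX IF NOT EXISTS idx_orders_open ON orders (id) WHERE status = 'open'",
]
with conn.cursor() as cur:
    for stmt in statements:
        cur.execute(stmt)
conn.commit()
```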
How does PostgreSQL handle transactions and concurrency with MVCC (Multi-Version Concurrency Control)?
MVCC allows concurrent reads/writes without blocking. Mechanism:
- Versions: On update, the old row version is stamped with xmax (the updating/deleting transaction ID) and a new version is written with its own xmin. Readers see a snapshot based on the transaction ID at snapshot time.
- Visibility: Queries use snapshot isolation—uncommitted changes invisible, committed post-snapshot ignored. No read locks.
- Concurrency Anomalies: Serializable isolation prevents phantoms (via SSI—Serializable Snapshot Isolation). Levels: Read Committed (default), Repeatable Read, Serializable.
- Deadlocks: Detected automatically (abort one tx); use NOWAIT or advisory locks.
- Cleanup: VACUUM removes dead tuples (invisible to all). HOT updates avoid bloat for same-page changes.
- Pros: High throughput; cons: Table bloat, vacuum overhead. Tune: vacuum_cost_delay, autovacuum. Vs locking DBs: Better for read-heavy.
What are best practices for database partitioning in PostgreSQL to improve performance on large tables?
Partitioning splits tables into sub-tables (partitions) for manageability/performance. Since PG 10, declarative (CREATE TABLE … PARTITION BY). Types:
- Range: By date/range (e.g., sales by month)—prune scans to relevant partitions.
- List: By discrete values (e.g., region = 'US').
- Hash: Even distribution for uniform load.
Best practices:
- Choose Key: High-cardinality, query-aligned (e.g., timestamp for time-series). Subpartition if needed.
- Maintenance: Attach/detach partitions online; use pg_partman extension for automation.
- Indexes/Constraints: Indexes created on the parent cascade to every partition (PG 11+); there are no true global indexes, and unique constraints must include the partition key. Foreign keys referencing partitioned tables are supported since PG 12.
- Performance: Faster queries (partition elimination), parallel scans. But slower for cross-partition ops.
- When: Tables >10GB or with billions of rows. Monitor: pg_stat_user_tables. Alternative: Sharding for massive scale.
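A declarative range-partitioning sketch (PostgreSQL 10+) for the time-series case above, run via psycopg2; the `sales` table and the monthly boundaries are illustrative, and new partitions would normally be created by a scheduled job or pg_partman.

```python
# Sketch: parent table partitioned by range, plus two monthly partitions.
import psycopg2

conn = psycopg2.connect("dbname=app user=app")
ddl = """
CREATE TABLE IF NOT EXISTS sales (
    id      bigint      NOT NULL,
    sold_at timestamptz NOT NULL,
    amount  numeric     NOT NULL
) PARTITION BY RANGE (sold_at);

CREATE TABLE IF NOT EXISTS sales_2024_01 PARTITION OF sales
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE IF NOT EXISTS sales_2024_02 PARTITION OF sales
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
"""
with conn.cursor() as cur:
    cur.execute(ddl)   # queries filtered on sold_at are pruned to the matching partition
conn.commit()
```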
How can you set up replication in PostgreSQL for high availability, including streaming replication?
Replication copies data to standbys for HA/DR. Streaming (WAL-based, since PG 9.0):
- Setup: Primary: edit postgresql.conf (wal_level=replica, max_wal_senders>0) and pg_hba.conf (host replication user standby_ip/32 md5); create a replication user and take a base backup (pg_basebackup). Standby: before PG 12, configure recovery.conf; from PG 12, set primary_conninfo in postgresql.conf and create a standby.signal file; enable hot_standby=on. Start the standby.
- Synchronous: Set synchronous_standby_names for commit waits (async default for speed).
- Types: Physical (block-level WAL shipping, for HA), logical (row-level change streams for selective replication, PG 10+).
- Tools: pg_receivewal for WAL shipping, repmgr/Patroni for failover clustering.
- Monitoring: pg_stat_replication; check WAL sender status. Benefits: Read scaling (queries on standbys), fast failover with tooling. Test switchover regularly.
What are common techniques for backup and recovery in PostgreSQL, such as pg_dump and point-in-time recovery?
Backups ensure data safety. Techniques:
- Logical: pg_dump (SQL dump, for schema/data), pg_dumpall (globals). Pros: Portable; cons: Slow for large DBs, and restores must replay SQL. Use --jobs with the directory format for parallel dumps.
- Physical: File-system copy (pg_start_backup/pg_stop_backup for consistent), or tools like pg_basebackup (WAL-enabled). Barman/WAL-G for cloud.
- Point-in-Time Recovery (PITR): Continuously archive WAL (archive_command), then restore a base backup and replay WAL up to a target timestamp/txid (recovery_target_time).
- Continuous Archiving: wal_level=replica plus archive_command (e.g., ship WAL segments to S3).
- Best Practices: Full weekly + incremental daily, test restores, encrypt backups. RPO/RTO: PITR for <1h loss. Tools: pgBackRest for dedup/compression.
How does PostgreSQL support JSON data types, and what are use cases for JSONB?
JSON support since PG 9.2; JSONB (binary, PG 9.4) is preferred.
- Types: JSON (text), JSONB (parsed, indexed). JSONB: Compressed, immutable, supports operators like → (get field), #> (path), @> (contains).
- Indexing: GIN on JSONB for containment/existence queries.
- Use Cases: Semi-structured data (e.g., user prefs, logs), NoSQL-like storage inside an RDBMS (e.g., e-commerce product attributes). Queries: SELECT * FROM t WHERE data @> '{"key": "value"}'. Functions: jsonb_build_object, jsonb_agg. Pros: Flexible schema evolution; cons: Slower than normalized columns, bloat. Hybrid: Normalize common fields, JSONB for sparse ones.
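A JSONB sketch of the containment query and GIN index above, via psycopg2; the `products` table and its attributes are illustrative.

```python
# Sketch: store semi-structured attributes in JSONB and query them with @>.
import json

import psycopg2

conn = psycopg2.connect("dbname=app user=app")
with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS products (
            id    bigserial PRIMARY KEY,
            attrs jsonb NOT NULL
        )""")
    cur.execute("CREATE INDEX IF NOT EXISTS idx_products_attrs ON products USING GIN (attrs)")
    cur.execute("INSERT INTO products (attrs) VALUES (%s::jsonb)",
                (json.dumps({"color": "red", "size": "M"}),))
    # Containment: every product whose attrs include {"color": "red"}
    cur.execute("SELECT id, attrs FROM products WHERE attrs @> %s::jsonb",
                (json.dumps({"color": "red"}),))
    print(cur.fetchall())
conn.commit()
```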
What are extensions in PostgreSQL, and how can they enhance functionality (e.g., PostGIS for geospatial data)?
Extensions add features via CREATE EXTENSION. Installed from contrib or PGXN.
- How: pg_available_extensions lists; requires superuser. Unload with DROP.
- Enhancements:
- PostGIS: Geospatial—types (geometry, geography), functions (ST_Distance, ST_Intersects). Use for mapping apps (e.g., Uber routing). Indexes: GiST.
- pg_trgm: Trigram similarity for fuzzy search (speeds up LIKE '%abc%').
- uuid-ossp: UUID generation.
- TimescaleDB: Hypertables for time-series (auto-partition, compression).
- hstore: Key-value pairs. Benefits: No forking core; versioned. Use: For analytics (cube extension), search (pg_bigm). Install selectively to avoid bloat.
How do you tune PostgreSQL configuration parameters like work_mem or shared_buffers for better performance?
Tuning matches config to hardware/workload. Key params:
- shared_buffers: RAM for shared cache (25% of RAM, e.g., 4GB on 16GB). Too high wastes OS cache.
- work_mem: Per-operation memory (sorts/joins; default 4MB). Set low (1-16MB) to avoid OOM; multiply by max_connections. Monitor temp_files.
- effective_cache_size: Planner hint (OS cache + shared_buffers, 50-75% RAM).
- maintenance_work_mem: For VACUUM/INDEX (1GB+).
- checkpoint_completion_target: 0.9 for spread checkpoints. wal_buffers: 1/32 shared_buffers.
- Process: Use pgtune or formulas; test with pg_settings, benchmarks (pgbench). Monitor: pg_stat_bgwriter. Iterate: Increase for read-heavy, decrease for many connections.
What are the implications of using triggers and stored procedures in PostgreSQL?
Triggers: Auto-execute on events (INSERT/UPDATE). Stored procs: Reusable SQL/PL/pgSQL code.
- Implications:
- Pros: Enforce business logic (e.g., audit logs via trigger), centralize (procs for complex ops), reduce app-DB roundtrips.
- Cons: Performance hit (extra execution), debugging hard (no stack traces), testing tricky. Can hide logic, violate separation. Triggers cascade (risk loops).
- Use: Triggers for integrity (e.g., update timestamp), procs for transactions (CALL/EXECUTE). Security: Definer rights. Alternatives: App logic for portability. Monitor: pg_stat_user_functions.
How can you monitor and diagnose slow queries in PostgreSQL using tools like pg_stat_statements?
Monitoring identifies bottlenecks. Tools:
- pg_stat_statements: Extension tracks per-query stats (calls, total time). Load via shared_preload_libraries. Query: SELECT query, mean_exec_time FROM pg_stat_statements ORDER BY total_exec_time DESC (the columns are named mean_time/total_time before PG 13). Reset with pg_stat_statements_reset().
- EXPLAIN ANALYZE: Per-query plans.
- pg_stat_activity: Current queries, locks, wait events.
- Logs: log_min_duration_statement=250ms; analyze with pgBadger.
- External: Check_Postgres.pl, pgmetrics, or Grafana + postgres exporter. Diagnose: High I/O (add indexes), CPU (rewrite queries), contention (locks). Alerts: Slow query thresholds.
What are the differences between VACUUM and ANALYZE in PostgreSQL, and why are they important?
- VACUUM: Reclaims space from dead tuples (MVCC waste), updates visibility map for faster scans. Modes: FULL (rewrites table, locks), non-FULL (in-place). autovacuum runs automatically. Importance: Prevents bloat (table size explosion), reduces I/O.
- ANALYZE: Updates statistics (row counts, histograms) for query planner. Part of VACUUM but can run alone. Importance: Accurate plans avoid bad choices (e.g., wrong index). Differences: VACUUM cleans data, ANALYZE informs planner. Run after bulk loads. Tune: autovacuum_vacuum_scale_factor=0.1. Neglect leads to slow queries/out-of-memory.
How does PostgreSQL handle full-text search, and how does it compare to dedicated search engines like Elasticsearch?
FTS via to_tsvector/tsquery.
- Handling: Convert text to tsvector (indexed tokens), query with tsquery (AND/OR/phrase). Operators: @@ (match), plainto_tsquery. Indexes: GIN on tsvector column. Ranking: ts_rank. Config: Dictionaries (english stemmer).
- Vs Elasticsearch: Postgres FTS is integrated, ACID, no separate system—good for small-medium (e.g., app search). ES: Distributed, faceting, relevance tuning (BM25), aggregations, horizontal scale. Postgres slower for massive indexes, lacks geo/realtime. Hybrid: Postgres for transactional, ES for search (sync via triggers).
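A full-text search sketch of the tsvector/GIN/ts_rank flow above, via psycopg2; the `articles` table is illustrative and the generated tsvector column assumes PostgreSQL 12+.

```python
# Sketch: indexed tsvector column plus a ranked full-text query.
import psycopg2

conn = psycopg2.connect("dbname=app user=app")
with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS articles (
            id   bigserial PRIMARY KEY,
            body text NOT NULL,
            tsv  tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED
        )""")
    cur.execute("CREATE INDEX IF NOT EXISTS idx_articles_tsv ON articles USING GIN (tsv)")
    cur.execute("""
        SELECT id, ts_rank(tsv, query) AS rank
        FROM articles, plainto_tsquery('english', %s) AS query
        WHERE tsv @@ query
        ORDER BY rank DESC
        LIMIT 10""", ("postgres full text search",))
    print(cur.fetchall())
conn.commit()
```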
Redis Questions
What are the primary use cases for Redis, such as caching, session storage, or real-time analytics?
Redis is an in-memory key-value store, fast (sub-ms). Use cases:
- Caching: Store hot data (e.g., DB results) with TTLs and eviction (LRU). Can cut database load substantially for read-heavy workloads (figures of 80-90% are commonly cited).
- Session Storage: Distributed sessions (e.g., user carts in e-commerce), key: session_id, value: JSON. Scalable vs sticky sessions.
- Real-Time Analytics: Counters (INCR), leaderboards (sorted sets ZADD/ZRANGE). Pub/Sub for metrics.
- Others: Rate limiting (sliding windows), queues (lists/RPOPLPUSH), full-page cache (strings). Not for durable storage—pair with DB. Examples: Twitter for timelines, Stack Overflow for votes.
How does Redis implement its key data structures, like strings, hashes, lists, sets, and sorted sets?
Redis uses compact encodings:
- Strings: Binary-safe, up to 512MB. Internals: embstr (small, <44 bytes), raw (SDS—simple dynamic strings). Ops: SET/GET/INCR.
- Hashes: Field-value maps, like objects. Ziplist (small, <512 items/64 bytes/field), hashtable (larger). HSET/HGET.
- Lists: Quicklists—a linked list of compact ziplist/listpack nodes. LPUSH/RPOP for queues, LTRIM for bounded lists.
- Sets: Hashsets or intsets (small integers). SADD/SMEMBERS, fast unions (SUNION).
- Sorted Sets (ZSets): Skip lists + hashtable for O(log N) ops. ZADD/ZRANGE for rankings.
Memory-efficient; promotes to larger structures on growth. Use SCAN for iteration to avoid blocking.
What is Redis persistence, and how do options like RDB snapshots and AOF logs differ?
Persistence saves data to disk (default in-memory). Options:
- RDB (Snapshot): Point-in-time binary dump (e.g., save 900 1 = snapshot if at least one key changed in 900 seconds). Pros: Compact, fast restart; cons: Data since the last snapshot is lost on crash. BGSAVE forks a child so the parent keeps serving (the fork can briefly spike memory); a plain SAVE blocks. Config: dbfilename, dir.
- AOF (Append-Only File): Logs every write (appendfsync everysec by default); BGREWRITEAOF compacts the log. Pros: Much smaller data-loss window (about one second with everysec), human-readable; cons: Larger files, slower recovery.
- Hybrid: With AOF enabled, modern Redis writes an RDB preamble plus an AOF tail (aof-use-rdb-preamble); a common setup is RDB for backups plus AOF for durability. No persistence: Risky for critical data. Tune: save config, appendfsync.
How can you implement an LRU (Least Recently Used) cache eviction policy in Redis?
Redis supports eviction via maxmemory-policy (the default is noeviction; allkeys-lru is the usual choice for caches).
- Implementation: Set maxmemory 1gb, maxmemory-policy allkeys-lru (evicts least recently used keys across DBs). LFU (least frequently) alternative.
- Manual: Use sorted sets with timestamps (ZADD score=time key), ZREM on access, evict lowest on full. But built-in is efficient (O(1) approx).
- Use: For caching—EXPIRE for TTLs; watch evicted_keys and keyspace stats via INFO. Pros: Automatic; cons: No per-key policy. Libraries: Redisson for advanced patterns.
What are Redis Pub/Sub messaging patterns, and how are they used for real-time notifications?
Pub/Sub: Publishers send to channels, subscribers receive (fire-and-forget).
- Patterns: SUBSCRIBE channel, PUBLISH channel msg. Unreliable (no ACK, disconnects lose msgs). Pattern matching (PSUBSCRIBE pattern).
- Real-Time Notifications: E.g., chat: Publish “user:123:msg” to notify. Scale with sharding (channels per user).
- Limitations: No persistence, at-most-once. Use: Triggers (e.g., Node.js socket.io backend). Alternatives: Redis Streams for durable. Monitor: PUBSUB NUMSUB.
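A Pub/Sub sketch with redis-py; the channel name is illustrative, and in practice the subscriber is a long-running worker or WebSocket gateway rather than the same script.

```python
# Sketch: one connection subscribes to a channel, another publishes to it.
import redis

r = redis.Redis()

# Subscriber side
pubsub = r.pubsub()
pubsub.subscribe("notifications:user:123")

# Publisher side (e.g., called when an event occurs for this user)
r.publish("notifications:user:123", "You have a new message")

# Read until the actual message arrives (the first item is the subscribe confirmation).
for message in pubsub.listen():
    if message["type"] == "message":
        print(message["data"])   # b'You have a new message'
        break
```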
How does Redis clustering work for scaling and high availability?
Clustering (since 3.0): Shards data across nodes.
- Setup: redis-cli --cluster create, slots (16384 total, CRC16 hash). Masters hold slots, slaves replicate.
- Scaling: Add nodes and reshard slots; failover is automatic within the cluster (governed by cluster-node-timeout), so Sentinel is not needed.
- HA: Min 3 masters; slaves promote on failure. Client: Use Cluster mode (handles redirects).
- Limitations: Multi-key ops (e.g., MGET) only work when the keys map to the same slot (use hash tags {}). Key-to-slot lookup is O(1); cluster state is shared via a gossip protocol. Vs Sentinel: Clustering for scale, Sentinel for simple HA.
What are common anti-patterns when using Redis, such as treating it as a primary database?
Anti-patterns:
- Primary DB: No ACID transactions across keys, eventual consistency—use for cache/aux. Loss on eviction.
- Large Keys: Values >100MB slow everything; shard or use hashes.
- Blocking Commands: KEYS * in prod (O(N)); use SCAN. Long-running Lua scripts block.
- No Expiry: Memory leaks; always SETEX or CONFIG SET maxmemory.
- Ignoring Persistence: Assume durability—combine RDB/AOF.
- Over-Reliance on Pub/Sub: No durability; use Streams. Fix: Monitor INFO, use pipelines for batching.
How can you use Redis transactions and pipelining to improve performance?
- Transactions: MULTI/EXEC for atomic (WATCH for optimistic locking). All or nothing, but no rollbacks (error mid-way). Lua scripts for complex atomics.
- Pipelining: Send multiple commands without waiting (client-buffered). Boosts throughput 10x for batch ops (e.g., 100 SETs).
- Combo: Pipeline transactions. Use: Bulk inserts. Libraries: ioredis supports. Measure: redis-benchmark.
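A pipelining sketch with redis-py; key names are illustrative. `transaction=True` wraps the batch in MULTI/EXEC, while `transaction=False` is plain pipelining with no atomicity.

```python
# Sketch: batch many commands into one round trip, with and without MULTI/EXEC.
import redis

r = redis.Redis()

pipe = r.pipeline(transaction=False)      # pure pipelining: minimal latency, no atomicity
for i in range(100):
    pipe.set(f"key:{i}", i)
results = pipe.execute()                  # a single network round trip for 100 SETs

with r.pipeline(transaction=True) as tx:  # MULTI/EXEC: both commands applied atomically
    tx.incr("counter")
    tx.expire("counter", 60)
    tx.execute()
```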
What role does Lua scripting play in Redis, and what are examples of its usage?
Lua: Embedded for atomic, server-side scripts (EVAL/EVALSHA). Runs single-threaded, access Redis calls.
- Role: Complex logic without network roundtrips (e.g., conditional updates). Faster than MULTI for reads/writes.
- Examples: Rate limiter (check/increment in one script), distributed locks (SET NX + expiry). EVAL "return redis.call('SET', KEYS[1], ARGV[1])" 1 key value. Cache aside: check/get/set. Pros: Atomic; cons: Hard to debug, long scripts block the server.
How do you monitor Redis performance metrics, like memory usage and hit rates?
Monitoring:
- INFO: Command for stats—used_memory, mem_fragmentation_ratio (>1.5 bad), keyspace_hits/misses (hit rate >90% good), connected_clients, evicted_keys.
- SLOWLOG: SLOWLOG GET / SLOWLOG RESET for commands exceeding the slowlog-log-slower-than threshold.
- Tools: Redis Sentinel/Monit, Prometheus exporter + Grafana (dashboards for RSS, throughput). redis-cli --stat for a live view. Alerts: High fragmentation (enable activedefrag or restart), low hit rate (add memory).
What are Redis modules, and how can they extend functionality (e.g., RedisGraph for graph databases)?
Modules (since 4.0): Loadable C code for new commands.
- How: MODULE LOAD libmodule.so; remove with MODULE UNLOAD.
- Extensions:
- RedisGraph: Cypher queries for graphs (nodes/edges). Use: Social networks, recommendations (MATCH (u:User)-[:FRIEND]->(f) RETURN f).
- RediSearch: Full-text search, indexing (FT.SEARCH).
- RedisJSON: Native JSON ops (JSON.SET/GET).
- TimeSeries: For IoT/metrics (TS.ADD/TS.RANGE). Pros: No fork; cons: Potential instability. Use official (RedisModules).
How can Redis be used for leaderboards or ranking systems with sorted sets?
Sorted sets: Unique members, scores (float).
- Implementation: ZADD leaderboard 100 "user1"; read the top N with ZREVRANGE leaderboard 0 9 WITHSCORES. Update: ZINCRBY on score change. Prune: remove stale entries with ZREMRANGEBYRANK.
- Use: Gaming (scores), e-commerce (hot items). Paginate: ZRANGEBYSCORE. Time-bound: Separate sets per epoch. Efficient: O(log N) per op. Example: Twitter trends.
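A leaderboard sketch with redis-py sorted-set calls; the key and member names are illustrative.

```python
# Sketch: add scores, bump one, read the top 3, and fetch a single member's rank.
import redis

r = redis.Redis()

r.zadd("leaderboard", {"user1": 100, "user2": 250, "user3": 175})
r.zincrby("leaderboard", 50, "user1")            # user1 gains 50 points -> 150

top3 = r.zrevrange("leaderboard", 0, 2, withscores=True)   # highest scores first
print(top3)   # e.g. [(b'user2', 250.0), (b'user3', 175.0), (b'user1', 150.0)]

rank = r.zrevrank("leaderboard", "user1")        # 0-based rank, best score = 0
print(rank)
```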
What security best practices should be followed when deploying Redis, such as authentication and encryption?
Secure Redis:
- Auth: REQUIREPASS in config (ACL since 6.0 for users/rights). Bind to localhost or firewall.
- Encryption: TLS in transit (built in since Redis 6, or via a proxy on older versions); no built-in at-rest encryption—use disk/volume encryption or external tooling.
- Network: No public exposure; use VPN/Stunnel. Disable dangerous commands (rename-command DEBUG "").
- Modules/Keys: Validate inputs to prevent injection. Monitor: Slowlog for attacks. Updates: Patch CVEs (e.g., RCE in old versions). Cloud: Use managed (AWS ElastiCache) with IAM.
How does Redis handle replication and failover in a master-slave setup?
Replication: Async, full+partial resync.
- Setup: REPLICAOF (formerly SLAVEOF) master_ip port on the replica. PSYNC handles partial resyncs.
- Failover: Manual (SLAVEOF no one on promoted slave). Use Sentinel: Monitors, auto-failover (quorum elects). Config: sentinel monitor master 127.0.0.1 6379 2.
- HA: Chain replicas; min-replicas-to-write (formerly min-slaves-to-write) refuses writes when too few replicas are connected. Limitations: No multi-master. Scale reads on replicas. Monitor: INFO replication.
What are the differences between Redis and Memcached as in-memory data stores?
- Data Structures: Redis: Rich (lists/sets/zsets); Memcached: Simple keys/values.
- Persistence: Redis: RDB/AOF; Memcached: None.
- Eviction: Both LRU; Redis more policies (allkeys/volatile).
- Performance: Similar speed; Redis scripting/replication adds overhead.
- Use: Memcached for pure cache (simpler, multi-get); Redis for queues/pubsub. Redis Lua vs Memcached no logic. Scale: Both cluster, but Redis sharding native.
Kafka Questions
What is the concept of partitioning in Kafka, and how does it enable scalability?
Partitions: Unit of parallelism/scalability in topics.
- Concept: Topic divided into ordered, immutable logs (segments). Producers choose partition (hash key or round-robin). Consumers read from partitions.
- Scalability: Parallel processing—multiple consumers per consumer group (one per partition). Brokers store partitions; add brokers to rebalance. Throughput: Linear with partitions (e.g., 10 partitions = 10x parallelism). Retention: Per-partition configurable. Trade-off: More partitions increase overhead (file handles, ZooKeeper load).
How does Kafka ensure message ordering within a partition, and what issues arise with ordering across partitions?
- Within Partition: Strict FIFO—messages appended sequentially, offset-based. Same key goes to same partition (hash), preserving order for that key.
- Across Partitions: No global order—independent logs. Issues: If order matters globally, use single partition (bottleneck) or app-level sequencing (timestamps + idempotency). Out-of-order delivery on failures (rebalance). Solution: Logical timestamps, exactly-once semantics (transactions). For strict order, custom partitioner.
What are Kafka producers and consumers, and how do they interact with topics?
- Producers: Apps sending messages. Config: bootstrap.servers, acks (0 = no acknowledgement, 1 = leader only, all = all in-sync replicas). Serialize (JSON/Avro), partitioner. Batch/flush for efficiency.
- Consumers: Poll messages. Groups for load balancing (one consumer per partition). Offset commit (auto/manual). Deserializer.
- Interaction: Producer: send(topic, key, value). Consumer: subscribe(topic), poll(). Heartbeats for coordination. Use: Streams (producers to topics, consumers from).
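A producer/consumer sketch using kafka-python; the broker address, the `orders` topic, the key, and the `billing-service` group are illustrative.

```python
# Sketch: keyed produce (same key -> same partition) and a group consumer reading it back.
import json

from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",                              # wait for all in-sync replicas
)
producer.send("orders", key="order-42", value={"status": "created"})
producer.flush()

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="billing-service",              # consumers in one group share the partitions
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)
for record in consumer:
    print(record.partition, record.offset, record.value)
    break
```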
How does Kafka handle replication for fault tolerance and data durability?
Replication: Partitions replicated across brokers (replication.factor; 3 is typical in production, though the broker default is 1).
- ISR (In-Sync Replicas): Leader + synced followers. min.insync.replicas controls durability.
- Leader Election: Controller (ZooKeeper/KRaft) promotes ISR on failure.
- Durability: acks=all with min.insync.replicas > 1 ensures acknowledged writes survive a single broker failure. Replicas that diverge truncate their logs and re-fetch from the new leader. Retention: log.retention.hours. Pros: Strong durability; cons: Extra latency for synchronous acknowledgement.
What mechanisms does Kafka provide for offset management and consumer group coordination?
- Offsets: Unique per partition, stored in __consumer_offsets topic (internal).
- Management: Consumer: commitSync/async (enable.auto.commit=false for manual). Seek to offset.
- Groups: Coordinator (broker) assigns partitions (range/round-robin). Rebalance on join/leave (stop-the-world). Static membership (group.instance.id) reduces rebalances. Heartbeats: session.timeout.ms. Use: At-least-once (commit post-process).
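A manual-commit sketch for the at-least-once flow above, assuming kafka-python; the topic, group id, and `handle` processing function are illustrative.

```python
# Sketch: disable auto-commit and commit offsets only after processing succeeds.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="billing-service",
    enable_auto_commit=False,        # we decide when an offset counts as processed
    auto_offset_reset="earliest",
)

def handle(value):                   # hypothetical processing step
    print("processing", value)

for record in consumer:
    try:
        handle(record.value)
        consumer.commit()            # commit after success -> at-least-once semantics
    except Exception:
        # no commit: the record is redelivered after a restart or rebalance
        break
```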
How can you achieve exactly-once semantics in Kafka for message processing?
Exactly-once (EOS): Idempotent producers + transactions.
- Producers: enable.idempotence=true (PID, sequence numbers, dedup).
- Transactions: initTransactions, begin/commit (atomic across partitions/topics). For consumers: Read committed (isolation.level=read_committed).
- Streams: Kafka Streams uses EOS for stateful ops. End to end: combine with idempotent sinks (e.g., UPSERT) for external systems. Available since 0.11; prevents duplicates on retries/failovers.
What is the role of ZooKeeper in Kafka, and how is it being phased out in newer versions?
ZooKeeper: Distributed coord for metadata (brokers, topics, partitions, leaders). Role: Leader election, config storage, offset coord (pre-0.9).
- Phasing Out: KRaft (Kafka Raft, since 2.8, default 3.3+). Metadata in quorum log (no external dep). Benefits: Simpler deploy, faster recovery. Transition: Dual mode, migrate. Still used optionally.
How does Kafka Streams enable real-time data processing and stateful operations?
Kafka Streams: Java/Scala lib for stream processing.
- Real-Time: Processes topic streams (KStream/KTable). Topologies: Source-processor-sink.
- Stateful: Stores (RocksDB) for aggregations (groupBy, windowed counts). Changelog topics for fault tolerance.
- Ops: Join streams/tables, branch/filter, exactly-once. Scale: Parallelism per app. Use: Fraud detection (windowed sums), personalization. DSL or Processor API.
What are common recovery strategies in Kafka after broker failures or network partitions?
- Broker Failure: The controller elects a new leader from the ISR for affected partitions. Recovering replicas catch up by fetching to the log end; after an unclean shutdown, a replica truncates to its last consistent point.
- Network Partitions: Availability depends on keeping at least min.insync.replicas in sync per partition; the controller quorum (ZooKeeper or KRaft) resolves leadership. Auto leader rebalance restores preferred leaders.
- Strategies: High replication, rack awareness (distribute replicas). unclean.leader.election.enable=false. Monitor: Under-replicated partitions. Recovery: Restart, resync slaves. Backup: MirrorMaker for DR.
How do you scale Kafka clusters by adding brokers or rebalancing partitions?
- Adding Brokers: New brokers do not automatically take over existing partitions—reassignment is manual. Use kafka-reassign-partitions.sh: write a JSON plan with the new assignments, then execute it.
- Rebalancing: On broker add/remove, controller moves leaders/ISRs. Minimize: Preferred replica election.
- Scale: More partitions for throughput (but overhead). Vertical: Bigger disks. Monitor: Keep partition counts per broker within recommended limits (a few thousand). Tools: Cruise Control for automation.
What is the Kafka Connect framework, and how is it used for integrating with external systems?
Connect: Scalable tool for moving data (sinks/sources).
- Framework: Plugins for connectors (e.g., JDBC, S3). Distributed/standalone mode. Tasks parallel.
- Integration: Source: DB → Kafka (Debezium CDC). Sink: Kafka → ES. Config: JSON, offsets in topics. Use: ETL pipelines, no code. REST API for management. Pros: Fault-tolerant, schema-aware (with Schema Registry).
How does the Schema Registry work with Kafka to manage evolving data schemas?
Schema Registry: Central repo for Avro/Protobuf/JSON schemas.
- Work: Producers/consumers register and fetch schemas over REST (e.g., POST /subjects/{topic}-value/versions). Compatibility: Backward/forward/full (evolution rules). Schema IDs are embedded in messages (magic byte + id).
- Benefits: Decouples prod/cons evolution, validation. REST API, Kafka-integrated. Use: With Connect/Serdes. Alternatives: No registry (embed schema, brittle).
What are idempotent producers in Kafka, and why are they important for reliability?
Idempotent: Producers dedup retries (enable.idempotence=true).
- How: Producer ID (PID), sequence numbers per partition. Broker tracks, rejects duplicates.
- Importance: At-least-once delivery without duplicates on retries/timeouts. Reliability: Handles network flakes. Requires max.in.flight.requests.per.connection <= 5. Combine with transactions for EOS.
How can you monitor Kafka performance, including metrics like throughput and lag?
- Metrics: JMX (via JConsole). Throughput: bytes in/out per second. Lag: log end offset minus committed consumer offset (check with kafka-consumer-groups.sh, kafkacat, or Burrow). Under-replicated partitions.
- Tools: Prometheus + Kafka exporter, Grafana dashboards. Kafka Manager/CMAK for UI. Logs: Controller logs for errors. Alerts: Lag > threshold, CPU >80%.
What are the differences between Kafka and other messaging systems like RabbitMQ or Apache Pulsar?
- Kafka: Pub-sub, durable logs, high-throughput (millions/sec), pull-based, ordered partitions. Best for streams/batch.
- RabbitMQ: Queue-based (AMQP), push delivery, per-message ACKs, flexible routing (exchanges). Better for task queues and low per-message latency; throughput is generally lower than Kafka's.
- Pulsar: Segmented topics (like Kafka), multi-tenancy, layered (bookkeeper storage). Functions for processing, better geo-replication. Kafka: Simpler, mature ecosystem; Pulsar: More features but complex. Choose: Kafka for log analytics, Rabbit for RPC.