Data Analyst
Updated for 2026: Data Analyst interview questions and answers covering core skills, tools, and best practices for roles in the US, Europe & Canada.
How do window functions work in SQL and when should analysts use them?
Window functions compute values across a set of related rows without collapsing them the way GROUP BY does. Use them for running totals, ranking, cohort retention, and “latest per group” patterns. Common functions: ROW_NUMBER, RANK, LAG/LEAD, and SUM() OVER. Interview tip: explain how partitioning (PARTITION BY) and ordering (ORDER BY) change the result.
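The “latest per group” pattern can be sketched with SQLite’s window-function support (available via Python’s built-in sqlite3 module). The table and column names here are illustrative, not from the original text:

```python
import sqlite3

# Hypothetical orders table; we want each customer's most recent order.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, '2024-01-05', 20.0),
        (1, '2024-02-10', 35.0),
        (2, '2024-01-20', 50.0);
""")

# PARTITION BY restarts numbering per customer; ORDER BY ... DESC makes
# rn = 1 the latest row within each partition.
rows = con.execute("""
    SELECT customer_id, order_date, amount
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id
                   ORDER BY order_date DESC
               ) AS rn
        FROM orders
    )
    WHERE rn = 1
    ORDER BY customer_id
""").fetchall()

print(rows)  # one row per customer: their most recent order
```

Note the contrast with GROUP BY: the inner query still sees every order row; the window function only adds a column.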
How do you design a KPI dashboard that stakeholders will actually trust and use?
Start from decisions, not charts. Define metric definitions, owners, and refresh cadence. Add context: targets, thresholds, and annotations for incidents/releases. Include drill-down paths and a data quality section (freshness, completeness). A dashboard is successful when it answers a specific question quickly and consistently.
What is cohort analysis and how do you use it for retention?
Cohort analysis groups users by a shared start event (signup, first purchase) and tracks behavior over time. It helps answer: are newer cohorts retaining better or worse? Key choices: cohort definition, time granularity (days/weeks), and retention event (active, purchase). Always control for seasonality and marketing channel mix.
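A minimal sketch of the cohort-retention computation, using toy weekly data (all user IDs and numbers are invented for illustration):

```python
from collections import defaultdict

# Toy data: signup week per user, and (user, week_active) events.
signups = {"a": 0, "b": 0, "c": 1}
activity = [("a", 0), ("a", 1), ("b", 0), ("c", 1), ("c", 2)]

# Group active users by (cohort, offset), where offset is the number of
# weeks since that cohort's signup week.
active = defaultdict(set)
for user, week in activity:
    cohort = signups[user]
    active[(cohort, week - cohort)].add(user)

cohort_sizes = defaultdict(int)
for cohort in signups.values():
    cohort_sizes[cohort] += 1

# Retention rate = distinct active users / cohort size.
retention = {
    (cohort, offset): len(users) / cohort_sizes[cohort]
    for (cohort, offset), users in active.items()
}
print(retention)
```

Comparing `retention[(0, 1)]` against `retention[(1, 1)]` answers the question in the text: is the newer cohort retaining better at the same age?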
How do you perform funnel analysis and find drop-off reasons?
A funnel tracks sequential steps (visit → signup → activation → purchase). Best practice:
- Define steps precisely (events + filters)
- Choose a time window (same session vs. 7 days)
- Segment by device, channel, geography
Then investigate drop-offs using session replays, logs, UX research, and controlled experiments.
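The step-by-step conversion math can be sketched as follows; the event names mirror the example funnel above, and the per-user event sets are invented:

```python
# Per-user sets of completed events (toy data).
users = {
    "u1": {"visit", "signup", "activation", "purchase"},
    "u2": {"visit", "signup"},
    "u3": {"visit"},
    "u4": {"visit", "signup", "activation"},
}
steps = ["visit", "signup", "activation", "purchase"]

# A user counts toward a step only if they completed every earlier step,
# so counts are monotonically non-increasing down the funnel.
counts = []
remaining = set(users)
for step in steps:
    remaining = {u for u in remaining if step in users[u]}
    counts.append(len(remaining))

# Step-to-step conversion shows where drop-off concentrates.
conversion = [counts[i + 1] / counts[i] for i in range(len(counts) - 1)]
print(counts, conversion)
```

The lowest entry in `conversion` is the step to investigate first with the qualitative tools mentioned above.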
How do you detect anomalies in metrics and avoid false alarms?
Start with baselines and seasonality. Approaches:
- Rolling averages with thresholds
- Week-over-week comparisons
- Statistical control charts
- Time-series models for complex patterns
Reduce false alarms by setting alert policies, using guardrail metrics, and requiring sustained deviation before paging.
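A minimal sketch of the first approach plus the “sustained deviation” rule: flag a point only when it breaches a trailing-mean threshold for several consecutive periods. The window, z-multiplier, and streak length are illustrative defaults, not standards:

```python
from statistics import mean, stdev

def anomalies(series, window=7, z=2.0, sustain=2):
    """Flag indices whose value sits more than z trailing std devs from
    the trailing mean, but only after `sustain` consecutive breaches
    (requiring persistence reduces paging noise from one-off blips)."""
    flagged, streak = [], 0
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        breach = sigma > 0 and abs(series[i] - mu) > z * sigma
        streak = streak + 1 if breach else 0
        if streak >= sustain:
            flagged.append(i)
    return flagged

# Flat metric with a sustained spike starting at index 9.
metric = [100, 102, 99, 101, 100, 98, 103, 101, 100, 250, 260, 255]
print(anomalies(metric))
```

Note a limitation worth mentioning in an interview: once the spike enters the trailing window it inflates the baseline’s standard deviation, which is one reason production systems prefer robust baselines or explicit seasonality models.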
What is your workflow for cleaning messy datasets before analysis?
A solid workflow is:
- Validate schema and types
- Handle missing values and duplicates
- Standardize units/time zones/currencies
- Check ranges and outliers
- Document assumptions
Always keep raw data immutable and perform cleaning via reproducible transforms (SQL models, notebooks, dbt).
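The workflow above can be sketched as a single pure transform over raw rows; the column names and rules (non-negative amounts, UTC normalization) are illustrative assumptions:

```python
from datetime import datetime, timezone

# Raw rows as they might arrive from an export (all values invented).
raw = [
    {"id": "1", "amount": "10.50", "ts": "2024-03-01T10:00:00+01:00"},
    {"id": "1", "amount": "10.50", "ts": "2024-03-01T10:00:00+01:00"},  # duplicate
    {"id": "2", "amount": "", "ts": "2024-03-01T12:00:00+00:00"},       # missing amount
    {"id": "3", "amount": "-5.00", "ts": "2024-03-01T13:00:00+00:00"},  # impossible value
]

def clean(rows):
    """Type-cast, drop missing/duplicate/out-of-range rows, normalize
    timestamps to UTC. The input is never mutated: raw stays immutable
    and cleaning is a reproducible transform."""
    seen, out = set(), []
    for row in rows:
        if not row["amount"]:
            continue  # missing value (dropping; imputation is an alternative)
        amount = float(row["amount"])
        if amount < 0:
            continue  # range check: negative amounts assumed impossible here
        key = (row["id"], row["ts"])
        if key in seen:
            continue  # exact duplicate
        seen.add(key)
        ts_utc = datetime.fromisoformat(row["ts"]).astimezone(timezone.utc)
        out.append({"id": int(row["id"]), "amount": amount, "ts": ts_utc})
    return out

cleaned = clean(raw)
print(cleaned)  # only the one valid, deduplicated row survives
```

In practice each dropped category should also be counted and documented, so the "document assumptions" step is auditable.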
How do you define metrics so teams don’t argue about numbers?
Define metrics like an API contract. Include:
- Business meaning
- Exact SQL logic (filters, time zone)
- Grain (user/day/order)
- Inclusion/exclusion rules
- Ownership and change process
Centralize definitions in a metrics layer or documentation and add tests for consistency.
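One way to make the “API contract” idea concrete is a small typed registry. This is a hypothetical, hand-rolled sketch; real teams usually reach for a metrics layer (dbt, Looker’s semantic model, etc.) instead:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: definitions change via review, not mutation
class MetricDefinition:
    name: str
    business_meaning: str
    sql: str                 # exact logic, including filters and time zone
    grain: str               # e.g. "user/week"
    exclusions: tuple = ()
    owner: str = "analytics-team"

REGISTRY = {}

def register(metric: MetricDefinition):
    """Reject silent redefinition: changing a metric goes through a process."""
    if metric.name in REGISTRY:
        raise ValueError(f"{metric.name} already defined; use the change process")
    REGISTRY[metric.name] = metric

register(MetricDefinition(
    name="weekly_active_users",
    business_meaning="Distinct users with a qualifying event in a UTC week",
    sql="SELECT COUNT(DISTINCT user_id) FROM events WHERE event_ts >= :week_start",
    grain="user/week",
    exclusions=("internal test accounts",),
))
print(REGISTRY["weekly_active_users"].grain)
```

The consistency tests mentioned above would then compare dashboard queries against the registered `sql` rather than against tribal knowledge.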
How do you analyze A/B test results and avoid common mistakes?
Use a pre-defined analysis plan. Check:
- Randomization and sample ratio mismatch
- Primary + guardrail metrics
- Confidence intervals and effect size
- Multiple comparisons and peeking
Report practical impact, not just p-values, and ensure the tracking is correct before trusting results.
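Two of the checks above, sketched with stdlib math only (the sample sizes and conversion counts are invented; real analyses typically use scipy or statsmodels):

```python
from math import sqrt, erf

def srm_check(n_a, n_b, expected=0.5):
    """Sample-ratio-mismatch z statistic for an intended 50/50 split.
    |z| well above ~3 suggests broken randomization or logging."""
    n = n_a + n_b
    return (n_a - n * expected) / sqrt(n * expected * (1 - expected))

def two_prop_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: effect size, 95% CI, and two-sided p-value."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    z = diff / se
    # Normal CDF via erf; 2 * upper tail gives the two-sided p-value.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return diff, (diff - 1.96 * se, diff + 1.96 * se), p_value

print(srm_check(5000, 5021))  # near zero: the split looks healthy
diff, ci, p = two_prop_test(400, 5000, 460, 5021)
print(round(diff, 4), [round(x, 4) for x in ci])
```

Reporting `diff` and its interval, rather than the p-value alone, is exactly the “practical impact” framing the answer recommends.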
What are best practices for data visualization and storytelling?
Great visuals reduce cognitive load. Best practices:
- Pick the right chart for the question
- Use consistent scales and labels
- Highlight the takeaway (annotations)
- Avoid chartjunk and misleading axes
Tell the story: context → insight → recommendation → next step.
How do you translate stakeholder questions into an analysis plan?
Clarify the decision first. Ask:
- What action will you take based on the result?
- What time window and segment matter?
- What constraints or definitions apply?
Then define hypotheses, metrics, required data, and deliverables. This prevents “analysis for analysis’ sake.”
How do you automate reporting without losing trust in numbers?
Automate pipelines and add guardrails. Steps:
- Standardize metric logic
- Schedule refreshes with monitoring
- Add data quality tests
- Version dashboards and definitions
Automation fails when logic lives in ad-hoc spreadsheets. Move transforms into tested SQL/dbt models and keep documentation updated.
Which advanced Excel skills are still valuable for data analysts?
Excel remains useful for fast ad-hoc work. High-value skills:
- Pivot tables and pivot charts
- XLOOKUP / INDEX-MATCH
- Power Query for cleaning
- Basic statistics and what-if analysis
Use Excel for exploration, but move production logic into SQL/BI models to avoid “spreadsheet drift.”
What is data lineage and why does it matter for analytics?
Lineage shows where data comes from and how it is transformed along the way. It matters because:
- You can debug discrepancies faster
- You can assess the impact of upstream changes
- You can improve trust and governance
Even lightweight lineage (source → model → dashboard) reduces time spent chasing broken metrics.
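Even the lightweight source → model → dashboard lineage mentioned above is just a small graph, and impact analysis is a traversal over it. The asset names here are hypothetical:

```python
# Adjacency map: each asset lists the assets built directly from it.
lineage = {
    "raw.orders": ["stg_orders"],
    "stg_orders": ["fct_revenue"],
    "fct_revenue": ["revenue_dashboard", "finance_report"],
}

def downstream(asset, graph):
    """Everything affected if `asset` changes or breaks (impact analysis).
    Iterative DFS; the visited set also guards against cycles."""
    out, stack = set(), [asset]
    while stack:
        for child in graph.get(stack.pop(), []):
            if child not in out:
                out.add(child)
                stack.append(child)
    return out

print(sorted(downstream("raw.orders", lineage)))
```

Running the same traversal on a reversed graph answers the debugging question in the other direction: which sources feed this broken dashboard?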
What data quality checks should analysts insist on for critical metrics?
Critical metrics need automated checks:
- Freshness (is data up to date?)
- Completeness (missing rows?)
- Uniqueness (primary keys)
- Valid ranges (no impossible values)
Without tests, dashboards can be wrong silently. Define SLOs for key datasets and alert when checks fail.
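The four checks above can be sketched as one batch validation; the column names (`order_id`, `amount`, `loaded_at`) and the six-hour freshness SLO are illustrative:

```python
from datetime import datetime, timedelta, timezone

def run_checks(rows, key, max_age):
    """Return the names of failed checks for a batch of rows:
    freshness, completeness, uniqueness, and valid ranges."""
    failures = []
    now = datetime.now(timezone.utc)
    if not rows or now - max(r["loaded_at"] for r in rows) > max_age:
        failures.append("freshness")       # data older than the SLO
    if any(r.get(key) is None for r in rows):
        failures.append("completeness")    # missing primary key values
    keys = [r[key] for r in rows if r.get(key) is not None]
    if len(keys) != len(set(keys)):
        failures.append("uniqueness")      # duplicate primary keys
    if any(r["amount"] < 0 for r in rows):
        failures.append("valid_range")     # impossible values
    return failures

load_time = datetime.now(timezone.utc)
rows = [
    {"order_id": 1, "amount": 10.0, "loaded_at": load_time},
    {"order_id": 1, "amount": -3.0, "loaded_at": load_time},  # dup key + bad value
]
print(run_checks(rows, key="order_id", max_age=timedelta(hours=6)))
```

In production these checks would live in the pipeline (e.g. dbt tests or a data-quality framework) and page an owner on failure rather than print.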
What are common SQL join pitfalls that cause incorrect metrics?
Common pitfalls:
- Many-to-many joins inflating counts
- Joining at the wrong grain
- Using INNER vs. LEFT incorrectly
- Duplicates in dimension tables
Fix by validating row counts at each step, using distinct keys, and aggregating at the correct grain before joining.
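The fan-out pitfall is easy to demonstrate with SQLite via Python’s sqlite3 module. The tables and the duplicate row are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (order_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 100.0), (2, 50.0);
    -- dimension table with an accidental duplicate row for order 1
    CREATE TABLE order_status (order_id INTEGER, status TEXT);
    INSERT INTO order_status VALUES (1, 'shipped'), (1, 'shipped'), (2, 'pending');
""")

# Naive join: the duplicate dimension row double-counts order 1's revenue.
(inflated,) = con.execute("""
    SELECT SUM(o.amount) FROM orders o
    JOIN order_status s ON o.order_id = s.order_id
""").fetchone()

# Fix: deduplicate (aggregate to the join grain) before joining.
(correct,) = con.execute("""
    SELECT SUM(o.amount) FROM orders o
    JOIN (SELECT DISTINCT order_id, status FROM order_status) s
      ON o.order_id = s.order_id
""").fetchone()

print(inflated, correct)  # 250.0 vs 150.0
```

Comparing `SUM` (or `COUNT(*)`) before and after each join, as the answer suggests, is what catches the inflated figure.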
How do you segment users or customers for meaningful analysis?
Segmentation should reflect real behavioral or business differences. Examples:
- Lifecycle stage (new, active, churned)
- Value tier (LTV bands)
- Acquisition channel
- Product usage patterns
Validate segments by checking stability over time and whether they drive different outcomes you can act on.
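A minimal sketch combining the first two example dimensions. The thresholds (90-day churn, 30-day “new”, LTV bands at 100 and 500) are illustrative assumptions, not standards:

```python
def segment(customer):
    """Assign a (lifecycle, value tier) pair; thresholds are illustrative."""
    if customer["days_since_last_active"] > 90:
        lifecycle = "churned"
    elif customer["days_since_signup"] <= 30:
        lifecycle = "new"
    else:
        lifecycle = "active"
    ltv = customer["ltv"]
    tier = "high" if ltv >= 500 else "mid" if ltv >= 100 else "low"
    return lifecycle, tier

c = {"days_since_signup": 400, "days_since_last_active": 5, "ltv": 620}
print(segment(c))  # ('active', 'high')
```

The validation step from the text then applies: recompute segments each period and check both that membership is reasonably stable and that the segments differ on outcomes you can act on.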
How do you communicate insights so teams take action?
Make the recommendation unavoidable. Structure:
- Problem and why it matters
- Evidence (key charts/tables)
- Recommendation + expected impact
- Risks and next steps
Tailor detail to the audience and include a clear owner and deadline for follow-up actions.
What privacy and compliance basics should data analysts understand (PII, GDPR)?
Analysts must handle sensitive data responsibly. Key concepts:
- What counts as PII
- Minimization (collect only what you need)
- Access control and auditing
- Retention and deletion policies
Work with legal/security for GDPR/CCPA requirements and avoid exporting sensitive data to unmanaged tools.
What should analysts consider when choosing BI tools (Tableau, Power BI, Looker)?
Tool choice should match governance and workflow. Consider:
- Semantic layer and metric consistency
- Permissions and row-level security
- Performance and caching
- Collaboration and versioning
- Total cost
The best BI tool is the one your org can govern and maintain without fragmented definitions.
What should good analysis documentation include for reproducibility?
Documentation reduces repeated work and errors. Include:
- Data sources and filters
- Metric definitions and assumptions
- SQL queries / notebooks
- Known limitations and edge cases
- Version/date and owner
Good docs make analyses auditable and reusable, especially when stakeholders revisit decisions months later.