4-Week Python for Finance Learning Plan

A structured, zero-to-job-ready Python curriculum for finance professionals. No CS theory, no LeetCode — just the skills that show up in job descriptions.

Who this is for: A finance professional with zero coding experience who wants to land analyst, operations, or data-oriented roles that list Python as a requirement.

Time commitment: 1–1.5 hours per day, 6 days a week, with one catch-up day built in. You can shift days around to fit your schedule — consistency matters more than sequence.

What this is NOT: This is not a software engineering curriculum. No algorithms, no LeetCode, no computer science theory. Everything here is chosen because it shows up in finance job descriptions or makes you dangerous with data.


Before You Start

Setup (Day 0 — 45–75 minutes)

  1. Install Python 3.12+ via python.org
  2. Install VS Code (code.visualstudio.com) — then add the Python extension
  3. Or skip local setup entirely — use Google Colab to write and run Python in your browser for free. No installs, no configuration. Great for Weeks 1–2 while you're learning. You can switch to a local setup later when you need to work with local files and databases.
  4. Set up a virtual environment (local installs only — skip this if using Colab):
    # In your terminal, create a project folder and virtual environment
    mkdir python-finance
    cd python-finance
    python -m venv venv
    
    # Activate it (do this every time you open a new terminal)
    # macOS/Linux:
    source venv/bin/activate
    # Windows:
    venv\Scripts\activate
    
    # You'll see (venv) in your prompt — that means it's active.
    # Always activate before running pip install.
    
    A virtual environment keeps your project's packages separate from your system Python. This prevents version conflicts and makes pip freeze > requirements.txt actually useful later.
  5. Create a GitHub account (github.com) — this becomes your portfolio
  6. Install Git (git-scm.com) — you'll push projects here starting Week 2
  7. Bookmark the resources in the Resource Summary table at the end of this plan — you'll reference them constantly

Mindset

You don't need to memorize syntax. You need to build muscle memory for looking things up and solving problems. Every professional developer Googles things daily. Your goal is fluency, not memorization.


Week 1: Python Fundamentals

Goal: Be comfortable writing simple scripts that process data. Understand variables, loops, conditionals, functions, basic data structures, and how to read an error message.

Day 1 — Variables, Types & Basic Operations (~60 min)

Learn:

  • Variables and assignment (name = "Sarah", revenue = 150000.50)
  • Data types: str, int, float, bool
  • Basic math operations (+, -, *, /, //, %, **)
  • print() and f-strings (f"Revenue is ${revenue:,.2f}")

Resource: Automate the Boring Stuff — Chapter 1

Practice:

# Calculate compound interest
principal = 10000
rate = 0.07
years = 5
future_value = principal * (1 + rate) ** years
print(f"${principal:,.2f} at {rate:.0%} for {years} years = ${future_value:,.2f}")

Day 2 — Strings & User Input (~60 min)

Learn:

  • String methods: .upper(), .lower(), .strip(), .replace(), .split()
  • String slicing: ticker[:3]
  • input() for user interaction
  • Type conversion: int(), float(), str()

Resource: Automate the Boring Stuff — Chapter 6

Practice:

# Clean up messy ticker symbols
raw_input = "  aapl, MSFT , goog,  AMZN  "
tickers = [t.strip().upper() for t in raw_input.split(",")]
print(tickers)  # ['AAPL', 'MSFT', 'GOOG', 'AMZN']

# Interactive currency converter
usd = float(input("Enter amount in USD: "))
rate = 0.92  # EUR/USD rate
print(f"${usd:,.2f} = €{usd * rate:,.2f}")

Day 3 — Conditionals & Comparisons (~60 min)

Learn:

  • if, elif, else
  • Comparison operators: ==, !=, >, <, >=, <=
  • Logical operators: and, or, not
  • Truthiness (empty strings/lists are False)

Resource: Automate the Boring Stuff — Chapter 2

Practice:

# Flag transactions for review
amount = 15000
category = "wire_transfer"

if amount > 10000 and category == "wire_transfer":
    print("FLAG: Large wire transfer — requires compliance review")
elif amount > 10000:
    print("FLAG: Large transaction — manager approval needed")
else:
    print("APPROVED: Within normal limits")
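Truthiness from the Learn list deserves its own quick demo, since it trips up most newcomers: empty strings, lists, and dicts count as False, and zero does too. A minimal sketch (the variable names are illustrative):

```python
# Empty containers and empty strings are "falsy"; non-empty ones are "truthy"
pending_reviews = []

if pending_reviews:
    print(f"{len(pending_reviews)} transactions awaiting review")
else:
    print("No transactions awaiting review")  # this branch runs

# Zero is falsy too — useful for catching empty balances
balance = 0
if not balance:
    print("Warning: zero balance")
```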

Day 4 — Lists & Loops (~75 min)

Learn:

  • Lists: creating, indexing, slicing, .append(), .remove(), .sort()
  • for loops (iterating over lists)
  • while loops
  • range() and enumerate()
  • List comprehensions (the Pythonic shortcut)

Resource: Automate the Boring Stuff — Chapter 4 (covers lists, loops, and range())

Practice:

# Identify which months beat target
monthly_revenue = [82000, 91000, 78000, 95000, 88000, 102000,
                   97000, 85000, 110000, 93000, 99000, 115000]
target = 90000

above_target = [i+1 for i, rev in enumerate(monthly_revenue) if rev > target]
print(f"Months above ${target:,} target: {above_target}")
print(f"Hit rate: {len(above_target)}/{len(monthly_revenue)}")

Day 5 — Dictionaries & Sets (~75 min)

Learn:

  • Dictionaries: key-value pairs, .keys(), .values(), .items(), .get()
  • Nested dictionaries (common in financial data/JSON)
  • Sets: unique values, intersections, unions
  • When to use each: list vs dict vs set

Resource: Automate the Boring Stuff — Chapter 5

Practice:

# Portfolio holdings
portfolio = {
    "AAPL": {"shares": 50, "avg_cost": 142.50},
    "MSFT": {"shares": 30, "avg_cost": 285.00},
    "GOOG": {"shares": 10, "avg_cost": 2750.00},
}

current_prices = {"AAPL": 178.50, "MSFT": 310.00, "GOOG": 2900.00}

for ticker, holding in portfolio.items():
    cost_basis = holding["shares"] * holding["avg_cost"]
    market_value = holding["shares"] * current_prices[ticker]
    pnl = market_value - cost_basis
    print(f"{ticker}: P&L = ${pnl:,.2f}")
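The practice above exercises dictionaries; the sets from the Learn list can be sketched like this (the tickers are illustrative):

```python
# Sets automatically deduplicate and support fast overlap checks
my_holdings = {"AAPL", "MSFT", "GOOG"}
benchmark = {"AAPL", "MSFT", "AMZN", "NVDA"}

print(my_holdings & benchmark)  # intersection: held AND in the benchmark
print(my_holdings - benchmark)  # holdings NOT in the benchmark
print(my_holdings | benchmark)  # union of both

# Deduplicate a list by round-tripping through a set
trades = ["AAPL", "AAPL", "MSFT", "AAPL"]
print(sorted(set(trades)))  # ['AAPL', 'MSFT']
```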

Day 6 — Functions & Debugging Basics (~75 min)

Learn:

  • Defining functions with def
  • Parameters, arguments, return values
  • Default arguments
  • Docstrings (brief descriptions of what a function does)
  • Why functions matter: reusability, readability, testing
  • Reading tracebacks: when Python throws an error, read it bottom to top. The last line is the error type and message; the lines above show the call stack leading to it. Get comfortable with common errors:
    • NameError — you used a variable that doesn't exist (typo?)
    • TypeError — you passed the wrong type (string where a number was expected?)
    • IndexError — you went past the end of a list
    • KeyError — you asked for a dictionary key that doesn't exist

Resource: Automate the Boring Stuff — Chapter 3

Practice:

def calculate_loan_payment(principal, annual_rate, years):
    """Calculate monthly payment for a fixed-rate loan."""
    monthly_rate = annual_rate / 12
    num_payments = years * 12
    payment = principal * (monthly_rate * (1 + monthly_rate)**num_payments) / \
              ((1 + monthly_rate)**num_payments - 1)
    return round(payment, 2)

# Test it
print(calculate_loan_payment(300000, 0.065, 30))  # ~$1,896.20
print(calculate_loan_payment(300000, 0.055, 15))  # ~$2,451.25

# Debugging exercise: intentionally break the function call below
# and practice reading the traceback Python gives you.
# Try: calculate_loan_payment("300000", 0.065, 30)  # TypeError — string, not number
# Try: calculate_loan_payment(300000, 0, 30)          # ZeroDivisionError — 0% rate

Day 7 — File I/O, Error Handling & Git Basics (~90 min)

Learn:

  • Reading/writing text files (open(), with statement)
  • Reading/writing CSV files with the csv module
  • try/except for error handling
  • Common exceptions: FileNotFoundError, ValueError, KeyError
  • Git basics (just enough to push code to GitHub):
    • git init — turn a folder into a Git repository
    • git add . — stage your files
    • git commit -m "message" — save a snapshot
    • git remote add origin <url> — connect to GitHub
    • git push -u origin main — upload your code

Practice:

import csv

# Write sample transaction data
transactions = [
    ["date", "description", "amount", "category"],
    ["2026-01-15", "Office Supplies", 234.50, "Operations"],
    ["2026-01-16", "Client Dinner", 187.25, "Entertainment"],
    ["2026-01-17", "Software License", 1200.00, "Technology"],
]

with open("transactions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(transactions)

# Read it back with error handling
try:
    with open("transactions.csv", "r") as f:
        reader = csv.DictReader(f)
        for row in reader:
            print(f"{row['date']}: {row['description']} — ${float(row['amount']):,.2f}")
except FileNotFoundError:
    print("Error: transactions.csv not found. Check the file path.")
except ValueError as e:
    print(f"Error parsing data: {e}")

Then push your work to GitHub:

cd python-finance
git init
git add .
git commit -m "Week 1: Python fundamentals exercises"
# Create a new repo on github.com, then:
git remote add origin https://github.com/YOUR_USERNAME/python-finance.git
git push -u origin main

Week 1 Milestone

You should be able to:

  • Write a script from scratch that reads a CSV, processes the data, and outputs results
  • Use functions to organize your code
  • Handle basic errors with try/except
  • Read a Python traceback and identify what went wrong
  • Push your code to GitHub using basic Git commands
  • Feel comfortable reading Python code even if you couldn't write it from memory yet

Week 2: pandas — Your New Excel

Goal: Replace 80% of what you do in Excel with pandas. This is the single most valuable skill for finance Python roles.

Note: Many of this week's code snippets build on each other. Work in one continuous notebook (or Colab file) so variables like df carry forward between exercises.

Day 8 — pandas Fundamentals (~75 min)

Learn:

  • Install pandas (if working locally, activate your venv first): pip install pandas openpyxl
  • DataFrame and Series — the two core objects
  • Reading data: pd.read_csv(), pd.read_excel()
  • Inspecting data: .head(), .tail(), .shape, .dtypes, .describe(), .info()

Resource: Kaggle — pandas Course (Lessons 1–2) (free, interactive, browser-based)

Practice:

import pandas as pd

# Create a DataFrame from scratch (simulating a ledger)
data = {
    "date": ["2026-01-05", "2026-01-12", "2026-01-19", "2026-01-26", "2026-02-02"],
    "client": ["Acme Corp", "Globex", "Acme Corp", "Initech", "Globex"],
    "amount": [15000, 8500, 22000, 12000, 9500],
    "status": ["paid", "pending", "paid", "overdue", "paid"],
}
df = pd.DataFrame(data)
df["date"] = pd.to_datetime(df["date"])

print(df.describe())
print(df.dtypes)

Day 9 — Selecting, Filtering & Sorting (~75 min)

Learn:

  • Column selection: df["col"], df[["col1", "col2"]]
  • Row filtering: df[df["amount"] > 10000]
  • Multiple conditions: & (and), | (or), ~ (not) — always wrap conditions in parentheses
  • Sorting: .sort_values(), .sort_index()
  • .loc[] and .iloc[] for precise indexing

Resource: Kaggle — pandas Course (Lesson 3)

Practice:

# Filter for large paid invoices
large_paid = df[(df["amount"] > 10000) & (df["status"] == "paid")]
print(large_paid)

# Sort by amount descending
print(df.sort_values("amount", ascending=False))
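The practice above only uses &; the | (or) and ~ (not) operators from the Learn list work the same way, with each condition still wrapped in its own parentheses. A self-contained sketch (the data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "client": ["Acme Corp", "Globex", "Initech"],
    "amount": [15000, 8500, 12000],
    "status": ["paid", "pending", "overdue"],
})

# | (or): rows that are pending OR overdue
needs_attention = df[(df["status"] == "pending") | (df["status"] == "overdue")]
print(needs_attention)

# ~ (not): everything that is NOT paid — same rows, different phrasing
print(df[~(df["status"] == "paid")])
```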

Day 10 — GroupBy & Aggregation (~75 min)

Learn:

  • .groupby() — the pandas equivalent of pivot tables
  • Aggregation functions: .sum(), .mean(), .count(), .min(), .max()
  • .agg() for multiple aggregations at once
  • .pivot_table() — literally pivot tables in code

Resource: Kaggle — pandas Course (Lesson 4)

Practice:

# Revenue by client
print(df.groupby("client")["amount"].agg(["sum", "mean", "count"]))

# Pivot: clients vs status
pivot = df.pivot_table(values="amount", index="client", columns="status",
                       aggfunc="sum", fill_value=0)
print(pivot)

Day 11 — Data Cleaning (~90 min)

Learn:

  • Handling missing data: .isna(), .fillna(), .dropna()
  • Renaming columns: .rename(), .columns
  • Changing data types: .astype()
  • String operations: df["col"].str.upper(), .str.contains(), .str.strip()
  • Removing duplicates: .drop_duplicates()

Resource: Kaggle — Data Cleaning Course (Lessons 1–3) (free)

Practice:

# Simulate messy data
messy = pd.DataFrame({
    "Name": ["  john smith", "JANE DOE", "john smith", "Bob Jones  "],
    "Department": ["Finance", None, "Finance", "Operations"],
    "Salary": ["85000", "92000", "85000", None],
})

# Clean it
cleaned = messy.copy()
cleaned["Name"] = cleaned["Name"].str.strip().str.title()
cleaned = cleaned.drop_duplicates(subset="Name")
cleaned["Salary"] = pd.to_numeric(cleaned["Salary"], errors="coerce")
cleaned["Department"] = cleaned["Department"].fillna("Unknown")
print(cleaned)

Day 12 — Dates & Time Series for Finance (~75 min)

Learn:

  • pd.to_datetime() for converting date columns
  • .dt accessors: .dt.year, .dt.month, .dt.day, .dt.day_name()
  • Month-end dates: pd.offsets.MonthEnd(), .dt.to_period("M")
  • Grouping by month/quarter: .groupby(df["date"].dt.to_period("M"))
  • Aging buckets (how many days since an event)

Resource: pandas docs — Time series / date functionality

Practice:

import pandas as pd

# Invoice aging report
invoices = pd.DataFrame({
    "invoice_id": [101, 102, 103, 104, 105, 106],
    "client": ["Acme", "Globex", "Initech", "Acme", "Globex", "Initech"],
    "invoice_date": pd.to_datetime([
        "2025-12-01", "2025-12-15", "2026-01-05",
        "2026-01-20", "2026-02-10", "2026-02-28"
    ]),
    "amount": [5000, 12000, 3200, 8500, 15000, 4100],
    "status": ["paid", "overdue", "overdue", "paid", "pending", "overdue"],
})

# Calculate days outstanding
today = pd.Timestamp("2026-03-06")
invoices["days_outstanding"] = (today - invoices["invoice_date"]).dt.days

# Create aging buckets (standard AR aging)
bins = [0, 30, 60, 90, float("inf")]
labels = ["Current", "31-60 Days", "61-90 Days", "90+ Days"]
invoices["aging_bucket"] = pd.cut(invoices["days_outstanding"], bins=bins, labels=labels)

# Summary by aging bucket
print(invoices.groupby("aging_bucket", observed=True)["amount"].agg(["sum", "count"]))
print()

# Monthly revenue summary using .dt accessor
invoices["month"] = invoices["invoice_date"].dt.to_period("M")
print(invoices.groupby("month")["amount"].sum())

Day 13 — Merging, Joining & Reconciliation (~90 min)

Learn:

  • pd.merge() — SQL-style joins (inner, left, right, outer)
  • pd.concat() — stacking DataFrames vertically
  • Merge on single or multiple keys
  • Validating merges: indicator=True to flag matched/unmatched rows
  • Reconciliation basics: matching two data sources and reporting exceptions

Resource: pandas docs — Merge, join, concatenate

Practice:

# --- Basic merge (VLOOKUP equivalent) ---
employees = pd.DataFrame({
    "emp_id": [1, 2, 3, 4],
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "dept_id": [10, 20, 10, 30],
})

departments = pd.DataFrame({
    "dept_id": [10, 20, 30],
    "dept_name": ["Finance", "Operations", "Compliance"],
    "budget": [500000, 350000, 275000],
})

merged = pd.merge(employees, departments, on="dept_id", how="left")
print(merged)

# --- Reconciliation: match bank statement to internal ledger ---
bank_statement = pd.DataFrame({
    "ref": ["TXN-001", "TXN-002", "TXN-003", "TXN-005"],
    "bank_amount": [1500.00, 2300.00, 875.50, 4200.00],
    "bank_date": pd.to_datetime(["2026-01-05", "2026-01-06", "2026-01-07", "2026-01-09"]),
})

internal_ledger = pd.DataFrame({
    "ref": ["TXN-001", "TXN-002", "TXN-003", "TXN-004"],
    "ledger_amount": [1500.00, 2300.00, 875.50, 3100.00],
    "ledger_date": pd.to_datetime(["2026-01-05", "2026-01-06", "2026-01-07", "2026-01-08"]),
})

# Outer merge with indicator to find matches and exceptions
recon = pd.merge(bank_statement, internal_ledger, on="ref", how="outer", indicator=True)
recon.columns = ["ref", "bank_amount", "bank_date", "ledger_amount", "ledger_date", "match_status"]

print("\n--- Full Reconciliation ---")
print(recon)

# Exception report
matched = recon[recon["match_status"] == "both"]
bank_only = recon[recon["match_status"] == "left_only"]   # in bank, not in ledger
ledger_only = recon[recon["match_status"] == "right_only"]  # in ledger, not in bank

print(f"\nMatched: {len(matched)}  |  Bank only: {len(bank_only)}  |  Ledger only: {len(ledger_only)}")

Day 14 — Excel Integration & Mini Project (~90 min)

Learn:

  • Reading Excel files: pd.read_excel() with sheet_name, skiprows, usecols
  • Writing to Excel: .to_excel() with multiple sheets using ExcelWriter
  • Reading from multiple sheets in one workbook
  • Exporting to CSV: .to_csv()

Resource: Real Python — A Guide to Excel Spreadsheets in Python with openpyxl

Mini project — Financial Data Cleaner (~60 min):

Build a script that:

  1. Reads a messy CSV or Excel file (create sample data with intentional issues)
  2. Cleans column names (lowercase, remove spaces)
  3. Handles missing values and removes duplicates
  4. Standardizes date formats
  5. Prints before/after row counts as a sanity check
  6. Outputs a clean Excel file with a summary sheet

Push this to GitHub. This is your first portfolio piece.

# Starter template for the mini project
with pd.ExcelWriter("quarterly_report.xlsx") as writer:
    df.to_excel(writer, sheet_name="All Transactions", index=False)
    large_paid.to_excel(writer, sheet_name="Large Paid", index=False)
    pivot.to_excel(writer, sheet_name="Pivot Summary")

print("Report saved to quarterly_report.xlsx")
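Step 2 of the mini project (cleaning column names) can be sketched like this — the messy column names are illustrative:

```python
import pandas as pd

# Simulated messy headers: stray spaces, mixed case
df = pd.DataFrame({
    " Invoice Date ": ["2026-01-05"],
    "Client Name": ["Acme"],
    "Amount USD": [15000],
})

# Strip whitespace, lowercase, and replace spaces with underscores in one pass
df.columns = (
    df.columns.str.strip()
              .str.lower()
              .str.replace(" ", "_")
)
print(df.columns.tolist())  # ['invoice_date', 'client_name', 'amount_usd']
```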

Week 2 Milestone

You should be able to:

  • Open any CSV/Excel file in pandas and explore it
  • Filter, sort, group, and aggregate data
  • Work with dates: extract month/quarter, calculate aging, group by period
  • Merge multiple datasets and validate the merge results
  • Match two data sources and identify exceptions (reconciliation)
  • Clean messy data and export polished results
  • Explain what pandas does to a non-technical interviewer

Week 3: SQL, Databases & Automation

Goal: Learn to query databases with SQL, connect Python to SQLite, and automate one repetitive file-based task. These skills separate "I know pandas" from "I can build things."

Day 15 — SQL Fundamentals Part 1 (~75 min)

Learn:

  • What SQL is and why it matters (most finance data lives in databases)
  • SELECT, FROM, WHERE
  • ORDER BY, LIMIT
  • Comparison operators, LIKE, IN, BETWEEN
  • NULL handling: IS NULL, IS NOT NULL

Resource: SQLBolt — Lessons 1–4 (free, interactive, no setup)

Also recommended: Mode SQL Tutorial (free, finance-friendly examples)

Practice on SQLBolt — complete Lessons 1 through 4 interactively.

Day 16 — SQL Fundamentals Part 2 (~75 min)

Learn:

  • GROUP BY and aggregate functions: COUNT(), SUM(), AVG(), MIN(), MAX()
  • HAVING (filtering after grouping)
  • Aliases with AS
  • CASE WHEN — conditional logic inside queries (like IF in Excel)
  • COALESCE() — replace NULLs with a default value

Resource: SQLBolt — Lessons 5–7

Practice:

-- Find each department's total and average salary, only showing depts over $200k total
SELECT department,
       COUNT(*) AS headcount,
       SUM(salary) AS total_salary,
       ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department
HAVING SUM(salary) > 200000
ORDER BY total_salary DESC;

-- CASE WHEN: categorize transactions by size
SELECT vendor,
       amount,
       CASE
           WHEN amount >= 10000 THEN 'Large'
           WHEN amount >= 1000  THEN 'Medium'
           ELSE 'Small'
       END AS size_category
FROM transactions;

-- COALESCE: fill in missing department names
SELECT emp_id,
       name,
       COALESCE(department, 'Unassigned') AS department
FROM employees;

Day 17 — SQL Joins (~75 min)

Learn:

  • INNER JOIN — only matching rows
  • LEFT JOIN — all rows from the left table, matches from the right
  • Joining on multiple columns
  • Table aliases for readability
  • When to use each join type

Resource: SQLBolt — Lessons 10–12

Practice:

-- Join employees to departments, showing all employees even without a department
SELECT e.name,
       e.salary,
       COALESCE(d.dept_name, 'No Department') AS department
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id
ORDER BY e.salary DESC;

Day 18 — Python + SQLite Together (~90 min)

Learn:

  • sqlite3 module (built into Python — no install needed)
  • Creating a database, creating tables, inserting data
  • Running SQL queries from Python
  • Loading SQL results directly into pandas: pd.read_sql()

Practice:

import sqlite3
import pandas as pd

# Create a file-backed SQLite database
conn = sqlite3.connect("finance.db")

# Create and populate a table
conn.execute("""
    CREATE TABLE IF NOT EXISTS transactions (
        id INTEGER PRIMARY KEY,
        date TEXT,
        vendor TEXT,
        amount REAL,
        category TEXT
    )
""")

transactions = [
    ("2026-01-05", "AWS", 2400.00, "Technology"),
    ("2026-01-06", "Staples", 185.50, "Supplies"),
    ("2026-01-07", "Delta Airlines", 1250.00, "Travel"),
    ("2026-01-10", "AWS", 2400.00, "Technology"),
    ("2026-01-12", "Uber", 45.00, "Travel"),
]

conn.executemany(
    "INSERT INTO transactions (date, vendor, amount, category) VALUES (?, ?, ?, ?)",
    transactions
)
conn.commit()

# Query into pandas — use the SQL you learned this week
df = pd.read_sql("""
    SELECT category,
           SUM(amount) AS total,
           COUNT(*) AS num_transactions,
           ROUND(AVG(amount), 2) AS avg_amount
    FROM transactions
    GROUP BY category
    ORDER BY total DESC
""", conn)
print(df)

conn.close()

Day 19 — Automating File Operations (~75 min)

Learn:

  • pathlib module for file/folder operations
  • Listing files in a directory, filtering by extension
  • Renaming, moving, and organizing files programmatically
  • glob for pattern matching (*.xlsx, *_2026_*.csv)

Resource: Automate the Boring Stuff — Chapter 10

Practice:

from pathlib import Path
import shutil

# Organize files by extension (simulate a messy Downloads folder)
source = Path("messy_folder")

if not source.exists():
    print(f"Folder '{source}' not found — create it and add some test files first.")
else:
    for file in source.iterdir():
        if file.is_file():
            ext = file.suffix.lstrip(".")
            ext_folder = source / (ext if ext else "no_extension")
            ext_folder.mkdir(exist_ok=True)
            shutil.move(str(file), str(ext_folder / file.name))
            print(f"Moved {file.name} -> {ext_folder.name}/")
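The glob pattern matching from the Learn list can be sketched like this (the folder and file names are illustrative):

```python
from pathlib import Path

# Set up a sample folder with a few empty files to match against
folder = Path("reports")
folder.mkdir(exist_ok=True)
for name in ["q1_2026_sales.csv", "q2_2026_sales.csv", "notes.txt"]:
    (folder / name).touch()

# glob matches by pattern — here, only CSVs with "2026" in the name
matches = sorted(folder.glob("*_2026_*.csv"))
for f in matches:
    print(f.name)
```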

Day 20 — Report Automation & Scheduling (~75 min)

Learn:

  • Generating timestamped reports programmatically
  • datetime module for timestamps and date math
  • Combining pandas + SQLite + file output into a single pipeline
  • Overview of scheduling: cron (macOS/Linux) or Task Scheduler (Windows)

Resource: Python docs — datetime

Practice:

from datetime import datetime
import pandas as pd
import sqlite3

# Generate a daily summary report from the database
today = datetime.now()
report_name = f"daily_report_{today.strftime('%Y%m%d')}.xlsx"

# Pull data from SQLite
conn = sqlite3.connect("finance.db")
summary = pd.read_sql("""
    SELECT category,
           SUM(amount) AS total,
           COUNT(*) AS count
    FROM transactions
    GROUP BY category
""", conn)
conn.close()

# Write the summary to a timestamped Excel file
summary.to_excel(report_name, index=False)
print(f"Report generated: {report_name} at {today.strftime('%H:%M')}")
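Scheduling itself happens outside Python. As a sketch of the cron approach mentioned above (macOS/Linux) — the paths are illustrative, and you should point cron at your venv's Python so the right packages are available:

```shell
# Open your user's crontab for editing
crontab -e

# Then add a line like this: run the report every weekday at 7:00 AM
# (minute hour day-of-month month day-of-week command)
0 7 * * 1-5 /home/you/python-finance/venv/bin/python /home/you/python-finance/daily_report.py
```

On Windows, Task Scheduler's "Create Basic Task" wizard achieves the same thing with a GUI.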

Day 21 — Week 3 Mini Project: Automated Monthly Report (~90 min)

Build a script that:

  1. Reads transaction data from a CSV file (create sample data — don't depend on an API)
  2. Loads the data into a SQLite database
  3. Runs SQL queries to summarize by category and month
  4. Generates a formatted Excel report with summary statistics
  5. Timestamps the file name so it can run monthly

Keep it simple — the goal is connecting the pieces (file → database → query → report), not building something production-grade.
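For step 2, pandas can write a DataFrame straight into SQLite without manual INSERT statements. The to_sql method wasn't covered above, so here's a minimal sketch (table and data are illustrative):

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({
    "date": ["2026-01-05", "2026-01-06"],
    "category": ["Technology", "Travel"],
    "amount": [2400.00, 1250.00],
})

conn = sqlite3.connect(":memory:")  # use "finance.db" for a real file

# to_sql creates the table and inserts every row in one call
df.to_sql("transactions", conn, if_exists="replace", index=False)

summary = pd.read_sql(
    "SELECT category, SUM(amount) AS total FROM transactions GROUP BY category",
    conn,
)
print(summary)
conn.close()
```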

Push to GitHub. This is portfolio piece #2.

Week 3 Milestone

You should be able to:

  • Write SQL queries to filter, aggregate, and join data
  • Use CASE WHEN and COALESCE for conditional logic and NULL handling
  • Connect Python and SQLite together with pd.read_sql()
  • Automate file organization and repetitive tasks
  • Build a data pipeline (file → database → query → report)

Week 4: Projects, Portfolio & Job Prep

Goal: Build resume-worthy projects, polish your GitHub portfolio, and learn how to talk about your skills in interviews.

Day 22 — Data Visualization Basics (~75 min)

Learn:

  • matplotlib basics: line charts, bar charts
  • Install (if local): pip install matplotlib
  • Saving figures to files: .savefig()
  • pandas built-in plotting: df.plot()
  • Making charts look professional (titles, labels, formatting)

Resource: Kaggle — Data Visualization Course (Lessons 1–3) (free) — note that Kaggle teaches seaborn as well, but focus on matplotlib and df.plot() basics first.

Practice:

import pandas as pd
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue = [82, 91, 78, 95, 88, 102]
expenses = [70, 75, 72, 80, 77, 85]

df = pd.DataFrame({"Revenue": revenue, "Expenses": expenses}, index=months)

ax = df.plot(kind="bar", figsize=(10, 6), color=["#2563eb", "#dc2626"])
ax.set_title("Monthly Revenue vs Expenses ($K)", fontsize=14, fontweight="bold")
ax.set_ylabel("Amount ($K)")
plt.tight_layout()
plt.savefig("revenue_vs_expenses.png", dpi=150)
print("Chart saved.")

Days 23–25 — Capstone Project: Bank Reconciliation & Exception Report (~4–5 hours total)

This is your flagship portfolio project. It mirrors a real finance-ops workflow: matching two data sources, identifying discrepancies, and producing a clear exception report.

Build a tool that:

  1. Reads two files — a bank statement CSV and an internal ledger CSV (create realistic sample data with ~50–100 rows each)
  2. Matches transactions — join on reference number (or date + amount), flag matched/unmatched rows using indicator=True
  3. Identifies exceptions:
    • Transactions in the bank but not in the ledger (and vice versa)
    • Matched transactions where the amounts differ
  4. Computes aging — how many days old is each unresolved exception?
  5. Summarizes — total matched, total unmatched, dollar value of exceptions, aging breakdown
  6. Outputs a polished Excel workbook:
    • "Matched" sheet — all reconciled transactions
    • "Exceptions" sheet — unmatched and amount mismatches
    • "Summary" sheet — control totals and aging breakdown

Tips:

  • Start with the data and merge logic first. Get that working before adding the Excel output.
  • Use pd.merge(..., indicator=True) — you practiced this on Day 13.
  • Test with small data (5 rows each) before scaling up.
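For exception type 2 (matched transactions where the amounts differ), the pattern builds directly on the Day 13 merge — a sketch with illustrative data:

```python
import pandas as pd

bank = pd.DataFrame({"ref": ["TXN-001", "TXN-002"],
                     "bank_amount": [1500.00, 2300.00]})
ledger = pd.DataFrame({"ref": ["TXN-001", "TXN-002"],
                       "ledger_amount": [1500.00, 2350.00]})

recon = pd.merge(bank, ledger, on="ref", how="outer", indicator=True)

# Matched on reference number but amounts disagree — the subtlest exception
mismatches = recon[
    (recon["_merge"] == "both")
    & (recon["bank_amount"] != recon["ledger_amount"])
]
print(mismatches)  # TXN-002: 2300.00 vs 2350.00
```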

Days 26–27 — Second Project: Expense Report Automator (~3–4 hours total)

Build a tool that:

  1. Reads expense data from multiple CSV files in a folder (one file per employee or per month)
  2. Concatenates them into one DataFrame with pd.concat()
  3. Categorizes expenses using rules (keyword matching on descriptions)
  4. Validates — flags duplicates, missing fields, amounts over a threshold
  5. Summarizes — totals by category, by employee, by month
  6. Outputs a polished Excel workbook with multiple sheets and a chart
  7. Logs any issues to a separate "exceptions" sheet
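Step 3 (rule-based categorization) can be sketched with str.contains — the keywords and categories below are illustrative, and "first matching rule wins":

```python
import pandas as pd

expenses = pd.DataFrame({
    "description": ["Delta flight to NYC", "AWS monthly bill",
                    "Team lunch at Chipotle", "Hilton hotel stay"],
    "amount": [450.00, 2400.00, 62.50, 310.00],
})

# Keyword pattern -> category; patterns use | for "any of these words"
rules = {
    "flight|hotel|uber": "Travel",
    "aws|software|license": "Technology",
    "lunch|dinner|coffee": "Meals",
}

expenses["category"] = "Uncategorized"
for pattern, category in rules.items():
    # Only label rows that no earlier rule has already categorized
    mask = (expenses["description"].str.contains(pattern, case=False)
            & (expenses["category"] == "Uncategorized"))
    expenses.loc[mask, "category"] = category

print(expenses)
```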

Day 28 — GitHub Portfolio & Interview Prep (~90 min)

No new coding today. Focus on packaging your work and preparing to talk about it.

GitHub portfolio setup:

  1. Create a GitHub profile with a professional bio
  2. Pin your 3 best repositories (data cleaner, monthly report, bank reconciliation)
  3. Each repo should have:
    • A clear README explaining what it does, how to run it, and a screenshot of output
    • Clean, commented code
    • A requirements.txt file (activate your venv first, then pip freeze > requirements.txt)
    • Sample input data so someone can actually run your code

Resume bullet points you can now honestly write:

  • Built automated financial reporting and reconciliation tools in Python using pandas, reducing manual Excel processing time
  • Developed data pipelines that read source files, clean and transform data, and output formatted Excel reports with exception tracking
  • Created bank reconciliation tool matching transactions across data sources with aging analysis and exception reporting
  • Proficient in SQL for data extraction, aggregation, joins, and conditional logic
  • Experience with Python libraries: pandas, matplotlib, sqlite3, openpyxl

Common interview questions for these roles and how to answer them:

| Question | How to Answer |
| --- | --- |
| "How do you use Python in your workflow?" | Describe your capstone projects. Emphasize replacing manual Excel work with automated, repeatable scripts. |
| "What's pandas?" | "It's a Python library for working with tabular data — think of it as Excel in code but faster, more reproducible, and it handles large datasets that would slow Excel down." |
| "Can you write a SQL query?" | Walk through a GROUP BY with JOIN. Practice writing queries on paper. |
| "Tell me about a project you built." | Use the STAR method: Situation (manual process), Task (automate it), Action (what you built), Result (time saved, errors reduced). |
| "How comfortable are you with large datasets?" | "pandas handles datasets much larger than Excel can manage comfortably. I've worked with CSVs and databases, and I know how to filter and aggregate efficiently so I'm not loading everything into memory at once." |

Practice resources for interview SQL: DataLemur and HackerRank — see the Resource Summary table below for links.


Daily Schedule Template

| Time Block | Activity |
| --- | --- |
| 0:00–0:15 | Review yesterday's code. Re-type one exercise from memory. |
| 0:15–0:45 | Learn today's new concept (read/watch resource). |
| 0:45–1:15 | Hands-on practice — type the code yourself, modify it, break it, fix it. |
| 1:15–1:30 | Write a 2-sentence summary of what you learned (in a learning_log.md file). |

Resource Summary

Most resources below are free. Where noted, some content is behind a paywall or requires an account.

| Resource | What It Covers | Used In | Cost | Link |
| --- | --- | --- | --- | --- |
| Automate the Boring Stuff (2nd ed.) | Python fundamentals, files, automation | Weeks 1, 3 | Free | automatetheboringstuff.com |
| Python Official Tutorial | Authoritative Python reference | All weeks | Free | docs.python.org/3/tutorial |
| Real Python | In-depth articles on any Python topic | Weeks 2, 3 | Freemium (some articles free, full access paid) | realpython.com |
| Google Colab | Free browser-based Python environment | Weeks 1–2 | Free | colab.research.google.com |
| Kaggle — pandas Course | pandas fundamentals, hands-on | Week 2 | Free (account required) | kaggle.com/learn/pandas |
| Kaggle — Data Cleaning | Handling messy real-world data | Week 2 | Free (account required) | kaggle.com/learn/data-cleaning |
| Kaggle — Data Visualization | matplotlib and seaborn | Week 4 | Free (account required) | kaggle.com/learn/data-visualization |
| SQLBolt | Interactive SQL fundamentals | Week 3 | Free | sqlbolt.com |
| Mode SQL Tutorial | SQL with analytics focus | Week 3 | Free | mode.com/sql-tutorial |
| Corey Schafer YouTube | Python and pandas video tutorials | Weeks 1–3 | Free | youtube.com/c/Coreyms |
| DataLemur | SQL interview practice | Week 4 | Freemium (free tier available) | datalemur.com |
| HackerRank — SQL | SQL practice problems | Week 4 | Free (account required) | hackerrank.com/domains/sql |
| GitHub Skills | Git and GitHub basics | Week 1 | Free | github.com/skills |

What You'll Have After 4 Weeks

Skills:

  • Python scripting and automation
  • pandas for data analysis (filtering, grouping, merging, cleaning, reconciliation, exporting)
  • Dates and aging analysis for finance reporting
  • SQL for querying databases (including joins, CASE WHEN, COALESCE)
  • File automation and report generation
  • Basic data visualization
  • Git and GitHub for version control

Portfolio (3 pinned GitHub projects):

  1. Financial Data Cleaner — reads messy files, standardizes and cleans data, exports polished output
  2. Automated Monthly Report — reads source data, stores in SQLite, generates timestamped Excel reports
  3. Bank Reconciliation Tool — matches transactions across two data sources, identifies exceptions, produces aging analysis

Resume additions:

  • Python (pandas, matplotlib, sqlite3, openpyxl)
  • SQL (SQLite, joins, aggregations, conditional logic)
  • Data automation, reconciliation, and pipeline development
  • GitHub portfolio with working code samples

Extension Topics (After the 4 Weeks)

In priority order:

  1. Keep building. Automate something real at your current job (even if they didn't ask for it). Nothing is more compelling in an interview than "I built this to solve an actual problem."
  2. Practice SQL on DataLemur — 15 minutes a day. Most finance Python roles will test SQL.
  3. Working with APIs — learn the requests library to pull data from web APIs. A good free practice API (exchange rates):
    import requests
    
    response = requests.get("https://open.er-api.com/v6/latest/USD")
    response.raise_for_status()  # Raise an error if the request failed
    data = response.json()
    
    import pandas as pd
    rates = pd.Series(data["rates"], name="rate_vs_usd")
    print(rates[["EUR", "GBP", "JPY", "CAD", "CHF"]])
    
  4. Learn Jupyter Notebooks — many teams use them for analysis. Install with pip install notebook and run with jupyter notebook.
  5. Explore more pandas — time series operations (.resample(), .rolling()), pd.date_range(), and working with financial date indices.
  6. Basic web scraping — BeautifulSoup for pulling data from websites (useful, but don't lead with this in interviews).

You don't need to know everything. You need to know enough to be useful on day one — and you will be.