Building a Fundamentals-Driven Stock Screener
Most screeners are glorified spreadsheet filters. P/E < 15, yield > 3%, done. But cheap stocks are often cheap for good reason.
This project started after reading Double Vision on share class arbitrage: Samsung prefs at a 24% discount to the common, TSMC ADRs at a premium to the local shares. Structural inefficiencies from index rules, passive flows, liquidity preferences. If these exist in share class arb, they exist in fundamental valuation too: identical quality metrics, wildly different multiples, driven by sector misclassification, analyst neglect, headline risk.
Stack: Python 3.11, TimescaleDB, Polygon.io.
Why existing screeners fail
- Value traps: Low P/E stays low because the business is deteriorating
- Static snapshots: No trajectory. ROE improving or declining?
- Missing context: A 5% yield means nothing if the company is diluting shares 10% annually
- No confidence weighting: All metrics treated equally regardless of data quality
Architecture
Polygon.io → Rate Limiter → Ingestion → TimescaleDB
↓
DCF / Quality / Catalyst modules
↓
Composite score + veto flags
↓
Top 30 ranked
TimescaleDB
Hypertables partition by time automatically. daily_metrics stores OHLCV + fundamentals for every ticker/day. Primary key (ticker, date) for efficient range queries.
SELECT create_hypertable('daily_metrics', 'date', chunk_time_interval => INTERVAL '1 month');
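For context, a simplified sketch of the table it partitions; the column list here is illustrative, and the real table carries roughly 20 fundamental fields:

-- Simplified sketch of daily_metrics; columns are illustrative
CREATE TABLE daily_metrics (
    ticker   TEXT NOT NULL,
    date     DATE NOT NULL,
    open     NUMERIC,
    high     NUMERIC,
    low      NUMERIC,
    close    NUMERIC,
    volume   BIGINT,
    -- ...fundamental fields (eps, fcf, roe, ...) omitted
    PRIMARY KEY (ticker, date)
);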
Rate limiting
Polygon allows 1000 req/min. Token bucket instead of naive sleep:
import asyncio
import time

class RateLimiter:
    def __init__(self, max_tokens=1000):
        self.max_tokens = float(max_tokens)
        self.tokens = float(max_tokens)
        self.refill_rate = max_tokens / 60.0  # tokens per second
        self.last_refill = time.monotonic()
        self._lock = asyncio.Lock()

    async def acquire(self, n=1):
        async with self._lock:
            # Refill for the time elapsed since the last call
            now = time.monotonic()
            self.tokens = min(self.max_tokens,
                              self.tokens + (now - self.last_refill) * self.refill_rate)
            self.last_refill = now
            if self.tokens >= n:
                self.tokens -= n
                return
            # Sleep exactly long enough for the deficit to refill, then consume it
            await asyncio.sleep((n - self.tokens) / self.refill_rate)
            self.tokens = 0.0
            self.last_refill = time.monotonic()
Burst up to 1000, refill at 16.67/sec. Calculates exact wait time instead of polling.
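A sketch of how the ingestion side can sit behind it, using aiohttp; the endpoint path and the fetch_daily_open_close helper are illustrative, not the actual ingestion code:

import aiohttp

async def fetch_daily_open_close(session: aiohttp.ClientSession, limiter: RateLimiter,
                                 ticker: str, day: str, api_key: str) -> dict:
    # One token per request: acquire() returns only once the bucket allows it
    await limiter.acquire()
    url = f"https://api.polygon.io/v1/open-close/{ticker}/{day}"  # illustrative endpoint
    async with session.get(url, params={"apiKey": api_key}) as resp:
        resp.raise_for_status()
        return await resp.json()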
DCF with confidence intervals
DCF is sensitive: a small change in the growth rate can swing the output by 50%. So every valuation gets a confidence score:
from dataclasses import dataclass
from typing import Optional

@dataclass
class IntrinsicValueResult:
    dcf_value: Optional[float]
    dcf_confidence: float        # 0-1
    relative_value: Optional[float]
    relative_confidence: float   # 0-1
    margin_of_safety: Optional[float]
Confidence penalized for:
- High FCF volatility
- <3 years history
- Growth >20% projected indefinitely
- Negative FCF in 2+ years
- Terminal value >90% of total
Final value = confidence-weighted avg of DCF and relative (sector median P/E × EPS).
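A minimal sketch of both steps; the penalty multipliers and the volatility cutoff are illustrative, not the screener's actual weights:

import statistics
from typing import Optional, Sequence

def dcf_confidence(fcf_history: Sequence[float], growth_rate: float,
                   terminal_share: float) -> float:
    # Start at 1.0 and apply multiplicative penalties (weights are illustrative)
    conf = 1.0
    if len(fcf_history) < 3:
        conf *= 0.6                      # <3 years of history
    if sum(1 for f in fcf_history if f < 0) >= 2:
        conf *= 0.5                      # negative FCF in 2+ years
    if growth_rate > 0.20:
        conf *= 0.7                      # >20% growth projected
    if terminal_share > 0.90:
        conf *= 0.6                      # terminal value dominates the DCF
    if len(fcf_history) >= 2:
        mean = statistics.mean(fcf_history)
        if statistics.pstdev(fcf_history) > abs(mean):   # high FCF volatility
            conf *= 0.7
    return conf

def blended_value(res: IntrinsicValueResult) -> Optional[float]:
    # Confidence-weighted average of DCF and relative value;
    # falls back to whichever leg is available
    parts = [(v, c) for v, c in ((res.dcf_value, res.dcf_confidence),
                                 (res.relative_value, res.relative_confidence))
             if v is not None and c > 0]
    if not parts:
        return None
    return sum(v * c for v, c in parts) / sum(c for _, c in parts)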
Scoring
Four factors, combined as a weighted sum (sketched below):
- Value (35%): P/E, P/B, EV/EBITDA vs sector peers
- Quality (30%): ROE, margins, consistency
- Capital allocation (20%): Buybacks, dividends, debt paydown, insider buying
- Catalysts (15%): Earnings beats, insider clusters
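In code, the composite is just a weighted sum of the four sub-scores; this sketch assumes each factor module already returns a 0-100 score:

WEIGHTS = {"value": 0.35, "quality": 0.30, "capital_allocation": 0.20, "catalysts": 0.15}

def composite_score(factor_scores: dict[str, float]) -> float:
    # Each sub-score is assumed to be on a 0-100 scale; missing factors count as 0
    return sum(w * factor_scores.get(name, 0.0) for name, w in WEIGHTS.items())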
But the score alone isn't enough. Veto thresholds filter stocks out regardless of how well they rank:
@dataclass
class Thresholds:
min_quality_score: float = 60.0
min_dollar_volume: float = 1_000_000
max_debt_to_equity: float = 3.0
min_momentum_score: float = 20.0 # no falling knives
max_share_dilution_3y: float = 0.25
max_net_debt_to_ebitda: float = 4.0
Stocks accumulate value_trap_flags: "dilution", "negative_momentum", "high_leverage". These often matter more than the composite score.
Momentum is a veto, not a factor: the bottom quintile is excluded (no falling knives), but strong momentum earns no extra points (not chasing high-flyers).
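A sketch of the veto pass; the attribute names on stock are illustrative, while the limits come from the Thresholds dataclass above:

def value_trap_flags(stock, t: Thresholds) -> list[str]:
    # Accumulate value-trap flags; stocks that trip a veto drop out of the ranking
    flags = []
    if stock.quality_score < t.min_quality_score:
        flags.append("low_quality")
    if stock.avg_dollar_volume < t.min_dollar_volume:
        flags.append("liquidity")
    if stock.debt_to_equity > t.max_debt_to_equity or \
       stock.net_debt_to_ebitda > t.max_net_debt_to_ebitda:
        flags.append("high_leverage")
    if stock.momentum_score < t.min_momentum_score:
        flags.append("negative_momentum")    # bottom quintile: no falling knives
    if stock.share_dilution_3y > t.max_share_dilution_3y:
        flags.append("dilution")
    return flags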
Data model
- tickers: metadata, sector, shares outstanding
- daily_metrics: hypertable, OHLCV + 20 fundamental fields
- insider_transactions: Form 4, filtered to >$100k purchases
- dividends: payment history, ex-dates, specials
- buyback_programs: authorized amounts
Insider clusters (multiple execs buying in short window) weighted higher than single purchases.
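A sketch of how the cluster weighting can work; the 30-day window, the three-insider cutoff, and the 2x weight are illustrative choices, not the screener's actual parameters:

from datetime import date, timedelta

def insider_buy_weight(purchases: list[tuple[date, str]], window_days: int = 30) -> float:
    # purchases: (filing_date, insider_name) pairs for >$100k open-market buys
    if not purchases:
        return 0.0
    latest = max(d for d, _ in purchases)
    recent = {name for d, name in purchases if latest - d <= timedelta(days=window_days)}
    # A cluster (several executives buying within the window) scores double
    return 2.0 if len(recent) >= 3 else 1.0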
Lessons
- Confidence > precision. Surface uncertainty explicitly. Don't pretend you know the growth rate to 2 decimal places.
- Token bucket is reusable infra. Same pattern for API clients, DB pools, queues.
- TimescaleDB worth it for financial data. Hypertables, auto-partitioning, time-aware query planning.
- Veto logic > scoring. A stock failing on dilution + negative_momentum is different from one failing on liquidity.