Building a Fundamentals-Driven Stock Screener
Most screeners are glorified spreadsheet filters. P/E < 15, yield > 3%, done. But cheap stocks are often cheap for good reason.
This project started after reading Double Vision on share class arbitrage: Samsung prefs at a 24% discount to the common, TSMC ADRs at a premium to the local shares. Structural inefficiencies from index rules, passive flows, liquidity preferences. If these exist in share class arb, they exist in fundamental valuation too: identical quality metrics, wildly different multiples, driven by sector misclassification, analyst neglect, headline risk.
Stack: Python 3.11, TimescaleDB, Polygon.io.
Why existing screeners fail
- Value traps: Low P/E stays low because the business is deteriorating
- Static snapshots: No trajectory. ROE improving or declining?
- Missing context: A 5% yield means nothing if the company is diluting shares 10% annually
- No confidence weighting: All metrics treated equally regardless of data quality
Architecture
Polygon.io → Rate Limiter → Ingestion → TimescaleDB
↓
DCF / Quality / Catalyst modules
↓
Composite score + veto flags
↓
Top 30 ranked
TimescaleDB
Hypertables partition by time automatically. daily_metrics stores OHLCV + fundamentals for every ticker/day. Primary key (ticker, date) for efficient range queries.
SELECT create_hypertable('daily_metrics', 'date', chunk_time_interval => INTERVAL '1 month');
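For context, a simplified sketch of the table it partitions; the column list here is illustrative, and the real table carries roughly 20 fundamental fields:

-- Simplified sketch of daily_metrics; columns are illustrative
CREATE TABLE daily_metrics (
    ticker   TEXT NOT NULL,
    date     DATE NOT NULL,
    open     NUMERIC,
    high     NUMERIC,
    low      NUMERIC,
    close    NUMERIC,
    volume   BIGINT,
    -- ...fundamental fields (eps, fcf, roe, ...) omitted
    PRIMARY KEY (ticker, date)
);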
Rate limiting
Polygon allows 1000 req/min. Token bucket instead of naive sleep:
import asyncio
import time

class RateLimiter:
    def __init__(self, max_tokens=1000):
        self.max_tokens = float(max_tokens)
        self.tokens = float(max_tokens)
        self.refill_rate = max_tokens / 60.0  # tokens per second
        self.last_refill = time.monotonic()
        self._lock = asyncio.Lock()

    async def acquire(self, n=1):
        async with self._lock:
            # Refill for the time elapsed since the last call
            now = time.monotonic()
            self.tokens = min(self.max_tokens,
                              self.tokens + (now - self.last_refill) * self.refill_rate)
            self.last_refill = now
            if self.tokens >= n:
                self.tokens -= n
                return
            # Sleep exactly long enough for the deficit to refill, then consume it
            await asyncio.sleep((n - self.tokens) / self.refill_rate)
            self.tokens = 0.0
            self.last_refill = time.monotonic()
Burst up to 1000, refill at 16.67/sec. Calculates exact wait time instead of polling.
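A sketch of how the ingestion side can sit behind it, using aiohttp; the endpoint path and the fetch_daily_open_close helper are illustrative, not the actual ingestion code:

import aiohttp

async def fetch_daily_open_close(session: aiohttp.ClientSession, limiter: RateLimiter,
                                 ticker: str, day: str, api_key: str) -> dict:
    # One token per request: acquire() returns only once the bucket allows it
    await limiter.acquire()
    url = f"https://api.polygon.io/v1/open-close/{ticker}/{day}"  # illustrative endpoint
    async with session.get(url, params={"apiKey": api_key}) as resp:
        resp.raise_for_status()
        return await resp.json()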
DCF with confidence intervals
DCF is sensitive: a small change in the growth rate can swing the output by 50%. So every valuation gets a confidence score:
from dataclasses import dataclass
from typing import Optional

@dataclass
class IntrinsicValueResult:
    dcf_value: Optional[float]
    dcf_confidence: float        # 0-1
    relative_value: Optional[float]
    relative_confidence: float   # 0-1
    margin_of_safety: Optional[float]
Confidence penalized for:
- High FCF volatility
- <3 years history
- Growth >20% projected indefinitely
- Negative FCF in 2+ years
- Terminal value >90% of total
Final value = confidence-weighted avg of DCF and relative (sector median P/E × EPS).
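A minimal sketch of both steps; the penalty multipliers and the volatility cutoff are illustrative, not the screener's actual weights:

import statistics
from typing import Optional, Sequence

def dcf_confidence(fcf_history: Sequence[float], growth_rate: float,
                   terminal_share: float) -> float:
    # Start at 1.0 and apply multiplicative penalties (weights are illustrative)
    conf = 1.0
    if len(fcf_history) < 3:
        conf *= 0.6                      # <3 years of history
    if sum(1 for f in fcf_history if f < 0) >= 2:
        conf *= 0.5                      # negative FCF in 2+ years
    if growth_rate > 0.20:
        conf *= 0.7                      # >20% growth projected
    if terminal_share > 0.90:
        conf *= 0.6                      # terminal value dominates the DCF
    if len(fcf_history) >= 2:
        mean = statistics.mean(fcf_history)
        if statistics.pstdev(fcf_history) > abs(mean):   # high FCF volatility
            conf *= 0.7
    return conf

def blended_value(res: IntrinsicValueResult) -> Optional[float]:
    # Confidence-weighted average of DCF and relative value;
    # falls back to whichever leg is available
    parts = [(v, c) for v, c in ((res.dcf_value, res.dcf_confidence),
                                 (res.relative_value, res.relative_confidence))
             if v is not None and c > 0]
    if not parts:
        return None
    return sum(v * c for v, c in parts) / sum(c for _, c in parts)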
Scoring
Four factors, combined as a weighted sum (sketched below):
- Value (35%): P/E, P/B, EV/EBITDA vs sector peers
- Quality (30%): ROE, margins, consistency
- Capital allocation (20%): Buybacks, dividends, debt paydown, insider buying
- Catalysts (15%): Earnings beats, insider clusters
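In code, the composite is just a weighted sum of the four sub-scores; this sketch assumes each factor module already returns a 0-100 score:

WEIGHTS = {"value": 0.35, "quality": 0.30, "capital_allocation": 0.20, "catalysts": 0.15}

def composite_score(factor_scores: dict[str, float]) -> float:
    # Each sub-score is assumed to be on a 0-100 scale; missing factors count as 0
    return sum(w * factor_scores.get(name, 0.0) for name, w in WEIGHTS.items())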
But the score alone isn't enough. Veto thresholds filter stocks out regardless of how well they rank:
@dataclass
class Thresholds:
min_quality_score: float = 60.0
min_dollar_volume: float = 1_000_000
max_debt_to_equity: float = 3.0
min_momentum_score: float = 20.0 # no falling knives
max_share_dilution_3y: float = 0.25
max_net_debt_to_ebitda: float = 4.0
Stocks accumulate value_trap_flags: "dilution", "negative_momentum", "high_leverage". These often matter more than the composite score.
Momentum is a veto, not a factor: the bottom quintile is excluded (no falling knives), but strong momentum earns no extra points (not chasing high-flyers).
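A sketch of the veto pass; the attribute names on stock are illustrative, while the limits come from the Thresholds dataclass above:

def value_trap_flags(stock, t: Thresholds) -> list[str]:
    # Accumulate value-trap flags; stocks that trip a veto drop out of the ranking
    flags = []
    if stock.quality_score < t.min_quality_score:
        flags.append("low_quality")
    if stock.avg_dollar_volume < t.min_dollar_volume:
        flags.append("liquidity")
    if stock.debt_to_equity > t.max_debt_to_equity or \
       stock.net_debt_to_ebitda > t.max_net_debt_to_ebitda:
        flags.append("high_leverage")
    if stock.momentum_score < t.min_momentum_score:
        flags.append("negative_momentum")    # bottom quintile: no falling knives
    if stock.share_dilution_3y > t.max_share_dilution_3y:
        flags.append("dilution")
    return flags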
Data model
- tickers: metadata, sector, shares outstanding
- daily_metrics: hypertable, OHLCV + 20 fundamental fields
- insider_transactions: Form 4, filtered to >$100k purchases
- dividends: payment history, ex-dates, specials
- buyback_programs: authorized amounts
Insider clusters (multiple execs buying in short window) weighted higher than single purchases.
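A sketch of how the cluster weighting can work; the 30-day window, the three-insider cutoff, and the 2x weight are illustrative choices, not the screener's actual parameters:

from datetime import date, timedelta

def insider_buy_weight(purchases: list[tuple[date, str]], window_days: int = 30) -> float:
    # purchases: (filing_date, insider_name) pairs for >$100k open-market buys
    if not purchases:
        return 0.0
    latest = max(d for d, _ in purchases)
    recent = {name for d, name in purchases if latest - d <= timedelta(days=window_days)}
    # A cluster (several executives buying within the window) scores double
    return 2.0 if len(recent) >= 3 else 1.0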
Lessons
- Confidence > precision. Surface uncertainty explicitly. Don't pretend you know the growth rate to 2 decimal places.
- Token bucket is reusable infra. Same pattern for API clients, DB pools, queues.
- TimescaleDB worth it for financial data. Hypertables, auto-partitioning, time-aware query planning.
- Veto logic > scoring. A stock failing on dilution + negative_momentum is different from one failing on liquidity.