Production specification: Quarto revealJS seminar deck on corporate filings

Working title

Corporate Filings as a Research Data Source

Suggested subtitle:

From research ideas to EDGAR, SeekEdgar, and information extraction

Purpose of this file

This is a detailed production brief for creating a static Quarto revealJS seminar deck. It is intended for an LLM or human slide producer who will build the actual .qmd presentation.

The first attempt was too generic and visually underdeveloped. The revised deck must be more engaging, more concrete, and more useful for research students across a business school.

Core objective

Create a 40-minute seminar-style Quarto revealJS deck introducing corporate filings as a broad research data source for business school research students.

The talk should:

  1. Inspire students to use corporate filings in research.
  2. Demystify what EDGAR filings are.
  3. Go beyond common forms such as 10-K, 10-Q, 8-K, and 13F.
  4. Show how filings can support research in finance, accounting, management, strategy, innovation, information systems, entrepreneurship, marketing, and political economy.
  5. Explain practical access through EDGAR, SEC search tools, SEC APIs, and SeekEdgar.
  6. Include a light to moderate Python example.
  7. Use the presenter’s old website note on SEC textual analysis as a teaching anchor.
  8. Include the presenter’s Journal of Corporate Finance loan-contract example as a motivating case study.
  9. Be static for now. A live demo will be developed later.

Audience

Assume a balanced mix of business school research students:

  • Finance and accounting students may know some common filings.
  • Management, marketing, entrepreneurship, information systems, and international business students may know very little about filings.
  • Many students may be empirical researchers but not specialized in EDGAR.
  • Technical comfort is mixed. Some will code, some will not.

The deck should be accessible but not superficial. It should avoid long legal descriptions and avoid turning into a scraping workshop.

Speaker and tone

The speaker is a finance academic introducing a valuable data source to research students.

Tone:

  • Academic seminar style.
  • Practical and motivating.
  • Conceptual first, technical second.
  • Clear, precise, and research-oriented.
  • Avoid sales language.
  • Avoid sounding like a compliance training session.
  • Avoid overexplaining basic finance terms to finance students, but do not assume non-finance students know filing form codes.

Required output

Create the following deliverables:

  1. corporate-filings.qmd
    • Quarto revealJS source.
    • Should render cleanly with default revealJS settings or the user’s website default theme.
    • Do not over-style.
    • Use callouts, columns, fragments, simple diagrams, and charts.
  2. assets/
    • Local images and charts.
    • Charts should be generated from embedded source data where possible.
    • Avoid remote image dependencies.
  3. README.md
    • Include render instructions:

      quarto render corporate-filings.qmd

    • Include required packages if using Python chunks.
    • Include notes about optional dependencies.

  4. Optional:
    • references.bib, if using formal citations.
    • assets/chart_data.csv, if easier for reproducibility.

Quarto implementation requirements

Use Quarto revealJS.

Suggested YAML:

---
title: "Corporate Filings as a Research Data Source"
subtitle: "From research ideas to EDGAR, SeekEdgar, and information extraction"
author: "Adrian Gao"
date: today
format:
  revealjs:
    slide-number: true
    chalkboard: true
    preview-links: auto
    incremental: false
    code-overflow: wrap
    html-math-method: katex
execute:
  echo: false
  warning: false
  message: false
---

Avoid heavy custom CSS unless needed. The user’s website already enables default theming.

Use revealJS features:

  • Section divider slides.
  • Columns.
  • Fragments for staged points.
  • Callouts:
    • callout-note for key concepts.
    • callout-tip for practical advice.
    • callout-warning for pitfalls.
    • callout-important for takeaways.
  • Small code blocks only in the practical section or appendix.
  • Speaker notes are optional but useful.

Do not use long walls of text. For a seminar deck, many slides should have one strong message and one visual.

Content philosophy

The talk should not be a catalog of filing codes.

The deck should repeatedly connect:

Research question → filing type → information extracted → empirical design

For example:

Do lenders shape acquirer behavior? → loan agreement exhibits → acquisition covenants and collateral clauses → original contract-level dataset linked to M&A outcomes.

This framing is more useful than saying “Form 10-K contains Item 1, Item 1A, Item 7, etc.”

Overall narrative

The talk should follow this arc:

  1. Hook: Corporate filings are a public research infrastructure.
  2. Map: What the filing ecosystem contains.
  3. Examples: What kinds of research ideas filings enable.
  4. Workflow: How to move from idea to data.
  5. Tools: EDGAR, SEC APIs, SeekEdgar, Python.
  6. Pitfalls: How not to misuse filings.
  7. Activation: How students can start this month.

The central message:

Corporate filings are not just a finance dataset. They are a public, legally consequential, time-stamped archive of firm behavior.

Timing target

Total: 40 minutes.

Suggested timing:

| Section | Time |
|---|---|
| Opening and motivation | 5 minutes |
| Filing ecosystem | 8 minutes |
| Research examples | 12 minutes |
| Workflow and tools | 10 minutes |
| Pitfalls and takeaway | 5 minutes |

Target deck length:

  • Main slides: 26 to 30.
  • Appendix slides: 8 to 12.
  • Total source slides can be 34 to 42, but only 26 to 30 should be main-track slides.

Must-use factual anchors

Use these verified facts and sources.

SEC access and APIs

The SEC states that submissions by company and extracted XBRL data are available via RESTful APIs on data.sec.gov, with JSON formatted data.

Sources:

  • SEC Developer Resources: https://www.sec.gov/about/developer-resources
  • SEC EDGAR APIs: https://www.sec.gov/search-filings/edgar-application-programming-interfaces
  • SEC accessing EDGAR data: https://www.sec.gov/search-filings/edgar-search-assistance/accessing-edgar-data

SEC filing volume and search volume

Use this as a visually engaging statistic slide.

From the SEC FY 2025 Congressional Budget Justification, workload data for the EDGAR Business Office:

| Metric | FY 2023 actual | FY 2024 estimate | FY 2025 request |
|---|---|---|---|
| Online searches for EDGAR filings, in millions | 21,730 | 22,817 | 23,957 |
| Number of electronic filings received, in millions | 2.43 | 2.50 | 2.58 |

Interpretation:

  • 21,730 million searches = 21.73 billion searches.
  • 23,957 million searches = 23.957 billion searches.
  • Electronic filings received are around 2.4 to 2.6 million per year in these figures.

Source:

  • SEC FY 2025 Congressional Budget Justification, EDGAR Business Office workload data, page 77 in the PDF: https://www.sec.gov/files/fy-2025-congressional-budget-justification.pdf
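The unit conversion in the interpretation can be checked with a few lines of Python. This is a minimal sketch: the variable and function names are illustrative, and the figures are copied from the workload table. Only FY 2023 is an actual figure, so the growth rates compare an actual with a budget request.

```python
# Workload figures copied from the SEC table (searches are reported in millions).
searches_millions = {"FY 2023": 21_730, "FY 2024": 22_817, "FY 2025": 23_957}
filings_millions = {"FY 2023": 2.43, "FY 2024": 2.50, "FY 2025": 2.58}

# Convert searches from millions to billions for slide labels.
searches_billions = {fy: value / 1_000 for fy, value in searches_millions.items()}


def pct_change(first: float, last: float) -> float:
    """Percentage change from the first to the last value."""
    return (last / first - 1) * 100


# FY 2023 is an actual; FY 2025 is a budget request, so treat growth as indicative.
search_growth = pct_change(searches_millions["FY 2023"], searches_millions["FY 2025"])
filing_growth = pct_change(filings_millions["FY 2023"], filings_millions["FY 2025"])

for fy, value in searches_billions.items():
    print(f"{fy}: {value:.3f} billion searches")
print(f"Search growth, FY 2023 to FY 2025: {search_growth:.1f}%")
print(f"Filing growth, FY 2023 to FY 2025: {filing_growth:.1f}%")
```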

SeekEdgar

SeekEdgar describes tools for searching SEC and Indian company filings across full documents, tables, footnotes, MD&A, SOX 404 reports, audit reports, and CD&A. It also emphasizes no-coding search workflows and Excel output.

Sources:

  • SeekEdgar home: https://www.seekedgar.com/
  • SeekEdgar user guide or related pages, if accessible: https://www.seekedgar.com/fraank.html

Presenter’s old website note

Use the old note as the “you can do this” anchor. It introduces textual analysis on SEC filings, including building a filing index, downloading filings, and extracting information.

Source: https://mingze-gao.com/posts/textual-analysis-on-sec-filings/

Presenter’s JCF paper

Use this as a light case study. Do not turn the presentation into a full paper presentation.

Paper:

  • Gao, Mingze, Thanh Son Luong, and Buhui Qiu. 2026. “Real Estate Collateral, Lender Screening, and M&A Performance.” Journal of Corporate Finance 98: 102962.
  • DOI: https://doi.org/10.1016/j.jcorpfin.2026.102962

Key message for this talk:

  • The study uses hand-collected loan agreements linked to M&A deals.
  • It shows how contractual details in filings can support original data construction.
  • The empirical point is not the main focus of the talk. The main focus is that filings and exhibits can reveal contract terms that are otherwise difficult to observe.

Sources:

  • ScienceDirect: https://www.sciencedirect.com/science/article/pii/S0929119926000209
  • Macquarie researcher page: https://researchers.mq.edu.au/en/publications/real-estate-collateral-lender-screening-and-mampa-performance/
  • Presenter research page: https://mingze-gao.com/research/

Important writing constraints

  • Do not use em dashes.
  • Use precise short sentences.
  • Avoid filler.
  • Avoid saying “filings are boring but useful.” Instead, show that they are rich.
  • Avoid calling EDGAR a “database of annual reports.” It is much broader.
  • Avoid overusing 10-K, 10-Q, 8-K, and 13F.
  • Use the term “corporate filings” broadly, but clarify that EDGAR is a U.S. SEC system.
  • When discussing SeekEdgar, do not oversell. Present it as an exploratory and no-code complement to EDGAR and Python.

Visual design requirements

Make the deck engaging.

Required visual elements:

  1. Big-number slide
    • Use SEC workload data.
    • Show electronic filings received and EDGAR searches.
    • Recommended visual:
      • Left: line/bar chart of electronic filings received, FY 2023 to FY 2025.
      • Right: line/bar chart of EDGAR searches, FY 2023 to FY 2025.
    • Keep charts simple and readable.
  2. Filing ecosystem map
    • Use a 2D map or grouped cards.
    • Group by research use-case:
      • Periodic reporting
      • Event disclosure
      • Ownership and trading
      • Governance and shareholder process
      • Capital raising
      • Deals and restructuring
      • Exhibits and contracts
  3. Research idea matrix
    • Rows: disciplines.
    • Columns: filing objects.
    • Each cell gives one example.
    • Keep it visually compact.
  4. From question to dataset workflow
    • Simple pipeline:
      • Research question
      • Filing family
      • Manual inspection
      • Extraction rule
      • Validation
      • Dataset
      • Empirical design
  5. Data layers diagram
    • Corporate filing as layered object:
      • Metadata
      • Form items
      • Tables
      • XBRL facts
      • Text narratives
      • Exhibits
      • Signatures and dates
  6. Loan-contract case-study graphic
    • Show a pipeline:
      • Loan agreement exhibit
      • Collateral and covenant clauses
      • Hand-collected variables
      • Link to M&A outcomes
    • Use this to show “filings enable original data construction.”
  7. Extraction ladder
    • Manual coding
    • Keyword search
    • Regex or rule-based parsing
    • HTML parsing
    • XBRL extraction
    • NLP or LLM-assisted coding
    • Human validation
  8. Pitfall grid
    • Common error on left.
    • How to avoid it on right.

Optional visual elements:

  • Screenshot-like mockups of EDGAR search pages.
  • A mock search query box with “AI”, “cybersecurity”, “credit agreement”, “customer concentration”.
  • A filing lifecycle timeline.
  • A “hidden goldmine” diagram showing main filing versus exhibit attachments.

Do not use stock-photo clutter. Prefer diagrams, charts, and well-designed text blocks.

Embedded chart data

Use the following data for the big-number charts.

edgar_stats = [
    {"fiscal_year": "FY 2023", "electronic_filings_millions": 2.43, "online_searches_billions": 21.730},
    {"fiscal_year": "FY 2024", "electronic_filings_millions": 2.50, "online_searches_billions": 22.817},
    {"fiscal_year": "FY 2025", "electronic_filings_millions": 2.58, "online_searches_billions": 23.957},
]

Chart instruction:

  • Do not imply these are all corporate operating-company filings. They are SEC EDGAR workload data.
  • Label clearly:
    • “Electronic filings received, millions”
    • “Online searches for EDGAR filings, billions”
  • Source note:
    • “Source: SEC FY 2025 Congressional Budget Justification, EDGAR Business Office workload data.”
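One way to keep the charts reproducible is to serialize the embedded data and derive one series per panel from it. The sketch below assumes only the standard library. The function name `stats_to_csv` and the `series` structure are illustrative choices, not requirements of the brief.

```python
import csv
import io

# Embedded source data, copied from the brief.
edgar_stats = [
    {"fiscal_year": "FY 2023", "electronic_filings_millions": 2.43, "online_searches_billions": 21.730},
    {"fiscal_year": "FY 2024", "electronic_filings_millions": 2.50, "online_searches_billions": 22.817},
    {"fiscal_year": "FY 2025", "electronic_filings_millions": 2.58, "online_searches_billions": 23.957},
]

# Axis labels required by the chart instruction.
axis_labels = {
    "electronic_filings_millions": "Electronic filings received, millions",
    "online_searches_billions": "Online searches for EDGAR filings, billions",
}


def stats_to_csv(rows: list[dict]) -> str:
    """Serialize the chart data to CSV text with a stable column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# The text can be saved as assets/chart_data.csv by the producer.
csv_text = stats_to_csv(edgar_stats)
print(csv_text)

# One list of (x, y) points per chart panel, keyed by its axis label.
series = {
    label: [(row["fiscal_year"], row[column]) for row in edgar_stats]
    for column, label in axis_labels.items()
}
```

Each entry in `series` can be passed to any plotting library. Keeping the labels next to the data makes it harder to mislabel millions as billions.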

Slide-by-slide specification

Slide 1. Title

Title: Corporate Filings as a Research Data Source

Subtitle: From research ideas to EDGAR, SeekEdgar, and information extraction

Visual: - Minimal seminar title slide. - Optional subtle background made from small filing-form labels: 10-K, DEF 14A, S-1, 13D, EX-10, 20-F, 6-K, Form 4.

Speaker message: - This is not a talk about memorizing form codes. - It is a talk about using public disclosures to study firms.

Slide 2. Opening hook

Title: A public archive of firm behavior

Content:

  • Firms disclose strategy, risks, transactions, ownership, governance, contracts, and financials.
  • The disclosures are public, time-stamped, and legally consequential.
  • This makes filings a research infrastructure, not just a reporting requirement.

Visual:

  • Three-column framing:
    • What firms do
    • What firms disclose
    • What researchers can measure

Callout:

:::{.callout-important}
The key opportunity is not just reading filings. It is converting disclosure into research variables.
:::

Slide 3. Why this matters beyond finance and accounting

Title: Not only finance and accounting

Content as cards:

  • Strategy: competitive positioning, restructuring, market entry.
  • Management: leadership, governance, incentives, organizational change.
  • Innovation and IS: AI, cybersecurity, data governance, digital transformation.
  • Marketing: customers, channels, product risk, brand incidents.
  • Entrepreneurship: IPOs, founder control, venture exits.
  • Political economy: regulation, sanctions, geopolitical exposure.

Visual:

  • Discipline cards or a matrix.

Speaker message:

  • A filing can be read as a legal document, a financial report, a strategy document, or a text corpus.

Slide 4. Big numbers

Title: EDGAR is used at massive scale

Content:

  • Around 2.4 to 2.6 million electronic filings per year in SEC workload data for FY 2023 to FY 2025.
  • Around 21.7 to 24.0 billion online searches for EDGAR filings per year in the same workload data.
  • Only FY 2023 is an actual figure. FY 2024 is an estimate and FY 2025 is a budget request.

Visual:

  • Two charts:
    • Electronic filings received, millions.
    • Online searches, billions.

Source note:

  • SEC FY 2025 Congressional Budget Justification, EDGAR Business Office workload data.

Design:

  • This should be an engaging statistics slide, not a table-only slide.

Slide 5. What makes filings distinctive?

Title: Why filings are special as research data

Use four blocks:

  1. Public
    • No proprietary access required for core EDGAR data.
  2. Time-stamped
    • Filing dates and event dates can be linked to outcomes.
  3. Rich
    • Text, tables, XBRL, exhibits, signatures, metadata.
  4. Legally consequential
    • Disclosures are made under regulatory obligations.

Callout:

:::{.callout-note}
Filings are imperfect, strategic, and sometimes boilerplate. But those features can themselves become research objects.
:::

Slide 6. Section divider

Title: 1. The filing ecosystem

Subtitle: Think in research use-cases, not form codes

Visual: - Dark or simple section divider, depending on site theme.

Slide 7. What is EDGAR?

Title: EDGAR in one slide

Content:

  • SEC’s Electronic Data Gathering, Analysis, and Retrieval system.
  • Primary electronic submission and public access system for SEC filings.
  • Includes company filings, individual ownership filings, fund filings, exhibits, and metadata.
  • Public access through web search, company pages, full-text search, and APIs.

Visual:

  • Simple ecosystem:
    • Filers
    • EDGAR
    • Public users
    • Researchers
    • Investors
    • Regulators

Source:

  • SEC submit-filings or accessing EDGAR data pages.

Slide 8. The filing universe, grouped by research use

Title: A map of the filing universe

Visual: - Large grouped map with seven groups:

  1. Periodic reporting
    • 10-K, 10-Q, 20-F, 40-F
  2. Event disclosure
    • 8-K, 6-K
  3. Ownership and trading
    • 13D, 13G, 13F, Forms 3, 4, 5
  4. Governance and shareholder process
    • DEF 14A, DEFA14A, PRE 14A
  5. Capital raising and listing
    • S-1, F-1, S-3, prospectuses
  6. Deals and restructuring
    • S-4, merger proxy, tender-offer filings, bankruptcy-related disclosures
  7. Exhibits
    • EX-10 credit agreements, employment contracts, merger agreements, charters, bylaws

Speaker message: - The form is only the starting point. The research variable may be in a section, a table, or an exhibit.

Slide 9. Periodic reports

Title: Periodic reports: the recurring narrative of the firm

Content: - Business description. - Risk factors. - MD&A. - Financial statements and footnotes. - Segment and geographic discussion. - Controls, legal proceedings, and management certification.

Research examples: - Cybersecurity risk language. - AI adoption disclosure. - Customer concentration. - International exposure. - Climate and supply-chain risk. - Accounting estimates and footnotes.

Visual: - Filing page as layered object.

Slide 10. Event-driven filings

Title: Event filings: when something material happens

Content:

  • 8-Ks can report material agreements, leadership changes, results, impairments, restatements, financing events, acquisitions, and more.
  • Event filings are useful for event-study designs and for constructing event samples.
  • For foreign private issuers, 6-Ks often transmit important information released abroad or to foreign exchanges.

Visual:

  • Timeline: Event → Filing → Market reaction → Follow-up disclosure → Outcome

Slide 11. Ownership, trading, and influence

Title: Ownership and influence

Content:

  • 13F: institutional holdings.
  • 13D and 13G: beneficial ownership, activism, large stakes.
  • Forms 3, 4, 5: insider holdings and transactions.
  • Useful for studying governance, monitoring, activism, informed trading, and investor networks.

Visual:

  • Ownership network graphic:
    • Institution
    • Insider
    • Activist
    • Firm
    • Governance outcome

Caution:

  • Make clear that 13F has reporting thresholds and only covers certain managers and securities.

Slide 12. Governance and shareholder process

Title: Proxy filings: boards, pay, voting, and shareholder voice

Content:

  • DEF 14A and related proxy filings.
  • Executive compensation.
  • Director background.
  • Board structure.
  • Shareholder proposals.
  • Say-on-pay votes.
  • Related-party transactions.

Research examples:

  • CEO incentives and risk-taking.
  • Board diversity and monitoring.
  • Shareholder proposals and ESG.
  • Governance responses to poor performance.

Visual:

  • Proxy statement as three data panels:
    • People
    • Pay
    • Votes

Slide 13. Issuance, listing, and entrepreneurial firms

Title: Registration statements: firms telling their story to investors

Content:

  • S-1 for domestic issuers and F-1 for foreign private issuers. Both are common in IPOs.
  • Prospectuses for offerings.
  • Useful for entrepreneurship, venture exits, founder control, risk narratives, capital raising, and business models.
  • Rich textual information appears before public trading history is available.

Visual:

  • IPO lifecycle: Private firm → S-1/F-1 → Roadshow and pricing → Public firm → Post-IPO outcomes

Slide 14. Exhibits are the hidden goldmine

Title: Exhibits: where contracts live

Content:

  • Many filings include exhibits.
  • Exhibits can contain:
    • Credit agreements.
    • Acquisition agreements.
    • Employment contracts.
    • Underwriting agreements.
    • Charters and bylaws.
    • Supply or customer agreements.
  • Exhibits often hold details not summarized in the main filing.

Visual:

  • Main filing page with a magnifying glass pointing to EX-10.1 Credit Agreement.

Callout:

:::{.callout-important}
For many research questions, the exhibit is the dataset.
:::

Slide 15. Section divider

Title: 2. What research can filings enable?

Subtitle: From disclosure to original variables

Slide 16. Research idea matrix

Title: One archive, many research questions

Visual: - Matrix with disciplines as rows and filing objects as columns. - Example cells:

| Discipline | Filing object | Research idea |
|---|---|---|
| Finance | EX-10 credit agreements | How contract terms shape investment decisions |
| Accounting | 10-K footnotes | How estimates and internal controls affect reporting quality |
| Management | DEF 14A | How leadership and board structure shape strategy |
| IS | 10-K risk factors | How firms disclose cyber and AI risk |
| Entrepreneurship | S-1/F-1 | How founder control affects IPO outcomes |
| Marketing | 10-K business sections | How customer concentration or product risk affects firm value |
| IB | 20-F/6-K | How foreign issuers disclose geopolitical and regulatory exposure |

Keep this slide visual. Do not overcrowd.

Slide 17. Case study: loan contracts from filings

Title: Case study: loan contracts as research data

Content:

  • Question: Can collateral and lender screening shape M&A outcomes?
  • Data challenge: Detailed contract terms are not usually in standard datasets.
  • Filing opportunity: Loan agreement exhibits disclose collateral, covenants, and restrictions.
  • Research design: Hand-collected loan agreements linked to M&A deals.

Visual:

  • Pipeline: SEC filing → Loan agreement exhibit → Collateral and covenant clauses → Hand-collected variables → M&A outcomes

Source:

  • Gao, Luong, and Qiu, Journal of Corporate Finance, 2026.

Speaker message:

  • The lesson is not just about real estate collateral.
  • The lesson is that filings can reveal contractual mechanisms behind corporate decisions.

Slide 18. What the loan-contract example teaches

Title: The broader lesson from contract exhibits

Content:

  • Filings can reveal mechanisms, not just outcomes.
  • Hand collection can still be high value when the variable is not in commercial databases.
  • Contracts can connect banking, law, governance, and corporate investment.
  • A good filing-based project often starts with a small manual pilot.

Visual:

  • Four cards:
    • Mechanisms
    • Measurement
    • Validation
    • Original data

Callout:

:::{.callout-tip}
A strong filing project often begins with reading 20 filings very carefully.
:::

Slide 19. Narrative disclosures

Title: Text is data, but not automatically good data

Content: - Risk factors. - MD&A. - Business descriptions. - Legal proceedings. - Footnotes. - Auditor language. - Sustainability or climate-related language. - AI, cybersecurity, privacy, supply chain, sanctions, and geopolitical risk.

Research use: - Measure exposure, attention, concern, complexity, strategic orientation, or response to regulation.

Caution: - Boilerplate language is not always noise. It may itself be strategic.

Visual: - Text snippets with highlighted words, but do not quote large copyrighted passages. - Use synthetic sample snippets if needed.

Slide 20. Governance and influence examples

Title: Governance, ownership, and influence

Content:

  • Proxy statements: compensation, boards, votes.
  • 13D/G: activism and block ownership.
  • Form 4: insider transactions.
  • 13F: institutional portfolios.

Example research questions:

  • Do activist threats reshape disclosure?
  • Do insider trades precede strategic announcements?
  • Do institutional holders influence innovation or ESG strategy?
  • Does board structure affect AI/cyber risk governance?

Visual:

  • Influence chain: Owners → Governance → Managerial incentives → Disclosure or corporate policy → Outcomes

Slide 21. Innovation and digital transformation examples

Title: Filings for studying innovation, AI, and cyber risk

Content:

  • Firms discuss technology in business descriptions, risk factors, MD&A, and sometimes 8-Ks.
  • Cybersecurity incident disclosure and risk disclosure can be linked to market outcomes, governance, customers, and regulation.
  • AI disclosure can be studied as adoption, hype, risk, capability, or strategic orientation.

Visual:

  • Conceptual map: Technology shock → Disclosure → Governance response → Market or real outcome

Callout:

:::{.callout-warning}
Do not treat every mention of “AI” as adoption. Start with careful validation.
:::

Slide 22. Foreign issuers and comparative work

Title: Foreign issuers: beyond U.S. domestic firms

Content: - 20-F annual reports and 6-K current reports. - Useful for comparative disclosure, international strategy, cross-listing, regulatory exposure, and geopolitical risk. - But comparability issues are nontrivial.

Visual: - World map silhouette or global firm icons. - Avoid overcomplicated details.

Slide 23. Section divider

Title: 3. From idea to data

Subtitle: The practical workflow

Slide 24. Workflow from research question to dataset

Title: A practical workflow

Visual:

  • Pipeline:
    1. Define construct.
    2. Identify filing family.
    3. Manually inspect filings.
    4. Write extraction protocol.
    5. Pilot and validate.
    6. Scale download.
    7. Link to outcomes.
    8. Document decisions.

Content:

  • Emphasize that manual inspection comes before large-scale scraping.

Callout:

:::{.callout-important}
Do not automate before you understand what the relevant disclosure actually looks like.
:::

Slide 25. Data layers inside a filing

Title: A filing is not one object

Visual:

  • Layered stack:
    • Metadata: CIK, accession number, filing date, form type.
    • Sections: items, MD&A, risk factors.
    • Tables: compensation, financials, ownership.
    • XBRL: structured accounting facts.
    • Text: narratives and footnotes.
    • Exhibits: contracts and attachments.

Content:

  • Different layers require different extraction methods.

Slide 26. Search first, scale later

Title: Search first, scale later

Content:

  • Start with EDGAR company search and full-text search.
  • Search phrases that should reveal the mechanism.
  • Inspect false positives and false negatives.
  • Build a short protocol.
  • Only then scale.

Visual:

  • Search box mockups:
    • “credit agreement”
    • “acquisition covenant”
    • “artificial intelligence”
    • “cybersecurity incident”
    • “customer concentration”
    • “going concern”

Mention:

  • SEC full-text search covers electronic filings since 2001.
  • It can filter by date, company, person, category, and location.

Slide 27. EDGAR and SeekEdgar

Title: EDGAR and SeekEdgar play different roles

Content as comparison table:

| Tool | Best for | Limitation |
|---|---|---|
| EDGAR web search | Official access, company browsing, full-text search | Manual exploration can be slow |
| SEC APIs | Reproducible downloads, submissions metadata, XBRL facts | Requires coding and careful request handling |
| SeekEdgar | Fast exploratory search, no-code extraction, tables, footnotes, MD&A | Availability and access depend on subscription or institutional access |
| Python | Flexible, reproducible pipelines | Requires validation and maintenance |

Speaker message: - Use the tool that matches the stage of the project. - Exploration and production are different tasks.

Slide 28. Light Python example

Title: A minimal Python example: from CIK to filing metadata

Content: - Show only a compact example. - Use Apple or Microsoft as a familiar example. - Fetch company submissions JSON. - Print recent form types and accession numbers.

Example code:

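import requests
import pandas as pd

# The SEC asks scripted clients to identify themselves. Replace with real contact details.
headers = {
    "User-Agent": "your-name your-email@example.com"
}

cik = "0000320193"  # Apple Inc., zero-padded to 10 digits
url = f"https://data.sec.gov/submissions/CIK{cik}.json"

response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # stop early on HTTP errors such as 403 or 404
data = response.json()

# "recent" holds parallel lists with one entry per filing
recent = pd.DataFrame(data["filings"]["recent"])

recent[["form", "filingDate", "accessionNumber", "primaryDocument"]].head(10)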

Important note:

  • Include a real User-Agent with contact information when making scripted SEC requests.
  • Keep this as an example only. Do not run it live in the static deck unless the rendering environment has internet access.

Visual:

  • Code on the left, output mock table on the right.

Slide 29. Extraction ladder

Title: Information extraction: choose the right level

Visual: - Ladder from simple to complex:

  1. Manual coding.
  2. Keyword screening.
  3. Regular expressions.
  4. HTML parsing.
  5. XBRL facts.
  6. NLP classification.
  7. LLM-assisted extraction.
  8. Human validation.

Content: - Higher complexity is not always better. - For many projects, a validated rule is better than a black-box model.
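The keyword and regex rungs of the ladder can be illustrated with a small screen. This is a sketch under stated assumptions: the pattern is illustrative, and the snippets are synthetic sentences written for this brief, not quotes from filings.

```python
import re

# Illustrative pattern only. Phrases match case-insensitively, but the
# abbreviation "AI" is case-sensitive so that "ai" inside ordinary words
# (for example "maintain") is ignored. \b keeps "AI" from matching in "AIR".
AI_PATTERN = re.compile(
    r"\b(?:(?i:artificial intelligence|machine learning)|AI)\b"
)

# Synthetic snippets, not quotes from real filings.
snippets = [
    "We use artificial intelligence to forecast demand.",
    "We maintain insurance against cybersecurity incidents.",
    "Regulation of AI may increase our compliance costs.",
    "Our AIR cargo segment grew during the year.",
]


def screen(text: str) -> list[str]:
    """Return every keyword hit found in a snippet."""
    return AI_PATTERN.findall(text)


hits = {text: screen(text) for text in snippets}
for text, found in hits.items():
    print(f"{len(found)} hit(s): {text}")
```

The first and third snippets both produce a hit, but only the first describes use of the technology. The third is a regulatory risk statement. A keyword hit is a candidate for manual coding, not a measure of adoption.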

Slide 30. Validation is the research design

Title: Validation is not an afterthought

Content: - Randomly check extracted observations. - Track false positives and false negatives. - Compare against hand-coded samples. - Record ambiguous cases. - Predefine coding rules where possible. - Report validation clearly.

Visual: - Confusion matrix or simple validation checklist.

Callout:

:::{.callout-warning}
The most common weakness in filing-based research is not access. It is unvalidated extraction.
:::
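A validation table can be summarized with a confusion matrix, precision, and recall. The sketch below uses a hypothetical ten-filing pilot. The labels are invented for illustration and do not come from any real sample.

```python
# Hypothetical pilot labels: 1 = disclosure present, 0 = absent.
hand_coded = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]  # human benchmark
extracted = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]   # output of the extraction rule


def confusion_counts(truth: list[int], predicted: list[int]) -> dict[str, int]:
    """Count true/false positives and negatives against the hand-coded benchmark."""
    pairs = list(zip(truth, predicted))
    return {
        "tp": sum(t == 1 and p == 1 for t, p in pairs),
        "fp": sum(t == 0 and p == 1 for t, p in pairs),
        "fn": sum(t == 1 and p == 0 for t, p in pairs),
        "tn": sum(t == 0 and p == 0 for t, p in pairs),
    }


counts = confusion_counts(hand_coded, extracted)

# Precision: share of extracted positives that are correct.
precision = counts["tp"] / (counts["tp"] + counts["fp"])
# Recall: share of true positives that the rule found.
recall = counts["tp"] / (counts["tp"] + counts["fn"])

print(counts)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

A real project would guard against empty denominators and report the sample size alongside the two rates.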

Slide 31. Section divider

Title: 4. Common pitfalls

Subtitle: Where filing-based projects fail

Slide 32. Pitfall grid

Title: Common pitfalls and fixes

Create a two-column grid:

| Pitfall | Fix |
|---|---|
| Treating form type as content | Inspect items, sections, and exhibits |
| Confusing filing date and event date | Extract both and justify timing |
| Ignoring amendments | Track /A filings and restatements |
| Ignoring exhibits | Search exhibit indexes |
| Overusing keywords | Validate semantic meaning |
| Assuming text is comparable over time | Account for rule changes and templates |
| Not documenting parsing decisions | Build a reproducible protocol |
| Letting LLMs extract without audit | Use human validation samples |
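The amendment fix can be made concrete with a small helper that separates the base form type from the /A suffix. This is a minimal sketch. The function name and the sample list are illustrative.

```python
# Example form-type strings as they might appear in a filing index.
form_types = ["10-K", "10-K/A", "8-K", "8-K/A", "DEF 14A", "SC 13D/A"]


def split_form(form: str) -> tuple[str, bool]:
    """Return the base form type and whether the filing is an amendment."""
    is_amendment = form.endswith("/A")
    base = form[:-2] if is_amendment else form
    return base, is_amendment


flags = [split_form(form) for form in form_types]
for form, (base, is_amendment) in zip(form_types, flags):
    print(f"{form:10s} base={base:8s} amendment={is_amendment}")
```

Note that "DEF 14A" ends in "A" but is not an amendment, which is why the check looks for the full "/A" suffix.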

Slide 33. Ethical and practical notes

Title: Responsible use of public filings

Content: - Public does not mean effortless. - Respect SEC access policies and rate limits. - Include contact details in scripted requests. - Avoid unnecessary repeated downloads. - Preserve reproducibility. - Do not infer sensitive individual-level claims beyond what the filing supports.

Visual: - Small checklist.
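The checklist items on rate limits and repeated downloads can be sketched as a small caching wrapper. Everything here is an assumption for illustration: the class name `PoliteFetcher`, the 0.2-second default pause, and the stand-in fetch function. In real use, `fetch` would wrap an HTTP request that sends a User-Agent with contact details, and the pause should follow current SEC fair-access guidance.

```python
import tempfile
import time
from pathlib import Path
from typing import Callable


class PoliteFetcher:
    """Cache responses on disk and pause between uncached requests."""

    def __init__(self, fetch: Callable[[str], str], cache_dir: Path, min_interval: float = 0.2):
        self.fetch = fetch                # function that performs the real request
        self.cache_dir = Path(cache_dir)
        self.min_interval = min_interval  # seconds between live requests (assumed value)
        self._last_request = 0.0
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _cache_path(self, url: str) -> Path:
        # Readable file name derived from the URL.
        return self.cache_dir / url.replace("://", "_").replace("/", "_")

    def get(self, url: str) -> str:
        path = self._cache_path(url)
        if path.exists():  # cached: no request is sent
            return path.read_text(encoding="utf-8")
        wait = self.min_interval - (time.monotonic() - self._last_request)
        if wait > 0:
            time.sleep(wait)
        text = self.fetch(url)
        self._last_request = time.monotonic()
        path.write_text(text, encoding="utf-8")
        return text


# Demonstration with a stand-in fetch function, so no network access is needed.
calls = []


def fake_fetch(url: str) -> str:
    calls.append(url)
    return '{"ok": true}'


with tempfile.TemporaryDirectory() as tmp:
    fetcher = PoliteFetcher(fake_fetch, Path(tmp), min_interval=0.0)
    target = "https://data.sec.gov/submissions/CIK0000320193.json"
    first = fetcher.get(target)
    second = fetcher.get(target)  # served from the cache

print(f"live requests made: {len(calls)}")
```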

Slide 34. How to start this month

Title: A four-step starter plan

Content:

  1. Pick one construct you cannot easily observe in standard datasets.
  2. Identify two or three likely filing families.
  3. Read 20 filings manually.
  4. Build a pilot extraction and validation table.

Callout:

:::{.callout-tip}
The goal of the first week is not a full dataset. It is to prove that the signal exists.
:::

Slide 35. Closing takeaway

Title: Corporate filings are research infrastructure

Content:

  • They are public, longitudinal, and legally consequential.
  • They contain text, tables, XBRL, events, contracts, and metadata.
  • They can support both hand-collected and scalable computational research.
  • The best projects begin with a sharp construct and careful reading.

Final one-liner:

> Start with a research question, not a form code.

Visual:

  • Reuse the workflow pipeline, now simplified.

Appendix slide specifications

The appendix should be useful for technically inclined students but should not interrupt the main 40-minute flow.

Appendix A2. Common filing families

Use a compact table:

| Family | Examples | Typical research use |
|---|---|---|
| Periodic reports | 10-K, 10-Q, 20-F, 40-F | Business, risks, financials, MD&A |
| Current reports | 8-K, 6-K | Events, agreements, management changes |
| Ownership | 13D, 13G, 13F, Forms 3/4/5 | Ownership, activism, insider trading |
| Proxy | DEF 14A, PRE 14A | Boards, pay, votes, proposals |
| Registration | S-1, F-1, S-3 | IPOs, offerings, firm narratives |
| Deals | S-4, merger proxy, tender offers | M&A, restructuring, control changes |
| Exhibits | EX-10, EX-2, EX-3 | Contracts, charters, material agreements |

Appendix A3. SEC API endpoints

List with examples:

  • Company submissions:
    • https://data.sec.gov/submissions/CIK##########.json
  • Company facts:
    • https://data.sec.gov/api/xbrl/companyfacts/CIK##########.json
  • Company concept:
    • https://data.sec.gov/api/xbrl/companyconcept/CIK##########/us-gaap/AccountsPayableCurrent.json
  • Frames:
    • https://data.sec.gov/api/xbrl/frames/us-gaap/AccountsPayableCurrent/USD/CY2019Q1I.json

Mention:

  • The CIK must be zero-padded to 10 digits.
  • Use a proper User-Agent.
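The endpoint patterns can be wrapped in small helpers that handle the zero padding. This is a minimal sketch. The helper names are illustrative, and the URL patterns are the ones listed in this appendix.

```python
BASE = "https://data.sec.gov"


def pad_cik(cik) -> str:
    """Zero-pad a CIK (int or string) to the 10 digits used in data.sec.gov URLs."""
    return str(int(cik)).zfill(10)


def submissions_url(cik) -> str:
    return f"{BASE}/submissions/CIK{pad_cik(cik)}.json"


def company_facts_url(cik) -> str:
    return f"{BASE}/api/xbrl/companyfacts/CIK{pad_cik(cik)}.json"


def company_concept_url(cik, taxonomy: str, tag: str) -> str:
    return f"{BASE}/api/xbrl/companyconcept/CIK{pad_cik(cik)}/{taxonomy}/{tag}.json"


def frames_url(taxonomy: str, tag: str, unit: str, period: str) -> str:
    # Frames are cross-sectional, so no CIK appears in the path.
    return f"{BASE}/api/xbrl/frames/{taxonomy}/{tag}/{unit}/{period}.json"


print(submissions_url(320193))
print(company_concept_url("320193", "us-gaap", "AccountsPayableCurrent"))
print(frames_url("us-gaap", "AccountsPayableCurrent", "USD", "CY2019Q1I"))
```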

Appendix A4. Python example: list recent filings

Use the code from Slide 28, but include more explanation.

Appendix A5. Python example: construct a filing URL

Show how accession numbers are converted into archive paths.

Example:

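cik = 320193  # archive paths use the CIK without zero padding
accession = "0000320193-24-000123"  # example accession number

# Archive folder names are the accession number with the dashes removed.
accession_no_dash = accession.replace("-", "")

archive_base = f"https://www.sec.gov/Archives/edgar/data/{cik}/{accession_no_dash}/"
print(archive_base)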

Caution:

  • The primary document filename comes from the submissions JSON.
  • Do not hard-code filenames unless verified.
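
The caution above can be sketched as follows, assuming the submissions JSON exposes a top-level `cik` and parallel arrays under `filings.recent` (`accessionNumber`, `form`, `primaryDocument`). The `sample` dictionary is a hand-made stand-in, not real SEC output, and its filename is hypothetical; `primary_doc_urls` is an illustrative helper name.

```python
ARCHIVES = "https://www.sec.gov/Archives/edgar/data"

# Hand-made stand-in for a submissions JSON response (values are illustrative).
sample = {
    "cik": "320193",
    "filings": {
        "recent": {
            "accessionNumber": ["0000320193-24-000123"],
            "form": ["10-K"],
            "primaryDocument": ["hypothetical-10k.htm"],
        }
    },
}


def primary_doc_urls(submissions, form_type=None):
    """Return (form, URL) pairs for primary documents, optionally filtered by form."""
    cik = int(submissions["cik"])  # archive paths use the unpadded CIK
    recent = submissions["filings"]["recent"]
    rows = zip(recent["accessionNumber"], recent["form"], recent["primaryDocument"])
    out = []
    for accession, form, filename in rows:
        if form_type is not None and form != form_type:
            continue
        folder = accession.replace("-", "")  # folder name has no dashes
        out.append((form, f"{ARCHIVES}/{cik}/{folder}/{filename}"))
    return out


for form, url in primary_doc_urls(sample, form_type="10-K"):
    print(form, url)
```

Because the filename is read from the JSON rather than typed in, the same function works across filers whose primary documents follow different naming habits.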

Appendix A6. From search terms to validation table

Show a sample validation protocol:

| Field | Definition | Source | Extraction rule | Validation |
|---|---|---|---|---|
| AI disclosure | Firm discusses AI as capability or strategy | 10-K Item 1, Item 7 | Keyword screen plus manual coding | 100 random filings |
| Cyber incident | Material cybersecurity event | 8-K | Item-based and keyword screen | All positives manually checked |
| Credit agreement | Loan contract disclosed | EX-10 | Exhibit description contains "credit agreement" | Sample checked against exhibit index |
| Acquisition covenant | Clause restricting acquisitions | Loan agreement exhibit | Clause extraction | Dual-coded subset |
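
One way to make the protocol concrete is sketched below: a keyword screen, a reproducible random sample for manual coding, and a precision estimate once the manual codes exist. The filing snippets, the regular expression, and the manual labels are all invented for illustration; a real project would draw a far larger sample than the three filings used here.

```python
import random
import re

# Toy corpus: filing ID -> text snippet (invented for illustration).
filings = {
    "F1": "We deploy artificial intelligence across our logistics platform.",
    "F2": "Our retail stores rely on seasonal demand.",
    "F3": "Machine learning models support our underwriting decisions.",
    "F4": "Competitors may adopt artificial intelligence faster than we do.",
    "F5": "We lease office space in three cities.",
}

# Illustrative keyword screen; the term list is an assumption, not a validated dictionary.
AI_PATTERN = re.compile(r"\b(artificial intelligence|machine learning)\b", re.IGNORECASE)


def keyword_screen(texts):
    """Return the sorted IDs of filings whose text matches the keyword pattern."""
    return sorted(fid for fid, text in texts.items() if AI_PATTERN.search(text))


def validation_sample(ids, n, seed=42):
    """Draw a reproducible random sample of IDs for manual coding."""
    rng = random.Random(seed)
    return sorted(rng.sample(ids, min(n, len(ids))))


def precision(manual_codes):
    """Share of screened-in filings that a human coder confirmed as true positives."""
    return sum(manual_codes.values()) / len(manual_codes)


hits = keyword_screen(filings)
to_code = validation_sample(hits, n=3)

# Invented manual codes: F4 mentions AI only as a competitive risk, so it is coded 0.
manual = {"F1": 1, "F3": 1, "F4": 0}
print(hits, to_code, round(precision(manual), 2))
```

The fixed seed matters more than it looks: it lets a co-author or referee regenerate exactly the same validation sample.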

Appendix A7. SeekEdgar workflow

Describe conceptual workflow:

  1. Search phrase.
  2. Limit filing type or section.
  3. Inspect snippets and tables.
  4. Export results.
  5. Validate manually.
  6. Use for pilot or to design programmatic extraction.

Do not include screenshots unless available and permitted.

Appendix A8. Good research question templates

Examples:

  • What disclosure reveals a mechanism not available in standard datasets?
  • What firm behavior is documented before market outcomes are observed?
  • What contract term or governance mechanism shapes future decisions?
  • What regulatory change altered how firms disclose a topic?
  • What text can distinguish substantive adoption from symbolic disclosure?

Appendix A9. Suggested readings

Include:

  • Presenter’s old note.
  • Gao, Luong, and Qiu, 2026, Journal of Corporate Finance.
  • A few classic or representative SEC textual analysis papers, if the producer knows them and can cite accurately.

Do not invent citations. If uncertain, omit.

Python chart generation guidance

If generating charts in Quarto with Python, use simple matplotlib or pandas plots. Do not over-style.

Example chunk:

import pandas as pd
import matplotlib.pyplot as plt

# One row per fiscal year; only the filings column is plotted below.
stats = pd.DataFrame([
    {"FY": "FY 2023", "Electronic filings received": 2.43, "Online searches": 21.730},
    {"FY": "FY 2024", "Electronic filings received": 2.50, "Online searches": 22.817},
    {"FY": "FY 2025", "Electronic filings received": 2.58, "Online searches": 23.957},
])

# A single bar chart with minimal styling.
fig, ax = plt.subplots(figsize=(7, 4))
ax.bar(stats["FY"], stats["Electronic filings received"])
ax.set_ylabel("Millions")
ax.set_title("Electronic filings received")
ax.spines[["top", "right"]].set_visible(False)  # drop the top and right borders
plt.tight_layout()

If using one chart per slide, ensure text remains large enough for a seminar room.

Suggested design patterns in Quarto

Columns

:::: {.columns}

::: {.column width="50%"}
Main idea here.
:::

::: {.column width="50%"}
![](assets/chart.png)
:::

::::

Callouts

:::{.callout-tip}
A strong filing-based project often begins by reading 20 filings very carefully.
:::

Fragments

::: {.fragment}
The hard part is not finding filings.
:::

::: {.fragment}
The hard part is knowing what in the filing maps to your construct.
:::

Section divider

# 1. The filing ecosystem

or

## 1. The filing ecosystem {.center}

depending on theme behavior.

Style and pacing rules

  1. Most slides should have one key message.
  2. No slide should contain more than 6 bullets unless it is a compact appendix table.
  3. Use examples frequently.
  4. Use form codes only after explaining the research use.
  5. Avoid dense legal text.
  6. Use “research idea” callouts.
  7. Use charts and diagrams to keep visual interest.
  8. Do not make the first 10 minutes technical.
  9. Keep Python to one main slide plus appendix.
  10. Use the loan-contract case study to showcase possibility, not to present the full paper.

Suggested “research idea” callouts

Sprinkle 4 to 6 of these across the deck.

Examples:

:::{.callout-note title="Research idea"}
Do firms that disclose AI as a strategic capability make different workforce, acquisition, or cybersecurity investments?
:::

:::{.callout-note title="Research idea"}
Can contract exhibits reveal how lenders constrain future acquisitions, payouts, or asset sales?
:::

:::{.callout-note title="Research idea"}
Do firms change risk-factor language after competitors experience cyber incidents?
:::

:::{.callout-note title="Research idea"}
Can IPO registration statements reveal whether founder control changes post-IPO strategy?
:::

Specific instructions for the loan-contract case study

Use two slides only in the main deck.

Do not overdo finance details.

Slide 1 should introduce the case:

  • Public filings can disclose loan agreements as exhibits.
  • Those agreements contain contractual terms.
  • The paper hand-collects loan agreements linked to M&A deals.
  • This enables study of collateral, covenants, lender screening, and acquisition outcomes.

Slide 2 should extract the general lesson:

  • Filings enable original variable construction.
  • Contracts reveal mechanisms.
  • Manual data collection can be valuable when commercial datasets do not contain the key variable.
  • Start small, validate carefully, then scale.

Use citation:

  • Gao, Luong, and Qiu, 2026, Journal of Corporate Finance, DOI: 10.1016/j.jcorpfin.2026.102962.

Specific instructions for SeekEdgar

Do not frame SeekEdgar as replacing EDGAR.

Frame as:

  • Useful for exploratory search.
  • Helpful for no-code users.
  • Useful for searching text, sections, footnotes, tables, MD&A, SOX reports, audit reports, CD&A, and similar filing components.
  • Useful for generating pilot evidence and candidate samples.
  • For final published research, still validate and document extraction carefully.

Avoid claims about exact coverage unless verified from current SeekEdgar pages.

Specific instructions for EDGAR API

Keep the API section light.

Use these messages:

  • SEC APIs can give reproducible access to company submissions and XBRL facts.
  • APIs are best for metadata and structured data.
  • Full-text and exhibit-level extraction still requires additional parsing.
  • Always use a proper User-Agent.
  • Do not aggressively scrape.

Do not teach a full scraping framework in the main deck.
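
If the appendix needs to show what polite access looks like, a small throttle is usually enough. The sketch below assumes a ceiling of 10 requests per second, which is the figure in the SEC's fair-access guidance at the time of writing and should be checked against the current policy. `Throttle` and `polite_get` are illustrative names, and the User-Agent string is a placeholder.

```python
import time
import urllib.request

# Placeholder contact string; replace with a real name and email before use.
HEADERS = {"User-Agent": "Your Name your.email@university.edu"}


class Throttle:
    """Block so that successive calls are at least `min_interval` seconds apart."""

    def __init__(self, max_per_second=5.0):
        self.min_interval = 1.0 / max_per_second
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


throttle = Throttle(max_per_second=5.0)  # deliberately below the assumed ceiling of 10


def polite_get(url, timeout=30):
    """Fetch one URL with a declared User-Agent, after waiting for the throttle."""
    throttle.wait()
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

This is intentionally not a scraping framework: there is no retry logic, queue, or parallelism, only a declared identity and a rate limit.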

Quality checklist before delivery

The deck should pass these checks:

Content checks

  • Does it go beyond 10-K, 10-Q, 8-K, and 13F?
  • Does it explain exhibits clearly?
  • Does it include the JCF loan-contract example?
  • Does it include EDGAR, SEC APIs, and SeekEdgar?
  • Does it include Python, but not too much?
  • Does it include practical pitfalls?
  • Does it include research examples across disciplines?
  • Does it include the presenter’s old note as a resource?

Visual checks

  • Are there at least five non-text visual slides?
  • Is there a big-number chart slide?
  • Is there a filing universe map?
  • Is there a workflow diagram?
  • Is there a case-study pipeline?
  • Is there a pitfall grid?

Pedagogical checks

  • Would a non-finance student understand why filings matter?
  • Would a finance student still learn something new?
  • Does the deck inspire research ideas?
  • Does the deck avoid becoming a compliance or legal taxonomy lecture?
  • Does it lower the activation barrier?

Technical checks

  • Does "quarto render corporate-filings.qmd" complete without errors?
  • Are all assets local?
  • Are external links in an appendix?
  • Are Python chunks optional or renderable without live internet?
  • Are code blocks readable on slides?
  • Are source notes included where factual statistics are shown?
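
For the offline-rendering check, one option is a cache-first loader: chunks read a local JSON snapshot when it exists and call the network only when it does not. This is a sketch under stated assumptions; `load_cached_json`, the `fetch` callable, and the cache path are hypothetical names, and the demonstration uses a stub in place of a real SEC request.

```python
import json
import tempfile
from pathlib import Path


def load_cached_json(cache_path, fetch):
    """Return JSON from `cache_path` if present; otherwise call `fetch()` and save the result."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        return json.loads(cache_path.read_text(encoding="utf-8"))
    data = fetch()  # only reached when no local snapshot exists
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(data), encoding="utf-8")
    return data


# Demonstration with a stub fetcher so the example runs without internet access.
calls = []


def fake_fetch():
    calls.append(1)
    return {"form": ["10-K", "8-K"]}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data" / "submissions.json"
    first = load_cached_json(path, fake_fetch)   # fetches and writes the snapshot
    second = load_cached_json(path, fake_fetch)  # served from the local file

print(first == second, len(calls))
```

Storing the snapshot alongside the other local assets keeps the deck renderable in a seminar room with no connection.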

Suggested final slide wording

Title: Start with a research question, not a form code

Content:

  • What do firms do that is hard to observe?
  • Which disclosure would reveal it?
  • Can you validate the signal in 20 filings?
  • Can you scale it responsibly?

Final line:

> Corporate filings are not just documents to read. They are empirical traces of firm behavior.

Final instruction to the deck producer

Build a polished static Quarto revealJS deck that a finance academic can deliver to mixed business school research students in 40 minutes. The deck must feel like a research seminar, not a software tutorial. It should be engaging, visual, and practically useful. The most important outcome is that students leave with a broader understanding of what corporate filings contain and a concrete sense that they can use filings to build novel research variables.
