Dataset Architecture - Sean Horgan

# Foundational Dataset Architecture for the Gun Trafficking Tracer Application ## 1.1. Executive Summary This document provides a comprehensive architectural blueprint for the dataset that will power the Gun Trafficking Tracer application. The objective is to construct a robust, scalable, and verifiable data foundation that enables users to identify Federal Firearms Licensees (FFLs) that may contribute to the illegal trafficking of firearms. This is not a static repository of information; it is the design for a dynamic analytical engine. The dataset's core value is derived from the systematic integration of disparate public data sources—including federal licensing records, regulatory enforcement actions, violent crime statistics, and firearms commerce reports—into a single, cohesive relational model. By contextualizing FFLs within their regulatory and geographic environments, the dataset is designed to reveal patterns, anomalies, and correlations that are otherwise obscured. This architecture will directly support all application features as specified in the Product Requirements Document (PRD), from faceted directory searches to complex data visualizations on the analytics dashboard, providing a powerful tool for concerned residents, policymakers, and industry members seeking to enhance public safety. ## 1.2. Guiding Principles of Data Architecture The design of this dataset is governed by three core principles to ensure its integrity, utility, and longevity. ### 1.2.1. Traceability and Verifiability Every data point within the system must be traceable to its original source. This principle is fundamental to the application's credibility. For data ingested directly from government publications, the schema will include metadata fields such as `source_document_id`, `retrieval_date`, and `source_url`. For data derived through analytical models, such as the sales volume estimations detailed in Section IV, the schema will track the `estimation_model_version` and a `confidence_score`. This ensures that every piece of information presented to the user can be audited for its origin and the methodology used in its derivation, fostering transparency and trust among all user groups. ### 1.2.2. Contextual Integrity Data, particularly regulatory and enforcement data, is not static; its meaning is shaped by the policy environment in which it was created. The dataset architecture is designed to capture this temporal and political context. A prime example is the "Zero Tolerance" policy for FFL violations implemented by the Bureau of Alcohol, Tobacco, Firearms and Explosives (ATF) from 2021 to 2025.1 Violations cited during this period were subject to a different, stricter enforcement standard than those cited before or after its repeal.3 The database schema will explicitly link violations to the specific enforcement policy in effect at the time of the inspection. This prevents the misinterpretation of historical data and provides users with a more nuanced understanding of an FFL's compliance history. ### 1.2.3. Scalability and Extensibility The relational data model is designed with future growth in mind. The initial data sources represent a foundational but not exhaustive set of relevant information. The architecture must be capable of incorporating new data streams without requiring a fundamental redesign. The schema is structured to readily accommodate future datasets, such as FFL-specific theft/loss reports, state-level licensing information, expanded crime statistics, or new categories of user-generated content. This extensibility ensures the application can evolve and increase in value as more data becomes publicly accessible. ## 1.3. Core Data Entities and Relational Overview The dataset is structured around a series of interconnected tables, or entities, with the `FFL_Dealers` table serving as the central hub. Each record in this table represents a unique Federal Firearms Licensee. All other data entities are designed to provide additional layers of information related to these licensees. The primary entities include: - **`FFL_Dealers`**: The master table containing core identifying and contact information for each licensee. - **`FFL_Inspections`**: A record of every documented compliance inspection conducted by the ATF for a given FFL. - **`FFL_Violations`**: A detailed ledger of specific violations cited during an inspection, linked back to the `FFL_Inspections` table. - **`FFL_Sales_Estimates`**: A table containing derived metrics for annual sales volume and other commerce-related analytics. - **`Weapon_Types_Sold`**: A junction table linking FFLs to the categories of firearms they are legally authorized to sell based on their license type. - **`Crime_Hotspots`**: A curated list of major U.S. cities with high rates of violent crime, used to provide geographic context. - **`Government_Contacts`**: A repository of contact information for relevant government and law enforcement agencies associated with each FFL's location. - **`User_Reviews`**: A table to store community-contributed reviews and ratings of FFLs. These entities are linked through foreign key relationships, allowing for complex queries that can, for example, retrieve all FFLs with a specific violation type, located within 100 miles of a crime hotspot, and sort them by their estimated sales volume, thereby fulfilling the core user journeys outlined in the PRD.5 # II. The Federal Firearms Licensee (FFL) Core Profile The central entity of the application dataset is the `FFL_Dealers` table. Each record in this table represents a single, unique Federal Firearms Licensee and serves as the primary anchor to which all other compliance, sales, geospatial, and user-generated data is connected. ## 2.1. Data Source and Acquisition Strategy The primary and most authoritative source for FFL data is the "Complete Federal Firearms Listings" published by the ATF.6 The ATF makes these lists available for public download on a monthly basis in various formats, including TXT, XLSX, and PDF.7 For the purposes of this dataset, the XLSX file (e.g., `0125-ffl-list-complete.xlsx`) will serve as the definitive source due to its structured and easily parsable format. The data acquisition process will be automated. A script will be scheduled to run monthly to check the ATF website for a new version of the complete FFL listing. Upon detecting a new file, the script will download and ingest the data, refreshing the `FFL_Dealers` table. This process will identify new licensees, flag licensees that are no longer on the active list, and update records for any FFLs that have changed their address, business name, or other details. To track the history and churn of FFLs over time, historical snapshots of the monthly lists will be archived. ## 2.2. Schema Definition: `FFL_Dealers` Table The `FFL_Dealers` table is designed to store the foundational identity and contact information for each licensee, directly supporting the FFL Directory and Listing Pages as described in the PRD.5 |Column Name|Data Type|Description| |---|---|---| |`ffl_id`|`BIGINT` (PK)|A unique, persistent identifier generated internally to serve as the primary key. This avoids reliance on the official license number, which could theoretically be reissued.| |`license_number`|`VARCHAR(255)`|The official, full FFL number (e.g., 9-86-XXX-XX-XX-12345). Indexed for fast lookups.| |`business_name`|`VARCHAR(255)`|The official registered name of the business as listed by the ATF.| |`trade_name`|`VARCHAR(255)`|The "doing business as" (DBA) name, if different from the business name.| |`owner_name`|`VARCHAR(255)`|The name of the parent company or individual owner, where available in the source data.| |`license_type`|`INTEGER` (FK)|A foreign key referencing the `License_Types` lookup table, corresponding to official ATF FFL types (e.g., 01 for Dealer, 07 for Manufacturer).6| |`license_status`|`ENUM`|An enumerated type to track the FFL's current operational status. Values include 'Active', 'Revoked', 'Previously Revoked', 'Surrendered', 'Expired'.| |`address_street`|`VARCHAR(255)`|The street address of the licensed premises.| |`address_city`|`VARCHAR(255)`|The city of the licensed premises.| |`address_state`|`VARCHAR(2)`|The two-letter state abbreviation.| |`address_zip`|`VARCHAR(10)`|The ZIP code of the licensed premises.| |`latitude`|`DOUBLE`|The geographic latitude, derived from geocoding the full premises address. Essential for map-based features and proximity searches.| |`longitude`|`DOUBLE`|The geographic longitude, derived from geocoding the full premises address.| |`phone_number`|`VARCHAR(20)`|The primary contact phone number, where available.| |`email_address`|`VARCHAR(255)`|The primary contact email address, where available.| |`google_maps_url`|`TEXT`|A pre-generated URL pointing to the FFL's location on Google Maps, used to power the "Review on Google Maps" CTA button.5| |`last_verified_date`|`DATE`|The publication date of the last official ATF list on which this FFL's information was confirmed.| ## 2.3. Implications of Data Sourcing and Latency A critical factor in the architecture of this dataset is the nature of the available source data. The ATF provides comprehensive FFL lists as static, monthly files, not through a real-time Application Programming Interface (API).7 While the ATF does operate the "FFL eZ Check" system for verifying the validity of a license, its terms of use explicitly state it is for "U.S. Government-authorized use only" and is structured for single, manual lookups, making it unsuitable for automated, large-scale data validation.9 This sourcing method introduces an inherent latency into the dataset. An FFL's status could change—for instance, a license could be revoked or surrendered—and this change may not be reflected in the application's data until the next monthly list is published and ingested. This potential lag can be up to 30 days. To manage this, the application must be transparent with its users. The `last_verified_date` field in the `FFL_Dealers` schema is essential for tracking data freshness. This field should be prominently displayed in the application's user interface, for example, with a disclaimer such as "Data as of January 10, 2025," corresponding to the publication date of the most recently ingested ATF list.11 This approach ensures data integrity by clearly communicating the data's temporal boundaries and managing user expectations about its real-time accuracy. # III. Regulatory Compliance and Enforcement Ledger A central function of the Gun Trafficking Tracer application is to provide users with detailed information about an FFL's history of regulatory compliance. This is essential for the "Violation-Based Search" and for populating the "Violation History" section on FFL listing pages.5 This section of the dataset is designed as a detailed ledger, capturing the outcomes of ATF inspections and the specific violations cited. ## 3.1. Data Source and Acquisition Strategy The data for FFL compliance and enforcement actions is fragmented and has been subject to changes in public availability. The acquisition strategy must therefore be multi-pronged. 1. **Historical "Zero Tolerance" Data (July 2021 - April 2025):** In a significant move toward transparency, the ATF began publicly posting detailed information on compliance inspections that resulted in citations for one of the five "Zero Tolerance" violations, including the full Final Notices of Revocation for 93 FFLs.1 This policy was later repealed in April 2025, and the ATF subsequently removed these public-facing reports from its website.2 However, this invaluable dataset was captured by news organizations and watchdog groups.13 The initial population of the compliance ledger will rely on acquiring these archived reports to build a baseline of serious enforcement actions. 2. **Ongoing ATF Inspection Summaries:** The ATF continues to publish monthly statistics on its "Firearms Compliance Inspection Results" page.14 These are provided as downloadable Microsoft Excel spreadsheets and contain aggregate data on the number of inspections conducted, outcomes (e.g., "No violations," "Warning letter," "Revocations"), and the most frequently cited violations.14 While these reports do not typically name individual FFLs, they are crucial for tracking overall enforcement trends and can be used to corroborate other data. 3. **Third-Party and FOIA-Sourced Data:** Non-governmental organizations, most notably Brady, have successfully utilized the Freedom of Information Act (FOIA) to obtain and publish thousands of ATF inspection reports that are not otherwise publicly available.16 The data architecture is designed to ingest these more detailed, unstructured reports as they are acquired, parsing them to extract specific violations and outcomes associated with individual FFLs. ## 3.2. Schema Definition: `FFL_Inspections` and `FFL_Violations` Tables To accurately model the relationship between an inspection event and the violations found, two separate but linked tables are required. ### 3.2.1. Table: `FFL_Inspections` This table records each distinct inspection event for a given FFL. |Column Name|Data Type|Description| |---|---|---| |`inspection_id`|`BIGINT` (PK)|A unique identifier for each inspection event.| |`ffl_id`|`BIGINT` (FK)|A foreign key referencing the `FFL_Dealers` table.| |`inspection_date`|`DATE`|The date on which the ATF inspection was completed.| |`inspection_outcome`|`ENUM`|The final result of the inspection. Possible values are derived from ATF terminology: 'No Violations', 'Report of Violations', 'Warning Letter', 'Warning Conference', 'License Revoked', 'License Surrendered'.15| |`policy_context`|`VARCHAR(255)`|A field to store the name of the ATF enforcement policy in effect at the time of the inspection (e.g., "Enhanced Regulatory Enforcement Policy 2021-2025").| |`source_document_url`|`TEXT`|A direct URL or reference to the source document (e.g., ATF press release, FOIA report) where the inspection result was published.| ### 3.2.2. Table: `FFL_Violations` This table records each specific violation cited during an inspection, allowing for a one-to-many relationship with the `FFL_Inspections` table. |Column Name|Data Type|Description| |---|---|---| |`violation_id`|`BIGINT` (PK)|A unique identifier for each cited violation.| |`inspection_id`|`BIGINT` (FK)|A foreign key referencing the `FFL_Inspections` table.| |`violation_type_code`|`VARCHAR(50)`|A foreign key referencing a `Violation_Types` lookup table. Codes correspond to specific infractions (e.g., `27 CFR 478.124(c)(1)` for failure to obtain a completed Form 4473).15| |`violation_description`|`TEXT`|A detailed text description of the violation, extracted from the source inspection report if available.| |`is_zero_tolerance`|`BOOLEAN`|A flag set to TRUE if the violation is one of the five serious infractions specified in the PRD and the former "Zero Tolerance" policy.5| ## 3.3. Implications of a Shifting Regulatory Landscape The collection and interpretation of FFL violation data cannot be divorced from its political and regulatory context. The PRD places significant emphasis on the five "Zero Tolerance" violations, which were the cornerstone of the Biden administration's Enhanced Regulatory Enforcement Policy announced in 2021.1 This policy led to a more than 500% increase in license revocations for willful violations, a stark departure from previous enforcement patterns.19 However, this policy was officially repealed on April 7, 2025, by a new administration.2 The subsequent policy changes included removing the public-facing revocation reports, encouraging revoked FFLs to reapply for their licenses, and raising the evidentiary bar for certain violations, such as changing the standard from "failing" to conduct a background check to "refusing" to do so.3 This shift means that an FFL's violation history must be interpreted through the lens of the policy that was in effect at the time. A violation that triggered a mandatory revocation in 2023 might only result in a warning letter under the new policy. To maintain analytical integrity and present information fairly, the dataset must capture this context. The `policy_context` field in the `FFL_Inspections` table is therefore not merely descriptive metadata; it is a critical component for accurate analysis. It allows the application to differentiate between violations cited under different regimes. The user interface can then present this information with appropriate nuance, such as "Violation: Failing to conduct a required background check (Cited under 2021 Zero Tolerance Policy)." This prevents the application from creating a false equivalency between past and present infractions and ensures a more accurate and defensible representation of an FFL's compliance record. # IV. Estimated Firearms Commerce and Sales Analytics A primary requirement of the Gun Trafficking Tracer application is to allow users to search, sort, and analyze FFLs based on their sales activity.5 However, sales data for individual, privately-held FFLs is not publicly available. FFL software used by dealers for inventory and sales management keeps this data proprietary.22 To overcome this significant data gap, a robust estimation methodology is required to generate a defensible proxy for sales volume. This section details the model for creating an **Estimated Annual Sales Volume (EASV)** score and other key commerce metrics. ## 4.1. Estimation Methodology The EASV will not be a precise dollar figure or unit count but rather a relative index score that allows for meaningful comparison and ranking among FFLs. The score will be the output of a multi-factor regression model that synthesizes several publicly available datasets that are correlated with firearms sales. The ATF itself uses criteria such as sales volume and multiple handgun sales to select FFLs for focused compliance inspections, validating the relevance of these metrics as indicators of significant commercial activity.23 The primary input factors for the model are as follows: 1. **State-Level Sales Proxies:** The FBI's National Instant Criminal Background Check System (NICS) is widely regarded as the strongest available proxy for firearms sales trends.24 The National Shooting Sports Foundation (NSSF) provides monthly adjusted NICS figures that filter out non-sales-related checks (e.g., for carry permits), offering a cleaner signal of market activity.25 The model will use year-over-year and month-over-month changes in state-level adjusted NICS checks as a foundational input to determine the overall commercial climate in an FFL's state. 2. **FFL Type and Geographic Density:** Not all FFLs are equal in commercial potential. A Type 07 Manufacturer, which can also act as a dealer, has a different business model than a Type 01 Dealer or a Type 02 Pawnbroker.6 The model will weight FFLs based on their license type. Furthermore, the total state-level sales proxy will be apportioned to individual FFLs based on the density of competing licensees within a given county or metropolitan area. 3. **ATF Firearms Trace Data:** The ATF's National Tracing Center (NTC) provides public data on firearms recovered in crimes, aggregated at the state and major city level.27 While tracing is not a direct measure of sales, studies and ATF procedures indicate that FFLs that are the source of a high number of traced crime guns, particularly those with a short "time-to-crime" (the period between the firearm's retail sale and its recovery in a crime), are often high-volume dealers.23 This data will be used as a significant weighting factor in the EASV model, increasing the score for FFLs in cities with high trace volumes. 4. **Industry and Economic Reports:** National and industry-specific reports from sources like the NSSF, Fortune Business Insights, and public filings from traded companies like Smith & Wesson provide valuable data on production numbers, market size, and overall economic trends.25 This macroeconomic data will be used to calibrate and validate the model's output against known industry-wide performance. ## 4.2. Schema Definition: `FFL_Sales_Estimates` Table This table will store the annual outputs of the estimation model for each FFL, providing the data needed for sales-based sorting, the FFL listing page charts, and the analytics dashboard. |Column Name|Data Type|Description| |---|---|---| |`estimate_id`|`BIGINT` (PK)|A unique identifier for each annual estimate record.| |`ffl_id`|`BIGINT` (FK)|A foreign key referencing the `FFL_Dealers` table.| |`year`|`INTEGER`|The calendar year for which the estimate is calculated.| |`estimated_annual_sales_volume`|`INTEGER`|The calculated EASV index score. This is a unitless value for relative ranking.| |`estimated_avg_firearms_per_sale`|`FLOAT`|An inferred metric indicating the average number of firearms per transaction. This is a key indicator for potential straw purchasing and is derived from analyses of multiple handgun sales reports (ATF Form 3310.4) and other trafficking indicators.33| |`monthly_sales_trend`|`JSON`|A JSON array of 12 numeric values representing the estimated proportional sales distribution for each month of the year. This directly supports the 12-month bar chart on the FFL listing page.5| |`regional_sales_avg`|`INTEGER`|The average EASV score for the FFL's region (e.g., county or state) for comparison.| |`national_sales_avg`|`INTEGER`|The average EASV score for all FFLs nationally for comparison.| |`confidence_score`|`FLOAT`|A score from 0.0 to 1.0 indicating the model's confidence in the estimate, based on the completeness and quality of input data available for the specific FFL and its locality.| ## 4.3. Implications of Estimation and Transparency The necessity of estimating sales volume carries significant implications for the application's credibility. Since the EASV is a derived metric and not a reported fact, transparency is paramount. Users, ranging from concerned residents to firearms industry members, must understand the basis of the figures they are seeing. The application's "About this App" modal, as specified in the PRD, must include a clear, user-friendly explanation of the sales volume estimation methodology.5 It should explicitly state that the figure is an estimate based on public data proxies and is intended for comparative purposes. The inclusion of a `confidence_score` in the database schema is a critical component of this transparency. This score allows the system to internally flag and externally communicate the inherent uncertainty in the estimation. For example, an estimate for an FFL in a major city with rich NICS and trace data would have a high confidence score, while an estimate for a rural FFL in a state with less available data might have a lower score. The user interface can use this score to visually qualify the data (e.g., using tooltips or lighter colors for low-confidence estimates). This approach positions the application not as an arbiter of absolute fact, but as a provider of data-driven, contextualized risk assessment, which is essential for maintaining trust with its diverse user base. # V. Inferred Weapon Inventory and Specializations To power the "Weapon-Type Search" feature, the dataset must categorize the types of firearms each FFL is authorized to sell, with a particular focus on those commonly used in mass shootings, as required by the PRD.5 As with sales volume, FFLs do not publicly report their specific inventory. Therefore, this information must be inferred from the most reliable available indicators: the FFL's federal license type and its Special Occupational Taxpayer (SOT) status. ## 5.1. Inference Methodology The types of weapons an FFL can legally manufacture, import, or deal in are strictly defined by their license type under the Gun Control Act (GCA) of 1968 and, for certain weapon categories, the National Firearms Act (NFA) of 1934.35 The methodology for populating an FFL's weapon specializations is a rules-based system based on these federal classifications. 1. **FFL License Type Mapping:** The primary categorization is based on the FFL type issued by the ATF. - **Standard Firearms (GCA/Title I):** Type 01 (Dealer), Type 02 (Pawnbroker), Type 07 (Manufacturer), and Type 08 (Importer) licenses authorize the holder to deal in standard firearms, which include most commercially available handguns, rifles, and shotguns.36 - **Destructive Devices (NFA/Title II):** Type 09 (Dealer), Type 10 (Manufacturer), and Type 11 (Importer) licenses are specifically for "Destructive Devices," a category that includes items like grenades, bombs, and firearms with a bore over one-half inch.35 - **Ammunition:** A Type 06 license is for manufacturing ammunition only.26 - **Collectors:** A Type 03 license is for collectors of "Curios and Relics" and does not permit the holder to engage in the business of selling firearms.37 These will be categorized separately. 2. **Special Occupational Taxpayer (SOT) Status Mapping:** An FFL's SOT status is the definitive indicator of their authority to handle NFA/Title II weapons beyond destructive devices. While SOT status is not included in the main public FFL list, it is a necessary registration for any FFL dealing in these items. The dataset will infer the likelihood of SOT status based on business names and other indicators, and this data can be augmented as more sources become available. - **Class 3 SOT:** This applies to Type 01, 02, and 09 FFLs and allows them to _deal_ in NFA items such as silencers (suppressors), short-barreled rifles (SBRs), short-barreled shotguns (SBSs), and machine guns manufactured before May 1986.40 - **Class 2 SOT:** This applies to Type 07 and 10 FFLs, allowing them to _manufacture_ and deal in NFA items.26 - **Class 1 SOT:** This applies to Type 08 and 11 FFLs, allowing them to _import_ and deal in NFA items.42 ## 5.2. Schema Definition: `Weapon_Types_Sold` and `Weapon_Categories` Tables A junction table is used to create a many-to-many relationship between FFLs and the weapon categories they are authorized to sell. ### 5.2.1. Table: `Weapon_Categories` (Lookup Table) |Column Name|Data Type|Description| |---|---|---| |`weapon_category_id`|`INTEGER` (PK)|A unique identifier for the weapon category.| |`category_name`|`VARCHAR(255)`|The human-readable name of the category (e.g., "Handgun", "Rifle", "Shotgun", "Semi-Automatic Rifle", "NFA - Silencer", "NFA - Short-Barreled Rifle", "Destructive Device").| |`is_mass_shooting_weapon`|`BOOLEAN`|A flag set to TRUE for categories frequently identified in mass shooting events, such as semi-automatic rifles styled after the AR-15 and AK-47.43| ### 5.2.2. Table: `Weapon_Types_Sold` (Junction Table) |Column Name|Data Type|Description| |---|---|---| |`ffl_id`|`BIGINT` (FK)|A foreign key referencing the `FFL_Dealers` table.| |`weapon_category_id`|`INTEGER` (FK)|A foreign key referencing the `Weapon_Categories` table.| ## 5.3. Implications of State-Level "Assault Weapon" Bans The PRD's requirement to identify dealers of "weapons... most commonly used in mass shootings" necessitates careful handling of the term "assault weapon".5 This term is not a technical firearm classification but a legal and political one, with definitions that vary significantly from one jurisdiction to another. While federal law does not currently ban the sale of common semi-automatic rifles like the AR-15 by licensed dealers, several states have enacted their own legislation prohibiting or heavily restricting their sale, such as the Protect Illinois Communities Act and Washington's Initiative 1639.45 This creates a complex legal patchwork where an FFL's federal license (e.g., Type 01 or 07) would permit the sale of such a firearm, but their state law forbids it. This legal ambiguity can create confusion even for licensees themselves.47 Therefore, the dataset cannot accurately flag an FFL as a seller of "assault weapons" based solely on its federal license type. To ensure legal accuracy and avoid misrepresenting an FFL's business practices, the application logic must incorporate state-level legal data. The system will maintain a separate list of states with active "assault weapon" bans. When a user queries for this weapon type, the application will perform a two-step check: first, it will identify FFLs with the appropriate federal license type (01, 02, 07, 08), and second, it will filter out any FFLs located in states where such sales are prohibited. This layered approach adds a crucial dimension of legal context, preventing the application from making false or misleading claims about FFLs operating in states with stricter firearms laws. # VI. Geospatial Intelligence and Community Risk Mapping Geospatial data is a fundamental component of the Gun Trafficking Tracer application, powering essential features such as location-based searches, proximity filtering, and the mapping of FFLs in relation to areas of high crime.5 This section outlines the strategy for acquiring, storing, and utilizing geographic data to build a layer of community risk intelligence. ## 6.1. Data Source and Acquisition Strategy The geospatial dataset will be constructed from three primary sources: 1. **Geocoding of Addresses:** All physical addresses in the dataset—including FFL premises, local city halls, police departments, and ATF field offices—must be converted into precise geographic coordinates (latitude and longitude). This process, known as geocoding, will be performed using a reliable, high-volume geocoding service. Options include the U.S. Census Geocoder, which is well-suited for large, batch-based jobs and is available at no cost, or commercial services like the Google Maps Geocoding API for its accuracy and robust documentation.48 The resulting coordinates are the foundation for all distance calculations and map-based visualizations. 2. **Crime Hotspot Curation:** The PRD requires a feature to find FFLs within a 100-mile radius of a "major violent crime hotspot".5 Rather than attempting to process raw, incident-level crime data, which involves complex spatial analysis techniques like Nearest Neighbor Indexing or STAC ellipses, a more pragmatic and effective approach will be used.51 A curated list of "Major Crime Hotspot Cities" will be created and maintained. This list will be populated by identifying the top 30-50 U.S. cities with the highest per-capita rates of violent crime and/or homicide, based on the FBI's annual Uniform Crime Reporting (UCR) data and corroborated by reputable journalistic and academic analyses.53 The geographic centroid (latitude and longitude) of each of these cities will be stored in the database. 3. **Government Entity Locations:** To fulfill the PRD requirement of providing contact information for local government officials, the dataset will include the locations and contact details for the city hall and police department nearest to each FFL.5 This data will be acquired through public directories and web scraping. The locations and contact information for regional ATF Field Offices are publicly available on the ATF's website and will be ingested directly.55 ## 6.2. Schema Definition: `Crime_Hotspots` and `Government_Contacts` Tables These tables store the curated geospatial data needed to provide context to FFL locations. ### 6.2.1. Table: `Crime_Hotspots` |Column Name|Data Type|Description| |---|---|---| |`hotspot_id`|`INTEGER` (PK)|A unique identifier for each curated hotspot city.| |`city_name`|`VARCHAR(255)`|The name of the city (e.g., "Memphis", "Detroit", "Baltimore").| |`state`|`VARCHAR(2)`|The two-letter state abbreviation.| |`latitude`|`DOUBLE`|The geographic latitude of the city's center.| |`longitude`|`DOUBLE`|The geographic longitude of the city's center.| |`violent_crime_rate_per_100k`|`FLOAT`|The city's violent crime rate per 100,000 residents, from the latest FBI UCR data.53| |`homicide_rate_per_100k`|`FLOAT`|The city's homicide rate per 100,000 residents, from the latest FBI UCR data.54| |`data_year`|`INTEGER`|The year of the UCR data used for the rates.| ### 6.2.2. Table: `Government_Contacts` |Column Name|Data Type|Description| |---|---|---| |`contact_id`|`BIGINT` (PK)|A unique identifier for each government contact record.| |`ffl_id`|`BIGINT` (FK)|A foreign key referencing the `FFL_Dealers` table, linking this contact to a specific FFL.| |`entity_type`|`ENUM`|The type of government entity. Values: 'Local Police Dept', 'City Hall', 'ATF Field Office'.| |`entity_name`|`VARCHAR(255)`|The official name of the entity (e.g., "Memphis Police Department", "City of Detroit - City Hall").| |`phone_number`|`VARCHAR(20)`|The public contact phone number for the entity.| |`email_address`|`VARCHAR(255)`|The public contact email address for the entity.| |`address_full`|`TEXT`|The full mailing address of the entity.| ## 6.3. Implications of the Curated Hotspot Approach The decision to use a curated list of high-crime cities as proxies for regional hotspots is a strategic one, designed to balance functionality with technical feasibility. While academic methods for identifying micro-level hotspots are powerful, they require access to and processing of massive, granular crime datasets that are not uniformly available nationwide and would introduce significant complexity to the application's backend.51 The curated list approach, by contrast, directly and efficiently addresses the user journey specified in the PRD: finding FFLs within a wide radius of a known high-crime area.5 The implementation is simplified to a series of straightforward geospatial queries that calculate the Haversine distance between each FFL's coordinates and the coordinates of the 30-50 cities in the `Crime_Hotspots` table. This method is computationally efficient, scalable, and easily maintainable. The list of hotspot cities can be updated annually upon the release of new FBI UCR data, ensuring the feature remains relevant over time. This approach delivers the intended functionality to the user without incurring the substantial data engineering and processing overhead of a real-time national crime mapping system. # VII. User Engagement and Action-Oriented Data A key objective of the Gun Trafficking Tracer application is to empower users to take action.5 This is facilitated through features that allow for community feedback and provide tools for reporting suspicious activity. This section defines the data structures necessary to support these user-generated content features while maintaining a clear distinction from the verified, government-sourced data that forms the core of the dataset. ## 7.1. Schema Definition: `User_Reviews` Table This table is designed to store community-contributed reviews and ratings for each FFL, as well as the AI-generated violation summary specified in the PRD.5 |Column Name|Data Type|Description| |---|---|---| |`review_id`|`BIGINT` (PK)|A unique identifier for each user review.| |`ffl_id`|`BIGINT` (FK)|A foreign key referencing the `FFL_Dealers` table.| |`user_id`|`BIGINT` (FK)|A foreign key referencing a `Users` table (to be implemented for user authentication).| |`rating`|`INTEGER`|A star rating provided by the user, from 1 to 5.| |`review_text`|`TEXT`|The free-text content of the user's review.| |`ai_generated_summary`|`TEXT`|A field to store the AI-generated summary of the FFL's violation history, created on-demand for the user as per the PRD feature.5| |`submission_date`|`TIMESTAMP`|The date and time the review was submitted.| |`review_status`|`ENUM`|A status field for moderation purposes. Values: 'Pending', 'Approved', 'Rejected'.| ## 7.2. Schema Definition: `Suspicious_Activity_Reports` Table This table captures data submitted by users through the "Report Suspicious Activity" modal. It is designed as an internal logging mechanism, not a direct reporting tool to law enforcement. |Column Name|Data Type|Description| |---|---|---| |`report_id`|`BIGINT` (PK)|A unique identifier for each report.| |`ffl_id`|`BIGINT` (FK)|A foreign key referencing the `FFL_Dealers` table.| |`user_id`|`BIGINT` (FK)|A foreign key referencing the `Users` table.| |`activity_type`|`ENUM`|A predefined category of suspicious activity selected by the user (e.g., 'Straw Purchase Attempt', 'Unlicensed Dealing', 'Suspicious Behavior', 'Other').| |`report_details`|`TEXT`|Additional details provided by the user in a free-text field.| |`report_date`|`TIMESTAMP`|The date and time the report was created within the application.| ## 7.3. Implications for Handling User-Generated Content The introduction of user-generated content into a dataset built primarily on official government sources necessitates careful architectural and legal considerations. User reviews and reports are inherently subjective, potentially biased, and unverified. If handled improperly, this content could be used to harass legitimate businesses, exposing the application and its operators to significant legal liability. To mitigate this risk, the database schema must enforce a strict separation between verified data and user-generated content. The application's backend must include a robust moderation workflow. The `review_status` field in the `User_Reviews` table is a critical element of this workflow, ensuring that all community submissions are reviewed by a moderator before being made public. Furthermore, the "Report Suspicious Activity" feature must be designed with legal boundaries in mind. The application itself should not act as an intermediary that forwards reports to law enforcement. Doing so could classify the application as a mandatory reporter and create complex legal obligations. Instead, the feature should function as a tool to help the user structure their own report. After the user fills out the form, the application should generate a formatted summary of the information and provide the user with the direct contact details for the appropriate authorities, such as the local police department and the ATF's tip line.1 The final act of submission must be performed by the user, thereby placing the legal responsibility for making the report on the individual, not the application provider. This design empowers the user to act while protecting the platform from undue legal risk. # VIII. Integrated Data Schema and Recommendations The data architecture outlined in this document provides a comprehensive and cohesive foundation for the Gun Trafficking Tracer application. By integrating core FFL profiles, regulatory compliance data, estimated commerce analytics, inferred weapon specializations, and geospatial intelligence, the final relational model creates a powerful tool for analysis and public awareness. The tables—`FFL_Dealers`, `FFL_Inspections`, `FFL_Violations`, `FFL_Sales_Estimates`, `Weapon_Types_Sold`, `Crime_Hotspots`, `Government_Contacts`, and `User_Reviews`—are designed to interrelate, enabling the complex queries and data visualizations required to fulfill the application's mission. ## 8.1. Data Maintenance and Refresh Schedule To ensure the data remains timely and relevant, a structured maintenance and refresh schedule is essential. This schedule should be automated to the greatest extent possible. - **Monthly:** The automated script should download and ingest the latest "Complete Federal Firearms Listings" from the ATF to update the core `FFL_Dealers` table.7 Concurrently, the script should download and process the latest ATF "Firearms Compliance Inspection Results" summary spreadsheets.14 - **Annually:** Upon the final release of the FBI's Uniform Crime Reporting (UCR) data for the preceding year, the curated `Crime_Hotspots` list should be reviewed and updated to reflect any changes in national violent crime trends. - **Annually:** The Estimated Annual Sales Volume (EASV) model should be re-run using the full dataset from the previous year's adjusted NICS figures and any new industry commerce reports to refresh all sales-related estimates. ## 8.2. Recommendations for Future Enhancements While the proposed architecture is comprehensive, several opportunities exist for future enhancement as new data becomes available. - **Integration of FFL Theft/Loss Data:** The ATF requires FFLs to report all firearms that are lost or stolen from their inventory using ATF Form 3310.11.28 This data is collected by the NTC and represents a powerful indicator of a licensee's security practices and potential for diversion.28 While not currently published at an individual FFL level, this information could potentially be obtained via targeted FOIA requests. Integrating this data would add a critical new dimension to the risk profile of each FFL. - **Time-to-Crime Analytics:** "Time-to-crime" is a key metric used by the ATF to identify firearms traffickers.23 Public data is currently aggregated at the state level.27 Gaining access to FFL-specific trace data is legally and technically challenging due to statutory restrictions. However, a future enhancement could involve partnering with academic or research institutions that have been granted access to anonymized trace data to develop a "crime gun trace risk score" that could be incorporated into the dataset. - **State-Level Licensing and Regulatory Data:** The current model infers weapon sale authorizations based on federal licenses. A significant enhancement would be to incorporate state-specific firearms dealer licensing requirements and sales restrictions. For example, integrating the specific definitions and prohibitions from states like New York, California, and Washington would allow for much greater accuracy in determining which FFLs can legally sell certain categories of firearms, such as "assault weapons".46 This would add a valuable layer of granularity and legal precision to the "Weapon-Type Search" feature.