# Data Scraping
Can you write a tool in Python that I can run from a Jupyter notebook environment such as Colab? Deliver the code as an .ipynb file that I can easily import into Colab. Break the code into modular cells that are easy to re-run.
The goal is to download the PDFs referenced as the "ATF Inspection Report". The tool must visit the Gun Store Transparency Project site at https://gunstoretransparency.org/ and download all the ATF Reports (also known as Inspection Reports) listed for every US state into a Google Cloud Storage bucket. For example, this page lists the first 30 inspection reports for the state of MA: https://gunstoretransparency.org/?zip%5Bdistance%5D%5Bfrom%5D=50&zip_op=1&state=MA&table-page=1.
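As a starting point for paging through the listings, here is a minimal sketch that rebuilds the example URL above for any state and page. The query parameters are copied from that single example; whether all of them are required, and whether `table-page` starts at 1, are assumptions. The function name `build_listing_url` is illustrative.

```python
from urllib.parse import urlencode

BASE_URL = "https://gunstoretransparency.org/"


def build_listing_url(state: str, page: int = 1) -> str:
    """Return the listing URL for one state and one results page."""
    params = {
        "zip[distance][from]": 50,  # copied from the example URL
        "zip_op": 1,                # copied from the example URL
        "state": state.upper(),     # 2-letter state abbreviation
        "table-page": page,         # assumed to be 1-indexed
    }
    # urlencode percent-encodes the square brackets, as in the example URL.
    return f"{BASE_URL}?{urlencode(params)}"


print(build_listing_url("MA", 1))
```

Calling `build_listing_url("MA", 1)` reproduces the example URL exactly, so a scraper could increment `page` until a listing comes back empty.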
Store each report in a directory named for the state and business associated with the report, using the following structure: ROOT FOLDER/State 2-letter abbreviation/FFL Business Name/.
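The folder structure above could be mapped to Cloud Storage object names along these lines. This is a sketch: the helper names, the rule of replacing `/` inside business names, and the sample root folder and file name are all assumptions made for illustration.

```python
import re


def safe_segment(name: str) -> str:
    """Make a string usable as a single path segment."""
    # A "/" inside a business name would otherwise create an extra folder level.
    cleaned = name.strip().replace("/", "-")
    # Collapse runs of whitespace; fall back to a placeholder if nothing is left.
    return re.sub(r"\s+", " ", cleaned) or "UNKNOWN"


def build_object_path(root: str, state: str, business_name: str, filename: str) -> str:
    """Build ROOT FOLDER/State 2-letter abbreviation/FFL Business Name/filename."""
    parts = [
        root.strip("/"),
        state.strip().upper(),
        safe_segment(business_name),
        safe_segment(filename),
    ]
    return "/".join(part for part in parts if part)


print(build_object_path("atf-reports", "ma", "Focal Point Technologies", "report.pdf"))
```

Cloud Storage has no real directories, so the slashes in the object name are what produce the folder view in the console.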
The code must provide inputs for:
* Google Cloud Storage bucket name
* Which state to download reports from, including an option to download from all states
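The two inputs above might be exposed in a single configuration cell like the sketch below. The bucket name is a placeholder, the `ALL` keyword is one possible convention, and the state list covers the 50 states only; whether the site also lists DC or territories is not confirmed here. The `#@param` comments are Colab form annotations and are ignored elsewhere.

```python
BUCKET_NAME = "my-atf-reports-bucket"  #@param {type:"string"}  (placeholder name)
STATE = "ALL"  #@param {type:"string"}  (2-letter abbreviation, or ALL)

US_STATES = (
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY"
).split()


def resolve_states(selection: str) -> list[str]:
    """Turn the STATE input into the list of states to scrape."""
    choice = selection.strip().upper()
    if choice == "ALL":
        return list(US_STATES)
    if choice in US_STATES:
        return [choice]
    raise ValueError(f"Unknown state selection: {selection!r}")


print(len(resolve_states(STATE)), "state(s) selected")
```

Validating the selection up front means a typo fails in the configuration cell rather than partway through a long download run.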
The code must support authentication of the Google Cloud user that has access to the bucket.
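One way to handle authentication in Colab is the interactive sign-in from `google.colab.auth`, followed by a `google-cloud-storage` client. The sketch below assumes the notebook runs in Colab with that library installed and that the user supplies a project ID; the function names `get_bucket` and `upload_pdf` are illustrative.

```python
def get_bucket(bucket_name: str, project_id: str):
    """Authenticate the Colab user and return a Cloud Storage bucket handle."""
    # Imported lazily so this cell can still be defined outside Colab.
    from google.colab import auth
    from google.cloud import storage

    auth.authenticate_user()  # opens the interactive Google sign-in prompt
    client = storage.Client(project=project_id)
    return client.bucket(bucket_name)


def upload_pdf(bucket, object_path: str, pdf_bytes: bytes) -> None:
    """Upload one downloaded report to the bucket under object_path."""
    blob = bucket.blob(object_path)
    blob.upload_from_string(pdf_bytes, content_type="application/pdf")
```

In a notebook, `bucket = get_bucket(BUCKET_NAME, PROJECT_ID)` would run once in its own cell (both names being user-supplied inputs), and `upload_pdf` would then be called per report.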
The code should report the total number of reports identified, broken down by state and FFL business, along with the number of successful and failed downloads to the bucket.
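The summary described above could be tallied from a list of per-report outcomes. This sketch assumes each outcome is a dict with hypothetical keys `state`, `business`, and `ok`; the sample records are made up for illustration.

```python
from collections import defaultdict


def summarize(results):
    """Count total, succeeded, and failed downloads per (state, business)."""
    summary = defaultdict(lambda: {"total": 0, "succeeded": 0, "failed": 0})
    for result in results:
        row = summary[(result["state"], result["business"])]
        row["total"] += 1
        row["succeeded" if result["ok"] else "failed"] += 1
    return dict(summary)


def print_summary(summary) -> None:
    """Print one line per state and business, sorted by state then name."""
    for (state, business), row in sorted(summary.items()):
        print(f"{state} | {business} | total={row['total']} "
              f"ok={row['succeeded']} failed={row['failed']}")


# Made-up outcomes, for illustration only.
sample = [
    {"state": "MA", "business": "Shop A", "ok": True},
    {"state": "MA", "business": "Shop A", "ok": False},
    {"state": "NH", "business": "Shop B", "ok": True},
]
print_summary(summarize(sample))
```

Recording an outcome for every attempted download, including failures, keeps the totals consistent with the number of reports found on the site.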
References
* https://colab.research.google.com/drive/1pR_ME4RFhZbBS1-53nU6qMadrH97KTH8#scrollTo=3d3eee8e
* https://gemini.google.com/app/3ea05293a8282d0f
# Examples of ATF Reports
https://gunstoretransparency.org/gun-store/focal-point-technologies
# Gemini Deep Research
Build a dataset to help combat gun trafficking by enabling users such as concerned residents, policy makers, elected officials, and members of the firearms industry to identify Federal Firearms License (FFL) dealers that may contribute to the trafficking of firearms, and by providing tools to take action and reduce illegal gun trafficking.
The data needs to support the app user experience and capabilities as defined in the attached PRD.
Guidelines
* Prioritize looking for data in official public sources like the ATF
* Maintain the lineage of all the data that will be used by the app. It should be easy to confirm the source of data.
* The output format must be CSV and must support future updates
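To make the lineage and update guidelines concrete, here is a sketch of a CSV where every row carries its source and retrieval date, and where re-running the pipeline replaces rows by key instead of duplicating them. The column names (`record_id`, `source_url`, `retrieved_at`, and so on) are assumptions; the real schema would follow the PRD.

```python
import csv
import io

# Hypothetical schema: the last two columns carry the lineage of each row.
FIELDS = ["record_id", "business_name", "state", "source_url", "retrieved_at"]


def upsert_csv(existing_csv: str, new_rows: list[dict], key: str = "record_id") -> str:
    """Merge new_rows into existing CSV text, replacing rows that share a key."""
    rows = {}
    if existing_csv.strip():
        for row in csv.DictReader(io.StringIO(existing_csv)):
            rows[row[key]] = {field: row.get(field, "") for field in FIELDS}
    for row in new_rows:
        # A newer row with the same key overwrites the older one.
        rows[row[key]] = {field: row.get(field, "") for field in FIELDS}

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows.values())
    return out.getvalue()
```

Keeping `source_url` and `retrieved_at` on each row lets anyone trace a value back to where and when it was collected, and keying on a stable identifier lets later runs refresh the file in place.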