Scrapy 2.14 documentation

Scrapy is a fast, high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Getting help

Having trouble? We'd like to help!

First steps

Scrapy at a glance

Understand what Scrapy is and how it can help you.

Installation guide

Get Scrapy installed on your computer.

Scrapy Tutorial

Write your first Scrapy project.

Examples

Learn more by playing with a pre-made Scrapy project.

Basic concepts

Command line tool

Learn about the command-line tool used to manage your Scrapy project.

Spiders

Write the rules to crawl your websites.

Selectors

Extract the data from web pages using XPath.

Scrapy shell

Test your extraction code in an interactive environment.

Items

Define the data you want to scrape.

Item Loaders

Populate your items with the extracted data.

Item Pipeline

Post-process and store your scraped data.

Feed exports

Output your scraped data using different formats and storages.

Requests and Responses

Understand the classes used to represent HTTP requests and responses.

Link Extractors

Convenient classes to extract links to follow from pages.

Settings

Learn how to configure Scrapy and see all available settings.

Exceptions

See all available exceptions and their meaning.

Built-in services

Logging

Learn how to use Python's built-in logging on Scrapy.

Stats Collection

Collect statistics about your scraping crawler.

Sending e-mail

Send email notifications when certain events occur.

Telnet Console

Inspect a running crawler using a built-in Python console.

Solving specific problems

Frequently Asked Questions

Get answers to most frequently asked questions.

Debugging Spiders

Learn how to debug common problems of your Scrapy spider.

Spiders Contracts

Learn how to use contracts for testing your spiders.

Common Practices

Get familiar with some Scrapy common practices.

Broad Crawls

Tune Scrapy for crawling a lot of domains in parallel.

Using your browser's Developer Tools for scraping

Learn how to scrape with your browser's developer tools.

Selecting dynamically-loaded content

Read webpage data that is loaded dynamically.

Debugging memory leaks

Learn how to find and get rid of memory leaks in your crawler.

Downloading and processing files and images

Download files and/or images associated with your scraped items.

Deploying Spiders

Deploy your Scrapy spiders and run them on a remote server.

AutoThrottle extension

Adjust crawl rate dynamically based on load.

Benchmarking

Check how Scrapy performs on your hardware.

Jobs: pausing and resuming crawls

Learn how to pause and resume crawls for large spiders.

Coroutines

Use the coroutine syntax.

asyncio

Use asyncio and asyncio-powered libraries.

Extending Scrapy

Architecture overview

Understand the Scrapy architecture.

Add-ons

Enable and configure third-party extensions.

Downloader Middleware

Customize how pages get requested and downloaded.

Spider Middleware

Customize the input and output of your spiders.

Extensions

Extend Scrapy with your custom functionality.

Signals

See all available signals and how to work with them.

Scheduler

Understand the scheduler component.

Item Exporters

Quickly export your scraped items to a file (XML, CSV, etc).

Download handlers

Customize how requests are downloaded or add support for new URL schemes.

Components

Learn the common API and some good practices when building custom Scrapy components.

Core API

Use it on extensions and middlewares to extend Scrapy functionality.

All the rest

Release notes

See what has changed in recent Scrapy versions.

Contributing to Scrapy

Learn how to contribute to the Scrapy project.

Versioning and API stability

Understand Scrapy versioning and API stability.