Prerequisites
- An Ubuntu server
- scrapyd
- scrapyd-client
- A working Scrapy spider project
Getting started
Install the packages:
```shell
pip install scrapyd
pip install scrapyd-client
```
Start the scrapyd server
Run the `scrapyd` command in a terminal window and you will get output like the following:
```
2017-06-23T12:05:35+0800 [-] Loading /Users/brucedone/anaconda/envs/scrapy_project/lib/python2.7/site-packages/scrapyd/txapp.py...
2017-06-23T12:05:36+0800 [-] Scrapyd web console available at http://127.0.0.1:6800/
2017-06-23T12:05:36+0800 [-] Loaded.
2017-06-23T12:05:36+0800 [twisted.scripts._twistd_unix.UnixAppLogger#info] twistd 16.5.0 (/Users/brucedone/anaconda/envs/scrapy_project/bin/python 2.7.12) starting up.
2017-06-23T12:05:36+0800 [twisted.scripts._twistd_unix.UnixAppLogger#info] reactor class: twisted.internet.selectreactor.SelectReactor.
2017-06-23T12:05:36+0800 [-] Site starting on 6800
2017-06-23T12:05:36+0800 [twisted.web.server.Site#info] Starting factory <twisted.web.server.Site instance at 0x106da50e0>
2017-06-23T12:05:36+0800 [Launcher] Scrapyd 1.2.0 started: max_proc=32, runner=u'scrapyd.runner'
As shown above, the scrapyd server is now running and listening on port 6800 of the current machine. Open http://127.0.0.1:6800 in a local browser
and you will see the scrapyd web console.
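Before deploying (especially from another machine), it can help to confirm that the scrapyd port is actually reachable. Below is a small helper sketch; the function `scrapyd_is_up` is my own name, not part of scrapyd:

```python
import socket

# Hypothetical helper (not part of scrapyd): returns True if something
# is listening on host:port, False otherwise.
def scrapyd_is_up(host="127.0.0.1", port=6800, timeout=2.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: check the default scrapyd address.
if scrapyd_is_up():
    print("scrapyd is reachable on 6800")
else:
    print("nothing listening on 6800")
```

The same check works against a remote scrapyd by passing its IP, e.g. `scrapyd_is_up("192.168.0.1")`.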
Deploy the spider
Change into the directory that contains the scrapy.cfg file and open scrapy.cfg:
```ini
# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# https://scrapyd.readthedocs.org/en/latest/deploy.html

[settings]
default = zara_main.settings

[deploy]
url = http://127.0.0.1:6800/
project = zara_main
```
Look at the url under [deploy]: it points at the machine and port where your scrapyd server is running. Suppose scrapyd is running on machine A at 192.168.0.1 and you want to deploy from machine B; then the url here should be http://192.168.0.1:6800
(and make sure machine A is reachable from machine B).
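Since scrapy.cfg is a standard INI file, you can also read the [deploy] section programmatically, for example to double-check the url and project before running scrapyd-deploy. A minimal sketch, where the config text mirrors the example above with the url pointed at machine A:

```python
from configparser import ConfigParser

# Mirrors the [deploy] section shown above, with the url pointed at
# machine A (192.168.0.1) as in the two-machine example.
cfg_text = """
[settings]
default = zara_main.settings

[deploy]
url = http://192.168.0.1:6800/
project = zara_main
"""

parser = ConfigParser()
parser.read_string(cfg_text)

# Read back the deploy target scrapyd-deploy will use.
deploy_url = parser.get("deploy", "url")
project = parser.get("deploy", "project")
print(deploy_url, project)
```

In a real project you would call `parser.read("scrapy.cfg")` instead of embedding the text.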
Then run the command:

```shell
scrapyd-deploy
```
If it succeeds, the project is deployed, and you can now operate your spider through the scrapyd API.
Using scrapyd

```shell
$ curl http://localhost:6800/schedule.json -d project=myproject -d spider=spider2
{"status": "ok", "jobid": "26d1b1a6d6f111e0be5c001e648c57f8"}
```
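The same schedule.json call can be made from Python with only the standard library. A sketch, assuming scrapyd is running at localhost:6800; the sample response parsed at the end is the one from the curl output above:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def schedule(project, spider, base="http://localhost:6800"):
    """POST to scrapyd's schedule.json, same parameters as the curl call above."""
    data = urlencode({"project": project, "spider": spider}).encode()
    with urlopen(base + "/schedule.json", data) as resp:
        return json.load(resp)

# Needs a running scrapyd, so it is left commented out here:
#   result = schedule("myproject", "spider2")

# The response is JSON like the sample above; parsing it gives the job id:
sample = '{"status": "ok", "jobid": "26d1b1a6d6f111e0be5c001e648c57f8"}'
result = json.loads(sample)
print(result["status"], result["jobid"])
```

The returned jobid is what you use to track or cancel the run later (e.g. via cancel.json).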
For more API endpoints, see the scrapyd documentation.