Environment Setup

  • An Ubuntu server
  • scrapyd
  • scrapyd-client
  • A working Scrapy spider project

Getting Started

Install the packages

    pip install scrapyd
    pip install scrapyd-client
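
As an optional sanity check (assuming a typical PATH setup), both commands used in the following steps should now resolve:

    which scrapyd scrapyd-deploy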

Start the scrapyd server

Run the scrapyd command directly in a terminal window, and you will see output like the following:

    2017-06-23T12:05:35+0800 [-] Loading /Users/brucedone/anaconda/envs/scrapy_project/lib/python2.7/site-packages/scrapyd/txapp.py...
    2017-06-23T12:05:36+0800 [-] Scrapyd web console available at http://127.0.0.1:6800/
    2017-06-23T12:05:36+0800 [-] Loaded.
    2017-06-23T12:05:36+0800 [twisted.scripts._twistd_unix.UnixAppLogger#info] twistd 16.5.0 (/Users/brucedone/anaconda/envs/scrapy_project/bin/python 2.7.12) starting up.
    2017-06-23T12:05:36+0800 [twisted.scripts._twistd_unix.UnixAppLogger#info] reactor class: twisted.internet.selectreactor.SelectReactor.
    2017-06-23T12:05:36+0800 [-] Site starting on 6800
    2017-06-23T12:05:36+0800 [twisted.web.server.Site#info] Starting factory <twisted.web.server.Site instance at 0x106da50e0>
    2017-06-23T12:05:36+0800 [Launcher] Scrapyd 1.2.0 started: max_proc=32, runner=u'scrapyd.runner'

As shown above, the scrapyd server is now running and listening on port 6800 of the current machine. Open http://127.0.0.1:6800 in a local browser to see the scrapyd web console.
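
Besides the web console, you can also hit the JSON API directly. For example, daemonstatus.json (available in scrapyd 1.2+, which matches the version in the log above) reports how many jobs are pending, running, and finished; the response should look roughly like this (node_name will be your machine's hostname):

    curl http://127.0.0.1:6800/daemonstatus.json
    {"node_name": "your-hostname", "status": "ok", "pending": 0, "running": 0, "finished": 0}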

Deploy the spider

Switch to the directory that contains scrapy.cfg and open the file:

    # Automatically created by: scrapy startproject
    #
    # For more information about the [deploy] section see:
    # https://scrapyd.readthedocs.org/en/latest/deploy.html
    
    [settings]
    default = zara_main.settings
    
    [deploy]
    url = http://127.0.0.1:6800/
    project = zara_main

Look at the url under the [deploy] section: it points directly to the host and port where scrapyd is running. Suppose scrapyd is running on machine A at 192.168.0.1 and you want to deploy from machine B; then the url here should be http://192.168.0.1:6800 (and make sure machine B can actually reach that address). If you deploy to more than one host, you can also name the target, as sketched below.
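scrapyd-client supports named deploy targets in scrapy.cfg; a minimal sketch, reusing the machine A address and the project name from above:

    [deploy:production]
    url = http://192.168.0.1:6800/
    project = zara_main

With a named target you deploy via scrapyd-deploy production instead of the bare scrapyd-deploy, and scrapyd-deploy -l lists all configured targets.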

Run the command

    scrapyd-deploy

OK, you should now see the project deploy successfully, and from this point you can operate your spider through the scrapyd API.
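
To confirm the project actually arrived on the server, the listprojects.json and listspiders.json endpoints should now include it (the project name here is the zara_main from the config above):

    curl http://localhost:6800/listprojects.json
    curl http://localhost:6800/listspiders.json?project=zara_main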

Using scrapyd

    $ curl http://localhost:6800/schedule.json -d project=myproject -d spider=spider2
    {"status": "ok", "jobid": "26d1b1a6d6f111e0be5c001e648c57f8"}

For more API commands, see ==>this link<==