一,python爬取性感小姐姐写真全套
52pojie有个热帖,通过python爬取网站“https://www.2meinv.com/” 的小姐姐。爬取没有用到多线程,也没有用到什么异步之类的。代码分享给大家做个参考,这里默认爬取只爬取了10页数据,存盘路径在 path = “G:/spider/image/2meinv/”
代码如下:
import os
import re
import time
from urllib import request
from bs4 import BeautifulSoup
def get_last_page(text):
return int(re.findall('[^/$]\d*', re.split('/', text)[-1])[0])
def html_parse(url, headers):
time.sleep(3)
resp = request.Request(url=url, headers=headers)
res = request.urlopen(resp)
html = res.read().decode("utf-8")
soup = BeautifulSoup(html, "html.parser", from_encoding="utf-8")
return soup
headers = {
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36 Edg/91.0.864.59'
}
url = "https://www.2meinv.com/"
for p in range(1, 10 + 1):
next_url = url + "index-" + str(p) + ".html"
soup = html_parse(next_url, headers)
link_node = soup.findAll('div', attrs={"class": "dl-name"})
for a in link_node:
path = "G:/spider/image/2meinv/"
href = a.find('a', attrs={'target': '_blank'}).get('href')
no = re.findall('[^-$][\d]', href)[1] + re.findall('[^-$][\d]', href)[2]
first_url = url + "/article-" + no + ".html"
title = a.find('a', attrs={'target': '_blank'}).text
path = path + title + "/"
soup = html_parse(href, headers)
count = soup.find('div', attrs={'class': 'des'}).find('h1').text
last_page = get_last_page(count)
for i in range(1, last_page + 1):
next_url = url + "/article-" + no + "-" + str(i) + ".html"
soup = html_parse(next_url, headers)
image_url = soup.find('img')['src']
image_name = image_url.split("/")[-1]
fileName = path + image_name
if not os.path.exists(path):
os.makedirs(path)
if os.path.exists(fileName):
continue
request.urlretrieve(image_url, filename=fileName)
request.urlcleanup()
print(title, "下载完成了")
使用的使用需要安装BeautifulSoup这个依赖。安装的方式参考:Windows安装pip方法,添加bs4环境以及用法
原贴地址:https://www.52pojie.cn/thread-1665835-1-1.html
二,如何运行一个python程序
方法一:Windows下的 Python IDLE界面
打开IDLE,输入python代码
例如:print(‘zahui.top’)
回车就可以显示结果:zahui.top
方法二:使用记事本运行python代码
打开记事本,将python代码复制到记事本,然后将记事本文件重命名为*.py
打开IDLE,点击file→open→选择保存的py类型的文件打开。点击run→Run Module,或者按F5。