Scraping the comments on Bilibili's "冰冰vlog.001" and making a word cloud

I'm a Python beginner and did this purely for fun (doge).

Library preparation (installing from the Tsinghua mirror is recommended)

requests
bs4
jieba
wordcloud
imageio
matplotlib

Detailed steps

Scraping the comments

The code is pasted as-is; study it at your own pace.

    import json
    import time

    import requests
    from bs4 import BeautifulSoup  # imported in the original; not used below


    def get_html(url):
        """Fetch the URL and return the response body as text."""
        headers = {
            'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
        }
        r = requests.get(url, timeout=30, headers=headers)
        r.raise_for_status()
        r.encoding = 'utf-8'
        return r.text


    def get_content(url):
        """Parse one page of the reply API into a list of comment dicts."""
        comments = []
        html = get_html(url)
        try:
            s = json.loads(html)
        except ValueError:
            print("jsonload error")
            raise  # let the main loop stop the crawl
        # If the page has no list of replies, iterating fails and the
        # exception ends the crawl in the main loop.
        for comment in s['data']['replies']:
            InfoDict = {}
            InfoDict['Uname'] = comment['member']['uname']
            InfoDict['Like'] = comment['like']
            InfoDict['Content'] = comment['content']['message']
            # ctime is a Unix timestamp; format it in local time.
            InfoDict['Time'] = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(comment['ctime']))
            comments.append(InfoDict)
        return comments


    def Out2File(comments):
        """Append the comments of one page to BiliBiliComments.txt."""
        with open('BiliBiliComments.txt', 'a+', encoding='utf-8') as f:
            for comment in comments:
                try:
                    f.write('Name:{}\t Likes:{}\t \n Comment:{}\t Time:{}\t \n '.format(
                        comment['Uname'], comment['Like'], comment['Content'], comment['Time']))
                    f.write("-----------------\n")
                except Exception:
                    print("out2File error")
            print('Current page written')


    if __name__ == '__main__':
        page = 1
        while True:
            url = "https://api.bilibili.com/x/v2/reply?pn=" + str(page) + "&type=1&oid=800760067&sort=2"
            try:
                content = get_content(url)
            except Exception:
                # Any failure (network error, bad JSON, missing replies) ends the crawl.
                break
            if not content:
                # An empty page means there is nothing left to fetch.
                break
            print("page:", page)
            Out2File(content)
            page = page + 1
            # To lower the risk of an IP ban, rest 5 seconds every 10 pages.
            if page % 10 == 0:
                time.sleep(5)

Generating the word cloud

This step needs an image first. Find any image online and put it in the same folder as the .py file.
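One thing to watch for: the file written by Out2File holds user names, like counts and timestamps next to the comment text, so all of them are fed into keyword extraction. Below is a minimal sketch of one way around this, assuming the list of dicts returned by get_content (each with a 'Content' key) is available. The helper save_contents and the file name CommentsOnly.txt are hypothetical and not part of the original script.

```python
def save_contents(comments, path):
    """Append only the comment text to `path`, one comment per line.

    `comments` is assumed to be a list of dicts with a 'Content' key,
    shaped like the ones get_content() builds in the crawler.
    """
    with open(path, 'a', encoding='utf-8') as f:
        for comment in comments:
            # Collapse line breaks and repeated spaces so that each
            # comment stays on a single line.
            text = ' '.join(comment['Content'].split())
            f.write(text + '\n')
```

Calling save_contents(content, 'CommentsOnly.txt') next to Out2File(content) in the main loop would give a second file that could be used as the text input for the word cloud instead of BiliBiliComments.txt.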

The code is attached (change the text file and image names to your own).

    import jieba.analyse
    import matplotlib.pyplot as plt
    from imageio import imread
    from wordcloud import WordCloud, ImageColorGenerator  # ImageColorGenerator is not used below


    class wc:
        def __init__(self, txt_file, img_file, font_file):
            # Read the whole comment file.
            with open(txt_file, encoding='utf-8') as f:
                self.txt = f.read()
            # Keep the 100 keywords that jieba ranks highest (TF-IDF).
            self.tags = jieba.analyse.extract_tags(self.txt, topK=100)
            self.text = ' '.join(self.tags)
            # The image serves as the mask that shapes the cloud.
            self.img = imread(img_file)
            # A font with Chinese glyphs is needed to render the words.
            self.wc = WordCloud(font_path=font_file, background_color='white',
                                max_words=100, mask=self.img, max_font_size=80)
            self.word_cloud = self.wc.generate(self.text)

        def show_wc(self):
            plt.imshow(self.word_cloud)
            plt.axis("off")
            plt.show()


    if __name__ == '__main__':
        mywc = wc('BiliBiliComments.txt', 'u=2959490536,2877096479&fm=26&gp=0.jpg', 'simsun.ttc')
        mywc.show_wc()

As always, Bingbing is really cute (doge).
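To reuse the crawler for another video, the request URL can be built from a parameter dict rather than by string concatenation. This is a rough sketch using only the standard library. The helper reply_url is hypothetical, and treating oid as the value that identifies the video is an assumption based on how the URL is used in the crawler, not something the API documentation was checked for.

```python
from urllib.parse import urlencode

API = "https://api.bilibili.com/x/v2/reply"


def reply_url(page, oid=800760067):
    """Build the reply API URL for one page of comments.

    The parameter names and the default values are copied from the URL
    used in the crawler; `oid` is assumed to select the video.
    """
    params = {'pn': page, 'type': 1, 'oid': oid, 'sort': 2}
    return API + '?' + urlencode(params)


if __name__ == '__main__':
    print(reply_url(1))
```

Since requests.get also accepts a params argument, the same dict could be passed to it directly instead of building the string by hand.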

