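Danmaku (bullet-comment) data: how users feel while they are watching
Comment data: how users feel after they have finished watching
x-axis: time in minutes
y-axis: danmaku count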
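Date: collection date. The data was collected on 2019.01.03, so it contains three days of danmaku
Chapter: which part of the show. The Bilibili New Year's Eve gala has three parts, each scheduled for 60 min
VideoTime: playback position within the Chapter (seconds relative to the start of that part)
SenderId: anonymous ID of the danmaku sender
DanMuContent: text content of the danmaku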
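To make the field list concrete, here is a minimal sketch of a table with the same five columns. Every value in it is made up for illustration (including the Date format, which the source does not show), and `sample` and `expected_columns` are hypothetical names, not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample rows; all values are invented for illustration only.
sample = pd.DataFrame(
    {
        "Date": ["2019.01.03", "2019.01.03", "2019.01.03"],  # assumed format
        "Chapter": [1, 1, 2],
        "VideoTime": ["12.5", "61.0", "3.25"],  # seconds since the chapter started
        "SenderId": ["a1b2", "c3d4", "a1b2"],
        "DanMuContent": ["哈哈哈", "新年快乐", "爷青回"],
    }
)

# Check that every column described above is present before analysing.
expected_columns = ["Date", "Chapter", "VideoTime", "SenderId", "DanMuContent"]
missing = [c for c in expected_columns if c not in sample.columns]
print("missing columns:", missing)
print(sample.dtypes)
```

VideoTime is stored as strings here on purpose, since the later code converts it to a float before using it.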
import pandas as pd

df = pd.read_csv('data/弹幕new.csv')
# Drop duplicate rows
df.drop_duplicates(inplace=True)
# Check the number of records
print(len(df))
# Show the first 5 rows
df.head()
def str2float(string):
    # Convert VideoTime from a string to a float; unparsable values become 0.0
    try:
        return float(string)
    except (TypeError, ValueError):
        return 0.0

df['VideoTime'] = df['VideoTime'].apply(str2float)
# Largest VideoTime per chapter, used as the chapter length in seconds
print('Chapter 1', df[df['Chapter']==1]['VideoTime'].max())
print('Chapter 2', df[df['Chapter']==2]['VideoTime'].max())
print('Chapter 3', df[df['Chapter']==3]['VideoTime'].max())

Chapter 1 4253.188
Chapter 2 4000.27
Chapter 3 4555.0
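# Take copies so the assignments below do not write into slices of df
chapter1 = df[df['Chapter']==1].copy()
chapter2 = df[df['Chapter']==2].copy()
chapter3 = df[df['Chapter']==3].copy()
# Put all three chapters on one timeline by adding the length of the preceding chapters
chapter2['VideoTime'] = chapter2['VideoTime'] + 4253.188
chapter3['VideoTime'] = chapter3['VideoTime'] + 4253.188 + 4000.27
# Merge chapter1, chapter2, chapter3
chapter = pd.concat([chapter1, chapter2, chapter3])
# Sort by VideoTime in ascending order
chapter.sort_values(by='VideoTime', ascending=True, inplace=True)
chapter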
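The offsets 4253.188 and 4000.27 above are copied by hand from the printed chapter maxima. As a sketch of an alternative, the same shift can be derived from the data, assuming (as the original code does) that the largest VideoTime in a chapter approximates that chapter's length. `to_global_timeline` and the `toy` frame are hypothetical names used only for this illustration.

```python
import pandas as pd

def to_global_timeline(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with VideoTime shifted onto one shared timeline."""
    # Length of each chapter, approximated by its largest VideoTime.
    chapter_length = df.groupby("Chapter")["VideoTime"].max().sort_index()
    # Offset of a chapter = total length of all chapters before it.
    offsets = chapter_length.cumsum().shift(1, fill_value=0.0)
    out = df.copy()
    out["VideoTime"] = out["VideoTime"] + out["Chapter"].map(offsets)
    return out.sort_values("VideoTime").reset_index(drop=True)

# Toy data: chapter 1 lasts 100 s, chapter 2 lasts 50 s.
toy = pd.DataFrame(
    {"Chapter": [1, 1, 2, 2, 3], "VideoTime": [10.0, 100.0, 5.0, 50.0, 7.0]}
)
merged = to_global_timeline(toy)
print(merged)
```

Because the function works on a copy, the input frame is left unchanged, and adding a fourth chapter would not require editing any constants.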
x-axis: time in minutes
y-axis: danmaku count
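def second2minute(second):
    # Convert VideoTime from seconds to whole minutes; invalid values become 0
    try:
        return int(float(second)/60)
    except (TypeError, ValueError, OverflowError):
        return 0

chapter['VideoTime'] = chapter['VideoTime'].apply(second2minute)
chapter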
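The two row-by-row helpers (string to float, then seconds to minutes) can also be written as vectorized pandas operations. The sketch below is one possible equivalent for non-negative times, not the original author's code; `raw`, `seconds`, and `minutes` are illustrative names, and the input values are invented.

```python
import pandas as pd

# Invented VideoTime values, including one unparsable string and one missing value.
raw = pd.Series(["0", "59.9", "60", "125.4", "not a number", None])

# errors="coerce" turns anything that is not a number into NaN, which is then set to 0.0.
seconds = pd.to_numeric(raw, errors="coerce").fillna(0.0)

# Floor division by 60 gives the minute bucket for each non-negative timestamp.
minutes = (seconds // 60).astype(int)
print(minutes.tolist())
```

On a large danmaku table this avoids calling a Python function once per row.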
import matplotlib.pyplot as plt
%matplotlib inline

# Font that can render Chinese characters (only needed if labels contain Chinese)
plt.rcParams['font.sans-serif'] = ['Arial Unicode MS']

# Number of danmaku per minute
danmudf = chapter.groupby('VideoTime').agg({'DanMuContent': ['count']})
danmudf.plot(kind='line', figsize=(20, 10), legend=False)
plt.title("Danmaku volume trend, Bilibili 2020 New Year's Eve Gala", fontweight='bold', fontsize=25)
plt.xlabel('Time (minutes)', fontweight='bold', fontsize=20)
plt.ylabel('Danmaku count', fontweight='bold', fontsize=20, rotation=0)
plt.show()
(37, 63)
(100, 120)
After that, there are only isolated single peaks
The 37-63 minute stage
Around minute 115
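The source does not say how intervals such as (37, 63) and (100, 120) were chosen; they may simply have been read off the plot. As a hedged sketch, sustained high-volume stretches could also be located programmatically by looking for runs of consecutive minutes whose count stays at or above a threshold. The function name `high_volume_intervals`, the threshold, and the toy counts are all assumptions made for illustration.

```python
import pandas as pd

def high_volume_intervals(counts: pd.Series, threshold: float) -> list:
    """Return (start_minute, end_minute) runs where counts stay >= threshold."""
    intervals = []
    start = prev = None
    for minute, value in counts.sort_index().items():
        minute = int(minute)
        hot = value >= threshold
        # Close the current run when the volume drops or a minute is missing.
        if start is not None and (not hot or minute != prev + 1):
            intervals.append((start, prev))
            start = None
        if hot:
            if start is None:
                start = minute
            prev = minute
    # Close a run that extends to the last minute.
    if start is not None:
        intervals.append((start, prev))
    return intervals

# Toy per-minute counts (index = minute); minute 5 has no danmaku at all.
toy_counts = pd.Series({0: 5, 1: 50, 2: 60, 3: 4, 4: 70, 6: 80, 7: 90})
runs = high_volume_intervals(toy_counts, threshold=40)
print(runs)
```

With the real data, the per-minute counts from the groupby step would be passed in, and the threshold would need to be tuned by inspecting the plot.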
import re
from collections import Counter

import jieba
from pyecharts import options as opts
from pyecharts.charts import WordCloud

# Join all danmaku into one string (missing values are skipped)
text = ''.join(df['DanMuContent'].dropna().astype(str))
# Keep only Chinese characters
text = ''.join(re.findall(r'[\u4e00-\u9fa5]+', text))
wordlist = jieba.lcut(text)
# Count words longer than one character, sorted by frequency in descending order
wordfreq = Counter(w for w in wordlist if len(w) > 1).most_common()

wordcloud = WordCloud()
wordcloud.add("", wordfreq, word_size_range=[20, 100])
wordcloud.set_global_opts(title_opts=opts.TitleOpts(title="Bilibili 2020 New Year's Eve Gala"))
wordcloud.render('B站跨年.html')
wordcloud.render_notebook()
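During the stages where danmaku volume stays at a high level, what characterizes the danmaku content?
Around the turning points where danmaku volume changes, what characterizes the danmaku content?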
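One way to approach both questions is to count the most frequent words inside a chosen minute window. The sketch below assumes VideoTime has already been converted to whole minutes on the merged timeline. `top_words` and `whitespace_tokenize` are hypothetical helpers; the whitespace tokenizer is only a stand-in so the example runs without extra dependencies, and with real Chinese danmaku a segmenter such as `jieba.lcut` could be passed instead.

```python
from collections import Counter

import pandas as pd

def top_words(frame, start_minute, end_minute, tokenize, n=10):
    """Most frequent multi-character words in [start_minute, end_minute]."""
    # between() is inclusive on both ends.
    window = frame[frame["VideoTime"].between(start_minute, end_minute)]
    counter = Counter()
    for text in window["DanMuContent"].dropna().astype(str):
        # Skip single-character tokens, as the word-cloud code does.
        counter.update(w for w in tokenize(text) if len(w) > 1)
    return counter.most_common(n)

def whitespace_tokenize(text):
    # Stand-in tokenizer for the toy data below.
    return text.split()

# Toy data: VideoTime is in minutes.
toy = pd.DataFrame(
    {
        "VideoTime": [36, 37, 40, 63, 64],
        "DanMuContent": ["aa bb", "aa cc", "aa cc d", "cc", "zz zz"],
    }
)
result = top_words(toy, 37, 63, whitespace_tokenize, n=2)
print(result)
```

Comparing the output for a sustained stage (for example minutes 37 to 63) with a narrow window around a turning point (for example a few minutes around 115) would show whether the vocabulary differs between them.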