Word Clouds with wordcloud

Last updated on the evening of April 27, 2025

Loading the stopwords

stopwords.txt is a custom stopword list, one word per line; every word in it is filtered out of the text.

"""读取停用词"""
with open("stopwords.txt", "r", encoding="utf-8") as fp:
    stopwords = set([s.rstrip() for s in fp.readlines()])
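A stopword file often contains blank lines or stray whitespace. The sketch below is one possible way to handle that. `load_stopwords` is a hypothetical helper name, not part of the original post, and unlike the snippet above it strips whitespace on both sides of each line. It accepts any iterable of lines, so it can be tried without a file on disk.

```python
import io


def load_stopwords(lines):
    """Build a stopword set from an iterable of lines, skipping blank ones."""
    return {line.strip() for line in lines if line.strip()}


# Demo with an in-memory file; with a real file, pass the handle returned by
# open("stopwords.txt", "r", encoding="utf-8") instead.
demo_file = io.StringIO("的\n了\n\n  是  \n")
stopwords = load_stopwords(demo_file)
print(sorted(stopwords))
```

Using a set keeps the later `word not in stopwords` check fast even for long stopword lists.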

Reading the text and filtering stopwords

"""获取文本内容"""
with open("input.txt", "r", encoding="utf-8") as fp:
    content = fp.read()

"""中文分词"""
content = jieba.lcut(content)

"""去除停用词"""
text = []
for word in content:
    if word not in stopwords:
        text.append(word)
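jieba.lcut also returns whitespace and punctuation as separate tokens, so they end up in the word cloud unless they are listed in stopwords.txt. As a minimal sketch, the filter can be written as a list comprehension that also drops whitespace-only tokens. `filter_tokens` is a hypothetical name, and the token list below is a hand-written stand-in for real jieba output.

```python
def filter_tokens(tokens, stopwords):
    """Keep tokens that are non-blank and not in the stopword set."""
    return [tok for tok in tokens if tok.strip() and tok not in stopwords]


# Hand-written stand-in for the list that jieba.lcut(content) would return.
tokens = ["我", "喜欢", " ", "词云", "，", "词云", "\n", "很", "好看"]
stopwords = {"我", "很", "，"}

text = filter_tokens(tokens, stopwords)
print(text)  # ['喜欢', '词云', '词云', '好看']
```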

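Computing word frequencies

The frequencies are a dict of the form {word: count}. text is the list of words left after stopword removal, so it can be counted directly.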

from collections import Counter
frequency = dict(Counter(text))  # word frequencies after stopword removal
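To make the shape of that mapping concrete, here is a small self-contained sketch with a made-up word list. It also shows `Counter.most_common`, which is handy for checking the top words before drawing anything. The sample words are illustrative only.

```python
from collections import Counter

# Made-up word list standing in for the filtered jieba output.
text = ["词云", "Python", "词云", "分词", "Python", "词云"]

counts = Counter(text)
frequency = dict(counts)                 # {word: count} for every word
top_two = dict(counts.most_common(2))    # only the two most frequent words

print(frequency)  # {'词云': 3, 'Python': 2, '分词': 1}
print(top_two)    # {'词云': 3, 'Python': 2}
```

Trimming by hand is usually optional, since WordCloud has its own max_words parameter that caps how many words are drawn.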

The frequencies are computed because the word cloud is generated from them:

wc.fit_words(frequency)  # wc is the WordCloud instance created in the next section

WordCloud() parameters

For the full API, see the wordcloud.WordCloud page of the wordcloud 1.8.1 documentation (amueller.github.io).

The most commonly used parameters are:

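wc = WordCloud(font_path='C:\\Windows\\Fonts\\STZHONGS.TTF',  # font file; it must contain Chinese glyphs
               background_color="white",  # background color
               mask=mask_image,  # mask image as an array
               prefer_horizontal=0.6,  # share of words to try placing horizontally
               width=800,  # canvas width (ignored when a mask is given)
               height=1000,  # canvas height (ignored when a mask is given)
               colormap="tab10"  # matplotlib colormap used for the word colors
               )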

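The mask is an image with a white background; the non-white area defines the shape of the word cloud.

If the background of the image you found is not white, or not white enough, open the image in Paint, pick the fill tool with white as the color, and click the background.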
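As an alternative to editing the image by hand, near-white pixels can be forced to pure white in code. The sketch below assumes the mask is a height × width × 3 RGB array of 8-bit values (for example, what imread returns for a color JPEG). `whiten_background` is a hypothetical helper, and the threshold of 240 is an arbitrary starting point, not a value from the original post.

```python
import numpy as np


def whiten_background(image, threshold=240):
    """Return a copy of an RGB image with near-white pixels set to pure white.

    Assumes `image` has shape (height, width, 3) with values in 0..255.
    """
    image = np.asarray(image, dtype=np.uint8)
    cleaned = image.copy()
    # A pixel counts as near-white when all three channels reach the threshold.
    near_white = (image[..., :3] >= threshold).all(axis=-1)
    cleaned[near_white] = 255
    return cleaned


# Tiny 1x2 demo image: one off-white pixel and one dark pixel.
sample = np.array([[[250, 248, 251], [10, 20, 30]]], dtype=np.uint8)
print(whiten_background(sample).tolist())
```

The result could then be passed as the mask argument in place of the raw image. Lowering the threshold removes more of a greyish background but may also eat into light parts of the shape.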

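colormap sets the palette that the word colors are drawn from. It takes a string naming a matplotlib colormap, such as "tab10".

[Figure: available colormap names (left) and their colors (right)]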
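The names accepted by colormap can also be listed from matplotlib itself. This is a small sketch that assumes matplotlib is installed; filtering on the "tab" prefix is only an illustration.

```python
import matplotlib.pyplot as plt

# All colormap names registered with matplotlib, as a list of strings.
names = plt.colormaps()

# Narrow the list down, e.g. to the "tab" family that includes tab10.
tab_family = [name for name in names if name.startswith("tab")]

print(len(names), "colormaps available")
print(tab_family)
```

Any of the printed names can be passed as the colormap string.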

Result

Full code

from wordcloud import WordCloud
import jieba
from collections import Counter
from imageio import imread
import matplotlib.pyplot as plt

"""读取停用词"""
with open("stopwords.txt", "r", encoding="utf-8") as fp:
    stopwords = set([s.rstrip() for s in fp.readlines()])  # 数组转集合

"""获取文本内容"""
with open("input.txt", "r", encoding="utf-8") as fp:
    content = fp.read()

"""中文分词"""
content = jieba.lcut(content)

"""去除停用词"""
text = []
for word in content:
    if word not in stopwords:
        text.append(word)

frequency = dict(Counter(text))  # 去掉停用词后的词频统计

mask_image = imread("map.jpg")

wc = WordCloud(font_path='C:\\Windows\\Fonts\\STZHONGS.TTF',  # 字体
               background_color="white",  # 背景色
               mask=mask_image,  # 遮罩
               prefer_horizontal=0.6,  # 水平文字比例
               width=800,  # 宽度
               height=1000,  # 高度
               colormap="tab10"
               )

wc.fit_words(frequency)

plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
wc.to_file("output.png")

https://xinhaojin.github.io/2021/11/10/wordcloud词云/

Author

xinhaojin

Published on

November 10, 2021
