Some BeautifulSoup Methods - 白红宇



Published: 2019-06-09


1. First, install BeautifulSoup:

pip3 install BeautifulSoup4

2. Parse an HTML string and try out some common methods:

from bs4 import BeautifulSoup

s = '''

<html><head><title>The Dormouse's story</title></head>

<body>

<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were

<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,

<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and

<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;

and they lived at the bottom of a well.</p>

'''

bs = BeautifulSoup(s, "html.parser")

# Print the parsed document; tags left unclosed in s are closed automatically
print(bs)

# Get the text content of all tags
print(bs.text)

# With no arguments, find_all() returns every tag as an element, in document order from the outermost tag inward
print(bs.find_all())

# Find all <a> tags
print(bs.find_all("a"))

# Find all <body> tags; the <body> in s is never closed, but the parser closes it automatically
print(bs.find_all("body"))

# Print the href value of every <a> tag
for tag in bs.find_all("a"):
    print(tag.get("href"))
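As a side note on the loop above, here is a minimal sketch of how attribute lookup behaves, under the assumption that some tags may lack the attribute. The HTML string and the names `html`, `soup`, `first`, and `second` are made up for illustration and are not part of the original script. `tag.get()` returns `None` (or a default you supply) when the attribute is missing, while `tag["href"]` raises `KeyError` in that case.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second <a> deliberately has no href attribute
html = (
    '<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>'
    '<a id="nolink">No link</a>'
)
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("a")

print(first.get("href"))       # value of the href attribute
print(first["href"])           # same value, dictionary-style access
print(second.get("href"))      # None, because the attribute is missing
print(second.get("href", ""))  # a default can be supplied instead of None
print(first.get("class"))      # class is multi-valued, so a list is returned
print(first.attrs)             # all attributes of the tag as a dict
```

Using `get()` is usually the safer choice when scraping pages where not every link is guaranteed to carry an `href`.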

# Print the name of every tag, removing any <script> and <link> tags along the way
for tag in bs.find_all():
    print(tag.name)
    if tag.name in ["script", "link"]:
        tag.decompose()  # remove the script or link tag from the tree

# Print the document as a string after the tags have been removed
print(str(bs))

# Print the text content after the tags have been removed
print(bs.text)
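The sample string s contains no script or link tags, so the removal step above has no visible effect on it. The following is a small sketch of the same idea on input that does contain them. The HTML and the names `html`, `soup`, `cleaned_html`, and `cleaned_text` are assumptions made up for this illustration. Passing a list to `find_all()` selects only the tags to remove, instead of checking `tag.name` inside a loop over every tag.

```python
from bs4 import BeautifulSoup

# Hypothetical page containing the tags we want to strip
html = """
<html><head><title>Demo</title>
<link rel="stylesheet" href="style.css">
<script>var x = 1;</script>
</head>
<body><p>Visible text</p><script>console.log("hidden");</script></body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() accepts a list of tag names; decompose() removes each match
# from the tree together with its contents
for tag in soup.find_all(["script", "link"]):
    tag.decompose()

cleaned_html = str(soup)
# separator and strip make the extracted text easier to read
cleaned_text = soup.get_text(separator=" ", strip=True)

print(cleaned_html)
print(cleaned_text)
```

After the loop, neither the markup nor the extracted text should contain anything from the removed tags.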

Reposted from: https://www.cnblogs.com/fangsheng/p/9756866.html
