How to Use BeautifulSoup select()

Basic BeautifulSoup Usage

Installing the BeautifulSoup package

pip install beautifulsoup4

Usage

from bs4 import BeautifulSoup
html = """<html><head></head><body>test data</body></html>"""
soup = BeautifulSoup(html, 'html.parser')  # parse with Python's built-in HTML parser
print(soup.select_one('body').text)  # prints: test data

select() and select_one() Explained

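Selector	What it matches
tagname	Tags with that tag name
.classname	Tags with that class name
#idname	The tag with that id (an id should appear only once per page)
parent>child>child	Direct parent-child chains, each level separated by '>'
ancestor descendant	Descendants at any depth, separated by a space (intermediate levels can be skipped)
[attribute]	Tags that have the given attribute
tagname.classname	Tags with that tag name that also have that class
#idname > tagname.classname	Direct children of the tag with that id that have the given tag name and class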
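To see each row of the table in action, here is a small sketch that runs every selector form against a made-up HTML snippet (the markup, ids, and class names are invented purely for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented only to exercise each selector in the table.
html = """
<div id="main">
  <ul class="menu">
    <li><a href="/home" class="link">Home</a></li>
    <li><a href="/about" class="link active">About</a></li>
  </ul>
  <p>Intro <span data-tip="hi">text</span></p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

by_tag = soup.select("li")                      # tag name
by_class = soup.select(".link")                 # class name (class="link active" also matches)
by_id = soup.select("#main")                    # id
children = soup.select("ul>li>a")               # direct children, separated by >
descendants = soup.select("div a")              # descendants at any depth, separated by a space
with_attr = soup.select("[data-tip]")           # tags that have the data-tip attribute
tag_and_class = soup.select("a.active")         # tag name + class
id_then_child = soup.select("#main > ul.menu")  # id, then a direct child with a class

print(len(by_tag), len(by_class), len(by_id))   # 2 2 1
print([a.text for a in children])               # ['Home', 'About']
print([a.text for a in descendants])            # ['Home', 'About']
print(with_attr[0].text, tag_and_class[0].text, len(id_then_child))  # text About 1
```

Note that "ul>li>a" and "div a" return the same links here only because the links happen to sit in a direct chain under the list; on deeper markup the space form matches more.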

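※ select() returns every tag that matches, as a list (a ResultSet). Even when only one tag matches, you still get a list, so its type differs from what select_one() returns.
※ select_one() returns only the first matching tag (a single Tag), even if several match, and returns None when nothing matches.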
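A quick way to check the difference is to look at what each call hands back, including the no-match case. The tiny HTML string below is just an assumption for the demo:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = "<ul><li>one</li><li>two</li></ul>"
soup = BeautifulSoup(html, "html.parser")

many = soup.select("li")        # every match, in a list-like ResultSet
first = soup.select_one("li")   # only the first match, as a single Tag
missing = soup.select_one("p")  # no match -> None
empty = soup.select("p")        # no match -> empty list

print(isinstance(many, list), len(many))   # True 2
print(isinstance(first, Tag), first.text)  # True one
print(missing, len(empty))                 # None 0
```

In practice this means `.text` can be called directly on a select_one() result (after checking for None), while a select() result has to be indexed or looped over first.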

Applying It

1. Open Naver Shopping -> developer tools (F12) -> search for any text and inspect its tags.

2. Use Requests and Beautiful Soup to fetch the HTML of Naver Shopping (url:https://shopping.naver.com/home/p/index.naver).

3. Use select() to extract only the content of the tags you want.
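Steps 2 and 3 can be sketched roughly as follows. This is a minimal sketch, not the original author's code: the helper names fetch_html and parse_texts are hypothetical, the browser-like User-Agent header is an assumption (some sites reject the default one), and the live page may build much of its content with JavaScript, so not every tag you see in F12 is guaranteed to be in the raw HTML that Requests receives.

```python
from bs4 import BeautifulSoup

# URL from step 2; the live page can change at any time.
URL = "https://shopping.naver.com/home/p/index.naver"


def fetch_html(url):
    """Step 2 (hypothetical helper): download the raw HTML with Requests."""
    import requests  # imported here so parse_texts() still works without requests installed

    # Assumption: a browser-like User-Agent; the default one may be blocked.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # stop on HTTP errors such as 403 or 404
    return response.text


def parse_texts(html, selector):
    """Step 3 (hypothetical helper): keep only the text of tags matching a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.select(selector)]


# Usage (needs network access):
# html = fetch_html(URL)
# print(parse_texts(html, "div")[:5])
```

Keeping the download and the parsing in separate functions makes it easy to save the HTML once and experiment with selectors offline.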

The first <div> element in the HTML

print("tag name:", soup.select('div')[0])
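Indexing with [0] gives the same element select_one() returns, but it raises IndexError when nothing matches. A small sketch on invented markup shows both sides:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div>first</div><div>second</div>", "html.parser")

# select(...)[0] and select_one(...) point at the very same element in the tree.
first_by_index = soup.select("div")[0]
first_by_select_one = soup.select_one("div")
print(first_by_index is first_by_select_one)  # True

# With no match, select(...)[0] would raise IndexError, so guard the index.
spans = soup.select("span")
first_span = spans[0] if spans else None
print(first_span, soup.select_one("span"))  # None None
```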

tag.class

print("tag.class:", soup.select("div.footer_area"))
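The tag.class form only matches that tag name, while .class alone matches any tag, and chaining classes requires all of them. The markup below is invented; only the class name footer_area is borrowed from the line above:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; "footer_area" mirrors the class used above.
html = """
<div class="footer_area dark"><p>Footer</p></div>
<div class="header_area"><p>Header</p></div>
<section class="footer_area"><p>Not a div</p></section>
"""
soup = BeautifulSoup(html, "html.parser")

print(len(soup.select("div.footer_area")))       # 1: only the <div>, not the <section>
print(len(soup.select(".footer_area")))          # 2: any tag with the class
print(len(soup.select("div.footer_area.dark")))  # 1: both classes are required
print(soup.select_one("div.footer_area").get_text(strip=True))  # Footer
```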

Descendant tags (space)

print("descendant tags:", soup.select("div script"))
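The space (descendant) and > (child) combinators differ once a tag is nested more than one level deep. A sketch on made-up markup:

```python
from bs4 import BeautifulSoup

html = """
<div id="outer">
  <script>var a = 1;</script>
  <section>
    <script>var b = 2;</script>
  </section>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

descendants = soup.select("div script")  # any depth below a <div>
children = soup.select("div > script")   # only direct children of a <div>
print(len(descendants), len(children))   # 2 1
```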

#id + descendant tag

print("#id + descendant tag:", soup.select("#_body script"))
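Prefixing a selector with an id limits the search to that element's subtree. The sketch below uses invented markup with one script outside and one inside id="_body", and also shows the equivalent two-step version (find the id, then call select() on that Tag):

```python
from bs4 import BeautifulSoup

html = """
<html>
  <head><script>var outside = true;</script></head>
  <body id="_body"><div><script>var inside = true;</script></div></body>
</html>
"""
soup = BeautifulSoup(html, "html.parser")

all_scripts = soup.select("script")
body_scripts = soup.select("#_body script")  # only <script> tags somewhere under id="_body"
print(len(all_scripts), len(body_scripts))   # 2 1

# Two-step equivalent: find the id first, then search inside that Tag only.
body = soup.select_one("#_body")
print(len(body.select("script")))  # 1
```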

* Reference [Source] https://pythonblog.co.kr/coding/11/#python%20BeautifulSoup

# s: a BeautifulSoup object of the scraped page (the id "t134953" comes from the source's example page)
print("child tags (>):", s.select("div.section_cell>div>h3>strong")[0].text)
print("descendant tags (space):", s.select("div.section_cell strong")[0].text)
print("id + tag combination:", s.select("#t134953 div.tit_area strong")[0].text)
print("=======================")
print(s.select("#t134953 div.list_type a")[0])
print("=======================")
print(s.select("#t134953 div.list_type a span.txt")[0].text)
print(s.select("#t134953 div.list_type a")[0]['href'])
print("=======================")
print()
print(s.select("#t134953 div.list_type a img"))
print(s.select("#t134953 div.list_type a img")[0]['src'])
print("=======================")
print(s.select("#t134953 div.list_type a>span.txt")[0].text)
print("=======================")
for tags in s.select("#t134953 div.list_type ul>li"):
    print(tags)
    print("img link :", tags.select_one('img')['src'])
    print("a txt : ", tags.select_one('a span.txt').text)
    print("a link :", tags.select_one('a')['href'])
    print("a txt :", tags.select_one('a>span.txt').text)
    print("price :", tags.select_one('span.price>em').text)
    print(tags.select('span.list_tag span'))
    print("hot deal : ", "".join([v.text for v in tags.select('span.list_tag span')]))
    print("#####################################")
print("@@@@@@@@@@ select() vs select_one()")
print(s.select_one("div.section_cell div li a").text)  # select_one(): several tags match, but only the first is returned
print(s.select("div.section_cell div li a")[0].text)   # same element as select_one()
print(s.select("div.section_cell div li a")[1].text)   # second match, only reachable through select()
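The loop above assumes every list item has an <img>, a link, and a price; if one is missing, select_one() returns None and ['src'] fails. Here is a more defensive sketch of the same loop. The product markup is invented and only mimics the selectors used above (it is not real Naver markup), and text_or_none is a hypothetical helper:

```python
from bs4 import BeautifulSoup

# Hypothetical product list modeled on the selectors above (ul>li, a>span.txt,
# span.price>em, span.list_tag span); not real Naver markup.
html = """
<div class="list_type"><ul>
  <li><a href="/p/1"><img src="/img/1.jpg"><span class="txt">Item 1</span></a>
      <span class="price"><em>1,000</em></span>
      <span class="list_tag"><span>HOT</span><span>DEAL</span></span></li>
  <li><a href="/p/2"><span class="txt">Item 2</span></a>
      <span class="price"><em>2,500</em></span></li>
</ul></div>
"""
soup = BeautifulSoup(html, "html.parser")


def text_or_none(tag, selector):
    """Hypothetical helper: text of the first match, or None when nothing matches."""
    found = tag.select_one(selector)
    return found.get_text(strip=True) if found else None


items = []
for li in soup.select("div.list_type ul>li"):
    link = li.select_one("a")
    img = li.select_one("img")  # may be None: the second item has no image
    items.append({
        "name": text_or_none(li, "a>span.txt"),
        "link": link.get("href") if link else None,  # .get() returns None if the attribute is absent
        "img": img.get("src") if img else None,
        "price": text_or_none(li, "span.price>em"),
        "tags": "".join(s.get_text() for s in li.select("span.list_tag span")),
    })

for item in items:
    print(item)
```

Searching inside each `li` (rather than on the whole page) keeps every field tied to its own product, which is the same idea as the original `tags.select_one(...)` calls.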

Attribution-NonCommercial-ShareAlike

코드몽규의 삽질저장소
