
Soup Sieve


Soup Sieve is a CSS selector library designed to be used with Beautiful Soup 4. It aims to provide selecting, matching, and filtering using modern CSS selectors. Soup Sieve currently provides selectors from the CSS level 1 specifications up through the latest CSS level 4 drafts (though some are not yet implemented).

Soup Sieve was written with the intent to replace Beautiful Soup's built-in select feature, and as of Beautiful Soup version 4.7.0, it has 🎊. Soup Sieve can also be imported in order to use its API directly for more controlled, specialized parsing.
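As a rough illustration of that relationship, the sketch below runs the same selector through Beautiful Soup's own select method and through Soup Sieve's API directly. It assumes Beautiful Soup 4.7.0 or later and uses the standard library's html.parser so no extra parser is needed; the sample markup and variable names are made up for the example.

```python
import bs4
import soupsieve as sv

html = '<div><p class="a">Cat</p><p class="b">Dog</p></div>'
soup = bs4.BeautifulSoup(html, 'html.parser')

# Beautiful Soup's select (4.7.0+) delegates to Soup Sieve.
via_bs4 = soup.select('p:is(.a, .b)')

# The same query through Soup Sieve's own API.
via_sv = sv.select('p:is(.a, .b)', soup)

# Both routes should yield the same tags in document order.
print([str(tag) for tag in via_bs4] == [str(tag) for tag in via_sv])
```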

Soup Sieve has implemented most of the CSS selectors up through the level 4 drafts, though there are a number that don't make sense in a non-browser environment. Selectors that cannot provide meaningful functionality simply do not match anything. Some of the supported selectors are:

  • .classes
  • #ids
  • [attributes=value]
  • parent child
  • parent > child
  • sibling ~ sibling
  • sibling + sibling
  • :not(element.class, element2.class)
  • :is(element.class, element2.class)
  • parent:has(> child)
  • and many more
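A few of the selectors listed above can be tried side by side. The following is a minimal sketch, not an exhaustive tour: the markup and the texts helper are hypothetical, and html.parser is used only to keep the example dependency-free. The last query uses :hover as an example of a selector that has no meaning outside a browser and therefore matches nothing.

```python
import bs4
import soupsieve as sv

html = """
<div id="pets">
  <p class="a">Cat</p>
  <p class="b">Dog</p>
  <span>Bird</span>
</div>
<section><p class="c">Mouse</p></section>
"""
soup = bs4.BeautifulSoup(html, 'html.parser')


def texts(selector):
    """Return the text of every element the selector matches (helper for this example)."""
    return [el.get_text() for el in sv.select(selector, soup)]


child = texts('#pets > p')    # direct children of #pets only
adjacent = texts('.a + p')    # the p immediately following .a
following = texts('.a ~ *')   # every later sibling of .a
has_child = [el.name for el in sv.select(':has(> .c)', soup)]  # parents of .c
hover = texts('p:hover')      # no browser state, so nothing matches

print(child, adjacent, following, has_child, hover)
```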


You must have Beautiful Soup already installed:

pip install beautifulsoup4

In most cases, assuming you've installed version 4.7.0 or later, that should be all you need to do, but if you've installed via some alternative method, and Soup Sieve is not automatically installed for you, you can install it directly:

pip install soupsieve

If you want to manually install it from source, navigate to the root of the project and run

python setup.py build
python setup.py install


To use Soup Sieve, you must create a BeautifulSoup object:

>>> import bs4

>>> text = """
... <div>
... <!-- These are animals -->
... <p class="a">Cat</p>
... <p class="b">Dog</p>
... <p class="c">Mouse</p>
... </div>
... """
>>> soup = bs4.BeautifulSoup(text, 'html5lib')

Then you can begin to use Soup Sieve to select a single tag:

>>> import soupsieve as sv
>>> sv.select_one('p:is(.a, .b, .c)', soup)
<p class="a">Cat</p>

To select all tags:

>>> import soupsieve as sv
>>> sv.select('p:is(.a, .b, .c)', soup)
[<p class="a">Cat</p>, <p class="b">Dog</p>, <p class="c">Mouse</p>]
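When only some of the matches are needed, the selection can be bounded. The sketch below assumes the limit keyword and the iselect generator described in the Soup Sieve API documentation, and also shows that select_one gives back None when nothing matches. The markup is a trimmed copy of the example above, parsed with html.parser for convenience.

```python
import bs4
import soupsieve as sv

html = '<div><p class="a">Cat</p><p class="b">Dog</p><p class="c">Mouse</p></div>'
soup = bs4.BeautifulSoup(html, 'html.parser')

# Stop after the first two matches.
first_two = sv.select('p', soup, limit=2)

# iselect yields matches lazily instead of building a list.
lazy = sv.iselect('p', soup)
first = next(lazy)

# select_one returns None rather than raising when nothing matches.
missing = sv.select_one('p.z', soup)

print(first_two, first, missing)
```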

To select the closest ancestor:

>>> import soupsieve as sv
>>> el = sv.select_one('.c', soup)
>>> sv.closest('div', el)
<div>
<!-- These are animals -->
<p class="a">Cat</p>
<p class="b">Dog</p>
<p class="c">Mouse</p>
</div>
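Two edge cases of closest are worth a quick sketch: the starting element itself is considered before its ancestors, and None comes back when neither the element nor any ancestor matches. The markup here is a compact stand-in for the example document, and the variable names are illustrative only.

```python
import bs4
import soupsieve as sv

html = '<div><p class="a">Cat</p><p class="b">Dog</p><p class="c">Mouse</p></div>'
soup = bs4.BeautifulSoup(html, 'html.parser')
el = sv.select_one('.c', soup)

# The nearest matching ancestor is the enclosing div.
ancestor = sv.closest('div', el)

# The element itself is checked first, so a selector it satisfies returns it.
itself = sv.closest('p', el)

# Nothing up the tree matches, so the result is None.
nothing = sv.closest('section', el)

print(ancestor.name, itself is el, nothing)
```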

To filter:

>>> sv.filter('p:not(.b)', soup.div)
[<p class="a">Cat</p>, <p class="c">Mouse</p>]
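Filtering behaves differently depending on what is passed in: given a tag, only its direct children are tested, while given a list of tags, each item in the list is tested. A small sketch of the difference, using made-up markup where one paragraph is nested a level deeper:

```python
import bs4
import soupsieve as sv

html = '<div><p class="a">Cat</p><section><p class="b">Dog</p></section></div>'
soup = bs4.BeautifulSoup(html, 'html.parser')

# Passing a tag filters its direct children only; .b is nested inside
# section, so it is not among the div's direct children.
direct = sv.filter('.b', soup.div)

# Passing an iterable of tags filters exactly those tags.
from_list = sv.filter('.b', soup.find_all('p'))

print(direct, from_list)
```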

To match:

>>> els = sv.select('p:is(.a, .b, .c)', soup)
>>> sv.match('p:not(.b)', els[0])
True
>>> sv.match('p:not(.b)', els[1])
False

Or even just extract comments:

>>> sv.comments(soup)
[' These are animals ']

Selectors do not have to be constrained to one line either. You can span selectors over multiple lines just like you would in a CSS file.

>>> selector = """
... .a,
... .b,
... .c
... """
>>> sv.select(selector, soup)
[<p class="a">Cat</p>, <p class="b">Dog</p>, <p class="c">Mouse</p>]

You can even use comments to annotate a particularly complex selector.

>>> selector = """
... /* This isn't complicated, but we're going to annotate it anyways.
...    This is the a class */
... .a,
... /* This is the b class */
... .b,
... /* This is the c class */
... .c
... """
>>> sv.select(selector, soup)
[<p class="a">Cat</p>, <p class="b">Dog</p>, <p class="c">Mouse</p>]

If you've ever used Python's re module for regular expressions, you may know that it is often useful to pre-compile a regular expression pattern, especially if you plan to use it more than once. The same is true for Soup Sieve's matchers, though it is not required. If you have a pattern that you want to use more than once, it may be wise to pre-compile it early on:

>>> selector = sv.compile('p:is(.a, .b, .c)')
>>> selector.filter(soup.div)
[<p class="a">Cat</p>, <p class="b">Dog</p>, <p class="c">Mouse</p>]

A compiled object has all the same methods, though the parameters will be slightly different as they don't need things like the pattern or flags once compiled. See the API documentation for more info.
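To make the difference in parameters concrete, here is a short sketch of a compiled pattern in use. Each method takes only the tag (or iterable) it operates on, since the selector is already baked in. The markup mirrors the earlier example, and the variable names are illustrative.

```python
import bs4
import soupsieve as sv

html = '<div><p class="a">Cat</p><p class="b">Dog</p><p class="c">Mouse</p></div>'
soup = bs4.BeautifulSoup(html, 'html.parser')

pattern = sv.compile('p:not(.b)')

selected = pattern.select(soup)                       # all matches
first = pattern.select_one(soup)                      # first match only
is_match = pattern.match(soup.find('p', class_='b'))  # test a single tag
kept = pattern.filter(soup.div)                       # filter direct children

# A compiled pattern can also be passed where a selector string is expected.
reused = sv.select(pattern, soup)

print(selected, first, is_match, kept, reused)
```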

Compiled patterns are cached, so if for any reason you need to clear the cache, simply issue the purge command.

>>> sv.purge()