Doing HTTP requests? Always use the ‘requests’ library, not ‘urllib’
I wish someone had told me that when I started web scraping with Python.
A few days ago, I decided to read up on proxies in more detail: what kinds of proxies there are, what the difference is between a residential proxy and a datacenter one, and so on.
I found an article called “My Problems (and Solutions to) Scraping LOTS of Data” by Zach Burchill.
At some point, Zach says the following:
“For anything using HTTP requests, always use the requests Python package instead of urllib or any of its descendants.” — Zach
Right after reading that sentence, I had flashbacks to all the times I tried to debug simple GET requests I had made with ‘urllib’.
Web scraping was my first interaction with Python, and I never paid attention to which package I was using. When you are just beginning with Python, it is easy to think ‘urllib’ and ‘requests’ are exactly the same. In reality, urllib has a unique talent for breaking out of the blue.
Every time, the solution was simply to make the same request with the ‘requests’ package.
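To make this concrete, here is a rough sketch of the same GET request done both ways. The httpbin.org URL is just a stand-in test endpoint I picked for illustration, not something from Zach's article:

```python
import json
import urllib.request

import requests

url = "https://httpbin.org/get"  # stand-in endpoint for illustration

# urllib: open the URL, read raw bytes, decode them, then parse the JSON yourself
with urllib.request.urlopen(url) as response:
    data_urllib = json.loads(response.read().decode("utf-8"))

# requests: one call, with JSON decoding built in
data_requests = requests.get(url).json()
```

One practical gotcha worth knowing: urllib raises an HTTPError for any 4xx/5xx status, while requests simply returns the response and lets you opt in to that behavior with raise_for_status().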
So let this be a no-brainer for you: ditch ‘urllib’ whenever you need to make HTTP requests.
P.S. ‘urllib’ is still quite decent for plenty of other things, such as encoding query strings and breaking a URL down into its parts.
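For instance, the standard-library ‘urllib.parse’ module handles exactly that kind of work. The URLs and values below are made up for illustration:

```python
from urllib.parse import quote, urlencode, urlparse

# Break a URL down into its parts
parsed = urlparse("https://example.com/search?q=python")
print(parsed.netloc)  # example.com
print(parsed.path)    # /search
print(parsed.query)   # q=python

# Encode a dict of query parameters into a query string
print(urlencode({"q": "web scraping", "page": 2}))  # q=web+scraping&page=2

# Percent-encode a single value so it is safe to put in a URL
print(quote("some file.txt"))  # some%20file.txt
```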