Metadata-Version: 2.4
Name: pynav2
Version: 2.1
Summary: Headless programmatic web browser on top of Requests and Beautiful Soup
Home-page: https://github.com/sloft/pynav2/
Download-URL: https://pypi.org/project/pynav2/#files
Author: sloft
Author-email: nomail@example.com
License: GNU Lesser General Public License Version 3 (LGPLv3)
Keywords: programmatic,web,browser
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Operating System :: OS Independent
Classifier: License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Topic :: Internet
Classifier: Intended Audience :: Developers
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Internet :: WWW/HTTP :: Browsers
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Classifier: Topic :: Internet :: WWW/HTTP :: Site Management
Classifier: Topic :: Internet :: WWW/HTTP :: Site Management :: Link Checking
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.4
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests
Requires-Dist: beautifulsoup4
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: download-url
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Pynav2
## Headless programmatic web browser on top of Requests and Beautiful Soup

### Requirements
Python 3.4+

Unit tests run against Python 3.4 through 3.7

### Installation
If `python3` is the default Python binary:
```bash
pip install pynav2
```
If `python2` is the default Python binary:
```bash
pip3 install pynav2
```
### Licence
GNU LGPLv3 (GNU Lesser General Public License Version 3)

### Interactive mode examples
All examples below assume this setup:
```python
from pynav2 import Browser
b = Browser()
```

#### HTTP GET request and print the response
Fetch http://example.com (https is used if available on the server)
```python
>>> b.get('example.com')
<Response [200]>
>>> b.text  # alias for b.response.text
'<!DOCTYPE html>\n<html lang="mul" class="no-js">\n<head>\n<meta charset="utf-8">\n<title>example.com</title>...'
```

#### HTTP GET request and print the json response
Fetch http://example.com/user-agent/json, which returns the JSON-encoded content of the response, if any
```python
>>> b.get('example.com/user-agent/json')
<Response [200]>
>>> b.json  # alias for b.response.json()
{'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0'}
```

#### HTTP POST request and print the response
```python
>>> data = {'q': 'python'}
>>> b.post('example.com/search', data=data)
<Response [200]>
>>> b.text
'<!DOCTYPE html>\n<html lang="mul" class="no-js">\n<head>\n<meta charset="utf-8">\n<title>example.com</title>...'
```

#### HTTP POST json request and print the json response
```python
>>> data = {'login': 'user', 'password': 'pass'}
>>> b.post('example.com/login', json=data)  # the dict is sent as JSON in the body of the request
<Response [200]>
>>> b.json
{'login': 'success'}
```

#### HTTP HEAD request and print response headers
```python
>>> b.head('example.com')
<Response [200]>
>>> b.response.headers
{'Server': 'nginx', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '48842', 'Age': '3154', 'Connection': 'keep-alive'}
```

#### HTTP PUT request and print the json response
```python
>>> data = {'version': '2.1', 'licence': 'LGPL'}
>>> b.put('example.com/api/about/', data=data)
<Response [200]>
>>> b.json
{'update': 'success'}
```

#### HTTP PATCH request and print the json response
```python
>>> data = {'version': '2.1'}
>>> b.patch('example.com/api/about/', data=data)
<Response [200]>
>>> b.json
{'patch': 'success'}
```

#### HTTP DELETE request and print the json response
```python
>>> b.delete('example.com/api/user/102')
<Response [200]>
>>> b.json
{'delete': 'success'}
```

#### HTTP OPTIONS request and print the json response
```python
>>> b.options('example.com/api/user')
<Response [200]>
>>> b.json
{'options': '...'}
```

#### Get all links
```python
>>> b.get('example.com')
<Response [200]>
>>> b.links
['http://example.com/news', 'http://example.com/forum', 'http://example.com/contact']
>>> for link in b.links:
...   print(link)
...
http://example.com/news
http://example.com/forum
http://example.com/contact

```

#### Filter links
Any `BeautifulSoup.find_all()` parameter can be used; see the [Beautiful Soup documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
```python
>>> import re
>>> b.get('example.com')
<Response [200]>
>>> b.get_links(text='Python Events')  # string treated as a regular expression
>>> b.get_links(class_="jump-link")  # class_ is matched literally, not as a regular expression
>>> b.get_links(href="windows")  # string treated as a regular expression
>>> b.get_links(title=re.compile('success'))  # explicitly compiled regular expression
```
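As the comments above indicate, plain string filters are treated as regular-expression patterns (except `class_`). The matching can be illustrated with a self-contained sketch; `filter_links` is a hypothetical helper over a made-up link list, not pynav2's implementation:

```python
import re

def filter_links(links, pattern):
    """Return the links whose URL matches the given regular expression."""
    regex = re.compile(pattern)
    return [link for link in links if regex.search(link)]

links = [
    'http://example.com/windows/setup.exe',
    'http://example.com/linux/setup.tar.gz',
    'http://example.com/contact',
]
print(filter_links(links, 'windows'))
# → ['http://example.com/windows/setup.exe']
```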

#### Get all images
```python
>>> b.get('example.com')
<Response [200]>
>>> b.images
['http://example.com/img/logo.png', 'http://example.com/img/picture.jpg', 'http://there.com/news.gif']
```

#### Filter images
Any `BeautifulSoup.find_all()` parameter can be used; see the [Beautiful Soup documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
```python
>>> b.get('example.com')
<Response [200]>
>>> b.get_images(src='logo')  # string treated as a regular expression
>>> b.get_images(class_='python-logo')  # class_ is matched literally, not as a regular expression
>>> b.get_images(alt='yth')  # string treated as a regular expression
```

#### Download file
```python
>>> b.verbose = True
>>> b.download('http://example.com/ubuntu-amd64', '/tmp')  # follows redirects and reads the Content-Disposition header to find the filename
downloading ubuntu-18.04.1-desktop-amd64.iso (1.8 GB) to: /tmp/ubuntu-18.04.1-desktop-amd64.iso
download completed in 12 minutes 5 seconds (1.8 GB)

```
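The filename lookup mentioned in the comment can be sketched with the standard library: the `Content-Disposition` header often carries a `filename` parameter, which `email.message.Message` can parse. This is an illustration with a made-up header value, not pynav2's actual code:

```python
from email.message import Message

def filename_from_disposition(header_value):
    """Extract the filename parameter from a Content-Disposition header value."""
    msg = Message()
    msg['Content-Disposition'] = header_value
    return msg.get_filename()

header = 'attachment; filename="ubuntu-18.04.1-desktop-amd64.iso"'
print(filename_from_disposition(header))
# → ubuntu-18.04.1-desktop-amd64.iso
```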

####  Handle referer
```python
>>> b.handle_referer = True
>>> b.get('somewhere.com')
>>> b.get('example.com')  # request headers will have http://somewhere.com as referer
>>> b.get('there.com')  # request headers will have http://example.com as referer
```

####  Set referer manually 
```python
>>> b.referer = 'http://www.here.com'
>>> b.get('example.com')  # request headers will have http://www.here.com as referer
```

####  Set user-agent 
The useragent module includes a list of common user-agent strings:

firefox_windows, chrome_windows, edge_windows, ie_windows, firefox_linux, chrome_linux, safari_mac

The default user-agent is firefox_windows
```python
>>> from pynav2 import useragent
>>> b.user_agent = useragent.firefox_linux
>>> b.get('example.com')  # request headers will have 'Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0' as User-Agent
>>> b.user_agent = 'my_app/v1.0'
>>> b.get('example.com')  # request headers will have my_app/v1.0 as User-Agent 
```

#### Set sleep time before a request 
```python
>>> b.set_sleep_time(0.5, 1.5)  # pick a random delay between 0.5 and 1.5 seconds before each request
>>> b.get('example.com')  # waits the random delay, then sends the request
```
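The random delay described above can be sketched with the standard library; this is an illustration of the jitter idea, not pynav2's actual implementation:

```python
import random
import time

def sleep_random(min_s, max_s):
    """Sleep for a random duration between min_s and max_s seconds; return it."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay

delay = sleep_random(0.5, 1.5)
assert 0.5 <= delay <= 1.5
```

Spreading requests out with a randomized delay like this makes automated traffic less bursty and easier on the target server.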

#### Define request timeout
Set a 10-second timeout for requests
```python
>>> b.timeout = 10
```

#### Close all opened TCP sessions
```python
>>> b.get('example1.com')
>>> b.get('example2.com')
>>> b.get('example3.com')
>>> b.session.close()
```

#### Set an HTTP proxy for HTTPS requests (single request)
For SOCKS proxies see [Requests documentation](http://docs.python-requests.org/en/master/user/advanced/#socks)
```python
>>> b.get('https://httpbin.org/ip').json()['origin']
111.111.111.111
>>> proxies = {'https':'10.0.0.0:1234'}
>>> b.timeout = 10  # a longer timeout can help if the proxy is slow
>>> b.get('https://httpbin.org/ip', proxies=proxies).json()['origin']
10.0.0.0
```

#### Set an HTTP proxy for HTTPS requests (all requests)
For SOCKS proxies see [Requests documentation](http://docs.python-requests.org/en/master/user/advanced/#socks)
```python
>>> b.get('https://httpbin.org/ip').json()['origin']
111.111.111.111
>>> b.proxies = {'https':'10.0.0.0:1234'}
>>> b.timeout = 10  # a longer timeout can help if the proxy is slow
>>> b.get('https://httpbin.org/ip').json()['origin']
10.0.0.0
```

#### Set a default proxy for all requests and a different proxy for a specific domain
For SOCKS proxies see [Requests documentation](http://docs.python-requests.org/en/master/user/advanced/#socks)
```python
>>> b.get('https://httpbin.org/ip').json()['origin']
111.111.111.111
>>> b.proxies = {'https':'10.0.0.0:1234', 'https://specific-domain.com' : '10.11.12.13:1234'}
>>> b.timeout = 10  # a longer timeout can help if the proxy is slow
>>> b.get('https://httpbin.org/ip').json()['origin']
10.0.0.0
>>> b.get('https://specific-domain.com/ip').json()['origin']
10.11.12.13
```
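Requests resolves the proxy for each request by looking up the most specific key in the `proxies` dict: `scheme://host` first, then the bare scheme. A minimal sketch of that selection rule (an illustration, not Requests' exact code):

```python
from urllib.parse import urlparse

def pick_proxy(url, proxies):
    """Return the most specific proxy entry for a URL: scheme://host, then scheme."""
    parts = urlparse(url)
    for key in ('{}://{}'.format(parts.scheme, parts.hostname), parts.scheme):
        if key in proxies:
            return proxies[key]
    return None  # direct connection, no proxy configured

proxies = {'https': '10.0.0.0:1234',
           'https://specific-domain.com': '10.11.12.13:1234'}
print(pick_proxy('https://specific-domain.com/ip', proxies))  # → 10.11.12.13:1234
print(pick_proxy('https://httpbin.org/ip', proxies))          # → 10.0.0.0:1234
```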

#### Get the Beautiful Soup instance
After a GET or POST request, Browser.bs (a BeautifulSoup instance) is automatically initialized from b.response.text

See the [Beautiful Soup documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
```python
>>> b.get('example.com')
>>> b.bs.find_all('a')
```

#### Get Requests object instances

See [Requests documentation](http://docs.python-requests.org/en/master/) 
```python
>>> b.get('example.com')
>>> b.session
>>> b.request
>>> b.response
```

#### Get browser history
```python
>>> b.get('example1.com')
>>> b.get('example2.com')
>>> b.get('example3.com')
>>> print(b.history)
['example1.com', 'example2.com', 'example3.com']
```

#### Disable "InsecureRequestWarning: Unverified HTTPS request is being made"
```python
>>> import urllib3
>>> urllib3.disable_warnings()
>>> b.get('example.com')  # no warnings 
```
