Metadata-Version: 2.4
Name: haralyzer
Version: 1.8.0
Summary: A python framework for getting useful stuff out of HAR files
Home-page: https://github.com/mrname/haralyzer
Download-URL: https://github.com/mrname/haralyzer/tarball/1.0
Author: Justin Crown
Author-email: justincrown1@gmail.com
License: MIT
Keywords: har
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 2.7
Requires-Dist: cached-property
Requires-Dist: python-dateutil
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: download-url
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: requires-dist
Dynamic: summary

=========
Haralyzer
=========

.. image:: https://badge.fury.io/py/haralyzer.svg
    :target: http://badge.fury.io/py/haralyzer

.. image:: https://travis-ci.org/mrname/haralyzer.svg?branch=master
    :target: https://travis-ci.org/mrname/haralyzer

.. image:: https://coveralls.io/repos/mrname/haralyzer/badge.svg?branch=master
  :target: https://coveralls.io/r/mrname/haralyzer?branch=master

.. image:: https://readthedocs.org/projects/haralyzer/badge/?version=latest
    :target: http://haralyzer.readthedocs.org/en/latest/

A Python Framework For Using HAR Files To Analyze Web Pages.

Overview
--------

The haralyzer module contains three classes for analyzing web pages based
on a HAR file. ``HarParser()`` represents a full file (which might have
multiple pages), ``HarPage()`` represents a single page from said file, and
``MultiHarParser()`` aggregates results across several HAR files of the same page.

``HarParser`` has a couple of helpful methods for analyzing single entries
from a HAR file, but most of the pertinent functions are inside of the page
object.

``haralyzer`` was designed to be easy to use, but you can also access more
powerful functions directly.

Quick Intro
-----------

HarParser
+++++++++

The ``HarParser`` takes a single argument: a ``dict`` representing the JSON
of a full HAR file. It has the same properties as the HAR file, EXCEPT that each
page in ``HarParser.pages`` is a ``HarPage`` object::

    import json
    from haralyzer import HarParser, HarPage

    with open('har_data.har', 'r') as f:
        har_parser = HarParser(json.loads(f.read()))

    print(har_parser.browser)
    # {'name': 'Firefox', 'version': '25.0.1'}

    print(har_parser.hostname)
    # 'humanssuck.net'

    for page in har_parser.pages:
        assert isinstance(page, HarPage)
        # passes for each page

HarPage
+++++++

The ``HarPage`` object contains most of the goods you need to easily analyze a
page. It has helper methods that are accessible, but most of the data you need is
in properties for easy access. You can create a ``HarPage`` object directly by
giving it the page ID (yes, I know it is stupid, it's just how HAR is organized),
and either a ``HarParser`` with ``har_parser=parser``, or a ``dict`` representing
the JSON of a full HAR file (see example above) with ``har_data=har_data``::

    import json
    from haralyzer import HarPage

    with open('har_data.har', 'r') as f:
        har_page = HarPage('page_3', har_data=json.loads(f.read()))

    ### GET BASIC INFO ###
    har_page.hostname
    # 'humanssuck.net'
    har_page.url
    # 'http://humanssuck.net/about/'

    ### WORK WITH LOAD TIMES (all load times are in ms) ###

    # Get image load time in milliseconds as rendered by the browser
    har_page.image_load_time
    # 713

    # We could do this with 'css', 'js', 'html', 'audio', or 'video'

    ### WORK WITH SIZES (all sizes are in bytes) ###

    # Get the total page size (with all assets)
    har_page.page_size
    # 2423765

    # Get the total image size
    har_page.image_size
    # 733488
    # We could do this with 'css', 'js', 'html', 'audio', or 'video'

    # Get duplicate requests (requests to the same URL 2 or more times) if any
    har_page.duplicate_url_request
    # Returns a dict where the key is a string of the URL and the value is an int of the number
    # of requests to that URL. Only requests with 2 or more are included.
    # {'https://test.com/': 3}

    # Get the transferred sizes (works only with HAR files generated by Chrome)
    har_page.page_size_trans
    har_page.image_size_trans
    har_page.css_size_trans
    har_page.text_size_trans
    har_page.js_size_trans
    har_page.audio_size_trans
    har_page.video_size_trans
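
The ``duplicate_url_request`` property boils down to counting request URLs. Here is a
minimal sketch of that idea over plain data (a stand-in, not haralyzer's actual
implementation), using ``collections.Counter``::

```python
from collections import Counter

# Toy request URLs; in haralyzer these would come from each entry's request data
urls = [
    'https://test.com/',
    'https://test.com/',
    'https://test.com/',
    'https://test.com/style.css',
]

# Keep only URLs requested 2 or more times, mirroring duplicate_url_request
counts = Counter(urls)
duplicates = {url: n for url, n in counts.items() if n >= 2}
print(duplicates)
# {'https://test.com/': 3}
```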

*IMPORTANT NOTE* - Technically, the ``pageref`` attribute of a single entry in a
HAR file is optional. As such, if your HAR file contains entries that do not map
to a page, an additional page will be created with an ID of ``unknown``, and it
will contain all such entries. Since it is not a real page, it does not have
attributes for things like time to first byte or page load time; those
attributes will return ``None``.
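
To see how such entries end up on the "unknown" page, here is a rough sketch of the
grouping logic over raw HAR entries as plain dicts (an illustration, not haralyzer's
actual implementation)::

```python
from collections import defaultdict

# Minimal stand-ins for HAR entries; the second one lacks a 'pageref'
entries = [
    {'pageref': 'page_3', 'request': {'url': 'http://humanssuck.net/'}},
    {'request': {'url': 'http://humanssuck.net/orphan.js'}},
]

# Entries without a pageref are grouped under the synthetic 'unknown' page
by_page = defaultdict(list)
for entry in entries:
    by_page[entry.get('pageref', 'unknown')].append(entry)

print(sorted(by_page))
# ['page_3', 'unknown']
```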

MultiHarParser
++++++++++++++

The ``MultiHarParser`` takes a ``list`` of ``dict`` objects, each of which represents
the JSON of a full HAR file. The concept here is that you can provide multiple HAR
files of the same page (representing multiple test runs), and the ``MultiHarParser``
will provide aggregate results for load times::

    import json
    from haralyzer import MultiHarParser

    test_runs = []
    with open('har_data1.har', 'r') as f1:
        test_runs.append(json.loads(f1.read()))
    with open('har_data2.har', 'r') as f2:
        test_runs.append(json.loads(f2.read()))

    multi_har_parser = MultiHarParser(har_data=test_runs)

    # Get the mean time to first byte of all runs in ms
    print(multi_har_parser.time_to_first_byte)
    # 70

    # Get the mean total page load time across all runs in ms
    print(multi_har_parser.load_time)
    # 150

    # Get the mean javascript load time across all runs in ms
    print(multi_har_parser.js_load_time)
    # 50

    # You can get the standard deviation for any of these as well.
    # Let's get the standard deviation for javascript load time
    print(multi_har_parser.get_stdev('js'))
    # 5
    # We can also do that with 'page' or 'ttfb' (time to first byte)
    print(multi_har_parser.get_stdev('page'))
    # 11
    print(multi_har_parser.get_stdev('ttfb'))
    # 10

    ### DECIMAL PRECISION ###

    # You will notice that all of the results above are whole numbers. That is
    # because the default decimal precision for the multi parser is 0. However,
    # you can pass whatever precision you want into the constructor to control this.

    multi_har_parser = MultiHarParser(har_data=test_runs, decimal_precision=2)
    print(multi_har_parser.time_to_first_byte)
    # 70.15
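
The aggregate numbers above are just means and standard deviations over the
per-run values. The arithmetic can be sketched with the standard library (the
per-run load times below are made up for illustration)::

```python
import statistics

# Hypothetical per-run javascript load times in ms, one per HAR file
js_load_times = [45, 50, 55]

# A mean rounded to decimal_precision=0, and the matching standard deviation
print(round(statistics.mean(js_load_times)))   # 50
print(round(statistics.stdev(js_load_times)))  # 5
```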


Advanced Usage
==============

``HarPage`` includes a lot of helpful properties, but they are all
easily produced using the public methods of ``HarParser`` and ``HarPage``::

    import json
    from haralyzer import HarPage

    with open('har_data.har', 'r') as f:
        har_page = HarPage('page_3', har_data=json.loads(f.read()))

    ### ACCESSING FILES ###

    # You can get a JSON representation of all assets using HarPage.entries #
    for entry in har_page.entries:
        if entry['startedDateTime'] == 'whatever I expect':
            ... do stuff ...

    # It also has methods for filtering assets #
    # Get a collection of entries that were images in the 2XX status code range #
    entries = har_page.filter_entries(content_type='image.*', status_code='2.*')
    # This method can filter by:
    # * content_type ('application/json' for example)
    # * status_code ('200' for example)
    # * request_type ('GET' for example)
    # * http_version ('HTTP/1.1' for example)
    # * load_time__gt (Takes an int representing load time in milliseconds.
    #   Entries with a load time greater than this will be included in the
    #   results.)
    # Parameters that accept a string use regex matching by default, but you
    # can also force a literal string match by passing regex=False

    # Get the size of the collection we just made #
    collection_size = har_page.get_total_size(entries)

    # We can also access files by type with a property #
    for js_file in har_page.js_files:
        ... do stuff ....

    ### GETTING LOAD TIMES ###

    # Get the BROWSER load time for all images in the 2XX status code range #
    load_time = har_page.get_load_time(content_type='image.*', status_code='2.*')

    # Get the TOTAL load time for all images in the 2XX status code range #
    load_time = har_page.get_load_time(content_type='image.*', status_code='2.*', asynchronous=False)
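
The string filters above are regex-driven. A rough sketch of that matching
behavior (a simplified stand-in, not haralyzer's actual code; the exact
``regex=False`` semantics may differ)::

```python
import re

def entry_matches(value, pattern, regex=True):
    # By default treat the pattern as a regex, like filter_entries does;
    # with regex=False, fall back to an exact string comparison instead
    if regex:
        return re.search(pattern, value) is not None
    return value == pattern

print(entry_matches('image/png', 'image.*'))                 # True
print(entry_matches('image/png', 'image/png', regex=False))  # True
print(entry_matches('text/css', 'image.*'))                  # False
```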

This section could potentially be out of date, so please check out the Sphinx docs for the full API.


More.... Advanced Usage
=======================

All of the ``HarPage`` methods above leverage helpers from the ``HarParser``,
some of which can be useful for more complex operations. They operate either
on a single entry (from a ``HarPage``) or a ``list`` of entries::

    import json
    from haralyzer import HarParser

    with open('har_data.har', 'r') as f:
        har_parser = HarParser(json.loads(f.read()))

    for page in har_parser.pages:
        for entry in page.entries:
            ### MATCH HEADERS ###
            if har_parser.match_headers(entry, 'Content-Type', 'image.*'):
                print('This would appear to be an image')
            ### MATCH REQUEST TYPE ###
            if har_parser.match_request_type(entry, 'GET'):
                print('This is a GET request')
            ### MATCH STATUS CODE ###
            if har_parser.match_status_code(entry, '2.*'):
                print('Looks like all is well in the world')


Asset Timelines
+++++++++++++++

The last helper function of ``HarParser`` deserves its own section, because it
is a bit odd, but can be helpful, especially for creating charts and reports.

It can create an asset timeline, which gives you back a ``dict`` where each
key is a ``datetime`` object, and the value is a ``list`` of assets that were
loading at that time. Each value of the ``list`` is a ``dict`` representing
an entry from a page.

It takes a ``list`` of entries to analyze, so it assumes that you have
already filtered the entries you want to know about::

    import json
    from haralyzer import HarParser

    with open('har_data.har', 'r') as f:
        har_parser = HarParser(json.loads(f.read()))

    ### CREATE A TIMELINE OF ALL THE ENTRIES ###
    entries = []
    for page in har_parser.pages:
        for entry in page.entries:
            entries.append(entry)

    timeline = har_parser.create_asset_timeline(entries)

    for key, value in timeline.items():
        print(type(key))
        # <class 'datetime.datetime'>
        print(key)
        # 2015-02-21 19:15:41.450000-08:00
        print(type(value))
        # <class 'list'>
        print(value)
        # Each entry in the list is an asset from the page
        # [{'serverIPAddress': '157.166.249.67', 'cache': {}, 'startedDateTime': '2015-02-21T19:15:40.351-08:00', 'pageref': 'page_3', 'request': {'cookies': ...


With this, you can examine the timeline for any number of assets. Since a
``datetime`` key exists for every millisecond at which at least one asset was
loading, this is a heavy operation. We could always change this in the future,
but for now, limit the entries you give this method to only what you need to
examine.
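
As one example of what the timeline enables, you can find the moment with the
most assets in flight. Here is a sketch over a toy timeline shaped like the one
``create_asset_timeline`` returns (entry dicts abbreviated, timestamps made up)::

```python
from datetime import datetime

# datetime keys -> list of entries loading at that instant (abbreviated)
timeline = {
    datetime(2015, 2, 21, 19, 15, 41): [{'pageref': 'page_3'}],
    datetime(2015, 2, 21, 19, 15, 42): [{'pageref': 'page_3'},
                                        {'pageref': 'page_3'}],
    datetime(2015, 2, 21, 19, 15, 43): [{'pageref': 'page_3'}],
}

# The instant with the highest number of simultaneous in-flight assets
busiest = max(timeline, key=lambda ts: len(timeline[ts]))
print(busiest, len(timeline[busiest]))
# 2015-02-21 19:15:42 2
```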
