Metadata-Version: 2.1
Name: bigcode-fetcher
Version: 0.1.2
Summary: Tool to search and fetch code from GitHub
Home-page: https://github.com/tuvistavie/bigcode-tools/tree/master/code-fetcher
Author: Daniel Perez
Author-email: tuvistavie@gmail.com
License: UNKNOWN
Download-URL: https://github.com/tuvistavie/bigcode-tools/archive/master.zip
Description: # bigcode-fetcher
        
        A utility to search and fetch code from GitHub.
        This tool was built to make it easy to create datasets for repository analysis.
        
        The tool works in two phases: `search` finds repositories using the GitHub API
        and saves the results in a JSON file, and `download` fetches all the repositories
        listed in that JSON file.
        
        ## Install
        
        This tool can be installed by running
        
        ```
        pip install bigcode-fetcher
        ```
        
        or by fetching this repository and running
        
        ```
        pip install .
        ```
        
        in this directory.
        
        ## Usage
        
        ### `search` command
        
        By default, the utility searches for repositories that satisfy the following conditions:
        
        * `size` between 1M and 100M
        * `stars` count > 10
        * non-viral `license` (MIT,Apache-2.0,MPL-2.0,BSD-2-Clause,BSD-3-Clause,BSD-4-Clause,MS-PL)
        
        and retrieves the first 100 projects, ordered by number of stars.
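        
        As a rough illustration of what these defaults could look like as a GitHub
        repository search request, here is a minimal sketch. It is not the tool's actual
        implementation: `build_search_url` is a hypothetical helper, the sketch covers a
        single license per request, and it assumes that "1M to 100M" maps to
        `size:1000..100000`, since GitHub's `size` qualifier is expressed in kilobytes.
        
        ```python
        from urllib.parse import urlencode
        
        
        # Hypothetical helper, not part of bigcode-fetcher.
        def build_search_url(license_key="mit", min_stars=10,
                             size_kb=(1000, 100000), per_page=100):
            """Build a GitHub repository search URL mirroring the defaults above."""
            qualifiers = [
                f"size:{size_kb[0]}..{size_kb[1]}",  # assumption: 1M..100M in kilobytes
                f"stars:>{min_stars}",
                f"license:{license_key}",  # one license per request in this sketch
            ]
            params = {
                "q": " ".join(qualifiers),
                "sort": "stars",       # order results by star count
                "order": "desc",
                "per_page": per_page,  # first 100 projects
            }
            return "https://api.github.com/search/repositories?" + urlencode(params)
        
        
        print(build_search_url())
        ```
        
        Covering the full license list would then mean issuing one such request per
        license key and merging the results, which is one possible design among others.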
        
        To avoid API rate limiting, an access token can be provided either with the `--token`
        CLI argument or with the `GITHUB_TOKEN` environment variable.
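        
        The token lookup described above can be sketched as follows. This is only an
        illustration: `resolve_token` and `auth_headers` are hypothetical names, and the
        precedence (CLI argument first, then `GITHUB_TOKEN`) is an assumption rather than
        documented behavior.
        
        ```python
        import os
        
        
        # Hypothetical helpers, not part of bigcode-fetcher.
        def resolve_token(cli_token=None, environ=os.environ):
            """Return the CLI token if given, else GITHUB_TOKEN, else None."""
            # Assumption: an explicit --token value wins over the environment.
            return cli_token or environ.get("GITHUB_TOKEN")
        
        
        def auth_headers(token):
            """Build request headers; anonymous requests carry no Authorization."""
            headers = {"Accept": "application/vnd.github+json"}
            if token:
                headers["Authorization"] = f"token {token}"
            return headers
        
        
        print(auth_headers(resolve_token(cli_token=None, environ={})))
        ```
        
        Without a token the request is still valid, but it is subject to GitHub's lower
        rate limit for unauthenticated clients.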
        
        See the help for all the options:
        
        ```
        bigcode-fetcher search -h
        ```
        
        #### Example
        
        Search for all Apache Commons projects written in Java:
        
        ```
        mkdir -p apache-common-projects
        bigcode-fetcher search --language Java --user apache --stars '>0' --keyword commons --max-repos 500 -o apache-common-projects/apache-commons.json
        ```
        
        ### `download` command
        
        This command simply runs `git clone` on every repository listed in the
        JSON file generated by the `search` command.
        
        To reduce the download size, only the latest revision is fetched by default (i.e. `git clone --depth 1`). This can be disabled by passing in the `--full` flag.
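        
        The difference between the default and `--full` modes can be sketched as the
        `git` command line each one would produce. `clone_command` is a hypothetical
        helper, and the HTTPS clone URL format is an assumption; the tool may build its
        command differently.
        
        ```python
        # Hypothetical helper, not part of bigcode-fetcher.
        def clone_command(full_name, dest, full=False):
            """Return the git argv for cloning USERNAME/REPO into dest."""
            url = f"https://github.com/{full_name}.git"  # assumption: HTTPS clone URL
            cmd = ["git", "clone"]
            if not full:
                # Shallow clone: fetch only the latest revision.
                cmd += ["--depth", "1"]
            return cmd + [url, dest]
        
        
        print(" ".join(clone_command("apache/commons-io", "out/apache/commons-io")))
        print(" ".join(clone_command("apache/commons-io", "out/apache/commons-io", full=True)))
        ```
        
        The returned list could be passed to `subprocess.run(cmd, check=True)` to perform
        the clone.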
        
        `USERNAME/REPO` is cloned into `OUTPUT_DIR/USERNAME/REPO`, where
        `OUTPUT_DIR` is set by the `--output` option.
        
        The command skips a project if its directory already exists,
        so running the command multiple times is safe, and is recommended to make
        sure all repositories have been fetched.
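        
        The directory layout and the skip-if-present rule can be sketched like this.
        `target_dir` and `pending_repos` are hypothetical names, and the input is assumed
        to be a plain list of `USERNAME/REPO` strings rather than the actual JSON
        structure written by `search`.
        
        ```python
        from pathlib import Path
        
        
        # Hypothetical helpers, not part of bigcode-fetcher.
        def target_dir(output_dir, full_name):
            """Map USERNAME/REPO to OUTPUT_DIR/USERNAME/REPO."""
            username, repo = full_name.split("/", 1)
            return Path(output_dir) / username / repo
        
        
        def pending_repos(output_dir, full_names):
            """Return the repositories whose target directory does not exist yet."""
            return [name for name in full_names
                    if not target_dir(output_dir, name).exists()]
        
        
        print(target_dir("repositories", "apache/commons-io"))
        ```
        
        Because existing directories are skipped, a clone that was interrupted partway may
        leave a directory behind that later runs will not retry; removing it by hand before
        rerunning is one way to handle that case under this sketch.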
        
        See the help for more information:
        
        ```
        bigcode-fetcher download -h
        ```
        
        #### Example
        
        Download all the Apache Commons projects found by the search above:
        
        ```
        mkdir -p apache-common-projects/repositories
        bigcode-fetcher download -i apache-common-projects/apache-commons.json -o apache-common-projects/repositories
        ```
        
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Provides-Extra: test
