Metadata-Version: 2.4
Name: awkg
Version: 0.2.0
Summary: awkg is an awk-like text-processing tool powered by python language
Home-page: https://github.com/thammegowda/awkg
Download-URL: https://github.com/thammegowda/awkg
Author: Thamme Gowda
Author-email: tgowdan@gmail.com
License: GNU General Public License v3 (GPLv3)
Platform: any
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Topic :: Text Processing
Classifier: Topic :: Utilities
Classifier: Topic :: Text Processing :: General
Classifier: Topic :: Text Processing :: Filters
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.5
Description-Content-Type: text/markdown
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: download-url
Dynamic: home-page
Dynamic: license
Dynamic: platform
Dynamic: requires-python
Dynamic: summary

# awkg

`awkg` is an `awk` like utility using modern day python language.
`awk` is amazingly simple, fast and quite handy. However, its domain specific constrain 
sometimes get in our way. `awkg` follows the steps of  `awk`'s design (including its convention for name😉)
and exposes full power of the modern day python. 
Python's large set of off-the-shelf existing libraries can of course be imported and used. 

# Installation 
```bash

# Install from pypy 
$ pip install awkg

# Install from github
$ pip install git+https://github.com/thammegowda/awkg.git

```

# CLI usage: 

```
$ awkg -h 
usage: awkg [-h] [-i INP] [-o OUT] [-F FS] [-OFS OFS] [-ORS ORS]
            [-b BEGIN_SCRIPT] [-e END_SCRIPT] [-im IMPORTS] [-it INIT_PATH]
            [-v]
            inline_script

awkg is an awk-like text-processing tool powered by python language

positional arguments:
  inline_script         Inline python script

optional arguments:
  -h, --help            show this help message and exit
  -i INP, --inp INP     Input file path; None=STDIN
  -o OUT, --out OUT     Output file path; None=STDOUT
  -F FS, -FS FS, --field-sep FS
                        the input field separator. Default=None implies white
                        space
  -OFS OFS, --out-field-sep OFS
                        the out field separator. Default=None implies same as
                        input FS.
  -ORS ORS, --out-rec-sep ORS
                        the output record separator. Default=None implies same
                        as input RS.
  -b BEGIN_SCRIPT, --begin BEGIN_SCRIPT
                        BEGIN block. initialize variables or whatever
  -e END_SCRIPT, --end END_SCRIPT
                        END block. Print summaries or whatever
  -im IMPORTS, --import IMPORTS
                        Imports block. Specify a list of module names to be
                        imported.Semicolon (;) is the delimiter. Ex:
                        json;numpy as np
  -it INIT_PATH, --init INIT_PATH
                        The rc file that initializes environment.Default is
                        $HOME/.awkg.py
  -v, --version         show program's version number and exit
                        
```
# Example

### Compute mean and std of words per sequence
```bash
 
cat data/train.src | awkg -b 'arr=[]; import numpy as np' 'arr.append(NF)' \
   -e 'arr=np.array(arr); print(f"{NR} lines from {FNAME}, mean={arr.mean():.2f}; std={arr.std():.4f}")'
```
### Filter records
```
# use print() explicitely 
cat data/train.src  | awkg  'if NF >= 25: print(*R)' 

Assign boolean expression to special variable RET to trigger implicit print 
cat data/train.src  | awkg  'RET = NF >= 25'

# print respects the OFS value
cat data/train.src  |  awkg  'if NF >= 25: print(NR, NF)' -OFS='\t'
```

## Special Variables
+ `NF`  : Number of fields
+ `NR`  : Record number 
+ `R`   : An array having all the columns of current record.
+ `R0`  : analogous to `$0` it stores the input line before splitting into `R`; since python does
 not permit `$` in the identifiers, it is renamed as `R0`
+ `RET` : When this variable is set to Truth value of `true` implicit `print(*R)` is triggered
+ `FS` : Input Field separator
+ `OFS` : Output Field separator; Unless explicitly set, `OFS=FS`
+ `ORS` : Output Record separator
+ `RS` (Currently Not in use)
+ `_locals` , `_globals` - all variables in local and global scope

You are allowed to use any valid python identifiers, than the above variables

## Default import modules 
These modules are imported by default
+ `sys`
+ `os`
+ `re`
+ `from pathlib import Path`


## Author:
+ [Thamme Gowda](https://twitter.com/thammegowda)

## Related tools
+ [pawk](https://github.com/alecthomas/pawk) similar to this repository, slightly different implementation.
+ [gawk](https://www.gnu.org/software/gawk/manual/gawk.html) GNU awk
 

