sos.cleaner.parsers — Parser Interface

class sos.cleaner.parsers.SoSCleanerParser(config={}, skip_cleaning_files=[])

Bases: object

Parsers are used to build objects that take a line as input, parse it for a particular pattern (e.g. IP addresses), and then make any necessary substitutions by referencing the SoSMap() associated with the parser.

Ideally, a new parser subclass will only need to set the class-level attributes in order to be fully functional.

Parameters:

conf_file (str) – The configuration file to read from

Variables:
  • name (str) – The parser name, used in logging errors

  • regex_patterns (list) – A list of regex patterns to iterate over for every line processed

  • mapping (SoSMap()) – Used by the parser to store and obfuscate matches

  • map_file_key (str) – The key in the map_file to read when loading previous obfuscation matches
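The idea that a subclass only sets class-level attributes can be sketched as follows. This is a minimal, self-contained illustration and not the sos implementation: SketchMap, SketchParser, and SketchMacParser are hypothetical stand-ins, the MAC address pattern and the 'mac_map' key are chosen only for the example, and parse_line() here simply returns the cleaned line.

```python
import re


class SketchMap:
    """Hypothetical stand-in for SoSMap(): stores match -> obfuscated value."""

    def __init__(self):
        self.dataset = {}

    def get(self, item):
        # Reuse an existing obfuscation so repeated matches stay consistent
        if item not in self.dataset:
            self.dataset[item] = f"obfuscated{len(self.dataset)}"
        return self.dataset[item]


class SketchParser:
    """Hypothetical stand-in for the parser base class."""

    name = 'Undefined Parser'
    regex_patterns = []
    map_file_key = 'unset'

    def __init__(self):
        self.mapping = SketchMap()

    def parse_line(self, line):
        # Try every class-level pattern against the line
        for pattern in self.regex_patterns:
            # dict.fromkeys() de-duplicates while keeping first-seen order
            for match in dict.fromkeys(re.findall(pattern, line)):
                line = line.replace(match, self.mapping.get(match))
        return line


class SketchMacParser(SketchParser):
    # Only class-level attributes are set; all behaviour is inherited
    name = 'Sketch MAC Parser'
    regex_patterns = [r'(?:[0-9a-f]{2}:){5}[0-9a-f]{2}']
    map_file_key = 'mac_map'


parser = SketchMacParser()
cleaned = parser.parse_line(
    'link/ether 52:54:00:ab:cd:ef brd ff:ff:ff:ff:ff:ff'
)
print(cleaned)
```

The subclass body contains no methods at all, which is the property the description above aims for.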

compile_regexes = True
generate_item_regexes()

Generate regexes for items the parser will search for repeatedly, so that they do not need to be regenerated for every file and/or line we process.

Not used by all parsers.
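As a rough sketch of why pre-generating regexes helps, the snippet below compiles one pattern per known item once and then reuses the compiled objects for every line. The function body, the known_items dict, and the longest-first ordering are assumptions made for illustration; they are not taken from the sos source.

```python
import re

# Hypothetical map contents: original item -> obfuscated value
known_items = {
    'host1.example.com': 'obfuscated0',
    'db.example.com': 'obfuscated1',
}


def generate_item_regexes(dataset):
    """Compile one regex per known item, once, for reuse on every line."""
    compiled = []
    # Longest items first, so a long item is not partially replaced
    # by a shorter one that happens to be a substring of it
    for item in sorted(dataset, key=len, reverse=True):
        compiled.append((item, re.compile(re.escape(item))))
    return compiled


compiled_items = generate_item_regexes(known_items)


def scrub(line):
    # No compilation happens here; only the precompiled objects are used
    for item, regex in compiled_items:
        line = regex.sub(known_items[item], line)
    return line


print(scrub('ping host1.example.com via db.example.com'))
```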

get_map_contents()

Get the contents of the mapping used by the parser.

Returns:

All matches and their obfuscated counterparts

Return type:

dict

load_map_entries()
map_file_key = 'unset'
name = 'Undefined Parser'
parse_line(line)

This will be called for every line in every file we process, so that every parser has a chance to scrub everything.

If the parser uses compiled regexes, this will first identify items we have already encountered and substitute their obfuscations early on. We then parse the line again, looking for new matches.
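The two-pass flow described for parse_line() can be sketched as below. TwoPassSketch, its plain dict storage, the 'ipN' naming, and the IPv4 pattern are all hypothetical and chosen only to keep the example self-contained; the real parser delegates storage to its SoSMap().

```python
import re


class TwoPassSketch:
    """Hypothetical illustration of the two-pass line parsing idea."""

    regex_patterns = [r'\b(?:\d{1,3}\.){3}\d{1,3}\b']

    def __init__(self):
        self.dataset = {}   # original item -> obfuscated value
        self.compiled = {}  # original item -> precompiled regex

    def _add(self, item):
        self.dataset[item] = f'ip{len(self.dataset)}'
        self.compiled[item] = re.compile(re.escape(item))

    def parse_line(self, line):
        # Pass 1: substitute items that were already encountered
        for item, regex in self.compiled.items():
            line = regex.sub(self.dataset[item], line)
        # Pass 2: look for new matches in what is left of the line
        for pattern in self.regex_patterns:
            for match in dict.fromkeys(re.findall(pattern, line)):
                if match not in self.dataset:
                    self._add(match)
                line = line.replace(match, self.dataset[match])
        return line


sketch = TwoPassSketch()
first = sketch.parse_line('src 10.0.0.1 dst 10.0.0.2')
second = sketch.parse_line('10.0.0.2 unreachable, retry 10.0.0.3')
print(first)
print(second)
```

On the second call, 10.0.0.2 is handled by the first pass because it is already known, and only 10.0.0.3 is discovered by the pattern search.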

parse_string_for_keys(string_data)

Parse a given string for instances of any obfuscated items, without applying the normal regex comparisons first. This is mainly used to obfuscate filenames that have, for example, hostnames in them.

Rather than trying to regex match the string_data, just use the built-in checks for substrings matching known obfuscated keys.

Parameters:

string_data (str) – The line to be parsed

Returns:

The obfuscated line

Return type:

str
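The substring-based approach described for parse_string_for_keys() might look roughly like this. The standalone function signature, the explicit dataset argument, and the longest-first ordering are assumptions for the sketch; the actual method takes only string_data and reads from the parser's mapping.

```python
def parse_string_for_keys(string_data, dataset):
    """Replace known items using plain substring checks, no regexes.

    dataset is a hypothetical dict of original item -> obfuscated value.
    """
    # Check longer keys first so 'web01.example.com' wins over 'web01'
    for original in sorted(dataset, key=len, reverse=True):
        if original in string_data:
            string_data = string_data.replace(original, dataset[original])
    return string_data


known = {'web01': 'host0', 'web01.example.com': 'host1'}
result = parse_string_for_keys('sosreport-web01.example.com-2024.tar.xz', known)
print(result)
```

This suits filenames well, since they are short strings where a simple `in` check against already-known keys is enough.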

parser_skip_files = []
regex_patterns = []
skip_cleaning_files = []
skip_line_patterns = []