How do I exclude files with a specific file extension?
Description of problem:
I tried to exclude files with certain extensions, e.g. .class, from being collected. I therefore created a YAML file and added it as a filter. log2timeline picks up the filter file correctly; however, when it runs, the files are not excluded.
Why are the filters not working? I would expect all files ending with .class to be excluded.
I suspect that something goes wrong here: https://github.com/log2timeline/plaso/blob/5972f9b38bb364879328130e4b95c21d36b4ffb0/plaso/engine/worker.py#L869-L874
The worker calls the dfVFS CompareLocation method, which, for some reason, compares each segment of the file's path with the regex pattern. It constructs the location segments starting from my root directory, which results in: ['home', 'user', 'projects', 'plaso_fork', 'wrapper', 'test_data', 'bad_file.class']. The loop breaks as soon as a mismatch occurs, which happens right away on the first segment.
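To make the behavior described above concrete, here is a minimal, hypothetical model of segment-wise matching. It is not the dfVFS CompareLocation code: compare_location is an illustrative name, and the rule that a path must have exactly as many segments as the pattern is an assumption. It shows why a single-segment pattern like '/.+[.]class' fails on 'home' before it ever reaches the file name.

```python
import re


def compare_location(path_segments, pattern):
    """Hypothetical model of segment-wise location matching.

    Not the dfVFS implementation. The filter pattern is split on '/' and
    each pattern segment is treated as a regular expression that must
    match the whole path segment at the same position.
    """
    pattern_segments = [segment for segment in pattern.split('/') if segment]

    for index, path_segment in enumerate(path_segments):
        if index >= len(pattern_segments):
            # Assumption: a path deeper than the pattern does not match.
            return False
        if not re.fullmatch(pattern_segments[index], path_segment):
            # The first mismatch ends the comparison ('home' below).
            return False

    # Assumption: a path shallower than the pattern does not match either.
    return len(path_segments) == len(pattern_segments)


path = ['home', 'user', 'projects', 'plaso_fork', 'wrapper', 'test_data',
        'bad_file.class']
print(compare_location(path, '/.+[.]class'))  # False
print(compare_location(['bad_file.class'], '/.+[.]class'))  # True
```

Under this model, a pattern only matches files whose full segment list has the same depth as the pattern, which is consistent with the loop breaking on the very first segment.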
Could this be because I use a wrapper?
Command line and arguments:
I run the wrapper script below for Plaso with the following command-line arguments:
--storage_file ./dd.plaso --temporary_directory . --logfile ./logfile_dd.log.gz --partitions all --filter-file exclude_files_test.yaml --no_dependencies_check --process_archives --skip_compressed_streams --single_process --debug test_exlusion
Source data:
I created a test directory with the following structure:
test_data
├── bad_file.class
├── bin
│   ├── bad_file_2.class
│   └── good_file_2.txt
├── good_file.txt
└── lib
    ├── bad_file_3.class
    └── good_file_3.txt
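To reproduce the problem, the layout above can be recreated with a short script. This is a sketch: create_test_tree is a hypothetical helper, and the files get placeholder content because the original files' contents are not given.

```python
import pathlib
import tempfile

# Relative paths of the files in the test tree shown above.
TEST_FILES = [
    'bad_file.class',
    'good_file.txt',
    'bin/bad_file_2.class',
    'bin/good_file_2.txt',
    'lib/bad_file_3.class',
    'lib/good_file_3.txt',
]


def create_test_tree(root):
    """Creates the test directory layout with small placeholder files."""
    root = pathlib.Path(root)
    for relative_path in TEST_FILES:
        path = root / relative_path
        # Create bin/ and lib/ (and root itself) as needed.
        path.parent.mkdir(parents=True, exist_ok=True)
        # Placeholder content; the original contents are unknown.
        path.write_text('placeholder\n')
    return root


if __name__ == '__main__':
    base = pathlib.Path(tempfile.mkdtemp()) / 'test_data'
    print(create_test_tree(base))
```

The printed directory can then be passed to the wrapper as the source instead of test_exlusion.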
I used a filter file that looked like this:
description: Exclude all Bad File Extensions.
type: exclude
paths:
- '/.+[.]class'
Plaso version:
20210606
Operating system Plaso is running on:
Ubuntu 20.04.2 LTS
Installation method:
Pulled log2timeline/plaso from GitHub and created a wrapper in Python in the following form:
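import logging
import sys

from plaso.cli import log2timeline_tool
from plaso.cli import tools as cli_tools

input_reader = cli_tools.StdinInputReader()
tool = log2timeline_tool.Log2TimelineTool(input_reader=input_reader)


def main(args):
    """Parses the arguments and runs the extraction; returns True on success."""
    if not tool.ParseArguments(args):
        return False
    try:
        tool.ExtractEventsFromSources()
    except Exception as exception:
        # Log the error instead of silently swallowing it.
        logging.error('Extraction failed: %s', exception)
        return False
    return True


if __name__ == '__main__':
    sys.exit(0 if main(sys.argv[1:]) else 1)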
Debug output/tracebacks:
The filtered debug log shows that all files, including the .class files, are processed:
2021-07-07 15:54:30,993 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion
...2021-07-07 15:54:30,993 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntryDataStream] proce...
2021-07-07 15:54:30,995 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] done processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion
2021-07-07 15:54:30,997 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/bin
...2021-07-07 15:54:30,997 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntryDataStream] proce...
2021-07-07 15:54:30,998 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] done processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/bin
2021-07-07 15:54:31,000 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/lib
...2021-07-07 15:54:31,000 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntryDataStream] proce...
2021-07-07 15:54:31,002 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] done processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/lib
2021-07-07 15:54:31,003 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/bad_file.class
...2021-07-07 15:54:31,003 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntryDataStream] proce...
2021-07-07 15:54:31,216 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] done processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/bad_file.class
2021-07-07 15:54:31,217 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/good_file.txt
...2021-07-07 15:54:31,217 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntryDataStream] proce...
2021-07-07 15:54:31,398 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] done processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/good_file.txt
2021-07-07 15:54:31,399 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/bin/bad_file_2.class
...2021-07-07 15:54:31,400 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntryDataStream] proce...
2021-07-07 15:54:31,583 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] done processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/bin/bad_file_2.class
2021-07-07 15:54:31,585 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/bin/good_file_2.txt
...2021-07-07 15:54:31,585 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntryDataStream] proce...
2021-07-07 15:54:31,767 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] done processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/bin/good_file_2.txt
2021-07-07 15:54:31,768 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/lib/bad_file_3.class
...2021-07-07 15:54:31,769 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntryDataStream] proce...
2021-07-07 15:54:31,950 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] done processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/lib/bad_file_3.class
2021-07-07 15:54:31,951 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/lib/good_file_3.txt
...2021-07-07 15:54:31,952 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntryDataStream] proce...
2021-07-07 15:54:32,135 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] done processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/lib/good_file_3.txt
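Checking the log for files that should have been excluded can be automated. The sketch below is hypothetical (processed_with_extension is an illustrative name); it only relies on the "done processing file entry: OS:" wording visible in the log lines above and returns the paths that end with a given extension.

```python
import re

# Matches the "done processing file entry" lines shown in the log above.
DONE_PATTERN = re.compile(
    r'\[ProcessFileEntry\] done processing file entry: OS:(.+)$')


def processed_with_extension(log_lines, extension):
    """Returns OS paths of processed file entries ending with extension."""
    paths = []
    for line in log_lines:
        match = DONE_PATTERN.search(line.rstrip())
        if match and match.group(1).endswith(extension):
            paths.append(match.group(1))
    return paths
```

Assuming the log file is gzip-compressed, as the .gz name passed to --logfile suggests, it could be read with gzip.open('logfile_dd.log.gz', 'rt') and passed in directly together with '.class'; any path returned was processed despite the exclude filter.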
Top GitHub Comments
Yes, at the moment that would lead to repeated, similar-looking filter expressions.
Having a more granular find spec could be an option here, but that has not been high on the priority list.
Very good idea! Thanks for your inputs.
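The first reply implies that, at the moment, covering files at several directory depths takes one path expression per depth. A hypothetical helper that generates such a filter file in the same format as exclude_files_test.yaml, assuming each expression matches exactly one depth, might look like this:

```python
def build_exclude_filter(extension, max_depth, description):
    """Builds filter file text with one path expression per depth.

    Hypothetical helper: each expression has (depth - 1) '.+' directory
    segments followed by a file name segment ending in the extension.
    """
    lines = [f'description: {description}', 'type: exclude', 'paths:']
    for depth in range(1, max_depth + 1):
        segments = ['.+'] * (depth - 1) + [f'.+[.]{extension}']
        lines.append("- '/" + '/'.join(segments) + "'")
    return '\n'.join(lines) + '\n'


print(build_exclude_filter('class', 3, 'Exclude all Bad File Extensions.'))
```

If the segments are counted from the root directory, as in the segment list in the problem description, the test files sit at depths 7 and 8, so max_depth would need to be at least 8. Whether plaso then excludes them is not confirmed in this thread.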