How do I exclude files with a specific file extension?

Description of problem:

I tried to exclude files with certain extensions, e.g. .class, from being collected. To do so, I created a YAML file and passed it as a filter file. log2timeline does load the filter file properly. However, when it runs, the files are not excluded.

Why are the filters not working? I would expect all files ending with .class to be excluded.

My suspicion is that something goes wrong here: https://github.com/log2timeline/plaso/blob/5972f9b38bb364879328130e4b95c21d36b4ffb0/plaso/engine/worker.py#L869-L874

The worker calls dfvfs's CompareLocation method, which for some reason compares each segment of the file's path against the regex pattern.

https://github.com/log2timeline/dfvfs/blob/5e3c089de915d6db981771232c10a53af12b6877/dfvfs/helpers/file_system_searcher.py#L393-L403

It constructs the location segments starting from my root directory, which results in: ['home', 'user', 'projects', 'plaso_fork', 'wrapper', 'test_data', 'bad_file.class']. The loop in those lines breaks as soon as a mismatch occurs, which is right away!
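To make the suspected behaviour concrete, here is a minimal sketch of a segment-by-segment comparison. It is not the dfvfs code: split_path and compare_location are hypothetical names, the regex is naively split on '/', and depth checks are left out. It only illustrates why a single-segment pattern such as '/.+[.]class' would fail on the first segment of a deep path.

```python
import re


def split_path(path):
    """Splits a POSIX-style path (or path regex) into non-empty segments."""
    return [segment for segment in path.split('/') if segment]


def compare_location(location, location_regex):
    """Hypothetical stand-in: matches regex segment N against path segment N."""
    path_segments = split_path(location)
    regex_segments = split_path(location_regex)
    if len(regex_segments) > len(path_segments):
        return False
    for index, regex_segment in enumerate(regex_segments):
        # Stop at the first segment that does not match, like the loop
        # described above.
        if not re.fullmatch(regex_segment, path_segments[index]):
            return False
    return True


deep_path = '/home/user/projects/plaso_fork/wrapper/test_data/bad_file.class'
print(compare_location(deep_path, '/.+[.]class'))
print(compare_location('/bad_file.class', '/.+[.]class'))
```

Under these assumptions the first call compares 'home' against '.+[.]class' and gives up immediately, while the second call succeeds because the file sits directly under the root.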

Could this be due to the fact that I use a wrapper?

Command line and arguments:

I run the wrapper script for Plaso shown below with the following command line arguments: --storage_file ./dd.plaso --temporary_directory . --logfile ./logfile_dd.log.gz --partitions all --filter-file exclude_files_test.yaml --no_dependencies_check --process_archives --skip_compressed_streams --single_process --debug test_exlusion

Source data:

I created a test directory with the following structure:

test_data
├── bad_file.class
├── bin
│   ├── bad_file_2.class
│   └── good_file_2.txt
├── good_file.txt
└── lib
    ├── bad_file_3.class
    └── good_file_3.txt

And used a filter_file that looked like this:

description: Exclude all Bad File Extensions.
type: exclude
paths:
- '/.+[.]class'

Plaso version:

20210606

Operating system Plaso is running on:

Ubuntu 20.04.2 LTS

Installation method:

Pulled log2timeline/plaso from GitHub and created a Python wrapper of the following form:

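import sys

from plaso.cli import log2timeline_tool
from plaso.cli import tools as cli_tools


def main(args):
    """Runs log2timeline with the given arguments; returns True on success."""
    input_reader = cli_tools.StdinInputReader()
    tool = log2timeline_tool.Log2TimelineTool(input_reader=input_reader)
    if not tool.ParseArguments(args):
        return False
    try:
        tool.ExtractEventsFromSources()
    except Exception as exception:
        # Report the failure instead of silently swallowing it.
        print('Extraction failed: {0!s}'.format(exception), file=sys.stderr)
        return False
    return True


if __name__ == "__main__":
    # Map the boolean result to a process exit code.
    sys.exit(0 if main(sys.argv[1:]) else 1)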

Debug output/tracebacks:

The filtered debug log shows that all files are processed:
2021-07-07 15:54:30,993 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion
...2021-07-07 15:54:30,993 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntryDataStream] proce...
2021-07-07 15:54:30,995 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] done processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion
2021-07-07 15:54:30,997 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/bin
...2021-07-07 15:54:30,997 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntryDataStream] proce...
2021-07-07 15:54:30,998 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] done processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/bin
2021-07-07 15:54:31,000 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/lib
...2021-07-07 15:54:31,000 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntryDataStream] proce...
2021-07-07 15:54:31,002 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] done processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/lib
2021-07-07 15:54:31,003 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/bad_file.class
...2021-07-07 15:54:31,003 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntryDataStream] proce...
2021-07-07 15:54:31,216 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] done processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/bad_file.class
2021-07-07 15:54:31,217 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/good_file.txt
...2021-07-07 15:54:31,217 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntryDataStream] proce...
2021-07-07 15:54:31,398 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] done processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/good_file.txt
2021-07-07 15:54:31,399 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/bin/bad_file_2.class
...2021-07-07 15:54:31,400 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntryDataStream] proce...
2021-07-07 15:54:31,583 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] done processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/bin/bad_file_2.class
2021-07-07 15:54:31,585 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/bin/good_file_2.txt
...2021-07-07 15:54:31,585 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntryDataStream] proce...
2021-07-07 15:54:31,767 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] done processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/bin/good_file_2.txt
2021-07-07 15:54:31,768 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/lib/bad_file_3.class
...2021-07-07 15:54:31,769 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntryDataStream] proce...
2021-07-07 15:54:31,950 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] done processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/lib/bad_file_3.class
2021-07-07 15:54:31,951 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/lib/good_file_3.txt
...2021-07-07 15:54:31,952 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntryDataStream] proce...
2021-07-07 15:54:32,135 [DEBUG] (MainProcess) PID:313944 <worker> [ProcessFileEntry] done processing file entry: OS:/home/treebeard/Projects/plaso_fork/bore_wrapper/test_exlusion/lib/good_file_3.txt
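A quick way to check a log like the one above for files that should have been excluded is to pull out the "processing file entry" lines and test their extensions. The sketch below assumes the line layout shown in this log; processed_paths and leaked_paths are hypothetical helper names, not part of Plaso.

```python
import re

# Matches only the "processing file entry" lines, not the "done processing"
# ones, and captures the path after the type indicator (for example "OS:").
_ENTRY_RE = re.compile(
    r'\[ProcessFileEntry\] processing file entry: \w+:(/.*)$')


def processed_paths(log_lines):
    """Returns the paths of all file entries the worker started processing."""
    paths = []
    for line in log_lines:
        match = _ENTRY_RE.search(line.rstrip('\n'))
        if match:
            paths.append(match.group(1))
    return paths


def leaked_paths(log_lines, extension):
    """Returns processed paths that end with the given extension."""
    suffix = '.' + extension
    return [path for path in processed_paths(log_lines)
            if path.endswith(suffix)]
```

Since the log here is written to ./logfile_dd.log.gz, the lines could be read with gzip.open(path, 'rt') and passed straight to leaked_paths(lines, 'class'). An empty result would mean the exclude filter took effect.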

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

joachimmetz commented, Jul 7, 2021 (1 reaction)

"Which again would lead to a bloated filter file, right?"

Yes, at the moment that would lead to repeated, similar-looking filter expressions.

Having a more granular find spec could be an option here, but that has not been high on the priority list.
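As an illustration of the repetition described above, the following sketch generates one path expression per directory depth and renders them in the layout of the filter file from the question. It assumes that each expression has to spell out one pattern per path segment, which is inferred from this thread and not verified against Plaso. Whether the paths are anchored at the file system root or at the source directory is also not established here. build_exclude_paths and render_filter_file are hypothetical helpers.

```python
def build_exclude_paths(extension, max_depth):
    """Returns one path regex per directory depth, from 0 to max_depth."""
    paths = []
    for depth in range(max_depth + 1):
        # One '.+' segment per parent directory, then the file name pattern.
        segments = ['.+'] * depth + ['.+[.]{0:s}'.format(extension)]
        paths.append('/' + '/'.join(segments))
    return paths


def render_filter_file(paths):
    """Renders the paths in the layout of the filter file from the question."""
    lines = [
        'description: Exclude all Bad File Extensions.',
        'type: exclude',
        'paths:']
    lines.extend("- '{0:s}'".format(path) for path in paths)
    return '\n'.join(lines)


if __name__ == '__main__':
    print(render_filter_file(build_exclude_paths('class', 2)))
```

Generating the list keeps the filter file maintainable, but the number of expressions still grows with the directory depth that has to be covered.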

welkomier commented, Jul 12, 2021 (0 reactions)

Very good idea! Thanks for your inputs.
