Module glob

glob is a cross-platform, pure Nim module for matching files against Unix-style patterns. It supports creating patterns, testing file paths, and walking directories to find matching files or directories. For example, the pattern src/**/*.nim expands to all files with a .nim extension in the src directory and any of its subdirectories.

It's similar to Python's glob module but supports extended glob syntax like {} groups.

Note that while glob works on all platforms, the patterns it generates can be platform-specific because path separator characters differ.

Syntax

token  example     description
?      ?.nim       acts as a wildcard, matching any single character
*      *.nim       matches any string of any length until a path separator is found
**     **/license  same as * but crosses path boundaries to any depth
[]     [ch]        character class, matches any of the characters or ranges inside
{}     {nim,js}    string class (group), matches any of the strings inside
/      foo/*.js    literal path separator (even on Windows)
\      foo\*.js    escape character (not path separator, even on Windows)

Any other characters are matched literally. Note the difference between / and \: even on Windows you should not use \ as a path separator, since it is the escape character in glob syntax. Always use / as the path separator; the module will then use the correct separator for the platform when the glob is created.
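The token table above can be exercised with the string overload of matches listed under Procs. This is a minimal sketch, not taken from the module's own documentation: the sample paths are made up, and passing isDos = false is an assumption that this pins Unix-style separators so the results do not depend on the host platform.

```nim
import glob

# isDos = false is assumed to force Unix-style matching on every OS.

# * stops at a path separator, ** crosses it.
doAssert "src/glob.nim".matches("src/*.nim", isDos = false)
doAssert not "src/glob/other.nim".matches("src/*.nim", isDos = false)
doAssert "src/glob/other.nim".matches("src/**/*.nim", isDos = false)

# ? matches exactly one character.
doAssert "a.nim".matches("?.nim", isDos = false)

# {} matches any one of the listed strings.
doAssert "logo.svg".matches("*.{ico,svg}", isDos = false)

# \ escapes the next character, so \* is a literal asterisk.
# A raw string keeps Nim from interpreting the backslash itself.
doAssert "foo*.js".matches(r"foo\*.js", isDos = false)
doAssert not "fooX.js".matches(r"foo\*.js", isDos = false)
```

Each call compiles the pattern again; when the same pattern is tested against many paths, building a Glob once with glob() avoids the repeated work.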

Character Classes

Within bracket expressions ([]) you can use POSIX character classes, which are basically named groups of characters. These are the available classes and their roughly equivalent regex values:

POSIX class  similar to                            meaning
[:upper:]    [A-Z]                                 uppercase letters
[:lower:]    [a-z]                                 lowercase letters
[:alpha:]    [A-Za-z]                              upper- and lowercase letters
[:digit:]    [0-9]                                 digits
[:xdigit:]   [0-9A-Fa-f]                           hexadecimal digits
[:alnum:]    [A-Za-z0-9]                           digits, upper- and lowercase letters
[:word:]     [A-Za-z0-9_]                          alphanumeric and underscore
[:blank:]    [ \t]                                 space and TAB characters only
[:space:]    [ \t\n\r\f\v]                         blank (whitespace) characters
[:cntrl:]    [\x00-\x1F\x7F]                       control characters
[:punct:]    [!"\#$%&'()*+,-./:;<=>?@\[\]^_`{|}~]  punctuation (all graphic characters except letters and digits)
[:ascii:]    [\x00-\x7F]                           ASCII characters
[:graph:]    [^ [:cntrl:]]                         graphic characters (all characters which have graphic representation)
[:print:]    [[:graph:] ]                          graphic characters and space
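As a small illustration of the classes listed above, a POSIX class is written inside a bracket expression, so the brackets are doubled. The file names here are hypothetical, and isDos = false is passed on the assumption that it keeps the behavior identical across platforms.

```nim
import glob

# [[:digit:]] behaves roughly like [0-9]: exactly one digit.
doAssert "file7.txt".matches("file[[:digit:]].txt", isDos = false)
doAssert not "fileA.txt".matches("file[[:digit:]].txt", isDos = false)

# [[:upper:]] behaves roughly like [A-Z].
doAssert "Makefile".matches("[[:upper:]]akefile", isDos = false)
doAssert not "makefile".matches("[[:upper:]]akefile", isDos = false)
```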

Examples

For these examples let's imagine we have this file structure:

├─ assets/
│  └─ img/
│     ├─ favicon.ico
│     └─ logo.svg
├─ src/
│  ├─ glob/
│  │  ├─ other.nim
│  │  ├─ regexer.nim
│  │  └─ private/
│  │     └─ util.nim
│  └─ glob.nim
└─ glob.nimble

glob pattern    files returned
*               @["glob.nimble"]
src/*.nim       @["src/glob.nim"]
src/**/*.nim    @["src/glob.nim", "src/glob/other.nim", "src/glob/regexer.nim", "src/glob/private/util.nim"]
**/*.{ico,svg}  @["assets/img/favicon.ico", "assets/img/logo.svg"]
**/????.???     @["src/glob.nim", "src/glob/private/util.nim", "assets/img/logo.svg"]

For more info on glob syntax see this link for a good reference, although it references a few more extended features which aren't yet supported.

Roadmap

There are a few more extended glob features and other capabilities which aren't supported yet but will potentially be added in the future. This includes:

  • multiple patterns (something like glob(["*.nim", "!foo.nim"]))
  • ?(...patterns): match zero or one occurrences of the given patterns
  • *(...patterns): match zero or more occurrences of the given patterns
  • +(...patterns): match one or more occurrences of the given patterns
  • @(...patterns): match one of the given patterns
  • !(...patterns): match anything except the given patterns

Types

Glob = object
  pattern*: string
  regexStr*: string
  regex*: Regex
Represents a compiled glob pattern and its backing regex.
GlobResult = tuple[path: string, kind: PathComponent]
The type yielded by the walkGlobKinds iterator, containing the item's path and its kind, e.g. pcFile or pcDir.

Procs

proc `$`(glob: Glob): string {.
raises: [], tags: []
.}
Converts a Glob object to its string representation, so a Glob can be passed directly to echo.
proc hasMagic(str: string): bool {.
raises: [], tags: []
.}
Returns true if the given pattern contains any of the special glob characters *, ?, [, {.
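A short sketch of how hasMagic might be used to tell literal paths apart from real patterns. The input strings are made up for illustration.

```nim
import glob

# Strings containing a special glob character are reported as "magic".
doAssert "*.nim".hasMagic
doAssert "file?.txt".hasMagic
doAssert "[ab].nim".hasMagic

# A plain path has none of them and can be treated as a literal.
doAssert not "literal_match.html".hasMagic
```

One possible use is to skip directory walking entirely when the input is a literal path and check for that single file instead.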
proc globToRegex(pattern: string; isDos = isDosDefault): Regex {.
raises: [RegexError, GlobSyntaxError], tags: []
.}
Converts a string glob pattern to a regex pattern.
proc glob(pattern: string; isDos = isDosDefault): Glob {.
raises: [GlobSyntaxError, RegexError], tags: []
.}
Constructs a new Glob object from the given pattern.
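The following sketch builds a Glob once and reuses it for several inputs, using the Glob overload of matches. The variable name and sample paths are hypothetical, and isDos = false is an assumption that this fixes Unix-style separators regardless of platform.

```nim
import glob

# Compile the pattern a single time.
let nimSources = glob("src/**/*.nim", isDos = false)

# Reuse the compiled Glob for each candidate path.
for path in ["src/glob.nim", "src/glob/regexer.nim", "glob.nimble"]:
  echo path, " -> ", path.matches(nimSources, isDos = false)

# The exported fields give access to the source pattern and
# the text of the regex generated from it.
echo nimSources.pattern
echo nimSources.regexStr
```

Printing regexStr can help when a pattern does not match what you expect, since it shows what the glob was translated into.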
proc matches(input: string; glob: Glob; isDos = isDosDefault): bool {.
raises: [], tags: []
.}
Returns true if input is a match for the given glob object.
proc matches(input, pattern: string; isDos = isDosDefault): bool {.
raises: [RegexError, GlobSyntaxError], tags: []
.}
Constructs a Glob object from the given pattern and returns true if input is a match.
proc listGlob(pattern: string; root = ""; relative = true; expandDirs = true;
             includeHidden = false; includeDirs = false): seq[string] {.
raises: [AssertionError, OSError, UnpackError, GlobSyntaxError, RegexError], tags: [ReadDirEffect]
.}
Returns a list of all the files matching pattern.
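A minimal sketch of listGlob against a throwaway directory. The directory name glob_list_demo and the files in it are hypothetical and exist only so the example does not depend on your working directory; root is passed explicitly for the same reason.

```nim
import std/os
import glob

# Build a small scratch tree under the system temp directory.
let root = getTempDir() / "glob_list_demo"
removeDir(root)
createDir(root / "src" / "glob")
writeFile(root / "glob.nimble", "")
writeFile(root / "src" / "glob.nim", "")
writeFile(root / "src" / "glob" / "other.nim", "")

# All .nim files under src, at any depth, relative to root.
let nimFiles = listGlob("src/**/*.nim", root = root)
echo nimFiles

# Only entries directly inside root.
let topLevel = listGlob("*", root = root)
echo topLevel
```

Because listGlob collects everything into a seq before returning, the walkGlob iterator may be a better fit for very large trees.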

Iterators

iterator walkGlobKinds(pattern: string; root = ""; relative = true; expandDirs = true;
                      includeHidden = false; includeDirs = false): GlobResult {.
raises: [AssertionError, OSError, UnpackError, GlobSyntaxError, RegexError], tags: [ReadDirEffect]
.}

Iterates over all the paths within the scope of the given glob pattern, yielding all those that match. root defaults to the current working directory (by using os.getCurrentDir).

Returned paths are relative to root by default but relative = false will yield absolute paths instead.

Directories in the glob pattern are expanded by default. For example, given a ./src directory, src will be equivalent to src/** and thus all elements within the directory will match. Set expandDirs = false to disable this behavior.

Hidden files and directories are not yielded by default but can be included by setting includeHidden = true. Likewise, directories themselves are only yielded when includeDirs = true.
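The sketch below uses walkGlobKinds with includeDirs = true and sorts the results by kind. The scratch directory glob_kinds_demo and its contents are hypothetical, created only for the example.

```nim
import std/os
import glob

# Build a small scratch tree under the system temp directory.
let root = getTempDir() / "glob_kinds_demo"
removeDir(root)
createDir(root / "src" / "glob")
writeFile(root / "src" / "glob.nim", "")
writeFile(root / "src" / "glob" / "other.nim", "")

var files, dirs: seq[string]

# Each item is a GlobResult tuple with a path and a PathComponent kind.
for item in walkGlobKinds("src/**", root = root, includeDirs = true):
  case item.kind
  of pcDir, pcLinkToDir: dirs.add(item.path)
  of pcFile, pcLinkToFile: files.add(item.path)

echo "files: ", files
echo "dirs:  ", dirs
```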

iterator walkGlob(pattern: string; root = ""; relative = true; expandDirs = true;
                 includeHidden = false; includeDirs = false): string {.
raises: [AssertionError, OSError, UnpackError, GlobSyntaxError, RegexError], tags: [ReadDirEffect]
.}
Equivalent to walkGlobKinds but rather than yielding a GlobResult it yields only the path of the item, ignoring its kind.
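As an illustration of the relative parameter, this sketch walks the same pattern twice, once with the default and once with relative = false. The scratch directory glob_walk_demo is hypothetical and created only for the example.

```nim
import std/os
import glob

# Build a small scratch tree under the system temp directory.
let root = getTempDir() / "glob_walk_demo"
removeDir(root)
createDir(root / "src")
writeFile(root / "src" / "glob.nim", "")

# Default: yielded paths are relative to root.
for path in walkGlob("src/*.nim", root = root):
  echo path

# relative = false: yielded paths are absolute.
var absolutePaths: seq[string]
for path in walkGlob("src/*.nim", root = root, relative = false):
  absolutePaths.add(path)
echo absolutePaths
```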