Module glob

glob is a cross-platform, pure Nim module for matching file paths against Unix-style patterns. It supports creating patterns, testing file paths against them, and walking directories to find matching files or directories. For example, the pattern src/**/*.nim expands to every file with a .nim extension in the src directory and any of its subdirectories.

It is similar to Python's glob module but also supports extended glob syntax such as {} groups.

Note that while glob works on all platforms, the patterns it generates can be platform-specific because the path separator character differs between platforms.

Syntax

token   example      description
?       ?.nim        acts as a wildcard, matching any single character
*       *.nim        matches any string of any length until a path separator is found
**      **/license   same as * but crosses path boundaries to any depth
[]      [ch]         character class, matches any of the characters or ranges inside
{}      {nim,js}     string class (group), matches any of the strings inside
/       foo/*.js     literal path separator (even on Windows)
\       foo\*.js     escape character (not a path separator, even on Windows)

Any other characters are matched literally. Pay special attention to the difference between / and \. Even on Windows, do not use \ as a path separator, because in glob syntax it is the escape character. Always use / as the path separator instead. This module then uses the correct separator for the platform when the glob is created.
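The separator rule is easy to get wrong when a pattern is built from an existing Windows path. Here is a minimal sketch under stated assumptions. The helper toGlobPath is hypothetical and not part of the glob module. It turns each \ into / and escapes the characters from the syntax table above with \. It assumes that escaping those characters is enough, and it does not handle the characters used by extended patterns. Apply it only to Windows paths, because on POSIX systems \ can be a legitimate file name character.

```nim
# Characters from the syntax table that would otherwise have a special meaning.
# Assumption: escaping these is sufficient for the paths being converted.
const GlobSpecial = {'*', '?', '[', ']', '{', '}'}

proc toGlobPath(windowsPath: string): string =
  ## Hypothetical helper (not part of the glob module): converts a native
  ## Windows path into glob syntax that matches that path literally.
  for c in windowsPath:
    if c == '\\':
      result.add '/'      # a Windows separator becomes the glob separator
    elif c in GlobSpecial:
      result.add '\\'     # escape it so the character is matched literally
      result.add c
    else:
      result.add c
```

For example, toGlobPath(r"assets\[old]\logo.svg") produces assets/\[old\]/logo.svg. The result can then be combined with wildcards, for instance by appending "/*.nim" to a converted directory path.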

Character classes

Matching special characters

To match a special character such as ] or - literally inside a bracket expression, it has to appear in a specific position:

character   special use  literal use  description
]           [)}]]        []_.]        must come first, otherwise it is treated as the closing bracket
-           [_-=]        [-_]         must come first or last, otherwise it is treated as a range
!           [!<>]        [<!>]        must not come first, otherwise it is treated as the negation character
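The positional rules above can be sketched as a small bracket-expression parser. The names parseBracket, matchesBracket and Bracket are hypothetical and used only for illustration. This is not the glob module's implementation, which translates patterns to regexes. The sketch handles literal characters, ranges and ! negation only, and skips POSIX classes. A reversed range, such as the one in [_-=], simply matches nothing here.

```nim
type
  Bracket = object
    chars: set[char]   # characters listed in the class, with ranges expanded
    negate: bool       # true when the class started with "!"

proc parseBracket(pattern: string; start: int; nextIdx: var int): Bracket =
  ## Hypothetical sketch: parses the bracket expression that opens at
  ## pattern[start] ('[') and sets nextIdx to the index after its closing ']'.
  var i = start + 1
  if i < pattern.len and pattern[i] == '!':
    result.negate = true              # "!" negates only in the first position
    inc i
  let bodyStart = i
  while true:
    if i >= pattern.len:
      raise newException(ValueError, "unterminated bracket expression")
    let c = pattern[i]
    if c == ']' and i > bodyStart:
      break                           # "]" closes the class unless it comes first
    if i + 2 < pattern.len and pattern[i + 1] == '-' and pattern[i + 2] != ']':
      # "a-z" style range; a reversed range adds nothing
      for ch in c .. pattern[i + 2]:
        result.chars.incl ch
      i += 3
    else:
      # any other character is literal, including a "-" that comes first
      # (as in [-_]) or last (as in [_-])
      result.chars.incl c
      inc i
  nextIdx = i + 1

proc matchesBracket(b: Bracket; c: char): bool =
  ## A character matches if it is listed, inverted when the class is negated.
  (c in b.chars) xor b.negate
```

With these rules, parsing [)}]] stops at the first ] that is not in the first position. The trailing ] is left to be matched literally by whatever follows the class.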

POSIX classes

Within bracket expressions ([]) you can use POSIX character classes, which are named sets of characters. The class goes inside the brackets, as in [[:digit:]]. These are the available classes and their roughly equivalent regex character sets:

POSIX class   similar to                              meaning
[:upper:]     [A-Z]                                   uppercase letters
[:lower:]     [a-z]                                   lowercase letters
[:alpha:]     [A-Za-z]                                upper- and lowercase letters
[:digit:]     [0-9]                                   digits
[:xdigit:]    [0-9A-Fa-f]                             hexadecimal digits
[:alnum:]     [A-Za-z0-9]                             digits, upper- and lowercase letters
[:word:]      [A-Za-z0-9_]                            alphanumeric characters and underscore
[:blank:]     [ \t]                                   space and TAB characters only
[:space:]     [ \t\n\r\f\v]                           whitespace characters
[:cntrl:]     [\x00-\x1F\x7F]                         control characters
[:ascii:]     [\x00-\x7F]                             ASCII characters
[:graph:]     [^ [:cntrl:]]                           graphic characters (all characters that have a graphic representation)
[:punct:]     [!"\#$%&'()*+,-./:;<=>?@\[\]^_`{|}~]    punctuation (all graphic characters except letters and digits)
[:print:]     [[:graph:] ]                            graphic characters and space
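The table is ASCII-only, so each class can be modelled as a Nim set[char]. posixClassChars is a hypothetical helper written for this illustration. It follows the "similar to" column of the table above and is not how the glob module represents classes internally.

```nim
proc posixClassChars(name: string): set[char] =
  ## Hypothetical helper: ASCII characters covered by a POSIX class name
  ## (the part between "[:" and ":]"), following the table above.
  case name
  of "upper": result = {'A'..'Z'}
  of "lower": result = {'a'..'z'}
  of "alpha": result = {'A'..'Z', 'a'..'z'}
  of "digit": result = {'0'..'9'}
  of "xdigit": result = {'0'..'9', 'A'..'F', 'a'..'f'}
  of "alnum": result = {'A'..'Z', 'a'..'z', '0'..'9'}
  of "word": result = {'A'..'Z', 'a'..'z', '0'..'9', '_'}
  of "blank": result = {' ', '\t'}
  of "space": result = {' ', '\t', '\l', '\r', '\f', '\v'}  # '\l' is line feed
  of "cntrl": result = {'\x00'..'\x1F', '\x7F'}
  of "ascii": result = {'\x00'..'\x7F'}
  of "graph": result = {'!'..'~'}              # printable ASCII except space
  of "punct": result = {'!'..'/', ':'..'@', '['..'`', '{'..'~'}
  of "print": result = {' '..'~'}              # graph plus space
  else:
    raise newException(ValueError, "unknown POSIX class: " & name)
```

Modelling classes as sets makes the relationships in the table checkable. For example, [:print:] is [:graph:] plus space, and [:alnum:] is the union of [:alpha:] and [:digit:].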

Extended pattern matching

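glob supports most of the extended pattern matching syntax that bash provides when its extglob shell option is enabled. In bash, multiple patterns inside the parentheses are separated by |, as in @(foo|bar):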

?(...patterns)   match zero or one occurrence of the given patterns
*(...patterns)   match zero or more occurrences of the given patterns
+(...patterns)   match one or more occurrences of the given patterns
@(...patterns)   match one of the given patterns

The !(...patterns) form, which matches anything except the given patterns, is not currently supported because of a limitation in the regex backend.
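Because patterns are compiled to regexes, each supported form corresponds naturally to a regex group with a quantifier. The sketch below shows that correspondence. extglobGroup is a hypothetical helper, and it assumes the alternatives have already been translated to regex syntax. It is not the module's code, and the exact regex the module emits may differ. The ! form has no quantifier equivalent, so the sketch rejects it as well.

```nim
import std/strutils

proc extglobGroup(op: char; alternatives: openArray[string]): string =
  ## Hypothetical sketch: wraps already-translated alternatives in a
  ## non-capturing regex group and adds the quantifier for the operator.
  let group = "(?:" & alternatives.join("|") & ")"
  case op
  of '?': result = group & "?"   # ?(...) zero or one occurrence
  of '*': result = group & "*"   # *(...) zero or more occurrences
  of '+': result = group & "+"   # +(...) one or more occurrences
  of '@': result = group         # @(...) exactly one of the patterns
  else:
    raise newException(ValueError, "unsupported extglob operator: " & op)
```

For example, extglobGroup('+', ["a", "b"]) yields (?:a|b)+, the regex counterpart of +(a|b).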

Examples

For these examples let's imagine we have this file structure:

├─ assets/
│  └─ img/
│     ├─ favicon.ico
│     └─ logo.svg
├─ src/
│  ├─ glob/
│  │  ├─ other.nim
│  │  ├─ regexer.nim
│  │  └─ private/
│  │     └─ util.nim
│  └─ glob.nim
└─ glob.nimble

glob pattern      files returned
*                 @["glob.nimble"]
src/*.nim         @["src/glob.nim"]
src/**/*.nim      @["src/glob.nim", "src/glob/other.nim", "src/glob/regexer.nim", "src/glob/private/util.nim"]
**/*.{ico,svg}    @["assets/img/favicon.ico", "assets/img/logo.svg"]
**/????.???       @["src/glob.nim", "src/glob/private/util.nim", "assets/img/logo.svg"]
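To make the semantics of these examples concrete, they can be reproduced without touching the filesystem. The sketch below runs a small recursive matcher over the same file list. It is not the glob module's regex-based implementation. globMatch, expandBraces and filterPaths are hypothetical names. Only ?, *, **/ and non-nested {a,b} groups are supported, and escapes are ignored. Results keep the order of the input list rather than the order shown in the table.

```nim
import std/strutils

proc expandBraces(pattern: string): seq[string] =
  ## Expands the first {a,b,...} group, then recurses on each variant.
  ## Nested or escaped braces are not handled in this sketch.
  let openIdx = pattern.find('{')
  if openIdx < 0:
    return @[pattern]
  let closeIdx = pattern.find('}', openIdx)
  if closeIdx < 0:
    return @[pattern]
  for alt in pattern[openIdx + 1 ..< closeIdx].split(','):
    result.add expandBraces(pattern[0 ..< openIdx] & alt & pattern[closeIdx + 1 .. ^1])

proc globMatch(p, s: string; pi = 0; si = 0): bool =
  ## Recursive matcher: "?" and "*" never cross "/", "**/" spans directories.
  if pi == p.len:
    return si == s.len
  if p.continuesWith("**/", pi):
    if globMatch(p, s, pi + 3, si):          # "**/" matching zero directories
      return true
    for j in si ..< s.len:                   # ...or everything up to some "/"
      if s[j] == '/' and globMatch(p, s, pi + 3, j + 1):
        return true
    return false
  case p[pi]
  of '*':
    var j = si
    while true:                              # try every run of non-"/" chars
      if globMatch(p, s, pi + 1, j):
        return true
      if j >= s.len or s[j] == '/':
        return false
      inc j
  of '?':
    result = si < s.len and s[si] != '/' and globMatch(p, s, pi + 1, si + 1)
  else:
    result = si < s.len and s[si] == p[pi] and globMatch(p, s, pi + 1, si + 1)

proc filterPaths(pattern: string; paths: openArray[string]): seq[string] =
  ## Keeps, in input order, the paths matching any brace expansion of pattern.
  let variants = expandBraces(pattern)
  for path in paths:
    for v in variants:
      if globMatch(v, path):
        result.add path
        break
```

Running filterPaths over the seven files in the tree gives the same sets of paths as the table for each of the five patterns.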

For more information on glob syntax, see this link for a good reference, although it covers a few extended features that aren't supported yet. As a cheatsheet, this wiki may also be useful.

Roadmap

There are a few more extended glob features and other capabilities that aren't supported yet but may be added in the future, including:

  • multiple patterns (something like glob(["*.nim", "!foo.nim"]))

Types

Glob = object
  pattern*: string
  regexStr*: string
  regex*: Regex
Represents a compiled glob pattern and its backing regex.
GlobResult = tuple[path: string, kind: PathComponent]
The type yielded by the walkGlobKinds iterator. It contains the item's path and its kind (for example pcFile or pcDir).

Procs

proc `$`(glob: Glob): string {.raises: [], tags: [].}
Converts a Glob object to its string representation, so a Glob can be passed directly to echo.
proc hasMagic(str: string): bool {.raises: [], tags: [].}
Returns true if the given pattern contains any of the special glob characters *, ?, [, {.
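A check like hasMagic is handy for deciding whether an argument should be used as a literal path or expanded as a pattern. As described, it amounts to a set-membership test over four characters. looksLikeGlob is a hypothetical stand-in written for illustration. The description above does not say whether the real hasMagic treats escaped characters specially, so this sketch does not either.

```nim
import std/strutils

# The special characters listed in the hasMagic description.
const GlobMagicChars = {'*', '?', '[', '{'}

proc looksLikeGlob(str: string): bool =
  ## Hypothetical stand-in for hasMagic: true if any magic character occurs.
  str.contains(GlobMagicChars)
```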
proc globToRegex(pattern: string; isDos = isDosDefault): Regex {.raises: [RegexError, GlobSyntaxError], tags: [].}
Converts a string glob pattern to a regex pattern.
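To illustrate the kind of translation globToRegex performs, here is a simplified sketch. It builds a regex source string instead of a compiled Regex, so it needs no regex library at run time. globToRegexSketch and escapeRegexChar are hypothetical names. The exact regex the module produces is not documented here and will differ. The module also handles bracket expressions, POSIX classes, extended patterns and the isDos separator setting, none of which this sketch covers.

```nim
import std/strutils

# Characters with a special meaning in regex syntax.
const RegexMeta = {'.', '^', '$', '+', '(', ')', '|', '[', ']', '{', '}', '\\', '*', '?'}

proc escapeRegexChar(c: char): string =
  ## Escapes a character so the regex matches it literally.
  if c in RegexMeta: "\\" & c else: $c

proc globToRegexSketch(pattern: string): string =
  ## Hypothetical, simplified glob-to-regex translation: supports ?, *, "**/",
  ## non-nested {a,b} groups and \ escapes, with "/" as the only separator.
  result = "^"
  var inGroup = false
  var i = 0
  while i < pattern.len:
    if pattern.continuesWith("**/", i):
      result.add "(?:.*/)?"        # zero or more whole directories
      i += 3
      continue
    let c = pattern[i]
    case c
    of '*': result.add "[^/]*"     # any run of non-separator characters
    of '?': result.add "[^/]"      # exactly one non-separator character
    of '{':
      result.add "(?:"             # start of a {a,b} alternation
      inGroup = true
    of '}':
      if inGroup:
        result.add ")"
        inGroup = false
      else:
        result.add escapeRegexChar(c)
    of ',':
      result.add(if inGroup: "|" else: ",")
    of '\\':
      if i + 1 < pattern.len:      # the escaped character is matched literally
        inc i
        result.add escapeRegexChar(pattern[i])
      else:
        result.add escapeRegexChar(c)
    else:
      result.add escapeRegexChar(c)
    inc i
  result.add "$"
```

For example, src/**/*.nim becomes ^src/(?:.*/)?[^/]*\.nim$. A string like this could then be compiled with a regex library such as Nim's std/re.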
proc glob(pattern: string; isDos = isDosDefault): Glob {.raises: [GlobSyntaxError, RegexError], tags: [].}
Constructs a new Glob object from the given pattern.
proc matches(input: string; glob: Glob; isDos = isDosDefault): bool {.raises: [], tags: [].}
Returns true if input is a match for the given glob object.
proc matches(input, pattern: string; isDos = isDosDefault): bool {.raises: [RegexError, GlobSyntaxError], tags: [].}
Constructs a Glob object from the given pattern and returns true if input is a match.
proc listGlob(pattern: string; root = ""; relative = true; expandDirs = true;
             includeHidden = false; includeDirs = false): seq[string] {.
    raises: [OSError, UnpackError, GlobSyntaxError, RegexError, OSError], tags: [].}
Returns a list of all the files matching pattern.

Iterators

iterator walkGlobKinds(pattern: string; root = ""; relative = true; expandDirs = true;
                      includeHidden = false; includeDirs = false): GlobResult {.
    raises: [OSError, UnpackError, GlobSyntaxError, RegexError, OSError], tags: [].}

Iterates over all the paths within the scope of the given glob pattern, yielding all those that match. root defaults to the current working directory (by using os.getCurrentDir).

Returned paths are relative to root by default. Set relative = false to yield absolute paths instead.

Directories in the glob pattern are expanded by default. For example, given a ./src directory, src will be equivalent to src/** and thus all elements within the directory will match. Set expandDirs = false to disable this behavior.

Hidden files and directories are not yielded by default but can be included by setting includeHidden = true. The same goes for directories and the includeDirs = true parameter.
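The parameter behaviour described above can be approximated on top of Nim's std/os. This sketch is not the module's implementation. walkMatching and hasHiddenPart are hypothetical names, and the accept callback stands in for a compiled Glob. The sketch assumes "hidden" means any path component starting with ".", so Windows hidden attributes are ignored. It also omits expandDirs and any pruning of directories that cannot match.

```nim
import std/[os, strutils]

proc hasHiddenPart(relPath: string): bool =
  ## Assumption: a path is hidden if any "/"-separated part starts with ".".
  for part in relPath.split('/'):
    if part.startsWith('.'):
      return true

proc walkMatching(root: string; accept: proc (relPath: string): bool;
                  relative = true; includeHidden = false;
                  includeDirs = false): seq[string] =
  ## Hypothetical sketch of the root/relative/includeHidden/includeDirs
  ## options. `accept` receives "/"-separated paths relative to the root.
  let base = if root.len == 0: getCurrentDir() else: root
  let kinds = if includeDirs: {pcFile, pcDir} else: {pcFile}
  for rel in walkDirRec(base, yieldFilter = kinds, relative = true):
    let unixRel = rel.replace(DirSep, '/')   # match against "/" separators
    if not includeHidden and hasHiddenPart(unixRel):
      continue
    if accept(unixRel):
      result.add(if relative: unixRel else: base / rel)
```

Matching always happens on the root-relative, /-separated form. The relative flag only changes what is returned, which mirrors the description of the relative parameter above.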

iterator walkGlob(pattern: string; root = ""; relative = true; expandDirs = true;
                 includeHidden = false; includeDirs = false): string {.
    raises: [OSError, UnpackError, GlobSyntaxError, RegexError, OSError], tags: [].}
Equivalent to walkGlobKinds but rather than yielding a GlobResult it yields only the path of the item, ignoring its kind.