glob is a cross-platform, pure Nim module for matching files against Unix-style patterns. It supports creating patterns, testing file paths, and walking directories to find matching files or directories. For example, the pattern `src/**/*.nim` matches every file with a `.nim` extension in the `src` directory and any of its subdirectories.
It's similar to Python's glob module but supports extended glob syntax like {} groups.
Note that while glob works on all platforms, the patterns it generates can be platform specific due to differing path separator characters.
Syntax
token | example | description |
---|---|---|
`?` | `?.nim` | acts as a wildcard, matching any single character |
`*` | `*.nim` | matches any string of any length until a path separator is found |
`**` | `**/license` | same as `*` but crosses path boundaries to any depth |
`[]` | `[ch]` | character class, matches any of the characters or ranges inside |
`{}` | `{nim,js}` | string class (group), matches any of the strings inside |
`/` | `foo/*.js` | literal path separator (even on Windows) |
`\` | `foo\*.js` | escape character (not a path separator, even on Windows) |
Any other characters are matched literally. Pay special attention to the difference between `/` and `\`: even on Windows, do not use `\` as a path separator, because it is the escape character in glob syntax. Always use `/` as the path separator; the module substitutes the correct platform separator when the glob is created.
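To make the syntax table concrete, here is a minimal sketch of how its core tokens could be translated into a regular expression string. The procs `escapeChar` and `globToRegexSketch` are hypothetical illustrations, not the module's `globToRegex`, which also handles extglobs, POSIX classes and platform separators. The sketch assumes that `**/` may also match zero directories, consistent with `src/**/*.nim` matching `src/glob.nim` in the examples further down.

```nim
import std/strutils

proc escapeChar(c: char): string =
  ## Backslash-escapes characters that are special in regex syntax.
  if c in {'.', '^', '$', '+', '(', ')', '|', '[', ']', '{', '}', '*', '?', '\\'}:
    result = "\\" & c
  else:
    result = $c

proc globToRegexSketch(pattern: string): string =
  ## Hypothetical: translates ?, *, **, [], {}, / and \ into an anchored regex.
  result = "^"
  var i = 0
  var braceDepth = 0
  while i < pattern.len:
    let c = pattern[i]
    case c
    of '\\':
      # Escape character: the following character is matched literally.
      if i + 1 < pattern.len:
        result.add escapeChar(pattern[i + 1])
        i += 2
      else:
        result.add "\\\\"
        inc i
      continue
    of '*':
      if i + 1 < pattern.len and pattern[i + 1] == '*':
        # "**" crosses path boundaries; "**/" may also match zero directories.
        if i + 2 < pattern.len and pattern[i + 2] == '/':
          result.add "(?:.*/)?"
          i += 3
        else:
          result.add ".*"
          i += 2
        continue
      result.add "[^/]*"             # "*" stops at a path separator
    of '?':
      result.add "[^/]"              # exactly one non-separator character
    of '[':
      var j = i + 1
      if j < pattern.len and pattern[j] == '!': inc j
      # The first member may be ']' itself, so search for the close after it.
      let close = pattern.find(']', j + 1)
      if close < 0:
        result.add "\\["             # unterminated class: match '[' literally
      else:
        result.add '['
        var k = i + 1
        if pattern[k] == '!':
          result.add '^'             # a leading '!' negates the class
          inc k
        while k < close:
          if pattern[k] in {'\\', '[', ']', '^'}: result.add '\\'
          result.add pattern[k]
          inc k
        result.add ']'
        i = close + 1
        continue
    of '{':
      inc braceDepth
      result.add "(?:"               # start of a string class (group)
    of '}':
      if braceDepth > 0:
        dec braceDepth
        result.add ')'
      else:
        result.add "\\}"
    of ',':
      result.add(if braceDepth > 0: "|" else: ",")
    else:
      result.add escapeChar(c)       # '/' and ordinary characters are literal
    inc i
  result.add '$'
```

The output uses PCRE-style syntax such as non-capturing `(?:...)` groups, so it could be handed to any regex engine that understands that syntax.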
Character classes
Matching special characters
Some characters, such as `]`, `-` and `!`, have a special meaning inside a bracket expression. To match them literally, place them as follows:
character | treated as special | matched literally | rule |
---|---|---|---|
`]` | `[)}]]` | `[]_.]` | must come first, otherwise it is treated as the closing bracket |
`-` | `[_-=]` | `[-_]` | must come first or last, otherwise it is treated as a range |
`!` | `[!<>]` | `[<!>]` | must not come first, otherwise it is treated as the negation character |
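The placement rules above can be expressed in a small parser. The sketch below uses hypothetical helpers (`parseBracket` and `matchesBracket` are illustrative names, not part of the module) that collect the members of a bracket expression into a Nim `set[char]`. It ignores POSIX classes and escapes, and it treats an inverted range such as `_-=` as empty, which may differ from the module's behavior.

```nim
type BracketExpr = tuple[members: set[char], negated: bool, next: int]

proc parseBracket(pattern: string; start: int): BracketExpr =
  ## Parses the bracket expression that begins at pattern[start] == '['.
  ## `next` is the index just past the closing ']'.
  var i = start + 1
  if i < pattern.len and pattern[i] == '!':
    result.negated = true            # '!' negates only when it comes first
    inc i
  let firstPos = i
  while true:
    if i >= pattern.len:
      raise newException(ValueError, "unterminated bracket expression")
    let c = pattern[i]
    if c == ']' and i != firstPos:
      break                          # ']' closes the class unless it comes first
    if c != '-' and i + 2 < pattern.len and pattern[i + 1] == '-' and
        pattern[i + 2] != ']':
      for ch in c .. pattern[i + 2]: # "x-y" is a range (empty if x > y)
        result.members.incl ch
      i += 3
    else:
      result.members.incl c          # also covers a leading or trailing '-'
      inc i
  result.next = i + 1

proc matchesBracket(pattern: string; c: char): bool =
  ## True if `c` is matched by the bracket expression at the start of `pattern`.
  let b = parseBracket(pattern, 0)
  result = (c in b.members) != b.negated
```

For `[)}]]` the sketch stops at the first `]` that is not in first position, so the class holds only `)` and `}`, and the final `]` is left over for whatever follows the class.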
POSIX classes
Within bracket expressions ([]) you can use POSIX character classes, which are basically named groups of characters. These are the available classes and their roughly equivalent regex values:
POSIX class | similar to | meaning |
---|---|---|
`[:upper:]` | `[A-Z]` | uppercase letters |
`[:lower:]` | `[a-z]` | lowercase letters |
`[:alpha:]` | `[A-Za-z]` | upper- and lowercase letters |
`[:digit:]` | `[0-9]` | digits |
`[:xdigit:]` | `[0-9A-Fa-f]` | hexadecimal digits |
`[:alnum:]` | `[A-Za-z0-9]` | digits, upper- and lowercase letters |
`[:word:]` | `[A-Za-z0-9_]` | alphanumeric characters and underscore |
`[:blank:]` | `[ \t]` | space and TAB characters only |
`[:space:]` | `[ \t\n\r\f\v]` | whitespace characters |
`[:cntrl:]` | `[\x00-\x1F\x7F]` | control characters |
`[:ascii:]` | `[\x00-\x7F]` | ASCII characters |
`[:graph:]` | `[^ [:cntrl:]]` | graphic characters (all characters that have a graphic representation) |
`[:punct:]` | `` [!"#$%&'()*+,-./:;<=>?@\[\\\]^_`{\|}~] `` | punctuation (all graphic characters except letters and digits) |
`[:print:]` | `[[:graph:] ]` | graphic characters and space |
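Because these classes are fixed ASCII sets, they map naturally onto Nim's `set[char]`. The lookup below is an illustrative sketch (`posixClass` and `inPosixClass` are hypothetical names) restricted to ASCII; how the module treats non-ASCII input is not described here.

```nim
proc posixClass(name: string): set[char] =
  ## ASCII-only sets for the POSIX class names in the table above.
  case name
  of "upper": result = {'A'..'Z'}
  of "lower": result = {'a'..'z'}
  of "alpha": result = {'A'..'Z', 'a'..'z'}
  of "digit": result = {'0'..'9'}
  of "xdigit": result = {'0'..'9', 'A'..'F', 'a'..'f'}
  of "alnum": result = {'A'..'Z', 'a'..'z', '0'..'9'}
  of "word": result = {'A'..'Z', 'a'..'z', '0'..'9', '_'}
  of "blank": result = {' ', '\t'}
  of "space": result = {' ', '\t'..'\r'}   # TAB, LF, VT, FF, CR plus space
  of "cntrl": result = {'\x00'..'\x1F', '\x7F'}
  of "ascii": result = {'\x00'..'\x7F'}
  of "graph": result = {'!'..'~'}            # printable ASCII except space
  of "punct": result = {'!'..'~'} - {'A'..'Z', 'a'..'z', '0'..'9'}
  of "print": result = {' '..'~'}            # graph plus space
  else: raise newException(ValueError, "unknown POSIX class: " & name)

proc inPosixClass(c: char; name: string): bool =
  ## True if `c` belongs to the named class.
  result = c in posixClass(name)
```

Building `punct` as a set difference mirrors the table's definition: graphic characters minus letters and digits, which leaves the 32 ASCII punctuation characters.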
Extended pattern matching
glob supports most of the extended pattern matching syntax found under bash's extglob flag:
pattern | description |
---|---|
`?(...patterns)` | match zero or one occurrences of the given patterns |
`*(...patterns)` | match zero or more occurrences of the given patterns |
`+(...patterns)` | match one or more occurrences of the given patterns |
`@(...patterns)` | match one of the given patterns |
Note that the `!(...patterns)` form, which matches anything except the given patterns, is not currently supported because of a limitation in the regex backend.
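Each supported extglob form corresponds to a regex group followed by a quantifier. In bash, the patterns inside the parentheses are separated by `|`. The sketch below uses a hypothetical helper, `extglobToRegex`, which assumes the alternatives have already been translated into regex fragments and, mirroring the limitation noted above, rejects the `!(...)` form.

```nim
import std/strutils

proc extglobToRegex(op: char; alternatives: openArray[string]): string =
  ## Wraps pre-translated alternatives in a non-capturing group and appends
  ## the quantifier implied by the extglob operator `op`.
  let group = "(?:" & alternatives.join("|") & ")"
  case op
  of '?': result = group & "?"       # zero or one
  of '*': result = group & "*"       # zero or more
  of '+': result = group & "+"       # one or more
  of '@': result = group             # exactly one of the alternatives
  of '!':
    raise newException(ValueError, "!(...) is not supported by this sketch")
  else:
    raise newException(ValueError, "not an extglob operator: " & op)
```

For instance, `+(foo|bar)` becomes `(?:foo|bar)+`.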
Examples
For these examples let's imagine we have this file structure:
- assets/
  - img/
    - favicon.ico
    - logo.svg
- src/
  - glob/
    - other.nim
    - regexer.nim
    - private/
      - util.nim
  - glob.nim
- glob.nimble
glob pattern | files returned |
---|---|
`*` | `@["glob.nimble"]` |
`src/*.nim` | `@["src/glob.nim"]` |
`src/**/*.nim` | `@["src/glob.nim", "src/glob/other.nim", "src/glob/regexer.nim", "src/glob/private/util.nim"]` |
`**/*.{ico,svg}` | `@["assets/img/favicon.ico", "assets/img/logo.svg"]` |
`**/????.???` | `@["src/glob.nim", "src/glob/private/util.nim", "assets/img/logo.svg"]` |
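To check these rows by hand, the sketch below matches the example tree segment by segment, with `**` consuming zero or more whole path segments. It is a simplified, hypothetical matcher (`matchSegment`, `matchParts`, `globMatch` and `listMatches` are illustrative names) that supports only `?`, `*` and `**` over a hard-coded list of file paths. It does not handle `{}` groups, so the `{ico,svg}` row is not reproduced, and it returns results sorted rather than in the module's order.

```nim
import std/[strutils, algorithm]

proc matchSegment(pat, s: string): bool =
  ## Matches one path segment against '?' and '*' with the classic
  ## backtracking wildcard algorithm; neither string contains '/'.
  var p, i = 0
  var starP = -1                     # index of the last '*' seen in `pat`
  var starI = 0                      # index in `s` where that '*' match ends
  while i < s.len:
    if p < pat.len and pat[p] == '*':
      starP = p
      starI = i
      inc p
    elif p < pat.len and (pat[p] == '?' or pat[p] == s[i]):
      inc p
      inc i
    elif starP >= 0:
      p = starP + 1                  # let the last '*' absorb one more character
      inc starI
      i = starI
    else:
      return false
  while p < pat.len and pat[p] == '*':
    inc p
  result = p == pat.len

proc matchParts(pat, path: seq[string]; pi, si: int): bool =
  ## Matches pattern segments pat[pi..] against path segments path[si..].
  if pi == pat.len:
    return si == path.len
  if pat[pi] == "**":
    for k in si .. path.len:         # "**" consumes zero or more whole segments
      if matchParts(pat, path, pi + 1, k):
        return true
    return false
  if si == path.len:
    return false
  result = matchSegment(pat[pi], path[si]) and matchParts(pat, path, pi + 1, si + 1)

proc globMatch(pattern, path: string): bool =
  result = matchParts(pattern.split('/'), path.split('/'), 0, 0)

let exampleFiles = @[
  "assets/img/favicon.ico", "assets/img/logo.svg",
  "src/glob/other.nim", "src/glob/regexer.nim",
  "src/glob/private/util.nim", "src/glob.nim", "glob.nimble"]

proc listMatches(pattern: string): seq[string] =
  ## Returns the example paths matching `pattern`, sorted alphabetically.
  for f in exampleFiles:
    if globMatch(pattern, f):
      result.add f
  result.sort()
```

Up to ordering, the sketch agrees with the `*`, `src/*.nim`, `src/**/*.nim` and `**/????.???` rows above.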
For more info on glob syntax see this link for a good reference, although it references a few more extended features which aren't yet supported. As a cheatsheet, this wiki might also be useful.
Roadmap
There are a few more extended glob features and other capabilities that aren't supported yet but may be added in the future. These include:
- multiple patterns (something like glob(["*.nim", "!foo.nim"]))
Types
Glob = object
  pattern*: string
  regexStr*: string
  regex*: Regex
  base*: string
  magic*: string
- Represents a compiled glob pattern and its backing regex. Also stores the glob's base & magic components as per the splitPattern proc.
GlobResult = tuple[path: string, kind: PathComponent]
- The type yielded by the walkGlobKinds iterator, containing the item's path and its kind, e.g. pcFile or pcDir.
PatternStems = tuple[base: string, magic: string]
- The type returned by splitPattern, where base contains the leading non-magic path components and magic contains any path segments that contain or follow special glob characters.
Procs
proc hasMagic(str: string): bool {.raises: [], tags: [].}
- Returns true if the given string is glob-like, i.e. if it contains any of the special characters `*`, `?`, `[` or `{`, or an extglob, which is one of the characters `?`, `!`, `@`, `+` or `*` followed by `(`.
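The rule above can be restated in a few lines. This sketch (`hasMagicSketch` is a hypothetical name) follows only the description given here, for example it does not consider escaped characters; in real code, call the module's `hasMagic`.

```nim
proc hasMagicSketch(str: string): bool =
  ## True if `str` contains *, ?, [ or {, or an extglob opener such as "+(".
  for i, c in str:
    if c in {'*', '?', '[', '{'}:
      return true                    # plain glob metacharacter
    if c in {'!', '@', '+'} and i + 1 < str.len and str[i + 1] == '(':
      return true                    # extglob opener; "?(" and "*(" are caught above
  # result keeps its default value, false: nothing glob-like was found
```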
proc `$`(glob: Glob): string {.raises: [], tags: [].}
- Converts a Glob object to its string representation, which lets you pass a Glob directly to echo.
proc globToRegex(pattern: string; isDos = isDosDefault): Regex {.raises: [RegexError, GlobSyntaxError], tags: [].}
- Converts a string glob pattern to a regex pattern.
proc splitPattern(pattern: string): PatternStems {.raises: [], tags: [].}
- Splits the given pattern into two parts: the base, i.e. the leading path components that contain no special glob characters, and the magic, which includes any path segments containing or following special glob characters.
When pattern is not glob-like, i.e. pattern.hasMagic == false, it is treated as a literal matcher: the entire pattern is returned as magic and base is the empty string "".
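A minimal sketch of the documented splitting behavior follows. It assumes `/` separators and that `base` carries no trailing separator; the description above does not say either way. `splitPatternSketch`, `isMagic` and `PatternStemsSketch` are hypothetical names, and `isMagic` reuses the rule from the hasMagic description.

```nim
import std/strutils

type PatternStemsSketch = tuple[base: string, magic: string]

proc isMagic(segment: string): bool =
  ## Same rule as the hasMagic description: metacharacters or an extglob opener.
  for i, c in segment:
    if c in {'*', '?', '[', '{'}:
      return true
    if c in {'!', '@', '+'} and i + 1 < segment.len and segment[i + 1] == '(':
      return true

proc splitPatternSketch(pattern: string): PatternStemsSketch =
  if not isMagic(pattern):
    # Not glob-like: the whole pattern is a literal matcher, returned as magic.
    return (base: "", magic: pattern)
  let parts = pattern.split('/')
  var firstMagic = 0
  while firstMagic < parts.high and not isMagic(parts[firstMagic]):
    inc firstMagic                   # skip the leading literal segments
  result = (base: parts[0 ..< firstMagic].join("/"),
            magic: parts[firstMagic .. ^1].join("/"))
```

Under these assumptions, `src/**/*.nim` splits into the base `src` and the magic `**/*.nim`, so a walker only needs to start descending at `src`.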
proc glob(pattern: string; isDos = isDosDefault): Glob {.raises: [GlobSyntaxError, RegexError], tags: [].}
- Constructs a new Glob object from the given pattern.
proc matches(input: string; glob: Glob): bool {.raises: [], tags: [].}
- Returns true if input is a match for the given glob object.
proc matches(input, pattern: string; isDos = isDosDefault): bool {.raises: [RegexError, GlobSyntaxError], tags: [].}
- Constructs a Glob object from the given pattern and returns true if input is a match. Shortcut for matches(input, glob(pattern, isDos)).
proc listGlob(pattern: string | Glob; root = ""; relative = true; expandDirs = true; includeHidden = false; includeDirs = false): seq[string]
- Returns a list of all the file system items matching pattern. See the documentation for walkGlobKinds for more info.
Iterators
iterator walkGlobKinds(pattern: string | Glob; root = ""; relative = true; expandDirs = true; includeHidden = false; includeDirs = false): GlobResult
- Iterates over all the paths within the scope of the given glob pattern, yielding all those that match. root defaults to the current working directory (obtained via os.getCurrentDir).
Returned paths are relative to root by default, but setting relative = false yields absolute paths instead.
Directories in the glob pattern are expanded by default. For example, given a ./src directory, src will be equivalent to src/** and thus all elements within the directory will match. Set expandDirs = false to disable this behavior.
Hidden files and directories are not yielded by default but can be included by setting includeHidden = true. The same goes for directories and the includeDirs = true parameter.
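The walking options above can be modeled with the standard library alone. The sketch below is not `walkGlobKinds`: the names `walkMatching`, `visit` and `WalkResult` are hypothetical, a caller-supplied `matcher` proc stands in for glob matching, `root` must be given explicitly, directory expansion (`expandDirs`) is not modeled, and results are returned as a sorted seq rather than yielded.

```nim
import std/[os, strutils, algorithm]

type WalkResult = tuple[path: string, kind: PathComponent]

proc visit(root, dir: string; matcher: proc (rel: string): bool;
           relative, includeHidden, includeDirs: bool;
           acc: var seq[WalkResult]) =
  ## Recursively collects entries below `dir` whose root-relative path matches.
  for kind, full in walkDir(dir):
    if not includeHidden and extractFilename(full).startsWith('.'):
      continue                       # skip hidden entries and everything below them
    let rel = relativePath(full, root).replace('\\', '/')
    let isDir = kind in {pcDir, pcLinkToDir}
    if (includeDirs or not isDir) and matcher(rel):
      let shown = if relative: rel else: full
      acc.add((path: shown, kind: kind))
    if kind == pcDir:                # symlinked directories are not followed
      visit(root, full, matcher, relative, includeHidden, includeDirs, acc)

proc walkMatching(root: string; matcher: proc (rel: string): bool;
                  relative = true; includeHidden = false;
                  includeDirs = false): seq[WalkResult] =
  ## `matcher` receives each path relative to `root`, with '/' separators.
  visit(root, root, matcher, relative, includeHidden, includeDirs, result)
  result.sort(proc (a, b: WalkResult): int = cmp(a.path, b.path))
```

Checking hidden names before recursing means a hidden directory hides its whole subtree, which is one reasonable reading of "hidden files and directories are not yielded"; the module's exact behavior may differ.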
iterator walkGlob(pattern: string | Glob; root = ""; relative = true; expandDirs = true; includeHidden = false; includeDirs = false): string
- Equivalent to walkGlobKinds but rather than yielding a GlobResult it yields only the path of the item, ignoring its kind.