src/glob

glob is a cross-platform, pure Nim module for matching files against Unix-style patterns. It supports creating patterns, testing file paths, and walking through directories to find matching files or directories. For example, the pattern src/**/*.nim matches all files with a .nim extension in the src directory and any of its subdirectories.

It's similar to Python's glob module but supports extended glob syntax like {} groups.

Note that while glob works on all platforms, the patterns it generates can be platform specific due to differing path separator characters.

Syntax

token   example      description
?       ?.nim        acts as a wildcard, matching any single character
*       *.nim        matches any string of any length until a path separator is found
**      **/license   same as * but crosses path boundaries to any depth
[]      [ch]         character class, matches any of the characters or ranges inside
{}      {nim,js}     string class (group), matches any of the strings inside
/       foo/*.js     literal path separator (even on Windows)
\       foo\*.js     escape character (not path separator, even on Windows)

Any other characters are matched literally. Note the difference between / and \: even on Windows, do not use \ as a path separator, because it is the escape character in glob syntax. Always use / as the path separator; the module substitutes the correct platform separator when the glob is created.
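The separator rule above can be sketched as follows. This is a minimal illustration that assumes the glob package is installed; the notes/ paths are made up for the example, and the escape results are printed rather than asserted.

```nim
import glob

# Write `/` in the pattern on every platform, including Windows.
let sources = glob("src/**/*.nim")

when defined(windows):
  doAssert r"src\dir\foo.nim".matches(sources)
else:
  doAssert "src/dir/foo.nim".matches(sources)

# `\` escapes the character after it, so `\*` is meant as a literal
# asterisk rather than a wildcard. Results are printed, not asserted.
let literalStar = glob(r"notes/\*.txt")
echo "notes/*.txt".matches(literalStar)
echo "notes/a.txt".matches(literalStar)
```

Using a raw string literal (r"...") for patterns that contain \ avoids having to double the backslash in Nim source.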

Character classes

Matching special characters

If you need to match some special characters like ] or - inside a bracket expression, you'll need to use them in specific ways to match them literally.

character   special   literal   description
]           [)}]]     []_.]     must come first or is treated as closing bracket
-           [_-=]     [-_]      must come first or last or is treated as a range
!           [!<>]     [<!>]     must not come first or is treated as negation character

POSIX classes

Within bracket expressions ([]) you can use POSIX character classes, which are basically named groups of characters. These are the available classes and their roughly equivalent regex values:

POSIX class   similar to                                meaning
[:upper:]     [A-Z]                                     uppercase letters
[:lower:]     [a-z]                                     lowercase letters
[:alpha:]     [A-Za-z]                                  upper- and lowercase letters
[:digit:]     [0-9]                                     digits
[:xdigit:]    [0-9A-Fa-f]                               hexadecimal digits
[:alnum:]     [A-Za-z0-9]                               digits, upper- and lowercase letters
[:word:]      [A-Za-z0-9_]                              alphanumeric and underscore
[:blank:]     [ \t]                                     space and TAB characters only
[:space:]     [ \t\n\r\f\v]                             blank (whitespace) characters
[:cntrl:]     [\x00-\x1F\x7F]                           control characters
[:ascii:]     [\x00-\x7F]                               ASCII characters
[:graph:]     [^ [:cntrl:]]                             graphic characters (all characters which have graphic representation)
[:punct:]     [!"\#$%&'()*+,-./:;<=>?@\[\]^_`{|}~]      punctuation (all graphic characters except letters and digits)
[:print:]     [[:graph:] ]                              graphic characters and space
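As a small sketch of how a POSIX class is written inside a bracket expression (note the doubled brackets: the outer pair is the bracket expression, the inner pair belongs to the class name). The file names are hypothetical, and the last line prints its result rather than asserting it.

```nim
import glob

# Exactly one digit between `img` and `.png`.
doAssert "img7.png".matches("img[[:digit:]].png")
doAssert not "imgx.png".matches("img[[:digit:]].png")

# A class can sit next to ordinary members in the same brackets;
# here the second character may be a letter or an underscore.
echo "v_1".matches("v[[:alpha:]_][[:digit:]]")
```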

Extended pattern matching

glob supports most of the extended pattern matching syntax found under bash's extglob flag:

pattern          description
?(...patterns)   match zero or one occurrences of the given patterns
*(...patterns)   match zero or more occurrences of the given patterns
+(...patterns)   match one or more occurrences of the given patterns
@(...patterns)   match one of the given patterns
!(...patterns)   match anything except the given patterns
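A hedged sketch of the @(...) form follows. It assumes that alternatives inside the parentheses are separated by |, as they are in bash's extglob; this section does not state the separator, so treat that as an assumption. The file names are hypothetical and the results are printed rather than asserted.

```nim
import glob

# ASSUMPTION: alternatives are separated by `|`, as in bash's extglob.
let scripts = glob("app.@(js|ts)")

for name in ["app.js", "app.ts", "app.css"]:
  echo name, " -> ", name.matches(scripts)
```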

Examples

For these examples let's imagine we have this file structure:

├─ assets/
  └─ img/
     ├─ favicon.ico
     └─ logo.svg
├─ src/
  ├─ glob/
    ├─ other.nim
    ├─ regexer.nim
    └─ private/
       └─ util.nim
  └─ glob.nim
└─ glob.nimble

glob pattern      files returned
*                 @["glob.nimble"]
src/*.nim         @["src/glob.nim"]
src/**/*.nim      @["src/glob.nim", "src/glob/other.nim", "src/glob/regexer.nim", "src/glob/private/util.nim"]
**/*.{ico,svg}    @["assets/img/favicon.ico", "assets/img/logo.svg"]
**/????.???       @["src/glob.nim", "src/glob/private/util.nim", "assets/img/logo.svg"]

For more info on glob syntax see this link for a good reference, although it references a few more extended features which aren't yet supported. As a cheatsheet, this wiki might also be useful.

Roadmap

Some features are not supported yet but may be added in the future, for example:

  • unicode character support
  • multiple patterns (something like glob(["*.nim", "!foo.nim"]))

Types

FilterDescend = (path: string) -> bool

A predicate controlling whether or not to recurse into a directory when iterating with a recursive glob pattern. Returning true will allow recursion, while returning false will prevent it.

path can either be relative or absolute, which depends on GlobOption.Absolute being present in the iterator's options.

FilterYield = (path: string, kind: PathComponent) -> bool

A predicate controlling whether or not to yield a filesystem item. Paths for which this predicate returns false will not be yielded.

path can either be relative or absolute, which depends on GlobOption.Absolute being present in the iterator's options. kind is an os.PathComponent.
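The two predicates can be sketched as ordinary procs and passed to an iterator by name. The procs skipPrivate and notOther are hypothetical helpers written for this illustration, and the loop assumes it is run from the root of the example file structure shown earlier.

```nim
import std/os
import glob

# Hypothetical FilterDescend: do not recurse into directories named `private`.
proc skipPrivate(path: string): bool =
  path.lastPathPart != "private"

# Hypothetical FilterYield: yield only regular files not named `other.nim`.
proc notOther(path: string, kind: PathComponent): bool =
  kind == pcFile and path.extractFilename != "other.nim"

for path in walkGlob("src/**/*.nim",
                     filterDescend = skipPrivate,
                     filterYield = notOther):
  echo path
```

Both helpers look only at the last path component, so they behave the same whether the iterator hands them relative or absolute paths.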

Glob = object
  pattern*: string
  regexStr*: string
  regex*: Regex
  base*: string
  magic*: string ## Represents a compiled glob pattern and its backing regex. Also stores
                 ## the glob's ``base`` & ``magic`` components as given by the
                 ## `splitPattern proc <#splitPattern,string>`_.
  
GlobEntry = tuple[path: string, kind: PathComponent]
Represents a filesystem entity matched by a glob pattern, containing the item's path and its kind as an os.PathComponent.
GlobOption {.pure.} = enum
  Absolute, IgnoreCase, NoExpandDirs, FollowLinks, ## iterator behavior
  Hidden, Files, Directories, FileLinks, DirLinks ## to yield or not to yield
Flags that control the behavior or results of the file system iterators. See defaultGlobOptions for some usage & examples.
flag                      meaning
GlobOption.Absolute       yield paths as absolute rather than relative to root
GlobOption.IgnoreCase     matching will ignore case differences
GlobOption.NoExpandDirs   if pattern is a directory don't treat it as <dir>/**/*
GlobOption.Hidden         yield hidden files or directories
GlobOption.Directories    yield directories
GlobOption.Files          yield files
GlobOption.DirLinks       yield links to directories
GlobOption.FileLinks      yield links to files
GlobOption.FollowLinks    recurse into directories through links
GlobOptions = set[GlobOption]
The set type containing flags for controlling glob behavior.
var options: GlobOptions = {}
# `incl` adds a single flag to the set
if someCondition: options.incl GlobOption.Absolute
PatternStems = tuple[base: string, magic: string]
The type returned by splitPattern where base contains the leading non-magic path components and magic contains any path segments containing or following special glob characters.

Consts

defaultGlobOptions = {GlobOption.Files, GlobOption.FileLinks,
                      GlobOption.DirLinks}
The default options used when none are provided. If a new set is provided it overrides the defaults entirely, so in order to partially modify the default options you can use Nim's set union (+) and difference (-) operators:
const optsNoFiles = defaultGlobOptions - {Files}
const optsHiddenNoLinks = defaultGlobOptions + {Hidden} - {FileLinks, DirLinks}

On case-insensitive filesystems (like Windows), this also includes GlobOption.IgnoreCase.
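As a sketch of adjusting the defaults at runtime rather than in a const, the standard set procs incl and excl can be applied to a copy of defaultGlobOptions. The variable name options is arbitrary, and the loop assumes a src directory exists under the current working directory.

```nim
import glob

# Start from the defaults, then add or remove individual flags.
var options = defaultGlobOptions
options.incl GlobOption.Absolute   # yield absolute paths
options.excl GlobOption.DirLinks   # stop yielding links to directories

doAssert GlobOption.Files in options
doAssert GlobOption.Absolute in options
doAssert GlobOption.DirLinks notin options

for path in walkGlob("src/**/*.nim", options = options):
  echo path
```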

Procs

func glob(pattern: string; isDos = isDosDefault; ignoreCase = isDosDefault): Glob {.
    ...raises: [GlobSyntaxError, RegexError], tags: [RootEffect], forbids: [].}
Constructs a new Glob object from the given pattern.
func globToRegex(pattern: string; isDos = isDosDefault;
                 ignoreCase = isDosDefault): Regex {.
    ...raises: [RegexError, GlobSyntaxError], tags: [], forbids: [].}
Converts a string glob pattern to a regex pattern.
func hasMagic(str: string): bool {....raises: [], tags: [RootEffect], forbids: [].}
Returns true if the given string is glob-like, i.e. if it contains any of the special characters *, ?, [, {, or an extglob, which is one of the characters ?, !, @, +, or * followed by (.

Example:

doAssert("*.nim".hasMagic)
doAssert("profile_picture.{png,jpg}".hasMagic)
doAssert(not "literal_match.html".hasMagic)
func matches(input, pattern: string; isDos = isDosDefault;
             ignoreCase = isDosDefault): bool {.
    ...raises: [RegexError, GlobSyntaxError], tags: [RootEffect], forbids: [].}
Check that input matches the given pattern and return true if it does. Shortcut for matches(input, glob(pattern, isDos, ignoreCase)).

Example:

when defined posix:
  doAssert "src/dir/foo.nim".matches("src/**/*.nim")
elif defined windows:
  doAssert r"src\dir\foo.nim".matches("src/**/*.nim")
func matches(input: string; glob: Glob): bool {....raises: [], tags: [RootEffect],
    forbids: [].}
Returns true if input is a match for the given glob object.

Example:

const matcher = glob("src/**/*.nim")
const matcher2 = glob("bar//src//**/*.nim")
when defined posix:
  doAssert("src/dir/foo.nim".matches(matcher))
  doAssert(not r"src\dir\foo.nim".matches(matcher))
  doAssert("bar/src/dir/foo.nim".matches(matcher2))
  doAssert("./bar//src/baz.nim".matches(matcher2))
elif defined windows:
  doAssert(r"src\dir\foo.nim".matches(matcher))
  doAssert(not "src/dir/foo.nim".matches(matcher))
func splitPattern(pattern: string): PatternStems {....raises: [],
    tags: [RootEffect], forbids: [].}

Splits the given pattern into two parts: the base which is the part containing no special glob characters and the magic which includes any path segments containing or following special glob characters.

When pattern is not glob-like, i.e. pattern.hasMagic == false, it will be considered a literal matcher and the entire pattern will be returned as magic, while base will be the empty string "".

Example:

doAssert "root_dir/inner/**/*.{jpg,gif}".splitPattern == ("root_dir/inner", "**/*.{jpg,gif}")
doAssert "this/is-a/literal-match.txt".splitPattern == ("", "this/is-a/literal-match.txt")
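One way hasMagic and splitPattern might be combined is to decide up front whether a pattern needs a directory walk at all. The describe proc below is a hypothetical helper written for this sketch, not part of the module; its expected strings follow from the splitPattern examples above.

```nim
import glob

# Hypothetical helper: report how a pattern would be handled.
proc describe(pattern: string): string =
  if not pattern.hasMagic:
    # No special characters: the whole pattern is a literal path.
    return "literal: " & pattern
  let (base, magic) = pattern.splitPattern
  "walk '" & base & "' matching '" & magic & "'"

doAssert describe("root_dir/inner/**/*.{jpg,gif}") ==
  "walk 'root_dir/inner' matching '**/*.{jpg,gif}'"
doAssert describe("this/is-a/literal-match.txt") ==
  "literal: this/is-a/literal-match.txt"
```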

Iterators

iterator walkGlob(pattern: string | Glob; root = "";
                  options = defaultGlobOptions;
                  filterDescend: FilterDescend = nil;
                  filterYield: FilterYield = nil): string

Iterates over all the paths within the scope of the given glob pattern, yielding all those that match. root defaults to the current working directory (by using os.getCurrentDir).

See GlobOption for the flags available to alter iteration behavior and output.

Example:

for path in walkGlob("src/*.nim"):
  ## `path` is a file only in the `src` directory (not any of its
  ## subdirectories) with the `.nim` file extension
  discard

for path in walkGlob("docs/**/*.{png,svg}"):
  ## `path` is a file in the `docs` directory or any of its
  ## subdirectories with either a `png` or `svg` file extension
  discard
iterator walkGlobKinds(pattern: string | Glob; root = "";
                       options = defaultGlobOptions;
                       filterDescend: FilterDescend = nil;
                       filterYield: FilterYield = nil): GlobEntry
Equivalent to walkGlob but yields a GlobEntry which contains the path as well as the kind of the item.

Example:

for path, kind in walkGlobKinds("src/*.nim"):
  doAssert(path is string and kind is PathComponent)

## include hidden items, exclude links
const optsHiddenNoLinks = defaultGlobOptions + {Hidden} - {FileLinks, DirLinks}
for path, kind in walkGlobKinds("src/**/*", options = optsHiddenNoLinks):
  doAssert(kind notin {pcLinkToFile, pcLinkToDir})