Introduction
I've implemented a couple Unix core utilities in Haskell, and want to continue
a series of posts going through the details - starting with simple programs
like cat
, seq
, and which
, and then moving on towards more featureful
programs like uniq
, tr
and maybe grep
.
So, let's implement which
in Haskell!
Background
On most operating systems (Linux, Windows, MacOS, *BSD), the PATH
environment
variable defines which directories contain executables. which
helps you find
an executable by searching through these directories.
On Linux, PATH
typically looks something like this:
% echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/sbin:/usr/local/sbin:/usr/sbin
A colon separated, ordered list of directory paths.
On Windows, PATH
looks like this
PS C:\> echo $Env:PATH
C:\Windows\system32;C:\Windows;C:\WindowsSystem32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Users\leaf\AppData\Roaming\local\bin
A semicolon separated, ordered list of directory paths.
The goal is the same for both, parse the variable output into a list, search
each of these directories in order, and return the first match. This is
mirroring the behavior that your shell (sh
, zsh
, powershell
) performs to
determine what executable to run based on the command names you provide.
It's reasonable for executables with the same name to appear in different
places in PATH
, which makes the ordering important. For instance, if you had
a custom cat
executable on Linux, you could place that in /usr/local/bin
,
and it would shadow the system copy in /bin
.
Module and Imports
The top of the file contains the module definition and imports. Since this is
going to be an executable, not a library, we use module Main where
. We'll
skip going through the imports for now, but reference them as we move through
the file. To follow along with the examples, you can put this header in a file
and load it from ghci
with :load which.hs
. This
GitHub repository has a working
.cabal
file as an example.
module Main where
-- which
--
-- look for the arguments on $PATH
import Control.Monad (filterM, when)
import Data.List.Split (splitOn)
import Data.Maybe
import System.Directory
import System.Environment (getArgs, getEnv)
import System.Exit (exitFailure)
import System.Info (os)
Data Flow
Our input comes from two places, the command line and the environment. PATH
expands to a list of directories that may contain the target file. We need to
maintain the following constraints:
- only consider directories on
PATH
that exist - there must be a file with the same name in the directory
- the file match we found must be executable
- if no match can be found, continue, but exit with failure when finished
After expanding PATH
, we'll be able to use filters to apply each of the
constraints.
Parsing PATH
Lets start with parsing PATH
. To read the variable, we need two functions to
start:
getEnv :: String -> IO String
splitOn :: Eq a => [a] -> [a] -> [[a]]
Composing these with <$>
shows we're on the right track - dividing
PATH
into directories.
*Main> take 5 . reverse . splitOn ":" <$> getEnv "PATH"
["/usr/local/games","/usr/games","/bin","/sbin","/usr/bin"]
What if some of these directories are invalid? Or point to a network location
that's not accessible right now? We need to filter out directories that don't
exist. Since checking for directory existance is necessarily dependant on the
outside world, we'll need filterM
instead of filter
.
*Main> take 5 . reverse . splitOn ":" <$> getEnv "PATH" >>= filterM doesDirectoryExist
["/usr/local/games","/usr/games","/bin","/sbin","/usr/bin"]
This works for Linux, but what about Windows? We need some platform dependant
behavior and a function to wrap this up. The os
constant from System.Info
gives us a plain String
to work with. Alternatively, we could use the CPP
language extension and #ifndef mingw32_HOST_OS
to make this decision at
compile time. For this kind of simple difference, keeping the behavior present
in the code seems cleaner.
getPaths :: IO [FilePath]
-- ^ convert the PATH variable to a list of valid directories
getPaths =
splitOn separator <$> getEnv "PATH"
>>= filterM doesDirectoryExist
where
separator = case os of
"mingw32" -> ";"
_ -> ":"
vs
getPaths :: IO [FilePath]
-- ^ convert the PATH variable to a list of valid directories
getPaths =
splitOn separator <$> getEnv "PATH"
>>= filterM doesDirectoryExist
where
#ifndef mingw32_HOST_OS
separator = ":" -- POSIX
#else
separator = ";" -- Windows
#endif
Filtering on IO
Now that we have a list of directories to consider, we can start restricting
(filter
) and manipulating (map
) the list to match the constraints. The
map
and filter
functions apply non-Monad functions to iterables. This is
what we're used to when working with pure code.
-- even :: Integral a => a -> Bool
-- map (*2) :: Num b => [b] -> [b]
*Main> map (*2) . filter even $ [1..10]
[4,8,12,16,20]
Our input comes from the IO Monad, but this isn't a problem. We can use <$>
to apply pure functions on Monad values, like we did to split the PATH
variable. The tricky part is that the functions we need to apply for filtering
and mapping produce IO values.
executable :: FilePath -> IO Bool
doesFileExist :: FilePath -> IO Bool
mapM
and filterM
allow us to apply these functions, and >>=
lets us bind
them together, similar to what we did with .
and map
+ filter
.
filterM :: Applicative m => (a -> m Bool) -> [a] -> m [a]
mapM :: (Traversable t, Monad m) => (a -> m b) -> t a -> m (t b)
The flow of data will be:
- a list of directories
- a list of filepaths (directory + file)
- a list of filepaths that exist
- a list of executable filepaths
Translating this flow into Haskell:
search :: String -> IO [FilePath]
-- ^ look for a file in the right directories, that's executable
-- instead of checking dir contents, create the path and check for existence
search file =
getPaths
>>= mapM addFileToPath
>>= filterM doesFileExist
>>= filterM runnable
where
addFileToPath directory = pure $ pathJoin directory file
With this framework in place, the next step is to fill in the supporting
functions to do the real world work. pathJoin
acts like Python's
os.path.join()
- be platform aware and don't add unnecessary path separators.
We can't guarantee that the path names end in separators. For example,
/usr/bin
vs /usr/bin/
.
runnable :: FilePath -> IO Bool
-- ^ is this file executable? expects it to exist
runnable file = executable <$> getPermissions file
pathJoin :: FilePath -> String -> FilePath
-- ^ join a directory path and filename, platform aware
-- could have used System.FilePath.Posix (filePath) here
pathJoin [] file = file
pathJoin base file =
if last base == separator
then base <> file
else base <> [separator] <> file
where
separator = case os of
"mingw32" -> '\\'
_ -> '/'
Putting it all together
The hard work is done, what's left to do is tie it all together. The error
condition is when no match could be found. Otherwise, we print the first
result. head
and Maybe
provide a simple way to express this.
main :: IO ()
-- ^ print the path to each argument if possible
-- if anything didn't exist, exit failure
main = do
paths <- getArgs >>= mapM which
when (any isNothing paths) exitFailure
mapM_ putStrLn $ catMaybes paths
which :: String -> IO (Maybe String)
-- ^ grab the first result, if there was one
which file =
maybeHead <$> search file
where
maybeHead [] = Nothing
maybeHead (x:_) = Just x
main
applies each command line argument to this function, and checks for
Nothing
.
Conclusion
We have a fully functional, platform independent implementation of which
in
Haskell! We used filterM
and mapM
to work with IO functions in an elegant
way, and produce a pipeline that applies each constraint. System.Info (os)
let's us account for differences in platforms. Moreover, it takes advantage of
Haskell's lazy evaluation, and won't continue searching for a match after it's
been found.
- Linux
$ stack exec which python bash
/usr/bin/python
/bin/bash
- Windows
PS C:\> stack exec which python.exe cmd.exe
C:\Users\leaf\AppData\Local\Programs\Python\Python37-32\python.exe
C:\Windows\system32\cmd.exe
Thanks to u/andersk for pointing some efficiency improvements!
Full implementation
Here's the full source. You can also find it here.
module Main where
-- which
--
-- look for the arguments on $PATH
import Control.Monad (filterM, when)
import Data.List.Split (splitOn)
import Data.Maybe
import System.Directory
import System.Environment (getArgs, getEnv)
import System.Exit (exitFailure)
import System.Info (os)
main :: IO ()
-- ^ print the path to each argument if possible
-- if anything didn't exist, exit failure
main = do
paths <- getArgs >>= mapM which
when (any isNothing paths) exitFailure
mapM_ putStrLn $ catMaybes paths
which :: String -> IO (Maybe String)
-- ^ grab the first result, if there was one
which file =
maybeHead <$> search file
where
maybeHead [] = Nothing
maybeHead (x:_) = Just x
search :: String -> IO [FilePath]
-- ^ look for a file in the right directories, that's executable
search file =
getPaths
>>= mapM addFileToPath
>>= filterM doesFileExist
>>= filterM runnable
where
addFileToPath directory = pure $ pathJoin directory file
pathJoin :: FilePath -> String -> FilePath
-- ^ join a directory path and filename, platform aware
-- could have used System.FilePath.Posix (filePath) here
pathJoin [] file = file
pathJoin base file =
if last base == separator
then base <> file
else base <> [separator] <> file
where
separator = case os of
"mingw32" -> '\\'
_ -> '/'
getPaths :: IO [FilePath]
-- ^ convert the PATH variable to a list of valid directories
getPaths =
splitOn separator <$> getEnv "PATH"
>>= filterM doesDirectoryExist
where
separator = case os of
"mingw32" -> ";"
_ -> ":"
runnable :: FilePath -> IO Bool
-- ^ is this file executable? expects it to exist
runnable file = executable <$> getPermissions file