Introducing F#

Jun 25, 11:00 pm

Article Author: Frank Antonsen
.NET 3.5 Books

Introduction


Functional Programming (FP) is getting more and more acceptance into the mainstream. Scripting languages such as Perl and Python are increasingly incorporating concepts from FP into their cores. The Standard Template Library (STL) of modern C++ also includes many FP ideas, and the most common library, Boost, contains even more. The increased interest in FP becomes very clear when you notice that both Sun and Microsoft are working on FP languages. Sun is working a language called Fortress, which is a mixed FP and OOP language for highly demanding numerical calculations, where people would normally use an old-timer like Fortran. Microsoft has very quickly adapted a functional language for the .NET platform as part of their experiments with generics.


This language, F#, is the topic of this article. It is an adaptation of a dialect of one of the most common modern FP languages, Meta Language (ML), which has its roots in experiments in compiler construction tools and automatic proof systems of the 1970’s designed by universities in Scotland. It quickly grew to be the main modern FP language, Microsoft is also working on an extension of C#, called Cω (C omega), which adds many FP notions to C#, but this is still very much an experimental language.


If you have been following my previous articles on FP in C#, you will already be familiar with the ML-family of languages. What I presented as pseudo-executable pseudo-code or FP pseudo-code in those articles is actually almost ML. The ML dialect selected by Microsoft as a basis for F# is technically known as OCaml. This is one of the fastest running languages around – it can rival hand-coded C/C++ in speed, and easily outperforms Java, not to mention the various scripting languages. It also supports normal object-oriented programming and it usually generates much faster executables. This makes interfacing to programs written in other .NET languages such as C# or Visual Basic a lot easier. After all, such interoperability is what .NET is all about, and now having a proper FP language at your disposal, you are better able to combine the best of both worlds.


When I m confronted with a new programming language, I usually find it helpful to rewrite an old familiar program in the new language, because I know exactly what the old program does, and can then concentrate on getting the syntax of the new language right. So, I will start by rewriting the string application from my previous Thinking Recursively article on FP in C# (see the Related Links section), just to get started. Then I ll turn to the more complete file application of the second article, this will be done in a sequel to this article. Along the way, you ll see how to define and manipulate Recursive Data Structures (RDS s) in F#.


System Requirements


To run this code, you should install the F# compiler from Microsoft first. This also comes with a plug-in for Visual Studio 2003/2005, which you should also install if you plan to use VS. It will run on any computer with at least the .NET Framework 1.1 installed. The samples in this article are based on version 1.1.8 of F#, released in January 2006.


At the time I originally wrote this article, if you wanted to use the interactive environment fsi, you would need to use .NET 2.0, but more recent versions of fsi will work with .NET 1.x (F# is evolving very rapidly!).


Installing and Compiling the Sample Code


The sample download for this article contains a VS solution containing 2 projects: StringApplication and FileApplication. These correspond to the applications that were developed in the FP in C# series so-far. To install the applications, simply download it from the ASPToday site, unzip to a folder and you can open it within VS to compile it.


To register F# with Visual Studio, you need to install the plug-in which comes with the installer from Microsoft. The compiler needs to know where to find a special DLL, fslib10.dll (fslibng.dll in older versions). This will provide the full benefit of IntelliSense for F#. I m not aware of any similar plug-in for SharpDeveloper or other IDE s at this time. Figure 1shows an example of F# being used in Visual Studio 2005.


.



Figure 1. An example of an F# project in Visual Studio 2005.


When compiling the two projects, Visual Studio will generate two executables, just like a normal .NET solution. These can then be run from the command line. I do not cover GUI applications in this article, but you do have the entire .NET library at your disposal in F#, so you can develop GUI applications in F# just like in any other .NET language.


Your First F# Program


In my pseudo-executable pseudo-code I usually define a function in terms of a series of special cases. For instance the length() function that calculated the number of elements in a list looked like this:



length [] = 0
length (x::xs) = 1 + length xs


Here [] denotes the empty list, and (x::xs) is short hand for a list whose first element is x, with the (possibly empty) remainder called xs. How would that look in F#, then? One way is like this:



open List
let rec mylength x = if x=[] then 0 else 1 + mylength (tl x);


I used the word mylength here because a function length(), doing precisely the same thing, already exists in F#.


First of all, any declaration, whether it is of a function or a variable, in F# must be preceded by the let keyword. Secondly, recursive definitions must be marked with the rec keyword


The F# analog to the C# using directive is the open directive. It works the same way as an explicit opening of a namespace. The F# library containing the list-manipulation functions is called simply List, and the open directive opens up this namespace. Here, I used the built-in tail-function, which is called simply tl() in the ML family. Similarly, the head function is called hd(). Notice, by the way, that brackets around arguments are optional – I’ll say more about this later. Functional languages also support another syntax for specifying special cases. In the snippet above, I used an if expression, very much like I would use an if statement in C#. A more common, and more powerful way is to use pattern matching. In F#, the way you write special cases for the arguments is by using the match keyword. Defining mylength using pattern matching would look like this:



let rec mylength lst = match lst with
                      []      -> 0
                      | x::xs -> 1 + mylength xs


The match expression is similar to a C# switch statement, it first specifies which identifier to do the matching on (here lst), it then lists the cases, after each case an arrow points to the code to execute if that case matches. In fact, you can also use a match expression anywhere you would normally use a switch statement. To get used to the syntax, you could try to write the definitions of the head and tail functions using pattern matching. Obviously, if you only have two cases, it might be a lot simpler to write if… else, just like you probably wouldn t use a switch statement in C# unless you have more than two cases.


If you re familiar with other FP languages, you ll notice that F# differs from many other FP languages; in standard ML, functions are declared using the fun keyword, whether they are recursive or not, while Haskell uses no special keywords at all and is thus virtually identical to the pseudo-code. These are really just minor syntactical variations, the underlying structure is the same. This makes F# programs more verbose than the corresponding program in standard ML or Haskell, but they are still quite terse compared to VB or C#.


Type Inference – Letting the Compiler Find the Right Type for You


Another thing to note here is the way F# works with types. Note that I didn t specify any types – the compiler automatically infers the correct (i.e. most general) type to be used. In this case, the inferred type will be ‘a list -> int, which you should read as a function taking a list of any type (‘a is a generic type variable) and returning an integer.


The compiler will automatically infer the correct type of an expression, so you rarely need to specify the types yourself. This could lead to the impression that F# is not as strongly typed as C#, but in fact, it is quite the opposite. All the languages of the ML-family are very strongly typed. In F# there is absolutely no implicit conversion, not even between two closely related types such as chars and strings or ints and doubles. It is a compile time error to write x = 1 + 1.0 – the compiler first induces that x must be an int, then it reaches the plus sign, and after that it reaches the conclusion that x must be a floating point number, at which point it reports a type error. You will have to explicitly convert one of the arguments to the type of the other using one of the built-in conversion functions such as Int.to_float() or Float.to_int(). The advantage of this very strong typing is that once a program compiles, a large number of common mistakes have been eliminated.


The empty list [] is a bit special, it is polymorphic, it is an empty list of any type. The same quantity, [], can denote an empty list of integers and an empty list of XML nodes. It all depends on the context, and in a given context it will always have precisely one explicit type. There is another special value, a bit similar to null in C#. It is denoted by (). Any type can be instantiated in F#, and a function with no return value (i.e., void in C#) would return the value () in F#. The technical name for the type of () is unit, which is the F#/ML analogue of void in C#. So, () is the unique instance of the unit type.


All the types from the Common Type System of .NET are available in F#, either through their normal name, such as System.Int32 or through F# type definitions, such as int. Moreover, F# uses the .NET generics feature, so types such as list<int> are available directly in F# already either by using the .NET 2.0 generic name above or writing the ML-like definition int list. In fact, as part of the F# project, the underlying opcodes in MS IL were augmented, resulting in an extended intermediate language called ILX. ILX contains a lot of very advanced features that I don t have the room to cover here. ILX doesn t replace the IL from the normal Common Language Runtime, instead you can think of it as a kind of library adding features otherwise only found in .NET 2.0 to .NET 1.1.


Combining Types – Lists and Tuples


Lists are the most important data structure in any FP language. So I think it is worthwhile to spend a little more time showing you how to use them in F#. First, an anonymous list is declared using square brackets with elements separated by semicolons. You can use this to initialize a named list, just like you would do for arrays in C#. For instance:



let lst = [1;2;3;4];


You add an element to the front of a list using the :: operator. The list above is actually the same as 1:: 2 :: 3 :: 4 :: [], which is how it is constructed internally. To append two lists of the same type, F# has the </span>operator:</p> <!-- @START —>

let lst = [1;2]  [3;4];
<p></pre><!-- @END —>


Personally, I find this choice of operator a bit strange, but that is the ML standard. In Haskell you would use ++ as an infix operator instead. The elements of a list must all be of the exact same type, but sometimes you want to have a similar kind of data structure containing different types. For this situation F# has tuples. These are denoted by parentheses with elements separated by commas, just like argument lists to functions in C#. An example of their use is below:



let pair = (1, "one")
let pair = 1 :: "one"
let (x,y,z) = (1 , 2, 3)


The last example shows how you can assign tuples in one go. You can use this to assign values to several variables in a single operation. This is actually an example of a common idiom in FP: you use a series of patterns to define behavior. This has nothing to do with design patterns, instead the word comes from pattern matching. What the compiler does when it sees a line like let (x,y,z) = (1 , 2, 3), as in the code snippet just quoted, is to match the pattern on the left – the tuple of unknown variables (x,y,z) – with the expression on the right – the tuple of three constants (1,2,3). The result of the match being the assignments x=1, y=2, and z=3. You saw the same pattern matching in the match expression introduced earlier. The type assigned to a tuple (x,y,z) is ‘a * ‘b * ‘c, where ‘a, ‘b, ‘c are the types of x, y, and z, respectively. This is known as a product type, the underlying mathematical construction is that of the usual Cartesian product of sets (the set of possible values for the given type). Moreover, just like the empty list [] was special (it was polymorphic and could match a list of any type) so is the empty tuple (). As I mentioned above this is a special type called a unit. Fortunately, you don t need to worry much about such subtleties when you write actual F# programs as you will see. I just mentioned it for completeness and for those who know about ML/FP already.


Surprisingly, prior to version 1.1.5 from November last year, F# did not have a standard list printing function, so I wrote one myself. Now a function called print or print_any (the two are synonymous) can be found in the ML library, MLLib.Pervasives, but it is still worth having a look at my earlier home written function. The construction is very simple, and actually closely follows the list printing function of my C# FunctionalProgramming library I ve been developing in my previous articles. But it also illustrates some interesting F# features. The code looks like this:



let print_list lst = let rec print_internal lst = 
                match lst with
                  [] -> ""
                  | x1 ::[] -> x1 
                  | x1 :: x2 -> x1 ^ "," ^ print_internal x2
                in
                  print_string ("[" ^ print_internal lst ^ "]")


Let me go through this step by step. The actual expression evaluated by this function is the very last line. It uses the string concatenation operator ^ together with the built-in print_string function. Instead of using this, I could also have written System.Console.WriteLine() and called the .NET library, but this is the ML-purist way. The main part of the code though, is the let … in expression. This declares the local definitions, in my case I only have one such local definition and it is a recursive function. In F# you can have local functions as well as local identifiers – there really is no distinction between the two. The (local or private) recursive function itself is defined in terms of three special cases: an empty list, a singleton list and a general list.


Like C#, F# has two types of comments: block comments and line-comments. The former are denoted by (* … ) instead of /..*/, while the latter is the same as in C#. I will be using the former comments before major blocks of code and use the latter for commenting individual lines, much like I would in C# actually.


You can define much more complicated types of data in F#, but I won t covered that here. In the following article I will show you how to define enumerated data types, struct-like types and more complex recursive data types. The compiler will accept files with either the .fs or the .ml extension, which allows you to port existing ML programs to .NET very easily


Executing F# Code


In FP you work by defining types and sets of functions that manipulating the types. This is very much what you do in C# too, except that in C#, all types must themselves be classes or fit inside other classes.


In F#, a program is no different from an expression. When you write applications in C#, you define a Main() method which the runtime system then invokes. Without a Main() method your program is really just a collection of data definitions and methods, but it doesn t actually execute any code. In F# you can do this, but you don’t need to do so. The result of a program is by default just the result of its last expression. By default, this will be written to the console upon exit. In the absence of a special method automatically invoked by the runtime, F# obviously needs some other way of telling the compiler to actually execute some particular code. In general, you can divide F# code into two categories: definitions (which you can recognize by the let keyword) and requests for execution. The keyword do is used to tell the compiler that the following line is not a definition but something it should actually compute. If a source file is to be executable, you would normally add one or more do statements at the end. These would then be executed in the order that they are found. For instance, to use the new mylength() function, you could write:



do print (mylength [1;2;3;4;5])


This brings me to another point worth mentioning and that is function applications. This may seem like a strange, esoteric topic to bring up, but actually there is more than one way to apply a function in F#. In C#, if I have a method taking two arguments I invoke it like this: obj.method(arg1, arg2). The parenthese are mandatory and I have to supply all the arguments. None of this holds in F#, I ve already mentioned that parentheses are optional in F#, and so far this has mostly been a matter of personal taste whether you should use them or not. In general, my own rule of thumb is to use them only when they are necessary to make code unambiguous. This is the same rule of thumb I use to decide whether to use parentheses in an expression, so it makes sense to me at least to use it here too. But there are actually cases where it would be wrong to put parentheses in. Remember F# treats values and functions on an equal footing, just like all FP languages tend to do. So I can write let x = y anywhere and expect it to define a new value x with the same type and value as y. Similarly, I can also write let g = f, if f is a function. This will define a new function g with the same type (i.e., signature) and the same value (i.e., function body) as f. Notice how I do not specify any arguments at all, in fact writing let g = f(x) will result in g having the same type and value as f s return value. Of course, g may very well still be a function (if f was a higher order function), but it is more likely to be a number or a string instead.


This becomes much more powerful if you only specify some of the arguments. This is known as partial application or currying. Suppose I have a function of two variables, I can then decide to fix the value of one of the arguments and obtain a new function of only one argument. Like this:



let fst = f 5       // fixes first argument
let fst y = f 5 y // equivalent
let snd x = f x 5 // fixes second argument


Here I fixed the argument to the value 5 in each case just to show you the principle. There are a lot of things to note in this innocent looking piece of code. First, if a function has several arguments, say x, y and z, I would normally use it like this f x y z and not f (x,y,z). There is a subtle difference between these two forms. The second, C#-like way of writing assumes the function takes a tuple as its argument, whereas the first assumes it takes three arguments. The types of the functions would differ. In the first case it would be ‘a -> ‘b -> ‘c -> ‘d, whereas in the second case it would be (‘a * ‘b * ‘c) -> ‘d. These are completely different types.


The first type may look strange at first, but it makes sense when you take currying into account. What it says is this: if you fix the first and second arguments, say, you get a function of type ‘c > ‘d, while if you fix all three you get something of type ‘d back. Since currying (or partial application) is such a powerful tool, once you get used to it most functional developers would prefer the ‘a>‘b->‘c->‘d type of function to the more C#-like (‘a*‘b*‘c)->‘d. For the same reason, most F# developers would avoid using parentheses around arguments. To make this work smoothly, FP developers tend to put arguments which are other functions first in the argument list.


The final thing you should notice in the example above is that one of the functions is called fst’. In F# (as in most modern FP languages) you can use an apostrophe in a variable name, provided that it is the last character (it is pronounced prime in that case).


Interactive F#


As of version 1.1.3, the F# release comes with an interactive environment. This is something most FP languages have and it is a great tool for getting familiar with language constructions and for experimenting. Once you start working with this tool, you will discover that it really boosts your productivity. (It would be nice to have something similar for C#) This new tool is called fsi and is located in the bin folder. It is called from the command line, but to use it, you must have .NET 2.0 installed. You can see an example of fsi in Figure 2.



Figure 2. The new interactive F# environment.


In this environment you can type any valid F# code and it will be evaluated immediately. However, in order to tell the interpreter that you ve finished typing and want it to compile and execute what you ve keyed in, you need to terminate the code with a double semicolon. If you do not end a line with this double semicolon, fsi will assume you intend to continue the code on the next line. There is no need for a line continuation character as in VB or in the C++ preprocessor. If your lines of code are valid F#, you will see the resulting type information displayed and if the line is executable, its computed value. You can see an example of this in Figure 3.



Figure 3. An example of an FSI session


This should actually be enough to get you started writing simple F# programs. What you still need to know is the names of the built-in functions like hd for head and tl for tail. But I think this is best done through some concrete examples. The F# installer creates a /samples directory together with a /fsharp-manual. You should take a close look at some of the samples, some of them demonstrate some very advanced techniques.


The F# Version of the StringApplication


In my previous article on FP in C#, entitled Thinking Recursively in C#, I showed you a simple function that finds the largest common substring of two strings. The first, imperative, version used looping over the characters of the two input strings keeping track of the best fit (i.e., longest substring) found so far. I then translated this to an FP-style C# application using recursion together with an accumulator argument. It is this version I want to translate into F# now. Just like you could write an imperative version of this in C#, you could also do that in F#, but since F# is a functional language an FP solution is more elegant.


The first step is to define two helper functions to convert a string into a list of characters and vice versa. These are called Convert.explode and Converter.implode, respectively. They are part of the standard ML library, but unfortunately not of the F# (or OCaml) libraries. But I can easily write them in F#:


Converting the Strings



(* conversion functions *)
let rec explode str = let len = String.length str in if len=0 then [] else (String.sub str 0 1) :: explode (String.sub str 1 (len-1))
let rec implode lst = match lst with [] -> "" | x1 :: x2 -> x1 ^ implode x2


Here I used the F# standard module String; its sub function extracts the substring quite similar to C# s Substring method. If I add an open String directive to my program I can skip the String prefix writing simply length and sub. With this out of the way, I can jump straight into rewriting the central functions. The first one to rewrite is the Utils.same function. This was a general purpose function which looked at two lists and returned a list of consecutive matching elements taken from the start of the list. Here it is in F#:



let rec same lst1 lst2 =
   match (lst1, lst2) with
       ([], _)        -> []   // first list is empty
     | (_,[])         -> []   // second list is empty
     | (x::xs, y::ys) -> if (x = y) then x::(same xs ys) else []


The only new thing here, besides the terseness, is the underscore in the first two match rules. This is a wildcard matching anything, so the first case looks for lst1 = [], and the second looks for lst2 = []. Next, I have to define the common1 and common2 functions. Here is a more or less direct translation into F# of my FP C# solution:



let rec common l1 l2 = 
      let rec common2 l1 l2 = 
             match (l1, l2) with
               ([], _) -> []         // first list empty
             | (_, []) -> []         // second list empty
               | (x::xs, y::ys)  -> let sub = implode (same l1 l2) in
                                       if (sub = "") then common2 l1 ys 
                                          else sub :: (common2 l1 ys)
       in
            match (l1, l2) with
             ([], _) -> []      // first list empty
           | (_, []) -> []      // second list empty
           | (x::xs, y::ys) -> (common2 l1 l2)  (common xs l2)
let longest s1 s2 = 
       let rec pickLongest s acc = 
           match s with
           []      -&gt; acc         // done, return accumulator
          |(x::xs) -&gt; pickLongest xs (if String.length x &gt; String.length acc
                                          then x else acc)
      in
          let (l1, l2) = (explode s1, explode s2) in 
              let rsd = common l1 l2 in 
                  pickLongest rsd &quot;&quot;
<p></pre><!-- @END —>


This way of solving the problem illustrates the use of nested let…in expressions. You can also see a quick way of assigning values to multiple identifiers using a tuple (l1, l2). Finally, in pickLongest you see how every expression in F# returns a value, in this case the if expression. Here, I m using the if expression the same way I would use the ternary ?: operator in C#; the C# version of that if expression would be (x.Length > acc.Length)? x : acc. In C# you probably shouldn t use the ternary operator too often in method arguments, but in F# you tend to use it because the code is usually terser anyway. I could just as easily have used another let…in expression.


This solution differs from the C# solution in a few ways. First of all, common2 is really just a helper function for common1, which is a general purpose function computing the list of common sublists. So common2 has been defined locally to common1, which has been renamed to simply common, so that there is no can be no confusing the names. Similarly, pickLongest has been made local to longest, which is a renamed version of the same function in C#. This reduced the solution to two general purpose functions. To improve it further, I would start applying some of the built-in list/tuple manipulating functions as well as various higher order ones.


Looping in F#


The whole point of the string application in my Thinking Recursively article was to help you turn imperative programs, using loops, into functional ones, using recursion. But you can actually loop in F#. It has both a for loop and a while loop, but they are a little different from what you are used to in C#. In F# everything has to have a value, so a loop must return some value (which can be unit – the F# version of void) In other words, loops can appear on the right hand side of an assignment.


The syntax is fairly familiar:



for i = 0 to 10 do myfunc i done  // for loop
while i <= 10 do myfunc i done // while loop


The do and done keywords serve to delimit the code to be executed, just like curly braces in C#. The type of the loop expressions themselves is unit. You can make the body of the loop span more than one expression by separating them with semicolons as you would in C#. The block is terminated with the done keyword. In F#, a semicolon is used for denoting the sequencing of expressions (and optionally for termination of a line). In well-structured F# programs you rarely need to have more than a single function call in a loop, but you can, if you wish. The same syntactical structure is used to make loops return a value.


When you do use a loop in F#, you would normally want it to modify a value. By default, any value in F# is immutable just like strings in C#. But there is a way around this, if you really want to overwrite a value and not create a new copy of it, you can do so by prefixing the identifier with the ! operator. For instance to make a purely iterative version of the factorial function, I could write something like this:



let fact n = (let res = ref 1 in 
             for i = 1 to n do
               res := !res * i;
               done;
             res)


The first statement declares a local variable, res, it is an integer initialized to the value one and I tell the compiler that I want to use it as an imperative variable by using the keyword ref. Then I use it in a for-loop. This loop uses a normal immutable F# variable i as a counter and re-assigns (operator :=) the res variable to its old value (the ! operator) times i. Finally it returns the accumulated result. While you can do this in F#, it is clearly only intended for special situations such as highly-optimized inner-loops.


Recursion is usually much more effective in F#, so you would typically only use these loops when you are working with non-F# data structures such as classes in the .NET library. The compiler is very good at generating efficient code for recursive functions and types.


To assign values to an F# array you use a variation of the list initialization syntax where you replace [ and ] with [| and |], for example let arr = [|1;2;3;4|]. You can read the ith element of this array like this arr.(!i), and you can assign something to it using the </span> operator, arr.(!i) < arr.(!j).


The CompatArray Module


If you work on older versions of .NET, you need to include a special utility module for converting between the generic arrays and lists of F# and the standard CLR array types. Inserting this code snippet at the top of the file, where the open directives are, will result in this utility module being loaded and being given the short name CompatArray:



module CompatArray = Microsoft.FSharp.Compatibility.CompatArray


This module conains a number of important functions (from_list and to_list) for converting between the two types.


Conclusion


This was a quick tour of the new F# language. I showed you how to use standard libraries with the open directive, and how to define values or functions with the let keyword. You also saw that the compiler automatically figures out the correct type, so there is little need to explicitly state the type as you do in C# or VB. This type inferencing, together with the very strong typing of F# can be a great help in eliminating errors at compile (or even editing) time. In fact, a very similar feature is proposed for version three of C# as part of LINQ.


We also saw that in F#, both values and functions can be local, and in both cases you would use the let … in expression to make them so. A particularly powerful feature of F#, found in almost all modern FP languages, is pattern matching. This is a kind of an advanced switch statement, and can be used to greatly simplify function definitions. I also showed you how to do imperative style programming, such as loops, in F#, and how to use the language in interactive mode. Both of these should make the transition from C# or VB easier.


Finally, you saw how to define simple types, and how to define functions in terms of others by currying them. In part 2 of this series, I ll cover more advanced types and higher order functions. I will also be covering anonymous functions and touch upon OOP in F# and how to interoperate with C#.

Founders at Work

Commenting is closed for this article.