Introduction
25 November 2021
Updated: 03 September 2023
These notes are based on working through The Rust Programming Language Book. It can also be accessed using
rustup docs --book
Getting Started
Hello World
To create a hello-world program simply create a new folder with a file called main.rs
which defines a main
function with the following:
main.rs
You can then compile your code with:
rustc main.rs
Thereafter, run it with (on Windows the executable is main.exe instead):
./main
A few things to note on the main.rs
file above:
- The main function is the entrypoint to the application
- println! is a macro (which apparently we will learn about in chapter 19)
- Semicolons ; are required at the end of each statement
Cargo
Cargo is Rust’s build system and package manager. To create a new project with Cargo use:
cargo new rust_intro
This should create a rust project in a rust_intro
directory with the following structure:
The Cargo.toml
file looks something like this:
Cargo.toml
You can now use cargo to build and run the application:
- cargo build builds the application
- cargo run builds and runs the application
- cargo check checks that code compiles but does not create an output
- cargo fmt formats source files
- cargo build --release builds a release version of the application
- cargo doc --open builds the documentation for the current project and its dependencies and opens it in the browser
A Basic Program
Basic Program
Let’s define a basic program below which takes an input from stdin, stores it in a variable, and prints it out to the stdout
src/main.rs
In the above, we can see the import statement that imports std::io
:
Next, the String::new()
creates a new String
value that’s assigned to a mutable
variable named guess
. We can either define mutable or immutable variables like so:
stdin().read_line
is a method that reads a line of input from stdin and appends it to the String behind the mutable reference given to it. Like variables, references can also be mutable or immutable. When using a reference we can do either of the following:
The read_line
function returns an io::Result
, in Rust, Result
types are enumerations which are either Ok
or Err
. The Result type has an expect method which takes the message that the program will panic with if the value is an Err, otherwise it returns the value held by the Ok
Leaving out the expect
will result in a compiler warning but the program will still be able to run and compile
Adding a Dependency
To add a dependency we need to add a new line to the Cargo.toml
file with the dependency name and version. We’ll need to add the rand
dependency to generate a random number
Cargo.toml
Then, run cargo build
to download the dependencies and build the application. Later, you can use cargo update to upgrade the dependencies to the newest versions allowed by Cargo.toml
Generating a Random Number
To generate a random number we’ll import rand::Rng
and use it as follows:
Even though we’re not using the Rng
import directly, it’s required in scope for the methods we are calling on the random number generator. Rng
is a trait, which defines methods that random number generators implement, and a trait must be in scope for us to call its methods
Parsing Input to a Number
To parse the input to a number we can use the parse
method on strings like so:
Note that we’re able to use the same guess
variable as before, this is because rust allows us to shadow a previous variable declaration, this is useful when doing things like converting data types. So the code now looks like this:
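As a minimal sketch of the shadowing step described above (the parse_guess helper and the error message are assumptions made for this illustration, not part of the original notes), the text read from stdin can be converted like this:

```rust
// `parse_guess` is a hypothetical helper used only for this illustration.
fn parse_guess(guess: &str) -> u32 {
    // `guess` starts out as text, e.g. "42\n" as read from stdin.
    // Shadowing lets us reuse the same name for the parsed number.
    let guess: u32 = guess
        .trim() // strip surrounding whitespace and the trailing newline
        .parse() // the u32 annotation tells parse which type to produce
        .expect("Please type a number!");
    guess
}
```

Calling parse_guess with input that is not a number will panic because of the expect call.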
Matching
The match
expression is used for flow-control/pattern matching and allows us to do something or return a specific value based on the branch that the code matches.
We can use the Ordering
enum for comparing the value of two numbers:
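A small sketch of such a comparison is shown here. The compare_guess function and the returned messages are assumptions for illustration, while cmp and Ordering come from the standard library:

```rust
use std::cmp::Ordering;

// Hypothetical helper: compares a guess against a secret number.
fn compare_guess(guess: u32, secret_number: u32) -> &'static str {
    // `cmp` returns an Ordering, and `match` must handle every variant.
    match guess.cmp(&secret_number) {
        Ordering::Less => "Too small!",
        Ordering::Greater => "Too big!",
        Ordering::Equal => "You win!",
    }
}
```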
Looping
To loop, we can use the loop
flow-control structure:
Lastly, we can break out of the loop using break
or move to the next iteration with continue
Using the above two statements, we can update the guess
definition using match
and continue
And when we compare the guess:
Final Product
Combining all the stuff mentioned above, the final code should look something like this:
main.rs
Core Concepts
Variables and Mutability
Variables
By default variables are immutable, but the mut
keyword lets us create a mutable variable as discussed above:
Constants
Rust also has constants, which are declared with the const keyword and a mandatory type annotation. A constant keeps the same value for the lifetime of a program within its defined scope, and can be defined at any scope (including the global scope). By convention constants are named in uppercase with underscores between words:
Constants can make use of a few simple operations when defining them but can’t be anything that would need to be evaluated at runtime for example. This is a key distinction between a constant and an immutable variable
Shadowing
Shadowing allows us to redeclare a variable within a scope. The new declaration shadows the previous one from the point of its definition onwards
Shadowing is different to a mutable variable and is pretty much just a way for us to re-declare a variable and reuse an existing name within a scope. This is useful for cases where we’re transforming a value in some way. Shadowing also allows us to change the type of a variable which is something that we can’t do with a mutable variable
For example, the below will work with shadowing but not mutability:
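A minimal sketch of this difference, assuming a hypothetical count_spaces function and a spaces variable chosen for illustration:

```rust
// Hypothetical example: the name `spaces` changes type through shadowing.
fn count_spaces() -> usize {
    let spaces = "   "; // a &str
    let spaces = spaces.len(); // shadowed: the same name is now a usize

    // With `let mut spaces = "   ";` the assignment `spaces = spaces.len();`
    // would be rejected with a mismatched types error, because a mutable
    // variable keeps its original type.
    spaces
}
```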
Data Types
Rust is a statically typed language. In most cases we don’t need to explicitly define the type of a variable, however in cases where the compiler can’t infer the type we do need to specify it, e.g. when parsing a string to a number:
We need to specify that the result of the parsing should be a u32
, or if we want to use i32
:
Scalar Types
Rust has 4 scalar types, namely:
- Integers
- Floating-point numbers
- Booleans
- Characters
Integers
An integer is a number without a fraction component. The available integer types are:
length | signed | unsigned |
---|---|---|
8 bit | i8 | u8 |
16 bit | i16 | u16 |
32 bit | i32 | u32 |
64 bit | i64 | u64 |
128 bit | i128 | u128 |
arch | isize | usize |
The arch size uses 32 or 64 bits depending on if the system is a 32 or 64 bit one respectively
Additionally, number literals can use a type suffix as well as an _
for separation, for example the below are all equal:
Floats
Rust has two types of floats, f32
for single-precision and f64
for double-precision values. The default is f64
which is about the same speed as f32
on modern CPUs but with added precision
Operations
The following operations are supported for integer and float operations:
Booleans
Booleans are defined in rust as either true
or false
using the bool
type:
Characters
Character types are defined using char
values specified with single quotes '
and store a Unicode scalar value, which means they can represent far more than ASCII, for example accented letters, CJK characters, and emoji
Compound Types
Compound types are used to group multiple values into a single type, Rust has two compound types:
- Tuple
- Array
Tuples
Tuples are groups of values. They have a fixed length once declared. Tuples can be defined with or without an explicit type, for example:
Tuples can also be destructured as you’d expect:
We can also access a specific index of a tuple using a .
followed by the index:
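The three tuple operations above (definition, destructuring, and index access) can be sketched together. The tuple_demo function and its values are assumptions chosen for illustration:

```rust
// Hypothetical demo of defining, destructuring, and indexing tuples.
fn tuple_demo() -> (i32, f64, char) {
    let inferred = (500, 6.4, 'a'); // element types are inferred
    let explicit: (i32, f64, char) = (500, 6.4, 'a'); // explicit type

    let (x, y, z) = explicit; // destructure into three variables
    let first = inferred.0; // access the element at index 0

    (x + first, y, z)
}
```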
Arrays
Arrays can hold a collection of the same type of value.
Arrays are also fixed-length once defined but other than that they’re pretty much the same as in most other languages. When defining the type of an array we use the syntax of [type; size]
though this can also usually be inferred
An example of defining an array can be seen below:
You can also create an array with the same value repeated using the following notation
Accessing elements of an array can be done using []
notation:
Note that when accessing an array’s elements outside of the valid range the program will panic with an index out of bounds
error during runtime
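The array operations in this section can be sketched as follows. The array_demo function is an assumption for illustration, and it uses the standard get method to check an out-of-range index safely instead of triggering the panic described above:

```rust
// Hypothetical demo of array definition, repetition syntax, and access.
fn array_demo() -> (i32, usize, bool) {
    let a: [i32; 5] = [1, 2, 3, 4, 5]; // type written as [type; size]
    let zeros = [0; 3]; // the value 0 repeated 3 times: [0, 0, 0]

    let first = a[0]; // index access with []

    // `a[index]` would panic here because 10 is out of range;
    // `get` returns None instead of panicking.
    let index = 10;
    let out_of_range = a.get(index).is_none();

    (first, zeros.len(), out_of_range)
}
```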
Functions
Defining a Function
Functions are defined using the fn
keyword along with ()
and {}
for its arguments and body respectively
And can be called as you would in other languages:
Parameters
Function parameters are defined using a name and type, like so:
Statements and Expressions
Functions may have multiple statements and can have an optional ending expression. Statements do not return values whereas expressions do
An example of an expression which can have a value is this scoped block:
In the above case, x
will have the value of the last expression in the block, in this case 10
. As we’ve also seen above, expressions don’t contain a ;
. Adding a ;
to an expression turns it into a statement
Function Return Values
As we’ve discussed in the case of a block above, a block can evaluate to the ending expression of that block. So in the case of a function, we can return a value by having it be the ending expression. However, we must also declare the return type of the function after an arrow (->) or it will not compile
A simple function that adds two numbers can be seen below:
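A minimal sketch of such a function (the name add and the i32 parameter types are assumptions for illustration):

```rust
// The return type is declared after the arrow.
fn add(a: i32, b: i32) -> i32 {
    a + b // no semicolon: this ending expression is the return value
}
```

Adding a semicolon after a + b would turn the expression into a statement, and the function would no longer compile because it would return unit instead of an i32.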
Comments
Comments make use of //
which is required at the start of each line that a comment is on:
Control Flow
If / Else
if
statements will branch based on a bool
expression. They can optionally contain multiple else if
branches, and an else
branch:
We can have only an if
statement:
Or an if
and else
:
Or even multiple else if
statements with an optional else
branch
Loops
loop
We can use the loop
keyword to define a loop that we can escape by using the break
keyword:
We can use the continue
keyword as well to early-end an iteration like so:
It’s also possible to have loops inside of other loops. We can label a loop and then break out of any level of nesting by using its label:
The above will log out:
Since we’re breaking the 'outer
loop from inside of the inner loop
Loops can also break and return a value like so:
In the above we are able to set last
to the value of the count
when the loop breaks
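A minimal sketch of a loop that breaks with a value, reusing the last and count names from the text (the last_count wrapper function and the limit of 10 are assumptions for illustration):

```rust
// Hypothetical wrapper so the result of the loop can be returned.
fn last_count() -> i32 {
    let mut count = 0;

    // The value given to `break` becomes the value of the loop expression.
    let last = loop {
        count += 1;
        if count == 10 {
            break count;
        }
    };

    last
}
```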
while
We also have a while
loop which will continue for as long as a specific condition is true
for
A for
loop allows us to iterate through the elements in a collection, we can do this as:
Additionally, it’s also possible to use a for
loop with a range instead of an array:
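Both forms of the for loop can be sketched side by side. The sum_array_and_range function and its values are assumptions for illustration:

```rust
// Hypothetical demo: sums the elements of an array and of a range.
fn sum_array_and_range() -> (i32, i32) {
    let values = [10, 20, 30];

    let mut array_sum = 0;
    for value in values.iter() {
        array_sum += value; // iterate over the elements of a collection
    }

    let mut range_sum = 0;
    for n in 1..4 {
        range_sum += n; // the range 1..4 yields 1, 2, 3 (end is exclusive)
    }

    (array_sum, range_sum)
}
```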
Ownership
Ownership is a concept of Rust that enables it to be memory safe without the need for a garbage collector
Ownership is implemented as a set of rules that are checked during compile time and does not slow down the program while it runs
Stack and Heap
Data stored on the stack must have a known, fixed size, data that has an unknown or changing size must be stored on the heap. The heap is less organized. Adding values to the heap is called allocating
. When storing data on the heap we also store a reference to its address on the stack. We call this a pointer
. When storing data to a stack we make use of pushing
Pushing to the stack is faster than allocating to the heap, and likewise for accessing data from the stack compared to the heap
When code calls a function, the values passed to the function as well as its variables are stored on the stack or heap. When the function is completed the values are popped off the stack
Ownership keeps track of what data is on the heap, reduces duplication, and cleans up unused data
Ownership Rules
- Each value in Rust has a variable that’s called its owner
- There can only be one owner at a time
- When an owner goes out of scope the value is dropped
Variable Scope
We can see the scope of a variable s
in a given block below:
The String Type
The types covered earlier are all fixed-size, but strings may have different, possibly unknown sizes
A general string is different to a string literal in that string literals are immutable. Aside from the string literal type Rust also has a String
type
We can use the String
type to construct a string from a literal like so:
The type of s
is now a String
compared to a literal which is &str
As mentioned, String
s can be mutable, which means we can add to it like so:
Memory and Allocation
In the case of a string literal the text contents are known at compile time and are hardcoded into the executable. The String
type can be mutable, which means that:
- Memory must be requested from the allocator during runtime
- Memory must be returned to the allocator when done being used
Rust does this by automatically returning memory once the variable that owns it is no longer in scope by calling a drop
function
Moving a Variable
When assigning and passing values around, we have two cases in Rust, for values which live on the stack:
An assignment like the above creates 2 values of 5 and binds them to the variables x
and y
. This is done by creating a copy of x
on the stack
However, when doing this with a value on the heap, both variables will reference the same value in memory:
As far as memory returning goes, there is a potential problem in the above code, which is that if x
goes out of scope and is dropped, the value for y
would also be dropped. Rust gets around this by invalidating x once its value has been moved to y, which means the previous variable can’t be used anymore and doing the below will give us an error:
And this is the error:
Cloning a Variable
If we do want to create an actual deep copy of data from the heap, we can use the clone
method. So on the String
type this would look like:
Using this, both x
and y
will be dropped independent of one another
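A minimal sketch of cloning, reusing the x and y names from the text (the clone_demo wrapper function is an assumption for illustration):

```rust
// Hypothetical demo: both strings remain usable after the clone.
fn clone_demo() -> (String, String) {
    let x = String::from("hello");
    let y = x.clone(); // deep copy of the heap data

    // With `let y = x;` the value would be moved instead,
    // and using `x` afterwards would fail to compile.
    (x, y)
}
```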
Ownership and Functions
When passing a variable to a function the new function will take ownership of a variable, and unless we return ownership back to the caller the value will be dropped:
If we want to return ownership back to the caller’s scope, we need to return it from the function like so:
References and Borrowing
The issue with ownership handling as discussed above is that it’s often the case that we would want to use the variable that was passed to a function and that the caller may want to retain ownership of it
Because of this, Rust has a concept of borrowing which is when we pass a variable by reference while allowing the original owner of a variable to remain the owner
We can pass a value by reference using &
when defining and calling the function like so:
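A minimal sketch of borrowing, assuming a hypothetical calculate_length function. The & appears both in the parameter type and at the call site:

```rust
// The function borrows the String instead of taking ownership of it.
fn calculate_length(s: &String) -> usize {
    s.len()
}

// Hypothetical demo: `s1` is still owned by the caller after the call.
fn borrow_demo() -> (String, usize) {
    let s1 = String::from("hello");
    let len = calculate_length(&s1); // pass a reference with &
    (s1, len) // `s1` can still be used because it was only borrowed
}
```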
Mutable References
By default, references are immutable and the callee can’t modify the value they point to. If we want to make a reference mutable we make use of &mut
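A minimal sketch of a mutable reference, where the append_world and mutable_borrow_demo functions are assumptions for illustration. Note that the variable itself must also be declared with mut:

```rust
// The callee receives a mutable reference and may modify the value.
fn append_world(s: &mut String) {
    s.push_str(", world");
}

// Hypothetical demo: the caller keeps ownership and sees the change.
fn mutable_borrow_demo() -> String {
    let mut s = String::from("hello");
    append_world(&mut s); // pass a mutable reference with &mut
    s
}
```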
An important limitation to note is that only one mutable reference to a variable can exist at a time, which means that the following will not compile:
The error we see is:
The restriction above ensures that mutation of a variable is controlled and helps prevent bugs like data races at compile time
A data race occurs when all of the following are true:
- Two or more pointers access the same data at the same time
- At least one of the pointers is being used to write to the data
- There’s no mechanism to synchronize access to the data
Dangling References
In other languages with pointers it’s possible to create a pointer that references a location in memory that may have been cleared. With Rust the compiler ensures that there are no dangling references. For example, the below function will not compile:
With the following error:
What’s happening in the above code can be seen below:
Since in the above code, s
is created inside of the function it will go out of scope when the function is completed. This leads to the value for s
being dropped which means that the result will be a reference to a value that is no longer valid
In order to avoid this, we need to return s
so that we can pass ownership back to the caller
Though the code is now valid, we still get a warning because result
is not used, however this will not prevent compilation and the above code will still run
Rules of References
The following are the basic rules for working with references
- You can either have one mutable reference or any number of immutable references to a variable simultaneously
- References must always be valid
The Slice Type
The slice type is a reference to a sequence of elements in a collection instead of the collection itself. Since a slice is a kind of reference in itself it doesn’t have ownership of the original collection
To understand the usecase for a slice type, take the following example of a function that needs to get the first word in a string
Since we don’t want to create a copy of the string, we can maybe try something that lets us get the index of the space in the string
Next, say we want to use the above function in our code:
However, we may run into a problem where the position no longer can be used to index the input string
In order to mitigate this, we can make use of a Slice that references a part of this string
The syntax for slicing is to use a range within the brackets of a collection. So for a string:
In the above, since the 0
is the start and 11
is the end of the string, we can also leave these out of the range to automatically get the start and end parts of the collection
The hello
and world
variables now contain a reference to the specific parts of the String
without creating a new value
We can also use the following to refer to the full string:
The type of hello
can also be seen to be &str
which is a string slice: an immutable reference to part (or all) of a string
Using this, we can redefine the first_word
function like this:
Using the function now will give us an error:
Something else to note is that string literals are stored as slices. If we try to use a string literal with our function like so:
We will get a mismatched types
error:
This is because our function requires &String
, we can change our function instead to use &str
which will work on string references as well as slices:
The above will work with references to String
and str
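A sketch of what a first_word function taking &str could look like is shown here. The body is an assumption (one common way to write it), since only the function name and parameter type are given in the text:

```rust
// Returns a slice of the input up to (but not including) the first space,
// or the whole input if it contains no space.
fn first_word(s: &str) -> &str {
    for (i, &byte) in s.as_bytes().iter().enumerate() {
        if byte == b' ' {
            return &s[..i]; // slice from the start up to the space
        }
    }
    s
}
```

Because the parameter is a &str, the function accepts string literals, slices of a String, and a &String (which is automatically coerced to &str).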
Other Slice Types
Slices apply to general collections, so we can use it with an array like so:
Slices can also be used by giving them a start element and a length, like so:
Summary
Ownership, borrowing, and slices help us ensure memory safety and control as we have seen above
Structs
A struct is a data type used for packaging and naming related values.
Structs are similar to tuples in that they can hold multiple pieces of data
Defining a Struct
We can define structs using the struct
keyword:
We can create a struct by defining a concrete instance like so:
Struct instances can also be mutable which will allow us to modify properties:
We can also return structs from functions, as well as using the shorthand struct syntax:
We can also use the struct update syntax to create a new struct based on an existing one:
It should also be noted that when creating a struct like this, we can no longer refer to values in the original user
as this will give us an error:
This is because the value has been moved to the new struct
Tuple Structs
Tuple structs can be defined using the following syntax:
And can then be used like:
Unit Type Structs
You can also define a struct that has no data (is unit) by using this syntax:
Unit structs are useful when we want to define a type that implements a trait or some other type but doesn’t have any data on the type itself
Printing Structs
In order to make a struct printable we can derive the Debug trait using the #[derive(Debug)] attribute and then print it with the {:?} debug format specifier, like this:
Also, instead of the println!
macro, we can use dbg!
like so:
The dbg!
macro will also return ownership of the input, which means that we can use it while instantiating a Rectangle
as well as when trying to view the entire data:
And the output:
We can see that the value assigned to height
is logged as well as assigned to the rect
struct
Method Syntax
Methods can be added to structs using the impl
block. Everything in this block is a part of the Rectangle
struct
The first parameter of a method is always self, which represents the instance the method is called on and is most commonly borrowed as &self. We can add an area
method to the Rectangle
struct like so:
And we can then use this method on our rect
like so:
Note that in the area
function, &self
is shorthand for self: &Self
which is a reference to the current instance
We can also define methods that have the same name as a struct field, these methods can then return something different if called vs when accessed
Multiple Parameters
We can give methods multiple parameters by defining them after self
Associated Functions
Functions defined in an impl
block are called associated functions because they’re associated with the type named in the impl
block
We can also define associated functions that don’t have self
as their first param (and are therefore not methods) like so:
We can use these functions with the ::
syntax:
The above syntax tells us that the function is namespaced by the struct. This syntax is used for associated functions as well as namespaces created by modules
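The method and associated function syntax from this section can be sketched together. The Rectangle and area names come from the text, while the width and height field types, the can_hold method, and the square constructor are assumptions for illustration:

```rust
#[derive(Debug)]
struct Rectangle {
    width: u32,
    height: u32,
}

impl Rectangle {
    // Associated function without self: called as Rectangle::square(3).
    fn square(size: u32) -> Self {
        Self { width: size, height: size }
    }

    // Method: &self is shorthand for self: &Self.
    fn area(&self) -> u32 {
        self.width * self.height
    }

    // Method with an additional parameter defined after self.
    fn can_hold(&self, other: &Rectangle) -> bool {
        self.width > other.width && self.height > other.height
    }
}
```

A method is called on an instance with a dot, for example rect.area(), whereas the associated function is called through the type with ::.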
Multiple Impl Blocks
Note that it is also allowed for us to have multiple impl
blocks for a struct, though not necessary in the cases we’ve done here
Enums and Pattern Matching
Enums in Rust are most like algebraic data types in functional languages like F#
Defining an Enum
Enums can be defined using the enum
keyword:
We can also associate a value with an enum like so:
We can further make it such that each of the enum values are of a different type
Enums can also be defined with their data inline:
The impl
keyword can also be used to define methods on enums like with structs:
The Option Enum
The Option enum is defined in the standard library like so:
The Option
enum is also included by default and can be used without specifying the namespace
Option types help us avoid null values and ensure that we correctly handle for when we do or don’t have data
In general, in order to handle an Option
value we need to ensure that we handle the None
and Some
values
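A minimal sketch of handling both cases of an Option, where the describe function and its messages are assumptions for illustration:

```rust
// Hypothetical helper: the compiler forces us to handle Some and None.
fn describe(value: Option<i32>) -> String {
    match value {
        Some(n) => format!("got {}", n), // data is available
        None => String::from("nothing"), // no data
    }
}
```

Leaving out either arm results in a compile error, which is how Option avoids the unchecked null values found in other languages.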
The Match Control Flow Construct
The match
construct allows us to compare a value against a set of patterns and then execute based on that, it’s used like this:
We can also have a more complex body for the match body:
We can also use patterns to handle the data from a match, for example:
The match
can also use an _
to handle all other cases:
Or, can also capture the value like this:
If we would like to do nothing, we can also return unit:
If-Let Control Flow
In Rust, something that’s commonly done is to do something based on a single pattern match, the previous example can be done like this using if let
which is a little cleaner
We can also use if let
with an else, for example in the below snippet we return a boolean value:
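A minimal sketch of if let with an else branch that returns a boolean. The is_three function and the pattern Some(3) are assumptions for illustration:

```rust
// Hypothetical helper: true only when the option holds exactly 3.
fn is_three(value: Option<u8>) -> bool {
    if let Some(3) = value {
        true // the single pattern we care about matched
    } else {
        false // every other value, including None
    }
}
```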
Packages, Crates, and Modules
As programs grow, there becomes a need to organize and separate code to make it easier to manage
- Packages: A cargo feature for building, testing, and sharing crates
- Crates: A tree of modules that produces a library or executable
- Modules and Use: Let you control organization, scope, and privacy
- Paths: A way of naming an item such as a struct, function, or module
Packages and Crates
A package is one or more crates that provide a set of functionality. A package contains a Cargo.toml
file that describes how to build those crates
Crates can either be a binary or library. Binary crates must have a main
which will run the binary
Libraries don’t have a main
and can’t be executed
Cargo follows a convention that src/main.rs
is a binary crate with the name of the package, and src/lib.rs
is the library crate root
A package can have additional binary crates by placing them in the src/bin
directory. Each file in here will be a separate binary crate
Defining Modules to Control Scope and Privacy
- Starts from the crate root, either src/main.rs or src/lib.rs
- Modules are declared in the root file using the mod keyword. The compiler will look for the module in the following locations. For example, a module called garden:
  - If using mod garden followed by curly brackets: inline, directly following the declaration
  - If using mod garden; then in the file src/garden.rs or src/garden/mod.rs
- Submodules can be declared in other files and the compiler will follow a similar pattern as above, for example a module vegetables in the garden module:
  - mod vegetables {...} as inline
  - mod vegetables; as src/garden/vegetables.rs or src/garden/vegetables/mod.rs
- Paths can refer to code from a module. For example, a type Carrot in the vegetables submodule would be used with crate::garden::vegetables::Carrot (as long as privacy rules allow it to be used)
- Modules are private by default, and can be made public by adding pub when declaring the module, for example pub mod vegetables
- The use keyword allows us to create a shorthand for an item, for example after use crate::garden::vegetables::Carrot we can just use Carrot in a file without the full path
We can create a new library package using cargo new --lib <LIB NAME>
The src/lib.rs
and src/main.rs
are called crate roots and things accessed relative to here are accessed by the crate
module
Referencing Items by Paths
Paths can be referenced in one of two ways:
- An absolute path using
crate
- A relative path using
self
,super
, or an identifier in the current module
For example, referencing add_to_waitlist
in the below file may look like this:
Also note the pub
keyword which allows us to access the relevant submodules and function
Best Practices for Packages with a Binary and Library
When a package has a binary src/main.rs
crate root and a src/lib.rs
crate root, both crates will have the package name by default, in this case you should have the minimum necessary code in the main.rs
to start the binary, while keeping the public API in the lib.rs
good so that it can be used by other consumers easily. Like this, the binary crate uses the library crate and allows it to be a client of the library
Relative Paths with Super
Modules can use the super
path to access items from a higher level module:
Bring Paths into Scope
We can bring paths into scope using the use
keyword, this makes it easier to access an item from a higher level scope
For example, say we wanted to add a method undo_order_fix
at the top level of our module file, and we need to use a method from the back_of_house
module, we can do this like so:
Note that if we were to try to use the top-level use
in another submodule this would not work and we would need to import it within the scope of that module
For example, the following will fail:
But this will work:
Idiomatic Paths
In the above examples, we’re importing paths to functions and using them directly, however, this isn’t the preferred way to do this in rust, it’s instead preferred to keep the last-level of the path in order to identify that the path is part of another module. So for example, instead of the above use this instead:
The exception to this rule is when importing structs or enums into scope, we prefer to use the full name, for example:
The exception to this is when bringing two items with the same name into scope:
This is to ensure that we refer to the correct value
Renaming Imports
In order to get around naming issues, we can also use as
to rename a specific import - so the above example can be written like:
Re-exporting Names
We can re-export an item with pub use
, like so:
This is often useful to restructure the imports from a client perspective when our internal module organization is different than what we want to make the public API
Nested Paths
When importing a lot of items from a similar path, you can join the imports together in a few ways
For modules under the same parent:
Can become:
For using the top-level as well as some items under a path:
Can become:
And for importing all items under a specific path using a glob:
Note that using globbing can make it difficult to identify where a specific item has been imported from
The Glob operator is often used when writing tests to bring a specific module into scope
Common Collections
Collections allow us to store data that can grow and shrink in size
- Vectors store a variable number of values next to each other
- Strings are collections of characters
- Hash Maps allow us to store a value with a particular key association
Vectors
Vectors allow us to store a list of a single data type.
Creating a Vector
When creating a vector without initial items you need to provide a type annotation:
Or using Vec::from
if we have an initial list of items
The above can also be done using the vec!
macro
Updating Values
In order to update the values of a vector we must first define it as mutable, thereafter we can add items to it using the push
method
Dropping a Vector Drops its Elements
When a vector gets dropped, all of its contents are dropped, which means that if there are still references to an element we will have a compiler error
Reading Elements
We can read elements using either []
notation or the .get
method:
The difference in the two access methods above is that d1 is an &i32 whereas d2 is an Option<&i32>
This is an important distinction when the index is outside of the vector’s bounds: [] indexing will panic with an index out of bounds error at runtime, whereas .get returns None
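A minimal sketch of both access styles, reusing the d, d1, and d2 names from the text (the read_elements wrapper function and the vector contents are assumptions for illustration):

```rust
// Hypothetical demo comparing [] indexing with the get method.
fn read_elements() -> (i32, Option<i32>, Option<i32>) {
    let d = vec![1, 2, 3];

    let d1: &i32 = &d[0]; // panics if the index is out of bounds
    let d2: Option<&i32> = d.get(1); // Some(&2)
    let missing = d.get(99); // None instead of a panic

    // Copy the values out so nothing borrowed from `d` is returned.
    (*d1, d2.copied(), missing.copied())
}
```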
Note that as long as we have a borrowed, immutable reference to an item we can’t also use the mutable reference to do stuff, for example we can’t push an item into d
while d1
is still in scope:
The above will not compile with the following error:
The reason for the above error is that vectors allocate items in memory next to one another, which means that adding a new element to a vector may result in the entire vector being reallocated, which would impact any existing references
Looping over Elements
We can use a for in
loop to iterate over elements in a vector, like so:
We can also use the *
dereference operator to modify the elements in the vector:
Which is equivalent to:
And will add 10 to each item in the list
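A minimal sketch of modifying elements in place with the dereference operator (the add_ten function is an assumption for illustration):

```rust
// Hypothetical helper: adds 10 to every element of the vector.
fn add_ten(mut items: Vec<i32>) -> Vec<i32> {
    // Iterating over `&mut items` yields a mutable reference per element.
    for item in &mut items {
        *item += 10; // dereference to reach the value behind the reference
    }
    items
}
```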
Using Enums to Store Multiple Types
Vectors can only store values that are the same type. In order to store multiple types of data we can store them as enums with specific values. For example, if we want to store a list of users
Remove the Last Element
We can use the pop
method to remove the last element from the vector
UTF-8 Encoded Text as Strings
The String
type differs from str
in that it is growable, mutable, and owned
Creating a String
Strings are created using the new
function:
When we have some initial data for the string, we can use the .to_string
method:
An alternative to the above is:
Updating a String
We can use functions like push_str
to add to a string:
Strings can be updated by using concatenation (+
) or the format!
macro:
If we want to create a new string from a concatenation of multiple other strings then we can use the format!
macro:
Indexing into Strings
Rust doesn’t support indexing into a String with an integer index. This is because a String is stored internally as UTF-8 encoded bytes, so a byte index may not correspond to the character we expect
This is especially relevant for non-ASCII text: since a character can take between one and four bytes in UTF-8, something that looks like a simple string may be encoded as a non-trivial sequence of bytes
Slicing Strings
Since indexing can return unexpected values, it’s usually more appropriate to be explicit and create a slice using a byte range. Note that slicing through the middle of a multi-byte character will panic at runtime:
Hash Maps
Hash maps store key-value pairs as HashMap<K, V>
. This is basically a map/object/dictionary and works like so:
Importing
To use a hash map, we need to import it:
Adding Items
We can also create items using a list of keys and a list of values along with the iterators and the collect method
The zip
method gathers data into an iterator of tuples, and the collect
method gathers data into different collection types. In our case, a HashMap
We specify HashMap<_,_>
to tell the collect method that we want a hash map back and not some other kind of collection. We use the _
because rust can still infer the type of the key and value
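A minimal sketch of building a hash map with zip and collect. The build_scores function and the names and scores used are assumptions for illustration:

```rust
use std::collections::HashMap;

// Hypothetical demo: pairs up a list of keys with a list of values.
fn build_scores() -> HashMap<String, i32> {
    let names = vec![String::from("alice"), String::from("bob")];
    let scores = vec![10, 50];

    // zip produces an iterator of (name, score) tuples and collect
    // gathers them into the collection named by the annotation.
    let users: HashMap<_, _> = names.into_iter().zip(scores.into_iter()).collect();
    users
}
```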
Get a Value by Key
We can get a value by key using the .entry
method
Note that users needs to be defined as mutable in order for us to use the .entry method
Update Item
We can update an item by just setting the key and value like when we added them initially:
Update Item if Exists
Another use case is inserting a value only if the key does not already exist in the map. This can be done by calling .or_insert on the result of the .entry
method. We can see that here the entry for bob is not updated:
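A minimal sketch of this behaviour, reusing the users and bob names from the text (the insert_if_missing function, the alice key, and the values are assumptions for illustration):

```rust
use std::collections::HashMap;

// Hypothetical demo: or_insert only writes when the key is missing.
fn insert_if_missing() -> HashMap<String, i32> {
    let mut users = HashMap::new();
    users.insert(String::from("bob"), 50);

    // "bob" already exists, so the existing value 50 is kept.
    users.entry(String::from("bob")).or_insert(99);
    // "alice" does not exist yet, so 10 is inserted.
    users.entry(String::from("alice")).or_insert(10);

    users
}
```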
Error Handling
Rust uses the Result<T,E>
type for recoverable errors, and panic!
for errors that should stop execution
Unrecoverable Errors with panic!
We can panic like so:
We can also cause a panic indirectly by performing an invalid operation, such as an out-of-bounds access. Rust panics in these cases rather than allowing undefined behaviour:
Which will result in:
Recoverable Errors with Result
Usually if we encounter an error that we can handle, we should return a Result
The Result
type is an enum defined in the standard library as follows:
The Result type is always in scope, so it doesn’t need to be imported
The Result
type is usually used with a match expression, for example:
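A minimal sketch of matching on a Result. The parse_port function and its fallback behaviour are assumptions made for the example:

```rust
// For reference, the standard library defines the type as:
//     enum Result<T, E> { Ok(T), Err(E) }

/// Parses a port number, falling back to `default` when the text is invalid.
fn parse_port(text: &str, default: u16) -> u16 {
    match text.parse::<u16>() {
        Ok(port) => port,
        Err(error) => {
            eprintln!("could not parse {:?}: {}", text, error);
            default
        }
    }
}

fn main() {
    println!("{}", parse_port("8080", 80));
    println!("{}", parse_port("nope", 80));
}
```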
We can also do further specific checks on the kind of error carried by the Result value
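One way to sketch this is with integer parsing, where the error exposes a kind we can match on. The describe_parse function and its messages are illustrative:

```rust
use std::num::IntErrorKind;

/// Describes the outcome of parsing `text` as an i32.
fn describe_parse(text: &str) -> String {
    match text.parse::<i32>() {
        Ok(n) => format!("parsed {}", n),
        Err(e) => match e.kind() {
            IntErrorKind::Empty => String::from("input was empty"),
            IntErrorKind::InvalidDigit => String::from("input contained a non-digit"),
            // IntErrorKind is non-exhaustive, so a catch-all arm is required.
            other => format!("other parse error: {:?}", other),
        },
    }
}

fn main() {
    println!("{}", describe_parse("4x"));
}
```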
The shortcut to panic on error is the .unwrap
method which can be used like so:
Or we can unwrap with a specific error message:
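Both shortcuts can be sketched as below. The parse_or_panic helper and its message are made up for the example:

```rust
/// Panics with our own message when `text` is not a valid integer.
fn parse_or_panic(text: &str) -> i32 {
    text.parse().expect("input should be a valid integer")
}

fn main() {
    // unwrap: returns the Ok value, or panics with a generic message on Err.
    let a: i32 = "42".parse().unwrap();
    // expect: same behaviour, but the panic carries the message we supply.
    let b = parse_or_panic("7");
    println!("{}", a + b);
}
```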
We can also choose to handle only Ok
values by using the ?
operator, and then have the error automatically propagated like so:
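A short sketch of propagation with ?, using a hypothetical add_text function:

```rust
use std::num::ParseIntError;

/// Adds two numbers given as text; the first parse failure is returned early.
fn add_text(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let a: i32 = a.parse()?; // on Err, return that Err from add_text
    let b: i32 = b.parse()?;
    Ok(a + b)
}

fn main() {
    println!("{:?}", add_text("1", "2"));
    println!("{:?}", add_text("1", "x"));
}
```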
The ? operator returns early from the function with the error if the value is an Err. The ? operator can only be used in functions whose return type implements FromResidual, such as Result or Option
When to Panic
Generally, we use unwrap
or expect
when prototyping or writing tests, but other than this it can be okay to use .unwrap
in a case where the compiler thinks that we may have a Result
but due to the specific circumstance we know that we won’t have an error
An example of when we know that the data will definitely not be an error can be seen below:
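A sketch along the lines of the book’s hard-coded IP address example. The localhost function name is an assumption:

```rust
use std::net::IpAddr;

/// The literal is hard-coded and valid, so parsing can never return an Err.
fn localhost() -> IpAddr {
    "127.0.0.1"
        .parse()
        .expect("hard-coded IP address should be valid")
}

fn main() {
    println!("{}", localhost());
}
```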
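Otherwise it is advisable to panic when it’s possible that code could end up in a bad state, for example if invalid or contradictory values are passed to our code. Failures that are expected to happen occasionally, such as a user entering data in an incorrect format, are better handled by returning a Result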
Generics, Traits, and Lifetimes
Generic Functions
We can define generic functions using the following structure:
For example, a function that finds the first value in a slice would be defined like so:
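A minimal sketch. The name first and the Option return type are choices made for this example:

```rust
// General shape: fn name<T>(param: SomeType<T>) -> ReturnType<T> { ... }

/// Returns a reference to the first value in the slice, if there is one.
fn first<T>(items: &[T]) -> Option<&T> {
    items.first()
}

fn main() {
    println!("{:?}", first(&[3, 1, 2]));
    println!("{:?}", first(&["a", "b"]));
}
```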
Generic Structs
We can use a generic in a struct like so:
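For example, with a hypothetical Point struct whose fields share one generic type:

```rust
#[derive(Debug, PartialEq)]
struct Point<T> {
    x: T,
    y: T,
}

fn main() {
    let int_point = Point { x: 1, y: 2 };
    let float_point = Point { x: 1.0, y: 2.5 };
    println!("{:?} {:?}", int_point, float_point);
}
```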
Generic Enums
Enums work the same as structs, for example the Option
enum:
Or for the Result
enum:
Method Definitions
Methods can also be implemented on generic structs and enums, for example:
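A sketch using a hypothetical Point struct, with one method available for every T and one only for f64:

```rust
struct Point<T> {
    x: T,
    y: T,
}

// Available for every `T`.
impl<T> Point<T> {
    fn x(&self) -> &T {
        &self.x
    }
}

// Available only when `T` is `f64`.
impl Point<f64> {
    fn distance_from_origin(&self) -> f64 {
        (self.x.powi(2) + self.y.powi(2)).sqrt()
    }
}

fn main() {
    let p = Point { x: 3.0, y: 4.0 };
    println!("{} {}", p.x(), p.distance_from_origin());
}
```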
Traits
A trait defines functionality that a specific type can have and share with other types. This helps us define shared behaviour in an abstract way
We can use traits to enforce constraints on generics so that we can specify that a generic type meets a specific requirement
Defining a Trait
A trait can be defined using the trait keyword, for example, we can define a trait called Identifier with a get_id method:
Implement a Trait
We then create a type that implements this like so:
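A sketch of the trait and one implementation. The String return type and the User struct are assumptions, since only the trait and method names are given above:

```rust
trait Identifier {
    fn get_id(&self) -> String;
}

// Hypothetical type used to show an implementation.
struct User {
    id: u32,
}

impl Identifier for User {
    fn get_id(&self) -> String {
        format!("user-{}", self.id)
    }
}

fn main() {
    println!("{}", User { id: 7 }.get_id());
}
```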
Default Implementations
Traits can also have a default implementation, for example we can have a Validity
trait which looks like this:
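A sketch of a default implementation. The is_valid method name and the two example types are hypothetical:

```rust
trait Validity {
    // Default implementation, used unless a type overrides it.
    fn is_valid(&self) -> bool {
        true
    }
}

struct Draft;
impl Validity for Draft {} // keeps the default

struct Age(i32);
impl Validity for Age {
    // Overrides the default.
    fn is_valid(&self) -> bool {
        self.0 >= 0
    }
}

fn main() {
    println!("{} {}", Draft.is_valid(), Age(-1).is_valid());
}
```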
Traits as Parameters
We can specify a trait as a function parameter like so:
The above is syntactic sugar for something known as a trait bound
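The two equivalent forms can be sketched side by side. The trait is repeated so the example is self-contained, and the describe functions are hypothetical:

```rust
trait Identifier {
    fn get_id(&self) -> String;
}

struct User {
    id: u32,
}

impl Identifier for User {
    fn get_id(&self) -> String {
        format!("user-{}", self.id)
    }
}

// `impl Trait` in parameter position...
fn describe(item: &impl Identifier) -> String {
    format!("id = {}", item.get_id())
}

// ...is shorthand for a generic parameter with a trait bound.
fn describe_bound<T: Identifier>(item: &T) -> String {
    format!("id = {}", item.get_id())
}

fn main() {
    let user = User { id: 1 };
    println!("{} / {}", describe(&user), describe_bound(&user));
}
```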
We can also specify that a value needs to implement multiple traits, as well as multiple different generics:
Trait bounds can also be specified using the where clause when we have multiple bounds:
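A sketch of both spellings, using standard library traits as the bounds. The label functions are hypothetical:

```rust
use std::fmt::{Debug, Display};

// Multiple traits joined with `+`, and two different generic parameters.
fn label<T: Display + Clone, U: Debug>(item: &T, extra: &U) -> String {
    format!("{} / {:?}", item, extra)
}

// The same bounds written with a `where` clause.
fn label_where<T, U>(item: &T, extra: &U) -> String
where
    T: Display + Clone,
    U: Debug,
{
    format!("{} / {:?}", item, extra)
}

fn main() {
    println!("{}", label(&1, &"a"));
    println!("{}", label_where(&1, &"a"));
}
```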
Conditional Trait Implementation
Traits can also be implemented conditionally for every type that satisfies a trait bound (known as a blanket implementation), like so:
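A minimal sketch, assuming a made-up Shout trait that is implemented for every type that implements Display:

```rust
use std::fmt::Display;

trait Shout {
    fn shout(&self) -> String;
}

// Blanket implementation: any `T` that is `Display` gets `shout` for free.
impl<T: Display> Shout for T {
    fn shout(&self) -> String {
        self.to_string().to_uppercase() + "!"
    }
}

fn main() {
    println!("{}", String::from("hi").shout());
    println!("{}", 42_i32.shout());
}
```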
Validate References with Lifetimes
Lifetimes are a kind of generic that ensures that a reference’s data is valid for as long as we need it to be
Prevent Dangling References
Lifetimes aim to prevent dangling references
If we try to run the below code we will have a compiler error:
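A sketch of the situation, with the offending line commented out so the rest compiles. The function name and the copied variable are additions for illustration:

```rust
fn inner_scope_value() -> i32 {
    let r;                // ---------+-- 'a
    let copied;           //          |
    {                     //          |
        let x = 5;        // -+-- 'b  |
        r = &x;           //  |       |
        copied = *r;      //  | fine: x is still alive here
    }                     // -+       |
    // println!("{}", r); // uncommenting this is rejected with error E0597:
    //                    // `x` does not live long enough
    copied
}

fn main() {
    println!("{}", inner_scope_value());
}
```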
This is because x, the value being referenced, doesn’t live as long as r, which is in the outer scope
In the above example, we would say that the lifetime of r is 'a and the lifetime of x is 'b. We can also see that the 'a block extends beyond the 'b block, so 'a outlives 'b and so r will not be able to reference x in the outer scope
Given the following function longest
:
Rust will not be able to compile this since it can’t tell whether the returned reference refers to x or y, and therefore can’t tell how long the result stays valid. In this case, compilation fails with a missing lifetime specifier error (E0106)
In this context, the compiler is telling us that we need to specify a lifetime parameter and it shows us how we need to do that:
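The annotated version, following the book’s longest example, can be sketched as:

```rust
// Without <'a> on the signature this is rejected with error E0106.
// The annotation says: the result is valid for as long as both inputs are.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let a = String::from("abcd");
    let b = "xyz";
    println!("{}", longest(a.as_str(), b));
}
```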
Using lifetime values helps the compiler identify issues more easily
The above also prevents us from using result
outside of the 'b
scope since it’s not valid due to the lifetime value:
We’ll get the following error:
This is because result is used after the 'b scope has ended, and due to the lifetime constraint the value of result is only valid for the shortest of the lifetimes passed in
Using the concept of lifetimes we can also specify a lifetime parameter that shows that the return type of a function is not dependent on a specific value, for example in the following function, we only care about the lifetime of x:
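A sketch with a hypothetical first_arg function, where only x carries the lifetime used by the return type:

```rust
// `_y` has its own unrelated lifetime, so the result only depends on `x`.
fn first_arg<'a>(x: &'a str, _y: &str) -> &'a str {
    x
}

fn main() {
    let x = String::from("kept");
    let result;
    {
        let y = String::from("temporary");
        result = first_arg(&x, &y);
    }
    // Fine: `result` borrows from `x`, which is still alive here.
    println!("{}", result);
}
```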
Lifetimes in Structs
We can define structs to hold references, however if this is done then it becomes necessary to add a lifetime annotation to each reference in the struct’s definition. Struct lifetime definitions look like so:
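A minimal sketch, assuming a made-up Excerpt struct that borrows part of a longer string:

```rust
// The struct cannot outlive the string slice it refers to.
struct Excerpt<'a> {
    text: &'a str,
}

/// Borrows everything up to the first '.' of `novel`.
fn first_sentence(novel: &str) -> Excerpt<'_> {
    Excerpt {
        text: novel.split('.').next().unwrap_or(novel),
    }
}

fn main() {
    let novel = String::from("Call me Ishmael. Some years ago...");
    println!("{}", first_sentence(&novel).text);
}
```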
Lifetime Elision
Often in Rust the compiler can infer lifetimes from common patterns in a function’s signature, in which case we don’t need to specify them explicitly. The rules that the compiler uses to do this are called the lifetime elision rules
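As a sketch, the two hypothetical functions below have the same signature as far as the compiler is concerned:

```rust
// Written with elision: one input reference, so the output borrows from it.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// The same function with the inferred lifetime written out.
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    println!("{} {}", first_word("hello world"), first_word_explicit("hello world"));
}
```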
Lifetimes on Method Definitions
Lifetimes on methods are specified like so:
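A sketch on a hypothetical Excerpt struct. The method names are made up for the example:

```rust
struct Excerpt<'a> {
    text: &'a str,
}

// The lifetime is declared after `impl` and used after the struct name.
impl<'a> Excerpt<'a> {
    // No references are returned, so nothing else needs annotating.
    fn level(&self) -> i32 {
        3
    }

    // The returned reference is given the lifetime of `&self` by elision.
    fn announce(&self, announcement: &str) -> &str {
        println!("Attention: {}", announcement);
        self.text
    }
}

fn main() {
    let e = Excerpt { text: "abc" };
    println!("{} {}", e.level(), e.announce("hi"));
}
```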
By the lifetime elision rules, when a method takes &self the lifetime of a returned reference defaults to the lifetime of self, which means that we often don’t need to specify it in the input or output values
The Static Lifetime
The 'static lifetime is the lifetime of a value that’s stored in the program’s binary and is always available. All string literals are 'static
Automated Tests
How to Write Tests
File Structure
A test file should contain tests in their own isolated module marked with the #[cfg(test)] attribute, with each test having the #[test] attribute on it. We usually also want our tests to make use of code from the module we want to test. We can do this with the use super::*; import, which brings everything from the parent module (the file we’re in) into scope
Writing Tests
In the src/lib.rs
file add the following content implementing what was discussed above:
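A minimal sketch of such a file. The is_even function is a placeholder for whatever code is being tested:

```rust
/// Hypothetical function under test.
pub fn is_even(n: i32) -> bool {
    n % 2 == 0
}

#[cfg(test)]
mod tests {
    // Bring everything from the parent module into scope.
    use super::*;

    #[test]
    fn four_is_even() {
        assert!(is_even(4));
    }

    #[test]
    fn three_is_not_even() {
        assert!(!is_even(3));
    }
}
```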
In the above test, we can also see the assert! macro, which panics and so fails the test if its argument evaluates to false
The following are some of the other test macros we can use:
Macro | Use |
---|---|
assert! | Check that a boolean is true |
assert_eq! | Check that two values are equal |
assert_ne! | Check that two values are not equal |
Running Tests
To run tests, use cargo test, which will automatically run all tests within the project:
cargo test
Testing for Panics
We can also state that a test should panic by using the #[should_panic]
attribute on a specific test:
In the above example it’s also possible for us to specify the exact panic message for the test:
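Both forms can be sketched as below. The checked_percent function and its panic message are made up for the example:

```rust
/// Hypothetical function that panics on out-of-range input.
pub fn checked_percent(value: u32) -> u32 {
    if value > 100 {
        panic!("percent must be at most 100, got {}", value);
    }
    value
}

#[cfg(test)]
mod tests {
    use super::*;

    // Passes if the body panics with any message.
    #[test]
    #[should_panic]
    fn rejects_large_values() {
        checked_percent(150);
    }

    // Passes only if the panic message contains the expected text.
    #[test]
    #[should_panic(expected = "at most 100")]
    fn rejects_with_message() {
        checked_percent(101);
    }
}
```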
Return Result from a Test
Instead of panicking we can also get a test to return a Result, for example:
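A sketch with a hypothetical parse_age function. Returning Result from the test also lets us use the ? operator inside it:

```rust
/// Hypothetical function under test.
pub fn parse_age(text: &str) -> Result<u8, String> {
    text.trim()
        .parse::<u8>()
        .map_err(|e| format!("invalid age {:?}: {}", text, e))
}

#[cfg(test)]
mod tests {
    use super::*;

    // The test fails if it returns an Err.
    #[test]
    fn parses_valid_age() -> Result<(), String> {
        let age = parse_age("42")?;
        if age == 42 {
            Ok(())
        } else {
            Err(format!("expected 42, got {}", age))
        }
    }
}
```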
Controlling Test Runs
By default, the test runner will compile and run all tests in parallel, but we can control this more specifically
Consecutive Test Runs
We can limit the number of threads to one to get tests to run sequentially:
cargo test -- --test-threads=1
Show Test Output
By default, any prints from within a passing test will not be shown, if we want to see these we can instead use:
cargo test -- --show-output
Running Tests by Name
We can specify a set of tests to run by passing all or part of a test name to cargo test. So if we want to run all tests with the word hello in their name:
cargo test hello
Ignoring a Test Unless Specified
We can make the default test runner ignore a specific test with the #[ignore] attribute. The test runner will then only run it if we specify that it should:
cargo test -- --ignored
Conventions
Unit Tests
Unit tests are usually contained in the same file as the code being tested in a module called tests
Integration Tests
Integration tests are stored in a tests directory next to src, cargo knows to find integration tests here
Additionally, for code we want to share between integration tests we can create a module in the tests directory (for example tests/common/mod.rs) and import it from there
Functional Language Features
Closures
Closures are anonymous functions that can capture their outer scope
Defining Closures
Closures can be defined in the following ways:
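The common forms can be sketched as below. The closure_forms wrapper exists only so the example can return the results:

```rust
fn closure_forms() -> [i32; 3] {
    // Fully annotated, with a block body.
    let add_one_v1 = |x: i32| -> i32 { x + 1 };
    // Types inferred from usage, block body.
    let add_one_v2 = |x| { x + 1 };
    // Types inferred from usage, single-expression body.
    let add_one_v3 = |x| x + 1;

    [add_one_v1(1), add_one_v2(1), add_one_v3(1)]
}

fn main() {
    println!("{:?}", closure_forms());
}
```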
Capturing the Environment
A simple closure which immutably borrows a
which is defined externally can be created and used like so:
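A minimal sketch, where the function name and the value of a are made up for the example:

```rust
fn sum_with_offset(values: &[i32]) -> Vec<i32> {
    let a = 10; // defined outside the closure
    let add_a = |x: &i32| x + a; // the closure borrows `a` immutably
    let shifted = values.iter().map(add_a).collect();
    // `a` was only borrowed, so it is still usable here.
    println!("a is still usable: {}", a);
    shifted
}

fn main() {
    println!("{:?}", sum_with_offset(&[1, 2]));
}
```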
Borrowing Mutably
Closures can also make use of mutable borrows. For example, the below closure will add an item to a Vec
:
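A sketch with a hypothetical collect_squares function:

```rust
fn collect_squares(limit: u32) -> Vec<u32> {
    let mut items = Vec::new();
    // The closure borrows `items` mutably, so it must itself be `mut`.
    let mut push_square = |n: u32| items.push(n * n);
    for n in 1..=limit {
        push_square(n);
    }
    // The mutable borrow ends after the last call, so `items` is usable again.
    items
}

fn main() {
    println!("{:?}", collect_squares(3));
}
```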
Types of Closures
There are three traits that closures can implement. Depending on which ones a closure implements, the compiler will enforce how it can be used in certain places:
- FnOnce - Can be called at least once. All closures implement it, and a closure that moves captured values out of its body implements only FnOnce, so it can only be called once
- FnMut - Does not move captured values out of its body but may mutate captured values. Can be called more than once
- Fn - Does not move captured values out of its body or mutate them. Can be called more than once
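The three traits can be sketched as bounds on hypothetical helper functions, each accepting a different kind of closure:

```rust
fn call_once<F: FnOnce() -> String>(f: F) -> String {
    f()
}

fn call_twice_mut<F: FnMut()>(mut f: F) {
    f();
    f();
}

fn call_twice<F: Fn() -> usize>(f: F) -> usize {
    f() + f()
}

fn demo() -> (String, u32, usize) {
    let name = String::from("rust");
    let len = call_twice(|| name.len()); // Fn: only reads `name`
    let mut count = 0;
    call_twice_mut(|| count += 1); // FnMut: mutates `count`
    let moved = call_once(move || name); // FnOnce: moves `name` out of the closure
    (moved, count, len)
}

fn main() {
    println!("{:?}", demo());
}
```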
Iterators
Iterators allow us to perform a sequence of operations over a sequence.
Iterators in Rust are lazy and are not computed until a method that consumes the iterator is called
The Iterator Trait
All iterators implement the Iterator trait, which requires a next method and an associated Item type:
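A sketch of the trait’s shape, followed by a made-up Countdown type that implements it:

```rust
// The trait in the standard library looks roughly like this:
//     pub trait Iterator {
//         type Item;
//         fn next(&mut self) -> Option<Self::Item>;
//         // ...plus many provided methods built on `next`
//     }

struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    type Item = u32;

    // Yields remaining, remaining - 1, ..., 1 and then None.
    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            None
        } else {
            self.remaining -= 1;
            Some(self.remaining + 1)
        }
    }
}

fn main() {
    for n in (Countdown { remaining: 3 }) {
        println!("{}", n);
    }
}
```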
Consuming Adaptors
Methods that call the next
method are called consuming adaptors because calling them uses up the iterator
These methods take ownership of the iterator which means that after they are called we can’t use the iterator
Examples of this are the sum
or collect
methods
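A short sketch of both, using a hypothetical total_and_copy function:

```rust
fn total_and_copy(values: &[i32]) -> (i32, Vec<i32>) {
    let iter = values.iter();
    let total: i32 = iter.sum(); // `sum` takes ownership of `iter`
    // `iter` can't be used any more, so we build a fresh iterator for `collect`.
    let copy: Vec<i32> = values.iter().copied().collect();
    (total, copy)
}

fn main() {
    println!("{:?}", total_and_copy(&[1, 2, 3]));
}
```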
Iterator Adaptors
Methods that transform the iterator are called iterator adaptors and they turn the iterator from one type of iterator into another. These can be chained in order to handle more complex iterator processes
Since iterator adaptors are lazy we need to call a consuming adaptor in order to evaluate a final value
Examples of this are the map
or filter
methods
Example of Closures and Iterator
An example using iterators with closures can be seen below:
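A sketch that chains filter and map with closures, one of which captures a parameter. The even_squares function is hypothetical:

```rust
fn even_squares(values: &[u32], at_least: u32) -> Vec<u32> {
    values
        .iter()
        .filter(|v| **v % 2 == 0) // keep even numbers
        .map(|v| v * v) // square them
        .filter(|sq| *sq >= at_least) // this closure captures `at_least`
        .collect() // nothing runs until the iterator is consumed here
}

fn main() {
    println!("{:?}", even_squares(&[1, 2, 3, 4, 5, 6], 10));
}
```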
Smart Pointers
A pointer is a general concept for a variable that contains an address in memory
Smart pointers act like pointers but have some additional functionality. Smart pointers also differ from references in that they usually own the data
Some smart pointers in the standard library include String
and Vec
Smart pointers are implemented as structs that implement Deref and Drop. Deref allows a smart pointer to behave like a reference, and Drop allows customization of the code that gets run when a smart pointer goes out of scope
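A minimal sketch of both traits on a made-up Wrapper type:

```rust
use std::ops::Deref;

struct Wrapper(String);

impl Deref for Wrapper {
    type Target = String;

    fn deref(&self) -> &String {
        &self.0
    }
}

impl Drop for Wrapper {
    fn drop(&mut self) {
        println!("dropping {}", self.0);
    }
}

fn wrapped_len(text: &str) -> usize {
    let w = Wrapper(text.to_string());
    // `len` is found on the inner String through Deref.
    w.len()
    // `w` goes out of scope here, which runs `drop`.
}

fn main() {
    println!("{}", wrapped_len("hello"));
}
```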
Some other smart pointers in the standard library include:
- Box<T> - allocating values on the heap
- Rc<T> - counting references to allow for multiple ownership
- Ref<T>, RefMut<T> - accessed through RefCell<T>, enforce borrowing rules at runtime
Box<T>
A box is the most straightforward smart pointer. It allows us to store data on the heap, while the pointer to that data stays on the stack
Boxes have no performance overhead other than storing their data on the heap, and they don’t do a lot else. They are often used in the following situations:
- Types whose size can’t be known at compile time but which we want to use in places that require an exact size
- Large amounts of data that we want to transfer ownership of without the data being copied
- When we want to own a value and care only that it implements a particular trait rather than about its specific type
Using Box<T> to Store Data on the Heap
Using Box::new we can create a value that will be stored on the heap
Often it doesn’t make sense to store simple values like i32 on the heap, this is usually used for more complex data
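A minimal sketch with a hypothetical boxed_double function. An i32 is used only to keep the example short:

```rust
fn boxed_double(n: i32) -> i32 {
    // The i32 lives on the heap, while `b` (the pointer) lives on the stack.
    let b = Box::new(n);
    // Dereference to read the value. The heap memory is freed when `b`
    // goes out of scope at the end of the function.
    *b * 2
}

fn main() {
    println!("{}", boxed_double(21));
}
```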
Recursive Types using Boxes
Recursive types are problematic since rust can’t tell the size of the type at compile time. In order to help, we can Box
a value in the recursive type
For example, take the type below which can be used to represent a person’s reporting structure in a company
A Person
can either be a Boss
or an Employee
with a manager who is either another Employee
or a Boss
Trying to compile the above will result in a recursive type has infinite size error (E0072)
As per the compiler message, we can use a Box, which will fix the unknown size issue since its size is known at compile time:
The code can then be used to represent an employee’s reporting hierarchy:
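A sketch of the boxed type and its use. The field names, the sample people, and the levels_below_boss helper are assumptions, since only the Person, Boss and Employee names are given above:

```rust
#[derive(Debug)]
enum Person {
    Boss { name: String },
    // The Box gives this variant a known size: one String plus one pointer.
    Employee { name: String, manager: Box<Person> },
}

/// Counts how many managers sit between `person` and the top of the hierarchy.
fn levels_below_boss(person: &Person) -> usize {
    match person {
        Person::Boss { .. } => 0,
        Person::Employee { manager, .. } => 1 + levels_below_boss(manager),
    }
}

fn main() {
    let boss = Person::Boss { name: String::from("Ada") };
    let lead = Person::Employee { name: String::from("Bob"), manager: Box::new(boss) };
    let dev = Person::Employee { name: String::from("Cy"), manager: Box::new(lead) };
    println!("{:?}", dev);
    println!("levels below boss: {}", levels_below_boss(&dev));
}
```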
Which prints:
Boxes only provide indirection and heap allocation and don’t have any other special capabilities