Initially I started writing this article because of the poor quality of Module 4, Lesson 4 in the original version of the course MOC 2780. What's wrong with those 6 pages? Technically nothing, but if you've never seen what cryptography does in security systems, you're lost. Fortunately Microsoft Learning has acknowledged the course and its timetable were "suboptimal" and will be releasing the B-revision soon. From a MOC2780B (revised module 4, lesson 4) perspective, this article can be considered additional reading.
In much the same way as in MOC2780A, the entire Module 4 of MOC 2787 is affected as well. Therefore, reading this article is recommended in preparation for exams 70-443, 70-444 and 70-447.
To be honest, I don't know how relevant the encryption material is to exam 70-431. I don't recall it from the beta exam I sat, nor have I read reports that specifically mentioned cryptography. The preparation guide provides little guidance: it only says "Configure encryption" under "Installing and Configuring SQL Server 2005", "Configure SQL Server security", but that may be aimed at protocol encryption. Should you want to play it safe while preparing with the MS Press Self-Paced Training Kit "SQL Server 2005, Implementation and Maintenance" (ISBN-10: 0-7356-2271-X), this article provides the necessary background for Chapter 2, Lesson 6. If you prepare for the exam with the Sybex MCTS Study Guide "Microsoft SQL Server 2005 Implementation and Maintenance" (ISBN-10: 0-470-02565-4), this article is a must-read, as the Sybex book doesn't cover SQL Server's cryptography.
----
First I'll describe the tools needed for cryptographic security:
- Cryptographic hashes (are not mentioned in courseware, but included here for completeness)
- Key based encryption
- Symmetric keys
- Asymmetric key-pairs (private key & public key)
- Certificates
- Passwords/passphrases
and then continue to encryption hierarchy, combining different forms of cryptography and the double protection of keys and certificates.
Cryptographic hashes
In security related documentation, cryptographic hashes are mostly referred to as just hashes. In a database context, however, you may also read about hashes in terms like hash indices and hash joins. Hash indices and hash joins are related to the query processor and have nothing to do with the security subsystem. So when you encounter the word hash in SQL Server documentation, verify whether it is meant in the cryptographic sense or relates to the query processor. In the remainder of this article, the term hash will refer to a cryptographic hash.
Now for what a hash does. A hash is computed from an input of any length. The input is divided into blocks, and calculations are performed on these blocks. The combined calculations are called the algorithm. One of the characteristics of the algorithm is that its output has a fixed length. In SQL Server the following hash algorithms can be used, with the output length in bits between parentheses: MD2(128), MD4(128), MD5(128), SHA(160) aka SHA0, SHA1(160). The output of the hashing operation may be referred to as the hash or as the Message Digest; the input of a hash operation may also be called the Message. The most important characteristic of a hash is that it is one-way: you can calculate the Message Digest of a Message, but it isn't feasible to retrieve the Message if you only have the Message Digest.
You can call a hash function from SQL Server directly, see HASHBYTES for more info.
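The fixed output length is easy to see outside SQL Server as well. The following is a minimal sketch in Python (not T-SQL), assuming the standard hashlib module as a stand-in for HASHBYTES; the helper name digest_bits is made up for this illustration.

```python
import hashlib

def digest_bits(algorithm: str, message: bytes) -> int:
    """Return the digest length in bits for a hashlib algorithm name."""
    return len(hashlib.new(algorithm, message).digest()) * 8

short_message = b"abc"
long_message = b"abc" * 100_000  # 300,000 bytes of input

# The digest length depends on the algorithm, not on the input length.
for name in ("md5", "sha1"):
    print(name, digest_bits(name, short_message), digest_bits(name, long_message))

# The same input always yields the same digest; a small change yields a different one.
same = hashlib.sha1(b"abc").digest() == hashlib.sha1(b"abc").digest()
changed = hashlib.sha1(b"abc").digest() != hashlib.sha1(b"abd").digest()
```

Whatever the size of the Message, MD5 produces 128 bits and SHA1 produces 160 bits, matching the lengths listed above.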
The typical application for hashes is to verify the original content without having to store the original content. Take passwords, for example: systems typically store the password hash, not the password itself. When the user types the password, it is hashed and compared to the stored hash. If they match, the user has proven to know the password. You may also encounter hashes with downloads, where the (MD5) hash of the downloadable file is posted. After you have downloaded the file, you can compute its hash. When it matches the posted hash, the download was successful. Check out winMd5Sum as an example.
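The password check described above could look roughly like this. It is a deliberately simplified Python sketch: the function names are hypothetical, and the hash is unsalted SHA1 only to mirror the text (production systems normally add a salt and use a slow, purpose-built password hash).

```python
import hashlib
import hmac

def hash_password(password: str) -> bytes:
    """Hash a password; only this digest is stored, never the password itself."""
    return hashlib.sha1(password.encode("utf-8")).digest()

def verify_password(candidate: str, stored_hash: bytes) -> bool:
    """Hash the typed password and compare it to the stored digest."""
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(hash_password(candidate), stored_hash)

stored = hash_password("correct horse")        # saved when the password is set
print(verify_password("correct horse", stored))  # the user knows the password
print(verify_password("wrong guess", stored))    # the user does not
```

The same comparison works for a download: hash the received file and compare the result with the posted digest.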
Key based encryption
When encrypting information, the original information is referred to as plaintext and the encrypted information as ciphertext (or cyphertext). During the encryption operation a key is applied to the plaintext based on an algorithm (the key's properties must match the algorithm), resulting in unintelligible ciphertext. The only way to read the ciphertext is by decrypting it, which again involves applying a key to the ciphertext based on the corresponding algorithm.
The words plaintext and ciphertext could be a little misleading, as you might think of them as text, but plaintext and ciphertext can be binary too. Specifically for SQL Server 2005, the key based encryption functions can handle the data types char, varchar, nchar, nvarchar, binary and varbinary. Other data types should be cast to one of the aforementioned data types. The data is also limited to 8000 bytes (actually the encrypted data is limited to 8000 bytes, which means the plaintext usually has to be shorter, depending on the algorithm used)!
Symmetric keys
With symmetric keys, the algorithm performing the operations uses the same (symmetric) key for both the encryption and the decryption operation. Symmetric keys are considered fast for cryptographic operations in comparison to asymmetric keys. Naturally, the real speed of the encryption and decryption depends on the encryption algorithm, the length (in bits) of the symmetric key and the available processing power. With encryption, the algorithm is bound to the symmetric key, so during creation you must specify for which algorithm (DES, DESX, Triple DES, RC2, RC4, AES 128, AES 192, AES 256) the key is created. The second requirement when creating a symmetric key is that it is encrypted. Encrypting a key may sound strange, but in reality the key is the most vulnerable part of a cryptographic system. By encrypting the symmetric key (which may have encrypted numerous fields in the database), we prevent anyone who cannot decrypt the symmetric key from accessing the data protected by this key. At the end of this article, you will see that you can build and use a hierarchy of keys that enables you to keep numerous secrets by guarding only a few.
Creating a symmetric key is done via CREATE SYMMETRIC KEY. After creation, the key is stored in the database and information about the key can be retrieved from the system view sys.symmetric_keys. As you may see in this view, the key has a name and a GUID (you'll also see the columns for storing the algorithm). Both the name and the GUID are important when identifying the key in encryption and decryption operations.
To make use of a symmetric key, the key must first be opened. This has to be done because the symmetric key is stored in an encrypted state. So you can only make use of the symmetric key when you have access to the key that was used to encrypt it, or know the password that was used to encrypt it. For syntax, see OPEN SYMMETRIC KEY.
The final step is using the symmetric key to encrypt and decrypt information. This can be done using the functions EncryptByKey and DecryptByKey. Note that you can include an "authenticator", which basically is a property of the record that you can encrypt along with the data.
To see how the symmetric key based encryption should be used, please take a look at the example by Laurentiu Cristofor.
Asymmetric keys
With asymmetric keys, the algorithm performing the cryptographic operations uses a key pair. The keys of this pair (let's call them Pub and Priv right away) match in such a way that one key can undo the operation of the other key. So if you have plaintext and encrypt this with the key Pub, the resulting ciphertext can only be decrypted with Priv. The other way around, when you use Priv to encrypt some plaintext, the only way to decrypt the resulting ciphertext is using Pub.
By calling the key-pair Priv (private key) and Pub (public key), the main area of use is very clear. While a person or system can share its public key with the whole world, the private key is kept secret. Now let's say that two people, Alice and Bob, want to exchange information without anyone else being able to learn about the exchanged information. Both Alice and Bob have a private key (which each of them keeps secret) and a public key (known to both of them and the rest of the world). Now Alice wants to leave Bob a message to.... (well, it's to remain unknown to the rest of the world), so Alice encrypts the message with Bob's public key (Pub-B). With Bob being the only one who has access to Bob's private key (Priv-B), Alice knows only Bob can decrypt the message and she can safely store the encrypted message for Bob to read. After decrypting the message, Bob wants to answer Alice in an equally secure manner, so he uses Pub-A to encrypt the message. Even with everybody being able to retrieve the encrypted message, only Alice can decrypt it, for she has Priv-A.
Alice encrypts and stores:
EncryptPub-B(Plaintext_1:I am Alice) = Ciphertext_1
Bob retrieves and decrypts:
DecryptPriv-B(Ciphertext_1) = Plaintext_1:I am Alice
-------------------------------------
Bob encrypts and stores:
EncryptPub-A(Plaintext_2:I am Bob) = Ciphertext_2
Alice retrieves and decrypts:
DecryptPriv-A(Ciphertext_2) = Plaintext_2:I am Bob
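To make the exchange tangible, here is a toy sketch in Python of the same two messages. It uses textbook RSA with tiny primes and byte-by-byte encryption, which is completely insecure and only meant to show that one key of a pair undoes the other. The primes, exponents and function names are assumptions chosen for this illustration and have nothing to do with how SQL Server implements it.

```python
from math import gcd

def make_key_pair(p: int, q: int, e: int):
    """Build a toy RSA key pair from two small primes (illustration only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    assert gcd(e, phi) == 1, "e must be coprime with (p-1)*(q-1)"
    d = pow(e, -1, phi)          # modular inverse (Python 3.8+)
    return (e, n), (d, n)        # (public key, private key)

def apply_key(key, numbers):
    """Apply a key to each number; encrypting and decrypting are the same operation."""
    exponent, modulus = key
    return [pow(x, exponent, modulus) for x in numbers]

pub_a, priv_a = make_key_pair(61, 53, 17)   # Alice
pub_b, priv_b = make_key_pair(89, 97, 5)    # Bob

# Alice encrypts with Pub-B and stores; Bob retrieves and decrypts with Priv-B.
plaintext_1 = list(b"I am Alice")
ciphertext_1 = apply_key(pub_b, plaintext_1)
decrypted_1 = bytes(apply_key(priv_b, ciphertext_1))

# Bob encrypts with Pub-A and stores; Alice retrieves and decrypts with Priv-A.
plaintext_2 = list(b"I am Bob")
ciphertext_2 = apply_key(pub_a, plaintext_2)
decrypted_2 = bytes(apply_key(priv_a, ciphertext_2))

print(decrypted_1, decrypted_2)
```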
Besides preventing information disclosure, asymmetric keys can also provide authentication. In the previous information exchange, Alice and Bob wanted to keep the information undisclosed. But what if Oscar wants to trick Bob and pretend he is Alice? Oscar would only have to fetch Bob's public key, encrypt the message with Pub-B and state in the message that he is Alice.
Oscar encrypts and stores:
EncryptPub-B(Plaintext_1:I am Alice) = Ciphertext_1
Bob retrieves and decrypts:
DecryptPriv-B(Ciphertext_1) = Plaintext_1:I am Alice
To prevent Oscar (or anyone else) from manipulating the flow of information, Alice and Bob agree to also encrypt the message with their own private keys prior to storing. So Alice has a message, "plaintext", and encrypts it with Pub-B (she now knows only Bob can decrypt it). Next she encrypts the resulting ciphertext with Priv-A and stores that message. When Bob retrieves that message, he first uses Pub-A to decrypt the outer stage. This verifies that the message was encrypted by Alice (as only she has access to Priv-A). Next he decrypts the result with Priv-B and now has access to the plaintext stored by Alice.
Alice encrypts and stores:
EncryptPriv-A(EncryptPub-B(Plaintext_1:I am Alice)) = Ciphertext_1
Bob retrieves and decrypts:
DecryptPriv-B(DecryptPub-A(Ciphertext_1)) = Plaintext_1:I am Alice
-------------------------------------
Bob encrypts and stores:
EncryptPriv-B(EncryptPub-A(Plaintext_2:I am Bob)) = Ciphertext_2
Alice retrieves and decrypts:
DecryptPriv-A(DecryptPub-B(Ciphertext_2)) = Plaintext_2:I am Bob
So it doesn't matter that Oscar has access to all public keys and can retrieve the ciphertext. Because Priv-A and Priv-B are secrets held by their respective owners, Oscar can't interfere with this process. The outer encryption, where the originator uses its private key, is also referred to as signing. Also note that a message does not have to be encrypted to be signed.
Alice signs and stores:
EncryptPriv-A(Plaintext_1:I am Alice) = Ciphertext_1
Anyone retrieves and verifies:
DecryptPub-A(Ciphertext_1) = Plaintext_1:I am Alice
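Signing can be sketched with the same kind of toy textbook RSA in Python. Again this is an insecure illustration with made-up helper names and tiny primes; real signature schemes sign a hash of the message with proper padding rather than the raw bytes.

```python
from math import gcd

def make_key_pair(p: int, q: int, e: int):
    """Build a toy RSA key pair from two small primes (illustration only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    assert gcd(e, phi) == 1, "e must be coprime with (p-1)*(q-1)"
    return (e, n), (pow(e, -1, phi), n)   # (public key, private key)

def sign(private_key, message: bytes):
    """'Encrypt' every byte with the private key; only the key owner can do this."""
    d, n = private_key
    return [pow(byte, d, n) for byte in message]

def verify(public_key, message: bytes, signature) -> bool:
    """'Decrypt' the signature with the public key and compare with the message."""
    e, n = public_key
    return [pow(s, e, n) for s in signature] == list(message)

pub_a, priv_a = make_key_pair(61, 53, 17)     # Alice's key pair

message = b"I am Alice"
signature = sign(priv_a, message)             # Alice signs and stores

# Anyone holding Pub-A can verify that the message came from Alice.
print(verify(pub_a, message, signature))

# A signature that was tampered with no longer verifies.
forged = [(signature[0] + 1) % pub_a[1]] + signature[1:]
print(verify(pub_a, message, forged))
```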
Like symmetric keys, asymmetric keys can be created with a T-SQL CREATE statement and viewed through a security catalog view: CREATE ASYMMETRIC KEY and sys.asymmetric_keys. Asymmetric keys are based on the RSA algorithm with keys of 512, 1024 or 2048 bits. As you learned above, the private key is to remain secret, so in order to protect it, the private key has to be stored in encrypted form. Unlike with symmetric key based encryption and decryption, the keys do not need to be opened for asymmetric key based operations. However, when performing an operation that requires access to the private key, the private key must be decrypted during the operation. The following functions are performed with the public key: EncryptByAsymKey and VerifySignedByAsymKey. The operations performed with the private key are DecryptByAsymKey and SignByAsymKey.
Certificates
Very close to the asymmetric keys are certificates. In fact, nothing changes on the side of the private key and the public key. The certificate is only used to store properties associated with X.509 v1 certificates and associate those with the public key; read Laurentiu's post for more info. Because certificates are named differently from asymmetric keys, you'll get different syntax and another security catalog view: CREATE CERTIFICATE, sys.certificates, EncryptByCert, VerifySignedByCert, DecryptByCert, SignByCert. However, certificates do have a major advantage over asymmetric keys: you can back up a certificate (certificates were designed with .CER files in mind) with BACKUP CERTIFICATE.
Passwords and passphrases
As you may have noticed in the syntax for creating and opening symmetric keys and when creating or using the private key on asymmetric keys and certificates, there is the option to use a password (ENCRYPTION BY PASSWORD='@v3RyCo/\/\pl&xPa$suu0rD'). When a password is provided on creation of the key or certificate, that password is used as a sort of symmetric key to provide the necessary encryption. Also, passphrases can be used to encrypt data directly without the need for any keys and key handling. This can be done with the functions EncryptByPassPhrase and DecryptByPassPhrase. Passwords and passphrases are essentially the same kind of thing, though we see passwords as hard to guess character strings and passphrases as long but easy to remember phrases. Passphrases typically contain a lot of spaces, whereas passwords contain mixed case, numbers and symbols and are usually enforced by a password policy.
Encryption hierarchy
Encrypting keys and certificates with a password is one option of protecting that key or certificate, but it does involve a lot of password management. Another option is building an encryption hierarchy in SQL Server 2005. To be able to explain the hierarchy, two special symmetric keys must be introduced;
- SERVICE MASTER KEY
- DATABASE MASTER KEY
Service master key
The service master key is created when SQL Server 2005 is installed. The key is created based on and protected with the credentials of the Windows account that is used as the SQL Server service account. The service master key is used to encrypt passwords that are stored in the master database (like SQL logins and credentials for linked servers), and it also acts as the root of the SQL Server encryption hierarchy. The service master key cannot be created or dropped; it can, however, be altered, backed up and restored. In fact, backing up the service master key is recommended right after installing the SQL Server instance. See ALTER SERVICE MASTER KEY for more information.
Database master key
The database master key can be created as a starting point for encryption in the database. This can be done with the statement CREATE MASTER KEY, which must include the ENCRYPTION BY PASSWORD clause. This creates the master key in the database (encrypted by the password and by the service master key); in addition, the database master key is stored in the master database, where it is encrypted with the service master key. The database master key is stored in the master database to facilitate automatic decryption. It is possible to drop the copy from the master database, but then the database master key must be opened manually prior to using it. As with the service master key, it is best practice to back up the database master key too. See CREATE MASTER KEY for more information.
Encrypting asymmetric keys and certificates
When the ENCRYPTION BY PASSWORD directive is omitted when an asymmetric key or certificate is created, the master key of the database where the asymmetric key or certificate will be stored is used to encrypt the private key. This way, the new key or certificate is automatically tied into the encryption hierarchy of SQL Server. Asymmetric keys and certificates can only be encrypted once, so when you alter them, you can switch between encryption by the database master key and encryption by a password, or in the latter case change the encryption password.
Multiple encryptions of symmetric keys
Normal symmetric keys and database master keys can be encrypted more than once. For the database master key, this is very convenient, as it makes it possible both to use cryptographic functionality transparently (based on the encryption by the service master key) and to transfer the database to another instance while preserving the encryption hierarchy (based on the encryption by password). For normal symmetric keys this is very convenient as well, as multiple users may need access to the same encrypted data. The bulk of the data (think of thousands of records with an encrypted field) does not need to be encrypted multiple times and can be accessed through the same symmetric key. This symmetric key is, depending on the capabilities of the users/processes, accessible through one or more passwords, other symmetric keys, asymmetric keys and/or certificates.
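The idea of one data key that is itself encrypted several times can be sketched as follows. This is a conceptual Python illustration, not SQL Server's mechanism: the XOR keystream cipher is a toy built on SHA-256 (it reuses the keystream and has no nonce or integrity check, so it must not be used for real data), and the names, passwords and rows are invented for the example.

```python
import hashlib
import os

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 keystream. The same call decrypts."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def key_from_password(password: str, salt: bytes) -> bytes:
    """Derive a key-encrypting key from a password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

# One symmetric key encrypts the bulk of the data exactly once.
data_key = os.urandom(32)
rows = [b"salary=4200", b"salary=5100"]
encrypted_rows = [toy_cipher(data_key, row) for row in rows]

# The data key is encrypted separately for each user; the rows are untouched.
salt = os.urandom(16)
wrapped_keys = {
    "alice": toy_cipher(key_from_password("Alice's passphrase", salt), data_key),
    "bob": toy_cipher(key_from_password("Bob's passphrase", salt), data_key),
}

# Bob opens his copy of the data key with his own password and reads the rows.
bob_key = toy_cipher(key_from_password("Bob's passphrase", salt), wrapped_keys["bob"])
bob_rows = [toy_cipher(bob_key, row) for row in encrypted_rows]
print(bob_rows)
```

Adding a third user only means storing one more encrypted copy of the 32-byte data key, no matter how many rows it protects.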
Combining cryptographic operations
Typically symmetric and asymmetric key (or certificate) based encryptions are combined to achieve the desired security level, while still maintaining good performance. If you revisit the example, you'll notice that the data in the column is encrypted with the symmetric key (remember: symmetric key = fast). The table in the example only has 2 rows, but that same table could hold millions of rows, making the algorithm doing the encryption and decryption very important in terms of performance. Because all this data is encrypted with a single symmetric key, this key should be well protected. The example used a password to encrypt the key, but a very common approach is to secure the symmetric key with a certificate (or asymmetric keys, which essentially comes down to the same). As a non-SQL Server example, EFS takes the same route, using a symmetric key to encrypt a potentially big file and then encrypting that symmetric key with the public key for each user that should have access to the file.
Now for SQL Server 2005, Laurentiu has a great 2nd example where he uses the possibilities offered by the encryption hierarchy. A database master key is created and used to encrypt the certificates for all users (dbo and Charlie) participating in the example. The certificates are then used to encrypt the symmetric key and the symmetric key will be used to encrypt the (2, but potentially thousands) salaries in the t_employees table.
Interesting to note about the 2nd example is the function DecryptByKeyAutoCert. The main advantage of this function is that it utilizes the encryption hierarchy and transparently opens keys on demand (and closes them after the operation completed). Similar functionality is provided by the function DecryptByKeyAutoAsymKey, if the symmetric key is encrypted with an asymmetric key.
Permissions on keys and certificates
One important thing not yet mentioned about keys and certificates is that they are securables. So in order to use them, a principal must be granted the necessary privileges on the keys and certificates. If you look closer at the second example, you see this is no issue at first: everything is done as dbo, so dbo automatically is the owner (and in full control) of the keys and certificates created. But when Charlie makes his appearance, he must be granted the necessary privileges: naturally SELECT on the views, but also VIEW DEFINITION on the symmetric key (to be able to use it) and CONTROL on the certificate. Alternatively, Charlie could have been made owner of the certificate when it was created, through the AUTHORIZATION directive. The bottom line is that to use a key or certificate, you must be granted the proper access to the key or certificate and be able to decrypt it. For more information on permissions, see GRANT.
Cryptography and authentication
What this article doesn't cover is signing programmed modules and mapping users and logins to certificates. However, I will add those topics when covering impersonation (EXECUTE AS is on the to-do list).
Recommended reading: Handbook of Applied Cryptography, Laurentiu Cristofor's blog, Wikipedia. Not exactly reading, but check out those 4 presentations from the 2006 PASS conference.