一致性Hash算法(KetamaHash)的c#实现-c#-IT落伍者

最近在研究一致性HASH算法（Consistent Hashing）用于解决memcached集群中当服务器出现增减变动时对散列值的影响后来在JAVAEYE上的一篇文章中找到了其中的 KetamaHash 算法的JAVA实现（一种基于虚拟结点的HASH算法）于是为了加深理解对照 JAVA版本用C#重写了一个放到这里如果大家感兴趣的话可以下载测试一下如果发现写法有问题请及时告之我以便我及时修正

下面是对Ketama的介绍

Ketama is an implementation of a consistent hashing algorithm meaning you can add or remove servers from the memcached pool without causing a complete remap of all keys

Heres how it works:

* Take your list of servers (eg: : : :)

* Hash each server string to several () unsigned ints

* Conceptually these numbers are placed on a circle called the continuum (imagine a clock face that goes from to ^)

* Each number links to the server it was hashed from so servers appear at several points on the continuum by each of the numbers they hashed to

* To map a key>server hash your key to a single unsigned int and find the next biggest number on the continuum The server linked to that number is the correct server for that key

* If you hash your key to a value near ^ and there are no points on the continuum greater than your hash return the first server in the continuum

If you then add or remove a server from the list only a small proportion of keys end up mapping to different servers

我的理解这类方法其实有点像大学里的微积分的思想（通过连续函数的取值区间来计算图形面积）通过把已知的实结点（memcached服务IP端口）列表结成一个圆然后在两两实结点之间放置尽可能多的虚拟节点(上面文中的unsigned ints) 假设用户数据映射在虚拟节点上（用户数据真正存储位置是在该虚拟节点代表的实际物理服务器上）这样就能抑制分布不均匀最大限度地减小服务器增减时的缓存重新分布如下图

下面是添加结点时

Consistent Hashing最大限度地抑制了hash键的重新分布另外要取得比较好的负载均衡的效果往往在服务器数量比较少的时候需要增加虚拟节点来保证服务器能均匀的分布在圆环上因为使用一般的hash方法服务器的映射地点的分布非常不均匀使用虚拟节点的思想为每个物理节点（服务器）在圆上分配～个点这样就能抑制分布不均匀最大限度地减小服务器增减时的缓存重新分布用户数据映射在虚拟节点上就表示用户数据真正存储位置是在该虚拟节点代表的实际物理服务器上

了解了原理下面看一下具体实现

JAVA实现代码取自Spy Memcached client中的类原文的作者也尽可能的对代码进行注释说明所以让我剩了不少时间

下面是相应的NET实现（注释参考JAVA版本）:

public class KetamaNodeLocator

{

//原文中的JAVA类TreeMap实现了Comparator方法这里我图省事直接用了net下的SortedList其中Comparer接口方法）

private SortedList<long string> ketamaNodes = new SortedList<long string>();

private HashAlgorithm hashAlg;

private int numReps = ;

//此处参数与JAVA版中有区别因为使用的静态方法所以不再传递HashAlgorithm alg参数

public KetamaNodeLocator(List<string> nodes int nodeCopies)

{

ketamaNodes = new SortedList<long string>();

numReps = nodeCopies;

//对所有节点生成nCopies个虚拟结点

foreach (string node in nodes)

{

//每四个虚拟结点为一组

for (int i = ; i < numReps / ; i++)

{

//getKeyForNode方法为这组虚拟结点得到惟一名称

byte[] digest = puteMd(node + i);

/** Md是一个字节长度的数组将字节的数组每四个字节一组分别对应一个虚拟结点这就是为什么上面把虚拟结点四个划分一组的原因*/

for (int h = ; h < ; h++)

{

long m = HashAlgorithmhash(digest h);

ketamaNodes[m] = node;

}

public string GetPrimary(string k)

{

byte[] digest = puteMd(k);

string rv = GetNodeForKey(HashAlgorithmhash(digest ));

return rv;

}

string GetNodeForKey(long hash)

{

string rv;

long key = hash;

//如果找到这个节点直接取节点返回

if (!ketamaNodesContainsKey(key))

{

//得到大于当前key的那个子Map然后从中取出第一个key就是大于且离它最近的那个key 说明详见:

var tailMap = from coll in ketamaNodes

where collKey > hash

select new { collKey };

if (tailMap == null || tailMapCount() == )

key = ketamaNodesFirstOrDefault()Key;

else

key = tailMapFirstOrDefault()Key;

}

rv = ketamaNodes[key];

return rv;

}

下面的代码与JAVA中有所不同它使用静态方法实现

public class HashAlgorithm

{

public static long hash(byte[] digest int nTime)

{

long rv = ((long)(digest[ + nTime * ] & xFF) << )

| ((long)(digest[ + nTime * ] & xFF) << )

| ((long)digest[ + nTime * ] & xFF);

return rv & xffffffffL; /* Truncate to bits */

}

/**

* Get the md of the given key

public static byte[] computeMd(string k)

{

MD md = new MDCryptoServiceProvider();

byte[] keyBytes = mdComputeHash(EncodingUTFGetBytes(k));

mdClear();

//mdupdate(keyBytes);

//return mddigest();

return keyBytes;

}