How LinkedIn Engineers Optimize Their Java Code

2014-12-19 11:22 | Posted by: joejoe0332 | Original author: LinkedIn | Source: greenrobot

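Summary: While browsing the engineering blogs of several large companies recently, I came across a very good post on LinkedIn's engineering blog. The post introduces Feed Mixer, the middle tier of LinkedIn's feed, which serves multiple distribution channels such as LinkedIn's web homepage, university pages, company pages, and client apps ...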

4. Compile regular expressions ahead of time

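String operations are relatively expensive in Java. Fortunately, Java provides tools that make regular expressions as efficient as possible. Dynamic regular expressions are rare in practice. In the example below, every call to String.replaceAll() applies a constant pattern to the input value, and String.replaceAll() compiles that pattern again on each call. Precompiling the pattern once therefore saves both CPU and memory.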

Before optimization:
// String.replaceAll() compiles _regex again on every call
private String transform(String term) {
    return term.replaceAll(_regex, _replacement);
}
After optimization:
import java.util.regex.Pattern;

// Compile the constant pattern once and reuse it for every call
private final Pattern _pattern = Pattern.compile(_regex);

private String transform(String term) {
    return _pattern.matcher(term).replaceAll(_replacement);
}
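
To make the before/after comparison concrete, here is a minimal self-contained sketch. The class name TermNormalizer, the whitespace regex, and the single-space replacement are assumptions chosen for illustration; they do not come from the LinkedIn code.

```java
import java.util.regex.Pattern;

// Hypothetical class: the regex and replacement below are illustrative assumptions.
class TermNormalizer {
    private static final String REGEX = "\\s+";
    private static final String REPLACEMENT = " ";

    // Compiled once when the class is initialized; Pattern instances are
    // immutable and safe to share between threads.
    private static final Pattern PATTERN = Pattern.compile(REGEX);

    // Before: the regex string is compiled on every call.
    static String transformSlow(String term) {
        return term.replaceAll(REGEX, REPLACEMENT);
    }

    // After: only a Matcher is created per call.
    static String transformFast(String term) {
        return PATTERN.matcher(term).replaceAll(REPLACEMENT);
    }

    public static void main(String[] args) {
        String input = "feed   mixer\t\tterm";
        System.out.println(transformSlow(input)); // feed mixer term
        System.out.println(transformFast(input)); // feed mixer term
    }
}
```

Both methods return the same result; the difference is only in how often the pattern is compiled. A Matcher, unlike a Pattern, is not thread-safe, which is why it is created inside the method rather than stored in a field.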

5. Cache it if you can

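Storing results in a cache is another way to avoid unnecessary work. Caching only helps, however, when the same operation is applied repeatedly to the same data (for example, preprocessing of configuration values or certain string processing). Many LRU (Least Recently Used) cache implementations are available; LinkedIn uses Guava cache (see here for the specific reasons). The code looks roughly like this: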

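import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

private static final int MAX_ENTRIES = 1000;
private final LoadingCache<String, String> _cache;

// Initializing the cache (for example, in the constructor)
_cache = CacheBuilder.newBuilder()
    .maximumSize(MAX_ENTRIES)
    .build(new CacheLoader<String, String>() {
        @Override
        public String load(String key) throws Exception {
            // Called only on a cache miss
            return expensiveOperationOn(key);
        }
    });

// Using the cache
String output = _cache.getUnchecked(input);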
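
For readers who want to see the bounded LRU idea without a Guava dependency, the following is a rough sketch built on the JDK's LinkedHashMap. It is not the Guava cache the article describes: the class LruCache and its methods are hypothetical, it is not thread-safe, and it omits the other features a production cache would offer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical single-threaded LRU sketch; a stand-in for illustration,
// not a replacement for Guava's LoadingCache.
class LruCache<K, V> {
    private final Map<K, V> map;
    private final Function<K, V> loader;

    LruCache(final int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most recently accessed.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once the bound is exceeded.
                return size() > maxEntries;
            }
        };
    }

    V get(K key) {
        V value = map.get(key);          // a hit also refreshes the entry's recency
        if (value == null) {
            value = loader.apply(key);   // the expensive operation, run only on a miss
            map.put(key, value);         // may trigger eviction of the eldest entry
        }
        return value;
    }

    int size() {
        return map.size();
    }

    // containsKey does not count as an access, so this does not change the order.
    boolean isCached(K key) {
        return map.containsKey(key);
    }
}
```

A call such as `new LruCache<String, String>(1000, key -> expensiveOperationOn(key))` would mirror the Guava snippet, with `expensiveOperationOn` standing in for whatever computation is being cached. For concurrent use, access would need external synchronization, which is one reason to prefer a library cache in practice.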

6. String intern is useful, but also dangerous

String interning can sometimes be used in place of a cache.

The documentation for String.intern() states:

“A pool of strings, initially empty, is maintained privately by the class String. When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned”.

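This behavior is very similar to a cache, with one limitation: you cannot set a maximum number of entries. If the strings being interned are unbounded (for example, strings that represent unique IDs), memory usage will grow rapidly. LinkedIn was bitten by exactly this. intern was being applied to some key values; everything looked fine in offline simulation, but once the code was deployed to production, memory usage shot up because a large number of unique strings were being interned. In the end LinkedIn switched to an LRU cache, which can cap the maximum number of entries.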
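
The pooling behavior described in the quoted documentation can be observed directly. The sketch below is only an illustration; the class name InternDemo and the sample ID string are made up.

```java
// Hypothetical demo class showing that intern() returns one canonical instance.
class InternDemo {
    // True when both arguments resolve to the same pooled object.
    static boolean sameAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        // new String(...) always creates a distinct object, even for equal content.
        String a = new String("member-42");
        String b = new String("member-42");

        System.out.println(a == b);                // false: two separate objects
        System.out.println(a.equals(b));           // true: same characters
        System.out.println(sameAfterIntern(a, b)); // true: one pooled instance
    }
}
```

Each distinct value passed to intern() adds an entry to the pool, and the API gives no way to cap the number of entries, which is the risk the article describes for unbounded inputs such as unique IDs.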