Some new features in Ruby 2.4

2016-7-23 16:28 | Posted by: joejoe0332 | Original author: leoxu | Source: oschina

Faster regular expressions with Regexp#match?

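Ruby 2.4 adds a new #match? method to regular expressions. In the benchmark below it is at least three times faster than any of the Regexp matching methods available in Ruby 2.3: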

require 'benchmark/ips'
 
Benchmark.ips do |bench|
  EMPTY_STRING  = ''
  WHITESPACE    = "   \n\t\n   "
  CONTAINS_TEXT = '   hi       '
 
  PATTERN = /\A[[:space:]]*\z/
 
  bench.report('Regexp#match?') do
    PATTERN.match?(EMPTY_STRING)
    PATTERN.match?(WHITESPACE)
    PATTERN.match?(CONTAINS_TEXT)
  end
 
  bench.report('Regexp#match') do
    PATTERN.match(EMPTY_STRING)
    PATTERN.match(WHITESPACE)
    PATTERN.match(CONTAINS_TEXT)
  end
 
  bench.report('Regexp#=~') do
    PATTERN =~ EMPTY_STRING
    PATTERN =~ WHITESPACE
    PATTERN =~ CONTAINS_TEXT
  end
 
  bench.report('Regexp#===') do
    PATTERN === EMPTY_STRING
    PATTERN === WHITESPACE
    PATTERN === CONTAINS_TEXT
  end
 
  bench.compare!
end
 
# >> Warming up --------------------------------------
# >>        Regexp#match?   160.255k i/100ms
# >>         Regexp#match    44.904k i/100ms
# >>            Regexp#=~    71.184k i/100ms
# >>           Regexp#===    71.839k i/100ms
# >> Calculating -------------------------------------
# >>        Regexp#match?      2.630M (± 4.0%) i/s -     13.141M in   5.004929s
# >>         Regexp#match    539.361k (± 3.9%) i/s -      2.694M in   5.002868s
# >>            Regexp#=~    859.713k (± 4.2%) i/s -      4.342M in   5.060080s
# >>           Regexp#===    872.217k (± 3.5%) i/s -      4.382M in   5.030612s
# >>
# >> Comparison:
# >>        Regexp#match?:  2630002.5 i/s
# >>           Regexp#===:   872217.5 i/s - 3.02x slower
# >>            Regexp#=~:   859713.0 i/s - 3.06x slower
# >>         Regexp#match:   539361.3 i/s - 4.88x slower

When you call Regexp#===, Regexp#=~, or Regexp#match, Ruby sets the $~ global variable to the resulting MatchData:

/^foo (\w+)$/ =~ 'foo bar'      # => 0
$~                              # => #<MatchData "foo bar" 1:"bar">
 
/^foo (\w+)$/.match('foo baz')  # => #<MatchData "foo baz" 1:"baz">
$~                              # => #<MatchData "foo baz" 1:"baz">
 
/^foo (\w+)$/ === 'foo qux'     # => true
$~                              # => #<MatchData "foo qux" 1:"qux">

Regexp#match? returns a boolean and avoids both building a MatchData object and updating global state:

/^foo (\w+)$/.match?('foo wow') # => true
$~                              # => nil

By skipping the global variable entirely, Ruby avoids allocating memory for a MatchData object.
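As a minimal sketch of where this helps, a predicate method only needs a yes/no answer, so #match? is a natural fit. The helper name `blank?` and the constant `BLANK_PATTERN` are illustrative and not part of Ruby itself; the pattern is the one used in the benchmark above.

```ruby
# Hypothetical helper for illustration; `blank?` is not defined by Ruby itself.
BLANK_PATTERN = /\A[[:space:]]*\z/

def blank?(text)
  # Returns true or false without building a MatchData object.
  BLANK_PATTERN.match?(text)
end

blank?('')              # => true
blank?("   \n\t\n   ")  # => true
blank?('   hi       ')  # => false
```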

New #sum method for Enumerable

You can now call #sum on any Enumerable object:

[1, 1, 2, 3, 5, 8, 13, 21].sum # => 54

The #sum method takes an optional argument that defaults to 0. This value is the starting value of the summation, which means that [].sum is 0.
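A short sketch of how the starting value affects the result. The variable names are illustrative. The block form in the last line is not mentioned in the text above; it is included on the assumption that you are running Ruby 2.4 or later, where #sum also accepts a block.

```ruby
empty_total   = [].sum                       # starting value 0 is returned as-is
plain_total   = [1, 2, 3].sum                # 0 + 1 + 2 + 3
offset_total  = [1, 2, 3].sum(10)            # 10 + 1 + 2 + 3
joined_text   = %w[a b c].sum('')            # '' + 'a' + 'b' + 'c'
doubled_total = [1, 2, 3].sum { |n| n * 2 }  # 0 + 2 + 4 + 6

empty_total   # => 0
plain_total   # => 6
offset_total  # => 16
joined_text   # => "abc"
doubled_total # => 12
```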

If you call #sum on an array of objects that are not numbers, you need to provide a starting value:

class ShoppingList
  attr_reader :items
 
  def initialize(*items)
    @items = items
  end
 
  def +(other)
    ShoppingList.new(*items, *other.items)
  end
end
 
eggs   = ShoppingList.new('eggs')          # => #<ShoppingList:0x007f952282e7b8 @items=["eggs"]>
milk   = ShoppingList.new('milks')         # => #<ShoppingList:0x007f952282ce68 @items=["milks"]>
cheese = ShoppingList.new('cheese')        # => #<ShoppingList:0x007f95228271e8 @items=["cheese"]>
 
eggs + milk + cheese                       # => #<ShoppingList:0x007f95228261d0 @items=["eggs", "milks", "cheese"]>
[eggs, milk, cheese].sum                   # => #<TypeError: ShoppingList can't be coerced into Integer>
[eggs, milk, cheese].sum(ShoppingList.new) # => #<ShoppingList:0x007f9522824cb8 @items=["eggs", "mi

In the last line of the code, an empty shopping list (ShoppingList.new) is supplied as the starting value.

New methods for checking whether a directory or file is empty

In Ruby 2.4 you can use File and Dir to check whether a directory or file is empty:

Dir.empty?('empty_directory')      # => true
Dir.empty?('directory_with_files') # => false
 
File.empty?('contains_text.txt')   # => false
File.empty?('empty.txt')           # => true

File.empty? is equivalent to File.zero?, which is already available in all maintained Ruby versions:

File.zero?('contains_text.txt') # => false
File.zero?('empty.txt')         # => true

Unfortunately, these methods are not yet available on Pathname.
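The checks above can be tried without preparing any files by hand. This is a sketch that assumes the standard `tmpdir` library; the file name `notes.txt` and the `results` hash are made up for illustration.

```ruby
require 'tmpdir'

results = {}

# Dir.mktmpdir creates a fresh directory and removes it when the block ends.
Dir.mktmpdir do |dir|
  results[:dir_before] = Dir.empty?(dir)      # nothing has been created yet

  path = File.join(dir, 'notes.txt')          # hypothetical file name
  File.write(path, '')
  results[:file_empty] = File.empty?(path)    # zero-byte file
  results[:dir_after]  = Dir.empty?(dir)      # directory now holds one entry

  File.write(path, 'hello')
  results[:file_with_text] = File.empty?(path)
end

results[:dir_before]     # => true
results[:file_empty]     # => true
results[:dir_after]      # => false
results[:file_with_text] # => false
```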

Extracting named captures from Regexp match results

In Ruby 2.4 you can call #named_captures on a Regexp match result to get a hash that maps your named capture groups to their matched values:

pattern = /(?<first_name>John) (?<last_name>\w+)/
pattern.match('John Backus').named_captures # => { "first_name" => "John", "last_name" => "Backus" }

Ruby 2.4 also adds a #values_at method for extracting only the named captures you care about:

pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
pattern.match('2016-02-01').values_at(:year, :month) # => ["2016", "02"]

The #values_at method also works with positional capture groups:

pattern = /(\d{4})-(\d{2})-(\d{2})$/
pattern.match('2016-07-18').values_at(1, 3) # => ["2016", "18"]
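One way these pieces might be combined is to turn a match into a hash with symbol keys and integer values. The helper `parse_date_parts` and the constant `DATE_PATTERN` are hypothetical names used only for this sketch.

```ruby
DATE_PATTERN = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

# Hypothetical helper: returns nil when the text does not contain a date.
def parse_date_parts(text)
  match = DATE_PATTERN.match(text)
  return nil unless match

  # named_captures yields string keys and string values; convert both.
  match.named_captures.map { |name, value| [name.to_sym, value.to_i] }.to_h
end

parts = parse_date_parts('2016-07-18')
parts[:year]  # => 2016
parts[:month] # => 7
parts[:day]   # => 18
```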

New Integer#digits method

If you want to access the digit at a particular position of an integer (counting from right to left), you can use Integer#digits:

123.digits                  # => [3, 2, 1]
123.digits[0]               # => 3

# Equivalent behavior in Ruby 2.3:
123.to_s.chars.map(&:to_i).reverse # => [3, 2, 1]

If you want the positional digits of a number in a base other than 10, you can pass a different radix. For example, to look at the digits of a hexadecimal number, pass 16:

0x7b.digits(16)                                # => [11, 7]
0x7b.digits(16).map { |digit| digit.to_s(16) } # => ["b", "7"]
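Because the digits come back least significant first, the digit at index `i` is multiplied by `base**i` when rebuilding the number. The following sketch illustrates that; `digit_sum` and `from_digits` are hypothetical helper names, not Ruby methods.

```ruby
# Hypothetical helper: adds up the decimal digits of a non-negative integer.
def digit_sum(number)
  number.digits.sum
end

# Hypothetical helper: rebuilds an integer from least-significant-first digits.
def from_digits(digits, base = 10)
  digits.each_with_index.map { |digit, position| digit * base**position }.sum
end

digit_sum(123)            # => 6
from_digits([3, 2, 1])    # => 123
from_digits([11, 7], 16)  # => 123 (the same value as 0x7b)
```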

Improvements to the Logger interface

In Ruby 2.3, setting up the Logger library was a little cumbersome:

require 'logger'

logger1 = Logger.new(STDOUT)
logger1.level    = :info
logger1.progname = 'LOG1'
 
logger1.debug('This is ignored')
logger1.info('This is logged')
 
# >> I, [2016-07-17T23:45:30.571508 #19837]  INFO -- LOG1: This is logged

Ruby 2.4 moves this configuration into Logger's constructor:

logger2 = Logger.new(STDOUT, level: :info, progname: 'LOG2')
 
logger2.debug('This is ignored')
logger2.info('This is logged')
 
# >> I, [2016-07-17T23:45:30.571556 #19837]  INFO -- LOG2: This is logged
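The constructor options can be checked without writing to the terminal by logging into an in-memory buffer. This is a sketch that assumes the standard `stringio` library; the progname 'DEMO' and the variable names are made up.

```ruby
require 'logger'
require 'stringio'

# Log into memory instead of STDOUT so the output can be inspected.
buffer = StringIO.new
logger = Logger.new(buffer, level: :info, progname: 'DEMO')

logger.debug('This is ignored') # below the :info threshold, so dropped
logger.info('This is logged')

log_text = buffer.string
log_text.include?('INFO -- DEMO: This is logged') # => true
log_text.include?('This is ignored')              # => false
```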

Parsing CLI options into a Hash

Parsing command-line flags with OptionParser often involves a lot of boilerplate just to collect the options into a hash:

require 'optparse'
require 'optparse/date'
require 'optparse/uri'

config = {}

cli =
  OptionParser.new do |options|
    options.define('--from=DATE', Date) do |from|
      config[:from] = from
    end
 
    options.define('--url=ENDPOINT', URI) do |url|
      config[:url] = url
    end
 
    options.define('--names=LIST', Array) do |names|
      config[:names] = names
    end
  end

Now you can supply a hash through the :into keyword argument when parsing the arguments:

require 'optparse'
require 'optparse/date'
require 'optparse/uri'
 
cli =
  OptionParser.new do |options|
    options.define '--from=DATE',    Date
    options.define '--url=ENDPOINT', URI
    options.define '--names=LIST',   Array
  end
 
config = {}
 
args = %w[
  --from  2016-02-03
  --url   https://blog.blockscore.com/
  --names John,Daniel,Delmer
]
 
cli.parse(args, into: config)
 
config.keys    # => [:from, :url, :names]
config[:from]  # => #<Date: 2016-02-03 ((2457422j,0s,0n),+0s,2299161j)>
config[:url]   # => #<URI::HTTPS https://blog.blockscore.com/>
config[:names] # => ["John", "Daniel", "Delmer"]
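A smaller sketch of the same idea, showing a switch without an argument and the non-option arguments that #parse returns. The option names `--verbose` and `--count`, the file name `input.txt`, and the variable names are invented for the example.

```ruby
require 'optparse'

parser =
  OptionParser.new do |options|
    options.define '--verbose'           # switch without an argument
    options.define '--count=N', Integer  # argument converted to an Integer
  end

settings = {}

# parse returns whatever is left over after the options are consumed.
leftover = parser.parse(%w[--verbose --count 3 input.txt], into: settings)

settings[:verbose] # => true
settings[:count]   # => 3
leftover           # => ["input.txt"]
```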

Faster Array#min and Array#max

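In Ruby 2.4 the Array class defines its own #min and #max instance methods. This change makes #min and #max on arrays noticeably faster; the benchmark below measures roughly a 1.6x speedup for #min: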

require 'benchmark/ips'
 
Benchmark.ips do |bench|
  NUMS = 1_000_000.times.map { rand }
 
  # By binding the Enumerable method to our array
  # we can bench the previous speed in Ruby 2.3
  ENUM_MIN  = Enumerable.instance_method(:min).bind(NUMS)
 
  # Bind the `#min` method to our test array also
  # so our benchmark code is as similar as possible
  ARRAY_MIN = Array.instance_method(:min).bind(NUMS)
 
  bench.report('Array#min') do
    ARRAY_MIN.call
  end
 
  bench.report('Enumerable#min') do
    ENUM_MIN.call
  end
 
  bench.compare!
end
 
# >> Warming up --------------------------------------
# >>            Array#min     3.000  i/100ms
# >>       Enumerable#min     2.000  i/100ms
# >> Calculating -------------------------------------
# >>            Array#min     35.147  (± 2.8%) i/s -    177.000  in   5.039133s
# >>       Enumerable#min     21.839  (± 4.6%) i/s -    110.000  in   5.040531s
# >> Comparison:
# >>            Array#min:       35.1 i/s
# >>       Enumerable#min:       21.8 i/s - 1.61x slower

Simplified integers

Until Ruby 2.4 you had to deal with many numeric types:

# Find classes which subclass the base "Numeric" class:
numerics = ObjectSpace.each_object(Module).select { |mod| mod < Numeric }
 
# In Ruby 2.3:
numerics # => [Complex, Rational, Bignum, Float, Fixnum, Integer, BigDecimal]
 
# In Ruby 2.4:
numerics # => [Complex, Rational, Float, Integer, BigDecimal]

Implementation details such as Fixnum and Bignum are now managed for you by Ruby. This should help you avoid subtle bugs like the following:

def categorize_number(num)
  case num
  when Fixnum then 'fixed number!'
  when Float  then 'floating point!'
  end
end
 
# In Ruby 2.3:
categorize_number(2)        # => "fixed number!"
categorize_number(2.0)      # => "floating point!"
categorize_number(2 ** 500) # => nil
 
# In Ruby 2.4:
categorize_number(2)        # => "fixed number!"
categorize_number(2.0)      # => "floating point!"
categorize_number(2 ** 500) # => "fixed number!"

If you have Bignum or Fixnum hard-coded in your source code, that is fine. These constants now point to Integer:

Fixnum  # => Integer
Bignum  # => Integer
Integer # => Integer
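A variant of categorize_number that does not depend on the Fixnum/Bignum split can be sketched by matching on Integer directly, which should give the same answer on either side of the change. The return strings are illustrative.

```ruby
def categorize_number(num)
  case num
  when Integer then 'integer!'        # covers small and very large integers
  when Float   then 'floating point!'
  end
end

categorize_number(2)        # => "integer!"
categorize_number(2.0)      # => "floating point!"
categorize_number(2 ** 500) # => "integer!"
```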

New argument support for float modifiers

#round, #ceil, #floor, and #truncate now accept a precision argument:

4.55.ceil(1)     # => 4.6
4.55.floor(1)    # => 4.5
4.55.truncate(1) # => 4.5
4.55.round(1)    # => 4.6
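The difference between #floor and #truncate only shows up for negative numbers: #floor moves toward negative infinity, while #truncate moves toward zero. A short sketch with an arbitrary value (12.3456 is just an example):

```ruby
amount = 12.3456 # arbitrary example value

amount.floor(2)      # => 12.34
amount.ceil(2)       # => 12.35
amount.truncate(2)   # => 12.34

(-amount).floor(2)    # => -12.35 (toward negative infinity)
(-amount).truncate(2) # => -12.34 (toward zero)
```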

These methods work the same way on Integer:

4.ceil(1)        # => 4.0
4.floor(1)       # => 4.0
4.truncate(1)    # => 4.0
4.round(1)       # => 4.0

Case handling for Unicode characters

Consider the following sentence:

My name is JOHN. That is spelled Ｊ-Ο-Ｈ-Ｎ

In Ruby 2.3, calling #downcase on this string produces the following output:

my name is john. that is spelled Ｊ-Ο-Ｈ-Ｎ

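This is because the "Ｊ-Ο-Ｈ-Ｎ" in this string is written with non-ASCII Unicode characters (full-width Latin letters and a Greek omicron), which the Ruby 2.3 case methods leave untouched.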

Ruby's letter case methods now handle Unicode characters properly:

sentence =  "\uff2a-\u039f-\uff28-\uff2e"
sentence                              # => "Ｊ-Ο-Ｈ-Ｎ"
sentence.downcase                     # => "ｊ-ο-ｈ-ｎ"
sentence.downcase.capitalize          # => "Ｊ-ο-ｈ-ｎ"
sentence.downcase.capitalize.swapcase # => "ｊ-Ο-Ｈ-Ｎ"
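If the older ASCII-only behavior is needed, the case methods can be given the `:ascii` option. That option is not mentioned in the text above; this sketch assumes Ruby 2.4 or later, and the sample word is arbitrary.

```ruby
word = "r\u00e9sum\u00e9" # "résumé"

unicode_upper = word.upcase         # every letter is converted
ascii_upper   = word.upcase(:ascii) # only a-z are converted

unicode_upper # => "RÉSUMÉ"
ascii_upper   # => "RéSUMé"
```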

A new option for specifying the capacity of a new string

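When creating a string, you can now pass a :capacity option to tell Ruby how much memory it should allocate for the string. This helps performance because Ruby can avoid reallocating memory as the string grows: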

require 'benchmark/ips'
 
Benchmark.ips do |bench|
  bench.report("Without capacity") do
    append_me = ' ' * 1_000
    template  = String.new
 
    100.times { template << append_me }
  end
 
  bench.report("With capacity") do
    append_me = ' ' * 1_000
    template  = String.new(capacity: 100_000)
 
    100.times { template << append_me }
  end
 
  bench.compare!
end
 
# >> Warming up --------------------------------------
# >>     Without capacity     1.690k i/100ms
# >>        With capacity     3.204k i/100ms
# >> Calculating -------------------------------------
# >>     Without capacity     16.031k (± 7.4%) i/s -    160.550k in  10.070740s
# >>        With capacity     37.225k (±18.0%) i/s -    362.052k in  10.005530s
# >>
# >> Comparison:
# >>        With capacity:    37225.1 i/s
# >>     Without capacity:    16031.3 i/s - 2.32x slower
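Outside a benchmark, the option is most useful when the final size can be estimated up front. The sketch below pre-sizes a buffer before joining lines; `join_lines` is a hypothetical helper, and the capacity is only a hint that does not change the resulting string.

```ruby
# Hypothetical helper: joins lines with trailing newlines into one string.
def join_lines(lines)
  # Estimate the final size: every line plus one newline character.
  estimated_bytes = lines.sum { |line| line.bytesize + 1 }

  buffer = String.new('', capacity: estimated_bytes)
  lines.each { |line| buffer << line << "\n" }
  buffer
end

join_lines(%w[alpha beta gamma]) # => "alpha\nbeta\ngamma\n"
```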

Fixed matching behavior for symbols

Although String#match returns a MatchData, Symbol#match in Ruby 2.3 returned the position of the match. This inconsistency has been fixed in Ruby 2.4:

# Ruby 2.3 behavior:
 
'foo bar'.match(/^foo (\w+)$/)  # => #<MatchData "foo bar" 1:"bar">
:'foo bar'.match(/^foo (\w+)$/) # => 0
 
# Ruby 2.4 behavior:
 
'foo bar'.match(/^foo (\w+)$/)  # => #<MatchData "foo bar" 1:"bar">
:'foo bar'.match(/^foo (\w+)$/) # => #<MatchData "foo bar" 1:"bar">
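With the Ruby 2.4 behavior, captures can be read straight from a symbol match. This sketch assumes Ruby 2.4 or later; the symbol `:status_active` and the group name `state` are made up.

```ruby
# On Ruby 2.4+, Symbol#match returns a MatchData, so named groups are usable.
match = :status_active.match(/\Astatus_(?<state>\w+)\z/)

match[:state] # => "active"
```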

Multiple assignment in conditional expressions

You can now assign to multiple variables inside a conditional expression:

branch1 =
  if (foo, bar = %w[foo bar])
    'truthy'
  else
    'falsey'
  end

branch2 =
  if (foo, bar = nil)
    'truthy'
  else
    'falsey'
  end

branch1 # => "truthy"
branch2 # => "falsey"

Although you probably shouldn't do that.
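If you do use it, one plausible case is a lookup that returns either a pair or nil. Note that the condition tests the value on the right-hand side, not the individual variables. The `DIMENSIONS` table and the `describe` helper are hypothetical, and the sketch assumes Ruby 2.4 or later.

```ruby
# Hypothetical lookup table: each key maps to a [width, height] pair.
DIMENSIONS = { 'banner' => [728, 90] }.freeze

def describe(key)
  # The branch is chosen by what DIMENSIONS[key] returns (an array or nil).
  if (width, height = DIMENSIONS[key])
    "#{width}x#{height}"
  else
    'unknown'
  end
end

describe('banner')  # => "728x90"
describe('missing') # => "unknown"
```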

Improved exception reporting for threads

If an exception is raised inside a thread, Ruby silently swallows the error by default:

puts 'Starting some parallel work'
 
thread =
  Thread.new do
    sleep 1
 
    fail 'something very bad happened!'
  end
 
sleep 2
 
puts 'Done!'

$ ruby parallel-work.rb
Starting some parallel work
Done!

If you want the whole process to fail when an exception occurs in a thread, you can set Thread.abort_on_exception = true. Adding this to parallel-work.rb above changes the program's output:

$ ruby parallel-work.rb
Starting some parallel work
parallel-work.rb:9:in 'block in <main>': something very bad happened! (RuntimeError)

Ruby 2.4 now gives you a middle ground between silently ignoring the error and aborting the whole program. Instead of abort_on_exception, you can set Thread.report_on_exception = true:

$ ruby parallel-work.rb
Starting some parallel work
#<Thread:0x007ffa628a62b8@parallel-work.rb:6 run> terminated with exception:
parallel-work.rb:9:in 'block in <main>': something very bad happened! (RuntimeError)
Done!
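Reporting only prints the failure. If the main thread also needs to react to it, joining the thread re-raises the exception there. The sketch below combines the two; the variable names are illustrative, and the report itself goes to standard error.

```ruby
Thread.report_on_exception = true

worker =
  Thread.new do
    raise 'something very bad happened!'
  end

failure_message =
  begin
    worker.join # re-raises the worker's exception in the calling thread
    nil
  rescue RuntimeError => error
    error.message
  end

failure_message # => "something very bad happened!"
```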