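Messages like the one below often show up in /var/log/messages. They are warnings the kernel emits when an incoming packet fails the source-route validation check.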

martian source 192.168.1.1 from 10.0.0.1, on dev eth1
ll header: 52:54:00:98:99:d0:52:54:00:de:d8:10:08:00 

The code that produces the message is here:

kernel_source/net/ipv4/route.c 
static void ip_handle_martian_source(struct net_device *dev,
				     struct in_device *in_dev,
				     struct sk_buff *skb,
				     __be32 daddr,
				     __be32 saddr)
{
	RT_CACHE_STAT_INC(in_martian_src);
#ifdef CONFIG_IP_ROUTE_VERBOSE
	if (IN_DEV_LOG_MARTIANS(in_dev) && net_ratelimit()) {
		/*
		 *	RFC1812 recommendation, if source is martian,
		 *	the only hint is MAC header.
		 */
		printk(KERN_WARNING "martian source %pI4 from %pI4, on dev %s\n",
			&daddr, &saddr, dev->name);
		if (dev->hard_header_len && skb_mac_header_was_set(skb)) {
			int i;
			const unsigned char *p = skb_mac_header(skb);
			printk(KERN_WARNING "ll header: ");
			for (i = 0; i < dev->hard_header_len; i++, p++) {
				printk("%02x", *p);
				if (i < (dev->hard_header_len - 1))
					printk(":");
			}
			printk("\n");
		}
	}
#endif
}

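In the log message, 10.0.0.1 is the source IP, 192.168.1.1 is the destination IP (the format string prints daddr first, then saddr), and eth1 is the device that actually received the packet. Now let's see why the check fails.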
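Because the message prints the destination before the source, it is easy to read it backwards. The following is a minimal user-space sketch, not kernel code, that pulls the three fields out of one log line. The struct and function names (martian_info, parse_martian_line) are made up for illustration, and the sketch assumes the exact message format produced by the printk above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of one "martian source" log line (hypothetical helper type). */
struct martian_info {
    char dst[16];  /* destination IPv4, printed first by the kernel */
    char src[16];  /* source IPv4 that failed validation */
    char dev[32];  /* interface the packet arrived on */
};

/*
 * Parse "martian source <dst> from <src>, on dev <dev>".
 * Returns 0 on success, -1 if the line does not match.
 */
int parse_martian_line(const char *line, struct martian_info *out)
{
    const char *p;

    assert(line != NULL && out != NULL);

    /* syslog may prepend a timestamp and host name, so find the marker first */
    p = strstr(line, "martian source ");
    if (p == NULL)
        return -1;

    if (sscanf(p, "martian source %15s from %15[^,], on dev %31s",
               out->dst, out->src, out->dev) != 3)
        return -1;
    return 0;
}
```

Feeding it the sample line from the top of this post fills dst with 192.168.1.1, src with 10.0.0.1 and dev with eth1.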

kernel_source/net/ipv4/fib_frontend.c
int fib_validate_source(){
	// parts of the code are omitted
	// note that src and dst are swapped
	struct flowi fl = { .nl_u = { .ip4_u =
				      { .daddr = src,
					.saddr = dst,
					.tos = tos } },
			    .mark = mark,
			    .iif = oif };


	in_dev = __in_dev_get_rcu(dev);
	// get the receiving device's in_device and read its rp_filter setting
	if (in_dev) {
		no_addr = in_dev->ifa_list == NULL;
		rpf = IN_DEV_RPFILTER(in_dev);
		if (mark && !IN_DEV_SRC_VMARK(in_dev))
			fl.mark = 0;
	}
	rcu_read_unlock();

	if (in_dev == NULL)
		goto e_inval;
	net = dev_net(dev);
	// look up the FIB using the packet's src as the destination
	if (fib_lookup(net, &fl, &res))
		goto last_resort;
	if (res.type != RTN_UNICAST)
		goto e_inval_res;
	*spec_dst = FIB_RES_PREFSRC(res);
	fib_combine_itag(itag, &res);

	// the output device found for src is the device the packet arrived on: all good
#ifdef CONFIG_IP_ROUTE_MULTIPATH
	if (FIB_RES_DEV(res) == dev || res.fi->fib_nhs > 1)
#else
	if (FIB_RES_DEV(res) == dev)
#endif
	{
		ret = FIB_RES_NH(res).nh_scope >= RT_SCOPE_HOST;
		fib_res_put(&res);
		return ret;
	}
	fib_res_put(&res);
	if (no_addr)
		goto last_resort;
	// the devices differ and rp_filter is 1 (strict): validation fails
	if (rpf == 1)
		goto e_inval;

}
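The decision above boils down to: route back to the source, then compare the resulting output device with the device the packet came in on. Here is a simplified user-space model of that logic, under loud assumptions: the routing table is a toy array with longest-prefix match, addresses are in host byte order, and all names (toy_route, toy_lookup, toy_rp_check, TOY_IP) are invented for illustration. It ignores route types, scopes, marks and multipath, so it is a sketch of the idea rather than what the kernel actually executes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_IP(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* One entry of a toy routing table (hypothetical, host byte order). */
struct toy_route {
    uint32_t prefix;   /* network address */
    int      plen;     /* prefix length, 0..32 */
    int      ifindex;  /* output interface for this prefix */
};

/* Longest-prefix match; returns the best route or NULL if none matches. */
const struct toy_route *toy_lookup(const struct toy_route *tbl, size_t n,
                                   uint32_t addr)
{
    const struct toy_route *best = NULL;
    size_t i;

    assert(tbl != NULL || n == 0);
    for (i = 0; i < n; i++) {
        uint32_t mask = tbl[i].plen ? ~(uint32_t)0 << (32 - tbl[i].plen) : 0;

        if ((addr & mask) != (tbl[i].prefix & mask))
            continue;
        if (best == NULL || tbl[i].plen > best->plen)
            best = &tbl[i];
    }
    return best;
}

/*
 * Model of the reverse-path check. rpf: 0 = off, 1 = strict, 2 = loose.
 * Returns 1 if the packet is accepted, 0 if it is treated as martian.
 */
int toy_rp_check(const struct toy_route *tbl, size_t n,
                 uint32_t src, int in_ifindex, int rpf)
{
    const struct toy_route *r;

    if (rpf == 0)
        return 1;                 /* no source validation */
    r = toy_lookup(tbl, n, src);  /* "route to src" = src used as dst */
    if (r == NULL)
        return 0;                 /* source not reachable at all */
    if (r->ifindex == in_ifindex)
        return 1;                 /* reply would leave via the same device */
    return rpf == 1 ? 0 : 1;      /* strict rejects, loose accepts */
}
```

With a table that routes 10.0.0.0/8 via interface 2 and 192.168.1.0/24 via interface 3, a packet from 10.0.0.1 arriving on interface 3 is rejected in strict mode and accepted in loose mode, which mirrors the situation in the sample log.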

Some relevant configuration:

sysctl.conf
# 0 = off, 1 = on (strict reverse-path check), 2 = loose (source only has to be reachable via some interface)
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.eth0.rp_filter = 0
net.ipv4.conf.lo.rp_filter = 0
net.ipv4.conf.vboxnet0.rp_filter = 0
net.ipv4.conf.wlan0.rp_filter = 0

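net.ipv4.conf.all.rp_filter is the master switch: turning it on turns the check on for every interface. More precisely, the kernel validates with the larger of the all value and the per-interface value, so all = 0 does not disable the check on an interface whose own rp_filter is 1.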
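To see which mode really applies to an interface, both values have to be combined. A minimal sketch, assuming the "maximum of all and per-interface" rule described in the kernel's ip-sysctl documentation; the function names (read_int_file, effective_rp_filter) are invented for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Read a single integer from a file such as
 * /proc/sys/net/ipv4/conf/eth1/rp_filter. Returns -1 on any error. */
int read_int_file(const char *path)
{
    FILE *f;
    int v;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

/* Effective mode for one interface: the larger of the two settings.
 * Both inputs must be valid rp_filter values (0, 1 or 2). */
int effective_rp_filter(int all_val, int if_val)
{
    assert(all_val >= 0 && all_val <= 2);
    assert(if_val >= 0 && if_val <= 2);
    return all_val > if_val ? all_val : if_val;
}
```

A caller would read /proc/sys/net/ipv4/conf/all/rp_filter and /proc/sys/net/ipv4/conf/eth1/rp_filter with read_int_file, check that neither returned -1, and pass both to effective_rp_filter.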

# 0 = off, 1 = log martian packets
net.ipv4.conf.all.log_martians = 0
net.ipv4.conf.default.log_martians = 0
net.ipv4.conf.eth0.log_martians = 0
net.ipv4.conf.lo.log_martians = 0
net.ipv4.conf.vboxnet0.log_martians = 0
net.ipv4.conf.wlan0.log_martians = 0

This switch controls whether martians are logged; it corresponds to IN_DEV_LOG_MARTIANS(in_dev) in the code above.

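To sum up: this message is so common that it is often ignored, and most of the time the kernel is doing the right thing. But when packets go missing unexpectedly, it is worth asking whether martian packets are involved. To check, run tcpdump first. If tcpdump shows the traffic (tcpdump is not affected by the martian check) but neither kernel hooks nor the application receive the packets, suspect a martian-source drop and look in dmesg for a message like the one below. Keep in mind that it is only logged when log_martians is enabled.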

martian source 192.168.1.1 from 10.0.0.1, on dev eth1
ll header: 52:54:00:98:99:d0:52:54:00:de:d8:10:08:00 

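In plain words: a packet with src=10.0.0.1 and dst=192.168.1.1 arrived on eth1, but when this host looks up a route to 10.0.0.1 in its own routing table, the resulting output device is not eth1. In most cases it is better to keep rp_filter=1, since this switch shields the system from a lot of martian visitors, and a closer look at the routing configuration should solve the problem. If the setup really is complex, for example a server with many network interfaces that forwards between them, relaxing rp_filter is fine.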
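The ll header line is worth decoding too, because, as the kernel comment says, the MAC header is the only hint about who really sent a martian packet. A minimal sketch, assuming an Ethernet device whose 14-byte header is laid out as destination MAC, source MAC, EtherType; the names eth_hdr_info and parse_ll_header are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Decoded Ethernet header (hypothetical helper type). */
struct eth_hdr_info {
    unsigned char dst_mac[6];
    unsigned char src_mac[6];
    unsigned int  ethertype;  /* e.g. 0x0800 for IPv4 */
};

/*
 * Parse "ll header: xx:xx:...:xx" with exactly 14 hex bytes.
 * Returns 0 on success, -1 if the line does not match.
 */
int parse_ll_header(const char *line, struct eth_hdr_info *out)
{
    static const char marker[] = "ll header: ";
    unsigned int b[14];
    const char *p;
    int i, n;

    assert(line != NULL && out != NULL);
    p = strstr(line, marker);
    if (p == NULL)
        return -1;
    p += strlen(marker);

    for (i = 0; i < 14; i++) {
        n = 0;
        /* each byte is printed as exactly two hex digits */
        if (sscanf(p, "%2x%n", &b[i], &n) != 1 || n != 2)
            return -1;
        p += 2;
        if (i < 13) {
            if (*p != ':')
                return -1;
            p++;
        }
    }

    for (i = 0; i < 6; i++) {
        out->dst_mac[i] = (unsigned char)b[i];
        out->src_mac[i] = (unsigned char)b[6 + i];
    }
    out->ethertype = (b[12] << 8) | b[13];
    return 0;
}
```

For the sample line, the sender's MAC comes out as 52:54:00:de:d8:10 and the EtherType as 0x0800 (IPv4). The source MAC can then be matched against the neighbours on that segment to find the machine emitting the unexpected source address.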