电子发烧友 (ElecFans)

People are also searching for
  • SR-IOV in KVM and bonding on Ubuntu

    KVM with an SR-IOV card. 5: enp4s0f0: mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000 link/ether 00:0c

    2018-11-07 11:13

  • Linux KVM SR-IOV: spoofed packets and dropped frames

    The primary issue is that, after anywhere from several hours to a couple of weeks, a single VF gets into a bad state for a guest, and we see the errors below on the parent and the child.

    Versions:
      CentOS = 7.5.1804
      Kernel = 4.4.121-1.el7.centos.x86_64 (current); tried 3.10.0, ***, ***, 4.14.68
      IXGBE = 5.3.7 (current); tried 5.3.5, 4.2.1-k, ......
      IXGBEVF = 4.3.5 (current); tried 2.12.1-k, ....
      QEMU = 1.5.3 (current); tried 2.0.0
      Libvirt = 3.9.0 (current)

    On the parent we see this error:
      ixgbe 0000:05:00.0 ethx: 193 Spoofed packets detected
      ixgbe 0000:05:00.0 ethx: 45 Spoofed packets detected
      ixgbe 0000:05:00.0 ethx: 3 Spoofed packets detected
      ixgbe 0000:05:00.0 ethx: 126 Spoofed packets detected

    On the child there is an increase in dropped packets:
      2: eth0: mtu 1500 qdisc mq state *** mode DEFAULT group default qlen 1000
          link/ether 52:54:00:5e:a9:f8 brd ff:ff:ff:ff:ff:ff
          RX: bytes  packets  errors  dropped overrun mcast
          455429589913 520093667 0 375674 0 375680
          TX: bytes  packets  errors  dropped carrier collsns
          463147231075 514071570 0 0 0 0

    I don't have a way to view the spoofed packets going out, but I can see incoming packets being corrupted and dropped by the guest. The best example is ARP, because it reaches every parent and child. (IPs censored.)

    Parent capture:
      10:36:26.492879 02:00:00:00:00:01 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has ZZZ.ZZZ.ZZZ.ZZZ tell XXX.XXX.XXX.XXX, length 46
      10:36:26.540880 02:00:00:00:00:01 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has BBB.BBB.BBB.BBB tell XXX.XXX.XXX.XXX, length 46
      10:36:26.553161 02:00:00:00:00:01 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has AAA.AAA.AAA.AAA tell XXX.XXX.XXX.XXX, length 46
      10:36:26.559508 02:00:00:00:00:01 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has YYY.YYY.YYY.YYY tell XXX.XXX.XXX.XXX, length 46

    Child capture:
      10:36:26.501491 02:00:00:00:00:01 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has ZZZ.ZZZ.ZZZ.ZZZ tell XXX.XXX.XXX.XXX, length 46
      10:36:26.549499 00:00:00:00:00:00 > 00:00:00:00:00:00, 802.3, length 0: LLC, dsap Null (0x00) Individual, ssap Null (0x00) Command, ctrl 0x0000: Information, send seq 0, rcv seq 0, Flags [Command], length 46
          0x0000:  0000 0000 0000 0000 0000 0000 0000 0000  ................
          0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
          0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
      10:36:26.561776 00:00:00:00:00:00 > 00:00:00:00:00:00, 802.3, length 0: LLC, dsap Null (0x00) Individual, ssap Null (0x00) Command, ctrl 0x0000: Information, send seq 0, rcv seq 0, Flags [Command], length 46
          0x0000:  0000 0000 0000 0000 0000 0000 0000 0000  ................
          0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
          0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
      10:36:26.568122 02:00:00:00:00:01 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has YYY.YYY.YYY.YYY tell XXX.XXX.XXX.XXX, length 46

    While this one VF is in a bad state, all other guests see the same packets as the parent. The only current solution is to reboot the guest; sometimes we have to destroy the guest and start it back up.

    2018-10-24 15:12
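
    The "Spoofed packets detected" lines quoted in the post above report a count per log entry, so it can help to total them per device when judging how often a VF misbehaves. Below is a minimal sketch in Python, assuming the log lines have exactly the shape quoted in the post; the function name `count_spoofed` and the sample list are hypothetical, introduced only for illustration, and are not part of any driver or tool.

```python
import re
from collections import defaultdict

# Matches kernel log lines shaped like the ones quoted in the post, e.g.:
#   ixgbe 0000:05:00.0 ethx: 193 Spoofed packets detected
SPOOF_RE = re.compile(
    r"ixgbe (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"(?P<iface>\S+): (?P<count>\d+) Spoofed packets detected"
)


def count_spoofed(lines):
    """Sum the reported spoofed-packet counts per (PCI address, interface)."""
    totals = defaultdict(int)
    for line in lines:
        match = SPOOF_RE.search(line)
        if match:  # lines that do not match the pattern are ignored
            key = (match.group("pci"), match.group("iface"))
            totals[key] += int(match.group("count"))
    return dict(totals)


# Hypothetical input: the four lines quoted in the post.
sample = [
    "ixgbe 0000:05:00.0 ethx: 193 Spoofed packets detected",
    "ixgbe 0000:05:00.0 ethx: 45 Spoofed packets detected",
    "ixgbe 0000:05:00.0 ethx: 3 Spoofed packets detected",
    "ixgbe 0000:05:00.0 ethx: 126 Spoofed packets detected",
]

for (pci, iface), total in count_spoofed(sample).items():
    print(f"{pci} {iface}: {total} spoofed packets")
```

    In practice the same function could be fed the lines of a saved kernel log; the regular expression would need adjusting if the driver's message format differs from the one quoted.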

  • FPGA Virtex-5 and DSP C6678 SRIO 4x communication problem

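    I am using a Virtex-5 to communicate with a C6678 DSP over SRIO. The FPGA is in slave mode and has always run at 1x. While bringing up 4x I ran into a problem: either the 4x link automatically falls back to 1x, or the DSP comes up at 4x and the FPGA also trains to 4x but the two cannot communicate. Investigating, I found that the lnk_trdy_n signal of the FPGA SRIO IP core is wrong, while lnk_rrdy_n and mode-sel are normal. I hope an expert can help. (Serial RapidIO v5.6)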

    2018-06-21 00:10

  • What does the accelerator-card-based FPGA ecosystem look like?

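    How did FPGA accelerator cards come about? What are the main FPGA accelerator card products? What does the accelerator-card-based FPGA ecosystem look like?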

    2021-06-17 06:07

  • Intel X710-T4 SAN issues

    We added a new node to our cluster, with two Intel X710-T4 cards in the server. Both of our SANs are currently only 1Gb (a Tegile T3100 and a Lenovo px12-450r). Whenever I try to set up SAN connections using one of the ports on these NICs, it causes all sorts of issues: CSVs go offline and report as corrupted and unreadable, we have lost VM configs, and VMs stop booting from any volume hosted by this node (2016 Hyper-V cluster). The odd part is that this same node worked fine in our 2012 cluster before we updated/migrated to 2016. I have the latest drivers for the adapter.

    Here are some things that have happened:

    - When I initially brought the node into the cluster, I configured all 3 connections to the SANs using the X710-T4 (2 for the Tegile, one per independent switch/controller, and 1 for the Lenovo, which is directly connected). Right off the bat, each time this node took ownership of either volume on the Lenovo, the VMs on that volume would no longer boot. They gave various weird boot errors, and eventually it would even hard-lock the Lenovo SAN itself so that I had to hard-boot it. I moved that connection to the onboard Broadcom 1Gb NIC, and that solved those issues.

    The Tegile was still connected via the X710-T4, and while it continued to operate, some odd things happened there as well. Sometimes the list of connected iSCSI devices would simply be blank on that node even though it was still operating. In the latest case, the node took ownership of a LUN from the Tegile; immediately all the VMs on that LUN stopped working and the CSV reported as corrupt and unreadable. I moved the CSV to another node, and after a while it finally started working again. The problem is that this node insists on taking ownership of storage nodes, and there doesn't appear to be a way to stop it (you can't set node preferences on CSVs).

    So right now I'm scared to unpause this node, and I'm contemplating moving the Tegile connections to the Broadcom as well to avoid all the hassle. But when we eventually upgrade our SAN and go to 10Gb, I don't want to run into this issue again.

    I realize this is probably incredibly hard to decipher. At this point I'm just looking for suggestions. Is it one of the adapter properties? The adapters that connect to the SANs have all protocols except IPv4 unchecked, jumbo frames set to 9014, and the option that lets the OS turn them off (power saving) disabled. Aside from that, they are basically at default settings. I could probably disable SR-IOV on these adapters, but is that really causing my issue? (I doubt it.) Let me know what you think!

    2018-11-15 11:15