{"id":873,"date":"2011-08-27T16:03:28","date_gmt":"2011-08-27T21:03:28","guid":{"rendered":"http:\/\/www.shainmiley.com\/wordpress\/?p=873"},"modified":"2012-03-07T20:37:53","modified_gmt":"2012-03-08T01:37:53","slug":"sunwattr_ro-errorpermission-denied-on-opensolaris-using-gluster-3-0-5-partii","status":"publish","type":"post","link":"https:\/\/www.shainmiley.com\/wordpress\/2011\/08\/27\/sunwattr_ro-errorpermission-denied-on-opensolaris-using-gluster-3-0-5-partii\/","title":{"rendered":"SUNWattr_ro error:Permission denied on OpenSolaris using Gluster 3.0.5&#8211;PartII"},"content":{"rendered":"<p>Recently one of our 3ware 9650SE raid cards started spitting out errors indicating that the unit was repeatedly issuing soft resets. The lines in the log look similar to this:<\/p>\n<p>WARNING: tw1: tw_aen_task AEN 0x0039 Buffer ECC error corrected address=0xDF420<br \/>\nWARNING: tw1: tw_aen_task AEN 0x005f Cache synchronization failed; some data lost unit=22<br \/>\nWARNING: tw1: tw_aen_task AEN 0x0001 Controller reset occurred resets=13<\/p>\n<p>I downloaded and installed the latest firmware for the card (version 4.10.00.021), which the release notes claimed had several fixes for cards experiencing soft resets. Much to my disappointment, the resets continued to occur despite the new firmware.<\/p>\n<p>The card was under warranty, so I contacted 3ware support and had a new one sent overnight. 
The new card seemed to resolve the random soft resets; however, the resets and the downtime had left this node a little out of sync with the other Gluster server.<\/p>\n<p>After doing a &#8216;zpool replace&#8217; on two bad disks (at this point I am still unsure whether the bad drives were a symptom or the cause of the issues with the raid card; what I do know is that the Western Digital Caviar Green drives populating this card have a very high error rate, and we are in the process of replacing all 24 drives with Hitachi ones), I set about trying to initiate a &#8216;self-heal&#8217; on the known up-to-date node using the following command:<\/p>\n<p>server2:\/zpool\/glusterfs# ls -laR *<\/p>\n<p>After some time I decided to tail the log file to see if there were any errors that might indicate a problem with the self-heal. Once again the Gluster error log began to fill up with errors associated with setting extended attributes on SUNWattr_ro.<\/p>\n<p>At that point I began to worry whether the AFR (Automatic File Replication) portion of the Replicate\/AFR translator was actually working correctly. I started running some tests to determine exactly what was going on. I began by copying over a few files to test replication. All the files showed up on both nodes; so far so good.<\/p>\n<p>Next it was time to test AFR, so I deleted a few files from one node and then attempted to self-heal those same files. After a couple of minutes, I re-listed the files and the deleted files had in fact been restored. Despite the successful copy, the errors continued to show up every single time the file\/directory was accessed (via stat). 
It seemed that even though AFR was able to copy all the files to the new node correctly, Gluster for some reason kept trying to self-heal the same files over and over again.<\/p>\n<p>After finding the function that sets the extended attributes on Solaris, I created the following patch:<\/p>\n<div class=\"ex\">--- compat.c    Tue Aug 23 13:24:33 2011<br \/>\n+++ compat_new.c        Tue Aug 23 13:24:49 2011<br \/>\n@@ -193,7 +193,7 @@<br \/>\n {<br \/>\n        int attrfd = -1;<br \/>\n        int ret = 0;<br \/>\n-<br \/>\n+<br \/>\n        attrfd = attropen (path, key, flags|O_CREAT|O_WRONLY, 0777);<br \/>\n        if (attrfd &gt;= 0) {<br \/>\n                ftruncate (attrfd, 0);<br \/>\n@@ -200,13 +200,16 @@<br \/>\n                ret = write (attrfd, value, size);<br \/>\n                close (attrfd);<br \/>\n        } else {<br \/>\n-               if (errno != ENOENT)<br \/>\n-                       gf_log (&quot;libglusterfs&quot;, GF_LOG_ERROR,<br \/>\n+               if (strcmp (key, &quot;SUNWattr_ro&quot;) != 0 &amp;&amp; strcmp (key, &quot;SUNWattr_rw&quot;) != 0) {<br \/>\n+<br \/>\n+                       if (errno != ENOENT)<br \/>\n+                               gf_log (&quot;libglusterfs&quot;, GF_LOG_ERROR,<br \/>\n                                &quot;Couldn't set extended attribute for %s (%d)&quot;,<br \/>\n                                path, errno);<br \/>\n-               return -1;<br \/>\n+                       return -1;<br \/>\n+               }<br \/>\n+               return 0;<br \/>\n        }<br \/>\n-<br \/>\n        return 0;<br \/>\n }<\/div>\n<p>The patch simply ignores failures on the two Solaris-specific extended attributes (SUNWattr_ro and SUNWattr_rw), and returns a &#8216;0&#8217; to the posix layer instead of a &#8216;-1&#8217; if either of these is encountered. Failures on any other attribute are still logged and still return &#8216;-1&#8217;.<\/p>\n<p>We&#8217;ve been running this code change on both Solaris nodes for several days, and so far so good: the 
errors are gone and replicate and AFR both seem to be working very well.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recently one of our 3ware 9650SE raid cards started spitting out errors indicating that the unit was repeatedly issuing a bunch of soft resets. The lines in the log look similar to this: WARNING: tw1: tw_aen_task AEN 0x0039 Buffer ECC error corrected address=0xDF420 WARNING: tw1: tw_aen_task AEN 0x005f Cache synchronization failed; some data lost unit=22 [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[4,8,3,13,14],"tags":[],"_links":{"self":[{"href":"https:\/\/www.shainmiley.com\/wordpress\/wp-json\/wp\/v2\/posts\/873"}],"collection":[{"href":"https:\/\/www.shainmiley.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.shainmiley.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.shainmiley.com\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.shainmiley.com\/wordpress\/wp-json\/wp\/v2\/comments?post=873"}],"version-history":[{"count":17,"href":"https:\/\/www.shainmiley.com\/wordpress\/wp-json\/wp\/v2\/posts\/873\/revisions"}],"predecessor-version":[{"id":1261,"href":"https:\/\/www.shainmiley.com\/wordpress\/wp-json\/wp\/v2\/posts\/873\/revisions\/1261"}],"wp:attachment":[{"href":"https:\/\/www.shainmiley.com\/wordpress\/wp-json\/wp\/v2\/media?parent=873"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.shainmiley.com\/wordpress\/wp-json\/wp\/v2\/categories?post=873"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.shainmiley.com\/wordpress\/wp-json\/wp\/v2\/tags?post=873"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}