Tuesday, 2009-04-21

*** johndoigiii has quit IRC00:01
*** jgay has quit IRC01:10
*** parker-fcnyu has joined #cc02:13
*** dini has joined #cc02:24
*** Danny_B has quit IRC03:00
*** Danny_B has joined #cc03:00
*** stevel has quit IRC03:13
*** parker-fcnyu has quit IRC03:14
*** dini has quit IRC03:25
*** Danny_B has quit IRC03:36
*** stevel has joined #cc03:38
*** Danny_B has joined #cc03:53
*** K`Tetch has quit IRC04:29
*** johndoigiii has joined #cc04:31
*** cchelpbot has joined #cc05:14
*** stevel has quit IRC05:28
*** stevel has joined #cc05:32
*** parker-fcnyu has joined #cc06:17
*** johndoigiii has quit IRC07:01
*** Roderick_ has quit IRC07:42
*** stevel has quit IRC08:06
*** Roderick_ has joined #cc09:21
*** haoyu_ has joined #cc10:19
*** haoyu has quit IRC10:20
*** haoyu_ is now known as haoyu10:25
*** stevel has joined #cc11:39
*** Orango has quit IRC11:44
*** haoyu has quit IRC12:15
*** [mharrison] has quit IRC12:15
*** paulproteus_ has joined #cc12:28
*** paulproteus has quit IRC12:28
*** haoyu has joined #cc12:29
*** kreynen has joined #cc13:01
*** kreynen has quit IRC13:37
*** parker-fcnyu has quit IRC13:44
*** michaelkrnac has joined #cc13:57
*** kreynen has joined #cc14:09
*** jgay has joined #cc14:34
*** parker-fcnyu has joined #cc14:47
*** nathany has joined #cc14:49
*** nathany has quit IRC14:50
*** nathany_ has joined #cc14:50
*** Peter__ has joined #CC14:54
*** Peter__ is now known as Danny14:54
DannyWhat's needed to put a wiki under a creative commons license? Do all the contributors need to give permission?14:55
nathany_Danny: typically, yes, unless some assignment of copyright has occurred14:58
nathany_many systems allow you to include language on the editing screen that states that by submitting to the wiki, the contributor agrees to have their content licensed14:59
Dannynathany_, wow, that's a pain. probably makes sense to start a new wiki starting with creative commons and phace out the old one14:59
Dannyphase*14:59
nathany_Danny: I mean, this isn't legal advice, so a lawyer might say differently14:59
nathany_but yeah, you're right14:59
Dannydamn14:59
Dannyoh well14:59
Dannythanks!14:59
nathany_does the existing one have a license?14:59
Dannynone15:00
*** nathany_ is now known as nathany15:00
nathanyah15:00
nathanydarn15:00
Dannyyeah, oh well15:02
Dannythanks for the help15:02
Dannyit might be better to start fresh anyways15:02
*** michaelkrnac has quit IRC15:09
*** Danny has quit IRC15:12
*** lolo has joined #cc15:12
*** parker-fcnyu has quit IRC15:14
*** nkinkade has joined #cc15:15
*** parker-fcnyu has joined #cc15:28
*** K`Tetch has joined #cc15:32
*** tanjir_ has joined #cc15:34
*** tanjir has quit IRC15:34
*** [mharrison] has joined #cc15:36
*** johndoigiii has joined #cc16:07
*** parker-fcnyu has quit IRC16:22
*** Bovinity has joined #cc16:27
*** K`Tetch has quit IRC16:30
*** K`Tetch has joined #cc16:30
nkinkadenathany: There was a slight misconfiguration of Nagios from a few days back that was causing us not to get SMS alerts (only email alerts).  I'm going to manually trigger a failure condition for forum.creativecommons.org, so you can ignore an SMS that comes in shortly for that.16:33
nathanynkinkade: got it16:34
nkinkadecc.engine was just down on a5 and we didn't get alerts.  I noticed that paster was up to 2.3G of resident memory, so it seems that the watchdog script wasn't working right.  I'm checking that too.16:35
*** paulproteus_ is now known as paulproteus16:47
*** radiance29 has joined #cc17:11
*** radiance29 has left #cc17:11
*** lolo has left #cc17:35
*** kreynen_ has joined #cc17:39
*** kreynen has quit IRC17:55
*** johndoigiii is now known as johndoigiii-lunc18:39
*** johndoigiii-lunc is now known as johndoig-lunch18:40
nathanynkinkade: i assume you didn't figure out the watchdog situation?18:44
nathanyi just killed paster to get it responsive again18:44
nathany:(18:44
nkinkadenathany: I haven't checked it yet.18:45
nathanyok18:45
nkinkadeI suspect it has to do with the way ps is reporting the size of the process.18:45
nkinkadePerhaps above a certain size it reports at N.NG instead of in KB??18:45
nathanynkinkade: yes18:47
nathanyit does18:47
nkinkadeThe watchdog currently kills paster when it gets to 1G.  Maybe we should back that down to 800M or 900M.18:48
nkinkadenathany: I just set it to 750000KB, which is what it used to be before you made those changes to cc.engine.  Let's see where that gets us.18:49
nathanyok18:49
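
The unit problem discussed above (a size reported as N.NG where the watchdog expects plain KB) can be sketched as a small normaliser. This is only an illustration, not the contents of the actual watchdog script: the function names are hypothetical, binary multipliers are an assumption, and the 750000 KB limit is taken from the conversation.

```python
import re

# Multipliers from a unit suffix to kilobytes (binary units are an assumption).
_UNIT_TO_KB = {"": 1, "k": 1, "m": 1024, "g": 1024 * 1024}


def size_to_kb(text):
    """Convert a size such as '750000', '800M' or '2.3G' to whole kilobytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kKmMgG]?)[bB]?\s*", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNIT_TO_KB[unit.lower()])


def over_limit(reported, limit_kb=750000):
    """True when the reported size exceeds the limit (750000 KB per the log)."""
    return size_to_kb(reported) > limit_kb
```

With this, a value such as "2.3G" is compared as kilobytes rather than as the bare number 2.3, so it would no longer slip under a limit expressed in KB.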
*** balleyne has joined #cc19:15
nathanynkinkade: ping19:28
nkinkadenathany: Yeah.19:28
nkinkadea5 is having issues.19:28
nathanyright19:28
paulproteusps_mem.py is way more reliable for measuring RAM. How are you measuring?19:28
paulproteusNeed a hand?19:28
paulproteus(more reliable than e.g. ps)19:28
nathanyany idea what it is?19:28
nathanyexcept that top, etc don't report things as being pegged19:29
nkinkadeNot yet.  Memory looks good and CPU looks good.19:29
nkinkadeThere are around 4000 network connections in the TIME_WAIT state.19:29
nkinkadeThat seems high.19:29
paulproteusThat's free of cost, though, which is nice at least.19:29
nathanyexcept that things queue up behind them, right?19:29
nkinkadeBut overall throughput is about normal at around 600K/s19:29
nathanypaulproteus: suggestions?19:30
paulproteusWhat's the service time of an average HTTP request now? What indicates to you there is a problem at all?19:30
nathany(or nkinkade)19:31
nathanypaulproteus: it's not responding :)19:31
paulproteusTCP queue, it looks like?19:31
paulproteusRight, I see that now.19:31
paulproteusApache on 8080 is immediate19:31
paulproteusVarnish is not19:31
nathanyweird19:31
paulproteusThe varnish processes are spending a lot of time writing to disk19:31
paulproteusYou can see that in iotop19:31
paulproteusLet me see where their disk i/o is going19:32
paulproteusIt could be that Apache is fast because there's no kernel queue on port 808019:32
nathany(presumably to the cache)19:32
paulproteusMaybe Varnish is handling all the connections it can; can we ask Varnish to increase that #?19:32
paulproteusLet me read the varnish conf docs19:32
nathanyit seems to be abating19:34
nathany(er, nevermind)19:34
nkinkadeMy browser seems to be hanging on getting data from google-analytics.com19:34
paulproteusThat's a separate issue, since "time curl http://a5.creativecommons.org/" takes 4eva19:35
nathanyright19:35
paulproteus(whereas :8080 in that curl returns instantaneously)19:35
nathanyyea19:35
paulproteusWow, so Varnish has about 1000 sockets open.19:35
paulproteusIt's not hurting at all.19:36
paulproteusIt's just got a bunch of connections doing nothing.19:36
nkinkadeI restarted Varnish a minute ago, and that didn't seem to do anything.19:36
paulproteusI really want to see if we can just ask Varnish to accept more simultaneous connections.19:36
paulproteusThe other thing I'm interested in doing is just proxying all HTTP straight to Apache "for now"19:37
nkinkadepaulproteus: Give that a shot.  I suspect that Apache will then become overwhelmed.19:38
nkinkadeBut it's worth a try.19:38
nkinkadeI guess I could put in a Varnish rule that just passes everything to Apache.19:38
paulproteusBut I don't think that'd help; Varnish would still queue the connections.19:40
paulproteusI used a "simple TCP proxy" and now Apache receives the connections (indirectly)19:40
nkinkadeFor me, requests to :8080 are hanging too.19:40
paulproteusYeah, I bet this just overloaded Apache in an instant. (-:19:41
paulproteusReverting19:41
paulproteus(varnish restarted; creativecommons.org loads quickly)19:41
paulproteus(...and now not quickly)19:41
paulproteus:8080 is quick now though19:41
paulproteus    * -p listen_depth=4096 (default 1024)19:43
paulproteusThat's what we ought to change19:43
paulproteusat http://varnish.projects.linpro.no/wiki/Performance19:43
paulproteusone sec, let me see how to19:43
nathanyi saw that... looks like a CLI option?19:43
paulproteusYa, added and restarted19:44
paulproteusResponsive now19:44
paulproteusStill responsive19:44
paulproteus(kinda?)19:45
paulproteusNot really anymore...19:45
nkinkadeIt's quick again.19:46
nkinkadeIt might have slowed because I restarted Apache.19:46
paulproteusOkay, yeah19:46
paulproteusWhy did you restart Apache?19:46
nkinkadeWhat we were seeing looked vaguely like a problem Varnish used to have with KeepAlives in Apache, so I had disabled them to see.19:46
paulproteusOh yeah19:46
paulproteusNo longer fast for me, fwiw19:46
paulproteus:8080 is fast still19:47
paulproteusWe need more sockets in Varnish, I wonder?19:47
paulproteuslet me read more19:47
nkinkadeNot here either.19:47
nathanyit seems like it's slow, but struggling along better than it was previously19:48
paulproteus3s response time now, which is bearable19:48
paulproteus(as measured with time curl http://a5.creativecommons.org/ )19:49
paulproteustime curl http://creativecommons.org/ # still waiting, >15s, wonder why19:49
paulproteusThe TCP sockets still aren't being picked up by Varnish quickly enough.19:50
nathanyis it getting busy waiting for connections to return from zope/apache?19:50
nathany(no idea, speculation)19:50
paulproteusincreasing thread pool count19:51
paulproteusSo one question is, are our connections mostly coming from one source? If so, can we attempt to degrade service for that source?19:53
* paulproteus checks19:53
paulproteusBasically, no19:55
nathanypaulproteus: is iptraf not helpful here?19:55
nathanyit seems to suggest lots of traffic from one ip19:55
paulproteusI can't listen if you are, with it19:56
nathanypaulproteus: i'm out19:56
paulproteusI was using netstat -a -n # list of ips, then uniq -c to generate the histogram19:56
nkinkadepaulproteus: I checked that already.19:56
nkinkadeIt seems like a heterogeneous pool of IPs.19:56
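
The netstat-based checks mentioned above (a per-IP histogram, and the count of connections in TIME_WAIT) can be approximated as follows. This is a rough sketch that assumes the usual Linux `netstat -a -n` column layout for IPv4 TCP rows; the function names and the sample addresses are made up for illustration and are not taken from conn_stat.sh.

```python
from collections import Counter


def _tcp_rows(netstat_output):
    """Yield (remote_address, state) for each TCP row in netstat -a -n text."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, remote, state.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            yield fields[4], fields[5]


def remote_ip_histogram(netstat_output):
    """Count connections per remote IP, ignoring listening sockets."""
    counts = Counter()
    for remote, state in _tcp_rows(netstat_output):
        if state != "LISTEN":
            counts[remote.rsplit(":", 1)[0]] += 1
    return counts


def state_histogram(netstat_output):
    """Count connections per TCP state (TIME_WAIT, ESTABLISHED, ...)."""
    return Counter(state for _, state in _tcp_rows(netstat_output))


# Invented sample output using documentation-only address ranges.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:80             192.0.2.10:51234        TIME_WAIT
tcp        0      0 10.0.0.5:80             192.0.2.10:51240        ESTABLISHED
tcp        0      0 10.0.0.5:80             198.51.100.7:40000      TIME_WAIT
"""
```

Calling `remote_ip_histogram(SAMPLE).most_common(5)` lists the busiest sources first, which is the same question the `uniq -c` pipeline answers.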
paulproteusThe only really fast growing byte count is my iptraf session over SSH19:57
nathanyah19:57
paulproteusWhich is consistent with poor service for everyone19:57
nathanynevermind :)19:57
nathanyso is this just a spike in service that we haven't seen before?19:57
nkinkadeInterestingly varnishhist seems to be reporting mostly normal graphs.19:57
paulproteusnkinkade, The problem is what varnish doesn't see19:58
nkinkadeThat makes sense.19:58
paulproteusIt's letting connections sit un-handled in the kernel queue19:58
nkinkadeBut the numbers from varnishstat don't look out of place either.19:58
nkinkadepaulproteus: You can run /home/nkinkade/bin/conn_stat.sh for some stats based on netstat.19:59
nathanynkinkade: they wouldn't; it appears the problem is how quickly varnish is picking up connections (if i understand correctly)19:59
nathany(which i'm not sure i do)19:59
paulproteusnathany, That's what I'm saying, at least; if it's correct is a separate issue (-;19:59
paulproteusBut I think it is19:59
nkinkadeThings look busy, but they don't seem that much busier than what I've seen in the past.19:59
paulproteusI just enabled syncookies.20:01
nathanywow20:01
paulproteusSuddenly it's all fast.20:01
nathanyfast now20:01
nathanywtf?20:01
paulproteusI have no idea if these are related.20:01
paulproteusI should dump the traffic and see if it's really a SYN flood20:02
nathanypaulproteus: sounds good20:02
paulproteus(still fast)20:03
nkinkadeNice.20:03
nkinkadeIf that's the case, could it be that netstat wouldn't even list all the SYN requests?20:03
paulproteusI think that's right, yeah20:03
paulproteusIt could be that bogus connections were preventing the kernel from handling the real ones20:03
nkinkadeAh.20:03
paulproteusSo they were in an even earlier queue20:03
nathanyhrm20:04
nkinkadeStill quick for me.20:04
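
For reference, the syncookies setting that was just enabled is exposed on Linux as /proc/sys/net/ipv4/tcp_syncookies, where 0 means disabled and a non-zero value means enabled. A minimal sketch for checking it later, with hypothetical function names:

```python
from pathlib import Path

# Standard Linux location of the setting; readable without root.
SYNCOOKIES_PATH = Path("/proc/sys/net/ipv4/tcp_syncookies")


def parse_flag(text):
    """Interpret the file contents: '0' is off, any other integer is on."""
    return int(text.strip()) != 0


def syncookies_enabled(path=SYNCOOKIES_PATH):
    """Read the kernel setting at `path` and report whether it is enabled."""
    return parse_flag(Path(path).read_text())
```

A check like this only reports the current value; it says nothing about whether the traffic was actually a SYN flood, which is what the packet dump is meant to answer.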
paulproteusI have 50M of packets; I guess that's good enough for me.20:05
paulproteusca. 640K packets20:06
paulproteusstopped dumping20:06
paulproteusThat was fun!20:07
paulproteusIt's hard to say which Varnish options I added were/are necessary.20:07
nkinkadepaulproteus: Which did you add?20:07
paulproteusNow that we know that syncookies fixed it, I wonder if increasing the size of a kernel queue could have also fixed it.20:08
paulproteusBut then again an attacker (if it was an attack) could just double/etc the number of SYN packets he sends.20:08
paulproteus              -p listen_depth=4096 \20:08
paulproteus              -p thread_pools=4 \20:08
paulproteus              -p thread_pool_max=4000 \20:08
paulproteusto /etc/default/varnish20:08
nkinkadeIf the queue you are talking about is before the queue that netstat works with, I wonder if concurrent connection limiting would have done anything.20:10
nkinkadepaulproteus: http://varnish.projects.linpro.no/wiki/Performance20:11
nkinkadeIs that what you saw earlier?20:11
paulproteusYes, that's what I read.20:11
paulproteusWhat are you saying about concurrent connection limiting?20:12
paulproteusThrough all of this, it's interesting that port 22 and 8080 were super responsive.20:14
nkinkadeOne of the things I had been looking forward to with the upgrade to Lenny was the ability to limit concurrent connections on a per IP basis ... just to avoid one address slamming the machine.20:15
nkinkadeLike you were able to do with netcat one time.20:15
paulproteusHow can you do that limit? Is it a netfilter feature?20:15
* paulproteus nods20:15
nkinkadepaulproteus: Yeah, it's a netfilter feature ... rather it's some Netfilter module.20:15
nkinkadeip_connlimit, or something like that.20:16
nkinkadeI can't remember now.20:16
nkinkadeBut if turning syncookies on fixed this, then I imagine that it was at a level that the kernel could identify the IP address and perhaps could have taken action, like just dropping it.20:16
nkinkadeI'm not sure about that, though.20:17
nkinkadeAnd it certainly does nothing much for any distributed flood of traffic.20:17
paulproteusYeah20:17
nkinkadeBut I was also confused as to why SSH and everything else was so responsive during that whole ordeal.20:17
paulproteusWe'll see when we look at that dump (/root/2009-04-21-packets-for-a-minute-or-two.tcpdump) I guess.20:17
paulproteusOkay, I'm going to go have lunch, and then head to the office.20:23
*** parker-fcnyu has joined #cc20:30
*** johndoig-lunch is now known as johndoigiii20:35
*** michaelkrnac has joined #cc20:39
*** michaelkrnac has quit IRC20:51
*** Orango has joined #cc21:00
*** Orango has joined #cc21:01
*** Orango has quit IRC21:10
*** Orango has joined #cc21:13
*** parker-fcnyu has quit IRC21:24
*** kreynen_ has quit IRC21:30
*** kreynen has joined #cc21:34
*** [mharrison] has quit IRC21:47
*** tanjir__ has joined #cc21:48
*** balleyne has quit IRC21:49
nathanynkinkade: you around?21:50
johndoigiiinathany: http://dpaste.com/36462/ are the wrapping spans necessary?21:50
nkinkadenathany: Yeah.21:51
nathanyjohndoigiii: i wouldn't think so; couldn't you just put style="display:none" on the span with the tal:omit-tag=""?21:51
nathany(we can probably remove the tal:omit-tag anyway since it's disregarded)21:51
nathanynkinkade: i'm looking @ the planet bug (finally/again)21:51
nkinkadeAh.  Thanks.21:51
nkinkadeI feel like I should have been investigating that myself a bit.21:52
nathanyquestion re: the plugin... why do we look for an id that matches "license_url" on the parent?21:52
nathany(it's ok, pretty low priority)21:52
nkinkadeMy assessment that it was rdfadict was largely based on the word RDFa showing up somewhere in the backtrace.21:52
nathany(my question above has zero to do with the bug, it just seems weird)21:52
nkinkadeI'll have to look at the plugin again to answer that.21:53
nkinkadeIt's been quite some months since I touched it, maybe as much as a year.21:53
nathanyno problem21:53
nkinkadenathany: Do you mean on line 107?21:55
nathanyyes21:55
*** parker-fcnyu has joined #cc21:56
nkinkadeHmm.  I'm not really sure anymore.   At this point I can't really say more than my note indicates.21:57
nkinkadeI'm not sure how I determined before that THE license URL would always be wrapped in a <span> tag with id=license_url21:59
nathanylol22:00
nathanyyeah, me either :)22:00
nkinkadehaha.22:00
nkinkadeAh, now I see.  See ./software/themes/cc/index.html.tmpl22:00
nkinkadeLook at line 109.22:01
nathanyoh, the filter gets the rendered page, not the source22:01
nkinkadeYeah.22:01
nathanythat makes [more] sense now22:01
nkinkadeIn this case it does.22:01
nathanynkinkade: anyway, i fixed the other issue but not sure if i should commit to trunk, production, ...?22:02
nathanyso i reassigned to you for committing :)22:02
nkinkadeThere are two ways Planet Venus filters can work.  1) By getting each entry as it's parsed 2) By getting the final rendered page.22:02
nathanygot it22:02
nkinkadeWhere was the other issue, by the way?22:02
nkinkadeAre you working on a local checkout?22:04
nathanynkinkade: nope, i fixed it on a822:04
nathanyit's in the get_license_name filter22:04
nkinkadeFuck.22:04
nkinkadeSorry.22:04
nathanynkinkade: no problem22:04
nkinkadeAnd there I went blaming it on poor rdfadict. :-)22:05
nathanybtw, where does the code live that pulls out the license URL?22:05
nathanyi assume it's in a different filter/plugin?22:05
nathanywe need to do something similar there to fix http://code.creativecommons.org/issues/issue28422:05
nkinkadeThe code that pulls out the license URL lives somewhere in the planet code.22:05
nkinkadeI had to alter the core code to make the plugin work.22:05
nathanyah22:05
nkinkadeI think the changes are visible in svn:22:05
nkinkadehttp://code.creativecommons.org/viewsvn/planet/22:06
nathanyawesome22:06
nkinkadehttp://code.creativecommons.org/viewsvn/planet/trunk/software/planet/shell/tmpl.py?r1=8298&r2=1018822:07
nkinkadeSomewhere around there.22:07
nkinkadeAs far as your change, the best would be to checkout planet, make the change in trunk, then svnmerge it to production, then svn up ... the usual.22:08
nkinkadepaulproteus: What's the status with newer versions of svn and svnmerge?22:08
nkinkadeDidn't you say that lately svnmerge was basically part of subversion?22:08
*** tanjir_ has quit IRC22:10
nathanynkinkade: it sort of is, but it's implemented in a way that's incompatible with the svnmerge we've been using22:10
nathanyyou can get it to upgrade but it means upgrading the server + everyone's client22:10
*** haoyu has quit IRC22:10
nathany(although most people's clients are probably already updated)22:10
*** haoyu has joined #cc22:10
* paulproteus nods in nkinkade's and nathany's general direction22:12
johndoigiiinathany, if you add anything to the attributes of the span tag it will it in with the ${i18n_name_of_item} and thus break the markup22:13
nathanyjohndoigiii: that's incredibly depressing22:14
nathanynkinkade: if I kill the license cache planet will recreate it, right?22:46
nathany(cache/license_mappings/license.maps)22:47
nkinkadenathany: Yeah, it'll get recreated.22:52
nkinkadeIt should be safe to do something like rm -rf cache/*22:52
nathanynkinkade: thanks, i just assumed that and tried it :)22:52
nkinkade:-)22:52
nkinkadenathany: For those Deeds that you need generated.  Is that all license versions, or just 3.0?22:55
nathanynkinkade: all versions22:55
nkinkadenathany: If -v isn't specified for mkdeeds, will it generate them all, or is there some way to tell it to do all without using a comma separated list?22:56
nathanynkinkade: iirc you can omit it and it'll do them all22:56
nkinkadeI'll give it a go.22:56
nathanyit always starts with a list of every license and then prunes it based on the command line22:56
nathany(iirc)22:57
*** kreynen_ has joined #cc23:03
*** nathany has quit IRC23:07
*** kreynen has quit IRC23:20
*** kreynen_ has quit IRC23:22
*** at1z0r has joined #cc23:24
*** at1z0r has left #cc23:24
*** nathany has joined #cc23:46
nkinkadepaulproteus: Do you happen to be logged into a5?23:47
nathanypaulproteus: nkinkade: did we disable syncookies?23:47
nkinkadenathany: I haven't touched a5 since the problem earlier.23:47
nkinkadenathany: Looks like paster got up to 2.4G again.23:49
nkinkadeI'm restarting it.  I'm not sure why the watchdog script suddenly stopped working.23:49
paulproteusnkinkade, I do not23:49
paulproteus 18:49:39 up 8 days,  6:31,  3 users,  load average: 49.93, 107.61, 62.9923:49
paulproteusSweet23:49
nathanynkinkade: we should figure out what's up with that23:50
nathanysooner rather than later23:50
nkinkadepaulproteus: Do you have a second to take a look at the watchdog script?23:50
paulproteusWhich app is paster, and can't we just move it to cycling processes via FCGI? Sorry that I don't know much about this deployment so will sound clueless. )-:23:50
paulproteusnkinkade, You got it, tell me where it is23:50
nkinkadeI'm just about to head out the door  ... a friend is on the way to pick me up in about 5 minutes.23:50
nathanyfor fuck's sake23:50
nathany(sorry)23:50
nathanynkinkade: where does the watchdog live?23:50
nkinkadea5:/root/bin/check_webservices.sh23:51
paulproteusI think we can handle it, nathany23:51
paulproteusWow, not a trivial script. (-:23:51
nathany(the FFS wasn't directed at you, nkinkade -- of course it's end of day for you :) )23:51
nkinkadepaulproteus: You should be able to see at the top where I set how much memory is the max for paster.23:51
paulproteusnkinkade, Ya23:51
nkinkadeIt's nearly the end of the day for SF, too.23:52
paulproteus"Luckily" my day got started late23:52
nathanypaulproteus: as discussed before, Zope isn't amenable to fcgi handling or we may have moved to that previously23:53
nathanyheh23:53
paulproteusI can chat with NK about it, nathany, if you need to go23:53
nathanyi'll hang out a bit23:53
nathanyNK has to run, i believe23:53
paulproteusOh oops, I read the wrong nathan in the 5min message23:53
nkinkadepaulproteus:  I've got to run, but as long as cc.engine can stay going for about 2 hours, then I can look more into it when I get back after dinner.23:53
nathanynkinkade: thanks... paulproteus will send you an email and let you know if we do anything :)23:54
paulproteusACK23:54
nkinkadeThe script had been working just fine for months and months.23:54
nkinkadeSuddenly today it seems to not be catching paster (cc.engine) before its memory usage spirals out of control.23:54
nathanypaulproteus: yes, complex, but i think the part we need to care about is the first 10-15 lines23:54
nathany(the part that looks @ paster)23:54
* paulproteus installs emacs23:54
nkinkadeUnless of course it's because the memory usage grows from acceptable to disastrous in less than 2 minutes.23:54
paulproteusne'er mind it's already there23:55
nathanyi suppose that's possible but seems unlikely23:55
nkinkadeOkay.  I'll check back on #cc when I get back, and maybe send me an email if you find something.23:55
paulproteus+ '[' 7604 106540 -gt 750000 ']'23:55
paulproteus/root/bin/check_webservices.sh: line 22: [: too many arguments23:55
nkinkadeDoh!23:55
nathanyLOL23:56
nkinkadeOkay.  I'll check back in a bit later.23:56
paulproteusI'm looking into why that is...23:56
* paulproteus is running with bash -x23:56
nkinkadeThanks paulproteus.  sorry about the bum script.23:56
paulproteusHeh, that's life. (-:23:56
* paulproteus eats some chocolate chips and carries on23:56
nathanypaulproteus: so it had an incorrect #! or you're still looking @ it?23:59
paulproteusI'm still looking23:59
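
The traced line `'[' 7604 106540 -gt 750000 ']'` shows the test receiving two numbers where it expects one, which is why `[` reports too many arguments. The log ends before the cause is found, so the following is only a sketch under an assumption: that the two numbers are memory sizes in KB for two matching processes. Under that assumption, one option is to total the values before comparing. The function names are hypothetical and this is not the actual check_webservices.sh logic.

```python
def total_rss_kb(ps_output):
    """Sum every whitespace-separated integer in the captured output."""
    return sum(int(value) for value in ps_output.split())


def should_restart(ps_output, limit_kb=750000):
    """True when the combined size exceeds the limit (750000 KB per the log)."""
    return total_rss_kb(ps_output) > limit_kb
```

For the values in the trace, `should_restart("7604 106540")` is False because the total is 114144 KB. Taking the maximum instead of the sum would be an equally plausible policy if each process should be judged on its own.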
