*** parker-fcnyu has joined #cc | 00:10 | |
*** mralex has quit IRC | 00:27 | |
*** nkinkade has quit IRC | 01:16 | |
*** jgay has quit IRC | 01:28 | |
*** erlehmann has quit IRC | 01:33 | |
*** pktck has joined #cc | 01:44 | |
*** erlehmann has joined #cc | 01:46 | |
*** pktck has quit IRC | 02:20 | |
*** parker-fcnyu has quit IRC | 02:56 | |
*** pktck has joined #cc | 03:02 | |
*** parker-fcnyu has joined #cc | 03:25 | |
*** oshani has quit IRC | 05:32 | |
*** JoiIto has joined #cc | 05:38 | |
*** Kaetemi has quit IRC | 06:03 | |
*** JoiIto has quit IRC | 06:27 | |
*** parker-fcnyu has quit IRC | 06:32 | |
*** MarkDude has quit IRC | 07:20 | |
*** erlehmann has quit IRC | 07:27 | |
*** pktck has quit IRC | 07:29 | |
*** pktck has joined #cc | 07:40 | |
*** MarkDude has joined #cc | 08:09 | |
*** pktck has quit IRC | 08:19 | |
*** JED3 has quit IRC | 08:35 | |
*** JED3 has joined #cc | 08:35 | |
*** pktck has joined #cc | 08:40 | |
*** wormsxulla has quit IRC | 08:46 | |
*** niekie has quit IRC | 09:01 | |
*** niekie has joined #cc | 09:05 | |
*** pktck has quit IRC | 09:24 | |
*** wormsxulla has joined #cc | 09:34 | |
*** FHaag has joined #cc | 09:44 | |
*** pktck has joined #cc | 09:45 | |
*** pktck has quit IRC | 09:51 | |
*** bassel has joined #cc | 09:59 | |
*** pktck has joined #cc | 10:03 | |
*** MarkDude has quit IRC | 10:06 | |
*** MarkDude has joined #cc | 10:07 | |
*** pktck has quit IRC | 10:08 | |
*** pktck has joined #cc | 10:28 | |
*** pktck has quit IRC | 10:39 | |
*** pktck has joined #cc | 10:42 | |
*** pktck has quit IRC | 10:47 | |
*** akila87 has joined #cc | 10:52 | |
*** oshani has joined #cc | 11:19 | |
*** oshani has quit IRC | 11:21 | |
*** bassel has quit IRC | 12:12 | |
*** bassel has joined #cc | 12:12 | |
*** bassel has quit IRC | 12:57 | |
*** tvol has joined #cc | 13:04 | |
*** oshani has joined #cc | 13:08 | |
*** bassel has joined #cc | 13:09 | |
*** midoubleko has quit IRC | 13:44 | |
*** midoubleko has joined #cc | 13:45 | |
*** midoubleko has quit IRC | 13:50 | |
*** midoubleko has joined #cc | 13:50 | |
*** paroneayea has quit IRC | 13:52 | |
*** paroneayea has joined #cc | 13:56 | |
*** igorlukanin has joined #cc | 14:06 | |
*** akila87 has left #cc | 14:08 | |
*** parker-fcnyu has joined #cc | 14:20 | |
*** akila87 has joined #cc | 14:28 | |
*** nkinkade has joined #cc | 14:52 | |
*** Pascalcmoi has joined #cc | 15:22 | |
Pascalcmoi | Does a website using the Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License mean that only USA residents can legally read the page? | 15:23 |
paroneayea | Pascalcmoi: no, it just means that the legal code is tuned specifically to the United States legal system | 15:23 |
Pascalcmoi | thanks paroneayea | 15:24 |
paroneayea | np | 15:24 |
*** Pascalcmoi has left #cc | 15:25 | |
paroneayea | nkinkade: Yo | 15:31 |
paroneayea | ah, never mind | 15:31 |
*** jed_ has joined #cc | 15:34 | |
*** JED3 has quit IRC | 15:37 | |
paroneayea | well | 15:38 |
paroneayea | we're running sanity again | 15:38 |
paroneayea | now with caching of RDF queries and New Zealand added to jurisdictions.rdf | 15:39 |
*** oshani has quit IRC | 15:41 | |
*** Kaetemi has joined #cc | 15:44 | |
*** jed_ has quit IRC | 15:59 | |
*** oshani has joined #cc | 16:04 | |
*** tieguy has joined #cc | 16:12 | |
*** erlehmann has joined #cc | 16:27 | |
*** mralex has joined #cc | 16:49 | |
*** jed_ has joined #cc | 16:50 | |
*** erlehmann has quit IRC | 16:50 | |
*** igorlukanin has quit IRC | 16:50 | |
nkinkade | paroneayea: What's the bit about cc.engine taking a while to start up after restarting Apache? | 16:53 |
nkinkade | How long? | 16:53 |
paroneayea | nkinkade: a few seconds | 16:53 |
nkinkade | Oh, that's not too bad. Do you think the problems with paster running away with memory should be gone too? | 16:54 |
nkinkade | Before, a script had to run every few minutes to check memory usage and would have to kill paster about every 7 or 8 hours to get it to release memory. | 16:54 |
paroneayea | I hope so, but I'm not sure what would have been causing the memory load in paster | 16:55 |
nkinkade | NRY had some ideas, but I never knew/understood just what they were. | 16:55 |
nkinkade | We'll know soon enough, because I get mails when the script reloads paster, but I suspect I'll need to change the script, since pid files may have moved around. | 16:55 |
paroneayea | that's part of the thing | 16:56 |
paroneayea | we switched this over to fastcgi so | 16:56 |
paroneayea | there's no separate daemon | 16:56 |
paroneayea | apache / mod_fcgid is starting and managing the process | 16:56 |
paroneayea | and handling forking and etc | 16:56 |
nkinkade | paroneayea: Cool and I see that the cc.engine pid is still at /var/run/cc.engine.pid | 16:57 |
nkinkade | And the script should still work if necessary. | 16:57 |
paroneayea | hm, no.. I don't think that does anything | 16:57 |
paroneayea | I just haven't shut off the old system | 16:57 |
nkinkade | Oh. | 16:57 |
paroneayea | in case we need to switch things over fast | 16:57 |
paroneayea | so technically we are running doubletime | 16:58 |
paroneayea | two cc.engines are running right now | 16:58 |
*** FHaag has left #cc | 16:58 | |
nkinkade | How long does it take to shut the old system down? Nothing more than /etc/init.d/cc-engine stop/start, right? | 16:58 |
paroneayea | yeah | 16:58 |
paroneayea | you have to do it from the old cc.engine directory | 16:58 |
nkinkade | The old init script no longer works? | 16:58 |
paroneayea | it does | 16:58 |
paroneayea | but you always seemed to need to run it from there | 16:59 |
paroneayea | if I ran it from anywhere else it did nothing | 16:59 |
nkinkade | I never found that to be true. Hmm. | 16:59 |
nkinkade | Can I try it now? | 16:59 |
paroneayea | go for it | 16:59 |
paroneayea | shouldn't affect the running engine at all | 16:59 |
nkinkade | I just ran it from my home directory and it worked. | 17:00 |
nkinkade | It also just reclaimed about 500M of memory. | 17:00 |
paroneayea | ah :) | 17:01 |
nkinkade | Which can now be used for other things. | 17:01 |
paroneayea | yay! | 17:01 |
nkinkade | Like disc caching, etc. | 17:01 |
nkinkade | Cool. | 17:01 |
paroneayea | anyway cc.engine has been running all morning, I've noticed no problems, and nobody's emailed webmaster with any new problems | 17:02 |
paroneayea | heading out to lunch | 17:11 |
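The memory watchdog nkinkade describes at 16:54 (check usage every few minutes, restart paster when it grows too large) could be sketched roughly as below. This is not the actual script, which does not appear in the log. The pid file path is the one mentioned at 16:57; the 500 MB limit, the function names, and reading `VmRSS` from `/proc/<pid>/status` (Linux only) are assumptions made for illustration.

```python
import re


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def needs_restart(status_text, limit_kb):
    """True when the resident memory reported in status_text exceeds limit_kb."""
    rss_kb = parse_vmrss_kb(status_text)
    return rss_kb is not None and rss_kb > limit_kb


def read_pid(pid_file):
    """Read an integer pid from a pid file."""
    with open(pid_file) as handle:
        return int(handle.read().strip())


def check(pid_file="/var/run/cc.engine.pid", limit_kb=500 * 1024):
    """Hypothetical entry point: report whether the daemon is over the limit.

    The limit is an assumed value; the real threshold is not in the log.
    """
    pid = read_pid(pid_file)
    with open("/proc/%d/status" % pid) as handle:
        return needs_restart(handle.read(), limit_kb)
```

A cron job could call `check()` and, when it returns true, run the init script and send the notification mail. Under mod_fcgid this kind of check would have nothing to watch, since Apache manages the processes and there is no separate daemon.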
*** niekie has quit IRC | 17:14 | |
*** igorlukanin has joined #cc | 17:16 | |
*** michi__ has joined #cc | 17:16 | |
*** niekie has joined #cc | 17:19 | |
*** akila87 has left #cc | 17:20 | |
*** niekie has quit IRC | 17:27 | |
*** niekie has joined #cc | 17:31 | |
*** niekie has quit IRC | 17:38 | |
*** niekie has joined #cc | 17:40 | |
*** tvol has quit IRC | 17:56 | |
*** michi__ has quit IRC | 17:57 | |
nkinkade | I just realized that Gmail for Google Apps has been sending all info@ and webmaster@ emails to spam! | 17:57 |
paroneayea | nkinkade: oh no! :( | 18:14 |
nkinkade | It's not uncommon for Google to send legitimate email to spam, which is why I check over it every single day. I'm just not sure why I didn't catch it earlier. | 18:16 |
*** igorlukanin has quit IRC | 18:48 | |
*** akozak has joined #cc | 18:50 | |
dithyramble | hey akozak | 18:57 |
akozak | hey | 18:57 |
akozak | dithyramble, just wondering when you were flying out to see if we could get to the airport together | 18:58 |
paulproteus | akozak: FWIW I'm heading to greg-g's place Thu night | 18:58 |
dithyramble | I'm flying out of LAN at 7:11pm (on United 5702) | 18:58 |
paulproteus | Wanna join us? (-: | 18:58 |
paulproteus | (Then I'm flying out of DTW that night, I think 7 PM) | 18:59 |
paulproteus | (er, I'm flying out of DTW Fri night at 7 PM) | 18:59 |
paulproteus | Let me rephrase it. You should join us. | 18:59 |
paulproteus | You can work remotely on Friday, surely. | 19:00 |
akozak | paulproteus, thanks, but I shouldn't be away from home that long... will still be in the process of moving. | 19:00 |
paulproteus | Aww, okaaaaayyyyyy. | 19:00 |
*** midoubleko has quit IRC | 19:00 | |
akozak | yea, I'm sure it would be fun :) | 19:00 |
*** midoubleko has joined #cc | 19:00 | |
akozak | dithyramble, ok perfect, thanks | 19:01 |
*** wormsxulla has quit IRC | 19:13 | |
*** parker-fcnyu has quit IRC | 19:15 | |
paroneayea | well | 19:25 |
paroneayea | nkinkade: regarding cpu usage on a5 | 19:27 |
paroneayea | it looks like most often it's all the apache processes | 19:27 |
nkinkade | paroneayea: What do you think? | 19:27 |
paroneayea | looking at top | 19:27 |
paroneayea | nkinkade: what do you say we switch over to nginx | 19:27 |
paroneayea | kidding kidding kidding | 19:28 |
paroneayea | in seriousness though, it could simply be that a5 just gets a lot of http traffic all the time | 19:29 |
nkinkade | paroneayea: So does a8, much more in fact, but a8 is less loaded. | 19:29 |
paroneayea | hm | 19:29 |
nkinkade | Granted a8 is mostly small static files. | 19:29 |
nkinkade | :-) | 19:29 |
paroneayea | is a8 where the buttons are then? | 19:29 |
paroneayea | the embeddable images I mean | 19:30 |
paroneayea | I've always thought that must be a huge amount of http traffic | 19:30 |
nkinkade | Yeah, a8 hosts all those icons and buttons. | 19:31 |
nkinkade | a5 right now is pumping out steadily around 600KB/s to 800KB/s | 19:31 |
paroneayea | what could make apache really cpu intensive? | 19:31 |
paroneayea | maybe a lot of rewrite rules and etc? | 19:32 |
*** wormsxulla has joined #cc | 19:32 | |
nkinkade | Varnish's hitrate seems to hover around 95%, which doesn't seem *too* bad, though I guess it could be better. | 19:34 |
nkinkade | And we have APC caching PHP's opcode and that all looks nice: http://a5.creativecommons.org/apc.php | 19:35 |
nkinkade | A 100% hit rate for APC. | 19:35 |
nkinkade | a5 does have quite a few rewrite rules. | 19:35 |
paroneayea | I turned on the rewrite log for a bit | 19:36 |
paroneayea | it looked like every static file hit hits a LOT of rules | 19:36 |
nkinkade | paroneayea: Yeah, don't leave the log on for long. | 19:39 |
nkinkade | Especially if your loglevel is more than 5 or 6 | 19:39 |
nkinkade | It will produce vast amounts of data and just slow things down even more. | 19:39 |
nkinkade | To see how many rewrite rules it has to hit, one just needs to look at the vhost config in confg/ | 19:40 |
nkinkade | conf/ | 19:40 |
nkinkade | We are also gzipping things when a client can accept it, so that may be taking some CPU, but I would expect Varnish to cache the gzipped response. | 19:41 |
paroneayea | running pages from the new license engine through the validator | 19:42 |
paroneayea | it's not validating, though the templates should be the same as the old engine | 19:42 |
paroneayea | though other pages on cc.org aren't validating either :( | 19:42 |
nkinkade | Most pages on CC.org have never validated. | 19:46 |
paroneayea | Looks like there's another RDF issue with the new engine | 19:47 |
paroneayea | as in, since the new engine uses the RDF as the database | 19:47 |
paroneayea | it's exposing problems we have with missing things in our RDF | 19:47 |
nkinkade | paroneayea: Did you just see that webmaster@ email? | 19:48 |
paroneayea | yeah | 19:48 |
paroneayea | http://creativecommons.org/international/br/ <- the licenses on here not showing up | 19:48 |
paroneayea | and here's why: | 19:49 |
paroneayea | we only have RDF for 2.5 licenses w/ brazil | 19:49 |
paroneayea | not 3.0 | 19:49 |
nkinkade | Seems like maybe the unit tests should check this ... run through each jurisdiction and fetch the deed. If a 404 comes back, then the test fails. | 19:49 |
paroneayea | but based on what data? | 19:51 |
paroneayea | according to the RDF this is correct | 19:51 |
*** JoiIto has joined #cc | 19:51 | |
paroneayea | we don't have those licenses in the RDF is what | 19:51 |
paroneayea | I could put together a scraper possibly that checks pages like: http://creativecommons.org/international/br/ | 19:51 |
paroneayea | and looks for all the licenses they're expecting to exist | 19:52 |
nkinkade | paroneayea: It's ugly but it wouldn't be hard to scrape international/ | 19:52 |
paroneayea | yeah | 19:52 |
nkinkade | Then again paulproteus might just find it beautiful. | 19:52 |
paroneayea | :) | 19:52 |
nkinkade | He's like that. :-) | 19:52 |
paroneayea | why don't I roll back the engine and write a tool/test to do that | 19:53 |
paroneayea | so we can make sure we're not screwing over any other jurisdictions with missing RDF data | 19:53 |
nkinkade | paroneayea: As a suggestion, you could also look into grabbing the data directly from the WP database. | 19:58 |
nkinkade | But I guess a list of the juris. is not what you need. | 19:59 |
*** oshani has quit IRC | 19:59 | |
nkinkade | And the db has nothing about version number, I think. | 19:59 |
nkinkade | Which apparently is what you need. | 20:01 |
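The check discussed above (fetch each expected deed and fail on a 404) might look something like this sketch. The list of expected licenses has to come from outside the RDF, for example from scraping the international/ pages as proposed. The deed URL pattern with a trailing jurisdiction segment, and the names `deed_url`, `http_status` and `missing_deeds`, are assumptions for illustration and not part of cc.engine.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def deed_url(code, version, jurisdiction):
    """Build a deed URL; the /<code>/<version>/<jurisdiction>/ layout is assumed."""
    return "http://creativecommons.org/licenses/%s/%s/%s/" % (
        code, version, jurisdiction)


def http_status(url):
    """Return the HTTP status code for url, including error codes such as 404."""
    try:
        with urlopen(url) as response:
            return response.getcode()
    except HTTPError as error:
        return error.code


def missing_deeds(expected, fetch_status=http_status):
    """Return the deed URLs that answer 404.

    expected is an iterable of (code, version, jurisdiction) tuples taken
    from a source other than the RDF. fetch_status is injectable so the
    check can be exercised without network access.
    """
    missing = []
    for code, version, jurisdiction in expected:
        url = deed_url(code, version, jurisdiction)
        if fetch_status(url) == 404:
            missing.append(url)
    return missing
```

A test would then assert that `missing_deeds(expected)` is empty, so a jurisdiction with RDF for 2.5 but not 3.0 shows up as a failure before deployment.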
nkinkade | paulproteus: dithyramble: Where on a6 is the crawl data being stored? | 20:04 |
nkinkade | Do you feel that it's vital that it be backed up? | 20:04 |
paulproteus | nkinkade: We haven't really done anything with a6. We're developing locally. | 20:04 |
nkinkade | Ah. | 20:04 |
nkinkade | Somewhere something is eating up a lot of space in the last week. | 20:05 |
paulproteus | Based on this conversation, what I'll do is remember to tell you when we deploy and start wanting backups. | 20:05 |
paulproteus | Huh. | 20:05 |
nkinkade | paulproteus: Soon it shouldn't matter. | 20:05 |
paulproteus | Okay. | 20:05 |
paulproteus | (Because of the Rapture?) | 20:05 |
*** oshani has joined #cc | 20:05 | |
nkinkade | Once we move over to hosting at the ISC (hopefully) we'll have more bandwidth and also a lot more disc space and I intend to backup / from top to bottom. | 20:06 |
paulproteus | Oh my GOD | 20:06 |
paulproteus | AWEESOME | 20:06 |
* paulproteus is so jealous. | 20:06 | |
nkinkade | But for the moment, we are still in the CC office and disc space is down to 35G. | 20:06 |
paulproteus | Oh, you just mean backups' hosting? | 20:06 |
nkinkade | paulproteus: I was being unfair to you. I noticed it go up and up over the past week and I automatically assumed it coincided with your return. :-) | 20:07 |
paulproteus | nkinkade: Heh (-: | 20:07 |
nkinkade | In the past, with resource issues, I could usually single you out and be right about 50% of the time. :-) | 20:07 |
paulproteus | (-: | 20:08 |
nkinkade | With that type of percentage, I usually just shot first and asked questions later. | 20:08 |
nkinkade | Although a6 *is* looking suspicious: | 20:09 |
nkinkade | backup:/media/1TB/backups/creativecommons# du -sh * | 20:09 |
nkinkade | 62G a5.creativecommons.org | 20:09 |
nkinkade | 281G a6.creativecommons.org | 20:09 |
nkinkade | Not to say 281G is unreasonable, but a5 being our "main" machine and using only about 20% of the disc space that a6 is using seems odd. | 20:10 |
paulproteus | I'll leave this to you, nkinkade (-: | 20:10 |
nkinkade | paulproteus: So to sum up ... you know of no places on a6 that may have lately been loaded up with lots of data. | 20:10 |
nkinkade | ? | 20:10 |
paulproteus | I know of nothing related to me lately on a6. | 20:10 |
paroneayea | nkinkade: I'm going to start the old engine back up, just fyi | 20:11 |
nkinkade | paroneayea: Cool. | 20:12 |
paroneayea | paulproteus: I have a question for you, scraping related | 20:36 |
paroneayea | http://creativecommons.org/international/ what's the best way to scrape for the licenses that appear under Completed Licenses vs the ones that appear under Project Jurisdictions? | 20:36 |
paroneayea | my guess is that since they aren't distinguished by appearing in separate divs and etc there's no way to really do things via xpath | 20:37 |
paroneayea | so will I just have to "iterate until I hit that point"? | 20:37 |
paulproteus | Yeah, I think that's what you have to do. | 20:39 |
paulproteus | You could also change the template so they have a class or something. | 20:39 |
paroneayea | yes I suppose I could look to change that page itself | 20:39 |
paroneayea | nkinkade: are all the /international/ pages managed by wordpress, I assume? | 20:40 |
nkinkade | paroneayea: Yes. | 20:40 |
paulproteus | If they're managed by WordPress, then I would treat them as unchangeable. | 20:40 |
paulproteus | And just scrape messily. | 20:40 |
nkinkade | paroneayea: How about just selecting all divs with class of ifloat in the first div with class icontainer? | 20:40 |
nkinkade | I feel like BeautifulSoup would allow for that, but I haven't used it in a while. | 20:41 |
paroneayea | because they're in the same div | 20:41 |
nkinkade | paroneayea: From what I can tell, they are in two separate divs. | 20:42 |
paroneayea | oh | 20:42 |
paroneayea | wait | 20:42 |
paroneayea | yeah you're right | 20:42 |
paroneayea | I was being too reliant on the highlighting via firebug's inspector :) | 20:43 |
paroneayea | which made it look like the div that held those jurisdiction icons was just a block above them | 20:43 |
paroneayea | well never mind then, this should be very easy :) | 20:43 |
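The selection nkinkade suggests (every div with class ifloat inside the first div with class icontainer) can be sketched as follows. This version uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra packages. It assumes each ifloat div wraps a link to the jurisdiction page; the class and function names are made up for this example, and the real page markup may differ.

```python
from html.parser import HTMLParser


class FirstContainerParser(HTMLParser):
    """Collect hrefs found in div.ifloat elements of the first div.icontainer."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0
        self.container_depth = None  # depth of the first icontainer while open
        self.ifloat_depth = None     # depth of the ifloat div currently open
        self.finished = False        # set once the first icontainer has closed
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div":
            self.div_depth += 1
            if self.finished:
                return
            if self.container_depth is None:
                if "icontainer" in classes:
                    self.container_depth = self.div_depth
            elif self.ifloat_depth is None and "ifloat" in classes:
                self.ifloat_depth = self.div_depth
        elif tag == "a" and self.ifloat_depth is not None and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag != "div":
            return
        if self.ifloat_depth == self.div_depth:
            self.ifloat_depth = None
        if self.container_depth == self.div_depth:
            # The first icontainer is closed; ignore everything after it.
            self.container_depth = None
            self.ifloat_depth = None
            self.finished = True
        self.div_depth -= 1


def completed_jurisdiction_links(html):
    """Return link targets from the first icontainer block of the page."""
    parser = FirstContainerParser()
    parser.feed(html)
    return parser.links
```

Stopping at the first icontainer is what separates the two lists on the page, on the assumption that the first container is the "Completed Licenses" block.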
*** bassel has quit IRC | 21:01 | |
*** oshani has quit IRC | 21:08 | |
*** JoiIto has quit IRC | 21:13 | |
jed_ | just fancied up a tool that i've been using to test some of my work http://code.creativecommons.org/~john/ | 21:57 |
*** jed_ is now known as JED3 | 21:57 | |
nkinkade | JED3: How is this? .... http://us3.php.net/manual/en/function.mysql-fetch-assoc.php : "does not contain CC-REL Metadata." | 22:05 |
nkinkade | Ooops ... wrong URL paste. :-) | 22:05 |
nkinkade | http://creativecommons.org/licenses/by-nc-sa/3.0/ | 22:05 |
nkinkade | There it is. | 22:05 |
JED3 | nkinkade: it worked? | 22:07 |
nkinkade | JED3: It said: "does not contain CC-REL Metadata." | 22:07 |
nkinkade | But that shouldn't be right for the deeds. | 22:07 |
JED3 | well, i don't believe the deeds have any self-referential rel=license's do they? | 22:08 |
JED3 | nkinkade: ^^ | 22:09 |
JED3 | so i guess that message of "no cc-rel metadata" is a bit misleading | 22:10 |
JED3 | perhaps it would be better suited as "no CC rel=license link found" | 22:10 |
paroneayea | JED3: that's awesome! | 22:11 |
paroneayea | well | 22:11 |
paroneayea | it's super pretty :) | 22:11 |
paroneayea | and minimal and nice working :) | 22:11 |
JED3 | paroneayea: thanks! | 22:11 |
JED3 | this is the same thing we use on the deeds for the referer checking | 22:11 |
nkinkade | JED3: That could be, but the deeds do contain plenty of cc-rel metadata. | 22:12 |
nkinkade | Maybe I've just misunderstood what the tool does and is for. | 22:12 |
JED3 | nkinkade: it's for extracting CC metadata from a page and displaying it in human-readable form | 22:13 |
JED3 | try inputting "http://joi.ito.com/" as an example | 22:13 |
nkinkade | JED3: So it will only extract the metadata under certain circumstances? | 22:14 |
JED3 | nkinkade: it extracts everything, but will only display it when it's able to make assertions from a work's triples graph | 22:15 |
JED3 | for instance if you include cc:attributionName or cc:attributionURL on a page but are not specifying a license for that work, those 2 triples are worthless for our sake | 22:16 |
nkinkade | Cool. So it's not an all purpose cc-rel metadata extractor, but meant more for checking, for example, the marking on a site, perhaps using the chooser HTML or something similar. | 22:19 |
JED3 | nkinkade: correct | 22:27 |
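The rule JED3 describes (attribution triples only count when the same work also carries a license) can be illustrated with a small sketch. It is not the tool's code: triples are modelled as plain `(subject, predicate, object)` tuples, and the shortened predicate names, including `xhv:license` for the rel=license link, are simplifying assumptions.

```python
# Predicates are abbreviated strings here; a real extractor would use full URIs.
LICENSE_PREDICATES = {"xhv:license", "cc:license"}
ATTRIBUTION_PREDICATES = {"cc:attributionName", "cc:attributionURL"}


def licensed_works(triples):
    """Map each licensed subject to its license and attribution values.

    Subjects without a license triple are dropped, so attribution triples
    that have no license to attach to are ignored.
    """
    works = {}
    # First pass: find every subject that states a license.
    for subject, predicate, obj in triples:
        if predicate in LICENSE_PREDICATES:
            works.setdefault(subject, {})["license"] = obj
    # Second pass: keep attribution data only for those subjects.
    for subject, predicate, obj in triples:
        if subject in works and predicate in ATTRIBUTION_PREDICATES:
            works[subject][predicate] = obj
    return works
```

An empty result would correspond to the "no CC rel=license link found" message suggested earlier, even when other cc: triples are present on the page.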
mralex | JED3: if you really wanted to procrastinate, you could add :hover and :active states for that Scrape button ;)) | 23:23 |
JED3 | mralex: oOo good idea | 23:23 |
mralex | http://neography.com/experiment/circles/solarsystem/ | 23:24 |
mralex | mmm, css3 | 23:25 |
JED3 | mralex: mmm html5+webgl http://www.youtube.com/watch?v=OxoFcyKYwr0&fmt=22 | 23:28 |
mralex | fun | 23:29 |
mralex | if webgl ever goes anywhere | 23:30 |
mralex | or the zombie uprising of vrml | 23:30 |
akozak | dithyramble, you're flying out of GRR right? | 23:34 |
akozak | paulproteus, do you happen to know what airport he's flying out of on the 17th? :P | 23:37 |
akozak | I forgot to ask | 23:37 |
akozak | paulproteus, oh nevermind | 23:37 |
akozak | he sais LAN | 23:37 |
akozak | said* | 23:37 |
paulproteus | Wow I'm Full | 23:51 |
dithyramble | akozak: at least it's not called BRR | 23:51 |