Saturday, 2008-09-06

*** ftobia has joined #cc [00:46]
*** Bovinity has joined #cc [00:57]
*** Roderick__ has joined #CC [00:58]
*** Bovinity has quit IRC [01:05]
*** mlinksva has quit IRC [01:05]
*** nkinkade has quit IRC [01:06]
*** Roderick_ has quit IRC [01:14]
*** stevel has quit IRC [01:16]
*** ftobia has quit IRC [01:28]
*** tvol has joined #CC [01:54]
*** tvol has quit IRC [02:14]
*** MAWAR_PUTIH has joined #cc [02:53]
*** MAWAR_PUTIH has quit IRC [03:33]
*** Roderick_ has joined #CC [03:35]
*** Roderick__ has quit IRC [03:51]
*** adjohn has joined #cc [05:15]
*** Mihai` has quit IRC [06:04]
*** is4 has joined #cc [07:19]
*** isforinsects has quit IRC [07:19]
*** luser__ is now known as bringatowel [08:12]
*** bringatowel has joined #cc [08:12]
paulproteus: ankitg, Mornin'. [08:42]
ankitg: paulproteus: 01:42 morning ... [08:43]
paulproteus: Yeah, I'm going to bed soon. [08:43]
paulproteus: We're going to talk tomorrow at 9am still? [08:43]
ankitg: sure ... whatever time works for you ... just working on school assignments now ... [08:45]
paulproteus: I'm taking that as a yes. [08:45]
ankitg: yep [08:45]
paulproteus: Cool.  Maybe 8:30 instead. [08:45]
paulproteus: I'll try to be on then, but if not, then nine. [08:45]
ankitg: ok, I'll keep an eye out around 0830 - 0930 ... [08:45]
ankitg: g'nite [08:55]
*** BobChao has joined #cc [09:25]
*** isforinsects1 has joined #cc [09:25]
*** is4 has quit IRC [09:26]
*** balor has joined #cc [10:54]
*** isforinsects1 has quit IRC [11:00]
*** sama has joined #cc [11:20]
*** ereslibre has quit IRC [11:32]
*** sama has quit IRC [11:53]
*** Mihai` has joined #cc [12:35]
*** ereslibre has joined #cc [12:51]
*** Mihai` has quit IRC [13:31]
paulproteus: ankitg, Morning. [15:36]
ankitg: paulproteus, morning [15:37]
paulproteus: Wow, awesome. [15:37]
paulproteus: I'm going to get the slightest bit dressed and be right back. [15:38]
paulproteus: Is your latest code in git? [15:38]
ankitg: k' take your time ... [15:38]
ankitg: licChange v.8.2 has been in GIT since before the final eval ... [15:38]
ankitg: *licChange.py [15:38]
paulproteus: Sure, but I thought you made some changes a few nights ago. [15:39]
ankitg: yes, but been a bit sluggish on that ... individual study on patents in China coming to a close, need to write a research paper ... so a bit stretched for time ... [15:40]
ankitg: Things in the works for the next version :: [15:45]
ankitg: stderr [for progress output] [15:45]
ankitg: make changes[] changes{} and thus make changes smart but NOT adding changingkey [urls that change the license] but adding changes in changingkey [only the info where the changes take place] [15:45]
ankitg: make stats file [total URLs with changing licenses: len(changes) \nURL (IP): <URL> (<Geo-IP>)\nLic1<L1>\nDate:<Dt1> [15:45]
ankitg: *s/but/by [15:46]
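ankitg's notes above mention two ideas: sending progress output to stderr, and recording only the points where a URL's license actually changes instead of every hit. A minimal Python sketch of both, assuming each URL's history is a chronological list of (license, date) tuples; that input shape and the names `change_points` and `report_progress` are hypothetical, not taken from licChange.py:

```python
import sys


def change_points(observations):
    """Keep the first observation plus every later one whose license differs
    from the license seen just before it.

    `observations` is assumed (hypothetically) to be a chronological list of
    (license, date) tuples for a single URL.
    """
    points = []
    previous = None
    for license_info, date in observations:
        if not points or license_info != previous:
            points.append((license_info, date))
        previous = license_info
    return points


def report_progress(done, total, stream=sys.stderr):
    # Progress messages go to stderr so stdout stays free for the results.
    stream.write("processed %d of %d files\n" % (done, total))
```

A URL seen three times under one license and then once under another reduces to two entries: the first sighting and the switch.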
paulproteus: ankitg, You would probably do well to think more about the output format, but let's discuss that after we look at the actual code. [15:49]
* ankitg should stop pasting notes he made for his own reference as nobody else can get them without explanation ... [15:50]
ankitg: ok ... let's go straight in ... [15:50]
ankitg: first we have licDict ={} ... [15:50]
paulproteus: Your comments look reasonably good now. [15:52]
paulproteus: That's good. [15:52]
paulproteus: You seem to create a variable "filecounter" that is unused. [15:53]
ankitg: this is an empty dict in which we store all the information about the urls (which may / may not change their lic) in this format ... the licDict($URL) = [[$licType, $version, $localization], [repeat]] [$date1], [$date2] [15:53]
paulproteus: Yup, that much I gather. [15:53]
paulproteus: I can see how the code works. [15:53]
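One possible reading of the licDict layout ankitg describes, with each URL mapping to a list of license triples and a matching list of dates. This is a sketch under that assumption; the helper `record_hit` and the example URL are hypothetical and not from licChange.py:

```python
def record_hit(lic_dict, url, lic_type, version, localization, date):
    """Record one sighting of `url` under a license, keeping two parallel
    lists per URL: license triples and dates, so licenses[i] was seen on
    dates[i]."""
    licenses, dates = lic_dict.setdefault(url, ([], []))
    licenses.append([lic_type, version, localization])
    dates.append(date)


example = {}
record_hit(example, "http://example.org/photo.html", "by", "3.0", "us", "2008-08-01")
record_hit(example, "http://example.org/photo.html", "by-sa", "3.0", "us", "2008-09-01")
```

Keeping the dates in a list parallel to the licenses makes it easy to report when each license was first observed.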
ankitg: it was previously used in the output of progress ... [15:53]
paulproteus: For now I want to talk about style. [15:53]
paulproteus: I remember that, but remove it now because it's unused. [15:54]
ankitg: noted [15:54]
paulproteus: It would be awesome if you could actually make some commits while we talk so I can see that you understand how to act on this style advice. [15:54]
paulproteus: So if I am in a directory with licChange.py and want to import your module and run the licChange(dir) function twice, will it work properly? [15:55]
ankitg: it will run the first time to create file called listdir.txt which will have a list of all the files that were processed ... on the second call it will read from the text file and ignore the files that have already been processed ... [15:57]
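A sketch of the skip-already-processed-files behaviour ankitg describes, keeping one filename per line in listdir.txt. The function name and the exact file handling are assumptions, not the code in licChange.py. As written it records files as done before they are processed, so a crash mid-run would make the next run skip them; appending each name only after its file is finished avoids that.

```python
import os


def unprocessed_files(directory, record_path="listdir.txt"):
    """Return names in `directory` not yet listed in `record_path`, and
    append them to it (hypothetical helper; one filename per line)."""
    done = set()
    if os.path.exists(record_path):
        with open(record_path) as record:
            done = {line.strip() for line in record}

    new_files = sorted(name for name in os.listdir(directory) if name not in done)

    # Marks the files as processed up front; a more careful version would
    # append each name only after that file has been fully handled.
    with open(record_path, "a") as record:
        for name in new_files:
            record.write(name + "\n")
    return new_files
```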
paulproteus: I think that it will do the wrong thing because the dictionary licDict won't be cleared between runs. [15:59]
paulproteus: Because I'm talking about just running the function twice without quitting Python. [15:59]
paulproteus: Do you see what I mean? [15:59]
ankitg: I see what you mean, but isn't that what we want to do ... ? [16:00]
ankitg: we don't want to start each month with a clean slate ... [16:01]
ankitg: we want the information from the previous month's access as well [16:01]
paulproteus: As far as I can see it, the licDict and incurl variables should be defined inside the function licChange(). [16:01]
paulproteus: How do you get access to that data right now? [16:01]
ankitg: right now, it's on the EC2 instance ... all the data for licChange's logs ... [16:02]
paulproteus: ... [16:02]
paulproteus: I know. [16:02]
paulproteus: Listen a sec. [16:02]
paulproteus: Let me ask you to make a change to the program and see if it still works. [16:02]
paulproteus: Please move licDict's and incurl's declaration into the licChange() function. [16:02]
paulproteus: Then return those objects, and have changes() take those objects as parameters. [16:03]
paulproteus: It's considered bad form to modify global variables (that is, ones defined outside a function that does the modification). [16:05]
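A small demonstration of the point paulproteus is making about module-level state. The names `lic_change_global`, `lic_change`, and `changes`, and the (url, license) input pairs, are simplified stand-ins rather than the real licChange.py signatures (the real licChange() takes a directory); incurl is left out because the log does not say what it holds:

```python
_GLOBAL_LIC_DICT = {}


def lic_change_global(records):
    """Anti-pattern: fills a module-level dict, so a second call in the same
    Python session still sees (and duplicates) the first call's data."""
    for url, license_uri in records:
        _GLOBAL_LIC_DICT.setdefault(url, []).append(license_uri)
    return _GLOBAL_LIC_DICT


def lic_change(records):
    """Preferred: the dict is created inside the function and returned."""
    lic_dict = {}
    for url, license_uri in records:
        lic_dict.setdefault(url, []).append(license_uri)
    return lic_dict


def changes(lic_dict):
    """Takes the data as a parameter and returns the URLs seen under more
    than one distinct license."""
    return sorted(url for url, licenses in lic_dict.items() if len(set(licenses)) > 1)
```

Calling `lic_change_global(run)` twice in one interpreter session leaves every URL with duplicated entries, while `lic_change(run)` returns the same result each time. Carrying data across months then becomes an explicit step (for example, saving and reloading the dict) instead of a side effect.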
ankitg: it will work but the results will be different ... in the scenario you propose licDict will be cleared on each run ... the reason why I put it outside was since licDict records all the data and changes pulls out the changes in the data, it would make sense to have a comprehensive licDict of all the data ... suppose I run licDict on month 1 and it finds URL1 with one hit [and thus no changes], and then on month 2 there is another occurrence of U [16:06]
ankitg: *s / URL 2 / URL1 [16:07]
paulproteus: But... [16:07]
paulproteus: ...the program quits after main(). [16:07]
paulproteus: I don't see how what you're saying is possible. [16:08]
* ankitg realizes the mistake ... stops applying interpreter logic and goes to make the proposed changes to the code ... [16:09]
paulproteus: Cool. (-: [16:09]
paulproteus: A separate question might be how to make sure you get the previous months' data. [16:09]
paulproteus: Anyway, I was just halfway through writing a short demo program to show you that things didn't work the way you were saying. [16:10]
ankitg: I was thinking write to file ... but as I've come to realize read it back in the correct format is *painful* ... [16:10]
ankitg: *s/read/reading [16:11]
paulproteus: ankitg, Do you know about "pickle"? [16:11]
ankitg: not as of now ... but I have a feeling I am going to find out pretty soon [16:11]
paulproteus: It's a really, really easy way to save Python objects and get them back later. [16:12]
* ankitg googles [16:13]
paulproteus: My 'net connection seems to currently suck, otherwise I'd already have a tutorial link for you. [16:13]
ankitg: serialization of course ... [16:13]
paulproteus: There, so if you just want to save that dict to a file and pick it up later, you can do that. [16:14]
paulproteus: It just means you can't run more than one of your things in parallel. [16:14]
paulproteus: And that thing may become enormous after a while. [16:14]
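The pickle approach paulproteus suggests could look roughly like this in Python 3. The file name `licdict.pickle` and the helper names are hypothetical. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.

```python
import pickle


def save_lic_dict(lic_dict, path="licdict.pickle"):
    """Write the whole dict to `path` (pickle needs a binary-mode file)."""
    with open(path, "wb") as out:
        pickle.dump(lic_dict, out)


def load_lic_dict(path="licdict.pickle"):
    """Read the dict back, or start empty if no earlier run has saved one."""
    try:
        with open(path, "rb") as inp:
            return pickle.load(inp)
    except FileNotFoundError:
        return {}
```

A monthly run would call `load_lic_dict()`, add the new month's hits, and call `save_lic_dict()` before exiting. As paulproteus points out, two runs doing this at once would overwrite each other's results, and the file only grows.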
* ankitg adds "Read up on Pickle" to his to do list [16:14]
paulproteus: I would appreciate it if you kept the loggy-related todo list in git. [16:15]
paulproteus: Then I can see what's on your mind, and add things for you to look at or know what to show you. [16:15]
paulproteus: BTW: http://www.network-theory.co.uk/docs/pytut/pickleModule.html [16:15]
ankitg: would the Labs (comments) work better? [16:15]
ankitg: re: Loggy to-do list ^^ Lab Posts + Comments [16:17]
paulproteus: I see what you mean. [16:17]
paulproteus: I'd personally reply faster if you emailed cc-devel, actually. [16:17]
paulproteus: fastest, even. [16:18]
paulproteus: I don't check my RSS feeds more often than once every 1-2 days, but email I'm always on. [16:18]
ankitg: then CC-Dev list for queries and Labs for updates and ToDos? [16:18]
paulproteus: Sure, that makes sense. [16:19]
paulproteus: I think that as you go down the pickling route, eventually the dictionary will just become too enormous. [16:19]
paulproteus: Maybe I'm wrong about that, so by all means try it first. [16:20]
paulproteus: "Do the simplest thing that could possibly work," as they say. [16:20]
paulproteus: Oh, one more thing. [16:21]
paulproteus: You can skip lines between sections of the long licChange() function to make it easier to read. [16:21]
paulproteus: And the way you've written this, because it relies on filenames, you can't run this on stdin; you have to run it on filenames. [16:22]
paulproteus: If you use S3 FUSE or something, that would be fine, but if not, you may run out of disk space. [16:22]
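One way to make the script work on a stream as well as on filenames is to have the processing code accept any iterable of lines and open the input separately. A sketch with hypothetical function names, treating `-` as standard input (a common command-line convention) and `.gz` files as gzip-compressed text; the counting here is only a placeholder for licChange's real per-line work:

```python
import gzip
import sys


def count_matching(lines, needle):
    """Placeholder for the real per-line work: accepts any iterable of text
    lines (a list, an open file, a gzip stream, or sys.stdin)."""
    return sum(1 for line in lines if needle in line)


def open_log(name):
    """Return a text-mode line iterator for `name`; "-" means stdin."""
    if name == "-":
        return sys.stdin
    if name.endswith(".gz"):
        return gzip.open(name, "rt")
    return open(name)
```

With that split, input can come from a pipe, so logs fetched from remote storage do not have to be written to local disk first.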
paulproteus: Actually, the problem with using Labs for ToDos is that then it's hard to see all the todos at once. [16:23]
ankitg: Then S3 FUSE goes on the todo list as well ... [16:23]
paulproteus: Using Labs for To-do *updates* makes sense, but I would really like it if I could see the actual to-do list at any given moment.  That's why I liked the idea of storing in git with the code. [16:24]
ankitg: okie ... GIT for ToDo ... Labs for updates ... dev-list for queries ... [16:24]
paulproteus: I appreciate that you're writing comments, but you don't necessarily have to write a comment for each line if it's clear from the code what that line does. [16:24]
paulproteus: ankitg, Great. [16:25]
paulproteus: Take this line: [16:25]
paulproteus:   31                for line in zfile: #Iterates through all the [16:25]
paulproteus: First of all, the comment is truncated.  Secondly, it's clear from the "for line in zfile:" what the line does. [16:26]
paulproteus: Comments are best-used when you will need to explain not *what* is happening, but *why*. [16:26]
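To illustrate paulproteus's point about comments, a contrived Python example: the first comment restates the code, the second records a reason the code alone cannot show. The function and its chronological-order requirement are hypothetical, not from licChange.py:

```python
def first_seen_referers(lines):
    seen = set()
    ordered = []
    for line in lines:  # loops over the lines  <- restates the code; adds nothing
        referer = line.strip()
        # Why a list as well as the set: a set alone would lose the order in
        # which referers first appeared, and (in this example) the output
        # should be chronological.
        if referer and referer not in seen:
            seen.add(referer)
            ordered.append(referer)
    return ordered
```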
ankitg: fine, will remove those comments ... [16:26]
paulproteus: I think that's mostly all I have to say as far as the code review, at least for now.  Surely once you have those things fixed, I'll have more to say. [16:28]
ankitg: okie, let's get this done first then. [16:30]
paulproteus: Beyond the code review portion of the conversation, I want to make sure you don't lose sight of the research angle of this project. [16:31]
paulproteus: Beyond cleaning up your Python, the most important thing for you to do is to think about the results you generated and consider all the possible reasons a license "change" would be seen, and see if we can somehow filter out what we don't want from what we do. [16:31]
ankitg: well one possible reason for so many URLS is that each URL is like $var1.com/.../.../../something1.html and $var1.com/.../.../../something2.html ... which leads to many entries ... [16:36]
paulproteus: Right - it would be nice if the data could be post-processed to point out that those share some structure at the start. [16:38]
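One simple way to do the post-processing paulproteus mentions is to group URLs by host, so that many pages on the same site collapse into one bucket. This sketch uses the standard library's `urllib.parse.urlsplit`; grouping by host is only one choice, and a longer shared path prefix could be used instead:

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_host(urls):
    """Map each host name to the list of URLs seen on it."""
    groups = defaultdict(list)
    for url in urls:
        # .hostname is lower-cased and drops any port; it is None when the
        # string has no network location (no "//host" part), so fall back to "".
        host = urlsplit(url).hostname or ""
        groups[host].append(url)
    return dict(groups)
```

Counting `len(group_by_host(urls))` instead of `len(urls)` would show how many distinct sites account for the observed license changes.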
paulproteus: But even if we don't consider that problem, just to explain the data we have: [16:39]
paulproteus: What are the possible reasons an image referer could change? [16:39]
paulproteus: Since that's what you're measuring. [16:39]
ankitg: ah, the caveats for our proxy for measurement ... hmmm ... [16:40]
paulproteus: I don't necessarily want an answer right away. [16:42]
paulproteus: But I think that writing a few paragraphs about that (for labs-dot and cc-devel, let's say) should be your highest priority. [16:42]
paulproteus: We can further discuss it there, and I'm sure Mike will have some ideas, too. [16:43]
ankitg: okie ... will think about this and make a post to labs and list ... [16:44]
paulproteus: Okay. [16:47]
paulproteus: Anything else to discuss before I get dressed for the day? [16:47]
ankitg: I think that's enough To Dos for now, thank you very much! (-: [16:47]
paulproteus: Okay, great! [16:48]
*** soma has quit IRC [16:52]
* ankitg heads out for a late dinner ... [16:54]
*** [mharrison] has joined #cc [16:56]
*** ankitg has quit IRC [17:28]
*** balor has quit IRC [17:46]
*** nathany has joined #cc [17:48]
*** nathany has quit IRC [18:00]
*** nathany has joined #cc [18:01]
*** [mharrison] has quit IRC [18:10]
paulproteus: Howdy nathany. [18:33]
nathany: morning paulproteus [18:33]
paulproteus: I got up early to take a tour of the SF Friends School opening near us. [18:34]
paulproteus: Right next door to me. [18:34]
paulproteus: In the old Levi Strauss building. [18:34]
nathany: ah, cool [18:34]
nathany: is it friendly? [18:34]
paulproteus: Yeah, it's great. [18:34]
paulproteus: It's probably still open; you might want to go take a look and chat with the teachers there. [18:34]
nathany: i'm at the office [18:35]
paulproteus: Oh. [18:35]
nathany: (working) [18:35]
paulproteus: How long are you going to be there? [18:36]
paulproteus: The day, basically? [18:36]
nathany: at least until 1 or 2... possibly longer [18:37]
nathany: Richard is here hanging out with me, i'm just trying to get some traction on search [18:37]
paulproteus: Oh, cool. [18:37]
paulproteus: I was going to say I might join you if it's okay if you don't mind me not doing CC stuff. [18:37]
nathany: (and getting distracted by seeing about binding Mylyn to Roundup) [18:37]
paulproteus: Glad you have some company at least. [18:37]
nathany: i don't care... we'll be here at least another hour or two [18:37]
nathany: btw, the roundup mail interface is quite handy [18:38]
paulproteus: Yay! [18:40]
*** rohitj has joined #cc [18:52]
*** balor has joined #cc [19:05]
*** balor has quit IRC [19:09]
*** ereslibre has quit IRC [19:38]
*** bring2towels has joined #cc [19:59]
*** bringatowel has quit IRC [19:59]
*** UltraMagnus has joined #cc [20:35]
*** isforinsects has joined #cc [20:54]
*** nathany has quit IRC [21:00]
*** b52 has joined #cc [21:18]
*** b52 has left #cc [21:18]
*** pktck has joined #cc [21:18]
*** pktck has quit IRC [21:27]
*** pktck has joined #cc [21:28]
*** rohitj has quit IRC [21:33]
*** pktck has quit IRC [21:36]
*** pktck has joined #cc [21:39]
*** ankitg has joined #cc [22:21]
*** UltraMagnus has quit IRC [22:41]
*** K`Tetch_U has joined #cc [22:44]
*** K`Tetch has quit IRC [22:54]
*** isforinsects has quit IRC [23:57]

Generated by irclog2html.py 2.6 by Marius Gedminas - find it at mg.pov.lt!