*** ftobia has joined #cc | 00:46 | |
*** Bovinity has joined #cc | 00:57 | |
*** Roderick__ has joined #CC | 00:58 | |
*** Bovinity has quit IRC | 01:05 | |
*** mlinksva has quit IRC | 01:05 | |
*** nkinkade has quit IRC | 01:06 | |
*** Roderick_ has quit IRC | 01:14 | |
*** stevel has quit IRC | 01:16 | |
*** ftobia has quit IRC | 01:28 | |
*** tvol has joined #CC | 01:54 | |
*** tvol has quit IRC | 02:14 | |
*** MAWAR_PUTIH has joined #cc | 02:53 | |
*** MAWAR_PUTIH has quit IRC | 03:33 | |
*** Roderick_ has joined #CC | 03:35 | |
*** Roderick__ has quit IRC | 03:51 | |
*** adjohn has joined #cc | 05:15 | |
*** Mihai` has quit IRC | 06:04 | |
*** is4 has joined #cc | 07:19 | |
*** isforinsects has quit IRC | 07:19 | |
*** luser__ is now known as bringatowel | 08:12 | |
*** bringatowel has joined #cc | 08:12 | |
paulproteus | ankitg, Mornin'. | 08:42 |
---|---|---|
ankitg | paulproteus: 01:42 morning ... | 08:43 |
paulproteus | Yeah, I'm going to bed soon. | 08:43 |
paulproteus | We're going to talk tomorrow at 9am still? | 08:43 |
ankitg | sure ... whatever time works for you ... just working on school assignments now ... | 08:45 |
paulproteus | I'm taking that as a yes. | 08:45 |
ankitg | yep | 08:45 |
paulproteus | Cool. Maybe 8:30 instead. | 08:45 |
paulproteus | I'll try to be on then, but if not, then nine. | 08:45 |
ankitg | ok, I'll keep an eye out around 0830 - 0930 ... | 08:45 |
ankitg | g'nite | 08:55 |
*** BobChao has joined #cc | 09:25 | |
*** isforinsects1 has joined #cc | 09:25 | |
*** is4 has quit IRC | 09:26 | |
*** balor has joined #cc | 10:54 | |
*** isforinsects1 has quit IRC | 11:00 | |
*** sama has joined #cc | 11:20 | |
*** ereslibre has quit IRC | 11:32 | |
*** sama has quit IRC | 11:53 | |
*** Mihai` has joined #cc | 12:35 | |
*** ereslibre has joined #cc | 12:51 | |
*** Mihai` has quit IRC | 13:31 | |
paulproteus | ankitg, Morning. | 15:36 |
ankitg | paulproteus, morning | 15:37 |
paulproteus | Wow, awesome. | 15:37 |
paulproteus | I'm going to get the slightest bit dressed and be right back. | 15:38 |
paulproteus | Is your latest code in git? | 15:38 |
ankitg | k' take your time ... | 15:38 |
ankitg | licChange v.8.2 has been in GIT since before the final eval ... | 15:38 |
ankitg | *licChange.py | 15:38 |
paulproteus | Sure, but I thought you made some changes a few nights ago. | 15:39 |
ankitg | yes, but been a bit sluggish on that ... individual study on patents in China coming to a close, need to write a research paper ... so a bit stretched for time ... | 15:40 |
ankitg | Things in the works for the next version :: | 15:45 |
ankitg | stderr [for progress output] | 15:45 |
ankitg | make changes[] changes{} and thus make changes smart but NOT adding changingkey [urls that change the license] but adding changes in changingkey [only the info where the changes take place] | 15:45 |
ankitg | make stats file [total URLs with changing licenses: len(changes) \nURL (IP): <URL> (<Geo-IP>)\nLic1<L1>\nDate:<Dt1> | 15:45 |
ankitg | *s/but/by | 15:46 |
paulproteus | ankitg, You would probably do well to think more about the output format, but let's discuss that after we look at the actual code. | 15:49 |
* ankitg should stop pasting notes he made for his own reference as nobody else can get them without explanation ... | 15:50 | |
ankitg | ok ... let's go straight in ... | 15:50 |
ankitg | first we have licDict ={} ... | 15:50 |
paulproteus | Your comments look reasonably good now. | 15:52 |
paulproteus | That's good. | 15:52 |
paulproteus | You seem to create a variable "filecounter" that is unused. | 15:53 |
ankitg | this is an empty dict. in which we store all the information about the urls (which may / may not change their lic) in this format ... the licDict($URL) = [[$licType,'$version,$localization],[repeat]] [$date1], [$date2]'' | 15:53 |
paulproteus | Yup, that much I gather. | 15:53 |
paulproteus | I can see how the code works. | 15:53 |
ankitg | it was previously used in the output of progress ... | 15:53 |
paulproteus | For now I want to talk about style. | 15:53 |
paulproteus | I remember that, but remove it now because it's unused. | 15:54 |
ankitg | noted | 15:54 |
paulproteus | It would be awesome if you could actually make some commits while we talk so I can see that you understand how to act on this style advice. | 15:54 |
paulproteus | So if I am in a directory with licChange.py and want to import your module and run the licChange(dir) function twice, will it work properly? | 15:55 |
ankitg | it will run the first time to create file called listdir.txt which will have a list of all the files that were processed ... on the second call it will read from the text file and ignore the files that have already been processed ... | 15:57 |
paulproteus | I think that it will do the wrong thing because the dictionary licDict won't be cleared between runs. | 15:59 |
paulproteus | Because I'm talking about just running the function twice without quitting Python. | 15:59 |
paulproteus | Do you see what I mean? | 15:59 |
ankitg | I see what you mean, but isn't that what we want to do ... ? | 16:00 |
ankitg | we don't want to start each month with a clean slate ... | 16:01 |
ankitg | we want the information from the previous month's access as well | 16:01 |
paulproteus | As far as I can see it, the licDict and incurl variables should be defined inside the function licChange(). | 16:01 |
paulproteus | How do you get access to that data right now? | 16:01 |
ankitg | right now, it's on the EC2 instance ... all the data for licChange's logs ... | 16:02 |
paulproteus | ... | 16:02 |
paulproteus | I know. | 16:02 |
paulproteus | Listen a sec. | 16:02 |
paulproteus | Let me ask you to make a change to the program and see if it still works. | 16:02 |
paulproteus | Please move licDict's and incurl's declaration into the licChange() function. | 16:02 |
paulproteus | Then return those objects, and have changes() take those objects as parameters. | 16:03 |
paulproteus | It's considered bad form to modify global variables (that is, ones defined outside a function that does the modification). | 16:05 |
ankitg | it will work but the results will be different ... in the scenario you propose licDict will be cleared on each run ... the reason why I put it outside was since licDict records all the data and changes pulls out the changes in the data, it would make sense to have a comprehensive licDict of all the data ... suppose I run licDict on month 1 and it finds URL1 with one hit [and thus no changes], and then on month 2 there is another occurrence of U | 16:06 |
ankitg | *s / URL 2 / URL1 | 16:07 |
paulproteus | But... | 16:07 |
paulproteus | ...the program quits after main(). | 16:07 |
paulproteus | I don't see how what you're saying is possible. | 16:08 |
* ankitg realizes the mistake ... stops applying interpreter logic and goes to make the proposed changes to the code ... | 16:09 | |
paulproteus | Cool. (-: | 16:09 |
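A minimal sketch of the refactoring discussed above, not the actual licChange.py: the dictionary is created inside the function and returned, and the change detection takes it as a parameter instead of reading a global. The names `lic_change` and `changes`, the `(url, license, date)` record shape, and the example URLs are all assumptions made for illustration; the real licChange(dir) reads log files from a directory, and `incurl` is left out because its role is not described in the conversation.

```python
def lic_change(records):
    """Build a per-URL license history from (url, license, date) records.

    Hypothetical stand-in for licChange(dir): the dictionary is local,
    so calling this twice in one Python session cannot mix results.
    """
    lic_dict = {}  # created fresh on every call, never a module-level global
    for url, lic, date in records:
        lic_dict.setdefault(url, []).append((lic, date))
    return lic_dict


def changes(lic_dict):
    """Return only the URLs that were seen with more than one license."""
    changed = {}
    for url, history in lic_dict.items():
        licenses = {lic for lic, _date in history}
        if len(licenses) > 1:
            changed[url] = history
    return changed
```

Usage would look like `changes(lic_change(records))`; because nothing is stored at module level, running it twice on the same input gives the same answer both times.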
paulproteus | A separate question might be how to make sure you get the previous months' data. | 16:09 |
paulproteus | Anyway, I was just halfway through writing a short demo program to show you that things didn't work the way you were saying. | 16:10 |
ankitg | I was thinking write to file ... but as I've come to realize read it back in the correct format is *painful* ... | 16:10 |
ankitg | *s/read/reading | 16:11 |
paulproteus | ankitg, Do you know about "pickle"? | 16:11 |
ankitg | not as of now ... but I have a feeling I am going to find out pretty soon | 16:11 |
paulproteus | It's a really, really easy way to save Python objects and get them back later. | 16:12 |
* ankitg googles | 16:13 | |
paulproteus | My 'net connection seems to currently suck, otherwise I'd already have a tutorial link for you. | 16:13 |
ankitg | serialization of course ... | 16:13 |
paulproteus | There, so if you just want to save that dict to a file and pick it up later, you can do that. | 16:14 |
paulproteus | It just means you can't run more than one of your things in parallel. | 16:14 |
paulproteus | And that thing may become enormous after a while. | 16:14 |
* ankitg adds "Read up on Pickle" to his to do list | 16:14 | |
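A small sketch of the pickle idea mentioned above, assuming the history is a plain dict and that a single file holds it between monthly runs. The function names `load_history` and `save_history` and the file path are hypothetical, not taken from licChange.py.

```python
import os
import pickle


def load_history(path):
    """Return the previously saved dict, or an empty dict on the first run."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as f:
        return pickle.load(f)


def save_history(lic_dict, path):
    """Write the whole dict to disk so the next run can pick it up."""
    with open(path, "wb") as f:
        pickle.dump(lic_dict, f, protocol=pickle.HIGHEST_PROTOCOL)
```

As noted in the conversation, this rewrites the entire dictionary each time, so two runs in parallel would clobber each other and the file grows with the data. Pickle files should also only be loaded from a trusted source, since unpickling can execute arbitrary code.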
paulproteus | I would appreciate it if you kept the loggy-related todo list in git. | 16:15 |
paulproteus | Then I can see what's on your mind, and add things for you to look at or know what to show you. | 16:15 |
paulproteus | BTW: http://www.network-theory.co.uk/docs/pytut/pickleModule.html | 16:15 |
ankitg | would the Labs (comments) work better? | 16:15 |
ankitg | re: Loggy to-do list ^^ Lab Posts + Comments | 16:17 |
paulproteus | I see what you mean. | 16:17 |
paulproteus | I'd personally reply faster if you emailed cc-devel, actually. | 16:17 |
paulproteus | fastest, even. | 16:18 |
paulproteus | I don't check my RSS feeds more often than once every 1-2 days, but email I'm always on. | 16:18 |
ankitg | then CC-Dev list for queries and Labs for updates and ToDos? | 16:18 |
paulproteus | Sure, that makes sense. | 16:19 |
paulproteus | I think that as you go down the pickling route, eventually the dictionary will just become too enormous. | 16:19 |
paulproteus | Maybe I'm wrong about that, so by all means try it first. | 16:20 |
paulproteus | "Do the simplest thing that could possibly work," as they say. | 16:20 |
paulproteus | Oh, one more thing. | 16:21 |
paulproteus | You can skip lines between sections of the long licChange() function to make it easier to read. | 16:21 |
paulproteus | And the way you've written this, because it relies on filenames, you can't run this on stdin; you have to run it on filenames. | 16:22 |
paulproteus | If you use S3 FUSE or something, that would be fine, but if not, you may run out of disk space. | 16:22 |
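One way to avoid depending on filenames, sketched under the assumption that the per-line work can be written against any iterable of lines. `iter_lines` and `process` are hypothetical names, and `process` here only counts non-blank lines as a placeholder for the real parsing.

```python
import sys


def iter_lines(paths):
    """Yield lines from each named file, or from stdin when no paths are given."""
    if not paths:
        for line in sys.stdin:
            yield line
        return
    for path in paths:
        with open(path) as f:
            for line in f:
                yield line


def process(lines):
    """Placeholder for the real per-line work: count the non-blank lines."""
    return sum(1 for line in lines if line.strip())

# Typical call from a script entry point:
#     print(process(iter_lines(sys.argv[1:])))
```

With this shape, logs could be streamed in through a pipe without first being copied to local disk.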
paulproteus | Actually, the problem with using Labs for ToDos is that then it's hard to see all the todos at once. | 16:23 |
ankitg | Then S3 FUSE goes on the todo list as well ... | 16:23 |
paulproteus | Using Labs for To-do *updates* makes sense, but I would really like it if I could see the actual to-do list at any given moment. That's why I liked the idea of storing in git with the code. | 16:24 |
ankitg | okie ... GIT for ToDo ... Labs for updates ... dev-list for queries ... | 16:24 |
paulproteus | I appreciate that you're writing comments, but you don't necessarily have to write a comment for each line if it's clear from the code what that line does. | 16:24 |
paulproteus | ankitg, Great. | 16:25 |
paulproteus | Take this line: | 16:25 |
paulproteus | 31 for line in zfile: #Iterates through all the | 16:25 |
paulproteus | First of all, the comment is truncated. Secondly, it's clear from the "for line in zfile:" what the line does. | 16:26 |
paulproteus | Comments are best-used when you will need to explain not *what* is happening, but *why*. | 16:26 |
ankitg | fine, will remove those comments ... | 16:26 |
paulproteus | I think that's mostly all I have to say as far as the code review, at least for now. Surely once you have those things fixed, I'll have more to say. | 16:28 |
ankitg | okie, let's get this done first then. | 16:30 |
paulproteus | Beyond the code review portion of the conversation, I want to make sure you don't lose sight of the research angle of this project. | 16:31 |
paulproteus | Beyond cleaning up your Python, the most important thing for you to do is to think about the results you generated and consider all the possible reasons a license "change" would be seen, and see if we can somehow filter out what we don't want from what we do. | 16:31 |
ankitg | well one possible reason for so many URLS is that each URL is like $var1.com/.../.../../something1.html and $var1.com/.../.../../something2.html ... which leads to many entries ... | 16:36 |
paulproteus | Right - it would be nice if the data could be post-processed to point out that those share some structure at the start. | 16:38 |
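A rough sketch of that kind of post-processing, assuming "shared structure at the start" can be approximated by the host part of each URL. The function name `group_by_host` and the example URLs are made up for illustration; grouping by a longer path prefix would need a different key.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_host(urls):
    """Group URLs by host so many pages on one site show up as one entry."""
    groups = defaultdict(list)
    for url in urls:
        groups[urlsplit(url).netloc].append(url)
    return dict(groups)


example = [
    "http://var1.example/a/b/something1.html",
    "http://var1.example/a/b/something2.html",
    "http://other.example/page.html",
]
by_host = group_by_host(example)
```

Here `by_host` has two keys, with both `something*.html` pages collected under `var1.example`.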
paulproteus | But even if we don't consider that problem, just to explain the data we have: | 16:39 |
paulproteus | What are the possible reasons an image referer could change? | 16:39 |
paulproteus | Since that's what you're measuring. | 16:39 |
ankitg | ah, the caveats for our proxy for measurement ... hmmm ... | 16:40 |
paulproteus | I don't necessarily want an answer right away. | 16:42 |
paulproteus | But I think that writing a few paragraphs about that (for labs-dot and cc-devel, let's say) should be your highest priority. | 16:42 |
paulproteus | We can further discuss it there, and I'm sure Mike will have some ideas, too. | 16:43 |
ankitg | okie ... will think about this and make a post to labs and list ... | 16:44 |
paulproteus | Okay. | 16:47 |
paulproteus | Anything else to discuss before I get dressed for the day? | 16:47 |
ankitg | I think that's enough To Dos for now, thank you very much! (-: | 16:47 |
paulproteus | Okay, great! | 16:48 |
*** soma has quit IRC | 16:52 | |
* ankitg heads out for a late dinner ... | 16:54 | |
*** [mharrison] has joined #cc | 16:56 | |
*** ankitg has quit IRC | 17:28 | |
*** balor has quit IRC | 17:46 | |
*** nathany has joined #cc | 17:48 | |
*** nathany has quit IRC | 18:00 | |
*** nathany has joined #cc | 18:01 | |
*** [mharrison] has quit IRC | 18:10 | |
paulproteus | Howdy nathany. | 18:33 |
nathany | morning paulproteus | 18:33 |
paulproteus | I got up early to take a tour of the SF Friends School opening near us. | 18:34 |
paulproteus | Right next door to me. | 18:34 |
paulproteus | In the old Levi Strauss building. | 18:34 |
nathany | ah, cool | 18:34 |
nathany | is it friendly? | 18:34 |
paulproteus | Yeah, it's great. | 18:34 |
paulproteus | It's probably still open; you might want to go take a look and chat with the teachers there. | 18:34 |
nathany | i'm at the office | 18:35 |
paulproteus | Oh. | 18:35 |
nathany | (working) | 18:35 |
paulproteus | How long are you going to be there? | 18:36 |
paulproteus | The day, basically? | 18:36 |
nathany | at least until 1 or 2... possibly longer | 18:37 |
nathany | Richard is here hanging out with me, i'm just trying to get some traction on search | 18:37 |
paulproteus | Oh, cool. | 18:37 |
paulproteus | I was going to say I might join you if it's okay if you don't mind me not doing CC stuff. | 18:37 |
nathany | (and getting distracted by seeing about binding Mylyn to Roundup) | 18:37 |
paulproteus | Glad you have some company at least. | 18:37 |
nathany | i don't care... we'll be here at least another hour or two | 18:37 |
nathany | btw, the roundup mail interface is quite handy | 18:38 |
paulproteus | Yay! | 18:40 |
*** rohitj has joined #cc | 18:52 | |
*** balor has joined #cc | 19:05 | |
*** balor has quit IRC | 19:09 | |
*** ereslibre has quit IRC | 19:38 | |
*** bring2towels has joined #cc | 19:59 | |
*** bringatowel has quit IRC | 19:59 | |
*** UltraMagnus has joined #cc | 20:35 | |
*** isforinsects has joined #cc | 20:54 | |
*** nathany has quit IRC | 21:00 | |
*** b52 has joined #cc | 21:18 | |
*** b52 has left #cc | 21:18 | |
*** pktck has joined #cc | 21:18 | |
*** pktck has quit IRC | 21:27 | |
*** pktck has joined #cc | 21:28 | |
*** rohitj has quit IRC | 21:33 | |
*** pktck has quit IRC | 21:36 | |
*** pktck has joined #cc | 21:39 | |
*** ankitg has joined #cc | 22:21 | |
*** UltraMagnus has quit IRC | 22:41 | |
*** K`Tetch_U has joined #cc | 22:44 | |
*** K`Tetch has quit IRC | 22:54 | |
*** isforinsects has quit IRC | 23:57 |