Replies: 20 comments
-
Also moving discussion from #1120 over here since that thread is already so long 🙂Ping @rkdarst and @Sefriol . I guess with @Sefriol 's PR (#1186) the concern with being able to list and read other students' feedback is alleviated (with the exception of #1202), now that the feedback directory in the exchange is not listable and the hash is computed from the cache. @Sefriol brought up the additional concern that:
I think with #1200 this should be ok now, since the hash is now computed from the student ID and the original notebook. You can try to override the student id if you want, but it won't work unless you actually have the original notebook submitted by that student. |
Beta Was this translation helpful? Give feedback.
-
hubshare seems pretty dead project to me or is there any other alternatives on jupyterlab side? Anyway, I think using cache and random string seems like the best option to me, unless we really want to store additional information into the hash. |
Beta Was this translation helpful? Give feedback.
-
One interesting thing is that, if grading is a function of only the input state, knowing the notebook (plus the tests from your own notebook) is the same as knowing the grade. Except manual grading. And I doubt data protection authorities would be impressed by that argument. Anyway I don't consider this a top-priority problem, but maybe I should... We can almost use the random string added to the filename when it's submitted as a token to retrieve it (either directly or part of the nb hash). But it's not stored in the student cache (per #1104, if it was students could more easily find their submitted assignments and modify them). One could add two random strings, one to be used when retrieving and one not saved... |
Beta Was this translation helpful? Give feedback.
-
Wait... the latest algorithm hashes the timestamp, right? Which is stored only in the cache, not in student dir. So student would have to share more than just the notebook in order to have someone else able to compete the hash... there's not a lot of entropy but it's not small either. So actually it's not that bad. (actually almost good!) Thinking a bit more, if we process a timestamp.txt in the submission dir, it's not that far off to add a feedbacktoken.txt file there, too, which gets treated the same way as the timestamp, and since the name is obvious it's up to student to not reveal it. |
Beta Was this translation helpful? Give feedback.
-
It's not dead, just delayed... I inquired about the status a few months ago and there are still plans to work on it. There's nothing separate on the JuptyerLab side; hubshare is the official Jupyter plan for how to do filesharing on JupyterHub.
Well, if there are hidden tests then the grades also wouldn't be a function of just the input state.
Yes, it definitely adds an additional barrier to entry. And I guess you're right, if we added a secret token it would act basically like a shared secret between the student and the instructor. Other student's wouldn't be able to (easily) guess it and so wouldn't be able to read any feedback files without explicitly having the file, the exact time it was submitted, and the token (I guess they could try to brute force it, but if we make it long enough it shouldn't be an issue). So the main way someone's feedback could be leaked is if they explicitly gave the token away too. I think that's a reasonable compromise for the moment. If you're happy with that solution, would you like to have a go at implementing it? (I'm unfortunately not going to have much time to work on things for the next few weeks, though I might be able to next weekend) |
Beta Was this translation helpful? Give feedback.
-
This problem becomes significantly less if we move to an API-based exchange, with back-end database: having said all that, we've not yet looked at implementing the feedback code in our local external exchange code yet :oops: |
Beta Was this translation helpful? Give feedback.
-
Hi, I think the grading should be deterministic, even if there are hidden tests. If the code between two notebooks is exactly the same, character for character, surely it would have the same outcome for all tests, hidden or not? Something I suggested in the hackathon in Edinburgh was to add encryption to the submission and feedback mechanism (but I have not implemented yet). The way it could work is that the instructor includes a public key in the metadata of the notebook in the "generate" step, the student uses this key to encrypt the notebook before submitting it, and before that adds a feedback public key to the metadata of the notebook. The instructor decrypts the submitted notebook, grades and generate the feedback, this feedback is then encrypted with the public key of the student and send out. The advantages of this is that
Unfortunately I don't have time this week to have a look at this. |
Beta Was this translation helpful? Give feedback.
-
API-like system would have many advantages, allow many additional features (for example realtime grading) and possibility to use already existing solutions for python-code grading (i.e. fighting against plagiarism etc), but that would require pretty much rewriting the whole thing from scratch (not quite, but not far off. I don't think that there is much to reuse. Even UI things should be moved to JupyterLab). Encryption has it's positives and negatives. I currently like how instructors can debug the system quite easily by going over certain directories. Doing that at the same time with encryption is quite a time consuming task since you would have to make a solution for dealing with private/public keys at the same time (most likely using database entries). How would you handle the public/private key storage? Store them in the database and provide API for students to receive their own public key? Other than that it would be quite nice solution, since privacy and data security are quite lacking in the current solution. |
Beta Was this translation helpful? Give feedback.
-
Actually, we're already using one. There is a plan to push the code into the public domain, however our time keeps getting distracted by people wanting things done |
Beta Was this translation helpful? Give feedback.
-
I agree debugging would be made more difficult, but it would not be too bad: to publish assignment there is only a change in metadata in the notebook. One step is added when collecting, first we collect, say in a folder "collected_encrypted" then we decrypt into the "collected" folder and from there it follows the normal path, fully decrypted. THe feedback generation is the same as before but with the encryption step at the very end, so we still have a chance to look at the feedback generated before it is sent off to the students. The key management is a more tricky issue. The encryption key for submission could be part of the assignment data in the nbgrader database. For the students key one could use the same for all submission and have it stored somewhere in the student's home directory. I think the same key could be used for all submissions. |
Beta Was this translation helpful? Give feedback.
-
One more thought about students accessing other student's feedback: if students can access other feedback, they still cannot know who the feedback was for. So as long as the number of students is large enough there would still be some anonymity. One could even argue for publishing all feedbacks, so students can learn from other people's mistakes or solutions... |
Beta Was this translation helpful? Give feedback.
-
Pretty much my point. nbgrader was not designed API-service in mind. In general I don't think it would be that complicated to implement (time consuming, most likely), but in the end that would lead into redoing the whole extension (especially when combined with the JupyterLab translation). But in the end, I would say that it might be a good solution choice either way. For example, we are already using external grading server for autograding the notebooks, since it takes too much resources compared to normal instructor instance. |
Beta Was this translation helpful? Give feedback.
-
This thread has moved to a whole other topic, it seems. If nbgrader is used with separate, feedback is already designed to be secure (just could use a bit more entropy, but it's OK for now). And if a student doesn't send an exact copy of their notebook file AND timestamp file, there is chance of other feedback being found. Having dedicated randomness for the hash would be in line with [Kerckhoff's principle(https://en.wikipedia.org/wiki/Kerckhoffs%27s_principle).
Our setup using kubernetes+actual users has no known security problems using current nbgrader (except the minor entropy consideration above). In a certain way I actually quite like the current exchange for its elegant use of standards. If there is a setup where every user has the same filesystem user ID, then yes, everything above is not true. It's nice to discuss encryption in theory, but if you ask me that's trying a bit too much to work around the problem. Even if there is encryption, as long as student has same uid as instructors, then students can also delete files, alter the released files, release their own assignments, and so on (removing formgrader/config is security by obscurity). I think as long as there is a filesystem-based exchange, it can only be considered secure as long as instructors have different UIDs from students (and then everything works nicely and securely). If not true, nothing will work. So in short, the world (cloud deployments) has sort of moved beyond the concept of users with their own uid, so there's just a bit of a culture mismatch now, not some fundamental flaw with the old generation. I don't think there is need to do such big changes to the filesystem-based exchange, and instead wait for an API-based exchange (modern solution to a modern world). But of course I'm not the one doing anything about it! ... perhaps we could add some instructions for changing UIDs of the formgrader service notebooks? |
Beta Was this translation helpful? Give feedback.
-
Again, not true.... the fix is to fork nbgrader, and modify There is a piece of work to make the exchange service a ..... however, as @rkdarst says, we're drifting off target here |
Beta Was this translation helpful? Give feedback.
-
One hackish idea I had to solve this problem was to add 50 extra decimal digits to the end of the timestamp microseconds. That provides enough entropy that it could be considered secure, and requires no other changes (assuming that all the parsers can handle the excess digits and ignore them). They should probably be written only to timestamp.txt and not used as part of the filenames. I don't really like the idea since it's repurposing something which is not obviously security related into something security related. But the simplicity is nice. |
Beta Was this translation helpful? Give feedback.
-
There are actually libraries which use processor cycles to create timestamps (in addition to normal timestamp). I don't know how this works however when there multiple processors and kubernetes in the mix. |
Beta Was this translation helpful? Give feedback.
-
Once you push the exchange through an API, you can do stuff like check user IDs [feedback is for a notebook belonging to user |
Beta Was this translation helpful? Give feedback.
-
One of the problems with current feedback system if instructor is ever required to change the contents of the notebook. |
Beta Was this translation helpful? Give feedback.
-
This is a problem anyway... If a lecturer releases an updated assignment after a student has fetched it, the update won't show. ..... and even if it did, you now have the problem that there's the original released version, the student worked-on version, and now a changed-instructor version - how do you resolve the differences? |
Beta Was this translation helpful? Give feedback.
-
Yeah, basically to roll updates to the students, we would need a system where:
|
Beta Was this translation helpful? Give feedback.
-
@rkdarst noted in #1200:
As I responded there, I'm not really sure of a way around this access issue, barring having better access controls (which we should have once hubshare exists...). But I'm opening an issue here to at least track that this is a problem, and maybe brainstorm ways to deal with it.
Beta Was this translation helpful? Give feedback.
All reactions