2017-08-29

LFS GSoC summary

I started with the goal of resolving the LFS bugs and modernizing the
codebase, which was using deprecated APIs like tsleep.

Significant bug fixes were:

- Found and helped diagnose a buffer overflow under unusual
  circumstances (heavy use under COMPAT_LINUX):

  fix buffer overflow/KASSERT when cookies are supplied
  cvs rdiff -u -r1.49 -r1.50 src/sys/ufs/lfs/ulfs_vnops.c

- Helped discover a lock reversal between lfs_writer_enter and
  lfs_seglock, the primary cause of deadlocks in LFS. This is detailed
  in the following bug report:
  http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=52301

  Unfortunately the fix, while it works, introduced an extreme
  performance regression. I had to revert it and look for the cause of
  the regression. I suspect it is that we wait for disk operations to
  fully complete while vnodes are MARK_VNODE'd, so letting go of all
  marked vnodes can take a long time. The implicit use of KERNEL_LOCK,
  due to the filesystem not being marked MPSAFE, makes things seem
  worse.

- Discovered insufficient locking when manipulating on-disk inode data,
  causing a data race. This causes assertion failures and also shows up
  as fsck inconsistencies, as the number of blocks doesn't match. The
  assertion is a KASSERT about truncating to zero but having more than
  zero effective blocks by the end of lfs_truncate, seen when e.g.
  running firefox. This is because lfs_writevnodes and other iterators
  over vnodes are not holding vn_lock (it's necessary while
  manipulating any lfs_dino_*).

Attempting to fix the above, I've established a justified theory for
why the sane "locking" order, one that doesn't require restarting or
juggling locks, is vn_lock -> lfs_writer_enter -> lfs_seglock.

This is detailed more in:
http://coypu.sdf.org/2017-08-21-LFS

It's also documented in a bug report:
http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=52510

Unfortunately, fixing this is not immediate: the callers of these
iterators sometimes hold vn_lock already.
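To make the ordering argument concrete, here is a minimal userland
sketch of a fixed lock order. It is not LFS code: pthread mutexes stand
in for vn_lock, lfs_writer_enter and lfs_seglock (the real primitives
differ), and the names lock_rank, ranked_lock, ranked_acquire and
ranked_release are made up for illustration. Each lock gets a rank, and
a thread may only take a lock ranked higher than the last one it took.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical ranks mirroring vn_lock -> lfs_writer_enter -> lfs_seglock. */
enum lock_rank { RANK_VNODE = 1, RANK_WRITER = 2, RANK_SEGLOCK = 3 };

struct ranked_lock {
	pthread_mutex_t mtx;	/* userland stand-in, not a kernel lock */
	enum lock_rank rank;
};

static struct ranked_lock vnode_lk  = { PTHREAD_MUTEX_INITIALIZER, RANK_VNODE };
static struct ranked_lock writer_lk = { PTHREAD_MUTEX_INITIALIZER, RANK_WRITER };
static struct ranked_lock seg_lk    = { PTHREAD_MUTEX_INITIALIZER, RANK_SEGLOCK };

/* Per-thread stack of the ranks currently held, innermost last. */
#define MAX_HELD 8
static _Thread_local enum lock_rank held[MAX_HELD];
static _Thread_local int nheld;

/*
 * Take the lock only if it ranks above everything this thread holds.
 * Returns 0 on success, -1 if taking it would violate the order
 * (or if the bookkeeping stack is full).
 */
static int
ranked_acquire(struct ranked_lock *l)
{
	if (nheld == MAX_HELD)
		return -1;
	if (nheld > 0 && held[nheld - 1] >= l->rank)
		return -1;
	pthread_mutex_lock(&l->mtx);
	held[nheld++] = l->rank;
	return 0;
}

/* Release the innermost lock; releases must mirror the acquire order. */
static void
ranked_release(struct ranked_lock *l)
{
	assert(nheld > 0 && held[nheld - 1] == l->rank);
	nheld--;
	pthread_mutex_unlock(&l->mtx);
}
```

With this, taking vnode_lk, writer_lk, seg_lk in that order succeeds,
while asking for vnode_lk with seg_lk already held is refused instead
of risking a deadlock. A refusal is the case where a caller would need
to restart or juggle locks.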
A patch starting work on the vn_lock fix:
http://coypu.sdf.org/lfs-vnlock-writevnodes2

Besides some callers still holding vn_lock for one file, we now run
into deadlocks. I *suspect* this is because we violate the lock
ordering and grab vn_lock after grabbing seglock. Some scheme to juggle
locks when it turns out we can't grab all of them, while still
maintaining filesystem consistency, must be devised.

I've also made the following cleanup and modernization commits, as well
as adding and removing comments to clarify things:

- Renamed i_flag to i_state, as "flags" exists as well and was the
  cause of mistakes that only by coincidence did not result in very bad
  bugs.
- Replaced many users of mtsleep and tsleep with condvars.
- Removed uses of splbio, which are no longer necessary.
- Wrote XXX comments about some more flaws I ran into but didn't start
  investigating.

All in all, I've made many separate commits to LFS-related code in
NetBSD src during GSoC and a little before the official start. They are
individually visible at the following link:
https://v4.freshbsd.org/search?q=lfs&committer%5B%5D=maya&sort=commit_date
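One common shape for the lock juggling mentioned above is try-lock with
back-off. The following is only a userland sketch of that general
technique, not a proposed LFS fix: pthread mutexes stand in for vn_lock
and the seglock, and try_vnode_or_back_off is a made-up name. It does
nothing about keeping the filesystem consistent while the seglock is
dropped, which is the hard part.

```c
#include <pthread.h>
#include <stdbool.h>

/*
 * Called with *seg held, wanting *vn as well (i.e. out of order).
 *
 * If *vn is free, take it and return true with both locks held.
 * If *vn is busy, do not block: drop *seg and return false holding
 * nothing, so the caller can start over and take the locks in the
 * canonical order (vn first, then seg), then revalidate its state.
 */
static bool
try_vnode_or_back_off(pthread_mutex_t *vn, pthread_mutex_t *seg)
{
	if (pthread_mutex_trylock(vn) == 0)
		return true;

	/* Contended: back off rather than sleep while holding seg. */
	pthread_mutex_unlock(seg);
	return false;
}
```

A caller that gets false back has to assume anything it computed under
the seglock is stale, since another thread may have run in between.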