Thread

  1. Re: [v9.1] sepgsql - userspace access vector cache

    Kohei Kaigai <kohei.kaigai@emea.nec.com> — 2011-07-21T09:29:40Z

    The attached patch is revised userspace-avc patch.
    
    List of updates:
    - The GUC of sepgsql.avc_threshold was removed.
    - "char *ucontext" of avc_cache was replaced by "bool tcontext_is_valid".
    - Comments added onto static variables
    - Comments of sepgsql_avc_unlabeled() was revised.
    - Comments of sepgsql_avc_compute() was simplified.
    - Comments of sepgsql_avc_check_perms_label() also mention about
      permissive domain, that performs similar to system's permissive mode.
    - selinux_status_close() become invoked on on_proc_exit() hook.
    
    Thanks,
    --
    NEC Europe Ltd, SAP Global Competence Center
    KaiGai Kohei <kohei.kaigai@emea.nec.com>
    
    
    > -----Original Message-----
    > From: Kohei Kaigai
    > Sent: 20. Juli 2011 17:04
    > To: 'Yeb Havinga'; Kohei KaiGai
    > Cc: Robert Haas; PgHacker
    > Subject: RE: [HACKERS] [v9.1] sepgsql - userspace access vector cache
    > 
    > Yeb, Thanks for your volunteering.
    > 
    > > On 2011-07-14 21:46, Kohei KaiGai wrote:
    > > > Sorry, the syscache part was mixed to contrib/sepgsql part
    > > > in my previous post.
    > > > Please see the attached revision.
    > > >
    > > > Although its functionality is enough simple (it just reduces
    > > > number of system-call invocation), its performance
    > > > improvement is obvious.
    > > > So, I hope someone to volunteer to review these patches.
    > > This is a review of the two userspace access vector cache patches for
    > > SE-PostgreSQL. Proofreading the source I've seen mostly cosmetic things.
    > > Though I have a few questions, I think the overal shape of the patch is
    > > good enough to mark it ready for comitter.
    > >
    > > Remarks:
    > >
    > > * The patches apply cleanly, compile cleanly on Fedora 15. It depends on
    > > libselinux-2.0.99, and that is checked on by the configure script: good.
    > >
    > > * I run SELECT sepgsql_restorecon(NULL) and saw comparable numbers in
    > > speed gain and also backend process memory increase as indicated by
    > > KaiGai-san. The vmsize without the patch increases from running
    > > restorecon 3864kB, with the patch is increases 8160kB, a difference of
    > > ~5MB. Where this change comes from is unclear to me. The avc_cache holds
    > > only 7 entries, and the avc memory context stats indicate it's only at
    > > 8kB. Investigation into the SECLABELOID syscache revealed that even when
    > > that cache was set to a #buckets of 2, still after restorecon() the
    > > backend's vmsize increased about the ~5MB more.
    > >
    > The sepgsql_restorecon(NULL) assigns default security label on all the
    > database objects being controlled, thus, its workload caches security
    > label (including text data) of these objects.
    > So, ~5MB of difference is an upper limit of syscache usage because of
    > SECLABELOID.
    > The number of buckets is independent from the memory consumption.
    > 
    > > * The size of SECLABELOID is currently set to 2048, which is equal to
    > > sizes of the pg_proc and pg_attribute caches and hence makes sense. The
    > > initial contents of pg_seclabel is 3346 entries. Just to get some idea
    > > what the cachesize means for restorecon performance I tested some
    > > different values (debugging was on). Until a size of 256, restorecon had
    > > comparable performance. Small drop from ~415 to ~425 from 128 to 64.
    > > Cache of 32 had ~435ms performance. Cache of 2 had 680ms. Without
    > > debugging symbols, I also got a slightly better performance on the
    > > restorecon call with a 1024 syscache size. This might be something to
    > > tweak in later versions, when there is more experience what this cache
    > > size means for performance on real databases.
    > >
    > Thanks for your research.
    > The reason why I set 2048 is just a copy from the catalog which may
    > hold biggest number of entries (pg_proc). It may be improved later.
    > 
    > > * The cache is called userspace, presumably because it isn't cached in
    > > shared memory? If sepgsql is targeted at installations with many
    > > different kinds of clients (having different subject contexts), or
    > > access different objects, this is a good choice.
    > >
    > SELinux's kernel implementation also has access vector cache:
    >   http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=security/selinux/avc.c
    > 
    > The reason why I didn't choose shared memory is the length of security
    > label text is variable, so it is not feasible to acquire a fixed length
    > region on shared memory in startup time.
    > 
    > > * Checked that the cache keeps working after errors, ok.
    > >
    > > * uavc.c defines some static variables like lru_hint, threshold and
    > > unlabeled. It would be nice to have a small comment with each one that
    > > says what the variable represents (like is done for the avc_cache struct
    > > members right above it)
    > >
    > OK, I'll add comments here.
    > 
    > > * I had to google SELINUX_AVD_FLAGS_PERMISSIVE to understand what was
    > > going on. Since the goal why it is recorded a domain is permissive, is
    > > to prevent flooding the log, maybe renaming avc_cache.permissive into
    > > avc_cache.log_violations_once would explain itself.
    > >
    > It is a bit concern for me, because the permissive domain means it shall
    > be performed like as system is configured as permissive mode.
    > The behavior of log-violation-at-once is a property of permissive mode.
    > So, how about an idea to add comments about permissive domain around
    > the code to modify cache->allowed?
    > 
    > > * selinux_status_open() is called and it's man page mentions
    > > selinux_status_close(). It currently unmaps memory and closes a file
    > > descriptor, which is done on process exit anyway. I don't have enough
    > > experience with these kinds of system calls to have a feeling whether it
    > > might change in the future (and in such cases I'd have added a call to
    > > selinux_status_close() on proc_exit, just to be on the safe side).
    > >
    > I omitted it because OS will handle these things correctly on process
    > Exit time, however, it is good idea to move on safe side.
    > 
    > > * sepgsel_avs_unlabeled(). I was confused a bit because objects can have
    > > a label 'unlabeled', and the comments use wording like 'when unlabeled'.
    > > Does this mean with no label, or with a label 'unlabeled'?
    > >
    > It means object has no label. I'll fix this comment.
    > 
    > > * sepgsql_avc_compute(): the large comment in the beginning is hard to
    > > follow since sentences seem to have been a bit mixed up.
    > >
    > OK, I'll simplify the comment. How about your feeling about?
    > 
    >    /*
    >     * Validation check of the supplied security context.
    >     * Because it always invoke system-call, frequent check should be avoided.
    >     * Unless security policy is reloaded, validation status shall be kept,
    >     * so we also cache whether the supplied security context was valid, or not.
    >     */
    > 
    > > * question: why is ucontext stored in avc_cache?
    > >
    > The 'tcontext' has to be kept on avc_cache as-is, even if the security label
    > is not valid according to security_check_context_raw(), because it is used to
    > hash-key.
    > If your point is that validation status should be kept in bool, not char *,
    > it is fair enough for me.
    > 
    > > * there is a new guc parameter for the user: avc_threshold, which is
    > > also documented. My initial question after reading the description was:
    > > what are the 'items' and how can I see current usage / what are numbers
    > > to expect? Without that knowledge any educated change of that parameter
    > > would be impossible. Would it be possible to remove this guc?
    > >
    > Hmm, it might require users deep knowledge about inside of sepgsql,
    > although it is useful to few situations.
    > OK, I'll remove it in this version.
    > 
    > > * avc_init has magic numbers 384, 4096. Maybe these can be defines (with
    > > a small comment what it is?)
    > >
    > I'll add a comment around static variable of avc_threshold.
    > 
    > > * Finally, IMO the call sites, e.g. check_relation_priviliges, have
    > > improved in readability with this patch.
    > >
    > 
    > Regarding to the point you suggested, I'll revise my patch within a day.
    > 
    > Thanks,
    > --
    > NEC Europe Ltd, SAP Global Competence Center
    > KaiGai Kohei <kohei.kaigai@emea.nec.com>